WoS-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.14627/6

Search Results

Now showing 1 - 3 of 3
  • Article
    Citation - WoS: 1
    Citation - Scopus: 2
    Cochleogram-Based Speech Emotion Recognition With the Cascade of Asymmetric Resonators With Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines
    (MDPI, 2025) Parlak, Cevahir
    Feature extraction is a crucial stage in speech emotion recognition applications, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they do not perfectly model the structure of the human ear, as they use a simplified mechanism to simulate the functioning of human cochlear structures. The Mel filter system is not a perfect representation of human hearing, but merely an engineering shortcut to suppress the pitch and low-frequency components, which have little use in traditional speech recognition applications. However, speech emotion recognition depends heavily on pitch and low-frequency features. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech and is designed to closely simulate the functionalities of the human cochlea. In this study, we use the CARFAC 24 system for speech emotion recognition and compare it with state-of-the-art systems in speaker-independent experiments conducted with Time-Distributed Convolutional LSTM networks and Support Vector Machines on the ASED and NEMO emotional speech datasets. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications.
  • Article
    Citation - Scopus: 1
    A Quest for Formant-Based Compact Nonuniform Trapezoidal Filter Banks for Speech Processing With VGG16
    (Springer Birkhauser, 2024) Parlak, Cevahir; Altun, Yusuf
    In this text, we discuss the filter banks used for speech analysis and propose a novel filter bank for speech processing applications. Filter banks are building blocks of speech processing applications. Multiple filter strategies have been proposed, including Mel, PLP, Seneff, Lyon, and Gammatone filters. MFCC is a transformed version of Mel filters and is still a state-of-the-art method for speech recognition applications. However, 40 years after their debut, it is high time to introduce new structures as novel speech features. The proposed acoustic filter banks (AFB) are innovative alternatives intended to replace Mel filters, PLP filters, and MFCC features. The AFB filters are founded on the formant regions of vowels and consonants. In this study, we pioneer an acoustic filter bank comprising 11 frequency regions and conduct experiments using the VGG16 model on the TIMIT and Speech Command V2 datasets. The outcomes of the study indicate that MFCC, Mel, and PLP filters can effectively be replaced with the novel AFB filter bank features.
  • Article
    Citation - WoS: 11
    Citation - Scopus: 23
    Improving YOLO Detection Performance of Autonomous Vehicles in Adverse Weather Conditions Using Metaheuristic Algorithms
    (MDPI, 2024) Ozcan, Ibrahim; Altun, Yusuf; Parlak, Cevahir
    Despite the rapid advances in deep learning (DL) for object detection, existing techniques still face several challenges. In particular, object detection in adverse weather conditions (AWCs) requires complex and computationally costly models to achieve high accuracy rates. Furthermore, these methods generalize poorly and struggle to perform consistently under different conditions. This work focuses on improving object detection using You Only Look Once (YOLO) versions 5, 7, and 9 in AWCs for autonomous vehicles. Although the default values of the hyperparameters are successful for images without AWCs, there is a need to find the optimum values of the hyperparameters in AWCs. Given the large number and wide range of hyperparameters, determining them through trial and error is particularly challenging. In this study, the Gray Wolf Optimizer (GWO), Artificial Rabbit Optimizer (ARO), and Chimpanzee Leader Selection Optimization (CLEO) are independently applied to optimize the hyperparameters of YOLOv5, YOLOv7, and YOLOv9. The results show that the preferred method significantly improves the algorithms' performance for object detection. On the AWC object detection task, the overall performance of the YOLO models increased by 6.146%, by 6.277% for YOLOv7 + CLEO, and by 6.764% for YOLOv9 + GWO.