Browsing by Author "Parlak, Cevahir"

Cochleogram-Based Speech Emotion Recognition With the Cascade of Asymmetric Resonators With Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines
(MDPI, 2025) Parlak, Cevahir
Feature extraction is a crucial stage in speech emotion recognition applications, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they do not perfectly model the structure of the human ear, as they use a simplified mechanism to simulate the functioning of human cochlear structures. The Mel filter system is not a perfect representation of human hearing, but merely an engineering shortcut to suppress the pitch and low-frequency components, which have little use in traditional speech recognition applications. However, speech emotion recognition depends heavily on pitch and low-frequency components. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech and is designed to best simulate the functionalities of the human cochlea. In this study, we use the CARFAC 24 system for speech emotion recognition and compare it with state-of-the-art systems in speaker-independent experiments conducted with Time-Distributed Convolutional LSTM networks and Support Vector Machines on the ASED and NEMO emotional speech datasets. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications.
Citation - WoS: 6
Citation - Scopus: 11
Improving YOLO Detection Performance of Autonomous Vehicles in Adverse Weather Conditions Using Metaheuristic Algorithms
(MDPI, 2024) Ozcan, Ibrahim; Parlak, Cevahir; Altun, Yusuf; Department of Computer Engineering
Despite the rapid advances in deep learning (DL) for object detection, existing techniques still face several challenges. In particular, object detection in adverse weather conditions (AWCs) requires complex and computationally costly models to achieve high accuracy rates. Furthermore, these methods struggle to generalize consistently across different conditions. This work focuses on improving object detection using You Only Look Once (YOLO) versions 5, 7, and 9 in AWCs for autonomous vehicles. Although the default hyperparameter values work well for images without AWCs, the optimum hyperparameter values for AWCs still need to be found. Given the large number and wide range of hyperparameters, determining them through trial and error is particularly challenging. In this study, the Gray Wolf Optimizer (GWO), Artificial Rabbit Optimizer (ARO), and Chimpanzee Leader Selection Optimization (CLEO) are independently applied to optimize the hyperparameters of YOLOv5, YOLOv7, and YOLOv9. The results show that the preferred method significantly improves the algorithms' performances for object detection. The overall performance of the YOLO models on the object detection task in AWCs increased by 6.146%, by 6.277% for YOLOv7 + CLEO, and by 6.764% for YOLOv9 + GWO.
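As an illustration of how a metaheuristic such as the Gray Wolf Optimizer searches a bounded parameter space, the following is a minimal sketch of the standard GWO update rule. It is not the authors' implementation: the `sphere` objective is a toy stand-in for "train YOLO with these hyperparameters and return the validation error", and the function names, bounds, population size, and iteration count are all assumptions chosen for the example.

```python
import random


def sphere(x):
    """Toy objective (assumption): stands in for a real validation error."""
    return sum(v * v for v in x)


def grey_wolf_optimizer(objective, bounds, n_wolves=20, n_iters=100, seed=0):
    """Minimize `objective` inside box `bounds` with the standard GWO update."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    history = []  # best-so-far value at the start of each iteration

    for it in range(n_iters):
        # The three fittest wolves (alpha, beta, delta) guide the pack.
        leaders = [list(w) for w in sorted(wolves, key=objective)[:3]]
        value = objective(leaders[0])
        if value < best_val:
            best_pos, best_val = list(leaders[0]), value
        history.append(best_val)

        a = 2.0 * (1 - it / n_iters)  # decreases linearly from 2 towards 0
        moved = []
        for wolf in wolves:
            position = []
            for d in range(dim):
                estimates = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimates.append(leader[d] - A * distance)
                lo, hi = bounds[d]
                # New coordinate: mean of the three estimates, clipped to bounds.
                position.append(min(max(sum(estimates) / 3, lo), hi))
            moved.append(position)
        wolves = moved

    # Account for the positions produced by the last update.
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val, history


if __name__ == "__main__":
    pos, val, _ = grey_wolf_optimizer(sphere, [(-5.0, 5.0)] * 3)
    print(pos, val)
```

In a real tuning run, each coordinate would map to one hyperparameter (for example a learning rate or an augmentation strength), and every call to the objective would involve a full training and validation cycle, which is why the number of wolves and iterations is usually kept small.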
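Improving Deep Learning Methods with Hyperparameter Optimization in Speech Emotion Recognition Applications (Konuşma Duygu Tanıma Uygulamalarında Hiper Parametre Optimizasyonu ile Derin Öğrenme Metotlarının Geliştirilmesi)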
(2024) Parlak, Cevahir
This study presents a comparison of hyperparameter tuning methods, a fairly new and important stage in deep learning applications. The NEMO emotional speech dataset, one of the more recent emotion datasets, is used, and CNN, LSTM, and DNN models are compared with KerasTuner using the Random Search, Hyperband, and Bayesian Optimization methods. In machine learning in general, and in deep learning in particular, producing a successful model is an expensive and demanding process in terms of time and computing power. Hyperparameter optimization can be regarded as consisting of two basic stages. In the first stage, a search space is defined from the values that the variable parameters can take. These parameters may include the learning rate, number of neurons, number of layers, activation function, and similar variables. In the second stage, models are built from these parameters and tested against a chosen success criterion. While running these models, the optimizer may use various algorithms to speed up the process. Hyperparameter optimization tools offer steadily better solutions in this area and are gradually removing the human factor from the loop. Grid search runs every available configuration, consuming all resources to the end, whereas random search tries only certain configurations chosen at random from the available set. Although random search does not try every possible configuration, it can generally produce results close to those of grid search. Successive Halving, Asynchronous Successive Halving, Population-Based Training, Hyperband, and Bayesian approaches can also be counted among the other hyperparameter optimization methods.
In this study, the NEMO speech emotion dataset was run with four emotions using CNN, LSTM, and DNN deep learning classifiers, and the performance of the models generated automatically by KerasTuner's Random Search, Bayesian Search, and Hyperband Search methods was compared. Among the hyperparameter optimization methods, Bayesian Optimization was found to produce better and faster results than the others.
Citation - Scopus: 1
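The two stages described in this abstract (define a search space, then build and score candidate configurations) and the difference between grid and random search can be sketched as follows. This is an illustrative sketch only: it does not use KerasTuner, the search space values are hypothetical, and `toy_score` is an assumed stand-in for training a model and measuring its validation accuracy.

```python
import itertools
import random

# Stage 1: a hypothetical search space (names and values are illustrative).
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "units": [32, 64, 128],
    "layers": [1, 2, 3],
    "activation": ["relu", "tanh"],
}


def toy_score(config):
    """Stage 2 stand-in (assumption): higher is better, peak at one config."""
    score = 1.0
    score -= abs(config["learning_rate"] - 1e-3) * 10
    score -= abs(config["units"] - 64) / 1000
    score -= abs(config["layers"] - 2) / 10
    score -= 0.05 if config["activation"] == "tanh" else 0.0
    return score


def grid_search(space, objective):
    """Evaluate every configuration in the Cartesian product of the space."""
    keys = list(space)
    configs = [dict(zip(keys, values))
               for values in itertools.product(*space.values())]
    return max(configs, key=objective), len(configs)


def random_search(space, objective, trials, seed=0):
    """Evaluate only `trials` configurations sampled uniformly at random."""
    rng = random.Random(seed)
    configs = [{key: rng.choice(values) for key, values in space.items()}
               for _ in range(trials)]
    return max(configs, key=objective), trials


if __name__ == "__main__":
    grid_best, grid_evals = grid_search(SEARCH_SPACE, toy_score)
    rand_best, rand_evals = random_search(SEARCH_SPACE, toy_score, trials=10)
    print(grid_evals, grid_best, toy_score(grid_best))
    print(rand_evals, rand_best, toy_score(rand_best))
```

Grid search here evaluates all 54 combinations, while random search spends a fixed budget of 10 evaluations; with a real model each evaluation is a full training run, which is the cost that the budgeted methods named in the abstract aim to reduce.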
A Quest for Formant-Based Compact Nonuniform Trapezoidal Filter Banks for Speech Processing With VGG16
(Springer Birkhauser, 2024) Parlak, Cevahir; Altun, Yusuf; Department of Computer Engineering
In this text, we discuss the filter banks used for speech analysis and propose a novel filter bank for speech processing applications. Filter banks are building blocks of speech processing applications. Multiple filter strategies have been proposed, including Mel, PLP, Seneff, Lyon, and Gammatone filters. MFCC is a transformed version of Mel filters and is still a state-of-the-art method for speech recognition applications. However, 40 years after their debut, time is running out to launch new structures as novel speech features. The proposed acoustic filter banks (AFB) are innovative alternatives to dethrone Mel filters, PLP filters, and MFCC features. Foundations of AFB filters are based on the formant regions of vowels and consonants. In this study, we pioneer an acoustic filter bank comprising 11 frequency regions and conduct experiments using the VGG16 model on the TIMIT and Speech Command V2 datasets. The outcomes of the study concretely indicate that MFCC, Mel, and PLP filters can effectively be replaced with novel AFB filter bank features.
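To make the idea of a nonuniform trapezoidal filter bank concrete, the sketch below computes trapezoidal weights and per-band energies from a power spectrum. It is a generic illustration, not the AFB design from the paper: the abstract does not list the 11 frequency regions, so the three bands in `BANDS` and all function names are hypothetical.

```python
def trapezoid_weight(freq, start, flat_start, flat_end, end):
    """Weight of a trapezoidal filter: linear rise, flat top, linear fall."""
    if freq <= start or freq >= end:
        return 0.0
    if freq < flat_start:
        return (freq - start) / (flat_start - start)
    if freq <= flat_end:
        return 1.0
    return (end - freq) / (end - flat_end)


# Hypothetical band corners in Hz: (start, flat_start, flat_end, end).
# The widths differ on purpose, which is what makes the bank nonuniform.
BANDS = [
    (100, 200, 600, 800),
    (600, 800, 1800, 2200),
    (1800, 2200, 3500, 4000),
]


def band_energies(freqs, power, bands):
    """Sum the power spectrum under each trapezoid to get one value per band."""
    return [
        sum(p * trapezoid_weight(f, *band) for f, p in zip(freqs, power))
        for band in bands
    ]


if __name__ == "__main__":
    freqs = [150, 400, 700, 1000]   # bin center frequencies in Hz
    power = [2.0, 2.0, 2.0, 2.0]    # toy power spectrum
    print(band_energies(freqs, power, BANDS))
```

Compared with triangular Mel filters, a trapezoid gives full weight to every bin inside its flat region, so a band aligned with a formant region passes that region without tapering.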