Cochleogram-Based Speech Emotion Recognition With the Cascade of Asymmetric Resonators With Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines

dc.authorscopusid: 55807221400
dc.contributor.author: Parlak, Cevahir
dc.date.accessioned: 2025-04-11T19:30:54Z
dc.date.available: 2025-04-11T19:30:54Z
dc.date.issued: 2025
dc.department: Fenerbahçe University [en_US]
dc.department-temp: [Parlak, Cevahir] Fenerbahce Univ, Fac Engn, Dept Comp Engn, TR-34758 Istanbul, Turkiye [en_US]
dc.description.abstract: Feature extraction is a crucial stage in speech emotion recognition applications, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they do not perfectly model the structure of the human ear, as they use a simplified mechanism to simulate the functioning of human cochlear structures. The Mel filter bank is not a perfect representation of human hearing but an engineering shortcut that suppresses pitch and low-frequency components, which are of little use in traditional speech recognition applications. Speech emotion recognition, however, relies heavily on pitch and low-frequency features. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech, designed to simulate the functionality of the human cochlea as closely as possible. In this study, we use the CARFAC 24 system for speech emotion recognition and compare it with state-of-the-art systems in speaker-independent experiments conducted with Time-Distributed Convolutional LSTM networks and Support Vector Machines on the ASED and NEMO emotional speech datasets. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications. [en_US]
dc.description.woscitationindex: Science Citation Index Expanded
dc.identifier.doi: 10.3390/biomimetics10030167
dc.identifier.issn: 2313-7673
dc.identifier.issue: 3 [en_US]
dc.identifier.pmid: 40136820
dc.identifier.scopus: 2-s2.0-105001340992
dc.identifier.scopusquality: Q3
dc.identifier.uri: https://doi.org/10.3390/biomimetics10030167
dc.identifier.uri: https://hdl.handle.net/20.500.14627/896
dc.identifier.volume: 10 [en_US]
dc.identifier.wos: WOS:001452951100001
dc.identifier.wosquality: Q1
dc.institutionauthor: Parlak, Cevahir
dc.language.iso: en [en_US]
dc.publisher: MDPI [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Speech Emotion Recognition [en_US]
dc.subject: Cascade Of Asymmetric Resonators [en_US]
dc.subject: Deep Neural Networks [en_US]
dc.subject: Support Vector Machines [en_US]
dc.subject: Speech Filter Banks [en_US]
dc.title: Cochleogram-Based Speech Emotion Recognition With the Cascade of Asymmetric Resonators With Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
