Cochleogram-Based Speech Emotion Recognition With the Cascade of Asymmetric Resonators With Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines

dc.authorscopusid 55807221400
dc.contributor.author Parlak, Cevahir
dc.date.accessioned 2025-04-11T19:30:54Z
dc.date.available 2025-04-11T19:30:54Z
dc.date.issued 2025
dc.department Fenerbahçe University en_US
dc.department-temp [Parlak, Cevahir] Fenerbahce Univ, Fac Engn, Dept Comp Engn, TR-34758 Istanbul, Turkiye en_US
dc.description.abstract Feature extraction is a crucial stage in speech emotion recognition, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and Mel-frequency cepstral coefficients (MFCCs) achieve outstanding results, they do not model the human ear accurately, because they rely on a simplified mechanism to simulate the functioning of the cochlea. The Mel filter bank is not a faithful representation of human hearing but an engineering shortcut that suppresses pitch and low-frequency components, which are of little use in traditional speech recognition. Speech emotion recognition, however, depends heavily on pitch and low-frequency features. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech, designed to simulate the functions of the human cochlea as closely as possible. In this study, we use CARFAC 24 for speech emotion recognition and compare it with state-of-the-art systems in speaker-independent experiments conducted with Time-Distributed Convolutional LSTM networks and Support Vector Machines on the ASED and NEMO emotional speech datasets. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications. en_US
dc.description.woscitationindex Science Citation Index Expanded
dc.identifier.doi 10.3390/biomimetics10030167
dc.identifier.issn 2313-7673
dc.identifier.issue 3 en_US
dc.identifier.pmid 40136820
dc.identifier.scopus 2-s2.0-105001340992
dc.identifier.scopusquality Q3
dc.identifier.uri https://doi.org/10.3390/biomimetics10030167
dc.identifier.uri https://hdl.handle.net/20.500.14627/896
dc.identifier.volume 10 en_US
dc.identifier.wos WOS:001452951100001
dc.identifier.wosquality Q1
dc.institutionauthor Parlak, Cevahir
dc.language.iso en en_US
dc.publisher MDPI en_US
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.scopus.citedbyCount 0
dc.subject Speech Emotion Recognition en_US
dc.subject Cascade Of Asymmetric Resonators en_US
dc.subject Deep Neural Networks en_US
dc.subject Support Vector Machines en_US
dc.subject Speech Filter Banks en_US
dc.title Cochleogram-Based Speech Emotion Recognition With the Cascade of Asymmetric Resonators With Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines en_US
dc.type Article en_US
dc.wos.citedbyCount 0
dspace.entity.type Publication
