A Speech Emotion Recognition Model Based on Multi-Level Local Binary and Local Ternary Patterns

dc.authoridVarol, Asaf/0000-0003-1606-4079en_US
dc.authoridSönmez, Yeşim Ülgen/0000-0002-2090-0263en_US
dc.contributor.authorSönmez, Yeşim Ülgen
dc.contributor.authorVarol, Asaf
dc.date.accessioned2024-07-12T21:37:58Z
dc.date.available2024-07-12T21:37:58Z
dc.date.issued2020en_US
dc.department[To be determined]en_US
dc.description.abstractInterpreting a speech signal is quite challenging because it consists of different frequencies and features that vary according to emotions. Although different algorithms are being developed in the speech emotion recognition (SER) domain, the success rates vary according to the spoken languages, emotions, and databases. In this study, a new lightweight and effective SER method with low computational complexity has been developed. This method, called 1BTPDN, is applied to the RAVDESS, EMO-DB, SAVEE, and EMOVO databases. First, low-pass filter coefficients are obtained by applying a one-dimensional discrete wavelet transform to the raw audio data. Features are extracted by applying textural analysis methods, a one-dimensional local binary pattern and a one-dimensional local ternary pattern, to each filter output. Using neighborhood component analysis, the 1024 most dominant features are selected from 7680 features and the rest are discarded. These 1024 features form the input of the classifier, a third-degree polynomial kernel-based support vector machine. The success rates of 1BTPDN reached 95.16%, 89.16%, 76.67%, and 74.31% on the RAVDESS, EMO-DB, SAVEE, and EMOVO databases, respectively. The recognition rates are higher compared to many state-of-the-art textural, acoustic, and deep learning SER methods.en_US
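The feature-extraction stage the abstract describes (multi-level wavelet low-pass filtering followed by 1-D textural pattern histograms) can be sketched as follows. This is a minimal illustration, not the authors' exact 1BTPDN configuration: it uses a hand-rolled Haar low-pass step in place of the paper's DWT filter bank, implements only the 1-D local binary pattern (the local ternary pattern and the NCA/SVM stages are omitted), and the levels, neighborhood size, and resulting feature count are assumptions for demonstration.

```python
import numpy as np

def haar_lowpass(x):
    """One level of a 1-D Haar DWT: keep only the approximation
    (low-pass) coefficients, halving the signal length."""
    n = len(x) - len(x) % 2
    return (x[0:n:2] + x[1:n:2]) / np.sqrt(2)

def lbp_1d(x, p=4):
    """1-D local binary pattern: compare p neighbors on each side of every
    center sample, pack the comparison bits into a 2*p-bit code, and return
    the normalized 2**(2*p)-bin histogram as the feature vector."""
    codes = np.zeros(len(x) - 2 * p, dtype=np.int64)
    center = x[p:len(x) - p]
    for k in range(2 * p):
        offset = k if k < p else k + 1          # skip the center position
        neighbor = x[offset:offset + len(codes)]
        codes |= (neighbor >= center).astype(np.int64) << k
    hist = np.bincount(codes, minlength=2 ** (2 * p))
    return hist / hist.sum()

# Toy signal standing in for raw audio (hypothetical input).
rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)

# Multi-level decomposition: extract LBP features from each low-pass level.
features = []
level = signal
for _ in range(3):                              # 3 levels assumed for the sketch
    level = haar_lowpass(level)
    features.append(lbp_1d(level))
feature_vector = np.concatenate(features)
print(feature_vector.shape)                     # 3 levels x 256 bins
```

In the paper, the concatenated LBP and LTP histograms from all levels yield 7680 features, which neighborhood component analysis then reduces to the 1024 most dominant before the cubic-kernel SVM; here the histogram concatenation alone is shown.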
dc.identifier.doi10.1109/ACCESS.2020.3031763
dc.identifier.endpage190796en_US
dc.identifier.issn2169-3536
dc.identifier.scopus2-s2.0-85102863932en_US
dc.identifier.scopusqualityQ1en_US
dc.identifier.startpage190784en_US
dc.identifier.urihttps://doi.org/10.1109/ACCESS.2020.3031763
dc.identifier.urihttps://hdl.handle.net/20.500.12415/7001
dc.identifier.volume8en_US
dc.identifier.wosWOS:000584839800001en_US
dc.identifier.wosqualityQ2en_US
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoenen_US
dc.publisherIEEE-Inst Electrical Electronics Engineers Incen_US
dc.relation.ispartofIEEE Accessen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.snmzKY04343
dc.subjectFeature Extractionen_US
dc.subjectTime-Frequency Analysisen_US
dc.subjectClassification Algorithmsen_US
dc.subjectDatabasesen_US
dc.subjectTransformsen_US
dc.subjectSupport Vector Machinesen_US
dc.subjectDiscrete Wavelet Transformen_US
dc.subjectLocal Binary Patternen_US
dc.subjectLocal Ternary Patternen_US
dc.subjectNeighborhood Component Analysisen_US
dc.subjectSpeech Emotion Recognitionen_US
dc.titleA Speech Emotion Recognition Model Based on Multi-Level Local Binary and Local Ternary Patternsen_US
dc.typeArticle
dspace.entity.typePublication