Pluricentric languages in speech technology: Interspeech 2019 Workshop

Satellite Workshop at Interspeech 2019:
Pluricentric Languages in Speech Technology

Final Programme - Book of abstracts
Saturday, September 14, 2019


Opening ceremony - Welcome address by the organizing committee

Morning session 1: Chair: Barbara Schuppler

09.15 - 09.30

1. Muhr R.: Some fundamentals of pluricentric theory.


2. Keynote Speech: Adda-Decker M.: Variation in spoken pluricentric languages: insights from large corpora and challenges for speech technology


3. Qasim M., Habib T., Mumtaz B., and Urooj S.: Speech emotion recognition for Urdu language


Coffee break

Morning session 2: Chair: György Szaszák


4. Niebuhr O., Brem A., Tegtmeier S., Fischer K., Michalsky J., and Sydow A.: Research and development perspectives based on corpus analyses, automatic assessment tools, and speaker-specific effects


5. El Zarka D. and Hödl P.: Topic or Focus: Do Egyptians interpret prosodic differences in terms of information structure?


6. Ludusan B. and Schuppler B.: Automatic detection of prosodic boundaries in two varieties of German


Lunch break

Afternoon session 1: Chair: Tania Habib


7. Miller C.: Accommodating pluricentrism in speech technology


8. Szaszák G. and Pierucci P.: Accent adaptation of ASR acoustic models: shall we make it really so complicated?


9. Chakraborty J., Sarmah P., and Vijaya S.: Speech recognition and dialect identification systems for Bangladeshi and Indian varieties of Bangla

14.30 - 14.50

10. Whettam D., Gargett A., and Dethlefs N.: Cross-dialect speech processing


Coffee break

Afternoon session 2: Chair: Corey Miller


11. Gorisch J.: Challenges in widening the transcription bottleneck


12. Wu Y., Lamel L., and Adda-Decker M.: Variation in pluricentric Mandarin using large corpus


13. Sinha S., Bansal S., and Agrawal S. S.: Acoustic phonetic convergence and divergence between Hindi spoken in India and Nepal

Panel Discussion: Chair: Rudolf Muhr

16.30 - 16.45

14. Cucchiarini C.: Introduction to the panel discussion

16.45 - 17.30

15. Panel discussion: The role of pluricentricity for speech technology, and the role of speech technology for pluricentric languages.

Invited panel participants: Catia Cucchiarini (Dutch Language Union / Radboud University Nijmegen), Juraj Šimko (University of Helsinki), Michael Stadtschnitzer (Fraunhofer Institute, IAIS), Andrej Žgank (University of Maribor), Shyam Agrawal (Gurgaon, India)


    Keynote Speech
    Variation in spoken pluricentric languages:
    insights from large corpora and challenges for speech technology
    Martine Adda-Decker

    The Laboratory of Phonetics and Phonology (LPP, Paris)






The term 'pluricentric language' refers to languages that are shared by, and have official roles in, more than one country. A major difference between the varieties of pluricentric languages and other regional varieties lies in their official status more than in objective, ascertainable linguistic features.

Research in automatic speech processing started with a focus on the major languages of the world, which tend to be pluricentric (English, French, German, Spanish, Mandarin, Arabic...), with the aim of developing high-performance technologies such as text-to-speech synthesis, automatic speech transcription and translation, information retrieval, dialog systems, and chatbots. These technologies work best when language-specific resources are available in abundance, for example high-coverage lexica and pronunciation dictionaries, and large corpora of written material and spoken recordings. A further facilitating factor is a national policy that actively supports NLP and speech processing research and development in the country's language(s). As a consequence, dominant varieties, which tend to have the largest amount of resources and the strongest national support, give rise to the best-performing speech technologies, thus reinforcing their norm-setting power with respect to non-dominant varieties. There is therefore a risk that the distinct codified standards of non-dominant varieties are overlooked.

In recent years, however, porting speech technologies to non-dominant varieties of pluricentric languages has been the subject of increasing attention, as have some of the less documented oral languages. These efforts produce new language resources as by-products, thus providing challenging opportunities for both improved technologies and numerous linguistic studies.

In this talk I will give an overview of ongoing efforts in research and speech technology development to deal with pluricentric languages. As my main research interests are in pronunciation variation across languages and speaking styles, I will develop this latter issue in more detail, taking examples from pluricentric languages.

Martine Adda-Decker has been a French CNRS researcher since 1990. After more than 20 years of research in multilingual speech recognition with the Spoken Language Processing group at LIMSI-CNRS (Orsay), she joined the Laboratory of Phonetics and Phonology (LPP, Paris) in 2010. Her research interests include man-machine communication, language and accent identification, multilingual speech recognition, pronunciation variants, corpus phonetics and phonology, and large corpus-based studies. She has authored or co-authored over 150 peer-reviewed articles in the field. She is currently vice-president of the French-speaking Speech Communication Association (AFCP), one of the ISCA Special Interest Groups.