Rice University Department of Mathematics Colloquium

Scaling up language technology for the next 1000 languages

4:00 pm Thursday, October 26, 2017
Kevin Scannell (Saint Louis University)

Abstract: Over the last 25 years, we have seen incredible advances in the performance of end-user language technologies such as speech recognition and machine translation. However, almost all of the research and engineering effort to date has been expended on 100 or so languages, primarily those of greatest commercial interest: English, French, Chinese, Japanese, German, etc. We'll begin by explaining some of the mathematical models underlying this work, focusing on language modeling as a relatively simple but foundational technology. We'll then discuss some of the ways existing models fall short for non-Indo-European languages, the difficulties small language groups face in building resources, and our efforts to overcome some of these difficulties for the next thousand (or more) languages.

