Prof. Benjamin K. Tsou
City University of Hong Kong
Hong Kong University of Science and Technology
Speech Title: Some Questions on the Lexicon in Future Natural Language Processing
Abstract: Notable recent advances in Natural Language Processing (NLP) have benefited from many spectacular improvements in Machine Learning and its applications in Neural Machine Translation. However, certain questions remain that will continue to challenge ongoing and future developments in NLP. These include Out Of Vocabulary (OOV) words, which have long baffled humans and machines alike, especially in NLP and Machine Translation. The problem will persist because scientific and technological developments have always outpaced developments in lexicography and lexicology. Two major factors add to the mounting urgency of coping with these persistent issues: 1) the rapid and unprecedented developments in science and technology, and the resulting accumulation of new knowledge and of the words needed to manage it; and 2) the concomitant need to mediate between different linguistic traditions and specialized domains for purposes of global trade and international relations.
In this paper, we propose to examine three questions: 1. What may be a practical upper bound on the size of the operational human lexicon that machines and human speakers could be expected to deal with in a given context? It has been suggested in NLP circles that dynamic processing of vocabulary is more useful than static recovery (Church 2020). 2. Given the time lag between scientific and technological developments and lexicography, what are some ways in which we can usefully begin to deal with this problem? 3. We propose to examine some of these issues through the case of the Chinese language, accounting for its vocabulary development over a 20-year period by means of a dynamically maintained Big Database, and for the multiple renditions of old and new technical terms that have appeared in a 10-year accumulation of English and Chinese patents.
In the process, we hope to touch not only on practical issues in computational linguistics and NLP but also on broader issues concerning the fundamental link between the lexicon and the knowledge base of human beings.
Introduction to Prof. Benjamin K. Tsou:
Benjamin Tsou is an Emeritus Chair Professor at the City University of Hong Kong and an Adjunct Professor at the Hong Kong University of Science and Technology. He worked on machine translation at M.I.T. in Professor Victor Yngve’s Machine Translation Group and later headed the Chinese-English machine translation project at UC Berkeley. His interests have since focused on the rigorous cultivation of quality linguistic data for applications in NLP and digital humanities. In 1995, he began cultivating the pan-Chinese media corpus LIVAC (https://en.wikipedia.org/wiki/LIVAC_Synchronous_Corpus), and in 2019 his group produced a prototype platform, Patentlex (http://patentlex.chilin.hk/), to alleviate lexical gap problems involving technical terms in NLP applications. Patentlex is based on a corpus of more than 300K comparable Chinese and English patents curated over more than 10 years. The derived bilingually aligned sentences have proved useful in many applications, including the two pioneering NTCIR Chinese-English patent machine translation competitions held in Tokyo in 2009 and 2010. The further extraction and accumulation of technical terms in the Patentlex platform won second place in the Game Changer Competition organized by TAUS in Singapore in 2019.
He completed his Master’s degree at Harvard University and his Ph.D. at UC Berkeley. He founded the Research Centre on Language Information Sciences at the City University of Hong Kong and was the founding Chiang Chen Chair Professor of Linguistics and Language Sciences at the Hong Kong Institute of Education. He is also the Founding President of the Asian Federation of Natural Language Processing and an Academician of the Académie Royale des Sciences d’Outre-Mer (Belgium). His extensive publications focus on quantitative and qualitative linguistic analysis and NLP.
Prof. Sergei Gorlatch
University of Muenster, Germany
Speech Title: Distributed Software Applications Based on Mobile Cloud and Software-Defined Networks
Abstract: We consider an emerging class of challenging software applications called Real-Time Online Interactive Applications (ROIA). ROIA are networked applications connecting a potentially very high number of users who interact with the application and with each other in real time, i.e., a response to a user’s action happens virtually immediately. Typical representatives of ROIA are multiplayer online computer games, advanced simulation-based e-learning and serious gaming. All these applications are characterized by high performance and QoS requirements, such as: short response times to user inputs (about 0.1-1.5 s); frequent state updates (up to 100 Hz); and large and frequently changing numbers of users in a single application instance (up to tens of thousands of simultaneous users). This talk will address two challenging aspects of software for future Internet-based ROIA: a) using Mobile Cloud Computing to achieve high application performance when a ROIA application is accessed from multiple mobile devices, and b) managing the dynamic QoS requirements of ROIA applications by employing the emerging technology of Software-Defined Networking (SDN).
Introduction to Prof. Sergei Gorlatch:
Sergei Gorlatch has been a Full Professor of Computer Science at the University of Muenster (Germany) since 2003. Earlier, he was an Associate Professor at the Technical University of Berlin, an Assistant Professor at the University of Passau, and a Humboldt Research Fellow at the Technical University of Munich, all in Germany. Prof. Gorlatch has more than 200 peer-reviewed publications in renowned international books, journals and conferences. He has been a principal investigator in several international research and development projects in the field of software for parallel, distributed, Grid and Cloud systems and networking, funded by the European Community and by German national bodies.
Prof. Roberto Navigli
Sapienza University of Rome, Italy
Speech Title: What's new in multilingual sense embeddings, Word Sense Disambiguation and Semantic Role Labeling
Abstract: Natural Language Processing has seen an explosion of interest in recent years, with many industrial applications relying on key technological developments in the field. However, Natural Language Understanding (NLU), which requires the machine to go beyond processing strings and to operate at a deep, semantic level, is particularly challenging due to the pervasive ambiguity of language.
In this talk I will present recent research at the Sapienza NLP group on multilingual NLU, including work on new multilingual sense embeddings, and novel neural approaches to word sense disambiguation and semantic role labeling which scale across languages easily and achieve state-of-the-art performance thanks to the integration of deep learning and explicit knowledge.
Introduction to Prof. Roberto Navigli:
Roberto Navigli is a Professor of Computer Science at the Sapienza University of Rome, where he heads the Sapienza NLP group. He was awarded two ERC grants in computer science, namely an ERC Starting Grant on multilingual word sense disambiguation (2011-2016) and an ERC Consolidator Grant on multilingual language- and syntax-independent open-text unified representations (2017-2022). He was also a co-PI of a Google Focused Research Award on NLU. In 2015 he received the META prize for groundbreaking work in overcoming language barriers with BabelNet, a project also highlighted in The Guardian and Time magazine and the winner of the 2017 Artificial Intelligence Journal prominent paper award. He is the co-founder of Babelscape, a successful spin-off company that enables Natural Language Understanding in dozens of languages.