Sharing of Language Resources for Research on Cantonese Linguistics
Dr CHIN Chi On Andy (Assistant Professor, Department of Linguistics and Modern Language Studies)
Cantonese is the first language of nearly 90% of Hong Kong's population. However, this vernacular has never developed a formal, standardized writing system. Studying Cantonese therefore poses many challenges, and research has to rely on spoken data.
We currently have a relatively rich body of Cantonese language materials published between the 19th and mid-20th centuries. These materials allow us to reconstruct the Cantonese of about 200 years ago, and they reveal a number of significant linguistic changes that took place around the 1940s. Unfortunately, there is a critical gap in comparable language data from after the 1950s, in both quality and quantity, and this gap makes it difficult to examine the nature and mechanisms of these changes further.
To bridge this gap in the diachronic study of Cantonese, an initial attempt was made, with the support of an Internal Research Grant, to construct an annotated corpus by transcribing a type of authentic, natural spoken data that has received little serious attention in previous Cantonese linguistics research: early Cantonese movies (generally known as 粵語長片) produced in Hong Kong between the 1950s and the 1970s. Given the production practices of Hong Kong films at that time, the dialogue in these early Cantonese movies can arguably be said to largely reflect and represent actual language use in that period.
The transcribed corpus has been made available through an online search engine so that it can be shared with other scholars with similar research interests. Since its launch in April 2012, the online system has attracted a total of 250 registered users, both local and overseas, and about 80,000 visits. The corpus was first introduced at the 16th International Conference on Yue Dialects, held at The Hong Kong Polytechnic University in December 2011, and was then presented at the Workshop on Innovations in Cantonese Linguistics, held at The Ohio State University in March 2012, as well as in Cantonese linguistics courses at local universities.
The corpus can be developed further by adding more language data and more annotation at both linguistic and non-linguistic levels, so that it can also benefit research in other areas, such as discourse analysis, pragmatics, conversation analysis, and language and gender, as well as studies of the interrelationship between language, society and culture in mid-20th-century Hong Kong.