The Hong Kong Bilingual Child Language Corpus
The Hong Kong Bilingual Child Language Corpus contains longitudinal speech data from six bilingual children exposed to Cantonese and English from birth. The children grew up in a one parent-one language environment in which each parent is a native speaker of one of the two languages (see Table 1). Each child's development in both languages was observed and recorded at weekly or bi-weekly intervals over a period of one to two and a half years. On average, each recording session consisted of an hour of audio, and in some cases video, recording of the child engaged in daily activities such as playing, reading and role playing. The children were encouraged to speak Cantonese for half an hour and English for half an hour.
The bilingual data were collected as part of two projects funded by grants from the Hong Kong Research Grants Council: (1) RGC ref. no. HKU336/94H to Stephen Matthews (University of Hong Kong), Virginia Yip (Chinese University of Hong Kong) and Huang Yue-Yuan (Hong Kong Baptist University) and (2) RGC ref. no. CUHK4002/97H to Virginia Yip and Stephen Matthews. The subjects investigated in the first project, "The Development of Bilingual Competence in Hong Kong Children", are Timmy, Kathryn and Llywelyn, while those in the second project, "A Cantonese-English Bilingual Child Language Corpus", are Sophie and Charlotte. The corpus data of five children are now deposited in the CHILDES (Child Language Data Exchange System) archive and can be retrieved from the "Database" folder (under "Zipped Transcripts", click "Bilinguals" and then download YipMatthews.zip). The raw audio and video files can also be downloaded.
The corpus is now available in XML format. With this new "language" of the World Wide Web, you can view our corpus directly in a web browser such as Internet Explorer or Netscape. The XML format also supports transcripts linked to QuickTime streaming audio and video.
Please go to YipMatthews Browsable transcripts, YipMatthews Linked Browsable Audio and YipMatthews Linked Browsable Video and take a look at our online corpus data.
Table 1 Subject Information
                            Native language of parents
Name        Date of Birth   Mother       Father         Age span during longitudinal study
            93.5.21         Cantonese    English        1;05.20 - 3;06.25
            92.1.23         English      Cantonese      2;09.23 - 4;06.07
Llywelyn    93.6.21         Cantonese    English        1;06.00 - 3;05.28
            96.2.28         Cantonese    English        1;06.00 - 4;00.00
            96.8.23         Cantonese    English        1;05.10 - 3;06.14
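The age spans in Table 1 appear to follow the CHILDES years;months.days convention, so 1;05.20 reads as 1 year, 5 months and 20 days. The following is a minimal Python sketch, not part of the corpus tools, showing how such strings might be parsed and how the length of an observation period could be estimated. The function names are hypothetical, and the 30-day month used for the day component is a simplifying assumption.

```python
import re

# Matches ages written as years;months.days, e.g. "1;05.20".
AGE_PATTERN = re.compile(r"^(\d+);(\d{1,2})\.(\d{1,2})$")


def parse_age(age):
    """Split a 'years;months.days' string into a (years, months, days) tuple."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"not a years;months.days age: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def age_in_months(age):
    """Convert an age string to months (assumption: one month = 30 days)."""
    years, months, days = parse_age(age)
    return years * 12 + months + days / 30


def span_in_months(span):
    """Length in months of an age span written as 'start - end'."""
    start, end = re.split(r"\s*-\s*", span.strip())
    return age_in_months(end) - age_in_months(start)


print(parse_age("1;05.20"))                 # (1, 5, 20)
print(span_in_months("1;06.00 - 4;00.00"))  # 30.0
```

The second call uses the fourth row of Table 1 and gives 30 months, which agrees with the upper bound of two and a half years stated in the description above.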
Background on Cantonese: the language and its speakers
Tagging of English and Cantonese data
This page was created and updated by Uta Lam on 25 March, 2004.