The Hong Kong Bilingual Child Language Corpus

The Hong Kong Bilingual Child Language Corpus contains longitudinal speech data from six bilingual children exposed to Cantonese and English from birth. These children grew up in a one parent-one language environment in which each parent is a native speaker of one of the two languages (see Table 1). The subjects' development in both languages was observed and recorded at weekly or bi-weekly intervals for a period of one to two and a half years. On average, each recording session consisted of an hour of audio recording, and in some cases video recording, of the children engaged in daily activities such as playing, reading and role playing. The subjects were encouraged to speak Cantonese for half an hour and English for half an hour.

The bilingual data were collected as part of two projects funded by grants from the Hong Kong Research Grants Council: (1) RGC ref. no. HKU336/94H to Stephen Matthews (University of Hong Kong), Virginia Yip (Chinese University of Hong Kong) and Huang Yue-Yuan (Hong Kong Baptist University) and (2) RGC ref. no. CUHK4002/97H to Virginia Yip and Stephen Matthews. The subjects investigated in the first project, "The Development of Bilingual Competence in Hong Kong Children", are Timmy, Kathryn and Llywelyn, while those in the second project, "A Cantonese-English Bilingual Child Language Corpus", are Sophie and Charlotte. The corpus data of five children are now deposited in the CHILDES (Child Language Data Exchange System) archive and can be retrieved from the "Database" folder (under "Zipped Transcripts", click "Bilinguals" and then download YipMatthews.zip). You can also download the raw audio and video files.

The corpus is now available in XML format. With this new "language" of the World Wide Web, you can view our corpus directly in a web browser such as Internet Explorer or Netscape. The XML format also supports transcripts linked to QuickTime streaming audio and video.

Please go to YipMatthews Browsable transcripts, YipMatthews Linked Browsable Audio and YipMatthews Linked Browsable Video and take a look at our online corpus data.

Table 1 Subject Information

Subjects                   Native language of parents   Age span during longitudinal study
Name       Date of Birth   Mother      Father
           93.5.21         Cantonese   English          1;05.20 - 3;06.25
           92.1.23         English     Cantonese        2;09.23 - 4;06.07
Llywelyn   93.6.21         Cantonese   English          1;06.00 - 3;05.28
           96.2.28         Cantonese   English          1;06.00 - 4;00.00
           96.8.23         Cantonese   English          1;05.10 - 3;06.14
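
The age spans in Table 1 appear to follow the years;months.days notation commonly used for CHILDES data, so 1;05.20 would read as 1 year, 5 months and 20 days. As a minimal sketch under that assumption, the Python below parses such a string and estimates the length of an observation span in months. The function names are hypothetical, and the 30-day month is a simplification chosen only for illustration.

```python
import re

# Assumed pattern for ages written as years;months.days, e.g. "1;05.20".
AGE_PATTERN = re.compile(r"^(\d+);(\d{1,2})\.(\d{1,2})$")


def parse_age(age):
    """Split a 'years;months.days' string into a tuple of three ints."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"unrecognized age: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def age_in_months(age):
    """Approximate an age in months, assuming a 30-day month."""
    years, months, days = parse_age(age)
    return years * 12 + months + days / 30


def span_in_months(span):
    """Approximate the length of a 'start - end' age span in months."""
    start, end = (part.strip() for part in span.split("-"))
    return age_in_months(end) - age_in_months(start)


if __name__ == "__main__":
    # Spans copied from Table 1.
    for span in ("1;05.20 - 3;06.25", "1;06.00 - 4;00.00"):
        print(span, "is roughly", round(span_in_months(span), 1), "months")
```

The second span in the example, 1;06.00 - 4;00.00, comes out at 30 months, which matches the upper end of the "one to two and a half years" observation period described above.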

Background on Cantonese: the language and its speakers

Tagging of English and Cantonese data

Acknowledgments

This page was created and updated by Uta Lam on 25 March, 2004.