Tagging of English and Cantonese data

    Grammatical categories

    The grammatical category labels for the English corpus are based on the MOR grammars for English in the CHILDES Windows tools, while those for the Cantonese corpus are based on the categories of Cancorp (Lee et al. 1996). Thirty-three categories are distinguished, as shown in Table 1 (see MacWhinney 2000: 364-365). They are used as in Cancorp, with the following modifications:

    (i) the category 'particle' (prt) is used instead of 'clitic' for the postverbal modal dak1 and for postverbal dou3 introducing an extent complement;

    (ii) the category 'localizer' (loc) is used for locative expressions such as dou6, as in zoeng1 toi2 dou6 'the table there' (lit.), as well as for expressions such as haa6bin6 'down there', which are tagged as locative noun phrases (nnloc) in Cancorp;

    (iii) the category 'onomatopoeic expression' (onoma) is introduced in our Cantonese corpus for sounds such as wo1wo1 'barking of dogs' and baang4 'crashing/shooting noise';

    (iv) the category 'ditransitive verb' (vd) is applied only to verbs that allow two NP objects, such as bei2 'give', and excludes other three-place predicates such as baai2 'put'.

    Table 1 Grammatical categories for the Cantonese corpus

    No.  Tag    Syntactic category            Examples                                          Gloss
    1.   adj    adjective                     sau3, leng3, faai3, hou2teng1                     thin, pretty, fast, good to listen to
    2.   advf   focus adverb                  dou1, sin1, jau4, zung6                           also, first, again, still
    3.   advi   adverb of intensity           gam3, hou2, taai3, zeoi3                          so, very, too, most
    4.   advm   adverb of manner              gwaai1gwaai1dei2, maan6maan2                      obediently, slowly
    5.   advs   sentential adverb             jan1wai6, so2ji5, bat1jyu4                        because, therefore, how about
    6.   asp    aspectual marker              zo2, gwo3, gan2, hoi1, haa5                       PFV, EXP, PROG, HAB, DEL
    7.   aux    auxiliary/modal verb          jing1goi1, wui5, m4hou2                           should, would, don't
    8.   cl     classifier                    bun2, go3, gaa3, tiu4                             CL
    9.   com    comparative morpheme          di1 as in leng3 di1, gwo3 as in leng3 gwo3 keoi5  more beautiful, prettier than her
    10.  conj   connective                    ding6hai6, tung4maai4, waak6ze2                   or, and, or
    11.  corr   correlative                   jat1lou6 ... jat1lou6, jyut6 ... jyut6            while, the more ... the more
    12.  det    determiner                    li1, go2, dai6                                    this, that, number
    13.  dir    directional verb              lei4/lai4, heoi3, ceot1, jap6, soeng5, lok6       come, go, out, in, go up, go down
    14.  ex     expressive utterance          ai1jaa3, e3, m4goi1                               oops, well, please/thanks
    15.  gen    genitive marker               ge3 as in Timmy ge3 pang4jau5                     Timmy's friends
    16.  ins    emphatic inserted marker      gwai2 as in gam3 gwai2 lyun6                      what a mess!
    17.  loc    localizer                     dou6 as in zoeng1 toi2 dou6, soeng6min6           on the table, up there
    18.  nn     noun                          ce1, wun6geoi6, sing1sing1, kau3fu2               car, toy, star, uncle
    19.  nnpr   pronoun                       ngo5, lei5, keoi5, ngo5dei6, lei5dei6, keoi5dei6  I/me, you, s/he, we/us, you (pl), they/them
    20.  nnpp   proper noun                   ciu1jan4, je4sou1, jing1gwok3                     Superman, Jesus, Britain
    21.  neg    negative morpheme             m4, mai6, mou5                                    not, not, not have
    22.  onoma  onomatopoeic expression       wou1wou1, baang4, gok6gok6                        ONOMA
    23.  prt    (postverbal) particle         dak1, dou3, saai3, maai4, jyun4                   can, until, all, as well, finish
    24.  prep   preposition                   hai2, bei2                                        at, for
    25.  q      quantifier                    jat1, sap6saam1, mui5                             one, thirteen, each
    26.  rfl    reflexive pronoun             zi6gei2                                           self
    27.  sfp    sentence-final particle       aa3, laa1, gaa3, ho2                              SFP
    28.  vd     ditransitive verb             bei2, sung3                                       give, give (as a gift)
    29.  verg   ergative (unaccusative) verb  dit3, tyun5                                       fall, break
    30.  vf     function verb                 hai6, jau5                                        be, have
    31.  vi     intransitive verb             siu3, jau1sik1, kei4tou2                          smile, rest, pray
    32.  vt     transitive verb               sik6, gong2, zi1dou3                              eat, say, know
    33.  wh     wh phrases                    bin1go3, mat1je5 (me1), bin1dou6, dim2gaai2       who, what, where, why
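
    For readers who process the corpus programmatically, the labels in Table 1 can be checked mechanically. The sketch below is a hypothetical helper, not part of the CHILDES tools: it assumes a simplified %mor line in which every word is written as 'tag|word' with a Table 1 label, and it ignores the fuller CHAT %mor syntax (affixes, compounds and so on). Tokens with an unknown tag raise an error, and utterance-final punctuation is skipped.

```python
from collections import Counter

# The 33 category labels of Table 1.
CANTONESE_TAGS = frozenset("""
    adj advf advi advm advs asp aux cl com conj corr det dir ex gen ins loc
    nn nnpr nnpp neg onoma prt prep q rfl sfp vd verg vf vi vt wh
""".split())

# Assumed utterance-final punctuation tokens, skipped rather than tagged.
PUNCTUATION = {".", "?", "!"}


def parse_mor(tier):
    """Split a simplified 'tag|word' %mor line into (tag, word) pairs.

    Hypothetical format: real CHAT %mor tiers carry more structure than this.
    Raises ValueError for a token without '|' or with a tag not in Table 1.
    """
    if tier.startswith("%mor:"):
        tier = tier[len("%mor:"):]
    pairs = []
    for token in tier.split():
        if token in PUNCTUATION:
            continue
        tag, sep, word = token.partition("|")
        if not sep or tag not in CANTONESE_TAGS:
            raise ValueError(f"unexpected %mor token: {token!r}")
        pairs.append((tag, word))
    return pairs


def tag_counts(tier):
    """Count how often each Table 1 category occurs in one %mor line."""
    return Counter(tag for tag, _ in parse_mor(tier))


# keoi5 leng3 gwo3 ngo5 's/he is prettier than me' (gwo3 as com, see Table 1)
line = "%mor: nnpr|keoi5 adj|leng3 com|gwo3 nnpr|ngo5 ."
print(parse_mor(line))
print(tag_counts(line))
```

    Raising an error on an unknown tag, rather than silently counting it, makes typos in hand-corrected tiers visible at once.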

    Morpheme tier %mor

    The %mor tier was generated with a tagging program developed by Lawrence Cheung. Because Cantonese has many homophonous morphemes, the tagger's output had to be disambiguated for word class. Gene Chu and Simon Huang carried out this disambiguation and checking for both the Cantonese and the English files.
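
    The need for word-class disambiguation can be illustrated with homophones that already appear in Table 1: bei2 is listed both as a ditransitive verb (vd) and as a preposition (prep), and gwo3 both as an aspect marker (asp) and as a comparative morpheme (com). The sketch below is hypothetical and is not Lawrence Cheung's tagging program: it builds a small lexicon from Table 1 examples, lists every candidate tag for each word, and flags words with more than one candidate for manual checking.

```python
from collections import defaultdict

# Hypothetical mini-lexicon: (romanized form, tag) pairs taken from Table 1.
TABLE1_PAIRS = [
    ("bei2", "vd"),     # 'give' (ditransitive verb)
    ("bei2", "prep"),   # 'for' (preposition)
    ("gwo3", "asp"),    # experiential aspect marker
    ("gwo3", "com"),    # comparative, as in leng3 gwo3 keoi5
    ("leng3", "adj"),   # 'pretty'
    ("ngo5", "nnpr"),   # 'I/me'
    ("keoi5", "nnpr"),  # 's/he'
]


def build_lexicon(pairs):
    """Map each romanized form to the set of tags it can receive."""
    lexicon = defaultdict(set)
    for word, tag in pairs:
        lexicon[word].add(tag)
    return lexicon


def candidate_tags(utterance, lexicon):
    """List (word, sorted candidate tags); words not in the lexicon get []."""
    return [(w, sorted(lexicon.get(w, ()))) for w in utterance.split()]


def needs_checking(utterance, lexicon):
    """Words with more than one candidate tag must be disambiguated by hand."""
    return [w for w, tags in candidate_tags(utterance, lexicon) if len(tags) > 1]


lexicon = build_lexicon(TABLE1_PAIRS)
print(candidate_tags("keoi5 leng3 gwo3 ngo5", lexicon))
print(needs_checking("keoi5 leng3 gwo3 ngo5", lexicon))
```

    In keoi5 leng3 gwo3 ngo5 's/he is prettier than me', gwo3 is the comparative (com), but a lexicon lookup alone cannot distinguish this from the aspectual reading, so the word is passed on for manual checking.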

    Cantonese tier %can

    The child's Cantonese was first transcribed in romanized Cantonese rather than in Chinese characters. The %can tier was added at a later stage to give readers of Chinese characters quicker access to the speakers' utterances. Fonts for Cantonese characters are available from the Hong Kong SAR government website, http://www.5c.org/, as well as from Microsoft.

    The same characters are used for all allophonic variants of a morpheme. Because of ongoing sound changes, there is variation especially between n- and l- initials and between ng- and zero initials (Matthews and Yip 1994: 29-30). For example, the first person pronoun is represented as ngo5 in the corpus but is often pronounced o5, and the second person pronoun is represented as lei5, although the prescribed form is nei5. The demonstrative 'this' has several variant forms: li1/ni1/ji1/nei1/lei1. The experiential aspect marker may appear as gwo3 or go3. Other alternative forms result from contraction: for example, mat1je5 'what' becomes me1, and hou2 m4 hou2 'is it okay?' becomes hou2 mou2.
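
    The principle that allophonic variants share the same characters can be sketched as a lookup that maps each variant to one canonical romanization before characters are assigned. The mapping below contains only the variants named in this paragraph; the canonical choices (ngo5, lei5, li1) follow the forms used in the corpus and in Table 1, and the three-entry character table is a hypothetical illustration, not the corpus's actual conversion table. The variant go3 of the experiential marker is deliberately left out, because go3 is also the classifier in Table 1 and only context can decide between them. Contractions such as me1 are not handled.

```python
# Allophonic variants named in the text, each mapped to the romanization
# the corpus uses for that morpheme.
CANONICAL = {
    "o5": "ngo5",    # ng-/zero-initial variation, first person pronoun
    "nei5": "lei5",  # n-/l- variation, second person pronoun
    "ni1": "li1",    # demonstrative 'this'
    "ji1": "li1",
    "nei1": "li1",
    "lei1": "li1",
}
# go3 is not listed: it may stand for the experiential marker gwo3,
# but go3 is also the classifier, so the choice needs context.

# Hypothetical character table for the canonical forms only.
CHARACTERS = {"ngo5": "我", "lei5": "你", "li1": "呢"}


def canonical(word):
    """Return the canonical romanization (unchanged if not a known variant)."""
    return CANONICAL.get(word, word)


def same_morpheme(a, b):
    """True if two romanized forms are variants of one morpheme."""
    return canonical(a) == canonical(b)


def to_characters(words):
    """Map words to characters via their canonical form; '?' if unknown."""
    return "".join(CHARACTERS.get(canonical(w), "?") for w in words)


print(to_characters(["o5", "nei5", "ji1"]))
```

    Keying the character table on canonical forms means each morpheme needs only one entry, however many pronunciations are transcribed; lei1 'this' and lei5 'you' still stay apart because they differ in tone.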