Abstract:
This study contributes a longitudinal naturalistic corpus of a child acquiring Italian as an L1, based on family tapes recorded from age 0;7 to age 3;2. The study aimed to investigate whether the emergence of grammatical categories and structures could be tracked in such data. It also provides a guide for non-experts (e.g. parents) on how to build a similar naturalistic corpus, with particular attention to reliability and consistency.
Method
Family tapes were analysed and manually transcribed by the researcher. The resulting corpus consists of 2408 spoken utterances, of which 973 (40.41%) were produced by the child. The corpus was checked with Python to verify that all participants were correctly identified throughout and that no speaker labels had been accidentally mismatched. A morphosyntactic analysis was then performed on the corpus. This further step helped to find word frequencies and to determine the child’s vocabulary size, both to compare it with similar corpora and to verify a gradual but constant growth in the length and complexity of the child’s utterances.
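The checks described above can be sketched in a few lines of Python. This is a minimal illustration and not the script used in the study: the one-utterance-per-line `CODE: text` format, the speaker codes `CHI`, `MOT` and `FAT`, the function names, and the sample utterances are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical speaker codes; the real corpus may use different labels.
KNOWN_SPEAKERS = {"CHI", "MOT", "FAT"}


def parse_line(line):
    """Split 'CODE: utterance text' into (speaker, list of lowercase words).

    A line without a colon yields the whole line as the speaker code,
    so it is reported as an unknown speaker by summarise().
    """
    speaker, _, text = line.partition(":")
    return speaker.strip(), text.lower().split()


def summarise(lines, child="CHI"):
    """Check speaker labels and compute basic counts for the child."""
    utterances = [parse_line(line) for line in lines if line.strip()]
    unknown = sorted({s for s, _ in utterances if s not in KNOWN_SPEAKERS})
    n_child = sum(1 for s, _ in utterances if s == child)
    child_words = [w for s, words in utterances if s == child for w in words]
    total = len(utterances)
    return {
        "total": total,
        "child": n_child,
        # Percentage of all utterances produced by the child.
        "child_share": 100 * n_child / total if total else 0.0,
        "unknown_speakers": unknown,
        "word_freq": Counter(child_words),
        "vocabulary_size": len(set(child_words)),
    }


# Invented sample lines, for illustration only.
sample = [
    "MOT: dov'è la palla",
    "CHI: palla",
    "CHI: la palla",
    "MOT: brava",
]
result = summarise(sample)
print(result["child_share"], result["vocabulary_size"])
```

Applied to the figures reported above, the same share calculation gives `round(100 * 973 / 2408, 2)`, i.e. 40.41. Any entry in `unknown_speakers` points to a label that needs manual review.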
Analyses
Early in the corpus, I noticed an absence of function words, whose positions were gradually filled by filler syllables (schwa fillers). At around 1;5, determiners started to emerge in the child’s production, at first in imitation of the mother’s speech and then increasingly in independent utterances. This is consistent with most theories discussed in this paper (i.e. Valian 1991, Demuth 2019, Rinaldi et al. 2004, Lorusso et al. 2004, Song et al. 2018, Guasti 1993, Bencini & Valian 2008).
Proto-articles, verbs, morphology, and emerging syntax were analysed using the methods in Valian (1991). I examined whether there was evidence for a consistent increase in production over time. There was no evidence for a consistent increase in the production of correct grammatical forms; however, an overall increase in the length of the child’s utterances was observed. This lack of evidence is likely due to sparse data in the corpus. Vocabulary increase, on the other hand (Rinaldi et al. 2004; Song et al. 2018), was detectable and mostly correlated with vocabulary use in the mother’s speech. The distribution of infinitives and clitics was always on target, showing that the distinction between finite and non-finite verbs is mastered at an early age (as suggested by Guasti, 1993).
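The increase in utterance length over time can be inspected with a short sketch such as the following. It assumes that each child utterance is stored with its age in the `years;months` notation used here and that length is counted in words; the function names and sample utterances are hypothetical and do not come from the corpus.

```python
def age_to_months(age):
    """Convert an age written as 'years;months' (e.g. '1;5') to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def mean_length_by_age(utterances):
    """Return (age in months, mean utterance length in words), sorted by age.

    `utterances` is an iterable of (age, text) pairs for a single speaker.
    """
    totals = {}
    for age, text in utterances:
        month = age_to_months(age)
        words, count = totals.get(month, (0, 0))
        totals[month] = (words + len(text.split()), count + 1)
    return [(month, words / count)
            for month, (words, count) in sorted(totals.items())]


# Invented utterances, for illustration only.
child_utterances = [
    ("1;5", "palla"),
    ("2;7", "voglio la palla"),
    ("1;5", "la palla"),
]
trend = mean_length_by_age(child_utterances)
print(trend)
```

With sparse recordings, some ages contribute only a handful of utterances, so the per-age means are best read alongside the utterance counts behind them.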
The correct identification of shapes through an interactive game with the mother was observed at 1;11. This confirms that the imaginative framework of the child is built at a very early stage, as predicted by Rinaldi et al. (2004).
The first unstimulated multi-utterance discourse was produced at 2;7, by which point most of the child’s utterances could be described as complete sentences. As Song et al. (2018) observed, speech is a difficult-to-predict behaviour because it is influenced by a vast number of factors, the predominant one appearing to be the mother’s speech.
This corpus, although not sufficient for a complete analysis, can provide general findings and guidelines to help both researchers and parents build a consistent and significant corpus in the future.