Welcome to the Infant Bookreading Database (IBDb).

This is a data set that was created by Carla Hudson Kam and Lisa Matthewson in the Department of Linguistics at the University of British Columbia. It contains information on English-language books being read to children aged 0-36 months, and is being provided for use by researchers interested in children’s books and bookreading.


It is available for download (see below) in two formats, csv and txt. There is an accompanying Read Me file (in pdf format) that details the organization of the data.


In the survey, caregivers were asked to list up to 5 books that they were frequently reading to their child at the time. They also answered some questions about the child and about themselves. The resulting dataset contains over 1100 responses. 1058 of the responses are complete, meaning that respondents answered all of the questions about the child and about themselves and entered at least one English-language book title. 1091 of the responses contain full information about the child and list at least one English-language book.*
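The two response categories defined above can be expressed as simple predicates. The following is a minimal sketch, not part of the dataset or its tooling: the `SurveyResponse` record, its field names, and the use of `None` for an unanswered question are all hypothetical, and the actual organization of the data is documented in the Read Me file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SurveyResponse:
    # Hypothetical layout: question label -> answer, with None marking an
    # unanswered question. Real column names are given in the IBDb Read Me file.
    child_answers: Dict[str, Optional[str]]
    caregiver_answers: Dict[str, Optional[str]]
    # English-language titles listed by the caregiver (up to 5 in the survey).
    english_titles: List[str] = field(default_factory=list)


def all_answered(answers: Dict[str, Optional[str]]) -> bool:
    """True if no question in this group was left blank."""
    return all(value is not None for value in answers.values())


def has_full_child_info(r: SurveyResponse) -> bool:
    """All child questions answered and at least one English-language title."""
    return all_answered(r.child_answers) and len(r.english_titles) >= 1


def is_complete(r: SurveyResponse) -> bool:
    """'Complete' additionally requires every caregiver question to be answered."""
    return has_full_child_info(r) and all_answered(r.caregiver_answers)


# Illustrative (made-up) response: full child information, but one
# caregiver question left blank.
example = SurveyResponse(
    child_answers={"age_months": "14", "two_word_sequences": "no"},
    caregiver_answers={"education": None},
    english_titles=["Title A"],
)
print(has_full_child_info(example), is_complete(example))  # True False
```

Because `is_complete` builds on `has_full_child_info`, every complete response also counts as having full child information, which is why the count of complete responses (1058) cannot exceed the count with full child information (1091).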

Respondents generally listed only titles, but the dataset contains information on authors/publishers as well whenever possible. More details about the survey and the data are contained in the following paper:

Hudson Kam, C. L. & Matthewson, L. (2016). Introducing the Infant Bookreading Database (IBDb). Journal of Child Language. doi:10.1017/S0305000916000490

We ask people to cite this paper when using the data.



IBDb Read Me File


*A few of these were initially incomplete but could be completed from the other responses given (e.g., the responses indicated that the child did not yet produce any words, so the missing answer to the question about two-word sequences could be filled in with a "no"). If a response could not be confidently inferred, it was left blank. For example, a child who was reported to be producing two-word sequences as well as sequences of more than three words would almost certainly have a vocabulary of more than 10 understandable words, but since the former do not strictly entail the latter, the question about 10 understandable words was left blank despite the responses to the questions about word sequences. In the end, 12 responses were missing some information about the child that could not be inferred (e.g., a 13- to 14-month-old child listed as producing 10 understandable words and no sequences of more than three words, but for whom the question about two-word sequences was not answered).
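The completion policy in the footnote above (fill a blank only when the other answers strictly entail it) can be sketched as a small function. This is an illustration under assumptions, not the procedure the authors used: the question keys are hypothetical, and the footnote only states the two-word-sequence case explicitly; extending the rule to the 10-word and longer-sequence questions is an assumption that follows from the same strict-entailment reasoning.

```python
from typing import Dict, Optional

# Hypothetical question keys. Assumption: all of these are strictly entailed
# to be "no" when the child produces no words (the footnote only gives the
# two-word-sequence case).
DEPENDENT_ON_SPEECH = ("ten_words", "two_word_sequences", "three_plus_word_sequences")


def fill_entailed_answers(answers: Dict[str, Optional[bool]]) -> Dict[str, Optional[bool]]:
    """Return a copy in which only strictly entailed blanks (None) are filled."""
    filled = dict(answers)  # leave the caller's record untouched
    # A child who produces no words cannot produce 10 words or any word
    # sequences, so blanks for those questions can safely become False.
    if filled.get("produces_words") is False:
        for key in DEPENDENT_ON_SPEECH:
            if key in filled and filled[key] is None:
                filled[key] = False
    # Nothing else is inferred: reporting word sequences makes a vocabulary of
    # more than 10 words very likely but does not strictly entail it, so a
    # blank "ten_words" answer stays None in that case.
    return filled


print(fill_entailed_answers({"produces_words": False, "ten_words": None, "two_word_sequences": None}))
# {'produces_words': False, 'ten_words': False, 'two_word_sequences': False}

print(fill_entailed_answers({
    "produces_words": True,
    "ten_words": None,
    "two_word_sequences": True,
    "three_plus_word_sequences": True,
})["ten_words"])
# None
```

The second call mirrors the footnote's example: strong but non-entailing evidence leaves the 10-word question blank.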