The messages in this corpus were collected as part of the project “Literacies of Bilingual Youth: A profile of bilingual academic, social, and txt literacy”. Each message has been identified as Spanish, English, Bilingual, or Other. “Other” refers to messages whose language is not identifiable (i.e., messages consisting only of an emoji, a proper noun, or “lol”). When possible, conversations were kept intact and labeled with a conversation number, the relationship type between the texters, and the gender of the receiver. Timestamps were included when available.

Participants signed two consent forms and had to enter their passcode to allow access to their messages. There are 44,597 messages in the corpus, including spam, automated messages, and mass messages from participants’ cellphone carriers.

More detail about how the messages were categorized is available on the Read Me page.
