Powered by TEITOK
Maarten Janssen, 2014-




DOESTE v0.5 is a set of developmental corpora of texts written by school-age Brazilian and Portuguese children and adolescents. It is a work in progress (78% searchable). Both corpora were duly authorized by local Research Ethics Committees.

The texts written by children and adolescents in European Portuguese were collected between September 2011 and January 2012 in different public schools in Lisbon (Portugal). This subcorpus consists of 244 texts: narrative (n=122) and argumentative (n=122). The subjects (51% female and 49% male) are students enrolled in the 5th grade (n=52; mean age=10.19), the 7th grade (n=92; mean age=12.33) and the 10th grade (n=100; mean age=15.16) of Portuguese basic education. The European Portuguese subcorpus is fully tokenized and morphologically annotated, and it is also segmented into sentences.

The texts written by children and adolescents in Brazilian Portuguese have been collected since 2017 in different public schools in three cities in Rio Grande do Norte (Brazil). This subcorpus currently consists of narrative (n=225) and argumentative (n=225) texts. The subjects (53% female and 47% male) are students enrolled in the 5th grade (n=68; mean age=11.13), the 9th grade (n=82; mean age=15.32) and the 12th grade (n=224; mean age=17.96) of Brazilian basic schooling. The Brazilian subcorpus is still being compiled, but a large part of it is already searchable, tokenized and morphologically annotated. It also provides the original transcriptions alongside images of the original texts.

Some general corpus statistics are available in the Stats and Distribution windows.

In addition to including more school texts, the next version of DOESTE is intended to provide semantic annotation and segmentation into clauses and t-units.


How to cite DOESTE:

Martins, Mário; Janssen, Maarten; Santos, Taiza; et al., 2020, DOESTE v0.5, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3262.


DOESTE v0.5 is developed and maintained by the Research Group on Educational Linguistics (LEd), based at the Federal Rural University of the Semi-Arid Region (UFERSA).

To report errors or make suggestions, please contact us (e-mail).

DOESTE v0.5 is licensed under 


Last update: 10/09/2020