Elham Alayiaboozar
Abstract
This paper takes readers through the basic steps involved in building a corpus of language data for different purposes, drawing on information about corpus construction gathered from related sources. After a review of the literature on corpus construction and the use of corpora in different fields, the article offers advice in a non-technical style to help researchers ensure that their corpus is well designed and fit for its intended purpose. Key points to consider in constructing any corpus, whether of written or spoken language, include sampling, size, representativeness, balance, the choice between a general and a specialized corpus, and homogeneity. The steps involved in constructing a text corpus are text selection, text normalization, and different kinds of annotation. The steps to be followed in constructing a spoken language (speech-based) corpus are data gathering, transcription, representation, annotation, and access. All of these steps are explained in detail in the paper.