Text Corpus

In linguistics, a corpus (plural corpora), or text corpus, is a large, structured set of texts, nowadays usually stored and processed electronically. Corpora are used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific language territory. A corpus may contain texts in a single language (monolingual corpus) or in multiple languages (multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora. To make corpora more useful for linguistic research, they are often subjected to a process known as annotation.
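As a rough illustration of "checking occurrences", the sketch below counts word frequencies across a tiny monolingual corpus. The two sample texts, the `corpus` dictionary, and the helper names `tokenize` and `term_frequencies` are all made up for this example; a real corpus would be far larger and would usually be tokenized with a dedicated tool rather than a simple regular expression.

```python
import re
from collections import Counter

# Hypothetical two-document corpus; the texts are invented for illustration.
corpus = {
    "doc1": "The cat sat on the mat.",
    "doc2": "The dog chased the cat.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts


freqs = term_frequencies(corpus)
print(freqs.most_common(3))
```

Counts like these are the raw material for the statistical analysis mentioned above; annotation would add further information, such as a label per token, on top of the plain text.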
Posts about Text Corpus
  • Co-Occurrence as a Ranking Signal

    …, since nearly all of them are about entrepreneurship and/or online marketing.” Analysed titles: 20. Relevant to his interests: 20. Knows the author personally: 6. Interacted online: 2. Unfamiliar with the author: 12. The Billionaire Who Wasn’t: How Chuck Feeney Secretly Made and Gave Away a Fortune I recommend this book constantly. It’s one of my favorites. Other…

    Dan Petrovic / DEJAN SEO in SEO Google Facebook Twitter - 15 readers