=== Ancient languages corpora ===

Besides these corpora of living languages, computerized corpora have also been compiled from collections of texts in ancient languages. One example is the [[Francis Andersen|Andersen]]-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax, and every segment is tagged with seven fields of information.<ref>
{{Citation
 | last1 =Andersen
 | first1 =Francis I.
 | last2 =Forbes
 | first2 =A. Dean
 | year =2003
 | title =Hebrew Grammar Visualized: I. Syntax
 | periodical =Ancient Near Eastern Studies
 | volume =40
 | pages =43–61 [45]
}}</ref><ref>{{Citation | last =Eyland| first =E. Ann| year =1987  | contribution =Revelations from Word Counts | editor-last =Newing | editor-first =Edward G. | editor2-last =Conrad | editor2-first =Edgar W. | title =Perspectives on Language and Text: Essays and Poems in Honor of Francis I. Andersen's Sixtieth Birthday, July 28, 1985 | location =Winona Lake, IN | publisher =[[Eisenbrauns]] | page =51 | isbn =0-931464-26-9 }}</ref> The [[Quranic Arabic Corpus]] is an annotated corpus for the Classical Arabic language of the [[Quran]]. It has multiple layers of annotation, including morphological segmentation, [[part-of-speech tagging]], and syntactic analysis using dependency grammar.<ref>Dukes, K., Atwell, E. and Habash, N. 'Supervised Collaboration for Syntactic Annotation of Quranic Arabic'. ''Language Resources and Evaluation Journal''. 2011.</ref> The Digital Corpus of Sanskrit (DCS) is a "Sandhi-split corpus of Sanskrit texts with full morphological and lexical analysis... designed for text-historical research in Sanskrit linguistics and philology."<ref>{{cite web |url=http://www.sanskrit-linguistics.org/dcs/#:~:text=The%20Digital%20Corpus%20of%20Sanskrit,in%20Sanskrit%20linguistics%20and%20philology. |title=Digital Corpus of Sanskrit (DCS) |access-date=2022-06-28}}</ref>