Text corpus
== Applications ==
Corpora are the main knowledge base in [[corpus linguistics]]. Other notable areas of application include:

* [[Language technology]], [[natural language processing]], [[computational linguistics]]
** The analysis and processing of various types of corpora are also the subject of much work in [[computational linguistics]], [[speech recognition]] and [[machine translation]], where corpora are often used to create [[hidden Markov model]]s for part-of-speech tagging and other purposes. Corpora and [[frequency list]]s derived from them are useful for [[language teaching]]. Corpora can also be considered a type of [[foreign language writing aid]]: the contextualised grammatical knowledge that non-native language users acquire through exposure to authentic texts in corpora allows them to grasp how sentences are formed in the target language, which supports effective writing.<ref name="Yoon">Yoon, H., & Hirvela, A. (2004). [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1073.2322&rep=rep1&type=pdf ESL Student Attitudes toward Corpus Use in L2 Writing]. ''Journal of Second Language Writing, 13''(4), 257–283. Retrieved 21 March 2012.</ref>
* [[Machine translation]]
** Multilingual corpora that have been specially formatted for side-by-side comparison are called ''aligned parallel corpora''. There are two main types of [[parallel corpora]] that contain texts in two languages. In a ''translation corpus'', the texts in one language are translations of texts in the other language. In a ''comparable corpus'', the texts are of the same kind and cover the same content, but they are not translations of each other.<ref>{{cite book | last1 = Wołk | first1 = K. | last2 = Marasek | first2 = K. | title = New Perspectives in Information Systems and Technologies, Volume 1 | chapter = Real-Time Statistical Speech Translation | series = Advances in Intelligent Systems and Computing | date = 7 April 2014 | publisher = Springer | volume = 275 | pages = 107–114 | doi = 10.1007/978-3-319-05951-8_11 | arxiv = 1509.09090 | issn = 2194-5357 | isbn = 978-3-319-05950-1| s2cid = 15361632}}</ref> To exploit a parallel text, some kind of text alignment that identifies equivalent text segments (phrases or sentences) is a prerequisite for analysis. [[Machine translation]] algorithms for translating between two languages are often trained on parallel fragments comprising a first-language corpus and a second-language corpus that is an element-for-element translation of the first.<ref>{{cite conference |last1=Wolk |first1=Krzysztof |last2=Marasek |first2=Krzysztof |editor1-last=Král |editor1-first=Pavel |editor2-last=Matoušek |editor2-first=Václav |arxiv=1509.08639 |contribution=Tuned and GPU-accelerated parallel data mining from comparable corpora |doi=10.1007/978-3-319-24033-6_4 |pages=32–40 |publisher=Springer |series=Lecture Notes in Computer Science |title=Text, Speech, and Dialogue – 18th International Conference, TSD 2015, Plzeň, Czech Republic, September 14–17, 2015, Proceedings |volume=9302 |year=2015|isbn=978-3-319-24032-9 }}</ref>
* [[Philology|Philologies]]
** Text corpora are also used in the study of [[historical document]]s, for example in attempts to [[decipherment|decipher]] ancient scripts, or in [[Biblical scholarship]]. Some archaeological corpora cover such a short period that they provide a snapshot in time. One of the shortest in time span may be the [[Amarna letters]], whose texts cover only 15–30 years ([[1350 BC]]). The ''corpus'' of an ancient city (for example the "[[Kültepe]] Texts" of Turkey) may go through a series of corpora, determined by the dates of their find sites.
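
As an illustration of the frequency lists mentioned under language technology above, the following is a minimal sketch of how such a list might be derived from a small corpus. The function name <code>frequency_list</code>, the toy sentences, and the deliberately naive tokenizer (lowercased runs of letters) are assumptions made for this example; real corpus tools typically apply more careful tokenization and normalization.

```python
import re
from collections import Counter


def frequency_list(texts, top_n=None):
    """Return (word, count) pairs, most frequent first; ties sorted alphabetically.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for text in texts:
        # Naive tokenizer (an assumption): lowercase, then take runs of letters.
        # Digits, punctuation and underscores act as separators.
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


# Toy two-sentence corpus, invented for this example.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

for word, count in frequency_list(toy_corpus, top_n=3):
    print(word, count)
```

Sorting by descending count and then alphabetically keeps the output deterministic when several words share the same frequency.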