Corpus linguistics
=== English corpora ===
A landmark in modern corpus linguistics was the publication of ''Computational Analysis of Present-Day American English'' in 1967. Written by [[Henry Kučera]] and [[W. Nelson Francis]], the work was based on an analysis of the [[Brown Corpus]], a structured and balanced corpus of one million words of American English from 1961. The corpus comprises 500 text samples of roughly 2,000 words each, drawn from a variety of genres.<ref>{{cite book | last1=Francis | first1=W. Nelson | last2=Kučera | first2=Henry | title=Computational Analysis of Present-Day American English | publisher=Brown University Press | date=1 June 1967 | location=Providence | isbn= 978-0870571053}}</ref> The Brown Corpus was the first computerized corpus designed for linguistic research.<ref>{{Citation |last=Kennedy |first=G. |title=Corpus Linguistics |date=2001-01-01 |url=https://www.sciencedirect.com/science/article/pii/B0080430767030564 |encyclopedia=International Encyclopedia of the Social & Behavioral Sciences |pages=2816–2820 |editor-last=Smelser |editor-first=Neil J. |access-date=2023-10-31 |place=Oxford |publisher=Pergamon |isbn=978-0-08-043076-8 |editor2-last=Baltes |editor2-first=Paul B.}}</ref> Kučera and Francis subjected the Brown Corpus to a variety of computational analyses and then combined elements of linguistics, language teaching, [[psychology]], statistics, and sociology to produce a wide-ranging study.

A further key publication was [[Randolph Quirk]]'s "Towards a description of English Usage" in 1960,<ref>{{cite journal | last1=Quirk | first1= Randolph | title=Towards a description of English Usage | journal=Transactions of the Philological Society | date=November 1960 | pages=40–61 | volume=59 | issue=1| doi= 10.1111/j.1467-968X.1960.tb00308.x }}</ref> in which he introduced [[Survey of English Usage|the Survey of English Usage]].
Quirk's corpus was the first modern corpus to be built with the purpose of representing the whole language.<ref>{{Citation |last=Kennedy |first=G. |title=Corpus Linguistics |date=2001-01-01 |url=https://www.sciencedirect.com/science/article/pii/B0080430767030564 |encyclopedia=International Encyclopedia of the Social & Behavioral Sciences |pages=2816–2820 |editor-last=Smelser |editor-first=Neil J. |access-date=2023-10-31 |place=Oxford |publisher=Pergamon |doi=10.1016/b0-08-043076-7/03056-4 |isbn=978-0-08-043076-8 |editor2-last=Baltes |editor2-first=Paul B.}}</ref>

Shortly after the Brown Corpus analysis appeared, Boston publisher [[Houghton-Mifflin]] approached Kučera to supply a million-word, three-line citation base for its new ''[[The American Heritage Dictionary of the English Language|American Heritage Dictionary]]'', the first [[dictionary]] compiled using corpus linguistics. The ''AHD'' took the innovative step of combining prescriptive elements (how language ''should'' be used) with descriptive information (how it actually ''is'' used). Other publishers followed suit. The British publisher Collins' [[COBUILD]] [[monolingual learner's dictionary]], designed for users learning [[English language learning and teaching|English as a foreign language]], was compiled using the [[Bank of English]].
The [[Survey of English Usage]] Corpus was used in the development of one of the most important corpus-based grammars, written by Quirk ''et al.'' and published in 1985 as ''A Comprehensive Grammar of the English Language''.<ref>{{cite book | last1=Quirk | first1=Randolph | last2=Greenbaum | first2=Sidney | last3=Leech | first3=Geoffrey | last4=Svartvik | first4=Jan | title=A Comprehensive Grammar of the English Language | publisher=Longman | location=London | date=1985 | isbn=978-0582517349}}</ref>

The [[Brown Corpus]] has also spawned a number of similarly structured corpora: the [[LOB Corpus]] (1960s [[British English]]), Kolhapur ([[Indian English]]), Wellington ([[New Zealand English]]), Australian Corpus of English ([[Australian English]]), the Frown Corpus (early 1990s [[American English]]), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes. They include the [[International Corpus of English]] and the [[British National Corpus]], a 100-million-word collection of spoken and written texts created in the 1990s by a consortium of publishers, universities ([[Oxford University|Oxford]] and [[Lancaster University|Lancaster]]) and the [[British Library]]. For contemporary American English, work has stalled on the [[American National Corpus]], but the 400+ million word [[Corpus of Contemporary American English]] (1990–present) is now available through a web interface.
The first computerized corpus of transcribed spoken language, containing one million words, was constructed in 1971 by the Montreal French Project.<ref>{{cite journal | last1=Sankoff | first1=David | last2=Sankoff | first2=Gillian | title=Sample survey methods and computer-assisted analysis in the study of grammatical variation | journal=Canadian Languages in Their Social Context | location=Edmonton | publisher=Linguistic Research Incorporated | date=1973 | pages=7–63 | editor-last=Darnell | editor-first=R.}}</ref> It inspired [[Shana Poplack]]'s much larger corpus of spoken French in the Ottawa-Hull area.<ref>{{cite journal | last1=Poplack | first1=Shana | title=The care and handling of a mega-corpus | editor-first1=R. | editor-last1=Fasold | editor-first2=D. | editor-last2=Schiffrin | journal=Language Change and Variation | series=Current Issues in Linguistic Theory | location=Amsterdam | publisher=Benjamins | date=1989 | volume=52 |pages=411–451| doi=10.1075/cilt.52.25pop | isbn=978-90-272-3546-6 }}</ref>