Adam Kilgarriff, Lexical Computing Ltd, Universities of Leeds. Using Corpora in Language Research; also Introduction to the Sketch Engine (WS15), part 1
What is language? In our heads. In texts and sound signals. Both.
Methodology: Study language in our heads. Competence. Chomsky. “Rationalist” (Descartes, Leibniz). Odd method for objective science. Practical problems: coverage, arbitrariness.
Methodology: Study text. “Empiricist” (Locke, Hume). Physics: forces, matter. Chemistry: chemicals, bonds. Language: text, speech signals.
It goes against the grain. What is important about a sentence? Its meaning. Corpus methodology: throw away individual sentence meaning; find patterns.
Computer power. Corpora: bigger and bigger data sets. Language technology tools: lemmatizers, POS-taggers, parsers. Machine learning, pattern-finding. 20 years of rapid ascent.
All the linguisticses: Theoretical, Socio, Psycho, Developmental, Law and, Computational, Contrastive, Applied ... linguistics
Developmental: CHILDES, TalkBank. How children learn language. Parents record all interactions. Since 1980s. Prof. Brian MacWhinney, Carnegie-Mellon. Many languages. Largest chunk: English, 23m words.
Language change: Brown family. Small but perfectly formed. 1m words: 500 x 2000-word samples, the same 15 text types. Supports comparison. American and British English. 1931, 1961, 1991, 2006.
Language and gender: When you see a dentist <he/she/they> ... What is now normal? Recent study: they now the norm; themself now needed, despite what spellcheck says. BNC (most text from 1989): 0.2/million. EnTenTen (mostly 2009): 0.4/million.
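The per-million figures on this slide are normalised frequencies, which is what makes counts from corpora of very different sizes comparable. A minimal sketch of the arithmetic follows; the function name and the raw counts are hypothetical, chosen only to illustrate the calculation, and are not the counts behind the slide.

```python
def per_million(hits: int, corpus_size: int) -> float:
    """Normalised frequency: occurrences per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 1_000_000 / corpus_size


# Hypothetical raw counts and corpus sizes (assumptions for illustration only).
older = per_million(20, 100_000_000)
newer = per_million(400, 1_000_000_000)
print(f"older corpus: {older:.1f}/million, newer corpus: {newer:.1f}/million")
```

Comparing the normalised values, not the raw hit counts, is what allows a claim such as "twice as frequent in the later corpus".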
Language and law: Trade marks. Hoover and similar: trademark or generic? Cases: sabatier, botox, kettle chips. Key evidence: do people tend to capitalize?
English nouns: % capitalized
Syntax and semantics
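One way to estimate "% capitalized" for a word from raw text is sketched below. This is an assumption about how such a figure could be computed, not the method used for the slide: the function name, the naive sentence splitter, and the sample text are all hypothetical. Sentence-initial tokens are skipped because they are capitalized regardless of trademark status.

```python
import re


def percent_capitalized(text: str, word: str) -> float:
    """Percentage of non-sentence-initial occurrences of `word` with an initial capital."""
    total = 0
    capitalized = 0
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z]+", sentence)
        for token in tokens[1:]:  # skip the sentence-initial token
            if token.lower() == word.lower():
                total += 1
                if token[0].isupper():
                    capitalized += 1
    return 100.0 * capitalized / total if total else 0.0


# Hypothetical sample text, for illustration only.
sample = (
    "I bought a new Hoover yesterday. My old hoover broke. "
    "She said the hoover was fine. We need a Hoover for the stairs."
)
print(percent_capitalized(sample, "hoover"))
```

On a real corpus the tokenisation and sentence splitting would come from the corpus tools rather than regular expressions.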
DANTE: Detailed account of English lexis. Corpus-driven. From word sketches; lexicographers assign to senses. High precision. Available at http://webdante.com. Brochures.
What data shall I use?
Think hard
Sometimes ... Just-in-time corpus from the web. Use case: translator, French-to-English. Translation task: volcanoes. "In French I understand it OK, but I'm no vulcanologist, I don't know the English terminology." BootCaT (Baroni and Bernardini).
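BootCaT-style corpus building starts from a handful of seed terms, combines them into random tuples, and uses each tuple as a web search query whose result pages are then downloaded. The sketch below covers only the tuple-generation step; it is not the BootCaT code, and the function name, parameters, and seed words are assumptions made for illustration.

```python
import itertools
import random


def seed_tuples(seeds, size=3, n=5, rng=None):
    """Return up to `n` distinct random combinations of `size` seed terms."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    combos = list(itertools.combinations(sorted(set(seeds)), size))
    return rng.sample(combos, min(n, len(combos)))


# Hypothetical English seed terms for the volcano translation task.
seeds = ["volcano", "lava", "magma", "eruption", "crater", "ash"]
for query in seed_tuples(seeds):
    print(" ".join(query))  # each line would be sent as one search query
```

Multi-word tuples tend to pull in pages that are on topic, which is why the queries combine several seeds rather than using them one at a time.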
Corpora in Sketch Engine: Access-to-all: 42 languages; all major world languages; mostly large, web-crawled. Various other: CHILDES, Brown, ... “My corpora”: BootCaT and other.
LCL sponsorship of LSA: One year free accounts for participants. http://www.sketchengine.co.uk: “Register”, “Site licence member”. Your details and Organisation: select LSA2011. Site licence key: Boulder. Password by email; change it (under Settings).
Today: motivations, taster. Sunday 9-12: practical.