
Overview

LING 5200 Computational Corpus Linguistics, Martha Palmer.



  1. Overview: LING 5200 Computational Corpus Linguistics, Martha Palmer

  2. What’s a corpus?
     • McEnery & Wilson:
       • (i) (loosely) any body of text
       • (ii) (most commonly) a body of machine-readable text
       • (iii) (more strictly) a finite collection of machine-readable text, sampled to be maximally representative of a language or variety
     Based on Kevin Cohen’s LING 5200

  3. What’s corpus linguistics?
     • “the study of language based on examples of ‘real life’ language use” (McEnery & Wilson)
     • A methodology, not a branch of linguistics
     • Biber et al.:
       • Uses computers
       • “Natural” texts
       • Large & principled collection
       • Both quantitative and qualitative

  4. What was Chomsky’s complaint?
     • Linguistics should model competence, not performance. What are the underlying rules that allow us to generate language?
     • Context: structuralists believed in collecting linguistic data about a language without taking meaning and communication into consideration.
     • Mirrors the debate between the rationalists and the empiricists.
     • But does Chomsky account for meaning? (see Searle)

  5. Which linguistic branches can make use of corpus linguistics?
     • Phonetics
     • Phonology
     • Morphology
     • Syntax
     • Semantics
     • Pragmatics
     • Psycholinguistics
     • Computational Lx
     • Descriptive Lx
     • Historical Lx
     • Sociolinguistics

  6. Corpus linguistics in context
     [Diagram: Natural Language Processing, Corpus Linguistics, and Computational Linguistics, with the labels “data”, “applications”, and “models”]

  7. What’s LING 5200 Corpus Linguistics?
     • Tools
     • Techniques

  8. Overview
     • Quick intro to Unix
     • A little corpus design
     • Quick tour of corpora and annotation
     • Tools for working with corpora
     • Programming in Python
     • Some software engineering

  9. Why Python?
     • It works
     • Many advantages
     • It’s a bona fide programming language
     • You’ll need it for CSCI 5832

  10. Administrative things
      • Textbooks: Unix, Python
      • Office hours: Mon 5-6, Tues 1-2
      • verbs.colorado.edu/mpalmer/ling5200
      • Prerequisites: none
      • Grades: homeworks/project
      • Accounts on babel

  11. Logging on for the first time
      • First thing to do: change your password.
      • Type passwd
      • Give it your current password, then your new password. Repeat the new one (to catch typos).

  12. Connecting with another computer
      • Type ssh -l your_name babel.colorado.edu
      • You are prompted to log in.

  13. Logging on for the first time, again
      • First thing to do: change your password.
      • Type passwd
      • Give it your current password, then your new password. Repeat the new one. (Why?)

  14. Where am I?
      • Type pwd
      • You see something like this: /home/mpalmer

  15. What's that mean??

  16. Important directories
      [Directory tree: / contains bin, home, etc, and usr; home contains mpalmer, which contains ling5200; usr contains local, which contains bin. An RCS directory also appears in the diagram.]

  17. Important directories
      [Same directory tree, labeled with the path /home/mpalmer/ling5200]

  18. Important directories
      [Same directory tree, labeled with the paths /home/mpalmer/ling5200 and /usr/local/bin]
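One way to read a path such as /home/mpalmer/ling5200 is as a walk down the tree, one directory per slash. The sketch below is not from the slides; it uses the standard `dirname` and `basename` utilities to split the example path into its parent directory and its last component. The variable names are only illustrative, and nothing here touches the file system.

```shell
# Example path from the slides; this is pure text processing.
path=/home/mpalmer/ling5200

# dirname strips the last component, leaving the parent directory.
parent=$(dirname "$path")

# basename keeps only the last component.
leaf=$(basename "$path")

echo "full path: $path"
echo "parent:    $parent"
echo "last part: $leaf"
```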

  19. Navigating directories
      • ls to list contents, cd to change directory
      • Directories are just like Windows folders
      • /home/mpalmer shortcut: ~
      • “the directory above this one”: ..
      • “this directory”: .
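The shortcuts can be tried out safely in a scratch directory. The following is a minimal sketch, assuming `mktemp -d` is available (it is on typical Linux and macOS systems); the file and directory names are placeholders, and the `~` shortcut is left out because its result depends on the account the commands run under.

```shell
# Work in a throwaway directory so nothing real is modified.
sandbox=$(mktemp -d)
mkdir "$sandbox/tools"
touch "$sandbox/Homework_1.txt" "$sandbox/buglog.txt"

cd "$sandbox"
listing=$(ls)          # names of everything in the current directory
echo "$listing"

cd tools               # go down into a subdirectory
cd ..                  # ".." is the directory above this one
after_up=$(pwd -P)

cd .                   # "." is this directory, so nothing changes
after_dot=$(pwd -P)

echo "after 'cd ..': $after_up"
echo "after 'cd .':  $after_dot"
```

Both `pwd` results are the scratch directory: `cd ..` undoes `cd tools`, and `cd .` stays put.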

  20. What's in the neighborhood?
      • Type ls
      • You see a list of directories and files that are contained within the current directory:
        Homework_1.txt  tools  buglog.txt

  21. I'd like to go somewhere else…
      • Type pwd
      • Type cd
      • Where are you?
      • Type cd ..
      • Where are you?
      • Type cd your_user_id
      • Where are you?

  22. Unix is a verb-initial language
      • cd ..
      • cd is the verb: "go"
      • .. is the argument: where to go

  23. Unix is a verb-initial language
      • cd
      • cd is the verb: "go"
      • If no argument, I assume you mean "home"
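The "no argument means home" rule can be checked directly. This is a sketch under one loud assumption: for the demonstration only, `HOME` is pointed at a throwaway directory so the commands behave the same on any machine. On a real login, `HOME` is already set to something like /home/mpalmer and should be left alone.

```shell
# Demonstration only: point HOME at a throwaway directory.
HOME=$(mktemp -d)
export HOME
mkdir "$HOME/ling5200"

cd "$HOME/ling5200"    # verb "cd" plus an argument: where to go
cd                     # verb only: no argument means "go home"
no_arg=$(pwd -P)

cd "$HOME/ling5200"
cd ~                   # "~" is shorthand for the same place
tilde=$(pwd -P)

# Resolve HOME the same way so the comparison is fair.
home_real=$(cd "$HOME" && pwd -P)
echo "cd alone went to: $no_arg"
echo "cd ~ went to:     $tilde"
```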

  24. Making a new directory
      • Type cd
      • Type ls
      • Type mkdir ling5200
      • Type ls
      • Go to the directory you just made (how?)
      • Type pwd
      • Type ls
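The sequence above can be rehearsed without touching your real home directory. The sketch below assumes `mktemp -d` is available and uses a scratch directory as a stand-in for home; the `cd ling5200` line is one possible answer to the "how?" question, not the only one.

```shell
# Stand-in for your home directory, so the real one is left alone.
start=$(mktemp -d)
cd "$start"

ls                      # before: ling5200 is not listed
mkdir ling5200          # make the new directory
ls                      # after: ling5200 appears

cd ling5200             # one answer to "how?": cd plus the new name
where=$(pwd -P)
echo "$where"           # a path ending in /ling5200

contents=$(ls)          # a newly made directory starts out empty
```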
