
LING 408/508: Computational Techniques for Linguists



Presentation Transcript


  1. LING 408/508: Computational Techniques for Linguists Lecture 1 8/20/2012

  2. Course web page • Go to: • http://www.u.arizona.edu/~echan3/508.html • Not using D2L in this course

  3. Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python

  4. Survey • Please fill out the survey and hand it in by the next class. • You don’t need to list every single course you’ve ever taken.

  5. Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python

  6. “Computational Techniques for Linguists” • Learn computer programming • Use the Python language • (Relatively) easy to learn • Write programs quickly • Good for working with text • Commonly used in computational linguistics

  7. Linguistic applications • Use corpora • Corpus: large electronic database of language • N.B.: plural of corpus is “corpora”, not “corpuses” • Examples: • Brown Corpus, 1 million words of mixed English texts • CELEX, dictionary of English words and their pronunciations • CHILDES: transcriptions of child / caretaker speech • Penn-Helsinki Parsed Corpus of Middle English • Investigate frequencies of words and constructions
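As a minimal sketch of what "investigating frequencies of words" could look like in Python, the snippet below counts word tokens in a short made-up sentence. The sample text and the `word_frequencies` helper are illustrative assumptions, not part of the course materials; a real study would read its text from a corpus such as Brown.

```python
from collections import Counter

def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Hypothetical helper: it splits on whitespace only, so punctuation
    # attached to a word would be counted as part of that word.
    return Counter(text.lower().split())

sample = "the cat saw the dog and the dog saw the cat"  # made-up text
freqs = word_frequencies(sample)

# Show the three most frequent words with their counts.
print(freqs.most_common(3))
```

A `Counter` is a dictionary specialized for counting, so `freqs["the"]` looks up the count of a single word.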

  8. Computational Linguistics / Natural Language Processing • Given a text: • Morphological analysis • Part of speech tagging • Parsing • Semantic analysis • etc. • Taught in LING 538 Computational Linguistics and LING 539 Statistical Natural Language Processing • Need to know how to program in order to build these kinds of systems

  9. Computer programming = algorithmic thinking • Algorithm: precise, step-by-step series of instructions to accomplish a task • Joke: Why did the freshman die in the shower? • Because he followed the instructions on the shampoo bottle: lather, rinse, repeat • The shampoo bottle’s instructions were not a proper algorithm • Should say: lather, rinse, repeat until hair is clean
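The corrected shampoo instructions can be sketched as a loop with an explicit stopping condition. This is only an illustration: the `wash_hair` function and the idea that each wash removes one unit of dirt are assumptions made up for the example.

```python
def wash_hair(dirt_level):
    """Lather and rinse until the hair is clean; return the number of washes."""
    washes = 0
    # Hypothetical model: each lather-and-rinse cycle removes one unit of dirt.
    while dirt_level > 0:   # the stopping condition the bottle left out
        dirt_level -= 1     # lather, rinse
        washes += 1         # repeat
    return washes

print(wash_hair(3))
```

Without the `dirt_level > 0` condition the loop would never end, which is the point of the joke.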

  10. Who is this class for? • Primarily intended for HLT students • Master’s program in Human Language Technology, Department of Linguistics • (graduate) Master’s, and Accelerated Master’s • No assumption of previous experience in programming • Need to get up to speed very quickly in programming skills • Be competitive with students who have taken multiple undergraduate courses in programming

  11. In the news • http://www.nytimes.com/2011/08/20/technology/finding-fake-reviews-online.html • Problem: fake product reviews for web sites • Researchers created a data set of 400 positive but fake reviews, and 400 genuine reviews • Human judges couldn’t tell them apart • Team developed computer program that got 90% correct • Companies offered jobs to these students = $$$

  12. Prerequisites • Some knowledge of elementary linguistics • Concepts like: constituent, grammar, morpheme • Prior experience in programming is not assumed • But previous coursework in technical topics such as mathematics, logic, etc. probably means that programming will come (relatively) easy to you

  13. Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python

  14. Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python

  15. This class is graduate-level • 400/500-level: higher expectations • Although this is an introductory course, we will progress through the material quickly • 100/200-level intro programming courses are offered in C SC and ISTA departments • Assignments are time-consuming • Characteristic of all beginning programming courses • Strongly consider withdrawing if you fall behind early on • Not possible to catch up • Not possible to catch up • Not possible to catch up

  16. But grading for undergrads is easier • Short assignments: 50% • These will be straightforward • Do all of these and you’ll be well on your way to a good course grade • Weekly assignments: 50% • These will be substantially harder • Less work: fewer problems for undergrads

  17. How to get maximum points on assignments • Answer all questions on assignments. • I will be looking for answers to anything that is being asked or requested of you

  18. Learning how to program • Learning to program requires working on programming problems • Cannot learn how to program merely by: • Reading a book • Listening to lectures • Reading code • Learning programming involves: • Repeated attempts to correct one’s mistakes • Repeatedly refining code until you have a clear solution • Initial solutions are often obscure and too long • In every program there’s a beautiful solution struggling to break through

  19. Cooking analogy • You want to bake a wedding cake. • You have an idea of what you want it to look like. • You have an idea of what ingredients and cooking tools you will need. • You don’t have a cookbook. • Need to develop a recipe = procedure = algorithm. • Algorithm: precise, step-by-step series of instructions to accomplish a task • Initial attempts at producing recipes may need to be refined.

  20. Ask for help • It is common for novice programmers to get stuck • You don’t know why your program doesn’t work • You spend hours trying different things, but you don’t know why it still doesn’t work • Save time: get help • Work with your classmates • Go to office hours • Have discussion with instructor and other students • Send e-mail to the instructor

  21. Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python

  22. Schedule office hours • I will create a doodle poll and e-mail the link to you • No guarantees that I can select times that fit everyone’s schedule

  23. http://www.doodle.ethz.ch/graphics/doodlePollReunion.png

  24. Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python

  25. Python language • Developed by Guido van Rossum • Released in early 1990s

  26. Named after Monty Python, a British comedy group

  27. But the Python logo looks like two snakes

  28. Why Python • Datatypes such as strings, lists, and hash tables are built in to the language as primitives • Not verbose, not too concise • Easy to read • Gentle learning curve • Widely used in Natural Language Processing community • Examples of NLP in Python vs. other languages http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html
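To illustrate the built-in datatypes mentioned above, here is a small sketch using a string, a list, and a dictionary (Python's hash table). The words and part-of-speech labels are made-up examples, not data from the course.

```python
word = "corpus"                                       # string
tokens = ["the", "cat", "sat"]                        # list
tags = {"the": "DET", "cat": "NOUN", "sat": "VERB"}   # dict (hash table)

print(word.upper())   # strings come with text-processing methods
print(len(tokens))    # number of items in the list
print(tags["cat"])    # look up a value by its key
```

None of these types needs an import; they are available as soon as the interpreter starts.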

  29. “Hello World” in Python and Java • Python: print('hello world') • Java: public class HelloWorld { public static void main(String[] args) { System.out.println("Hello World!"); } }

  30. Installation • Download from www.python.org • Current versions are 2.7 and 3.2 • I will cover Python 3.2 • Similarities/differences • For novice programmers, the two versions are largely similar, but certain language constructs are different (and incompatible) • Example: • Python 2: print "Hello World!" • Python 3: print("Hello World!") • Once you become proficient in Python, it is not hard to switch to a different version • Much existing code is written in Python 2

  31. Set Python environment variable • Create a directory mypythoncode for your Python code • e.g. C:\Users\Arizona\Desktop\508\mypythoncode\ • Set environment variable so Python knows where to find your code • Windows Vista: • Right-click on My Computer • Choose "Advanced system properties" • Add a new User variable called PYTHONPATH • Set the value of the variable to the full path of the mypythoncode directory

  32. Set Python environment variable • OS X terminal, Unix/Linux, etc.: • csh • edit .cshrc • setenv PYTHONPATH /home/me/mypythoncode • bash • edit .bashrc or .bash_profile • export PYTHONPATH=/home/me/mypythoncode

  33. Running Python from the command line • Mac Terminal, Unix/Linux, etc. • Which command you type in to run python 3 depends on the system that you are using
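Because the command name differs between systems, one way to confirm which version a given command actually started is to ask the interpreter itself. The following is a minimal sketch using the standard `sys` module; the messages it prints are illustrative.

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major, minor = sys.version_info[0], sys.version_info[1]
print("Running Python {}.{}".format(major, minor))

if major < 3:
    print("This is Python 2; the lectures assume Python 3.")
```

The same lines can be typed at the interactive prompt or saved in a script file.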

  34. IDLE, a Python Graphical User Interface (GUI) • Consists of Python shell (for running commands) and text editor • Available for Mac and Windows within the standard Python installation

  35. Other installations and editors • There are other ways to run Python. • Obtain a different implementation of Python and/or IDE (integrated development environment) • PyDev + Eclipse IDE • ActiveState Python • etc. • Doesn’t matter what you use • Demonstrations in earliest lectures will be in IDLE only

  36. Using Python interactively • Whether run from command line, or through IDLE >>> a = 3 >>> a 3 >>> print('hello') hello >>>

  37. Test setting of Python environment variable >>> import sys >>> sys.path ['C:\\Python25\\Lib\\idlelib', 'C:\\Users\\Arizona\\Desktop\\508\\mypythoncode', 'C:\\Windows\\system32\\python25.zip', 'C:\\Python25\\DLLs', 'C:\\Python25\\lib', 'C:\\Python25\\lib\\plat-win', 'C:\\Python25\\lib\\lib-tk', 'C:\\Python25', 'C:\\Python25\\lib\\site-packages']
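Instead of reading through the whole list by eye, the check can be sketched in code. The helper name `pythonpath_entries` is a hypothetical choice for this example, and the comparison is a plain string match, so it may report False if Python stored the directory in a slightly different form (for example, with different capitalization on Windows).

```python
import os
import sys

def pythonpath_entries():
    """Return the directories listed in PYTHONPATH (empty list if unset)."""
    raw = os.environ.get("PYTHONPATH", "")
    # os.pathsep is ';' on Windows and ':' on OS X, Unix, and Linux.
    return [entry for entry in raw.split(os.pathsep) if entry]

for entry in pythonpath_entries():
    # Python normally adds PYTHONPATH directories to sys.path at startup.
    print(entry, "in sys.path:", entry in sys.path)
```

If nothing is printed, the PYTHONPATH variable is not set in the environment that started Python.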

  38. Python script files • Create a file called myfile.py # myfile.py # these are comments a = 3 a print('hello') • If using IDLE in Windows, be sure to add a .py extension to the file • Do not save as a text file, otherwise it becomes myfile.py.txt

  39. Run script file • Command line: [mint][~]> python myfile.py • Within IDLE: • Go to window for myfile.py • Press F5, or click on Run → Run Module in menu bar • Will execute in Python shell • Output: >>> =========== RESTART================= >>> hello Note that the value of a is not shown, unlike when the code was typed directly in the shell
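To make the value of a appear when the file is run as a script, it has to be printed explicitly. Below is a revised sketch of myfile.py; the added print(a) line is not in the original file.

```python
# myfile.py (revised sketch)
a = 3
a               # a bare expression is evaluated, but a script displays nothing for it
print(a)        # an explicit print() is needed to display the value
print('hello')
```

Only the interactive shell echoes the value of a bare expression; a script shows output only where print() is called.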

  40. Order of execution in script files • Code executes from top to bottom • Incorrect: at print statement, b hasn’t been defined print(b) b = 7 • Correct: define the variable first b = 7 print(b)
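The incorrect ordering fails with a NameError. The sketch below catches that error so that both orderings can be shown in one runnable file; the try/except wrapper and the `message` variable are additions for illustration only.

```python
try:
    print(b)                  # b has not been defined yet
except NameError as err:
    message = str(err)
    print("Error:", message)

b = 7                         # define the variable first
print(b)                      # now the name exists, so this works
```

Without the try/except, the script would stop at the first print(b) and never reach the assignment.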

  41. Using IDLE (in Windows) • F5 execute Python script file • control-C cancel execution • control-D quit IDLE • alt-p previous in command history • alt-n next in command history • alt-3 comment a block of code • alt-4 uncomment a block of code

  42. Comments • Comments are ignored by the Python interpreter but are useful for describing the purpose of a section of code a = 1 # everything after hash mark is a comment b = 2 c = 3 # statement below does not execute because # it is within a comment # d = 4
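A runnable version of the slide's example is sketched below. The final two lines are additions that confirm the commented-out assignment never ran; `globals()` returns the dictionary of names defined at the top level of the file.

```python
a = 1  # everything after the hash mark is a comment
b = 2
c = 3
# the statement below does not execute because
# it is within a comment
# d = 4

print(a + b + c)           # a, b, and c were all assigned
print('d' in globals())    # d was never created
```

Removing the hash mark in front of `d = 4` would turn the line back into an ordinary statement.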

  43. Comment a block of code • Select a block of code • Press alt-3, now it is commented • Select and press alt-4 to uncomment

  44. Beware of Windows… • When you repeatedly execute code and cancel execution (with control-C), sometimes the processes continue anyway, and after a while IDLE won’t let you run your code • Solution: • press ctrl-alt-delete • start Task Manager • select lowest pythonw.exe processes • click on End Process • sometimes you might have to restart IDLE

  45. Until next class… • Download and install Python • Send me e-mail if you have any problems • Buy an intro book (if desired) • Recommended reading for this week: • Zelle chapters 1, 2, 3, 6 • (But my lecture content will not be based on Zelle)
