
Tutorial I: Natural Language Processing With Python

Natural Language Processing: the technology that makes computers listen and talk as human beings do. Tutorial I: Natural Language Processing With Python. Xu Ruifeng, Harbin Institute of Technology, Shenzhen Graduate School.



Presentation Transcript


  1. Natural Language Processing: the technology that makes computers listen and talk as human beings do. Tutorial I: Natural Language Processing With Python. Xu Ruifeng, Harbin Institute of Technology, Shenzhen Graduate School.

  2. Contents
     • Welcome to Python world
     • How to run Python programs
     • Data Types and Operators
     • Basic Syntax
     • NLP with Python

  3. Welcome to Python world
     About Python
     • It's named after Monty Python.
     • Python is an elegant and robust programming language that delivers both the power and general applicability of traditional compiled languages and the ease of use of simpler scripting and interpreted languages.
     • It is commonly defined as an object-oriented scripting language, a definition that blends support for OOP with an overall orientation toward scripting roles.

  4. Welcome to Python world
     • Python is often compared to Perl and Ruby. Python is engineering, not art. Some of its key distinguishing features include:
       Easy to read: very clear, readable syntax
       Easy to learn and use: simple structure and a clearly defined syntax
       Portable: runs everywhere (Windows, Linux, Mac...)
       Extensible: extensions and modules are easily written in C or C++ (or Java for Jython, or .NET languages for IronPython)

  5. How to run Python programs • Downloading and Installing Python The most obvious place to get all Python-related software is at the main Web site at http://python.org.

  6. How to run Python programs
     • Three main ways
       • The simplest way is to start the interpreter interactively and enter one line of Python at a time for execution.
       • Another way is to run a script written in Python.
       • A third way is to use an IDE (Eclipse + PyDev): http://www.cnblogs.com/sevenyuan/archive/2009/12/10/1620939.html

  7. Data Types and Operators
     • Arithmetic operators: + - * / % **
     • Comparison operators: < <= > >= == != <> (<> is the obsolete Python 2 spelling of !=)
     • Logical operators: and or not
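
The operators on this slide can be tried out with a few lines of code. The sketch below is written in Python 3 syntax, which is an assumption on our part: the slides use Python 2, where `/` on two integers truncates and `<>` is still accepted. The variable names are illustrative only.

```python
# Python 3 sketch (assumption: the slides themselves target Python 2).
a, b = 7, 2

quotient = a / b       # true division in Python 3
remainder = a % b      # modulo
power = a ** b         # exponentiation

is_greater = a > b
not_equal = a != b     # the slide's `<>` spelling is Python 2 only

both = (a > 0) and (b > 0)
either = (a < 0) or (b > 0)
negated = not (a == b)

print(quotient, remainder, power)
print(is_greater, not_equal, both, either, negated)
```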

  8. Data Types and Operators
     • Program output, the print statement, and "Hello World!"
     • Program input and raw_input()
         num = raw_input('Now enter a number: ')
         print 'Doubling your number: %d' % (int(num) * 2)
     • Comments
       • # for a one-line comment
       • ''' ... ''' (a triple-quoted string) for a multi-line block

  9. Data Types and Operators
     • Variables and Assignment
       Python is dynamically typed, meaning that no pre-declaration of a variable or its type is necessary. The type (and value) are initialized on assignment. Assignments are performed using the equal sign.
         counter = 0
         miles = 1000.0
         name = 'Bob'
         kilometers = 1.609 * miles
         print '%f miles is the same as %f km' % (miles, kilometers)
       Multiple assignment:
         x = y = z = 1
         a, b, c = 1, 2, 'a string'

  10. Data Types and Operators
      • Numbers
      • Strings
      • Lists
      • Tuples
      • Dictionaries
      • Sets

  11. Data Types and Operators
      • Numbers
        Python long integers should not be confused with C longs. If you are familiar with Java, a Python long is similar to a number of the BigInteger class type: its size is limited only by available memory.
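
A short sketch of what arbitrary-precision integers mean in practice. It assumes Python 3, where the separate `long` type described on the slide has been merged into `int`; the variable names are illustrative.

```python
# Python 3 sketch: `int` is arbitrary-precision (Python 2 called this `long`).
big = 2 ** 100               # far beyond a 64-bit C long
digits = len(str(big))       # number of decimal digits

print(big)
print(digits)

# Arithmetic keeps full precision; nothing overflows or wraps around.
still_exact = (big + 1) - big
print(still_exact)
```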

  12. Data Types and Operators
      • Strings
        • Subsets of strings can be taken using the index ( [ ] ) and slice ( [ : ] ) operators. Indexes start at 0 at the beginning of the string and count backward from -1 at the end.
        • The plus ( + ) sign is the string concatenation operator, and the asterisk ( * ) is the repetition operator.
          # strings
          pystr = 'Python'
          iscool = 'is cool'
          print pystr[0]
          print iscool[1:2]
          print pystr + ' ' + iscool
          print pystr * 2
        More examples: StringExamples.py

  13. Data Types and Operators
      • Lists and Tuples
        They are similar to arrays, except that lists and tuples can store different types of objects.
          # lists [] and tuples ()
          aList = [1, 2, 3, 4]
          print aList
          print aList[0]
          print aList[2:]
          print aList[:3]
          aTuple = ('robots', 77, 94, 'try')
          print aTuple
          print aTuple[:3]
          print aTuple[2:]
      • Differences
        • L = [...] is a list and T = (...) is a tuple.
        • A list's elements and size can be changed; a tuple's cannot.
      • For both, subsets can be taken with the index ( [ ] ) and slice ( [ : ] ) operators.

  14. Data Types and Operators
      • Dictionaries (Dic.py)
        • Dictionaries (or "dicts" for short) are Python's mapping type and work like associative arrays or hashes.
        • D = {key: value}
        • Key: usually a number or a string. Value: any Python object.
          # create dict
          aDict = {'host': 'earth'}
          # add to dict
          aDict['port'] = 80
          print aDict
          print aDict.keys()
          print aDict['host']
          # printing the key-value pairs requires a loop
          for key in aDict:
              print key, aDict[key]

  15. Data Types and Operators • Set: A set is an unordered collection with no duplicate elements (set.py)
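
The slide refers to set.py without showing it, so the following is only a minimal sketch of the behavior described (no duplicates, no ordering), in Python 3 syntax. The sample data and names are our own, not taken from set.py.

```python
# Python 3 sketch; sample data is illustrative, not from set.py.
basket = ['apple', 'orange', 'apple', 'pear', 'orange']
fruit = set(basket)              # duplicates are dropped
has_pear = 'pear' in fruit       # membership test

a = set('abracadabra')           # unique letters of each word
b = set('alacazam')
common = a & b                   # letters in both words
only_a = a - b                   # letters in a but not in b

print(sorted(fruit), has_pear)
print(sorted(common), sorted(only_a))
```

Sorting before printing is a deliberate choice here: a set has no defined order, so printing it directly can show the elements in any sequence.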

  16. Basic Syntax
      • if
          if expression1:
              if_suite
          elif expression2:
              elif_suite
          else:
              else_suite
      • while Loop
          # while loop
          counter = 0
          while counter < 3:
              print 'loop # %d' % (counter)
              counter += 1

  17. Basic Syntax
      • for Loop and the range() Built-in Function
          # for loop and the range() built-in function
          foo = 'abc'
          for c in foo:
              print c
          for i in range(len(foo)):
              print foo[i], '(%d)' % i
        range() is often used with len() to index into a string. Here, it lets us display both the elements and their corresponding index values.
      • range(start, end, step=1), for example range(2, 19, 3)
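
To make the range() arguments concrete, here is a small sketch in Python 3 syntax (an assumption; in Python 3, range() is lazy, so it is wrapped in list() to see the values). The built-in enumerate() is shown as a common alternative to range(len(...)); it is not mentioned on the slide.

```python
# Python 3 sketch; names are illustrative.
foo = 'abc'

# Index-based loop, as on the slide: pair each index with its character.
pairs = [(i, foo[i]) for i in range(len(foo))]

# start=2, end=19 (exclusive), step=3
stepped = list(range(2, 19, 3))

# enumerate() yields the same (index, element) pairs without len().
enumerated = list(enumerate(foo))

print(pairs)
print(stepped)
print(enumerated)
```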

  18. Basic Syntax
      • List Comprehensions
        Use a for loop to put together an entire list on a single line.
          # list comprehension: use a for loop to put all the values in a list
          squared = [x ** 2 for x in range(4)]
          for i in squared:
              print i
          # "not x % 2" keeps the even values of x
          sqdEvens = [x ** 2 for x in range(8) if not x % 2]
          for i in sqdEvens:
              print i

  19. Basic Syntax
      • Functions
          def addMe2Me(x):
              'apply + operation to argument'
              return (x + x)
          print addMe2Me(4.25)
          print addMe2Me('Python')
          print addMe2Me([-1, 'abc'])

  20. Basic Syntax
      • Files and the open() and file() Built-in Functions
      • File Built-in Methods
        • read() / readline() / readlines()
        • write() / writelines()
      Note: (1) The readlines() method does not return a string like the other two input methods. Instead, it reads all (remaining) lines and returns them as a list of strings. (2) Line separators are preserved.
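
The two notes above can be checked with a short experiment. This is a sketch in Python 3 syntax (an assumption; the file() built-in named on the slide exists only in Python 2, so open() is used). The scratch file name is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file; the slides do not name one.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:
    # writelines() does not add line separators; we supply them ourselves.
    f.writelines(['first line\n', 'second line\n', 'third line\n'])

with open(path) as f:
    first = f.readline()    # one line, returned as a string
    rest = f.readlines()    # all remaining lines, returned as a list

print(repr(first))
print(rest)
```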

  21. Basic Syntax
      • File modes for open()
        • r: open for read
        • w: open for write
        • a: open for append
        • +: read-write access
      If you are a C programmer, these are the same file open modes used for the C library function fopen().
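
A minimal sketch of how the modes differ in effect, in Python 3 syntax (an assumption). The scratch file and variable names are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file; the slides do not name one.
log_path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(log_path, 'w') as f:     # 'w' creates the file or truncates it
    f.write('one\n')

with open(log_path, 'a') as f:     # 'a' writes at the end, keeping content
    f.write('two\n')

with open(log_path, 'r+') as f:    # 'r+' allows both reading and writing
    before = f.read()              # reading leaves the position at the end
    f.write('three\n')

with open(log_path) as f:          # default mode is 'r'
    after = f.read()

print(repr(before))
print(repr(after))
```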

  22. Basic Syntax

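  23. Basic Syntax
      • Traverse
          import os
          # recursively traverse all files under the E:\Kugou directory
          def show(arg, dirname, filenames):
              print 'dirname:' + dirname
              for f in filenames:
                  # os.path.join builds the path portably
                  if os.path.isfile(os.path.join(dirname, f)):
                      print '-----' + f
          # raw string, so the backslash is not read as an escape
          os.path.walk(r'E:\Kugou', show, None)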
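
os.path.walk() with a callback exists only in Python 2. As a rough equivalent, the sketch below uses os.walk(), which is available in both Python 2 and Python 3; it is written in Python 3 syntax. The helper name list_files and the temporary directory tree are our own assumptions, standing in for E:\Kugou.

```python
import os
import tempfile

def list_files(root):
    """Return (dirname, filename) pairs for every file below root."""
    found = []
    # os.walk yields one (directory, subdirectories, files) triple per directory.
    for dirname, subdirs, filenames in os.walk(root):
        for name in filenames:
            found.append((dirname, name))
    return found

# Hypothetical scratch tree standing in for the slide's directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(root, rel), 'w') as f:
        f.write('x')

for dirname, name in list_files(root):
    print('dirname:', dirname, '-----', name)
```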

  24. Basic Syntax
      • Class
          class FooClass(object):
              version = 0.1
              def __init__(self, nm='John Doe'):
                  self.name = nm
                  print 'Create an instance for', nm
              def showname(self):
                  print 'Your name is ', self.name
                  print 'My name is ', self.__class__.__name__
              def showver(self):
                  print self.version
              def addMe2Me(self, x):
                  return x + x
          fool = FooClass()
          fool.showname()
          fool.showver()
          print fool.addMe2Me(5)
          print fool.addMe2Me('xyz')

  25. NLP with Python
      • Why choose Python?
        Python is a simple yet powerful programming language with excellent functionality for processing linguistic data. For example, FIND-ing.py is a five-line Python program that processes testing.txt and prints all the words ending in "ing".
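
FIND-ing.py itself is not reproduced in the transcript, so the following is only a sketch of what such a program might look like, in Python 3 syntax. An in-memory sample text stands in for testing.txt; the sample sentence and the name ing_words are assumptions.

```python
import io

# Stand-in for open('testing.txt'); the real file is not part of the slides.
sample = io.StringIO(
    "I was walking home\n"
    "singing a song while the king was sleeping\n"
)

ing_words = []
for line in sample:
    for word in line.split():
        if word.endswith('ing'):
            ing_words.append(word)
            print(word)
```

Note that a plain suffix test also picks up "king", which is not an -ing verb form. Telling the two apart requires the kind of linguistic processing the later slides touch on.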

  26. NLP with Python
      • Regular Expressions (re.py)
        Regular expressions (REs) provide an infrastructure for advanced text pattern matching, extraction, and/or search-and-replace functionality. They enable a single RE pattern to match multiple strings.
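
The three uses named above (matching, extraction, search-and-replace) can be sketched with the standard re module. The example is in Python 3 syntax, and the sample text and patterns are our own; the email pattern is deliberately simplistic and not a full address validator.

```python
import re

text = 'Contact bob@example.com or alice@example.org before 2024-05-01.'

# Matching: one pattern matches many different strings.
email_pattern = re.compile(r'[\w.]+@[\w.]+\.\w+')
emails = email_pattern.findall(text)

# Extraction: groups pull out the parts of a match.
date_match = re.search(r'(\d{4})-(\d{2})-(\d{2})', text)
year, month, day = date_match.groups()

# Search-and-replace: mask every digit.
masked = re.sub(r'\d', '#', 'call 555-1234')

print(emails)
print(year, month, day)
print(masked)
```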

  27. NLP with Python

  28. NLP with Python

  29. NLP with Python
      • Word Frequency (WordFrequency.py)
          # word-frequency count for a single text
          wordlist = open('data.txt').read().split()
          wordfreq = [wordlist.count(p) for p in wordlist]
          dictionary = dict(zip(wordlist, wordfreq))
          aux = [(dictionary[key], key) for key in dictionary]
          aux.sort()
          aux.reverse()
          for a in aux:
              print a
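
The slide's version calls wordlist.count() once per word, which rescans the whole list each time. As an alternative (not the author's method), the standard library's collections.Counter does the same count in a single pass. The sketch is in Python 3 syntax, and the sample sentence stands in for data.txt.

```python
from collections import Counter

# Stand-in for open('data.txt').read(); the real file is not in the slides.
text = 'the cat saw the dog and the dog saw the cat'

wordlist = text.split()
freq = Counter(wordlist)            # maps each word to its count

# most_common() returns (word, count) pairs, highest count first.
for word, count in freq.most_common():
    print(count, word)
```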

  30. NLP with Python
      • Web Downloads (WebDownloadHTML.py)
          import urllib
          url = "http://www.baidu.com"
          path = r".\web.html"
          urllib.urlretrieve(url, path)

  31. NLP with Python
      • Web Downloads (WebDownload.py)
          # -*- coding: UTF-8 -*-
          # download an image from a web page to a local folder
          import os, urllib
          # local path where the download is stored: "E:\img\1.jpg"
          path = r'E:\img'
          file_name = r'1.jpg'
          dest_dir = os.path.join(path, file_name)
          # URL of the image
          #url = "http://pic3.nipic.com/20090518/2662644_083611033_2.jpg"
          url = "http://ww4.sinaimg.cn/large/99e79587tw1dx42v7j6bqj.jpg"
          # download function: downLoadPicFromURL(local path, web URL)
          def downLoadPicFromURL(dest_dir, URL):
              try:
                  # use the URL parameter rather than the global url
                  urllib.urlretrieve(URL, dest_dir)
              except IOError:
                  print '\tError retrieving the URL:', URL
          # run
          downLoadPicFromURL(dest_dir, url)

  32. NLP with Python
      • Word Segmentation
          def segment(text, segs):
              words = []
              last = 0
              for i in range(len(segs)):
                  # a '1' marks the last character of a word
                  if segs[i] == '1':
                      words.append(text[last:i+1])
                      last = i + 1
              words.append(text[last:])
              return words
          text = "doyouseethekittyseethedoggydoyoulikethekittylikethedoggy"
          seg1 = "0000000000000001000000000010000000000000000100000000000"
          seg2 = "0100100100100001001001000010100100010010000100010010000"
          print segment(text, seg1)
          print segment(text, seg2)

  33. NLP with Python • Regular Expressions for Tokenizing and Tagging Text

  34. NLP with Python • Regular Expressions for Tokenizing Text (Tokenizing.py, mmseg)
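
Tokenizing.py is not included in the transcript, so the following is only a sketch of regular-expression tokenization for whitespace-delimited text, in Python 3 syntax. The function name tokenize, the pattern, and the sample sentence are assumptions. It does not cover Chinese word segmentation, which is what a tool such as mmseg addresses.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+(?:'\w+)?  a word, optionally with an apostrophe part ("isn't")
    # [^\w\s]       any single character that is neither word nor whitespace
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = tokenize("Don't panic: NLP isn't magic, it's patterns.")
print(tokens)
```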

  35. NLP with Python • Reading Tagged Corpora

  36. NLP with Python • Named Entity Recognition

  37. Advanced Topics
      • Network Programming
      • Internet Client Programming
      • Multithreaded Programming
      • GUI Programming
      • Web Programming
      • Database Programming
      • Extending Python

  38. Thank You!
