Project Part 2

LING 572, Fei Xia, 1/26/06


Presentation Transcript


  1. Project Part 2. LING 572, Fei Xia, 1/26/06

  2. NLP Packages
     • FST: Carmel, AT&T toolkit
     • TBL: fnTBL
     • MaxEnt:
     • DT: C4.5
     • Boosting: AdaBoost
     • LM: SRILM
     • MT: GIZA++, Pharaoh, …

  3. Main steps
     • Download and compile the package, and test the code with the given examples.
       - License, citation
       - Compilers, libraries, operating system
     • Create your own test data, write a few wrappers/converters, and test the code.
       - Fix bugs
     • Understand the main algorithm of the package:
       - Read README files, tutorials, and related papers.
       - Check the source code.
     • Modify and improve the package.
     • Run experiments.

  4. Using fnTBL
     • Download and compile the package, and test the code: (< 1 hour)
     • Create your own test data, write a few wrappers/converters, and test the code: (about 6 hrs, my time)
     • Understand the main algorithm of the package: (?? hrs)
     • Modify and improve the package: (?? hrs)
     • Run experiments: (computer time)
       - 12 experiments

  5. Main tasks
     • Understand the code:
       - Core algorithm: fnTBL-1.1/src
       - POS tagger: perl_code/pos-train.prl and pos-apply.prl
       - A wrapper: perl_code/build_TBL_tagger1.pl
     • Modify the code:
       - Here you don't need to change the core algorithm.
       - A new way of treating unknown words.
     → In Report2, explain the algorithms and your modification.

  6. Main tasks (cont)
     • Run the code with different settings:
       - Corpus size: 1K, 5K, 10K, 40K
       - Feature templates: all the types or a subset
       - Treatment of unknown words
     → Report1
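The corpus-size setting can be prepared by truncating the training data. The following is only a minimal sketch, assuming the training file holds one sentence per line and that "1K" means the first 1,000 sentences; the slides do not describe the format of data/, and `SIZES`, `first_n_sentences`, and `make_subsets` are hypothetical names rather than part of fnTBL.

```python
from itertools import islice

# Corpus sizes from the slide, in sentences (assumes K = 1,000).
SIZES = {"1K": 1_000, "5K": 5_000, "10K": 10_000, "40K": 40_000}


def first_n_sentences(lines, n):
    """Return the first n non-empty lines, treating each line as one sentence."""
    non_empty = (line for line in lines if line.strip())
    return list(islice(non_empty, n))


def make_subsets(lines):
    """Map each size label to its training subset.

    Smaller subsets are prefixes of larger ones, so results across
    corpus sizes stay comparable.
    """
    lines = list(lines)
    return {label: first_n_sentences(lines, n) for label, n in SIZES.items()}
```

Taking prefixes instead of independent random samples is a design choice here: it keeps every smaller training set contained in the larger ones, so a change in accuracy can be attributed to the added data.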

  7. Report1

     # of     standard       fewer feature   w/ simple treatment
     sents    case           types           for unknown words
              (tagger1.pl)   (tagger2.pl)    (tagger3.pl)
     ============================================================
     1K       a11            a12             a13
     5K       a21            a22             a23
     10K      a31            a32             a33
     40K      a41            a42             a43

     Replace each cell with a(b, c, d):
     • a: tagging accuracy
     • b: # of lexical rules
     • c: # of context rules
     • d: running time
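One way to fill in a Report1 cell is sketched below. It assumes that tagging accuracy means per-token accuracy (correctly tagged tokens divided by all tokens), which is the usual measure for POS tagging but is not spelled out on the slide. The function names and the percentage formatting are hypothetical choices.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


def format_cell(accuracy, n_lexical_rules, n_context_rules, running_time):
    """Render one table cell in the a(b, c, d) layout used by Report1."""
    return f"{accuracy:.2%}({n_lexical_rules}, {n_context_rules}, {running_time})"
```

For example, with toy tags `["DT", "NN", "VB", "NN"]` as gold and `["DT", "NN", "NN", "NN"]` as predictions, three of four tokens match, so `tagging_accuracy` returns 0.75. The rule counts and running time passed to `format_cell` have to come from your own runs.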

  8. Files for the project
     • Files given to you:
       - fnTBL-1.1.linux.tar.gz
       - params/
       - data/
       - perl_code/
     • Files that will be produced by you:
       - new_params/: feature templates
       - new_perl_code/: build_TBL_tagger3.pl, pos-train3.prl, and pos-apply3.prl
       - report/: Report1 and Report2
       - result/: a11/, a12/, …, a43/
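The result/ layout mirrors the Report1 table: 4 corpus sizes by 3 settings gives the 12 experiments, with cell aRC in row R and column C. The sketch below enumerates that grid and creates the directories. The setting labels and function names are hypothetical, and the row and column order is assumed to follow the table.

```python
from pathlib import Path

SIZES = ["1K", "5K", "10K", "40K"]            # rows of Report1
SETTINGS = ["tagger1", "tagger2", "tagger3"]  # columns of Report1 (labels assumed)


def experiment_grid():
    """Return {cell_id: (size, setting)}, e.g. 'a23' -> ('5K', 'tagger3')."""
    return {
        f"a{row}{col}": (size, setting)
        for row, size in enumerate(SIZES, start=1)
        for col, setting in enumerate(SETTINGS, start=1)
    }


def make_result_dirs(root):
    """Create root/result/a11 ... root/result/a43 and return the paths in order."""
    root = Path(root)
    paths = []
    for cell_id in experiment_grid():
        path = root / "result" / cell_id
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `make_result_dirs(".")` from the project directory would create the twelve result folders; each experiment's output can then be written to the folder that matches its cell.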
