
Wordsmith Tools

Wordsmith Tools. Stella E. O. Tagnin (USP). Corpus Linguistics, Translation and Terminology. New Technologies in Translation (CAPES), Universitat Rovira i Virgili and Universidade de São Paulo, Tarragona, July 8-11, 2008. How to use Wordsmith Tools to investigate a corpus.


Presentation Transcript


  1. Wordsmith Tools Stella E. O. Tagnin - USP Corpus Linguistics, Translation and Terminology New Technologies in Translation - CAPES Universitat Rovira i Virgili-Universidade de São Paulo Tarragona July 8-11, 2008

  2. How to use Wordsmith Tools to investigate a corpus

  3. First, download the demo version of WordSmith Tools 5.0 from Mike Scott’s site: http://www.lexically.net/wordsmith/version5/index.html

  4. Name: Tarragona and USP • Other Details: • Registration: SA00.3461.2978.3904.6880.9VVB • When "Updating from Demo", please paste these details in EXACTLY as you see them here. • Please see "readme.txt" for any further details. • Mike Scott

  5. WordSmith Tools • WordList: S = General Statistics, F = Frequency, A = Alphabetical • KeyWords: Study Corpus vs Reference Corpus • Concord: KWIC = Key Word In Context, Collocates, Clusters

  6. WordList • S = General Statistics: overview of corpus and texts • F = Frequency: most frequent words may point to topic • A = Alphabetical: makes lemmatizing easier
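The three WordList views can be approximated outside WordSmith Tools. The following is a minimal Python sketch, not WordSmith's own algorithm: the tokenizer, the `word_list` helper, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters, allowing internal apostrophes/hyphens.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def word_list(text):
    """Return (statistics, frequency-ordered list, alphabetical list)."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    stats = {
        "tokens": len(tokens),                  # running words
        "types": len(freq),                     # distinct word forms
        "type_token_ratio": len(freq) / len(tokens) if tokens else 0.0,
    }
    # F view: most frequent first, ties broken alphabetically.
    by_frequency = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    # A view: alphabetical order puts related forms next to each other.
    alphabetical = sorted(freq.items())
    return stats, by_frequency, alphabetical

# Hypothetical sample text.
sample = ("The team won the game. The players played well "
          "and the coach praised the team.")
stats, by_frequency, alphabetical = word_list(sample)
print(stats)
print(by_frequency[:3])
print(alphabetical)
```

In the alphabetical view, forms such as "played" and "players" end up adjacent, which is what makes manual lemmatizing easier.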

  7. WordList - Statistics Identifying peculiarities • corpus (Overall) • each text

  8. Frequency WordList • Hint as to topic • Survey of most recurrent words in text/corpus

  9. Alphabetical WordList • Spotting words • Lemmatizing word forms

  10. KeyWords • Identifying prevailing vocabulary • Study Corpus vs Reference Corpus

  11. Keywords

       N  WORD         FREQ.  WCUPING.LST %   FREQ.  REFENG2.LST %   KEYNESS         P
       1  CUP          1.024           0,77       1                  3.291,6  0,000000
       2  WORLD        1.197           0,90     301           0,06   2.496,9  0,000000
       3  TEAM           575           0,43      48                  1.538,4  0,000000
       4  GAME           486           0,36      22                  1.396,6  0,000000
       5  HIS            714           0,53     257           0,05   1.296,2  0,000000
       6  GERMANY        435           0,33      14                  1.284,9  0,000000
       7  SOCCER         374           0,28       0                  1.206,4  0,000000
       8  HE             778           0,58     429           0,08   1.130,6  0,000000
       9  ITALY          332           0,25       5                  1.021,0  0,000000
      10  SAID           670           0,50     343           0,06   1.017,6  0,000000
      11  WAS            892           0,67     716           0,13     987,2  0,000000
      12  PLAYERS        337           0,25      15                    969,6  0,000000
      13  GOAL           352           0,26      51                    851,9  0,000000
      14  BALL           260           0,19       2                    815,9  0,000000
      15  IN           3.214           2,40   7.019           1,31     761,0  0,000000
      16  COACH          229           0,17       0                    738,5  0,000000
      17  TOURNAMENT     205           0,15       0                    661,0  0,000000
      18  SPORTS         234           0,18      13                    658,5  0,000000
      19  PLAY           264           0,20      37                    643,5  0,000000
      20  FRANCE         208           0,16       6                    618,7  0,000000
      21  FANS           193           0,14       1                    610,2  0,000000
      22  MATCH          265           0,20      49                    604,4  0,000000
      23  MINUTE         206           0,15       9                    593,5  0,000000
      24  BRAZIL         209           0,16      19                    551,6  0,000000
      25  WIN            193           0,14      15                    521,2  0,000000

  12. Comparing 2 WordLists • Positive keywords (vocabulary unusually frequent in the study corpus) • Negative keywords (vocabulary absent or unusually infrequent in the study corpus)

  13. ... and vocabulary that does NOT occur • Negative keywords
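A keyness score compares a word's frequency in the study corpus with its frequency in the reference corpus. The sketch below assumes the log-likelihood statistic, which is one of the measures WordSmith Tools offers; the slides do not say which measure produced the table. The corpus sizes are rough estimates back-calculated from the percentage columns (for example 1.197 / 0,90 % for WORLD), so the results should only approximate the KEYNESS column. The sign convention (negative for negative keywords) and the function name are illustrative choices.

```python
import math

def keyness(freq_study, size_study, freq_ref, size_ref):
    """Signed log-likelihood keyness.

    Positive: relatively more frequent in the study corpus (positive keyword).
    Negative: relatively less frequent in the study corpus (negative keyword).
    """
    total = freq_study + freq_ref
    combined = size_study + size_ref
    # Expected counts if the word were equally common in both corpora.
    expected_study = size_study * total / combined
    expected_ref = size_ref * total / combined
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2
    if freq_study / size_study < freq_ref / size_ref:
        return -ll
    return ll

# Assumed corpus sizes, estimated from the table's percentage columns.
STUDY_SIZE = 133_000
REF_SIZE = 536_000

cup = keyness(1024, STUDY_SIZE, 1, REF_SIZE)      # row 1 of the table
soccer = keyness(374, STUDY_SIZE, 0, REF_SIZE)    # row 7 of the table
rare_here = keyness(1, STUDY_SIZE, 1000, REF_SIZE)  # hypothetical negative keyword
print(round(cup, 1), round(soccer, 1), round(rare_here, 1))
```

Sorting all words by this score in descending order gives the positive keywords first; the most negative scores mark vocabulary that is missing or rare in the study corpus.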

  14. Compiling a Glossary Selecting Terms • Keywords – term candidates (terminology) • Concord - context • Collocates • Clusters – multiword combinations, not necessarily terms or phrases

  15. Concord • KWIC = Key Word In Context • Collocates • Clusters

  16. Identifying patterns Context • Concordance lines • Lexical patterns – collocations • Grammatical patterns – colligations
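Concordance lines place every occurrence of a search word in the middle of a fixed window of context, so that recurring lexical and grammatical patterns line up visually. This is a minimal sketch of the KWIC idea, not Concord's implementation; the `kwic` function, the window width, and the sample text are assumptions.

```python
import re

def kwic(text, node, width=30):
    """Return one aligned concordance line per occurrence of `node`."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the node word forms a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text.
text = "Brazil won the World Cup in 2002. Italy won the World Cup in 2006."
concordance = kwic(text, "cup")
for line in concordance:
    print(line)
```

Reading down the column to the left of the node shows "World" each time, a first hint of a collocation.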

  17. Collocates • Position of most frequent co-occurring words
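A collocate table counts which words co-occur with the search word and in which position (L2, L1, R1, R2, and so on). The sketch below assumes a simple pre-tokenized text and a span of two words on each side; the function name and data layout are illustrative, not Concord's own.

```python
from collections import Counter, defaultdict

def collocates(tokens, node, span=2):
    """Map each collocate to a Counter of positions relative to `node`.

    Negative offsets are to the left (L1 = -1), positive to the right (R1 = +1).
    """
    table = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token != node:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            table[tokens[j]][offset] += 1
    return table

# Hypothetical sample, already lowercased and free of punctuation.
tokens = "brazil won the world cup in 2002 italy won the world cup in 2006".split()
table = collocates(tokens, "cup")
for word, positions in sorted(table.items()):
    print(word, dict(positions))
```

Here "world" is counted twice at L1 and "in" twice at R1, which is the kind of positional regularity the collocates view is meant to expose.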

  18. Compiling a Glossary Selection of Terms • Keywords • Clusters

  19. 3-word clusters

  20. WordList • With more than one term • Settings > Tab List > WordList Tab > Clusters > Activated

  21. WordList – 2 words

  22. WordList – 3 words
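Two- and three-word lists are counts of recurring word sequences (n-grams). A minimal sketch follows, assuming a pre-tokenized text and a minimum frequency of 2; the `clusters` function and its parameters are hypothetical, and unlike a real tool this version does not stop at sentence boundaries.

```python
from collections import Counter

def clusters(tokens, n, min_freq=2):
    """Return (cluster, frequency) pairs for n-word sequences seen at least min_freq times."""
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(gram, count) for gram, count in grams.most_common() if count >= min_freq]

# Hypothetical sample, already lowercased and free of punctuation.
tokens = "brazil won the world cup in 2002 italy won the world cup in 2006".split()
two_word = clusters(tokens, 2)
three_word = clusters(tokens, 3)
print(two_word)
print(three_word)
```

As the glossary slide notes, such clusters are only candidates: "won the world" recurs here but is not a term or a phrase in its own right.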

  23. How to ignore undesired text By tagging: • Title • Subtitle • Figure • Date • URL • etc.
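Once titles, dates, URLs and similar material are wrapped in tags, they can be excluded before counting. The slides do not show the tag syntax used, so the sketch below assumes simple paired tags such as `<title>...</title>`; the tag names and the `strip_tagged` helper are hypothetical, and this is not how WordSmith Tools itself processes tags.

```python
import re

# Hypothetical tag names matching the categories listed on the slide.
IGNORED_TAGS = ("title", "subtitle", "figure", "date", "url")

def strip_tagged(text, tags=IGNORED_TAGS):
    """Remove each tagged section (tags included) and normalize whitespace."""
    for tag in tags:
        text = re.sub(
            rf"<{tag}>.*?</{tag}>", " ", text,
            flags=re.DOTALL | re.IGNORECASE,
        )
    return " ".join(text.split())

# Hypothetical tagged sample.
raw = (
    "<title>World Cup Final</title> <date>July 9, 2006</date> "
    "Italy beat France on penalties. <url>http://example.com</url>"
)
clean = strip_tagged(raw)
print(clean)
```

Only the running text is left for the word list, so headline vocabulary and dates no longer inflate the frequencies.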

  24. Adjusting Settings • Controller > Settings > Adjust Settings > Only part of file
