
HTRC Use Cases


oprah-cline




Presentation Transcript


  1. HTRC Use Cases

  2. HathiTrust Corpus Usage Patterns (diagram; label: HathiTrust Corpus)

  3. HathiTrust Corpus Usage Patterns (cont’d) (diagram; labels: HathiTrust Corpus, Chapter 1, Page IV, Table of Contents)

  4. Word Counts from HTRC Sample*
     • Top 10 words
       • the (1,092,274,158)
       • of (729,347,125)
       • and (515,034,460)
       • to (429,304,807)
       • in (337,513,888)
       • a (315,487,516)
       • that (167,847,940)
       • is (163,694,582)
       • was (138,907,857)
       • I (123,743,522)
     • Bottom 10 tokens • ¿°‘» • ¿°­¿ • ¿°° 1 ¿¦ • ¡••••••««• • ¡•••■•• • ¡►♦» • ¡—— • ¡„¡ • ¡■° 1 ¡•¦ 1 ¡►
     *Public Domain non-Google digitized HT materials, 250,000 volumes
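     A minimal Python sketch of how a most-frequent / least-frequent token list like the one on slide 4 could be produced. The slides do not say how HTRC tokenized the sample, so the whitespace tokenization here is an assumption, and the function name `top_and_bottom_tokens` is hypothetical.

```python
from collections import Counter


def top_and_bottom_tokens(texts, n=10):
    """Return the n most frequent and n least frequent tokens.

    Assumption: tokens are split on whitespace and case is preserved.
    The tokenizer actually used for the HTRC sample is not stated.
    """
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    ranked = counts.most_common()  # sorted by count, highest first
    return ranked[:n], ranked[-n:]


if __name__ == "__main__":
    sample = ["the cat and the dog", "the end"]
    top, bottom = top_and_bottom_tokens(sample, n=2)
    print("top:", top)
    print("bottom:", bottom)
```

     With uncorrected OCR text, the low-frequency end of such a ranking tends to fill with garbled character runs, which matches the bottom-token list on the slide.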

  5. OCR Corrections on HTRC Sample

  6. HTRC Online Tools for Simple Analysis

  7. Tag Cloud Viewer

  8. Topic Modeling • Uses MALLET Topic Modeling to cluster • Top 8 topics showing at most 200 keywords for that topic
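     A rough Python sketch of the display rule on slide 8 (top 8 topics, at most 200 keywords each). It does not call MALLET. It assumes the topic model's output has already been loaded into a plain dictionary of topic id to word weights, and that topics are ranked by total weight. Both the data layout and the name `summarize_topics` are hypothetical.

```python
def summarize_topics(topic_word_weights, max_topics=8, max_keywords=200):
    """Pick the heaviest topics and their highest-weighted keywords.

    topic_word_weights: dict mapping topic id -> {word: weight}.
    Assumption: a topic's rank is the sum of its word weights.
    """
    ranked_topics = sorted(
        topic_word_weights.items(),
        key=lambda item: sum(item[1].values()),
        reverse=True,
    )[:max_topics]

    summary = []
    for topic_id, weights in ranked_topics:
        # Sort words by descending weight and cap the list length.
        keywords = sorted(weights, key=weights.get, reverse=True)[:max_keywords]
        summary.append((topic_id, keywords))
    return summary


if __name__ == "__main__":
    toy = {
        0: {"ship": 5, "sea": 3},
        1: {"love": 1},
        2: {"war": 10, "army": 4, "king": 2},
    }
    for topic_id, keywords in summarize_topics(toy, max_topics=2, max_keywords=2):
        print(topic_id, keywords)
```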

  9. Concept Mapping • Sentiment Analysis • six core emotions (Love, Joy, Surprise, Anger, Sadness, Fear)

  10. Correlation-Ngram Viewer

  11. Visualization for Extracted Entities • Location Entity to Google Map • Network Analysis • Date Entity to Simile Timeline (SEASR Project, UIUC, http://seasr.org)

  12. Named Entity (NE) Tagging • Example text: "Mayor Rex Luthor announced today the establishment of a new research facility in Alderwood. It will be known as Boynton Laboratory." • Tags: NE:Person, NE:Time, NE:Location, NE:Organization (SEASR Project, UIUC, http://seasr.org)

  13. Metadata Enrichment • Gender • Genre • Structural (chapters, front matter, indexes, bibliographies) • Part-of-Speech (POS) tagging. Example source: http://www.stanford.edu/~mjockers/cgi-bin/drupal/node/17
