Close, Distant, and Scalable Reading

Glenn Roe & Martin Wynne

Digital.Humanities@Oxford Summer School

July 10 2013


Close Reading

Close reading: “operates on the premise that literature, as artifice, will be more fully understood and appreciated to the extent that the nature and interrelations of its parts are perceived, and that that understanding will take the form of insight into the theme of the work in question. This kind of work must be done before you can begin to appropriate any theoretical or specific literary approach.”

“[A] finely detailed, very specific examination of a short poem or short selected passage from a longer work, in order to find the focus or design of the work [...] the meaning of the microcosm, containing or signaling the meaning of the macrocosm (the longer work of which it is a part). To this end ‘close’ reading calls attention to all dynamic tensions, polarities, or problems in the imagery, style, literal content, diction, etc.”

http://theliterarylink.com/closereading.html


Close Reading as the paradigm for text-based humanities scholarship


But what do you do with a million books?

There are only about 30,000 days in a human life -- at a book a day, it would take 30 lifetimes to read a million books and our research libraries contain more than ten times that number. Only machines can read through the 400,000 books already publicly available for free download from the Open Content Alliance.

  • Gregory Crane, “What do you do with a million books?”, D-Lib Magazine, March 2006


And 5 million books?

We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of “culturomics” focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. “Culturomics” extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.

www.sciencexpress.org / 16 December 2010


Culturomics…


Distant Reading

Distant reading: where distance, let me repeat it, is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes—or genres and systems. And if, between the very small and the very large, the text itself disappears, well, it is one of those cases when one can justifiably say, less is more. If we want to understand the system in its entirety, we must accept losing something. We always pay a price for theoretical knowledge: reality is infinitely rich; concepts are abstract, are poor. But it’s precisely this ‘poverty’ that makes it possible to handle them, and therefore to know. This is why less is actually more.

Franco Moretti, “Conjectures on World Literature,” in Distant Reading, 2013.


Distant Reading

A canon of 200 novels, for instance, sounds very large for 19th-century Britain (and is much larger than the current one), but it is still less than 1% of the novels that were actually published […] and close reading won’t help here, a novel a day every day of the year would take a century or so … And it’s not even a matter of time, but of method: a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole.

Franco Moretti, Graphs, Maps, Trees: Abstract Models for Literary History, 2005


Digital Humanities and Distant Reading

The Humanities discovers data (DH 1.0 → DH 2.0)

Quickly leads to a “data deluge” (ars longa, vita brevis)

Big Data approaches to Humanities collections (e-Research)

From accelerated research to new knowledge discovery

digital > digitisation


Big Data and the Humanities

How Big is Big?

  • The Complete Works of Voltaire (Voltaire Foundation): 1,077 individual works, 6.7 million words

  • The Digital Encyclopédie of Diderot and d’Alembert (University of Chicago): 28 volumes in folio; 74,000 articles; 21.7 million words

  • Electronic Enlightenment (University of Oxford): 60,000 letters, 23 million words

  • ECCO-TCP (Oxford Text Archive): 2,300 volumes, 75 million words

  • ARTFL-Frantext (University of Chicago): 3,500 volumes, 215 million words

  • Early English Books Online EEBO (Northwestern University): 23,000 volumes, ~1 billion words


Matt Jockers,

University of Nebraska-Lincoln

Macroanalysis: Digital Methods and Literary History (UIUC Press, 2013)


Matt Jockers, Macroanalysis (2013).


Simon Raper, “Graphing the history of philosophy”


Distant Reading has a Long History:

  • Annales School, Book History, etc.

Counting, not reading:

  • After-death inventories

  • Library holdings/circulation records

  • Archives of publishers

  • Vocabulary of titles (Furet)

  • Censorship records

Martin, Furet, Darnton, Chartier, etc.


Robert Darnton, The Forbidden Best-Sellers of Pre-Revolutionary France (New York, 1995), 189.


From “distant” (not) reading to close reading and back again...

Digital Humanities as a locus for “scalable” reading practices

DATA: digitally assisted text analysis

Martin Mueller,

Northwestern


Digital Humanities as locus for “Scalable Reading”

By “not reading” we examine:

concordances, 

frequency tables, 

feature lists, 

classifications, 

collocation tables,

statistical models, networks, etc…

We can track:

Literary topoi (E.R. Curtius), concepts (R. Koselleck, Begriffsgeschichte), épistémès (M. Foucault) and other semantic patterns: over time, between categories, across genres.

So that distant reading and data-driven analysis can provide larger contexts for close reading(s) and traditional scholarship.
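Two of the "not reading" views listed above, the frequency table and the concordance, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the tokenizer is a naive regular expression, the sample sentence is invented, and the function names are hypothetical rather than taken from any of the tools discussed in this presentation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (naive regex tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_table(tokens, top=5):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)

def kwic(tokens, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Invented sample sentence, for illustration only.
sample = ("The state of the nation depends on the state of its laws, "
          "and the laws depend on the people.")
tokens = tokenize(sample)
print(frequency_table(tokens, top=3))
for line in kwic(tokens, "state"):
    print(line)
```

Each concordance line points back to a passage that can then be read closely, which is the sense in which these views give context to traditional reading.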


Digital Humanities as locus for “Scalable Reading”

Three primary areas of Digitally Assisted Text Analysis:

1. Computational/Corpus Linguistics

2. Information Retrieval

3. Text Mining and Data Visualization


Corpus Linguistics and Scalable Reading

Corpus

Concordance

Collocation

Sinclair, John, Corpus, Concordance, Collocation, Oxford University Press, 1991
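As a rough illustration of the third term, collocation, the sketch below counts the words that occur within a fixed window around a node word. The window size, the sample sentence and the function name `collocates` are assumptions made for this example. Raw co-occurrence counts are only a starting point; corpus linguists usually go on to apply an association measure, which is not shown here.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        # Words to the left and right of the node, excluding the node itself.
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts

# Invented sample sentence, for illustration only.
sample = ("Strong tea and strong coffee were served; "
          "the tea was strong but the coffee was weak.")
table = collocates(sample, "strong", window=2)
print(table.most_common())
```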


Some testable assertions

State

  • “...no political writer before the middle of the sixteenth century used the word 'state' in anything like its modern political sense [referring to the machinery of government and social control]” (Skinner, Quentin, The Foundations of Modern Political Thought, Cambridge University Press, 1978).

Tudor

  • “The idea of a "Tudor era" in history is a misleading invention, claims an Oxford University historian. Cliff Davies says his research shows the term "Tudor" was barely ever used during the time of Tudor monarchs.” (http://www.bbc.co.uk/news/education-18240901 May 2012)

Holocaust

  • “I will argue that “The Holocaust” is an ideological representation of the Nazi holocaust...Until recently, however, the Nazi holocaust barely figured in American life. Between the end of World War II and the late 60s, only a handful of books and films touched on the subject”. (Norman Finkelstein, The Holocaust Industry. Verso, 2000.)
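Claims of this kind concern how often a word was used in a given period, so a first test is to count the word per period in a dated corpus and normalise by the amount of text. The following is a minimal sketch, assuming the corpus is available as (year, text) pairs. The three documents are invented placeholders, not historical evidence, and `per_decade_frequency` is a hypothetical name. Counting word forms cannot by itself show in which sense a word was used; that still requires reading the hits.

```python
import re
from collections import defaultdict

def per_decade_frequency(documents, term):
    """Return {decade: occurrences of `term` per million tokens}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in documents:
        decade = (year // 10) * 10
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(term)  # whole-word matches only
    return {d: 1_000_000 * hits[d] / totals[d]
            for d in sorted(totals) if totals[d]}

# Invented placeholder documents, for illustration only.
documents = [
    (1521, "the prince kept his estate in good order"),
    (1528, "the king and his court"),
    (1593, "the state must govern and the state must defend"),
]
result = per_decade_frequency(documents, "state")
print(result)
```

Normalising per million tokens matters because the amount of surviving text differs greatly from one period to the next.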


A new opportunity

“It is not easy to justify assertions about the alleged frequency or infrequency of some particular belief or attitude in the past. How many examples does one need to cite in order to prove the point? Lacking any satisfactory method of quantifying these matters, all I can do is to record my impressions after long immersion in the period”.

Keith Thomas, The Ends of Life, Oxford University Press, 2010.


Intellectual History

“We cannot hope to understand the behaviour of people long dead, unless we can reconstruct the mental assumptions which led them to act as they did.”

- Keith Thomas, The Ends of Life, Oxford University Press, 2010.

Evidence:

Writing

Speech

Thoughts

Actions

Artefacts (art, architecture, cooking, etc.)

Other?


An objection (or two)

Isn't this just Googling stuff?

or

Isn't it just looking up words in online text collections?


The perils of interpretation

How do we interpret the results? We need to ask the questions:

What's in my corpus?

What's missing from the population of texts which the corpus is sampled from?

What claims can I make about results from this dataset?

What is the right tool for the job?

Will I successfully retrieve all occurrences of the word forms which I am looking for?

How can I make my search term more sophisticated?

What claims can I make about the significance of the frequencies?

How can I improve the process, and refine the results?

What do I need to investigate further?
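On the question of significance, one common approach in corpus linguistics is to normalise counts by corpus size and then compare two corpora with a log-likelihood (G2) statistic. The sketch below uses the simplified two-term form of that statistic. It is offered as one possible check, not as the method used by the presenters, and the counts in the usage lines are invented.

```python
import math

def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million tokens."""
    return 1_000_000 * count / corpus_size

def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood (G2) for one word's frequency in two corpora.

    Simplified two-term form: expected counts assume the word is
    spread across both corpora in proportion to their sizes.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Invented counts: 30 hits in 1,000 tokens versus 10 hits in 1,000 tokens.
print(per_million(30, 1000), per_million(10, 1000))
print(log_likelihood(30, 1000, 10, 1000))
```

A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom. Even then the statistic says nothing about whether the corpus is a fair sample of the population of texts, which is the earlier question on this list.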


DH Research and Development:

Full text search/retrieval

Tool development

Text mining approaches

PhiloLogic search engine

Distant > Scalable Reading


Information Retrieval: PhiloLogic search engine

Open source full-text search and analysis system based on traditional models of humanistic textual scholarship.

Used worldwide by a number of teams independently of its French roots:

Perseus under PhiloLogic - Greek and Latin Library

The École des Chartes in Paris - medieval charters, etc.

Brown Women Writers Project - heavy TEI encoding -- (Early Modern Women's Studies and The Scholarly Technology Group of Brown University)


Information Retrieval: PhiloLogic search engine

Maison de Balzac in Paris (scholarly on-line edition of Balzac's Comédie humaine)

Abraham Lincoln Digitization Project at Northern Illinois University

Indica et Buddhica - Sanskrit texts compiled by an independent scholar in New Zealand

Alexander Street Press, a commercial on-line publisher. Many collections of large data sets, including a large collection of Black drama (about 1,200 plays)


Information Retrieval: PhiloLogic search engine

PhiloLogic3's general features include:

Word and phrase searching:

  • Proximity searches in sentences and paragraphs.

  • Similarity searches - fuzzy matching (wildcards*)

Corpus definition using rich metadata at the document and sub-document level (Author, Title, Dates, Genre, etc.)

A variety of advanced reporting features:

  • Concordances

  • KWIC (Keyword in Context)

  • Frequency distributions per period/work/author, etc.

  • Collocations and collocation tables
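To make two of the search features above concrete, the sketch below imitates a wildcard search and a sentence-level proximity search in plain Python. It does not use PhiloLogic or reproduce its query syntax. The function names, the naive sentence splitter and the sample text are all assumptions made for illustration.

```python
import fnmatch
import re

def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    """Lowercase word tokens of a sentence or text."""
    return re.findall(r"[a-z']+", sentence.lower())

def wildcard_search(text, pattern):
    """Return the distinct word forms matching a shell-style pattern such as 'libert*'."""
    return sorted({w for w in words(text) if fnmatch.fnmatchcase(w, pattern)})

def cooccur_in_sentence(text, first, second):
    """Return the sentences that contain both words (sentence-level proximity)."""
    return [s for s in split_sentences(text)
            if first in words(s) and second in words(s)]

# Invented sample text, for illustration only.
sample = ("Liberty is the right to do what the law permits. "
          "The law protects liberties. Tyranny knows no law!")
print(wildcard_search(sample, "libert*"))
print(cooccur_in_sentence(sample, "liberty", "law"))
```

The wildcard result shows why checking the matched word forms matters: a single pattern can retrieve several forms, and each may need a separate look.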


"From words to works": Extensions to PhiloLogic

PhiloMine: machine learning & text mining package

Open Source: http://code.google.com/p/philomine/

PhiloLine/PAIR: sequence alignment algorithms for text comparison

Open Source: http://code.google.com/p/text-pair/


Different Similarities, Different Searches

  • Computing similarity is what enables search/retrieval

  • Different kinds of similarity, different levels of text objects, different kinds of search

  • PhiloLogic finds and analyses word occurrences

  • PhiloMine compares texts using word vectors to find topical or stylistic similarity

  • PAIR sequence alignment compares texts using ordered sequences of words to identify text reuse
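The idea of comparing texts as word vectors can be illustrated with cosine similarity over raw word counts. This is a generic sketch and not PhiloMine's implementation. The weighting scheme (plain counts, no stop-word removal), the function names and the three short texts are assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

def word_vector(text):
    """Represent a text as a bag of words: {word: count}."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two count vectors (0.0 when either is empty)."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[w] * vec_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented placeholder texts, for illustration only.
doc_a = word_vector("reason and nature")
doc_b = word_vector("nature and reason")
doc_c = word_vector("ships sail west")
print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Because word order is discarded, the first two texts count as identical. That is what makes this representation suited to topical or stylistic comparison and unsuited to detecting reused passages.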


Styles of Search

Standard search

  • Find all occurrences of a word (PhiloLogic)

  • Find webpages about a word or concept (Google)

Comparison queries (PhiloMine)

  • Find differences between sets of documents

  • Find mislabeled documents

  • Find similar documents

"Unsupervised" search

  • Segment a document into topical chunks

  • Cluster documents into cohesive groups (PhiloMine)

  • Find repeated text in a corpus (PAIR)
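Finding repeated text depends on ordered sequences of words, not unordered counts. A very reduced version of that idea is to collect the word n-grams ("shingles") of two texts and intersect them. The sketch below does only this. It is not the PAIR algorithm, which performs sequence alignment, and the function names and the second sample sentence are invented for illustration.

```python
import re

def shingles(text, n=3):
    """Return the set of overlapping word n-grams in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_passages(text_a, text_b, n=3):
    """Return the n-grams that occur in both texts, as sorted strings."""
    common = shingles(text_a, n) & shingles(text_b, n)
    return sorted(" ".join(gram) for gram in common)

source = "Man is born free, and everywhere he is in chains."
# Invented borrowing sentence, for illustration only.
borrower = "It was once written that man is born free, yet few believe it."
print(shared_passages(source, borrower))
```

Overlapping n-grams such as these would normally be merged into longer aligned passages before being shown to a reader.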


Distant reading at close range or close reading at a distance


Distant reading at close range: Voyant Tools


Distant reading at close range or close reading at a distance

Martin’s slides: some examples from Voyant, bring it all together?

Close/distant/scalable reading

corpus linguistics

info retrieval (full-text search/analysis)

text mining and data viz.

DATA – Digitally Assisted Text Analysis


Tools for scalable reading

