What Humanists Need to Know About Computing (and Computer Science)

Presentation Transcript


  1. What Humanists Need to Know About Computing (and Computer Science) • Nancy Ide, Department of Computer Science, Vassar College • The Humanities Computing Curriculum • Nanaimo, British Columbia, Canada • November 9-10, 2001

  2. The Big Question • What is Humanities Computing? • Any Humanist using a computer? • Any Humanist using data relevant to his or her field that is stored on a computer? • Any Humanist creating data relevant to his or her field to be stored on a computer? • Any Humanist using an algorithmic process to analyze data stored on a computer? • Any Humanist creating an algorithmic process to analyze data stored on a computer?

  3. Any Humanist using a computer? • This is certainly too broad to serve as a definition (these days) • Would include • Word processing • Web access • Email

  4. Any Humanist using data stored on a computer? • This is better, but maybe still too imprecise • Search/retrieval from text, images, etc. • Searching a corpus for occurrences of a word, syntactic pattern, etc. • Searching a digitized image for patterns • Web access?

  5. Any Humanist creating data to be stored on a computer? • Getting better… • Text encoding • Creation of corpora, lexicons, concordances, etc. • Digitized images • Databases • Hypertext/hypermedia • But do we include: • electronic publishing • creation of web pages, on-line course materials, etc.?

  6. Any Humanist using an algorithmic process for analysis? • "Algorithmic process" = computer program (beyond search/access) • Includes use of: • Statistical routines • Named entity recognizers, part-of-speech taggers, syntactic analyzers, etc. • GIS, spatial modeling routines, etc. • We seem to be on more solid ground here…

  7. Any Humanist creating an algorithmic process? • E.g., writers of text analysis software • Probably others, but not many come to mind… • This would seem to be a relatively small segment of the Humanities Computing community

  8. What Does That Leave Us With? [Diagram: HUMANITIES COMPUTING in relation to Data creation, Data use, Algorithm use, Algorithm creation, and Computer use]

  9. In Terms of Percentages…? [Chart: Data creation, Data use, Algorithm use, Algorithm creation]

  10. A Brief Historical Aside • Humanities Computing in the 1960's • Indistinguishable from "computational linguistics" • Use of statistics to analyze language • Concordance creation, dictionary creation, corpus creation

  11. Humanities Computing in the 1970's • Computational linguistics (CL) embraced the symbolic approach and abandoned (even scorned) statistical analysis, which became the province of Humanities Computing (HC) • Stylistic analysis, authorship studies, literary analysis • Creation of resources (corpora, lexicons, concordances) continues • First development of software for text analysis

  12. Humanities Computing in the 1980's • More of the same • Electronic scholarly editing becomes big • But the PC introduced a new contingent: • Word processing • Computer-assisted learning • Late '80's: Two major events • The Text Encoding Initiative (TEI) • Computational linguistics re-discovers statistics and language resources

  13. The TEI establishes text encoding as a core activity of HC • CL's embrace of statistics and resource building blurs the distinction between HC and CL • Stronger computational skills in the CL community enable them to "steal" much previous HC work and take it farther

  14. Humanities Computing in the 1990's • Text encoding still major focus • Word processing, computer-assisted learning drop out • Addition of several others due to increased computational power: • Digital images • Hypertext/hypermedia • Digital libraries • Advanced modeling tools • Web-based work • Electronic publishing

  15. Now • CL has largely taken over development of statistical methods for language analysis, including HC staples such as authorship, stylistics • Also taking over some major kinds of resource creation (corpora, lexicons, etc.) • CL working on text encoding as well, esp. in context of W3C developments (XML, RDF, Semantic Web work) • Are these things still a part of Humanities Computing?

  16. [end of aside]

  17. So, What Do Humanists Need to Know About Computing? • My Previous Argument • Back in the mid-1980's, I argued that Humanists needed to know how to write computer programs • The chart on the earlier slide suggests this is probably not the case anymore • My Current Argument • The fundamental intellectual skills I was concerned about are still what is needed

  18. Data Use • A lot of this is search/retrieval • What does this require? • Know what you are looking for and how it is instantiated in the data • Example: looking for certain imagery in a text • First have to define “imagery” -- think in terms of character patterns • Does data include lemmas? • Example: Searching a database • What is in the DB and how is it structured?

  19. Know how to formulate your query in precise terms • Rudimentary knowledge of Boolean logic • Sometimes, knowledge of a query language • E.g., SQL for databases • More generally, knowledge of tools for access and retrieval, and what they can and cannot do
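
The query skills named on this slide can be made concrete with a small sketch. What follows is an illustration only, not material from the talk: it uses Python's standard `sqlite3` module, and the table `poems`, its columns, and the helper names `build_demo_db` and `blake_or_late` are hypothetical choices made for the example. It shows AND and OR combined in a single SQL query.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE poems (title TEXT, author TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO poems VALUES (?, ?, ?)",
        [
            ("The Tyger", "Blake", 1794),
            ("London", "Blake", 1794),
            ("The Lamb", "Blake", 1789),
            ("Kubla Khan", "Coleridge", 1816),
        ],
    )
    return conn


def blake_or_late(conn):
    """Titles that are (by Blake AND after 1790) OR after 1800, sorted by title."""
    rows = conn.execute(
        "SELECT title FROM poems "
        "WHERE (author = ? AND year > ?) OR year > ? "
        "ORDER BY title",
        ("Blake", 1790, 1800),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(blake_or_late(build_demo_db()))
```

The parentheses make the intended grouping explicit. Regrouping the condition as `author = ? AND (year > ? OR year > ?)` would exclude the Coleridge poem, which is the kind of difference a precise query formulation has to anticipate.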

  20. Data Creation • Fundamentally, a data modeling problem • Identification of the objects in the data and their properties • Decomposition into sub-components • Identification of the relations among the objects • Structural relations: Inclusion? Super-set? Overlap? Parallel? • Logical relations: e.g., "author-of" may be a relation between a "title" object and a "name" object
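
One way to see the distinction between objects, properties, and relations is to write a toy model down. The sketch below is an assumption-laden illustration, not the speaker's model: the class names `Name`, `Title`, and `Catalogue`, and their fields, are hypothetical. Objects carry properties, and the "author-of" relation is stored explicitly as pairs linking a name object to a title object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    """A "name" object with two properties."""
    first: str
    last: str


@dataclass(frozen=True)
class Title:
    """A "title" object with one property."""
    text: str


@dataclass
class Catalogue:
    """Holds the logical relation "author-of" as explicit (Name, Title) pairs."""
    author_of: list = field(default_factory=list)

    def add(self, name: Name, title: Title) -> None:
        self.author_of.append((name, title))

    def titles_by(self, name: Name) -> list:
        # Follow the relation from a name object to the titles it is linked to.
        return [title.text for n, title in self.author_of if n == name]


if __name__ == "__main__":
    catalogue = Catalogue()
    blake = Name("William", "Blake")
    catalogue.add(blake, Title("Songs of Innocence"))
    print(catalogue.titles_by(blake))
```

Keeping the relation separate from the objects means a new relation (say, "editor-of") can be added later without changing what a name or a title is.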

  21. What Does One Need to Know? • The obvious: • XML, XSL/XSLT, XML schemas (+ tools) • RDF, RDF schemas • Familiarity with Semantic Web work (ontologies) • TEI, EAGLES/ISLE XML Corpus Encoding Standard • The not-so-obvious: • Data modeling principles • Component analysis • Identification of components vs. relations vs. properties

  22. Encoding options • E.g., nested tags (implicit relation) vs. link (explicit relation) • Many documents vs. one • Why it may or may not matter given XSLT and RDF • [Diagram] BOOK • PARTS: FRONTMATTER, CHAPTERS (CHAPTER, PARAGRAPHS, PARAGRAPH), BACKMATTER • RELATIONS: TITLE, AUTHOR, PUBLISHER • PROPERTIES: PUBLISHED/UNPUBLISHED, MONOGRAPH/EDITED VOLUME
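
The difference between a nested tag and a link is easier to judge with both encodings side by side. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element and attribute names (`book`, `author`, `person`, `id`, and the `author` attribute used as a pointer) are invented for the example and are not taken from the TEI or any other scheme.

```python
import xml.etree.ElementTree as ET

# Implicit relation: the author is nested inside the book it belongs to.
NESTED = "<book><author>W. Blake</author><title>Songs of Innocence</title></book>"

# Explicit relation: the book points to a separately encoded person by id.
LINKED = """<collection>
  <person id="p1">W. Blake</person>
  <book author="p1"><title>Songs of Innocence</title></book>
</collection>"""


def author_nested(xml_text):
    """Read the author from a child element of the book."""
    root = ET.fromstring(xml_text)
    return root.findtext("author")


def author_linked(xml_text):
    """Resolve the book's author attribute against the person elements."""
    root = ET.fromstring(xml_text)
    people = {p.get("id"): p.text for p in root.iter("person")}
    book = root.find("book")
    return people[book.get("author")]


if __name__ == "__main__":
    print(author_nested(NESTED), "|", author_linked(LINKED))
```

Both encodings yield the same answer; the linked form costs one extra lookup but lets several books share a single person record.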

  23. The TEI Guidelines represent one of the most extensive data modeling efforts ever • World Wide Web Consortium developments like RDF schemas and work on the Semantic Web take us up another level • Powerful mechanisms for specifying relations and properties • “Object-oriented” model of class membership, inheritance of properties, named relations, etc. • Instantiated objects can be an element in a document, a whole document or collection, etc. • Semantic Web work is defining ontologies for web data, which will enable inferencing etc. over objects and their relations • Simple example: if a person is the author of a government document, we can deduce that he/she is a government official

  24. Algorithm Use • First, need a good survey of what is out there and how to get it • Need sound idea of what the algorithm does • Need a very good idea of what the input and output mean in terms of the algorithm • “garbage in, garbage out” • “use the appropriate model”

  25. Example: Statistics • On the one hand: • Minimally, need to know what things like principal components analysis, Pearson correlation, etc. are intended to tell you • Have to have some knowledge of randomness/chance vs. reliable confidence levels, etc. • On the other: • Have to understand what your input is/needs to be (formal representation) • Have to be able to interpret output • Have to know when it does and does not make sense to use statistical methods, in terms of humanities goals
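
As an illustration of knowing what a statistic is "intended to tell you", here is a hand-written Pearson correlation in plain Python. It is a sketch for study rather than a replacement for a statistics package, and the function name `pearson` is simply chosen for the example. The coefficient is the covariance of the two samples divided by the product of their spreads, so it ranges from -1 (perfect inverse linear relation) to 1 (perfect linear relation).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of summed squared deviations (proportional to std. deviations).
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

The explicit errors reflect the slide's point about input: the measure is meaningless for a constant sample, and it only captures linear association, so a value near zero does not by itself show that two variables are unrelated.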

  26. Need to know what tools exist, how to get them • Need to learn how to use multiple tools to accomplish a task • E.g., many programs for automatic part-of-speech tagging, shallow syntactic bracketing, etc., are available for free; WordNet is a free resource containing information about word relations (synonymy, hypernymy) • Could use a sequence of such programs to start with a “raw” text and perform various kinds of semantic analysis

  27. Algorithm Creation • Need to know how to program in some useful language • Need to be very familiar with at least one operating system (preferably UNIX/LINUX) • BUT THIS IS ONLY A START…

  28. Most important: Master principles of program/software design • More generally, this is a way of thinking about problems and their solutions • Abstraction over concepts • Modularity, generalization • Concepts like recursion • Sound data structuring practices • Good languages to develop this: • LISP/Scheme • Perl • Java (maybe C++ if one is disciplined…)
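
Recursion, mentioned on this slide, pairs naturally with nested document structure. The following sketch is written in Python rather than one of the languages the slide lists, and the representation is an assumption made for the example: a string stands for a paragraph and a list stands for any container (book, chapter). One short function then handles any depth of nesting.

```python
def count_words(node):
    """Count words in a nested document.

    A string is treated as a paragraph; a list is a container of further nodes.
    """
    if isinstance(node, str):
        # Base case: a paragraph contributes its whitespace-separated words.
        return len(node.split())
    # Recursive case: a container contributes the total of its children.
    return sum(count_words(child) for child in node)


# Hypothetical sample: front matter followed by two chapters of paragraphs.
BOOK = [
    "A short preface",
    ["First paragraph here", "Second one"],
    ["Only paragraph"],
]

if __name__ == "__main__":
    print(count_words(BOOK))
```

The function never mentions chapters or sections by name, which is the abstraction at work: the same code applies unchanged if another level of structure is added.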

  29. Bottom Line • Humanists are not (necessarily) trained to think formally about problems and their solutions • For activities we consider to be “humanities computing” at any level, it is necessary to formalize concepts that we may not be used to formalizing

  30. Some things are easier to formalize, even when we haven’t done so before • E.g., basic data models for document types • Other things are harder: • E.g., a formal specification of “imagery” that the computer can find in a text

  31. My Current Argument • Humanities Computing Curricula should have as the ultimate goal the development of intellectual skills as well as computational skills • Specifically, formalization of “data”, “problem”, and “solution” • This may be a new way of thinking for many, or a kind of thinking that needs to be more fully developed

  32. How do we develop these skills? • By doing, all the way up the line from using a computer for basic tasks through data creation through programming • But we already do that…

  33. So what’s new? • Typically these intellectual skills are expected to develop “bottom-up” • Most humanists do not go far enough to reach the “top” on their own, so never see the “big picture” • Their computer science skills--in terms of principled problem statement, abstraction, etc.--do not develop adequately • You can end up with messy (i.e., unreusable, non-extensible) code, badly formulated problems and badly applied algorithms (yielding unreliable results), and data that is much harder than it should be to use for tasks other than the one for which it was designed

  34. HC Curricula Need to Develop an Approach that is at once “top-down” (developing intellectual skills) and “bottom-up” (developing practical skills and knowledge) • The driving force of the curriculum should be exercise in formalization

  35. A Few Suggestions for HC Curricula • Don’t hesitate to have students put pen to paper before getting in front of the computer • Example: Students take a problem in their own discipline and “translate” it into formal terms, on paper • Then implement

  36. “I’m interested in Blake’s imagery” • What is an “image”? • Are there text patterns that realize images according to your definition? • Are there patterns of, for example, their distribution across the text that can tell you anything about it? • Do you need to have your text lemmatized? Tagged for part-of-speech? Would knowing synonyms, hypernyms, etc. help in any way?

  37. “I want to organize all my information about [some historical figure] so I can get to specific pieces of information directly, explore connections, etc.” • Give an overview of the relational DB model • Students design relational DB for their data on paper • Enter into a standard DB application • See where problems lie • Can you do the queries you want?

  38. “I want to create a corpus of [some literary figure’s] poetry” • Develop a data model on paper • Think about different “views” of the data and ramifications of encoding choices • E.g., I see names as representing some person associated with other features, properties • Linguist sees it as a proper noun • Can I encode this so as to make it easier to see both views? • Instantiate model as an XML schema, test on a small sample (parse)

  39. Give Students a Peek Under the Hood • Provide information on how computers work internally, how data is structured, accessed, etc. • Show them some real code (e.g. Java, Perl, LISP…), have them “read” and follow it • Make sure the examples embody sound programming principles, modularity, good data structures, etc.--and point this out! • Show them how digital images are stored, accessed at the gut level of the machine • Etc. -- the more the better

  40. Teach good programming strategies even if students will never program • Top-down, modular design, abstraction, generalization are mental disciplines that can/should be applied everywhere • This is the art of computer science! • Teach these things as a methodology • Start small, test, add…etc. • E.g. data model for corpus--build XML schema in stages (encode, test) based on model

  41. Don’t hesitate to have students perform exercises that are not directly relevant to what they want to do, if it will increase their facility with problem formulation, data organization, etc. • Intellectual skills develop with practice

  42. Show Students How to Put Pieces Together • Play around with UNIX and UNIX tools like grep, cut, sort, uniq, wc, awk, etc. and see how much they can accomplish • E.g., a frequency dictionary for a text: use awk to isolate each word on its own line, then sort | uniq -c • They’ll see the magic as well as the “bugs”--the things that are treated as a “word”
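
The frequency-dictionary exercise can also be tried in a few lines of Python. This is an analogue of the idea, not the awk-based pipeline the slide describes, and the tokenizing rule (runs of lowercase letters and straight apostrophes) is a deliberately naive assumption chosen to expose the same kind of "bugs". The function name `word_frequencies` is hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count tokens, where a token is a run of letters a-z or apostrophes.

    The rule is intentionally crude: it lowercases everything, splits
    hyphenated words, and keeps possessives as separate entries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    counts = word_frequencies("the Tyger's eye; the tyger")
    for word, n in counts.most_common():
        print(n, word)
```

In the sample, "tyger's" and "tyger" are counted as two different words, which is exactly the sort of outcome that prompts the question of what should be treated as a "word".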

  43. Show Students How to Get the Pieces • Our most valuable resource is the WWW: we can find all sorts of freely distributed tools and resources • Students should do this as second nature • MY OPINION: The most pervasive problem in HC is a lack of awareness/exploitation of tools and resources, and of work done by others considered to be out of the field

  44. Another Exercise: Ontology Building • One of the most important activities to which Humanists are well-placed to contribute is the development of domain-specific ontologies • This includes “meta-data” • Will be used to build the Semantic Web

  45. Exercise • Students build an ontology for some domain or sub-domain • [Diagram: NOUN, PROPER NOUN, and NAME, with one link marked “?”; kind-of NAME: EVENT-NAME, PERSON-NAME, PLACE-NAME, ORGANIZATION-NAME; part-of: TITLE, FIRST-NAME, MIDDLE-NAME, LAST-NAME]
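
A toy version of such an ontology can be held in two small tables, one per relation. The sketch below rests on a reading of the slide's diagram that is partly guesswork: it assumes the four name types are kinds of NAME, that PROPER NOUN is a kind of NOUN, and that the four parts belong to PERSON-NAME. The link between NAME and PROPER NOUN, which the slide marks with a question mark, is deliberately left out. The function names are hypothetical.

```python
# child -> parent for the "kind-of" relation (assumed reading of the diagram).
KIND_OF = {
    "PROPER NOUN": "NOUN",
    "EVENT-NAME": "NAME",
    "PERSON-NAME": "NAME",
    "PLACE-NAME": "NAME",
    "ORGANIZATION-NAME": "NAME",
}

# part -> whole for the "part-of" relation (assumed to attach to PERSON-NAME).
PART_OF = {
    "TITLE": "PERSON-NAME",
    "FIRST-NAME": "PERSON-NAME",
    "MIDDLE-NAME": "PERSON-NAME",
    "LAST-NAME": "PERSON-NAME",
}


def is_kind_of(concept, ancestor):
    """Follow kind-of links upward; True if ancestor is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = KIND_OF.get(concept)
    return False


def parts_of(whole):
    """List the parts recorded for a concept, in alphabetical order."""
    return sorted(part for part, w in PART_OF.items() if w == whole)


if __name__ == "__main__":
    print(is_kind_of("PERSON-NAME", "NAME"), parts_of("PERSON-NAME"))
```

With the questioned link omitted, `is_kind_of("PERSON-NAME", "NOUN")` is false; adding `"NAME": "PROPER NOUN"` to `KIND_OF` would make it true, which shows how a single modeling decision changes what can be concluded.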

  46. RDF Schema • Instantiate as an RDF Schema, using a freely available tool • Don’t need to know RDF syntax • Focus on the objects and relations being described

  47. Beyond Ontologies • Inferencing over ontologies can enable discovery of implicit relations, show inconsistencies • Exercise: Have students represent their ontologies in a standard (free) logic system (e.g., CLASSIC), query

  48. E.g., query a set of historical documents: “give me all the government officials between 1860-1865” • The system does not have this information directly, but it can deduce that the author of a government document is a government official, and so pick all authors of gov’t docs between 1860-65 • Welty & Ide (1999). Using the right tools: Enhancing retrieval from marked-up documents. Computers and the Humanities 33:1-2, Special Issue on the Tenth Anniversary of the Text Encoding Initiative
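
The deduction in this example can be imitated with a single hand-coded rule. The sketch below is plain Python, not the CLASSIC system or the approach of the cited paper, and the document records and names in it are invented placeholders. The rule applied is the one stated on the slides: the author of a government document is taken to be a government official.

```python
# Hypothetical records: nothing here says directly who is an official.
DOCUMENTS = [
    {"author": "A. Clerk", "kind": "government", "year": 1862},
    {"author": "B. Poet", "kind": "literary", "year": 1863},
    {"author": "C. Minister", "kind": "government", "year": 1871},
]


def government_officials(documents, start, end):
    """Infer officials: authors of government documents dated start..end inclusive."""
    return sorted(
        {
            doc["author"]
            for doc in documents
            if doc["kind"] == "government" and start <= doc["year"] <= end
        }
    )


if __name__ == "__main__":
    print(government_officials(DOCUMENTS, 1860, 1865))
```

A logic system would state the rule once and derive such answers for any query; here the rule is buried in the function body, which is the limitation that motivates representing the ontology formally.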

  49. And Not Least Important… • Students need to recognize the limitations of formalization • This should always be at the back of the instructor’s agenda • Students need to explore expanding the limitations of formalization • Think hard about new ways to formally represent or analyze sometimes very “non-formal” things

  50. Final Words • Humanists need to learn computing skills based on Computer Science • TEACH STUDENTS HOW TO LEARN • Provide them with intellectual skills
