
Authorship Attribution

By Allison Pollard

What is Authorship Attribution?
  • The process of determining who wrote a text when its authorship is unclear.
  • It is useful when two or more people claim to have written something, or when no one is willing (or able) to state that (s)he wrote the piece.
The Basis
  • A text makes use of all linguistic domains: semantics, syntax, lexicography, phonology (orthography) and morphology. Each of these domains is rule-governed, yet within these rules and among the components, the grammar offers the writer choices.
  • The text as an end product is an outcome of the particular choices taken by its author. This is why each specific text carries the fingerprints of its creator.
The Assumptions:
  • there is a specific single author
  • there are choices to be made
  • the author is consistent in his/her preferred choices
  • these choices are present and could be detected in all end products of that creator
Computerized Analysis
  • Developed in the 1980s
  • Based on stylometry, the statistical analysis of literary style, which quantifies features of an author’s style
Method 1: Word or Sentence Length
  • The origin of stylometry
  • First developed in 1887, later extended in 1938
  • NOT reliable as attribution methods
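The two measurements named on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the letters-only tokenizer, and the naive sentence splitter are assumptions made for the example, not part of the original presentation.

```python
import re
from collections import Counter

def word_length_profile(text):
    """Relative frequency of each word length, counted in letters."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return {}
    counts = Counter(len(w) for w in words)
    total = len(words)
    return {length: n / total for length, n in sorted(counts.items())}

def mean_sentence_length(text):
    """Average words per sentence, splitting naively on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"[A-Za-z]+", s)) for s in sentences]
    return sum(word_counts) / len(word_counts)

sample = "The cat sat. The dog ran away quickly!"
print(word_length_profile(sample))   # {3: 0.75, 4: 0.125, 7: 0.125}
print(mean_sentence_length(sample))  # 4.0
```

Two texts could then be compared by looking at how far their profiles diverge, which is the kind of comparison the slide describes as unreliable.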
Method 2: Function Words
  • Relies on word usage and context-free (“function”) words
  • Analyze frequency, position, or immediate context of words
  • Criticized because it cannot reliably distinguish between certain types of literature
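A frequency-based version of this method might look like the sketch below. The word list, the per-1000-words scaling, and the absolute-difference distance are all assumptions chosen for illustration; the slides do not say which words or which distance measure were used.

```python
import re
from collections import Counter

# Illustrative function-word list (an assumption, not taken from the slides).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "upon", "by", "on", "while")

def function_word_rates(text, vocabulary=FUNCTION_WORDS):
    """Occurrences of each function word per 1000 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {w: 0.0 for w in vocabulary}
    counts = Counter(tokens)
    return {w: 1000 * counts[w] / len(tokens) for w in vocabulary}

def profile_distance(rates_a, rates_b):
    """Sum of absolute differences between two rate profiles."""
    return sum(abs(rates_a[w] - rates_b[w]) for w in rates_a)

known = function_word_rates("The fate of the union depends upon the people")
disputed = function_word_rates("While the people wait, the union is in danger")
print(profile_distance(known, disputed))
```

A smaller distance suggests a closer match in function-word habits, but it is a hint rather than proof of common authorship.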
Method 3: Vocabulary Distributions
  • Measuring the “richness” or “diversity” of an author’s vocabulary
  • Analyzes the frequency profile of word usage to estimate the extent of the author’s vocabulary
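Vocabulary richness can be quantified in several ways. The sketch below computes three common measures: the type-token ratio, the number of words used exactly once (hapax legomena), and Yule's K, taken here as 10^4 * (sum of i^2 * V_i - N) / N^2, where V_i is the number of distinct words occurring i times and N is the token count. The choice of these three measures and the function name are assumptions for the example; the slides do not name a specific statistic.

```python
import re
from collections import Counter

def vocabulary_richness(text):
    """Return type-token ratio, hapax count and Yule's K for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return {"type_token_ratio": 0.0, "hapax_legomena": 0, "yules_k": 0.0}
    freq = Counter(tokens)              # word -> number of occurrences
    spectrum = Counter(freq.values())   # i -> number of words occurring i times
    second_moment = sum(i * i * v for i, v in spectrum.items())
    return {
        "type_token_ratio": len(freq) / n,
        "hapax_legomena": spectrum[1],
        "yules_k": 1e4 * (second_moment - n) / (n * n),
    }

stats = vocabulary_richness("a rose is a rose is a rose")
print(stats)
```

A higher type-token ratio indicates a more varied vocabulary, while a higher Yule's K indicates more repetition.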
Method 4: Content Analysis
  • Tabulates the frequency of types of words in a text
  • Aims to reach the denotative or connotative meaning of the text
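Tabulating word types can be sketched as counting how many tokens fall into each category of a dictionary. The categories and word lists below are hypothetical and far smaller than any real content-analysis dictionary; they only show the mechanics.

```python
import re
from collections import Counter

# Hypothetical category dictionary, invented for this example.
CATEGORIES = {
    "positive": {"good", "great", "happy"},
    "negative": {"bad", "sad", "angry"},
    "self": {"i", "me", "my"},
}

def category_frequencies(text, categories=CATEGORIES):
    """Count how many tokens of the text belong to each category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    return {name: counts[name] for name in categories}

result = category_frequencies("I am happy, but my dog is sad and angry.")
print(result)  # {'positive': 1, 'negative': 2, 'self': 2}
```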
Method 5: Neural Networks
  • Recognize the underlying organization of data, which is vital for any pattern recognition problem, including stylometry
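The slides do not describe a network architecture, so the sketch below uses a single perceptron, the simplest neural unit, as a stand-in. It learns to separate two authors from two stylistic features. The feature values are made-up toy numbers (for instance, scaled rates of two function words), and all names are assumptions for the example.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Classic perceptron rule: adjust weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Toy data: label 0 = author A, label 1 = author B (invented values).
samples = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]]
labels = [0, 0, 1, 1]

weights, bias = train_perceptron(samples, labels)
print(predict(weights, bias, [0.85, 0.15]))  # 0, closer to author A
print(predict(weights, bias, [0.15, 0.80]))  # 1, closer to author B
```

A practical system would use many more features and a multi-layer network, but the training loop follows the same idea: adjust the weights until the known texts are classified correctly, then apply them to the disputed text.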
Past Uses—Scholarly
  • Did Shakespeare write his own plays?
  • Who wrote the Federalist Papers?
Recent Uses—Literary
  • Determine who wrote the anonymously published novel Primary Colors [Joe Klein]
  • Target suspects for the authorship of the Unabomber’s Manifesto [Ted Kaczynski]
Future Uses—Beyond
  • Identifying and blocking spam
  • Detecting lies and flagging potential inconsistencies
  • Locating authors of malicious code