Using Old and New Media as Data Sources Jennifer Earl Associate Professor of Sociology & Director, Technology and Society PhD Emphasis University of California, Santa Barbara
Seminar Overview • Newspaper data (e.g., in my case, police action at protest events) • Online data sources (e.g., in my case, protest opportunities) • Main theme: using sources responsibly by taking the structure of the medium and research on the medium itself seriously
Questions Newspaper Data Might Address • Collecting data on coverage of major occurrences (rulings, upcoming cases, legislation) • Collecting data on coverage of categories of events or specific events (civil rights lawsuits generally or a specific lawsuit or decision)
More Questions Newspaper Data Could Address • Collecting data on coverage of specific organizations (e.g., major legal organizations, key lawyers) • Collecting data on the framing of an issue of the day (e.g., framing of new regulations, Supreme Court decision, etc.)
Questions Can Be Grouped into Broad Styles of Using Newspapers • Newspapers as archives of facts • Newspapers as attention indicators • Newspapers as frame barometers
How Do People Use Newspaper Data? Standard Methodological Practices • Select a set of articles (we will soon discuss how this selection happens) • Quantitatively content code selected articles (we will soon discuss units of analyses as well) if counts or stats are your goal • Qualitatively content code selected articles if themes, frames, or other emergent discursive phenomena are of interest
Quantitative Content Coding Example Source: Soule, Sarah A. and Jennifer Earl. 2005. “A Movement Society Evaluated: Collective Protest in the United States, 1960-1986.” Mobilization 10(3): 345-364. • Coded protest events discussed in paper along a range of variables, including reported size of event
Qualitative Content Coding Example Source: Noakes, John A. and Karin Gwinn Wilkins. 2002. “Shifting Frames of the Palestinian Movement in US News.” Media, Culture & Society 24: 649-671. • Researchers may try to emergently identify different frames and then record their prevalence or look for change across time
Taking the Medium Seriously • Newspapers are a “mediated” medium, which means gate-keeping is important to consider at various levels • Space is limited, so coverage can be affected by competition for space, “newsworthiness,” reporting norms, and editorial concerns, among other influences • Hard versus soft elements of stories and differences in quality of reporting • Selection bias: what gets reported on and why? • Description bias: of things that get reported on, how accurate or informative is the coverage?
Selection Bias in Newspapers • Helpful reference: • Earl, Jennifer, Andrew Martin, John D. McCarthy and Sarah A. Soule. 2004. “The Use of Newspapers in Studying Collective Action.” Annual Review of Sociology 30: 65-80. • In some ways, this is an ironic concern since newspapers were initially a way to ameliorate other methodological problems such as selecting on the dependent variable
Causes of Selection Bias • Newsworthiness • Proximity of event to news source or wire service • Size/impact of the event • Ability to sensationalize the event (“If it bleeds, it leads…”) • Role of violence • Role of organizations as sponsors or media spokespersons • “Issue attention cycle” (Downs 1972) or seasonal effects (stories on poverty and homelessness reportedly rise during the holidays) • News agency effects • Location • Beats of reporters • Editorial direction
Mitigating Selection Bias • Is there an independent source that could help the researcher assess the potential seriousness of selection bias? • Do descriptive statistics suggest low frequencies of cases that are likely to be missed? • Can the sample be limited to reduce some effects (e.g., to a specific editorial period, to an area proximate to the news source)? • Can multiple newspapers be used so that overlapping coverage may reduce selection bias?
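The multiple-newspaper check can be sketched in a few lines. This is only an illustration under stated assumptions: the paper names, the event identifiers, and the helper functions `coverage_shares` and `reported_by_all` are hypothetical, and it assumes reports have already been matched to a common event identifier across papers.

```python
def coverage_shares(events_by_paper):
    """Share of the pooled set of events that each paper reported."""
    all_events = set().union(*events_by_paper.values())
    if not all_events:
        return {}
    return {
        paper: len(events) / len(all_events)
        for paper, events in events_by_paper.items()
    }


def reported_by_all(events_by_paper):
    """Events that every paper in the comparison reported."""
    event_sets = list(events_by_paper.values())
    return set.intersection(*event_sets) if event_sets else set()


# Hypothetical event identifiers coded from two hypothetical papers.
events_by_paper = {
    "Paper A": {"march-01", "rally-02", "sit-in-03"},
    "Paper B": {"rally-02", "sit-in-03", "vigil-04"},
}

shares = coverage_shares(events_by_paper)
overlap = reported_by_all(events_by_paper)
print(shares)
print(sorted(overlap))
```

The shares are relative to the pooled set of reported events, not to the true population of events, so events missed by every paper remain invisible to this check.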
Description Bias in Newspapers • Hard versus soft news items • Hard news: who, what, when, where, and why • Soft news: impressions, inferences, positions • Different kinds of description bias: • Omission of some information • Misrepresentation of some information • Framing in a clear direction
Mitigating Description Bias • Rely on hard elements of news stories for non-frame related questions • Where confirmatory sources are possible, double check hard elements of news stories as a check on their veracity • Make the tone of soft news elements the explicit focus of examination: is reporting becoming more or less favorable over time, for instance?
Griswold’s Cultural Diamond • Social and cultural context • e.g., issue attention cycle • Audience characteristics • Newspaper characteristics • e.g., size of news hole (between papers and in comparison to different media like television) • Producer characteristics • e.g., paper style sheets if relevant • e.g., production routines such as beats, daily deadlines, etc. Source: Griswold, Wendy. 1986. Renaissance Revivals: City Comedy and Revenge Tragedy in the London Theatre, 1576-1980. Chicago: University of Chicago Press.
Other Practical Issues • What papers should you use? • a local paper? NYT as paper of record? • Does audience size/number of subscribers matter to your argument? • What is the unit of analysis? • Article, paragraph, sentence • Event • Organization • Frame
More Practical Issues… • How will you identify articles? • Manual skim of the paper • Semi-automated search • Fully automated search • Using indexes was popular but is now discouraged given the capacities of full text searches
Still More Practical Issues… • Are you using the right provider? • LexisNexis is probably not going to return the same results as LexisNexis Academic Universe • NewsBank, ProQuest, ProQuest Ethnic News Watch, etc. are different still • Many major newspapers have their own archives as well • There may be different paper editions (e.g., a.m./p.m.) that are housed in different collections, different time coverages, different papers • Proximity searches may work differently
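To make the point about proximity searches concrete, here is a minimal sketch of one possible definition: two terms match if they occur within `n` word positions of each other, in either order. The function `within_n_words` and the sample sentence are hypothetical, and this is not any provider's actual rule; providers may differ on tokenization, stop words, word order, and stemming, which is exactly why results can diverge.

```python
import re


def within_n_words(text, term_a, term_b, n):
    """True if term_a and term_b occur within n word positions of each other."""
    # Lowercase and split into simple word tokens; punctuation is dropped.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)


sentence = "Police arrested several demonstrators during the protest downtown."

print(within_n_words(sentence, "police", "protest", 6))
print(within_n_words(sentence, "police", "protest", 5))
```

Running the same query under two window sizes, as above, is a quick way to see how sensitive the article count is to the proximity rule.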
Taking the Structure of the Medium Seriously, Part I • There are a wide variety of “online” content types • Websites are best known • Chat rooms • Instant-messaging and SMS • Games and virtual worlds • Listservs and email • Twitter
Griswold’s Cultural Diamond Revisited • Peer and amateur production are important • Ability to broadcast or have point-to-point communication is important • Online media can be less intermediated • Popularity is still very intermediated, though Source: Griswold, Wendy. 2008. Cultures and Societies in a Changing World. Thousand Oaks, CA: Pine Forge. Page 155
Taking the Structure of the Medium Seriously, Part I • This variety of content types requires the analyst to determine which content types are needed and then design a collection strategy suited to that type of content • Each content type has a different form factor, or design, that you need to consider • Research suggests different social and behavioral characteristics for content types • Producers and participants in each online arena may have different motives and skill levels • Many online content types allow connections of various types, although the meaning of those connections can be content specific
Questions Online Data Might Address • Data on organizational self-representation, taking advantage of less mediated media • Networks of interconnected sites, taking advantage of hyperlinks • Formation or maintenance of relevant communities (e.g., women in lawyering) • Many of the same things you could do with newspaper coverage: • Collecting different kinds of coverage of major occurrences (rulings, upcoming cases, legislation) • Collecting different kinds of coverage for specific kinds of events or specific events (lawsuits generally or a specific lawsuit or decision) • Collecting different kinds of coverage on specific organizations (e.g., Amenta’s data, but you could imagine doing that for major legal organizations, key lawyers) • Framing of an issue of the day
Taking the Structure of the Medium Seriously, Part II • Since there is no census or exhaustive map of all Internet communications, you have to search for what you want • This has very serious implications because you risk only finding what you look for, so be very careful about how you identify the set of cases you want to study
Case Studies of Key Websites • Case studies of key websites • major organizations • popular sites • Assumptions: • Popular or exemplary sites work like other sites, or, popular or exemplary sites are theoretically important albeit different • Social actors that migrate online operate similarly to social actors that emerge online (but see DiMaggio et al. 2001)
Case Studies of a Small but Important Field of Sites • Start with a few critical seed sites and crawl out • Assumptions: • Seed sites are appropriate (which is critical) • The “community” being mapped is coherent • There are not competing but separate communities or sets of sites Source: Ackland, Robert and Rachel Gibson. 2004. "Mapping Political Party Networks on the WWW." Available online at: http://voson.anu.edu.au/papers/political_networks.pdf
Construction of Communities (through hyperlinks at N-links away) • Start with a larger number of critical seed sites and crawl even farther out, to N-links away • Assumptions: • Seed sites are appropriate (which is critical) • The “community” being mapped is coherent • There are not competing but separate communities or sets of sites • N-links are theoretically relevant Source: Garrido, M., & Halavais, A. (2003). Mapping Networks of Support for the Zapatista Movement: Applying Social-Networks Analysis to Study Contemporary Social Movements. In M. McCaughey & M. D. Ayers (Eds.), Cyberactivism: Online Activism in Theory and Practice (pp. 165-184). New York: Routledge.
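Crawling out from seed sites to N links away amounts to a breadth-first search over hyperlinks. The sketch below is an illustration only, not the procedure used in the cited studies: it runs on a small in-memory link graph rather than fetching live pages, and the site names and the function `sites_within_n_links` are hypothetical.

```python
from collections import deque


def sites_within_n_links(link_graph, seeds, n):
    """Map each site reachable within n links of any seed to its link distance."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        if distance[site] == n:
            continue  # do not follow links beyond the N-link boundary
        for target in link_graph.get(site, []):
            if target not in distance:
                distance[target] = distance[site] + 1
                queue.append(target)
    return distance


# Hypothetical outbound links: site -> sites it links to.
link_graph = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": ["seed.example"],
    "c.example": ["d.example"],
}

community = sites_within_n_links(link_graph, ["seed.example"], 2)
print(community)
```

Here `d.example` is three links from the seed and so falls outside the mapped set, which shows how the choice of seeds and of N determines what counts as the "community."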
Why Not Study Samples or Entire Populations? • People tend to find things on the Internet • by receiving an address directly • by searching using a searchable database (e.g., Google, MSN, etc.) • No population list • No standard format that would allow one to randomly generate a population list (as Random Digit Dialing did for phone surveys)
One Potential Solution: Sampling from Searches • Use the same search tools that prospective users would have access to, i.e., focus on “reachable” locations • Use the biggest net you can find and vet the consequences of constraints you place on searches
NSF CAREER Study on Web Protest • Studied random samples of websites discussing twenty different issue areas • Each issue area has its own random sample • Each issue is sampled at the same rate so one can build super-samples of larger, cross topic universes • Data comes from quantitatively content coding sampled websites and any actions that were hosted or linked on sampled websites
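The logic of sampling every issue area at the same rate can be sketched as follows. This is a hedged illustration, not the study's actual procedure: the issue names, site lists, the 25% rate, the rounding rule, and the function `sample_each_issue` are all assumptions made for the example.

```python
import random


def sample_each_issue(sites_by_issue, rate, seed=0):
    """Draw a simple random sample from each issue area at one common rate."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    samples = {}
    for issue, sites in sites_by_issue.items():
        k = round(rate * len(sites))
        samples[issue] = rng.sample(sorted(sites), k)
    return samples


# Hypothetical populations of "reachable" sites found through searches.
sites_by_issue = {
    "issue-1": [f"issue1-site{i}.example" for i in range(8)],
    "issue-2": [f"issue2-site{i}.example" for i in range(20)],
}

samples = sample_each_issue(sites_by_issue, rate=0.25)

# Because every issue is sampled at the same rate, the per-issue samples
# can be pooled into a cross-topic sample without reweighting.
pooled = [site for sites in samples.values() for site in sites]
print({issue: len(sites) for issue, sites in samples.items()}, len(pooled))
```

If the rates differed across issue areas, pooling would over-represent the more heavily sampled issues unless weights were applied.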
Other Methodological Decisions • What kind of online material to use? • What collection method to use and how will cases or data points be identified? • What is the unit of analysis going to be? • Will your strategy provide too little or too much coverage? • Does your collection strategy leave representation and producer problematized or make strong assumptions about these?
Beware of Technological Determinism • Does a technology itself “do” anything? • One way to discuss capacities without determinism is to discuss affordances that technologies provide • Capacities relative to other like technologies • Forces scholars to focus theories on which capacities are important • Understands that people variously use technology, meaning reality is always a mix of the spectacular and mundane, the popular and the crowd
Conclusions • Whether you are using old or new media: • Can that medium provide data that really speak to your theoretical and empirical questions? • Are you aware of how that medium is structured, and are you either mitigating those effects or making them explicitly part of your study? • Have you made unit of analysis, measurement, and larger design decisions with limitations of your data and methods in mind?