Basics of content analysis Presented by Natalia Tomlin Assistant Professor and Technical Services Librarian B. Davis Schwartz Memorial Library, LIU Post
Defining content analysis “Summarizing, quantitative analysis of messages that relies on the scientific method” (Neuendorf, 2002) “Technique for the objective, systematic, and quantitative description of manifest content of communication” (Berelson, 1952) “Research technique for making replicable and valid inferences from texts (or other meaningful matter) in the context of their use” (Krippendorff, 2004) “Procedures for defining, measuring, and analyzing both the substance and meaning of texts or messages or documents” (Beck and Manuel, 2008)
Klaus Krippendorff; Kimberly Neuendorf; Stone, Dunphy, Smith, and Ogilvie, 1966
Content analysis: quantitative or qualitative? Quantitative – focus on numerically measurable objectives Research questions are stated as hypotheses Use of inferential statistics Qualitative – focus on how things occur and how people think about processes; exploratory research; a more holistic, natural approach; language is the primary data; the researcher is a part of the project. Use of verbal categories and descriptive statistics Content analysis may be quantitative or qualitative
Brief history of content analysis 17th century – analysis of texts by the Church Speed (1893) “Do newspapers now give the news?” – content analysis of New York newspapers 1930s-1940s – early content analysis studies by sociologists World War II – propaganda analysis 1950s – use of content analysis by psychologists, anthropologists, historians, linguists, educators, psychiatrists, literary critics, library scientists 1958 – first computer-aided content analysis Evolution from word counts to discovering concepts
Content analysis: areas of implementation Written materials: books, journals, official documents, advertisements, speeches, conversations Visual items: films, clothing, works of art Sound texts: operas, musicals, lyrics Combinations of communication content: blogs, webpages, performance art, computer programs Fields: marketing, literature, gender studies, political science, psychology, etc.
Many purposes of content analysis Disclose international differences in communication content Audit communication against objectives Code open-ended questions in surveys Determine psychological state of a person or group Determine existence of propaganda Reveal focus of individual groups Reflect cultural patterns of groups Describe trends in communication content (Berelson, 1952)
Examples of content analysis studies Walker (1975) – differences and similarities in American black and white popular song lyrics, 1962-1973. Aries (1973) – socialization differences in male, female, and mixed-sex small groups Adams and Shriebman (1978) – content analysis of news media Graham, Kamins, Oetomo (1993) – analysis of advertisements in Japan and Germany Horton (1986) – analysis of young adult books Kaur-Kasior (1987) – treatment of culture in greeting cards
Content analysis in LIS Turner and Beck (2002) - repair strategies of remote users searching the online catalog Sproles and Ratledge (2004) - librarian job ads Koufogiannakis, Slater, Crumley (2004) - content analysis of librarianship research Kuchi (2006) - academic library websites Tancheva (2003) - analysis of online tutorials Aharony (2009) - librarians' blogs LIS thesis and dissertation research (1946-1963): 62% of dissertations used content analysis Koufogiannakis and Slater (2004) – content analysis is one of the top 5 preferred research methods in LIS
Research questions Research question: Do technical services jobs require more advanced technology skills than reference services jobs? (observed reality) Hypothesis: Technical services jobs require more advanced information technology skills (prediction of relationship between two variables) Importance of conceptual definitions of variables (exhaustive and mutually exclusive; previously developed or new) Coding is based on definitions *** Some content analysis studies state hypotheses but do not employ tests for statistical significance
Content analysis design Data making Unitizing Sampling Coding Reducing Inferring Narrating
Units (what is to be observed) Sampling units (issues of newspapers, blogs, individual speeches – what to include or exclude in the analysis) Recording units (blog posts, specific newspaper column) Context units – what can be communicated within the text (words, phrases, pictures, ideas)
Sampling Sampling – ability to generalize the properties found in a sample to the population from which the sample is drawn Random - Simple random (random number generator) - Systematic (every n-th element is chosen) - Stratified (division of the population into subgroups, then random selection of the final subjects proportionally from the different strata) Non-Random - Purposive (selected based on knowledge of the population and the purpose of the study) - Convenience
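The three random sampling strategies named above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the population of 120 job ads, the `library` field, and all function names are hypothetical.

```python
import random


def simple_random_sample(population, k, seed=0):
    """Draw k units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def systematic_sample(population, step, start=0):
    """Take every step-th unit, beginning at index start."""
    return population[start::step]


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from each stratum (proportional)."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(units, k))
    return sample


# Hypothetical population: 120 job ads, 40 per type of library.
LIBRARY_TYPES = ("research", "community_college", "four_year")
ads = [{"id": i, "library": LIBRARY_TYPES[i % 3]} for i in range(120)]

simple = simple_random_sample(ads, 12)
systematic = systematic_sample(ads, 10)
stratified = stratified_sample(ads, lambda ad: ad["library"], 0.10)
```

A fixed seed is used here only so that the draw can be repeated and documented; purposive and convenience samples are judgment calls and are not sketched.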
Coding Define the recording units (= unit of analysis): word, sentence, theme, paragraph, or whole text (the text must be short) Define categories (variables): mutually exclusive; decide how broad or narrow the categories will be Provide conceptual definitions for variables Test the scheme on a sample of the text Assess accuracy and reliability Revise coding rules if needed Test again Code all text Assess reliability and accuracy (2nd time)
Example of codebook Unit of analysis: individual job ad posting Conceptual definition: each academic library job ad posted between 2012 and 2013 on the CHE website Job number Job posting date Job category Type of library Degree requirement Professional experience Preferred degrees Faculty status
Codebook example
Job number: 001, 002
Job posting date: 1. 01.01.2013-01.31.2013; 2. 02.01.2013-02.28.2013
Job category: 1. Administrative; 2. Instructional; 3. Technical Services
Type of library: 1. Research library; 2. Community college; 3. 4-year college
Degree requirement: 1. MLS only; 2. MLS and one more master's degree; 3. Other
Professional experience: 1. None; 2. 1+ years; 3. 3+ years
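A codebook like this one can also be kept as a small data structure, so that coded records are checked against the allowed values. The sketch below is only an illustration: the variable names, the `validate` and `decode` helpers, and the sample record are assumptions, while the category labels are taken from the slide.

```python
# Category labels come from the codebook slide; the keys are hypothetical names.
CODEBOOK = {
    "job_category": {1: "Administrative", 2: "Instructional", 3: "Technical Services"},
    "library_type": {1: "Research library", 2: "Community college", 3: "4-year college"},
    "degree_requirement": {1: "MLS only", 2: "MLS and one more master's degree", 3: "Other"},
    "professional_experience": {1: "None", 2: "1+ years", 3: "3+ years"},
}


def validate(record):
    """Return a list of problems; an empty list means every code is allowed."""
    problems = []
    for variable, codes in CODEBOOK.items():
        if variable not in record:
            problems.append(f"{variable}: missing")
        elif record[variable] not in codes:
            problems.append(f"{variable}: unknown code {record[variable]!r}")
    return problems


def decode(record):
    """Translate numeric codes back into their category labels."""
    return {variable: CODEBOOK[variable][record[variable]] for variable in CODEBOOK}


# Hypothetical coded job ad (job number 001).
ad_001 = {
    "job_category": 3,
    "library_type": 1,
    "degree_requirement": 1,
    "professional_experience": 2,
}
bad_ad = {"job_category": 9, "library_type": 1, "degree_requirement": 1}
```

Rejecting unknown or missing codes at entry time is one way to enforce categories that are exhaustive and mutually exclusive: each variable takes exactly one value from a closed list.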
Codebook creation Use of established conceptual definitions adds validity to the study (previous studies; established sources such as ODLIS) Exploratory studies are more likely to create their own conceptual definitions The codebook serves as a guide for coders and a record of the project The codebook needs to be refined during pretesting Better to have too many categories than too few
Assigning enumerations to variables Nominal – numbers are used only as labels and have no numeric value. Example: type of library Ordinal – values are rank ordered. Example: professional experience coded as None, 1+ years, 3+ years Interval – equal distances between values, but no true zero. Example: calendar year of the job posting Ratio – equal distances and a true “0” value. Example: age, years of experience
Quality control Validation of the coding scheme through an inter-coder reliability test Acceptable inter-coder reliability levels vary Reliability is tested at the pilot stage and at the end of the study; the results of the latter are reported in the study Reliability problems can be addressed by additional training for coders, revising coding instructions, and combining or separating categories Calculating the agreement: nominal scale – percentage agreement, Cohen’s kappa, and Scott’s pi (the latter two correct for chance agreement); Pearson’s correlation is used for interval and ratio scales
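For two coders and a nominal variable, the agreement measures can be computed directly. The sketch below uses the standard formulas for percentage agreement, Cohen's kappa, and Scott's pi; the two lists of codes are invented for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of units on which the two coders assigned the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must code the same non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement; chance uses each coder's own marginals."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def scotts_pi(a, b):
    """Chance-corrected agreement; chance uses the pooled marginals."""
    n = len(a)
    p_o = percent_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Hypothetical codes for "job category" assigned to ten ads by two coders.
coder_a = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
coder_b = [1, 1, 2, 3, 3, 3, 1, 2, 2, 1]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
pi = scotts_pi(coder_a, coder_b)
```

In this invented example the coders agree on 8 of 10 units, but the chance-corrected values are lower, which is why percentage agreement alone tends to overstate reliability.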
Reporting findings Reporting in raw numbers, percentages, or frequencies Must directly address research questions Format: bars, charts, tables Test of statistical significance (chi-square) for associations between nominal variables
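As an illustration of the chi-square test of association, the sketch below computes the statistic for a 2 x 2 table built around the earlier research question (job type versus advanced IT skills required). The counts are invented, the function name is hypothetical, and the code assumes every row and column total is non-zero.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts of job ads.
# Rows: technical services, reference. Columns: advanced IT required, not required.
observed = [
    [30, 20],
    [15, 35],
]
statistic, dof = chi_square(observed)

# Critical value of chi-square for 1 degree of freedom at the 0.05 level.
CRITICAL_1DF_05 = 3.841
significant = statistic > CRITICAL_1DF_05
```

With these invented counts the statistic exceeds the critical value, so the association would be reported as statistically significant at the 0.05 level; with real data, a statistics package would also supply the exact p-value.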
Analysis of the study (1) “Libraries and public perceptions: A comparative analysis of the European press : Methodological insights” by Anna Galluzzi (2014) “The analysis of newspapers has been figured out as an alternative method to measure the relevance and the public perception of libraries”
“The research aims at quantifying and qualifying the presence of issues concerning libraries in the European press over the last years … in order to answer the following research questions:
• which are the most discussed topics concerning libraries and have they changed over the last years?
• are there any significant differences between the European countries in the debate about libraries?
• are there any significant differences between the European newspapers in the debate about libraries?”
“chronological span covered by the research is five years, from 2008 to 2012. This choice was made because 2008 is generally considered the starting point of the economic crisis which is still deeply affecting the Western economies and political scenarios” “Countries taken into account are the United Kingdom, France, Spain and Italy, since they are considered representative of different areas and cultural traditions in Europe”
“second selection was made among the numerous print newspapers published, with the objective of choosing two titles for each country according to the following basic criteria. The two newspapers were picked among those of national relevance, the most widespread and the oldest in each country, avoiding - if possible - those officially representing political parties and the radical ones.”
The selected newspapers are the following: 1. The United Kingdom: The Times and The Guardian 2. France: Le Figaro and Le Monde 3. Spain: El Mundo and El País 4. Italy: Corriere della Sera and La Repubblica
The keywords used as query parameters in the full text search were “librar*” and “bibliot*”. The articles retrieved using the above-mentioned parameters are 41,611. After the retrieval of the articles responding to the query parameters, the second step was to select the pertinent ones, i.e. those articles which concern libraries in a proper sense. The pertinent articles are 3,659.
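The truncated keywords act as prefix searches. A rough equivalent with a regular expression is sketched below; it is not the search tool used in the study, the sample headlines are invented, and the pertinence check in the second step was done by hand and is not reproduced here.

```python
import re

# "librar*" and "bibliot*": any word that starts with one of the two stems.
QUERY = re.compile(r"\b(?:librar|bibliot)\w*", re.IGNORECASE)


def matched_words(text):
    """Return every word in the text that satisfies the truncated query."""
    return [m.group(0) for m in QUERY.finditer(text)]


def retrieve(articles):
    """Keep the articles that contain at least one matching word."""
    return [article for article in articles if QUERY.search(article)]


# Invented headlines for illustration.
headlines = [
    "Council votes to close three public libraries",
    "La bibliothèque nationale numérise ses collections",
    "Nuova biblioteca comunale apre a Milano",
    "Stock markets fall for a third day",
]
retrieved = retrieve(headlines)

# Share of retrieved articles judged pertinent, using the figures on the slide.
pertinent_share = round(3659 / 41611 * 100, 1)
```

The share works out to under 9 percent of the retrieved articles, which shows how broad a stem search is and why the manual pertinence step mattered.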
“After the selection, a text and content analysis of the articles was carried out. Though aware of the many advantages (speed, completeness, objectivity and precision) of an automatic processing, the risk to think that the whole analysis could be delegated to computer software, instead of using them to speed up and enhance it, was given a special credit. the analysis was carried out manually and no text analysis software was used, starting from the firm belief that no software can replace human reasoning. A certain degree of subjectivity was considered somewhat inevitable and acceptable”
“First of all, each article was identified with a univocal name and an Excel™ worksheet was prepared to host the results of the coding. Then, the articles were analyzed and coded. At the beginning, the texts were carefully reviewed and all concepts and ideas were annotated as they appeared and then grouped.”
Variables/Coding categories 1. country 2. newspaper title 3. year of publication 4. prevalence or not of libraries as subject of the article 5. type of library considered: Public, National, Academic, School, Special/Specialized, No specification or more than one type 6. main topic of the article: Mission/Roles, Conservation/Holdings/Catalogue, Digitization/Digital libraries, History, Reading/Marketing, Politics/Strategy/Management, Library closures/Budget cuts, Internet/E-book/Technology, Services/Users, Staff/Recruitment, New libraries/New buildings, Acquisitions/Open access, Buildings/Architecture.
7. the newspaper section where the article is published: Opinions/Letters/Debates, Culture/Education, In brief, Cities/ /National news, World/International news, Market/Economy/Business, Society, Science, Other
Analysis of the study (2) “The Role of Online Videos in Research Communication: A Content Analysis of YouTube Videos Cited in Academic Publications” by Kousha, Thelwall, and Abdoli (2012) “This article explores the extent to which YouTube videos are cited in academic publications and whether there are significant broad disciplinary differences in this practice”
Research questions: “How frequently are YouTube videos cited in academic publications and has frequency of use declined at any stage since the birth of YouTube (2005–2011)? What types of YouTube videos are commonly cited in research articles? Are there significant broad disciplinary differences in citing online videos?”
Data collection Researchers “extracted URL citations to YouTube videos from academic publications indexed by Scopus from 2005 to 2011 across four broad disciplines: the sciences, medicine and health sciences, social sciences, and arts and humanities. We then viewed a sample of the cited videos and classified their contents using a specially designed classification scheme”
“viewed 551 randomly sampled cited videos from research articles (omitting reviews, conference papers, editorials, letters, and notes) from the Scopus searches. In many cases, we also read the descriptions of, and some comments on, the YouTube videos (if available) and searched for a lecturer or speaker biography to better understand video contexts. The first and third authors separately conducted an initial content analysis of the videos based on a primary classification scheme derived from a previous classification of YouTube videos tweeted by academics (Thelwall et al., in press). To reach a reasonable degree of agreement on the classification procedure, the two coders first cross-checked the categorization process for a sample of 80 videos from different subject areas, discussing the coding of different types of videos.”
Examples of the categories they used: “Demonstration of a natural or formal science phenomenon: This subclass includes videos with an apparently scientific theme such as a real-time lab experiment in robotics Natural or formal science documentary: This subclass includes documentaries (usually with narration and edited with different types of shots) about natural or formal science Natural or formal science academic lectures: This group includes natural or formal science lectures, speeches, and talks by academics in conferences”
Limitations: “Another practical limitation was the complex and subjective issue of coding video contents. We discussed the coding system after the initial classification process and modified it several times to get general agreement. For instance, we first merged television shows and news-related videos into one class, but subsequently split them into two subclasses because shows are more related to arts and humanities whereas the news is more associated with the social sciences (e.g., political science and journalism). Furthermore, some scientific demonstrations also can be used for academic education, and, in rare cases, it was difficult to recognize whether they were created for scientific demonstrations, entertainment, or teaching.”
Analysis of the study (3) “An analysis of American academic libraries' websites: 2000-2010” by Noa Aharony (2012) “It is … interesting to trace the changes and developments that academic library websites have undergone over the last ten years, as expressed through the library websites themselves”
“research questions are: Is there a difference between the content of academic library websites in the year 2000 and in the year 2010? What are the LIS current trends and tendencies being expressed through those academic library websites?” Conceptual definition: “According to  McGillis and Toms (2001), a library website reflects its virtual public face, acting as a front door to the collections, services, and, to an extent, its staff”
“The first phase of the investigation involved choosing academic library homepages, which appear both on a current webpage and in the Internet Archive, to be included in the sample. These were located by examining the Association of College and Research Libraries (ACRL) accredited LIS schools, numbering 57. A total of 31 academic libraries were selected from this list based on the following criteria: The library has a current homepage. The library homepage appears in the Internet Archive in the year 2000. Four out of the 31 libraries were not found in the Internet Archive in 2000, so data were collected from the first year that they appear in the Internet Archive”
Time frame: “The year 2000 was chosen because: firstly, while the Internet Archive began archiving its documents in 1996, most of the academic library content is found from the year 2000 onwards; and secondly, a ten year period was deemed suitable for tracing the changes, developments, and trends of the last decade, which contained many technological innovations and conceptual changes in the field of library and information science”.
She conducted “content analysis of academic library websites in the two periods, based on Qutab and Mahmood's (2009) website content analysis and modified for the purpose of the current study. The modified checklist includes 42 items divided into eight categories: -site description -currency -website aids and tools -library general information -library resources -services -links to e-resources -value added services.” “The final percentage of agreement for all coding decisions was 89 per cent, which suggests that the coding classification used was reliable”
Content analysis of the interview transcript (4) Interviews: recorded and transcribed A team of 4 coders (2 groups) will work on an assigned number of interviews Print-outs of the interview text need to be read and the concepts highlighted Each group needs to meet and agree on the highlighted concepts, reporting the percentage of agreement All four coders will meet, discuss all concepts, and group them into larger categories
Advantages and drawbacks Operates directly with text/transcripts of communication Can use both qualitative and quantitative operations Allows research of historical documents Is an unobtrusive, nonreactive research technique Not geographically limited Time-consuming Reveals the content but not the significance of the content Cannot make conclusions about motives, meanings, or effects of the messages Some texts (websites) have tight data collection periods
Special thanks to morgangelber, my talented and gifted daughter, who never gets the credit she deserves.