Gender in Twitter: Styles, Stances, and Social Networks

Tyler Schnoebelen

(reporting joint work with David Bamman and Jacob Eisenstein)

At its most basic
  • Assumption 1: Men and women use different vocabularies
    • Hypothesis I: Computational methods can cut through noise and predict speaker gender based on the words they use
  • Assumption 2: Social networks are typically “homophilous” (birds of a feather flock together)
    • Hypothesis II: Adding the gender make-up of a user’s social network should get even better prediction
Actual goal
  • Problematize gender prediction as a task
    • Define a system where we could just “stop” and call it good
    • But NOT ACTUALLY STOP
  • Demonstrate that simple gender binaries aren’t actually descriptively accurate
  • Show ways to combine social theory and computational methods that expand the questions on both sides
“Standard” is a keyword

Standard poodles, though, aren’t what Cheshire (2004), Cameron & Coates (1989), Eckert & McConnell-Ginet (1999), Holmes (1997), or Romaine (2003) have in mind.

Typical findings
  • Women use standard variants more often than men.
    • In fact, early dialectologists ignored women completely because they wanted “NORMS”: non-mobile, older, rural male speakers, seen as preserving the purest regional (non-standard) forms
      • See Chambers and Trudgill (1980).
    • Do women do this for prestige (to acquire social capital)?
    • To avoid losing status?
    • Are women actually creating norms, not following them?
  • See your textbook (“Whose speech is more standard”) for further complications to this picture
More computational work
  • People are fascinated by gender differences
  • To reach statistical significance, you need enough data to detect a signal
  • In the past, this has led researchers to roll up words into word classes
The most common distinctions
  • Men use informative language
    • Prepositions, attributive adjectives, higher word lengths
  • Women use involved language
    • First and second person pronouns, present tense verbs, contractions
Or by “contextuality”
  • Men are formal and explicit
    • Nouns, adjectives, prepositions, articles
  • Women are deictic and contextual
    • Pronouns, verbs, adverbs, interjections
  • “Contextuality” decreases when unambiguous understanding is more important or harder to achieve, e.g., when people are physically or socially farther apart
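One widely used way to turn this formal/deictic contrast into a single number is Heylighen and Dewaele's formality (F) score: express each category's frequency as a percentage of all words, add the four formal categories, subtract the four deictic ones, add 100, and halve, so scores run from 0 (maximally contextual) to 100 (maximally formal). The sketch below assumes the text has already been part-of-speech tagged and collapsed into the hypothetical coarse category names used as dictionary keys; it illustrates the measure and is not an analysis reported in this talk.

```python
# Coarse POS categories (hypothetical names) on each side of the contrast.
FORMAL = ("noun", "adjective", "preposition", "article")
DEICTIC = ("pronoun", "verb", "adverb", "interjection")


def formality_score(category_counts):
    """Heylighen-Dewaele F-score from a {category: token count} dict.

    Categories outside FORMAL and DEICTIC (e.g. conjunctions) still count
    toward the total, so they pull the score toward the neutral midpoint of 50.
    """
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("no tokens to score")

    def pct(category):
        return 100.0 * category_counts.get(category, 0) / total

    formal = sum(pct(c) for c in FORMAL)
    deictic = sum(pct(c) for c in DEICTIC)
    return (formal - deictic + 100.0) / 2.0


print(formality_score({"noun": 4, "article": 2, "pronoun": 3, "verb": 1}))  # → 60.0
```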
Our approach also lumps
  • It’s just at a lower level: instead of “nouns” or “blog words”, we have “unigrams” (single words).
    • We also ran our work with part-of-speech tagged unigrams for one level less lumping; the results are basically the same but are not reported here.
  • Lumping itself isn’t a problem. In fact, it’s unavoidable.
    • But ideologies are going to structure your lumpings, so watch out!
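The difference between plain unigrams and part-of-speech tagged unigrams can be shown with a small sketch. The tags here are supplied by hand in Penn Treebank style as an assumption; in practice they would come from a tagger. Tagging splits the single bin “like” into a verb bin and a preposition bin, which is exactly the kind of lumping decision at issue.

```python
from collections import Counter


def unigram_counts(tokens):
    """Bag of words: lowercase each token and count it."""
    return Counter(token.lower() for token in tokens)


def tagged_unigram_counts(tagged_tokens):
    """Same counts, but each bin is a word/TAG pair, so one word can fill several bins."""
    return Counter(f"{word.lower()}/{tag}" for word, tag in tagged_tokens)


# Two hand-tagged example tweets (tags are illustrative, not tagger output).
tweet_1 = [("I", "PRP"), ("like", "VBP"), ("it", "PRP")]
tweet_2 = [("like", "IN"), ("a", "DT"), ("boss", "NN")]

plain = unigram_counts(w for w, _ in tweet_1 + tweet_2)
tagged = tagged_unigram_counts(tweet_1 + tweet_2)
print(plain["like"], tagged["like/VBP"], tagged["like/IN"])  # → 2 1 1
```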
Data
  • Public Twitter messages in same-gender and cross-gender social networks
    • Word frequencies (unigrams)
    • Gender (induced from first names); e.g., according to the Social Security Administration:
      • Tyler is a male name 97.36% of the time
      • Penny and Annette are female names 100% of the time
      • Robin is female 87.69% of the time
  • 14,464 Twitter users (56% male)
    • Geolocated in the US
    • Must use at least 50 of the 1,000 most frequent words
    • Between 4 and 100 “mutual @’s” (users who @-mention each other), separated by 14 days
      • Women have 58% female friends
      • Men have 67% male friends
  • 9.2M tweets, Jan-Jun 2011
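Name-based gender induction amounts to a lookup plus a cutoff. The sketch below hard-codes only the four proportions quoted above; a real table would be built from the Social Security Administration's baby-name files, and the 0.9 cutoff (which leaves an ambiguous name like Robin unlabeled) is a hypothetical choice, not the threshold used in this study.

```python
# Share of people with each first name recorded as female, taken from the
# slide's examples. A real table would cover thousands of names.
FEMALE_SHARE = {
    "tyler": 1 - 0.9736,  # 97.36% male
    "penny": 1.0,
    "annette": 1.0,
    "robin": 0.8769,
}


def infer_gender(first_name, threshold=0.9):
    """Return 'female', 'male', or None if the name is unknown or too ambiguous.

    threshold is a hypothetical cutoff: a name must be at least this
    strongly associated with one gender to be labeled.
    """
    share = FEMALE_SHARE.get(first_name.strip().lower())
    if share is None:
        return None
    if share >= threshold:
        return "female"
    if 1 - share >= threshold:
        return "male"
    return None


print(infer_gender("Tyler"), infer_gender("Robin"), infer_gender("Robin", threshold=0.85))
# → male None female
```

Raising the cutoff labels fewer authors but with more confidence; every unlabeled author simply drops out of the dataset.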
First step: take the “normal” route
  • Train a statistical model on part of the data.
    • Logistic regression
  • Test it on a different part of the data, hiding the gender labels.
    • 10-fold cross-validation: 10 training/test splits, each testing on a different 10% of the data
  • State-of-the-art prediction accuracy: 88.9%
    • Lexical features do strongly predict gender
    • Ignoring syntax (treating tweets as “bags of words”) works pretty well
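The train-then-test procedure can be sketched in plain Python with no machine-learning library. This illustrates logistic regression with k-fold cross-validation and is not the authors' code: it assumes each author is a Counter of unigram counts with a 1/0 label (which gender gets 1 is arbitrary), and the learning rate, L2 penalty, and epoch count are arbitrary defaults. A study on millions of tweets would use an optimized solver.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 once, then deal the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, doc):
    """Linear score (log-odds of label 1) for one author's word counts."""
    return bias + sum(weights[w] * c for w, c in doc.items())


def train_logreg(docs, labels, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient descent on L2-regularized logistic loss."""
    weights, bias, n = Counter(), 0.0, len(docs)
    for _ in range(epochs):
        grad, grad_b = Counter(), 0.0
        for doc, y in zip(docs, labels):
            err = sigmoid(score(weights, bias, doc)) - y
            for w, c in doc.items():
                grad[w] += err * c
            grad_b += err
        for w in set(grad) | set(weights):
            weights[w] -= lr * (grad[w] / n + l2 * weights[w])
        bias -= lr * grad_b / n
    return weights, bias


def cross_validate(docs, labels, k=10, seed=0):
    """Accuracy over k folds; each author is tested once, by a model that
    never saw that author during training."""
    correct = 0
    for test_idx in k_fold_indices(len(docs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(docs)) if i not in held_out]
        weights, bias = train_logreg([docs[i] for i in train], [labels[i] for i in train])
        correct += sum((score(weights, bias, docs[i]) >= 0) == bool(labels[i]) for i in test_idx)
    return correct / len(docs)


# Toy data: two perfectly separable groups of authors.
docs = [Counter({"cute": 1, "omg": 1})] * 10 + [Counter({"dude": 1, "nah": 1})] * 10
labels = [1] * 10 + [0] * 10
print(cross_validate(docs, labels))  # → 1.0
```

Because every author lands in exactly one test fold, the accuracy is computed only from predictions on authors the model was not trained on.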
Are women less standard?
  • Female markers:
    • okay, yes, yess, yesss, yessss
    • nooo, noooo
    • cannot
  • Male markers:
    • yessir
    • nah, nobody
    • ain’t
  • What counts as standard?
Hand classification (94.2% agreement)

At a corpus level, women use more non-dictionary words and men use more named entities. In a moment we’ll ask how universal this is.
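The 94.2% figure is raw percent agreement between annotators. A minimal sketch of that calculation follows, together with Cohen's kappa, a common chance-corrected alternative that the slide does not report. The category labels in the example are hypothetical stand-ins for the hand-classification scheme.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Percentage of items on which two annotators chose the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for what two annotators would reach by chance,
    given how often each of them uses each category."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b) / 100.0
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical category for every item
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["dictionary", "dictionary", "non-dictionary", "named entity", "dictionary"]
annotator_2 = ["dictionary", "non-dictionary", "non-dictionary", "named entity", "dictionary"]
print(percent_agreement(annotator_1, annotator_2))  # → 80.0
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # → 0.6875
```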

But wait
  • “Dictionary” words are really diverse
    • There’s a sense that dude (m), cute (f), epic (m), and lovely (f) are “stylistic” in a way that ability (m), correct (m), lipstick (f) and sleepy (f) are not, but how would we pin this down?
    • Part of speech?
    • But in what way do cute (f), hot (f), epic (m), and solid (m) belong with correct (m), offensive (m), sleepy (f) and glad (f)?
    • And for hot and solid, the “style” or “content” division depends on the intended word sense.
Involvement
  • Using traditional definitions, we’d say that our data confirms
    • men as more informational (all those named entities)
    • women as more interactive/involved (pronouns, emoticons, etc.).
  • Recall that most of the named entities for the men are sports figures and teams.
Shit Girls Say

http://www.youtube.com/watch?feature=player_embedded&v=u-yLGIH7W9Y

Notice
  • That gender wasn’t really limited to the “gender” column
    • “Moms” and “dads” are gendered social roles
  • And that the words “guys” and “girls” aren’t really the same as “male” and “female”
    • What are the plausible age ranges and social styles for “guys” and “girls”?
Clustering without regard to gender
  • We clustered authors into 20 clusters, ignoring their gender
    • Clustering considered text only
    • K-means with log-linear distributions
      • (Eisenstein, Ahmed, and Xing, ICML 2011)
    • Many clusters have strong demographic orientations, including gender, race, and age
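The clustering step can be approximated with a hard-assignment, k-means-style loop over word counts. This sketch is a deliberate simplification, not the cited method: the cited work's log-linear model represents each cluster as sparse deviations from a background word distribution, whereas here each cluster is just an add-alpha smoothed multinomial. Like the original, it uses text only; k, alpha, and the round-robin initialization are arbitrary choices.

```python
import math
from collections import Counter


def cluster_authors(docs, k, iters=20, alpha=0.1):
    """Hard-assignment clustering of word-count vectors (a k-means analogue).

    Each cluster is a smoothed multinomial over the vocabulary; each author
    moves to the cluster under which their words are most probable.
    """
    vocab = sorted(set().union(*docs))
    assign = [i % k for i in range(len(docs))]  # deterministic round-robin start
    for _ in range(iters):
        # Re-estimate each cluster's word distribution from its current members.
        totals = [Counter() for _ in range(k)]
        for doc, c in zip(docs, assign):
            totals[c].update(doc)
        log_probs = []
        for t in totals:
            denom = sum(t.values()) + alpha * len(vocab)
            log_probs.append({w: math.log((t[w] + alpha) / denom) for w in vocab})
        # Reassign each author to the cluster with the highest log-likelihood.
        new_assign = [
            max(range(k), key=lambda c: sum(n * log_probs[c][w] for w, n in doc.items()))
            for doc in docs
        ]
        if new_assign == assign:
            break
        assign = new_assign
    return assign


group_a = Counter({"game": 3, "dude": 2})
group_b = Counter({"omg": 3, "cute": 2})
authors = [group_a, group_a, group_a, group_b, group_b, group_b]
print(cluster_authors(authors, k=2))  # → [0, 0, 0, 1, 1, 1]
```

Gender never enters the loop, so any gender skew in the resulting clusters comes from the words alone.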
Clusters that are majority female

At the population level, women use few named entities and many non-dictionary words.

But there are clusters of (mostly) women who do the opposite.

Clusters that are majority male

At the population level, men use many named entities and few non-dictionary words.

But there are clusters of men who do the opposite.

Erasure!
  • Clusters are highly gendered
  • For example, let’s consider clusters in which 60% or more of the authors share the same gender
    • That covers 72.79% of all the authors
    • But what about the 1,420 men who are part of female-majority clusters?
    • The 1,219 women who are part of male-majority clusters?
    • The 782 people who are part of clusters that aren’t gender-skewed?
    • Are they just noise? Odd-balls? Is there no structure to what they’re doing?
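Counting who falls outside the gendered clusters is simple bookkeeping once each author has a cluster and a name-induced gender. The sketch below applies the slide's 60% criterion; the function name, the "f"/"m" codes, and the output keys are hypothetical, and it is not claimed to reproduce the exact figures above.

```python
from collections import Counter, defaultdict


def summarize_clusters(assignments, genders, threshold=0.6):
    """assignments[i] is author i's cluster; genders[i] is 'f' or 'm'.

    A cluster is skewed if at least `threshold` of its members share a gender.
    Counts the gender minority inside skewed clusters and everyone in
    unskewed clusters.
    """
    members = defaultdict(Counter)
    for cluster, gender in zip(assignments, genders):
        members[cluster][gender] += 1
    out = {"f_majority_men": 0, "m_majority_women": 0, "unskewed": 0, "skewed_authors": 0}
    for counts in members.values():
        size = counts["f"] + counts["m"]
        if counts["f"] / size >= threshold:
            out["f_majority_men"] += counts["m"]
            out["skewed_authors"] += size
        elif counts["m"] / size >= threshold:
            out["m_majority_women"] += counts["f"]
            out["skewed_authors"] += size
        else:
            out["unskewed"] += size
    return out


clusters = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
genders = ["f", "f", "f", "f", "m", "m", "m", "m", "f", "f", "f", "m"]
print(summarize_clusters(clusters, genders))
# → {'f_majority_men': 1, 'm_majority_women': 2, 'unskewed': 2, 'skewed_authors': 10}
```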
Markers go beyond “you”
  • The decile of men with the most female-skewed social networks
    • use far more female lexical markers than male markers (only 25% of the markers they use are male).
  • For the decile of men with the most male-skewed networks
    • male and female markers are used at roughly equal rates (because the female markers include more common words).
  • For the decile of women with the most female-skewed networks
    • 85% of the lexical markers that they use are female.
  • For the decile of women with the most male-skewed networks
    • 75% of the lexical markers that they use are female.
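The decile comparison can be sketched as follows: sort one gender's authors by the female share of their network, cut the ranking into ten equal bins, and pool marker counts within each bin; run it separately for men and for women. The input tuple layout is hypothetical, and the female and male markers are assumed to come from a previously selected marker list like the ones shown earlier.

```python
def female_marker_share_by_bin(authors, n_bins=10):
    """authors: (female share of network, female-marker tokens, male-marker tokens).

    Returns one value per bin, ordered from the most male-skewed networks to
    the most female-skewed: the fraction of pooled marker tokens that are
    female markers (None for a bin with no marker tokens).
    """
    ranked = sorted(authors, key=lambda a: a[0])
    n = len(ranked)
    shares = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        female = sum(a[1] for a in chunk)
        male = sum(a[2] for a in chunk)
        shares.append(female / (female + male) if female + male else None)
    return shares


# Toy authors: more female-skewed networks go with more female markers.
toy = [(i / 20, 1, 3) if i < 10 else (i / 20, 3, 1) for i in range(20)]
print(female_marker_share_by_bin(toy, n_bins=2))  # → [0.25, 0.75]
```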
Does social network help prediction?
  • 89% accuracy with text alone
    • Logistic regression, 10-fold cross-validation
    • State-of-the-art
  • Add network information…
    • Still 89% accuracy
Wait, why not?
  • A new feature is only going to improve classification accuracy if it adds new information.
  • There is strong homophily: 63% of the connections are between same-gender individuals.
  • But language and social network can’t mutually disambiguate because they aren’t independent views on gender
  • Individuals who use linguistic resources from “the opposite gender” have consistently denser social network connections to the opposite gender.
    • Performance, style, accommodation
  • Gender is not an “A or B” kind of thing
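Homophily figures like the 63% above come from two steps: keep only reciprocated (“mutual”) @-mention ties, then compute the share of ties whose two ends have the same induced gender. A minimal sketch, assuming mentions arrive as (author, mentioned user) pairs; it ignores the 14-day and 4-to-100 filters from the data slide, and all user names are hypothetical.

```python
def mutual_edges(mentions):
    """Undirected ties between users who @-mention each other.

    mentions: iterable of (author, mentioned_user) pairs; repeats are ignored.
    """
    directed = set(mentions)
    return {
        tuple(sorted((a, b)))
        for a, b in directed
        if a != b and (b, a) in directed
    }


def same_gender_share(edges, gender):
    """Fraction of ties whose endpoints share an induced gender.

    Ties touching a user with no gender label (e.g. an ambiguous name) are skipped.
    """
    labeled = [(a, b) for a, b in edges if a in gender and b in gender]
    if not labeled:
        return None
    return sum(gender[a] == gender[b] for a, b in labeled) / len(labeled)


mentions = [
    ("ann", "bob"), ("bob", "ann"),
    ("ann", "cat"), ("cat", "ann"),
    ("bob", "dan"),  # not reciprocated, so dropped
    ("dan", "eve"), ("eve", "dan"),
]
gender = {"ann": "f", "bob": "m", "cat": "f", "dan": "m", "eve": "m"}
edges = mutual_edges(mentions)
print(sorted(edges))  # → [('ann', 'bob'), ('ann', 'cat'), ('dan', 'eve')]
print(round(same_gender_share(edges, gender), 3))  # → 0.667
```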
Not so simple
  • If we want to understand categories, we should start with people in interactions.
    • Counting is great but we have to watch our bins and investigate them, too.
  • A binary model of gender is only adequate if you have blinders on
    • “My mom has never in her life said that’s lovely or omg!...nevermind that!”
    • And we can’t trust the idea that we can just figure out each part independently: figuring out “woman” and “African American” won’t automatically tell us about “African American women”.
  • Big data offers us the opportunity to let clusters emerge (and test them against our big bins).
    • In other words, Twitter and other forms of big data offer a way to show how language reflects and creates the social worlds we live in.
How to grab Twitter data (overview)
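  • Not the only way, just a way, roughly sketched out.
  • Install Python development tools
  • Install the “twitter” package (Python Twitter Tools: pip install twitter)
  • Then write some code:
    • import twitter
    • twitter_search = twitter.Twitter(domain="search.twitter.com")
    • trends = twitter_search.trends()
    • print([trend['name'] for trend in trends['trends']])
    • Note: this targets the old unauthenticated search.twitter.com endpoint; the current API requires OAuth credentials.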
  • References:
    • https://dev.twitter.com/
    • http://pypi.python.org/pypi/twitter/
Alternatively…
  • Less programmy:
    • http://cit.duke.edu/blog/2010/01/collecting-sorting-and-archiving-tweets/
    • Maybe: http://www.visitmix.com/labs/archivist-desktop/
  • Programmy:
    • http://cjohansen.no/en/ruby/collecting_tweets_with_twibot_and_activerecord
    • http://hasin.me/2009/06/20/collecting-data-from-streaming-api-in-twitter/
    • https://github.com/jobrieniii/yourTwapperKeeper