Gender in Twitter: Styles, Stances, and Social Networks

Tyler Schnoebelen

(reporting joint work with David Bamman and Jacob Eisenstein)


At its most basic

  • Assumption 1: Men and women use different vocabularies

    • Hypothesis I: Computational methods can cut through noise and predict speaker gender based on the words they use

  • Assumption 2: Social networks are typically “homophilous” (birds of a feather flock together)

    • Hypothesis II: Adding the gender make-up of a user’s social network should get even better prediction


Actual goal

  • Problematize gender prediction as a task

    • Define a system where we could just “stop” and call it good

    • But NOT ACTUALLY STOP

  • Demonstrate that simple gender binaries aren’t actually descriptively accurate

  • Show ways to combine social theory and computational methods that expand the questions on both sides


Quick lit review


“Standard” is a keyword

Although standard poodles isn’t what Cheshire (2004), Cameron & Coates (1989), Eckert & McConnell-Ginet (1999), Holmes (1997), or Romaine (2003) have in mind.


Typical findings

  • Women use standard variables more often than men.

    • In fact, early dialectologists ignored women completely because they wanted “NORMS”—non-mobile, older, rural male speakers, seen as preserving the purest regional (non-standard) forms

      • See Chambers and Trudgill (1980).

    • Did they do it for prestige (to acquire social capital)?

    • To avoid losing status?

    • Are women actually creating norms, not following them?

  • Check out your textbook (“Whose speech is more standard”) for more complications to this picture


More computational work

  • People are fascinated by gender differences

  • To reach statistical significance, you need enough data to detect a signal

  • In the past, this has led researchers to roll up words into word classes


The most common distinctions

  • Men use informative language

    • Prepositions, attributive adjectives, higher word lengths

  • Women use involved language

    • First and second person pronouns, present tense verbs, contractions


Or by “contextuality”

  • Men are formal and explicit

    • Nouns, adjectives, prepositions, articles

  • Women are deictic and contextual

    • Pronouns, verbs, adverbs, interjections

  • “Contextuality” decreases when an unambiguous understanding is more important or difficult—when people are physically or socially farther away


Are all nouns really the same?



And what about…



Our approach also lumps

  • It’s just at a lower level because instead of “nouns” or “blog words”, we have “unigrams”.

    • We also ran our work with part-of-speech tagged unigrams for one level less lumping—the results are basically the same but not reported here.

  • Lumping itself isn’t a problem. In fact, you have to.

    • But ideologies are going to structure your lumpings, so watch out!


Data

  • Public Twitter messages in same-gender and cross-gender social networks

    • Word frequencies (unigrams)

    • Gender (induced from first names); e.g., the Social Security Administration says:

      • Tyler is a male name 97.36% of the time

      • Penny and Annette are female names 100% of the time

      • Robin is female 87.69% of the time

  • 14,464 Twitter users (56% male)

    • Geolocated in the US

    • Must use 50 of top 1,000 most frequent words

    • Between 4 and 100 “mutual @’s” separated by 14 days

      • Women have 58% female friends

      • Men have 67% male friends

  • 9.2M tweets, Jan-Jun 2011
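
A minimal sketch of how gender might be induced from first names, using only the Social Security Administration proportions quoted on this slide. The lookup table layout, the `induce_gender` helper, and the 0.85 cutoff are illustrative assumptions, not the pipeline actually used in this work.

```python
# Share of people with each first name who are female, taken from the
# figures quoted on this slide (Tyler is male 97.36% of the time).
FEMALE_SHARE = {
    "tyler": 1.0 - 0.9736,
    "penny": 1.0,
    "annette": 1.0,
    "robin": 0.8769,
}


def induce_gender(first_name, cutoff=0.85):
    """Return 'F', 'M', or None when the name is unknown or too ambiguous.

    The cutoff is a hypothetical parameter chosen for illustration.
    """
    share = FEMALE_SHARE.get(first_name.strip().lower())
    if share is None:
        return None  # name not in the table
    if share >= cutoff:
        return "F"
    if 1.0 - share >= cutoff:
        return "M"
    return None  # too evenly split to label


# "Sam" is not in the table, so it stays unlabeled.
labels = {name: induce_gender(name) for name in ["Tyler", "Penny", "Robin", "Sam"]}
```

Raising the cutoff would drop more ambiguous names such as Robin, trading coverage for cleaner labels.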


First step: take the “normal” route

  • Train a statistical model on part of the data.

    • Logistic regression

  • Test it on a different part of the data, hiding the gender labels.

    • 10-fold cross-validation: 10 unique training/test splits (so each test set is a different 10% of the data)

  • State-of-the-art prediction: 88.9%

    • Lexical features do strongly predict gender

    • Ignoring syntax (treating tweets as “bags of words”) works pretty well
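
The “normal” route can be sketched roughly as follows: bag-of-words features, a logistic regression, and k-fold cross-validation. This is a toy, standard-library-only illustration. The function names, the learning rate, the unregularized gradient-descent training, and the made-up author texts (built from markers on the “Are women less standard?” slide) are all assumptions, not the model or data behind the 88.9% figure.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    """Unigram counts; word order (syntax) is discarded."""
    return Counter(text.lower().split())


def train_logreg(docs, labels, epochs=200, lr=0.5):
    """Fit weights by plain stochastic gradient descent (no regularization)."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for bow, y in zip(docs, labels):
            z = bias + sum(weights[w] * c for w, c in bow.items())
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = y - 1.0 / (1.0 + math.exp(-z))
            bias += lr * err
            for w, c in bow.items():
                weights[w] += lr * err * c
    return weights, bias


def predict(model, bow):
    """Return 1 if the model scores the author as female, else 0."""
    weights, bias = model
    return int(bias + sum(weights[w] * c for w, c in bow.items()) > 0)


def k_fold_accuracy(docs, labels, k=10, seed=0):
    """Each item is tested exactly once, by a model that never saw it."""
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    correct = 0
    for fold in range(k):
        test_idx = set(order[fold::k])
        train = [i for i in order if i not in test_idx]
        model = train_logreg([docs[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Invented toy authors; 1 = female, 0 = male.
female = ["okay yes nooo", "yes cannot okay", "nooo okay cannot",
          "yes yes nooo", "cannot nooo yes"] * 2
male = ["yessir nah nobody", "nah ain't yessir", "nobody ain't nah",
        "yessir yessir nah", "ain't nobody yessir"] * 2
docs = [bag_of_words(t) for t in female + male]
labels = [1] * len(female) + [0] * len(male)
accuracy = k_fold_accuracy(docs, labels, k=10)
```

The toy vocabularies do not overlap, so this is far easier than real tweets; the point is only the shape of the train/test loop.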


Top 500 markers for each gender


Are women less standard?

  • Female markers:

    • okay, yes, yess, yesss, yessss

    • nooo, noooo

    • cannot

  • Male markers:

    • yessir

    • nah, nobody

    • ain’t

  • What counts as standard?


Hand classification (94.2% agreement)

At a corpus level, women use more non-dictionary words and men use more named entities. In a moment we’ll ask how universal this is.


But wait

  • “Dictionary” words are really diverse

    • There’s a sense that dude (m), cute (f), epic (m), and lovely (f) are “stylistic” in a way that ability (m), correct (m), lipstick (f) and sleepy (f) are not, but how would we pin this down?

    • Part of speech?

    • But in what way do cute (f), hot (f), epic (m), and solid (m) belong with correct (m), offensive (m), sleepy (f) and glad (f)?

    • And for hot and solid, the “style” or “content” division depends on the intended word sense.


Involvement

  • Using traditional definitions, we’d say that our data confirms

    • men as more informational (all those named entities)

    • women as more interactive/involved (pronouns, emoticons, etc).

  • Recall that most of the named entities for the men are sports figures and teams.


Right. These guys are not “involved”


Shit Girls Say

http://www.youtube.com/watch?feature=player_embedded&v=u-yLGIH7W9Y


Meme-splosion!


Notice

  • That gender wasn’t really limited to the “gender” column

    • “Moms” and “dads” are gendered social roles

  • And that the words “guys” and “girls” aren’t really the same as “male” and “female”

    • What are the plausible age ranges and social styles for “guys” and “girls”?


Back to our data


Clustering without regard to gender

  • We clustered authors into 20 clusters, ignoring their gender

    • Clustering considered text only

    • K-means with log-linear distributions

      • (Eisenstein, Ahmed, and Xing, ICML 2011)

    • Many clusters have strong demographic orientations, including gender, race, and age
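
To give a feel for clustering authors on text alone, here is a plain Lloyd's k-means over word-frequency vectors. It is a deliberately simpler stand-in for the log-linear model cited above, and the two-dimensional toy vectors are invented for illustration.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means on dense vectors; returns one cluster id per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical authors described by relative frequencies of two words.
# Gender labels are never passed to the clustering.
authors = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
           [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
clusters = kmeans(authors, k=2)
```

Only after clustering would one look up each author's gender and ask how skewed each cluster turned out to be.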


Clusters that are majority female

At the population level, women use few named entities and many non-dictionary words.

But there are clusters of (mostly) women who do the opposite.


Clusters that are majority male

At the population level, men use many named entities and few non-dictionary words.

But there are clusters of men who do the opposite.


Erasure!

  • Clusters are highly gendered

  • For example, let’s consider clusters made up of 60% or more of people of the same gender

    • That covers 72.79% of all the authors

    • But what about the 1,420 men who are part of female-majority clusters?

    • The 1,219 women who are part of male-majority clusters?

    • The 782 people who are part of clusters that aren’t gender-skewed?

    • Are they just noise? Odd-balls? Is there no structure to what they’re doing?
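
The bookkeeping behind these counts can be sketched as below: given each author's cluster and gender, find the clusters that are at least 60% one gender and count who goes with and against the majority. The `gender_skew_report` helper and the toy cluster sizes are hypothetical and do not reproduce the figures above.

```python
from collections import Counter, defaultdict


def gender_skew_report(authors, threshold=0.60):
    """authors: list of (cluster_id, gender) pairs with gender in {'F', 'M'}.

    Returns counts of authors who match their cluster's majority gender,
    who go against it, or who sit in clusters below the threshold.
    """
    by_cluster = defaultdict(Counter)
    for cluster_id, gender in authors:
        by_cluster[cluster_id][gender] += 1

    report = Counter()
    for counts in by_cluster.values():
        total = sum(counts.values())
        _, majority_n = counts.most_common(1)[0]
        if majority_n / total >= threshold:
            report["with_majority"] += majority_n
            report["against_majority"] += total - majority_n
        else:
            report["unskewed_cluster"] += total
    return report


# Invented toy assignment: cluster 0 is 80% female, cluster 1 is 70% male,
# cluster 2 is evenly split.
toy = ([(0, "F")] * 8 + [(0, "M")] * 2 +
       [(1, "M")] * 7 + [(1, "F")] * 3 +
       [(2, "F")] * 5 + [(2, "M")] * 5)
report = gender_skew_report(toy)
```

The "against_majority" and "unskewed_cluster" rows are exactly the people a binary classifier tends to treat as noise.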


Men with male networks use more male markers


Women with female networks use more female markers


Women with the most male networks use more male markers


Men with female networks use the most female markers


The classifier does best classifying women with female networks


The classifier does best classifying men with male networks.


Markers go beyond “you”

  • The decile of men with the most female-skewed social networks

    • use far more female lexical markers than male markers (only 25% of the markers they use are male).

  • For the decile of men with the most male-skewed networks

    • male and female markers are used at roughly equal rates (because the female markers include more common words).

  • For the decile of women with the most female-skewed networks

    • 85% of the lexical markers that they use are female.

  • For the decile of women with the most male-skewed networks

    • 75% of the lexical markers that they use are female.
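
A rough sketch of the two measurements behind these bullets: the share of an author's gender-marked tokens that are female markers, and the decile of authors with the most skewed networks. The helper names, the `female_friend_share` field, and the toy authors are assumptions; the marker sets reuse a few words from the “Are women less standard?” slide.

```python
FEMALE_MARKERS = {"okay", "yes", "nooo", "cannot"}
MALE_MARKERS = {"yessir", "nah", "nobody", "ain't"}


def female_marker_share(tokens, female_markers, male_markers):
    """Fraction of an author's gender-marked tokens that are female markers."""
    f = sum(1 for t in tokens if t in female_markers)
    m = sum(1 for t in tokens if t in male_markers)
    return f / (f + m) if f + m else None


def top_decile(authors, key):
    """Return the 10% of authors with the largest value of `key` (at least one)."""
    ranked = sorted(authors, key=key, reverse=True)
    return ranked[: max(1, len(ranked) // 10)]


# Three female markers and one male marker; "lol" is unmarked and ignored.
share = female_marker_share("okay nah yes nooo lol".split(),
                            FEMALE_MARKERS, MALE_MARKERS)

# Ten hypothetical authors whose networks range from 0% to 90% female.
authors = [{"name": f"user{i}", "female_friend_share": i / 10} for i in range(10)]
most_female_networks = top_decile(authors, key=lambda a: a["female_friend_share"])
```

Running the same share calculation over each decile, separately for men and women, gives the kind of comparison listed above.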


Does social network help prediction?

  • 89% accuracy with text alone

    • Logistic regression, 10-fold cross-validation

    • State-of-the-art


  • Add network information…

    • Still 89% accuracy


Once we have 1000 words/author, network info doesn’t help


Wait, why not?

  • A new feature is only going to improve classification accuracy if it adds new information.

  • There is strong homophily: 63% of the connections are between same-gender individuals.

  • But language and social network can’t mutually disambiguate because they aren’t independent views on gender

  • Individuals who use linguistic resources from “the opposite gender” have consistently denser social network connections to the opposite gender.

    • Performance, style, accommodation

  • Gender is not an “A or B” kind of thing
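
One way to read the 63% figure is against a random-mixing baseline. The sketch below measures the same-gender share of an edge list and computes what that share would be if ties formed independently of gender in a population that is 56% male (the proportion reported on the Data slide). The independence assumption, the helper names, and the toy graph are illustrative only; the baseline also ignores any difference in how many ties men and women have.

```python
def same_gender_share(edges, gender):
    """Fraction of edges whose two endpoints have the same gender label."""
    same = sum(1 for a, b in edges if gender[a] == gender[b])
    return same / len(edges)


def random_mixing_baseline(p_male):
    """Expected same-gender share if ties formed independently of gender."""
    return p_male ** 2 + (1 - p_male) ** 2


# 56% of the users in this data are male.
baseline = random_mixing_baseline(0.56)

# Invented toy graph: two same-gender ties and two cross-gender ties.
gender = {"a": "M", "b": "M", "c": "F", "d": "F"}
edges = [("a", "b"), ("c", "d"), ("a", "c"), ("b", "d")]
toy_share = same_gender_share(edges, gender)
```

Under these assumptions the baseline comes out at roughly 0.51, so 63% is well above chance, which is what “strong homophily” amounts to here.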


If we seek only predictive accuracy…


We’re awesome!


Not so simple

  • If we want to understand categories, we should start with people in interactions.

    • Counting is great but we have to watch our bins and investigate them, too.

  • A binary model of gender is only adequate if you have blinders on

    • “My mom has never in her life said that’s lovely or omg!...nevermind that!”

    • And we can’t trust the idea that we can just figure out each part independently: that if we figure out “woman” and “African American”, then we’ll understand “African American women”.

  • Big data offers us the opportunity to let clusters emerge (and test them against our big bins).

    • In other words, Twitter and other forms of big data offer a way to show how language reflects and creates the social worlds we live in.


Thanks!


Appendix


How to grab Twitter data (overview)

  • Not the only way, just a way. Roughly sketched out.

  • Install Python Development Tools

  • Install the “twitter” package

  • Then write some code:

    • import twitter

    • twitter_search = twitter.Twitter(domain="search.twitter.com")

    • trends = twitter_search.trends()

    • [trend['name'] for trend in trends['trends']]

  • References:

    • https://dev.twitter.com/

    • http://pypi.python.org/pypi/twitter/


Alternatively…

  • Less programmy:

    • http://cit.duke.edu/blog/2010/01/collecting-sorting-and-archiving-tweets/

    • Maybe: http://www.visitmix.com/labs/archivist-desktop/

  • Programmy:

    • http://cjohansen.no/en/ruby/collecting_tweets_with_twibot_and_activerecord

    • http://hasin.me/2009/06/20/collecting-data-from-streaming-api-in-twitter/

    • https://github.com/jobrieniii/yourTwapperKeeper

