Everybody Lies

Caveat Lector. The book is generally hopelessly short on details about how numbers were generated. For instance, in the first point under Sex, we don't know how it was deduced that the people searching were married, in what time frame the research was conducted, etc.

Sex

  • Among married people: 'sexless marriage' is the top problem searched: 3x more searches than 'unhappy marriage' and 8x more than 'loveless marriage.'

  • Among unmarried: 'sexless relationship' is 2nd most common after 'abusive relationship'

  • In India, #1 search beginning w/ 'my husband wants ...' is 'me to breastfeed him'

    • porn searches w/ women breastfeeding 4x more common in India and Bangladesh than anywhere else in the world
  • Women search for concerns around their genitals as much as men. Women's #1 search = whether their vagina smells.

  • Banana Dreams

    • Data from Shadow, an app on which users record their dreams. Not sure exactly what the data show, except that whether a fruit/vegetable is shaped like a dick does not predict how often it shows up in dreams after "controlling for popularity."
      • Key predictor of what food we dream of = what we consume. water is #1. top 20 include chicken, bread, sandwiches, rice
      • 2nd big predictor of what food we dream of = what we find tasty. #1 and #2 = chocolate and pizza
      • Bananas are the #2 fruit in dreams but also the 2nd most commonly eaten fruit. cucumbers 7th most common on both lists. hot dogs dreamed of less than hamburgers
      • funny phrase: 'did not give it more likelihood' (explaining regression coef.)
  • Incest:

    • 16 of top 100 searches by men on PornHub related to incest. Includes 'mom and son', 'mom fucks son', 'step mom fucks son', 'brother and sister', 'real brother and sister'.
    • 9 out of top 100 searches by women on PH around incest. Looking for 'dad and daughter' etc.
    • On Google, the most common way to finish 'i want to have sex with my ...' = mom. 'I am attracted to ...' also has a bunch of incest-related completions. But these searches are not v. common: a few 1000s per year in the U.S.
      • Note: There is a very opaque appendix table. Not clear what the numbers mean. Is it total n. searches? Then what is the time period? Number for mom given = 720.
  • Baby

    • Data = wives searching about husbands on Google. Finding = some adult men want to wear diapers, be breastfed. Not clear how the data was collected.
    • most searched-for profession of women in porn videos, for men aged 18-24, 25-64 and 65+ alike = babysitter. teacher & cheerleader in top 4 for each group of men.
  • Americans search for 'porn' more than they search for 'weather'.

    • tried on google trends. pans out.
  • Among 150 most common searches by men:

    • shemale (77th most common; 1.4% of searches), granny (110th).
  • 25% of female searches for straight porn emphasize pain/humiliation. 5% for non-consensual. And women search for all these terms 2x as often as men.

  • 11% of women between 15--34 say they are sexually active, not pregnant, and not using contraceptives. That should mean ~10% of them become pregnant each month. Yet only 1 in 113 women become pregnant ----> not enough people having sex.

    • 16x more searches complaining about a spouse not wanting sex than about a spouse not willing to talk
    • unmarried people 5.5x more likely to complain about a partner not wanting sex than about a partner not texting back
    • 2x more complaints that boyfriend won't have sex than girlfriend
      • boyfriend won't have sex = #1 searched complaint
  • Men search for questions about penis more than any other body part

    • more searches for how to make penis bigger than how to tune a guitar, make an omelet, change a tire
    • top googled concern about steroids = size of penis
    • men search 170x more often about penis than women do about their partner's penis
    • 40% of women's searches about partner's penis size = it's too big.
    • women search as often about getting boyfriends to climax sooner as about making it last longer
  • 12% of non-generic porn searches are about big boobs: 20x the search volume of small boob porn
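
The pregnancy bullet is a short chain of multiplication. A back-of-envelope sketch using only the figures quoted above, assuming the "1 in 113" figure is a monthly rate (the notes don't give its time frame): the unprotected group alone should already produce more pregnancies than are observed across all women.

```python
# Figures as quoted in the notes. ASSUMPTION: "1 in 113" is a monthly rate.
share_unprotected = 0.11          # women reporting sex, not pregnant, no contraception
monthly_conception = 0.10         # expected monthly pregnancy rate for that group
observed_monthly_rate = 1 / 113   # observed share of all women becoming pregnant

# Pregnancies expected from the unprotected group alone, as a share of ALL women
implied_rate = share_unprotected * monthly_conception

print(f"implied from unprotected group alone: {implied_rate:.4f} (~1 in {1 / implied_rate:.0f})")
print(f"observed: {observed_monthly_rate:.4f} (1 in 113)")
print("implied exceeds observed:", implied_rate > observed_monthly_rate)
```

Since 0.11 × 0.10 ≈ 1 in 91, which exceeds 1 in 113 before anyone else is counted, the survey answers overstate how much sex is happening.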

Sexuality

  • Kinsey oversampled prisoners and prostitutes and estimated 10% of Americans are gay. Representative surveys estimate 2--3%.

  • Among men on FB, 2.5% list an interest in men

  • Surveys, FB suggest more gays in tolerant states than intolerant. RI (highest support for gay marriage) gay population ~ 2x Mississippi (lowest support for gay marriage)

    • Gay people move from intolerant to tolerant
    • Gay people hide that they are gay in intolerant states
  • Assume HS students can't choose where they live (though their parents can choose to move). So if state differences were driven by migration, they shouldn't appear among HS students. Yet only 2 in 1000 boys in MS come out as gay -> the gap is at least partly people hiding.

  • Using search data and AdWords: ~5% of men's porn searches are for gay male porn ('rocket tube', 'gay porn'). Gay porn searches only slightly more common in tolerant than intolerant states: in MS, 4.8% of porn searches are for gay porn; in RI, 5.2%. Estimates 5% of men are gay in US. In CA Bay Area ~ 4% openly gay on FB (this number gotten from FB ads data).

    • 20% of videos watched by women on PornHub are lesbian.
  • Most common way to finish 'Is my husband ...' = 'gay'. 10% more common than 'cheating', 8x more common than 'an alcoholic' and 10x more common than 'depressed'

    • is my husband gay more common in intolerant regions. highest in SC and LA
  • Craigslist Ad data:

    • percentage men looking for casual encounters w/ men higher in less tolerant states. highest = KY, LA, AL
  • Search for 'gay test' 2x more common in the 'least' tolerant states

    • some people search 'gay test' right before or after searching 'gay porn'
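
The Sexuality argument compares two ratios: the openly gay population in RI is ~2x that of MS, while the share of male porn searches for gay porn barely moves (5.2% vs. 4.8%). A back-of-envelope sketch, assuming (as a simplification, not something measured in the notes) that porn-search share tracks the true share of gay men in each state:

```python
# Figures as quoted in the notes above.
porn_share = {"MS": 0.048, "RI": 0.052}  # share of male porn searches that are for gay porn
open_ratio_ri_to_ms = 2.0                # openly gay population, RI vs. MS (surveys, FB)

# ASSUMPTION: the true share of gay men differs between states only as much as search share does.
search_ratio_ri_to_ms = porn_share["RI"] / porn_share["MS"]

# Openness = open share / true share. Openness in MS relative to RI:
#   (open_MS / open_RI) * (true_RI / true_MS) = search_ratio / open_ratio
relative_openness_ms = search_ratio_ri_to_ms / open_ratio_ri_to_ms

print(f"gay porn search share, RI/MS: {search_ratio_ri_to_ms:.2f}")
print(f"openness of gay men in MS relative to RI: {relative_openness_ms:.2f}")
```

Under that assumption, a gay man in MS would be roughly half as likely to be open as one in RI. If migration alone explained the 2x gap, the search shares would differ by about 2x as well, which they don't.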

Gender

  • According to AdWords:

    • interest in beauty and fitness = 42% male.
    • Weight loss = 33% male
    • Cosmetic surgery = 39% male
    • 20% of 'how to' searches about breasts = how to get rid of man boobs
  • 7M searches/yr for breast implants. 300K women get implants each year.

    • source unknown, as Google Trends doesn't give absolute numbers
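
Why Trends "doesn't give #s": Google describes the Trends index as each data point divided by total searches in that place and time, then rescaled to 0-100 against the peak in the selected range. A minimal sketch of that normalization, with hypothetical counts (Google's real pipeline also works from sampled data and rounds), shows that absolute volumes can't be recovered: two series differing by 10x produce the same index.

```python
def trends_index(term_counts, total_counts):
    """Sketch of Trends-style normalization: share of all searches,
    rescaled so the peak period = 100, rounded to integers."""
    shares = [t / total for t, total in zip(term_counts, total_counts)]
    peak = max(shares)
    return [round(100 * s / peak) for s in shares]

# Hypothetical counts: series b is exactly 10x series a, same totals.
totals = [1_000_000, 1_200_000, 900_000]
a = [700, 1_200, 450]
b = [7_000, 12_000, 4_500]

print(trends_index(a, totals))  # [70, 100, 50]
print(trends_index(b, totals))  # [70, 100, 50]
```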

Race

  • In the US, nigger(s) searched as often as migraine(s), economist, and Lakers

    • tried this on Google trends but shows diff. results (n word much less common). so not clear.
    • n-word in 7M American searches/year
      • not clear how the precise number was arrived at.
    • Rap lyrics weren't skewing the results. They use 'nigga(s)'
    • reason for search = looking for offensive jokes. 20% of searches included the word 'jokes.' other common phrases used in conjunction = stupid, 'i hate ...'
    • n... jokes 17x more common than 'kike jokes', 'gook jokes', 'spic jokes', 'chink jokes', 'fag jokes' combined
    • search for 'n jokes' rises by 30% on MLK day
  • Nate Silver: Best predictor of Trump support in primaries = searches for N word in the area

  • Stormfront:

    • Quantcast: 200--400k visits per month
    • 30% women
    • states w/ the highest per capita membership = Montana, Alaska, Idaho
  • On Obama's election night

    • 1 in 100 searches included 'kkk' or 'nigger(s)'
    • search for Stormfront was 10x normal
    • in some states, searches for 'nigger president' > 'first black president'
  • Estimates Obama lost ~4 percentage points due to explicit racism

    • Regressing (Obama vote share - Kerry vote share) ~ searches for N word + ed. levels + age + church etc.
  • Place with highest search rates of 'nigger(s)' =

    • upstate NY, western PA, eastern OH, industrial MI, rural IL, WV, LA, MS
  • Racist searches not correlated w/ unemployment

  • Why are X people ... top 5 negative words ...

    • blacks: rude, racist, stupid, ugly, lazy
    • jews: evil, racist, ugly, cheap, greedy
    • muslims: evil, terrorists, bad, violent, dangerous
    • mexicans: racist, stupid, ugly, lazy, dumb
    • asians: ugly, racist, annoying, stupid, cheap
    • gays: evil, wrong, stupid, annoying, selfish
    • christians: stupid, crazy, dumb, delusional, wrong
  • Top Google search w/ word Muslims in CA post San Bernardino = 'kill Muslims'. Americans searched for 'kill Muslims' as often as 'martini recipe', 'migraine symptoms'. Apparently also as often as 'Islamophobia'. Hateful searches were about 20% of searches about Muslims before the attack, 50% in the hours after it.

  • In 2015, 12k searches in US for 'kill Muslims'. 12 Muslims victims of hate crime.
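
The ~4-point estimate for Obama rests on a cross-area regression of the form noted above, (Obama - Kerry) ~ N-word search rate + controls. A minimal OLS sketch of that specification using `numpy.linalg.lstsq` on simulated data; every variable name, value, and "true" coefficient here is hypothetical, chosen only so the fit can be checked, and none of it is the book's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical number of areas (e.g. media markets)

# Simulated, standardized regressors (purely illustrative)
racist_search = rng.normal(size=n)
education = rng.normal(size=n)
age = rng.normal(size=n)
church = rng.normal(size=n)

# Design matrix with intercept; hypothetical "true" coefficients
X = np.column_stack([np.ones(n), racist_search, education, age, church])
true_beta = np.array([0.5, -1.5, 0.8, -0.3, -0.6])

# Simulated outcome: Obama share minus Kerry share (percentage points) plus noise
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Ordinary least squares fit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficient on racist-search rate:", round(beta_hat[1], 2))
```

The coefficient on the search variable is the quantity of interest; holding education, age, and church constant is what separates it from those correlated regional traits.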

Why Search Data is Great, Bad

  • Lying on surveys

    • On surveys, heterosexual women report having sex 55x/year and using a condom 16% of the time. This implies ~1.1B condoms/year. Heterosexual men's numbers imply ~1.6B condoms/year. Nielsen reports total condom sales ~ 600M.
    • Unmarried men claim to use condoms 29x/year. That alone implies more condoms than total condom sales in the US.
  • People are trying to find out stuff they need. So they have to be honest.

  • People won't search for stuff that they passively get via their apps etc. For instance, weather info.

  • Google search data bias = toward negative, weird

  • Relationship between n_searches, action can be weak. In 2015, 12k searches in US for 'kill Muslims'. 12 Muslims victims of hate crime.
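
The condom comparison is simple arithmetic. A sketch using only the numbers quoted above; the implied population is back-solved from the 1.1B total here, not taken from the book:

```python
# Figures as quoted in the notes.
sex_per_year_women = 55
condom_rate_women = 0.16
implied_women = 1.1e9   # condoms/year implied by women's answers
implied_men = 1.6e9     # condoms/year implied by men's answers
sold = 600e6            # Nielsen total condom sales

# Condoms per woman per year, then the population that yields the 1.1B total (back-solved)
condoms_per_woman = sex_per_year_women * condom_rate_women
implied_population = implied_women / condoms_per_woman

print(f"condoms per woman per year: {condoms_per_woman:.1f}")        # 8.8
print(f"implied number of women: {implied_population / 1e6:.0f}M")   # 125M
print(f"women's answers vs. sales: {implied_women / sold:.1f}x")     # 1.8x
print(f"men's answers vs. sales: {implied_men / sold:.1f}x")         # 2.7x
```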

Validating Google Trend Data

Not sure how this was deduced as trend data doesn't allow us to recover this.

  • States that search most for God = Alabama, Mississippi, and Arkansas
  • God searched most often on Sunday
  • Knicks searched most often in NYC

Other Interesting Patterns in Google Searches

  • search for 'anxiety symptoms', 'anxiety help' higher in places with lower levels of ed., lower median income, greater rural pop. Search rates higher in upstate NY than NYC

  • anxiety related searches don't rise day after terror attacks (since 2004, all major European or American attacks)

  • searches for jokes lowest on Mondays, cloudy or rainy days.

  • In winter, warm climates have 40% fewer depression searches than cold climates. Most effective depression drug ~ 20% effect.

  • Each month, 3.5M google searches related to suicide

    • Again not clear how number was arrived at. What the time period is etc.
  • In Mexico, top searches about 'my pregnant wife' include 'words of love to my pregnant wife' and 'poems for my ...'. US -> 'now what' and 'what do i do'

  • Using Google Autocomplete, #1 way to finish 'is it normal to want to ...' = 'kill' and then #1 way to finish 'is it normal to want to kill...' = 'my family'

  • In 2015, in US, 700k Google searches about self-induced abortions, 3.4M searches for abortion clinics, 4k searches for coat hanger abortions. Search rates steady between 2004--2007. Increase in late 2008 with the fin. crisis, and a 40% jump in 2011. 2011 is when the crackdown on abortion started: 92 state provisions restricting abortion enacted. MS = state w/ highest search rate for self-induced abortion. 8 of the 10 states with the highest search rates are hostile to abortion (hostility ratings from Guttmacher).

  • Sexism:

    • parents search 'is my son gifted' 2.5x as often as 'is my daughter gifted'. same for 'is my son a genius'. apparently more searches for 'is my son behind' than 'is my daughter behind', but precise # not given
    • 'is my daughter overweight' searched 2x as often as 'is my son overweight'. parents more eager to search for how to get their daughters to lose weight than their sons.
      • baseline = 28% of girls overweight vs. 35% of boys
  • People 7x likelier to search if people will regret not having children than search if they will regret having children

    • precise numbers in appendix given for total searches of people saying they regret having children (1,730 per month) --- how were the numbers gotten? Google provides no obv. way to get these numbers. Aargh.

Other Stuff Google Searches Predict

  • (Bing Search): Pancreatic Cancer: searches for abdominal pain followed by searches about yellowing skin can identify 5 to 15% of cases

  • 'flu symptoms' and 'muscle aches' ---> indicators of how fast flu is spreading.

  • 'how to vote' and 'where to vote' predicts turnout

  • Using Google correlate, when housing prices are rising, Americans search for '80/20 mortgage', 'new home builder' and 'appreciation rate'. When falling, search for 'short sale process', 'underwater mortgage' and 'mortgage forgiveness debt relief'.

  • Using Google Correlate, #1 term correlated w/ unemployment rate (2004--2011) = 'slutload'. Another highly correlated term = 'spider solitaire'. 'rawtube' also highly correlated at 1 point. basic point = 'diversion related' searches track unemp. well.

  • Vote pref.:

    • People put the candidate they support first in a search that includes both candidates' names (work w/ Stuart Gabriel (USC))
      • 1/4 of searches with Trump also included Clinton
      • includes some BS low N correlations
  • Conventional Wisdom = Explicit racism is limited to small % of Americans, mostly in the deep south.
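
The vote-preference idea is an ordering heuristic: in a query containing both candidates' names, the one typed first is taken as the searcher's preference. A minimal sketch, assuming plain lowercase substring matching (no handling of misspellings, or of words like 'trumpet' that contain a name); the function name and queries are hypothetical.

```python
def first_named_share(queries, a="trump", b="clinton"):
    """Among queries mentioning both names, return the share that name `a` first.

    Returns NaN when no query mentions both names.
    """
    a_first = both = 0
    for q in queries:
        q = q.lower()
        ia, ib = q.find(a), q.find(b)
        if ia == -1 or ib == -1:
            continue  # only queries naming both candidates are informative
        both += 1
        if ia < ib:
            a_first += 1
    return a_first / both if both else float("nan")

queries = [
    "trump clinton polls",
    "clinton vs trump debate",
    "trump clinton debate time",
    "trump rally schedule",  # ignored: mentions only one candidate
]
print(first_named_share(queries))  # 2 of 3 two-name queries put Trump first
```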

Misc. Research Reported

  • Size of heart, esp. left ventricle, is #1 predictor of race horse success. Another big predictor = size of spleen (bigger, better). 2-year old horses that wheeze after running 1/8th of a mile nearly never pan out.

  • Wine price ~ (Ashenfelter)

    • Wine price mostly explained by region
    • Wine quality ~ region + weather from year grown
      • Quality = 12.145 + .00117 winter rainfall + .0614 avg. growing season temp. - .00386 harvest rainfall
    • https://www.jstor.org/stable/20108831
  • 'The United States are ...' more common than 'the United States is ...' until 1880. (Google N-grams.)

  • Speed Dating data (McFarland, Jurafsky, Rawlings)

    • Paper = https://nlp.stanford.edu/pubs/mcfarlandjurafskyrawlings.pdf
    • Women prefer men who are taller, share their hobbies
    • Men prefer women who are skinnier, share their hobbies
    • Men signal interest by
      • laughing at women's jokes
      • limiting range of pitch. seen as masculine
    • Women signal interest by
      • varying pitch
      • speaking more softly
      • shorter turns talking
      • less interested if using 'probably', 'i guess', 'sorta', 'kinda' etc.
      • interested if talking about herself
    • Women like men
      • who follow their lead
      • laugh at their jokes
      • keep conversation on topics introduced by her
      • who support, sympathize ---'that's awesome', 'that's cool', 'must be tough' etc.
    • Date going badly if too many questions in convo.
  • Stories fit 1 of 6 categories generally (Reagan et al.)

    • Rags to Riches
    • Riches to Rags
    • Man in a Hole (fall, then rise)
    • Icarus (rise, then fall)
    • Cinderella (rise, fall, rise)
    • Oedipus (fall, rise, fall)
    • Paper = https://arxiv.org/abs/1606.07772
  • +ve stories more likely to make NYT most emailed ---Berger and Milkman using NYT stories. Paper = http://journals.ama.org/doi/abs/10.1509/jmr.10.0353?code=amma-site

  • Using school picture books:

  • Basketball

    • black men are 40x more likely than white men to reach the NBA
    • black kid born in the wealthiest county in the US is 2x more likely to reach the NBA than one born in the poorest
    • NBA superstars 30% less likely to be born to unwed mothers. Data = 100 NBA players born in 1980s who scored the most points.
    • Fryer and Levitt found, among African Americans, unwed mothers, uneducated etc. give kids more unusual names. AA kids born into poverty have 2x the rate of having a unique name. CA-born black NBA players are 1/2 as likely to have unique names as the avg. black male
    • 60% of AA born in 1980s were born to unwed mothers. Of the Black NBA players born in 1980s, majority born to married parents
    • Each additional inch doubles chances of getting into the NBA
      • Those over 7', 1 in 5 reach NBA. Those below 6', 1 in 2M
  • A/B testing (Obama campaign): photo of Obama w/ family (vs. solo Obama) + 'Learn More' button (vs. 'Join Us Now', 'Sign Up') -> 40% more sign-ups and $60M in add'l campaign funding.

  • Top addictions reported to Google in 2016: Drugs, Sex, Sugar, Porn, Love, Gambling, Alcohol, FB

  • Stephens-Davidowitz, Hal Varian and Michael Smith used which teams are in the Super Bowl as an instrument for ad exposure:

    • When Boston was playing and Baltimore was not (2012), audience shares in the two cities = 56.7 and 47.9, respectively. When reversed (2013), shares = 48.0 and 59.6
    • Ad for movies ---> sig. more people attend movies advertised in the superbowl in cities whose teams make playoffs. no effect on movies not advertising in the superbowl
    • Hartmann and Klapper --- beer and soft drink --- 2.5x ROI
  • Clemens and Gottlieb use data on how Medicare reimburses physicians for procedures --- change by county --- to estimate how motivated are docs. by fin. incentives --- 2% increase in payment leads to 3% increase in care provision: https://www.aeaweb.org/articles?id=10.1257/aer.104.4.1320 (elective procedures respond much more strongly.)

  • When people become fans of baseball teams:

    • People become fans of teams that win when they are ~10 (roughly normally distributed around that age; age 10 = 8% of the fans, by 19 ~ 1%)
  • When people get PID:

    • popularity of the president while a person is 14--24 shapes their PID. Age 18 is the most sig.
    • Gelman et al.
  • Cheating on taxes:

    • EITC maxes out at an income of $9k (w/ 1 child). So $9k is the most commonly reported taxable income among self-employed w/ 1 child. In Miami, 30% reported $9k. In PHL, 2%. Biggest predictor of cheating = network effects.
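
The wine equation in the Ashenfelter bullet can be evaluated directly. A minimal sketch that uses the intercept and coefficient magnitudes quoted above, with harvest rainfall entering negatively (rain near harvest lowers predicted quality, the sign in Ashenfelter's published version). Units of millimetres of rain and degrees Celsius are an assumption, and the example vintages are hypothetical.

```python
def predicted_wine_quality(winter_rain_mm, growing_temp_c, harvest_rain_mm):
    """Evaluate the wine equation with the coefficients quoted in these notes.

    ASSUMPTION: rain in mm, temperature in degrees Celsius.
    Harvest rainfall enters with a negative sign.
    """
    return (12.145
            + 0.00117 * winter_rain_mm
            + 0.0614 * growing_temp_c
            - 0.00386 * harvest_rain_mm)

# Two hypothetical vintages that differ only in harvest rain
dry_harvest = predicted_wine_quality(600, 17.5, 100)
wet_harvest = predicted_wine_quality(600, 17.5, 250)
print(round(dry_harvest, 4), round(wet_harvest, 4))
```

Holding winter rain and temperature fixed, every extra 100 mm of harvest rain lowers the prediction by 0.386.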