Tuesday, February 8, 2022

Review: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

 


One of the big problems in both psychiatric and psychological research that I have written about extensively is the tendency of researchers to think that their subjects are usually being truthful, especially when it comes to things like family dysfunction, marital maladjustment and child abuse.


Most people who recognize that subjects often are not truthful about these matters attribute it mostly to personal shame and embarrassment. While that is sometimes the case, I think the lies are more often about protecting the reputation of their families of origin.

 

We of course have very little truly objective research data in these fields because:

1. We can’t read minds.

2. People are good actors, leading to falsehoods in the observations of the researchers.

3. People not only lie to others, but lie to themselves as well. This is a part of the willful blindness characteristic of groupthink, which we need in order to maintain group cohesion with our kin and ethnic groups. Logic evolved not to reach the truth, but to justify group norms, as Gregg Henriques has pointed out.

 

The author of this book states, “People lie about how many drinks they had on the way home. They lie about how often they go to the gym, how much those new shoes cost, whether they read that last book. They call in sick when they’re not…They say they’re happy when they’re in the dumps. They say they like women when they really like men…People lie to their friends. They lie to their kids. They lie to parents. They lie to doctors…They lie to themselves. And they damn sure lie to surveys.”  

 

Stephens-Davidowitz’s book discusses one way we can get around this. We now have "big data," which can monitor not only the internet sites we visit but also the questions we have in our own minds. People have little incentive to lie in the context of a Google search because no one they know will be aware of what they are doing when they search for, say, lesbian sex on Pornhub. The author refers to the internet as “digital truth serum.”

 

He talks about how this data helps us spot patterns of human thinking and behavior as well as predict how one variable will affect another. This data contains many surprises. For example, if you check into what follows most often when you type in “it’s normal to want to kill…” the most common inquiry is “my family.” 


Human sexual behavior, predictably, is a big area for surprises. Among the top searches on Pornhub by women is sex featuring violence against women, with such searches as “extreme brutal gangbang.” On Google, there are twice as many complaints by women as by men about a lack of sex in their relationship.

 

Some human activities that are thought by most people to be productive may actually backfire. When President Obama gave a speech about tolerance, searches for “kill Muslims” actually tripled during the speech.

 

One of my favorite facts, based on FBI hourly crime data, box office numbers, and a measure of the violence in particular movies, was that after the release of particularly violent but popular movies, violent acts actually declined that weekend rather than rising, as conventional wisdom might suggest.

 

Now of course even with big data there are some questions which cannot be clarified, and the author gives us a wonderful discussion of some of the hazards in using it to draw conclusions.

 

Another topic I blog about frequently is the illogical assumptions made about studies in which one variable seems to correlate with another, such as high schoolers who smoke pot getting poor grades. We all should know that correlation is not causation, but you’d never know that from looking at studies, in spite of all the hedging and disclaimers.


I learned that there are actually names for some of the fallacies I've been writing about. “Reverse causation” is when variable A is correlated with variable B, leading to the idea that A causes B when, in fact, B causes A. “Omitted variable bias” is when a third, ignored factor leads to increases in both A and B. Maybe kids from difficult homes have a tendency to both use drugs and get bad grades.
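
To make the omitted-variable idea concrete, here is a minimal Python sketch of the pot-and-grades example. All of the numbers are made up for illustration: the 30% share of difficult homes and the probabilities attached to it are assumptions, not figures from the book. In this toy world, smoking pot has no effect on grades at all.

```python
import random

def simulate_students(n=20000, seed=1):
    """Return (difficult_home, smokes_pot, bad_grades) tuples for n students.

    Hypothetical model: a difficult home raises the odds of BOTH pot use
    and bad grades. Pot use itself never influences grades here.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        difficult_home = rng.random() < 0.3  # assumed share of difficult homes
        smokes_pot = rng.random() < (0.5 if difficult_home else 0.1)
        bad_grades = rng.random() < (0.6 if difficult_home else 0.2)
        students.append((difficult_home, smokes_pot, bad_grades))
    return students

def bad_grade_rate(students, smokes_pot, difficult_home=None):
    """Share of students with bad grades within the selected group."""
    group = [s for s in students
             if s[1] == smokes_pot
             and (difficult_home is None or s[0] == difficult_home)]
    return sum(s[2] for s in group) / len(group)

def grade_gap(students, difficult_home=None):
    """Bad-grade rate of pot smokers minus that of non-smokers."""
    return (bad_grade_rate(students, True, difficult_home)
            - bad_grade_rate(students, False, difficult_home))

if __name__ == "__main__":
    students = simulate_students()
    print("gap, all students:   ", round(grade_gap(students), 3))
    print("gap, difficult homes:", round(grade_gap(students, True), 3))
    print("gap, other homes:    ", round(grade_gap(students, False), 3))
```

Pooled together, the pot smokers should show a clearly higher rate of bad grades, yet within each kind of home the gap should shrink to roughly zero, because the home environment was doing all the work.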

 

A big one in the genetics vs. environment debate is something called “dimensionality.” The human genome differs in literally millions of ways. If you test for a lot of different genes, some will correlate with the trait in question, but just by chance. This is similar to flipping 500 coins and finding one that turns up heads 15 times in a row, and assuming the reason is that it’s some sort of special coin. When such studies are repeated, the usual result is that the correlation disappears.
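
The same point can be sketched in a few lines of Python. This is only a toy simulation under stated assumptions: the "genes" and the "trait" are pure random noise, and the sample size, the number of genes, and the 0.14 cutoff (roughly the conventional 5% significance level for 200 people) are hypothetical choices, not anything taken from the book.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot correlate with anything
    return cov / (var_x * var_y) ** 0.5

def count_hits(n_genes, n_people=200, threshold=0.14, seed=0):
    """Count random 'genes' whose correlation with a random trait beats the cutoff."""
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_genes):
        gene = [rng.randint(0, 1) for _ in range(n_people)]  # variant present or not
        if abs(pearson(gene, trait)) > threshold:
            hits += 1
    return hits

if __name__ == "__main__":
    first_study = count_hits(n_genes=1000, seed=0)
    # "Repeat the study": re-test only the earlier hits on fresh random people.
    second_study = count_hits(n_genes=first_study, seed=1)
    print("genes 'linked' to the trait in study 1:", first_study)
    print("of those, still 'linked' in study 2:   ", second_study)
```

Since nothing here is truly related to anything else, every "hit" in the first study is luck, and almost none of them should survive the second one.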

 

This book is funny and well written. I highly recommend it.

