Done with first draft of Part II

ethanweed · Aug 11, 2021 · 6d211f1
1 parent 62bc064

Showing 93 changed files with 11,339 additions and 361 deletions.
diff --git a/Book/_build/.doctrees/01.01-intro.doctree b/Book/_build/.doctrees/01.01-intro.doctree
diff --git a/Book/_build/.doctrees/02.01-getting_started_with_python.doctree b/Book/_build/.doctrees/02.01-getting_started_with_python.doctree
diff --git a/Book/_build/.doctrees/02.02-more_python_concepts.doctree b/Book/_build/.doctrees/02.02-more_python_concepts.doctree
diff --git a/Book/_build/.doctrees/03.03-pragmatic_matters.doctree b/Book/_build/.doctrees/03.03-pragmatic_matters.doctree
diff --git a/Book/_build/.doctrees/03.04-basic_programming.doctree b/Book/_build/.doctrees/03.04-basic_programming.doctree
diff --git a/Book/_build/.doctrees/04.03-estimation.doctree b/Book/_build/.doctrees/04.03-estimation.doctree
diff --git a/Book/_build/.doctrees/environment.pickle b/Book/_build/.doctrees/environment.pickle
diff --git a/Book/_build/.doctrees/glue_cache.json b/Book/_build/.doctrees/glue_cache.json
diff --git a/Book/_build/.doctrees/landingpage.doctree b/Book/_build/.doctrees/landingpage.doctree
diff --git a/Book/_build/html/01.01-intro.html b/Book/_build/html/01.01-intro.html
@@ -619,7 +619,8 @@ <h2><span class="section-number">1.2. </span>The cautionary tale of Simpson’s
 <p>Here’s what’s going on. Firstly, notice that the departments are <em>not</em> equal to one another in terms of their admission percentages: some departments (e.g., engineering, chemistry) tended to admit a high percentage of the qualified applicants, whereas others (e.g., English) tended to reject most of the candidates, even if they were high quality. So, among the six departments shown above, notice that department A is the most generous, followed by B, C, D, E and F in that order. Next, notice that males and females tended to apply to different departments. If we rank the departments in terms of the total number of male applicants, we get <strong>A</strong>&gt;<strong>B</strong>&gt;D&gt;C&gt;F&gt;E (the “easy” departments are in bold). On the whole, males tended to apply to the departments that had high admission rates. Now compare this to how the female applicants distributed themselves. Ranking the departments in terms of the total number of female applicants produces a quite different ordering C&gt;E&gt;D&gt;F&gt;<strong>A</strong>&gt;<strong>B</strong>. In other words, what these data seem to be suggesting is that the female applicants tended to apply to “harder” departments. And in fact, if we look at all Figure &#64;ref(fig:berkeley) we see that this trend is systematic, and quite striking. This effect is known as Simpson’s paradox. It’s not common, but it does happen in real life, and most people are very surprised by it when they first encounter it, and many people refuse to even believe that it’s real. It is very real. And while there are lots of very subtle statistical lessons buried in there, I want to use it to make a much more important point …doing research is hard, and there are <em>lots</em> of subtle, counterintuitive traps lying in wait for the unwary.  That’s reason #2 why scientists love statistics, and why we teach research methods. 
Because science is hard, and the truth is sometimes cunningly hidden in the nooks and crannies of complicated data.</p>
 <div class="cell tag_hide-input docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">myst_nb</span> <span class="kn">import</span> <span class="n">glue</span>
+<span class="kn">import</span> <span class="nn">os</span>
 <span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
 <span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
 
@@ -651,7 +652,7 @@ <h2><span class="section-number">1.2. </span>The cautionary tale of Simpson’s
 </div>
 </div>
 <div class="figure align-default" id="fig-berkely" style="width: 600px">
-<p class="caption"><span class="caption-number">Fig. 1.2 </span><span class="caption-text">The Berkeley 1973 college admissions data. This figure plots the admission rate for the 85 departments that had at least one female applicant, as a function of the percentage of applicants that were female. The plot is a redrawing of Figure 1 from Bickel et al. (1975).</span><a class="headerlink" href="#fig-berkely" title="Permalink to this image">¶</a></p>
+<p class="caption"><span class="caption-number">Fig. 1.2 </span><span class="caption-text">The Berkeley 1973 college admissions data. This figure plots the admission rate for the 85 departments that had at least one female applicant, as a function of the percentage of applicants that were female. Based on data from <span id="id7">[<a class="reference internal" href="bibliography.html#id39"><span>BHOConnell75</span></a>]</span>.</span><a class="headerlink" href="#fig-berkely" title="Permalink to this image">¶</a></p>
 </div>
 <p>Before leaving this topic entirely, I want to point out something else really critical that is often overlooked in a research methods class. Statistics only solves <em>part</em> of the problem. Remember that we started all this with the concern that Berkeley’s admissions processes might be unfairly biased against female applicants. When we looked at the “aggregated” data, it did seem like the university was discriminating against women, but when we “disaggregate” and looked at the individual behaviour of all the departments, it turned out that the actual departments were, if anything, slightly biased in favour of women. The gender bias in total admissions was caused by the fact that women tended to self-select for harder departments.  From a legal perspective, that would probably put the university in the clear. Postgraduate admissions are determined at the level of the individual department (and there are good reasons to do that), and at the level of individual departments, the decisions are more or less unbiased (the weak bias in favour of females at that level is small, and not consistent across departments). Since the university can’t dictate which departments people choose to apply to, and the decision making takes place at the level of the department, it can hardly be held accountable for any biases that those choices produce.</p>
 <p>That was the basis for my somewhat glib remarks earlier, but that’s not exactly the whole story, is it? After all, if we’re interested in this from a more sociological and psychological perspective, we might want to ask <em>why</em> there are such strong gender differences in applications. Why do males tend to apply to engineering more often than females, and why is this reversed for the English department? And why is it the case that the departments that tend to have a female-application bias tend to have lower overall admission rates than those departments that have a male-application bias? Might this not still reflect a gender bias, even though every single department is itself unbiased? It might. Suppose, hypothetically, that males preferred to apply to “hard sciences” and females prefer “humanities”. And suppose further that the reason the humanities departments have low admission rates is  because the government doesn’t want to fund the humanities (Ph.D. places, for instance, are often tied to government funded research projects). Does that constitute a gender bias? Or just an unenlightened view of the value of the humanities? What if someone at a high level in the government cut the humanities funds because they felt that the humanities are “useless chick stuff”. That seems pretty <em>blatantly</em> gender biased. None of this falls within the purview of statistics, but it matters to the research project. If you’re interested in the overall structural effects of subtle gender biases, then you probably want to look at <em>both</em> the aggregated and disaggregated data. If you’re interested in the decision making process at Berkeley itself then you’re probably only interested in the disaggregated data.</p>