Last time we discussed a simple model of polling where each demographic was a different color of ball in an urn, with supporters of a candidate as solid balls and non-supporters as striped balls. We saw with data from the Hispanic demographic that overdispersion was present and diluted support for Biden.
But we assumed the polls that formed the basis of our reasoning were perfectly representative samples. I think we will begin to tease this apart in this post. I don't think I'll discuss sampling methods yet.
Instead, in this post, I'll start with several urns containing different quantities of striped and solid balls. Even with ideal sampling methods, different techniques produce a wide range of estimates of the proportion of striped balls in an urn.
Toy Problem
Consider four urns: A, C, F, and T. They contain balls of three colors (red, green, blue), each either striped or solid. We want to know how many solid balls are in each urn from a given sample. But we approach this backwards: we begin by knowing the number of striped and solid balls in each urn, and consider different sampling methods.
Urn A | Number Solid | Number Striped | Total | % of Balls |
---|---|---|---|---|
Green | 962,036 | 357,294 | 1,319,330 | 96.12% |
Blue | 34,147 | 7,532 | 41,679 | 3.04% |
Red | 3,348 | 8,242 | 11,590 | 0.84% |
Total | 999,531 (72.8%) | 373,068 | 1,372,599 | 100% |
Urn C | Number Solid | Number Striped | Total | % of Balls |
---|---|---|---|---|
Green | 6,628,592 | 1,910,704 | 8,539,296 | 97.48% |
Blue | 130,711 | 20,448 | 151,159 | 1.73% |
Red | 19,519 | 49,726 | 69,245 | 0.79% |
Total | 6,778,822 (77.4%) | 1,980,878 | 8,759,700 | 100% |
Urn F | Number Solid | Number Striped | Total | % of Balls |
---|---|---|---|---|
Green | 364,392 | 126,652 | 491,044 | 21.02% |
Blue | 598,611 | 193,014 | 791,625 | 33.88% |
Red | 276,758 | 777,166 | 1,053,924 | 45.11% |
Total | 1,239,761 (53.1%) | 1,096,832 | 2,336,593 | 100% |
Urn T | Number Solid | Number Striped | Total | % of Balls |
---|---|---|---|---|
Green | 4,723,236 | 1,716,632 | 6,439,868 | 96.91% |
Blue | 110,395 | 32,782 | 143,177 | 2.15% |
Red | 18,830 | 43,645 | 62,475 | 0.94% |
Total | 4,852,461 (73.0%) | 1,793,059 | 6,645,520 | 100% |
Across all four urns there are 19,114,412 balls: 13,870,575 solid (72.566%) and 5,243,837 striped (27.434%).
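A quick way to double-check these totals is to encode the tables and add them up. This is a minimal Python sketch; the dictionary name `URNS` and its (solid, striped) layout are just conventions chosen here.

```python
# (solid, striped) counts for each color, copied from the urn tables above
URNS = {
    "A": {"green": (962_036, 357_294), "blue": (34_147, 7_532), "red": (3_348, 8_242)},
    "C": {"green": (6_628_592, 1_910_704), "blue": (130_711, 20_448), "red": (19_519, 49_726)},
    "F": {"green": (364_392, 126_652), "blue": (598_611, 193_014), "red": (276_758, 777_166)},
    "T": {"green": (4_723_236, 1_716_632), "blue": (110_395, 32_782), "red": (18_830, 43_645)},
}

pooled_solid = pooled_striped = 0
for name, colors in URNS.items():
    solid = sum(s for s, _ in colors.values())
    striped = sum(t for _, t in colors.values())
    pooled_solid += solid
    pooled_striped += striped
    print(f"Urn {name}: {solid:,} solid, {striped:,} striped, {solid / (solid + striped):.1%} solid")

total = pooled_solid + pooled_striped
print(f"All urns: {total:,} balls, {pooled_solid / total:.3%} solid, {pooled_striped / total:.3%} striped")
```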
Exercise 1. Suppose we had a hypothetical urn with 5,243,837 striped balls and 13,870,575 solid balls, and we sampled \(n\) balls without replacement. What's the expected number of striped balls drawn? What's the interval containing 75% of the probability distribution about the mean for sample sizes \(n=75,100,125\)?
This requires being more precise about what we're really interested in finding. The expected number of striped balls in a sample of \(n\) balls is \(0.27434n\), which is unambiguous. (For \(n=100\), the median is 27 striped balls.)
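To construct the interval, let \(X\) be the number of striped balls drawn, \(M=5{,}243{,}837\) the number of striped balls in the urn, and \(N=19{,}114{,}412\) the total number of balls. The upper bound is the \(k\) for which \(\Pr(X\leq k\mid M,N,n=100)\approx 7/8\), which empirically is about \(k\approx32\); the lower bound is the \(k\) for which \(\Pr(X\leq k\mid M,N,n=100)\approx 1/8\), about \(k\approx 22\). This gives us an interval of 27 ± 5.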
We could also construct an interval similar to the "highest posterior density interval". Intuitively, we plot the probability mass function (the distribution is discrete), then draw a horizontal line tangent to its peak, the mode. We lower the line until the probability between the intersection points reaches 75%. This produces a slightly different interval since the distribution is not symmetric, though for the sample sizes we are considering the difference would not be appreciable.
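Both kinds of 75% interval can be computed for the pooled urn at \(n=75,100,125\) with a short Python sketch. It assumes the hypergeometric model of Exercise 1 and evaluates the PMF in log space with `math.lgamma`, since the urn holds millions of balls. The helper names (`hypergeom_pmf`, `equal_tailed`, `highest_mass`) are hypothetical. Because the distribution is discrete, "lowering the line" becomes keeping the most probable outcomes until they cover 75%.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), via lgamma."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(total: int, marked: int, n: int) -> list[float]:
    """P(X = k) for k = 0..n, where X counts marked balls among n balls drawn
    without replacement from `total` balls, `marked` of which are marked."""
    log_denom = log_comb(total, n)
    return [
        exp(log_comb(marked, k) + log_comb(total - marked, n - k) - log_denom)
        if k <= marked and n - k <= total - marked else 0.0  # 0.0: impossible k
        for k in range(n + 1)
    ]


def equal_tailed(pmf: list[float], tail: float = 1 / 8) -> tuple[int, int]:
    """Smallest k with Pr(X <= k) >= tail and smallest k with Pr(X <= k) >= 1 - tail."""
    cdf, lo = 0.0, None
    for k, p in enumerate(pmf):
        cdf += p
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1 - tail:
            return lo, k
    return lo, len(pmf) - 1


def highest_mass(pmf: list[float], mass: float = 0.75) -> tuple[int, int, float]:
    """Discrete stand-in for lowering a horizontal line from the mode: keep the
    most probable outcomes until they cover `mass`. The pmf is unimodal, so the
    kept outcomes form the contiguous range lo..hi."""
    kept, covered = [], 0.0
    for k in sorted(range(len(pmf)), key=lambda j: pmf[j], reverse=True):
        kept.append(k)
        covered += pmf[k]
        if covered >= mass:
            break
    return min(kept), max(kept), covered


STRIPED, TOTAL = 5_243_837, 19_114_412  # the hypothetical pooled urn

for n in (75, 100, 125):
    pmf = hypergeom_pmf(TOTAL, STRIPED, n)
    mean = sum(k * p for k, p in enumerate(pmf))
    lo, hi = equal_tailed(pmf)
    h_lo, h_hi, covered = highest_mass(pmf)
    print(f"n={n}: mean {mean:.2f}, equal-tailed {lo}..{hi}, "
          f"highest-mass {h_lo}..{h_hi} (covers {covered:.3f})")
```

Because the outcomes are integers, both intervals cover at least 75% rather than exactly 75%. The convention in `equal_tailed` (smallest \(k\) whose CDF reaches 7/8) may put the upper bound one above the \(k\approx32\) read off in the text, since \(\Pr(X\leq 32)\) sits close to 7/8.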
Exercise 2. If we drew 4 samples, one from each urn (each sampled without replacement), with each sample consisting of \(n\) balls, what's the expected number of striped balls drawn? What's the interval containing 75% of the probability distribution about the mean for \(n=75,100,125\)?
Constructing the 75% intervals for samples of \(n=100\): urns A and T have the interval 27 ± 5 striped balls, urn C has the interval \(18\leq k\leq27\), and urn F has the interval \(41\leq k\leq53\).
Observe that urn C's interval contains fewer striped balls than the hypothetical pooled urn's, urn F's interval is centered at roughly 1.7 times the pooled urn's center (about 47 versus 27), and urns A and T coincide with the hypothetical pooled urn.
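The per-urn intervals can be computed the same way as in Exercise 1, one urn at a time. This Python sketch assumes each urn is sampled independently with the same \(n\); the helper names are hypothetical, and the PMF is again evaluated in log space with `math.lgamma`.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(total: int, marked: int, n: int) -> list[float]:
    """P(X = k), k = 0..n: marked balls among n drawn without replacement."""
    log_denom = log_comb(total, n)
    return [
        exp(log_comb(marked, k) + log_comb(total - marked, n - k) - log_denom)
        if k <= marked and n - k <= total - marked else 0.0
        for k in range(n + 1)
    ]


def equal_tailed(pmf: list[float], tail: float = 1 / 8) -> tuple[int, int]:
    """Smallest k with CDF >= tail and smallest k with CDF >= 1 - tail."""
    cdf, lo = 0.0, None
    for k, p in enumerate(pmf):
        cdf += p
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1 - tail:
            return lo, k
    return lo, len(pmf) - 1


# (striped, total) for each urn, from the tables above
URNS = {
    "A": (373_068, 1_372_599),
    "C": (1_980_878, 8_759_700),
    "F": (1_096_832, 2_336_593),
    "T": (1_793_059, 6_645_520),
}

for n in (75, 100, 125):
    for name, (striped, total) in URNS.items():
        lo, hi = equal_tailed(hypergeom_pmf(total, striped, n))
        print(f"n={n}, urn {name}: expected {n * striped / total:.2f}, 75% interval {lo}..{hi}")
```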
Exercise 3. Suppose we sample proportionally to the number of balls in each urn (e.g., \(2336593/(5243837 + 13870575) \approx 12.22\%\) of the sample is drawn from urn F), without replacement. What's the expected number of striped balls drawn? What's the interval containing 75% of the probability distribution about the mean for \(n=75,100,125\)? What's the expected number of balls of each color drawn?
This is several independent sampling problems, with \(n_{A} = nN_{A}/N\approx0.07n\), \(n_{C} = nN_{C}/N\approx 0.46n\), \(n_{F} = nN_{F}/N\approx 0.12n\), and \(n_{T} = nN_{T}/N\approx 0.35n\), where \(N_i\) is the number of balls in urn \(i\) and \(N\) is the total across all urns. The expected number of striped balls in the sample is the sum of the expected numbers from each urn's sample. But some simple algebra shows this is just the expected value for the hypothetical pooled urn from Exercise 1.
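Spelled out, with \(S_i\) the number of striped balls in urn \(i\) and \(S=\sum_i S_i = 5{,}243{,}837\) (notation introduced here), the algebra is
\[ \sum_{i} n_i\frac{S_i}{N_i} = \sum_{i}\frac{nN_i}{N}\cdot\frac{S_i}{N_i} = \frac{n}{N}\sum_{i} S_i = \frac{nS}{N} \approx 0.27434\,n, \]
which ignores the rounding of each \(n_i\) to a whole number of balls.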
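Since the wording suggests we move from urn to urn, taking a separate sample from each one independently, the total number of striped balls is a sum of independent per-urn counts. Summing the per-urn lower bounds (and likewise the upper bounds) overstates the spread, though: deviations in different urns partially cancel, so the 75% interval for the total, which comes from the distribution of the sum (the convolution of the per-urn distributions), is narrower.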
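To see the difference between summing per-urn bounds and using the distribution of the total, here is a Python sketch for Exercise 3. It assumes each urn's sample size is \(nN_i/N\) rounded to a whole number (so the parts may not add up to exactly \(n\)) and treats the urns as independent; the helper names are hypothetical.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(total: int, marked: int, n: int) -> list[float]:
    """P(X = k), k = 0..n: marked balls among n drawn without replacement."""
    log_denom = log_comb(total, n)
    return [
        exp(log_comb(marked, k) + log_comb(total - marked, n - k) - log_denom)
        if k <= marked and n - k <= total - marked else 0.0
        for k in range(n + 1)
    ]


def equal_tailed(pmf: list[float], tail: float = 1 / 8) -> tuple[int, int]:
    """Smallest k with CDF >= tail and smallest k with CDF >= 1 - tail."""
    cdf, lo = 0.0, None
    for k, p in enumerate(pmf):
        cdf += p
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1 - tail:
            return lo, k
    return lo, len(pmf) - 1


def convolve(p: list[float], q: list[float]) -> list[float]:
    """Distribution of X + Y for independent X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# (striped, total) for each urn, from the tables above
URNS = {
    "A": (373_068, 1_372_599),
    "C": (1_980_878, 8_759_700),
    "F": (1_096_832, 2_336_593),
    "T": (1_793_059, 6_645_520),
}
POOLED = sum(total for _, total in URNS.values())


def compare(n: int):
    """Return the proportional allocation, the summed per-urn 75% bounds, and
    the 75% interval of the total computed from its convolved distribution."""
    alloc = {name: round(n * total / POOLED) for name, (_, total) in URNS.items()}
    dist, lo_sum, hi_sum = [1.0], 0, 0
    for name, (striped, total) in URNS.items():
        pmf = hypergeom_pmf(total, striped, alloc[name])
        lo, hi = equal_tailed(pmf)
        lo_sum, hi_sum = lo_sum + lo, hi_sum + hi
        dist = convolve(dist, pmf)
    return alloc, (lo_sum, hi_sum), equal_tailed(dist)


for n in (75, 100, 125):
    alloc, summed, of_total = compare(n)
    print(f"n={n} {alloc}: summed bounds {summed}, interval of the total {of_total}")
```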
Exercise 4. What if each urn is sampled equally, so a quarter of the sample is drawn from urn A, a quarter from urn C, and so on? What is the expected number of striped balls in the sample? What is the 75% interval for \(n=100\)?
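This gives us an expected 30.92905 striped balls in the sample. Summing the per-urn 75% bounds gives \(20\leq k\leq41\) striped balls (and 30 ± 6 for the 50% intervals), but that overstates the spread, since deviations in different urns partially cancel. Working with the distribution of the total instead (the convolution of the four per-urn distributions), the 75% interval is roughly \(26\leq k\leq36\) and the 50% interval roughly \(28\leq k\leq34\).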
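A sketch of the Exercise 4 calculation, under the same independence assumption: each urn contributes a hypergeometric count for 25 draws, and the distribution of the total is the convolution of the four. It prints both the summed per-urn bounds and the quantiles of the total; the helper names are hypothetical.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(total: int, marked: int, n: int) -> list[float]:
    """P(X = k), k = 0..n: marked balls among n drawn without replacement."""
    log_denom = log_comb(total, n)
    return [
        exp(log_comb(marked, k) + log_comb(total - marked, n - k) - log_denom)
        if k <= marked and n - k <= total - marked else 0.0
        for k in range(n + 1)
    ]


def equal_tailed(pmf: list[float], tail: float = 1 / 8) -> tuple[int, int]:
    """Smallest k with CDF >= tail and smallest k with CDF >= 1 - tail."""
    cdf, lo = 0.0, None
    for k, p in enumerate(pmf):
        cdf += p
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1 - tail:
            return lo, k
    return lo, len(pmf) - 1


def convolve(p: list[float], q: list[float]) -> list[float]:
    """Distribution of X + Y for independent X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# (striped, total) for each urn, from the tables above
URNS = {
    "A": (373_068, 1_372_599),
    "C": (1_980_878, 8_759_700),
    "F": (1_096_832, 2_336_593),
    "T": (1_793_059, 6_645_520),
}
PER_URN = 25  # equal allocation: a quarter of n = 100 from each urn

dist, lo_sum, hi_sum = [1.0], 0, 0
for striped, total in URNS.values():
    pmf = hypergeom_pmf(total, striped, PER_URN)
    lo, hi = equal_tailed(pmf)
    lo_sum, hi_sum = lo_sum + lo, hi_sum + hi
    dist = convolve(dist, pmf)

mean = sum(k * p for k, p in enumerate(dist))
print(f"expected striped balls: {mean:.5f}")
print(f"summed per-urn 75% bounds: {lo_sum}..{hi_sum}")
print(f"75% interval of the total: {equal_tailed(dist)}")
print(f"50% interval of the total: {equal_tailed(dist, tail=1 / 4)}")
```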
Exercise 5. Suppose each urn is sampled in this manner: we first fix a desired sample size \(n\geq100\). We then sample randomly without replacement until we have at least \((R/N)n\) red balls (where the urn initially has \(R\) red balls and \(N\) balls in total), at least \((G/N)n\) green balls (where \(G\) is the initial number of green balls), and at least \((B/N)n\) blue balls (where \(B\) is the initial number of blue balls). Our sample then consists of \(g\) green balls, \(r\) red balls, and \(b\) blue balls, where \(r+g+b\geq n\). (A) What is the expected sample size for each urn when \(n=150\)? (B) What is the expected number of striped balls of each color? (C) [Open ended] Can we apply some set of weightings to better reflect the urn's composition?
We first compute the minimum number of balls of each color we want from each urn, rounding \((G/N)n\), \((B/N)n\), and \((R/N)n\) to the nearest whole ball:
- Urn A's sample needs at least 144 green balls, 5 blue balls, and 1 red ball
- Urn C's sample needs at least 146 green balls, 3 blue balls, and 1 red ball
- Urn F's sample needs at least 32 green balls, 51 blue balls, and 68 red balls
- Urn T's sample needs at least 145 green balls, 3 blue balls, and 1 red ball
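The minimums above are consistent with rounding \((K/N)\cdot 150\) to the nearest integer for each color, where \(K\) is that color's count (the rounding rule is inferred from the numbers rather than stated). A Python sketch with hypothetical names:

```python
# Number of balls of each color per urn (solid + striped), from the tables above
COLORS = {
    "A": {"green": 1_319_330, "blue": 41_679, "red": 11_590},
    "C": {"green": 8_539_296, "blue": 151_159, "red": 69_245},
    "F": {"green": 491_044, "blue": 791_625, "red": 1_053_924},
    "T": {"green": 6_439_868, "blue": 143_177, "red": 62_475},
}


def quotas(colors: dict[str, int], n: int = 150) -> dict[str, int]:
    """Minimum count of each color: (K / N) * n rounded to the nearest integer."""
    total = sum(colors.values())
    return {color: round(n * count / total) for color, count in colors.items()}


for name, colors in COLORS.items():
    print(name, quotas(colors))
```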
If we take the stopping condition to be drawing \(s\) balls of a certain color (of which the urn contains \(K\)), then we can compute the expected number of draws \(k+s\) needed, where \(k\) counts the balls of other colors drawn along the way, using the negative hypergeometric distribution. The expected number of draws is \[ \mathbb{E}[k+s] = s + s\frac{N-K}{K+1} = s\frac{N+1}{K+1}. \] This formula differs from a naive reading of the Wikipedia page, because their "K" is our "N-K".
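A quick sanity check of \(s(N+1)/(K+1)\): the sketch below compares it with exact brute force on tiny urns (enumerating every equally likely set of positions for the colored balls in the draw order), then evaluates it for urn A's quotas. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expected_draws(s: int, K: int, N: int) -> float:
    """Mean number of draws (without replacement) until the s-th ball of a
    color with K copies among N balls appears: s (N + 1) / (K + 1)."""
    return s * (N + 1) / (K + 1)


def expected_draws_exact(s: int, K: int, N: int) -> Fraction:
    """Brute force for a tiny urn: every set of positions for the K colored
    balls is equally likely; average the position of the s-th one."""
    position_sets = list(combinations(range(1, N + 1), K))  # tuples come out sorted
    return Fraction(sum(p[s - 1] for p in position_sets), len(position_sets))


# The closed form agrees with brute force on small urns.
for s, K, N in [(1, 2, 6), (2, 3, 7), (3, 4, 9)]:
    assert expected_draws_exact(s, K, N) == Fraction(s * (N + 1), K + 1)

# Urn A: color -> (quota s, number K of balls of that color); N = 1,372,599.
URN_A = {"green": (144, 1_319_330), "blue": (5, 41_679), "red": (1, 11_590)}
for color, (s, K) in URN_A.items():
    print(f"{color}: {expected_draws(s, K, 1_372_599):.1f} draws expected")
```

Meeting all three quotas means waiting for the slowest color, so the largest of the per-color expectations is only a lower bound on the expected total number of draws.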
However, if we want our sample to contain the desired minimum of every color with probability at least \(\alpha\), we need \(\Pr(X\geq s\mid N, K, n)\geq \alpha\) for each color, where \(X\) is the number of balls of that color among the \(n\) balls drawn. Brute forcing this with \(\alpha=0.95\), we find urn A needs \(n=354\), urn C needs \(n=378\), urn F needs \(n=194\), and urn T needs \(n=318\).
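The brute-force search can be sketched as follows. It assumes the per-color reading of the condition (each color meets its quota with probability at least \(\alpha\), checked separately rather than jointly), uses the quotas listed earlier, and starts the search at the sum of the quotas; all names are hypothetical.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def prob_at_least(s: int, K: int, N: int, n: int) -> float:
    """Pr(X >= s), where X counts balls of a color with K copies among N
    in n draws without replacement (hypergeometric)."""
    log_denom = log_comb(N, n)
    below = sum(
        exp(log_comb(K, k) + log_comb(N - K, n - k) - log_denom)
        for k in range(min(s, n + 1))
        if k <= K and n - k <= N - K
    )
    return 1.0 - below


def min_sample_size(colors: dict[str, int], quota: dict[str, int], alpha: float = 0.95) -> int:
    """Smallest n for which every color meets its quota with probability >= alpha."""
    N = sum(colors.values())
    n = sum(quota.values())
    while not all(prob_at_least(quota[c], colors[c], N, n) >= alpha for c in colors):
        n += 1
    return n


COLORS = {
    "A": {"green": 1_319_330, "blue": 41_679, "red": 11_590},
    "C": {"green": 8_539_296, "blue": 151_159, "red": 69_245},
    "F": {"green": 491_044, "blue": 791_625, "red": 1_053_924},
    "T": {"green": 6_439_868, "blue": 143_177, "red": 62_475},
}
QUOTAS = {
    "A": {"green": 144, "blue": 5, "red": 1},
    "C": {"green": 146, "blue": 3, "red": 1},
    "F": {"green": 32, "blue": 51, "red": 68},
    "T": {"green": 145, "blue": 3, "red": 1},
}

for name in COLORS:
    print(name, min_sample_size(COLORS[name], QUOTAS[name]))
```

Requiring all colors to meet their quotas jointly would need a multivariate hypergeometric calculation and would give the same or a slightly larger \(n\).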
For urn A, for example, drawing \(n=354\) balls has the unfortunate side effect of producing between two and three times as many green and blue balls as needed. So how do we handle this? We want to avoid discarding information (a rule of thumb in life, but especially in statistics), so we can take the proportions of solid and striped balls among the sampled green balls, then multiply them by the desired number of green balls (144). There are other possibilities, but this is the quickest for us.
Numerically, we find the following samples for each urn, with the 90% interval for the number of solid balls of each color in parentheses:
- Urn A's sample has 340 green balls (234–261), 11 blue balls (7–11), and 3 red balls (0–2)
- Urn C's sample has 369 green balls (273–299), 7 blue balls (4–7), and 3 red balls (0–2)
- Urn F's sample has 41 green balls (26–35), 66 blue balls (44–55), and 87 red balls (16–30)
- Urn T's sample has 309 green balls (214–239), 7 blue balls (3–7), and 3 red balls (0–2)
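As a sketch of where the parenthesized ranges can come from, here is urn A, assuming they are equal-tailed 90% intervals (5th to 95th percentile) of a hypergeometric count: once we know how many balls of a color were drawn, those balls are a uniformly random subset of that color in the urn. The helper names are hypothetical.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(total: int, marked: int, n: int) -> list[float]:
    """P(X = k), k = 0..n: marked balls among n drawn without replacement."""
    log_denom = log_comb(total, n)
    return [
        exp(log_comb(marked, k) + log_comb(total - marked, n - k) - log_denom)
        if k <= marked and n - k <= total - marked else 0.0
        for k in range(n + 1)
    ]


def equal_tailed(pmf: list[float], tail: float) -> tuple[int, int]:
    """Smallest k with CDF >= tail and smallest k with CDF >= 1 - tail."""
    cdf, lo = 0.0, None
    for k, p in enumerate(pmf):
        cdf += p
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1 - tail:
            return lo, k
    return lo, len(pmf) - 1


# Urn A: color -> (balls of that color drawn, solid balls of that color in the
# urn, all balls of that color in the urn); drawn counts are from the list above.
SAMPLE_A = {
    "green": (340, 962_036, 1_319_330),
    "blue": (11, 34_147, 41_679),
    "red": (3, 3_348, 11_590),
}

intervals = {}
for color, (drawn, solid, total) in SAMPLE_A.items():
    # The solid count among the drawn balls of this color is hypergeometric.
    intervals[color] = equal_tailed(hypergeom_pmf(total, solid, drawn), tail=0.05)
    print(f"{color}: {drawn} drawn, 90% interval for solid {intervals[color]}")
```

Under these assumptions the output should land close to the ranges listed for urn A.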
If we rescale the oversampled colors back to their desired counts, we end up with these estimates:
- Urn A's weighted sample has between 99 and 111 solid green balls, 2 or 3 solid blue balls, and at most 1 solid red ball, for a weighted total of between 102 and 115 solid balls (68%–76.67% solid)
- Urn C's weighted sample has between 108 and 118 solid green balls, 2 (well, between 1.75 and 2) solid blue balls, and at most 1 solid red ball, for a weighted total of between 111 and 121 solid balls (74%–80%)
- Urn F's weighted sample has between 20 and 27 solid green balls, 34 to 42.5 solid blue balls, and 12.5 to 23 solid red balls, for a weighted total of between 66 and 93 solid balls (44%–62%)
- Urn T's weighted sample has between 100 and 112 solid green balls, 1 to 3 solid blue balls, and at most 1 solid red ball, for a weighted total of between 101 and 116 solid balls (67%–77%)
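The rescaling for urn F can be written out directly: each color's 90% range for solid balls from the sample list is multiplied by (desired count)/(count drawn), and the pieces are added up. A small Python sketch using only the numbers listed above; the names are illustrative, and dividing by 150 follows the percentages quoted in the text.

```python
# Urn F: color -> (90% range for solid balls in the sample, balls of that color
# drawn, desired minimum for that color), taken from the lists above.
URN_F = {
    "green": ((26, 35), 41, 32),
    "blue": ((44, 55), 66, 51),
    "red": ((16, 30), 87, 68),
}


def rescale(bounds: tuple[int, int], drawn: int, quota: int) -> tuple[float, float]:
    """Scale a range observed among `drawn` balls to the quota: count * quota / drawn."""
    lo, hi = bounds
    return lo * quota / drawn, hi * quota / drawn


low = high = 0.0
for color, (bounds, drawn, quota) in URN_F.items():
    lo, hi = rescale(bounds, drawn, quota)
    low, high = low + lo, high + hi
    print(f"{color}: {lo:.1f} to {hi:.1f}")
print(f"total: {low:.1f} to {high:.1f} of 150 ({low / 150:.1%} to {high / 150:.1%})")
```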
But if we were given only these weighted samples and worked backwards, we would end up with radically different estimates of each urn's proportions. Urn F has a "margin of error" of about ±9%, which is huge, yet the reported margin of error (at 90% confidence) would be at most 6.75%. The reported margin of error underestimates the actual range of variability the polls could report.
On the other hand, for urn T, the reported margin of error would be 6.315% (at 90% confidence), whereas its estimates have a ±5% margin. In this case, the reported margin of error overestimates the interval width.
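For reference, the "reported" margin of error is taken here to be the usual normal-approximation formula \(z\sqrt{p(1-p)/n}\) with \(z\approx1.645\) for 90% confidence (an assumption about how polls report it). A minimal Python sketch; it is largest at \(p=1/2\):

```python
from math import sqrt


def margin_of_error(p: float, n: int, z: float = 1.645) -> float:
    """Normal-approximation half-width z * sqrt(p (1 - p) / n); z = 1.645 for 90%."""
    return z * sqrt(p * (1 - p) / n)


print(f"{margin_of_error(0.5, 150):.2%}")  # 6.72%, the largest value for n = 150
```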
Observation 1. Different sampling methods produce different estimates of the number of striped balls in each urn.
Observation 2. For urns where one kind of ball clearly dominates (urns A, C, and T), the margin of error is smaller than for urn F, where the split is closer to even. Estimates for urns with a low proportion of striped balls, drawn from decent sample sizes, should be believed.
Homework 1. Given the range of estimates for each of these sampling methods, produce plots estimating the number of striped balls in each urn.
A concluding remark: the reader may object that a sample size of 150 is too small. This is a valid criticism, but when you examine polling crosstabs, it's not uncommon to find only about 150 Hispanic respondents. The guiding question behind this series of posts is whether we can extract anything meaningful from the polling results.