Tuesday, May 28, 2019

How many news stories are there?

Recently, an eccentric billionaire bought the Los Angeles Times and sought to make it rival the New York Times as a "newspaper of record". Presumably this means hiring more journalists, but let us ask a simpler question.

Puzzle 1: How many news stories go unreported by both the New York Times and the Los Angeles Times?

We can solve this puzzle using the maximum likelihood estimator for the Hypergeometric Distribution. Think of it like this: on a remote island with some unknown deer population, we go and (without harming the wildlife) tag K deer. A month later, we return, and capture n deer, of which k are tagged. We can estimate the total population of deer N on the island.
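To make the tagging experiment concrete, here is a minimal simulation sketch in Python. The island size, the number of tagged deer, and the recapture size are made-up values for illustration, and the function name `capture_recapture` is hypothetical; none of these come from the post.

```python
import random

def capture_recapture(N, K, n, seed=0):
    """Simulate one tag-and-recapture experiment.

    N: true population size (unknown in practice; chosen here for illustration)
    K: number of animals tagged on the first visit
    n: number of animals captured on the second visit
    Returns k, the number of tagged animals in the second capture.
    """
    rng = random.Random(seed)
    population = range(N)
    tagged = set(rng.sample(population, K))   # first visit: tag K animals
    recaptured = rng.sample(population, n)    # second visit: capture n animals
    return sum(1 for animal in recaptured if animal in tagged)

# Hypothetical island: 200 deer, 40 tagged, 50 captured on the return trip.
k = capture_recapture(N=200, K=40, n=50)
print("tagged deer in the second capture:", k)
```

The count k produced this way follows a hypergeometric distribution. The estimation problem runs in the opposite direction: given K, n, and k, recover N.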

Explicitly connecting that analogous problem to our own: we know the "tagged stories" K reported by the New York Times and the "sample stories" n reported by the Los Angeles Times, of which the "tagged sample stories" k were reported by both newspapers, and we want to estimate the total number of news stories N. The maximum likelihood estimator for N is given by \[ \widehat{N} = \max\left\{N : \frac{\Pr(N,K,n,k)}{\Pr(N-1,K,n,k)}\geq1\right\}, \] the largest N for which the ratio of probabilities is at least 1 (the likelihood keeps increasing as long as this ratio is at least 1, so the last such N is where it peaks). It is not hard to solve this to find \(\widehat{N} = [Kn/k]\), where the brackets indicate the integer part of the number (e.g., [3.2]=3, [4.9]=4).
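For readers who want the intermediate step, here is a sketch of the calculation. It assumes the likelihood is the usual hypergeometric probability mass function, which the post does not write out explicitly: \[ \Pr(N,K,n,k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}. \] Dividing the value at N by the value at N - 1, the binomial coefficients cancel down to \[ \frac{\Pr(N,K,n,k)}{\Pr(N-1,K,n,k)} = \frac{(N-K)(N-n)}{N(N-K-n+k)}. \] Provided \(N > K+n-k\), so that the denominator is positive, this ratio is at least 1 exactly when \[ (N-K)(N-n) \geq N(N-K-n+k) \iff Kn \geq Nk \iff N \leq \frac{Kn}{k}. \] The likelihood therefore rises while \(N \leq Kn/k\) and falls afterwards, so it peaks at the integer part of \(Kn/k\).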

Now we just need to list the stories which the New York Times reported but the Los Angeles Times did not (giving \(K-k\)), the stories which both papers reported (k), and the total number of stories the Los Angeles Times reported (n). From this, we will estimate how many stories have gone unreported.

To answer this fully, I looked at the front section for each paper for May 28, 2019. The short answer is K = 22 stories in the New York Times, n = 12 stories in the Los Angeles Times, and k = 5 stories in both. We thus may expect there to be N = [264/5] = [52.8] = 52 stories, of which K + n - k = 29 were reported by at least one of the two newspapers and 23 were reported by neither. Find below a density plot of the probability for various N, and notice how it is maximized at N = 52 (indicated by a red vertical line):

Solution: Using the maximum likelihood estimate for the hypergeometric distribution, there were a total of N = 52 news stories, of which 29 were reported by at least one of the two newspapers and 23 went unreported by both.
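The arithmetic can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function names `likelihood` and `mle_population` and the scan bound of 500 are illustrative choices, not part of the original analysis.

```python
from math import comb

def likelihood(N, K, n, k):
    """Hypergeometric probability of seeing k tagged items in a sample of n,
    drawn from a population of N that contains K tagged items."""
    if N < K + n - k:
        return 0.0  # population too small to contain every distinct story seen
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def mle_population(K, n, k):
    """Closed-form estimate: the integer part of K*n/k."""
    return (K * n) // k

K, n, k = 22, 12, 5            # counts from the two front sections
N_hat = mle_population(K, n, k)
reported = K + n - k           # stories appearing in at least one paper
unreported = N_hat - reported  # estimated stories appearing in neither

# Brute-force check: scan candidate totals and keep the most likely one.
N_scan = max(range(reported, 500), key=lambda N: likelihood(N, K, n, k))

print(N_hat, N_scan, reported, unreported)  # prints "52 52 29 23"
```

The brute-force scan and the closed form agree here because 264/5 = 52.8 is not an integer. When Kn/k is a whole number, the likelihood takes the same value at Kn/k and at Kn/k - 1, so the maximum is shared between the two.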

Puzzle 2: Is there a Bayesian estimate for the number of news stories? Or different ways to estimate the total number of news stories?

Puzzle 3: How stable is this estimate for N? If we examine, say, the last week's worth of articles, do we get approximately the same value for N?

Puzzle 4: What if we extend this analysis to include, e.g., the Wall Street Journal, the Washington Post, and others? How stable is N in this case?

Find two tables below, one listing the stories in the international section for both papers, and the second for national stories. Corresponding stories are listed on the same row.

| New York Times | Los Angeles Times |
| --- | --- |
| She Thought She’d Married a Rich Chinese Farmer. She Hadn’t. (A4) | |
| Attacks by Extremists on Afghan Schools Triple, Report Says (A4) | |
| Romania’s Most Powerful Man Is Sent to Prison for Corruption (A6) | |
| With Trump’s Visit to Japan, Empress Masako Finds a Spotlight (A8) | |
| Trump and Abe’s ‘Unshakable Bond’ Shows Some Cracks in Tokyo (A8) | Trump pushes off war talk on Iran, says ‘regime change’ is not U.S. goal (A1) |
| Election Puts Europe on the Front Line of the Battle With Populism (A10) | In European vote, far-right surge fails to materialize, but mainstream parties lose support (A2) |
| European Parliament Elections: 5 Biggest Takeaways (A10) | |
| European Vote Reveals an Ever More Divided France (A11) | |
| 18 Schoolchildren Stabbed, and Girl and Man Killed, in Attack in Japan (A11) | Knife-wielding man attacks schoolgirls in Japan, killing 2 (blurb of story on A2) |
| Sebastian Kurz, Austrian Leader, Is Ousted in No-Confidence Vote (A12) | Ousted by parliament, Austria’s Kurz vows to win back job (A4) |
| Israel’s Netanyahu Struggles to Form a Government, as Time Runs Short (A12) | Netanyahu running out of time to form government; Israel may face new elections (A2) |
| White Panda Is Spotted in China for the First Time (A12) | |
| 30 Dead and 200 Missing in Congo After Boat Sinks (A12) | |
| | Arrests, killings strike fear in Thailand’s dissidents: ‘The hunting has been accelerated’ (A3) |

Matches are based on substantially overlapping subject matter. The only debatable story match is "Trump pushes off war talk on Iran", whose content is a proper subset of the corresponding New York Times article.

Also note, in the Los Angeles Times, there was a 1000 word blurb about the knife attacks in Japan. Later, on their website, they posted a longer and more detailed article. I decided to count that as a match, which may be debatable.

Sources: Los Angeles Times, New York Times

The national stories in the two newspapers appear to be completely disjoint sets.

| New York Times | Los Angeles Times |
| --- | --- |
| Trump Administration Hardens Its Attack on Climate Science (A1) | |
| Google’s Shadow Work Force: Temps Who Outnumber Full-Time Employees (A1) | |
| Trump Wants to Wall Off Huawei, but the Digital World Bridles at Barriers (A1) | |
| With His Job Gone, an Autoworker Wonders, ‘What Am I as a Man?’ (A1) | |
| With the 2020 Democratic Field Set, Candidates Begin the Races Within the Race (A1) | |
| Saving Charlie: A Rush to Rescue Stranded Cats and Dogs from Oklahoma Floods (A17) | |
| Fearing Supreme Court Loss, New York Tries to Make Gun Case Vanish (A17) | |
| A Missed Opportunity for the Malpractice System to Improve Health Care (A19) | |
| Why a Hamptons Highway Is a Battleground Over Native American Rights (A22) | |
| | High radiation levels found in giant clams of Marshall Islands near U.S. nuclear dump (A1) |
| | He made millions as an L.A. investor. Now, he may run for president to fight poverty (A1) |
| | Want to park in Koreatown? Get ready for a ‘blood sport’ (A1) |
| | Put your hands together for the World Series of Poker, turning 50 this year (A4) |
| | Texas lawmakers approve safe gun storage program, quietly going around the NRA (A4) |
| | Oklahoma’s opioid lawsuit targeting drugmaker goes to trial Tuesday (A7) |

Matches are based on substantially overlapping subject matter.

Sources: Los Angeles Times, New York Times
