Thursday, June 20, 2019

Software analogies in statistics

When it comes to programming, I'm fond of "agile" practices (unit testing, contracts, etc.) as well as drawing upon standard practices (design patterns, etc.). One time when I was doing some R coding, which really feels like scripting statistics, I wondered if the "same" concepts had analogous counterparts in statistics.

(To be clear, I am not interested in contriving some functorial pullback or pushforward of concepts, e.g., "This is unit testing in R. The allowable statistical methods within R unit tests are as follows: [...]. Therefore these must be the analogues to unit testing in statistics." This is not what I am looking for.)

The problem with analogies is that there are different aspects we could analogize, so there is no single analogous concept: there may be several (if any) corresponding to each of these software development concepts.

The concepts I'd like to explore in this post are Design Patterns, Unit Testing, and Design by Contract. There are other concepts which I don't believe have good counterparts (structured programming amounts to writing linearly so you read the work from top to bottom; McCabe's analogy of "modules :: classes :: methods" to "house :: room :: door" does not appear to have counterparts in statistics; etc.). Perhaps the gentle reader will take up the challenge of considering analogies where I do not: I make no pretense of completeness in these investigations.

Design Patterns

Design patterns in software were actually inspired by Christopher Alexander's pattern language in architecture. For software, design patterns standardize the terminology for recurring patterns (like iterators, singletons, abstract factories, etc.).

One line of thinking may be to emphasize the "pattern language" for statistics. I think this would be a repackaged version of statistics. This may or may not be fruitful for one's own personal insight into statistics, but it's not "breaking new ground": it's "repackaging". Unwin's "Patterns of Data Analysis?" seems to be the only work done along these lines.

For what it's worth, I believe it is useful to write notes for one's self, especially in statistics. I found Unwin's article a good example of what such entries should look like, using "patterns" to describe the situation you are facing (so you can ascertain if the pattern is applicable or not), what to do, how to do some kind of sanity test or cross check, etc. As an applied math, statistics is example-driven, and maintaining one's own "pattern book" with examples added to each pattern is quite helpful.

Another line may pursue the fact that software design patterns are "best practices", hence standardizing "best statistical practices" may be the analogous concept. Best coding practices are informal rules designed to improve the quality of software. I suppose the analogous thing would be folklore like using the geometric mean to combine disparate probability estimates, or knowing how to avoid misusing statistical tests to produce bogus results.
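
To make the "geometric mean" folklore concrete, here is a minimal sketch in R (pool_estimates is a hypothetical helper of my own, not a standard function): it pools several probability estimates of the same event by taking their geometric mean.

pool_estimates <- function(p) {
  # geometric mean of the estimates, computed on the log scale
  exp(mean(log(p)))
}

pool_estimates(c(0.72, 0.80, 0.65))  # combined estimate, roughly 0.72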

Unit Testing

Unit testing has a quirky place in software development. Some adhere strictly to test-driven development, where a function signature is written (e.g., "int FibonacciNumber(int k)") and then, before writing the body of the function, unit tests are written (we make sure, e.g., "FibonacciNumber(0) == 1", negative numbers throw errors, etc.). Only after the unit tests are written do we begin to implement the function.
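
In R terms, a test-first sketch might look like the following, using the testthat package (the fibonacci() function, an R analogue of the FibonacciNumber above, is assumed to be implemented elsewhere with the convention fibonacci(0) == 1):

library(testthat)

test_that("fibonacci handles easy cases and edge cases", {
  expect_equal(fibonacci(0), 1)   # easy cases
  expect_equal(fibonacci(1), 1)
  expect_equal(fibonacci(5), 8)
  expect_error(fibonacci(-1))     # negative input should throw an error
})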

Unit tests do not "prove" the correctness of the code, but they increase our confidence in it. Sanity checks are formalized: square roots of negative numbers raise errors, easy cases (and edge cases) are checked, and so forth. Code is designed to allow dependency injection (with mock objects, to facilitate testing). These tests are run periodically (e.g., nightly, or after every push to the version control system) and failures are flagged for the team to fix. I can't imagine anything remotely analogous to this in statistics.

The analogous counterpart to "increasing our confidence in our work" would be some form of model verification, like cross-validation. However, model verification usually comes after creating a model, whereas unit testing is woven into software development while the code is being written.
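
For concreteness, here is a minimal sketch of k-fold cross-validation in R as "after the fact" verification (cv_mse is my own hypothetical helper; mtcars and the mpg ~ wt model are just placeholders):

cv_mse <- function(data, k = 5) {
  # randomly assign each row to one of k folds
  folds <- sample(rep(1:k, length.out = nrow(data)))
  errs <- sapply(1:k, function(i) {
    fit <- lm(mpg ~ wt, data = data[folds != i, ])
    # mean squared prediction error on the held-out fold
    mean((data$mpg[folds == i] - predict(fit, newdata = data[folds == i, ]))^2)
  })
  mean(errs)
}

cv_mse(mtcars)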

Design by Contract

Contracts implement Hoare triples, specifying preconditions, postconditions, and invariants. These guarantee the correctness of the software, but that correctness stems from Hoare logic.

Statistical tests frequently make assumptions, which are not usually checked. These seem quite clearly analogous to preconditions or postconditions. For example, with a linear regression we should check that the errors (residuals) are not correlated with any input (this would be a postcondition) and that there is no multicollinearity, i.e., no two inputs are strongly correlated (this would be a precondition). We could imagine something like the following R snippet (possibly logging warnings instead of throwing errors):

library(assertthat)  # provides assert_that()

foo <- function(mpg, wt, y, alpha = 0.05) {
  # assert mpg is normally distributed (Shapiro-Wilk test)
  assert_that(shapiro.test(mpg)$p.value >= alpha)
  # assert wt is normally distributed
  assert_that(shapiro.test(wt)$p.value >= alpha)
  # now assert mpg & wt are uncorrelated
  assert_that(cor.test(mpg, wt, method = "kendall")$p.value >= alpha)

  # rest of code not shown
}
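
As a hypothetical usage with the built-in mtcars data, the call below would presumably fail one of the assertions (mpg and wt are strongly correlated in mtcars), which is exactly the precondition-violation behavior we want:

foo(mtcars$mpg, mtcars$wt, mtcars$am)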

The other aspect of contracts may be the underlying formalism, whose analogous concept would be some formal system of statistics. By "formal system", I mean a logical calculus, a formal language with rules of inference; I do not mean "probabilistic inference". We need to formalize a manner of saying, "Given these statistical assumptions, or these mathematical relations, we may perform the following procedure or calculation." I have seen little literature on this (arXiv:1706.08605 being a notable exception). The R snippet above attempted to encode this more explicitly, but Hoare logic analogues are implicit in statistics textbooks.

We might be able to capture the "sanity check" aspect of postconditions in special situations. For example, when testing whether two samples have the same mean, we could check that rejecting the null hypothesis "looks right" (that the samples really do appear to have different means) by looking at the confidence intervals for the two samples and seeing that they are "mostly disjoint". This example is imprecise and heuristic, but it illustrates the underlying idea.
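
A minimal sketch of this heuristic in R, assuming two numeric samples x and y (check_mean_difference is my own hypothetical helper, not a standard function):

check_mean_difference <- function(x, y, alpha = 0.05) {
  result <- t.test(x, y)
  if (result$p.value < alpha) {
    # postcondition-style sanity check: the one-sample confidence
    # intervals should be (mostly) disjoint if the means really differ
    ci_x <- t.test(x)$conf.int
    ci_y <- t.test(y)$conf.int
    if (ci_x[1] < ci_y[2] && ci_y[1] < ci_x[2]) {
      warning("Rejected the null, but the one-sample confidence intervals overlap")
    }
  }
  result
}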

Conclusion

Although statistics has been referred to as "mathematical engineering", a lot of software engineering techniques don't really apply or have analogous counterparts. Some, like preconditions, have something mildly similar for R scripts. Others, like "design patterns", are more of a meta-concept, a guide for one's own note-taking rather than something directly applicable to doing statistics.
