University of Minnesota

David Jacobs

Date: August 21, 2005
Location: Minneapolis, Minnesota
Interviewed by: Blackburn, Henry

Abstract

David Jacobs takes us through the origins of mathematical statistics, going back to the Greeks, and marvels at the prodigious computational work, quite apart from the thinking behind it, of people up to and including R.A. Fisher. He describes the calculators and early computers and the tools the community had in hand at the origins of formal CVD epidemiology (set arbitrarily at 1948). We review the revolution in epidemiology that came from doing regressions and multivariate analysis, and from fitting logistic and proportional hazards models, by simply touching a button. (Henry Blackburn)

Quotes

[ed. When asked what computational tools were available at the beginnings of formal CVD epidemiological studies, Jacobs replied:]

In 1948, what existed were the fundamentals of probability theory and the ability to deal with differences between small samples (e.g., if you had a sample group of beers with a certain quality, you could tell whether another group was the same, better, or worse, but you didn’t have anything multivariate). You could do a simple comparison.
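[ed. The small-sample comparison Jacobs alludes to is the kind popularized by Gosset ("Student") at the Guinness brewery. A minimal sketch, in modern Python, of such a two-sample comparison; the beer-quality numbers are invented purely for illustration:]

    import math

    # Two small samples of a quality measurement (values invented for illustration).
    batch_a = [4.1, 3.9, 4.3, 4.0, 4.2]
    batch_b = [4.4, 4.6, 4.3, 4.5, 4.7]

    def mean(xs):
        return sum(xs) / len(xs)

    def sample_var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Pooled two-sample t statistic: the workhorse of small-sample comparison.
    na, nb = len(batch_a), len(batch_b)
    pooled = ((na - 1) * sample_var(batch_a) + (nb - 1) * sample_var(batch_b)) / (na + nb - 2)
    t = (mean(batch_b) - mean(batch_a)) / math.sqrt(pooled * (1 / na + 1 / nb))
    print(f"t = {t:.2f} on {na + nb - 2} degrees of freedom")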

You had the correlation coefficient and the basic idea of regression, and people could conceive of multiple regression, but they couldn’t really do it because they couldn’t get closed-form solutions and they didn’t have computers. You could do two things at once, and maybe you would do cross-classifications of 2 variables (an R x C table) or of 3 simple variables (2 x 2 x 2 tables).

We were partly working with quick (e.g., hand or paper and pencil) computing methods, too, which we called “shortcut methods” when I was in graduate school. Many of the methods of adjustment, such as the Mantel-Haenszel procedure, are shortcut computing methods and are not really needed because something very similar can actually be done within the context of regression. Since regression can now easily be done on a computer by pushing a button, we no longer have to go through all the laborious thinking to do statistical adjustment.
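[ed. A minimal sketch of why the Mantel-Haenszel procedure counts as a shortcut computing method: the stratum-adjusted odds ratio needs nothing more than sums of simple products, so it was feasible by hand. The stratified 2 x 2 counts below are hypothetical, chosen only to show the arithmetic; a logistic regression of case status on exposure with a stratum indicator would give a broadly similar adjusted estimate at the push of a button:]

    # Each stratum is a 2 x 2 table: (a, b, c, d) =
    # (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
    # Counts are hypothetical, for illustration only.
    strata = [
        (20, 80, 10, 90),   # stratum 1 (e.g., younger participants)
        (30, 70, 25, 75),   # stratum 2 (e.g., older participants)
    ]

    # Mantel-Haenszel summary odds ratio: sum(a*d/n) / sum(b*c/n) over strata.
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    print(f"Mantel-Haenszel adjusted odds ratio = {numerator / denominator:.2f}")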

I recall dire warnings in the Probability and Statistics faculty during my graduate years: people who do not go through all the computational steps will not understand the workings of the methods. Methods like the Mantel-Haenszel procedure are heuristic: if you view them step-by-step you see the underpinnings or inner workings of statistical adjustment.

In fact, our students are good at interpreting regression methods, but the warning was correct. When we ask them questions that require them to manipulate the statistical and mathematical concepts in common procedures, many do poorly.

Returning to the 1948 era, R.A. Fisher had gotten into factorial designs by the 1920s. He was doing agricultural experiments (not observational studies) and he would have a three-by-three plot (literally a plot of land; much statistical jargon has agricultural overtones for that reason)…

He also got into the idea of intra-class correlation, an idea with which we now deal very heavily in repeated-measures regression work. That is, the generalization of a correlation from a measure of concordance within pairs to concordance within a family. Intraclass correlation asks how much more alike are members of a family than the families are like each other. That concept turns out to have a wide utility.
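[ed. A minimal sketch of the intraclass correlation Jacobs describes, estimated from a one-way analysis of variance over families. The data are invented and assume equal family sizes; the estimator used is the usual (MS_between - MS_within) / (MS_between + (k - 1) * MS_within):]

    # Families of equal size k; the values might be, say, heights of siblings (cm).
    # Hypothetical numbers, chosen only to illustrate the calculation.
    families = [
        [172.0, 175.0, 171.0],
        [181.0, 184.0, 183.0],
        [165.0, 168.0, 166.0],
        [176.0, 174.0, 177.0],
    ]

    k = len(families[0])                      # members per family
    n = len(families)                         # number of families
    grand = sum(sum(f) for f in families) / (n * k)

    # Between-family and within-family mean squares (one-way ANOVA).
    ms_between = sum(k * (sum(f) / k - grand) ** 2 for f in families) / (n - 1)
    ms_within = sum((x - sum(f) / k) ** 2 for f in families for x in f) / (n * (k - 1))

    # Intraclass correlation: how much more alike family members are
    # than members of different families.
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    print(f"ICC = {icc:.2f}")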

So that stuff would have existed and probably the ability to do simple two-variable regression would have existed. You could do a two-variable regression, but you couldn’t do a three-variable regression because the computations were just too difficult.

Regression toward the mean

And when I think of Galton, I think of regression toward the mean. He plotted the height of the sons against the height of the fathers, so on the y axis was the height of the sons and on the x axis was the height of the fathers, and he noted that the slope was less than 1. He said what that means is that the tall fathers were having shorter sons and the short fathers were having taller sons.
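[ed. A small simulation of the pattern described here. It assumes father and son heights with equal variance and an imperfect correlation, which is all it takes for the least-squares slope of son on father to fall below 1; the parameters are illustrative, not Galton's data. The slope comes out near the assumed correlation of 0.5: sons of unusually tall fathers are, on average, closer to the mean than their fathers, and likewise for short fathers.]

    import random

    random.seed(0)
    # Simulate imperfectly correlated father/son heights (cm);
    # parameters are illustrative, not Galton's actual data.
    mean_h, sd, r = 175.0, 7.0, 0.5
    fathers, sons = [], []
    for _ in range(10_000):
        f = random.gauss(mean_h, sd)
        # Son inherits only part of the father's deviation from the mean,
        # plus independent variation (keeping the overall variance equal).
        s = mean_h + r * (f - mean_h) + random.gauss(0, sd * (1 - r ** 2) ** 0.5)
        fathers.append(f)
        sons.append(s)

    # Least-squares slope of son height on father height.
    fbar = sum(fathers) / len(fathers)
    sbar = sum(sons) / len(sons)
    cov = sum((f - fbar) * (s - sbar) for f, s in zip(fathers, sons))
    var_f = sum((f - fbar) ** 2 for f in fathers)
    print(f"slope = {cov / var_f:.2f}")   # close to r = 0.5, i.e. less than 1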

[When asked how he figured Keys thought he could do prospective epidemiology with 500 good Minnesota men, Jacobs replied:]

I was always struck in the Laboratory of Physiological Hygiene by the physiologists, particularly Henry Taylor, one of the people I knew, as well as Ancel Keys, who believed that the way to get the answer to a question was to narrow it down perfectly. In that situation you could measure what you needed very precisely and exactly; this would get rid of all the extraneous variance, leading to an answer that would not require much statistical theory to interpret. So it’s very much a bench scientist’s perspective, in which the sample size arises by the seat of the pants: enough to get rid of the little variance that remained and to be able to say that you could replicate your observation in more than one study subject (which defines generalizability in much of bench science).

Contributions of CVD Epidemiology

Besides the computational and inferential advances that come with the enterprise we have been discussing, cardiovascular disease and arterial research also produced advances in how you organize such a large enterprise and how you end up interpreting it. Those advances are all a very important part of the enterprise. It’s not all statistical.

But besides that, I think the two areas where CVD epidemiology really made a contribution are the use of multivariable regression and clinical trials.

Observation versus trials

Of course, when you do a clinical trial and it comes out right, it is such an elegant answer to the question of whether the treatments differ from each other. You would like to do clinical trials if you could, but it turns out that the clinical trials methodology really limits research scope:

  1. you can’t do that many clinical trials because they are very expensive and so you have a very limited number of questions that you can answer;
  2. you have a limited type of question you can answer because, most classically, we couldn’t ethically induce people to smoke as a randomized treatment, and we were unable to induce them to quit very well as a randomized treatment.

And so the clinical trial methodology, and the belief that the clinical trial had to be performed pristinely, had to be double-blind, and that kind of thing, in a way led to pharmaceutical studies. If you think about what kind of treatment is most amenable to a feasible clinical trial study, it’s drugs.

I think you had a sign in your office in the early 70s, “non-randomized trials are for the birds.” And that attitude, that clinical trials and randomized designs were the only way to go, contrasted with the tremendous progress that was being achieved, especially in the cancer field, by non-randomized studies. In the cancer field they could do clinical trials of treatment, but they couldn’t do clinical trials of the occurrence of cancer. Cancer was not occurring at a fast enough rate for them to do prospective studies or clinical trials of prevention, so they had to use case-control methods and observational studies to understand fundamental causes.

Choice of design

In Leslie Kish’s Statistical Design for Research (Wiley, 2004), he formulated what he called the 3 Rs: randomization, representation, and reality. These were measuring tools, or standards, for evaluating any given design.

Randomization has to do with comparability so it’s an internal validity criterion. If you have randomization then you feel very comfortable in comparing your different treatments or exposures.

Representation has to do with whether you can generalize to a total population. And so the clinical trial is poorly generalizable, being focused for ethical reasons on just one narrow interpretation of the exposure being studied, and in a narrow population.

Reality had to do with whether, when you implemented a given exposure or treatment, it was done in such a way that it would fly in the real world. So was this design not only representing the general population, but also representing something as it would actually happen?

Cancer versus CVD Epidemiology

There is another really interesting example of the cancer versus heart disease fields, and that is in studies of diet. The cancer people early on hit on food frequency questionnaires. The heart disease people were still using 24-hour recalls. Because of the logistical difficulty of 24-hour recalls, the heart disease people were using a single 24-hour recall, even though they understood 1) that the method does not represent the person very well, being too idiosyncratic; and 2) that the data-processing problems which plague us made food grouping extremely difficult to do.

So the heart disease people got themselves into looking at a certain set of nutrients, and then that was all they could do. They couldn’t afford financially to do any more with their data, and their data systems were not set up to make it easy to go back and forth with the original foods data. Of course the cancer people could form their food groups more easily than nutrients. So a lot of the dietary work which says that certain kinds of foods are better or worse for you comes from the cancer-based food-frequency methodology.

Full Transcript Access

Full transcripts of interviews may be made available to those engaged with original materials for scholarly studies by contacting us.