AHA Pooling Project: To Pool or not to Pool

The 1960s-70s AHA Pooling Project to pool data from extant U.S. prospective studies in CVD epidemiology became a prime function of the newly launched AHA Council on Epidemiology and its Subcommittee on Criteria and Methods. It was a direct outcome of the 1959 Princeton Conference on Epidemiological Methods in response to the second of the meeting’s five recommendations (Pollack and Krueger 1960):

2. To develop cooperative studies involving the pooling of data from several epidemiologic investigations, especially in areas where the data from any single study would be numerically inadequate for meaningful statistical evaluation.

The charge was rational, the goal exemplary, and the practical and analytical issues worthwhile and challenging. But the subsequent vicissitudes of the Pooling Project would fill a volume: deciding what data were actually “poolable” from the several cohort studies involved and then carrying out the separate and pooled analyses of such a massive data base. Data handling soon swamped the facilities and staff provided at the University of Michigan by Fred Epstein and Felix Moore, and was eventually rescued by Olympian efforts of the data operation in Stamler’s Chicago Health Research Foundation.

In addition to the basic task of boosting the numbers and confidence in the prediction estimates from single and combined “traditional” CVD risk factors, the challenge presented by pooling was to exploit the far greater numbers and exposures for more interesting questions about the importance of other or uncommon phenomena that were beyond the capacity of any one data set (such as particular ECG findings during health), and of uncommon endpoints (such as sudden, unexpected death).

The intellectual excitement at the outset was manifest in the questions posed to the Pooling Group (1) by its principal gadfly, Jerry Stamler, in a letter of March 29, 1971:

“My effort here is to indicate the unique possibilities that exist for obtaining good answers to important questions by use of the pooled data;” and then he enumerated questions unanswerable in his and other individual prospective studies:

  • Is clinical diabetes a risk factor for CHD in the absence of concomitant hypertension?
  • Are individual ECG findings in health (such as minor negative T waves and simple ectopic beats) predictive of mortality in relation to other factors at entry (an issue being studied at the time among infarct survivors in the Coronary Drug Project)?
  • Are isolated systolic and diastolic blood pressure elevation predictive, that is, each in the absence of the other?
  • How can we explain the Aristotelian logic problem in which blood pressure is related to CHD risk, and relative body weight is related to the risk of high blood pressure, yet relative weight is apparently unrelated to the risk of CHD? Is it a question of the pattern or distribution of pressure and weight?
  • What is the joint relation of multiple characteristics to CHD when the potential combinations of variables are huge, with and without multivariate analysis that was just getting under way with adequate computing systems?
  • What can we learn about the effect of stopping smoking on CVD risk in respect to detailed smoking history such as the age of onset, years quit, age at quitting, etc.?
  • And the most important question of all: What is the effect of change (trends) in risk factors on coronary risk and on long-term survival, as may be found in pooled interval data?

As Stamler put it in his letter, quoting a Chicago colleague: “Never make small plans!”

The critical and skeptical view of some about the pooling undertaking, already clear in Bill Kannel’s walking out of Ann Arbor deliberations (described above), is emphasized by Kannel’s March 24, 1971 letter to the pooling coordinator, Felix Moore, in which he reflects on and agrees with the interesting goals of pooling and the ambitious proposals from Stamler, but rejects the feasibility of excavating the data:

“It is my opinion that the kinds of analysis undertaken thus far do little to exploit the potential of pooled data. What has been done thus far can be better done, and with greater validity, by each individual participant in the project using their full follow-up.” [Note: In other words, Framingham has already settled the issue of “traditional risk factors”; we would do better to wait for full follow-up among these presumably redundant studies.]

Kannel goes on to echo Stamler’s better reasons for pooling: to broaden the biologic base, to increase exposure to uncommon factors, to predict in one population from data of another, to stimulate more sophisticated analyses, to look for consistency, and to get information for uncommon endpoints such as silent myocardial infarct.

But Framingham, he protested in his letter, was in the throes of trying to survive by shifting to a base in Boston University. It had neither the time nor staff to address the complexity of producing all the interim data proposed by Stamler.

Inaction won the day. The final Pooling Project report contained nothing about these interesting, never-pursued questions, only the already widely repeated relation of the baseline “traditional” risk factors to coronary risk (AHA Pooling Project 1978). Decades of follow-up were required for some of those issues to be addressed in the individual prospective studies.

It could, indeed, be questioned, as did Kannel, whether pooling was worth the effort to reach this, the project’s “final” conclusion: “... the Pooling Project results presented here add further to the large body of epidemiologic data demonstrating that the relationships between the three major risk factors and premature atherosclerotic disease meet the criteria of consistency, strength, graded relationship, independence, temporal relationship and predictive capacity ... [they] also meet the criterion of coherence. Therefore these relationships are almost certainly cause-and-effect, i.e. etiologic, in nature” (AHA Pooling Project 1978, 266). (Henry Blackburn)


(1) Pooling Group of the American Heart Association: Felix Moore and Fred Epstein, Co-PIs; H. Blackburn; J. Chapman; L. Cook; J. Doyle; T. Gordon; W. Kannel; A. Keys; O. Paul; R. Shekelle; P. McNamara; H.L. Taylor.


American Heart Association Pooling Project Research Group 1978. Relationship of blood pressure, serum cholesterol, smoking habit, relative weight and ECG abnormalities to incidence of major coronary events: Final report of the Pooling Project. Journal of Chronic Diseases 31:201-306.

Pollack, H. and D. E. Krueger, eds. 1960. Epidemiology of cardiovascular diseases methodology: hypertension and arteriosclerosis. Report of a conference, 1959, American Heart Association–National Heart Institute. American Journal of Public Health 50 (Suppl.):1-124.