University of Minnesota

Alessandro Menotti and Henry Blackburn on Evolution of the Risk Factor Concept

There is a view that the risk factor concept has been a revolution in the history of medicine leading to greater understanding of causes, predicting risk, and guiding the prevention of the common diseases of industrial society. Its origin is credited both to the life insurance industry and to cardiovascular disease (CVD) epidemiology. From early in the 20th century, insurance underwriters used data on body build and blood pressure of insured lives as variables among numerous “impairments” to create actuarial estimates and predict excess mortality. The new field of formal CVD epidemiology that arose independently in the mid-20th century called such variables “risk characteristics,” then “risk factors,” the latter term appearing in a 1961 paper on coronary disease risk by William Kannel and colleagues from the Framingham Heart Study (Kannel et al 1961).

In that report, serum cholesterol, blood pressure, and smoking habit, measured in some 5,000 disease-free subjects, were found significantly related to the six-year probability of a coronary event. Their multivariable combination with other measures is the base of modern prevention and public health.

CVD epidemiology and other predictive sciences have evolved from this early prospective evidence among populations, with further search of the degree and independence of ever-more novel predictive factors in the chain of disease causation. A rapid accumulation of evidence about CVD risk led to development of the risk paradigm that has provided a useful framework for further research, for actuarial estimates of risk in insurance medicine, and for guiding training, practice and policy in chronic disease prevention.

Prospective studies carried out in varied populations continue to explore the relative, absolute, and attributable risk associated with “traditional” and newer risk factors, as well as their force according to chronologic age when measured, according to different geography and culture, and to their trends over time. Attempts to “prove” causal roles of the predictive factors also spread widely as clinical trials and community programs in the primary and secondary prevention of chronic diseases. The risk factors are thus the main constituents of modern CVD epidemiology.

But this new general concept of risk produced a negative reaction among some in medicine who resented the introduction of ideas about disease causation from the unfamiliar disciplines of statistics and epidemiology. Others came to use the risk idea inappropriately, dichotomizing risk factors as “present” or “absent” and observing that disease occurred among patients “with no risk factors” or that many “with risk factors” remained disease-free. The popular clinical concept that risk factors were either present/absent or high/low produced a myth of “threshold,” a magical level above which risk is substantial and below which risk does not exist.

In fact, all attempts to locate such critical levels in population distributions have failed. It was finally established unequivocally among the 370,000 screenees of the Multiple Risk Factor Intervention Trial (MRFIT) that CVD risk in respect to serum cholesterol or blood pressure levels is smoothly continuous, monotonic (in one direction), and exponential. Moreover, the general concept of “the lower the better” for risk factor levels was accepted only recently: the level for “normal” blood pressure diminished, apparently magically, from 160 mm Hg to 120 mm Hg between the 1960s and the 2000s, on average 1 mm Hg per year!
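The shape described above can be made concrete with a minimal sketch. It assumes a log-linear (exponential) risk curve; the baseline risk, reference level, and coefficient are hypothetical values chosen for illustration, not MRFIT estimates.

```python
import math

# Hypothetical parameters for illustration only (not MRFIT estimates).
BASELINE_RISK = 0.01   # assumed risk at the reference level
REFERENCE_LEVEL = 120  # assumed reference systolic pressure, mm Hg
BETA = 0.02            # assumed increase in log risk per mm Hg


def exponential_risk(level):
    """Risk that rises continuously and exponentially with the factor level."""
    return BASELINE_RISK * math.exp(BETA * (level - REFERENCE_LEVEL))


def risk_ratio_per_step(level, step=10):
    """Ratio of the risk at level + step to the risk at level."""
    return exponential_risk(level + step) / exponential_risk(level)


if __name__ == "__main__":
    for level in range(110, 171, 10):
        print(level, round(exponential_risk(level), 4),
              round(risk_ratio_per_step(level), 3))
```

Under such a curve, each 10-unit step multiplies risk by the same factor wherever it falls on the scale, so no level marks a point where risk begins or ends.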

This medical dichotomizing of continuous data discarded much valuable information about risk probabilities. Moreover, case-control approaches cited as counter proof to the epidemiological-statistical findings were often inapplicable to coronary heart disease with its high short-term fatality and where “reverse causality” (the risk factor being caused by the disease) may occur.

A further mistake in application was consideration of only one risk factor at a time; cross-classification of as few as 3 risk factors at 3 cut-off levels proved to be beyond practicality. The concept and practical convenience of multivariate analysis in handling many interacting risk factors in prediction models entered preventive practice only in the later 1960s, again from the Framingham Study.
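A rough sketch of why cross-classification breaks down and what a multivariable function offers instead. It reads “3 cut-off levels” as three categories per factor, which is an assumption, and it uses a multiple logistic function, a form commonly used for such risk models. The intercept and coefficients are hypothetical, not published estimates.

```python
import math
from itertools import product

# Three risk factors, each cut into three categories (assumed reading).
FACTORS = ("cholesterol", "blood_pressure", "smoking")
CATEGORIES = ("low", "middle", "high")

# Hypothetical intercept and coefficients, for illustration only.
INTERCEPT = -9.0
COEFFICIENTS = {"cholesterol": 0.01, "blood_pressure": 0.02, "smoking": 0.03}


def cross_classification_cells(n_factors, n_categories):
    """Number of cells when every factor is crossed with every other."""
    return n_categories ** n_factors


def logistic_risk(values, intercept, coefficients):
    """Multiple logistic function: one probability from several factors."""
    linear = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    cells = list(product(CATEGORIES, repeat=len(FACTORS)))
    print(len(cells), "cells to fill with cases")
    person = {"cholesterol": 240, "blood_pressure": 150, "smoking": 20}
    print(round(logistic_risk(person, INTERCEPT, COEFFICIENTS), 3))
```

The table needs enough events in every cell to estimate a rate, whereas the function needs only one coefficient per factor and keeps each measurement on its continuous scale.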

Since these origins in mid-century, an enormous amount of prospective information has been gathered on CVD risk, such that several hundred risk factors are now alleged for coronary disease alone. But the majority of these are indirect indicators of risk (i.e. not in the causal pathway), most are closely associated with the “traditional” CVD risk factors, and few survive simple tests of their independent association in multivariate models. Moreover, the prognostic value, reputation, and use of the traditional CVD risk factors (blood pressure, serum cholesterol, body mass, and tobacco use) rose remarkably when it was found that they successfully predict excess risk of death from other than cardiovascular diseases and, in fact, strongly predict total mortality from all causes.

Bradford Hill at the London School of Hygiene and Tropical Medicine, along with advisors to the U.S. Surgeon General’s 1964 Report on Health Effects of Smoking, established a useful guide for causal inference from risk factor associations with disease. The more fully the following criteria are met by a statistical association of risk factor and disease, the greater the likelihood that the factor causes the disease:

  • strength of association
  • consistency
  • gradation
  • temporal sequence
  • biological plausibility
  • specificity
  • coherence with other research and experimental findings

Later, the Seven Countries Study in particular demonstrated the universality of the traditional CVD risk factors that discriminate a wide range of individual coronary and CVD risk within whole populations differing widely in disease incidence. That study also showed major cultural differences in risk factor means and distributions, the most dramatic example being the serum cholesterol values in 1960s Japanese and Finnish cohorts of middle-aged men. In widely disparate population distributions among apparently healthy rural men, the highest Japanese values reach only the lowest 5th percentile of the Finns’ values. Genetics explains part of the individual’s position in his population distribution, but culture and environment explain the spatial separation of the population distributions (Kromhout et al. 2002).

Rose emphasized other attributes of risk factor distributions, illustrating, for example, in BMI distributions among five populations, how the distribution relates to the frequency of “overweight” in a population, defined by an arbitrary criterion of BMI >30, and how prevalence is profoundly influenced by a relatively small shift downward and leftward in the mean and in the whole array. This illustrates the potential for greater public health of small but widespread changes in population risk factor levels, as well as the greater efficacy of population interventions over medical strategies directed only to those at highest risk.
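The effect of a small shift in the whole distribution can be sketched numerically. The example assumes a normal distribution of BMI, which is a simplification, and uses hypothetical means and a hypothetical standard deviation rather than values from Rose's five populations.

```python
from statistics import NormalDist

CUTOFF = 30.0  # the arbitrary BMI criterion quoted in the text


def prevalence_above(mean, sd, cutoff=CUTOFF):
    """Share of a normally distributed population above the cutoff."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)


if __name__ == "__main__":
    # Hypothetical population values, for illustration only.
    before = prevalence_above(mean=26.0, sd=4.0)
    after = prevalence_above(mean=25.0, sd=4.0)
    print(f"before shift: {before:.1%}")
    print(f"after shift:  {after:.1%}")
    print(f"relative fall in prevalence: {1 - after / before:.0%}")
```

A one-unit fall in the mean is a change of only a few percent for any individual, yet the share of the population above the cutoff falls by a much larger proportion, because the cutoff sits in the thin upper tail.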

The Seven Countries Study also was among the first to provide systematic evidence about large differences in average population risk factor levels and their (ecologic) correlation with coronary heart disease rates. High population levels of serum cholesterol and saturated fatty acid intake, characteristic of Northern European and North American samples, are associated with high rates of CHD incidence and mortality, in contrast to findings in Southern Europe and Japan, where serum cholesterol, saturated fatty acid intake and CHD rates are much lower. This finding is congruent with the clinical association of elevated blood lipids and disease, and with the ability to modify serum cholesterol levels experimentally by diet and drugs, thus firmly establishing the causal pathway of diet → lipids → atherosclerosis.

The ecologic correlation of average population risk factor levels, and their combinations, with CVD rates “explains” much of the geographic difference in disease burden. These population phenomena became the basis for Geoffrey Rose’s useful articulation of the idea of “sick individuals and sick populations”. The regressions of risk factors on disease rates also point to “exceptions that test the rule,” when, for example, the disease experience departs strongly from the regression line, such as too many coronary events in East Finland, or far fewer than predicted for Crete, leading the way to new ideas about risk influences among populations.

The concept of “necessary and contributory causes” also arose predominantly from Seven Countries Study findings, for example, about smoking habits, which indicated a predictive role of tobacco for individuals within populations but a very marginal role in explaining comparative CVD rates. A high prevalence of smokers in the Japanese cohorts is associated with low rates of CHD, in the absence, presumably, of a relatively high average blood cholesterol level. During many years of follow-up, the “ecological” association of smoking habits with CHD rates has been weak among the seven countries studied. In partial explanation, smoking habit intensity and duration differ among the populations, but the phenomenon also suggests the existence of “necessary and contributory” causal factors resulting in a significant population burden of coronary disease.

High levels of serum cholesterol, and high saturated fat intake, would, in this view, be considered “necessary” factors for any substantial prevalence of atherosclerosis. The Japanese experience indicates that this could not be said for smoking habits. The ecologic data also suggest that co-factors are necessary as well for the association of smoking and lung cancer; the Japanese appear to be specially protected from cancer as well as coronary disease.

Risk Scores

With the advent of facile computation for multivariate analysis, many attempts have been made to create practical risk functions based on combined risk factors and then apply them to the same or different populations. The “back application” of a risk function to the population that produced the model usually gives acceptable discrimination between cases and non-cases. For example, in most of the early models, about one third of cases were located in the upper 10% of the estimated risk distribution, while about half were located in the upper 20%. But again, Rose has shown that in affluent Western societies, the bulk of coronary cases comes from the central part of the distribution of serum cholesterol level, formerly considered “within normal limits,” rather than from the tails (Rose).
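The discrimination measure quoted above, the share of cases found in the upper part of the estimated risk distribution, can be computed as in the following sketch. The function name and the twenty-subject cohort are hypothetical, constructed only so that the result mirrors the proportions mentioned in the text.

```python
def share_of_cases_in_top(risks, is_case, top_fraction):
    """Fraction of all cases found among the subjects with the highest estimated risk."""
    if len(risks) != len(is_case):
        raise ValueError("risks and is_case must have the same length")
    total_cases = sum(is_case)
    if total_cases == 0:
        raise ValueError("no cases to locate")
    # Rank subjects from highest to lowest estimated risk.
    ranked = sorted(zip(risks, is_case), key=lambda pair: pair[0], reverse=True)
    n_top = int(round(top_fraction * len(ranked)))
    cases_in_top = sum(case for _, case in ranked[:n_top])
    return cases_in_top / total_cases


if __name__ == "__main__":
    # Twenty hypothetical subjects with distinct estimated risks, six of them cases.
    risks = [(20 - i) / 100 for i in range(20)]
    is_case = [1 if i in (0, 1, 3, 7, 11, 15) else 0 for i in range(20)]
    print(share_of_cases_in_top(risks, is_case, 0.10))
    print(share_of_cases_in_top(risks, is_case, 0.20))
```

In this constructed cohort the same calculation also shows the complement: half of the cases arise among the 80% of subjects below the top fifth of estimated risk.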

Good performance also may be seen in applying predictive models developed in one population to another, and differences in relative risk can be safely estimated using risk functions derived from populations other than those to which the models are applied. In these circumstances, a given difference in the level of a risk factor is associated with the same proportional difference of disease or mortality rates in different populations. On the other hand, the estimates of absolute risk across populations are usually off the mark, giving over- or under-estimates of actual events.

In the 15-year experience of the Seven Countries Study, predictive regression models derived from four factors among populations of Northern Europe and North America yielded a two-fold excess of estimated over observed events when applied to Southern European and Japanese cohorts. The reverse (underestimation) was obtained by applying the Southern European or the Japanese data in models for estimates among Northern European and North American populations (Kromhout et al 2002).
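The contrast between portable relative risk and non-portable absolute risk can be illustrated with a minimal sketch. It uses a simple log-linear risk model with a single factor, not the regression models of the Seven Countries Study, and the baselines, coefficient, and risk factor levels are all hypothetical.

```python
import math

BETA = 0.5  # assumed log relative risk per unit of the factor, shared by both populations


def predicted_risk(baseline, x):
    """Log-linear risk: baseline risk multiplied by exp(BETA * x)."""
    return baseline * math.exp(BETA * x)


def expected_events(baseline, levels):
    """Sum of predicted risks over a cohort's risk factor levels."""
    return sum(predicted_risk(baseline, x) for x in levels)


if __name__ == "__main__":
    # Hypothetical baselines: the 'northern' one is twice the 'southern' one.
    northern_baseline, southern_baseline = 0.04, 0.02
    southern_levels = [0.0, 0.5, 1.0, 1.5, 2.0]

    # Relative risk for a one-unit difference is the same under either baseline.
    rr_north = predicted_risk(northern_baseline, 1.0) / predicted_risk(northern_baseline, 0.0)
    rr_south = predicted_risk(southern_baseline, 1.0) / predicted_risk(southern_baseline, 0.0)
    print(round(rr_north, 3), round(rr_south, 3))

    # Absolute risk is not portable: the northern function overestimates southern events.
    borrowed = expected_events(northern_baseline, southern_levels)
    actual = expected_events(southern_baseline, southern_levels)
    print(round(borrowed / actual, 2))
```

In this sketch the whole discrepancy sits in the baseline term, which is why replacing only that term with a local value would bring a borrowed function back into line with observed events.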

This suggests that unmeasured or unknown factors, probably including duration and temporal slope of exposure, explain the residual difference in risk across populations. For instance, because of dramatic dietary changes in Japan in recent decades, a 60-year-old Japanese man with a serum cholesterol value of 200 mg/dl today probably had a value less than 160 mg/dl for the bulk of his youth and adult life. Moreover, cardio-protective factors have been postulated in traditional Japanese exposures, including abundant dietary anti-oxidants. Thus, considerable undefined risk information is, perforce, represented in the “constants” of present-day multivariate prediction models.

Older Ages

With the advent of long-term observations, specific questions have arisen about how long a single risk factor measurement is predictive, or whether risk factors are predictive when measured at older ages. The latter question is now pretty well answered in that risk factors measured in the elderly are predictive but to a lesser extent than measurements taken in early and mid-adulthood. This is clear for both serum cholesterol and blood pressure, but the complex influence of drug treatment, which has become more common during the last decades, renders global interpretation difficult.

On the other hand, though history reveals that relative risk diminishes with age, absolute risk and the frequency of events increase dramatically. Thus, risk attributable to a given factor remains high, as should the potential for prevention by modification of CVD risk factors, even in the aging population.
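The arithmetic behind this point can be sketched as follows, taking the absolute excess risk among the exposed as the baseline risk multiplied by (relative risk − 1). The event rates and relative risks are hypothetical, chosen only to show the direction of the effect.

```python
def excess_risk(baseline_risk, relative_risk):
    """Absolute excess risk among the exposed: baseline * (RR - 1)."""
    return baseline_risk * (relative_risk - 1.0)


if __name__ == "__main__":
    # Hypothetical annual event rates and relative risks, for illustration only.
    middle_aged = excess_risk(baseline_risk=0.005, relative_risk=3.0)
    elderly = excess_risk(baseline_risk=0.030, relative_risk=1.5)
    print(f"middle-aged excess: {middle_aged * 1000:.1f} per 1000 per year")
    print(f"elderly excess:     {elderly * 1000:.1f} per 1000 per year")
```

With these assumed values the relative risk is halved in the older group, yet the excess number of events tied to the factor is larger, because the baseline rate is several times higher.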

The Seven Countries Study and other longer-term cohort studies show further that a single measurement taken in adulthood is predictive of events over a period of several decades, with only a slight decline in the slope after 30 or more years. This does not mean that risk factor changes over the years have no influence in the evolution of risk. But a measure taken in midlife probably contains much information about the past and about vascular damage produced by long exposure to the risk factor.


Those who have been active in CVD epidemiology for many years have witnessed great changes in risk factor distributions among generations from the 1950s to the present. For example, the decline in major risk factor levels in North America and in some Northern European countries started in the 1960s or 1970s and was accompanied by declines in coronary heart disease mortality and probably incidence. The reverse occurred in countries of Eastern Europe and apparently is now occurring in developing countries. Aging cohorts like those of the Seven Countries Study, and trends in values from international cross-sectional surveys, indicate that risk factor distributions of the 2000s differ greatly from those of the 1960s (Kromhout et al 2002). Further documentation of world-wide time trends is provided by the WHO MONICA Project (MONICA 2003). Meanwhile, primary and secondary prevention trials show that reduced CVD risk is achievable through a variety of diet, drug, and multiple risk-factor modifications, demonstrating the still-unrealized promise of wide and more complete prevention.

Refinement of risk prediction continues in modern research, including genetic markers, but new risk factors are often found to be confounded when established risk factors are simultaneously entered into the multivariate models. Few studies today are able to include all previously identified risk factors in their analyses, leaving uncertain any proposed role of the potentially “novel” risk factors. ARIC is one of the few studies able to meet the requirements for both old and new risk factors in a modern multivariate analysis (Chambless et al 2003). Whatever new factor is identified, the chance is always high that it is somehow correlated with the old ones. This also may apply to genetic measures, which are looked to nowadays with sanguine hope to solve issues of prediction and causality.

At this point, some consider that new statistical models are needed to “digest” many more predictive variables and exploit their potential, even when the variables are highly correlated with each other. Hopes were raised, a few years ago, for models based on a “neural network,” but few investigations have been carried out. A contrary view is that multivariate methodology may have already made its major contribution. (Alessandro Menotti and Henry Blackburn)


  1. Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes J. 1961. Factors of risk in the development of coronary heart disease–six year follow-up experience. The Framingham Study. Annals of Internal Medicine: 33-50
  2. Kromhout D, Menotti A, Blackburn H (ed). 2002. Norwell, MA, USA, Kluwer Academic Pub.
  3. WHO MONICA Project. 2003. MONICA. Monograph and Multimedia Sourcebook. 2003. ed. H Tunstall-Pedoe. Geneva: World Health Organization.
  4. Chambless L, Folsom A, Sharrett A, Sorlie P, Couper D, Szklo M, Nieto F. 2003. Coronary heart disease risk prediction in the Atherosclerosis Risk in Communities (ARIC) study. Journal of Clinical Epidemiology 56: 880-890.