Evolution of Cardiovascular Disease Risk Scores. A. Menotti, H. Blackburn
The identification of characteristics called coronary or cardiovascular risk factors has led to development of universal risk scores for prediction of future CVD events that involve multiple risk factors applied singly and together. The primary aim has been to identify the level of an individual’s multi-factor risk for appropriate preventive care. These scores have been progressively facilitated by multivariable statistical functions: multiple linear regression and the corresponding discriminant function, the multiple logistic and the Cox proportional hazards models, the Poisson and the Weibull models, all applied with increasingly powerful digital computers.
The first widely used risk score was computed from data of the Framingham Heart Study and published as the Coronary Risk Handbook by the American Heart Association in 1973. It was followed over the years by similar devices in many countries, in the form of either charts or interactive computer software. Most included the “traditional” CVD risk factors of sex, age, blood pressure, serum cholesterol, and smoking habits. Less often they included body mass index, presence of diabetes mellitus, HDL cholesterol level, and left ventricular hypertrophy on the electrocardiogram.
Some of the “risk factors” can be considered as having an etiologic role (e.g. serum cholesterol), others are only early markers of the disease (e.g. ECG abnormalities); others are likely part of the pathophysiological chain of causality (e.g. markers of inflammation). All may contribute to improved prediction of future events. Here I consider some of the issues surrounding risk models and their predictive tools.
Power and discrimination
The ideal risk function would have perfect predictive power. This does not exist.
In principle, however, predictive power and discrimination should increase on adding “significant” independent risk factors. The significance of a novel risk factor has been judged by several criteria: the P value of its coefficient (that is, the probability of obtaining a coefficient at least as extreme as the one observed if the null hypothesis were true); the partial chi square of the likelihood ratio (an indicator called “informativeness” that suggests whether the addition of a factor improves the overall discrimination); risk ratios, including relative risk, odds ratio and hazard ratio, that is, how much greater the risk is in the presence of the risk factor; the ROC (Receiver Operating Characteristic) curve, a measure of discrimination that combines the sensitivity and specificity of the estimates; “calibration,” that is, the assessment of coherence between expected and observed cases in classes of estimated risk [40 years ago this procedure was commonly used, although it had not yet acquired its distinctive name]; and the reclassification of cases and non-cases (a process that compares the observed and expected number of events in each cell of a reclassification table and is actually a test of calibration). [The ROC curve, once considered the best option before dedicated computer programmes were available, is now considered the least sensitive tool.]
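As a rough illustration of two of these criteria, the sketch below computes the area under the ROC curve and a simple calibration table (expected versus observed events in classes of estimated risk). The function names and the six-subject data set are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any of the studies discussed here.

```python
def auc(risks, events):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case has a higher estimated risk than a randomly chosen non-case
    (ties count one half)."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    non_cases = [r for r, e in zip(risks, events) if e == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def calibration_table(risks, events, n_classes=2):
    """Split subjects into classes of increasing estimated risk and return
    one (expected, observed) pair of event counts per class."""
    ordered = sorted(zip(risks, events))
    size = len(ordered) // n_classes
    table = []
    for k in range(n_classes):
        start = k * size
        # The last class absorbs any remainder.
        end = (k + 1) * size if k < n_classes - 1 else len(ordered)
        chunk = ordered[start:end]
        expected = sum(r for r, _ in chunk)   # sum of estimated risks
        observed = sum(e for _, e in chunk)   # count of actual events
        table.append((expected, observed))
    return table


# Hypothetical estimated risks and observed outcomes (1 = event).
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
events = [0, 0, 1, 0, 1, 1]

print("AUC:", auc(risks, events))
print("Expected vs observed by risk class:", calibration_table(risks, events))
```

In this toy data set the cases outrank the non-cases in 8 of 9 pairs, so discrimination looks good, while the calibration table shows that the model under-predicts events in both risk classes. The two criteria answer different questions.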
Beyond the “traditional risk factors,” claims are often made for new predictive characteristics, the significance of which is almost always tested against those original factors while ignoring the many others that have emerged over time. It is impossible to ascertain whether a new risk factor really contributes to improved discrimination when “all others” remain, in fact, possible confounders. Thus, in my view, a study can provide full evidence about the predictive power of a new risk factor only if the already time-tested characteristics, as well as all the proposed new ones, are available for the same analysis.
Examples of such attempts are found in the Framingham Study and the Atherosclerosis Risk in Communities (ARIC) Study, which have measured many new and old factors simultaneously, using nested case-control designs. ARIC has published on at least 12 new factors that increase the predictive power of its model (using the ROC curve). In a recent paper, beyond 10 basic risk factors, 19 new ones were tested within the same procedure and several of the 19 improved discrimination (among those: the log of CRP, lipoprotein-associated phospholipase A2 (Lp-PLA2), the log of D-dimer, the log of vitamin B6, plasma fibrinogen, Keys score, forced expiratory volume, and carotid intima-media thickness). The increase in area under the ROC curve varied from 0.006 to 0.011, along with modest increases of attributable risk.
Some of the novel risk factors are termed markers because their place in the chain of causation is not established and because they appear to wax and wane in strength among analyses; for example: coagulation factors, lipid sub-fractions, psychosocial and socioeconomic factors, homocysteine, and more recently, inflammatory markers and genetic measures. A recent study showed significant added predictive power for high-sensitivity C-reactive protein (hsCRP) together with a family history of heart attack. These immediately appeared on the internet as a new tool for prediction of CHD events.
But what does it mean when the new and old risk factors are not all present to compete in the same model? Even when a new factor is tested in models that incorporate all previously identified “significant” risk factors, a model may accommodate no more than a limited number of factors that add to prediction when the factors are highly correlated with each other.
How to improve the sensitivity and specificity of prediction is a difficult problem. Progress depends on the identification of new, powerful factors (possibly unrelated to those already known) and/or on the availability of new models capable of overcoming the weaknesses of those currently in use. So far, for example, the initial use of models derived from the philosophy of neural networks does not seem promising.
Can risk scores be exported?
Successive Framingham risk functions have been used in countries with contrasting cultures and have even been officially adopted by national organizations. In most cases the Framingham risk score systematically over-estimates local risk outside the U.S. This should be expected, since more than 30 years ago the Seven Countries Study showed that risk functions from Northern European and North American populations over-estimated coronary risk in Southern Europe, and that data from the latter under-estimated risk in the former. To avoid such distortions, investigators in Great Britain, Germany, Scandinavian countries, Israel, and Italy have developed functions and predictive tools based on their own cohort data.
In the late 1990s, the SCORE project in Europe attempted to provide a single European risk function. During the process, the Seven Countries findings described above compelled SCORE to produce separate models for countries with high and low CHD rates. In contrast, comparison of risk functions between the Framingham and ARIC cohorts within the U.S. produces similar estimates of absolute risk.
Both the Framingham and SCORE research groups have attempted another device to render their risk functions useful in other populations: a process called re-calibration, which adjusts the risk function so that its estimates of absolute risk agree as closely as possible with the rates observed in the new population. The process consists of fitting the constant of the risk function to the background incidence or mortality rates of the new population. In the Cox proportional hazards model, one may also insert as reference the means of risk factor values for the new population. Using Framingham risk, this approach has produced “satisfactory” estimates of actual risk in new populations, for example, different ethnicities within the U.S. and Chinese populations.
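The re-calibration just described can be sketched for a Cox-type risk function written in its usual form, risk = 1 - S0^exp(sum of b_i (x_i - mean_i)), where S0 is the baseline survival at the chosen horizon for a subject with average risk factor levels. All numbers below (coefficients, means, baseline survival) are hypothetical and serve only to show which parts of the function are replaced and which are retained.

```python
import math


def cox_risk(x, betas, means, baseline_survival):
    """Absolute risk over the chosen horizon from a Cox-type function:
    1 - S0 ** exp(sum(b_i * (x_i - mean_i)))."""
    linear_predictor = sum(b * (xi - m) for b, xi, m in zip(betas, x, means))
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients: per year of age, per mmHg of systolic pressure.
betas = [0.05, 0.02]
subject = [60, 150]

# Original population: its own risk factor means and baseline survival.
original = cox_risk(subject, betas, means=[50, 130], baseline_survival=0.90)

# Re-calibration: keep the coefficients, replace the means and the baseline
# survival with values from the new (here lower-risk) population.
recalibrated = cox_risk(subject, betas, means=[52, 135], baseline_survival=0.95)

print("Original estimate:     ", round(original, 3))
print("Re-calibrated estimate:", round(recalibrated, 3))
```

Only the coefficients carry over from the original function, so the hazard ratio between any two subjects is unchanged by the procedure. What changes is the absolute level of risk, which is anchored to the new population's baseline.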
A similar approach, employing SCORE risk functions from Greece, Spain, Switzerland, Germany and Australia, again gave “satisfactory” results. However, after this manipulation, only the risk factor coefficients and their influence on relative risk remain of the original functions. The whole procedure rests on the assumption that the multivariable coefficients of the several risk factors are similar across populations and cultures. This has not yet been systematically investigated, but in 1990 Chambless et al. reviewed logistic models across studies and showed similarities of the multivariate coefficients, while leaving doubt about their homogeneity. Subsequently, several papers from the Seven Countries Study 25- and 40-year follow-up showed that coefficients of the standard risk factors were not significantly different across its several areas, particularly for serum cholesterol. Similarities of the multivariable coefficients of the standard risk factors were also found in the WHO ERICA study and in the SCORE project.
Still, doubts remain about such modelling approaches. For example, a recent meta-analysis by the Prospective Studies Collaboration (PSC) group considered the problem of heterogeneity among 61 studies of the association between serum cholesterol level and CHD death. Heterogeneity was confirmed even after exclusion of outliers. However, it was reduced when studies were grouped by large geographical entities such as Europe, the U.S. plus Australia, and East Asia. Conflicting findings may also be attributable to the fact that studies followed different methodologies whose characteristics could not be taken into account by the meta-analytical procedure.
Definition of high risk
Tools produced for the estimation of CVD risk are designed primarily to identify high-risk subjects who require personalized preventive treatment. The choice of a threshold for therapy is, however, difficult. In the early use of multivariable functions it was shown that almost 50% of 5- to 10-year coronary events occurred among the upper 20% of estimated risk, and about 30% among the upper 10%. The cut-off limit of the upper 20% of estimated risk seemed rational but should have taken into account sex, age, other covariates, duration of follow-up, and the diagnostic criteria for the end-point, all of which can modify the multivariable coefficients.
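The proportion of events captured by an upper class of estimated risk is simple to compute once subjects are ranked. The sketch below uses ten hypothetical subjects; the function name and the data are illustrative assumptions, not figures from the studies mentioned above.

```python
def share_of_events_in_top(risks, events, fraction):
    """Fraction of all observed events that occurred among the subjects
    in the upper `fraction` of the distribution of estimated risk."""
    ranked = sorted(zip(risks, events), key=lambda pair: pair[0], reverse=True)
    n_top = round(len(ranked) * fraction)
    events_in_top = sum(e for _, e in ranked[:n_top])
    return events_in_top / sum(events)


# Hypothetical estimated risks and observed outcomes (1 = event).
risks = [0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.25, 0.40]
events = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

print("Upper 20%:", share_of_events_in_top(risks, events, 0.20))
print("Upper 10%:", share_of_events_in_top(risks, events, 0.10))
```

Here the upper 20% of estimated risk holds two of the four events and the upper 10% holds one, which mirrors the order of magnitude quoted in the text. The remaining events occur among the large number of subjects at lower estimated risk.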
In general, in considering treatment to reduce high risk, I suggest that there should be a high likelihood that risk will be modified by the anticipated intervention and that a threshold would be crossed by any treatment effect. The ATP III report has suggested, for example, treating as high risk those with an estimated risk of a cardiac event of 2% per year or greater over 10 years. Subjects with a risk lower than 1% should not be considered. However, if the background population risk is relatively low, almost nobody would be chosen for treatment by this criterion. The SCORE research group proposed, in contrast, to define high risk as a 5% or greater risk of death from CVD in 10 years. Others propose risk levels associated with the best cost/benefit ratio, that is, a new marker should sufficiently alter the clinical outcomes to justify the cost of testing and treatment.
Expressing the estimated risk to patients
Clinical judgment in providing risk estimates to patients includes the likelihood of a patient’s commitment to preventive action. Moreover, the magnitude of absolute risk says little to most physicians and means nothing to most patients. Relative risk, on the other hand (expressed as the ratio of estimated risk to average risk for the same sex and age in the same population), is clearly more understandable to physicians and patients. But, even when high, relative risk may correspond to little absolute risk, especially in younger people. Conversely, in older people a low relative risk may carry a high absolute risk. A useful alternative to stating either figure explicitly, and one more likely to prompt the needed action, is to communicate risk transformed into estimated years of life or biological age.
In 2002, an Italian group produced a chart and software for cardiovascular risk prediction defining a “biological age of risk”, that is, the age of a person who carries average Italian risk factor levels and the same estimated risk as the subject. This age can be equal to, greater than or smaller than the actual age of the subject, giving an understandable indicator of the need for action. More recently, a similar index called “heart age” was created by the Framingham research group on the basis of 30-year Framingham follow-up data.
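The idea of a “biological age of risk” can be sketched under a simplifying assumption: that the risk function is logistic and linear in age on the log-odds scale. In that case the equivalent age has a closed form. The model, its coefficients, and the population means below are hypothetical; this is not the function published by the Italian or the Framingham groups.

```python
import math


def logistic_risk(age, x, intercept, beta_age, betas):
    """Estimated risk from a hypothetical multiple logistic function."""
    z = intercept + beta_age * age + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


def risk_age(age, x, means, beta_age, betas):
    """Age at which a person with average risk factor levels would have the
    same estimated risk as the subject (valid only when the model is linear
    in age on the log-odds scale)."""
    excess = sum(b * (xi - m) for b, xi, m in zip(betas, x, means))
    return age + excess / beta_age


# Hypothetical model: systolic pressure (mmHg) and smoking (0/1).
intercept, beta_age = -9.0, 0.08
betas = [0.02, 0.6]
means = [130.0, 0.3]          # assumed population averages

subject_age, subject_x = 50, [150.0, 1.0]
equivalent_age = risk_age(subject_age, subject_x, means, beta_age, betas)

print("Actual age:", subject_age)
print("Biological age of risk:", round(equivalent_age, 2))
```

In this example a 50-year-old smoker with raised blood pressure carries the estimated risk of an average person about ten years older. That statement may be easier to convey than the corresponding probability.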
During the last half century, great progress has been achieved with various practical tools to predict risk of cardiovascular events as a function of baseline risk factor levels in patients and the clinically healthy. Improved prediction among the at-risk segment of any population would benefit from universally applicable risk functions and from better ways of communicating the concept of risk efficiently and effectively to achieve a preventive impact. (Alessandro Menotti)
American Heart Association. 1973. Coronary Risk Handbook: estimating the risk of coronary heart disease in daily practice. Dallas, American Heart Association.
Keys, A., Menotti, A., Aravanis, C., Blackburn, H., Djordjevic, B.S., Buzina, R., Dontas, A., Fidanza, F., Karvonen, M.J., Kimura, N., Mohacek, I., Nedeljkovic, S., Puddu, V., Punsar, S., Taylor, H.L., Conti, S., Kromhout, D., Toshima, H. 1984. The Seven Countries Study: 2289 deaths in 15 years. Prev Med; 13: 141-154.
Conroy, R.M., Pyorala, K., Fitzgerald, A.P., Sans, S., Menotti, A., De Backer, G., De Bacquer, D., Ducimetiere, P., Jousilahti, P., Keil, U., Njolstad, I., Oganov, R.G., Thomsen, T., Tunstall-Pedoe, H., Tverdal, A., Wedel, H., Whincup, P., Wilhelmsen, L., Graham, I.M. 2003. Estimation of ten-year risk of fatal cardiovascular diseases in Europe: the SCORE project. Eur Heart J; 24: 987-1003.
Full references for this essay are available from the author at: “Prof. Alessandro Menotti” <email@example.com>