Mortality Risk Analysis in a Dataset of Half a Million People

The UK mortality risk study I point out below doesn't provide any real surprises when it comes to the risk factors associated with higher mortality rates at a given age, but taken as a whole it is a good example of the present trend towards much more data and far larger study sizes in epidemiology. In this age of databases, with the cost of storage and computation falling rapidly towards numbers barely distinguishable from zero, the quality of epidemiological analysis is increasing. More data and larger study populations bring the possibility of ever better statistical measures, the ability to identify more subtle correlations, and - perhaps of greatest interest for those of us not in the science business - online databases that allow everyone to jump in and look at the results.

So you should head over to the UK Longevity Explorer and take a look at the Association Explorer; it's an interesting tool to tinker with, especially once you start digging down into the weeds of smaller associations. It is a nice view of all the things we'd like to render entirely irrelevant by producing rejuvenation biotechnologies capable of repairing cell and tissue damage. In a world in which the causes of aging can be meaningfully addressed, it no longer matters that you have minor gene variants, or had more or less exposure to infectious diseases in youth, or experienced other circumstances that presently swing life expectancy a year or a few years in either direction. The benefits provided by repair therapies will vastly outweigh all of that when it comes to long term health and life expectancy.

On a slightly different topic, and unlike the study below, I suspect that the largest datasets of interest to aging research that emerge in the decades ahead will be obtained without the consent of study participants. The incentives align with this outcome: (a) all groups with the capability to gather large amounts of data are presently doing so rapaciously, since they can use that data to generate profits in many ways; (b) few organizations are any good at defending large databases from attackers; (c) a dataset released into the wild from legal jurisdiction A is a dataset that researchers in legal jurisdiction B don't have to do the work to assemble or otherwise pay to use.

Given these points, I think that we will see continuing theft and release of large sets of medically relevant data, and that researchers and their boards will concoct ethical justifications for using this data as it becomes more widely available. For example, researchers might pay a third party to anonymize stolen datasets available online in a way that prevents records from being associated with individuals without disturbing statistical associations, and then never officially view the original data themselves. There will be a sense that it is a shame to let this all go to waste since it is out there.

5 year mortality predictors in 498,103 UK Biobank participants: a prospective population-based study

Participants were enrolled in the UK Biobank from April, 2007, to July, 2010, from 21 assessment centres across England, Wales, and Scotland with standardised procedures. In this prospective population-based study, we assessed sex-specific associations of 655 measurements of demographics, health, and lifestyle with all-cause mortality and six cause-specific mortality categories in UK Biobank participants using the Cox proportional hazard model. We excluded variables that were missing in more than 80% of the participants and all cardiorespiratory fitness test measurements because summary data were not available. Validation of the prediction score was done in participants enrolled at the Scottish centres. UK life tables and census information were used to calibrate the score to the overall UK population.

Of 498,103 UK Biobank participants included (54% of whom were women) aged 37-73 years, 8532 (39% of whom were women) died during a median follow-up of 4·9 years. Self-reported health was the strongest predictor of all-cause mortality in men and a previous cancer diagnosis was the strongest predictor of all-cause mortality in women. When excluding individuals with major diseases or disorders (Charlson comorbidity index greater than 0; n=355 043), measures of smoking habits were the strongest predictors of all-cause mortality. The prognostic score including 13 self-reported predictors for men and 11 for women achieved good discrimination and significantly outperformed the Charlson comorbidity index.
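For readers wondering what "discrimination" means in practice: for survival models it is commonly summarized with a concordance statistic, the fraction of comparable pairs of people in which the one who died sooner had also been given the higher risk score. The sketch below is an illustration of that general idea, not the paper's code. The function name and the toy data are made up for the example, and ties in follow-up time are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Simplified Harrell-style concordance for right-censored data.

    times  : follow-up time for each person
    events : 1 if the person died at that time, 0 if censored
    scores : predicted risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the person with the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in death;
        # pairs with identical times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to a score that ranks people no better than chance, and 1.0 to a score that orders every comparable pair correctly.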

Measures that can simply be obtained by questionnaires and without physical examination were the strongest predictors of all-cause mortality in the UK Biobank population. The prediction score we have developed accurately predicts 5 year all-cause mortality and can be used by individuals to improve health awareness, and by health professionals and organisations to identify high-risk individuals and guide public policy.

UK Longevity Explorer

Interest in the causes of death and disease is growing, as is our knowledge and understanding. Individuals, healthcare professionals, researchers, health organisations and governments all want to understand more about what might improve or reduce life expectancy, particularly in the middle-aged and elderly.

A large-scale project called UK Biobank was set up, and between 2006 and 2010, it collected 655 measurements from nearly half a million UK volunteers (498,103) aged 40-70. This website presents the two main parts of the researchers' work: the Association Explorer and the Risk Calculator. These are closely connected - the Risk Calculator is based on findings from the Association Explorer.

The Association Explorer is an interactive graph where you can explore how closely 655 measurements (variables) from the UK Biobank study are associated with different causes of death. The results for different associations are presented separately for women and men, and illustrate the ability of each variable to predict mortality. For more detailed results for each specific measurement, you can click on each dot (data point). You can also select groups of measurements, different causes of death, as well as search for a particular variable of interest using the search bar.

As questionnaire-based variables were found to be the strongest predictors, the researchers created a calculator that could use questionnaire answers to predict an individual's risk of dying within five years ('five-year risk'). To do this, they used a computer-based approach to automatically select the combination of questions from UK Biobank that gave the most accurate prediction of death within five years.
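To make the idea of a five-year risk concrete, here is a minimal sketch of how a Cox-type model is usually turned into an absolute risk: the answers are combined into a weighted sum, and the baseline five-year survival is raised to the exponential of that sum. This is the textbook form and only an assumption about how such a calculator might work. The question names, coefficient values, and baseline survival below are hypothetical, chosen for illustration; they are not taken from the UK Biobank study or from the Risk Calculator.

```python
import math


def five_year_risk(answers, coefficients, baseline_survival_5y):
    """Absolute five-year risk under a Cox-type proportional hazards model.

    answers              : dict of question name -> numeric answer
    coefficients         : dict of question name -> log hazard ratio
    baseline_survival_5y : five-year survival for someone whose answers are all zero
    """
    # Linear predictor: weighted sum of the questionnaire answers.
    linear_predictor = sum(coefficients[name] * value
                           for name, value in answers.items())
    # Proportional hazards: S(t | x) = S0(t) ** exp(linear predictor).
    survival = baseline_survival_5y ** math.exp(linear_predictor)
    return 1.0 - survival


# Hypothetical numbers for illustration only; not the study's values.
example_coefficients = {"current_smoker": 0.7, "poor_self_rated_health": 1.1}
example_baseline = 0.98

reference_person = {"current_smoker": 0, "poor_self_rated_health": 0}
higher_risk_person = {"current_smoker": 1, "poor_self_rated_health": 1}

reference_risk = five_year_risk(reference_person, example_coefficients, example_baseline)
higher_risk = five_year_risk(higher_risk_person, example_coefficients, example_baseline)
```

With every answer at zero the risk is just one minus the baseline survival, and each positive coefficient multiplies the hazard, so the second person ends up with a larger five-year risk than the first.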