A Machine Learning Review of the Literature on the Biology and Treatment of Aging
Researchers here take an interesting approach to reviewing the evolution of aging research over the past century, using machine learning to process abstracts from the entire literature in search of meaningful patterns. Some of the findings are much as might be expected given the way in which top-down funding choices shape trends in research, while others are more subtle commentaries on the difference between the map and the territory, such as the relationship between the hallmarks of aging and aging research as it actually exists.
Aging research has advanced significantly over the past century, from early studies on animal models to a current emphasis on clinical and translational applications. As research literature expands exponentially, traditional narrative reviews can no longer capture the field's complexity, highlighting the need for new, unbiased synthesis tools. Here, we leverage advanced natural language processing (NLP) and machine learning (ML) techniques to analyze 461,789 abstracts related to aging published between 1925 and 2023.
A central finding of our study is the marked evolution in research priorities over the past 50 years. Early decades were dominated by a focus on animal models and cellular mechanisms, which laid the groundwork for our mechanistic understanding of aging. In contrast, recent decades show a pronounced shift toward clinical research and healthcare applications, reflecting both technological advances and changing societal priorities as populations age. This transition is further underscored by a consolidation of research themes around a few dominant topics, most notably those related to healthcare and clinics, and by an intensive emphasis on neurodegenerative diseases, where Alzheimer's disease (AD) and dementia have emerged as the most studied conditions in the aging field. The overwhelming dominance of AD and dementia research may not solely reflect emerging scientific trends but could also be partially driven by funding policies. For instance, agencies like the National Institute on Aging have historically allocated a substantial proportion of their research funding to Alzheimer's and related dementias, shaping the field's research priorities.
Our clustering analysis revealed distinct thematic groups that not only segregate clinical and basic biological research but also highlight specific tissue- and system-focused studies (e.g., those related to the central nervous system or skeletal muscle). Links between biology-of-aging clusters (such as oxidative stress and cellular senescence) and clinically oriented clusters remain sparse. This suggests that despite the overall growth in aging research, a significant gap persists between research on fundamental aging mechanisms and its translation to clinical settings.
Beyond these broad trends, a focused analysis of the biology-of-aging research literature uncovered distinct clusters corresponding to fundamental aging processes. When we compared these clusters with the well-established hallmarks of aging, we found that while some clusters align closely with these predefined categories, others do not clearly fit into them. This discrepancy suggests that the biology of aging contains more diversity than the classical hallmarks scheme alone might capture.