Opinion

Whose genome are we studying?

Precision medicine has a representation problem.

The convergence of genomics and artificial intelligence is transforming how we understand disease. We now have access to an unprecedented amount of data, and it is changing our approach to medicine. The principle behind precision medicine is simple: every individual is genetically unique, and so medicine should reflect that individuality. Recent advances in genomics and bioinformatics have brought this vision closer to reality. Around 20 years ago, it took billions of dollars to sequence the first human genome. Today, we can sequence the entire genome of an individual for a few hundred pounds.

Artificial intelligence (AI) has further accelerated this progress. Machine learning models can now analyse enormous datasets, detect patterns in genetic variation, and predict how specific mutations might affect protein structure or disease risk. Algorithms trained on genomic data can identify variants that cause individuals to respond differently to the same medication. In theory, this could lead to a future where diagnoses and treatments are informed by the complete genetic profile of a patient.

However, this progress comes with a fundamental risk. In healthcare, as in many other fields, AI systems are limited by the data they are trained on. If genomic datasets are incomplete or skewed toward certain populations, the predictions generated by them will reflect that bias. This problem is already visible in current genomic databases. Many large genetic studies have historically focused disproportionately on individuals of European ancestry, meaning that the variants discovered, and the risk predictions built from them, are often less accurate for other populations.

Recent large-scale sequencing initiatives highlight the importance of expanding genomic diversity. Projects aimed at sequencing genomes from underrepresented populations have revealed a vast number of previously unknown variants. Many of these variants are rare and often confined to specific populations.

The Genome India project offers the most recent demonstration. By sequencing individuals from diverse Indian populations, the project identified around 180 million genetic variants, roughly two-thirds of which are rare variants that occur in less than 0.1% of the population. Many of these variants are population-specific, showing that large amounts of genetic diversity are missed when certain populations are underrepresented in genomic studies. The project also identified 118 variants that adversely affect drug response and have a higher frequency in specific subpopulations. This demonstrates that population-specific genetic variation can directly influence how individuals respond to medication. Without sequencing diverse populations, clinically relevant variants affecting treatment efficacy or safety could remain undiscovered, limiting the potential of precision medicine.

The expansion of genomic sequencing raises another important question: who is responsible for generating and maintaining these datasets? Large genomic resources are often built through collaborations between academic institutions, governments, and private companies. Public initiatives such as the 1000 Genomes Project were designed to map human genetic variation across global populations and make that data freely available to researchers. At the same time, private companies have entered this space through direct-to-consumer genetic testing, quietly building enormous genetic databases of their own. But the existence of multiple actors collecting genomic data also means that there is no single standard for how representative these datasets should be.

As genomic medicine continues to expand, questions of representation and responsibility will keep surfacing. Precision medicine depends on large datasets to function effectively, so equitable representation in genomic research is central to its future. If that future lies in AI models, a standard for representation must exist. Without one, bias will continue to be built into the way we practise medicine.

Ultimately, the promise of precision medicine lies not just in technological innovation but in how we choose to build and manage the data that powers it. Artificial intelligence may be able to analyse genomes at an unprecedented scale, but it cannot overcome gaps in the underlying data. The future of personalised healthcare will depend on building genomic resources that are diverse, transparent, and ethically governed. Only then can precision medicine truly fulfil its promise.

From Issue 1895

13 March 2025
