What We Do

Genomic prediction | Functional genomics | Microbiome | Machine learning

Genomic prediction

Complex and chronic diseases such as cardiovascular disease, cancer, diabetes and autoimmune disease account for the majority of deaths and disability worldwide, and early prevention is a major priority for public health. For example, it is now routine to measure cholesterol and blood pressure to identify those at increased risk of cardiovascular disease. At the same time, disease is partly due to changes, both large and small, in our genome – the DNA code we are born with. Therefore, genomics can be used to estimate disease risk. However, currently, genomic information is rarely incorporated into the assessment of a person’s risk in the doctor’s office.

Precision medicine is the tailoring of current disease prevention and treatment approaches toward the individual. One prominent strategy is to use genomic information. Our inherited DNA does not change, meaning we can potentially use the information within DNA to calculate a person’s risk for any of hundreds of diseases. We may be able to identify at-risk individuals very early, before actual disease occurs or even before undetectable disease processes begin. For people with high genomic risk, we may be able to recommend preventative measures that either address the biological disturbances caused by genetics or address environmental risk factors that, unlike the DNA itself, are modifiable. We can also use a person’s genomic information to determine how well they may respond to different medicines, thus informing the best course of treatment. Development of accurate genomic prediction has the potential to accelerate medical practice towards the goal of precision medicine, by providing a means for personalised management options for each person, given their unique genomic profile.

A key goal of the Cambridge Baker Systems Genomics Initiative is to lead the world in the development of accurate genomic predictors (e.g. genomic risk scores; GRSs); the design of methods and tools for GRS construction; the annotation and curation of GRSs for clinical translation; and the construction of best practices for GRSs to help guide the research community.
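As a rough illustration of what a genomic risk score is, the sketch below computes one in its most common basic form: a weighted sum of the number of risk alleles a person carries at each variant. This is a minimal sketch under stated assumptions, not a description of our own methods; the function name, variant identifiers and weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def genomic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of allele dosages.

    dosages: number of risk alleles (0, 1 or 2) a person carries per variant.
    weights: per-variant effect weights (hypothetical values here).
    Variants missing from the person's data are skipped.
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical variant IDs and weights, for illustration only.
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = genomic_risk_score(person, weights)
print(score)
```

In practice a score like this is only meaningful relative to a reference population, for example by asking which percentile of scores a person falls into.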

We are currently developing novel GRSs for dozens of chronic and complex diseases. Some of our early successes are exemplified by coronary artery disease and celiac disease.

For coronary artery disease (including heart attacks), we can now differentiate groups of people who follow distinct trajectories of lifetime disease risk (see figure). Those at highest risk of coronary artery disease are candidates for early preventative treatment, such as statin prescription or lifestyle coaching. Towards this end, the GRS for coronary artery disease is currently being trialled internationally in studies to motivate individuals to change their lifestyle to reduce well-known risk factors that can cause heart disease, such as high-fat and high-salt diets, smoking, and inadequate physical activity.

For celiac disease, we have shown that a GRS can potentially revolutionise the pathways to celiac diagnosis. This is important because individuals with celiac disease have an autoimmune response to gluten and therefore need to control or avoid gluten consumption as early as possible. By detecting early those individuals at high risk of celiac disease, we can focus costly and time-consuming tests, which include gluten challenge, serology and small bowel biopsy, on those individuals most likely to ultimately develop disease, while excluding those individuals least likely to develop it.

Functional genomics

Functional genomics is the integration of large-scale genomic data with molecular and cellular data to determine how the body works in health and disease. We are leveraging the vast wealth of biological data generated by high-throughput assays from hundreds of thousands of individuals. Such systems-level biological data may include gene transcripts, metabolites, proteins and matching genetic information – the so-called “omics”. Using omics data, we can discover and characterise the biological pathways that drive complex disease, as well as identify potential biomarkers for disease risk.

Biomarkers are themselves typically molecules in the body that can be used as diagnostic or screening tools. A simple example is cholesterol for cardiovascular disease risk. With precision medicine becoming part of routine healthcare practice, biomarkers are playing a key role as they improve our predictive power for disease, define new disease subtypes, uncover new biology and change treatment decisions.

Our focus is on identifying key genes, proteins, metabolites and the molecular networks thereof to understand how changes in their levels and composition cause the progression of cardiometabolic diseases. Furthermore, we are investigating genomic regulation of immune responses and gene transcription, particularly in early life, together with their interaction with environmental factors. Finally, we are using functional genomics to drive toward precision medicine by developing methodologies and analysis strategies for the integration of multiple omics data with the wealth of health information in electronic health records.


Microbiome

The human body harbours more microbial cells than human cells. It has been widely shown that the microbial communities that live on us have a significant impact on our health. These communities are commonly referred to as the human microbiome.

We analyse vast amounts of microbiome data from large-scale population studies, with a particular focus on the human gut and respiratory tract. For the gut, there is potential truth in the maxim – “you are what you eat” – for there is evidence that the gut microbiome may influence the development of diseases outside the gut – e.g. cardiovascular, allergic or autoimmune conditions.

In our microbiome research we have several key projects. First, we identify and characterise the microbes that predict present or future disease, and we use statistical and machine learning methods to create predictive models based on the gut microbiome. As an extension, we combine human genomic and microbiome data both to uncover the human genetic variants that determine the communities of microbes living on us and to create combined human and microbial genomic risk scores that predict future disease.

Second, we study how time itself affects the microbes that live on us. The longitudinal dynamics of microbial communities are important to cardiovascular and respiratory diseases, with some of our own work showing that early childhood, the first two years of life, is a critical window for the airway microbiome’s role in the development of future asthma.

Third, we are exploring pathogen and antimicrobial resistance carriage in large populations. This also includes designing clinically useful algorithms and software tools to perform rapid typing and the detection of drug resistance and virulence genes.

Machine learning

Recent advances in high-throughput omics technologies are transforming biological studies into “big data” disciplines. In parallel, progress in machine learning, including deep learning, is revolutionising fields such as artificial intelligence and is in the process of transforming genomic research.

Across both its Cambridge and Baker nodes, CBSGI is at the intersection of these two fields, integrating machine learning frameworks with vast quantities of data from population-scale genomic studies.

Our Cambridge node is designing deep learning models to predict cardiometabolic and haematological traits. These models automatically learn feature representations of genomic data and capture the complex relationships between human genetic variants and the traits. We aim to discover hidden relationships between genetic variants and these traits that help quantify clinically important biomarkers and other information, empowering subsequent translational studies.

Our Baker node is using multiple omics data to develop deep learning mechanisms that mimic neural networks in a hierarchical organisational structure, and thereby better reflect true biology. Based on the patterns we identify, we are designing approaches that help us interpret vast amounts of genomic variation and how it affects vital processes in the human body, which in turn cause detectable changes in biomarkers and susceptibility to disease.
