Finding a cure for cancer – is big data the solution?

November 15, 2018

The explosion in patient data has the potential to revolutionize how we treat cancer – but the big data solution still faces huge logistical challenges.

Cancer is one of the great scourges of our time. It is the second leading cause of death globally, responsible for killing 8.8 million people in 2015. And it is becoming increasingly prevalent – the number of new cases is expected to rise by about 70% over the next two decades.

While the fight against cancer has made huge strides over the past thirty years, with survival rates doubling, a general cure remains elusive.

The challenge is that cancer is not a single disease; it is hundreds of different diseases. And every cancerous tumor is different. Each one grows and develops in its own unique way – a single tumor can contain billions of cells, each of which can mutate. And because these cancer cells can acquire new mutations and new genetic variations each time they divide, an almost infinite number of variations can arise at the genome level as the tumor grows and develops.

This makes treating cancer incredibly complex, with oncologists – the doctors who specialize in treating cancer – trying to hit an unpredictable, moving target. There is no one-size-fits-all solution.

“Cancer is extremely heterogeneous,” says Dr. Andreas Schuppert, Professor at RWTH Aachen University and Key Expert in Clinical Pharmacometrics in Bayer’s Pharmaceuticals Division. “If you try to compare cancers on the molecular level, you will almost never find two cancers which are the same. The problem is that no one knows precisely which parameters control the fate of the cancer. This is the challenge.”

“Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world.”

– Atul Butte, Director, Institute for Computational Health Sciences, UCSF

The data mountain

What researchers do have is data, and lots of it. A single cancer patient can generate nearly one terabyte of biomedical data, consisting of routine diagnostic data as well as all the patient’s clinical data. That is the equivalent of storing more than 300,000 photos or 130,000 books.

Alongside patient history and diagnostic imaging data, genetic data has become a key tool in the battle against cancer. As the cost of DNA sequencing has plummeted over the last two decades, the gathering of detailed genetic information on both the patient and the tumor has become far more commonplace, rapidly increasing the volume of data available to scientists.

Today, researchers not only look at the genetic code of the cancer cells that were the original source of the disease, they can also compare them with the DNA of metastases, or secondary tumors, that develop.

By the end of 2017, research based on genome sequencing of patients was estimated to be generating one exabyte of data annually – that’s a million terabytes (the equivalent of 130 billion books, a thousand times more than have ever been published).
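The arithmetic behind these comparisons is easy to sanity-check. A short sketch (the per-photo and per-book file sizes are our own rough assumptions, not figures from the article):

```python
# Back-of-the-envelope check of the storage comparisons above.
# Assumed average file sizes (our assumptions, for illustration only):
MB = 10**6
TB = 10**12
EB = 10**18

photo_size = 3.3 * MB   # a typical smartphone photo
book_size = 7.7 * MB    # a digitized book, including images

photos_per_patient = 1 * TB / photo_size   # per-patient terabyte, in photos
books_per_patient = 1 * TB / book_size     # per-patient terabyte, in books
books_per_exabyte = 1 * EB / book_size     # annual exabyte, in books

print(f"{photos_per_patient:,.0f} photos, {books_per_patient:,.0f} books, "
      f"{books_per_exabyte:,.0f} books per exabyte")
```

At these assumed sizes, one terabyte works out to roughly 300,000 photos or 130,000 books, and one exabyte to about 130 billion books – matching the figures above.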

How scientists use big data in healthcare to fight cancer

The highly complex nature of cancers means that the approach to using big data is markedly different from the approach used for some other types of diseases. In cancer, there can be many thousands of parameters per patient but sometimes only a few similar patients to study, as each tumor is unique.

Researchers are currently using this data to analyze the disease on three levels:

  • Cellular – Looking for patterns in the data of individual cancer cells to discover genetic biomarkers. Finding common features could help us better predict how individual tumors might mutate and what drug treatments might be most effective.
  • Patient – A patient’s medical history and DNA data could be used to help define the best combination of therapies for them, based on their tumor, their genes – and the effects of treatments on patients with similar disease patterns and genetics.
  • Population – Wider population data can be analyzed to inform treatment strategies for patients based on their different lifestyles, geographies, and cancer types.
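The patient-level idea can be sketched in miniature. In the toy example below, patients whose tumors share a mutation profile are grouped together, so that one patient’s treatment outcomes could inform the care of similar patients (the records are invented and greatly simplified, though the gene names are real cancer-associated genes):

```python
from collections import defaultdict

# Hypothetical patient records: (patient_id, detected tumor mutations).
# Invented data, purely for illustration.
patients = [
    ("P1", {"KRAS", "TP53"}),
    ("P2", {"EGFR"}),
    ("P3", {"KRAS", "TP53"}),
    ("P4", {"EGFR", "TP53"}),
]

# Group patients whose tumors share the same mutation profile.
cohorts = defaultdict(list)
for pid, mutations in patients:
    cohorts[frozenset(mutations)].append(pid)

for profile, ids in cohorts.items():
    print(sorted(profile), "->", ids)
```

Real analyses work with thousands of genomic and clinical parameters rather than a handful of mutations, but the principle is the same: finding patients with similar disease patterns and genetics.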

The end goal of these various approaches is for oncologists to be able to provide each patient with a tailor-made drug treatment that targets his or her specific cancer cells, limiting the risk of serious side effects.

8.8 million deaths in 2015 from cancer - the second leading cause of death globally

Symptoms and supercomputers

“Big data has long been a reality in medicine. The three Vs – volume, velocity and variety – will increasingly determine everyday reality in doctors’ offices,” says Dr. Joerg Lippert, Head of Clinical Pharmacometrics in Bayer’s Pharmaceuticals Division.

Data analytics has already become a key weapon in the fight against disease in the digital age. But for cancer, the sheer unpredictability of how a tumor mutates and develops makes the task of finding an answer in all that data far harder.

The challenge lies less in accumulating patient data than in managing and analyzing it all effectively. Just as there is no single cure-all for cancer, there is no single tool for analyzing the data. Researchers and computer scientists are constantly playing catch-up: as soon as they develop the new tools and techniques required to process all this information efficiently, more data sources emerge and the computational demands grow ever larger.

There is also the issue of standardization in how the data is collected, stored, and studied. Genomics does not yet have standards for converting raw sequence data into processed data, while some medical data is highly unstructured and, in some cases, inaccurate due to human error in how it was entered, measured, or recorded.

To handle this volume of personal data, develop the software tools to make sense of it all, and secure the supercomputing power required to process everything, researchers are increasingly taking a collaborative approach. Public-private projects, such as the Innovative Medicines Initiative in Europe, are helping accelerate the development process and bring together different expertise. The Project Data Sphere initiative is a platform to share, integrate, and analyze historical cancer research data in a single location. The DREAM Challenges, meanwhile, are an open-science effort that uses crowdsourcing and transparent data sharing to assess existing analytical tools, suggest improvements, and develop new solutions. Of course, the requirements of data privacy legislation add to the complexity.

The power of collective wisdom

The mountains of information now available to oncologists have the potential to help deliver tailor-made treatments that better target each patient’s tumor and reduce the likelihood of side effects.

But it is an incredibly difficult task, and truly realizing the potential of this new weapon in the battle against cancer requires effectively managing and analyzing all that data. The solution lies in greater collaboration: working together to apply multiple tools, each looking for specific features.

If 20th-century medicine was about the quest for data, 21st-century medicine is about how we work together to use it better.

