The Cure for Cancer May Literally Be Up In the Air

If you thought technology was progressing quickly, you ain’t seen nothing yet. Two things are currently happening in parallel. First, the technology to collect biological data is advancing at an incredibly rapid pace. Second – and this is why the first matters so much – computing is becoming scalable. The combination of the two is about to revolutionize health care.

Understanding disease and how to treat it requires a deep knowledge of human biology and of what goes wrong in diseased cells. Up until now, this has meant that scientists run experiments, read papers, and go to seminars to gather the data they need to build models of both normal and diseased cell states. This requires something that, in today’s tech-dependent world, we don’t seem to have much of: time.

What medical research really needs is to combine large amounts of data with computers and have it accessible to anybody who cares to find it. But wait, we have that already: it’s called the Internet. The problem is that what researchers really require is access to a data system that will collect and integrate information from universities, hospitals, and biotech companies beyond just what they can themselves collect or remember. This system, with affordable massive data in the clinic and in the lab, is on the horizon. This will mean exponentially faster medical progress.

As of now, there are several publicly traded companies and about a dozen high-profile startups whose entire business is focused on the race for faster, cheaper, more accurate DNA sequencing. Clinical applications are usually limited to screens for known genetic markers of disease or drug response, but as the cost of data acquisition drops we will start to see companies and academics use unbiased observational correlations to generate meaningful hypotheses about the genetic causes of disease.

Progress is rapidly being made in imaging and identifying proteins, metabolites, and other small molecules in the body. The end goal is the opportunity to create pools of comprehensive data for patients and healthy people where researchers can integrate data and find patterns. These data pools can be created by anyone who has the consent of the patients: universities, hospitals, or companies. The resulting data tornado will be huge. This could also create the next engine of economic growth and improve millions of lives. The question that remains, of course, is how all this data will be integrated.

Cloud Computing to the Rescue

The cloud can help create a value network where researchers, doctors, and entrepreneurs can be given access to data and algorithms that will help them communicate and share information more effectively than ever before.

The true value of the data will begin to be unlocked as it is analyzed in the context of all the other available data, whether in public clouds or private, secure silos. This massively integrated analysis will speed the transition of these technologies from bleeding-edge experimentation to the good-enough stage, where they will compete on ease of use, speed, and cost.

One of the main areas where this kind of integration could be most effective is cancer research. Since cancer is a disease of genetic regulation gone wrong, chances are that if you live long enough, you will get cancer. For most diseases, like asthma, diabetes, or autism, classifications have been formed that help the development of treatments. However, medical experts have recently concluded that medicines that target individual pathologies would be far more effective, since the pathology varies significantly from person to person.

Cancer As the Ultimate Big Bio Problem

Tumors may have millions of mutations and rearrangements compared to normal tissue in the same individual, and cancer cells within the tumor itself may have different genomes. Essentially, then, cancer is the perfect ailment to be analyzed and treated using algorithms: in effect, there are N types of lymphoma, with N being the number of people who have lymphoma.

If we imagine a world where every tumor is comprehensively profiled, it quickly becomes clear that the data sets will not only be very large but will also require different domains of expertise for quality control, model building, and interpretation. Every cancer and person will be different based on their genome, proteome, metabolite and small-molecule profiles, and features we have yet to discover. With the technology being used today, it would take an average of two days to process gene expression data. If we consider a few thousand patients, that time adds up to about 570 years on the desktop! Only a distributed computing platform can get the job done, and the cloud opens this work up to the masses.
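To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes that every two-day job is independent and that jobs spread perfectly evenly across machines. Those are simplifying assumptions, not something the article states, and the function names are hypothetical. The article also does not say how many two-day jobs make up the 570-year total, so the last function only shows how many such jobs that figure would correspond to.

```python
import math

DAYS_PER_YEAR = 365.25  # average year length, used only for unit conversion


def serial_years(num_jobs: int, days_per_job: float) -> float:
    """Years needed if a single desktop runs every job back to back."""
    return num_jobs * days_per_job / DAYS_PER_YEAR


def distributed_days(num_jobs: int, days_per_job: float, workers: int) -> float:
    """Wall-clock days when independent jobs are split evenly over workers.

    Assumes perfect parallelism: no scheduling or data-transfer overhead.
    """
    rounds = math.ceil(num_jobs / workers)  # jobs each worker runs in sequence
    return rounds * days_per_job


def jobs_for_serial_years(years: float, days_per_job: float) -> float:
    """Number of jobs of a given length that add up to a serial total."""
    return years * DAYS_PER_YEAR / days_per_job


if __name__ == "__main__":
    # 3,000 two-day jobs, one after another on a single machine
    print(f"serial: {serial_years(3000, 2):.1f} years")
    # the same jobs spread across 1,000 cloud workers
    print(f"distributed: {distributed_days(3000, 2, 1000):.0f} days")
    # how many two-day jobs a 570-year serial total would represent
    print(f"jobs in 570 years: {jobs_for_serial_years(570, 2):,.0f}")
```

Under these assumptions the wall-clock time shrinks roughly in proportion to the number of machines, which is the point the paragraph above is making about distributed platforms.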

Big bio in cancer research has life-changing implications for treatment and diagnosis. The cloud can genuinely help treat cancers by allowing researchers, doctors, and engineers to gather, interpret, and integrate data on unprecedented scales. As we begin to understand more precisely how individual cancers work, drug development ventures will have a much better sense of what to focus on, diagnostics companies will know what to look for, and patients will be treated with therapies that maximize effectiveness and minimize side effects – all based on actual data.