Jonathan D. Grinstein, PhD, the North American Editor of Inside Precision Medicine, hosts a new series called Behind the Breakthroughs that features the people shaping the future of medicine. With each episode, Jonathan gives listeners access to their motivational tales and visions for this emerging, game-changing field.
When discussing precision medicine, I frequently encounter a paraphrased version of a quote, something along the lines of: data is the oil of medicine.
What I didn’t know, though, is that there is a second part to that quote. The full quote, attributed to technology expert Peter Sondergaard, is: “Information is the oil of the 21st century, and analytics is the combustion engine.”
Over the last few decades, significant progress has been made in building the “oil rigs” of biological and medical data, and the drilling is only getting started. According to a report from RBC Capital Markets, the healthcare industry currently generates approximately 30% of global data volume, with a compound annual growth rate of 36% expected through 2025. That is 6 percentage points faster than manufacturing, 10 faster than finance, and 11 faster than media and entertainment. L.E.K. Consulting reports similar figures for the volume and growth rate of healthcare data. In other words, obtaining the oil is not the rate-limiting factor; the issue is how to use it.
In this episode of Behind the Breakthroughs, Harry Clifford, PhD, digital biology lead at NVIDIA, discusses how the GPU and AI computing leader is building a technology platform to be that combustion engine. Clifford explains how NVIDIA is addressing data bottlenecks in drug development and healthcare to advance precision medicine.
This interview has been edited for length and clarity.
IPM: At NVIDIA, do you start with a biological question and then figure out how to apply computational tools to it, or do you work backward, looking for places to apply NVIDIA’s computational power and expertise?
Clifford: A lot of it is about tracking the data and seeing where the most data is being produced, tracking bottlenecks and identifying areas where we can accelerate, and asking what we can do with AI that has never been done before, where we can push the boundaries. But, overall, I believe the NVIDIA healthcare team is well aware that the healthcare industry is evolving into a technology industry, with modern technology increasingly powered by AI and supercomputers. So, while NVIDIA is not a healthcare company, it does serve the healthcare ecosystem. We want to provide the industry with a platform that allows incredible healthcare AI applications to be created and delivered.
There may be no industry where AI holds more potential than healthcare. The healthcare industry generates approximately 30% of global data volume, and that data is vastly underutilized. So, as I said, we tend to follow the data when deciding which problems to solve.
Another reason is that it is clearly a large market. Healthcare applies to pretty much everyone. Everyone is a patient at some point. Everyone cares about their health. Within that, there are tens of thousands of diseases and millions of patients with unmet needs, so it’s a pretty vast space where AI can integrate.
Then there’s the development side—what can we do to improve therapeutic research and development to meet the needs of those individuals better? Whether it is a drug for a specific disease mechanism, a specific cancer pathway, or whatever an individual is dealing with, we must ensure that therapies address the disease or disorder’s characteristics or biomarkers. When we think about it from an AI standpoint, having a better understanding of the genome or proteome and all of the underlying biological systems will help drive the next wave of precision medicine.
IPM: What are the biggest obstacles to achieving this vision of a highly digitalized and integrated healthcare future?
Clifford: There are specific problems that we can address directly. In genomics, for example, we likely want to address two major issues: the scale of the data and the need to process it much faster, and the complexity of the data being produced.
In terms of more traditional sequencing, the instruments now being released have significantly higher throughput. They’re producing huge amounts of data. The most recent estimate is that genomics alone will generate approximately 40 exabytes of data over the next decade. Without some form of accelerated computing, there seems to be no way to handle that.
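To put that figure in perspective, here is a quick back-of-the-envelope calculation using only the 40-exabyte, ten-year estimate cited above; averaged out, it works out to roughly 11 petabytes of new genomic data per day.

```python
# Back-of-the-envelope scale check for the ~40-exabyte estimate cited above.
total_eb = 40                          # projected genomics data over the next decade (figure from the article)
years = 10
eb_per_year = total_eb / years         # 4 EB per year on average
pb_per_day = eb_per_year * 1000 / 365  # 1 EB = 1,000 PB, so roughly 11 PB per day
print(f"~{eb_per_year:.0f} EB/year, ~{pb_per_day:.0f} PB/day on average")
```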
The second aspect is that the complexity of the data will be a major challenge as we move from sequencing into the newest tools for single-cell and spatial omics. Think back to how we used to deal with sequencing data: we would do variant calling, look at specific variants in the Integrative Genomics Viewer (IGV), and manually check that the statistical algorithms had called everything correctly. Those days are long gone because the data is now so large and rich that you must rely on extremely sophisticated algorithms to interpret it. Spatial omics is a great example: you are processing imaging data alongside whatever you are measuring, whether that is specific probes, gene expression levels, or anything else, and you are trying to handle both types of complex data at the same time. That is where I believe AI will play a critical role in ensuring that spatial techniques, for example, deliver on the promise of precision medicine.
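To make the contrast Clifford draws concrete, the sketch below shows the kind of simple, rule-based triage that once preceded manual inspection in IGV: flagging variant calls whose quality or read depth falls below a cutoff. It assumes a VCF with standard QUAL and DP fields read with pysam; the file name and thresholds are hypothetical, chosen only for illustration, and bear no relation to any NVIDIA pipeline.

```python
# Minimal sketch: flag variant calls that would once have been reviewed by eye in IGV.
# The input file and the cutoffs below are illustrative assumptions.
import pysam

MIN_QUAL = 30.0   # assumed Phred-scaled quality cutoff
MIN_DEPTH = 10    # assumed minimum read depth

with pysam.VariantFile("sample.vcf.gz") as vcf:   # hypothetical input VCF
    for rec in vcf:
        depth = rec.info.get("DP", 0)             # total read depth, if the caller reported it
        if rec.qual is None or rec.qual < MIN_QUAL or depth < MIN_DEPTH:
            # In the old workflow, calls like this would be inspected manually in IGV.
            alts = ",".join(rec.alts or ())
            print(f"review: {rec.chrom}:{rec.pos} {rec.ref}>{alts} qual={rec.qual} depth={depth}")
```

Hand-written checks like this do not scale to today’s data volumes and multimodal assays, which is the gap Clifford says sophisticated algorithms now have to fill.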
IPM: How do you define “complexity of data”?
Clifford: When the data being produced is multidimensional, there is a lot more information in it. If we consider spatial omics, many approaches will use either imaging directly or combine imaging with traditional sequencing techniques. Processing that imaging data entails, for example, new segmentation techniques in which you draw a boundary around the cells in the image and ensure that whatever probes you are looking at fit within that boundary and are assigned to the appropriate cells. So, to begin with, [spatial omics] is a complex type of data generated as these devices become more complex and mature. Second, how do you combine that imaging information with your usual sequencing data? Can you do anything where you look at the morphology of the cells as well as the expression of the genome? There is a lot to do in dealing with the complex data right off the instrument and dealing with the data downstream because you now have more of it and multiple data modalities.
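As a toy illustration of the segmentation-and-assignment step Clifford describes, the sketch below takes a segmentation label mask (the kind a cell-segmentation model produces, with 0 for background and a positive integer per cell) and a list of decoded probe detections, then tallies per-cell gene counts. The array shapes, coordinates, and gene names are invented for the example and do not reflect any particular instrument’s output format.

```python
# Toy probe-to-cell assignment: map each detected probe to the segmented cell that contains it.
from collections import defaultdict
import numpy as np

# Label mask from a (hypothetical) segmentation model: 0 = background, k = cell k.
label_mask = np.zeros((100, 100), dtype=int)
label_mask[20:40, 20:40] = 1          # toy square standing in for cell 1
label_mask[60:80, 55:75] = 2          # toy square standing in for cell 2

# Decoded probe detections as (row, col, gene); values are invented for the example.
probes = [(25, 30, "GENE_A"), (35, 22, "GENE_B"), (65, 60, "GENE_A"), (5, 5, "GENE_C")]

counts = defaultdict(lambda: defaultdict(int))   # cell_id -> gene -> count
for row, col, gene in probes:
    cell_id = label_mask[row, col]
    if cell_id != 0:                             # drop probes that fall outside every cell boundary
        counts[cell_id][gene] += 1

for cell_id, genes in sorted(counts.items()):
    print(f"cell {cell_id}: {dict(genes)}")      # e.g., cell 1: {'GENE_A': 1, 'GENE_B': 1}
```

In real spatial workflows, the same per-cell expression table would then be combined with morphology features extracted from the images, which is the multimodal combination Clifford points to.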
IPM: What are the most readily available areas in medicine for NVIDIA’s approaches to be applied to, and what are the more difficult problems or questions you see on the horizon or are interested in potentially solving?
Clifford: In terms of where we can solve problems more quickly, we look at how to relieve the bottlenecks that already exist and that we know about. Those are very well-defined problems; all we need to do is reduce the runtime and increase access to them, which is where I believe we can make the most immediate difference.
In terms of the more difficult problems, that requires us to look quite far into the future and, as the saying goes, skate to where the puck is going: figuring out what is going to happen in the healthcare industry and where we can help as, for example, AI is adopted. A lot can be done to improve productivity and reduce the burden of secondary tasks in healthcare, such as doctors transcribing notes. Future software, including word processing, image editing, and spreadsheets, will rely heavily on machine learning. We can expect AI co-pilots and chatbots to feed through, and how we interact with software will change rapidly; I don’t think healthcare will be any different there. AI will not only be a powerful tool for data analysis and for automating current research and digital biology methods, but it will also improve the work of all healthcare professionals and biologists. As with any technological advancement, it is initially used to make current tasks easier, but in the long run it enables entirely new and unexpected approaches, which I am very interested in discovering.
I would even say that we are in a post-Moore’s law era, where it is less about what we can fit on a single chip and more about how we can scale and accelerate computing, along with all of the benefits that full-stack optimization can bring. In other words, what can we combine with the chip to solve new problems, rather than just adding more to it? The old way of thinking was all about horsepower and faster and faster chips, which was very much about Moore’s law. What is happening at NVIDIA combines this with much more optimized scaling, connecting many of those systems, and then optimized software, which will drive an entirely new approach.