Using artificial intelligence to analyze gene expression patterns (which genes are ‘switched on’) in blood cells could help diagnose acute myeloid leukemia (AML) more quickly, potentially getting patients into treatment earlier, a new proof-of-concept study suggests.
The study, “Scalable prediction of acute myeloid leukemia using high-dimensional machine learning and blood transcriptomics,” was published in iScience.
Analysis of the transcriptome, which shows the genes that are ‘turned on’ in a cell and to what degree, can provide a host of information about the behavior of individual cells. Among its potential applications is the analysis of particular types of cancer cells (in this case, AML cells), because they may have distinct gene expression signatures that, if screened for, could aid in a diagnosis.
The amount of data that comes from transcriptome analyses, however, is massive and nearly impossible for experts to analyze on their own.
Computers, in contrast, can sort through massive amounts of data and look for patterns that, say, distinguish diseased from healthy cells. They do so by generating algorithms that, when given new data, spot similar patterns to classify samples as healthy or malignant. This approach, known as machine learning, is a form of artificial intelligence.
Researchers in Germany collected gene expression data from 105 previously published studies. This included data from 4,145 blood or bone marrow samples taken from people with AML, and 7,884 samples from healthy people or from people with a different disease (another leukemia or otherwise).
“Numerous studies have been carried out on this topic and … there is an enormous data pool. We have collected virtually everything that is currently available,” Joachim Schultze, a professor at the German Center for Neurodegenerative Diseases and study co-author, said in a press release.
The researchers fed data from a portion of the samples into computers, aiming to find patterns that allow the sorting of AML samples from non-AML samples, and to generate algorithms based on those patterns.
These algorithms were then tested on data from the remaining samples.
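The train-then-test procedure described here can be illustrated with a minimal sketch. It is not the study's method: the data are synthetic random numbers standing in for expression profiles, the classifier is a simple nearest-centroid rule chosen for brevity, and the sample counts, gene count, 70/30 split and all function names are assumptions made for the example.

```python
import random

def make_profiles(n, shift, n_genes, rng):
    # Synthetic stand-in for expression profiles: Gaussian noise around `shift`.
    return [[rng.gauss(shift, 1.0) for _ in range(n_genes)] for _ in range(n)]

def centroid(rows):
    # Mean value of each gene across a group of samples.
    return [sum(col) / len(rows) for col in zip(*rows)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(labeled):
    # "Learn" one average profile per class from the training samples.
    groups = {}
    for profile, label in labeled:
        groups.setdefault(label, []).append(profile)
    return {label: centroid(rows) for label, rows in groups.items()}

def predict(model, profile):
    # Assign the class whose average profile is closest to this sample.
    return min(model, key=lambda label: squared_distance(model[label], profile))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_GENES = 20            # hypothetical; real transcriptomes cover thousands of genes
data = ([(p, "AML") for p in make_profiles(100, 2.0, N_GENES, rng)]
        + [(p, "non-AML") for p in make_profiles(200, 0.0, N_GENES, rng)])
rng.shuffle(data)

# Hold out 30% of the samples; the model never sees their labels during training.
split = len(data) * 7 // 10
train_set, test_set = data[:split], data[split:]

model = train(train_set)
predictions = [predict(model, profile) for profile, _ in test_set]
accuracy = sum(pred == label
               for pred, (_, label) in zip(predictions, test_set)) / len(test_set)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because the held-out samples play no part in training, the accuracy on them gives an estimate of how the algorithm would behave on samples it has never encountered.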
In this latter test, “we knew the classification as it was listed in the original data, but the software did not,” Schultze said. “We then checked the hit rate. It was above 99 percent for some of the applied methods.” In other words, for those methods, more than 99% of AML samples were correctly identified as such.
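A "hit rate" can be measured in more than one way, and the differences matter for a diagnostic tool. The sketch below, using a small made-up set of labels rather than any data from the study, computes three common measures: overall accuracy, sensitivity (the share of true AML samples that were caught) and specificity (the share of non-AML samples correctly cleared). The function name `hit_rates` is hypothetical.

```python
def hit_rates(true_labels, predicted_labels, positive="AML"):
    # Count the four possible outcomes for a two-class test.
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)  # AML called AML
    fn = sum(t == positive and p != positive for t, p in pairs)  # AML missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # non-AML cleared
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarm
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Illustrative labels only: 4 AML and 6 non-AML samples, with one error in each group.
truth = ["AML"] * 4 + ["non-AML"] * 6
calls = ["AML", "AML", "AML", "non-AML"] + ["non-AML"] * 5 + ["AML"]

rates = hit_rates(truth, calls)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

In this toy case the accuracy is 0.80, the sensitivity 0.75 and the specificity about 0.83, which shows how a single headline figure can hide different error rates for patients and for healthy people.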
Hit rates varied with other factors, such as how many samples were fed into the computer during the initial algorithm generation. Still, the results provided a proof of concept for using this strategy as a diagnostic tool.
Further analysis suggested high hit rates for additional classifications, such as determining AML subtype and identifying other blood cancers.
Importantly, the approach holds promise as a tool with relatively few barriers to implementation.
“Our results show that with existing technologies it is potentially possible to achieve good performance in a near-automated fashion,” the researchers wrote.
“In principle, a blood sample taken by the family doctor and sent to a laboratory for analysis could suffice. I guess that the cost would be less than 50 euros,” Schultze said, noting that this proof-of-principle is not an actual diagnostic test.
“We have not yet developed a workable test,” he said. “We have only shown that the approach works in principle. So we have laid the groundwork for developing a test.”
On a broader scale, these findings show the power of this analytic approach, emphasizing the importance of having publicly available data that can facilitate this kind of analysis.
“Taken together, our results underline the immense value of making [gene expression] data publicly available, allowing for new and large-scale multi-study analyses,” the researchers wrote.
“We envision that combining whole genome and transcriptome analysis based on machine learning algorithms will ultimately allow early detection, diagnosis, differential diagnosis, subclassification and outcome prediction in an integrated fashion,” they added.