What makes a good tool? One definition: a good tool excels at its primary purpose, while a mediocre one breaks easily or performs poorly. But multimodal AI is changing how we think about improving the performance of AI tools.
MIT researchers have developed a multimodal healthcare analytics framework called Holistic AI in Medicine (HAIM), recently described in npj Digital Medicine, which draws on multiple data sources to make it easier to build predictive models in healthcare settings: from identifying a range of chest pathologies, such as lung lesions and edema, to predicting a patient’s 48-hour mortality risk and length of stay. To do so, the team created and tested over 14,000 AI models.
In AI, most tools are single-modality tools, meaning they synthesize one category of information to generate results—for example, feeding a machine learning model thousands of CT scans of lung cancer so that it learns how to properly identify lung cancer from medical images.
Additionally, most existing multimodal tools depend heavily on medical imaging and tend to underweight other relevant factors, even though doctors can draw on many signals to determine whether someone has lung cancer or is at risk of developing it: persistent cough, chest pain, loss of appetite, family history, genetics, and so on. If an AI tool were supplied with a more complete picture of a patient’s symptoms and health history, could it identify lung cancer or other diseases even earlier and with greater accuracy?
“This idea of using single-modality data to drive important clinical decisions didn’t make sense to us,” says Luis R. Soenksen, a postdoc at the Abdul Latif Jameel Clinic for Machine Learning in Health and lead co-author of the study. “Most doctors in the world work in a fundamentally multimodal way and would never make recommendations based on narrow unimodal interpretations of their patients’ conditions.”
More than two years ago, the field of AI in healthcare was exploding. Funding for AI digital health startups had doubled from the previous year to $4.8 billion, and would double again in 2021 to $10 billion.
At that time, Soenksen; Jameel Clinic executive director Ignacio Fuentes; and Dimitris Bertsimas, the Boeing Leaders for Global Operations Professor of Management at the MIT Sloan School of Management and Jameel Clinic faculty lead, decided to take a step back and consider what was missing from the field.
“Things were going up and to the right, but it was also a time when people were becoming disillusioned because the promise of AI in healthcare wasn’t being fulfilled,” Soenksen recalls. “Basically, we realized we wanted to bring something new to the table, but we had to do it systematically and provide the nuance necessary for people to appreciate the advantages and disadvantages of multimodality in healthcare.”
The idea they came up with was seemingly common sense: build a framework that could easily generate machine learning models capable of processing different combinations of inputs, much as a doctor might consider a patient’s symptoms and health history before making a diagnosis. But multimodal health frameworks were conspicuously absent, with only a handful of published articles that were more conceptual than concrete. Moreover, without a unified, scalable framework that could be consistently applied to train any multimodal model, single-modality models often outperformed their multimodal counterparts.
Seeing this gap, they decided it was time to assemble a team of experienced AI researchers at MIT and begin building HAIM.
“I was fascinated by the whole idea of [HAIM] because of its potential to significantly impact the infrastructure of our current healthcare system and to bridge the gap between academia and industry,” says Yu Ma, a PhD student advised by Bertsimas and a co-author of the paper. “When [Bertsimas] asked me if I would like to contribute, I immediately jumped in.”
Although large amounts of data are usually an advantage in machine learning, the team realized this is not always the case with multimodal systems; a more nuanced approach to evaluating inputs and modalities was needed.
“A lot of people are doing multimodal learning, but it’s rare to have a study of every possible combination of models, data sources, and hyperparameters,” says Ma. “We were really trying to understand exactly how multimodality performs in different scenarios.”
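The exhaustive-sweep idea Ma describes can be sketched in a few lines: enumerate every non-empty subset of data modalities, then train and score one model per subset. The modality names and the `evaluate` stub below are illustrative assumptions, not the authors’ actual pipeline.

```python
from itertools import combinations

# Illustrative modality names; the real framework combines sources such as
# tabular records, clinical notes, medical images, and time-series signals.
MODALITIES = ["tabular", "notes", "images", "time_series"]

def evaluate(subset):
    """Placeholder for training and scoring one model on one modality subset.

    In a real pipeline this would fuse per-modality features and return a
    validation metric (e.g., AUROC). Here it just returns the subset size
    so the sweep is runnable end to end.
    """
    return len(subset)

# Sweep every non-empty combination of modalities, recording one score each.
results = {
    subset: evaluate(subset)
    for r in range(1, len(MODALITIES) + 1)
    for subset in combinations(MODALITIES, r)
}

print(len(results))  # 2**4 - 1 = 15 modality combinations
```

Scaling this pattern across many prediction tasks and model configurations is how a study arrives at thousands of trained models rather than a single headline system.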
According to Fuentes, the framework “opens up an interesting avenue for future work, but we need to understand that multimodal AI tools in clinical settings face multiple data challenges.”
Bertsimas’s plans for HAIM 2.0 are already in the works. He is considering incorporating more modalities (e.g., signal data from electrocardiograms and genomic data) and ways to assist healthcare professionals in decision-making rather than simply predicting the likelihood of certain outcomes.
HAIM is also an acronym that Bertsimas came up with; it happens to echo the Hebrew word for “life” (chaim).
This work was supported by the Abdul Latif Jameel Clinic for Machine Learning in Health and the National Science Foundation Graduate Research Fellowship.