Using Artificial Intelligence to Help Radiologists Save Lives
Aidoc is a Tel Aviv-based deep learning company focused on enhancing and prioritizing radiologists’ workflow by automatically detecting and highlighting pathologies in diagnostic images. Founded in February 2016, Aidoc has already received FDA approval for its solution, which is in use in 50 prominent medical institutions across the U.S. and Europe, including Yale Medical Center.
Radiology is central to modern medicine, and the use of imaging as a critical diagnostic modality keeps growing as CT, MRI, and other medical imaging technologies evolve. Over 75% of all patient care today involves radiology, and radiologists are under pressure to produce quality results at a faster pace with increasing amounts of data. Despite advances in platforms to optimize the radiologist workflow, the manual analysis of the images is still a major bottleneck.
How Aidoc Tackles the Problem
Aidoc solves this problem by using computer vision and deep neural networks to automatically analyze CT, MRI, and other scanned images to detect a variety of critical medical conditions, such as brain hemorrhages and spinal fractures, among many others. Integrating seamlessly into the radiologist’s existing work environment, Aidoc helps prioritize the workflow by alerting the radiologist to scans that seem to present the highest risk. It also optimizes the actual diagnostic phase, saving precious time—and lives.
Aidoc’s AI team consists of five AI algorithm engineers, who develop the algorithms, and two AI infrastructure engineers, who build the infrastructure.
The Challenges in Managing Dozens of Deep Learning Experiments a Day
Aidoc’s core technology is based on deep learning for computer vision, which is, in general, a highly empirical field. Every deep learning solution relies on big data and high levels of both domain and AI expertise to design and run many experiments.
Aidoc’s top-tier AI team leverages their deep understanding of radiology image analysis, as well as their extensive expertise in deep learning processes, to run dozens of experiments concurrently each day, in order to test hypotheses and train optimal diagnostic models. The fact that each targeted pathology requires its own model, with target-specific hyperparameters, data, artifacts, and resources, makes the work even more complex.
Because of this, the team found themselves challenged by the processes and cloud infrastructure required to manage, run, track, and compare hundreds of experiments and a vast quantity of results.
Tedious Manual Operations
There are many parameters that must be tracked in order to manage and evaluate deep learning experiments: performance (run duration, computing power/cost ratio, etc.), model accuracy (true positive rate [TPR], true negative rate [TNR], and other success rates), and metadata on the experiments’ hyperparameters, data, source code, artifacts, and compute resources, to name a few.
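To make the list of tracked parameters concrete, here is a minimal Python sketch of a per-experiment record. The `ExperimentRecord` class and its field names are hypothetical illustrations, not Aidoc’s or MissingLink’s schema. It derives TPR (sensitivity) and TNR (specificity) from confusion-matrix counts on a validation set.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """Hypothetical record of one experiment's tracked parameters."""
    name: str
    hyperparameters: dict
    source_commit: str   # e.g. a version-control commit hash (assumed)
    compute: str         # e.g. "cloud-gpu-1" or "on-prem-3" (illustrative)
    run_seconds: float
    cost_usd: float
    # Confusion-matrix counts on a validation set
    tp: int = 0
    fn: int = 0
    tn: int = 0
    fp: int = 0
    artifacts: list = field(default_factory=list)

    @property
    def tpr(self) -> float:
        """True positive rate (sensitivity): TP / (TP + FN)."""
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def tnr(self) -> float:
        """True negative rate (specificity): TN / (TN + FP)."""
        negatives = self.tn + self.fp
        return self.tn / negatives if negatives else 0.0
```

For example, a run with 90 true positives, 10 false negatives, 950 true negatives, and 50 false positives has TPR = 0.90 and TNR = 0.95.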
Aidoc’s AI engineers found it difficult to aggregate all of the experiment parameters in order to effectively evaluate and compare results. They could not just focus on algorithms and experiment designs—they also needed to keep track of which server each experiment ran on (whether on-premises or in the cloud) and then manually log in to that machine to download the results for analysis.
We knew we were on to something big here. We're on to something that's really valuable, that helps doctors save lives. So we want to deploy it as fast as possible, to as many hospitals as possible.
Aidoc’s team realized that there was something ineffective and inefficient about a process that did not let their team members focus on the reason they were brought into the company (i.e., their core competency in artificial intelligence). As Idan Bassuk describes it:
...every hour of our team members that is wasted on creating machines and other things that can be automated is an hour that we're delayed in deploying our solutions to the market and saving more lives.
At first the team searched for and used existing open-source libraries. This approach was better than building a solution themselves, but it still required a lot of manual labor. The team was sure that they could find a much more efficient solution elsewhere.
Inefficient Cloud Resource Management
One of the main causes of overspending on a cloud infrastructure is failing to tightly control and rightsize the required compute and storage resources. This is especially true for deep learning, which uses particularly expensive cloud resources, such as GPU instances, for experiments that can take days to run. Cloud costs will spiral needlessly if it takes time to notice that an experiment is lying idle due to some kind of interruption or failure, or is simply not progressing in a way that will produce meaningful results.
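One common way to catch idle runs quickly is a heartbeat check. Each experiment periodically reports its progress, and any run whose last report is older than a timeout is flagged as a shutdown candidate. The sketch below is a hypothetical illustration. The `Heartbeat` schema, the `find_stalled` function, and the timeout are assumptions, not a documented MissingLink API.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Heartbeat:
    """Last progress report from a running experiment (hypothetical schema)."""
    experiment: str
    step: int
    timestamp: float  # seconds since the epoch


def find_stalled(heartbeats: List[Heartbeat], timeout_s: float,
                 now: Optional[float] = None) -> List[str]:
    """Return names of experiments with no progress report within timeout_s.

    A run whose last report is older than the timeout is treated as idle
    (crashed, hung, or interrupted), so its GPU instance can be released.
    """
    now = time.time() if now is None else now
    return [hb.experiment for hb in heartbeats if now - hb.timestamp > timeout_s]
```

Passing `now` explicitly keeps the check deterministic and easy to test. In production it would default to the current time.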
In addition to the direct costs of inefficiently managing experiment workloads, Aidoc’s AI engineers were also spending a lot of precious time trying to manually rightsize resources and shut down idle instances, and they had no visibility into experiment performance. Aidoc looked for a solution that would automate and optimize the consumption of the cloud resources they needed to run their experiments and research, including leveraging reduced-cost resources.
Automating the Entire Deep Learning Lifecycle
The Aidoc team saw immediate productivity improvements from the moment they started using MissingLink’s resource management capabilities. At last they had an automated platform that could manage the complete deep learning lifecycle, the underlying cloud infrastructure, and experiment results aggregated from multiple cloud servers. In addition, with MissingLink, the Aidoc AI team could now view and manage all of their experiments from a single web page, set up a queue of prioritized experiments, and let MissingLink automatically handle deployment.
Queue, Track, and Compare Experiments
Using MissingLink, the Aidoc team can sync and organize hundreds of experiments and models, with full visibility into experiments as they run. If an experiment fails, they immediately get an email notification from MissingLink that includes a description of the exception that caused the failure. MissingLink’s proactive response and the information it provides make it much easier for the team to deal quickly and effectively with the error and rerun an experiment. In the meantime, MissingLink automatically starts the next experiment in the queue, ensuring that compute resources are never idle and saving the team time and money. MissingLink also makes it easy to compare or reproduce completed experiments in a few clicks.
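The fail, notify, and continue behavior can be sketched in a few lines of Python. This is not MissingLink’s implementation. It assumes each queued experiment is a `(name, callable)` pair and that `notify` is a hypothetical hook standing in for the email alert.

```python
import traceback
from collections import deque
from typing import Callable, List


def run_queue(queue: deque, notify: Callable[[str, str], None]) -> List[str]:
    """Run queued (name, train_fn) experiments in order; on failure, alert and move on.

    `notify(name, description)` is a hypothetical stand-in for an email alert.
    Returns the names of failed experiments so they can be fixed and rerun.
    """
    failed = []
    while queue:
        name, train = queue.popleft()
        try:
            train()
        except Exception as exc:
            # Describe the exception (type, message, traceback) for the alert.
            description = f"{type(exc).__name__}: {exc}\n{traceback.format_exc()}"
            notify(name, description)
            failed.append(name)
        # Success or failure, the loop immediately picks up the next experiment.
    return failed
```

Returning the failed names, rather than stopping at the first error, keeps the compute busy while the team investigates.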
Another valuable experiment-tracking feature is the visual representation of trending results as the experiment evolves. Rather than waiting for days until the experiment ends, the data scientist can easily see in real time if an experiment is not proceeding as planned and is unlikely to yield interesting results. They can stop the experiment with one button click and immediately start planning the next experiment.
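Judging from a trend curve that a run “is not proceeding as planned” is often formalized as a patience-based early-stopping rule. The Python sketch below shows that generic rule, not MissingLink’s logic. The `patience` and `min_delta` values are illustrative assumptions.

```python
from typing import Sequence


def has_plateaued(val_losses: Sequence[float], patience: int,
                  min_delta: float = 0.0) -> bool:
    """Return True if the best validation loss has not improved by more than
    min_delta during the last `patience` evaluations (standard early stopping)."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    # No real improvement: the recent best is not below the earlier best by > min_delta.
    return best_recent >= best_before - min_delta
```

A human watching the dashboard applies a looser version of the same judgment. Automating it simply flags the candidate runs sooner.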
When we run hundreds of experiments at the same time and we want to compare them, today, with MissingLink, we have a single web page in which we can look at all these experiments, choose the specific experiments that we want to compare (based on semantic names we gave the experiments), and just click on a single "compare" button, and, in an instant, we get a comparison of all the experiment results.
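The select-by-name-and-compare workflow in this quote amounts to building a side-by-side table from each run’s final metrics. Here is a minimal Python sketch that assumes a hypothetical `results` mapping from semantic experiment names to metric dictionaries. It is not MissingLink’s comparison feature.

```python
from typing import Dict, Iterable, List


def compare(results: Dict[str, Dict[str, float]], names: Iterable[str],
            metrics: List[str]) -> str:
    """Build a plain-text comparison table for the selected experiments.

    Missing metrics are shown as "-". Column widths adapt to the content.
    """
    header = ["experiment"] + metrics
    rows = [
        [name] + [f"{results[name][m]:.3f}" if m in results[name] else "-" for m in metrics]
        for name in names
    ]
    table = [header] + rows
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )
```

Descriptive names such as `hemorrhage-lr1e-4` are what make this kind of selection practical when hundreds of runs exist.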
Optimized On-Premises and Public Cloud Environment
MissingLink lets Aidoc’s AI engineers define the experiment to run in just a few clicks, and the platform then transparently takes care of deploying it in the cloud, running it, and shutting down the cloud server instances when the experiment is done. In addition, it manages the experiments that Aidoc’s AI engineers run on their local machines, helping them seamlessly manage a hybrid IT environment.
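The launch, run, and shut-down lifecycle maps naturally onto a `try`/`finally` block, which guarantees the instance is released even when the experiment crashes. In this Python sketch, `Cloud` is a hypothetical minimal interface, not any real provider’s SDK, and `FakeCloud` is an in-memory stand-in for trying it out.

```python
from typing import Callable, Protocol, Set


class Cloud(Protocol):
    """Hypothetical minimal cloud interface (not a real provider SDK)."""
    def launch(self, instance_type: str) -> str: ...
    def terminate(self, instance_id: str) -> None: ...


def run_on_cloud(cloud: Cloud, instance_type: str,
                 experiment: Callable[[str], float]) -> float:
    """Launch an instance, run the experiment on it, and always shut it down."""
    instance_id = cloud.launch(instance_type)
    try:
        return experiment(instance_id)
    finally:
        # Runs on success and on failure, so no paid instance is left idling.
        cloud.terminate(instance_id)


class FakeCloud:
    """In-memory stand-in used to exercise run_on_cloud without real resources."""
    def __init__(self) -> None:
        self.running: Set[str] = set()
        self.launched = 0

    def launch(self, instance_type: str) -> str:
        self.launched += 1
        instance_id = f"{instance_type}-{self.launched}"
        self.running.add(instance_id)
        return instance_id

    def terminate(self, instance_id: str) -> None:
        self.running.discard(instance_id)
```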
MissingLink automatically manages a queue of experiments according to predefined priorities, fully optimizing the usage of AWS’ GPU-based Spot Instances. As experiments stop or finish, MissingLink grabs the next experiment from the queue and runs it.
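Priority-based dispatch can be illustrated with Python’s `heapq` min-heap, where a lower number means a higher priority. This sketch assumes a hypothetical `dispatch` step that runs whenever machines free up. It does not model Spot Instance interruptions and does not describe MissingLink’s scheduler.

```python
import heapq
from typing import List, Tuple


def dispatch(queue: List[Tuple[int, str]],
             free_machines: List[str]) -> List[Tuple[str, str]]:
    """Assign the highest-priority queued experiments to currently free machines.

    `queue` is a heap of (priority, experiment_name); lower numbers run first.
    Both arguments are mutated: assigned experiments leave the heap and busy
    machines leave the free list. Returns (machine, experiment) assignments.
    """
    assignments = []
    while queue and free_machines:
        _, experiment = heapq.heappop(queue)
        machine = free_machines.pop(0)
        assignments.append((machine, experiment))
    return assignments
```

Anything left in the heap simply waits for the next call, which matches the “once there’s an available machine, it will run” behavior described in the quote that follows.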
“This feature makes our work much more efficient. I just choose which experiments I want to run, which machines I want to run them on, and I know that once there's an available machine, it will run...The whole process is as efficient as possible and it saves us money because the experiments are shut down once they're over. No time is wasted on active servers that are not actually running experiments, standing idle.”
In addition to shutting down expensive cloud resources that are no longer needed, the platform continuously and automatically aligns Aidoc’s cloud infrastructure compute and storage capacity with the needs of the currently running experiments. With MissingLink, the Aidoc cloud environment grows and shrinks elastically as needed.
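Elastic capacity of this kind is often expressed as a target instance count derived from the workload. Here is a simple Python sketch, assuming one GPU instance per running or queued experiment and illustrative minimum and maximum bounds. These are not Aidoc’s actual settings.

```python
def desired_capacity(pending: int, running: int,
                     min_instances: int = 0, max_instances: int = 16) -> int:
    """Target instance count: one per running or queued experiment,
    clamped to [min_instances, max_instances] (illustrative bounds)."""
    return max(min_instances, min(max_instances, pending + running))


def scaling_action(current: int, desired: int) -> str:
    """Describe what a simple autoscaler would do to move toward `desired`."""
    if desired > current:
        return f"launch {desired - current}"
    if desired < current:
        return f"terminate {current - desired}"
    return "hold"
```

A real autoscaler would terminate only instances that are not running an experiment. The count alone does not say which ones to pick.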
Deep Learning + Ops = MissingLink DeepOps
A highly transparent culture and shared responsibility are critical for breaking down silos and fostering trust across deep learning teams. Enter MissingLink, whose DeepOps platform lets Aidoc’s AI engineers (deep learning) and operations personnel (ops) collaborate to build a faster, more reliable deep learning pipeline—fueling the rapid growth that is core to Aidoc’s vision.
“MissingLink is a core element of our DevOps environment, or should I say DeepOps solution. When we started a few years ago, our vision was to create a platform that would help us automate everything on the infrastructure level and let us focus on what we do best: research and build algorithms that run in production. MissingLink is a core part in making this vision a reality.”
By automating and streamlining the complete deep learning lifecycle, Aidoc’s AI team is freed up from tedious, repetitive, manual processes, and can accelerate the development and delivery of life-saving features. Instead of copying data, reproducing experiments, and managing resources, Aidoc’s top-notch AI engineers can bring the full power of their core competencies to bear on real issues, offering creative solutions to radiologists and their patients.