Publication date: Sep 03, 2019
Many startups are using modern data and AI technologies to tackle problems related to workflow optimization and automation, demand forecasting, treatment and care, diagnostics, drug discovery, personalized medicine, and many other areas.
We will focus on recent progress in foundational topics that the healthcare and medical community cares about: access to high-quality labeled data, developing ML models that cut across organizations, and data markets and networks.
Medical data comes in many forms: images, audio, unstructured text, and occasionally even structured data.
Regardless of the form, the problems that affect medical data aren’t fundamentally different from the problems in other industries: missing data, corrupt values, suspicious outliers, typographic errors, lack of labels, and more.
Researchers have used HoloClean, an open source system that uses machine learning to detect and repair errors in structured data, on data sets related to healthcare (a hospital data set used in data cleaning benchmarks) and public health (a food inspection data set).
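To make the data-quality problems above concrete, here is a minimal sketch of one kind of signal that constraint-based cleaners such as HoloClean can draw on: a functional dependency (here, that a zip code determines a city), plus a check for missing cells. The records, column names, and helper functions are hypothetical and this is not HoloClean's API. HoloClean combines many such signals in a probabilistic model, whereas this sketch only flags violations and proposes a naive majority-vote repair.

```python
from collections import defaultdict

# Hypothetical toy records loosely modeled on a hospital table; all values are made up.
records = [
    {"zip": "35233", "city": "birmingham", "state": "AL", "phone": "2053258100"},
    {"zip": "35233", "city": "birmingxam", "state": "AL", "phone": "2053258100"},  # typo
    {"zip": "35233", "city": "birmingham", "state": "AL", "phone": None},          # missing
    {"zip": "36301", "city": "dothan", "state": "AL", "phone": "3347938701"},
]

def missing_cells(rows):
    """Return (row index, column) pairs whose value is missing or empty."""
    return [(i, col) for i, row in enumerate(rows)
            for col, val in row.items() if val in (None, "")]

def fd_violations(rows, lhs, rhs):
    """Find groups that violate the functional dependency lhs -> rhs.

    A violation is an lhs value that maps to more than one distinct rhs value.
    Returns {lhs_value: {rhs_value: [row indices]}} for violating groups only.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for i, row in enumerate(rows):
        if row[lhs] is not None and row[rhs] is not None:
            groups[row[lhs]][row[rhs]].append(i)
    return {k: dict(v) for k, v in groups.items() if len(v) > 1}

def majority_repair(violations):
    """Suggest the most frequent rhs value in each violating group (a naive repair)."""
    return {lhs_value: max(by_rhs, key=lambda value: len(by_rhs[value]))
            for lhs_value, by_rhs in violations.items()}

print(missing_cells(records))
violations = fd_violations(records, "zip", "city")
print(violations)
print(majority_repair(violations))
```

Majority voting is the crudest possible repair: it would pick the wrong value whenever the typo is more common than the correct spelling, which is one reason systems like HoloClean weigh several independent signals instead of relying on a single rule.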
As more and more medical databases become available, labelling the data to train machine learning models becomes ever more critical.
This is true for repositories composed of medical imaging data, genomics, and other data types.
Data is often labelled when it is created (for example, medical images labelled with a patient ID and a diagnosis); services like Mechanical Turk are often used to classify unlabeled data.
GE has claimed that most medical data never gets analyzed, but this data may still be useful for training if it can be properly tagged.
In addition to HoloClean, Christopher Ré and his collaborators have released an open source data programming tool called Snorkel.
Snorkel automates the work of creating training data sets by labelling data programmatically, and then using machine learning to classify and even transform images. In a nutshell, data programming techniques provide ways to “manufacture” data that we can feed to various learning and prediction tasks (even for ML data quality solutions).
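The core idea of data programming can be sketched in a few lines: instead of labelling examples by hand, you write small, noisy labeling functions that either vote for a label or abstain, apply them to every example to build a label matrix, and then combine the votes. In the sketch below the labeling functions and report snippets are hypothetical, and the combination step is a plain majority vote standing in for Snorkel's label model, which instead learns how accurate each labeling function is without ground truth. This is not Snorkel's API.

```python
from collections import Counter

ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1

# Hypothetical labeling functions over short radiology-style report snippets.
def lf_mentions_fracture(text):
    return ABNORMAL if "fracture" in text.lower() else ABSTAIN

def lf_no_acute_findings(text):
    return NORMAL if "no acute" in text.lower() else ABSTAIN

def lf_mentions_mass(text):
    return ABNORMAL if "mass" in text.lower() else ABSTAIN

LFS = [lf_mentions_fracture, lf_no_acute_findings, lf_mentions_mass]

def apply_lfs(texts, lfs):
    """Build a label matrix: one row per example, one column per labeling function."""
    return [[lf(t) for lf in lfs] for t in texts]

def majority_vote(row):
    """Combine one row of votes; abstain if nobody voted or the top labels tie."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with no majority: leave unlabeled
    return ranked[0][0]

reports = [
    "Acute fracture of the distal radius.",
    "No acute cardiopulmonary process.",
    "No acute fracture.",
    "Unremarkable study.",
]
label_matrix = apply_lfs(reports, LFS)
training_labels = [majority_vote(row) for row in label_matrix]
print(label_matrix)
print(training_labels)
```

The third report shows why a smarter combiner matters: the fracture rule fires on “No acute fracture” and conflicts with the “no acute” rule, so a simple vote can only abstain, while a learned model of labeling-function accuracies could resolve such conflicts.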
The rise in tools for data programming occurs at a time when other researchers are exploring “small sample learning” (machine learning tools that rely on smaller amounts of labeled data) for biomedical image analysis.
Besides labelling data for use in machine learning, data programming can be used to extract knowledge and information buried within existing data sources.
Snorkel has been used to create training data for a machine reading system that automatically collects and synthesizes genetic associations and makes them available in a structured database.
At O’Reilly’s Artificial Intelligence conference in Beijing, Ion Stoica, director of UC Berkeley’s RISELab, described new projects in what he calls coopetitive learning, which lets organizations cooperate on machine learning models without actually sharing their data.
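One common building block for computing across organizations without pooling raw data is secure multiparty computation. The toy sketch below shows additive secret sharing used to compute a secure sum: each hospital splits a private number (say, a local patient count or a model statistic) into random shares, so no single party learns another's value, yet the shares add up to the correct total. The function names and the modulus are assumptions chosen for illustration; this is not a description of RISELab's specific systems, and it simulates all parties in one process rather than over a network.

```python
import secrets

MODULUS = 2**61 - 1  # assumed prime modulus; all share arithmetic is done mod this value

def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_values):
    """Simulate n parties computing the sum of their private values.

    Assumes non-negative integers whose true total is below MODULUS.
    """
    n = len(private_values)
    # Party i splits its value and (conceptually) sends share j to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j only ever sees column j: one uniformly random-looking share per peer.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals the total, not the individual inputs.
    return sum(partial_sums) % MODULUS

# Hypothetical per-hospital counts that stay private to each hospital.
print(secure_sum([120, 45, 310]))
```

Training a shared model is more involved than a single sum, but approaches such as securely aggregating model updates or sufficient statistics rely on the same principle: each party reveals only values that look random in isolation.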