Researchers found that a black-box algorithm predicted patient death better than humans.
They used ECG results to sort historical patient data into groups based on who would die within a year.
Although the algorithm performed better, scientists don’t understand how or why it did.
Albert Einstein’s famous expression “spooky action at a distance” refers to quantum entanglement, a phenomenon seen on the most micro of scales. But machine learning seems to grow more mysterious and powerful every day, and scientists don’t always understand how it works. The spookiest action yet is a new study of heart patients, reported by New Scientist, in which a machine-learning algorithm decided who was most likely to die within a year based on electrocardiogram (ECG) results.
The algorithm performed better than the traditional measures used by cardiologists. The study was done by researchers in Pennsylvania’s Geisinger regional healthcare group, a low-cost and not-for-profit provider.
Much of machine learning involves feeding complex data into computers that can examine it far more closely than humans can. To analogize to calculus, if human reasoning is a Riemann sum, machine learning may be the integral that results as the number of rectangles approaches infinity. Human doctors do the best they can with what they have, but whatever the ECG algorithm is finding in the data, those studying the algorithm can’t reverse engineer what it is.
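The calculus analogy can be made concrete with a toy computation (purely illustrative, not from the study): a Riemann sum chops a curve into coarse rectangles, and the approximation sharpens as the rectangles multiply.

```python
def riemann_sum(f, a, b, n):
    """Left Riemann sum of f over [a, b] using n rectangles."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))

# The integral of x^2 from 0 to 1 is exactly 1/3; the sum
# closes in on that value as n grows.
for n in (10, 100, 10_000):
    approx = riemann_sum(lambda x: x * x, 0, 1, n)
    print(n, approx, abs(approx - 1 / 3))
```

With 10 rectangles the estimate is noticeably off; with 10,000 it agrees with the true integral to four decimal places, which is the spirit of the analogy: finer-grained examination of the same data.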
The most surprising finding may be the number of people cardiologists believed were healthy based on normal ECG results: “The AI accurately predicted risk of death even in people deemed by cardiologists to have a normal ECG,” New Scientist reports.
To imitate the decision-making of individual cardiologists, the Geisinger team built a parallel algorithm from the factors cardiologists use to calculate risk in the accepted way. It isn’t practical to record real doctors’ individual impressions of all 400,000 patient records rather than the output of that stand-in algorithm, but that level of granularity could show that cardiologists are better at predicting poor outcomes than the algorithm suggests.
It could also show that they perform worse than the algorithm; we just don’t know. Either way, a better algorithm could augment doctors’ own skills and lead to even better outcomes for at-risk patients.
Machine learning experts use a metric called area under the curve (AUC) to measure how well an algorithm separates people into different groups. In this case, researchers asked the algorithm to decide which people would survive and which would die within the year, and its success was measured by how reliably it placed people in the correct groups. This is why future action is so complicated: people can be misplaced in both directions, producing false positives and false negatives that could affect treatment. The algorithm did show an improvement, scoring 85 percent versus the 65 to 80 percent achieved by the traditional calculations.
As in other studies, one limitation of this research is that the scientists used historical data whose one-year window had already closed. The data set is complete, so scientists can directly compare their predictions to known outcomes. There’s a difference, and in medicine it’s an ethical one, between studying closed data and using a mysterious, unstudied mechanism to change how we treat patients today.
Medical research faces the same ethical hurdles across the board. What if intervening based on machine learning changes outcomes and saves lives? Is it ever right to treat one group of patients better than a control group that receives less effective care? These obstacles will shape how future research builds on these results. Even if the phenomenon of better prediction holds up, it may be decades before patients are treated differently.