Will an AlphaFold-Based Tool Remain the Best Predictor of Harmful Missense Variants? An Emory University Bioinformatician Urges Rigorous Evaluation
About a decade ago, Žiga Avsec was a physics PhD student with a passion for machine learning and a crash course in genomics behind him. He was soon working in a lab that studied rare diseases, on a project aiming to pin down the exact genetic mutation that caused an unusual mitochondrial disease.
There are 71 million possible missense variants in the human genome, and the average person carries more than 9,000 of them. Most are harmless, but some have been implicated in genetic diseases such as sickle cell anemia and cystic fibrosis, as well as more complex conditions like type 2 diabetes, which may be caused by a combination of small genetic changes. Avsec started asking his friends if they knew which ones were dangerous. The answer: “Well largely, we don’t.”
Of the 4 million missense variants that have been spotted in humans, only 2 percent have been categorized as either pathogenic or benign, through years of painstaking and expensive research. For a single missense variant, it can be several months before we understand the effect.
That gap in knowledge limits the use of genomics in healthcare. A new tool based on the AlphaFold network, AlphaMissense, aims to close it by predicting which missense variants are likely to cause health conditions.
Its impact won’t be as significant as that of AlphaFold, which ushered in a new era in computational biology, says Joseph Marsh, a computational biologist at the MRC Human Genetics Unit in Edinburgh, UK. Still, he finds the tool exciting and calls it probably the best predictor available right now. Whether it will still be the best in two or three years is another matter: he sees a good chance that it won’t be.
Yana Bromberg, a bioinformatician at Emory University in Atlanta, Georgia, emphasizes that tools such as AlphaMissense must be rigorously evaluated, using good performance metrics, before ever being applied in the real world.
For years, a community exercise called the Critical Assessment of Genome Interpretation (CAGI) has benchmarked the performance of such prediction methods against experimental data that has not yet been released. “It’s my worst nightmare to think of a doctor taking a prediction and running with it, as if it’s a real thing, without evaluation by entities such as CAGI,” Bromberg adds.