Big data helps autism research: U of T team identifies 18 new genes increasing risk
Scientists in the world’s largest autism genomics project recently identified 18 new genes that increase risk for the condition.
Some of the genes seen in participants also carry risk for heart disease, diabetes and other conditions, opening the potential for more personalized genetic counselling.
The results of the project, named MSSNG, provide more evidence that each person’s autism is unique, meaning researchers will still need a lot more genomic data before they can sort and target the many forms of the condition. However, some families are already benefiting. The MSSNG project includes whole-genome data from more than 7,000 individuals affected by autism, and that data is stored on Google Cloud, which allows access to researchers around the world.
Professor Stephen Scherer, director of both the McLaughlin Centre at the University of Toronto and the Centre for Applied Genomics at the Hospital for Sick Children, is the senior investigator for MSSNG.
He spoke with U of T's Jim Oldfield about how the cloud is enabling a new kind of open science on autism, and what needs to happen next for big data to deliver on its potential to treat the most baffling medical conditions.
How did the MSSNG project come about?
Genome sequencing generates massive amounts of data, and the need to deal with those terabytes of information is what put us into the cloud environment.
The project came together four years ago when we decided to make all that data available. The original vision a few of us had was for truly open science, where you could type a keyword into a database, for example to find which individuals carry a particular gene.
We found out along the way that we need more open consent, in part because we’re dealing with clinical research data, even though it’s anonymized. So we now have a system where you apply through a data access committee. You can get anything you want in the cloud, including raw reads from the sequencers and new analytics tools we've developed. Almost 100 researchers at dozens of institutions are using the system, and we expect those numbers to grow. It’s probably one of the most open-science genetics projects right now.
Why is this technology well-suited for autism research?
We need to take this approach because autism is extremely heterogeneous, both in how it presents clinically and in its underlying genetics. There are well over 100 different forms, which is why we sometimes call them the autisms.
To subcategorize these conditions, we need big numbers and whole genomes. We calculated that to get all the low-hanging fruit – the highly penetrant autisms with the most common genetic variants – we’d need about 10,000 families. To find new impactful variants, including copy number variations or small insertions and deletions, some of which are in the noncoding regions of the genome, we’ll likely need up to 100,000.
Will machine learning help analyze that data?
I hope so. [U of T Professor] Brendan Frey and his group published a paper in Science a couple of years ago using MSSNG data in its early form. They used deep genomics algorithms to analyze hundreds of thousands of variants. We published a follow-up paper using his programs to look for splicing differences in autism subjects versus controls. These are some of the first papers that convincingly show non-genic regions of the genome can be involved in autism. So the short answer is we’re already using machine learning to mine the data we have, and other groups are doing it as well. We do think U of T will have a competitive advantage here.
How is MSSNG benefiting patients now?
We’ve found a total of 63 genes and mutations that increase risk for autism through this project.
That data is communicated back to families that are part of the study, through a genetic counsellor in cases where it’s relevant. Sometimes other conditions are implicated, such as epilepsy, anxiety or sleep/mood disorders. In other cases, a formal diagnosis can help encourage earlier behavioural interventions.
A genetic profile that matches a known subtype of autism can also affect prognosis and assessment of familial recurrence risk. And we’re linking families with one another in cases where they may benefit by talking about what worked and what didn’t. In the future, this data should facilitate clinical trials based on a small number of key neurological pathways affected by the many genetic variants in autism.
What progress might we see in the next five years?
I often say autism is about 10 years behind cancer in terms of how we use genomic data. But we’re only behind because we started later.
Some people don’t think autism should be an area of research, and some families don’t want interventions. But most want investment and research, so the demand for data is very high.
If I had my dream – and I think this will happen in Ontario within three years – every child with a diagnosis would have his or her genome sequenced. For about 20 per cent of families, we can now explain why autism comes about in their child. Previous technologies only looked at two per cent of the genome, the genes. Now, most leading-edge labs are studying the other 98 per cent, and whole-genome sequencing provides the fundamental road map for those experiments. We are linking all that high-quality data together and using it to decode evolution. It’s a very exciting time.
Nature Neuroscience published the recent results from MSSNG, which is a collaboration between SickKids, Autism Speaks, Verily (formerly Google Life Sciences) and researchers at the University of Toronto.