CSIRO uses AI to crunch a trillion genomic data points

To identify disease-causing genes.

CSIRO researchers crunched one trillion genomic data points in the cloud to help locate parts of the human genome that cause disease.

CSIRO's bioinformatics group used its own VariantSpark platform, which is based on artificial intelligence (AI) and runs on Amazon Web Services (AWS).

In a new study published in the technical journal GigaScience, the researchers outlined how they analysed a synthetic dataset of 100,000 individuals’ genomes, each made up of over three billion DNA base pairs.

Dr Denis Bauer, head of the bioinformatics group, said no other technology platform has yet been able to process one trillion genomic data points, spanning more than 10 million variants and 100,000 samples, at once.
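The trillion figure follows from multiplying the two dimensions Bauer cites. A minimal sketch of that arithmetic, assuming exactly 10 million variants (the article says "over" 10 million) and one data point per variant per sample; the variable names are illustrative only:

```python
# Dimensions quoted in the article; 10 million is treated as a lower bound.
variants = 10_000_000   # genomic variants per individual
samples = 100_000       # individuals in the synthetic dataset

# Assumption: one data point for each (variant, sample) pair.
data_points = variants * samples

print(f"{data_points:,}")  # 1,000,000,000,000
```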

Using AI platforms in this way will be essential for the future of healthcare in Australia, added Dr David Hansen, chief executive of CSIRO’s Australian e-Health Research Centre.

"Artificial intelligence is a critical component of understanding genomic information," Hansen said.

"Despite recent technology breakthroughs with whole genome sequencing studies, the molecular and genetic origins of complex diseases are still poorly understood which makes prediction, application of appropriate preventive measures and personalised treatment difficult."

This is because many traits and diseases are thought to be polygenic, or influenced by more than one gene, the GigaScience paper states.

VariantSpark was found to be better at identifying genomic variants associated with complex genetic expressions than traditional, monogenic genome-wide association studies.

"Our research shows VariantSpark is the only method able to scale to ultra-high dimensional genomic data in a manageable time," Bauer said.

"It was able to process this information in 15 hours while it would take the fastest competitors likely more than 100,000 years to process such a volume of data.

"This is a significant milestone, as it means VariantSpark can be scaled up to analyse population-level datasets and drive better healthcare outcomes."
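For a sense of scale, the two run times Bauer quotes can be turned into a rough ratio. This is only a back-of-the-envelope sketch: it assumes a 365-day year and takes "more than 100,000 years" at exactly 100,000, so the result is a floor on the claimed gap rather than a measured benchmark.

```python
# Run times as quoted in the article.
variantspark_hours = 15
competitor_years = 100_000          # quoted as "more than", so a lower bound

# Assumption: a 365-day year with 24 hours per day.
hours_per_year = 365 * 24
competitor_hours = competitor_years * hours_per_year

# Ratio of the competitor estimate to the VariantSpark run time.
speedup = competitor_hours / variantspark_hours

print(f"{speedup:,.0f}")  # 58,400,000
```

On those assumptions, the quoted figures imply a gap of at least roughly 58 million times.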

The paper concluded that VariantSpark is not a replacement for traditional genetic association analysis, but rather a complement.

“The results of traditional GWAS [genome-wide association studies] and VariantSpark should be considered together to gain insights into the full influence of the genome on disease and other phenotypes,” the authors wrote.

Copyright © iTnews.com.au. All rights reserved.