NIH’s 1000 Genomes Project gains wider access
■ A major technology company has agreed to host the project’s database on a cloud-based system to provide a clear path for researchers.
By Pamela Lewis Dolan — Posted April 18, 2012
Researchers and physicians now will have access to the largest known database of genetic variations thanks to a partnership announced in March between Amazon Web Services and the National Institutes of Health.
Through the partnership, the 200-terabyte 1000 Genomes Project database, equivalent to more than 30,000 standard DVDs, will be stored on Amazon’s cloud and accessible to anyone with an Internet connection and a computer capable of processing that amount of data. Tools to process the data also are available on Amazon for a fee that varies according to the data access and analyses needed. Amazon created a website explaining the computing requirements and the tools that can be used to access the database.
The 1000 Genomes Project was launched in 2008 to create the most comprehensive map of human genetic variation available anywhere in the world. The goal is to collect data from the genomes of more than 2,600 people from 26 populations around the world and find the majority of all genome variations in existence. The database now holds the genome sequences of 1,700 people.
Whole genome sequencing allows researchers to identify genetic variations that increase a person’s risk of developing any one of a variety of conditions or diseases.
“The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation’s health and economy,” said Francis S. Collins, MD, PhD, NIH director.
Lisa Brooks, PhD, program director of the Genetic Variation Program at the NIH, said most research labs do not have access to such a large data set or the computing power to work with one. The NIH is investing in the project to advance disease research that otherwise would be difficult to carry out, she said.
The 1000 Genomes Project database is geared toward researchers rather than practicing physicians. But a physician who already deals with genetic testing could use the database to determine whether a patient’s genetic variation may have led to a particular disease or condition, Brooks said. A physician’s office computer would not likely be able to process the data, but a doctor could get access through a teaching hospital that does research.
Some geneticists argue that there could be value in whole genome sequencing becoming a routine part of primary care. It would provide a way of tailoring preventive care plans to each patient’s genetic predisposition to certain conditions. A publicly available database such as this would give geneticists access to more genome variations for comparison, said Charis Eng, MD, PhD, founding director of the Center for Personalized Medicine at Cleveland Clinic.
“When we find a [genome] variation … what we do is compare it to ‘a normal person,’ and right now we don’t have 1,000 normal persons,” Dr. Eng said. She said the cloud-based database will provide thousands of references.
The partnership between Amazon and the NIH is the kind of solution the Obama administration hopes to see more of through the Big Data Research and Development Initiative announced in March.
To launch the initiative, six federal agencies, including the NIH, committed more than $200 million to improve access to big data sets and the tools needed to organize the data and use them for research discoveries.