1001 Genomes Plus
- Acronym 1001G3
- Duration 36
-
Project leader
Magnus Nordborg, AT, Gregor Mendel Institute Vienna, funded by FWF
-
Other project participants
Paul Kersey, UK, EMBL Hinxton, present address: Royal Botanical Gardens, Kew, funded by BBSRC
Detlef Weigel, DE, MPI for Developmental Biology, Tuebingen, funded by DFG
- Funding
- Total Granted budget
Abstract
Understanding how genetic variation translates into phenotypic variation, and how this translation depends on the environment, is a major challenge for modern biology. It is fundamental to human genetics and agriculture, as well as evolutionary biology. Thanks to advances in technology, it is now possible to start answering this question by sequencing entire populations and connecting this information to phenotypic data, whether this be public health records, crop yield data, or the ability to withstand stress in a controlled experiment or in nature. There is, however, an important aspect that is often glossed over in all these (often highly publicized) efforts: we are still far from fully describing genetic variation on a population scale. The "next-generation" sequencing methods that have made it economically feasible to screen large numbers of individuals (the almost mythical "$1000 Human Genome") do not actually produce complete genome sequences - they produce massive numbers of very short sequence fragments that must be aligned to a reference genome in order to identify variants. Because of this, only simple variants (single nucleotide and very short insertion/deletion polymorphisms) are reported, and the results are invariably biased with respect to what is present or missing in the reference genome. Large or complex structural variants, as well as simple variants inside complex variants, are generally missed completely. It is currently not known how serious this problem is, for the simple reason that finding out requires completely assembling large numbers of genomes and comparing the results to data generated using standard methods. This is the objective of the 1001G+ proposal. Long-read sequencing has now advanced to a stage where generating nearly complete genomes for large samples is feasible - at least for organisms with relatively small genomes.
Building on our success with the "1001 Genomes Project", we will assemble at least 50 genomes from a diverse collection of Arabidopsis thaliana strains, annotate them with transcriptome and epigenome information, and develop tools to make the results available to the community. This will go a long way toward answering the question of what is hidden in the part of the genome we currently cannot see - certainly in A. thaliana, but our results (and the tools and concepts we develop to find, interpret, and share complete information on sequence variants) will pave the way for similar studies in organisms with larger genomes, where the hidden part is likely to be relatively larger, and perhaps even more important. The project brings together a team of researchers with complementary skills, considerable management expertise, and a strong track record of collaborating to deliver results for the community. In addition, regular meetings with leaders of complementary efforts in other organisms will ensure the broader relevance of the project.