Algorithms for large-scale genome analysis
Haplotypes are sets of genetic variants located side by side on the same chromosome and passed on together to the next generation.
Studying them helps researchers understand how certain complex traits, such as disease risk, are inherited. However, this analysis usually requires analyzing the genomes of family members (parents and children), a tedious and expensive process. To overcome this problem, researchers have developed SHAPEIT4, a powerful computer algorithm that can quickly identify the haplotypes of hundreds of thousands of unrelated people.
The results are as detailed as those of a family-based analysis, a process that cannot be carried out on a large scale. The tool is now available online under an open-source license and is free for the entire research community to use.
The analysis of genetic data is becoming increasingly important, especially in the field of personalized medicine. The number of human genomes sequenced each year is growing exponentially, and the largest database covers more than one million individuals.
This wealth of data is valuable for better understanding the genetic makeup of humans, whether to determine the genetic contribution to a particular disease or to trace the history of human migration. To be meaningful, however, data on this scale must be processed computationally.
Genotyping makes it possible to identify an individual's alleles, namely the genetic variants inherited from their parents. However, without knowing the parents' genomes, we do not know which alleles were passed on together to the children, or in which combinations.
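To make this ambiguity concrete, here is a minimal Python sketch. It is not part of SHAPEIT4; the function name and the encoding of genotypes as counts of the alternative allele (0, 1 or 2 per site) are assumptions made for illustration. It lists every pair of haplotypes that could explain one person's unphased genotypes.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """List the unordered haplotype pairs consistent with unphased genotypes.

    `genotypes` holds one value per site: the number of copies of the
    alternative allele (0, 1 or 2). This encoding is an assumption of
    this sketch, not something taken from SHAPEIT4.
    """
    # Only heterozygous sites (genotype 1) are ambiguous.
    het_sites = [i for i, g in enumerate(genotypes) if g == 1]
    # Homozygous sites are fixed: 0 -> allele 0 on both copies, 2 -> allele 1.
    base = [g // 2 for g in genotypes]

    pairs = set()
    for assignment in product((0, 1), repeat=len(het_sites)):
        hap_a = list(base)
        hap_b = list(base)
        for site, allele in zip(het_sites, assignment):
            hap_a[site] = allele
            hap_b[site] = 1 - allele
        # Sort the two haplotypes so that (a, b) and (b, a) count once.
        pairs.add(tuple(sorted((tuple(hap_a), tuple(hap_b)))))
    return sorted(pairs)


# Two heterozygous sites around one homozygous site: two possible phasings.
for pair in compatible_haplotype_pairs([1, 2, 1]):
    print(pair)
```

With k heterozygous sites there are 2^(k-1) candidate pairs, so the number of possibilities doubles with each additional heterozygous site. This is why phasing without family data calls for statistical methods rather than simple enumeration.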
For example, to determine the genetic risk of a disease, scientists assess whether a given genetic variant is more or less frequent in people who have developed the disease, in order to determine the role of that variant in the disease under investigation.
With haplotypes, the analysis moves from a single variant to a combination of many, which makes it possible to determine which combination of alleles on the same chromosome has the greatest impact on disease risk.
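The shift from single variants to combinations can be sketched in Python as follows. This is an illustrative toy, not the researchers' method: the function name is hypothetical, and the case and control data are invented purely to show the mechanics. Each person is represented as a pair of phased haplotypes, and the frequency of each whole haplotype is compared between the two groups.

```python
from collections import Counter


def haplotype_frequencies(phased_individuals):
    """Return the relative frequency of each haplotype in a group.

    Each individual is a pair of haplotypes, and each haplotype is a
    tuple of alleles (0 or 1) along the same chromosome.
    """
    counts = Counter()
    for hap_a, hap_b in phased_individuals:
        counts[hap_a] += 1
        counts[hap_b] += 1
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Invented toy data: two sites, two people per group (not real results).
cases = [((1, 1), (0, 0)), ((1, 1), (1, 0))]
controls = [((1, 0), (0, 1)), ((0, 0), (1, 0))]

case_freq = haplotype_frequencies(cases)
control_freq = haplotype_frequencies(controls)

# Compare how often the combination (1, 1) appears in each group.
print(case_freq.get((1, 1), 0.0), control_freq.get((1, 1), 0.0))
```

A real study would involve many more individuals and a proper statistical test. The point here is only that the unit being counted is a combination of alleles on one chromosome rather than a single variant.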
The method developed by the researchers makes it possible to process large numbers of genomes, around 500,000 to 1,000,000 individuals, and to identify their haplotypes without knowing their ancestry or descendants. The SHAPEIT4 tool has been successfully tested on the 500,000 individual genomes available in the UK Biobank, a scientific database developed in the UK.
The researchers have decided to make their tool accessible to everyone under an open-source MIT license: all of the code is available, and other researchers can modify it as they wish. This decision was made primarily for reasons of transparency and reproducibility, as well as to encourage research all over the world.
The tool is far more efficient than older tools, as well as faster and cheaper. This also limits the environmental impact of the computing involved. The very powerful computers used to process big data are highly energy-intensive, so reducing their use also helps minimize their negative effects.