Project Title: “An integrated population based Latvian genome reference and its applicability to personal risk estimation for metabolic traits”
Funding: European Regional Development Fund (ERDF), Measure 1.1.1.1 “Support for applied research”
Project No.: 1.1.1.1/20/A/126
Period: 1 March 2021 – 30 November 2023
Project costs: 540 540.54 EUR
Principal Investigator: Dr. biol. Jānis Kloviņš
Cooperation partner: SIA “Latvia MGI Tech”
Project summary:
The aim of this project is to create the first Latvian genome variation reference and to estimate accurate, population-specific polygenic risk scores (PRS) for metabolic disease using the resources of the national biobank, the Genome Database of Latvian Population (LGDB), thereby establishing a functional framework for an -omics-based personalised prevention and treatment program in Latvia. The project will have a substantial impact on the further development of genomics-based medicine in Latvia, and the resources created within the project will also be available to the whole research community. More specifically, we will target diabetes, producing validated, optimal polygenic risk scores for type 2 diabetes and other metabolic parameters, and will use this information to search for novel biomarkers or patient subgroups that would benefit from knowledge of their genetics.
Information published 01.03.2021.
Progress of the project:
1 March 2021 – 31 May 2021
In the first quarter of the project, cohort selection was performed using the resources of the Genome Database of Latvian Population, including individuals representing the general population, patients with type 2 diabetes within the OPTIMED cohort, patients with type 2 diabetes outside the OPTIMED cohort with additional longitudinal samples, and MODY patients. The clinical parameters to be obtained from the health data registries were selected to supplement the related data available in the Genome Database of Latvian Population. In addition, DNA samples were selected and quality-controlled for the preparation of next-generation sequencing libraries.
Information published 31.05.2021.
Progress of the project:
1 June 2021 – 31 August 2021
Cohort selection was performed using the resources of the Genome Database of Latvian Population. Using MGI next-generation sequencing technology, whole-genome sequences were obtained for 236 samples collected from both healthy individuals and patients with type 2 diabetes. Quality control of the whole-genome sequencing data obtained in the first work package and development of the analysis workflow have started. The scientific literature is being reviewed, and various tools for primary data processing and for the identification of polymorphisms, insertions, deletions and structural variations are being tested using the HPC resources of Riga Technical University.
Information published 31.08.2021.
Progress of the project:
1 September 2021 – 30 November 2021
New samples were selected from the Genome Database of Latvian Population, and DNA quality control and sample preparation were completed for next-generation sequencing with MGI DNA nanoball-based technology and DNBSEQ-T10 × 4 equipment. The development of data analysis pipelines using the HPC resources of Riga Technical University is ongoing. For 236 samples, we have obtained whole-genome sequencing data, performed quality control of the sequencing data and mapping, and successfully identified structural variations, SNPs and insertions/deletions.
Information published 30.11.2021.
Progress of the project:
1 December 2021 – 28 February 2022
Statistical analysis of whole-genome sequencing data from 213 samples was completed, including mapping, joint variant calling, calling of single nucleotide variants, indels, structural variations and mobile elements, annotation of the location and function of these variants, analysis of population structure, and merging with other whole-genome data sets, including the 1000 Genomes Project. An additional 600 whole-genome sequences were obtained from both healthy individuals and patients with type 2 diabetes and MODY using MGI next-generation sequencing technology. Quality control and analysis of these data are in progress. Testing of selected tools for PRS calculation and process optimization is underway.
Information published 28.02.2022.
Progress of the project:
1 March 2022 – 31 May 2022
During the reporting period, whole-genome sequencing data from 684 samples were analysed using the previously developed workflow. Calling of single nucleotide variants, indels, structural variations and mobile elements, annotation of the location and function of these variants, analysis of population structure, and merging with other whole-genome data sets have been performed. The imputation quality obtained with the 1000 Genomes Project panel has been compared with that of a combined panel that additionally contains the whole-genome sequences of the Latvian population, and the results show significant improvements in imputation quality with the combined panel. Polygenic risk scores in the cohort of type 2 diabetes patients have been calculated using previously published polygenic risk models.
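As an illustration of what calculating a score from a published polygenic risk model involves, the following is a minimal sketch of the standard additive form: a weighted sum of effect-allele dosages. The variant identifiers, weights and genotypes are hypothetical placeholders, and the tools actually used in the project are not specified here.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum of effect-allele dosage times per-variant weight.

    dosages: variant ID -> effect-allele count (0, 1 or 2; imputed values
             may be fractional).
    weights: variant ID -> effect size from a published model.
    Variants missing from the individual's genotypes are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical model weights and one individual's genotypes.
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
individual = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(individual, weights)
print(round(score, 3))
```

In practice, raw scores are usually standardized against a reference population before individuals are compared.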
Information published 31.05.2022.
Progress of the project:
1 June 2022 – 31 August 2022
During the reporting period, the selection of samples from the cohort of patients with type 2 diabetes and the group of patients with MODY continued. The corresponding DNA samples for sequencing, together with the clinical and phenotypic information for these patients, were retrieved from the biobank. Using all samples sequenced so far, supplemented with a larger set of previously genotyped samples, an admixture analysis was performed to establish the population structure. Development of the reference genome variation panel, optimization of the imputation, and calculation of polygenic risk scores using population-specific models are ongoing.
Information published 31.08.2022.
Progress of the project:
1 September 2022 – 30 November 2022
A genome-wide imputation analysis was performed using a population-specific reference panel from Latvia. The results showed a significant improvement in imputed genotype accuracy and an overall increase in the number of imputed variants with a population frequency of >0.5%. In collaboration with the Riga Technical University computing center, the performance of genome variation detection software was significantly improved on Latvia’s HPC infrastructure. During the reporting period, a project participant gained additional knowledge in the field of computer science at the University at Buffalo, State University of New York, USA.
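One common way to quantify imputed genotype accuracy is the squared correlation between imputed dosages and genotypes observed by sequencing in the same individuals. The sketch below assumes that metric for illustration only; the report does not state which measure was used, and all values are made up.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if var_t == 0 or var_i == 0:
        return 0.0  # correlation is undefined for a monomorphic variant
    return cov * cov / (var_t * var_i)


# Hypothetical data for one variant in four individuals.
sequenced = [0, 1, 2, 1]
imputed_general_panel = [0.4, 0.9, 1.2, 1.1]   # made-up dosages
imputed_combined_panel = [0.1, 1.0, 1.9, 1.0]  # made-up dosages

r2_general = dosage_r2(sequenced, imputed_general_panel)
r2_combined = dosage_r2(sequenced, imputed_combined_panel)
print(round(r2_general, 3), round(r2_combined, 3))
```

Averaging such per-variant values within allele-frequency bins is a typical way to compare reference panels, since differences between panels are often largest for rarer variants.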
Information published 30.11.2022.
Progress of the project:
1 December 2022 – 28 February 2023
During the reporting period, a manuscript entitled “Whole genome sequencing of 539 individuals from Latvia: the first step towards the population-specific reference of genetic variation” was submitted to BMC Medical Genomics. The study summarizes the genetic variation of the Latvian population, discusses population stratification at the European and global levels, and defines the impact of a population-specific reference panel on imputation accuracy and the number of imputed variants. The population structure analysis was complemented by the determination of mitochondrial and Y chromosome haplogroups from the whole-genome sequenced samples.
Information published 28.02.2023.
Progress of the project:
1 March 2023 – 31 May 2023
Whole genome sequencing analysis of selected samples for the creation of a representative cohort of the population of Latvia is underway. Using previously selected tools, PRS algorithms are compared using sets of different polymorphisms from the available data. Recommendations for the use of PRS in clinical practice have been prepared.
Information published 31.05.2023.
Progress of the project:
1 June 2023 – 31 August 2023
Whole genome sequencing analysis of the selected samples for the creation of a representative cohort of the population of Latvia is ongoing. Evaluation and testing of PRS algorithms on the sample set created so far continues. Publication manuscripts on PRS in the MODY cohort and the association of PRS with specific metabolic markers are in progress while the final steps in data analysis are underway.
Information published 01.09.2023.
Progress of the project:
1 September 2023 – 30 November 2023
Within the project, the performance of 102 type 2 diabetes polygenic risk models has been compared in the Latvian population. Additionally, a population-specific type 2 diabetes polygenic risk model has been developed, showing performance equivalent to previously published models. Based on these results, a manuscript titled “Evaluating the Efficacy of Type 2 Diabetes Polygenic Risk Scores in an Independent European Population” has been prepared for submission to the International Journal of Molecular Sciences. Furthermore, the newly created population-specific polygenic risk model has been used to stratify MODY patients. However, additional exploration that included previously published type 1 diabetes polygenic risk models revealed that the genetic risk for type 1 diabetes plays a more significant role in the stratification of MODY patients, especially in cases where causal variants in MODY genes have not been identified. The manuscript “Identification of pathogenic mutations and application of polygenic risk scores to differentiate MODY patients from other diabetes types” has been submitted for publication in Diabetes. Finally, a plasma metabolome analysis has been performed, revealing altered metabolite and lipoprotein profiles in healthy individuals with an elevated type 2 diabetes polygenic risk.
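To illustrate how polygenic risk models can be compared, the sketch below ranks models by the area under the ROC curve (AUC): the probability that a randomly chosen case scores higher than a randomly chosen control. The choice of AUC is an assumption made for illustration, since the performance measure used in the project is not stated here. The model names and scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical PRS values for cases and controls under two models.
models = {
    "model_a": ([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]),
    "model_b": ([0.6, 0.2, 0.5], [0.7, 0.3, 0.4]),
}

performance = {name: auc(cases, controls)
               for name, (cases, controls) in models.items()}
best_model = max(performance, key=performance.get)
print(best_model, round(performance[best_model], 3))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a small illustration; a rank-based calculation would be preferable for large cohorts.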
Information published 30.11.2023.