LATVIAN BIOMEDICAL RESEARCH AND STUDY CENTRE


RESEARCH AND EDUCATION IN BIOMEDICINE FROM GENES TO HUMAN

Funding: European Regional Development Fund (ERDF), within the implementation of Activity 1.1.1.2 “Post-doctoral Research Aid” of Specific Aid Objective 1.1.1 “To increase the research and innovative capacity of scientific institutions of Latvia and the ability to attract external financing, investing in human resources and infrastructure” of the Operational Programme “Growth and Employment”

Project Title: sRNAflow – a tool for analysis of small RNA-seq data

Project Nr.: 1.1.1.2/VIAA/1/16/135

Period: 36 months (1st September 2017 – 31st August 2020)

Project costs: 133 806,00 EUR

Project implementer: Dr.biol. P. Zajakins

The goal of this project is to develop sRNAflow, a software tool for the analysis of small RNAs that does not require advanced computer skills from the user. The tool will include novel algorithms needed for the analysis of samples obtained from biofluids. Existing tools for adapter removal, quality control, read mapping, read counting, differential expression analysis, non-templated isomiR search and miRNA target prediction will be integrated into the software. The tool will produce a single unified report covering all steps of the workflow. The project focuses on the needs of inexperienced users, so the tool is designed to be simple to use.

Information published 01.09.2017.

Progress of the project

1 September 2017 – 30 November 2017

During these three months, the “alpha” version of the software was developed using the R programming language and publicly available software packages. At present it includes: a quality check with FastQC before and after adapter removal; comparison of a representative random subset of each sample against the NR database using the blastn tool; mapping of the sample to the full human genome (Ensembl database); repositioning of alignments based on local coverage with the ShortStack algorithm; creation of a catalogue of expressed RNA types using Ensembl’s human genome annotation; and miRNA differential expression analysis using DESeq2.
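The random-subset step can be illustrated with a short Python sketch. This is not the sRNAflow code (which is written in R); it only shows one common way to draw a uniform random subset of reads in a single pass (reservoir sampling) and write it as FASTA for a blastn query. The function names `read_fastq`, `reservoir_sample` and `to_fasta` are hypothetical, and the subset size is an arbitrary assumption.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    buffer: List[str] = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == 4:
            yield (buffer[0], buffer[1], buffer[2], buffer[3])
            buffer = []


def reservoir_sample(records: Iterable[FastqRecord], k: int, seed: int = 0) -> List[FastqRecord]:
    """Return up to k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = record
    return sample


def to_fasta(records: Iterable[FastqRecord]) -> str:
    """Convert records to FASTA text (header without the leading '@')."""
    return "".join(f">{header[1:]}\n{seq}\n" for header, seq, _, _ in records)
```

Reservoir sampling is used here because the total number of reads does not need to be known in advance, so large FASTQ files can be streamed without loading them into memory.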

The project was presented to the general public as part of the “Scientists’ Night” activities.

Information published 30.11.2017.

Progress of the project

1 December 2017 – 28 February 2018

During these three months, a new version of the program was developed. It corrects various errors and adds differential expression analysis with DESeq2 for all classifiable RNA types, as well as analysis of the overlap between classified RNA types. A prioritisation algorithm for building the catalogue of expressed RNA types was developed, which resolves the problem of overlapping annotations. Work started on the BLAST ranking algorithm.
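The idea of resolving overlapping annotations by priority can be sketched as follows. This is a minimal Python illustration, not the project's algorithm: the priority order in `PRIORITY`, the `Feature` type and the `classify_read` function are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Optional


class Feature(NamedTuple):
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    rna_type: str   # e.g. "miRNA", "protein_coding"


# Hypothetical priority order: earlier entries win when annotations overlap.
PRIORITY: List[str] = ["miRNA", "tRNA", "snoRNA", "rRNA", "lncRNA", "protein_coding"]
RANK: Dict[str, int] = {rna_type: i for i, rna_type in enumerate(PRIORITY)}


def classify_read(read_start: int, read_end: int, features: List[Feature]) -> Optional[str]:
    """Assign one RNA type to a read, preferring the highest-priority overlapping feature."""
    overlapping = [f for f in features if f.start <= read_end and read_start <= f.end]
    if not overlapping:
        return None
    # Types missing from PRIORITY are ranked last.
    best = min(overlapping, key=lambda f: RANK.get(f.rna_type, len(PRIORITY)))
    return best.rna_type
```

With such a rule, a read that falls inside a miRNA hosted in a protein-coding gene is counted once, as miRNA, rather than being counted for both annotations.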

Participated in the “3rd OpenMultiMed Training School. Multi-scale and Multi-level Modelling Methodologies in Biomedicine”.

Information published 28.02.2018.

Progress of the project

1 March 2018 – 31 May 2018

During these three months, work on the BLAST ranking algorithm continued. A metagenome generation program was developed; it builds a generic metagenome combining the human genome with 44,000 bacterial, 400 fungal and 200 protist genomes. A new metagenome generation procedure that will produce a more compact metagenome is planned. A new version of the software was developed, with updated databases and updated versions of the programs used.

During a visit to the European Bioinformatics Institute (EMBL-EBI), consultations with EMBL-EBI specialists were held.

Attended the “System Biology School” in St. Petersburg, Russia (Institute for Bioinformatics, ITMO University and Washington University in St. Louis). The school covered topics such as genome-wide association studies (GWAS), exome sequencing from the basics to the phenotype (BWA, Plink, GATK, ExAC and gnomAD), RNA-seq, proteomics and epigenetics.

Information published 31.05.2018.

Progress of the project

1 June 2018 – 31 August 2018

During these three months, metagenome identification procedures based on Kraken2 and MetaPhlAn2 were incorporated.

Custom Kraken2 and MetaPhlAn2 metagenome databases were prepared, combining the genomes of humans, bacteria and protozoan species. A visualization procedure based on Krona was developed. Work on the BLAST ranking algorithm continued.

Information published 31.08.2018.

Progress of the project

1 September 2018 – 30 November 2018

During these three months, work continued on the BLAST ranking algorithm for identifying small RNAs from commensal microflora or infections in biofluids. During a visit to the European Bioinformatics Institute (EMBL-EBI), consultations with EMBL-EBI specialists were held. One possible option for the metagenome identification procedure, based on Sourmash, which uses generic k-mer databases built from all species registered in GenBank and RefSeq, was incorporated into the pipeline. The RNA type catalogue generation algorithm was supplemented with new tRNA and tRF databases, and the approach to generating the tRNA database was changed. Work on the new method of RNA type catalogue generation continues.

Information published 30.11.2018.

Progress of the project

1 December 2018 – 28 February 2019

During these three months, work continued on generating a metagenome from the genomes of species selected by the BLAST ranking algorithm for identifying microflora and small RNAs in biofluid samples. Work was carried out on resolving problems in the algorithm related to the filtration of poorly represented organisms. Errors in the RNA type catalogue generation algorithm were corrected. The priority procedure was changed, the problem of overlapping features was solved, and the approach to RNA type catalogue generation in such cases was revised.
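The report does not describe how poorly represented organisms are filtered. As a rough illustration only, one simple approach is to drop species supported by too few reads, either in absolute terms or as a fraction of all assigned reads. In the Python sketch below, the function name `filter_poorly_represented` and both thresholds are assumptions, not values used by sRNAflow.

```python
from typing import Dict


def filter_poorly_represented(
    hits: Dict[str, int],
    min_reads: int = 10,          # assumed absolute threshold
    min_fraction: float = 0.001,  # assumed relative threshold
) -> Dict[str, int]:
    """Keep only species whose read support passes both thresholds."""
    total = sum(hits.values())
    if total == 0:
        return {}
    return {
        species: count
        for species, count in hits.items()
        if count >= min_reads and count / total >= min_fraction
    }
```

Species that pass such a filter could then be the ones whose genomes are included in a sample-specific metagenome, which keeps the reference compact.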

Information published 28.02.2019.

Progress of the project

1 March 2019 – 31 May 2019

During these three months, work continued on resolving problems in the algorithm related to the filtration of poorly represented organisms. Errors in the RNA type catalogue generation algorithm were corrected and the priority procedure was changed. An algorithm for genome personalization was developed for cases where the patient’s mRNA NGS data are available. Work on a unified report and user interface started.

Information published 31.05.2019.

Progress of the project

1 June 2019 – 31 August 2019

During these three months, work continued on resolving problems in the algorithm related to the filtration of poorly represented organisms. The Genopole 2019 summer school “Bioinformatics and Biostatistical Tools in Medical Genomics” was attended (Seine-Port, France, University of Evry-Val d’Essonne), as was the conference “Bioinformatics: from Algorithms to Applications” (Saint Petersburg, Russia, Saint Petersburg State University), where the algorithm for identification of commensal microflora and small RNAs of pathogenic origin in bio-specimens was presented during the poster session. Work on a unified report and user interface continued.

Information published 30.08.2019.

Progress of the project

1 September 2019 – 30 November 2019

During these three months, work on a unified report and user interface continued. During a visit to the European Bioinformatics Institute (EMBL-EBI), consultations with EMBL-EBI specialists were held. The Shiny package was selected as the basis for the user interface. Work started on caching procedures that speed up the program when changes are made only at late stages of the workflow.
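The caching idea can be sketched in Python as follows. This is not the sRNAflow implementation; it is a minimal in-memory illustration in which each step's result is stored under a key derived from the step name, its parameters and the key of the upstream step, so that changing only a late-stage parameter re-runs only the late stages. The class `StepCache`, the step functions and the parameter values are hypothetical placeholders.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


class StepCache:
    """In-memory cache keyed by step name, parameters and upstream key."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.executed: List[str] = []  # names of steps that were actually run

    @staticmethod
    def _key(name: str, params: Dict[str, Any], upstream_key: str) -> str:
        payload = json.dumps(
            {"step": name, "params": params, "upstream": upstream_key},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(
        self,
        name: str,
        func: Callable[..., Any],
        params: Dict[str, Any],
        upstream_key: str = "",
        upstream_value: Any = None,
    ) -> Tuple[str, Any]:
        """Run func only if this exact step configuration has not been seen."""
        key = self._key(name, params, upstream_key)
        if key not in self._store:
            self._store[key] = func(upstream_value, **params)
            self.executed.append(name)
        return key, self._store[key]


# Placeholder steps standing in for real pipeline stages.
def trim(_: Any, adapter: str) -> str:
    return f"trimmed[{adapter}]"


def map_reads(trimmed: str, genome: str) -> str:
    return f"mapped({trimmed})->{genome}"


def pipeline(cache: StepCache, adapter: str, genome: str) -> str:
    k1, trimmed = cache.run("trim", trim, {"adapter": adapter})
    _, mapped = cache.run("map", map_reads, {"genome": genome}, k1, trimmed)
    return mapped
```

Calling `pipeline` twice with the same adapter but a different genome re-runs only the mapping step, because the key of the trimming step is unchanged. A persistent version would store results on disk instead of in a dictionary.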

Information published 29.11.2019.

Progress of the project

1 December 2019 – 29 February 2020

During these three months, work on a unified report, caching procedures and the Shiny-based user interface continued. A new version of the software was developed, with updated databases and updated versions of the programs used.

Information published 28.02.2020.

Progress of the project

1 March 2020 – 31 May 2020

During these three months, work continued on integrating, on the basis of the Nextflow package, a series of procedures that speed up the program when changes are made only at certain stages. Work on the Shiny-based user interface continued. The literature on containerization was reviewed. A public repository for the application was created on GitHub.

Information published 29.05.2020.

Progress of the project

1 June 2020 – 31 August 2020

During these three months, a new version of the software was developed, in which various errors were corrected and the databases and programs used were updated. The “Bioinformatics Community Conference 2020” was attended, and the results were presented in a poster session. Work on the Shiny-based user interface continued. The software was successfully containerized and a public repository of the program was created on DockerHub (zajakin/srnaflow). A publication is in preparation.

Information published 31.08.2020.