15 Jun 2022 | Paper accepted at WABI 2022
Our paper “Fast gapped k-mer counting with subdivided multi-way bucketed Cuckoo hash tables” by Jens and Sven was accepted at WABI 2022, taking place September 5-9, 2022 in Potsdam, Germany.
Jens will present it; you can meet us there.
13 Jun 2022 | Algorithmic Bioinformatics presenting at DSB 2022
Sven, Vu Lam and Jens attended the Data Structures in Bioinformatics workshop (DSB 2022) in Düsseldorf and presented parts of their work.
Jens Zentgraf gave a talk about joint work with Sven Rahmann, titled “Fast gapped k-mer counting with subdivided multi-way bucketed Cuckoo hash tables”.
11 Apr 2022 | Summer semester 2022 has started
The summer semester 2022 will offer a mix of in-person and online courses.
Our group offers the popular “Algorithms for Sequence Analysis” course (lectures online, possibly moving to an on-campus mode later; tutorials both online and on campus).
More information can be found on the course website.
Registration for UdS students is required to access the course materials.
Our group also offers a (Master) seminar “Algorithms for Metagenomics”.
Information and registration take place via the CS Seminar website.
Registration ends on Tuesday, 12 April.
08 Mar 2022 | Algorithmic Bioinformatics supports an art project of Alicja Kwade
The Berlin-based artist Alicja Kwade had her personal genome printed out for her exhibition “In Absence” – on 314,000 DIN A4 pages of paper. To do this, she collaborated with Sven Rahmann, bioinformatics professor at Saarland University. The exhibition can be visited at the Berlinische Galerie until April 4.
The human genome consists of 3.1 billion base pairs, a number that is difficult to grasp. “Even for us bioinformaticians, this is an abstractly high number, although we work with genome data almost every day. This is because we usually only have the data as files on the computer,” says Saarbrücken bioinformatics professor Sven Rahmann.
The dimensions of the human genome can be better understood through a project by the Berlin artist Alicja Kwade. She has had her personal genome printed out on 314,000 A4 pages and is exhibiting it publicly in her exhibition “In Absence” at the Berlinische Galerie. 12,000 pages have been hung on the walls of the hall, the rest are in copper archive boxes distributed around the room. If all the pages of this genome document were laid side by side, they would stretch over a length of around 66 kilometers.
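The figures above can be checked with a little arithmetic. The sketch below assumes (our assumption, not stated in the text) A4 sheets 210 mm wide, laid edge to edge along their short side with no gaps, and also derives the implied average number of bases printed per page.

```python
# Back-of-the-envelope check of the genome printout figures.
# Assumption (ours): A4 pages 210 mm wide, laid side by side
# along their short edge with no gaps.
GENOME_BP = 3.1e9     # base pairs in the human genome (from the text)
PAGES = 314_000       # printed A4 pages (from the text)
A4_WIDTH_M = 0.210    # width of an A4 sheet in metres


def printout_length_km(pages: int, page_width_m: float) -> float:
    """Total length in kilometres when all pages are placed side by side."""
    return pages * page_width_m / 1000.0


def bases_per_page(genome_bp: float, pages: int) -> float:
    """Average number of bases (letters) printed on one page."""
    return genome_bp / pages


if __name__ == "__main__":
    print(f"length: {printout_length_km(PAGES, A4_WIDTH_M):.1f} km")
    print(f"bases per page: {bases_per_page(GENOME_BP, PAGES):,.0f}")
```

Running it prints a length of 65.9 km, consistent with the roughly 66 kilometers quoted above, and about 9,873 bases per page.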
More information is available in the press release by Saarland Informatics Campus or in an article in the Saarbrücker Zeitung.
(Photograph by Frank Tschentscher)
07 Mar 2022 | Borja Freire Castro - Visiting Researcher from University of A Coruña, Spain.
We welcome Borja, who is finishing his PhD thesis at the University of A Coruña, Spain, as a short-term guest researcher in our group.
He has worked on the reconstruction of viral quasispecies, and is interested in learning about alignment-free algorithms and statistical methods in bioinformatics.
19 Oct 2021 | Busy Beaver Award for “Algorithms for Sequence Analysis” lecture
The summer term lecture “Algorithms for Sequence Analysis” by Sven Rahmann has received a Busy Beaver Award from the Student Council.
The award goes to lectures that were well received and earned very positive evaluations from students.
18 Oct 2021 | Winter semester 2021/22 is starting
During the winter semester, Prof. Sven Rahmann is offering the following courses:
Lecture and tutorials in “Statistics, Probability and Applications in Bioinformatics”. More information is on the website, or directly in the SIC CMS, where you also need to register for the course with your UdS student account.
Master Seminar “Current Topics in Sequence Analysis”, for students who have previously passed the “Algorithms for Sequence Analysis” course. More information is on the website, or directly in the SIC Seminar System, where you choose your preferred seminar(s). Note that you need to go through the assignment process and cannot register for the seminar directly.
26 Jun 2021 | Saarland University Open Day / Open House
Saarland University Open House is taking place virtually on Saturday, June 26, 2021.
All departments offer insights into their research and provide information about their study programs.
We present a talk “Genome assembly as a bioinformatics puzzle with billions of pieces” for the interested public. There is also a real puzzle to download. Enjoy!
19 Apr 2021 | Sustainable Data Analysis with Snakemake
The latest version of the paper “Sustainable data analysis with Snakemake” is out.
Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing to quality control and fine-grained, interactive exploration and plotting of final results.
In this latest version, we have clarified several claims in the readability analysis. Further, we have extended the description of the scheduling to also cover running Snakemake on cluster and cloud middleware. We have extended the description of the automatic code linting and formatting provided with Snakemake. Finally, we have extended the text to cover workflow modules, a new feature of Snakemake that allows users to easily compose multiple external pipelines, while being able to extend and modify them on the fly.
12 Apr 2021 | Summer semester 2021 has started
During this semester, the “Algorithms for Sequence Analysis” course is offered by Sven Rahmann.
More information can be found on the course website.
Registration for UdS students is required via the Course Management System of Saarland Informatics Campus.
08 Apr 2021 | GAMIBHEAR: a tool for accurate haplotype reconstruction from GAM data
We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimize phasing.
GAMIBHEAR is available as an R package under the open-source GPL-2 license. The paper “GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data” has been published in the journal Bioinformatics.
Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. GAM’s ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools to investigate the effects of genetic variants on many cellular phenotypes including chromatin conformation, but require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes.
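To make the idea of aggregating allelic co-observations more concrete, here is a deliberately simplified sketch. It assumes (our simplification, not GAMIBHEAR's model) that each GAM nuclear profile reports the observed alleles at some heterozygous SNPs, and that alleles observed together in one profile tend to come from the same homologous chromosome. It counts allele combinations per SNP pair and then phases adjacent SNPs greedily by comparing cis and trans evidence. The actual tool uses a GAM-specific probabilistic model instead; the input format and all function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical input format: one dict per nuclear profile,
# mapping SNP index -> observed allele (0 = reference, 1 = alternative).
Profile = Dict[int, int]
PairCounts = Dict[Tuple[int, int], Counter]


def co_observation_counts(profiles: List[Profile]) -> PairCounts:
    """Count, for every SNP pair (i, j) with i < j, how often each
    allele combination (a_i, a_j) was observed in the same profile."""
    counts: PairCounts = {}
    for prof in profiles:
        for i, j in combinations(sorted(prof), 2):
            counts.setdefault((i, j), Counter())[(prof[i], prof[j])] += 1
    return counts


def greedy_phase(n_snps: int, counts: PairCounts) -> List[int]:
    """Return hap, where hap[k] is the allele of haplotype A at SNP k
    (haplotype B carries 1 - hap[k]). Only adjacent SNP pairs are used."""
    hap = [0] * n_snps  # convention: haplotype A carries ref at SNP 0
    for k in range(1, n_snps):
        c = counts.get((k - 1, k), Counter())
        cis = c[(0, 0)] + c[(1, 1)]    # ref-ref or alt-alt seen together
        trans = c[(0, 1)] + c[(1, 0)]  # ref-alt or alt-ref seen together
        # Cis evidence (ties and missing data included): haplotype A carries
        # the same allele type at SNP k as at SNP k-1; otherwise the opposite.
        hap[k] = hap[k - 1] if cis >= trans else 1 - hap[k - 1]
    return hap


if __name__ == "__main__":
    profiles = [
        {0: 0, 1: 0, 2: 1},
        {0: 1, 1: 1, 2: 0},
        {1: 0, 2: 1},
        {0: 0, 2: 1},
    ]
    print(greedy_phase(3, co_observation_counts(profiles)))  # [0, 0, 1]
```

In this toy input, SNPs 0 and 1 are always seen with matching allele types (cis), while SNPs 1 and 2 are seen with opposite ones (trans), so haplotype A carries ref, ref, alt.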
09 Mar 2021 | Bioinformatic analysis of neutrophils in chronic lymphocytic leukemia reveals functional defects
Our paper Proteomic and bioinformatic profiling of neutrophils in CLL reveals functional defects that predispose to bacterial infections is online.
Patients with chronic lymphocytic leukemia (CLL) typically suffer from frequent and severe bacterial infections. Although it is well known that neutrophils are critical innate immune cells facilitating the early defense, the underlying phenotypical and functional changes in neutrophils during CLL remain largely elusive. Using a murine adoptive transfer model of CLL, we demonstrate aggravated bacterial burden in CLL-bearing mice upon a urinary tract infection with uropathogenic Escherichia coli. Bioinformatic analyses of the neutrophil proteome revealed increased expression of proteins associated with interferon signaling and decreased protein expression associated with granule composition and neutrophil migration. Functional experiments validated these findings by showing reduced levels of myeloperoxidase and acidification of neutrophil granules after ex vivo phagocytosis of bacteria. Pathway enrichment analysis indicated decreased expression of molecules critical for neutrophil recruitment, and migration of neutrophils into the infected urinary bladder was significantly reduced. These altered migratory properties of neutrophils were also associated with reduced expression of CD62L and CXCR4 and correlated with an increased incidence of infections in patients with CLL.
In conclusion, this study describes a molecular signature of neutrophils through proteomic, bioinformatic, and functional analyses that are linked to a reduced migratory ability, potentially leading to increased bacterial infections in patients with CLL.
03 Feb 2021 | Pangenome Local Alignment Search Tool for detecting high scoring local alignments
Our paper Detecting high-scoring local alignments in pangenome graphs is about a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph.
Our approach additionally allows a comparison of similarity among sequences within the pangenome. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings. An implementation of our method is presented, and its performance and usability are shown. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. This is an advantage over classical methods that do not make use of sequence similarity within the pangenome.
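The parameter estimation mentioned above can be illustrated with a generic peaks-over-threshold fit. This is an assumption on our part, not necessarily the estimator used in the paper: scores of alignments assumed to be spurious (for example, from random or shuffled queries) that exceed a threshold t are modelled as t plus an exponentially distributed excess, whose rate λ has the closed-form maximum-likelihood estimate 1 / mean(S − t). The fitted tail then gives an approximate probability that a spurious alignment reaches a given score. Function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def fit_exponential_tail(null_scores: List[float], threshold: float) -> Tuple[float, float]:
    """Fit P(S >= x) ~= p_t * exp(-lam * (x - threshold)) for x > threshold.

    lam is the maximum-likelihood rate of the exceedances S - threshold,
    p_t is the fraction of null scores above the threshold.
    """
    excess = [s - threshold for s in null_scores if s > threshold]
    if not excess:
        raise ValueError("no null scores above the threshold")
    lam = len(excess) / sum(excess)  # exponential MLE: 1 / mean excess
    p_t = len(excess) / len(null_scores)
    return lam, p_t


def tail_pvalue(score: float, lam: float, p_t: float, threshold: float) -> float:
    """Approximate probability that a spurious alignment scores >= score."""
    if score <= threshold:
        return 1.0  # outside the fitted tail: report conservatively
    return p_t * math.exp(-lam * (score - threshold))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic null scores: 10 plus an exponential variable with rate 0.5.
    null = [10.0 + random.expovariate(0.5) for _ in range(10_000)]
    lam, p_t = fit_exponential_tail(null, threshold=10.0)
    print(f"estimated lambda: {lam:.3f}")  # should be close to 0.5
    print(f"P(S >= 30): {tail_pvalue(30.0, lam, p_t, 10.0):.2e}")
```

With real data, the null scores would come from alignments believed to be unrelated; a score whose tail probability stays small even after accounting for the number of alignments examined would then be a candidate for genuine homology.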
PLAST builds a compacted, colored de Bruijn graph from given input genomes using the API of Bifrost. Apart from the requirements of Bifrost (C++ and CMake), there are no further strict dependencies.
The source code and test data are available here: PLAST
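PLAST builds its graph through Bifrost; the toy sketch below does not use Bifrost's API and only illustrates, in plain Python, what a colored de Bruijn graph records: each k-mer (node) carries the set of input genomes (colors) in which it occurs, and edges are implicit between k-mers that overlap by k − 1 characters. For brevity it ignores reverse complements and does not compact non-branching paths into unitigs, both of which a real implementation such as Bifrost handles. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_colored_dbg(genomes: List[str], k: int) -> Dict[str, Set[int]]:
    """Map every k-mer to the set of genome indices (colors) containing it.

    Toy simplifications: forward strand only (no canonical k-mers) and
    no compaction of non-branching paths into unitigs.
    """
    colors: Dict[str, Set[int]] = defaultdict(set)
    for gid, seq in enumerate(genomes):
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(gid)
    return dict(colors)


def successors(kmer: str, graph: Dict[str, Set[int]]) -> List[str]:
    """Implicit out-edges: k-mers in the graph whose (k-1)-prefix equals
    the (k-1)-suffix of `kmer`."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in graph]


if __name__ == "__main__":
    g = build_colored_dbg(["ACGTA", "ACGTC"], k=3)
    for kmer in sorted(g):
        print(kmer, sorted(g[kmer]), successors(kmer, g))
```

In this example the two toy genomes share the k-mers ACG and CGT (colors [0, 1]) and the graph branches at CGT into GTA (color 0 only) and GTC (color 1 only); such shared nodes are what allow similar genomes in a pangenome to be stored once.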
31 Jan 2021 | Open PhD position in Research Training Group WisPerMed
Fully funded 3-year position (TV-L E13, 100%) for a doctoral student (m/f/d) in our Research Training Group WisPerMed (“Knowledge and data based personalization of medicine at the point of care”) in a research project on genome-wide variant data and its interpretation.
Use and learn Snakemake, Python, Rust, …
Develop software that will make a difference in clinical practice with a focus on reproducibility and code quality.
Help future doctors to access patient-related information quickly and in context.
Work with Johannes Köster and Sven Rahmann.
Please take a look at this poster and apply immediately via WisPerMed or the German version.
04 Nov 2020 | SARS-CoV-2 pre-formed immunity towards structural proteins not driven by similar epitopes
Our findings on SARS-CoV-2 viral proteins have been published in Scientific Reports (Nature). The paper Epitope similarity cannot explain the pre-formed T cell immunity towards structural SARS-CoV-2 proteins is online.
The current pandemic is caused by the SARS-CoV-2 virus, and considerable progress in understanding the pathology of the virus has been made since its emergence in late 2019. Several reports indicate short-lasting immunity against endemic coronaviruses, which contrasts with studies showing that biobanked venous blood contains T cells reactive to the SARS-CoV-2 S-protein even before the outbreak in Wuhan. This suggests a preformed T cell memory towards structural proteins in individuals not exposed to SARS-CoV-2. Given the similarity of SARS-CoV-2 to other members of the Coronaviridae family, the endemic coronaviruses appear likely candidates to generate this T cell memory. However, given the apparently poor immunological memory created by the endemic coronaviruses, immunity against other common pathogens might offer an alternative explanation. Here, we use a combination of epitope prediction and similarity to common human pathogens to identify potential sources of the SARS-CoV-2 T cell memory. Although beta-coronaviruses are the most likely candidates to explain the pre-existing SARS-CoV-2 reactive T cells in uninfected individuals, the SARS-CoV-2 epitopes with the highest similarity to those from beta-coronaviruses are confined to replication-associated proteins, not the host-interacting S-protein.
Thus, our study suggests that the observed SARS-CoV-2 pre-formed immunity to structural proteins is not driven by near-identical epitopes.
12 Oct 2020 | Machine learning approaches for diagnostic biopsies of non-small cell lung cancer patients
Our article entitled Machine learning reveals a PD-L1–independent prediction of response to immunotherapy of non-small cell lung cancer by gene expression context is published in European Journal of Cancer and now available online.
We set out to apply context-sensitive feature selection and machine learning approaches on expression profiles of immune-related genes in diagnostic biopsies of patients with stage IV NSCLC. We applied supervised machine learning methods for feature selection and generation of predictive models. Feature selection and model creation were based on a training cohort of 55 patients with recurrent NSCLC treated with PD-1/PD-L1 antibody therapy. Resulting models identified patients with superior outcomes to immunotherapy, as validated in two subsequently recruited, separate patient cohorts (n = 67, hazard ratio = 0.46, p = 0.035). The predictive information obtained from these models was orthogonal to PD-L1 expression as per immunohistochemistry: Selecting by PD-L1 positivity at immunohistochemistry plus model prediction identified patients with highly favourable outcomes. Visualisation of the models revealed the predictive superiority of the entire 7-gene context over any single gene.
Using context-sensitive assays and bioinformatics capturing the tumour immune context allows precise prediction of response to PD-1/PD-L1-directed immunotherapy in NSCLC.
20 Sep 2020 | Unique germline-specific ageing pattern found in healthy men
Our paper A germ cell-specific ageing pattern in otherwise healthy men is online.
Life-long sperm production leads to the assumption that male fecundity remains unchanged throughout life. However, recently it was shown that paternal age has profound consequences for male fertility and offspring health. Paternal age effects are caused by an accumulation of germ cell mutations over time, causing severe congenital diseases. Apart from these well-described cases, molecular patterns of ageing in germ cells and their impact on DNA integrity have not been studied in detail.
In this study, we aimed to assess the effects of ‘pure’ ageing on male reproductive health and germ cell quality. We assembled a cohort of 198 healthy men (18–84 years) for which end points such as semen and hormone profiles, sexual health and well-being, and sperm DNA parameters were evaluated. Sperm production and hormonal profiles were maintained at physiological levels over a period of six decades. In contrast, we identified a germ cell-specific ageing pattern characterized by a steady increase of telomere length in sperm and a sharp increase in sperm DNA instability, particularly after the sixth decade. Importantly, we found sperm DNA methylation changes in 236 regions, mostly nearby genes associated with neuronal development. By in silico analysis, we found that 10 of these regions are located in loci which can potentially escape the first wave of genome-wide demethylation after fertilization.
In conclusion, human male germ cells present a unique germline-specific ageing process, which likely results in diminished fecundity in elderly men and poorer health prognosis for their offspring.
17 Sep 2020 | Genomic analysis of pathogenic Microbotryum species
Our article Meiotic recombination in the offspring of Microbotryum hybrids and its impact on pathogenicity is published in BMC Evolutionary Biology.
Here, we performed experimental crosses between the two pathogenic Microbotryum species, M. lychnidis-dioicae and M. silenes-acaulis that are specialized to different hosts. The resulting offspring were analyzed on phenotypic and genomic levels to describe genomic characteristics of hybrid offspring and genetic factors likely involved in host-specialization.
Genomic analyses of interspecific fungal hybrids revealed that individuals were most viable if the majority of loci were inherited from one species. Interestingly, species-specific loci were strictly controlled by the species’ origin of the mating type locus. Moreover, we detected signs of crossing over and chromosome duplications in the genomes of the analyzed hybrids. In Microbotryum, mitochondrial DNA was found to be uniparentally inherited from the a2 mating type. Genome comparison revealed that most gene families are shared and the majority of genes are conserved between the two species, indicating very similar biological features, including infection and pathogenicity processes. Moreover, we detected 211 candidate genes that were retained under host-driven selection of backcrossed lines. These genes might therefore either play a crucial role in host specialization or be linked to genes that are essential for specialization.
This study identifies genetic factors of host specialization that are required for successful biotrophic infection in the post-zygotic stage, but also demonstrates the strong influence of intra-genomic conflicts or instabilities on the viability of hybrids in the haploid host-independent stage.