The Algorithmic Bioinformatics group at Saarland University is headed by Prof. Sven Rahmann. It belongs to both the Mathematics and Informatics Faculty (“MI”) and the Center for Bioinformatics (ZBI) and is part of Saarland Informatics Campus (SIC). Our research focuses on method development (algorithms and data structures) for concrete problems that arise in biological data analysis. We mainly teach in the Bioinformatics degree programs.


Recent News

19 Oct 2021 | Busy Beaver Award for "Algorithms for Sequence Analysis" lecture

The summer term lecture “Algorithms for Sequence Analysis” by Sven Rahmann has received a Busy Beaver Award from the Student Council. This award is given to lectures that were well received and earned very positive evaluations from students.

18 Oct 2021 | Winter semester 2021/22 is starting

During the winter semester, Prof. Sven Rahmann is offering the following courses:

  • Lecture and tutorials in “Statistics, Probability and Applications in Bioinformatics”. More information is on the website, or directly in the SIC CMS, where you also need to register for the course with your UdS student account.

  • Master Seminar “Current Topics in Sequence Analysis”, for students who have previously passed the “Algorithms for Sequence Analysis” course. More information is on the website, or directly in the SIC Seminar System, where you choose your preferred seminar(s). Note that you need to go through the assignment process and cannot directly register for the seminar.

09 Sep 2021 | Hashing Tutorial at GCB 2021

Jens Zentgraf and Sven Rahmann gave a tutorial on modern hashing methods for alignment-free (k-mer based) sequence analysis at the German Conference on Bioinformatics (GCB) 2021 online.

The slides are available online:

  1. Introduction: k-mers and alignment-free methods
  2. Hashing: Hash functions and collision resolution strategies
  3. Multi-way bucketed Cuckoo hashing
  4. Performance engineering
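
To make the topics of the tutorial more concrete, here is a minimal sketch of a multi-way bucketed Cuckoo hash table for counting canonical k-mers. It is a toy illustration, not the code from the tutorial: the class name `BucketedCuckooTable`, the linear hash functions, the bucket size, and the random-walk eviction limit are all assumptions chosen for brevity.

```python
import random

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}
PRIME = (1 << 61) - 1  # modulus for the (assumed) linear hash functions


def encode(kmer):
    """Pack a DNA k-mer into an integer using 2 bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | ENC[base]
    return code


def canonical_code(kmer):
    """Smaller of the codes of a k-mer and its reverse complement."""
    rc = "".join(COMP[b] for b in reversed(kmer))
    return min(encode(kmer), encode(rc))


class BucketedCuckooTable:
    """Toy table: each key may live in one of n_hashes buckets."""

    def __init__(self, n_buckets, bucket_size=4, n_hashes=3,
                 max_walk=500, seed=0):
        self.n_buckets = n_buckets
        self.bucket_size = bucket_size
        self.max_walk = max_walk
        self.rng = random.Random(seed)
        # One (a, b) pair per hash function: h(key) = (a*key + b) mod PRIME mod n_buckets
        self.params = [(self.rng.randrange(1, PRIME), self.rng.randrange(PRIME))
                       for _ in range(n_hashes)]
        self.buckets = [[] for _ in range(n_buckets)]  # lists of (key, value)

    def _choices(self, key):
        return [((a * key + b) % PRIME) % self.n_buckets for a, b in self.params]

    def get(self, key, default=None):
        for b in self._choices(key):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return default

    def insert(self, key, value):
        # Overwrite the value if the key is already stored.
        for b in self._choices(key):
            bucket = self.buckets[b]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
        item = (key, value)
        for _ in range(self.max_walk):
            choices = self._choices(item[0])
            for b in choices:
                if len(self.buckets[b]) < self.bucket_size:
                    self.buckets[b].append(item)
                    return
            # All candidate buckets are full: evict a random resident
            # and continue the walk with the evicted item.
            b = self.rng.choice(choices)
            slot = self.rng.randrange(self.bucket_size)
            item, self.buckets[b][slot] = self.buckets[b][slot], item
        # The item held at this point is dropped; a real table would rebuild.
        raise RuntimeError("table too full; rebuild with more buckets")


# Usage: count canonical 5-mers of a short sequence.
table = BucketedCuckooTable(n_buckets=64)
seq, k = "ACGTACGTTGCAACGT", 5
for i in range(len(seq) - k + 1):
    code = canonical_code(seq[i:i + k])
    table.insert(code, table.get(code, 0) + 1)
```

Because every key has several candidate buckets and every bucket holds several items, such tables can be filled to a high load before insertions start to fail, while a lookup only ever inspects a small, fixed number of buckets.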

26 Jun 2021 | Saarland University Open Day / Open House

Saarland University Open House is taking place virtually on Saturday, June 26, 2021. All departments offer insights into their research and provide information about their study programs. We present a talk “Genome assembly as a bioinformatics puzzle with billions of pieces” for the interested public. There is also a real puzzle to download. Enjoy!

19 Apr 2021 | Sustainable Data Analysis with Snakemake

The latest version of the paper “Sustainable data analysis with Snakemake” is out.

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.

In this latest version, we have clarified several claims in the readability analysis. Further, we have extended the description of the scheduling to also cover running Snakemake on cluster and cloud middleware. We have extended the description of the automatic code linting and formatting provided with Snakemake. Finally, we have extended the text to cover workflow modules, a new feature of Snakemake that makes it easy to compose multiple external pipelines while still being able to extend and modify them on the fly.

12 Apr 2021 | Summer semester 2021 has started

During this semester, the “Algorithms for Sequence Analysis” course is offered by Sven Rahmann.

More information can be found on the course website. Registration for UdS students is required via the Course Management System of Saarland Informatics Campus.

08 Apr 2021 | GAMIBHEAR: a tool for accurate haplotype reconstruction from GAM data

We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimize phasing accuracy.

GAMIBHEAR is available as an R package under the open-source GPL-2 license. The paper, “GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data”, has been published in the journal Bioinformatics.

Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. GAM’s ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools to investigate the effects of genetic variants on many cellular phenotypes including chromatin conformation, but require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes.

01 Apr 2021 | Algorithmic Bioinformatics group at Saarland University

As of today, Prof. Dr. Sven Rahmann heads the new Algorithmic Bioinformatics group at Saarland University. The group belongs to both the Mathematics and Informatics Faculty (“MI”) and the Center for Bioinformatics (ZBI) at Saarland University and is part of Saarland Informatics Campus (SIC).

The research focus will gradually shift more towards algorithm development and methodological research. We look forward to exciting new projects and collaborations, and we are also happy to continue and conclude ongoing projects at University Alliance Ruhr. This website will remain the website of the lab, and the content will be gradually adapted towards the new profile.

09 Mar 2021 | Bioinformatic analysis of neutrophils in chronic lymphocytic leukemia reveals functional defects

Our paper “Proteomic and bioinformatic profiling of neutrophils in CLL reveals functional defects that predispose to bacterial infections” is online.

Patients with chronic lymphocytic leukemia (CLL) typically suffer from frequent and severe bacterial infections. Although it is well known that neutrophils are critical innate immune cells facilitating the early defense, the underlying phenotypical and functional changes in neutrophils during CLL remain largely elusive. Using a murine adoptive transfer model of CLL, we demonstrate aggravated bacterial burden in CLL-bearing mice upon a urinary tract infection with uropathogenic Escherichia coli. Bioinformatic analyses of the neutrophil proteome revealed increased expression of proteins associated with interferon signaling and decreased protein expression associated with granule composition and neutrophil migration. Functional experiments validated these findings by showing reduced levels of myeloperoxidase and acidification of neutrophil granules after ex vivo phagocytosis of bacteria. Pathway enrichment analysis indicated decreased expression of molecules critical for neutrophil recruitment, and migration of neutrophils into the infected urinary bladder was significantly reduced. These altered migratory properties of neutrophils were also associated with reduced expression of CD62L and CXCR4 and correlated with an increased incidence of infections in patients with CLL.

In conclusion, this study describes a molecular signature of neutrophils through proteomic, bioinformatic, and functional analyses that are linked to a reduced migratory ability, potentially leading to increased bacterial infections in patients with CLL.

03 Feb 2021 | Pangenome Local Alignment Search Tool for detecting high scoring local alignments

Our paper “Detecting high-scoring local alignments in pangenome graphs” presents a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph.

Our approach additionally allows a comparison of similarity among sequences within the pangenome. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings. An implementation of our method is presented, and its performance and usability are shown. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. This is an advantage over classical methods that do not make use of sequence similarity within the pangenome.

PLAST builds a compacted, colored de Bruijn graph from given input genomes using the API of Bifrost. Apart from the requirements of Bifrost (C++ and CMake), there are no further strict dependencies. The source code and test data are available here: PLAST
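
For readers unfamiliar with the term, the following sketch shows what a maximum local alignment score is in the simplest setting: two plain sequences, scored with the classical Smith-Waterman recurrence. This is deliberately not the PLAST heuristic, which works on a compacted colored de Bruijn graph; the scoring parameters (`match`, `mismatch`, `gap`) are arbitrary assumptions for illustration.

```python
def local_alignment_score(query, target, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of two sequences (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        cur = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


print(local_alignment_score("ACGT", "TTACGTTT"))  # 4: the query matches exactly
```

Scores of this kind, computed against unrelated sequences, form the background distribution whose tail has to be modeled in order to separate homologous hits from spurious ones.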

31 Jan 2021 | Open PhD position in Research Training Group WisPerMed

Fully-funded 3-year position (TV-L E13, 100%) for a doctoral student (m/f/d) in our Research Training Group WisPerMed (“Knowledge and data based personalization of medicine at the point of care”) in a research project on genome-wide variant data and its interpretation. Use and learn Snakemake, Python, Rust, … Develop software that will make a difference in clinical practice with a focus on reproducibility and code quality. Help future doctors to access patient-related information quickly and in context. Work with Johannes Köster and Sven Rahmann. Please take a look at this poster and apply immediately via WisPerMed or the German version.

04 Nov 2020 | SARS-CoV-2 pre-formed immunity towards structural proteins not driven by similar epitopes

Our findings on SARS-CoV-2 viral proteins have been published in Scientific Reports (Nature). The paper “Epitope similarity cannot explain the pre-formed T cell immunity towards structural SARS-CoV-2 proteins” is online.

The current pandemic is caused by the SARS-CoV-2 virus and large progress in understanding the pathology of the virus has been made since its emergence in late 2019. Several reports indicate short lasting immunity against endemic coronaviruses, which contrasts with studies showing that biobanked venous blood contains T cells reactive to SARS-CoV-2 S-protein even before the outbreak in Wuhan. This suggests a preformed T cell memory towards structural proteins in individuals not exposed to SARS-CoV-2. Given the similarity of SARS-CoV-2 to other members of the Coronaviridae family, the endemic coronaviruses appear likely candidates to generate this T cell memory. However, given the apparent poor immunological memory created by the endemic coronaviruses, immunity against other common pathogens might offer an alternative explanation. Here, we utilize a combination of epitope prediction and similarity to common human pathogens to identify potential sources of the SARS-CoV-2 T cell memory. Although beta-coronaviruses are the most likely candidates to explain the pre-existing SARS-CoV-2 reactive T cells in uninfected individuals, the SARS-CoV-2 epitopes with the highest similarity to those from beta-coronaviruses are confined to replication-associated proteins, not the host-interacting S-protein.

Thus, our study suggests that the observed SARS-CoV-2 pre-formed immunity to structural proteins is not driven by near-identical epitopes.

12 Oct 2020 | Machine learning approaches for diagnostic biopsies of non-small cell lung cancer patients

Our article “Machine learning reveals a PD-L1–independent prediction of response to immunotherapy of non-small cell lung cancer by gene expression context” has been published in the European Journal of Cancer and is now available online.

We set out to apply context-sensitive feature selection and machine learning approaches on expression profiles of immune-related genes in diagnostic biopsies of patients with stage IV NSCLC. We applied supervised machine learning methods for feature selection and generation of predictive models. Feature selection and model creation were based on a training cohort of 55 patients with recurrent NSCLC treated with PD-1/PD-L1 antibody therapy. Resulting models identified patients with superior outcomes to immunotherapy, as validated in two subsequently recruited, separate patient cohorts (n = 67, hazard ratio = 0.46, p = 0.035). The predictive information obtained from these models was orthogonal to PD-L1 expression as per immunohistochemistry: Selecting by PD-L1 positivity at immunohistochemistry plus model prediction identified patients with highly favourable outcomes. Visualisation of the models revealed the predictive superiority of the entire 7-gene context over any single gene.

Using context-sensitive assays and bioinformatics capturing the tumour immune context allows precise prediction of response to PD-1/PD-L1-directed immunotherapy in NSCLC.

20 Sep 2020 | Unique germline-specific ageing pattern found in healthy men

Our paper “A germ cell-specific ageing pattern in otherwise healthy men” is online.

Life-long sperm production leads to the assumption that male fecundity remains unchanged throughout life. However, recently it was shown that paternal age has profound consequences for male fertility and offspring health. Paternal age effects are caused by an accumulation of germ cell mutations over time, causing severe congenital diseases. Apart from these well-described cases, molecular patterns of ageing in germ cells and their impact on DNA integrity have not been studied in detail.

In this study, we aimed to assess the effects of ‘pure’ ageing on male reproductive health and germ cell quality. We assembled a cohort of 198 healthy men (18–84 years) for which end points such as semen and hormone profiles, sexual health and well-being, and sperm DNA parameters were evaluated. Sperm production and hormonal profiles were maintained at physiological levels over a period of six decades. In contrast, we identified a germ cell-specific ageing pattern characterized by a steady increase of telomere length in sperm and a sharp increase in sperm DNA instability, particularly after the sixth decade. Importantly, we found sperm DNA methylation changes in 236 regions, mostly nearby genes associated with neuronal development. By in silico analysis, we found that 10 of these regions are located in loci which can potentially escape the first wave of genome-wide demethylation after fertilization.

In conclusion, human male germ cells present a unique germline-specific ageing process, which likely results in diminished fecundity in elderly men and poorer health prognosis for their offspring.

17 Sep 2020 | Genomic analysis of pathogenic Microbotryum species

Our article “Meiotic recombination in the offspring of Microbotryum hybrids and its impact on pathogenicity” has been published in BMC Evolutionary Biology.

Here, we performed experimental crosses between the two pathogenic Microbotryum species, M. lychnidis-dioicae and M. silenes-acaulis, that are specialized to different hosts. The resulting offspring were analyzed on phenotypic and genomic levels to describe genomic characteristics of hybrid offspring and genetic factors likely involved in host-specialization.

Genomic analyses of interspecific fungal hybrids revealed that individuals were most viable if the majority of loci were inherited from one species. Interestingly, species-specific loci were strictly controlled by the species’ origin of the mating type locus. Moreover, we detected signs of crossing over and chromosome duplications in the genomes of the analyzed hybrids. In Microbotryum, mitochondrial DNA was found to be uniparentally inherited from the a2 mating type. Genome comparison revealed that most gene families are shared and the majority of genes are conserved between the two species, indicating very similar biological features, including infection and pathogenicity processes. Moreover, we detected 211 candidate genes that were retained under host-driven selection of backcrossed lines. These genes might therefore either play a crucial role in host specialization or be linked to genes that are essential for specialization.

This study manifests genetic factors of host specialization that are required for successful biotrophic infection of the post-zygotic stage, but also demonstrates the strong influence of intra-genomic conflicts or instabilities on the viability of hybrids in the haploid host-independent stage.

29 May 2020 | New preprint and software "ting" for TCR repertoire clustering

Our preprint “Rapid T cell receptor interaction grouping with ting” is online. Clustering of antigen-specific T cell receptor repertoire (TCRR) sequences remains challenging. While established tools like gliph aim to solve this problem, they suffer from several shortcomings, including poor performance on huge repertoires, non-determinism, and the potential loss of significant antigen-specific sequences or inclusion of too many unspecific sequences. “ting” solves these issues by applying an efficient algorithm for identifying antigen-specific k-mers based on Fisher’s Exact Test. This allows fast processing of large scale repertoires and an improved differentiation between naive and specific TCR3b sequences.

The full paper has been submitted for review.
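
As a rough illustration of the idea of testing k-mers for antigen specificity with Fisher’s Exact Test, the sketch below counts in how many sequences of a “specific” set and a “naive” set each k-mer occurs and computes a one-sided p-value from the resulting 2×2 table. It is not the ting implementation: the function names, the significance threshold `alpha`, and the example sequences are made up for this sketch, and a real analysis would also need a correction for multiple testing.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher p-value for the table [[a, b], [c, d]]: P(X >= a)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: sum the probabilities of all tables at least as extreme.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, upper + 1))
    return tail / comb(n, col1)


def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def enriched_kmers(specific, naive, k=3, alpha=0.05):
    """k-mers found in significantly more specific than naive sequences."""
    spec_sets = [kmer_set(s, k) for s in specific]
    naive_sets = [kmer_set(s, k) for s in naive]
    result = {}
    for kmer in set().union(*spec_sets):
        a = sum(kmer in s for s in spec_sets)    # specific sequences with the k-mer
        c = sum(kmer in s for s in naive_sets)   # naive sequences with the k-mer
        p = fisher_exact_greater(a, len(specific) - a, c, len(naive) - c)
        if p <= alpha:
            result[kmer] = p
    return result


# Hypothetical toy repertoires (not real data).
specific = ["CASSQGAF", "CASTQGAY", "CASRQGAF", "CASQGAEF"]
naive = ["CASSLTDF", "CASSPRGY", "CASKLNEF", "CASSYEQF"]
print(enriched_kmers(specific, naive))
```

In this toy data, the 3-mer `QGA` occurs in all four specific sequences and in none of the naive ones, whereas `CAS` occurs everywhere and is therefore not reported.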

16 May 2020 | New paper and software "xengsort" for xenograft sorting

Finally, our preprint “Fast lightweight accurate xenograft sorting” is online. Xenograft sorting classifies the (paired-end or single-end) reads of a xenograft sample according to species of origin. A typical application concerns sequenced samples from patient-derived xenografts (PDX; tumors extracted from human patients and implanted into mice), where the reads have to be classified into human reads and mouse reads (and, possibly, reads that could originate from both species, reads from neither species, and ambiguous reads). We have developed an alignment-free approach based on 3-way bucketed Cuckoo hashing. Our tool “xengsort” is faster by a factor of 4 than existing alignment-free tools on typical PDX datasets.

A poster about this work will be presented at ISMB HiTSeq. The full paper has been submitted for review.
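
The principle of alignment-free xenograft sorting can be sketched in a few lines: index every k-mer of both references with a species label, then classify each read by the labels of its k-mers. The sketch below uses a plain Python dictionary instead of a bucketed Cuckoo hash table, ignores reverse complements, and applies very simple decision rules; these rules, the function names, and the toy reference sequences are assumptions for illustration and do not reproduce xengsort's actual classification.

```python
from collections import Counter


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(host_ref, graft_ref, k):
    """Map each reference k-mer to 'host', 'graft', or 'both'."""
    index = {km: "host" for km in kmers(host_ref, k)}
    for km in kmers(graft_ref, k):
        seen_in_host = index.get(km) in ("host", "both")
        index[km] = "both" if seen_in_host else "graft"
    return index


def classify_read(read, index, k):
    """Toy decision rules based on the species labels of a read's k-mers."""
    counts = Counter(index.get(km, "unknown") for km in kmers(read, k))
    host, graft, both = counts["host"], counts["graft"], counts["both"]
    if host == 0 and graft == 0 and both == 0:
        return "neither"
    if host > 0 and graft == 0:
        return "host"
    if graft > 0 and host == 0:
        return "graft"
    if host == 0 and graft == 0:
        return "both"      # only k-mers shared by both species
    return "ambiguous"     # evidence for both species


# Hypothetical toy references (in a PDX setting: host = mouse, graft = human).
host_ref = "ACGTACGTAGCTAGCTAGGA"
graft_ref = "TTGACCATGGCATGCCATGA"
index = build_index(host_ref, graft_ref, k=5)
for read in ["GTACGTAGCT", "CCATGGCATG", "GGGGGGGGGG", "ACGTACCATGG"]:
    print(read, classify_read(read, index, k=5))
```

Each read costs only one table lookup per k-mer, which is why the speed and memory footprint of the underlying hash table dominate the performance of this kind of tool.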

01 May 2020 | wg-blimp: new article on an analysis pipeline for whole genome bisulfite sequencing data

Our article “wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data” has been published in BMC Bioinformatics.

We developed wg-blimp (whole genome bisulfite sequencing methylation analysis pipeline) as an end-to-end pipeline to ease whole genome bisulfite sequencing data analysis. It integrates established algorithms for alignment, quality control, methylation calling, detection of differentially methylated regions, and methylome segmentation, requiring only a reference genome and raw sequencing data as input. We improve on previous pipelines by providing a more comprehensive analysis workflow as well as an interactive user interface.

We demonstrated its applicability by analysing multiple publicly available datasets. Thus, wg-blimp is a relevant alternative to previous analysis pipelines and may facilitate future epigenetic research.

21 Apr 2020 | Fused lasso paper accepted at SEA 2020

Elias’ and Sven’s paper “Engineering Fused Lasso Solvers on Trees” was accepted at the 18th Symposium on Experimental Algorithms (SEA 2020), which will be held as an online conference from June 16 till June 18. Our paper presents two practically efficient algorithms for solving fused lasso problems on tree graphs with general weights for nodes and edges, even zero weights, which other algorithms cannot easily handle.

We hope to see you at SEA’20 for our presentation. Also, the TreeLas software is available on our Software page.
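
To illustrate what “weights for nodes and edges” means here, the sketch below evaluates a weighted fused lasso objective on a rooted tree in its commonly used form: a weighted squared-error term per node plus a weighted absolute difference per tree edge. The notation (`node_weight`, `edge_weight`, a `parent` array with `-1` for the root) is assumed for this example and is not taken from the paper; the function only evaluates the objective and does not implement either of the solvers.

```python
def fused_lasso_objective(x, y, parent, node_weight, edge_weight):
    """0.5 * sum_i mu_i (x_i - y_i)^2 + sum_i lambda_i |x_i - x_parent(i)|.

    parent[i] is the parent of node i (-1 for the root);
    edge_weight[i] is the weight of the edge between i and parent[i].
    """
    fit = 0.5 * sum(mu * (xi - yi) ** 2
                    for mu, xi, yi in zip(node_weight, x, y))
    penalty = sum(edge_weight[i] * abs(x[i] - x[parent[i]])
                  for i in range(len(x)) if parent[i] >= 0)
    return fit + penalty


# A root (node 0) with two children; the entry edge_weight[0] is unused.
y = [1.0, 1.0, 5.0]
parent = [-1, 0, 0]
node_weight = [1.0, 1.0, 1.0]
edge_weight = [0.0, 2.0, 2.0]

print(fused_lasso_objective([1.0, 1.0, 5.0], y, parent, node_weight, edge_weight))  # 8.0
print(fused_lasso_objective([1.0, 1.0, 3.0], y, parent, node_weight, edge_weight))  # 6.0
```

Pulling the third node towards its parent lowers the objective from 8.0 to 6.0 in this example: the edge penalty rewards neighbouring nodes for taking similar values, at the cost of a slightly worse fit.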

20 Apr 2020 | Online teaching and exams from 20 April 2020

Following the decisions of the state government of North Rhine-Westphalia (NRW) and of the individual universities in NRW, teaching takes place online from 20 April. This applies in particular to “Algorithmen auf Sequenzen” (Algorithms on Sequences) by Prof. Rahmann and to the preliminary meeting for the seminar “Aktuelle Themen der Bioinformatik” (Current Topics in Bioinformatics). In addition, outstanding exams (e.g., for “Algorithmische Bioinformatik”, Algorithmic Bioinformatics) will be held online.

06 Mar 2020 | Genome Informatics presenting at SIGOPT 2020

Two researchers from the Genome Informatics group attended the SIGOPT 2020 meeting in Dortmund and presented their work on continuous and combinatorial optimization problems in bioinformatics.

Elias presented his fast solver for the fused lasso problem on tree graphs (joint work with Sven).

Jens gave a talk about joint work with Henning and Sven, titled “Cost-optimal assignment of elements in genome-scale multi-way bucketed Cuckoo hash tables”.

The bioinformatics session on Friday further included talks by David Blumenthal (Munich) on median graphs and by Sven Schrinner (Düsseldorf) on polyploid phasing.

04 Feb 2020 | Genome Informatics presenting at DSB 2020

Researchers from the Genome Informatics group attended the Data Structures in Bioinformatics meeting (DSB 2020) in Rennes and presented their work.

Jens Zentgraf gave a talk about joint work with Henning Timm and Sven Rahmann, titled “Cost-optimal assignment of elements in genome-scale multi-way bucketed Cuckoo hash tables”.

Sven Rahmann presented joint work with Jens Zentgraf on “Faster xenograft sorting with 3-way bucketed Cuckoo hash tables”.

Even though it was raining in Rennes and Air France wasn’t able to fly from Düsseldorf to Rennes in under 9 hours, the meeting was a lot of fun and showcased many interesting new results.
