Subsequence combinatorics

Investigators: Sven Rahmann
Collaborators: Cees Elzinga & Hui Wang
Funding: internal
Software: SubSeqComb

We have investigated several combinatorial problems (enumeration problems) related to subsequences within a sequence.

A subsequence is obtained from a string by deleting any number of characters; thus in contrast to a substring, a subsequence is not necessarily a contiguous part of the string. Counting subsequences under various constraints has become relevant to biological sequence analysis, to machine learning, to the analysis of categorical time series in the social sciences, and to the theory of word complexity. We have investigated properties of subsequences that lead to efficient dynamic programming algorithms to count

distinct subsequences in a string,
distinct common subsequences of two strings,
matching joint subsequence embeddings in two strings,
distinct subsequences with a given minimum span,
sequences generated by a string allowing characters to come in runs of a length that is bounded from above.
sequences whose longest increasing subsequence has a given length.

Articles

Cees Elzinga, Sven Rahmann, Hui Wang: Algorithms for Subsequence Combinatorics. Theoretical Computer Science 409(3): 394–404, 28 December 2008.
Sven Rahmann: Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms. CPM 2006, LNCS 4009: 153-164.

Algorithmic Bioinformatics, SIC, Saarland University | Privacy notice | Legal notice