Algorithmic Approaches to the String Barcoding Problem

Authors

  • Günther Raidl
  • Philipp Neuner
Abstract

This thesis presents a heuristic approach, based on Lagrangian relaxation, to the string barcoding (SB) problem, a close relative of the well-known combinatorial set cover (SC) problem. The SB problem has recently been proven NP-hard and has many real-world applications, particularly in medicine and biology. Given a set of sequences over some alphabet, DNA for instance, we aim to find a set of short sequences, so-called probes, such that an unknown sample sequence can be identified as one of the input sequences by determining which probes are subsequences of the sample and which are not. The task is twofold: determining all possible probes, and selecting a suitable subset of minimum cardinality. The problem has been studied under various other names; in this form it was introduced by Rash and Gusfield in 2002. They proposed an exact approach based on integer linear programming, using suffix trees to generate a complete, nonredundant set of candidate probes. We evaluated several approaches for the SB problem as well as the SC problem. One of the leading heuristics for the SC problem, based on Lagrangian relaxation, was proposed by Caprara et al. in 1999. We adapted this algorithm to see whether it works equally well on the structurally very similar SB problem. Though the results we obtained are somewhat mixed, the heuristic shows its strength on very complex instances, where it delivers much better results than simpler heuristics.
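The two-part structure described in the abstract (generate candidate probes, then select a small distinguishing subset) can be illustrated with a minimal Python sketch. This is not the Lagrangian heuristic of the thesis, nor the integer programming approach of Rash and Gusfield; it is a plain greedy set-cover-style baseline of the kind the abstract calls a "simpler heuristic". It assumes that a probe matches when it occurs as a contiguous substring of a sequence, and it enumerates substrings by brute force rather than with a suffix tree. The function names `candidate_probes`, `greedy_barcode`, and `barcode`, and the `max_len` parameter, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def candidate_probes(sequences, max_len):
    """Enumerate substrings up to max_len, one probe per occurrence pattern.

    Assumption: a probe "matches" a sequence if it is a contiguous substring.
    """
    by_signature = {}
    for length in range(1, max_len + 1):  # shorter probes are found first
        for seq in sequences:
            for start in range(len(seq) - length + 1):
                probe = seq[start:start + length]
                signature = tuple(probe in s for s in sequences)
                if all(signature):
                    continue  # occurs everywhere, distinguishes nothing
                # Probes with the same signature are redundant; keep the first.
                by_signature.setdefault(signature, probe)
    return by_signature


def greedy_barcode(sequences, max_len=3):
    """Greedily pick probes until every pair of sequences is distinguished."""
    probes = candidate_probes(sequences, max_len)
    # Each unordered pair of sequences is an element that must be "covered".
    uncovered = set(combinations(range(len(sequences)), 2))
    chosen = []
    while uncovered:
        best_sig, best_gain = None, 0
        for sig in probes:
            gain = sum(1 for i, j in uncovered if sig[i] != sig[j])
            if gain > best_gain:
                best_sig, best_gain = sig, gain
        if best_sig is None:
            raise ValueError("sequences cannot be distinguished by these probes")
        chosen.append(probes[best_sig])
        uncovered = {(i, j) for i, j in uncovered if best_sig[i] == best_sig[j]}
    return chosen


def barcode(sample, probes):
    """Barcode of a sample: which of the chosen probes occur in it."""
    return tuple(p in sample for p in probes)


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TCGA", "TTGA"]
    selected = greedy_barcode(seqs, max_len=2)
    for s in seqs:
        print(s, barcode(s, selected))
```

If `greedy_barcode` returns without raising, every input sequence receives a distinct barcode, so an unknown sample known to be one of the inputs can be identified by computing its barcode and looking it up. The greedy rule gives no guarantee of minimum cardinality, which is the gap that stronger methods such as Lagrangian-relaxation heuristics are meant to narrow.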


Similar resources

Algorithmic Perspectives of the String Barcoding Problems

1.1 INTRODUCTION Let Σ be a finite alphabet. A string is a concatenation of elements of Σ. The length of a string x, denoted by |x|, is the number of characters that constitute this string. Let S be a set of strings over Σ. The simplest "binary-valued version" of the string barcoding problem discussed in this chapter is defined as follows [3, 17]: Problem name: String barcoding problem (S...


The String Barcoding Problem

In this paper we consider an approach to solve the string barcoding problem. This approach is based on an explicit reduction from the problem to the satisfiability problem.


Semi-local String Comparison: Algorithmic Techniques and Applications

The longest common subsequence (LCS) problem is a classical problem in computer science. The semi-local LCS problem is a generalisation of the LCS problem, arising naturally in the context of string comparison. Apart from playing an important role in string algorithms, this problem turns out to have surprising connections with computational geometry, algebra, graph theory, as well as applicatio...


Highly Scalable Algorithms for Robust String Barcoding

String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper, we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size, on a well-equipped workstation. Experimental results on both...


Fast Kernel Methods for SVM Sequence Classifiers

In this work we study string kernel methods for sequence analysis and focus on the problem of species-level identification based on short DNA fragments known as barcodes. We introduce efficient sorting-based algorithms for exact string k-mer kernels and then describe a divide-and-conquer technique for kernels with mismatches. Our algorithm for the mismatch kernel matrix computation improves cur...




Publication date: 2007