CS 241 Data Structures -- Last Lab -- DNA sequence matching, Due 4/27/2017

This simple bioinformatics exercise will help cement your knowledge of dynamic programming. Given n sequences of residues (e.g. ACGT, GTATA, ..., CATAT), your task is to compute the best match for each pair, i.e. the most likely original alignment under the assumption that both sequences are descendants of a common ancestral string. Then report which pair is the most closely related (along with its match score).

Details

Use the simple algorithm presented in class (from here) with the values:

Each sequence will be stored in a file here. Your program should read in all the sequences, compute the best match for each pair, and display that value along with the alignment that produces it. For example, if the first two sequences were AT and CAT, it would display

   value=0
   
   -AT
   CAT
Or, if you were being fancy, you might make it look just like the Wikipedia alignments, i.e.:
   -AT
   |||
   CAT
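
As a rough sketch of the table-filling and traceback steps (not necessarily the exact algorithm from class), the Python below fills a dynamic programming table and then walks back from the bottom-right corner to recover one best alignment. The scores match=1, mismatch=-1, gap=-2 are placeholder assumptions, chosen only because they reproduce value=0 for AT and CAT; substitute the values given for the lab. The function name align is hypothetical.

```python
# Placeholder scores (assumptions, not the values from the assignment).
MATCH, MISMATCH, GAP = 1, -1, -2


def align(a, b, match=MATCH, mismatch=MISMATCH, gap=GAP):
    """Return (value, aligned_a, aligned_b) for one best global alignment."""

    def sub(x, y):
        # Score for placing residue x opposite residue y.
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best value for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair the residues
                score[i - 1][j] + gap,                          # a[i-1] opposite a gap
                score[i][j - 1] + gap,                          # b[j-1] opposite a gap
            )

    # Trace back from the bottom-right corner to rebuild the alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


value, top, bottom = align("AT", "CAT")
print(f"value={value}")
print(top)
print(bottom)
```

With these placeholder scores, the example prints value=0 followed by -AT and CAT. When several alignments tie for the best value, this traceback returns just one of them.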
Once you can correctly calculate the values for each pair, select the pair with the highest value, and display it (along with its alignment).
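
One possible way to organize the pairwise comparison is sketched below in Python. It scores every unordered pair and keeps the highest value. The scores are again placeholder assumptions, and the helper names (match_value, read_sequences, best_pair) are hypothetical. The sketch also assumes the input holds one sequence per line, which may not be the layout of the actual file.

```python
from itertools import combinations

# Placeholder scores (assumptions; substitute the values given for the lab).
MATCH, MISMATCH, GAP = 1, -1, -2


def match_value(a, b):
    """Best global alignment value of a and b, keeping only one table row."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            cur.append(max(prev[j - 1] + sub, prev[j] + GAP, cur[j - 1] + GAP))
        prev = cur
    return prev[-1]


def read_sequences(text):
    """Assumes one sequence per non-blank line; the real layout may differ."""
    return [line.strip().upper() for line in text.splitlines() if line.strip()]


def best_pair(seqs):
    """Return (value, i, j) for the highest-valued pair of sequence indices."""
    pairs = combinations(range(len(seqs)), 2)
    scored = [(match_value(seqs[i], seqs[j]), i, j) for i, j in pairs]
    # max() keeps the first pair encountered when several tie for the top value.
    return max(scored, key=lambda t: t[0])


seqs = read_sequences("ACGT\nGTATA\nCATAT\n")
value, i, j = best_pair(seqs)
print(f"most closely related: {seqs[i]} and {seqs[j]} (value={value})")
```

This version only computes values, so it needs just one row of the table at a time. To display the winning alignment, the full table and a traceback are still needed for that one pair.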

Extra credit

  1. Let the user set the match, mismatch, and gap values
  2. Use a similarity matrix to set mismatch scores (as later in the Wikipedia article)
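
Both extra-credit items can be approached by passing the scoring rule in as a parameter. The Python below is a minimal sketch under that assumption: flat_scorer builds a rule from two user-chosen numbers, and matrix_scorer looks up each pair of residues in a similarity matrix. The matrix values and the gap penalty used here are illustrative placeholders, and all function names are hypothetical.

```python
# Illustrative symmetric similarity matrix; the values are placeholders.
SIMILARITY = {
    "A": {"A": 10, "G": -1, "C": -3, "T": -4},
    "G": {"A": -1, "G": 7, "C": -5, "T": -3},
    "C": {"A": -3, "G": -5, "C": 9, "T": 0},
    "T": {"A": -4, "G": -3, "C": 0, "T": 8},
}


def flat_scorer(match, mismatch):
    """Scoring rule built from two user-supplied numbers."""
    return lambda x, y: match if x == y else mismatch


def matrix_scorer(matrix):
    """Scoring rule that looks up each residue pair in a similarity matrix."""
    return lambda x, y: matrix[x][y]


def match_value(a, b, sub, gap):
    """Best global alignment value of a and b under scoring rule sub."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + sub(a[i - 1], b[j - 1]),  # pair the residues
                prev[j] + gap,                          # gap in b
                cur[j - 1] + gap,                       # gap in a
            ))
        prev = cur
    return prev[-1]


# The same pair scored under three different (placeholder) settings.
print(match_value("AT", "CAT", flat_scorer(1, -1), gap=-2))
print(match_value("AT", "CAT", flat_scorer(1, -1), gap=-1))
print(match_value("AT", "CAT", matrix_scorer(SIMILARITY), gap=-5))
```

The numbers passed to flat_scorer and the gap argument could just as well come from user input or command-line arguments; the alignment code itself does not change.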

How to get credit