CS 241 Data Structures: Last Lab, DNA Sequence Matching (Due 4/27/2017)
This simple bioinformatics exercise will help cement your knowledge of dynamic
programming. Given n sequences of residues (e.g. ACGT, GTATA, ..., CATAT), your task is to compute the best match for each pair,
i.e. the most likely original alignment under the assumption that both sequences are
descendants of a common ancestral string, and then report which pair is the most closely related, along with its match score.
Details
Use the simple algorithm presented in class (from
here) with the values:
- match = 2
- mismatch = -1
- gap = -4
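As a rough illustration of how these three values drive the dynamic program, here is a minimal Python sketch that computes only the score. It assumes the class algorithm is a global alignment in which every gap position costs the same flat penalty; that assumption is consistent with the value=0 example for AT and CAT in this handout. The function name `alignment_score` is made up for this sketch.

```python
# Scoring values taken from the assignment.
MATCH, MISMATCH, GAP = 2, -1, -4


def alignment_score(a, b, match=MATCH, mismatch=MISMATCH, gap=GAP):
    """Return the best global alignment score of strings a and b.

    Assumes a flat (linear) gap penalty; this is a sketch, not
    necessarily the exact formulation shown in class.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap         # a[i-1] against a gap
            left = score[i][j - 1] + gap       # b[j-1] against a gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("AT", "CAT"))
```

For AT and CAT the best choice is one gap (-4) plus two matches (2 + 2), which gives 0.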
Each sequence will be stored in a file in here. Your program
should read in all the sequences, then compute the best match for each pair and display that value along with the alignment that produces it. For example, if
the first two sequences were AT and CAT, it would display
value=0
-AT
CAT
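One way to recover the alignment itself, and not just its value, is to fill the score table and then walk back from the bottom-right cell. The Python sketch below assumes a global alignment with a flat gap penalty and uses a hypothetical function name, `align`. When several alignments tie, it returns just one of them, preferring the diagonal move.

```python
def align(a, b, match=2, mismatch=-1, gap=-4):
    """Return (value, top, bottom) for one best global alignment of a and b."""

    def sub(x, y):
        return match if x == y else mismatch

    # Fill the score table; row 0 and column 0 are all-gap prefixes.
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner, rebuilding both rows
    # of the alignment in reverse order.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    value, top, bottom = align("AT", "CAT")
    print(f"value={value}")
    print(top)
    print(bottom)
```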
Or, if you were being fancy, you might make it look just like the Wikipedia alignments, i.e.:
-AT
 ||
CAT
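The fancier display can be produced from the two aligned rows with a small helper. This Python sketch draws a bar only under columns where the two residues are identical and leaves a space elsewhere. That convention is an assumption, so adjust it if you want a different marking. The name `format_alignment` is hypothetical.

```python
def format_alignment(top, bottom):
    """Return a three-line string: top row, marker row, bottom row."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    # A bar marks a column of identical residues; gaps and mismatches
    # get a space (an assumed convention for this sketch).
    bars = "".join(
        "|" if x == y and x != "-" else " " for x, y in zip(top, bottom)
    )
    return "\n".join([top, bars, bottom])


if __name__ == "__main__":
    print(format_alignment("-AT", "CAT"))
```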
Once you can correctly calculate the values for each pair, select the pair with the highest value and display it (along with its alignment).
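Selecting the winner amounts to scoring every unordered pair and keeping the maximum. The following Python sketch uses `itertools.combinations` for the pairing and a compact two-row scorer. It assumes a global alignment with a flat gap penalty, and the names `score` and `best_pair` are made up for illustration. Ties go to the first pair encountered.

```python
from itertools import combinations


def score(a, b, match=2, mismatch=-1, gap=-4):
    """Best global alignment score, keeping only two table rows in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + pair, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def best_pair(sequences):
    """Return (value, i, j): the best score and the indices of that pair."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    return max(
        ((score(sequences[i], sequences[j]), i, j)
         for i, j in combinations(range(len(sequences)), 2)),
        key=lambda t: t[0],  # compare on score only, so ties keep the first pair
    )


if __name__ == "__main__":
    seqs = ["AT", "CAT", "GGGG"]
    value, i, j = best_pair(seqs)
    print(f"best pair: {seqs[i]} and {seqs[j]}, value={value}")
```

With n sequences this scores n(n-1)/2 pairs, so each pair is considered exactly once.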
Extra credit
- Let the user set the match, mismatch, and gap values
- Use a similarity matrix to set mismatch scores (as described later in the Wikipedia article)
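For the extra credit, both ideas reduce to passing the scores in as data. The Python sketch below builds a similarity table as a dictionary keyed by residue pairs and looks each pair up during the table fill. The specific numbers are placeholders chosen for illustration, not the values from the article: A/G and C/T substitutions are penalized less than the others. The names `build_similarity` and `score_with_matrix` are hypothetical.

```python
def build_similarity(match=2, transition=-1, transversion=-2):
    """Build a DNA similarity table; the default values are illustrative only."""
    purines = {"A", "G"}  # C and T are the pyrimidines
    table = {}
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                table[x, y] = match
            elif (x in purines) == (y in purines):
                table[x, y] = transition    # A<->G or C<->T
            else:
                table[x, y] = transversion  # purine <-> pyrimidine
    return table


def score_with_matrix(a, b, sim, gap=-4):
    """Best global alignment score using a per-pair similarity table."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + sim[a[i - 1], b[j - 1]]
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    sim = build_similarity()
    print(score_with_matrix("AT", "CAT", sim))
```

User-controlled match, mismatch, and gap values fit the same shape: read them from the user, then pass them in as the arguments above.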
How to get credit
- Demo your program on or before the due date. Extra credit is available at the rate of
5%/day for programs demonstrated early.
- Zip and email me your project.