Problem B
Gorilla
In biology, a recurring problem is to find optimal alignments between two strings of DNA, RNA, or proteins. One example of this is measuring the similarity of different protein sequences found in different animals to understand how related they are.
Given a set of amino acid sequences belonging to different animals, we want to produce an optimal alignment. An alignment between two strings $X$, $Y$ is a pair of alignment strings $X_ a, Y_ a$ of the same length, where each alignment string consists of the original string, but with zero or more ‘-’ characters (called gaps) inserted. An alignment is considered optimal if it maximises the score. The score is calculated by the sum of each aligned character pair in the two strings. A gap compared to anything always gives the score $-4$. The score of two characters from the set ARNDCQEGHILKMFPSTWYVBZX is determined by a specific BLOSUM-matrix. The matrix and code snippets for generating the matrix are attached.
For example, the strings $\texttt{KATTIS}$ and $\texttt{KATIS}$ can be aligned
as:
$\texttt{KATTIS}$
$\texttt{KAT-IS}$
This is an optimal alignment since it produces the maximal
score $5 + 4 + 5 - 4 + 4 + 4 =
18$.
Input
The first line contains integers $N$ and $Q$, such that $1 \leq N \leq 20$ and $1 \leq Q \leq 100$. Then follow $2N$ lines of organism names and their amino acid sequences. For each organism, the first line is their name, and the second line is a string representing the amino acid sequence. Then follow $Q$ lines of queries, where each query is a pair of names separated by a space.
An organism name consists of a unique word of at most 20 characters, using only characters in the range {a—z,A—Z}.
Each amino acid sequence consists of a string of length at most $200$ where each character represents an amino acid (from the set ARNDCQEGHILKMFPSTWYVBZX).
Output
For each of the $Q$ queries, output two lines containing an optimal alignment for the two amino acid sequences. The first line should contain the alignment string corresponding to the first organism, and the second should contain the alignment string corresponding to the second organism. If there exists multiple optimal alignments output any of them.
Sample Input 1 | Sample Output 1 |
---|---|
2 1 katis KATIS kattis KATTIS kattis katis |
KATTIS KAT-IS |
Sample Input 2 | Sample Output 2 |
---|---|
1 1 a A a a |
A A |
Sample Input 3 | Sample Output 3 |
---|---|
3 3 Sphinx KQRK Bandersnatch KAK Snark KQRIKAAKABK Sphinx Snark Sphinx Bandersnatch Snark Bandersnatch |
KQR-------K KQRIKAAKABK KQRK K-AK KQRIKAAKABK -------KA-K |