Dynamic Programming – Longest Common Subsequence

Introducing Dynamic Programming

And the bottom up approach to solve problems which exihibit traits as described above is called Dynamic Programming. Two re-iterate, the two similarities in all the problems which can be solved using the Dynamic Programming are:

  1. “An optimal solution to a problem contains and optimal solution to a sub problem.”
  2. “The recursive solution contains few distinct sub problems repeated more than once.”

Programming is a very old term and it was not always a synonym for ‘writing computer programs’. It meant to be a tabular way of calculation with rows and columns of instruction or data, and so are the programming languages. If you are familiar with the Assembly language, it also has the same tabular format of programming where each row contains a set of 2 or more columns and a row is an instruction which performs a task.

We use the tabular format to explain the solution to finding the length of the longest common subsequence using dynamic programming. We consider the same strings for this demonstration. Create a table with each character of first sequence as columns and each character of the second sequence as rows like the below table:
table
Now we definitely need to know how did we populate this table. Here is the rule,

We are evaluating each cell one at a time, suppose we are evaluating cell (i, j ) i.e. ith row and jth column, then we check if the character of row i matches the character of column j. Below two cases arise:

  • If the two characters are equal, we add one to the longest length calculated in the the previous row and column and store it in cell(i,j), it is the same case as the one we discussed where X[i] = Y[j], the length of the longest subsequence evaluates to be 1 + Li-1,j-1.
    That means we take the value from cell(i-1,j-1), add 1 to it and store it in cell(i,j).
  • If the two characters are not equal, we find the maximum value among the two cell(i, j-1) and cell(i-1, j). This is again in accordance of the statements we proved previously in the discussion where X[i] != X[j], Li,j = max(L<sub<i-1,j), Li,j-1).

The first row and column will always have the value zero because they don’t contain any character.

Lets calculate the value of cell(2,2) assuming the index starts at 0 ,0. The second row contains character D and the second column contains character B. Both the characters don’t match so the value of cell(1,2) will have the value which is maximum of [cell(2,1) and cell(1,2)] i.e.e maximum of (0,1) which is 1. Hence, cell(2,2) will contain the value 1. This also means that the length of the longest common subsequence of the two strings A B and B D is one.

Now we calculate the value of cell(2,5), the second row contains the character D and the fifth row contains D as well. Both the characters match so the value of cell (2,5) will be 1 + the value of cell (2-1, 5-1) as per the discussions we had above. Li,j = Li-1, j-1 + 1.

In a similar way we can populate the whole table. The value in the cell (M,N) is the length of the longest common subsequence. In this case it is 4. So what is the running time of the algorithm?

Clearly it is the time taken to populate the whole table, there is a constant amount of work involved in looking at the two values, comparing them and figuring out the value of a given cell in the table, it actually depends on the previous calculations. And this constant work has to be done for M * N cells, so the total running time is proportional to M * N.
We can say that T = O(M * N) . We can also define it as a stricter bound by using the theta notation, but I don’t want to dwell into that at this moment of time, we can just understand that it is proportional to M * N.

Same with the space complexity, which looks to be proportional to M * N because we have used a table with M rows and N columns. Well that can be reduced to Max(M, N) as you can see to calculate the value of a given cell, we need the current row and 2 cells from the previous row.

Source Code

Conclusion

So this is how we calculate the length of the longest subsequence, we also need to find the longest subsequences and write code for the same, which we plan to do in the next article. There we will calculate the actual time and space complexity of the whole algorithm along with part to find the LCS.

Hope this helps, happy learning. Stay connected and stay Subscribed