Dynamic Programming – Longest Palindromic Sequence

Introduction

Palindromes are fascinating character sequences in a string. A palindrome is a string which reads the same when read from either of the ends. This post in particular talks about palindromic sub sequences.

To know more about a sub sequence, please check my post on Longest Common Sub sequence.

A palindromic sequence means a sequence of characters which is a palindrome. Now, we must understand it clearly that we are talking about a sub sequence and not a substring.

Understanding the Longest Palindromic Subsequence problem better

It is really easy to say if a string is a palindrome. We start with two pointers one at each end and match the characters pointed by the pointers. If a match occurs we shrink the range by one from both ends and then match the corresponding characters.
If a match is not found we return false. If the pointers meet or cross, we return a true.

Pretty Simple!

Now, the job is to identify the longest palindrome sequence in a given string. Here is an example string. Let us identify all the palindromic sub sequences here.

Input String : ACECCBA

One letter palindrome sequence – A, B, C, E
Two letter palindrome sequence – AA, CC
Three letter palindrome sequence – ACA, AEA, ABA, CEC, CCC
Four letter palindrome sequence – ACCA
Five letter palindrome sequence – ACECA, ACCCA

There are no more palindrome sequences possible. And of course the longest ones are five letter long (ACECA and ACCCA).

Approaching the problem

Let us analyze what we know about the problem. Two things:

  • For a simplified understanding, we need to identify the smaller palindromes first and then build our solution upon them. Means, we would require a bottom up approach
  • In the process of identifying the bigger palindromes, we may end up looking at the smaller ones many times as they will be included in the bigger ones.

This matches our hallmarks of dynamic programming which we have discussed in one of the previous posts.

Also, as there are so many sub sequences to evaluate, approx 2N where N is the number of characters in the given string. It is advisable to not go for a brute force solution. We may come up with other smarter solutions later, but here we are discussing the dynamic programming way of solving the problem.

The big Idea

If we consider one character at a time, then the longest palindrome possible is 1 character long. This is trivial.
If we consider two characters at a time, then the length of the longest palindrome sequence from those two characters is decided by the following two cases:

  • If both the characters match then the length would be 2.
  • Else the length would be 1

Similarly, if we consider three characters sequences, it can be consider as the following:

  • If the end characters match then the length would be 2 + the length of longest palindrome formed by the remaining characters (and remember we already found this in a previous step).
  • If the end characters do not match then the length would be max of the length of the longest palindrome formed by the first two characters or the length of longest palindrome formed by the last two characters.

In a similar way we can define the larger problem in terms of a smaller problem we solved earlier.

Here is a demonstration:
Let us consider the the string ACECC
Considering one character at a time we get five palindromes each of length one.
Considering two consecutive characters at a time:

  • (AC) – max length of the palindrome sequence is 1, because both the characters won’t match.
  • (CE) – max length of the palindrome sequence is 1, because both the characters won’t match.
  • (EC) – max length of the palindrome sequence is 1, because both the characters won’t match.
  • (CC) – max length of the palindrome sequence is 2, because both the characters match.

Now consider three consecutive characters at a time:

  • (ACE) – max length of the palindrome sequence is 1, because the end characters won’t match.
  • (CEC) – max length of the palindrome sequence is 2 + max length of the palindrome sequence formed by the remaining character (E), because the end characters match. The answer is 3.
  • (ECC) – max length of the palindrome sequence is max of (the length of the palindrome sequence formed by the characters (EC) or the length of the palindrome sequence formed by the characters (CC) ) because the end characters won’t match. The answer is max (1, 2), and the answer is 2.

Now consider four consecutive characters at a time:

  • (ACEC) – max length of the palindrome sequence is max of (the length of the palindrome sequence formed by the characters (ACE) or the length of the palindrome sequence formed by the characters (CEC) ) because the end characters won’t match. The answer is max(1, 3), and the answer is 3.
  • (CECC) – max length of the palindrome sequence is 2 + max length of the palindrome sequence formed by the remaining character (EC), because the end characters match. The answer is 2 + 1, and the answer is 3.

Now consider five consecutive characters at a time:

  • (ACECC) – max length of the palindrome sequence is max of (the length of the palindrome sequence formed by the characters (ACEC) or the length of the palindrome sequence formed by the characters (CECC) ) because the end characters won’t match. The answer is max(3, 3), and the answer is 3.

And the final answer to the question for input string ACECC is 3. So on so forth… we can do this till we have considered all the characters in a given string in case of lengthier strings.

Here is the DP table for the string STR with value ACECC
Longest Palindromic Subsequence

How did we populate this table?
The value in cell (i,j) is the length of the longest palindrome sequence over the range of characters from index i to j.

That means the diagonal element always mean we are considering the range from i to i, which means one character at a time.

Hence, the longest palindrome over one character is of length 1. So, populate all the diagonal element value to 1.

Now as per our above definition if STR[i] != STR[j] then the cell(i, j) must contain the max of M[i,j-1] and M[i+1, j]. Where M[i, j-1] represents the length of the longest palindrome between index i and j-1 and M[i+1, j] represents the length of the longest palindrome between index i+1 and j.

For a concrete example consider i = 1 and j = 5, and lets say that STR[1] != STR[5] then we M[1][5] = max of M[1][4] and M[2][5].
Let us say that STR[1] == STR[5] then M[1][5] = 2 + M[i+1][j-1] = 2 + M[2][4]

The value at the top right corner of the table is the length of the longest palindrome sequence.