Find Interleaving Strings using Dynamic Programming

Introduction

This problem is already discussed in the post here. This post attempts to solve the same using the dynamic programming approach. Here is a brief description of the problem. Given three strings A, B and C such that A is “ab”, B is “ade” and C is “adabe” find if C is an interleaving of A and B or not.

Interleaving means constructing the third string C using the characters of A and B such that the characters in A and B preserve their relative ordering in C. In the above example C is an interleaving of A and B.

Approach to Find Interleaving Strings using Dynamic Programming

How do we know if this problem can be solved using dynamic programming?

Identifying a problem which can be solved by dynamic programming

If the nature of a problem satisfies the hallmark of dynamic programming then it can be solved by this methodology. So, to learn about the hallmarks of dynamic programming please read my previous post here.

In brief if a problem has the following two properties, it can be a good candidate of the dynamic programming methodology:

  • The optimal substructure – which means an optimal solution to the problem contains the optimal solution to a sub problem
  • Overlapping sub problems – A recursive solution contains a “small” number of distinct sub problems repeated many times

Meaning of optimal Substructure

The optimal substructure relation for this problem means, if the lengths of A and B are M and N respectively, then if A(1..M), B(1..N) interleaves C(1..M+N) then for 1 <= K < M and 1 <= L < N, A(1..K) and B(1..L) will interleave C(1..K+L)

Meaning of Overlapping sub problems

The problem here asks for finding if A and B interleaves to C. Here is a pictorial representation of the problem space:

Find Interleaving Strings using Dynamic Programming

That’s a very complex tree, let us define what does this tree represent.

The root of the tree represents a problem with two strings A = ab, B = ade and it tries to find all the interleavings from A and B. The interleavings can be constructed either by taking the first character of A or B. Hence, two branches

  • Left Branch – this branch takes from the first string and reduces the problem to (A = b, B = ade)
  • Right Branch – this branch takes from the second string and reduces the problem to (A = ab, B = de)

This further breaks down until both the strings are exhausted.  Check out the leaf nodes for the empty strings.

Some cool facts which we can derive from this tree are as below:

  • The number of leaves in the tree, is the total number of interleavings which can be created using A and B.
  • If we add up all the characters removed at each level in a path from root to leaf it will result into one interleaving
  • There are multiple sub problems which are overlapping, all nodes filled with same colors are exactly same sub problems. We call them overlapping
  • The height of the tree, is the length of the interleaved string.

The above tree basically is the recursion tree, for generating all the interleavings from two strings.

No matter how different a problem is, if that can be solved by dynamic programming, you will be able to find these two properties in the problem.

Solution to current problem

Now that we know so much about the problem, lets devise a dynamic programming solution. Before we go into the explanation, let us see the below matrix

interleavingdp

What does this grid signifies?

The header row represents the characters of string B and the first column represents the characters of string A and string C is “adabe“. Each cell (i, j) in the grid contains either a T (true) or F (false). If G(i,j) is true, that means A(1..i), B(1..j) interleaves C(1..i+j).

For e.g: G(1,1) is F(false), which means If A = a and B = a then they cannot interleave C(1..2) which is equivalent to ad. 

Let us do it once again for G(2, 2) which is T(true). If A = ab and B = ad then they can interleave C(1..4) which is adab.

Now that we know that this table works somehow, let us understand how to build this table or grid. First we initialize two counters i and j with zero. Here are the equations based on above examples and derivations.

  • If A[i] == C[i+j] then G[i][j] = G[i-1][j],
  • If B[j] == C[i+j] then G[i][j] = G[i][j-1]
  • Else C[i][j] = F

Source Code

Analysis

The code uses a grid of size M * N where M is the length of string A and N is the length of string B. Hence the space complexity is O(M*N).

The time complexity of the algorithm is O(M*N) because there is nested loop which runs for M * N times.

Summary

Mostly dynamic programming based solutions have running time of form O(M*N), the space can vary because you really do not need to preserve all the cells, only two rows are required to arrive at the solution.

Please comment below in case you need some more explanation on this post. I would be happy to address the concerns.

  • Rashidul Hasan

    Could you please check the code for test cases “ab”,”bc”,”babc” and “aabc”,”abad”,”aabadabc”?

    • I am checking this. Thanks for pointing out. I think the second test case fails for the given code. Will revisit the post soon,