Count Binary Search Trees created from N unique elements

Formulating the Solution

I would always prefer to sort the array before we start the calculation. By sorting this we make the understanding easy, when we are considering the ith element as the root, we know that 1 to i-1 are in the left of i in the resultant tree and i+1 to N are on the right of i in the resultant tree, you can do it without sorting, but that might take a lot of calculation to figure out the same thing in each iteration.

TreeCount(N) = TreeCount(1 as root) + TreeCount(2 as root) + … + TreeCount(i as root) + … + TreeCount(N as root). We can also denote this in a better way

When we know that for a given root RT if there are 3 elements in the left, then we know there are five possibilities in the left as in Answer 4. If we have two elements in the right of RT, then we know there are 2 possibilities in the right as in answer 3. Then we add both of them i.e. 5 * 2 = 10 for that is the total number of possible trees for the root RT.

Hence we iterate through the sorted array and figure out the solution considering each element as the root. Below is the pseudo code for the same:

Explaining the Pseudo Code

This is simple, if the number of elements is 0 we return 1 i.e. an empty tree is possible. If the number of elements is we still return 1 because only one tree is possible and the element is the root of the tree.

For number of elements greater than 1 we iterate through each element of the array and for each element k as root we find the number of trees possible for k-1 elements in the left and the number of trees possible for n-k elements in the right. Now, these two are independent of each other so to find the total trees for a given root, we multiply the number of trees in left and right and add it to the sum.

At the end of the iteration we return the sum total.


If we closely analyze for each N the code makes N + N recursive calls i.e. 2N recursive calls. I cannot disagree to the fact that it is a really horrible running time, it is certainly exponential, and one of the worst exponential. The reason is very simple, it is trying to calculate one sub problem repeated number of times.

We can use memoization to store the results of all the calculated sub-problems and that will certainly help improving the running time because now before calling any recursion we will perform a constant time look up. Thus the exponential thing can be improved to get quadratic running time. This is demonstrated in the second code below:

Source Code

The above code starts crying for elements count more than 30 because it is a big multiplication and mostly repeated sub-problems are calculated again and again. Hence I thought of using memoization and only calculate the sub-problems if it is not already solved.