Hierarchical Data Structure – Tree ADT

Introduction

This is the second article in the series of non-linear data structures, to read more about the previous article, please check the topic HIERARCHICAL DATASTRUCTURE – DETAILED DISCUSSION. To get updates on the other articles in the series, please use the Subscribe feature on the right panel.

Purpose of article

This article will mostly focus on the coding aspect of what we learnt in the first article, it will also make us eligible to go hand in hand with the codes and we can simultaneously test what all we are learning. We will define all the structures which will end up creating a tree.

We will also define the Tree ADT (Abstract Data Structure) and operations which can be performed on the Tree.

The Tree ADT

What is an ADT or an Abstract Data Structure?
It is an abstract definition or contract the API has to obey and promise the user to deliver what it says. There are two things in an API which can be considered pretty much independent of each other. The contract and the implementation, we mostly try to write a contract before providing an implementation. This is good for two reasons, first – we define the goal and hence seal the requirement (with a minor scope for changes) before even starting the implementation, second – we make our intentions clear to the user of the API and give them a hint of what they can expect.

The ADT should be such that it can handle all sort of variations in the data structure, for e.g. if I define an ADT for Tree, then it must be capable of handling Binary Tree, N-ary Tree, Red-Black Tree, AVL Tree, Heap etc.

Defining the ADT

What is a tree made of? From the previous article we can get the obvious answer and that is nodes. So the basic thing which should be available before defining a tree would be Node.
What is the minimum information a node must contain? I would like to describe it in a simple way as below, a node must contain:

1) A link to its parent node.
2) The object stored at the node.
3) An array or list of links to its children.
4) An id to uniquely identify itself in the tree (this is not mandatory, there are ways to identify a node without a node specific identifier.

With this basic information we would like to define our class which can represent the Node of a tree. A node is basically represented by a position in the tree, so we will create a Position interface, which would return the element stored at this Position.

Also, there might be a situation where we might try to access a position which doesn’t exist. In that case we must throw an exception, so we will also define an exception class and may call it InvalidPositionException

Now that we have the Position interface ready, lets define our Tree interface, I will describe what each method is required for and what is the running time of each method.

size() :

This method is required to find the total number of nodes present in the tree. It must take O(1) time i.e. constant time, this is a method which is used very frequently so we cannot afford if has running time more than constant running time.

We achieve this by maintaining a size variable and increment it every time we add a node, we also decrement it every time we remove a node from the tree. So when we call this method it just has to return the value of the size variable.

isEmpty() :

This method is used to check if the tree contains any node or it is empty. The method must return a boolean variable whose value would be true if the value of the size variable is 0. If the tree contains at least one node the method would return false.

This takes O(1) i.e. constant running time because it just has to compare the value of the size variable to zero and return either true or false.

elements() :

This method is used to return an iterator over the the list of nodes(positions) present in the tree, for this we have to visit each node at least once, that makes the running time O(n) means proportional to n, where n is the total number of nodes.

positions() :

This method is similar to the elements(), the only different being the positions() method returns an iterator over the positions present int he tree and the elements(), method returns an iterator over the elements present at the respective positions.

This will take O(n) time for the same reason as the elements() method.

replace(Position v, Object e) :

This method is to replace the element present on position v with the new element e, and return back the old element. This takes O(1) or constant time because because we already know the position and we just need to store the element at this position in a temporary variable and put the new element at the position.

root() :

This method is used to get the root of the tree, it takes O(1) or constant time. The tree already has the pointer to the root, it just needs to return it when the root() method is called.

parent(Position v) :

This method returns the parent node/position of the given node/position v. It takes O(1) or constant time because every node maintains the information about its parent, so it just needs to return the parent.

children(Position v) :

This method returns an iterator over the list of child nodes for the given position/node v. It takes O(Cv) running time where Cv is the total number of children of a given node v.

isInternal(Position v) :

This method returns a boolean value true if this node is not among the leaf nodes of the tree, false otherwise. This will take O(1) i.e. constant running time because the information about the children of a given node is readily available with each node.

isExternal(Position v) :

This method returns a boolean value true if this node is among the leaf nodes of the tree, false otherwise. This will take O(1) i.e. constant running time for the same reason as the isInternal(Position v) method.

isRoot(Position v) :

This method returns a boolean value true if this node is the root node. This will take O(1) i.e. constant running time because the information about the root is available with the tree object.

So, now we have our Tree ADT defined well, we have decided to provide all the above methods in a tree ADT which can be used by any concrete class. Now we can try to write all the trees we discussed in the first article HIERARCHICAL DATASTRUCTURE – DETAILED DISCUSSION

The Binary Tree

With the help of the above ADT definition, we will try to define some more structures to adapt it into a binary tree. We are more inclined towards a binary tree because that is the simplest and most common one.

This is how a binary tree looks like
tree2

As we saw the Position interface for the generic Tree ADT, we find that it is not sufficient for the Binary trees, the binary tree position/node should also contain the information about its parent and its left and right child. We go ahead and define a BTPosition interface.

I will explain the methods which are a part of the BTPosition.

setLeft(BTPosition p) :

It sets the node p as the left child of the present node. The running time would be O(1) i.e. constant, because we have the node to which we want to add the left child, so its a one step process.

setRight(BTPosition p) :

It sets the node p as the right child of the present node. The running time would be O(1) i.e. constant, the reason remains the same as of the setLeft(BTPosition p).

setParent(BTPosition p) :

It sets the node p as the parent of the present node. The running time would be O(1) i.e. constant, the reason remains the same as of the setLeft(Position p).

The other three are just getters method which return the left child, right child and the parent of the present node.

Similar to the Position interface we discover that the Tree interface requires some more specialized methods to make the binary tree operations easy. For e.g. what if we want the left or right child? What if we want to check if the right or left child exist? It might not be easy and constant time operation with the existing Tree ADT.So we define a BinaryTree interface.

BinaryTree

Now lets define the methods in the BinaryTree interface:

left(BTPosition p) :

The method returns the left child of the node p. Takes O(1) constant time because the BTPosition has the information about it’s left and right child.

right(BTPosition p) :

The method returns the right child of the node p. Takes O(1) constant time, reason is same as above.

hasLeft(BTPosition p) :

The method returns a boolean value true if the node p has a left child, false otherwise. Takes O(1) constant time, reason is same as above.

hasRight(BTPosition p) :

The method returns a boolean value true if the node p has a right child, false otherwise. Takes O(1) constant time, reason is same as above.

Conclusion

Now we have almost all the structures defined which will help us understand and learn the Binary Tree further. We will continue the same in the next article on the topic, where we will
implement a concrete Binary Tree and will also implement some algorithms which we mentioned in article one.

Hope this helps, happy reading. Don’t forget to subscribe to get updated articles and notifications.

  • Ranjana

    Much awaited.. waiting for more ..