This series of articles is from the online sessions I conduct for working professionals as a part of the data structures and algorithm subject. I will keep posting each session’s article here so that it can be used by everyone who is not able to attend the sessions.
On the other hand if you are interested in watching a video for this article, you can subscribe our Youtube Channel.
What are data structures?
Data is the basis of information, in today’s e-age information is the most important asset. An important and un-deniable fact about data is that its volume is exponentially increasing and that brings a lot of challenges in making sense or extracting meaningful information out of it.
Few of the challenges about managing data are listed below:
- Storing data
- Keeping track of the places where data is stored
- Operating on the volumes of data to extract information
- Representing data as per requirement
- Regular back up of the data
There are many more to list but these are the major ones and in the coming sessions we will try to address these challenges in depth and breadth.
An example to explain the above is the problem statement of adding numbers. Let us say we are trying to add two numbers, the obvious choice is to use two memory locations M1 and M2 and store numbers N1 and N2 respectively. Use another memory location S1 to store the sum (M1+M2).
Let us extend this problem to add 50 numbers, here we see that the obvious choice changes, instead of choosing 50 individual memory locations M1, M2, M3 … M50 we prefer to create an array (a data structure) of length 50 and store the numbers sequentially. Run a loop through the array and add each number and store them in the memory location S1.
So do we notice the obvious problem here, we can see that it was very easy to find two memory locations to store two numbers which do not reside in an array but when we try to store 50 numbers, we want an array and it requires continuous memory location (which might not be as easy as finding 2 memory locations). This will further complicate in case we are trying to add 10 million numbers.
This small example is just to explain that data structures are a good solution to basic problems. Point worth noticing is that, they also introduce complexities and require a special attention to understand and use them efficiently.
Just to write a definition of the data structure, we can say that data structure is building a structure which contains a group of atomic data and obeys certain rules. A data-structure is a pre defined blue print which enforces a systematic data organization.
For e.g.: An array data structure enforces a rule that the elements will be stored in continuous memory locations and can only be accessed using the index of the memory locations. With complex data structures these rules become more complex.
Significance of data Structures
We can understand the significance of data structures when we dwell deep into the topic. This is important because just storing the data might not be enough, we may require processing the data as well, and it becomes more significant when we talk about conditional processing or selective processing of the stored data.
In simple words, the processing may be sorting the data, searching from the data, filtering data or doing complex operations. We will discover that few of this data processing can’t even be done without organizing data using special data structures. It is worth noting that for every problem there is an appropriate data structure (if not, we can write one).
There are few ways to represent data without using data structures. For e.g.: we can represent data using the language defined basic data types integer, character, byte, long, double, boolean, float and short. Anything apart from this would be represented using a data structure. There is another termed coined for the same, Abstract Data Type or ADT.
String is one of the ADT which is a composition of characters or we can say a string is an array of characters. Different languages implement it differently; in C it is an array of characters terminated with a null character. In java it is just an array of characters which once created cannot be modified and so on. Theoretically the String doesn’t have a limit on its max length.
Arrays are fixed size data structures, which means once we create an array we are bound to obey its size and cannot store more elements than its size. An array is an abstract data type which means that we can define an array of any type of element (it may be int, float, boolean, char etc).
Another point about arrays is that we can only access the elements in an indexed manner. So if we know that an element is stored at a given index we can directly fetch it. You can say that it is a great thing about arrays, when it comes to searching and fetching data from arrays.
Also, it is a good choice when we have a prior knowledge about the maximum number of elements to be stored. A possible drawback about using arrays is that we need to find continuous memory location of size equal to or more than the number of elements to be stored. But I believe space is not a big deal now a day.
Another drawback about arrays is an operation where we want to insert an element at an index k in a partially filled array. Consider an array A[1 …. N] of size N, where 1… m index is already occupied and we are trying to insert an element at index k where 1 < k < m < N in this case we are forced to move the elements from index k to index m one space to the right.
Similar is the case when we remove an element at index k from the array. We need to move rest of the elements from index k+1 to m by one space to the left.
We understand that arrays are good and handy data structures and we also understand that they have few drawbacks. As I say every problem has an appropriate data structure to solve it so we can say that arrays are not the perfect data structures for problems like frequent addition and removal.
This is the precise reason why we look for data structures like Linked List; they address the problem of frequent additions and removals. Linked List is a composite data structure which consists of multiple nodes. Also, it does not require continuous memory locations because each node of the Linked List stores the link to the next node.
Another fact about this data structure is that it doesn’t put a restriction of defining the maximum size while creating; hence it is an ever growing data structure which is an added advantage over the arrays. This in fact is a very important factor of choosing a linked list over then fixed size data structures.
Most of the additions and removals are just done by manipulating the links which are connecting the various nodes. So this data structure addresses most of the drawbacks of arrays, and we do a trade-off with a different set of drawbacks listed below:
1) Element access is sequential, which means even if we know the location of the element to be accessed, we have to start all the way from the beginning and visit each element in the path.
2) More complex than managing arrays, for each node we add or remove, a special care has to be taken to rightly link the remaining nodes, else there is a possibility of data loss.
Another fascinating data structure is the Stack; it is also called a Last In First Out data structure. This simply means that the order of pushing(inserting) the elements into stack is reverse to the order of popping(removing) elements from the stack.
There are many problems which can be very easily solved by using the Stack data structure, the simplest being reversing an array/string.
Consider the array A B C D E F G H I J, and the problem at hand is to store the elements in the same array but in the reverse order. We can definitely do swapping for two elements one picked from the end and the other from the beginning.
Another good way is to use a Stack and push each element one after the other till the end. After we reach to the end of the array, we start popping from the stack and re-create the array in the reverse order.
One thing worth noting about the Stack ADT is that the underlying data structure which makes the Stack implementation possible is either an array or a linked list.
The Queue on the other hand is a data structure which enforces a different rule, First In First Out. This means, the order of inserting and removing the elements is exactly the same. The problems addressed by this data structure are real world problems like assembly line processing, task management and task scheduling etc.
The Map is a very important data structure and should be given its own time; when we discuss maps we need to implicitly discuss Hash tables and the concepts of Hashing. Maps primarily means storing elements against a known identifier called key and the element itself is the value.
The underlying data structure is a two dimensional array (preferred) or a linked list with a complex node structure (backed by a one dimensional array). To understand more about a map, it is like a table with two columns, the first column contains the key identifier and the second column contains the actual element.
This data structure promises a lot on the performance side for the simple operations like searching, retrieving and deleting, because of which it is an ideally candidate for all the caching related real world problem. As the underlying data structure is an array, it eventually runs into the issue of fixed length and restricted growth, which demands creation of a new Map in case we overrun the limits of the map in use. This will require allocation of whole new memory for the complete map and we have to take care and avoid frequent remapping or recreation.
The map doesn’t allow duplicate keys and if an attempt is made to insert a key value pair where the key already exists then its value is replaced by the new value and the older value is lost.
The Set obeys the mathematical definition of set and only allows unique elements. An attempt to add a duplicate element must be discarded and the previous element must be retained into the set. Similar to the Stack, Queue and Map the Set also has the array/linked list as the underlying data structure.
The Set addresses the problem of finding duplicates in a list of elements or we can say removal of duplicates from a list.
This was just an introduction to a series data structures and algorithms.
Stay connected and stay Subscribed