A general discussion
In general the running time of an algorithm or data structure vary with input size, although it may also vary for different inputs of the same size. Also, the running time is affected by the hardware environment (processor, clock rate, memory speed, disk speed) and software environment (operating system, programming language and compiler) in/on which the algorithm is implemented, compiled and executed. These are called environmental factor and it is not in our control to improve it.
Question: So what is in our control?
We being software programmers can improve and optimize the algorithm to run in the given space and time. If we are interested in characterizing an algorithm’s running time as a function of the input size.
Question: How is the running time measured?
The naive way or may I say experimental approach is running the algorithm multiple times with different input of the same size and also inputs of different sizes, plotting the graph of input size versus time to complete each run. The approach looks good but it also comes with the following drawbacks:
- Experiments can be done on a limited set of data and not on all possible data set.
- It is difficult to compare the experimental running times of two algorithms until the experiments were performed in the same environmental conditions.
- We have to fully implement the algorithm to conduct experiments.
There is a need for a better method of analysis or may I say analysis tool which can guarantee more significant results and allows us to avoid performing experiments. Here we will try to develop a general methodology for analyzing the running times of the algorithms that:
- .Takes into account all the possible inputs.
- Allows us to evaluate the relative efficiency of any two algorithms in a way that is independent form the environmental factors.
- Can be performed by studying a high level description of the algorithm without actually implementing it.
This methodology aims at associating with each algorithm a function f(n) that characterizes the running time of the algorithm as a function of the input size n. This precisely means that for an algorithm A we can say that the running time of A proportional to n or something similar. And this means, that given any circumstances the running time of A will not increase beyond cn where c is a constant which is primarily dependent on the environmental factors.
For e.g. when we have two algorithms A and B, running time of A is proportional to n and running time of B is proportional to n2 then we will definitely prefer A over B because A grows at a smaller rate compared to B.
Analysis of Algorithm
Let us talk about few primitive operation, they are defined below:
- Assignment: Assigning a value to a variable.
- Method invocation: Calling a method
- Arithmetic Operation
- Indexing into an array
- Returning from a method
A primitive operation corresponds to a low level instruction with an execution time that is constant.
Average Case and Worst Case Analysis
This is quite interesting to know that an algorithm may run faster on some inputs that it does on the others of the same size. Thus we may wish to express the running time of an algorithm as the function of the input size obtained by taking the average over all possible inputs of the same size. This is a real big task which requires a lot of mathematics and probability concepts and hence, we will not try to evaluate it to utmost accuracy.
On the other hand finding the worst case time is an easy task, if we can determine the worst set of inputs an average over that will give us the worst case running time. The good news is that if we design an algorithm in such a way that it is performing well with the worst inputs; it will mostly perform better with average case inputs.
The Asymptotic Notations
The Big O notation: Big O is mostly an upper bound guarantee, which means if I say the running time T(n) of algorithm A is O(n) where n is the size of the input, then T(n) <= cn for all n >= n0.
By definition of Big O, if f(n) and g(n) are two functions mapping non-negative integers to real numbers then we can say f(n) is O(g(n)) if there is a real constant c > 0 and an integer constant n0 >= 1such that f(n) <= cg(n) for n >= n0.
Let the running time T(n) for an algorithm A be defined as 4n – 3, then by the definition of Big O, we need to find a real constant c > 0 and an integer constant n¬0 >= 1 such that 4n – 3 <= cn for every integer n > n0 .
The good part about Big O: The Big O notation allows us to ignore constant factors and lower order terms and focus on the main components of a function that affects its growth. So using the Big O we can precisely write a concrete relation between the algorithms’s input size and its running time.
This means if the running time can be represented by a polynomial a0 + a1n + a2n2 + … + adnd then the highest degree term in the polynomial is the term that determines the asymptotic growth rate of that polynomial.
Example of Big O notation: 5n4 + 3n3 + 2n2 + 4n + 1 is equivalent to O (n4)
Justification: 5n4 + 3n3 + 2n2 + 4n + 1 <= (5+ 3+ 2+ 4+ 1) n4 = cn4 for c = 15, when n > n0 = 1
In general we should use the big O notation to characterize a function as closely as possible. While it is true that the function f(n) = 4n3 + 3n2 is O(n5) or even O(n4), it is more accurate to say that f(n) is O(n3).
The Big Ω notation: Just as the big O notation provides an asymptotic way of saying that a function is “less than or equal to” another function, the following notations provide an asymptotic way of saying that a function grows at a rate that is “greater than or equal to” that of another. This means f(n) >= cg(n), for n >= n0.
This definition allows us to say asymptotically that one function is greater than or equal to another, up to a constant factor.
Example of Big Ω notation: 3n logn + 2n is Ω(n logn)
Justification: 3n logn + 2n >= 3n logn, for n >= 2
The Big θ notation: In addition there is a notation that allows us to say that two functions grow at the same rate, up to constant factors. We say that f(n) is θ(n) if f(n) is O(g(n)) and f(n) is Ω(g(n)). This means there are real constants c’ > 0 and c’’ > 0, and an integer constant n0 >= 1, such that c’g(n) <= f(n) <= c’’g(n), for n >= n0.
Example of Big θ notation: 3n logn + 4n + 5logn is θ(n logn)
Justification: 3n logn <= 3n logn + 4n + 5logn <= (3+4+5) nlogn for n >= 2.
Increasing order of growth or most commonly used function in asymptotic analysis.
1 logn n nlogn n2 n3 2n
We discussed about the ways to analyze algorithms and the asymptotic notations in detail, we also analyzed the running time of the Insertion Sort problem. More on the topic in coming articles.
Stay connected and stay Subscribed