I have created a binary search tree in C. When I test my tree, the insertion and search operations take different times to execute depending on the input order. For example, consider two scenarios: inserting the values 1 to 10000 in random order, and inserting them in sorted order. Inserting the random values takes much less time than inserting the sorted values.
The same holds for search: searching the tree built from random values is fast, but searching the tree built from sorted values takes far longer.
Now, the problem is the time complexity. Can anyone explain what is going on, and what the time complexity is for all four cases?
Note: inserting and searching the sorted values take almost the same time, though searching still takes a bit longer!
If you don't balance the tree, its structure depends on the insertion order, and a "fully unbalanced" binary search tree is equivalent to a sorted linked list.
Thus, the worst case time complexity for your operations is linear in the tree's size, not logarithmic as it would be in a balanced tree.
For instance, if you insert 1, 2, 3, ... in increasing order, you'll end up with

1
 \
  2
   \
    3
     \
      ...

where the right "spine" is effectively a linked list.
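The degeneration is easy to reproduce with a minimal unbalanced insert. A C++ sketch (the question's tree is in C, so these names are illustrative, not the asker's actual code):

```cpp
#include <cassert>

// Minimal unbalanced BST insert (a sketch): inserting keys in sorted
// order makes every new node the right child of the previous one,
// so the height of the tree grows to n.
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

Node* insert(Node* root, int key) {
    if (!root) return new Node(key);
    if (key < root->key) root->left = insert(root->left, key);
    else                 root->right = insert(root->right, key);
    return root;
}

int height(const Node* n) {
    if (!n) return 0;
    int l = height(n->left), r = height(n->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 1..n in sorted order gives height n (every insert and search walks the whole spine), while a well-mixed order gives height close to log n.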
Use an AVL tree. It will keep your tree balanced, and you will always get O(log n) search time.
I understand that binary search cannot be done on an unordered array.
I also understand that the complexity of binary search on an ordered array is O(log(n)).
Can I ask:
What is the complexity of binary-search insertion into an ordered array? A textbook I saw states that the complexity is O(n). Why isn't it O(1), since it can insert directly, just like linear search?
Since binary search can't be done on an unordered list, why is insertion still possible, with a complexity of O(n)?
Insertion complexity depends on the data structure used:
linear array
In this case you need to shift all the items from the insertion index onward by one, to make room for the new item. This is O(n).
linked list
In this case you just change the prev/next pointers of the neighboring items, so this is O(1).
Now, for an ordered list, if you want to use binary search you can (as you noticed) only use an array. Binary-search insertion of an item a0 into an ordered array a[n] means this:
find where to place a0
This is the binary search part: find the index ix such that
a[ix-1]<=a0 AND a[ix]>a0 // for ascending order
This can be done by binary search in O(log(n)).
insert the item
You first need to move all the items i>=ix by one to make room, and then place the item:
for (int i=n;i>ix;i--) a[i]=a[i-1]; a[ix]=a0; n++;
As you can see, this is O(n).
put it all together
O(n + log(n)) = O(n), and that is why.
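The two steps above can be sketched in C++ on a `std::vector` (using `std::upper_bound` for the search and `std::vector::insert` for the shift; the function name is mine, for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of binary-search insertion into a sorted std::vector:
// an O(log n) search for the position followed by an O(n) shift,
// so the whole insertion is O(log n + n) = O(n).
void insertSorted(std::vector<int>& a, int a0) {
    // first position where a0 can go while keeping ascending order
    auto it = std::upper_bound(a.begin(), a.end(), a0); // O(log n)
    a.insert(it, a0);                                   // O(n) shift
}
```

The binary search saves comparisons, but the shift still touches O(n) elements, which is why the overall insertion is linear.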
BTW, searching a dataset that is not strictly ordered is possible (although it is not called binary search anymore); see
How approximation search works
I'm currently working on a project which involves counting the number of conflicts between two arrays, i.e., the differences in the order in which certain numbers are placed in the arrays. Each number occurs only once, and the two arrays are always the same size.
For example:
[1,2,3,4]
[4,3,2,1]
These two arrays have 6 conflicts:
1 comes before 2 in the first array, but 2 comes before 1 in the second, so conflict + 1.
1 comes before 3 in the first array, but 3 comes before 1 in the second, so conflict + 1.
etc.
I've tried several approaches to find an algorithm which computes the count in O(n log n). I've already made one using dynamic programming which is O(n²), but I want an algorithm which computes the value by divide and conquer.
Does anyone have any thoughts on this?
You can also use a self-balancing binary search tree for finding the number of conflicts ("inversions").
Let's take an AVL tree, for example.
Initialize inversion count = 0.
Iterate from 0 to n-1 and do the following for every arr[i]:
Insertion also updates the result: keep counting the number of greater nodes as the tree is traversed from root to leaf.
When we insert arr[i], the elements arr[0] to arr[i-1] are already in the AVL tree; all we need to do is count those that are greater.
For insertion into the AVL tree, we traverse from the root towards a leaf, comparing every node with arr[i]. When arr[i] is smaller than the current node, we increase the inversion count by 1 plus the number of nodes in the right subtree of the current node. That is exactly the count of greater elements to the left of arr[i], i.e., inversions.
The time complexity of this solution is O(n log n), since each AVL insert takes O(log n) time.
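The counting idea can be sketched with a size-augmented BST. Note this sketch deliberately omits the balancing: a real solution would add AVL rotations so that each insert is guaranteed O(log n); without them the worst case degrades to O(n) per insert.

```cpp
#include <cassert>
#include <vector>

// Size-augmented BST node. NOT self-balancing: this only illustrates
// how the inversion count is accumulated during insertion.
struct Node {
    int key;
    int size = 1;  // number of nodes in the subtree rooted here
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

static int subtreeSize(const Node* n) { return n ? n->size : 0; }

// Insert key and add to `greater` the number of already-inserted
// keys that are greater than it (the inversions it contributes).
Node* insert(Node* root, int key, long long& greater) {
    if (!root) return new Node(key);
    if (key < root->key) {
        // the current node and its whole right subtree are greater
        greater += 1 + subtreeSize(root->right);
        root->left = insert(root->left, key, greater);
    } else {
        root->right = insert(root->right, key, greater);
    }
    ++root->size;
    return root;
}

long long countInversions(const std::vector<int>& a) {
    long long inversions = 0;
    Node* root = nullptr;
    for (int x : a) root = insert(root, x, inversions);
    return inversions;  // nodes are leaked; fine for a sketch
}
```

For the example in the question, the relabeled second array [4,3,2,1] has 6 inversions, matching the 6 conflicts counted by hand.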
The best case scenario of insertion sort is said to be O(n). However, if you have 2 elements in an array that are already sorted, such as 10 and 11, doesn't it only make one comparison rather than 2?
Time complexity of O(n) does not mean that the number of steps is exactly n, it means that the number of steps is dominated by a linear function. Basically, sorting twice as many elements should take at most twice as much time for large numbers.
The best case for insertion sort is when each new element can be inserted after just one comparison. This can happen in only 2 cases:
You are inserting elements from a reverse-sorted list and you compare each new element with the first element of the target list.
You are inserting elements from a sorted list and you compare each new element with the last element of the target list.
In these 2 cases, each new element is inserted after just one comparison, including in the case you mention.
The time complexity would indeed be O(n) for these very special cases. You do not need such a favorable case for this complexity: the time complexity will be O(n) whenever there is a constant upper bound on the number of comparisons per element, independent of the list length.
Note that it is a common optimization to handle already-sorted lists specially. If the optimization mentioned in the second case above is not implemented, sorting an already sorted list would be the worst case scenario, with n comparisons for the insertion of the (n+1)th element.
In the general case, insertion sort on lists has a time complexity of O(n²), but a careful implementation can produce an optimal solution for already sorted lists.
Note that this is true for lists where inserting at any position has a constant cost, insertion sort on arrays does not have this property. It can still be optimized to handle these special cases, but not both at the same time.
Insertion sort does N - 1 comparisons if the input is already sorted.
This is because for every element it compares it with the previous element and does something if the order is not right (what it does is not important here, because the order is always right). So it does this N - 1 times.
It seems you need to revisit big-O notation: O(n) does not mean exactly n operations, or even close to n operations (n/10^9 is O(n) and is not really close to n). It only means the function is approximately linear (think of it as the limit as n → inf).
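Both claims are easy to check by instrumenting insertion sort with a comparison counter (a minimal C++ sketch; the function name is mine):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort instrumented with a comparison counter. On an
// already-sorted input the while-condition fails on its first test
// for every i, so exactly n - 1 comparisons are performed; on a
// reverse-sorted input it performs n(n-1)/2.
long long insertionSortCountingComparisons(std::vector<int>& a) {
    long long comparisons = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        // each evaluation of a[j-1] > key counts as one comparison
        while (j > 0 && (++comparisons, a[j - 1] > key)) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
    return comparisons;
}
```

For n = 5 this gives 4 comparisons on a sorted input and 10 on a reverse-sorted one: linear versus quadratic, even though neither is exactly n.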
Why do people use binary search trees?
Why not simply do a binary search on the array sorted from lowest to highest?
To me, the insertion/deletion costs seem to be the same, so why complicate life with things such as max/min-heapify, etc.?
Is it just because of random access required within a data structure?
The cost of insertion is not the same. If you want to insert an item in the middle of an array, you have to move all elements to the right of the insertion point by one position; the effort for that is proportional to the size of the array: O(n). With a self-balancing binary tree, the complexity of insertion is much lower: O(log n).
What is the benefit of a binary search tree over a sorted array with binary search? Just with mathematical analysis I do not see a difference, so I assume there must be a difference in the low-level implementation overhead. Analysis of average case run time is shown below.
Sorted array with binary search
search: O(log(n))
insertion: O(log(n)) (we run binary search to find where to insert the element)
deletion: O(log(n)) (we run binary search to find the element to delete)
Binary search tree
search: O(log(n))
insertion: O(log(n))
deletion: O(log(n))
Binary search trees have a worst case of O(n) for operations listed above (if tree is not balanced), so this seems like it would actually be worse than sorted array with binary search.
Also, I am not assuming that we have to sort the array beforehand (which would cost O(n log n)); we would insert elements one by one into the array, just as we would do for the binary tree. The only benefit of a BST I can see is that it supports other kinds of traversals, such as inorder, preorder, and postorder.
Your analysis is wrong: both insertion and deletion are O(n) for a sorted array, because you have to physically move the data to make room for the inserted item, or to close the gap left by the deleted one.
Oh, and the worst case for a completely unbalanced binary search tree is O(n), not O(log n).
There's not much of a benefit in querying either one.
But constructing a sorted tree is a lot faster than constructing a sorted array, when you're adding elements one at a time. So there's no point in converting it to an array when you're done.
Note also that there are standard algorithms for maintaining balanced binary search trees. They get rid of the deficiencies in binary trees and maintain all of the other strengths. They are complicated, though, so you should learn about binary trees first.
Beyond that, the big-O may be the same, but the constants aren't always. With binary trees if you store the data correctly, you can get very good use of caching at multiple levels. The result is that if you are doing a lot of querying, most of your work stays inside of CPU cache which greatly speeds things up. This is particularly true if you are careful in how you structure your tree. See http://blogs.msdn.com/b/devdev/archive/2007/06/12/cache-oblivious-data-structures.aspx for an example of how clever layout of the tree can improve performance greatly. An array that you do a binary search of does not permit any such tricks to be used.
Adding to @Blindy's answer, I would say that insertion into a sorted array is dominated by the O(n) memory operation (std::rotate()) rather than by the O(log n) CPU work of the search; compare with insertion sort.
std::vector<MYINTTYPE> sorted_array;
// ... ...
// insert x at the end
sorted_array.push_back(x);
// O(log n) CPU operation: find where x belongs
auto insertion_point = std::lower_bound(sorted_array.begin(),
                                        sorted_array.end() - 1, x);
// O(n) memory operation: rotate x from the back into place
std::rotate(insertion_point, sorted_array.end() - 1, sorted_array.end());
I guess a left-child right-sibling tree combines the essence of a binary tree and a sorted array.
| data structure | operation | CPU cost | memory operation cost |
|---|---|---|---|
| sorted array | insert | O(log n) (benefits from pipelining) | O(n), refer to insertion sort using std::rotate() |
| sorted array | search | O(log n), benefits from inline implementation | |
| sorted array | delete | O(log n) (when pipelined with the memory operation) | O(n), refer to std::vector::erase() |
| balanced binary tree | insert | O(log n) (branch prediction hurts pipelining; added cost of tree rotation) | additional cost of pointers that exhaust the cache |
| balanced binary tree | search | O(log n) | |
| balanced binary tree | delete | O(log n) (same as insert) | |
| left-child right-sibling tree | insert | O(log n) on average | no std::rotate() needed when inserting at a left child if kept unbalanced |
| left-child right-sibling tree | search | O(log n) (worst case O(n) when unbalanced) | takes advantage of cache locality in the right-sibling search, refer to std::vector::lower_bound() |
| left-child right-sibling tree | delete | O(log n) (when hyperthreading/pipelining) | O(n), refer to std::vector::erase() |