I am currently reading Cormen's "Introduction to Algorithms" and I found something called a sentinel.
It's used in the mergesort algorithm as a tool to decide when one of the two merging lists is exhausted. Cormen uses the infinity symbol for the sentinels in his pseudocode and I would like to know how such an infinite value can be implemented in C.
A sentinel is just a dummy value. For strings, you might use a NULL pointer, since that's not a sensible thing to have in a list. For integers, you might use a value unlikely to occur in your data set, e.g. if you are dealing with a list of ages, you can use -1 to mark the end of the list.
You can get an "infinite value" for floats, but it's not the best idea. For arrays, pass the size explicitly; for linked lists, use a NULL pointer as the sentinel.
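For what it's worth, C99's <math.h> does define a genuine floating-point infinity you could use as a sentinel:

#include <math.h>

double sentinel = INFINITY;  /* compares greater than any finite double */

For integer types there is no infinity, which is why the size or an out-of-band value is used instead.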
In C, when sorting an array, you usually know the size, so you can sort a range [begin, end) in which end points one past the last element. E.g. int a[n] could be sorted as sort(a, a + n).
This allows you to do two things:
call your sort recursively with the part of the array you haven't sorted yet (merge sort is a recursive algorithm; see the sketch after this list)
use end as a sentinel.
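A minimal sketch of the [begin, end) convention (merge is an assumed helper that merges the two sorted halves; it is not shown here):

#include <stddef.h>

void merge(int *begin, int *mid, int *end);  /* assumed helper, not shown */

/* Sort the half-open range [begin, end). */
void merge_sort(int *begin, int *end) {
    ptrdiff_t n = end - begin;
    if (n < 2)
        return;
    int *mid = begin + n / 2;
    merge_sort(begin, mid);   /* recurse on the left half  */
    merge_sort(mid, end);     /* recurse on the right half */
    merge(begin, mid, end);   /* merge [begin, mid) and [mid, end) */
}

Called as merge_sort(a, a + n).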
If the elements in your list can range over all values of the given data type, the code you are looking at won't work; you'll have to come up with something else, which I am sure can be done. But I have that book in front of me right now, and if you know the values range from the smallest value of the data type to at most the largest minus one, I have a solution that will work for you.

Open the book back up to page 31 and take a look at the Merge function. The lines causing you problems are lines 8 and 9, where the sentinel value of infinity is used. We know the two sub-arrays are each sorted already and just need to be merged. That means the largest element of each half is at the end of its sub-array, and the larger of those two is the largest element of the merged array. All we need to do is take the larger of those two values, increment it by one, and use that as our sentinel. So lines 8 and 9 of the code can be replaced by the following six lines:
if L[n1] < R[n2]
    largest = R[n2]
else
    largest = L[n1]
L[n1 + 1] = largest + 1
R[n2 + 1] = largest + 1
That should work for you. I have a test tomorrow in my algorithms course on this stuff, and I came across your post here and thought I'd help you out. The authors' use of sentinels in this book has always bugged me, and I absolutely cannot stand how much they are in love with recursion. Iteration is faster and, in my opinion, usually easier to come up with and grasp.
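For reference, a rough 0-indexed C translation of those six lines (my own sketch; the CLRS pseudocode is 1-indexed, and L and R are assumed to each have one spare slot for the sentinel, as in the book):

/* L[0..n1-1] and R[0..n2-1] are sorted, each with one spare slot.
   Their last elements are their maxima, so anything strictly
   greater than both works as a sentinel. */
int largest = (L[n1 - 1] < R[n2 - 1]) ? R[n2 - 1] : L[n1 - 1];
L[n1] = largest + 1;   /* sentinel slot */
R[n2] = largest + 1;

As noted above, this assumes largest + 1 does not overflow the type.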
The trick is that the inner while loops increment the index into only one of the two lists at a time, and the sentinels let you skip the bounds check on that index. Hence you need sentinels that are larger than all other elements. In C++ I usually use std::numeric_limits<TYPE>::max().
The C equivalents are macros like INT_MAX, UINT_MAX, LONG_MAX, etc. from <limits.h>. Those are good sentinels. If you need two different sentinels, use ..._MAX and ..._MAX - 1.
This is all assuming you're merging two lists that are ordered ascending.
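Putting that together, a minimal C sketch of the sentinel-based merge (my own code, assuming no real element ever equals INT_MAX and that the halves are small enough for stack buffers):

#include <limits.h>
#include <string.h>

/* Merge the sorted halves a[lo..mid-1] and a[mid..hi-1] in place.
   The INT_MAX sentinels mean the loop never has to test whether
   either half is exhausted. */
void merge(int *a, int lo, int mid, int hi) {
    int n1 = mid - lo, n2 = hi - mid;
    int L[n1 + 1], R[n2 + 1];               /* VLAs, so C99 or later */
    memcpy(L, a + lo, n1 * sizeof *L);
    memcpy(R, a + mid, n2 * sizeof *R);
    L[n1] = INT_MAX;                        /* sentinels */
    R[n2] = INT_MAX;
    for (int i = 0, j = 0, k = lo; k < hi; k++)
        a[k] = (L[i] <= R[j]) ? L[i++] : R[j++];
}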
I need to optimize my algorithm for counting how many numbers in an (unsorted) array are larger than, smaller than, or equal to a given number.
I have to do this many times, and the array can have thousands of elements.
The array doesn't change; only the number changes.
Example:
array: 1,2,3,4,5
n = 3
Number of <: 2
Number of >: 2
Number of ==: 1
First thought:
Iterate through the array and check whether each element is >, <, or == n.
This is O(n*k) for k queries.
Possible optimization:
O((n+k) * log n)
First sort the array (I'm using C's qsort), then use binary search to find an equal element, and then somehow count the smaller and larger values. But how do I do that?
If the element exists (bsearch returns a pointer to it), I also need to check whether the array contains duplicates of this element (so I need to check before and after this element while they are equal to the found element), and then use some pointer arithmetic to count the larger and smaller values.
How do I get the number of larger/smaller values from a pointer to an equal element?
And what do I do if the value isn't found (bsearch returns NULL)?
If the array is unsorted, and the numbers in it have no other useful properties, there is no way to beat an O(n) approach of walking the array once, and counting items in the three buckets.
Sorting the array followed by a binary search would be no better than O(n), assuming that you employ a sort algorithm that is linear in time (e.g. a radix sort). For comparison-based sorts, such as quicksort, the timing would increase to O(n*log2 n).
On the other hand, sorting would help if you need to run multiple queries against the same set of numbers. The timing for k queries against n numbers would go from O(n*k) for k linear searches to O(n + k*log2 n) assuming a linear-time sort, or O((n+k)*log2 n) with a comparison-based sort. Given a sufficiently large k, the average query time would go down.
Since the array is (apparently?) not changing, presort it. This allows a binary search (O(log n)).
a.) implement your own version of bsearch (it will be less code anyhow)
you can do it inline, using indices instead of pointers
you won't need function pointers to a specialized function
b.) Since you say that you want to count the number of matches, you imply that the array can contain multiple entries with the same value (otherwise you would have used a boolean has_n).
This means you'll need to do a linear search for the beginning and end of the run of "n"s.
From which you can calculate the number less than n and greater than n.
It appears that you have some unwritten algorithm for choosing these (for n=3 you look for count of values greater and less than 2 and equal to 1, so there is no way to give specific code)
c.) For further optimization (at the expense of memory) you can sort the data into a binary search tree of structs that holds not just the value, but also the count and the number of values before and after each value. It may not use more memory at all if you have a lot of repeat values, but it is hard to tell without the dataset.
That's as much as I can help without code that describes your hidden algorithms and data or at least a sufficient description (aside from recommending a course or courses in data structures and algorithms).
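To make (a) and (b) concrete, here is one way it could look; lower_bound and upper_bound are my own names, and instead of a linear scan for the ends of the run of n's, two binary searches give both ends in O(log n). This also answers the "what if bsearch returns NULL" problem, since no exact match is needed:

#include <stdio.h>
#include <stddef.h>

/* Index of the first element >= n in the sorted array a. */
static size_t lower_bound(const int *a, size_t len, int n) {
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < n) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Index of the first element > n in the sorted array a. */
static size_t upper_bound(const int *a, size_t len, int n) {
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= n) lo = mid + 1; else hi = mid;
    }
    return lo;
}

void count_around(const int *a, size_t len, int n) {
    size_t lb = lower_bound(a, len, n);   /* elements before lb are < n  */
    size_t ub = upper_bound(a, len, n);   /* elements from ub on are > n */
    printf("<: %zu  ==: %zu  >: %zu\n", lb, ub - lb, len - ub);
}

Duplicates are handled for free: ub - lb is exactly the number of elements equal to n, even if that is zero.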
The solution for finding the median of two sorted arrays is awesome. However, I am still very confused about the code that calculates k:
var aMid = aLength * k / (aLength + bLength)
var bMid = k - aMid - 1
I guess this is the key part of the algorithm, and I don't understand why it is calculated like this. To explain more clearly what I mean: the core logic is divide and conquer, and lists of different sizes should be divided differently, but I don't see why this particular formula works.
Can someone give me some insight into it? I searched lots of online documents, and it is very hard to find material that explains this part well.
Many thanks in advance
The link shows two different ways of computing the comparison points in each array: one always uses k/2, even if the array doesn't have that many elements; the other (which you quote) tries to distribute the comparison points based on the size of the arrays.
As can be seen from these two examples, neither of which is optimal, it doesn't make much difference how you compute the comparison points, as long as the size of the two components is generally linear in k (using a fixed size of 5 for one of the comparison points won't work, for example).
The algorithm effectively reduces the problem size by either aMid or bMid on each iteration. Ideally, the problem size would be reduced by k/2, and that's the computation you should use if both arrays have at least k/2 members. If one has too few members, you can set the comparison point for that array to its last element, and compute the other comparison point so that the total is k - 1. If you end up discarding all of the elements from some array, you can then immediately return element k of the other array.
That strategy will usually perform fewer iterations than either of the proposals in your link, but it is still O(log k).
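For illustration, here is one common variant of that strategy in C (my own sketch; k is 1-based and assumed to satisfy 1 <= k <= na + nb; this variant counts elements rather than 0-based indices, so the two comparison counts total k rather than k - 1):

/* Return the k-th smallest element of the union of the two sorted
   arrays a (na elements) and b (nb elements). */
int find_kth(const int *a, int na, const int *b, int nb, int k) {
    if (na == 0) return b[k - 1];          /* one array exhausted */
    if (nb == 0) return a[k - 1];
    if (k == 1) return a[0] < b[0] ? a[0] : b[0];
    int ia = k / 2 < na ? k / 2 : na;      /* comparison point in a */
    int ib = k - ia;                       /* rest of the budget in b */
    if (ib > nb) { ib = nb; ia = k - ib; } /* b too short: shift to a */
    if (a[ia - 1] < b[ib - 1])
        return find_kth(a + ia, na - ia, b, nb, k - ia);  /* drop a's prefix */
    else
        return find_kth(a, na, b + ib, nb - ib, k - ib);  /* drop b's prefix */
}

Each call discards ia or ib elements, which is about k/2 whenever both arrays are long enough, giving the O(log k) behaviour described above.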
What is the best algorithm for detecting duplicate numbers in an array: the best in speed, memory, and avoiding overhead?
A small array like [5, 9, 13, 3, 2, 5, 6, 7, 1]. Note that 5 is duplicated.
After searching and reading about sorting algorithms, I realized that I would use one of these algorithms: Quick Sort, Insertion Sort, or Merge Sort.
But actually I am really confused about what to use in my case, which is a small array.
Thanks in advance.
To be honest, with an array of that size, you may as well choose the O(n^2) solution (checking every element against every other element).
You'll generally only need to worry about performance if/when the array gets larger. For small data sets like this, you could well have found the duplicate with an 'inefficient' solution before the sort phase of an efficient solution has finished :-)
In other words, you can use something like (pseudo-code):
for idx1 = 0 to nums.len - 2 inclusive:
    for idx2 = idx1 + 1 to nums.len - 1 inclusive:
        if nums[idx1] == nums[idx2]:
            return nums[idx1]
return no dups found
This finds the first value in the array which has a duplicate.
If you want an exhaustive list of duplicates, then just add the duplicate value to another (initially empty) array (once only per value) and keep going.
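As runnable C rather than pseudo-code, that might look like this (a sketch; the names are mine):

#include <stddef.h>

/* Store the first duplicated value in *dup and return 1,
   or return 0 if all elements are distinct.  O(n^2). */
int first_duplicate(const int *nums, size_t len, int *dup) {
    for (size_t i = 0; i + 1 < len; i++)
        for (size_t j = i + 1; j < len; j++)
            if (nums[i] == nums[j]) {
                *dup = nums[i];
                return 1;
            }
    return 0;
}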
You can sort it using any half-decent algorithm, though; for a data set of the size you're discussing, even a bubble sort would probably be adequate. Then you just process the sorted items sequentially, looking for runs of values, but that's probably overkill in your case.
Two good approaches depend on whether or not you know the range from which the numbers are drawn.
Case 1: the range is known.
Suppose you know that all numbers are in the range [a, b), so the length of the range is l = b - a.
You can create an array A of length l, fill it with 0s, then iterate over the original array and, for each element e, increment the value of A[e-a] (here we are actually mapping the range onto [0, l)).
Once finished, you can iterate over A and find the duplicate numbers. In fact, if there exists i such that A[i] is greater than 1, it implies that i+a is a repeated number.
The same idea is behind counting sort, and it works fine also for your problem.
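A sketch of case 1 in C (my own names; it assumes all values really are in [a, b)):

#include <stdio.h>
#include <stdlib.h>

/* Print every value of arr[0..n-1] that occurs more than once,
   given that all values lie in [a, b).  O(N + l) time, O(l) memory. */
void print_duplicates(const int *arr, size_t n, int a, int b) {
    size_t l = (size_t)(b - a);
    int *counts = calloc(l, sizeof *counts);   /* the array A, zeroed */
    if (counts == NULL) return;
    for (size_t i = 0; i < n; i++)
        counts[arr[i] - a]++;                  /* map value e to index e - a */
    for (size_t i = 0; i < l; i++)
        if (counts[i] > 1)                     /* i + a is a repeated number */
            printf("%d occurs %d times\n", (int)i + a, counts[i]);
    free(counts);
}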
Case 2: the range is not known.
Quite simple. Slightly modify the approach mentioned above: instead of an array, use a map where the keys are the numbers from your original array and the values are the number of times you find them. At the end, iterate over the keys and pick those that were found more than once.
Note.
In both cases mentioned above, the complexity is O(N), and you cannot do better, since you have to visit all the stored values at least once.
Look at the first example: we iterate over two arrays whose lengths are N and l <= N, so the complexity is at most 2N, that is, O(N).
The second example is indeed a bit more complex and dependent on the implementation of the map, but for the sake of simplicity we can safely assume that it is O(N).
In memory, you construct data structures whose sizes are proportional to the length of the range (case 1) or to the number of different values in the original array (case 2).
As usual, memory occupancy and performance drive the choice: the more of the former, the better the latter, and vice versa. As suggested in another answer, if you know that the array is small, you can safely rely on an O(N^2) algorithm that requires no extra memory at all.
Which is the best choice? Well, it depends on your problem, we cannot say.
The task is to count the unique numbers in a given array. I saw numerous similar questions on SO, but here we have additional requirements which weren't stated in other questions:
Amount of allowed additional memory is O(1)
Changes to the array are prohibited
I was able to write a quadratic algorithm which satisfies the given constraints. But I keep wondering: could one do better on such a problem? Thank you for your time.
Algorithm working in O(n^2):
def count(a):
    unique = len(a)
    ind = 0
    while ind < len(a):
        x = a[ind]
        i = ind + 1
        while i < len(a):
            if a[i] == x:
                unique -= 1
                break
            i += 1
        ind += 1
    print("Total uniques: ", unique)
This is a very similar problem to a follow-up question in chapter 1 (Arrays and Strings) from Cracking the Coding Interview:
Implement an algorithm to determine if a string has all unique
characters. What if you cannot use additional data structures?
The answer (to the follow-up question) is that if you can't assume anything about the array (namely, it is not sorted, you don't know its size, etc.), then there is no algorithm better than what you showed.
That being said, you may think about relaxing the constraints a little to make it more interesting. For example, if you have an upper bound on the values in the array, you could use a bit vector to keep track of which values you've already read while traversing the array, although this is not strictly an O(1) solution in terms of memory (one could argue that the bit vector has a constant, known-in-advance size and is thus O(1), but that is a little bit of cheating). Similarly, if the array were sorted, you could also solve it in O(n) by going through the elements one at a time and checking whether the neighbors are different numbers.
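For instance, a sketch of the bit-vector idea in C, under the (assumed) bound that all values lie in [0, MAX_VALUE):

#include <stddef.h>
#include <stdint.h>

#define MAX_VALUE 1024   /* assumed bound, not part of the original question */

/* Count distinct values in a[0..n-1] in one pass. */
size_t count_unique(const int *a, size_t n) {
    uint8_t seen[MAX_VALUE / 8] = {0};   /* one bit per possible value */
    size_t unique = 0;
    for (size_t i = 0; i < n; i++) {
        size_t byte = (size_t)a[i] / 8, bit = (size_t)a[i] % 8;
        if (!(seen[byte] & (1u << bit))) {
            seen[byte] |= (uint8_t)(1u << bit);
            unique++;
        }
    }
    return unique;
}

The "cheating" is visible here: seen is technically constant-size, but only because MAX_VALUE was fixed up front.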
Because there is no underlying structure in the given array (sorted, etc.), you are forced to brute-force every value in the array...
There is a more complicated approach that I believe would work. It entails keeping your array of unique numbers sorted. This means inserting into the array takes more time, but looking up values becomes much quicker. You should be able to find the insertion point in log n time by looking at the value directly in the middle of the array, checking whether your value is larger or smaller, eliminating half the array as a valid insertion location, and repeating. You would use a similar approach to look up values in the array. The only issue is that this requires more memory than you are allowed (O(1)).
That being said, I think the given constraints restrict the task to O(n^2).
You are given an unsorted array of n integers, and you would like to find if there are any duplicates in the array (i.e. any integer appearing more than once).
Describe an algorithm (implemented with two nested loops) to do this.
My description of the algorithm:
In step 1, we write a while loop to check whether the array is empty/null; if the array is not null, then we proceed with an inner loop.
Step 2: we now write a for loop that runs n-1 iterations. In that loop we will assign to the variable current the first index in the array (on the first iteration), and update current by index + 1 on each pass through the iteration, which means that the first time current will hold the first index in the array, the second time the second index, and so on until the loop ends.
Step 3: within the for loop from step 2, we write an inner loop to compare the current number to all the integers in the array. If an integer equals the next number, we print it with a printf statement; otherwise we update next to hold the next index in the array and compare that to current, and so on until current has been compared to all the integers in the array. Once this has been done, current is updated to the next index of the array, and that number is compared to all the integers in the array in turn.
Will the algorithm be correct (according to the question)? Your suggestions would be appreciated. And no, it's not a homework question or such. Thank you for your time.
The complexity is definitely O(N^2): the nested loops perform on the order of N * (N - 1) / 2 comparisons, which simplifies to O(N^2).
Edit:
I have added a description of a more efficient algorithm (in the question below). But going back to the question above: would it be suitable as an answer for an exam question? (It has shown up in previous papers, so I would really appreciate your help.)
If we limit the input data in order to achieve some best case scenario, how can you limit the input data to achieve a better Big O complexity? Describe an algorithm for handling this limited data to find if there are any duplicates. What is the Big O complexity?
If we limit the data to, let's say, an array of size 5 (n = 5), we could reduce the complexity to O(N). If the array is sorted, then all we need is a single loop comparing each element to the next element in the array, and this will find any duplicates.

In other words, if the array given to us is by default (or luckily) already sorted from lowest to highest value, the complexity drops from O(N^2) to O(N): we no longer need the inner loop for comparing integers, since the array is already sorted, so a single loop can compare each integer to its successor. If a duplicate is encountered, we could, for instance, print it with a printf statement, iterate the loop n-1 times (which would be 4 here), and end the program once that has been done.

The best case of this algorithm is O(N) simply because the running time grows linearly and in direct proportion to the size of the input. With a sorted array of size 50 (50 integers in the array), the loop iterates n-1 = 49 times, where n is the length of the array. The amount of time the operations take is completely dependent on the input size of the array.
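A sketch of that single loop in C, assuming the array is already sorted ascending (names are mine):

#include <stdio.h>
#include <stddef.h>

/* One pass over a sorted array: print each duplicated value once. */
void print_dups_sorted(const int *a, size_t n) {
    for (size_t i = 0; i + 1 < n; i++)
        if (a[i] == a[i + 1] && (i == 0 || a[i] != a[i - 1]))
            printf("duplicate: %d\n", a[i]);
}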
P.S. Sure, there are other, faster algorithms, but to my knowledge the question asks for a better Big O complexity than the first algorithm, and I believe this one achieves that. (Correct me if I'm wrong.) Thanks :)
You describe three loops, but the first is actually just a condition (if null or empty, abort).
The remaining algo sounds good, except instead of "current will hold the first index in the array" (which nitpickers would insist is always 0 in C) I'd say "current will hold the value of the first element in the array" or such.
As an aside (although I understand it's a practice assignment), it's terribly inefficient (I think n^2 is correct). I'd urge you to just have one loop over the array, copying the checked numbers into a sorted structure of some kind and doing binary searches in it. (As a teacher I'd have my students describe a balanced tree first so that they can use it here, like a virtual library ;-) ).
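A sketch of that aside in C (my own code; a balanced tree would avoid the O(n) memmove on insert, but a sorted buffer keeps the idea visible):

#include <stdlib.h>
#include <string.h>

/* One pass over the input, keeping the values seen so far in a
   sorted buffer and binary-searching it for each new value.
   Returns 1 if a duplicate exists, 0 otherwise (-1 on allocation
   failure). */
int has_duplicate(const int *a, size_t n) {
    if (n < 2) return 0;
    int *seen = malloc(n * sizeof *seen);
    size_t count = 0;
    int found = 0;
    if (seen == NULL) return -1;
    for (size_t i = 0; i < n && !found; i++) {
        size_t lo = 0, hi = count;
        while (lo < hi) {                  /* binary search for a[i] */
            size_t mid = lo + (hi - lo) / 2;
            if (seen[mid] < a[i]) lo = mid + 1; else hi = mid;
        }
        if (lo < count && seen[lo] == a[i]) {
            found = 1;                     /* already seen: duplicate */
        } else {                           /* insert at position lo */
            memmove(seen + lo + 1, seen + lo, (count - lo) * sizeof *seen);
            seen[lo] = a[i];
            count++;
        }
    }
    free(seen);
    return found;
}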