Array Tree... Maybe recursion?

I am trying to divide arrays recursively... I think that is what this would be called, haha.
For instance, let's say the initial array contains 50 values, the highest being 97 and the lowest being 7. I want to split this array into two, dividing the values based on whether they are greater or lower than the midrange of the entire set. The midrange here is 52 ( (97+7)/2 ).
Then I want to divide these two arrays using the same method, and so on, ideally in a program that repeats this process an arbitrary number of times.
Load values into Array1
Find midrange
For every value in Array1 {
    if value > midrange {
        assign value to ArrayHigh1
    } else {
        assign value to ArrayLow1
    }
}
Perform the same thing on ArrayHigh1 and ArrayLow1
Etc. etc. etc.
I'm having trouble figuring out how I would create the successive arrays (ArrayHigh2, ArrayHigh3, etc.).
Also, I feel like there must be an easier way to do this, but I cannot think of one at the moment...
Thanks for the help

You seem to be working your way towards a B-tree, or an implementation of mergesort or quicksort. Plenty of reference implementations are available online.
Though speaking generally, you might benefit greatly from reading a book many here are familiar with.
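If you do want to build the splitting tree exactly as described, you don't need to invent names like ArrayHigh2, ArrayHigh3 and so on: each recursive call can simply carry its own local low/high arrays. A minimal sketch in C (the depth limit, the VLAs, and the function name are my own illustrative choices):

#include <stdio.h>

/* Recursively split a[0..n-1] around the midrange (min+max)/2.
   Each call partitions into local low/high arrays and recurses,
   which builds exactly the tree the question describes. */
static void split(const int *a, int n, int depth)
{
    if (n < 2 || depth == 0)
        return;

    int min = a[0], max = a[0];
    for (int i = 1; i < n; i++) {
        if (a[i] < min) min = a[i];
        if (a[i] > max) max = a[i];
    }
    int mid = (min + max) / 2;          /* e.g. (97+7)/2 = 52 */

    int low[n], high[n];                /* VLAs: fine for a sketch */
    int nlow = 0, nhigh = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > mid)
            high[nhigh++] = a[i];
        else
            low[nlow++] = a[i];
    }

    printf("midrange %d: %d low, %d high\n", mid, nlow, nhigh);

    split(low, nlow, depth - 1);        /* same method on each half */
    split(high, nhigh, depth - 1);
}

Called as split(values, 50, 4), this performs four levels of splitting; the depth argument gives you the "arbitrary number of times".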

Related

How to find an almost sorted array?

I'm trying to create a program that will select the fastest sorting algorithm for a particular array of integers. I'm trying to check for the condition "is almost sorted", and I was wondering what the common industry practice for detecting this is.
Assume that a sorted copy of the array is available to the coder. The two possible solutions I can think of are:
Loop through both lists simultaneously and compare the values at each index to find the percentage of correctly placed values. I understand that this is pretty quick (just O(N)), but it can be wildly inaccurate: what if everything is shifted by one position? This measure gives 0, but insertion sort would order the array in a single pass.
Find how far each element is shifted from its correct position in either direction (with wraparound). This seems to be a better measure, but it could be pretty slow (O(N^2), since we might have to search the sorted list for every unsorted element, which could be sped up a bit by comparing values in a while loop).
Are there others? If not, which do I pick?
Thanks!
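For reference, the first measure is a single pass over both arrays. A sketch in C (assuming, as the question states, that a sorted copy is available):

#include <stddef.h>

/* Fraction of elements already at their sorted position. O(N),
   but as noted above it reports 0 for an array that is merely
   shifted by one slot, even though that array is nearly sorted. */
double fraction_in_place(const int *a, const int *sorted, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] == sorted[i])
            hits++;
    return n ? (double)hits / (double)n : 1.0;
}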

Sorting n sets of data into one

I have n arrays of data, each sorted by the same criteria.
The number of arrays will, in almost all cases, not exceed 10, so it is a relatively small number. Each array, however, can hold a large number of objects, which should be treated as infinite for the algorithm I am looking for.
I now want to treat these arrays as if they were one array. However, I need a way to retrieve the objects in a given range as fast as possible, without touching all the objects before and/or after the range. It is therefore not an option to iterate over all objects and store them in one single array. Fetches with low start values are also more likely than fetches with high start values: e.g. fetching objects [20,40) is much more likely than fetching objects [1000,1020), but the latter can happen.
The range itself will be pretty small, around 20 objects, or it can be increased if that helps performance, as long as it stays within memory limits. So I would guess a couple of hundred objects would be fine as well.
Example:
3 arrays, each containing a couple of thousand entries. I now want to get the overall objects in the range [60, 80) without touching all of the first 60 objects in each array, nor all the objects that come after object 80.
I am thinking about some sort of combined, modified binary search. My current idea is something like the following (note, that this is not fully thought through yet, it is just an idea):
get object 60 of each array; the beginning of the range cannot be after that, since any single array could already supply the first 60 objects on its own
use these objects as the upper bounds for the binary search in every array
from one of the arrays, get the centered object (e.g. object 30)
with a binary search in all the other arrays, find the object in each array that comes before, but as close as possible to, the picked object.
we now have 3 objects, e.g. at positions 15, 10 and 20. The sum of these positions is 45, so there are 42 objects in front of them, which is more than the beginning of the range we are looking for (30 in this walk-through). We continue our binary search in the remaining left half of one of the arrays.
if we instead get a value where the sum is smaller than the beginning of the range we are looking for, we continue our search on the right.
at some point we will hit object 30. From there on, we can simply add the objects from each array, one by one, with an insertion sort, until we hit the range length.
My questions are:
Is there any name for this kind of algorithm I described here?
Are there other algorithms or ideas for this problem, that might be better suited for this issue?
Thanks in advance for any idea or help!
People usually call this problem something like "selection in the union of multiple sorted arrays". One of the questions in the sidebar is about the special case of two sorted arrays, and this question is about the general case. Several comparison-based approaches appear in the combined answers; they more or less have to determine where the lower endpoint in each individual array is. Your binary search answer is one of the better approaches; there's an asymptotically faster algorithm due to Frederickson and Johnson, but it's complicated and not obviously an improvement for small ranks.
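For the common case of integer keys, one concrete way to locate the lower endpoint is to binary-search over the value domain and, for each candidate value, count per array (by binary search) how many elements are at most that value. A sketch of that simple approach (my own illustration, not the Frederickson and Johnson algorithm mentioned above; the caller must pass a valid rank r):

#include <limits.h>

/* number of elements in sorted a[0..n-1] that are <= v */
static long count_leq(const int *a, long n, int v)
{
    long lo = 0, hi = n;
    while (lo < hi) {
        long m = lo + (hi - lo) / 2;
        if (a[m] <= v) lo = m + 1;
        else           hi = m;
    }
    return lo;
}

/* value with global rank r (0-based) across k sorted arrays:
   the smallest v such that more than r elements overall are <= v */
int select_rank(const int *const *arrs, const long *lens, int k, long r)
{
    int lo = INT_MIN, hi = INT_MAX;
    while (lo < hi) {
        int mid = (int)(lo + ((long long)hi - lo) / 2);
        long total = 0;
        for (int i = 0; i < k; i++)
            total += count_leq(arrs[i], lens[i], mid);
        if (total <= r) lo = mid + 1;
        else            hi = mid;
    }
    return lo;
}

Once the lower endpoint is known, you can advance through the k arrays pointer-wise and merge out the ~20 objects of the range, which matches the insertion step at the end of the question.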

How to know if an array is sorted?

I already read this post, but the answer didn't satisfy me: Check if Array is sorted in Log(N).
Imagine I have a seriously big array of over 1,000,000 doubles (positive and/or negative) and I want to know whether the array is "sorted", trying to avoid the maximum number of comparisons, because comparing doubles and floats takes too much time. Is it possible to use statistics for this? And if so:
1. Is it well regarded by real programmers?
2. Should I take samples?
3. How many samples should I take?
4. Should they be random, or in a sequence?
5. What error percentage is permitted to still say "the array is sorted"?
Thanks.
That depends on your requirements. If you can say that 100 random samples out of 1,000,000 are enough to assume it's sorted, then so it is. But to be absolutely sure, you will always have to go through every single entry. Only you can answer this question, since only you know how certain you need to be about it being sorted.
This is a classic probability problem taught in high school. Consider this question:
In a batch of 8,000 clocks, 7% are defective. A random sample of 10 (without replacement) is selected from the 8,000 and tested. If at least one is defective, the entire batch will be rejected.
What is the probability that the batch will be rejected?
So you can take a number of random samples from your large array and check whether they are in order, but note that you would need to know the probability that a sample is out of order. Since you don't have that information, a probabilistic approach won't work efficiently here.
(However, you can check 50% of the array and naively conclude that there is a 50% chance that it is sorted correctly.)
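For completeness, the textbook batch example works out like this (7% of 8,000 is 560 defective, so 7,440 good):

#include <stdio.h>

int main(void)
{
    /* P(no defective among 10 clocks drawn without replacement
       from 8,000 of which 7,440 are good) */
    double p_all_good = 1.0;
    for (int i = 0; i < 10; i++)
        p_all_good *= (7440.0 - i) / (8000.0 - i);

    /* the batch is rejected if at least one is defective */
    printf("P(reject) = %.4f\n", 1.0 - p_all_good);   /* about 0.52 */
    return 0;
}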
If you run a divide-and-conquer algorithm using multiprocessing (real parallelism, so only on multi-core CPUs) you can check whether an array is sorted in O(log N) time, given enough processors.
If you have GPU multiprocessing you can achieve this very easily, since modern graphics cards are able to run a few thousand threads in parallel.
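As a rough illustration of that parallel check on a multi-core CPU (OpenMP, compile with -fopenmp; with P cores this does about N/P comparisons per core, so the O(log N) bound assumes on the order of N processors):

/* Each thread checks a slice of the adjacent pairs; the &&
   reduction combines the per-thread results into one verdict. */
int is_sorted_parallel(const double *a, long n)
{
    int ok = 1;
    #pragma omp parallel for reduction(&&:ok)
    for (long i = 1; i < n; i++)
        ok = ok && (a[i - 1] <= a[i]);
    return ok;
}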
Your question 5 is the question that you need to answer to determine the other answers. To ensure the array is perfectly sorted you must go through every element, because any one of them could be the one out of place.
The maximum number of comparisons to decide whether the array is sorted is N-1, because there are N-1 adjacent number pairs to compare. But for simplicity, we'll say N as it does not matter if we look at N or N+1 numbers.
Furthermore, it is unimportant where you start, so let's just start at the beginning.
Comparison #1 (A[0] vs. A[1]). If it fails, the array is unsorted. If it succeeds, good.
As we only compare, we can reduce each pair of neighbors to whether the left one is smaller or equal (1) or not (0). So we can treat the array as a sequence of 0's and 1's indicating whether each pair of adjacent numbers is in order.
To calculate the error rate, or rather the probability, we have to look at all combinations of our 0/1 sequence.
I would look at it like this: we have 2^n combinations of the sequence (i.e. of the order of the pairs), of which only one is sorted: the one where all elements are 1, indicating that each A[i] is less than or equal to A[i+1].
Now this seems to be simple:
Initially the error is 1/2^n. After the first comparison, half of the possible combinations (all of them unsorted) are eliminated, so the error rate should be 1/2^n + 1/2^(n-1).
I'm not a mathematician, but it should be quite easy to calculate how many comparisons are needed to reach a given error rate (find x such that ERROR >= 1/2^n + 1/2^(n-1) + ... + 1/2^(n-x)).
Sorry for the confusing English; I'm from Germany.
Since any single element can be the one that is out of line, you have to run through all of them; hence your algorithm has runtime O(n).
If your understanding of "sorted" is less strict, you need to specify what exactly you mean by "sorted". Usually, "sorted" means that adjacent elements meet a less or less-or-equal condition.
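In code, that deterministic check is a single early-exit pass (C sketch):

#include <stddef.h>

/* Returns 1 if a[0..n-1] is non-decreasing, 0 otherwise.
   At most n-1 comparisons; stops at the first violation. */
int is_sorted(const double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}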
Like everyone else says, the only way to be 100% sure that it is sorted is to run through every single element, which is O(N).
However, it seems to me that if you're so worried about it being sorted, then maybe having it sorted to begin with is more important than having the elements stored in a contiguous block of memory.
What I'm getting at is that you could use a map, whose elements by definition follow a strict weak ordering. In other words, the elements in a map are always sorted by key. You could also use a set to achieve the same effect.
For example, std::map<int,double> collection; would let you almost use it like an array: collection[0] = 3.0; std::cout << collection[0] << std::endl;. There are differences, of course, but if the sorting is that important then an array is the wrong choice for storing the data.
The old-fashioned way: print it out and see if it's in order. Really, if your sort is wrong you would probably see it right away. It's unlikely that you would see only a few misorders when sorting 100+ things; whenever I deal with this, the whole thing is either completely off or it works.
As an example that you probably should not use, but that demonstrates sample size:
A statistically valid sample size can give you a reasonable estimate of sortedness. If you want to be 95% certain everything is sorted, you can do that by checking a list of truly random sample points, perhaps ~1,500 of them.
Essentially, though, this is completely pointless if the values being out of order in one single place will break subsequent algorithms or data requirements.
If that is a problem, preprocess the list before your code runs, or use a really fast sort package in your code. Most sort packages also have a validation mode that simply tells you whether or not the list meets your sort criteria. Other suggestions, like parallelizing your check with threads, are great ideas too.
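A sketch of that sampling idea, checking randomly chosen adjacent pairs (the sample size k would be the ~1,500 suggested above; note that a clean result only ever means "no violation seen", never a guarantee):

#include <stdlib.h>

/* Check k random adjacent pairs; returns 0 on the first violation
   found, 1 if none of the sampled pairs was out of order. */
int probably_sorted(const double *a, size_t n, int k)
{
    if (n < 2)
        return 1;
    for (int i = 0; i < k; i++) {
        size_t j = 1 + (size_t)rand() % (n - 1);
        if (a[j - 1] > a[j])
            return 0;
    }
    return 1;
}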

What is the best way to count the number of times a value occurs in an array?

I have been given an assignment to write a program that reads in a number of assignment marks from a text file into an array, and then counts how many marks there are within particular brackets, i.e. 40-49, 50-59 etc. A value of -1 in the text file means that the assignment was not handed in, and a value of 0 means that the assignment was so bad that it was ungraded.
I could do this easily using a couple of for loops, with if statements that check the values while incrementing counters for the number of occurrences, but in order to get higher marks I need to implement the program in a "better" way. What would be a better, more efficient way to do this? I'm not looking for code right now, just "this is what you should do". I've tried to think of different ways to do it, but none of them seem better, and I feel as if I'm just making it complicated for the sake of it.
I tried using the 2D array that the values are stored in as a parameter of a function, and then using the function to print out the number of occurrences of particular values, but I couldn't get this to compile because my syntax for passing a 2D array as a parameter was wrong, and I'm not too sure how to do that.
Any help would be appreciated, thanks.
Why do you need a couple of for loops? One is enough.
Create an array of size 10 where array[0] counts marks between 0-9, array[1] counts marks between 10-19, etc. When you see a mark, put it in the appropriate bucket using integer division, e.g. array[mark / 10]++. When you finish, the array will contain the count of marks in each bucket.
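A sketch of that one-pass bucket count in C (the separate counters for the special marks -1 and 0 from the question, and the 0-99 mark range, are my own assumptions):

#include <stdio.h>

void count_marks(const int *marks, int n)
{
    int counts[10] = {0};               /* counts[b]: marks b*10 .. b*10+9 */
    int not_handed_in = 0, ungraded = 0;

    for (int i = 0; i < n; i++) {
        if (marks[i] == -1)
            not_handed_in++;            /* not handed in */
        else if (marks[i] == 0)
            ungraded++;                 /* too bad to grade */
        else if (marks[i] >= 1 && marks[i] <= 99)
            counts[marks[i] / 10]++;    /* integer division picks the bucket */
    }

    for (int b = 0; b < 10; b++)
        printf("%2d-%2d: %d\n", b * 10, b * 10 + 9, counts[b]);
    printf("not handed in: %d, ungraded: %d\n", not_handed_in, ungraded);
}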
As food for thought, if this is a school assignment, you might want to apply other things you have learned in the course.
Did you learn sorting yet? Maybe you could sort the list first so that you are not iterating over the array several times. You can just go over it once, grab all the -1's, and spit out how many you have, then grab all the ones in the next bracket and so on.
edit: that is of course, assuming that you are using a 1d array.

Infinity as sentinel in mergesort?

I am currently reading Cormen's "Introduction to Algorithms" and I found something called a sentinel.
It's used in the mergesort algorithm as a tool to decide when one of the two lists being merged is exhausted. Cormen uses the infinity symbol for the sentinels in his pseudocode, and I would like to know how such an infinite value can be implemented in C.
A sentinel is just a dummy value. For strings, you might use a NULL pointer, since that's not a sensible thing to have in a list. For integers, you might use a value unlikely to occur in your data set, e.g. if you are dealing with a list of ages, you can use an age of -1 to mark the end of the list.
You can get an "infinite value" for floats (C99 provides INFINITY in <math.h>), but it's not the best idea. For arrays, pass the size explicitly; for lists, use a null pointer sentinel.
In C, when sorting an array, you usually know the size, so you can actually sort a range [begin, end) in which end points one past the end of the array. E.g. int a[n] could be sorted as sort(a, a + n).
This allows you to do two things:
call your sort recursively with the part of the array you haven't sorted yet (merge sort is a recursive algorithm)
use end as a sentinel.
If the elements in your list can span the full range of the data type, from its smallest to its largest possible value, the code you are looking at won't work, and you'll have to come up with something else (which I am sure can be done). But if you know the values range from the smallest value of the data type to at most the largest minus one, here is a solution.

I have that book in front of me right now. Open it back up to page 31 and take a look at the Merge function: the lines causing you trouble are lines 8 and 9, where the sentinel value of infinity is used. Now, we know the two sub-arrays are each sorted already and just need to be merged into the array that is twice as big. That means the largest element of each half sits at the end of its sub-array, and the larger of those two is the largest element of the merged result. All we need to do is determine the larger of those two values, add one to it, and use that as our sentinel. So lines 8 and 9 of the code can be replaced by the following six lines:
if L[n1] < R[n2]
    largest = R[n2]
else
    largest = L[n1]
L[n1 + 1] = largest + 1
R[n2 + 1] = largest + 1
That should work for you. I have a test tomorrow in my algorithms course on this stuff, and I came across your post and thought I'd help you out. The authors' use of sentinels in this book has always bugged me, and I absolutely cannot stand how much they are in love with recursion; iteration is faster and, in my opinion, usually easier to come up with and grasp.
The trick is that you don't have to check array bounds when incrementing the index in only one of the lists in the inner while loops. Hence you need sentinels that are larger than all other elements. In C++ I usually use std::numeric_limits<TYPE>::max().
The C equivalents are macros like INT_MAX, UINT_MAX, LONG_MAX, etc. from <limits.h>. Those are good sentinels. If you need two different sentinels, use ..._MAX and ..._MAX - 1.
This is all assuming you're merging two lists that are ordered ascending.
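For comparison, here is a sketch of the sentinel-free alternative suggested in the other answers: pass the sizes explicitly and let the index bounds do the job of the infinity values (my own illustration, merging ascending halves as assumed above):

#include <string.h>

/* Merge sorted halves a[lo..mid-1] and a[mid..hi-1] via a scratch
   buffer tmp. The bounds checks (i < mid, j < hi) replace the
   book's infinity sentinels. */
static void merge(int *a, int *tmp, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof *a);
}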
