This is an extremely simple question I nevertheless would like to know the answer to. How does the complexity of locating the n'th element in a vector (an array) compared to that of comparing a natural number with a list of natural number of size n? I suppose they are both O(n) but with significantly different coefficients.
No; array lookup is constant-time.
The reason for this is that computer memory is constructed so as to allow for constant-time access to any element as long as you have the address of it. An array is a contiguous sequence of elements, and therefore, you can compute the address of the desired element from the start address, the index, and the element size. (All elements must be of the same size; this might appear not to be the case in languages that allow you to put different kinds of elements into the same array, but in that situation, the elements of the array are actually references (memory addresses, in practice) to the objects.)
Related
I need to optimize my algorithm for counting larger/smaller/equal numbers in array(unsorted), than a given number.
I have to do this a lot of times and given array also can have thousands of elements.
Array doesn't change, number is changing
Example:
array: 1,2,3,4,5
n = 3
Number of <: 2
Number of >: 2
Number of ==:1
First thought:
Iterate through the array and check if element is > or < or == than n.
O(n*k)
Possible optimization:
O((n+k) * logn)
Firstly sort the array (im using c qsort), then use binary search to find equal number, and then somehow count smaller and larger values. But how to do that?
If elements exists (bsearch returns pointer to the element) I also need to check if array contain possible duplicates of this elements (so I need to check before and after this elements while they are equal to found element), and then use some pointer operations to count larger and smaller values.
How to get number of values larger/smaller having a pointer to equal element?
But what to do if I don't find the value (bsearch returns null)?
If the array is unsorted, and the numbers in it have no other useful properties, there is no way to beat an O(n) approach of walking the array once, and counting items in the three buckets.
Sorting the array followed by a binary search would be no better than O(n), assuming that you employ a sort algorithm that is linear in time (e.g. a radix sort). For comparison-based sorts, such as quicksort, the timing would increase to O(n*log2n).
On the other hand, sorting would help if you need to run multiple queries against the same set of numbers. The timing for k queries against n numbers would go from O(n*k) for k linear searches to O(n+k*log2n) assuming a linear-time sort, or O((n+k)*log2n) with comparison-based sort. Given a sufficiently large k, the average query time would go down.
Since the array is (apparently?) not changing, presort it. This allows a binary search (Log(n))
a.) implement your own version of bsearch (it will be less code anyhow)
you can do it inline using indices vs. pointers
you won't need function pointers to a specialized function
b.) Since you say that you want to count the number of matches, you imply that the array can contain multiple entries with the same value (otherwise you would have used a boolean has_n).
This means you'll need to do a linear search for the beginning and end of the array of "n"s.
From which you can calculate the number less than n and greater than n.
It appears that you have some unwritten algorithm for choosing these (for n=3 you look for count of values greater and less than 2 and equal to 1, so there is no way to give specific code)
c.) For further optimization (at the expense of memory) you can sort the data into a binary search tree of structs that holds not just the value, but also the count and the number of values before and after each value. It may not use more memory at all if you have a lot of repeat values, but it is hard to tell without the dataset.
That's as much as I can help without code that describes your hidden algorithms and data or at least a sufficient description (aside from recommending a course or courses in data structures and algorithms).
It is said that an example of a O(1) operation is that of accessing an element in an array. According to one source, O(1) can be defined in the following manner:
[Big-O of 1] means that the execution time of the algorithm does not depend on
the size of the input. Its execution time is constant.
However, if one wants to access an element in an array, does not the efficiency of the operation depend on the amount of elements in the array? For example
int[] arr = new int[1000000];
addElements(arr, 1000000); //A function which adds 1 million arbitrary integers to the array.
int foo = arr[55];
I don't understand how the last statement can be described as running in O(1); does not the 1,000,000 elements in the array have a bearing on the running time of the operation? Surely it'd take longer to find element 55 than it would element 1? If anything, this looks to me like O(n).
I'm sure my reasoning is flawed, however, I just wanted some clarification as to how this can be said to run in O(1)?
Array is a data structure, where objects are stored in continuous memory location. So in principle, if you know the address of base object, you will be able to find the address of ith object.
addr(a[i]) = addr(a[0]) + i*size(object)
This makes accessing ith element of array O(1).
EDIT
Theoretically, when we talk about complexity of accessing an array element, we talk for fixed index i.
Input size = O(n)
To access ith element, addr(a[0]) + i*size(object). This term is independent of n, so it is said to be O(1).
How ever multiplication still depends on i but not on n. It is constant O(1).
The address of an element in memory will be the base address of the array plus the index times the size of the element in the array. So to access that element, you just essentially access memory_location + 55 * sizeof(int).
This of course assumes you are under the assumption multiplication takes constant time regardless of size of inputs, which is arguably incorrect if you are being very precise
To find an element isn't O(1) - but accessing element in the array has nothing to do with finding an element - to be precise, you don't interact with other elements, you don't need to access anything but your single element - you just always calculate the address, regardless how big array is, and that is that single operation - hence O(1).
The machine code (or, in the case of Java, virtual machine code) generated for the statement
int foo = arr[55];
Would be essentially:
Get the starting memory address of arr into A
Add 55 to A
Take the contents of the memory address in A, and put it in the memory address of foo
These three instructions all take O(1) time on a standard machine.
In theory, array access is O(1), as others have already explained, and I guess your question is more or less a theoretical one. Still I like to bring in another aspect.
In practice, array access will get slower if the array gets large. There are two reasons:
Caching: The array will not fit into cache or only into a higher level (slower) cache.
Address calculation: For large arrays, you need larger index data types (for example long instead of int). This will make address calculation slower, at least on most platforms.
If we say the subscript operator (indexing) has O(1) time complexity, we make this statement excluding the runtime of any other operations/statements/expressions/etc. So addElements does not affect the operation.
Surely it'd take longer to find element 55 than it would element 1?
"find"? Oh no! "Find" implies a relatively complex search operation. We know the base address of the array. To determine the value at arr[55], we simply add 551 to the base address and retrieve the value at that memory location. This is definitely O(1).
1 Since every element of an int array occupies at least two bytes (when using C), this is no exactly true. 55 needs to be multiplied by the size of int first.
Arrays store the data contiguously, unlike Linked Lists or Trees or Graphs or other data structures using references to find the next/previous element.
It is intuitive to you that the access time of first element is O(1). However you feel that the access time to 55th element is O(55). That's where you got it wrong. You know the address to first element, so the access time to it is O(1).
But you also know the address to 55th element. It is simply address of 1st + size_of_each_element*54 .
Hence you access that element as well as any other element of an array in O(1) time. And that is the reason why you cannot have elements of multiple types in an array because that would completely mess up the math to find the address to nth element of an array.
So, access to any element in an array is O(1) and all elements have to be of same type.
Is there a function in Fortran that deletes a specific element in an array, such that the array upon deletion shrinks its length by the number of elements deleted?
Background:
I'm currently working on a project which contain sets of populations with corresponding descriptions to the individuals (i.e, age, death-age, and so on).
A method I use is to loop through the array, find which elements I need, place it in another array, and deallocate the previous array and before the next time step, this array is moved back to the array before going through the subroutines to find once again the elements not needed.
You can use the PACK intrinsic function and intrinsic assignment to create an array value that is comprised of selected elements from another array. Assuming array is allocatable, and the elements to be removed are nominated by a logical mask logical_mask that is the same size as the original value of array:
array = PACK(array, .NOT. logical_mask)
Succinct syntax for a single element nominated by its index is:
array = [array(:index-1), array(index+1:)]
Depending on your Fortran processor, the above statements may result in the compiler creating temporaries that may impact performance. If this is problematic then you will need to use the subroutine approach that you describe.
Maybe you want to look into linked lists. You can insert and remove items and the list automatically resizes. This resource is pretty good.
http://www.iag.uni-stuttgart.de/IAG/institut/abteilungen/numerik/images/4/4c/Pointer_Introduction.pdf
To continue the discussion, the solution you might want to implement depends on the number of delete operation and access you do, where you insert/delete the elements (the first, the last, randomly in the set?), how do you access the data (from the first to the last, randomly in the set?), what are your efficiency requirements in terms of CPU and memory.
Then you might want to go for linked list or for static or dynamic vectors (other types of data structures might also fit better your needs).
For example:
a static vector can be used when you want to access a lot of elements randomly and know the maximum number nmax of elements in the vector. Simply use an array of nmax elements with an associated length variable that will track the last element. A deletion can simply and quickly be done my exchanging the last element with the deleted one and reducing the length.
a dynamic vector can be implemented when you don't know the maximum number of elements. In order to avoid systematic array allocation+copy+unallocation at for each deletion/insertion, you fix the maximum number of elements (as above) and only augment its size (eg. nmax becomes 10*nmax, then reallocate and copy) when reaching the limit (the reverse system can also be implemented to reduce the number of elements).
Task is count unique numbers of a given array. I saw numerous similar questions on SO, but here we have additional requirements, which weren't stated in other questions:
Amount of allowed additional memory is O(1)
Changes to array are
prohibited
I was able to write quadratic algorithm, which agrees with given constraints. But I keep wondering, may one could do better on such a problem? Thank you for your time.
Algorithm working with O(n^2)
def count(a):
unique = len(a)
ind = 0
while ind < len(a):
x = a[ind]
i = ind+1
while i < len(a):
if a[i] == x:
unique -= 1
break
i += 1
ind += 1
print("Total uniques: ", unique)
This is a very similar problem to a follow-up question in chapter 1 (Arrays and Strings) from Cracking the Coding Interview:
Implement an algorithm to determine if a string has all unique
characters. What if you cannot use additional data structures?
The answer (to the follow-up question) is that if you can't assume anything about the array (namely, it is not sorted, you don't know its size, etc.), then there is no algorithm better than what you showed.
That being said, you may think about relaxing the constraints a little bit, to make it more interesting. For example, if you have an upper bound on the array size, you could use a bit vector to keep track of which values you've read before while traversing the array, although this is not strictly an O(1) solution when it comes to memory usage (one could argue that by knowing the maximum array size, the memory usage is constant, and thus O(1), but that is a little bit of cheating). Similarly, if the array was sorted, you could also solve it in O(n) by going through each element at a time and check if its neighbors are different numbers.
Because there is no underlying structure in the array given (sorted, etc.) you are forced to brute force every value in the array...
There is a more complicated approach that I believe would work. It entails keeping your array of unique numbers sorted. This means that it would take more time when inserting into the array but would allow you to look-up values much quicker. You should be able to insert into the array in logn time by looking at the value directly in the middle of the array and checking if it's larger or smaller. You'd then eliminate half the array as a valid insertion location and repeat. You would use a similar approach to look-up values in the array. The only issue with this is that it requires more memory space than I believe you are allocated (1).
That being said, I think given the constraints on the task restrict the algorithm to O(n^2).
According to Wikipedia, accessing any single element in an array takes constant time as only one operation has to be performed to locate it.
To me, what happens behind the scenes probably looks something like this:
a) The search is done linearly (e.g. I want to access element 5. I begin the search at index 0, if it's not equal to 5, I go to index 1 etc.)
This is O(n) -- where n is the length of the array
b) If the array is stored as a B-tree, this would give O(log n)
I see no other approach.
Can someone please explain why and how this is done in O(1)?
An array starts at a specific memory address start. Each element occupies the same amount of bytes element_size. The array elements are located one after another in the memory from the start address on. So you can calculate the memory address of the element i with start + i * element_size. This computation is independent of the array size and is therefor O(1).
In theory elements of an array are of the same known size and they are located in a continuous part of memory so if beginning of the array is located at A memory address if you want to access any element you have to compute its address like this:
A + item_size*index so this is a constant time operation.
Accessing a single element is NOT finding an element whose value is x.
Accessing an element i means getting the element at the i'th position of the array.
This is done in O(1) because it is pretty simple (constant number of math calculations) where the element is located given the index, the beginning of the array and the size of each element.
RAM memory offers a constant time (or to be more exact, a bounded time) to access each address in the RAM, and since finding the address is O(1), and retrieving the element in it is also O(1), it gives you total of O(1).
Finding if an element whose value is x is actually Omega(n) problem, unless there is some more information on the array (sorted, for example).
Usually an array is organized as a continuous block of memory where each position can be accessed by means of an index calculation. This index calculation cannot be done in constant time for arrays of arbitrary size, but for reasons of addressable space, the numbers involved are bounded by machine word size, so an assumption of constant time is justified.
If you have array of Int's. Each int is 32 bit's in java . If you have for example 10 integers in java it means you allocate memory of 320 bits. Then computer know
0 - index is at memory for example - 39200
last index is at memory 39200 + total memory of your array = 39200+320= 39520
So then if you want to access index 3. Then it is 39200 + 32*3 = 39296.
It all depends how much memory your object takes which you store in array. Just remember arrays take whole block of memory.