Dynamic array guarantee clarification - arrays

In Skiena's Algorithm Design Manual, he mentions at one point:
The primary thing lost using dynamic arrays is the guarantee that each array
access takes constant time in the worst case. Now all the queries will be fast, except
for those relatively few queries triggering array doubling. What we get instead is a
promise that the nth array access will be completed quickly enough that the total
effort expended so far will still be O(n).
I'm struggling to understand this. How will an array query expand the array?

Dynamic arrays are arrays where the size does not need to be specified (Think of an ArrayList in java). Under the hood, dynamic arrays are implemented using a regular array. Though, because it's a regular array the implementation of the ArrayList needs to specify the size of the underlying array.
So the typical way to handle this in dynamic arrays is to initialize the standard array with a certain amount of elements, then when it reached it's maximum elements, the array is doubled in size.
Because of this underlying functionality, most of the time it will take constant time when adding to a dynamic array, but occasionally it will double the size of the 'under the hood' standard array which will take longer than the normal add time.
If your confusion lies with his use of the word 'query', I believe he means to say 'adding or removing from the array' because a simple 'get' query shouldn't be related to the underlying standard array size.

Related

Is Array of Objects Access Time O(1)?

I know that accessing something from an array takes O(1) time if it is of one type using mathematical formula array[n]=(start address of array + (n * size Of(type)), but Assume you have an array of objects. These objects could have any number of fields including nested objects. Can we consider the access time to be constant?
Edit- I am mainly asking for JAVA, but I would like to know if there is a difference in case I choose another mainstream language like python, c++, JavaScript etc.
For example in the below code
class tryInt{
int a;
int b;
String s;
public tryInt(){
a=1;
b=0;
s="Sdaas";
}
}
class tryobject{
public class tryObject1{
int a;
int b;
int c;
}
public tryobject(){
tryObject1 o=new tryObject1();
sss="dsfsdf";
}
String sss;
}
public class Main {
public static void main(String[] args) {
System.out.println("Hello World!");
Object[] arr=new Object[5];
arr[0]=new tryInt();
arr[1]=new tryobject();
System.out.println(arr[0]);
System.out.println(arr[1]);
}
}
I want to know that since the tryInt type object should take less space than tryobject type object, how will an array now use the formula array[n]=(start address of array + (n * size Of(type)) because the type is no more same and hence this formula should/will fail.
The answer to your question is it depends.
If it's possible to random-access the array when you know the index you want, yes, it's a O(1) operation.
On the other hand, if each item in the array is a different length, or if the array is stored as a linked list, it's necessary to start looking for your element at the beginning of the array, skipping over elements until you find the one corresponding to your index. That's an O(n) operation.
In the real world of working programmers and collections of data, this O(x) stuff is inextricably bound up with the way the data is represented.
Many people reserve the word array to mean a randomly accessible O(1) collection. The pages of a book are an array. If you know the page number you can open the book to the page. (Flipping the book open to the correct page is not necessarily a trivial operation. You may have to go to your library and find the right book first. The analogy applies to multi-level computer storage ... hard drive / RAM / several levels of processor cache)
People use list for a sequentially accessible O(n) collection. The sentences of text on a page of a book are a list. To find the fifth sentence, you must read the first four.
I mention the meaning of the words list and array here for an important reason for professional programmers. Much of our work is maintaining existing code. In our justifiable rush to get things done, sometimes we grab the first collection class that comes to hand, and sometimes we grab the wrong one. For example, we might grab a list O(n) rather than an array O(1) or a hash O(1, maybe). The collection we grab works well for our tests. But, boom!, performance falls over just when the application gets successful and scales up to holding a lot of data. This happens all the time.
To remedy that kind of problem we need a practical understanding of these access issues. I once inherited a project with a homegrown hashed dictionary class that consumed O(n cubed) when inserting lots of items into the dictionary. It took a lot of digging to get past the snazzy collection-class documentation to figure out what was really going on.
In Java, the Object type is a reference to an object rather than an object itself. That is, a variable of type Object can be thought of as a pointer that says “here’s where you should go to find your Object” rather than “I am an actual, honest-to-goodness Object.” Importantly, the size of this reference - the number of bytes used up - is the same regardless of what type of thing the Object variable refers to.
As a result, if you have an Object[], then the cost of indexing into that array is indeed O(1), since the entries in that array are all the same size (namely, the size of an object reference). The sizes of the objects being pointed at might not all be the same, as in your example, but the pointers themselves are always the same size and so the math you’ve given provides a way to do array indexing in constant time.
The answer depends on context.
It's really common in some textbooks to treat array access as O(1) because it simplifies analysis.
And in fairness it is O(1) cpu instructions in today's architectures.
But:
As the dataset gets larger tending to infinity, it doesn't fit in memory. If the "array" is implemented as a database structure spread across multiple machines, you'll end up with a tree structure and probably have logarithmic worst case access times.
If you don't care about data size going to infinity, then big O notation may not be the right fit for your situation
On real hardware, memory accesses are not all equal -- there are many layers of caches and cache misses cost hundreds or thousands of cycles. The O(1) model for memory access tends to ignore that
In theory work, random access machines access memory in O(1), but turing machines cannot. Hierarchical cache effects tend to be ignored. Some models like transdichotomous RAM try to account for this.
In short, this is a property of your model of computation. There are many valid and interesting models of computation and what to choose depends on your needs and your situation.
In general. array denotes a fixed-size memory range, storing elements of the same size. If we consider this usual concept of array, then if objects are members of the array, then under the hood your array stores the object references and referring the i'th element in your array you find the reference of the object/pointer it contains, with an O(1) complexity and the address it points to is something the language is to find.
However, there are arrays which do not comply to this definition. For example, in Javascript you can easily add items to the array, which makes me think that in Javascript the arrays are somewhat different from an allocated fixed-sized range of elements of the same size/type. Also, in Javascript you can add any types of elements to an array. So, in general I would say the complexity is O(1), but there are quite a few important exceptions from this rule, depending on the technologies.

In Swift, how efficient is it to append an array?

If you have an array which can very in size throughout the course of your program, would it be more efficient to declare the array as the maximum size it will ever reach and then control how much of the array your program can access, or to change the size of the array quite frequently throughout the course of the program?
From the Swift headers, there's this about array growth and capacity:
When an array's contiguous storage fills up, new storage must be allocated and elements must be moved to the new storage. Array, ContiguousArray, and Slice share an exponential growth strategy that makes append a constant time operation when amortized over many invocations. In addition to a count property, these array types have a capacity that reflects their potential to store elements without reallocation, and when you know how many elements you'll store, you can call reserveCapacity to pre-emptively reallocate and prevent intermediate reallocations.
Reading that, I'd say it's best to reserve the capacity you need, and only come back to optimize that if you find it's really a problem. You'll make more work for yourself if you're faking the length all the time.

Data Structure to do lookup on large number

I have a requirement to do a lookup based on a large number. The number could fall in the range 1 - 2^32. Based on the input, i need to return some other data structure. My question is that what data structure should i use to effectively hold this?
I would have used an array giving me O(1) lookup if the numbers were in the range say, 1 to 5000. But when my input number goes large, it becomes unrealistic to use an array as the memory requirements would be huge.
I am hence trying to look at a data structure that yields the result fast and is not very heavy.
Any clues anybody?
EDIT:
It would not make sense to use an array since i may have only 100 or 200 indices to store.
Abhishek
unordered_map or map, depending on what version of C++ you are using.
http://www.cplusplus.com/reference/unordered_map/unordered_map/
http://www.cplusplus.com/reference/map/map/
A simple solution in C, given you've stated at most 200 elements is just an array of structs with an index and a data pointer (or two arrays, one of indices and one of data pointers, where index[i] corresponds to data[i]). Linearly search the array looking for the index you want. With a small number of elements, (200), that will be very fast.
One possibility is a Judy Array, which is a sparse associative array. There is a C Implementation available. I don't have any direct experience of these, although they look interesting and could be worth experimenting with if you have the time.
Another (probably more orthodox) choice is a hash table. Hash tables are data structures which map keys to values, and provide fast lookup and insertion times (provided a good hash function is chosen). One thing they do not provide, however, is ordered traversal.
There are many C implementations. A quick Google search turned up uthash which appears to be suitable, particularly because it allows you to use any value type as the key (many implementations assume a string as the key). In your case you want to use an integer as the key.

Are bulk array operations faster than sequential operations?

Say I am given an array, and a function called replace:
void replace(from, to, items[])
whose job is to replace the array elements in the range [from, to) with the elements in items.
I'll assume the maximum size of the array is known beforehand, so I can ensure the array will never overflow.
My question is, if I am given a list of replacements (e.g. elements of the form (from, to, items)), is it possible for me to obtain the final resulting array with a faster time complexity than performing each operation sequentially?
In other words, is there any advantage to knowing the sequence of operations beforehand, or is it no better than being given each operation one by one (in terms of the asymptotic time complexity)?
Note: It seems like the question is confusing; I did not intend to imply that the number of elements replacing a given range is the same as the size of that range! It could be fewer or more, causing shifting, and the point of the question was was to ask if knowing them beforehand could avoid extra work like shifting in the worst case.
I think this may not be what you are finding, but using Rope can shorten the time-complexity. It provides operation on string like concat without "shift"ing actual array. (not needed to be a string of char. arbitary type of element can be used as alphabets)
According to Wikipedia (...Because I have not ever used it yet), Rope is a balaced binary tree of fragmented string. A rope represents long string concatenated all fragmented string.
your replace() function can be implemented using Split() and Concat() operation on the rope,
both operations takes only O(log(n)) time. Here, I put n as the length of the long string the rope represents.
To get normal array, rope can be converted in O(n) time using Report() operation.
so the answer is: There is a algorithm faster than sequencially applies string operation.
convert given array to a rope
apply every replace job on the rope. each operation runs in O(log(n)) and does not copy actual array items.
convert the processed rope to a real array (this causes copy of all items).
return it.
It's always linear, down to memcpy's level. The only way to speed it up is trade time for space - override the array subscript operator so that elements between from and to point to elements in items rather than the original array, which is not a good solution under any circumstances.

How can I efficiently copy 2-dimensional arrays of bytes into a larger 2D array?

I have a structure called Patch that represents a 2D array of data.
newtype Size = (Int, Int)
data Patch = Patch Size Strict.ByteString
I want to construct a larger Patch from a set of smaller Patches and their assigned positions. (The Patches do not overlap.) The function looks like this:
newtype Position = (Int, Int)
combinePatches :: [(Position, Patch)] -> Patch
combinePatches plan = undefined
I see two sub-problems. First, I must define a function to translate 2D array copies into a set of 1D array copies. Second, I must construct the final Patch from all those copies.
Note that the final Patch will be around 4 MB of data. This is why I want to avoid a naive approach.
I'm fairly confident that I could do this horribly inefficiently, but I would like some advice on how to efficiently manipulate large 2D arrays in Haskell. I have been looking at the "vector" library, but I have never used it before.
Thanks for your time.
If the spec is really just a one-time creation of a new Patch from a set of previous ones and their positions, then this is a straightforward single-pass algorithm. Conceptually, I'd think of it as two steps -- first, combine the existing patches into a data structure with reasonable lookup for any give position. Next, write your new structure lazily by querying the compound structure. This should be roughly O(n log(m)) -- n being the size of the new array you're writing, and m being the number of patches.
This is conceptually much simpler if you use the Vector library instead of a raw ByteString. But it is simpler still if you simply use Data.Array.Unboxed. If you need arrays that can interop with C, then use Data.Array.Storable instead.
If you ditch purity, at least locally, and work with an ST array, you should be able to trivially do this in O(n) time. Of course, the constant factors will still be worse than using fast copying of chunks of memory at a time, but there's no way to keep that code from looking low-level.

Resources