Lets say I have an array as:
int a[]={4,5,7,10,2,3,6}
when I access an element, such as a[3], what does actually happen behind the scenes?
Why do many algorithm books (such as the Cormen book...) say that it takes a constant time?
(I'm just a noob in low-level programming so I would like to learn more from you guys)
The array, effectively, is known by a memory location (a pointer). Accessing a[3] can be found in constant time, since it's just location_of_a+3*sizeof(int).
In C, you can see this directly. Remember, a[3] is the same as *(a+3) - which is a bit more clear in terms of what it's doing (dereferencing the pointer "3 items" over).
an array of 10 integer variables, with indices 0 through 9, may be stored as 10 words at memory addresses 2000, 2004, 2008, … 2036, so that the element with index i has the address 2000 + 4 × i.
this process take one multiplication and one addition .since these two operation take constant time.so we can say the access can be perform in constant time
Just to be complete, "what structure is accessed in linear time?" A Linked List structure is accessed in linear time. To get the n element you have to travel through n-1 previous elements. You know, like a tape recorder or a VHS cassette, where to go to the end of the tape/VHS you had to wait a long time :-)
An array is more similar to an hard disk: every point is accessible in "constant" time :-)
This is the reason the RAM of a computer is called RAM: Random Access Memory. You can go to any location if you know its address without traversing all the memory before that location.
Some persons told me that HD access isn't really in constant time (where by access I mean "time to position the head and read one sector of the HD"). I have to say that I'm not sure of it. I have googled around and I haven't found anyone speaking of it. I DO know that the time isn't linear, because it is still accessed randomly. In the end, if you think that HD access isn't constant enough for you (but then, what is constant? the access of the RAM? considering Cache, Prefetching, Data Locality and Compiler optimizations?), feel free to consider the sentence as An array is more similar to an USB disk stick: every point is accessible in "constant" time :-)
Because arrays are stored in memory sequentially. So actually, when you access array[3] you are telling the computer, "Get the memory address of the beginning of array, then add 3 to it, then access that spot." Since adding takes constant time, so does array access!
"constant time" really means "time that doesn't depend on the 'problem size'". For the 'problem' "get something out of a container", the 'problem size' is the size of the container.
Accessing an array element takes basically the same amount of time (this is a simplification) regardless of the container size, because a fixed set of steps is used to retrieve the element: its location in memory (this is also a simplification) is calculated, and then the value at that location in memory is retrieved.
A linked list, for example, can't do this, because each link indicates the location of the next one. To find an element you have to work through all the previous ones; on average, you'll work through half the container, so the size of the container obviously matters.
Array is collection of similar types of data. So all the elements will take same amount of memory.Therefore if you know the base address of array and type of data it holds you can easily get element Array[i] in constant time using formula stated below:
int takes 4 bytes in a 32 bit system.
int array[10];
base address=4000;
location of 7th element:4000+7*4.
array[7]=array[4000+7*4];
Hence, its clearly visible you can get ith element in constant time if you know the base address and type of data it holds.
Please visit this link to know more about array data structure.
An array is sequential, meaning that, the next element's address in the array is next to the present one's. So if you want to get 5th element of an array, you calculate the address of 5th element by summing up the base address of the array with 5. This direct address is now directly used to get the element at that address.
You may now further ask the same question here- "How does the computer knows/accesses the calculated address directly?" This is the nature and the principle of computer memory (RAM). If you are further interested in how RAM accesses any address in constant time, you can find it in any computer organization text or you can just google it.
If we know memory location that need to be accessed then this access is in constant time. In case of array the memory location is calculated by using base pointer, index of element and size of element. This involves multiplication and addition operation which takes constant time to execute. Hence element access inside array takes constant time.
Related
I know that accessing something from an array takes O(1) time if it is of one type using mathematical formula array[n]=(start address of array + (n * size Of(type)), but Assume you have an array of objects. These objects could have any number of fields including nested objects. Can we consider the access time to be constant?
Edit- I am mainly asking for JAVA, but I would like to know if there is a difference in case I choose another mainstream language like python, c++, JavaScript etc.
For example in the below code
class tryInt{
int a;
int b;
String s;
public tryInt(){
a=1;
b=0;
s="Sdaas";
}
}
class tryobject{
public class tryObject1{
int a;
int b;
int c;
}
public tryobject(){
tryObject1 o=new tryObject1();
sss="dsfsdf";
}
String sss;
}
public class Main {
public static void main(String[] args) {
System.out.println("Hello World!");
Object[] arr=new Object[5];
arr[0]=new tryInt();
arr[1]=new tryobject();
System.out.println(arr[0]);
System.out.println(arr[1]);
}
}
I want to know that since the tryInt type object should take less space than tryobject type object, how will an array now use the formula array[n]=(start address of array + (n * size Of(type)) because the type is no more same and hence this formula should/will fail.
The answer to your question is it depends.
If it's possible to random-access the array when you know the index you want, yes, it's a O(1) operation.
On the other hand, if each item in the array is a different length, or if the array is stored as a linked list, it's necessary to start looking for your element at the beginning of the array, skipping over elements until you find the one corresponding to your index. That's an O(n) operation.
In the real world of working programmers and collections of data, this O(x) stuff is inextricably bound up with the way the data is represented.
Many people reserve the word array to mean a randomly accessible O(1) collection. The pages of a book are an array. If you know the page number you can open the book to the page. (Flipping the book open to the correct page is not necessarily a trivial operation. You may have to go to your library and find the right book first. The analogy applies to multi-level computer storage ... hard drive / RAM / several levels of processor cache)
People use list for a sequentially accessible O(n) collection. The sentences of text on a page of a book are a list. To find the fifth sentence, you must read the first four.
I mention the meaning of the words list and array here for an important reason for professional programmers. Much of our work is maintaining existing code. In our justifiable rush to get things done, sometimes we grab the first collection class that comes to hand, and sometimes we grab the wrong one. For example, we might grab a list O(n) rather than an array O(1) or a hash O(1, maybe). The collection we grab works well for our tests. But, boom!, performance falls over just when the application gets successful and scales up to holding a lot of data. This happens all the time.
To remedy that kind of problem we need a practical understanding of these access issues. I once inherited a project with a homegrown hashed dictionary class that consumed O(n cubed) when inserting lots of items into the dictionary. It took a lot of digging to get past the snazzy collection-class documentation to figure out what was really going on.
In Java, the Object type is a reference to an object rather than an object itself. That is, a variable of type Object can be thought of as a pointer that says “here’s where you should go to find your Object” rather than “I am an actual, honest-to-goodness Object.” Importantly, the size of this reference - the number of bytes used up - is the same regardless of what type of thing the Object variable refers to.
As a result, if you have an Object[], then the cost of indexing into that array is indeed O(1), since the entries in that array are all the same size (namely, the size of an object reference). The sizes of the objects being pointed at might not all be the same, as in your example, but the pointers themselves are always the same size and so the math you’ve given provides a way to do array indexing in constant time.
The answer depends on context.
It's really common in some textbooks to treat array access as O(1) because it simplifies analysis.
And in fairness it is O(1) cpu instructions in today's architectures.
But:
As the dataset gets larger tending to infinity, it doesn't fit in memory. If the "array" is implemented as a database structure spread across multiple machines, you'll end up with a tree structure and probably have logarithmic worst case access times.
If you don't care about data size going to infinity, then big O notation may not be the right fit for your situation
On real hardware, memory accesses are not all equal -- there are many layers of caches and cache misses cost hundreds or thousands of cycles. The O(1) model for memory access tends to ignore that
In theory work, random access machines access memory in O(1), but turing machines cannot. Hierarchical cache effects tend to be ignored. Some models like transdichotomous RAM try to account for this.
In short, this is a property of your model of computation. There are many valid and interesting models of computation and what to choose depends on your needs and your situation.
In general. array denotes a fixed-size memory range, storing elements of the same size. If we consider this usual concept of array, then if objects are members of the array, then under the hood your array stores the object references and referring the i'th element in your array you find the reference of the object/pointer it contains, with an O(1) complexity and the address it points to is something the language is to find.
However, there are arrays which do not comply to this definition. For example, in Javascript you can easily add items to the array, which makes me think that in Javascript the arrays are somewhat different from an allocated fixed-sized range of elements of the same size/type. Also, in Javascript you can add any types of elements to an array. So, in general I would say the complexity is O(1), but there are quite a few important exceptions from this rule, depending on the technologies.
So it is well known to access an element in an array takes only O(1) time since you can use a simple formula to determine the memory address of the element you want.
For instance, say you have an array starting from address 600 and want the 4th index, where each item in the array takes up 4 bits of space then we can directly access 600 + 4 * 4 = memory address 616.
The question is how? Like at the really low level won't it go "is this address 616? no, it is 600", therefore scans more to the right - "Is this address 616? no, it is 601" etc. Creating a low level O(n) access time.
I suppose my question is, how does the memory instantly access the address it needs to? Thanks for any responses :)
For example: I want to use reflect to get a slice's data as an array to manipulate it.
func inject(data []int) {
sh := (*reflect.SliceHeader)(unsafe.Pointer(&data))
dh := (*[len(data)]int)(unsafe.Pointer(sh.Data))
printf("%v\n", dh)
}
This function will emit a compile error for len(data) is not a constant. How should I fix it?
To add to the #icza's comment, you can easily extract the underlying array by using &data[0]—assuming data is an initialized slice. IOW, there's no need to jump through the hoops here: the address of the first slice's element is actually the address of the first slot in the slice's underlying array—no magic here.
Since taking an address of an element of an array is creating
a reference to that memory—as long as the garbage collector is
concerned—you can safely let the slice itself go out of scope
without the fear of that array's memory becoming inaccessible.
The only thing which you can't really do with the resulting
pointer is passing around the result of dereferencing it.
That's simply because arrays in Go have their length encoded in
their type, so you'll be unable to create a function to accept
such array—because you do not know the array's length in advance.
Now please stop and think.
After extracting the backing array from a slice, you have
a pointer to the array's memory.
To sensibly carry it around, you'll also need to carry around
the array's length… but this is precisely what slices do:
they pack the address of a backing array with the length of the
data in it (and also the capacity).
Hence really I think you should reconsider your problem
as from where I stand I'm inclined to think it's a non-problem
to begin with.
There are cases where wielding pointers to the backing arrays
extracted from slices may help: for instance, when "pooling"
such arrays (say, via sync.Pool) to reduce memory churn
in certain situations, but these are concrete problems.
If you have a concrete problem, please explain it,
not your attempted solution to it—what #Flimzy said.
Update I think I should may be better explain the
you can't really do with the resulting
pointer is passing around the result of dereferencing it.
bit.
A crucial point about arrays in Go (as opposed to slices)
is that arrays—as everything in Go—are passed around
by value, and for arrays that means their data is copied.
That is, if you have
var a, b [8 * 1024 * 1024]byte
...
b = a
the statement b = a would really copy 8 MiB of data.
The same obviously applies to arguments of functions.
Slices sidestep this problem by holding a pointer
to the underlying (backing) array. So a slice value
is a little struct type containing
a pointer and two integers.
Hence copying it is really cheap but "in exchange" it
has reference semantics: both the original value and
its copy point to the same backing array—that is,
reference the same data.
I really advise you to read these two pieces,
in the indicated order:
https://blog.golang.org/go-slices-usage-and-internals
https://blog.golang.org/slices
It is said that an example of a O(1) operation is that of accessing an element in an array. According to one source, O(1) can be defined in the following manner:
[Big-O of 1] means that the execution time of the algorithm does not depend on
the size of the input. Its execution time is constant.
However, if one wants to access an element in an array, does not the efficiency of the operation depend on the amount of elements in the array? For example
int[] arr = new int[1000000];
addElements(arr, 1000000); //A function which adds 1 million arbitrary integers to the array.
int foo = arr[55];
I don't understand how the last statement can be described as running in O(1); does not the 1,000,000 elements in the array have a bearing on the running time of the operation? Surely it'd take longer to find element 55 than it would element 1? If anything, this looks to me like O(n).
I'm sure my reasoning is flawed, however, I just wanted some clarification as to how this can be said to run in O(1)?
Array is a data structure, where objects are stored in continuous memory location. So in principle, if you know the address of base object, you will be able to find the address of ith object.
addr(a[i]) = addr(a[0]) + i*size(object)
This makes accessing ith element of array O(1).
EDIT
Theoretically, when we talk about complexity of accessing an array element, we talk for fixed index i.
Input size = O(n)
To access ith element, addr(a[0]) + i*size(object). This term is independent of n, so it is said to be O(1).
How ever multiplication still depends on i but not on n. It is constant O(1).
The address of an element in memory will be the base address of the array plus the index times the size of the element in the array. So to access that element, you just essentially access memory_location + 55 * sizeof(int).
This of course assumes you are under the assumption multiplication takes constant time regardless of size of inputs, which is arguably incorrect if you are being very precise
To find an element isn't O(1) - but accessing element in the array has nothing to do with finding an element - to be precise, you don't interact with other elements, you don't need to access anything but your single element - you just always calculate the address, regardless how big array is, and that is that single operation - hence O(1).
The machine code (or, in the case of Java, virtual machine code) generated for the statement
int foo = arr[55];
Would be essentially:
Get the starting memory address of arr into A
Add 55 to A
Take the contents of the memory address in A, and put it in the memory address of foo
These three instructions all take O(1) time on a standard machine.
In theory, array access is O(1), as others have already explained, and I guess your question is more or less a theoretical one. Still I like to bring in another aspect.
In practice, array access will get slower if the array gets large. There are two reasons:
Caching: The array will not fit into cache or only into a higher level (slower) cache.
Address calculation: For large arrays, you need larger index data types (for example long instead of int). This will make address calculation slower, at least on most platforms.
If we say the subscript operator (indexing) has O(1) time complexity, we make this statement excluding the runtime of any other operations/statements/expressions/etc. So addElements does not affect the operation.
Surely it'd take longer to find element 55 than it would element 1?
"find"? Oh no! "Find" implies a relatively complex search operation. We know the base address of the array. To determine the value at arr[55], we simply add 551 to the base address and retrieve the value at that memory location. This is definitely O(1).
1 Since every element of an int array occupies at least two bytes (when using C), this is no exactly true. 55 needs to be multiplied by the size of int first.
Arrays store the data contiguously, unlike Linked Lists or Trees or Graphs or other data structures using references to find the next/previous element.
It is intuitive to you that the access time of first element is O(1). However you feel that the access time to 55th element is O(55). That's where you got it wrong. You know the address to first element, so the access time to it is O(1).
But you also know the address to 55th element. It is simply address of 1st + size_of_each_element*54 .
Hence you access that element as well as any other element of an array in O(1) time. And that is the reason why you cannot have elements of multiple types in an array because that would completely mess up the math to find the address to nth element of an array.
So, access to any element in an array is O(1) and all elements have to be of same type.
According to Wikipedia, accessing any single element in an array takes constant time as only one operation has to be performed to locate it.
To me, what happens behind the scenes probably looks something like this:
a) The search is done linearly (e.g. I want to access element 5. I begin the search at index 0, if it's not equal to 5, I go to index 1 etc.)
This is O(n) -- where n is the length of the array
b) If the array is stored as a B-tree, this would give O(log n)
I see no other approach.
Can someone please explain why and how this is done in O(1)?
An array starts at a specific memory address start. Each element occupies the same amount of bytes element_size. The array elements are located one after another in the memory from the start address on. So you can calculate the memory address of the element i with start + i * element_size. This computation is independent of the array size and is therefor O(1).
In theory elements of an array are of the same known size and they are located in a continuous part of memory so if beginning of the array is located at A memory address if you want to access any element you have to compute its address like this:
A + item_size*index so this is a constant time operation.
Accessing a single element is NOT finding an element whose value is x.
Accessing an element i means getting the element at the i'th position of the array.
This is done in O(1) because it is pretty simple (constant number of math calculations) where the element is located given the index, the beginning of the array and the size of each element.
RAM memory offers a constant time (or to be more exact, a bounded time) to access each address in the RAM, and since finding the address is O(1), and retrieving the element in it is also O(1), it gives you total of O(1).
Finding if an element whose value is x is actually Omega(n) problem, unless there is some more information on the array (sorted, for example).
Usually an array is organized as a continuous block of memory where each position can be accessed by means of an index calculation. This index calculation cannot be done in constant time for arrays of arbitrary size, but for reasons of addressable space, the numbers involved are bounded by machine word size, so an assumption of constant time is justified.
If you have array of Int's. Each int is 32 bit's in java . If you have for example 10 integers in java it means you allocate memory of 320 bits. Then computer know
0 - index is at memory for example - 39200
last index is at memory 39200 + total memory of your array = 39200+320= 39520
So then if you want to access index 3. Then it is 39200 + 32*3 = 39296.
It all depends how much memory your object takes which you store in array. Just remember arrays take whole block of memory.