How are arrays and hash maps constant time in their access?

Specifically: given a hash (or an array index), how does the machine get to the data in constant time?
It seems to me that even passing by all the other memory locations (or whatever) would take an amount of time equal to the number of locations passed (so linear time). A coworker has tried valiantly to explain this to me but had to give up when we got down to circuits.
Example:
my_array = Array.new(20)
my_array[20] = "foo"
my_array[20] # => "foo"
Access of "foo" in position 20 is constant because we know which bucket "foo" is in. How did we magically get to that bucket without passing all the others on the way? To get to house #20 on a block you would still have to pass by the other 19...

How did we magically get to that bucket without passing all the others on the way?
"We" don't "go" to the bucket at all. The way RAM physically works is more like broadcasting the bucket's number on a channel on which all buckets listen, and the one whose number was called will send you its contents.
Calculations happen in the CPU. In theory, the CPU is the same "distance" from all memory locations (in practice it's not, because of caching, which can have a huge impact on performance).
If you want the gritty details, read "What every programmer should know about memory".

Then to understand, you have to look at how memory is organized and accessed. You may have to look at the way an address decoder works. The thing is, you do NOT have to pass by all the other addresses to get to the one you want in memory. You can jump straight to the one you want. Otherwise our computers would be really, really slow.

Unlike a Turing machine, which would have to access memory sequentially, computers use random-access memory, or RAM, which means that if they know where the array starts and they know they want to access the 20th element of the array, they know exactly what part of memory to look at.
It is less like driving down a street and more like picking the correct mail slot for your apartment in a shared mailbox.

Two things are important:
1. my_array holds the address in memory where the computer must jump to find this array.
2. index * sizeof(type) gives the offset from the beginning of the array.
Adding (1) and (2) gives the address where the data can be found, in O(1).
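A minimal C sketch of that arithmetic (the array size and index are arbitrary):

#include <stdio.h>

int main(void)
{
    int a[20];
    size_t i = 5;

    /* The compiler computes &a[i] as: start of the array + i * sizeof(element). */
    char *base = (char *)a;                  /* address where the array begins */
    int *computed = (int *)(base + i * sizeof a[0]);

    printf("%d\n", computed == &a[i]);       /* prints 1: same address */
    return 0;
}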

Big O doesn't work like that. It's supposed to be a measure of how much computational work is done by a particular algorithm as a function of its input size. It's not meant to measure the amount of memory used, and if you are talking about traversing to a given memory location, that's still constant time. If I need to find the second slot of an array, it's a matter of adding an offset to a pointer. Now, if I have a balanced tree structure and I want to find a particular node, I'm talking about O(log n), because the search doesn't find it on the first step; on average it takes O(log n) steps to reach that node.
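To make the contrast concrete, here is a minimal C sketch (the types and names are illustrative): indexing an array is a single address computation, while finding a key in a balanced binary search tree costs one step per level of the tree:

#include <stddef.h>

struct node {
    int key;
    struct node *left, *right;
};

/* O(1): one multiply-and-add to compute the address, one read. */
int array_get(const int *a, size_t i)
{
    return a[i];
}

/* O(log n) on a balanced tree: one step per level until the key is found. */
const struct node *tree_find(const struct node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;   /* NULL if the key is absent */
}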

Let's discuss this in C/C++ terms; there's some additional stuff to know about C# arrays but it's not really relevant to the point.
Given an array of 16-bit integer values:
short myArray[5] = {1, 2, 3, 4, 5};
What's really happened is that the computer has allocated a block of space in memory. This memory block is reserved for that array, is exactly the size needed to hold the full array (in our case 16*5 == 80 bits == 10 bytes), and is contiguous. These facts are givens; if any of them fails to hold in your environment at any given time, you're generally at risk of your program crashing due to an access violation.
So, given this structure, what the variable myArray really is, behind the scenes, is the memory address of the start of the block of memory. This is also, conveniently, the start of the first element. Each additional element is lined up in memory right after the first, in order. The memory block allocated for myArray might look like this:
0000000000000001 0000000000000010 0000000000000011 0000000000000100 0000000000000101
^                ^                ^                ^                ^
myArray[0]       myArray[1]       myArray[2]       myArray[3]       myArray[4]
It is considered a constant-time operation to access a memory address and read a constant number of bytes. As the figure above shows, you can get the memory address for each element if you know three things: the start of the memory block, the memory size of each element, and the index of the element you want. So, when you ask for myArray[3] in your code, that request is turned into a memory address by the following equation:
&myArray[3] == (char *)&myArray + sizeof(short) * 3
Thus, with a constant-time calculation, you have found the memory address of the fourth element (index 3), and with another constant-time operation (or at least considered so; actual access complexity is a hardware detail and fast enough that you shouldn't care) you can read that memory. This is, if you've ever wondered, why indexes of collections in most C-style languages start at zero: the first element of the array starts at the location of the array itself, with no offset (sizeof(anything) * 0 == 0).
In C#, there are two notable differences. C# arrays have some header information that is of use to the CLR. The header comes first in the memory block, and the size of this header is constant and known, so the addressing equation has just one key difference:
&myArray[3] == (char *)&myArray + headerSize + sizeof(short) * 3
C# doesn't allow you to directly reference memory in its managed environment, but the runtime itself will use something like this to perform memory access off the heap.
The second thing, which is common to most flavors of C/C++ as well, is that certain types are always dealt with "by reference". Anything you have to use the new keyword to create is a reference type (and there are some objects, like strings, that are also reference types although they look like value types in code). A reference type, when instantiated, is placed in memory, doesn't move, and is usually not copied. Any variable that represents that object is thus, behind the scenes, just the memory address of the object in memory.
Arrays are reference types (remember, myArray was just a memory address). Arrays of reference types are arrays of these memory addresses, so accessing an object that is an element in such an array is a two-step process: first you calculate the memory address of the element in the array and read it; what you get is another memory address, the location of the actual object (or at least its mutable data; how compound types are structured in memory is a whole other can o' worms). This is still a constant-time operation; just two steps instead of one.
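A rough C analogue of that two-step access is an array of pointers; the struct and names below are purely illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct object { char name[16]; };

int main(void)
{
    /* Step 0: objects live somewhere on the heap... */
    struct object *obj = malloc(sizeof *obj);
    strcpy(obj->name, "foo");

    /* ...and the array holds only their addresses. */
    struct object *refs[4] = { NULL, NULL, NULL, obj };

    /* Step 1: a constant-time offset into the array yields a pointer.  */
    /* Step 2: one more constant-time dereference reaches the object.   */
    printf("%s\n", refs[3]->name);

    free(obj);
    return 0;
}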

Related

Where is the address to the first element of an array stored?

I was playing with C, and I just discovered that a and &a yield the same result, that is, the address of the first element of the array. By browsing topics here, I discovered they are only formatted in a different way. So my question is: where is this address stored?
This is an interesting question! The answer will depend on the specifics of the hardware you're working with and what C compiler you have.
From the perspective of the C language, each object has an address, but there's no specific prescribed mechanism that accounts for how that address would actually be stored or accessed. That's left up to the compiler to decide.
Let's imagine that you've declared your array as a local variable, and then write something like array[137], which accesses the element at index 137 of the array. How does the generated program know how to find your array? On most systems, the CPU has a dedicated register called the stack pointer that keeps track of the position of the memory used for all the local variables of the current function. As the compiler translates your C code into an actual executable file, it maintains an internal table mapping each local variable to some offset away from where the stack pointer points. For example, it might say something like "because 64 bytes are already used up for other local variables in this function, I'm going to place array 64 bytes past where the stack pointer points." Then, whenever you reference array, the compiler generates machine instructions of the form "look 64 bytes past the stack pointer to find the array."
Now, imagine you write code like this:
printf("%p\n", array); // Print address of array
How does the compiler generate code for this? Well, internally, it knows that array is 64 bytes past the stack pointer, so it might generate code of the form "add 64 to the stack pointer, then pass that as an argument to printf."
So in that sense, the answer to your question could be something like "the hardware stores a single pointer called the stack pointer, and the generated code is written in a way that takes that stack pointer and then adds some value to it to get to the point in memory where the array lives."
Of course, there are a bunch of caveats here. For example, some systems have both a stack pointer and a frame pointer. Interpreters use a totally different strategy and maintain internal data structures tracking where everything is. And if the array is stored at global scope, there's a different mechanism used altogether.
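A small C experiment makes the stack-pointer mechanism visible: each call to a function gets its local array at a different stack address, so recursive calls print different values (the exact addresses and offsets are compiler- and platform-specific):

#include <stdio.h>

void foo(int depth)
{
    int array[4];                       /* a fresh local array per call */

    printf("depth %d: array at %p\n", depth, (void *)array);
    if (depth < 3)
        foo(depth + 1);                 /* each recursive call shifts the stack pointer */
}

int main(void)
{
    foo(0);                             /* prints a different address at each depth */
    return 0;
}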
Hope this helps!
It isn't stored anywhere - it's computed as necessary.
Unless it is the operand of the sizeof, _Alignof, or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T", and the value of the expression is the address of the first element of the array.
When you declare an array like
T a[N]; // for any non-function type T
what you get in memory is
+---+
| | a[0]
+---+
| | a[1]
+---+
...
+---+
| | a[N-1]
+---+
That's it. No storage is materialized for any pointer. Instead, whenever you use a in any expression, the compiler will compute the address of a[0] and use that instead.
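A short C program shows the decay in action: a, &a[0], and &a all print the same address, but the pointer arithmetic reveals the differing types:

#include <stdio.h>

int main(void)
{
    int a[5];

    printf("a      = %p\n", (void *)a);        /* decays to &a[0] */
    printf("&a[0]  = %p\n", (void *)&a[0]);    /* same address */
    printf("&a     = %p\n", (void *)&a);       /* same address, but type int (*)[5] */

    /* The types differ: a + 1 advances one int, &a + 1 advances the whole array. */
    printf("a + 1  = %p\n", (void *)(a + 1));
    printf("&a + 1 = %p\n", (void *)(&a + 1));
    return 0;
}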
Consider this C code:
#include <stdlib.h>

int x;

void foo(void)
{
    int y;
    int *p = malloc(sizeof *p);
    ...
}
When implementing this program, a C compiler will need to generate instructions that access the int objects named x and y and the int object allocated by the malloc. How does it tell those instructions where the objects are?
Each processor architecture has some way of referring to data in memory. This includes:
The machine instruction includes some bits that identify a processor register. The address in memory is in that processor register.
The machine instruction includes some bits that specify an address.
The machine instruction includes some bits that specify a processor register and some bits that specify an offset or displacement.
So, the compiler has a way of giving an address to the processor. It still needs to know that address. How does it do that?
One way is the compiler could decide exactly where everything in memory is going to go. It could decide it is going to put all the program’s instructions at addresses 0 to 10,000, and it is going to put data at 10,000 and on, and that x will go at address 12300. Then it could write an instruction to fetch x from address 12300. This is called absolute addressing, and it is rarely used anymore because it is inflexible.
Another option is that the compiler can let the program loader decide where to put the data. When the software that loads the program into memory is running, it will read the executable, see how much space is needed for instructions, how much is needed for data that is initialized to zero, how much space is needed for data with initial values listed in the executable file, how much space is needed for data that does not need to be initialized, how much space is requested for the stack, and so on. Then the loader will decide where to put all of these things. As it does so, it will set some processor registers, or some tables in memory, to contain the addresses where things go.
In this case, the compiler may know that x goes at displacement 2300 from the start of the “zero-initialized data” section, and that the loader sets register r12 to contain the base address of that section. Then, when the compiler wants to access x, it will generate an instruction that says “Use register r12 plus the displacement 2300.” This is largely the method used today, although there are many embellishments involving linking multiple object modules together, leaving a placeholder in the object module for the name x that the linker or loader fills in with the actual displacement as they do their work, and other features.
In the case of y, we have another problem. There can be two or more instances of y existing at once. The function foo might call itself, which causes there to be a y for the first call and a different y for the second call. Or foo might call another function that calls foo. To deal with this, most C implementations use a stack. One register in the processor is chosen to be a stack pointer. The loader allocates a large amount of space and sets the stack pointer register to point to the “top” of the space (usually the high-address end, but this is arbitrary). When a function is called, the stack pointer is adjusted according to how much space the new function needs for its local data. When the function executes, it puts all of its local data in memory locations determined by the value of the stack pointer when the function started executing.
In this model, the compiler knows that the y for the current function call is at a particular offset relative to the current stack pointer, so it can access y using instructions with addresses such as “the contents of the stack pointer plus 84 bytes.” (This can be done with a stack pointer alone, but often we also have a frame pointer, which is a copy of the stack pointer at the moment the function was called. This provides a firmer base address for working with local data, one that might not change as much as the stack pointer does.)
In either of these models, the compiler deals with the address of an array the same way it deals with the address of a single int: It knows where the object is stored, relative to some base address for its data segment or stack frame, and it generates the same sorts of instruction addressing forms.
Beyond that, when you access an array, such as a[i], or possibly a multidimensional array, a[i][j][k], the compiler has to do more calculations. To do this, the compiler takes the starting address of the array and does the arithmetic necessary to add the offsets for each of the subscripts. Many processors have instructions that help with these calculations—a processor may have an addressing form that says "Take a base address from one register, add a fixed offset, and add the contents of another register multiplied by a fixed size." This will help access arrays of one dimension. For multiple dimensions, the compiler has to write extra instructions to do some of the calculations.
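As an illustration, here is a hedged C sketch of that arithmetic for a three-dimensional array (the dimensions are arbitrary); the compiler emits essentially this multiply-and-add chain:

#include <stdio.h>

int main(void)
{
    int a[4][5][6];
    int i = 2, j = 3, k = 4;

    /* Row-major layout: offset = ((i * 5) + j) * 6 + k elements from the start. */
    int *base = &a[0][0][0];
    int *computed = base + ((i * 5 + j) * 6 + k);

    printf("%d\n", computed == &a[i][j][k]);   /* prints 1 */
    return 0;
}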
If, instead of using an array element, like a[i], you take its address, as with &a[i], the compiler handles it similarly. It will get a base address from some register (the base address for the data segment or the current stack pointer or frame pointer), add the offset to where a is in that segment, and then add the offset required for i elements. All of the knowledge of where a[i] resides is built into the instructions the compiler writes, plus the registers that help manage the program's memory layout.
Yet one more point of view, a TL;DR answer if you will: when the compiler produces the binary, it stores the address everywhere it is needed in the generated machine code.
The address may be just a plain number in the machine code, or it may be a calculation of some sort, such as "stack frame base address register + a fixed offset number", but in either case it is duplicated everywhere in the machine code where it is needed.
In other words, it is not stored in any one location. Talking more technically, &some_array is not an lvalue, and trying to take the address of it, &(&some_array), will produce a compiler error.
This actually applies to all variables; an array is not special in any way here. The address of a variable can be used in the machine code directly (and if the compiler actually generates code which does store the address somewhere, you have no way to know that from C code; you have to look at the assembly code).
The one thing special about arrays, which seems to be the source of your confusion, is that some_array is basically a more convenient syntax for &(some_array[0]), while &some_array means something else entirely.
Another way to look at it:
The address of the first element doesn't have to be stored anywhere.
An array is a chunk of memory. It has an address simply because it exists somewhere in memory. That address may or may not have to be stored somewhere depending on a lot of things that others have already mentioned.
Asking where the address of the array has to be stored is like asking where reality stores the location of your car. The location doesn't have to be stored - your car is located where your car happens to be - it's a property of existing. Sure, you can make a note that you parked your car in row 97, spot 114 of some huge lot, but you don't have to. And your car will be wherever it is regardless of your note-taking.

Theta of 1 in big arrays

If I allocated an array of 1,000,000,000 members successfully, how can I access the member at index 999,999,999 in Theta of 1?
According to the properties of arrays, access to any member should be Theta of 1. However, isn't there some sort of internal loop that counts through the indices until it gets to the required member? If there is, shouldn't it be Theta of n?
No, there's no internal loop. Arrays are random access, meaning any element can be accessed in Θ(1) time. All the computer has to do is take the array's starting address, add an offset to the desired element, and look up the value at the computed address.
In practice, you are unlikely to ever have an array with a billion elements. Arrays aren't well suited to such large data sets as they'd be several gigabytes or more in size. More sophisticated data structures and/or algorithms are typically employed. For instance, a naïve program might read a 2GB file into a 2GB byte array, whereas a smarter one would read it in small chunks, say 4KB at a time.
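A minimal C sketch of that chunked approach (the file name is a stand-in):

#include <stdio.h>

int main(void)
{
    char chunk[4096];                       /* 4 KB buffer instead of a 2 GB array */
    size_t n;
    FILE *f = fopen("big.dat", "rb");       /* hypothetical input file */

    if (f == NULL)
        return 1;
    while ((n = fread(chunk, 1, sizeof chunk, f)) > 0) {
        /* process n bytes of the chunk here */
    }
    fclose(f);
    return 0;
}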
It is actually Θ(1). When you declare arr = new int[100000000],
the variable arr stores the starting address of the memory allocation.
When you do arr[n], what it does is *(arr + n): it adds n (scaled by the element size) to the starting address and accesses that location directly.
Arrays are always stored in a contiguous, sequential manner.
For more info, please read https://www.ics.uci.edu/~dan/class/165/notes/memory.html
Ask in the comments if you need more resources.

How does a C program get information from an array internally?

I'm relatively new to programming, so when someone suggested that building an array of structs (each containing n attributes of a particular "item") was faster than building n arrays of attributes, I found that I didn't know enough about arrays to argue one way or the other.
I read this:
how do arrays work internally in c/c++
and
Basic Arrays Tutorial
But I still don't really understand how a C program retrieves a particular value from an array by index.
It seems pretty clear that data elements of the array are stored adjacent in memory, and that the array name points to the first element.
Are C programs smart enough to do the arithmetic based on data-type and index to figure out the exact memory address of the target data, or does the program have to somehow iterate over every intervening piece of data before it gets there (as in the linked-list data structure)?
More fundamentally, if a program asks for a piece of information by its memory address, how does the machine find it?
Let's take a simpler example. Let's say you have an array int test[10] which is stored like this at address 1000:
1|2|3|4|5|6|7|8|9|10
The compiler knows that, for example, an int is 4 bytes. The array access formula is:
baseaddr + sizeof(type) * index
For example, test[3] lives at 1000 + 4 * 3 = 1012.
The size of a struct is just the sum of the sizes of its elements plus any padding added by the compiler. So the size of this struct:
struct test {
    int i;
    char c;
};
Might be 5. It also might not be, because of padding.
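You can ask your own compiler directly with sizeof and offsetof; the results vary by platform, but a typical target pads this struct to 8 bytes so that an array of them keeps every i properly aligned:

#include <stdio.h>
#include <stddef.h>

struct test {
    int  i;
    char c;
};

int main(void)
{
    /* On many platforms this prints 8, not 5, because of trailing padding. */
    printf("sizeof(struct test) = %zu\n", sizeof(struct test));
    printf("offsetof c          = %zu\n", offsetof(struct test, c));
    return 0;
}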
As for your last question, very briefly (this is very complicated): the MMU uses the page table to translate the virtual address to a physical address, which is then requested; if it's in cache it is returned, and otherwise it is fetched from main memory.
You wrote:
Are C programs smart enough to do the arithmetic based on data-type and index to figure out the exact memory address of the target data
Yes, that is exactly what they do. They do not iterate over intervening items (and doing so would not help, since there are no markers marking the beginning and end of each item).
So here is the whole trick: array elements are adjacent in memory.
When you declare an array, for example int A[10];,
the variable A acts as a pointer to the first element in the array.
Now comes the indexing part: whenever you write A[i], it is exactly as if you had written *(A + i).
The index is just an offset from the beginning address of the array; also keep in mind that in pointer arithmetic the offset is multiplied by the size of the array's data type.
To get a better understanding of this, write a little code: declare an array and print its address, and then the address of each element in the array.
Notice how the offset is always the same and equal to the size of your array's data type on your machine.
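Here is one possible version of that little program (a sketch; the addresses themselves will differ on your machine):

#include <stdio.h>

int main(void)
{
    int A[10];

    printf("A itself: %p\n", (void *)A);
    for (int i = 0; i < 10; i++)
        printf("&A[%d]: %p\n", i, (void *)&A[i]);
    /* Consecutive addresses differ by sizeof(int) bytes, e.g. 4. */
    return 0;
}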

Is it possible that the initialisation fails when creating an array with a size N

From what I understand, when I create a fixed-size array, say, int[] array = new int[N];, the runtime actually looks for N*4 bytes of memory whose addresses are contiguous, right?
So what if the runtime can't find contiguous memory addresses?
For example, if my memory is 128MB and in my application N = 25M, which means I need 100MB of memory for my array: is it possible for this creation of the array to fail? Is it possible that the 100MB of memory needed can't be located because the free memory is too fragmented?
thanks
Yes it can fail. In that case an OutOfMemoryException will be thrown. An easy way to test this is the following:
int[] array = new int[int.MaxValue];
(This assumes C#; the behavior in Java is similar, with an OutOfMemoryError.)
If we are talking about C++ (but it is the same in general), arrays are contiguous, meaning that the memory has consecutive addresses, i.e. it's contiguous in virtual address space. It need not be contiguous in physical address space (the programmer never sees an actual physical address of an array element, just a reference to the array and the means to index it).
Anyway, if there is not enough memory available you will get an exception; the failure is about availability, not contiguity.
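In plain C terms, the same failure shows up as a null return from the allocator rather than an exception; a minimal sketch mirroring the question's 25M-int case:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 25UL * 1024 * 1024;              /* 25M ints, roughly 100 MB */
    int *array = malloc(n * sizeof *array);

    if (array == NULL) {                        /* allocation can fail */
        fprintf(stderr, "could not allocate %zu bytes\n", n * sizeof *array);
        return 1;
    }
    array[n - 1] = 42;                          /* safe to use on success */
    free(array);
    return 0;
}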

Best place to put constant memory which is known before kernel launch and never changed

I have an array of integers whose size is known before the kernel launch, but not during the compilation stage. The upper bound on the size is around 10000 float3 elements (I guess that means 10000 * 3 * 4 bytes = ~120 KB). It is not known at compile time.
All threads scan linearly through (at most) all of the elements in the array.
You could check the size at runtime, then if it will fit use cudaMemcpyToSymbol, or otherwise use texture or global memory. This is slightly messy, you will have to have some parameter to tell the kernel where the data is. As always, always test actual performance. Different access patterns can have drastically different speeds in different types of memory.
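A hedged CUDA C sketch of that check-then-fall-back idea; MAX_CONST_N, upload, and the out-parameters are illustrative names, not a prescribed API:

#include <cuda_runtime.h>

/* Constant memory is limited (64 KB on most GPUs), so the static      */
/* array is sized to what fits: 5000 float3 elements = 60,000 bytes.   */
#define MAX_CONST_N 5000

__constant__ float3 cdata[MAX_CONST_N];

/* Host side: use constant memory when the data fits, otherwise fall   */
/* back to a global-memory buffer; a flag tells the kernel which copy  */
/* to read.                                                             */
cudaError_t upload(const float3 *host, size_t n,
                   float3 **global_out, int *use_const)
{
    cudaError_t err;

    if (n <= MAX_CONST_N) {
        *use_const = 1;
        return cudaMemcpyToSymbol(cdata, host, n * sizeof(float3),
                                  0, cudaMemcpyHostToDevice);
    }
    *use_const = 0;
    err = cudaMalloc((void **)global_out, n * sizeof(float3));
    if (err != cudaSuccess)
        return err;
    return cudaMemcpy(*global_out, host, n * sizeof(float3),
                      cudaMemcpyHostToDevice);
}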
Another thought is to take a step back and look at the algorithm again. There are often ways of dividing the problem differently to get the constant table to always fit into constant memory.
If all threads in a warp access the same elements at the same time then you should probably consider using constant memory, since this is not only cached, but it also has a broadcast capability whereby all threads can read the same address in a single cycle.
You could calculate the free constant memory after compiling your kernels and allocate it statically.
__constant__ int c[ALL_I_CAN_ALLOCATE];
Then, copy your data to constant memory using cudaMemcpyToSymbol().
I think this might answer your question, but your requirement for constant memory exceeds the limits of the GPU.
I'd recommend other approaches, i.e. using shared memory, which can broadcast data if all threads in a half-warp read from the same location.
