Theta of 1 in big arrays

If I successfully allocated an array of 1,000,000,000 members, how can I access the member at index 999,999,999 in Theta of 1?
According to array properties, access to each member should be Theta of 1. However, isn't there some sort of internal loop that counts the indices until it gets to the required member? If there is, shouldn't it be Theta of n?

No, there's no internal loop. Arrays are random access, meaning any element can be accessed in Θ(1) time. All the computer has to do is take the array's starting address, add the desired element's offset, and look up the value at the computed address.
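For illustration, here is a minimal C sketch along those lines (the size and index are taken from the question; a ~4 GB allocation may well fail, which is beside the point here):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 1000000000;               /* one billion ints is about 4 GB */
    int *arr = malloc(n * sizeof *arr);  /* may fail on most machines */
    if (arr == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    arr[999999999] = 42;                 /* address = base + 999999999 * sizeof(int) */
    printf("%d\n", arr[999999999]);      /* one multiply, one add, one load: Theta(1) */
    free(arr);
    return 0;
}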
In practice, you are unlikely to ever have an array with a billion elements. Arrays aren't well suited to such large data sets as they'd be several gigabytes or more in size. More sophisticated data structures and/or algorithms are typically employed. For instance, a naïve program might read a 2GB file into a 2GB byte array, whereas a smarter one would read it in small chunks, say 4KB at a time.

It is indeed Theta of 1. When you declare int arr[100000000],
the variable arr refers to the first address of the memory allocation.
When you write arr[n], what it does is *(arr + n): it adds n (scaled by the element size) to the starting address and accesses that element directly.
Arrays are always stored in a contiguous, sequential manner.
For more info please read https://www.ics.uci.edu/~dan/class/165/notes/memory.html
Ask in the comments if you need more resources.

Related

How does array offset access actually work

We all are aware of how easy it is to access elements of an array in the blink of an eye:
#include <stdio.h>
int main()
{
    int array[10];
    array[5] = 6;           // setat operation at index 5
    printf("%d", array[5]); // getat operation
}
Yea, the question may sound a bit stupid, but how does the compiler get you to the index you want to access, for inserting data or for displaying it, so fast? Does it traverse to that index on its own to complete the setat()/getat() operations?
Because the usual way would be: if you are asked to pick the 502nd element from a row of 1000 units, you would start counting until you reach 502 (501 in the computer's zero-based counting). So is the same thing happening in the computer?
The array is stored in random-access memory (RAM). RAM is divided into equal-sized, individually addressable units, such as bytes. Addressable means enumerable by an address, which is a number. Random access means that the processor doesn't have to traverse addresses 0 through 499 in order to access location 500; it proceeds directly to 500. How it works is that the computer places a binary representation of the address 500 onto a collection of signal lines called the "address bus". All of the devices connected to the address bus simultaneously examine the address, and their circuitry answers the question "is this address in my range?". The device for which the answer is yes then springs into action.
In the case of RAM, its circuitry further decodes the address to determine which row and column of which bank to activate. The values read out are placed onto the data bus for the processor to collect. The actual implementation is considerably more complicated due to caching, but that's the basic idea.
The main idea is that the machine accesses memory and memory-like resources (such as I/O ports) using an address, and the address is distributed, as a set of electrical signals, in parallel to all of the devices, which can look at it at once; and those devices themselves have parallel circuitry to further analyze the address to identify a specific resource within their innards. So addressing happens very fast, without having to search through resources that are not being addressed.
C language arrays are a very low-level concept. A C array sits at some address in memory and holds equal-sized objects. These objects are accessed by performing arithmetic. For instance, if the array elements are 8 bytes wide, then accessing the 17th element means that the machine has to multiply 17 x 8 to produce the offset 136, which is then added to the address of the array to produce the address of the element.
In your program you have the expression array[5]. The value 5 is known to the C compiler at compile time (before the program is translated, linked, and executed). The size of the array elements, which are of type int, is also known at compile time. The address of array isn't known at compile time. Therefore the offset calculation can take place at compile time: the 5 is converted to a sizeof(int) * 5 offset, a value like 20 computed at compile time, which is then added to the address of array at run time to calculate the address of array[5] and fetch its value from that address.
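To make the arithmetic concrete, here is a small sketch showing three equivalent spellings of the same access (assuming a 4-byte int, so the byte offset is 20):

#include <stdio.h>

int main(void)
{
    int array[10];
    array[5] = 6;
    /* all three expressions denote the same memory location; with 4-byte */
    /* ints the compile-time byte offset is sizeof(int) * 5 == 20         */
    printf("%d %d %d\n",
           array[5],
           *(array + 5),
           *(int *)((char *)array + 5 * sizeof(int)));  /* prints: 6 6 6 */
    return 0;
}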

How does a C program get information from an array internally?

I'm relatively new to programming, so when someone suggested that building an array of structs (each containing n attributes of a particular "item") was faster than building n arrays of attributes, I found that I didn't know enough about arrays to argue one way or the other.
I read this:
how do arrays work internally in c/c++
and
Basic Arrays Tutorial
But I still don't really understand how a C program retrieves a particular value from an array by index.
It seems pretty clear that data elements of the array are stored adjacent in memory, and that the array name points to the first element.
Are C programs smart enough to do the arithmetic based on data type and index to figure out the exact memory address of the target data, or does the program have to somehow iterate over every intermediary piece of data before it gets there (as in the linked-list data structure)?
More fundamentally, if a program asks for a piece of information by its memory address, how does the machine find it?
Let's take a simpler example. Let's say you have an array int test[10] which is stored like this at address 1000:
1|2|3|4|5|6|7|8|9|10
The compiler knows that, for example, an int is 4 bytes. The array access formula is this:
baseaddr + sizeof(type) * index
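With the example above, test[3] would therefore be at 1000 + sizeof(int) * 3 = 1000 + 12 = 1012: one multiply and one add, regardless of how large the array is.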
The size of a struct is just the sum of the sizes of its elements plus any padding added by the compiler. So the size of this struct:
struct test {
    int i;
    char c;
};
Might be 5. It also might not be, because of padding.
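A quick way to see this for yourself is a sketch like the following (the exact numbers depend on your compiler and ABI):

#include <stdio.h>
#include <stddef.h>

struct test {
    int i;
    char c;
};

int main(void)
{
    /* on a typical ABI this prints 8 rather than 5, because 3 bytes of */
    /* padding follow c so that array elements stay 4-byte aligned      */
    printf("sizeof(struct test) = %zu\n", sizeof(struct test));
    printf("offsetof i = %zu, offsetof c = %zu\n",
           offsetof(struct test, i), offsetof(struct test, c));
    return 0;
}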
As for your last question, very briefly (this is quite complicated): the MMU uses the page table to translate the virtual address to a physical address, which is then requested; if it's in cache it is returned, and otherwise it is fetched from main memory.
You wrote:
Are C programs smart enough to do the arithmetic based on data-type and index to figure out the exact memory address of the target data
Yes, that is exactly what they do. They do not iterate over intervening items (and doing so would not help since there are no markers to guide beginning and end of each item).
So here is the whole trick: array elements are adjacent in memory.
When you declare an array, for example int A[10];
the variable A refers to the first element in the array (it decays to a pointer to it in expressions).
Now comes the indexing part: whenever you write A[i], it is exactly as if you were writing *(A+i).
The index is just an offset from the beginning address of the array; also keep in mind that in pointer arithmetic the offset is multiplied by the size of the array's data type.
To get a better understanding of this, write a little code: declare an array and print its address and then the address of each element in the array,
and notice how the offset is always the same and equal to the size of your array's data type on your machine.
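For example, a small program along those lines (the printed addresses will differ from machine to machine):

#include <stdio.h>

int main(void)
{
    int A[10];
    printf("array starts at %p\n", (void *)A);
    for (int i = 0; i < 10; i++)
        printf("&A[%d] = %p\n", i, (void *)&A[i]);
    /* consecutive addresses always differ by exactly sizeof(int) */
    return 0;
}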

Best place to put constant memory which is known before kernel launch and never changed

I have an array of integers whose size is known before the kernel launch but not at the compilation stage. The upper bound on the size is around 10000 float3 elements (I guess that means 10000 * 3 * 4 = ~120KB). The exact size is not known at compile time.
All threads scan linearly through (at most) all of the elements in the array.
You could check the size at runtime, then use cudaMemcpyToSymbol if it fits, or otherwise use texture or global memory. This is slightly messy, and you will need some parameter to tell the kernel where the data is. As always, test actual performance: different access patterns can have drastically different speeds in different types of memory.
Another thought is to take a step back and look at the algorithm again. There are often ways of dividing the problem differently to get the constant table to always fit into constant memory.
If all threads in a warp access the same elements at the same time then you should probably consider using constant memory, since this is not only cached, but it also has a broadcast capability whereby all threads can read the same address in a single cycle.
You could calculate the free constant memory after compiling your kernels and allocate it statically.
__constant__ int c[ALL_I_CAN_ALLOCATE];
Then, copy your data to constant memory using cudaMemcpyToSymbol().
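As a rough sketch of that approach (MAX_CONST_ELEMS and upload_to_constant are made-up names, and error checking is omitted; the full 10000 float3 elements would not fit, so a runtime check falls back as described above):

#include <cuda_runtime.h>

/* 5000 * sizeof(float3) = 60 KB, which stays under the 64 KB constant-memory limit */
#define MAX_CONST_ELEMS 5000

__constant__ float3 c_data[MAX_CONST_ELEMS];

/* returns true if the data fit in constant memory; on false the caller
   should fall back to global (or texture) memory instead */
bool upload_to_constant(const float3 *h_data, size_t n)
{
    if (n > MAX_CONST_ELEMS)
        return false;
    cudaMemcpyToSymbol(c_data, h_data, n * sizeof(float3));
    return true;
}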
I think this might answer your question, but your requirement for constant memory exceeds the limits of the GPU (constant memory is typically 64 KB).
I'd recommend other approaches, e.g. shared memory, which can broadcast data if all threads in a half-warp read from the same location.

How are arrays and hash maps constant time in their access?

Specifically: given a hash (or an array index), how does the machine get to the data in constant time?
It seems to me that even passing by all the other memory locations (or whatever) would take an amount of time equal to the number of locations passed (so linear time). A coworker has tried valiantly to explain this to me but had to give up when we got down to circuits.
Example:
my_array = new array(:size => 20)
my_array[20] = "foo"
my_array[20] # "foo"
Access of "foo" in position 20 is constant because we know which bucket "foo" is in. How did we magically get to that bucket without passing all the others on the way? To get to house #20 on a block you would still have to pass by the other 19...
How did we magically get to that bucket without passing all the others on the way?
"We" don't "go" to the bucket at all. The way RAM physically works is more like broadcasting the bucket's number on a channel on which all buckets listen, and the one whose number was called will send you its contents.
Calculations happen in the CPU. In theory, the CPU is the same "distance" from all memory locations (in practice it's not, because of caching, which can have a huge impact on performance).
If you want the gritty details, read "What every programmer should know about memory".
To understand this, you have to look at how memory is organized and accessed. You may have to look at the way an address decoder works. The thing is, you do NOT have to pass by all the other addresses to get to the one you want in memory. You can actually jump to the one you want. Otherwise our computers would be really, really slow.
Unlike a Turing machine, which would have to access memory sequentially, computers use random access memory, or RAM, which means if they know where the array starts and they know they want to access the 20th element of the array, they know what part of memory to look at.
It is less like driving down a street and more like picking the correct mail slot for your apartment in a shared mailbox.
Two things are important:
1. my_array holds the information about where in memory the computer must jump to find the array.
2. index * sizeof(type) gives the offset from the beginning of the array.
1 + 2 = O(1) to find where the data is.
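For example, if my_array starts at address 0x1000 and each slot holds an 8-byte value, slot 20 is at 0x1000 + 20 * 8 = 0x10A0; the cost is one multiply and one add, no matter how long the array is.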
Big O doesn't work like that. It's supposed to be a measure of how much computational resource is used by a particular algorithm or function. It's not meant to measure the amount of memory used, and if you are talking about traversing that memory, it's still constant time. If I need to find the second slot of an array, it's a matter of adding an offset to a pointer. Now, if I have a tree structure and I want to find a particular node, we're talking about O(log n), because it isn't found on the first pass; on average it takes O(log n) to find that node.
Let's discuss this in C/C++ terms; there's some additional stuff to know about C# arrays but it's not really relevant to the point.
Given an array of 16-bit integer values:
short myArray[5] = {1,2,3,4,5};
What's really happened is that the computer has allocated a block of space in memory. This memory block is reserved for that array, is exactly the size needed to hold the full array (in our case 16*5 == 80 bits == 10 bytes), and is contiguous. These facts are givens; if any or none of them are true in your environment at any given time, you're generally at risk of your program crashing due to an access violation.
So, given this structure, what the variable myArray really is, behind the scenes, is the memory address of the start of the block of memory. This is also, conveniently, the start of the first element. Each additional element is lined up in memory right after the first, in order. The memory block allocated for myArray might look like this:
0000000000000001 0000000000000010 0000000000000011 0000000000000100 0000000000000101
^                ^                ^                ^                ^
myArray[0]       myArray[1]       myArray[2]       myArray[3]       myArray[4]
It is considered a constant-time operation to access a memory address and read a constant number of bytes. As in the above figure, you can get the memory address for each one if you know three things; the start of the memory block, the memory size of each element, and the index of the element you want. So, when you ask for myArray[3] in your code, that request is turned into a memory address by the following equation:
myArray[3] == &myArray+sizeof(short)*3;
Thus, with a constant-time calculation, you have found the memory address of the fourth element (index 3), and with another constant-time operation (or at least considered so; actual access complexity is a hardware detail and fast enough that you shouldn't care) you can read that memory. This, if you've ever wondered, is why indexes of collections in most C-style languages start at zero: the first element of the array starts at the location of the array itself, with no offset (sizeof(anything) * 0 == 0).
In C#, there are two notable differences. C# arrays have some header information that is of use to the CLR. The header comes first in the memory block, and the size of this header is constant and known, so the addressing equation has just one key difference:
myArray[3] == &myArray+headerSize+sizeof(short)*3;
C# doesn't allow you to directly reference memory in its managed environment, but the runtime itself will use something like this to perform memory access off the heap.
The second thing, which is common to most flavors of C/C++ as well, is that certain types are always dealt with "by reference". Anything you have to use the new keyword to create is a reference type (and there are some objects, like strings, that are also reference types although they look like value types in code). A reference type, when instantiated, is placed in memory, doesn't move, and is usually not copied. Any variable that represents that object is thus, behind the scenes, just the memory address of the object in memory. Arrays are reference types (remember myArray was just a memory address). Arrays of reference types are arrays of these memory addresses, so accessing an object that is an element in an array is a two-step process; first you calculate the memory address of the element in the array, and get that. That is another memory address, which is the location of the actual object (or at least its mutable data; how compound types are structured in memory is a whole other can o' worms). This is still a constant-time operation; just two steps instead of one.

Does initialization of a 2D array in a C program waste too much time?

I am writing a C program which has to use a 2D array to store previously processed data for later use.
The size of this 2D array is 33x33: matrix[33][33].
I define it as a global variable, so it is initialized only once. Does this definition cost a lot of time when the program is running? I found my program to be slower than the previous version, which did not use this matrix to store data.
Additional:
I initialize this matrix as a global variable like this:
int map[33][33];
In one function, A, I need to store all 33x33 data values into this matrix.
In another function, B, I fetch a small 3x3 matrix from map[33][33] for my next step of processing.
The above 2 steps are repeated about 8000 times. So, will this affect the program's running efficiency?
Or, my other guess is that the program turns out slower because of a couple of if-else branch statements that were recently added to the program.
How were you doing it before? The only problem I can think of is that extracting a 3x3 sub-matrix from a 33x33 integer matrix is going to cause you caching issues every time you extract the sub-matrix.
On most modern machines the cache line is 64 bytes in size. That's enough for 16 4-byte int elements of the matrix, so each extra row of the 3x3 sub-matrix means a new cache-line fetch. If the matrix gets hammered very regularly then it will probably sit mostly in the level 2 cache (or maybe even the level 1 cache, if it is big enough), but if you are doing lots of other data calculations in between each sub-matrix fetch then you will be paying 3 expensive cache-line fetches each time you grab the sub-matrix.
However, even then it's unlikely you'd see a HUGE difference in performance. As stated elsewhere, we need to see before-and-after code to be able to hazard a guess at why performance has gotten worse ...
Simplifying slightly, there are three kinds of variables in C: static, automatic, and dynamic.
Static variables exist throughout the lifetime of the program, and include both global variables and local variables declared using static. They are either initialized to zeroes (the default) or explicitly initialized. If they are zeroes, the linker arranges for them to be placed in fresh memory pages that the operating system initializes to zeroes (this takes a tiny amount of time). If they are explicitly initialized, the linker puts the data into a memory area in the executable and the operating system loads it from there (this requires reading the data from disk into memory).
Automatic variables are allocated from the stack, and if they are initialized, this happens every time they are allocated. (If not, they have no defined value, or perhaps they have a random value, and so initialization takes no time.)
Dynamic variables are allocated using malloc, and you have to initialize them yourself, and that again takes a bit of time.
It is highly probable that your slowdown is not caused by the initialization. To make sure of this, you should measure it by profiling your program and seeing where the time is spent. Unfortunately, profiling may be difficult for initialization done by the compiler/linker/operating system, especially for the parts that happen before your program starts executing.
If you want to measure how much time it takes to initialize your array, you could write a dummy program that does nothing but include the array.
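Such a dummy program could be as small as this (a sketch; time it with something like time ./a.out):

int map[33][33];   /* the same global array, zero-initialized static storage */

int main(void)
{
    return 0;      /* does nothing; any measurable time is startup cost, not the array */
}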
However, since 33*33 is a fairly small number, either your matrix items are very large, your computer is very slow, or your 33 is larger than mine.
No, there is no difference in runtime between initializing an array once (with whatever method) and not initializing it.
If you found a difference between your 2 versions, that must be due to differences in the implementation of the algorithm (or a different algorithm).
Well, I wouldn't expect it to (something like that should take much less than a second), but an easy way to find out would be to simply put a print statement at the start of main().
That way you can see if global, static variable initialization is really causing this. Is there anything else in your program that you've changed lately?
EDIT: One way to get a clearer idea of what's taking so long would be to use a debugger like GDB or a profiler like gprof.
If your program accesses the matrix a lot while running (even if it's not being updated at all), the calculation of an element's address involves a multiply by 33. Doing a lot of this could have the effect of slowing down your program.
How did your previous program version store the data if not in matrix? How were you able to read a sub-matrix if you did not have the big matrix?
Many answers talk about the time spent for initializing. But I don't think that was the question. Anyway, on modern processors, initializing such a small array takes just a few microseconds. And it is only done once, at program start.
If you need to fetch a sub-matrix from any position, there is probably no faster method than using a static 2D array. However, depending on the processor architecture, accessing the array could be faster if the array dimensions (or just the last dimension) are a power of 2 (e.g. 32, 64, etc.), since this would allow using a shift instead of a multiply.
If the accessed sub-matrices do not overlap (i.e. you would only access indexes 0, 3, 6, etc.) then using a 3-dimensional or 4-dimensional array could speed up the access:
int map[11][11][3][3];
This makes each sub-matrix a contiguous block of memory, which can be copied with a single block copy command.
Further, each sub-matrix may fit in a single cache line.
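A sketch of fetching one tile from that layout (fetch_tile is a hypothetical helper; indices i and j select the 3x3 tile):

#include <string.h>

int map[11][11][3][3];

/* copy the (i, j) tile into sub as one contiguous 36-byte block copy */
void fetch_tile(int i, int j, int sub[3][3])
{
    memcpy(sub, map[i][j], sizeof(int[3][3]));
}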
Theoretically, using an N-dimensional array shouldn't make a performance difference, as all of the following resolve to a contiguous memory reservation by the compiler:
int _1D[1089];
int _2D[33][33];
int _3D[3][11][33];
should give similar allocation/deallocation speed.
You need to benchmark your program. If you don't need the initialization, don't make the variable static, or (maybe) allocate it yourself from the heap using malloc():
mystery_type *matrix;
matrix = malloc(33 * 33 * sizeof *matrix);
