How do arrays generally work at a low level?

How do they map an index directly to a value without having to iterate through the indices?
If it's quite complex, where can I read more?

An array is just a contiguous chunk of memory, starting at some known address. So if the start address is p, and you want to access the i-th element, then you just need to calculate:
p + i * size
where size is the size (in bytes) of each element.
Crudely speaking, accessing an arbitrary memory address takes constant time.
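As a rough illustration of that calculation, here is a minimal C sketch. The function names element_address and read_int_at are made up for this example; they are not part of any library.

```c
#include <stddef.h>

/* Address of element i in an array that starts at base and whose
   elements are elem_size bytes each: base + i * elem_size. */
const void *element_address(const void *base, size_t i, size_t elem_size)
{
    return (const unsigned char *)base + i * elem_size;
}

/* Reads a[i] by doing the address arithmetic by hand (hypothetical helper). */
int read_int_at(const int *a, size_t i)
{
    return *(const int *)element_address(a, i, sizeof a[0]);
}
```

For an int array a, element_address(a, 3, sizeof a[0]) should give the same address as &a[3], which is why no iteration over the earlier elements is needed.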

Essentially, computer memory can be described as a series of addressed slots. To make an array, you set aside a contiguous block of those. So, if you need fifty slots in your array, you set aside 50 slots from memory. In this example, let's say you set aside the slots from 1019 through 1068 for an array called A. Slot 0 in A is slot 1019 in memory. Slot 1 in A is slot 1020 in memory. Slot 2 in A is slot 1021 in memory, and so forth. So, in general, to get the nth slot in an array we would just do 1019+n. So all we need to do is to remember what the starting slot is and add to it appropriately.
If we want to make sure that we don't write to memory beyond the end of our array, we may also want to store the length of A and check our n against it. It's also the case that not all values we wish to keep track of are the same size, so we may have an array where each item in the array takes up more than one slot. In that case, if s is the size of each item, then we need to set aside s times the number of items in the array, and when we fetch the nth item, we need to add s times n to the start rather than just n. But in practice, this is pretty easy to handle. The only restriction is that each item in the array be the same size.
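Storing the length next to the start of the block and checking n against it could look roughly like this. It is only a sketch: the struct checked_array and both functions are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

struct checked_array {
    int *items;     /* start of the contiguous block */
    size_t length;  /* number of items set aside */
};

/* Stores the nth item in *out; returns false if n is past the end. */
bool checked_get(const struct checked_array *a, size_t n, int *out)
{
    if (n >= a->length)
        return false;
    *out = a->items[n];
    return true;
}

/* Writes value to the nth item; returns false if n is past the end. */
bool checked_set(struct checked_array *a, size_t n, int value)
{
    if (n >= a->length)
        return false;
    a->items[n] = value;
    return true;
}
```

With a 50-item block, indices 0 through 49 succeed and index 50 is rejected instead of touching memory beyond the array.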

Wikipedia explains this very well:
http://en.wikipedia.org/wiki/Array_data_structure
Basically, a memory base is chosen. Then the index is added to the base. Like so:
if base = 2000 and the size of each element is 5 bytes, then:
array[5] is at 2000 + 5*5.
array[i] is at 2000 + 5*i.
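Two-dimensional arrays extend this: the elements are stored row by row, so the row index is first scaled by the number of columns per row. Like so:
base = 2000, size-of-each = 5 bytes, columns-per-row = 10
array[i][j] is at 2000 + 5*(i*10 + j)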
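For a two-dimensional array stored row by row (which is how C lays them out), the byte offset of element [i][j] is (i * columns + j) * size. A minimal sketch, where offset_2d is a name invented for the example:

```c
#include <stddef.h>

/* Byte offset of element [i][j] from the start of a row-major
   two-dimensional array with `columns` elements per row. */
size_t offset_2d(size_t i, size_t j, size_t columns, size_t elem_size)
{
    return (i * columns + j) * elem_size;
}
```

With 5-byte elements and 10 columns per row, offset_2d(1, 2, 10, 5) is 60, so that element would sit at 2000 + 60 if the base were 2000.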
And if every element is of a different size, more calculation is necessary:
for each index before the one you want
slot-in-memory += size-of-element-at-index
So, in this case, it is almost impossible to map directly without iteration.

Related

Given the max size of an array, should I write everything and shrink at the end or extend each time?

Let's say I know the max size of my new array would be 8, but in most cases the logical size would be lower given the program's logic.
In this scenario, which is more memory- and time-efficient?
A. Begin with an array with a physical size of 1, write to it, and then extend it by one each time (dynamically allocating a new array and freeing the old one) before continuing to write.
B. Begin with an array of the given max size (in our case 8), write every needed slot, and at the end shrink it to the logical size (dynamically allocating a new array and freeing the old one).

Dynamically indexing an array in C

Is it possible to create arrays based on their index as in
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y] = someNr;
dynamically/on the run, without creating foo[0...3][0...4]?
If not, is there a data structure that allows me to do something similar to this in C?
No.
As written, your code makes no sense at all. You need foo to be declared somewhere, and then you can index into it with foo[x][y] = someNr;. But you can't just make foo spring into existence, which is what it looks like you are trying to do.
Either create foo with the correct sizes (only you can say what they are), int foo[16][16]; for example, or use a different data structure.
In C++ you could use a map<pair<int, int>, int>.
Variable Length Arrays
Even if x and y were replaced by constants, you could not initialize the array using the notation shown. You'd need to use:
int fixed[3][4] = { someNr };
or similar (extra braces, perhaps; more values perhaps). You can, however, declare/define variable length arrays (VLA), but you cannot initialize them at all. So, you could write:
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y];
for (int i = 0; i < x; i++)
{
for (int j = 0; j < y; j++)
foo[i][j] = someNr + i * (x + 1) + j;
}
Obviously, you can't use x and y as indexes without writing (or reading) outside the bounds of the array. The onus is on you to ensure that there is enough space on the stack for the values chosen as the limits on the arrays (it won't be a problem at 3x4; it might be at 300x400 though, and will be at 3000x4000). You can also use dynamic allocation of VLAs to handle bigger matrices.
VLA support is mandatory in C99, optional in C11 and C18, and non-existent in strict C90.
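As a sketch of the dynamic-allocation route mentioned above, a pointer to a variable length row type lets you keep the m[i][j] notation while the storage lives on the heap. The function name make_matrix and its fill pattern are assumptions made for this example, and the code needs a compiler with VLA support.

```c
#include <stdlib.h>

/* Allocates a rows x cols matrix of int on the heap and fills it with
   start, start + 1, ... in row-major order. Returns NULL on failure.
   Assumes rows > 0 and cols > 0; the caller releases it with free(). */
void *make_matrix(size_t rows, size_t cols, int start)
{
    int (*m)[cols] = malloc(rows * sizeof *m);  /* sizeof *m is one whole row */
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            m[i][j] = start + (int)(i * cols + j);
    return m;
}
```

The caller would write something like int (*m)[cols] = make_matrix(rows, cols, 123); and then use m[i][j] as with the stack version, so a 300x400 or larger matrix no longer depends on the available stack space.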
Sparse arrays
If what you want is 'sparse array support', there is no built-in facility in C that will assist you. You have to devise (or find) code that will handle that for you. It can certainly be done; Fortran programmers used to have to do it quite often in the bad old days when megabytes of memory were a luxury and MIPS meant millions of instructions per second and people were happy when their computer could do double-digit MIPS (and the Fortran 90 standard was still years in the future).
You'll need to devise a structure and a set of functions to handle the sparse array. You will probably need to decide whether you have values in every row, or whether you only record the data in some rows. You'll need a function to assign a value to a cell, and another to retrieve the value from a cell. You'll need to think what the value is when there is no explicit entry. (The thinking probably isn't hard. The default value is usually zero, but an infinity or a NaN (not a number) might be appropriate, depending on context.) You'd also need a function to allocate the base structure (would you specify the maximum sizes?) and another to release it.
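One possible shape for such a structure and its functions is sketched below. Everything here is an assumption for illustration: the names (struct sparse, sparse_create, sparse_set, sparse_get, sparse_destroy), the choice to store cells as an unsorted list, and the linear search, which is simple but slow for many entries.

```c
#include <stdlib.h>

struct cell { size_t row, col; int value; };

struct sparse {
    struct cell *cells;    /* only explicitly assigned cells are stored */
    size_t count, capacity;
    int default_value;     /* value reported for cells with no entry */
};

struct sparse *sparse_create(int default_value)
{
    struct sparse *s = malloc(sizeof *s);
    if (s != NULL) {
        s->cells = NULL;
        s->count = s->capacity = 0;
        s->default_value = default_value;
    }
    return s;
}

void sparse_destroy(struct sparse *s)
{
    if (s != NULL) {
        free(s->cells);
        free(s);
    }
}

int sparse_get(const struct sparse *s, size_t row, size_t col)
{
    for (size_t k = 0; k < s->count; k++)
        if (s->cells[k].row == row && s->cells[k].col == col)
            return s->cells[k].value;
    return s->default_value;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
int sparse_set(struct sparse *s, size_t row, size_t col, int value)
{
    for (size_t k = 0; k < s->count; k++)
        if (s->cells[k].row == row && s->cells[k].col == col) {
            s->cells[k].value = value;   /* overwrite an existing entry */
            return 0;
        }
    if (s->count == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 8;
        struct cell *p = realloc(s->cells, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->cells = p;
        s->capacity = new_cap;
    }
    s->cells[s->count++] = (struct cell){ row, col, value };
    return 0;
}
```

With this, sparse_set(s, 4, 5, 123) records a single cell and sparse_get(s, 0, 0) reports the default, without ever allocating the full 5x6 grid. A real implementation would likely replace the linear search with a hash table or sorted rows.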
The most efficient way to create a dynamic index of an array is to create an empty array of the same data type as the array to index is holding.
Let's imagine we are using integers for the sake of simplicity. You can then stretch the concept to any other data type.
The ideal index depth will depend on the length of the data to index and will be somewhere close to the length of the data.
Let's say you have 1 million 64-bit integers in the array to index.
First of all you should sort the data and eliminate duplicates. That's easy to achieve by using qsort() (the C standard library sort function) and some duplicate-removal function such as
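#include <stdint.h>
#include <string.h>

/* Copies the unique strings from unord_arr (already sorted, so duplicates
   are adjacent) into ord_arr and returns how many were copied.
   Each ord_arr[j] must point to a buffer large enough for the string. */
uint64_t remove_dupes(char **unord_arr, char **ord_arr, uint64_t arr_size)
{
    uint64_t i, j = 0;
    for (i = 1; i < arr_size; i++)
    {
        /* Emit the previous entry whenever the current one differs from it. */
        if (strcmp(unord_arr[i], unord_arr[i-1]) != 0) {
            strcpy(ord_arr[j], unord_arr[i-1]);
            j++;
        }
        /* The last entry has no successor, so emit it explicitly. */
        if (i == arr_size - 1) {
            strcpy(ord_arr[j], unord_arr[i]);
            j++;
        }
    }
    return j;
}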
Adapt the code above to your needs; you should free() the unordered array once the function has finished copying it into the ordered array. The function above is very fast; it will return zero entries when the array contains only one element, but that's probably something you can live with.
Once the data is ordered and unique, create an index with a length close to that of the data. It does not need to be of an exact length, although sticking to powers of 10 will make everything easier in the case of integers.
uint64_t* idx = calloc(pow(10, indexdepth), sizeof(uint64_t));
This will create an empty index array.
Then populate the index. Traverse the array to index just once, and every time the leading digits (as many as the index depth) change, record the position where the new prefix was first seen.
If you choose an indexdepth of 2 you will have 10² = 100 possible values in your index, typically going from 0 to 99.
When you detect that some number starts with 10 (say 103456), you add an entry to the index. If 103456 was found at position 733, your index entry would be:
idx[10] = 733;
The first entry beginning with 11 goes in the next index slot. Say the first number beginning with 11 is found at position 2023:
idx[11] = 2023;
And so on.
When you later need to find some number in your original array storing 1 million entries, you don't have to iterate over the whole array; you just need to look up in your index where the first number starting with the same two significant digits is stored. Entry idx[10] tells you where the first number starting with 10 is stored. You can then iterate forward until you find your match.
In my example I used a small index, so the number of entries you may need to scan will be on average about 1000000/100 = 10000.
If you enlarge your index to somewhere close to the length of the data, the number of iterations will tend to 1, making any search blazing fast.
What I like to do is to create some simple algorithm that tells me the ideal depth of the index once I know the type and length of the data to index.
Please note that in the example I have posed, 64-bit numbers are indexed by their first index-depth significant figures, thus 10 and 100001 will be stored in the same index segment. That's not a problem in itself, though every master has his own little book of secrets. Treating numbers as fixed-length hexadecimal strings can help keep a strict numerical order.
You don't have to change the base though; you could consider 10 to be 0000010 to keep it in the 00 index segment and keep base 10 numbers ordered. Using different numerical bases is nonetheless trivial in C, which is of great help for this task.
As you make your index depth larger, the number of entries per index segment will be reduced.
Please do note that programming, especially at a lower level like C, consists in great part of understanding the tradeoff between CPU cycles and memory use.
Creating the proposed index is a way to reduce the number of CPU cycles required to locate a value at the cost of using more memory as the index becomes larger. This is nonetheless the way to go nowadays, as massive amounts of memory are cheap.
As SSD speeds get closer to those of RAM, using files to store indexes is worth taking into account. Nevertheless, modern OSs tend to load as much as they can into RAM, so using files would end up being similar from a performance point of view.
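To make the scheme concrete, here is a rough sketch for sorted, de-duplicated integers that are treated as zero-padded to a fixed width, so that dividing by a power of ten yields the leading digits. The function names build_index and find_value, the divisor parameter, and the convention of storing n for an empty segment are all assumptions of this example rather than part of the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* idx[b] receives the position of the first value v with v / divisor == b,
   or n when no value falls into segment b. sorted must be ascending and
   every value must be below divisor * segments. */
void build_index(const uint64_t *sorted, size_t n, uint64_t divisor,
                 size_t *idx, size_t segments)
{
    for (size_t b = 0; b < segments; b++)
        idx[b] = n;
    /* Walking backwards means the last write to each slot is the first position. */
    for (size_t pos = n; pos-- > 0; )
        idx[sorted[pos] / divisor] = pos;
}

/* Returns the position of key in sorted, or SIZE_MAX if it is absent. */
size_t find_value(const uint64_t *sorted, size_t n, uint64_t divisor,
                  const size_t *idx, size_t segments, uint64_t key)
{
    uint64_t b = key / divisor;
    if (b >= segments)
        return SIZE_MAX;
    /* Jump to the start of the segment, then scan only within it. */
    for (size_t pos = idx[b]; pos < n && sorted[pos] / divisor == b; pos++)
        if (sorted[pos] == key)
            return pos;
    return SIZE_MAX;
}
```

For six-digit values and an index depth of 2, the divisor would be 10000 and the index would have 100 segments, so 103456 lands in segment 10 and 10 (read as 000010) lands in segment 00. A binary search over the sorted array is a common alternative that needs no extra memory.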

How much memory does an array with one high-index element use?

Would running this code occupy about 4_000_000 bytes of memory?
my uint32 @array;
@array[1_000_000] = 1;
If you assign element 1_000_000 and each element is 4 bytes, that would be 4_000_004 bytes of memory. So strictly speaking, the answer is "No" :-)
But less pedantically: native arrays are guaranteed to be laid out consecutively in memory, so such an assignment would at least allocate a single block of 4 x 1_000_001 = 4_000_004 bytes of memory. As Christoph stated in his comment, if you want to make sure it is all it will ever allocate, you need to make it a shaped native array. You will get upper bounds checks as a bonus as well.

Is it a good way to read an unknown-size array by using realloc many times?

I will read a file which contains an array of unknown size,
like this:
1, 2, 3, ....
5, 6, 8 ....
Is this algorithm safe and fast to use?
array = NULL; /* for realloc */
for (i = 0; fgets(line, 256, input) != NULL; ++i) {
    array = (double **)realloc(array, sizeof(double *) * (i + 1));
    array[i] = NULL; /* the new row pointer must be NULL before its first realloc */
    value = strtok(line, selector);
    for (j = 0; value != NULL; ++j) {
        array[i] = (double *)realloc(array[i], sizeof(double) * (j + 1));
        sscanf(value, "%lf", &array[i][j]);
        value = strtok(NULL, selector);
    }
}
On the speed: Your algorithm has quadratic complexity O(n^2), where n is the number of values per line or the number of lines. This is not efficient.
The normal workaround for this is to keep track of two sizes: the size of the allocated array, and the number of elements that are currently in use. A value is added either by just incrementing the number of elements currently in use (and storing the value at the correct location, of course), or by first realloc()ing the array to twice the current size. The result is that even when n is very large, the average element in the array is copied only once. This brings the complexity down to O(n).
Of course, all of this is irrelevant if you only ever have something like ten entries in your arrays. But you were asking for speed.
On the safety: The only risk that I see is that you are fragmenting your address space more than necessary by creating tons of temporary objects which are just created to be replaced by a slightly larger one in the next iteration. This may lead to increased memory hunger in the long run, but it's virtually impossible to gauge this effect precisely.
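The two-sizes approach described above might be sketched as follows. The struct dvec, the function dvec_push, and the initial allocation of 4 elements are hypothetical choices made for the example.

```c
#include <stdlib.h>

struct dvec {
    double *data;
    size_t used;       /* elements currently in use */
    size_t allocated;  /* elements the block has room for */
};

/* Appends value, doubling the block when it is full.
   Returns 0 on success, -1 if realloc fails (the vector is left intact). */
int dvec_push(struct dvec *v, double value)
{
    if (v->used == v->allocated) {
        size_t new_alloc = v->allocated ? v->allocated * 2 : 4;
        double *p = realloc(v->data, new_alloc * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->allocated = new_alloc;
    }
    v->data[v->used++] = value;
    return 0;
}
```

A vector starts out as struct dvec v = { NULL, 0, 0 }; and is released with free(v.data). Assigning the result of realloc to a temporary first also avoids losing the old block if the call fails, which the pattern array = realloc(array, ...) does not.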

What is the best way to append an array using for in loop in Swift?

I am currently attempting to learn how to code in Swift via treehouse.com and so far I am enjoying it. I just came to my first "code challenge", as they call it, that I somewhat struggled with. As the title suggests, they started with an empty array as such:
var results: [Int] = []
That is fine and dandy. The goal is to then write a for in loop which finds the multiples of 6 for the numbers 1-10 and then to append those values to the array. I eventually did figure it out, but I am not sure that I did it in the ideal way. I'd appreciate any criticism. My code will be included below. Do bear in mind that I am a total beginner to Swift and to coding in general. Thanks in advance for any suggestions/tips.
var results: [Int] = []
for multiplier in 1...10 {
let multiples = (multiplier * 6)
print(multiples)
results.append(multiples)
}
The code executes properly and the array is appended with the values, but again, I am unsure if there is a better way to do this.
For your first question, whether there is a better or best way to append objects to an array in a for-in loop, that has already been explained by @Alexander. But if you look closely, in the end he is still doing it the same way you are; the only difference is that he has specified the capacity of the array. So the way you are appending objects to the array is fine, but it requires writing a lot of code.
Now, to reduce your code and do what you are currently doing in a Swifty way, you can use the built-in map function.
let result = (1...10).map { $0 * 6 }
print(result) // [6, 12, 18, 24, 30, 36, 42, 48, 54, 60]
First, (1...10) creates a CountableClosedRange, and then we call map(_:) on it with a closure. map takes each element of the CountableClosedRange in sequence, so $0 stands for each element in turn. We multiply that element by 6 and return the result from the closure, and map builds its result from those return values; in this case, it creates an Array of Int.
This is a great question, and a perfect opportunity to explain map and array reallocation costs.
Arrays have two important properties you need to understand.
The first is their count. The count is simply the number of items in the array.
The other is the capacity. The capacity is the number of elements that can fit the memory that has been allocated for this array. The capacity is always ≥ the count (d'uh!).
When you have an Array of say, capacity 10, and count 5, and you try to append to it, you can do that right away. Set the 6th element to the new value, and boom, you're done instantly.
Now, what happens if you have an Array of capacity 10, and count 10, and you try to append to it? Well there's no more room for new elements given the memory the array currently has allocated. What needs to happen is that the Array has to ask the operating system for a new chunk of memory that has a capacity that's high enough to fit at least the existing elements, plus the new incoming element. However, this new chunk of memory is empty, so we need to copy over what we had in our old small chunk of memory, before we can do the append. Suppose our new chunk has capacity 20, and we copy our 10 old elements. We can now add in our new element in the 11th slot.
Notice we had to copy the elements from our old memory into the new memory. Well if our count is 10, that's 10 copy operations. This means that appending to an array that's at capacity is much slower than appending to an array with available capacity.
Suppose we go on to add another 1000 elements.
- Our array will fill its new memory at count 20, so it'll need to reallocate a new piece of memory, do 20 copy operations. Suppose this new piece of memory has capacity 40.
- Our array will fill its new memory at count 40, so it'll need to reallocate a new piece of memory, do 40 copy operations. Suppose this new piece of memory has capacity 80.
- This process will repeat at count = 80, 160, 320, 640, until we finally have all 1011 elements in our array of capacity 1280. This took 10 + 20+ 40 + 80 + 160 + 320 + 640 = 1270 copy operations in total!
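The arithmetic can be checked with a short simulation, written here in C. It assumes, as the walkthrough does, that the capacity doubles every time the array is full; the real growth strategy of Swift's Array is an implementation detail. The function name copies_when_doubling is made up for the example.

```c
#include <stddef.h>

/* Counts element copies made while appending `appends` elements to an
   array that starts full at `capacity` elements and doubles when full. */
size_t copies_when_doubling(size_t capacity, size_t appends)
{
    size_t count = capacity;   /* the array starts out full */
    size_t copies = 0;
    for (size_t k = 0; k < appends; k++) {
        if (count == capacity) {
            copies += count;   /* every existing element moves to the new block */
            capacity *= 2;
        }
        count++;
    }
    return copies;
}
```

Starting from a full array of 10 and appending 1001 elements (the 11th plus another 1000), copies_when_doubling(10, 1001) comes to 1270, matching the sum above.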
What if we were smart about this, and knew in advance that we were about to add 1000 items, and could tell the array to resize itself all at once, to fit our new 1000 items?
Well we can do exactly that, with reserveCapacity(_:). It lets us tell the array to allocate a certain capacity for us, rather than it having to constantly resize on-demand as we keep stuffing it. Using this technique, we can ensure no copy operations occur at all!
Now let's apply that to your code. We can rewrite it like so, to ensure no array reallocations happen (even if 1...10 were something higher, like 1...10000000)
let numbers = 1...10
let numberCount = numbers.count
var products = [Int]()
products.reserveCapacity(numberCount)
for number in numbers {
let product = (number * 6)
print(product)
products.append(product)
}
Now all that is quite a bit of code, for such a simple operation. It's also a pretty common operation. This is why map exists, to simplify all this.
map(_:) takes a closure which tells it what to do to each element. map(_:) will take every element in your sequence (in this case, your CountableClosedRange<Int>, 1...10, but it can be an Array, or any other Sequence), run it through your closure to produce some result, and put the result in the final array.
Internally, map(_:) works very similarly to the code you wrote above. It creates a mutable array, reserves enough capacity for all the elements, and runs a for loop which repeatedly appends to that array. The nice thing is that it hides all this logic from you, so all you have to see is this really simple statement:
let products = (1...10).map { $0 * 6 }
Closures are explained simply, and in great detail, in the Swift Programming Language Guide. I suggest you take a look when you have a chance!

Resources