Pointer memory address to binary conversion in C?

I'm trying to implement the fast Fourier transform from first principles. One of the first steps in doing so is to reorder the input data into a specific sequence to which a radix-2 butterfly algorithm can be applied. This specific sequence is achieved through bit reversal of the positions (indices) of the array.
The way I thought to do this: given an array of real sampled data, create a pointer to the first position of the array. Use that pointer to convert the memory address of the first element into a binary number, perform bit reversal on it, convert back to hexadecimal, and set the first position of a new array equal to the dereferenced value at that 'bit reversed' memory address. In a loop I could then increment the original pointer each time, work out the 'reversed' address, and populate the new array with the values in the correct order.
I have two questions:
Is this even good programming practice? I know that setting pointers to specific addresses is frowned upon, but I figure that the arrays are allocated in memory at startup, so it should be okay.
How would I convert a pointer value to a binary value in C? I thought of something like this:
int sampledData[8];
int *pointer = sampledData;     /* was "samples", which is not declared */
int hex_address = (int)pointer; /* note: int may be too narrow to hold a pointer */
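As an aside on the conversion itself: the portable integer type for holding a pointer value is uintptr_t from <stdint.h>, not int, which may be too narrow on 64-bit platforms. A minimal sketch of printing an address bit by bit (assuming 8-bit bytes):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int sampledData[8];
    uintptr_t bits = (uintptr_t)&sampledData[0]; /* pointer -> integer */
    /* print the address one bit at a time, most significant first */
    for (int i = (int)(sizeof bits * 8) - 1; i >= 0; i--)
        putchar(((bits >> i) & 1) ? '1' : '0');
    putchar('\n');
    return 0;
}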

For an FFT, you don't want to bit-reverse an address pointer (which might not be aligned to a suitable boundary for the FFT length); you want to bit-reverse a zero-based array index (sometimes implemented in C as a pointer offset used to access the array).
Very commonly, the permutation is done "in place": elements of the array are swapped (using a temporary variable) between each original index and its bit-reversed index, rather than copied to a new array, which would require more memory (larger data cache footprint, etc.).
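A minimal sketch of that index-based, in-place permutation, assuming the length n is a power of two; the function name and signature are illustrative, not taken from any particular FFT library:

#include <stddef.h>

/* Reorder x[0..n-1] into bit-reversed index order, n == 2^bits. */
void bit_reverse_permute(double *x, size_t n, unsigned bits) {
    for (size_t i = 0; i < n; i++) {
        size_t r = 0;
        for (unsigned b = 0; b < bits; b++)   /* reverse the low `bits` bits of i */
            r |= ((i >> b) & 1u) << (bits - 1 - b);
        if (r > i) {                          /* swap each pair exactly once */
            double tmp = x[i];
            x[i] = x[r];
            x[r] = tmp;
        }
    }
}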


Theta of 1 in big arrays

If I allocated an array of 1,000,000,000 members successfully, how can I access the member at index 999,999,999 in Θ(1)?
Array access is supposed to be Θ(1) for every member. However, isn't there some sort of internal loop that counts through the indices until it reaches the required member? If there is, shouldn't access be Θ(n)?
No, there's no internal loop. Arrays are random access, meaning any element can be accessed in Θ(1) time. All the computer has to do is take the array's starting address, add an offset to the desired element, and look up the value at the computed address.
In practice, you are unlikely to ever have an array with a billion elements. Arrays aren't well suited to such large data sets as they'd be several gigabytes or more in size. More sophisticated data structures and/or algorithms are typically employed. For instance, a naïve program might read a 2GB file into a 2GB byte array, whereas a smarter one would read it in small chunks, say 4KB at a time.
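A hedged sketch of that chunked approach; the helper name and the 4 KB buffer are illustrative choices, not fixed requirements:

#include <stdio.h>

/* Process a file of any size using a constant 4 KB of buffer memory. */
void process_in_chunks(const char *path) {
    unsigned char buf[4096];
    FILE *f = fopen(path, "rb");
    if (!f) return;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        /* ... work on the n bytes currently in buf ... */
    }
    fclose(f);
}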
It is actually Θ(1).
When you declare int arr[100000000], the variable arr refers to the first address of the allocated memory.
When you write arr[n], it is evaluated as *(arr + n): n (scaled by the element size) is added to the starting address, and the element is accessed directly.
Arrays are always stored contiguously, in sequence.
For more info, please read https://www.ics.uci.edu/~dan/class/165/notes/memory.html
Ask in the comments if you need more resources.
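A small runnable demonstration that arr[n] and *(arr + n) are the same single address computation, with no loop over the earlier elements:

#include <stdio.h>

int main(void) {
    int arr[5] = {10, 20, 30, 40, 50};
    printf("%d %d\n", arr[3], *(arr + 3));  /* prints: 40 40 */
    printf("%p\n", (void *)(arr + 3));      /* arr plus 3 * sizeof(int) bytes */
    return 0;
}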

How does a C program get information from an array internally?

I'm relatively new to programming, so when someone suggested that building an array of structs (each containing n attributes of a particular "item") was faster than building n arrays of attributes, I found that I didn't know enough about arrays to argue one way or the other.
I read this:
how do arrays work internally in c/c++
and
Basic Arrays Tutorial
But I still don't really understand how a C program retrieves a particular value from an array by index.
It seems pretty clear that data elements of the array are stored adjacent in memory, and that the array name points to the first element.
Are C programs smart enough to do the arithmetic, based on data type and index, to figure out the exact memory address of the target data, or does the program have to somehow iterate over every intermediary piece of data before it gets there (as in the linked list data structure)?
More fundamentally, if a program asks for a piece of information by its memory address, how does the machine find it?
Let's take a simpler example. Let's say you have an array int test[10] which is stored like this at address 1000:
1|2|3|4|5|6|7|8|9|10
The compiler knows that, for example, an int is 4 bytes. The array access formula is this:
baseaddr + sizeof(type) * index
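You can check that formula against the compiler yourself; this sketch computes the address of test[3] by hand and compares it with what &test[3] yields:

#include <stdio.h>

int main(void) {
    int test[10];
    char *base = (char *)test;                        /* baseaddr */
    int *computed = (int *)(base + sizeof(int) * 3);  /* baseaddr + sizeof(type) * index */
    printf("%d\n", computed == &test[3]);             /* prints 1 */
    return 0;
}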
The size of a struct is just the sum of the sizes of its elements plus any padding added by the compiler. So the size of this struct:
struct test {
int i;
char c;
};
Might be 5. It also might not be, because of padding.
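A quick way to see this for yourself; on a typical platform with 4-byte ints the result is 8, but the exact value is implementation-defined, which is the point:

#include <stdio.h>

struct test {
    int  i;  /* commonly 4 bytes */
    char c;  /* 1 byte */
};

int main(void) {
    printf("%zu\n", sizeof(struct test)); /* often 8, not 5, due to padding */
    return 0;
}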
As for your last question, very briefly (the full story is complicated): the MMU uses the page table to translate the virtual address into a physical address, which is then requested from the memory hierarchy; if it's in a cache it is returned from there, otherwise it is fetched from main memory.
You wrote:
Are C programs smart enough to do the arithmetic based on data-type and index to figure out the exact memory address of the target data
Yes, that is exactly what they do. They do not iterate over intervening items (and doing so would not help, since there are no markers delimiting where each item begins and ends).
So here is the whole trick: array elements are adjacent in memory.
When you declare an array, for example int A[10],
the variable A refers to the first element of the array (in expressions it decays to a pointer to it).
Now comes the indexing part: whenever you write A[i], it is exactly as if you had written *(A + i).
The index is just an offset from the beginning address of the array; also keep in mind that in pointer arithmetic the offset is scaled by the size of the array's element type.
To get a better understanding of this, write a little code: declare an array and print its address, and then the address of each element,
and notice how the step between consecutive addresses is always the same and equal to the size of your array's data type on your machine (a sketch follows).
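For reference, that little experiment might look like this:

#include <stdio.h>

int main(void) {
    int A[10];
    printf("A starts at %p\n", (void *)A);
    for (int i = 0; i < 10; i++)
        printf("&A[%d] = %p\n", i, (void *)&A[i]); /* step == sizeof(int) */
    return 0;
}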

What is the purpose of the byte size of the type of a variable if I know the address of the variable?

I am not getting the whole purpose of working with the byte size of a variable when I already know its address. For example, let's say I know that an int variable is stored at address 0x8C729A09; if I want the int stored at that address, I can just dereference the address and get the number stored in it.
So, what exactly is the purpose of knowing the byte size of the variable? Why does it matter whether the variable has 4 bytes (being an int) or 8 bytes, if I am able to get its value by just dereferencing the address? I am asking because I am working on dereferencing some addresses, and I thought I needed a for loop to read the variable (knowing the start address, which is the address of the variable, and its size in bytes), but whenever I do this I am just getting other variables that are also declared.
A little bit of context: I am working on a tool called Pin and getting the addresses of the global variables declared in another program.
The for case looks something like this:
char *address1, *limit;
for (address1 = (char *)0x804A03C, limit = address1 + bytesize; address1 < limit; address1++)
    cout << *address1 << "\n";
Michael Krelin gave a very compact answer but I think I can expand on it a bit more.
In any language, not just C, you need to know the size for a variety of reasons:
This determines the maximum value that can be stored
The memory space an array of those values will take (1000 bytes will get you 250 four-byte ints or 125 eight-byte longs).
When you want to copy one array of values into another, you need to know how many bytes are used so you can allocate enough space.
While you may dereference a pointer and get the value, you could also dereference just a portion of that value, but only if you know how many bytes it is composed of: you could get the high half of an int by grabbing two of its bytes and the low half from the other two, with which pair is which depending on the machine's endianness (a sketch follows this list).
Different architectures may have different sizes for different variables, which would impact all the above points.
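A sketch of the partial-dereference idea mentioned above; note that which bytes hold the "high" part depends on the machine's endianness:

#include <stdio.h>

int main(void) {
    int value = 0x12345678;
    unsigned char *bytes = (unsigned char *)&value;  /* view the int byte by byte */
    for (size_t i = 0; i < sizeof value; i++)
        printf("byte %zu: 0x%02x\n", i, bytes[i]);   /* 78 56 34 12 on little-endian */
    return 0;
}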
Edit:
Also, there are certainly cases where you need to know the number of bits that a given variable is made of. If you want 32 booleans, what better variable to use than a single int, which is made of 32 bits? Then you can use some constant masks to address each bit, and now you have an "array" of booleans. The related language feature is usually called bit-fields (correct me if I am wrong). In programming, every detail can matter, just not all the time for every application. Just figured that might be an interesting thought exercise.
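A minimal sketch of the shift-and-mask version of this idea:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t flags = 0;                     /* 32 "booleans" in one value */
    flags |= 1u << 5;                       /* set boolean number 5 */
    int was_set = (int)((flags >> 5) & 1u); /* test it: 1 */
    flags &= ~(1u << 5);                    /* clear it again */
    printf("%d %d\n", was_set, (int)((flags >> 5) & 1u)); /* prints: 1 0 */
    return 0;
}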
The answer is simple: the internal representation of most types needs more than one byte. In order to dereference a pointer, you (either you or the compiler) need to know how many bytes should be read.
Also consider this when working with strings: you cannot always rely on the terminating \0, hence you need to know how many bytes to read. Examples of this are functions like memcpy or strncmp.
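For instance, both of these standard functions take an explicit byte count precisely because a terminator cannot be relied on (the wrapper names here are hypothetical):

#include <string.h>

void copy_block(char *dst, const char *src, size_t nbytes) {
    memcpy(dst, src, nbytes);          /* copies exactly nbytes, \0 or not */
}

int same_prefix(const char *a, const char *b, size_t nbytes) {
    return strncmp(a, b, nbytes) == 0; /* compares at most nbytes */
}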
Suppose you have an array of variables. How do you find the variable at a non-zero index without knowing the element size? And how many bytes do you allocate for a non-zero-length array of variables?

How are arrays and hash maps constant time in their access?

Specifically: given a hash (or an array index), how does the machine get to the data in constant time?
It seems to me that even passing by all the other memory locations (or whatever) would take an amount of time equal to the number of locations passed (so linear time). A coworker has tried valiantly to explain this to me but had to give up when we got down to circuits.
Example:
my_array = new array(:size => 20)
my_array[20] = "foo"
my_array[20] # "foo"
Access of "foo" in position 20 is constant because we know which bucket "foo" is in. How did we magically get to that bucket without passing all the others on the way? To get to house #20 on a block you would still have to pass by the other 19...
How did we magically get to that bucket without passing all the others on the way?
"We" don't "go" to the bucket at all. The way RAM physically works is more like broadcasting the bucket's number on a channel on which all buckets listen, and the one whose number was called will send you its contents.
Calculations happen in the CPU. In theory, the CPU is the same "distance" from all memory locations (in practice it's not, because of caching, which can have a huge impact on performance).
If you want the gritty details, read "What every programmer should know about memory".
To understand this, you have to look at how memory is organized and accessed. You may also want to look at how an address decoder works. The point is that you do NOT have to pass by all the other addresses to get to the one you want in memory; you can jump straight to it. Otherwise our computers would be really, really slow.
Unlike a Turing machine, which would have to access memory sequentially, computers use random access memory (RAM), which means that if they know where the array starts and they want to access the 20th element, they know exactly which part of memory to look at.
It is less like driving down a street and more like picking the correct mail slot for your apartment in a shared mailbox.
Two things are important:
my_array holds the information about where in memory the computer must jump to reach this array.
index * sizeof(type) gives the offset from the beginning of the array.
Adding the two is itself O(1), and yields exactly where the data can be found.
Big O doesn't work like that. It's a measure of the computational resources used by a particular algorithm or function, not of the amount of memory used, and if you are talking about traversing that memory, the traversal is still constant time. If I need to find the second slot of an array, it's a matter of adding an offset to a pointer. Now, if I have a tree structure and I want to find a particular node, we are talking about O(log n), because it isn't found on the first pass; on average it takes O(log n) to find that node.
Let's discuss this in C/C++ terms; there's some additional stuff to know about C# arrays but it's not really relevant to the point.
Given an array of 16-bit integer values:
short myArray[5] = {1, 2, 3, 4, 5};
What's really happened is that the computer has allocated a block of space in memory. This memory block is reserved for that array, is exactly the size needed to hold the full array (in our case 16*5 == 80 bits == 10 bytes), and is contiguous. These facts are givens; if any of them does not hold in your environment at any given time, your program is generally at risk of crashing due to an access violation.
So, given this structure, what the variable myArray really is, behind the scenes, is the memory address of the start of the block of memory. This is also, conveniently, the start of the first element. Each additional element is lined up in memory right after the first, in order. The memory block allocated for myArray might look like this:
0000000000000001 0000000000000010 0000000000000011 0000000000000100 0000000000000101
   myArray[0]       myArray[1]       myArray[2]       myArray[3]       myArray[4]
It is considered a constant-time operation to access a memory address and read a constant number of bytes. As in the above figure, you can get the memory address for each one if you know three things: the start of the memory block, the memory size of each element, and the index of the element you want. So, when you ask for myArray[3] in your code, that request is turned into a memory address by the following equation:
&myArray[3] == (short *)((char *)myArray + sizeof(short) * 3);
Thus, with a constant-time calculation, you have found the memory address of the fourth element (index 3), and with another constant-time operation (or at least considered so; actual access complexity is a hardware detail, and fast enough that you shouldn't care) you can read that memory. This is, if you've ever wondered, why indexes of collections in most C-style languages start at zero: the first element of the array starts at the location of the array itself, with no offset (sizeof(anything) * 0 == 0).
In C#, there are two notable differences. C# arrays have some header information that is of use to the CLR. The header comes first in the memory block, and the size of this header is constant and known, so the addressing equation has just one key difference:
&myArray[3] == (short *)((char *)myArray + headerSize + sizeof(short) * 3);
C# doesn't allow you to directly reference memory in its managed environment, but the runtime itself will use something like this to perform memory access off the heap.
The second thing, which is common to most flavors of C/C++ as well, is that certain types are always dealt with "by reference". Anything you have to use the new keyword to create is a reference type (and there are some objects, like strings, that are also reference types although they look like value types in code). A reference type, when instantiated, is placed in memory, doesn't move, and is usually not copied.

Any variable that represents that object is thus, behind the scenes, just the memory address of the object in memory. Arrays are reference types (remember, myArray was just a memory address). Arrays of reference types are arrays of these memory addresses, so accessing an object that is an element in an array is a two-step process: first you calculate the memory address of the element in the array and fetch that; that is another memory address, which is the location of the actual object (or at least its mutable data; how compound types are structured in memory is a whole other can o' worms). This is still a constant-time operation; just two steps instead of one.
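A C sketch of that two-step access: an array of pointers stores addresses, so reading an element's object is one index calculation plus one extra dereference (the type and names are illustrative, and error handling is omitted):

#include <stdio.h>
#include <stdlib.h>

typedef struct { int data; } Obj;

int main(void) {
    Obj *objects[4];                      /* an array of memory addresses */
    for (int i = 0; i < 4; i++) {
        objects[i] = malloc(sizeof(Obj)); /* the objects live elsewhere */
        objects[i]->data = i * 10;
    }
    /* step 1: compute &objects[2]; step 2: follow the stored address */
    printf("%d\n", objects[2]->data);     /* prints 20 */
    for (int i = 0; i < 4; i++)
        free(objects[i]);
    return 0;
}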

What is a "value" array?

In C, the idea of an array is very straightforward: simply a pointer to the first element in a row of elements in memory, which can be accessed via pointer arithmetic or the standard array[i] syntax.
However, in languages like Google Go, "arrays are values", not pointers. What does that mean? How is it implemented?
In most cases they're the same as C arrays, but the compiler/interpreter hides the pointer from you. This is mainly so that the array can be relocated in memory in a totally transparent way, which is why such arrays appear to be resizable.
On the other hand it is safer, because without the ability to manipulate the pointers you cannot create a memory leak.
Since then (2010), the article Slices: usage and internals is a bit more precise:
The in-memory representation of [4]int is just four integer values laid out sequentially in memory.
Go's arrays are values.
An array variable denotes the entire array; it is not a pointer to the first array element (as would be the case in C).
This means that when you assign or pass around an array value you will make a copy of its contents. (To avoid the copy you could pass a pointer to the array, but then that's a pointer to an array, not an array.)
One way to think about arrays is as a sort of struct but with indexed rather than named fields: a fixed-size composite value.
Arrays in Go are also values in the sense that they are passed by value to functions (the same way ints, strings, floats, etc. are),
which requires copying the whole array for each function call.
This can be very slow for a large array, which is why in most cases it's usually better to use slices. A rough C analogy is sketched below.
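Using the struct-with-indexed-fields picture from the quote above (the type name here is illustrative): passing a struct by value in C copies all of its elements, just as passing a Go array does:

#include <stdio.h>

typedef struct { int elems[4]; } IntArray4; /* a fixed-size composite value */

void modify(IntArray4 a) { /* receives a copy of all four ints */
    a.elems[0] = 999;
}

int main(void) {
    IntArray4 arr = {{1, 2, 3, 4}};
    modify(arr);
    printf("%d\n", arr.elems[0]); /* prints 1: the caller's copy is untouched */
    return 0;
}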
