Pointer in arrays. How does it work "physically" in memory? - arrays

I have been wondering about pointers and can't find a source explaining them with details.
For example. Given an array int a[3]
There is a pointer pointing at 4 locations?
It starts as *[a+0] and points at address of a?
Then what does it do next? An int is at least 16 bits, so it needs to read at least 2 bytes, but every byte has its own address.
Does it mean that for a[0] the pointer points at the beginning address, then the program reads sizeof(int) bytes starting at the given address?
What would it do next? Would it stop reading, give the result, and then for a[1] point at the address &a + 1*sizeof(int)?
Would it then start reading at (&a + 2) (2 being the number of bytes already read), read another 2 bytes, and so on?
I can't quite understand these concepts.
PS: String consist of unsigned char which are 1 byte elements.
The post you mentioned doesn't explain what happens with elements larger than 1 byte. It also doesn't explain exactly what the program does beside "here is a string the program reads from memory". I assume that I am right, but nonetheless the title you mentioned is far away from what I asked about.
(since somebody wrote this already, one address stands for one byte)
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|----|----|----|----|----|----| | | | | | | | | | |
0----+----01---+----12---+----2----+----+----+----+----+----+----+----+----+----+
I specifically asked if
int a[2] means that the pointer first:
Points at memory address 54, and the program reads data from the 2 addresses starting there (54 to 55, as int takes 2 bytes); then the pointer points at address 54+2 and the program reads the address range <56,57>. Then the pointer points at 58 and the program reads <58,59>.
Is this logic correct? It isn't a string terminated by NUL.
My guess about strings is that the program accesses memory one byte address at a time and reads the values until it finds NUL.
Arrays aren't strings.

Consider
int a[3] = {};
int b[300] = {};
These 2 arrays are "similar" in that they both contain int values, and they differ in two major regards:
They are of different "size", that is, a different amount of memory is reserved for each. The first array occupies memory reserved to hold 3 int values. (In this case the arrays are on the stack, so that is most likely the exact amount of memory allocated.)
They reside at different addresses in memory (again, in this case they are both allocated on the stack, but that is still RAM).
You can just as easily take an address of the first element of either array:
int * p = a;
p = &a[0]; // same as above
p = b; // now p points to the first element of the second array
When you perform an indexing operation, the compiler takes the address of the first element and advances it by the index times the size of each element (sizeof(int) here; array elements are contiguous, with no padding between them). In essence the compiler is doing this:
b[1] = 1;
*(p+1) = 1; // same as above
uint8_t * c = reinterpret_cast<uint8_t*>(p); // WARNING! Explanation follows
The last line will cause the compiler to reinterpret the pointer differently and "the same" address arithmetic "suddenly" works differently:
c[1] = 1; // this is NOT the same as b[1] = 1
In this case the compiler will only "move" the pointer 8-bits (not 16 or 32 bits, depending on your platform's sizeof(int)) and end up in the middle of that first int element of array b. Granted this can be useful (especially when dealing directly with hardware) but is super-duper-puper non-portable and you should avoid doing so at all times!
This is admittedly not a comprehensive answer but I was not aiming to provide one as the topic is very vast and there are plenty of resources on the Web that can provide you with many more details on this subject

Related

C pointer cast - value truncation

The Question
Here's the code that casts a pointer to 16-bit value into a pointer to 32-bit value:
int low_level_read(uint32_t * read_data)
{
// some low level access to get 32-bit read here
}
int i2c_read(uint16_t * data)
{
low_level_read((uint32_t *) data);
printf("data=0x%X\n", *data);
}
Expected:
low_level_read obtains: 0x0000C101
i2c_read obtains: 0xC101
Observed:
low_level_read obtains: 0x0000C101
i2c_read obtains: 0x0000
Why does it seem like it's truncating/cutting-off the least significant 16 bits?
My Solution to this problem
If the i2c_read() is modified to look like below, then this works as expected:
int i2c_read(uint16_t * data)
{
uint32_t raw_data;
low_level_read(&raw_data);
*data = (uint16_t) raw_data;
}
That's fine but I would still like to understand why the first piece of code is acting like that.
My Educated Guess as to Why
When we pass in the pointer to i2c_read(), it was meant for 8-bits:
pointer address 0x100 ->
+--+--+--+--+--+--+--+--+
7 6 5 4 3 2 1 0
+--+--+--+--+--+--+--+--+
However when cast to (uint32_t *), it "grows" the size of what the memory location could hold to 32-bits:
pointer address 0x100 ->
+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+
31 30 29 28 27 26 25 24 .... 7 6 5 4 3 2 1 0
+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+
The 32-bit value is put into that location.
But when it truncates it, it actually truncates bits [15:0] and leaves the pointer address as 0x100. This means when it returns, I end up seeing what was bits [31:16] thus all zeros.
That's my best guess.
Can someone explain this? Thanks :).
i2c_read() is receiving argument uint16_t *data which says "here is an address to 2 bytes of memory":
data --> [ByteA][ByteB]
When you perform the cast (uint32_t*)data, you're now claiming that data is an address to 4 bytes of memory:
data --> [ByteA][ByteB][ByteC][ByteD]
In good faith, low_level_read() uses the address that you passed as uint32_t* and populates all 4 bytes of memory as [00][00][C1][01] (the byte order your output implies, i.e. a big-endian target). This is bad. Pointer data has no rights to [ByteC][ByteD] and now you've overwritten memory that may have held some important data for some other part of your program.
Back in i2c_read() at the printf(), variable data goes back to being just a uint16_t* and *data reads just [ByteA][ByteB] as the value to print ([00][00]).
If you instead called printf("%08X", *(uint32_t*)data), 4 bytes would be read and 0000C101 would print.
To fix your code, ensure that the argument types of i2c_read() and low_level_read() are the same.

Unexpected memory allocation in stack segment

I am trying to verify that, for a given function, memory on the stack is allocated contiguously. So I wrote the code below and got the output below.
For the int variables the memory addresses come out as expected, but not for the character array. After memory address 0xbff1599c I was expecting the next address to be 0xbff159a0 and not 0xbff159a3. Also, since char is 1 byte and I am using 4 bytes, after 0xbff159a3 I was expecting 0xbff159a7 and not 0xbff159a8.
All memory locations come out as expected if I remove the char part, but I am not able to get the expected memory locations with the character array.
My base assumption is that memory on the stack will always be contiguous. I hope that is not wrong.
#include <stdio.h>
int main(void)
{
int x = 10;
printf("Value of x is %d\n", x);
printf("Address of x is %p\n", &x);
printf("Dereferencing address of x gives %d\n", *(&x));
printf("\n");
int y = 20;
printf("Value of y is %d\n", y);
printf("Address of y is %p\n", &y);
printf("Dereferencing address of y gives %d\n", *(&y));
printf("\n");
char str[] = "abcd";
printf("Value of str is %s\n", str);
printf("Address of str is %p\n", &str);
printf("Dereferencing address of str gives %s\n", *(&str));
printf("\n");
int z = 30;
printf("Value of z is %d\n", z);
printf("Address of z is %p\n", &z);
printf("Dereferencing address of z gives %d\n", *(&z));
}
Output:
Value of x is 10
Address of x is 0xbff159ac
Dereferencing address of x gives 10
Value of y is 20
Address of y is 0xbff159a8
Dereferencing address of y gives 20
Value of str is abcd
Address of str is 0xbff159a3
Dereferencing address of str gives abcd
Value of z is 30
Address of z is 0xbff1599c
Dereferencing address of z gives 30
Also, since char is 1 byte and I am using 4 bytes, so after 0xbff159a3 I was expecting 0xbff159a7 and not 0xbff159a8
char takes up 1 byte, but str is a string and you did not count the '\0' at its end; thus char str[]="abcd" takes up 5 bytes.
I think this could be because the addresses are aligned to boundaries (e.g. an 8-byte boundary).
On some OSes, allocations are always aligned to boundaries and made in chunks. You can check this using a structure. For example,
struct A
{
char a;
char b;
int c;
};
On typical UNIX/Linux platforms the size of this struct is not 6 bytes (it is usually 8, because c is aligned to a 4-byte boundary), though the exact layout may vary between platforms.
The same applies to other data types.
Moreover, a string just points to an address allocated on the heap if malloc is used, and the allocation logic might vary from OS to OS. The following is the output from a Linux box for the same program.
Value of x is 10
Address of x is 0x7ffffa43a50c
Dereferencing address of x gives 10
Value of y is 20
Address of y is 0x7ffffa43a508
Dereferencing address of y gives 20
Value of str is abcd
Address of str is 0x7ffffa43a500
Dereferencing address of str gives abcd
Value of z is 30
Address of z is 0x7ffffa43a4fc
Dereferencing address of z gives 30
Both answers from @ameyCU and @Umamahesh were good, but neither was self-sufficient, so I am writing my own answer and adding more information so that future visitors can get the most out of it.
I got that result because of concept called as Data structure alignment. As per this, computer will always try to allocate memory (whether in heap segment or stack segment or data segment, in my case it was stack segment) in chunks in such a way that it can read and write quickly.
When a modern computer reads from or writes to a memory address, it will do this in word sized chunks (e.g. 4 byte chunks on a 32-bit system) or larger. Data alignment means putting the data at a memory address equal to some multiple of the word size, which increases the system's performance due to the way the CPU handles memory.
On a 32-bit architecture, the computer's word size is 4 bytes, so the computer will try to allocate memory at addresses that are multiples of 4, so that it can quickly read and write in blocks of 4 bytes. When fewer bytes are needed, the computer pads with some unused bytes either at the start or at the end.
In my case, suppose I use char str[] = "abc"; then, including the terminating null character '\0', I need 4 bytes, so there will be no padding. But when I use char str[] = "abcd"; then, including the terminating '\0', I need 5 bytes. The computer wants to allocate in blocks of 4, so it adds 3 bytes of padding (either at the start or at the end), and hence the char array spans 8 bytes in memory.
Since the memory requirement of int and long is already a multiple of 4, there is no issue; it gets tricky with char or short, whose sizes are not multiples of 4. This explains what I reported: "All memory locations comes as expected if I remove char part but I am not able to get expected memory locations with character array."
The rule of thumb is that if your memory requirement is not a multiple of 4 (for example, 1 short, or a char array of size 2), extra padding is added before the allocation happens, so that the computer can read and write quickly.
Below is nice excerpt from this answer which explains data structure alignment.
Suppose that you have the structure.
struct S {
short a;
int b;
char c, d;
};
Without alignment, it would be laid out in memory like this (assuming a 32-bit architecture):
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|  bytes
|       |       |  words
The problem is that on some CPU architectures, the instruction to load a 4-byte integer from memory only works on word boundaries. So your program would have to fetch each half of b with separate instructions.
But if the memory was laid out as:
 0 1 2 3 4 5 6 7 8 9 A B
|a|a| | |b|b|b|b|c|d| | |
|       |       |       |
Then access to b becomes straightforward. (The disadvantage is that more memory is required, because of the padding bytes.)

Which of the following is the correct output for the program given below?

Assume the machine is 32-bit little-endian and sizeof(int) is 4 bytes.
Given the following program:
line1: #include<stdio.h>
line2: int main(void) {
line3: int arr[3]={2,3,4};
line4: char *p;
line5: p=(char*)arr;
line6: printf("%d",*p);
line7: p=p+1;
line8: printf("%d\n",*p);
line9: return 0;
}
What is the expected output?
A: 2 3
B: 2 0
C: 1 0
D: garbage value
One thing that is bothering me is the cast of the integer pointer to a character pointer.
How important is the cast?
What is the compiler doing at line 5? (p = (char *) arr;)
What is happening at line 7? (p = p + 1)
If the output is 20, then how is the 0 being printed?
(E) none of the above
However, provided that (a) you are on a little-endian machine (e.g. x86), and (b) sizeof(int) >= 2, this should print "20" (no space is printed between the two).
a) the casting is "necessary" to read the array one byte at a time instead of as a series of ints
b) this is just coercing the address of the first int into a pointer to char
c) increment the address stored in p by sizeof(char) (which is 1)
d) the second byte of the machine representation of the int is printed by line 8
(D), or compiler specific, as sizeof(int) (as well as endianness) is platform-dependent.
How important the casting is?
Casting, as a whole is an integral (pun unintended) part of the C language.
and what would the compiler do in line number 5?
It takes the address of the first element of arr and puts it in p.
and after line number 5 whats going on line number7?
It increments the pointer so it points to the next char from that memory address.
and if the output is 2 0 then how the 0 is being printed by the compiler?
This is a combination of endianness and sizeof(int). Without the specs of your machine, there isn't much else I can do to explain.
However, assuming little endian and sizeof(int) == 4, we can see the following:
// lets mark these memory regions: |A|B|C|D|
int i = 2; // stored in memory as bytes 02 00 00 00 (A B C D)
char *ptr = (char *) &i; // now ptr points to 0x02 (A)
printf("%d\n", *ptr); // prints '2', because ptr points to 0x02 (A)
ptr++; // increment ptr, ptr now points to 0x00 (B)
printf("%d\n", *ptr); // prints '0', because ptr points to 0x00 (B)
1. Importance of the cast:
char *p;
This line declares a pointer to char. Dereferencing it reads exactly one byte, and pointer arithmetic on it moves one byte at a time.
p=(char*)arr;
2. The cast to char * tells the compiler that the conversion from int * is intentional. Without it, most compilers emit a diagnostic for the incompatible pointer types but generate the same code.
Because p is a pointer to char, as I wrote above, p=p+1 makes it point to the next byte.
printf("%d\n",*p);
%d formats the value as a decimal integer. *p dereferences only one byte, so the memory layout now matters: does your machine store the least significant byte first (little-endian) or the most significant byte first (big-endian)?
Given your answer, your machine is little-endian, so the first value printed is 2.
The next byte is zero, so the second value printed is 0.
In hexadecimal:
2 is written as 00-00-00-02 (byte by byte),
but a little-endian machine stores it in memory as
02-00-00-00 (four bytes):
the first byte in memory holds 02
and the second byte in memory holds 00.

Understand the following line

I read this code in a library which is used to display a bitmap (.bmp) to an LCD.
I am having a really hard time understanding what is happening in the following lines, and how it happens.
Maybe someone can explain this to me.
uint16_t s, w, h;
uint8_t* buffer; // does get malloc'd
s = *((uint16_t*)&buffer[0]);
w = *((uint16_t*)&buffer[18]);
h = *((uint16_t*)&buffer[22]);
I guess it's not that hard for a real C programmer, but I am still learning, so I thought I just ask :)
As far as I understand this, it somehow sticks two uint8_t variables together into a uint16_t.
Thanks in advance for your help here!
In the code you've provided, buffer (which is an array of bytes) is read, and values are extracted into s, w and h.
The (uint16_t*)&buffer[n] syntax means that you're taking the address of the nth byte of buffer and casting it to a uint16_t*. The cast tells the compiler to treat this address as if it points at a uint16_t, i.e. a pair of uint8_ts.
The additional * in the code dereferences the pointer, i.e. extracts the value from this address. Since the address now points at a uint16_t, a uint16_t value is extracted.
As a result:
s gets the value of the first uint16_t, i.e. bytes 0 and 1.
w gets the value of the tenth uint16_t, i.e. bytes 18 and 19.
h gets the value of the twelfth uint16_t, i.e. bytes 22 and 23.
The code:
takes two bytes at positions 0 and 1 in the buffer, sticks them together into an unsigned 16-bit value, and stores the result in s;
it does the same with bytes 18/19, storing the result in w;
ditto for bytes 22/23 and h.
It is worth noting that the code uses the native endianness of the target platform to decide which of the two bytes represents the top 8 bits of the result, and which represents the bottom 8 bits.
uint8_t* buffer; // pointer to 8 bit or simply one byte
Buffer points to memory address of bytes -> |byte0|byte1|byte2|....
(uint16_t*)&buffer[0] // &buffer[0] is actually the same as buffer
(uint16_t*)&buffer[0] equals (uint16_t*)buffer; it points to 16 bit or halfword
(uint16_t*)buffer points to memory: |byte0byte1 = halfword0|byte2byte3 = halfword1|....
w = *((uint16_t*)&buffer[18]);
Takes the memory address of byte 18 in buffer, reinterprets it as the address of a halfword, then fetches the halfword at this address;
it's simply w = byte18 and byte19 stuck together, forming a halfword
h = *((uint16_t*)&buffer[22]);
h = byte22 and byte23 stuck together
UPD More detailed explanation:
h = *((uint16_t*)&buffer[22]) =>
1) buffer[22] === the uint8_t (a.k.a. byte) at index 22 of buffer; let's call it byte22
2) &buffer[22] === &byte22 === address of byte22 in memory; it's of type uint8_t*, the same as buffer; let's call it byte22_address;
3) (uint16_t*)&buffer[22] = (uint16_t*)byte22_address; casts the address of a byte to the address of a halfword (two bytes stuck together) at the same address; let's call it halfword11_address;
4) h = *((uint16_t*)&buffer[22]) === *halfword11_address; the * operator takes the value at the address, that is, halfword 11, or bytes 22 and 23 stuck together;

Pointer Dereferencing = Program Crash

unsigned int *pMessageLength, MessageLength;
char *pszParsePos;
...
//DATA into pszParsePos
...
printf("\nMessage Length\nb1: %d\nb2: %d\nb3: %d\nb4: %d\n",
pszParsePos[1],pszParsePos[2],pszParsePos[3],pszParsePos[4]);
pMessageLength= (unsigned int *)&pszParsePos[1];
MessageLength = *((unsigned int *)&pszParsePos[1]);
//Program Dies
Output:
Message Length
b1: 0
b2: 0
b3: 0
b4: 1
I don't understand why this is crashing my program. Could someone explain it, or at least suggest an alternative method that won't crash?
Thanks for your time!
A bus error means that you're trying to access data with incorrect alignment. Specifically, it seems the processor requires an int to be aligned on a specific boundary rather than at any address; if pszParsePos is aligned on an int boundary (which depends on how you initialize it, but will happen, e.g., if you use malloc), then &pszParsePos[1] certainly isn't.
One way to fix this would be constructing MessageLength explicitly, i.e., something like
MessageLength = (pszParsePos[1] << 24) | (pszParsePos[2] << 16) | (pszParsePos[3] << 8) | pszParsePos[4]
(or the other way around if it's supposed to be little-endian). If you really want to type-pun, make sure that the pointer you're accessing is properly aligned.
Here's what I think is going wrong:
You added in a comment that you are running on the Blackfin processor. I looked this up on some web sites and they claim that the Blackfin requires what are called aligned accesses. That is, if you are reading or writing a 32-bit value to/from memory, then the physical address must be a multiple of 4 bytes.
Arrays in C are indexed beginning with [0], not [1]. A 4-byte array of char ends with element [3].
In your code, you have a 4-byte array of char which:
You treat as though it began at index 1.
You convert via pointer casts to a DWORD via 32-bit memory fetch.
I suspect your 4-char array is aligned to a 4-byte boundary, but as you are beginning your memory access at position +1 byte, you get a misalignment of data bus error.

Resources