Pointer Dereferencing = Program Crash - c

unsigned int *pMessageLength, MessageLength;
char *pszParsePos;
...
//DATA into pszParsePos
...
printf("\nMessage Length\nb1: %d\nb2: %d\nb3: %d\nb4: %d\n",
pszParsePos[1],pszParsePos[2],pszParsePos[3],pszParsePos[4]);
pMessageLength= (unsigned int *)&pszParsePos[1];
MessageLength = *((unsigned int *)&pszParsePos[1]);
//Program Dies
Output:
Message Length
b1: 0
b2: 0
b3: 0
b4: 1
I don't understand why this is crashing my program. Could someone explain it, or at least suggest an alternative method that won't crash?
Thanks for your time!

A bus error means that you're trying to access data with incorrect alignment. Specifically, it seems the processor requires an int to be aligned on an int boundary rather than placed just anywhere, and if pszParsePos itself is aligned on an int boundary (which depends on how you initialize it, but will happen, e.g., if you use malloc), then &pszParsePos[1] certainly isn't.
One way to fix this would be constructing MessageLength explicitly, i.e., something like
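MessageLength = ((unsigned int)(unsigned char)pszParsePos[1] << 24) | ((unsigned int)(unsigned char)pszParsePos[2] << 16)
              | ((unsigned int)(unsigned char)pszParsePos[3] << 8) | (unsigned char)pszParsePos[4]; // casts avoid sign extension if char is signed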
(or the other way around if it's supposed to be little-endian). If you really want to type-pun, make sure that the pointer you're accessing is properly aligned.
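The two alignment-safe options can be sketched as small helpers. This is only a sketch: read_be32 and read_native32 are hypothetical names that do not appear in the question, and the big-endian byte order is assumed from the printed output (0, 0, 0, 1).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a big-endian 32-bit value from four bytes.
   Going through unsigned char avoids sign extension when char is signed.
   Only single-byte loads are performed, so alignment does not matter. */
static uint32_t read_be32(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return ((uint32_t)u[0] << 24) | ((uint32_t)u[1] << 16) |
           ((uint32_t)u[2] << 8)  |  (uint32_t)u[3];
}

/* Hypothetical helper: copy four bytes into a properly aligned variable.
   The result is in host byte order, so it equals read_be32 only on a
   big-endian machine. */
static uint32_t read_native32(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With the buffer from the question, MessageLength = read_be32(&pszParsePos[1]); would give 1 for the bytes 0, 0, 0, 1, regardless of where the buffer starts.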

Here's what I think is going wrong:
You added in a comment that you are running on the Blackfin processor. I looked this up on some web sites, and they claim that the Blackfin requires what are called aligned accesses. That is, if you are reading or writing a 32-bit value to/from memory, then the address must be a multiple of 4 bytes.
Arrays in C are indexed beginning with [0], not [1]. A 4-byte array of char ends with element [3].
In your code, you have a 4-byte array of char which:
You treat as though it began at index 1.
You convert via pointer casts to a DWORD via 32-bit memory fetch.
I suspect your 4-char array is aligned to a 4-byte boundary, but as you are beginning your memory access at position +1 byte, you get a misalignment of data bus error.

Related

Why does a reference to an int return only one memory address?

Example Program:
#include <stdio.h>
int main() {
int x = 0;
printf("%p", &x);
return 0;
}
I have read that most machines are byte-addressable, meaning that only one byte can be stored at a single memory address (e.g. 0xf4829cba stores the value 01101011). Assuming that x is a 32-bit integer, shouldn't the reference to the variable return four memory addresses, instead of one?
Please ELI5, as I am very confused right now.
Thank you so much for your time.
-Matt
The address (it's not a "reference") you're given is to the beginning of the memory where the variable is stored. The variable will then take as many bytes as needed according to its type. So if int is 32 bits in your target architecture, the address you get is of the first of four bytes used to store that int.
           +--------+
address--->| byte 0 |
           | byte 1 |
           | byte 2 |
           | byte 3 |
           +--------+
It may help to think in terms of objects[1] rather than bytes. Most useful data types in C take up more than a single byte.
As for an expression like &x evaluating to multiple addresses, think of it like the address of your house - you don't specify a distinct address for every room in the house, do you? No, for the purpose of telling other people where your house is, you only need to specify one address. For the purpose of knowing where an int or double or struct humongous object is, we only need to know the address of the first byte.
You can access and manipulate individual bytes in a larger object in several different ways. You can use bit masking operations like
int x = some_value;
unsigned char aByte = (x & 0xFF000000) >> 24; // isolate the MSB
or you can map the object onto an array of unsigned char using a union:
union {
int x;
unsigned char b[sizeof (int)];
} u;
u.x = some_value;
aByte = u.b[0]; // access the initial byte - depending on byte ordering, this
// may be the MSB or the LSB.
or by creating a pointer to the first byte:
int x = some_value;
unsigned char *b = (unsigned char *) &x;
unsigned char aByte = b[0];
Byte ordering is a thing - some architectures store multi-byte values starting at the most significant byte, others starting at the least significant byte:
For any address A

                  A+0 A+1 A+2 A+3
Big endian       +---+---+---+---+
                 |MSB|   |   |LSB|
                 +---+---+---+---+   Little endian
                  A+3 A+2 A+1 A+0
The M68K chips that powered the original Macintosh were big-endian, while x86 is little-endian.
Bitwise operators like & and | work on values rather than on the in-memory layout, so byte ordering doesn't affect them - x & 0xFF000000 will always isolate the MSB[2]. When you map an object onto an array of unsigned char, the first element may map to the MSB, or it may map to the LSB, or it may map to something else (the old VAX architecture used a "middle-endian" ordering for 32-bit floats that either went 2301 or 1032, can't remember which offhand).
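The difference between value-based and layout-based access can be sketched like this. The helper names are hypothetical, and a 32-bit value with 8-bit bytes is assumed:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the lowest-addressed byte of a
   multi-byte value holds its least significant byte (little-endian host). */
static int is_little_endian(void)
{
    uint32_t probe = 1;
    const unsigned char *bytes = (const unsigned char *)&probe;
    return bytes[0] == 1;
}

/* Value-based: a shift always yields the most significant byte,
   whatever the byte order in memory. */
static unsigned char msb_of(uint32_t x)
{
    return (unsigned char)(x >> 24);
}

/* Layout-based: the byte stored at the lowest address, which is the
   MSB on a big-endian host and the LSB on a little-endian host. */
static unsigned char first_byte_of(uint32_t x)
{
    const unsigned char *bytes = (const unsigned char *)&x;
    return bytes[0];
}
```

For 0x12345678, msb_of gives 0x12 everywhere, while first_byte_of gives 0x78 on x86 and 0x12 on a big-endian machine such as the M68K.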
[1] In the C sense of a region of storage that may be used to hold a value, not the OOP sense of an instance of a class.
[2] Assuming 32-bit int and 8-bit bytes, anyway.

Pointer in arrays. How does it work "physically" in memory?

I have been wondering about pointers and can't find a source explaining them with details.
For example. Given an array int a[3]
There is a pointer pointing at 4 locations?
It starts as *[a+0] and points at address of a?
Then what does it do next? An int is a minimum of 16 bits, so it needs to read 2 bytes, but every byte is given an address.
Does it mean that for a[0] the pointer points at the beginning address, then the program reads sizeof(int) bytes starting at the given address?
What would it do next? Would it stop reading, give the result, and for a[1] point at the address &a + 1*sizeof(int)?
It would start reading at address of (&a+2(as 2 stands for already read addresses of 2 bytes)), start reading, so it would read another 2 bytes and on and on?
I can't quite understand these concepts.
PS: String consist of unsigned char which are 1 byte elements.
The post you mentioned doesn't explain what happens with elements larger than 1 byte. It also doesn't explain exactly what the program does beside "here is a string the program reads from memory". I assume that I am right, but nonetheless the title you mentioned is far away from what I asked about.
(since somebody wrote this already, one address stands for one byte)
  54   55   56   57   58   59   60   61   62   63   64   65   66   67   68   69
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|----|----|----|----|----|----|    |    |    |    |    |    |    |    |    |    |
0----+----01---+----12---+----2----+----+----+----+----+----+----+----+----+----+
I specifically asked if
int a[2] means that the pointer first:
Points at memory address (54), the program reads data from the 2 following addresses (54 to 55, as int takes 2 bytes), then the pointer points at address 54+2, and the program reads from the address range <56,57>. Then again, the pointer points at the start of the next range, 58, and the program reads the addresses <58,59>.
Is this logic correct? It isn't a string ended up with NULL.
My guess to strings is that the program would access memory byte's address by byte's address and read the values till it found NULL.
Arrays aren't strings.
Consider
int a[3] = {};
int b[300] = {};
These 2 arrays are "similar" in that they contain values of int and are different in these two major regards:
They are of different "size" - that is, the amount of storage reserved for each differs. The first array occupies storage for exactly 3 int values, so sizeof a equals 3 * sizeof(int) (in this case the storage is on the stack).
They are located at different addresses in memory (again - in this case they are both allocated on the stack, but that is still RAM).
You can just as easily take an address of the first element of either array:
int * p = a;
p = &a[0]; // same as above
p = b; // now p points to the first element of the second array
When you perform an indexing operation, the compiler takes the address of the first element and advances it by the index times the size of each element (the size of an element already includes any padding it needs for alignment). In essence the compiler is doing this:
b[1] = 1;
*(p+1) = 1; // same as above
uint8_t * c = reinterpret_cast<uint8_t*>(p); // WARNING! Explanation follows
The last line will cause the compiler to reinterpret the pointer differently and "the same" address arithmetic "suddenly" works differently:
c[1] = 1; // this is NOT the same as b[1] = 1
In this case the compiler will only "move" the pointer by one byte (8 bits), not by sizeof(int) bytes, and end up in the middle of the first int element of array b. Granted, this can be useful (especially when dealing directly with hardware), but it is super-duper-puper non-portable and you should avoid doing so at all times!
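The scaling that the compiler applies to pointer arithmetic can be made visible by measuring the distance in bytes. This is a minimal sketch in C; byte_offset_of_element is a hypothetical helper, not something from the question:

```c
#include <stddef.h>

/* Hypothetical helper: distance in bytes between element i and element 0.
   base + i advances by i * sizeof(int) bytes, while arithmetic on the
   unsigned char pointers below advances one byte at a time. */
static size_t byte_offset_of_element(const int *base, size_t i)
{
    const unsigned char *first = (const unsigned char *)base;
    const unsigned char *elem  = (const unsigned char *)(base + i);
    return (size_t)(elem - first);
}
```

For int a[3], byte_offset_of_element(a, 2) returns 2 * sizeof(int), so with the 2-byte int assumed in the question the third element would start 4 bytes after the first.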
This is admittedly not a comprehensive answer but I was not aiming to provide one as the topic is very vast and there are plenty of resources on the Web that can provide you with many more details on this subject

structure memory layout in multithreaded code

The following code is multi-threaded and runs for thread id = 0 and 1 simultaneously.
typedef struct
{
unsigned char pixels[4];
} FourPixels;
main()
{
FourPixels spixels[];
//copy on spixels
spixels[id] = gpixels[id];
//example : remove blue component
spixels[id].pixels[0] &= 0xFC;
spixels[id].pixels[1] &= 0xFC;
spixels[id].pixels[2] &= 0xFC;
spixels[id].pixels[3] &= 0xFC;
}
We see that thread id =0 fetches 4 chars, and the thread id =1 fetches another set of 4 chars.
I want to know how the structures spixels[0] and spixels[1] are laid out in memory. Is it something like this?
spixels[0]                           spixels[1]
pixel[0] pixel[1] pixel[2] pixel[3]  pixel[0] pixel[1] pixel[2] pixel[3]
2000     2001     2002     2003      2004     2005     2006     2007
The question is: are spixels[0] and spixels[1] guaranteed to be placed contiguously, as shown above?
Yes, they will be laid out contiguously as you say. Now, probably someone will come and say that it is not guaranteed on all platforms, because the alignment of the struct could be more than its size, so you could have a gap between the two struct "bodies" due to implicit padding after the first one. But no matter, because the alignment on any sane compiler and platform will be just 1 byte (as in char).
If I were writing code that relied on this, I'd add a compile-time assertion that the size of two of those structs should be exactly 8 bytes, and then I'd be 100% confident.
Edit: here's an example of how a compile-time check might work:
struct check {
    char floor[(int)sizeof(FourPixels[2]) - 8];   // negative if the size is below 8
    char ceiling[8 - (int)sizeof(FourPixels[2])]; // negative if the size is above 8
};
The idea is that if the size is not 8, one of the arrays will have negative size. If it is 8, they'll both have zero size. Note that this is a compiler extension (GCC supports zero-length arrays for example), so you may want to look for a better way. I'm more of a C++ person, and we have fancier tricks for this (in C++11 it's built in: static_assert()).
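In C itself, C11 added _Static_assert, which expresses the same check without relying on zero-length arrays. A minimal sketch, assuming a C11 compiler and the struct from the question (the message text is just an example):

```c
/* Same layout as the struct in the question. */
typedef struct {
    unsigned char pixels[4];
} FourPixels;

/* C11: compilation stops with this message unless two array elements
   occupy exactly 8 bytes, i.e. unless there is no padding between them. */
_Static_assert(sizeof(FourPixels[2]) == 8,
               "FourPixels elements are not tightly packed");
```

If the assertion holds, the file compiles as usual and the check costs nothing at run time.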
An array is guaranteed by the standard to be contiguous. It's also guaranteed that the first entry will be on a low address in memory, and the next will be on a higher, etc.
In the case of the structure's pixels array, pixels[1] will always come directly after pixels[0]. The same goes for the following entries.
Yes, arrays are placed in contiguous memory locations.
This is to allow the pointer arithmetic.

C programming: words from byte array

I have some confusion regarding reading a word from a byte array. The background context is that I'm working on a MIPS simulator written in C for an intro computer architecture class, but while debugging my code I ran into a surprising result that I simply don't understand from a C programming standpoint.
I have a byte array called mem defined as follows:
uint8_t *mem;
//...
mem = calloc(MEM_SIZE, sizeof(uint8_t)); // MEM_SIZE is pre defined as 1024x1024
During some of my testing I manually stored a uint32_t value into four of the blocks of memory at an address called mipsaddr, one byte at a time, as follows:
for(int i = 3; i >=0; i--) {
*(mem+mipsaddr+i) = value;
value = value >> 8;
// in my test, value = 0x1084
}
Finally, I tested trying to read a word from the array in one of two ways. In the first way, I basically tried to read the entire word into a variable at once:
uint32_t foo = *(uint32_t*)(mem+mipsaddr);
printf("foo = 0x%08x\n", foo);
In the second way, I read each byte from each cell manually, and then added them together with bit shifts:
uint8_t test0 = mem[mipsaddr];
uint8_t test1 = mem[mipsaddr+1];
uint8_t test2 = mem[mipsaddr+2];
uint8_t test3 = mem[mipsaddr+3];
uint32_t test4 = (mem[mipsaddr]<<24) + (mem[mipsaddr+1]<<16) +
(mem[mipsaddr+2]<<8) + mem[mipsaddr+3];
printf("test4= 0x%08x\n", test4);
The output of the code above came out as this:
foo= 0x84100000
test4= 0x00001084
The value of test4 is exactly as I expect it to be, but foo seems to have reversed the order of the bytes. Why would this be the case? In the case of foo, I expected the uint32_t* pointer to point to mem[mipsaddr], and since it's 32-bits long, it would just read in all 32 bits in the order they exist in the array (which would be 00001084). Clearly, my understanding isn't correct.
I'm new here, and I did search for the answer to this question but couldn't find it. If it's already been posted, I apologize! But if not, I hope someone can enlighten me here.
It is (among others) explained here: http://en.wikipedia.org/wiki/Endianness
When storing data larger than one byte into memory, the order in which the bytes are stored depends on the architecture (that is, the CPU). Either the most significant byte is stored first and the least significant byte last, or vice versa. When you read back the individual bytes through byte access operations and then merge them to form the original value again, you need to consider the endianness of your particular system.
In your for-loop, you are storing your value byte-wise, starting with the most significant byte (counting down the index is a bit misleading ;-). Your memory looks like this afterwards: 0x00 0x00 0x10 0x84.
You are then reading the word back with a single 32-bit (four-byte) access. Depending on your architecture, this will either become 0x00001084 (big endian) or 0x84100000 (little endian). Since you get the latter, you are working on a little-endian system.
In your second approach, you are using the same order in which you stored the individual bytes (most significant first), so you get back the same value which you stored earlier.
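The effect can be reproduced with a few helpers. This is a sketch only: store_be32, swap32 and load_host32 are hypothetical names, and memcpy is used for the whole-word read instead of the pointer cast from the question so that alignment does not matter.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the loop in the question: the least significant byte goes to
   dst[3], so memory ends up most significant byte first. */
static void store_be32(uint8_t *dst, uint32_t value)
{
    for (int i = 3; i >= 0; i--) {
        dst[i] = (uint8_t)value;
        value >>= 8;
    }
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Read four bytes as one word in host byte order. */
static uint32_t load_host32(const uint8_t *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

After store_be32(mem, 0x1084) the bytes are 0x00 0x00 0x10 0x84. load_host32(mem) then gives 0x00001084 on a big-endian host and swap32(0x1084), which is 0x84100000, on a little-endian one.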
It seems to be a problem of endianness, which shows up when you cast the (uint8_t *) to (uint32_t *).

C: Memcpy vs Shifting: Whats more efficient?

I have a byte array containing 16- and 32-bit data samples, and to convert them to Int16 and Int32 I currently just do a memcpy of 2 (or 4) bytes.
Because memcpy probably isn't optimized for lengths of just two bytes, I was wondering if it would be more efficient to convert the bytes to an Int32 using integer arithmetic (or a union).
I would like to know how the efficiency of calling memcpy compares with bit shifting, because the code runs on an embedded platform.
I would say that memcpy is not the way to do this. However, finding the best way depends heavily on how your data is stored in memory.
To start with, you don't want to take the address of your destination variable. If it is a local variable, you will force it to the stack rather than giving the compiler the option to place it in a processor register. This alone could be very expensive.
The most general solution is to read the data byte by byte and arithmetically combine the result. For example:
uint16_t res = ( (((uint16_t)char_array[high]) << 8)
| char_array[low]);
The expression in the 32-bit case is a bit more complex, as you have more alternatives. You might want to check the assembler output to see which is best.
Alt 1: Build pairs, and combine them:
uint16_t low16 = ... as example above ...;
uint16_t high16 = ... as example above ...;
uint32_t res = ( (((uint32_t)high16) << 16)
| low16);
Alt 2: Shift in 8 bits at a time:
uint32_t res = char_array[i0];
res = (res << 8) | char_array[i1];
res = (res << 8) | char_array[i2];
res = (res << 8) | char_array[i3];
All examples above are neutral to the endianness of the processor used, as the index values decide which part to read.
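Both alternatives can be wrapped in small functions. The sketch below is an illustration under assumptions: the function names are hypothetical, and the sample data is assumed to be an array of uint8_t.

```c
#include <stddef.h>
#include <stdint.h>

/* The caller passes the index of the high byte and of the low byte,
   so the same function reads either byte order. */
static uint16_t read_u16(const uint8_t *a, size_t high, size_t low)
{
    return (uint16_t)(((uint16_t)a[high] << 8) | a[low]);
}

/* Alt 1: build two 16-bit halves and combine them.
   Here the data at a[0..3] is taken to be big-endian. */
static uint32_t read_be32_pairs(const uint8_t *a)
{
    uint16_t high16 = read_u16(a, 0, 1);
    uint16_t low16  = read_u16(a, 2, 3);
    return ((uint32_t)high16 << 16) | low16;
}

/* Alt 2: shift in 8 bits at a time, most significant byte first.
   Here the data is taken to be little-endian, so that byte is a[3]. */
static uint32_t read_le32_shift(const uint8_t *a)
{
    uint32_t res = a[3];
    res = (res << 8) | a[2];
    res = (res << 8) | a[1];
    res = (res << 8) | a[0];
    return res;
}
```

Neither function depends on the byte order of the CPU or on the alignment of the array, since only single bytes are loaded.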
The next kind of solution is possible if 1) the endianness (byte order) of the device matches the order in which the bytes are stored in the array, and 2) the array is known to be placed at an aligned memory address. The latter condition depends on the machine, but you are safe if the char array representing a 16-bit value starts at an even address, and in the 32-bit case it should start at an address divisible by four. In this case you could simply read the address, after some pointer tricks:
uint16_t res = *(uint16_t *)&char_array[xxx];
Where xxx is the array index corresponding to the first byte in memory. Note that this might not be the same as the index of the lowest byte of the value.
I would strongly suggest the first class of solutions, as it is endianness-neutral.
Anyway, both of them are way faster than your memcpy solution.
memcpy is not valid for "shifting" (moving data by an offset shorter than its length within the same array); attempting to use it for such invokes very dangerous undefined behavior. See http://lwn.net/Articles/414467/
You must either use memmove or your own shifting loop. For sizes above about 64 bytes, I would expect memmove to be a lot faster. For extremely short shifts, your own loop may win. Note that memmove has more overhead than memcpy because it has to determine which direction of copying is safe. Your own loop already knows (presumably) which direction is safe, so it can avoid an extra runtime check.
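A shift within one buffer using memmove might look like the following sketch. shift_left is a hypothetical helper, and dropping the leading bytes is just one possible meaning of "shifting":

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the bytes of buf towards the start by
   `shift` positions, discarding the first `shift` bytes. Source and
   destination overlap, so memmove (not memcpy) is required. The last
   `shift` bytes keep their old contents. */
static void shift_left(unsigned char *buf, size_t len, size_t shift)
{
    if (shift >= len)
        return; /* nothing would remain; the buffer is left unchanged */
    memmove(buf, buf + shift, len - shift);
}
```

For the six bytes 1 2 3 4 5 6 and a shift of 2, the buffer afterwards holds 3 4 5 6 5 6.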

Resources