Why does casting the pointer change the value at the address? - c

So I was doing an exercise to see if I was using memset correctly.
Here's the original code I wrote, which was supposed to memset some addresses to have the value 50:
int main(){
int *block1 = malloc(2048);
memset(block1, 50, 10);
// int count = 0;
for (int *iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) ){
printf("%p : %d\n", iter, *iter);
}
return 0;
}
I expected every address in memory to store the value 50. HOWEVER my output was:
(Address : Value)
0x14e008800 : 842150450
0x14e008801 : 842150450
0x14e008802 : 842150450
0x14e008803 : 842150450
0x14e008804 : 842150450
0x14e008805 : 842150450
0x14e008806 : 842150450
0x14e008807 : 3289650
0x14e008808 : 12850
0x14e008809 : 50
I was stuck on the problem for a while and tried a bunch of things until I randomly decided that maybe my pointer is the problem. I then tried a uint8_t pointer.
int main(){
uint8_t *block1 = malloc(2048);
memset(block1, 50, 10);
for (uint8_t *iter = block1; iter < block1 + 10; iter++ ){
printf("%p : %d\n", iter, *iter);
}
return 0;
}
All I did was change the type of the block1 variable and my iter variable to be uint8_t pointers instead of int pointers and I got the correct result!
0x13d808800 : 50
0x13d808801 : 50
0x13d808802 : 50
0x13d808803 : 50
0x13d808804 : 50
0x13d808805 : 50
0x13d808806 : 50
0x13d808807 : 50
0x13d808808 : 50
0x13d808809 : 50
My question is then, why did that make such a difference?

My question is then, why did that make such a difference?
Because the exact type of a pointer is hugely important. Pointers in C are not just memory addresses. Pointers are memory addresses, along with a notion of what type of data is expected to be found at that address.
If you write
uint8_t *p;
... p = somewhere ...
printf("%d\n", *p);
then in that last line, *p fetches one byte of memory pointed to by p.
But if you write
int *p;
... p = somewhere ...
printf("%d\n", *p);
where, yes, the only change is the type of the pointer, then in that exact same last line, *p now fetches four bytes of memory pointed to by p, interpreting them as a 32-bit int. (This assumes int on your machine is four bytes, which is pretty common these days.)
When you called
memset(block1, 50, 10);
you were asking for some (though not all) of the individual bytes of memory in block1 to be set to 50.
When you used an int pointer to step over that block of memory, fetching (as we said earlier) four bytes of memory at a time, you got 4-byte integers where each of the 4 bytes contained the value 50. So the value you got was
(((((50 << 8) | 50) << 8) | 50) << 8) | 50
which just happens to be exactly 842150450.
Or, looking at it another way, if you take that value 842150450 and convert it to hex (base 16), you'll find that it's 0x32323232, where 0x32 is the hexadecimal value of 50, again showing that we have four bytes each with the value 50.
Now, that all makes sense so far, although, you were skating on thin ice in your first program. You had int *iter, but then you said
for(iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) )
In that cumbersome increment expression
iter = (int *) ((uint8_t *)iter + 1)
you have contrived to increment the address in iter by just one byte. Normally, we say
iter = iter + 1
or just
iter++
and this means to increment the address in iter by several bytes, so that it points at the next int in a conventional array of int.
Doing it the way you did had three implications:
You were accessing a sort of sliding window of int-sized subblocks of block1. That is, you fetched an int made from bytes 1, 2, 3, and 4, then an int made from bytes 2, 3, 4, and 5, then an int made from bytes 3, 4, 5, and 6, etc. Since all the bytes had the same value, you always got the same value, but this is a strange and generally meaningless thing to do.
Three out of four of the int values you fetched were unaligned. It looks like your processor let you get away with this, but some processors would have given you a Bus Error or some other kind of memory-access exception, because unaligned accesses aren't always allowed.
You also violated the rule about strict aliasing.

The function memset sets each byte of the supplied memory with the specified value.
So in this call
memset(block1, 50, 10);
10 bytes of the memory addressed by the pointer block1 were set with the value 50.
But when you dereference the pointer iter, which has the type int *, you output sizeof( int ) bytes at once, starting at the address the pointer holds.
On the other hand, if you declare the pointer as having the type
uint8_t *iter;
then you will output only one byte of memory.
Consider the following demonstration program.
#include <stdio.h>
#include <string.h>
int main( void )
{
int x;
memset( &x, 50, sizeof( x ) );
printf( "x = %d\n", x );
for ( const char *p = ( const char * )&x; p != ( const char * )&x + sizeof( x ); ++p )
{
printf( "%d", *p );
}
putchar( '\n' );
}
The program output is
x = 842150450
50505050
That is, each byte of the memory occupied by the integer variable x was set to 50.
If you output each byte separately, the program prints the value 50 for each one.
To make this even clearer, consider one more demonstration program.
#include <stdio.h>
int main( void )
{
printf( "50 in hex is %#x\n", 50 );
int x = 0x32323232;
printf( "x = %d\n", x );
}
The program output is
50 in hex is 0x32
x = 842150450
That is, the value 50 in hexadecimal is 0x32.
Thus this initialization
int x = 0x32323232;
yields the same result as the call of the function memset
memset( &x, 50, sizeof( x ) );
that you could equivalently rewrite as
memset( &x, 0x32, sizeof( x ) );

In the first case you are de-referencing the int* iter so it prints the (misaligned) int value at the address, not the byte value.
It is clear what is happening when you look at the value 842150450 in hexadecimal - 0x32323232 - that is, each byte of the integer is 0x32 (50 decimal). The bytes after the tenth byte are indeterminate (malloc does not initialize them), but they happen to be zero in this case and the machine is little-endian, so the output tails off with 0x323232, 0x3232, and finally 0x32.
Clearly the second case is the more "correct" solution, but you can fix the first case like this:
printf("%p : %d\n",
(void*)iter,
*(uint8_t*)iter);

in C, why do I have " "s": initialization requires a brace-enclosed initializer list"?

DISCLAIMER: it's just a piece of the whole algorithm, but since I encountered a lot of errors, I've decided to divide and conquer, therefore, I start with the following code. (goal of the following code: create a string with the remainders of the division by 10 (n%10). for now, it's not reversed, I haven't done it yet, because I wanted to check this code first).
(i'm working in C, in visual studio environment).
I have to implement a function like atoi (that works from a string to a number), but I want to do the opposite (from a number to a string). but I have a problem:
the debugger pointed out that in the lines with the malloc, I should have initialized the string first (initialization requires a brace-enclosed initializer list),
but I have done it, I have initialized the string to a constant (in the 2nd line, I've written "this is the test seed")(because I need to work with a string, so I initialized, and then I malloc it to write the values of (unsigned int n) ).
this is how my program is supposed to work:
(1) the function takes an unsigned int constant (n),
(2) the function creates a "prototype" of the array (the zero-terminated string),
(3) then, I've created a for-loop without a check condition because I added it inside the loop body,
(4) now, the basic idea is that: each step, the loop uses the i to allocate 1 sizeof(char) (so 1 position) to store the i-th remainder of the n/10 division. n takes different values every steps ( n/=10; // so n assumes the value of the division). and if n/10 is equal to zero, that means I have reached the end of the loop because each remainder is in the string). Therefore, I put a break statement, in order to go outside the for-loop.
finally, the function is supposed to return the pointer to the 0-th position of the string.
so, to sum up: my main question is:
why do I have " "s": initialization requires a brace-enclosed initializer list"? (debugger repeated it twice). that's not how string is supposed to be initialized (with curly braces "{}"). String is initialized with " " instead, am I wrong?
char* convert(unsigned int n) {
char s[] = "this is the test seed";
for (unsigned int i = 0; ; i++) {
if (i == 0) {
char s[] = malloc (1 * sizeof(char));
}
if (i != 0) {
char s[] = malloc(i * sizeof(char));
}
if ((n / 10) == 0) {
break;
}
s[i] = n % 10;
n /= 10;
}
return s;
}
char s[] is an array, and therefore needs a brace-enclosed initializer list (or a character string literal). In the C99 standard, see section 6.7.8 (with 6.7.8.14 being the additional special case of a string literal for an array of character type). char s[] = malloc(...); is neither a brace-enclosed initializer list nor a string literal, and the compiler is correctly reporting that as an error.
The reason is that char s[] = ...; declares an array with no explicit size, so the compiler has to work out the length of the array from the initializer at compile time.
Perhaps you want char *s = malloc(...) instead, since scalars (for example, pointers) can be initialized with a single expression (see section 6.7.8.11).
Unrelated to your actual question, the code you've written is flawed, since you're returning a pointer to a local array (the first s), which no longer exists once the function returns. To avoid memory problems when you're coding, avoid mixing stack-allocated memory, statically allocated strings (e.g. string literals), and malloc-ed memory. If you mix these together, you'll never know what you can or can't do with the memory (for example, you won't be sure whether you need to free the memory or not).
A complete working example:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
char *convert(unsigned n) {
// Count digits of n (including edge-case when n=0).
int len = 0;
for (unsigned m=n; len == 0 || m; m /= 10) {
++len;
}
// Allocate and null-terminate the string.
char *s = malloc(len+1);
if (!s) return s;
s[len] = '\0';
// Assign digits to our memory, lowest on the right.
while (len > 0) {
s[--len] = '0' + n % 10;
n /= 10;
}
return s;
}
int main(int argc, char **argv) {
unsigned examples[] = {0, 1, 3, 9, 10, 100, 123, 1000000, 44465656, UINT_MAX};
for (int i = 0; i < sizeof(examples) / sizeof(*examples); ++i) {
char *s = convert(examples[i]);
if (!s) {
return 2;
}
printf("example %d: %u -> %s\n", i, examples[i], s);
free(s);
}
return 0;
}
It can be run like this (note the very useful -fsanitize options, which are invaluable especially if you're beginning programming in C).
$ gcc -fsanitize=address -fsanitize=leak -fsanitize=undefined -o convert -Wall convert.c && ./convert
example 0: 0 -> 0
example 1: 1 -> 1
example 2: 3 -> 3
example 3: 9 -> 9
example 4: 10 -> 10
example 5: 100 -> 100
example 6: 123 -> 123
example 7: 1000000 -> 1000000
example 8: 44465656 -> 44465656
example 9: 4294967295 -> 4294967295

Different types and sizes for each row of a 2D array in C

I'm trying to write a 2D array where each row has a different datatype and a different number of cells.
The first row contains 3 chars, whereas the second row contains 2 ints.
The function "copy" should copy byte-by-byte the array po into the row-array p[1], but the visualization shows -24 3 instead of 1000 2000 (see picture). What is the solution? Thanks.
#include <stdio.h>
#include <stdlib.h>
void copy(char* dest,char* source,int dim) {
int i;
for(i=0; i<dim ;i++)
dest[i]=source[i];
}
int main(void) {
char **p;
int po[]={1000,2000};
p = (char**) calloc(2,sizeof(char*));
p[0]= (char*) calloc(3,sizeof(char));
p[1]= (char*) calloc(2,sizeof(int));
p[0][0]='A';p[0][1]='B';p[0][2]='C';
copy((char*) p[1],(char*) po,2*sizeof(int));
printf("%c ",p[0][0]); printf("%c ",p[0][1]); printf("%c \n",p[0][2]);
printf("%i ",p[1][0]); printf("%i \n",p[1][1]);
free(p[1]);free(p[0]);free(p);
return 0;
}
1000 is represented in binary as 1111101000. Padded out to 16 bits, that is 00000011 11101000 (if int takes 4 bytes, as it does on most current machines, the two remaining high-order bytes are simply zero). You might expect the bytes to sit in memory in that order, but the actual storage looks like this:
11101000 00000011
This may look odd, but many machines follow the little-endian convention, which means that
Whenever a multibyte value is stored, the first byte of the memory stores the least significant byte of the value.
It may seem strange, but little-endian storage (big-endian being the other way round) is convenient for implementing some algorithms.
Hence the byte 11101000, read as a signed char, is -24, and 00000011 is 3. The 2000 never shows up at all, because p[1][0] and p[1][1] are only the first two chars of the row; each one is promoted to int and printed with %i.
Now, that being said, I appreciate your curiosity and experimenting nature. But if you wanted what you expect to happen, the right piece of code would be
printf("%i ",((int*)p[1])[0]);
printf("%i \n",((int*)p[1])[1]);
As for having different datatypes in a single 2-D array, that is not technically possible. But as you have already guessed, you can store all the row addresses as char*. Then, as I did above, you will have to cast every row to its true type before using it. For that purpose, you will have to store the datatype of every row somewhere else (maybe in an int array, with 0 for int, 1 for char, 2 for double, etc.).
Try this for the second line of printf:
printf("%d %d \n", ((int *)p[1])[0], ((int *)p[1])[1]);
p[1][1] is the second byte (char) in array of chars. You want to access second int in array of ints. For that reason you have to convert p[1] to int * array and then get the element. For better readability:
int * p_int = (int *)p[1];
printf("%d %d \n", p_int[0], p_int[1]);
I'm trying to write a 2D array where each row has a different datatype
That's not going to work. By definition all elements of an array are the same type. You can't have some elements be one type and some be a different type.
If p has type char **, then p[i] has type char *, and p[i][j] has type char. In the call
printf( "%i", p[1][0] );
it's treating p[1][0] as a char object, not an int object, and thus only looking at part of the value. You could do something like
printf( "%i", *((int *)&p[1][0]) );
that is, treat the address of p[1][0] as the address of an int object and dereference it, which is ... well, pretty eye-stabby, and prone to error.
A better option would be to create an array of union type where each member can store either a char or int value, like so:
union e { char c; int i; };
union e **p = malloc( 2 * sizeof *p );
p[0] = malloc( 3 * sizeof *p[0] );
p[1] = malloc( 2 * sizeof *p[1] );
p[0][0].c = 'A'; p[0][1].c = 'B'; p[0][2].c = 'C';
p[1][0].i = 1000; p[1][1].i = 2000;
printf( "%c\n", p[0][0].c );
printf( "%i\n", p[1][0].i );
There's no good way to specify "this entire row must store char values, while that entire row must store int values" - you'll have to enforce that rule yourself.

Can somebody invalidate my way of finding array length from pointer in C

I have come across three Questions in techgig Code Gladiator which had method signature like
int getMax(int [] a);
where "a" is array of positive integers.
which with all the theory I know, I can say is insufficient for solving this with C program.
with observations, though I know they prove wrong in theory, I came up with a method
for(i=0;a[i]>=0;i++);
which gave correct results in all cases.
Could somebody advise if this can be used on all OSes and compilers?
The function has undefined behaviour. Nothing stops the code generated by the compiler from scanning memory beyond the end of the array.
I tried the following simple program
#include <stdio.h>
int getMax( int a[] )
{
int i = 0;
for ( ; a[i] >= 0; i++ );
return i;
}
int main( void )
{
int x = 0;
int a[] = { 1, 2, 3 };
int y = 4;
printf( "&x = %p, a = %p, &y = %p\n", ( void * )&x, ( void * )a, ( void * )&y );
printf( "%i\n", getMax( a ) );
}
And I got the following result
&x = 0302FF00, a = 0302FF04, &y = 0302FEFC
55
As you can see yourself 55 is not close to the size of the array:).
Running this code on another computer, I got the result
&x = 0xbf8704ac, a = 0xbf8704a0, &y = 0xbf87049c
4
When you read a negative value, it's not part of the array, and so that's potentially "out of bounds".
Also, you don't know how many "not in the array" positive values there were before you hit the negative value.
Consider an array of length 5 { 10, 20, 30, 40, 50 }, then uninitialized data {7378, 3562, 6271, 73473, -1 }. Your max is the "bad" value 73473 not the expected 50.
I say you can't do it reliably even on one OS or compiler.

Loop control in C using pointers for an array of structures

I am a newbie and am trying to understand the concept of pointers to an array using the example below. Can anyone tell me what the exit condition for the loop should be?
The while loop seems to be running forever but the program terminates with no output.
Thank you.
typedef struct abc{
int a;
char b;
} ABC;
ABC *ptr, arr[10];
int main()
{
ptr = &arr[0];
int i;
for(i = 0; i < 10; i++){
arr[i].a = i;
}
while(ptr!=NULL){
printf("%d \n", ptr->a);
ptr++; //Can I use ptr = ptr + n to skip n elements for some n?
}
}
while(ptr!=NULL){
This will run until ptr becomes NULL. Since it points to the first element of the array, and it's always incremented, and we don't know any other implementation detail, it may or may not become NULL. That's not how you check for walking past the end of the array. You would need
while (ptr < arr + 10)
instead.
Can I use ptr = ptr + n to skip n elements for some n?
Of course. And while we are at it: why not ptr += n?
The loop isn't infinite, it stops when ptr == 0.
Assuming you have a 32-bit computer, ptr is 32 bits wide.
So it can hold numbers from 0 to 4294967296-1 (0 to 2^32 - 1).
Each time through the loop it adds 8 to ptr (the size of ABC, assuming a 4-byte int plus padding).
Eventually ptr will get to be 4294967296-8.
Adding 8 to that results in 4294967296 - but that is an overflow so the actual result is 0.
Note: This only works if ptr happens to start at a multiple of 8.
Offset it by 4 and this would be an infinite loop.
Strictly speaking, walking ptr past the end of arr is undefined behaviour, so none of this is guaranteed.
Change the printf from "%d" to "%x" - printing the numbers in hex will make it more clear I think.

Does making the iterator a pointer speed up a C loop?

I ran the following:
#include <stdio.h>
typedef unsigned short boolean;
#define false 0
#define true (!false)
int main()
{
int STATUS = 0;
int i = 0;
boolean ret = true;
for(i = 0; i < 99999; i++)
{
ret = ret && printf("Hello, World.");
}
if(!ret)
{
STATUS = -1;
}
return STATUS;
}
It completes in just under a second. Typically 0.9 - 0.92.
Then I changed int i = 0; to int *i = 0; and now I am getting execution times under 0.2 seconds. Why the speed change?
Your runtime is dominated by the time required to print to the console. i++ on an int* increments the pointer by the size of the pointed-to type, int, not by 1. That is usually 4, though it depends on your computer and compiler settings. Based on the numbers you report, presumably it is 4 here. So printf is executed only a quarter as many times.
Typically, the cost of printing to a console will be several orders of magnitude larger than any gain from micro-optimizations you could make to such a loop.
Are you really sure your second version prints hello world 99999 times as well?
When you're doing for(int *i = 0; i < 99999; i++), you're checking whether the pointer value (an address) is less than 99999, which doesn't normally make a lot of sense. Incrementing a pointer means you step it up to point at the next element, and since you have an int*, you'll increment the pointer by sizeof(int) bytes.
You're just iterating 99999/sizeof(int) times.
Your comment on nos's answer confirmed my suspicion: it's pointer arithmetic. When you increment an int pointer using ++, it doesn't just add one to the number, but it actually jumps up by the size of an integer, which is usually 4 (bytes). So i++ is actually adding 4 to the numeric value of i.
Similarly, if you use += on a pointer, like i += 5, it won't just add 5 (or whatever) to the numeric value of i, it'll advance i by the size of that many integers, so 5*4 = 20 bytes in that case.
The reasoning behind this is that if you have a chunk of memory that you're treating as an array,
int array[100]; // for example
you can iterate over the elements in the array by incrementing a pointer.
int* i = array;
int* end = array + 100;
for (i = array; i < end; i++) { /* do whatever */ }
and you won't have to rewrite the loop if you use a data type of a different size.
The reason is that the increment operator works differently on pointers.
On ints, i++ increments i by 1.
For pointers, i++ increments by the size of the pointed-to object, which will be 4 or 8 depending on your architecture.
So your loop runs for only 1/4 or 1/8 of the iteration count when i is a pointer vs when i is an int.
The correct way to do this test with a pointer would be something like:
int i;
int *i_ptr = &i;
for (*i_ptr = 0; *i_ptr < 99999; (*i_ptr)++) {
...
