Copying a string literal to an uint32_t array and accessing it - c

I posted the code below elsewhere, but it was described as a bad solution. The standard has this to say about memcpy:
"The memcpy function copies n characters from the object pointed to by s2 into the
object pointed to by s1. If copying takes place between objects that overlap, the behavior
is undefined."
and this about uint32_t:
"The typedef name uintN_t designates an unsigned integer type with width N and no
padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of
exactly 24 bits."
Are there any alignment issues? I have always used this on Linux and never encountered any bugs. I only use bitwise operations for access when I have to worry about endianness, for example when receiving data over a link from another architecture. Kindly shed some light.
#include <stdio.h>
#include<string.h>
#include<stdint.h>
char* pointer = "HelloWorld!Hell!";
uint32_t arr[4];
unsigned char myArray[16];
int main(void) {
memcpy(arr, pointer, (size_t)16);
// Is this illegal ?
char *arr1 = (char *)arr;
for(int i = 0 ; i < 16; i++)
{
printf("arr[%d]=%c\n", i, arr1[i]);
}
}

The call to memcpy is fine. Where you have undefined behavior is here:
printf("%s\n", arr);
The %s format specifier expects a char * argument but you're passing a uint32_t *. Such an argument mismatch is undefined behavior. The two pointer types may have the same representation on your system, but that isn't necessarily true in general.
Even if the types matched, you would still have UB because arr isn't large enough to contain the string "HelloWorld!Hell!". This string (including the null terminating byte) is 17 bytes wide and so the null terminator isn't copied. Then printf reads past the end of the array which is UB.
As an example, I modified the list of variables as follows:
uint32_t x = 0x11223344;
uint32_t arr[4] = { 1, 2, 3, 4 };
uint32_t y = 0x55667788;
And got the following output:
HelloWorld!Hell!�wfU
As for this:
char *arr1 = (char *)arr;
This is legal because a pointer of one object type may be converted to a pointer to another object type. Also, because the destination type is char *, it is legal to dereference that pointer to access the underlying bytes of the original object.

Related

Simple implementation of sizeof in C

I came across one simple (maybe over simplified) implementation of the sizeof operator in C, which goes as follows:
#include <stdio.h>
#define mySizeof(type) ((char*)(&type + 1) - (char*)(&type))
int main() {
char x;
int y;
double z;
printf("mySizeof(char) is : %ld\n", mySizeof(x));
printf("mySizeof(int) is : %ld\n", mySizeof(y));
printf("mySizeof(double) is : %ld\n", mySizeof(z));
}
Note: Please ignore whether this simple function can work in all cases; that's not the purpose of this post (though it works for the three cases defined in the program).
My question is: How does it work? (Especially the char* casting part.)
I did some investigations as follows:
#include <stdio.h>
#define Address(x) (&x)
#define NextAddress(x) (&x + 1)
int main() {
int n = 1;
printf("address is : %lld\n", Address(n));
printf("next address is : %lld\n", NextAddress(n));
printf("size is %lld\n", NextAddress(n) - Address(n));
return 0;
}
The above sample program outputs:
address is : 140721498241924
next address is : 140721498241928
size is 1
I can see the addresses of &x and &x + 1. Notice that the difference is 4, which means 4 bytes, since the variable is int type. But, when I do the subtraction operation, the result is 1.
What you have to remember here is that pointer arithmetic is performed in units of the size of the pointed-to type.
So, if p is a pointer to the first element of an int array, then *p refers to that first element and the result of the p + 1 operation will be the address resulting from adding the size of an int to the address in p; thus, *(p + 1) will refer to the second element of the array, as it should.
In your mySizeof macro, the &type + 1 expression will yield the result of adding the size of the relevant type to the address of type; so, in order for the subsequent subtraction of &type to yield the size in bytes, we cast the pointers to char*, so that the subtraction will be performed in base units of the size of a char … which is guaranteed by the C Standard to be 1 byte.
Pointers carry the information about their type. If you have a pointer to a 4-byte value such as int, and add 1 to it, you get a pointer to the next int, not a pointer to the second byte of the original int. Similarly for subtraction.
If you want to obtain the item size in bytes, it's necessary to force pointers to point to byte-like items. Hence the typecast to char*.
See also Pointer Arithmetic
Your implementation of sizeof works for most objects, albeit you should modify it this way:
the misnamed macro argument type (which cannot be a type) should be bracketed in the expansion to avoid operator precedence issues.
the expression has type ptrdiff_t, it should be cast as size_t
the printf format for size_t is %zu. Note that %ld is incorrect for ptrdiff_t, you should use %td for this.
Here is a modified version:
#include <stdio.h>
#define mySizeof(obj) ((size_t)((char *)(&(obj) + 1) - (char *)&(obj)))
int main() {
char x;
int y;
double z;
printf("mySizeof(char) is : %zu\n", mySizeof(x));
printf("mySizeof(int) is : %zu\n", mySizeof(y));
printf("mySizeof(double) is : %zu\n", mySizeof(z));
return 0;
}
How it works:
valid pointers can point to an element of an array or to the element just past the last element of the array. Objects that are not arrays are considered as arrays of 1 element for this purpose.
so if obj is a valid lvalue &(obj) + 1 is a valid pointer past the end of obj in memory and casting it as (char *) is valid.
similarly (char *)&(obj) is a valid pointer to the beginning of the object, and the only iffy operation here is the subtraction of 2 valid pointers that cannot be considered to point to the same array of char.
the C standard makes a special case of character type pointers to allow the representation of objects to be accessed as individual bytes. So (char *)(&(obj) + 1) - (char *)&(obj) effectively evaluates to the number of bytes in the representation of obj.
Note these limitations for this implementation of sizeof:
it does not work for types as in mySizeof(int)
the argument must be an object: mySizeof(1) does not work, nor mySizeof(x + 1)
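the object may be a struct or an array: char foo[3]; mySizeof(foo) but not a string literal: mySizeof("abc") nor a compound literal: mySizeof((char[2]){'a','b'}), because the macro evaluates its argument twice and the two occurrences need not designate the same object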

Assigning a string to a char pointer is valid but assigning an integer to int pointer is invalid in C. Why?

A char pointer can be assigned an arbitrary string, but an integer pointer cannot be assigned an integer, even though both are pointers and contain addresses. Why is assigning a string to a pointer valid in C, but assigning an integer invalid, before any dynamic allocation?
#include<stdio.h>
int main()
{
char *s = "sample_string"; // valid
printf("%s\n", s);
int *p = (int)5; // invalid
printf("%d\n", *p);
return 0;
}
Which gives output :
sample_string
Segmentation fault (core dumped)
What is the reason behind this? Note that both of them are invalid in C++.
There is no "string type" in C. A "string", by C definition, is an array of char with a zero byte at the end.
The type of "sample_string" is char[14], which can be assigned to a pointer.
The type of (int)5 is int, which cannot[1].
The segmentation fault happens because you are accessing the address 0x00000005, which is not valid.
[1]: Technically you can. But if you want to dereference that pointer successfully, you have to take care that the address value of that integer has the proper alignment for the type, and is referring to a valid object of the type. Which is why compilers generate a warning if you don't explicitly cast that integer to pointer type in the assignment, to indicate that you do know what you're doing.
char *s = "sample_string"; Here "sample_string" is a string literal which is a const char[] in C++. It's implicitly converted to const char*. You'll get a warning though since you're assigning it to a char*.
int *p = (int)5; Here 5 is just an integer. Since you're assigning it to a pointer, the pointer holds an invalid address, and so when it is dereferenced you get a segfault.
This is simple:
A char object may hold a char value: char x = 'a';.
An int object may hold an int value: int x = 3;.
A char * object may point to an array of char: char *p = "abc";.
An int * object may point to an array of int: int *p = (int []) {1, 2, 3};.
(In this answer, “point to an array” is short for “point to the first element of an array”.)
In C, a string literal, such as "abc", is effectively an array of char, including a null character at the end. Also, the text above, (int []) {1, 2, 3}, is a compound literal that creates an array of int. So both "abc" and (int []) {1, 2, 3} are arrays. When an array is assigned to a pointer, the C implementation automatically converts it to a pointer to its first element. (This conversion occurs whenever an array is used in any expression other than as the operand of sizeof, as the operand of unary &, or, if it is a string literal, as the initializer for an array.)
The convention is that strings are arrays of char (char[]), and a pointer to a string points to the first element of this char array, much as a pointer to an array's data points to its first element. For example, for an int array
int a[10];
int *p;
p = a;
p points to the first element of a, that is a[0] in index notation. (p = &a[0]; is equivalent, whereas &a has type int (*)[10] and is not compatible with int *.)
but an integer pointer cannot be assigned an integer.
Not quite. In C an integer can be assigned to a pointer - with certain conditions. Yet this only sets the pointer to 5, not that p points to an int with the value of 5. *p attempts to read what is at address 5 and interpret that location as an int. Certainly access to address 5 is invalid and causes a seg fault.
Even if those conditions are met (see below), this is certainly not what OP is seeking, which I assume is to make the pointer p point to some object of type int holding the value 5.
(int) {5} is a compound literal, available since C99. Here it is an int with the value of 5 and code takes the address of that object and assigns that address to p.
// int *p = (int)5;
int *p = & ((int) {5});
printf("%d\n", *p); // prints 5
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation. C17dr § 6.3.2.3 5

Access c variable as an array

I'm moderately stuck; googling the right words hasn't got me to the right answer. Even worse, I've done this before, but my own code example is lost somewhere in the source code.
#include <stdio.h>
int main()
{
short x = 0xABCD;
char y[2] = { 0xAB, 0xCD };
printf("%x %x\n", y[0], y[1]);
printf("%x %x\n", (char *)&x[0], (char *)&x[1]);
}
Basically I need to access individual variable bytes via array by pointer arithmetic, without any calculations, just by type casting.
Put parentheses around your cast:
printf("%x %x\n", ((char *)&x)[0], ((char *)&x)[1]);
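Note that endianness may change your expected result. Also, where char is signed, a byte above 0x7F is sign-extended when passed to printf and may print as ffffffab; casting to unsigned char * instead avoids that.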
In the future, compile with -Wall to see what the warnings or errors are.
It's somewhat supported in C99, by a process known as type punning via a union.
union {
short s;
char c[2];
} pun;
pun.s = 0xABCD;
pun.c[0] // reinterprets the representation of pun.s as char[2].
// And accesses the first byte.
Pointer casting (as long as it's to char*, to avoid strict aliasing violations) is also ok.
short x = 0xABCD;
char *c = (char*)&x;
If you're only bothered about getting the values, you can store the address of the source variable in a char * and increment and dereference the char pointer to print the values of each byte.
Quoting C11, chapter §6.3.2.3
[....] When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
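Something like the following (consider it a sketch; the order in which the bytes print depends on endianness)
#include <stdio.h>
int main(void)
{
    int src = 0x12345678;
    /* The cast is required; unsigned char avoids sign extension in printf. */
    unsigned char *t = (unsigned char *)&src;
    for (size_t i = 0; i < sizeof(src); i++)
        printf("%x\t", t[i]);
    return 0;
}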
should do it.
That said, to elaborate on the accepted answer, the why part:
As per the operator precedence table, array indexing operator has higher precedence over the type-casting, so unless forced explicitly, in the expression
(char *)&x[0]
the type of x is not changed as expected: the subscript is applied to x first, and only then is the address taken and cast. So, to make the cast apply before the subscript, we need to enclose it in an extra pair of parentheses.

How to access char array using an int pointer?

Hi, how can I access a character array using an integer pointer?
char arr[10] = {'1','2','3','4','5','6','7','8','9','10'};
int *ptr;
How can I print the values of 'arr' using the pointer ptr?
It is a little unclear what your goal is, but trying to print out a character array with an integer pointer is a bit like trying to get to the second step taking four-steps at a time. When you tell the compiler that you would like to reference a memory address with an integer pointer, the compiler knows that an integer is sizeof (int) bytes (generally 4-bytes on x86/x86_64). So attempting to access each element in a character array with an integer pointer and normal pointer arithmetic wouldn't work. (you would be advancing 4-bytes at a time).
However, printing the character array through an integer pointer is possible if you use the integer pointer to hold the starting address of the array, cast it back to char *, and advance that char pointer once per character. While it is doubtful this is your goal, the plain statement of your question seems to suggest it. To accomplish this, you could:
#include <stdio.h>
int main (void)
{
    char arr[] = {'1','2','3','4','5','6','7','8','9'};
    int *ptr = (int *)arr;   /* only defined if arr is suitably aligned for int */
    unsigned int i;
    for (i = 0; i < sizeof arr; i++)
        printf (" %c", *((char *)ptr + i));   /* cast first, then offset, then dereference */
    putchar ('\n');
    return 0;
}
Output
$ ./bin/char_array_int_ptr
1 2 3 4 5 6 7 8 9
Note: your original initialization of your array with a character '10' was invalid: '10' is a multi-character constant with an implementation-defined value, not a single character. If this was an assignment, it was likely intended to show you how pointer arithmetic is influenced by type, and that you can cast to and from type char (without violating strict aliasing rules).
If you are just after the integer values you can print out the characters as integers
for (i = 0; i < sizeof(arr)/sizeof(arr[0]); ++i)
{
printf( "%d ", arr[i] );
}
Accessing an object through a pointer of the wrong type is, in general, undefined behavior, so it is not something you want to do. (The exception is a pointer to a character type, which may be used to read the bytes of any object.) If you want to convert a char to an integer, you can do that. If you want to print the integer value of a char, you can do that too.
But using an int pointer to access a char array is undefined behavior.

Why does my homespun sizeof operator need a char* cast?

Below is the program to find the size of a structure without using sizeof operator:
struct MyStruct
{
int i;
int j;
};
int main()
{
struct MyStruct *p=0;
int size = ((char*)(p+1))-((char*)p);
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
return 0;
}
Why is typecasting to char * required?
If I don't use the char* pointer, the output is 1 - why?
Because pointer arithmetic works in units of the type pointed to. For example:
int* p_num = malloc(10 * sizeof(int));
int* p_num2 = p_num + 5;
Here, p_num2 does not point five bytes beyond p_num, it points five integers beyond p_num. If on your machine an integer is four bytes wide, the address stored in p_num2 will be twenty bytes beyond that stored in p_num. The reason for this is mainly so that pointers can be indexed like arrays. p_num[5] is exactly equivalent to *(p_num + 5), so it wouldn't make sense for pointer arithmetic to always work in bytes, otherwise p_num[5] would give you some data that started in the middle of the second integer, rather than giving you the sixth integer as you would expect.
In order to move a specific number of bytes beyond a pointer, you need to cast the pointer to point to a type that is guaranteed to be exactly 1 byte wide (a char).
Also, you have an error here:
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
You have two format specifiers but only one argument after the format string.
If I don't use the char* pointer, the output is 1 - WHY?
Because the - operator obeys the same pointer arithmetic rules that the + operator does. Adding one to the pointer advanced it by sizeof(MyStruct) bytes, but without the cast the subtraction divides that byte difference by sizeof(MyStruct) again, which gives 1.
Why not use the built in sizeof() operator?
Because you want the size of your struct in bytes, and pointer arithmetic implicitly uses the size of the pointed-to type.
int* p;
p + 5; // this is implicitly p + 5 * sizeof(int)
By casting to char* you circumvent this behavior.
Pointer arithmetic is defined in terms of the size of the type of the pointer. This is what allows (for example) the equivalence between pointer arithmetic and array subscripting -- *(ptr+n) is equivalent to ptr[n]. When you subtract two pointers, you get the difference as the number of items they're pointing at. The cast to pointer to char means that it tells you the number of chars between those addresses. Since C makes char and byte essentially equivalent (i.e. a byte is the storage necessary for one char) that's also the number of bytes occupied by the first item.
