Creating and extracting data from trap representations via memcpy - c

If we use memcpy to copy arbitrary data into a variable, potentially creating a trap representation, can we reliably extract that data later so long as the value of that variable is never accessed? In other words, does the following program avoid undefined behavior and reliably pass the assert test at the end?
#include <assert.h>
#include <stdlib.h>
#include <string.h>
int main()
{
// Create some random data
unsigned char original_data[ 100 ];
for( int i = 0; i < sizeof( original_data ); ++i )
original_data[ i ] = rand();
// A float array whose size is at least as big as the data
float float_array[ sizeof( original_data ) / sizeof( float ) + 1 ] = { 0 };
// Create probable trap representations in the float array
memcpy( float_array, original_data, sizeof( original_data ) );
// Do other things without ever accessing the float values in the array
// ...
// Extract data
unsigned char extracted_data[ sizeof( original_data ) ];
memcpy( extracted_data, float_array, sizeof( original_data ) );
// original_data and extracted_data should now store the same data
for( int i = 0; i < sizeof( original_data ); ++i )
assert( extracted_data[ i ] == original_data[ i ] );
return 0;
}
Obviously, the mere existence of a trap representation does not cause undefined behavior, since any uninitialized variable could contain one. But is it possible that the trap representations could somehow spontaneously change between the calls to memcpy?
Note that this is a language lawyer question about this specific scenario, so I’m not asking about how we can copy or store data without creating trap representations.

Related

Reliably and portably store and retrieve objects of structure type in C

@bdonlan, in "Copying structure in C with assignment instead of memcpy()", lists several reasons for using memcpy to copy objects of structure type. I have one more reason: I want to use the same area of memory to store and retrieve arbitrary objects, of possibly different structure type, at different times (like storage on a pre-allocated heap).
I want to know:
how this can be done portably (in the sense that the behavior is defined by the Standard) and
what parts of the Standard allow me to reasonably assume that it can be done portably.
Here is an MRE (sorta: not so much on the "M" [minimal] and I'm basically asking about the "R" [reproducible]):
Edit: I hope to have placed a better example after this one. I'm leaving this one here so as to provide a reference for the answers and comments thus far.
// FILE: memcpy_struct.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h> // FOR CHAR_BIT
// EDIT: @john-bollinger POINTS OUT THAT THE FOLLOWING LINE
// IS NOT PORTABLE.
// typedef struct { } structure ;
// INSTEAD:
typedef struct { char dummy ; } structure ;
typedef struct {
unsigned long long u ; unsigned long long v ;
} unsignedLongLong2; // TWICE AS MANY BITS AS long long
typedef struct
{
unsigned long long u ; unsigned long long v ;
unsigned long long w ; unsigned long long x ;
} unsignedLongLong4; // FOUR TIMES AS MANY BITS AS long long
typedef unsigned char byte ;
void store ( byte * target , const structure * source , size_t size ) {
memcpy ( target , source , size ) ;
}
void fetch ( structure * target , const byte * source , size_t size ) {
memcpy ( target , source , size ) ;
}
const size_t enough =
sizeof ( unsignedLongLong2 ) < sizeof ( unsignedLongLong4 )
? sizeof ( unsignedLongLong4 ) : sizeof ( unsignedLongLong2 ) ;
int main ( void )
{
byte * memory = malloc ( enough ) ;
unsignedLongLong2 v0 = { 0xabacadabaabacada , 0xbaabacadabaabaca } ;
unsignedLongLong4 w0= {
0xabacadabaabacada , 0xbaabacadabaabaca ,
0xdabaabacadabaaba , 0xcadabaabacadabaa } ;
unsignedLongLong2 v1 ;
unsignedLongLong4 w1 ;
store ( memory , ( structure * ) & v0 , sizeof v0 ) ;
fetch ( ( structure * ) & v1 , memory , sizeof v1 ) ;
store ( memory , ( structure * ) & w0 , sizeof w0 ) ;
fetch ( ( structure * ) & w1 , memory , sizeof w1 ) ;
char s [ 1 + sizeof w0 * CHAR_BIT ] ; // ENOUGH FOR TERMINATING NULL CHAR-
char t [ 1 + sizeof w0 * CHAR_BIT ] ; // ACTERS + BASE-2 REPRESENTATION.
sprintf ( s, "%llx-%llx", v0 . u, v0 . v ) ;
sprintf ( t, "%llx-%llx", v1 . u, v1 . v ) ;
puts ( s ) ; puts ( t ) ;
puts ( strcmp ( s , t ) ? "UNEQUAL" : "EQUAL" ) ;
sprintf ( s, "%llx-%llx-%llx-%llx", w0 . u, w0 . v, w0 . w, w0 . x ) ;
sprintf ( t, "%llx-%llx-%llx-%llx", w1 . u, w1 . v, w1 . w, w1 . x ) ;
puts ( s ) ; puts ( t ) ;
puts ( strcmp ( s , t ) ? "UNEQUAL" : "EQUAL" ) ;
free ( memory ) ;
}
Compiled with
gcc -std=c11 memcpy_struct.c # can do C99 or C17, too
Output of corresponding executable
abacadabaabacada-baabacadabaabaca
abacadabaabacada-baabacadabaabaca
EQUAL
abacadabaabacada-baabacadabaabaca-dabaabacadabaaba-cadabaabacadabaa
abacadabaabacada-baabacadabaabaca-dabaabacadabaaba-cadabaabacadabaa
EQUAL
But what guarantees that the pairs of outputs will always be EQUAL, provided that the Standard is respected? I think the following helps (N2176 Types 6.2.5-28):
All pointers to structure types shall have
the same representation and alignment requirements as each other.
Edit: After considering the answers and comments, I think the following is a better MRE:
// FILE: memcpy_struct-1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
size_t length ;
} array_header ;
typedef struct
{
size_t capacity ;
size_t length ;
} buffer_header ;
const size_t hsize_max =
sizeof ( array_header ) < sizeof ( buffer_header )
? sizeof ( buffer_header ) : sizeof ( array_header ) ;
const size_t block = 512u ;
const size_t pageSize = block * ( 1 +
( hsize_max / block + ! ! ( hsize_max % block ) ) ) ;
int main ( void )
{
void * memory = malloc ( pageSize ) ;
array_header a0 = { 42u } ;
buffer_header b0 = { 42u , 0u } ;
array_header a1 ;
buffer_header b1 ;
memcpy ( memory , & a0 , sizeof a0 ) ;
memcpy ( & a1 , memory , sizeof a1 ) ;
memcpy ( memory , & b0 , sizeof b0 ) ;
memcpy ( & b1 , memory , sizeof b1 ) ;
fputs ( "array_header-s are " , stdout ) ;
puts ( a0.length == a1.length ? "EQUAL" : "UNEQUAL" ) ;
fputs ( "buffer_header-s are " , stdout ) ;
puts ( b0.capacity == b1.capacity && b0.length == b1.length
? "EQUAL" : "UNEQUAL" ) ;
free ( memory ) ;
}
Since you are asking about portability and the provisions of the standard, the very first thing that came to mind was that structure types without any members, such as this ...
typedef struct { } structure ;
... are a non-portable extension. Your objective there seems to be to use structure * as a generic pointer-to-structure type, but you don't need that when you have void * available as a generic pointer-to-anything type. And with void *, you even get the pointer conversions automatically, without the explicit casts. Note also that you eventually get the conversions to void * anyway when you call memcpy().
I want to use the same area of memory to store and retrieve arbitrary objects—of possibly different structure type—at different
times (like storage on a pre-allocated heap).
Ok. That's not a particularly big ask.
I want to know:
how this can be done portably (in the sense that the behavior is defined by the Standard) and
Your example is fine. Alternatively, if you know in advance all the different structure types that you may want to store, then you can use a union.
what parts of the Standard allow me to reasonably assume that it can be done portably.
With your dynamic allocation / memcpy() example, there is
C17 7.22.3.4/2: "The malloc function allocates space for an object whose size is specified by size"
C17 6.2.4/2: "An object exists, has a constant address, and retains its last-stored value throughout its lifetime."
C17 7.22.3/1: "The lifetime of an allocated object extends from
the allocation until the deallocation."
C17 7.24.2.1/3: "The memcpy function copies n characters from the object pointed to by s2 into the object pointed to
by s1."
Thus, in a program exhibiting only defined behavior, memcpy() faithfully copies all the specified bytes from the source object's representation to the destination object's representation. The destination object retains them unchanged until they are overwritten or its lifetime ends. That keeps them available for the second memcpy() to copy them from there to some other object. Neither memcpy() alters the byte sequence, and the allocated object faithfully keeps it in between, so in the end all three objects (the original, the allocated, and the final destination) must contain the same byte sequence, up to the number of bytes copied.
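The union alternative mentioned in this answer could look roughly like the sketch below. It assumes the two header types from the question's second example; the names header_slot, roundtrip_array and roundtrip_buffer are made up for illustration and do not come from the original posts.

```c
#include <stddef.h>

/* Header types mirroring the question's second example. */
typedef struct { size_t length; } array_header;
typedef struct { size_t capacity; size_t length; } buffer_header;

/* Hypothetical storage slot: a union is large enough and suitably
   aligned for every one of its members. */
typedef union {
    array_header  as_array;
    buffer_header as_buffer;
} header_slot;

/* Store an array_header in the slot and read it back through
   the same member. */
array_header roundtrip_array(header_slot *slot, array_header in)
{
    slot->as_array = in;
    return slot->as_array;
}

/* Reuse the same slot for a buffer_header at a later time. */
buffer_header roundtrip_buffer(header_slot *slot, buffer_header in)
{
    slot->as_buffer = in;
    return slot->as_buffer;
}
```

The trade-off is that the set of types has to be known when the union is declared, whereas the malloc()/memcpy() approach works for any object that fits in the allocated block.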
If you are asking about some way to “store” a structure and later recover the same structure into an object of the same type, then it suffices merely to copy the bytes. This can be done by memcpy, and there is no need for any kludges using structures defined with various numbers of unsigned long long elements (see footnote 1). This is guaranteed by C 2018 6.2.6.1 paragraphs 2 to 4:
2 Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number,
order, and encoding of which are either explicitly specified or implementation-defined.
3 Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a
pure binary notation.
4 Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value…
So, to store any structure, or any object other than a bit-field, reserve enough memory for it (see footnote 2) and copy the object’s bytes into that memory. To recover the structure, copy the bytes back.
Regarding:
I think the following helps (N2176 Types 6.2.5-28):
All pointers to structure types shall have the same representation and alignment requirements as each other.
That is irrelevant. The code in the question never uses the representation of any pointer (the bytes that make up the recorded value of a pointer), so a guarantee about pointer representations has no bearing on it.
Footnotes
1 Why use multiple members with different names? To define a structure with N unsigned long long elements in it, all you need is struct { unsigned long long x[N]; }.
2 For an object X, this can be done with void * Memory = malloc(sizeof X), or with unsigned char Memory[sizeof X]; (an ordinary array, since sizeof X is a constant expression unless X is itself a variable length array), or, if you want it inside a structure, struct { unsigned char x[sizeof X]; } Memory;.
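The store-and-recover procedure described in this answer can be sketched as follows. The type sample and the helper names are hypothetical, chosen only for illustration; the byte array inside a structure is the last form mentioned in footnote 2.

```c
#include <string.h>

/* Hypothetical example type; any non-bit-field object is handled
   the same way. */
typedef struct { double x; int id; char tag; } sample;

/* Byte storage exactly as large as one sample. */
typedef struct { unsigned char bytes[sizeof(sample)]; } sample_store;

/* Copy the object representation of *src into the byte storage. */
void store_sample(sample_store *dst, const sample *src)
{
    memcpy(dst->bytes, src, sizeof *src);
}

/* Copy the stored bytes back into an object of the same type. */
void fetch_sample(sample *dst, const sample_store *src)
{
    memcpy(dst, src->bytes, sizeof *dst);
}
```

Because the bytes go back into an object of the same type they came from, the recovered object holds the same value as the original.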

system("cls") in C is showing some weird behavior after using dynamic allocation

I've just started learning C and I decided to create a snake game using characters.
So I started building blocks for the game (functions and all), and I try to test each block individually.
After some time I created the movement block and tested it. The program returned 0xC0000005, which appears to be the error code for an illegal memory access.
After some tinkering I found that the problem is with the system("cls") function. I experimented with putting it elsewhere in the program and this behavior emerged:
If I use dynamic allocation the system("cls") no longer works.
This code works fine because the system("cls") is used before dynamic allocation:
#include <stdio.h>
main()
{
char ** grid;
int i;
system( "cls" );
grid = (char**)malloc( 16 * sizeof( char * ) );
for ( i = 0; i <= 16; ++i )
{
*(grid + i) = (char*)malloc( 75 * sizeof( char ) );
}
}
Whereas this code returns an error because system("cls") is called after the dynamic allocation:
#include <stdio.h>
main()
{
char ** grid;
int i;
grid = (char**)malloc( 16 * sizeof( char * ) );
for ( i = 0; i <= 16; ++i )
{
*(grid + i) = (char*)malloc( 75 * sizeof( char ) );
}
system( "cls" );
}
EDIT: after a bit of tinkering I found that reducing the size of each pointer allocated memory solves the problem which makes no sense
grid = (char**)malloc( 16 * sizeof( char * ) );
for ( i = 0; i <= 16; ++i )
You reserve space for 16 elements. Then you loop over the indices 0 through 16, including the 16 -- which makes for 17 iterations, one too many. The last iteration writes past the end of the allocated block, which is undefined behavior, and that is why your program acts funny.
The correct idiom is to loop to size, exclusive:
grid = (char**)malloc( 16 * sizeof( char * ) );
for ( i = 0; i < 16; ++i )
Note the < instead of <=.
As for various other issues with your code, here is a stylistically cleaned up version for you to consider:
// stdio.h was unused, but stdlib.h was missing
#include <stdlib.h> // for malloc() and free(); also system(), but see below
// ncurses really is the go-to solution for console applications
#include <curses.h>
// Define constants globally instead of having "magic numbers" in the source
// (and then forgetting to adjust one of them when the number changes)
#define GRID_ROWS 16
#define GRID_COLUMNS 75
// int main( void ) or int main( int argc, char * argv[] )
int main( void )
{
    char ** grid;
    // system() is pretty evil. There are almost always better solutions.
    // For console applications, try the ncurses library.
    //system( "cls" );
    // curses has to be initialized before any other curses call
    initscr();
    clear();
    refresh();
    // don't cast malloc(), sizeof object instead of type, using constant
    grid = malloc( GRID_ROWS * sizeof( *grid ) );
    if ( grid == NULL )
    {
        endwin();
        return EXIT_FAILURE;
    }
    // loop-scope index, using constant
    for ( int i = 0; i < GRID_ROWS; ++i )
    {
        // indexed access, no cast, sizeof( char ) == 1, using constant
        grid[i] = malloc( GRID_COLUMNS );
    }
    // You should also release any resources you allocated.
    // Relying on the OS to do it for you is a bad habit, as not all OS
    // actually can do this. Yes, this makes your program more complex,
    // but it also gives you good hints which parts of main() should
    // be made into functions, to ease clean-up.
    for ( int i = 0; i < GRID_ROWS; ++i )
    {
        free( grid[i] ); // free( NULL ) is a no-op, so a failed row is fine
    }
    free( grid );
    endwin();
    // return 0; from main() is implicit, but still a good habit
    // to write out.
    return 0;
}
The wrong includes hint at you not using, or not paying attention to, compiler warnings. Something like -Wall -Wextra is the bare minimum of warning settings (there are a lot more available, check your compiler manual). There is also something like -Werror which interprets any warning as a hard error; it might be instructive to use that while you are still learning your way around the language.

How to create an array of ints when size of array and size of ints are not constant?

I need to create an array of integers with the aim of using them one by one later in the program. Integers are inputted by the user, so the size of the array and the size of each integer aren't constant. All I can say that due to specification of the program the array would not exceed, let's say, 100 elements and an integer would always be between 0 and 99. How can I do this? I am new to C and this is really baffling to me, as this is very easy to do in Python, and I've spent quite some time already trying to figure out how to make this work (I understand that all those complications with arrays arise from the fact that C is an old language).
First, don't confuse an integer value with the size of the type used to represent it. The values 0 and 99 will take up exactly the same amount of space in whatever integral type you use.
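As a small illustration of that point, the sketch below reports how many bytes an int occupies while it holds a given value. The function name storage_size_of is made up for this example.

```c
#include <stddef.h>

/* Number of bytes used to store an int that holds `value`.
   The result depends only on the type, never on the value. */
size_t storage_size_of(int value)
{
    int stored = value;
    return sizeof stored;
}
```

Calling it with 0 and with 99 gives the same result, namely sizeof(int) on the implementation in use.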
As for the array itself, you have several options available.
You can pick a maximum number of elements that your user won't be allowed to exceed:
#define MAX_ALLOWED 100
int main( void )
{
int values[MAX_ALLOWED]; // declare array with max-allowed values
size_t howMany;
// get howMany from user, make sure it doesn't exceed MAX_ALLOWED
...
}
If you are using a C99 compiler, or a C11 compiler that supports variable-length arrays (they became optional in C11), you could get the array length from the user and use that value in the array declaration:
int main( void )
{
size_t howMany;
// get howMany from user
int values[howMany]; // declare array with user-specified number of values
...
}
If you don't want to use VLAs (and there are some good reasons why you don't), you can use dynamic memory management routines:
#include <stdlib.h>
int main( void )
{
size_t howMany;
// get howMany from user
int *values = malloc( howMany * sizeof *values ); // dynamically allocate
// space to store user-
// specified number of
// values.
if ( !values )
{
// memory allocation failed, panic
exit( EXIT_FAILURE );
}
...
free( values ); // release the memory when you're done.
}
Note that in the last case, values is not declared as an array of int, but a pointer to int; it will store the address of the memory that was dynamically allocated. You can still use the [] operator to index into that space like a regular array, but in many other respects it will not be treated the same as an array.
Regardless of how you create the array, assigning elements is the same:
for ( size_t i = 0; i < howMany; i++ )
{
values[i] = value_for_this_element();
}
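Putting the first option together with the limits stated in the question (at most 100 elements, each between 0 and 99), a minimal sketch of the filling step might look like this. The function parse_values is hypothetical, and it assumes the user's input has already been read into a string, for example with fgets().

```c
#include <stdlib.h>

#define MAX_ALLOWED 100   /* upper bound on the element count */

/* Parse whitespace-separated integers from `text` into `values`.
   Stops at the end of the usable input, at the first value outside
   0..99, or once `capacity` values have been stored.
   Returns the number of values stored. */
size_t parse_values(const char *text, int *values, size_t capacity)
{
    size_t count = 0;
    while (count < capacity) {
        char *end;
        long v = strtol(text, &end, 10);
        if (end == text)          /* no digits found: nothing left to parse */
            break;
        if (v < 0 || v > 99)      /* outside the range the question allows */
            break;
        values[count++] = (int)v;
        text = end;               /* continue after the parsed number */
    }
    return count;
}
```

A caller would declare int values[MAX_ALLOWED]; and then use only the first n elements, where n is the value returned by parse_values(line, values, MAX_ALLOWED).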

Setting an Array of Integer Pointers

I am trying to set an array of integer pointers. The program is supposed to set the pointer at index i to point to an integer of value 2*i. The program should then print out the pointees of the first 5 pointer elements, which should be 0,2,4,6,8.
For some reason I am getting a segmentation fault. Could anyone tell me why this happens and what I can do to fix it?
I attempted to replace the final line with " arr[index] = &i; ", which does not give me a segmentation fault but still gives me the wrong results.
Help would be greatly appreciated; I am just starting off with arrays of pointers.
#include <stdio.h>
void setArr (int);
int * arr[10]; // array of 10 int pointers
int main(int argc, char *argv[])
{
int i;
setArr(0);
setArr(1);
setArr(2);
setArr(3);
setArr(4);
for(i=0; i<5;i++)
printf("arr [%d] = %d\n", i, *arr[i]); /* should be 0, 2, 4, 6, 8 */
return 0;
}
/* set arr[index], which is a pointer, to point to an integer of value 2*index */
void setArr (int index){
int i = 2 * index;
* arr[index] = i;
}
The problem is that you are not allocating memory for what each item in your array points to. The line
*arr[index] = i;
will set some random memory address (whatever was originally in arr[index]) to the value of i.
What you should do is:
void setArr(int index)
{
int *i = malloc(sizeof(int)); // allocate memory for the value
*i = 2 * index; // set the value
arr[index] = i; // make the array slot point at the value
}
but you need to make sure to free() the memory later. For example, before the return 0; statement in your main() function, put:
for (i = 0; i < 5; i++)
free(arr[i]);
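If dynamic allocation is not wanted, another option is to give every pointer a target that outlives the call. This also explains why arr[index] = &i; gave wrong results: i is a local variable of setArr, so the stored pointer dangles as soon as the function returns. The sketch below is one possible variant; the backing array values and the bounds check are additions for illustration and are not part of the original answer.

```c
#define ARR_LEN 10

int *arr[ARR_LEN];            /* array of int pointers, as in the question */
static int values[ARR_LEN];   /* backing storage that outlives every call */

/* Point arr[index] at an int holding 2*index.
   Out-of-range indexes are ignored. */
void setArr(int index)
{
    if (index < 0 || index >= ARR_LEN)
        return;
    values[index] = 2 * index;
    arr[index] = &values[index];
}
```

With this variant nothing has to be freed, because values has static storage duration.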
You need to malloc space for your pointers inside the array.
This is taken from chapter 9 of Yashavant P. Kanetkar's "Let Us C" and should clarify why what you did earlier didn't work.
When we are using a two-dimensional array of characters we are at liberty to either initialize the strings where we are declaring the array, or receive the strings using scanf( ) function. However, when we are using an array of pointers to strings we can initialize the strings at the place where we are declaring the array, but we cannot receive the strings from keyboard using scanf( ). Thus, the following program would never work out.
main( )
{
char *names[6] ;
int i ;
for ( i = 0 ; i <= 5 ; i++ )
{
printf ( "\nEnter name " ) ;
scanf ( "%s", names[i] ) ;
}
}
The program doesn’t work because; when we are declaring the array it is containing garbage values. And it would be definitely wrong to send these garbage values to scanf( ) as the addresses where it should keep the strings received from the keyboard.
Solution
If we are bent upon receiving the strings from keyboard using scanf( ) and then storing their addresses in an array of pointers to strings we can do it in a slightly round about manner as shown below.
#include "alloc.h"
main( )
{
char *names[6] ;
char n[50] ;
int len, i ;
char *p ;
for ( i = 0 ; i <= 5 ; i++ )
{
printf ( "\nEnter name " ) ;
scanf ( "%s", n ) ;
len = strlen ( n ) ;
p = malloc ( len + 1 ) ;
strcpy ( p, n ) ;
names[i] = p ;
}
for ( i = 0 ; i <= 5 ; i++ )
printf ( "\n%s", names[i] ) ;
}
Here we have first received a name using scanf( ) in a string n[ ]. Then we have found out its length using strlen( ) and allocated space for making a copy of this name. This memory allocation has been done using a standard library function called malloc( ). This function requires the number of bytes to be allocated and returns the base address of the chunk of memory that it allocates. The address returned by this function is always of the type void *. Hence it has been converted into char * using a feature called typecasting. Typecasting is discussed in detail in Chapter 15. The prototype of this function has been declared in the file ‘alloc.h’. Hence we have #included this file.
But why did we not use array to allocate memory? This is because with arrays we have to commit to the size of the array at the time of writing the program. Moreover, there is no way to increase or decrease the array size during execution of the program. In other words, when we use arrays static memory allocation takes place.
Unlike this, using malloc( ) we can allocate memory dynamically, during execution. The argument that we pass to malloc( ) can be a variable whose value can change during execution.
Once we have allocated the memory using malloc( ) we have copied the name received through the keyboard into this allocated space and finally stored the address of the allocated chunk in the appropriate element of names[ ], the array of pointers to strings.
This solution suffers in performance because we need to allocate memory and then do the copying

struct padding influence in C struct serialization ( saving to file )

I have the following structs in C:
typedef struct sUser {
char name[nameSize];
char nickname[nicknameSize];
char mail[mailSize];
char address[addressSize];
char password[passwordSize];
int totalPoints;
PlacesHistory history;
DynamicArray requests;
}User;
typedef struct sPlacesHistory {
HistoryElement array[HistorySize];
int occupied;
int last;
}PlacesHistory;
and the functions:
void serializeUser( User * user, FILE * fp ) {
fwrite( user, nameSize + nicknameSize + mailSize + addressSize + passwordSize + sizeof( int ) + sizeof( PlacesHistory ), 1, fp );
serializeDynamicArray( user -> requests, fp );
}
User * loadUser( FILE * fp ) {
User * user = malloc( sizeof( User ) );
fread( user, nameSize + nicknameSize + mailSize + addressSize + passwordSize + sizeof( int ) + sizeof( PlacesHistory ), 1, fp );
user -> requests = loadDynamicArray( fp );
return user;
}
When I load the struct User and print that user (loaded from file), the field "last" of PlacesHistory has the value 255 or -1, depending on the order of the fields of the PlacesHistory structure. But the User I saved had -1 in that member.
So when I get 255, it is obviously wrong.
I suspect this has to do about struct padding.
How can I do this in such a way that the order of fields in the structure doesn't matter?
Or which criteria do I need to follow to make things work right?
Do I need to fwrite/fread one member at a time? ( I would like to avoid this for efficiency matters )
Do I need to serialize to an array first instead of a file? (I hope not, because this implies knowing the size of all my structures beforehand in order to malloc the array, which means extra work creating a function for every non-simple structure to compute its size.)
Note: *Size are defined constants
Note2: DynamicArray is a pointer to another structure.
Yes, it probably has to do with padding in front of either totalPoints or history.
You can just write out sizeof(User) - sizeof(DynamicArray) and read back in the same. Of course this will only be compatible as long as your struct definitions and compiler don't change. If you don't need serialized data from one version of your program to be compatible with another version, then the above should work.
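A variation on this idea is to take the byte count from offsetof, which stays correct even if the compiler adds padding after the last member. The sketch below uses a deliberately shortened, hypothetical User; the member sizes and the function names are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the question's types. */
typedef struct node *DynamicArray;   /* "a pointer to another structure" */
typedef struct {
    char name[13];
    int totalPoints;
    DynamicArray requests;   /* pointer: must not be written as raw bytes */
} User;

/* Number of leading bytes of a User that precede `requests`, including
   any padding the compiler placed between the earlier members. */
#define USER_FLAT_SIZE offsetof(User, requests)

/* Write the flat part of *user; returns 1 on success, 0 on failure. */
int write_user_flat(const User *user, FILE *fp)
{
    return fwrite(user, USER_FLAT_SIZE, 1, fp) == 1;
}

/* Read the flat part back; returns 1 on success, 0 on failure. */
int read_user_flat(User *user, FILE *fp)
{
    return fread(user, USER_FLAT_SIZE, 1, fp) == 1;
}
```

The same compatibility caveat applies: the file can only be read back by a build that lays out the structure identically.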
Why are you adding up all elements individually? That's just adding a lot of room for error. Whenever you change your structure, your code might break if you forgot to change all the places where you add the size up (in fact, why do you add it up each time?).
And, as you suspected, your code doesn't account for structure padding either, so you may be missing up to three bytes at the end of your data block (if your largest element is 4 bytes).
Why not sizeof(User) to get the size of the amount of data you're reading/writing? If you don't want parts of it saved (like requests), then use a struct inside a struct. (EDIT: Or, like rlibby suggested, just subtract the sizeof of the part you don't want to read.)
My guess is that the sum of your string sizes is not divisible by 4, so you are 3 bytes short. As such, it's possible that you were supposed to read the four bytes ff ff ff ff (= -1) but ended up reading only the first of them, leaving ff 00 00 00 in memory (= 255 on a little-endian machine, assuming that your structure was zeroed out initially).
Padding may be your problem, because
nameSize + nicknameSize + mailSize + addressSize + passwordSize + sizeof( int ) + sizeof( PlacesHistory ) != sizeof( User )
so the member last (which is also the final member of PlacesHistory) remains partly uninitialized. To check this, do a memset( user, 0, sizeof( User ) ) before reading from the file.
To fix this, use #pragma pack(push,1) before the structure definitions and #pragma pack(pop) after them. Note that #pragma pack is a compiler extension, not standard C.
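The size mismatch these answers describe can be checked directly. The sketch below uses a small hypothetical Record type, not the question's structures, and compares the hand-computed sum of member sizes with where the final member really ends.

```c
#include <stddef.h>

#define NAME_SIZE 5   /* deliberately not a multiple of sizeof(int) */

/* Hypothetical record: a char array followed by an int. */
typedef struct {
    char name[NAME_SIZE];
    int  last;
} Record;

/* Sum of the member sizes, computed the way the question's
   fwrite() call computes its byte count. */
size_t record_member_sum(void)
{
    return NAME_SIZE + sizeof(int);
}

/* Number of bytes of `last` that lie beyond the first
   record_member_sum() bytes of a Record. */
size_t record_bytes_missed(void)
{
    size_t end_of_last = offsetof(Record, last) + sizeof(int);
    return end_of_last - record_member_sum();
}
```

On an implementation that aligns int to 4 bytes, `last` starts at offset 8 instead of 5, so a read of record_member_sum() bytes stops before the end of `last`. The exact figures are implementation-defined.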

Resources