How to save a uint64_t to a file in C? - c

How do I save a uint64_t to a file in plain C?
I suppose the output file will be 8 bytes long?

fwrite(&sixty_four_bit_var, 8, 1, file_pointer)

EDIT: Since my compiler does not have uint64_t, I have shown two ways to save a 64-bit value to a file using unsigned long long. The first example writes it in (hex) text format, the second in binary.
Note that unsigned long long may be more than 64 bits on some systems.
#include <stdio.h>

int main(void) {
    FILE *fout;
    unsigned long long my64 = 0x1234567887654321ULL;

    // store as text
    fout = fopen("myfile.txt", "wt");
    fprintf(fout, "%llX\n", my64);
    fclose(fout);

    // store as bytes
    fout = fopen("myfile.bin", "wb");
    fwrite(&my64, sizeof(my64), 1, fout);
    fclose(fout);
    return 0;
}
Content of myfile.txt (hex dump)
31 32 33 34 35 36 37 38 38 37 36 35 34 33 32 31 1234567887654321
0D 0A ..
Content of myfile.bin (hex dump) (little-endian)
21 43 65 87 78 56 34 12 !Ce‡xV4.

I recommend printing it with fprintf as a string. That is easier to verify (cat the file or open it in a text editor) and there is no hassle with endianness. Check inttypes.h for the proper format specifier (PRIu64).
Read it back with fscanf, using SCNu64 as the format specifier.
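For example, a minimal round-trip might look like this (a sketch; the file name value.txt is just for illustration):

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint64_t out = 12345678901234567890ULL;  // any 64-bit value
    uint64_t in = 0;

    // Write the value as decimal text.
    FILE *f = fopen("value.txt", "w");
    if (f == NULL) return 1;
    fprintf(f, "%" PRIu64 "\n", out);
    fclose(f);

    // Read it back with the matching scan macro.
    f = fopen("value.txt", "r");
    if (f == NULL) return 1;
    if (fscanf(f, "%" SCNu64, &in) != 1) return 1;
    fclose(f);

    printf("round-trip %s\n", in == out ? "ok" : "failed");
    return 0;
}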
That will also work if the data type is not aligned to the first position. While improbable for uint64_t, consider a char occupying one octet but not starting at offset 0 for some reason (e.g. a big-endian CPU with no 8-bit load/store). This would be allowed by the standard.
However, if you really want to write 8-bit values, use the following:
uint64_t value = input_value;
for (size_t i = 0; i < 8; i++) {
    fputc(value & 0xFF, filep);
    value >>= 8;
}
That will store the value in little-endian format. Note that this is not guaranteed to work for signed types because of the right shift (though in practice it very likely will).
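Reading it back is the mirror image; here is a sketch, assuming filep is now open in binary read mode:

uint64_t value = 0;
for (size_t i = 0; i < 8; i++) {
    int c = fgetc(filep);
    if (c == EOF)
        break;                          // short file: handle the error as appropriate
    value |= (uint64_t)c << (8 * i);    // least significant byte was written first
}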
For more complex structures, you might use a proper format like JSON with a library.

Related

getting random characters in terminal with C and Python

For some reason, when I open a file and read it byte by byte in Python and C and try to print the result, I get random characters/data mixed in.
For example, when I read the first 8 bytes of a PNG image, as in the following example:
/* Test file reading and see if there's random data */
#include <stdio.h>
#include <stdlib.h>

#define PNG_BYTES_TO_CHECK 8

int
main(void)
{
    char fname[] = "../images/2.png";
    FILE *fp = fopen(fname, "rb");
    if (fp == NULL) abort();

    char *buffer = malloc(PNG_BYTES_TO_CHECK);
    if (fread(buffer, 1, PNG_BYTES_TO_CHECK, fp) != PNG_BYTES_TO_CHECK)
        abort();

    unsigned i;
    for (i = 0; i < PNG_BYTES_TO_CHECK; ++i) printf("%x ", buffer[i]);
    printf("\n");

    free(buffer); fclose(fp);
    return 1;
}
I get this garbage to stdout:
ffffff89 50 4e 47 d a 1a a
But when I open the file with a hex editor, the bytes are perfectly fine; it's a valid PNG signature (89 50 4E 47 0D 0A 1A 0A).
Any ideas as to what may cause this? I don't have an example for Python, but I recall that a few days ago I was getting similar repetitive mumbo jumbo while working with files at the byte level and printing the results.
The PNG spec states that a PNG file should always start with the bytes 137 80 78 71 13 10 26 10. The maximum value for a signed byte is 127, meaning that the first byte's value, 137, wraps around and becomes -119 (if this is confusing, check out the way negative numbers are represented in two's complement). You are then printing it as an unsigned hexadecimal integer. To do so, the signed byte is promoted to an int. Again, because of the way negative numbers are represented, a 4-byte integer whose value is -119 has the binary representation 11111111111111111111111110001001. %x is the format specifier for an unsigned hexadecimal value, so printf does not interpret that bit pattern as a negative number. If you convert 11111111111111111111111110001001 to hex, you'll see that it is ffffff89.
tl;dr: there's nothing wrong with the file. You just forgot to make your bytes unsigned.
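A minimal fix, based on the code in the question, is to make each byte unsigned before it is promoted for printf; either of these will do:

/* Option 1: declare the buffer unsigned, so the promotion zero-extends. */
unsigned char *buffer = malloc(PNG_BYTES_TO_CHECK);
/* ... fread and loop as before ... */
printf("%02x ", buffer[i]);    /* prints 89 50 4e 47 0d 0a 1a 0a */

/* Option 2: keep the char buffer and cast at the print site. */
printf("%02x ", (unsigned char)buffer[i]);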

How to convert a hexadecimal char into a 4-bit binary representation?

I wish to compare a SHA-256 hash, which is stored in a u8[32] (after being calculated in kernel space), with a 64-char string that the user passes in.
For example: the user passes the SHA-256 hash "49454bcda10e0dd4543cfa39da9615a19950570129420f956352a58780550839" as a char*, which takes 64 bytes. But this has to be compared with a hash inside kernel space, which is represented as u8 hash[32].
The hash inside the kernel gets properly printed in ASCII by the following code:
int i;
u8 hash[32];
for (i = 0; i < 32; i++)
    printk(KERN_CONT "%hhx ", hash[i]);
Output :
"49 45 4b cd a1 0e 0d d4 54 3c fa 39 da 96 15 a1 99 50 57 01 29 42 0f 95 63 52 a5 87 80 55 08 39 "
As the complete hash is stored in 32 bytes and printed as 64 chars, in groups of 2 chars per u8, I assume that one u8 currently stores information worth 2 chars, i.e. 00101111 prints as 2f.
Is there a way to store the 64-char string in 32 bytes so that it can be compared?
Here is how to use sscanf to do the conversion:
const char *shaStr = "49454bcda10e0dd4543cfa39da9615a19950570129420f956352a58780550839";
uint8_t sha[32];

for (int i = 0; i != 32; i++) {
    sscanf(shaStr + 2 * i, "%2" SCNx8, &sha[i]);
    printf("%02x ", sha[i]);
}
The approach here is to call sscanf repeatedly with the "%2" SCNx8 format specifier (SCNx8 is defined in <inttypes.h>), which means "two hex characters converted to uint8_t". The position is determined by the loop index, i.e. shaStr + 2*i.
Characters are often stored in ASCII, so start by having a look at an ASCII chart. It will show you the relationship between a character like 'a' and the number 97.
You will note that all of the digits are right next to each other. This is why you often see people write c - '0' or c - 48: it converts an ASCII-encoded digit into a number you can use.
However, the letters and the digits are far away from each other, which is slightly less convenient. If you arrange them by bits, you may notice a pattern: bit 6 (& 64) is set for letters but unset for digits. Observing that, converting hex ASCII into numbers is straightforward:
int h2i(char c) { return 9 * !!(c & 64) + (c & 15); }
Once you have converted a single character, converting a string is also straightforward:
void hs(char *d, char *s) { while (*s) { *d = h2i(s[0]) * 16 + h2i(s[1]); s += 2; ++d; } }
Adding support for non-hex characters embedded (like whitespace) is a useful exercise you can do to convince yourself you understand what is going on.
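To tie this back to the question, here is a sketch of how the conversion could be used for the comparison (user_hash_str and kernel_hash are illustrative names, not from the original code):

unsigned char user_hash[32];

// Convert the 64-char hex string into 32 raw bytes ...
hs((char *)user_hash, user_hash_str);

// ... then compare with the kernel-side u8 hash[32].
if (memcmp(user_hash, kernel_hash, 32) == 0) {
    /* hashes match */
}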

writing a byte with the "write" system call in C

Using the write system call, I am trying to write a number to a file. I want the file pointed to by fileid to contain 4 as '04' (expected outcome).
unsigned int g = 4;
if (write(fileid, &g, sizeof(int)) == -1)
{
    perror("Error"); exit(1);
}
I get the output '0000 0004' in my file. If I put 1 instead of sizeof(int), I get 00.
Is there a specific type that I missed?
PS. I have to read this value from the file as well, so if there isn't such a type, I'm not quite sure how I would go about doing that.
Whether writing 1 byte of g produces 00 or 04 depends on the architecture. Usually, 32-bit integers are stored in memory in little-endian order, meaning the least significant byte comes first; the 32-bit int 4 is therefore stored as 04 00 00 00, and the first byte is 04.
But this is not always true. Some architectures use big-endian order, where the bytes appear in memory in the same order as the value reads in 32-bit hexadecimal: 00 00 00 04. See the Wikipedia article on endianness.
sizeof(int) returns 4, so the code is actually writing four bytes. Change the type of g from unsigned int to unsigned char, and change sizeof(int) to sizeof(unsigned char) (or sizeof(g)). Then you should see that only one byte, '04', is written.
In this circumstance I would recommend using uint8_t, which is defined in <stdint.h>. On basically all systems you will ever encounter, this is a typedef for unsigned char, but using this name makes it clearer that the value in the variable is being treated as a number, not a character.
uint8_t g = 4;
if (write(fileid, &g, 1) != 1) {
    perror("write");
    exit(1);
}
(sizeof(char) == 1 by definition, and therefore so is sizeof(uint8_t).)
To understand why your original code did not behave as you expected, read up on endianness.
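For the read-back mentioned in the question's postscript, the same single-byte shape works; a sketch, assuming fileid is open for reading:

uint8_t g;
if (read(fileid, &g, 1) != 1) {
    perror("read");
    exit(1);
}
/* g now holds the byte value, e.g. 4 */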
If you want to write only one byte, it is more appropriate to create a variable that is one byte in size and save that using write.

unsigned int g = 4;
unsigned char c = (unsigned char)g;
if (write(fileid, &c, 1) == -1)
{
    perror("Error"); exit(1);
}

If any data is lost, it is lost inside the program (in the narrowing cast), not on the way into or out of the file.

Compress a struct into a binary file? [C]

This is part of my homework that I'm having difficulty solving.
I have a simple structure:
typedef struct Client {
    char *lname;
    unsigned int id;
    unsigned int car_id;
} Client;
And the exercise is:

1. Create a text file, named after the company name followed by the branch number, with a .txt extension. The file contains all the clients' details.
2. The file you created in exercise 1 will be compressed. As a result, a binary file with a .cmpr extension will be created.

I don't really have an idea how to implement 2. I remember that at the lectures the professor said we have to use "all" of the variable, with the binary operators (<<, >>, |, &, ~), but I don't know how to use them.
I'm using C on Ubuntu, with GCC and Eclipse.
I'd be glad to get some help. Thanks!
Let's say the file from step 1 looks like:
user1798362
2324
462345
where the three fields were simply printed on three lines. Note that the above is the text/readable (i.e. ASCII) representation of that file.
Looking at the contents of this file in hex(adecimal) representation we get (with the ASCII character printed below each byte value):
75 73 65 72 31 37 39 38 33 36 32 0a 32 33 32 34 0a 34 36 32 33 34 35 0a
u s e r 1 7 9 8 3 6 2 nl 2 3 2 4 nl 4 6 2 3 4 5 nl
here nl is of course the newline character. You can count that there are 24 bytes.
In step 2 you have to invent another format that saves as many bits as possible. The simplest way to do this is to compress each of the three fields individually.
Similar to where the text format uses a nl to mark the end of a field, you also need a way to define where a binary field begins and ends. A common way is to put a length in front of the binary field data. As a first step we could replace the nl's with a length and get:
58 75 73 65 72 31 37 39 38 33 36 32 20 32 33 32 34 30 34 36 32 33 34 35
-- u s e r 1 7 9 8 3 6 2 -- 2 3 2 4 -- 4 6 2 3 4 5
For now we simply take a whole byte for the length in bits. Note that 58 is the hex representation of 88 (i.e. 11 characters * 8 bits), the bit length of lname; 20 hex equals 4 * 8 = 32, and 30 is 6 * 8 = 48. This does not compress anything, as it's still 24 bytes in total. But we already have a binary format, because 58, 20 and 30 now have a special meaning.
The next step is to compress each field. This is where it gets tricky. The lname field consists of ASCII characters. In ASCII only 7 of the 8 bits are needed/used; have a look at an ASCII table. For example, the letter u in binary is 01110101. We can safely chop off the leftmost bit, which is always 0, yielding 1110101. The same can be done for all the characters, so you'll end up with 11 7-bit values -> 77 bits.
These 77 bits must now be fitted into 8-bit bytes. Here are the first 4 bytes (user) in binary representation, before chopping the leftmost bits off:
01110101 01110011 01100101 01110010
Chopping off a bit in C is done by shifting the byte (i.e. unsigned char) to the left with:
unsigned char byte = lname[0];
byte = byte << 1;
When you do this for all characters you get:
1110101- 1110011- 1100101- 1110010-
Here I use - to indicate the bits in these bytes that are now available to be filled; they became available by shifting all the bits one place to the left. You now use one or more bits from the right side of the next byte to fill up these - gaps. Doing this for these four bytes yields:
11101011 11001111 00101111 0010----
So now there's a gap of 4 bits that should be filled with bits from the character 1, and so on.
Filling up these gaps is done by using the binary operators in C which you mention. We already use the shift left <<. To combine 1110101- and 1110011- for example we do:
unsigned char *name;     // name MUST be unsigned to avoid problems with the binary operators.
// <allocate memory for name and read it from the text file>
unsigned char bytes[10]; // 10 is just a random size that gives us enough space.

name[0] = name[0] << 1;  // We shift to the left in place here, so `name` is overwritten.
name[1] = name[1] << 1;  // idem.
bytes[0] = name[0] | (name[1] >> 7);
bytes[1] = name[1] << 1;
With name[1] >> 7 we have 1110011- >> 7, which gives 00000001: the rightmost bit. With the bitwise OR operator | we then 'add' this bit to 1110101-, resulting in 11101011.
You have to do things like this in a loop to get all the bits in the correct bytes.
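A sketch of what such a loop might look like (the helper name put_bits is mine, not part of the exercise; the output buffer must start zeroed):

// Append the low `nbits` bits of `value` to the bit stream in `bytes`,
// most significant of those bits first. `*bitpos` counts bits written so far.
void put_bits(unsigned char *bytes, size_t *bitpos, unsigned value, int nbits)
{
    for (int i = nbits - 1; i >= 0; i--) {
        if (value & (1u << i))
            bytes[*bitpos / 8] |= 1u << (7 - *bitpos % 8);
        (*bitpos)++;
    }
}

// Usage sketch: one length byte, then 7 bits per character.
unsigned char bytes[64] = {0};
size_t pos = 0;
put_bits(bytes, &pos, 11 * 7, 8);               // bit length of "user1798362"
for (const char *p = "user1798362"; *p; p++)
    put_bits(bytes, &pos, (unsigned char)*p, 7);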
The new length of this name field is 11 * 7 = 77 bits, so we've shaved off a massive 11 bits :-) Note that with a one-byte length field we assume the lname field will never be more than 255 / 7 = 36 characters long.
As with the bytes above, you can then coalesce the second length against the final bits of the lname field.
To compress the numbers, you first read them in with fscanf(file, "%d", ...) into an unsigned int. There will be many 0s on the left side of this 4-byte unsigned int. The first number field, for example, is (shown in chunks of 4 bits only for readability):
0000 0000 0000 0000 0000 1001 0001 0100
which has 20 unused bits at the left.
You need to get rid of these. Take 32 minus the number of zeros on the left, and you get the bit length of this number. Add this length to the bytes array by coalescing its bits against those of the previous field. Then add only the significant bits of the number to bytes. Those would be:
1001 0001 0100
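A small helper for that bit-length computation might look like this (a sketch; the name bit_length is mine):

// Number of significant bits in v, e.g. bit_length(2324) == 12.
int bit_length(unsigned int v) {
    int n = 0;
    while (v) { n++; v >>= 1; }
    return n;
}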
In C, when working with the bits of an int (but also a short, a long, ... any variable/number larger than 1 byte), you must take byte order, or endianness, into account.
When you have done the above step twice, for both numbers, you're done. You then have a bytes array you can write to a file. Of course you must have kept track of where you were writing in bytes during the steps above, so you know the number of bytes. Note that in most cases there will be a few bits in the last byte that are not filled with data. That doesn't hurt; it is simply the unavoidable waste of files being stored in chunks of 8 bits = 1 byte minimally.
When reading the binary file back, you do the reverse. You read it into an unsigned char bytes array. You then know that the first byte (i.e. bytes[0]) contains the bit length of the name field. You then fill in the bytes of lname byte by byte, by shifting and masking, etc.
Good luck!

memory layout - C union

I have a union of an array of three integers (4 bytes each), a float (4 bytes), a double (8 bytes) and a character (1 byte).
If I assign 0x31313131 to each of the three integer elements and then print the union's character, I get the number 1. Why?
I don't understand the output. I know that the bits of the three 0x31313131 values are:
001100010011000100110001001100010011000100110001001100010011000100110001001100010011000100110001
Because '1' == 0x31. You are printing it as a character, not an integer.
Since it is a union, all the int and char members share the same memory location (the float and double do not matter in this context). So assigning 0x31313131 to the int does affect the char value -- nothing much confusing there.
Every member of a union has the same starting address; different members may have different sizes. The size of the union as a whole is at least the maximum size of any member; there may be extra padding at the end for alignment requirements.
You store the value 0x31313131 in the first three int-sized memory areas of your union object. 0x31313131 is 4 bytes, each of which has the value 0x31.
You then read the first byte (from offset 0) by accessing the character member. That byte has the value 0x31, which happens to be the encoding for the character '1' in ASCII and similar character sets. (If you ran your program on an EBCDIC-based system, you'd see different results.)
Since you haven't shown us any actual source code, I'll write some based on your description:
#include <stdio.h>
#include <string.h>

void hex_dump(char *name, void *base, size_t size) {
    unsigned char *arr = base;
    printf("%-8s : ", name);
    for (size_t i = 0; i < size; i++) {
        printf("%02x", arr[i]);
        if (i < size - 1) {
            putchar(' ');
        }
        else {
            putchar('\n');
        }
    }
}

int main(void) {
    union u {
        int arr[3];
        float f;
        double d;
        char c;
    };
    union u obj;

    memset(&obj, 0xff, sizeof obj);  // fill with 0xff so untouched bytes are visible
    obj.arr[0] = 0x31323334;
    obj.arr[1] = 0x35363738;
    obj.arr[2] = 0x393a3b3c;

    hex_dump("obj", &obj, sizeof obj);
    hex_dump("obj.arr", &obj.arr, sizeof obj.arr);
    hex_dump("obj.f", &obj.f, sizeof obj.f);
    hex_dump("obj.d", &obj.d, sizeof obj.d);
    hex_dump("obj.c", &obj.c, sizeof obj.c);
    printf("obj.c = %d = 0x%x = '%c'\n",
           (int)obj.c, (unsigned)obj.c, obj.c);
    return 0;
}
The hex_dump function dumps the raw representation of any object, regardless of its type, by showing the value of each byte in hexadecimal.
I first fill the union object with 0xff bytes. Then, as you describe, I initialize each element of the int[3] member arr -- but to show more clearly what's going on, I use different values for each byte.
The output I get on one system (which happens to be little-endian) is:
obj : 34 33 32 31 38 37 36 35 3c 3b 3a 39 ff ff ff ff
obj.arr : 34 33 32 31 38 37 36 35 3c 3b 3a 39
obj.f : 34 33 32 31
obj.d : 34 33 32 31 38 37 36 35
obj.c : 34
obj.c = 52 = 0x34 = '4'
As you can see, the initial bytes of each member are consistent with each other, because they're stored in the same place. The trailing ff bytes are unaffected by assigning values to arr (this is not the only valid behavior; the standard says they take unspecified values). Because the system is little-endian, the high-order byte of each int value is stored at the lowest position in memory.
The output on a big-endian system is:
obj : 31 32 33 34 35 36 37 38 39 3a 3b 3c ff ff ff ff
obj.arr : 31 32 33 34 35 36 37 38 39 3a 3b 3c
obj.f : 31 32 33 34
obj.d : 31 32 33 34 35 36 37 38
obj.c : 31
obj.c = 49 = 0x31 = '1'
As you can see, the high-order byte of each int is at the lowest position in memory.
In all cases, the value of obj.c is the first byte of obj.arr[0] -- which will be either the high-order or the low-order byte, depending on endianness.
There are a lot of ways this can vary across different systems. The sizes of int, float, and double can vary. The way floating-point numbers are represented can vary (though this example doesn't show that). Even the number of bits in a byte can vary; it's at least 8, but it can be bigger. (It's exactly 8 on any system you're likely to encounter). And the standard allows padding bits in integer representations; there are none in the examples I've shown.
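If you want to check which byte order your own system uses, a classic probe is to look at the first byte of an int; a minimal sketch:

#include <stdio.h>

int main(void) {
    unsigned int x = 1;
    // Little-endian: the low-order byte (1) comes first in memory.
    // Big-endian: the first byte is 0.
    unsigned char first = *(unsigned char *)&x;
    printf("this system is %s-endian\n", first == 1 ? "little" : "big");
    return 0;
}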
