I have a C program that writes to a file using fwrite(..), and the result is not consistent with the arguments I provide.
uint32_t x = 1009716161;
FILE * file = fopen("helloop.txt", "wb+");
rewind(file);
fwrite( &x, sizeof(uint32_t), 1, file);
fclose(file);
When I check the file afterward, it seems to contain symbols that don't translate into anything:
>cat helloop.txt
>Á/<
as I should be getting this
>cat helloop.txt
>000000003C2F0BC1
I checked the file's permissions and I chmodded it
chmod 777 helloop.txt
The way I see it, I have one element the size of a 32-bit integer that I want to write to a file,
what am I doing wrong ?
Your program did exactly what you told it to.
In general, if a program you've written seems to be misbehaving, you shouldn't jump to the conclusion that the operating system, the compiler, or the runtime library is at fault. It's far more likely that there's a bug in your program, or that you've misunderstood something. That remains true no matter how many years of experience you have.
The value 1009716161, expressed in hexadecimal, is 0x3c2f0bc1. When you write that value to a binary file, you write four 8-bit bytes, with values 0x3c, 0x2f, 0x0b, and 0xc1. In ASCII, those are '<', '/', and two characters outside the printable ASCII range. The order in which they're written depends on the endianness of your system, but the contents you report seem consistent with that.
I'm not sure why you'd expect to see 000000003C2F0BC1. That's 16 bytes when you only wrote 4 bytes to the file. Furthermore, binary files do not contain an ASCII rendering of the hexadecimal representation of the data you wrote -- they just contain the data.
If you examine the file by converting it from raw binary to hexadecimal (by using the hexdump or od -x command if your system has it), you should see something recognizable.
And if you open the file in binary mode and use fread to read the data back into a uint32_t object, you should get the original value 1009716161 back -- which is the whole point.
cat helloop.txt
Á/<
cat prints character data. It doesn't print a 4-byte value in a file as a 32-bit integer.
as I should be getting this
cat helloop.txt
000000003C2F0BC1
No you shouldn't, not with cat. You'd have to write the character string "000000003C2F0BC1" to the file if you expect to get that. The file would probably be 16 characters long. I'd bet right now if you run ls -l helloop.txt you'll see size 4 because you wrote a uint32_t binary integer to the file.
what am I doing wrong ?
As far as I can tell, the only thing you've done wrong is expecting cat to print out your uint32_t as a hexadecimal representation. (Though I did not check your hex value, so that may be incorrect as well)
See if you have hexdump on your Linux machine, it may give you better results.
EDIT: If you actually DO want to print a uint32_t as a hex string, you can use fprintf(..., "%x", x) with the x or X format specifier, but keep in mind this is not compatible with fwrite/fread, so to read it back in you'll have to read in the string and convert back from hex to int.
Related
I need to read an unsigned short using the read system call.
According to the manpage:
read() attempts to read up to count bytes from file descriptor fd
into the buffer starting at buf.
In my case, an unsigned short is two bytes in size, so it can store numbers up to 65535. But when I execute this code:
char buf[2];
bytes_read = read(0, buf, 2);
bytes_wrote = write(1, buf, 2);
and type in, say, the number 123 on the command line, it returns only 12. Does it read symbols rather than bytes? How can I read a value with more than 2 digits into a 2-byte buffer -- for example, the maximum value of an unsigned short? I found nothing about this in either K&R or the manpages, so I suspect it's something simple.
NB: I'm assuming your terminal uses either ASCII or UTF8. For the purposes of this explanation, they're equivalent.
When you type, say, 123, read isn't getting that as a number. It's seeing that as a sequence of bytes -- since you said that it should look to fill a 2-char buffer, it sees the first two bytes: First, 0x31, then 0x32. It reads the first byte, and then the second; it doesn't interpret them into numbers. That the series of bytes happens to represent a number when decoded as ASCII or UTF8 is irrelevant; all C cares about is the actual sequence of bytes, and that's what it gives you.
Note that it doesn't even see the third byte. That's left in the input stream to be consumed by later input operations.
If that's what you want, great! If you want to get a number that's been typed in (i.e. entered as a string of bytes whose values spell out a decimal number), take a look at fscanf and its related functions.
I am trying to read double values from a binary in C, but the binary starts with an integer and then the doubles I am looking for.
How do I skip that first 4 bytes when reading with fread()?
Thanks
Try this:
fseek(input, sizeof(int), SEEK_SET);
before any calls to fread.
As Weather Vane said you can use sizeof(int) safely if the file was generated in the same system architecture as the program you are writing. Otherwise, you should manually specify the size of integer of the system where the file originated.
You can use fseek to skip the initial integer. If you insist on using fread anyway, then you can read the integer first:
fread(ptr, sizeof(int), 1, stream).
Of course you have to declare ptr before calling fread.
As I said, fseek is another option:
fseek(stream, sizeof(int), SEEK_SET).
Beware that fseek moves the file pointer in bytes (1 in the given line from the beginning of the file); integer can be 4 or other number of bytes which is system specific.
Be careful when implementing things like this. If the file isn't created on the same machine, you may get invalid values due to different floating point specifications.
If the file you're reading is created on the same machine, make sure the program that writes it handles the type sizes correctly.
If both writer and reader are developed in C and are supposed to run only on the same machine, use the fseek() with the sizeof(type) used in the writer in the offset parameter.
If the machine that writes the binary isn't the same that will read it, you probably don't want to even read the doubles with fread() as their format may differ due to possible different architectures.
Many architectures rely on the IEEE 754 for floating point format, but if the application is supposed to address multi-platform support, you should make sure that the serialized format can be read from all architectures (or converted while unserializing).
Just read those 4 unneeded bytes, like
char skip[4];
double value;
fread(skip, 4, 1, input); //to skip those four bytes
fread(&value, sizeof(double), 1, input); //then read first double =)
And so on
I hope this question makes sense! I'm currently learning C (go easy!) and I'm interested in how table mappings work.
I'm using the extended ASCII table as an experiment. (http://www.ascii-code.com)
For example I can create a char and set its value to a tilde like so:
char charSymbol = '~';
And I can also specify the exact same value like so:
char charDec = 126;
char charHex = 0x7E;
char charOct = 0176;
char charBin = 0b01111110;
Regardless of which of the above declarations I choose (if I'm understanding things correctly) the value that's held in memory for each of these variables is always exactly the same. That is, the binary representation (01111110)
My question is: does the compiler hold the extended ASCII table and perform the binary value lookup during compilation? And if that's the case, does the machine the program runs on also hold the extended ASCII table, so that when the program is asked to print 01111110 to the screen, it knows to print a "~"?
For most of the code in your question, no ASCII lookup table is needed.
Note that in C, char is an integer type, just like int, but narrower. A character constant like 'x' (for historical reasons) has type int, and on an ASCII-based system 'x' is pretty much identical to 120.
char charDec = 126;
char charHex = 0x7E;
char charOct = 0176;
char charBin = 0b01111110;
(Standard C does not support binary constants like 0b01111110; that's a gcc extension.)
When the compiler sees an integer constant like 126 it computes an integer value from it. For this, it needs to know that 1, 2, and 6 are decimal digits, and what their values are.
char charSymbol = '~';
For this, the compiler just needs to recognize that ~ is a valid character.
The compiler reads all these characters from a text file, your C source. Each character in that file is stored as a sequence of 8 bits, which represent a number from 0 to 255.
So if your C source code contains:
putchar('~');
(and ~ happens to have the value 126), then all the compiler needs to know is that 126 is a valid character value. It generates code that sends the value 126 to the putchar() function. At run time, putchar sends that value to the standard output stream. If standard output is going to a file, the value 126 is stored in that file. If it's going to a terminal, the terminal software will do some kind of lookup to map the number 126 to the glyph that displays as the tilde character.
Compilers have to recognize specific character values. They have to recognize that + is the plus character, which is used to represent the addition operator. But for input and output, no ASCII mapping is needed, because each ASCII character is represented as a number at all stages of processing, from compilation to execution.
So how does a compiler recognize the '+' character? C compilers are typically written in C. Somewhere in the compiler's own sources, there's probably something like:
switch (c) {
...
case '+':
/* code to handle + character */
...
}
So the compiler recognizes + in its input because there's a + in its own source code -- and that + (stored in the compiler source code as the 8-bit number 43) resulted in the number 43 being stored in the compiler's own executable machine code.
Obviously the first C compiler wasn't written in C, because there was nothing to compile it with. Early C compilers may have been written in B, or in BCPL, or in assembly language -- each of which is processed by a compiler or assembler that probably recognizes + because there's a + in its own source code. Each generation of C compiler passes the "knowledge" that + is 43 on to the next C compiler that it compiles. That "knowledge" is not necessarily written in the source code; it's propagated each time a new compiler is compiled using an old one.
For a discussion of this, see Ken Thompson's article "Reflections on Trusting Trust".
On the other hand, you can also have, for example, a compiler running on an ASCII-based system that generates code for an EBCDIC-based system, or vice versa. Such a compiler would have to have a lookup table mapping from one character set to the other.
Actually, technically speaking your text editor is the one with the ASCII (or Unicode) table. The file is saved simply as a sequence of bytes; a compiler doesn't actually need to have an ASCII table, it just needs to know which bytes do what. (Yes, the compiler logically interprets the bytes as ASCII, but if you looked at the compiler's machine code all you'd see is a bunch of comparisons of the bytes against fixed byte values).
On the flip side, the executing computer has an ASCII table somewhere to map the bytes output by the program into readable characters. This table is probably in your terminal emulator.
The C language has pretty weak type safety, which is why you can always assign an integer to a character variable.
You used different representations of an integer to assign to the character variable -- all of which are supported in the C programming language.
When you typed a ~ in the text file of your C program, your text editor converted the keystroke and stored its ASCII equivalent. So when the compiler parsed the C code, it did not sense that what was written was a ~ (tilde). While parsing, when the compiler encountered the ASCII equivalent of ' (a single quote), it went into a mode where it read the next byte as something that fits in a char variable, followed by another ' (single quote). Since a char variable can hold 256 different values, it covers the whole ASCII set, extended characters included.
This is same when you use an assembler.
Printing to the screen is an entirely different game -- that is part of the I/O system.
When you press a specific key on the keyboard, a mapped integer goes in and settles in the memory of the reading program. Similarly, when you print a specific integer on a printer or screen, that integer takes the shape of the corresponding character.
Therefore, if you want to print an integer stored in an int variable, there are routines that convert each of its digits and send the ASCII code for each of them, and the I/O system converts those into characters.
All those values are exactly equal to each other - they're just different representations of the same value, so the compiler sees them all in exactly the same way after translation from your written text into the byte value.
I have to write a byte in hex to a file but I have a problem. For example.
If I have:
unsigned char a = 0x0;
and I write to a file using fwrite:
FILE *fp = fopen("file.txt", "wb");
fwrite(&a,sizeof(unsigned char),1,fp);
fclose(fp);
When I open the file, I always see 20h -- why not 00h?
So, I try to use:
fprintf(fp,"%x",a);
In this case I see 0h, but I need a full byte, not a nibble.
What should I do?
The first example is hard to believe, it ought to generate a file with a single byte with the value 0 in it. That's not really a text file though, so I guess your tools might fool you.
The second attempt is better, assuming you want a text file with a text representation of the value in it. To make it two hexadecimal digits, specify a width and padding:
fprintf(fp, "%02x", a);
Please note that there is no such thing as "a hex value". A value is a value; it can be represented as hex, but that's not part of the value. 100 decimal is the same thing as 64 in hex, and 1100100 in binary. The base only matters when representing the number as a string of digits, the number itself can't "be hex".
I want to ask few questions about bits and bytes as I am very confused.
For example, suppose I have a short int x = 345;. I know that a short takes 16 bits, but when I write it to a file as text it is written as the characters '3', '4', '5', each taking 1 byte, for a total of 24 bits.
Is there a way to write the number (short integer) in file as short integer taking 16 bits?
Also, is my understanding correct? What will be the difference, in terms of bytes, between writing to a binary file and a text file?
Yes, there is a way.
uint16_t sh = 345;
fwrite(&sh, sizeof(sh), 1, fp);
In the case you mentioned 345 is written as text (for example ASCII if that's what you use). In the example I posted, the binary representation of sh is written in the file and it will take only 2 bytes.
What will be the difference in writing to my file if the file is
binary or text in terms of bytes?
Text write (fprintf)
0000000: 00110011 00110100 00110101
3 4 5
Binary write (fwrite)
0000000: 01011001 00000001
#Little endian. Read as: 00000001 01011001 = 345
If interoperability is an issue (i.e. you want to send the file to another machine), the text format is the superior choice, as it's portable.
If you write the value as a string, it will occupy at least three bytes for the three digits; there would often be a newline or space as well to mark the end of the value.
Yes, you can write the value as 2 bytes. One way would be:
fwrite(&x, sizeof(x), 1, fp);
The difference between binary and text is that you can transport the text between different types of machine with almost complete impunity and all the machines will interpret the data the same way. The binary file can only be interpreted on machines that have the same endian-ness (big-endian or non-Intel vs little-endian or Intel). On the other class of machines, you have to swap the order of the bytes to get the binary data interpreted correctly.