How does C read bytes using the read syscall?

I need to read an unsigned short using the read system call.
According to the manpage:
read() attempts to read up to count bytes from file descriptor fd
into the buffer starting at buf.
In my case, an unsigned short is two bytes in size, so it can store numbers up to 65535. But when I execute this code:
char buf[2];
bytes_read = read(0, buf, 2);
bytes_wrote = write(1, buf, 2);
and type in the command line, say, the number 123, it outputs only 12. Does it not read bytes, but symbols? How can I read a value with more than 2 symbols into a 2-byte buffer? For example, the maximum value of an unsigned short. I found nothing in either K&R or the manpages about it, so I think it must be very simple.

NB: I'm assuming your terminal uses either ASCII or UTF8. For the purposes of this explanation, they're equivalent.
When you type, say, 123, read isn't getting that as a number. It's seeing that as a sequence of bytes -- since you said that it should look to fill a 2-char buffer, it sees the first two bytes: First, 0x31, then 0x32. It reads the first byte, and then the second; it doesn't interpret them into numbers. That the series of bytes happens to represent a number when decoded as ASCII or UTF8 is irrelevant; all C cares about is the actual sequence of bytes, and that's what it gives you.
Note that it doesn't even see the third byte. That's left in the input stream to be consumed by later input operations.
If that's what you want, great! If you want to get a number that's been typed out (i.e. entered as a string of bytes whose values align with a decimal number), take a look at fscanf and its related functions.

Related

How to properly receive hex data from serial communication in linux

I have been searching for this issue on the internet but haven't found any good solution. I am on Ubuntu and I am receiving hex data on a serial port. I have connected my system to a Windows system using a null modem cable; the Windows system is running the Docklight software, which is sending hex data like below:
2E FF AA 4F CC 0D
Now by saving the data, I do not mean that I want to print the data on the terminal. I just want to save this data as it is in a buffer so that I can later process it.
To read the data I am doing
res = read(fd, buf, sizeof buf);
// where buf is
unsigned char buf[255];
At this point after reading, buf contains some random chars. From some links on internet, I got a way to convert it into hex:
unsigned char hexData[255];
for (int i = 0; i < res; i++)
{
sprintf((char*)hexData+i*2, "%02X", buf[i]);
}
Now this hexData contains 2EFFAA4FCC0D, which is OK for my purpose. (I do not know if it is the right way of doing it or not.) Now let's say I want to convert E from the hexData into decimal. But at this point, E will be considered a character, not a hex digit, so maybe I'll get 69 instead of 14.
How can I save hex data in a variable? Should I save it as an array of chars or ints? Thanks.
You already have the data in binary form in buf.
But if you still need to convert a hex pair to decimal, you can use sscanf:
unsigned int intVar;
sscanf(&hexBuf[i], "%2X", &intVar); // intVar is an unsigned int
It will convert the hex byte formed by hexBuf[i] and hexBuf[i+1] to binary in intVar. When you printf intVar with %d you will see the decimal value.
You can store intVar as an element of an unsigned char array.
You may want to think about what you're trying to achieve.
Hexadecimal is just a representation. The byte you are receiving could be shown as hexadecimal pairs, as binary octet or as a series of (more or less) printable characters (what you see if you print your unsigned char array).
If what you need is storing only the hexadecimal representation of those bytes, convert them to hexadecimal as you are doing, but remember you'll need an array twice as big as your buffer (since a single byte will be represented by two hexadecimal characters once you convert it).
Usually, the best thing to do is to keep them as an array of bytes (chars), that is, your buf array, and only convert them when you need to show the hexadecimal representation.

Reading from binary after x bytes in C

I am trying to read double values from a binary in C, but the binary starts with an integer and then the doubles I am looking for.
How do I skip those first 4 bytes when reading with fread()?
Thanks
Try this:
fseek(input, sizeof(int), SEEK_SET);
before any calls to fread.
As Weather Vane said you can use sizeof(int) safely if the file was generated in the same system architecture as the program you are writing. Otherwise, you should manually specify the size of integer of the system where the file originated.
You can use fseek to skip the initial integer. If you insist on using fread anyway, then you can read the integer first:
fread(ptr, sizeof(int), 1, stream).
Of course you have to declare ptr before calling fread.
As I said, fseek is another option:
fseek(stream, sizeof(int), SEEK_SET).
Beware that fseek moves the file pointer in bytes (in the given line, sizeof(int) bytes from the beginning of the file); an integer can be 4 or some other number of bytes, which is system specific.
Be careful when implementing things like this. If the file isn't created on the same machine, you may get invalid values due to different floating point specifications.
If the file you're reading is created on the same machine, make sure that the program that writes it correctly addresses the type sizes.
If both writer and reader are developed in C and are supposed to run only on the same machine, use the fseek() with the sizeof(type) used in the writer in the offset parameter.
If the machine that writes the binary isn't the same that will read it, you probably don't want to even read the doubles with fread() as their format may differ due to possible different architectures.
Many architectures rely on the IEEE 754 for floating point format, but if the application is supposed to address multi-platform support, you should make sure that the serialized format can be read from all architectures (or converted while unserializing).
Just read those 4 unneeded bytes, like
void *buffer = malloc(sizeof(double));
fread(buffer, 4, 1, input);               // to skip those four bytes
fread(buffer, sizeof(double), 1, input);  // then read the first double
double *data = (double *)buffer;          // then convert it to double
And so on

fwrite not behaving as it should be

I have a C program that writes to a file using fwrite(..) and the result is not consistent with the function's arguments I provide.
uint32_t x = 1009716161;
FILE * file = fopen("helloop.txt", "wb+");
rewind(file);
fwrite( &x, sizeof(uint32_t), 1, file);
fclose(file);
When I check the file afterwards it seems to contain symbols that do not translate into anything:
>cat helloop.txt
>Á/<
as I should be getting this
>cat helloop.txt
>000000003C2F0BC1
I checked the file's permissions and I chmodded it
chmod 777 helloop.txt
The way I see it I have a 1 element of size 32 bit integer that I want to write to file,
what am I doing wrong ?
Your program did exactly what you told it to.
In general, if a program you've written seems to be misbehaving, you shouldn't jump to the conclusion that the operating system, the compiler, or the runtime library is at fault. It's far more likely that there's a bug in your program, or that you've misunderstood something. That remains true no matter how many years of experience you have.
The value 1009716161, expressed in hexadecimal, is 0x3c2f0bc1. When you write that value to a binary file, you write 4 8-bit bytes, with values 0x3c, 0x2f, 0x0b, and 0xc1. In ASCII, those are '<', '/', and two characters outside the printable ASCII range. The order in which they're written depends on the endianness of your system, but the contents you report seem consistent with that.
I'm not sure why you'd expect to see 000000003C2F0BC1. That's 16 bytes when you only wrote 4 bytes to the file. Furthermore, binary files do not contain an ASCII rendering of the hexadecimal representation of the data you wrote -- they just contain the data.
If you examine the file by converting it from raw binary to hexadecimal (by using the hexdump or od -x command if your system has it), you should see something recognizable.
And if you open the file in binary mode and use fread to read the data back into a uint32_t object, you should get the original value 1009716161 back -- which is the whole point.
cat helloop.txt
Á/<
cat prints character data. It doesn't print a 4-byte value in a file as a 32-bit integer.
as I should be getting this
cat helloop.txt
000000003C2F0BC1
No you shouldn't, not with cat. You'd have to write the character string "000000003C2F0BC1" to the file if you expect to get that; the file would then be 16 characters long. I'd bet right now if you run ls -l helloop.txt you'll see size 4, because you wrote a uint32_t binary integer to the file.
what am I doing wrong ?
As far as I can tell, the only thing you've done wrong is expecting cat to print out your uint32_t as a hexadecimal representation. (Though I did not check your hex value, so that may be incorrect as well)
See if you have hexdump on your Linux machine, it may give you better results.
EDIT: If you actually DO want to print a uint32_t as a hex string, you can use fprintf(..., "%x", x) with the x or X format specifier, but keep in mind this is not compatible with fwrite/fread, so to read it back in you'll have to read in the string and convert back from hex to int.

Difference between binary zeros and ASCII character zero

gcc (GCC) 4.8.1
c89
Hello,
I was reading a book about pointers. And using this code as a sample:
memset(buffer, 0, sizeof buffer);
will fill the buffer with binary zero and not the character zero.
I am just wondering what is the difference between the binary and the character zero. I thought it was the same thing.
I know that textual data is human readable characters and binary data is non-printable characters. Correct me if I am wrong.
What would be a good example of binary data?
For example, if you want to write data to a file: if you are dealing with strings (textual data) you would use fprintf, and if you are dealing with binary data you would use fwrite.
Many thanks for any suggestions,
The quick answer is that the character '0' is represented in binary data by the ASCII number 48. That means, when you want the character '0', the file actually has these bits in it: 00110000. Similarly, the printable character '1' has a decimal value of 49, and is represented by the byte 00110001. ('A' is 65, and is represented as 01000001, while 'a' is 97, and is represented as 01100001.)
If you want the null terminator at the end of the string, '\0', that actually has a 0 decimal value, and so would be a byte of all zeroes: 00000000. This is truly a 0 value. To the compiler, there is no difference between
memset(buffer, 0, sizeof buffer);
and
memset(buffer, '\0', sizeof buffer);
The only difference is a semantic one to us. '\0' tells us that we're dealing with a character, while 0 simply tells us we're dealing with a number.
It would help you tremendously to check out an ASCII table.
fprintf outputs data using ASCII and outputs strings. fwrite writes pure binary data. If you fprintf(fp, "0"), it will put the value 48 in the file, while if you fwrite the value 0 it will put the actual value 0 in the file. (Note, my usage of fprintf and fwrite here is obviously not proper usage, but it shows the point.)
Note: My answer refers to ASCII because it's one of the oldest, best known character sets, but as Eric Postpichil mentions in the comments, the C standard isn't bound to ASCII. (In fact, while it does occasionally give examples using ASCII, the standard seems to go out of its way to never assume that ASCII will be the character set used.). fprintf outputs using the execution character set of your compiled program.
If you are asking about the difference between '0' and 0, these two are completely different:
Binary zero corresponds to a non-printable character \0 (also called the null character), with the code of zero. This character serves as null terminator in C string:
5.2.1.2 A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.
ASCII character zero '0' is printable (not surprisingly, producing a character zero when printed) and has a decimal code of 48.
Binary zero: 0
Character zero: '0', which in ASCII is 48.
binary data: the raw data that the cpu gets to play with, bit after bit, the stream of 0s and 1s (usually organized in groups of 8, aka Bytes, or multiples of 8)
character data: bytes interpreted as characters. Conventions like ASCII give the rules how a specific bit sequence should be displayed by a terminal, a printer, ...
for example, the binary data (bit sequence ) 00110000 should be displayed as 0
if I remember correctly, the unsigned integer datatypes have a direct match between the binary value of the stored bits and the interpreted value (ignoring strangeness like endianness ^^).
On a higher level, for example talking about ftp transfer, the distinction is made between:
the data should be interpreted as (multi)byte characters, aka text (this includes non-character signs like a line break)
the data is a big bit/bytestream, that can't be broken down in smaller human readable bits, for example an image or a compiled executable
In ASCII every character has a code, and the code of the character zero '0' is 0x30 (48 decimal).
To fill this buffer with the character zero you must use that code:
memset(buffer, '0', sizeof buffer); /* same as 0x30 in ASCII */

How many bits are read by fgetc in a stream?

How many bits are read by the function fgetc in a stream?
The man page of fgetc tells me that this function reads a "character", but "character" is not a clear definition to me. How many bits does a "character" contain? Is reading a character with fgetc equivalent to reading a byte?
Does it depend on the architecture of the machine and on the size of "char" or "byte"?
My objective is to read binary data in a stream with portability (byte=8bits or byte=16bits). Is it a better idea to use fread/fwrite with types like uintN_t instead of fgetc in order to control how many bits are read from the stream? Is there a better solution?
How many bits does a "character" contain?
A character contains precisely CHAR_BIT bits, an implementation-specific value defined in limits.h.
/* Number of bits in a `char'. */
# define CHAR_BIT 8
Is reading a character with fgetc equivalent to reading a byte?
Yup, fgetc reads exactly one byte.
This portability problem isn't easily solvable. The best way around it is to not make assumptions on the binary representation.
fgetc reads exactly one byte. A character type (signed char, char, unsigned char, and qualified versions) contains CHAR_BIT bits (<limits.h>), a constant that is at least 8.
Your platform has a smallest unit of data, which corresponds to the C data type char. All I/O happens in units of chars. You are guaranteed that a char can hold the values 0–127, and either 0–255 or −127–127. Everything else is platform-specific. (The actual number of bits inside a char is contained in the macro CHAR_BIT.)
That said, as long as you only write and read values within the advertised range into each char, you are guaranteed that your program will work on any conforming platform. The only thing you are not guaranteed is that the resulting data stream will be binarily identical.
