Reading from binary after x bytes in C

I am trying to read double values from a binary file in C, but the file starts with an integer, followed by the doubles I am looking for.
How do I skip those first 4 bytes when reading with fread()?
Thanks

Try this:
fseek(input, sizeof(int), SEEK_SET);
before any calls to fread.
As Weather Vane said, you can safely use sizeof(int) if the file was generated on the same system architecture as the program you are writing. Otherwise, you should explicitly specify the size of an integer on the system where the file originated.

You can use fseek to skip the initial integer. If you insist on using fread anyway, then you can read the integer first:
fread(ptr, sizeof(int), 1, stream).
Of course, ptr has to point to valid storage for an int (for example, the address of an int variable) before you call fread.
As I said, fseek is another option:
fseek(stream, sizeof(int), SEEK_SET).
Beware that fseek moves the file position in bytes (here, sizeof(int) bytes from the beginning of the file), and an int can be 4 bytes or some other, system-specific size.

Be careful when implementing things like this. If the file isn't created on the same machine, you may get invalid values due to different floating point specifications.
If the file you're reading is created on the same machine, make sure that the program that writes it handles the type sizes correctly.
If both writer and reader are developed in C and are supposed to run only on the same machine, use fseek() with the sizeof(type) used by the writer as the offset parameter.
If the machine that writes the binary isn't the same as the one that reads it, you probably don't even want to read the doubles with fread(), as their format may differ between architectures.
Many architectures use IEEE 754 for their floating-point format, but if the application is supposed to support multiple platforms, you should make sure that the serialized format can be read on all architectures (or converted while deserializing).

Just read those 4 unneeded bytes, like
double value;
unsigned char skip[4];
fread(skip, 1, sizeof skip, input); //read and discard those four bytes
fread(&value, sizeof(double), 1, input); //then read the first double directly =)
And so on

Related

Writing a byte to a file and then reading the same byte are not the same

Basically I have a file, and in this file I am writing 3 bytes, and then I'm writing a 4 byte integer. In another application I read the first 3 bytes, and then I read the next 4 bytes and convert them to an integer.
When I print out the value, I get a very different result...
fwrite(&recordNum, 2, 1, file); //The first 2 bytes (recordNum is a short int)
fwrite(&charval, 1, 1, file); //charval is a single byte char
fwrite(&time, 4, 1, file);
// I continue writing a total of 40 bytes
Here is how time was calculated:
time_t rawtime;
struct tm * timeinfo;
time(&rawtime);
timeinfo = localtime(&rawtime);
int time = (int)rawtime;
I have tested to see that sizeof(time) is 4 bytes, and it is. I have also tested using an epoch converter to make sure this is the correct time (in seconds) and it is.
Now, in another file I read the 40 bytes to a char buffer:
char record[40];
fread(record, 1, 40, file);
// Then I convert those 4 bytes into an uint32_t
uint32_t timestamp =(uint32_t)record[6] | (uint32_t)record[5] << 8 | (uint32_t)record[4] << 16 | (uint32_t)record[3] << 24;
printf("Testing timestamp = %d\n", timestamp);
But this prints out -6624. The expected value is 551995007.
EDIT
To be clear, everything else that I am reading from the char buffer is correct. After this timestamp I have text, which I simply print and it runs fine.
You write the time all at once with fwrite, which uses the native byte ordering, and then you explicitly read the individual bytes in big-endian format (most significant byte first). Your machine is likely using little-endian byte ordering, which would explain the difference.
You need to read/write in a consistent manner. The simplest way to do this is to fread one variable at a time, just like you're writing:
fread(&recordNum, sizeof(recordNum), 1, file);
fread(&charval, sizeof(charval), 1, file);
fread(&time, sizeof(time), 1, file);
Also note the use of sizeof to calculate the size.
Your problem is probably right here:
uint32_t timestamp =(uint32_t)record[6] | (uint32_t)record[5] << 8 | (uint32_t)record[4] << 16 | (uint32_t)record[3] << 24;
printf("Testing timestamp = %d\n", timestamp);
You've used fwrite to write out a 32-bit integer in whatever order the processor stored it in memory, and you don't actually know what byte ordering (endian-ness) the machine used. Maybe the first byte written out is the lowest byte of the integer, or maybe it's the highest byte of the integer.
If you're reading and writing the data on the same machine, or on different machines with the same architecture, you don't need to care about that; it will work. But if the data is written on an architecture with one byte ordering and potentially read on an architecture with another byte ordering, it will be wrong: your code needs to know what order the bytes should be in memory and what order they will be read/written on disk.
In this case, your code does a mix of both: you write the bytes out in whatever endian-ness the machine uses natively, and then when you read them in, you start shifting the bits around as if you knew what order they were originally in. But you don't, because you didn't pay attention to the order when you wrote them out.
So if you're writing and reading the file on the same machine, or an identical machine (same processor, OS, compiler, etc.), just write the values out in the native order (without worrying about what that is) and then read them back in exactly as you wrote them out.
So if your timestamp is located at offset 3 through 6 of your record, just do this:
uint32_t timestamp;
memcpy(&timestamp, record+3, sizeof(timestamp)); // needs <string.h>
Note that you cannot directly cast record+3 to a uint32_t pointer, because it might violate the system's alignment requirements (and C's strict aliasing rules); memcpy avoids both problems.
Note also that you should probably be using the time_t type to hold the timestamp; on a Unix-like system, that is the natural type for epoch time values.
But if you are planning to move this file to another machine at any point and try to read it there, you could easily end up with your data on a system that has different endian-ness or different size for time_t. Simply writing bytes in and out of a file with no thought to the endian-ness or size of types on different operating systems is just fine for temporary files or for files which are meant to be used on one computer only and which will never be moved to other types of system.
Making data files that are portable between systems is a whole subject in itself. But the first thing you should do, if you care about that, is look at htons(), ntohs(), htonl(), ntohl(), and their ilk, which convert between the system's native endian-ness and a known (big) endian-ness, the standard for internet communications and generally used for interoperability (even though Intel processors are little-endian and dominate the market these days). These functions do something similar to what you were doing with your bit-shifting, but since someone else wrote them, you don't have to. It's a lot easier to use the library functions for this!
For example:
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
int main(void) {
uint32_t x = 1234, z;
// open a file for writing in binary mode, convert x from native to big endian, write it.
FILE *file = fopen("foo.txt", "wb");
z = htonl(x);
fwrite(&z, sizeof(z), 1, file);
fclose(file);
// read the 4 bytes back, then convert from big endian to native order.
file = fopen("foo.txt", "rb");
fread(&z, sizeof(z), 1, file);
x = ntohl(z);
fclose(file);
printf("%u\n", (unsigned)x);
return 0;
}
NOTE I am NOT CHECKING FOR ERRORS in this code, it is just an example.. do not use functions like fopen, fread etc without checking for errors.
By using these functions both when writing the data out to disk and when reading it back, you guarantee that the data on disk is always big-endian. For example, htonl() does nothing on a big-endian platform, and on a little-endian platform it converts from little-endian to big-endian; ntohl() does the opposite. So your data on disk will always be read in correctly.

Convert 2 bytes or 3 bytes to a float in C

I'm working with a low-level protocol that has a high data rate and therefore uses 2 or 3 bytes to represent a float depending on the range of the number, to make the system more efficient.
I'm trying to parse these numbers but the values I get don't make sense to me, they're all zero and I don't think the device would output zero for the variables in question.
The first 5 bytes in my buffer are: FF-FF-FF-FF-FF
The first two bytes make up a float t. It should be noted that the documentation says that the bytes are little endian.
To parse t I do:
float t = 0;
memcpy(buffer, &t, 2);
The next 3 bytes make up float ax, to parse that I do:
float ax = 0;
memcpy(buffer+2, &ax, 3);
Is this the correct way to handle this? I set both t and ax to zero first in case there are random bytes hanging around.
Update
The documentation is not great. Firstly they define a Float as a 32-bit IEEE 754 floating-point number.
Then there is this quote:
To increase efficiency many of the data packets are sent as 24-bit signed integer words because 16-bits do not provide the range/precision required for many of the quantities, whereas 32-bit precision makes the packet much longer than required.
Then there is a table which defines t as the first 2 bytes of the buffer. It states that the range is 0-59.999. It doesn't explicitly say that it's a Float, I'm just making that assumption.
Possibly you have the arguments the wrong way around. Change as follows:
memcpy (buffer, &t, 2);
to
memcpy (&t, buffer, 2);
HTH
[edit: To clarify for everyone voting my answer down: the question does indeed have the arguments the wrong way around. The code tries to read from buffer but passes it as the destination argument of memcpy.]

fwrite not behaving as it should be

I have a C program that writes to a file using fwrite(..) and the result is not consistent with the function's arguments I provide.
uint32_t x = 1009716161;
FILE * file = fopen("helloop.txt", "wb+");
rewind(file);
fwrite( &x, sizeof(uint32_t), 1, file);
fclose(file);
When I check the file afterward, it seems to contain symbols that do not translate into anything:
>cat helloop.txt
>Á/<
as I should be getting this
>cat helloop.txt
>000000003C2F0BC1
I checked the file's permissions and I chmodded it
chmod 777 helloop.txt
The way I see it, I have one element, a 32-bit integer, that I want to write to the file.
What am I doing wrong?
Your program did exactly what you told it to.
In general, if a program you've written seems to be misbehaving, you shouldn't jump to the conclusion that the operating system, the compiler, or the runtime library is at fault. It's far more likely that there's a bug in your program, or that you've misunderstood something. That remains true no matter how many years of experience you have.
The value 1009716161, expressed in hexadecimal, is 0x3c2f0bc1. When you write that value to a binary file, you write four 8-bit bytes, with values 0x3c, 0x2f, 0x0b, and 0xc1. In ASCII, those are '<', '/', and two characters outside the printable ASCII range. The order in which they're written depends on the endianness of your system, but the contents you report seem consistent with that.
I'm not sure why you'd expect to see 000000003C2F0BC1. That's 16 bytes, when you only wrote 4 bytes to the file. Furthermore, binary files do not contain an ASCII rendering of the hexadecimal representation of the data you wrote; they just contain the data.
If you examine the file by converting it from raw binary to hexadecimal (by using the hexdump or od -x command if your system has it), you should see something recognizable.
And if you open the file in binary mode and use fread to read the data back into a uint32_t object, you should get the original value 1009716161 back -- which is the whole point.
cat helloop.txt
Á/<
cat prints character data. It doesn't print a 4-byte value in a file as a 32-bit integer.
as I should be getting this
cat helloop.txt
000000003C2F0BC1
No you shouldn't, not with cat. You'd have to write the character string "000000003C2F0BC1" to the file if you expect to get that. The file would probably be 16 characters long. I'd bet right now if you run ls -l helloop.txt you'll see size 4 because you wrote a uint32_t binary integer to the file.
what am I doing wrong ?
As far as I can tell, the only thing you've done wrong is expecting cat to print out your uint32_t as a hexadecimal representation. (Though I did not check your hex value, so that may be incorrect as well)
See if you have hexdump on your Linux machine, it may give you better results.
EDIT: If you actually DO want to print a uint32_t as a hex string, you can use fprintf(..., "%x", x) with the x or X format specifier, but keep in mind this is not compatible with fwrite/fread, so to read it back in you'll have to read in the string and convert back from hex to int.

How many bits are read by fgetc in a stream?

How many bits are read by the function fgetc in a stream?
The man page of fgetc says that this function reads a "character", but "character" is not a clear definition to me. How many bits does a "character" contain? Is reading a character with fgetc equivalent to reading a byte?
Does it depend on the architecture of the machine and on the size of "char" or "byte"?
My objective is to read binary data from a stream portably (byte = 8 bits or byte = 16 bits). Is it a better idea to use fread/fwrite with types like uintN_t instead of fgetc in order to control how many bits are read from the stream? Is there a better solution?
How many bits does contain a "character" ?
A character contains precisely CHAR_BIT bits, an implementation-specific value defined in limits.h.
/* Number of bits in a `char'. */
# define CHAR_BIT 8
Is reading a character with fgetc equivalent as reading a byte
Yup, fgetc reads exactly one byte.
This portability problem isn't easily solvable. The best way around it is to not make assumptions on the binary representation.
fgetc reads exactly one byte. A character type (signed char, char, unsigned char and qualified versions) contains CHAR_BIT bits (<limits.h>), which is a constant of at least 8.
Your platform has a smallest unit of data, which corresponds to the C data type char. All I/O happens in units of chars. You are guaranteed that a char can hold the values 0–127, and either 0–255 or −127–127. Everything else is platform-specific. (The actual number of bits inside a char is contained in the macro CHAR_BIT.)
That said, as long as you only write and read values within the advertised range into each char, you are guaranteed that your program will work on any conforming platform. The only thing you are not guaranteed is that the resulting data stream will be bit-for-bit identical across platforms.

Range of MPI_offset in MPI

I am using the MPI 2.2 standard to write a parallel program in C. I have a 64-bit machine.
/* MPI offset is long long*/
MPI_Offset my_offset; printf ("%3d: my offset = %lld\n", my_rank, my_offset);
int count;
MPI_Get_count(&status, MPI_BYTE, &count);
printf ("%3d: read =%d\n", my_rank, count);
I am reading a very large file byte by byte. To read the file in parallel, I set the offset for each process using the offset variable. I am confused about the data type of MPI_Offset: is it a signed or unsigned long?
My second question is about the range of the count variable used in the MPI_Get_count() function. Since this function is used here to count all the elements read into each process's buffer, I think it should also be of type long long to read such a large file.
MPI_Offset's size isn't defined by the standard; it is, roughly, as large as possible. ROMIO, a widely used underlying implementation of MPI-IO, uses 8-byte integers on systems which support them. You can probably find out for sure by looking in your system's mpi.h.
MPI_Offset is very definitely signed; there are functions like MPI_File_seek where it is perfectly reasonable to have values of type MPI_Offset take negative values.
MPI_Get_count returns an integer, of normal integer size, and this can certainly cause problems for some large file IO strategies.
Generally, it's better for a number of reasons not to use small low-level units of IO like bytes when doing MPI-IO; it's better in terms of performance and code readability to express the IO in units of your underlying data types. In doing so, these size limitations become less of an issue. If your underlying data type really is bytes, though, there aren't many options.
Did you try interleaving MPI_File_read with something like MPI_File_seek(mpiFile, mpiOffset, MPI_SEEK_CUR)? This way you may be able to avoid MPI_Offset overflow.
