Most efficient way to write a number in a file in C? - c

I need to keep track of an int value greater than 255 in a file. It is larger than the largest unsigned char, so fputc does not seem reliable (first question: is that always true?).
I could use fputs by converting the digits to characters to obtain a string, but in the program I need the number as an int too!
So, the question in the title: what is the most efficient way to write that number? Is there any way to avoid the conversion to a string?
Keep in mind that the file should then be read by another process, where the stored number should become an int again.

Just write out the binary representation:
int fd;
...
int foo = 1234;
write (fd, &foo, sizeof(foo));
(and add error handling).
Or if you like FILE*
FILE *file;
...
int foo = 1234;
fwrite (&foo, sizeof(foo), 1, file);
(and add error handling).
Note that if your file is to be loaded on a different system, potentially with different endianness, you might want to ensure the byte order is fixed (e.g. most significant byte or least significant byte first). You can use htonl, htons etc. for this if you want. If you know the architecture that is loading the file is the same as the one saving it, there is no need for this.
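As a rough sketch of the fixed-byte-order idea, the number can also be split into bytes by hand instead of calling htonl. The helper names write_u32_be and read_u32_be are made up for this illustration, and the sketch assumes the value fits in 32 bits:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write v as 4 bytes, most significant byte first.
   Returns 0 on success, -1 on a short write. */
int write_u32_be(FILE *fp, uint32_t v)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Hypothetical helper: read 4 bytes written by write_u32_be.
   Returns 0 on success, -1 on a short read. */
int read_u32_be(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 0;
}
```

Because the shifts work on values rather than on memory, the file contents come out the same regardless of the endianness of the machine that wrote them.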

Related

Segmentation Fault 11 when trying to read an image byte per byte

I'm trying to write a simple C program that counts how many times each byte value is repeated in a file. We tried the code with .txt files and it works wonders (max size tested: 137MB). But when we tried it with an image (even a small one, 2KB) it returned Segmentation Fault 11.
I've done some research and found some specific libs for images, but I don't want to resort to them since the code is not only meant for images, but for virtually any type of file. Is there a way to simply read a file byte by byte regardless of anything else (extension, meta, etc.)?
This is the code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
FILE *f;
char *file;
long numTotalBytes = 0;
int bytesCount[256] = {0};
f = fopen ( argv[1], "rb");
fseek(f, 0L, SEEK_END);
numTotalBytes = ftell(f);
rewind(f);
file = calloc(1, numTotalBytes);
fread(file, numTotalBytes, 1, f);
fclose(f);
printf("numTotalBytes: %ld", numTotalBytes); //<- this gives the right output even for images
unsigned int i;
for (i=0; i<numTotalBytes; ++i) {
unsigned char pointer = file[i]; //<- This access fails at file[1099]
int pointer_int = (int)pointer;
printf("iteration %i with pointer at %i\n", i, pointer_int); //<- pointer_int is never below 0 or above 255
//++bytesCount[(int)file[i]];
++bytesCount[pointer_int];
}
free(file);
}
Some extra info:
- Changing the extension of the img to .txt doesn't work.
- The code returns Segmentation Fault exactly at iteration 1099 (file I'm using is aprox 163KB so file[i] should accept accesses up to aprox file[163000]).
- For txt files works perfect. Reads the bytes one by one and counts them as expected, regardless of file size.
- I'm on Mac (you never know...)
//EDIT: I have edited the code into a more broken-down and explanatory version because some of you were telling me things I've already tried.
//EDIT_2: Ok guys, never mind. This version should work on any computer that is not mine. I think the problem is with my terminal when passing arguments, but I just switched OS and it works.
Do check if fopen() and calloc() are successful.
The format specifier to print long is %ld, not %lu.
(int)file[i] is bad for array index because converting char to int will preserve its value if all values that can be represented as char are representable in int, and because if char is signed in your environment (and setting), it may access negative index, cause out-of-range access and invoke undefined behavior.
You should change ++bytesCount[(int)file[i]]; to ++bytesCount[(unsigned char)file[i]]; in order to prevent using negative index.
Also note that fseek() with SEEK_END may not be supported for binary streams (N1570 7.21.9.2 The fseek function), so it is better to read one byte at a time using fgetc() in order to avoid undefined behavior and to use less memory.
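A minimal sketch of the fgetc() approach might look like this. The function name count_bytes is invented for the example; it assumes the stream was opened in binary mode ("rb"):

```c
#include <stdio.h>

/* Hypothetical helper: count how often each byte value occurs in fp.
   Returns the number of bytes read, or -1 on a read error. */
long count_bytes(FILE *fp, unsigned long counts[256])
{
    long total = 0;
    int c;

    for (int i = 0; i < 256; ++i)
        counts[i] = 0;

    /* fgetc returns the byte as an unsigned char converted to int,
       so c is always in 0..255 here and is safe to use as an index. */
    while ((c = fgetc(fp)) != EOF) {
        ++counts[c];
        ++total;
    }
    return ferror(fp) ? -1 : total;
}
```

No file-size query and no buffer allocation are needed, and the index can never be negative.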
MikeCAT just beat me to it. A bit more explanation follows, in case it helps.
To fix: change file to unsigned char *file and the increment to ++bytesCount[file[i]];.
Explanation: per this answer, a plain char may be signed or unsigned. In this case, I'm guessing it defaults to signed. That means any value >=0x80 will become a negative number. Such values are not likely to be in your English-language text file, but are very likely to be in an image! The typecast to (int) will keep negatives negative. Therefore, the code will index bytesCount with a negative number, leading to the segmentation fault.
It might be caused by this line
++bytesCount[(int)file[i]];
The bytesCount is an array of 256 ints. If file[i] falls outside the range 0 to 255 (which happens for negative values when char is signed), you are accessing invalid memory and that can cause a segmentation fault.

write() bad address

I am trying to write out the size in bytes of a string that is defined as
#define PATHA "/tmp/matrix_a"
using the code
rtn=write(data,(strlen(PATHA)*sizeof(char)),sizeof(int));
if(rtn < 0)
perror("Writing data_file 2 ");
I get back Writing data_file 2 : Bad address
What exactly about this is a bad address? The data file descriptor is open, and writes correctly immediately before and after the above code segment. The data to be written to the file data needs to be raw, and not ASCII.
I have also tried defining the string as a char[] with the same issue
The second argument to write() is the address of the bytes you want to write, but you are passing the bytes you want to write themselves. In order to get an address, you must store those bytes in a variable (you can't take the address of the result of an expression). For example:
size_t patha_len = strlen(PATHA);
rtn = write(data, &patha_len, sizeof patha_len);
The arguments to POSIX write() are:
#include <unistd.h>
ssize_t write(int fildes, const void *buf, size_t nbyte);
That's a:
file descriptor
buffer
size
You've passed two sizes instead of an address and a size.
Use:
rtn = write(data, PATHA, sizeof(PATHA)-1);
or:
rtn = write(data, PATHA, strlen(PATHA));
If you are seeking to write the size of the string as an int, then you need an int variable to pass to write(), like this:
int len = strlen(PATHA);
rtn = write(data, &len, sizeof(len));
Note that you can't just use a size_t variable unless you want to write a size_t; on 64-bit Unix systems, in particular, sizeof(size_t) != sizeof(int) in general, and you need to decide which size it is you want to write.
You also need to be aware that some systems are little-endian and others big-endian, and what you write using this mechanism on one type is not going to be readable on the other type (without mapping work done before or after I/O operations). You might choose to ignore this as a problem, or you might decide to use a portable format (usually, that's called 'network order', and is equivalent to big-endian), or you might decide to define that your code uses the opposite order. You can write the code so that the same logic is used on all platforms if you're careful (and all platforms get the same answers).
The second argument to write() is the buffer and third argument is the size:
ssize_t write(int fd, const void *buf, size_t count);
The posted code passes the length which is interpreted as an address which is incorrect. The compiler should have emitted a warning about this (don't ignore compiler warnings and compile with the warning level at the highest level).
Change to:
rtn=write(data, PATHA, strlen(PATHA));
Note sizeof(char) is guaranteed to be 1 so it can be omitted from the size calculation.
The Bad address error has already been answered. If you want to write the size of a string just use printf.
printf("Length: %zu\n", strlen(PATHA));
Either that, or you can write a function that will convert an integer to a string and print that out... I prefer printf :)
rtn = write(data, PATHA, strlen(PATHA));
is what you want I think. Arguments are supposed to be
file descriptor (data)
the source buffer (your string constant PATHA)
The number of bytes to pull from that buffer (measured using strlen() on the same PATHA constant)
Also, to be complete, you should always check rtn to see how many bytes were actually written. write() is not guaranteed to write all the bytes requested on every descriptor type, so sometimes you have to write the data in chunks: use the count it returns to work out how many bytes remain, and call it again for the rest.
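The chunked-write loop described above could be sketched roughly as follows. The name write_all is hypothetical, and retrying on EINTR is one common policy rather than the only reasonable one:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes
   are written. Returns 0 on success, -1 on error (errno is set). */
int write_all(int fd, const void *buf, size_t count)
{
    const unsigned char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* skip what was already written */
        count -= (size_t)n;
    }
    return 0;
}
```

With this in place, the call from the question would become write_all(data, PATHA, strlen(PATHA)).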

Go to a certain point of a binary file in C (using fseek) and then reading from that location (using fread)

I am wondering if this is the best way to go about solving my problem.
I know the values for particular offsets of a binary file where the information I want is held...What I want to do is jump to the offsets and then read a certain amount of bytes, starting from that location.
After using google, I have come to the conclusion that my best bet is to use fseek() to move to the position of the offset, and then to use fread() to read an amount of bytes from that position.
Am I correct in thinking this? And if so, how is best to go about doing so? i.e. how to incorporate the two together.
If I am not correct, what would you suggest I do instead?
Many thanks in advance for your help.
Matt
Edit:
I followed a tutorial on fread() and adjusted it to the following:
#include <stdio.h>
int main()
{
FILE *f;
char buffer[11];
/* "rb" opens the file in binary mode, which is what an .img file needs */
if ((f = fopen("comm_array2.img", "rb")) != NULL)
{
fread(buffer, 1, 10, f);
buffer[10] = 0;
fclose(f);
printf("first 10 characters of the file:\n%s\n", buffer);
}
return 0;
}
So I used the file 'comm_array2.img' and read the first 10 characters from the file.
But from what I understand of it, this goes from start-of-file, I want to go from some-place-in-file (offset)
Is this making more sense?
Edit Number 2:
It appears that I was being a bit dim, and all that is needed (it would seem from my attempt) is to put the fseek() before the fread() that I have in the code above, and it seeks to that location and then reads from there.
If you are using file streams instead of file descriptors, then you can write yourself a (simple) function analogous to the POSIX pread() system call.
You can easily emulate it using streams instead of file descriptors [1]. Perhaps you should write yourself a function such as this (which has a slightly different interface from the one I suggested in a comment):
size_t fpread(void *buffer, size_t size, size_t nitems, size_t offset, FILE *fp)
{
/* fseek() takes a long offset, so convert explicitly */
if (fseek(fp, (long)offset, SEEK_SET) != 0)
return 0;
return fread(buffer, size, nitems, fp);
}
This is a reasonable compromise between the conventions of pread() and fread().
What would the syntax of the function call look like? For example, reading from the offset 732 and then again from offset 432 (both being from start of the file) and filestream called f.
Since you didn't say how many bytes to read, I'm going to assume 100 each time. I'm assuming that the target variables (buffers) are buffer1 and buffer2, and that they are both big enough.
if (fpread(buffer1, 100, 1, 732, f) != 1)
...error reading at offset 732...
if (fpread(buffer2, 100, 1, 432, f) != 1)
...error reading at offset 432...
The return count is the number of complete units of 100 bytes each; either 1 (got everything) or 0 (something went awry).
There are other ways of writing that code:
if (fpread(buffer1, sizeof(char), 100, 732, f) != 100)
...error reading at offset 732...
if (fpread(buffer2, sizeof(char), 100, 432, f) != 100)
...error reading at offset 432...
This reads 100 single bytes each time; the test ensures you got all 100 of them, as expected. If you capture the return value in this second example, you can know how much data you did get. It would be very surprising if the first read succeeded and the second failed; some other program (or thread) would have had to truncate the file between the two calls to fpread(), but funnier things have been known to happen.
[1] The emulation won't be perfect; the pread() call provides guaranteed atomicity that the combination of fseek() and fread() will not provide. But that will seldom be a problem in practice, unless you have multiple processes or threads concurrently updating the file while you are trying to position and read from it.
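For comparison, when a file descriptor is available, POSIX pread() itself can be used. The wrapper below is only a sketch: read_at is a made-up name, and it loops because pread() may return fewer bytes than requested:

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to count bytes starting at offset,
   without moving the descriptor's file position.
   Returns the number of bytes read (short only at end of file), or -1. */
ssize_t read_at(int fd, void *buf, size_t count, off_t offset)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = pread(fd, p + done, count - done, offset + (off_t)done);
        if (n < 0)
            return -1;             /* error: errno is set by pread */
        if (n == 0)
            break;                 /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Reading 100 bytes at offset 732 and then at offset 432 would be two calls, read_at(fd, buffer1, 100, 732) and read_at(fd, buffer2, 100, 432), each checked against 100.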
It frequently depends on the distance between the parts you care about. If you're only skipping over/ignoring a few bytes between the parts you care about, it's often easier to just read that data and ignore what you read, rather than using fseek to skip past it. A typical way to do this is define a struct holding both the data you care about, and place-holders for the ones you don't care about, read in the struct, and then just use the parts you care about:
struct whatever {
long a;
long ignore;
short b;
} w;
fread(&w, 1, sizeof(w), some_file);
// use 'w.a' and 'w.b' here.
If there's any great distance between the parts you care about, though, chances are that your original idea of using fseek to get to the parts that matter will be simpler.
Your theory sounds correct. Open, seek, read, close.
Create a struct for the data you want to read and pass a pointer to that struct's memory to the read call. You'll likely need #pragma pack(1) or similar on the struct to prevent padding from misaligning the fields with the file layout.

Writing structure variables into a file, problem

Hi I have to write the contents of a structure variable into a file. I have a working program but the output looks distorted, you can understand when you look at the output. The simple version of the code is as below and the outputs follows.
Code:
#include<stdio.h>
#include <stdlib.h>
struct mystruct
{
char xx[20];
char yy[20];
int zz;
};
void filewrite(struct mystruct *myvar)
{
FILE *f;
f = fopen("trace.bin","wb");
if (f == NULL)
{
printf("\nUnable to create the file");
exit(0);
}
fwrite(myvar, sizeof(struct mystruct), 1, f);
fclose(f);
}
void main()
{
struct mystruct myvar = {"Rambo 1", "Rambo 2", 1234};
filewrite(&myvar);
}
Output:
(1. Where is the integer '1234'? I need that intact.)
(2. Why does some random character appear here?)
trace.bin
Rambo 1Rambo 2Ò
Your program is correct and the output too...
You are writing a binary file containing the raw data from memory.
The integer zz gets written to disk as 4 bytes (or 2 depending on the size of an int on your system), with the least significant byte first (Intel machine I guess).
1234 (decimal) gets written as 0xD2, 0x04, 0x00, 0x00 to disk.
0xD2 is a Ò when you look at it in text form. The 0x04 and the 0x00 bytes are non-printable characters so they don't show.
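To look at those bytes as hex rather than as text, a small formatting helper is enough. This is only an illustrative sketch; format_hex is an invented name:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format n bytes as "D2 04 00 00" into out.
   out must have room for at least 3*n characters (and at least 1). */
void format_hex(const void *data, size_t n, char *out)
{
    const unsigned char *p = data;

    out[0] = '\0';
    for (size_t i = 0; i < n; ++i)
        out += sprintf(out, i ? " %02X" : "%02X", p[i]);
}
```

Calling it as format_hex(&myvar.zz, sizeof myvar.zz, text) shows the in-memory bytes of the integer; on a little-endian machine with a 4-byte int, 1234 is stored as D2 04 00 00 because 0xD2 + 0x04 * 256 = 1234.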
First, in general it's not a good practice to copy non-packed struct-types to files since the compiler can add padding to the struct in order to align it in memory. Thus you will end up with either a non-portable implementation, or some garbled output where someone else tries to read your file, and the bits/bytes are not placed at the correct offset because of the compiler's padding bytes.
Second, I'm not sure how you are reading your file back (it appears you just copied it into a buffer and tried to print that), but the last set of bytes is an int type at the end ... it's not going to be a null-terminated string, so the manner in which it prints will not look "correct" ... printing non-null-terminated strings as strings can also lead to buffer overflows resulting in segmentation faults, etc.
In order to read back the contents of the file in a human-readable format, you would need to open the file and read contents back into the correct data-structures/types, and then appropriately call printf or some other means of converting the binary data to ASCII data for a print-out.
I don't recommend dumping memory directly into a file; you should use some serialization method (e.g. if you have a pointer in the struct, you are doomed). I recommend Google Protocol Buffers if the data will be shared between multiple applications.
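Short of a full serialization library, one hand-rolled option is to write each field explicitly in a fixed layout, so compiler padding and host byte order never reach the file. The sketch below reuses struct mystruct from the question; the 44-byte layout, the little-endian choice, and the names record_write and record_read are assumptions made for this example, and it assumes zz fits in 32 bits:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct mystruct {
    char xx[20];
    char yy[20];
    int zz;
};

/* Assumed file layout: 20 bytes xx, 20 bytes yy, 4 bytes zz (LSB first). */
enum { RECORD_SIZE = 20 + 20 + 4 };

/* Returns 0 on success, -1 on a short write. */
int record_write(FILE *fp, const struct mystruct *r)
{
    unsigned char buf[RECORD_SIZE];
    uint32_t z = (uint32_t)r->zz;

    memcpy(buf, r->xx, 20);
    memcpy(buf + 20, r->yy, 20);
    buf[40] = (unsigned char)z;
    buf[41] = (unsigned char)(z >> 8);
    buf[42] = (unsigned char)(z >> 16);
    buf[43] = (unsigned char)(z >> 24);
    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
}

/* Returns 0 on success, -1 on a short read. */
int record_read(FILE *fp, struct mystruct *r)
{
    unsigned char buf[RECORD_SIZE];
    uint32_t z;

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;
    memcpy(r->xx, buf, 20);
    memcpy(r->yy, buf + 20, 20);
    z = (uint32_t)buf[40] | ((uint32_t)buf[41] << 8) |
        ((uint32_t)buf[42] << 16) | ((uint32_t)buf[43] << 24);
    r->zz = (int)z;
    return 0;
}
```

Every record is exactly 44 bytes on any compiler, which also makes it easy to seek to the n-th record.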

Read a binary file C programming

I'm doing a program to manage a clinic, but I'm having a problem. I need to read a binary file with the information from the Doctors. The information is name, code and telephone. They are inserted by the user.
How can I printf that info separately. For example:
Name: John Cruz
Code: JC
Telephone: 90832324
I'm trying to use
typedef struct {
char code[10];
char name[100];
int telephone;
} DOCTOR;
int newDoctor() {//This is the function that create the binary file
DOCTOR d;
FILE *fp;
fp = fopen("Doctors.dat","wb");
if(fp==NULL) {
printf("Error!");
return -1;
}
printf("Code\n");
fflush(stdin);
gets(d.code);
printf("Name\n");
gets(d.name);
printf("Telephone\n");
scanf("%d",&d.telephone);
fprintf(fp,"%s;%s;%d",d.code,d.name, d.telephone);
fclose(fp);
}
//And to open
FILE* fp;
fp=fopen("Doctors.dat","rb");
while(!EOF(fp)) {
fgets(line, 100, fp);
printf("%s",line);
}
Just to see the line but it's not working, and how i can separate the info?
Regards
fgets assumes that the data is ascii strings. For binary data you need to know the binary format and read the data into the appropriate data structures.
You must know the format the binary data is in. For example, if you serialized a struct earlier, you can read it back into a struct of the same type:
typedef struct
{
int stuff;
double things;
} myStruct;
myStruct writeMe = {5, 20.5};
FILE* fp;
fp = fopen("Doctores.dat","wb");
if (fp == NULL) { fputs ("File error", stderr); exit(EXIT_FAILURE); }
fwrite(&writeMe, 1, sizeof(writeMe), fp); /* fwrite needs the address of the struct */
fclose(fp);
Then later to read:
myStruct readMe;
FILE* fp2;
fp2 = fopen("Doctores.dat","rb");
if (fp2 == NULL) { fputs ("File error", stderr); exit(EXIT_FAILURE); }
fread(&readMe, 1, sizeof(readMe), fp2); /* fread needs the address of the struct */
fclose(fp2);
printf("my int: %i\nmy double: %f", readMe.stuff, readMe.things);
Hope this helps
There are at least two issues here: file & data format and reading a binary file. File format is how the information is organized within the file. Binary reading involves reading the file without any translations.
File and Data Format
For text fields, you need to know the following:
Fixed or variable length field.
Maximum field width.
Representation (null terminated, fixed length, padded, preceded by length of string, etc.)
You can't assume anything. Get the format in writing. If you don't understand the writing, have the original author rewrite the documentation or explain it to you.
For integral numeric fields you need to know the following:
Size of number, in bytes.
Endianness: Is the first byte the Most Significant (MSB) or Least Significant (LSB)?
Signed or Unsigned
One's complement or two's complement
Numbers can range from 1 "byte" to at least 8 bytes, depending on the platform. If your platform has a native 32-bit integer but the format is 16-bit, your program will read 16 extra bits from the next field. Not good; bad, very bad.
For floating point: you need to know the representation.
There are many ways to represent a floating point number. Floating point numbers can vary in size also. Some platforms use 32 bits, while others use 80 bits or more. Again, assume nothing.
Binary Reading
There are no magic methods in the C and C++ libraries to read your structure correctly in one function call; you will have to assemble the fields yourself. One thorn or bump is the fact that compilers may insert "padding" bytes between fields. This is compiler dependent and the quantity of padding bytes is not standard. It is also known as alignment.
Binary reading involves using fread or std::istream::read. The common method is to allocate a buffer, read a block of data into the buffer, then compose the structures from that buffer based on the file format specification.
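The buffer-then-compose method might look like the sketch below for the DOCTOR record from the question. The on-disk layout used here (10-byte code, 100-byte name, 4-byte big-endian telephone, no padding) is purely an assumption for illustration, as are the names doctor_decode and doctor_read; a real program would take these details from the format specification:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char code[10];
    char name[100];
    int telephone;
} DOCTOR;

/* Assumed on-disk record: 10 + 100 + 4 bytes, no padding. */
enum { DOCTOR_RECORD_SIZE = 10 + 100 + 4 };

/* Compose a DOCTOR from one raw record held in buf. */
void doctor_decode(const unsigned char *buf, DOCTOR *d)
{
    uint32_t t;

    memcpy(d->code, buf, 10);
    d->code[9] = '\0';             /* do not trust the file to terminate */
    memcpy(d->name, buf + 10, 100);
    d->name[99] = '\0';
    t = ((uint32_t)buf[110] << 24) | ((uint32_t)buf[111] << 16) |
        ((uint32_t)buf[112] << 8) | (uint32_t)buf[113];
    d->telephone = (int)t;
}

/* Read one record from fp. Returns 0 on success, -1 on a short read. */
int doctor_read(FILE *fp, DOCTOR *d)
{
    unsigned char buf[DOCTOR_RECORD_SIZE];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;
    doctor_decode(buf, d);
    return 0;
}
```

Keeping the decoding separate from the I/O means the in-memory struct can change without affecting the file format.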
Summary
Before reading a binary stream of data, you will need a format specification. There are various ways to represent data and internal data representation varies by platform. Binary data is best read into a buffer, then program structures and variables can be built from the buffer.
Textual representations are simpler to input. If possible, request that the creator of the data file use textual representations of the data. Field separators are useful too. A language like XML helps organize the textual data and provides the format in the data file (but may be too verbose for some applications).
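For the textual route, the file written by the question's fprintf call already uses ';' as a field separator, so one line can be split with sscanf. This is a sketch under the assumption that neither the code nor the name contains a ';'; the function name doctor_parse_line is made up:

```c
#include <stdio.h>

typedef struct {
    char code[10];
    char name[100];
    int telephone;
} DOCTOR;

/* Hypothetical helper: parse one "code;name;telephone" line.
   The widths 9 and 99 leave room for the terminating '\0'.
   Returns 0 on success, -1 if the line does not match. */
int doctor_parse_line(const char *line, DOCTOR *d)
{
    int n = sscanf(line, "%9[^;];%99[^;];%d",
                   d->code, d->name, &d->telephone);
    return n == 3 ? 0 : -1;
}
```

After a successful parse, the fields can be printed separately with printf("Name: %s\nCode: %s\nTelephone: %d\n", d.name, d.code, d.telephone).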
