I'm writing down a function that should save 3 structures (2 of them are arrays of structs) in a binary file. Here's my function:
void saveFile(Struct1 *s1, Struct2 *s2, Struct3 s3) {
FILE *fp = NULL;
fp = fopen("save.bin", "w+b");
if (fp == NULL) {
printf("Save failed.\n");
}
fwrite(s1, sizeof(Struct1), struct3.nElements, fp);
fwrite(s2, sizeof(Struct2), NELEMENTS, fp);
fwrite(&s3, sizeof(Struct3), 1, fp);
printf("Save done.\n");
}
s1 has struct3.nElements elements, s2 has NELEMENTS elements (that's a constant), and s3 is a single struct, not an array. When I open save.bin in a hex editor, the contents are very different from what I expected. I'm wondering whether I used fwrite correctly, especially for the arrays of structs.
There are small issues with your function that might cause problems:
You define the function as taking s3 by value. Why not pass a pointer to the third struct? Is the saveFile function properly declared before the calling code? Are you sure the calling code passes the struct by value?
You forget to close the stream. The handle gets lost, and the contents are not flushed to disk until the program exits.
You open the file in "w+b" mode: write with read. It is correct to use binary mode, but unnecessary to add the + for read. Just use "wb".
If fopen fails, you output a diagnostic message, but you do not return from the function. You will invoke undefined behavior when trying to write to a NULL stream pointer.
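Taken together, the four points above suggest something like the following sketch. The structure definitions and the value of NELEMENTS are placeholders, since the question does not show them, and the path parameter and the 0 / -1 return convention are choices made here for illustration rather than anything from the original code.

```c
#include <stdio.h>

#define NELEMENTS 4     /* placeholder value; the question does not show it */

/* Placeholder definitions: the real structures are not shown in the question. */
typedef struct { int id; double value; } Struct1;
typedef struct { char tag[8]; } Struct2;
typedef struct { size_t nElements; } Struct3;

/* Returns 0 on success, -1 on failure. */
int saveFile(const char *path, const Struct1 *s1, const Struct2 *s2,
             const Struct3 *s3)
{
    FILE *fp = fopen(path, "wb");       /* binary, write-only: no '+' needed */
    if (fp == NULL) {
        perror(path);
        return -1;                      /* never pass a NULL stream to fwrite */
    }

    /* fwrite returns the number of complete elements written. */
    int ok = fwrite(s1, sizeof *s1, s3->nElements, fp) == s3->nElements
          && fwrite(s2, sizeof *s2, NELEMENTS, fp) == NELEMENTS
          && fwrite(s3, sizeof *s3, 1, fp) == 1;

    /* fclose flushes buffered data and can itself fail (disk full, etc.). */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller would pass the address of the third structure, for example saveFile("save.bin", s1, s2, &s3), and check the return value before printing "Save done."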
Regarding your question, the dump of the file does not correspond to what you expect... give us more information, such as the definitions of the different structures and the hex dump. Here are some ideas:
Some of the fields in the structures might need a specific alignment and thus be separated from the previous field by padding bytes. The values of those padding bytes are not necessarily 0: if the structures are in automatic storage or allocated with malloc, their initial state is indeterminate and can change as a side effect of storing other fields.
Integers can have different sizes and be stored in little-endian or big-endian order in the file, depending on the architecture your program is compiled for. For this reason, values stored by your program should only be read back by the same or closely matching code, running on the same architecture and OS.
If your structures contain pointers, you cannot really make sense from the values stored in the output file.
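To see whether padding explains an unexpected hex dump, it can help to print the layout the compiler actually chose. The sketch below uses a made-up struct Example (not one of the structures from the question); the numbers it prints are implementation-defined and will differ between platforms.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure, chosen so that most ABIs insert padding. */
struct Example {
    char   flag;    /* 1 byte */
    double value;   /* usually needs stricter alignment than char */
};

/* Number of padding bytes between flag and value on this platform. */
size_t padding_after_flag(void)
{
    return offsetof(struct Example, value) - sizeof(char);
}

/* Print the layout; fwrite of one struct Example writes sizeof bytes,
   padding included. */
void print_layout(void)
{
    printf("sizeof(struct Example) = %zu\n", sizeof(struct Example));
    printf("offsetof(value)        = %zu\n", offsetof(struct Example, value));
    printf("padding after flag     = %zu\n", padding_after_flag());
}
```

If the padding count is non-zero, those bytes appear in the file between the two fields, with whatever values they happened to hold in memory.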
I'm new to C, coming from Python. I'm trying to write two C programs: one that reads a plain-text file in FASTA format (for DNA/RNA/protein sequences) and writes it to a binary file, and one that reads that binary file back. The files look like this...
>sequence1
ATCTATGTCGCTCGCTCGAGAGCTA
>sequence2
CGTCGCTGGGATCGATTTCGATAGCT
>sequence3
AAATATAACTCGCTAGCTCGATCGATC
>sequence4
CTCTCTCCTCTCTCTATATAGGGG
...where individual sequences are separated by ">" characters. Within each sequence, the actual sequence and its label are separated by a newline character. (ie ">label \n sequence"). The script for reading the plain-text and then writing it to a binary file seems to work. However, when I try to read the binary file and print its contents, I get a Segmentation Fault (Core dump).
I tried to produce a reduced example for posting here, but that example seems to work without error. So, I feel forced to attach my whole code snippets here. I must be missing something.
Here's the first script which reads in a plain text fasta file, splits it first by the ">" character, and then by the newline character, to make "sequence" structures for each sequence in the above FASTA file. These structures are then written to "your_sequences.bin".
#include <stdio.h>
#include <string.h>
#define BUZZ_SIZE 1024
struct sequence {
char *sequence;
char *label;
};
int main(int argc, char *argv[]) {
FILE *fptr;
char buffer[BUZZ_SIZE];
char fasta[BUZZ_SIZE];
char *token;
char *seqs[3];
int idx = 0;
const char fasta_delim[2] = ">";
const char newline[3] = "\n";
/* Read-in plain-text */
fptr = fopen(argv[1],"r");
while (fgets(buffer, BUZZ_SIZE, fptr) != NULL) {
strcat(fasta, buffer);
}
fclose(fptr);
/* Process text, first by splitting by > and then by \n for each sequence, and then write to binary */
FILE *out;
out = fopen("your_sequences.bin","wb");
struct sequence final_entry;
token = strtok(fasta,fasta_delim);
while (token != NULL) {
seqs[idx++] = token;
token = strtok(NULL,fasta_delim);
}
for (idx=0; idx<4; idx++) {
token = strtok(seqs[idx],newline);
char *this_seq[1];
int p = 0;
while (token != NULL) {
this_seq[p] = token;
token = strtok(NULL,newline);
p++;
}
final_entry.label = this_seq[0];
final_entry.sequence = this_seq[1];
printf("%s\n%s\n\n", final_entry.label, final_entry.sequence);
fwrite(&final_entry, sizeof(struct sequence), 1, out);
}
fclose(out);
return(0);
}
This outputs, as expected from the printf() statement toward the bottom:
sequence1
ATCTATGTCGCTCGCTCGAGAGCTA
sequence2
CGTCGCTGGGATCGATTTCGATAGCT
sequence3
AAATATAACTCGCTAGCTCGATCGATC
sequence4
CTCTCTCCTCTCTCTATATAGGGG
I'm thinking the error has to be somewhere in the above script (ie my binary file is messed up), because the Segmentation Fault is caused by the fread() statement in the script below. I don't think I've made an error in calling fread(), but maybe I'm wrong.
#include <stdio.h>
#define BUZZ_SIZE 1024
struct sequence {
char *sequence;
char *label; };
int main(int argc, char *argv[]) {
struct sequence this_seq;
int n;
FILE *fasta_bin;
fasta_bin = fopen(argv[1],"rb");
for (n=0;n<4;n++) {
fread(&this_seq, sizeof(struct sequence), 1, fasta_bin);
printf (">%s\n%s\n", this_seq.label, this_seq.sequence);
}
fclose(fasta_bin);
return(0);
}
This outputs the segmentation fault
[1] 8801 segmentation fault (core dumped)
I've tinkered with this and gone over it a good amount over the past couple of hours. I hope I haven't made some stupid mistake and wasted your time!
Thanks for your help.
"I'm thinking the error has to be somewhere in the above script (ie my binary file is messed up),"
Sort of.
"because the Segmentation Fault is caused by the fread() statement in the script below."
I'm fairly confident that the error occurs not in the fread() but in the following printf().
"I don't think I've made an error in calling fread(), but maybe I'm wrong."
Your fread() corresponds to the fwrite(). There is every reason to expect that you will accurately read back what was written. The main problem here is a common one for C neophytes: you've misunderstood the nature of C strings (a null-terminated array of char), and failed to appreciate the crucial, but subtle, distinction between arrays and pointers.
To expand on that a bit, C does not have a first-class string data type. Instead, the standard library provides "string" functions that operate on sequences of objects of type char, where the end of the sequence is marked by a terminator char with the value 0. Such sequences typically are contained in char arrays, and always can be treated as if they were. Because that's what the standard library supports, that convention is ubiquitously used in programs and third-party libraries, too.
C, however, has no mechanism for passing arrays to functions or receiving them as return values. Nor do the assignment operator or most others work on arrays -- not even the indexing operator, []. Instead, in most contexts, values of array type are automatically converted to pointers to the first array element, and these can be passed around and used as operands to a wide variety of operators. Seeing (part of) this, inexperienced C programmers often mistakenly identify strings with such pointers instead of with the pointed-to data.
Of course a pointer value is just an address. You can copy it around and store it at any number of locations in the program, but this does nothing to the pointed-to data. And now I finally come around to the point: you can also write out a pointer value and read it back in, as your programs do, but it is rarely useful to do so, because the pointed-to data don't come along when you do that. Unless you read the pointer back into the same process that wrote it, the read-back pointer value is unlikely to be valid, and it certainly does not have the same significance it did in the program that wrote it.
You must instead write the pointed-to data, but you have to choose a format. In particular, titles and sequences generally have varying lengths, and one of the key things you need to decide is how, if at all, your binary format should reflect that. If I might be so bold, however, I have a suggestion for a well-defined format you could use: Fasta format! Seriously.
There's not much you can do short of data compression to express fasta-format data more compactly, as that format does little more than it needs to do to express the varying-length data it conveys. The question you need to answer, then, is what exactly you're trying to achieve by your reformatting -- both the reason for reformatting at all, and based on that, what your target format actually is.
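If a binary format is still wanted, one common way to "write the pointed-to data" is to prefix each string with its length. The sketch below is only one possible format, not something the question or answer prescribes: the function names are invented here, and the 32-bit length is written in host byte order, so the resulting file is not portable across architectures.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write one string as a 32-bit length followed by its bytes (no terminator).
   Returns 0 on success, -1 on failure. */
int write_string(FILE *out, const char *s)
{
    uint32_t len = (uint32_t)strlen(s);
    if (fwrite(&len, sizeof len, 1, out) != 1)
        return -1;
    if (len > 0 && fwrite(s, 1, len, out) != len)
        return -1;
    return 0;
}

/* Read one string written by write_string into freshly allocated memory.
   Returns NULL on error or end of file; the caller frees the result. */
char *read_string(FILE *in)
{
    uint32_t len;
    if (fread(&len, sizeof len, 1, in) != 1)
        return NULL;
    char *s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0 && fread(s, 1, len, in) != len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';              /* restore the terminator dropped on write */
    return s;
}

/* One record is a label followed by a sequence. */
int write_record(FILE *out, const char *label, const char *sequence)
{
    if (write_string(out, label) != 0 || write_string(out, sequence) != 0)
        return -1;
    return 0;
}

int read_record(FILE *in, char **label, char **sequence)
{
    *label = read_string(in);
    if (*label == NULL)
        return -1;
    *sequence = read_string(in);
    if (*sequence == NULL) {
        free(*label);
        *label = NULL;
        return -1;
    }
    return 0;
}
```

The reader loops on read_record until it returns -1, printing and freeing each label and sequence, so it no longer needs to know in advance that there are four records.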
You are getting a segmentation fault because your program uses pointers without allocating memory for them:
printf (">%s\n%s\n", this_seq.label, this_seq.sequence);
You first need to allocate memory to this_seq.label and this_seq.sequence pointers, something like this:
this_seq.sequence = malloc(size_of_sequence);
if (this_seq.sequence == NULL)
exit(EXIT_FAILURE);
this_seq.label = malloc(size_of_label);
if (this_seq.label == NULL)
exit(EXIT_FAILURE);
and then read the data into them, like this:
fread(this_seq.sequence, size_of_sequence, 1, fasta_bin);
fread(this_seq.label, size_of_label, 1, fasta_bin);
The problem is that struct sequence doesn't actually carry any salvageable information, it only contains pointers.
Pointers carry memory addresses: they say where the actual information is in memory. If you read the file in another process, with an entirely different memory space, the information won't be there. In fact, you are likely to crash when you try to access memory that was never properly initialized.
A very simple solution is, don't use pointers, use arrays:
struct sequence
{
char sequence[1024];
char label[1024];
};
Now the structure actually carries the data, no longer just pointers. You will be able to read and write it to file with no worries. However, some code will need to be changed further.
You can no longer assign data with x.label = label; you need to use strcpy(), as in strcpy(x.label, label). That change has to be made everywhere in the code where you assign values to the members of this structure.
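A minimal sketch of that approach follows. The helper names (set_record, save_record, load_record) are invented for illustration, and snprintf is used in place of strcpy so that an over-long input is truncated instead of overflowing the array.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_SIZE 1024

struct sequence {
    char sequence[FIELD_SIZE];
    char label[FIELD_SIZE];
};

/* Copy the strings into the record, truncating anything that does not fit. */
void set_record(struct sequence *rec, const char *label, const char *sequence)
{
    memset(rec, 0, sizeof *rec);    /* unused tail bytes get a defined value */
    snprintf(rec->label, sizeof rec->label, "%s", label);
    snprintf(rec->sequence, sizeof rec->sequence, "%s", sequence);
}

/* Write one whole record; returns 0 on success, -1 on failure. */
int save_record(FILE *out, const struct sequence *rec)
{
    return fwrite(rec, sizeof *rec, 1, out) == 1 ? 0 : -1;
}

/* Read one whole record; returns 0 on success, -1 on failure or end of file. */
int load_record(FILE *in, struct sequence *rec)
{
    return fread(rec, sizeof *rec, 1, in) == 1 ? 0 : -1;
}
```

The price of this simplicity is that every record occupies 2048 bytes in the file regardless of how short the strings are, and sequences longer than 1023 characters are cut off.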
As far as I understand, passing a pointer to a function essentially passes a copy of the pointer to the function in C. I have a FILE pointer that I pass to a function func(); func() reads a line from a file, and then, when we return to main(), I read another line from the file using the same FILE pointer.
However, while I would imagine that I'd read the line exactly from before func() was called, I actually read the next line after what func() had read. Can you please explain why FILE pointer behaves this way?
This is my code:
#include <stdio.h>
#define STR_LEN 22
void func(FILE *fd);
int main() {
FILE *fd;
char mainString[STR_LEN];
if (!(fd = fopen("inpuFile", "r"))) {
printf("Couldn't open file\n");
fprintf(stderr, "Couldn't open file\n");
}
func(fd);
fgets(mainString, STR_LEN, fd);
printf("mainString = %s\n", mainString);
fclose(fd);
return 0;
}
void func(FILE *fd) {
char funcString[STR_LEN];
fgets(funcString,STR_LEN, fd);
printf("funcString = %s\n", funcString);
}
However, while I would imagine that I'd read the line exactly from before func was called ...
I can't imagine why you would imagine that. What if the FILE* references a network connection that has no replay capability at all, where reading is consumption? Where would the line be stored such that you could read it again? There would be absolutely no place to put it.
Not only would I not imagine that, it's kind of crazy.
As far as I understand passing a pointer to a function essentially passes the copy of the pointer to the function in C.
Correct. But a copy of a pointer points to the very same object. If I point to a car and you copy me, you're pointing to the very same one and only car that I'm pointing to.
Because the FILE pointer points to data that changes when the file is read or written.
So the pointer doesn't change (it still points to the handle structure of the file), but the data in that structure does.
Try passing the pointer as const FILE * and you'll see that you cannot, because fread (and the other I/O functions) alter the pointed-to data.
One idea would be to duplicate the file descriptor, which dup does, but dup works only on raw file descriptors, not on buffered FILE objects, and a duplicated descriptor shares its file offset with the original anyway.
The problem is in your initial statement:
As far as I understand passing a pointer to a function essentially passes the copy of the pointer to the function in C.
This does not change much: whatever copy of the pointer you are using still holds the location of the FILE you are accessing. The whole point of using pointers as function arguments in C is so that the function can modify a value outside its own scope.
For example, common usage of an integer pointer as a function argument:
void DoSomethingCool(int *error);
Now using this code to catch the error would work like this:
int error = 0;
DoSomethingCool(&error);
if(error != 0)
printf("Something really bad happened!");
In other words, the function modifies the integer error through the pointer, by accessing its location and writing to it.
An important thing to keep in mind to avoid these kinds of misunderstandings is to recognize that all a pointer is, is essentially the address of something.
So you could (in theory, by simplifying everything a lot) think of an int * as simply an int whose value happens to be the address of some variable. For a FILE *, you can think of it as an int whose value is the location of the FILE variable.
FILE *fd is a pointer only in the sense that its implementation uses a C construct called a "pointer". It is not a pointer in the sense of representing a file position.
FILE *fd represents a handle to a file object inside the I/O library, a struct that includes the actual position of the file. In a grossly simplified way, you can think of fd as a C pointer to a file pointer.
When you pass fd around your program, I/O routines make modifications to the file position. This position is shared among all users of fd. If a func() makes a change to that position by reading some data or by calling fseek, all users of the same fd will see the updated position.
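The shared position can be observed with ftell. The sketch below is an illustration written for this point, not code from the question: it uses tmpfile() so that it does not depend on an input file, and the function names are invented.

```c
#include <stdio.h>

/* Reads one line through its own copy of the FILE pointer. */
void read_line(FILE *stream, char *buf, int size)
{
    if (fgets(buf, size, stream) == NULL)
        buf[0] = '\0';
}

/* Returns 1 if the copied pointer equals the original and the position
   seen through the original advanced after reading through the copy;
   0 otherwise; -1 if the temporary file could not be created. */
int position_is_shared(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    fputs("first line\nsecond line\n", fp);
    rewind(fp);

    FILE *copy = fp;                    /* copies the address, not the FILE */
    long before = ftell(fp);

    char buf[32];
    read_line(copy, buf, (int)sizeof buf);  /* read through the copy */

    long after = ftell(fp);             /* ask through the original */

    printf("same pointer: %d, position before: %ld, after: %ld\n",
           copy == fp, before, after);

    int shared = (copy == fp) && (after > before);
    fclose(fp);
    return shared;
}
```

Both pointer values are identical, yet the position reported through the original has moved, because the position lives in the single FILE object that both pointers refer to.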
When reading K&R, I became interested in how the file position is determined. By file position, I mean where in the file the stream is currently reading or writing. I think it must have something to do with the file pointer, or the piece of data it's pointing to. So I checked Stack Overflow and found the following answer:
Does fread move the file pointer?
The answer indicates that file pointer will change with the change of file position. This makes me very confused, because in my understanding, a file pointer for a certain file should always point to the same address, where information about this file is stored. So I wrote a small piece of code, trying to find the answer:
#include<stdio.h>
int main(void)
{
char s[1000];
FILE *fp,*fp1,*fp2;
fp = fopen("input","r");
fp1 = fp; /* File pointer before fread */
fread(s,sizeof(char),100,fp);
fp2 = fp; /* File pointer after fread */
printf("%d\n",(fp1 == fp2) ? 1 : -1);
}
It gives the output 1, which I believe indicates that the file pointer actually doesn't move and is still pointing to the same address. I have also changed the fread line to an fseek, which gave the same output. So does the file pointer move when the file position changes, or where did I go wrong in my verification?
Thanks!
I think you are confusing the general concept of pointers in C, vs. the nomenclature of a "file pointer". FILE is just a structure that contains most of the "housekeeping" attributes that the C stdio runtime library needs to interact with when using the stdio functions such as, fopen(), fread(), etc. Here is an example of the structure:
typedef struct {
char *fpos; /* Current position of file pointer (absolute address) */
void *base; /* Pointer to the base of the file */
unsigned short handle; /* File handle */
short flags; /* Flags (see FileFlags) */
short unget; /* 1-byte buffer for ungetc (b15=1 if non-empty) */
unsigned long alloc; /* Number of currently allocated bytes for the file */
unsigned short buffincrement; /* Number of bytes allocated at once */
} FILE;
Note that this may be somewhat platform-dependent, so don't take it as gospel. So when you call fopen(), the underlying library function interacts with the O/S's file system APIs and caches relevant information about the file, buffer allocation, etc, in this structure. The fopen() function allocates memory for this structure, and then returns the address of that memory back to the caller in the form of a C Pointer.
Assigning the pointer's value to another pointer has no effect on the attributes inside the FILE structure. However, the FILE structure, internally, may have indexes or "pointers" to the underlying O/S file. Hence, the confusion in terminology. Hope that helps.
You are right: fp is never changed by fread, fseek or the other f... functions. The exception, of course, is fp = fopen(...), but there you are assigning the return value of fopen to fp, so naturally fp changes.
Remember, in C parameters are passed by value, so fread cannot change its value.
But fread does change the internal structure fp points to.
You are confusing a file pointer, in the common sense of the term, with the position pointer inside the file.
Normally the term file pointer refers to a pointer to a FILE structure. That structure contains all the variables needed to manage file access. It is created when a file is successfully opened, and it stays at the same address until you fclose() the file, after which the pointer is no longer valid.
Inside the FILE structure there are several pointers that refer to the file block on disk and to the position inside the current record. These pointers, managed by the file I/O routines, change when the file is accessed (read or written).
These are the pointers that the answer you cited refers to.
I have the following struct:
struct entry
{
int next;
int EOC;
} entry;
After creating an instance of the struct and setting its 'next' and 'EOC' values accordingly, I wrote the struct to a file via the C fwrite function. I'd now like to read the values of 'next' and 'EOC' from that file. My question is this: how are these two int variables stored in memory? Once I call fseek to move the file pointer to the correct byte in the file, how many bytes do I read to get the value of 'next' or 'EOC'?
If the file is written on the same machine as you read it, then:
You write with fwrite(&entry, sizeof(entry), 1, fp).
You read with fread(&entry, sizeof(entry), 1, fp).
The size you write is the size of the structure; the size you read is the size of the structure. You test the return values; they'll be either 0 or 1 given what I wrote, and 0 indicates a problem and 1 success in this context.
If your file is open for reading and writing, after writing, you'd seek backwards by sizeof(entry) and then read.
If your file is written on a different (type of) machine, then you may have to worry about the difference in the sizes of the basic data types, and in the endianness of the two systems, and with more complex structures, you might have to worry about packing and alignment rules too. There are ways of dealing with these issues (and they're more complex than single fwrite() and fread() calls).
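One such way is to serialize each field explicitly with a fixed width and byte order instead of writing the struct's memory image. The sketch below is an assumption-laden example, not the only option: it picks 4 bytes per field in little-endian order, assumes the values fit in 32 bits, and uses helper names invented here.

```c
#include <stdint.h>
#include <stdio.h>

struct entry {
    int next;
    int EOC;
};

/* Store a 32-bit value as four bytes, least significant byte first. */
void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Inverse of put_u32le. */
uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Always writes exactly 8 bytes: next at offset 0, EOC at offset 4. */
int write_entry(FILE *fp, const struct entry *e)
{
    unsigned char buf[8];
    put_u32le(buf, (uint32_t)e->next);
    put_u32le(buf + 4, (uint32_t)e->EOC);
    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
}

/* Reads the 8 bytes back; the conversion to a signed value assumes a
   two's-complement platform, which covers the usual targets. */
int read_entry(FILE *fp, struct entry *e)
{
    unsigned char buf[8];
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;
    e->next = (int)(int32_t)get_u32le(buf);
    e->EOC  = (int)(int32_t)get_u32le(buf + 4);
    return 0;
}
```

With this layout the answer to "how many bytes do I read" no longer depends on the compiler: each record is 8 bytes, and either field can be reached by seeking to a known offset.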
I've been tasked with updating a function which currently reads in a configuration file from disk and populates a structure:
static int LoadFromFile(FILE *Stream, ConfigStructure *cs)
{
int tempInt;
...
if ( fscanf( Stream, "Version: %d\n",&tempInt) != 1 )
{
printf("Unable to read version number\n");
return 0;
}
cs->Version = tempInt;
...
}
to one which allows us to bypass writing the configuration to disk and instead pass it directly in memory, roughly equivalent to this:
static int LoadFromString(char *Stream, ConfigStructure *cs)
A few things to note:
The current LoadFromFile function is incredibly dense and complex, reading dozens of versions of the config file in a backward compatible manner, which makes duplication of the overall logic quite a pain.
The functions that generate the config file and those that read it originate in totally different parts of the old system and therefore don't share any data structures so I can't pass those directly. I could potentially write a wrapper, but again, it would need to handle any structure passed in in a backwards compatible manner.
I'm tempted to just pass the file as is in as a string (as in the prototype above) and convert all the fscanf's to sscanf's but then I have to handle incrementing the pointer along (and potentially dealing with buffer overrun errors) manually.
This has to remain in C, so no C++ functionality like streams can help here
Am I missing a better option? Is there some way to create a FILE * that actually just points to a location in memory instead of on disk? Any pointers, suggestions or other help is greatly appreciated.
If you can't pass structures and must pass the data as a string, then you should be able to tweak your function to read from a string instead of a file. If the function is as complicated as you describe, then converting fscanf->sscanf would possibly be the most straightforward way to go.
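For the fscanf-to-sscanf conversion, the %n directive can take care of advancing the pointer: it stores the number of characters consumed so far and does not count toward sscanf's return value. The sketch below uses a stand-in ConfigStructure with an invented Count field, because the real structure and file format are not shown in the question.

```c
#include <stdio.h>

/* Hypothetical stand-in: the real ConfigStructure is not shown. */
typedef struct {
    int Version;
    int Count;      /* invented second field, for illustration only */
} ConfigStructure;

/* Returns 1 on success and 0 on failure, like the LoadFromFile excerpt. */
int LoadFromString(const char *Stream, ConfigStructure *cs)
{
    const char *p = Stream;     /* current read position inside the string */
    int tempInt;
    int consumed;

    consumed = 0;
    if (sscanf(p, "Version: %d\n%n", &tempInt, &consumed) != 1) {
        printf("Unable to read version number\n");
        return 0;
    }
    cs->Version = tempInt;
    p += consumed;              /* advance past what sscanf just matched */

    consumed = 0;
    if (sscanf(p, "Count: %d\n%n", &tempInt, &consumed) != 1) {
        printf("Unable to read count\n");
        return 0;
    }
    cs->Count = tempInt;
    p += consumed;

    return 1;
}
```

Because sscanf stops at the string's terminating null character, the pointer never moves past the end of the buffer as long as the input is a properly terminated string.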
Here's an idea using your function prototype above. Read in the entire data string (without processing any of it) and store it in a local buffer. That way the code can have random access to the data, as it can with a file, which makes buffer overruns easier to predict and avoid. Start by mallocing a reasonably sized buffer, copy data into it, and realloc more space as needed. Once you have a local copy of the entire data buffer, scan through it and extract whatever data you need.
Note that this might get tricky if '\0' characters are valid input. In that case, you would have to add additional logic to test if this was the end of the input string or just a zero byte (difficulty depending on the particular format of your data buffer).
Since you are trying to keep the file data in memory, you should be able to use shared memory. POSIX shared memory is actually a variation of mapped memory. The shared memory object can be mapped into the process address space using mmap() if necessary. Shared memory is usually used as an IPC mechanism, but you should be able to use it for your situation.
The following example code uses POSIX shared memory (shm_open() & shm_unlink()) in conjunction with a FILE * to write text to the shared memory object and then read it back.
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h> /* ftruncate() */
#define MAX_LEN 1024
int main(int argc, char ** argv)
{
int fd;
FILE * fp;
char buf[MAX_LEN]; /* array of char, not an array of char pointers */
fd = shm_open("/test", O_CREAT | O_RDWR, 0600);
ftruncate(fd, MAX_LEN);
fp = fdopen(fd, "r+");
fprintf(fp, "Hello_World!\n");
rewind(fp);
fscanf(fp, "%1023s", buf); /* field width keeps the read inside buf */
fprintf(stdout, "%s\n", buf);
fclose(fp);
shm_unlink("/test");
return 0;
}
Note: I had to pass -lrt to the linker when compiling this example with gcc on Linux.
Use mkstemp() to create a temporary file. It takes char * as argument and uses it as a template for the file's name. But, it will return a file descriptor.
Use tmpfile(). It returns FILE *, but has some security issues and also, you have to copy the string yourself.
Use mmap() (see Beej's Guide for help).