I am reading and writing a structure to a text file, but the file contents are not human-readable. I need to write readable data into the file from the structure object.
Here is a little more detail about my code:
I have code that reads and writes a list of item names and codes to a file (file.txt). The code uses a linked list to hold the data.
The data are stored in a structure object and then written to the file using fwrite.
The code works fine, but I need the text file to contain readable data.
Right now file.txt looks like this:
㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀\䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠\㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈\䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔\䥆㘸䘠㵅㩃䠀䵏㵈
I am expecting the file to look like this:
pencil aaaa
Table bbbb
pen cccc
notebook nnnn
Here is the snippet:
struct Item
{
    char itemname[255];
    char dspidc[255];
    struct Item *ptrnext;
};
// Writing into the file
printf("\nEnter Itemname: ");
gets(ptrthis->itemname);
printf("\nEnter Code: ");
gets(ptrthis->dspidc);
fwrite(ptrthis, sizeof(*ptrthis), 1, fp);
// Reading from the file
while (fread(ptrthis, sizeof(*ptrthis), 1, fp) == 1)
{
    printf("\n%s %s", ptrthis->itemname, ptrthis->dspidc);
    ptrthis = ptrthis->ptrnext;
}
Writing an array of 255 bytes with fwrite will write all 255 bytes to the file (regardless of what you have stuffed into that array). If you want only the 'textual' portion of that array, you need to use a facility that respects null terminators (i.e. printf, fprintf, ...).
Reading is then a bit more involved: you need to set up a sentinel value (such as a newline) that marks the end of each string.
This says nothing of the fact that you are writing the value of a pointer (initialized or not) that will have no context or validity on the next read. Pointers (i.e. memory locations) have meaning only within the currently executing process. Trying to use one process's memory address in another is definitely a bad idea.
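A minimal sketch that applies both points (write only the textual portion with fprintf, and don't write the pointer at all), reusing the Item struct and an open FILE *fp from the question, with one record per line, a space between the two fields, and the newline as the sentinel:

#include <stdio.h>
#include <stdlib.h>

struct Item
{
    char itemname[255];
    char dspidc[255];
    struct Item *ptrnext;
};

/* Write one node as a single readable line: "itemname dspidc\n".
   Only the text up to each null terminator is written, not all 255 bytes,
   and the pointer member is not written at all. */
void write_item(const struct Item *node, FILE *fp)
{
    fprintf(fp, "%s %s\n", node->itemname, node->dspidc);
}

/* Read one line back and rebuild a fresh node; returns NULL at end of file. */
struct Item *read_item(FILE *fp)
{
    char line[512];
    if (fgets(line, sizeof line, fp) == NULL)
        return NULL;
    struct Item *node = malloc(sizeof *node);
    node->ptrnext = NULL;
    if (sscanf(line, "%254s %254s", node->itemname, node->dspidc) != 2)
    {
        free(node);
        return NULL;
    }
    return node;
}

This only works as long as neither field contains the separator; if item names can contain spaces, pick a different delimiter (a tab, for instance) and parse accordingly.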
The code works fine
not really:
a) You are dumping the raw contents of the struct to a file, including the pointer to another instance of Item. You cannot expect to read a pointer back in from disk and use it the way you do with ptrthis = ptrthis->ptrnext (I mean, it appears to work in the given snippet, but only because that snippet does nothing meaningful with it at all).
b) You are writing 2 * 255 bytes of potential garbage to the file. The reason you see those strange-looking "blocks" in your file is that you write all 255 bytes of itemname and all 255 bytes of dspidc to disk, including the terminating \0 (which your editor renders as those blocks). The real "string" is the meaningful part at the beginning of itemname or dspidc, followed by a \0, followed by whatever happened to be in memory before.
The term you need to look up and read about is serialization. There are libraries out there that already solve the task of dumping data structures to disk (or the network, or anything else) and reading them back in, e.g. tpl.
First of all, I would only serialize the data, not the pointers.
Then, in my opinion, you have 2 choices:
write a parser for your syntax (with yacc for instance)
use a data dumping format such as rmi serialization mechanism.
Sorry I can't find online docs, but I know I have the grammar on paper.
Both of those solutions will be platform independent, whether the machines are big-endian or little-endian.
Related
I am working on a program in C (on Ubuntu, using bash) and using it to manipulate binary data files. First of all, when I use fopen(filename, "w") it creates a file, but without any extension. However, when I open it with vim filename it shows up in some binary form.
For this question: when I use fwrite(array, sizeof(some struct), # of structs, filePointer) it writes the structs into the file in a binary form I don't understand. When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer) it somehow magically knows how to read each struct back in its binary form and put it into the array, just by knowing its size and how much to read. What happens if I pass a value less than the actual number of structs as the count parameter? How would fread know what to read correctly? How does it work, reading data just by looking at the sizes and not knowing what type of data it is?
fwrite writes the bytes of the memory where the object is stored to the output stream and fread reads bytes from the input stream into the memory whose address it gets as an argument. No assumption is made regarding the types and representations of the C objects stored in this memory.
Hence a number of problems can occur:
the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, possibly even depending on compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
the mode for accessing the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents such as what happens for text files on legacy systems. For example, text mode on MS-Windows causes 0A bytes to be converted to 0D 0A sequences on output and 0D bytes to be stripped on input, resulting in different contents for isolated 0D bytes in the initial content.
if the C structure contains pointers, the bytes written to the output represent the value of these pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
if the C structure has a flexible array member at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
the C structure may contain padding between members, causing the output file to contain non-deterministic bytes, which might be a problem in some circumstances.
if the C structure has arrays with only partial meaningful contents, such as char arrays containing C strings, beware that fwrite will write the bytes beyond the null terminator, which should not be meaningful, but might be sensitive information such as password fragments or other meaningful data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
For all the above reasons and other ones, reading/writing binary data is to be reserved to very specific cases where the programmer knows exactly what is happening. For other purposes, saving as text files in human readable form is much preferred.
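A small experiment makes the padding and pointer points concrete; the struct below is hypothetical and exists only for illustration:

#include <stdio.h>
#include <stddef.h>

struct Record
{
    char flag;      /* 1 byte                                         */
    int  count;     /* typically 4 bytes                              */
    char name[8];   /* 8 bytes                                        */
    char *comment;  /* a pointer: meaningless once written to a file  */
};

int main(void)
{
    /* The sum of the member sizes is often smaller than sizeof(struct Record);
       the difference is compiler-inserted padding, whose content is
       indeterminate and ends up in the file when the whole struct is written
       with fwrite. */
    printf("sum of members:        %zu\n",
           sizeof(char) + sizeof(int) + sizeof(char[8]) + sizeof(char *));
    printf("sizeof(struct Record): %zu\n", sizeof(struct Record));
    printf("offset of count:       %zu\n", offsetof(struct Record, count));
    return 0;
}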
From the question comments, by David C. Rankin:
"Well, fread/fwrite read and write bytes (binary data - if you write out then read in the same number of bytes -- you get the same thing back). If you want to read and write text where you need to worry about line-breaks, etc.., fgets/fputs. or fprintf"
So I guess I can never know what I read in with fread unless I know what I wrote with fwrite?
"Right, look at the type for your buffer in fwrite(3) - Linux man page it is type void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write. (obviously you know what it is writing) The same for fread -- it just reads bytes -- you have to know what you are reading (or at least the format of it). That's what binary I/O is about, it's all just bytes -- it's up to you, the Programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted-I/O and lines, words, etc.."
I am working on a flat-file database project in C. I have created a structure with a few members. I am using fwrite() to write them to a binary file and fread() to fetch the data. My two major questions:
1st: can we write a structure to a text file? I have seen no good example. Is it practically wrong to write it in text format? When I write using "w" instead of "wb" I get a text-looking file, but with some extra characters.
2nd: how do these fread() & fwrite() functions work? They operate on a block of data, so how do they get the address of the next block? I mean, we do have the pointer, but the file doesn't have any addresses, so how does the pointer move to the next block?
1st: can we write a structure to a text file? I have seen no good example. Is it practically wrong to write it in text format? When I write using "w" instead of "wb" I get a text-looking file, but with some extra characters.
Imagine your structure contains some integers. If you write them using fwrite, these integers will be written to the file in binary format.
If you then try to interpret the file as text, this won't work: a text editor will try to interpret the binary values as characters, which will most likely not look like what you expect.
E.g. if your structure contains the integer 3, when written using fwrite it will be stored as
00000000 00000000 00000000 00000011 (3 in binary)
assuming big-endian notation. If you try to read the above in a text editor, of course you will not get the desired effect.
That is not even mentioning the padding bytes that may be inserted into your structure by the compiler.
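To make the difference concrete, here is a small sketch (the file names are made up) that writes the same integer once with fwrite and once with fprintf:

#include <stdio.h>

int main(void)
{
    int value = 3;

    /* Binary: writes sizeof(int) raw bytes, e.g. 03 00 00 00 on a
       little-endian machine; a text editor shows these as control
       characters or gibberish. */
    FILE *bin = fopen("value.bin", "wb");
    if (bin != NULL)
    {
        fwrite(&value, sizeof value, 1, bin);
        fclose(bin);
    }

    /* Text: writes the single character '3' (byte 0x33) plus a newline,
       which any text editor displays as expected. */
    FILE *txt = fopen("value.txt", "w");
    if (txt != NULL)
    {
        fprintf(txt, "%d\n", value);
        fclose(txt);
    }

    return 0;
}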
2nd: how do these fread() & fwrite() functions work? They operate on a block of data, so how do they get the address of the next block? I mean, we do have the pointer, but the file doesn't have any addresses, so how does the pointer move to the next block?
This is taken care of by the C library (and ultimately the OS): every FILE stream keeps a current position indicator, and each fread or fwrite advances it by the number of bytes transferred, so the next call automatically continues at the next block.
PS. I suggest you read more about serialization, and try to understand the difference between text and binary files.
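A sketch showing that position indicator at work (the struct and file name are made up):

#include <stdio.h>

struct entry
{
    int  id;
    char name[20];
};

int main(void)
{
    FILE *fp = fopen("entries.bin", "rb");
    if (fp == NULL)
        return 1;

    struct entry e;
    printf("position before read: %ld\n", ftell(fp));     /* 0 */

    if (fread(&e, sizeof e, 1, fp) == 1)
        printf("position after read:  %ld\n", ftell(fp)); /* sizeof(struct entry) */

    /* The next fread continues from this position, so each call gets the
       "next block" without any address being stored in the file itself. */
    fclose(fp);
    return 0;
}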
How would I best output the string in the following code?
#include <CoreFoundation/CoreFoundation.h> // Needed for CFSTR

int main(int argc, char *argv[])
{
    char *c_string = "Hello I am a C String. :-).";
    CFStringRef cf_string = CFStringCreateWithCString(0, c_string, kCFStringEncodingUTF8);
    // output cf_string
    //
}
There's no API to write a CFString directly to any file (including stdout or stderr), because you can only write bytes to a file. Characters are a (somewhat) more ideal concept; they're too high-level to be written to a file. It's like saying “I want to write these pixels”; you must first decide what format to write them in (say, PNG), and then encode them in that format, and then write that data.
So, too, with characters. You must encode them as bytes in some format, then write those bytes.
Encoding the characters as bytes/data
First, you must pick an encoding. For display on a Terminal, you probably want UTF-8, which is kCFStringEncodingUTF8. For writing to a file… you usually want UTF-8. In fact, unless you specifically need something else, you almost always want UTF-8.
Next, you must encode the characters as bytes. Creating a C string is one way; another is to create a CFData object; still another is to extract bytes (not null-terminated) directly.
To create a C string, use the CFStringGetCString function.
To extract bytes, use the CFStringGetBytes function.
You said you want to stick to CF, so we'll skip the C string option (which is less efficient anyway, since whatever calls write is going to have to call strlen)—it's easier, but slower, particularly when you use it on large strings and/or frequently. Instead, we'll create CFData.
Fortunately, CFString provides an API to create a CFData object from the CFString's contents. Unfortunately, this only works for creating an external representation. You probably do not want to write this to stdout; it's only appropriate for writing out as the entire contents of a regular file.
So, we need to drop down a level and get bytes ourselves. This function takes a buffer (region of memory) and the size of that buffer in bytes.
Do not use CFStringGetLength for the size of the buffer. That counts characters, not bytes, and the relationship between number of characters and number of bytes is not always linear. (For example, some characters can be encoded in UTF-8 in a single byte… but not all. Not nearly all. And for the others, the number of bytes required varies.)
The correct way is to call CFStringGetBytes twice: once with no buffer (NULL), whereupon it will simply tell you how many bytes it'll give you (without trying to write into the buffer you haven't given it); then, you create a buffer of that size, and then call it again with the buffer.
You could create a buffer using malloc, but you want to stick to CF stuff, so we'll do it this way instead: create a CFMutableData object whose capacity is the number of bytes you got from your first CFStringGetBytes call, increase its length to that same number of bytes, then get the data's mutable byte pointer. That pointer is the pointer to the buffer you need to write into; it's the pointer you pass to the second call to CFStringGetBytes.
To recap the steps so far:
Call CFStringGetBytes with no buffer to find out how big the buffer needs to be.
Create a CFMutableData object of that capacity and increase its length up to that size.
Get the CFMutableData object's mutable byte pointer, which is your buffer, and call CFStringGetBytes again, this time with the buffer, to encode the characters into bytes in the data object.
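A sketch of those three steps, continuing from the cf_string in the question (the helper name is mine):

#include <CoreFoundation/CoreFoundation.h>

/* Encode a CFString's characters as UTF-8 bytes in a CFMutableData object.
   Follows the Create rule, so the caller owns (and must Release) the result. */
static CFMutableDataRef CreateUTF8Data(CFStringRef string)
{
    CFRange wholeString = CFRangeMake(0, CFStringGetLength(string));
    CFIndex byteCount = 0;

    /* First call: no buffer, just ask how many bytes the UTF-8
       representation will need. */
    CFStringGetBytes(string, wholeString, kCFStringEncodingUTF8,
                     /*lossByte*/ 0, /*isExternalRepresentation*/ false,
                     /*buffer*/ NULL, /*maxBufLen*/ 0, &byteCount);

    /* Create a data object with that capacity and grow its length so that
       its byte pointer covers the whole buffer we are about to fill. */
    CFMutableDataRef data = CFDataCreateMutable(kCFAllocatorDefault, byteCount);
    if (data == NULL)
        return NULL;
    CFDataSetLength(data, byteCount);

    /* Second call: actually encode the characters into the data object. */
    CFStringGetBytes(string, wholeString, kCFStringEncodingUTF8,
                     0, false,
                     CFDataGetMutableBytePtr(data), byteCount, NULL);
    return data;
}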
Writing it out
To write bytes/data to a file in pure CF, you must use CFWriteStream.
Sadly, there's no CF equivalent to nice Cocoa APIs like [NSFileHandle fileHandleWithStandardOutput]. The only way to create a write stream to stdout is to create it using the path to stdout, wrapped in a URL.
You can create a URL easily enough from a path; the path to the standard output device is /dev/stdout, so creating the URL looks like this:
CFURLRef stdoutURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, CFSTR("/dev/stdout"), kCFURLPOSIXPathStyle, /*isDirectory*/ false);
(Of course, like everything you Create, you need to Release that.)
Having a URL, you can then create a write stream for the file so referenced. Then, you must open the stream, whereupon you can write the data to it (you will need to get the data's byte pointer and its length), and finally close the stream.
Note that you may have missing/un-displayed text if what you're writing out doesn't end with a newline. NSLog adds a newline for you when it writes to stderr on your behalf; when you write to stderr yourself, you have to do it (or live with the consequences).
So:
Create a URL that refers to the file you want to write to.
Create a stream that can write to that file.
Open the stream.
Write bytes to the stream. (You can do this as many times as you want, or do it asynchronously.)
When you're all done, close the stream.
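Putting those five steps together, a minimal sketch (the helper name is mine, and error handling is kept to a bare minimum) that writes a data object, such as the one produced above, to the URL:

#include <CoreFoundation/CoreFoundation.h>

/* Write the bytes of 'data' to the file referred to by 'url'.
   Returns true only if every byte was written. */
static Boolean WriteDataToURL(CFDataRef data, CFURLRef url)
{
    CFWriteStreamRef stream = CFWriteStreamCreateWithFile(kCFAllocatorDefault, url);
    if (stream == NULL)
        return false;

    Boolean ok = false;
    if (CFWriteStreamOpen(stream))
    {
        /* CFWriteStreamWrite returns the number of bytes actually written,
           or -1 on error; a robust version would loop until done. */
        CFIndex written = CFWriteStreamWrite(stream,
                                             CFDataGetBytePtr(data),
                                             CFDataGetLength(data));
        ok = (written == CFDataGetLength(data));
        CFWriteStreamClose(stream);
    }
    CFRelease(stream);
    return ok;
}

Called with the /dev/stdout URL from above, this prints the string to the Terminal; as always, Release the URL and the data object when you're done with them.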
I was given an assignment to manage a sort of music store. In the database (which is saved as a .dat file) we have an artist's name and the album.
I'm having problems writing and reading the file.
First thing is, even if I don't write anything, just create the file, and then open the file in Notepad, I see gibberish and letters in Chinese or Japanese.
Even if I write to the file, or read from it using Visual Studio, this doesn't seem to change. Here's my code:
I opened the file with:
p=fopen("database.dat","w+");
The add item function:
void add_item(char* artist, char* record, FILE* p) // adds an item with artist and record to store
{
    item node;
    int item_size = sizeof(item);
    rewind(p);
    strcpy(node.artist, artist);
    strcpy(node.record, record);
    fwrite(&node, item_size, 1, p);
    printf("Data added\n");
}
item is the struct used to define a single item in the store. It has two fields, the strings artist and record.
typedef struct item
{
    char artist[100], record[100];
} item;
This is for reading:
void print_file(FILE* p) // print the entire file
{
    int size = sizeof(item);
    item node;
    rewind(p);
    while (!feof(p))
    {
        fread(&node, size, 1, p);
        printf("%s - %s\n", node.artist, node.record);
    }
}
If I use print_file, I see gibberish; if I actually open the file with Notepad, I see Japanese.
Help! :D
Edit: Just discovered something. If I add an item and then read the file, I will read the item back. But if I run the program again and try to read the file immediately, I see gibberish.
Problem is with "w+", it will truncate existing file to zero length and open it for write & read. When you run the program second time this is what happens and your read (before write) returns gibberish.
More on fopen here
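One common way around that (a sketch, not tested against the rest of your program) is to try opening an existing file for update first and only create it if that fails:

/* "r+b": open an existing file for reading and writing without truncating it;
   if it does not exist yet, fall back to "w+b", which creates it.
   Binary mode ("b") avoids any newline translation on Windows. */
FILE *p = fopen("database.dat", "r+b");
if (p == NULL)
    p = fopen("database.dat", "w+b");
if (p == NULL)
{
    perror("database.dat");
    /* handle the error, e.g. exit */
}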
I'm guessing based on your symptoms that the strings are defined as char* and not char[], which would be your problem. String IO on files doesn't work that way. Keep in mind that a string is actually of type char*, that is, a pointer to one or more 8-bit integers. When you write the string to the file, you're actually writing the value of the address itself, not the characters it points to.
You should use the fprintf() function to write strings to files:
http://www.cplusplus.com/reference/cstdio/fprintf/
And then fscanf() to read them:
http://www.cplusplus.com/reference/cstdio/fscanf/
In general, if you're going to use a struct as a file format, you can't put pointers in the struct. You could put char[] values, because they aren't handled the same as pointers, but that would require a hard limit on string size. This is one of the reasons why structs as file formats are discouraged by some; it is better to read and write the values one at a time, handling strings and so on appropriately.
The reason it works when you read the file back immediately (before quitting your program) is because that pointer address is still valid for that string. But the string itself never got written into the file.
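If you do go the fprintf()/fscanf() route, a sketch of the two functions using the item struct from the question could look like this (assuming neither field contains whitespace; otherwise choose a different separator):

#include <stdio.h>

typedef struct item
{
    char artist[100], record[100];
} item;

/* Append one record as a readable line: "artist record\n". */
void add_item(const char *artist, const char *record, FILE *p)
{
    fseek(p, 0, SEEK_END);                 /* append after existing records */
    fprintf(p, "%s %s\n", artist, record);
    printf("Data added\n");
}

/* Print every record in the file. The return value of fscanf tells us when
   nothing is left to read, so there is no need for the feof() loop. */
void print_file(FILE *p)
{
    item node;
    rewind(p);
    while (fscanf(p, "%99s %99s", node.artist, node.record) == 2)
        printf("%s - %s\n", node.artist, node.record);
}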
I'm novice programmer and am writing a simple wav-player in C as a pet project. Part of the file loading process requires reading specific data (sampling rate, number of channels,...) from the file header.
Currently what I'm doing is similar to this:
Scan for a sequence of bytes and skip past it
Read 2 bytes into variable a
Check value and return on error
Skip 4 bytes
Read 4 bytes into variable b
Check value and return on error
...and so on. (For the code, see: https://github.com/qgi/Player/blob/master/Importer.c)
I've written a number of helper functions to do the scanning/skipping/reading bit. Still, I'm repeating the reading, checking, and skipping parts several times, which seems neither very efficient nor very smart. It's not a real issue for my project, but as this seems to be quite a common task when handling binary files, I was wondering:
Is there some kind of a pattern on how to do this more effectively with cleaner code?
Most often, people define structs (often with something like #pragma pack(1) to guard against padding) that match the file's structures. They then read data into an instance of that struct with something like fread and use the values from the struct.
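For a WAV file that usually means something like the canonical RIFF/"fmt " header below; the field names are mine, and real files can contain extra chunks, so treat this as an illustration rather than a complete parser:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#pragma pack(push, 1)            /* no padding: struct layout == file layout */
typedef struct
{
    char     riff[4];            /* "RIFF" */
    uint32_t riff_size;
    char     wave[4];            /* "WAVE" */
    char     fmt[4];             /* "fmt " */
    uint32_t fmt_size;
    uint16_t audio_format;       /* 1 = PCM */
    uint16_t num_channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
} WavHeader;
#pragma pack(pop)

/* Read and sanity-check the fixed-size header in one fread call. */
int read_header(FILE *f, WavHeader *h)
{
    if (fread(h, sizeof *h, 1, f) != 1)
        return -1;
    if (memcmp(h->riff, "RIFF", 4) != 0 || memcmp(h->wave, "WAVE", 4) != 0)
        return -1;
    return 0;
}

Note that this reads the multi-byte fields in the file's (little-endian) byte order, so it only works as-is on a little-endian host.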
The cleanest option that I've come across is the scanf-like function unpack presented by Kernighan & Pike on page 219 of The Practice of Programming, which can be used like
// assume we read the file header into buf
// and the header consists of magic (4 bytes), type (2) and length (4).
// "l" == 4 bytes (long)
// "s" == 2 bytes (short)
unpack(buf, "lsl", &magic, &type, &length);
For efficiency, reading into a buffer of, say, 4096 bytes and then doing your parsing on the data in that buffer would be better, and of course a single forward-only parsing pass is the most efficient.
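Combining both ideas, a hedged sketch of the buffered approach: read a chunk of the header into memory once, then pull fields out of the buffer with small helpers (the helper names are mine, and the offsets follow the canonical RIFF/WAVE layout used above):

#include <stdio.h>
#include <stdint.h>

/* Read little-endian integers out of an in-memory buffer byte by byte,
   so the code works regardless of the host's endianness or alignment. */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

int parse_header(FILE *f)
{
    uint8_t buf[4096];
    size_t n = fread(buf, 1, sizeof buf, f);   /* one read, then parse from memory */
    if (n < 36)                                /* not enough data for a header */
        return -1;

    uint16_t channels    = read_u16_le(buf + 22);  /* byte 22: number of channels */
    uint32_t sample_rate = read_u32_le(buf + 24);  /* byte 24: sampling rate */
    printf("%u channels, %u Hz\n", channels, sample_rate);
    return 0;
}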