C - Read in file according to a format - c

I am trying to read a file with a specific format in C.
The file contains some data items. Every data item is separated by a flag.
The file should look like this:
file-header: "FIL0"
file-id: 0x1020304
flag : 0|1 : uint8_t
length : uint32_t
char[length] : in utf-8
so its: [File-Header] [FileID] [Flag | Length | Data ] [Flag | Length | Data] ...
--> "FIL0" | 0xFFFFFF | 0 or 1 | Data as char[] | 0 or 1 | ... (next data item) ....
My Problem occurs when reading in the file. My idea is to open the file and scan through it using some sscanf-magic.
FILE *fp;
fp = fopen("data.dat", "r");
/* scan file for data components */
while (fgets(buffer, sizeof buffer, fp) != NULL) /* read in file */
{
/* scan for sequence */
if (sscanf(buffer, "%5s", fil0_header) == 1) /* if the "FIL0" header is found */
{
printf("FIL0-header found: %s\n", buffer);
// proceed and scan for [FLAG] [LENGTH] [DATA]
// sscanf()
if (sscanf(buffer, "%u", node) == 1)
{
// doesnt seem to work
}
// read in length of string and extract stringdata
else
{
printf("FIL0-Header not found, found instead: %s\n", buffer);
// do something
}
}
My problem is that I have a hard time with my buffer and the varying data types in the file.
The comparison of the FIL0 header works all right, but:
how do I read in the next hexadecimal number (sscanf using %D)?
how do I scan for the flag, which is 1 byte?
how do I extract the length, which is 4 bytes?
One problem is that the check for the flag starts at the beginning of the buffer,
but the pointer should be moved on after the FIL0 header is found.
I'd be grateful for any help!
Please help me find the proper sscanf() calls.
I want to read the file in and retrieve its single parts:
One single [File-Header]
and many {[FileID] [Flag | Length | Data ]} {...} items

Well, you could just read the file byte by byte using
line[0] = (char) fgetc(fp);
line[1] = (char) fgetc(fp);
and so on, or leave out the cast to retrieve an int value. That should do the trick for a simple left-to-right scan of the file (or line; as you say, there aren't any line breaks).
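Since the file is binary, a byte-oriented read with fread() tends to be easier than sscanf(). The following is only a minimal sketch of that idea, under some assumptions that the question does not confirm: the integers are stored big-endian, the file is opened with "rb", and the helper names (read_u32_be, read_file_header, read_item) and the MAX_ITEM_LEN cap are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sanity cap so a corrupt length cannot trigger a huge malloc. */
#define MAX_ITEM_LEN (1u << 20)

/* Assumption: integers are stored big-endian (most significant byte first).
 * Reverse the shifts if the file was written little-endian. */
int read_u32_be(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 1;
}

/* Checks the 4-byte "FIL0" magic, then reads the file id. Returns 1 on success. */
int read_file_header(FILE *fp, uint32_t *file_id)
{
    char magic[4];
    if (fread(magic, 1, sizeof magic, fp) != sizeof magic)
        return 0;
    if (memcmp(magic, "FIL0", 4) != 0)
        return 0;
    return read_u32_be(fp, file_id);
}

/* Reads one [Flag | Length | Data] item. On success returns 1 and stores a
 * malloc'd, NUL-terminated copy of the data in *data (the caller frees it).
 * Returns 0 at end of file or on a malformed item. */
int read_item(FILE *fp, uint8_t *flag, uint32_t *length, char **data)
{
    int c = fgetc(fp);                 /* the flag is a single byte */
    if (c == EOF)
        return 0;
    *flag = (uint8_t)c;
    if (!read_u32_be(fp, length) || *length > MAX_ITEM_LEN)
        return 0;
    *data = malloc((size_t)*length + 1);
    if (*data == NULL)
        return 0;
    if (fread(*data, 1, *length, fp) != *length) {
        free(*data);
        *data = NULL;
        return 0;
    }
    (*data)[*length] = '\0';
    return 1;
}
```

A caller would open the file with fopen("data.dat", "rb"), call read_file_header() once, and then loop on read_item() until it returns 0. Because each call consumes exactly the bytes it parsed, the file position moves on by itself and there is no buffer pointer to advance by hand.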

You could probably use some standard parsing techniques, for instance a lexer and a recursive-descent parser. You should define your input syntax in more detail. You could perhaps use a parser generator (though it might be overkill for your simple example) like ANTLR ...
I suggest reading a good textbook on parsing (and compiling); it will teach you a lot of useful stuff.

Related

Reading, translating and writing data between multiple CSV files using C

I have 3 CSV files. The first of them contains 4 columns:
ID,Section1,Section2,Section3
1,23,12,7
2,11,26,9
. . . .
. . . .
19,30,22,4
20,5,6,16
The first column is the ID and the other three contain random numbers ranging from 0 to 30.
The next file is a "conversion" file. It shows the corresponding value to each number:
30,45,44,45
29,44,42,43
28,43,42,41
. . . .
. . . .
1,22,21,22
0,20,21,21
The first column is the number to be read and the next columns are the values that will replace those numbers in each Section.
Like, if you read a 30 in Section1 it will be replaced by a 45, and if you find a 29 in Section 2 it will be replaced by a 42 and so on.
If I were to write the converted numbers into a third CSV file with the same 4 column format, how should I do it?
So far I've had no problems with generating the files since they are randomized, except for the "conversion" file, but I don't know how to proceed with the conversion.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
int main (int argc,char *argv[]){
int i, num1;
char numbers[20][20];
char conversions[20][20];
//File with the initial numbers
FILE *registry = fopen("registry.csv","w+");
if (registry == NULL){
fputs ("File error",stderr);
exit (1);
}
//Conversion file
FILE *conversion = fopen("conversion.csv","r");
if (conversion == NULL){
fputs ("File error",stderr);
exit (1);
}
//File where the resulting values will be written
FILE *results = fopen("results.csv","a");
if (results == NULL){
fputs ("File error",stderr);
exit (1);
}
//Writing the headers
fprintf(results,"ID,Results1,Results2,Results3\n");
fprintf(registry,"ID,Section1,Section2,Section3\n");
srand(time(0));
//Generating the random numbers
for(i=1;i<21;i++){
sprintf(numbers[i],"%d,%d,%d,%d\n",i,rand()%31,rand()%31,
rand()%31);
fputs(numbers[i], registry);
}
//This is where I don't know how to proceed
for(i=1;i<21;i++){
sprintf(conversions[i],"%d,%d,%d,%d\n",i,
fscanf(registry,"%d,%*s,%*s,%*s\n",num1)...);
}
I was trying to do something like what I did to generate the random numbers: save everything into a buffer and then write it into the file. I found that fscanf can be of great use for skipping the parts I don't need to read, but I couldn't figure out how to use it to skip the headers and the ID column, and I'm still missing the conversion part.
Conversion file:
30,45,44,45
29,44,43,43
28,43,42,42
27,43,41,40
26,41,40,40
25,40,39,39
24,39,37,39
23,38,37,38
22,38,36,37
21,36,36,37
20,35,35,36
19,34,34,35
18,33,34,35
17,33,33,34
16,32,32,33
15,32,31,32
14,31,30,31
13,30,30,31
12,29,29,30
11,29,28,29
10,28,28,29
9,28,27,27
8,27,26,26
7,27,26,25
6,26,25,25
5,25,24,24
4,25,23,24
3,24,23,23
2,23,22,21
1,21,21,20
0,20,21,20
This answer is scoped to address:
//This is where I don't know how to proceed.
To do what you have described, if it is not necessary to include the first columns in registry, results or conversion for storing, then do not. Although nice for human readability, they complicate the task of reading in, and using arrays. The unused column data and header row requires array indexing gymnastics to get the data to line up correctly into the arrays.
Without the header row and numbering column, array indexes alone are sufficient to track all that is needed to make the conversion.
However, if you feel the first column/row is needed for your purposes, then adjust the array indexing of the routines shown below to accommodate.
The following steps (some of which you already do) are what I identified as essential toward making the conversion step as simple as possible...
Steps:
Create registry: 60 random numbers (0-30) arranged in 20 rows, 3 data columns
Create int reg[20][3] = {{0}};
Read registry into reg array using a combination fgets sscanf routine.
Example:(assuming registry has already been created in your code)
registry = fopen(".\\registry.csv","r");
if (registry)
{
    while(fgets(buf, sizeof(buf), registry))
    {
        sscanf(buf, "%d,%d,%d", &reg[index][0],&reg[index][1],&reg[index][2] );
        index++;
    }
    fclose(registry);
}
conversion table - pre-existing file
Create int conv[31][3] = {{0}};
Read conversion table into conv array using a combination fgets sscanf routine (see above example)
Once you have these arrays created properly,
Perform conversion:
Create int res[20][3] = {{0}};
Create char buf[80] = {0};
Open 'results' file: FILE *results = fopen(".\\results.csv","w");
populate results:
Example:
//File where the resulting values will be written
FILE *results = fopen(".\\results.csv","w");
if (results)
{
    for(i=0;i<sizeof(reg)/sizeof(reg[0]);i++)
    {
        for(j=0;j<sizeof(reg[0])/sizeof(reg[0][0]);j++)
        {
            // conv row    = the value stored in reg[i][j]
            // conv column = the same column j as in reg
            res[i][j] = conv[reg[i][j]][j];
        }
        sprintf(buf, "%d,%d,%d\n", res[i][0],res[i][1],res[i][2]);
        fputs(buf, results);
    }
    fclose(results);
}
At this point, you can print the results file in a format that suits your needs.
[Image: actual results using the approach suggested above for a randomly generated registry, the values you provided for conversion, and the resulting results, each created without column 1 as I suggested in my comments. Each set of array data is indexed from 0 to n.]
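If the ID column and header row have to stay in the files, one possible variation is to let sscanf() read the first column and use it directly. The sketch below is only an illustration: the function names and the MAX_VALUE and NUM_COLS constants are hypothetical, and it assumes every data line has exactly four comma-separated integers. Storing each conversion row at the index given by its first column also means the order of rows in the conversion file (the sample lists 30 first) does not matter.

```c
#include <stdio.h>

#define MAX_VALUE 30   /* registry values range from 0 to 30 */
#define NUM_COLS  3    /* three data columns after the first one */

/* Parses one conversion line "n,c1,c2,c3" and stores it at conv[n].
 * Returns 1 on success, 0 if the line is malformed or n is out of range. */
int load_conv_line(const char *line, int conv[MAX_VALUE + 1][NUM_COLS])
{
    int n, c[NUM_COLS];
    if (sscanf(line, "%d,%d,%d,%d", &n, &c[0], &c[1], &c[2]) != 4)
        return 0;
    if (n < 0 || n > MAX_VALUE)
        return 0;
    for (int j = 0; j < NUM_COLS; j++)
        conv[n][j] = c[j];
    return 1;
}

/* Converts one registry line "id,s1,s2,s3" into "id,r1,r2,r3\n" in out.
 * A header line fails the sscanf and is rejected, so callers can skip it. */
int convert_registry_line(const char *line, int conv[MAX_VALUE + 1][NUM_COLS],
                          char *out, size_t outsize)
{
    int id, s[NUM_COLS];
    if (sscanf(line, "%d,%d,%d,%d", &id, &s[0], &s[1], &s[2]) != 4)
        return 0;
    for (int j = 0; j < NUM_COLS; j++)
        if (s[j] < 0 || s[j] > MAX_VALUE)
            return 0;
    /* Column j of the registry is looked up in column j of the table. */
    snprintf(out, outsize, "%d,%d,%d,%d\n", id,
             conv[s[0]][0], conv[s[1]][1], conv[s[2]][2]);
    return 1;
}
```

In a fgets() loop, each line of conversion.csv would go through load_conv_line() first; then each line of registry.csv would go through convert_registry_line(), with the output written by fputs() only when the function returns 1.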

how to read data from a file with space ignored?

I have this piece of program that I use to read data from file:
void baca(int *n)
{
    FILE *f = fopen("namafile.txt", "r");
    if (f)
    {
        while (fscanf(f, "%[^|]|%d|%[^\n]\n", mhs[*n].nama, &mhs[*n].umur, mhs[*n].hp)==3)
        {
            (*n)++;
        }
        fclose(f); /* only close the file if it was actually opened */
    }
}
If I write the data in the file like this, then the program reads it correctly:
nko|20|9999
hotma|21|9982882
andi|30|212313
But when I add some spaces like this, somehow it doesn't read it correctly:
nko | 20 | 9999
hotma | 21 | 9982882
andi | 30 | 212313
Can someone give me some hint on what I should do ?
Add a space to the format string to specify where the input can have optional whitespace
fscanf(f, "%[^|] |%d | %[^\n]\n", ...)
// ^^^ ^^^^^ optional whitespace
The conversion "%d" already includes optional leading whitespace.
If your input strings can get messier in the future, you will do better with a separate parser instead of scanf().
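One detail to watch: %[^|] also captures any spaces that sit in front of the '|', so the name comes back as "nko " rather than "nko". A minimal sketch of one way to handle that is to read each line with fgets() and parse it with sscanf(), then trim the fields. The function names and the field sizes (31 and 15 characters) are assumptions, since the definition of mhs is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Removes trailing spaces and tabs from s in place. */
void rtrim(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t'))
        s[--n] = '\0';
}

/* Parses "name | age | phone", with or without spaces around '|'.
 * The width limits keep sscanf from overflowing the buffers.
 * Returns 1 if all three fields were read, 0 otherwise. */
int parse_record(const char *line, char name[32], int *age, char phone[16])
{
    /* A space in the format matches any amount of whitespace, including none. */
    if (sscanf(line, " %31[^|]|%d | %15[^\n]", name, age, phone) != 3)
        return 0;
    rtrim(name);
    rtrim(phone);
    return 1;
}
```

Called on each line returned by fgets(), this accepts both "nko|20|9999" and "nko | 20 | 9999".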

Bus Error on void function return

I'm learning to use libcurl in C. To start, I'm using a randomized list of accession names to search for protein sequence files that may be found hosted here. These follow a set format: the first line has variable length (and contains no information I'm trying to query), followed by a series of capitalized letters with a newline every sixty (60) characters (this is what I want to pull down, but reformatted to eighty (80) characters per line).
I have the call itself in a single function:
//finds and saves the fastas for each protein (assuming one exists)
void pullFasta (proteinEntry *entry, char matchType, FILE *outFile) {
//Local variables
URL_FILE *handle;
char buffer[2] = "", url[32] = "http://www.uniprot.org/uniprot/", sequence[2] = "";
//Build full URL
/*printf ("u:%s\nt:%s\n", url, entry->title); /*This line was used for debugging.*/
strcat (url, entry->title);
strcat (url, ".fasta");
//Open URL
/*printf ("u:%s\n", url); /*This line was used for debugging.*/
handle = url_fopen (url, "r");
//If there is data there
if (handle != NULL) {
//Skip the first line as it's got useless info
do {
url_fread(buffer, 1, 1, handle);
} while (buffer[0] != '\n');
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
//Print it
printFastaEntry (entry->title, sequence, matchType, outFile);
}
url_fclose (handle);
return;
}
With proteinEntry being defined as:
//Entry for fasta formatable data
typedef struct proteinEntry {
char title[7];
struct proteinEntry *next;
} proteinEntry;
The url_fopen, url_fclose, url_feof, url_fread, and URL_FILE code is found here; they mimic the file functions for which they are named.
As you can see I've been doing some debugging with the URL generator (uniprot URLs follow the same format for different proteins), I got it working properly and can pull down the data from the site and save it to file in the proper format that I want. I set the read buffer to 1 because I wanted to get a program that was very simplistic but functional (if inelegant) before I start playing with things, so I would have a base to return to as I learned.
I've tested the url_<function> calls and they are giving no errors. So I added incremental printf calls after each line to identify exactly where the bus error is occurring and it is happening at return;.
My understanding of bus errors is that it's a memory access issue wherein I'm trying to get at memory that my program doesn't have control over. My confusion comes from the fact that this is happening at the return of a void function. There's nothing being read, written, or passed to trigger the memory error (as far as I understand it, at least).
Can anyone point me in the right direction to fix my mistake please?
EDIT: As @BLUEPIXY pointed out, I had a potential url_fclose (NULL). As @deltheil pointed out, I had sequence as a static array. This also made me notice I'm repeating my bad memory allocation for url, so I updated it and it now works. Thanks for your help!
If we look at e.g. http://www.uniprot.org/uniprot/Q6GZX1.fasta and skip the first line (as you do), we have:
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
which is a 60-character string.
When you try to read this sequence with:
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
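The problem is that sequence is not expandable and not large enough (it is a fixed-length array of size 2). Each strcat therefore writes past the end of the array and corrupts the stack, which is likely why the crash only shows up when the function returns.
So make sure to choose a size large enough to hold any sequence, or implement the ability to expand it on the fly.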
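Expanding on the fly could look roughly like the sketch below. It is not taken from the question's code: the strbuf type and its functions are hypothetical names, and the starting capacity of 64 bytes is an arbitrary choice.

```c
#include <stdlib.h>

/* A character buffer that grows as characters are appended. */
typedef struct {
    char  *data;   /* NUL-terminated contents, or NULL while empty */
    size_t len;    /* number of characters stored */
    size_t cap;    /* allocated size in bytes */
} strbuf;

/* Appends one character, doubling the allocation when it is full.
 * Returns 1 on success, 0 if realloc fails (the old contents stay valid). */
int strbuf_push(strbuf *sb, char c)
{
    if (sb->len + 2 > sb->cap) {          /* need room for c and the NUL */
        size_t new_cap = sb->cap ? sb->cap * 2 : 64;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL)
            return 0;
        sb->data = p;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 1;
}

/* Releases the memory and resets the buffer to its empty state. */
void strbuf_free(strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

In the read loop, a buffer declared as strbuf seq = {0}; would take strbuf_push(&seq, buffer[0]) in place of strcat (sequence, buffer), and seq.data would be passed on for printing before calling strbuf_free(&seq).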

Editing a line in a text file using temp file C

I am trying to edit a line in a text file, but I get unexpected behavior while editing the file. What I want to do is adjust a specific line (points : 100) of the text shown below. I pass the function, by value, the number of coins to adjust by and the file offset of the user's entry, obtained with ftell (user_point). The output I get is weird. I try to copy the rest of the file, with the edited line, to a temp file, and then copy it back to the original file starting from the point where I began copying to temp (that is the user_point offset from ftell).
Here is the original file, with entries like this:
...
_______________________________________
nickname : geo
password : cuvctq
Name : george
Surname : papas
points : 100
participated :
past draws : 0
Chosen No. :
future draws : 0
Registered : Sun Feb 05 19:23:50 2012
...
What I get after the 2nd edit run is:
...
_______________________________________
nickname : geo
password : cuvctq
Name : george
Surname : papaspoints : 98
participated :
past draws : 0
Chosen No. :
future draws : 0
Registered : Sun Feb 05 19:23:50 2012
...
At the end of the text I get one extra \n after I edit the file, which is something I don't want,
and so further edits will spoil the text...
I also get an EXTRA \n at the end of the line which, at least I think so, is due to "r+" mode, which is something that I also don't want...
void coins_adjust(int coins_new,int user_point)
{
int lines,i,ln_point_copy;
char buffer[50],buff_copied[50];
FILE *lottary,*temp;
memset(buff_copied,'\0',sizeof(char)*50);
lottary=fopen("customers.txt","r");
temp=fopen("temp.txt","w");
fseek(lottary,user_point,SEEK_SET);
for (lines=0;lines<5;lines++)
{
memset(buffer,'\0',sizeof(char)*50);
if (lines==5)
ln_point_copy=ftell(lottary); //from TEMP to CUSTOMERS
fgets (buffer ,50 , lottary);
}
coins_new+=atoi(buffer+15);
strncpy(buff_copied,buffer,15); //copy 15 chars and fill with null
memset(buffer,'\0',sizeof(char)*50);
itoa (coins_new,buffer,10); //fix the new line to be entered
strcat(buff_copied,buffer); //the edited line is as it is supposed
strcat(buff_copied,"\n"); //to be with \n at the end.
puts(buff_copied);
printf("%s",buff_copied);fflush(stdout);
fprintf(temp,"%s",buff_copied);
for(i=getc(lottary); i!=EOF; i=getc(lottary)) //copy to temp
{
putc(i, temp);
}
fclose(lottary);
fclose(temp);
temp=fopen("temp.txt","r");
lottary=fopen("customers.txt","r+");
fseek(lottary,ln_point_copy,SEEK_SET);
for(i=getc(temp); i!=EOF; i=getc(temp)) //copy until eof
{
putc(i, lottary);
}
fclose(lottary);fclose(temp);
}
I have debugged the program and everything seems to work, at least regarding the values passed to the arrays where I store the line's characters, but I can't see why it ignores the \n of the previous line when I copy it back to the original... There seems to be a \r char that I can't get rid of while copying back to the original...
Thanks in advance.
I was more thinking about something like this:
void change_points(int new_points)
{
    FILE *input = fopen("customers.txt", "r");
    FILE *output = fopen("temp.txt", "w");
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), input))
    {
        /* Look for the correct line */
        /* Can also use e.g. "if (strncmp(buffer, "points", 6) == 0)"
         * if it's at the start of the line
         */
        if (strstr(buffer, "points") != NULL)
        {
            int old_points;
            sscanf(buffer, "%*s : %d ", &old_points);
            /* Format how you like it */
            fprintf(output, "%-13s: %d\n", "points", new_points + old_points);
        }
        else
            fputs(buffer, output);
    }
    fclose(output);
    fclose(input);
    /* The file "temp.txt" now contains the modified text */
    /* Copy either using "fgets"/"fputs", or using "fread"/"fwrite" */
    input = fopen("temp.txt", "r");
    output = fopen("customers.txt", "w");
    while (fgets(buffer, sizeof(buffer), input))
        fputs(buffer, output);
    fclose(output);
    fclose(input);
}
It's shorter, simpler, maybe more efficient (looping line by line instead of char by char), and the line you are looking for can be anywhere in the file without you knowing its exact position.

Why does my program read an extra structure?

I'm making a small console-based rpg, to brush up on my programming skills.
I am using structures to store character data. Things like their HP, Strength, perhaps Inventory down the road. One of the key things I need to be able to do is load and save characters. Which means reading and saving structures.
Right now I'm just saving and loading a structure with first name and last name, and attempting to read it properly.
Here is my code for creating a character:
void createCharacter()
{
char namebuf[20];
printf("First Name:");
if (NULL != fgets(namebuf, 20, stdin))
{
char *nlptr = strchr(namebuf, '\n');
if (nlptr) *nlptr = '\0';
}
strcpy(party[nMember].fname,namebuf);
printf("Last Name:");
if (NULL != fgets(namebuf, 20, stdin))
{
char *nlptr = strchr(namebuf, '\n');
if (nlptr) *nlptr = '\0';
}
strcpy(party[nMember].lname,namebuf);
/*Character created, now save */
saveCharacter(party[nMember]);
printf("\n\n");
loadCharacter();
}
And here is the saveCharacter function:
void saveCharacter(character party)
{
FILE *fp;
fp = fopen("data","a");
fwrite(&party,sizeof(party),1,fp);
fclose(fp);
}
and the loadCharacter function
void loadCharacter()
{
FILE *fp;
character tempParty[50];
int loop = 0;
int count = 1;
int read = 2;
fp= fopen("data","r");
while(read != 0)
{
read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
printf("%d. %s %s\n",count,tempParty[loop].fname,tempParty[loop].lname);
loop++;
count++;
}
fclose(fp);
}
So the expected result of the program is that I input a name and last name such as 'John Doe', and it gets appended to the data file. Then it is read in, maybe something like
1. Jane Doe
2. John Doe
and the program ends.
However, my output seems to add one more blank structure to the end.
1. Jane Doe
2. John Doe
3.
I'd like to know why this is. Keep in mind I'm reading the file until fread returns a 0 to signify it's hit the EOF.
Thanks :)
Change your loop:
while( fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp) )
{
// other stuff
}
Whenever you write file reading code ask yourself this question - "what happens if I read an empty file?"
You have an algorithmic problem in your loop, change it to:
read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
while(read != 0)
{
    //read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
    printf("%d. %s %s\n",count,tempParty[loop].fname,tempParty[loop].lname);
    loop++;
    count++;
    read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
}
There are ways to get rid of the double fread, but first get it working and make sure you understand the flow.
Here:
read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
printf("%d. %s %s\n",count,tempParty[loop].fname,tempParty[loop].lname);
You are not checking whether the read was successful (the return value of fread()).
while( 1==fread(&tempParty[loop],sizeof*tempParty,1,fp) )
{
/* do anything */
}
is the correct way.
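Use fopen("data","rb")
instead of fopen("data","r"). On Windows the latter opens the file in text mode (the same as the non-standard "rt"), which translates line endings and can corrupt binary data.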
You've got the answer to your immediate question but it's worth pointing out that blindly writing and reading whole structures is not a good plan.
Structure layouts can and do change depending on the compiler you use, the version of that compiler and even with the exact compiler flags used. Any change here will break your ability to read files saved with a different version.
If you have ambitions of supporting multiple platforms issues like endianness also come into play.
And then there's what happens if you add elements to your structure in later versions ...
For robustness you need to think about defining your file format independently of your code and having your save and load functions handle serialising and de-serialising to and from this format.
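As a rough illustration of such a format, the sketch below stores each name as a one-byte length followed by the characters, so the file no longer depends on struct padding or on the compiler. Everything in it is an assumption made for the example: the function names, the NAME_MAX_LEN limit (chosen to match the 20-byte name buffer in the question), and the idea that a record consists of just the first and last name.

```c
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 19   /* assumption: names fit a 20-byte buffer */

/* Writes a string as a 1-byte length followed by its bytes (no NUL). */
int write_str(FILE *fp, const char *s)
{
    size_t n = strlen(s);
    if (n > NAME_MAX_LEN)
        return 0;
    if (fputc((int)n, fp) == EOF)
        return 0;
    return fwrite(s, 1, n, fp) == n;
}

/* Reads a string written by write_str into buf, which must hold at least
 * NAME_MAX_LEN + 1 bytes. Returns 0 at end of file or on bad data. */
int read_str(FILE *fp, char *buf)
{
    int n = fgetc(fp);
    if (n == EOF || n > NAME_MAX_LEN)
        return 0;
    if (fread(buf, 1, (size_t)n, fp) != (size_t)n)
        return 0;
    buf[n] = '\0';
    return 1;
}

/* One record is simply the first name followed by the last name. */
int save_name_record(FILE *fp, const char *fname, const char *lname)
{
    return write_str(fp, fname) && write_str(fp, lname);
}

int load_name_record(FILE *fp, char *fname, char *lname)
{
    return read_str(fp, fname) && read_str(fp, lname);
}
```

A loader would then loop with while (load_name_record(fp, fname, lname)) on a file opened in "rb" mode, which also avoids printing a record after a failed read. Numeric fields such as HP would need an explicit byte order, written one byte at a time in the same spirit.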
