reading file unknown format in C - c

I need some help with this exercise in C.
I would like to know how to read data from a file when I don't know its format.
-The file will contain int(1-999) and char: "OL"=overloaded, "ND"=noData, "LB"=lowBattery.
Example:
My_file.txt
Can be made like this:
25
764
OL
ND
34
LB
624
235
ND
........
Or like this:
534 ND 356 LB LB 234 765 123 ND ND......
235 976 LB 156 ND......
I know that this:
FILE *f;
int str1; // int, not char: fgetc() returns an int so that EOF differs from every valid character
f=fopen(filename,"r");
str1=fgetc(f);
while(str1 != EOF)
{
printf("%c",str1);
str1=fgetc(f);
}
fclose(f);
can read the file until EOF. But I can't use it as is, because I need to assign those values to some ints or chars (what if I use an enum?).
I am fairly sure that I can't use fscanf. But the real question is: how do I read the file, and how do I assign those values to a struct or something similar,
so that I can then use them for operations (like sum and more)?
Thank you very much, guys.

I don't know its format
Hmm... it seems to me that you know the format exactly:
The file will contain int(1-999) and char: "OL"=overloaded, "ND"=noData, "LB"=lowBattery
Your file contains a whitespace separated sequence of tokens, each of which is either OL, ND, LB or an integer in the specified range.
So to parse that file, read one character at a time. Whitespace? Ignore it and continue with the next character. A digit? Up to two more digits should follow; read them and convert the run to an integer. 'O', 'N' or 'L'? Check that the next character is the matching second letter ('L', 'D' or 'B'). Anything else? Parse error!
To save each token create a structure like:
struct Token
{
enum
{ TokenOverLoad
, TokenNoData
, TokenLowBattery
, TokenData
} kind;
short data; // only if kind == TokenData
};
Then store these in either a list or a dynamic array during parsing. Afterwards you can iterate over that list/array to implement any required functions, such as sum.
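The character-by-character parsing described above can be sketched as follows. This is only a minimal illustration, not the answerer's own code: the function name next_token and its return convention (1 = token, 0 = end of file, -1 = parse error) are assumptions, and the enum is given a name (TokenKind) purely for readability.

```c
#include <ctype.h>
#include <stdio.h>

enum TokenKind { TokenOverLoad, TokenNoData, TokenLowBattery, TokenData };

struct Token {
    enum TokenKind kind;
    short data; /* only meaningful when kind == TokenData */
};

/* Hypothetical helper: reads the next token from f.
   Returns 1 on success, 0 at end of file, -1 on a parse error. */
int next_token(FILE *f, struct Token *out)
{
    int c;

    /* Skip any whitespace between tokens. */
    do {
        c = fgetc(f);
    } while (c != EOF && isspace(c));

    if (c == EOF)
        return 0;

    if (isdigit(c)) {
        int value = c - '0';
        int digits = 1;

        /* Accept up to two more digits (values 1-999). */
        while ((c = fgetc(f)) != EOF && isdigit(c)) {
            if (++digits > 3)
                return -1;
            value = value * 10 + (c - '0');
        }
        if (c != EOF)
            ungetc(c, f); /* put back the character that ended the number */
        if (value < 1)
            return -1;
        out->kind = TokenData;
        out->data = (short)value;
        return 1;
    }

    /* Otherwise expect one of the two-letter status codes: OL, ND, LB. */
    int second = fgetc(f);
    if (c == 'O' && second == 'L')
        out->kind = TokenOverLoad;
    else if (c == 'N' && second == 'D')
        out->kind = TokenNoData;
    else if (c == 'L' && second == 'B')
        out->kind = TokenLowBattery;
    else
        return -1;
    return 1;
}
```

A caller would loop while next_token() returns 1, appending each token to its list or array, and treat -1 as a malformed file.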

I asked a friend. He said that I can use fscanf.
I only need to define a struct with a character array.
With fscanf I will read each token with %s into a buffer such as char char_name[20];
If I want, I can use atoi/atof for the numbers or strcmp for the status codes.
If anybody knows an easier solution, please answer :)
I will post the code soon; I'm working on it :)
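The fscanf idea described above could look roughly like this. It is a sketch under assumptions: the struct Summary and the function summarize are hypothetical names, strtol is used in place of atoi so that malformed words can be detected, and %19s limits each word to the 20-byte buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: totals gathered from the file. */
struct Summary {
    long sum;        /* sum of all numeric readings */
    int overloaded;  /* number of "OL" entries */
    int no_data;     /* number of "ND" entries */
    int low_battery; /* number of "LB" entries */
    int invalid;     /* words that are neither a code nor 1-999 */
};

struct Summary summarize(FILE *f)
{
    struct Summary s = {0, 0, 0, 0, 0};
    char word[20];

    /* %19s skips leading whitespace (spaces and newlines alike) and
       reads at most 19 characters, leaving room for the terminator. */
    while (fscanf(f, "%19s", word) == 1) {
        if (strcmp(word, "OL") == 0) {
            s.overloaded++;
        } else if (strcmp(word, "ND") == 0) {
            s.no_data++;
        } else if (strcmp(word, "LB") == 0) {
            s.low_battery++;
        } else {
            char *end;
            long v = strtol(word, &end, 10);
            /* Valid only if the whole word was a number in range. */
            if (end != word && *end == '\0' && v >= 1 && v <= 999)
                s.sum += v;
            else
                s.invalid++;
        }
    }
    return s;
}
```

Because %s treats any whitespace as a separator, the same loop handles both layouts from the question (one value per line, or several per line).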

Related

Reading a file into a struct C

I am working on an assignment that takes a file containing a recipe and creates an instance of a struct to store the information. This is the format that my struct follows:
struct Dinner
{
char* recipeName;
unsigned numMainDishIngredients;
char** mainDishIngredients;
unsigned numDessertIngredients;
char** DessertIngredients;
};
I need to figure out how to read in a file which will be structured as follows:
The first line will contain the name of the recipe, the second line will be the number of ingredients in the main dish, then the next lines will each contain one ingredient that is in the main dish until one blank line is hit. The line following the blank line will contain the number of ingredients in the dessert and the following lines will each contain a dessert ingredient.
An example is as follows:
Pizza and Ice Cream
4
Dough
Cheese
Sauce
Toppings
3
Cream
Sugar
Vanilla
I am mostly unsure of how to read into the char** types. So far this is all I have:
struct Dinner* readRecipe(const char* recipeFile)
{
if (!recipeFile)
{
return NULL;
}
FILE* file = fopen(recipeFile, "r");
if (!file)
{
return NULL;
}
char recipeName[50]; // specified that strings won't exceed 49 chars
int numMainIngredients, numDessertIngredients;
fscanf(file, " %49[^\n] %d", recipeName, &numMainIngredients); // whole first line, then the count
...
}
Basically I do not know how to read multiple lines of a file into an array type in a structure and I would really appreciate any tips on how to do this.
Reading from a file is pretty straightforward. Line-oriented functions such as fgets() read a single line from the file and leave the stream positioned at the start of the next one. So all you really need to do is loop.
I recommend that you write the following:
#define MAXCHAR 256
char line[MAXCHAR];
while(fgets(line, MAXCHAR, file) != NULL)
{
// line now has the next line in the file (including its trailing '\n')
// do something with it. store it away
// use atoi() to get the number?
// whatever you need.
}
That is, we use fgets() to grab the next line in the file; if we loop on that, it will read until the end of the file (EOF).
Note that I used fgets() instead of fscanf(). fgets() is generally a safer option than fscanf(), since it never writes more than the buffer size you give it. Granted, it does not come with the ability to specify line formatting; but it's not too difficult to do that on your own.
edit: I mixed up my languages pretty badly.. fixed it.
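To connect the fgets() loop to the char** members of the question's struct, here is one possible sketch. The helper names next_line, read_list and read_recipe are hypothetical, the function takes an already opened FILE* rather than a file name, and blank lines are simply skipped, so the blank separator line is optional. Cleanup on failure is omitted for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXCHAR 256

struct Dinner {
    char *recipeName;
    unsigned numMainDishIngredients;
    char **mainDishIngredients;
    unsigned numDessertIngredients;
    char **DessertIngredients;
};

/* Returns a heap copy of the next non-blank line without its newline,
   or NULL at end of file or if allocation fails. */
static char *next_line(FILE *file)
{
    char line[MAXCHAR];

    while (fgets(line, sizeof line, file) != NULL) {
        line[strcspn(line, "\r\n")] = '\0'; /* strip the line ending */
        if (line[0] != '\0') {
            char *copy = malloc(strlen(line) + 1);
            if (copy != NULL)
                strcpy(copy, line);
            return copy;
        }
    }
    return NULL;
}

/* Reads a count line followed by that many item lines.
   *count is set to the number of items actually read. */
static char **read_list(FILE *file, unsigned *count)
{
    *count = 0;
    char *line = next_line(file);
    if (line == NULL)
        return NULL;
    unsigned n = (unsigned)strtoul(line, NULL, 10);
    free(line);

    char **items = calloc(n ? n : 1, sizeof *items);
    if (items == NULL)
        return NULL;
    for (unsigned i = 0; i < n; i++) {
        items[i] = next_line(file);
        if (items[i] == NULL)
            break; /* file ended early */
        (*count)++;
    }
    return items;
}

struct Dinner *read_recipe(FILE *file)
{
    struct Dinner *d = calloc(1, sizeof *d);
    if (d == NULL)
        return NULL;
    d->recipeName = next_line(file);
    d->mainDishIngredients = read_list(file, &d->numMainDishIngredients);
    d->DessertIngredients = read_list(file, &d->numDessertIngredients);
    return d;
}
```

Each ingredient gets its own allocation, and the char** member is just an array of those pointers, so the caller eventually has to free every string, then each array, then the struct.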

Issue reading Japanese characters from file - C

I am writing a program which reads a file with almost 2 million lines. Each line has the format: an integer ID, a tab, then an artist name string.
6821361 Selinsgrove High School Chorus
10151460 greek-Antique
10236365 jnr walker & the all-stars
6878792 Grieg - Kraggerud, Kjekshus
6880556 Mr. Oiseau
6906305 stars on 54 (maxi single)
10584525 Jonie Mitchel
10299729 エリス レジーナ/アントニオ カルロス ジョビン
Above is an example with some items from the file (note: some lines do not follow the specific format). My program works fine until it gets to the last line of the example; then it endlessly prints エリス レジーナ/アントニオ カルロス ジョビ\343\203.
struct artist *read_artists(char *fname)
{
FILE *file;
struct artist *temp = (struct artist*)malloc(sizeof(struct artist));
struct artist *head = (struct artist*)malloc(sizeof(struct artist));
file = fopen("/Users/Daniel/Library/Developer/Xcode/DerivedData/project_Audioscrobbler_Artists-hgwyqpinuoxayzbmvarcjxryqnrz/Build/Products/Debug/artist_data.txt", "r");
if(file == 0)
{
perror("fopen");
exit(1);
}
int artist_ID;
char artist_name[650];
while(!feof(file))
{
fscanf(file, "%d\t%65[^\t\n]\n", &artist_ID, artist_name);
temp = create_play(artist_ID, artist_name, 0, -1);
head = add_play(head, temp);
printf("%s\n", artist_name);
}
fclose(file);
//print_plays(head);
return head;
}
Above is my code for reading from the file. Can you please help explain what is wrong?
As the comments indicate, one problem is with while(!feof(file)). The linked content explains in detail why this is not a good idea, but in summary, quoting from one of the answers in the link:
(!feof(file))...
...is wrong because it tests for something that is
irrelevant and fails to test for something that you need to know. The
result is that you are erroneously executing code that assumes that it
is accessing data that was read successfully, when in fact this never
happened. - Kerrek SB
In your case, this usage does not cause your problem, but as Kerrek explains might happen, masks it.
You can replace that with fgets(...):
char lineBuf[1000];//make length longer or shorter for your purpose
file = fopen("/Users/Daniel/Library/Developer/Xcode/DerivedData/project_Audioscrobbler_Artists-hgwyqpinuoxayzbmvarcjxryqnrz/Build/Products/Debug/artist_data.txt", "r");
if(!file) return -1;
while(fgets (lineBuf, sizeof(lineBuf), file))
{
//process each line here
//But processing Japanese characters
//will require special considerations.
//Refer to the link below for UNICODE tips
}
Unicode in C and C++...
In particular, you will need to use variable types that are sufficient for containing the different size characters you will be processing. The link discusses this in great detail.
Here is an excerpt:
"char" no longer means character
I hereby recommend referring to character codes in C programs using a 32-bit unsigned integer type. Many platforms provide a
"wchar_t" (wide character) type, but unfortunately it is to be avoided
since some compilers allot it only 16 bits—not enough to represent
Unicode. Wherever you need to pass around an individual character,
change "char" to "unsigned int" or similar. The only remaining use for
the "char" type is to mean "byte".
Edit:
In the comments above, you state "but the string it's failing on is 66 bytes long". Because the read is limited to 65 bytes by the %65 field width, the final character was cut off one byte short of its last byte. Each ASCII character fits in a single char. Japanese characters encoded in UTF-8 do not; each one takes several bytes. With a field width large enough for the whole name (up to 649 for the 650-byte buffer), the last byte would have been included.
OP's code failed because the result of fscanf() was not checked.
fscanf(file, "%d\t%65[^\t\n]\n", &artist_ID, artist_name);
The fscanf() read in only 65 bytes of "エリス レジーナ/アントニオ カルロス ジョビン", because of the %65 width limit. Yet this string, encoded in UTF-8, has a length of 66 bytes. The last 'ン' is encoded as bytes 227, 131, 179 (octal 343 203 263), and only the first 2 were read. When artist_name is printed, the following appears:
エリス レジーナ/アントニオ カルロス ジョビ\343\203
Now the problem begins. The last byte, 179, remains in the file. The next fscanf() fails because byte 179 does not convert into an int ("%d"), so fscanf() returns 0. Since the code did not check the result of fscanf(), it does not realize that artist_ID and artist_name are left over from the previous iteration, and so it prints the same text again.
Because byte 179 is never consumed, the end of the file is never reached and feof() never becomes true, so we have an infinite loop.
The while(!feof(file)) hid this problem, but did not cause it.
The fgets() proposed by @ryyker is a good approach. Another is:
while (fscanf(file, "%d\t%65[^\t\n]\n", &artist_ID, artist_name) == 2) {
temp = create_play(artist_ID, artist_name, 0, -1);
head = add_play(head, temp);
printf("%s\n", artist_name);
}
In other words, validate the results of *scanf().
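For comparison, the fgets() route with explicit validation might look like the sketch below. The helper names parse_artist_line and count_valid_artists are hypothetical, and the choice to reject an over-long name outright (rather than cut it in the middle of a multi-byte character) is an assumption made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Splits "ID<TAB>name" into its parts.
   Returns 1 on success, 0 if the line is malformed. */
int parse_artist_line(const char *line, long *id, char *name, size_t name_size)
{
    char *end;
    long value = strtol(line, &end, 10);

    if (end == line || *end != '\t')
        return 0; /* no number, or no tab right after it */

    const char *start = end + 1;
    size_t len = strcspn(start, "\r\n"); /* name length in bytes */

    /* Reject an empty name, and reject a name that does not fit
       instead of truncating it in the middle of a UTF-8 sequence. */
    if (len == 0 || len >= name_size)
        return 0;

    memcpy(name, start, len);
    name[len] = '\0';
    *id = value;
    return 1;
}

/* Reads every line of the file and counts the well-formed ones. */
int count_valid_artists(FILE *file)
{
    char lineBuf[1000];
    char name[650];
    long id;
    int valid = 0;

    while (fgets(lineBuf, sizeof lineBuf, file) != NULL) {
        if (parse_artist_line(lineBuf, &id, name, sizeof name))
            valid++;
        /* A malformed line is simply skipped: fgets() has already
           consumed it, so the loop cannot stall on the same bytes. */
    }
    return valid;
}
```

Lengths here are counted in bytes, not characters, which is what matters for buffer sizes when the data is UTF-8.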

C Programming, Reading specific sections from file

My question is: how can I read specific sections from a file? For instance, if my file was:
454545454 Joe Brown 70 50 40
656565656 David Smith 80 90 100
383838383 George Williams 95 100 80
How could I read the first string (9-Digit #), skip over the name, and then read the 3 sets of numbers?
I think you can treat the white space as your sentinel. You could read the whole file into a char* buffer and look for this sentinel each time.
Another solution could be to use atoi (ASCII to int) to check whether a token is a number or a word (or strtol, which also reports where parsing stopped). You can also read about fread and fseek.
I think the best way is to mix both solutions: find each sentinel and try to parse the token with atoi.
The main idea is to find some pattern in the file that lets you design the algorithm.
In C, most of the time you have to work out the logic yourself.
Hope it helps!
Instead of "reading specific sections," read the file line by line, save the information you want, and discard the rest. scanf is used to read formatted input from an external source into program variables. Since scanf returns the number of items successfully assigned, you can use that to do some error checking.
#define STR_LEN 64 // assumed buffer size; pick one that fits your data
char num_string[STR_LEN];
int numbers[3];
char dummy1[STR_LEN], dummy2[STR_LEN];
int num_read = scanf( "%s%s%s%d%d%d", num_string, dummy1, dummy2, &numbers[0], &numbers[1], &numbers[2] );
if( num_read != 6 )
{
// error
}
else
{
// do stuff with num_string, and numbers[0]-numbers[2]
}
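A variation on the snippet above, as a sketch: sscanf can discard the name fields itself with the %*s assignment-suppression form, so no dummy buffers are needed. The function name parse_student is hypothetical, and the code assumes every name consists of exactly two words, as in the sample data.

```c
#include <stdio.h>

/* Parses one line of the form "ID First Last s1 s2 s3".
   Returns 1 on success, 0 if the line does not match. */
int parse_student(const char *line, char id[10], int scores[3])
{
    /* %9s reads the 9-digit ID; each %*s reads a word and discards it.
       Discarded fields are not counted in sscanf's return value,
       so a fully matched line yields 4, not 6. */
    int n = sscanf(line, "%9s %*s %*s %d %d %d",
                   id, &scores[0], &scores[1], &scores[2]);
    return n == 4;
}
```

In a real program each line would first be read with fgets() and then handed to this function, so that one malformed line cannot desynchronize the parsing of the following lines.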

how to read from binary file into struct c?

I have the following struct
struct
{
char order;
int row;
int column;
int length;
char printChar;
}record;
and my file looks like this
F
30
40
7
X
How can I use fread to store the file in the struct?
Does my file look correct, or do all the components need to be on one line?
If I understand correctly, you're asking if you can do
struct record r;
fread(&r, sizeof(r), 1, file);
or are you forced to use
struct record r;
fread(&r.order, sizeof(r.order), 1, file);
If this is your question, then the answer is: you have to read the fields one-by-one since there may be padding between struct members. Or, if you use a GNU-compatible compiler, you might instruct it not to include any padding by declaring your struct as "packed":
struct record {
// ...
} __attribute__((packed));
But this is not advised unless absolutely necessary (it's not portable).
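Reading field by field, as suggested above for a genuinely binary file, might be sketched like this. The function names write_record_binary and read_record_binary are hypothetical, and the struct is given the tag record for illustration. The sketch assumes the file was produced on a machine with the same int size and byte order; it does not make the format portable.

```c
#include <stdio.h>

struct record {
    char order;
    int row;
    int column;
    int length;
    char printChar;
};

/* Writes each field separately, so no padding bytes reach the file.
   Returns 1 on success, 0 on failure. */
int write_record_binary(FILE *f, const struct record *r)
{
    return fwrite(&r->order, sizeof r->order, 1, f) == 1
        && fwrite(&r->row, sizeof r->row, 1, f) == 1
        && fwrite(&r->column, sizeof r->column, 1, f) == 1
        && fwrite(&r->length, sizeof r->length, 1, f) == 1
        && fwrite(&r->printChar, sizeof r->printChar, 1, f) == 1;
}

/* Reads the fields back in the same order.
   Returns 1 on success, 0 on a short read or error. */
int read_record_binary(FILE *f, struct record *r)
{
    return fread(&r->order, sizeof r->order, 1, f) == 1
        && fread(&r->row, sizeof r->row, 1, f) == 1
        && fread(&r->column, sizeof r->column, 1, f) == 1
        && fread(&r->length, sizeof r->length, 1, f) == 1
        && fread(&r->printChar, sizeof r->printChar, 1, f) == 1;
}
```

The file should be opened in binary mode ("rb" / "wb") so that no newline translation is applied to the bytes.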
Also, is your file really a binary file? If not, you should pay attention to newline characters and converting the numbers from text to their actual numeric value.
It is not possible to fread a file in that format (essentially containing the character representations of the data) directly into the structure. One method for reading it would be to use fgets to read each line and assign the data into the structure (converting numeric values as necessary with functions such as strtol, or perhaps atoi if error checking is not as important).
Your file seems to be a text file, so if that's exactly the format of the file, you can use fscanf:
fscanf(file, "%c%d%d%d %c", &(record.order), &(record.row), ...
You can check the return value if you're interested in basic error handling. If you need a better description of the error, just use fgets to read one line at a time and parse it with sscanf, atoi, strtol and similar functions.
If you want to directly save data in the structure, no, you can't (with that kind of file): in a text file, 30 is a string of two characters, not an integer in binary form.
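Spelled out in full, the fscanf approach for the text file shown in the question could look like this sketch. The function name read_record_text is hypothetical, and the struct is given the tag record for illustration. The space before each %c matters: %c does not skip whitespace by itself, so without it the final %c would read the newline after 7 instead of X.

```c
#include <stdio.h>

struct record {
    char order;
    int row;
    int column;
    int length;
    char printChar;
};

/* Reads one record stored as text, one value per line (or separated
   by any other whitespace). Returns 1 on success, 0 otherwise. */
int read_record_text(FILE *f, struct record *r)
{
    /* A space in the format skips any run of whitespace, including
       newlines; %d already skips leading whitespace on its own. */
    int n = fscanf(f, " %c %d %d %d %c",
                   &r->order, &r->row, &r->column,
                   &r->length, &r->printChar);
    return n == 5;
}
```

Since the format is whitespace-tolerant, the same call also works if all five values sit on one line.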

Looking for patterns in binary files

I'm working on a small project in C where I have to parse a binary file of an undocumented file format. As I'm quite new to C, I have two questions for some more experienced programmers.
The first seems to be an easy one. How do I extract all the strings from the binary file and put them into an array? Basically I am looking for a simple implementation of the strings program in C.
When I open the binary file in any text editor I get a lot of rubbish with some readable strings mixed in. I can extract these strings using strings on the command line. Now I'd like to do something similar in C, like in the pseudocode below:
while (!EOF) {
if (string found) {
put it into array[i]
i++
}
}
return i;
The second problem is a little more complicated and is, I believe, the proper way of achieving the same thing. When I look at the file in a hex editor, it's easy to notice some patterns. For example, before each string there is a byte of value 02 (0x02), followed by the length of the string and then the string itself. For example, in 02 18 52 4F 4F 54 4B 69 57 69 4B 61 4B 69, the string part is the bytes after the length byte: 52 4F 4F 54 4B 69 57 69 4B 61 4B 69 (ROOTKiWiKaKi).
Now the function I'm trying to create would work like this:
while(!EOF) {
for(i=0; i<buffer_size; ++i) {
if(buffer[i] hex value == 02) {
int n = read the next byte;
string = read the next n bytes as char;
put string into array;
}
}
}
Thanks for any pointers. :)
The first seems to be an easy one. How do I extract all the strings from the binary file and put them into an array?
Figure out what character range represents printable ASCII characters (0x20 through 0x7E). Iterate across the file, checking whether each byte is a printable ASCII character and counting the length of each run of adjacent printable characters. By default, strings treats sequences of four or more printable characters as strings; when you hit the next non-printable byte, check whether the run reached that minimum, and if it did, output the string. Some bookkeeping is necessary.
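The run-counting idea above can be sketched as follows. The function name print_strings is hypothetical, and this version prints each string on its own line instead of storing it in an array; MIN_LEN mirrors the default minimum of four mentioned above.

```c
#include <stdio.h>

#define MIN_LEN 4 /* shortest run of printable bytes reported as a string */

/* Copies every run of at least MIN_LEN printable ASCII bytes from in
   to out, one run per line. Returns the number of runs found. */
int print_strings(FILE *in, FILE *out)
{
    char pending[MIN_LEN]; /* holds a run until it proves long enough */
    size_t run = 0;        /* length of the current printable run */
    int found = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c >= 0x20 && c <= 0x7E) {
            if (run < MIN_LEN) {
                pending[run] = (char)c;
                if (run + 1 == MIN_LEN) {
                    /* The run just reached the minimum: flush it. */
                    fwrite(pending, 1, MIN_LEN, out);
                    found++;
                }
            } else {
                fputc(c, out); /* already reporting this run */
            }
            run++;
        } else {
            if (run >= MIN_LEN)
                fputc('\n', out); /* terminate the reported string */
            run = 0;
        }
    }
    if (run >= MIN_LEN)
        fputc('\n', out);
    return found;
}
```

The input should be opened with "rb" so that the bytes are read untranslated; to collect the strings in an array instead, the writes to out would be replaced by appends to a growing buffer.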
The second problem is a little bit more complicated and is, I believe, the proper way of achieving the same thing.
Your pseudocode is essentially correct. You can directly compare the contents of buffer[i] with an integer (e.g. 2). Reading the next byte is as simple as incrementing i. Make sure you don't overrun the buffer, and make sure the array you're reading the string into is big enough (if the length field is only one byte, a 256-byte buffer is enough: at most 255 characters plus a terminating '\0').
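A bounds-checked version of that pseudocode might look like the sketch below. The function name extract_marked and the MARKER and MAX_STR constants are hypothetical, and the code assumes the layout described in the question (a 0x02 byte, a one-byte length, then the string bytes). A 0x02 byte can also occur by chance inside other data, so a real parser would probably validate the candidate string further, for example by checking that every byte is printable.

```c
#include <stddef.h>
#include <string.h>

#define MARKER 0x02
#define MAX_STR 256 /* 255 bytes (largest one-byte length) plus '\0' */

/* Scans buf for records of the form MARKER, length n, n string bytes.
   Copies up to max_out strings into out; returns how many were found. */
size_t extract_marked(const unsigned char *buf, size_t len,
                      char out[][MAX_STR], size_t max_out)
{
    size_t count = 0;
    size_t i = 0;

    /* i + 1 < len guarantees the length byte itself is in bounds. */
    while (i + 1 < len && count < max_out) {
        if (buf[i] != MARKER) {
            i++;
            continue;
        }
        size_t n = buf[i + 1];
        if (n == 0 || i + 2 + n > len) {
            i++; /* empty, or the string would overrun the buffer */
            continue;
        }
        memcpy(out[count], buf + i + 2, n);
        out[count][n] = '\0';
        count++;
        i += 2 + n; /* jump past the whole record */
    }
    return count;
}
```

The caller is expected to have read the file (or a chunk of it) into buf first, for example with fread on a stream opened in "rb" mode.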
I'm not sure your solution will work: what if you find a string 350 characters long?
Can numbers be part of a string, or do you consider them "rubbish"?
I think the safest way is:
Define what you consider a string and what you consider "rubbish". For instance, are ":,!?" "string" or "rubbish"?
Define a minimum string length to be considered a "readable" string.
Parse the file looking for every group of chars with length >= minimum.
I know, it's boring, but I think it's the only safe way. Good luck!

Resources