Need help parsing a "|" separated line from a file - c

I have to parse a file that looks something like this:
String|OtherString|1234|0
String2|OtherString2|4321|1
...
So I need to go through every line of the file and take each separate token of each line.
FILE *fp=fopen("test1.txt","r");
int c;
char str1[500];
char str2[500];
int num1=0;
int num2;
while((c=fgetc(fp))!=EOF){
fscanf(fp, "%s|%s|%d|%d", &str1[0], &str2[0], &num1, &num2);
}
fclose(fp);
There's more to it, but these are the sections relevant to my question. fscanf isn't working, presumably because I've written it wrong. What's supposed to happen is that str1 should be set to "String" in this case, str2 to "OtherString", and so on. However, fscanf doesn't seem to be doing anything. I would greatly appreciate some help.
EDIT: I am not set on using fgetc or fscanf; they are just what I have at the moment. I'd use anything that lets me do what I need to.

strtok() in a loop will work for you. The following is a bare-bones example with very little error handling, but it illustrates the concept:
char strArray[4][80];
char *tok = NULL;
char *dup = strdup(origLine);
int i = 0;
if(dup)
{
tok = strtok(dup, "|\n");
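while(tok && i < 4) /* don't write past the 4 rows of strArray */
{
snprintf(strArray[i], sizeof strArray[i], "%s", tok); /* truncates tokens longer than 79 chars instead of overflowing */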
tok = strtok(NULL, "|\n");
i++;
}
free(dup);
}
If reading from a file, put this loop inside another while loop that reads the file line by line. Useful functions for this include fopen(), fgets() and fclose(). One additional feature worth considering for code that reads data from a file is to determine the number of records (lines) in the file first, and use that to create a properly sized container to hold the parsing results. But that is for another question.
Note: fgetc() is not suggested here, as it reads one character per call and would be less efficient than using fgets() to read whole lines when used in conjunction with strtok().
Note also that, in general, the more consistently a file is formatted in terms of number of fields, content of fields, etc., the less complicated the parser needs to be. The converse is also true: a less consistently formatted input file requires a more complex parser. For example, a parser for human-entered line data is typically more complicated than one for a computer-generated set of uniform lines.

Related

Read a string from a file including white space and search a file for a given string in C

I am trying to search for a string in a text file like the one below:
"Naveen; Okies
PSG; Diploma
SREC; BECSE"
When the console asks for an input string and I type naveen, it prints Okies; when I type PSG, it prints Diploma. This works fine with the following code:
fscanf(fp, "%[^;];%s\n", temp, Mean);
However, it does not work with the following text file:
"Naveen; Okies Is it working
PSG; Diploma Is it working
SREC; BECSE Is it working"
My code still gives me Okies as the output for Naveen, whereas I need "Okies Is it working".
So I changed my code to fscanf(fp, "%[^;];%[^\n]s", temp, Mean); and now I get 'Okies Is it working' as output. But the search no longer moves on to the next line: when I search for PSG, I don't get any output.
Kindly help me to understand my issue.
Side-bar
Note that you should check the return value from fscanf().
You say you tried:
fscanf(fp, "%[^;];%[^\n]s", temp, Mean);
This is probably a confused format. The s at the end looks for a literal s in the input, but it will never be found, and you'll have no way of knowing that it was not found. The %[^\n] scan set conversion specification matches a sequence of non-newline characters; it stops only when the next character is a newline, or at EOF. The trailing s is therefore a literal s that can never be matched. But the return value from fscanf() is the number of successful conversions, which would probably be 2, so you have no way of spotting whether that s was read. It should be removed from the format string.
Main answer
To address your main question, the %s format stops at the first blank. If you want to process the whole line, don't use %s. Use either POSIX getline() or standard C fgets() to read the line, and then analyze it.
You can analyze it with strtok(). I wouldn't do that in any library code, because any library function that calls strtok() cannot be used from code that might also be using strtok(), nor can it call any function where that function, or one of the functions it calls directly or indirectly, uses strtok(). The strtok() function is poisonous: you can only use it in one function at a time. These comments do not apply to strtok_r() or the analogous Microsoft-provided variant strtok_s(), which is similar to but different from the strtok_s() defined in optional Annex K of C11. The variants with a suffix are reentrant and do not poison the system like strtok() does.
I'd probably use strchr() in this context; you could also look at strstr(), strpbrk(), strspn() or strcspn(). All of these are standard C functions, and have been since C89/C90.
I think Jonathan has explained it pretty well; however, I am adding a sample that shows how to deal with your example, for learning purposes. Bear in mind that you might want to change some functions, as I used the deprecated (insecure) ones; that could be an exercise for you.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strlen(), strncmp(); <windows.h> is not needed */
int main(void) {
    FILE *fp = NULL;
    char szBuffer[1024] = { '\0' };
    char szChoice[256] = { '\0' };
    if ((fp = fopen("test.txt", "r")) == NULL) {
        printf("Error opening file\n");
        return EXIT_FAILURE;
    }
    printf("Enter your choice: ");
    /* pass the array itself (not &szChoice) and limit the width to the buffer */
    if (scanf("%255s", szChoice) != 1) {
        fclose(fp);
        return EXIT_FAILURE;
    }
    size_t len = strlen(szChoice);
    while (fgets(szBuffer, sizeof(szBuffer), fp) != NULL) {
        /* match the whole key: the prefix must be followed by ';' */
        if (strncmp(szBuffer, szChoice, len) == 0 && szBuffer[len] == ';') {
            char *pch = szBuffer + len + 1;   /* skip the key and the ';' */
            while (*pch == ' ')               /* skip the blank after ';' */
                pch++;
            printf("Result: %s", pch);
        }
    }
    fclose(fp);
    getchar();
    return EXIT_SUCCESS;
}

sscanf to get segment of string surrounded by two fixed strings

I'm trying to remove the extension of a file name (I know it is .txt) using sscanf(). I've tried many format strings I thought might work, but with no success. The main problem is that I just can't understand sscanf()'s documentation, so I don't get how to use this: %[*][width][modifiers]type. I've tried telling it that the end must be ".txt", or saving the initial string in one variable and a %4c corresponding to the extension in another, but again… I can't make it work.
I know this has been asked here before (sscanf: get first and last token in a string), but as I said, I don't understand its solution.
The part of my code that does that:
sscanf(fileName,"the_sender_is_%s%*[.txt]", sender);
The input file name is, for example: "the_sender_is_Monika.txt"
In sender I should have
Monika
but whatever I try gives me
Monika.txt
When you use
sscanf(fileName,"the_sender_is_%s%*[.txt]", sender);
The function reads as much as it can with %s before it processes %*[.txt].
Use
sscanf(fileName,"the_sender_is_%[^.]", sender);
While sscanf() is powerful, it is not the universal tool. There are limits on what you can do with it, and you're hitting them. A moderate approximation to the task would be:
char body[32];
char tail[5];
if (sscanf("longish-name-without-dots.txt", "%31[^.]%4s", body, tail) != 2)
{
    /* oops: can't happen with the constant string, but maybe with a variable one */
}
This gets you longish-name-without-dots into body and .txt into tail. But it won't work all that well if there are dots in the name part before the extension.
You're probably looking for:
const char *file = "longish-name.with.dots-before.txt";
const char *dot = strrchr(file, '.');
if (dot == NULL)
{
    /* oops: can't happen with the literal, but maybe with a variable */
}
else if (strlen(dot) < sizeof(tail) && (size_t)(dot - file) < sizeof(body))
{
    strcpy(tail, dot);               /* sizes checked above, so no buffer overflow */
    memcpy(body, file, dot - file);
    body[dot - file] = '\0';
}

fprintf only specific lines from text file

I'm making a program that adds user records to a text file; so far so good! Yet I ran into a problem that I cannot figure out on my own.
int main()
{
FILE *fp;
struct info
{
char name[15];
char surename[15];
char gender[15];
char education[15];
} info;
char c;
int i,j,a;
struct info sem;
beginning:
scanf("%d",&a);
if (a==1)
At this point, if the user chooses option 1, the program needs to check every person's record in the text file and printf the information of every person who has a bachelors education.
{
FILE *fp=fopen("info.txt", "r");
char tmp[256]={0x0};
while(fp!=NULL && fgets(tmp, sizeof(tmp),fp)!=NULL)
{
if(strstr(tmp,"bachelors"))
printf("test test");
fprintf(fp, "\n%s %s %s %s %s %s",
sem.name,
sem.surname,
sem.gender,
sem.education,);
}
if(fp!=NULL) fclose(fp);
goto beginning;
}
So far this code detects the word "bachelors" but doesn't print out the line where it found it; any ideas how to solve this? Also, any suggestions on how to make sure the program only checks the education field and doesn't give a false positive if someone is named Bachelors?
printf/fprintf use internal buffers to avoid calling the write syscall every time.
I would say you should add a '\n' character at the end of your string: stdout is usually line-buffered when connected to a terminal, so the newline forces the buffer to be flushed and "test test" actually appears. Moreover, tmp holds at most 255 characters, so fgets() reads a longer line in pieces, and a "bachelors" that straddles two pieces will not be found.
The second question depends on how your file is formatted, and you are the only one who knows that.
What output do you actually get? You say it doesn't want to print out the correct lines, but does that mean it prints all lines, no lines, some but not others?
Looking at your program, I would hazard a guess that every line is being printed out. You need braces around if(strstr(tmp,"bachelors")) if you want more than one statement in the body of the if.
EDIT:
In addition to your braces problem, you are attempting to use fprintf to print back to fp, which was opened in read mode (using the r flag). You need to use read/update mode (r+) if you want to modify info.txt. However, this is probably not the way you want to do this.
Firstly, you stated you wanted to "printf" the data, which means printing to standard out. If so then you should use printf instead of fprintf. On the other hand it would seem likely that what you want to do is to read lines from one text file and print data out to another file. In that case you probably want two files.
Even with the above modifications, you will be printing the same data to file each time, since the sem struct is not being updated at all.
Finally, your fprintf format string expects six inputs and you only have four (and an erroneous trailing ,). Why?

Skip remainder of line with fscanf in C

I'm reading a file, and after reading in a number I want to skip the remaining part of that line. An example of such a file is:
2 This part should be skipped
10 and also this should be skipped
other part of the file
At the moment I solve this by using this loop:
char c = '\0';
while(c!='\n') fscanf(f, "%c", &c);
I was however wondering whether there isn't a better way of doing this. I tried this, but for some reason it isn't working:
fscanf(f, "%*[^\n]%*c");
I would have expected this to read everything up to the new line and then also read the new line. I don't need the content, so I use the * operator. However, when I use this command nothing happens. The cursor isn't moved.
I suggest you use fgets() and then sscanf() to read the number. The scanf() function is error-prone: it is easy to get the format string wrong in a way that seems to work for most cases but fails unexpectedly on some specific input formats.
A quick search for scanf() problems on SO shows how often people get it wrong and run into problems.
fgets() + sscanf() instead gives you better control: you know for sure that you have read exactly one line, and you can then process that line to read the integer out of it:
char line[1024];
int num;
while(fgets(line, sizeof line, fp) ) {
if( sscanf(line, "%d", &num) == 1 )
{
/* number found at the beginning */
}
else
{
/* Any message you want to show if number not found and
move on the next line */
}
}
You may want to change how you read num from line depending on the format of the lines in the file. But in your case, it seems the integer is either at the start of the line or not present at all, so the above will work fine.
#include <stdio.h>
int main(void){
    FILE *f = fopen("data.txt", "r");
    int n;
    if(f == NULL){
        perror("data.txt");
        return 1;
    }
    do{
        /* try to read a number at the current position */
        if(1==fscanf(f, "%d", &n)){
            printf("n=%d\n", n);
        }
        /* discard the rest of the line; the next %d skips the newline */
    }while(EOF!=fscanf(f, "%*[^\n]"));
    fclose(f);
    return 0;
}
I wanted to parse the /proc/self/maps file, but only wanted the first 2 columns (start and end of address range). This worked fine with Linux gcc.
scanf("%llx-%llx %*[^\n]\n", &i, &e);
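The trick was "%*[^\n]\n", which means skip a sequence of anything except the end of line, then skip the end of line. (Strictly, a whitespace character in a scanf() format skips any amount of whitespace, so the \n also skips blank lines and leading blanks on the next line.)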

fprintf prints a new line at the beginning of the file

I'm using fprintf() to print to a new file.
I'm using the following command to write multiple times:
fprintf(fp, "%-25s %d %.2f %d",temp->data.name, temp->data.day, temp->data.temp, temp->data.speed);
The problem is that sometimes the file gets an extra new line as the first character.
Could this be leftovers from some buffer? I don't really know...
typedef struct Data {
char name[26];
int day;
int speed;
float temp;
} Data ;
@spatz, you were right. I'm kind of new to format strings, and I was told to write one for fscanf where I should expect an undetermined amount of space between the bits of data. Here is what I came up with; I'm pretty sure it's the source of the problem:
check=fscanf(fp1, "%20c%*[^0-9]%d%*[^0-9]%f%*[^0-9]%d%*[^\n]%*c", name, &day, &temp, &speed);
Only the first line gets read correctly; every line after that starts with the newline left over from the previous line.
Can someone please show me the proper way to write this thing?
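Rather than calling fscanf() over and over and hoping that the newlines match up the way you want, use fgets() to get one line at a time, parse it using sscanf(), and do error handling on a line-by-line basis. This will be less error-prone, and it sounds like it will clear up your problem with no extra effort.
Your problem is that name starts with a newline, and that newline ends up in the file. %20c does not skip leading whitespace, and when nothing follows the last number on a line, %*[^\n] matches nothing, so the conversion fails before %*c can consume the newline; the next call's %20c then reads it.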
In order to properly parse the file I would have to know its format, but for now I assume it's <string> <int> <int> <float> where the number of spaces between each element may vary.
The format string I would start with is simply "%s%d%d%f", and let fscanf() deal with the whitespace. With this format string I was able to properly parse lines like
foo 3 4 7
If this does not satisfy you, feel free to elaborate on the format of the file you are parsing, and I'll try to come up with solutions.
