File data split by ";" and then by "," in C

Basically, I need to read input from a file. The data in the file looks something like this:
A422TCRE234VD3KJ349D;2000,Gasoleo;Vermelho,;17,200,Platinum;17,200,45;
3KJ349DA422TCRE234VD;,diesel;Azul,Minivan;17,200,45;10,20,30;
DA422TC3KJ349RE234VD;,;Vermelho,;,,;,,;
Next, I need to read this data and split it on ";", which I'm doing with sscanf, like this:
sscanf(line," %[^;];%[^;];%[^;];%[^;];%[^;]", campos[0],campos[1], campos[2] ,campos[3],campos[4]);
"line" holds an entire line read from the file. All fine up to this point. Now I need to split each of those arrays (campos[x]) on "," because I need to save each value in the different structs I created. I was trying to do this with an auxiliary array, "output", into which I combine all of the previous fields, like so:
strcpy(output,campos[1]);
strcat(output,",");
strcat(output,campos[2]);
strcat(output,",");
strcat(output,campos[3]);
strcat(output,",");
strcat(output,campos[4]);
Then I use sscanf again to try to split them:
sscanf(output, " %[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,],%[^,]",helper[0],helper[1],helper[2],helper[3],helper[4],helper[5],helper[6],helper[7],helper[8], helper[9]);
But no luck, it's not working, probably because, as you can see in the input file, some of the attributes are empty. The thing is, I need them to stay empty, because afterwards I need to show the data to the user in the same format as the input file.
I've tried using strtok() and many variations of the code here; nothing seems to work.
To give you a better idea, here is the output I'm currently getting:
puts(output); -> 2000,Gasoleo,Vermelho,,17,200,Platinum,17,200,45
,diesel,Azul,Minivan,17,200,45,10,20,30
,,Vermelho,,,,,,,
printf("%s,%s\n", helper[0],helper[1]); -> 2000,Gasole
,
,
To give a better perspective: this file represents work orders for a car factory, so each line is laid out as follows:
Work Order Id;Horsepower,Fuel Type;Colour,Model;Diameter,Width,Colour;Diameter,Width,Height;
For example, Horsepower and Fuel Type are engine attributes; each ";" separates one car part from the next.
What can I try to fix this?

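The fact that some fields are empty makes it difficult to use sscanf, or strtok for that matter: a %[^,] conversion fails when there is nothing to match, and strtok treats consecutive delimiters as a single one, so empty fields are lost.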
Maybe a custom algorithm is the best way to go:
#include <stdio.h>

int main(void) {
    FILE *f = fopen("file.txt", "r");
    if (f == NULL) {
        perror("File");
        return 1;
    }
    size_t i = 0, j = 0, it = 0;
    char line[100];
    char helper[12][100];
    while (fgets(line, sizeof(line), f)) {
        int c;
        i = j = it = 0;
        // read and assign cycle; i < 12 keeps us inside helper[]
        while ((c = line[it]) != '\n' && c != '\0' && i < 12) {
            if (c != ';' && c != ',') { // if character is not ',' or ';', assign it
                helper[i][j++] = c;
            }
            else {
                helper[i][j] = '\0'; // else terminate the current field
                i++;                 // go to next helper line
                j = 0;               // reset column iterator
            }
            it++;
        }
        for (size_t a = 1; a < i; a++) { // test print (skips the work order id)
            printf(",%s", helper[a]);
        }
        printf("\n");
    }
    fclose(f);
    return 0;
}
Output:
,2000,Gasoleo,Vermelho,,17,200,Platinum,17,200,45
,,diesel,Azul,Minivan,17,200,45,10,20,30
,,,Vermelho,,,,,,,
As requested.
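If the goal is to fill separate structs per car part, the fields need to stay grouped by part. A minimal sketch, assuming at most 5 parts of at most 3 attributes each (MAX_PARTS, MAX_ATTRS, FIELD_LEN, copy_field and split_line are illustrative names, not from the question), splits first on ';' and then on ',' with strcspn and memchr, so empty fields survive as empty strings:

```c
#include <string.h>

#define MAX_PARTS 5   /* work order id + 4 car parts, per the layout in the question */
#define MAX_ATTRS 3   /* at most 3 comma-separated attributes per part */
#define FIELD_LEN 32  /* assumed maximum field length, '\0' included */

/* Copy len bytes of src into dst, truncating to fit, and terminate it. */
static void copy_field(char dst[FIELD_LEN], const char *src, size_t len)
{
    if (len >= FIELD_LEN)
        len = FIELD_LEN - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split line on ';' into parts and each part on ',' into attributes.
   Empty fields are kept as empty strings. counts[p] receives the number
   of attributes in part p. Returns the number of parts found. */
int split_line(const char *line, char out[MAX_PARTS][MAX_ATTRS][FIELD_LEN],
               int counts[MAX_PARTS])
{
    int p = 0;
    const char *s = line;
    while (p < MAX_PARTS && *s != '\0' && *s != '\n') {
        const char *end = s + strcspn(s, ";\n");  /* end of this part */
        const char *a = s;                        /* start of current attribute */
        int n = 0;
        while (n < MAX_ATTRS) {
            const char *comma = memchr(a, ',', (size_t)(end - a));
            const char *stop = comma ? comma : end;
            copy_field(out[p][n++], a, (size_t)(stop - a));
            if (comma == NULL)
                break;
            a = comma + 1;            /* ",," yields an empty attribute */
        }
        counts[p++] = n;
        s = (*end == ';') ? end + 1 : end;  /* step over the ';' */
    }
    return p;
}
```

On the third sample line, split_line returns 5 parts, and a part such as ",," comes back as three empty attributes; joining the attributes with ',' and the parts with ';' therefore reproduces the original format.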

Related

Parsing a file while reading in C

I am trying to read each line of a file and store the binary values in appropriate variables.
I can see that there are many other examples of people doing similar things, and I have spent two days testing different approaches I found, but I am still having difficulty getting my version to work as needed.
I have a txt file with the following format:
in = 00000000000, out = 0000000000000000
in = 00000000001, out = 0000000000001111
in = 00000000010, out = 0000000000110011
......
I'm attempting to use fscanf to consume the unwanted characters "in = ", "," and "out = " and keep only the characters that represent binary values.
My goal is to store the first column of binary values (the "in" values) in one variable and the second column (the "out" values) in another buffer variable.
I have managed to get fscanf to consume the "in" and "out" characters, but I have not been able to figure out how to get it to consume the "," and "=" characters. Additionally, I thought that fscanf should consume the white space, but it doesn't appear to be doing that either.
I can't seem to find a comprehensive list of the available scanf directives, other than the generic "%d, %s, %c.....", and it seems that I need a more complex combination of directives than I know how to write to filter out the characters I'm trying to ignore.
I would appreciate any guidance to help me understand how to properly filter out "in = " and ", out = " and how to store the two columns of binary characters in two separate variables.
Here is the code I am working with at the moment. I have tried other iterations of this code using fgetc() in combination with fscanf() without success.
int main()
{
FILE * f = fopen("hamming_demo.txt","r");
char buffer[100];
rewind(f);
while((fscanf(f, "%s", buffer)) != EOF) {
fscanf(f,"%[^a-z]""[^,]", buffer);
printf("%s\n", buffer);
}
printf("\n");
return 0;
}
The outputs from my code appear as follows:
= 00000000000,
= 0000000000000000
= 00000000001,
= 0000000000001111
= 00000000010,
= 0000000000110011
Thank you for your time.
The scanf family of functions is said to be a poor man's parser because it is not very tolerant of input errors. But if you are sure of the format of the input data, it allows for simple code. The only magic here is that a space in the format string matches any amount of white space, including newlines, or none at all. Your code could become:
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("hamming_demo.txt", "r");
    if (NULL == f) { // always test open
        perror("Unable to open input file");
        return 1;
    }
    char in[50], out[50]; // directly get in and out
    // BEWARE: fscanf returns the number of successful conversions (EOF only if
    // input fails before the first one), so compare with 2, not with EOF.
    // The width 49 keeps %[01] from overflowing the 50-byte buffers.
    while (fscanf(f, " in = %49[01], out = %49[01]", in, out) == 2) {
        printf("%s - %s\n", in, out);
    }
    printf("\n");
    fclose(f);
    return 0;
}
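To check the format string in isolation, the same directives can be applied with sscanf to a single line held in memory. parse_hamming_line is a hypothetical helper written for this sketch, and it assumes 50-byte buffers as above:

```c
#include <stdio.h>

/* Parse one "in = <bits>, out = <bits>" line. Each space in the format
   matches any amount of white space (including none); "in", "=", "," and
   "out" must appear literally. Returns 1 on success, 0 otherwise. */
int parse_hamming_line(const char *line, char in[50], char out[50])
{
    return sscanf(line, " in = %49[01], out = %49[01]", in, out) == 2;
}
```

Because each space matches zero or more white-space characters, the helper also accepts a compact form such as in=00000000010,out=0000000000110011, while a line with no digits after "in =" stops sscanf with a matching failure and the helper returns 0.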
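So basically you want to filter '0' and '1'? In that case fgets and a simple loop are enough: copy each 0 and 1 to the front of the buffer, counting them as you go, and null-terminate the string at the end (note that this keeps the in and out digits run together on a single line):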
#include <stdio.h>

int main(void)
{
    char str[50];
    char *ptr;
    // Replace stdin with your file
    while ((ptr = fgets(str, sizeof str, stdin)))
    {
        int count = 0;               // number of binary digits kept so far
        while (*ptr != '\0')
        {
            if ((*ptr >= '0') && (*ptr <= '1'))
            {
                str[count++] = *ptr; // count never passes ptr, so this is safe
            }
            ptr++;
        }
        str[count] = '\0';
        puts(str);
    }
    return 0;
}
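If the two columns should end up in separate variables, the same filtering idea can switch destination at the comma. A sketch, where split_bits is a hypothetical name and both buffers are assumed to be at least as long as the input line:

```c
/* Copy the '0'/'1' characters of line into in up to the first ',' and
   into out after it. Each buffer must be at least as long as line. */
void split_bits(const char *line, char *in, char *out)
{
    char *dst = in;   /* current destination column */
    int n = 0;
    out[0] = '\0';    /* stays empty if the line has no ',' */
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == ',' && dst == in) {   /* switch to the second column */
            dst[n] = '\0';
            dst = out;
            n = 0;
        } else if (*p == '0' || *p == '1') {
            dst[n++] = *p;
        }
    }
    dst[n] = '\0';
}
```

Inside the fgets loop above, calling split_bits(str, in, out) and then printf("%s - %s\n", in, out) would print the two columns separately.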

Searching for how many times a word appears in a text file in C

I am new to C and pointers, so it is still confusing as hell! Below is the code of a function whose main purpose is to find how many times a word appears in a text file. Any help will be appreciated!
void count_occurrences (int n, FILE *file, Entry *entries) {
file = fopen("test/flicka.txt", "r");
if (file != NULL) {
char buff[LINE_MAX_CHARS];
int i = 0;
char * haystack = fgets(buff, 1000, file);
char * needle = NULL;
char * p = NULL;
while (haystack != NULL) {
for (i; i < n; i++) {
needle = entries[i].string;
while ( (p = strstr(haystack, needle)) != NULL) {
entries[i].count++;
p++;
}
}
haystack = fgets(buff, 1000, file);
i = 0;
}
fclose(file);
}
else {
printf("File not found!\n");
}
}
The problem with an exercise like this is that the best way of solving the specific problem (a character-based state machine attached to the stream) doesn't scale up to larger problems.
To do it the first way, you maintain a "parse position", which is initially zero. You then call fgetc() in a loop until the data runs out and you get EOF. If the character matches the character at the parse position, increment the parse position; if the parse position reaches the end of the string, you have a match, so increment the count. If it doesn't match, reset the parse position to zero or one, depending on whether the character matches the first character of the word.
The first way is fast and easy, but inflexible.
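The first way can be sketched as below, assuming a non-empty word; count_in_stream is a hypothetical name. It keeps the simple zero-or-one reset, which is only correct for words without a repeated prefix: searching for "aab" in "aaab" misses the match, and an algorithm such as Knuth-Morris-Pratt is needed for the general case.

```c
#include <stdio.h>
#include <string.h>

/* Count non-overlapping occurrences of word in the stream with a
   character-level state machine: pos is how many characters of word
   have matched so far. The simple reset is only correct for words
   without a repeated prefix (it misses "aab" in "aaab"). */
int count_in_stream(FILE *fp, const char *word)
{
    size_t len = strlen(word), pos = 0;
    int count = 0, c;
    if (len == 0)
        return 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == (unsigned char)word[pos]) {
            if (++pos == len) {     /* full match */
                count++;
                pos = 0;
            }
        } else {
            /* mismatch: restart, keeping c if it can begin a new match */
            pos = (c == (unsigned char)word[0]) ? 1 : 0;
        }
    }
    return count;
}
```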
A more scalable way is based on line input. Call fgets with a big buffer if you know lines must be short, or build a "getline" (POSIX systems already provide getline()) if lines are unbounded. Then call strstr on the line to see if you have a match. If you do, advance the pointer past the match and check for another.
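The scalable way separates the parse from the I/O and allows you to search for multiple patterns. In C (the loop assumes an open FILE *file):
/* Count the occurrences of word in line (needs <string.h>) */
int countwords(const char *line, const char *word)
{
    int answer = 0;
    const char *ptr = line;
    if (*word == '\0')
        return 0;                 // an empty word would never advance
    while ((ptr = strstr(ptr, word)) != NULL)
    {
        ptr += strlen(word);      // replace strlen(word) with 1 to allow overlaps
        answer++;
    }
    return answer;
}

char line[1024];                  // big enough for the expected lines
int N = 0;
while (fgets(line, sizeof line, file))
{
    N += countwords(line, "myword");
}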
Obviously you now need to modify the main loop to search for several words, keeping an array of Ns and calling countwords repeatedly, once per word. But it scales up to any sort of pattern matching.
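Extending the loop to several words might look like the sketch below. The Entry layout (string and count members) is inferred from how the question uses entries[i], so its exact definition is an assumption, and unlike the question's version this count_occurrences takes an already opened FILE * instead of calling fopen itself:

```c
#include <stdio.h>
#include <string.h>

/* Assumed layout, inferred from entries[i].string and entries[i].count. */
typedef struct {
    const char *string;
    int count;
} Entry;

/* Count non-overlapping occurrences of word in line. */
static int countwords(const char *line, const char *word)
{
    size_t len = strlen(word);
    int answer = 0;
    if (len == 0)
        return 0;               /* an empty word would never advance */
    for (const char *p = strstr(line, word); p != NULL; p = strstr(p + len, word))
        answer++;
    return answer;
}

/* Read file line by line and add each entry's matches to its count. */
void count_occurrences(int n, FILE *file, Entry *entries)
{
    char buff[1024];            /* assumed maximum line length */
    while (fgets(buff, sizeof buff, file) != NULL)
        for (int i = 0; i < n; i++)
            entries[i].count += countwords(buff, entries[i].string);
}
```

Because each search restarts from p + len, the question's endless loop (calling strstr on the same haystack again and again) cannot happen here.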

Using fgetc to pass only part of a text file to a buffer

I have the following text file:
13.69 (s, 1H), 11.09 (s, 1H).
So far I can quite happily use either fgets or fgetc to pass all text to a buffer as follows:
char* data;
data = malloc(sizeof(char) * 100);
int c;
int n = 0;
FILE* inptr = NULL;
inptr = fopen("NMR", "r");
if(NULL == fopen("NMR", "r"))
{
printf("Error: could not open file\n");
return 1;
}
for (c = fgetc(inptr); c != EOF && c != '\n'; c = fgetc(inptr))
{
data[n++] = c;
}
for (int i = 0, n = 100; i < n; i++)
{
printf ("%c", data[i]);
}
printf("\n");
and then print the buffer to the screen afterwards. However, I only want to pass part of the text file to the buffer, namely:
13.69 (s, 1H),
So I want fgetc to stop after ','. However, that means the text will stop at 13.69 (s, and not at 13.69 (s, 1H),
Is there a way around this? I have also experimented with fgets and then using strstr as follows:
char needle[4] = ")";
char* ret;
ret = strstr(data, needle);
printf("The substring is: %s\n", ret);
However, the output from this is:
), 11.09 (s, 1H)
thus giving me the rest of the string, which I do not want. It's an interesting one, and if anyone has any tips it would be much appreciated!
If you know that the closing parenthesis is the last character you want, you can use that as your stopping point in the fgetc() loop:
char data[100]; // No need to allocate dynamically if we know the size at compile time
int c;
int n = 0;
FILE *inptr = fopen("NMR", "r");
if (inptr == NULL) // Check the stream we just opened and plan to use
{
    printf("Error: could not open file\n");
    return 1;
}
// Keep the original guards (EOF and '\n') and add two more to make sure
// we break from the loop.
// We use n < 98 so that we can always build a null-terminated string:
// with 99, the 100th character might be a ')', leaving no room for the
// terminating null character.
for (c = fgetc(inptr); c != ')' && n < 98 && c != EOF && c != '\n'; c = fgetc(inptr))
{
    data[n++] = c;
}
fclose(inptr);
if (c != ')') // We hit EOF, '\n', or ran out of space in data[]
{
    printf("Error: no matching sequence found\n");
    return 2;
}
data[n] = ')';        // Could also write data[n] = c here, since we know it's a ')'
data[n + 1] = '\0';   // Add the terminating null character
printf("%s\n", data); // Since it's a properly terminated string, we can use %s
(Note that this example handles null input characters differently from yours. If you expect null characters in the input stream (the NMR file), change the printf("%s", ...) line back to the for loop you originally had.)
With only one example of the format you are trying to parse, it's not possible to give a complete answer. However, if your input is always like this, I would simply keep a counter and break after the second comma.
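int comma = 0;
// data is assumed to be a buffer of at least 100 chars, as in the question
for (c = fgetc(inptr); c != EOF && c != '\n' && n < 99; c = fgetc(inptr))
{
    data[n++] = c;                // keep every character, commas included
    if (c == ',' && ++comma == 2)
        break;                    // stop right after the second comma
}
data[n] = '\0';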
In case the characters inside the parentheses can be more complex, I would simply maintain a boolean state to know whether I am inside or outside a parenthesis, and break when I read a comma outside of it.
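The boolean-state idea can be sketched as follows; read_until_outer_comma is a hypothetical name, and the sketch assumes parentheses are not nested (a depth counter would be needed otherwise):

```c
#include <stdio.h>

/* Copy characters from fp into buf (bufsize bytes) up to and including the
   first ',' found outside parentheses. Returns 1 if that comma was found,
   0 if EOF, a newline or a full buffer came first; buf is always terminated. */
int read_until_outer_comma(FILE *fp, char *buf, size_t bufsize)
{
    int inside = 0;   /* boolean state: are we inside "( ... )"? */
    size_t n = 0;
    int c;
    if (bufsize == 0)
        return 0;
    while (n + 1 < bufsize && (c = fgetc(fp)) != EOF && c != '\n') {
        buf[n++] = (char)c;
        if (c == '(')
            inside = 1;
        else if (c == ')')
            inside = 0;
        else if (c == ',' && !inside) {
            buf[n] = '\0';
            return 1;
        }
    }
    buf[n] = '\0';
    return 0;
}
```

With the question's input this stores 13.69 (s, 1H), in buf and returns 1, because the comma after s is skipped while inside is set.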
Simply read using fgets and store the desired string in a char * using sscanf:
char *new_data;
new_data = malloc(100); // allocate memory
...
fgets(data, 100, inptr); // read from file, but check its return
if (sscanf(data, "%98[^)]", new_data) == 1) // store the string up to ')' in new_data
{
    strcat(new_data, ")");  // append ")"; the width 98 leaves room for it and '\0'
    printf("%s", new_data); // print new_data
}
...
free(new_data); // remember to free memory
Also, you should check the return value of malloc (not done in my example) and close the file you opened.

How to store the even lines of a file to one array and the odd lines to another

I am given a file of DNA sequences and asked to compare all of the sequences with each other and delete the sequences that are not unique. The file I am working with is in FASTA format, so the odd lines are the headers and the even lines are the sequences that I want to compare. So I am trying to store the even lines in one array and the odd lines in another. I am very new to C, so I'm not sure where to begin. I figured out how to store the whole file in one array like this:
int main(){
int total_seq = 50;
char seq[100];
char line[total_seq][100];
FILE *dna_file;
dna_file = fopen("inabc.fasta", "r");
if (dna_file==NULL){
printf("Error");
}
while(fgets(seq, sizeof seq, dna_file)){
strcpy(line[i], seq);
printf("%s", seq);
i++;
}
}
fclose(dna_file);
return 0;
}
I was thinking I would have to incorporate some sort of code that looked like this:
for (i = 0; i < rows; i++){
if (i % 2 == 0) header[i/2] = getline();
else seq[i/2] = getline();
but I'm not sure how to implement it.
Any help would be greatly appreciated!
To store the even lines of a file in one place and the odd lines in another,
read each char and switch output files whenever a '\n' is encountered:
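#include <stdio.h>

// Lines counted from 0 (0, 2, 4, ...) go to even, the others to odd
void Split(FILE *even, FILE *odd, FILE *source) {
    int evenflag = 1;
    int ch;
    while ((ch = fgetc(source)) != EOF) {
        if (evenflag) {
            fputc(ch, even);
        } else {
            fputc(ch, odd);
        }
        if (ch == '\n') {
            evenflag = !evenflag; // the next line goes to the other file
        }
    }
}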
It is not clear if this post also requires code to do the unique filtering step.
Could you please give me an example of the data in the file?
Am I right in thinking it'd be something like:
Header
Sequence
Header
Sequence
And so on
Perhaps you could do something like this:
#include <stdio.h>
#include <string.h>

int main(){
    enum { TOTAL_SEQ = 50 };
    char seq[100];
    char header[TOTAL_SEQ][100];
    char sequence[TOTAL_SEQ][100];
    FILE *dna_file = fopen("inabc.fasta", "r");
    if (dna_file == NULL){
        printf("Error");
        return 1;
    }
    int counter = 1;
    int n = 0; // number of header/sequence pairs read so far
    while (n < TOTAL_SEQ && fgets(seq, sizeof seq, dna_file)){
        seq[strcspn(seq, "\r\n")] = '\0'; // drop the newline so comparisons are exact
        if (counter % 2 == 1)             // odd line: place it in the headers array
            strcpy(header[n], seq);
        else                              // even line: place it in the sequence array
            strcpy(sequence[n++], seq);
        counter++;
    }
    fclose(dna_file);
    // Now you have all the sequences & headers. Mark duplicates with '~',
    // our chosen escape character
    for (int j = 0; j < n; j++){
        if (sequence[j][0] == '~') // already marked as a duplicate
            continue;
        for (int k = j + 1; k < n; k++){
            if (strcmp(sequence[j], sequence[k]) == 0){
                strcpy(sequence[k], "~");
                strcpy(header[k], "~");
            }
        }
    }
    // Rather than moving later entries down over each '~', just skip them
    // when writing back out (stdout here; use fprintf for a file)
    for (int x = 0; x < n; x++){
        if (sequence[x][0] != '~')
            printf("%s\n%s\n", header[x], sequence[x]);
    }
    return 0;
}
First, write a function that counts the number of newline (\n) characters in the file.
Then write a function that searches for the n-th newline.
Last, write a function that goes through the file and reads from one '\n' to the next.
Alternatively, you could just go online and read about string parsing.
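Steps two and three could be combined roughly as follows: line_offset (a hypothetical helper) rewinds the stream, counts '\n' characters with fgetc, and reports the position just past the n-th one, after which fseek and fgets read that line. Counting all newlines (step one) is the same loop without the early return:

```c
#include <stdio.h>

/* Return the offset at which line n (counting from 0) starts, i.e. just
   past the n-th '\n', or -1 if the file has fewer newlines. Rewinds fp. */
long line_offset(FILE *fp, long n)
{
    long newlines = 0;
    int c;
    rewind(fp);
    if (n == 0)
        return 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n' && ++newlines == n)
            return ftell(fp);   /* position just past the n-th '\n' */
    }
    return -1;
}
```

For example, off = line_offset(fp, 3) followed by fseek(fp, off, SEEK_SET) and fgets reads the fourth line. On a text stream the ftell value is only guaranteed to be meaningful to fseek, which is the only way it is used here.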

Same Array in Different Procedures

I'm really new to C, and currently I'm trying to read from a file which contains a list of names and import that into an array. The array is of type char[][], since it will hold more information than just the name, but essentially I want team[0][0] to be the first name I read in, team[1][0] to be the second, etc. I'm pretty sure the actual importing of the names is correct, but I'm having problems storing these arrays.
FILE *teamfile;
teamfile = fopen(file, "r");
char line[MAXLENGTH+1];
int i = 0;
while( fgets(line, sizeof line, teamfile) != NULL )
{
trim_line(line);
strcpy(&team[i][NAME],line);
i++;
}
fclose(teamfile);
Which is called from the main function as teams = teamlist(argv[1], team);
But when I try to refer to the array from elsewhere in my program, e.g. printf(&team[0][0]), it outputs what seems to be all the names in one block...
What am I doing wrong?
edit:
static void trim_line(char line[])
{
int i = 0;
// LOOP UNTIL WE REACH THE END OF line
while(line[i] != '\0')
{
// CHECK FOR CARRIAGE-RETURN OR NEWLINE
if( line[i] == '\r' || line[i] == '\n' )
{
line[i] = '\0'; // overwrite with nul-byte
break; // leave the loop early
}
i = i+1; // iterate through character array
}
}
thanks for the help so far! :D
If team is declared as char team[NUM_OF_TEAMS][LENGTH_OF_NAME],
then it should always be strcpy(team[i], line); (team[i] is the i-th row and decays to a char *, the same as &team[i][0]).
Hint: it is a char array, not a "string object" in C
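A minimal sketch of loading one name per row, assuming the array is char team[NUM_OF_TEAMS][LENGTH_OF_NAME] with illustrative sizes; load_names is a hypothetical helper. Each name should then be printed with printf("%s\n", team[i]), since passing the name itself as the format string, as in printf(&team[0][0]), is undefined behaviour if a name contains a '%':

```c
#include <stdio.h>
#include <string.h>

#define NUM_OF_TEAMS 16    /* illustrative sizes, not from the question */
#define LENGTH_OF_NAME 64

/* Read one name per line from fp into the rows of team, stripping the
   trailing "\r\n". Returns the number of names stored. */
int load_names(FILE *fp, char team[NUM_OF_TEAMS][LENGTH_OF_NAME])
{
    char line[LENGTH_OF_NAME];
    int i = 0;
    while (i < NUM_OF_TEAMS && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* same job as trim_line */
        strcpy(team[i], line);               /* team[i] is row i, i.e. &team[i][0] */
        i++;
    }
    return i;
}
```

After count = load_names(teamfile, team), a loop such as for (int k = 0; k < count; k++) printf("%s\n", team[k]); prints each name on its own line.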
