Need help parsing data from a .csv file in C

I have the following .csv file containing information about the song, artist, release year (if specified) and number of listens:
Look What The Cat Dragged In,Poison,,Look What The Cat Dragged In by Poison,1,0,1,0
Nothin' But A Good Time,Poison,1988,Nothin' But A Good Time by Poison,1,1,21,21
Something To Believe In,Poison,1990,Something To Believe In by Poison,1,1,1,1
Talk Dirty To Me,Poison,1978,Talk Dirty To Me by Poison,1,1,1,1
A Salty Dog,Procol Harum,1969,A Salty Dog by Procol Harum,1,1,1,1
A Whiter Shade of Pale,Procol Harum,1967,A Whiter Shade of Pale by Procol Harum,1,1,3,3
Blurry,Puddle of Mudd,2001,Blurry by Puddle of Mudd,1,1,1,1
Amie,Pure Prairie League,,Amie by Pure Prairie League,1,0,4,0
Another One Bites the Dust,Queen,1980,Another One Bites the Dust by Queen,1,1,102,102
Bicycle Race,Queen,1978,Bicycle Race by Queen,1,1,3,3
Kiss You All Over,Kiss,1978,Kiss You All Over by Kiss,1,1,5,5
The name of the file and the desired year should be given as command line arguments, and the program should print all songs from that specific year.
e.g.: ./a.out music.csv 1978
Output:
Talk Dirty To Me
Bicycle Race
Kiss You All Over
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAX 300
typedef struct {
char song[101], *artist, *line;
long int year;
} music;
int checkYear(char *word)
{
for (int i = 0; i < strlen(word); i++) {
if (!isdigit(word[i]))
return 0;
}
return 1;
}
int main(int argc, char **argv)
{
FILE *fin = fopen(argv[1], "r");
if (!fin)
{
printf("Error opening the file.\n");
return 1;
}
char buf[MAX];
//int nLines = 0; //count the number of lines
//music *array = NULL;
while( fgets(buf, MAX, fin))
{
buf[strcspn(buf, "\n")] = '\0'; // strip the trailing newline
char *word = strtok(buf, ",");
while (word)
{
//printf("Word is : %s\n", word);
if (checkYear(word))
{
//printf("Year : %s\n", word);
music *array = (music *)malloc(sizeof(music));
char *p;
array->year = strtol(word, &p, 10);
if (array->year == atoi(argv[2]))
{
//printf("Year : %ld\t%d\n", array->year, atoi(argv[2]));
if (scanf("%100[^,]", array->song) == 1)
{
printf("Song : %s\n", array->song);
}
}
}
word = strtok(NULL, ",");
}
}
//printf("I've read %d lines\n", nLines);
fclose(fin);
return 0;
}
So far it's going decently: I can extract the specified year from each line, but now I need to print the name of the song from those lines (the first token on the line). I thought about using scanf("%[^,]") to read and print everything up to the first comma, but it just gets stuck in an endless loop. Could you give me an idea? Thanks in advance!

There are multiple problems in the code:
you do not check that enough arguments were passed on the command line, potentially invoking undefined behavior if not.
you do not need to allocate a music structure: you can just parse the first 3 fields, check the year and output the name of the song directly.
strtok() is inappropriate to split fields from a csv file because it treats a sequence of separators as a single separator, which is incorrect and causes invalid parsing if some fields are empty.
sscanf("%[^,]", ...) will fail to convert an empty field.
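To see the strtok() problem concretely, here is a small sketch. The helper names count_tokens and nth_token are made up for illustration. For the "Amie" line from the question, which has an empty year field, strtok() yields 7 tokens for 8 fields, and the token at index 2 (where the year should be) is the song description.

```c
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() yields for a
   comma-separated line. The line is modified in place. */
int count_tokens(char *line)
{
    int n = 0;
    for (char *tok = strtok(line, ","); tok != NULL; tok = strtok(NULL, ","))
        n++;
    return n;
}

/* Hypothetical helper: returns the n-th (0-based) strtok() token,
   or NULL if the line has fewer tokens. The line is modified in place. */
char *nth_token(char *line, int n)
{
    char *tok = strtok(line, ",");
    while (tok != NULL && n-- > 0)
        tok = strtok(NULL, ",");
    return tok;
}
```

Both helpers need a writable copy of the line, because strtok() overwrites the separators.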
To split the fields from the csv line, I recommend you use a utility function that behaves like strtok_r() but tailored for csv lines. A simplistic version will stop on , and \n and replace these with a null byte, returning the initial pointer and updating the pointer for the next field. A more advanced version would also handle quotes.
Here is a modified version:
#include <stdio.h>
#include <string.h>
#define MAX 300
char *get_field(char **pp) {
char *p, *start;
for (p = start = *pp; *p; p++) {
if (*p == ',' || *p == '\n') {
*p++ = '\0';
break;
}
}
*pp = p;
return start;
}
int main(int argc, char *argv[]) {
char buf[MAX];
FILE *fin;
char *filename;
char *select_year;
if (argc < 3) {
printf("Missing arguments\n");
return 1;
}
filename = argv[1];
select_year = argv[2];
fin = fopen(filename, "r");
if (!fin) {
printf("Error opening the file %s.\n", filename);
return 1;
}
while (fgets(buf, sizeof buf, fin)) {
char *p = buf;
char *song = get_field(&p);
char *artist = get_field(&p);
char *year = get_field(&p);
if (!strcmp(year, select_year)) {
printf("%s\n", song);
}
}
fclose(fin);
return 0;
}
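The answer mentions that a more advanced version would also handle quotes. Here is a minimal sketch of that idea, assuming the common CSV convention that a field may be wrapped in double quotes and that a literal quote inside such a field is written as two quotes. The name get_field_quoted is made up here, and this is not a full CSV parser: quoted fields that span several lines are not handled.

```c
/* Hypothetical quote-aware variant of get_field().
   Returns a pointer to the next field of the line *pp and advances *pp
   past the separator. Quoted fields are unquoted in place. */
char *get_field_quoted(char **pp)
{
    char *src = *pp;
    char *start = src;
    char *dst = src;

    if (*src == '"') {
        src++;                          /* skip the opening quote */
        while (*src) {
            if (*src == '"') {
                if (src[1] == '"') {    /* "" stands for one literal quote */
                    *dst++ = '"';
                    src += 2;
                    continue;
                }
                src++;                  /* closing quote */
                break;
            }
            *dst++ = *src++;
        }
    }
    /* copy any unquoted text up to the separator */
    while (*src && *src != ',' && *src != '\n')
        *dst++ = *src++;
    if (*src)
        src++;                          /* skip ',' or '\n' */
    *dst = '\0';                        /* terminate the field */
    *pp = src;
    return start;
}
```

It can be used as a drop-in replacement for get_field() in the loop above, since the calling convention is the same.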

Regarding scanf("%[^,]"): scanf() reads from stdin, not from the file, and this conversion consumes input up to, but not including, the comma.
So the next instruction needs to be something like getchar() to consume the comma. Otherwise, on the next iteration nothing will be read, because the first character in stdin is that same comma.
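Since the line is already in buf, one option is to parse it with sscanf() instead of scanf(). Here is a minimal sketch, assuming the song title is the first field and is never empty (the %[ conversion fails on an empty field, as noted above). The helper name first_field is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies the text before the first comma of `line`
   into `out` (at most 100 characters plus the terminator).
   Returns 1 on success, 0 if the first field is empty. */
int first_field(const char *line, char out[101])
{
    return sscanf(line, "%100[^,]", out) == 1;
}
```

Because it reads from the string and not from stdin, nothing is left waiting in an input stream and the program cannot block on keyboard input.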

Related

Comparing user input with text in a file

The below code can output what is inside my file. I am trying to find a way to compare if the user input word/character is included in the text file. For instance, if the user writes "r" then, the program finds all the words that have an "r" in the file and output them. After that, I want to replace this word with something, so instead of "r", make it "k". For example, "roadtrip" --> "koadtrip".
The text file has a lot of words, one per line.
#define MAX 1024
int main() {
FILE* myFile = fopen("C:\\Users\\Luther\\Desktop\\txtfiles\\words.txt", "r+");
char inputWord[MAX];
char lineBuffer[MAX];
if (myFile == NULL)
{
printf("File Does Not Exist \n");
return 1;
}
printf("Enter the word \n");
fgets(inputWord, MAX, stdin);
while (!feof(myFile))
{
char lineBuffer[1024];
fscanf(myFile, "%1023[^\n]\n", lineBuffer);
//printf("%s\n", lineBuffer);
while (fgets(lineBuffer, MAX, myFile)) {
if (strstr(lineBuffer, inputWord))
puts(lineBuffer);
}
}
}
I've managed to make it work, and the program now prints output based on the user input: if a word in the text file matches the input, or contains it, that word is printed.
Now I am looking for a way to replace the word. For instance, in this specific situation, the word the user entered is "es", and all the words that contain "es" are printed. Is there a way to replace every occurrence of "es" with "er", and then save the changes in another file without changing anything in the original file?
Here is something to use as a start:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
int main (int argc, char *argv[])
{
if (argc < 4) {
fprintf(stderr, "usage: %s file needle replacement\n", argv[0]);
return 1;
}
FILE *fp = fopen(argv[1], "r");
char *line = NULL;
char *p = NULL;
char *needle = argv[2];
char *replace = argv[3];
size_t len = 0;
ssize_t read;
assert(fp);
while ((read = getline(&line, &len, fp)) != -1) {
if (line[0] != '#') {
if ((p = strstr(line, needle))) {
printf("%.*s%s%s", (int)(p - line), line, replace, p + strlen(needle));
} else {
printf("%s", line);
}
}
}
free(line);
fclose(fp);
return 0;
}
Note: this may not handle all edge cases. Also, writing back to a file or renaming it to the original is left as an exercise :)
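The program above replaces only the first match on each line and prints to standard output. Here is a minimal sketch of replacing every occurrence and writing the result to a separate stream, so the original file stays untouched. The function name write_replaced is an assumption for illustration, and the needle must not be empty.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `line` to `out`, replacing every occurrence
   of `needle` with `replacement`. Returns the number of replacements,
   or -1 if `needle` is empty. */
int write_replaced(FILE *out, const char *line,
                   const char *needle, const char *replacement)
{
    size_t needle_len = strlen(needle);
    int count = 0;
    const char *hit;

    if (needle_len == 0)
        return -1;
    while ((hit = strstr(line, needle)) != NULL) {
        fwrite(line, 1, (size_t)(hit - line), out);  /* text before the match */
        fputs(replacement, out);
        line = hit + needle_len;                     /* continue after the match */
        count++;
    }
    fputs(line, out);                                /* remainder of the line */
    return count;
}
```

One way to use it would be to open a second file for writing and call write_replaced() once for each line read with getline() or fgets().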
Some other starting point
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<regex.h>
#include<sys/types.h>
int main (){
//open file
FILE *file_pointer = fopen("./test_txt.txt", "r");
const char* search_for = "3_hau_gbs";
int line_number = 1;
char* line = NULL;
size_t len = 0;
regex_t regex;
// the pattern has to be compiled before it can be used for matching
int failed = regcomp(&regex, search_for, REG_EXTENDED);
if(failed){
fprintf(stderr, "Could not compile the pattern\n");
} else {
while(getline(&line, &len, file_pointer) != -1){
// go through the file line by line and check whether the line
// contains the pattern you are looking for
int match = regexec(&regex, line, 0, NULL, 0);
if(!match){
// print the matching line with its line number
printf("%d:%s",line_number, line);
}
line_number++;
}
if(line){
free(line);
}
regfree(&regex);
fclose(file_pointer);
}
}

Find number of occurrences for the substring in a string using C programming

I am trying to write a program in C that reads a text file containing a string and counts the occurrences of the substring "GLROX", printing "Sequence Found" each time it is found. The file "inputGLORX.txt" contains the following string:
GLAAAROBBBBBBXGLROXGLROXGLROXGLROXGLCCCCCCCCCCCCCCROXGGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROX
But I am getting weird results. It would be great if someone experienced in C programming could help me solve this. Thanks in advance.
#include <stdio.h>
#include <conio.h>
#include <string.h>
#define NUMBER_OF_STRINGS 40
#define MAX_STRING_SIZE 7
void seqFound()
{
printf("Sequence Found\n");
}
int main()
{
FILE *fp;
char buff[1000];
char strptrArr[NUMBER_OF_STRINGS] [MAX_STRING_SIZE];
const char *search = "GLROX";
fp = fopen("D:/CandC++/inputGLORX.txt", "r");
if(fp==NULL)
printf("It is a null pointer");
while(!feof(fp))
{
//fscanf(fp, "%s", buff);
fgets(buff, 1000,fp);
}
int len = strlen(buff);
printf("length is %d\n",len);
int count = 0;
char *store;
while(store = strstr(buff, search))
{
printf("substring is %s \n",store);
count++;
search++;
}
printf("count is %d\n",count);
while (count!=0) {
seqFound();
count--;
}
return 0;
}
As said in the comments, there are at least two problems in the code: your fgets loop keeps only the last line read (if it reads one at all; in any case, this is not what you want), and you are incrementing the search string instead of the buff pointer.
Something like this should fix most of your problems, as long as no line in your file is longer than 999 characters. It will not work properly if your search string contains a \n or NUL character.
int count = 0;
while (fgets(buff, 1000, fp) != NULL)
{
char *temp = buff;
while ((temp = strstr(temp, search)))
{
printf("%d. %s\n", count + 1, temp);
count++;
temp++;
}
}
Here is a main for testing. I used argv to provide the input.txt and the search string.
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
FILE *fp;
char buff[1000];
char *search;
if (argc < 3)
return (-1);
search = argv[2];
if (search[0] == '\0')
return (-1);
if ((fp = fopen(argv[1], "r")) == NULL)
return (-1);
int count = 0;
while (fgets(buff, 1000, fp) != NULL)
{
char *temp = buff;
while ((temp = strstr(temp, search)))
{
printf("%d. %s\n", count + 1, temp);
count++;
temp++;
}
}
fclose(fp);
printf("Match found: %d\n", count);
return 0;
}
The way you search in buff is wrong, i.e. this code:
while(store = strstr(buff, search))
{
printf("substring is %s \n",store);
count++;
search++; // <------- oops
}
When you have a hit, you change search, i.e. the string you are looking for. That's not what you want. The search string (aka the needle) shall be the same all the time. Instead you want to move forward in the buffer buff so that you can search in the remainder of the buffer.
That could be something like:
#include <stdio.h>
#include <string.h>
int main(void)
{
const char* buff = "GLAAAROBBBBBBXGLROXGLROXGLROXGLROXGLCCCCCCCCCCCCCCROXGGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROX";
const char* search = "GLROX";
const char* remBuff = buff; // Pointer to the remainder of buff
// Initialized to be the whole buffer
const char* hit;
int cnt = 0;
while((hit = strstr(remBuff, search))) // Search in the remainder of buff
{
++cnt;
remBuff = hit + 1; // Update the remainder pointer so it points just 1 char
// after the current hit
}
printf("Found substring %d times\n", cnt);
return 0;
}
Output:
Found substring 15 times
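Advancing by one character, as above, also counts overlapping matches (searching for "aa" in "aaaa" gives 3). If overlapping matches should not be counted, the pointer can be advanced by the length of the search string instead. A small sketch follows; the function name count_non_overlapping is made up for illustration.

```c
#include <string.h>

/* Hypothetical helper: counts non-overlapping occurrences of `needle`
   in `haystack`. Returns 0 for an empty needle. */
int count_non_overlapping(const char *haystack, const char *needle)
{
    size_t step = strlen(needle);
    int count = 0;

    if (step == 0)
        return 0;
    while ((haystack = strstr(haystack, needle)) != NULL) {
        count++;
        haystack += step;   /* skip the whole match */
    }
    return count;
}
```

For a needle such as "GLROX", which cannot overlap with itself, both approaches give the same count.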

fgets is returning a blank screen

I am new to C; this is my first project and I have been teaching myself. Within my program, one of my functions needs to read a line from a file and store it in a char array. When I trace the program with gdb, the array (line[]) is simply zeros. This leads to my program returning the error "Error: a line in the asset file lacks a ':' separator\n"
Here is my code:
//return the line number (0 based) that the cmd is on, -1 if absent
int locateCmd(char cmd[]) {
int lineIndex = -1; //-1, because lineIndex is incremented before the possible return
char cmdTemp[10] = "\0";
//create a comparable cmd with correct cmd that has its remaining values zeroed out
char cmdCmp[10] = "\0";
memset(cmdCmp, 0, sizeof(cmdCmp));
for (int i = 0; i < strlen(cmd); i++) {
cmdCmp[i] = cmd[i];
}
FILE *file = fopen(ASSET_FILE, "r");
//loop until target line is reached
while (strcmp(cmdTemp, cmdCmp) != 0) {
//check if last line is read
if (lineIndex == lineCounter(file)-1) {
return -1;
}
memset(cmdTemp, 0, sizeof(cmdTemp));
char line[61];
fgets(line, 61, file);
//set cmdTemp to the command on current line
lineIndex++;
for (int i = 0; line[i] != ':'; i++) {
cmdTemp[i] = line[i];
//return error if line doesn't contain a ':'
if (line[i] = '\n') {
printf("Error: a line in the asset file lacks a ':' separator\n");
exit(1);
}
}
}
return lineIndex;
}
Some context, this function is passed a command, and its job is to read a document that appears like this:
command:aBunchOfInfoOnTheComand
anotherCommand:aBunchOfInfoOnTheComand
and pick out the line that the passed command (cmd[]) is stored on.
The issue is with the fgets on line 24. I have separated the relevant portion of this code out into a smaller test program and it works fine.
The test program that works is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (int argc, char *argv[]) {
FILE *file = fopen("tutorInfo.txt", "r");
char line[61];
fgets(line, 61, file);
printf("%s\n", line);
}
The proper execution of my test program leads me to believe that other code in my function is causing the issue, but I'm not sure what. It may be important to note that the problematic code has the same includes as my sample program. Any help would be much appreciated.
As OP didn't provide a Minimal, Complete, and Verifiable example, I have to base my answer on the functional description provided in the question.
I already covered some error and corner cases, but I'm sure I missed some. The approach is also inefficient, as the file is read over and over again, instead of parsing it once and returning a hash/map/dictionary for easy lookup. In real-life code I would use something like GLib instead of wasting my time trying to re-invent the wheel(s)...
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINE_BUFFER_LENGTH 200
unsigned int locateCmd(FILE *fh, const char *key, const char **cmd_line) {
unsigned int found = 0;
size_t key_length = strlen(key);
*cmd_line = NULL;
/* make sure to start read from start of file */
rewind(fh);
unsigned int line_no = 0;
static char buffer[LINE_BUFFER_LENGTH];
while ((found == 0) && (fgets(buffer, sizeof(buffer), fh) != NULL)) {
// NOTE: fgets() returns NULL at end of file or on a read error
size_t length = strlen(buffer);
line_no++;
if ((length == 0) || (buffer[length - 1] != '\n')) {
printf("line %u is too long, aborting!\n", line_no);
return(0);
}
if ((strncmp(key, buffer, key_length) == 0) &&
(buffer[key_length] == ':')) {
found = line_no;
buffer[length - 1] = '\0'; // strip line ending
*cmd_line = &buffer[key_length + 1];
}
}
return(found);
}
int main(int argc, char *argv[]) {
FILE *fh = fopen("dummy.txt", "r");
if (!fh) {
perror("file open");
return(1);
}
int ret = 0;
while (--argc > 0) {
const char *cmd;
const char *key = *++argv;
unsigned line_no = locateCmd(fh, key, &cmd);
if (line_no != 0) {
printf("key '%s' found on line %u: %s\n", key, line_no, cmd);
ret = 0;
} else {
printf("key '%s' not found!\n", key);
}
}
if (fclose(fh) != 0) {
perror("file close");
return(1);
}
return(ret);
}
Test input dummy.txt:
command:aBunchOfInfoOnTheComand
anotherCommand:aBunchOfInfoOnTheComand
brokenline
foo:bar
toolong:sadflkjaLKFJASDJFLKASJDFLKSAJ DLFKJ SLDKJFLKASDFASDFKJASKLDJFLKASJDFLKJASDLKFJASLKDFJLKASDJFLKASJDLFKJASDKLFJKLASDJFLKSAJDFLKJASDLKFJKLASDJFLKASJDFKLJASDLKFJLKASDJFLKASJDFLKJSADLKFJASLKDJFLKC
Some test runs:
$ gcc -Wall -o dummy dummy.c
$ ./dummy command foo bar
key 'command' found on line 1: aBunchOfInfoOnTheComand
key 'foo' found on line 5: bar
line 6 is too long, aborting!
key 'bar' not found!
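The "parse it once" idea mentioned above could look roughly like the sketch below. A plain array with linear search stands in for the hash map, and the names entry, load_entries and find_entry as well as the size limits are assumptions for illustration. It assumes each line fits into the 200-byte buffer; longer lines are not handled specially.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEY   32
#define MAX_VALUE 168

/* Hypothetical record type: one "key:value" line of the asset file. */
struct entry {
    char key[MAX_KEY];
    char value[MAX_VALUE];
};

/* Reads up to `max` "key:value" lines from `fh` into `table`.
   Lines without a ':' or whose parts do not fit are skipped.
   Assumes every line fits into the line buffer.
   Returns the number of entries stored. */
size_t load_entries(FILE *fh, struct entry *table, size_t max)
{
    char line[MAX_KEY + MAX_VALUE];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, fh) != NULL) {
        line[strcspn(line, "\n")] = '\0';        /* strip the newline */
        char *colon = strchr(line, ':');
        if (colon == NULL)
            continue;                            /* no separator */
        *colon = '\0';                           /* split key and value */
        if (strlen(line) >= MAX_KEY || strlen(colon + 1) >= MAX_VALUE)
            continue;                            /* would not fit */
        strcpy(table[n].key, line);
        strcpy(table[n].value, colon + 1);
        n++;
    }
    return n;
}

/* Linear lookup; returns the value stored for `key`, or NULL if absent. */
const char *find_entry(const struct entry *table, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
    return NULL;
}
```

The file is read a single time with load_entries(), and every later lookup only touches memory.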

Check if words of an array exist in a txt file in C

I have an array of words:
const char *words[3]={cat,dog,snake,bee};
and a txt file like this one:
apple tree day night story bee oil lemons get fight 234 meow woof safari
jazz stuff what is dog fight street snake garden glass house bee question
foot head 29191 43493 ==
(where we don't know how many lines this file has)
I want to check the whole file and each time I find one of the words of the array to print that word and also print the line where it was found.
I'm having trouble with the comparison. My thought was to save every word of the file into an array and compare each one with the words of the words array. But I cannot get that to work. I have this:
FILE *f;
const char *arr;
f=fopen("test.txt","r");
while(fscanf(f,"%s",arr)!EOF)
I don't really know what to write here so that I separate the file into words.
Please be kind to me, I'm only trying to learn.
There are several problems in the code snippets you've provided:
const char *words[3]={cat,dog,snake,bee};
Here you declare an array of 3 elements but you have 4 initializers. And you forgot to put the words between quotes.
FILE *f;
const char *arr;
f=fopen("test.txt","r");
while(fscanf(f,"%s",arr)!EOF)
Here you use fscanf to read into arr, but arr is an uninitialized pointer and no memory was allocated for it. You probably meant to write char arr[200], which has room for words of up to 199 characters.
This is what you want as a base, though there is still room for improvement:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const char *words[] = { "cat", "dog", "snake", "bee" };
int main()
{
char line[200]; // maximum line size is 200
size_t len = 0;
FILE *f;
f = fopen("test.txt", "r");
if (f == NULL)
{
printf("Can't open file\n");
return 1;
}
int line_no = 0;
while (fgets(line, sizeof line, f))
{
++line_no;
// (sizeof words)/sizeof *words is the number of words in the words array
for (size_t i = 0; i < (sizeof words)/sizeof *words; i++)
{
if (strstr(line, words[i]) != NULL)
{
printf("found %s in line %d\n", words[i], line_no);
}
}
}
fclose(f);
}
You are using fscanf() to read the words out of your file, which is not the best way to do this. You should use getline(3) or fgets(3) to read each line of your file.
Additionally, this line:
const char *words[3]={cat,dog,snake,bee};
Needs to be able to hold 4 char* pointers, not 3. You will also need to include quotes with these string literals. This is another way to do this:
const char *words[] = {"cat", "dog", "snake", "bee"};
Then to get the size of this array, just use sizeof(x) / sizeof(x[0]).
Furthermore, in this code segment:
FILE *f;
const char *arr;
f=fopen("test.txt","r");
while(fscanf(f,"%s",arr)!EOF)
You are using fscanf() on an uninitialized pointer, which is undefined behavior. If you wish to use a pointer, you need to dynamically allocate memory for arr on the heap with malloc(3). If you don't wish to do this, just declare a fixed-size array, such as char arr[200]. Also, fscanf() returns the number of items scanned, so fscanf(f,"%s",arr)!=EOF should be replaced with fscanf(f,"%s",arr)==1 to ensure that one word was actually read.
Note: You should also check that the file was opened correctly, as fopen() returns NULL on error.
I'm having trouble with the comparison. My thought was to save every word of the file into an array and compare each one with the words of the words array.
Others have mentioned strstr(3); another possible option is to use strtok(3) to split each line into words, then use strcmp(3) to compare words[i] with each word parsed from the file. If words[] becomes bigger in the future, I would suggest keeping it sorted and using binary search instead of linear search to compare the words. This will improve your search time from O(n) to O(log n).
Here is some (modified) code I wrote before which does something similar:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRAYSIZE(x) (sizeof x / sizeof x[0])
int main(void) {
const char *words[] = {"cat", "dog", "snake", "bee"};
FILE *fptr;
char *line = NULL, *word = NULL;
const char *delim = " \n";
size_t len = 0, lineno = 0;
ssize_t read;
fptr = fopen("somewords.txt", "r");
if (fptr == NULL) {
fprintf(stderr, "Error reading file\n");
exit(EXIT_FAILURE);
}
while ((read = getline(&line, &len, fptr)) != -1) {
lineno++;
word = strtok(line, delim);
while (word != NULL) {
for (size_t i = 0; i < ARRAYSIZE(words); i++) {
if (strcmp(word, words[i]) == 0) {
printf("Found matched word: %s, Line number: %zu\n", word, lineno);
}
}
word = strtok(NULL, delim);
}
}
free(line);
fclose(fptr);
return 0;
}
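The binary search suggested above can be sketched with the standard qsort() and bsearch() functions. The helper names cmp_str, sort_words and is_known_word are made up for illustration; the word list has to be sorted once before it is searched.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort()/bsearch() on an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sorts the word list so it can be binary-searched. */
void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_str);
}

/* Hypothetical helper: returns 1 if `word` is in the sorted list `words`,
   0 otherwise. */
int is_known_word(const char *word, const char **words, size_t n)
{
    return bsearch(&word, words, n, sizeof *words, cmp_str) != NULL;
}
```

In the program above, the inner for loop over words[] could then be replaced by one call to is_known_word() per token.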
Use getline & strstr
char *line = NULL;
size_t len = 0;
ssize_t read;
int line_no = 0;
while ((read = getline(&line, &len, f)) != -1)
{
++line_no;
for (size_t i = 0; i < sizeof words / sizeof words[0]; i++) {
if (strstr(line, words[i]) != NULL)
{
// if matched
}
}
}

Program to read words from a file and count their occurrence in the file

I'm currently trying to make a program that reads a file, finds each unique word and counts the number of times that word appears in the file. What I have currently asks the user for a word and searches the file for the number of times that word appears. However, I need the program to go through the file by itself instead of asking the user for an individual word.
This is what I have currently:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[])
{
int num =0;
char word[2000];
char *string;
FILE *in_file = fopen("words.txt", "r");
if (in_file == NULL)
{
printf("Error file missing\n");
exit(-1);
}
scanf("%s",word);
printf("%s\n", word);
while(!feof(in_file))//this loop searches the for the current word
{
fscanf(in_file,"%s",string);
if(!strcmp(string,word))//if match found increment num
num++;
}
printf("we found the word %s in the file %d times\n",word,num );
return 0;
}
I just need some help figuring out how to read the file for unique words (words it hasn't checked for yet) although any other suggestions for my program will be appreciated.
If you want to print every line contained in the file just once, you have to save the strings you have read in a suitable data structure. For example, a sorted array could do the trick. The code might look as follows:
#include <stddef.h>
size_t numberOfLine = getNumberOfLine (file);
char **previousStrings = allocArray (numberOfLine, maxStringSize);
size_t i;
for (i = 0; i < numberOfLine; i++)
{
char *currentString = readNextLine (file);
if (!containString (previousStrings, currentString))
{
printString (currentString);
insertString (previousStrings, currentString);
}
}
You may use binary search to implement the functions containString and insertString efficiently.
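For the goal in the question, counting each distinct word, a sorted array is not strictly required to get started. Here is a minimal sketch that keeps a table of words and counts, using linear search and fixed limits chosen arbitrarily for illustration. MAX_WORDS, MAX_LEN, word_count, tally_word and tally_stream are all assumed names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 1000
#define MAX_LEN   50

/* Hypothetical record: one distinct word and how often it was seen. */
struct word_count {
    char word[MAX_LEN];
    int count;
};

/* Adds one occurrence of `word` to the table, appending it if it is new.
   Returns the new number of entries (unchanged if the table is full). */
size_t tally_word(struct word_count *table, size_t n, const char *word)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].word, word) == 0) {
            table[i].count++;
            return n;
        }
    }
    if (n < MAX_WORDS) {
        strncpy(table[n].word, word, MAX_LEN - 1);
        table[n].word[MAX_LEN - 1] = '\0';
        table[n].count = 1;
        n++;
    }
    return n;
}

/* Reads whitespace-separated words from `in` and tallies them.
   `table` must have room for MAX_WORDS entries.
   Returns the number of distinct words found. */
size_t tally_stream(FILE *in, struct word_count *table)
{
    char word[MAX_LEN];
    size_t n = 0;

    while (fscanf(in, "%49s", word) == 1)
        n = tally_word(table, n, word);
    return n;
}
```

After tally_stream() returns, a loop over the table can print each word together with its count. Words longer than 49 characters are split by the %49s conversion, which is a limitation of this sketch.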
You have to split your code into functions (subroutines).
One function would read the file and record all words; the other would count the number of occurrences for each word.
int main(int argc, char const *argv[])
{
char *words[2000];
// Read the file; store all words in the list
int number_of_words = ReadWords("words.txt", words, 2000);
// Now count and print the number of occurrences for each word
for (int i = 0; i < number_of_words; i++)
{
int n = CountOccurrences(words[i], "words.txt");
printf("we found the word %s in the file %d times\n", words[i], n);
}
// Deallocate dynamically allocated memory
Cleanup(words, number_of_words);
}
Note how the main function is relatively short. All the details are in the functions ReadWords and CountOccurrences.
To implement reading all words from a file:
int ReadWords(const char *filename, char *words[], int max_number_of_words)
{
FILE *f = fopen(filename, "rt");
int i;
char temp[100]; // assuming the words cannot be too long
if (f == NULL)
return 0; // the file could not be opened: report zero words
for (i = 0; i < max_number_of_words; ++i)
{
// Read a word from the file (at most 99 characters, so temp cannot overflow)
if (fscanf(f, "%99s", temp) != 1)
break;
// note: "!=1" checks for end-of-file; using feof for that is usually a bug
// Allocate memory for the word, because temp is too temporary
words[i] = strdup(temp);
}
fclose(f);
// The result of this function is the number of words in the file
return i;
}
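CountOccurrences and Cleanup are called in main but not shown. A possible sketch, matching the way they are called there and assuming that words are separated by whitespace and are shorter than 100 characters:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts how many whitespace-separated words in the file equal `word`.
   Returns -1 if the file cannot be opened. */
int CountOccurrences(const char *word, const char *filename)
{
    FILE *f = fopen(filename, "r");
    char temp[100];   /* assumes words are shorter than 100 characters */
    int count = 0;

    if (f == NULL)
        return -1;
    while (fscanf(f, "%99s", temp) == 1)
        if (strcmp(temp, word) == 0)
            count++;
    fclose(f);
    return count;
}

/* Frees the strings that ReadWords allocated with strdup(). */
void Cleanup(char *words[], int number_of_words)
{
    for (int i = 0; i < number_of_words; i++)
        free(words[i]);
}
```

Note that this design re-reads the file once per word, and a word that appears several times in the file is also reported several times by the loop in main.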
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
int num = 0;
char word[2000];
char string[2000];
if (argc < 2)
{
printf("Usage: %s filename\n", argv[0]);
exit(-1);
}
FILE *in_file = fopen(argv[1], "r");
if (in_file == NULL)
{
printf("Error file missing\n");
exit(-1);
}
if (scanf("%1999s", word) != 1) // read the word to search for
exit(-1);
printf("%s\n", word);
while (fscanf(in_file, "%1999s", string) == 1) // stops at end of file
{
if (!strcmp(string, word)) // if match found increment num
num++;
}
fclose(in_file);
printf("we found the word %s in the file %d times\n", word, num);
return 0;
}
Any suggestions are most welcome.