Splitting by comma doesn't work as expected

I read some data from a text file. I am trying to iterate over it line by line, split each line by comma, and ignore lines that start with #. Here is the text file content:
#this is the simulation file for your exercise, please read it carefully.
#every line that begins with a pound sign [now known as "the" hashtag (#)] is a comment line. you can automatically skip it.
#here are a few examples.
#there will be 5 categories : Comedy, Adventure, Educational, SciFi, Fantasy.
#it is recommended that when you save in the main program, that you follow this convetion.
#the input syntax :
#id,book name,author,pages,yearofpublishing,category
CNV301,Treasure Island,Robert Louis Stevenson,304,1882,Adventure
8T88FF,Heir to The Empire,Timothy Zahn,416,1992,SciFi
911MAR10,Plumbing for Dummies,Gene Hamilton,242,1999,Educational
6U754E,Berserk,Kenturo Miura,224,1989,Fantasy
7R011,The Troll Cookbook : Human Delights,Underchief Trogdor,7,-35,Educational
M140,Funny Cats,Jean-Claude Suarès,78,1995,Comedy
V269W7,Linus the Vegetarian T. rex,Robert Neubecker,40,2013,Adventure
UFF404,Algebra 3,Nebi Rogen,300,0,Educational
424242,The Hitchhiker's Guide to the Galaxy,Douglas Adams,224,1979,Comedy
#add your own. you can use sites like : http://www.generatedata.com/ to create quick lists.
Here is my code:
FILE* file = fopen(filepath, "r");
char line[256] = "";
while (fgets(line, sizeof(line), file) != NULL) {
    if (!starts_with(line, "#") && !starts_with(line, " "))
    {
        if (line[0] == '#' || line[0] == '\n')
            continue; // skip the rest of the loop and continue
        printf("%s", line);
        char* p;
        p = strtok(line, ",");
        while (p != NULL)
        {
            //printf("%s\n", p); //<-- line*******
            p = strtok(NULL, ",");
        }
    }
}
fclose(file);
where:
int starts_with(const char* line, const char* c)
{
    size_t lenpre = strlen(c),
           lenstr = strlen(line);
    return lenstr < lenpre ? 0 : strncmp(c, line, lenpre) == 0;
}
When I run the code, it prints the first line with some weird characters in front of it: ∩╗┐#this is the simulation file for your exercise, please read it carefully.
If I enable the commented-out line marked //<-- line*******,
I get the error "Access violation reading location". I only want to see the split values.
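The ∩╗┐ prefix is the UTF-8 byte-order mark (bytes EF BB BF) as displayed in code page 437, so the file was saved with a BOM and its first line does not really start with #. For the access violation, a likely cause (an assumption, since the posted fragment shows no #include lines) is a missing #include <string.h>: strtok is then implicitly declared as returning int, and on a 64-bit build the truncated pointer crashes printf. A minimal sketch of the per-line handling, assuming one record per line; skip_bom and split_record are hypothetical helper names:

```c
#include <string.h>

/* Hypothetical helper: if the line starts with a UTF-8 byte-order mark
   (bytes EF BB BF), return a pointer just past it; otherwise return line.
   The && short-circuits at the terminating '\0' of short strings. */
char *skip_bom(char *line)
{
    if ((unsigned char)line[0] == 0xEF &&
        (unsigned char)line[1] == 0xBB &&
        (unsigned char)line[2] == 0xBF)
        return line + 3;
    return line;
}

/* Hypothetical helper: skips a BOM, strips "\n" or "\r\n", then splits the
   line on commas in place. Stores at most max_fields pointers in fields and
   returns how many were stored; returns 0 for comment and blank lines. */
int split_record(char *line, char *fields[], int max_fields)
{
    line = skip_bom(line);
    line[strcspn(line, "\r\n")] = '\0';       /* cut at the line ending */
    if (line[0] == '#' || line[0] == '\0')
        return 0;                             /* comment or empty line */

    int n = 0;
    for (char *p = strtok(line, ","); p != NULL && n < max_fields;
         p = strtok(NULL, ","))
        fields[n++] = p;
    return n;
}
```

Inside the fgets() loop you would call split_record(line, fields, 6) and print fields[0] to fields[n - 1]. Note that strtok() merges consecutive commas, so an empty field such as a,,b yields two tokens, not three.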

Related

C Reading a file of digits separated by commas

I am trying to read in a file that contains digits separated by commas and store them in an array without the commas.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
    if (isdigit(c)){ //if it is digit
        sscanf(&c,"%d",&number); //changing character to number (c)
        num=(num*10)+number;
    }
    else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
        list[i]=num;
        num=0;
        i++;
    }
    c=fgetc(fp1);
}
But this has problems with double-digit numbers. Does anyone have a better solution? Thank you!

For the data shown with no space before the commas, you could simply use:
while (i < 300 && fscanf(fp1, "%d,", &num) == 1)
    list[i++] = num;
This will read the comma after the number if there is one, silently ignoring it when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i, done before fscanf is called, prevents you from writing outside the bounds of the list array. The ++ operator comes into its own here.
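Wrapped in a function, the same fscanf() idea might look like this. It is a sketch: the name read_numbers and its max parameter are illustrative. The bound is checked before fscanf() is called, and the format "%d ," uses the blank-before-comma variant mentioned above, so white space before a comma is also accepted.

```c
#include <stdio.h>

/* Reads comma-separated integers from fp into list, stopping after max
   values, at end of file, or at the first token that is not a number.
   Returns the number of values stored. */
int read_numbers(FILE *fp, int list[], int max)
{
    int i = 0, num;
    /* %d skips leading white space (including newlines); the blank before
       ',' skips optional white space before a comma. A missing comma at the
       end of a line simply stops the directive, and the count is still 1. */
    while (i < max && fscanf(fp, "%d ,", &num) == 1)
        list[i++] = num;
    return i;
}
```

For example, read_numbers(fp1, list, 300) on the processes.txt data stores 0 1 3 1 0 5 2 9 8 3 10 6 and returns 12.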

First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be usable if you have several different types of fields that require different actions, like a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
    if(isdigit(c)) {
        fseek(f, -1, SEEK_CUR);
        if(fscanf(f, "%d", &list[i++]) != 1) {
            // Handle error
        }
    }
    c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
    back up one byte in the file
    read number, regardless of length
else continue
The added condition i<300 is for security reasons. If you really want to check that nothing other than commas and newlines appears (I did not get the impression that you found that important), you could easily add an else if (c == ... branch to handle the error.
Note that you should always check the return value of functions like sscanf, fscanf, scanf, etc. Actually, you should also do that for fseek. In this situation it's not as important, since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
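A variant of the same parser that uses ungetc() instead of fseek() to step back over the digit. The C standard only guarantees fseek() on a text stream with a zero offset or a position obtained from ftell(), so fseek(f, -1, SEEK_CUR) may not be portable, whereas one character of pushback with ungetc() always is. The function name read_all_numbers is illustrative:

```c
#include <ctype.h>
#include <stdio.h>

/* Treats every non-digit character as a separator and stores each run of
   digits as one int. Returns how many numbers were stored (at most max). */
int read_all_numbers(FILE *f, int list[], int max)
{
    int i = 0;
    int c;                          /* int, not char, so EOF is detectable */
    while (i < max && (c = fgetc(f)) != EOF) {
        if (isdigit(c)) {
            ungetc(c, f);           /* push the digit back onto the stream */
            if (fscanf(f, "%d", &list[i]) != 1)
                break;              /* stop on an unexpected read failure */
            i++;
        }
    }
    return i;
}
```

As in the answer, anything other than a digit is a separator, so a minus sign is ignored and -5 would be read as 5.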

My solution is to read the whole line first and then parse it with strtok_r, using a comma as the delimiter. strtok_r is POSIX rather than standard C, so if you need fully portable code, use strtok instead.
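A naive implementation of readline would be something like this (the headers at the top are needed by all of the code in this answer):
#define _POSIX_C_SOURCE 200809L /* exposes strtok_r when compiling in strict ISO C mode */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>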
static char *readline(FILE *file)
{
    char *line = malloc(sizeof(char));
    int index = 0;
    int c = fgetc(file);
    if (c == EOF) {
        free(line);
        return NULL;
    }
    while (c != EOF && c != '\n') {
        line[index++] = c;
        char *l = realloc(line, (index + 1) * sizeof(char));
        if (l == NULL) {
            free(line);
            return NULL;
        }
        line = l;
        c = fgetc(file);
    }
    line[index] = '\0';
    return line;
}
Then you just need to parse the whole line with strtok_r, so you would end up with something like this:
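int main(int argc, char **argv)
{
    if (argc < 2) {
        return 1;
    }
    // "r" rather than "re": the "e" (close-on-exec) mode flag is not part of ISO C
    FILE *file = fopen(argv[1], "r");
    int list[300];
    if (file == NULL) {
        return 1;
    }
    char *line;
    int numc = 0;
    while((line = readline(file)) != NULL) {
        char *saveptr;
        // Get the first token
        char *tok = strtok_r(line, ",", &saveptr);
        // Now start parsing the whole line, without overflowing list
        while (tok != NULL && numc < 300) {
            // Convert the token to a long if possible (base 10, so "010" is not read as octal)
            char *end;
            errno = 0; // strtol never resets errno, so clear it first
            long num = strtol(tok, &end, 10);
            if (end == tok || errno != 0) {
                // Handle no value conversion or an out-of-range value
                // ...
            } else {
                list[numc++] = (int) num;
            }
            // Get next token
            tok = strtok_r(NULL, ",", &saveptr);
        }
        free(line);
    }
    fclose(file);
    return 0;
}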
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
    printf("%d ", list[i]);
}
printf("\n");

How to parse each column in a CSV file using C

I'm trying to use C to read a CSV file, iterate line by line (until EOF), and delimit/split each line by the comma. Then I wish to separate each column into "bins" and add them to a struct (which isn't shown here; I defined it in a helper file) based on type.
For example, if I have 1,Bob, I'd like to split 1 and Bob into two variables. Here's what I've written so far.
void readFile(char file[25]) {
    FILE *fp;
    char line[1000];
    fp = fopen(file, "r");
    while(fgets(line, 1000, fp)) {
        char* tmp = strdup(line);
        char* token;
        while((token = strsep(&tmp, ","))) {
            printf("%s\n", token); // I want to split token[0] and token[1]
        }
    }
    fclose(fp);
}
The above code does compile and run. I just don't know how to access each split of the token, like token[0] or token[1]. In Python, this would be simple enough: I could just access 1 using token[0] and Bob using token[1] for each line. But here in C, I can't do that.
For testing purposes, all I'm doing right now is printing each line (in the second while loop), just to see how each split looks. I haven't implemented the code where I put each split line into its respective struct member.
I've searched Stack Overflow and found a multitude of threads on this topic. None of them seemed to help me except for this one, which I have drawn from. But I wasn't able to get the storing of split columns working.

In python, this would be simple enough. I could just access 1 using token[0] and Bob using token[1] for each line. But here in C, I can't do that.
Yes, you can, as long as you define the array.
while (fgets(line, sizeof line, fp))
{
    char *tmp = strchr(line, '\n');
    if (tmp) *tmp = '\0'; // remove the '\n'
    tmp = strdup(line);
#define MAXCOLUMNS 2
    char *token[MAXCOLUMNS];
    int c = 0;
    while (tmp)
    {
        if (c == MAXCOLUMNS) puts("too many columns"), exit(1);
        token[c++] = strsep(&tmp, ",");
    }
    if (1 <= c) printf("column 1: %s\n", token[0]);
    if (2 <= c) printf("column 2: %s\n", token[1]);
    // ONLY if the line's tokens are no longer needed:
    free(*token);
}
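To go from tokens to the struct mentioned in the question, one option is to split each line at the first comma and convert the columns into typed members. This sketch assumes a hypothetical struct person with an int id and a name of at most 31 characters; it uses strchr() and strtol() from ISO C rather than strsep(), which is a BSD/glibc extension:

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for lines such as "1,Bob". */
struct person {
    int id;
    char name[32];
};

/* Splits a line of the form "id,name" into *out.
   Returns 1 on success, 0 if the line is malformed. Modifies line. */
int parse_person(char *line, struct person *out)
{
    line[strcspn(line, "\r\n")] = '\0';      /* remove the trailing newline */

    char *comma = strchr(line, ',');
    if (comma == NULL)
        return 0;                            /* no second column */
    *comma = '\0';                           /* end the first column here */

    char *end;
    long id = strtol(line, &end, 10);
    if (end == line || *end != '\0' || id < INT_MIN || id > INT_MAX)
        return 0;                            /* first column is not an int */

    out->id = (int)id;
    /* copy the second column, truncating it to fit name */
    strncpy(out->name, comma + 1, sizeof out->name - 1);
    out->name[sizeof out->name - 1] = '\0';
    return 1;
}
```

Everything after the first comma goes into name, so a line with extra columns would keep them there; more columns need more strchr() steps or a token[] loop like the one above.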

C - loop and text files

I have to write a program that replaces words in one text file based on a dictionary in another text file. For example, in "test.txt" I have:
"mama poszla z tata na zakupy"
and in "slownik.txt" I have:
"mama:mother,
tata:father,
babcia:grandma,
na:on,"
I expected my program to display "mother poszla z father on zakupy", but only the first word is changed. Below is a fragment of my C code:
char *token;
int k = 0;
while (!feof(slownik))
{
    k = 0;
    fscanf(slownik,"%s",&liniatekstu);
    token = strtok(liniatekstu," ,.:");
    while(token != NULL)
    {
        tab[k] = token;
        // printf("%s\n", tab[k]);
        token = strtok(NULL," ,.:");
        k = k + 1;
    }
    char c;
    char slowo[1000];
    int idx = 0;
    while(!feof(fp))
    {
        c = fgetc(fp); // get character
        if( ! isspace(c) )
        { // non-blank character - add to word
            slowo[idx++] = c;
            if(idx>=1000)
            {
                printf("Error - word has > 1000 characters\n");
            }
        }
        else
        { // blank character - end of word
            // if(idx == 0) // idx=0 word is empty
            // continue;
            // we have final word
            // - add zero to end of word and display to screen
            slowo[idx] = 0;
            // printf("%s\n", slowo);
            // HERE I HAVE THE WORD
            const char* x = tab[0]; // Polish version of word
            const char* y = tab[1]; // English version of word
            if ( strcmp(slowo,x) == 0) // compare word from "test.txt" with "slownik.txt"; if they are the same, display the English version
            {
                printf("%s ",y);
            }
            else
            {
                printf("%s ",slowo); // display Polish version
            }
            idx = 0;
        }
    }
}
Please help.

Working with strings in C is not easy for a newcomer. For good programming, first write down your requirements and then derive an algorithm from them. Once your algorithm is ready, start coding based on it. Looking at your code, you are mostly using trial and error to fix your problem. This will not only create more trouble for you but also cause frustration. Compare my program below with your code to find your mistakes. I hope you will follow my advice in future.
int main(void)
{
    FILE *fpointer_s, *fpointer_d;
    fpointer_s = fopen("test.txt","r");
    fpointer_d = fopen("slownik.txt","r");
    if(fpointer_s != NULL && fpointer_d != NULL)
    {
        printf("Dictionary job starting....\n");
    }
    else
    {
        printf("File does not exist....\n");
        return 1;
    }
    //FILEs are OPENED
    char line[255];
    char dictionary[1025] = "";//for dictionary file; must start empty before strcat
    char text[1025] = "";//for text file
    const char *delim = " \n";//also split on '\n' so the last word of a line matches
    while(fgets(line,sizeof line,fpointer_d))
    {
        strcat(dictionary,line);//we are loading the dictionary here
    }
    //now read the next file line by line
    while(fgets(line,sizeof line,fpointer_s))
    {
        char *word = strtok(line,delim);
        while(word != NULL)//a blank line yields no words
        {
            char *found = strstr(dictionary,word);//check if the word is available in dictionary
            char tword[20];//variable to store translated word
            int i = 0;
            if (found)//if the word is found in dictionary use the translated word i.e. tword
            {
                found = found + strlen(word)+1;//pointing to the English equivalent
                //copy character by character till end of English word, without overflowing tword
                while(*found !=',' && *found !='\n' && *found !='\0' && i < (int)sizeof tword - 1)
                    tword[i++] = *found++;
                tword[i]='\0';//assign end of string character
                if(strlen(text)> 0)
                    strcat(text," ");
                strcat(text,tword);
            }//end if
            else//if word not found in dictionary just add the original word
            {
                if(strlen(text)> 0)
                    strcat(text," ");
                strcat(text,word);
            }
            word = strtok(NULL,delim);
        }
    }
    //finally we translated the text into English
    printf("%s\n",text);
    fclose(fpointer_s);
    fclose(fpointer_d);
    return 0;
}
You also need these headers:
stdio.h, stdlib.h, string.h
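One weakness of looking words up with strstr() over the whole dictionary is that it matches substrings: a short word such as ma would be found inside mama:mother and translated to garbage. A sketch that parses each dictionary line into a pair and compares whole words with strcmp(); the names struct entry, load_entry and translate_word, and the 31-character word limit, are assumptions:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary entry: Polish word and its English translation. */
struct entry {
    char pl[32];
    char en[32];
};

/* Parses one dictionary line such as "mama:mother," into *e.
   Returns 1 on success, 0 if there is no ':' or a word is too long. */
int load_entry(const char *line, struct entry *e)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;
    size_t plen = (size_t)(colon - line);
    size_t elen = strcspn(colon + 1, ",\r\n");  /* English word ends at ',' or newline */
    if (plen >= sizeof e->pl || elen >= sizeof e->en)
        return 0;
    memcpy(e->pl, line, plen);
    e->pl[plen] = '\0';
    memcpy(e->en, colon + 1, elen);
    e->en[elen] = '\0';
    return 1;
}

/* Returns the translation if the whole word is in the dictionary,
   otherwise returns word unchanged. */
const char *translate_word(const char *word, const struct entry dict[], size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (strcmp(word, dict[k].pl) == 0)
            return dict[k].en;
    return word;
}
```

You would fill the array by calling load_entry() on each line read from slownik.txt with fgets(), then split test.txt with strtok() on " \n" and print translate_word() for each word.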

Parsing item list line by line then character by character

I have to parse a game file that has this format:
ItemID = 3288 # This is a comments and begins with '#' character.
Name = "a magic sword"
Description = "It has some ancient runic inscriptions."
Flags = {MultiUse,Take,Weapon}
Attributes = {Weight=4200,WeaponType=1,WeaponAttackValue=48,WeaponDefendValue=35}
# A line can also begin with this character and it should be ignored.
and I have to parse its data and put it into variables. I have tried many things, and I've been told that I will have to read the file line by line, then read each line character by character (so I'm able to read until the '#' character), and then read the result word by word following the pattern. I have done this:
void ParseScriptFile(FILE* File) {
    char Line[1024];
    while (fgets(Line, sizeof(Line), File)) {
    }
    fclose(File);
}
I think I should read the lines inside the while loop, but I don't know how to read up to the # character and, if there is none, just continue with the next line. Is there an easy way to do this?

Use sscanf; I did two of the fields for you:
void ParseScriptFile(FILE* File) {
    char Line[1024];
    int ItemID = 0;     // variable to store ItemID
    char name[40] = ""; // string to store Name
    while (fgets(Line, sizeof(Line), File)) {
        sscanf(Line, "ItemID = %d", &ItemID);
        sscanf(Line, "Name = %39[^\n]", name); // %39[^\n]: up to 39 characters, stopping at the newline
    }
    printf("ItemId= %d\n", ItemID);
    printf("Name= %s\n", name);
    fclose(File);
}

Here's more or less workable code. Reading lines with fgets() is correct. You can then eliminate empty lines and comment lines trivially. If the line ends with a comment, you can convert the # into a null byte to ignore the comment. Then you need to scan for the entry's name field (assuming there are no spaces in the name part, to the left of the equals sign), the =, and the value on the right.
#include <stdio.h>
#include <string.h>
static
void ParseScriptFile(FILE *File)
{
    char Line[1024];
    while (fgets(Line, sizeof(Line), File))
    {
        if (Line[0] == '#' || Line[0] == '\n')
            continue;
        char *comment_start = strchr(Line, '#');
        if (comment_start != NULL)
            *comment_start = '\0';
        char name[64];
        char value[1024];
        if (sscanf(Line, " %63s = %1023[^\n]", name, value) == 2)
            printf("Name = [%s] Value = [%s]\n", name, value);
        else
            printf("Mal-formed line: [%s]\n", Line);
    }
    fclose(File);
}
int main(void)
{
    ParseScriptFile(stdin);
    return 0;
}
The program reads from standard input. An example run from your data file yielded:
Name = [ItemID] Value = [3288 ]
Name = [Name] Value = ["a magic sword"]
Name = [Description] Value = ["It has some ancient runic inscriptions."]
Name = [Flags] Value = [{MultiUse,Take,Weapon}]
Name = [Attributes] Value = [{Weight=4200,WeaponType=1,WeaponAttackValue=48,WeaponDefendValue=35}]
Note the space at the end of the ItemID value; there was a space before the # symbol.
If you need to handle strings that could themselves contain # symbols, you have to work harder (Curse = "You ###$%!" # Language, please!). Parsing an entry such as the value for Attributes is a separate task for a separate function (callable from this one). Indeed, you should be calling one or more functions to process each name/value pair. You probably also need some context passed to the ParseScriptFile() function so that the data can be saved appropriately. You wouldn't want to contaminate clean code with unnecessary global variables, would you?
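The answer leaves parsing of values such as the Attributes list to a separate function. A minimal sketch of such a function, assuming every item inside the braces has the form Name=integer; the name parse_attributes and the 31-character name limit are assumptions:

```c
#include <stdio.h>
#include <string.h>

/* Splits a value such as "{Weight=4200,WeaponType=1}" into name/number
   pairs. Stores up to max pairs in names/values and returns how many were
   found, or -1 if the value is not a {...} list. Modifies value in place. */
int parse_attributes(char *value, char names[][32], int values[], int max)
{
    char *open = strchr(value, '{');
    char *close = strrchr(value, '}');
    if (open == NULL || close == NULL || close < open)
        return -1;
    *close = '\0';                       /* cut off the closing brace */

    int n = 0;
    for (char *item = strtok(open + 1, ","); item != NULL && n < max;
         item = strtok(NULL, ",")) {
        /* each item looks like Name=Number; items that don't are skipped */
        if (sscanf(item, " %31[^= ] = %d", names[n], &values[n]) == 2)
            n++;
    }
    return n;
}
```

It would be called on value after sscanf() has split the line, for example when name is Attributes. A flag list such as {MultiUse,Take,Weapon} has no = signs, so it yields 0 pairs and would need its own function.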

Read in individual words from text file and translate - C

I am writing a program (for a class assignment) to translate normal words into their pirate equivalents (hi = ahoy).
I have created the dictionary using two arrays of strings and am now trying to translate an input.txt file and put it into an output.txt file. I am able to write to the output file, but it only writes the translated first word over and over on a new line.
I've done a lot of reading/scouring and from what I can tell, using fscanf() to read my input file isn't ideal, but I cannot figure out what would be a better function to use. I need to read the file word by word (separated by space) and also read in each punctuation mark.
Input File:
Hi, excuse me sir, can you help
me find the nearest hotel? I
would like to take a nap and
use the restroom. Then I need
to find a nearby bank and make
a withdrawal.
Miss, how far is it to a local
restaurant or pub?
Output: ahoy (46 times, each on a separate line)
Translate Function:
void Translate(char inputFile[], char outputFile[], char eng[][20], char pir[][20]){
    char currentWord[40] = {[0 ... 39] = '\0'};
    char word;
    FILE *inFile;
    FILE *outFile;
    int i = 0;
    bool match = false;
    //open input file
    inFile = fopen(inputFile, "r");
    //open output file
    outFile = fopen(outputFile, "w");
    while(fscanf(inFile, "%s1023", currentWord) == 1){
        if( ispunct(currentWord) == 0){
            while( match != true){
                if( strcasecmp(currentWord, eng[i]) == 0 || i<28){ //Finds word in English array
                    fprintf(outFile, pir[i]); //Puts pirate word corresponding to English word in output file
                    match = true;
                }
                else {i++;}
            }
            match = false;
            i=0;
        }
        else{
            fprintf(outFile, &word);//Attempt to handle punctuation which should carry over to output
        }
    }
}

As you start matching against different English words, i<28 is initially true. Hence the expression <anything> || i<28 is also immediately true, and correspondingly the code will behave as though a match was found on the first word in your dictionary.
To avoid this, you should handle the "found a match at index i" and the "no match found" conditions separately. This can be achieved as follows:
if (i >= dictionary_size) {
    // No pirate equivalent, print English word
    fprintf(outFile, "%s", currentWord);
    break; // stop matching
}
else if (strcasecmp(currentWord, eng[i]) == 0){
    ...
}
else {i++;}
where dictionary_size would be 28 in your case (based on your attempt at a stop condition with i<28).
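Following that suggestion, the lookup can also be moved into its own function that reports "no match" explicitly, so the caller never confuses "not found" with index 0. A sketch; find_word and write_translation are hypothetical names, and strcasecmp() is POSIX (declared in <strings.h>), not ISO C, so MSVC users would need _stricmp() instead:

```c
#include <stdio.h>
#include <strings.h>   /* strcasecmp: POSIX, not ISO C */

/* Returns the index of word in eng[0..size-1], ignoring case,
   or -1 if there is no match. */
int find_word(const char *word, char eng[][20], int size)
{
    for (int i = 0; i < size; i++)
        if (strcasecmp(word, eng[i]) == 0)
            return i;
    return -1;
}

/* Writes the pirate word if one exists, otherwise the original word. */
void write_translation(FILE *out, const char *word,
                       char eng[][20], char pir[][20], int size)
{
    int i = find_word(word, eng, size);
    fputs(i >= 0 ? pir[i] : word, out);
}
```

Using fputs() also avoids passing the dictionary word as a format string, which fprintf(outFile, pir[i]) does.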

Here's a code snippet that I use to parse things out. Here's what it does:
Given this input:
hi, excuse me sir, how are you.
It puts each word into an array of strings, splitting at any character in the DELIMS constant and discarding those delimiter characters. This will destroy your original input string, though. I simply print out the array of strings:
[hi][excuse][me][sir][how][are][you]
Now this is taking input from stdin, but you can change it around to take it from a file stream. You also might want to consider input limits and such.
#include <stdio.h>
#include <string.h>
#define CHAR_LENGTH 100
const char *DELIMS = " ,.\n";
int parse(char *inputLine, char *arguments[], const char *delimiters)
{
    int count = 0;
    char *p;
    // stop at CHAR_LENGTH so arguments[] cannot overflow
    for (p = strtok(inputLine, delimiters); p != NULL && count < CHAR_LENGTH; p = strtok(NULL, delimiters))
    {
        arguments[count] = p;
        count++;
    }
    return count;
}
int main(void)
{
    char line[1024];
    char *args[CHAR_LENGTH];
    if (fgets(line, sizeof line, stdin) == NULL)
        return 1;
    int count = parse(line, args, DELIMS);
    // i < count: args[count] is never set, so printing it would be undefined behavior
    for (int i = 0; i < count; i++) {
        printf("[%s]", args[i]);
    }
    printf("\n");
    return 0;
}
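As the answer notes, the same splitting works on a file stream. A sketch that reads any FILE * line by line with fgets() and prints the words of each line, using the answer's delimiters; parse_words, print_words and MAX_WORDS are illustrative names, and lines longer than 1023 characters would be processed in pieces:

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 100

/* Splits line in place at any character in delimiters and stores
   at most max word pointers in words. Returns the number of words. */
int parse_words(char *line, char *words[], int max, const char *delimiters)
{
    int count = 0;
    for (char *p = strtok(line, delimiters); p != NULL && count < max;
         p = strtok(NULL, delimiters))
        words[count++] = p;
    return count;
}

/* Reads in line by line and writes the words of each input line in
   brackets on one output line. Returns the total number of words. */
int print_words(FILE *in, FILE *out)
{
    char line[1024];
    char *words[MAX_WORDS];
    int total = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        int n = parse_words(line, words, MAX_WORDS, " ,.\n");
        for (int i = 0; i < n; i++)
            fprintf(out, "[%s]", words[i]);
        fputc('\n', out);
        total += n;
    }
    return total;
}
```

Calling print_words(stdin, stdout) processes every line of input rather than only the first one.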
