Failing to scan integers from a text file in C

I have a text file that looks like this:
coordinate size average Intensity
========== ==== =================
(187,18) 31 217.8
(58,29) 34 212.1
(124,71) 47 216.1
(245,71) 32 197.8
(96,113) 30 191.6
(244,135) 33 199.6
I'm trying to read only the 'size' column into a variable, but for some reason I'm not able to do so. This is the code I tried:
FILE * textOfGags;
int xx;
fopen_s(&textOfGags,"dust.txt","rt");
fseek(textOfGags,0L,SEEK_SET);
fscanf_s(textOfGags,"%*[^\n]\n,",NULL);
fscanf_s(textOfGags,"%*[^\n]\n,",NULL);
while(fscanf_s(textOfGags,"%d",&xx)==1){
printf("%d",xx);
fscanf_s(textOfGags,"%*[^\n]\n,",NULL);
}
For now I'm just trying to print the values in order to see where the problem is, but it seems that I can't even read the number. Can someone point out my mistake?

I suppose you got some downvotes for wrong tagging :-)
Anyway, I think we should help each other.
The code looks a little complicated to me, and, even with a debugger, it is probably hard to find out whether an fscanf("%*[^\n]\n," ...) actually moves the file position to the right place. In fact, once the two header lines have been skipped, the next character in the file is the '(' of "(187,18)". That character cannot start an integer, so the fscanf_s call with "%d" returns 0 and the loop body never runs.
I suggest reading the file line by line, then analysing each line based on its specific content and reading the size from it. For example, one might use the fact that the size value is the first integer after a closing ')', and lines without such a ')' may be ignored.
Hope it helps :-)
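#include <stdio.h>
#include <string.h>
int main(void) {
    FILE * textofGags;
    int xx;
    textofGags = fopen("dust.txt", "r"); // "r": the extra "t" flag is Microsoft-specific
    if (textofGags) {
        char line[1000];
        while (fgets(line, sizeof line, textofGags)) {
            char *closingBrace = strchr(line, ')');
            if (!closingBrace)
                continue; // header lines have no ')': skip them
            closingBrace++; // first char after the ')'
            if (sscanf(closingBrace, "%d", &xx) == 1) {
                printf("size: %d \n", xx);
            }
        }
        fclose(textofGags);
    }
    return 0;
}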

Related

I seem to be losing the end of a string after exiting a for loop in C

I seem to be losing the end of my string after the exit of the for loop in the function below. Any help would be very much appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
char  *get_out_file_name(int in_file_num);
int main(int argc, char *argv[])
{
char* file_name = get_out_file_name(37);
printf("name_final = %s \n", file_name);
free (file_name);
}
char *get_out_file_name(int in_file_num)
{
    const int max_name_length = 3;
    const int file_type_length = 4;
    const char *type_suffix;
    type_suffix = ".jpg";
    int num_remainder = in_file_num;
    char * name_final = malloc(sizeof(char) * (max_name_length + file_type_length + 1));
    for (int i = max_name_length - 1; i >= 0 ; i --)
    {
        printf("i = %d\n",i);
        name_final[i] = num_remainder % 10;
        sprintf (&name_final[i], "%d", name_final[i]);
        printf("name_final[i] = %c \n", name_final[i]);
        num_remainder /= 10;
    }
printf("pre concat[1] = %c \n",name_final[1]);
    strcat(name_final, type_suffix);
  //  printf("name_final %s \n", name_final);
    return name_final;
}
The only place the data is retained is in name_final[0], array index 1 and 2 hold no data after the exit of the for loop.
I'm thinking that possibly sprintf creates its own local copy in the for loop, or that I'm missing a reference/dereference somewhere. I was hoping someone could clarify.
The output is in the screenshot below. Apologies for the quality of the output screen grab. I'm having to use my phone to post.
I've just had a chance to work through this function properly and wanted to give an explanation in case anyone stumbles across this in the future.
When I wrote that code I didn't understand how to convert a digit to a char by adding '0'. I understand now that we don't actually add the integer value 0. We add the character '0', whose ASCII code is 48. This works because chars are stored as small integers in C.

Print the 10 best results of a txt file

I can't find a way to solve this one.
I have a text file, score.txt, like this:
pat 20
ananna 20
radis 19
The number of lines can be anything from 10 upwards.
My goal is to print the 10 lines with the highest scores, in order.
I tried with this, but I can't even read the number in my text file.
void fillScore(){
FILE* f =NULL;
f= fopen("score.txt", "r");
char name[20];
int score,i,j,tmp;
int tabScore[10]={-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
char tabName[10][20];
if (f==NULL) {
perror("fopen");
exit(EXIT_FAILURE);
}
while(fscanf(f,"%s%d",name,&score)==1){
printf("TEST : %d - ",score);
for(i=0;i<10;i++){
if(score>tabScore[i]){
tmp=tabScore[i];
tabScore[i]=score;
for(j=i+1;j<10;j++){
score=tabScore[j];
tabScore[j]=tmp;
tmp=score;
}
}
}
}
for(i=0;i<10;i++){
printf("%d\n",tabScore[i]);
}
fclose(f);
}
Does anybody have a hint on this? I can't figure out how to do it. I know it's not a great question, because it shows a lack of research, but I swear that I searched the web for hours.
Thanks a lot.
You're testing the return value of fscanf against the wrong value, 1.
As per the man page:
on success, these functions return the number of input items successfully matched and assigned; this can be fewer than provided for, or even zero, in the event of an early matching failure
Since you're asking *scanf() to convert two items, you want to check its return value against 2.
The following works:
...
while(fscanf(f,"%s%d",name,&score) == 2){
...

Inserting word from a text file into a tree in C

I have been encountering a weird problem for the past 2 days and I can't solve it yet. I am trying to get words from 2 text files and add those words to a tree. The method I chose to get the words is described here:
Splitting a text file into words in C.
The function that I use to insert words into a tree is the following:
void InsertWord(typosWords Words, char * w)
{
int error ;
DataType x ;
x.word = w ;
printf(" Trying to insert word : %s \n",x.word );
Tree_Insert(&(Words->WordsRoot),x, &error) ;
if (error)
{
printf("Error Occured \n");
}
}
As mentioned in the linked post, when I try to import the words from a text file into the tree, I get "Error Occured". Here are the text file and, once again, the reading code.
The text file:
a
aaah
aaahh
char this_word[15];
while (fscanf(wordlist, "%14s", this_word) == 1)
{
printf("Latest word that was read: '%s'\n", this_word);
InsertWord(W,this_word);
}
But when I insert the exact same words in the following way, it works just fine.
for (i = 0 ; i <=2 ; i++)
{
if (i==0)
InsertWord(W,"a");
if (i==1)
InsertWord(W,"aaah");
if (i==2)
InsertWord(W,"aaahh");
}
That proves the tree's functions work fine, so I can't understand what's happening. I have been debugging for 2 straight days and still can't figure it out. Any ideas?
When you read the words using
char this_word[15];
while (fscanf(wordlist, "%14s", this_word) == 1)
{
printf("Latest word that was read: '%s'\n", this_word);
InsertWord(W,this_word);
}
you are always reusing the same memory buffer for the strings. This means when you do
x.word = w ;
you are ALWAYS storing the SAME address. Every read therefore redefines ALL the words already stored, basically corrupting the data structure.
Try changing char this_word[15]; to char *this_word; and allocating a new buffer with malloc(15) for each word before fscanf() writes into it. It would look like this:
char *this_word = malloc(15);
while (this_word != NULL && fscanf(wordlist, "%14s", this_word) == 1)
{
    printf("Latest word that was read: '%s'\n", this_word);
    InsertWord(W,this_word);   // the tree now keeps this buffer
    this_word = malloc(15);    // fresh buffer for the next word
}
free(this_word);               // the last buffer was never stored (free(NULL) is a no-op)
As suggested by Michael Walz, strdup(3) also solves the immediate problem: x.word = strdup(w); stores a separate copy of each word.
Of course, you will also have to free the .word elements when you are finished with the tree.
It seems the problem was in the assignment of the strings. strdup solved the problem!

Filter text from huge .csv files, in C

I have the raw, unfiltered records in a CSV file (more than 1,000,000 records), and I am supposed to filter those records against a list of files (each more than 282 MB, approx. 2,000,000+ records). I tried using strstr in C. This is my code:
while (!feof(rawfh)) //loop to read records from raw file
{
j=0; //counter
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh)) //read a line from raw file
{
line[j] = c; line[j+1] = '\0'; j++;
}
//function to extract the element in the specified column, in the CSV
extractcol(line, relcolraw, entry);
printf("\nWorking on : %s", entry);
found=0;
//read a set of 4000 bytes; this is the target file
while( fgets(buffer, 4000, dncfh)!=NULL && !found )
{
if( strstr(buffer, entry) !=NULL) //compare it
found++;
}
rewind(dncfh); //put the file pointer back to the start
// if the record was not found in the target list, write it into another file
if(!found)
{
fprintf(out, "%s,\n", entry); printf(" *** written to filtered ***");
}
else
{
found=0; printf(" *** Found ***");
}
//I hope this is the right way to null out a string
entry[0] = '\0'; line[0] ='\0';
//just to display a # on the screen, to let the user know that the program
//is still alive and running.
rawreccntr++;
if(rawreccntr>=10)
{
printf("#"); rawreccntr=0;
}
}
This program takes approximately 7 to 10 seconds, on average, to search for one entry in the target file (282 MB). So, 10*1000000 = 10000000 seconds (roughly 116 days) :( God knows how long that is going to take if I decide to search in 25 files.
I want to write a program myself rather than use spoon-fed solutions (grep, sed, etc.). Oh, sorry, I should mention that I am using Windows 8 (64-bit, 4 GB RAM, AMD Radeon dual-core processor, 1000 MHz). I used DevC++ (gcc) to compile this.
Please enlighten me with your ideas.
Thanks in advance, and sorry if I sound stupid.
Update by Ali, the key information extracted from a comment:
I have a raw CSV file with customers' phone numbers and addresses. I have the target file(s) in CSV format: the Do Not Call list. I am supposed to write a program that keeps only the phone numbers that are not present in the Do Not Call list. The phone numbers (in both files) are in the 2nd column. However, I don't know of any other method. I searched for the Boyer-Moore algorithm but could not implement it in C. Any suggestions about how I should go about searching for records?
EDITED
I would recommend you try the ready-made tools in any Unix/Linux system, grep and awk. You'll probably find they are just as fast and much easier to maintain. I haven't seen your data format, but you say the phone numbers are in the second column, so you can get the phone numbers on their own like this (-F, tells awk that fields are separated by commas):
awk -F, '{print $2}' DontCallFile.csv
If your phone numbers are in double quotes, you can remove those like this:
awk -F, '{print $2}' DontCallFile.csv | tr -d '"'
Then you can use fgrep with the -f option, to search whether strings listed in one file are present in a second file, like this:
fgrep -f file1.csv file2.csv
or you can invert the search and search for strings NOT present in another file, by adding the -v switch to fgrep.
So, your final command would probably end up like this:
fgrep -v -f <(awk -F, '{print $2}' DontCallFile.csv | tr -d '"') file2.csv
That says... print the lines of file2.csv that do not contain (-v option) any of the strings in column 2 of the file "DontCallFile.csv". The bit in <() is called process substitution, which bash supports. It basically makes a pseudo-file out of the output of the command inside the parentheses. We need a pseudo-file because fgrep -f expects a file.
ORIGINAL ANSWER
Why are you using fgetc() anyway? Surely you would read whole lines, with POSIX getline() (or standard fgets()), like this:
char *line = NULL;
size_t cap = 0;
while (getline(&line, &cap, myfile) != -1)
{
...
}
free(line);
Are you really reading the whole "target" file from the start for every single line in your main file? That will kill you! And why are you reading it in pieces of up to 4,000 bytes? What if one of your strings straddles two pieces, i.e. the first 8 bytes are in one piece and the rest are in the next? (fgets() stops at each newline, so this can only happen on lines longer than 3,999 bytes.)
I think you will get better help here if you take the time to explain properly what you are trying to do. Maybe also do it with awk or grep (at least figuratively) so we can see what you are actually trying to achieve. For example, your description doesn't mention the "target" file you use in the code.
You can do this with awk, like this:
awk -F, -v OFS=, '
FNR==NR {gsub(/"/,"",$2);dcn[$2]++;next}
{gsub(/ /,"",$2);if(!dcn[$2])print}
' DontCallFile.csv x.csv
That says... the field separator is a comma (-F,). The output field separator is also a comma (-v OFS=,). Without it, awk rebuilds any line whose field 2 was modified with spaces between the fields.

Now read the first file (DontCallFile.csv) and process it according to the part in curly braces after FNR==NR. Remove the double quotes from around the phone number in field 2, using gsub (global substitution). Then increment the element of the associative array (i.e. hash) indexed by the unquoted field 2, and move on to the next record. So, after file "DontCallFile.csv" is processed, the array dcn[] holds a hash of all the numbers not to call (dcn=dontcallnumbers).

Then the code in the second set of curly braces is executed for each line of the second file ("x.csv"). That says... remove all spaces from around the phone number in field 2. Then, if that phone number is not present in the array dcn[] that we built earlier, print the line.
Here is one idea for improvement...
In the code below, what's the point in setting line[j+1] = '\0' at every iteration?
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh))
{
line[j] = c; line[j+1] = '\0'; j++;
}
You might as well do it outside the loop:
while( (c = fgetc(rawfh))!='\n' && !feof(rawfh))
line[j++] = c;
line[j] = '\0';
My advice is the following:
1. Put all don't-call phone numbers into an array.
2. Sort this array.
3. Use binary search to check if a given phone number is among the sorted don't-call numbers.
In the code below, I just hard-coded the numbers. In your application, you will have to replace that with the corresponding code.
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Comparator for qsort(): a and b point at elements of an array of strings. */
int compare(const void* a, const void* b) {
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}
/* Lower-bound binary search over the sorted range [first, last).
   Returns 1 if val is in the range, 0 otherwise. */
int binary_search(const char** first, const char** last, const char* val) {
    ptrdiff_t len = last - first;
    while (len > 0) {
        ptrdiff_t half = len / 2;
        const char** middle = first + half;
        if (compare(middle, &val) < 0) {  /* *middle < val: search the right half */
            first = middle + 1;
            len = len - half - 1;
        }
        else
            len = half;                   /* *middle >= val: search the left half */
    }
    return first != last && compare(&val, first) == 0;
}
int main(void) {
    size_t i;
    /* Read _all_ of your don't call phone numbers into an array. */
    /* For the sake of the example, I just hard-coded it. */
    const char* dont_call[] = { "908-444-555", "800-200-400", "987-654-321" };
    /* in your program, change length to the number of dont_call numbers actually read. */
    size_t length = sizeof dont_call / sizeof dont_call[0];
    qsort(dont_call, length, sizeof dont_call[0], compare);
    printf("The don't call numbers sorted\n");
    for (i = 0; i < length; ++i)
        printf("%lu %s\n", (unsigned long)i, dont_call[i]);
    /* For each phone number, check if it is in the sorted dont_call list. */
    /* Use binary search to check it. */
    const char* numbers[] = { "999-000-111", "333-444-555", "987-654-321" };
    size_t n = sizeof numbers / sizeof numbers[0];
    printf("Now checking if we should call a given number\n");
    for (i = 0; i < n; ++i) {
        int in_dont_call_list = binary_search(dont_call, dont_call + length, numbers[i]);
        const char* as_text = in_dont_call_list ? "no" : "yes";
        printf("Should we call %s? %s\n", numbers[i], as_text);
    }
    return 0;
}
This prints:
The don't call numbers sorted
0 800-200-400
1 908-444-555
2 987-654-321
Now checking if we should call a given number
Should we call 999-000-111? yes
Should we call 333-444-555? yes
Should we call 987-654-321? no
The code is definitely not perfect but it is sufficient to get you started.
The problem with your algorithm is its complexity. Your approach is O(n*m), where n is the number of customers and m is the number of do-not-call records (or the size of the file in your case). You need to reduce this complexity. (The Boyer-Moore algorithm suggested by Ali would not help here: it improves only the constant factor, not the asymptotic complexity.) Even binary search, as Ali suggests in his answer, is not the best, because it would be O((n+m)*log m). We can do better. Nice solutions are the ones using fgrep and awk suggested by Mark Setchell in his answers. (I would choose the one using fgrep, which I guess should perform better, but that is only a guess.) I can provide a similar solution in Perl, which does more robust CSV parsing and should easily handle your data sizes on decent hardware. This type of solution, based on a hash lookup, has expected complexity O(n+m).
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Text::CSV_XS;
use constant PHN_COL_DNC => 1;
use constant PHN_COL_CUSTOMERS => 1;
die "Usage: $0 dnc_file [customers]" unless @ARGV>0;
my $dncfile = shift @ARGV;
my $csv = Text::CSV_XS->new({eol=>"\n", allow_whitespace=>1, binary=>1});
my %dnc;
open my $dnc, '<', $dncfile;
while(my $row = $csv->getline($dnc)){
$dnc{$row->[PHN_COL_DNC]} = undef;
}
close $dnc;
while(my $row = $csv->getline(*ARGV)){
$csv->print(*STDOUT, $row) unless exists $dnc{$row->[PHN_COL_CUSTOMERS]};
}
If this does not meet your performance expectations, you can go down the C road, but I would definitely recommend using good CSV-parsing and hash-map libraries. I would try libcsv and khash.h.

Error In C File Handling

I am making a quiz program in C for a school project. I am storing the questions and answers in a text file. The text file contains one question followed by 4 choices and the correct answer (each on a new line), and so on. The code for the file handling is:
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<process.h>
void main()
{
int tnum=2,mnum;
printf("Enter a file name to load the quiz from or enter dhruv.txt to load the default file\n");
printf("(For type of file and arrangement of data in it, refer to the documentation\n");
printf("WARNING: An improper quiz file may result in malfunctioning of the program.\n");
char quizfile[100];
scanf("%s",quizfile);
FILE *dj;
dj = fopen(quizfile,"r");
int test = 1;
while(dj == NULL)
{
printf("Requested file does not exist.Please enter a valid name\n");
scanf("%s",quizfile);
dj = fopen(quizfile,"r");
test++;
if(test == 5)
{
exit(0);
}
}
char line[500];
char ques[20][500],ansa[20][500],ansb[20][500],ansc[20][500],ansd[20][500],anse[20][500];
int start = 1,quesval=1,ans1=1,ans2=1,ans3=1,ans4=1,ans5=1;
while(fgets(line,sizeof line,dj) != NULL)
{
if((start%6) == 1)
{
strcpy(ques[quesval],line);
quesval++;
}
if((start%6) == 2)
{
strcpy(ansa[ans1],line);
ans1++;
}
if((start%6) == 3)
{
strcpy(ansb[ans2],line);
ans2++;
}
if((start%6) == 4)
{
strcpy(ansc[ans3],line);
ans3++;
}
if((start%6) == 0)
{
strcpy(anse[ans5],line);
ans5++;
}
if((start%6) == 5)
{
strcpy(ansd[ans4],line);
ans4++;
}
start++;
}
fclose(dj);
printf("Quiz file successfully loaded\n");
printf("/t/t WELCOME TO THE QUIZ\n\n");
printf("Every team must select one of the four correct answers to the asked questions to gain points\n");
printf("Wrong answer will not be penalized\n");
for(int k =1;k<quesval;k++)
{
int myvar;
myvar = k%tnum;
if(myvar == 0)
{
myvar = tnum;
}
printf("Question for TEAM %d\n\n",myvar);
printf("%s \n A.%s B.%s C.%s D.%s\n",ques[k],ansa[k],ansb[k],ansc[k],ansd[k]);
}
getch();
}
The problem is
if((start%6) == 0)
{
strcpy(anse[ans5],line);
ans5++;
}
The program reports that the file does not exist if I use this, but as soon as I comment it out the program works fine. I don't know what the error is. Please help.
EDIT: My text file looks like this:
Who is the owner
dhruv
jain
kalio
polika
dhruv
who is his friend
sarika
katrina
jen
aarushi
aarushi
where is he
home
office
college
toilet
office
where will he go
home
office
college
toilet
home
EDIT
I am using Turbo C++ on Windows 7 via DOSBox. The script above has been updated.
It's difficult to say without seeing your input file, but I suspect that your array declarations are backwards. For example, you have:
char ques[500][20];
This declares an array of 500 elements, where each element can be up to 20 characters. You probably want:
char ques[20][500];
This declares an array of 20 elements, where each element can be up to 500 characters.
If your input file contains lines longer than 20 characters, then your current code is likely overwriting your arrays.
There are several problems here but your immediate problem is this:
strcpy(anse[ans5],line);
(And all the other strcpy calls like it.)
You are copying line to the array beginning at anse[1][0]. If line contains more than 20 characters, it will overwrite memory past the end of anse. For example, if line contains 25 characters, you'll be putting it in anse[1][0] through anse[1][24]. Unfortunately anse[1][24] does not exist because anse is only 20 characters long. If any question exceeds 20 characters, you'll be corrupting memory and possibly causing a crash. Let me guess: Question 5 is longer than 19 characters, right?
In short, you have your rows and columns mixed up in your declarations. I think you wanted to allow 20 questions of 500 characters each, but you're actually allowing 500 questions of 20 characters each.
Next problem: In C, arrays are zero-based, not one-based. The first string in ques, for example, is ques[0], not ques[1].
To simplify this, think of two-dimensional arrays as a table composed of rows and columns. For example, declare a 3x4 array named foo:
char foo[3][4];
Picture it like this:
0 1 2 3
0 . . . .
1 . . . .
2 . . . .
What I have is an array of three character strings, each 4 characters long. The first string in my array is at foo[0]. The first character of the first string is at foo[0][0]. The second character is at foo[0][1], the second character of the third string is foo[2][1], and so on.
To solve this, your declarations should look like this:
char ques[20][500],ansa[20][500],ansb[20][500],ansc[20][500],ansd[20][500],anse[20][500];
int start = 1,quesval=0,ans1=0,ans2=0,ans3=0,ans4=0,ans5=0;
When you get it working, you should then ask yourself why you're testing the value of start six times on each pass through the loop when it only changes once. There's a much better solution available here. Consider a three-dimensional array like this:
char answers[20][5][500];
That gives you 20 questions with 5 answers each, and each answer can hold up to 500 characters (including the terminating '\0').
