I'm attempting to recreate the wc command in C and am having trouble getting the proper number of words in any file containing machine code (core files or compiled C). The number of words counted always comes up around 90% short of the amount returned by wc.
For reference here is the project info
Compile statement
gcc -ggdb wordCount.c -o wordCount -std=c99
wordCount.c
/*
* Author(s) - Colin McGrath
* Description - Lab 3 - WC LINUX
* Date - January 28, 2015
*/
#include<stdio.h>
#include<string.h>
#include<dirent.h>
#include<sys/stat.h>
#include<ctype.h>
struct counterStruct {
int newlines;
int words;
int bt;
};
typedef struct counterStruct ct;
ct totals = {0};
struct stat st;
void wc(ct counter, char *arg)
{
printf("%6d %6d %6d %s\n", counter.newlines, counter.words, counter.bt, arg);
}
void process(char *arg)
{
lstat(arg, &st);
if (S_ISDIR(st.st_mode))
{
char message[4056] = "wc: ";
strcat(message, arg);
strcat(message, ": Is a directory\n");
printf(message);
ct counter = {0};
wc(counter, arg);
}
else if (S_ISREG(st.st_mode))
{
FILE *file;
file = fopen(arg, "r");
ct currentCount = {0};
if (file != NULL)
{
char holder[65536];
while (fgets(holder, 65536, file) != NULL)
{
totals.newlines++;
currentCount.newlines++;
int c = 0;
for (int i=0; i<strlen(holder); i++)
{
if (isspace(holder[i]))
{
if (c != 0)
{
totals.words++;
currentCount.words++;
c = 0;
}
}
else
c = 1;
}
}
}
currentCount.bt = st.st_size;
totals.bt = totals.bt + st.st_size;
wc(currentCount, arg);
}
}
int main(int argc, char *argv[])
{
if (argc > 1)
{
for (int i=1; i<argc; i++)
{
//printf("%s\n", argv[i]);
process(argv[i]);
}
}
wc(totals, "total");
return 0;
}
Sample wc output:
135 742 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
231 1189 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
24 224 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
39 247 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
41 230 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11508 85256 1324239 total
Sample rebuild output (./wordCount):
139 76 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
233 493 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
25 3 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
40 6 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
42 3 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11517 83205 1324239 total
Notice the difference in the word count (second int) from the first two files (core files) as well as the roman file and parseURL files (machine code, no extension).
C strings do not store their length. They are terminated by a single NUL (0) byte.
Consequently, strlen needs to scan the entire string, character by character, until it reaches the NUL. That makes this:
for (int i=0; i<strlen(holder); i++)
desperately inefficient: for every character in holder, it needs to count all the characters in holder in order to test whether i is still in range. That transforms a simple linear Θ(N) algorithm into a Θ(N²) cycle-burner.
But in this case, it also produces the wrong result, since binary files typically include lots of NUL characters. Since strlen will actually tell you where the first NUL is, rather than how long the "line" is, you'll end up skipping a lot of bytes in the file. (On the bright side, that makes the scan quadratically faster, but computing the wrong result more rapidly is not really a win.)
You cannot use fgets to read binary files because the fgets interface doesn't tell you how much it read. You can use the POSIX 2008 getline interface instead, or you can do binary input with fread, which is more efficient but will force you to count newlines yourself. (Not the worst thing in the world; you seem to be getting that count wrong, too.)
Or, of course, you could read the file one character at a time with fgetc. For a school exercise, that's not a bad solution; the resulting code is easy to write and understand, and typical implementations of fgetc are more efficient than the FUD would indicate.
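The fgetc approach can be sketched as follows. This is a minimal sketch, not the asker's code: the struct counts type and the count_stream name are made up for illustration, and the caller is assumed to have opened the file already (in binary mode on platforms that distinguish it). Because every byte is examined, NUL bytes no longer cut the scan short.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical counter type for this sketch (not the asker's struct). */
struct counts {
    unsigned long lines;
    unsigned long words;
    unsigned long bytes;
};

/* Count newlines, words and bytes in an already opened stream.
   A NUL byte is treated like any other non-space byte, so binary
   files are scanned to the end instead of stopping at the first 0. */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0};
    int ch;
    int in_word = 0;   /* 1 while we are inside a run of non-space bytes */

    while ((ch = fgetc(fp)) != EOF) {
        c.bytes++;
        if (ch == '\n')
            c.lines++;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first byte of a new word */
            c.words++;
        }
    }
    return c;
}
```

Note that wc's notion of a word can depend on the locale, so the totals for binary files may still not match wc exactly.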
I'm currently trying to read data from a text file line by line, using strtok with a space as the delimiter, and save the values into different arrays. I'm using the FatFs library to read the file from an SD card. At the moment I'm only trying to read the first 2 elements from each line.
My text file looks like this:
223 895 200 200 87 700 700 700
222 895 200 200 87 700 700 700
221 895 200 200 87 700 700 700
222 895 200 200 87 700 700 700
My current code is something like this:
void sd_card_read()
{
char buffer[30];
char buffer2[10];
char buffer3[10];
int i=0;
int k=0;
int l=0;
int16 temp_array[500];
int16 hum_array[500];
char *p;
FIL fileO;
uint8 resultF;
resultF = f_open(&fileO, "dados.txt", FA_READ);
if(resultF == FR_OK)
{
UART_UartPutString("Reading...");
UART_UartPutString("\n\r");
while(f_gets(buffer, sizeof(buffer), &fileO))
{
p = strtok(buffer, " ");
temp_array[i] = atoi(p);
UART_UartPutString(p);
UART_UartPutString("\r\n");
p = strtok(NULL, " ");
hum_array[i] = atoi(p);
UART_UartPutString(p);
UART_UartPutString("\r\n");
i++;
}
UART_UartPutString("Done reading");
resultF = f_close(&fileO);
}
UART_UartPutString("Printing");
UART_UartPutString("\r\n");
for (k = 0; k < 10; k++)
{
itoa(temp_array[k], buffer2, 10);
UART_UartPutString(buffer2);
UART_UartPutString("\r\n");
}
for (l = 0; l < 10; l++)
{
itoa(hum_array[l], buffer3, 10);
UART_UartPutString(buffer3);
UART_UartPutString("\r\n");
}
}
The output at the moment is this:
223
0
222
0
etc..
895
0
895
0
etc..
After each value is read, a 0 is stored in the next position of both arrays, which is not what I want. It's probably something basic, but I can't see what is wrong.
Any help is valuable!
If we take the first line of the file
223 895 200 200 87 700 700 700
That line is, including spaces and the newline (assuming a single '\n'), 31 characters long. And since strings in C need to be terminated by '\0', the line requires at least 32 characters (if f_gets works like the standard fgets function and stores the newline).
The buffer you read into only fits 30 characters, which means only 29 characters of your line are read before the terminator is added. So you only read
223 895 200 200 87 700 700 70
The next time you call f_gets the function will read the remaining
0
You need to increase the size of the buffer to be able to fit all of the line. With the current data it needs to be at least 32 characters. But be careful since an extra character in one of the lines will give you the same problem again.
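To guard against the same problem coming back, the read can check whether a whole line actually fit in the buffer. The following is a minimal sketch using the standard fgets, on the assumption (as above) that f_gets behaves the same way; read_line_checked is a hypothetical helper name, not part of FatFs.

```c
#include <stdio.h>
#include <string.h>

/* Read one line with fgets and report whether it fit in the buffer.
   Returns  1 if buf holds a complete line,
            0 if the buffer filled up before a newline was seen,
           -1 at end of input or on a read error.
   A final line without a newline that exactly fills the buffer is
   also reported as 0, since the two cases look identical here. */
int read_line_checked(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;
    if (strchr(buf, '\n') != NULL)
        return 1;                       /* newline stored: whole line read */
    return (strlen(buf) == size - 1) ? 0 : 1;
}
```

With a 30-byte buffer and the first line of the sample file, this returns 0 and leaves "223 895 200 200 87 700 700 70" in the buffer; the next call returns 1 with "0" followed by the newline.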
I am fairly new to programming so bear with me.
I'm trying to create some code that would read a text file that contains 3 numbers. I want to use a function I wrote to find the max number. I get no errors when compiling, but when I run the code the program crashes (no message received or anything, simply "file.exe has stopped working").
I would greatly appreciate help in tackling this problem.
Also I would like to avoid using arrays.
#include <stdio.h>
#include <stdlib.h>
int max(int a,int b,int c);
int main()
{
FILE *fpointer;
int a, b, c;
int maxNumber = max(a,b,c);
fpointer = fopen("marks.txt","r");
while(fscanf(fpointer,"%d %d %d",a,b,c)!=EOF) {
printf("%d",max(a,b,c));
}
fclose(fpointer);
return 0;
}
int max(int a,int b,int c){
if((a>b)&&(a>c))
return a;
if((b>a)&&(b>c))
return b;
if((c>a)&&(c>b))
return c;
}
I am fairly new to programming so bear with me.
OK, and we will, but no matter how hard we try, we cannot fix the Undefined Behavior you invoke with:
int maxNumber = max(a,b,c);
The values of a, b & c have not been initialized at the time you call max. This invokes Undefined Behavior. (attempting to access the value of an uninitialized object).
Second, also easily leading to Undefined Behavior is the failure to validate that fopen succeeds, and failing to validate that fscanf succeeds. Testing that fscanf (...) != EOF does not tell you anything about whether valid conversions actually took place. The return for fscanf is the successful number of conversions that took place -- based upon the number of conversion specifiers present in the format string (e.g. "%d %d %d" contains 3 conversion specifiers). So to validate that a, b & c all contain values, you must compare fscanf (...) == 3.
Putting those pieces together, you could do something similar to the following:
#include <stdio.h>
int max (int a, int b, int c);
int main (int argc, char **argv) {
int a, b, c, n = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (fscanf (fp, "%d %d %d", &a, &b, &c) == 3)
printf ("line[%2d] : %d\n", n++, max (a, b, c));
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
int max (int a, int b, int c)
{
int x = a > b ? a : b,
y = a > c ? a : c;
return x > y ? x : y;
}
Example Input
$ cat int3x20.txt
21 61 78
94 7 87
74 1 86
79 80 50
35 8 96
17 82 42
83 40 61
78 71 88
62 20 51
58 2 11
32 23 73
42 18 80
61 92 14
79 3 26
30 70 67
26 88 49
1 3 89
62 81 93
50 75 13
33 33 47
Example Use/Output
$ ./bin/maxof3 <dat/int3x20.txt
line[ 0] : 78
line[ 1] : 94
line[ 2] : 86
line[ 3] : 80
line[ 4] : 96
line[ 5] : 82
line[ 6] : 83
line[ 7] : 88
line[ 8] : 62
line[ 9] : 58
line[10] : 73
line[11] : 80
line[12] : 92
line[13] : 79
line[14] : 70
line[15] : 88
line[16] : 89
line[17] : 93
line[18] : 75
line[19] : 47
Look things over and let me know if you have further questions.
fscanf uses pointer arguments and you are passing the variable values. When the function tries to access the address a (which is in fact an uninitialized variable), it causes a segmentation fault (you are trying to access an invalid memory address) and your program crashes and exits.
You should instead pass the variables' addresses as the pointer arguments (e.g. &a - the address of variable a), so that fscanf accesses valid memory.
while(fscanf(fpointer,"%d %d %d",&a,&b,&c)!=EOF) {
Other undefined behavior can be avoided by initializing the variables and checking the return values correctly, as the answer by @DavidC.Rankin describes in detail.
All you need to do to make it work is add an & on this line...
while(fscanf(fpointer,"%d %d %d",&a,&b,&c)!=EOF) { ... }
In my class today we were assigned a project that involves reading in a file using the ./a.out"<"filename command. The contents of the file look like this
16915 46.25 32 32
10492 34.05 56 52
10027 98.53 94 44
13926 32.94 19 65
15736 87.67 5 1
16429 31.00 58 25
15123 49.93 65 38
19802 37.89 10 20
-1
but larger
My issue is that any scanf used afterwards is completely ignored and just scans in what looks like garbage when printed out, rather than taking in user input. In my actual program this is causing an issue with a menu that requires input.
How do I get the program to stop reading the file provided by the ./a.out"<"filename command?
Also, I stop reading at -1 rather than at EOF so that I don't end up with an extra set of array data starting with -1, e.g.
-1 0 0 0
In my real program the class size is an adjustable constant that is used to calculate class averages, and I'd rather not have a set of 0s skewing that data.
#include <stdio.h>
int main(void)
{
int i = 0,j = 1,d,euid[200],num;
int tester = 0;
float hw[200],ex1[200],ex2[200];
while(j)
{
scanf("%d",&tester);
if( tester == -1)
{
j = 0;
}
else
{
euid[i] = tester;
}
scanf("%f",hw+i);
scanf("%f",ex1+i);
scanf("%f",ex2+i);
i++;
}
for(d = 0;d < 50;d++) /*50 because the actual file size contains much more than example*/
{
printf("euid = %d\n",euid[d]);
printf("hw = %f\n",hw[d]);
printf("ex1 = %f\n",ex1[d]);
printf("ex2 = %f\n",ex2[d]);
}
printf("input something user\n");
scanf("%d",&num);
printf("This is what is being printed out -> %d\n",num);
return 0;
}
I'm having the exact same problem. Tried every method I could find to eat the remaining input in the buffer, but it never ends.
Got it to work using fopen and fscanf, but the prof. said he prefers the code using a.out < filename
Turns out this is in fact not possible.
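One way to read the records so that nothing is stored once the -1 sentinel or the end of the input is reached is sketched below. The read_records name and its parameters are hypothetical. Note that when stdin is redirected from a file, a later scanf on stdin still reads from that file; once the file is exhausted, scanf returns EOF without storing anything, which would explain the garbage value printed for num.

```c
#include <stdio.h>

/* Read "id hw ex1 ex2" records until the -1 sentinel, the end of the
   input, a malformed record, or max records, whichever comes first.
   Returns the number of complete records stored. */
int read_records(FILE *in, int id[], float hw[], float ex1[], float ex2[], int max)
{
    int n = 0;
    int tester;

    while (n < max && fscanf(in, "%d", &tester) == 1 && tester != -1) {
        /* only accept the record if all three values were converted */
        if (fscanf(in, "%f %f %f", &hw[n], &ex1[n], &ex2[n]) != 3)
            break;
        id[n] = tester;
        n++;
    }
    return n;
}
```

The returned count can then bound the printing loop instead of a fixed 50. Interactive input after the file has been consumed would have to come from somewhere other than the redirected stdin; on POSIX systems, opening /dev/tty is one option (an assumption about the environment, not shown here).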
I'm using strtok() to parse a string I get from fgets() that is separated by the ~ character
e.g. data_1~data_2
Here's a sample of my code:
fgets(buff, LINELEN, stdin);
pch = strtok(buff, " ~\n");
//do stuff
pch = strtok(NULL, " ~\n");
//do stuff
The first instance of strtok breaks it apart fine, I get data_1 as is, and strlen(data_1) provides the correct length of it. However, the second instance of strtok returns the string, with something appended to it.
With an input of andrewjohn ~ jamessmith, I printed out each character and the index, and I get this output:
a0
n1
d2
r3
e4
w5
j6
o7
h8
n9
j0
a1
m2
e3
s4
s5
m6
i7
t8
h9
10
What is that "11th" value corresponding to?
EDIT:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char buff[100];
char * pch;
int i;
fgets(buff, 100, stdin);
pch = strtok(buff, " ~\n");
printf("FIRST NAME\n");
for(i = 0; i < strlen(pch); i++)
{
printf("%c %d %d\n", *(pch+i), *(pch+i), i);
}
printf("SECOND NAME\n");
pch = strtok(NULL, " ~\n");
for(i = 0; i < strlen(pch); i++)
{
printf("%c %d %d\n", *(pch+i), *(pch+i), i);
}
}
I ran it by:
cat sample.in | ./myfile
Where sample.in had
andrewjohn ~ johnsmith
Output was:
FIRST NAME
a 97 0
n 110 1
d 100 2
r 114 3
e 101 4
w 119 5
j 106 6
o 111 7
h 104 8
n 110 9
SECOND NAME
j 106 0
o 111 1
h 104 2
n 110 3
s 115 4
m 109 5
i 105 6
t 116 7
h 104 8
13 9
So the last character is ASCII value 13, which says it's a carriage return ('\r'). Why is this coming up?
Based on your edit, the input line ends in \r\n. As a workaround you could just add \r to your list of delimiters in strtok.
However, this should be investigated further. \r\n is the line ending in a Windows file, but stdin is a text stream, so \r\n in a file would be converted to just \n in the fgets result.
Are you perhaps piping in a file that contains something weird like \r\r\n ? Try hex-dumping the file you're piping in to check this.
Another possible explanation might be that your Cygwin (or whatever) environment has somehow been configured not to translate line endings in a file piped in.
edit: Joachim's suggestion is much more likely - using a \r\n file on a non-Windows system. If this is the case, you can fix it by running dos2unix on the file. But in accordance with the principle "accept everything, generate correctly", it would be useful for your program to handle this file.
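The workaround of treating \r as a delimiter can be sketched like this. It is only an illustration: split_pair is a hypothetical helper, and it assumes the line holds at most two fields separated by ~ with optional spaces.

```c
#include <string.h>

/* Split "first ~ second" in place, treating CR and LF as delimiters so
   that both \n and \r\n line endings are dropped from the tokens.
   Returns the number of tokens found (0, 1 or 2). */
int split_pair(char *line, char **first, char **second)
{
    static const char delims[] = " ~\r\n";

    *first = strtok(line, delims);
    *second = (*first != NULL) ? strtok(NULL, delims) : NULL;
    return (*first != NULL) + (*second != NULL);
}
```

For the input "andrewjohn ~ johnsmith\r\n" the second token is then "johnsmith" with length 9, without the trailing carriage return.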
I am in the early stages of coding a homework assignment. The larger goal is a little bit bigger and beyond the scope of this question. The immediate goal is to take one or more two digit numbers from the command line which correspond to years (e.g. 52). Then open the file that goes with that year. The files are formatted thusly:
1952 Topps baseball
-------------------
8 10 15 17 20 47 48 49 59 71 136
153 155 159 162 168 170 175 176 186 188 202
215 233 248 252 253 254 257 259 264 270 271 272 274
282 283 284 285 287 293 294 295 297 299 300 308 310 311
312
Each file has a random number (between 1 and 50) of 1-3 digit integers. I store the year in an int. Then I store each of the later numbers into an array. Then I will use that array to do other cool stuff. My problem is, how do I scan for a random number of integer inputs from the file? This is what I have done so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
char filename[30];
int cards[100];
FILE *fp;
int year,n,i;
for (i=1; i<argc; i++) {
n=atoi(argv[i]);
sprintf (filename,"topps.%d",n);
if (!(fp=fopen(filename,"r"))){
printf("cannot open %s for reading\n",filename);
exit(3);
}
fscanf (fp, "%d%*s%*s%*s%d%d%d%d%d%d%d%d%d%d%d%d",
&year,
&cards[i],
&cards[i+1],
&cards[i+2], //this is what needs to be improved upon
&cards[i+3],
&cards[i+4],
&cards[i+5],
&cards[i+6],
&cards[i+7],
&cards[i+8],
&cards[i+9],
&cards[i+10],
&cards[i+11],
&cards[i+12]);
printf ("%d\n",year);
printf ("%d\n",cards[i+11]);
}
}
The current fscanf is just a sort of stopgap to make sure I can read and print the info. It stores up to the 12th integer and prints it. For obvious reasons I didn't want to go to the 50th, because it's pointless. Some files only have 2 or 3 numbers in them. Can anyone help guide me to a more ideal solution for reading files like this? Thanks for having a look.
Something like this does the trick:
Declare 3 new variables at the top:
char sData[10000];
char * pch;
int j = 0;
Then replace your number reading code with the snippet below:
fscanf (fp, "%d%*s%*s%*s", &year);
/* the three %*s skip the two title words and the dashes;
   this fgets discards the rest of the dashes line (crude, but works) */
fgets(sData, 10000, fp);
/* read all the number data in, one line at a time */
j = 0;
while (j < 100 && fgets(sData, 10000, fp) != NULL)
{
pch = strtok (sData, " \r\n");
while (pch != NULL && j < 100)
{
cards[j++] = atoi(pch);
pch = strtok (NULL, " \r\n");
}
}
At the end of this code, cards[] should have all your numbers, and j should contain the count.
I greatly appreciate the help I got from everyone. It definitely led me down the right path. However, this is the answer to the problem that eventually worked for me:
fscanf(fp,"%*[^\n]%*c"); //Skip first two
fscanf(fp,"%*[^\n]%*c"); //lines of file
while (!feof(fp)) { //Read ints into array
fscanf(fp,"%d ",&cards[i++]);
}
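A slightly more defensive variant of the same idea is sketched below: it stops on fscanf's return value rather than on feof, and it bounds the array index. The read_cards name and its parameters are hypothetical, and the two header lines are assumed to be present as in the sample file.

```c
#include <stdio.h>

/* Skip the two header lines, then read whitespace-separated integers
   until the end of the file, a non-numeric token, or max values.
   Returns the number of integers stored in cards. */
int read_cards(FILE *fp, int cards[], int max)
{
    int ch;
    int skipped = 0;
    int n = 0;

    /* consume everything up to and including the second newline */
    while (skipped < 2 && (ch = fgetc(fp)) != EOF)
        if (ch == '\n')
            skipped++;

    /* %d skips leading whitespace, including the line breaks */
    while (n < max && fscanf(fp, "%d", &cards[n]) == 1)
        n++;
    return n;
}
```

The return value plays the role of the count, so the caller does not need to know in advance how many numbers a file holds.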