Line counting and aberrant results - C

I'm writing a utility to count the lines in a given file via the Unix command line. Normally this would be dead simple for me, but apparently I'm having a major off night. The goal of this program is to take in an unknown number of files from the command line, read them into a buffer, and check for the newline character. Sounds simple, right?
int size= 4096;
int main(int argc, char *argv[]){
    int fd, i, j, c, fileLines, totalLines;
    char *buf= (char *)malloc(size);                //read buffer
    for (i=2; i<argc; i++){                         //get first file
        fileLines=1;
        if ((fd=open(argv[i], O_RDONLY))!= -1){     //open, read, print file count, close
            while ((c= read(fd, buf, size))!= 0){
                for (j=0; j<size; j++){
                    if (buf[j] == '\n')
                        fileLines++;
                }
            }
        }
        printf("%s had %d lines of text\n", argv[i], fileLines);
        totalLines+= fileLines;
        close(fd);
    }
    printf("%d lines were counted overall\n", totalLines);
    return 0;
}
I have two problems. The first is that the first printf statement is never executed outside of the debugger. The second is that the totalLines printout should be roughly 175K, but the printed value is about 767 times larger.
I'm having trouble understanding this, because all the relevant variables are declared outside the scope in which they're modified, but that still doesn't explain why the first print statement and line-counter update are skipped outside the debugger, or the aberrant totalLines result.
Any help is appreciated.
ANSWER
Two changes were suggested.
The first was to change j<size to j<c. While this was not the fix that was required, it follows good coding practice.
The second was to change i=2 to i=1. The reason I had the original start index was the way I started the executable in the debugger. At the gdb command line, I entered run lc1 f1.txt to start the program, which resulted in the argument list having three entries; I didn't know that run f1.txt was perfectly suitable, since my professor introduced us to gdb using the first form.

You're not initializing totalLines. You increment it inside of your loop, but you don't set it to 0 when you first declare it.
Also, why do you start from i=2? argv[2] is the third element of argv, i.e. the second argument passed to your program. Is this what you intended, or did you want to start from the first argument to your program?
And as others have pointed out, you should have j < c instead of j < size.

Your loop is wrong. It should be j=0; j<c; j++. That's probably not directly responsible for the errors you're seeing but will definitely cause problems.
Did you try stepping through the code with a debugger?

Consider: ./program file.txt
argv[0] is "program"
argv[1] is "file.txt"
which means your for loop starts from the wrong index, and if you are passing only one file on the command line your code will never enter that loop! It should start at index 1:
for (i=1; i<argc; i++){
Do yourself a favor and initialize all variables when you declare them. It's the only way to ensure that there is no garbage in those memory locations.

First, excellent question. :) All the necessary code, well stated, and it's obvious you've done your work. :)
How are you starting your program when in the debugger? I think the argv[2] starting point might be related to not reaching the printf(), but it would depend upon how you're starting. More details below.
A few comments:
int size= 4096;
Typically, C preprocessor macros are used for this kind of magic number. I know your teachers probably said to never use the preprocessor, but idiomatic C would read:
#define SIZE 4096
for (i=2; i<argc; i++){ //get first file
Try i=1 -- argv[0] is the name of the program, argv[1] is going to be the first command line argument -- presumably if someone calls it via ./wc foo you want to count the number of lines in the file foo. :) (Also, you want the loop to terminate. :) Of course, if you're trying to write a replacement for wc -l, then your loop is alright, but not very helpful if someone screws up the arguments. That can safely be kept as a project for later. (If you're curious now, read the getopt(3) manpage. :)
if ((fd=open(argv[i], O_RDONLY))!= -1){
while ((c= read(fd, buf, size))!= 0){
for (j=0; j<size; j++){
You are ending the loop at j<size -- but you only read in c characters in the last block. You're reading left-over garbage on the last block. (I wouldn't be surprised if there are generated files in /proc/ that might return short reads out of convenience for kernel programmers.)
if (buf[j] == '\n')
fileLines++;
}
}
}
printf("%s had %d lines of text\n", argv[i], fileLines);
totalLines+= fileLines;
This is the first time you've assigned to totalLines. :) It is liable to have garbage initial value.
close(fd);
You should probably move the close(fd); call into the if((fd=open())) block; if the open failed, this will call close(-1);. Not a big deal, but if you were checking the close(2) error return (always good practice), it'd return a needless error.
}
Hope this helps!
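For reference, here's a minimal corrected sketch that pulls the suggested fixes together (start at argv[1], initialize the counters, loop only over the c bytes actually read, and close the descriptor only when the open succeeded). It counts '\n' characters the way wc -l does rather than starting fileLines at 1, and it is only an illustration, not the asker's exact program:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define SIZE 4096

int main(int argc, char *argv[])
{
    char *buf = malloc(SIZE);
    int totalLines = 0;          /* initialized, so the final total isn't garbage */
    int i;

    for (i = 1; i < argc; i++) { /* argv[0] is the program name; files start at argv[1] */
        int fd = open(argv[i], O_RDONLY);
        if (fd != -1) {
            int fileLines = 0;
            ssize_t c, j;
            while ((c = read(fd, buf, SIZE)) > 0) {
                for (j = 0; j < c; j++) {   /* scan only the bytes actually read */
                    if (buf[j] == '\n')
                        fileLines++;
                }
            }
            printf("%s had %d lines of text\n", argv[i], fileLines);
            totalLines += fileLines;
            close(fd);           /* close only a descriptor that was actually opened */
        }
    }
    printf("%d lines were counted overall\n", totalLines);
    free(buf);
    return 0;
}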

You're probably aware of wc, but I'll mention it just in case.
I know it doesn't directly help you debug your specific problem, but maybe you could glance at the source code and/or use it to verify that your program is working.

You have a logic error in your for() loop. You should loop over the number of bytes actually read, not the size of the read request; in other words, use c instead of size in the for() condition.

Related

C: Segmentation Fault when reading from a file

I recently started working on this project, and I'm having trouble reading certain things into a global variable. It's for practice with pthreads, which is why I'm using a global variable in the first place. The program is supposed to read in numbers from a file that represent a solved sudoku puzzle; the text file is formatted as nine number characters followed by a newline, nine times. I've made sure that, when running this program, the file is formatted as such. I know that this segment of my code contains the segmentation fault, but I can't tell where. I can only presume that it has something to do with fgets(). However, none of the resources I looked up have anything in them that would make me think I'm using it incorrectly. It even does this when I resort to fgetc, reading the file one character at a time, making accommodations for fgetc returning an int, unlike fgets, which assigns a string to a variable (in this case, s).
I wouldn't bring it to Stack Overflow unless I was sure that I couldn't find it; I've been combing over the code for an hour trying to find this seg fault, and it doesn't make any sense to me. I know that the seg fault is here because directly after this the program should print out the entire puzzle matrix, but it doesn't make it that far.
int main(int argc, char *argv[]) {
    FILE* puzzlefile;
    char s[10];
    int i=0, j=0, skip;
    //open the file passed in via command line
    puzzlefile = fopen(argv[1], "r");
    for (i=0; i<9; i++){
        //get first string of 10 characters
        fgets(s,10, puzzlefile);
        for (j=0; j<9; i++){
            //read the numbers from s into the puzzle 2D
            //array, which takes ints. Ignore the 10th
            //character, which will be \n
            puzzle[j][i] = (int)(s[j]-'0');
        }
    }
    ...
}
Your problem seems to be this:
for (j=0; j<9; i++)
^^^
This should be j++, not i++
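For reference, a minimal sketch of the corrected reading code, assuming the global int puzzle[9][9] described in the question. Besides the j++ fix, this sketch also enlarges the line buffer to 11 so each fgets() consumes the trailing newline, and checks that the file actually opened; those extras are illustrative additions, not part of the accepted fix:

#include <stdio.h>

int puzzle[9][9];    /* assumed global, as described in the question */

int main(int argc, char *argv[])
{
    FILE *puzzlefile;
    char s[11];      /* 9 digits + '\n' + terminating '\0' */
    int i, j;

    if (argc < 2 || (puzzlefile = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "could not open puzzle file\n");
        return 1;
    }
    for (i = 0; i < 9; i++) {
        if (fgets(s, sizeof s, puzzlefile) == NULL)   /* bail out on short files */
            break;
        for (j = 0; j < 9; j++)                       /* j++, not i++ */
            puzzle[j][i] = s[j] - '0';
    }
    fclose(puzzlefile);
    return 0;
}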

for loop brackets in C

I wrote a program to print integer values using a for loop. After printing, the program should wait one second, and then those integers are overwritten with space characters; in other words, the purpose of the program is to erase the integers after waiting for one second.
This is the program:
#include <stdio.h>
#include <time.h>

int main (void) {
    int i;
    for(i=1;i<=5;i++){
        printf("%d ",i);
    }
    for(;clock () < CLOCKS_PER_SEC;){} /*wait for one second*/
    printf("\r");                      /*move to the beginning of the line*/
    for(i=1;i<=5;i++){
        printf(" ");                   /*overwriting the integers*/
    }
    printf("\n");
    return 0;
}
The problem is in the wait-loop brackets `for(;clock () < CLOCKS_PER_SEC;){}`. When I remove those brackets the program works properly, but with the brackets it doesn't: the program still runs, but it overwrites the integers instead of showing them first.
Please, can someone explain what is happening?
When you remove the brackets, the printf("\r") statement becomes the body of the for loop, logically equivalent to this:
for(;clock () < CLOCKS_PER_SEC;) {printf("\r");}
So the integers get overwritten right away instead of after the end of the delay period.
Of course, the real question is why you are using a busy-loop for a delay rather than just calling sleep(1), which is much more efficient (i.e. it won't pin your CPU at 100% during the delay period).
You aren't flushing stdout (the stream that printf writes to) yourself so it doesn't happen until the '\r' is written, and then you immediately clear it.
If you remove the {}, then your loop is equivalent to
for(;clock () < CLOCKS_PER_SEC;)
printf("\r");
which writes a bunch of \r, the first of which flushes the output and the rest of which are redundant. After the loop completes, you clear the line, working as you want it to.
You should call fflush(stdout) after printing the numbers. Or you could move the printf("\r") so it comes before the wait loop (the difference being where the cursor ends up).
Your loop is problematic, as there's no guarantee that clock() starts at 0, and it won't on many systems, and you shouldn't spin like that ... it slows down other programs running on your system. You could just use sleep(1), although it's not very accurate.
I suspect somehow the output buffer is being flushed differently between the two cases. You could check this by manually flushing the buffer using fflush(stdout) before the problematic loop.
Also note that the {} aren't mandatory in C, for single-line statements within the loop.
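For illustration, here is a minimal sketch of the original program with those suggestions applied: fflush(stdout) after printing the numbers, and sleep(1) in place of the clock() busy-wait. This is a sketch of the intended behavior, not the asker's code:

#include <stdio.h>
#include <unistd.h>   /* for sleep() (POSIX) */

int main(void)
{
    int i;
    for (i = 1; i <= 5; i++)
        printf("%d ", i);
    fflush(stdout);   /* force the numbers out of the stdio buffer now */
    sleep(1);         /* wait one second without spinning the CPU */
    printf("\r");     /* return to the beginning of the line */
    for (i = 1; i <= 5; i++)
        printf("  "); /* overwrite each "N " with two spaces */
    printf("\n");
    return 0;
}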
Here is the code you might want:
#include <stdio.h>
#include <unistd.h>  /* needed for sleep() */

int main(int argc, char * argv[]) {
    int seconds = 10;
    while(seconds>0) {
        printf("%10d", --seconds);
        fflush(stdout);
        sleep(1);
        printf("\r");
    }
    printf("%10s\n", "time up!");
    return 0;
}
(Since you asked what fflush() actually is, here is a little explanation based on my understanding.)
It's all about I/O caching. One reason the cache exists is that reading/writing memory can be over 1000 times faster than reading/writing the hard disk.
So a program should try to reduce how often it touches the hard disk and use memory instead, while keeping a sensible trade-off between user experience and I/O delay.
For example:
When reading a file by lines, the library might read 2 KB or so at once instead of a single line, and then serve later reads from the memory cache.
When writing to the console, the program might choose to keep output in the memory cache until it meets a newline character, among other cases.
fflush(FILE *file) is a function from stdio.h; it flushes the cache of the specified FILE. In your case the file is stdout (standard output), which prints to your console. When you use printf() to print a single number, it might only be written to stdout's cache, so you don't see it on the console; calling fflush(stdout) flushes that cache out to your console.
In the first for loop you are printing values using printf. printf writes to stdout, which is buffered output, meaning the output will not appear until a '\n' is printed or the buffer is full. So you can either call fflush(stdout) after the first loop, or use fprintf(stderr, "") to print to standard error, which is not buffered.

What happens to command line arguments once printf() is used on them?

So I've written this small program, and I'm a newbie. It prints out the command line arguments I give it. I just don't understand why it worked before I changed the i variable to be initialized to one, yet when I changed it I get a segmentation fault.
The code:
#include <stdio.h>

int main ( int argc, char *argv[] )
{
    if ( argc > 1) {
        printf( "Filename: %s has %d arguments.", argv[0], argc );
    } else {
        printf ("No arguments found!");
        getchar();
        return 0;
    }
    int i = 1;
    printf( "The arguments are: \n" );
    for ( i < argc; ++i;) {
        printf( "Argument %d is: %s \n", i, argv[i] );
    }
    getchar();
    return 0;
}
I've never seen anything that says something happens to the command-line arguments once they have been used, but my hypothesis was that something happened to them after I used printf() on them. It worked the first time, when the counter variable i was initialized to zero. When I retooled the program to skip the zeroth argument by initializing i to one, it gave me that segmentation fault. I did this because I was a bit confused about what was happening: it wasn't printing out the filename a second time like I thought it would, yet I changed it so it wouldn't anyway (makes a lot of sense, huh? Not in retrospect, lol).
Your for loop is broken:
for(;i<argc;++i)
The first clause is the initialization and the second is the condition checked before each iteration. As you wrote it, the condition was ++i, which stays true even after the last argument.
The correct way to write the for loop is like so (for C89/C90):
int i;
printf( "The arguments are: \n" );
for ( i = 0; i < argc; i++) {
Many people were complaining about my lack of knowledge of the syntax. I do know how to write a for loop; I've written many in C++, and C is almost no different. I followed the compiler's warnings and adjusted, but did not really think about what I was doing or check whether anyone else had had the same problem.
Because the compiler I am using (GCC) does not completely support C99, I chose to stick with C90 as my compiling option. I am just learning C and I've found it a lot like C++ (fancy that), but I do not claim to know everything about C or its previous versions.
If you're working with C90, you should declare and initialize i first, before your if statement; your program works fine if you change the placement of int i = 1.
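For reference, a minimal corrected sketch of the argument-printing program, with the for loop written out in full and i declared at the top of the block for C90; this is an illustration, not the asker's final code:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;                         /* C90: declarations go at the top of the block */

    if (argc <= 1) {
        printf("No arguments found!\n");
        getchar();
        return 0;
    }

    printf("Filename: %s has %d arguments.\n", argv[0], argc);
    printf("The arguments are: \n");
    for (i = 1; i < argc; i++) {   /* initialize, test, then increment */
        printf("Argument %d is: %s \n", i, argv[i]);
    }

    getchar();
    return 0;
}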

C - Error while running .exe file that I compiled

I used Geany to compile my code and no errors were found.
But when I run the .exe file the program stops working. I'm not a good programmer; this is work for school.
My program reads two words, counts how many letters each one has, and then divides the number of letters in word A by the number of letters in word B.
This is my code
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i, j;
    float n;
    printf ("Insert first word:\n");
    for(i=0; argv[1][i] != '\0'; i++);
    printf ("Insert second word:\n");
    for(j=0; argv[2][j] != '\0'; j++);
    n=i/j;
    printf("%.4f", n);
    return 0;
}
In this line
n = i/j;
you are performing integer division. So, for example, let's say that i is 3 and j is 5, then you perform 3/5 which equals 0.
But I think you are looking to perform 3.0/5.0 and hoping for the answer 0.6. So you need to perform floating point division. You can force that by casting one of the operands to a float.
n = (float)i/j;
In the question you wrote Int rather than int. I assumed that was a transcription error when asking the question. But perhaps your real code looks like that. In which case, you'll need to change it to int to get it to compile.
The other possible problem you have is that the program expects arguments to be passed on the command line. Are you passing two arguments to your program? In other words you need to execute your program like this:
program.exe firstword secondword
If you are not passing arguments then you will encounter runtime errors when attempting to access non-existent arguments in argv[]. At the very least you should add a check to the program to ensure that argc==3.
If you want to read the input from stdin, rather than passing command line arguments, use scanf.
I think this is a conceptual error. Your program (probably) runs fine when called like this:
myapp word1 word2
But I think you expect it to work like this:
myapp
Insert first word:
> word1
Insert second word:
> word2
But that's not what argv is about. You should look into scanf.
Specifically, the error in the second case is because argv[1] is NULL, so argv[1][i] is a bad memory access.
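Putting the answers together, here is a minimal sketch (an illustration, not the original assignment code) that checks the argument count, counts the letters, and uses floating-point division; the prompts are dropped since the words come from the command line:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int i, j;
    float n;

    if (argc != 3) {                            /* make sure both words were passed */
        printf("Usage: program.exe firstword secondword\n");
        return 1;
    }

    for (i = 0; argv[1][i] != '\0'; i++) { }    /* count letters in the first word  */
    for (j = 0; argv[2][j] != '\0'; j++) { }    /* count letters in the second word */

    n = (float)i / j;                           /* cast forces floating-point division;
                                                   assumes the second word is non-empty */
    printf("%.4f\n", n);
    return 0;
}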

File Reading in C, Random Error

Thank you everybody so far for your input and advice!
Additionally:
After testing and toying further, it seems individual calls to FileReader succeed. But calling FileReader multiple times (these might be separate versions of FileReader) causes the issue to occur.
End Add
Hello,
I have a very unusual problem [please read this fully: it's important] (Code::Blocks compiler, Windows Vista Home) [no replicable code] with the C file-reading functions (fread, fgetc). Normally, the file-reading functions load the data correctly into a self-allocating and self-deallocating string (and it's not the string's issue), but this is where it gets bizarre (and where quantum physics fits in):
An error-catching statement reports that EOF occurred too early (i.e. inside the comment section at the start of the text file being loaded). Printing out the string [after it's loaded] shows that it is indeed too short (24 chars), even though it has enough space to fit the file [~400] and there are no allocation issues. The fgetc loop iterator reports that it terminates at just 24 (the file is roughly 300 chars long) with an EOF. This is where it goes whacky:
Temporarily checking Read->_base shows that the entire ~300 chars are loaded - no EOF at 24. Perplexed [given it's an fgetc loop], I added a printf to display each char [as a %d so I could spot the -1 EOF] at every step so I could see what it was doing, and modified it to read a single char at a time. It loops fine, reaching the ~300 mark instead of 24 - but freezes up randomly moments later. BUT, when I removed the printf, it terminated at 24 again and got caught by the error-catching statement.
Summary:
So, basically: I have a bug that is affected by the 'observer effect' from quantum physics: when I try to observe the chars I get from fgetc via printf, the problem (early EOF termination at 24) disappears, but when I stop viewing them, the error-catching statement reports early termination.
The more bizarre thing is, this isn't the first time it's occurred. fread had a similar problem; I was unable to figure out why, so I replaced it with the fgetc loop.
[Code can't really be supplied as the code base is 5 headers in size].
Snippet:
int X = 0;
int C = 0;
int I = 0;
while(Copy.Array[X] != EOF)
{
    //Copy.Array[X] = fgetc(Read);
    C = fgetc(Read);
    Copy.Array[X] = C;
    printf("%d %c\n",C,C); //Remove/add this as necessary
    if(C == EOF){break;}
    X++;
}
Side-Note: Breaking it down into the simplest format does not reproduce the error.
This is the oldest error in the book, kind of.
You can't use a variable of type char to read characters (!), since the EOF constant doesn't fit.
You need:
int C;
Also, the while condition looks scary: you are incrementing X in the loop and then checking the (new) position; is that element properly initialized? You don't show how Copy.Array is set up before the loop starts.
I would suggest removing that altogether, it's very strange code.
In fact, I don't understand why you loop reading single characters at all, why not just use fread() to read as much as you need?
Firstly, unwind's answer is a valid point although I'm not sure whether it explains the issues you are seeing.
Secondly,
printf("%d %c\n",C,C); //Remove/add this as necessary
might be a problem. The %d and %c format specifiers expect an int to be the parameter, you are only passing a char. Depending on your compiler, this might mean that they are too small.
This is what I think the problem is:
How are you allocating Copy.Array? Are you making sure all its elements are zeroed before you start? If you malloc it (malloc just leaves whatever garbage was in the memory it returns) and an element just happens to contain 0xFF, your loop will exit prematurely because your while condition tests Copy.Array[X] before you have placed a character in that location.
This is one of the few cases where I allow myself to put an assignment in a condition because the pattern
int c;
while ((c = fgetc(fileStream)) != EOF)
{
    doSomethingWithC(c);
}
is really common.
Edit
Just read your "Additionally" comment. I think it is highly likely you are overrunning your output buffer. I think you should change your code to something like:
int X = 0;
int C = 0;
int I = 0;
while (X < arraySize && (C = fgetc(Read)) != EOF)
{
    Copy.Array[X] = C;
    printf("%d %c\n", (int)C, (int)C);
    X++;
}
printf("\n");
Note that I am assuming that you have a variable called arraySize that is set to the number of characters you can write to the array without overrunning it. Note also, I am not writing the EOF to your array.
You probably have some heap corruption going on. Without seeing code it's impossible to say.
Not sure if this is your error but this code:
C = fgetc(Read);
Copy.Array[X] = C;
if(C == EOF){break;}
Means you are adding the EOF value into your array - I'm pretty sure you don't want to do that, especially as your array is presumably char and EOF is int, so you'll actually end up with some other value in there (which could mess up later loops etc).
Instead I suggest you change the order so C is only put in the array once you know it is not EOF:
C = fgetc(Read);
if(C == EOF){break;}
Copy.Array[X] = C;
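For illustration, here is a minimal self-contained sketch that combines these suggestions: a bounded index, an int for the fgetc() result, and testing for EOF before storing. The file name, variable names, and buffer size are placeholders, since the real code isn't shown:

#include <stdio.h>

#define ARRAY_SIZE 512   /* placeholder capacity */

int main(void)
{
    char array[ARRAY_SIZE];
    int x = 0;
    int c;                                   /* int, so it can hold EOF */
    FILE *fp = fopen("input.txt", "r");      /* placeholder file name */

    if (fp == NULL)
        return 1;

    while (x < ARRAY_SIZE - 1 && (c = fgetc(fp)) != EOF)
        array[x++] = (char)c;                /* store only real characters, never EOF */
    array[x] = '\0';                         /* terminate so it can be printed */

    printf("read %d characters:\n%s\n", x, array);
    fclose(fp);
    return 0;
}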
Whilst this isn't what I'd call a 'complete' answer (the bug remains), it does resolve the 'observer effect' element: I found that, for some reason, printf was somehow 'fixing' the code, and using std::cout seemed to (well, I can't say 'fix' the problem) prevent the observer effect from happening. That is to say, use std::cout instead of printf (as printf is the origin of the observer effect).
It seems to me that printf does something in memory at a lower level that partially corrects what does indeed seem to be a memory allocation error.
