How to read numbers from a text file properly? - c

I would like to write a lottery program in C, that reads the chosen numbers of former weeks into an array. I have got a text file in which there are 5 columns that are separated with tabulators. My questions would be the following:
What should I separate the columns with? (e.g. a comma, a semicolon, a tabulator or something else)
Should I include a kind of EOF in the last row? (e.g. -1, "EOF") Is there any accepted or "official" convention to do this?
Which function should I use for reading the numbers? Is there any proper or "accepted" way of reading data from text files?
I once wrote a C program for a "Who Wants to Be a Billionaire" game. In it I used a function that read each line into an array big enough to hold the whole line. I then split its data into variables like this:
line: "text1";"text2";"text3";"text4"endline (-> line loaded into a buffer array)
text1 -> answer1 (until reaching the semicolon)
text2 -> answer2 (until reaching the semicolon)
text3 -> answer3 (until reaching the semicolon)
text4 -> answer4 (until reaching the end of the line)
endline -> start over, that is read a new line and separate its contents into variables.
It worked properly, but I don't know if it was good enough for a programmer. (btw I'm not a programmer yet, I study Computer Science at a university)
All answers and advice are welcome. Thanks in advance for your kind help!

The scanf() family of functions doesn't care about newlines, so if you want to process lines, you need to read the lines first and then process them with sscanf(). The scanf() family also treats white space (blanks, tabs, newlines, etc.) interchangeably. Using tabs as separators is fine, but blanks will work too. Clearly, if you're reading and processing a line at a time, newlines won't really factor into the scanning.
int lottery[100][5];
int line;
char buffer[4096];
for (line = 0; fgets(buffer, sizeof(buffer), stdin) != 0 && line < 100; line++)
{
if (sscanf(buffer, "%d %d %d %d %d", &lottery[line][0], &lottery[line][1],
&lottery[line][2], &lottery[line][3], &lottery[line][4]) != 5)
{
fprintf(stderr, "Faulty line: [%s]\n", buffer);
break;
}
}
This stops on EOF, too many lines, and a faulty line (one which doesn't start with 5 numbers; you can check their values etc. in the loop if you want to, but what are the tests you need to run?). If you want to validate the white space separators, you have to work harder.
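Since the question is about a text file rather than standard input, the same loop can be wrapped in a function that takes a FILE *. The following is only a sketch: load_draws and its parameters are hypothetical names, not part of the original answer. It shows that no sentinel row is needed, because fgets() returning NULL already signals the end of the file.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max_rows lines of five integers from fp.
   Returns the number of rows stored, or -1 if a line is malformed. */
int load_draws(FILE *fp, int draws[][5], int max_rows)
{
    char buffer[4096];
    int row = 0;

    while (row < max_rows && fgets(buffer, sizeof buffer, fp) != NULL) {
        if (sscanf(buffer, "%d %d %d %d %d", &draws[row][0], &draws[row][1],
                   &draws[row][2], &draws[row][3], &draws[row][4]) != 5)
            return -1;              /* line does not start with 5 numbers */
        row++;
    }
    return row;                     /* stopped at end of file or at max_rows */
}
```

A caller would open the file with fopen(), check the result for NULL, pass the stream to load_draws(), and close it with fclose() afterwards.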
Maybe you want to test for nothing but spaces and newlines after the 5 numbers; that's a bit trickier (it can be done; look up the %n conversion specification in sscanf()).
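The %n idea can be sketched roughly as follows. The helper name parse_five is made up for illustration. The trailing " %n" in the format skips any remaining white space and then records how many characters were consumed, so the caller can check that nothing else follows the fifth number.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if `line` holds exactly five integers
   followed only by white space, 0 otherwise. */
int parse_five(const char *line, int out[5])
{
    int end = 0;

    /* %n stores the offset reached; it does not count as a conversion,
       so a fully successful scan still returns 5. */
    if (sscanf(line, "%d %d %d %d %d %n",
               &out[0], &out[1], &out[2], &out[3], &out[4], &end) != 5)
        return 0;

    return line[end] == '\0';   /* anything left over means trailing junk */
}
```

With this check, a line such as "1 2 3 4 5 x" is rejected even though it starts with five valid numbers.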

Related

How to find out how many words are in each line?

Say you have a text file filled with sentences. For example:
hey how are you
you good?
nice to meet you jeff
I'm writing a program to print things out depending on how many indexes are on each line but I cant wrap my head around how to find how many words on each line. How could I go about counting how many words are on each line?
for (int i=0; i < wordle->leng; i++) {
printf ("%s ", wordle->allwords[i]);
}
This is my print function for the program. leng is how many lines so it knows how many times to repeat.
Some of the lines have 5 words, some 3, and it isn't printing in the correct format. Also not all lines will end with punctuation.
The POSIX getline() function is very useful for that; it reads a line from a stream up to the end of the line. You can read the file line by line with it and then loop over each line, adding 1 to int word_count = 0; every time you read a character that is not whitespace and the character before it was whitespace (you need additional logic for the first word).
You can use fgets() if you don't have getline() available, but it doesn't expand the buffer to deal with extra long lines, unlike getline().
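The counting logic described above might look like this once a line is already in a buffer. It is a sketch only: count_words is a hypothetical name, and tracking an "inside a word" flag takes care of the first word without a special case.

```c
#include <ctype.h>

/* Hypothetical helper: counts whitespace-separated words in one line. */
int count_words(const char *line)
{
    int count = 0;
    int in_word = 0;            /* 1 while the scan is inside a word */

    for (; *line != '\0'; line++) {
        if (isspace((unsigned char)*line)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;        /* first character of a new word */
            count++;
        }
    }
    return count;
}
```

Calling it on each buffer returned by fgets() or getline() gives the per-line count; punctuation attached to a word, as in "good?", is counted as part of that word.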

how to scan line in c program not from file

How to scan total line from user input with c program?
I tried scanf("%99[^\n]",st), but it does not work when I scan something before this scan statement. It works if this is the first scan statement.
How to scan total line from user input with c program?
There are many ways to read a line of input, and your usage of the word scan suggests you're already focused on the scanf() function for the job. This is unfortunate, because, although you can (to some extent) achieve what you want with scanf(), it's definitely not the best tool for reading a line.
As already stated in the comments, your scanf() format string will stop at a newline, so the next scanf() will first find that newline and it can't match [^\n] (which means anything except newline). As a newline is just another whitespace character, adding a blank in front of your conversion will silently eat it up ;)
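To illustrate the leading blank, here is a small sketch. The function name read_number_then_line is hypothetical, and it takes a FILE * only so that it can be tried on any stream; pass stdin for user input.

```c
#include <stdio.h>

/* Hypothetical demo: reads an integer, then the text of the next line.
   Returns 1 on success, 0 otherwise. */
int read_number_then_line(FILE *in, int *n, char st[100])
{
    if (fscanf(in, "%d", n) != 1)
        return 0;

    /* The blank before %99[^\n] consumes the '\n' left behind by "%d".
       Without it, the scan set would fail on that newline immediately. */
    return fscanf(in, " %99[^\n]", st) == 1;
}
```

Note that the blank skips all white space, so leading spaces and empty lines are swallowed as well, which is one reason this approach is only a partial fix.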
But now for the better solution: Assuming you only want to use standard C functions, there's already one function for exactly the job of reading a line: fgets(). The following code snippet should explain its usage:
char line[1024];
char *str = fgets(line, 1024, stdin); // read from the standard input
if (!str)
{
// couldn't read input for some reason, handle error here
exit(1); // <- for example
}
// fgets includes the newline character that ends the line, but if the line
// is longer than 1022 characters, it will stop early here (it will never
// write more bytes than the second parameter you pass). Often you don't
// want that newline character, and the following line overwrites it with
// 0 (which is "end of string") **only** if it was there:
line[strcspn(line, "\n")] = 0;
Note that you might want to check for the newline character with strchr() instead, so you actually know whether you have the whole line or maybe your input buffer was too small. In the latter case, you might want to call fgets() again.
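A possible shape for the strchr() check is sketched below. The name read_line and its return convention are assumptions made for this example, and it discards the rest of an over-long line instead of calling fgets() again, which is only one of several reasonable policies.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if a complete line was read, 0 if the
   line was too long (the remainder is discarded), -1 on EOF or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    char *nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';             /* whole line fits: strip the newline */
        return 1;
    }

    /* No newline in the buffer: drain the rest of the line and remember
       whether anything had to be thrown away. */
    int c, discarded = 0;
    while ((c = getc(in)) != EOF && c != '\n')
        discarded = 1;

    return discarded ? 0 : 1;
}
```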
How to scan total line from user input with c program?
scanf("%99[^\n]",st) reads a line, almost.
With the C Standard Library a line is
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
scanf("%99[^\n]",st) fails to read the end of the line, the '\n'.
That is why on the 2nd call, the '\n' remains in stdin to be read and scanf("%99[^\n]",st) will not read it.
There are ways to use scanf("%99[^\n]",st);, or a variation of it as a step in reading user input, yet they suffer from 1) Not handling a blank line "\n" correctly 2) Missing rare input errors 3) Long line issues and other nuances.
The preferred portable solution is to use fgets(). Loop example:
#define LINE_MAX_LENGTH 200
char buf[LINE_MAX_LENGTH + 1 + 1]; // +1 for long lines detection, +1 for \0
while (fgets(buf, sizeof buf, stdin)) {
size_t eol = strcspn(buf, "\n"); // ** see the note after the code
buf[eol] = '\0'; // trim potential \n
if (eol >= LINE_MAX_LENGTH) {
// IMO, user input exceeding a sane generous threshold is a potential hack
fprintf(stderr, "Line too long\n");
// TBD : Handle excessive long line
}
// Use `buf[]`
}
Many platforms support getline() to read a line.
Shortcomings: it is not part of the C standard, and it allows a hacker to overwhelm system resources with insanely long lines.
In C, there is not a great solution. What is best depends on the various coding goals.
** I prefer size_t eol = strcspn(buf, "\n\r"); to read lines in a *nix environment that may end with "\r\n".
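As a small illustration of that note, the trimming step can be isolated in a helper. The name strip_eol is hypothetical; the point is only that a reject set of "\r\n" handles both "\n" and "\r\n" endings, as well as a last line with no terminator at all.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\r' or '\n'
   and returns the length of what remains. */
size_t strip_eol(char *s)
{
    size_t eol = strcspn(s, "\r\n");   /* index of first CR or LF, or strlen(s) */
    s[eol] = '\0';
    return eol;
}
```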
scanf() should never be used for user input. The best way to get input from the user is with fgets().
Read more: http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
char str[1024];
char *alline = fgets(str, 1024, stdin);
scanf("%[^'\n']s",alline);
I think the correct solution should be like this. It worked for me.
Hope it helps.

Word count debugging

In K&R, the following code is proposed to count words, lines and characters in input. Exercise 1.11 asks:
How would you test the word count program? What kinds of input are
most likely to uncover bugs if there are any?
The only answer I see to these questions is testing the code on some input that contains several lines, words and tabs.
Can you see any other way to test this code?
#include <stdio.h>
#define IN 1 /* inside a word */
#define OUT 0 /* outside a word */
/* count lines, words and characters in input */
main(){
int c, n1, nw, nc, state;
state = OUT;
n1 = nw = nc = 0;
while ((c = getchar()) != EOF){
++nc;
if (c == '\n')
++n1;
if (c == ' ' || c == '\n' || c == '\t')
state = OUT;
else if (state == OUT){
state = IN;
++ nw;
}
}
printf("%d %d %d\n",n1,nw,nc);
}
Test the program using all of the following types of inputs:
An empty file.
A file with only new lines and no words.
A file with very long words, all on one line.
A file with very long words, on many lines.
The program might produce invalid output, but should not crash if given special characters.
Test the program with "N" blank lines inserted at random locations throughout the document.
Test the program with "N" blank lines inserted at the beginning of the document.
Test the program with "N" blank lines inserted at the end of the document.
Test the program with both one character words and long words, including hyphenated words with these inputs:
A file with only one space separating each word.
A file with one space or "N" spaces separating each word.
A file with only one tab separating each word.
A file with one space or "N" tabs separating each word.
A file with only one space OR tab separating each word.
A file with one space or "N" spaces OR tabs separating each word.
Test the program with single quotes and double quotes, with and without spaces between the words and the quotes, and with nested levels of quotes.
Also:
Make sure the program doesn't count un-intended characters as a word or part of a word. For example, make sure a carriage return, which is a legal MS-DOS character is not counted as a word if it is included at the end of a line.
Create the largest possible file for which space was designated for this application, and make sure that the program does not crash, that other applications are NOT impacted, and that the output is correct.
Create the largest possible file for which space was designated for this application, containing only spaces, newlines and tabs, except for words at the end of the file, and make sure that the program does not crash, that other applications are NOT impacted, and that the output is correct.
Create the largest possible file for which space was designated for this application, containing only spaces, newlines and tabs, except for words at the beginning of the file, and make sure that the program does not crash, that other applications are NOT impacted, and that the output is correct.
Create the largest possible file for which space was designated for this application, containing only one very long word: the output of the program should be 1.
Have the program write a debugging file that contains a printf for each while, if, and else statement. Make sure that the tests cause all of the printf statements to be reached. In other words, there shouldn't be any parts of the code that remain unused at the end of the testing.
There should be a good reason the output doesn't match the output of the wc program.
The idea behind the question is to illustrate the concept of "white box" testing. Look at every "choice point" in your program, and see how you can exercise the logic behind it to uncover the "corner cases":
To exercise the while loop, feed it input that has no data (i.e. EOF comes right away)
Feed the program a file with a single line and no \n before EOF to exercise the line counting if
Feed the program a file with one or more lines composed entirely of whitespace characters
Feed the program a file with the last \n missing, and see if the last word gets counted
Feed the program a file with single-character words to exercise the logic of switching between IN and OUT
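One way to make these corner cases easy to check is to move the counting logic into a function that works on a string, so each case becomes a single call. This is a sketch under that assumption; count_text is a hypothetical name and is not part of the K&R program.

```c
#define IN  1   /* inside a word */
#define OUT 0   /* outside a word */

/* Same state machine as the K&R program, but reading from a string
   instead of getchar(), so that corner cases can be passed in directly. */
void count_text(const char *s, int *nl, int *nw, int *nc)
{
    int state = OUT;

    *nl = *nw = *nc = 0;
    for (; *s != '\0'; s++) {
        ++*nc;
        if (*s == '\n')
            ++*nl;
        if (*s == ' ' || *s == '\n' || *s == '\t')
            state = OUT;
        else if (state == OUT) {
            state = IN;
            ++*nw;
        }
    }
}
```

Inputs such as "" (immediate end of input), "word" (no final newline), and " \n\t\n" (white space only) then correspond directly to the cases listed above.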

Flushing stdin after every input - which approach is not buggy?

After Mark Lakata pointed out that the garbage isn't properly defined in my question I came up with this. I'll keep this updated to avoid confusions.
I am trying to get a function that I can call before a prompt for user input like printf("Enter your choice:"); followed by a scanf, and be sure that only the things entered after the prompt would be scanned in by scanf as valid input.
As far as I can understand the function that is needed is something that flushes standard input completely. That is what I want. So for the purpose of this function the "garbage" is everything in user input i.e. the whole user input before that user prompt.
While using scanf() in C there is always the problem of extra input lying in the input buffer. So I was looking for a function that I call after every scanf call to remedy this problem. I used this, this, this and this to get these answers
//First approach
scanf("%*[^\n]\n");
//2ndapproach
scanf("%*[^\n]%*c");
//3rd approach
int c;
while((c = getchar()) != EOF)
if (c == '\n')
break;
All three work as far as I could find by trial and error and by going through the references. But before using any of these in all of my code, I wanted to know whether any of them has bugs.
EDIT:
Thanks to Mark Lakata for one bug in 3rd. I corrected it in the question.
EDIT2:
After Jerry Coffin answered I tested the 1st 2 approaches using this program in code:blocks IDE 12.11 using GNU GCC Compiler(Version not stated in the compiler settings).
#include<stdio.h>
int main()
{
int x = 3; //Some arbitrary value
//1st one
scanf("%*[^\n]\n");
scanf("%d", &x);
printf("%d\n", x);
x = 3;
//2nd one
scanf("%*[^\n]%*c");
scanf("%d", &x);
printf("%d", x);
}
I used the following 2 inputs
First Test Input (2 Newlines but no spaces in the middle of garbage input)
abhabdjasxd
23
bbhvdahdbkajdnalkalkd
46
For the first I got the following output by the printf statements
23
46
i.e. both codes worked properly.
Second Test input: (2 Newlines with spaces in the middle of garbage input)
hahasjbas asasadlk
23
manbdjas sadjadja a
46
For the second I got the following output by the printf statements
23
3
Hence I found that the second one won't be taking care of extra garbage input whitespaces. Hence, it isn't foolproof against garbage input.
I decided to try out a 3rd test case (garbage includes newline before and after the non-whitespace character)
``
hahasjbas asasadlk
23
manbdjas sadjadja a
46
The answer was
3
3
i.e. both failed in this test case.
The first two are subtly different: they both read and ignore all the characters up to a new-line. Then the first skips all consecutive white space so after it executes, the next character you read will be non-whitespace.
The second reads and ignores characters until it encounters a new-line then reads (and discards) exactly one more character.
The difference will show up if you have (for example) double-spaced text, like:
line 1
line 2
Let's assume you read to somewhere in the middle of line 1. If you then execute the first one, the next character you read in will be the 'l' on line 2. If you execute the second, the next character you read in will be the new-line between line 1 and line 2.
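The difference can be observed by applying each format to a stream and then looking at the next character. The sketch below uses hypothetical wrappers, skip_first and skip_second, around the two format strings from the question. It also shows a case neither format handles: if the very next character is already a newline, %*[^\n] matches nothing, the scan stops there, and the newline is left unread.

```c
#include <stdio.h>

/* Hypothetical wrappers around the two formats from the question. */
int skip_first(FILE *in)
{
    /* Skip to the newline, then skip ALL following white space. */
    return fscanf(in, "%*[^\n]\n");
}

int skip_second(FILE *in)
{
    /* Skip to the newline, then discard exactly one character. */
    return fscanf(in, "%*[^\n]%*c");
}
```

On the input "line 1\n\nline 2\n", the character read after skip_first() is the 'l' of "line 2", while after skip_second() it is the newline of the empty line. On an input that starts with '\n', both calls consume nothing.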
As for the third, if I were going to do this at all, I'd do something like:
int ch;
while ((ch=getchar()) != EOF && ch != '\n')
;
...and yes, this does work correctly -- && forces a sequence point, so its left operand is evaluated first. Then there's a sequence point. Then, if and only if the left operand evaluated to true, it evaluates its right operand.
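Wrapped in a function, that loop could look like the sketch below. The name discard_line and the choice to return the last character read are assumptions made for illustration; returning it lets the caller tell a normal end of line from end of input.

```c
#include <stdio.h>

/* Hypothetical helper: discards the rest of the current line from `in`.
   Returns the last character read, which is '\n' or EOF. */
int discard_line(FILE *in)
{
    int ch;

    while ((ch = getc(in)) != EOF && ch != '\n')
        ;                       /* nothing to do: just consume */
    return ch;
}
```

Calling discard_line(stdin) after each scanf() removes whatever the user typed after the value on the same line, including the newline itself.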
As for performance differences: since you're dealing with I/O to start with, there's little reasonable question that all of these will always be I/O bound. Despite its apparent complexity, scanf (and company) are usually code that's been used and carefully optimized over years of use. In this case, the hand-rolled loop may be quite a bit slower (e.g., if the code for getchar doesn't get expanded inline) or it may be about the same speed. The only way it stands any chance of being significantly faster is if the person who wrote your standard library was incompetent.
As for maintainability: IMO, anybody who claims to know C should know the scan set conversion for scanf. This is neither new nor rocket science. Anybody who doesn't know it really isn't a competent C programmer.
The first 2 examples use a feature of scanf that I didn't even know existed, and I'm sure a lot of other people didn't know. Being able to support a feature in the future is important. Even if it was a well known feature, it will be less efficient and harder to read the format string than your 3rd example.
The third example looks fine.
(edit history: I made a mistake saying that ANSI-C did not guarantee left-to-right evaluation of && and proposed a change. However, ANSI-C does guarantee left-to-right evaluation of &&. I'm not sure about K&R C, but I can't find any reference to it and no one uses it anyways...)
Many other solutions have the problem that they cause the program to hang and wait for input when there is nothing left to flush. Waiting for EOF is wrong because you don't get that until the user closes the input completely!
On Linux, the following will do a non-blocking flush:
// flush any data from the internal buffers
fflush (stdin);
// read any data from the kernel buffers
char buffer[100];
while (-1 != recv (0, buffer, 100, MSG_DONTWAIT))
{
}
The Linux man page says that fflush on stdin is non-standard, but "Most other implementations behave the same as Linux."
The MSG_DONTWAIT flag is also non-standard (it causes recv to return immediately if there is no data to be delivered).
You should use getline/getchar:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    ssize_t bytes_read;   /* getline() returns ssize_t */
    size_t nbytes = 100;  /* getline() expects a size_t * */
    char *my_string;
    puts ("Please enter a line of text.");
    /* These 2 lines are the heart of the program. */
    my_string = (char *) malloc (nbytes + 1);
    bytes_read = getline (&my_string, &nbytes, stdin);
    if (bytes_read == -1)
    {
        puts ("ERROR!");
    }
    else
    {
        puts ("You typed:");
        puts (my_string);
    }
    free (my_string);     /* getline() may have reallocated the buffer */
    return 0;
}
If you look carefully at the right-hand side of this page, I think you will see many questions similar to yours. You can use fflush() on Windows.

String delimiter using C

I've started learning how to use strings, but I'm a little bit confused about the whole concept. I'm trying to read word by word from a file that contains strings.
Here is the file:
Row, row, row your boat,
Gently down the stream.
Merrily, merrily, merrily, merrily,
Life is but a dream.
My approach was to use
char hold[25];
// Statement
while(fscanf(fpRow, "%s", hold) != EOF)
printf("%s %d\n", hold, strlen(hold));
So my task is to read each string and exclude all the , and . in the file. To do so the approach would be to use %[^,.] instead of %s correct? But when I tried this approach my string only wants to read the first word of the file and the loop never exits. Can someone explain to me what I'm doing wrong? Plus, if it's not too much to ask for what's the significance between fscanf and fgets? Thanks
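int rc;
while ((rc = fscanf(fpRow, "%24[^,.\n ]", hold)) != EOF)
{
    if (rc == 1) /* print only when a word was actually read */
        printf("%s %zu\n", hold, strlen(hold));
    fscanf(fpRow, "%*c"); /* discard the delimiter that stopped the scan */
}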
Yes, %[^,. ] should work -- but keep in mind that when you do that, it will stop reading when it encounters any of those characters. You then need to read that character from the input buffer, before trying to read another word.
Also note that when you use either %s or %[...], you want to specify the length of the buffer, or you end up with something essentially like gets, where the wrong input from the user can/will cause buffer overflow.
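On the fscanf versus fgets part of the question: fgets() reads a whole line into a buffer and leaves the splitting to you, whereas fscanf() parses directly from the stream. A line-based variant of the same task is sketched below using strtok(), which is a different technique from the scan set shown above. The name print_words is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each word of `line` with its length to `out`
   and returns the number of words. strtok() modifies `line` in place. */
int print_words(char *line, FILE *out)
{
    const char *delims = " ,.\n";
    int count = 0;

    for (char *word = strtok(line, delims); word != NULL;
         word = strtok(NULL, delims)) {
        fprintf(out, "%s %zu\n", word, strlen(word));
        count++;
    }
    return count;
}
```

A caller would read each line with fgets() into a writable buffer and pass that buffer to print_words(); consecutive delimiters such as ", " are treated as a single separator by strtok().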
