Scanning unknown set of characters - c

What is the correct way to scan unknown number of set of characters? let me explain , i want to scan with form "char:integer" e.g a:5
What is the right way
while(scanf(" %c%c%d",&a,&b,&c) != EOF){...}
or
while(scanf(" %c%c%d",&a,&b,&c) ==3){...}
the second pattern is tricky bcs sometimes it works and sometimes it does not.
also why does in first pattern using windows require ctr+Z (eof) to press twice?

the second pattern is tricky bcs sometimes it works and sometimes it does not.
That's because
while(scanf(" %c%c%d",&a,&b,&c) == 3){...}
will run only when all the three variables got assigned properly without any early matching failure or end of file.
Whereas
while(scanf(" %c%c%d",&a,&b,&c) != EOF){...}
will run even if few (<3) variables got assigned unless there's an end of file. That means this definition will work even in the case of matching failure which is undesirable.
So use while(scanf(" %c%c%d",&a,&b,&c) == 3){...} this definition.

Maybe you should just use while(scanf("%c:%d",&a,&b) ==2){...}.
Because this will stop the loop when a matching failure or EOF is met, both of which implies there happenes something unexpected, so you'd better jump out of the loop immediately. And I don't think the value of ':' is useful.

Related

Extracting the domain extension of a URL stored in a string using scanf()

I am writing a code that takes a URL address as a string literal as input, then runs the domain extension of the URL through an array and returns the index if finds a match, -1 if does not.
For example, an input would be www.stackoverflow.com, in this case, I'd need to extract only the com part. In case of www.google.com.tr, I'd need only com again, ignoring the .tr part.
I can think of basically writing a function that'll do that just fine but I'm wondering if it is possible to do it using scanf() itself?
It's really an overhead to use scanf here. But you can do this to realize something similar
char a[MAXLEN],b[MAXLEN],c[MAXLEN];
scanf("%[^.].%[^.].%[^. \n]",a,b,c);
printf("Desired part is = %s\n",c);
To be sure that formatting is correct you can check whether this scanf call is successful or not. For example:
if( 3 != scanf("%[^.].%[^.].%[^. \n]",a,b,c)){
fprintf(stderr,"Format must be atleast sth.something.sth\n");
exit(EXIT_FAILURE);
}
What is the other way of achieving this same thing. Use fgets to read the whole line and then parse with strtok with delimiters ".". This way you will get parts of it. With fgets you can easily support different kind of rules. Instead of incorporating it in scanf (which will be a bit difficult in error case), you can use fgets,strtok to do the same.
With the solution provided above only the first three parts of the url is being considered. Rest are not parsed. But this is hardly the practical situation. Most the time we have to process the whole information, all the parts of the url (and we don't know how many parts can be there). Then you would be better using fgets/strtok as mentioned above.

Solve ÿ end of file in C

When I copy the content of a file to another in C at the end of the output file I have this character ÿ. I understand thanks to this forum that it is the EOF indicator but I don't understand what to do in order to get rid of it in the output file.
This is my code:
second_file = fopen(argv[2], "w+");
while (curr_char != EOF)
{
curr_char = fgetc(original_file);
fputc(curr_char, second_file);
}
printf("Your file has been successfully copy\n");
fclose(second_file);
fclose(original_file);
For each character you read, you have two things to do:
Check to see if it's EOF.
If not, write it to the output.
Your problem is you're doing these two things in the wrong order.
There are potentially several different ways of solving this. Which one you pick depends on how much you care about your program looking good, as opposed to merely working.
One. Starting with the code you wrote, we could change it to:
while (curr_char != EOF)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
Here, we explicitly test to see if the character is EOF, immediately after reading it, before writing it. If it's EOF, we break out of the loop early. This will work, but it's ugly: there are two different places where we test for EOF, and one of them never "fires". (Also, as a commentator reminded me, there's the problem that the first time through the loop, we're testing curr_char before we've ever set it.)
Two. You could rearrange it like this:
curr_char = getc(original_file);
while (curr_char != EOF)
{
putc(curr_char, second_file);
curr_char = getc(original_file);
}
Here, we read an initial character, and as long as it's not EOF, we write it and read another. This will work just fine, but but it's still a little bit ugly, because this time there are two different places where we read the character.
Three. You could rearrange it like this:
while ((curr_char = getc(original_file)) != EOF)
{
putc(curr_char, second_file);
}
This is the conventional way of writing a character-copying loop in C. The call to getc and the assignment to curr_char are buried inside of the controlling expression of the while loop. It depends on the fact that in C, an assignment expression has a value just like any other expression. That is, the value of the expression a = b is whatever value we just assigned to a (that is, b's value). So the value of the expression curr_char = getc(original_file) is the character we just read. So when we say while ((curr_char = getc(original_file)) != EOF), what we're actually saying is, "Call getc, assign the result to curr_char, and if it's not equal to EOF, take another trip around the loop."
(If you're still having trouble seeing this, I've written other explanations in these notes and this writeup.)
This code is both good and bad. It's good because we've got exactly one place we read characters, one place we test characters, and one place we write characters. But it's a little bit bad because, let's admit it, it's somewhat cryptic at first. It's hard to think about that assignment-buried-inside-the-while-condition. It's code like this that gives C a reputation as being full of obscure gobbledegook.
But, at least in this case, it really is worth learning the idiom, and becoming comfortable with it, because the reductions to just one read and one test and one write really are virtues. It doesn't matter so much in a trivial case like this, but in real programs which are complicated for other reasons, if there's some key piece of functionality that happens in two different places, it's extremely easy to overlook this fact, and to make a change to one of them but forget to make it to the other.
(In fact, this happened to me just last week at work. I was trying to fix a bug in somebody else's code. I finally figured out that when the code did X, it was inadvertently clearing Y. I found the place where it did X, and I added some new code to properly recreate Y. But when I tested my fix, it didn't work! It turned out there were two separate places where the code did X, and I had found and fixed the wrong one.)
Finally, here's an equivalently minimal but unconventional way of writing the loop:
while (1)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
This is kind of like number 1, but it gets rid of the redundant condition in the while loop, and replaces it with the constant 1, which is "true" in C. This will work just fine, too, and it shares the virtue of having one read, one test, and one write. It actually ends up doing exactly the same operations and in exactly the same order as number 3, but by being laid out linearly it may be easier to follow.
The only problem with number 4 is that it's an unconventional, "break in the middle" loop. Personally, I don't have a problem with break-in-the-middle loops, and I find they come up from time to time, but if I wrote one and someone said "Steve, that's ugly, it's not an idiom anyone recognizes, it will confuse people", I'd have to agree.
P.S. I have replaced your calls to fgetc and fputc with the more conventional getc and putc. I'm not sure who told you to use fgetc and fputc, and there are obscure circumstances where you need them, but they're so rare that in my opinion one might as well forget that the "f" variants exist, and always use getc and putc.

scanf skip all until a string

Is it possible using scanf to skip all characters until I reach s specific string.
I have an html file and I want to skip all characters before and including this string: "<h2><a href=" and then read http link between two quotes.
What an old question I've stumbled upon. Nevertheless, it's still here and I think I've got a good answer. So, why not leave for the following generations, right?
You've been told scanf can't do it. Well, I disagree, and here is why:
scanf can ignore everything until it finds the first letter of the sought string
scanf ("%*[^<]");
Then it can try to ignore the string you are looking for (char by char).
found = scanf ("<h2><a href=\"%[^\"]", str_link) == 1;
It will fail in case it is not it yet and will stop executing, never getting to the %[^\"] command, which reads/stores everything until " character is found.
In such a case, it returns 0, or EOF, for not being able to execute the scan (it returns how many of the variable it was able to fill)
Now, if it does find, it will finally execute the reading and returns 1.
note: you should check the documentation for precise information, which can be found at cplusplus.com
while ( !found && !feof(stdin) )
{
scanf ("%*[^<]");
found = scanf ("<h2><a href=\"%[^\"]", str_link) == 1;
}
I suppose the rest of the file could just be ignored.
Up to you.
This is a good method, I suppose, because it takes full advantage of scanf's speed, and it doesn't require you to store the whole file.
The idea can be applied to many other tasks.
scanf is a very powerful tool, though a bit tricky.
You can always search for string href=" then set a pointer there. then copy or scan the sting until you encounter a " again.
while(*p!='"'){
copy to abuffer; }

Can we have while loop test two arguments at the same time without &&/||

I was checking Beej's guide to IPC and one line of code took my attention.
In the particular page, the while loop in speak.c has two conditions to check while (gets(s), !feof(stdin)).
So my question is how is this possible as I have seen while look testing only one condition most of the time.
PS: I am little new to these. Will be grateful for any help. Thanks!
The snippet
while (gets(s), !feof(stdin))
uses the comma operator, first it executes gets(s), then it tests !feof(stdin), which is the result of the condition.
By the way don't use gets, it's extremely unsafe. Be wary of sources using it, they probably aren't good sources for learning the language.
The code
while(gets(s), !feof(stdin)) {
/* loop body */
}
is equivalent to
gets(s);
while(!feof(stdin)) {
/* loop body */
gets(s);
}
just more concise as it avoids the repetition of gets before the loop and in the loop body.
A couple of people have already pointed out some of the problems with this. I certainly agree that using gets (at all) is a lousy idea.
I think it's worth mentioning one other detail though: since this uses feof(file) as the condition for exiting the loop, it can/will also misbehave if you encounter an error before the end of the file. When an error occurs, the error flag will be set but (normally) the EOF flag won't be -- and since you can't read from the file any more (due to the error) it never will be either, so this will go into an infinite loop.
The right way to do the job is with fgets, and check its return value:
while (fgets(s, length_of_s, stdin))
process(s);
This tests for fgets succeeding at reading from the file, so it'll exit the loop for either end of file or an error.
One other minor detail: when fgets reads a string, it normally retains the new-line at the end of the line (where gets throws it away). You'll probably have to add a little more code to strip it off is it's present (and possibly deal with a line longer than the buffer you allocated if the newline isn't present).
This test is using the comma operator and has been used as a way of getting the next line of text using gets(s) and testing for end-of-file using !feof(stdin).
This syntax doesn't evaluate two expression. It executes first the gets(s) and then evaluates !feof(stdin) which may be modified by the gets() function call.
It's not a very good way to do it since it both use gets(), which is not a safe function and it's quite uneasy to read for a beginner (hence your question).

Using scanf in a while loop

Probably an extremely simple answer to this extremely simple question:
I'm reading "C Primer Plus" by Pratta and he keeps using the example
while (scanf("%d", &num) == 1)...
Is the == 1 really necessary? It seems like one could just write:
while (scanf("%d", &num))
It seems like the equality test is unnecessary since scanf returns the number of objects read and 1 would make the while loop true. Is the reason to make sure that the number of elements read is exactly 1 or is this totally superfluous?
In C, 0 is evaluated to false and everything else to true. Thus, if scanf returned EOF, which is a negative value, the loop would evaluate to true, which is not what you'd want.
Since scanf returns the value EOF (which is -1) on end of file, the loop as written is correct. It runs as long as the input contains text that matches %d, and stops either at the first non-match or end of file.
It would have been clearer at a glance if scanf were expecting more than one input....
while (scanf("%d %d", &x, &y)==2) { ... }
would exit the loop when the first time it was unable to match two values, either due to end of file end of file (scanf returns EOF (which is -1)) or on input matching error (e.g. the input xyzzy 42 does not match %d %d so scanf stops on the first failure and returns 0 without writing to either x or y) when it returns some value less than 2.
Of course, scanf is not your friend when parsing real input from normal humans. There are many pitfalls in its handling of error cases.
Edit: Corrected an error: scanf returns EOF on end of file, or a non-negative integer counting the number of variables it successfully set.
The key point is that since any non-zero value is TRUE in C, failing to test the return value correctly in a loop like this can easily lead to unexpected behavior. In particular, while(scanf(...)) is an infinite loop unless it encounters input text that cannot be converted according to its format.
And I cannot emphasize strongly enough that scanf is not your friend. A combination of fgets and sscanf might be enough for some simple parsing, but even then it is easily overwhelmed by edge cases and errors.
You understood the C code correctly.
Sometimes the reason for testing the number of items read is that someone wants to make sure that all items were read instead of scanf quitting early when it the input didn't match the expected type. In this particular case it didn't matter.
Usually scanf is a poor choice of functions because it doesn't meet the needs of interactive input from a human user. Usually a combination of fgets and sscanf yield better results. In this particular case it didn't matter.
If later chapters explain why some kinds of coding practices are better than this trivial example, good. But if not, you should dump the book you're reading.
On the other hand, your substitute code isn't exactly a substitute. If scanf returns -1 then your while loop will execute.
While you are correct it is not strictly necessary, some people prefer it for several reasons.
First, by comparing to 1 it becomes an explicit boolean value (true or false). Without the comparison, you are testing on an integer, which is valid in C, but not in later languages (like C#).
Secondly, some people would read the second version in terms of while([function]), instead of while([return value]), and be momentarily confused by testing a function, when what is clearly meant is testing the return value.
This can be completely a matter of personal preference, and as far as I'm concerned, both are valid.
One probably could write it without an explicit comparison (see the JRL's answer though), but why would one? I'd say that comparison-less conditions should only be used with values that have explicitly boolean semantics (like an isdigit() call, for example). Everything else should use an explicit comparison. In this case (the result of scanf) the semantics is pronouncedly non-boolean, so the explicit comparison is in order.
Also, the comparison one can usually omit is normally a comparison with zero. When you feel the urge to omit the comparison with something else (like 1 in this case) it is better to think twice and make sure you know what your are doing (see the JRL's answer again).
In any case, when the comparison can be safely omitted and you actually omit it, the actual semantical meaning of the condition remains the same. It has absolutely no impact on the efficiency of the resultant code, if that's something you are worrying about.

Resources