scanf skip all until a string - c

Is it possible, using scanf, to skip all characters until I reach a specific string?
I have an HTML file and I want to skip all characters before and including this string: "<h2><a href=" and then read the HTTP link between the two quotes.

What an old question I've stumbled upon. Nevertheless, it's still here and I think I've got a good answer. So, why not leave for the following generations, right?
You've been told scanf can't do it. Well, I disagree, and here is why:
scanf can ignore everything until it finds the first character of the sought string:
scanf ("%*[^<]");
Then it can try to match, and discard, the string you are looking for (character by character):
found = scanf ("<h2><a href=\"%[^\"]", str_link) == 1;
If the input does not match yet, the call fails and stops scanning, never reaching the %[^\"] conversion, which reads and stores everything up to the next " character.
In that case it returns 0, or EOF if the input ran out (scanf returns the number of variables it was able to fill).
If it does find the string, it finally performs the read and returns 1.
note: you should check the documentation for precise information, which can be found at cplusplus.com
while ( !found && !feof(stdin) )
{
scanf ("%*[^<]");
found = scanf ("<h2><a href=\"%[^\"]", str_link) == 1;
}
I suppose the rest of the file could just be ignored.
Up to you.
This is a good method, I suppose, because it takes full advantage of scanf's speed, and it doesn't require you to store the whole file.
The idea can be applied to many other tasks.
scanf is a very powerful tool, though a bit tricky.
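For anyone who wants to try the idea, here is a minimal sketch of the same loop wrapped in a function. The name find_first_link, the FILE * parameter and the 256-byte buffer are my own assumptions, not part of the answer above. It differs from the loop shown earlier in two ways: it stops on the return value of fscanf instead of feof, and it puts a width on %[ so that a long link cannot overflow the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: scans `in` for the first <h2><a href="..."> and
 * copies the quoted link into `link`. Returns 1 if found, 0 otherwise. */
int find_first_link(FILE *in, char link[256])
{
    int rc;

    for (;;) {
        /* Discard everything up to (not including) the next '<'.
         * A matching failure here just means the next char already is '<'. */
        rc = fscanf(in, "%*[^<]");
        if (rc == EOF)
            return 0;               /* end of input or read error */

        /* Match the literal prefix, then store at most 255 link characters. */
        rc = fscanf(in, "<h2><a href=\"%255[^\"]", link);
        if (rc == 1)
            return 1;               /* link stored */
        if (rc == EOF)
            return 0;               /* input ran out while matching */
        /* rc == 0: not the tag we want; at least the '<' was consumed,
         * so the next iteration makes progress. */
    }
}
```

Calling it with stdin, as in find_first_link(stdin, link), should behave like the loop in the answer for well-formed input.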

You can always search for the string href=" and set a pointer just past it, then copy or scan the string until you encounter a " again.
while (*p != '\0' && *p != '"') {
    *dst++ = *p++;  /* copy into the output buffer that dst points into */
}
*dst = '\0';
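A minimal sketch of that pointer approach, assuming the HTML is already in memory as a NUL-terminated string. The function name extract_href and its parameters are made up for illustration; strstr and strchr do the searching.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between href=" and the next '"'
 * from `html` into `out` (capacity `cap`). Returns 1 on success, 0 if
 * no complete link is found or it does not fit in `out`. */
int extract_href(const char *html, char *out, size_t cap)
{
    static const char marker[] = "href=\"";
    const char *start = strstr(html, marker);
    const char *end;
    size_t len;

    if (start == NULL)
        return 0;
    start += sizeof marker - 1;      /* step past the marker itself */
    end = strchr(start, '"');        /* closing quote of the link */
    if (end == NULL)
        return 0;
    len = (size_t)(end - start);
    if (len >= cap)
        return 0;                    /* would not fit, refuse to truncate */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Unlike the scanf version, this needs the whole text (or at least the relevant line) in a buffer first.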

Related

why does this part of the program skip the second input?

On entering the first inputs this part of my program terminates.
float p_x,p_y,q_x,q_y,r_x,r_y;
printf("Note :- coordinates are entered as follows (x,y) \n");
printf("Enter the coordinates of 'p' :");
scanf("(%f,%f)",&p_x,&p_y);
printf("Enter the coordinates of 'q' :");
scanf("(%f,%f)",&q_x,&q_y);
Short answer: Change the second scanf call to
scanf(" (%f,%f)", &q_x, &q_y);
Note the extra space in the format statement.
Longer explanation: One of scanf's quirks is that it only reads what it needs, and leaves everything else sitting on the input stream for next time. So when you read p_x and p_y, scanf reads the (, the value for p_x, the ,, the value for p_y, and the ), but it doesn't do anything with the invisible final \n on the line, the "newline" character that's there as a result of the Enter key you typed.
So, later, when you try to read q_x and q_y, the first thing scanf wants to see is the ( you said should be there, but the first thing scanf actually sees is that \n character left over from last time. So the second scanf call fails, reading nothing.
In a scanf format statement, a space character means "read and ignore whitespace". So if you change the second format string to " (%f,%f)", scanf will read and ignore the stray newline, and then it will be able to read the input you asked for.
The reason you don't have this problem all the time is that most (though not all) of the scanf format characters automatically skip leading whitespace as a part of doing their work. If you had, say, a simpler call to
scanf("%f", &q_x);
followed by a later call to
scanf("%f", &q_y);
it would work just fine, without any extra explicit space characters in the format strings.
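The effect is easy to reproduce without a keyboard. The sketch below uses sscanf, which follows the same matching rules as scanf, on a string that contains the newline a user would have typed between the two points. The function name parse_two_points and its flag parameter are hypothetical.

```c
#include <stdio.h>

/* Hypothetical demo: parses two points from one string.
 * Returns the number of floats converted (4 means both points were read). */
int parse_two_points(const char *input, int skip_whitespace, float v[4])
{
    if (skip_whitespace) {
        /* The space before the second '(' consumes the stray newline. */
        return sscanf(input, "(%f,%f) (%f,%f)", &v[0], &v[1], &v[2], &v[3]);
    }
    /* No space: the '(' is compared against '\n' and the match stops. */
    return sscanf(input, "(%f,%f)(%f,%f)", &v[0], &v[1], &v[2], &v[3]);
}
```

With the input "(1,2)\n(3,4)", the version without the space converts only the first point, while the version with the space converts all four values.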
General advice: You may be thinking that this is a pretty lame situation. You may have gotten the impression that scanf was supposed to be a nice, simple way to do input in your C programs, and you may be thinking, "But this is not simple! How was I supposed to know to write " (%f,%f)"?" And I would absolutely agree with you: This is a pretty lame situation!
The fact is that scanf only seems to be a nice, simple way to do input. It's actually a terribly complicated, unforgiving, nearly useless mess. It's only simple if you're reading very simple input, such as single numbers or simple strings (not containing spaces), or maybe single characters. For anything more complicated than that, there are so many quirks and foibles and exceptions and special cases that it almost never works the first time, and it's often more trouble than it's worth to even try to get it working.
So the general advice is to only try to use scanf for very simple input, during the first few weeks of your C programming career. You can read what I mean by "very simple input" in this answer to a previous question. Then, once you've gotten a few skills under your belt, or when you need to do something a little more complicated, it's time to learn how to do input using better techniques than scanf.
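One common "better technique" is to read a whole line with fgets and then parse that line with sscanf, so that a bad line never leaves debris on the input stream. Here is a rough sketch of that idea for the (x,y) input in the question. The name read_point and the 128-byte line buffer are assumptions of mine.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from `in` and parses it as "(x,y)".
 * Returns 1 on success, 0 on a malformed line, EOF at end of input.
 * Lines longer than 127 characters would be split across calls. */
int read_point(FILE *in, float *x, float *y)
{
    char line[128];

    if (fgets(line, sizeof line, in) == NULL)
        return EOF;                       /* end of file or read error */
    /* The whole line, newline included, has been consumed by fgets,
     * so a failed parse cannot affect the next call. */
    return sscanf(line, " (%f ,%f", x, y) == 2;
}
```

A caller can loop on the 0 result to re-prompt the user, since each call starts on a fresh line.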
Check return value to identify problems.
// scanf("(%f,%f)",&q_x,&q_y);
if (scanf("(%f,%f)",&q_x,&q_y) != 2) {
puts("Bad input");
}
Tolerate and consume spaces, newlines, tabs, etc. between portions of the input. OP did not do this: the '\n' of the first entry was not consumed and caused the 2nd scanf() to fail. "%f" already allows optional leading white-space. Add a " " before the fixed format characters '(', ',' and ')' so that optional white-space gets consumed there too.
// v---v---v add spaces to format to allow any white-space in input.
if (scanf(" (%f ,%f )",&q_x,&q_y) != 2) {
puts("Bad input");
}
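One thing the return value alone cannot tell you is whether the closing ')' was present, because it comes after the last conversion. A possible way to check it, sketched here with sscanf on a string, is the %n directive, which stores the number of characters consumed so far but only if scanning gets that far. The function name parse_point is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 only if `s` holds a complete "(x,y)",
 * closing parenthesis included; white-space between parts is allowed. */
int parse_point(const char *s, float *x, float *y)
{
    int end = -1;   /* stays -1 unless the format matched through ')' */

    if (sscanf(s, " (%f ,%f )%n", x, y, &end) != 2)
        return 0;   /* one of the numbers is missing or malformed */
    return end != -1;
}
```

Note that %n does not count toward the return value, so the comparison is still against 2.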

Extracting the domain extension of a URL stored in a string using scanf()

I am writing code that takes a URL as a string, looks up the URL's domain extension in an array, and returns the index if it finds a match, or -1 if it does not.
For example, an input would be www.stackoverflow.com, in this case, I'd need to extract only the com part. In case of www.google.com.tr, I'd need only com again, ignoring the .tr part.
I can think of basically writing a function that'll do that just fine but I'm wondering if it is possible to do it using scanf() itself?
scanf is really overkill here, but you can do something like this:
char a[MAXLEN],b[MAXLEN],c[MAXLEN];
scanf("%[^.].%[^.].%[^. \n]",a,b,c);
printf("Desired part is = %s\n",c);
To be sure that formatting is correct you can check whether this scanf call is successful or not. For example:
if( 3 != scanf("%[^.].%[^.].%[^. \n]",a,b,c)){
fprintf(stderr,"Format must be at least sth.something.sth\n");
exit(EXIT_FAILURE);
}
What other way is there to achieve the same thing? Use fgets to read the whole line and then split it with strtok, using "." as the delimiter. This way you get each part in turn. With fgets you can easily support different kinds of rules: instead of building them into the scanf format (which makes error cases harder to handle), you can use fgets and strtok to do the same.
The solution above only considers the first three parts of the URL; the rest is not parsed. That is rarely practical. Most of the time we have to process all the parts of the URL, and we don't know in advance how many there are. In that case you are better off using fgets/strtok as mentioned above.
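A minimal sketch of the strtok half of that suggestion, assuming the line has already been read into a writable buffer (for example with fgets). The function name split_labels and the labels array are hypothetical. The newline is listed as a delimiter so that the '\n' fgets leaves at the end is dropped.

```c
#include <string.h>

/* Hypothetical helper: splits `host` in place on '.' and stores pointers
 * to at most `max` parts in `labels`. Returns the number of parts stored.
 * strtok overwrites the delimiters in `host` with '\0'. */
int split_labels(char *host, char *labels[], int max)
{
    int n = 0;
    char *tok = strtok(host, ".\n");

    while (tok != NULL && n < max) {
        labels[n++] = tok;
        tok = strtok(NULL, ".\n");
    }
    return n;
}
```

For "www.google.com.tr" this yields four parts, and the caller can pick labels[2] (or apply any other rule) once it knows how many parts there are.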

Use scanf with Regular Expressions

I've been trying to use regular expressions on scanf, in order to read a string of maximum n characters and discard anything else until the New Line Character. Any spaces should be treated as regular characters, thus included in the string to be read.
I've studied a Wikipedia article about Regular Expressions, yet I can't get scanf to work properly. Here is some code I've tried:
scanf("[ ]*%ns[ ]*[\n]", string);
[ ] is supposed to go for the actual space character, * is supposed to mean one or more, n is the number of characters to read and string is a pointer allocated with malloc.
I have tried several different combinations; however I tend to get only the first word of a sentence read (stops at space character). Furthermore, * seems to discard a character instead of meaning "zero or more"...
Could anybody explain in detail how regular expressions are interpreted by scanf? What is more, is it efficient to use getc repetitively instead?
Thanks in Advance :D
The short answer: scanf does not handle regular expressions literally speaking.
If you want to use regular expressions in C, you could use the regex POSIX library. See the following question for a basic example on this library usage : Regular expressions in C: examples?
Now if you want to do it the scanf way you could try something like
scanf("%*[ ]%ns%*[ ]\n",str);
Replace the n in %ns by the maximal number of characters to read from input stream.
The %*[ ] part tells scanf to read and discard a run of spaces. You can put a maximum field width after the * (for example %*3[ ]) to discard at most that many characters, and you can add other characters between the brackets to discard more than just spaces.
I am not sure the scanf above would work, though: %s itself skips leading whitespace and stops at the first whitespace character, so it can never store a string containing spaces, and %*[ ] fails (ending the whole scan) when there is no space to match.
I would definitely go with an fgets call, then trim the surrounding whitespace with something like the following: How do I trim leading/trailing whitespace in a standard way?
is it efficient to use getc repetitively instead?
Depends somewhat on the application, but YES, repeated getc() is efficient.
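Unless I read the question wrong, %[^\n] will store everything up to (but not including) the newline. Note that there is no trailing s and there are no quotes inside the brackets: %[ is a conversion of its own, and any quote characters in the set would just be treated as characters to exclude.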
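For the original goal (keep at most n characters of a line, spaces included, and throw the rest of the line away), a plain getc loop is probably the least surprising option. The sketch below is one way to write it; the name read_line_truncated and its parameters are assumptions, not anything from the answers above.

```c
#include <stdio.h>

/* Hypothetical helper: stores at most cap-1 characters of the next line
 * in `buf` (spaces included) and discards the rest of the line together
 * with its newline. Returns the number of characters stored, or EOF if
 * the stream ended before any character was read. */
int read_line_truncated(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int got_any = 0;
    int c;

    if (cap == 0)
        return EOF;
    while ((c = getc(in)) != EOF && c != '\n') {
        got_any = 1;
        if (n + 1 < cap)
            buf[n++] = (char)c;     /* keep it while there is room */
        /* otherwise: read and drop the character */
    }
    buf[n] = '\0';
    if (c == EOF && !got_any)
        return EOF;
    return (int)n;
}
```

Because stdio buffers the stream, calling getc once per character is normally not a performance concern.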

Can we have while loop test two arguments at the same time without &&/||

I was checking Beej's guide to IPC and one line of code caught my attention.
On that page, the while loop in speak.c seems to check two conditions: while (gets(s), !feof(stdin)).
So my question is: how is this possible? Most of the time I have seen a while loop test only one condition.
PS: I am a little new to this. I will be grateful for any help. Thanks!
The snippet
while (gets(s), !feof(stdin))
uses the comma operator: first it executes gets(s), then it evaluates !feof(stdin), and the value of that second expression is the value of the whole condition.
By the way don't use gets, it's extremely unsafe. Be wary of sources using it, they probably aren't good sources for learning the language.
The code
while(gets(s), !feof(stdin)) {
/* loop body */
}
is equivalent to
gets(s);
while(!feof(stdin)) {
/* loop body */
gets(s);
}
just more concise as it avoids the repetition of gets before the loop and in the loop body.
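Here is the same comma-operator pattern in a small sketch that avoids gets entirely. The function name count_chars is made up; it simply counts the characters in a stream.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters readable from `in`. */
long count_chars(FILE *in)
{
    long n = 0;
    int c;

    /* The left operand (the read) runs first and its value is discarded;
     * only the right operand decides whether the loop continues. */
    while (c = getc(in), c != EOF)
        n++;
    return n;
}
```

The more common spelling of this loop is while ((c = getc(in)) != EOF), which does the same thing without the comma.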
A couple of people have already pointed out some of the problems with this. I certainly agree that using gets (at all) is a lousy idea.
I think it's worth mentioning one other detail though: since this uses feof(file) as the condition for exiting the loop, it can/will also misbehave if you encounter an error before the end of the file. When an error occurs, the error flag will be set but (normally) the EOF flag won't be -- and since you can't read from the file any more (due to the error) it never will be either, so this will go into an infinite loop.
The right way to do the job is with fgets, and check its return value:
while (fgets(s, length_of_s, stdin))
process(s);
This tests for fgets succeeding at reading from the file, so it'll exit the loop for either end of file or an error.
One other minor detail: when fgets reads a string, it normally retains the new-line at the end of the line (where gets throws it away). You'll probably have to add a little more code to strip it off if it's present (and possibly deal with a line longer than the buffer you allocated if the newline isn't present).
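A possible shape for that "little more code", as a sketch: strip the newline when it is there, and when it is not (the line was longer than the buffer, or it was the last line of a file without a final newline) discard whatever remains of the line. The name next_line is hypothetical, and silently dropping the excess is just one policy among several.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the next line into `buf` without its newline.
 * Returns 1 if a line was read, 0 on end of file or read error. */
int next_line(FILE *in, char *buf, int size)
{
    size_t len;
    int c;

    if (fgets(buf, size, in) == NULL)
        return 0;                       /* end of file or read error */
    len = strcspn(buf, "\n");           /* index of '\n', or of the final '\0' */
    if (buf[len] == '\n') {
        buf[len] = '\0';                /* strip the newline */
    } else {
        /* No newline stored: drop the rest of an over-long line. */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    }
    return 1;
}
```

It slots into the loop above as while (next_line(stdin, s, length_of_s)) process(s);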
This test is using the comma operator and has been used as a way of getting the next line of text using gets(s) and testing for end-of-file using !feof(stdin).
This syntax doesn't test two conditions. It first executes gets(s) and then evaluates !feof(stdin), whose value may have been changed by the gets() call.
It's not a very good way to do it, both because it uses gets(), which is not a safe function, and because it is hard for a beginner to read (hence your question).

How to use sscanf correctly and safely

First of all, other questions about usage of sscanf do not answer my question because the common answer is to not use sscanf at all and use fgets or getch instead, which is impossible in my case.
The problem is my C professor wants me to use scanf in a program. It's a requirement.
However the program also must handle all the incorrect input.
The program must read an array of integers. It doesn't matter in what format the integers for the array are supplied. To make the task easier, the program might first read the size of the array and then the integers each in a new line.
The program must handle the inputs like these (and report errors appropriately):
999999999999999...9 (numbers larger than integer)
12a3 (don't read this as an integer 12)
a...z (strings)
11 aa 22 33\n all in one line (this might be handled by discarding everything after 11)
inputs larger than the input array
There might be more incorrect cases, these are the only few I could think of.
If erroneous input is supplied, the program must ask the user to input again until the correct input is given, but the previous correct input must be kept (only incorrect input must be cleared from the input stream).
Everything must conform to C99 standard.
The scanf family of functions cannot be used safely, especially when dealing with integers. The first case you mentioned is particularly troublesome. The standard says this:
If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined.
Plain and simple. You might think of %5d tricks and such but you'll find they're not reliable. Or maybe someone will think of errno. The scanf functions aren't required to set errno.
Follow this fun little page: they end up ditching scanf altogether.
So go back to your C professor and ask them: how exactly does C99 mandate that sscanf will report errors?
Well, let sscanf accept every input as %s (i.e. as a string) and then have the program analyze the strings itself.
If you must use scanf to accept the input, I think you could start with something a bit like the following.
int array[MAX];
int i, n;
scanf("%d", &n);
for (i = 0; i < n && !feof(stdin); i++) {
scanf("%d", &array[i]);
}
This will handle (more or less) the free-format input problem since scanf will automatically skip leading whitespace when matching a %d format.
The key observation for many of the rest of your concerns is that scanf returns the number of conversions it completed successfully (or EOF if the input ends before the first conversion). So,
int matches = scanf("%d", &array[i]);
if (matches != 1) {
    /* no integer could be read: 0 means bad input, EOF means the input ended */
}
I think this handles directly concerns (3) and (4)
By itself, this doesn't quite handle the case of the input 12a3. The first time through the loop, scanf would parse 12 as an integer 12, leaving the remaining a3 for the next loop. You would get an error the next time round, though. Is that good enough for your professor's purposes?
For integers larger than INT_MAX, e.g. "999999.......999", I'm not sure what you can do with straight scanf.
For inputs larger than the input array, this isn't a scanf problem per se. You just need to count how many integers you've parsed so far.
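Putting those pieces together, a rough sketch of a reading loop might look like the following. The function read_ints, the rejected counter and the use of a FILE * parameter are assumptions for illustration. It skips a bad token instead of re-prompting, it still accepts the 12 from 12a3, and it does nothing about out-of-range numbers, so it only covers the cases discussed above.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to `max` integers from `in` into `array`.
 * A token that does not start with an integer is discarded and counted
 * in *rejected. Returns the number of integers stored. */
int read_ints(FILE *in, int *array, int max, int *rejected)
{
    int count = 0;

    *rejected = 0;
    while (count < max) {               /* never write past the array */
        int rc = fscanf(in, "%d", &array[count]);

        if (rc == 1) {
            count++;
        } else if (rc == 0) {
            /* Matching failure: throw away the offending token. */
            if (fscanf(in, "%*s") == EOF)
                break;
            (*rejected)++;
        } else {
            break;                      /* EOF or read error */
        }
    }
    return count;
}
```

For the input "3 x 4 12a3 5" this stores 3, 4, 12 and 5 and rejects two tokens (x and a3).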
If you're allowed to use sscanf to decode strings after they've been extracted from the input stream by something like scanf("%s") you could also do something like this:
while (...) {
scanf("%s", buf);
/* use strtol or sscanf if you really have to */
}
This works for any sequence of white-space separated words, and lets you separate scanning the input for words, and then seeing if those words look like numbers or not. And, if you have to, you can use scanf variants for each part.
The problem is my C professor wants me to use scanf in a program.
It's a requirement.
However the program also must handle all the incorrect input.
This is an old question, so the OP is not in that professor's class any more (and hopefully the professor is retired), but for the record, this is a fundamentally misguided and basically impossible requirement.
Experience has shown that when it comes to interactive user input, scanf is suitable only for quick-and-dirty situations when the input can be assumed to be correct.
If you want to read an integer (or a floating-point number, or a simple string) quickly and easily, then scanf is a nice tool for the job. However, its ability to gracefully handle incorrect input is basically nonexistent.
If you want to read input robustly, reliably detecting incorrect input, and perhaps warning the user and asking them to try again, scanf is simply not the right tool for the job. It's like trying to drive screws with a hammer.
See this answer for some guidelines for using scanf safely in those quick-and-dirty situations. See this question for suggestions on how to do robust input using something other than scanf.
Read the token with scanf("%s", string), then convert it with long int_string = strtol(string, &end_pointer, 10) and check end_pointer to see whether the whole token was consumed.
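A minimal sketch of that strtol check, assuming the token has already been read into a string (ideally with a width, such as scanf("%31s", token) for a 32-byte buffer). The function name parse_int_strict is hypothetical. It rejects tokens with trailing junk such as 12a3 and values that do not fit in an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts `s` to an int only if the entire string
 * is a valid, in-range decimal integer. Returns 1 on success, 0 otherwise. */
int parse_int_strict(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return 0;                       /* no digits at all, e.g. "abc" */
    if (*end != '\0')
        return 0;                       /* trailing junk, e.g. "12a3" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                       /* does not fit in an int */
    *out = (int)v;
    return 1;
}
```

Since strtol reports overflow through errno and the end pointer, this covers the 999999...9 case that plain %d leaves undefined.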

Resources