What does [^0-9]+$ mean (regular expression in FLEX) - c

This is what I know:
^ inside brackets matches a character that isn't one of the included inside the brackets.
+ Matches one or more appearances of the expression to its left (in my ex. [^0-9]).
$ If I'm not mistaken, matches to an expression that ends with the expression to its left.
Then it seems this expression should match input that has at least one character that isn't a digit and that ends with that expression, for example it should match:
1a, aaa, 2321a,1b1b
and should not match:
111, 432423,asd3213
but it is unclear to me from running this rule what exactly it matches.
This is my full code:
%option noyywrap
%{
#include<stdio.h>
%}
%%
[^0-9]+$ printf("not a number");
%%
int main()
{
yylex();
return 0;
}
And I'm using flex.
output examples(sorry for the links, it won't let me upload a photo):
[1] https://ibb.co/qp3hB0r - doesn't match but prints back
[2] https://ibb.co/syZHjrw - doesn't match and eats it (why does it happen if I didn't add ".|\n" in the code?)
[3] https://ibb.co/s6S0tQh - matches and prints back
[4] https://ibb.co/VmZW7KR - same as the 3rd
[5] https://ibb.co/2vPfWhc - matched only the 11(?) and ate up the aa
I'm really confused as to what it actually matches and would appreciate the help.

This is what I know:
^ inside brackets matches a character that isn't one of the included inside the brackets.
That's an odd way to put it. More accurate would be that the whole bracket-enclosed fragment matches one character that is not (because of the ^) in the range '0' - '9'.
+ Matches one or more appearances of the expression to its left (in my ex. [^0-9]).
Again an odd way to put it. The + quantifier modifies the preceding fragment to match one or more appearances of whatever it otherwise would match exactly once.
$ If I'm not mistaken, matches to an expression that ends with the expression to its left.
You are mistaken. The $ anchors the match to the end of a line -- the overall pattern matches only text that ends at the end of a line, as determined by immediately preceding a newline (and therefore not at the very end of the file). That's a restriction, not an extension: nothing is matched that wouldn't be matched by the pattern excluding the $, but there is an additional requirement that the match occur at the end of a line. That's not at all the same thing as matching text that ends with a match to the preceding pieces of the pattern.
Thus,
it seems this expression should match input that has at least one
character that isn't a digit and that ends with that expression, for
example it should match: 1a, aaa, 2321a,1b1b
No. Taking those as four separate examples, it would not match any of them unless they appeared at the end of a line. If they all did appear at the end of a line, then only aaa would be matched in its entirety; for each of the others, only the trailing a or b would be matched.
Note also, however, that when a flex scanner cannot match the input to any user-defined rule, its default rule is invoked, which copies the next input character to the standard output, consuming it. Therefore, if you present an input to your scanner that contains at least one non-digit at the end of a line, then it will eventually consume any preceding input up to the last digit, printing all of that on the standard output, before eventually matching that trailing portion of the line and printing "not a number".

Related

C - format specifier for scanf?

float lat, lon;
char info[50];
scanf("%f, %f, %49[^\n]", &lat, &lon, info);
In the above snippet, what kind of format specifier is %49[^\n]?
I do understand that it is the format specifier for the character array, which is going to accept input of up to 49 characters (+ the sentinel \0), and [^\n] looks like it's a regex (although I had read somewhere that scanf doesn't support regex) OR a character set which is to expand to "any other character" that is NOT "newline" \n. Am I correct?
Also, why is there no s in the format specifier for writing into array info?
The program this snippet is from works. But is this good C style?
The specifier %[ is a different conversion specifier from %s, even if it also must be paired with an argument of type char * (or wchar_t *). See e.g. the table here
[set] matches a non-empty sequence of character from set of characters.
If the first character of the set is ^, then all characters not in the set are matched. If the set begins with ] or ^] then the ] character is also included into the set. It is implementation-defined whether the character - in the non-initial position in the scanset may be indicating a range, as in [0-9]. If width specifier is used, matches only up to width. Always stores a null character in addition to the characters matched (so the argument array must have room for at least width+1 characters)
My apologies, I incorrectly answered below. If you can skip to the end, I'll give you the correct answer.
*** Incorrect Answer Begins ***
It would not be a proper format specifier, as there is no type.
%[parameter][flags][width][.precision][length]type
are the rules for a format statement. As you can see, the type is non-optional. The author of this format item is thinking they can combine regex with printf, when the two have entirely different processing rules (and printf doesn't follow regex's patterns).
*** Correct Answer Begins ***
scanf uses different format string rules than printf. Within scanf's man page is this addition to printf's rules:
[
Matches a nonempty sequence of characters from the specified set
of accepted characters; the next pointer must be a pointer to char,
and there must be enough room for all the characters in the string,
plus a terminating null byte. The usual skip of leading white space is
suppressed. The string is to be made up of characters in (or not in) a
particular set; the set is defined by the characters between the open
bracket [ character and a close bracket ] character. The set excludes
those characters if the first character after the open bracket is a
circumflex (^). To include a close bracket in the set, make it the
first character after the open bracket or the circumflex; any other
position will end the set. The hyphen character - is also special;
when placed between two other characters, it adds all intervening
characters to the set. To include a hyphen, make it the last character
before the final close bracket. For instance, [^]0-9-] means the set
"everything except close bracket, zero through nine, and hyphen". The
string ends with the appearance of a character not in the (or, with a
circumflex, in) set or when the field width runs out.
Which basically means that scanf can scan with a subset of regex's rules (the character set subset) but not all of regex's rules

Having a hard time understanding how to use void printSp(int) in this given task

I only really understand how to write an equation in a program. I need help with how to approach this task, and I'm also not really sure what it's asking me to do.
Suppose you are given a function with the following declaration:
void printSp(int); /* prints specified number of spaces */
Write a function named printTri that takes a single argument, a character, and returns an integer value. If the character is not a capital letter (between 'A' and 'Z'), then the function simply returns 0. Otherwise, if it is a capital letter, the function will print a triangle of characters that looks like this:
   A
  ABA
 ABCBA
ABCDCBA
NOTE: With a fixed-width font, the center letter in each row would line up. Write this out for yourself on paper, to figure out how many spaces should be printed on each line before the characters start. The bottom line has zero spaces. How many spaces should the top line have? The letter passed in becomes the highest letter in the triangle. For example, to print the triangle above, the caller passes in 'D'. After printing, the function returns the total number of non-space characters printed. For example, for the example triangle above, the function must return 16. You must call the printSp function, once per line, as part of your solution. (NOTE: Call printSp for every line, even when the number of spaces to print is zero.)
This is what I have so far. I know it's not much, but it's all I understand how to do.
if (x >= 'A' && x <= 'Z') printf("   A\n  ABA\n ABCBA\nABCDCBA\n");
else return 0;
The function printSp() prints spaces.
You currently are outputting a hard coded number of spaces instead of using printSp(). Swap printf(" ") for printSp(3)
Reading the question, they want you to output a variable number of rows based on the letter provided. So for D you print the pattern you have hard coded which contains four rows and enough letters to output the D. For E you add another row.
I generally suggest students approach problems like this by starting with hardcoding, as you have. Ensure that you incorporate the required features, like printSp(). Then make the code more generic to handle other inputs.

Why there should not be type specifier like s or c after [0-9A-Z^%]?

For example consider the following code -
fscanf(fp,"%d:%d:%[^:]:%[^\n]\n",&pow->no,&pow->seen,pow->word,pow->means);
printf("\ntthis is what i read--\n%d:%d:%s:%s:\n",pow->no,pow->seen,pow->word,pow->means);
here pow is pointer to an object declared before,
when I put s as in fscanf(fp,"%d:%d:%[^:]s:%[^\n]\n" the 3rd one is read but not the last one
output is --
4:0:Abridge::
but when i do fscanf(fp,"%d:%d:%[^:]:%[^\n]s\n" all are read
output is --
4:0:Abridge:To condense:
AND without s anywhere fscanf(fp,"%d:%d:%[^:]:%[^\n]\n" all are read
output is --
4:0:Abridge:To condense:
WHY??
To answer your question about what %[^\n]s means: it is two separate things, the conversion specifier %[^\n] followed by the ordinary character s.
The %[^\n] part scans everything other than \n, stops when it reaches the \n, and leaves it unread in the input. It doesn't stop there, though: scanf then tries to match a literal s against the next input character. If that character is not s, the match fails and scanf returns at that point; conversions that already completed keep their values, but nothing after the s is processed. (The explanation for %[^:]s is the same.)
Now decide whether this is what you really want. %[^\n] on its own is the right choice: it scans until \n is found (and, unlike %s, it does not skip leading whitespace). The scanset already covers the letter s along with every other non-newline character, and %[^\n]s is self-contradictory, because the character that follows the scanned text can only be \n (or end of input), never s. So there is no use for it either.
%d:%d:%[^:]s:%[^\n]
%d - Matches an optionally signed decimal integer. (Ignore whitespace)
: - Then looks for ':'
%d - Matches an optionally signed decimal integer. (Ignore whitespace)
: - Then looks for ':'
%[^:] - No white space ignored - everything is taken into input except `:`
':' is unread.
s - Tries to match 's'. No white space ignored.
%[^\n] - Everything except '\n' inputted. `\n` left unread.
The specifier IS "%[]", you don't need the "s" there.
Read the manual page for scanf()
Your format string doesn't match the input because the "s" is not part of the specifier and it's not present in the input where the format is expecting it.
By reading the documentation in the link above, you will find out (if you don't already know) that you should also check the return value of scanf() before calling printf(); otherwise your code may invoke undefined behavior, because some of the destination objects may never have been written to.

How to REGEX // in C? Single line comments

I used the following to get it to work partially:
%{
#define OR 2
#define AND 3
.........
.........
%}
delim [ \t]
ws {delim}*
letter [A-Za-z]
digit [0-9]
comments [/]+({letter}|{digit}|{delim})*
%%
{comments} {return(COMMENT);}
......................
......................
%%
int main()
{
int tkn = 0;
while (tkn = yylex())
{
switch (tkn)
{
case COMMENT:
printf("GOT COMMENT");
}
}
}
This is working fine. The problem is that the regex obviously does not recognize special characters because [/]+({letter}|{digit}|{delim})* does not consider special characters. How to change the regex to accommodate more characters till end of line?
Couldn't you just use
[/]+.*
It will match some number of / and then anything till the end of line. Of course this will not cover comments like /* COMMENT */.
Maybe it's late, but I find it more appropriate to use \/[\/]+.* which covers a double slash (or more slashes) and then the rest of the text.
Following is the explanation from regex101.com:
\/ matches the character / literally (case sensitive)
[\/]+ matches a single character present in the list (here only /, matched literally); the + quantifier matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
.* matches any character (except for line terminators)
A single-line comment expression starting with '//' can be captured by the following regular expression.
\/\/[^\r\n]*
\/\/ matches the double-slash
[^\r\n]* matches as many characters that are not carriage-return or line-feed as it can find.
However, the C language allows a single line comment to be extended to the next line when the last character in the line is a backslash (\). Therefore, you may want to use the following.
\/\/[^\r\n]*(?:(?<=\\)\r?\n[^\r\n]*)*
\/\/ matches the double-slash
[^\r\n]* matches as many characters that are not carriage-return (\r) or line-feed (\n) as it can find
(?: start a non-capturing group
(?<=\\) assert that a backslash (\) immediately precedes the current position
\r?\n match the end of a line
[^\r\n]* matches as many characters that are not carriage-return (\r) or line-feed
)* complete the non-capturing group and let it repeat 0 or more times
Note that this method has problems. Depending on what you are doing, you may want to find and use a lexical scanner. A lexical scanner can avoid the following problems.
Scanning the text
/* Comment appears to have // a comment inside it */
will match
// a comment inside it */
Scanning the text
char* a = "string appears to have // a comment";
will match
// a comment";
Why can't you just write
"//"|"/*" {return(COMMENT);}
?
Following regular expression works just fine for me.
\/\/.*

usage of % [^\n]

char A[50][5000];
for(i=0;i<50;++i)
scanf("%[\n]",A[i]);
%[^\n]
usage and meaning of it
and can i use that struct like
%[\t]
%[\a]
scanf()'s "%[" conversion specifier starts what's called a "scanset". It has some similarities to the regex construct that looks the same (but it is still quite different). Here's what the standard says:
Matches a nonempty sequence of characters from a set of expected characters (the scanset).
...
The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right bracket character is in the scanlist and the next following right bracket character is the matching right bracket that ends the specification; otherwise the first following right bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.
So the scanf() conversion "%[\n]" will match a newline character, while "%[^\n]" will match all characters up to a newline.
Here's what P.J. Plauger has to say about scansets in "The Standard C Library":
A scan set behaves much like the s conversion specifier. It stores up to w characters (default is the rest of the input) in the char array pointed at by ptr. It always stores a null character after any input. It does not skip leading white-space. It also lets you specify what characters to consider as part of the field. You can specify all the characters that match, as in %[0123456789abcdefABCDEF], which matches an arbitrary sequence of hexadecimal digits. Or you can specify all the characters that do not match, as in %[^0123456789] which matches any characters other than digits.
If you want to include the right bracket (]) in the set of characters you specify, write it immediately after the opening [ (or [^), as in %[][] which scans for square brackets. You cannot include the null character in the set of characters you specify. Some implementations may let you specify a range of characters by using a minus sign (-). The list of hexadecimal digits, for example, can be written as %[0-9abcdefABCDEF] or even, in some cases, as %[0-9a-fA-F]. Please note, however, that such usage is not universal. Avoid it in a program that you wish to keep maximally portable.
Yes, it's pretty much like a set in a regular expression -- you can specify a set of characters to be accepted, or a set of characters to end the scan, so "%[^ \r\n\t]" would read until it encountered a space, carriage return, new-line or tab. Like with an RE, the leading "^" means "not" -- you can omit it to specify the characters that will be accepted instead of those that will end the conversion. With most compilers (though it's not technically required) you can specify ranges, such as "%[a-z]" to specify any lower-case letter (in this case, where the '-' isn't the first or last character, the behavior is implementation-defined).
Though not widely used (or even known) this conversion has been part of C almost forever, and is supported in C89/90.
With "%[^\n]", the call copies a string up to (but not including) a newline from standard input to element i of A. As written, with no field width, this acts almost like gets().

Resources