Use of regular expressions in c for strcmp function - c

What I'm interested in doing is reading a line with getline and looking for specific phrases. If I see "verbose on" or "verbose off", I know what to do; however, if it is "verbose <something else>", I want to report an error. I'm pretty sure this will require regular expressions, because what comes after "verbose" is arbitrary. Some insight on this problem would be much appreciated. Thank you.
strcmp(buf,"verbose on")==0
strcmp(buf,"verbose off")==0
strcmp(buf,"verbose "regex expression here im thinking"")==0
This is how I think it should go; I just need a bit of a push.

No need for a regex. You can use strncmp:
strncmp(buf, "verbose", strlen("verbose")) == 0
This only compares the first 7 characters, so it will match any buf that starts with "verbose".
Note: I'm allergic to magic numbers, but you could of course replace the strlen call with a literal 7 if you prefer. Also, for real code, I would replace the duplicated string literal with a constant.
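Here is a minimal sketch of how the prefix check could be combined with the two exact commands. The enum and the function name classify_verbose are made up for this illustration, and it assumes the prefix is "verbose " with a single trailing space and that the line may still end in the newline that getline leaves in the buffer.

```c
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum verbose_cmd { VERBOSE_ON, VERBOSE_OFF, VERBOSE_BAD_ARG, NOT_VERBOSE };

/* Classify one input line. A trailing newline is ignored. */
static enum verbose_cmd classify_verbose(const char *line)
{
    static const char prefix[] = "verbose ";
    size_t len = strcspn(line, "\n");   /* length without the newline */
    size_t plen = sizeof prefix - 1;    /* sizeof counts the '\0', so subtract 1 */

    /* Not a "verbose ..." line at all. */
    if (len < plen || strncmp(line, prefix, plen) != 0)
        return NOT_VERBOSE;

    const char *arg = line + plen;      /* whatever follows "verbose " */
    size_t arglen = len - plen;

    if (arglen == 2 && strncmp(arg, "on", 2) == 0)
        return VERBOSE_ON;
    if (arglen == 3 && strncmp(arg, "off", 3) == 0)
        return VERBOSE_OFF;

    /* Starts with "verbose " but the argument is neither "on" nor "off". */
    return VERBOSE_BAD_ARG;
}
```

The caller can then switch on the result and print an error for VERBOSE_BAD_ARG. Note that a bare "verbose" with nothing after it is reported as NOT_VERBOSE in this sketch; whether that should be an error too is up to you.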

Related

Unable to form the required regex in C

I am trying to write a regex which can search a string and return true if it matches with the regex and false otherwise.
Check should ensure string is wildcard domain name of a website.
Example:
*.cool.dude is valid
*.cool is not valid
abc.cool.dude is not valid
So I wrote something like this:
\\*\\.[.*]\\.[.*]
However, this is also allowing a *.. string as valid string because * means 0 or infinite occurrences.
I am looking for something which ensures that at least one character appears in each part.
Example:
*.a.b -> valid but *.. -> invalid
How can I change the regex to support this?
I have already tried doing something like this:
\\*\\.([.*]{1,})\\.([.*]{1,}) -> doesnt work
\\*\\.([.+])\\.(.+) -> doesnt work
^\\*\\.[a-zA-Z]+\\.[a-zA-Z]+ -> doesnt work
I have tried a bunch of other options as well and have failed to find a solution. Would be great if someone can provide some input.
PS. Looking for a solution which works in C.
[.*] does not mean "0 or more occurrences" of anything. It means "a single character, either a (literal) . or a (literal) *". […] defines a character class, which matches exactly one character from the specified set. Brackets are not even remotely the same as parentheses.
So if you wanted to express "zero or more of any character except newline", you could just write .*. That's what .* means. And if you wanted "one or more" instead of "zero or more", you could change the * to a plus, as long as you remember that regex.h regexes should always be compiled with the REG_EXTENDED flag. Without that flag, + is just an ordinary character. (And there are a lot of other inconveniences.)
But that's probably not really what you want. My guess is that you want something like:
^[*]([.][A-Za-z0-9_]+){2,}$
although you'll have to correct the character class to specify the precise set of characters you think are legitimate.
Again, don't forget the crucial REG_EXTENDED flag when you call regcomp.
Some notes:
The {2,} requires at least two components after the *, so that *.cool doesn't match.
The ^ and $ at the beginning and end of the regex "anchor" the match to the entire input. That stops the pattern matching just a part of the input, but it might not be exactly what you want, either.
Finally, I deliberately used a single-character character class to force [*] and [.] to be ordinary characters. I find that a lot more readable than falling timber (\\) and it avoids having to think about the combination of string escaping and regex-escaping.
For more information, I highly recommend reading man regcomp and man 7 regex. A good introduction to regexes might be useful, as well.
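To make the regcomp call concrete, here is a rough sketch using the POSIX regex.h interface with the pattern suggested above. The function name is_wildcard_domain is invented for this example, and the character class is the same placeholder as in the answer, so adjust it to whatever characters you consider legitimate.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s looks like a wildcard domain such as "*.cool.dude",
   0 if it does not, and -1 if the pattern fails to compile. */
static int is_wildcard_domain(const char *s)
{
    regex_t re;

    /* REG_EXTENDED makes + and {2,} work as repetition operators.
       REG_NOSUB is used because only a yes/no answer is needed. */
    if (regcomp(&re, "^[*]([.][A-Za-z0-9_]+){2,}$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Compiling the pattern on every call keeps the sketch short; if you check many strings, compile once and reuse the regex_t.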

'If' Condition Not Working As Expected In My C Code

I am fully aware that this is due to some error overlooked by me while writing my text-based calculator project in C, but I have only started learning C less than a week ago, so please help me out!
Since the entire code is 119 lines, I'll just post the necessary snippet where the real issue lies: (There are no errors during compiling, so there is no error beyond these lines)
char choice[15];
printf("Type 'CALCULATE' or 'FACTORISE' or 'AVERAGE' to choose function :\n");
gets(choice);
if (choice == "CALCULATE")
The bug is that even after perfectly entering CALCULATE or FACTORISE or AVERAGE, I still get the error message that I programmed in case of invalid input (i.e, if none of these 3 inputs are entered). It SHOULD be going on to ask me the first number I wish to operate on, as written for the CALCULATE input.
The code runs fine, no errors in VS 2013, and so I'm sure its not a syntax error but rather something stupid I've done in these few lines.
If you use == you are comparing the addresses of 2 arrays, not the contents of the arrays.
Instead, you need to do:
if (strcmp(choice, "CALCULATE") == 0)
Two things to mention here:
Never use gets(): it cannot limit how much it reads, so it can overflow the buffer, and it was removed from the language in C11. Use fgets() instead.
To compare strings, you should use strcmp(), not ==.
The problem is that you're comparing a char array with a string literal using ==. In C, == applied to arrays compares their addresses, not their contents, so the two are never equal here.
You have two options for performing that comparison :
1) Use the strcmp() function, from string.h library
2) Manually comparing the chars in your array, and the string literal
Definitely, the first option is the easiest and cleanest one.
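Putting the two recommendations together, a sketch along these lines should work. The helper names read_choice and is_calculate are made up for illustration. One detail worth knowing: fgets keeps the trailing newline in the buffer, so it has to be stripped before calling strcmp, otherwise "CALCULATE\n" would not compare equal to "CALCULATE".

```c
#include <stdio.h>
#include <string.h>

/* Read one line from in into buf (at most size - 1 characters) and strip
   the trailing newline. Returns 1 on success, 0 on end of file or error. */
static int read_choice(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if there is one */
    return 1;
}

/* Compare the contents of the strings, not their addresses. */
static int is_calculate(const char *choice)
{
    return strcmp(choice, "CALCULATE") == 0;
}
```

In the original program this would be used as read_choice(stdin, choice, sizeof choice) followed by if (is_calculate(choice)).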

What is the syntax in c to combine statements as a parameter

I have an inkling there is an old nasty way to get a function run as a parameter is calculated, but since I do not know what it is called I cannot search out the rules.
An example
char dstr[20];
printf("a dynamic string %s\n", (prep_dstr(dstr),dstr));
The idea is that the "()" will return the address dstr after having executed the prep_dstr function.
I know it is ugly and I could just do it on the line before - but it is complicated...
Ok - in answer to the pleading not to do it.
I am actually doing a MISRA cleanup on some existing code (not mine don't shoot me), currently the 'prep_dstr' function takes a buffer modifies it (without regard to the length of the buffer) and returns the pointer it was passed as a parameter.
I like to take a small step - test then another small step.
So - a slightly less nasty approach than returning a pointer with no clue about its persistence is to stop the function returning a pointer and use the comma operator (after making sure it does not romp off the end of the buffer).
That gets the MISRA error count down, when it all still works and the MISRA errors are gone I will try to get around to elegance - perhaps the year after next :).
The comma operator has the appropriate precedence and, besides, it introduces a sequence point, that is, a point in the program's execution where all previous side effects are complete.
So, whatever your function prep_dstr() does to the string dstr, it's completely performed before the comma operator is reached.
On the other hand, the comma operator yields an expression whose value is that of its rightmost operand.
The following examples give you the value dstr, as you want:
5+3, prep_dstr(dstr), sqrt(25.0), dstr;
a+b-c, NULL, dstr;
(prep_dstr(dstr), dstr);
Of course, such expression can be used wherever you need the string dstr.
Therefore, the syntax you used in the question does the job perfectly.
Since you are open to playing with the syntax, there is another possibility.
A call to printf() is itself an expression,
so it can be placed in a comma expression:
prep_dstr(dstr), printf("Show me the string: %s\n", dstr);
It seems that everybody is telling you "don't write code this way".
This kind of dogmatic advice about programming style is overrated.
If you need to do something, just do it.
One of the principles of C says: "Don't prevent the programmer from doing what needs to be done."
However, whatever you do, try to write readable code.
Yes, the syntax you use will work for your purpose.
However, please consider writing clean and readable code. For instance,
char buffer[20];
char *destination = prepare_destination_string(buffer);
printf("a dynamic string %s\n", destination);
Everything is cleanly named and easy to understand, and the intended behaviour is easy to infer. You could even omit certain parts if you wanted to, such as destination, or add error checking more easily.
Your inkling and your code are both correct. That said, please don't do this. Putting prep_dstr on its own line makes it much easier to reason about what happens and when.
What you're thinking of is the comma operator. In a context where the comma doesn't already have another meaning (such as separating function arguments), the expression a, b has the value of b, but evaluates a first. The extra parentheses in your code cause the comma to be interpreted this way, rather than as a function argument separator.
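For what it's worth, here is a small self-contained sketch of the parenthesised comma expression used as a function argument. The prep_dstr below is a hypothetical stand-in that takes a buffer size and returns nothing (the direction described in the question's edit), and format_message writes into a buffer with snprintf instead of printing, purely so the result is easy to inspect.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the question's prep_dstr: fills the buffer
   without writing past its end and returns nothing. */
static void prep_dstr(char *buf, size_t size)
{
    snprintf(buf, size, "prepared");
}

/* The inner parentheses make the comma an operator rather than an argument
   separator: prep_dstr runs first, then dstr is passed as the argument. */
static int format_message(char *out, size_t out_size)
{
    char dstr[20];
    return snprintf(out, out_size, "a dynamic string %s",
                    (prep_dstr(dstr, sizeof dstr), dstr));
}
```

Without the inner parentheses the call would simply pass two separate arguments, and the order in which function arguments are evaluated is not specified.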

Parsing string with C - whats the right tool?

I got a string in this format:
Stuff: </value_1/value_2/value_3>; key="value"
What I need parsed out is value_1, value_2 and value_3 along with the key/value pair. value_3 might or might not be present in the string.
What would one use in C in order to get this done?
I thought about sscanf, but the values can be of arbitrary size, so they should be allocated dynamically. strtok would have been my next idea, but that probably needs two separate loops to extract the / separated values and the key/value pair… seems tedious but at least doable.
Anyone with more experience in C has any better idea?
EDIT: regex could be an option, but I would prefer standard string functions if possible at all.
If you use "</>; =\"" as the delimiter set, then strtok() will work in a single pass, extracting in turn:
Stuff:
value_1
value_2
value_3
key
value
Not sure what Stuff: is, whether it is needed in the extraction, or whether it is just an elision on your part.
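A minimal sketch of that single pass, assuming the line is in a writable buffer (strtok overwrites delimiters with '\0') and that none of the values themselves contain a space, '=', '/' or any other character from the delimiter set. The function name split_fields is invented for the example.

```c
#include <string.h>

/* Split line in place on the delimiter set and store pointers to the
   pieces in tokens. Returns the number of tokens stored (at most max_tokens).
   The pointers refer into line, so line must outlive them. */
static size_t split_fields(char *line, char *tokens[], size_t max_tokens)
{
    static const char delims[] = "</>; =\"";
    size_t count = 0;

    for (char *tok = strtok(line, delims);
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, delims))
        tokens[count++] = tok;

    return count;
}
```

Since value_3 is optional, the token count (6 with it, 5 without, for the format in the question) tells you whether it was present. If you need copies that outlive the buffer, allocate them from the returned tokens afterwards.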
You could use regular expressions; see Regular Expressions - The GNU C Library.
If you don't know what regular expressions are; see Regular Expressions.
Here's an online regular expression editor (or this one, if you don't have Flash installed), so you can test your regexp!

Am I correct that strcmp is equivalent (and safe) for literals?

We all know the trouble overflows can cause, and this is why strn* exist - and most of the time they make sense. However, I have seen code which uses strncmp to compare commandline parameters like so:
if(... strncmp(argv[i], "--help", 6) == 0
Now, I would have thought that this is unnecessary and perhaps even dangerous (for longer parameters it would be easy to miscount the characters in the literal).
strncmp stops on nulls, and the code already assumes argv[i] is null-terminated. Any string literal is guaranteed to be null-terminated, so why not use strcmp?
Perhaps I'm missing something, but I've seen this a few times and this time it intrigued me enough to ask.
Yes, it is perfectly safe and considered standard practice. String literals are guaranteed to be properly null-terminated.
Are you sure that the code is not intended to match on
"--helpmedosoemthingwithareallylongoptionname"?
You're right.
Moreover, the example you provided would match "--help" but also everything that begins with "--help" (like "--help-me").
A rare case in which overzealous == wrong.
As far as I know, you're absolutely right--there's no reason to use strncmp instead of strcmp. Perhaps people are just being overcautious (not necessarily a bad thing).
As others have said, strcmp() is perfectly safe to use with literals. If you want to use strncmp(), try this:
strncmp(argv[i], "--help", sizeof("--help"))
Let the compiler do the counting for you!
This will only match the exact string "--help". If you want to match all strings which begin with "--help" (as your code does), use sizeof() - 1 to not include the last '\0'.
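To show the difference side by side, here is a short sketch. The macro LITERAL_LEN and the two function names are made up for this example; the trick only works on string literals and arrays, because sizeof applied to a char pointer gives the size of the pointer, not the length of the string.

```c
#include <string.h>

/* sizeof on a string literal counts the terminating '\0', so subtract 1
   to get the number of visible characters. Do not use this on a char *. */
#define LITERAL_LEN(lit) (sizeof(lit) - 1)

/* Exact match: plain strcmp is enough when one side is a literal. */
static int is_help(const char *arg)
{
    return strcmp(arg, "--help") == 0;
}

/* Prefix match: compares only the characters of the literal, so
   "--help-me" is accepted as well. */
static int starts_with_help(const char *arg)
{
    return strncmp(arg, "--help", LITERAL_LEN("--help")) == 0;
}
```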
Yes, the presence of the literal limits the size of the compared data to the size of the literal. strncmp is redundant here.
Some may say that strncmp is a good habit to get into, but this is outweighed by the trouble of counting chars.
It's probably not done for safety. It could have been done to check only the start of command line parameter. Many programs just check the beginning of the command line switches and ignore the rest.
I would probably write something like this in C (if I was using strncmp a lot and didn't want to count characters):
if(... strncmp(argv[i], "--help", sizeof("--help") - 1) == 0
er... technically couldn't something like this happen?
char *cp1 = "help";
cp1[4] = '!'; // BAD PRACTICE! don't try to mutate a string constant!
// Especially if you remove the terminating null!
...
strcmp(some_variable, "help");
// if compiler is "smart" enough to use the same memory to implement
// both instances of "help", you are screwed...
I guess this is a pathological case and/or garbage-in, garbage out ("Doc, it hurts when I whack my head against the wall!" "Then don't do it!")...
(p.s. I'm just raising the issue -- if you feel this post muddies the waters, comment appropriately & I'll delete it)

Resources