Performing regular expressions in C

I would like to use regular expressions in C. Suppose I have the following text:
thecapital([x], implies(maincity(y),x))
The program has to output:
implies(maincity(y),x))
Can anyone please suggest how I should proceed?

To transform the input string thecapital([x], implies(maincity(y),x)) to the output string implies(maincity(y),x)) you can use the following simple function:
const char *
transform(const char *expr) {
    return expr + 16;
}
It doesn't use regular expressions, but on the other hand it's lightning fast. Or maybe you didn't put your question clearly. For example, you didn't describe in words what transformation should be done. Giving just one example is not enough.
So what do you really want to do?
Skip the first 16 characters of the input string
Return everything after the first space character
Return everything after the last space character
Return the suffix of the argument starting with the second i
Return "implies(maincity(y),x))"
Return the second argument to the term in parentheses, followed by an extra closing parenthesis
For your one example my simple suggested function fulfills all these requirements. But of course it will fail hopelessly when given any other input.
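If the intended rule is "return everything after the first argument", a regular expression can express it. Below is a minimal sketch using the POSIX regcomp(3)/regexec(3) API. The helper name extract_rest and the assumed input shape (a lowercase name, a bracketed lowercase variable, a comma, optional spaces, then the rest) are guesses for illustration, not something the question specifies.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies everything after "name([var], " into out.
 * Returns 0 on success, -1 if expr does not have the assumed shape
 * or out is too small. */
int extract_rest(const char *expr, char *out, size_t outSize) {
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first capture group */
    int rc = -1;

    /* Assumed shape: name ( [ var ] , optional spaces, then the rest. */
    if (regcomp(&re, "^[a-z]+\\(\\[[a-z]+],[ ]*(.*)$", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, expr, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outSize) {
            memcpy(out, expr + m[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Called with the string from the question and a 64-byte buffer, this should leave implies(maincity(y),x)) in the buffer. Any other input format needs a different pattern, which is why the exact rule matters.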

Related

C Trying to match the exact substring and nothing more

I have tried different functions including strtok(), strcmp() and strstr(), but I guess I'm missing something. Is there a way to match the exact substring in a string?
For example:
If I have a name: "Tan"
And I have 2 file names: "SomethingTan5346" and "nothingTangyrs634"
So how can I make sure that I match the first string and not both? Because the second file is for the person Tangyrs. Or is it impossible with this approach? Am I going at it the wrong way?

If, as seems to be the case, you just want to identify strings that have your text but are immediately followed by a digit, your best bet is probably to get yourself a good regular expression implementation and just search for Tan[0-9].
It could be done simply by using strstr() to find the string and then checking the character following it with isdigit(), but the actual code to do that would be:
not as easy as you think, since you may have to do multiple searches (e.g., TangoTangoTan42 would need three checks); and
inadvisable if there's a chance the searches may become more complex (such as Tan followed by 1-3 digits or exactly two # characters and an X).
A regular expression library will make this much easier, provided you're willing to invest a little effort into learning about it.
If you don't want to invest the time in learning regular expressions, the following complete test program should be a good starting point to evaluate a string based on the requirements in the first paragraph:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int hasSubstrWithDigit(const char *lookFor, const char *searchString) {
    // Cache length and set initial search position.
    size_t lookLen = strlen(lookFor);
    const char *foundPos = searchString;
    // Keep looking for string until none left.
    while ((foundPos = strstr(foundPos, lookFor)) != NULL) {
        // If at end, no possibility of following digit.
        if (strlen(foundPos) == lookLen) return 0;
        // If followed by digit, return true. The cast keeps isdigit()
        // well-defined for negative char values.
        if (isdigit((unsigned char)foundPos[lookLen])) return 1;
        // Otherwise keep looking, from next character.
        foundPos++;
    }
    // Not found, return false.
    return 0;
}
int main(int argc, char *argv[]) {
    if (argc < 3) {
        printf("Usage: testprog <lookFor> <searchIn>...\n");
        return 1;
    }
    for (int i = 2; i < argc; ++i) {
        printf("Result of looking for '%s' in '%s' is %d\n", argv[1], argv[i], hasSubstrWithDigit(argv[1], argv[i]));
    }
    return 0;
}
Though, as you can see, it's not as elegant as a regex search, and is likely to become even less elegant if your requirements change :-)
Running that with:
./testprog Tan xyzzyTan xyzzyTan7 xyzzyTangy4 xyzzyTangyTan12
shows it in action:
Result of looking for 'Tan' in 'xyzzyTan' is 0
Result of looking for 'Tan' in 'xyzzyTan7' is 1
Result of looking for 'Tan' in 'xyzzyTangy4' is 0
Result of looking for 'Tan' in 'xyzzyTangyTan12' is 1
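For comparison, here is a rough sketch of the regular expression route, assuming a POSIX system where regcomp(3) and regexec(3) from <regex.h> are available. The helper name contains_match is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text contains a match for the POSIX
 * extended regex pattern, 0 if it does not, -1 if the pattern fails to
 * compile. */
int contains_match(const char *pattern, const char *text) {
    regex_t re;
    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

Calling contains_match("Tan[0-9]", name) covers the original requirement, and a later change such as "Tan followed by 1-3 digits" only means changing the pattern string, for example to "Tan[0-9]{1,3}".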

The solution depends on your definition of exact matching.
This might be useful for you:
Traverse all matches of the target substring.
C find all occurrences of substring
Finding all instances of a substring in a string
find the count of substring in string
https://cboard.cprogramming.com/c-programming/73365-how-use-strstr-find-all-occurrences-substring-string-not-only-first.html
etc.
Having the span of the match, verify that the previous and following characters match/do not match your criterion for "exact match".
Or,
You could take advantage of regex in C++ (I know the tag is "C"), with #include <regex>, or POSIX #include <regex.h>.
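A sketch of the first approach (walk every occurrence, then check the neighbouring characters) might look like the following. The names find_exact and not_followed_by_letter are invented for this example, and "not followed by a letter" is only one possible definition of an exact match.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Example criterion (an assumption): the match must not be followed by a
 * letter. prev and next are the neighbouring characters, or 0 at either
 * end of the text. */
int not_followed_by_letter(int prev, int next) {
    (void)prev;   /* the preceding character is ignored by this criterion */
    return !isalpha(next);
}

/* Hypothetical helper: returns the first occurrence of name in text whose
 * neighbours satisfy boundary_ok, or NULL if there is none. */
const char *find_exact(const char *text, const char *name,
                       int (*boundary_ok)(int prev, int next)) {
    size_t nameLen = strlen(name);
    if (nameLen == 0) return NULL;
    for (const char *hit = strstr(text, name); hit != NULL;
         hit = strstr(hit + 1, name)) {
        int prev = (hit == text) ? 0 : (unsigned char)hit[-1];
        int next = (unsigned char)hit[nameLen];   /* 0 at end of text */
        if (boundary_ok(prev, next))
            return hit;
    }
    return NULL;
}
```

Passing a different predicate changes what "exact" means without touching the search loop.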

You may want to use strstr(3) to search a substring in a string, strchr(3) to search a character in a string, or even regular expressions with regcomp(3).
You should read more about parsing techniques, notably about recursive descent parsers. In some cases, sscanf(3) with %n can also be handy. You should take care of the return count.
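As an illustration of sscanf(3) with %n, here is a small sketch that assumes the names have the shape "letters followed by a number", as in SomethingTan5346. That shape, and the helper name parse_name_number, are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: parses "<letters><number>" such as "SomethingTan5346".
 * name must have room for 64 bytes. Returns 1 and fills name and number if
 * the whole string has that shape, otherwise 0. */
int parse_name_number(const char *s, char *name, int *number) {
    int consumed = 0;
    /* %63[A-Za-z] reads at most 63 letters. %n stores how many characters
     * have been consumed so far and does not count in the return value. */
    if (sscanf(s, "%63[A-Za-z]%d%n", name, number, &consumed) != 2)
        return 0;
    /* Accept only if nothing is left over after the number. */
    return s[consumed] == '\0';
}
```

After a successful call the caller can check whether name ends in "Tan". Checking the return count is what separates "nothing parsed" from "only the letters parsed".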
You could loop to read and then parse every line, perhaps using getline(3).
You need first to document your input file format (or your file name conventions, if SomethingTan5346 is some file path), perhaps using EBNF notation.
(you probably want to combine several approaches I am suggesting above)
BTW, I recommend limiting (for your convenience) file paths to a restricted set of characters. For example using * or ; or spaces or tabs in file paths is possible (see path_resolution(7)) but should be frowned upon.

How to find tokens from a c file?

I am trying to generate tokens from a C source file. I have split the C file into an array line and stored the words of the entire file in an array words.
The problem is with the strtok() function, which is splitting the line on whitespace characters. Because of this, I am not getting certain delimiters like parentheses and brackets because there is no whitespace between them and other tokens.
How do I determine which one is an identifier and which one is an operator?
Code so far:
int main()
{
    /* ... */
    char line[300][200];
    char delim[]=" \n\t";
    char *words[1000];
    char *token;
    while (fgets(&line[i][0], 100, fp1) != NULL)
    {
        token = strtok(&line[i][0], delim);
        while (token != NULL)
        {
            words[j++] = token;
            token = strtok(NULL, delim);
        }
        i++;
    }
    for(i = 0; i < 50; i++)
    {
        printf("%s\n", words[i]);
    }
    return 0;
}

This is a tricky question, one that probably needs more depth than a StackOverflow answer. I'll try, nonetheless.
Tokenizing the input is the first part of the compilation process. The objective is to simplify the task of the parser, which is going to build an abstract syntax tree from the contents of the file. How do we simplify this? We recognize the tokens that have a special meaning, as well as identifiers, operators and so on. C is indeed a tricky, complex language. Let's simplify the language to tokenize: we'll start with a typical calculator.
An input example would be:
( 4 +5)* 2
When the syntax is free-form, you can add or skip spaces, so, as you have already found out, splitting on spaces is not an option.
The tokenized output for the example above would be: LPAR, LIT, OP, LIT, RPAR, OP, LIT. The meaning goes as follows:
LPAR: Left parenthesis
RPAR: Right parenthesis
LIT: Literal (a number)
OP: Operator (say: +, -, * and /).
The complete output would therefore be:
{ LPAR, LIT(4), OP('+'), LIT(5), RPAR, OP('*'), LIT(2) }
Your lexer basically has to advance through the input string, character by character, using a state machine. For example, when you read a digit, you enter the "input literal" state, in which only further digits and '.' are allowed.
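To make that concrete, here is a minimal sketch of such a lexer for the calculator language. The type and function names (Token, TokenType, tokenize) are made up for this example, and strtod() stands in for a hand-written "input literal" state; note that strtod() also accepts forms such as exponents, which a stricter lexer would reject.

```c
#include <ctype.h>
#include <stdlib.h>

typedef enum { TOK_LPAR, TOK_RPAR, TOK_LIT, TOK_OP } TokenType;

typedef struct {
    TokenType type;
    double value;   /* meaningful when type == TOK_LIT */
    char op;        /* meaningful when type == TOK_OP  */
} Token;

/* Hypothetical helper: fills out[] with at most max tokens.
 * Returns the token count, or -1 on an unexpected character or overflow. */
int tokenize(const char *s, Token *out, int max) {
    int n = 0;
    while (*s != '\0') {
        if (isspace((unsigned char)*s)) { s++; continue; }   /* skip blanks */
        if (n == max) return -1;
        Token t = { TOK_OP, 0.0, '\0' };
        if (*s == '(') { t.type = TOK_LPAR; s++; }
        else if (*s == ')') { t.type = TOK_RPAR; s++; }
        else if (*s == '+' || *s == '-' || *s == '*' || *s == '/') {
            t.op = *s++;
        }
        else if (isdigit((unsigned char)*s) || *s == '.') {
            /* "Input literal" state: let strtod consume the number. */
            char *end;
            t.type = TOK_LIT;
            t.value = strtod(s, &end);
            if (end == s) return -1;   /* e.g. a lone '.' */
            s = end;
        }
        else return -1;                /* character not in the language */
        out[n++] = t;
    }
    return n;
}
```

For the input ( 4 +5)* 2 this should produce the seven tokens listed above, with the numbers stored in value and the operators in op.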
Now the parser has an easier task. If you feed it the tokens above, it does not have to skip spaces or distinguish between a negative number and a minus operator; it can just advance through a list or array. It can act according to the type of each token, and some tokens carry associated data, as you can see.
This is only an introduction to the introduction, anyway. Information about the whole compilation process could fill a book, and there are actually many books devoted to this topic, such as the famous "Dragon book" by Aho, Sethi & Ullman. A more recent one is the "Tiger book".
Finally, lexers are quite similar to one another, so it is possible to find generic lexers out there. You can even find the C grammar for that kind of tool.
Hope this (somehow) helps.

Identifying a prefix that is also a suffix of the same string

E.g.:
maabcma is valid because it contains ma as a proper prefix as well as a proper suffix.
panaba is not.
How do I find out in C whether a word is valid in this sense?
I'm not very good at string operations, so please help me out with some pseudocode.
Thanks in advance.
I'm completely lost. T=number of test cases.
EDIT: New code. My best code so far-
#include<stdio.h>
#include<string.h>
void main()
{
    int i,T,flag=0;
    int j,k,len=0;
    char W[10],X[10];
    scanf("%d",&T);
    for(i=0;i<T;i++)
    {
        scanf("%s",W);
        for(len=0;W[len]!='\0';len++)
            X[len]=W[len];
        X[len]='\0';
        for(j=len-1;j>=0;j--)
            for(k=0;k<len;k++)
            {
                if(X[k]!=W[j])
                    flag=0;
                else if((j-k)==(len-1))
                    flag==1;
            }
        if (flag == 1)
            printf("NICE\n");
        else
            printf("NOT\n");
    }
}
Still not getting the proper results. Where am I going wrong?

The problem is that you only set the value of flag when a match exists; when there is no match you must set it back to 0. For example, take:
pammbap
Here the prefix is pam and the suffix is bap.
According to the final for loop,
p and a match, so flag is set to 1,
but when it comes to b and m, flag does not go back to zero. Hence, it returns true.

First, void is not a valid return type for main, unless you are developing for Plan 9.
Second, you should get into the habit of checking the return value of scanf() and all input functions in general. You can't rely on the value of T if the user does not input a number, because T is uninitialised. On that same note, you shouldn't use scanf with an unbounded %s scan operation. If the user enters 20 characters, this isn't going to fit into the ten character buffer that you have. An alternative approach is to use fgets to get a whole line of text at once, or, to use a bounded scan operation. If your array fits 10 characters (including the null terminator) then you can use scanf("%9s", W).
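A small sketch of that advice, assuming each line has already been read with fgets into a buffer. The helper names parse_count and parse_word are made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the number if line starts
 * with an integer, otherwise 0 (count is then left untouched). */
int parse_count(const char *line, int *count) {
    return sscanf(line, "%d", count) == 1;
}

/* Hypothetical helper: copies at most 9 characters of the first word in
 * line into word (10 bytes including the terminator).
 * Returns 1 if a word was found, otherwise 0. */
int parse_word(const char *line, char word[10]) {
    return sscanf(line, "%9s", word) == 1;
}
```

A typical call sequence would be fgets(line, sizeof line, stdin) followed by parse_count(line, &testCount), stopping with an error message when either one fails. Note that a longer word is silently truncated to 9 characters here, so a stricter program may want to reject it instead.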
Third, single-character variable names are often very hard to understand. Instead of W, use word, instead of T, use testCount or something similar. This means that someone looking at your code for the first time can more easily work out what each variable is used for.
Most importantly, think about the process in your head, and maybe jot it down on paper. How would you solve this problem yourself? As an example, starting with n = 1,
Take the first n characters from the string.
Compare it to the last n characters from the string
Do they match?
If yes, print out the first n characters as the suffix and stop processing.
If no, increment n and try again. Try until n is in the middle of the string.
There are a few other things to think about as well, do you want the biggest match? For example, in the input string ababcdabab, the prefix ab is also the suffix, but the same can be said about abab. In this case, you don't want to stop processing, you want to keep going even if you find a prefix, so, you should just store the length of the largest prefix that is also the suffix.
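The steps above, including keeping the longest match, can be sketched as follows. The function name longest_prefix_suffix is made up, and the sketch only tries lengths up to the middle of the string, as described, so the prefix and suffix never overlap.

```c
#include <string.h>

/* Hypothetical helper: length of the longest prefix of word that is also
 * a suffix, trying n = 1 .. len/2. Returns 0 if there is none. */
size_t longest_prefix_suffix(const char *word) {
    size_t len = strlen(word);
    size_t best = 0;
    for (size_t n = 1; n <= len / 2; n++) {
        /* Compare the first n characters with the last n characters. */
        if (memcmp(word, word + len - n, n) == 0)
            best = n;   /* keep going: a longer match may still exist */
    }
    return best;
}
```

A word is then "valid" when the result is non-zero: maabcma gives 2 (ma), panaba gives 0, and ababcdabab gives 4 (abab).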
Second-most-importantly, running into hurdles like this is incredibly common when learning C, so don't let this put a dampener on your enthusiasm, just keep trying!

Switching numbers inside string

I have a string that contains:
2#0.88315#1#1.5005#true#0.112 and it keeps going...
I need to change every number that is 2 or bigger to 1,
so I wrote this:
for (i = 0 ; i < strlen(data) ; i++)
{
    if (data[i] >= 50 && data[i] <= 57) // If it's a number
    {
        data[i] = '1'; // switch it to one
        while (data[i] >= 48 && data[i] <= 57)
        {
            i++;
        }
    }
}
The problem is that it also turns numbers like 0.051511 into 1.111111,
because it doesn't treat a double as one number but looks at every digit separately.
How can I do it?
Thanks

To clarify the question since it is unclear, you want to have the following input:
"2#0.88315#1#1.5005#true#0.112"
To be modified to be the following:
"1#0.88315#1#1#true#0.112"
Your problem is that you need to parse each number into a float value to do any sort of comparison. Either this, or you will need to manually parse it by checking for a '.' character. Doing it manually is rigid, error-prone and unnecessary because the C standard library provides functions which can help you.
Since this is homework, I'll give you some tips on how to approach this problem instead of the actual solution. What you should do is try to write a solution with these steps and if you get stuck, edit the original question with the code you wrote, where it is failing and why you think it is failing.
Your first step is to tokenise the input into the following:
"2"
"0.88315"
"1"
"1.5005"
"true"
"0.112"
This can be done by iterating through the string and either splitting it or taking a pointer to the character just after each '#'. Splitting the string can be done with strtok. However, strtok splits the string by modifying it, which is not necessarily needed in our case. The simpler method is to iterate through the string and stop each time just after a '#' character is reached. The input would then be tokenised to the following:
"2#0.88315#1#1.5005#true#0.112"
"0.88315#1#1.5005#true#0.112"
"1#1.5005#true#0.112"
"1.5005#true#0.112"
"true#0.112"
"0.112"
Some of these substrings do not start with a string which represents a float. You will need to determine which of them do. To do this, you can attempt to parse the front of each string as a float. This can be done with sscanf. After parsing the floats, you will be able to do the comparison you want to.
You are changing the length of the string, so when replacing a float value with a '1' you need to check the length of the original value. If it is longer than 1 character, you will have to shift the subsequent characters towards the front. For example:
"3.423#1"
If you parsed the first token and found it to be > 2, you would replace the first character with a '1'. This results in:
"1.423#1"
You then still need to delete the rest of that token by shifting the rest of the string down to get:
"1#1"

It looks like you're comparing a char and an int in your if statements.
You should figure out why this matters and compensate for it.

You're comparing the characters in the string one at a time. If you need to consider everything between the "#" symbols as one number, this won't work. Try to get these numbers into an array, convert them to double (a cast will not do this; use something like strtod), and then do your comparison against 2.
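One way the per-field approach from these answers could be put together is sketched below. It uses strtod() rather than sscanf() because strtod() reports where parsing stopped, and it follows the question's wording: only fields whose value is 2 or greater are replaced. The function name replace_large_fields is made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rewrites data in place so that every '#'-separated
 * field that is entirely a number >= 2 becomes "1". */
void replace_large_fields(char *data) {
    char *p = data;
    while (*p != '\0') {
        size_t fieldLen = strcspn(p, "#");   /* length of the current field */
        char *end;
        double value = strtod(p, &end);
        /* Numeric only if strtod consumed the whole, non-empty field. */
        if (fieldLen > 0 && end == p + fieldLen && value >= 2.0) {
            p[0] = '1';
            /* Shift the rest of the string (including '\0') down. */
            memmove(p + 1, p + fieldLen, strlen(p + fieldLen) + 1);
            fieldLen = 1;
        }
        p += fieldLen;
        if (*p == '#') p++;   /* step over the separator */
    }
}
```

Fields such as true are left alone because strtod() consumes nothing from them. The string must be writable (an array, not a string literal), since it is modified in place.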

Reading a string is not going properly

Hey guys, I'm working on a program that gets a postfix expression and calculates it.
I have two functions:
Converts infix to postfix
Calculate the postfix
When I try small expressions, like 1+1 or (1+1)*1, it works fine,
but when I use all the operators I get something nasty.
Here is the example:
2*2/2+1-1
gets something like:
222/*11-+T_CHECKÖÐ7?█Ã
As you can see, the expression is right until the 'T'.
I believe it's some parameter mistake, so I'll put the header and return values here:
1st)
char* convert(char *infix);
char *post = (char *)malloc(sizeof(char)*tamP);
return post;
2nd)
int evaluate(char *postfix)
while (*postfix != '\0')
return result;
Caller)
char* post = convert(infix);
result = evaluate(post);
Thanks

That kind of weird string looks more like a buffer overflow error. You are likely overwriting the null-terminator, so when the string is printed (or later used), it keeps going until it finds one, examining random program memory until it gets there.
Check that all of your string manipulations are correct.

It is possible that you are not adding the '\0' character at the end of 'post' (after the last sensible character) in the convert(char*) function. That's one reason I can think of.
Try setting the complete string to '\0' before you do anything with it:
memset(post, 0, tamP);
should do.
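Since the body of convert() is not shown in the question, the following is only a stand-in that illustrates the allocation and termination pattern both answers point at: reserve one extra byte for the terminator and write '\0' right after the last character produced. The function strip_spaces is hypothetical; tamP is reused from the question for the buffer size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for convert(): returns a newly allocated copy of
 * src without spaces, always null-terminated. The caller must free() the
 * result. Returns NULL if allocation fails. */
char *strip_spaces(const char *src) {
    size_t tamP = strlen(src) + 1;   /* +1 leaves room for '\0' */
    char *post = malloc(tamP);
    if (post == NULL) return NULL;

    size_t k = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] != ' ')
            post[k++] = src[i];
    }
    post[k] = '\0';   /* terminate right after the last character written */
    return post;
}
```

Using calloc(tamP, 1) instead of malloc would have the same effect as the memset suggestion, but the explicit terminator is still worth keeping so the code does not depend on the buffer having been zeroed.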