checking if a string with asterisk(*) is present within another string - c

I am working on a program to check if a particular string is present in the given string: that is if one string is sub string of another string.
For example:
1)String: YoungPeople --> Substring to be checked: ungPeo
The output should return true.
2)String: Hello How are You? --> Substring to be checked: l*are
The output should return true.
I have used the naive based searching algorithm and it works perfectly fine for the first input.
But I am having trouble in the second kind of input where the asterisk(*) is present which should be treated as a regular expression: i.e. matches zero or more characters.
How should I check for the sub string having an * sign?
Should I try to use the same naive algorithm for searching the character before * and for the string after it? Or is there a better approach to solve this problem?

How should i check for the sub string having an * sign?
Upon reading a *, you need to try both options 1 and 2 below.
... use the same naive algorithm for searching ... is there a better approach ...?
There are better methods. A recursive one follows.
[Edit note: 6/10 found/fixed bug]
As you progress through the string, use recursion to check the rest of the string.
The * simply allows for 2 candidate paths:
1) advance the str
2) advance the substr
Otherwise, a matching char allows advancing both.
#include <stdbool.h>

// StarCompare() helper function: matches pat at the start of str
bool StarCmp(const char *str, const char *pat) {
    if (*pat == '\0') return 1;
    if (*pat == '*') {
        if (*str) {
            // advance str and use the * again
            if (StarCmp(str + 1, pat)) return 1;
        }
        // let * match nothing and advance to the next pattern
        return StarCmp(str, pat + 1);
    }
    if (*pat == *str) {
        return StarCmp(str + 1, pat + 1);
    }
    return 0;
}

// Try the pattern at every starting position of str
bool StarCompare(const char *str, const char *pat) {
    if (!str || !pat) return 0;
    do {
        if (StarCmp(str, pat)) return 1;
    } while (*str++);
    return 0;
}
[Edit Test code in previous version]

The GNU Regex Library seems like what you are looking for. If you are not familiar with regular expression, check this site.
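As a rough sketch of the regex route, the snippet below uses the POSIX regcomp()/regexec() interface from <regex.h>, which the GNU C library also provides. The helper name star_regex_search is made up for illustration, and the translation rule (each * becomes .*, every other regex metacharacter is escaped) is an assumption about what the asker wants, not something the regex library does for you.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treats '*' in pat as "any run of characters" and
   every other character literally, then searches str with regexec(). */
bool star_regex_search(const char *str, const char *pat)
{
    /* Worst case: every input character expands to two output characters. */
    char *re = malloc(2 * strlen(pat) + 1);
    if (re == NULL)
        return false;

    char *out = re;
    for (const char *p = pat; *p != '\0'; p++) {
        if (*p == '*') {
            *out++ = '.';               /* '*' in the input means ".*" */
            *out++ = '*';
        } else {
            if (strchr(".[]()+?{}|^$\\", *p) != NULL)
                *out++ = '\\';          /* escape regex metacharacters */
            *out++ = *p;
        }
    }
    *out = '\0';

    regex_t compiled;
    bool found = false;
    if (regcomp(&compiled, re, REG_EXTENDED | REG_NOSUB) == 0) {
        /* An unanchored regex already searches the whole string. */
        found = regexec(&compiled, str, 0, NULL, 0) == 0;
        regfree(&compiled);
    }
    free(re);
    return found;
}
```

Called as star_regex_search("Hello How are You?", "l*are"), this should report a match, because the generated expression l.*are is searched anywhere in the string.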

Here is what you have to do:
Split the search string by the * character
Look for each of the parts (in the correct order) in the string you are searching
Alternatively, you can use regexes as other people have suggested.
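The two steps above could be sketched as follows. The function name star_substring is hypothetical, and the sketch assumes that * is the only special character and that the pattern may match anywhere in the string. Taking the leftmost occurrence of each piece is enough here, because an earlier match never leaves less room for the pieces that follow.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if every '*'-separated piece of pat
   occurs in str, in order and without overlapping. */
bool star_substring(const char *str, const char *pat)
{
    const char *pos = str;              /* where the next piece may start */

    while (*pat != '\0') {
        size_t len = strcspn(pat, "*"); /* length of the piece before the next '*' */

        if (len > 0) {
            /* Find the leftmost occurrence of the piece at or after pos. */
            const char *hit = NULL;
            for (const char *p = pos; (p = strchr(p, pat[0])) != NULL; p++) {
                if (strncmp(p, pat, len) == 0) {
                    hit = p;
                    break;
                }
            }
            if (hit == NULL)
                return false;
            pos = hit + len;            /* later pieces must start after this one */
        }
        pat += len;
        if (*pat == '*')
            pat++;                      /* skip the separator */
    }
    return true;
}
```

With the examples from the question, star_substring("YoungPeople", "ungPeo") and star_substring("Hello How are You?", "l*are") both return true.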

A good place to look for a well-written implementation of glob matching would be the bash sources. But here's a simple recursive implementation that works:
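#include <assert.h>

static int match_any_tail(const char *pattern, const char *str);

/* Match pattern against str, anchored at the start of str. */
static int
_glob_match(const char *pattern, const char *str)
{
    if (!*pattern) return 1;
    /* Check '*' before the end-of-string test so a trailing '*' can match nothing. */
    if (*pattern == '*') return match_any_tail(pattern + 1, str);
    if (!*str) return 0;
    if (*pattern != *str) return 0;
    return _glob_match(pattern + 1, str + 1);
}

/* Try the pattern at every suffix of str, including the empty one. */
static int
match_any_tail(const char *pattern, const char *str)
{
    do {
        if (_glob_match(pattern, str))
            return 1;
    } while (*str++);
    return 0;
}

int glob_match(const char *pattern, const char *str)
{
    return match_any_tail(pattern, str);
}

int
main(void)
{
    assert(glob_match("ungPeo", "YoungPeople"));
    assert(glob_match("l*are", "Hello How are You?"));
    return 0;
}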

Related

Finding indexes where substring is present

Right now my code checks whether the sub string is present in the string and returns true or false. I would like to find where these substrings are located in the total string. How can I implement that?
#include <stdio.h>
#include <stdbool.h>
bool checksub(const char *strng,const char *subs){
if (*strng=='\0' && *subs!='\0'){
return false;
}
if (*subs=='\0'){
return true;}
if (*strng==*subs){
return checksub(strng+1,subs+1);
}
return false;
}
bool lsub(char *strng,char *subs){
if (*strng=='\0'){
return false;
}
if (*strng==*subs){
if (checksub(strng,subs)){
return 1;
}
}
return lsub(strng+1,subs);
}
int main(){
printf("%d\n",checksub("ababuu","ab"));
printf("%d\n",checksub("the bed bug bites","bit"));
return 0;
}
First you should get rid of recursion since it's often slow and dangerous, for nothing gained.
A (naive) version of strstr that returns an index rather than a pointer might look like this:
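#include <stddef.h>

int strstr_index (const char* original, const char* sub)
{
    int index = -1;
    /* outer loop: try every starting position until a match is found */
    for(const char* str=original; *str!='\0' && index==-1; str++)
    {
        /* inner loop: compare sub against the text starting at str */
        for(size_t i=0; str[i]==sub[i] && str[i]!='\0'; i++)
        {
            if(sub[i+1] == '\0')
            {
                index = (int)(str - original);
                break;
            }
        }
    }
    return index;
}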
This returns -1 if not found, otherwise an index.
It iterates across the string one character at a time.
When a character match with the sub string is found, it starts executing the inner loop as well.
If the inner loop continues to find matches all the way to the end of the sub string, then we found a match.
The index can be obtained by pointer arithmetic: the start address of the found sub string minus the start of the string. The result of that subtraction is strictly speaking a special integer type called ptrdiff_t, but I used int to simplify the example.
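Since the question asks where the substrings (plural) are located, here is a minimal sketch that collects every starting index using the standard strstr(). The function name find_all_indexes and the caller-supplied hits buffer are assumptions made for this example; overlapping occurrences are reported, and an empty sub is treated as "no match".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores up to max_hits starting indexes of sub in
   original into hits[] and returns how many occurrences exist in total. */
size_t find_all_indexes(const char *original, const char *sub,
                        ptrdiff_t *hits, size_t max_hits)
{
    size_t count = 0;

    if (*sub == '\0')
        return 0;                       /* an empty needle would match everywhere */

    /* Restart the search one character after each hit to allow overlaps. */
    for (const char *p = strstr(original, sub); p != NULL; p = strstr(p + 1, sub)) {
        if (count < max_hits)
            hits[count] = p - original; /* pointer difference is a ptrdiff_t */
        count++;
    }
    return count;
}
```

If the return value is larger than max_hits, only the first max_hits indexes were stored, so the caller can retry with a bigger array.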

Function SpacePlug takes a pointer to a string and another char as arguments

void SpacePlug(char *StringPtr, char Ch)
{
int i = 0;
while (*(StringPtr + i)!= '\0')
{
if (*(StringPtr + i)== ' ')
{
*(StringPtr + i ) = '^^';
printf("%c",*(StringPtr + i));
}
i++;
}
}
int main()
{
char a[]= "Alton Tait";
SpacePlug(a,);
}
The function is supposed to replace each space in the string with that character. In main, I use SpacePlug.
I want to replace the space between "Alton" and "Tait" with ^^, so it should be Alton^^Tait.
That's what I came up with. I would like to know where I went wrong. Thank you.
This is the output i get when i try to compile your code using gcc:
In function 'SpacePlug':
8:33: warning: multi-character character constant [-Wmultichar]
*(StringPtr + i ) = '^^';
^
8:33: warning: overflow in implicit constant conversion [-Woverflow]
In function 'main':
17:17: error: expected expression before ')' token
SpacePlug(a,);
You should have included the error report in the question, so it's easier to see what's going on.
You've got a few problems in your code:
"^^" is not a character, but a string with 2 characters. '^' is a character. That's the reason for the "multi-character" warning.
You're not using "Ch" inside SpacePlug. The replacing character is hardcoded. It's always '^^', which is not a valid single character.
The function is not properly called in main. It's missing a parameter.
Now for the solution. What I understood is that "SpacePlug" tries to find all spaces inside a string, the first parameter, and replace them with a character, which is the second parameter. The following code will work just fine for that:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void SpacePlug(const char *StringPtr, char Ch, char *newString)
{
    size_t length = strlen(StringPtr);
    for (size_t i = 0; i < length; i++)
    {
        if (StringPtr[i] == ' ')
        {
            newString[i] = Ch;
        }
        else
        {
            newString[i] = StringPtr[i];
        }
    }
    newString[length] = '\0'; // terminate the copy so printf("%s") is safe
}
int main()
{
    const char *a = "Alton Tait";
    char replace = '^';
    char *newString = (char *)malloc(strlen(a) + 1); // the +1 is for the null terminator
    if (newString == NULL)
    {
        return 1;
    }
    SpacePlug(a, replace, newString);
    printf("%s\n", newString);
    free(newString);
}
Cheers.
'^^' is not a character. It is a multi-character constant, which is not portable.
Your code is fine for single-character replacements, i.e.
SpacePlug(a, '^');
You also need to move printing out of the if:
int i = 0;
while (*(StringPtr + i)!= '\0')
{
if (*(StringPtr + i)== ' ')
{
*(StringPtr + i ) = '^';
}
printf("%c", *(StringPtr + i));
i++;
}
Demo.
To make replacements for multiple characters you need an entirely different approach:
Pass char* for the replacement
Make sure the string has enough space to expand for the extra characters
Do the replacement in two passes
Compute the total length after replacement in the first pass
Starting from the back, perform replacements as you go, using the same space.
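A minimal sketch of the two-pass approach described above. The name space_plug_str and the explicit capacity parameter are assumptions made for this example; the caller has to tell the function how large the buffer really is. An empty replacement is rejected, because shrinking the string would need a front-to-back pass instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replaces every space in buf with the string rep,
   in place. capacity is the total size of buf in bytes. Returns false,
   leaving buf untouched, if rep is empty or the result would not fit. */
bool space_plug_str(char *buf, size_t capacity, const char *rep)
{
    size_t old_len = strlen(buf);
    size_t rep_len = strlen(rep);
    size_t spaces = 0;

    if (rep_len == 0)
        return false;

    /* Pass 1: count spaces to compute the length after replacement. */
    for (size_t i = 0; i < old_len; i++)
        if (buf[i] == ' ')
            spaces++;

    size_t new_len = old_len - spaces + spaces * rep_len;
    if (new_len + 1 > capacity)
        return false;

    /* Pass 2: fill from the back so unread input is never overwritten. */
    size_t dst = new_len;
    buf[dst] = '\0';
    for (size_t src = old_len; src-- > 0; ) {
        if (buf[src] == ' ') {
            dst -= rep_len;
            memcpy(buf + dst, rep, rep_len);
        } else {
            buf[--dst] = buf[src];
        }
    }
    return true;
}
```

For the question's example, declaring char a[32] = "Alton Tait"; and calling space_plug_str(a, sizeof a, "^^") leaves Alton^^Tait in a.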

Removing single dot path names in URL in C

I'm making a function in an apache module which is supposed to fix URLs that are thrown at it. Currently I'm trying to remove single dot path names.
For example, if my URL is:
http://example.com/1/./2/./3/./4.php
Then I want the URL to be:
http://example.com/1/2/3/4.php
However I'm stuck with the logic. I'm using pointers in an effort to make this function run as fast as possible. I'm confused at the logic I should apply at the lines with //? added to the end of them.
Can someone give me advice on how to proceed? Even if it's some hidden manual online? I searched Bing and Google for answers with no success.
static long fixurl(char *u){
char u1[10000];
char *u11=u1,*uu1=u;
long ct=0,fx=0;
while (*uu1){
*u11=*uu1;
if (*uu1=='/'){
ct++;
if (ct >=2){
uu1++;
break;
}
} else {
ct=0;
}
}
while (*uu1){
if (*uu1!='/') { //?
if (*uu1!='.') {
*u11=*uu1;
u11++;
} //?
} //?
uu1++;
}
*u11='\0';
strcpy(u,u1);
return fx;
}
You forget to look ahead one character here:
if (*uu1!='/') { //?
if (*uu1!='.') {
– you are checking the same character twice (against a 'not', so it could have some use, but your question marks indicate you are not sure what to do there and further on).
Note that you actually need to look ahead two characters. If you encounter a slash, test the next character for a . and the one after that for another /.
Rather than trying to fix your code (what is fx, the returned value, supposed to be?), I'd rewrite it from scratch to copy from source to dest and skip the offending sections. The continue makes sure that a sequence /1/././2 gets cleansed correctly to just /1/2 – it needs a chance to check the second slash again, so I just throw it back into the loop.
void fixurl (char *theUrl)
{
char *source, *dest;
source = dest = theUrl;
while (*source)
{
if (source[0] == '/' && source[1] == '.' && source[2] == '/')
{
source += 2; /* effectively, 'try again on the next slash' */
} else
{
*dest = *source;
source++;
dest++;
}
}
*dest = 0;
}
(Afterthought:)
Interestingly, adding proper support for removal of /../ is fairly trivial. If you test for that sequence, you should search backwards for the last / before it and reset dest to that position. You'll want to make sure the path is still valid, though.
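A sketch of that afterthought, under one loud assumption: the function receives only the path part of the URL (starting with '/', no scheme or host), so backing up can never eat into the host name. The name fixpath is made up for this example. A /../ at the very top of the path is simply dropped, which is one possible reading of "make sure the path is still valid".

```c
#include <string.h>

/* Hypothetical extension of fixurl(): collapses "/./" and "/../" segments
   in place. Assumes path holds only the path part of the URL. */
void fixpath(char *path)
{
    char *source = path, *dest = path;

    while (*source) {
        if (source[0] == '/' && source[1] == '.' && source[2] == '/') {
            source += 2;                /* drop "/." and retry on the next slash */
        } else if (source[0] == '/' && source[1] == '.' &&
                   source[2] == '.' && source[3] == '/') {
            source += 3;                /* drop "/.." and retry on the next slash */
            while (dest > path && *--dest != '/')
                ;                       /* back up over the previous segment */
        } else {
            *dest++ = *source++;
        }
    }
    *dest = '\0';
}
```

For instance, a buffer holding /1/2/../3 ends up as /1/3, and /../a ends up as /a because there is nothing above the root to remove.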
This code is untested. In short, it iterates over the string and looks for '/'. When it finds one, it checks the next two characters and, if they are '.' followed by '/', skips the "/." so that copying resumes at the following slash.
#include <string.h>
static long fixurl(char *u){
    char u1[10000];
    long currentIndex = 0;
    const char *src = u;
    while (*src != '\0' && currentIndex < (long)sizeof u1 - 1) {
        if (src[0] == '/' && src[1] == '.' && src[2] == '/') {
            src += 2; /* skip "/." and re-examine the following slash */
            continue;
        }
        u1[currentIndex++] = *src++;
    }
    u1[currentIndex] = '\0';
    strcpy(u, u1);
    return currentIndex; /* length of the cleaned URL */
}
here is a version of the code that works
Note it will remove all '.' that follow a '/'
However, it does not check for extraneous '/' characters being inserted into the output as the OPs posted code does not make that check.
Notice the proper formatting of the for() statement
Notice the use of meaningful names, removal of code clutter,
inclusion of a few key comments, etc
Notice the literal characters are placed on the left side of a comparison so writing a '=' when it should be '==' is caught by the compiler.
#include <string.h>
long fixurl( char * );
long fixurl(char *rawURL)
{
    char cookedURL[10000] = {'\0'}; // assure new string is terminated
    int currentIndex = 0;
    const char *source = rawURL;    // walk a copy so rawURL keeps pointing at the start
    if( '\0' == *source )
    {
        return 0;                   // nothing to do for an empty string
    }
    cookedURL[currentIndex] = *source;
    source++;
    for ( ; *source; source++)
    {
        // if prior saved char was / and current char is .
        // then skip current char
        if( ( '/' != cookedURL[currentIndex] )
            ||
            ( '.' != *source ))
        {
            // copy input char to out buffer
            currentIndex++;
            cookedURL[currentIndex] = *source;
        }
    } // end for
    // copy modified URL back to caller's buffer
    strcpy(rawURL, cookedURL);
    return currentIndex+1; // number of characters in modified buffer
} // end function: fixurl

Find Verbs in a String

I am trying (and having trouble) to write a program (In C) that accepts a string in the command line (eg. $ test.out "This is a string") and looks through the string to find verbs (and nouns, but if I figure out verbs, I can do nouns on my own).
A list of alphabetically sorted verbs is given in the file lexicon.h, and is what I am supposed to use as my dictionary.
I know how to accept the string from the command line and use that input to create an array of strings, each string itself being a separate word, and I already have a working program that can do that, and that I hope to use part of for this one.
I am supposed to create a function called binary_search(...stuffgoeshere...) and use that to search through the lexicon file and find the verb.
I would like some suggestions or guidance on how to create a function (binary_search) that can check to see if an already separated word matches any on the list in lexicon.h. I do not want someone to just write an answer, I would like to know why you are suggesting what you do. Hopefully I can learn something fun out of this!
I know it's messy, but this is what I have so far.
Also note that lexicon's verb array has 637 values (as seen when I make int size = 637)
This program does not compile anymore, as I have not yet figured out how to make the binary_search function work. I am trying to modify a binary search function used in an example for class; however, that one searched numbers in a text file, not strings of characters.
If there is anything else I should include, let me know. Thank you for your help!
#include <stdio.h>
#include <string.h>
#include "lexicon.h"
int binary_search(char word[], char verbs[][], int size);
int
main(int argc, char*argv[])
{
char word[80];
char str[80],
args[80][80];
int counter = 0,
a = 0,
i = 0,
index = 0,
t = 0;
while(str[a] != '\0')
{
if(str[a] == ' ')
{
args[index][i] = '\0';
i = 0;
a++;
index ++;
counter ++;
}
args[index][i++] = str[a++];
}
args[index][i] = '\0';
counter = counter + 1;
printf("\nThe verbs were: ");
int verbposition= -1;
int size = 637;
while(t<counter)
{
strcpy(word, args[t]);
verbposition = binary_search(word, verbs, size);
if(verbposition > -1)
printf("%s", args[t]);
t++;
}
return 0;
}
int
binary_search(char word[], char &verbs[][], int size)
{
int bottom = 0,
top = size - 1,
found = 0,
middle;
while(bottom <= top && !found)
{
middle = (bottom + top) / 2;
if(strcmp(word, verbs[middle]))
{
found = 1;
return = middle;
}
if(strcmp(word, verbs[middle]) > 0)
{
top = middle - 1;
}
else
bottom = middle + 1;
}
return -1;
}
You are on the right track. I would highly suggest you to use print statements as you will have a clear idea of where you are going wrong.
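For comparison, here is a minimal sketch of a binary search over sorted strings. It assumes the lexicon is an array of pointers to strings sorted in strcmp() order; the real declaration in lexicon.h is not shown in the question, so the parameter type may need adjusting. Three things differ from the posted attempt: strcmp() returns 0 (not a non-zero value) on a match, a negative result means the word sorts before the middle entry so the top bound moves down, and the match is returned with a plain return middle;.

```c
#include <string.h>

/* Returns the index of word in the sorted array table, or -1 if absent.
   table must be sorted in strcmp() order (an assumption about lexicon.h). */
int binary_search(const char *word, const char *const table[], int size)
{
    int bottom = 0;
    int top = size - 1;

    while (bottom <= top) {
        int middle = bottom + (top - bottom) / 2;
        int cmp = strcmp(word, table[middle]);

        if (cmp == 0)
            return middle;              /* strcmp() returns 0 on an exact match */
        if (cmp < 0)
            top = middle - 1;           /* word sorts before table[middle] */
        else
            bottom = middle + 1;        /* word sorts after table[middle] */
    }
    return -1;
}
```

Each pass halves the range still to be searched, so a 637-entry list needs at most about ten comparisons per word.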

Recursion problem in C

I've been trying to solve this problem for a few days now, but it seems I haven't grasped the concept of recursion yet.
I have to build a program in C (recursion is a must here but loops are allowed as well) which does the following:
The user inputs 2 different strings. For example:
String 1 - ABC
String 2 - DE
The program is supposed to print all the strings that can be formed by combining the two strings the user has entered.
The rule is that the inner order of the letters in each string (1 & 2) must be preserved.
This is the output for string1=ABC and string2=DE:
abcde
abdce
abdec
adbce
adbec
adebc
dabce
dabec
daebc
deabc
If anyone could give me a hand here, it would be great.
Thanks guys.
Here is a partial solution in Java: it should be instructive:
public class Join { // prints:
static void join(String s, String s1, String s2) { // ABCde
if (s1.isEmpty() || s2.isEmpty()) { // ABdCe
System.out.println(s + s1 + s2); // ABdeC
} else { // AdBCe
join(s + s1.charAt(0), s1.substring(1), s2); // AdBeC
join(s + s2.charAt(0), s1, s2.substring(1)); // AdeBC
} // dABCe
} // dABeC
public static void main(String[] args) { // dAeBC
join("", "ABC", "de"); // deABC
}
}
How it works
Basically you have String s, the "output stream", and String s1, s2, the "input stream". At every opportunity, you first take from s1, and later you try again and take from s2, exploring both options recursively.
If at any time either "input stream" is empty, then you're left with no other choice but take whatever's left (if any).
Here it is in C, based on the same idea #polygenelubricants used. It's not that I stole his idea, it's that this is a classical problem and this is the simplest approach :).
#include <stdio.h>
#include <string.h>
void solve(const char *str1, const char *str2,
const int length1, const int length2,
char *output, int pozOut, int pozIn1, int pozIn2)
{
if (pozIn1 == length1 && pozIn2 == length2)
{
printf("%s\n", output);
return;
}
if (pozIn1 < length1)
{
output[pozOut] = str1[pozIn1];
solve(str1, str2, length1, length2, output, pozOut + 1, pozIn1 + 1, pozIn2);
}
if (pozIn2 < length2)
{
output[pozOut] = str2[pozIn2];
solve(str1, str2, length1, length2, output, pozOut + 1, pozIn1, pozIn2 + 1);
}
}
int main()
{
char temp[100]; // big enough to hold a solution.
solve("ABC", "12", strlen("ABC"), strlen("12"), temp, 0, 0, 0);
return 0;
}
This can be improved. For example, how would you get rid of some of the parameters?
Also, this has a bug: you should make sure that output contains a '\0' at the end before printing it, otherwise you might get unexpected results. I'll leave that for you to fix.
I don't feel like I want to write down the whole algorithm. However, here are some leads that might help you.
Basically, you must merge two strings, keeping the characters order. It's like you have 2 stacks of possibly different sizes.
In your example:
stack #1: A B C
stack #2: D E
You also know that the resulting string will have as length the sum of the length of the two input strings. (So you know already how much length to allocate)
If you proceed character by character: each turn you can choose whether to pop one character from either stack #1 or stack #2, then continue. (Here is where the recursion could go.) If you unroll all the possible calls, you'll have all the resulting strings.
I used to like problems like that when I was in college: they can seem difficult sometimes, but it is so rewarding when you solve one by yourself!
Feel free to comment if you need more clues.
The same algorithm as IVlad, but dynamically allocating the result array, and using pointers rather than indexes making it a bit clearer I think.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void solve(const char* result, const char* x0, const char* x1, char* p) {
if (!*x0 && !*x1) printf("%s\n", result);
if (*x0) {
*p = *x0;
solve(result, x0 + 1, x1, p + 1);
}
if (*x1) {
*p = *x1;
solve(result, x0, x1 + 1, p + 1);
}
}
int main(int argc, char* argv[]) {
if (argc >= 3) {
size_t total_length = strlen(argv[1]) + strlen(argv[2]) + 1;
char *result = malloc(total_length);
if (result) {
result[total_length - 1] = '\0';
solve(result, argv[1], argv[2], result);
free(result);
}
}
return 0;
}

Resources