Can I use the strstr function to match an exact word? For example, let's say I have the word hello and an input string line:
if
char* line = "hellodarkness my old friend";
and I use
result = strstr(line, "hello");
result will match (be non-NULL). However, I want to match only the exact word "hello" (so that "hellodarkness" would not match), with result being NULL otherwise.
Is it possible to do this using strstr, or do I have to use fscanf to scan the line word by word and check for matches?
Here is a generic function for your purpose. It returns a pointer to the first match or NULL if none can be found:
#include <ctype.h>
#include <string.h>
char *word_find(const char *str, const char *word) {
const char *p = NULL;
size_t len = strlen(word);
if (len > 0) {
for (p = str; (p = strstr(p, word)) != NULL; p++) {
if (p == str || !isalnum((unsigned char)p[-1])) {
if (!isalnum((unsigned char)p[len]))
break; /* we have a match! */
p += len; /* next match is at least len+1 bytes away */
}
}
}
return (char *)p; /* cast away const, as strstr() itself does */
}
I would:
check whether the word occurs in the sentence
if found at the start (same pointer as line), add the length of the word and check whether an alphanumeric character follows. If not (or if the string ends there), then it is a match
if found anywhere else, add the extra "no alphanumeric character before" test
code:
#include <stdio.h>
#include <string.h> /* strstr, strlen */
#include <ctype.h>
int main()
{
const char* line = "hellodarkness my old friend";
const char *word_to_find = "hello";
char* p = strstr(line,word_to_find);
if ((p==line) || (p!=NULL && !isalnum((unsigned char)p[-1])))
{
p += strlen(word_to_find);
if (!isalnum((unsigned char)*p))
{
printf("Match\n");
}
}
return 0;
}
here it doesn't print anything, but insert a punctuation/space before/after or terminate the string after "hello" and you'll get a match. Also, you won't get a match by inserting alphanum chars before hello.
EDIT: the above code is nice when there's only 1 "hello" but fails to find the second "hello" in "hellohello hello". So we have to insert a loop to look for the word or NULL, advancing p each time, like this:
#include <stdio.h>
#include <string.h> /* strstr, strlen */
#include <ctype.h>
int main()
{
const char* line = " hellohello hello darkness my old friend";
const char *word_to_find = "hello";
const char* p = line;
for(;;)
{
p = strstr(p,word_to_find);
if (p == NULL) break;
if ((p==line) || !isalnum((unsigned char)p[-1]))
{
p += strlen(word_to_find);
if (!isalnum((unsigned char)*p))
{
printf("Match\n");
break; // found, quit
}
}
// substring was found, but no word match, move by 1 char and retry
p+=1;
}
return 0;
}
Since strstr() returns a pointer to the starting location of the substring that you want to identify, you can use strlen(result) to check whether it is part of a longer string or the isolated word that you are looking for. If strlen(result) == strlen("hello"), then the match ends exactly at the end of the string. If the match is followed by a space or punctuation (or some other delimiter), then it is also isolated at the end. You would also need to check that the start of the substring is at the beginning of the "long string" or is preceded by a blank, punctuation, or other delimiter.
I'm trying to write a C program that removes all occurrences of repeating chars in a string except the last occurrence. For example, if I had the string
char word[]="Hihxiivaeiavigru";
output should be:
printf("%s",word);
hxeavigru
What I have so far:
#include <stdio.h>
#include <string.h>
int main()
{
char word[]="Hihxiiveiaigru";
for (int i=0;i<strlen(word);i++){
if (word[i+1]==word[i]);
memmove(&word[i], &word[i + 1], strlen(word) - i);
}
printf("%s",word);
return 0;
}
I am not sure what I am doing wrong.
With short strings, any algorithm will do. OP's attempt is O(n*n) (as are the other working answers, including the one from @David C. Rankin that identified OP's shortcomings).
But what if the string was thousands or millions of characters long?
Consider the following algorithm: @paulsm4
Form a `bool` array used[CHAR_MAX - CHAR_MIN + 1] and set each false.
i,unique = n - 1;
From the end of the string (n-1 to 0) to the front:
if (character never seen yet) { // used[] look-up
array[unique] = array[i];
unique--;
}
Mark used[array[i]] as true (index from CHAR_MIN)
i--;
Shift the string "to the left" (unique - i) places
Solution is O(n)
Coding goal is too fun to just post a fully coded answer.
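For readers who want to compare notes afterwards, the steps above can be sketched roughly as follows. This is only one possible reading of the outline, not the answerer's code: the function name keep_last_occurrences is made up for illustration, the lookup table is indexed by unsigned char rather than from CHAR_MIN, and the comparison is case-sensitive (so 'H' and 'h' count as different characters, unlike the expected output in the question).

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: keep only the last occurrence of each character
 * in s, in place, in O(n) time. Case-sensitive by assumption. */
void keep_last_occurrences(char *s)
{
    bool used[UCHAR_MAX + 1] = { false }; /* one "seen" flag per character value */
    size_t n = strlen(s);
    size_t unique = n;                    /* kept characters occupy s[unique..n-1] */

    /* Walk from the end to the front; the first time a character is
     * seen in this direction is its last occurrence in the string. */
    for (size_t i = n; i-- > 0; ) {
        unsigned char c = (unsigned char)s[i];
        if (!used[c]) {
            used[c] = true;
            s[--unique] = s[i];           /* unique - 1 >= i, so unread data is never overwritten */
        }
    }

    /* Shift the kept characters (and the terminator) to the left. */
    memmove(s, s + unique, n - unique + 1);
}
```

Called on a writable buffer holding "Hihxiivaeiavigru", this leaves "Hhxeavigru"; folding case before the table lookup would be needed to get the question's "hxeavigru".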
I would first write a function to determine whether a char ch at a given position i is the last occurrence of ch in a given char *. Like,
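bool isLast(const char *word, char ch, int p) {
p++;
ch = (char)tolower((unsigned char)ch); /* cast: tolower() needs an unsigned char value */
while (word[p] != '\0') {
if (tolower((unsigned char)word[p]) == (unsigned char)ch) {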
return false;
}
p++;
}
return true;
}
Then you can use that to iteratively emit your desired characters like
int main() {
char *word = "Hihxiivaeiavigru";
for (int i = 0; word[i] != '\0'; i++) {
if (isLast(word, word[i], i)) {
putchar(word[i]);
}
}
putchar('\n');
}
And (for completeness) I used
#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>
Outputs (as requested)
hxeavigru
Additional areas where you are currently hurting yourself.
Your for loop must NOT increment the index unconditionally, e.g. for (int i=0; word[i];). This is because when you memmove() by 1, the remaining characters have already shifted down by one index. That also means the value to save for last is now i - 1.
There should be only one call to strlen() in the program. You can simply subtract one from the length each time memmove() is called.
Only increment your loop counter variable when memmove() is not called.
Additionally, avoid hardcoding strings. You shouldn't have to recompile your code just to test the results of "Hihxiivaeiaigrui" instead of "Hihxiivaeiaigru". You shouldn't have to recompile just to remove all but the last 'a' instead of the 'i'. Either pass the string and character to find as arguments to your program (that's what int argc, char **argv are for), or prompt the user for input.
Putting it all together, you could do (presuming word is 1023 characters or less):
#include <stdio.h>
#include <string.h>
#define MAXC 1024
int main (int argc, char **argv) {
char word[MAXC]; /* storage for word */
strcpy (word, argc > 1 ? argv[1] : "Hihxiivaeiaigru"); /* copy to word */
int find = argc > 2 ? *argv[2] : 'i', /* character to find */
last = -1; /* last index where find found */
size_t len = strlen (word); /* only compute strlen once */
printf ("%s (removing all but last %c)\n", word, find);
for (int i=0; word[i];) { /* loop over each char -- do NOT increment */
if (word[i] == find) { /* is this my character to find? */
if (last != -1) { /* if last is set */
/* overwrite last with rest of word */
memmove (&word[last], &word[last + 1], (int)len - last);
last = i - 1; /* last now i - 1 (we just moved it) */
len = len - 1;
}
else { /* last not set */
last = i; /* set it */
i++; /* increment loop counter */
}
}
else /* all other chars */
i++; /* just increment loop counter */
}
puts (word); /* output result -- no need for printf (no conversions) */
}
Example Use/Output
$ ./bin/rm_all_but_last_occurrence
Hihxiivaeiaigru (removing all but last i)
Hhxvaeaigru
What if you want to use "Hihxiivaeiaigrui"? Just pass it as the 1st argument:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui
Hihxiivaeiaigrui (removing all but last i)
Hhxvaeagrui
What if you want to use "Hihxiivaeiaigrui" and remove duplicate 'a' characters? Just pass the string to search as the 1st argument and the character to find as the second:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui a
Hihxiivaeiaigrui (removing all but last a)
Hihxiiveiaigrui
Nothing removed if only one of the characters:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui H
Hihxiivaeiaigrui (removing all but last H)
Hihxiivaeiaigrui
Let me know if you have further questions.
Im trying to write a C program that removes all occurrences of repeating chars in a string except the last occurrence.
Process the string (or word) from the last character toward the first. Viewed in that direction, the problem becomes removing every occurrence of a character except the first one seen. Because the string is processed back to front, the surviving characters accumulate at the end of the buffer, so once the whole string has been processed they must be moved to the start of the string if any duplicates were found. The complexity of this algorithm is O(n).
Implementation:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define INDX(x) (tolower(x) - 'a')
void remove_dups_except_last (char str[]) {
int map[26] = {0}; /* to keep track of a character processed */
size_t len = strlen (str);
char *p = str + len; /* pointer pointing to null character of input string */
size_t i = 0;
for (i = len; i != 0; --i) {
if (map[INDX(str[i - 1])] == 0) {
map[INDX(str[i - 1])] = 1;
*--p = str[i - 1];
}
}
/* copy back only if duplicate characters were removed */
if (p != str) {
for (i = 0; *p; ++i) {
str[i] = *p++;
}
str[i] = '\0';
}
}
int main(int argc, char* argv[])
{
if (argc != 2) {
printf ("Invalid number of arguments\n");
return -1;
}
char str[1024] = {0};
/* Assumption: the input string/word will contain characters A-Z and a-z
* only and size of input will not be more than 1023.
*
* Leaving it up to you to check the valid characters in input string/word
*/
strcpy (str, argv[1]);
printf ("Original string : %s\n", str);
remove_dups_except_last (str);
printf ("Removed duplicated characters except the last one, modified string : %s\n", str);
return 0;
}
Testcases output:
# ./a.out Hihxiivaeiavigru
Original string : Hihxiivaeiavigru
Removed duplicated characters except the last one, modified string : hxeavigru
# ./a.out aa
Original string : aa
Removed duplicated characters except the last one, modified string : a
# ./a.out a
Original string : a
Removed duplicated characters except the last one, modified string : a
# ./a.out TtYyuU
Original string : TtYyuU
Removed duplicated characters except the last one, modified string : tyU
You can iterate over each character of the string and copy it to a new string if it is not 'i' or if it is the last occurrence of 'i'.
#include <stdio.h>
#include <string.h>
int main() {
char word[]="Hihxiiveiaigru";
char newword[10000];
char* ptr = strrchr(word, 'i');
int index=0;
int index2=0;
while (index < strlen(word)) {
if (word[index]!='i' || index ==(ptr - word)) {
newword[index2]=word[index];
index2++;
}
index++;
}
newword[index2] = '\0'; /* terminate the copy before printing it */
printf("%s",newword);
return 0;
}
#include <stdio.h>
int
main() {
char string[] = "my name is geany";
int length = sizeof(string)/sizeof(char);
printf("%i", length);
int i;
for ( i = 0; i<length; i++ ) {
}
return 0;
}
If I want to print "my", "name", "is" and "geany" separately, what do I do? I was thinking of using a delimiter, but I don't know how to do it in C.
Start with a pointer to the beginning of the string.
Iterate character by character, looking for your delimiter.
Each time you find one, you have a substring that starts at the last start position and whose length is the difference between the two positions; do what you want with it.
Set the new start position to the delimiter + 1, and then go to step 2.
Do all this while there are characters remaining in the string...
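The steps above can be sketched as follows. This is a minimal illustration under a few assumptions: the function name print_pieces is hypothetical, the delimiter is a single character, empty pieces produced by repeated delimiters are skipped, and "do what you want with that" is taken to mean printing each piece on its own line.

```c
#include <stdio.h>

/* Hypothetical helper: print each delimiter-separated piece of s on its
 * own line without modifying s. Returns the number of pieces printed. */
int print_pieces(const char *s, char delim)
{
    const char *start = s;  /* step 1: start of the current piece */
    int count = 0;

    for (const char *p = s; ; p++) {      /* step 2: walk character by character */
        if (*p == delim || *p == '\0') {
            int len = (int)(p - start);   /* step 3: the piece is [start, p) */
            if (len > 0) {
                printf("%.*s\n", len, start);
                count++;
            }
            if (*p == '\0')
                break;                    /* no characters remain */
            start = p + 1;                /* step 4: resume after the delimiter */
        }
    }
    return count;
}
```

For the string in the question, print_pieces("my name is geany", ' ') prints the four words on separate lines. Because the input is never written to, this also works on string literals and const data.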
I needed to do this because the environment I was working in had a restricted library that lacked strtok. Here's how I broke up a hyphen-delimited string:
b = grub_strchr(a,'-');
if (!b)
<handle error>
else
*b++ = 0;
c = grub_strchr(b,'-');
if (!c)
<handle error>
else
*c++ = 0;
Here, a begins life as the compound string "A-B-C", after the code executes, there are three null-terminated strings, a, b, and c which have the values "A", "B" and "C". The <handle error> is a place-holder for code to react to missing delimiters.
Note that, like strtok, this modifies the original string by replacing the delimiters with null characters ('\0').
This breaks a string at newlines and trims whitespace for the reported strings. It does not modify the string like strtok does, which means this can be used on a const char* of unknown origin while strtok cannot. The difference is begin/end are pointers to the original string chars, so aren't null terminated strings like strtok gives. Of course this uses a static local so isn't thread safe.
#include <stdio.h> // for printf
#include <stdbool.h> // for bool
#include <ctype.h> // for isspace
static bool readLine (const char* data, const char** beginPtr, const char** endPtr) {
static const char* nextStart;
if (data) {
nextStart = data;
return true;
}
if (*nextStart == '\0') return false;
*beginPtr = nextStart;
// Find next delimiter.
do {
nextStart++;
} while (*nextStart != '\0' && *nextStart != '\n');
// Trim whitespace.
*endPtr = nextStart - 1;
while (*beginPtr < *endPtr && isspace((unsigned char)**beginPtr))
(*beginPtr)++;
while (*endPtr >= *beginPtr && isspace((unsigned char)**endPtr)) // check bounds before dereferencing
(*endPtr)--;
(*endPtr)++;
return true;
}
int main (void) {
const char* data = " meow ! \n \r\t \n\n meow ? ";
const char* begin;
const char* end;
readLine(data, 0, 0);
while (readLine(0, &begin, &end)) {
printf("'%.*s'\n", (int)(end - begin), begin); // the precision argument must be an int
}
return 0;
}
Output:
'meow !'
''
''
'meow ?'
use strchr to find the space.
store a '\0' at that location.
the word is now printfable.
repeat
start the search at the position after the '\0'
if nothing is found then print the last word and break out
otherwise, print the word, and continue the loop
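A rough sketch of this strchr loop, assuming the string is writable (an array, not a string literal) and that runs of spaces should not produce empty words. The name print_words_inplace is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split s in place at each space and print every
 * word. s must be writable. Returns the number of words printed. */
int print_words_inplace(char *s)
{
    int count = 0;

    for (;;) {
        char *space = strchr(s, ' ');  /* find the next space */
        if (space != NULL)
            *space = '\0';             /* terminate the current word there */
        if (*s != '\0') {              /* skip empty words from repeated spaces */
            puts(s);
            count++;
        }
        if (space == NULL)
            break;                     /* that was the last word */
        s = space + 1;                 /* continue after the stored '\0' */
    }
    return count;
}
```

With char string[] = "my name is geany"; a call to print_words_inplace(string) prints each word on its own line. Afterwards the buffer holds the words back to back, separated by '\0', so string itself now reads as just "my".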
Reinventing the wheel is often a bad idea. Learning to use the functions your implementation provides is also good training.
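#include <stdio.h>
#include <string.h>
/*
* `strtok` is not reentrant, so it's thread unsafe. On POSIX environment, use
* `strtok_r` instead.
*/
int f( char * s, size_t const n ) {
char * p;
int ret = 0;
(void)n; /* the length is not needed: strtok stops at the terminator */
/* pass s only on the first call; NULL continues with the same string */
for ( p = strtok( s, " " ); p != NULL; p = strtok( NULL, " " ) ) {
ret += puts( p );
}
return ret;
}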
I'm a beginner at C and I'm stuck on a simple problem. Here it goes:
I have a string formatted like this: "first1:second1\nsecond2\nfirst3:second3" ... and so on.
As you can see from the example, the first field is optional ([firstx:]secondx).
I need to get a resulting string which contains only the second field. Like this: "second1\nsecond2\nsecond3".
I did some research here on stack (string splitting in C) and I found that there are two main functions in C for string splitting: strtok (obsolete) and strsep.
I tried to write the code using both functions (plus strdup) without success. Most of the time I get some unpredictable result.
Better ideas?
Thanks in advance
EDIT:
This was my first try
int main(int argc, char** argv){
char * stri = "ciao:come\nva\nquialla:grande\n";
char * strcopy = strdup(stri); // since strsep and strtok both modify the input string
char * token;
while((token = strsep(&strcopy, "\n"))){
if(token[0] != '\0'){ // I don't want the last match of '\n'
char * sub_copy = strdup(token);
char * sub_token = strtok(sub_copy, ":");
sub_token = strtok(NULL, ":");
if(sub_token[0] != '\0'){
printf("%s\n", sub_token);
}
}
free(sub_copy);
}
free(strcopy);
}
Expected output: "come", "va", "grande"
Here's a solution with strcspn:
#include <stdio.h>
#include <string.h>
int main(void) {
const char *str = "ciao:come\nva\nquialla:grande\n";
const char *p = str;
while (*p) {
size_t n = strcspn(p, ":\n");
if (p[n] == ':') {
p += n + 1;
n = strcspn(p , "\n");
}
if (p[n] == '\n') {
n++;
}
fwrite(p, 1, n, stdout);
p += n;
}
return 0;
}
We compute the size of the initial segment not containing : or \n. If it's followed by a :, we skip over it and get the next segment that doesn't contain \n.
If it's followed by \n, we include the newline character in the segment. Then we just need to output the current segment and update p to continue processing the rest of the string in the same way.
We stop when *p is '\0', i.e. when the end of the string is reached.
I'm trying to parse the string below in a good way so I can get the sub-string stringI-wantToGet:
const char *str = "Hello \"FOO stringI-wantToGet BAR some other extra text";
str will vary in length but always same pattern - FOO and BAR
What I had in mind was something like:
const char *str = "Hello \"FOO stringI-wantToGet BAR some other extra text";
char *probe, *pointer;
probe = str;
while(probe != '\n'){
if(probe = strstr(probe, "\"FOO")!=NULL) probe++;
else probe = "";
// Nulterm part
if(pointer = strchr(probe, ' ')!=NULL) pointer = '\0';
// not sure here, I was planning to separate it with \0's
}
Any help would be appreciated.
I had some time on my hands, so there you are.
#include <string.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
int getStringBetweenDelimiters(const char* string, const char* leftDelimiter, const char* rightDelimiter, char** out)
{
// find the left delimiter
const char* beginning = strstr(string, leftDelimiter);
if(beginning == NULL)
return 1; // left delimiter not found
// offset the beginning by the length of the left delimiter, so beginning points _after_ the left delimiter
beginning += strlen(leftDelimiter);
// find the right delimiter, searching only after the left one
const char* end = strstr(beginning, rightDelimiter);
if(end == NULL)
return 2; // right delimiter not found
// get the length of the substring
ptrdiff_t segmentLength = end - beginning;
// allocate memory and copy the substring there
*out = malloc(segmentLength + 1);
if(*out == NULL)
return 3; // allocation failed
strncpy(*out, beginning, segmentLength);
(*out)[segmentLength] = 0;
return 0; // success!
}
int main()
{
char* output;
if(getStringBetweenDelimiters("foo FOO bar baz quaz I want this string BAR baz", "FOO", "BAR", &output) == 0)
{
printf("'%s' was between 'FOO' and 'BAR'\n", output);
// Don't forget to free() 'output'!
free(output);
}
}
In a first loop, scan until you find your first delimiter string. Set an anchor pointer there.
If found, from the anchor pointer, in a second loop, scan until you find your second delimiter string or reach the end of the string.
If not at the end of the string, copy the characters between the anchor pointer and the second pointer (plus whatever adjustments for spaces, etc. you need).
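One way to sketch these steps without allocating is shown below. It is only an illustration: the function name find_between is hypothetical, strstr stands in for the two hand-written scanning loops, and the space adjustment is handled by including the spaces in the delimiter strings passed by the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate the text between left and right in s.
 * Returns a pointer to its first character and stores its length in
 * *len, or returns NULL if either delimiter is missing. The result
 * points into s and is NOT null-terminated at *len. */
const char *find_between(const char *s, const char *left,
                         const char *right, size_t *len)
{
    const char *anchor = strstr(s, left);     /* first scan: left delimiter */
    if (anchor == NULL)
        return NULL;
    anchor += strlen(left);                   /* anchor just past the delimiter */

    const char *end = strstr(anchor, right);  /* second scan starts at the anchor */
    if (end == NULL)
        return NULL;

    *len = (size_t)(end - anchor);
    return anchor;
}
```

For the string in the question, calling it with left "\"FOO " and right " BAR" yields a pointer to stringI-wantToGet with a length of 17, which can be printed with printf("%.*s\n", (int)len, p) or copied out with memcpy.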