Substrings in the middle of a String in C - c

I need to extract substrings that are between Strings I know.
I have something like char string[] = "abcdefg";
I know what I need is between "c" and "f", then my return should be "de".
I know the strncpy() function but do not know how to apply it in the middle of a string.
Thank you.

Here's a full, working example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char string[] = "abcdefg";
char from[] = "c";
char to[] = "f";
char *first = strstr(string, from);
if (first == NULL) {
first = &string[0];
} else {
first += strlen(from);
}
char *last = strstr(first, to);
if (last == NULL) {
last = &string[strlen(string)];
}
char *sub = calloc(last - first + 1, sizeof(char));
strncpy(sub, first, last - first);
printf("%s\n", sub);
free(sub);
return 0;
}
You can check it at this ideone.
Now, the explanation:
1.
char string[] = "abcdefg";
char from[] = "c";
char to[] = "f";
Declarations of strings: main string to be checked, beginning delimiter, ending delimiter. Note these are arrays as well, so from and to could be, for example, cd and fg, respectively.
2.
char *first = strstr(string, from);
Find the occurrence of the beginning delimiter in the main string. Note that strstr finds the first occurrence; if you need the last one (for example, if you had the string abcabc and wanted a substring starting from the second a), the search would need to be different.
3.
if (first == NULL) {
first = &string[0];
} else {
first += strlen(from);
}
Handle the situation in which the first delimiter doesn't appear in the string. In that case, the substring starts at the beginning of the entire string. If the delimiter does appear, we advance the pointer by the length of from, because the substring must begin after the first delimiter (correction thanks to #dau_sama).
Depending on your specifications, this may or may not be needed, or another result might be expected.
4.
char *last = strstr(first, to);
Find the occurrence of the ending delimiter in the main string. Note that strstr finds the first occurrence.
As noted by #dau_sama, it's better to search for the ending delimiter starting from first, not from the beginning of the entire string. This avoids matching a to that appears earlier than from.
5.
if (last == NULL) {
last = &string[strlen(string)];
}
Handle the situation in which the second delimiter doesn't appear in the string. In that case, the substring runs to the end of the string, so we take a pointer to the terminating null character.
Again, depending on your specifications, this may or may not be needed, or another result might be expected.
6.
char *sub = calloc(last - first + 1, sizeof(char));
strncpy(sub, first, last - first);
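Allocate sufficient memory and extract the substring based on the pointers found earlier. We copy last - first (the length of the substring) characters, beginning at first. strncpy does not add a terminator here, but calloc zero-fills the buffer, so the extra byte already terminates the result.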
7.
printf("%s\n", sub);
Here's the result.
I hope it does present the problem with enough details. Depending on your exact specifications, you may need to alter this somehow. For example, if you needed to find all substrings, and not just the first one, you may want to make a loop for finding first and last.
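The loop mentioned above could look roughly like the following. This is only a sketch: print_all_between is a hypothetical helper name, it assumes both delimiters are non-empty, and it stops at the first opening delimiter that has no closing one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the answer above): prints every substring
   of s that lies between an occurrence of from and the next occurrence of to,
   and returns how many were found. Assumes from and to are non-empty. */
int print_all_between(const char *s, const char *from, const char *to)
{
    int count = 0;
    const char *first = strstr(s, from);
    while (first != NULL) {
        first += strlen(from);                   /* skip the opening delimiter */
        const char *last = strstr(first, to);    /* search only after it */
        if (last == NULL) {
            break;                               /* no closing delimiter left */
        }
        printf("%.*s\n", (int)(last - first), first);
        count++;
        first = strstr(last + strlen(to), from); /* continue after this match */
    }
    return count;
}
```

Calling print_all_between("abcdefg abcdefg", "c", "f") would print de twice, once for each pair of delimiters.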

TY guys, worked using the form below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *between_substring(char *str, char from, char to){
while(*str && *str != from)
++str;//skip
if(*str == '\0')
return NULL;
else
++str;
char *ret = malloc(strlen(str)+1);
char *p = ret;
while(*str && *str != to){
*p++ = *str++;//To the end if `to` do not exist
}
*p = 0;
return ret;
}
int main (void){
char source[] = "abcdefg";
char *target;
target = between_substring(source, 'c', 'f');
printf("%s\n", source);
if (target != NULL) {
printf("%s\n", target);
free(target);
}
return 0;
}

Since people seemed to not understand my approach in the comments, here's a quick hacked together stub.
const char* string = "abcdefg";
const char* b = "c";
const char* e = "f";
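//look for the first pattern
const char* begin = strstr(string, b);
if(!begin)
return NULL;
begin += strlen(b); //skip past the start pattern
//look for the end pattern
const char* end = strstr(begin, e);
if(!end)
return NULL;
char result[MAXLENGTH]; //MAXLENGTH must be larger than end-begin
strncpy(result, begin, end-begin);
result[end-begin] = '\0'; //strncpy does not terminate the copy here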

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *between(const char *str, char from, char to){
while(*str && *str != from)
++str;//skip
if(*str == '\0')
return NULL;
else
++str;
char *ret = malloc(strlen(str)+1);
char *p = ret;
while(*str && *str != to){
*p++ = *str++;//To the end if `to` do not exist
}
*p = 0;
return ret;
}
int main(void){
const char* string = "abcdefg";
char *substr = between(string, 'c', 'f');
if(substr!=NULL){
puts(substr);
free(substr);
}
return 0;
}

Related

Split string by a substring

I have following string:
char str[] = "A/USING=B)";
I want to split to get separate A and B values with /USING= as a delimiter
How can I do it? I know strtok(), but it only splits on single-character delimiters.
As others have pointed out, you can use strstr from <string.h> to find the delimiter in your string. Then either copy the substrings or modify the input string to split it.
Here's an implementation that returns the second part of a split string. If the string can't be split, it returns NULL and the original string is unchanged. If you need to split the string into more substrings, you can call the function on the tail repeatedly. The first part will be the input string, possibly shortened.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *split(char *str, const char *delim)
{
char *p = strstr(str, delim);
if (p == NULL) return NULL; // delimiter not found
*p = '\0'; // terminate string after head
return p + strlen(delim); // return tail substring
}
int main(void)
{
char str[] = "A/USING=B";
char *tail;
tail = split(str, "/USING=");
if (tail) {
printf("head: '%s'\n", str);
printf("tail: '%s'\n", tail);
}
return 0;
}
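To illustrate calling the function on the tail repeatedly, here is a minimal sketch. It repeats split() from above so that it compiles on its own; print_parts is a hypothetical helper name, and the sketch assumes a non-empty delimiter and a writable input array.

```c
#include <stdio.h>
#include <string.h>

/* split() as defined in the answer above, repeated so the sketch compiles. */
char *split(char *str, const char *delim)
{
    char *p = strstr(str, delim);
    if (p == NULL) return NULL;   /* delimiter not found */
    *p = '\0';                    /* terminate string after head */
    return p + strlen(delim);     /* return tail substring */
}

/* Hypothetical helper: prints each piece of str and returns the count.
   str is modified in place; delim is assumed to be non-empty. */
int print_parts(char *str, const char *delim)
{
    int count = 0;
    char *head = str;
    while (head != NULL) {
        char *tail = split(head, delim);  /* head is now terminated */
        printf("part %d: '%s'\n", count, head);
        count++;
        head = tail;                      /* NULL when no delimiter is left */
    }
    return count;
}
```

With char str[] = "A/USING=B/USING=C", print_parts(str, "/USING=") would print the three parts A, B and C in turn.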
I know strtok(), but it only splits on single-character delimiters.
Nopes, it's not.
As per the man page for strtok(), (emphasis mine)
char *strtok(char *str, const char *delim);
[...] The delim argument specifies a set of bytes that delimit the tokens in the parsed string. [...] A sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter. [...]
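So the delimiter argument need not be "one character" as you've mentioned. You can pass a string, like "/USING=" in your case, to get the job done; keep in mind that each byte of that string is treated as a delimiter on its own, so this only works when none of those bytes appear inside the tokens.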
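A small sketch can make the "set of bytes" behaviour from the man page concrete. count_tokens is a hypothetical helper, not part of any library; it simply counts what strtok() returns.

```c
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() produces for buf.
   buf is modified in place, so it must be a writable array. */
int count_tokens(char *buf, const char *delims)
{
    int count = 0;
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        count++;
    }
    return count;
}
```

For "abcd/USING=efgh" with the delimiter argument "/USING=" this yields two tokens (abcd and efgh). For "abUcd/USING=efgh" it yields three (ab, cd and efgh), because the single U is a delimiter byte too.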
Here is a little function to do this. It works exactly like strtok_r except that the delimiter is taken as a delimiting string, not a list of delimiting characters.
char *strtokstr_r(char *s, char *delim, char **save_ptr)
{
char *end;
if (s == NULL)
s = *save_ptr;
if (s == NULL || *s == '\0')
{
*save_ptr = s;
return NULL;
}
// Skip leading delimiters.
while (strstr(s,delim)==s) s+=strlen(delim);
if (*s == '\0')
{
*save_ptr = s;
return NULL;
}
// Find the end of the token.
end = strstr (s, delim);
if (end == NULL)
{
*save_ptr = s + strlen(s);
return s;
}
// Terminate the token and make *SAVE_PTR point past it.
memset(end, 0, strlen(delim));
*save_ptr = end + strlen(delim);
return s;
}
Dude, this answer is only valid for this one input; if it were "abUcd/USING=efgh", your algorithm wouldn't work.
The only answer that is valid for me is the strstr-based split() function shown above.
See this. I got this when I searched for your question on google.
In your case it will be:
#include <stdio.h>
#include <string.h>
int main (int argc, char* argv [])
{
char theString [16] = "abcd/USING=efgh";
char theCopy [16];
char *token;
strcpy (theCopy, theString);
token = strtok (theCopy, "/USING=");
while (token)
{
printf ("%s\n", token);
token = strtok (NULL, "/USING=");
}
return 0;
}
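This passes /USING= as the delimiter argument. Note that strtok treats each of its characters (/, U, S, I, N, G and =) as a separate delimiter, so this only works when the tokens contain none of those characters.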
The output of this was:
abcd
efgh
If you want to check, you can compile and run it online over here.

Tokenize a c string

I'm trying to get the string that comes after /xxx/. There must be two forward slashes, followed by the string that I need to extract.
Here is my code, but I don't know where to set the null terminator, there is a math problem here
char str[100] = "/709/usr/datapoint/nviTemp1";
char *tmp;
char token[100];
tmp = strchr(str+1, '/');
size_t len = (size_t)(tmp-str)+1;
strncpy(token, str+len, strlen(str+len));
strcat(token,"\0");
I want to extract whatever after /709/ which is usr/datapoint/nviTemp1
Note that /709/ is variable and it could be any size but for sure there will be two forward slashes.
Simple improvement:
char str[100] = "/709/usr/datapoint/nviTemp1";
char *tmp;
char token[100];
tmp = strchr(str+1, '/');
if (tmp != NULL) strncpy(token, tmp + 1, sizeof(token)); else token[0] = '\0';
tmp points to the slash after "/709", so what you want is right after there.
You don't have to calculate the length manually.
Moreover, strncpy copies at most (third argument) characters, so it should be the length of the destination buffer.
If you know for sure that the string starts with "/NNN/", then it is simple:
char str[100] = "/709/usr/datapoint/nviTemp1";
char token[100];
strcpy(token, str+5); // str+5 is the first char after the second slash.
If you need to get everything after the second slash:
char str[100] = "/709/usr/datapoint/nviTemp1";
char token[100];
char* slash;
slash = strchr(str, '/'); // Expect that slash == str
slash = strchr(slash+1, '/'); // slash is second slash.
strcpy(token, slash+1); // slash+1 is the string after the second slash.
You can use a counter and a basic while loop:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
if (argc < 2) return 0;
char *s = argv[1];
char slash = '/';
int slash_count = 0;
while (slash_count < 2 && *s)
if (*s++ == slash) slash_count++;
printf("String: %s\n", s);
return 0;
}
Then you can do whatever you want with the s pointer, like duplicating it with strdup or using strcat, strcpy, etc...
Outputs:
$ ./draft /709/usr/datapoint/nviTemp1
String: usr/datapoint/nviTemp1
$ ./draft /70905/usr/datapoint/nviTemp1
String: usr/datapoint/nviTemp1
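As a sketch of "doing whatever you want with the s pointer", the loop can be wrapped in a function and the remainder duplicated. Both function names here are hypothetical, and malloc plus memcpy stands in for strdup, which is POSIX rather than part of older ISO C standards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the character after the second
   '/' in s, or to the terminating '\0' if there are fewer than two slashes. */
const char *after_second_slash(const char *s)
{
    int slash_count = 0;
    while (slash_count < 2 && *s) {
        if (*s++ == '/') slash_count++;
    }
    return s;
}

/* Returns a heap copy of the remainder, or NULL if malloc fails.
   The caller must free() the result. */
char *copy_after_second_slash(const char *s)
{
    const char *rest = after_second_slash(s);
    size_t len = strlen(rest);
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, rest, len + 1);  /* includes the terminating '\0' */
    }
    return copy;
}
```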
You can make use of sscanf-
sscanf(str,"%*[/0-9]%s",token);
/* '/709/' is read and discarded and remaining string is stored in token */
token will contain string "usr/datapoint/nviTemp1" .
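One caveat with the call above is that %s has no length limit. A hedged variant with a field width might look like this; extract_after_prefix is a hypothetical wrapper, and the 100-byte buffer size is taken from the question. Note that %s also stops at whitespace, and the scanset would swallow digits at the start of the wanted string.

```c
#include <stdio.h>

/* Hypothetical wrapper: extracts the text after a leading run of '/' and
   digits into token, which must hold at least 100 bytes.
   Returns 1 on success, 0 otherwise. */
int extract_after_prefix(const char *str, char token[100])
{
    token[0] = '\0';
    /* %*[/0-9] skips slashes and digits without storing them; %99s stores
       at most 99 non-whitespace characters plus the terminating '\0'. */
    return sscanf(str, "%*[/0-9]%99s", token) == 1;
}
```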

Simpler way to extract occurrences without using strtok in C

I'm trying to extract the strings before and after the first comma from the given string. However, I feel there's got to be a better way than what I have below, perhaps I don't even need the strdup calls. Thanks
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int extract_names(const char *str)
{
char *name, *last, *p1, *p2, *p3;
name = strdup(str);
last = strdup(str);
p1 = strchr(name, ',');
if (p1)
{
*p1 = '\0';
printf("%s\n", name);
}
p2 = strchr(last, ',');
p2++;
if (p2)
{
p3 = strpbrk(p2 + 1, " \0");
if (p3)
*p3 = '\0';
printf("%s\n", p2);
}
free(name);
free(last);
return 0;
}
int main()
{
// strings should at least contain last,name.
// but can contain several words
const char *str1 = "jones,bob age,12";
extract_names(str1);
const char *str2 = "smith,peter";
extract_names(str2);
return 0;
}
Output
jones
bob
smith
peter
Use strchr to find the limits of the last and first names. Then you can use the precision specifier in printf to print just the part of the string you are interested in.
For example:
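int extract_names(const char *str)
{
const char *comma = strchr(str, ',');
if (!comma) {
return -1; /* no comma: nothing to split */
}
/* look for the space after the comma, not from the start */
const char *name_end = strchr(comma + 1, ' ');
/* name ends at space or end of string */
if (!name_end) {
name_end = str + strlen(str);
}
/* print last name; the precision argument must be an int */
printf("%.*s\n", (int)(comma - str), str);
/* print first name, excluding the comma itself */
printf("%.*s\n", (int)(name_end - comma - 1), comma + 1);
return 0;
}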
Since you're obviously trying to learn, I'll only give you a few pointers and not a "better working solution".
strdup is a useful idea here. As you do, it allows you to overwrite the string (const char *str is "readonly").
This combination:
p2 = strchr(last, ',');
p2++;
if (p2) {
is wrong. After the call, p2 equals the return value of strchr. If you advance it before the if, you're not testing anything: if NULL was returned, the pointer is no longer NULL by the time you test it.
You don't need two strdups, and you don't need to search twice. p1 already points to the right place in name, you can use that for the rest of your logic.
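For readers who want to see those pointers applied, one possible shape is sketched below. It is an illustration under assumptions rather than the answerer's own solution: it returns -1 when there is no comma, and malloc plus memcpy plays the role of strdup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: one writable copy and one search for the comma.
   Returns 0 on success, -1 if there is no comma or allocation fails. */
int extract_names(const char *str)
{
    size_t len = strlen(str);
    char *name = malloc(len + 1);      /* plays the role of strdup(str) */
    if (name == NULL) return -1;
    memcpy(name, str, len + 1);

    char *p1 = strchr(name, ',');
    if (p1 == NULL) {                  /* test before advancing the pointer */
        free(name);
        return -1;
    }
    *p1 = '\0';                        /* name now holds the last name */
    char *first = p1 + 1;              /* first name starts after the comma */
    char *space = strchr(first, ' ');
    if (space != NULL) *space = '\0';  /* cut at the first space, if any */

    printf("%s\n%s\n", name, first);
    free(name);
    return 0;
}
```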

How to remove \n or \t from a given string in C?

How can I strip a string with all \n and \t in C?
This works in my quick and dirty tests. Does it in place:
#include <stdio.h>
void strip(char *s) {
char *p2 = s;
while(*s != '\0') {
if(*s != '\t' && *s != '\n') {
*p2++ = *s++;
} else {
++s;
}
}
*p2 = '\0';
}
int main() {
char buf[] = "this\t is\n a\t test\n test";
strip(buf);
printf("%s\n", buf);
}
And to appease Chris, here is a version which places the result in a newly malloced buffer and returns it (thus it'll work on literals). You will need to free the result.
char *strip_copy(const char *s) {
char *p = malloc(strlen(s) + 1);
if(p) {
char *p2 = p;
while(*s != '\0') {
if(*s != '\t' && *s != '\n') {
*p2++ = *s++;
} else {
++s;
}
}
*p2 = '\0';
}
return p;
}
If you want to replace \n or \t with something else, you can use the function strstr(). It returns a pointer to the first place in a string where a given substring occurs, or NULL if it does not occur. For example:
// Find the first "\n".
char new_char = 't';
char* pFirstN = strstr(szMyString, "\n");
if (pFirstN != NULL) {
*pFirstN = new_char;
}
You can run that in a loop to find all \n's and \t's.
If you want to "strip" them, i.e. remove them from the string, you'll need to actually use the same method as above, but copy the contents of the string "back" every time you find a \n or \t, so that "this i\ns a test" becomes: "this is a test".
You can do that with memmove (not memcpy, since the src and dst are pointing to overlapping memory), like so:
char* temp;
// Remove \n.
while ((temp = strstr(str, "\n")) != NULL) {
// len is the length of the string from the \n onward, including the \n.
size_t len = strlen(temp);
// Shift the tail and its terminating '\0' one position to the left.
memmove(temp, temp + 1, len);
}
You'll need to repeat this loop again to remove the \t's.
Note: Both of these methods work in-place. This might not be safe! (Read Evan Teran's comments for details.) Also, these methods are not very efficient, although they do utilize a library function for some of the code instead of rolling your own.
Basically, you have two ways to do this: you can create a copy of the original string, minus all '\t' and '\n' characters, or you can strip the string "in-place." However, I bet money that the first option will be faster, and I promise you it will be safer.
So we'll make a function:
char *strip(const char *str, const char *d);
We want to use strlen() and malloc() to allocate a new char * buffer the same size as our str buffer. Then we go through str character by character. If the character is not contained in d, we copy it into our new buffer. We can use something like strchr() to see if each character is in the string d. Once we're done, we have a new buffer, with the contents of our old buffer minus characters in the string d, so we just return that. I won't give you sample code, because this might be homework, but here's the sample usage to show you how it solves your problem:
char *string = "some\n text\t to strip";
char *stripped = strip(string, "\t\n");
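For reference, the approach described above can be sketched as follows. This is one possible reading of that description, not the answerer's own code; it returns NULL if malloc fails and leaves freeing the result to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the strip() described above: returns a newly allocated copy of
   str without any character that appears in d, or NULL if malloc fails.
   The caller must free() the result. */
char *strip(const char *str, const char *d)
{
    char *out = malloc(strlen(str) + 1);  /* result is never longer than str */
    if (out == NULL) return NULL;
    char *p = out;
    for (; *str != '\0'; str++) {
        if (strchr(d, *str) == NULL) {    /* keep characters not listed in d */
            *p++ = *str;
        }
    }
    *p = '\0';
    return out;
}
```

Because the input is only read, this version also works when str is a string literal.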
strpbrk is a C string function that finds the first character in s that also appears in accept and returns a pointer to that position, or NULL if none is found.
#include <string.h>
char *strpbrk(const char *s, const char *accept);
Example:
char search[] = "a string with \t and \n";
char *first_occ = strpbrk( search, "\t\n" );
first_occ will point to the \t, the 15th character in search. You can replace it and then call strpbrk again in a loop until all occurrences have been replaced.
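The replace-and-call-again loop could be sketched like this. replace_any is a hypothetical helper name; it overwrites each match in place, so the string keeps its length.

```c
#include <string.h>

/* Hypothetical helper: replaces every character of s that appears in accept
   with replacement, and returns the number of replacements made. */
int replace_any(char *s, const char *accept, char replacement)
{
    int count = 0;
    char *hit = strpbrk(s, accept);
    while (hit != NULL) {
        *hit = replacement;
        count++;
        hit = strpbrk(hit + 1, accept);  /* resume after the replaced char */
    }
    return count;
}
```

For the search array above, replace_any(search, "\t\n", ' ') would turn both the tab and the newline into spaces.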
I like to make the standard library do as much of the work as possible, so I would use something similar to Evan's solution but with strspn() and strcspn().
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SPACE " \t\r\n"
static void strip(char *s);
static char *strip_copy(char const *s);
int main(int ac, char **av)
{
char s[] = "this\t is\n a\t test\n test";
char *s1 = strip_copy(s);
strip(s);
printf("%s\n%s\n", s, s1);
return 0;
}
static void strip(char *s)
{
char *p = s;
int n;
while (*s)
{
n = strcspn(s, SPACE);
memmove(p, s, n); /* source and destination may overlap, so not strncpy */
p += n;
s += n + strspn(s+n, SPACE);
}
*p = 0;
}
static char *strip_copy(char const *s)
{
char *buf = malloc(1 + strlen(s));
if (buf)
{
char *p = buf;
char const *q;
int n;
for (q = s; *q; q += n + strspn(q+n, SPACE))
{
n = strcspn(q, SPACE);
strncpy(p, q, n);
p += n;
}
*p++ = '\0';
buf = realloc(buf, p - buf);
}
return buf;
}

Parsing a string with tokens for the first and last words (in C)

I'm going to try to explain the problem.
I am getting a string containing a registry key. For example:
HKEY_CURRENT_USER\Software\MyProgram\SomeOtherValue\SomeKey
now, I need to parse that string into 3 different char (or char *) variables. After the parsing it'll be something like:
string1 = HKEY_CURRENT_USER
string2 = \Software\MyProgram\SomeOtherValue\ /* with the '\' */
string3 = SomeKey
Not only do I need to group the backslashes; I also don't know how many of them are there. I could have something like:
HKEY_CURRENT_USER\Software\SomeKey
or something like:
HKEY_CURRENT_USER\Software\SomeValue\SomeOthervalue\Someblah\SomeKey
I tried with strtok() and strcspn() but i'm getting very confused here...
Any idea how to get this done?
Code is appreciated.
Thanks!
Pseudo-Code:
Step 1: Scan forward until the first "\", note the index.
Step 2: Scan Backward from the end to the last "\"
(the first "\" encountered when going backwards), note the index.
Step 3: StrCpy the relevant pieces out into 3 strings.
Code: (does not rely on strrchr, or other methods you seem to have issues with)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void ParseRegEntry(char* regKey, char** TopLevel, char** Path, char** Key);
int main(void)
{
char* regKey = "HKEY_CURRENT_USER\\Software\\MyProgram\\SomeOtherValue\\SomeKey";
char* TopLevel;
char* Path;
char* Key;
ParseRegEntry(regKey, &TopLevel, &Path, &Key);
printf("1: %s\n2: %s\n3: %s\n", TopLevel, Path, Key);
free(TopLevel);
free(Path);
free(Key);
return 0;
}
void ParseRegEntry(char* regKey, char** TopLevel, char** Path, char** Key)
{
int firstDelimiter = 0;
int lastDelimiter = strlen(regKey)-1;
int keyLen;
while(regKey[firstDelimiter] != '\\')
{
firstDelimiter++;
}
while(regKey[lastDelimiter] != '\\')
{
lastDelimiter--;
}
keyLen = strlen(regKey) - lastDelimiter-1;
*TopLevel = (char*)malloc(firstDelimiter+1);
strncpy(*TopLevel, regKey, firstDelimiter);
(*TopLevel)[firstDelimiter] = '\0';
*Path = (char*)malloc(lastDelimiter - firstDelimiter+2);
strncpy(*Path, regKey+firstDelimiter, lastDelimiter - firstDelimiter+1); /* include the trailing '\' */
(*Path)[lastDelimiter-firstDelimiter+1] = '\0';
*Key = (char*)malloc(keyLen+1);
strncpy(*Key, regKey+lastDelimiter+1, keyLen);
(*Key)[keyLen] = '\0';
}
strchr(const char*, int) : locate first occurrence of char in string
strrchr(const char*, int) : locate last occurrence of char in string
char* str = "HKEY_CURRENT_USER\\Software\\MyProgram\\SomeOtherValue\\SomeKey";
char token1[SIZE], token2[SIZE], token3[SIZE];
char* first = strchr(str, '\\');
char* last = strrchr(str, '\\')+1;
strncpy(token1, str, first-str);
token1[first-str] = '\0';
strncpy(token2, first, last-first);
token2[last-first] = '\0';
strcpy(token3, last);
We use strchr to find the first '\', and strrchr to find the last '\'. We then copy to token1, token2, token3 based on those positions.
I decided to just use fixed size buffers instead of calloc-ing, because that's not so important to illustrate the point. And I kept messing it up. :)
Copy the string into an allocated one and split the variable placing a '\0' in the slash where you want to truncate it.
You can "scan" the string for slashes using the strchr function.
void to_split(char *original, int first_slash, int second_slash, char **first, char **second, char **third) {
int i;
char *first_null;
char *second_null;
char *allocated;
if (first_slash >= second_slash)
return;
allocated = malloc(strlen(original) + 1);
*first = allocated;
strcpy(allocated, original);
for (i = 0, first_null = allocated; i < first_slash && (first_null = strchr(first_null,'\\')); i++);
if (first_null) {
*first_null = '\0';
*second = first_null + 1;
}
second_null = allocated + strlen(original);
i = 0;
while (i < second_slash && second_null > allocated)
i += *second_null-- == '\\';
if (++second_null > allocated) {
*second_null = '\0';
*third = second_null + 1;
}
}
Usage:
int main (int argc, char **argv) {
char *toSplit = "HKEY_CURRENT_USER\\Software\\MyProgram\\SomeOtherValue\\SomeKey";
char *first;
char *second;
char *third;
to_split(toSplit, 1, 3, &first, &second, &third);
printf("%s %s %s\n", first, second, third);
return 0;
}
It isn't the best code in the world, but it gives you an idea.
Here's an example using strchr and strrchr to scan forwards and backwards in the string for the '\'.
char str[] = "HKEY_CURRENT_USER\\Software\\MyProgram\\SomeOtherValue\\SomeKey";
char *p, *start;
char root[128], path[128], key[128];
p = strchr (str, '\\');
strncpy (root, str, p - str);
root[p - str] = '\0'; /* strncpy does not terminate the copy here */
start = p;
p = strrchr (str, '\\') + 1;
strncpy (path, start, p - start);
path[p - start] = '\0';
strcpy (key, p);
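A more defensive variant of the same strchr/strrchr idea might check for a missing backslash and for buffer sizes. The following is only a sketch; split_reg_key and its return convention are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical helper: splits a registry path into root, path (including
   its leading and trailing backslashes) and key name. Each output buffer
   has size bytes. Returns 0 on success, -1 if there is no backslash or a
   part does not fit. */
int split_reg_key(const char *str, char *root, char *path, char *key, size_t size)
{
    const char *first = strchr(str, '\\');
    if (first == NULL) return -1;
    const char *last = strrchr(str, '\\') + 1;  /* one past the last backslash */

    size_t root_len = (size_t)(first - str);
    size_t path_len = (size_t)(last - first);
    size_t key_len = strlen(last);
    if (root_len >= size || path_len >= size || key_len >= size) return -1;

    memcpy(root, str, root_len);
    root[root_len] = '\0';
    memcpy(path, first, path_len);
    path[path_len] = '\0';
    memcpy(key, last, key_len + 1);             /* includes the '\0' */
    return 0;
}
```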

Resources