Set a string to a substring in C

I have a long string that I want to strip off the end. I want to get rid of everything after the character "<" (inclusively). Here is the code that works:
char *end;
end = strchr(mystring, '<');
mystring[strlen(mystring) - strlen(end)] = '\0';
So if mystring was
"asdfjk234klsjadflnwer023jokmnasdf</tag>alskjdflk23<tag2>akjsldfjsdf</tag2>blabla"
this code would return
"asdfjk234klsjadflnwer023jokmnasdf"
I'm wondering if this can be done in an easier way? I know I can increment a counter over each character in mystring until I find "<" and then use that int as the index, but that seems equally troublesome. None of the other built-in string functions seem useful, but I'm sure I'm just looking at this the wrong way. I haven't used C for years.
Any help is appreciated!

Sure. This is the idiomatic way to do it:
char *end;
end = strchr(mystring, '<');
if (end)
*end = '\0';

How about *end = '\0'; rather than the mystring[strlen... part?

char *end;
end = strchr(mystring, '<');
if (end != NULL)
*end = '\0';

strchr() returns a pointer to the character (if found), so:
if(end)
*end = '\0';
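For comparison, here is a minimal sketch of the same truncation using strcspn, which returns the length of the initial segment that contains none of the listed characters. When the character is absent it returns the full length, so the write just lands on the existing terminator and no NULL check is needed. The helper name truncate_at is made up for illustration; it is not a library function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cut s at the first occurrence of ch, if any.
   strcspn returns strlen(s) when ch is absent, so the write then
   simply overwrites the existing terminator. */
static char *truncate_at(char *s, char ch)
{
    assert(s != NULL);
    const char reject[2] = { ch, '\0' };
    s[strcspn(s, reject)] = '\0';
    return s;
}
```

As with the strchr version, s has to point to writable storage, not to a string literal.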


output of strtok() is different

I want to compare two strings which contains some other characters as well. To eliminate those characters I am using strtok()
First I am copying strings into temp buffers, which I will use in strtok().
#include<stdio.h>
#include<string.h>
int main()
{
char ch[50]="supl-dev.google.com";
char ch1[50]="*.google.com";
printf("ch =%s\n",ch);
printf("ch1 =%s\n",ch1);
char temp_ch[50], temp_ch1[50];
strcpy(temp_ch,ch);
strcpy(temp_ch1,ch1);
char *ch_token, *ch1_token;
ch_token = strtok(temp_ch,".");
ch1_token = strtok(temp_ch1,"*");
printf("ch_token=%s\n",ch_token);
printf("ch1_token = %s\n",ch1_token);
return 0;
}
Expected results :
ch =supl-dev.google.com
ch1 =*.google.com
ch_token=supl-dev
ch1_token = *
Actual results :
ch =supl-dev.google.com
ch1 =*.google.com
ch_token=supl-dev
ch1_token = .google.com
Here I am expecting ch1_token should contain '*'.
Nope. Your expectation is wrong. You set the delimiter for ch1 to *, which means that strtok will skip the leading * in *.google.com and return .google.com as the first token. To get what you want, you have to set the delimiter to '.' instead:
#include<stdio.h>
#include<string.h>
int main()
{
char ch[50]="supl-dev.google.com";
char ch1[50]="*.google.com";
printf("ch =%s\n",ch);
printf("ch1 =%s\n",ch1);
char temp_ch[50], temp_ch1[50];
strcpy(temp_ch,ch);
strcpy(temp_ch1,ch1);
char *ch_token, *ch1_token;
ch_token = strtok(temp_ch,".");
ch1_token = strtok(temp_ch1,".");
printf("ch_token=%s\n",ch_token);
printf("ch1_token = %s\n",ch1_token);
return 0;
}
Now ch_token should be supl-dev and ch1_token should be *.
The thing to keep in mind is that strtok will go on to find the next token if the current token is empty.
So, when you strtok the string *.google.com with delimiter *, it finds the delimiter in the first position itself. As the current token is empty, the next token is returned which is .google.com
You are splitting ch1 by *, so the first piece is an empty string, which is ignored, and what you get back is the rest of the string, .google.com (the * itself is dropped because it's your delimiter).
Just change your splitting code to ch1_token = strtok(temp_ch1,"."); and successive calls will return *, google and then com.
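To see the skipping behaviour in one place, here is a small sketch that collects every token strtok produces for a given delimiter set. The helper name split_tokens and its parameters are assumptions made for this example only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split buf in place on any character in delims,
   storing up to max token pointers in out. Returns the token count.
   buf is modified, so pass a writable copy. */
static size_t split_tokens(char *buf, const char *delims, char **out, size_t max)
{
    assert(buf != NULL && delims != NULL && out != NULL);
    size_t n = 0;
    for (char *tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

Called on a copy of "*.google.com" with "." as the delimiter, this yields three tokens, starting with "*". With "*" as the delimiter it yields the single token ".google.com", because the empty piece in front of the leading * is never reported.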
Your stated need is to search for a common sub-string within two strings.
Using strtok may work, but there are simpler ways to do this without parsing.
Have you considered using strstr()?
char ch[50]="supl-dev.google.com";
char ch1[50]="*.google.com";
if((strstr(ch, "google.com")) && (strstr(ch1, "google.com")))
{
// sub-string exists in both strings
}
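One caveat with the strstr test: it finds the substring anywhere, so a name like google.com.evil.org would pass as well. If the goal is really to check a host name against a pattern with a leading *, a suffix comparison may be closer to the intent. The following is only a sketch under that assumption; matches_wildcard is a made-up name, and this is not meant to reproduce the matching rules of any particular protocol.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if host matches pattern, where a
   leading '*' in pattern stands for any non-empty prefix of host.
   Without a leading '*', the strings must be identical. */
static int matches_wildcard(const char *pattern, const char *host)
{
    assert(pattern != NULL && host != NULL);
    if (pattern[0] != '*')
        return strcmp(pattern, host) == 0;

    const char *suffix = pattern + 1;      /* e.g. ".google.com" */
    size_t suffix_len = strlen(suffix);
    size_t host_len = strlen(host);
    if (host_len <= suffix_len)
        return 0;                          /* nothing left for '*' to match */
    return strcmp(host + host_len - suffix_len, suffix) == 0;
}
```

Neither input is modified, so no temporary copies are needed.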

Remove Trailing Spaces in C

I'm trying to remove trailing spaces however I keep getting a segmentation fault. Not too sure where I am accessing memory out of bounds but this is my code. Leading spaces works fine.
String is the input for the function.
//remove leading spaces
char* final = string;
while(isspace((unsigned char)final[0]))
final++;
//removing trailing spaces
//getting segmentation fault here
int length = strlen(final);
while(length > 0 && isspace((unsigned char)final[length-1]))
length--;
final[length-1] = '\0';
The string I tested was
char* word = " 1 2 Hello ";
printf("%s\n", removeSpaces(word));
When I comment out the trailing-spaces part, it works perfectly. I don't know why the code is failing at the trailing spaces. I would really appreciate the help.
The string literal " 1 2 Hello " is stored in read-only memory. Copy it first before attempting to write '\0' into it, and the problem will go away. So e.g., just replace this:
char* final = string;
with this:
char* final = strdup(string);
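Edit 1: Upon considering this in more detail, I realized you also do a leading trim before the trailing trim. Since you are moving the pointer, the allocation needs to happen after the leading trim, or the caller will not be able to free the memory. Note also that the terminator belongs at final[length], not final[length-1]: after the loop, length already counts the characters to keep, so writing at length-1 would cut off the last non-space character (and would write before the buffer when length is 0). Here's a complete solution that shouldn't have any errors:
char *removeSpaces(const char *string) {
while(isspace((unsigned char)string[0]))
string++;
char *final = strdup(string);
if (final == NULL)
return NULL; // allocation failed
size_t length = strlen(final);
while(length > 0 && isspace((unsigned char)final[length-1]))
length--;
final[length] = '\0'; // terminate just past the last non-space character
return final;
}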
Edit 2: While I would not recommend it in this case, it might be useful to be aware that if the variable was declared like this:
char word[] = " 1 2 Hello ";
It would have been in writable memory, and the problem would also not exist. Thanks to pmg for the idea.
The reason why it is not a good approach is that you are expecting the callers to the function to always provide writable strings, and that you will modify them. A function that returns a duplicate is a much better approach in general.
(Don't forget to free() the result afterwards!)
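If you do control the callers and know the buffer is writable (the char word[] case from Edit 2), an in-place variant avoids the allocation altogether. This is just a sketch of that alternative; trim_in_place is a hypothetical name, and it must never be handed a string literal.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical in-place variant: s must point to writable storage
   (an array or heap memory), never a string literal. */
static char *trim_in_place(char *s)
{
    assert(s != NULL);
    char *start = s;
    while (isspace((unsigned char)*start))
        start++;                            /* skip leading spaces */

    size_t length = strlen(start);
    while (length > 0 && isspace((unsigned char)start[length - 1]))
        length--;                           /* drop trailing spaces */

    memmove(s, start, length);              /* regions may overlap, so memmove */
    s[length] = '\0';
    return s;
}
```

Because the trimmed text is moved to the front of the buffer, the returned pointer is the same one the caller passed in, so there is nothing extra to free.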

working with strings in c

Could someone please help me understand these lines of code in the program below?
According to the writer, this program takes a string such as hello world, and there is a function in it that reverses the string to world hello. My question is: what does this code do?
char * p_divs = divs; //what does divs do
char tmp;
while(tmp = *p_divs++)
if (tmp == c) return 1
;
also this code in the void function
*dest = '\0';//what does this pointer do?
int source_len = strlen(source); //what is source
if (source_len == 0) return;
char * p_source = source + source_len - 1;
char * p_dest = dest;
while(p_source >= source){
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
this is the main program
#include <stdio.h>
#include <string.h>
int inDiv(char c, char * divs){
char * p_divs = divs;
char tmp;
while(tmp = *p_divs++)
if (tmp == c) return 1;
return 0;
}
void reverse(char * source, char * dest, char * divs){
*dest = '\0';
int source_len = strlen(source);
if (source_len == 0) return;
char * p_source = source + source_len - 1;
char * p_dest = dest;
while(p_source >= source){
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
if (p_source < source) break;
char * w_end = p_source;
while((p_source >= source) && (!inDiv(*p_source, divs))) p_source--;
char * w_beg = p_source + 1;
for(char * p = w_beg; p <= w_end; p++) *p_dest++ = *p;
*p_dest++ = ' ';
}
*p_dest = '\0';
}
#define MAS_SIZE 100
int main(){
char source[MAS_SIZE], dest[MAS_SIZE], divs[MAS_SIZE];
printf("String : "); gets(source);
printf("Dividers : "); gets(divs);
reverse(source, dest, divs);
printf("Reversed string : %s", dest);
return 0;
}
Here, inDiv can be called to search for the character c in the string divs, for example:
inDiv('x', "is there an x character in here somewhere?") will return 1
inDiv('x', "ahhh... not this time") will return 0
Working through it:
int inDiv(char c, char * divs)
{
char * p_divs = divs; // remember which character we're considering
char tmp;
while(tmp = *p_divs++) // copy that character into tmp, and move p_divs to the next character
// but if tmp is then 0/false, break out of the while loop
if (tmp == c) return 1; // if tmp is the character we're searching for, return "1" meaning found
return 0; // must be here because tmp == 0 indicating end-of-string - return "0" meaning not-found
}
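For what it's worth, the standard library already has this search. Here is a sketch of an equivalent built on strchr; in_div is a name invented for the comparison. The one difference to watch is c == '\0': strchr would find the terminator, whereas inDiv reports not-found, so that case is excluded explicitly.

```c
#include <assert.h>
#include <string.h>

/* Sketch: same result as inDiv() from the question.
   strchr(divs, '\0') would return a pointer to the terminator, so
   that case is ruled out first to keep inDiv's "not found" answer. */
static int in_div(char c, const char *divs)
{
    assert(divs != NULL);
    return c != '\0' && strchr(divs, c) != NULL;
}
```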
We can infer things about reverse by looking at the call site:
int main()
{
char source[MAS_SIZE], dest[MAS_SIZE], divs[MAS_SIZE];
printf("String : ");
gets(source);
printf("Dividers : ");
gets(divs);
reverse(source, dest, divs);
printf("Reversed string : %s", dest);
We can see gets() called to read from standard input into character arrays source and divs -> those inputs are then provided to reverse(). The way dest is printed, it's clearly meant to be a destination for the reversal of the string in source. At this stage, there's no insight into the relevance of divs.
Let's look at the source...
void reverse(char * source, char * dest, char * divs)
{
*dest = '\0'; //what does this pointer do?
int source_len = strlen(source); //what is source
if (source_len == 0) return;
char* p_source = source + source_len - 1;
char* p_dest = dest;
while(p_source >= source)
{
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
Here, *dest = '\0' writes a NUL character into the character array dest - that's the normal sentinel value encoding the end-of-string position - putting it in at the first character *dest implies we want the destination to be cleared out. We know source is the textual input that we'll be reversing - strlen() will set source_len to the number of characters therein. If there are no characters, then return as there's no work to do and the output is already terminated with NUL. Otherwise, a new pointer p_source is created and initialised to source + source_len - 1 -> that means it's pointing at the last non-NUL character in source. p_dest points at the NUL character at the start of the destination buffer.
Then the loop says: while (p_source >= source) - for this to do anything p_source must initially be >= source - that makes sense as p_source points at the last character and source is the first character address in the buffer; the comparison implies we'll be moving one or both towards the other until they would cross over - doing some work each time. Which brings us to:
while((p_source >= source) && (inDiv(*p_source, divs))) p_source--;
This is the same test we've just seen - but this time we're only moving p_source backwards towards the start of the string while inDiv(*p_source, divs) is also true... that means that the character at *p_source is one of the characters in the divs string. What it means is basically: move backwards until you've gone past the start of the string (though this test has undefined behaviour as Michael Burr points out in comments, and really might not work if the string happens to be allocated at address 0 - even if relative to some specific data segment, as the pointer could go from 0 to something like FFFFFFFF hex without ever seeming to be less than 0) or until you find a character that's not one of the "divider" characters.
Here we get some real insight into what the code's doing... dividing the input into "words" separated by any of a set of characters in the divs input, then writing them in reverse order with space delimiters into the destination buffer. That's getting ahead of ourselves a bit - but let's track it through:
The next line is...
if (p_source < source) break;
...which means that if the divider-skipping loop exited having backed past the front of the source string, we break out of the outer loop (looking ahead, we see the code then just puts a new NUL on the end of the already-generated output and returns). That only happens when nothing but divider characters remained in front of the words already copied, for example leading spaces in the input, so no word is lost by breaking out here.
Otherwise:
char* w_end = p_source; // remember where the non-divider character "word" ends
// move backwards until there are no more characters (p_source < source) or you find a divider character
while((p_source >= source) && (!inDiv(*p_source, divs))) p_source--;
// either way that loop exited, the "word" begins at p_source + 1
char * w_beg = p_source + 1;
// append the word between w_beg and w_end to the destination buffer
for(char* p = w_beg; p <= w_end; p++) *p_dest++ = *p;
// also add a space...
*p_dest++ = ' ';
This keeps happening for each "word" in the input, then the final line adds a NUL terminator to the destination.
*p_dest = '\0';
Now, you said:
according [to] the writer it writes a string of hello world then there is a function in it that also reverses the string to world hello
Well, given inputs "hello world" and divider characters including a space (but none of the other characters in the input), then, leaving the undefined behaviour aside, the output would be "world hello " (note the space at the end).
For what it's worth - this code isn't that bad... it's pretty normal for C handling of ASCIIZ buffers, though the assumptions about the length of the input are dangerous and it relies on that undefined pointer comparison....
** How to fix the undefined behaviour **
Regarding the undefined behaviour - the smallest change to address that is to change the loops so they terminate when at the start of the buffer, and have the next line explicitly check why it terminated and work out what behaviour is required. That will be a bit ugly, but isn't rocket science....
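One way that change could look, as a sketch rather than a drop-in patch: scan with an index that is always "one past" the current character, so it stops at 0 instead of stepping in front of the buffer. The function name reverse_words is my own, and strchr stands in for inDiv (which is safe here because the characters tested are never NUL).

```c
#include <assert.h>
#include <string.h>

/* Sketch: same word-reversal idea as reverse(), but scanning with an
   index that never drops below zero. dest must have room for
   strlen(source) + 2 bytes (every copied word gains a trailing space). */
static void reverse_words(const char *source, char *dest, const char *divs)
{
    assert(source != NULL && dest != NULL && divs != NULL);
    size_t i = strlen(source);            /* one past the current character */
    while (i > 0) {
        while (i > 0 && strchr(divs, source[i - 1]) != NULL)
            i--;                          /* skip dividers */
        if (i == 0)
            break;                        /* only dividers were left */
        size_t w_end = i;                 /* one past the word's last char */
        while (i > 0 && strchr(divs, source[i - 1]) == NULL)
            i--;                          /* walk back over the word */
        memcpy(dest, source + i, w_end - i);
        dest += w_end - i;
        *dest++ = ' ';
    }
    *dest = '\0';
}
```

The explicit i == 0 check after the divider loop plays the role of the original if (p_source < source) break; line, and the word loop can stop at index 0 without any special handling because the word then simply starts at the beginning of the buffer.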
char * p_divs = divs; //what does divs do
char tmp;
while(tmp = *p_divs++)
if (tmp == c) return 1
divs is a pointer to a char array (almost certainly a string). p_divs just points to the same string; within the while loop a single character is extracted and written to tmp, and then the pointer is incremented, meaning that the next character will be extracted on the next iteration. If tmp matches c, the function returns 1.
Edit: You should learn more about pointers, have a look at Pointer Arithmetic.
As I pointed out in the comments, I don't think C is really the ideal tool for this task (given a choice, I'd use C++ without a second thought).
However, I suppose if I'm going to talk about how horrible the code is, the counter-comment really was right: I should post something better. Contrary to the comment in question, however, I don't think this represents a compromise in elegance, concision, or performance.
The only part that might be open to real argument is elegance, but I think this is enough simpler and more straightforward that there's little real question in that respect. It's clearly more concise -- using roughly the same formatting convention as the original, my rev_words is 14 lines long instead of 17. As most people would format them, mine is 17 lines and his is 21.
For performance, I'd expect the two to be about equivalent under most circumstances. Mine avoids running off the beginning of the array, which saves a tiny bit of time. The original contains an early exit, which will save a tiny bit of time on reversing an empty string. I'd consider both insignificant though.
I think one more point is far more important though: I'm reasonably certain mine doesn't use/invoke/depend upon undefined behavior like the original does. I suppose some people might consider that justified if it provided a huge advantage in another area, but given that it's roughly tied or inferior in the other areas, I can't imagine how anybody would consider it (even close to) justified in this case.
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
int contains(char const *input, char val) {
while (*input != val && *input != '\0')
++input;
return *input == val;
}
void rev_words(char *dest, size_t max_len, char const *input, char const *delims) {
char const *end = input + strlen(input);
char const *start;
char const *pos;
do {
for (; end>input && contains(delims, end[-1]); --end);
for (start=end; start>input && !contains(delims,start[-1]); --start);
for (pos=start; pos<end && max_len>1; --max_len)
*dest++=*pos++;
if (max_len > 1) { --max_len; *dest++ = ' '; }
end=start;
} while (max_len > 1 && start > input);
*dest++ = '\0';
}
int main(){
char reversed[100];
rev_words(reversed, sizeof(reversed), "This is an\tinput\nstring with\tseveral words in\n it.", " \t\n.");
printf("%s\n", reversed);
return 0;
}
Edit: The:
if (max_len > 1) { --max_len; *dest++ = ' '; }
should really be:
if (max_len > 1 && end-start > 0) { --max_len; *dest++ = ' '; }
If you want to allow for max_len < 1, you can change:
*dest++ = '\0';
to:
if (max_len > 0) *dest++ = '\0';
If the buffer length could somehow be set via input from a (possibly hostile) user, that would probably be worthwhile. For many purposes it's sufficient to simply require a positive buffer size.

String tokenizer in c

The following code will break down the string command using a space, i.e. " ", and a full stop, i.e. ".". What if I want to break down command only where the space and the full stop occur together, and not on each by itself? E.g. a command like 'hello .how are you' will be broken into the pieces (ignoring the quotes)
[hello]
[how are you]
char *token2 = strtok(command, " .");
You can do it pretty easily with strstr:
char *strstrtok(char *str, char *delim)
{
static char *prev;
if (!str) str = prev;
if (str) {
char *end = strstr(str, delim);
if (end) {
prev = end + strlen(delim);
*end = 0;
} else {
prev = 0;
}
}
return str;
}
This is pretty much exactly the same as the implementation of strtok, just calling strstr and strlen instead of strcspn and strspn. It also might return empty tokens (if there are two consecutive delimiters or a delimiter at either end); you can arrange to ignore those if you would prefer.
Your best bet might just be to crawl your input with strstr, which finds occurrences of a substring, and manually tokenize on those.
It's a common question you ask, but I've yet to see a particularly elegant solution. The above is straightforward and workable, however.
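Since the static prev pointer in strstrtok means only one string can be tokenized at a time, here is a sketch of a variant where the caller holds the scan position instead, in the spirit of POSIX strtok_r. The name substr_tok_r and the save parameter are assumptions for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical re-entrant variant of strstrtok: the caller owns the
   scan position in *save instead of a hidden static variable.
   delim is matched as a whole substring and must not be empty. */
static char *substr_tok_r(char *str, const char *delim, char **save)
{
    assert(delim != NULL && delim[0] != '\0' && save != NULL);
    if (str == NULL)
        str = *save;                        /* continue a previous scan */
    if (str != NULL) {
        char *end = strstr(str, delim);
        if (end != NULL) {
            *save = end + strlen(delim);    /* next token starts past delim */
            *end = '\0';
        } else {
            *save = NULL;                   /* this was the last token */
        }
    }
    return str;
}
```

Usage follows the strtok pattern: pass the writable string on the first call and NULL afterwards, with the same save pointer each time, until the function returns NULL. Like the original, it can return empty tokens.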

Remove the first part of a C String

I'm having a lot of trouble figuring this out. I have a C string, and I want to remove the first part of it. Let's say its: "Food,Amount,Calories". I want to copy out each one of those values, but not the commas. I find the comma, and return the position of the comma to my method. Then I use
strncpy(aLine.field[i], theLine, end);
To copy "theLine" to my array at position "i", with only the first "end" characters (for the first time, "end" would be 4, because that is where the first comma is). But then, because it's in a Loop, I want to remove "Food," from the array, and do the process over again. However, I cannot see how I can remove the first part (or move the array pointer forward?) and keep the rest of it. Any help would be useful!
What you need is to chop off strings with comma as your delimiter.
You need strtok to do this. Here's an example code for you:
#include <stdio.h>
#include <string.h>
int main (int argc, const char * argv[]) {
char *s = "asdf,1234,qwer";
char str[15];
strcpy(str, s);
printf("\nstr: %s", str);
char *tok = strtok(str, ",");
printf("\ntok: %s", tok);
tok = strtok(NULL, ",");
printf("\ntok: %s", tok);
tok = strtok(NULL, ",");
printf("\ntok: %s", tok);
return 0;
}
This will give you the following output:
str: asdf,1234,qwer
tok: asdf
tok: 1234
tok: qwer
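Keep in mind that strtok modifies the string it scans, so if you have to keep the original string, run strtok on a copy. If you don't need the original intact, you can also replace each separator with '\0' yourself, and use the obtained strings directly: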
char s_RO[] = "abc,123,xxxx", *s = s_RO;
while (s){
char* old_str = s;
s = strchr(s, ',');
if (s){
*s = '\0';
s++;
};
printf("found string %s\n", old_str);
};
The function you might want to use is strtok()
Here is a nice example - http://www.cplusplus.com/reference/clibrary/cstring/strtok/
Personally, I would use strtok().
I would not recommend removing extracted tokens from the string. Removing part of a string requires copying the remaining characters, which is not very efficient.
Instead, you should keep track of your positions and just copy the sections you want to the new string.
But, again, I would use strtok().
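If you do want the position-tracking approach, a sketch might look like the following. It leaves the input untouched and copies each field into a caller-supplied array. The names copy_fields and FIELD_LEN, and the fixed field size, are assumptions for illustration and are not taken from the question's aLine structure.

```c
#include <assert.h>
#include <string.h>

#define FIELD_LEN 32   /* assumed maximum field size, including the NUL */

/* Hypothetical helper: copy the comma-separated fields of line into
   fields[], leaving line untouched. Over-long fields are truncated.
   Returns the number of fields stored (at most max). */
static size_t copy_fields(const char *line, char fields[][FIELD_LEN], size_t max)
{
    assert(line != NULL && fields != NULL);
    size_t n = 0;
    const char *start = line;
    while (n < max) {
        size_t len = strcspn(start, ",");   /* length up to ',' or the end */
        size_t copy = len < FIELD_LEN - 1 ? len : FIELD_LEN - 1;
        memcpy(fields[n], start, copy);
        fields[n][copy] = '\0';
        n++;
        if (start[len] == '\0')
            break;                          /* no more commas */
        start += len + 1;                   /* step past the comma */
    }
    return n;
}
```

Unlike strtok, this keeps empty fields (two commas in a row give an empty string), which may or may not be what you want for this kind of data.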
If you know where the comma is, you can just keep reading the string from that point on.
For example:
void readTheString(const char *theLine)
{
const char *wordStart = theLine;
const char *wordEnd = theLine;
int i = 0;
while (*wordStart) // while we haven't reached the null termination character
{
while (*wordEnd != ',' && *wordEnd != '\0') // stop at a comma or at the end of the string
wordEnd++;
// ... copy the substring ranging from wordStart up to (not including) wordEnd
if (*wordEnd == '\0')
break; // that was the last field
wordStart = ++wordEnd; // start the next word just past the comma
}
}
Or something like that.
The inner loop has to check for the null terminator as well as the comma; otherwise it would run off the end of the string unless the string also ends with a ','. But you get the idea.
Anyway, using strtok would probably be a better idea.
