I'm trying to write a small function that removes substrings from a string, but only where a word of the string starts with one of those substrings. Here is what I've tried so far:
char *string="internship, is wonderful for people who like programming";
char *prefixes="intern wonder in pe";
char *delsubstr(char *string, const char *prefixes)
{
char *tokprefix;
tokprefix=strtok(prefixes, " ");
while(tokprefix)
{
size_t m = strlen(tokprefix);
char *p = string;
while ((p = strstr(p, tokprefix))!= NULL)
{
{
char *q = p + m;
size_t n = strlen(q);
memmove(p, q, n + 1);
}
}
tokprefix=strtok(NULL, " ");
}
return string;
}
The problem is that it removes the substrings from everywhere, and I only want them removed from the beginning of words. Also, if both "intern" and "in" are present, I want the longer one to be removed from the string, not the shorter one. Anyone got any ideas/suggestions? I think an if condition before the memmove could be enough, but I'm not sure.
"I think an if condition before the memmove could be enough, but I'm not sure."
Apart from the problem of modifying the prefixes string literal, which is dealt with in the comments above, you're approximately correct:
while ((p = strstr(p, tokprefix))!= NULL)
{
if (p > string && p[-1] != ' ') // not at beginning of word?
++p; // then just advance pointer
else // else remove start of word
{
…
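Putting both requirements together (strip only at the start of a word, and prefer the longest matching prefix), a minimal sketch might look like this. It is a separate sketch, not the elided remainder of the code above: it assumes space-separated words, a modifiable array instead of a string literal, and prefixes passed as an array of strings so that strtok is not needed at all. The name delprefixes is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: for every space-separated word in str, remove the
   longest entry of prefixes[] that the word starts with.
   str must be a modifiable array, not a string literal. */
char *delprefixes(char *str, const char *const prefixes[], size_t count)
{
    char *p = str;
    while (*p) {
        if (*p != ' ') {                      /* p is at the start of a word */
            size_t best = 0;
            for (size_t i = 0; i < count; i++) {
                size_t m = strlen(prefixes[i]);
                if (m > best && strncmp(p, prefixes[i], m) == 0)
                    best = m;                 /* remember the longest match */
            }
            if (best > 0)                     /* shift the tail left, '\0' included */
                memmove(p, p + best, strlen(p + best) + 1);
            while (*p && *p != ' ')           /* skip the rest of the word */
                p++;
        }
        while (*p == ' ')                     /* skip the spaces between words */
            p++;
    }
    return str;
}
```

With the question's sentence and the prefixes "intern", "wonder", "in" and "pe", this should produce "ship, is ful for ople who like programming": "internship," loses "intern" rather than just "in", and "programming" is left alone because it does not start with any prefix.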
Related
I have two possible strings that would be pointed to by a char *:
char *s = "this is a string";
char *s = "this is a string [this is more string]";
I want to be able to remove the brackets and their contents from the string, if they exist. What is the best way to do that in C?
Thanks!
First, you must have access to memory you are permitted to modify. For hysterical raisins, C still allows you to declare pointers to constant character arrays as non-const... but you still can’t modify them.
Instead, declare yourself an array which is initialized by the constant data.
char s[] = "hello world";
That gives you a mutable array of 11+1 characters (a copy of the immutable source string).
Often, however, we want to be able to work on our strings, so we need to make sure there is enough room to play with them.
char s[100] = "hello world"; // string capacity == 99+1. In use == 11+1.
s[6] = '\0'; // now string contains "hello ". In use == 6+1.
strcat( s, "everyone" ); // now string contains "hello everyone". In use == 14+1.
To remove content from a string, you must first identify where the start and stop is:
char s[] = "this is a string [this is more string] original string";
char * first = strchr( s, '[' ); // find the '['
char * last = strchr( s, ']' ); // find the ']'
if (first && last && first < last) // if both were found, in that order...
  // ...copy the end of the string over the part to be replaced
  memmove( first, last+1, strlen(last+1)+1 );
// s now contains "this is a string  original string" (two spaces: the ones that surrounded the brackets)
Notice all those +1s in there? Be mindful:
of where you are indexing the string (you want to copy from after the ']' to the end of the string)
that strings must end with a nul-terminator — a zero-valued character (which we should also copy, so we added one to the strlen).
Remember, a string is just an array of characters, where the last used character is immediately followed by a zero-value character (the nul-terminator). All the methods you use to handle arrays of any other kind apply to handling strings (AKA arrays of characters).
This is one of those "fun" problems that attracts more than its share of answers.
Full credit to other answers for noting that attempting to modify a "string literal" triggers UB. (String literals are often stored in an immutable, "read only" region of memory.) (Thanks to @chux-reinstatemonica for clarification.)
Taking this to the next level, below is a bit of code that handles both multiple regions bounded by a pair of delimiters ('[' & ']'), and handles nested instances, too. The code is simple enough, using cut as a counter that ensures nesting is accounted for. NB: It is presumed that the source string is "well formed".
#include <stdio.h>
int main( void ) {
char *wb =
"Once [upon] a [time] there [lived a] princess. "
"She [ was [a hottie]]. The end.";
char wo[100]; // big enough
int cut = 0;
size_t s = 0, d = 0;
for( ; wb[s]; s++ ) {
cut += ( wb[s] == '[' ); // entering region to cut?
if( !cut ) wo[ d++ ] = wb[ s ]; // not cutting, so copy
cut -= ( wb[s] == ']' ); // exiting region that was cut?
if( cut < 0 ) cut = 0; // in case of spurious ']'
}
wo[ d ] = '\0'; // terminate shortened string
puts( wb );
puts( wo );
return 0;
}
Output:
Once [upon] a [time] there [lived a] princess. She [ was [a hottie]]. The end.
Once  a  there  princess. She . The end.
It now becomes a small challenge to, perhaps, remove multiple consecutive SPs in the output array. This could quite easily be done on the fly, and is left as an exercise.
One begins to see how something like this could be extended to make an "infix calculator program". Always something new!
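As one possible take on the exercise of removing consecutive spaces on the fly, the copy step can simply refuse to write a space right after another space. This sketch wraps the same cut counter in a hypothetical function, cut_brackets, and assumes dst has room for strlen(src) + 1 bytes.

```c
#include <string.h> /* size_t */

/* Hypothetical variant of the cut-counter loop: copy src to dst, dropping
   every [...] region (nested regions included) and skipping any space that
   would directly follow a space already written to dst.
   dst must have room for strlen(src) + 1 bytes. */
void cut_brackets(const char *src, char *dst)
{
    int cut = 0;
    size_t d = 0;
    for (size_t s = 0; src[s]; s++) {
        cut += (src[s] == '[');                        /* entering a region? */
        if (!cut && !(src[s] == ' ' && d > 0 && dst[d - 1] == ' '))
            dst[d++] = src[s];                         /* copy, but never "  " */
        cut -= (src[s] == ']');                        /* leaving a region? */
        if (cut < 0)
            cut = 0;                                   /* tolerate a stray ']' */
    }
    dst[d] = '\0';                                     /* terminate the result */
}
```

For the sample sentence it should yield "Once a there princess. She . The end."; a single space left in front of punctuation (as in "She .") is not touched.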
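Use the strchrnul function declared in string.h:
*(strchrnul(s, '[')) = '\0';
strchrnul is a GNU extension, so you may need to define the _GNU_SOURCE macro before including <string.h>.
It won't work on string literals, and it cuts off everything from the first '[' onward, not just the bracketed part.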
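Where strchrnul is not available, the same one-liner can be sketched with standard C only: strcspn returns the length of the initial segment containing no '[', which is the index of the first '[' or, if there is none, the index of the terminating '\0'. The name truncate_at_bracket is hypothetical. Like the strchrnul version, it drops everything from the first '[' onward and keeps the space in front of it.

```c
#include <string.h>

/* Hypothetical helper: truncate s at its first '[' (no-op if there is none).
   s must be modifiable, e.g. a char array, not a string literal. */
void truncate_at_bracket(char *s)
{
    s[strcspn(s, "[")] = '\0';
}
```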
To remove the bracketed fragment(s) from the string, you cannot modify the string literal in place. You should either use an automatic array or an allocated block and construct the modified string there.
Here is an example:
#include <stdio.h>
#include <stdlib.h>
char *strip_brackets(const char *s) {
size_t i, j, len;
char *p;
// compute the modified string length
for (i = len = 0; s[i];) {
if (s[i] == '[') {
for (j = i + 1; s[j] && s[j] != ']'; j++)
continue;
if (s[j] == ']') {
i = j + 1;
continue;
}
}
len++;
i++;
}
p = malloc(len + 1);
if (p != NULL) {
// construct the modified string
for (i = len = 0; s[i];) {
if (s[i] == '[') {
for (j = i + 1; s[j] && s[j] != ']'; j++)
continue;
if (s[j] == ']') {
i = j + 1;
continue;
}
}
p[len++] = s[i++];
}
p[len] = '\0';
}
return p;
}
int main() {
const char *s = "this is a string [this is more string]";
char *s1 = strip_brackets(s);
printf("before: %s\n", s);
if (s1) {
printf("after : %s\n", s1);
free(s1);
}
return 0;
}
I'm a beginner at C and I'm stuck on a simple problem. Here it goes:
I have a string formatted like this: "first1:second1\nsecond2\nfirst3:second3" ... and so on.
As you can see from the example, the first field is optional ([firstx:]secondx).
I need to get a resulting string which contains only the second field, like this: "second1\nsecond2\nsecond3".
I did some research here on Stack Overflow (string splitting in C) and found that there are two main functions in C for string splitting: strtok (obsolete) and strsep.
I tried to write the code using both functions (plus strdup) without success. Most of the time I get some unpredictable result.
Better ideas?
Thanks in advance
EDIT:
This was my first try
int main(int argc, char** argv){
char * stri = "ciao:come\nva\nquialla:grande\n";
char * strcopy = strdup(stri); // since strsep and strtok both modify the input string
char * token;
while((token = strsep(&strcopy, "\n"))){
if(token[0] != '\0'){ // I don't want the last match of '\n'
char * sub_copy = strdup(token);
char * sub_token = strtok(sub_copy, ":");
sub_token = strtok(NULL, ":");
if(sub_token[0] != '\0'){
printf("%s\n", sub_token);
}
}
free(sub_copy);
}
free(strcopy);
}
Expected output: "come", "va", "grande"
Here's a solution with strcspn:
#include <stdio.h>
#include <string.h>
int main(void) {
const char *str = "ciao:come\nva\nquialla:grande\n";
const char *p = str;
while (*p) {
size_t n = strcspn(p, ":\n");
if (p[n] == ':') {
p += n + 1;
n = strcspn(p , "\n");
}
if (p[n] == '\n') {
n++;
}
fwrite(p, 1, n, stdout);
p += n;
}
return 0;
}
We compute the size of the initial segment not containing : or \n. If it's followed by a :, we skip over it and get the next segment that doesn't contain \n.
If it's followed by \n, we include the newline character in the segment. Then we just need to output the current segment and update p to continue processing the rest of the string in the same way.
We stop when *p is '\0', i.e. when the end of the string is reached.
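The question asks for a resulting string rather than output on stdout. A sketch of the same strcspn loop that appends each segment to a heap buffer instead of calling fwrite could look like this; second_fields is a hypothetical name, and the caller is assumed to free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of str in which every
   "first:second" line is reduced to "second". Returns NULL if malloc fails;
   the caller must free() the result. */
char *second_fields(const char *str)
{
    char *out = malloc(strlen(str) + 1);   /* the result is never longer */
    if (out == NULL)
        return NULL;
    size_t len = 0;
    const char *p = str;
    while (*p) {
        size_t n = strcspn(p, ":\n");
        if (p[n] == ':') {                 /* skip the optional "first:" */
            p += n + 1;
            n = strcspn(p, "\n");
        }
        if (p[n] == '\n')                  /* keep the newline */
            n++;
        memcpy(out + len, p, n);           /* append the segment */
        len += n;
        p += n;
    }
    out[len] = '\0';
    return out;
}
```

For "ciao:come\nva\nquialla:grande\n" this should return "come\nva\ngrande\n".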
I need to replace the " character (ASCII value 34) with an empty character "".
In the output, instead of the quote I get a "?" question mark character.
I tried to use things like:
mystring[itit] = "";
mystring[itit] = '';
mystring[itit] = "\O";
My code:
strcpy( mystring ,op->data.value.str );
for(itit=0;itit<10;itit++)
{
if(mystring[itit] == 34)
{
mystring[itit] = NULL;
}
}
printf( "%s\n",mystring);
Any ideas how to fix that?
For clarification: the strings in mystring are like:
"hello"
"place "
"school"
all with the quotation marks. I actually need to remove them and get:
hello
place
school
int removeChar(char *str, char c) {
int i, j;
for(i = 0, j = 0 ; str[i] ; i++){
if( str[i] == c) continue; // skip c do not copy it
str[j] = str[i]; // shift characters left
j++;
}
str[j]=0; // terminate the string
return j; // return the actual size
}
What you need to do is remove the character, not replace it, since you're not replacing it with anything. To do this, when you find the character in question, you need to move the remaining characters down.
int i,j;
strcpy(mystring, "aa\"bb\"cc");
for(i=0,j=0;mystring[i] != '\0';i++) // stop at the terminator, not after a fixed 10 chars
{
if(mystring[i] != '"')
{
mystring[j] = mystring[i];
j++;
}
}
mystring[j] = '\0';
printf("mystring=%s\n",mystring);
Result:
mystring=aabbcc
To remove a character from a string, you can do this:
void remove_char(char* str, char rm) /* not "remove": <stdio.h> already declares remove() */
{
char *src, *dst;
for (src = dst = str; *src != '\0'; ++src) {
*dst = *src;
if (*dst != rm) ++dst;
}
*dst = '\0'; /*insert terminator at the new place*/
}
and call with rm equal to 34.
This algorithm is well-known; I've adopted it from Kernighan & Ritchie. Do study it carefully with your debugger.
In C, strings are simply arrays of characters with a NUL (0) at the end. (They cannot contain NULs.) As with any array, you can't simply "remove" an element. You need to shift all the following elements one position, with the result that there will be an unneeded element at the end. With strings this extra element isn't a huge problem because the NUL still identifies where the string ends.
In this case, you are copying the string first, so you might as well copy it without the characters you want to delete. Unless you know how many such characters there are, you will need to have allocated enough space in the new string for the entire string you want to copy:
/* Before this, you must ensure that mystring has enough space */
{
char* out = mystring;
const char* in = op->data.value.str;
do {
if (*in != '"') *out++ = *in;
} while (*in++);
}
Note: I use the fact that strings are NUL-terminated to terminate the loop, which saves me from having to know in advance how long op->data.value.str is. For this reason, I use character pointers rather than indexes.
There is no "empty character". A string can be empty by having no characters, but a character is an atomic element and can't be empty, like a box of apples can be empty, but one can't have an "empty apple".
Instead, you need to remove the quotes and close the space they took up. Better yet, if you do the copying yourself, just don't copy them:
char *psrc = op->data.value.str;
char *pdest = mystring;
while (*psrc != '\0')
{
if (*psrc != '\"')
{
*pdest = *psrc;
++pdest;
}
++psrc;
}
*pdest = '\0';
You can use this to strip all '\"'-characters:
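void stripquotes(char *ptr) {
char *ptr2 = ptr;
do {
*ptr2 = *ptr; // copy the current char (the final '\0' included)
if (*ptr2 != '\"')
ptr2++; // keep it unless it is a quote
} while (*ptr++); // stop right after copying the terminator
}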
char *args[32];
char **next = args;
char *temp = NULL;
char *quotes = NULL;
temp = strtok(line, " \n&");
while (temp != NULL) {
if (strncmp(temp, "\"", 1) == 0) {
//int i = strlen(temp);
printf("first if");
quotes = strtok(temp, "\"");
} else if (strncmp(temp, "\"", 1) != 0) {
*next++ = temp;
temp = strtok(NULL, " \n&");
}
}
I'm having trouble understanding how to keep the spaces when part of the string is surrounded by quotes. For example, if I want execvp() to execute diff "space name.txt" sample.txt, it should save diff in args[0], space name.txt in args[1] and sample.txt in args[2].
I'm not really sure how to implement this; I've tried a few different approaches with if statements, but I'm not quite there. At the moment I'm trying something simple like ls "folder", but it gets stuck in the while loop, printing my printf() message over and over.
I know this isn't worded as a question - it's more explaining what I'm trying to achieve and where I'm up to so far, but I'm having trouble and would really appreciate some hints of how the logic should be.
Instead of using strtok, process the string char by char. If you see a ", set a flag; if the flag is already set, unset it instead. If you see a space, check the flag and either switch to the next arg or add the space to the current one. Any other char: add it to the current arg. Zero byte: done processing.
With some extra effort you'll be able to handle even stuff like diff "file \"one\"" file\ two (you should get diff, file "one" and file two as results)
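A minimal sketch of that flag-based loop, assuming the arguments are split in place (the output is never longer than the input) and that a backslash escapes the next character, so the diff "file \"one\"" file\ two case works. split_args and its parameters are hypothetical names, not an existing API.

```c
#include <string.h>

/* Hypothetical char-by-char splitter. Splits line in place, stores at most
   max - 1 argument pointers in args[] followed by NULL, and returns the
   argument count (max must be at least 1). Assumptions: spaces, tabs and
   newlines separate arguments; '"' toggles the "inside quotes" flag; a
   backslash makes the next character literal; an unclosed quote runs to
   the end of the line. */
size_t split_args(char *line, char *args[], size_t max)
{
    char *src = line, *dst = line;     /* dst never gets ahead of src */
    size_t argc = 0;
    int in_quotes = 0, in_arg = 0;

    for (; *src; src++) {
        char c = *src;
        if (c == '\\' && src[1] != '\0') {
            c = *++src;                            /* escaped: copy literally */
        } else if (c == '"') {
            in_quotes = !in_quotes;                /* flip the flag, drop the quote */
            if (!in_arg && argc + 1 < max) {       /* a quote can start an argument */
                args[argc++] = dst;
                in_arg = 1;
            }
            continue;
        } else if (!in_quotes && strchr(" \t\n", c) != NULL) {
            if (in_arg) {
                *dst++ = '\0';                     /* separator ends the argument */
                in_arg = 0;
            }
            continue;
        }
        if (!in_arg) {                             /* first char of a new argument */
            if (argc + 1 >= max)
                break;                             /* args[] is full */
            args[argc++] = dst;
            in_arg = 1;
        }
        *dst++ = c;
    }
    *dst = '\0';
    args[argc] = NULL;
    return argc;
}
```

Because args ends with NULL, it could then be handed to execvp(args[0], args).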
I'm not even sure I understand what you are trying to do. Are you trying to tokenize the input string into space-separated tokens?
Just separate the input string on spaces and when you encounter a double quote char you need a second inner loop which handles quoted strings.
There is more to quoted strings than searching for the closing quote. You need to handle backslashes, for example backslash-escaped quotes and also backslash-escaped backslashes.
Just consider the following:
diff "space name \" with quotes.txt\\" foo
Which refers to a (trashy) filename space name " with quotes.txt\. Use this as a test case, then you know when you are done with the basics. Note that shell command line splitting is a lot more crazy than that.
Here is my idea:
1. Make two pointers A and B, initially pointing at the first char of the string.
2. Iterate through the string with pointer A, copying every char into an array as long as it's not a space.
3. Once you reach a ", take pointer B starting from position A+1 and go forward until you reach the next ", copying everything, including spaces.
4. Now repeat from step 2, starting from the char at B+1.
5. Repeat as long as you haven't reached \0.
Note: You'll have to consider what to do if there are nested quotes though.
You can also use a flag (an int that is either 1 or 0) and a pointer to track whether you're inside a quote or not, following two separate rules based on the flag.
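The two-pointer steps above could be sketched as follows, copying each argument into an output buffer. It assumes no escapes and no nested quotes (as the note says, those need extra thought), an output buffer of at least strlen(in) + 1 bytes, and max of at least 1; split_two_pointers is a hypothetical name.

```c
#include <string.h>

/* Hypothetical two-pointer splitter: A (a) scans the input; on a '"', B (b)
   runs from A+1 to the closing '"', copying everything, spaces included.
   Arguments are written '\0'-separated into out; args[] gets pointers into
   out and is NULL-terminated. Returns the number of arguments. */
size_t split_two_pointers(const char *in, char *out, char *args[], size_t max)
{
    const char *a = in;
    size_t argc = 0;

    while (*a && argc + 1 < max) {
        while (*a == ' ' || *a == '\t' || *a == '\n')
            a++;                              /* skip separators */
        if (*a == '\0')
            break;
        args[argc++] = out;
        while (*a && *a != ' ' && *a != '\t' && *a != '\n') {
            if (*a == '"') {
                const char *b = a + 1;        /* B starts at A+1 */
                while (*b && *b != '"')
                    *out++ = *b++;            /* copy, spaces included */
                a = *b ? b + 1 : b;           /* resume after the closing quote */
            } else {
                *out++ = *a++;                /* ordinary character */
            }
        }
        *out++ = '\0';                        /* end of this argument */
    }
    args[argc] = NULL;
    return argc;
}
```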
Write three functions. All of these should return the number of bytes they process. Firstly, the one that handles quoted arguments.
size_t handle_quoted_argument(char *str, char **destination) {
assert(*str == '\"');
/* discard the opening quote */
*destination = str + 1;
/* find the closing quote (or a '\0' indicating the end of the string) */
size_t length = strcspn(str + 1, "\"") + 1;
assert(str[length] == '\"'); /* NOTE: You really should handle mismatching quotes properly, here */
/* discard the closing quote */
str[length] = '\0';
return length + 1;
}
... then a function to handle the unquoted arguments:
size_t handle_unquoted_argument(char *str, char **destination) {
size_t length = strcspn(str, " \n");
char c = str[length];
*destination = str;
str[length] = '\0';
return c == ' ' ? length + 1 : length;
}
... then a function to handle (possibly repetitive) whitespace:
size_t handle_whitespace(char *str) {
/* This will count consecutive whitespace characters, eg. tabs, newlines, spaces... */
return strspn(str, " \t\n\v\f\r");
}
Combining these three should be simple:
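size_t n = 0, argc = 0;
while (line[n] != '\0') {
n += handle_whitespace(line + n);
if (line[n] == '\0') break; /* only trailing whitespace was left */
n += line[n] == '\"' ? handle_quoted_argument(line + n, args + argc++)
: handle_unquoted_argument(line + n, args + argc++);
}
args[argc] = NULL; /* execvp() expects a NULL-terminated array */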
By breaking this up into four separate algorithms, can you see how much simpler this task becomes?
So here is where I read in the line:
while((qtemp = fgets(line, size, stdin)) != NULL ) {
if (strcmp(line, "exit\n") == 0) {
exit(EXIT_SUCCESS);
}
spaceorquotes(qtemp);
}
Then I go to this: (I haven't added my initializers, you get the idea though)
length = strlen(qtemp);
for(i = 0; i < length; i++) {
position = strcspn(qtemp, " \"\n");
while (strncmp(qtemp, " ", 1) == 0) {
memmove(qtemp, qtemp+1, strlen(qtemp)); /* shift left, '\0' included */
position = strcspn(qtemp, " \"\n");
} /*this while loop is for handling multiple spaces*/
if (strncmp(qtemp, "\"", 1) == 0) { /*this is for handling quotes */
memmove(qtemp, qtemp+1, strlen(qtemp)); /* drop the opening quote */
position = strcspn(qtemp, "\"");
stemp = malloc(position + 1); /* +1 for the terminating '\0' */
memcpy(stemp, qtemp, position);
stemp[position] = '\0';
args[i] = stemp;
} else { /*otherwise handle it as a (single) space*/
stemp = malloc(position + 1);
memcpy(stemp, qtemp, position);
stemp[position] = '\0';
args[i] = stemp;
}
//printf("args: %s\n", args[i]);
length = strlen(qtemp);
memmove(qtemp, qtemp+position+1, length-position);
}
args[i-1] = NULL; /*the last position seemed to be a space, so I overwrote it with a null to terminate */
if (execvp(args[0], args) == -1) {
perror("execvp");
exit(EXIT_FAILURE);
}
I found that using strcspn helped, as modifiable lvalue suggested.
I have an array of strings and I need to separate each string into several different parts. The string may or may not have arbitrary spaces and tabs.
Example string:
str[0]: " apple e 3 a a fruit "
I need it to become:
word[0] = "apple"
row[0] = "e"
column[0] = "3"
direction[0] = "a"
clue[0] = "a fruit"
So I need to remove any leading/trailing whitespace, as well as any that are in between the fields (except for the clue field. The spaces within the clues need to be retained). I'm really not sure how to go about doing this. I have a few basic ideas, but I don't know how to go about implementing them, or if they're even do-able (new to coding and quite clueless). Everything I've tried so far either wouldn't compile, or didn't work.
My most recent attempt at pulling the first field out:
for (i=0; i<MAX_LENGTH; i++) {
for (j=0; j<MAX_INPUT; j++) {
if (isSpace(&input[i][j]) == FALSE) {
//if whitespace is not present, find the location of the next
//space and copy the string up til there
static int n = j;
for (n=0; n<MAX_INPUT; n++) {
if (isSpace(&input[i][n]) == TRUE) {
strncpy(word[i],&input[i][j],(n-j));
//printf("Word[%d]: %s\n",i,word[i]);
break;
}
}
}
}
}
Not too surprised that it didn't work. I haven't quite gotten my head around the problem yet. Help please?
Your current code seems way too complicated. Here is a three-step algorithm that I would implement if I were coding this up:
1. While the current character is a space, move to the next character.
2. While the current character is not a space, append it to the output and move on to the next character.
3. Repeat steps 1 & 2 until you reach the end of the input string.
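Those three steps could be sketched like this for one line of the question's input. The handling of the last field (keep the rest of the line, inner spaces included, minus trailing whitespace) goes beyond the three steps and is there to cover the clue; split_fields and FIELD_LEN are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define FIELD_LEN 100   /* assumed size of each output buffer */

/* Hypothetical helper: the first nfields - 1 fields are single words; the
   last field takes the rest of the line (inner spaces kept, trailing
   whitespace dropped). Over-long fields are truncated. Returns the number
   of fields filled. */
int split_fields(const char *in, char fields[][FIELD_LEN], int nfields)
{
    int f = 0;
    while (f < nfields) {
        while (isspace((unsigned char)*in))          /* step 1: skip spaces */
            in++;
        if (*in == '\0')
            break;                                    /* end of the input */
        size_t len = 0;
        if (f < nfields - 1) {                        /* step 2: copy one word */
            while (*in && !isspace((unsigned char)*in)) {
                if (len < FIELD_LEN - 1)
                    fields[f][len++] = *in;
                in++;
            }
        } else {                                      /* last field: rest of line */
            size_t n = strlen(in);
            while (n > 0 && isspace((unsigned char)in[n - 1]))
                n--;                                  /* ignore trailing whitespace */
            while (len < n && len < FIELD_LEN - 1) {
                fields[f][len] = in[len];
                len++;
            }
            in += n;
        }
        fields[f][len] = '\0';
        f++;                                          /* step 3: repeat */
    }
    return f;
}
```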
Simply use sscanf to read 'words' separated by whitespace:
char str[][100] = { " apple e 3 a a fruit ", ... };
char word[100][100],row[100][100],column[100][100],direction[100][100],clue[100][100];
int i;
for( i=0; i<...; ++i )
  // the widths prevent overflow; the space before %99[^\n] skips the whitespace in front of the clue
  sscanf(str[i],"%99s %99s %99s %99s %99[^\n]",word[i],row[i],column[i],direction[i],clue[i]);
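A self-contained sketch of the sscanf approach for a single line, with a return-value check and trailing-whitespace trimming for the clue (the %[ conversion keeps trailing spaces). parse_line is a hypothetical name, and the 100-byte buffers are an assumption matching the arrays above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line into five fields.
   Returns 1 on success, 0 if fewer than five fields were found. */
int parse_line(const char *line, char word[100], char row[100],
               char column[100], char direction[100], char clue[100])
{
    /* %[ does not skip leading whitespace by itself, hence the space before it */
    if (sscanf(line, "%99s %99s %99s %99s %99[^\n]",
               word, row, column, direction, clue) != 5)
        return 0;
    size_t n = strlen(clue);
    while (n > 0 && (clue[n - 1] == ' ' || clue[n - 1] == '\t'))
        clue[--n] = '\0';             /* drop trailing spaces/tabs */
    return 1;
}
```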
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char *trimEnd(char *str){
char *p;
if(str == NULL || *str == '\0') return str;
p=str+strlen(str);
while(p > str && isspace((unsigned char)p[-1])){ // check the bound before reading p[-1]
*--p = '\0';
}
return str;
}
int main(){
char str[100][100] = { " apple e 3 a a fruit " };
char word[100][100], row[100][100], column[100][100], direction[100][100], clue[100][100];
const char *delim= " \t";
int i;
const int dataSize = 1;//for sample
for(i=0;i<dataSize;++i){
char *pwk, *p = strdup(str[i]);
strcpy(word[i], strtok(p, delim));
strcpy(row[i], strtok(NULL, delim));
strcpy(column[i],strtok(NULL, delim));
strcpy(direction[i], pwk=strtok(NULL, delim));
pwk = pwk+strlen(pwk)+1;
strcpy(clue[i], trimEnd(pwk+strspn(pwk, delim)));
fprintf(stderr,"DEBUG:%s,%s,%s,%s,%s.",word[i],row[i],column[i],direction[i],clue[i]);
free(p);
}
return 0;
}
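You can use the POSIX strtok_r function.
It breaks the string into tokens, separated by whatever delimiter characters you choose. The string must be modifiable, and here word, row, column, direction and clue are arrays of char * pointing into it.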
const char *delim= " \t";
char *tok = NULL;
word[0] = strtok_r(str[0],delim,&tok);
row[0] = strtok_r(NULL,delim,&tok);
column[0] = strtok_r(NULL,delim,&tok);
direction[0] = strtok_r(NULL,delim,&tok);
clue[0] = strtok_r(NULL,"",&tok); // empty delimiter set: returns the rest of the line, any extra leading/trailing spaces included