How to find substring between quotation marks in C - c

Suppose I have a string that holds a command such as
echo 'foobar'|cat
Is there a good way for me to get the text between the quotation marks ("foobar")? I read that it was possible to use scanf to do it in a file, is it also possible in-memory?
My attempt:
char * concat2 = concat(cmd, token);
printf("concat:%s\n", concat2);
int res = scanf(in, " '%[^']'", concat2);
printf("result:%s\n", in);

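Use strtok() once to locate the first occurrence of the delimiter you want (' in your case), and then once more to find its closing partner. Note that strtok() skips leading delimiters, so this assumes the string does not begin with a quote; if it did, the first call would already return the quoted text. Like this: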
#include <stdio.h>
#include <string.h>
int main(void) {
const char* lineConst = "echo 'foobar'|cat"; // the "input string"
char line[256]; // where we will put a copy of the input
char *subString; // the "result"
strcpy(line, lineConst);
subString = strtok(line, "'"); // text before the first single quote
subString = strtok(NULL, "'"); // text between the first and second single quote
if(!subString)
printf("Not found\n");
else
printf("the thing in between quotes is '%s'\n", subString);
return 0;
}
Output:
the thing in between quotes is 'foobar'
I based this on: How to extract a substring from a string in C?

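If your string is in the format "echo 'foobar'|cat", sscanf can be used:
char a[20]={0};
char *s="echo 'foobar'|cat";
if(sscanf(s,"%*[^']'%19[^']'",a)==1){ /* width 19 leaves room for the terminating nul */
// do something with a
}
else{
// handle this condition
}
%*[^'] reads and discards characters up to the first single quote '; the second format specifier, %19[^'], reads up to the next ' and stores the result in a. The width of 19 keeps the read within the 20-byte array. Note that %*[^'] must match at least one character, so this format fails if the string begins with a quote.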

There are many ways to approach the problem, from walking a pair of pointers down the string to locate the delimiters to using the string functions provided in string.h. You can use character-search functions such as strchr or strpbrk, tokenizing functions like strtok, and so on.
Look over and learn from them all. Here is an implementation with strpbrk and a pointer difference. It is non-destructive, so you need not make a copy of the original string.
#include <stdio.h>
#include <string.h>
int main (void) {
const char *line = "'foobar'|cat";
const char *delim = "'"; /* delimiter, single quote */
char *p, *ep;
if (!(p = strpbrk (line, delim))) { /* find the first quote */
fprintf (stderr, "error: delimiter not found.\n");
return 1;
}
p++; /* advance to next char */
ep = strpbrk (p, delim); /* set end pointer to next delim */
if (!ep) { /* validate end pointer */
fprintf (stderr, "error: matching delimiters not found.\n");
return 1;
}
char substr[ep - p + 1]; /* storage for substring */
strncpy (substr, p, ep - p); /* copy the substring */
substr[ep - p] = 0; /* nul-terminate */
printf ("\n single-quoted string : %s\n\n", substr);
return 0;
}
Example Use/Output
$ ./bin/substr
single-quoted string : foobar
Without Using string.h
As mentioned above, you can also simply walk a pair of pointers down the string and locate your pairs of quotes in that manner as well. For completeness, here is an example finding multiple quoted strings within a single line:
#include <stdio.h>
int main (void) {
const char *line = "'foobar'|cat'mousebar'sum";
char delim = '\'';
char *p = (char *)line, *sp = NULL, *ep = NULL;
size_t i = 0;
for (; *p; p++) { /* for each char in line */
if (!sp && *p == delim) /* find 1st delim */
sp = p, sp++; /* set start ptr */
else if (!ep && *p == delim) /* find 2nd delim */
ep = p; /* set end ptr */
if (sp && ep) { /* if both set */
char substr[ep - sp + 1]; /* declare substr */
for (i = 0, p = sp; p < ep; p++)/* copy to substr */
substr[i++] = *p;
substr[ep - sp] = 0; /* nul-terminate */
printf ("single-quoted string : %s\n", substr);
sp = ep = NULL;
}
}
return 0;
}
Example Use/Output
$ ./bin/substrp
single-quoted string : foobar
single-quoted string : mousebar
Look all the answers over and let us know if you have any questions.
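The answers above mention strchr without showing it, so here is a minimal sketch of that variant. The function name between_quotes and its return convention are made up for illustration; it copies the text between the first pair of quote characters into a caller-supplied buffer and does not modify the input.

```c
#include <string.h>

/* Hypothetical helper: copy the text between the first pair of `quote`
 * characters in s into out (capacity outsz).
 * Returns the number of characters copied, or -1 if there is no matching
 * pair or the result does not fit. */
int between_quotes(const char *s, char quote, char *out, size_t outsz)
{
    if (quote == '\0')
        return -1;                       /* strchr would match the terminator */

    const char *start = strchr(s, quote);
    if (!start)
        return -1;                       /* no opening quote */
    start++;                             /* step past the opening quote */

    const char *end = strchr(start, quote);
    if (!end)
        return -1;                       /* no closing quote */

    size_t n = (size_t)(end - start);
    if (n + 1 > outsz)
        return -1;                       /* no room for text plus nul */

    memcpy(out, start, n);
    out[n] = '\0';
    return (int)n;
}
```

Calling between_quotes("echo 'foobar'|cat", '\'', buf, sizeof buf) fills buf with foobar, and the same holds when the string starts with the quote, a case where the two-call strtok approach returns the wrong piece.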

Related

Removing array of occurrences from string in C

I'm having looping issues with my code. I have a method that takes in two char arrays (phrase, characters). The characters array holds characters that must be read individually and compared to the phrase. If it matches, every occurrence of the character will be removed from the phrase.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
//This method has two parameters: (str, c)
//It will remove all occurences of var 'c'
//inside of 'str'
char * rmstr(char * c, char * str) {
//Declare counters and pointers
int stemp = 0;
int ctemp = 0;
char *p = str;
char *d = c;
//Retrieve str count
while(str[stemp] != '\0') {
stemp++;
}
//Retrieve c count
while(c[ctemp] != '\0') {
ctemp++;
}
//Output information
printf("String Count: %d\n",stemp);
printf("Character Count: %d\n",ctemp);
//Iterate through arrays
for (int i = 0; i != stemp; i++) {
for (int j = 0; j != ctemp; j++) {
if (c[j] != str[i]){
*p++ = str[i];
}
break;
}
printf("%s\n",str);
}
*p = 0;
return str;
}
int main()
{
char c[256] = "ema";
char input[256] = "Great message!";
char *result = rmstr(c, input);
printf("%s", result);
return 0;
}
In this case, the input is "Great message!" and I'd like to remove all occurrences of the characters e, m, and a (as specified in main).
Using the code as it is above, the output is as follows:
Grat mssag!
It is only looping through 1 iteration and removing 'e'. I would like it to loop through 'm' and 'a' as well.
After you fix the break; that was causing your inner loop to exit early, it may make sense to reorder your loops so that you loop over the characters to remove while checking each against the characters in str. This is mostly a convenience: it lets you shuffle the remaining characters down by one in str whenever one matches a character in c. If you use a function from string.h such as memmove to move the characters down, the loop order doesn't really matter.
A simple implementation using only pointers to manually work through str removing all chars in c could look something like the following:
#include <stdio.h>
char *rmstr (char *str, const char *chars)
{
const char *c = chars; /* set pointer to beginning of chars */
while (*c) { /* loop over all chars with c */
char *p = str; /* set pointer to str */
while (*p) { /* loop over each char in str */
if (*p == *c) { /* if char in str should be removed */
char *sp = p, /* set start pointer at p */
*ep = p + 1; /* set end pointer at p + 1 */
do
*sp++ = *ep; /* copy end to start to end of str */
while (*ep++); /* (nul-char copied on last iteration) */
}
else
p++; /* advance only if nothing was removed, so repeated chars are caught */
}
c++; /* advance to next char in chars */
}
return str; /* return modified str */
}
int main (void) {
char c[] = "ema";
char input[] = "Great message!";
printf ("original: %s\n", input);
printf ("modified: %s\n", rmstr (input, c));
return 0;
}
(there are many ways to do this -- how is largely up to you. whether you use pointers as above, or get the lengths and use string-indexes is also a matter of choice)
Example Use/Output
$ ./bin/rmcharsinstr
original: Great message!
modified: Grt ssg!
If you did want to use memmove (to address the overlapping nature of the source and destination) to move the remaining characters in str down by one each time the character in str matches a character in c, you could leave the loops in your original order, e.g.
#include <string.h>
char *rmstr (char *str, const char *chars)
{
char *p = str; /* set pointer to str */
while (*p) { /* loop over each char in str */
const char *c = chars; /* set pointer to beginning of chars */
while (*c) { /* loop over all chars with c */
while (*c == *p) { /* while the character matches */
memmove (p, p + 1, strlen (p)); /* shuffle down by 1 */
c = chars; /* reset c = chars to check next */
}
c++; /* advance to next char in chars */
}
p++; /* advance to next char in str */
}
return str; /* return modified str */
}
(make sure you understand why you must reset c = chars; in this case)
Finally, if you really wanted the shorthand way of doing it, you could use strpbrk and memmove and reduce your function to:
#include <string.h>
char *rmstr (char *str, const char *chars)
{
/* simply loop using strpbrk removing the character found */
for (char *p = strpbrk (str, chars); p; p = strpbrk (str, chars))
memmove (p, p+1, strlen(p));
return str; /* return modified str */
}
(there is always more than one way to skin-the-cat in C)
The output is the same. Look things over here and let me know if you have further questions.
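For comparison, here is a sketch of one more variant that avoids shifting the tail of the string on every removal. It keeps a read pointer and a write pointer and copies only the characters that are not in the removal set. The name rmchars is made up for this example; strchr is used for the membership test.

```c
#include <string.h>

/* Hypothetical variant: remove from str every character that appears in
 * chars, in a single pass over str. Modifies str in place and returns it. */
char *rmchars(char *str, const char *chars)
{
    char *dst = str;                        /* write position */

    for (const char *src = str; *src; src++)
        if (!strchr(chars, *src))           /* keep chars not in the set */
            *dst++ = *src;

    *dst = '\0';                            /* terminate the shortened string */
    return str;
}
```

With input "Great message!" and chars "ema" this yields the same Grt ssg! as the versions above, and it also handles runs of the same character since each character is examined exactly once.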

Multiple Command-Line Arguments - Replace Words

I've a program which takes any number of words from the command-line arguments and replaces them with the word 'CENSORED'. I finally have the program working for the first argument passed in, and I am having trouble getting the program to censor all arguments, outputted in just a single string. The program rather functions individually on a given argument and does not take them all into account. How would I modify this?
How does one use/manipulate multiple command-line arguments collectively ?
My code follows.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *replace_str(char *str, char *orig, char *rep, int j, int argc)
{
static char buffer[4096];
char *p;
for ( j = 1; j <= argc; j++ )
{
if(!(p = strstr(str, orig))) // Check if 'orig' is not in 'str'
{
if ( j == argc ) { return str; } // return str once final argument is reached
else { continue; } // restart loop with next argument
}
strncpy(buffer, str, p-str); // Copy characters from 'str' start to 'orig' str
buffer[p-str] = '\0';
if ( j == argc ) { return buffer; }
else { continue; }
}
sprintf(buffer+(p-str), "%s%s", rep, p+strlen(orig));
}
int main( int argc, char* argv[] ) //argv: list of arguments; array of char pointers //argc: # of arguments.
{
long unsigned int c, i = 0, j = 1;
char str[4096];
while ( (c = getchar()) != EOF )
{
str[i] = c; // save input string to variable 'str'
i++;
}
puts(replace_str( str, argv[j], "CENSORED", j, argc ) );
return 0;
}
i.e.
$ cat Hello.txt
Hello, I am me.
$ ./replace Hello me < Hello.txt
CENSORED, I am CENSORED.
There are two issues: first, you are not guaranteeing that str is null-terminated, and second, you are not iterating over the words on the command line to censor each one. Try the following in main after your getchar() loop:
/* null-terminate str */
str[i] = 0;
/* you must check each command line word (i.e. argv[j]) */
for (j = 1; j < argc; j++)
{
puts(replace_str( str, argv[j], "CENSORED", j, argc ) );
}
Note: that will place each of the CENSORED words on a separate line. As noted in the comments, move puts (or preferably printf) outside the loop to keep on a single line.
Edit
I apologize. You have more issues than stated above. Attempting to check the fix, it became apparent that you would continue to have difficulty parsing the words depending on the order the bad words were entered on the command line.
While it is possible to do the pointer arithmetic to copy/expand/contract the original string regardless of the order the words appear on the command line, it is far easier to simply separate the words provided into an array, and then compare each of the bad words against each word in the original string.
This can be accomplished relatively easily with strtok or strsep. I put together a quick example showing this approach. (note: make a copy of the string before passing to strtok, as it will alter the original). I believe this is what you were attempting to do, but you were stumbling on not having the ability to compare each word (thus your use of strstr to test for a match).
Look over the example and let me know if you have further questions. Note: I replaced your hardcoded 4096 with a SMAX define and provided a word max WMAX for words entered on the command line. Also always initialize your strings/buffers. It will enable you to always be able to easily find the last char in the buffer and ensure the buffer is always null-terminated.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define SMAX 4096
#define WMAX 50
char *replace_str (char *str, char **bad, char *rep)
{
static char buffer[SMAX] = {0};
char *p = buffer;
char *wp = NULL;
unsigned i = 0;
unsigned char censored = 0;
char *str2 = strdup (str); /* make copy of string for strtok */
char *savp = str2; /* and save start address to free */
if (!(wp = strtok (str2, " "))) /* get first word in string or bail */
{
if (savp) free (savp);
return str;
}
while (bad[i]) /* test against each bad word */
{
if (strcmp (wp, bad[i++]) == 0) /* if matched, copy rep to buffer */
{
memcpy (buffer, rep, strlen (rep));
censored = 1;
}
}
if (!censored) /* if no match, copy original word */
memcpy (buffer, wp, strlen (wp));
while ((wp = strtok (NULL, " "))) /* repeat for each word in str */
{
i = 0;
censored = 0;
memcpy (strchr (buffer, 0), " ", 1);
p = strchr (buffer, 0); /* (get address of null-term char) */
while (bad[i])
{
if (strcmp (wp, bad[i++]) == 0)
{
memcpy (p, rep, strlen (rep));
censored = 1;
}
}
if (!censored)
memcpy (p, wp, strlen (wp));
}
if (savp) free (savp); /* free copy of strtok string */
return buffer;
}
int main ( int argc, char** argv)
{
unsigned int i = 0;
char str[SMAX] = {0};
char *badwords[WMAX] = {0}; /* array to hold command line words */
for (i = 1; i < (unsigned)argc && i < WMAX; i++) /* save command line in array, leaving the last slot NULL */
badwords[i-1] = strdup (argv[i]);
i = 0; /* print out the censored words */
printf ("\nCensor words:");
while (badwords[i])
printf (" %s", badwords[i++]);
printf ("\n\n");
printf ("Enter string: "); /* prompt to enter string to censor */
if (fgets (str, SMAX-1, stdin) == NULL)
{
fprintf (stderr, "error: failed to read str from stdin\n");
return 1;
}
str[strcspn (str, "\n")] = 0; /* strip the trailing newline, if present */
/* print out censored string */
printf ("\ncensored str: %s\n\n", replace_str (str, badwords, "CENSORED"));
i = 0; /* free all allocated memory */
while (badwords[i])
free (badwords[i++]);
return 0;
}
use/output
./bin/censorw bad realbad
Censor words: bad realbad
Enter string: It is not nice to say bad or realbad words.
censored str: It is not nice to say CENSORED or CENSORED words.
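The example above writes into a fixed static buffer without checking for overflow, which can matter when the replacement is longer than the word it replaces. Below is a rough sketch of a bounded variant that leaves the input untouched and writes into a caller-supplied buffer. The names censor_words and append are invented for this illustration, and, like the original, it only splits on spaces.

```c
#include <string.h>

/* Append n bytes of src to out (capacity outsz, current length *len).
 * Returns 0 on success, -1 if the result would not fit. */
static int append(char *out, size_t outsz, size_t *len,
                  const char *src, size_t n)
{
    if (*len + n + 1 > outsz)
        return -1;
    memcpy(out + *len, src, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Hypothetical helper: copy in to out, replacing every space-separated word
 * that exactly matches an entry of the NULL-terminated array bad with rep.
 * Returns 0 on success, -1 if out is too small. */
int censor_words(const char *in, const char *const *bad, const char *rep,
                 char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    while (*in) {
        size_t sp = strspn(in, " ");        /* copy spaces through unchanged */
        if (append(out, outsz, &len, in, sp) != 0)
            return -1;
        in += sp;

        size_t wl = strcspn(in, " ");       /* length of the next word */
        if (wl == 0)
            break;                          /* only trailing spaces remained */

        int hit = 0;
        for (size_t i = 0; bad[i]; i++)
            if (strlen(bad[i]) == wl && memcmp(in, bad[i], wl) == 0) {
                hit = 1;
                break;
            }

        int rc = hit ? append(out, outsz, &len, rep, strlen(rep))
                     : append(out, outsz, &len, in, wl);
        if (rc != 0)
            return -1;
        in += wl;
    }
    return 0;
}
```

Because the caller owns the output buffer, the function can be called repeatedly without stale data from a previous call leaking into the result.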

Substring in c without using functions

I've seen many solutions for getting a substring of a string using strndup, memcpy, strncpy, and so on.
I was wondering if there's a way to get substring without using those functions; even if it's unnecessary.
EDIT: I tried making function myself; I don't remember what the problem was but something went wrong and I ended up not using it.
char *substring(char *str, int start, int length) {
char *s = malloc(sizeof(char)*(length+1));
for(int i=start; i<start+length; i++) {
s[i-start] = str[i];
}
s[length] = '\0';
return s;
}
There are a number of ways to recreate strstr. The following is a quick implementation using the inch-worm method, where you simply use pointers to search for the beginning of the substring in string, then if found, compare every character in substring with the corresponding character in string. If all characters match, the substring is found, return a pointer to the beginning of substring in string.
If a character fails the test, look for another character in string that matches the first character in substring, until string is exhausted.
There are probably several more checks that can be implemented, but this example should get you started:
#include <stdio.h>
#include <stdlib.h>
char *strstr2 (char *str, char *sub)
{
if (!str || !sub) return NULL; /* validate both strings */
char *p = NULL; /* general pointer */
char *sp = NULL; /* substring pointer */
char *rp = NULL; /* return pointer */
char matched = 0; /* matched flag */
size_t szstr = 0; /* string length */
size_t szsub = 0; /* substring length */
p = sub;
while (*p++) szsub++; /* strlen of substr */
p = str;
while (*p++) szstr++; /* strlen of str */
if (szsub > szstr) return NULL; /* szstr < szsub - no match */
p = str;
while (p < (str + szstr - szsub + 1)) /* stop once sub can no longer fit */
{
while (*p && *p != *sub) p++; /* find start of sub in str */
if ((str + szstr) == p) return NULL; /* if end reached - no sub */
rp = p; /* save return pointer */
sp = sub; /* set sp to sub */
matched = 1; /* presume will match */
while (*sp) /* for each in substring */
if (*p++ != *sp++) { /* check if match fails */
matched = 0; /* if failed, no match */
break; /* break & find new start */
}
if (matched) /* if matched, return ptr */
return rp; /* to start of sub in str */
p = rp + 1; /* failed: resume one past the failed start */
}
return NULL; /* no match, return NULL */
}
int main() {
char *string = NULL;
char *substr = NULL;
char *begin = NULL;
printf ("\nEnter string : ");
scanf ("%m[^\n]%*c", &string);
printf ("\nEnter substr : ");
scanf ("%m[^\n]%*c", &substr);
if ((begin = strstr2 (string, substr)) != NULL)
printf ("\nSubstring found beginning at : %s\n\n", begin);
else
printf ("\nSubstring NOT in string.\n\n");
if (string) free (string);
if (substr) free (substr);
return 0;
}
output:
$ ./bin/strstr
Enter string : This is the full string or "haystack".
Enter substr : g or "
Substring found beginning at : g or "haystack".
$ ./bin/strstr
Enter string : This is the full string or "haystack".
Enter substr : g or '
Substring NOT in string.
Wow!!! So many variables and tests and lots of indentation.
In the 1970's, some considered it poor style to not have all of the return
statements at the bottom of the routine, but that thinking has mostly disappeared.
For some reason, many programmers write their conditionals to test
if one variable is equal, not equal, greater, or less than something else.
They believe that conditionals should be boolean values and nothing else.
But C allows tests of int, char or others equal or not equal to zero.
Zero can be NULL or NUL or any other zero value. This is legal and appropriate.
if (variable) return NULL;
Some consider conditionals with side effects, such as,
if (*h++ == *n++) continue;
where variables h and n are modified, to not be great style.
To avoid that, I suppose you can rewrite it as
if (*h == *n) { h++; n++; continue;}
Here is my version. It is not worse than the version you supplied on this page. But I want to believe it is shorter, simpler, and easier to understand.
My style is not perfect. Nobody has perfect style. I supply this only
for contrast.
char * strstr( const char *haystack, const char *needle) {
const char *h = haystack, *n = needle;
for (;;) {
if (!*n) return (char *)h;
if (!*h) return NULL;
if (*n++ == *h++) continue;
h = ++haystack;
n = needle;
}
}
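One case worth testing with any hand-written strstr is a needle whose prefix repeats in the haystack, such as "aab" in "aaab": after a partial match fails, the search has to resume one character past the failed start, not where the comparison stopped. Here is a small sketch of the same restart idea under a different name (find_sub is hypothetical, chosen to avoid clashing with the library strstr).

```c
#include <stddef.h>

/* Hypothetical illustration: return a pointer to the first occurrence of
 * needle in haystack, or NULL if there is none. An empty needle matches at
 * the start of haystack. */
char *find_sub(const char *haystack, const char *needle)
{
    for (;; haystack++) {
        const char *h = haystack, *n = needle;

        while (*n && *h == *n) {         /* compare from this start position */
            h++;
            n++;
        }
        if (!*n)
            return (char *)haystack;     /* whole needle matched */
        if (!*h)
            return NULL;                 /* haystack exhausted, no match */
        /* otherwise retry one character further along */
    }
}
```

Calling find_sub("aaab", "aab") returns a pointer to index 1 of the haystack, which is the overlapping case described above.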

Print delim used by strtok_r

I have this text for example:
I know,, more.- today, than yesterday!
And I'm extracting words with this code:
while(getline(&line, &len, fpSourceFile) > 0) {
last_word = NULL;
word = strtok_r(line, delim, &last_word);
while(word){
printf("%s ", word);
word = strtok_r(NULL, delim, &last_word);
// delim_used = ;
}
}
The output is:
I know more today than yesterday
But is there any way to get the delimiter used by strtok_r()? I want to replace identical words with a single integer, and do the same with delimiters. I can get one word with strtok_r(), but how do I get the delimiter used by that function?
Fortunately, strtok_r() is a pretty simple function - it's easy to create your own variant that does what you need:
#include <string.h>
/*
* public domain strtok_ex() based on a public domain
* strtok_r() by Charlie Gordon
*
* strtok_r from comp.lang.c 9/14/2007
*
* http://groups.google.com/group/comp.lang.c/msg/2ab1ecbb86646684
*
* (Declaration that it's public domain):
* http://groups.google.com/group/comp.lang.c/msg/7c7b39328fefab9c
*/
/*
strtok_ex() is an extended version of strtok_r() that optionally
returns the delimiter that was used to terminate the token
the first 3 parameters are the same as for strtok_r(), the last
parameter:
char* delim_found
is an optional pointer to a character that will get the value of
the delimiter that was found to terminate the token.
*/
char* strtok_ex(
char *str,
const char *delim,
char **nextp,
char* delim_found)
{
char *ret;
char tmp;
if (!delim_found) delim_found = &tmp;
if (str == NULL)
{
str = *nextp;
}
str += strspn(str, delim);
if (*str == '\0')
{
*delim_found = '\0';
return NULL;
}
ret = str;
str += strcspn(str, delim);
*delim_found = *str;
if (*str)
{
*str++ = '\0';
}
*nextp = str;
return ret;
}
#include <stdio.h>
int main(void)
{
char delim[] = " ,.-!";
char line[] = "I know,, more.- today, than yesterday!";
char delim_used;
char* last_word = NULL;
char* word = strtok_ex(line, delim, &last_word, &delim_used);
while (word) {
printf("word: \"%s\" \tdelim: \'%c\'\n", word, delim_used);
word = strtok_ex(NULL, delim, &last_word, &delim_used);
}
return 0;
}
Getting any skipped delimiters would be a bit more work. I don't think it would be a lot of work, but I do think the interface would be unwieldy (strtok_ex()'s interface is already clunky), so you'd have to put some thought into that.
No, you cannot identify the delimiter (by means of the call to strtok_r() itself).
From man strtok_r:
BUGS
[...]
The identity of the delimiting byte is lost.
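If you also need the skipped delimiters (all of ",, " and not only the first comma), one possible approach is to stop modifying the string and instead walk it with strspn and strcspn, reporting each word and each run of delimiters as a pointer plus length. The sketch below assumes a made-up interface, next_span, which is not part of any library.

```c
#include <string.h>

/* Hypothetical helper: return the start of the next span at *cursor and
 * advance *cursor past it. A span is either a run of delimiter characters
 * (*is_delim set to 1) or a run of non-delimiters, i.e. a word (*is_delim
 * set to 0). Its length is stored in *len. Returns NULL at end of string.
 * The input string is never modified. */
const char *next_span(const char **cursor, const char *delim,
                      size_t *len, int *is_delim)
{
    const char *s = *cursor;

    if (*s == '\0')
        return NULL;

    size_t n = strspn(s, delim);         /* length of leading delimiter run */
    *is_delim = (n > 0);
    if (n == 0)
        n = strcspn(s, delim);           /* otherwise length of the word */

    *len = n;
    *cursor = s + n;
    return s;
}
```

Since the spans are not nul-terminated, print them with a precision, for example printf("%.*s", (int)len, s). For the line "I know,, more" the spans come back as "I", " ", "know", ",, ", "more".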

Search a string using 2 different delimiters - C Programming

I want to search through a string of text in C and find the text that falls between 2 different delimiters. I am specifically looking for comments falling between /* and */. I cannot find a function which will allow me to use 2 different delimiters, each 2 characters long.
I currently have a very long char[] and I need to search through it. The closest thing I can find is to use strstr to find the first occurrence of "/*" and then use it again with "*/" instead. However, this completely omits the whole comment and just gives me the "*/" and the rest of the code.
char *pch;
char *pch2;
pch = strstr(wholeProgramStr, "/*");
printf("%s\n",pch);
pch2 = strstr(pch, "*/");
printf("%s\n",pch2);
Any ideas? I have the same question for text between "//" and a newline character. I can't find a way of implementing this without some very messy and inefficient code.
Your idea to use strstr is good, but it doesn't give you a null-terminated substring. What you get is a pointer to the beginning of the substring, plus its length via pointer arithmetic. You can print a char array limited by its length by specifying a precision with the %s format in printf:
void str_print_between(const char *str,
const char *left, const char *right)
{
const char *begin, *end;
int len;
begin = strstr(str, left);
if (begin == NULL) return;
begin += strlen(left);
end = strstr(begin, right);
if (end == NULL) return;
len = (int)(end - begin); /* "%.*s" expects an int precision */
printf("'%.*s'\n", len, begin);
}
Depending on what you want to do, you can return both start pointer and length (via pointers or as a struct) from your function. There are many strn* functions that act equivalent to their str* counterparts, but take an additional maximum length, so you could use those to process your string further.
Treating line comments works just the same as with block comments, just with other delimiters. (But this simple solution does not heed the context; it will detect comments in or across strings, for example.)
char *tmp = strdup(wholeProgramStr); /* makes a copy to be writeable */
char *pch;
char *pch2;
pch = strstr(tmp, "/*"); /* pointer to first occurrence */
if (pch) { /* found */
pch += 2; /* skip "/*" */
pch2 = strstr(pch, "*/"); /* pointer to second occurrence */
if (pch2) { /* found */
*pch2 = '\0'; /* cut */
printf("%s\n", pch);
}
}
As pointed out by @alk, there is no need to duplicate the string if you only need to print the result:
char *pch;
char *pch2;
pch = strstr(wholeProgramStr, "/*"); /* pointer to first occurrence */
if (pch) { /* found */
pch += 2; /* skip "/*" */
pch2 = strstr(pch, "*/"); /* pointer to second occurrence */
if (pch2) { /* found */
printf("%.*s\n", (int)(pch2 - pch), pch); /* precision limits the output to the comment text */
}
}
EDIT:
How would I run this again until it reaches the end of the string? So
it can find multiple comments?
Loop until you don't find both delimiters:
char *tmp = wholeProgramStr;
char *pch;
while (1) {
pch = strstr(tmp, "/*"); /* pointer to first occurrence */
if (pch) { /* found */
pch += 2; // skip "/*"
tmp = strstr(pch, "*/"); /* pointer to second occurrence */
if (tmp) { /* found */
printf("%.*s\n", (int)(tmp - pch), pch); /* precision limits the output to the comment text */
tmp += 2; // skip "*/"
} else break;
} else break;
}
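The same strstr pairing works for line comments by passing "//" and "\n" as the two delimiters. Here is a minimal sketch of a reusable helper; the name find_between and its interface are assumptions made for this example. It returns a pointer and a length and never modifies the input. As noted above, it does not understand context, so it will also match comment markers that appear inside string literals.

```c
#include <string.h>

/* Hypothetical helper: find the first text in s enclosed by left and right.
 * Returns a pointer to the first character after left and stores the number
 * of characters before right in *len. Returns NULL if either delimiter is
 * missing. */
const char *find_between(const char *s, const char *left, const char *right,
                         size_t *len)
{
    const char *begin = strstr(s, left);
    if (!begin)
        return NULL;                     /* no opening delimiter */
    begin += strlen(left);               /* skip the opening delimiter */

    const char *end = strstr(begin, right);
    if (!end)
        return NULL;                     /* no closing delimiter */

    *len = (size_t)(end - begin);
    return begin;
}
```

To find every block comment, call it in a loop and restart the search at p + len + strlen("*/") after each hit, printing each result with printf("%.*s\n", (int)len, p).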

Resources