Search a string using 2 different delimiters - C Programming - c

I want to search through a string of text in C and find out where text falls between 2 different delimiters. I am specifically looking for comments falling between /* and */. I cannot find a function which will allow me to use 2 different delimiters, each 2 characters long.
I currently have a very long char[] and I need to search through it. The closest thing I can find is strstr to find the first occurrence of "/*" and then use it again with "*/" instead. However, this completely omits the whole comment and just gives me the "*/" and the rest of the code.
char *pch;
char *pch2;
pch = strstr(wholeProgramStr, "/*");
printf("%s\n",pch);
pch2 = strstr(pch, "*/");
printf("%s\n",pch2);
Any ideas? And the same question, but between "//" and a newline character. I can't find a way of implementing this without having some very messy and inefficient code.

Your idea to use strstr is good, but it doesn't give you a null-terminated string. It gives you a pointer to the beginning of the comment, and pointer arithmetic on the two results gives you its length. You can print a char array that is limited by its length by specifying a precision with the %s format in printf (%.*s):
void str_print_between(const char *str,
const char *left, const char *right)
{
const char *begin, *end;
int len;
begin = strstr(str, left);
if (begin == NULL) return;
begin += strlen(left);
end = strstr(begin, right);
if (end == NULL) return;
len = end - begin;
printf("'%.*s'\n", len, begin);
}
Depending on what you want to do, you can return both start pointer and length (via pointers or as a struct) from your function. There are many strn* functions that act equivalent to their str* counterparts, but take an additional maximum length, so you could use those to process your string further.
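Returning the start pointer and length as a struct could look like the sketch below. The names str_span and str_find_between are made up for illustration; the search itself is the same two strstr calls as above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: a view into an existing string.
 * The text it describes is NOT nul-terminated. */
struct str_span {
    const char *start;   /* first character after `left`, NULL if not found */
    size_t len;          /* number of characters before `right` */
};

/* Find the text between the first `left` and the next `right` in `str`. */
struct str_span str_find_between(const char *str,
                                 const char *left, const char *right)
{
    struct str_span span = { NULL, 0 };
    const char *begin = strstr(str, left);
    if (begin == NULL) return span;
    begin += strlen(left);                 /* skip the opening delimiter */
    const char *end = strstr(begin, right);
    if (end == NULL) return span;
    span.start = begin;
    span.len = (size_t)(end - begin);
    return span;
}
```

A caller can then print the result with printf("%.*s\n", (int)span.len, span.start) after checking that span.start is not NULL.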
Treating line comments works just the same as with block comments, just with other delimiters. (But this simple solution does not heed the context; it will detect comments in or across strings, for example.)
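For line comments, a minimal sketch could use strcspn to stop at the newline or at the end of the string, whichever comes first. The function name next_line_comment is hypothetical, and the same caveat applies: a "//" inside a string literal is reported as a comment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate the next "//" comment in `str`.
 * Returns a pointer to the text after "//" and stores its length in *len,
 * or returns NULL (leaving *len untouched) when there is no "//" left. */
const char *next_line_comment(const char *str, size_t *len)
{
    const char *begin = strstr(str, "//");
    if (begin == NULL) return NULL;
    begin += 2;                      /* skip "//" */
    *len = strcspn(begin, "\n");     /* stop at newline or end of string */
    return begin;
}
```

To find every line comment, call it again starting at the returned pointer plus the stored length, until it returns NULL.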

char *tmp = strdup(wholeProgramStr); /* makes a copy to be writeable */
char *pch;
char *pch2;
pch = strstr(tmp, "/*"); /* pointer to first occurrence */
if (pch) { /* found */
pch += 2; /* skip "/*" */
pch2 = strstr(pch, "*/"); /* pointer to second occurrence */
if (pch2) { /* found */
*pch2 = '\0'; /* cut */
printf("%s\n", pch);
}
}
As pointed out by @alk, there is no need to duplicate the string if you only need to print the result:
char *pch;
char *pch2;
pch = strstr(wholeProgramStr, "/*"); /* pointer to first occurrence */
if (pch) { /* found */
pch += 2; /* skip "/*" */
pch2 = strstr(pch, "*/"); /* pointer to second occurrence */
if (pch2) { /* found */
printf("%.*s\n", (int)(pch2 - pch), pch); /* precision limits the printed length */
}
}
EDIT:
How would I run this again until it reaches the end of the string? So
it can find multiple comments?
Loop until you don't find both delimiters:
char *tmp = wholeProgramStr;
char *pch;
while (1) {
pch = strstr(tmp, "/*"); /* pointer to first occurrence */
if (pch) { /* found */
pch += 2; // skip "/*"
tmp = strstr(pch, "*/"); /* pointer to second occurrence */
if (tmp) { /* found */
printf("%.*s\n", (int)(tmp - pch), pch); /* precision limits the printed length */
tmp += 2; // skip "*/"
} else break;
} else break;
}

Related

Strcpy() not copying the second character in a string

I have been running into issues with the strcpy() function in C. In this function I take a string in buffer that contains something along the lines of '(213);', and I am trying to remove the brackets so the output would be something like 213;.
for (i = 0; i < bufferlen; i++) {
// check for '(' followed by a naked number followed by ')'
// remove ')' by shifting the tail end of the expression
// remove '(' by shifting the beginning of the expression
if((buffer[i] == '(') && (isdigit(buffer[i+1]))){
int numberLen = 0;
int test =0;
i++;
while((isdigit(buffer[i]))){
i++;
numberLen++;
}
if(buffer[i] == ')'){
int numberStart = i - numberLen-1;
strcpy(&buffer[i], &buffer[i+1]);
strcpy(&buffer[numberStart], &buffer[numberStart+1]);
printf("buffer = %s\n", buffer);
}
}
}
However, the output is as follows
buffer before strcpy(&buffer[i], &buffer[i+1]); = (213);
buffer after strcpy(&buffer[i], &buffer[i+1]); = (213;
buffer after strcpy(&buffer[numberStart], &buffer[numberStart+1]); = 23;;
For some reason the second strcpy call removes the second character of the string. I have also tried
strcpy(&buffer[0], &buffer[1]); and still end up with the same results. Any insight as to why this is occurring would be greatly appreciated.
Continuing from the comment, strcpy(&buffer[i], &buffer[i+1]); where source and dest overlap results in Undefined Behavior. Use memmove, or simply use a couple of pointers instead.
The prohibition on using strings that overlap (i.e. are the same string) is found in C11 Standard - 7.24.2.3 The strcpy function
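As a minimal sketch of the memmove route: the helper below (remove_char_at is a made-up name) deletes one character by shifting the tail, including the terminating nul, one position to the left. It assumes i is a valid index into s, i.e. i < strlen(s).

```c
#include <string.h>

/* Hypothetical helper: delete the character at index `i` of `s`.
 * memmove is defined for overlapping source and destination, unlike strcpy.
 * The "+ 1" copies the terminating nul along with the tail. */
void remove_char_at(char *s, size_t i)
{
    memmove(s + i, s + i + 1, strlen(s + i + 1) + 1);
}
```

With buffer holding "(213);", calling remove_char_at(buffer, 4) and then remove_char_at(buffer, 0) removes the ')' first and the '(' second, which mirrors the order of the two strcpy calls in the question.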
If I understand your question and you simply want to turn "'(213)'" into "213", you don't need any of the string.h functions at all. You can simply use a couple of pointers and walk down the source string until you find a digit. Start copying digits to dest at that point by simple assignment. When the first non-digit is encountered, break your copy loop. Keeping a flag to indicate when you are "in" a number copying digits will allow you to break on the 1st non-digit to limit your copy to the first sequence of digits found (e.g. so from the string "'(213)' (423)", only 213 is returned instead of 213423). You could do something like:
char *extractdigits (char *dest, const char *src)
{
/* you can check src != NULL here */
char *p = dest; /* pointer to dest (to preserve dest for return) */
int in = 0; /* simple flag to break loop when non-digit found */
while (*src) { /* loop over each char in src */
if (isdigit(*src)) { /* if it is a digit */
*p++ = *src; /* copy to dest */
in = 1; /* set in-number flag */
}
else if (in) /* if in-number, break on non-digit */
break;
src++; /* increment src pointer */
}
*p = 0; /* nul-terminate dest */
return dest; /* return pointer to dest (for convenience) */
}
A short example would be:
#include <stdio.h>
#include <ctype.h>
#define MAXC 32
char *extractdigits (char *dest, const char *src)
{
/* you can check src != NULL here */
char *p = dest; /* pointer to dest (to preserve dest for return) */
int in = 0; /* simple flag to break loop when non-digit found */
while (*src) { /* loop over each char in src */
if (isdigit(*src)) { /* if it is a digit */
*p++ = *src; /* copy to dest */
in = 1; /* set in-number flag */
}
else if (in) /* if in-number, break on non-digit */
break;
src++; /* increment src pointer */
}
*p = 0; /* nul-terminate dest */
return dest; /* return pointer to dest (for convenience) */
}
int main (void) {
char digits[MAXC] = "";
const char *string = "'(213}'";
printf ("in : %s\nout: %s\n", string, extractdigits (digits, string));
}
Example Use/Output
$ ./bin/extractdigits
in : '(213}'
out: 213
Look things over and let me know if you have further questions.

Removing array of occurrences from string in C

I'm having looping issues with my code. I have a method that takes in two char arrays (phrase, characters). The characters array holds characters that must be read individually and compared to the phrase. If it matches, every occurrence of the character will be removed from the phrase.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
//This method has two parameters: (str, c)
//It will remove all occurences of var 'c'
//inside of 'str'
char * rmstr(char * c, char * str) {
//Declare counters and pointers
int stemp = 0;
int ctemp = 0;
char *p = str;
char *d = c;
//Retrieve str count
while(str[stemp] != '\0') {
stemp++;
}
//Retrieve c count
while(c[ctemp] != '\0') {
ctemp++;
}
//Output information
printf("String Count: %d\n",stemp);
printf("Character Count: %d\n",ctemp);
//Iterate through arrays
for (int i = 0; i != stemp; i++) {
for (int j = 0; j != ctemp; j++) {
if (c[j] != str[i]){
*p++ = str[i];
}
break;
}
printf("%s\n",str);
}
*p = 0;
return str;
}
int main()
{
char c[256] = "ema";
char input[256] = "Great message!";
char *result = rmstr(c, input);
printf("%s", result);
return 0;
}
In this case, the input would be "Great message!" and I'd like to remove all occurrences of the characters e, m and a (as specified in main).
Using the code as it is above, the output is as follows:
Grat mssag!
It is only looping through 1 iteration and removing 'e'. I would like it to loop through 'm' and 'a' as well.
After you fix your break; that was causing your inner loop to exit, it may make sense to reorder your loops and loop over the chars to remove while checking against the characters in str. This is more of a convenience, allowing you to shuffle each character down by one in str if it matches a character in c. If you are using the functions in string.h like memmove to move characters down, it doesn't really matter.
A simple implementation using only pointers to manually work through str removing all chars in c could look something like the following:
#include <stdio.h>
char *rmstr (char *str, const char *chars)
{
const char *c = chars; /* set pointer to beginning of chars */
while (*c) { /* loop over all chars with c */
char *p = str; /* set pointer to str */
while (*p) { /* loop over each char in str */
if (*p == *c) { /* if char in str should be removed */
char *sp = p, /* set start pointer at p */
*ep = p + 1; /* set end pointer at p + 1 */
do
*sp++ = *ep; /* copy end to start to end of str */
while (*ep++); /* (nul-char copied on last iteration) */
}
p++; /* advance to next char in str */
}
c++; /* advance to next char in chars */
}
return str; /* return modified str */
}
int main (void) {
char c[] = "ema";
char input[] = "Great message!";
printf ("original: %s\n", input);
printf ("modified: %s\n", rmstr (input, c));
return 0;
}
(there are many ways to do this -- how is largely up to you. whether you use pointers as above, or get the lengths and use string-indexes is also a matter of choice)
Example Use/Output
$ ./bin/rmcharsinstr
original: Great message!
modified: Grt ssg!
If you did want to use memmove (to address the overlapping nature of the source and destination) to move the remaining characters in str down by one each time the character in str matches a character in c, you could leave the loops in your original order, e.g.
#include <string.h>
char *rmstr (char *str, const char *chars)
{
char *p = str; /* set pointer to str */
while (*p) { /* loop over each char in str */
const char *c = chars; /* set pointer to beginning of chars */
while (*c) { /* loop over all chars with c */
while (*c == *p) { /* while the character matches */
memmove (p, p + 1, strlen (p)); /* shuffle down by 1 */
c = chars; /* reset c = chars to check next */
}
c++; /* advance to next char in chars */
}
p++; /* advance to next char in str */
}
return str; /* return modified str */
}
(make sure you understand why you must reset c = chars; in this case)
Finally, if you really wanted the shorthand way of doing it, you could use strpbrk and memmove and reduce your function to:
#include <string.h>
char *rmstr (char *str, const char *chars)
{
/* simply loop using strpbrk removing the character found */
for (char *p = strpbrk (str, chars); p; p = strpbrk (str, chars))
memmove (p, p+1, strlen(p));
return str; /* return modified str */
}
(there is always more than one way to skin-the-cat in C)
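One more variation, sketched here under the assumption that a single pass over str is preferred to repeatedly shifting the tail: a read pointer visits every character and a write pointer stores only the characters that strchr does not find in chars. The name rmstr_onepass is made up for this example.

```c
#include <string.h>

/* Single-pass sketch (hypothetical name): `rd` reads every character of
 * str, `wr` writes back only those that do not appear in `chars`. */
char *rmstr_onepass(char *str, const char *chars)
{
    char *wr = str;
    for (const char *rd = str; *rd; rd++)
        if (!strchr(chars, *rd))   /* keep characters not listed in chars */
            *wr++ = *rd;
    *wr = '\0';                    /* nul-terminate the shortened string */
    return str;
}
```

Each character is copied at most once, so no reset of the chars pointer is needed.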
The output is the same. Look things over here and let me know if you have further questions.

Remove spaces from string doesn't return anything

This code removes spaces from a string.
char *RemoveSpaces(char *source)
{
char* i = source;
char* j = source;
while(*j != '\0')
{
*i = *j++;
if(*i != ' ')
i++;
}
*i = 0;
return i;
}
But when I use printf it doesn't return anything.
printf("%s", RemoveSpaces("HELLO WORLD!!! COME ONE\r"));
*i = 0;
return i; // wrong pointer returned
is wrong. It returns a pointer to the string termination (i.e. a pointer to the 0). Therefore it prints nothing.
Try:
*i = 0; // or *i = '\0';
return source; // correct pointer returned
Further, modifying a string literal is not allowed. Instead do:
char str[] = "HELLO WORLD!!! COME ONE\r";
printf("%s", RemoveSpaces(str));
A slight adjustment to your space removal logic may help things make more sense. Instead of setting
*i = *j++;
if(*i != ' ')
i++;
before you test whether *i != ' ', you may find it makes more sense to test first and then assign (where p is your i and ep (end pointer) is your j), e.g.
while (*ep) { /* iterate ep from beginning to end of src */
if (*ep != ' ') /* if its not a space */
*p++ = *ep; /* set begin pointer to char at end ptr, advance */
ep++; /* advance to next char */
}
*p = 0; /* nul-terminate src at p */
In that sense, you only assign and advance the initial pointer if the current character is not a space.
In your example and here, you are modifying the content of source in place. Your 'i' and 'j' pointers are used for iterating over the characters in source, so when you are done with the iterations, you return source (or source can be used back in the caller without the return, as you are modifying the contents of source in place).
Putting those pieces together, you could do something like the following, which takes the string to remove spaces from as the first argument to your program (or uses "HELLO WORLD!!! COME ONE\n" by default if no argument is given), e.g.
#include <stdio.h>
char *rmspaces (char *src)
{
char *p = src, /* pointer to beginning of src */
*ep = src; /* pointer to iterate to the end of src */
while (*ep) { /* iterate ep from beginning to end of src */
if (*ep != ' ') /* if its not a space */
*p++ = *ep; /* set begin pointer to char at end ptr, advance */
ep++; /* advance to next char */
}
*p = 0; /* nul-terminate src at p */
return src;
}
int main (int argc, char **argv) {
char *s = argc > 1 ? argv[1] : (char[]){"HELLO WORLD!!! COME ONE\n"};
printf ("%s", rmspaces (s));
return 0;
}
(Note: your editing in place is fine, but source (or src in my case) must be modifiable, meaning you cannot pass a string literal as source (which would likely SegFault). Above, a compound literal is used in the default case to ensure the string passed is a modifiable character array.)
Example Use/Output
$ ./bin/rmspaces
HELLOWORLD!!!COMEONE
Look things over and let me know if you have further questions.

How to find substring between quotation marks in C

If I have a string such as the string that is the command
echo 'foobar'|cat
Is there a good way for me to get the text between the quotation marks ("foobar")? I read that it was possible to use scanf to do it in a file, is it also possible in-memory?
My attempt:
char * concat2 = concat(cmd, token);
printf("concat:%s\n", concat2);
int res = scanf(in, " '%[^']'", concat2);
printf("result:%s\n", in);
Use strtok() once, to locate the first occurrence of delimiter you wish (' in your case), and then once more, to find the ending pair of it, like this:
#include <stdio.h>
#include <string.h>
int main(void) {
const char* lineConst = "echo 'foobar'|cat"; // the "input string"
char line[256]; // where we will put a copy of the input
char *subString; // the "result"
strcpy(line, lineConst);
subString = strtok(line, "'"); // token before the first single quote ("echo ")
subString=strtok(NULL, "'"); // token between the first and second single quote
if(!subString)
printf("Not found\n");
else
printf("the thing in between quotes is '%s'\n", subString);
return 0;
}
Output:
the thing in between quotes is 'foobar'
I based this on: How to extract a substring from a string in C?
If your string is in this format, "echo 'foobar'|cat", sscanf can be used:
char a[20]={0};
char *s="echo 'foobar'|cat";
if(sscanf(s,"%*[^']'%19[^']'",a)==1){ /* 19 leaves room for the nul in a[20] */
// do something with a
}
else{
// handle this condition
}
%*[^'] will read and discard a string until it encounters a single quote ', and the second format specifier %19[^'] will read at most 19 characters up to the next ' and store them in a.
There are a large number of ways to approach the problem, from walking a pair of pointers down the string to locate the delimiters to using one of the many string functions provided in string.h. You can make use of character search functions such as strchr or string search functions like strpbrk, you can use tokenizing functions like strtok, etc...
Look over and learn from them all. Here is an implementation with strpbrk and a pointer difference. It is non-destructive, so you need not make a copy of the original string.
#include <stdio.h>
#include <string.h>
int main (void) {
const char *line = "'foobar'|cat";
const char *delim = "'"; /* delimiter, single quote */
char *p, *ep;
if (!(p = strpbrk (line, delim))) { /* find the first quote */
fprintf (stderr, "error: delimiter not found.\n");
return 1;
}
p++; /* advance to next char */
ep = strpbrk (p, delim); /* set end pointer to next delim */
if (!ep) { /* validate end pointer */
fprintf (stderr, "error: matching delimiters not found.\n");
return 1;
}
char substr[ep - p + 1]; /* storage for substring */
strncpy (substr, p, ep - p); /* copy the substring */
substr[ep - p] = 0; /* nul-terminate */
printf ("\n single-quoted string : %s\n\n", substr);
return 0;
}
Example Use/Output
$ ./bin/substr
single-quoted string : foobar
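The strchr function mentioned earlier does the same job when the delimiter is a single character. The following is only a sketch: the name between_delims, the caller-supplied output buffer, and the 0 / -1 return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the text between the first pair of `delim`
 * characters in `line` into `out` (capacity `outsz` bytes).
 * Returns 0 on success, -1 if no pair is found or `out` is too small. */
int between_delims(const char *line, char delim, char *out, size_t outsz)
{
    if (delim == '\0') return -1;        /* strchr would match the terminator */
    const char *p = strchr(line, delim); /* first delimiter */
    if (!p) return -1;
    p++;                                 /* advance past it */
    const char *ep = strchr(p, delim);   /* matching delimiter */
    if (!ep) return -1;
    size_t len = (size_t)(ep - p);
    if (len + 1 > outsz) return -1;      /* need room for the nul */
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

Like the strpbrk version, it does not modify line, so no copy of the original string is needed.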
Without Using string.h
As mentioned above, you can also simply walk a pair of pointers down the string and locate your pairs of quotes in that manner as well. For completeness, here is an example finding multiple quoted strings within a single line:
#include <stdio.h>
int main (void) {
const char *line = "'foobar'|cat'mousebar'sum";
char delim = '\'';
char *p = (char *)line, *sp = NULL, *ep = NULL;
size_t i = 0;
for (; *p; p++) { /* for each char in line */
if (!sp && *p == delim) /* find 1st delim */
sp = p, sp++; /* set start ptr */
else if (!ep && *p == delim) /* find 2nd delim */
ep = p; /* set end ptr */
if (sp && ep) { /* if both set */
char substr[ep - sp + 1]; /* declare substr */
for (i = 0, p = sp; p < ep; p++)/* copy to substr */
substr[i++] = *p;
substr[ep - sp] = 0; /* nul-terminate */
printf ("single-quoted string : %s\n", substr);
sp = ep = NULL;
}
}
return 0;
}
Example Use/Output
$ ./bin/substrp
single-quoted string : foobar
single-quoted string : mousebar
Look all the answers over and let us know if you have any questions.

Substring in c without using functions

I've seen many solutions for getting a substring of a string using strndup, memcpy, strncpy, etc.
I was wondering if there's a way to get substring without using those functions; even if it's unnecessary.
EDIT: I tried making function myself; I don't remember what the problem was but something went wrong and I ended up not using it.
char *substring(char *str, int start, int length) {
char *s = malloc(sizeof(char)*(length+1));
for(int i=start; i<start+length; i++) {
s[i-start] = str[i];
}
s[length] = '\0';
return s;
}
There are a number of ways to recreate strstr. The following is a quick implementation using the inch-worm method, where you simply use pointers to search for the beginning of the substring in string, then if found, compare every character in substring with the corresponding character in string. If all characters match, the substring is found, return a pointer to the beginning of substring in string.
If a character fails the test, look for another character in string that matches the first character in substring, until string is exhausted.
There are probably several more checks that can be implemented, but this example should get you started:
#include <stdio.h>
#include <stdlib.h>
char *strstr2 (char *str, char *sub)
{
if (!str || !sub) return NULL; /* validate both strings */
char *p = NULL; /* general pointer */
char *sp = NULL; /* substring pointer */
char *rp = NULL; /* return pointer */
char matched = 0; /* matched flag */
size_t szstr = 0; /* string length */
size_t szsub = 0; /* substring length */
p = sub;
while (*p++) szsub++; /* strlen of substr */
p = str;
while (*p++) szstr++; /* strlen of str */
if (szsub > szstr) return NULL; /* szstr < szsub - no match */
p = str;
while (p < (str + szstr - szsub + 1)) /* stop when sub can no longer fit */
{
while (*p && *p != *sub) p++; /* find start of sub in str */
if ((str + szstr) == p) return NULL; /* if end reached - no sub */
rp = p; /* save return pointer */
sp = sub; /* set sp to sub */
matched = 1; /* presume will match */
while (*sp) /* for each in substring */
if (*p++ != *sp++) { /* check if match fails */
matched = 0; /* if failed, no match */
p = rp + 1; /* resume search just after rp */
break; /* break & find new start */
}
if (matched) /* if matched, return ptr */
return rp; /* to start of sub in str */
}
return NULL; /* no match, return NULL */
}
int main() {
char *string = NULL;
char *substr = NULL;
char *begin = NULL;
printf ("\nEnter string : ");
scanf ("%m[^\n]%*c", &string);
printf ("\nEnter substr : ");
scanf ("%m[^\n]%*c", &substr);
if ((begin = strstr2 (string, substr)) != NULL)
printf ("\nSubstring found beginning at : %s\n\n", begin);
else
printf ("\nSubstring NOT in string.\n\n");
if (string) free (string);
if (substr) free (substr);
return 0;
}
output:
$ ./bin/strstr
Enter string : This is the full string or "haystack".
Enter substr : g or "
Substring found beginning at : g or "haystack".
$ ./bin/strstr
Enter string : This is the full string or "haystack".
Enter substr : g or '
Substring NOT in string.
Wow!!! So many variables and tests and lots of indentation.
In the 1970's, some considered it poor style to not have all of the return
statements at the bottom of the routine, but that thinking has mostly disappeared.
For some reason, many programmers write their conditionals to test
if one variable is equal, not equal, greater, or less than something else.
They believe that conditionals should be boolean values and nothing else.
But C allows tests of int, char or others equal or not equal to zero.
Zero can be NULL or NUL or any other zero value. This is legal and appropriate.
if (variable) return NULL;
Some consider conditionals with side effects, such as,
if (*h++ == *n++) continue;
where variables h and n are modified, to not be great style.
To avoid that, I suppose you can rewrite it as
if (*h == *n) { h++; n++; continue;}
Here is my version. It is not worse than the version you supplied on this page. But I want to believe it is shorter, simpler, and easier to understand.
My style is not perfect. Nobody has perfect style. I supply this only
for contrast.
char * strstr( const char *haystack, const char *needle) {
const char *h = haystack, *n = needle;
for (;;) {
if (!*n) return (char *)haystack; /* start of the match, not its end */
if (!*h) return NULL;
if (*n++ == *h++) continue;
h = ++haystack;
n = needle;
}
}
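One way to gain confidence in a hand-written search routine is to compare its result with the library's strstr on a few inputs. The sketch below uses a loop of the same shape, written so that it returns the start of the match, under the hypothetical name my_strstr so that it does not clash with the library function; agrees_with_library is likewise a made-up helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name, chosen to avoid redefining the library's strstr. */
char *my_strstr(const char *haystack, const char *needle)
{
    const char *h = haystack, *n = needle;
    for (;;) {
        if (!*n) return (char *)haystack;  /* whole needle matched here */
        if (!*h) return NULL;              /* haystack exhausted */
        if (*n++ == *h++) continue;        /* characters match, keep going */
        h = ++haystack;                    /* retry one position further */
        n = needle;
    }
}

/* Returns 1 when my_strstr and the library strstr agree, 0 otherwise. */
int agrees_with_library(const char *haystack, const char *needle)
{
    return my_strstr(haystack, needle) == strstr(haystack, needle);
}
```

Inputs worth trying include a needle that starts with a repeated character (such as "ab" in "aab"), an empty needle, and a needle longer than the haystack.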
