How to extract a substring from a string in C?

I tried using strncpy, but it only works if I give it a specific number of bytes to extract.
char line[256] = "This \"is\" an example."; // I want to extract "is"
char line[256] = "This is \"also\" an example."; // I want to extract "also"
char line[256] = "This is the final \"example\"."; // I want to extract "example"
char substring[256];
How would I extract all the characters between the quotes ("") and put them in the variable substring?

Note: strtok modifies the string it works on, so it must not be given a string literal (or any other read-only string). The code below therefore copies the input into a writable array first.
The following works (tested on Mac OS 10.7 using gcc):
#include <stdio.h>
#include <string.h>
int main(void) {
const char* lineConst = "This \"is\" an example"; // the input string (a read-only literal)
char line[256]; // writable copy, because strtok modifies its argument
char *subString; // the "result"
strcpy(line, lineConst);
subString = strtok(line,"\""); // returns the text before the first double quote
subString = strtok(NULL,"\""); // continues from there: text up to the second double quote
if (subString != NULL) // NULL if there was no quoted text
printf("the thing in between quotes is '%s'\n", subString);
return 0;
}
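Here is how it works: strtok splits the string at the delimiter characters given in its second argument, in this case ". The first call returns the text before the first quote. Internally, strtok remembers how far it got, and if you call it again with NULL as the first argument (instead of a char*), it resumes from there. Thus the second call returns exactly the text between the first and second double quote, which is what you wanted. This assumes the input does not begin with a double quote: strtok skips leading delimiters, so for a line that starts with the quoted word, the first call would already return that word.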
Warning: strtok replaces the delimiter that ends each token with '\0' as it "eats" the input, so this approach modifies your input string. If that is not acceptable, make a local copy first; that is what the code above does when it copies the string constant into the array line. It would be cleaner to allocate the copy with char *copy = malloc(strlen(lineConst) + 1); followed by strcpy(copy, lineConst); and a free(copy); when you are done. If you wrap this in a function, keep in mind that strtok returns a pointer into the string it is working on rather than a copy of the token, so that pointer becomes invalid once the copy is freed (or, for a local array, once the function returns). The right thing to do is to have the caller pass a pointer to the space where the result should end up (together with its size), and copy the token into it before returning. All this is quite subtle. Let me know if this is not clear!

If you want to do it with no library support:
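// dest must be large enough for the quoted text plus the terminating '\0'
void extract_between_quotes(const char* s, char* dest)
{
int in_quotes = 0;
*dest = 0; // start with an empty result
while(*s != 0)
{
if(in_quotes)
{
if(*s == '"') return; // closing quote: done
dest[0]=*s; // copy the character...
dest[1]=0; // ...and keep dest terminated
dest++;
}
else if(*s == '"') in_quotes=1; // opening quote found
s++;
}
}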
Then call it:
extract_between_quotes(line, substring);

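#include <string.h>
...
substring[0] = '\0';
const char *start = strchr(line, '"');
if (start != NULL) { // strchr returns NULL if there is no quote at all
start++; // skip the opening quote
strncat(substring, start, strcspn(start, "\""));
}
Bounds checking is omitted: substring must be large enough for the quoted text. Avoid strtok because it has side effects (it modifies line).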

Here is a long way to do this, assuming the string to be extracted is in quotation marks.
(Updated with an error check suggested by Kieth in the comments.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char input[100];
char extract[100];
int i=0,j=0,k=0,endFlag=0;
printf("Input string: ");
if(fgets(input,sizeof(input),stdin) == NULL)
return 1;
input[strcspn(input,"\n")] = '\0'; // strip the newline fgets may leave
for(i=0;i<strlen(input);i++){
if(input[i] == '"'){
j =i+1;
while(input[j]!='"'){
if(input[j] == '\0'){ // no closing quotation mark
endFlag++;
break;
}
extract[k] = input[j];
k++;
j++;
}
break; // only extract the first quoted string
}
}
extract[k] = '\0';
if(endFlag==1){
printf("1.Your code only had one quotation mark.\n");
printf("2.So the code extracted everything after that quotation mark\n");
printf("3.To make sure buffer overflow doesn't happen in this case:\n");
printf("4.Modify the extract buffer size to be the same as input buffer size\n");
printf("\nextracted string: %s\n",extract);
}else{
printf("Extract = %s\n",extract);
}
return 0;
}
Output(1):
$ ./test
Input string: extract "this" from this string
Extract = this
Output(2):
$ ./test
Input string: Another example to extract "this gibberish" from this string
Extract = this gibberish
Output(3):(Error check suggested by Kieth)
$ ./test
Input string: are you "happy now Kieth ?
1.Your code only had one quotation mark.
2.So the code extracted everything after that quotation mark
3.To make sure buffer overflow doesn't happen in this case:
4.Modify the extract buffer size to be the same as input buffer size
extracted string: happy now Kieth ?
--------------------------------------------------------------------------------------------------------------------------------
Although it was not asked for, the following code extracts multiple words from the input string, as long as they are in quotation marks:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char input[100];
char extract[100]; // same size as input, so it cannot overflow
int i=0,j=0,k=0,endFlag=0;
printf("Input string: ");
if(fgets(input,sizeof(input),stdin) == NULL)
return 1;
input[strcspn(input,"\n")] = '\0'; // strip the newline fgets may leave
for(i=0;i<strlen(input);i++){
if(input[i] == '"'){
if(endFlag==0){ // opening quote: copy up to the closing quote
j =i+1;
while(input[j]!='"' && input[j]!='\0'){ // stop at end of string too
extract[k] = input[j];
k++;
j++;
}
endFlag = 1;
}else{ // closing quote
endFlag =0;
}
}
}
extract[k] = '\0';
printf("Extract = %s\n",extract);
return 0;
}
Output:
$ ./test
Input string: extract "multiple" words "from" this "string"
Extract = multiplefromstring

Have you tried looking at the strchr function? You should be able to call that function twice to get pointers to the first and second instances of the " character and use a combination of memcpy and pointer arithmetic to get what you want.

Related

Appending chars into a String in C with a for loop

I'm still a newbie to C so please forgive me if anything below is wrong. I've searched this up online but nothing really helped.
Right now, I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void appendStr (char *str, char c)
{
for (;*str;str++); // note the terminating semicolon here.
*str++ = c;
*str++ = 0;
}
int main(){
char string[] = "imtryingmybest";
char result[] = "";
for(int i = 0; i < strlen(string); i++){
if(i >= 0 && i <= 3){
appendStr(result, string[i]);
}
}
printf("%s", result);
}
Basically, I'm trying to add the first 4 characters of the string named string to result with a for loop. My code above did not work. I've already tried to use strcat and strncat, and neither of them worked for me either. When I used
strcat(result, string[i]);
It returns an error saying that the memory cannot be read.
I know that in this example it might have been easier if I just did
appendStr(result, string[0]);
appendStr(result, string[1]);
appendStr(result, string[2]);
appendStr(result, string[3]);
But there is a reason behind why I'm using a for loop that couldn't be explained in this example.
All in all, I'd appreciate it if someone could explain to me how to append individual characters to a string in a for loop.
The following code doesn't use your methods, but it successfully appends the first 4 chars to result:
#include <stdio.h>
#include <string.h>
int main()
{
// declare and initialize strings
char str[] = "imtryingmybest";
char result[5] = ""; // 4 chars + the \0 terminator; must start empty, since strncat appends to it
// append chars to result:
// strncat(dest, src, n) appends at most n chars of src to the end of dest,
// then adds a terminating \0
strncat(result, str, 4);
// print string
printf("result: %s\n", result);
return (0);
}
The result of the above is as wanted:
>> gcc -o test test.c
>> ./test
result: imtr
Let me know if anything is not clear, so I can elaborate further.
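The string array was ruined by an overflow of the result buffer.
result has space for only 1 byte, but each appendStr call writes 2 bytes to it: the character and the terminating 0.
The second byte lands outside result; here it apparently overwrites the start of string, so after the first call strlen(string) returns 0 and appendStr is executed only once. (Writing past the end of an array is undefined behavior, so the exact symptom depends on how the compiler lays out the variables.)
I suggest debugging with gdb.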
Try to get rid of magic numbers:
#include <stdio.h>
#include <string.h>
#define BUFF_SIZE 10 // number of bytes to allocate for the result char array
#define COPY_COUNT 4 // number of chars that will be copied
void appendStr (char *str, char c)
{
for (;*str;str++); // find the terminating '\0'
*str++ = c;
*str++ = 0;
}
int main(){
char string[] = "imtryingmybest";
char result[BUFF_SIZE] = ""; // all zeros: room for COPY_COUNT chars plus '\0'
for(size_t i = 0; i < strlen(string); i++){
if(i < COPY_COUNT){
appendStr(result, string[i]);
}
}
printf("%s", result);
}
This shows the solution; Paul Yang's answer explains the problem.
As others have pointed out the code has a simple mistake in the allocation of the destination string.
When declaring an array without specifying its size, the compiler deduces the size from the initializer; in your case that is 0 characters plus the terminating null character, i.e. a single byte.
char result[] = ""; // means { '\0' };
However, I think that the bigger issue here is that you're effectively writing a "Schlemiel the Painter" algorithm.
C strings have the serious drawback that they don't store their length, making functions that have to reach the end of the string linear in time complexity O(n).
You already know this, as shown by your function appendStr().
This isn't a serious issue until you start appending characters or strings in a loop.
In each iteration of your loop, appendStr() walks to the end of the string and then extends it, making the next iteration a little slower.
In fact, the loop as a whole is O(n²) in the length of the result.
Of course this is not noticeable for small strings or loops with few iterations, but it'll become a problem if the data scales.
To avoid this you have to take into account the growing size of the string.
I modified appendStr() to show that it now starts from the end of result:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void appendStr (char *str, char c, char *orig)
{
printf("i: %td\n", str - orig); // %td is the format for a ptrdiff_t
for (; *str; str++); // note the terminating semicolon here.
*str++ = c;
*str++ = 0;
}
int main()
{
char string[32] = "imtryingmybest";
char result[32] = "";
for(int i = 0; i < strlen(string); i++) {
if(i >= 0 && i <= 3) {
// pass a pointer to the current end of result (its terminating '\0')
appendStr(&result[i], string[i], result);
}
}
printf("%s", result);
}
You can run it here https://onlinegdb.com/HkogMxbG_
More on Schlemiel the painter
https://www.joelonsoftware.com/2001/12/11/back-to-basics/
https://codepen.io/JoshuaVB/pen/JzRoyp

How do I deal with multiple spaces before or in between the words in CS50's initials (more comfortable)?

Problem: http://docs.cs50.net/problems/initials/more/initials.html
As I said in the title, I can't seem to get the program to output the initials with no spaces if the user inputs extra spaces before the name or inputs extra spaces between the first and last name.
Right now, it works only if I input my name like "First Last", with no spaces before the name and only one space in between the two words. It will print out FL without any additional spaces. I want it to do this no matter how many extra spaces I have before or in between the first and last name.
My current code:
#include <stdio.h>
#include <cs50.h>
#include <string.h>
#include <ctype.h>
int main(void) {
printf("Name: ");
string s = get_string();
printf("%c", toupper(s[0]));
for (int i = 0; i < strlen(s); i++) {
if (s[i] == ' ') {
printf("%c", toupper(s[i +1]));
}
}
printf("\n");
}
While you already have a good answer, presuming that string s = get_string(); in the cs50.h world just fills s with a nul-terminated string, and that s is either a character array or a pointer to allocated memory, there are a couple of areas where you may consider improvements.
First, don't use printf to print a single character; that is what putchar (or fputc) is for. (Granted, a smart optimizing compiler may do this for you, but don't rely on the compiler to fix inefficiencies for you.) E.g., instead of
printf("%c", toupper(s[0]));
simply
putchar (toupper(s[0]));
Also, there are some logic issues you may wish to consider. What you want to know is (1) "Is the current character a letter?" (e.g. isalpha (s[x])), and (2) "Is this the first character (e.g. index 0), or is it a character that follows a space?" (e.g. s[x-1] == ' '). With that information, you can use a single putchar to output the initials.
Further, with s being a string, you can simply use pointer arithmetic (e.g. while (*s) {.. do stuff with *s ..; s++;}), which ends when you reach the nul-terminator. If you want to preserve s as a pointer to the first character, or if s is an array, declare char *p = s; and operate on p instead.
Putting those pieces together, you could do something like the following without relying on string.h (you can use simple ifs and bit manipulations of the 6th bit to remove reliance on ctype.h functions as well -- that's for later):
#include <stdio.h>
#include <cs50.h>
#include <ctype.h>
int main (void) {
char *p = NULL;
printf ("Name: ");
string s = get_string(); /* assuming this works as it appears */
for (p = s; *p; p++)
/* if current is [a-zA-Z] and (first or follows space) */
if (isalpha ((unsigned char)*p) && (p == s || (*(p - 1) == ' ')))
putchar (toupper ((unsigned char)*p)); /* ctype functions expect unsigned char values */
putchar ('\n'); /* tidy up */
return 0;
}
Example Use/Output
$ ./bin/initials
Name: David C. Rankin
DCR
$ ./bin/initials
Name: Jane Doe
JD
$ ./bin/initials
Name: d
D
$ ./bin/initials
Name: George W... Bush
GWB
Don't use strlen in the condition of the for-loop: it will be evaluated at every iteration (unless the compiler can prove the string doesn't change). It's better to save the value in a variable and use the variable in the condition instead.
In this case I would use strtok, which deals with inputs like Tom marvolo riddle where you have multiple white-space characters between the names.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void)
{
char line[1024];
if (fgets(line, sizeof line, stdin) == NULL)
return 1;
char *token, *src = line;
while((token = strtok(src, " \t\n")) != NULL) // '\n': fgets keeps the newline
{
src = NULL; // subsequent calls of strtok must be called
// with NULL
printf("%c", toupper((unsigned char)*token));
}
printf("\n");
return 0;
}
When using strtok you have to remember not to pass a string literal ("this is a string literal"), because string literals are read-only and strtok writes a \0 at the position where the delimiter is found. If you don't know whether you have write access to the buffer, you have to make a copy (either in a static buffer with enough length, or with malloc) and then use the copy in strtok.
In my example I know that line is not a read-only variable, so I can safely use it in strtok (provided that I won't need its original contents any more; otherwise a copy is required).
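One problem with strtok is that it is not reentrant: it keeps its position in hidden internal state, so two tokenizations cannot be interleaved. Where it is available (it is POSIX, not standard C), it's better to use strtok_r instead.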
char *token, *src = line, *saveptr;
while(token = strtok_r(src, " \t", &saveptr))
...
In order to make your code work, a simple approach is to add a flag telling whether the previous character was a space. Something like:
#include <stdio.h>
#include <cs50.h>
#include <string.h>
#include <ctype.h>
int main(void)
{
int wasSpace = 1; // Add a flag.
printf("Name: ");
string s = get_string();
for (int i = 0; i < strlen(s); i++)
{
if (wasSpace && s[i] != ' ') // Only print if previous was a space and this isn't
{
wasSpace = 0;
printf("%c", toupper(s[i]));
}
else if (s[i] == ' ')
{
wasSpace = 1; // Update the flag
}
}
printf("\n");
return 0;
}

Better understanding of strstr in C

I already asked one question earlier about the string function strstr, and it just turned out that I had made a stupid mistake. Now I'm again getting unexpected results and can't understand why. The code I've written is just a simple test so that I can understand strstr better: it takes a text file with a list of 11 words, and I'm trying to find where the first word is found within the rest of the words. All I've done is move the words from the text document into a 2D array of strings, and picked a few out that I know should return a correct value but are instead returning NULL. The first use of strstr returns the correct value, but the last 3, which I know include the word chant, return NULL. If again this is just a stupid mistake I have made, I apologize, but any help understanding this string function would be great.
The text file is formatted like this:
chant
enchant
enchanted
hello
enchanter
enchanting
house
enchantment
enchantress
truck
enchants
And the code I've written is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
FILE* file1;
char **array;
int i;
char string[12];
char *ptr;
array=(char **)malloc(11*sizeof(char*));
for (i=0;i<11;i++) {
array[i]=(char *)malloc(12*sizeof(char));
}
file1=fopen(argv[1],"r");
for (i=0;i<11;i++) {
fgets(string,12,file1);
strcpy(array[i],string);
}
ptr=strstr(array[1],array[0]);
printf("\nThe two strings chant and %s yield %s",array[1],ptr);
ptr=strstr(array[2],array[0]);
printf("\nThe two strings chant and %s yield %s",array[2],ptr);
ptr=strstr(array[4],array[0]);
printf("\nThe two strings chant and %s yield %s",array[4],ptr);
ptr=strstr(array[5],array[0]);
printf("\nThe two strings chant and %s yields %s",array[5],ptr);
return 0;
}
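Get rid of the trailing \n after fgets(). fgets() keeps the newline, so array[0] is "chant\n" and strstr() searches for "chant\n", newline included. That only matches words that end in chant, such as enchant; enchanted, enchanter and enchanting contain "chant" but not "chant\n", so strstr() returns NULL for them.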
for (i=0;i<11;i++) {
if (fgets(string, sizeof string, file1) == NULL) break; // stop at end of file
size_t len = strlen(string);
if (len > 0 && string[len-1] == '\n') string[--len] = '\0'; // remove the newline
strcpy(array[i], string);
}
Alternatively, with a small helper function:
char *chomp(char *str){
char *p = strchr(str, '\n');
if(p)
*p = '\0';
return str;
}
...
strcpy(array[i], chomp(string));

How do I make this shell to parse the statement with quotes around them in C?

I am trying to make this shell parse. How do I make the program implement parsing in a way so that commands that are in quotes will be parsed based on the starting and ending quotes and will consider it as one token? During the second while loop where I am printing out the tokens I think I need to put some sort of if statement, but I am not too sure. Any feedback/suggestions are greatly appreciated.
#include <stdio.h> //printf
#include <unistd.h> //isatty
#include <string.h> //strlen,sizeof,strtok
int main(int argc, char **argv[]){
int MaxLength = 1024; //size of buffer
int inloop = 1; //loop runs forever while 1
char buffer[MaxLength]; //buffer
bzero(buffer,sizeof(buffer)); //zeros out the buffer
char *command; //character pointer of strings
char *token; //tokens
const char s[] = "-,+,|, ";
/* part 1 isatty */
if (isatty(0))
{
while(inloop ==1) // check if the standard input is from terminal
{
printf("$");
command = fgets(buffer,sizeof(buffer),stdin); //fgets(string of char pointer,size of,input from where
token = strtok(command,s);
while (token !=NULL){
printf( " %s\n",token);
token = strtok(NULL, s); //checks for elements
}
if(strcmp(command,"exit\n")==0)
inloop =0;
}
}
else
printf("the standard input is NOT from a terminal\n");
return 0;
}
For an arbitrary command-line syntax, strtok is not the best function. It works for simple cases, where the words are delimited by special characters or white space, but there will come a time when you want to split something like ls>out into three tokens. strtok can't handle this, because it needs to place its terminating zeros somewhere, and there is no delimiter character to overwrite.
Here's a quick and dirty custom command-line parser:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int error(const char *msg)
{
printf("Error: %s\n", msg);
return -1;
}
int token(const char *begin, const char *end)
{
printf("'%.*s'\n", (int)(end - begin), begin); // the precision for %.*s must be an int
return 1;
}
int parse(const char *cmd)
{
const char *p = cmd;
int count = 0;
for (;;) {
while (isspace((unsigned char)*p)) p++;
if (*p == '\0') break;
if (*p == '"' || *p == '\'') {
int quote = *p++;
const char *begin = p;
while (*p && *p != quote) p++;
if (*p == '\0') return error("Unmatched quote");
count += token(begin, p);
p++;
continue;
}
if (strchr("<>()|", *p)) {
count += token(p, p + 1);
p++;
continue;
}
if (isalnum((unsigned char)*p)) {
const char *begin = p;
while (isalnum((unsigned char)*p)) p++;
count += token(begin, p);
continue;
}
return error("Illegal character");
}
return count;
}
int main(void)
{
char line[1024];
while (fgets(line, sizeof line, stdin) != NULL) // parse one line at a time
parse(line);
return 0;
}
This code understands words separated by white space, strings enclosed in single or double quotation marks, and single-character operators. It doesn't understand escaped quotation marks inside quotes, or non-alphanumeric characters inside words, such as the dot in foo.h.
The code is not hard to understand and you can extend it easily to understand double-char operators such as >> or comments.
If you want to escape quotation marks, you'll have to recognise the escape character in parse, and unescape it (and possibly other escape sequences) in token.
First, you've declared argv to be an array of pointers to... pointers. In fact, it is an array of pointers to chars. So:
int main(int argc, char **argv){
You seem to reach for [] by habit, which got you into incorrect code here; the more common idiom in C/C++ for a string constant is pointer syntax, e.g.:
const char* s = "-+| ";
FWIW.
Also, note that fgets() will return NULL when it hits end of file (e.g., the user types CTRL-D on *nix or CTRL-Z on DOS/Windows). You probably don't want a segmentation fault when that happens, so check the return value before using it.
Also, bzero() is a nonportable function (you probably don't care in this context) and the C compiler will happily initialize an array to zeroes for you if you ask it to (possibly worth caring about; syntax demonstrated below).
Next, as soon as you allow quoted strings, the next language question that immediately arises is: "how do I quote a quote?". Then, you are immediately out of the territory that can be handled cleanly with strtok(). I'm not 100% sure how you want to break your string into tokens. Using strtok() in the way you do, I think the string "a|b" would produce two tokens, "a" and "b", making you overlook the "|". You're treating "|" and "-" and "+" like whitespace, to be ignored, which is not generally what a shell does. For example, given this command-line:
echo 'This isn''t so hard' | cp -n foo.h .. >foo.out
I would probably want to get the following list of tokens:
echo
'This isn''t so hard'
|
cp
-n
foo.h
..
>
foo.out
Usually, characters like '+' and '-' are not special for most shells' tokenizing process (unlike '|' and '&' and '<', etc. which are instructions to the shell that the spawned command never sees). They get passed onto the application that is then free to decide "'-' indicates this word is an option and not a filename" or whatever.
What follows is a version of your code that produces the output I described (which may or may not be exactly what you want) and allows either double or single-quoted arguments (trivial to extend to handle back-ticks too) that can contain quote marks of the same kind, etc.
#include <stdio.h> //printf
#include <unistd.h> //isatty
#include <string.h> //strchr,strcmp,strcspn
#define MAXLENGTH 1024
int main(int argc, char **argv){
int inloop = 1; //loop runs forever while 1
char buffer[MAXLENGTH] = {'\0'}; //compiler inits entire array to NUL bytes
// bzero(buffer,sizeof(buffer)); //zeros out the buffer
char *command; //character pointer of strings
char *token; //tokens
char* rover;
const char* StopChars = "|&<> ";
size_t toklen;
/* part 1 isatty */
if (isatty(0)) // check if the standard input is from terminal
{
while(inloop ==1)
{
printf("$");
fflush(stdout); // make sure the prompt appears before fgets waits
token = command = fgets(buffer,sizeof(buffer),stdin); //fgets(string of char pointer,size of,input from where
if(command == NULL) // end of file (CTRL-D / CTRL-Z)
break;
command[strcspn(command, "\n")] = '\0'; // drop the trailing newline
while(*token)
{
// skip leading whitespace
while(*token == ' ')
++token;
if(*token == '\0') // only trailing spaces were left
break;
rover = token;
// if possible quoted string
if(*rover == '\'' || *rover == '\"')
{
char Quote = *rover++;
while(*rover)
if(*rover != Quote)
++rover;
else if(rover[1] == Quote)
rover += 2;
else
{
++rover;
break;
}
}
// else if special-meaning character token
else if(strchr(StopChars, *rover))
++rover;
// else generic token
else
while(*rover)
if(strchr(StopChars, *rover))
break;
else
++rover;
toklen = (size_t)(rover-token);
if(toklen)
printf(" %.*s\n", (int)toklen, token); // %.*s needs an int precision
token = rover;
}
if(strcmp(command,"exit")==0)
inloop =0;
}
}
else
printf("the standard input is NOT from a terminal\n");
return 0;
}
Regarding your specific request: commands that are in quotes will be parsed based on the starting and ending quotes.
You can use strtok() by tokenizing on the " character. Here's how:
char a[]={"\"this is a set\" this is not"};
char *buf;
buf = strtok(a, "\"");
In that code snippet, buf will point to "this is a set" (without the quotes).
Note the use of \ to escape the " character, allowing it to be used as a token delimiter.
Also, not your main issue, but you need to:
Change this:
const char s[] = "-,+,|, "; //strtok will parse on -,+| and a " " (space)
To:
const char s[] = "-+| "; //strtok will parse on only -+| and a " " (space)
strtok() treats every character in the delimiter string as a delimiter, including ",".

Arrays in C not working

Well, I declared a global array of chars like this char * strarr[];
in a method I am tokenising a line and try to put everything into that array like this
*line = strtok(s, " ");
while (line != NULL) {
*line = strtok(NULL, " ");
}
seems like this is not working.. How can I fix it?
Thanks
Any number of things could be going wrong with the code you haven't shown us, such as undefined behaviour from calling strtok on a string constant, or getting your parameters wrong when calling the function.
But the most likely problem from the code we can see is the use of *line instead of line, assuming that line is of type char *.
Use the following code as a baseline:
#include <stdio.h>
#include <string.h>
int main (void) {
char str[] = "My name is paxdiablo";
// Start tokenising words.
char *line = strtok (str, " ");
while (line != NULL) {
// Print current token and get next word.
printf ("[%s]\n", line);
line = strtok(NULL, " ");
}
return 0;
}
This outputs:
[My]
[name]
[is]
[paxdiablo]
and should be easily modifiable into something you can use.
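Be aware that, if you're trying to save the character pointers returned from strtok (which would make sense for using *line), they point into the source string, which strtok modifies in place. They are only valid as long as that buffer exists and is not overwritten; if you reuse the buffer (for example, to read the next line), the saved pointers will no longer point at the words you expect. You can avoid this by storing copies of the words, with something like: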
#define _POSIX_C_SOURCE 200809L // strdup is POSIX (standard C only since C23)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (void) {
char *word[4]; // The array of words.
size_t i; // General counter.
size_t nextword = 0; // For preventing array overflow.
char str[] = "My name is paxdiablo";
// Start tokenising.
char *line = strtok (str, " ");
while (line != NULL) {
// If array not full, duplicate string to array and advance index.
if (nextword < sizeof(word) / sizeof(*word)) {
char *copy = strdup (line);
if (copy != NULL) // strdup returns NULL if allocation fails
word[nextword++] = copy;
}
// Get next word.
line = strtok(NULL, " ");
}
// Print out all stored words, then free the copies.
for (i = 0; i < nextword; i++) {
printf ("[%s]\n", word[i]);
free (word[i]);
}
return 0;
}
Note the specific size of the word array in the code above. The use of char * strarr[] in your code, along with the compiler message "tentative array definition assumed to have one element", is almost certainly where the problem lies.
If your implementation doesn't come with a strdup, writing your own only takes a few lines :-)
