Separating a string that can have arbitrary whitespace in C

I have an array of strings and I need to separate each string into several different parts. The string may or may not have arbitrary spaces and tabs.
Example string:
str[0]: " apple e 3 a a fruit "
I need it to become:
word[0] = "apple"
row[0] = "e"
column[0] = "3"
direction[0] = "a"
clue[0] = "a fruit"
So I need to remove any leading/trailing whitespace, as well as any whitespace between the fields (except within the clue field: the spaces inside the clues need to be retained). I'm really not sure how to go about doing this. I have a few basic ideas, but I don't know how to implement them, or if they're even doable (I'm new to coding and quite clueless). Everything I've tried so far either wouldn't compile or didn't work.
My most recent attempt at pulling the first field out:
for (i=0; i<MAX_LENGTH; i++) {
    for (j=0; j<MAX_INPUT; j++) {
        if (isSpace(&input[i][j]) == FALSE) {
            //if whitespace is not present, find the location of the next
            //space and copy the string up til there
            static int n = j;
            for (n=0; n<MAX_INPUT; n++) {
                if (isSpace(&input[i][n]) == TRUE) {
                    strncpy(word[i],&input[i][j],(n-j));
                    //printf("Word[%d]: %s\n",i,word[i]);
                    break;
                }
            }
        }
    }
}
Not too surprised that it didn't work. I haven't quite gotten my head around the problem yet. Help please?

Your current code seems way too complicated. Here is a three-step algorithm that I would implement if I were coding this up:
1. While the current character is whitespace (a space or a tab), move to the next character.
2. While the current character is not whitespace, append it to the output and move on to the next character; this completes one field.
3. Repeat steps 1 and 2 until you reach the end of the input string.

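Simply use sscanf to read 'words' separated by whitespace:
char str[][100] = { " apple e 3 a a fruit ", ... };
char word[100][100], row[100][100], column[100][100], direction[100][100], clue[100][100];
int i;
for (i = 0; i < ...; ++i)
    /* %99s skips leading whitespace and reads one word; the space before
       %99[^\n] skips the blanks in front of the clue (trailing blanks are kept) */
    sscanf(str[i], "%99s%99s%99s%99s %99[^\n]", word[i], row[i], column[i], direction[i], clue[i]);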

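#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Removes trailing whitespace from str in place. */
char *trimEnd(char *str){
    char *p;
    if(str == NULL || *str == '\0') return str;
    p = str + strlen(str);
    /* test p > str before reading p[-1], so we never step before the buffer */
    while(p > str && isspace((unsigned char)p[-1])){
        *--p = '\0';
    }
    return str;
}

int main(){
    char str[100][100] = { " apple e 3 a a fruit " };
    char word[100][100], row[100][100], column[100][100], direction[100][100], clue[100][100];
    const char *delim = " \t";
    int i;
    const int dataSize = 1; //for sample
    for(i=0;i<dataSize;++i){
        char *pwk, *p = strdup(str[i]); /* strdup is POSIX (and standard since C23) */
        /* strtok returns NULL if a field is missing; real code should check it */
        strcpy(word[i], strtok(p, delim));
        strcpy(row[i], strtok(NULL, delim));
        strcpy(column[i], strtok(NULL, delim));
        strcpy(direction[i], pwk = strtok(NULL, delim));
        /* step past the '\0' strtok wrote after the direction field
           (assumes a clue follows it) */
        pwk = pwk + strlen(pwk) + 1;
        /* skip the clue's leading delimiters, then trim its trailing whitespace */
        strcpy(clue[i], trimEnd(pwk + strspn(pwk, delim)));
        fprintf(stderr, "DEBUG:%s,%s,%s,%s,%s.\n", word[i], row[i], column[i], direction[i], clue[i]);
        free(p);
    }
    return 0;
}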

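You can use the strtok_r function (it comes from POSIX rather than standard C).
It breaks the string into tokens separated by whichever characters you choose.
/* word, row, column, direction and clue are arrays of char * here;
   the tokens point into str[0], which strtok_r modifies */
const char *delim = " \t";
char *tok = NULL;
word[0] = strtok_r(str[0], delim, &tok);
row[0] = strtok_r(NULL, delim, &tok);
column[0] = strtok_r(NULL, delim, &tok);
direction[0] = strtok_r(NULL, delim, &tok);
clue[0] = strtok_r(NULL, "", &tok); // delim="" - won't break by whitespace (extra leading and trailing blanks of the clue are kept)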

Related

How do I put a string into an array of "words" in C?

So I am making a small program, and I wanted to ask: how do I break a string into an array of words? I tried strtok, but what if there is a tab or something like that?
char *S = "This is a cool sentence";
char *words[];
words[0] = "This";
words[1] = "is";
// etc.
Can anyone help?
strtok works just fine even if there are tabs in between. Setting the delimiter (the second argument of strtok) to a space (" ") also skips runs of consecutive spaces. For further clarification, refer to the code below.
EDITED: As @Chris Dodd has rightly mentioned, you should add \t to the delimiter, strtok(str, " \t"), so that tabs are skipped as well.
#include <stdio.h>
#include <string.h>

int main() {
    // Initialize str with a string with a bunch of spaces and tabs in between
    char str[100] = "Hi!  This \t is   a\tlong  sentence.";
    // Get the first word
    char* word = strtok(str, " \t");
    if (!word) return 0; // the string contained no words at all
    printf("First word: %s", word); // prints `First word: Hi!`
    // Declare an array of strings to store each remaining word
    char * words[20];
    int count = 0;
    // Loop through the string to get the rest of the words
    while (count < 20) { // stop before overflowing words[]
        word = strtok(NULL, " \t");
        if(!word) break; // breaks out of the loop if no more words are left
        words[count] = word; // Store it in the array
        count++;
    }
    int index = 0;
    // Loop through words and print
    while(index < count) {
        // prints a comma after the previous word, then the next word on a new line
        printf(",\n%s", words[index]);
        index++;
    }
    printf("\n");
    return 0;
}
Output (note that no space is printed between the words and the commas):
First word: Hi!,
This,
is,
a,
long,
sentence.
Certainly making no claims on efficiency/elegance, but this is a possible implementation that splits words on all whitespace. It only prints the words; it does not save them to an array or elsewhere. I'll leave that as an exercise for you:
#include <stdio.h>
#include <ctype.h>

#define MAX_WORD 256

void printOrSaveWord(char curWord[], size_t curWordIndex)
{
    curWord[curWordIndex] = '\0';
    if (curWordIndex > 0)
    {
        printf("%s\n", curWord);
    }
}

void separateWords(const char* sentence)
{
    char curWord[MAX_WORD] = { 0 };
    size_t curWordIndex = 0;
    for (size_t i=0; sentence[i]; i++)
    {
        // whitespace ends the current word (if there is one)
        if (isspace((unsigned char)sentence[i]))
        {
            // found a space, print out the word. This is where you would
            // add it to an array or otherwise save it, I'll leave that
            // task to you
            printOrSaveWord(curWord, curWordIndex);
            // reset index
            curWordIndex = 0;
        }
        else if (curWordIndex < MAX_WORD - 1) // leave room for '\0'; longer words are truncated
        {
            curWord[curWordIndex++] = sentence[i];
        }
    }
    // catch the ending case
    printOrSaveWord(curWord, curWordIndex);
}

What has strtok been replaced with? [duplicate]

#include <stdio.h>
int
main() {
    char string[] = "my name is geany";
    int length = sizeof(string)/sizeof(char);
    printf("%i", length);
    int i;
    for ( i = 0; i<length; i++ ) {
    }
    return 0;
}
If I want to print "my", "name", "is" and "geany" separately, what do I do? I was thinking of using a delimiter, but I don't know how to do it in C.
1. Start with a pointer to the beginning of the string.
2. Iterate character by character, looking for your delimiter.
3. Each time you find one, you have a substring running from the last start position up to the delimiter, whose length is the difference between the two positions; do what you want with that.
4. Set the new start position to the delimiter + 1, and then go to step 2.
Do all this while there are characters remaining in the string (reaching the end of the string also ends the last substring).
I needed to do this because the environment I was working in had a restricted library that lacked strtok. Here's how I broke up a hyphen-delimited string (grub_strchr is GRUB's counterpart of the standard strchr):
b = grub_strchr(a,'-');
if (!b)
    <handle error>
else
    *b++ = 0;
c = grub_strchr(b,'-');
if (!c)
    <handle error>
else
    *c++ = 0;
Here, a begins life as the compound string "A-B-C"; after the code executes, there are three null-terminated strings, a, b and c, with the values "A", "B" and "C". <handle error> is a placeholder for code that reacts to missing delimiters.
Note that, like strtok, this modifies the original string by replacing the delimiters with null characters ('\0').
This breaks a string at newlines and trims whitespace from each reported line. Unlike strtok, it does not modify the string, so it can be used on a const char* of unknown origin, which strtok cannot. The trade-off is that begin and end are pointers into the original string: they delimit a range rather than forming null-terminated strings like the ones strtok returns. Of course, this uses a static local variable, so it isn't thread-safe.
#include <stdio.h> // for printf
#include <stdbool.h> // for bool
#include <ctype.h> // for isspace

static bool readLine (const char* data, const char** beginPtr, const char** endPtr) {
    static const char* nextStart;
    if (data) {
        nextStart = data;
        return true;
    }
    if (*nextStart == '\0') return false;
    *beginPtr = nextStart;
    // Find next delimiter.
    do {
        nextStart++;
    } while (*nextStart != '\0' && *nextStart != '\n');
    // Trim whitespace. Compare the pointers first, so *endPtr is never
    // dereferenced after it has moved in front of *beginPtr.
    *endPtr = nextStart - 1;
    while (*beginPtr < *endPtr && isspace((unsigned char)**beginPtr))
        (*beginPtr)++;
    while (*endPtr >= *beginPtr && isspace((unsigned char)**endPtr))
        (*endPtr)--;
    (*endPtr)++;
    return true;
}

int main (void) {
    const char* data = " meow ! \n \r\t \n\n meow ? ";
    const char* begin;
    const char* end;
    readLine(data, 0, 0);
    while (readLine(0, &begin, &end)) {
        printf("'%.*s'\n", (int)(end - begin), begin); // %.*s needs an int precision
    }
    return 0;
}
Output:
'meow !'
''
''
'meow ?'
1. Use strchr to find the next space.
2. Store a '\0' at that location.
3. The word is now printable with printf.
4. Repeat, starting the search at the position after the '\0'. If nothing is found, print the last word and break out; otherwise, print the word and continue the loop.
Reinventing the wheel is often a bad idea, and learning to use the standard library functions is good training too.
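#include <stdio.h>
#include <string.h>
/*
 * `strtok` is not reentrant, so it's not thread-safe. In a POSIX environment, use
 * `strtok_r` instead.
 */
/* Prints each space-separated word of s (which strtok modifies) on its own
 * line and returns the number of words printed. */
int f( char * s ) {
    char * p;
    int ret = 0;
    /* the first call passes s; later calls pass NULL to continue the same scan */
    for ( p = strtok( s, " " ); p != NULL; p = strtok( NULL, " " ) ) {
        puts( p );
        ret++;
    }
    return ret;
}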

C string nested splitting

I'm a beginner at C and I'm stuck on a simple problem. Here it goes:
I have a string formatted like this: "first1:second1\nsecond2\nfirst3:second3" ... and so on.
As you can see from the example, the first field is optional ([firstx:]secondx).
I need to get a resulting string which contains only the second field. Like this: "second1\nsecond2\nsecond3".
I did some research here on stack (string splitting in C) and I found that there are two main functions in C for string splitting: strtok (obsolete) and strsep.
I tried to write the code using both functions (plus strdup) without success. Most of the time I get some unpredictable result.
Better ideas?
Thanks in advance
EDIT:
This was my first try
int main(int argc, char** argv){
    char * stri = "ciao:come\nva\nquialla:grande\n";
    char * strcopy = strdup(stri); // since strsep and strtok both modify the input string
    char * token;
    while((token = strsep(&strcopy, "\n"))){
        if(token[0] != '\0'){ // I don't want the last match of '\n'
            char * sub_copy = strdup(token);
            char * sub_token = strtok(sub_copy, ":");
            sub_token = strtok(NULL, ":");
            if(sub_token[0] != '\0'){
                printf("%s\n", sub_token);
            }
        }
        free(sub_copy);
    }
    free(strcopy);
}
Expected output: "come", "va", "grande"
Here's a solution with strcspn:
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *str = "ciao:come\nva\nquialla:grande\n";
    const char *p = str;
    while (*p) {
        size_t n = strcspn(p, ":\n");
        if (p[n] == ':') {
            p += n + 1;
            n = strcspn(p , "\n");
        }
        if (p[n] == '\n') {
            n++;
        }
        fwrite(p, 1, n, stdout);
        p += n;
    }
    return 0;
}
We compute the size of the initial segment not containing : or \n. If it's followed by a :, we skip over it and get the next segment that doesn't contain \n.
If it's followed by \n, we include the newline character in the segment. Then we just need to output the current segment and update p to continue processing the rest of the string in the same way.
We stop when *p is '\0', i.e. when the end of the string is reached.

Removing substrings from words that start with them (in C)

I'm trying to write a small piece of code (a function) that removes substrings from a string, but only where a word of the string starts with one of them. Here is what I've tried so far:
char *string="internship, is wonderful for people who like programming";
char *prefixes="intern wonder in pe";
char *delsubstr(char *string, const char *prefixes)
{
    char *tokprefix;
    tokprefix=strtok(prefixes, " ");
    while(tokprefix)
    {
        size_t m = strlen(tokprefix);
        char *p = string;
        while ((p = strstr(p, tokprefix))!= NULL)
        {
            {
                char *q = p + m;
                size_t n = strlen(q);
                memmove(p, q, n + 1);
            }
        }
        tokprefix=strtok(NULL, " ");
    }
    return string;
}
The problem with it is that it removes the substrings from everywhere, and I only want them removed from the beginning of words. Also, if both "intern" and "in" are present, I want the longer one removed from the string, not the shorter one. Does anyone have any ideas or suggestions? I think an if condition before the memmove could be enough, but I'm not sure.
"I think an if condition before the memmove could be enough, but I'm not sure."
Apart from the problem of modifying the prefixes string literal, which is dealt with in the comments above, you're approximately correct:
while ((p = strstr(p, tokprefix))!= NULL)
{
    if (p > string && p[-1] != ' ') // not at beginning of word?
        ++p;                        // then just advance pointer
    else                            // else remove start of word
    {
        …
