Cut string at a certain character - c

I'm trying to write a function that cuts a string into parts and prints (or otherwise processes) each part.
Say I have this string:
"strings.no.header"
And I would like to split it up so that it prints something like this:
strings
no
header
Here is my lousy attempt, where I am first able to (without the else statement) print out "strings". My idea was to make the function recursive so that it prints and removes the first part before the ".", and then does the same with the rest of the string.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void cut(char string[]);
int main() {
char string[] = "strings.no.header";
cut(string);
}
void cut(char string[]) {
char *toString = malloc(sizeof(char));
int len = strlen(string);
for(int i = 0; i < len; i++) {
int count = 0;
if(string[i] != '.') {
toString[i] = string[i];
} else if(string[i] == '.') {
char *tmp = malloc(sizeof(char));
tmp = string + i;
cut(tmp);
}
}
printf("%s", toString);
}
Would be grateful if someone could point me in the right direction.

I have an idea:
Check this web page: https://en.wikibooks.org/wiki/C_Programming/Strings
Use strchr to find the '.'. Once you know its position, copy the characters before it into another string; because you already know where the '.' appears, you know exactly what to copy.
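
The strchr idea could be sketched roughly like this. The function name split_with_strchr and the PART_MAX limit are made up for illustration, and parts longer than the limit are simply truncated, so treat it as a starting point rather than a finished routine:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PART_MAX 32  /* hypothetical limit on the length of one part */

/* Copies each delim-separated part of s into parts[] without modifying s.
   Returns the number of parts stored (at most max_parts). */
size_t split_with_strchr(const char *s, char delim,
                         char parts[][PART_MAX], size_t max_parts)
{
    size_t n = 0;

    assert(s != NULL && delim != '\0');

    while (n < max_parts) {
        const char *hit = strchr(s, delim);      /* next delimiter, or NULL */
        size_t len = hit ? (size_t)(hit - s) : strlen(s);

        if (len >= PART_MAX)                     /* truncate overly long parts */
            len = PART_MAX - 1;
        memcpy(parts[n], s, len);
        parts[n][len] = '\0';
        n++;

        if (hit == NULL)                         /* that was the last part */
            break;
        s = hit + 1;                             /* continue after the delimiter */
    }
    return n;
}
```

Called as split_with_strchr("strings.no.header", '.', parts, 4) with char parts[4][PART_MAX], it should fill parts with "strings", "no" and "header". Unlike strtok, two delimiters in a row give an empty part.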

strtok is your friend. No need to reinvent the wheel.
void cut (char* str){ /* not const: strtok writes '\0' into str */
char * pch;
pch = strtok (str,".");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ".");
}
}
The code is just a quick edit of the sample from the link.

A solution with strtok:
#include <stdio.h>
#include <string.h>
void cut(char string[]);
int main()
{
char string[] = "strings.no.header";
cut(string);
}
void cut(char string[])
{
char *pch;
pch = strtok (string, ".");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ".");
}
}

As others have noted, for something like this, strtok will give you what you need. A word of warning: this function will wreck your string, so it is always advisable to make a copy first.
Basically, strtok skips any leading delimiter characters, then scans for the next character that matches any character in its second argument and overwrites that delimiter with '\0'. It then returns a pointer to the start of the token. Each subsequent call (with NULL as the first argument) returns a pointer to the next token, and NULL is returned once no tokens remain.
To keep it simple, I kept all the code in main. You can now use this technique to implement the logic that suits your particular problem.
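#include <stdio.h>
#include <string.h>
int main()
{
char string[] = "strings.no.header";
char buffer[50] = { 0 };//must be large enough to hold string plus its '\0'
char* token = NULL;
char* nextToken = NULL;
//want to make a copy because strtok will wreck the string
strcpy(buffer, string);
//three-argument strtok_s as provided by Microsoft's C runtime; on POSIX systems strtok_r takes the same arguments
token = strtok_s(buffer, ".", &nextToken);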
while (token)
{
printf("%s\n", token);//or do whatever you like.
token = strtok_s(NULL, ".", &nextToken);//pass NULL on subsequent calls
}
getchar();
return 0;
}
You may find it interesting that:
After the first call of token = strtok_s(buffer, ".", &nextToken);,
buffer now contains: "strings\0no.header".
After the second call,
buffer now contains: "strings\0no\0header".
This is the reason why you want to make a copy of the original string before using strtok. If you do not care about the original string, it is ok to use it.
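
If you would rather not guess a buffer size like the 50 above, one option is to size the working copy from the input. This is only a sketch: the function name print_tokens_preserving is invented here, and it uses plain strtok on the copy, so the re-entrancy caveats of strtok still apply.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints each token of s on its own line and returns the token count,
   or -1 if memory for the working copy could not be allocated.
   s itself is left untouched. */
int print_tokens_preserving(const char *s, const char *delims)
{
    int count = 0;
    size_t size;
    char *copy;
    char *token;

    assert(s != NULL && delims != NULL);

    size = strlen(s) + 1;              /* +1 for the terminating '\0' */
    copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, size);

    /* strtok only ever writes into the copy */
    for (token = strtok(copy, delims); token != NULL;
         token = strtok(NULL, delims)) {
        printf("%s\n", token);
        count++;
    }

    free(copy);
    return count;
}
```

Because the parameter is const char *, this can also be called with a string literal, which plain strtok must never be given.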

Related

Taking only a part of a text to an array [duplicate]

How do I split a string into tokens and then save them in an array?
Specifically, I have a string "abc/qwe/jkh". I want to split it on "/" and then save the tokens into an array.
The output should be:
array[0] = "abc"
array[1] = "qwe"
array[2] = "jkh"
please help me
#include <stdio.h>
#include <string.h>
int main ()
{
char buf[] ="abc/qwe/ccd";
int i = 0;
char *p = strtok (buf, "/");
char *array[3];
while (p != NULL)
{
array[i++] = p;
p = strtok (NULL, "/");
}
for (i = 0; i < 3; ++i)
printf("%s\n", array[i]);
return 0;
}
You can use strtok()
char string[] = "abc/qwe/jkh";
char *array[10];
int i = 0;
array[i] = strtok(string, "/");
while(array[i] != NULL)
array[++i] = strtok(NULL, "/");
Why strtok() is a bad idea
Do not use strtok() in normal code: strtok() keeps its position in static variables, which causes problems. There are some use cases on embedded microcontrollers where static variables make sense, but avoid them in most other cases. strtok() behaves unexpectedly when more than one thread uses it, when it is used in an interrupt handler, or in any other situation where more than one input is processed between successive calls to strtok().
Consider this example:
#include <stdio.h>
#include <string.h>
//Splits the input by the / character and prints the content in between
//the / character. The input string will be changed
void printContent(char *input)
{
char *p = strtok(input, "/");
while(p)
{
printf("%s, ",p);
p = strtok(NULL, "/");
}
}
int main(void)
{
char buffer[] = "abc/def/ghi:ABC/DEF/GHI";
char *p = strtok(buffer, ":");
while(p)
{
printContent(p);
puts(""); //print newline
p = strtok(NULL, ":");
}
return 0;
}
You may expect the output:
abc, def, ghi,
ABC, DEF, GHI,
But you will get
abc, def, ghi,
This is because calling strtok() in printContent() resets the internal state that strtok() set up in main(). After printContent() returns, strtok()'s saved position is at the end of the first group, so the next call to strtok() in main() returns NULL.
What you should do instead
You could use strtok_r() on a POSIX system; this version does not need static variables. If your library does not provide strtok_r(), you can write your own version of it. This should not be hard, and Stack Overflow is not a coding service, so you can write it on your own.
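
For reference, the nested example above might look like this with strtok_r(), assuming a POSIX system. The function name print_groups is made up; the point is that each loop keeps its own state pointer, so the inner scan cannot disturb the outer one:

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations such as strtok_r */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Splits input on ':' into groups and each group on '/' into fields.
   Prints one group per line and returns the total number of fields.
   The input buffer is modified, as with strtok(). */
int print_groups(char *input)
{
    char *outer_state;   /* position of the ':' scan */
    char *inner_state;   /* position of the '/' scan */
    char *group;
    char *field;
    int fields = 0;

    assert(input != NULL);

    for (group = strtok_r(input, ":", &outer_state); group != NULL;
         group = strtok_r(NULL, ":", &outer_state)) {
        for (field = strtok_r(group, "/", &inner_state); field != NULL;
             field = strtok_r(NULL, "/", &inner_state)) {
            printf("%s, ", field);
            fields++;
        }
        puts("");        /* end the line for this group */
    }
    return fields;
}
```

With char buffer[] = "abc/def/ghi:ABC/DEF/GHI"; a call to print_groups(buffer) should print both lines that the strtok() version was expected to print.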

String and String array Manipulation in c

I'm trying to write a string splitter function in C. It uses a space as the delimiter to split a given string into two or more parts. It is more like the split function in Python. Here is the code:
#include <stdio.h>
#include <string.h>
void slice_input (char *t,char **out)
{
char *x,temp[10];
int i,j;
x = t;
j=0;
i=0;
for (;*x!='\0';x++){
if (*x!=' '){
temp[i] = *x;
i++;
}else if(*x==' '){
out[j] = temp;
j++;i=0;
}
}
}
int main()
{
char *out[2];
char inp[] = "HEllo World ";
slice_input(inp,out);
printf("%s\n%s",out[0],out[1]);
//printf("%d",strlen(out[1]));
return 0;
}
Expected output:
HEllo
World
but it shows:
World
World
Can you help please?
out[j] = temp;
Here temp is a local array, and every out[j] is set to point at that same array, so all entries show whatever was written to it last. temp also goes out of scope as soon as your function returns, so out[j] will point to garbage, invoking undefined behavior when it is accessed.
A simple fix would be to use a 2D array for out, and use strcpy() to copy the temp string to out[j], like this:
#include <stdio.h>
#include <string.h>
void slice_input(char *t, char out[2][10]) {
char *x, temp[10];
int i,j;
x = t;
j=0;
i=0;
for (;*x!='\0';x++) {
if (*x!=' ') {
temp[i] = *x;
i++;
} else if(*x==' ') {
temp[i] = '\0'; /* terminate the word before copying it */
strcpy(out[j], temp);
j++;
i=0;
}
}
}
int main()
{
char out[2][10];
char inp[] = "HEllo World ";
slice_input(inp,out);
printf("%s\n%s",out[0],out[1]);
return 0;
}
Output:
HEllo
World
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
From the website:
char * strtok ( char * str, const char * delimiters );
On a first
call, the function expects a C string as argument for str, whose first
character is used as the starting location to scan for tokens. In
subsequent calls, the function expects a null pointer and uses the
position right after the end of last token as the new starting
location for scanning.
Once the terminating null character of str is found in a call to
strtok, all subsequent calls to this function (with a null pointer as
the first argument) return a null pointer.
Parameters
str C string to truncate. Notice that this string is modified by being
broken into smaller strings (tokens). Alternativelly [sic], a null
pointer may be specified, in which case the function continues
scanning where a previous successful call to the function ended.
delimiters C string containing the delimiter characters. These may
vary from one call to another.
Return Value
A pointer to the last token found in string. A null pointer is
returned if there are no tokens left to retrieve.
Example
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
You can use this function to split a string into tokens; there is no need to write your own. Your code is hard to read, please format it.
The source of strtok probably looks something like this:
char *
strtok(s, delim)
char *s; /* string to search for tokens */
const char *delim; /* delimiting characters */
{
static char *lasts;
register int ch;
if (s == 0)
s = lasts;
do {
if ((ch = *s++) == '\0')
return 0;
} while (strchr(delim, ch));
--s;
lasts = s + strcspn(s, delim);
if (*lasts != 0)
*lasts++ = 0;
return s;
}

Split and Join strings in C Language

I learnt C in uni but haven't used it for quite a few years. Recently I started working on a tool which uses C as the programming language. Now I'm stuck with some really basic functions. Among them are how to split and join strings using a delimiter? (I miss Python so much, even Java or C#!)
Below is the function I created to split a string, but it does not seem to work properly. Also, even if this function works, the delimiter can only be a single character. How can I use a string as a delimiter?
Can someone please provide some help?
Ideally, I would like to have 2 functions:
// Split a string into a string array
char** fSplitStr(char *str, const char *delimiter);
// Join the elements of a string array to a single string
char* fJoinStr(char **str, const char *delimiter);
Thank you,
Allen
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
char** fSplitStr(char *str, const char *delimiters)
{
char * token;
char **tokenArray;
int count=0;
token = (char *)strtok(str, delimiters); // Get the first token
tokenArray = (char**)malloc(1 * sizeof(char*));
if (!token) {
return tokenArray;
}
while (token != NULL ) { // While valid tokens are returned
tokenArray[count] = (char*)malloc(sizeof(token));
tokenArray[count] = token;
printf ("%s", tokenArray[count]);
count++;
tokenArray = (char **)realloc(tokenArray, sizeof(char *) * count);
token = (char *)strtok(NULL, delimiters); // Get the next token
}
return tokenArray;
}
int main (void)
{
char str[] = "Split_The_String";
char ** splitArray = fSplitStr(str,"_");
printf ("%s", splitArray[0]);
printf ("%s", splitArray[1]);
printf ("%s", splitArray[2]);
return 0;
}
Answers: (Thanks to Moshbear, Joachim and sarnold):
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strtok, strdup, strlen, strcpy, strcat */
char** fStrSplit(char *str, const char *delimiters)
{
char * token;
char **tokenArray;
int count=0;
token = (char *)strtok(str, delimiters); // Get the first token
tokenArray = (char**)malloc(1 * sizeof(char*));
tokenArray[0] = NULL;
if (!token) {
return tokenArray;
}
while (token != NULL) { // While valid tokens are returned
tokenArray[count] = (char*)strdup(token);
//printf ("%s", tokenArray[count]);
count++;
tokenArray = (char **)realloc(tokenArray, sizeof(char *) * (count + 1));
token = (char *)strtok(NULL, delimiters); // Get the next token
}
tokenArray[count] = NULL; /* Terminate the array */
return tokenArray;
}
char* fStrJoin(char **str, const char *delimiters)
{
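char *joinedStr;
int i = 1;
if (str[0] == NULL){ /* check before str[0] is dereferenced */
joinedStr = malloc(1);
if (joinedStr != NULL) joinedStr[0] = '\0'; /* empty array joins to "" */
return joinedStr;
}
joinedStr = malloc(strlen(str[0])+1);
strcpy(joinedStr, str[0]);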
while (str[i] !=NULL){
joinedStr = (char*)realloc(joinedStr, strlen(joinedStr) + strlen(str[i]) + strlen(delimiters) + 1);
strcat(joinedStr, delimiters);
strcat(joinedStr, str[i]);
i++;
}
return joinedStr;
}
int main (void)
{
char str[] = "Split_The_String";
char ** splitArray = (char **)fStrSplit(str,"_");
char * joinedStr;
int i=0;
while (splitArray[i]!=NULL) {
printf ("%s", splitArray[i]);
i++;
}
joinedStr = fStrJoin(splitArray, "-");
printf ("%s", joinedStr);
return 0;
}
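
As an aside, the join can also be done without calling realloc and strcat for every element, by measuring first and allocating once. The following is only a sketch of that alternative, and the name join_strings is invented for it:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins the NULL-terminated array parts with sep between elements.
   Returns a heap-allocated string the caller must free(), or NULL on
   allocation failure. An empty array yields an empty string. */
char *join_strings(char *const *parts, const char *sep)
{
    size_t sep_len, total = 0, i;
    char *result, *out;

    assert(parts != NULL && sep != NULL);
    sep_len = strlen(sep);

    /* First pass: measure the final length. */
    for (i = 0; parts[i] != NULL; i++)
        total += strlen(parts[i]) + (i > 0 ? sep_len : 0);

    result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy each piece to the current end of the output. */
    out = result;
    for (i = 0; parts[i] != NULL; i++) {
        size_t len = strlen(parts[i]);
        if (i > 0) {
            memcpy(out, sep, sep_len);
            out += sep_len;
        }
        memcpy(out, parts[i], len);
        out += len;
    }
    *out = '\0';
    return result;
}
```

This avoids strcat rescanning the growing string from the start on each append, which matters mostly for long arrays.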
Use strpbrk instead of strtok, because strtok suffers from two weaknesses:
it's not re-entrant (and therefore not thread-safe)
it modifies the string
For joining, use strncat to append and realloc to resize.
The order of operations is very important.
Before starting the realloc/strncat loop, set the 0th element of the target string to '\0' so that strncat won't cause undefined behavior.
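
A split built on strpbrk might look roughly like the sketch below. The names split_strpbrk and free_fields are hypothetical, and the behavior differs from strtok in one way worth knowing: adjacent delimiters produce empty fields instead of being skipped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees an array produced by split_strpbrk(). */
void free_fields(char **fields)
{
    size_t i;

    if (fields == NULL)
        return;
    for (i = 0; fields[i] != NULL; i++)
        free(fields[i]);
    free(fields);
}

/* Returns a NULL-terminated array of heap-allocated copies of the fields
   of s, or NULL on allocation failure. s itself is never written to. */
char **split_strpbrk(const char *s, const char *delims)
{
    char **fields = NULL;
    size_t count = 0;

    assert(s != NULL && delims != NULL);

    for (;;) {
        const char *end = strpbrk(s, delims);    /* next delimiter, or NULL */
        size_t len = end ? (size_t)(end - s) : strlen(s);
        /* room for this field plus the terminating NULL entry */
        char **grown = realloc(fields, (count + 2) * sizeof *fields);

        if (grown == NULL) {
            free_fields(fields);
            return NULL;
        }
        fields = grown;
        fields[count] = malloc(len + 1);
        if (fields[count] == NULL) {             /* array is NULL-terminated here */
            free_fields(fields);
            return NULL;
        }
        memcpy(fields[count], s, len);
        fields[count][len] = '\0';
        fields[++count] = NULL;                  /* keep the array terminated */

        if (end == NULL)
            break;
        s = end + 1;                             /* continue after the delimiter */
    }
    return fields;
}
```

Since the input is only read, this version is safe to call on string literals and from several threads at once, at the cost of one allocation per field.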
For starters, don't use sizeof to get the length of a string. strlen is the function to use. In this case strdup is better.
And you don't actually copy the string returned by strtok, you copy the pointer. Change your loop to this:
while (token != NULL) { // While valid tokens are returned
tokenArray[count] = strdup(token);
printf ("%s", tokenArray[count]);
count++;
tokenArray = (char **)realloc(tokenArray, sizeof(char *) * (count + 1)); // room for the next entry or the terminating NULL
token = (char *)strtok(NULL, delimiters); // Get the next token
}
tokenArray[count] = NULL; /* Terminate the array */
Also, don't forget to free the entries in the array, and the array itself when you're done with it.
Edit: At the beginning of fSplitStr, hold off on allocating tokenArray until after you have checked that token is not NULL. And if token is NULL, why not return NULL?
I'm not sure what the best solution is for you, but I do have a few notes:
token = (char *)strtok(str, delimiters); // Get the first token
tokenArray = (char**)malloc(1 * sizeof(char*));
if (!token) {
return tokenArray;
}
At this point, if you weren't able to find any tokens in the string, you return a pointer to an "array" that is large enough to hold a single character pointer. It is un-initialized, so it would not be a good idea to use the contents of this array in any way. C almost never initializes memory to 0x00 for you. (calloc(3) would do that for you, but since you need to overwrite every element anyway, it doesn't seem worth switching to calloc(3).)
Also, the (char **) cast before the malloc(3) call indicates to me that you've probably forgotten the #include <stdlib.h> that would properly prototype malloc(3). (The cast was necessary before about 1989.)
Do note that your while() { } loop is setting pointers to the parts of the original input string to your tokenArray elements. (This is one of the cons that moshbear mentioned in his answer -- though it isn't always a weakness.) If you change tokenArray[1][1]='H', then your original input string also changes. (In addition to having each of the delimiter characters replaced with an ASCII NUL character.)

copying string from strtok

I need to divide a C string into tokens. I thought that strtok will be my best try, but I'm getting very strange results...
Here is my test program. In this example I will get 3 tokens with "##" separator but when I try to work with the ones I supposedly had copied, only the third one is shown correctly.. the other two look corrupted or something... I don't know... ?
#include <stdio.h>
#include <string.h>
#include <malloc.h>
#define TAM 3 //elements
char** aTokens(char* str, char* delimitador)
{
char* pch;
char** tokens;
int i = 0;
tokens = (char**)malloc(sizeof(char*)*TAM);
pch = strtok(str, delimitador);
while(pch != NULL)
{
tokens[i] = (char*)malloc((sizeof(strlen(pch))+1) * sizeof(char));
strcpy(tokens[i], pch);
pch = strtok(NULL, delimitador);
i++;
}
return tokens;
}
int main ()
{
char str[] = "30117700,1,TITULAR,SIGQAA070,1977/11/30,M,1,14000,0.00,6600.00,10.00,2011/09/01,2012/09/01,0|17,0.00,NO,0,0,0.00, ,##30117700,1,TITULAR,SIGQAA070,1977/11/30,M,1,14000,0.00,6600.00,10.00,2011/09/01,2012/09/01,0|17,0.00,NO,0,0,0.00, ,##30117700,1,TITULAR,SIGQAA070,1977/11/30,M,1,14000,0.00,6600.00,10.00,2011/09/01,2012/09/01,0|17,0.00,NO,0,0,0.00, ,";
char** tokens;
int i;
tokens = aTokens(str, "##");
for(i = 0; i<TAM; i++)
printf("%d -- %s\n", strlen(tokens[i]), tokens[i]);
//Clean
//for(i = 0; i<TAM; i++)
//free(tokens[i]);
//free(tokens);
return 0;
}
output with GCC on Linux:
13 -- 30117700,1,T <---- ?
13 -- 30117700,1,T <----- ?
115 -- 30117700,1,TITULAR,SIGQAA070,1977/11/30,M,1,14000,0.00,6600.00,10.00,2011/09/01,2012/09/01,0|17,0.00,NO,0,0,0.00, ,
I have commented the "clean" section because it provides lots of runtime error too ... :(
Help please!!
I think you are slightly confused on how strtok works.
For the most part, you've got it right. However, the string of separator characters that is given to strtok is not used as a string per se; it is used more like an array of characters, and strtok only cares about these individual characters. So calling strtok with the string "#" is exactly the same as giving it "##". In order to tokenize your string correctly, you need to decide on a single separator character to use, or use a different (perhaps custom) tokenizer function that can handle multi-character separators.
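
A custom tokenizer for a multi-character separator could be built on strstr, which matches the separator as a whole string. This is a sketch under that assumption; the function name print_split_on_string is invented, and it only prints the pieces rather than storing them:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prints every piece of s that lies between occurrences of the whole
   separator string sep, one per line, and returns how many pieces there
   were. s is not modified. */
size_t print_split_on_string(const char *s, const char *sep)
{
    size_t sep_len, pieces = 0;

    assert(s != NULL && sep != NULL);
    sep_len = strlen(sep);
    assert(sep_len > 0);          /* an empty separator would never advance */

    for (;;) {
        const char *hit = strstr(s, sep);        /* next full separator */
        size_t len = hit ? (size_t)(hit - s) : strlen(s);

        printf("%.*s\n", (int)len, s);           /* precision must be an int */
        pieces++;

        if (hit == NULL)
            break;
        s = hit + sep_len;                       /* skip the whole separator */
    }
    return pieces;
}
```

With "##" as the separator, a lone '#' inside a record stays part of that record, which is the difference from strtok(str, "##").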
The following line isn't correct. sizeof(strlen(..)) is the size of the size_t value that strlen returns (4 in a 32-bit app), regardless of the length of the string.
tokens[i] = (char*)malloc((sizeof(strlen(pch))+1) * sizeof(char));
It should probably be:
tokens[i] = (char*)malloc((strlen(pch)+1) * sizeof(char));
Standard usage of strtok:
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "30117700,1,TITULAR,SIGQAA070,1977/11/30,M,1,14000,0.00,6600.00,10.00,2011/09/01,2012/09/01,0|17,0.00,NO,0,0,0.00, ,##30117700,1,TITULAR,SIGQAA070,1977/11/30,M,1,14000,0.00,6600.00,10.00,2011/09/01,2012/09/01,0|17,0.00,NO,0,0,0.00, ,##30117700,1,TITULAR,SIGQAA070,1977/11/30,M,1,14000,0.00,6600.00,10.00,2011/09/01,2012/09/01,0|17,0.00,NO,0,0,0.00, ,";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str,"#");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, "#");
}
return 0;
}

How to safely parse a tab-delimited string?

How can I safely parse a tab-delimited string? For example:
test\tbla-bla-bla\t2332
strtok() is a standard function for parsing strings with arbitrary delimiters. It is, however, not thread-safe. Your C library of choice might have a thread-safe variant.
Another standard-compliant way (just wrote this up, it is not tested):
#include <string.h>
#include <stdio.h>
int main()
{
char string[] = "foo\tbar\tbaz";
char * start = string;
char * end;
while ( ( end = strchr( start, '\t' ) ) != NULL )
{
// %.*s prints at most the given number of characters; the * takes that
// count from the argument list (your token is not zero-terminated!)
printf( "%.*s\n", (int)( end - start ), start );
start = end + 1;
}
// start points to last token, zero-terminated
printf( "%s", start );
return 0;
}
Use strtok_r instead of strtok (if it is available). It has similar usage, except it is reentrant, and it does not modify the string like strtok does. [Edit: Actually, I misspoke. As Christoph points out, strtok_r does replace the delimiters by '\0'. So, you should operate on a copy of the string if you want to preserve the original string. But it is preferable to strtok because it is reentrant and thread safe]
strtok will leave your original string modified. It replaces the delimiter with '\0'. And if your string happens to be a constant stored in read-only memory (some compilers will do that), you may actually get an access violation.
Using strtok() from string.h.
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "test\tbla-bla-bla\t2332";
char * pch;
pch = strtok (str," \t");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " \t");
}
return 0;
}
You can use any regex library or even the GLib GScanner.
Yet another version; this one separates the logic into a new function
#include <stdio.h>
static _Bool next_token(const char **start, const char **end)
{
if(!*end) *end = *start; // first call
else if(!**end) // check for terminating zero
return 0;
else *start = ++*end; // skip tab
// advance to terminating zero or next tab
while(**end && **end != '\t')
++*end;
return 1;
}
int main(void)
{
const char *string = "foo\tbar\tbaz";
const char *start = string;
const char *end = NULL; // NULL value indicates first call
while(next_token(&start, &end))
{
// print substring [start,end); the precision argument must be an int
printf("%.*s\n", (int)(end - start), start);
}
return 0;
}
If you need a binary safe way to tokenize a given string:
#include <string.h>
#include <stdio.h>
void tokenize(const char *str, const char delim, const size_t size)
{
const char *start = str, *next;
const char *end = str + size;
while (start < end) {
if ((next = memchr(start, delim, end - start)) == NULL) {
next = end;
}
printf("%.*s\n", (int)(next - start), start);
start = next + 1;
}
}
int main(void)
{
char str[] = "test\tbla-bla-bla\t2332";
size_t len = strlen(str);
tokenize(str, '\t', len);
return 0;
}

Resources