String Split in C

I want to split a String in C.
My String is defined by my Struct:
typedef struct String
{
char *c;
int length;
int maxLength;
} String;
Then I have a function that does the splitting. C may already have something that does this, but I wanted to write my own, and I have not found anything suitable so far.
String ** spliter(String *s)
{
if(s == NULL)
return NULL;
// set of splitters: {'\n', ' '}
}
Input looks something like this: This is Sparta.
Then I want to return a pointer to each character array.
*p1 = This
*p2 = is
*p3 = Sparta.
If that makes any sense, I want an array of pointers, and each pointer points to a character array.
I will have to realloc the String as I increment the size of each character array. Probably my biggest problem is imagining how the pointers work.
Similar problem: c splitting a char* into an char**
So, how do I go about doing this?

#include <string>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
string test = "aa aa bbc cccd";
vector<string> strvec;
string strtemp;
string::size_type pos1, pos2;
pos2 = test.find(' ');
pos1 = 0;
while (string::npos != pos2)
{
strvec.push_back(test.substr(pos1, pos2 - pos1));
pos1 = pos2 + 1;
pos2 = test.find(' ', pos1);
}
strvec.push_back(test.substr(pos1));
vector<string>::iterator iter1 = strvec.begin(), iter2 = strvec.end();
while (iter1 != iter2)
{
cout << *iter1 << endl;
++iter1;
}
return 0;
}

Have you looked at strtok? It should be possible to do this using strtok.

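Here is an example (it assumes the input holds exactly three space-separated words and only prints them):
String ** spliter(String *s)
{
int i;
int j;
char *p1;
char *p2;
char *p3;
i = 0;
j = 0;
if(s == NULL || s->c == NULL)
return NULL;
/* each buffer can hold the whole input plus the terminating '\0' */
p1 = malloc(strlen(s->c) + 1);
p2 = malloc(strlen(s->c) + 1);
p3 = malloc(strlen(s->c) + 1);
if(p1 == NULL || p2 == NULL || p3 == NULL)
{
free(p1);
free(p2);
free(p3);
return NULL;
}
while (s->c[i] != ' ' && s->c[i] != '\0')
p1[j++] = s->c[i++];
p1[j] = '\0';
if (s->c[i] == ' ')
i++;
j = 0;
while (s->c[i] != ' ' && s->c[i] != '\0')
p2[j++] = s->c[i++];
p2[j] = '\0';
if (s->c[i] == ' ')
i++;
j = 0;
while (s->c[i] != '\0')
p3[j++] = s->c[i++];
p3[j] = '\0';
printf("%s\n", p1);
printf("%s\n", p2);
printf("%s\n", p3);
free(p1);
free(p2);
free(p3);
return NULL; /* this sketch only prints the words; it builds no String array */
}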

You're looking for strtok, check out man 3 strtok, or here if you're not on *nix.
You would use it like this: (Assuming that you can write the add_string code yourself.)
String ** spliter(String *s)
{
if(s == NULL)
return NULL;
String **return_strings = NULL;
const char *delim = " \n";
char *string = strtok(s->c, delim);
int i = 0;
/* add_string returns -1 once strtok has run out of tokens (string == NULL) */
for(i = 0; add_string(&return_strings, string, i) != -1; i++) {
string = strtok(NULL, delim);
}
return return_strings;
}
Note that if you need to save the original string (strtok modifies the string it works on), you'll need to call strdup on the original string, then operate on the copy.
EDIT: OP said he was having trouble thinking about the pointers. With the above code sample, add_string only has to deal with one string of characters at a time, and the growth of the pointer array stays in one place. The array is passed by address so that the pointer returned by realloc reaches the caller. It might look something like this:
int add_string(String ***strings, char *s, int len)
{
if(s == NULL)
return -1;
/* grow the pointer array by one slot; realloc(NULL, n) acts like malloc(n) */
String **tmp = realloc(*strings, sizeof(String *) * (len + 1));
if(tmp == NULL)
return -1;
*strings = tmp;
String *current_string = malloc(sizeof(String));
if(current_string == NULL)
return -1;
tmp[len] = current_string;
/* fill out struct fields here */
return 0;
}

Add strdup so that strtok works on a copy of the string. The split() call below is more generic than the other spliter() examples, but does the same thing: it runs strtok on a duplicate.
char **
split(char **result, char *w, const char *src, const char *delim)
{
int i=0;
char *p;
strcpy(w,src); /* w must hold at least strlen(src)+1 bytes */
result[0]=NULL; /* in case src contains no tokens */
for(p=strtok(w, delim) ; p!=NULL; p=strtok(NULL, delim) )
{
result[i++]=p;
result[i]=NULL; /* result needs room for every token plus this NULL */
}
return result;
}
void display(String *p)
{
char *result[24]={NULL};
char *w=strdup(p->c);
if(w==NULL)
return;
char **s=split(result, w, p->c, "\t \n"); /* split on \n, \t and space as delimiters */
for( ; *s!=NULL; s++)
printf("%s\n", *s);
free(w);
}

Related

Dynamic memory allocation for an array of pointers to char in C

I'm building a word counter program. To achieve this, I was thinking about saving the string the user inputted, and using strtok() to split the sentence with space as the delimiter. But first I want to allocate enough memory for each word. Let's say the sentence is "Hello World". I've already dynamically allocated memory for the string itself. Now I want to split Hello World into 2 strings, "Hello" and "World". My goal is to allocate enough memory so that there's not too much empty space but I also don't want to allocate too little space. Here is my code so far:
#include <stdio.h>
#include <stdlib.h>
char *strmalloc(char **string);
char *user_input = NULL;
char *word_array[];
int main(void) {
printf("Enter a sentence to find out the number of words: ");
user_input = strmalloc(&user_input);
return 0;
}
char *strmalloc(char **string) {
char *tmp = NULL;
size_t size = 0, index = 0;
int ch;
while ((ch = getchar()) != '\n' && ch != EOF) {
/* keep one spare byte for the terminating '\0' */
if (size <= index + 1) {
size = index + 2;
tmp = realloc(*string, size);
if (!tmp) {
free(*string);
*string = NULL;
return NULL;
}
*string = tmp;
}
(*string)[index++] = ch;
}
if (*string)
(*string)[index] = '\0';
return *string;
}
How would I go about doing this? Should I do the splitting first or allocate the space required for the array first?

You can count words without splitting the sentence. Here is an example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
// Change this to change the separator characters
static inline char isSeparator(char ch) { return isspace((unsigned char)ch) || ispunct((unsigned char)ch); }
char * jumpSeparator(char *string) {
while(string[0] && isSeparator(string[0])) string++;
return string;
}
char * findEndOfWord(char *string) {
while (string[0] && !isSeparator(string[0])) string++;
return string;
}
int countWords(char *string) {
char * ptr = jumpSeparator(string);
if (strlen(ptr) == 0) return 0;
int count = 1;
while((ptr = findEndOfWord(ptr)) && ptr[0]) {
ptr = jumpSeparator(ptr);
if (!ptr[0]) break; // only trailing separators were left, so there is no further word
count++;
}
return count;
}
int main() {
char * sentence = "This is,a function... to||count words";
int count = countWords(sentence);
printf("%d\n", count); //====> 7
}
EDIT: Reusing the same functions, here is another example that allocates the substrings dynamically:
int main() {
char * sentence = "This is,a function... to||split words";
int count = countWords(sentence);
char * ptr = sentence, *start, *end;
char ** substrings = malloc(count * sizeof(char *));
int i=0;
while((ptr = jumpSeparator(ptr)) && ptr[0]) {
start = ptr;
ptr = findEndOfWord(ptr);
end = ptr;
int len = end-start;
char * newString = malloc(len + 1);
memcpy(newString, start, len);
newString[len] = 0;
substrings[i++] = newString;
}
// Prints the result
for(int i=0; i<count; i++) printf("%s\n", substrings[i]);
// Frees the allocated memory
for(int i=0; i<count; i++) free(substrings[i]);
free(substrings);
return 0;
}
Output :
This
is
a
function
to
split
words

Copying specific number of characters from a string to another

I have a variable-length string that I am trying to split at the plus signs so that I can work on each part:
char string[] = "var1+vari2+varia3";
for (int i = 0; i != sizeof(string); i++) {
memcpy(buf, string[0], 4);
buf[9] = '\0';
}
Since the variables differ in length, I am trying to write a loop that walks the string and extracts each variable. Any suggestions? I am expecting a result such as:
var1
vari2
varia3

You can use strtok() to break the string at a delimiter:
char string[]="var1+vari2+varia3";
const char delim[] = "+";
char *token;
/* get the first token */
token = strtok(string, delim);
/* walk through other tokens */
while( token != NULL ) {
printf( " %s\n", token );
token = strtok(NULL, delim);
}
More info about the strtok() here: https://man7.org/linux/man-pages/man3/strtok.3.html

It seems to me that you don't just want to print the individual strings but want to save them in some buffer.
Since you can't know the number of strings or the length of each individual string in advance, you should allocate memory dynamically, i.e. use functions like realloc, calloc and malloc.
It can be implemented in several ways. Below is one example. To keep the example simple, it is not performance-optimized in any way.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
char** split_string(const char* string, const char* token, int* num)
{
assert(string != NULL);
assert(token != NULL);
assert(num != NULL);
assert(strlen(token) != 0);
char** data = NULL;
int num_strings = 0;
while(*string)
{
// Allocate memory for one more string pointer
char** ptemp = realloc(data, (num_strings + 1) * sizeof *data);
if (ptemp == NULL) exit(1);
data = ptemp;
// Look for token
char* tmp = strstr(string, token);
if (tmp == NULL)
{
// Last string
// Allocate memory for one more string and copy it
int len = strlen(string);
data[num_strings] = calloc(len + 1, 1);
if (data[num_strings] == NULL) exit(1);
memcpy(data[num_strings], string, len);
++num_strings;
break;
}
// Allocate memory for one more string and copy it
int len = tmp - string;
data[num_strings] = calloc(len + 1, 1);
if (data[num_strings] == NULL) exit(1);
memcpy(data[num_strings], string, len);
// Prepare to search for next string
++num_strings;
string = tmp + strlen(token);
}
*num = num_strings;
return data;
}
int main()
{
char string[]="var1+vari2+varia3";
// Split the string into dynamic allocated memory
int num_strings;
char** data = split_string(string, "+", &num_strings);
// Now data can be used as an array-of-strings
// Example: Print the strings
printf("Found %d strings:\n", num_strings);
for(int i = 0; i < num_strings; ++i) printf("%s\n", data[i]);
// Free the memory
for(int i = 0; i < num_strings; ++i) free(data[i]);
free(data);
}
Output
Found 3 strings:
var1
vari2
varia3

You can use a simple loop scanning the string for + signs:
char string[] = "var1+vari2+varia3";
char buf[sizeof(string)];
int start = 0;
for (int i = 0;;) {
if (string[i] == '+' || string[i] == '\0') {
memcpy(buf, string + start, i - start);
buf[i - start] = '\0';
// buf contains the substring, use it as a C string
printf("%s\n", buf);
if (string[i] == '\0')
break;
start = ++i;
} else {
i++;
}
}

Your code does not make sense.
I wrote a function like this for you. Study it, as it is sometimes good to have some code as a base.
char *substr(const char *str, char *buff, const size_t start, const size_t len)
{
size_t srcLen;
char *result = buff;
if(str && buff)
{
if(*str)
{
srcLen = strlen(str);
if(srcLen < start + len)
{
if(start < srcLen) strcpy(buff, str + start);
else buff[0] = 0;
}
else
{
memcpy(buff, str + start, len);
buff[len] = 0;
}
}
else
{
buff[0] = 0;
}
}
return result;
}
https://godbolt.org/z/GjMEqx

Recreate the strstr() function

Hello, I am trying to make my own strstr() function and I can't figure out why it causes a segmentation fault. I am trying to search for a string within another string and then return a pointer to the first matching letter. Any help would be appreciated.
This is my code:
char* ms_search(char *Str1,char* Str2){
char* p = NULL;
int i,k=0,j = 0;
for(i = 0;i < ms_length(Str1); i++){
if(Str1[i] == Str2[k]){
if(k == 0){
p = &Str1[i];
j= i;
}
if(k == ms_length(Str2)){
break;
}
k++;
}
else{
if(Str1[i] == Str2[0]){
p = &Str1[i];
k=1;
j= i;
}
else{
j=0;
k = 0;
p = NULL;
}
}
}
if(p != NULL){
Str1[ms_length(Str2)+1] = '\0';
}
return &Str1[j];
}
int main(){
int i;
char* p2;
char* p="lolaaa";
char* p1= "aaa";
//char ar2[] = "aaa4";
//ms_copy(p,p1);
//printf("%s",p);
//ms_nconcat(p,p1,3);
//if(ms_ncompare(p,p1,3) == 1) printf("einai idia");
p2 = ms_search(p,p1);
printf("%s",p2);
return 0;
}

Hello i am trying to make my own strstr()
First of all you have to follow the C standard.
The C89/C99 prototype is:
char *strstr(const char *s1, const char *s2);
Standard strstr() function will NOT change the passed buffers.
The functionality is described as:
strstr() function locates the first occurrence in the string pointed to by s1 of the sequence of characters (excluding the terminating null character) in the string pointed to by s2.
The strstr function returns a pointer to the located string, or a null pointer if the string is not found. If s2 points to a string with zero length, the function returns s1.
In standard C, this can be implemented as:
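#include <string.h> /* size_t strncmp() strlen() */
char *strstr(const char *s1, const char *s2)
{
size_t n = strlen(s2);
if(n == 0)
return (char *) s1; /* an empty s2 matches at the start of s1 */
while(*s1)
if(!strncmp(s1++,s2,n)) /* strncmp stops at the '\0' of s1, so it never reads past the end */
return (char *) (s1-1);
return 0;
}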
The standalone implementation is given below:
#include <stdio.h>
char *strstr1(const char *str, const char *substring)
{
const char *a;
const char *b;
b = substring;
if (*b == 0) {
return (char *) str;
}
for ( ; *str != 0; str += 1) {
if (*str != *b) {
continue;
}
a = str;
while (1) {
if (*b == 0) {
return (char *) str;
}
if (*a++ != *b++) {
break;
}
}
b = substring;
}
return NULL;
}
int main (void)
{
char string[64] ="This is a test string for testing strstr";
char *p;
p = strstr1 (string,"test");
if(p)
{
printf("String found:\n" );
printf ("First occurrence of string \"test\" in \"%s\" is:\n%s", string, p);
}
else
{
printf("String not found!\n" );
}
return 0;
}
Output:
String found:
First occurrence of string "test" in "This is a test string for testing strstr" is:
test string for testing strstr

Your standalone strstr1 is correct.
I have my preferences, and you have yours.
Neither is perfect.
You prefer
if ( *b == 0 ) {
return (char *) str;
}
I prefer
if ( ! *b ) return (char *) str;
You prefer
str += 1;
I prefer
str++;
You prefer
while (1)
I prefer
for (;;)
If I rewrite your strstr1 with my preferences, we get
char *strstr1(const char *str, const char *substring)
{
const char *a, *b = substring;
if ( !*b ) return (char *) str;
for ( ; *str ; str++) {
if (*str != *b) continue;
a = str;
for (;;) {
if ( !*b ) return (char *) str;
if (*a++ != *b++) break;
}
b = substring;
}
return NULL;
}
Note that this version has the same snippet
if ( ! *b ) return (char *) str;
in two locations. Can we rearrange to do that test only once?
Note how we do two tests when lead character matches
if ( *str != *b )
and again later for the same lead char
a = str ; if ( *a++ != *b++)
Can we rearrange that to do a single test?
My rewrite of your standalone strstr is below. It might not be
your style, but it is in many ways similar to your standalone strstr.
My rewrite is shorter and, I want to believe, easier to understand.
char *strstr(const char *str, const char *substring)
{
const char *a = str, *b = substring;
for (;;) {
if ( !*b ) return (char *) str;
if ( !*a ) return NULL;
if ( *a++ != *b++) { a = ++str; b = substring; }
}
}

Create a new string that will consist of common letters from other two strings

I'm new to C programming. I have a task to do.
The user inputs two strings. What I need to do is create a new string that consists only of the letters common to those two strings.
For example:
if given:
str1 = "ABCDZ"
str2 = "ADXYZ"
the new string will look like: "ADZ".
I can't make it work. I think there must be a better (simpler) algorithm, but I have wasted too much time on this one, so I want to complete it. I need your help!
What I've done so far is this:
char* commonChars (char* str1, char* str2)
{
char *ptr, *qtr, *arr, *tmp, *ch1, *ch2;
int counter = 1;
ch1 = str1;
ch2 = str2;
arr = (char*) malloc ((strlen(str1)+strlen(str2)+1)*(sizeof(char))); //creating dynamic array
strcpy(arr, str1);
strcat(arr,str2);
for (ptr = arr; ptr < arr + strlen(arr); ptr++)
{
for (qtr = arr; qtr < arr + strlen(arr); qtr++) // count for each char how many times is appears
{
if (*qtr == *ptr && qtr != ptr)
{
counter++;
tmp = qtr;
}
}
if (counter > 1)
{
for (qtr = tmp; *qtr; qtr++) //removing duplicate characters
*(qtr) = *(qtr+1);
}
counter = 1;
}
sortArray(arr, strlen(arr)); // sorting the string in alphabetical order
qtr = arr;
for (ptr = arr; ptr < arr + strlen(arr); ptr++, ch1++, ch2++) //checking if a letter appears in both strings and if at least one of them doesn't contain this letter - remove it
{
for (qtr = ptr; *qtr; qtr++)
{
if (*qtr != *ch1 || *qtr != *ch2)
*qtr = *(qtr+1);
}
}
}
I don't know how to finish this code. I would be thankful for any suggestions!

The output array cannot be longer than the shorter of the two input arrays.
You can use strchr().
char * common (const char *in1, const char *in2) {
char *out;
char *p;
if (strlen(in2) < strlen(in1)) {
const char *t = in2;
in2 = in1;
in1 = t;
}
out = malloc(strlen(in1)+1); /* in1 is the shorter string after the swap */
p = out;
while (*in1) {
if (strchr(in2, *in1)) *p++ = *in1;
++in1;
}
*p = '\0';
return out;
}
This has O(NxM) performance, where N and M are the lengths of the input strings. Because your input is alphabetical and unique, you can achieve O(N+M) worst case performance. You apply something that resembles a merge loop.
char * common_linear (const char *in1, const char *in2) {
char *out;
char *p;
if (strlen(in2) < strlen(in1)) {
const char *t = in2;
in2 = in1;
in1 = t;
}
out = malloc(strlen(in1)+1); /* in1 is the shorter string after the swap */
p = out;
while (*in1 && *in2) {
if (*in1 < *in2) {
++in1;
continue;
}
if (*in2 < *in1) {
++in2;
continue;
}
*p++ = *in1;
++in1;
++in2;
}
*p = '\0';
return out;
}

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define min(x,y) ((x)<(y)? (x) : (y))
char* commonChars (const char *str1, const char *str2){
//str1, str2 : sorted(asc) and unique
char *ret, *p;
int len1, len2;
len1=strlen(str1);
len2=strlen(str2);
ret = p = malloc((min(len1, len2)+1)*sizeof(char));
while(*str1 && *str2){
if(*str1 < *str2){
++str1;
continue;
}
if(*str1 > *str2){
++str2;
continue;
}
*p++ = *str1++;
++str2;
}
*p ='\0';
return ret;
}
char *deleteChars(const char *str, const char *dellist){
//str, dellist : sorted(asc) and unique
char *ret, *p;
ret = p = malloc((strlen(str)+1)*sizeof(char));
while(*str && *dellist){
if(*str < *dellist){
*p++=*str++;
continue;
}
if(*str > *dellist){
++dellist;
continue;
}
++str;
++dellist;
}
if(!*dellist)
while(*str)
*p++=*str++;
*p ='\0';
return ret;
}
int main(void){
const char *str1 = "ABCDXYZ";
const char *str2 = "ABCDZ";
const char *str3 = "ADXYZ";
char *common2and3;
char *withoutcommon;
common2and3 = commonChars(str2, str3);
//printf("%s\n", common2and3);//ADZ
withoutcommon = deleteChars(str1, common2and3);
printf("%s\n", withoutcommon);//BCXY
free(common2and3);
free(withoutcommon);
return 0;
}

I would do something like this:
char* commonChars(char* str1, char* str2) {
char* ret = malloc(strlen(str1) + 1); /* +1 for the terminating '\0' */
int i, k = 0;
if (ret == NULL)
return NULL;
for (i = 0; str1[i] != '\0'; i++) {
/* keep str1[i] only if it also occurs somewhere in str2 */
if (strchr(str2, str1[i]) != NULL) {
ret[k] = str1[i];
k++;
}
}
ret[k] = '\0';
return ret;
}
It's been a while since I wrote C, so I hope this is correct.

You can use the strpbrk() function to do this job cleanly.
char *strpbrk(const char *str1, const char *str2);
Locate characters in string
Returns a pointer to the first occurrence in str1 of any of the characters that are part of str2, or a null pointer if there are no matches.
The search does not include the terminating null characters of either string, but ends there.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ()
{
char str[] = "ABCDZ";
char key[] = "ADXYZ";
/* calloc zero-fills the buffer, so newString is always terminated */
char *newString = calloc(sizeof(str)+sizeof(key), 1);
if (newString == NULL)
return 1;
char * pch;
pch = strpbrk (str, key);
int i=0;
while (pch != NULL)
{
*(newString+i) = *pch;
pch = strpbrk (pch+1,key);
i++;
}
printf ("%s", newString);
free(newString);
return 0;
}

Sorry for the weird use of char arrays, was just trying to get it done fast. The idea behind the algorithm should be obvious, you can modify some of the types, loop ending conditions, remove the C++ elements, etc for your purposes. It's the idea behind the code that's important.
#include <queue>
#include <string>
#include <iostream>
using namespace std;
bool isCharPresent(char* str, char c) {
// stop before the terminator so that '\0' never counts as a common character
for(; *str; ++str) {
if(c == *str) return true;
}
return false;
}
int main ()
{
char str1[] = {'h', 'i', 't', 'h', 'e', 'r', 'e', '\0'};
char str2[] = {'a', 'h', 'i', '\0'};
string result = "";
char* charIt = &str1[0];
do {
if(isCharPresent(str2, *charIt))
result += *charIt;
} while(*(charIt++));
cout << result << endl; //hih is the result. Minor modifications if dupes are bad.
}

So I found the solution to my problem. In the end I used another algorithm which, as it turned out, is very similar to what @BLUEPIXY and @user315052 suggested. Thanks to everyone who tried to help! Very nice and useful web source!
Here is my code. Anyone who finds it useful can use it.
Note:
(1) str1 and str2 should be sorted alphabetically;
(2) each character should appear only once in each given string.
char* commonChars (char* str1, char* str2)
{
char *ptr, *arr,*ch1, *ch2;
int counter = 0;
for (ch1 = str1; *ch1; ch1++)
{
for(ch2 = str2; *ch2; ch2++)
{
if (*ch1 == *ch2)
counter++;
}
}
arr = (char*)malloc ((counter+1) * sizeof(char));
if (arr == NULL)
return NULL;
ch1 = str1;
ch2 = str2;
ptr = arr;
/* merge-style walk: stop as soon as either string is exhausted */
while (*ch1 && *ch2)
{
if (*ch1 < *ch2)
ch1++;
else if (*ch1 > *ch2)
ch2++;
else
{
*ptr = *ch1;
ptr++;
ch1++;
ch2++;
}
}
*ptr = '\0';
return arr;
}

string parsing occurrence in c

I have a string defined as const char *str = "Hello, this is an example of my string";
How can I get everything after the first comma? For this instance: this is an example of my string
Thanks

You can do something similar to what you've posted:
const char *a = str; /* str is the string from the question */
int i = 0;
while (a[i] && a[i] != ',')
i++;
if (a[i] == ',') {
printf("%s", a + i + 1);
} else {
printf("Comma separator not found");
}
Alternatively, you can take a look at strtok and strstr.
With strstr you can do:
char *a = "hello, this is an example of my string";
char *b = ",";
char *c;
c = strstr(a, b);
if (c != NULL)
printf("%s", c + 1);
else
printf("Comma separator not found");

Since you want a tail of the original string, there's no need to copy or modify anything, so:
#include <string.h>
...
const char *result = strchr(str, ',');
if (result) {
printf("Found: %s\n", result+1);
} else {
printf("Not found\n");
}
If you want ideas how to do it yourself (useful if you later want to do something similar but not identical), take a look at an implementation of strchr.

const char *result;
for(result = str; *result; result++)
if(*result == ',')
{
result++;
break;
}
//result points to the first character after the comma
After this code, result points to the string starting right after the comma. Or to the final '\0' (empty string), if there is no comma in the string.

You have the right idea. The following program is one way to do it:
#include <stdio.h>
#include <string.h>
static char *comma (char *s) {
char *cpos = strchr (s, ',');
if (cpos == NULL)
return s;
return cpos + 1;
}
int main (int c, char *v[]) {
int i;
if (c >1 )
for (i = 1; i < c; i++)
printf ("[%s] -> [%s]\n", v[i], comma (v[i]));
return 0;
}
It produced the following output:
$ commas hello,there goodbye two,commas,here
[hello,there] -> [there]
[goodbye] -> [goodbye]
[two,commas,here] -> [commas,here]