How to do a split of a string

How to do a split of a string - c

Hi, I would like to know how to split a string in C without using #include.

Multiple ways of doing that, which I'll just explain and not write for you as this can only be a homework (or self-enhancement exercise, so the intent is the same).
Either you split the string into multiple strings that you re-allocate into a multi-dimensional array,
or you simply cut the string on separators and add terminal '\0' where appropriate and just copy the starting address of each sub-string to an array of pointers.
The approach for the splitting is similar in both cases, but in the second one you don't need to allocate any memory (but modify the original string), while in the first one you create safe copies of each sub-string.
You were not specific about the splitting, so I don't know if you wanted to cut on substrings, a single character, a list of potential separators, etc.
Good luck.
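For illustration only, here is a minimal sketch of the second approach (cut on a single separator character, write '\0' over each separator, and store the start address of each sub-string). The name split_in_place and its parameters are my own invention, not something from the question, and it assumes a single-character separator and a writable string. It uses unsigned long instead of size_t so that it needs no #include.

```c
/* Hypothetical helper (name and signature are assumptions).
** Splits `s` in place on the single character `sep`.
** Stores up to `max` sub-string start addresses in `parts` and
** returns how many were stored. `s` is modified: every separator
** that is consumed is overwritten with '\0'. */
unsigned long split_in_place(char *s, char sep, char **parts, unsigned long max)
{
    unsigned long n = 0;

    if (max == 0)
        return 0;
    parts[n++] = s;                 /* first sub-string starts at s */
    while (*s != '\0') {
        if (*s == sep) {
            if (n == max)
                break;              /* no room left: rest stays in the last part */
            *s = '\0';              /* terminate the current sub-string */
            parts[n++] = s + 1;     /* next sub-string starts after sep */
        }
        s++;
    }
    return n;
}
```

Called on a writable array holding "a,bc,,d" with ',' as separator, this yields four parts: "a", "bc", an empty string, and "d". No memory is allocated, but the original string is no longer usable as a whole.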

find the point you would like to split it
make two buffers large enough to contain data
strcpy() or do it manually (see example)
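In this code I assume you have a string str[] and would like to split it at the first comma (malloc() and strcpy() need <stdlib.h> and <string.h>):
int count;
for (count = 0; str[count] != '\0'; count++) {
    if (str[count] == ',')
        break;
}
if (str[count] == '\0')
    return 0; // no comma found
char *s1 = malloc(strlen(str) - count); // room for the part after the comma plus '\0'
char *s2 = malloc(count + 1); // room for the part before the comma plus '\0'
if (s1 == NULL || s2 == NULL) {
    free(s1);
    free(s2);
    return 0;
}
strcpy(s1, str + count + 1); // get part after
for (int count1 = 0; count1 < count; count1++) // get part before
    s2[count1] = str[count1];
s2[count] = '\0'; // terminate the part before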
got it? ;)

Assuming I have complete control of the function prototype, I'd do this (make this a single source file (no #includes) and compile, then link with the rest of the project)
If #include <stddef.h> is part of the "without #include" thing (but it shouldn't), then instead of size_t, use unsigned long in the code below
#include <stddef.h>
/* split of a string in c without #include */
/*
** `predst` destination for the prefix (before the split character)
** `postdst` destination for the postfix (after the split character)
** `src` original string to be split
** `ch` the character to split at
** returns the length of `predst`
**
** it is UB if
** src does not contain ch
** predst or postdst has no space for the result
*/
size_t split(char *predst, char *postdst, const char *src, char ch) {
size_t retval = 0;
while (*src != ch) {
*predst++ = *src++;
retval++;
}
*predst = 0;
src++; /* skip over ch */
while ((*postdst++ = *src++) != 0) /* void */;
return retval;
}
Example usage
char a[10], b[42];
size_t n;
n = split(b, a, "forty two", ' ');
/* n is 5; b has "forty"; a has "two" */

Related

How to return empty string from a function in C?

How should I return an empty string from a function? I tried using lcp[i] = ' ' but it creates an error. Then I used lcp[i] = 0 and it returned an empty string. However, I do not know if it's right.
Also, is it necessary to use free(lcp) in the caller function? Since I could not free and return at the same time.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LEN 50
char *find_LCP(char str1[], char str2[]);
char *find_LCP(char str1[], char str2[]){
char * lcp = malloc(MAX_LEN * sizeof(char));
int a = strlen(str1);
int b = strlen(str2);
int min = a < b ? a : b;
for(int i = 0; i < min; i++){
if(str1[i] == str2[i])
lcp[i] = str1[i];
else
lcp[i] = 0;
}
return lcp;
}
int main()
{
char str1[MAX_LEN], str2[MAX_LEN];
char * lcp;
printf("Enter first word > ");
scanf("%s", str1);
printf("Enter second word > ");
scanf("%s", str2);
lcp = find_LCP(str1, str2);
printf("\nLongest common prefix: '%s'\n", lcp);
free(lcp);
return 0;
}

An "empty" string is just a string with the first byte zero, so you can write:
s[0] = 0;
However, it is not clear what you are trying to do. The LCP of "foo" and "fob" is "fo", not the empty string.
You can also return as soon as you find the first non-matching character, no need to go until the end.
Further, you can simply pass the output string as a parameter and have lcp be an array. That way you avoid both malloc and free:
char lcp[MAX_LEN];
...
find_LCP(lcp, str1, str2);
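A minimal sketch of what that could look like, combining the early exit with the output parameter. The argument order follows the call above; the size_t return value (the prefix length) is my own addition, and the caller is assumed to pass an lcp array large enough for the result plus the terminator.

```c
#include <stddef.h>

/* Sketch: writes the longest common prefix of str1 and str2 into lcp
** and returns its length. Assumes lcp has room for the prefix plus '\0'. */
size_t find_LCP(char *lcp, const char *str1, const char *str2)
{
    size_t i = 0;

    /* stop at the end of str1 or at the first mismatch
       (a '\0' in str2 also counts as a mismatch) */
    while (str1[i] != '\0' && str1[i] == str2[i]) {
        lcp[i] = str1[i];
        i++;
    }
    lcp[i] = '\0';      /* always terminate; i == 0 gives the empty string */
    return i;
}
```

With "foo" and "fob" this writes "fo" and returns 2; with two words that share no prefix it writes the empty string and returns 0.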

If you want to empty a string without using a for loop, you can write
lcp[0] = 0;
Writing the terminator inside your for loop, as you did, also works. The usual way to spell it is
lcp[i] = '\0';
but keep in mind that only the first '\0' matters: the string functions ignore everything after it.
If you want to zero the contents without a loop, you can do this:
memset(buffer,0,strlen(buffer));
but this only zeroes the characters before the first NUL character, so the buffer must already hold a terminated string.
If the buffer is an array (not a pointer), you can zero all of it with:
memset(buffer,0,sizeof(buffer));

Your program has a bug: if you supply two identical strings (or one string is a prefix of the other), lcp[i] = 0; never executes, which means that your function returns a string that is not NUL-terminated. This causes undefined behavior when you use that string in your printf in main.
The fix for this is easy, NUL-terminate the string after the loop:
int i;
for (i = 0; i < min; i++){
if(str1[i] == str2[i])
lcp[i] = str1[i];
else
break;
}
lcp[i] = 0;
As for the answer to the question, an empty string is one which has the NUL-terminator right at the start. We've already handled that as we've NUL-terminated the string outside the loop.
Also, is it necessary to use free(lcp) in the caller function?
In this case, it is not required as the allocated memory will get freed when the program exits, but I'd recommend keeping it because it is good practice.
As the comments say, you can use calloc instead of malloc which fills the allocated memory with zeros so you don't have to worry about NUL-terminating.

In the spirit of code golf: there is no need to calculate string lengths. Pick either string and iterate through it until the current character is either the null terminator or differs from the corresponding character in the other string. Store the index, then copy the appropriate number of bytes.
char *getlcp(const char *s1, const char *s2) {
int i = 0;
while (s1[i] == s2[i] && s1[i] != '\0') ++i;
char *lcp = calloc((i + 1), sizeof(*lcp));
memcpy(lcp, s1, i);
return lcp;
}
P.S. If you don't care about preserving one of the input strings, you can simplify the code even further and just return the index (the length of the common prefix, i.e. the position just past its last character) from the function, then put '\0' at that index in one of the strings.

sscanf_s doesn't store the right pattern

I just want the string without the underscore. I tried the few variants below, and none of them work:
string is char pointer from another function, it looks like this: " "_I_have_1_dog.dat)" "
void func1(char *string)
{
char buffer[256]="";
unsigned long count = 0;
count = sscanf_s(string, " \"%*c%255[^\"]\"", buffer, _countof(buffer));
output:
_I_have_1_dog.dat
count = sscanf_s(string, " \"%*[^_]_%255[^\"]\"", buffer, _countof(buffer));
output:
_I_have_1_dog.dat
count = sscanf_s(string, " \"[^_]_%255[^\"]\"", buffer, _countof(buffer));
output:
_I_have_1_dog.dat

Edit Based on Removing 1st char '_' Instead of All '_'
The easiest approach to remove a leading '_' is simply to shift all characters down by 1 in string if the first character is an '_'.
You can use the functions like memmove from string.h to do the same thing. (in a single function call) However, looping is just as easy.
A simple function using the loop method could be:
void rm_1st_underscore (char *string)
{
int i = 1; /* set index to 1 (2nd char in string) */
if (*string != '_') /* if 1st char not '_', just return */
return;
do /* loop over each char in string */
string[i-1] = string[i]; /* shift chars back by 1 in string */
while (string[i++] != 0); /* (note: causes '\0' to copy) */
}
A short example would be:
#include <stdio.h>
void rm_1st_underscore (char *string)
{
int i = 1; /* set index to 1 (2nd char in string) */
if (*string != '_') /* if 1st char not '_', just return */
return;
do /* loop over each char in string */
string[i-1] = string[i]; /* shift chars back by 1 in string */
while (string[i++] != 0); /* (note: causes '\0' to copy) */
}
int main (void) {
char str[] = "_I_have_1_dog.dat";
rm_1st_underscore (str);
puts (str);
}
Example Use/Output
$ ./bin/rmunderscore
I_have_1_dog.dat
Look things over and let me know if you still have questions. If you need help trying it with memmove let me know and I'll drop another example.
Using memmove
Since we are just removing the first '_' instead of all of them, using memmove makes it trivial. Simply include string.h and get the length of string and then call memmove copying from the second char in string back to the first, e.g.
...
#include <string.h>
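void rm_1st_underscore (char *string)
{
    size_t len;
    if (*string != '_') /* if 1st char not '_', just return */
        return;
    len = strlen (string);
    memmove (string, string + 1, len); /* len bytes from string + 1 include the '\0' */
}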
...
(the output is the same)

Custom STRCAT is overwhelmed by too many arguments

I am trying to code a custom strcat that separates arguments with \n except for the last one and terminates the string with \0.
It's working fine as is up to 5 arguments, but if I try passing a sixth one I get a strange line in response :
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok
ok
ok
ok
ok
ok
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok ok
ok
ok
ok
ok
ok
P/Users/domingodelmasok
Here is my custom strcat code:
char cat(char *dest, char *src, int current, int argc_nb)
{
int i = 0;
int j = 0;
while(dest[i])
i++;
while(src[j])
{
dest[i + j] = src[j];
j++;
}
if(current < argc_nb - 1)
dest[i + j] = '\n';
else
dest[i + j] = '\0';
return(*dest);
}
UPDATE Complete calling function:
char *concator(int argc, char **argv)
{
int i;
int j;
int size = 0;
char *str;
i = 1;
while(i < argc)
{
j = 0;
while(argv[i][j])
{
size++;
j++;
}
i++;
}
str = (char*)malloc(sizeof(*str) * (size + 1));
i = 1;
while(i < argc)
{
cat(str, argv[i], i, argc);
i++;
}
free(str);
return(str);
}
What's wrong here?
Thanks!
Edit: Fixed blunder.

There are quite a few issues with the code:
sizeof (char) == 1 by the C standard.
cat() requires the destination to be a string (terminated by a \0), but does not append it itself (except for current >= argc_nb - 1). This is a bug.
free(str); return str; is a use-after-free bug. If you call free(str), the contents at str are irrevocably lost, inaccessible. The free(str) should simply be removed; it is not appropriate here.
Arrays in C are indexed at 0. However, the concator() function skips the first string pointer (because argv[0] contains the name used to execute the program). This is wrong, and will eventually trip someone. Instead, have concator() add all strings in the array, but call it using concator(argc - 1, argv + 1);.
There might be even more, but at this point, I believe a rewrite from scratch, using a much more appropriate approach, is in order.
Consider the following join() function:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
char *join(const size_t parts, const char *part[],
const char *separator, const char *suffix)
{
const size_t separator_len = (separator) ? strlen(separator) : 0;
const size_t suffix_len = (suffix) ? strlen(suffix) : 0;
size_t total_len = 0;
size_t p;
char *dst, *end;
/* Calculate sum of part lengths */
for (p = 0; p < parts; p++)
if (part[p])
total_len += strlen(part[p]);
/* Add separator lengths */
if (parts > 1)
total_len += (parts - 1) * separator_len;
/* Add suffix length */
total_len += suffix_len;
/* Allocate enough memory, plus end-of-string '\0' */
dst = malloc(total_len + 1);
if (!dst)
return NULL;
/* Keep a pointer to the current end of the result string */
end = dst;
/* Append each part */
for (p = 0; p < parts; p++) {
/* Insert separator */
if (p > 0 && separator_len > 0) {
memcpy(end, separator, separator_len);
end += separator_len;
}
/* Insert part */
if (part[p]) {
const size_t len = strlen(part[p]);
if (len > 0) {
memcpy(end, part[p], len);
end += len;
}
}
}
/* Append suffix */
if (suffix_len > 0) {
memcpy(end, suffix, suffix_len);
end += suffix_len;
}
/* Terminate string. */
*end = '\0';
/* All done. */
return dst;
}
The logic is simple. First, we find out the length of each component. Note that separator is only added between parts (so occurs parts-1 times), and suffix at the very end.
(The (string) ? strlen(string) : 0 idiom just means "if string is non-NULL, strlen(string), otherwise 0". We do that because we allow NULL separator and suffix, but strlen(NULL) is Undefined Behaviour.)
Next, we allocate enough memory for the result, including the end-of-string NUL char, \0, that was not included in the lengths.
To append each part, we keep the result pointer intact, and instead use a temporary end pointer. (It is the end of the string thus far.) We use a loop, where we copy the next part to the end. Before the second and subsequent parts, we copy the separator before the part.
Next, we copy the suffix, and finally the end-of-string '\0'. (It is important to return a pointer to the beginning of the string, rather than end, of course; and that is why we kept dst to point to the new resulting string, and end at the point we appended each substring.)
You could use it from the command line using for example the following main():
int main(int argc, char *argv[])
{
char *result;
if (argc < 4) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s SEPARATOR SUFFIX PART [ PART ... ]\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
result = join(argc - 3, (const char **)(argv + 3), argv[1], argv[2]);
if (!result) {
fprintf(stderr, "Failed.\n");
return EXIT_FAILURE;
}
fputs(result, stdout);
return EXIT_SUCCESS;
}
If you compile the above to e.g. example (I use gcc -Wall -O2 example.c -o example), then running
./example ', ' $'!\n' Hello world
in a Bash shell outputs
Hello, world!
(with a newline at end). Running
./example ' and ' $'.\n' a b c d e f g
outputs
a and b and c and d and e and f and g
(again with a newline at end). The $'...' is just a Bash idiom to specify special characters in strings; $'!\n' is the same in Bash as "!\n" is in C, and $'.\n' is the Bash equivalent of ".\n" in C.
(Removing the automatic newline between parts, and allowing a string rather than just one char to be used as a separator and suffix, was a deliberate choice for two reasons. The main one is to stop anyone from just copy-pasting this as an answer to some exercise. The secondary one is to show that while it might sound more complicated than just using single characters for them, it is actually very little additional code; and if you consider the practical use cases, allowing a string to be used as the separator opens up a lot of options.)
The example code above is only very lightly tested, and might contain bugs. If you find any, or disagree with anything I've written above, do let me know in a comment so I can review, and fix as necessary.
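For comparison, if you do want the original behaviour (arguments separated by '\n', none after the last one), here is a minimal sketch that addresses the issues listed at the top of this answer. The name concat_args is my own; it keeps the original convention of skipping argv[0]. The key differences from the question's code are that the size calculation counts one extra byte per argument for the separator or terminator, the result is never read before it is written, and the caller frees it.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch (hypothetical name): joins argv[1] .. argv[argc-1] with '\n'
** between them and returns a newly allocated string, or NULL on failure.
** The caller is responsible for calling free() on the result. */
char *concat_args(int argc, char **argv)
{
    size_t size = 0;
    char *str, *end;
    int i;

    /* each argument needs its length plus one byte ('\n' or final '\0') */
    for (i = 1; i < argc; i++)
        size += strlen(argv[i]) + 1;

    str = malloc(size + 1);         /* + 1 so that argc <= 1 still fits "" */
    if (!str)
        return NULL;

    end = str;                      /* current end of the result */
    for (i = 1; i < argc; i++) {
        size_t len = strlen(argv[i]);
        memcpy(end, argv[i], len);
        end += len;
        if (i < argc - 1)
            *end++ = '\n';          /* separator, except after the last one */
    }
    *end = '\0';
    return str;
}
```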

Inplace string replacement in C

Write a function
void inplace(char *str,
const char pattern,
const char* replacement,
size_t mlen)
Input:
str: a string ending with \0. the input indicates that we need an inplace algorithm.
pattern: a letter.
replacement: a string.
mlen: the size of the memory block that holds str. The string starts at the beginning of that block, and mlen should be larger than strlen(str).
The final result is still pointed by str.
Note that all occurrence of pattern should be replaced.
For example,
helelo\0...........
Here "helelo" is the string to replace with '\0' at the end. After '\0' there are still L valid bytes. We want to replace "e" by "123".
A simple approach works like this, we go through str, when a pattern is matched, we shift all the rest with the place to fill the replacement string, then replace the pattern by the replacement.
If the original string is with length n and contains only e, we need (n-1) + (n-2) + ... + 1 shifts.
Is there an algorithm that scans the string with only one pass and constant memory cost?

I think two passes is the minimum. On the first pass, count the number of characters that will be replaced. Given that count and the length of the replacement string, you can compute the length of the final string. (And you should verify that it's going to fit into the buffer.)
On the second pass, you scan the string backwards (starting at the last character), copying characters to their final location. When you encounter the search character, copy the replacement string to that location.
For a string such as "hello", which contains a single 'e', the increase in length would be 2. So you would
copy str[5] which is '\0' to str[7]
copy str[4] which is 'o' to str[6]
copy str[3] which is 'l' to str[5]
copy str[2] which is 'l' to str[4]
at str[1] you find the 'e' so str[3]='3' str[2]='2' str[1]='1'
At this point the output index is the same as the input index, so you can break the loop.
As @chux pointed out in the comments, the cases where the replacement string is either empty, or has exactly one character, can be handled with a single forward pass through the string. So the code should handle those cases separately.
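Here is a rough sketch of the two-pass idea described above, with the short-replacement cases handled by a forward pass. The name inplace_replace and the int return value (0 on success, -1 if the result would not fit) are my own choices and differ from the void prototype in the question; treat it as an illustration, not a tested library routine.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: replaces every `pattern` character in str with `replacement`.
** mlen is the size of the memory block holding str.
** Returns 0 on success, -1 if the result would not fit (str is unchanged). */
int inplace_replace(char *str, char pattern, const char *replacement, size_t mlen)
{
    size_t len = strlen(str);
    size_t rlen = strlen(replacement);
    size_t count = 0, newlen, in, out, i;

    if (rlen <= 1) {
        /* empty or one-character replacement: single forward pass */
        out = 0;
        for (in = 0; in < len; in++) {
            if (str[in] != pattern)
                str[out++] = str[in];
            else if (rlen == 1)
                str[out++] = replacement[0];
        }
        str[out] = '\0';
        return 0;
    }

    /* pass 1: count the matches and check that the result fits */
    for (i = 0; i < len; i++)
        if (str[i] == pattern)
            count++;
    newlen = len + count * (rlen - 1);
    if (newlen + 1 > mlen)
        return -1;

    /* pass 2: copy backwards from the end to the final positions */
    in = len;
    out = newlen;
    str[out] = '\0';
    while (in > 0 && out != in) {   /* out == in: the rest is already in place */
        in--;
        if (str[in] == pattern) {
            out -= rlen;
            memcpy(str + out, replacement, rlen);
        } else {
            str[--out] = str[in];
        }
    }
    return 0;
}
```

The backward pass never overwrites a character it has not read yet, because the output index stays at or ahead of the input index until the last match has been expanded.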

A candidate single pass solution.
For each character in str, recurse. After the recursion, do the replacement.
It does recurse heavily.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
// return 0:success else 1:fail
static int inplace_help(char *dest, const char *src, int pattern,
const char* replacement, size_t rlen, size_t mlen) {
printf("'%p' '%s' %c\n", (void *)dest, src, pattern); // debug trace; %p expects a void *
if (*src == pattern) {
if (rlen > mlen) return 1;
if (inplace_help(dest + rlen, src + 1, pattern, replacement, rlen,
mlen - rlen)) return 1;
memcpy(dest, replacement, rlen);
return 0;
}
if (mlen == 0) return 1;
int replace1 = *src;
if (*src) {
if (inplace_help(dest + 1, src + 1, pattern, replacement, rlen, mlen - 1)) {
return 1;
}
}
*dest = replace1;
return 0;
}
void inplace(char *str, const char pattern, const char* replacement,
size_t mlen) {
if (pattern == 0) return;
if (mlen == 0) return;
if (*replacement == 0) return; // Ensure str does not shrink.
inplace_help(str, str, pattern, replacement, strlen(replacement), mlen - 1);
}
int main(void) {
char str[1000] = "eeeeec";
inplace(str, 'e', "1234", sizeof str);
printf("'%s'\n", str); // --> '12341234123412341234c'
return 0;
}

The following assumes that the memory allocated to the string has been initialized to something at some point in time, since standard C does not seem to allow access to uninitialized memory. In practice, it will work fine.
It does precisely two scans: the first one is over the entire allocated space, and moves the string to the right-hand edge of the space. The second scan is over the string itself, which it moves back to the left-hand edge while it does replacements.
I changed the prototype to return 0 on success; -1 on failure. I also allow the pattern to be a string. (Maybe a single character was intentional? Easy to change, anyway.) (As written, pattern must not be length zero. Should be checked.)
int inplace(char *str,
const char* pattern,
const char* replacement,
size_t mlen) {
/* We don't know how long the string is, but we know that it ends
with a NUL byte, so every time we hit a NUL byte, we reset
the output pointer.
*/
char* left = str + mlen;
char* right = left;
while (left > str) {
if (!*--left) right = str + mlen;
*--right = *left;
}
/* Naive left-to-right scan. KMP or BM would be more efficient. */
size_t patlen = strlen(pattern);
size_t replen = strlen(replacement);
for (;;) {
if (0 == strncmp(pattern, right, patlen)) {
right += patlen;
if (right - left < replen) return -1;
memcpy(left, replacement, replen);
left += replen;
} else {
if (!(*left++ = *right++)) break;
}
}
return 0;
}

append a character from an array to a char pointer

Ok, so I'm a person who usually writes Java/C++, and I've just started getting into writing C. I'm currently writing a lexical analyser, and I can't stand how strings work in C, since I can't perform string arithmetic. So here's my question:
char* buffer = "";
char* line = "hello, world";
int i;
for (i = 0; i < strlen(line); i++) {
buffer += line[i];
}
How can I do that in C? Since the code above isn't valid C, how can I do something like that?
Basically I'm looping through a string line, and I'm trying to append each character to the buffer string.

String literals are immutable in C. Modifying one causes undefined behavior.
If you use a char array (your buffer) big enough to hold your characters, you can still modify its content:
#include <stdio.h>
#include <string.h>
int main(void) {
char * line = "hello, world";
char buffer[32]; // ok, this array is big enough for our operation
int i;
for (i = 0; i < strlen(line) + 1; i++)
{
buffer[i] = line[i];
}
printf("buffer : %s", buffer);
return 0;
}

First off, the buffer needs to be at least as long as the data being copied to it, including the terminating '\0'.
char a[length], b[] = "string";
Then the characters are copied to the buffer, leaving room for the terminator.
int i = 0;
while (i + 1 < length && b[i] != '\0') { a[i] = b[i]; i++; }
a[i] = '\0';
You can reverse the order if you need to: start i at the smaller of the two lengths and decrement instead of incrementing. You can also use the heap, as others have suggested, if length is arbitrary or changes at run time. Furthermore, you can write the same snippet with pointers (which may give you a better idea of what is happening):
char *j = a, *k = b;
while (j - a + 1 < length && *k) { *(j++) = *(k++); }
*j = '\0';
Make sure to look up memcpy; and don't forget null terminators (oops).
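Since memcpy came up, here is a small sketch of the same bounded copy written with it. The helper name copy_bounded and its return value are my own; the idea is just to clamp the length first, copy that many bytes in one call, and then write the terminator yourself, because memcpy does not add one.

```c
#include <stddef.h>
#include <string.h>

/* Sketch (hypothetical helper): copies at most dst_size - 1 characters
** of src into dst and always terminates dst when dst_size > 0.
** Returns the number of characters copied, not counting the '\0'. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return 0;               /* nowhere to put even the terminator */
    n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;       /* truncate to what fits */
    memcpy(dst, src, n);
    dst[n] = '\0';              /* memcpy does not terminate for us */
    return n;
}
```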

#include <stdlib.h>
#include <string.h>
//...
char *line = "hello, world";
char *buffer = ( char * ) malloc( strlen( line ) + 1 );
strcpy( buffer, line );
Although string literals in C have the type of a non-const array, it is better to declare pointers initialized by string literals with the qualifier const:
const char *line = "hello, world";
String literals in C/C++ are immutable.
If you want to append characters then the code can look the following way (each character of line is appended to buffer in a loop)
#include <stdlib.h>
#include <string.h>
//...
char *line = "hello, world";
char *buffer = ( char * ) malloc( strlen( line ) + 1 );
buffer[0] = '\0';
char *p = buffer;
for ( size_t i = 0; i < strlen( line ); i++ )
{
*p++ = line[i];
*p = '\0';
}
The general approach is to find the pointer to the terminating zero, overwrite it with the new character, advance the pointer, and append a new terminating zero. The destination buffer must be large enough to accommodate one more character.

If you want to append a single character to a string allocated on the heap, here's one way to do it:
size_t length = strlen(buffer);
char *newbuffer = realloc(buffer, length + 2);
if (newbuffer) { // realloc succeeded
buffer = newbuffer;
buffer[length] = newcharacter;
buffer[length + 1] = '\0';
}
else { // realloc failed
// TODO handle error...
free(buffer); // for example
}
However, this is inefficient to do repeatedly in a loop, because you'll be repeatedly calling strlen() on (essentially) the same string, and reallocating the buffer to fit one more character each time.
If you want to be smarter about your reallocations, keep track of the buffer's current allocated capacity separately from the length of the string within it — if you know C++, think of the difference between a std::string object's "size" and its "capacity" — and when it's necessary to reallocate, multiply the buffer's size by a scaling factor (e.g. double it) instead of adding 1, so that the number of reallocations is O(log n) instead of O(n).
This is the sort of thing that a good string class would do in C++. In C, you'll probably want to move this buffer-management stuff into its own module.
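As a rough illustration of that idea, here is a minimal sketch of a growable buffer that tracks length and capacity separately and doubles the capacity when it runs out. The struct name strbuf, the function name, and the starting capacity of 16 are arbitrary choices of mine, not an existing library.

```c
#include <stdlib.h>

/* Sketch of a growable string buffer (all names are hypothetical).
** Start with: struct strbuf sb = { NULL, 0, 0 }; */
struct strbuf {
    char  *data;        /* NUL-terminated contents, or NULL when empty */
    size_t length;      /* characters stored, not counting the '\0' */
    size_t capacity;    /* bytes allocated for data */
};

/* Appends one character. Returns 0 on success, -1 if realloc fails
** (in which case the buffer is left unchanged and still valid). */
int strbuf_append_char(struct strbuf *sb, char c)
{
    if (sb->length + 2 > sb->capacity) {    /* need room for c and '\0' */
        size_t newcap = sb->capacity ? sb->capacity * 2 : 16;
        char *p = realloc(sb->data, newcap);
        if (!p)
            return -1;
        sb->data = p;
        sb->capacity = newcap;
    }
    sb->data[sb->length++] = c;
    sb->data[sb->length] = '\0';
    return 0;
}
```

Because the length is stored, no strlen() call is needed per append, and the doubling keeps the number of realloc calls logarithmic in the final length. When you are done, free(sb.data).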

The simplest solution, lacking any context, is to do:
char buffer[ strlen(line) + 1 ];
strcpy(buffer, line);
You may be used to using pointers for everything in Java (since non-primitive types in Java are actually more like shared pointers than anything else). However you don't necessarily have to do this in C and it can be a pain if you do.
Maybe a good idea given your background would be to use a counted string object in C, where the string object owns its data. Write struct my_string { char *data; size_t length; } . Write functions for creating, destroying, duplicating, and any other operation you need such as appending a character, or checking the length. (Separate interface from implementation!) A useful addition to this would be to make it allocate 1 more byte than length, so that you can have a function which null-terminates and allows it to be passed to a function that expects a read-only C-style string.
The only real pitfall here is to remember to call a function when you are doing a copy operation, instead of allowing structure assignment to happen. (You can use structure assignment for a move operation of course!)
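To make the copy pitfall concrete, here is a small sketch of such an object with create, duplicate and destroy functions. Only the struct layout comes from the text above; the function names and the int return convention (0 on success, -1 on allocation failure) are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

struct my_string { char *data; size_t length; };

/* Creates a string that owns a copy of text (length + 1 bytes allocated,
** so data can be handed to functions expecting a C-style string). */
int my_string_create(struct my_string *s, const char *text)
{
    size_t n = strlen(text);

    s->data = malloc(n + 1);
    if (!s->data) {
        s->length = 0;
        return -1;
    }
    memcpy(s->data, text, n + 1);   /* includes the '\0' */
    s->length = n;
    return 0;
}

/* Deep copy: dst gets its own buffer. Plain `dst = src` would instead
** make both objects share (and later double-free) the same data. */
int my_string_dup(struct my_string *dst, const struct my_string *src)
{
    dst->data = malloc(src->length + 1);
    if (!dst->data) {
        dst->length = 0;
        return -1;
    }
    memcpy(dst->data, src->data, src->length);
    dst->data[src->length] = '\0';
    dst->length = src->length;
    return 0;
}

/* Releases the buffer and leaves the object in a safe empty state. */
void my_string_destroy(struct my_string *s)
{
    free(s->data);
    s->data = NULL;
    s->length = 0;
}
```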

The asprintf function is very useful for building strings, and is available on GNU-based systems (Linux), or most *BSD based systems. You can do things like:
char *buffer;
if (asprintf(&buffer, "%s: adding some stuff %d - %s", str1, number, str2) < 0) {
fprintf(stderr, "Oops -- out of memory\n");
exit(1);
}
printf("created the string \"%s\"\n", buffer);
free(buffer); /* done with it */

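Appending can be done with snprintf, as long as the destination is a real buffer with enough room (an uninitialised pointer will not do).
Include the stdio.h and string.h headers
#include <stdio.h>
#include <string.h>
then
char line[] = "hello, world";
char buffer[sizeof line]; // room for every character of line plus the '\0'
size_t len = 0; // number of characters currently in buffer
size_t i;
// Initialise the buffer to an empty string
buffer[0] = '\0';
for (i = 0; i < strlen(line); ++i) {
    // write one character at the current end of buffer; snprintf adds the '\0'
    len += snprintf(buffer + len, sizeof buffer - len, "%c", line[i]);
}
The code you started with is different from the question you are asking.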
You could have split the line with strtok though.
But I hope my answer clarifies it.