Inplace string replacement in C

Inplace string replacement in C - c

Write a function
void inplace(char *str,
const char pattern,
const char* replacement,
size_t mlen)
Input:
str: a string ending with \0. the input indicates that we need an inplace algorithm.
pattern: a letter.
replacement: a string.
mlen: the size of the memory holds the string str starts from the beginning of the memory and that mlen should be larger than strlen(str)
The final result is still pointed by str.
Note that all occurrence of pattern should be replaced.
For example,
helelo\0...........
Here "helelo" is the string to replace with '\0' at the end. After '\0' there are still L valid bytes. We want to replace "e" by "123".
A simple approach works like this, we go through str, when a pattern is matched, we shift all the rest with the place to fill the replacement string, then replace the pattern by the replacement.
If the original string is with length n and contains only e, we need (n-1) + (n-2) + ... + 1 shifts.
Is there an algorithm that scans the string with only one pass and constant memory cost?

I think two passes is the minimum. On the first pass, count the number of characters that will be replaced. Given that count and the length of the replacement string, you can compute the length of the final string. (And you should verify that it's going to fit into the buffer.)
On the second pass, you scan the string backwards (starting at the last character), copying characters to their final location. When you encounter the search character, copy the replacement string to that location.
In your example, the increase in length would be 2. So you would
copy str[5] which is '\0' to str[7]
copy str[4] which is 'o' to str[6]
copy str[3] which is 'l' to str[5]
copy str[2] which is 'l' to str[4]
at str[1] you find the 'e' so str[3]='3' str[2]='2' str[1]='1'
At this point the output index is the same as the input index, so you can break the loop.
As #chux pointed out in the comments, the cases where the replacement string is either empty, or has exactly one character, can be handled with a single forward pass through the string. So the code should handle those cases separately.

A candidate single pass solution.
For each character in str, recurse. After the recursion, do the replacement.
It does recurse heavily.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
// return 0:success else 1:fail
static int inplace_help(char *dest, const char *src, int pattern,
const char* replacement, size_t rlen, size_t mlen) {
printf("'%p' '%s' %c\n", dest, src, pattern);
if (*src == pattern) {
if (rlen > mlen) return 1;
if (inplace_help(dest + rlen, src + 1, pattern, replacement, rlen,
mlen - rlen)) return 1;
memcpy(dest, replacement, rlen);
return 0;
}
if (mlen == 0) return 1;
int replace1 = *src;
if (*src) {
if (inplace_help(dest + 1, src + 1, pattern, replacement, rlen, mlen - 1)) {
return 1;
}
}
*dest = replace1;
return 0;
}
void inplace(char *str, const char pattern, const char* replacement,
size_t mlen) {
if (pattern == 0) return;
if (mlen == 0) return;
if (*replacement == 0) return; // Insure str does not shrink.
inplace_help(str, str, pattern, replacement, strlen(replacement), mlen - 1);
}
int main(void) {
char str[1000] = "eeeeec";
inplace(str, 'e', "1234", sizeof str);
printf("'%s'\n", str); // --> '12341234123412341234c'
return 0;
}

The following assumes that the memory allocated to the string has been initialized to something at some point in time, since standard C does not seem to allow access to uninitialized memory. In practice, it will work fine.
It does precisely two scans: the first one is over the entire allocated space, and moves the string to the right-hand edge of the space. The second scan is over the string itself, which it moves back to the left-hand edge while it does replacements.
I changed the prototype to return 0 on success; -1 on failure. I also allow the pattern to be a string. (Maybe a single character was intentional? Easy to change, anyway.) (As written, pattern must not be length zero. Should be checked.)
int inplace(char *str,
const char* pattern,
const char* replacement,
size_t mlen) {
/* We don't know how long the string is, but we know that it ends
with a NUL byte, so every time we hit a NUL byte, we reset
the output pointer.
*/
char* left = str + mlen;
char* right = left;
while (left > str) {
if (!*--left) right = str + mlen;
*--right = *left;
}
/* Naive left-to-right scan. KMP or BM would be more efficient. */
size_t patlen = strlen(pattern);
size_t replen = strlen(replacement);
for (;;) {
if (0 == strncmp(pattern, right, patlen)) {
right += patlen;
if (right - left < replen) return -1;
memcpy(left, replacement, replen);
left += replen;
} else {
if (!(*left++ = *right++)) break;
}
}
return 0;
}

Related

Trim function in C

I am writing my own trim() in C. There is a structure which contains all string values, the structure is getting populated from the data coming from a file which contains spaces before and after the beginning of a word.
char *trim(char *string)
{
int stPos,endPos;
int len=strlen(string);
for(stPos=0;stPos<len && string[stPos]==' ';++stPos);
for(endPos=len-1;endPos>=0 && string[endPos]==' ';--endPos);
char *trimmedStr = (char*)malloc(len*sizeof(char));
strncpy(trimmedStr,string+stPos,endPos+1);
return trimmedStr;
}
int main()
{
char string1[]=" a sdf ie ";
char *string =trim(string1);
printf("%s",string);
return 0;
}
Above code is working fine, but i don't want to declare new variable that stores the trimmed word. As the structure contains around 100 variables.
Is there any way to do somthing like below where I dont need any second variable to print the trimmed string.
printf("%s",trim(string1));
I believe above print can create dangling pointer situation.
Also, is there any way where I don't have to charge original string as well, like if I print trim(string) it will print trimmed string and when i print only string, it will print original string

elcuco was faster. but it's done so here we go:
char *trim(char *string)
{
char *ptr = NULL;
while (*string == ' ') string++; // chomp away space at the start
ptr = string + strlen(string) - 1; // jump to the last char (-1 because '\0')
while (*ptr == ' '){ *ptr = '\0' ; ptr--; } ; // overwrite with end of string
return string; // return pointer to the modified start
}
If you don't want to alter the original string I'd write a special print instead:
void trim_print(char *string)
{
char *ptr = NULL;
while (*string == ' ') string++; // chomp away space at the start
ptr = string + strlen(string) - 1; // jump to the last char (-1 because '\0')
while (*ptr == ' '){ ptr--; } ; // find end of string
while (string <= ptr) { putchar(*string++); } // you get the picture
}
something like that.

You could the original string in order to do this. For trimming the prefix I just advance the pointer, and for the suffix, I actually add \0. If you want to keep the original starting as is, you will have to move memory (which makes this an O(n^2) time complexity solution, from an O(n) I provided).
#include <stdio.h>
char *trim(char *string)
{
// trim prefix
while ((*string) == ' ' ) {
string ++;
}
// find end of original string
char *c = string;
while (*c) {
c ++;
}
c--;
// trim suffix
while ((*c) == ' ' ) {
*c = '\0';
c--;
}
return string;
}
int main()
{
char string1[] = " abcdefg abcdf ";
char *string = trim(string1);
printf("String is [%s]\n",string);
return 0;
}
(re-thinking... is it really O(n^2)? Or is it O(2n) which is a higher O(n)...? I guess depending on implementation)

You can modify the function by giving the output in the same input string
void trim(char *string)
{
int i;
int stPos,endPos;
int len=strlen(string);
for(stPos=0;stPos<len && string[stPos]==' ';++stPos);
for(endPos=len-1;endPos>=0 && string[endPos]==' ';--endPos);
for (i=0; i<=(endPos-stPos); i++)
{
string[i] = string[i+stPos];
}
string[i] = '\0'; // terminate the string and discard the remaining spaces.
}

...is there any way where i don't have to charge original string as well, like if i do trim(string) it will print trimmed string and when i print only string, it will print original string – avinashse 8 mins ago
Yes, though it gets silly.
You could modify the original string.
trim(string);
printf("trimmed: %s\n", string);
The advantage is you have the option of duplicating the string if you want to retain the original.
char *original = strdup(string);
trim(string);
printf("trimmed: %s\n", string);
If you don't want to modify the original string, that means you need to allocate memory for the modified string. That memory then must be freed. That means a new variable to hold the pointer so you can free it.
char *trimmed = trim(original);
printf("trimmed: %s\n", trimmed);
free(trimmed);
You can get around this by passing a function pointer into trim and having trim manage all the memory for you.
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
void trim(char *string, void(*func)(char *) )
{
// Advance the pointer to the first non-space char
while( *string == ' ' ) {
string++;
}
// Shrink the length to the last non-space char.
size_t len = strlen(string);
while(string[len-1]==' ') {
len--;
}
// Copy the string to stack memory
char trimmedStr[len + 1];
strncpy(trimmedStr,string, len);
// strncpy does not add a null byte, add it ourselves.
trimmedStr[len] = '\0';
// pass the trimmed string into the user function.
func(trimmedStr);
}
void print_string(char *str) {
printf("'%s'\n", str);
}
int main()
{
char string[]=" a sdf ie ";
trim(string, print_string);
printf("original: '%s'\n", string);
return 0;
}
Ta da! One variable, the original is left unmodified, no memory leaks.
While function pointers have their uses, this is a bit silly.
It's C. Get used to managing memory. ¯\_(ツ)_/¯

Also, is there any way where I don't have to charge original string as
well, like if I print trim(string) it will print trimmed string and
when i print only string, it will print original string
Yes you can, but you cannot allocate new memory in the trim function as you will not be holding the return memory.
You can have a static char buffer in the trim function and operate on it.
Updated version of #elcuco answer.
#include <stdio.h>
char *trim(char *string)
{
static char buff[some max length];
// trim prefix
while ((*string) == ' ' ) {
string++;
}
// find end of original string
int i = 0;
while (*string) {
buff[i++] = *string;
string++;
}
// trim suffix
while ((buff[i]) == ' ' ) {
buff[i] = '\0';
i--;
}
return buff;
}
int main()
{
char string1[] = " abcdefg abcdf ";
char *string = trim(string1);
printf("String is [%s]\n",string);
return 0;
}
With this you don't need to worry about holding reference to trim function return.
Note: Previous values of buff will be overwritten with new call to trim function.

If you don't want to change the original, then you will need to make a copy, or pass a second array of sufficient size as a parameter to your function for filling. Otherwise a simple in-place trmming is fine -- so long as the original string is mutable.
An easy way to approach trimming on leading and trailing whitespace is to determine the number of leading whitespace characters to remove. Then simply use memmove to move from the first non-whitespace character back to the beginning of the string (don't forget to move the nul-character with the right portion of the string).
That leaves only removing trailing whitespace. An easy approach there is to loop from the end of the string toward the beginning, overwriting each character of trailing whitespace with a nul-character until your first non-whitespace character denoting the new end of string is found.
A simple implementation for that could be:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define DELIM " \t\n" /* whitespace constant delimiters for strspn */
/** trim leading and trailing whitespace from s, (s must be mutable) */
char *trim (char *s)
{
size_t beg = strspn (s, DELIM), /* no of chars of leading whitespace */
len = strlen (s); /* length of s */
if (beg == len) { /* string is all whitespace */
*s = 0; /* make s the empty-string */
return s;
}
memmove (s, s + beg, len - beg + 1); /* shift string to beginning */
for (int i = (int)(len - beg - 1); i >= 0; i--) { /* loop from end */
if (isspace(s[i])) /* checking if char is whitespace */
s[i] = 0; /* overwrite with nul-character */
else
break; /* otherwise - done */
}
return s; /* Return s */
}
int main (void) {
char string1[] = " a sdf ie ";
printf ("original: '%s'\n", string1);
printf ("trimmed : '%s'\n", trim(string1));
}
(note: additional intervening whitespace was added to your initial string to show that multiple intervening whitespace is left unchanged, the output is single-quoted to show the remaining text boundaries)
Example Use/Output
$ ./bin/strtrim
original: ' a sdf ie '
trimmed : 'a sdf ie'
Look things over and let me know if you have further questions.

Custom STRCAT is overwhelmed by too many arguments

I am trying to code a custom strcat that separates arguments with \n except for the last one and terminates the string with \0.
It's working fine as is up to 5 arguments, but if I try passing a sixth one I get a strange line in response :
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok
ok
ok
ok
ok
ok
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok ok
ok
ok
ok
ok
ok
P/Users/domingodelmasok
Here is my custom strcat code:
char cat(char *dest, char *src, int current, int argc_nb)
{
int i = 0;
int j = 0;
while(dest[i])
i++;
while(src[j])
{
dest[i + j] = src[j];
j++;
}
if(current < argc_nb - 1)
dest[i + j] = '\n';
else
dest[i + j] = '\0';
return(*dest);
}
UPDATE Complete calling function:
char *concator(int argc, char **argv)
{
int i;
int j;
int size = 0;
char *str;
i = 1;
while(i < argc)
{
j = 0;
while(argv[i][j])
{
size++;
j++;
}
i++;
}
str = (char*)malloc(sizeof(*str) * (size + 1));
i = 1;
while(i < argc)
{
cat(str, argv[i], i, argc);
i++;
}
free(str);
return(str);
}
What's wrong here?
Thanks!
Edit: Fixed blunder.

There are quite a few issues with the code:
sizeof (char) == 1 by the C standard.
cat() requires the destination to be a string (terminated by a \0), but does not append it itself (except for current >= argc_nb - 1). This is a bug.
free(str); return str; is an use-after-free bug. If you call free(str), the contents at str are irrevocably lost, inaccessible. The free(str) should simply be removed; it is not appropriate here.
Arrays in C are indexed at 0. However, the concator() function skips the first string pointer (because argv[0] contains the name used to execute the program). This is wrong, and will eventually trip someone. Instead, have concator() add all strings in the array, but call it using concator(argc - 1, argv + 1);.
There might be even more, but at this point, I believe a rewrite from scratch, using a much more appropriate approach, is in order.
Consider the following join() function:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
char *join(const size_t parts, const char *part[],
const char *separator, const char *suffix)
{
const size_t separator_len = (separator) ? strlen(separator) : 0;
const size_t suffix_len = (suffix) ? strlen(suffix) : 0;
size_t total_len = 0;
size_t p;
char *dst, *end;
/* Calculate sum of part lengths */
for (p = 0; p < parts; p++)
if (part[p])
total_len += strlen(part[p]);
/* Add separator lengths */
if (parts > 1)
total_len += (parts - 1) * separator_len;
/* Add suffix length */
total_len += suffix_len;
/* Allocate enough memory, plus end-of-string '\0' */
dst = malloc(total_len + 1);
if (!dst)
return NULL;
/* Keep a pointer to the current end of the result string */
end = dst;
/* Append each part */
for (p = 0; p < parts; p++) {
/* Insert separator */
if (p > 0 && separator_len > 0) {
memcpy(end, separator, separator_len);
end += separator_len;
}
/* Insert part */
if (part[p]) {
const size_t len = strlen(part[p]);
if (len > 0) {
memcpy(end, part[p], len);
end += len;
}
}
}
/* Append suffix */
if (suffix_len > 0) {
memcpy(end, suffix, suffix_len);
end += suffix_len;
}
/* Terminate string. */
*end = '\0';
/* All done. */
return dst;
}
The logic is simple. First, we find out the length of each component. Note that separator is only added between parts (so occurs parts-1 times), and suffix at the very end.
(The (string) ? strlen(string) : 0 idiom just means "if string is non-NULL, strlen(0), otherwise 0". We do that, because we allow NULL separator and suffix, but strlen(NULL) is Undefined Behaviour.)
Next, we allocate enough memory for the result, including the end-of-string NUL char, \0, that was not included in the lengths.
To append each part, we keep the result pointer intact, and instead use a temporary end pointer. (It is the end of the string thus far.) We use a loop, where we copy the next part to the end. Before the second and subsequent parts, we copy the separator before the part.
Next, we copy the suffix, and finally the end-of-string '\0'. (It is important to return a pointer to the beginning of the string, rather than end, of course; and that is why we kept dst to point to the new resulting string, and end at the point we appended each substring.)
You could use it from the command line using for example the following main():
int main(int argc, char *argv[])
{
char *result;
if (argc < 4) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s SEPARATOR SUFFIX PART [ PART ... ]\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
result = join(argc - 3, (const char **)(argv + 3), argv[1], argv[2]);
if (!result) {
fprintf(stderr, "Failed.\n");
return EXIT_FAILURE;
}
fputs(result, stdout);
return EXIT_SUCCESS;
}
If you compile the above to e.g. example (I use gcc -Wall -O2 example.c -o example), then running
./example ', ' $'!\n' Hello world
in a Bash shell outputs
Hello, world!
(with a newline at end). Running
./example ' and ' $'.\n' a b c d e f g
outputs
a and b and c and d and e and f and g
(again with a newline at end). The $'...' is just a Bash idiom to specify special characters in strings; $'!\n' is the same in Bash as "!\n" is in C, and $'.\n' is the Bash equivalent of ".\n" in C.
(Removing the automatic newline between parts, and allowing a string rather than just one char to be used as a separator and suffix, was a deliberate choice for two reasons. The main one is to stop anyone from just copy-pasting this as an answer to some exercise. The secondary one is to show that while it might sound more complicated than just using single characters for them, it is actually very little additional code; and if you consider the practical use cases, allowing a string to be used as the separator opens up a lot of options.)
The example code above is only very lightly tested, and might contain bugs. If you find any, or disagree with anything I've written above, do let me know in a comment so I can review, and fix as necessary.

how to cut out a chinese words & english words mixture string in c language

I have a string that contains both Mandarin and English words in UTF-8:
char *str = "你a好测b试";
If you use strlen(str), it will return 14, because each Mandarin character uses three bytes, while each English character uses only one byte.
Now I want to copy the leftmost 4 characters ("你a好测"), and append "..." at the end, to give "你a好测...".
If the text were in a single-byte encoding, I could just write:
strncpy(buf, str, 4);
strcat(buf, "...");
But 4 characters in UTF-8 isn't necessarily 4 bytes. For this example, it will be 13 bytes: three each for 你, 好 and 测 and one for a. So, for this specific case, I would need
strncpy(buf, str, 13);
strcat(buf, "...");
If I had a wrong value for the length, I could produce a broken UTF-8 stream with an incomplete character. Obviously I want to avoid that.
How can I compute the right number of bytes to copy, corresponding to a given number of characters?

First you need to know your encoding. By the sound of it (3 byte Mandarin) your string is encoded with UTF-8.
What you need to do is convert the UTF-8 back to unicode code points (integers). You can then have an array of integers rather than bytes, so each element of the array will be 1 character, reguardless of the language.
You could also use a library of functions that already handle utf8 such as http://www.cprogramming.com/tutorial/utf8.c
http://www.cprogramming.com/tutorial/utf8.h
In particular this function: int u8_toucs(u_int32_t *dest, int sz, char *src, int srcsz); might be very useful, it will create an array of integers, with each integer being 1 character. You can then modify the array as you see fit, then convert it back again with int u8_toutf8(char *dest, int sz, u_int32_t *src, int srcsz);

I would recommend dealing with this at a higher level of abstraction: either convert to wchar_t or use a UTF-8 library. But if you really want to do it at the byte level, you could count characters by skipping over the continuation bytes (which are of the form 10xxxxxx):
#include <stddef.h>
size_t count_bytes_for_chars(const char *s, int n)
{
const char *p = s;
n += 1; /* we're counting up to the start of the subsequent character */
while (*p && (n -= (*p & 0xc0) != 0x80))
++p;
return p-s;
}
Here's a demonstration of the above function:
#include <string.h>
#include <stdio.h>
int main()
{
const char *str = "你a好测b试";
char buf[50];
int truncate_at = 4;
size_t bytes = count_bytes_for_chars(str, truncate_at);
strncpy(buf, str, bytes);
strcpy(buf+bytes, "...");
printf("'%s' truncated to %d characters is '%s'\n", str, truncate_at, buf);
}
Output:
'你a好测b试' truncated to 4 characters is '你a好测...'

The Basic Multilingual Plane was designed to contain characters for almost all modern languages. In particular, it does contain Chinese.
So you just have to convert your UTF8 string to a UTF16 one to have each character using one single position. That means that you can just use a wchar_t array or even better a wstring to be allowed to use natively all string functions.
Starting with C++11, the <codecvt> header declares a dedicated converter std::codecvt_utf8 to specifically convert UTF8 narrow strings to wide Unicode ones. I must admit it is not very easy to use, but it should be enough here. Code could be like:
char str[] = "你a好测b试";
std::codecvt_utf8<wchar_t> cvt;
std::mbstate_t state = std::mbstate_t();
wchar_t wstr[sizeof(str)] = {0}; // there will be unused space at the end
const char *end;
wchar_t *wend;
auto cr = cvt.in(state, str, str+sizeof(str), end,
wstr, wstr+sizeof(str), wend);
*wend = 0;
Once you have the wstr wide string, you can convert it to a wstring and use all the C++ library tools, or if you prefer C strings you can use the ws... counterparts of the str... functions.

Pure C solution:
All UTF8 multibyte characters will be made from char-s with the most-significant-bit set to 1 with the first bits of their first character indicating how many characters makes a codepoint.
The question is ambiguous in regards to the criterion used in cutting; either:
a fixed number of codepoints followed by three dots, this wil require a variable size output buffer
a fixed size output buffer, which will impose "whatever you can fit inside"
Both the solutions will require a helper function telling how many chars make the next codepoint:
// Note: the function does NOT fully validate a
// UTF8 sequence, only looks at the first char in it
int codePointLen(const char* c) {
if(NULL==c) return -1;
if( (*c & 0xF8)==0xF0 ) return 4; // 4 ones and one 0
if( (*c & 0xF0)==0xE0 ) return 3; // 3 ones and one 0
if( (*c & 0xE0)==0xC0 ) return 2; // 2 ones and one 0
if( (*c & 0x7F)==*c ) return 1; // no ones on msb
return -2; // invalid UTF8 starting character
}
So, solution for the criterion 1 (fixed number of code points, variable output buff size) - does not append ... to the destination, but you can ask "how many chars I need" upfront and if it is longer than you can afford, reserve yourself the extra space.
// returns the number of chars used from the output
// If not enough space or the dest is null, does nothing
// and returns the lenght required for the output buffer
// Returns negative val if the source in not a valid UTF8
int copyFirstCodepoints(
int codepointsCount, const char* src,
char* dest, int destSize
) {
if(NULL==src) {
return -1;
}
// do a cold run to see if size of the output buffer can fit
// as many codepoints as required
const char* walker=src;
for(int cnvCount=0; cnvCount<codepointsCount; cnvCount++) {
int chCount=codePointLen(walker);
if(chCount<0) {
return chCount; // err
}
walker+=chCount;
}
if(walker-src < destSize && NULL!=dest) {
// enough space at destination
strncpy(src, dest, walker-src);
}
// else do nothing
return walker-src;
}
Second criterion (limited buffer size): just use the first one with the number of codepoints returned by this one
// return negative if UTF encoding error
int howManyCodepointICanFitInOutputBufferOfLen(const char* src, int maxBufflen) {
if(NULL==src) {
return -1;
}
int ret=0;
for(const char* walker=src; *walker && ret<maxBufflen; ret++) {
int advance=codePointLen(walker);
if(advance<0) {
return src-walker; // err because negative, but indicating the err pos
}
// look on all the chars between walker and walker+advance
// if any is 0, we have a premature end of the source
while(advance>0) {
if(0==*(++walker)) {
return src-walker; // err because negative, but indicating the err pos
}
advance--;
} // walker is set on the correct position for the next attempt
}
return ret;
}

static char *CutStringLength(char *lpszData, int nMaxLen)
{
if (NULL == lpszData || 0 >= nMaxLen)
{
return "";
}
int len = strlen(lpszData);
if(len <= nMaxLen)
{
return lpszData;
}
char strTemp[1024] = {0};
strcpy(strTemp, lpszData);
char *p = strTemp;
p = p + (nMaxLen-1);
if ((unsigned char)(*p) < 0xA0)
{
*(++p) = '\0'; // if the last byte is Mandarin character
}
else if ((unsigned char)(*(--p)) < 0xA0)
{
*(++p) = '\0'; // if the last but one byte is Mandarin character
}
else if ((unsigned char)(*(--p)) < 0xA0)
{
*(++p) = '\0'; // if the last but two byte is Mandarin character
}
else
{
int i = 0;
p = strTemp;
while(*p != '\0' && i+2 <= nMaxLen)
{
if((unsigned char)(*p++) >= 0xA0 && (unsigned char)(*p) >= 0xA0)
{
p++;
i++;
}
i++;
}
*p = '\0';
}
printf("str = %s\n",strTemp);
return strTemp;
}

How to do a split of a string

HI, I would like how to do a split of a string in c without #include

Multiple ways of doing that, which I'll just explain and not write for you as this can only be a homework (or self-enhancement exercise, so the intent is the same).
Either you split the string into multiple strings that you re-allocate into a multi-dimensional array,
or you simply cut the string on separators and add terminal '\0' where appropriate and just copy the starting address of each sub-string to an array of pointers.
The approach for the splitting is similar in both cases, but in the second one you don't need to allocate any memory (but modify the original string), while in the first one you create safe copies of each sub-string.
You were not specific on the splitting, so I don't know if you wanted to cut on substrings, a single charater, or a list of potential separators, etc...
Good luck.

find the point you would like to split it
make two buffers large enough to contain data
strcpy() or do it manually (see example)
in this code I assume you have a string str[] and would like to split it at the first comma:
for(int count = 0; str[count] != '\0'; count++) {
if(str[count] == ',')
break;
}
if(str[count] == '\0')
return 0;
char *s1 = malloc(count);
strcpy(s1, (str+count+1)); // get part after
char *s2 = malloc(strlen(str) - count); // get part before
for(int count1 = 0; count1 < count; count1++)
s2[count1] = str[count1];
got it? ;)

Assuming I have complete control of the function prototype, I'd do this (make this a single source file (no #includes) and compile, then link with the rest of the project)
If #include <stddef.h> is part of the "without #include" thing (but it shouldn't), then instead of size_t, use unsigned long in the code below
#include <stddef.h>
/* split of a string in c without #include */
/*
** `predst` destination for the prefix (before the split character)
** `postdst` destination for the postfix (after the split character)
** `src` original string to be splitted
** `ch` the character to split at
** returns the length of `predst`
**
** it is UB if
** src does not contain ch
** predst or postdst has no space for the result
*/
size_t split(char *predst, char *postdst, const char *src, char ch) {
size_t retval = 0;
while (*src != ch) {
*predst++ = *src++;
retval++;
}
*predst = 0;
src++; /* skip over ch */
while ((*postdst++ = *src++) != 0) /* void */;
return retval;
}
Example usage
char a[10], b[42];
size_t n;
n = split(b, a, "forty two", ' ');
/* n is 5; b has "forty"; a has "two" */

How to remove a character from a string using baskspace in C?

can you give me an example of deleting characters from an array of characters in c?
I tried too much, but i didn't reach to what i want
That is what i did:
int i;
if(c<max && c>start) // c is the current index, start == 0 as the index of the start,
//max is the size of the array
{
i = c;
if(c == start)
break;
arr[c-1] = arr[c];
}
printf("%c",arr[c]);

A character array in C does not easily permit deleting entries. All you can do is move the data (for instance using memmove). Example:
char string[20] = "strring";
/* delete the duplicate r*/
int duppos=3;
memmove(string+duppos, string+duppos+1, strlen(string)-duppos);

You have an array of characters c:
char c[] = "abcDELETEdefg";
You want a different array that contains only "abcdefg" (plus the null terminator). You can do this:
#define PUT_INTO 3
#define TAKE_FROM 9
int put, take;
for (put = START_CUT, take = END_CUT; c[take] != '\0'; put++, take++)
{
c[put] = c[take];
}
c[put] = '\0';
There are more efficient ways to do this using memcpy or memmove, and it could be made more general, but this is the essence. If you really care about speed, you should probably make a new array that doesn't contain the characters you don't want.

Here's one approach. Instead of removing characters in place and shuffling the remaining characters (which is a pain), you copy the characters you want to keep to another array:
#include <string.h>
...
void removeSubstr(const char *src, const char *substr, char *target)
{
/**
* Use the strstr() library function to find the beginning of
* the substring in src; if the substring is not present,
* strstr returns NULL.
*/
char *start = strstr(src, substr);
if (start)
{
/**
* Copy characters from src up to the location of the substring
*/
while (src != start) *target++ = *src++;
/**
* Skip over the substring
*/
src += strlen(substr);
}
/**
* Copy the remaining characters to the target, including 0 terminator
*/
while ((*target++ = *src++))
; // empty loop body;
}
int main(void)
{
char *src = "This is NOT a test";
char *sub = "NOT ";
char result[20];
removeSubstr(src, sub, result);
printf("Src: \"%s\", Substr: \"%s\", Result: \"%s\"\n", src, sub, result);
return 0;
}

string = H E L L O \0
string_length = 5 (or just use strlen inside if you don't want to cache it outside this call
remove_char_at_index = 1 if you want to delete the 'E'
copy to the 'E' position (string + 1)
from the first 'L' position (string + 1 + 1)
4 bytes (want to get the NULL), so 5 - 1 = 4
remove_character_at_location(char * string, int string_length, int remove_char_at_index) {
/* Use memmove because the locations overlap */.
memmove(string+remove_char_at_index,
string+remove_char_at_index+1,
string_length - remove_char_at_position);
}

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Inplace string replacement in C - c

Related

Trim function in C

Custom STRCAT is overwhelmed by too many arguments

how to cut out a chinese words & english words mixture string in c language

How to do a split of a string

How to remove a character from a string using baskspace in C?

Categories

Resources