Why is substring not part of the C standard library? - c

I know C is purposefully bare-bones, but I'm curious as to why something as commonplace as a substring function is not included in <string.h>.
Is it that there is not one "right enough" way to do it? Too many domain specific requirements? Can anyone shed any light?
BTW, this is the substring function I came up with after a bit of research.
Edit: I made a few updates based on comments.
void substr (char *outStr, const char *inpStr, int startPos, size_t strLen) {
/* Cannot do anything with NULL. */
if (inpStr == NULL || outStr == NULL) return;
size_t len = strlen (inpStr);
/* Allow negative positions to count from the end; cannot
start before start of string, so force to start. */
if (startPos < 0) {
startPos = (int)len + startPos;
}
if (startPos < 0) {
startPos = 0;
}
/* Cannot start after end of string, so force to end. */
if ((size_t)startPos > len) {
startPos = len;
}
len = strlen (&inpStr[startPos]);
/* Adjust length if source string too short. */
if (strLen > len) {
strLen = len;
}
/* Copy string section */
memcpy(outStr, inpStr+startPos, strLen);
outStr[strLen] = '\0';
}
Edit: Based on a comment from r I also came up with this one liner. You're on your own for checks though!
#define substr(dest, src, startPos, strLen) snprintf(dest, BUFF_SIZE, "%.*s", strLen, src+startPos)

Basic standard library functions don't burden themselves with excessive expensive safety checks, leaving them to the user. Most of the safety checks you carry out in your implementation are of expensive kind: totally unacceptable in such a basic library function. This is C, not Java.
Once you get some checks out of the picture, the "substring" function boils down to an ordinary strlcpy. I.e., ignoring the safety check on startPos, all you need to do is
char *substr(const char *inpStr, char *outStr, size_t startPos, size_t strLen) {
strlcpy(outStr, inpStr + startPos, strLen + 1); /* the size argument counts the terminating '\0' */
return outStr;
}
While strlcpy is not a part of the standard library, it can be crudely replaced by a [misused] strncpy. Again, ignoring the safety check on startPos, all you need to do is
char *substr(const char *inpStr, char *outStr, size_t startPos, size_t strLen) {
strncpy(outStr, inpStr + startPos, strLen);
outStr[strLen] = '\0';
return outStr;
}
Ironically, in your code strncpy is misused in the very same way. On top of that, many of your safety checks are the direct consequence of your choosing a signed type (int) to represent indices, while the proper type would be an unsigned one (size_t).

Perhaps because it's a one-liner:
snprintf(dest, dest_size, "%.*s", sub_len, src+sub_start);
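To make the one-liner above a little safer to call, here is a minimal sketch that wraps it in a function. The name substr_snprintf, its signature, and the clamping rules are assumptions made for illustration; they are not part of the original answer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions): copies up to
   sub_len characters of src, starting at sub_start, into dest.
   Returns the number of characters written, or -1 on bad input. */
int substr_snprintf(char *dest, size_t dest_size, const char *src,
                    size_t sub_start, size_t sub_len)
{
    if (dest == NULL || src == NULL || dest_size == 0)
        return -1;

    size_t src_len = strlen(src);
    if (sub_start > src_len)
        sub_start = src_len;            /* clamp: result is an empty string */
    if (sub_len > src_len - sub_start)
        sub_len = src_len - sub_start;  /* clamp to what is actually there */
    if (sub_len > dest_size - 1)
        sub_len = dest_size - 1;        /* leave room for the terminator */

    /* "%.*s" takes its precision as an int argument, hence the cast.
       This sketch assumes the substring length fits in an int. */
    return snprintf(dest, dest_size, "%.*s", (int)sub_len, src + sub_start);
}
```

Called as substr_snprintf(buf, sizeof buf, "This is a long string", 10, 4), it should leave "long" in buf, provided buf holds at least five bytes.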

You DO have strcpy and strncpy. Aren't they enough for you? With strcpy you can simulate the substring from a character to the end; with strncpy you can simulate the substring from a character for a number of characters (you only need to remember to add the \0 at the end of the string). strncpy is even better than the C# equivalent, because you can overshoot the length of the substring and it won't throw an error (if you have allocated enough space in dest, you can do strncpy(dest, src, 1000) even if src has length 1. In C# you can't.)
As written in the comment, you can even use memcpy, but remember to always add a \0 at the end of the string, and you must know how many characters you are copying (so you must know exactly the length of the src substring). It's also a little more complex to use if one day you want to refactor your code to use wchar_t, and it's not type-safe (because it accepts void* instead of char*). All this in exchange for a little more speed over strncpy.
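As a rough illustration of the strncpy approach described above, the sketch below copies a range and adds the terminator by hand. The name substr_strncpy is hypothetical, and the function assumes the caller guarantees that start is not past the end of src and that dest has room for count + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: copy `count` characters of src, beginning at
   index `start`, into dest. Assumes start <= strlen(src) and that dest
   can hold count + 1 bytes. */
char *substr_strncpy(char *dest, const char *src, size_t start, size_t count)
{
    /* strncpy pads with '\0' when the source is shorter than count, but
       writes no terminator when the source is longer, so add one. */
    strncpy(dest, src + start, count);
    dest[count] = '\0';
    return dest;
}
```

Overshooting is harmless here as long as dest is large enough: asking for 10 characters of "Hi" simply yields "Hi".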

In C you have a function that returns a subset of symbols from a string via pointers: strstr.
char *ptr;
char string1[] = "Hello World";
char string2[] = "World";
ptr = strstr(string1, string2);
ptr will be pointing to the first character of the first occurrence of string2 within string1.
BTW, you did not write a function but a procedure. For the ANSI string functions, see string.h.

Here's a lighter weight version of what you want. Avoids the redundant strlen calls and guarantees null termination on the destination buffer (something strncpy won't do).
void substr(const char* pszSrc, int start, int N, char* pszDst, int lenDest)
{
const char* psz = pszSrc + start;
int x = 0;
if (lenDest <= 0)
{
return;
}
// copy at most N characters, leaving room for the terminator
while ((x < N) && (x < lenDest - 1) && (psz[x] != '\0'))
{
pszDst[x] = psz[x];
x++;
}
// guarantee null termination
pszDst[x] = '\0';
}
Example:
char *pszLongString = "This is a long string";
char szSub[10];
substr(pszLongString, 10, 4, szSub, 10); // copies "long" into szSub and includes the null char
So while there isn't a formal substring function in C, C++ string classes usually have such a method:
#include <string>
...
std::string str;
std::string strSub;
str = "This is a long string";
strSub = str.substr(10, 4); // "long"
printf("%s\n", strSub.c_str());

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const char* substr(const char *string, size_t from, size_t to);
int main(int argc, char *argv[])
{
if (argc < 2)
return 1;
const char *string = argv[1];
const char *substring = substr(string,6,80);
printf("string is [%s] substring is [%s]\n",string,substring ? substring : "(null)");
free((void *)substring);
return 0;
}
const char* substr(const char *string, size_t from, size_t to)
{
if (string == NULL)
return NULL;
size_t len = strlen(string);
/* clamp the end position first, then reject empty or inverted ranges */
if (to > len)
to = len;
if (to <= from)
return NULL;
char *substring = malloc((to - from) + 1);
if (substring == NULL)
return NULL;
size_t index;
for (index = 0; from < to; from++, index++)
substring[index] = string[from];
substring[index] = '\0';
return substring;
}

Related

How to find and replace multiple or all occurrences in C strings

The goal is to replace multiple (or all) occurrences of a given text in another string using only C strings.
(self answered question)
This uses fixed size buffers, you must make sure they are big enough to hold the string after replacement is done.
Define the size before use:
#define LINE_LEN 256
This code was tested with MSVC 2019.
void replaceN(char* line,const char* orig,const char* new, int times){
char* buf;
if(times==0) return; //no replacements left
if((times==-1||--times>0) && (buf = strstr(line,orig))!=NULL){ //find orig
for(const char *c=orig;*c;c++) buf++; //advance buf
replaceN(buf,orig,new,times); //repeat until the last occurrence
}
//this will run first for the last match
if((buf = strstr(line,orig))!=NULL){
char tmp[LINE_LEN];
int i = buf-line; //pointer difference
strncpy(tmp,line,i); //copy everything before the match
for(const char *k=orig;*k;k++) buf++; //buf++; //skip find string
for(const char *k=new;*k;k++) tmp[i++]=*k; //copy replace chars
for(;*buf;buf++) tmp[i++]=*buf; //copy the rest of the string
tmp[i]='\0';
strcpy(line,tmp);
}
}
inline void replace(char* line,const char* orig,const char* new){replaceN(line, orig, new, 1);}
inline void replaceAll(char* line,const char* orig,const char* new){replaceN(line,orig,new,-1);}
Turns out I had too much self-esteem. The code was not tested, and I should not have posted it without proper testing. I add this comment to remind others not to make the same mistake. If you find any other errors, please let me know.
In order to keep it simple, I don't do it in place. Instead it requires a preallocated output buffer. Doing in place is risky if the size of the new string is longer than the original. And there's also an edge case that can be tricky to handle, and that's when the original substring to replace is a substring of the new string.
The headers needed to run all this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>
The main replace function. It replaces at most n occurrences and returns the number of replacements. dest is a buffer big enough to hold the result. All pointers need to be non-NULL and valid. You may notice that I'm using goto, which may be frowned upon, but using it to exit cleanly is very convenient.
size_t replace(char *dest, const char *src, const char *orig,
const char *new, size_t n) {
size_t ret = 0;
// Maybe an unnecessary optimization to avoid multiple calls in
// loop, but it also adds clarity
const size_t newlen = strlen(new);
const size_t origlen = strlen(orig);
if(origlen == 0 || n == 0) goto END; // Edge cases
do {
const char *match = strstr(src, orig);
if(!match) goto END;
// Length of the part of src before first match
const ptrdiff_t offset = match - src;
memcpy(dest, src, offset); // Copy before match
memcpy(dest + offset, new, newlen); // Replace
src += offset + origlen; // Move src past what we have already copied.
dest += offset + newlen; // Advance pointer to dest to the end
ret++;
} while(n > ret);
END:
strcpy(dest, src); // Copy whatever is remaining
return ret;
}
It's easy to write a wrapper for the allocation. We borrow and modify some code from find the count of substring in string
size_t countOccurrences(const char *str, const char *substr) {
if(strlen(substr) == 0) return 0;
size_t count = 0;
const size_t len = strlen(substr);
while((str = strstr(str, substr))) {
count++;
str += len; // We're standing at the match, so we need to advance
}
return count;
}
Then some code to calculate buffer size
size_t calculateBufferLength(const char *src, const char *orig,
const char *new, size_t n) {
const size_t origlen = strlen(orig);
const size_t newlen = strlen(new);
const size_t baselen = strlen(src) + 1;
if(origlen > newlen) return baselen;
const size_t count = countOccurrences(src, orig);
n = n < count ? n : count; // Min of n and count
return baselen +
n * (newlen - origlen);
}
And the final function. It combines allocation and replacement. It returns a pointer to the buffer, and NULL if allocation fails.
char *replaceAndAllocate(const char *src, const char *orig,
const char *new, size_t n) {
const size_t size = calculateBufferLength(src, orig, new, n);
char *buf = malloc(size);
if(buf) replace(buf, src, orig, new, n);
return buf;
}
And finally, a simple main with a few test cases
int main(void) {
puts(replaceAndAllocate("hoho", "ha", "he", SIZE_MAX ));
puts(replaceAndAllocate("", "", "", 5));
puts(replaceAndAllocate("", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "", 5));
puts(replaceAndAllocate("", "", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 5));
puts(replaceAndAllocate("hihihi!!!", "hi", "of", 2));
puts(replaceAndAllocate("!!!hihihi", "hi", "x", 3));
puts(replaceAndAllocate("asdfasdfasdf", "asdf", "x", 2));
puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "y", SIZE_MAX ));
puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "y", 0));
puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "y", 1));
puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "", SIZE_MAX ));
puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "", 3 ));
puts(replaceAndAllocate("!asdf!asdf!asdf!", "asdf", "asdf#asdf", SIZE_MAX));
// Yes, I skipped freeing the buffers to save some space
}
No warnings with -Wall -Wextra -pedantic and the output is:
$ ./a.out
hoho
ofofhi!!!
!!!xxx
xxasdf
yyyyyyyyyyyy
xxxxxxxxxxxx
yxxxxxxxxxxx
xxxxxxxxx
!asdf#asdf!asdf#asdf!asdf#asdf!
Note that I don't have any special functions for replacing one and replacing all. If you really want those, just write wrappers with n=1 or n=SIZE_MAX. Using SIZE_MAX is safe, because a string cannot be bigger than that.
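For readers who do want the two convenience wrappers mentioned above, a minimal sketch might look like the following. The names replaceOne and replaceEvery are hypothetical. The replace function is repeated in compact form (with a while loop instead of goto, and the parameter renamed to repl) only so that the block compiles on its own; dest must still be large enough for the result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compact restatement of replace() from above: replaces at most n
   occurrences of orig in src, writing the result to dest. */
size_t replace(char *dest, const char *src, const char *orig,
               const char *repl, size_t n)
{
    size_t ret = 0;
    const size_t repllen = strlen(repl);
    const size_t origlen = strlen(orig);

    while (origlen != 0 && ret < n) {
        const char *match = strstr(src, orig);
        if (!match)
            break;
        const size_t offset = (size_t)(match - src);
        memcpy(dest, src, offset);            /* copy the text before the match */
        memcpy(dest + offset, repl, repllen); /* write the replacement */
        src += offset + origlen;              /* skip past the match in src */
        dest += offset + repllen;             /* advance to the end of dest */
        ret++;
    }
    strcpy(dest, src);                        /* copy whatever remains */
    return ret;
}

/* Hypothetical wrapper: replace only the first occurrence. */
size_t replaceOne(char *dest, const char *src, const char *orig,
                  const char *repl)
{
    return replace(dest, src, orig, repl, 1);
}

/* Hypothetical wrapper: replace every occurrence. */
size_t replaceEvery(char *dest, const char *src, const char *orig,
                    const char *repl)
{
    return replace(dest, src, orig, repl, SIZE_MAX);
}
```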
Another reason that I got rid of a special function for one replacement is that it was very inefficient. Also, it was easier to write it that way and it is much cleaner.
I changed the code a lot from last time, and that's very much thanks to the awesome help I got at Codereview. You can see how the code was before on the question I posted there: https://codereview.stackexchange.com/q/263785/133688

C replace char in char array

Folks, I need to search through a character array and replace any occurrence of '+', '/', or '=' with "%2B", "%2F", and "%3D" respectively.
base64output variable looks like
FtCPpza+Z0FASDFvfgtoCZg5zRI=
code
char *signature = replace_char(base64output, "+", "%2B");
signature = replace_char(signature, "/", "%2F");
signature = replace_char(signature, "=", "%3B");
char replace_char (char *s, char find, char replace) {
while (*s != 0) {
if (*s == find)
*s = replace;
s++;
}
return s;
}
(Errors out with)
s.c:266: warning: initialization makes pointer from integer without a cast
What am i doing wrong? Thanks!
If the issue is that you have garbage in your signature variable:
void replace_char(...) is incompatible with signature = replace_char(...)
Edit:
Oh I didn't see... This is not going to work since you're trying to replace a char by an array of chars with no memory allocation whatsoever.
You need to allocate a new memory chunk (malloc) big enough to hold the new string, then copy the source 's' to the destination, replacing 'c' by 'replace' when needed.
The prototype should be:
char *replace_char(char *s, char c, char *replace);
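A minimal sketch of that approach, assuming the caller is happy to free the result: it measures the output first, allocates once, then copies. The const qualifiers on the parameters are an addition not present in the prototype above.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of s in which every occurrence of c is
   replaced by the string `replace`. Returns NULL on NULL input or if
   allocation fails. The caller must free() the result. */
char *replace_char(const char *s, char c, const char *replace)
{
    if (s == NULL || replace == NULL)
        return NULL;

    size_t replen = strlen(replace);
    size_t outlen = 0;

    /* First pass: compute the length of the result. */
    for (const char *p = s; *p; p++)
        outlen += (*p == c) ? replen : 1;

    char *out = malloc(outlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, expanding each match. */
    char *q = out;
    for (const char *p = s; *p; p++) {
        if (*p == c) {
            memcpy(q, replace, replen);
            q += replen;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

Because each call returns a fresh buffer, chaining several replacements means freeing the intermediate results along the way.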
1. For char use '' (single quotes); for char* use "" (double quotes).
2. The function does not include the return keyword, therefore it does not return what you'd expect.
3. These webpages have examples on string replacement:
http://www.cplusplus.com/reference/cstring/strstr/
What is the function to replace string in C?
You could go for some length discussing various ways to do this.
Replacing a single char is simple - loop through, if match, replace old with new, etc.
The problem here is that the length of the "new" part is longer than the length of the old one.
One way would be to determine the length of the new string (by counting chars), and either (1) try to do it in place, or (2) allocate a new string.
Here's an idea for #1:
int replace(char *buffer, size_t size, char old, const char *newstring)
{
size_t newlen = strlen(newstring);
char *p, *q;
size_t targetlen = 0;
// First get the final length
//
p = buffer;
while (*p)
{
if (*p == old)
targetlen += newlen;
else
targetlen++;
++p;
}
// Account for null terminator
//
targetlen++;
// Make sure there's enough space
//
if (targetlen > size)
return -1;
// Now we copy characters. We'll start at the end and
// work our way backwards.
//
p = buffer + strlen(buffer);
q = buffer + targetlen;
while (targetlen)
{
if (*p == old)
{
q -= newlen;
memcpy(q, newstring, newlen);
targetlen -= newlen;
--p;
}
else
{
*--q = *p--;
--targetlen;
}
}
return 0;
}
Then you could use it this way (here's a quick test I did):
char buf[4096] = "hello world";
if (replace(buf, sizeof(buf), 'o', "oooo"))
{
fprintf(stderr, "Not enough space\n");
}
else
{
puts(buf);
}
your replace_char signature returns void
void replace_char (char *s, char find, char replace)
But, when the linker tries to resolve the following
signature = replace_char(signature, "=", '%3B');
It doesn't find any function that's called replace_char and returns int (int is the default if there's no prototype).
Change the replace_char function prototype to match the statement.
EDIT:
The warning states that your function returns char, but you use it as a char *
also, your function doesn't return anything, do you need to return something ?
It looks like you don't really understand the code that you're working with.
Fixing errors and warnings without understanding exactly what you need to do is worthless.
fix like this
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
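char *replace_char (char *str, char find, const char *replace) {
/* str is rewritten in place, so its buffer must have room for the longer result */
char *ret=str;
char *wk, *s;
wk = s = strdup(str); /* read from a copy while str is overwritten */
if (wk == NULL)
return NULL;
while (*s != 0) {
if (*s == find){
const char *r = replace; /* restart the replacement text for every match */
while(*r)
*str++ = *r++;
++s;
} else
*str++ = *s++;
}
*str = '\0';
free(wk);
return ret;
}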
int main(void){
char base64output[4096] = "FtCPpza+Z0FASDFvfgtoCZg5zRI=";
char *signature = replace_char(base64output, '+', "%2B");
signature = replace_char(signature, '/', "%2F");
signature = replace_char(signature, '=', "%3D");
printf("%s\n", base64output);
return 0;
}
below is a code that ACTUALLY WORKS !!!!
Ammar Hourani
char * replace_char(char * input, char find, char replace)
{
size_t len = strlen(input);
char * output = malloc(len + 1); /* +1 for the terminating '\0' */
if (output == NULL) return NULL;
for (size_t i = 0; i < len; i++)
{
if (input[i] == find) output[i] = replace;
else output[i] = input[i];
}
output[len] = '\0';
return output;
}

What's wrong with this character buffer code?

For reasons that I promise exist, I'm reading input character by character, and if a character meets certain criteria, I'm writing it into a dynamically allocated buffer. This function adds the specified character to the "end" of the specified string. When reading out of the buffer, I read the first 'size' characters.
void append(char c, char *str, int size)
{
if(size + 1 > strlen(str))
str = (char*)realloc(str,sizeof(char)*(size + 1));
str[size] = c;
}
This function, through various iterations of development has produced such errors as "corrupted double-linked list", "double free or corruption". Below is a sample of how append is supposed to be used:
// buffer is a string
// bufSize is the number of non-garbage characters at the beginning of buffer
char *buft = buffer;
int bufLoc=0;
while((buft-buffer)/sizeof(char) < bufSize)
append(*(buft++),destination,bufLoc++);
It generally works for some seemingly arbitrary number of characters, and then aborts with error. If it's not clear what the second code snippet is doing, it's just copying from the buffer into some destination string. I know there's library methods for this, but I need a bit finer control of what exactly gets copied sometimes.
Thanks in advance for any insight. I'm stumped.
This function does not append a character to a buffer.
void append(char c, char *str, int size)
{
if(size + 1 > strlen(str))
str = realloc(str, size + 1);
str[size] = c;
}
First, what is strlen(str)? You can say "it's the length of str", but that's omitting some very important details. How does it compute the length? Easy -- str must be NUL-terminated, and strlen finds the offset of the first NUL byte in it. If your buffer doesn't have a NUL byte at the end, then you can't use strlen to find its length.
Typically, you will want to keep track of the buffer's length. In order to reduce the number of reallocations, keep track of the buffer size and the amount of data in it separately.
struct buf {
char *buf;
size_t buflen;
size_t bufalloc;
};
void buf_init(struct buf *b)
{
b->buf = NULL;
b->buflen = 0;
b->bufalloc = 0;
}
void buf_append(struct buf *b, int c)
{
if (b->buflen >= b->bufalloc) {
/* grow geometrically so that appends are amortized O(1) */
size_t newalloc = b->bufalloc ? b->bufalloc * 2 : 16;
char *newbuf = realloc(b->buf, newalloc);
if (!newbuf)
abort();
b->buf = newbuf;
b->bufalloc = newalloc;
}
b->buf[b->buflen++] = c;
}
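To show how such a buffer might be used for the "copy only the characters that meet some criterion" case from the question, here is a small self-contained sketch. The struct and the two functions are restated so the block compiles alone. The filter buf_append_letters and its "letters only" criterion are hypothetical, chosen purely for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

struct buf {
    char *buf;       /* heap storage, NULL while empty */
    size_t buflen;   /* bytes in use */
    size_t bufalloc; /* bytes allocated */
};

void buf_init(struct buf *b)
{
    b->buf = NULL;
    b->buflen = 0;
    b->bufalloc = 0;
}

void buf_append(struct buf *b, int c)
{
    if (b->buflen >= b->bufalloc) {
        /* Double the capacity (or start at 16) when the buffer is full. */
        size_t newalloc = b->bufalloc ? b->bufalloc * 2 : 16;
        char *newbuf = realloc(b->buf, newalloc);
        if (!newbuf)
            abort();
        b->buf = newbuf;
        b->bufalloc = newalloc;
    }
    b->buf[b->buflen++] = (char)c;
}

/* Hypothetical filter: append only the alphabetic characters of input. */
void buf_append_letters(struct buf *b, const char *input)
{
    for (; *input; input++) {
        if (isalpha((unsigned char)*input))
            buf_append(b, *input);
    }
}
```

Note that the stored data is not NUL-terminated; the valid bytes are the first buflen of them, and the caller releases the storage with free(b.buf).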
Another problem
This code:
str = realloc(str, size + 1);
It only changes the value of str in append -- it doesn't change the value of str in the calling function. Function arguments are local to the function, and changing them doesn't affect anything outside of the function.
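The difference can be seen in a tiny sketch. Both function names are hypothetical and exist only to contrast assigning to a parameter with assigning through a pointer to the caller's variable.

```c
#include <stddef.h>

/* Assigns to the parameter, which is only a copy of the caller's pointer.
   The caller's variable is left untouched. */
void reset_copy(char *p)
{
    p = NULL;
    (void)p; /* silence "set but not used" warnings */
}

/* Receives the address of the caller's pointer, so the assignment is
   visible to the caller. */
void reset_caller(char **p)
{
    *p = NULL;
}
```

After reset_copy(ptr) the caller's ptr still holds its old value; after reset_caller(&ptr) it is NULL. The same reasoning applies to the result of realloc inside append.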
Minor quibbles
This is a bit strange:
// Weird
x = (char*)realloc(str,sizeof(char)*(size + 1));
The (char *) cast is not only unnecessary, but it can actually mask an error -- if you forget to include <stdlib.h>, the cast will allow the code to compile anyway. Bummer.
And sizeof(char) is 1, by definition. So don't bother.
// Fixed
x = realloc(str, size + 1);
When you do a:
str = (char*)realloc(str,sizeof(char)*(size + 1));
the changes in str will not be reflected in the calling function, in other words the changes are local to the function as the pointer is passed by value. To fix this you can either return the value of str:
char * append(char c, char *str, int size)
{
if(size + 1 > strlen(str))
str = (char*)realloc(str,sizeof(char)*(size + 1));
str[size] = c;
return str;
}
or you can pass the pointer by address:
void append(char c, char **str, int size)
{
if(size + 1 > strlen(*str))
*str = (char*)realloc(*str,sizeof(char)*(size + 1));
(*str)[size] = c;
}

C - Sub String (From POS to POS)

I have a character array of length 32 and would like to take certain characters out of it.
for example
111111000000000000000000111111 <32 chars
I would like to take chars 0-6 which would be 111111
Or even take chars 26-31 which would be 111111
char check_type[32];
Above is how I'm declaring.
What I would like to be able to do is define a function or use a function that takes that starting place, and end character.
I've looked at many ways, like using strncpy and strcpy, but have found no way yet.
I would simply wrap strncpy:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Creates a sub-string of range [start, end], return value must be freed */
char *substr(char *src, size_t start, size_t end)
{
size_t sub_len = end - start + 1;
char * new_str = malloc(sub_len + 1); /* TODO: check malloc's return value */
strncpy(new_str, src + start, sub_len); /* begin copying at the start offset */
new_str[sub_len] = '\0'; /* new_str is of size sub_len + 1 */
return new_str;
}
int main(void)
{
char str[] = "111111000000000000000000111111";
char *sub_str = substr(str, 0, 5);
puts(sub_str);
free(sub_str);
return EXIT_SUCCESS;
}
Output:
111111
Use memcpy.
// Stores s[from..to) in sub.
// The caller is responsible for memory allocation.
void extract_substr(char const *s, char *sub, size_t from, size_t to)
{
size_t sublen = to - from;
memcpy(sub, s + from, sublen);
sub[sublen] = '\0';
}
Sample:
char *substr(char *source, int startpos, int endpos)
{
int len = endpos - startpos + 2; // must account for final 0
int i = 0;
char *src, *dst;
char *ret = calloc(len, sizeof(char));
if (!ret)
return ret;
src = source + startpos;
dst = ret;
while (i++ < len - 1) // copy endpos - startpos + 1 characters, leaving room for the final 0
*dst++ = *src++;
*dst = 0;
return ret;
}
Of course, free the returned pointer when you don't need it anymore. Also note that this function does not check the validity of endpos vs startpos.
First define the required interface...perhaps:
int substring(char *target, size_t tgtlen, const char *source, size_t src_bgn, size_t src_end);
This takes a destination (target) array where the data will be copied, and is given its length. The data will come from the source array, between positions src_bgn and src_end. The return value will be -1 for an error, or otherwise the length of the output (excluding the terminating null). If the target string is too short, you will get an error.
With that set of details in place, you can implement the body fairly easily, and strncpy() might well be appropriate this time (it often isn't).
Usage (based on your question):
char check_type[32] = "111111000000000000000000111111";
char result1[10];
char result2[10];
if (substring(result1, sizeof(result1), check_type, 0, 6) <= 0 ||
substring(result2, sizeof(result2), check_type, 26, 31) <= 0)
...something went wrong...
else
...use result1 and result2...
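One possible body for that interface is sketched below. Several details are assumptions, since the answer leaves them open: the range is treated as half-open, [src_bgn, src_end); an end position past the end of source is clamped rather than rejected; and memcpy is used instead of strncpy because the length is already known.

```c
#include <stddef.h>
#include <string.h>

/* Copies source[src_bgn, src_end) into target, which holds tgtlen bytes.
   Returns the length of the output (excluding the terminating null),
   or -1 on error. Assumption: src_end is exclusive and is clamped to
   the length of source. */
int substring(char *target, size_t tgtlen, const char *source,
              size_t src_bgn, size_t src_end)
{
    if (target == NULL || source == NULL)
        return -1;

    size_t srclen = strlen(source);
    if (src_end > srclen)
        src_end = srclen;       /* clamp the end to the actual data */
    if (src_bgn > src_end)
        return -1;              /* inverted or out-of-range request */

    size_t sublen = src_end - src_bgn;
    if (sublen >= tgtlen)
        return -1;              /* no room for data plus terminator */

    memcpy(target, source + src_bgn, sublen);
    target[sublen] = '\0';
    return (int)sublen;
}
```

With these choices an empty result returns 0, so a caller that wants to accept empty substrings should test for a negative return rather than <= 0.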
Check this:
char* Substring(char *string, int len, int start, int end) {
/*
Creates a substring from a given string.
Args:
string: The string whose substring you need to find.
len: The length of the string.
start: The start position for the substring.
end: The end position of the substring (inclusive).
Returns:
substring: (of type char*) which is allocated on the heap.
NULL: on error.
*/
// Check that the start and end position are valid.
// If not valid, then return NULL.
if (start < 0 || start >= len || end < 0 || end >= len) {
return NULL;
}
// Allocate memory to return the substring on the heap.
char *substring = malloc(sizeof(char) * (end - start + 2));
int index = 0, i;
for (i = start; i <= end; i++) {
substring[index] = string[i];
index++;
}
// End with a null character.
substring[index] = '\0';
return substring;
}
int main() {
char str[] = "11111100000000000000000000111111";
printf("%s\n", Substring(str, strlen(str), 0, 5));
printf("%s\n", Substring(str, strlen(str), 26, 31));
}

Reversing a string in C using pointers?

Language: C
I am trying to write a C function with the signature char *strrev2(const char *string) as part of interview preparation. The closest (working) solution is below; however, I would like an implementation which does not use malloc. Is this possible? Since it returns a char pointer, if I use malloc, the caller would have to call free in another function.
char *strrev2(const char *string){
int l=strlen(string);
char *r=malloc(l+1);
for(int j=0;j<l;j++){
r[j] = string[l-j-1];
}
r[l] = '\0';
return r;
}
[EDIT] I have already written implementations using a buffer and without the char. Thanks tho!
No - you need a malloc.
Other options are:
Modify the string in-place, but since you have a const char * and you aren't allowed to change the function signature, this is not possible here.
Add a parameter so that the user provides a buffer into which the result is written, but again this is not possible without changing the signature (or using globals, which is a really bad idea).
You may do it this way and make the caller responsible for freeing the memory. Or you can allow the caller to pass in an allocated char buffer, so that the allocation and the free are both done by the caller:
void strrev2(const char *string, char* output)
{
// place the reversed string onto 'output' here
}
For caller:
char buffer[100];
char *input = "Hello World";
strrev2(input, buffer);
// the reversed string now in buffer
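The body left as a comment above could be filled in along these lines. This is only a sketch: it assumes output points to at least strlen(string) + 1 writable bytes and does not overlap string.

```c
#include <string.h>

/* Writes the reverse of `string` into `output`.
   Assumes output has room for strlen(string) + 1 bytes and does not
   overlap the input. */
void strrev2(const char *string, char *output)
{
    size_t len = strlen(string);
    for (size_t i = 0; i < len; i++)
        output[i] = string[len - 1 - i]; /* read from the back, write forward */
    output[len] = '\0';
}
```

Since the function cannot see the size of the caller's buffer, a production version would probably take that size as a third parameter.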
You could use a static char[1024]; (1024 is an example size), store all strings used in this buffer and return the memory address which contains each string. The following code snippet may contain bugs but will probably give you the idea.
#include <stdio.h>
#include <string.h>
char* strrev2(const char* str)
{
static char buffer[1024];
static int last_access; //Points to leftmost available byte;
size_t len = strlen(str); //Compute the length once instead of on every iteration
//Check if buffer has enough room for the new string and its terminating 0
if( len < (size_t)(1024 - last_access) )
{
char* return_address = &(buffer[last_access]);
size_t i;
for( i = 0; i < len ; ++i )
{
buffer[last_access++] = str[len - 1 - i];
}
buffer[last_access] = 0;
++last_access;
return return_address;
}else
{
return 0;
}
}
int main()
{
char* test1 = "This is a test String";
char* test2 = "George!";
puts(strrev2(test1));
puts(strrev2(test2));
return 0 ;
}
reverse string in place
char *reverse (char *str)
{
register char c, *begin, *end;
begin = end = str;
while (*end != '\0') end ++;
while (begin < --end)
{
c = *begin;
*begin++ = *end;
*end = c;
}
return str;
}
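One caveat for any in-place version: the argument must be a writable array, not a string literal. Below is an index-based sketch of the same idea under the hypothetical name reverse_in_place; it returns early for strings shorter than two characters, so it never forms a pointer before the start of the buffer.

```c
#include <string.h>

/* Reverses str in place and returns it. str must be writable. */
char *reverse_in_place(char *str)
{
    size_t len = strlen(str);
    if (len < 2)
        return str; /* empty and one-character strings are already reversed */

    /* Swap characters from both ends, moving toward the middle. */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char c = str[i];
        str[i] = str[j];
        str[j] = c;
    }
    return str;
}
```

A caller would declare the string as an array, for example char s[] = "George!";, before passing it in.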

Resources