I've known some ways for handling a failure of a malloc() and I prefer the way to use a result of a malloc() like some example code below. But actually I don't know well how this phrase is actually work. Can anybody give me some advice about using this phrase ?
"if ((result = malloc(sizeof(char) * (strlen(s1) + strlen(s2) + 1))) != 0)"
char strjoin(char const *s1, char const *s2)
{
char *result;
int idx;
if ((result = malloc(sizeof(char) * (strlen(s1) + strlen(s2) + 1))) != 0)
{
idx = 0;
while (*s1)
result[idx++] = *s1++;
while (*s2)
result[idx++] = *s2++;
result[idx] = 0;
return (result);
}
return (0);
}
Found some example codes i.e. using same phrase
char strjoin... is a bug which prevents this code from compiling. It should obviously be char*.
malloc returns a pointer if successful, otherwise a null pointer. result gets assigned either of these and evaluate to not zero or zero accordingly.
Please note that assignment inside an if condition is regarded as poor and dangerous practice, so it should be avoided.
sizeof(char) * ... is nonsense. The very definition of sizeof is that sizeof(char) is always 1. Multiplying something with a constant guaranteed to be 1 is just useless bloat.
strlen(s1) + strlen(s2) + 1 sums up the length of two strings and then + 1 to make room for the null terminator.
Note that lines like result[idx++] = *s1++; have no less than 3 side effects in a single line, which is very bad practice. It is recommended to only have 1 side effect in an expression, to prevent sequencing/order of evaluation bugs.
Copying characters 1 by 1 is naive, memcpy is likely much faster than that.
There is no need for idx since we already know the length of each string from previous strlen calls. The programmer should have saved it down at that point. Instead they are making the program needlessly slow.
Consider a complete rewrite of this questionable code into something more readable and efficient:
#include <stdlib.h>
#include <string.h>
char* strjoin (const char *s1, const char* s2)
{
size_t length1 = strlen(s1);
size_t length2 = strlen(s2);
char* result = malloc(length1 + length2 + 1);
if(result == NULL)
return NULL;
memcpy(result, s1, length1);
memcpy(result+length1, s2, length2);
result[length1+length2] = '\0';
return result;
}
Related
I have just coded splitting string into words.
if char *cmd = "Hello world baby", then argv[0] = "Hello", argv[1] = "world", argv[2] = "baby".
strdup function cannot be used, and I want to implement this using malloc and strcpy.
my code is below.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define buf_size 128
int main() {
char *argv[16];
memset(argv, 0, sizeof(argv));
int words = 0;
char *cmd = "Hello world baby";
unsigned int len = strlen(cmd);
int start = 0;
for(unsigned int i = 0; i <= len; i++){
if(cmd[i] == ' ' | cmd[i] == '\0'){
++words;
char *w = (char *)malloc(sizeof(char)*(i-start) + 1);
strcpy(w, cmd + start);
w[i-start] = '\0';
argv[i] = w;
start = i + 1;
}
}
for(int i = 0; i < words; i++){
printf("%s\n", argv[i]);
free(argv[i]);
}
return 0;
}
I hoped that the printf function produces:
Hello
world
baby
However, when the printf() function is reached, the program triggers a segmentation fault.
Your primary problem, despite the banter in the comments about how to write your own version of strdup(), is that you really need strndup(). Eh?
You have the line:
strcpy(w, cmd + start);
Unfortunately, this copies the whole string from cmd + start into the allocated space, but you only wanted to copy (i - start) + 1 bytes including the null byte, because that's all the space you allocated. So, you have a buffer overflow (but not a stack overflow).
POSIX provides the function strndup()
with the signature:
extern char *strndup(const char *s, size_t size);
This allocates at most size + 1 bytes and copies at most size bytes from s and a null byte into the allocated space. You'd use:
argv[i] = strndup(cmd + start, i - start);
to get the required result. If you don't have (or can't use) strndup(), you can write your own. That's easiest if you have strnlen(), but you can write your own version of that if necessary (you don't have it or can't use it):
char *my_strndup(const char *s, size_t len)
{
size_t nbytes = strnlen(s, len);
char *result = malloc(nbytes + 1);
if (result != NULL)
{
memmove(result, s, nbytes);
result[nbytes] = '\0';
}
return result;
}
This deals with the situation where the actual string is shorter than the maximum length it can be by using the size from strnlen(). It's not clear that you are guaranteed to be able to access the memory at s + nbytes - 1, so simply allocating for the maximum size is not appropriate.
Implementing strnlen():
size_t my_strnlen(const char *s, size_t size)
{
size_t count = 0;
while (count < size && *s++ != '\0')
count++;
return count;
}
"Official" versions of this are probably implemented in assembler and are more efficient, but I think that's a valid implementation in pure C.
Another alternative in your code is to use the knowledge of the length:
char *w = (char *)malloc(sizeof(char)*(i - start + 1));
memmove(w, cmd + start, i - start);
w[i-start] = '\0';
argv[i] = w;
start = i + 1;
I note in passing that multiplying by sizeof(char) is a no-op since sizeof(char) == 1 by definition. You should include the + 1 in the multiplication in general (as I've reparenthesized the expression). If you were dealing with some structure and wanted N + 1 structures, you need to use (N + 1) * sizeof(struct WhatNot) and not N * sizeof(struct WhatNot) + 1. It's a good idea to head off bugs caused by sloppy coding practices while you're learning, even though there's no difference in the result here.
There are those who excoriate the cast on the result of malloc(). I'm not one of them: I learned to program on a pre-standard C system where the cast was crucial because the char * address of an object was different from the address of the same memory location when referenced via a pointer to a type bigger than a char. That is, the short * address and char * address for the same memory location had different bit patterns. Not casting the result of malloc() led to crashes. (I observe that the primary excuse given for rejecting the cast is "it may hide errors if malloc() is not declared". That excuse went by the wayside when C99 mandated that functions must be declared before being used.)
Warning: no compiler was consulted about the validity of any of the code shown in this answer. Nor was the sanity of the overall algorithm tested.
You have:
if(cmd[i] == ' ' | cmd[i] == '\0'){
That | should be ||.
For some functions for string manipulation, I try to rewrite the function output onto the original string. I came up with the general scheme of
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *char_repeater(char *str, char ch)
{
int tmp_len = strlen(str) + 1; // initial size of tmp
char *tmp = (char *)malloc(tmp_len); // initial size of tmp
// the process is normally too complicated to calculate the final length here
int j = 0;
for (int i = 0; i < strlen(str); i++)
{
tmp[j] = str[i];
j++;
if (str[i] == ch)
{
tmp[j] = str[i];
j++;
}
if (j > tmp_len)
{
tmp_len *= 2; // growth factor
tmp = realloc(tmp, tmp_len);
}
}
tmp[j] = 0;
char *output = (char *)malloc(strlen(tmp) + 1);
// output matching the final string length
strncpy(output, tmp, strlen(tmp));
output[strlen(tmp)] = 0;
free(tmp); // Is it necessary?
return output;
}
int main()
{
char *str = "This is a test";
str = char_repeater(str, 'i');
puts(str);
free(str);
return 0;
}
Although it works on simple tests, I am not sure if I am on the right track.
Is this approach safe overall?
Of course, we do not re-write the string. We simply write new data (array of the characters) at the same pointer. If output is longer than str, it will rewrite the data previously written at str, but if output is shorter, the old data remains, and we would have a memory leak. How can we free(str) within the function before outputting to its pointer?
A pair of pointers can be used to iterate through the string.
When a matching character is found, increment the length.
Allocate output as needed.
Iterate through the string again and assign the characters.
This could be done in place if str was malloced in main.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *char_repeater(char *str, char ch)
{
int tmp_len = strlen(str) + 1; // initial size of tmp
char *find = str;
while ( *find) // not at terminating zero
{
if ( *find == ch) // match
{
tmp_len++; // add one
}
++find; // advance pointer
}
char *output = NULL;
if ( NULL == ( output = malloc(tmp_len)))
{
fprintf ( stderr, "malloc peoblem\n");
exit ( 1);
}
// output matching the final string length
char *store = output; // to advance through output
find = str; // reset pointer
while ( *find) // not at terminating zero
{
*store = *find; // assign
if ( *find == ch) // match
{
++store; // advance pointer
*store = ch; // assign
}
++store; // advance pointer
++find;
}
*store = 0; // terminate
return output;
}
int main()
{
char *str = "This is a test";
str = char_repeater(str, 'i');
puts(str);
free(str);
return 0;
}
For starters the function should be declared like
char * char_repeater( const char *s, char c );
because the function does not change the passed string.
Your function is unsafe and inefficient at least because there are many dynamic memory allocations. You need to check that each dynamic memory allocation was successful. Also there are called the function strlen also too ofhen.
Also this code snippet
tmp[j] = str[i];
j++;
if (str[i] == ch)
{
tmp[j] = str[i];
j++;
}
if (j > tmp_len)
//...
can invoke undefined behavior. Imagine that the source string contains only one letter 'i'. In this case the variable tmp_len is equal to 2. So temp[0] will be equal to 'i' and temp[1] also will be equal to 'i'. In this case j equal to 2 will not be greater than tmp_len. As a result this statement
tmp[j] = 0;
will write outside the allocated memory.
And it is a bad idea to reassign the pointer str
char *str = "This is a test";
str = char_repeater(str, 'i');
As for your question whether you need to free the dynamically allocated array tmp
free(tmp); // Is it necessary?
then of course you need to free it because you allocated a new array for the result string
char *output = (char *)malloc(strlen(tmp) + 1);
And as for your another question
but if output is shorter, the old data remains, and we would have a
memory leak. How can we free(str) within the function before
outputting to its pointer?
then it does not make a sense. The function creates a new character array dynamically that you need to free and the address of the allocated array is assigned to the pointer str in main that as I already mentioned is not a good idea.
You need at first count the length of the result array that will contain duplicated characters and after that allocate memory only one time.
Here is a demonstration program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char * char_repeater( const char *s, char c )
{
size_t n = 0;
for ( const char *p = s; ( p = strchr( p, c ) ) != NULL; ++p )
{
++n;
}
char *result = malloc( strlen( s ) + 1 + n );
if ( result != NULL )
{
if ( n == 0 )
{
strcpy( result, s );
}
else
{
char *p = result;
do
{
*p++ = *s;
if (*s == c ) *p++ = c;
} while ( *s++ );
}
}
return result;
}
int main( void )
{
const char *s = "This is a test";
puts( s );
char *result = char_repeater( s, 'i' );
if ( result != NULL ) puts( result );
free( result );
}
The program output is
This is a test
Thiis iis a test
My kneejerk reaction is to dislike the design. But I have reasons.
First, realloc() is actually quite efficient. If you are just allocating a few extra bytes every loop, then chances are that the standard library implementation simply increases the internal bytecount value associated with your memory. Caveats are:
Interleaving memory management.Your function here doesn’t have any, but should you start calling other routines then keeping track of all that becomes an issue. Anything that calls other memory management routines can lead to the next problem:
Fragmented memory.If at any time the available block is too small for your new request, then a much more expensive operation to obtain more memory and copy everything over becomes an issue.
Algorithmic issues are:
Mixing memory management in increases the complexity of your code.
Every occurrence of c invokes a function call with potential to be expensive. You cannot control when it is expensive and when it is not.
Worst-case options (char_repeater( "aaaaaaaaaa", 'a' )) trigger worst-case potentialities.
My recommendation is to simply make two passes.
This passes several smell tests:
Algorithmic complexity is broken down into two simpler parts:
counting space required, and
allocating and copying.
Worst-case scenarios for allocation/reallocation are reduced to a single call to malloc().
Issues with very large strings are reduced:
You need at most space for 2 large strings (not 3, possibly repeated)
Page fault / cache boundary issues are similar (or the same) for both methods
Considering there are no real downsides to using a two-pass approach, I think that using a simpler algorithm is reasonable. Here’s code:
#include <stdio.h>
#include <stdlib.h>
char * char_repeater( const char * s, char c )
{
// FIRST PASS
// (1) count occurances of c in s
size_t number_of_c = 0;
const char * p = s;
while (*p) number_of_c += (*p++ == c);
// (2) get strlen s
size_t length_of_s = p - s;
// SECOND PASS
// (3) allocate space for the resulting string
char * dest = malloc( length_of_s + number_of_c + 1 );
// (4) copy s -> dest, duplicating every occurance of c
if (dest)
{
char * d = dest;
while (*s)
if ((*d++ = *s++) == c)
*d++ = c;
*d = '\0';
}
return dest;
}
int main(void)
{
char * s = char_repeater( "Hello world!", 'o' );
puts( s );
free( s );
return 0;
}
As always, know your data
Whether or not a two-pass approach actually is better than a realloc() approach depends on more factors than what is evident in a posting on the internet.
Nevertheless, I would wager that for general purpose strings that this is a better choice.
But, even if it isn’t, I would argue that a simpler algorithm, splitting tasks into trivial sub-tasks, is far easier to read and maintain. You should only start making tricky algorithms only if you have use-case profiling saying you need to spend more attention on it.
Without that, readability and maintainability trumps all other concerns.
For reasons that I promise exist, I'm reading input character by character, and if a character meets certain criteria, I'm writing it into a dynamically allocated buffer. This function adds the specified character to the "end" of the specified string. When reading out of the buffer, I read the first 'size' characters.
void append(char c, char *str, int size)
{
if(size + 1 > strlen(str))
str = (char*)realloc(str,sizeof(char)*(size + 1));
str[size] = c;
}
This function, through various iterations of development has produced such errors as "corrupted double-linked list", "double free or corruption". Below is a sample of how append is supposed to be used:
// buffer is a string
// bufSize is the number of non-garbage characters at the beginning of buffer
char *buft = buffer;
int bufLoc=0;
while((buft-buffer)/sizeof(char) < bufSize)
append(*(buft==),destination,bufLoc++);
It generally works for some seemingly arbitrary number of characters, and then aborts with error. If it's not clear what the second code snippet is doing, it's just copying from the buffer into some destination string. I know there's library methods for this, but I need a bit finer control of what exactly gets copied sometimes.
Thanks in advance for any insight. I'm stumped.
This function does not append a character to a buffer.
void append(char c, char *str, int size)
{
if(size + 1 > strlen(str))
str = realloc(str, size + 1);
str[size] = c;
}
First, what is strlen(str)? You can say "it's the length of str", but that's omitting some very important details. How does it compute the length? Easy -- str must be NUL-terminated, and strlen finds the offset of the first NUL byte in it. If your buffer doesn't have a NUL byte at the end, then you can't use strlen to find its length.
Typically, you will want to keep track of the buffer's length. In order to reduce the number of reallocations, keep track of the buffer size and the amount of data in it separately.
struct buf {
char *buf;
size_t buflen;
size_t bufalloc;
};
void buf_init(struct buf *b)
{
buf->buf = NULL;
buf->buflen = 0;
buf->bufalloc = 0;
}
void buf_append(struct buf *b, int c)
{
if (buf->buflen >= buf->bufalloc) {
size_t newalloc = buf->bufalloc ? buf->bufalloc * 2 : 16;
char *newbuf = realloc(buf->buf, newalloc);
if (!newbuf)
abort();
buf->buf = newbuf;
buf->bufalloc = newalloc;
}
buf->buf[buf->buflen++] = c;
}
Another problem
This code:
str = realloc(str, size + 1);
It only changes the value of str in append -- it doesn't change the value of str in the calling function. Function arguments are local to the function, and changing them doesn't affect anything outside of the function.
Minor quibbles
This is a bit strange:
// Weird
x = (char*)realloc(str,sizeof(char)*(size + 1));
The (char *) cast is not only unnecessary, but it can actually mask an error -- if you forget to include <stdlib.h>, the cast will allow the code to compile anyway. Bummer.
And sizeof(char) is 1, by definition. So don't bother.
// Fixed
x = realloc(str, size + 1);
When you do a:
str = (char*)realloc(str,sizeof(char)*(size + 1));
the changes in str will not be reflected in the calling function, in other words the changes are local to the function as the pointer is passed by value. To fix this you can either return the value of str:
char * append(char c, char *str, int size)
{
if(size + 1 > strlen(str))
str = (char*)realloc(str,sizeof(char)*(size + 1));
str[size] = c;
return str;
}
or you can pass the pointer by address:
void append(char c, char **str, int size)
{
if(size + 1 > strlen(str))
*str = (char*)realloc(*str,sizeof(char)*(size + 1));
(*str)[size] = c;
}
I know C is purposefully bare-bones, but I'm curious as to why something as commonplace as a substring function is not included in <string.h>.
Is it that there is not one "right enough" way to do it? Too many domain specific requirements? Can anyone shed any light?
BTW, this is the substring function I came up with after a bit of research.
Edit: I made a few updates based on comments.
void substr (char *outStr, const char *inpStr, int startPos, size_t strLen) {
/* Cannot do anything with NULL. */
if (inpStr == NULL || outStr == NULL) return;
size_t len = strlen (inpStr);
/* All negative positions to go from end, and cannot
start before start of string, force to start. */
if (startPos < 0) {
startPos = len + startPos;
}
if (startPos < 0) {
startPos = 0;
}
/* Force negative lengths to zero and cannot
start after end of string, force to end. */
if ((size_t)startPos > len) {
startPos = len;
}
len = strlen (&inpStr[startPos]);
/* Adjust length if source string too short. */
if (strLen > len) {
strLen = len;
}
/* Copy string section */
memcpy(outStr, inpStr+startPos, strLen);
outStr[strLen] = '\0';
}
Edit: Based on a comment from r I also came up with this one liner. You're on your own for checks though!
#define substr(dest, src, startPos, strLen) snprintf(dest, BUFF_SIZE, "%.*s", strLen, src+startPos)
Basic standard library functions don't burden themselves with excessive expensive safety checks, leaving them to the user. Most of the safety checks you carry out in your implementation are of expensive kind: totally unacceptable in such a basic library function. This is C, not Java.
Once you get some checks out of the picture, the "substrung" function boils down to ordinary strlcpy. I.e ignoring the safety check on startPos, all you need to do is
char *substr(const char *inpStr, char *outStr, size_t startPos, size_t strLen) {
strlcpy(outStr, inpStr + startPos, strLen);
return outStr;
}
While strlcpy is not a part of the standard library, but it can be crudely replaced by a [misused] strncpy. Again, ignoring the safety check on startPos, all you need to do is
char *substr(const char *inpStr, char *outStr, size_t startPos, size_t strLen) {
strncpy(outStr, inpStr + startPos, strLen);
outStr[strLen] = '\0';
return outStr;
}
Ironically, in your code strncpy is misused in the very same way. On top of that, many of your safety checks are the direct consequence of your choosing a signed type (int) to represent indices, while proper type would be an unsigned one (size_t).
Perhaps because it's a one-liner:
snprintf(dest, dest_size, "%.*s", sub_len, src+sub_start);
You DO have strcpy and strncpy. Aren't enough for you? With strcpy you can simulate the substring from character to end, with strncpy you can simulate the substring from character for a number of characters (you only need to remember to add the \0 at the end of the string). strncpy is even better than the C# equivalent, because you can overshoot the length of the substring and it won't throw an error (if you have allocated enough space in dest, you can do strncpy(dest, src, 1000) even if src is long 1. In C# you can't.)
As written in the comment, you can even use memcpy, but remember to always add a \0 at the end of the string, and you must know how many characters you are copying (so you must know exactly the length of the src substring) AND it's a little more complex to use if a day you want to refactor your code to use wchar_t AND it's not type-safe (because it accepts void* instead of char*). All this in exchange for a little more speed over strncpy
In C you have a function that returns a subset of symbols from a string via pointers: strstr.
char *ptr;
char string1[] = "Hello World";
char string2[] = "World";
ptr = strstr(string1, string2)
*ptr will be pointing to the first character occurrence.
BTW you did not write a function but a procedure, ANSI string functions: string.h
Here's a lighter weight version of what you want. Avoids the redundant strlen calls and guarantees null termination on the destination buffer (something strncpy won't do).
void substr(char* pszSrc, int start, int N, char* pszDst, int lenDest)
{
const char* psz = pszSrc + start;
int x = 0;
while ((x < N) && (x < lenDest))
{
char ch = psz[x];
pszDst[x] = ch;
x++;
if (ch == '\0')
{
return;
}
}
// guarantee null termination
if (x > 0)
{
pszDest[x-1] = 0;
}
}
Example:
char *pszLongString = "This is a long string";
char szSub[10];
substr(pszLongString, 0, 4, szSub, 10); // copies "long" into szSub and includes the null char
So while there isn't a formal substring function in C, C++ string classes usually have such a method:
#include <string>
...
std::string str;
std::string strSub;
str = "This is a long string";
strSub = str.substr(10, 4); // "long"
printf("%s\n", strSub.c_str());
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const char* substr(const char *string, size_t from, size_t to);
int main(int argc, char *argv[])
{
char *string = argv[1];
const char *substring = substr(string,6,80);
printf("string is [%s] substring is [%s]\n",string,substring);
return 0;
}
const char* substr(const char *string, size_t from, size_t to)
{
if (to <= from)
return NULL;
if (from >= to)
return NULL;
if (string == NULL)
return NULL;
if (strlen(string) == 0)
return NULL;
if (from < 0)
from = 0;
if (to > strlen(string))
to = strlen(string);
char *substring = malloc(sizeof(char) * ((to-from)+1));
size_t index;
for (index = 0; from < to; from++, index++)
substring[index] = string[from];
substring[index] = '\0';
return substring;
}
I need to construct a path to a file from two strings. I could use this (not tested, though):
/* DON'T USE THIS CODE! */
/* cmp means component */
char *path_cmp1 = "/Users/john/";
char *path_cmp2 = "foo/bar.txt";
unsigned len = strlen(path_cmp1);
char *path = path_cmp1;
for (int i = 0; i < strlen(path_cmp2); i++) {
path[len + i] = path_cmp2[i];
}
but this could lead to memory corruption I guess. Is there a better way to do this, or is there a function for this in the standard library?
#include <stdlib.h>
#include <string.h>
char *join(const char* s1, const char* s2)
{
char* result = malloc(strlen(s1) + strlen(s2) + 1);
if (result) // thanks #pmg
{
strcpy(result, s1);
strcat(result, s2);
}
return result;
}
This is simple enough to be written in place, especially when you have multiple strings to concatenate.
Note that these functions return their destination argument, so you can write
char* result = malloc(strlen(s1) + strlen(s2) + 1);
assert(result);
strcat(strcpy(result, s1), s2);
but this is less readable.
#include <stdio.h>
char *a = "hello ";
char *b = "goodbye";
char *joined;
asprintf(&joined, "%s%s", a, b)
There are several problems in this code:
1 - calling strlen on the for loop is a bad idea, it will calculate the string length every iteration, so it is better to call it once before the loop and keep the result in a variable.
2 - The same strlen problem applies to strlen(path_cmp1) inside the loop, call it before the loop and increments its size.
In the end, it is better to simple copy both strings and store those on a dynamic allocated string, like:
char *join_strings(const char* s1, const char* s2)
{
size_t lens1 = strlen(s1);
size_t lens2 = strlen(s2);
//plus 1 for \0
char *result = malloc(lens1 + lens2 + 1);
if(result)
{
memcpy(result, s1, lens1);
memcpy(result+lens1, s2, lens2+1);
}
//do not forget to call free when do not need it anymore
return result;
}
There are strcat and strncat for that.
char *path_cmp1 = "/Users/john/";
char *path_cmp2 = "foo/bar.txt";
int firstLength = strlen(path_cmp1);
int secondLength = strlen(path_cmp2);
char *both = malloc(firstLength+secondLength+1);
memcpy(both, path_cmp1, firstLength);
memcpy(both+firstLength, path_cmp2, secondLength+1);
// this +1 copyes the second string's null-terminator too.
create a new string with the length of both inputs and strcpy/strcat the inputs and don't forget the null terminator.
Use strcat. (You are right that your code will lead to memory corruption.)
How about strcat in string.h?
path is just a pointer to path_cmp1 and you are trying to access beyond the end of the array. Very occasionally this will work, but in the vast majority of cases you will cause memory corruption.
As others have pointed out use strcat to concatenate strings.