Raw Heap String - c

I'm trying to wrap my head around the concept of heap-management in C. The following code compiles w/o warnings and runs w/o errors.
char* string_make(char* text) {
size_t len = strlen(text) + 1;
char* str = malloc(len);
memcpy(str, text, len);
return str;
}
char* string_concat(char* x, char* y) {
size_t len_x = strlen(x);
size_t len_y = strlen(y);
x = realloc(x, len_x + len_y + 1);
memcpy(x+len_x, y, len_y+1);
return x;
}
int main (int argc, char const *argv[]) {
char* first = string_make("funny");
char* second = string_make(" duck");
char* third = string_make("! c++");
//
printf("%s\n", string_concat(first, second));
printf("%s\n", string_concat(first, third));
//
free(first);
free(second);
free(third);
}
I have a couple of questions:
Is string_make() doing anything illegal or undefined?
Is string_concat() doing anything illegal or undefined?
I just want to create a simple heap string, which could be increased/decreased at different stages of program as per requirements.
Thanks.
EDIT:
If I change string_concat() calls in main to the following:
first = string_concat(first, second);
first = string_concat(first, third);
would it make things legit?

string_make is fine.
string_concat is not, at least not the way main uses it. It reallocates x, which may mean that a bigger chunk of memory is allocated elsewhere and the original chunk is freed. However, arguments in C are passed by value, so the function cannot change the caller's pointer: when string_concat returns, first in main possibly points to memory that has already been freed, unless main stores the returned pointer.

Ignoring error checking (after malloc and realloc), all you need to do is replace these two lines in main:
printf("%s\n", string_concat(first, second));
printf("%s\n", string_concat(first, third));
with these lines
first = string_concat(first, second);
printf("%s\n", first);
first = string_concat(first, third);
printf("%s\n", first);
The reason is that arguments are passed by value in C. So updating x in the string_concat does not update variable first in main. So the code needs to update first using the return value from the function.
Now, you may be confused because you tested your code and it seemed to work. That's because your final string was only 16 bytes, including the NUL terminator. Most modern implementations of malloc will round up the size to a multiple of 16 (or some larger power of 2). That means that all three calls to string_make returned pointers to memory regions of 16 bytes, even though you requested less. And that also means that realloc can expand the buffer without moving it. As the man page explains
If there is not enough room to enlarge the memory allocation
pointed to by ptr, realloc() creates a new allocation, copies as much
of the old data pointed to by ptr as will fit to the new allocation,
frees the old allocation, and returns a pointer to the allocated
memory.
In your case, there was enough room to enlarge the memory allocation, so realloc returned the same pointer that was passed in. And as a result, your code would appear to work even though it has a serious flaw.
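To make the "assign the return value back" rule concrete, here is a minimal sketch of the two functions with the error checking that was skipped above. Returning NULL on failure while leaving the original string untouched is an assumption of this sketch, not something the original code did.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of text, or NULL if malloc fails. */
char *string_make(const char *text) {
    assert(text != NULL);
    size_t len = strlen(text) + 1;          /* +1 for the terminating NUL */
    char *str = malloc(len);
    if (str != NULL) {
        memcpy(str, text, len);
    }
    return str;
}

/* Appends y to the heap string x.
 * On success, returns the (possibly moved) string; the old x must not be used.
 * On failure, returns NULL and x is still valid and unchanged. */
char *string_concat(char *x, const char *y) {
    assert(x != NULL && y != NULL);
    size_t len_x = strlen(x);
    size_t len_y = strlen(y);
    char *grown = realloc(x, len_x + len_y + 1);
    if (grown == NULL) {
        return NULL;                         /* x was not freed */
    }
    memcpy(grown + len_x, y, len_y + 1);     /* copies y's NUL as well */
    return grown;
}
```

A caller would write something like char *tmp = string_concat(first, second); and only overwrite first with tmp when tmp is not NULL, so the original string can still be freed if the allocation fails.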

string_concat(): when using realloc(), I would pass the address of the pointer to the string being reallocated and return the reallocated string. That way the caller's pointer is updated inside the function and never points at memory that may have been freed. If realloc() returns NULL, the function returns a pointer to the original, unchanged string.
char* string_concat(char** x, char* y) {
size_t len_x = strlen(*x);
size_t len_y = strlen(y);
char *temp = realloc(*x, len_x + len_y + 1);
if(!temp) return *x; //or do some more appropriate error handling
memcpy(temp+len_x, y, len_y+1);
*x = temp;
return *x;
}
and
printf("%s\n", string_concat(&first, second));
printf("%s\n", string_concat(&first, third));
free(first);
free(second);
free(third);
It is well explained here.

Related

How do I modify the contents of a string literal without using brackets in C?

Disclaimer: this is for a homework assigment.
Say I have a string that was declared like this:
char *string1;
For part of my program, I need to set string1 equal to another string, string2. I can't use strcpy or use brackets.
This is my code so far:
int i;
for(i = 0; *(string2 + i) != '\0'; i++){
*(string1 + i) = *(string2 + i);
}
This causes a segmentation fault.
According to https://www.geeksforgeeks.org/storage-for-strings-in-c/ , this is because string1 was declared like this: char *string1 and a workaround to avoid segfaults is to use brackets. I can't use brackets, so is there any workaround that I can do?
EDIT: I am also prohibited from allocating more memory or declaring arrays. I can't use malloc(), calloc() etc.
The issue you are having is that string1 does not have memory allocated to it.
Your code is missing some details, but I'll assume it looks something like this:
#include <stdio.h>
int main()
{
char *originalStr = "Hello NewArsenic";
char *newStr;
// newStr is uninitialized at this point, so passing it to printf is undefined
// behavior: it might print (null), print garbage, or crash.
printf("Original: %s\nNew: %s\n", originalStr, newStr);
int i;
for (i = 0; *(originalStr + i) != '\0'; i++)
{
*(newStr + i) = *(originalStr + i);
}
printf("Original: %s\nNew: %s\n", originalStr, newStr);
return 0;
}
TL;DR Your Issue
Your issue here is that you are attempting to store some values into newStr without having the memory to do so.
Solution
Use malloc.
#include <stdio.h>
#include <stdlib.h> // malloc(size_t) is in stdlib.h
#include <string.h> // strlen(const char *) is in string.h
int main()
{
char *originalStr = "Hello NewArsenic";
// Note here that size_t is preferable to int for length.
// Generally you want to be using size_t if you are working with size/length.
// More info at https://stackoverflow.com/questions/19732319/difference-between-size-t-and-unsigned-int
size_t originalLength = strlen(originalStr);
// This is malloc's typical usage, where we are asking from the system to
// give us originalLength + 1 many chars.
// The `char` here is redundant, actually, since sizeof(char) is defined to
// be one by the C spec, but you might find it useful to see the typical
// usage of `malloc`.
// malloc returns a void *, which C converts to char * implicitly, so no cast is needed.
char *newStr = malloc((originalLength + 1) * sizeof(char));
if (newStr == NULL) return 1; // allocation failed
// newStr has no contents yet, so only print the original for now.
printf("Original: %s\n", originalStr);
size_t i;
for (i = 0; *(originalStr + i) != '\0'; i++)
{
*(newStr + i) = *(originalStr + i);
}
// Don't forget to append a null character like I did before editing!
*(newStr + originalLength) = 0;
printf("Original: %s\nNew: %s\n", originalStr, newStr);
// Because `malloc` gives us memory on the heap, it stays allocated until we
// release it, so we free it once we are done with it.
free(newStr);
return 0;
}
The long answer
What is a C String?
In C, a string is merely an array of characters terminated by a null character ('\0'). This means that you need memory for every character you want to store, plus one byte for the terminator.
Memory
In C, there are two types of memory allocation - stack- and heap-based.
Stack Memory
You're probably more familiar with stack-based memory than you think. Whenever you declare a local variable inside a function, you're defining it on the stack. Arrays declared inside a function with bracket notation, type array[size], are stack-based too. What's specific about stack-based memory is that it only lasts as long as the function in which it was declared is running. This means that you don't have to worry about your memory sticking around for longer than it should.
Heap Memory
Now heap-based memory allocation is different in the sense that it will persist until it is cleared. This is advantageous in one way:
You can keep values of which you don't know the size at compile time.
But, that comes at a cost:
The heap is slower
You have to manually clear your memory once you're done with it.
For more info, check out this thread.
We typically use the function (void *) malloc(size_t) and its sister (void *) calloc(size_t, size_t) for allocating heap memory. To free the memory that we asked for from the system, use free(void *).
Alternatives
You could've also used newStr = originalStr, but that would not actually copy the string, but only make newStr point to originalStr, which I'm sure you're aware of.
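The difference between aliasing and copying can be sketched as follows. The helper name copy_no_brackets is made up for this illustration, and it assumes the destination already has enough room for the whole string, including the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src (including its NUL terminator) into dst using only pointer
 * dereferencing, no brackets. Assumes dst has room for the whole string.
 * Returns dst. */
char *copy_no_brackets(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    char *out = dst;
    while ((*out = *src) != '\0') {   /* copy one char, stop after the NUL */
        out++;
        src++;
    }
    return dst;
}
```

After an assignment such as alias = originalStr, both pointers hold the same address, so there is still only one string. After copy_no_brackets, the destination holds its own characters at a different address.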
Other remarks
Generally, it's an anti-pattern to do:
char* string = "literal";
This is an anti-pattern because literals cannot be edited and shouldn't be. Do:
char const* string = "literal";
See this thread for more info.
Avoid using int in your loop. Use size_t instead. See this thread.
For part of my program, I need to set string1 equal to another string, string2. I can't use strcpy or use brackets.
Perhaps the solution is just as simple as
string1 = string2
Note that this makes the string1 pointer point directly to the same memory as string2; no characters are copied. This is sometimes very helpful because you need to keep the beginning of the string in string2 but also need another pointer to move inside the string with things like string1++.
One way or another, you have to point string1 at an address in memory that you have access to. There are two ways to do this:
Point at memory that you already have access to through another variable either with another pointer variable or with the address-of & operator.
Allocate memory with malloc() or related functions.
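As a small illustration of keeping one pointer at the start of a string while a second pointer moves through the same memory, here is a sketch that measures a string without brackets, malloc, or arrays. The function name length_by_walking is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the characters before the NUL by advancing a second pointer,
 * while `start` keeps marking the beginning of the string. */
size_t length_by_walking(const char *start) {
    assert(start != NULL);
    const char *cursor = start;   /* same memory as start; nothing is copied */
    while (*cursor != '\0') {
        cursor++;
    }
    return (size_t)(cursor - start);
}
```

Both pointers refer to memory that already exists, so no allocation is involved; only the cursor changes.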

strtok() with realloc() weird behaviour

I have the following program written in C:
...
char *answer = NULL;
char *pch = strtok(phrase, " "); // phrase is a string with possibly many words
while (pch) {
char *tmp = translate_word(pch); // returns a string based on pch
void *ptr = realloc(answer, sizeof(answer) + sizeof(tmp) + 1000); // allocate space to answer
if (!ptr) // If realloc fails
return -1;
strcat(answer, tmp); // append tmp to answer
pch = strtok(NULL, " "); // find next word
}
...
The problem is that strtok() shows weird behavior, it returns a word that does not exist in the phrase string but is part of the answer string.
On the other hand, when I change the following line:
void *ptr = realloc(answer, sizeof(answer) + sizeof(tmp) + 1000);
to:
void *ptr = realloc(answer, sizeof(answer) + sizeof(tmp) + 1);
strtok() works as expected.
How is it possible that realloc() affects strtok() in this case? They do not even use the same variables. Looking forward to your insights.
The realloc function could move the memory that was previously allocated. After the call, the pointer to the allocated memory is returned and the pointer value passed to it, if it differs, is no longer valid. So when you call strcat(answer, tmp); you're potentially writing to freed memory which invokes undefined behavior, and in this case it manifests as the strange output you're seeing.
After checking the return value of realloc, assign that value back to answer.
Also, sizeof(answer) and sizeof(tmp) give you the size of the pointer, not the size of what it points to. You instead want to use strlen to get the length of the strings they contain. And while we're at it, let's just add 1 to this instead of 1000 because that's all you actually need.
void *ptr = realloc(answer, strlen(answer) + strlen(tmp) + 1);
if (!ptr)
return -1;
answer = ptr;
strcat(answer, tmp);
One more issue: the first time realloc is called, answer is NULL, so the call behaves like malloc and the returned memory is completely uninitialized. Calling strlen or strcat on answer depends on it pointing to a null-terminated string. It doesn't, so this also invokes undefined behavior.
This can be fixed by malloc-ing a single byte to start and setting it to 0, that way you start with an empty string.
char *answer = malloc(1);
if (!answer) return -1;
answer[0] = 0;
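Putting these fixes together, the loop might look like the sketch below. The helper names append_word and join_words are invented for this example, and the call to translate_word is left out because its definition is not shown in the question; the sketch simply appends each token as-is.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Appends word to *answer, growing the buffer with realloc.
 * *answer may be NULL, which is treated as an empty string.
 * Returns 0 on success, -1 on allocation failure (*answer is then unchanged). */
int append_word(char **answer, const char *word) {
    assert(answer != NULL && word != NULL);
    size_t old_len = (*answer != NULL) ? strlen(*answer) : 0;
    size_t add_len = strlen(word);
    char *grown = realloc(*answer, old_len + add_len + 1);
    if (grown == NULL) {
        return -1;
    }
    memcpy(grown + old_len, word, add_len + 1);  /* includes the NUL */
    *answer = grown;                             /* keep the new address */
    return 0;
}

/* Splits phrase on spaces (modifying it, as strtok does) and concatenates
 * the words. Returns a heap string the caller must free, or NULL on failure. */
char *join_words(char *phrase) {
    assert(phrase != NULL);
    char *answer = NULL;
    if (append_word(&answer, "") != 0) {         /* start from an empty string */
        return NULL;
    }
    for (char *pch = strtok(phrase, " "); pch != NULL; pch = strtok(NULL, " ")) {
        /* A real program would translate pch here before appending. */
        if (append_word(&answer, pch) != 0) {
            free(answer);
            return NULL;
        }
    }
    return answer;
}
```

Because the result of realloc is stored back through the pointer on every call, strtok's internal state and the answer buffer never end up sharing freed memory.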
sizeof(answer) & sizeof(tmp) gives you sizes of the pointers.
You need to use strlen instead
additionally...
char *answer = NULL;
... either:
... strlen(answer) ...
strcat(answer, tmp);
Both of these dereference a NULL pointer, which is undefined behavior. On most systems that means a segmentation fault, but a crash is not guaranteed. Dereferencing NULL is never a good idea.
In short, you need to either know you have assigned something to answer, or to check if answer is NULL.

Writing a string-concat: How to convert character array to pointer

I am learning C and I have written the following strcat function:
char * stringcat(const char* s1, const char* s2) {
int length_of_strings = strlen(s1) + strlen(s2);
char s3[length_of_strings + 1]; // add one for \0 at the end
int idx = 0;
for(int i=0; (s3[idx]=s1[i]) != 0; idx++, i++);
for(int i=0; (s3[idx]=s2[i]) != 0; idx++, i++);
s3[idx+1] = '\0';
// s3 is a character array;
// how to get a pointer to a character array?
char * s = s3;
return s;
}
The part that looks odd to me is where I have to "re-assign" the character array to a pointer, otherwise C complains that my return is a memory address. I also tried "casting" the return value to (char *) s3, but that didn't work either.
What is the most common way to do this "conversion"? Is this a common pattern in C programs?
There are many ways to handle this situation, but returning a pointer to stack-allocated memory inside the function isn't one of them (the behavior is undefined; consider this memory untouchable once the function returns).
One approach is to allocate heap memory using malloc inside the function, build the result string, then return the pointer to the newly allocated memory with the understanding that the caller is responsible for freeing the memory.
Here's an example of this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
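char *stringcat(const char* s1, const char* s2) {
size_t i = 0;
size_t s1_len = strlen(s1); // strlen returns size_t, not int
size_t s2_len = strlen(s2);
char *result = malloc(s1_len + s2_len + 1); // +1 for the terminating '\0'
if (result == NULL) {
return NULL; // allocation failed; the caller should check for this
}
result[s1_len+s2_len] = '\0';
for (size_t j = 0; j < s1_len; j++) {
result[i++] = s1[j];
}
for (size_t j = 0; j < s2_len; j++) {
result[i++] = s2[j];
}
return result;
}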
int main(void) {
char *cat = stringcat("hello ", "world");
printf("%s\n", cat); // => hello world
free(cat);
return 0;
}
Another approach is for the caller to handle all of the memory management, which is similar to how strcat behaves:
/* Append SRC on the end of DEST. */
char *
STRCAT (char *dest, const char *src)
{
strcpy (dest + strlen (dest), src);
return dest;
}
man says:
The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs.
The problem isn't converting from array to pointer; that happens all the time implicitly, and it's no big deal. Your problem is you've just returned a pointer to invalid memory. The array you allocated in the function disappears when the function returns, and dereferencing a pointer to that array is undefined behavior (returning the pointer isn't technically illegal, but any good compiler warns you, because a pointer that is never dereferenced is usually pretty useless).
If you want to return a new array with the concatenated string, you must use dynamically allocated memory, e.g. from malloc/calloc; making the array static would also work (it would now be persistent global memory), but it would make your function both non-reentrant and non-threadsafe, so it's usually frowned on.
Your little trick of assigning to a pointer and returning the pointer may have fooled the compiler into thinking you weren't doing anything illegal, but it did nothing to make your code safer.
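To show why the static-array variant is usually frowned on, here is a rough sketch of it. The name stringcat_static and the 64-byte capacity are arbitrary choices for this illustration.

```c
#include <assert.h>
#include <string.h>

#define STATIC_CAT_CAPACITY 64

/* Concatenates s1 and s2 into a static buffer that outlives the call.
 * Returns NULL if the result would not fit. Every call overwrites the
 * previous result, and the inputs must not point into that buffer. */
const char *stringcat_static(const char *s1, const char *s2) {
    static char buffer[STATIC_CAT_CAPACITY];
    assert(s1 != NULL && s2 != NULL);
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    if (len1 >= sizeof buffer || len2 >= sizeof buffer - len1) {
        return NULL;                        /* no room for both plus the NUL */
    }
    memcpy(buffer, s1, len1);
    memcpy(buffer + len1, s2, len2 + 1);    /* includes the NUL */
    return buffer;
}
```

The returned pointer stays valid after the function returns, but two calls hand back the same address, so the second result silently replaces the first. That shared state is what makes the function non-reentrant and unsafe to call from several threads.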
You might be used to languages with more dynamic memory handling, but your function here won't work because s3 is a local array: its memory is released when the function returns. That means that whatever you write to char s3[] is gone after the return (the details vary and the memory can sometimes stick around long enough for you to think it worked even when it didn't).
Normally you'd want to allocate the memory before calling the function, and pass it in as a parameter, as in:
void stringcat(const char * first, const char * second, char * dest, const size_t dest_len)
Called like this:
char title[] = "Mr. ";
char last[] = "Jones";
char addressname[sizeof(title) + sizeof(last)];
stringcat(title, last, addressname, sizeof(addressname));
The other way to do it is to allocate the memory in the function using malloc(), and return that, but you have to remember to free it in the code when you're done with it.
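A possible body for the caller-allocated version is sketched below. It is named stringcat_into here and returns an int status instead of void so that a too-small buffer can be reported; both of those choices are assumptions of this sketch rather than part of the signature given above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes first followed by second into dest, which holds dest_len bytes.
 * Returns 0 on success, or -1 if dest is too small (dest is left untouched). */
int stringcat_into(const char *first, const char *second,
                   char *dest, size_t dest_len) {
    assert(first != NULL && second != NULL && dest != NULL);
    size_t len1 = strlen(first);
    size_t len2 = strlen(second);
    /* Need len1 + len2 + 1 bytes; written this way to avoid overflow. */
    if (len1 >= dest_len || len2 >= dest_len - len1) {
        return -1;
    }
    memcpy(dest, first, len1);
    memcpy(dest + len1, second, len2 + 1);   /* includes the NUL */
    return 0;
}
```

With the arrays from the calling example, sizeof(title) + sizeof(last) is 11 bytes, one more than the 10 the result needs, so the call succeeds.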

Memory, pointers, and pointers to pointers

I am working on a short program that reads a .txt file. Initially, I was playing around in the main function, and I had gotten my code to work just fine. Later, I decided to abstract it to a function. Now, I cannot seem to get my code to work, and I have been hung up on this problem for quite some time.
I think my biggest issue is that I don't really understand what is going on at a memory/hardware level. I understand that a pointer simply holds a memory address, and a pointer to a pointer simply holds a memory address to an another memory address, a short breadcrumb trail to what we really want.
Yet, now that I am introducing malloc() to expand the amount of memory allocated, I seem to lose sight of what's going on. In fact, I am not really sure how to think of memory at all anymore.
So, a char takes up a single byte, correct?
If I understand correctly, then by default a char* takes up a single byte of memory?
If we were to have a:
char* str = "hello"
Would it be safe to assume that it takes up 6 bytes of memory (including the null character)?
And, if we wanted to allocate memory for some "size" unknown at compile time, then we would need to dynamically allocate memory.
int size = determine_size();
char* str = NULL;
str = (char*)malloc(size * sizeof(char));
Is this syntactically correct so far?
Now, if you would judge my interpretation. We are telling the compiler that we need "size" number of contiguous memory reserved for chars. If size was equal to 10, then str* would point to the first address of 10 memory addresses, correct?
Now, if we could go one step further.
int size = determine_size();
char* str = NULL;
file_read("filename.txt", size, &str);
This is where my feet start to leave the ground. My interpretation is that file_read() looks something like this:
int file_read(char* filename, int size, char** buffer) {
// Set up FILE stream
// Allocate memory to buffer
buffer = malloc(size * sizeof(char));
// Add characters to buffer
int i = 0;
char c;
while((c=fgetc(file))!=EOF){
*(buffer + i) = (char)c;
i++;
}
Adding the characters to the buffer and allocating the memory is what is I cannot seem to wrap my head around.
If **buffer is pointing to *str which is equal to null, then how do I allocate memory to *str and add characters to it?
I understand that this is lengthy, but I appreciate the time you all are taking to read this! Let me know if I can clarify anything.
EDIT:
Whoa, my code is working now, thanks so much!
Although, I don't know why this works:
*((*buffer) + i) = (char)c;
So, a char takes up a single byte, correct?
Yes.
If I understand correctly, by default a char* takes up a single byte of memory.
Your wording is somewhat ambiguous. A char takes up a single byte of memory. A char * can point to one char, i.e. one byte of memory, or a char array, i.e. multiple bytes of memory.
The pointer itself takes up more than a single byte. The exact size is implementation-defined, usually 4 bytes (32-bit) or 8 bytes (64-bit). You can check it with printf( "%zu\n", sizeof(char *) ).
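The distinction between the size of a pointer and the size of what it points to can be checked with a short sketch. The two function names are hypothetical helpers for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the pointer object itself; it does not depend on the string. */
size_t pointer_size(const char *p) {
    (void)p;             /* only the type matters to sizeof */
    return sizeof p;
}

/* Bytes occupied by the string p points to, including the NUL terminator. */
size_t string_bytes(const char *p) {
    assert(p != NULL);
    return strlen(p) + 1;
}
```

On a given platform pointer_size returns the same value for every string, while string_bytes grows with the text.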
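If we were to have a char* str = "hello", would it be safe to assume that it takes up 6 bytes of memory (including the null character)?
Yes, the string literal occupies 6 bytes. The pointer str itself takes up an additional sizeof(char *) bytes.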
And, if we wanted to allocate memory for some "size" unknown at compile time, then we would need to dynamically allocate memory.
int size = determine_size();
char* str = NULL;
str = (char*)malloc(size * sizeof(char));
Is this syntactically correct so far?
Do not cast the result of malloc; in C the void * it returns converts to char * implicitly. And sizeof(char) is by definition always 1, so malloc(size) is enough.
If size was equal to 10, then str* would point to the first address of 10 memory addresses, correct?
Yes. Well, almost. str* makes no sense, and it's 10 chars, not 10 memory addresses. But str would point to the first of the 10 chars, yes.
Now, if we could go one step further.
int size = determine_size();
char* str = NULL;
file_read("filename.txt", size, &str);
This is where my feet start to leave the ground. My interpretation is that file_read() looks something like this:
int file_read(char* filename, int size, char** buffer) {
// Set up FILE stream
// Allocate memory to buffer
buffer = malloc(size * sizeof(char));
No. You would write *buffer = malloc( size );. The idea is that the memory you are allocating inside the function can be addressed by the caller of the function. So the pointer provided by the caller -- str, which is NULL at the point of the call -- needs to be changed. That is why the caller passes the address of str, so you can write the pointer returned by malloc() to that address. After your function returns, the caller's str will no longer be NULL, but contain the address returned by malloc().
buffer is the address of str, passed to the function by value. Allocating to buffer would only change that (local) pointer value.
Allocating to *buffer, on the other hand, is the same as allocating to str. The caller will "see" the change to str after your file_read() returns.
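A minimal sketch of the whole pattern follows. To keep it self-contained it takes an already opened FILE * instead of a filename, so the name stream_read and its signature are a variation on the file_read from the question, not the original function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads up to size - 1 characters from fp into a newly allocated buffer
 * and NUL-terminates it. Stores the buffer in *buffer (the caller frees it).
 * Returns the number of characters read, or -1 on failure. */
int stream_read(FILE *fp, size_t size, char **buffer) {
    assert(fp != NULL && buffer != NULL);
    if (size == 0) {
        return -1;
    }
    char *data = malloc(size);
    if (data == NULL) {
        return -1;
    }
    size_t i = 0;
    int c;                        /* int, not char, so EOF can be detected */
    while (i + 1 < size && (c = fgetc(fp)) != EOF) {
        data[i++] = (char)c;
    }
    data[i] = '\0';
    *buffer = data;               /* this is what updates the caller's str */
    return (int)i;
}
```

The caller passes &str, exactly as in the question, and after the call str holds the address returned by malloc.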
Although, I don't know why this works: *((*buffer) + i) = (char)c;
buffer is the address of str.
*buffer is, basically, the same as str -- a pointer to char (array).
(*buffer) + i is pointer arithmetic -- the pointer *buffer plus i means a pointer to the ith element of the array.
*((*buffer) + i) is dereferencing that pointer to the ith element -- a single char.
to which you are then assigning (char)c.
A simpler expression doing the same thing would be:
(*buffer)[i] = (char)c;
With char **buffer, buffer stands for the pointer to the pointer to the char, *buffer accesses the pointer to a char, and **buffer accesses the char value itself.
To pass back a pointer to a new array of chars, write *buffer = malloc(size).
To write values into the char array, write *((*buffer) + i) = c, or (probably simpler) (*buffer)[i] = c
See the following snippet demonstrating what's going on:
void generate0to9(char** buffer) {
*buffer = malloc(11); // *buffer dereferences the pointer to the pointer buffer one time, i.e. it writes a (new) pointer value into the address passed in by `buffer`
for (int i=0;i<=9;i++) {
//*((*buffer)+i) = '0' + i;
(*buffer)[i] = '0' + i;
}
(*buffer)[10]='\0';
}
int main(void) {
char *b = NULL;
generate0to9(&b); // pass a pointer to the pointer b, such that the pointer`s value can be changed in the function
printf("b: %s\n", b);
free(b);
return 0;
}
Output:
b: 0123456789

Simple c function, segfaults on free

I know this has been asked quite a bit, but every example I looked at never seemed to fit exactly. In the code below if I keep the free(), the resulting compiled binary segfaults. If I remove it, the code works just fine. My question is, why?
int convertTimeToStr(time_t* seconds, char* str)
{
int rc = 0;
if (str == NULL) {
printf("The passed in char array was null!\n");
rc = 1;
} else {
char* buf = malloc(sizeof(char) * 100);
memset(buf, '\0', sizeof(buf));
buf = asctime(gmtime(seconds));
strcpy(str, buf);
free(buf);
}
return rc;
}
The problem is that you reassign the pointer to your allocated memory. What you're doing is basically equivalent to
int a = 5;
int b = 10;
a = b;
and then wondering why a is no longer equal to 5.
With the assignment buf = asctime(gmtime(seconds)) you lose the original pointer and have a memory leak.
What the asctime function returns is a pointer to a static internal buffer, it's not something you should pass to free.
You should not be surprised by that, since you've changed the value of the pointer buf from what malloc() returned.
char* buf = malloc(sizeof(char) * 100); // value returned by malloc()
memset(buf, '\0', sizeof(buf));
buf = asctime(gmtime(seconds)); // change value of buf
strcpy(str, buf);
free(buf); // buf differs from above
Calling free() with an argument that was not returned from malloc() (or calling it for the second time) is undefined behaviour.
You call malloc and memset, which allocate a buffer and zero part of it (sizeof(buf) is the size of the pointer, not the 100 bytes you allocated), but then you overwrite the value of buf with the return value from asctime. By the time you call free, it is on the return value from asctime, not your original allocation. This has three issues:
You never use the buffer you allocated with malloc for any useful purpose, so you don't need that malloc nor the memset
You lose the pointer returned by malloc so you can never free it. Your process has leaked memory.
You try to free the return value from asctime. The return value from asctime does not need to be freed and should not be freed. This causes undefined behavior, in your case a segfault.
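One way the function could look without the unnecessary allocation is sketched below. The extra str_size parameter and the snake_case name are assumptions added for this example, so the copy can be bounds-checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Formats *seconds as a UTC time string into str (str_size bytes).
 * Returns 0 on success, 1 on any failure. Nothing is allocated or freed. */
int convert_time_to_str(const time_t *seconds, char *str, size_t str_size) {
    assert(seconds != NULL);
    if (str == NULL) {
        return 1;
    }
    struct tm *utc = gmtime(seconds);
    if (utc == NULL) {
        return 1;
    }
    const char *text = asctime(utc);   /* static buffer owned by the library */
    if (text == NULL) {
        return 1;
    }
    size_t len = strlen(text);
    if (len + 1 > str_size) {
        return 1;                      /* caller's buffer is too small */
    }
    memcpy(str, text, len + 1);        /* copy out before the next asctime call */
    return 0;
}
```

The pointer from asctime is only read, never passed to free, and the result is copied into memory the caller owns.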
