return substring of string

I have a large string and want to use pieces of it without necessarily copying them, so I figured I could make a structure that marks the start and length of the useful chunk within the big string, and then write a function that reads it.
struct descriptor {
    int start;
    int length;
};
So far so good, but when I got to writing the function I realized that I can't really return the chunk without copying into memory...
char* getSegment(char* string, struct descriptor d) {
    char* chunk = malloc(d.length + 1);
    strncpy(chunk, string + d.start, d.length);
    chunk[d.length] = '\0';
    return chunk;
}
So the questions I have are:
Is there any way that I can return the piece of string without copying it
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?

Answering your two questions:
No.
The caller should provide a buffer for the copied string.
I would personally pass a pointer to the descriptor (note that the buffer must not be const, since the function writes into it):
char *getSegment(const char *string, char *buff, const struct descriptor *d);
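A minimal sketch of what such a caller-provided-buffer version might look like. The extra buff_size parameter and the use of size_t in the descriptor are my assumptions, not part of the question; the function also assumes the descriptor stays within the bounds of the source string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct descriptor {
    size_t start;   /* assumption: size_t instead of the question's int */
    size_t length;
};

/* Copies the segment described by *d into buff and null-terminates it.
 * Returns buff, or NULL if buff (of buff_size bytes) is too small.
 * Assumes the caller guarantees d->start + d->length <= strlen(string). */
char *getSegment(const char *string, char *buff, size_t buff_size,
                 const struct descriptor *d)
{
    assert(string != NULL && buff != NULL && d != NULL);
    if (d->length >= buff_size)
        return NULL;                 /* no room for the segment plus '\0' */
    memcpy(buff, string + d->start, d->length);
    buff[d->length] = '\0';
    return buff;
}
```

Because the caller owns the buffer (a local array, a static buffer, or its own malloc), there is nothing for getSegment to leak.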

Is there any way that I can return the piece of string without copying it
A string includes its terminating null character, so a pointer to a "piece of a string" can only be a string itself when the piece wanted is the tail of the original. For any other piece it is not possible.
how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Create temporary space with a variable length array (standard in C99, optionally supported since C11). It is valid until the end of the block, at which point the memory is released and must not be used any further.
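char *getSegment(const char *string, struct descriptor d, char *dest) {
    // form result in `dest`; it must hold at least d.length + 1 bytes
    memcpy(dest, string + d.start, d.length);
    dest[d.length] = '\0';
    return dest;
}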
Usage
char *t;
{
    struct descriptor des = bar();
    char *large_string = foo();
    char sub[des.length + 1u]; // VLA
    t = getSegment(large_string, des, sub);
    puts(t); // use sub or t;
}
// do not use `t` here, invalid pointer.
Keep the size in mind, though: a VLA typically lives on the stack. If the code returns large substrings, it is better to malloc() a buffer and oblige the calling code to free it when done.

Is there any way that I can return the piece of string without copying it
You're right that if you want to use the chunks in conjunction with any of the many C functions that expect to work with null-terminated character arrays, then you have to make copies. Otherwise, adding the terminators modifies the original string.
If you're prepared to handle the chunks as fixed-length, unterminated arrays, however, then you can represent them without copying as a combination of a pointer to the first character and a length. Some standard library functions work with user-specified string lengths, thus supporting operations on such segments without null termination. You would need to be very careful with them, however.
If you take that approach, I would recommend colocating the pointer and length in a structure. For example,
struct string_segment {
char *start;
size_t length;
};
You could declare variables of this type, pass and return objects of this type, and create compound literals of this type without any dynamic memory allocation, thus avoiding opening any avenue for memory leakage.
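As a rough sketch of working with such segments without copying: the helper names below (make_segment, segment_equals, print_segment) are made up for illustration, and I've used const char * for the pointer on the assumption that the segment only reads the big string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct string_segment {
    const char *start;   /* assumption: read-only view of the big string */
    size_t length;
};

/* Builds a view into `s`; nothing is copied or allocated.
 * Assumes offset + length does not run past the end of `s`. */
struct string_segment make_segment(const char *s, size_t offset, size_t length)
{
    assert(s != NULL);
    return (struct string_segment){ s + offset, length };
}

/* Compares a segment against a null-terminated string. */
int segment_equals(struct string_segment seg, const char *text)
{
    return strlen(text) == seg.length
        && memcmp(seg.start, text, seg.length) == 0;
}

/* Prints the segment; "%.*s" writes at most `length` characters. */
void print_segment(struct string_segment seg)
{
    printf("%.*s\n", (int)seg.length, seg.start);
}
```

memcmp and the "%.*s" conversion are two of the length-aware standard facilities mentioned above; anything that expects a terminator (strlen, strcpy, plain "%s") must not be handed seg.start directly.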
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Returning dynamically-allocated objects does not automatically create a memory leak -- it merely confers a responsibility on the caller to free the allocated memory. It is when the caller fails to either satisfy that responsibility or pass it on to other code that a memory leak occurs. Several standard library functions indeed do return dynamically-allocated objects, and it's not so unusual in third-party libraries. The canonical example (other than malloc() itself) would probably be the POSIX-standard strdup() function.
If your function returns a pointer to a dynamically-allocated object -- whether a copied string, or a chunk definition structure -- then it should document the responsibility to free that falls on callers. You must ensure that you satisfy your obligation when you call it from your own code, but having clearly documented the function's behavior, you cannot take responsibility for errors other callers may make by failing to fulfill their obligations.

Related

Return a string allocated with malloc?

I'm creating a function that returns a string. The size of the string is known at runtime, so I'm planning to use malloc(), but I don't want to give the user the responsibility for calling free() after using my function's return value.
How can this be achieved? How do other functions that return strings (char *) work (such as getcwd(), _getcwd(), GetLastError(), SDL_GetError())?

Your challenge is that something needs to release the resources (i.e. cause the free() to happen).
Normally, the caller frees the allocated memory either by calling free() directly (see how strdup users work, for instance), or by calling a function you provide that wraps free. You might, for instance, require callers to call a foo_destroy function. As another poster points out, you might choose to wrap that in an opaque struct, though that's not necessary, as having your own allocation and destroy functions is useful even without that (e.g. for resource tracking).
However, another way would be to use some form of clean-up function. For instance, when the string is allocated, you could attach it to a list of resources allocated in a pool, then simply free the pool when done. This is how apache2 works with its apr_pool structure. In general, you don't free() anything specifically under that model.
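To illustrate the pool idea, here is a toy sketch. It is not APR's API; the names pool, pool_strdup and pool_destroy are invented for this example, and it only handles strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Each allocation is a list node with the string stored right after it. */
struct pool_node {
    struct pool_node *next;
    char data[];            /* flexible array member holding the string */
};

struct pool {
    struct pool_node *head;
};

/* Copies `s` into memory owned by the pool; returns NULL on failure. */
char *pool_strdup(struct pool *p, const char *s)
{
    assert(p != NULL && s != NULL);
    size_t n = strlen(s) + 1;
    struct pool_node *node = malloc(sizeof *node + n);
    if (node == NULL)
        return NULL;
    memcpy(node->data, s, n);
    node->next = p->head;   /* remember the allocation in the pool */
    p->head = node;
    return node->data;
}

/* Frees every string handed out by the pool in one call. */
void pool_destroy(struct pool *p)
{
    assert(p != NULL);
    while (p->head != NULL) {
        struct pool_node *next = p->head->next;
        free(p->head);
        p->head = next;
    }
}
```

Callers never free individual strings; they call pool_destroy once, at which point every pointer the pool returned becomes invalid.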
What you can't do in C (as there is no reference counting of malloc()d structures) is directly determine when the last 'reference' to an object goes out of scope and free it then. That's because you don't have references, you have pointers.
Lastly, you asked how existing functions return char * variables:
Some (like strdup, get_current_dir_name and getcwd under some circumstances) expect the caller to free.
Some (like strerror_r and getcwd under other circumstances) expect the caller to pass in a buffer of sufficient size.
Some do both: from the getcwd man page:
As an extension to the POSIX.1-2001 standard, Linux (libc4, libc5, glibc) getcwd() allocates the buffer dynamically
using malloc(3) if buf is NULL. In this case, the allocated buffer has the length size unless size is zero, when
buf is allocated as big as necessary. The caller should free(3) the returned buffer.
Some use an internal static buffer and are thus not reentrant / threadsafe (yuck - do not do this). See strerror and why strerror_r was invented.
Some only return pointers to constants (so reentrancy is fine), and no free is required.
Some (like libxml) require you to use a separate free function (xmlFree() in this case)
Some (like apr_palloc) rely on the pool technique above.

Many libraries force the user to deal with memory allocation. This is a good idea because every application has its own patterns of object lifetime and reuse. It's good for the library to make as few assumptions about its users as possible.
Say a user wants to call your library function like this:
for (a lot of iterations)
{
    params = get_totally_different_params();
    char *str = your_function(params);
    do_something(str);
    // now we're done with this str forever
}
If your library mallocs the string every time, it is wasting a lot of effort calling malloc, and possibly showing poor cache behavior if malloc picks a different block each time.
Depending on the specifics of your library, you might do something like this:
int output_size(/*params*/);
void func(/*params*/, char *destination);
where destination is required to be at least output_size(params) size, or you could do something like the socket recv API:
int func(/*params*/, char *destination, int destination_size);
where the return value is:
< destination_size: this is the number of bytes we actually used
== destination_size: there may be more bytes waiting to output
These patterns both perform well when called repeatedly, because the caller can reuse the same block of memory over and over without any allocations at all.
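A small sketch of the reusable-buffer style. format_point and total_length are hypothetical names, and for the return value I've borrowed snprintf's convention (length the full text needs) rather than the recv-style one described above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical library function: writes "x,y" into dest.
 * Returns the number of characters the full text needs, excluding '\0'
 * (like snprintf); a value >= dest_size means the output was truncated. */
int format_point(int x, int y, char *dest, size_t dest_size)
{
    assert(dest != NULL && dest_size > 0);
    return snprintf(dest, dest_size, "%d,%d", x, y);
}

/* Caller side: one stack buffer is reused for every iteration,
 * so the loop performs no malloc and no free at all. */
long total_length(int n)
{
    char buf[32];
    long total = 0;
    for (int i = 0; i < n; i++) {
        int len = format_point(i, i * 2, buf, sizeof buf);
        if (len >= 0 && (size_t)len < sizeof buf)
            total += len;   /* buf holds the complete text here */
    }
    return total;
}
```

The library makes no assumption about where the memory comes from; the caller could just as well pass a static buffer or a block from its own allocator.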

There is no way to do this in C. You have to either pass a parameter with size information, so that malloc() and free() can be called in the called function, or the calling function has to call free after malloc().
Many object oriented languages (eg. C++) handle memory in such a way as to do what you want to, but not C.
Edit
By size information as an argument, I mean something to let the called function know how many bytes of memory are owned by the pointer you are passing. This can be done by looking directly at the passed string if it has already been assigned a value, such as:
char test1[]="this is a test";
char *test2="this is a test";
when called like this:
readString(test1); // (or test2)
char * readString(char *abc)
{
    int len = strlen(abc);
    return abc;
}
Both of those arguments will result in len = 14
However if you create a non populated variable, such as:
char *test3;
And allocate the same amount of memory, but do not populate it, for example:
test3 = malloc(strlen("this is a test") +1);
There is no way for the called function to know how much memory has been allocated. Calling strlen() on the unpopulated buffer inside the first version of readString() reads uninitialized memory, which is undefined behavior, so len is meaningless there. However, if you change the prototype of readString() to:
readString(char *abc, size_t sizeString);
then size information passed as an argument can be used to create memory:
void readString(char *abc, size_t sizeString)
{
    char *in;
    in = malloc(sizeString +1);
    //do something with it
    //then free it
    free(in);
}
example call:
int main(void)
{
    size_t len;
    char *test3 = NULL; // initialized so that no indeterminate pointer is passed
    len = strlen("this is a test") +1; //allow for '\0'
    readString(test3, len);
    // more code
    return 0;
}

You cannot do this in C.
Return a pointer and it is up to the person calling the function to call free
Alternatively use C++. shared_ptr etc

You can wrap it in an opaque struct.
Give the user access to pointers to your struct but not to its internals. Create a function to release resources.
void release_resources(struct opaque *ptr);
Of course the user needs to call the function.
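Roughly, that could look like the sketch below. Only release_resources comes from the answer; acquire_resources, opaque_text and the struct's contents are assumptions for illustration. In a real library the first part would live in the header and the second in the .c file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- what a public header would expose --- */
struct opaque;                                   /* incomplete type */
struct opaque *acquire_resources(const char *text);
const char *opaque_text(const struct opaque *ptr);
void release_resources(struct opaque *ptr);

/* --- what the library's .c file would hide --- */
struct opaque {
    char *text;
};

/* Returns a new object holding a copy of `text`, or NULL on failure. */
struct opaque *acquire_resources(const char *text)
{
    assert(text != NULL);
    struct opaque *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    ptr->text = malloc(strlen(text) + 1);
    if (ptr->text == NULL) {
        free(ptr);
        return NULL;
    }
    strcpy(ptr->text, text);
    return ptr;
}

/* The string stays valid until release_resources is called. */
const char *opaque_text(const struct opaque *ptr)
{
    assert(ptr != NULL);
    return ptr->text;
}

/* Frees everything the object owns; accepts NULL like free() does. */
void release_resources(struct opaque *ptr)
{
    if (ptr == NULL)
        return;
    free(ptr->text);
    free(ptr);
}
```

The user never calls free() directly, so the library remains free to change how the string is stored or allocated.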

You could keep track of the allocated strings and free them in an atexit routine (http://www.tutorialspoint.com/c_standard_library/c_function_atexit.htm). In the following, I have used a global variable but it could be a simple array or list if you have one handy.
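#include <stdlib.h>
#include <string.h>

/* Only the most recently returned string is tracked here. */
static char* freeme = NULL;

static void AStringRelease(void)
{
    free(freeme); /* free(NULL) is a no-op */
    freeme = NULL;
}

char* AStringGet(void)
{
    static int registered = 0;
    if (!registered) {
        atexit(AStringRelease); /* register the cleanup only once */
        registered = 1;
    }
    freeme = malloc(20);
    if (freeme != NULL)
        strcpy(freeme, "A String");
    return freeme;
}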

Memory leak with strings?

I'm new to C, so this may be obvious, but I'm still not sure. Java took care of this for me ^^
I have a table of replacements, an input string, and a function str_replace which does some work on the string. str_replace internally calls malloc to get space for the new string (it returns a newly allocated char*).
char* color_tags(char* s) {
    char* out = s;
    // in real, the table is much longer
    static char* table[4][2] = {
        {"<b>", BOLD},
        {"<u>", UNDERLINE},
        {"</b>", BOLD_R},
        {"</u>", UNDERLINE_R},
    };
    for(int r=0; r<4; r++) {
        // here's what bothers me
        out = str_replace(table[r][0], table[r][1], out);
    }
    return out;
}
As you can see, the char* out is replaced by pointer to the new string, so the old string apparently ends up as a memory leak - if I don't understand it totally wrong.
What would be a better way for this?

[This is more of a comment than an answer — abacabadabacaba has already posted the answer — but I hope it will clarify things a bit.]
I would argue that the memory leak is in this statement:
str_replace internally calls malloc to get space for the new string […]
[emphasis mine] Memory management is such a fundamental concern in C that if a function allocates memory that it itself doesn't de-allocate, then that is a major property of the function, and one that needs to be documented up-front, together with information about what the caller is supposed to do about it. It should not be considered "internal" to the function, and you shouldn't have to read the entirety of the function's source-code in order to determine it. It's enough to make me suspicious of the rest of the function (and indeed, a quick glance at that function is enough to notice a lot of issues: its parameter-types should be const char * rather than char *; it should check the return-value of malloc; it could be made more efficient by keeping track of the tail of new_subject, or cleaner by using strcat, instead of the current worst-of-both-worlds; etc.).
You didn't write str_replace originally, but you can modify your own version, so you should change its documentation from this:
Search and replace a string with another string , in a string
to something like this:
Creates and returns a copy of subject, but with all occurrences of the substring search replaced by replace. The returned string is newly allocated using malloc; the caller should use free.
(Your color_tags function will need similar documentation, since it too returns a newly-allocated string using malloc.)
That documentation in hand, there's a clear chain of "ownership": the caller of str_replace takes ownership of the string it returns. So color_tags has to call free for every string returned by str_replace, except the string that color_tags itself will return (which in turn will be "owned" by the caller of color_tags). Hence abacabadabacaba's answer.

The code leaks 3 strings in total: one after each of the iterations except the last one. The solution is to deallocate each of those strings after its use. The code may look like this:
for(int r=0; r<4; r++) {
    char* new_out = str_replace(table[r][0], table[r][1], out);
    if (r>0) {
        // out is an intermediate value which will never be used again, free it
        free(out);
    }
    out = new_out;
}
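For a version you can compile on its own, here is a sketch of the same ownership chain. replace_char is a hypothetical stand-in for str_replace (it swaps single characters rather than substrings), and apply_all plays the role of color_tags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for str_replace: returns a newly malloc'd copy of
 * `s` with every `from` replaced by `to`. The caller must free the result. */
char *replace_char(const char *s, char from, char to)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    for (char *p = copy; *p != '\0'; p++)
        if (*p == from)
            *p = to;
    return copy;
}

/* Applies each replacement in turn. Returns a malloc'd string owned by the
 * caller, or NULL on allocation failure. The input `s` is never freed. */
char *apply_all(const char *s)
{
    static const char table[3][2] = { {'a', 'b'}, {'b', 'c'}, {'c', 'd'} };
    char *out = NULL;
    for (int r = 0; r < 3; r++) {
        char *new_out = replace_char(out != NULL ? out : s,
                                     table[r][0], table[r][1]);
        free(out);          /* intermediate result; free(NULL) is a no-op */
        if (new_out == NULL)
            return NULL;
        out = new_out;
    }
    return out;
}
```

Starting with out set to NULL avoids the r>0 special case, because free(NULL) does nothing and the caller's original string is never touched.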

How to assign string value in C from void *?

I'm implementing a CMap in C, and part of this entails storing information in a linked-list type of structure that I manually manage the memory of. So the first 4 bytes of this struct is a pointer to the next struct, the next section is the string (key), and the final section is the value.
Say void *e = ptr defines one such linked list.
Then, ptr + 4 refers to the beginning of the string section.
I want to assign that string value to another string, and what I've done so far is:
char *string = (char *)ptr + 4;
However, I don't think this is right.

If you want to point to the same string your code is fine, assuming pointers are always 4 bytes wide.
If you want to copy the contents of the string use malloc and strcpy to create a new string.

Just reference a struct instead of calculating offsets.
// if the data is structured this way
struct struct_list_el
{
    struct struct_list_el *next;
    char *str;
    int value;
};
typedef struct struct_list_el list_el;
// then from void_pointer
list_el *el;
el = (list_el *) void_pointer;
char *string;
string = el->str;

@ralu is right that you should be using a struct. But you should also be very careful when copying strings. In C there is no first-class string object like in C++, Java, Python, and well, everything else. :)
In C, character pointers (char*) are often used as strings, but they are really just pointers to null-terminated arrays of bytes in memory somewhere. Copying a character pointer is not the same as copying the underlying array of characters. To do that, you need to provide memory for the characters of the copy. This memory can be on the stack (a local array), or the heap (created with malloc), or some other buffer.
You'll need to measure the length of the string before you do anything to make sure that the target buffer can hold it. Be sure to add one to the length so that there is room for the terminating null.
Also note that the standard library functions (strlen, strcpy, strncpy, strcat, snprintf, strdup, etc.) are slightly incompatible with each other regarding the terminating null. For example, strlen returns the number of characters, excluding the terminating null, so buffers need to be one byte larger than what it returns to hold things. Also, strncpy does not guarantee null termination while snprintf does. Misuse of these functions and C strings in general is the cause of a significant number of security breaches (not to mention bugs) in computer systems today.
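To make the termination differences concrete, here is a small sketch; copy_string and copy_with_strncpy are names I made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies src into dest (dest_size bytes), always null-terminating.
 * Returns 1 if the whole string fit, 0 if it was truncated.
 * snprintf terminates the output whenever dest_size > 0. */
int copy_string(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);
    int needed = snprintf(dest, dest_size, "%s", src);
    return needed >= 0 && (size_t)needed < dest_size;
}

/* strncpy leaves dest unterminated when src has dest_size or more
 * characters, so the terminator has to be written by hand. */
void copy_with_strncpy(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
}
```

In both helpers a buffer of N bytes holds at most N - 1 characters, which is the "add one to strlen" rule from the previous paragraph seen from the other side.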
Unless you build or use a solid library, string and list manipulation in C is tedious and error-prone. You can see why C++ and all those other languages were invented.

Dealing with returning C strings

What is considered better practice when writing methods that return strings in C?
passing in a buffer and size:
void example_m_a(type_a a,char * buff,size_t buff_size)
or making and returning a string of proper size:
char * example_m_b(type_a a)
P.S. what do you think about returning the buffer ptr to allow assignment style and
nested function calls i.e.
char * example_m_a(type_a a,char * buff,size_t buff_size)
{
...
return buff;
}

Passing a buffer as an argument solves most of the problems this type of code can run into.
If it returns a pointer to a buffer, then you need to decide how it is allocated and if the caller is responsible for freeing it. The function could return a static pointer that doesn't need to be freed, but then it isn't thread safe.

Passing a buffer and a size is generally less error-prone, especially if the sizes of your strings are typically of a "reasonable" size. If you dynamically allocate memory and return a pointer, the caller is responsible for freeing the memory (and must remember to use the corresponding free function for the memory depending on how the function allocated it).
If you examine large C APIs such as Win32, you will find that virtually all functions that return strings use the first form where the caller passes a buffer and a size. Only in limited circumstances might you find the second form where the function allocates the return value (I can't think of any at the moment).

I'd prefer the second option because it allows the function to decide how big a buffer is needed. Often the caller is not in a position to take that decision.

Another alternative to the pass a buffer and size style, using a return code:
size_t example_m_a(type_a a,char * buff,size_t buff_size)
A zero return code indicates that the caller's buffer was suitable and has been filled in.
A return code > 0 indicates that the caller's buffer was too small and reveals the size that is actually needed, allowing the caller to resize his buffer and retry.
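A sketch of that convention with a caller that retries. repeat_x stands in for example_m_a and repeat_x_alloc is a made-up caller; both names and the "string of x characters" payload are assumptions chosen only to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Writes `count` copies of 'x' plus '\0' into buff.
 * Returns 0 on success, or the required buffer size if buff is too small. */
size_t repeat_x(size_t count, char *buff, size_t buff_size)
{
    size_t needed = count + 1;
    if (buff == NULL || buff_size < needed)
        return needed;
    memset(buff, 'x', count);
    buff[count] = '\0';
    return 0;
}

/* Caller side: try once, grow the buffer to the reported size, retry.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *repeat_x_alloc(size_t count)
{
    size_t size = 4;                 /* deliberately small first guess */
    char *buff = malloc(size);
    if (buff == NULL)
        return NULL;
    size_t needed = repeat_x(count, buff, size);
    if (needed > 0) {
        char *bigger = realloc(buff, needed);
        if (bigger == NULL) {
            free(buff);
            return NULL;
        }
        buff = bigger;
        needed = repeat_x(count, buff, needed);
        assert(needed == 0);         /* the reported size must suffice */
    }
    return buff;
}
```

Passing a NULL buffer doubles as a pure size query, so the caller can also ask first and allocate exactly once.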

Passing buffer address and length is best in most cases. It is less error-prone and one does not have to worry about memory leaks. In fact, in some tight embedded systems it is completely undesirable to use the heap. However, the function must not overrun the buffer as that can crash the system and worse: make it vulnerable to hackers.
The only time where I've seen function returning allocated buffer is libxml's API to generate XML text from xmlDoc.

C when to allocate and free memory - before function call, after function call...etc

I am working with my first straight C project, and it has been a while since I worked on C++ for that matter. So the whole memory management is a bit fuzzy.
I have a function that I created that will validate some input. In the simple sample below, it just ignores spaces:
int validate_input(const char *input_line, char** out_value){
    int ret_val = 0; /*false*/
    int length = strlen(input_line);
    out_value =(char*) malloc(sizeof(char) * length + 1);
    if (0 != length){
        int number_found = 0;
        for (int x = 0; x < length; x++){
            if (input_line[x] != ' '){ /*ignore space*/
                /*get the character*/
                out_value[number_found] = input_line[x];
                number_found++; /*increment counter*/
            }
        }
        out_value[number_found + 1] = '\0';
        ret_val = 1;
    }
    return ret_val;
}
Instead of allocating memory inside the function for out_value, should I do it before I call the function and always expect the caller to allocate memory before passing into the function? As a rule of thumb, should any memory allocated inside of a function be always freed before the function returns?

I follow two very simple rules which make my life easier.
1/ Allocate memory when you need it, as soon as you know what you need. This will allow you to capture out-of-memory errors before doing too much work.
2/ Every allocated block of memory has a responsibility property. It should be clear when responsibility passes through function interfaces, at which point responsibility for freeing that memory passes with the memory. This will guarantee that someone has a clearly specified requirement to free that memory.
In your particular case, you need to pass in a double char pointer if you want the value given back to the caller:
int validate_input (const char *input_line, char **out_value_ptr) {
: :
*out_value_ptr =(char*) malloc(length + 1); // sizeof(char) is always 1
: :
(*out_value_ptr)[number_found] = input_line[x];
: :
As long as you clearly state what's expected by the function, you could either allocate the memory in the caller or the function itself. I would prefer outside of the function since you know the size required.
But keep in mind you can allow for both options. In other words, if the function is passed a char** that points to NULL, have it allocate the memory. Otherwise it can assume the caller has done so:
if (*out_value_ptr == NULL)
    *out_value_ptr =(char*) malloc(length + 1);
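Putting the pieces together, a complete version might look like the sketch below. Returning 0 for empty input before allocating, and terminating at number_found rather than number_found + 1, are my choices and not something the question specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies input_line without spaces into *out_value_ptr.
 * If *out_value_ptr is NULL the function allocates the buffer and the caller
 * must free() it; otherwise the caller's buffer must hold at least
 * strlen(input_line) + 1 bytes. Returns 1 on success, 0 on failure. */
int validate_input(const char *input_line, char **out_value_ptr)
{
    assert(input_line != NULL && out_value_ptr != NULL);
    size_t length = strlen(input_line);
    if (length == 0)
        return 0;                    /* nothing to validate, nothing allocated */
    if (*out_value_ptr == NULL) {
        *out_value_ptr = malloc(length + 1);
        if (*out_value_ptr == NULL)
            return 0;
    }
    size_t number_found = 0;
    for (size_t x = 0; x < length; x++) {
        if (input_line[x] != ' ')    /* ignore spaces */
            (*out_value_ptr)[number_found++] = input_line[x];
    }
    (*out_value_ptr)[number_found] = '\0';
    return 1;
}
```

A caller that wants the function to allocate passes the address of a pointer set to NULL; a caller with its own buffer passes the address of a pointer to that buffer. Either way the contract is spelled out in the comment.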

You should free that memory before the function returns in your above example. As a rule of thumb you free/delete allocated memory before the scope that the variable was defined in ends. In your case the scope is your function so you need to free it before your function ends. Failure to do this will result in leaked memory.
As for your other question I think it should be allocated going in to the function since we want to be able to use it outside of the function. You allocate some memory, you call your function, and then you free your memory. If you try and mix it up where allocation is done in the function, and freeing is done outside it gets confusing.

The idea of whether the function/module/object that allocates memory should free it is somewhat of a design decision. In your example, I (personal opinion here) think it is valid for the function to allocate it and leave it up to the caller to free. It makes it more usable.
If you do this, you need to declare the output parameter differently (either as a reference in C++ style or as char** in C style). As defined, the pointer will exist only locally and will be leaked.

A typical practice is to allocate memory outside for out_value and pass in the length of the block in octets to the function with the pointer. This allows the user to decide how they want to allocate that memory.
One example of this pattern is the recv function used in sockets:
ssize_t recv(int socket, void *buffer, size_t length, int flags);

Here are some guidelines for allocating memory:
Allocate only if necessary.
Huge objects should be dynamically allocated. Most implementations don't have enough local storage (stack, global / program memory).
Set up ownership rules for the allocated object. The owner should be responsible for deleting.
Guidelines for deallocating memory:
Delete if allocated; don't delete objects or variables that were not dynamically allocated.
Delete when not in use any more. See your object ownership rules.
Delete before program exits.

In this example you should be neither freeing nor allocating memory for out_value. It is typed as a char*. Hence you cannot "return" the new memory to the caller of the function. In order to do that you need to take in a char**.
In this particular scenario the buffer length is unknown before the caller makes the call. Additionally making the same call twice will produce different values since you are processing user input. So you can't take the approach of call once get the length and call the second time with the allocated buffer. Hence the best approach is for the function to allocate the memory and pass the responsibility of freeing onto the caller.

First, this code example you give is not ANSI C. It looks more like C++. There is no "<<" operator in C that works as an output stream to something called "cout."
The next issue is that if you do not free() within this function, you will leak memory. You passed in a char * but once you assign that value to the return value of malloc() (avoid casting the return value of malloc() in the C programming language) the variable no longer points to whatever memory address you passed in to the function. If you want to achieve that functionality, pass a pointer to a char pointer char **, you can think of this as passing the pointer by reference in C++ (if you want to use that sort of language in C, which I wouldn't).
Next, whether you should allocate/free before or after a function call depends on the role of the function. You might have a function whose job it is to allocate and initialize some data and then return it to the caller, in which case it should malloc() and the caller should free(). However, if you are just doing some processing with a couple of buffers, you may prefer the caller to allocate and deallocate. But in your case, since your "validate_input" function looks to be doing nothing more than copying a string without the spaces, you could just malloc() in the function and leave the freeing to the caller. Although, since this function simply allocates the same size as the whole input string, you might as well have the caller do all of it. It all really depends on your usage.
Just make sure you do not lose pointers as you are doing in this example.

Some rough guidelines to consider:
Prefer letting the caller allocate the memory. This lets it control how/where that memory is allocated. Calling malloc() directly in your code means your function is dictating a memory policy.
If there's no way to tell how much memory may be needed in advance, your function may need to handle the allocation.
In cases where your function does need to allocate, consider letting the caller pass in an allocator callback that it uses instead of calling malloc directly. This lets your function allocate when it needs and as much as it needs, but lets the caller control how and where that memory is allocated.
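As an illustration of the allocator-callback idea, here is a sketch. The alloc_fn type, strip_spaces, heap_alloc and the tiny arena are all invented for this example; the arena ignores alignment, so it is only suitable for char data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: behaves like malloc, plus a user context. */
typedef void *(*alloc_fn)(size_t size, void *ctx);

/* Returns a copy of `s` without spaces, allocated through `alloc`.
 * Whoever supplied `alloc` decides how the result is released. */
char *strip_spaces(const char *s, alloc_fn alloc, void *ctx)
{
    assert(s != NULL && alloc != NULL);
    char *out = alloc(strlen(s) + 1, ctx);
    if (out == NULL)
        return NULL;
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s != ' ')
            out[n++] = *s;
    out[n] = '\0';
    return out;
}

/* Allocator 1: plain malloc; the context is unused. */
void *heap_alloc(size_t size, void *ctx)
{
    (void)ctx;
    return malloc(size);
}

/* Allocator 2: bump allocation from a fixed arena; nothing to free. */
struct arena {
    char bytes[64];
    size_t used;
};

void *arena_alloc(size_t size, void *ctx)
{
    struct arena *a = ctx;
    if (size > sizeof a->bytes - a->used)
        return NULL;                 /* arena exhausted */
    void *p = a->bytes + a->used;
    a->used += size;
    return p;
}
```

The function still decides how much memory it needs, but the caller decides where it comes from: the heap in one call, a stack-resident arena in the next.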
