Adding formatting support to a custom string implementation - C

I have a C application (not using C99 features) that does some heavy string processing. Since the string lengths are not known in advance, statically allocated buffers are not an option for me. I have created a simple string implementation that abstracts away the null termination and dynamic expansion of buffers.
Here is what it looks like:
struct strbuf {
char *buffer; /* null terminated buffer */
size_t length; /* length of the string excluding null terminator */
size_t allocated; /* total memory allocated */
};
The add function appends the supplied string to the buffer:
int strbuf_add(struct strbuf *string, const char *c)
{
if(string == NULL) return 0;
while(*c != '\0') {
if(!add_char(string, *c++))
return 0;
}
return 1;
}
static int add_char(struct strbuf *string, char c)
{
size_t space_available;
assert(string != NULL);
space_available = string->allocated - string->length;
if(space_available <= 1) {
if(!grow_buffer(string)) {
return 0;
}
}
string->buffer[string->length++] = c;
string->buffer[string->length] = '\0';
return 1;
}
Now I need to add a new function, something like addformatted, which takes a format string like sprintf. What would be the best way to do this? These are my thoughts:
Use vsnprintf. But I am not sure that it is portable and behaves the same on all platforms.
Write a format parser myself. But this seems to be more work.
Any help in implementing this would be great. I am only interested in portable solutions.
Dev Env : Linux with GCC
Expected to compile on MSVC

snprintf is the way to go and has well-defined behavior, but there are some broken implementations where it returns the wrong values when the buffer is too small. Personally I would just ignore broken implementations unless you really need to use one, or provide a custom implementation of the whole printf family to replace the system version on such broken systems. Otherwise you need to research the broken behavior these systems exhibit and find out how to write workarounds. It might require progressively enlarging the buffer over and over until your call succeeds.

When you use sprintf, you must have some idea of what the resulting string length will be, or else you use snprintf and shield yourself from buffer overflows.
If you want this kind of interface, you can create a function that wraps snprintf and takes as argument a buffer length, and expand your string the required amount before calling snprintf.
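A minimal sketch of the wrap-and-expand idea, assuming a C99-conforming vsnprintf that returns the length the full output would have had. Older MSVC runtimes only offer _vsnprintf, which returns a negative value on truncation, so there you would need the enlarge-and-retry loop instead. The names strbuf_reserve and strbuf_addf are hypothetical and not part of the question's code:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct strbuf {
    char *buffer;      /* null terminated buffer */
    size_t length;     /* length of the string excluding null terminator */
    size_t allocated;  /* total memory allocated */
};

/* Hypothetical helper: make room for `extra` more characters plus the
   terminator. Returns 1 on success, 0 if the allocation fails. */
static int strbuf_reserve(struct strbuf *s, size_t extra)
{
    size_t needed = s->length + extra + 1;
    size_t new_size;
    char *p;

    if (needed <= s->allocated)
        return 1;
    new_size = s->allocated ? s->allocated : 16;
    while (new_size < needed)
        new_size *= 2;
    p = realloc(s->buffer, new_size);  /* realloc(NULL, n) acts like malloc */
    if (p == NULL)
        return 0;
    s->buffer = p;
    s->allocated = new_size;
    return 1;
}

/* Append printf-style formatted text. Returns 1 on success, 0 on failure. */
int strbuf_addf(struct strbuf *s, const char *fmt, ...)
{
    va_list ap;
    int needed;

    if (s == NULL || fmt == NULL)
        return 0;

    /* First pass: measure only (assumes C99 vsnprintf semantics). */
    va_start(ap, fmt);
    needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (needed < 0 || !strbuf_reserve(s, (size_t)needed))
        return 0;

    /* Second pass: format directly at the end of the existing string. */
    va_start(ap, fmt);
    vsnprintf(s->buffer + s->length, (size_t)needed + 1, fmt, ap);
    va_end(ap);
    s->length += (size_t)needed;
    return 1;
}
```

Starting from a zero-initialized struct strbuf, a call such as strbuf_addf(&s, "%s-%d", name, id) appends in place; the caller still frees s.buffer when done.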

Related

Local variables and memory in c

I'm somewhat new to C and am wondering about certain things about memory allocation. My function is as follows:
size_t count_nwords(const char* str) {
//char* copied_str = strdup(str); // because 'strtok()' alters the string it passes through
char copied_str[strlen(str)];
strcpy(copied_str, str);
size_t count = 1;
strtok(copied_str, " ");
while(strtok(NULL, " ") != 0) {
count++;
}
//free(copied_str);
return count;
}
This function counts the number of words in a string (the delimiter is a space, i.e. " "). I do not want the string passed as an argument to be modified.
I have two questions:
Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy() is sufficient and faster, but I am not certain.
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the function is robust? Or is using size_t nwords = count_nwords(copied_input); completely safe and will always properly get the returned value?
Thank you!
EDIT: I've accepted the only answer that concerned my questions precisely, but I advise reading the other answers as they provide good insights regarding errors I had made in my code.

Failure to account for the null character
// char copied_str[strlen(str)];
char copied_str[strlen(str) + 1];
strcpy(copied_str, str);
Wrong algorithm
Even with above fix, code returns 1 with count_nwords(" ")
Unnecessary copying of string
strtok() not needed here. A copy of the string is not needed.
Alternative: walk the string.
size_t count_nwords(const char* str) {
size_t count = 0;
while (*str) {
while (isspace((unsigned char) *str)) {
str++;
}
if (*str) {
count++;
while (!isspace((unsigned char) *str) && *str) {
str++;
}
}
}
return count;
}

Another option is the state-loop approach where you continually loop over each character keeping track of the state of your count with a simple flag. (you are either in a word reading characters or you are reading spaces). The benefit being you have only a single loop involved. A short example would be:
size_t count_words (const char *str)
{
size_t words = 0;
int in_word = 0;
while (*str) {
if (isspace ((unsigned char)*str))
in_word = 0;
else {
if (!in_word)
words++;
in_word = 1;
}
str++;
}
return words;
}
It is worth understanding all techniques. isspace requires the inclusion of ctype.h.

Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy() is sufficient and faster, but I am not certain.
Your solution is clean and works well, so don't bother. The only caveat is that you are using a VLA, which has been an optional feature since C11, so the strdup version depends less on which standard your compiler implements. Regarding performance: since it is not specified how VLAs are implemented, it may vary from one compiler or platform to another (gcc is known to put VLAs on the stack, but another compiler may use the heap). We only know that strdup allocates on the heap, that's all. I doubt that a performance problem will come from such a choice.
Note: your allocation size is wrong and should be at least strlen(str)+1.
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the function is robust? Or is using size_t nwords = count_nwords(copied_input); completely safe and will always properly get the returned value?
Managing return values, and the memory that holds them, is the compiler's concern. Usually these values are transferred to and from the caller on the stack (read up on "stack frame"). As you may suspect, space is allocated on the stack for the value just before the call and is deallocated after the call (as soon as you discard or copy the returned value). The value is returned by copy, so size_t nwords = count_nwords(copied_input); is completely safe.

How do I write a C function that returns a variable-length string?

I need to be able to check in a kernel module whether or not a file descriptor, dentry, or inode falls under a certain path. To do this, I am going to have to write a function that when given a dentry or a file descriptor (not sure which, yet), will return said object's full path name.
What is the way to write a function that returns variable-length strings?

You can try like this:
char *myFunction(void)
{
char *word;
word = malloc (some_random_length); /* size in bytes, including the terminator */
//add some random characters
return word;
}
You can also refer related thread: best practice for returning a variable length string in c

The typical way to do this in C, is not to return anything at all:
void func (char* buf, size_t buf_size, size_t* length);
Where buf is a pointer to the buffer which will hold the string, allocated by the caller. buf_size is the size of that buffer. And length is how much of that buffer that the function used.
You could return a pointer to buf as done by for example strcpy. But this doesn't make much sense, since the same pointer already exists in one of the parameters. It adds nothing but confusion.
(Don't use strcpy, strcat etc functions as some role model for how to write functions. Many C standard library functions have obscure prototypes, because they are so terribly old, from a time when good programming practice wasn't invented, or at least not known by Dennis Ritchie.)
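As a rough illustration of that prototype, here is a sketch with a hypothetical function join_path that joins two path components. One assumption that goes slightly beyond the description above: *length reports how many characters the full result needs, so the caller can tell when the buffer was too small.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: writes "dir/name" into the caller's buffer.
   *length receives the number of characters the full result needs,
   excluding the terminator. The result is complete only when
   *length < buf_size; otherwise buf is set to the empty string. */
void join_path(char *buf, size_t buf_size, size_t *length,
               const char *dir, const char *name)
{
    size_t dlen = strlen(dir);
    size_t nlen = strlen(name);
    size_t total = dlen + 1 + nlen;   /* +1 for the '/' separator */

    *length = total;
    if (buf_size == 0)
        return;                       /* nothing can be written at all */
    if (total < buf_size) {
        memcpy(buf, dir, dlen);
        buf[dlen] = '/';
        memcpy(buf + dlen + 1, name, nlen + 1);  /* copies the terminator too */
    } else {
        buf[0] = '\0';
    }
}
```

The caller owns the buffer throughout, so nothing needs to be freed and the function itself never allocates.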

There are two common approaches:
One is to have a fixed size buffer to store the result:
int makeFullPath(char *buffer,size_t max_size,...)
{
int actual_size = snprintf(buffer,max_size,...);
return actual_size;
}
Examples of standard functions which use this approach are strncpy() and snprintf(). This approach has the advantage that no dynamic memory allocation is needed, which will give better performance for time-critical functions. The downside is that it puts more responsibility on the caller to be able to determine the largest possible result size in advance or be ready to reallocate if a larger size is necessary.
The second common approach is to calculate how big of a buffer to use and allocate that many bytes internally:
// Caller eventually needs to free() the result.
char* makeFullPath(...)
{
size_t max_size = calculateFullPathSize(...);
char *buffer = malloc(max_size);
if (!buffer) return NULL;
int actual_size = snprintf(buffer,max_size,...);
assert(actual_size<max_size);
return buffer;
}
An example of a standard function that uses this approach is strdup(). The advantage is that the caller no longer needs to worry about the size, but they now need to make sure that they free the result. For a kernel module, you would use kmalloc() and kfree() instead of malloc() and free().
A less common approach is to have a static buffer:
const char *makeFullPath(...)
{
static char buffer[MAX_PATH];
int actual_size = snprintf(buffer,MAX_PATH,...);
return buffer;
}
This avoids the caller having to worry about the size or freeing the result, and it is also efficient, but it has the downside that the caller now has to make sure that they don't call the function a second time while the result of the first call is still being used.
const char *result1 = makeFullPath(...);
const char *result2 = makeFullPath(...);
printf("%s",result1);
printf("%s",result2); /* oops! */
Here, the caller probably meant to print two separate strings, but they'll actually just get the second string twice. This is also problematic in multi-threaded code, and probably unusable for kernel code.

For example:
char * fn( int file_id )
{
static char res[MAX_PATH];
// fill res[]
return res;
}

/*
let's do it the BSTR way (BasicString of VB)
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char * CopyString(const char *str){
unsigned short len;
char *buff;
len=(unsigned short)strlen(str); /* assumes the length fits in a short */
buff=malloc(sizeof(short)+len+1);
if(buff){
((short*)buff)[0]=len+1; /* stored size counts the terminating null */
buff=(char*)&((short*)buff)[1]; /* the characters start right after the prefix */
strcpy(buff,str);
}
return buff;
}
#define len_of_string(s) (((short*)(s))[-1])
#define free_string(s) free(&((short*)(s))[-1])
int main(){
char *buff=CopyString("full_path_name");
if(buff){
printf("len of string= %d\n",len_of_string(buff));
free_string(buff);
}else{
printf("Error: malloc failed\n");
}
return 0;
}
/*
now you can imagine how to reallocate the string to a new size
*/

How to handle less size situation in string concatination?

I'm working on C and want to implements a string concatenation function.
I implemented following function:
void mystr_concat(char* dest, char* src)
{
char* temp = dest;
while(*temp)
{
temp++;
}
while(*src)
{
*temp++ = *src;
src++;
}
*temp = '\0';
return;
}
This function appends the src string to the dest string.
The problem arises when the caller passes a dest buffer that is too small to hold the appended src string.
For example, suppose the caller has these strings and invokes the function:
char dest[6] = "abcnd";
char src[100] = "zdfhjksdfskdfsdfsdfj";
mystr_concat(dest, src);
How can I detect this condition inside the function, and what is the right way to handle it?

C does not perform any bounds checks on array references. If you need this done, you will need to either pass the maximum size of the destination array into the function and then verify that the source will fit (or decide to truncate it if required), or introduce an additional data structure that tracks the length of strings, in the way that typical Pascal implementations prefix each string with its length.
Neither solution is automatic. Getting these checks done for you requires a language like Java or C#, which prevents the use of unsafe constructs.

Notice that strings, when passed as function arguments, decay to char pointers. There is no portable way in C to know at runtime the size of a memory zone from a pointer to it. So your mystr_concat cannot know the size of dest unless you give it that size somehow, e.g. by passing it as an additional function parameter:
void mystr_concat(char* dest, size_t destsize, const char* src);
Then you might call it with e.g.
char destbuf[36];
strncpy (destbuf, "start", sizeof(destbuf) /*i.e. 36*/);
mystr_concat(destbuf, sizeof(destbuf), "some-more");
Notice that the standard snprintf(3) function has a similar convention.
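A possible body for such a size-aware mystr_concat might look like the sketch below. It is only one way to do it, and two choices are assumptions of mine: it returns an int status instead of void, and it refuses to append (leaving dest unchanged) rather than truncating.

```c
#include <stddef.h>
#include <string.h>

/* Appends src to dest, where dest is a buffer of destsize bytes.
   Returns 1 if src was appended in full, 0 if it did not fit
   (dest is left unchanged in that case). */
int mystr_concat(char *dest, size_t destsize, const char *src)
{
    const char *end;
    size_t dlen, slen;

    if (dest == NULL || src == NULL || destsize == 0)
        return 0;

    /* Locate dest's terminator without reading past destsize bytes. */
    end = memchr(dest, '\0', destsize);
    if (end == NULL)
        return 0;                     /* dest is not terminated in range */

    dlen = (size_t)(end - dest);
    slen = strlen(src);
    if (slen >= destsize - dlen)
        return 0;                     /* no room for src plus terminator */

    memcpy(dest + dlen, src, slen + 1);
    return 1;
}
```

With the dest[6] = "abcnd" case from the question, the call returns 0 instead of writing past the end of the array.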
Another possible way is to decide that your function returns a heap allocated string and that it is the responsibility of its caller to free that string. Then you could code
char *my_string_catenation (const char* s1, const char* s2)
{
if (!s1||!s2) return NULL;
size_t s1len = strlen(s1);
size_t s2len = strlen(s2);
char* res = malloc(s1len+s2len+1);
if (!res)
{ perror("my_string_catenation malloc"); exit(EXIT_FAILURE); };
memcpy (res, s1, s1len);
memcpy (res+s1len, s2, s2len);
res[s1len+s2len] = (char)0;
return res;
}
You then might code things like
char buf[32];
snprintf(buf, sizeof(buf), "+%d", i);
char* catstr = my_string_catenation((i>30)?"foo":"boo", buf);
do_something_with(catstr);
free(catstr), catstr = NULL;
The above example is a bit stupid, since one could just use snprintf without my_string_catenation but I can't think of a short better example.
It is common in C libraries to have conventions about who is responsible to free some heap-allocated data. You should document such conventions (at least in comments in the header files declaring them).
Perhaps you might be interested in using Boehm's conservative garbage collector; you'll then use GC_MALLOC instead of malloc etc... and you won't bother about free ...

How to use the pointer given from MapViewOfFile when its not ending with a Null termination character?

I'm new to C/C++ and I need help.
So this is what I do:
HANDLE OpenH = CreateFile(filePath,GENERIC_READ,FILE_SHARE_READ,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL);
HANDLE hMapFile = CreateFileMapping(OpenH, NULL,PAGE_READONLY,0,0,FFD.cFileName);
char * pBuf = (char *) MapViewOfFile(hMapFile,FILE_MAP_READ,0,0,0);
if(strstr(pBuf,searchword) != 0)
I am mapping text files with a lot of data in them, and I have to use the pointer pBuf in a strstr, strcmp, strtok, and other functions, but the pBuf doesn't have a null terminating character at the end of it and it gives me access violation every time I use it in a function. So how can I use pBuf?

I guess strstr is not the right function for this purpose...
You could use memchr to locate the first character and afterwards check, if the rest fits:
int chr = 'a';
//DWORD size = GetFileSize(OpenH, NULL);
unsigned length = ... // obtain length of pBuf
const void* ptr = memchr(pBuf, chr, length);
if(ptr != NULL){
....
}
This is a little bit tedious, but will lead to a result eventually.
EDIT:
Using the Win32 GetFileSize function you can determine the current size of the file. This should work with a mapped file, too.

Simply, you need to know how big the file is, information that is readily available. And then you need to use a search function that does not rely on the buffer being null-terminated. If your library does not have such a function then I guess you will need to write your own.
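If you end up writing your own, a length-bounded search could be sketched roughly as follows. The name find_bytes is hypothetical; it uses only memchr and memcmp, so it never reads past the length it is given:

```c
#include <stddef.h>
#include <string.h>

/* Finds the first occurrence of needle (nlen bytes) inside haystack
   (hlen bytes). Neither buffer needs to be null-terminated.
   Returns a pointer to the match, or NULL if there is none. */
const char *find_bytes(const char *haystack, size_t hlen,
                       const char *needle, size_t nlen)
{
    const char *p = haystack;
    size_t remaining = hlen;

    if (nlen == 0)
        return haystack;
    while (remaining >= nlen) {
        /* Only look at positions where a full match could still start. */
        const char *hit = memchr(p, needle[0], remaining - nlen + 1);
        if (hit == NULL)
            return NULL;
        if (memcmp(hit, needle, nlen) == 0)
            return hit;
        remaining -= (size_t)(hit - p) + 1;
        p = hit + 1;
    }
    return NULL;
}
```

For the mapped file this would be called as find_bytes(pBuf, size, searchword, strlen(searchword)), with size being the file size obtained from the file handle.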

best practice for returning a variable length string in c

I have a string function that accepts a pointer to a source string and returns a pointer to a destination string. This function currently works, but I'm worried I'm not following the best practice regarding malloc, realloc, and free.
The thing that's different about my function is that the length of the destination string is not the same as the source string, so realloc() has to be called inside my function. I know from looking at the docs...
http://www.cplusplus.com/reference/cstdlib/realloc/
that the memory address might change after the realloc. This means I can't "pass by reference" like a C programmer might for other functions; I have to return the new pointer.
So the prototype for my function is:
//decode a uri encoded string
char *net_uri_to_text(char *);
I don't like the way I'm doing it because I have to free the pointer after running the function:
char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);
Which means that malloc() and realloc() are called inside my function and free() is called outside my function.
I have a background in high level languages, (perl, plpgsql, bash) so my instinct is proper encapsulation of such things, but that might not be the best practice in C.
The question: Is my way best practice, or is there a better way I should follow?
full example
Compiles and runs with two warnings on unused argc and argv arguments, you can safely ignore those two warnings.
example.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *net_uri_to_text(char *);
int main(int argc, char ** argv) {
char * chr_input = "testing123%5a%5b%5cabc";
char * chr_output = net_uri_to_text(chr_input);
printf("%s\n", chr_output);
free(chr_output);
return 0;
}
//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
//define variables
int int_length = strlen(chr_input);
int int_new_length = int_length;
char * chr_output = malloc(int_length);
char * chr_output_working = chr_output;
char * chr_input_working = chr_input;
int int_output_working = 0;
unsigned int uint_hex_working;
//while not a null byte
while(*chr_input_working != '\0') {
//if %
if (*chr_input_working == *"%") {
//then put correct char in
sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
*chr_output_working = (char)uint_hex_working;
//printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
//realloc
chr_input_working++;
chr_input_working++;
int_new_length -= 2;
chr_output = realloc(chr_output, int_new_length);
//output working must be the new pointer plus how many chars we've done
chr_output_working = chr_output + int_output_working;
} else {
//put char in
*chr_output_working = *chr_input_working;
}
//increment pointers and number of chars in output working
chr_input_working++;
chr_output_working++;
int_output_working++;
}
//last null byte
*chr_output_working = '\0';
return chr_output;
}

It's perfectly ok to return malloc'd buffers from functions in C, as long as you document the fact that they do. Lots of libraries do that, even though no function in the standard library does.
If you can compute (a not too pessimistic upper bound on) the number of characters that need to be written to the buffer cheaply, you can offer a function that does that and let the user call it.
It's also possible, but much less convenient, to accept a buffer to be filled in; I've seen quite a few libraries that do that like so:
/*
* Decodes the uri-encoded string encoded into buf of length len (including NUL).
* Returns the number of characters needed (including NUL). If that number is
* greater than len, the output is incomplete and you should try again with a
* larger buffer.
*/
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
size_t space_needed = 0;
while (decoding_needs_to_be_done()) {
// decode characters, but only write them to buf
// if it wouldn't overflow;
// increment space_needed regardless
}
return space_needed;
}
Now the caller is responsible for the allocation, and would do something like
size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
size_t needed;
char *result = xmalloc(len);
needed = net_uri_to_text(input, result, len);
if (needed > len) {
// try again with a buffer of the reported size
result = xrealloc(result, needed);
net_uri_to_text(input, result, needed);
}
(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip NULL checks.)
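One way the skeleton could be filled in is sketched below. Several things here are my assumptions rather than part of the answer: the return value counts the terminating NUL, malformed escapes are copied through unchanged, hex_value assumes an ASCII-compatible character set, and decode_alloc is a hypothetical convenience wrapper that measures first and allocates second.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Value of one hex digit; the caller has already checked isxdigit().
   Assumes 'a'..'f' are contiguous, as in ASCII. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes `encoded` into buf (len bytes). Returns the buffer size the
   full result needs, including the NUL. Output is truncated but still
   terminated when len is too small; buf may be NULL when len is 0. */
size_t net_uri_to_text(const char *encoded, char *buf, size_t len)
{
    size_t needed = 0;

    while (*encoded) {
        char c;
        if (encoded[0] == '%' && isxdigit((unsigned char)encoded[1])
                              && isxdigit((unsigned char)encoded[2])) {
            c = (char)(hex_value(encoded[1]) * 16 + hex_value(encoded[2]));
            encoded += 3;
        } else {
            c = *encoded++;           /* plain or malformed: copy as-is */
        }
        if (needed + 1 < len)         /* keep one byte free for the NUL */
            buf[needed] = c;
        needed++;
    }
    if (len > 0)
        buf[needed < len ? needed : len - 1] = '\0';
    return needed + 1;
}

/* Hypothetical wrapper: measure, allocate, decode. Caller frees. */
char *decode_alloc(const char *encoded)
{
    size_t need = net_uri_to_text(encoded, NULL, 0);
    char *out = malloc(need);

    if (out != NULL)
        net_uri_to_text(encoded, out, need);
    return out;
}
```

Calling the function once with a NULL buffer and len 0 gives the exact size, which avoids guessing a "usually long enough" value at the cost of decoding twice.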

The thing is that C is low-level enough to force the programmer to get her memory management right. In particular, there's nothing wrong with returning a malloc()ated string. It's a common idiom to return malloc()ated objects and have the caller free() them.
And anyways, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (after the last use, it will still need to be free()d, though).
One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there's obviously enough room for it in the memory chunk of the old string, so you don't need to realloc().
(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)
And, as always, you can just return a different pointer each time the function is called, and you don't even need to call realloc() at all.
If you accept one last piece of good advice: it's advisable to const-qualify your input strings, so the caller can ensure that you don't modify them. Using this approach, you can safely call the function on string literals, for example.
All in all, I'd rewrite your function like this:
char *unescape(const char *s)
{
size_t l = strlen(s);
char *p = malloc(l + 1), *r = p;
if (!p)
return NULL; // allocation failed
while (*s) {
if (*s == '%' && s[1] && s[2]) { // only decode when two characters follow
char buf[3] = { s[1], s[2], 0 };
*p++ = (char)strtol(buf, NULL, 16); // yes, I prefer this over scanf()
s += 3;
} else {
*p++ = *s++;
}
}
*p = 0;
return r;
}
And call it as follows:
int main()
{
const char *in = "testing123%5a%5b%5cabc";
char *out = unescape(in);
printf("%s\n", out);
free(out);
return 0;
}

It's perfectly OK to return newly-malloc-ed (and possibly internally realloced) values from functions, you just need to document that you are doing so (as you do here).
Other obvious items:
Instead of int int_length you might want to use size_t. This is "an unsigned type" (usually unsigned int or unsigned long) that is the appropriate type for lengths of strings and arguments to malloc.
You need to allocate n+1 bytes initially, where n is the length of the string, as strlen does not include the terminating 0 byte.
You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
sscanf is pretty heavy-weight for converting the two hex bytes. Not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? you can of course decide that this is the caller's problem but in general you might want to handle that). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desirable. Note that if you allocate (say) 50 bytes for a string and find it requires only 49 including the final 0 byte, it may not be worth doing a realloc after all.
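For the isxdigit/strtoul suggestion, a small helper along these lines might do. The name decode_hex_pair is hypothetical; it expects a pointer to the two characters that follow the '%':

```c
#include <ctype.h>
#include <stdlib.h>

/* Decodes the two hex digits at s[0] and s[1].
   Returns 1 and stores the byte in *out, or returns 0 (leaving *out
   untouched) if either character is not a hex digit. */
int decode_hex_pair(const char *s, unsigned char *out)
{
    char digits[3];

    /* The first test fails on a '\0', so s[1] is never read past the end. */
    if (!isxdigit((unsigned char)s[0]) || !isxdigit((unsigned char)s[1]))
        return 0;

    digits[0] = s[0];
    digits[1] = s[1];
    digits[2] = '\0';
    *out = (unsigned char)strtoul(digits, NULL, 16);
    return 1;
}
```

Checking with isxdigit first matters because strtoul on its own would accept input such as a leading sign or whitespace and would silently return 0 for non-hex text.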

I would approach the problem in a slightly different way. Personally, I would split your function in two. The first function to calculate the size you need to malloc. The second would write the output string to the given pointer (which has been allocated outside of the function). That saves several calls to realloc, and will keep the complexity the same. A possible function to find the size of the new string is:
int getNewSize (const char *string) {
const char *i;
int size = 0, percent = 0;
for (i = string; *i != '\0'; i++, size++) {
if (*i == '%')
percent++;
}
return size - percent * 2; /* excludes the terminating NUL, so malloc one more */
}
However, as mentioned in other answers there is no problem in returning a malloc'ed buffer as long as you document it!

In addition to what was already mentioned in the other answers, you should also document the fact that the string is reallocated. If your code is called with a static string or a string allocated with alloca, you must not reallocate it.

I think you are right to be concerned about splitting up mallocs and frees. As a rule, whatever makes it, owns it and should free it.
In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any possible string it could contain. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10000 characters you can store any possible URL.
Another trick is to store both the length and capacity of the string at its front, so that *(int *)mystring == length of string and *(int *)(mystring + 4) == capacity of string (assuming a 4-byte int). Thus, the string itself only starts at the 8th position, mystring + 8. By doing this you can pass around a single pointer to a string and always know how long it is and how much memory capacity the string has. You can make macros that automatically generate these offsets and make "pretty code".
The value of using buffers this way is you do not need to do a reallocation. The new value overwrites the old value and you update the length at the beginning of the string.
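A sketch of that layout, with a few deliberate departures that are my own assumptions: the header is a struct of two size_t fields instead of ints at fixed byte offsets (which sidesteps alignment and int-size questions), and the pointer handed around points at the characters, with the header sitting just before them. All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Header stored immediately before the characters. */
struct str_header {
    size_t length;    /* current length, excluding the terminator */
    size_t capacity;  /* maximum length the buffer can hold */
};

/* Step back from the character pointer to its header. */
#define STR_HEADER(s) ((struct str_header *)(s) - 1)

/* Allocates an empty string able to hold `capacity` characters. */
char *str_new(size_t capacity)
{
    struct str_header *h = malloc(sizeof *h + capacity + 1);
    char *s;

    if (h == NULL)
        return NULL;
    h->length = 0;
    h->capacity = capacity;
    s = (char *)(h + 1);
    s[0] = '\0';
    return s;
}

/* Overwrites the contents; returns 0 if value does not fit. */
int str_set(char *s, const char *value)
{
    size_t n = strlen(value);

    if (n > STR_HEADER(s)->capacity)
        return 0;
    memcpy(s, value, n + 1);
    STR_HEADER(s)->length = n;
    return 1;
}

void str_free(char *s)
{
    if (s != NULL)
        free(STR_HEADER(s));
}
```

Because the pointer refers to the characters themselves, such a string can still be passed to printf or strlen directly, but it must be released with str_free, never with plain free.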
