Performing lots of string concatenation in C? - c

I'm porting some code from Java to C, and so far things have gone well.
However, I have a particular function in Java that makes liberal use of StringBuilder, like this:
StringBuilder result = new StringBuilder();
// .. build string out of variable-length data
for (SolObject object : this) {
result.append(object.toString());
}
// .. some parts are conditional
if (freezeCount < 0) result.append("]");
else result.append(")");
I realize SO is not a code translation service, but I'm not asking for anyone to translate the above code.
I'm wondering how to efficiently perform this type of mass string concatenation in C. It's mostly small strings, but each is determined by a condition, so I can't combine them into a simple sprintf call.
How can I reliably do this type of string concatenation?

A rather "clever" way to convert a number of "objects" to strings is:
char buffer[100];
char *str = buffer;
str += sprintf(str, "%06d", 123);
str += sprintf(str, "%s=%5.2f", "x", 1.234567);
This is fairly efficient, since sprintf returns the length of the string copied, so we can "move" str forward by the return value, and keep filling in.
Of course, if there are true Java Objects, then you'll need to figure out how to turn a Java-style toString method into a "%something" conversion in C's printf family.
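The same pointer-advancing idea can be made bounds-checked with snprintf, which reports how many characters it wanted to write. Below is a minimal sketch of the loop from the question. The name join_items, its parameters, and the -1 error convention are illustrative assumptions, not part of any library.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes every string in items[] into buf, then a
 * closing "]" or ")" depending on freeze_count, as in the question.
 * Returns the resulting length, or -1 if buf (size bytes) is too small. */
int join_items(char *buf, size_t size,
               const char *const *items, size_t n, int freeze_count)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i <= n; i++) {
        /* The last iteration appends the conditional closing bracket. */
        const char *piece = (i < n) ? items[i]
                                    : (freeze_count < 0 ? "]" : ")");
        int written = snprintf(buf + used, size - used, "%s", piece);

        /* snprintf returns the length it needed; a value greater than or
         * equal to the remaining space means the output was truncated. */
        if (written < 0 || (size_t)written >= size - used)
            return -1;
        used += (size_t)written;
    }
    return (int)used;
}
```

Because each piece is chosen before the snprintf call, any number of conditions can decide what gets appended without building one giant format string.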

The performance problem with strcat() is that it has to scan the destination string to find the terminating '\0' before it can start appending to it.
But remember that strcat() doesn't take strings as arguments, it takes pointers.
If you maintain a separate pointer that always points to the terminating '\0' of the string you're appending to, you can use that pointer as the first argument to strcat(), and it won't have to re-scan it every time. For that matter, you can use strcpy() rather than strcat().
Maintaining the value of this pointer and ensuring that there's enough room are left as an exercise.
NOTE: you can use strncat() to avoid overwriting the end of the destination array (though it will silently truncate your data). I don't recommend using strncpy() for this purpose. See my rant on the subject.
If your system supports them, the (non-standard) strlcpy() and strlcat() functions can be useful for this kind of thing. They both return the total length of the string they tried to create. But their use makes your code less portable; on the other hand, there are open-source implementations that you can use anywhere.
Another solution is to call strlen() on the string you're appending. This isn't ideal, since it's then scanned twice, once by strcat() and once by strlen() -- but at least it avoids re-scanning the entire destination string.
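The end-pointer idea described above can be sketched roughly as follows. append_at is a hypothetical helper (POSIX's stpcpy behaves similarly), and it assumes the caller has already made sure the destination has enough room.

```c
#include <string.h>

/* Hypothetical helper: copies src to the position end points at (the
 * current terminating '\0') and returns a pointer to the new terminator.
 * Assumes the caller guarantees enough room in the destination. */
char *append_at(char *end, const char *src)
{
    size_t len = strlen(src);
    memcpy(end, src, len + 1);   /* +1 copies the terminating '\0' too */
    return end + len;
}
```

Start with end pointing at an empty string's terminator and reassign it on every call (end = append_at(end, piece);). The destination is never re-scanned, and end minus the start of the buffer is always the current length.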

The cause of poor performance when concatenating strings is the reallocation of memory. Joel Spolsky discusses this in his article Back to basics. He describes the naive method of concatenating strings:
Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.
The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.
The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"
"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"
If you can, you want to know how large your destination buffer needs to be before allocating it. The only realistic way to do this is to call strlen on all of the strings you want to concatenate. Then allocate the appropriate amount of memory and use a slightly modified version of strncpy that returns a pointer to the end of the destination buffer.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Copies src to dest and returns a pointer to the next available
// character in the dest buffer.
// Ensures that a null terminator is at the end of dest. If
// src is larger than size then size - 1 bytes are copied
char* StringCopyEnd( char* dest, const char* src, size_t size )
{
size_t pos = 0;
if ( size == 0 ) return dest;
while ( pos < size - 1 && *src )
{
*dest = *src;
++dest;
++src;
++pos;
}
*dest = '\0';
return dest;
}
Note how you have to set the size parameter to be the number of bytes left until the end of the destination buffer.
Here's a sample test function:
void testStringCopyEnd( char* str1, char* str2, size_t size )
{
// Create an oversized buffer and fill it with A's so that
// if a string is not null terminated it will be obvious.
char* dest = (char*) malloc( size + 10 );
memset( dest, 'A', size + 10 );
char* end = StringCopyEnd( dest, str1, size );
end = StringCopyEnd( end, str2, size - ( end - dest ) );
printf( "length: %zu - '%s'\n", strlen( dest ), dest ); // %zu is the conversion for size_t
free( dest );
}
int main(void)
{
// Test with a large enough buffer size to concatenate 'Hello World'.
// and then reduce the buffer size from there
for ( int i = 12; i > 0; --i )
{
testStringCopyEnd( "Hello", " World", i );
}
return 0;
}
Which produces:
length: 11 - 'Hello World'
length: 10 - 'Hello Worl'
length: 9 - 'Hello Wor'
length: 8 - 'Hello Wo'
length: 7 - 'Hello W'
length: 6 - 'Hello '
length: 5 - 'Hello'
length: 4 - 'Hell'
length: 3 - 'Hel'
length: 2 - 'He'
length: 1 - 'H'
length: 0 - ''
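When every piece is known up front, the "measure first, allocate once" advice above can be sketched like this. concat_all is a hypothetical name; the sketch assumes the pieces arrive as an array of strings and that the caller frees the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenates n strings into one freshly allocated
 * buffer. Pass 1 measures, pass 2 copies. Returns NULL if malloc fails;
 * the caller frees the result. */
char *concat_all(const char *const *parts, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(parts[i]);

    char *result = malloc(total + 1);   /* +1 for the terminating '\0' */
    if (result == NULL)
        return NULL;

    char *end = result;                 /* always points at the write position */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        memcpy(end, parts[i], len);
        end += len;
    }
    *end = '\0';
    return result;
}
```

Each source string is scanned twice by strlen, but the destination is never re-scanned and there is exactly one allocation.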

If operations like these are very frequent, you could implement them in your own buffer class. Example (error handling omitted for brevity ;-):
struct buff {
size_t used;
size_t size;
char *data;
} ;
struct buff * buff_new(size_t size)
{
struct buff *bp;
bp = malloc (sizeof *bp);
bp->data = malloc (size);
bp->size = size;
bp->used = 0;
return bp;
}
void buff_add_str(struct buff *bp, char *add)
{
size_t len;
len = strlen(add);
/* To be implemented: buff_resize() ... */
if (bp->used + len +1 >= bp->size) buff_resize(bp, bp->used+1+len);
memcpy(bp->data + bp->used, add, len+1); /* bp, not buff: buff is the struct tag */
bp->used += len;
return;
}
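For the buff_resize() left unimplemented above, one possible approach is to double the capacity with realloc. This is only a sketch under assumptions: the doubling policy, the initial size of 16, and the int return codes are choices made for this example, and overflow of the size arithmetic is not handled.

```c
#include <stdlib.h>
#include <string.h>

struct buff {
    size_t used;   /* bytes of string data, excluding the '\0' */
    size_t size;   /* bytes allocated for data */
    char *data;
};

/* One possible buff_resize(): grows data to at least `needed` bytes,
 * doubling the capacity so repeated appends stay cheap on average.
 * Returns 0 on success, -1 if realloc fails (bp is left unchanged). */
int buff_resize(struct buff *bp, size_t needed)
{
    size_t new_size = bp->size ? bp->size : 16;
    while (new_size < needed)
        new_size *= 2;

    char *p = realloc(bp->data, new_size);
    if (p == NULL)
        return -1;
    bp->data = p;
    bp->size = new_size;
    return 0;
}

/* Append with error reporting: 0 on success, -1 on allocation failure. */
int buff_add_str(struct buff *bp, const char *add)
{
    size_t len = strlen(add);
    if (bp->used + len + 1 > bp->size &&
        buff_resize(bp, bp->used + len + 1) != 0)
        return -1;
    memcpy(bp->data + bp->used, add, len + 1);   /* includes the '\0' */
    bp->used += len;
    return 0;
}
```

A zero-initialized struct buff (data set to NULL) works as an empty buffer here, because realloc with a NULL pointer behaves like malloc.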

Given that the strings look so small, I'd be inclined just to use strcat and revisit if performance becomes an issue.
You could make your own method that remembers the string length so it doesn't need to iterate through the string to find the end (which is potentially the slow bit of strcat if you are doing lots of appends to long strings)

Related

C - Using sprintf() to put a prefix inside of a string

I'm trying to use sprintf() to put a string "inside itself", so I can change it to have an integer prefix. I was testing this on a character array of length 12 with "Hello World" inside it already.
The basic premise is that I want a prefix that denotes the number of words within a string. So I copy 11 characters into a character array of length 12.
Then I try to put the integer followed by the string itself by using "%i%s" in the function. To get past the integer (I don't just use myStr as the argument for %s), I make sure to use myStr + snprintf(NULL, 0, "%i", wordCount), which should be myStr + characters taken up by the integer.
The problem I'm having is that it eats the 'H' when I do this and prints "2ello World" instead of putting the '2' right beside the "Hello World".
So far I've tried different options for getting "past the integer" in the string when I try to copy it inside itself, but nothing really seems to be the right case, as it either comes out as an empty string or just the integer prefix itself '222222222222' copied throughout the entire array.
int main() {
char myStr[12];
strcpy(myStr, "Hello World");//11 Characters in length
int wordCount = 2;
//Put the integer wordCount followed by the string myStr (past whatever amount of characters the integer would take up) inside of myStr
sprintf(myStr, "%i%s", wordCount, myStr + snprintf(NULL, 0, "%i", wordCount));
printf("\nChanged myStr '%s'\n", myStr);//Prints '2ello World'
return 0;
}
First, to insert a one-digit prefix into a string “Hello World”, you need a buffer of 13 characters—one for the prefix, eleven for the characters in “Hello World”, and one for the terminating null character.
Second, you should not pass a buffer to snprintf as both the output buffer and an input string. Its behavior is not defined by the C standard when objects passed to it overlap.
Below is a program that shows you how to insert a prefix by moving the string with memmove. This is largely tutorial, as it is not generally a good way to manipulate strings. For short strings, where space is not an issue, most programmers would simply print the desired string into a temporary buffer, avoiding overlap issues.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Insert a decimal numeral for Prefix into the beginning of String.
Length specifies the total number of bytes available at String.
*/
static void InsertPrefix(char *String, size_t Length, int Prefix)
{
// Find out how many characters the numeral needs.
int CharactersNeeded = snprintf(NULL, 0, "%i", Prefix);
// Find the current string length.
size_t Current = strlen(String);
/* Test whether there is enough space for the prefix, the current string,
and the terminating null character.
*/
if (Length < CharactersNeeded + Current + 1)
{
fprintf(stderr,
"Error, not enough space in string to insert prefix.\n");
exit(EXIT_FAILURE);
}
// Move the string to make room for the prefix.
memmove(String + CharactersNeeded, String, Current + 1);
/* Remember the first character, because snprintf will overwrite it with a
null character.
*/
char Temporary = String[0];
// Write the prefix, including a terminating null character.
snprintf(String, CharactersNeeded + 1, "%i", Prefix);
// Restore the first character of the original string.
String[CharactersNeeded] = Temporary;
}
int main(void)
{
char MyString[13] = "Hello World";
InsertPrefix(MyString, sizeof MyString, 2);
printf("Result = \"%s\".\n", MyString);
}
The best way to deal with this is to create another buffer to output to, and then if you really need to copy back to the source string then copy it back once the new copy is created.
There are other ways to "optimise" this if you really needed to, like putting your source string into the middle of the buffer so you can append and change the string pointer for the source (not recommended, unless you are running on an embedded target with limited RAM and the buffer is huge). Remember code is for people to read so best to keep it clean and easy to read.
#include <stdio.h>
#include <string.h>
#define MAX_BUFFER_SIZE 128
int main() {
char srcString[MAX_BUFFER_SIZE];
char destString[MAX_BUFFER_SIZE];
strncpy(srcString, "Hello World", MAX_BUFFER_SIZE);
int wordCount = 2;
snprintf(destString, MAX_BUFFER_SIZE, "%i%s", wordCount, srcString);
printf("Changed string '%s'\n", destString);
// Or if you really want the string put back into srcString then:
strncpy(srcString, destString, MAX_BUFFER_SIZE);
printf("Changed string in source '%s'\n", srcString);
return 0;
}
Notes:
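To guard against buffer overflows, use strncpy and snprintf rather than strcpy and sprintf. Keep in mind that strncpy does not null-terminate the destination when the source is at least as long as the size you pass.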

Concatenate char array and char

I am new to the C language. I need to concatenate a char array and a char. In Java we can use the '+' operator, but in C that is not allowed. strcat and strcpy are also not working for me. How can I achieve this? My code is as follows:
void myFunc(char prefix[], struct Tree *root) {
char tempPrefix[30];
strcpy(tempPrefix, prefix);
char label = root->label;
//I want to concat tempPrefix and label
My problem differs from "concatenate char array in C" because that question concatenates one char array with another, whereas mine concatenates a char array with a single char.
Rather simple really. The main concern is that tempPrefix should have enough space for the prefix plus the extra character. Since C strings must be null-terminated, your function shouldn't copy more than 28 characters of the prefix: that's 30 (the size of the buffer) - 1 (the root label character) - 1 (the terminating null character). Fortunately the standard library has strncpy:
size_t const buffer_size = sizeof tempPrefix; // Only because tempPrefix is declared an array of characters in scope.
strncpy(tempPrefix, prefix, buffer_size - 2); // at most 28 characters, leaving room for the label and '\0'
tempPrefix[buffer_size - 2] = root->label;
tempPrefix[buffer_size - 1] = '\0';
It's also worthwhile not to hard code the buffer size in the function calls, thus allowing you to increase its size with minimum changes.
If your buffer isn't an exact fit, some more legwork is needed. The approach is pretty much the same as before, but a call to strchr is required to complete the picture.
size_t const buffer_size = sizeof tempPrefix; // Only because tempPrefix is declared an array of characters in scope.
strncpy(tempPrefix, prefix, buffer_size - 2); // at most 28 characters; shorter prefixes are padded with '\0'
tempPrefix[buffer_size - 2] = tempPrefix[buffer_size - 1] = '\0';
*strchr(tempPrefix, '\0') = root->label;
We again copy no more than 28 characters, but explicitly pad the end with NUL bytes. Since strncpy fills the buffer with NUL bytes up to count when the string being copied is shorter, in effect everything after the copied prefix is now \0. This is why I dereference the result of strchr right away: it is guaranteed to point at a valid character, the first free space to be exact.
strXXX() family of functions mostly operate on strings (except the searching related ones), so you will not be able to use the library functions directly.
You can find out the position of the existing null-terminator, replace that with the char value you want to concatenate and add a null-terminator after that. However, you need to make sure you have got enough room left for the source to hold the concatenated string.
Something like this (not tested)
#define SIZ 30
//function
char tempPrefix[SIZ] = {0}; //initialize
strcpy(tempPrefix, prefix); //copy the string
char label = root->label; //take the char value
if (strlen(tempPrefix) < (SIZ -1)) //Check: Do we have room left?
{
char *end = strchr(tempPrefix, '\0'); // find the current null (strchr returns a pointer, not an index)
end[0] = label; //replace with the value
end[1] = '\0'; //add a null after it
}
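The steps described above (find the terminator, check for room, write the char, re-terminate) can be wrapped in a small function. This is a sketch only: append_char is a hypothetical name, and it assumes buf already holds a null-terminated string.

```c
#include <string.h>

/* Hypothetical helper: appends the single character c to the string in
 * buf, which has room for size bytes in total.
 * Returns 0 on success, -1 if there is no room for c plus the '\0'. */
int append_char(char *buf, size_t size, char c)
{
    size_t len = strlen(buf);
    if (len + 2 > size)        /* need len chars + c + '\0' */
        return -1;
    buf[len] = c;
    buf[len + 1] = '\0';
    return 0;
}
```

In the question's function this would be called as append_char(tempPrefix, sizeof tempPrefix, root->label) after the prefix has been copied in.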

C - start traversing from the middle of a string

Just double checking because I keep mixing up C and C++ or C# but say that I have a string that I was parsing using strcspn(). It returns the length of the string up until the first delimiter it finds. Using strncpy (is that C++ only or was that available in C also?) I copy the first part of the string somewhere else and have a variable store my position. Let's say strcspn returned 10 (so the delimiter is the 10th character)
Now, my code does some other stuff and eventually I want to keep traversing the string. Do I have to copy the second half of the string and then call strcspn() from the beginning? Can I just make a pointer, point it at the 11th character of my string, and pass that to strcspn() (I guess something like char* pos = str[11])? Is there something simpler I'm just missing?
You can get a pointer to a location in the middle of the string and you don't need to copy the second half of the string to do it.
char * offset = str + 10;
and
char * offset = &str[10];
mean the same thing and both do what you want.
You mean str[9] for the 10th char, or str[10] for the 11th, but yes you can do that.
Just be careful that you are not accessing beyond the length of the string and beyond the size of memory allocated.
It sounds like you are performing tokenization, I would suggest that you can directly use strtok instead, it would be cleaner, and it already handles both of what you want to do (strcspn+strncpy and continue parsing after the delimiter).
You can call strcspn again with (str + 11) as the first argument, but make sure that the length of str is greater than 11.
n = strcspn(str, pattern);
while ((n+1) < strlen(str))
{
n2 = strcspn(str + n + 1, pattern); /* resume just past the delimiter at str[n] */
n += n2 + 1;
}
Note: char *pos = str[11] is wrong, because str[11] is a char, not a pointer. Use char *pos = str + 11; instead.
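Putting the pieces together, here is a rough sketch of walking a string field by field with strcspn and a moving pointer, with no copying of the remainder. count_fields is a hypothetical name; counting the fields stands in for whatever per-field work you actually need.

```c
#include <string.h>

/* Hypothetical helper: counts the fields in str separated by any
 * character in delims, advancing a pointer instead of copying the
 * remainder of the string. Empty fields are counted too. */
size_t count_fields(const char *str, const char *delims)
{
    size_t count = 0;
    const char *pos = str;

    for (;;) {
        size_t len = strcspn(pos, delims);  /* length of the current field */
        count++;
        /* The field is pos[0] .. pos[len - 1]; process it here. */
        if (pos[len] == '\0')
            break;                          /* reached the end of the string */
        pos += len + 1;                     /* step past the delimiter */
    }
    return count;
}
```

Unlike strtok, this leaves the input string unmodified and does not merge consecutive delimiters.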

strcat() for formatted strings

I'm building a string piece by piece in my program. I'm currently using strcat() when I'm adding a simple string onto the end, but when I'm adding a formatted string I'm using sprintf(), e.g.:
int one = 1;
sprintf(instruction + strlen(instruction), " number %d", one);
Is it possible to concatenate a formatted string using strcat(), or what is the preferred method for this?
Your solution will work. Calling strlen is a bit awkward (particularly if the string gets quite long). sprintf() will return the length you have used [strcat won't], so one thing you can do is something like this:
char str[MAX_SIZE];
char *target = str;
target += sprintf(target, "%s", str_value);
target += sprintf(target, "somestuff %d", number);
if (something)
{
target += sprintf(target, "%s", str_value2);
}
else
{
target += sprintf(target, "%08x", num2);
}
I'm not sure strcat is much more efficient than sprintf() is when used in this way.
Edit: should write smaller examples...
No, it's not possible, but you could use sprintf() on those simple strings and avoid calling strlen() every time:
len = 0;
len += sprintf(buf+len, "%s", str);
len += sprintf(buf+len, " number %d", one);
To answer the direct question, sure, it's possible to use strcat to append formatted strings. You just have to build the formatted string first, and then you can use strcat to append it:
#include <stdio.h>
#include <string.h>
int main(void) {
char s[100];
char s1[20];
char s2[30];
int n = 42;
double x = 22.0/7.0;
strcpy(s, "n = ");
sprintf(s1, "%d", n);
strcat(s, s1);
strcat(s, ", x = ");
sprintf(s2, "%.6f", x);
strcat(s, s2);
puts(s);
return 0;
}
Output:
n = 42, x = 3.142857
But this is not a particularly good approach.
sprintf works just as well writing to the end of an existing string. See Mats's answer and mux's answer for examples. The individual arrays used to hold individual fields are not necessary, at least not in this case.
And since this code doesn't keep track of the end of the string, the performance is likely to be poor. strcat(s1, s2) first has to scan s1 to find the terminating '\0', and then copy the contents of s2 into it. The other answers avoid this by advancing an index or a pointer to keep track of the end of the string without having to recompute it.
Also, the code makes no effort to avoid buffer overruns. strncat() can do this, but it just truncates the string; it doesn't tell you that it was truncated. snprintf() is a good choice; it returns the number of characters that it would have written (not counting the terminating '\0') if enough space were available. If this is greater than or equal to the size you specify, then the string was truncated.
/* other declarations as above */
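int count; /* snprintf returns int; a negative value signals an error */
count = snprintf(s, sizeof s, "n = %d, x = %.6f", n, x);
if (count < 0 || (size_t)count >= sizeof s) {
/* an error occurred or the string was truncated */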
}
And to append multiple strings (say, if some are appended conditionally or repeatedly), you can use the methods in the other answers to keep track of the end of the target string.
So yes, it's possible to append formatted strings with strcat(). It's just not likely to be a good idea.
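Combining the two ideas (tracking the end of the string and checking snprintf's return value) gives something like the sketch below. appendf is a hypothetical helper written for this example, not a standard function; it relies on vsnprintf from C99.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: appends formatted text at offset *len inside buf
 * (size bytes in total) and advances *len. Returns 0 on success, or -1
 * on a formatting error or if the text did not fit. On failure the
 * string in buf is left as it was before the call. */
int appendf(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    if (*len >= size)
        return -1;

    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf + *len, size - *len, fmt, args);
    va_end(args);

    if (n < 0 || (size_t)n >= size - *len) {
        buf[*len] = '\0';   /* undo any partial write */
        return -1;
    }
    *len += (size_t)n;
    return 0;
}
```

The caller keeps one size_t length variable, initialized to 0 with buf[0] set to '\0', and passes its address on every call, so conditional or repeated appends never re-scan the buffer.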
What the preferred method is depends on what you are willing to use. Instead of doing all those manual (and potentially dangerous) string operations, I would use the GString data structure from GLib or GLib's g_strdup_printf function. For your problem, GString provides the g_string_append_printf function.
Write your own wrapper for your need.
A call to this would look like this :-
result = universal_concatenator(4,result,"numbers are %d %f\n",5,16.045);
result = universal_concatenator(2,result,"tail_string");
You could define one function that takes care of deciding whether you need sprintf() or strcat(). This is what the function would look like :-
/* you should pass the number of arguments
* make sure the second argument is a pointer to the result always
* if non formatted concatenation:
* call function with number_of_args = 2
* else
* call function with number of args according to format
* that is, if five inputs to sprintf(), then 5.
*
* NOTE : Here you make an assumption that result has been allocated enough memory to
* hold your concatenated string. This assumption holds true for strcat() or
* sprintf() of your previous implementation
*/
char* universal_concatenator(int number_of_args,...)
{
va_list args_list;
va_start(args_list,number_of_args);
int counter = number_of_args;
char *result = va_arg(args_list, char*);
char *format;
if(counter == 2) /* it is a non-formatted concatenation */
{
result = strcat(result,va_arg(args_list,char*));
va_end(args_list);
return result;
}
/* else part - here you perform formatted concatenation using sprintf*/
format = va_arg(args_list,char*);
vsprintf(result + strlen(result),format,args_list);
va_end(args_list);
return result;
}
/* dont forget to include the header
* <stdarg.h> #FOR-ANSI
* or <varargs.h> #FOR-UNIX
*/
It first determines which of the two to call (strcat or sprintf) and then makes the call, leaving you free to concentrate on the actual logic of whatever you are working on!
Just ctrl+c code above and ctrl+v into your code base.
Note : Matt's answer is a good alternative for long strings. But for short string lengths(<250), this should do.

How to truncate C char*?

As simple as that. I'm on C++ btw. I've read the cplusplus.com's cstdlib library functions, but I can't find a simple function for this.
I know the length of the char, I only need to erase last three characters from it. I can use C++ string, but this is for handling files, which uses char*, and I don't want to do conversions from string to C char.
If you don't need to copy the string somewhere else and can change it
/* make sure strlen(name) >= 3 */
namelen = strlen(name); /* possibly you've saved the length previously */
name[namelen - 3] = 0;
If you need to copy it (because it's a string literal or you want to keep the original around)
/* make sure strlen(name) >= 3 */
namelen = strlen(name); /* possibly you've saved the length previously */
strncpy(copy, name, namelen - 3);
/* add a final null terminator */
copy[namelen - 3] = 0;
I think some of your post was lost in translation.
To truncate a string in C, you can simply insert a terminating null character in the desired position. All of the standard functions will then treat the string as having the new length.
#include <stdio.h>
#include <string.h>
int main(void)
{
char string[] = "one one two three five eight thirteen twenty-one";
printf("%s\n", string);
string[strlen(string) - 3] = '\0';
printf("%s\n", string);
return 0;
}
If you know the length of the string you can use pointer arithmetic to get a string with the last three characters:
const char* mystring = "abc123";
const int len = 6;
const char* substring = mystring + len - 3;
Please note that substring points to the same memory as mystring and is only valid as long as mystring is valid and left unchanged. The reason that this works is that a C string doesn't have any special markers at the beginning, only the null terminator at the end.
I interpreted your question as wanting the last three characters, getting rid of the start, as opposed to how David Heffernan read it, one of us is obviously wrong.
bool TakeOutLastThreeChars(char* src, int len) {
if (len < 3) return false;
memset(src + len - 3, 0, 3);
return true;
}
I assume mutating the string memory is safe since you did say erase the last three characters. I'm just overwriting the last three characters with "NULL" or 0.
It might help to understand how C char* "strings" work:
You start reading them from the char that the char* points to until you hit a \0 char (or simply 0).
So if I have
char* str = "theFile.nam";
then str+3 represents the string File.nam.
But you want to remove the last three characters, so you want something like:
char str2[9];
strncpy (str2,str,8); // now str2 contains "theFile.#" where # is some character you don't know about
str2[8]='\0'; // now str2 contains "theFile.\0" and is a proper char* string.
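The copy-then-terminate steps above generalize to any string and any number of trailing characters. A minimal sketch, assuming a caller-supplied destination buffer; copy_without_tail is a hypothetical name:

```c
#include <string.h>

/* Hypothetical helper: copies src into dest without its last n characters.
 * dest has room for size bytes. Returns 0 on success, -1 if src is
 * shorter than n or the result does not fit in dest. */
int copy_without_tail(char *dest, size_t size, const char *src, size_t n)
{
    size_t len = strlen(src);
    if (len < n || len - n + 1 > size)
        return -1;
    memcpy(dest, src, len - n);
    dest[len - n] = '\0';   /* memcpy copied no terminator, so add one */
    return 0;
}
```

Since the source is only read, this also works when it is a string literal, which must not be modified in place.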

Resources