Most efficient way to concatenate strings in C

Consider this simple program that concatenates all specified parameters and prints them to standard output. I used two for loops to append the strings: one to calculate the total length and one to concatenate the strings. Is there a way of doing it with only one loop? It wouldn't be more efficient to reallocate memory for each string to concatenate, would it? How would Java's StringBuilder be implemented in C? Would it loop twice as I did?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char** argv)
{
size_t len = 0;
// start for loop at i = 1 to skip the program name specified in argv
for(int i = 1; i < argc; i++)
len += strlen(argv[i]) + 1; // +1 for the space
char* toAppend = (char*)malloc(len * sizeof(char) + 1);
toAppend[0] = '\0'; // first string is empty and null terminated
for(int i = 1; i < argc; i++)
{
strcat(toAppend, argv[i]);
strcat(toAppend, " ");
}
printf(toAppend);
free(toAppend);
}

Your method of allocation is efficient, measuring the total length and allocating just once. But the concatenation loop repeatedly measures the length of the output buffer from the start to concatenate to it, resulting in quadratic runtime.
To fix it, track your position as you go:
size_t pos = 0;
for(int i = 1; i < argc; i++) {
size_t len = strlen(argv[i]);
memcpy(toAppend+pos, argv[i], len);
pos += len;
toAppend[pos] = ' ';
pos++;
}
toAppend[pos] = 0;
This is the most efficient way to actually concatenate in memory, but the most efficient of all is not to concatenate. Instead:
for(int i = 1; i < argc; i++)
printf("%s ", argv[i]);
The whole reason stdio is buffered is so you don't have to build arbitrary-length in-memory buffers to do efficient output; instead it buffers up to a fixed size automatically and flushes when the buffer is full.
Note that your usage of printf is wrong and dangerous in the event that your input contains a % character anywhere; it should be printf("%s", toAppend);.
If you're writing to POSIX (or POSIX-ish) systems rather than just plain C, another option would be fmemopen, which would allow you to write the loop just as:
for(int i = 1; i < argc; i++)
fprintf(my_memfile, "%s ", argv[i]);
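The fmemopen loop above leaves out how my_memfile is created and closed. A minimal sketch, assuming a POSIX.1-2008 system, could look like the following. The helper name join_into and the caller-supplied fixed-size buffer are illustrative assumptions, not part of the original answer.

```c
#define _POSIX_C_SOURCE 200809L /* makes fmemopen visible in <stdio.h> */
#include <stdio.h>

/* Hypothetical helper: writes argv[1..argc-1], each followed by a space,
   into buf through a memory-backed stream. Returns 0 on success and -1 if
   the stream cannot be opened or a stdio call reports an error. */
int join_into(char *buf, size_t size, int argc, char **argv)
{
    if (size < 2)
        return -1;
    /* Reserve the last byte so the result is terminated even when the
       text fills the stream completely. */
    buf[size - 1] = '\0';
    FILE *my_memfile = fmemopen(buf, size - 1, "w");
    if (my_memfile == NULL)
        return -1;
    int ok = 1;
    for (int i = 1; i < argc; i++)
        if (fprintf(my_memfile, "%s ", argv[i]) < 0)
            ok = 0;
    /* Closing flushes the stream; a null byte is written after the text
       when there is room. Output that does not fit may only be reported
       at this point, and how it is reported can vary by implementation. */
    if (fclose(my_memfile) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The stream still needs a buffer that is large enough, so this trades the manual position tracking for stdio's bookkeeping rather than removing the need to size the buffer.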

efficient way to concatenate strings in c
An efficient way is to calculate the string lengths - and remember them.
size_t sum = 1; // for \0
if (argc > 2) sum += argc - 2; // spaces
size_t length[argc]; // This is a VLA, available C99 and optionally in C11
for(int i = 1; i < argc; i++) {
    length[i] = strlen(argv[i]);
    sum += length[i];
}
Then allocate, and then check for errors.
char *dest = malloc(sum);
if (dest == NULL) Handle_OutOfMemory();
Copy each string in turn
char *p = dest;
for(int i = 1; i < argc; i++) {
    if (i > 1) {
        *p++ = ' '; // space separators go between strings, not after the last one
    }
    // Use either memcpy() or strcpy().
    // memcpy() tends to be faster for long strings than strcpy().
    memcpy(p, argv[i], length[i]);
    p += length[i]; // advance insertion point
}
*p = '\0';
Now use dest[].
printf("<%s>\n", dest);
Free resources when done.
free(dest);
It wouldn't be more efficient reallocating memory for each string to concatenate, would it?
Usually repeated re-allocations are best avoided, yet for a few short strings it really makes scant difference. Focus on big O. My answer is O(n). Reallocating in a loop tends to be O(n*n).
If performance is critical, try various approaches and profile on the intended system. What is fast on one machine may not be fast on another. Usually it is best to first code a reasonably clear approach.

The most efficient way is probably to not use any str functions and copy the characters "by hand":
char* toAppend = malloc(len + 1);
size_t j = 0;
for(size_t i = 1; i < argc; i++)
{
for(size_t k = 0; argv[i][k]; k++)
toAppend[j++] = argv[i][k];
toAppend[j++] = ' ';
}
toAppend[j ? j - 1 : 0] = '\0'; // Replace the last space (if any) with the null terminator

Related

buffer overrun while trying to link two strings together, why do I have this error?

(In C, using Visual Studio 2022 Preview.) I have to write a program that links two strings together. Here's what I did:
I wrote two for loops to count the characters of the first string and the second string.
Inside the link function, I checked whether the pointers (first and second) are null. If either is null, the function returns NULL.
I created "char *result". This is the new string to be returned. I used malloc to allocate enough memory to store nprime and nsecond characters plus 1 more (the zero terminator).
Then I checked whether result is null. If it is, the function returns NULL.
Then I wrote two for loops to link the first string and the second string. Here I got a compiler warning (I think it is raised at compile time, not while debugging): buffer overrun, the writable size is "nprime+nsecond+1" but 2 bytes might be written.
My theory is that the program is trying to write outside the result array, so data could be lost. I tried changing the allocation to "nprime+nsecond+2" instead, but it doesn't work, and it keeps showing me the same buffer overrun warning.
#include <stdlib.h>
char* link( const char* first, const char* second) {
size_t nprime = 0;
size_t nsecond = 0;
if (first == NULL) {
return NULL;
}
if (second == NULL) {
return NULL;
}
for (size_t i = 0; first[i] < '\0'; i++) {
nprime++;
}
for (size_t i = 0; second[i] < '\0'; i++) {
nsecond++;
}
char* result = malloc(nprime + nsecond + 1);
if (result == NULL) {
return NULL;
}
for (size_t i = 0; i < nprime; i++) {
result[i] = first[i];
}
for (size_t i = 0; i < nsecond; i++) {
result[nprime + i] = second[i];
}
result[nprime + nsecond] = 0;
return result;
}
this is the main:
int main(void) {
char s1[] = "this is a general string ";
char s2[] = "this is a general test.";
char* s;
s = link(s1, s2);
return 0;
}
The warning is given due to the wrong conditions you defined in the first 2 for loops. The right loops should be as follows:
for (size_t i = 0; first[i] != '\0'; i++) {
nprime++;
}
for (size_t i = 0; second[i] != '\0'; i++) {
nsecond++;
}
With the conditions you defined (i.e. first[i] < '\0'), each loop counts characters only while their code is lower than that of '\0', and exits as soon as it finds a character that does not satisfy the condition.
Since '\0' has the value 0 and ordinary ASCII characters are all greater than 0, your nprime and nsecond are never incremented, so malloc is asked for just 1 byte, which is not enough room for the chars you actually need.
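For comparison, here is a minimal sketch of the same function written with strlen and memcpy instead of hand-written counting loops. The name link_strings is an assumption chosen for this example; it also avoids a clash with the POSIX link() function on systems that declare it.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated concatenation of first and second,
   or NULL if either argument is NULL or the allocation fails.
   The caller owns the result and must free() it. */
char *link_strings(const char *first, const char *second)
{
    if (first == NULL || second == NULL)
        return NULL;
    size_t nprime = strlen(first);
    size_t nsecond = strlen(second);
    char *result = malloc(nprime + nsecond + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, first, nprime);
    memcpy(result + nprime, second, nsecond + 1); /* +1 copies the '\0' too */
    return result;
}
```

Because the lengths come from strlen, the allocation size and the number of bytes copied cannot disagree, which is the property the analyzer warning was about.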

How to add space between every characters using C

I want a space between every character of a string. For example, if I give the input "HELLO",
the result should be "H E L L O".
I need help with that.
[Edit from comments]
I want it in a string
for (i = 0; i <= strlen(str); i++) {
printf("\n String is: %s", str[i]);
printf(" ");
}
The shorter, more general answer is that you need to bump characters back, and insert a ' ' in between them. What have you done so far? Does it need to be in place?
One (perhaps not optimal, but easy to follow) solution would be to make a larger array and copy in alternating letters, something like this (not guaranteed to work verbatim):
char foo[N]; // assuming this has N characters and you want to add a space in between all of them.
char bar[2*N]; // N characters + (N - 1) spaces + '\0'
for (int i = 0; i < N; i++) {
    bar[2*i] = foo[i];
    if (i != N - 1)
        bar[2*i + 1] = ' ';
}
bar[2*N - 1] = '\0'; // terminate the new string
Of course, this new string is in bar, but functions as desired. At what point are you having issues?
Try this:
#include <stdio.h>
#include <string.h>
void add_spaces(const char need_to_add[])
{
    size_t len = strlen(need_to_add);
    char with_spaces[len*2 + 1]; // VLA: every char, a space between each pair, and '\0'
    size_t space_index = 0;
    for (size_t i = 0; i < len; i++)
    {
        with_spaces[space_index++] = need_to_add[i];
        if (i + 1 < len)
            with_spaces[space_index++] = ' '; // no space after the last char
    }
    with_spaces[space_index] = '\0'; // terminate before printing with %s
    printf("%s\n", with_spaces);
}
int main(void)
{
    const char *a = "aaa";
    add_spaces(a);
    return 0;
}
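Since the question asks for the result in a string rather than printed, here is a minimal sketch that returns a heap-allocated copy. The function name space_out is hypothetical, and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new string with one space between
   consecutive characters of src ("HELLO" becomes "H E L L O").
   Returns NULL on allocation failure; the caller must free() the result. */
char *space_out(const char *src)
{
    size_t len = strlen(src);
    /* len characters plus (len - 1) spaces; an empty input stays empty */
    size_t out_len = len ? 2 * len - 1 : 0;
    char *out = malloc(out_len + 1); /* +1 for the terminator */
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = src[i];
        if (i + 1 < len)
            out[2 * i + 1] = ' ';
    }
    out[out_len] = '\0';
    return out;
}
```

Allocating on the heap avoids returning a pointer to a local array, which would be invalid once the function returns.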

realloc() seems to affect already allocated memory

I am experiencing an issue where the invocation of realloc seems to modify the contents of another string, keyfile.
It's supposed to run through a null-terminated char* (keyfile), which contains just above 500 characters. The problem, however, is that the reallocation I perform in the while-loop seems to modify the contents of the keyfile.
I tried removing the dynamic reallocation with realloc and instead initialize the pointers in the for-loop with a size of 200*sizeof(int) instead. The problem remains, the keyfile string is modified during the (re)allocation of memory, and I have no idea why. I have confirmed this by printing the keyfile-string before and after both the malloc and realloc statements.
Note: The keyfile only contains the characters a-z: no digits, spaces, linebreaks or uppercase. It is text made up solely of the 26 lowercase letters.
int **getCharMap(const char *keyfile) {
char *alphabet = "abcdefghijklmnopqrstuvwxyz";
int **charmap = malloc(26*sizeof(int));
for (int i = 0; i < 26; i++) {
charmap[(int) alphabet[i]] = malloc(sizeof(int));
charmap[(int) alphabet[i]][0] = 0; // place a counter at index 0
}
int letter;
int count = 0;
unsigned char c = keyfile[count];
while (c != '\0') {
int arr_count = charmap[c][0];
arr_count++;
charmap[c] = realloc(charmap[c], (arr_count+1)*sizeof(int));
charmap[c][0] = arr_count;
charmap[c][arr_count] = count;
c = keyfile[++count];
}
// Just inspecting the results for debugging
printf("\nCHARMAP\n");
for (int i = 0; i < 26; i++) {
letter = (int) alphabet[i];
printf("%c: ", (char) letter);
int count = charmap[letter][0];
printf("%d", charmap[letter][0]);
if (count > 0) {
for (int j = 1; j < count+1; j++) {
printf(",%d", charmap[letter][j]);
}
}
printf("\n");
}
exit(0);
return charmap;
}
charmap[(int) alphabet[i]] = malloc(sizeof(int));
charmap[(int) alphabet[i]][0] = 0; // place a counter at index 0
You are writing beyond the end of your charmap array. So, you are invoking undefined behaviour and it's not surprising that you are seeing weird effects.
You are using the character codes as an index into the array, but they do not start at 0! In ASCII they start at 97, the code for 'a', so you write to charmap[97] through charmap[122] while only indices 0 to 25 exist.
You should use alphabet[i] - 'a' as your array index.
The following piece of code is a source of troubles:
int **charmap = malloc(26*sizeof(int));
for (int i = 0; i < 26; i++)
charmap[...] = ...;
If sizeof(int) < sizeof(int*), then it will be performing illegal memory access operations.
For example, on 64-bit platforms, the case is usually sizeof(int) == 4 < 8 == sizeof(int*).
Under that scenario, by writing into charmap[13...25], you will be accessing unallocated memory.
Change this:
int **charmap = malloc(26*sizeof(int));
To this:
int **charmap = malloc(26*sizeof(int*));
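Both answers apply at once. The following is a minimal sketch of the function with the two fixes combined: the table is sized in pointers and indexed by letter offset. The name build_char_map is hypothetical, the debug printing is omitted, and an ASCII-like character set where 'a' to 'z' are contiguous is assumed.

```c
#include <stdlib.h>

/* charmap[k][0] holds the number of occurrences of the letter 'a' + k,
   followed by the positions where it occurs in keyfile.
   Returns NULL on allocation failure; for brevity, earlier allocations
   are not released in that case. */
int **build_char_map(const char *keyfile)
{
    int **charmap = malloc(26 * sizeof *charmap); /* 26 pointers, not 26 ints */
    if (charmap == NULL)
        return NULL;
    for (int k = 0; k < 26; k++) {
        charmap[k] = malloc(sizeof **charmap);
        if (charmap[k] == NULL)
            return NULL;
        charmap[k][0] = 0; /* counter at index 0 */
    }
    for (int pos = 0; keyfile[pos] != '\0'; pos++) {
        if (keyfile[pos] < 'a' || keyfile[pos] > 'z')
            continue; /* ignore anything outside a-z */
        int k = keyfile[pos] - 'a'; /* index 0..25 instead of the raw code */
        int n = charmap[k][0] + 1;
        int *grown = realloc(charmap[k], (n + 1) * sizeof *grown);
        if (grown == NULL)
            return NULL;
        charmap[k] = grown;
        charmap[k][0] = n;
        charmap[k][n] = pos;
    }
    return charmap;
}
```

Writing the sizes as sizeof *charmap and sizeof *grown ties each allocation to the pointer's own element type, so the int versus int* mix-up cannot recur.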

glibc detected, realloc(): invalid old size C

Admittedly what I was trying to do was at best educated guess work.
I have an array of strings, and I was trying to account for if someone entered a string that was too large, or the array received too much input (it needs to be dynamic for what I'm trying to do)
here's the section of code that's breaking:
if (strlen(token) > wordlength)
{
wordlength *= 2;
for(int j = 0; j < numwords; j++)
{
char* tmpw = realloc(wordarray[j], wordlength);
assert(tmpw != NULL);
wordarray[j] = tmpw;
printf("increased size of words to %zu \n", wordlength);
}
}
Explanations:
Token is the next word being taken in (I'm parsing a string), so I compare it to the current word length; if it's too big, I double the word length and try to adjust the array accordingly.
If you need any more information let me know
initialization of wordarray:
wordarray = malloc(numwords);
for(int i = 0; i < numwords; i++)
wordarray[i] = malloc(wordlength);
Another place where realloc crashes:
if (arraycounter > numwords)
{
numwords *= 2;
char** tmp = realloc(wordarray, numwords);
assert(tmp != NULL);
wordarray = tmp;
for(int h = arraycounter; h < numwords; h++)
wordarray[h] = malloc(wordlength);
printf("increased size of wordarray to %zu \n", numwords);
}
In this situation, it would attempt to increase the size of the array if it was about to go over the initial set limit, so would not be affected due to token running out of memory (tested with 20 small words and it crashed on its attempt to resize)
You need to allocate room for numwords pointers, not numwords bytes:
wordarray = malloc(numwords * sizeof *wordarray);
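The realloc(wordarray, numwords) call needs the same fix: the new size should be numwords * sizeof *wordarray.
Also, what do you want to happen when your program is compiled with NDEBUG defined (which disables assert) and run on a system with low memory? Using assert() for this check is probably wrong.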
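A minimal sketch of the array-growing step with both points applied: sizes are given in bytes as count times element size, and failure is reported through a return value instead of assert. The helper name grow_wordarray and its signature are assumptions made for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: doubles the number of slots in *wordarray and gives
   every new slot a buffer of wordlength bytes. Returns 0 on success and -1
   on allocation failure, in which case the first *numwords slots are still
   valid and *numwords is unchanged. */
int grow_wordarray(char ***wordarray, size_t *numwords, size_t wordlength)
{
    size_t old_count = *numwords;
    size_t new_count = old_count ? old_count * 2 : 1;
    /* size in bytes = element count * size of one element (a char *) */
    char **tmp = realloc(*wordarray, new_count * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    *wordarray = tmp;
    for (size_t h = old_count; h < new_count; h++) {
        tmp[h] = malloc(wordlength);
        if (tmp[h] == NULL) {
            while (h > old_count)
                free(tmp[--h]); /* undo the slots added so far */
            return -1;
        }
    }
    *numwords = new_count;
    return 0;
}
```

The caller can then decide what a failure means for the program (retry, report, exit) rather than having the check disappear in release builds.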

How would you add chars to an array dynamically? Without the array being predefined?

If I want to add chars to char array, I must do it like this:
#include <stdio.h>
int main() {
int i;
char characters[7] = "0000000";
for (i = 0; i < 7; i++) {
characters[i] = (char)('a' + i);
if (i > 2) {
break;
}
}
for (i = 0; i < 7; i++) {
printf("%c\n", characters[i]);
}
return 0;
}
To prevent weird characters from being printed, I must initialize the array, but that isn't flexible. How can I add characters to a char array dynamically? Just like you would in Python:
characters = []
characters.append(1)
...
There is no non-ugly solution for pure C.
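#include <stdio.h>
#include <stdlib.h>
int main(void) {
    size_t count = 0; // number of characters stored so far
    size_t space = 1; // initial room for string
    char *characters = malloc(space); // allocate
    if (characters == NULL) return 1;
    for (int i = 0; i < 7; i++) {
        characters[count++] = (char)('a' + i);
        space++; // increment needed space by 1
        char *tmp = realloc(characters, space); // allocate new space
        if (tmp == NULL) { free(characters); return 1; }
        characters = tmp;
        if (i > 2) {
            break;
        }
    }
    for (size_t i = 0; i < count; i++) { // print only what was stored
        printf("%c\n", characters[i]);
    }
    free(characters);
    return 0;
}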
In practice you want to avoid calling realloc for every character and instead allocate memory in bigger chunks than just one byte, maybe even growing at an exponential rate. But in essence that's what is happening under the hood of std::string and the like: you need a counter for the current size, a variable for the current maximum size (here it is always current size + 1, for simplicity), and some reallocation when the need for space surpasses the current maximum size.
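The size, capacity and reallocation bookkeeping described above can be sketched as a small growable buffer that doubles its capacity when full. The type CharBuf and the functions charbuf_push and charbuf_free are hypothetical names for illustration, and the starting capacity of 16 is an arbitrary choice.

```c
#include <stdlib.h>

/* Hypothetical growable character buffer. Start from CharBuf b = {0}; */
typedef struct {
    char *data;  /* null-terminated once the first character is pushed */
    size_t len;  /* characters stored, excluding the '\0' */
    size_t cap;  /* bytes allocated */
} CharBuf;

/* Appends one character, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 if realloc fails (the buffer stays valid). */
int charbuf_push(CharBuf *b, char c)
{
    if (b->len + 2 > b->cap) { /* need room for c and the '\0' */
        size_t new_cap = b->cap ? b->cap * 2 : 16;
        char *tmp = realloc(b->data, new_cap); /* realloc(NULL, n) acts like malloc(n) */
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}

/* Releases the storage and resets the buffer to its empty state. */
void charbuf_free(CharBuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

With doubling, the number of realloc calls grows with the logarithm of the final length, so appending n characters costs O(n) overall instead of the O(n*n) that growing one byte at a time can reach.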
Yes, of course you can add characters dynamically:
char quote[100] = "The course of true love";
strcat(quote, " never did run smooth.");
but only if there is enough room in quote[ ] to hold the appended characters. Or maybe you are asking why, in C, you have to pre-arrange enough character storage whereas, in Python, storage is allocated dynamically. That's how the language was designed in 197x.
C99 does allow variable-length arrays: arrays whose size is chosen by the program at run time. And a very bad mistake it is, imo.
You cannot unless you use Linked Lists or some other custom data structure.
