Concatenate with memcpy

Concatenate with memcpy - c

I'm trying to add two strings together using memcpy. The first memcpy does contain the data, I require. The second one does not however add on. Any idea why?
if (strlen(g->db_cmd) < MAX_DB_CMDS )
{
memcpy(&g->db_cmd[strlen(g->db_cmd)],l->db.param_value.val,strlen(l->db.param_value.val));
memcpy(&g->db_cmd[strlen(g->db_cmd)],l->del_const,strlen(l->del_const));
g->cmd_ctr++;
}

size_t len = strlen(l->db.param_value.val);
memcpy(g->db_cmd, l->db.param_value.val, len);
memcpy(g->db_cmd + len, l->del_const, strlen(l->del_cost)+1);
This gains you the following:
Less redundant calls to strlen. Each of those must traverse the string, so it's a good idea to minimize these calls.
The 2nd memcpy needs to actually append, not replace. So the first argument has to differ from the previous call.
Note the +1 in the 3rd arg of the 2nd memcpy. That is for the NUL terminator.
I'm not sure your if statement makes sense either. Perhaps a more sane thing to do would be to make sure that g->db_cmd has enough space for what you are about to copy. You would do that via either sizeof (if db_cmd is an array of characters) or by tracking how big your heap allocations are (if db_cmd was acquired via malloc). So perhaps it would make most sense as:
size_t param_value_len = strlen(l->db.param_value.val),
del_const_len = strlen(l->del_const);
// Assumption is that db_cmd is a char array and hence sizeof(db_cmd) makes sense.
// If db_cmd is a heap allocation, replace the sizeof() with how many bytes you
// asked malloc for.
//
if (param_value_len + del_const_len < sizeof(g->db_cmd))
{
memcpy(g->db_cmd, l->db.param_value.val, param_value_len);
memcpy(g->db_cmd + param_value_len, l->del_const, del_const_len + 1);
}
else
{
// TODO: your buffer is not big enough. handle that.
}

You're not copying the null terminator, you're only coping the raw string data. That leaves your string non-null-terminated, which can cause all sorts of problems. You're also not checking to make sure you have enough space in your buffer, which can result in buffer overflow vulnerabilities.
To make sure you copy the null terminator, just add 1 to the number of bytes you're copying -- copy strlen(l->db.param_value.val) + 1 bytes.

One possible problem is that your first memcpy() call won't necessarily result in a null terminated string since you're not copying the '\0' terminator from l->db.param_value.val:
So when strlen(g->db_cmd) is called in the second call to memcpy() it might be returning something completely bogus. Whether this is a problem depends on whether the g->db_cmd buffer is initialized to zeros beforehand or not.
Why not use the strcat(), which was made to do exactly what you're trying to do with memcpy()?
if (strlen(g->db_cmd) < MAX_DB_CMDS )
{
strcat( g->db_cmd, l->db.param_value.val);
strcat( g->db_cmd, l->del_const);
g->cmd_ctr++;
}
That'll have the advantage of being easier for someone to read. You might think it would be less performant - but I don't think so since you're making a bunch of strlen() calls explicitly. In any case, I'd concentrate on getting it right first, then worry about performance. Incorrect code is as unoptimized as you can get - get it right before getting it fast. In fact, my next step wouldn't be to improve the code performance-wise, it would be to improve the code to be less likely to have a buffer overrun (I'd probably switch to using something like strlcat() instead of strcat()).
For example, if g->db_cmd is a char array (and not a pointer), the result might look like:
size_t orig_len = strlen(g->db_cmd);
size_t result = strlcat( g->db_cmd, l->db.param_value.val, sizeof(g->db_cmd));
result = strlcat( g->db_cmd, l->del_const, sizeof(g->db_cmd));
g->cmd_ctr++;
if (result >= sizeof(g->db_cmd)) {
// the new stuff didn't fit, 'roll back' to what we started with
g->db_cmd[orig_len] = '\0';
g->cmd_ctr--;
}
If strlcat() isn't part of your platform it can be found on the net pretty easily. If you're using MSVC there's a strcat_s() function which you could use instead (but note that it's not equivalent to strlcat() - you'd have to change how the results from calling strcat_s() are checked and handled).

Related

Using memcpy() to move tail of buffer to its beginning? (overlap)

I have binary file read buffer which reads structures of variable length. Near the end of buffer there will always be incomplete struct. I want to move such tail of buffer to its beginning and then read buffer_size - tail_len bytes during next file read. Something like this:
char[8192] buf;
cur = 0, rcur = 0;
while(1){
read("file", &buf[rcur], 8192-rcur);
while (cur + sizeof(mystruct) < 8192){
mystruct_ptr = &buf[cur];
if (mystruct_prt->tailsize + cur >= 8192) break; //incomplete
//do stuff
cur += sizeof(mystruct) + mystruct_ptr->tailsize;
}
memcpy(buf,&buf[cur],8192-cur);
rcur=8192-cur;
cur = 0;
}
It should be okay if tail is small and buffer is big because then memcpy most likely won't overlap copied memory segment during single copy iteration. However it sounds slightly risky when tail becomes big - bigger than 50% of buffer.
If buffer is really huge and tail is also huge then it still should be okay since there's physical limit of how much data can be copied in single operation which if I remember correctly is 512 bytes for modern x86_64 CPUs using vector units. I thought about adding condition that checks length of tail and if it's too big comparing to size of buffer, performs naive byte-by-byte copy but question is:
How big is too big to consider such overlapping memcpy more or less safe. tail > buffer size - 2kb?

Per the standard, memcpy() has undefined behavior if the source and destination regions overlap. It doesn't matter how big the regions are or how much overlap there is. Undefined behavior cannot ever be considered safe.
If you are writing to a particular implementation, and that implementation defines behavior for some such copying, and you don't care about portability, then you can rely on your implementation's specific behavior in this regard. But I recommend not. That would be a nasty bug waiting to bite people who decide to use the code with some other implementation after all. Maybe even future you.
And in this particular case, having the alternative of using memmove(), which is dedicated to this exact purpose, makes gambling with memcpy() utterly reckless.

Is this code vulnerable to buffer overflow?

Fortify reported a buffer overflow vulnerability in below code citing following reason -
In this case we are primarily concerned with the case "Depends upon properties of the data that are enforced outside of the immediate scope of the code.", because we cannot verify the safety of the operation performed by memcpy() in abc.cpp
void create_dir(const char *sys_tmp_dir, const char *base_name,
size_t base_name_len)
{
char *tmp_dir;
size_t sys_tmp_dir_len;
sys_tmp_dir_len = strlen(sys_tmp_dir);
tmp_dir = (char*) malloc(sys_tmp_dir_len + 1 + base_name_len + 1);
if(NULL == tmp_dir)
return;
memcpy(tmp_dir, sys_tmp_dir, sys_tmp_dir_len);
tmp_dir[sys_tmp_dir_len] = FN_LIBCHAR;
memcpy(tmp_dir + sys_tmp_dir_len + 1, base_name, base_name_len);
tmp_dir[sys_tmp_dir_len + base_name_len + 1] = '\0';
..........
..........
}
It appears to me a false positive since we are getting the size of data first, allocating that much amount of space, then calling memcpy with size to copy.
But I am looking for good reasons to convince fellow developer to get rid of current implementation and rather use c++ strings. This issue has been assigned to him. He just sees this a false positive so doesn't want to change anything.
Edit I see quick, valid criticism of the current code. Hopefully, I'll be able to convince him now. Otherwise, I'll hold the baton. :)

Take a look to strlen(), it has input string but it has not an upper bound then it'll go on searching until it founds \0. It's a vulnerability because you'll perform memcpy() trusting its result (if it won't crash because of access violation while searching). Imagine:
create_dir((const char*)12345, baseDir, strlen(baseDir));
You tagged both C and C++...if you're using C++ then std::string will protect you from these issues.

It appears to me a false positive since we are getting the size of data first, allocating that much amount of space
This assumption is a problem that matches the warning/error. In your code, you're assuming that malloc successfully allocated the requested memory. If your system has no memory to spare, malloc will fail and return NULL. When you try to memcpy into tmp_dir, you'd be copying to NULL which would be bad news.
You should check to guarantee that the value returned by malloc is not NULL before considering it as a valid pointer.

Performance of GNU C implementation of getcwd()

According to the GNU Lib C documentation on getcwd()...
The GNU C Library version of this function also permits you to specify a null pointer for the buffer argument. Then getcwd allocates a buffer automatically, as with malloc(see Unconstrained Allocation). If the size is greater than zero, then the buffer is that large; otherwise, the buffer is as large as necessary to hold the result.
I now draw your attention to the implementation using the standard getcwd(), described in the GNU documentation:
char* gnu_getcwd ()
{
size_t size = 100;
while (1)
{
char *buffer = (char *) xmalloc (size);
if (getcwd (buffer, size) == buffer)
return buffer;
free (buffer);
if (errno != ERANGE)
return 0;
size *= 2;
}
}
This seems great for portability and stability but it also looks like a clunky compromise with all that allocating and freeing memory. Is this a possible performance concern given that there may be frequent calls to the function?
*It's easy to say "profile it" but this can't account for every possible system; present or future.

The initial size is 100, holding a 99-char path, longer than most of the paths that exist on a typical system. This means in general that there is no “allocating and freeing memory”, and no more than 98 bytes are wasted.
The heuristic of doubling at each try means that at a maximum, a logarithmic number of spurious allocations take place. On many systems, the maximum length of a path is otherwise limited, meaning that there is a finite limit on the number of re-allocations caused.
This is about the best one can do as long as getcwd is used as a black box.

This is not a performance concern because it's the getcwd function. If that function is in your critical path then you're doing it wrong.
Joking aside, there's none of this code that could be removed. The only way you could improve this with profiling is to adjust the magic number "100" (it's a speed/space trade-off). Even then, you'd only have optimized it for your file system.
You might also think of replacing free/malloc with realloc, but that would result in an unnecessary memory copy, and with the error checking wouldn't even be less code.

Thanks for the input, everyone. I have recently concluded what should have been obvious from the start: define the value ("100" in this case) and the increment formula to use (x2 in this case) to be based on the target platform. This could account for all systems, especially with the use of additional flags.

How to write into a char array in C at specific location using sprintf?

I am trying to port some code written in MATLAB to C, so that I can compile the function and execute it faster (the code is executed very often and it would bring a significant speed increase).
So basically what my MATLAB code does it that it takes a matrix and converts it to a string, adding brackets and commas, so I can write it to a text file. Here's an idea of how this would work for a vector MyVec:
MyVec = rand(1,5);
NbVal = length(MyVec)
VarValueAsText = blanks(2 + NbVal*30 + (NbVal-1));
VarValueAsText([1 end]) = '[]';
VarValueAsText(1 + 31*(1:NbVal-1)) = ',';
for i = 1:NbVal
VarValueAsText(1+(i-1)*31+(1:30)) = sprintf('%30.15f', MyVec(i));
end
Now, how can I achieve a similar result in C? It doesn't seem too difficult, since I can calculate in advance the size of my string (char array) and I know the position of each element that I need to write to my memory area. Also the sprintf function exists in C. However, I have trouble understanding how to set this up, also because I don't have an environment where I can learn easily by trial and error (for each attempt I have to recompile, which often leads to a segmentation fault and MATLAB crashing...).
I hope someone can help even though the problem will probably seem trivial, but I have have very little experience with C and I haven't been able to find an appropriate example to start from...

Given an offset (in bytes) into a string, retrieving a pointer to this offset is done simply with:
char *ptr = &string[offset];
If you are iterating through the lines of your matrix to print them, your loop might look as follow:
char *ptr = output_buffer;
for (i = 0; i < n_lines; i++) {
sprintf (ptr, "...", ...);
ptr = &ptr[line_length];
}
Be sure that you have allocated enough memory for your output buffer though.

Remember that sprintf will put a string-terminator at the end of the string it prints, so if the string you "print" into should be longer than the string you print, then that won't work.
So if you just want to overwrite part of the string, you should probably use sprintf to a temporary buffer, and then use memcpy to copy that buffer into the actual string. Something like this:
char temp[32];
sprintf(temp, "...", ...);
memcpy(&destination[position], temp, strlen(temp));

two mallocs returning same pointer value

I'm filling a structure with data from a line, the line format could be 3 different forms:
1.-"LD "(Just one word)
2.-"LD A "(Just 2 words)
3.- "LD A,B "(The second word separated by a coma).
The structure called instruccion has only the 3 pointers to point each part (mnemo, op1 and op2), but when allocating memory for the second word sometimes malloc returns the same value that was given for the first word. Here is the code with the mallocs pointed:
instruccion sepInst(char *linea){
instruccion nueva;
char *et;
while(linea[strlen(linea)-1]==32||linea[strlen(linea)-1]==9)//Eliminating spaces and tabs at the end of the line
linea[strlen(linea)-1]=0;
et=nextET(linea);//Save the direction of the next space or tab
if(*et==0){//If there is not, i save all in mnemo
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
nueva.op1[0]='k';nueva.op1[1]=0;//And set a "K" for op1
nueva.op2=NULL;
return nueva;
}
nueva.mnemo=malloc(et-linea+1);<-----------------------------------
strncpy(nueva.mnemo,linea,et-linea);
nueva.mnemo[et-linea]=0;printf("\nj%xj",nueva.mnemo);
linea=et;
while(*linea==9||*linea==32)//Move pointer to the second word
linea++;
if(strchr(linea,',')==NULL){//Check if there is a coma
nueva.op1=malloc(strlen(linea)+1);//Do this if there wasn't any coma
strcpy(nueva.op1,linea);
nueva.op2=NULL;
}
else{//Do this if there was a coma
nueva.op1=malloc(strchr(linea,',')-linea+1);<----------------------------------
strncpy(nueva.op1,linea,strchr(linea,',')-linea);
nueva.op1[strchr(linea,',')-linea]=0;
linea=strchr(linea,',')+1;
nueva.op2=malloc(strlen(linea)+1);
strcpy(nueva.op2,linea);printf("\n2j%xj2",nueva.op2);
}
return nueva;
}
When I print the pointers it happens to be the same number.
note: the function char *nextET(char *line) returns the direction of the first space or tab in the line, if there is not it returns the direction of the end of the line.
sepInst() is called several times in a program and only after it has been called several times it starts failing. These mallocs across all my program are giving me such a headache.

There are two main possibilities.
Either you are freeing the memory somewhere else in your program (search for calls to free or realloc). In this case the effect that you see is completely benign.
Or, you might be suffering from memory corruption, most likely a buffer overflow. The short term cure is to use a specialized tool (a memory debugger). Pick one that is available on your platform. The tool will require recompilation (relinking) and eventually tell you where exactly is your code stepping beyond previously defined buffer limits. There may be multiple offending code locations. Treat each one as a serious defect.
Once you get tired of this kind of research, learn to use the const qualifier and use it with all variable/parameter declarations where you can do it cleanly. This cannot completely prevent buffer overflows, but it will restrict them to variables intended to be writable buffers (which, for example, those involved in your question apparently are not).

On a side note, personally, I think you should work harder to call malloc less. It's a good idea for performance, and it also causes corruption less.
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
should be
// strlen has to traverse your string to get the length,
// so if you need it more than once, save its value.
cbLineA = strlen(linea);
// malloc for the string, and the 2 bytes you need for op1.
nueva.mnemo=malloc(cbLineA + 3);
// strcpy checks for \0 again, so use memcpy
memcpy(nueva.mnemo, linea, cbLineA);
nueva.mnemo[cbLineA] = 0;
// here we avoid a second malloc by pointing op1 to the space we left after linea
nueva.op1 = nueva.mnemo + cbLinea + 1;
Whenever you can reduce the number of mallocs by pre-calculation....do it. You are using C! This is not some higher level language that abuses the heap or does garbage collection!