Below I have extracted a piece of code from a project over which I ran a static analysis tool looking for security flaws. It flagged this code as being susceptible to an integer overflow/wraparound flaw as explained here:
https://cwe.mitre.org/data/definitions/190.html
Here is the relevant code:
#define STRMAX 16

void bar() {
    // ... mystruct malloc'd elsewhere
    mystruct->string = malloc(STRMAX * sizeof(char));
    memset(mystruct->string, '\0', STRMAX);
    strncpy(mystruct->string, "H3C19H1E4XAA9MQ", STRMAX); // 15-character string
    mystruct->baz = malloc(3 * sizeof(char*));
    memset(mystruct->baz, '\0', 3);
    if (strlen(mystruct->string) > 0) {
        strncpy(mystruct->baz, &(mystruct->string[0]), 2);
    }
    mystruct->quux = malloc(19 * sizeof(char));
    memset(mystruct->quux, '\0', 19);
    for (int i = 0; i < 19; i++) {
        mystruct->quux[i] = 'a';
    }
    foo(mystruct);
}

void foo(Mystruct *mystruct) {
    // this was flagged
    size_t input_len = strlen(mystruct->baz) + strlen(mystruct->quux);
    // this one too, but it flows from the first I think.
    char *input = malloc((input_len + 1) * sizeof(char));
    memset(input, '\0', input_len + 1);
    strncpy(input, mystruct->baz, strlen(mystruct->baz));
    strncat(input, mystruct->quux, strlen(mystruct->quux));
    // ...
}
So in other words, there are three members of a struct that are explicitly bounded in one function and then used to create a variable in another one based on the size of the struct's members.
The analyzer flagged the first line of foo in particular. My question is: did it correctly flag this? If so, what would be a simple way to mitigate it? I'm on a platform that doesn't have a BSD-style reallocarray() by default.
PS: I realize that strncpy(foo, bar, strlen(bar)) is somewhat frivolous in memory terms.
size_t input_len = strlen(mystruct->baz) + strlen(mystruct->quux); can theoretically wrap around, so the analyser is right: if strlen(mystruct->quux) is larger than SIZE_MAX - strlen(mystruct->baz), the addition wraps around.
If input_len == SIZE_MAX, then input_len + 1 wraps around to 0 as well.
When you add two string lengths, the result can exceed the maximum value of the size_t type. For instance, if SIZE_MAX were 100 (I know this is not realistic, it's just for example purposes) and your strings were 70 and 50 characters long, the total would wrap around to 19 (the arithmetic is modulo SIZE_MAX + 1), which is not the correct result.
This is rarely a real concern: the actual maximum is usually very large (the C standard allows it to be as low as 65535, but you're only likely to run into that on microcontrollers), and in practice strings come nowhere near the limit, so most programmers simply ignore this problem. Protecting against the overflow gets complicated, although it's possible if you really need to.
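If you do need to protect against it, a simple mitigation is to check before adding. Here is a minimal sketch (it reuses the question's Mystruct and assumes both members are NUL-terminated strings; error handling is reduced to an early return):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

void foo(Mystruct *mystruct) {
    size_t baz_len = strlen(mystruct->baz);
    size_t quux_len = strlen(mystruct->quux);

    /* Reject the addition if baz_len + quux_len would reach SIZE_MAX;
       this also guarantees room for the +1 NUL terminator below. */
    if (baz_len >= SIZE_MAX - quux_len) {
        return; /* or signal the error to the caller */
    }

    size_t input_len = baz_len + quux_len;
    char *input = malloc(input_len + 1);
    if (input == NULL) {
        return;
    }
    memcpy(input, mystruct->baz, baz_len);
    memcpy(input + baz_len, mystruct->quux, quux_len);
    input[input_len] = '\0';
    /* ... */
    free(input);
}

The guard works because quux_len can never exceed SIZE_MAX, so SIZE_MAX - quux_len cannot underflow; the comparison therefore exactly rejects any pair of lengths whose sum would wrap.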
After checking out this question I did not find the required solution, so I tried to use strtol in the following manner:
in = (unsigned char *)malloc(16);
for (size_t i = 0; i < (size_t)strlen(argv[2]) / 2; i += 2)
{
    long tmp = 0;
    tmp = strtol((const char *)argv[2] + i, &argv[2] + 2, 16);
    memcpy(&in[i], &tmp, 1);
}
This code produced several intermediate values:
Can someone please explain why the entire in array gets filled with 0xFF (255) bytes and why tmp does not equal its expected value?
Tips on how to improve the above code so it fills the in array with the correct hex values are also welcome.
Your code is erroneous on multiple counts, and the casts hide the problems:
Casting the return value of malloc is not necessary in C and can potentially hide an unsafe conversion if you forget to include <stdlib.h>:
in = (unsigned char *)malloc(16);
Casting the return value of strlen to (size_t) is useless, even redundant, as strlen is defined to return a size_t. Again, you might have forgotten to include <string.h>...
for (size_t i = 0; i < (size_t)strlen(argv[2]) / 2; i += 2) {
long tmp = 0;
strtol takes a const pointer to char, to which argv[2] + i converts implicitly; the cast to (const char *) is useless. The second argument should be the address of a char *. You pass &argv[2] + 2, the address of the fifth element of argv, in other words &argv[4], most certainly not what you intend to do, although your loop's purpose is quite obscure...
tmp = strtol((const char *)argv[2] + i, &argv[2] + 2, 16);
Copying the long value in tmp with memcpy would require copying sizeof(tmp) bytes. Copying only the first byte has an implementation-defined effect, depending on the size of long and the endianness of the target system:
memcpy(&in[i], &tmp, 1);
}
You should post a complete compilable example that illustrates your problem; a code fragment is missing important context information, such as which header files are included, how variables are defined, and what code is executed before the fragment.
As written, your code does not make much sense, and trying to interpret its behavior is pointless.
Regarding the question in reference, your code does not even remotely provide a solution for converting a string of hexadecimal characters to an array of bytes.
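For reference, a minimal sketch of that conversion might look like this (it assumes the input is a NUL-terminated string containing an even number of hex digits, and it elides validation for brevity):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert a hex string such as "DEADBEEF" into bytes {0xDE, 0xAD, 0xBE, 0xEF}. */
size_t hex_to_bytes(const char *hex, unsigned char *out, size_t out_size) {
    size_t n = strlen(hex) / 2;
    if (n > out_size)
        n = out_size;
    for (size_t i = 0; i < n; i++) {
        /* strtol would otherwise run past the pair, so isolate two digits. */
        char pair[3] = { hex[2 * i], hex[2 * i + 1], '\0' };
        out[i] = (unsigned char)strtol(pair, NULL, 16);
    }
    return n;
}

int main(void) {
    unsigned char in[16];
    size_t n = hex_to_bytes("DEADBEEF", in, sizeof in);
    for (size_t i = 0; i < n; i++)
        printf("%02X ", in[i]);
    putchar('\n');
    return 0;
}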
I have a mass of data, maybe 4 MB. Now I want to check whether all bits in it are 0.
Eg:
Here is the data:
void* data = malloc(4*1024*1024);
memset(data, 0, 4*1024*1024);
Check if all bits in it are 0. Here is my solution, which is not fast enough:
int dataisnull(char* data, int length)
{
    int i = 0;
    while (i < length) {
        if (data[i]) return 0;
        i++;
    }
    return 1;
}
This code might have some room for performance improvement; for example, on a 32/64-bit machine, checking 4/8 bytes at a time may be faster.
So I wonder what is the fastest way to do it?
You can handle multiple bytes at a time and unroll the loop:
int dataisnull(const void *data, size_t length) {
    /* assuming data was returned by malloc, thus is properly aligned */
    size_t i = 0, n = length / sizeof(size_t);
    const size_t *pw = data;
    const unsigned char *pb = data;
    size_t val;

#define UNROLL_FACTOR  8
#if UNROLL_FACTOR == 8
    size_t n1 = n - n % UNROLL_FACTOR;
    for (; i < n1; i += UNROLL_FACTOR) {
        val = pw[i + 0] | pw[i + 1] | pw[i + 2] | pw[i + 3] |
              pw[i + 4] | pw[i + 5] | pw[i + 6] | pw[i + 7];
        if (val)
            return 0;
    }
#endif
    val = 0;
    for (; i < n; i++) {
        val |= pw[i];
    }
    for (i = n * sizeof(size_t); i < length; i++) {
        val |= pb[i];
    }
    return val == 0;
}
Depending on your specific problem, it might be more efficient to detect non-zero values early or late:
If the all-zero case is the most common, you should accumulate all bits into the val accumulator and only test at the end.
If the all-zero case is rare, you should check for non-zero values more often.
The unrolled version above is a compromise that tests for non-zero values every 64 or 128 bytes, depending on the size of size_t.
Depending on your compiler and processor, you might get better performance by unrolling less or more. You could also use intrinsic functions available for your particular architecture to take advantage of vector types, but it would be less portable.
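For instance, a minimal SSE2 sketch using intrinsics might look like this (the dataisnull_sse2 name is mine; unaligned loads are used so no alignment assumption is needed):

#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

int dataisnull_sse2(const void *data, size_t length) {
    const unsigned char *pb = data;
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    size_t i, n = length / 16;

    /* OR all 16-byte chunks into a single accumulator. */
    for (i = 0; i < n; i++)
        acc = _mm_or_si128(acc, _mm_loadu_si128((const __m128i *)(pb + i * 16)));

    /* acc is all zero iff every one of its bytes compares equal to 0. */
    if (_mm_movemask_epi8(_mm_cmpeq_epi8(acc, zero)) != 0xFFFF)
        return 0;

    /* Handle any remaining tail bytes one at a time. */
    for (i = n * 16; i < length; i++)
        if (pb[i])
            return 0;
    return 1;
}

Note that this variant accumulates everything and tests once at the end, so it favors the all-zero case; for early rejection you would test inside the loop every few iterations, as in the unrolled version above.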
Note that the code does not verify proper alignment of the data pointer:
Such a check cannot be done portably.
It assumes the data was allocated via malloc or similar, hence properly aligned for any type.
As always, benchmark different solutions to see if any makes a real difference. This function might not be a bottleneck at all, and writing a complex function to optimize a rare case is counterproductive: it makes the code less readable, more likely to contain bugs, and much less maintainable. For example, the assumption about data alignment may not hold if you change the memory allocation scheme or use static arrays, and the function may then invoke undefined behavior.
The following checks that the first byte is the value you want, and that each subsequent byte equals the one before it (via an overlapping memcmp), which together imply all bytes are the same.
int check_bytes(const char * const data, size_t length, const char val)
{
    if (length == 0) return 1;
    if (*data != val) return 0;
    return memcmp(data, data + 1, length - 1) ? 0 : 1;
}
int check_bytes64(const char * const data, size_t length, const char val)
{
    const char * const aligned64_start =
        (const char *)((((uintptr_t)data) + 63) / 64 * 64);
    const char * const aligned64_end =
        (const char *)((((uintptr_t)data) + length) / 64 * 64);

    /* Fall back to the simple check when the buffer does not span at
       least one full 64-byte-aligned block; this also avoids the
       size_t underflow in aligned64_length - 64 below. */
    if (aligned64_end < aligned64_start + 64)
        return check_bytes(data, length, val);

    const size_t start_length = aligned64_start - data;
    const size_t aligned64_length = aligned64_end - aligned64_start;
    const size_t end_length = length - start_length - aligned64_length;

    /* Check the unaligned head together with the first aligned block,
       so the periodic memcmp below is anchored to val. */
    if (!check_bytes(data, start_length + 64, val)) return 0;
    if (!check_bytes(aligned64_end, end_length, val)) return 0;
    return memcmp(aligned64_start, aligned64_start + 64,
                  aligned64_length - 64) ? 0 : 1;
}
A more elaborate version of this function should probably pass cache-line-aligned pointers to memcmp, and manually check the remaining block(s) instead of just the first byte.
Of course, you will have to profile on your specific hardware to see if there is any speed benefit of this method vs others.
If anyone doubts whether this works, try it on ideone.
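Since bare links tend to rot, here is a small self-contained demo of the same idea (a sketch; it assumes check_bytes and check_bytes64 from above are pasted in where the comment indicates):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* check_bytes and check_bytes64 as defined above go here */

int main(void) {
    static char buf[4096];
    memset(buf, 0, sizeof buf);
    printf("%d\n", check_bytes64(buf, sizeof buf, 0)); /* prints 1: all zero */
    buf[1234] = 1;
    printf("%d\n", check_bytes64(buf, sizeof buf, 0)); /* prints 0 */
    return 0;
}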
I once wrote the following function for my own use. It assumes that the size of the data to check is a multiple of a constant chunk size and that the buffer is aligned properly for machine words. If that does not hold in your case, it is not hard to loop over the first and last few bytes individually and check only the bulk with the optimized function. (Strictly speaking, reading the buffer as unsigned long is undefined behavior even with proper alignment if the data was written through an incompatible type. However, I believe you can get pretty far with this careful breaking of the rules.)
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
bool
is_all_zero_bulk(const void *const p, const size_t n)
{
    typedef unsigned long word_type;
    const size_t word_size = sizeof(word_type);
    const size_t chunksize = 8;
    assert(n % (chunksize * word_size) == 0);
    assert((((uintptr_t) p) & 0x0f) == 0);
    const word_type *const frst = (word_type *) p;
    const word_type *const last = frst + n / word_size;
    for (const word_type *iter = frst; iter != last; iter += chunksize)
    {
        word_type acc = 0;
        // Trust the compiler to unroll this loop at its own discretion.
        for (size_t j = 0; j < chunksize; ++j)
            acc |= iter[j];
        if (acc != 0)
            return false;
    }
    return true;
}
The function itself is not very smart. The main ideas are:
Use large unsigned machine words for data comparison.
Enable loop unrolling by factoring out an inner loop with a constant iteration count.
Reduce the number of branches by ORing the words into an accumulator and only comparing it every few iterations against zero.
This should also make it easy for the compiler to generate vectorized code using SIMD instructions which you really want for code like this.
Additional non-standard tweaks would be to annotate the function with __attribute__ ((hot)) and use __builtin_expect(acc != 0, false). Of course, the most important thing is to turn on your compiler's optimizations.
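As a sketch, those annotations (GCC/Clang extensions, so not portable) could look like this on a simplified byte-wise variant:

#include <stdbool.h>
#include <stddef.h>

/* __attribute__((hot)) asks the compiler to optimize this function
   aggressively; __builtin_expect marks the non-zero case as unlikely. */
__attribute__((hot))
static bool is_all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (__builtin_expect(p[i] != 0, false))
            return false;
    return true;
}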
Basically I'm building a bit reader. Slightly before the buffer is exhausted, I'd like to copy whatever is left of the array to the beginning of the same array, then zero everything after the copy and fill the rest with data from the input file.
I'm only trying to use the standard library for portability reasons.
Also, I was profiling my bit reader earlier, and Instruments says it was taking about 28 milliseconds to do all of this. Is it supposed to take that long?
Code removed
I recommend using memmove for the copy. It has a signature and functionality identical to memcpy, except that it is safe for copying between overlapping regions (which is what you're describing).
For the zero-fill, memset is usually adequate. However, if for example you're zeroing an array of pointers on a platform where null pointers aren't represented by an all-zero bit pattern, you'll need to roll your own loop using assignment appropriate to the type.
For this reason you might want to hide the memmove and memset operations behind an abstraction, for example:
#include <string.h>

void copy_int(int *destination, const int *source, size_t size) {
    memmove(destination, source, size * sizeof *source);
}

void zero_int(int *seq, size_t size) {
    memset(seq, 0, size * sizeof *seq);
}

int main(void) {
    int array[] = { 0, 1, 2, 3, 4, 5 };
    size_t index = 2,
           size = sizeof array / sizeof *array - index;
    copy_int(array, array + index, size);
    zero_int(array + size, index);
}
Should either memmove or memset become unsuitable for your use cases in the future, it'll be simple to drop in your own copy/zero loops.
As for your strange profiler results, I suppose it might be possible that you're using some archaic (or grossly underclocked) implementation, or trying to copy huge arrays... Otherwise, 28 milliseconds does seem quite absurd. Nonetheless, your profiler would surely have identified that this memmove and memset isn't a significant bottleneck in a program that performs actual I/O work, right? The I/O must surely be the bottleneck, right?
If the memmove+memset is indeed a bottleneck, you could try implementing a circular array to avoid the copies. For example, the following code attempts to find needle in the figurative haystack that is input_file...
Otherwise, if the I/O is the bottleneck, there are tweaks that can be applied to reduce it. For example, the code below also uses setvbuf to suggest that the underlying implementation fully buffer reads from the file, even though the code reads one character at a time with fgetc.
void find_match(FILE *input_file, char const *needle, size_t needle_size) {
    // setvbuf must precede any other operation on the stream, so set
    // full buffering before the initial fread rather than after it.
    setvbuf(input_file, NULL, _IOFBF, BUFSIZ);

    char input_array[needle_size];
    size_t sz = fread(input_array, 1, needle_size, input_file);
    if (sz != needle_size) {
        // No matches possible
        return;
    }

    unsigned long long pos = 0;
    for (;;) {
        size_t cursor = pos % needle_size;
        int tail_compare = memcmp(input_array, needle + needle_size - cursor, cursor),
            head_compare = memcmp(input_array + cursor, needle, needle_size - cursor);
        if (head_compare == 0 && tail_compare == 0) {
            printf("Match found at offset %llu\n", pos);
        }
        int c = fgetc(input_file);
        if (c == EOF) {
            break;
        }
        input_array[cursor] = c;
        pos++;
    }
}
Notice how there's no memmove (or zeroing, FWIW) necessary here? We simply operate as though the start of the array is at cursor and the end is at cursor - 1, wrapping via the modulo by needle_size so the index never runs off either end. Then, after each insertion, we simply increment pos, which advances the cursor...
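For completeness, a hypothetical caller might look like this (the file name and needle are made-up values):

#include <stdio.h>

int main(void) {
    FILE *f = fopen("input.bin", "rb"); /* hypothetical input file */
    if (f != NULL) {
        find_match(f, "needle", 6);
        fclose(f);
    }
    return 0;
}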
I am trying to write a function that will store into a char array some information to print later:
int offset = 0;
size_t size = 1;
char *data = NULL;
data = malloc(sizeof(char));

void create(t_var *var){
    size_t sizeLine = sizeof(char)*(strlen(var->nombre)+2)+sizeof(int);
    size = size + sizeLine;
    realloc(data, size);
    sprintf(data+offset,"%s=%d\n",var->name,var->value);
    offset=strlen(data);
}
list_iterate(aList, (void *)create);
t_var is a struct that has two fields: name (char*) and value (int).
What's wrong with this code? When I run it under Valgrind, it complains about the realloc and the sprintf.
Without knowing the specific valgrind errors, the standout one is:
realloc(data, size); should be data = realloc(data, size);
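As an aside, assigning the result straight back to data leaks the original block if realloc fails; the usual idiom keeps the result in a temporary first (a minimal sketch):

char *tmp = realloc(data, size);
if (tmp == NULL) {
    /* allocation failed: data still points to the old, valid block */
    /* handle the error here, e.g. free(data) and bail out */
} else {
    data = tmp;
}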
I'm sorry to say that, but almost EVERYTHING is wrong with your code.
First, incomplete code.
You say your t_var type has two members, name and value.
But your code refers to a nombre member. Did you forget to mention it or did you forget to rename it when publishing the code?
Second, misused sizeof.
You use a sizeof(int) expression. Are you aware of what it actually does here?!
Apparently you are trying to calculate the length of the printed int value. Alas, the sizeof operator retrieves the number of bytes the argument occupies in memory. For example, for a 32-bit integer the result of sizeof(int) is 4 (32 bits fit in 4 bytes), but the maximum signed 32-bit integer value is 2^31 - 1, that is 2147483647 in decimal: TEN digits, not four.
You can use (int)(2.41 * sizeof(any_unsigned_int_type) + 1) to determine the number of characters you may need to print a value of any_unsigned_int_type; add one more for a preceding minus sign in the case of signed integer types.
The magic constant 2.41 is the decimal logarithm of 256 (about 2.408, rounded up), so it scales a length in bytes to a length in decimal digits.
If you prefer to avoid floating-point operations, you may use the rational approximation 29/12 = 2.41666... and compute (sizeof(any_unsigned_int_type) * 29 / 12 + 1).
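To illustrate the integer variant (a quick sketch; it assumes an int of at least 32 bits):

#include <stdio.h>

int main(void) {
    /* digits = sizeof(int) * 29 / 12 + 1, plus 1 for a possible
       minus sign and 1 for the terminating NUL */
    char buf[sizeof(int) * 29 / 12 + 1 + 1 + 1];
    int n = sprintf(buf, "%d", -2147483647);
    printf("%d characters: %s\n", n, buf);
    return 0;
}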
Third, sizeof(char).
You multiply the result of strlen by sizeof(char).
Not an error, actually, but completely useless, as sizeof(char) equals 1 by definition.
Fourth, realloc.
As others already explained, you must store the return value:
data = realloc(data, size);
Otherwise you risk losing your re-allocated data AND continuing to write at the previous location, which may overwrite (and so destroy) other data on the heap.
Fifth, offset.
You use that value to determine the position to sprintf() at, and afterwards you recompute it as offset = strlen(data). Since data always holds everything printed so far, that value is actually correct, but it rescans the whole buffer after every print, making the loop quadratic in the total output length. Increment instead, scanning only what was just printed:
offset += strlen(data + offset);
Sixth, strlen after sprintf.
You needn't call strlen here at all, as all functions of the printf family return the number of characters printed. You can just use that:
int outputlen = sprintf(data+offset, "%s=%d\n", var->name, var->value);
offset += outputlen;
Seventh, realloc. Seriously.
This is quite a costly function. It may need to do an internal malloc for the new size, copy your data into the new place, and free the old block. Why force all that? What impact will it have on your program if it needs to print five thousand strings some day...?
It is also quite dangerous. Really. Suppose you need to print 5,000 strings but there is room for only 2,000. You will get a NULL pointer from realloc(). All the data printed so far is still at the current data pointer, but what will you do next?
How can you tell list_iterate to stop iterating...?
How can you inform the routine above the list_iterate that the string is incomplete...?
There is no good answer. Luckily you needn't solve the problem: you can simply avoid creating it!
Solution.
Traverse your list first and calculate the size of the buffer you need. Then allocate the buffer (just once!) and go on with filling it. There is just one place where the allocation may fail, and you can simply stop before the problem ever happens:
int totaloutputlength = 0;
char *outputbuffer = NULL;
char *currentposition = NULL;

void add_var_length(t_var *var){
    // digits plus one for a possible minus sign (see the formula above)
    const int numberlength = sizeof(var->value)*29/12 + 1 + 1;
    totaloutputlength += strlen(var->name) + 2 + numberlength; // +2 for '=' and '\n'
}

void calculate_all_vars_length(t_list *aList){
    totaloutputlength = 0;
    list_iterate(aList, (void *)add_var_length);
}

void sprint_var_value(t_var *var){
    int outputlen = sprintf(currentposition, "%s=%d\n", var->name, var->value);
    currentposition += outputlen; // advance the printing position
}

int sprint_all_vars(t_list *aList){
    calculate_all_vars_length(aList);
    outputbuffer = malloc(totaloutputlength + 1); // +1 for terminating NUL char

    // did allocation succeed?
    if(outputbuffer == NULL) { // NO
        // possibly print some error message...
        // possibly terminate the program...
        // or just return -1 to inform the caller something went wrong
        return -1;
    }
    else { // YES
        // set the initial printing position
        currentposition = outputbuffer;
        // go print all variables into the buffer
        list_iterate(aList, (void *)sprint_var_value);
        // return a 'success' status
        return 0;
    }
}
There seems to be a lot of confusion regarding the purpose of the two arguments 'size' and 'count' in fwrite(). I am trying to figure out which will be faster -
fwrite(source, 1, 50000, destination);
or
fwrite(source, 50000, 1, destination);
This is an important decision in my code as this command will be executed millions of times.
Now, I could just jump to testing and use the one which gives better results, but the problem is that the code is intended for MANY platforms.
So,
How can I get a definitive answer to which is better across platforms?
Will implementation logic of fwrite() vary from platform to platform?
I realize there are similar questions (What is the rationale for fread/fwrite taking size and count as arguments?, Performance of fwrite and write size) but do understand that this is a different question regarding the same issue. The answers in similar questions do not suffice in this case.
The performance should not depend on the choice, because anyone implementing fwrite would multiply size and count to determine how much I/O to do.
This is exemplified by FreeBSD's libc implementation of fwrite.c, which in its entirety reads (include directives elided):
/*
 * Write `count' objects (each size `size') from memory to the given file.
 * Return the number of whole objects written.
 */
size_t
fwrite(buf, size, count, fp)
    const void * __restrict buf;
    size_t size, count;
    FILE * __restrict fp;
{
    size_t n;
    struct __suio uio;
    struct __siov iov;

    /*
     * ANSI and SUSv2 require a return value of 0 if size or count are 0.
     */
    if ((count == 0) || (size == 0))
        return (0);

    /*
     * Check for integer overflow.  As an optimization, first check that
     * at least one of {count, size} is at least 2^16, since if both
     * values are less than that, their product can't possibly overflow
     * (size_t is always at least 32 bits on FreeBSD).
     */
    if (((count | size) > 0xFFFF) &&
        (count > SIZE_MAX / size)) {
        errno = EINVAL;
        fp->_flags |= __SERR;
        return (0);
    }

    n = count * size;
    iov.iov_base = (void *)buf;
    uio.uio_resid = iov.iov_len = n;
    uio.uio_iov = &iov;
    uio.uio_iovcnt = 1;

    FLOCKFILE(fp);
    ORIENT(fp, -1);

    /*
     * The usual case is success (__sfvwrite returns 0);
     * skip the divide if this happens, since divides are
     * generally slow and since this occurs whenever size==0.
     */
    if (__sfvwrite(fp, &uio) != 0)
        count = (n - uio.uio_resid) / size;
    FUNLOCKFILE(fp);
    return (count);
}
The purpose of the two arguments gets clearer if you consider the return value, which is the count of objects successfully written/read to/from the stream:
fwrite(src, 1, 50000, dst); // will return 50000
fwrite(src, 50000, 1, dst); // will return 1
The speed might be implementation-dependent, although I don't expect any considerable difference.
I'd like to point you to my question, which ended up exposing an interesting performance difference between calling fwrite once and calling fwrite multiple times to write a file "in chunks".
My problem was that there's a bug in Microsoft's implementation of fwrite so files larger than 4GB cannot be written in one call (it hangs at fwrite). So I had to work around this by writing the file in chunks, calling fwrite in a loop until the data was completely written. I found that this latter method always returns faster than the single fwrite call.
I'm on Windows 7 x64 with 32 GB of RAM, which makes write caching pretty aggressive.
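For illustration, here is a minimal sketch of that chunked-write workaround (the 32 MB chunk size is an arbitrary choice of mine, not a measured optimum):

#include <stdio.h>

/* Write size bytes in fixed-size chunks; returns the number of bytes
   actually written. */
size_t fwrite_chunked(const char *data, size_t size, FILE *out) {
    const size_t chunk = 32u * 1024 * 1024; /* 32 MB per fwrite call */
    size_t written = 0;
    while (written < size) {
        size_t n = size - written;
        if (n > chunk)
            n = chunk;
        size_t w = fwrite(data + written, 1, n, out);
        written += w;
        if (w < n) /* short write: an error occurred */
            break;
    }
    return written;
}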