Determine needed # of extra bytes to conduct buffer overflow attack (homework)


For assistance in completing Level 0 of this buffer overflow project (.pdf) in Assembly, I'm using this guide.
Goal = provide a string longer than getbuf can handle (getbuf() takes
an input of 32 characters), causing an overflow and pushing the rest
of the string onto the stack where you can then control where the
function getbuf returns after execution (we want to call the smoke() function).
When I get to step 2 of the guide, I need to calculate how many additional characters my input string should have, on top of the original 32-character input, in order to execute buffer overflow and call smoke().
(Step 2 Excerpt):
In my case, I find startingAddressOfBufVariable to be 0x55683ddc and %ebp is 0x55683df0.
I calculate buffer size = addressAt%ebp+4 - startingAddressOfBufVariable, which is 0x55683df0+4 - 0x55683ddc = 0x18 = 24.
In the next step, I'm supposed to subtract 32 from that result to determine how many additional characters (on top of the original 32) that my input string should have. However, 24 - 32 = -8. I get a negative number! I'm not sure what to do with that. Subtract 8 characters from my 32-character input string? I'm trying to conduct overflow, so that doesn't make sense.
For testing/guessing purposes, I moved on with the guide, pretending that the -8 result I got was actually a positive 8 and intending to add 8 characters on top of my 32-character string.
Knowing the address of my smoke() function to be 0x08048b2b, I created my input file as such, per the instructions in step 4 (though why did they change aa to 61?):
perl -e 'print "AA "x32, "BB "x4, "CC "x4, "2b 8b 04 08" '>hex3
(Step 4 Excerpt:)
So, Am I using the incorrect math in Step 2 of the guide? Are my addresses I'm using in the math incorrect? If they are correct, how do I interpret the -8 result, and what does the -8 result mean, in terms of modifying my character input to execute the overflow attack?

They write in the example C code that the buffer size is 12. It would make sense that you would have to infer the buffer size from the registers. Maybe try again with a base string size of 0x18?
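As a rough sketch of that suggestion: if the two addresses in the question are right, and assuming the usual 32-bit x86 frame layout where the return address sits 4 bytes above the address held in %ebp, the 24 is the distance from the start of buf to the return address, not something to subtract 32 from. The function names below are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of filler bytes between the start of buf and the saved
   return address. Assumption: standard 32-bit frame, so the return
   address is stored at %ebp + 4. */
size_t bytes_before_return_address(uint32_t buf_start, uint32_t ebp)
{
    return (size_t)((ebp + 4u) - buf_start);
}

/* Total input length: the filler plus the 4-byte replacement address. */
size_t total_input_length(uint32_t buf_start, uint32_t ebp)
{
    return bytes_before_return_address(buf_start, ebp) + 4u;
}
```

With buf_start = 0x55683ddc and ebp = 0x55683df0 this gives 24 filler bytes and 28 bytes in total. If the program still does not behave as expected, the measured address of buf is the first thing to re-check.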

Related

How to approximate the size of a text file containing a matrix, written by Fortran?

I have written an N*N matrix to a formatted text file using Fortran. The Fortran output format was N(3X,E17.8), where N is the matrix size.
Example of one matrix element: 0.19860753E-08 i.e. "(three spaces)0.19860753E-08".
A 558x558 Matrix has a file size of 5.294.302 Bytes.
How could I approximate the file size for an N*N matrix for any value of N?

I am not sure if you have the lines separate or just all numbers in sequence.
If I read your description correctly you have N(3X,E17.8) per line. In that case every character is one byte. 3 spaces + 17 characters = 20 bytes. One line is 20 * N + 1 (or + 2 on Windows). The +1 or +2 is the line end marker (LF or CR + LF). (This distinction is what the unix2dos and dos2unix utilities convert.)
Then you have N lines, so it should be N*(20 * N + 1).
That does not correspond to the numbers you show.
But the example you show is not consistent with the format you show. The number 0.19860753E-08 is only 14 characters wide, or 17 with the three leading spaces (3X,E14.8). In that case the size is N*(17 * N + 1) + 1, or possibly N*(17 * N + 2) with CR + LF, which corresponds to the file size you quoted much better: 558 * (17 * 558 + 2) = 5 294 304.
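The estimate above is easy to wrap in a small helper. This is only a sketch: the function name is made up, and it assumes every element has the same fixed width and every line ends with the same end-of-line marker.

```c
#include <stdint.h>

/* Estimated size in bytes of an N x N matrix written as N lines of
   N fixed-width fields each.
   chars_per_element: field width including leading spaces (17 or 20 above)
   eol_bytes:         1 for LF, 2 for CR + LF */
uint64_t estimate_file_size(uint64_t n, uint64_t chars_per_element,
                            uint64_t eol_bytes)
{
    return n * (chars_per_element * n + eol_bytes);
}
```

For example, estimate_file_size(558, 17, 2) returns 5294304, which is within two bytes of the size quoted in the question.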

For a single record written by a format like
write (*, '(999(3X,E17.8))') a
we can determine how many characters would be written. For each element there will be three spaces followed by a field of width 17: a total of 20 characters for each element. There will then be a number of characters to end the record (depending on filesystem, operating system, etc., usually one or two).
Knowing how large each record is, and how many records are written, you have the size. Depending again on system settings, you may also see a separate end-of-file marker of some size.
We can answer this question because we know how large each field is going to be. We don't always know that from a format. For example, with some edit descriptors the field width is determined later on: A, I0, G0.6, etc., are such.
Finally, note also that without colon editing we may get extra output after the final element. X is a special case; if we instead had
write (*, '(999("   ",E17.8))') a
then there would be three additional spaces after the final element written out before the end of the record. This can be prevented with
write (*, '(999(:,"   ",E17.8))') a
X is a position edit descriptor: it doesn't actually transfer data so doesn't add to the transfer count unless further data are written.

Rather than calculating the size of the file from the details of the code and the file format, it may be simpler to write out files containing matrices with (at least) three different values of N. You can then fit the file size S as a function of N, as
S = a*N^2 + b*N + c
where a, b and c are constants which you will get from the fit.
The a*N^2 term comes from the representation of the numbers in the matrix.
The b*N term comes from line endings, and any value separators you might be using.
The c term comes from file metadata, and any spare characters you might have in your file.
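With exactly three measurements the "fit" is just the parabola through three points. Here is a minimal sketch using divided differences; the struct and function names are invented for this example, and the three N values must all be different.

```c
/* Coefficients of S = a*N^2 + b*N + c. */
typedef struct {
    double a;
    double b;
    double c;
} quad_fit;

/* Parabola through (n1, s1), (n2, s2), (n3, s3).
   Assumption: n1, n2 and n3 are pairwise distinct. */
quad_fit fit_quadratic(double n1, double s1,
                       double n2, double s2,
                       double n3, double s3)
{
    quad_fit f;
    double d1 = (s2 - s1) / (n2 - n1);   /* slope between points 1 and 2 */
    double d2 = (s3 - s2) / (n3 - n2);   /* slope between points 2 and 3 */

    f.a = (d2 - d1) / (n3 - n1);         /* second divided difference */
    f.b = d1 - f.a * (n1 + n2);
    f.c = s1 - f.a * n1 * n1 - f.b * n1;
    return f;
}
```

With more than three measured file sizes, a least-squares fit would be the more robust choice.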

memset bytewise instead of integer ( 4 bytes )

I am struggling to find the answer to this:
#define BUFLEN 8
unsigned short randombuffer[BUFLEN];
memset(randombuffer, 200 , BUFLEN );
printf("%u", randombuffer[0]);
I am getting the answer as 51400 although I was expecting 200.
After debugging I found out that the randombuffer is filled with 0xC8 for the first 8 entries. Hence 0xC8C8 is 51400. I was however expecting 0x00C8 for each index in the array.
What am I doing wrong?

What you are doing wrong is not reading the spec of memset. memset sets each byte to the specified value. Your buffer most likely has 8 entries of two bytes each. Since you passed 8 to memset, both bytes of the first four entries are changed and the rest isn't touched. That's how memset works.

memset fills bytes, but it looks like you want to fill words. I don't know if there's a memset-like function built in for this, so you might have to do repeated memset/memcpy instead. Note that if you feel comfortable writing inline assembler you could probably do this pretty efficiently yourself in machine code - although a tight loop using pointers in C is probably close to as fast.
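A plain loop along those lines might look like the sketch below. fill_u16 is a made-up name, not a standard library function.

```c
#include <stddef.h>

/* Set each of the first `count` elements (not bytes) of dst to value. */
void fill_u16(unsigned short *dst, unsigned short value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}
```

Calling fill_u16(randombuffer, 200, BUFLEN) leaves every entry equal to 200 (0x00C8), which is what the question expected from memset.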

The size calculation in the memset call is not correct. It should be:
memset(randombuffer, 200 , BUFLEN * sizeof(*randombuffer) );
because individual elements in randombuffer are two bytes in your case and the third parameter of memset is the number of bytes to set, so you have to pass the number of elements in the object times their size in bytes.
The values in the elements will still be 0xC8C8, because memset sets a value per byte, not per element.
To print them out correctly, use the correct specifier for short:
printf("%hu", randombuffer[0]);
or
printf("%hx", randombuffer[0]);

Generating Strings

I am creating a distributed password cracker that will use a brute-force technique, so I need every combination of string.
For the sake of distribution, the server will give each client a range of strings, like from "aaaa" to "bxyz". I am assuming that the string length will be four, so I need to check every string between these two bounds.
I am trying to generate these strings in C. I am trying to work out the logic for this but I'm failing; I also searched on Google, to no avail. Any ideas?
EDIT
Sorry brothers, I would like to clarify.
I want the combinations of strings within a range. Let's suppose the range is between aaaa and aazz; that would be strings like aaaa aaab aaac aaad ..... aazx aazy aazz. My character space is just upper- and lower-case English letters, so 52 characters. I want to check every combination of 4 characters, but the server will distribute ranges of strings among its clients. My question is: if one client gets the range between aaaa and aazz, how do I generate the strings between just these bounds?

If your strings use only the (extended) ASCII table, you'll have, as an upper limit, 256 characters, or 2^8 characters.
Since your strings are 4 characters long, you'll have 2^8 * 2^8 * 2^8 * 2^8 combinations,
or (2^8)^4 = 2^32 combinations.
Simply split the range of numbers and start the combinations in each machine.
You'll probably be interested in this: Calculating Nth permutation step?
Edit:
Considering your edit, your space of combinations would be 52^4 = 7.311.616 combinations.
Then, you do simply need to divide these "tasks" for each machine to compute, so, 7.311.616 / n = r, having r as the amount of permutations calculated by each machine -- the last machine may compute r + (7.311.616 % n) combinations.
Since you know the amount of combinations to build in each machine, you'll have to execute the following, in each machine:
function check_permutations(begin, end, chars) {
    for (i = begin; i < end; i++) {
        nth_perm = nth_permutation(chars, i);
        check_permutation(nth_perm); // your function of verification
    }
}
The function nth_permutation() is not hard to derive, and I'm quite sure you can get it in the link I've posted.
After this, you would simply start a process with such a function as check_permutations, giving the begin, end, and the vector of characters chars.
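One possible shape for nth_permutation() in C is to treat each candidate as a 4-digit number in base 52. This is only a sketch: the alphabet order (lowercase first, then uppercase) and the names nth_string and string_index are assumptions, not something taken from the linked question.

```c
#include <stddef.h>
#include <string.h>

/* Assumed ordering: 'a'..'z' are digits 0..25, 'A'..'Z' are 26..51. */
static const char ALPHABET[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
#define ALPHABET_SIZE 52
#define WORD_LEN 4

/* Write the index-th candidate (0 -> "aaaa") into out. */
void nth_string(unsigned long index, char out[WORD_LEN + 1])
{
    for (int pos = WORD_LEN - 1; pos >= 0; pos--) {
        out[pos] = ALPHABET[index % ALPHABET_SIZE];
        index /= ALPHABET_SIZE;
    }
    out[WORD_LEN] = '\0';
}

/* Inverse mapping: index of a WORD_LEN-character string, or -1 if the
   string is too short or has a character outside the alphabet. */
long string_index(const char *s)
{
    long index = 0;
    for (int pos = 0; pos < WORD_LEN; pos++) {
        const char *hit =
            (s[pos] != '\0') ? strchr(ALPHABET, s[pos]) : NULL;
        if (hit == NULL)
            return -1;
        index = index * ALPHABET_SIZE + (long)(hit - ALPHABET);
    }
    return index;
}
```

A client that is handed the bounds "aaaa" and "aazz" can convert both with string_index (0 and 1325 under this ordering) and then call nth_string for every index in between.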

You can generate a tree containing all the permutations. E.g., like in this pseudocode:
strings(root, len)
    if (len == 0) return
    for (c = 'a' to 'z')
        root->next[c] = c
        strings(&root->next[c], len - 1)
Invoke by strings(root, 4).
After that you can traverse the tree to get all the permutations.

To check whether a given string sequence is a palindrome or not using stacks

I want to know how to store the characters or integers popped from stack to be stored so that I can compare it with the original string or int value.
For example :
n = 1221;
n = n / 1000; // so that I get the leading digit, in this case 1, then divide the remainder
// further each time by 100, 10 and 1
Say I store each digit that I get in a variable (for example, I store the 1 from the above division in a variable named s) and push it onto the stack.
After this, I pop the values back out; when I do so, how can I check whether the result is equal to the original number? I can check it with an if condition; e.g., if I have 3 digits, then
(i == p && check for other two numbers)
but I don't want that; I want to check a number of any size.
Please don't send the source code on how to do it, just give me a few snippets or a few hints, thanks. Also please let me know how you came up for the solution.
Thanks!

Dont send the source code on how to do it
ok ;)
just give me few snippets or few hints
Recursion
and while you give the solution let me know how you came up with the solution, whether you had seen a program like this or know the algorithm or came up with a solution now when you saw the question. Thanks
i have been doing this too long to remember where i first saw it =\

You can use recursion to solve the problem, like Justin mentioned.
Once you start your recursion, you divide the number by an appropriate divisor (1000, 100, 10, 1) and store the quotient on the stack. You do this until the end.
Once you reach the 'unit' place, you store the unit digit, and now start popping out of the stack.
Have an if-else ladder to return an integer from the recursive function.
The ladder will have conditions for returning the int after left shifting (in decimal, that is multiplying by 10) and returning the shifted variable.
You can do your check in the main function.
1 ->1221(left shift 122 OR with 1)
2 ->122(left shift 12 OR with 2)
2 ->12(left shift 1 OR with 2)
1---->1
Hope this helps.
Thanks
Aditya
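For anyone who does want to see it spelled out, here is a small iterative variant that uses an explicit array as the stack instead of recursion. It is a sketch, not the method described above: the function name is invented, and it extracts digits with % 10 rather than dividing by 1000, 100, 10, so it works for any number of digits.

```c
#include <stdbool.h>

#define MAX_DIGITS 20   /* enough for any unsigned long long */

bool is_palindrome_number(unsigned long long n)
{
    int stack[MAX_DIGITS];
    int top = 0;
    unsigned long long rest = n;

    /* Push the digits, least significant digit first. */
    do {
        stack[top++] = (int)(rest % 10);
        rest /= 10;
    } while (rest > 0);

    /* Popping yields the most significant digit first. Compare each
       popped digit with the digits of n taken from the other end. */
    rest = n;
    while (top > 0) {
        if (stack[--top] != (int)(rest % 10))
            return false;
        rest /= 10;
    }
    return true;
}
```

The same idea carries over to strings: push every character, then pop and compare against the original from the front.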

Need to make a very large array [2^16+1][2^16+1] - size of array is too large :(

Greetings
I need to calculate the first-order entropy (Markov source, as on the wiki here: http://en.wikipedia.org/wiki/Entropy_(information_theory)) of a signal that consists of 16-bit words.
This means I must calculate how frequently each combination a->b (symbol b appears after a) occurs in the data stream.
When I was doing it for just the 4 least significant or 4 most significant bits, I used a two-dimensional array, where the first dimension was the first symbol and the second dimension was the second symbol.
My algorithm looked like this
Read current symbol
Array[prev_symbol][curr_symbol]++
prev_symbol=curr_symbol
Move forward 1 symbol
Then Array[a][b] would hold how many times symbol b followed symbol a in the stream.
Now, I understand that an array in C is a pointer that is incremented to get the exact value; for example, to get element [3][4] from array[10][10] I have to increment the pointer to array[0][0] by (3*10+4)*(size of the variable stored in the array). I understand that the problem must be that 2^32 elements of type unsigned long take up too much memory.
But still, is there a way to deal with it?
Or maybe there is another way to accomplish this?

A two-dimensional array of 4-byte integers with 65,536 by 65,536 elements occupies about 16 GByte of RAM. Does your machine have that much memory?
Anyhow, out of the more than 4 billion array elements, only very few will have a count different from zero. So it's probably better to go with some sort of sparse storage.
One solution would be to use a dictionary where the tuple (a, b) is the key and the count of occurrences is the value.
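C has no built-in dictionary, so here is a minimal sketch of such a sparse counter: an open-addressing hash table keyed by the pair packed into 32 bits. All names are invented for this example. The table has a fixed capacity (which must be a power of two, at least 2) and does not grow, so it has to be sized for the number of distinct pairs you expect.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t *keys;     /* packed pair: (a << 16) | b */
    uint32_t *counts;   /* a count of 0 marks an empty slot */
    size_t capacity;    /* power of two */
    size_t used;        /* number of occupied slots */
} pair_table;

int pair_table_init(pair_table *t, size_t capacity_pow2)
{
    t->keys = calloc(capacity_pow2, sizeof *t->keys);
    t->counts = calloc(capacity_pow2, sizeof *t->counts);
    t->capacity = capacity_pow2;
    t->used = 0;
    return (t->keys != NULL && t->counts != NULL) ? 0 : -1;
}

void pair_table_free(pair_table *t)
{
    free(t->keys);
    free(t->counts);
}

/* Linear probing: returns the slot holding key, or the empty slot
   where it would go. Terminates because one slot always stays empty. */
static size_t slot_for(const pair_table *t, uint32_t key)
{
    uint32_t h = key;
    h ^= h >> 16;            /* simple integer mixing step */
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    size_t i = (size_t)h & (t->capacity - 1);
    while (t->counts[i] != 0 && t->keys[i] != key)
        i = (i + 1) & (t->capacity - 1);
    return i;
}

/* Count one occurrence of "b follows a". Returns -1 if the table is full. */
int pair_table_increment(pair_table *t, uint16_t a, uint16_t b)
{
    uint32_t key = ((uint32_t)a << 16) | b;
    size_t i = slot_for(t, key);
    if (t->counts[i] == 0) {
        if (t->used + 1 >= t->capacity)
            return -1;       /* keep one slot empty for the probe loop */
        t->keys[i] = key;
        t->used++;
    }
    t->counts[i]++;
    return 0;
}

uint32_t pair_table_get(const pair_table *t, uint16_t a, uint16_t b)
{
    uint32_t key = ((uint32_t)a << 16) | b;
    return t->counts[slot_for(t, key)];
}
```

The counting loop from the question then becomes pair_table_increment(&table, prev_symbol, curr_symbol) in place of Array[prev_symbol][curr_symbol]++.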

Perhaps you could do multiple passes over the data. The entropy contribution from pairs beginning with symbol X is essentially independent of pairs beginning with any other symbol (aside from the total number of them, of course), so you can calculate the entropy for all such pairs and then throw away the distribution data. At the end, combine 2^16 partial entropy values to get the total. You don't necessarily have to do 2^16 passes over the data, you can be "interested" in as many initial characters in a single pass as you have space for.
Alternatively, if your data is smaller than 2^32 samples, then you know for sure that you won't see all possible pairs, so you don't actually need to allocate a count for each one. If the sample is small enough, or the entropy is low enough, then some kind of sparse array would use less memory than your full 16GB matrix.

Did a quick test on Ubuntu 10.10 x64
gt#thinkpad-T61p:~/test$ uname -a
Linux thinkpad-T61p 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux
gt#thinkpad-T61p:~/test$ cat mtest.c
#include <stdio.h>
#include <stdlib.h>
short *big_array;
int main(void)
{
    if((big_array = (short *)malloc(4UL*1024*1024*1024*sizeof (short))) == NULL) {
        perror("malloc");
        return 1;
    }
    big_array[0]++;
    big_array[100]++;
    big_array[1UL*1024*1024*1024]++;
    big_array[2UL*1024*1024*1024]++;
    big_array[3UL*1024*1024*1024]++;
    printf("array[100] = %d\narray[3G] = %d\n", big_array[100], big_array[3UL*1024*1024*1024]);
    return 0;
}
gt#thinkpad-T61p:~/test$ gcc -Wall mtest.c -o mtest
gt#thinkpad-T61p:~/test$ ./mtest
array[100] = 1
array[3G] = 1
gt#thinkpad-T61p:~/test$
It looks like the virtual memory system on Linux is up to the job, as long as you have enough memory and/or swap.
Have fun!