Re-write segment of a compiled C program - c

I've a C program structured in this way:
#include <Windows.h>
#include <stdio.h>
#include <stdint.h>
#pragma section(".code",execute, read, write)
#pragma comment(linker,"/SECTION:.code,ERW")
#pragma code_seg(".code")
//Code to decrypt
#pragma section(".stub", execute, read, write)
#pragma code_seg(".stub")
void decryptor(){
//Retrieve virtual address of the pointer to the .code section
//Retrieve the virtual size of the pointer to the .code section
for(int i = 0; i<size; i++){
//HERE THE PROGRAM STOPS
ptrCode[0] = //Reverse function of the encryptor
}
}
int main(){
decryptor();
mainFunctionDecrypted();
return 0;
}
Basically i've an encryptor which first encrypt the .code segment in the exe of this program after compilation.
Then when i execute the modified exe i want to be able to first decrypt it and then execute the decrypted part. However it seems like i cannot write to the .code segment loaded in memory (I think because it's a part memory dedicated to code to be executed).
Is there any way to write to executable memory?
Is there any workaroud you would do?

Windows and other operating systems go out of their way to prevent you from doing this (modifying the code sections of a running application).
Your immediate options, then, are
1) decrypt the code to some other memory area allocated dynamically for that purpose (code must then either use only position-independent instructions, or contain custom fixups for the instructions that have position-specific data).
2) use a separate program that decrypts the program on disk before it is executed.

Obfuscating a program in this way is inherently futile. Whatever your "decryptor" does, a human who is determined to reverse engineer your program can also do. Spend your effort instead on making your program desirable enough that people want to pay you for it, and benevolent enough that you don't have to hide what it's doing.

I need to modify the code in the following way. Moreover there are important compiler option to set in visual studio, for example to disable the Data Execution Prevention.
Compiler option used:
/permissive- /GS /TC /GL /analyze- /W3 /Gy /Zc:wchar_t /Gm- /O2 /sdl /Zc:inline /fp:precise /Zp1 /D "_MBCS" /errorReport:prompt /WX- /Zc:forScope /GR- /Gd /Oy- /Oi /MD /FC /nologo /diagnostics:classic
Linker option used:
/MANIFEST /LTCG:incremental /NXCOMPAT:NO /DYNAMICBASE:NO "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /FIXED /MACHINE:X86 /OPT:REF /SAFESEH /INCREMENTAL:NO /SUBSYSTEM:CONSOLE /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /MAP /OPT:ICF /ERRORREPORT:PROMPT /NOLOGO /TLBID:1
#pragma section(".code", execute, read)
#pragma section(".codedata", read, write)
#pragma comment(linker,"/SECTION:.code,ERW")
#pragma comment(linker,"/SECTION:.codedata,ERW")
#pragma comment(linker, "/MERGE:.codedata=.code")
//All the following will go in code
#pragma code_seg(".code")
#pragma data_seg(".codedata")
#pragma const_seg(".codedata")
//CODE TO DECRYPT
// .stub SECTION
#pragma section(".stub", execute, read)
#pragma section(".stubdata", read, write)
#pragma comment(linker,"/SECTION:.stub,ERW")
#pragma comment(linker,"/SECTION:.stubdata,ERW")
#pragma comment(linker, "/MERGE:.stubdata=.stub")
//All the following will go in .stub segment
#pragma code_seg(".stub")
#pragma data_seg(".stubdata")
#pragma const_seg(".stubdata")
/*This function needs to be changed to whatever correspond to the decryption function of the encryotion function used by the encryptor*/
void decryptCodeSection(){
//Retrieve virtual address of the pointer to the .code section
//Retrieve the virtual size of the pointer to the .code section
for(int i = 0; i<size; i++){
//HERE THE PROGRAM STOPS
ptrCode[0] = //Reverse function of the encryptor
}
void main(int argc, char* argv[]){
decryptor();
mainFunctionDecrypted();
}
Doing this way i was able to first decrypt the segment and then execute the function.

Related

Using Address Sanitizer and _CrtDumpMemoryLeaks() with MSVC

I'm having an issue with compiling with both /fsanitize=address and /MDd compiler options.
#ifdef _DEBUG
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
#else
#include <stdlib.h>
#endif
#include <stdio.h>
int main(int argc, char **argv) {
_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
_CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDOUT);
_CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
_CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDOUT);
_CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
_CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDOUT);
int *foo = malloc(sizeof(*foo) * 1024);
printf("%d\n", _CrtDumpMemoryLeaks());
return EXIT_SUCCESS;
}
With cl test.c /MDd /Zi works as expected and reports the leak.
Detected memory leaks!
Dumping objects ->
test.c(20) : {104} normal block at 0x000001ABE8BFA130, 4096 bytes long.
Data: < > CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD
Object dump complete.
1
However, adding address sanitizer, cl test.c /fsanitize=address /MDd /Zi reports no leaks.
0
I had assumed according to MSDN that this might work.
Thank you to the commenter for pointing out that even if they did work together, you'd expect that _CrtDumpMemoryLeaks() would always report a leak because:
The AddressSanitizer runtime doesn't release memory back to the OS during execution. From the OS's point of view, it may look like there's a memory leak. This design decision is intentional, so as not to allocate all the required memory up front.
I still am not quite sure I'm not seeing this behavior though and _CrtDumpMemoryLeaks() is resulting in 0 when /fsanitize=address is enabled.
However, this clears up the issue for me, and I should just expect to create different builds to test either with address sanitizer or _CrtDumpMemoryLeaks().

Getting "call to '__bad_copy_to' declared with attribute error: copy destination size is too small" after disabling optimization

This is part of a linux device driver.
uint32_t val = 0;
static long axpu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
...
copy_from_user(&val,(int32_t *)arg, sizeof(val));
...
}
I can compile this ok and was doing some debug. To keep from some codes from being optimized away. I changed the code like this.
#pragma GCC push_options
#pragma GCC optimize ("O0")
uint32_t val = 0;
static long axpu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
...
copy_from_user(&val,(int32_t *)arg, sizeof(val)); // <-- line 364
...
}
#pragma GCC pop_options
This is a trick I've been using for a while and worked just fine.(just that I used in the linux kernel codes for boot debug). This time, I'm doing it for the device driver. Now the compiler gives me this error.
In file included from ./arch/arm64/include/asm/preempt.h:5,
from ./include/linux/preempt.h:78,
from ./include/linux/spinlock.h:51,
from ./include/linux/seqlock.h:36,
from ./include/linux/time.h:6,
from ./arch/arm64/include/asm/stat.h:12,
from ./include/linux/stat.h:6,
from ./include/linux/module.h:10,
from /home/ckim/testlin540/axpu_ldd_kc.c:8:
In function 'check_copy_size',
inlined from 'copy_from_user' at ./include/linux/uaccess.h:143:6,
inlined from 'axpu_ioctl' at /home/ckim/testlin540/axpu_ldd_kc.c:364:4:
./include/linux/thread_info.h:160:4: error: call to '__bad_copy_to' declared with attribute error: copy destination size is too small
160 | __bad_copy_to();
| ^~~~~~~~~~~~~~~
I tried putting the #pragma thing around #include <linux/uaccess.h> in vain.
Some say it's gcc problem but I'm not sure. This is linux-5.4.188 code and this is the compiler version.
ckim#ckim-ubuntu:~/testlin540$ aarch64-none-linux-gnu-gcc --version
aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.2.1 20201103
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

OpenMP in C using GCC in Windows to create DLL

I've recently started to play around with OpenMP and like it very much.
I am a just-for-fun Classic-VB programmer and like coding functions for my VB programs in C. As such, I use Windows 7 x64 and GCC 4.7.2.
I usually set up all my C functions in one large C file and then compile a DLL out of it. Now I would like to use OpenMP in my DLL.
First of all, I set up a simple example and compiled an exe file from it:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int n = 520000;
int i;
int a[n];
int NumThreads;
omp_set_num_threads(4);
#pragma omp parallel for
for (i = 0; i < n; i++)
{
a[i] = 2 * i;
NumThreads = omp_get_num_threads();
}
printf("Value = %d.\n", a[77]);
printf("Number of threads = %d.", NumThreads);
return(0);
}
I compile that using gcc -fopenmp !MyC.c -o !MyC.exe and it works like a charm.
However, when I try to use OpenMP in my DLL, it fails. For example, I set up this function:
__declspec(dllexport) int __stdcall TestAdd3i(struct SAFEARRAY **InArr1, struct SAFEARRAY **InArr2, struct SAFEARRAY **OutArr) //OpenMP Test
{
int LengthArr;
int i;
int *InArrElements1;
int *InArrElements2;
int *OutArrElements;
LengthArr = (*InArr1)->rgsabound[0].cElements;
InArrElements1 = (int*) (**InArr1).pvData;
InArrElements2 = (int*) (**InArr2).pvData;
OutArrElements = (int*) (**OutArr).pvData;
omp_set_num_threads(4);
#pragma omp parallel for private(i)
for (i = 0; i < LengthArr; i++)
{
OutArrElements[i] = InArrElements1[i] + InArrElements2[i];
}
return(omp_get_num_threads());
}
The structs are defined, of course. I compile that using
gcc -fopenmp -c -DBUILD_DLL dll.c -o dll.o
gcc -fopenmp -shared -o mydll.dll dll.o -lgomp -Wl,--add-stdcall-alias
The compiler and linker do not complain (not even warnings come up) and the dll file is actually being built. But as I try to call the function from within VB, the VB compiler claims the the DLL file could not be found (run-time error 53). The strange thing about that is that as soon as one single OpenMP "command" is present inside the .c file, the VB compiler claims a missing DLL even if I call a function that does not even contain a single line of OpenMP code. When I comment all OpenMP stuff out, the function works as expected, but doesn't use OpenMP for parallelization, of course.
What is wrong here? Any help appreciated, thanks in advance! :-)
The problem most probably in this case is LD_LIBRARY_PATH is not set . You must use set LD_LIBRARY_PATH to the path that contains the dll or the system will not be able to find it and hence complains about the same

Can I use Intel syntax of x86 assembly with GCC?

I want to write a small low level program. For some parts of it I will need to use assembly language, but the rest of the code will be written on C/C++.
So, if I will use GCC to mix C/C++ with assembly code, do I need to use AT&T syntax or can
I use Intel syntax? Or how do you mix C/C++ and asm (intel syntax) in some other way?
I realize that maybe I don't have a choice and must use AT&T syntax, but I want to be sure..
And if there turns out to be no choice, where I can find full/official documentation about the AT&T syntax?
Thanks!
If you are using separate assembly files, gas has a directive to support Intel syntax:
.intel_syntax noprefix # not recommended for inline asm
which uses Intel syntax and doesn't need the % prefix before register names.
(You can also run as with -msyntax=intel -mnaked-reg to have that as the default instead of att, in case you don't want to put .intel_syntax noprefix at the top of your files.)
Inline asm: compile with -masm=intel
For inline assembly, you can compile your C/C++ sources with gcc -masm=intel (See How to set gcc to use intel syntax permanently? for details.) The compiler's own asm output (which the inline asm is inserted into) will use Intel syntax, and it will substitute operands into asm template strings using Intel syntax like [rdi + 8] instead of 8(%rdi).
This works with GCC itself and ICC, but for clang only clang 14 and later.
(Not released yet, but the patch is in current trunk.)
Using .intel_syntax noprefix at the start of inline asm, and switching back with .att_syntax can work, but will break if you use any m constraints. The memory reference will still be generated in AT&T syntax. It happens to work for registers because GAS accepts %eax as a register name even in intel-noprefix mode.
Using .att_syntax at the end of an asm() statement will also break compilation with -masm=intel; in that case GCC's own asm after (and before) your template will be in Intel syntax. (Clang doesn't have that "problem"; each asm template string is local, unlike GCC where the template string truly becomes part of the text file that GCC sends to as to be assembled separately.)
Related:
GCC manual: asm dialect alternatives: writing an asm statement with {att | intel} in the template so it works when compiled with -masm=att or -masm=intel. See an example using lock cmpxchg.
https://stackoverflow.com/tags/inline-assembly/info for more about inline assembly in general; it's important to make sure you're accurately describing your asm to the compiler, so it knows what registers and memory are read / written.
AT&T syntax: https://stackoverflow.com/tags/att/info
Intel syntax: https://stackoverflow.com/tags/intel-syntax/info
The x86 tag wiki has links to manuals, optimization guides, and tutorials.
You can use inline assembly with -masm=intel as ninjalj wrote, but it may cause errors when you include C/C++ headers using inline assembly. This is code to reproduce the errors on Cygwin.
sample.cpp:
#include <cstdint>
#include <iostream>
#include <boost/thread/future.hpp>
int main(int argc, char* argv[]) {
using Value = uint32_t;
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
// "movl $1, %0\n\t" // AT&T syntax
:"=r"(value)::);
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
std::cout << (value + func.get());
return 0;
}
When I built this code, I got error messages below.
g++ -E -std=c++11 -Wall -o sample.s sample.cpp
g++ -std=c++11 -Wall -masm=intel -o sample sample.cpp -lboost_system -lboost_thread
/tmp/ccuw1Qz5.s: Assembler messages:
/tmp/ccuw1Qz5.s:1022: Error: operand size mismatch for `xadd'
/tmp/ccuw1Qz5.s:1049: Error: no such instruction: `incl DWORD PTR [rax]'
/tmp/ccuw1Qz5.s:1075: Error: no such instruction: `movl DWORD PTR [rcx],%eax'
/tmp/ccuw1Qz5.s:1079: Error: no such instruction: `movl %eax,edx'
/tmp/ccuw1Qz5.s:1080: Error: no such instruction: `incl edx'
/tmp/ccuw1Qz5.s:1082: Error: no such instruction: `cmpxchgl edx,DWORD PTR [rcx]'
To avoid these errors, it needs to separate inline assembly (the upper half of the code) from C/C++ code which requires boost::future and the like (the lower half). The -masm=intel option is used to compile .cpp files that contain Intel syntax inline assembly, not to other .cpp files.
sample.hpp:
#include <cstdint>
using Value = uint32_t;
extern Value GetValue(void);
sample1.cpp: compile with -masm=intel
#include <iostream>
#include "sample.hpp"
int main(int argc, char* argv[]) {
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
:"=r"(value)::);
std::cout << (value + GetValue());
return 0;
}
sample2.cpp: compile without -masm=intel
#include <boost/thread/future.hpp>
#include "sample.hpp"
Value GetValue(void) {
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
return func.get();
}

Segmentation fault using OpenMp and SSE

I'm just getting started experimenting adding OpenMP to some SSE code.
My first test program SOMETIMES crashes in _mm_set_ps, but works when I set the if (0).
It looks so simple I must be missing something obvious.
I'm compiling with gcc -fopenmp -g -march=core2 -pthreads
#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>
int main()
{
#pragma omp parallel if (1)
{
#pragma omp sections
{
#pragma omp section
{
__m128 x1 = _mm_set_ps ( 1.1f, 2.1f, 3.1f, 4.1f );
}
#pragma omp section
{
__m128 x2 = _mm_set_ps ( 1.2f, 2.2f, 3.2f, 4.2f );
}
} // end omp sections
} //end omp parallel
return 0;
}
This is a bug in the openMP implementation. I was having the same problem in gcc on Windows (MinGW). -mstackrealign command line option solved my problem. This adds an instruction to the prolog of every function to realign the stack at the 16-byte boundary. I didn't notice any performance penalty. You can also try to add __attribute__ ((force_align_arg_pointer)) to a function declaration, which should do the same, but only for a specific function. You might have to put the SSE code in a separate function that you then call from the function with #pragma omp, so that the stack has a chance to be realigned.
I stopped having the problem when I moved onto compiling for a 64-bit target (MinGW64, such as TDM GCC build).
I am playing with AVX instructions which require a 32-byte alignment, but GCC doesn't support that for windows at all. This forced me to fix the produced assembly code using a python script, but it works.
I smell unaligned memory access. Its the only way code like that could explode(assuming that is the only code there). For that to happen the XMM registers wouldn't be used but rather stack memory, which is only aligned to 4 bytes, my guess is the omp code is messing up the alignment of the stack.

Resources