Here is what I am trying to do.
Assume you have two functions:
void f1(int *v)
{
*v = 55;
}
void f2(int *v)
{
*v = 44;
}
char *template;
template = allocExecutablePages(...);
char *allocExecutablePages (int pages)
{
    template = (char *) valloc (getpagesize () * pages);
    /* Make every allocated page readable, writable and executable */
    if (mprotect (template, getpagesize () * pages,
                  PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
        perror ("mprotect");
    }
    return template;
}
I would like to compare f1 and f2: get the assembly lines of both functions, compare them line by line, and determine what is identical and what is not.
I would then put those lines into my template.
Is there a way to do that in C?
Thanks
Update
Thanks for all your answers, but maybe I haven't explained my need correctly.
Basically, I'm trying to write a small obfuscation method.
The idea consists in letting two or more functions share the same location in memory. A region of memory (which we will call a template) is set up containing some of the
machine code bytes from the functions, more specifically, the ones they all
have in common. Before a particular function is executed, an edit script is used
to patch the template with the necessary machine code bytes to create a
complete version of that function. When another function assigned to the same
template is about to be executed, the process repeats, this time with a
different edit script. To illustrate this, suppose you want to obfuscate a
program that contains two functions f1 and f2. The first one (f1) has the
following machine code bytes
Address Machine code
0 10
1 5
2 6
3 20
and the second one (f2) has
Address Machine code
0 10
1 9
2 3
3 20
At obfuscation time, one will replace f1 and f2 by the template
Address Machine code
0 10
1 ?
2 ?
3 20
and by the two edit scripts e1 = {1 becomes 5, 2 becomes 6} and e2 = {1
becomes 9, 2 becomes 3}.
#include <stdlib.h>
#include <string.h>
typedef unsigned int uint32;
typedef char * addr_t;
typedef struct {
uint32 offset;
char value;
} EDIT;
EDIT script1[200], script2[200];
char *template;
int template_len, script_len = 0;
typedef void(*FUN)(int *);
int val, state = 0;
void f1_stub ()
{
if (state != 1) {
patch (script1, script_len, template);
state = 1;
}
((FUN)template)(&val);
}
void f2_stub () {
if (state != 2) {
patch (script2, script_len, template);
state = 2;
}
((FUN)template)(&val);
}
int new_main (int argc, char **argv)
{
f1_stub ();
f2_stub ();
return 0;
}
void f1 (int *v) { *v = 99; }
void f2 (int *v) { *v = 42; }
int main (int argc, char **argv)
{
int f1SIZE, f2SIZE;
/* makeCodeWritable (...); */
/* template = allocExecutablePages(...); */
/* Computed at obfuscation time */
diff ((addr_t)f1, f1SIZE,
(addr_t)f2, f2SIZE,
script1, script2,
&script_len,
template,
&template_len);
/* We hide the proper code */
memset (f1, 0, f1SIZE);
memset (f2, 0, f2SIZE);
return new_main (argc, argv);
}
So now I need to write the diff function, which will take the addresses of my two functions and generate a template with the associated edit scripts.
That is why I would like to compare my two functions byte by byte.
Sorry that my first post was not very clear!
Thank you
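The byte-by-byte comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the question's actual diff: the names diff_bytes and apply_patch, the Edit struct layout, and the requirement that both byte sequences have the same length are all hypothetical. Real machine code for two functions rarely has equal length, and obtaining that length is a separate problem.

```c
#include <stddef.h>

/* One edit: write `value` at byte `offset` of the template. */
typedef struct {
    size_t offset;
    unsigned char value;
} Edit;

/*
 * Compare two byte sequences a and b of the same length (assumption).
 * Matching bytes are copied into tmpl; differing positions are zeroed in
 * tmpl and recorded in script_a and script_b, which must each have room
 * for len entries. Returns the number of edits written to each script.
 */
size_t diff_bytes(const unsigned char *a, const unsigned char *b, size_t len,
                  unsigned char *tmpl, Edit *script_a, Edit *script_b)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] == b[i]) {
            tmpl[i] = a[i];
        } else {
            tmpl[i] = 0;                 /* placeholder, patched before use */
            script_a[n].offset = i;
            script_a[n].value = a[i];
            script_b[n].offset = i;
            script_b[n].value = b[i];
            n++;
        }
    }
    return n;
}

/* Apply an edit script to the template in place. */
void apply_patch(const Edit *script, size_t count, unsigned char *tmpl)
{
    for (size_t i = 0; i < count; i++)
        tmpl[script[i].offset] = script[i].value;
}
```

With the example bytes from the question (10 5 6 20 and 10 9 3 20), diff_bytes yields two edits per script, and applying either script to the template reproduces the corresponding original sequence.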
Do you want to do this at runtime or during authorship?
You can probably instruct your C compiler to produce assembly language output; for example, gcc has the -S option, which will produce output in file.s. Your compiler suite may also have a program like objdump, which can disassemble an object file or an entire executable. However, you generally want to leave optimizations up to a modern compiler rather than do them yourself.
At runtime the & operator can take the address of a function and you can read through it, though you have to be prepared for the possibility of encountering a branch instruction before anything interesting, so you actually have to programmatically "understand" at least a subset of the instruction set. What you will run into when reading function pointers will of course vary all over the place by machine, ABI, compiler, optimization flags, etc.
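Reading through a function's bytes at runtime, as described above, might start with a small dump helper like the sketch below. format_hex is a hypothetical name, and the helper works on any byte buffer; it does not interpret instructions.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the first n bytes at p as two-digit hex values separated by
 * single spaces. out must hold at least 3 * n + 1 bytes. Returns out.
 */
char *format_hex(const unsigned char *p, size_t n, char *out)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        sprintf(out + 3 * i, "%02x ", p[i]);
    if (n > 0)
        out[3 * n - 1] = '\0';           /* drop the trailing space */
    return out;
}
```

To inspect a function, one would pass something like (const unsigned char *)f1 together with a byte count. Converting a function pointer to an object pointer is not guaranteed by ISO C, although mainstream desktop platforms allow it, and the number of bytes that belong to the function has to come from elsewhere.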
Put the functions into t1.c and t2.c, then use gcc -S to generate assembly output:
gcc -S t1.c
gcc -S t2.c
Now compare t1.s and t2.s.
If you are using Visual Studio, go to
Project Properties -> Configuration -> C/C++ -> Output Files -> Assembler output
or use the compiler switches /FA, /FAc, /FAs, or /FAcs. A lowercase c adds the machine code to the listing, and s interleaves the source code with the assembly code. And don't forget to disable compiler optimizations.
Having read through some of the answers and the comments there, I'm not sure I fully understand your question, but maybe you're looking for a gcc invocation like the following:
gcc -S -xc - -o -
This tells gcc to input C code from stdin and output assembly to stdout.
If you use a vi-like editor, you can highlight the function body in visual mode and then run the command:
:'<,'>!gcc -S -xc - -o - 2> /dev/null
...and this will replace the function body with assembly (the "stderr > /dev/null" business is to skip errors about #include's).
You could otherwise use this invocation of gcc as part of a pipeline in a script.
I am writing code for an embedded system where it is more efficient to copy code from ROM to the SoC's internal memory and then execute it. Is there any way to programmatically get the size in bytes of a function, so that I can use a function like memcpy to copy the function's instructions to internal memory?
Is there a better way to do this?
Is there a better way to do this?
There likely is, and I truly hope someone else responds with an answer that provides a simpler approach, but for now I'll shed some details on your suggested method.
If the program is compiled in ELF format (I don't know about other formats), all your functions will be included in the .text section of the ELF file. You can use the symbol table to find this function in the text section. To get the size of this function, you might be able to use the st_size member of the Elf64_Sym or Elf32_Sym struct, but I'm not entirely certain that will give the correct size. What you could do (a little hacky, admittedly) is iterate through the other symbols, find the one immediately after it, and subtract to get the size. Of course you'd have to keep alignment rules in mind, but that's not too much of an issue: if you copy extra bytes, they won't be executed anyway.
Also keep in mind that some code gets compiled with certain assumptions about its offset in memory. You might need to manually patch the GOT and/or PLT if you copy the function directly into memory. Note that you should probably compile the function you want to include with -fpic or -fPIC for position-independent code, at least in GCC.
If you need more details on how to access the symbol table, or the text section of your ELF, I could add more details.
With some compilers you can get a function's size by computing the difference between the function's address and the address of another function that immediately follows it.
But it really depends on the compiler. With Visual C++, for example, both functions have to be static functions. With GCC, it no longer works if optimization level -O2 or higher is enabled.
And even if you manage to copy your function elsewhere in memory, you may not be able to use it, especially if it refers to other functions or to global/static variables, or if the code is not position independent, etc.
So this is a simple solution that may work in your case, but it can't be considered a general solution.
Below is an example that works with gcc and Visual C++, tested on Windows 10 and WSL (do not enable optimizations with gcc).
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif
#ifdef __linux
#include <sys/mman.h>
#endif
// The function to copy
static int fib(int m)
{
int v1 = 0, v2 = 1, n;
if (m == 0) return 0;
for (n = 1; n < m; n++)
{
int v = v1 + v2;
v1 = v2;
v2 = v;
}
return v2;
}
static void endFib(void)
{
// This function immediately follows the fib function.
// It exists only so that its address can be used to compute the size of fib.
}
int main(int argc, char *argv[])
{
long sizeFib;
int (*copyFib)(int);
printf("&fib=%p\n", (char *)fib);
sizeFib = (char *)endFib - (char *)fib;
printf("size of fib : %ld\n", sizeFib);
printf("fib(8) : %d\n", fib(8));
// For the example the allocated copy must be in an executable part of the memory.
#ifdef _WIN32
copyFib = VirtualAlloc(NULL, sizeFib, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
#endif
#ifdef __linux
copyFib = mmap(NULL, sizeFib, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
#endif
memcpy(copyFib, fib, sizeFib);
printf("&copyFib=%p\n", copyFib);
printf("copyFib(8) : %d\n", copyFib(8));
return 0;
}
Is it possible to write a script to run this code for different values of A?
#include <stdio.h>
#define A 3
int main (){
printf("In this version A = %d\n", A);
return(0);
}
I guess something like a for loop?
Is it possible to write a script to run this code for different values of A?
Not as it is because the macro A has a fixed value defined in your code. Instead you can pass the value as an argument:
#include <stdio.h>
int main(int argc, char **argv){
if(argc == 2) {
printf("In this version A = %s\n", argv[1]);
}
return 0;
}
(The code doesn't check whether its input is an integer; you can add that check if necessary.)
You can then run it from a script. For example, compile the above (gcc -Wall -Wextra test.c -o test) and run it in a bash for loop:
$ for ((i = 0; i < 10; i++)); do ./test $i; done
In this version A = 0
In this version A = 1
In this version A = 2
In this version A = 3
In this version A = 4
In this version A = 5
In this version A = 6
In this version A = 7
In this version A = 8
In this version A = 9
$
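The integer check mentioned above could look something like the following sketch. parse_int is a hypothetical helper name; it relies only on the standard strtol function and rejects empty strings, trailing junk, and values that do not fit in an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse s as a base-10 int. Returns 1 and stores the value in *out on
 * success; returns 0 if s has no digits, has trailing characters, or is
 * out of range for int.
 */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;                        /* no digits, or junk after them */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                        /* does not fit in an int */
    *out = (int)v;
    return 1;
}
```

In main, one would call parse_int(argv[1], &a) after checking argc and print an error when it returns 0.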
No. But you can make A a command line arg:
#include <stdio.h>
int main (int argc, char *argv[]) {
int a;
if (argc != 2 || sscanf(argv[1], "%d", &a) != 1) return 1;
printf("In this version A = %d\n", a);
return 0;
}
Compile to a binary named foo, then
foo 42
will print
In this version A = 42
You can also compile different versions by defining A in the compilation command line. From your original program, remove the #define. Then
gcc -DA=42 foo.c -o foo
./foo
will print the same as above.
Do you need to run the program repeatedly from a script? Why not make the program accept arguments from the command line?
1) The main() function actually takes arguments, so you can compile the program once and pass different parameters, as shown in the answers above.
2) If you need to change some code parameters from a make script, I'd say create a separate header that contains the defines, and write a script that echoes into that file (> to start, >> to continue writing).
3) Alternatively, you can call your compiler with a flag that is equivalent to a #define directive. For gcc it's -D, for example -DA=3 instead of #define A 3.
Most programs are built with a makefile. In that case you can script the makefile to use 2) or 3). The former is preferable because you do not need to pass that argument to all compilation targets, which reduces recompilation time. There are tools for more advanced manipulations, like autoconf.
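For option 3, a common pattern is to give the macro a default in the source so that the file still compiles when no -D flag is passed. The sketch below assumes a hypothetical helper named format_banner; only the #ifndef guard and the -D flag come from standard C and gcc.

```c
#include <stdio.h>

/* Default used when the build does not pass -DA=<value>. */
#ifndef A
#define A 3
#endif

/*
 * Write the version banner into buf (at most size bytes) and return the
 * length reported by snprintf.
 */
int format_banner(char *buf, size_t size)
{
    return snprintf(buf, size, "In this version A = %d\n", A);
}
```

Calling format_banner from main and printing the buffer gives the default value when built with plain gcc, and whatever value the script supplies when built with, for example, gcc -DA=42.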
In code reviews I ask for option (1) below to be used as it results in a symbol being created (for debugging) whereas (2) and (3) do not appear to do so at least for gcc and icc. However (1) is not a true const and cannot be used on all compilers as an array size. Is there a better option that includes debug symbols and is truly const for C?
Symbols:
gcc f.c -ggdb3 -g ; nm -a a.out | grep _sym
0000000100000f3c s _symA
0000000100000f3c - 04 0000 STSYM _symA
Code:
static const int symA = 1; // 1
#define symB 2 // 2
enum { symC = 3 }; // 3
GDB output:
(gdb) p symA
$1 = 1
(gdb) p symB
No symbol "symB" in current context.
(gdb) p symC
No symbol "symC" in current context.
And for completeness, the source:
#include <stdio.h>
static const int symA = 1;
#define symB 2
enum { symC = 3 };
int main (int argc, char *argv[])
{
printf("symA %d symB %d symC %d\n", symA, symB, symC);
return (0);
}
The -ggdb3 option should be giving you macro debugging information. But this is a different kind of debugging information (it has to be different - it tells the debugger how to expand the macro, possibly including arguments and the # and ## operators) so you can't see it with nm.
If your goal is to have something that shows up in nm, then I guess you can't use a macro. But that's a silly goal; you should want to have something that actually works in a debugger, right? Try print symC in gdb and see if it works.
Since macros can be redefined, gdb requires the program to be stopped at a location where the macro existed so it can find the correct definition. In this program:
#include <stdio.h>
int main(void)
{
#define X 1
printf("%d\n", X);
#undef X
printf("---\n");
#define X 2
printf("%d\n", X);
}
If you break on the first printf and print X you'll get the 1; next to the second printf and gdb will tell you that there is no X; next again and it will show the 2.
Also the gdb command info macro foo can be useful, if foo is a macro that takes arguments and you want to see its definition rather than expand it with a specific set of arguments. And if a macro expands to something that's not an expression, gdb can't print it so info macro is the only thing you can do with it.
For better inspection of the raw debugging information, try objdump -W instead of nm.
However (1) is not a true const and cannot be used on all compilers as an array size.
This can be used as an array size on all compilers that support C99 and later (gcc, clang), where it declares a variable-length array at block scope. For others (like MSVC) you have only the last two options.
Option 3 is preferable to option 2. Enum constants are different from #define constants: the compiler itself, not just the preprocessor, knows about them, so they can be used for debugging. Neither is an lvalue, though, so you cannot assign to either or take its address.
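The difference between an enum constant and a const object can be illustrated with a small sketch. The names BUF_LEN, buf_len, table, and the three helper functions are hypothetical; the point is only that an enum constant is an integer constant expression in C, while a static const int is not.

```c
#include <stddef.h>

enum { BUF_LEN = 8 };            /* integer constant expression */
static const int buf_len = 8;    /* read-only object, not a constant expression in C */

/* File-scope array sized by the enum constant: valid in C89 and later.
 * Writing table[buf_len] here instead is not valid ISO C at file scope. */
static char table[BUF_LEN];

size_t table_size(void)
{
    return sizeof table;
}

/* Case labels also need integer constant expressions, so BUF_LEN works. */
int is_buf_len(int n)
{
    switch (n) {
    case BUF_LEN:
        return 1;
    default:
        return 0;
    }
}

/* The const object is still usable as an ordinary run-time value. */
int read_buf_len(void)
{
    return buf_len;
}
```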
I have a dynamic library that contains a constructor.
__attribute__ ((constructor))
void construct() {
// This is initialization code
}
The library is compiled with the -nostdlib option and I cannot change that. As a result there are no .ctor and .dtor sections in the library, and the constructor does not run when the library is loaded.
As written there, there should be special measures that allow running the constructor even in this case. Could you please advise me what can be done, and how?
Why do you need constructors? Most programmers I work with, myself included, refuse to use libraries with global constructors because all too often they introduce bugs by messing up the program's initial state when main is entered. One concrete example I can think of is OpenAL, which broke programs when it was merely linked, even if it was never called. I was not the one on the project who dealt with this bug, but if I'm not mistaken it had something to do with mucking with ALSA and breaking the main program's use of ALSA later.
If your library has nontrivial global state, instead see if you can simply use global structs and initializers. You might need to add flags with some pointers to indicate whether they point to allocated memory or static memory, though. Another method is to defer initialization to the first call, but this can have thread-safety issues unless you use pthread_once or similar.
Hmm, I missed the part that there were no .ctor and .dtor sections... forget about this.
#include <stdio.h>
#include <stdint.h>
typedef void (*func)(void);
__attribute__((constructor))
void func1(void) {
printf("func1\n");
}
__attribute__((constructor))
void func2(void) {
printf("func2\n");
}
extern func* __init_array_start;
int main(int argc, char **argv)
{
func *funcarr = (func*)&__init_array_start;
func f;
int idx;
printf("start %p\n", *funcarr);
// iterate over the array
for (idx = 0; ; ++idx) {
f = funcarr[idx];
// skip the start-of-array marker (0xFFFFFFFF); on 64 bit it's twice as long ;)
if (f == (void*)~0)
continue;
// stop when f is NULL, which indicates the end of the array
if (f == NULL)
break;
printf("constructor %p\n", *f);
f();
}
return 0;
}
Which gives:
Compilation started at Fri Mar 9 09:28:29
make test && ./test
cc test.c -o test
func2
func1
start 0xffffffff
constructor 0x80483f4
func1
constructor 0x8048408
func2
You probably need to swap the continue and break if you are running on a big-endian system, but I'm not entirely sure.
But just as R.. stated, using static constructors in libraries is not so nice to the developers using your library :p
On some platforms, .init_array/.fini_array sections are generated to include all global constructors/destructors. You may use that.
What is the simplest possible C function for starting the R interpreter, passing in a small expression (eg, 2+2), and getting out the result? I'm trying to compile with MingW on Windows.
You want to call R from C?
Look at section 8.1 in the Writing R Extensions manual. You should also look into the "tests" directory (download the source package extract it and you'll have the tests directory). A similar question was previously asked on R-Help and here was the example:
#include <Rinternals.h>
#include <Rembedded.h>
SEXP hello() {
return mkString("Hello, world!\n");
}
int main(int argc, char **argv) {
SEXP x;
Rf_initEmbeddedR(argc, argv);
x = hello();
return x == NULL; /* i.e. 0 on success */
}
The simple example from the R manual is like so:
#include <Rembedded.h>
int main(int argc, char **argv)
{
/* do some setup */
Rf_initEmbeddedR(argc, argv);
/* do some more setup */
/* submit some code to R, which is done interactively via
run_Rmainloop();
A possible substitute for a pseudo-console is
R_ReplDLLinit();
while(R_ReplDLLdo1() > 0) {
add user actions here if desired
}
*/
Rf_endEmbeddedR(0);
/* final tidying up after R is shutdown */
return 0;
}
Incidentally, you might want to consider using RInside instead: Dirk provides a nice "hello world" example on the project homepage.
If you're interested in calling C from R, here's my original answer:
This isn't exactly "hello world", but here are some good resources:
Jay Emerson recently gave a talk on R package development at the New York useR group, and he provided some very nice examples of using C from within R. Have a look at the paper from this discussion on his website, starting on page 9. All the related source code is here: http://www.stat.yale.edu/~jay/Rmeetup/MyToolkitWithC/.
The course taught at Harvard by Gopi Goswami in 2005: C-C++-R (in Statistics). This includes extensive examples and source code.
Here you go. It's the main function, but you should be able to adapt it to a more general purpose function. This example builds an R expression from C calls and also from a C string. You're on your own for the compiling on windows, but I've provided compile steps on linux:
/* simple.c */
#include <Rinternals.h>
#include <Rembedded.h>
#include <R_ext/Parse.h>
int
main(int argc, char *argv[])
{
char *localArgs[] = {"R", "--no-save","--silent"};
SEXP e, tmp, ret;
ParseStatus status;
int i;
Rf_initEmbeddedR(3, localArgs);
/* EXAMPLE #1 */
/* Create the R expressions "rnorm(10)" with the R API.*/
PROTECT(e = allocVector(LANGSXP, 2));
tmp = findFun(install("rnorm"), R_GlobalEnv);
SETCAR(e, tmp);
SETCADR(e, ScalarInteger(10));
/* Call it, and store the result in ret */
PROTECT(ret = R_tryEval(e, R_GlobalEnv, NULL));
/* Print out ret */
printf("EXAMPLE #1 Output: ");
for (i=0; i<length(ret); i++){
printf("%f ",REAL(ret)[i]);
}
printf("\n");
UNPROTECT(2);
/* EXAMPLE 2*/
/* Parse and eval the R expression "rnorm(10)" from a string */
PROTECT(tmp = mkString("rnorm(10)"));
PROTECT(e = R_ParseVector(tmp, -1, &status, R_NilValue));
PROTECT(ret = R_tryEval(VECTOR_ELT(e,0), R_GlobalEnv, NULL));
/* And print. */
printf("EXAMPLE #2 Output: ");
for (i=0; i<length(ret); i++){
printf("%f ",REAL(ret)[i]);
}
printf("\n");
UNPROTECT(3);
Rf_endEmbeddedR(0);
return(0);
}
Compile steps:
$ gcc -I/usr/share/R/include/ -c -ggdb simple.c
$ gcc -o simple simple.o -L/usr/lib/R/lib -lR
$ LD_LIBRARY_PATH=/usr/lib/R/lib R_HOME=/usr/lib/R ./simple
EXAMPLE #1 Output: 0.164351 -0.052308 -1.102335 -0.924609 -0.649887 0.605908 0.130604 0.243198 -2.489826 1.353731
EXAMPLE #2 Output: -1.532387 -1.126142 -0.330926 0.672688 -1.150783 -0.848974 1.617413 -0.086969 -1.334659 -0.313699
I don't think any of the above has answered the question - which was to evaluate 2 + 2 ;). To use a string expression would be something like:
#include <Rinternals.h>
#include <R_ext/Parse.h>
#include <Rembedded.h>
int main(int argc, char **argv) {
SEXP x;
ParseStatus status;
const char* expr = "2 + 2";
Rf_initEmbeddedR(argc, argv);
x = R_ParseVector(mkString(expr), 1, &status, R_NilValue);
if (TYPEOF(x) == EXPRSXP) { /* parse returns an expr vector, you want the first */
x = eval(VECTOR_ELT(x, 0), R_GlobalEnv);
PrintValue(x);
}
Rf_endEmbeddedR(0);
return 0;
}
This lacks error checking, obviously, but works:
Z:\>gcc -o e.exe e.c -IC:/PROGRA~1/R/R-213~1.0/include -LC:/PROGRA~1/R/R-213~1.0/bin/i386 -lR
Z:\>R CMD e.exe
[1] 4
(To get the proper commands for your R use R CMD SHLIB e.c which gives you the relevant compiler flags)
You can also construct the expression by hand if it's simple enough - e.g., for rnorm(10) you would use
SEXP rnorm = install("rnorm");
SEXP x = eval(lang2(rnorm, ScalarInteger(10)), R_GlobalEnv);
I think you can't do much better than the inline package (which supports C, C++ and Fortran):
library(inline)
fun <- cfunction(signature(x="ANY"),
body='printf("Hello, world\\n"); return R_NilValue;')
res <- fun(NULL)
which will print 'Hello, World' for you. And you don't even know where / how / when the compiler and linker are invoked. [ The R_NilValue is R's NULL version of a SEXP and the .Call() signature used here requires that you return a SEXP -- see the 'Writing R Extensions' manual which you can't really avoid here. ]
You will then take such code and wrap it in a package. We had great success with using inline for the Rcpp unit tests (over 200 and counting now) and some of the examples.
Oh, and this inline example will work on any OS. Even Windoze provided you have the R package building tool chain installed, in the PATH etc pp.
Edit: I misread the question. What you want is essentially what the littler front-end does (using pure C) and what the RInside classes factored-out for C++.
Jeff and I never bothered with porting littler to Windoze, but RInside did work there in the most recent release. So you should be able to poke around the build recipes and create a C-only variant of RInside so that you can feed expressions to an embedded R process. I suspect that you still want something like Rcpp for the glue, as it gets tedious otherwise.
Edit 2: And as Shane mentions, there are indeed a few examples in the R sources in tests/Embedding/ along with a Makefile.win. Maybe that is the simplest start if you're willing to learn about R internals.