Obfuscating function call - c

John Viega suggests a method to obfuscate function calls in his book Secure Programming Cookbook for C and C++. It can be read here.
#define SET_FN_PTR(func, num) \
static inline void *get_##func(void) { \
int i, j = num / 4; \
long ptr = (long)func + num; \
for (i = 0; i < 2; i++) ptr -= j; \
return (void *)(ptr - (j * 2)); \
}
#define GET_FN_PTR(func) get_##func( )
#include <stdio.h>
void my_func(void) {
printf("my_func( ) called!\n");
}
SET_FN_PTR(my_func, 0x01301100); /* 0x01301100 is some arbitrary value */
int main(int argc, char *argv[ ]) {
void (*ptr)(void);
ptr = GET_FN_PTR(my_func); /* get the real address of the function */
(*ptr)( ); /* make the function call */
return 0;
}
I compiled it with gcc fp.c -S -O2, Ubuntu 15.10 64bit, gcc5.2.1, and checked the assemby:
...
my_func:
.LFB23:
.cfi_startproc
movl $.LC0, %edi
jmp puts
.cfi_endproc
.LFE23:
.size my_func, .-my_func
.section .text.unlikely
.LCOLDE1:
.text
.LHOTE1:
.section .text.unlikely
.LCOLDB2:
.section .text.startup,"ax",#progbits
.LHOTB2:
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB25:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call my_func
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
...
I see that my_func is called in main. Can somebody explain how this method obfuscates the function call?
I see that many readers just come and downvote. I took the time the understand the problem, and when I failed to post it here. Please at least write some comment, instead of pushing the downvote button.
UPDATE: Turning off optimization I got:
...
my_func:
...
get_my_func:
...
main:
...
call get_my_func
movq %rax, -8(%rbp)
movq -8(%rbp), %rax
call *%rax
...
I think there is no inlineing now. However I do not really understand why is it important...
I am still looking for an explanation what was the goal of the author with this code, even if it not working with today's smart compilers.

The problem with that way to obfuscating function call relies on the compiler not being smart enough to see through the obfuscation. The idea here was that the caller shouldn't contain a direct reference to the function to be called, but to retrieve the pointer to the function from another function.
However modern compiler does this and when applying optimization they remove the obfuscation again. What the compiler does is probably simple inline expansion of GET_FN_PTR and when inline expanded it is quite obvious how to optimize - it's just a bunch of constants that's combined into a pointer which is then called. Constant expressions are quite easy to compute at compile time (and is often done).
Before you obfuscate your code you should probably have a good reason to do so, and use a method suitable for the needs.

The idea of the suggested approach is to use an indirect function call so that the function address must be computed first and then called. The C Preprocessor is used to provide a way to define a proxy function for the actual function and this proxy function provides the calculation needed to determine the actual address of the real function which the proxy function provides access to.
See Wikipedia article Proxy pattern for details about the Proxy design pattern which has this to say:
The proxy design pattern allows you to provide an interface to other
objects by creating a wrapper class as the proxy. The wrapper class,
which is the proxy, can add additional functionality to the object of
interest without changing the object's code.
I would suggest an alternative which implements the same type of indirect call however it does not require using the C Preprocessor to hide implementation details in such a fashion as to make reading of the source code difficult.
The C compiler allows for a struct to contain function pointers as members. What is nice about this is that you can define an externally visible struct variable with function pointers a members yet when the struct is defined, the functions specified in the definition of the struct variable can be static meaning they have file visibility only (see What does "static" mean in a C program.)
So I can have two files, a header file func.h and an implementation file func.c which define the struct type, the declaration of the externally visible struct variable, the functions used with a static modifier, and the externally visible struct variable definition with the function addresses.
What is attractive about this approach is that the source code is easy to read and most IDEs will handle this sort of indirect much nicer because the C Preprocessor is not being used to create source at compile time which affects readability by people and by software tools such as IDEs.
An example func.h file, which would be #included into the C source file using the functions, could look like:
// define a type using a typedef so that we can declare the externally
// visible struct in this include file and then use the same type when
// defining the externally visible struct in the implementation file which
// will also have the definitions for the actual functions which will have
// file visibility only because we will use the static modifier to restrict
// the functions' visibility to file scope only.
typedef struct {
int (*p1)(int a);
int (*p2)(int a);
} FuncList;
// declare the externally visible struct so that anything using it will
// be able to access it and its members or the addresses of the functions
// available through this struct.
extern FuncList myFuncList;
And the func.c file example could look like:
#include <stdio.h>
#include "func.h"
// the functions that we will be providing through the externally visible struct
// are here. we mark these static since the only access to these is through
// the function pointer members of the struct so we do not want them to be
// visible outside of this file. also this prevents name clashes between these
// functions and other functions that may be linked into the application.
// this use of an externally visible struct with function pointer members
// provides something similar to the use of namespace in C++ in that we
// can use the externally visible struct as a way to create a kind of
// namespace by having everything go through the struct and hiding the
// functions using the static modifier to restrict visibility to the file.
static int p1Thing(int a)
{
return printf ("-- p1 %d\n", a);
}
static int p2Thing(int a)
{
return printf ("-- p2 %d\n", a);
}
// externally visible struct with function pointers to allow indirect access
// to the static functions in this file which are not visible outside of
// this file. we do this definition here so that we have the prototypes
// of the functions which are defined above to allow the compiler to check
// calling interface against struct member definition.
FuncList myFuncList = {
p1Thing,
p2Thing
};
A simple C source file using this externally visible struct could look like:
#include "func.h"
int main(int argc, char * argv[])
{
// call function p1Thing() through the struct function pointer p1()
myFuncList.p1 (1);
// call function p2Thing() through the struct function pointer p2()
myFuncList.p2 (2);
return 0;
}
The assembler emitted by Visual Studio 2005 for the above main() looks like the following showing a computed call through the specified address:
; 10 : myFuncList.p1 (1);
00000 6a 01 push 1
00002 ff 15 00 00 00
00 call DWORD PTR _myFuncList
; 11 : myFuncList.p2 (2);
00008 6a 02 push 2
0000a ff 15 04 00 00
00 call DWORD PTR _myFuncList+4
00010 83 c4 08 add esp, 8
; 12 : return 0;
00013 33 c0 xor eax, eax
As you can see this function calls are now indirect function calls through a struct specified by an offset within the struct.
The nice thing about this approach is that you can do whatever you want to the memory area containing the function pointers so long as before you call a function through the data area, the correct function addresses have been put there. So you could actually have two functions, one that would initialize the area with the correct addresses and a second that would clear the area. So before using the functions you would call the function to initialize the area and after finishing with the functions call the function to clear the area.
// file scope visible struct containing the actual or real function addresses
// which can be used to initialize the externally visible copy.
static FuncList myFuncListReal = {
p1Thing,
p2Thing
};
// NULL addresses in externally visible struct to cause crash is default.
// Must use myFuncListInit() to initialize the pointers
// with the actual or real values.
FuncList myFuncList = {
0,
0
};
// externally visible function that will update the externally visible struct
// with the correct function addresses to access the static functions.
void myFuncListInit (void)
{
myFuncList = myFuncListReal;
}
// externally visible function to reset the externally visible struct back
// to NULLs in order to clear the addresses making the functions no longer
// available to external users of this file.
void myFuncListClear (void)
{
memset (&myFuncList, 0, sizeof(myFuncList));
}
So you could do something like this modified main():
myFuncListInit();
myFuncList.p1 (1);
myFuncList.p2 (2);
myFuncListClear();
However what you would really want to do is to have the call to myFuncListInit() be someplace in the source that would not be near where the functions are actually used.
Another interesting option would be to have the data area encrypted and in order to use the program, the user would need to enter the correct key to properly decrypt the data to get the correct pointer addresses.

The "obfuscation" in C/C++ is mainly related to the size of compiled code. If it is too short (e.g. 500-1000 assembly lines), every middle level programmer can decode it and find what is necessary for several days or hours.

Related

Specifying registers for function arguments?

Some compilers, says old gcc or egcs, apply ABI-breaking optimization for static functions within single file, like passing arguments or returning results with arbitrary registers.
Consider some source code like:
// Original foobar.c
// This example targets MIPS o32 ABI.
// Shared subroutine
// Compiler decided to use $16, $17 to pass a0 and a1 to minimize stack usage and move between registers.
static void __bar(int a0, int a1) {
// Something very complicated
}
// ...
void foo(int a0, int a1) {
// ...
/*
This call was compiled to something like:
ori $16, $0, 0x1
jal __bar
ori $17, $0, 0x1
*/
__bar(1, 1);
// ...
}
// ...
Suppose someone want to restore / reimplement foobar.c from the compiled assembly without access to the original source.
One would probably like to decompile / rewrite some part first, says start from foo() or other standard functions.
However, in order to test the correctness of the implementation, one must deal with calls to non-standard ABI routines.
A trivial way is to workaround with global register variables provided by gcc / clang:
// Restoration of foobar.c
// void __bar(int asm("s0"), int asm("s1"))
// External function in assembly, says foobar.s, which is from compiled original foobar.c.
void __bar();
volatile register int s0 asm ("s0"); // $16 = s0
volatile register int s1 asm ("s1"); // $17 = s1
// ...
void foo(int a0, int a1) {
// ...
// __bar(1, 1);
s0 = 1; s1 = 1;
__bar();
// ...
}
// ...
The question is:
Does gcc / clang supports customize calling convention for some specific functions?
Are there any way to deal with non-standard ABI calls more elegantly?
Does gcc / clang supports customize calling convention for some specific functions?
The best you can do is opt in to one of the specific supported calling conventions, e.g. one of these for x86. If the static function in question does not conform to any of them, then you're stuck.
Are there any way to deal with non-standard ABI calls more elegantly?
Nothing truly elegant. If none of the supported calling conventions apply, you're stuck with either:
Reversing & rebuilding the whole thing (so it can compile as normal without relying on original binaries), or at least enough of it that you're replacing ABI conforming functions and all their dependencies completely, or
Calling it from assembly, explicitly passing the arguments per the non-standard calling conventions of the compiled function.
#2 is the basis for the most elegant solution, which is basically to write a wrapper function in assembly that receives the arguments and returns the values according to the ABI, and otherwise does nothing but rearrange them to pass to the non-standard function it wraps (and possibly fix up the return value if it's not returning according to normal rules). You write the wrapper(s) once, and now the rest of your code can be written in C, calling the wrapper functions which adhere to the ABI and being blissfully unaware of the weirdness under the covers.
Similarly, if you're trying to replace the existing non-standard function with another, you'd write the non-conforming wrapper in assembly, then write your replacement function in plain C and have the wrapper call it, and swap in your wrapper in your hacked together mix of the original binary and the new code.

How to intercept a static library call in C language?

Here's my question:
There is a static library (xxx.lib) and some C files who are calling function foo() in xxx.lib. I'm hoping to get a notification message every time foo() is called. But I'm not allowed to change any source code written by others.
I've spent several days searching on the Internet and found several similar Q&As but none of these suggestions could really solve my problem. I list some of them:
use gcc -wrap: Override a function call in C
Thank god, I'm using Microsoft C compiler and linker, and I can't find an equivalent option as -wrap.
Microsoft Detours:
Detours intercepts C calls in runtime and re-direct the call to a trampoline function. But Detours is only free for IA32 version, and it's not open source.
I'm thinking about injecting a jmp instruction at the start of function foo() to redirect it to my own function. However it's not feasible when foo()is empty, like
void foo() ---> will be compiled into 0xC3 (ret)
{ but it'll need at least 8 bytes to inject a jmp
}
I found a technology named Hotpatch on MSDN. It says the linker will add serveral bytes of padding at the beginning of each function. That's great, because I can replace the padding bytes with jmp instruction to realize the interception in runtime! But when I use the /FUNCTIONPADMIN option with the linker, it gives me a warning:
LINK : warning LNK4044: unrecognized option '/FUNCTIONPADMIN'; ignored
Anybody could tell me how could I make a "hotpatchable" image correctly? Is it a workable solution for my question ?
Do I still have any hope to realize it ?
If you have the source, you can instrument the code with GCC without changing the source by adding -finstrument-functions for the build of the files containing the functions you are interested in. You'll then have to write __cyg_profile_func_enter/exit functions to print your tracing. An example from here:
#include <stdio.h>
#include <time.h>
static FILE *fp_trace;
void
__attribute__ ((constructor))
trace_begin (void)
{
fp_trace = fopen("trace.out", "w");
}
void
__attribute__ ((destructor))
trace_end (void)
{
if(fp_trace != NULL) {
fclose(fp_trace);
}
}
void
__cyg_profile_func_enter (void *func, void *caller)
{
if(fp_trace != NULL) {
fprintf(fp_trace, "e %p %p %lu\n", func, caller, time(NULL) );
}
}
void
__cyg_profile_func_exit (void *func, void *caller)
{
if(fp_trace != NULL) {
fprintf(fp_trace, "x %p %p %lu\n", func, caller, time(NULL));
}
}
Another way to go if you have source to recompile the library as a shared library. From there it is possible to do runtime insertions of your own .so/.dll using any number of debugging systems. (ltrace on unix, something or other on windows [somebody on windows -- please edit]).
If you don't have source, then I would think your option 3 should still work. Folks writing viruses have been doing it for years. You may have to do some manual inspection (because x86 instructions aren't all the same length), but the trick is to pull out a full instruction and replace it with a jump to somewhere safe. Do what you have to do, get the registers back into the same state as if the instruction you removed had run, then jump to just after the jump instruction you inserted.
The VC compiler provides 2 options /Gh & /GH for hooking functions.
The /Gh flag causes a call to the _penter function at the start of every method or function, and the /GH flag causes a call to the _pexit function at the end of every method or function.
So, if I write some code in _penter to find out the address of the caller function, then I'll be able to intercept any function selectively by comparing the function address.
I made a sample:
#include <stdio.h>
void foo()
{
}
void bar()
{
}
void main() {
bar();
foo();
printf ("I'm main()!");
}
void __declspec(naked) _cdecl _penter( void )
{
__asm {
push ebp; // standard prolog
mov ebp, esp;
sub esp, __LOCAL_SIZE
pushad; // save registers
}
unsigned int addr;
// _ReturnAddress always returns the address directly after the call, but that is not the start of the function!
// subtract 5 bytes as instruction for call _penter
// is 5 bytes long on 32-bit machines, e.g. E8 <00 00 00 00>
addr = (unsigned int)_ReturnAddress() - 5;
if (addr == foo) printf ("foo() is called.\n");
if (addr == bar) printf ("bar() is called.\n");
_asm {
popad; // restore regs
mov esp, ebp; // standard epilog
pop ebp;
ret;
}
}
Build it with cl.exe source.c /Gh and run it:
bar() is called.
foo() is called.
I'm main()!
It's perfect!
More examples about how to use _penter and _pexit can be found here A Simple Profiler and tracing with penter pexit and A Simple C++ Profiler on x64.
I've solved my problem using this method, and I hope it can help you also.
:)
I don't think there is any to do this without changing any code.
Easiest way I can think of is to do this is to write wrapper for your void foo() function and Find/Replace it with your wrapper.
void myFoo(){
return foo();
}
Instead of calling foo() call myFoo().
Hope this will help you.

incompatible return type from struct function - C

When I attempt to run this code as it is, I receive the compiler message "error: incompatible types in return". I marked the location of the error in my code. If I take the line out, then the compiler is happy.
The problem is I want to return a value representing invalid input to the function (which in this case is calling f2(2).) I only want a struct returned with data if the function is called without using 2 as a parameter.
I feel the only two ways to go is to either:
make the function return a struct pointer instead of a dead-on struct but then my caller function will look funny as I have to change y.b to y->b and the operation may be slower due to the extra step of fetching data in memory.
Allocate extra memory, zero-byte fill it, and set the return value to the struct in that location in memory. (example: return x[nnn]; instead of return x[0];). This approach will use more memory and some processing to zero-byte fill it.
Ultimately, I'm looking for a solution that will be fastest and cleanest (in terms of code) in the long run. If I have to be stuck with using -> to address members of elements then I guess that's the way to go.
Does anyone have a solution that uses the least cpu power?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
long a;
char b;
}ab;
static char dat[500];
ab f2(int set){
ab* x=(ab*)dat;
if (set==2){return NULL;}
if (set==1){
x->a=1;
x->b='1';
x++;
x->a=2;
x->b='2';
x=(ab*)dat;
}
return x[0];
}
int main(){
ab y;
y=f2(1);
printf("%c",y.b);
y.b='D';
y=f2(0);
printf("%c",y.b);
return 0;
}
If you care about speed, it is implementation specific.
Notice that the Linux x86-64 ABI defines that a struct of two (exactly) scalar members (that is, integers, doubles, or pointers, -which all fit in a single machine register- but not struct etc... which are aggregate data) is returned thru two registers (without going thru the stack), and that is quite fast.
BTW
if (set==2){return NULL;} //wrong
is obviously wrong. You could code:
if (set==2) return (aa){0,0};
Also,
ab* x=(ab*)dat; // suspicious
looks suspicious to me (since you return x[0]; later). You are not guaranteed that dat is suitably aligned (e.g. to 8 or 16 bytes), and on some platforms (notably x86-64) if dat is misaligned you are at least losing performance (actually, it is undefined behavior).
BTW, I would suggest to always return with instructions like return (aa){l,c}; (where l is an expression convertible to long and c is an expression convertible to char); this is probably the easiest to read, and will be optimized to load the two return registers.
Of course if you care about performance, for benchmarking purposes, you should enable optimizations (and warnings), e.g. compile with gcc -Wall -Wextra -O2 -march=native if using GCC; on my system (Linux/x86-64 with GCC 5.2) the small function
ab give_both(long x, char y)
{ return (ab){x,y}; }
is compiled (with gcc -O2 -march=native -fverbose-asm -S) into:
.globl give_both
.type give_both, #function
give_both:
.LFB0:
.file 1 "ab.c"
.loc 1 7 0
.cfi_startproc
.LVL0:
.loc 1 7 0
xorl %edx, %edx # D.2139
movq %rdi, %rax # x, x
movb %sil, %dl # y, D.2139
ret
.cfi_endproc
you see that all the code is using registers, and no memory is used at all..
I would use the return value as an error code, and the caller passes in a pointer to his struct such as:
int f2(int set, ab *outAb); // returns -1 if invalid input and 0 otherwise

Tell gcc that a function call will not return

I am using C99 under GCC.
I have a function declared static inline in a header that I cannot modify.
The function never returns but is not marked __attribute__((noreturn)).
How can I call the function in a way that tells the compiler it will not return?
I am calling it from my own noreturn function, and partly want to suppress the "noreturn function returns" warning but also want to help the optimizer etc.
I have tried including a declaration with the attribute but get a warning about the repeated declaration.
I have tried creating a function pointer and applying the attribute to that, but it says the function attribute cannot apply to a pointed function.
From the function you defined, and which calls the external function, add a call to __builtin_unreachable which is built into at least GCC and Clang compilers and is marked noreturn. In fact, this function does nothing else and should not be called. It's only here so that the compiler can infer that program execution will stop at this point.
static inline external_function() // lacks the noreturn attribute
{ /* does not return */ }
__attribute__((noreturn)) void your_function() {
external_function(); // the compiler thinks execution may continue ...
__builtin_unreachable(); // ... and now it knows it won't go beyond here
}
Edit: Just to clarify a few points raised in the comments, and generally give a bit of context:
A function has has only two ways of not returning: loop forever, or short-circuit the usual control-flow (e.g. throw an exception, jump out of the function, terminate the process, etc.)
In some cases, the compiler may be able to infer and prove through static analysis that a function will not return. Even theoretically, this is not always possible, and since we want compilers to be fast only obvious/easy cases are detected.
__attribute__((noreturn)) is an annotation (like const) which is a way for the programmer to inform the compiler that he's absolutely sure a function will not return. Following the trust but verify principle, the compiler tries to prove that the function does indeed not return. If may then issue an error if it proves the function may return, or a warning if it was not able to prove whether the function returns or not.
__builtin_unreachable has undefined behaviour because it is not meant to be called. It's only meant to help the compiler's static analysis. Indeed the compiler knows that this function does not return, so any following code is provably unreachable (except through a jump).
Once the compiler has established (either by itself, or with the programmer's help) that some code is unreachable, it may use this information to do optimizations like these:
Remove the boilerplate code used to return from a function to its caller, if the function never returns
Propagate the unreachability information, i.e. if the only execution path to a code points is through unreachable code, then this point is also unreachable. Examples:
if a function does not return, any code following its call and not reachable through jumps is also unreachable. Example: code following __builtin_unreachable() is unreachable.
in particular, it the only path to a function's return is through unreachable code, the function can be marked noreturn. That's what happens for your_function.
any memory location / variable only used in unreachable code is not needed, therefore settings/computing the content of such data is not needed.
any computations which is probably (1) unnecessary (previous bullet) and (2) has no side effects (such as pure functions) may be removed.
Illustration:
The call to external_function cannot be removed because it might have side-effects. In fact, it probably has at least the side effect of terminating the process!
The return boiler plate of your_function may be removed
Here's another example showing how code before the unreachable point may be removed
int compute(int) __attribute((pure)) { return /* expensive compute */ }
if(condition) {
int x = compute(input); // (1) no side effect => keep if x is used
// (8) x is not used => remove
printf("hello "); // (2) reachable + side effect => keep
your_function(); // (3) reachable + side effect => keep
// (4) unreachable beyond this point
printf("word!\n"); // (5) unreachable => remove
printf("%d\n", x); // (6) unreachable => remove
// (7) mark 'x' as unused
} else {
// follows unreachable code, but can jump here
// from reachable code, so this is reachable
do_stuff(); // keep
}
Several solutions:
redeclaring your function with the __attribute__
You should try to modify that function in its header by adding __attribute__((noreturn)) to it.
You can redeclare some functions with new attribute, as this stupid test demonstrates (adding an attribute to fopen) :
#include <stdio.h>
extern FILE *fopen (const char *__restrict __filename,
const char *__restrict __modes)
__attribute__ ((warning ("fopen is used")));
void
show_map_without_care (void)
{
FILE *f = fopen ("/proc/self/maps", "r");
do
{
char lin[64];
fgets (lin, sizeof (lin), f);
fputs (lin, stdout);
}
while (!feof (f));
fclose (f);
}
overriding with a macro
At last, you could define a macro like
#define func(A) {func(A); __builtin_unreachable();}
(this uses the fact that inside a macro, the macro name is not macro-expanded).
If your never-returning func is declaring as returning e.g. int you'll use a statement expression like
#define func(A) ({func(A); __builtin_unreachable(); (int)0; })
Macro-based solutions like above won't always work, e.g. if func is passed as a function pointer, or simply if some guy codes (func)(1) which is legal but ugly.
redeclaring a static inline with the noreturn attribute
And the following example:
// file ex.c
// declare exit without any standard header
void exit (int);
// define myexit as a static inline
static inline void
myexit (int c)
{
exit (c);
}
// redeclare it as notreturn
static inline void myexit (int c) __attribute__ ((noreturn));
int
foo (int *p)
{
if (!p)
myexit (1);
if (p)
return *p + 2;
return 0;
}
when compiled with GCC 4.9 (from Debian/Sid/x86-64) as gcc -S -fverbose-asm -O2 ex.c) gives an assembly file containing the expected optimization:
.type foo, #function
foo:
.LFB1:
.cfi_startproc
testq %rdi, %rdi # p
je .L5 #,
movl (%rdi), %eax # *p_2(D), *p_2(D)
addl $2, %eax #, D.1768
ret
.L5:
pushq %rax #
.cfi_def_cfa_offset 16
movb $1, %dil #,
call exit #
.cfi_endproc
.LFE1:
.size foo, .-foo
You could play with #pragma GCC diagnostic to selectively disable a warning.
Customizing GCC with MELT
Finally, you could customize your recent gcc using the MELT plugin and coding your simple extension (in the MELT domain specific language) to add the attribute noreturn when encoutering the desired function. It is probably a dozen of MELT lines, using register_finish_decl_first and a match on the function name.
Since I am the main author of MELT (free software GPLv3+) I could perhaps even code that for you if you ask, e.g. here or preferably on gcc-melt#googlegroups.com; give the concrete name of your never-returning function.
Probably the MELT code is looking like:
;;file your_melt_mode.melt
(module_is_gpl_compatible "GPLv3+")
(defun my_finish_decl (decl)
(let ( (tdecl (unbox :tree decl))
)
(match tdecl
(?(tree_function_decl_named
?(tree_identifier ?(cstring_same "your_function_name")))
;;; code to add the noreturn attribute
;;; ....
))))
(register_finish_decl_first my_finish_decl)
The real MELT code is slightly more complex. You want to define your_adding_attr_mode there. Ask me for more.
Once you coded your MELT extension your_melt_mode.melt for your needs (and compiled that MELT extension into your_melt_mode.quicklybuilt.so as documented in the MELT tutorials) you'll compile your code with
gcc -fplugin=melt \
-fplugin-arg-melt-extra=your_melt_mode.quicklybuilt \
-fplugin-arg-melt-mode=your_adding_attr_mode \
-O2 -I/your/include -c yourfile.c
In other words, you just add a few -fplugin-* flags to your CFLAGS in your Makefile !
BTW, I'm just coding in the MELT monitor (on github: https://github.com/bstarynk/melt-monitor ..., file meltmom-process.melt something quite similar.
With a MELT extension, you won't get any additional warning, since the MELT extension would alter the internal GCC AST (a GCC Tree) of the declared function on the fly!
Customizing GCC with MELT is probably the most bullet-proof solution, since it is modifying the GCC internal AST. Of course, it is probably the most costly solution (and it is GCC specific and might need -small- changes when GCC is evolving, e.g. when using the next version of GCC), but as I am trying to show it is quite easy in your case.
PS. In 2019, GCC MELT is an abandoned project. If you want to customize GCC (for any recent version of GCC, e.g. GCC 7, 8, or 9), you need to write your own GCC plugin in C++.

Implementing a new strcpy function redefines the library function strcpy?

It is said that we can write multiple declarations but only one definition. Now if I implement my own strcpy function with the same prototype :
char * strcpy ( char * destination, const char * source );
Then am I not redefining the existing library function? Shouldn't this display an error? Or is it somehow related to the fact that the library functions are provided in object code form?
EDIT: Running the following code on my machine says "Segmentation fault (core dumped)". I am working on linux and have compiled without using any flags.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *strcpy(char *destination, const char *source);
int main(){
char *s = strcpy("a", "b");
printf("\nThe function ran successfully\n");
return 0;
}
char *strcpy(char *destination, const char *source){
printf("in duplicate function strcpy");
return "a";
}
Please note that I am not trying to implement the function. I am just trying to redefine a function and asking for the consequences.
EDIT 2:
After applying the suggested changes by Mats, the program no longer gives a segmentation fault although I am still redefining the function.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *strcpy(char *destination, const char *source);
int main(){
char *s = strcpy("a", "b");
printf("\nThe function ran successfully\n");
return 0;
}
char *strcpy(char *destination, const char *source){
printf("in duplicate function strcpy");
return "a";
}
C11(ISO/IEC 9899:201x) §7.1.3 Reserved Identifiers
— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise.
— All identifiers with external linkage in any of the following subclauses (including the
future library directions) are always reserved for use as identifiers with external
linkage.
— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
If the program declares or defines an identifier in a context in which it is reserved, or defines a reserved identifier as a macro name, the behavior is undefined. Note that this doesn't mean you can't do that, as this post shows, it can be done within gcc and glibc.
glibc §1.3.3 Reserved Names proveds a clearer reason:
The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them. There are several reasons for these restrictions:
Other people reading your code could get very confused if you were using a function named exit to do something completely different from what the standard exit function does, for example. Preventing this situation helps to make your programs easier to understand and contributes to modularity and maintainability.
It avoids the possibility of a user accidentally redefining a library function that is called by other library functions. If redefinition were allowed, those other functions would not work properly.
It allows the compiler to do whatever special optimizations it pleases on calls to these functions, without the possibility that they may have been redefined by the user. Some library facilities, such as those for dealing with variadic arguments (see Variadic Functions) and non-local exits (see Non-Local Exits), actually require a considerable amount of cooperation on the part of the C compiler, and with respect to the implementation, it might be easier for the compiler to treat these as built-in parts of the language.
That's almost certainly because you are passing in a destination that is a "string literal".
char *s = strcpy("a", "b");
Along with the compiler knowing "I can do strcpy inline", so your function never gets called.
You are trying to copy "b" over the string literal "a", and that won't work.
Make a char a[2]; and strcpy(a, "b"); and it will run - it probably won't call your strcpy function, because the compiler inlines small strcpy even if you don't have optimisation available.
Putting the matter of trying to modify non-modifiable memory aside, keep in mind that you are formally not allowed to redefine standard library functions.
However, in some implementations you might notice that providing another definition for standard library function does not trigger the usual "multiple definition" error. This happens because in such implementations standard library functions are defined as so called "weak symbols". Foe example, GCC standard library is known for that.
The direct consequence of that is that when you define your own "version" of standard library function with external linkage, your definition overrides the "weak" standard definition for the entire program. You will notice that not only your code now calls your version of the function, but also all class from all pre-compiled [third-party] libraries are also dispatched to your definition. It is intended as a feature, but you have to be aware of it to avoid "using" this feature inadvertently.
You can read about it here, for one example
How to replace C standard library function ?
This feature of the implementation doesn't violate the language specification, since it operates within uncharted area of undefined behavior not governed by any standard requirements.
Of course, the calls that use intrinsic/inline implementation of some standard library function will not be affected by the redefinition.
Your question is misleading.
The problem that you see has nothing to do with the re-implementation of a library function.
You are just trying to write non-writable memory, that is the memory where the string literal a exists.
To put it simple, the following program gives a segmentation fault on my machine (compiled with gcc 4.7.3, no flags):
#include <string.h>
int main(int argc, const char *argv[])
{
strcpy("a", "b");
return 0;
}
But then, why the segmentation fault if you are calling a version of strcpy (yours) that doesn't write the non-writable memory? Simply because your function is not being called.
If you compile your code with the -S flag and have a look at the assembly code that the compiler generates for it, there will be no call to strcpy (because the compiler has "inlined" that call, the only relevant call that you can see from main, is a call to puts).
.file "test.c"
.section .rodata
.LC0:
.string "a"
.align 8
.LC1:
.string "\nThe function ran successfully"
.text
.globl main
.type main, #function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movw $98, .LC0(%rip)
movq $.LC0, -8(%rbp)
movl $.LC1, %edi
call puts
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
.section .rodata
.LC2:
.string "in duplicate function strcpy"
.text
.globl strcpy
.type strcpy, #function
strcpy:
.LFB3:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
movl $.LC2, %edi
movl $0, %eax
call printf
movl $.LC0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE3:
.size strcpy, .-strcpy
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.
I think Yu Hao answer has a great explanation for this, the quote from the standard:
The names of all library types, macros, variables and functions that
come from the ISO C standard are reserved unconditionally; your
program may not redefine these names. All other library names are
reserved if your program explicitly includes the header file that
defines or declares them. There are several reasons for these
restrictions:
[...]
It allows the compiler to do whatever special optimizations it pleases
on calls to these functions, without the possibility that they may
have been redefined by the user.
your example can operate in this way : ( with strdup )
char *strcpy(char *destination, const char *source);
int main(){
char *s = strcpy(strdup("a"), strdup("b"));
printf("\nThe function ran successfully\n");
return 0;
}
char *strcpy(char *destination, const char *source){
printf("in duplicate function strcpy");
return strdup("a");
}
output :
in duplicate function strcpy
The function ran successfully
The way to interpret this rule is that you cannot have multiple definitions of a function end up in the final linked object (the executable). So, if all the objects included in the link have only one definition of a function, then you are good. Keeping this in mind, consider the following scenarios.
Let's say you redefine a function somefunction() that is defined in some library. Your function is in main.c (main.o) and in the library the function is in an a object named someobject.o (in the libray). Remember that in the final link, the linker only looks for unresolved symbols in the libraries. Because somefunction() is resolved already from main.o, the linker does not even look for it in the libraries and does not pull in someobject.o. The final link has only one definition of the function, and things are fine.
Now imagine that there is another symbol anotherfunction() defined in someobject.o that you also happen to call. The linker will try to resolve anotherfunction() from someobject.o, and pull it in from the library, and it will become a part of the final link. Now you have two definitions of somefunction() in the final link - one from main.o and another from someobject.o, and the linker will throw an error.
I use this one frequently:
void my_strcpy(char *dest, char *src)
{
int i;
i = 0;
while (src[i])
{
dest[i] = src[i];
i++;
}
dest[i] = '\0';
}
and you can also do strncpy just by modify one line
void my_strncpy(char *dest, char *src, int n)
{
int i;
i = 0;
while (src[i] && i < n)
{
dest[i] = src[i];
i++;
}
dest[i] = '\0';
}

Resources