How does typedefing influence compile/assembly? - c

Since there are two ways of writing enums, structs, unions or types where one uses typedef, or alternatively doesn't, I was wondering what would be the benefits and disadvantages of each approach.
E.g. 1: typedef enum { ENUM_A, ENUM_B } ENUM_OBJECT;
E.g. 2: typedef unsigned char uint8;
Personally I like encapsulation provided by the second variant, but when I write code for all enums, structs and unions I would always avoid writing the first example with typedef, but I would use this approach:
E.g. 1: enum ENUM_TAG { ENUM_A, ENUM_B };
enum ENUM_TAG some_variable;
Yes, I know that it can take a bit more horizontal space, but to me it conveys what the type is better than something like this:
typedef int matrix_buffer_t[2][5];
matrix_buffer_t some_variable;
Can someone outline facts (not personal opinions, since these are debatable) about the differences between using typedef and not using it? How does this influence the compiled code, program memory size, etc.?
I've tried looking at the assembly diff when I compile the following code with no typedef:
struct TestStruct
{
int field;
};
int main(void)
{
struct TestStruct test;
printf ("%d\n", test.field);
return 1;
}
vs code with typedef:
typedef struct
{
int field;
} TestStruct;
int main(void)
{
TestStruct test;
printf ("%d\n", test.field);
return 1;
}
Assembly is definitely not the same. I'm giving a side to side comparison:
.LC0:                            .LC0:
.string "%d\n"                   .string "%d\n"
main:                            main:
pushq %rbp                       push rbp
movq %rsp, %rbp                  mov rbp, rsp
subq $16, %rsp                   sub rsp, 16
movl -4(%rbp), %eax              mov eax, DWORD PTR [rbp-4]
movl %eax, %esi                  mov esi, eax
movl $.LC0, %edi                 mov edi, OFFSET FLAT:.LC0
movl $0, %eax                    mov eax, 0
call printf                      call printf
movl $1, %eax                    mov eax, 1
leave                            leave
ret                              ret
It was compiled using godbolt and the gcc compiler. I can clearly see a difference; I am just wondering when it is better to use which approach, depending on the benefits and flaws.
Note: I've tried compiling to a .map file, which gives the address, size, type, etc. of each variable, and when typedef is used the .map file becomes more complex.

Typedef is the only feature that makes C language grammar a context-sensitive grammar.
Its effect is to change the environment of the parser, to convert some symbols into special symbols.
Without typedef a parser for the C language would not need an environmental structure to keep track of the definitions of type symbols.
Note that in ISO/IEC 9899:1999 5.1.1.2 Translation phases, typedef takes effect in the 7th phase (the syntax analysis step):
White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are
syntactically and semantically analyzed and translated as a translation unit.

Can someone outline facts (not personal opinions, since these are debatable) about the differences between using typedef and not using it?
This is the very same, mostly subjective discussion as whether you should typedef a struct or not. Most people use typedef, except in the Linux world, which tends to type out struct tag; there is no obvious right or wrong.
(Although the attempt to provide a rationale for using the latter style in the unprofessional document Linux kernel coding style is laughable, entirely subjective and similar to the arguments for using "Hungarian notation".)
Regarding enum, you typically use the typedef just to avoid having to type out the enum tag every time you use it. Anonymous enums without a name are typically only used when dealing with a local, encapsulated type.
There are a few non-subjective reasons why typing out keyword + tag is bad:
Two preprocessor tokens instead of one can be problematic to use when passing types to function-like macros, using them in X-macros and similar. Example:
#define SUPPORTED_TYPES(X) \
X(int) \
X(float) \
X(struct foo)
enum
{
#define ENUM_CONSTANT(type) type ## _val,
SUPPORTED_TYPES(ENUM_CONSTANT)
};
This code is attempting to create a number of constants corresponding to a list of supported types. It will create int_val, float_val and then fail upon struct foo_val because there are two preprocessor tokens in that type and spaces can't be used in identifiers. On the other hand, the same problem exists when using unsigned int.
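For comparison, here is a minimal sketch of the same X-macro once the struct is hidden behind a typedef. The names foo_t and type_name are made up for this illustration and are not part of the code above; the only point is that every list entry is now a single preprocessor token, so ## can paste onto it.

```c
#include <assert.h>
#include <string.h>

struct foo { int field; };
typedef struct foo foo_t;   /* hypothetical one-token alias for struct foo */

/* Every entry is a single preprocessor token, so ## works on all of them. */
#define SUPPORTED_TYPES(X) \
    X(int)                 \
    X(float)               \
    X(foo_t)

enum
{
#define ENUM_CONSTANT(type) type##_val,
    SUPPORTED_TYPES(ENUM_CONSTANT)
#undef ENUM_CONSTANT
    supported_type_count    /* number of entries in the list */
};

/* The pasted identifiers exist and are numbered in list order. */
static_assert(foo_t_val == 2, "foo_t is the third supported type");

/* Hypothetical helper: maps an enum constant back to the type's spelling. */
static const char *type_name(int id)
{
    static const char *const names[] = {
#define TYPE_STRING(type) #type,
        SUPPORTED_TYPES(TYPE_STRING)
#undef TYPE_STRING
    };
    return (id >= 0 && id < supported_type_count) ? names[id] : "unknown";
}
```

Calling type_name(foo_t_val) gives back the string "foo_t". The same list drives both the enum and the name table, which is the usual reason to reach for X-macros in the first place.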
Not typing out the tag is consistent with C++ coding style. C++ doesn't have tags in the same manner as C, but the name of the type is the type - with no need to type out the keyword explicitly.
Regarding typedef unsigned char uint8 specifically, this is very bad practice but not because of the typedef but because you are inventing your own "local garage standard" instead of using standard C stdint.h types.
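As a small sketch of what using the standard type looks like (add_bytes is a hypothetical helper, only there to have something to call):

```c
#include <assert.h>
#include <stdint.h>

/* If uint8_t exists at all, it is exactly 8 bits and occupies one byte. */
static_assert(sizeof(uint8_t) == 1, "uint8_t occupies one byte");

/* Hypothetical helper: adds two bytes with wrap-around modulo 256. */
static uint8_t add_bytes(uint8_t a, uint8_t b)
{
    /* a + b is computed as int after promotion, then truncated to 8 bits. */
    return (uint8_t)(a + b);
}
```

With this, add_bytes(200, 100) wraps around to 44, and nobody reading the code has to look up what a home-made uint8 was defined as.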
I've tried looking at the assembly diff when I compile the following code with no typedef
You are doing something wrong, like not using the same optimizer settings or forgetting #include <stdio.h> in C90 mode. There's identical machine code, see for yourself: https://godbolt.org/z/zd5q7Pc77
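The reason the machine code cannot differ is that a typedef only introduces another name for a type; it does not create a new type. A minimal sketch of that, where TestStructAlias, read_field and via_alias are names invented for the illustration:

```c
#include <assert.h>

struct TestStruct
{
    int field;
};
typedef struct TestStruct TestStructAlias;   /* hypothetical alias name */

/* Both spellings name one and the same type. */
static_assert(sizeof(TestStructAlias) == sizeof(struct TestStruct),
              "alias and tag name the same type");

/* Takes the type spelled with the struct keyword. */
static int read_field(const struct TestStruct *p)
{
    return p->field;
}

/* Declares the object through the typedef name instead. */
static int via_alias(void)
{
    TestStructAlias test = { 42 };
    return read_field(&test);   /* no cast needed: same type */
}
```

Since the compiler sees one type under two names, via_alias() returns 42 and there is nothing left over at code generation time that could depend on which spelling was used.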

How to optimize "don't care" argument with gcc?

Sometimes a function doesn't use an argument (perhaps because another "flags" argument doesn't enable a specific feature).
However, you have to specify something, so usually you just put 0. But if you do that, and the function is external, gcc will emit code to "really make sure" that parameter gets set to 0.
Is there a way to tell gcc that a particular argument to a function doesn't matter and it can leave alone whatever value it is that happens to be in the argument register right then?
Update: Someone asked about the XY problem. The context behind this question is I want to implement a varargs function in x86_64 without using the compiler varargs support. This is simplest when the parameters are on the stack, so I declare my functions to take 5 or 6 dummy parameters first, so that the last non-vararg parameter and all of the vararg parameters end up on the stack. This works fine, except it's clearly not optimal - when looking at the assembly code it's clear that gcc is initializing all those argument registers to zero in the caller.
Please don't take the answer below seriously. The question asks for a hack, so there you go.
GCC will effectively treat value of uninitialized variable as "don't care" so we can try exploiting this:
int foo(int x, int y);
int bar_1(int y) {
int tmp = tmp; // Suppress uninitialized warnings
return foo(tmp, y);
}
Unfortunately my version of GCC still cowardly initializes tmp to zero but yours may be more aggressive:
bar_1:
.LFB0:
.cfi_startproc
movl %edi, %esi
xorl %edi, %edi
jmp foo
.cfi_endproc
Another option is (ab)using inline assembly to fake GCC into thinking that tmp is defined (when in fact it isn't):
int bar_2(int y) {
int tmp;
asm("" : "=r"(tmp));
return foo(tmp, y);
}
With this GCC managed to get rid of parameter initializations:
bar_2:
.LFB1:
.cfi_startproc
movl %edi, %esi
jmp foo
.cfi_endproc
Note that inline asm must be immediately before the function call, otherwise GCC will think it has to preserve output values which would harm register allocation.
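For anyone who wants to try the second variant, here is a self-contained sketch. It assumes GCC or Clang (GNU inline asm and the noinline attribute), and the body of foo is a stand-in I made up: a callee that ignores its first argument, which is the only case where passing garbage is harmless.

```c
#include <assert.h>

/* Hypothetical callee: the first parameter is a "don't care" argument. */
__attribute__((noinline)) static int foo(int unused, int y)
{
    (void)unused;
    return y * 2;
}

static int bar_2(int y)
{
    int tmp;
    /* Empty asm with an output operand: the compiler assumes tmp has been
       written, so it emits no code to initialize it. */
    __asm__("" : "=r"(tmp));
    return foo(tmp, y);
}
```

bar_2(21) returns 42 regardless of what happens to sit in the register chosen for tmp. Whether the zeroing instruction actually disappears still depends on the compiler version and optimization level, so check the generated assembly before relying on it.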

Obfuscating function call

John Viega suggests a method to obfuscate function calls in his book Secure Programming Cookbook for C and C++. It can be read here.
#define SET_FN_PTR(func, num) \
static inline void *get_##func(void) { \
int i, j = num / 4; \
long ptr = (long)func + num; \
for (i = 0; i < 2; i++) ptr -= j; \
return (void *)(ptr - (j * 2)); \
}
#define GET_FN_PTR(func) get_##func( )
#include <stdio.h>
void my_func(void) {
printf("my_func( ) called!\n");
}
SET_FN_PTR(my_func, 0x01301100); /* 0x01301100 is some arbitrary value */
int main(int argc, char *argv[ ]) {
void (*ptr)(void);
ptr = GET_FN_PTR(my_func); /* get the real address of the function */
(*ptr)( ); /* make the function call */
return 0;
}
I compiled it with gcc fp.c -S -O2 (Ubuntu 15.10 64-bit, gcc 5.2.1) and checked the assembly:
...
my_func:
.LFB23:
.cfi_startproc
movl $.LC0, %edi
jmp puts
.cfi_endproc
.LFE23:
.size my_func, .-my_func
.section .text.unlikely
.LCOLDE1:
.text
.LHOTE1:
.section .text.unlikely
.LCOLDB2:
.section .text.startup,"ax",@progbits
.LHOTB2:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB25:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call my_func
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
...
I see that my_func is called in main. Can somebody explain how this method obfuscates the function call?
I see that many readers just come and downvote. I took the time to understand the problem, and only when I failed did I post it here. Please at least write a comment instead of just pushing the downvote button.
UPDATE: Turning off optimization I got:
...
my_func:
...
get_my_func:
...
main:
...
call get_my_func
movq %rax, -8(%rbp)
movq -8(%rbp), %rax
call *%rax
...
I think there is no inlining now. However, I do not really understand why that is important...
I am still looking for an explanation of what the author's goal was with this code, even if it does not work with today's smart compilers.
This way of obfuscating a function call relies on the compiler not being smart enough to see through the obfuscation. The idea was that the caller shouldn't contain a direct reference to the function to be called, but should instead retrieve the pointer to the function from another function.
However, modern compilers do see through it, and when optimization is applied they remove the obfuscation again. What the compiler does is probably simple inline expansion of GET_FN_PTR, and once it is expanded inline it is quite obvious how to optimize: it's just a bunch of constants that are combined into a pointer which is then called. Constant expressions are quite easy to compute at compile time (and this is often done).
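To see why the constants fold away, here is the arithmetic from get_##func applied to a plain integer instead of a function address. undo_offset is a name I made up for this sketch; the steps inside mirror the macro one to one.

```c
#include <assert.h>
#include <stdint.h>

/* Same steps as get_##func, but on an integer so the result can be checked. */
static uintptr_t undo_offset(uintptr_t addr, int num)
{
    int i, j = num / 4;
    uintptr_t ptr = addr + (uintptr_t)num;   /* the "obfuscated" value */
    for (i = 0; i < 2; i++)
        ptr -= (uintptr_t)j;                 /* subtract j twice */
    return ptr - (uintptr_t)(j * 2);         /* and 2 * j more: 4 * j in total */
}
```

In total 4 * (num / 4) is subtracted, which equals num whenever num is a multiple of 4, as 0x01301100 is. So the function returns addr unchanged, and an optimizer that evaluates the constants reaches the same conclusion and emits a direct call. With a num that is not a multiple of 4, the remainder would be left in the result and the pointer would be wrong.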
Before you obfuscate your code you should probably have a good reason to do so, and use a method suitable for the needs.
The idea of the suggested approach is to use an indirect function call so that the function address must be computed first and then called. The C Preprocessor is used to provide a way to define a proxy function for the actual function and this proxy function provides the calculation needed to determine the actual address of the real function which the proxy function provides access to.
See Wikipedia article Proxy pattern for details about the Proxy design pattern which has this to say:
The proxy design pattern allows you to provide an interface to other
objects by creating a wrapper class as the proxy. The wrapper class,
which is the proxy, can add additional functionality to the object of
interest without changing the object's code.
I would suggest an alternative which implements the same type of indirect call however it does not require using the C Preprocessor to hide implementation details in such a fashion as to make reading of the source code difficult.
The C compiler allows a struct to contain function pointers as members. What is nice about this is that you can define an externally visible struct variable with function pointers as members, yet the functions named in the definition of that struct variable can be static, meaning they have file visibility only (see What does "static" mean in a C program.)
So I can have two files, a header file func.h and an implementation file func.c which define the struct type, the declaration of the externally visible struct variable, the functions used with a static modifier, and the externally visible struct variable definition with the function addresses.
What is attractive about this approach is that the source code is easy to read, and most IDEs will handle this sort of indirection much better, because the C Preprocessor is not being used to create source at compile time, which hurts readability for people and for software tools such as IDEs.
An example func.h file, which would be #included into the C source file using the functions, could look like:
// define a type using a typedef so that we can declare the externally
// visible struct in this include file and then use the same type when
// defining the externally visible struct in the implementation file which
// will also have the definitions for the actual functions which will have
// file visibility only because we will use the static modifier to restrict
// the functions' visibility to file scope only.
typedef struct {
int (*p1)(int a);
int (*p2)(int a);
} FuncList;
// declare the externally visible struct so that anything using it will
// be able to access it and its members or the addresses of the functions
// available through this struct.
extern FuncList myFuncList;
And the func.c file example could look like:
#include <stdio.h>
#include "func.h"
// the functions that we will be providing through the externally visible struct
// are here. we mark these static since the only access to these is through
// the function pointer members of the struct so we do not want them to be
// visible outside of this file. also this prevents name clashes between these
// functions and other functions that may be linked into the application.
// this use of an externally visible struct with function pointer members
// provides something similar to the use of namespace in C++ in that we
// can use the externally visible struct as a way to create a kind of
// namespace by having everything go through the struct and hiding the
// functions using the static modifier to restrict visibility to the file.
static int p1Thing(int a)
{
return printf ("-- p1 %d\n", a);
}
static int p2Thing(int a)
{
return printf ("-- p2 %d\n", a);
}
// externally visible struct with function pointers to allow indirect access
// to the static functions in this file which are not visible outside of
// this file. we do this definition here so that we have the prototypes
// of the functions which are defined above to allow the compiler to check
// calling interface against struct member definition.
FuncList myFuncList = {
p1Thing,
p2Thing
};
A simple C source file using this externally visible struct could look like:
#include "func.h"
int main(int argc, char * argv[])
{
// call function p1Thing() through the struct function pointer p1()
myFuncList.p1 (1);
// call function p2Thing() through the struct function pointer p2()
myFuncList.p2 (2);
return 0;
}
The assembler emitted by Visual Studio 2005 for the above main() looks like the following showing a computed call through the specified address:
; 10 : myFuncList.p1 (1);
00000 6a 01 push 1
00002 ff 15 00 00 00
00 call DWORD PTR _myFuncList
; 11 : myFuncList.p2 (2);
00008 6a 02 push 2
0000a ff 15 04 00 00
00 call DWORD PTR _myFuncList+4
00010 83 c4 08 add esp, 8
; 12 : return 0;
00013 33 c0 xor eax, eax
As you can see, these function calls are now indirect calls through the struct, each addressed by an offset within the struct.
The nice thing about this approach is that you can do whatever you want to the memory area containing the function pointers so long as before you call a function through the data area, the correct function addresses have been put there. So you could actually have two functions, one that would initialize the area with the correct addresses and a second that would clear the area. So before using the functions you would call the function to initialize the area and after finishing with the functions call the function to clear the area.
// file scope visible struct containing the actual or real function addresses
// which can be used to initialize the externally visible copy.
static FuncList myFuncListReal = {
p1Thing,
p2Thing
};
// NULL addresses in externally visible struct to cause crash is default.
// Must use myFuncListInit() to initialize the pointers
// with the actual or real values.
FuncList myFuncList = {
0,
0
};
// externally visible function that will update the externally visible struct
// with the correct function addresses to access the static functions.
void myFuncListInit (void)
{
myFuncList = myFuncListReal;
}
// externally visible function to reset the externally visible struct back
// to NULLs in order to clear the addresses making the functions no longer
// available to external users of this file. memset() needs
// #include <string.h> at the top of func.c.
void myFuncListClear (void)
{
memset (&myFuncList, 0, sizeof(myFuncList));
}
So you could do something like this modified main():
myFuncListInit();
myFuncList.p1 (1);
myFuncList.p2 (2);
myFuncListClear();
However what you would really want to do is to have the call to myFuncListInit() be someplace in the source that would not be near where the functions are actually used.
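Put together in one file so it can be tried without the header, the init/clear idea looks roughly like this. It is a sketch under a few assumptions: the function bodies are stand-ins, myFuncListReady is a helper I added for illustration, and clearing with memset relies on all-zero bits being a null function pointer, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int (*p1)(int a);
    int (*p2)(int a);
} FuncList;

/* Stand-in bodies; the real ones would do the actual work. */
static int p1Thing(int a) { return a + 1; }
static int p2Thing(int a) { return a * 2; }

/* File-scope copy holding the real addresses. */
static const FuncList myFuncListReal = { p1Thing, p2Thing };

/* Externally visible copy, empty until myFuncListInit() is called. */
FuncList myFuncList = { 0, 0 };

void myFuncListInit(void)
{
    myFuncList = myFuncListReal;
}

void myFuncListClear(void)
{
    /* Assumes all-zero bits represent a null function pointer. */
    memset(&myFuncList, 0, sizeof(myFuncList));
}

/* Hypothetical guard: nonzero while the pointers are populated. */
int myFuncListReady(void)
{
    return myFuncList.p1 != 0 && myFuncList.p2 != 0;
}
```

A caller would check myFuncListReady() or simply make sure myFuncListInit() has run before calling myFuncList.p1(1), since calling through a cleared pointer crashes.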
Another interesting option would be to have the data area encrypted and in order to use the program, the user would need to enter the correct key to properly decrypt the data to get the correct pointer addresses.
The "obfuscation" in C/C++ is mainly related to the size of the compiled code. If it is too short (e.g. 500-1000 lines of assembly), any mid-level programmer can decode it and find what they need within several hours or days.

Will compilers optimize double logical negation in conditionals?

Consider the following hypothetical type:
typedef struct Stack {
unsigned long len;
void **elements;
} Stack;
And the following hypothetical macros for dealing with the type (purely for enhanced readability.) In these macros I am assuming that the given argument has type (Stack *) instead of merely Stack (I can't be bothered to type out a _Generic expression here.)
#define stackNull(stack) (!stack->len)
#define stackHasItems(stack) (stack->len)
Why do I not simply use !stackNull(x) for checking if a stack has items? I thought that this would be slightly less efficient (read: not noticeable at all really, but I thought it was interesting) than simply checking stack->len because it would lead to double negation. In the following case:
int thingy = !!31337;
printf("%d\n", thingy);
if (thingy)
doSomethingImportant(thingy);
The string "1\n" would be printed, and it would be impossible to optimize the conditional (well, actually, only impossible if the thingy variable didn't have a constant initializer or was modified before the test, but we'll say in this instance that 31337 is not a constant) because (!!x) is guaranteed to be either 0 or 1.
But I'm wondering if compilers will optimize something like the following
int thingy = wellOkaySoImNotAConstantThingyAnyMore();
if (!!thingy)
doSomethingFarLessImportant();
Will this be optimized to actually just use (thingy) in the if statement, as if the if statement had been written as
if (thingy)
doSomethingFarLessImportant();
If so, does it expand to (!!!!!thingy) and so on? (however this is a slightly different question, as this can be optimized in any case, !thingy is !!!!!thingy no matter what, just like -(-(-(1))) = -1.)
In the question title I said "compilers", by that I mean that I am asking if any compiler does this, however I am particularly interested in how GCC will behave in this instance as it is my compiler of choice.
This seems like a pretty reasonable optimization and a quick test using godbolt with this code (see it live):
#include <stdio.h>
void func( int x)
{
if( !!x )
{
printf( "first\n" ) ;
}
if( !!!x )
{
printf( "second\n" ) ;
}
}
int main()
{
int x = 0 ;
scanf( "%d", &x ) ;
func( x ) ;
}
seems to indicate gcc does well, it generates the following:
func:
testl %edi, %edi # x
jne .L4 #,
movl $.LC1, %edi #,
jmp puts #
.L4:
movl $.LC0, %edi #,
jmp puts #
we can see from the first line:
testl %edi, %edi # x
it just uses x without doing any operations on it, also notice the optimizer is clever enough to combine both tests into one since if the first condition is true the other must be false.
Note I used printf and scanf for side effects to prevent the optimizer from optimizing all the code away.
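The optimization is safe because of what the operators are defined to produce, which a few lines can pin down. double_not and triple_not are names made up for this sketch:

```c
#include <assert.h>

/* !!x normalizes any int to 0 or 1; as a condition it is the same as x. */
static int double_not(int x)
{
    return !!x;
}

/* An odd number of negations always collapses to a single one. */
static int triple_not(int x)
{
    return !!!x;
}
```

double_not(31337) is 1 and double_not(0) is 0, so if (!!x) and if (x) take the same branch for every x, and triple_not(x) equals !x for every x. The compiler only needs the truth value in a conditional, which is why a single test instruction is enough.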

Implementing a new strcpy function redefines the library function strcpy?

It is said that we can write multiple declarations but only one definition. Now if I implement my own strcpy function with the same prototype :
char * strcpy ( char * destination, const char * source );
Then am I not redefining the existing library function? Shouldn't this display an error? Or is it somehow related to the fact that the library functions are provided in object code form?
EDIT: Running the following code on my machine says "Segmentation fault (core dumped)". I am working on linux and have compiled without using any flags.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *strcpy(char *destination, const char *source);
int main(){
char *s = strcpy("a", "b");
printf("\nThe function ran successfully\n");
return 0;
}
char *strcpy(char *destination, const char *source){
printf("in duplicate function strcpy");
return "a";
}
Please note that I am not trying to implement the function. I am just trying to redefine a function and asking for the consequences.
EDIT 2:
After applying the suggested changes by Mats, the program no longer gives a segmentation fault although I am still redefining the function.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *strcpy(char *destination, const char *source);
int main(){
char a[2];
char *s = strcpy(a, "b");
printf("\nThe function ran successfully\n");
return 0;
}
char *strcpy(char *destination, const char *source){
printf("in duplicate function strcpy");
return "a";
}
C11(ISO/IEC 9899:201x) §7.1.3 Reserved Identifiers
— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise.
— All identifiers with external linkage in any of the following subclauses (including the
future library directions) are always reserved for use as identifiers with external
linkage.
— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
If the program declares or defines an identifier in a context in which it is reserved, or defines a reserved identifier as a macro name, the behavior is undefined. Note that this doesn't mean you can't do that, as this post shows, it can be done within gcc and glibc.
glibc §1.3.3 Reserved Names provides a clearer reason:
The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them. There are several reasons for these restrictions:
Other people reading your code could get very confused if you were using a function named exit to do something completely different from what the standard exit function does, for example. Preventing this situation helps to make your programs easier to understand and contributes to modularity and maintainability.
It avoids the possibility of a user accidentally redefining a library function that is called by other library functions. If redefinition were allowed, those other functions would not work properly.
It allows the compiler to do whatever special optimizations it pleases on calls to these functions, without the possibility that they may have been redefined by the user. Some library facilities, such as those for dealing with variadic arguments (see Variadic Functions) and non-local exits (see Non-Local Exits), actually require a considerable amount of cooperation on the part of the C compiler, and with respect to the implementation, it might be easier for the compiler to treat these as built-in parts of the language.
That's almost certainly because you are passing in a destination that is a "string literal".
char *s = strcpy("a", "b");
Along with the compiler knowing "I can do strcpy inline", so your function never gets called.
You are trying to copy "b" over the string literal "a", and that won't work.
Make a char a[2]; and strcpy(a, "b"); and it will run - it probably won't call your strcpy function, because the compiler inlines small strcpy calls even if you don't have optimisation enabled.
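Spelled out, the suggested change looks like this. It uses the library strcpy (no redefinition), and copy_into_array is just a wrapper name for the sketch:

```c
#include <assert.h>
#include <string.h>

/* Copies "b" into a writable array instead of into a string literal. */
static int copy_into_array(void)
{
    char a[2];               /* room for one character plus the terminator */
    strcpy(a, "b");
    return a[0] == 'b' && a[1] == '\0';
}
```

copy_into_array() returns 1, because the destination is ordinary writable storage. The original call writes into the literal "a", which typically lives in read-only memory, hence the segmentation fault.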
Putting the matter of trying to modify non-modifiable memory aside, keep in mind that you are formally not allowed to redefine standard library functions.
However, in some implementations you might notice that providing another definition for a standard library function does not trigger the usual "multiple definition" error. This happens because in such implementations standard library functions are defined as so-called "weak symbols". For example, the GCC standard library is known for that.
The direct consequence is that when you define your own "version" of a standard library function with external linkage, your definition overrides the "weak" standard definition for the entire program. You will notice that not only does your code now call your version of the function, but all calls from all pre-compiled [third-party] libraries are also dispatched to your definition. It is intended as a feature, but you have to be aware of it to avoid "using" this feature inadvertently.
You can read about it here, for one example
How to replace C standard library function ?
This feature of the implementation doesn't violate the language specification, since it operates within uncharted area of undefined behavior not governed by any standard requirements.
Of course, the calls that use intrinsic/inline implementation of some standard library function will not be affected by the redefinition.
Your question is misleading.
The problem that you see has nothing to do with the re-implementation of a library function.
You are just trying to write non-writable memory, that is the memory where the string literal a exists.
To put it simply, the following program gives a segmentation fault on my machine (compiled with gcc 4.7.3, no flags):
#include <string.h>
int main(int argc, const char *argv[])
{
strcpy("a", "b");
return 0;
}
But then, why the segmentation fault if you are calling a version of strcpy (yours) that doesn't write the non-writable memory? Simply because your function is not being called.
If you compile your code with the -S flag and have a look at the assembly code that the compiler generates for it, there will be no call to strcpy (because the compiler has "inlined" that call, the only relevant call that you can see from main, is a call to puts).
.file "test.c"
.section .rodata
.LC0:
.string "a"
.align 8
.LC1:
.string "\nThe function ran successfully"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movw $98, .LC0(%rip)
movq $.LC0, -8(%rbp)
movl $.LC1, %edi
call puts
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
.section .rodata
.LC2:
.string "in duplicate function strcpy"
.text
.globl strcpy
.type strcpy, @function
strcpy:
.LFB3:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
movl $.LC2, %edi
movl $0, %eax
call printf
movl $.LC0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE3:
.size strcpy, .-strcpy
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
I think Yu Hao's answer has a great explanation for this, in the quote from the glibc manual:
The names of all library types, macros, variables and functions that
come from the ISO C standard are reserved unconditionally; your
program may not redefine these names. All other library names are
reserved if your program explicitly includes the header file that
defines or declares them. There are several reasons for these
restrictions:
[...]
It allows the compiler to do whatever special optimizations it pleases
on calls to these functions, without the possibility that they may
have been redefined by the user.
Your example can work this way (with strdup, so that the destination is writable heap memory):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *strcpy(char *destination, const char *source);
int main(){
char *s = strcpy(strdup("a"), strdup("b"));
printf("\nThe function ran successfully\n");
return 0;
}
char *strcpy(char *destination, const char *source){
printf("in duplicate function strcpy");
return strdup("a");
}
output :
in duplicate function strcpy
The function ran successfully
The way to interpret this rule is that you cannot have multiple definitions of a function end up in the final linked object (the executable). So, if all the objects included in the link have only one definition of a function, then you are good. Keeping this in mind, consider the following scenarios.
Let's say you redefine a function somefunction() that is defined in some library. Your function is in main.c (main.o), and in the library the function is in an object named someobject.o. Remember that in the final link, the linker only looks in the libraries for symbols that are still unresolved. Because somefunction() is already resolved from main.o, the linker does not even look for it in the libraries and does not pull in someobject.o. The final link has only one definition of the function, and things are fine.
Now imagine that there is another symbol anotherfunction() defined in someobject.o that you also happen to call. The linker will try to resolve anotherfunction() from someobject.o, and pull it in from the library, and it will become a part of the final link. Now you have two definitions of somefunction() in the final link - one from main.o and another from someobject.o, and the linker will throw an error.
I use this one frequently:
void my_strcpy(char *dest, const char *src)
{
int i;
i = 0;
while (src[i])
{
dest[i] = src[i];
i++;
}
dest[i] = '\0';
}
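and you can also do a strncpy-like version by modifying just one line (dest needs room for n + 1 bytes, because a terminator is always written):
void my_strncpy(char *dest, const char *src, int n)
{
int i;
i = 0;
while (i < n && src[i]) // check the bound first so src[n] is never read
{
dest[i] = src[i];
i++;
}
dest[i] = '\0';
}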

at&t asm inline c++ problem

My Code
const int howmany = 5046;
char buffer[howmany];
asm("lea buffer,%esi"); //Get the address of buffer
asm("mov howmany,%ebx"); //Set the loop number
asm("buf_loop:"); //Label for beginning of loop
asm("movb (%esi),%al"); //Copy buffer[x] to al
asm("inc %esi"); //Increment buffer address
asm("dec %ebx"); //Decrement loop count
asm("jnz buf_loop"); //jump to buf_loop if(ebx>0)
My Problem
I am using the gcc compiler. For some reason my buffer/howmany variables are undefined in the eyes of my asm. I'm not sure why. I just want to move the beginning address of my buffer array into the esi register, loop it 'howmany' times while copying each element to the al register.
Are you using the inline assembler in gcc? (If not, in what other C++ compiler, exactly?)
If gcc, see the details here, and in particular this example:
asm ("leal (%1,%1,4), %0"
: "=r" (five_times_x)
: "r" (x)
);
%0 and %1 are referring to the C-level variables, and they're listed specifically as the second (for outputs) and third (for inputs) parameters to asm. In your example you have only "inputs" so you'd have an empty second operand (traditionally one uses a comment after that colon, such as /* no output registers */, to indicate that more explicitly).
The part that declares an array like that
int howmany = 5046;
char buffer[howmany];
is not valid C++. In C++ it is impossible to declare an array that has "variable" or run-time size. In C++ array declarations the size is always a compile-time constant.
If your compiler allows this array declaration, it means that it implements it as an extension. In that case you have to do your own research to figure out how it implements such a run-time sized array internally. I would guess that internally buffer will be implemented as a pointer, not as a true array. If my guess is correct and it is really a pointer, then the proper way to load the address of the array into esi might be
mov buffer,%esi
and not a lea, as in your code. lea will only work with "normal" compile-time sized arrays, but not with run-time sized arrays.
Another question is whether you really need a run-time sized array in your code. Could it be that you just made it so by mistake? If you simply change the howmany declaration to
const int howmany = 5046;
the array will turn into a "normal" C++ array and your code might start working as is (i.e. with lea).
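A small sketch of the difference, assuming the declarations sit at file scope for simplicity. The helper buffer_size is hypothetical and only exists so the size can be checked at run time.

```cpp
const int howmany = 5046;   // a constant expression in C++
char buffer[howmany];       // ordinary fixed-size array, not a run-time sized one

// This check compiles only because the array size is a compile-time constant.
static_assert(sizeof(buffer) == 5046, "buffer has a compile-time size");

// Hypothetical helper: reports the array size.
unsigned long buffer_size()
{
    return sizeof(buffer);
}
```

With a non-const howmany, the static_assert would not be usable on the array, since its size would no longer be a constant expression.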
All of those asm instructions need to be in the same asm statement if you want to be sure they're contiguous (without compiler-generated code between them), and you need to declare input / output / clobber operands or you will step on the compiler's registers.
You can't use lea or mov to/from a C variable name (except for global / static symbols which are actually defined in the compiler's asm output, but even then you usually shouldn't).
Instead of using mov instructions to set up inputs, ask the compiler to do it for you using input operand constraints. If the first or last instruction of a GNU C inline asm statement is a mov, usually that means you're doing it wrong and writing inefficient code.
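As a rough illustration of letting constraints do the setup, the sketch below adds two integers with no mov in the template at all. The function name add_ints is invented for this example, and the fallback branch is an assumption for non-x86 targets.

```cpp
// Hypothetical example: the compiler loads both operands into registers for us.
int add_ints(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    asm("addl %[b], %[a]"
        : [a] "+r" (a)     // read-write operand: already in a register on entry
        : [b] "r" (b)      // input-only operand, any register
        : "cc");           // add modifies the flags
#else
    a += b;                // portable fallback (assumption, not inline asm)
#endif
    return a;
}
```

Because the operands are declared, the compiler knows which registers are live and can keep surrounding code correct when optimizing.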
And BTW, GNU C++ allows C99-style variable-length arrays, so howmany is allowed to be non-const and even set in a way that doesn't optimize away to a constant. Any compiler that can compile GNU-style inline asm will also support variable-length arrays.
How to write your loop properly
If this looks over-complicated, then see https://gcc.gnu.org/wiki/DontUseInlineAsm. Write a stand-alone function in asm so you can just learn asm instead of also having to learn about gcc and its complex but powerful inline-asm interface. You basically have to know asm and understand compilers to use it correctly (with the right constraints to prevent breakage when optimization is enabled).
Note the use of named operands like %[ptr] instead of %2 or %%ebx. Letting the compiler choose which registers to use is normally a good thing, but for x86 there are letters other than "r" you can use, like "=a" for rax/eax/ax/al specifically. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, and also other links in the inline-assembly tag wiki.
I also used buf_loop%=: to append a unique number to the label, so if the optimizer clones the function or inlines it multiple places, the file will still assemble.
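Here is a minimal sketch of why the unique suffix matters: a helper that is forced inline and used twice in one function, so its asm template is emitted twice into the same file. The names count_down and count_down_twice are made up, and the input is assumed to be nonzero.

```cpp
// Hypothetical helper: counts n down to zero inside inline asm (n must be nonzero).
static inline __attribute__((always_inline)) unsigned count_down(unsigned n)
{
#if defined(__i386__) || defined(__x86_64__)
    asm(".Lcount_down%=:        \n\t"   // %= expands to a number unique per asm instance
        "  decl %[n]            \n\t"
        "  jnz .Lcount_down%=   \n\t"   // the jump target must use the same %= suffix
        : [n] "+r" (n)
        :
        : "cc");
#else
    while (--n != 0) { }                // portable fallback (assumption, not inline asm)
#endif
    return n;
}

// Two inlined copies end up in one function; with a fixed label name the
// assembler would see a duplicate symbol definition.
unsigned count_down_twice(unsigned a, unsigned b)
{
    return count_down(a) + count_down(b);
}
```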
Source + compiler asm output on the Godbolt compiler explorer.
void ext(char *);
int foo(void)
{
int howmany = 5046; // could be a function arg
char buffer[howmany];
//ext(buffer);
const char *bufptr = buffer; // copy the pointer to a C var we can use as a read-write operand
unsigned char result;
asm("buf_loop%=: \n\t" // do {
" movb (%[ptr]), %%al \n\t" // Copy buffer[x] to al
" inc %[ptr] \n\t"
" dec %[count] \n\t"
" jnz buf_loop%= \n\t" // } while(--count != 0)
: [res]"=a"(result) // al = write-only output
, [count] "+r" (howmany) // input/output operand, any register
, [ptr] "+r" (bufptr)
: // no input-only operands
: "memory" // we read memory that isn't an input operand, only pointed to by inputs
);
return result;
}
I used %%al as an example of how to write register names explicitly: Extended Asm (with operands) needs a double % to get a literal % in the asm output. You could also use %[res] or %0 and let the compiler substitute %al in its asm output. (And then you'd have no reason to use a specific-register constraint unless you wanted to take advantage of cbw or lodsb or something like that.) result is unsigned char, so the compiler will pick a byte register for it. If you want the low byte of a wider operand, you could use %b[count] for example.
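A short sketch of the %b modifier mentioned above, assuming a hypothetical function low_byte. It uses the x86 "q" constraint (registers with an addressable low byte) so the same code also works in 32-bit mode.

```cpp
// Hypothetical example: copy the low byte of a 32-bit value using %b[...].
unsigned char low_byte(unsigned value)
{
    unsigned char low;
#if defined(__i386__) || defined(__x86_64__)
    asm("movb %b[src], %[dst]"    // %b[src] prints the byte-sized name of src's register
        : [dst] "=q" (low)        // byte-sized output, so a byte register name is printed
        : [src] "q" (value));     // 32-bit input in a register that has a low-byte name
#else
    low = (unsigned char) value;  // portable fallback (assumption, not inline asm)
#endif
    return low;
}
```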
This uses a "memory" clobber, which is inefficient. You don't need the compiler to spill everything to memory, only to make sure that the contents of buffer[] in memory matches the C abstract machine state. (This is not guaranteed by passing a pointer to it in a register).
gcc7.2 -O3 output:
pushq %rbp
movl $5046, %edx
movq %rsp, %rbp
subq $5056, %rsp
movq %rsp, %rcx # compiler-emitted to satisfy our "+r" constraint for bufptr
# start of the inline-asm block
buf_loop18:
movb (%rcx), %al
inc %rcx
dec %edx
jnz buf_loop18
# end of the inline-asm block
movzbl %al, %eax
leave
ret
Without a memory clobber or input constraint, leave appears before the inline asm block, releasing that stack memory before the inline asm uses the now-stale pointer. A signal-handler running at the wrong time would clobber it.
A more efficient way is to use a dummy memory operand which tells the compiler that the entire array is a read-only memory input to the asm statement. See get string length in inline GNU Assembler for more about this flexible-array-member trick for telling the compiler you read all of an array without specifying the length explicitly.
In C you can define a new type inside a cast, but you can't in C++, hence the using instead of a really complicated input operand.
int bar(unsigned howmany)
{
//int howmany = 5046;
char buffer[howmany];
//ext(buffer);
buffer[0] = 1;
buffer[100] = 100; // test whether we got the input constraints right
//using input_t = const struct {char a[howmany];}; // requires a constant size
using flexarray_t = const struct {char a; char x[];};
const char *dummy;
unsigned char result;
asm("buf_loop%=: \n\t" // do {
" movb (%[ptr]), %%al \n\t" // Copy buffer[x] to al
" inc %[ptr] \n\t"
" dec %[count] \n\t"
" jnz buf_loop%= \n\t" // } while(--count != 0)
: [res]"=a"(result) // al = write-only output
, [count] "+r" (howmany) // input/output operand, any register
, "=r" (dummy) // output operand in the same register as buffer input, so we can modify the register
: [ptr] "2" (buffer) // matching constraint for the dummy output
, "m" (*(flexarray_t *) buffer) // whole buffer as an input operand
//, "m" (*buffer) // just the first element: doesn't stop the buffer[100]=100 store from sinking past the inline asm, even if you used asm volatile
: // no clobbers
);
buffer[100] = 101;
return result;
}
I also used a matching constraint so buffer could be an input directly, and the output operand in the same register means we can modify that register. We got the same effect in foo() by using const char *bufptr = buffer; and then using a read-write constraint to tell the compiler that the new value of that C variable is what we leave in the register. Either way we leave a value in a dead C variable that goes out of scope without being read, but the matching constraint way can be useful for macros where you don't want to modify the value of your input (and don't need the type of your input: int dummy would work fine, too.)
The buffer[100] = 100; and buffer[100] = 101; assignments are there to show that they both appear in the asm, instead of being merged across the inline-asm (which does happen if you leave out the "m" input operand). IDK why the buffer[100] = 101; isn't optimized away; it's dead so it should be. Also note that asm volatile doesn't block this reordering, so it's not an alternative to a "memory" clobber or using the right constraints.
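For the simpler case where the size is a known constant, the dummy memory operand idea can be sketched like this. The struct bytes16 and the function last_of_16 are hypothetical names, and the caller is assumed to pass a pointer to at least 16 readable bytes.

```cpp
// Hypothetical fixed-size view of the memory the asm reads.
struct bytes16 { unsigned char b[16]; };

// Reads the 16th byte through a register pointer, while the unnamed "m" operand
// tells the compiler that all 16 bytes are an input to the asm statement.
unsigned last_of_16(const unsigned char *p)
{
    unsigned out;
#if defined(__i386__) || defined(__x86_64__)
    asm("movzbl 15(%[p]), %[out]"
        : [out] "=r" (out)
        : [p] "r" (p)
        , "m" (*(const bytes16 *) p)   // dummy memory input: never named in the template
        );
#else
    out = p[15];                       // portable fallback (assumption, not inline asm)
#endif
    return out;
}
```

No "memory" clobber is needed here, because the only memory the asm touches is described by the "m" operand.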

Resources