Asm variable in C code

I am reading Secure Programming Cookbook for C and C++ by John Viega. There is a code snippet that I need some help understanding:
asm(".long 0xCEFAEDFE \n"
"crc32_stored: \n"
".long 0xFFFFFFFF \n"
".long 0xCEFAEDFE \n"
);
int main(){
//crc32_stored used here as a variable
}
What exactly do these lines mean: "crc32_stored:\n", ".long 0xFFFFFFFF \n"? Is this a variable definition and initialization?
When I try to compile the code from the book, I get the following error:
error: ‘crc32_stored’ undeclared (first use in this function)

crc32_stored: is simply a label, which in assembly is just an alias for a memory address. Since the label itself takes up no space in the object code, the address represented by crc32_stored is the address of .long 0xFFFFFFFF, which assembles to four 0xFF bytes. In the object code the label shows up as a symbol, which means much the same thing (just an alias for an address).
In C, a variable is (in a way) yet another way to express the same thing: a name that refers to a certain address in memory, but with additional type information, e.g. int or long. You can create a variable in C with int crc32_stored = 0xFFFFFFFF; which (minus the type information) is equivalent to the assembly crc32_stored: .long 0xFFFFFFFF, but that creates a different alias to yet another address.
You can tell the C compiler not to reserve a new address for the name "crc32_stored", but only to create the alias part and couple it with the address of a symbol of the same name. That is done with a declaration using the extern storage-class specifier, as in extern int crc32_stored;. This way you "promise" that the symbol will be provided elsewhere, by another object file you link against or, as in the book's snippet, by assembly in the same file.
Obviously you have to take care yourself that the C type information matches the intention of the assembly code (i.e. there are 4 bytes at the given address that should be interpreted as a signed 32-bit integer).
Addendum:
Without the extra declaration the symbol is not visible to C code, because the assembly parts are processed separately. The symbols cannot be exported to C code automatically because the type information is missing. (An assembly label does not even record whether it points to data or code.)
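A minimal sketch of the fix, assuming GCC or Clang with GNU assembler syntax on an ELF target such as x86-64 Linux. It keeps the book's three words, but (an assumption, not taken from the book) places them in .data with .pushsection/.popsection and marks the label .globl so it is an ordinary symbol. The helper read_stored_crc is hypothetical. The essential addition is the extern declaration; it uses uint32_t rather than int so that 0xFFFFFFFF reads back as itself instead of -1.

```c
#include <assert.h>
#include <stdint.h>

/* .long emits a 4-byte word on x86/x86-64, so a 4-byte C type matches it. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

/* File-scope basic asm modelled on the book's snippet.
   Assumption: GCC/Clang on an ELF target. .pushsection/.popsection put the
   words in .data regardless of which section the compiler is currently in. */
__asm__(
    ".pushsection .data\n"
    ".balign 4\n"
    ".long 0xCEFAEDFE\n"      /* marker word before the stored CRC */
    ".globl crc32_stored\n"   /* make the label a normal global symbol */
    "crc32_stored:\n"         /* label: a name for the address of the next word */
    ".long 0xFFFFFFFF\n"      /* placeholder CRC: four 0xFF bytes */
    ".long 0xCEFAEDFE\n"      /* marker word after the stored CRC */
    ".popsection\n");

/* The missing declaration: reserves no storage, only gives the compiler
   the name and type of a symbol that is defined elsewhere (here, above). */
extern const uint32_t crc32_stored;

/* Hypothetical helper that reads the word at the label. */
uint32_t read_stored_crc(void) {
    return crc32_stored;
}
```

With the declaration in place, crc32_stored can be used in main like any other variable; declaring it as extern int instead would read the same four bytes as -1.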

Related

Why is `extern` used in this MPLAB C example?

In the MPLAB XC8 Compiler User Guide, an example on page 162 (reproduced below) uses the extern keyword in conjunction with the # specifier. Given that we are specifying the address ourselves, why is this needed? It's not going to be allocating any memory per se.
The only reason I can think of is maybe extern variables aren't zeroed at startup. But then, C variables generally contain garbage anyway until you explicitly assign to them. So...I dunno.
Maybe it has something to do with it being in a header file? To avoid multiple #include statements causing a "variable already declared" error of some sort?
If the pointer has to access objects in data memory, you need to define a different object to act as a dummy target. For example, if the checksum was to be calculated over 10 bytes starting at address 0x90 in data memory, the following code could be used.
const char * cp;
extern char inputData[10] # 0x90;
cp = &inputData;
// cp is incremented over inputData and used to read values there
No memory is consumed by the extern declaration, and this can be mapped over the top of existing objects.
It makes no difference. It's mainly a choice of being explicit vs. being implicit.

Understanding an x86 ASM function in C

I'm currently working through the pintos project and had a question about some assembly macros the project has included
#define syscall1(NUMBER, ARG0) \
({ \
int retval; \
asm volatile \
("pushl %[arg0]; pushl %[number]; int $0x30; addl $8, %%esp" \
: "=a" (retval) \
: [number] "i" (NUMBER), \
[arg0] "g" (ARG0) \
: "memory"); \
retval; \
})
This macro is called to set up the stack for a syscall with only one argument. We push the one argument and the syscall number, then trap to the kernel. We only pass NUMBER and ARG0, so I was wondering where [number] and [arg0] (lowercase) come from. I have read some docs but didn't find answers. Would love some help!
Thanks
In GCC’s extended assembly syntax, the notation [name] "constraints" (expression) says:
Make expression available to the assembly code.
Put the expression in a place satisfying the constraints. The constraints describe acceptable places to use, such as general processor registers, floating-point registers, and memory. They may also include symbols telling GCC that the expression will be changed by the assembly code or both read and changed. (For output operands, the expression should be an lvalue so that it provides a place for the new value to be written.)
Use name as the name of the place. Then, when GCC sees %[name] in the assembly code, it replaces that with the assembly expression that refers to the place, such as %rax or 16(r3). The [name] part of the operand notation is optional. If you do not give it, GCC gives the operands names of 0, 1, 2,…, so the assembly code would refer to them with %0, %1, %2,...
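A runnable sketch of the same notation, assuming GCC or Clang targeting x86 or x86-64 (AT&T syntax). The pintos macro itself cannot run in user space, because int $0x30 traps into the pintos kernel, so the hypothetical helpers add_named and add_positional perform an ordinary addition instead. They are the same asm statement; one refers to operands as %[name], the other as %0, %1, %2.

```c
#include <assert.h>

/* movl/addl below operate on 32-bit values. */
static_assert(sizeof(int) == 4, "int must be 32 bits for movl/addl");

/* Named operands: %[out], %[lhs] and %[rhs] refer to the operands whose
   [name] appears in the lists below; result, a and b are the C expressions. */
static inline int add_named(int a, int b) {
    int result;
    __asm__("movl %[lhs], %[out]\n\t"
            "addl %[rhs], %[out]"
            : [out] "=&r" (result)          /* any register; "&" because it is
                                               written before rhs is read */
            : [lhs] "g" (a), [rhs] "g" (b)  /* register, memory or immediate */
            : "cc");                        /* addl modifies the flags */
    return result;
}

/* Same statement without names: operands are numbered in order, outputs
   first, so %0 is result, %1 is a and %2 is b. */
static inline int add_positional(int a, int b) {
    int result;
    __asm__("movl %1, %0\n\t"
            "addl %2, %0"
            : "=&r" (result)
            : "g" (a), "g" (b)
            : "cc");
    return result;
}
```

In add_named, result, a and b play the roles of retval, NUMBER and ARG0 in syscall1; the bracketed names exist only inside the asm statement and need not match any C identifier.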
The part enclosed in square brackets is a symbolic name used only in the asm template. The part in parentheses is the reference to the variable name in your C program. (More detailed description below.)
From the GCC documentation for the ASM template:
[ [asmSymbolicName] ] constraint (cvariablename)
asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (i.e. ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.
When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are three output operands, use ‘%0’ in the template to refer to the first, ‘%1’ for the second, and ‘%2’ for the third.
constraint
A string constant specifying constraints on the placement of the operand; See Constraints, for details.
Output constraints must begin with either ‘=’ (a variable overwriting an existing value) or ‘+’ (when reading and writing). When using ‘=’, do not assume the location contains the existing value on entry to the asm, except when the operand is tied to an input; see Input Operands.
After the prefix, there must be one or more additional constraints (see Constraints) that describe where the value resides. Common constraints include ‘r’ for register and ‘m’ for memory. When you list more than one possible location (for example, "=rm"), the compiler chooses the most efficient one based on the current context. If you list as many alternates as the asm statement allows, you permit the optimizers to produce the best possible code. If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, local register variables may provide a solution (see Local Register Variables).
cvariablename
Specifies a C lvalue expression to hold the output, typically a variable name. The enclosing parentheses are a required part of the syntax.
...
Extended Asm - Assembler Instructions with C Expression Operands

How/when memory is assigned to global variables in C

I am familiar with the C memory layout and the process of building a binary.
My question is about when, and by whom, addresses are assigned to global variables.
extern int dummy; //Declared in some other file
int * pTest = &dummy;
This code compiles fine. pTest can only hold the address of dummy once dummy has been assigned an address.
I want to know in which phase (compiling or linking) dummy gets its address.
The compiler says:
int *pTest = &<where is dummy?>;
The linker says:
int *pTest= &<dummy is here>;
The loader says:
int *pTest= <dummy is at 0x1234>;
This somewhat simplified explanation tries to convey the following:
The compiler identifies that an external variable dummy is used
The linker identifies where and in which module this variable resides
But only once the executable program is placed in memory is the actual location of the variable known and the loader puts this actual address in all the places where dummy is used.
The actual process is a bit different.
The compiler saves information in the object file about the assignment and the external object reference.
Depending on the hardware, the OS and the implementation, the linker either calculates the absolute address (if the code will be placed at a fixed address, as in an embedded microcontroller project), or computes some virtual address and sets an entry in the relocation table (if the code is position independent); in the latter case the loader changes this virtual address to the correct one during program loading and start-up.
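A small sketch of the question's two lines, assuming a hosted GCC or Clang toolchain. To keep it self-contained, dummy is defined at the bottom of the same file instead of in another file as in the question; read_through_pointer is a hypothetical helper. The point is that &dummy is an address constant the compiler cannot resolve on its own, so it records a relocation for pTest's initial value.

```c
#include <assert.h>

extern int dummy;      /* declaration only: no storage reserved here */
int *pTest = &dummy;   /* address constant: the compiler emits a relocation
                          for this slot instead of a numeric address */
int dummy = 42;        /* the definition that actually reserves storage
                          (in the question it lives in another file) */

/* Hypothetical helper: read dummy through the statically initialized pointer. */
int read_through_pointer(void) {
    return *pTest;
}
```

Compiling this with gcc -c and running objdump -r (or readelf -r) on the object file should list a relocation entry for pTest's slot that refers to dummy: the "where is dummy?" placeholder. The linker fills it in, and for a position-independent executable it typically becomes a dynamic relocation that the loader applies at start-up.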

An Example of complicated define in C

#define _FUID1(x) __attribute__((section("__FUID1.sec"),space(prog))) int _FUID1 = (x);
I am trying to make sense of the define above, the _FUID1(x) macro. It relates to program memory: does the section attribute place it in the code memory area?
What is the above trying to accomplish?
The macro isn't doing anything interesting or complicated at all; it just outputs a declaration for int _FUID1, with its parameter as an initializer, and with an attributes list ahead of it.
As for what the attributes list means, look at the documentation for variable attributes in GCC. section puts the variable in a named section, which allows the linker to relocate it to a special address or do some other interesting thing to it, and space isn't documented, but space(prog) sounds like a directive to put a value into the program address space instead of the data address space on a Harvard-architecture machine.
I think this is hardware-specific (some Microchip part). It places a value, for example:
__attribute__((section("__FUID1.sec"),space(prog))) int _FUID1 = (0xf1);
into unit ID register 1 (__FUID1.sec) in program flash to configure the hardware. See the PIC documentation (for references to FUID) and the MPLAB C30 manual (for a description of memory spaces).
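A host-side sketch of what the macro expands to, assuming GCC or Clang on an ELF target. The macro CONFIG_WORD, the section name cfg_words and the variable fuid1_demo are hypothetical; the Microchip-specific space(prog) attribute is dropped because it only exists in the Microchip toolchain, and "used" is added so the otherwise unreferenced object is kept.

```c
#include <assert.h>

/* Same shape as _FUID1: an attribute list, then an int definition whose
   initializer is the macro argument. Note the trailing semicolon is part
   of the macro, as in the original. */
#define CONFIG_WORD(name, x) \
    __attribute__((section("cfg_words"), used)) int name = (x);

/* Expands to:
   __attribute__((section("cfg_words"), used)) int fuid1_demo = (0xf1); */
CONFIG_WORD(fuid1_demo, 0xf1)
```

Running readelf -S on the resulting object file should show a section named cfg_words; on the Microchip part, the linker script is what maps __FUID1.sec to the unit ID location.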

Use label in assembly from C

I simply need a way to load the address of a label (e.g. MyLabel: in 'src.asm') into a variable in e.g. 'src.c'. (These files will be linked together.) I am using gcc and nasm to build these files. How can I load the label address?
There are two steps to this. First, you must export the label as global from the assembly file using the global directive.
global MyLabel
MyLabel: dd 1234 ; data or code, in whatever section. It doesn't matter.
Next, you must declare the label as external in C. You can do this either in the code using it, or in a header.
#include <stdint.h> // for uint32_t
// Use ONE of the declarations below; they are alternatives.
// It doesn't matter, and can be plain void,
// but prefer giving it a C type that matches what you put there with asm
extern void MyLabel(void); // The label is code, even if not actually a function
extern const uint32_t MyLabel[]; // The label is data
// *not* extern long *MyLabel, unless the memory at MyLabel *holds* a pointer.
Finally, you get the address of the label in C the same way you get the address of any variable.
doSomethingWith( &MyLabel );
Note that some compilers add an underscore to the beginning of C variable and function names. For example, GCC does this on Mac OS X, but not on Linux. I don't know about other platforms/compilers. To be on the safe side, you can add an asm label to the variable declaration to tell GCC what the assembly name for the variable is.
extern uint8_t MyLabel asm("MyLabel");
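A self-contained sketch of both steps, assuming GCC or Clang on an ELF x86/x86-64 target. The NASM file (global MyLabel / MyLabel: dd 1234) is replaced by equivalent GAS directives in a file-scope asm block so everything fits in one C file; my_label_address is a hypothetical helper. The __asm__("MyLabel") suffix is the asm label from the answer, written with the __asm__ spelling so it also compiles in strict ISO modes.

```c
#include <assert.h>
#include <stdint.h>

/* NASM's dd and GAS's .long both emit a 4-byte word on x86. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

/* GAS stand-in for: global MyLabel / MyLabel: dd 1234 */
__asm__(
    ".pushsection .data\n"
    ".balign 4\n"
    ".globl MyLabel\n"   /* step 1: export the label */
    "MyLabel:\n"
    ".long 1234\n"
    ".popsection\n");

/* Step 2: declare it in C with a matching type. The asm label pins the
   symbol name, so no leading underscore is added on platforms that do so. */
extern const uint32_t MyLabel[] __asm__("MyLabel");

/* Step 3: take the address; the array name decays to &MyLabel[0]. */
const void *my_label_address(void) {
    return MyLabel;
}
```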
You might consider an assembler "getter" routine.
Also, you might be able to simply make the label look like a routine to the C compiler and linker, so that you could take the address of the "procedure".

Resources