Plot pixel VGA with inline Assembly - c

I have a simple function to plot a pixel with inline assembly in c using djgpp and 256 VGA in DOS Box:
byte *VGA = (byte *)0xA0000;
void plot_pixel(int x, int y, byte color){
int offset;
if(x>32 && x <=320 && y>0 && y<180){
//x=x-1;
//y=y-1;
if(x<0){x=0;}
if(y<0){y=0;}
offset = (y<<8) + (y<<6) + x;
VGA[offset]=color;
}
}
I´m working on translating it to inline assembly and I have this:
void plot_pixel(int x, int y, byte color){
int offset;
if(x>32 && x <=320 && y>0 && y<180){
//x=x-1;
//y=y-1;
if(x<0){x=0;}
if(y<0){y=0;}
// offset = (y<<8) + (y<<6) + x;
// VGA[offset]=color;
__asm__ (
"mov $0xA000,%edx;"
"mov $320,%ax;"
"mul y;" //not sure how to reference my variable here
"add x,%ax;" //not sure how to reference my variable here
"mov %ax,%bx;"
"mov color,%al;" //not sure how to reference my variable here
"mov %al,%bx:(%edx);"
);
}
}
However I´m getting several errors on the compiler. I´m not familiarized with the GCC inline assembly so any help in correcting my code will be apreciated.

First, for segmentation you'd have to use a segment register (and you can't load 0xA000 into a general purpose register and use that). However...
DJGPP has its own "DOS extender" and runs your code in 32-bit protected mode; and segments work very differently in protected mode. For this reason you can't use segments as if you're in real mode; and have to create a "segment descriptor" that you can use with special library functions. For examples of this, see http://www.delorie.com/djgpp/v2faq/faq18_4.html
For GCC's inline assembly, the compiler doesn't understand assembly at all and mostly just inserts your code directly into the compiler's output (possibly after doing some simple text substitutions). Because of this you need to tell the compiler which registers are used for inputs, which registers are used for outputs, and which things (registers, memory, etc) are "clobbered" (modified by your code).
You should be able to find multiple pages online that describe how to provide the input/output/clobber lists and their format.
Note: DJGPP is "GCC ported to DOS" so most information for GCC works the same for DJGPP; and you'll have more luck searching for (e.g.) "GCC inline assembly" than you will searching for "DJGPP inline assembly" because GCC is still in use.

Related

Porting AT&T inline-asm inb / outb wrappers to work with gcc -masm=intel

I am currently working on my x86 OS. I tried implementing the inb function from here and it gives me Error: Operand type mismatch for `in'.
This may also be the same with outb or io_wait.
I am using Intel syntax (-masm=intel) and I don't know what to do.
Code:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
With AT&T syntax this does work.
For outb I'm having a different problem after reversing the operands:
void io_wait(void)
{
asm volatile ( "outb $0x80, %0" : : "a"(0) );
}
Error: operand size mismatch for `out'
If you need to use -masm=intel you will need to insure that your inline assembly is in Intel syntax. Intel syntax is dst, src (AT&T syntax is reverse). This somewhat related answer has some useful information on some differences between NASM's Intel variant1 (not GAS's variant) and AT&T syntax:
Information on how you can go about translating NASM Intel syntax to GAS's AT&T syntax can be found in this Stackoverflow Answer, and a lot of useful information is provided in this IBM article.
[snip]
In general the biggest differences are:
With AT&T syntax the source is on the left and destination is on the right and Intel is the reverse.
With AT&T syntax register names are prepended with a %
With AT&T syntax immediate values are prepended with a $
Memory operands are probably the biggest difference. NASM uses [segment:disp+base+index*scale] instead of GAS's syntax of segment:disp(base, index, scale).
The problem in your code is that source and destination operands have to be reversed from the original AT&T syntax you were working with. This code:
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "Nd"(port) );
Needs to be:
asm volatile ( "inb %0, %1"
: "=a"(ret)
: "Nd"(port) );
Regarding your update: the problem is that in Intel syntax immediate values are not prepended with a $. This line is a problem:
asm volatile ( "outb $0x80, %0" : : "a"(0) );
It should be:
asm volatile ( "outb 0x80, %0" : : "a"(0) );
If you had a proper outb function you could do something like this instead:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb %0, %1"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb %1, %0"
:
: "a"(byte),
"Nd"(port) );
}
void io_wait(void)
{
outb (0x80, 0);
}
A slightly more complex version that supports both the AT&T and Intel dialects:
Multiple assembler dialects in asm templates On targets such as x86,
GCC supports multiple assembler dialects. The -masm option controls
which dialect GCC uses as its default for inline assembler. The
target-specific documentation for the -masm option contains the list
of supported dialects, as well as the default dialect if the option is
not specified. This information may be important to understand, since
assembler code that works correctly when compiled using one dialect
will likely fail if compiled using another. See x86 Options.
If your code needs to support multiple assembler dialects (for
example, if you are writing public headers that need to support a
variety of compilation options), use constructs of this form:
{ dialect0 | dialect1 | dialect2... }
On x86 and x86-64 targets there are two dialects. Dialect0 is AT&T syntax and Dialect1 is Intel syntax. The functions could be reworked this way:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb {%[port], %[retreg] | %[retreg], %[port]}"
: [retreg]"=a"(ret)
: [port]"Nd"(port) );
return ret;
}
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb {%[byte], %[port] | %[port], %[byte]}"
:
: [byte]"a"(byte),
[port]"Nd"(port) );
}
void io_wait(void)
{
outb (0x80, 0);
}
I have also given the constraints symbolic names rather than using %0 and %1 to make the inline assembly easier to read and maintain.. From the GCC documentation each constraint has the form:
[ [asmSymbolicName] ] constraint (cvariablename)
Where:
asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (i.e. ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.
When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are three output operands, use ‘%0’ in the template to refer to the first, ‘%1’ for the second, and ‘%2’ for the third.
This version should work2 whether you compile with -masm=intel or -masm=att options
Footnotes
1Although NASM Intel dialect and GAS's (GNU Assembler) Intel syntax are similar there are some differences. One is that NASM Intel syntax uses [segment:disp+base+index*scale] where a segment can be specified inside the [] and GAS's Intel syntax requires the segment outside with segment:[disp+base+index*scale].
2Although the code will work, you should place all these basic functions in the ioaccess.h file directly and eliminate them from the .c file that contains them. Because you placed these basic functions in a separate .c file (external linkage) the compiler can't optimize them as well as it could. You can modify the functions to be of type static inline and place them in the header directly. The compiler will then have the ability to optimize the code by removing function calling overhead and reduce the need for extra loads and stores. You will want to compile with optimizations higher than -O0. Consider -O2 or -O3.
Special Notes Regarding OS Development:
There are many toy OSes (examples, tutorials, and even code on OSDev Wiki) that do not work with optimizations on. Many failures are due to bad/poor inline assembly or using undefined behaviour. Inline assembly should be used as a last resort. If your kernel doesn't run with optimizations on it is likely not a bug in the compiler (it is possible just not likely).
Heed the advice in #PeterCordes answer regarding port access that may trigger DMA reads.
It's possible to writing code that works with or without -masm=intel, using dialect alternatives for GNU C inline asm https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html (This is a good idea for headers that other people might include.)
It works like "{at&t stuff | intel stuff}": the compiler picks which side of the | to keep based on the current mode.
The major difference between AT&T vs. Intel syntax is that the operand-list is reversed, so usually you have something like "inb {%1,%0 | %0,%1}".
This is a version of #MichaelPetch's nice functions using dialect alternatives:
// make this a header: these single instructions can inline more cheaply
// than setting up args for a function call
#include <stdint.h>
static inline
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb {%1, %0 | %0, %1}"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
static inline
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb {%1, %0 | %0, %1}"
:
: "a"(byte),
"Nd"(port) );
}
static inline
void io_wait(void) {
outb (0x80, 0);
}
The Linux/Glibc sys/io.h macros sometimes use %w1 to expand a constraint to the 16-bit register name, but using types of the right size also works.
If you want a memory-barrier version of these to take advantage of the fact that in / out are (more or less) serializing like a locked instruction or mfence, add a "memory" clobber to stop compile-time reordering of memory access across it.
If a port I/O can trigger a DMA read of some other memory that you wrote recently, you might also need a "memory" clobber for that. (x86 has cache-coherent DMA so you wouldn't have had to explicitly flush it, but you can't let the compiler reorder it after an outb, or even optimize away an apparently dead store.)
There's no support in GAS for saving the old mode, so using .intel_syntax noprefix inside your inline asm leaves you no way know whether to switch back to .att_syntax or not.
But that wouldn't usually be sufficient anyway: you need to get the compiler to format operands in ways that match the syntax mode when filling in a template. e.g. the port number needs to expand to $imm or %dx (AT&T1) vs. dx or imm without the $ prefix.
Or for a memory operand, [rdi + rax*4 + 8] or 8(%rdi, %rax, 4).
But you still need to take care of reversing the operand list with { | } yourself; the compiler doesn't try to do that for you. It's purely a text-substitution into the template according to simple rules.
Footnote 1: AT&T disassembly by objdump -d bizarrely uses (%dx) for the port number in the non-immediate form, but GAS accepts %dx or (%dx) on input, so an "Nd" constraint is usable, simply expanding to the bare register name.

GCC baremetal inline-assembly SI register not playing nicely with pointers

Well, this is obviously a beginner's question, but this is my first attempt at making an operating system in C (Actually, I'm almost entirely new to C.. I'm used to asm) so, why exactly is this not valid? As far as I know, a pointer in C is just a uint16_t used to point to a certain area in memory, right (or a uint32_t and that's why it's not working)?
I've made the following kernel ("I've already made a bootloader and all in assembly to load the resulting KERNEL.BIN file):
kernel.c
void printf(char *str)
{
__asm__(
"mov si, %0\n"
"pusha\n"
"mov ah, 0x0E\n"
".repeat:\n"
"lodsb\n"
"cmp al, 0\n"
"je .done\n"
"int 0x10\n"
"jmp .repeat\n"
".done:\n"
"popa\n"
:
: "r" (str)
);
return;
}
int main()
{
char *msg = "Hello, world!";
printf(msg);
__asm__("jmp $");
return 0;
}
I've used the following command to compile it kernel.c:
gcc kernel.c -ffreestanding -m32 -std=c99 -g -O0 -masm=intel -o kernel.bin
which returns the following error:
kernel.c:3: Error: operand type mismatch for 'mov'
Why exactly might be the cause of this error?
As Michael Petch already explained, you use inline assembly only for the absolute minimum of code that cannot be done in C. For the rest there is inline assembly, but you have to be extremely careful to set the constraints and clobber list right.
Let always GCC do the job of passing the values in the right register and just specify in which register the values should be.
For your problem you probably want to do something like this
#include <stdint.h>
void print( const char *str )
{
for ( ; *str; str++) {
__asm__ __volatile__("int $0x10" : : "a" ((int16_t)((0x0E << 8) + *str)), "b" ((int16_t)0) : );
}
}
EDIT: Your assembly has the problem that you try to pass a pointer in a 16 bit register. This cannot work for 32 bit code, as 32 bit is also the pointer size.
If you in case want to generate 16 bit real-mode code, there is the -m16 option. But that does not make GCC a true 16 bit compiler, it has its limitations. Essentially it issues a .code16gcc directive in the code.
You can't simply use 16bit assembly instructions on 32-bit pointers and expect it to work. si is the lower 16bit of the esi register (which is 32bit).
gcc -m32 and -m16 both use 32-bit pointers. -m16 just uses address-size and operand-size prefixes to do mostly the same thing as normal -m32 mode, but running in real mode.
If you try to use 16bit addressing in a 32bit application you'll drop the high part of your pointers, and simply go to a different place.
Just try to read a book on intel 32bit addressing modes, and protected mode, and you'll see that many things are different on that mode.
(and if you try to switch to 64bit mode, you'll see that everything changes again)
A bootloader is something different as normally, cpu reset forces the cpu to begin in 16bit real mode. This is completely different from 32bit protected mode, which is one of the first things the operating system does. Bootloaders work in 16bit mode, and there, pointers are 16bit wide (well, not, 20bits wide, when the proper segment register is appended to the address)

Why doesn't this compiler barrier enforce ordering?

I was looking at the documentation on the Atmel website and I came across this example where they explain some issues with reordering.
Here's the example code:
#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )
unsigned int ivar;
void test2( unsigned int val )
{
val = 65535U / val;
cli();
ivar = val;
sei();
}
In this example, they're implementing a critical region-like mechanism. The cli instruction disables interrupts and the sei instruction enables them. Normally, I would save the interrupt state and restore to that state, but I digress...
The problem which they note is that, with optimization enabled, the division on the first line actually gets moved to after the cli instruction. This can cause some issues when you're trying to be inside of the critical region for the shortest amount of time as possible.
How come this is possible if the cli() MACRO expands to inline asm which explicitly clobbers the memory? How is the compiler free to move things before or after this statement?
Also, I modified the code to include memory barriers before every statement in the form of __asm volatile("" ::: "memory"); and it doesn't seem to change anything.
I also removed the memory clobber from the cli() and sei() MACROs, and the generated code was identical.
Of course, if I declare the test2 function argument as volatile, there is no reordering, which I assume to be because volatile statements can't be reordered with respect to other volatile statements (which the inline asm technically is). Is my assumption correct?
Can volatile accesses be reordered with respect to volatile inline asm?
Can non-volatile accesses be reordered with respect to volatile inline asm?
What's weird is that Atmel claims they need the memory clobber just to enforce the ordering of volatile accesses with respect to the asm. That doesn't make any sense to me.
If the compiler barrier isn't the proper solution for this, then how could I go about preventing any outside code from "leaking" into the critical region?
If anyone could shed some light, I'd appreciate it.
Thanks
How come this is possible if the cli() MACRO expands to inline asm which explicitly clobbers the memory? How is the compiler free to move things before or after this statement?
This is due to implementation details of avr-gcc: The compiler's support library, libgcc, provides many functions written in assembly for performance; including functions for integer division like __udivmodhi4. Not all of these functions clobber all of the callee-used registers as specified by the avr-gcc ABI. In particular, __udivmodhi4 does not clobber the Z register.
avr-gcc makes use of this as follows: On machines without 16-bit division instruction like AVR, GCC would issue a library call instead of generating code for it inline. avr-gcc however pretends that the architecture does have such division instruction and models it as having an effect on processor registers just like the library call. Finally, after all code analyzes and optimizations, the avr backend prints this instruction as [R]CALL __udivmodhi4. Let's call this a transparent call, i.e. a call which the compiler analysis does not see.
Example
int div (int a, int b, volatile const __flash char *z)
{
int ab;
(void) *z;
asm volatile ("" : "+r" (a));
ab = a / b;
asm volatile ("" : "+r" (ab));
(void) *z;
return ab;
}
Compile this with avr-gcc -S -Os -mmcu=atmega8 ... to get assembly file *.s:
div:
movw r30,r20
lpm r18,Z
rcall __divmodhi4
movw r24,r22
lpm r18,Z
ret
Explanation
(void) *z reads one byte from flash, and in order to use lpm instruction, the address must be in the Z register accomplished by movw r30,r20. After reading via lpm, the compiler issues rcall __divmodhi4 to perform signed 16-bit division. If this was an ordinary (non-transparent) call, the compiler would know nothing about the internal working of the callee, but as the avr backend models the call by hand, the compiler knows that the instruction sequence does not change Z and hence may use Z again after the call without any further ado. This allows for better code generation due to less register pressure, in particular z need not be saved / restores around the division.
The asm just serves to order the code: It is volatile and hence must not be reordered against the volatile read *z. And the asm must not be reordered against the division because the asm changes a and ab – at least that's what we are pretending and telling the compiler by means of the constraints. (These variables are not actually changed, but that does not matter here.)
Also, I modified the code to include memory barriers before every statement in the form of __asm volatile("" ::: "memory"); and it doesn't seem to change anything.
The division does not touch memory (it's a transparent call without memory clobber) hence the compiler machinery may reorder it against memory clobber / accesses.
If you need a specific order, then you'll have to introduce artificial dependencies like in in my example above.
In order to tell apart ordinary calls from transparent ones, you can dump the generated assembly in the .s file be means of -save-temps -dp where -dp prints insn names:
void func0 (void);
int func1 (int a, int b)
{
return a / b;
}
void func2 (void)
{
func0();
}
Every call that's neither call_insn nor call_value_insn is a transparent call, *divmodhi4_call in this case:
func1:
rcall __divmodhi4 ; 17 [c=0 l=1] *divmodhi4_call
movw r24,r22 ; 18 [c=4 l=1] *movhi/0
ret ; 23 [c=0 l=1] return
func2:
rjmp func0 ; 5 [c=0 l=1] call_insn/3

How to access C struct/variables from inline asm?

Consider the following code:
int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
{
uint32 q, m; /* Division Result */
uint32 i; /* Loop Counter */
uint32 j; /* Loop Counter */
/* Check Input */
if (bn1 == NULL) return(EFAULT);
if (bn1->dat == NULL) return(EFAULT);
if (bn2 == NULL) return(EFAULT);
if (bn2->dat == NULL) return(EFAULT);
if (bnr == NULL) return(EFAULT);
if (bnr->dat == NULL) return(EFAULT);
#if defined(__i386__) || defined(__amd64__)
__asm__ (".intel_syntax noprefix");
__asm__ ("pushl %eax");
__asm__ ("pushl %edx");
__asm__ ("pushf");
__asm__ ("movl %eax, (bn1->dat[i])");
__asm__ ("xorl %edx, %edx");
__asm__ ("divl (bn2->dat[j])");
__asm__ ("movl (q), %eax");
__asm__ ("movl (m), %edx");
__asm__ ("popf");
__asm__ ("popl %edx");
__asm__ ("popl %eax");
#else
q = bn->dat[i] / bn->dat[j];
m = bn->dat[i] % bn->dat[j];
#endif
/* Return */
return(0);
}
The data types uint32 is basically an unsigned long int or a uint32_t unsigned 32-bit integer. The type bnint is either a unsigned short int (uint16_t) or a uint32_t depending on if 64-bit data types are available or not. If 64-bit is available, then bnint is a uint32, otherwise it's a uint16. This was done in order to capture carry/overflow in other parts of the code. The structure bn_t is defined as follows:
typedef struct bn_data_t bn_t;
struct bn_data_t
{
uint32 sz1; /* Bit Size */
uint32 sz8; /* Byte Size */
uint32 szw; /* Word Count */
bnint *dat; /* Data Array */
uint32 flags; /* Operational Flags */
};
The function starts on line 300 in my source code. So when I try to compile/make it, I get the following errors:
system:/home/user/c/m3/bn 1036 $$$ ->make
clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal -Winline -Wunknown-pragmas -Wundef -Wendif-labels -c /home/user/c/m3/bn/bn.c
/home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
uint32 i; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
uint32 j; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
__asm__ ("movl %eax, (bn1->dat[i])");
^
<inline asm>:1:18: note: instantiated into assembly here
movl %eax, (bn1->dat[i])
^
/home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
__asm__ ("divl (bn2->dat[j])");
^
<inline asm>:1:12: note: instantiated into assembly here
divl (bn2->dat[j])
^
4 warnings and 2 errors generated.
*** [bn.o] Error code 1
Stop in /home/user/c/m3/bn.
system:/home/user/c/m3/bn 1037 $$$ ->
What I know:
I consider myself to be fairly well versed in x86 assembler (as evidenced from the code that I wrote above). However, the last time that I mixed a high level language and assembler was using Borland Pascal about 15-20 years ago when writing graphics drivers for games (pre-Windows 95 era). My familiarity is with Intel syntax.
What I don't know:
How do I access members of bn_t (especially *dat) from asm? Since *dat is a pointer to uint32, I am accessing the elements as an array (eg. bn1->dat[i]).
How do I access local variables that are declared on the stack?
I am using push/pop to restore clobbered registers to their previous values so as to not upset the compiler. However, do I also need to include the volatile keyword on the local variables as well?
Or, is there a better way that I am not aware of? I don't want to put this in a separate function call because of the calling overhead as this function is performance critical.
Additional:
Right now, I'm just starting to write this function so it is no where complete. There are missing loops and other such support/glue code. But, the main gist is accessing local variables/structure elements.
EDIT 1:
The syntax that I am using seems to be the only one that clang supports. I tried the following code and clang gave me all sorts of errors:
__asm__ ("pushl %%eax",
"pushl %%edx",
"pushf",
"movl (bn1->dat[i]), %%eax",
"xorl %%edx, %%edx",
"divl ($0x0c + bn2 + j)",
"movl %%eax, (q)",
"movl %%edx, (m)",
"popf",
"popl %%edx",
"popl %%eax"
);
It wants me to put a closing parenthesis on the first line, replacing the comma. I switched to using %% instead of % because I read somewhere that inline assembly requires %% to denote CPU registers, and clang was telling me that I was using an invalid escape sequence.
If you only need 32b / 32b => 32bit division, let the compiler use both outputs of div, which gcc, clang and icc all do just fine, as you can see on the Godbolt compiler explorer:
uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];
Compilers are quite good at CSEing that into one div. Just make sure you don't store the division result somewhere that gcc can't prove won't affect the input of the remainder.
e.g. *m = dat[i] / dat[j] might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the godbolt link for bad/good examples.
Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).
If you need 64bit / 32bit = 32bit, you probably need asm, though, if there isn't a compiler built-in for it. (GNU C doesn't have one, AFAICT). The obvious way in C (casting operands to uint64_t) generates a call to a 64bit/64bit = 64bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving the result will fit in 32bits, so a single div instruction don't cause a #DE.
For a lot of other instructions, you can avoid writing inline asm a lot of the time with builtin functions for things like popcount. With -mpopcnt, it compiles to the popcnt instruction (and accounts for the false-dependency on the output operand that Intel CPUs have.) Without, it compiles to a libgcc function call.
Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does. When inlining makes some of the arguments known at compile-time, pure C can be optimized away or simplified, but code using inline asm will just load constants into registers and do a div at run-time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.
Using GNU C syntax the right way
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the assembler which variables you want in registers, and what the outputs are.
You can use Intel/MASM-like syntax and mnemonics, and non-% register names if you like, preferably by compiling with -masm=intel. The AT&T syntax bug (fsub and fsubr mnemonics are reversed) might still be present in intel-syntax mode; I forget.
Most software projects that use GNU C inline asm use AT&T syntax only.
See also the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.
An asm statement takes one string arg, and 3 sets of constraints. The easiest way to make it multi-line is by making each asm line a separate string ending with \n, and let the compiler implicitly concatenate them.
Also, you tell the compiler which registers you want stuff in. Then if variables are already in registers, the compiler doesn't have to spill them and have you load and store them. Doing that would really shoot yourself in the foot. The tutorial Brett Hale linked in comments hopefully covers all this.
Correct example of div with GNU C inline asm
You can see the compiler asm output for this on godbolt.
uint32_t q, m; // this is unsigned int on every compiler that supports x86 inline asm with this syntax, but not when writing portable code.
asm ("divl %[bn2dat_j]\n"
: "=a" (q), "=d" (m) // results are in eax, edx registers
: "d" (0), // zero edx for us, please
"a" (bn1->dat[i]), // "a" means EAX / RAX
[bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
: // no register clobbers, and we don't read/write "memory" other than operands
);
"divl %4" would have worked too, but named inputs/outputs don't change name when you add more input/output constraints.

Is it possible to access 32-bit registers in C?

Is it possible to access 32-bit registers in C ? If it is, how ? And if not, then is there any way to embed Assembly code in C ? I`m using the MinGW compiler, by the way.
Thanks in advance!
If you want to only read the register, you can simply:
register int ecx asm("ecx");
Obviously it's tied to instantiation.
Another way is using inline assembly. For example:
asm("movl %%ecx, %0;" : "=r" (value) : );
This stores the ecx value into the variable value. I've already posted a similar answer here.
Which registers do you want to access?
General purpose registers normally can not be accessed from C. You can declare register variables in a function, but that does not specify which specific registers are used. Further, most compilers ignore the register keyword and optimize the register usage automatically.
In embedded systems, it is often necessary to access peripheral registers (such as timers, DMA controllers, I/O pins). Such registers are usually memory-mapped, so they can be accessed from C...
by defining a pointer:
volatile unsigned int *control_register_ptr = (unsigned int*) 0x00000178;
or by using pre-processor:
#define control_register (*(unsigned int*) 0x00000178)
Or, you can use Assembly routine.
For using Assembly language, there are (at least) three possibilities:
A separate .asm source file that is linked with the program. The assembly routines are called from C like normal functions. This is probably the most common method and it has the advantage that hw-dependent functions are separated from the application code.
In-line assembly
Intrinsic functions that execute individual assembly language instructions. This has the advantage that the assembly language instruction can directly access any C variables.
You can embed assembly in C
http://en.wikipedia.org/wiki/Inline_assembler
example from wikipedia
extern int errno;
int funcname(int arg1, int *arg2, int arg3)
{
int res;
__asm__ volatile(
"int $0x80" /* make the request to the OS */
: "=a" (res) /* return result in eax ("a") */
"+b" (arg1), /* pass arg1 in ebx ("b") */
"+c" (arg2), /* pass arg2 in ecx ("c") */
"+d" (arg3) /* pass arg3 in edx ("d") */
: "a" (128) /* pass system call number in eax ("a") */
: "memory", "cc"); /* announce to the compiler that the memory and condition codes have been modified */
/* The operating system will return a negative value on error;
* wrappers return -1 on error and set the errno global variable */
if (-125 <= res && res < 0) {
errno = -res;
res = -1;
}
return res;
}
I don't think you can do them directly. You can do inline assembly with code like:
asm (
"movl $0, %%ebx;"
"movl $1, %%eax;"
);
If you are on a 32-bit processor and using an adequate compiler, then yes. The exact means depends on the particular system and compiler you are programming for, and of course this will make your code about as unportable as can be.
In your case using MinGW, you should look at GCC's inline assembly syntax.
You can of course. "MinGW" (gcc) allows (as other compilers) inline assembly, as other answers already show. Which assembly, it depends on the cpu of your system (prob. 99.99% that it is x86). This makes however your program not portable on other processors (not that bad if you know what you are doing and why).
The relevant page talking about assembly for gcc is here and here, and if you want, also here. Don't forget that it can't be specific since it is architecture-dependent (gcc can compile for several cpus)
there is generally no need to access the CPU registers from a program written in a high-level language: high-level languages, like C, Pascal, etc. where precisely invented in order to abstract the underlying machine and render a program more machine-independent.
i suspect you are trying to perform something but have no clue how to use a conventional way to do it.
many access to the registers are hidden in higher-level constructs or in system or library calls which lets you avoid coding the "dirty-part". tell us more about what you want to do and we may suggest you an alternative.

Resources