gcc compiling .c with .s file - .bss confusion (bug?)

gcc compiling .c with .s file - .bss confusion (bug?) - c

Using gcc 4.6.3 under Ubuntu 12.04 on an IA32 architecture, I ran into an issue relating to compiling C files with assembly files using storage on the .bss segment with both .comm and .lcomm directives.
Between a .comm and a .lcomm buffer, the assembly file foo.s uses close to the maximum space gas lets me allocate in this segment (foo calculates prime factorization of long longs). With an assembly file bar.s handling i/o and such, everything compiles and links fine (and fast), and works well.
When I then use a C file bar.c to handle i/o, gcc does not terminate - or at least not in less than 5 minutes. The .bss request is close to my small notebook memory, but as the .bss segment does not get compile-time initialized, and as it works with bar.s, I don't see why this happens.
When I reduce the .bss size requested in foo.s, gcc compiles and links fine, and everything executes as it should. Also, as expected, the file size of the executable created in each case using
gcc bar.c foo.s -Wall
does not depend on the size in .bss requested (I compiled varying sizes which were all much smaller than the original, failing size). The executable is very small (maybe 10k) in all cases - in fact, of identical size - except, obviously, the original case which does not successfully compile and gets hung up.
Is this a gcc bug? Is there a command line option to use to prevent this from happening? Or what is going on?
Edit: per a comment, here is the part with the .bss segment allocation:
# Sieve of Eratosthenes
# Create list of prime numbers smaller than n
#
# Note: - no input error (range) check
# - n <= 500,000,000 (could be changed) - in assembly
# compiling it with gcc: trouble. make n <= 50,000,000
# Returns: pointer to array of ints of prime numbers
# (0 sentinel at end)
#
# Registers: %esi: sentinel value (n+1)
# %edx: n
# %ecx: counting variable (2 - n)
# %ebx: pointer into array of primes
# (position next to be added)
# %eax: inner pointer to A. tmp array
.section .bss
# .lcomm tmp_Arr, 2000000008 # 500,000,000 plus sentinel & padding
# .comm prime_Arr, 500000008 # asymptotically, primes aren't dense
.lcomm tmp_Arr, 200000008 # 50,000,000 plus sentinel & padding
.comm prime_Arr, 50000008 # asymptotically, primes aren't dense
.section .text
.globl sieve
.type sieve, #function
sieve:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
pushl %esi
pushl %ebx
# create Eratosthenes tmp array
movl $0, %ecx
loop_sieve_Tmp_:
movl %ecx, tmp_Arr(, %ecx, 4)
addl $1, %ecx
cmp %ecx, %edx
jge loop_sieve_Tmp_
# initialize registers used in algorithm
movl $2, %ecx # outer loop counting var
movl %ecx, %eax # inner loop counting var
xor %ebx, %ebx # pointer to prime array
movl %edx, %esi
incl %esi # sentinel (or placeholder for 'not prime')
loop_sieve_Outer_:
movl %ecx, prime_Arr(, %ebx, 4) # record prime
incl %ebx
loop_sieve_Inner_:
addl %ecx, %eax
movl %esi, tmp_Arr(, %eax, 4)
cmp %eax, %edx
jge loop_sieve_Inner_
find_Next_: # find minimum in Erist. tmp array
addl $1, %ecx
cmp %ecx, %edx
jl lbl_sieve_done_
cmp tmp_Arr(, %ecx, 4), %esi
je find_Next_
movl %ecx, %eax
jmp loop_sieve_Outer_
lbl_sieve_done_:
movl $0, prime_Arr(, %ebx, 4) # sentinel
movl $prime_Arr, %eax
popl %ebx
popl %esi
movl %ebp, %esp
popl %ebp
ret
# end sieve

I've replicated your problem with gcc 4.7.2 on Debian. It doesn't hang for me, but it does take a substantial length of time (10 seconds).
It appears that the linker actually allocates and zeros "comm" memory during it's processing. If the machine is sufficiently memory limited (as it seems yours is) this will cause the linker to thrash the swap even though the final executable will be tiny. For me the memory allocation was around 2.3Gb.
I tried a couple of variations (.space, .zero, .org) and they all seem to give the same effect.
With an up to date version on the compiler (4.9) this no longer happens.

Related

Segmentation fault when calling assembly function from C code

I'm trying to link assembly functions to a C code for exercise.
Here's my assembly function, written in x86 assembly:
.code32
.section .text
.globl max_function
.type max_function, #function
# i parametri saranno in ordine inverso a partire da 8(%ebp)
max_function:
pushl %ebp # save ebp
movl %esp, %ebp # new frame function
movl $0, %edi # first index is 0
movl 8(%ebp), %ecx # ecx is loaded with the number of elements
cmpl $0, %ecx # check that the number of elements is not 0
je end_function_err #if it is, exit
movl 12(%ebp),%edx # edx is loaded with the array base
movl (%edx), %eax # first element of the array
start_loop:
incl %edi #increment the index
cmpl %edi,%ecx #if it's at the end quit
je loop_exit
movl (%edx,%edi,4),%ebx #pick the value
cmpl %ebx,%eax #compare with actual maximum value
jle start_loop #less equal -> repeat loop
movl %ebx,%eax #greater -> update value
jmp start_loop #repeat loop
loop_exit:
jmp end_function #finish
end_function: #exit operations
movl %ebp, %esp
popl %ebp
ret
end_function_err:
movl $0xffffffff, %eax #return -1 and quit
jmp end_function
It basically defines a function that finds the maximum number of an array (or it should be)
And my C code:
#include <stdio.h>
#include <stdlib.h>
extern int max_function(int size, int* values);
int main(){
int values[] = { 4 , 5 , 7 , 3 , 2 , 8 , 5 , 6 } ;
printf("\nMax value is: %d\n",max_function(8,values));
}
I compile them with gcc -o max max.s max.c.
I get a SegmentationFault when executing the code.
My suspect is that I don't access the value in a right manner, but I can't see why, even because I based my code on an example code that prints argc and argv values when called from the command line.
I'm running Debian 8 64-bit

The problems were:
not preserving %ebx and %edi
not compiling for 32 bit (had to use -m32 flag for gcc)
cmpl operands were inverted
Thanks everybody, problem is solved.
I'll focus more on debugging tools to (disassembling and running step by step was very useful)!

Understanding x86 Assembly Code from C code

C code:
#include <stdio.h>
main() {
int i;
for (i = 0; i < 10; i++) {
printf("%s\n", "hello");
}
}
ASM:
.file "simple_loop.c"
.section .rodata
.LC0:
.string "hello"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushl %ebp # push ebp onto stack
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp # setup base pointer or stack ?
.cfi_def_cfa_register 5
andl $-16, %esp # ?
subl $32, %esp # ?
movl $0, 28(%esp) # i = 0
jmp .L2
.L3:
movl $.LC0, (%esp) # point stack pointer to "hello" ?
call puts # print "hello"
addl $1, 28(%esp) # i++
.L2:
cmpl $9, 28(%esp) # if i < 9
jle .L3 # goto l3
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
So I am trying to improve my understanding of x86 assembly code. For the above code, I marked off what I believe I understand. As for the question marked content, could someone share some light? Also, if any of my comments are off, please let me know.

andl $-16, %esp # ?
subl $32, %esp # ?
This reserves some space on the stack. First, the andl instruction rounds the %esp register down to the next lowest multiple of 16 bytes (exercise: find out what the binary value of -16 is to see why). Then, the subl instruction moves the stack pointer down a bit further (32 bytes), reserving some more space (which it will use next). I suspect this rounding is done so that access through the %esp register is slightly more efficient (but you'd have to inspect your processor data sheets to figure out why).
movl $.LC0, (%esp) # point stack pointer to "hello" ?
This places the address of the string "hello" onto the stack (this instruction doesn't change the value of the %esp register itself). Apparently your compiler considers it more efficient to move data onto the stack directly, rather than to use the push instruction.

Decoding equivalent assembly code of C code

Wanting to see the output of the compiler (in assembly) for some C code, I wrote a simple program in C and generated its assembly file using gcc.
The code is this:
#include <stdio.h>
int main()
{
int i = 0;
if ( i == 0 )
{
printf("testing\n");
}
return 0;
}
The generated assembly for it is here (only the main function):
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
leave
ret
I am at an absolute loss to correlate the C code and assembly code. All that the code has to do is store 0 in a register and compare it with a constant 0 and take suitable action. But what is going on in the assembly?

Since main is special you can often get better results by doing this type of thing in another function (preferably in it's own file with no main). For example:
void foo(int x) {
if (x == 0) {
printf("testing\n");
}
}
would probably be much more clear as assembly. Doing this would also allow you to compile with optimizations and still observe the conditional behavior. If you were to compile your original program with any optimization level above 0 it would probably do away with the comparison since the compiler could go ahead and calculate the result of that. With this code part of the comparison is hidden from the compiler (in the parameter x) so the compiler can't do this optimization.
What the extra stuff actually is
_main:
pushl %ebpz
movl %esp, %ebp
subl $24, %esp
andl $-16, %esp
This is setting up a stack frame for the current function. In x86 a stack frame is the area between the stack pointer's value (SP, ESP, or RSP for 16, 32, or 64 bit) and the base pointer's value (BP, EBP, or RBP). This is supposedly where local variables live, but not really, and explicit stack frames are optional in most cases. The use of alloca and/or variable length arrays would require their use, though.
This particular stack frame construction is different than for non-main functions because it also makes sure that the stack is 16 byte aligned. The subtraction from ESP increases the stack size by more than enough to hold local variables and the andl effectively subtracts from 0 to 15 from it, making it 16 byte aligned. This alignment seems excessive except that it would force the stack to also start out cache aligned as well as word aligned.
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
I don't know what all this does. alloca increases the stack frame size by altering the value of the stack pointer.
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
movl $0, %eax
I think you know what this does. If not, the movl just befrore the call is moving the address of your string into the top location of the stack so that it may be retrived by printf. It must be passed on the stack so that printf can use it's address to infer the addresses of printf's other arguments (if any, which there aren't in this case).
leave
This instruction removes the stack frame talked about earlier. It is essentially movl %ebp, %esp followed by popl %ebp. There is also an enter instruction which can be used to construct stack frames, but gcc didn't use it. When stack frames aren't explicitly used, EBP may be used as a general puropose register and instead of leave the compiler would just add the stack frame size to the stack pointer, which would decrease the stack size by the frame size.
ret
I don't need to explain this.
When you compile with optimizations
I'm sure you will recompile all fo this with different optimization levels, so I will point out something that may happen that you will probably find odd. I have observed gcc replacing printf and fprintf with puts and fputs, respectively, when the format string did not contain any % and there were no additional parameters passed. This is because (for many reasons) it is much cheaper to call puts and fputs and in the end you still get what you wanted printed.

Don't worry about the preamble/postamble - the part you're interested in is:
movl $0, -4(%ebp)
cmpl $0, -4(%ebp)
jne L2
movl $LC0, (%esp)
call _printf
L2:
It should be pretty self-evident as to how this correlates with the original C code.

The first part is some initialization code, which does not make any sense in the case of your simple example. This code would be removed with an optimization flag.
The last part can be mapped to C code:
movl $0, -4(%ebp) // put 0 into variable i (located at -4(%ebp))
cmpl $0, -4(%ebp) // compare variable i with value 0
jne L2 // if they are not equal, skip to after the printf call
movl $LC0, (%esp) // put the address of "testing\n" at the top of the stack
call _printf // do call printf
L2:
movl $0, %eax // return 0 (calling convention: %eax has the return code)

Well, much of it is the overhead associated with the function. main() is just a function like any other, so it has to store the return address on the stack at the start, set up the return value at the end, etc.
I would recommend using GCC to generate mixed source code and assembler which will show you the assembler generated for each sourc eline.
If you want to see the C code together with the assembly it was converted to, use a command line like this:
gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst
See http://www.delorie.com/djgpp/v2faq/faq8_20.html
On linux, just use gcc. On Windows down load Cygwin http://www.cygwin.com/
Edit - see also this question Using GCC to produce readable assembly?
and http://oprofile.sourceforge.net/doc/opannotate.html

You need some knowledge about Assembly Language to understand assembly garneted by C compiler.
This tutorial might be helpful

See here more information. You can generate the assembly code with C comments for better understanding.
gcc -g -Wa,-adhls your_c_file.c > you_asm_file.s
This should help you a little.

Understanding empty main()'s translation into assembly

Could somebody please explain what GCC is doing for this piece of code? What is it initializing? The original code is:
#include <stdio.h>
int main()
{
}
And it was translated to:
.file "test1.c"
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
leave
ret
I would be grateful if a compiler/assembly guru got me started by explaining the stack, register and the section initializations. I cant make head or tail out of the code.
EDIT:
I am using gcc 3.4.5. and the command line argument is gcc -S test1.c
Thank You,
kunjaan.

I should preface all my comments by saying, I am still learning assembly.
I will ignore the section initialization. A explanation for the section initialization and basically everything else I cover can be found here:
http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
The ebp register is the stack frame base pointer, hence the BP. It stores a pointer to the beginning of the current stack.
The esp register is the stack pointer. It holds the memory location of the top of the stack. Each time we push something on the stack esp is updated so that it always points to an address the top of the stack.
So ebp points to the base and esp points to the top. So the stack looks like:
esp -----> 000a3 fa
000a4 21
000a5 66
000a6 23
ebp -----> 000a7 54
If you push e4 on the stack this is what happens:
esp -----> 000a2 e4
000a3 fa
000a4 21
000a5 66
000a6 23
ebp -----> 000a7 54
Notice that the stack grows towards lower addresses, this fact will be important below.
The first two steps are known as the procedure prolog or more commonly as the function prolog. They prepare the stack for use by local variables (See procedure prolog quote at the bottom).
In step 1 we save the pointer to the old stack frame on the stack by calling
pushl %ebp. Since main is the first function called, I have no idea what the previous value of %ebp points too.
Step 2, We are entering a new stack frame because we are entering a new function (main). Therefore, we must set a new stack frame base pointer. We use the value in esp to be the beginning of our stack frame.
Step 3. Allocates 8 bytes of space on the stack. As we mentioned above, the stack grows toward lower addresses thus, subtracting by 8, moves the top of the stack by 8 bytes.
Step 4; Aligns the stack, I've found different opinions on this. I'm not really sure exactly what this is done. I suspect it is done to allow large instructions (SIMD) to be allocated on the stack,
http://gcc.gnu.org/ml/gcc/2008-01/msg00282.html
This code "and"s ESP with 0xFFFF0000,
aligning the stack with the next
lowest 16-byte boundary. An
examination of Mingw's source code
reveals that this may be for SIMD
instructions appearing in the "_main"
routine, which operate only on aligned
addresses. Since our routine doesn't
contain SIMD instructions, this line
is unnecessary.
http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
Steps 5 through 11 seem to have no purpose to me. I couldn't find any explanation on google. Could someone who really knows this stuff provide a deeper understanding. I've heard rumors that this stuff is used for C's exception handling.
Step 5, stores the return value of main 0, in eax.
Step 6 and 7 we add 15 in hex to eax for unknown reason. eax = 01111 + 01111 = 11110
Step 8 we shift the bits of eax 4 bits to the right. eax = 00001 because the last bits are shift off the end 00001 | 111.
Step 9 we shift the bits of eax 4 bits to the left, eax = 10000.
Steps 10 and 11 moves the value in the first 4 allocated bytes on the stack into eax and then moves it from eax back.
Steps 12 and 13 setup the c library.
We have reached the function epilogue. That is, the part of the function which returns the stack pointers, esp and ebp to the state they were in before this function was called.
Step 14, leave sets esp to the value of ebp, moving the top of stack to the address it was before main was called. Then it sets ebp to point to the address we saved on the top of the stack during step 1.
Leave can just be replaced with the following instructions:
mov %ebp, %esp
pop %ebp
Step 15, returns and exits the function.
1. pushl %ebp
2. movl %esp, %ebp
3. subl $8, %esp
4. andl $-16, %esp
5. movl $0, %eax
6. addl $15, %eax
7. addl $15, %eax
8. shrl $4, %eax
9. sall $4, %eax
10. movl %eax, -4(%ebp)
11. movl -4(%ebp), %eax
12. call __alloca
13. call ___main
14. leave
15. ret
Procedure Prolog:
The first thing a function has to do
is called the procedure prolog. It
first saves the current base pointer
(ebp) with the instruction pushl %ebp
(remember ebp is the register used for
accessing function parameters and
local variables). Now it copies the
stack pointer (esp) to the base
pointer (ebp) with the instruction
movl %esp, %ebp. This allows you to
access the function parameters as
indexes from the base pointer. Local
variables are always a subtraction
from ebp, such as -4(%ebp) or
(%ebp)-4 for the first local variable,
the return value is always at 4(%ebp)
or (%ebp)+4, each parameter or
argument is at N*4+4(%ebp) such as
8(%ebp) for the first argument while
the old ebp is at (%ebp).
http://www.milw0rm.com/papers/52
A really great stack overflow thread exists which answers much of this question.
Why are there extra instructions in my gcc output?
A good reference on x86 machine code instructions can be found here:
http://programminggroundup.blogspot.com/2007/01/appendix-b-common-x86-instructions.html
This a lecture which contains some of the ideas used below:
http://csc.colstate.edu/bosworth/cpsc5155/Y2006_TheFall/MySlides/CPSC5155_L23.htm
Here is another take on answering your question:
http://www.phiral.net/linuxasmone.htm
None of these sources explain everything.

Here's a good step-by step breakdown of a simple main() function as compiled by GCC, with lots of detailed info: GAS Syntax (Wikipedia)
For the code you pasted, the instructions break down as follows:
First four instructions (pushl through andl): set up a new stack frame
Next five instructions (movl through sall): generating a weird value for eax, which will become the return value (I have no idea how it decided to do this)
Next two instructions (both movl): store the computed return value in a temporary variable on the stack
Next two instructions (both call): invoke the C library init functions
leave instruction: tears down the stack frame
ret instruction: returns to caller (the outer runtime function, or perhaps the kernel function that invoked your program)

Well, dont know much about GAS, and i'm a little rusty on Intel assembly, but it looks like its initializing main's stack frame.
if you take a look, __main is some kind of macro, must be executing initializations.
Then, as main's body is empty, it calls leave instruction, to return to the function that called main.
From http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax#.22hello.s.22_line-by-line:
This line declares the "_main" label, marking the place that is called from the startup code.
pushl %ebp
movl %esp, %ebp
subl $8, %esp
These lines save the value of EBP on the stack, then move the value of ESP into EBP, then subtract 8 from ESP. The "l" on the end of each opcode indicates that we want to use the version of the opcode that works with "long" (32-bit) operands;
andl $-16, %esp
This code "and"s ESP with 0xFFFF0000, aligning the stack with the next lowest 16-byte boundary. (neccesary when using simd instructions, not useful here)
movl $0, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
This code moves zero into EAX, then moves EAX into the memory location EBP-4, which is in the temporary space we reserved on the stack at the beginning of the procedure. Then it moves the memory location EBP-4 back into EAX; clearly, this is not optimized code.
call __alloca
call ___main
These functions are part of the C library setup. Since we are calling functions in the C library, we probably need these. The exact operations they perform vary depending on the platform and the version of the GNU tools that are installed.
Here's a useful link.
http://unixwiz.net/techtips/win32-callconv-asm.html

It would really help to know what gcc version you are using and what libc. It looks like you have a very old gcc version or a strange platform or both. What's going on is some strangeness with calling conventions. I can tell you a few things:
Save the frame pointer on the stack according to convention:
pushl %ebp
movl %esp, %ebp
Make room for stuff at the old end of the frame, and round the stack pointer down to a multiple of 4 (why this is needed I don't know):
subl $8, %esp
andl $-16, %esp
Through an insane song and dance, get ready to return 1 from main:
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
Recover any memory allocated with alloca (GNU-ism):
call __alloca
Announce to libc that main is exiting (more GNU-ism):
call ___main
Restore the frame and stack pointers:
leave
Return:
ret
Here's what happens when I compile the very same source code with gcc 4.3 on Debian Linux:
.file "main.c"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",#progbits
And I break it down this way:
Tell the debugger and other tools the source file:
.file "main.c"
Code goes in the text section:
.text
Beats me:
.p2align 4,,15
main is an exported function:
.globl main
.type main, #function
main's entry point:
main:
Grab the return address, align the stack on a 4-byte address, and save the return address again (why I can't say):
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
Save frame pointer using standard convention:
pushl %ebp
movl %esp, %ebp
Inscrutable madness:
pushl %ecx
popl %ecx
Restore the frame pointer and the stack pointer:
popl %ebp
leal -4(%ecx), %esp
Return:
ret
More info for the debugger?:
.size main, .-main
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",#progbits
By the way, main is special and magical; when I compile
int f(void) {
return 17;
}
I get something slightly more sane:
.file "f.c"
.text
.p2align 4,,15
.globl f
.type f, #function
f:
pushl %ebp
movl $17, %eax
movl %esp, %ebp
popl %ebp
ret
.size f, .-f
.ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
.section .note.GNU-stack,"",#progbits
There's still a ton of decoration, and we're still saving the frame pointer, moving it, and restoring it, which is utterly pointless, but the rest of the code make sense.

It looks like GCC is acting like it is ok to edit main() to include CRT initialization code. I just confirmed that I get the exact same assembly listing from MinGW GCC 3.4.5 here, with your source text.
The command line I used is:
gcc -S emptymain.c
Interestingly, if I change the name of the function to qqq() instead of main(), I get the following assembly:
.file "emptymain.c"
.text
.globl _qqq
.def _qqq; .scl 2; .type 32; .endef
_qqq:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
which makes much more sense for an empty function with no optimizations turned on.

Incrementing from 0 to 100 in assembly language

This is kinda oddball, but I was poking around with the GNU assembler today (I want to be able to at least read the syntax), and was trying to get this little contrived example of mine to work. Namely I just want to go from 0 to 100, printing out numbers all the while. So a few minutes later I come up with this:
# count.s: print the numbers from 0 to 100.
.text
string: .asciz "%d\n"
.globl _main
_main:
movl $0, %eax # The starting point/current value.
movl $100, %ebx # The ending point.
_loop:
# Display the current value.
pushl %eax
pushl $string
call _printf
addl $8, %esp
# Check against the ending value.
cmpl %eax, %ebx
je _end
# Increment the current value.
incl %eax
jmp _loop
_end:
All I get from this is 3 printed over and over again. Like I said, just a little contrived example, so don't worry too much about it, it's not a life or death problem.
(The formatting's a little messed up, but nothing major).

You can't trust what any called procedure does to any of the registers.
Either push the registers onto the stack and pop them back off after calling printf or have the increment and end point values held in memory and read/written into registers as you need them.
I hope the following works. I'm assuming that pushl has an equivalant popl and you can push an extra couple of numbers onto the stack.
# count.s: print the numbers from 0 to 100.
.text
string: .asciz "%d\n"
.globl _main
_main:
movl $0, %eax # The starting point/current value.
movl $100, %ebx # The ending point.
_loop:
# Remember your registers.
pushl %eax
pushl %ebx
# Display the current value.
pushl %eax
pushl $string
call _printf
addl $8, %esp
# reinstate registers.
popl %ebx
popl %eax
# Check against the ending value.
cmpl %eax, %ebx
je _end
# Increment the current value.
incl %eax
jmp _loop
_end:

I'm not too familiar with _printf, but could it be that it modifies eax? Printf should return the number of chars printed, which in this case is two: '0' and '\n'. I think it returns this in eax, and when you increment it, you get 3, which is what you proceed to print.
You might be better off using a different register for the counter.

You can safely use registers that are "callee-saved" without having to save them yourself. On x86 these are edi, esi, and ebx; other architectures have more.
These are documented in the ABI references: http://math-atlas.sourceforge.net/devel/assembly/

Well written functions will usually push all the registers onto the stack and then pop them when they're done so that they remain unchanged during the function. The exception would be eax that contains the return value. Library functions like printf are most likely written this way, so I wouldn't do as Wedge suggests:
You'll need to do the same for any other variable you have. Using registers to store local variables is pretty much reserved to architectures with enough registers to support it (e.g. EPIC, amd64, etc.)
In fact, from what I know, compilers usually compile functions that way to deal exactly with this issue.
#seanyboy, your solution is overkill. All that's needed is to replace eax with some other register like ecx.

Nathan is on the right track. You can't assume that register values will be unmodified after calling a subroutine. In fact, it's best to assume they will be modified, else the subroutine wouldn't be able to do it's work (at least for low register count architectures like x86). If you want to preserve a value you should store it in memory (e.g. push it onto the stack and keep track of it's location).
You'll need to do the same for any other variable you have. Using registers to store local variables is pretty much reserved to architectures with enough registers to support it (e.g. EPIC, amd64, etc.)

You could rewrite it so that you use registers that aren't suppose to change, for example %ebp. Just make sure you push them onto the stack at the beginning, and pop them off at the end of your routine.
# count.s: print the numbers from 0 to 100.
.text
string: .asciz "%d\n"
.globl _main
_main:
push %ecx
push %ebp
movl $0, %ecx # The starting point/current value.
movl $100, %ebp # The ending point.
_loop:
# Display the current value.
pushl %ecx
pushl $string
call _printf
addl $8, %esp
# Check against the ending value.
cmpl %ecx, %ebp
je _end
# Increment the current value.
incl %ecx
jmp _loop
_end:
pop %ebp
pop %ecx

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight