Why do all C functions converted to assembler start and end with some identical operations? [duplicate]

I know that data for nested function calls goes on the stack. The stack is maintained step by step, with data stored and retrieved as functions are called and return. These steps are usually known as the prologue and epilogue.
I searched for material on this topic with no success. Do you know of any resource (site, video, article) about how the function prologue and epilogue generally work in C? Or, if you can explain it, that would be even better.
P.S.: I just want a general view, nothing too detailed.

There are lots of resources out there that explain this:
Function prologue (Wikipedia)
x86 Disassembly/Calling Conventions (WikiBooks)
Considerations for Writing Prolog/Epilog Code (MSDN)
to name a few.
Basically, as you somewhat described, "the stack" serves several purposes in the execution of a program:
Keeping track of where to return to, when calling a function
Storage of local variables in the context of a function call
Passing arguments from calling function to callee.
The prologue is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilogue is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack. (In optimized code, using ebp as a frame pointer is optional; other ways of unwinding the stack for exceptions are possible, so there's no actual requirement to spend instructions setting it up.)
The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack. (So on function entry, things are set up so a ret could execute to pop that return address back into EIP. The prologue points ESP somewhere else, which is part of why we need an epilogue.)
Then the prologue is executed:
push ebp ; Save the stack-frame base pointer (of the calling function).
mov ebp, esp ; Set the stack-frame base pointer to be the current
; location on the stack.
sub esp, N ; Grow the stack by N bytes to reserve space for local variables
At this point, we have:
...
ebp + 4: Return address
ebp + 0: Calling function's old ebp value
ebp - 4: (local variables)
...
The epilog:
mov esp, ebp ; Put the stack pointer back where it was when this function
; was called.
pop ebp ; Restore the calling function's stack frame.
ret ; Return to the calling function.
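The sequence above can be mimicked in plain C to see how esp and ebp move. This is only a sketch of a toy machine, not real x86: the names ToyMachine, toy_call, toy_prologue and toy_epilogue are made up for illustration, and the stack is indexed in 4-byte words, so "ebp + 1" here corresponds to "ebp + 4" in the listing.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical toy machine: a word-addressed stack that grows downward. */
typedef struct {
    uint32_t mem[STACK_WORDS]; /* stack memory, one slot per 4-byte word */
    uint32_t esp;              /* index of the most recently pushed word */
    uint32_t ebp;              /* frame pointer of the running function */
    uint32_t eip;              /* pretend instruction pointer */
} ToyMachine;

static void push_word(ToyMachine *m, uint32_t v) { m->mem[--m->esp] = v; }
static uint32_t pop_word(ToyMachine *m) { return m->mem[m->esp++]; }

/* call: push the return address, then jump to the callee. */
static void toy_call(ToyMachine *m, uint32_t return_addr, uint32_t target) {
    push_word(m, return_addr);
    m->eip = target;
}

/* Prologue: push ebp / mov ebp, esp / sub esp, N (N counted in words). */
static void toy_prologue(ToyMachine *m, uint32_t local_words) {
    push_word(m, m->ebp);   /* save the caller's frame pointer */
    m->ebp = m->esp;        /* new frame starts at the saved value */
    m->esp -= local_words;  /* reserve room for locals */
}

/* Epilogue: mov esp, ebp / pop ebp / ret. */
static void toy_epilogue(ToyMachine *m) {
    m->esp = m->ebp;        /* discard the locals */
    m->ebp = pop_word(m);   /* restore the caller's frame pointer */
    m->eip = pop_word(m);   /* ret pops the return address into eip */
}
```

Starting with esp at STACK_WORDS, a toy_call followed by toy_prologue and toy_epilogue leaves esp and ebp exactly where they were before the call, which is the whole point of pairing the two.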

C Function Call Conventions and the Stack explains well the concept of a call stack
Function prologue briefly explains the assembly code and the hows and whys.
The gen on function perilogues

I am quite late to the party & I am sure that in the last 7 years since the question was asked, you'd have gotten a way clearer understanding of things, that is of course if you chose to pursue the question any further. However, I thought I would still give a shot at especially the why part of the prolog & the epilog.
Also, the accepted answer elegantly & quite simply explains the how of the epilog & the prolog, with good references. I only intend to supplement that answer with the why (at least the logical why) part.
I will quote the below from the accepted answer & try to extend its explanation.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep
track of the function's stack frame. The esp register is used by the
processor to point to the most recent addition (the top value) on the
stack.
The call instruction does two things: First it pushes the return
address onto the stack, then it jumps to the function being called.
Immediately after the call, esp points to the return address on the
stack.
The last line in the quote above says immediately after the call, esp points to the return address on the stack.
Why's that?
So let's say that our code that's getting currently executed has the following situation, as shown in the (really badly drawn) diagram below
So our next instruction to be executed is, say at the address 2. This is where the EIP is pointing. The current instruction has a function call (that would internally translate to the assembly call instruction).
Now ideally, because the EIP is pointing to the very next instruction, that would indeed be the next instruction to get executed. But since there's sort of a diversion from the current execution flow path, (that is now expected because of the call) the EIP's value would change. Why? Because now another instruction, that may be somewhere else, say at the address 1234 (or whatever), may need to get executed. But in order to complete the execution flow of the program as was intended by the programmer, after the diversion activities are done, the control must return back to the address 2 as that is what should have been executed next should the diversion have not happened. Let us call this address 2 as the return address in the context of the call that is being made.
Problem 1
So, before the diversion actually happens, the return address, 2, would need to be stored somewhere temporarily.
There could have been many choices of storing it in any of the available registers, or some memory location etc. But for (I believe good reason) it was decided that the return address would be stored onto the stack.
So what needs to be done now is increment the ESP (the stack pointer) such that the top of the stack now points at the next address on the stack. So TOS' (TOS before the increment) which was pointing to the address, say 292, now gets incremented & starts pointing to the address 293. That is where we put our return address 2. So something like this:
So it looks like now we have achieved our goal of temporarily storing the return address somewhere. We should now just go about making the diversion call. And we could. But there's a small problem. During the execution of the called function, the stack pointer, along with the other register values, could be manipulated multiple times.
Problem 2
So, although our return address is still stored on the stack at location 293, how would the execution flow know, after the called function finishes executing, that it should go to 293 to find the return address?
So (I believe for good reason again) one of the ways of solving the above problem could be to store the stack address 293 (where the return address is) in a (designated) register called EBP. But then what about the contents of EBP? Would that not be overwritten? Sure, that's a valid point. So let's store the current contents of EBP on to the stack & then store this stack address into EBP. Something like this:
The stack pointer is incremented. The current value of EBP (denoted as EBP'), which is say xxx, is stored onto the top of the stack, i.e. at the address 294. Now that we have taken a backup of the current contents of EBP, we can safely put any other value onto the EBP. So we put the current address of the top of the stack, that is the address 294, in EBP.
With the above strategy in place, Problem 2 is solved. How? When the execution flow needs to know where to fetch the return address from, it would:
first get the value from EBP out and point the ESP to that value. In our case, this would make TOS (top of stack) point to the address 294 (since that is what is stored in EBP).
Then it would restore the previous value of EBP. To do this it would simply take the value at 294 (the TOS), which is xxx (which was actually the older value of EBP), & put it back to EBP.
Then it would decrement the stack pointer to go to the next lower address in the stack which is 293 in our case. Thus finally reaching 293 (see that's what our problem 2 was). That's where it would find the return address, which is 2.
It will finally pop this 2 out into the EIP, that's the instruction that should have ideally been executed should the diversion have not happened, remember.
And the steps we just saw, with all the juggling to store the return address temporarily & then retrieve it, are exactly what the call instruction and the function prolog (at the start of the called function) & the epilog (just before the function's ret) do. The how was already answered; we just answered the why as well.
Just an end note: For the sake of brevity, I have not taken care of the fact that the stack addresses may grow the other way round.

Functions compiled with a frame pointer share the same prologue (the start of the function's code) and epilogue (the end of the function).
Prologue: the structure of the prologue looks like:
push ebp
mov ebp, esp
Epilogue: the structure of the epilogue looks like:
leave
ret
More in detail : what is Prologue and Epilogue

Related

Determining return address of function on ARM Cortex-M series

I want to determine the return address of a function in Keil. I opened the disassembly window in debugging mode in Keil uVision. What is shown is some assembly code like this:
My intention is to inject simple binary code into the microcontroller by using a buffer overflow (see: Buffer overflow).
I want to determine the return address of the "test" function. Is it a must to know how to read assembly code, or is there any trick to find the return address?
I am a newbie to assembly.

R14, also known as LR (the link register), holds the return address. On the left you can see it in the picture. It is 0x08000287.

When a function is called, R14 will be overwritten with the address following the call ("BL" or "BLX") instruction. If that function doesn't call any other functions, R14 will often be left holding the return address for its duration. Further, if the function tail-calls another function, the tail call may be replaced with a branch ("B" or "BX"), with R14 holding the return address of the original caller. If a function makes a non-tail call to another function, it will be necessary to save R14 "somewhere" (typically the stack, but possibly to another previously-used caller-saved register) at some time before that, and retrieve that value from the stack at some later time, but if optimizations are enabled the location where R14 is saved will generally be unpredictable.
Some compilers may have a mode that would stack things consistently enough to be usable, but code will be very compiler-dependent. The technique most likely to be successful may be to do something like:
#include <stdint.h>
extern int getStackAddress(uint8_t **addr); // Always returns zero
void myFunction(/* ...whatever... */)
{
    uint8_t *returnAddress;
    if (getStackAddress(&returnAddress)) return; // Put this first.
    /* ...rest of the function... */
}
where getStackAddress would be a machine-code function that stores R14 to the address in R0, loads R0 with zero, and then branches to R14. There are relatively few code sequences that would be likely to follow that call, and if code examines the instructions at the address stored in returnAddress and recognizes one of these sequences, it would know that the return address for myFunction is stored in a spot appropriate for the sequence in question. For example, if it sees:
tst r0,r0
beq ...
pop {r0,pc}
It would know that the caller's address is second on the stack. Likewise if it sees:
cmp r0,#0
bne somewhere
somewhere: ; Compute address based on lower byte of bne
pop {r0,r1,r2,r4,r5,pc}
then it would know that the caller's address is sixth.
There are a few instructions compilers could use to test a register against zero, and some compilers might use beq while others use bne, but for the code above compilers would be likely to use one of these patterns, so counting how many bits are set in the pop instruction would reveal the whereabouts of the return address on the stack. One wouldn't know until runtime whether this test would actually work, but in cases where it claims to identify the return address it should actually be correct.
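The bit-counting step could look roughly like the sketch below. It assumes the 16-bit Thumb POP encoding (0xBC00 plus an 8-bit list for r0-r7, with bit 8 set when PC is included); the 32-bit Thumb-2 form is not handled, and the function name thumb_pop_pc_slot is made up for illustration.

```c
#include <stdint.h>

/* Returns the zero-based stack slot that a 16-bit Thumb POP loads into PC,
   or -1 if the halfword is not a POP that includes PC.
   Slot 1 means "second on the stack", slot 5 means "sixth". */
static int thumb_pop_pc_slot(uint16_t insn) {
    if ((insn & 0xFE00u) != 0xBC00u) return -1; /* not a 16-bit POP */
    if ((insn & 0x0100u) == 0) return -1;       /* PC is not in the list */
    unsigned list = insn & 0x00FFu;             /* one bit per register r0-r7 */
    int low_regs = 0;
    while (list != 0) {
        list &= list - 1;                       /* clear the lowest set bit */
        low_regs++;
    }
    return low_regs;  /* PC is popped after all the low registers */
}
```

With this encoding, pop {r0,pc} is 0xBD01 and pop {r0,r1,r2,r4,r5,pc} is 0xBD37, matching the two patterns above.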

You can find all the answers in the Cortex-M documentation
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337h/Chdedegj.html

In terms of using the stack, why do we need a base pointer and a stack pointer [duplicate]

In terms of x86 assembly code anyways. I've been reading about function calls, but still can't fully grasp the need for a base / frame pointer (EBP) along with a stack pointer (ESP).
When we call a function, the current value of EBP will be placed on the stack and then EBP gets the current ESP value.
Place holders for the return value, function arguments and local variables of the function will then be placed on the stack, and the stack pointer ESP value will decrease (or increase) to point to after the last placeholder placed on the stack.
Now we have the EBP pointing to the beginning of the current stack frame, and ESP pointing to the end of the stack frame.
The EBP will be used to access the arguments and local variables of the function due to constant offsets from the EBP. That is fine. What I don't understand is, why can't the ESP just be used to access these variables also by using its offsets. The EBP points to the beginning of the stack frame , and the ESP points to the end of the stack frame. What's the difference?
ESP shouldn't change once space has been reserved for all the local variables etc., or should it?

Technically, it is possible (but sometimes hard) to track how many local and temporary variables are stored on the stack, so that accessing function input, and local variables can be done without EBP.
Consider the following C code:
int func(int arg) {
    int result = 0;
    double x[arg + 5];  /* variable-length array: size known only at run time */
    /* Do something with x, calculate result */
    return result;
}
The number of items stored on the stack is now variable (arg+5 doubles). Calculating the location of 'arg' relative to the stack pointer then requires a run-time calculation, which can have a significant negative impact on performance.
With an extra register (EBP), arg is always at a fixed location (EBP+8 in 32-bit cdecl). Executing a 'return' is always simple: move EBP to ESP, pop the saved EBP, and return.
Bottom line: the decision to dedicate the EBP register to a single purpose (instead of using it as a general-purpose register) is a trade-off between performance, simplicity, code size and other factors. Practical experience has shown that the benefits outweigh the cost.

Side note about debugger/runtime tools:
Using EBP makes it easier for debuggers (and other run-time tools) to 'walk the stack'. Tools can examine the stack at run time and, without knowing anything about the current program's stack (e.g., how many items have been pushed into each frame), travel the stack all the way up to main.
Without EBP pointing to the 'next' frame, run-time tools (including debuggers) face the very hard (impossible?) task of knowing how to move from ESP to specific local variables.
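As a rough illustration of such a stack walk, the sketch below follows the chain of saved frame pointers through a simulated, word-addressed stack rather than real memory. The function name walk_frames and the convention that a saved value of 0 marks the outermost frame are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks a chain of saved frame pointers in a simulated stack.
   stack[fp] holds the caller's saved frame pointer and stack[fp + 1] the
   return address, mirroring [ebp] and [ebp+4]. A saved value of 0 marks the
   outermost frame (an assumption of this sketch).
   Returns the number of return addresses written to ret_addrs. */
static size_t walk_frames(const uint32_t *stack, size_t words, uint32_t fp,
                          uint32_t *ret_addrs, size_t max_frames) {
    size_t n = 0;
    while (fp != 0 && (size_t)fp + 1 < words && n < max_frames) {
        ret_addrs[n++] = stack[fp + 1];
        uint32_t next = stack[fp];
        /* Caller frames must sit at higher addresses; stop on a bad link. */
        if (next != 0 && next <= fp) break;
        fp = next;
    }
    return n;
}
```

The walker needs no knowledge of how many locals each frame holds, which is exactly the property the frame pointer gives to debuggers.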

Recognizing stack frames in a stack using saved EBP values

I would like to divide a stack into stack frames by looking at the raw data on the stack. I thought to do so by finding a "linked list" of saved EBP pointers.
Can I assume that a (standard and commonly used) C compiler (e.g. gcc) will always update and save EBP on a function call in the function prologue?
pushl %ebp
movl %esp, %ebp
Or are there cases where some compilers might skip that part for functions that don't get any parameters and don't have local variables?
The x86 calling conventions and the Wiki article on function prologue don't help much with that.
Is there any better method to divide a stack into stack frames just by looking at its raw data?
Thanks!

Some versions of gcc have a -fomit-frame-pointer optimization option. If memory serves, it can be used even with parameters/local variables (they index directly off of ESP instead of using EBP). Unless I'm badly mistaken, MS VC++ can do roughly the same.
Offhand, I'm not sure of a way that's anywhere close to universally applicable. If you have code with debug info, it's usually pretty easy -- otherwise though...

Even with the framepointer optimized out, stackframes are often distinguishable by looking through stack memory for saved return addresses instead. Remember that a function call sequence in x86 always consists of:
call someFunc ; pushes return address (instr. following `call`)
...
someFunc:
push EBP ; if framepointer is used
mov EBP, ESP ; if framepointer is used
push <nonvolatile regs>
...
so your stack will always - even if the framepointers are missing - have return addresses in there.
How do you recognize a return address ?
To start with, on x86, instructions have different lengths. That means return addresses, unlike other pointers (!), tend to be misaligned values. Statistically, about 3/4 of them do not end at a multiple of four.
Any misaligned pointer is a good candidate for a return address.
Then, remember that call instructions on x86 have specific opcode formats; read a few bytes before the return address and check whether you find a call opcode there (most of the time it's five bytes back for a direct call; for an indirect call it's two bytes back for a call through a register in 32-bit code, or three or more for a call through memory). If so, you've found a return address.
This is also a way to distinguish C++ vtables from return addresses by the way - vtable entrypoints you'll find on the stack, but looking "back" from those addresses you don't find call instructions.
With that method, you can get candidates for the call sequence out of the stack even without having symbols, framesize debugging information or anything.
The details of how to piece the actual call sequence together from those candidates are less straightforward though, you need a disassembler and some heuristics to trace potential call flows from the lowest-found return address all the way up to the last known program location. Maybe one day I'll blog about it ;-) though at this point I'd rather say that the margin of a stackoverflow posting is too small to contain this ...
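The "look back for a call opcode" check can be sketched as follows. It only recognizes three common 32-bit encodings (E8 rel32, FF /2 through a register, FF /2 through [reg+disp8]), so it will miss other forms and can report false positives when data happens to look like a call; the function name preceded_by_call is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the bytes just before code[off] look like an x86 call
   instruction, which makes off a plausible return address.
   This is a heuristic over three encodings, not a disassembler. */
static int preceded_by_call(const uint8_t *code, size_t len, size_t off) {
    if (off > len) return 0;
    /* E8 rel32: direct near call, 5 bytes long. */
    if (off >= 5 && code[off - 5] == 0xE8) return 1;
    /* FF /2 with mod == 11: call through a register, 2 bytes (e.g. FF D0). */
    if (off >= 2 && code[off - 2] == 0xFF &&
        (code[off - 1] & 0xF8) == 0xD0) return 1;
    /* FF /2 with mod == 01: call [reg + disp8], 3 bytes (e.g. FF 50 08).
       rm == 100 is excluded because it means a SIB byte follows. */
    if (off >= 3 && code[off - 3] == 0xFF &&
        (code[off - 2] & 0xF8) == 0x50 &&
        (code[off - 2] & 0x07) != 0x04) return 1;
    return 0;
}
```

A scanner would run this over every stack word that points into executable memory and keep the hits as return-address candidates.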

How can I create a parallel stack and run a coroutine on it?

I decided I should try to implement coroutines (I think that's how I should call them) for fun and profits. I expect to have to use assembler, and probably some C if I want to make this actually useful for anything.
Bear in mind that this is for educational purposes. Using an already built coroutine library is too easy (and really no fun).
You guys know setjmp and longjmp? They allow you to unwind the stack up to a predefined location, and resumes execution from there. However, it can't rewind to "later" on the stack. Only come back earlier.
jmp_buf checkpoint; // from <setjmp.h>
int retval = setjmp(checkpoint); // returns 0 the first time
/* lots of stuff, lots of calls, ... We're not even in the same frame anymore! */
longjmp(checkpoint, 0xcafebabe); // execution resumes where setjmp is, and now it returns 0xcafebabe instead of 0
What I'd like is a way to run, without threading, two functions on different stacks. (Obviously, only one runs at a time. No threading, I said.) These two functions must be able to resume the other's execution (and halt their own). Somewhat like if they were longjmping to the other. Once it returns to the other function, it must resume where it left (that is, during or after the call that gave control to the other function), a bit like how longjmp returns to setjmp.
This is how I thought it:
Function A creates and zeroes a parallel stack (allocates memory and all that).
Function A pushes all its registers to the current stack.
Function A sets the stack pointer and the base pointer to that new location, and pushes a mysterious data structure indicating where to jump back and where to set the instruction pointer back.
Function A zeroes most of its registers and sets the instruction pointer to the beginning of function B.
That's for the initialization. Now, the following situation will indefinitely loop:
Function B works on that stack, does whatever work it needs to.
Function B comes to a point where it needs to interrupt and give A control again.
Function B pushes all of its registers to its stack, takes the mysterious data structure A gave it at the very beginning, and sets the stack pointer and the instruction pointer to where A told it to. In the process, it hands back A a new, modified data structure that tells where to resume B.
Function A wakes up, popping back all the registers it pushed to its stack, and does work until it comes to a point where it needs to interrupt and give B control again.
All this sounds good to me. However, there is a number of things I'm not exactly at ease with.
Apparently, on good ol' x86, there was this pusha instruction that would send all registers to the stack. However, processor architectures evolve, and now with x86_64 we've got a lot more general-purpose registers, and likely several SSE registers. I couldn't find any evidence that pusha pushes them. There are about 40 public registers in a modern x86 CPU. Do I have to do all the pushes myself? Moreover, there is no push for SSE registers (though there's bound to be an equivalent; I'm new to this whole "x86 assembler" thing).
Is changing the instruction pointer as easy as saying it? Can I do, like, mov rip, rax (Intel syntax)? Also, getting the value from it must be somewhat special as it constantly changes. If I do like mov rax, rip (Intel syntax again), will rip be positioned on the mov instruction, to the instruction after it, or somewhere between? It's just jmp foo. Dummy.
I've mentioned a mysterious data structure a few times. Up to now I've assumed it needs to contain at least three things: the base pointer, the stack pointer and the instruction pointer. Is there anything else?
Did I forget anything?
While I'd really like to understand how things work, I'm pretty sure there are a handful of libraries that do just that. Do you know any? Is there any POSIX- or BSD-defined standard way to do it, like pthread for threads?
Thanks for reading my question textwall.

You are correct that PUSHA won't work on x64: it raises a #UD exception there, as PUSHA only pushes the 16-bit or 32-bit general-purpose registers. See the Intel manuals for all the info you ever wanted to know.
Setting RIP is simple: jmp rax will set RIP to RAX. To retrieve RIP, you could either get it at compile time, if you already know all the coroutine exit origins, or get it at run time by making a call to the address immediately after that call. Like this:
a:
call b
b:
pop rax
RAX will now be b. This works because CALL pushes the address of the next instruction. This technique works on IA32 as well (although I'd suppose there's a nicer way to do it on x64, as it supports RIP-relative addressing, but I don't know of one). Of course if you make a function coroutine_yield, it can just intercept the caller address :)
Since you can't push all the registers to the stack in a single instruction, I wouldn't recommend storing the coroutine state on the stack, as that complicates things anyways. I think the nicest thing to do would be to allocate a data structure for every coroutine instance.
Why are you zeroing things in function A? That's probably not necessary.
Here's how I would approach the entire thing, trying to make it as simple as possible:
Create a structure coroutine_state that holds the following:
initarg
arg
registers (also contains the flags)
caller_registers
Create a function:
coroutine_state* coroutine_init(void (*coro_func)(coroutine_state*), void* initarg);
where coro_func is a pointer to the coroutine function body.
This function does the following:
allocate a coroutine_state structure cs
assign initarg to cs.initarg; this will be the initial argument to the coroutine
assign coro_func to cs.registers.rip
copy current flags to cs.registers (not registers, only flags, as we need some sane flags to prevent an apocalypse)
allocate some decent sized area for the coroutine's stack and assign that to cs.registers.rsp
return the pointer to the allocated coroutine_state structure
Now we have another function:
void* coroutine_next(coroutine_state* cs, void* arg);
where cs is the structure returned from coroutine_init which represents a coroutine instance, and arg will be fed into the coroutine as it resumes execution.
This function is called by the coroutine invoker to pass in some new argument to the coroutine and resume it, the return value of this function is an arbitrary data structure returned (yielded) by the coroutine.
1. store all current flags/registers in cs.caller_registers except for RSP, see step 3.
2. store the arg in cs.arg
3. fix the invoker stack pointer (cs.caller_registers.rsp); adding 2*sizeof(void*) will fix it if you're lucky, but you'd have to look this up to confirm it. You probably want this function to be stdcall so no registers are tampered with before calling it
4. mov rax, [rsp], then assign RAX to cs.caller_registers.rip; explanation: unless your compiler is on crack, [RSP] will hold the address of the instruction that follows the call instruction that called this function (i.e. the return address)
5. load the flags and registers from cs.registers
6. jmp cs.registers.rip, effectively resuming execution of the coroutine
Note that we never return from this function; the coroutine we jump to "returns" for us (see coroutine_yield). Also note that inside this function you may run into many complications, such as the function prologue and epilogue generated by the C compiler, and perhaps register arguments; you have to take care of all this. Like I said, stdcall will save you lots of trouble, and I think gcc's -fomit-frame-pointer will remove the epilogue stuff.
The last function is declared as:
void coroutine_yield(void* ret);
This function is called inside the coroutine to "pause" execution of the coroutine and return to the caller of coroutine_next.
1. store flags/registers in cs.registers
2. fix the coroutine stack pointer (cs.registers.rsp); once again, add 2*sizeof(void*) to it, and you want this function to be stdcall as well
3. mov rax, ret (let's just pretend all the functions in your compiler return their values in RAX)
4. load flags/registers from cs.caller_registers
5. jmp cs.caller_registers.rip. This essentially returns from the coroutine_next call on the coroutine invoker's stack frame, and since the return value is passed in RAX, we returned ret. Let's just say that if ret is NULL, then the coroutine has terminated; otherwise it's an arbitrary data structure.
So to recap, you initialize a coroutine using coroutine_init, then you can repeatedly invoke the instantiated coroutine with coroutine_next.
The coroutine's function itself is declared:
void my_coro(coroutine_state* cs)
cs.initarg holds the initial function argument (think constructor). Each time my_coro is called, cs.arg has a different argument that was specified by coroutine_next. This is how the coroutine invoker communicates with the coroutine. Finally, every time the coroutine wants to pause itself, it calls coroutine_yield, and passes one argument to it, which is the return value to the coroutine invoker.
Okay, you may now think "thats easy!", but I left out all the complications of loading the registers and flags in the correct order while still maintaining a non corrupt stack frame and somehow keeping the address of your coroutine data structure (you just overwrote all your registers), in a thread-safe manner. For that part you will need to find out how your compiler works internally... good luck :)
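For comparison, the same init/next/yield shape can be sketched in C without hand-written assembly by using getcontext, makecontext and swapcontext from <ucontext.h>. These were specified in POSIX.1-2001 and later removed from the standard, but glibc still provides them. This is not the design described above, only an analogue of it: the names coro_t, coro_init, coro_next and coro_yield are made up, it yields plain ints, and it is single-threaded.

```c
#include <stdlib.h>
#include <ucontext.h>

#define CORO_STACK_SIZE (64 * 1024)

typedef struct {
    ucontext_t caller;   /* plays the role of caller_registers */
    ucontext_t self;     /* plays the role of registers */
    void *stack;         /* the coroutine's private stack */
    void (*body)(void);  /* coroutine function body */
    int yielded;         /* last value passed to coro_yield */
    int finished;        /* set once the body has returned */
} coro_t;

static coro_t *running;  /* coroutine currently executing */

/* Suspend the running coroutine and hand a value back to coro_next. */
static void coro_yield(int value) {
    coro_t *c = running;
    c->yielded = value;
    swapcontext(&c->self, &c->caller);
}

/* Entry point on the new stack; returning resumes uc_link (the caller). */
static void coro_entry(void) {
    coro_t *c = running;
    c->body();
    c->finished = 1;
}

static int coro_init(coro_t *c, void (*body)(void)) {
    if (getcontext(&c->self) != 0) return -1;
    c->stack = malloc(CORO_STACK_SIZE);
    if (c->stack == NULL) return -1;
    c->self.uc_stack.ss_sp = c->stack;
    c->self.uc_stack.ss_size = CORO_STACK_SIZE;
    c->self.uc_link = &c->caller;
    c->body = body;
    c->yielded = 0;
    c->finished = 0;
    makecontext(&c->self, coro_entry, 0);
    return 0;
}

/* Resume the coroutine until it yields or finishes. */
static int coro_next(coro_t *c) {
    running = c;
    swapcontext(&c->caller, &c->self);
    return c->yielded;
}

/* Example body: yields 1, 2, 3 and then returns. */
static void count_to_three(void) {
    for (int i = 1; i <= 3; i++) coro_yield(i);
}
```

Calling coro_next on a coroutine built from count_to_three returns 1, 2 and 3 in turn; a further call lets the body return and sets finished. Resuming after that point is not handled by this sketch, and the stack should be released with free once the coroutine is done.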

Good learning reference: libcoroutine, especially their setjmp/longjmp implementation. I know it's not fun to use an existing library, but you can at least get a general bearing on where you are going.

Simon Tatham has an interesting implementation of coroutines in C that doesn't require any architecture-specific knowledge or stack fiddling. It's not exactly what you're after, but I thought it might nonetheless be of at least academic interest.
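To give a flavor of that approach, here is a minimal sketch of the underlying idea: a switch statement jumps back into the middle of a loop, and the state lives in static variables instead of on a separate stack. It is not Tatham's actual macro set, and the function name next_value is made up for illustration.

```c
/* Generator that yields 1, 2, 3 and then -1 on every later call.
   State survives between calls in static variables. */
static int next_value(void) {
    static int state = 0;  /* where to resume: 0 = start, 1 = after the yield */
    static int i;          /* must be static: plain locals do not survive a return */
    switch (state) {
    case 0:
        for (i = 1; i <= 3; i++) {
            state = 1;
            return i;      /* "yield" i to the caller */
    case 1:;               /* the next call resumes here, inside the loop */
        }
    }
    state = 2;             /* finished: no case matches from now on */
    return -1;
}
```

Because the state is static, this particular sketch supports only one instance of the generator and is not reentrant; that is the price of avoiding any stack switching.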

boost.coroutine (boost.context) at boost.org does it all for you

What is exactly the base pointer and stack pointer? To what do they point?

Using this example coming from wikipedia, in which DrawSquare() calls DrawLine(),
(Note that this diagram has high addresses at the bottom and low addresses at the top.)
Could anyone explain me what ebp and esp are in this context?
From what I see, I'd say the stack pointer always points to the top of the stack, and the base pointer to the beginning of the current function? Or what?
edit: I mean this in the context of windows programs
edit2: And how does eip work, too?
edit3: I have the following code from MSVC++:
var_C= dword ptr -0Ch
var_8= dword ptr -8
var_4= dword ptr -4
hInstance= dword ptr 8
hPrevInstance= dword ptr 0Ch
lpCmdLine= dword ptr 10h
nShowCmd= dword ptr 14h
All of them seem to be dwords, thus taking 4 bytes each. So I can see there is a gap from hInstance to var_4 of 4 bytes. What are they? I assume it is the return address, as can be seen in wikipedia's picture?
(editor's note: removed a long quote from Michael's answer, which doesn't belong in the question, but a followup question was edited in):
This is because the flow of the function call is:
* Push parameters (hInstance, etc.)
* Call function, which pushes return address
* Push ebp
* Allocate space for locals
My question (the last, I hope!) now is: what exactly happens from the instant I push the arguments of the function I want to call up to the end of the prolog? I want to know how ebp and esp evolve during those moments (I already understand how the prolog works; I just want to know what happens after I have pushed the arguments on the stack and before the prolog).

esp is as you say it is, the top of the stack.
ebp is usually set to esp at the start of the function. Function parameters and local variables are accessed by adding and subtracting, respectively, a constant offset from ebp. All x86 calling conventions define ebp as being preserved across function calls. ebp itself actually points to the previous frame's base pointer, which enables stack walking in a debugger and viewing other frame's local variables to work.
Most function prologs look something like:
push ebp ; Preserve current frame pointer
mov ebp, esp ; Create new frame pointer pointing to current stack top
sub esp, 20 ; allocate 20 bytes worth of locals on stack.
Then later in the function you may have code like (presuming both local variables are 4 bytes)
mov [ebp-4], eax ; Store eax in first local
mov ebx, [ebp - 8] ; Load ebx from second local
FPO or frame pointer omission optimization which you can enable will actually eliminate this and use ebp as another register and access locals directly off of esp, but this makes debugging a bit more difficult since the debugger can no longer directly access the stack frames of earlier function calls.
EDIT:
For your updated question, the missing two entries in the stack are:
nShowCmd = dword ptr +14h
lpCmdLine = dword ptr +10h
hPrevInstance = dword ptr +0Ch
hInstance = dword ptr +08h
return address = dword ptr +04h <==
savedFramePointer = dword ptr +00h <==
var_4 = dword ptr -04h
var_8 = dword ptr -08h
var_C = dword ptr -0Ch
This is because the flow of the function call is:
Push parameters (hInstance, hPrevInstance, lpCmdLine, nShowCmd)
Call function, which pushes return address
Push ebp
Allocate space for locals

ESP (Stack Pointer) is the current stack pointer, which will change any time a word or address is pushed or popped on/off the stack. EBP (Base Pointer) is a more convenient way for the compiler to keep track of a function's parameters and local variables than using the ESP directly.
Generally (and this may vary from compiler to compiler), all of the arguments to a function being called are pushed onto the stack by the calling function (usually in the reverse order that they're declared in the function prototype, but this varies). Then the function is called, which pushes the return address (EIP, Instruction Pointer) onto the stack.
Upon entry to the function, the old EBP value is pushed onto the stack and EBP is set to the value of ESP. Then the ESP is decremented (because the stack grows downward in memory) to allocate space for the function's local variables and temporaries. From that point on, during the execution of the function, the arguments to the function are located on the stack at positive offsets from EBP (because they were pushed prior to the function call), and the local variables are located at negative offsets from EBP (because they were allocated on the stack after the function entry). That's why the EBP is called the Frame Pointer, because it points to the center of the function call frame.
Upon exit, all the function has to do is set ESP to the value of EBP (which deallocates the local variables from the stack, and exposes the entry EBP on the top of the stack), then pop the old EBP value from the stack, and then the function returns (popping the return address into EIP).
Upon returning back to the calling function, it can then increment ESP in order to remove the function arguments it pushed onto the stack just prior to calling the other function. At this point, the stack is back in the same state it was in prior to invoking the called function.
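As a rough illustration of the whole round trip, the sketch below plays caller and callee on a toy machine. The `Cpu` struct, `call_add`, and the register values used in the usage note are assumptions for the example, and the "function body" just adds its two arguments.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t mem[STACK_WORDS]; /* stack memory, one slot per dword */
    uint32_t esp, ebp, eip;    /* esp and ebp are byte offsets into mem */
} Cpu;

static void push(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(Cpu *c) {
    assert(c->esp < STACK_WORDS * 4);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Caller and callee in one place, in the order the steps happen. */
static uint32_t call_add(Cpu *c, uint32_t a, uint32_t b) {
    push(c, b);                 /* caller: arguments, last one first */
    push(c, a);
    push(c, c->eip);            /* call: push the return address */

    push(c, c->ebp);            /* prolog: push ebp */
    c->ebp = c->esp;            /*         mov ebp, esp */
    c->esp -= 4;                /*         sub esp, 4 (one local) */

    /* body: local = [ebp+8] + [ebp+12] */
    c->mem[(c->ebp - 4) / 4] = c->mem[(c->ebp + 8) / 4] + c->mem[(c->ebp + 12) / 4];
    uint32_t eax = c->mem[(c->ebp - 4) / 4];

    c->esp = c->ebp;            /* epilog: mov esp, ebp */
    c->ebp = pop(c);            /*         pop ebp */
    c->eip = pop(c);            /* ret: pop the return address into eip */

    c->esp += 8;                /* caller: add esp, 8 removes both arguments */
    return eax;
}
```

Calling `call_add(&c, 2, 40)` on a freshly initialized `Cpu` returns 42 and leaves esp, ebp and eip exactly as they were before the call, which is the "same state" the last paragraph describes.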

You have it right. The stack pointer points to the top item on the stack, and the base pointer points to where the top of the stack was at the start of the function (just after the caller's base pointer was saved).
When you call a function, its local variables are stored on the stack and the stack pointer moves past them (on x86 it is decremented, because the stack grows toward lower addresses). When you return from the function, all the local variables on the stack go out of scope. You do this by setting the stack pointer back to the base pointer.
Doing memory allocation this way is very, very fast and efficient.
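The reason it is so cheap can be sketched with a tiny bump allocator in C. Everything here (`Arena`, `arena_alloc`, `arena_mark`, `arena_release`, the 1024-byte size) is a hypothetical illustration and not how any compiler implements the stack; alignment is ignored to keep it short.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char buf[1024];
    size_t top;   /* offset of the most recent allocation */
} Arena;

static void arena_init(Arena *a) {
    a->top = sizeof a->buf;     /* empty: top starts past the end */
}

/* Like `sub esp, n`: allocating is a single subtraction.
   Returns NULL when there is not enough room left. */
static void *arena_alloc(Arena *a, size_t n) {
    if (n > a->top) return NULL;
    a->top -= n;
    return &a->buf[a->top];
}

/* Like `mov ebp, esp` at function entry: remember the current top. */
static size_t arena_mark(const Arena *a) {
    return a->top;
}

/* Like `mov esp, ebp` at exit: free everything allocated since the mark. */
static void arena_release(Arena *a, size_t mark) {
    assert(mark >= a->top && mark <= sizeof a->buf);
    a->top = mark;
}
```

There is no free list and no per-object bookkeeping: however many locals were allocated after the mark, releasing them is one assignment.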

EDIT: For a better description, see x86 Disassembly/Functions and Stack Frames in a WikiBook about x86 assembly. I'll try to add some information you might be interested in when using Visual Studio.
Storing the caller EBP as the first local variable is called a standard stack frame, and this may be used for nearly all calling conventions on Windows. Differences exist whether the caller or callee deallocates the passed parameters, and which parameters are passed in registers, but these are orthogonal to the standard stack frame problem.
Speaking of Windows programs, you will probably use Visual Studio to compile your C++ code. Be aware that Microsoft uses an optimization called Frame Pointer Omission, which makes it nearly impossible to walk the stack without using the DbgHelp library and the PDB file for the executable.
Frame Pointer Omission means that the compiler does not store the old EBP in a standard place and uses the EBP register for something else, so you will have a hard time finding the caller's EIP without knowing how much space the local variables of a given function need. Of course Microsoft provides an API that allows you to do stack walks even in this case, but looking up the symbol table database in PDB files takes too long for some use cases.
To avoid FPO in your compilation units, you need to avoid using /O2 or explicitly add /Oy- to the C++ compilation flags in your projects. You probably link against the C or C++ runtime, which uses FPO in the Release configuration, so you will have a hard time doing stack walks without dbghelp.dll.
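To show what the standard frame buys you, here is a small sketch of a stack walk that only follows saved EBP values. It runs on a toy stack, not a real process; `ToyStack`, `enter` and `walk` are hypothetical names, and using 0 as the outermost frame pointer is an assumption made so the walk has a place to stop.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t mem[STACK_WORDS]; /* one slot per dword */
    uint32_t esp, ebp;         /* byte offsets into mem */
} ToyStack;

static void push(ToyStack *s, uint32_t v) {
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

/* call + standard prolog: return address, saved ebp, new frame, locals. */
static void enter(ToyStack *s, uint32_t ret_addr, uint32_t locals_bytes) {
    push(s, ret_addr);
    push(s, s->ebp);
    s->ebp = s->esp;
    assert(s->esp >= locals_bytes);
    s->esp -= locals_bytes;
}

/* Follow the chain of saved frame pointers, collecting return addresses.
   No knowledge of each function's locals size is needed. */
static int walk(const ToyStack *s, uint32_t *out, int max) {
    int n = 0;
    uint32_t fp = s->ebp;
    while (fp != 0 && n < max) {
        out[n++] = s->mem[fp / 4 + 1]; /* [ebp+4]: return address */
        fp = s->mem[fp / 4];           /* [ebp]:   caller's ebp */
    }
    return n;
}
```

The three frames in a test can reserve 8, 16 and 4 bytes of locals and `walk` still finds every return address, because it never looks at the locals. Once EBP is reused as a general register that chain is gone, and the walker needs outside information such as the PDB to know each frame's size.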

First of all, the stack pointer points to the lowest address in use on the stack, since x86 stacks build from high address values to lower address values. A push (or call) first decrements the stack pointer and then stores the new value at that address. Its operation is equivalent to the following C/C++ statements:
// push eax
*--esp = eax;
// pop eax
eax = *esp++;
// a function call; in this case, the caller must clean up the function parameters
mov eax, some value
push eax
call some address // this pushes the address of the next instruction onto the
// stack and changes the instruction pointer to "some address"
add esp, 4 // remove eax from the stack
// a function
push ebp // save the old frame pointer
mov ebp, esp
... // do stuff
pop ebp // restore the old frame pointer
ret
The base pointer marks the base of the current frame. After the prolog shown above, ebp points to the saved old value of ebp (kept so you can restore the prior frame pointer), ebp+4 points to your return address, and ebp+8 points to the first parameter of your function (or the this value of a class method, when it is passed on the stack). ebp-4 points to the first local variable of your function.
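The push and pop equivalents can be written as compilable C with an ordinary pointer standing in for esp. This is just a sketch: `push32` and `pop32` are made-up helper names, and the `lowest` parameter is an extra bounds check that the real instructions do not have.

```c
#include <assert.h>
#include <stdint.h>

/* push: decrement the pointer first, then store (*--esp = value). */
static uint32_t *push32(uint32_t *esp, const uint32_t *lowest, uint32_t value) {
    assert(esp > lowest);   /* toy overflow check, not part of x86 push */
    *--esp = value;
    return esp;
}

/* pop: load first, then increment the pointer (value = *esp++). */
static uint32_t *pop32(uint32_t *esp, uint32_t *out) {
    *out = *esp++;
    return esp;
}
```

With a four-element array as the stack and esp starting one past its end, pushing 11 and then 22 leaves esp two slots lower and pointing at 22; popping twice yields 22 and then 11 and puts esp back where it started.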

Long time since I've done Assembly programming, but this link might be useful...
The processor has a collection of registers which are used to store data. Some of these hold values directly, while others point to an area within RAM. Registers tend to be used for certain specific actions, and many instructions expect their data to be in specific registers.
The stack pointer is mostly used when you're calling other procedures. With modern compilers, a bunch of data will be dumped first on the stack, followed by the return address so the system will know where to return once it's told to return. The stack pointer will point at the next location where new data can be pushed to the stack, where it will stay until it's popped back again.
Base registers or segment registers just point to the address space of a large amount of data. Combined with a second register, the base pointer divides the memory into huge blocks, while the second register points at an item within the block. Base pointers therefore point to the base of blocks of data.
Do keep in mind that Assembly is very CPU specific. The page I've linked to provides information about different types of CPU's.
