In C, does using static variables in a function make it faster?


My function will be called thousands of times. If I want to make it faster, will changing the local function variables to static be of any use? My logic behind this is that, because static variables are persistent between function calls, they are allocated only the first time, and thus every subsequent call will not allocate memory for them and will be faster, because the memory allocation step is skipped.
Also, if the above is true, then would using global variables instead of parameters be a faster way to pass information to the function every time it is called? I think space for parameters is also allocated on every function call, to allow for recursion (that's why recursion uses up more memory), but since my function is not recursive, and if my reasoning is correct, then taking off parameters will in theory make it faster.
I know these things I want to do are horrible programming habits, but please, tell me if it is wise. I am going to try it anyway but please give me your opinion.

The overhead of local variables is zero. Each time you call a function, you are already setting up the stack for the parameters, return values, etc. Adding local variables means that you're adding a slightly bigger number to the stack pointer (a number which is computed at compile time).
Also, local variables are probably faster due to cache locality.
If you are only calling your function "thousands" of times (not millions or billions), then you should be looking at your algorithm for optimization opportunities after you have run a profiler.
Re: cache locality:
Frequently accessed global variables probably have temporal locality. They also may be copied to a register during function execution, but will be written back into memory (cache) after a function returns (otherwise they wouldn't be accessible to anything else; registers don't have addresses).
Local variables will generally have both temporal and spatial locality (they get that by virtue of being created on the stack). Additionally, they may be "allocated" directly to registers and never be written to memory.

The best way to find out is to actually run a profiler. This can be as simple as executing several timed tests using both methods and then averaging out the results and comparing, or you may consider a full-blown profiling tool which attaches itself to a process and graphs out memory use over time and execution speed.
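The "several timed tests" idea can be sketched roughly as follows. This is only a minimal illustration, not a proper benchmark: the workload and the names sum_with_local, sum_with_static, time_calls and report_timings are made up for the example, and clock() is a coarse timer, so run it several times and compare averages before drawing any conclusion.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical workload using an automatic (stack/register) accumulator. */
unsigned sum_with_local(unsigned n)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc += i ^ (acc >> 3);
    return acc;
}

/* Same workload, but the accumulator has static storage duration. */
unsigned sum_with_static(unsigned n)
{
    static unsigned acc;
    acc = 0;                      /* reset explicitly: statics keep their value */
    for (unsigned i = 0; i < n; i++)
        acc += i ^ (acc >> 3);
    return acc;
}

/* Time `calls` invocations of f(n) and return the elapsed CPU seconds. */
double time_calls(unsigned (*f)(unsigned), unsigned n, unsigned calls)
{
    volatile unsigned sink = 0;   /* keeps the calls from being optimised away */
    clock_t start = clock();
    for (unsigned c = 0; c < calls; c++)
        sink += f(n);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Print both timings; the numbers will vary from run to run. */
void report_timings(void)
{
    printf("local:  %f s\n", time_calls(sum_with_local, 1000, 10000));
    printf("static: %f s\n", time_calls(sum_with_static, 1000, 10000));
}
```

Calling report_timings() from main prints one line per variant. Both functions compute the same value, so any difference in the timings comes from how the compiler handles the storage, and with optimisation enabled it may well be too small to measure.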
Do not perform random micro code-tuning because you have a gut feeling it will be faster. Compilers all have slightly different implementations of things and what is true on one compiler on one environment may be false on another configuration.
To tackle that comment about fewer parameters: the process of "inlining" functions essentially removes the overhead related to calling a function. Chances are a small function will be automatically in-lined by the compiler, but you can suggest a function be inlined as well.
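As a small sketch of "suggesting" inlining in C99 and later: the helper below is marked static inline, which is only a hint that the compiler is free to ignore. The function names clamp and clamp_all are invented for the illustration.

```c
/* `static inline` is only a hint; the compiler decides whether to inline. */
static inline int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Clamp every element to [lo, hi] and return how many elements changed. */
int clamp_all(int *values, int count, int lo, int hi)
{
    int changed = 0;
    for (int i = 0; i < count; i++) {
        int c = clamp(values[i], lo, hi);   /* likely inlined at -O2 */
        changed += (c != values[i]);
        values[i] = c;
    }
    return changed;
}
```

If the call is inlined, the parameters of clamp never need to be passed at all, which is the effect the question was hoping to get from globals.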
In a different language, C++, the new standard coming out supports perfect forwarding, and perfect move semantics with rvalue references which removes the need for temporaries in certain cases which can reduce the cost of calling a function.
I suspect you're prematurely optimizing, however, you should not be this concerned with performance until you've discovered your real bottlenecks.

Absolutely not! The only "performance" difference is when variables are initialised:
int anint = 42;
vs
static int anint = 42;
In the first case the integer will be set to 42 every time the function is called; in the second case it will be set to 42 once, when the program is loaded.
However, the difference is so trivial as to be barely noticeable. It's a common misconception that storage has to be allocated for "automatic" variables on every call. This is not so: C uses the already allocated space on the stack for these variables.
Static variables may actually slow you down, as some aggressive optimisations are not possible on static variables. Also, as locals are in a contiguous area of the stack, they are easier to cache efficiently.
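The initialisation difference is also a behavioural one, which a short sketch makes visible. The function names next_auto and next_static are made up for this example.

```c
int next_auto(void)
{
    int counter = 42;        /* re-initialised on every call */
    return counter++;        /* always returns 42 */
}

int next_static(void)
{
    static int counter = 42; /* initialised once, before the first call */
    return counter++;        /* returns 42, then 43, then 44, ... */
}
```

So switching a local to static is not a pure performance tweak: it changes what the function does unless every use starts by overwriting the variable.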

There is no one answer to this. It will vary with the CPU, the compiler, the compiler flags, the number of local variables you have, what the CPU's been doing before you call the function, and quite possibly the phase of the moon.
Consider two extremes. If you have only one or a few local variables, they might easily be stored in registers rather than be allocated memory locations at all. If register "pressure" is sufficiently low, this may happen without executing any instructions at all.
At the opposite extreme there are a few machines (e.g., IBM mainframes) that don't have stacks at all. In this case, what we'd normally think of as stack frames are actually allocated as a linked list on the heap. As you'd probably guess, this can be quite slow.
When it comes to accessing the variables, the situation's somewhat similar -- access to a machine register is pretty well guaranteed to be faster than anything allocated in memory can possibly hope for. OTOH, it's possible for access to variables on the stack to be pretty slow -- it normally requires something like an indexed indirect access, which (especially with older CPUs) tends to be fairly slow. Then again, access to a global (which a static is, even though its name isn't globally visible) typically requires forming an absolute address, which some CPUs penalize to some degree as well.
Bottom line: even the advice to profile your code may be misplaced -- the difference may easily be so tiny that even a profiler won't detect it dependably, and the only way to be sure is to examine the assembly language that's produced (and spend a few years learning assembly language well enough to know what to say when you do look at it). The other side of this is that when you're dealing with a difference you can't even measure dependably, the chance that it'll have a material effect on the speed of real code is so remote that it's probably not worth the trouble.

It looks like static vs non-static has been completely covered, so on to the topic of global variables. Often these will slow down a program's execution rather than speed it up.
The reason is that tightly scoped variables make it easy for the compiler to optimise heavily; if the compiler has to look all over your application for places where a global might be used, then its optimising won't be as good.
This is compounded when you introduce pointers. Say you have the following code:
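int myFunction(void)
{
    SomeStruct a, b;               /* two distinct local objects */
    SomeStruct *A = &a, *B = &b;   /* so A and B cannot overlap */
    FillOutSomeStruct(B);
    memcpy(A, B, sizeof *A);       /* needs <string.h> */
    return A->result;
}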
Here the compiler knows that the pointers A and B can never overlap, so it can optimise the copy. If A and B are global then they could possibly point to overlapping or identical memory, which means the compiler must 'play it safe', and that is slower. The problem is generally called 'pointer aliasing' and can occur in lots of situations, not just memory copies.
http://en.wikipedia.org/wiki/Pointer_alias
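To make the aliasing problem concrete, here is a small sketch with invented function names (scale_reload, scale_cached). In the first function the compiler has to assume dst and factor might point into the same array, so it re-reads *factor on every iteration; the second copies the value into a local first.

```c
#include <stddef.h>

/* dst and factor may alias, so *factor must be re-read on every iteration. */
void scale_reload(int *dst, const int *factor, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] *= *factor;
}

/* Copying *factor into a local fixes the value for the whole loop. */
void scale_cached(int *dst, const int *factor, size_t n)
{
    const int f = *factor;
    for (size_t i = 0; i < n; i++)
        dst[i] *= f;
}
```

The two are not interchangeable when the pointers really do alias: with the array {2, 3, 4} and factor pointing at its first element, scale_reload produces {4, 12, 16} while scale_cached produces {4, 6, 8}. That difference is exactly why the compiler cannot make the transformation for you.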

Using static variables may make a function a tiny bit faster. However, this will cause problems if you ever want to make your program multi-threaded. Since static variables are shared between function invocations, invoking the function simultaneously in different threads can result in data races, which are undefined behaviour. Multi-threading is the type of thing you may want to do in the future to really speed up your code.
Most of the things you mentioned are referred to as micro-optimizations. Generally, worrying about these kinds of things is a bad idea. It makes your code harder to read and harder to maintain. It's also highly likely to introduce bugs. You'll likely get more bang for your buck doing optimizations at a higher level.
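The shared-state problem shows up even without threads. In this sketch (label_static and label_reentrant are hypothetical names), the static version hands every caller the same buffer, while the reentrant version leaves storage to the caller.

```c
#include <stdio.h>

/* Returns a pointer to shared static storage: every call overwrites it. */
const char *label_static(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "item-%d", id);
    return buf;
}

/* Reentrant variant: the caller owns the buffer, so calls do not interfere. */
const char *label_reentrant(int id, char *buf, size_t size)
{
    snprintf(buf, size, "item-%d", id);
    return buf;
}
```

Two calls to label_static return the same pointer, so the first result is silently replaced by the second. With two threads calling it at once, the writes to buf race with each other.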
As M2tM suggests, running a profiler is also a good idea. Check out gprof for one which is quite easy to use.

You can always time your application to truly determine what is fastest. Here is what I understand: (all of this depends on the architecture of your processor, btw)
C functions create a stack frame, which is where passed parameters and local variables are put, as well as the return pointer back to where the caller called the function. There is no memory management allocation here. It's usually a simple pointer movement and that's it. Accessing data off the stack is also pretty quick. Penalties usually come into play when you're dealing with pointers.
As for global or static variables, they're the same...from the standpoint that they're going to be allocated in the same region of memory. Accessing these may use a different method of access than local variables, depends on the compiler.
The major difference between your scenarios is memory footprint, not so much speed.

Using static variables can actually make your code significantly slower. Static variables must exist in a 'data' region of memory. In order to use that variable, the function must execute a load instruction to read from main memory, or a store instruction to write to it. If that region is not in the cache, you lose many cycles. A local variable that lives on the stack will most surely have an address that is in the cache, and might even be in a cpu register, never appearing in memory at all.

I agree with the others' comments about profiling to find out stuff like that, but generally speaking, function static variables should be slower. If you want them, what you are really after is a global. In C++, function statics with non-constant initialisers insert code/data to check whether the thing has already been initialised, and that check runs every time your function is called. C requires constant initialisers for statics, so it does not need such a check.

Profiling may not see the difference; disassembling and knowing what to look for might.
I suspect you are only going to get a variation of a few clock cycles per loop (on average, depending on the compiler, etc). Sometimes the change will be a dramatic improvement or a dramatic slowdown, and that won't necessarily be because the variable's home has moved to/from the stack. Let's say you save four clock cycles per function call for 10000 calls on a 2 GHz processor. Very rough calculation: 20 microseconds saved. Is 20 microseconds a lot or a little compared to your current execution time?
You will likely get more of a performance improvement by making all of your char and short variables into ints, among other things. Micro-optimization is a good thing to know but takes lots of time: experimenting, disassembling, timing the execution of your code, and understanding that fewer instructions does not necessarily mean faster, for example.
Take your specific program and disassemble both the function in question and the code that calls it, with and without the static. If you gain only one or two instructions and this is the only optimization you are going to do, it is probably not worth it. You may not be able to see the difference while profiling. Changes in where the cache lines hit could show up in profiling before changes in the code, for example.
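The rough calculation above is just cycles saved divided by clock rate. A tiny helper (the name saved_microseconds is made up here) spells it out so you can plug in your own numbers:

```c
/* Microseconds saved = cycles_per_call * calls / clock_hz, scaled by 1e6. */
double saved_microseconds(double cycles_per_call, double calls, double clock_hz)
{
    return cycles_per_call * calls * 1e6 / clock_hz;
}
```

For the figures in the answer, saved_microseconds(4, 10000, 2e9) is 40000 cycles at 2e9 cycles per second, i.e. about 20 microseconds.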

Related

Is making smaller functions generally more efficient memory-wise since variables get deallocated more frequently?

Is dividing the work into 5 functions, as opposed to one big function, more memory efficient in C, since at a given time there are fewer variables in memory because the stack frame gets deallocated more often? Does it depend on the compiler and optimization? If so, in which compilers is it more efficient?
Assume in your answer that there are a lot of local variables and that the functions are all called from a central main, so their stack frames are not created on top of each other.
I know other advantages of breaking out the function into smaller functions. Please answer this question, only in respect to memory usage.

It might reduce "high water mark" of stack usage for your program, and if so that might reduce the overall memory requirement of the program.
Yes, it depends on optimization. If the optimizer inlines the function calls, you might well find that all the variables of all the functions inlined are wrapped into one big stack frame. Any compiler worth using is capable of inlining[*], so the fact that it can happen doesn't depend on compiler. Exactly when it happens, will differ.
If your local variables are small, though, then it's fairly rare for your program to use more stack than has been automatically allocated to you at startup. Unless you go past what you're given initially, how much you use makes no difference to overall memory requirements.
If you're putting great big structures on the stack (multiple kilobytes), or if you're on a machine where a kilobyte is a lot of memory, then it might make a difference to overall memory usage. So, if by "a lot of local variables" you mean a few dozen ints and pointers then no, nothing you do makes any significant difference. If by "a lot of local variables" you mean a few dozen 10k buffers, or if your function recurses very deep so that you have hundreds of levels of your few dozen ints, then it's at least possible it could make a difference, depending on the OS and configuration.
The model that stack and heap grow towards each other through general RAM, and the free memory in the middle can be used equally by either one of them, is obsolete. With the exception of a very few, very restricted systems, memory models are not designed that way any more. In modern OSes, we have so-called "virtual memory", and stack space is allocated to your program one page at a time. Most of them automatically allocate more pages of stack as it is used, up to a configured limit that's usually very large. A few don't automatically extend stack (Symbian last I used it, which was some years ago, didn't, although arguably Symbian is not a "modern" OS). If you're using an embedded OS, check what the manual says about stack.
Either way, the only thing that affects total memory use is how many pages of stack you need at any one time. If your system automatically extends stack, you won't even notice how much you're using. If it doesn't, you'll need to ensure that the program is given sufficient stack for its high-water mark, and that's when you might notice excessive stack use.
In short, this is one of those things that in theory makes a difference, but in practice that difference is almost always insignificant. It only matters if your program uses massive amounts of stack relative to the resources of the environment it runs in.
[*] People programming in C for PICs or something, using a C compiler that is basically a non-optimizing assembler, are allowed to be offended that I've called their compiler "not worth using". The stack on such devices is so different from "typical" systems that the answer is different anyway.

I think in most cases the area of memory allocated for the stack (for the entire program) remains constant. The amount in use will change based on the depth of call stack and that amount would be less when fewer variables are used (but note that function calls push the return address and stack pointer also).
Also it depends on how the functions are called. If two functions are called in series, for example, and the stack frame of the first is popped before the call to the second, then you'll be using less of the stack. But if the first function calls the second, then you're back to where you were with one big function (plus the function call overhead).

There's no memory allocation on the stack, just moving the stack pointer to the next value, while the stack size itself is predefined. So there's no difference in memory usage (apart from situations where you get a stack overflow).

Yes, in the same vein that using a finer coat of paint on a jet plane improves its aerodynamic properties. OK, that's a bad analogy, but the point is that if there is ever a choice between making things clear and telegraphic or trying to use more functions, go with telegraphic. In most cases these are not mutually exclusive anyway, as beginners tend to give subroutines or functions too much to do.
In terms of memory, I think that if you are truly splitting up the work (f, then g, then h) then you will see some minute increases in available memory, but if these are interdependent then you will not.
As @Joel Burget says, memory management is not really a consideration in code structuring.
Just my take.

Splitting a huge function into smaller ones does have its benefits, among them is potentially more optimized memory usage.
Say, you have this function.
void huge_func(int input) {
    char a[1024];
    char b[1024];
    // do something with input and a
    // do something with input and b
}
And you split it to two.
void func_a(int input) {
    char a[1024];
    // do something with input and a
}
void func_b(int input) {
    char b[1024];
    // do something with input and b
}
Calling huge_func will take at least 2048 bytes of stack, and calling func_a then func_b achieves the same outcome with about half the memory. However, if you call func_b from inside func_a, the amount of memory used is about the same as with huge_func. Essentially, this is what @sje397 wrote.
I might be wrong to say this but I do not think there is any compiler optimization that could help you reduce the usage of stack memory. I believe the layout of stack memory must ensure that sufficient memory is reserved for all declared variables, whether used or not.

C variable allocation time and space

If I have a test.c file with the following
#include ...
int global = 0;
int main() {
    int local1 = 0;
    while(1) {
        int local2 = 0;
        // Do some operation with one of them
    }
    return 0;
}
So if I had to use one of these variables in the while loop, which one would be preferred?
Maybe I'm being a little vague here, but I want to know if the difference in time/space allocation is actually relevant.

If you are wondering whether declaring a variable inside a loop causes it to be created/destroyed at every iteration, there is nothing really to worry about. These variables are not dynamically allocated at runtime; nothing is being malloc'ed here, just some memory is being set aside for use inside the loop. So having the variable inside is just the same as having it outside the loop in terms of performance.
The real difference here is scope not performance. Whether you use a global or local variable only affects where you want this variable to be visible.
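A minimal sketch of the two placements, with made-up function names, shows that the choice only changes where the name is visible:

```c
/* Variable declared once, outside the loop. */
int sum_squares_outer(int n)
{
    int sq;
    int total = 0;
    for (int i = 1; i <= n; i++) {
        sq = i * i;
        total += sq;
    }
    return total;
}

/* Variable declared inside the loop body: narrower scope, same work. */
int sum_squares_inner(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        int sq = i * i;
        total += sq;
    }
    return total;
}
```

Both return the same result, and an optimising compiler will typically generate the same code for them, since the space for sq is reserved once per call either way.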

In case you're wondering about performance differences: most likely there aren't any. If there are theoretical performance differences, you'll find it hard to actually devise a test to measure them.

A decision like this should not be based on performance but semantics. Unless the semantic behavior of a global variable is required, you should always use automatic (local non-static) variables.
As others have said and surely will say, there are unlikely to be any differences in performance. If there are, the automatic variable will be faster.

The C compiler will have an easier time making optimizations on the variables declared local to the function. The global variable would require an optimizer to perform "Inter-Procedural Data Flow Analysis", which isn't that commonly done.
As an example of the difference, consider that all your declarations initialize the variable to zero. However, in the case of the global variable, the compiler cannot use that information unless it verifies that no flow of control in your program can change the global prior to using it in your example function. In the case of the locally declared ("automatic") variables, there is no way the initial value can be changed by another function (in particular, the compiler verifies that their address is never passed to a sub-function) and the compiler can perform "killed definitions" and "value liveness" analysis to determine whether the zero value can be assumed in some code paths.
Of the two local variables, as a guideline, the optimizer will always have an easier time optimizing access to the variable with the smaller (more limited) scope.
Having stated the above, I would suggest that the other answers favouring semantics over optimizer-meta-optimization are correct. Use the variable which causes the code to read best, and you will be rewarded with more time returned to you than you would gain by assisting the def-use optimization calculation.
In general, avoid using a global variable, or any variable which can be accessed more broadly than absolutely necessary. Limited scoping of variables helps prevent bugs from being introduced during later program maintenance.

There are three broad classes of variables: static (global), stack (auto), and register.
Register variables are stored in CPU registers. Registers are very fast word-sized memories, which are integrated in the CPU pipeline. They are free to access, but there are a very limited number of them (typically between 8 and 32 depending on your processor and what operations you're doing).
Stack variables are stored in an area of RAM called the stack. The stack is almost always going to be in the cache, so stack variables typically take 1-4 cycles to access.
Generally, local variables can be either in registers or on the stack. It doesn't matter whether they are allocated at the top of a function or in a loop; they will only be allocated once per function call, and allocation is basically free. The compiler will put variables in registers if at all possible, but if you have more active variables than registers, they won't all fit. Also, if you take the address of a variable, it must be stored on the stack since registers don't have addresses.
Global and static variables are a different beast. Since they are not usually accessed frequently, they may not be in cache, so it could take hundreds of cycles to access them. Also, since the compiler may not know the address of a global variable ahead of time, it may need to be looked up, which is also expensive.
As others have said, don't worry too much about this stuff. It's definitely good to know, but it shouldn't affect the way you write your programs. Write code that makes sense, and let the compiler worry about optimization. If you get into compiler development, then you can start worrying about it. :)
Edit: more details on allocation:
Register variables are allocated by the compiler, so there is no runtime cost. The code will just put a value in a register as soon as the value is produced.
Stack variables are allocated by your program at runtime. Typically, when a function is called, the first thing it will do is reserve enough stack space for all of its local variables. So there is no per-variable cost.

Keep a global variable or recreate a local variable in c?

I've been programming with Java for Android for quite a while now. Since performance is very important for the stuff I am working on, I end up just spamming global variables. I guess everyone will come rushing in now and tell me this is the worst style ever, but let's keep it simple. For Android, local variables mean garbage collection, and garbage collection is something that kills performance.
Lately I have started using the NDK. Now I feel the urge to actually take all the local variables and change them to global variables. I am wondering, though, if this makes any sense in C code. Obviously it is not good style, but if it is needed for speed I'll gladly sacrifice the style.
I've looked through older threads about local vs global, but I haven't been able to find anything about speed. So my question is, if I am calling a function very often is it relevant for the speed that local variables are created and die after the function is done? Or doesn't it matter at all and I can happily keep on using the local variables.
I would test it myself, but for some reason the performance of my app goes up and down like a roller coaster and I doubt I'll be able to really make any sense of the data. I hope someone can help me out before I rewrite my whole code for nothing :)

For Android, local variables means garbage collection...
This is an incorrect statement. Local variables (primitives and object references) are allocated on the stack, not dynamically allocated on the heap; only the objects you create with new live on the heap. Check out this article on what gets allocated where in Java.
As a rule, items allocated on the stack do not require garbage collection/freeing and "die" immediately after the execution leaves its current scope. Stack allocation/deallocation is significantly faster than heap allocation and garbage collection.
Try to avoid global variables for both style and performance reasons. Stack-allocated local variables will perform much faster.

In C, the performance difference depends on the hardware. Loading a global on a RISC processor is more instructions (because you have to load both halves of the address in separate instructions, versus an add to the stack pointer), and then you need to contend with cache issues. For the most part, you can count on your local variables being in the cache. Using globals will thrash the cache a bit and some functions may be very adversely affected.
If you have substantial performance variability while running your app, it is quite likely that your assertion about the performance impact of local variables is immaterial.
The "cost" of creating a local variable in C is zero; it's just bumping a register (the stack pointer) to make space for the local. Then you initialize that variable via whatever means are appropriate. You should be able to know if that is expensive or not by casual inspection. When the function exits, the stack pointer is returned to its previous value, regardless of how many local variables you have.
If your definition of "local variables" is heap allocated objects, though, you will suffer from the cost of memory allocation. Memory allocation is very slow in my opinion, so whatever you can do to get away from malloc/free (and 'new' in Java), the better off you'll be. (I make games, and we tend to use dlmalloc but even that is too slow for regular usage; 400ns per call adds up quick.)
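One common way to get away from malloc/free in a frequently called function is to let the caller allocate a scratch buffer once and pass it in. The sketch below uses invented names (checksum_alloc_each_call, checksum_reuse) and a trivial workload purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates on every call: pays the malloc/free cost each time. */
int checksum_alloc_each_call(const unsigned char *data, size_t len)
{
    unsigned char *scratch = malloc(len);
    if (scratch == NULL)
        return -1;                /* allocation failed (or len was 0) */
    memcpy(scratch, data, len);
    int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += scratch[i];
    free(scratch);
    return sum;
}

/* Caller allocates the scratch buffer once and reuses it across calls. */
int checksum_reuse(const unsigned char *data, size_t len, unsigned char *scratch)
{
    memcpy(scratch, data, len);
    int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += scratch[i];
    return sum;
}
```

The second version keeps the buffer out of global state as well: the caller can keep it on its own stack or allocate it once before the hot loop.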

On the MIPS- and ARM-based CPUs found in most Android phones, there is no reason whatsoever to move local variables to global space "for performance." Locals are stored on the stack and a stack allocation is a single op; moreover the entire stack is cleaned up at once on calling ret. Moving them to global space will just make your logic into a snarled mess of indecipherable state for no advantage.
The one place to worry about perf with creating objects is when you are allocating them on the heap (eg with malloc()). This is exactly where C is "more performant than" garbage-collected languages, because you can see and control exactly when these mallocs occur and when they are freed. It is not really the case that C malloc() is any faster than Java new; rather, because every allocation is transparent and explicit to you, you can do the necessary work to make sure that such slow operations happen as little as possible.

By the way, declaring a variable static within a C function will give you the behavior of a global without littering the global namespace.
But as mentioned, declaring automatic variables on the stack takes 0 time and accessing those variables is also extremely quick, so there is not much reason to avoid function local variables.
If you really need this extreme level of optimization you should look to inline all your commonly called functions to avoid the call overhead.

Code optimization

If I have a big structure(having lot of member variables). This structure pointer is passed to many functions in my code. Some member variables of this structure are used very often, in almost all functions.
If I put those frequently used member variables at the beginning of the structure declaration, will it optimize the code in terms of MCPS (million cycles per second, i.e. the time consumed by the code)? If I put frequently accessed members at the top, will they be accessed more efficiently (in less time) than if they are placed randomly in the structure or at the bottom of the structure declaration? If yes, what is the logic?
If I have a structure member being accessed in some function as follows:
structurepointer1->member_variable
Will it help in optimizing it in MCPS aspect if I assign it to a local variable and then access the local variable, as shown below?
local_variable = structurepointer1->member_variable;
If yes, then how does it help?

1) The position of a field in a structure should have no effect on its access time except to the extent that, if your structure is very large and spans multiple pages, it may be a good idea to position members that are often used in quick succession close together in order to increase locality of reference and try to decrease cache misses.
2) Maybe / maybe not. In fact it may make things slower. If the variable is not volatile, your compiler may be smart enough to store the field in a register anyway. Even if not, your processor will cache its value, but this may not help if its uses are somewhat far apart, with lots of other memory accesses in between. If the value would either have been stored in a register or have stayed in your processor's cache, then assigning it to a local will only be unnecessary extra work.
Standard Optimizations Disclaimer: Always profile before optimizing. Make sure that what you are trying to optimize is worth optimizing. Always profile your attempted optimizations and make sure they actually made things faster (and not slower).

First, the obligatory disclaimer: for all performance questions, you must profile the code to see where improvements can be made.
In general though, anything you can do to keep your data in the processor cache will help. Putting the most commonly accessed items close together will facilitate this.
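As an illustration of keeping commonly accessed items close together, here is a hypothetical structure (struct session and its fields are invented for the example) where the two fields used on the hot path are declared next to each other, ahead of the large, rarely used ones. offsetof from <stddef.h> lets you check where the compiler actually put them.

```c
#include <stddef.h>

/* Hypothetical layout: fields used together on the hot path sit side by side. */
struct session {
    int  state;          /* hot */
    int  bytes_pending;  /* hot */
    char name[256];      /* cold */
    char log_path[1024]; /* cold */
};

/* Distance in bytes between the two hot fields. */
size_t hot_field_distance(void)
{
    return offsetof(struct session, bytes_pending) - offsetof(struct session, state);
}
```

With the hot fields adjacent, they are very likely to land in the same cache line (commonly 64 bytes, though that is hardware-dependent). Had bytes_pending been declared after log_path, it would sit more than a kilobyte away from state.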

I know this is not really answering your question, but before you delve into super-optimizing your code, go through this presentation http://dl.fefe.de/optimizer-isec.pdf. I saw it live and it was a good eye opening experience showing compilers are getting far more advanced in optimization than we tend to think and readable code is more important than small optimizations.
On 2, you are most likely better off not declaring a local variable. The compiler is usually smart enough to figure out when and how a variable is used and to utilize registers to keep it around.
Also, I would second Mark Ransom's suggestion, profile the code before making assumptions about bottlenecks.

I think your question is related to data alignment and data structure padding. In modern compilers this is handled automatically most of the time, to avoid the alignment faults that could happen in memory. You can read about this here. Of course, you can change the alignment of your data, but I think you would need to specify some compiler options to disable auto-alignment and rearrange the fields in the structure to match the architecture you are targeting.
I would say this is a very low level optimization.
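Padding is easy to observe with sizeof. In this sketch the two hypothetical structs hold the same members in a different order. The exact sizes are implementation-defined, but on common 64-bit ABIs the first is typically 24 bytes and the second 16.

```c
#include <stddef.h>

/* A char before and after the double usually forces padding on both sides. */
struct padded {
    char   tag;
    double value;
    char   flag;
};

/* Same members, largest first: the two chars can share one padded tail. */
struct reordered {
    double value;
    char   tag;
    char   flag;
};

/* Bytes saved by reordering on the current platform (may be zero). */
size_t bytes_saved_by_reordering(void)
{
    return sizeof(struct padded) - sizeof(struct reordered);
}
```

The compiler is not allowed to reorder struct members itself, so this is one of the few layout decisions that stays in the programmer's hands.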

The location of the field in the structure is irrelevant as that will be calculated by the compiler. A more promising optimization is to make sure that your most-used fields are byte-aligned with the word size of your processor.
If you are using the variable local to a function, this should have no impact. If you are passing it to other functions (separate from the larger structure), then that might help a bit.

As with all of the other answers, you need to run a profile baseline before optimizing, to make sure changes are effective. If you're worried about execution time, profile your algorithms and optimize them before you worry about the code a compiler creates, more bang for the buck.
Also, if you want to know what is going to happen, you should consider compiling your c code into assembly output. This will give you an idea of what the compiler is going to do and how you may go about further "fine tuning".
Structure access is almost always indexed indirect access. The assembly code will effectively fetch memory using the pointer to the structure as the base plus an index to get the right field. This is usually an expensive operation, but on modern CPUs it's probably not that slow.
This depends on the locality of the data being accessed. First and foremost, accessing the structure the first time will be the most expensive. Accessing the data afterwards can be quick if the data is already in a processor register; however, this may not be the case, depending on the processor used. Storing to a local variable should be less expensive, since the memory access instructions for such an operation are less expensive. Again, I think nowadays processors are fast enough that the gain from this optimization is minimal.
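A small file like the one below is enough to see what "base plus index" looks like in practice. The file name access.c and the struct are assumptions for the example. The -S option, understood by both gcc and clang, stops after generating assembly.

```c
/* Save as access.c and compile with:  cc -O2 -S access.c
 * (-S writes the generated assembly to access.s instead of an object file). */
struct point {
    int x;
    int y;
};

int sum_fields(const struct point *p)
{
    /* In the assembly, look for two loads at fixed offsets from p. */
    return p->x + p->y;
}
```

On most targets each field access turns into a single load with a constant offset from the pointer, which is why the field's position within the struct does not by itself change the cost of the access.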
I still think that there are probably better places to optimize your code. It is good though that there is someone out there that thinks about this still, in a world of code bloat ;) Embedded computing, you still need to worry about these things.
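The "base plus an index" description above can be sketched in C itself. This is an illustration only; `struct sample` and both function names are hypothetical. The second function spells out by hand roughly what the compiler arranges for a plain `s->count`.

```c
#include <stddef.h>

/* Hypothetical structure used only for this illustration. */
struct sample {
    int  id;
    long count;
    char name[16];
};

/* Ordinary member access: the compiler computes the field offset. */
long read_count_direct(const struct sample *s)
{
    return s->count;
}

/* The same access written as base address plus a compile-time offset. */
long read_count_manual(const struct sample *s)
{
    const char *base  = (const char *)s;
    const long *field = (const long *)(base + offsetof(struct sample, count));
    return *field;
}
```

Both functions return the same value. The offset is a constant known at compile time, so on most targets the access becomes a single load with a displacement.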

This depends on the size of your fields and caching details. Look at using valgrind for profiling this.
If you are doing this dereferencing a lot, it would cost time. A decent optimizing compiler will effectively perform the optimization you described, storing the value behind the pointer in a local variable. It will do a better job than you will, and it will do it in an architecture-specific way.
What you want to do in this situation, overall, is make sure that you test the correctness and the performance of each optimization you are trying. Otherwise you are poking around in the dark.
Remember that fine optimizations at the C line level will virtually never trump higher-order algorithm/design optimizations.

Yes, it can help. But as people have already stated, it depends and can even be counter productive.
The reason why I think it can help has to do with pointer aliasing. If you access your variables via a pointer, and the compiler cannot guarantee that the structure was not changed elsewhere (via your pointer or another), it will generate code to reload or save the variable even if it could have held the value in a register. Here is an example to show what I mean:
calc = structurepointer1->member_variable * x + c;
/* Do something in function which doesn't involve member_variable; */
function(structurepointer1);
calc2 = structurepointer1->member_variable * y;
The compiler will make a memory access for both references to member_variable, because it cannot be sure that the called function has not modified that field.
If you're sure the function doesn't change that value, doing this would save one memory access:
int temp = structurepointer1->member_variable;
calc = temp * x + c;
function(structurepointer1);
calc2 = temp * y;
There's also another reason to use a local variable for your member variables: it can make the code much more readable.
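A compilable version of the two fragments above might look like the sketch below. The `struct counter` type and the `touch` function are hypothetical stand-ins, not part of the original answer. The reload only matters when the callee is opaque to the compiler, for example when it is defined in another translation unit. With the callee visible in the same file, as it is here, the optimizer may already see that the field is untouched.

```c
/* Hypothetical structure; only member_variable matters for the example. */
struct counter {
    int member_variable;
    int calls;
};

/* Stand-in for an external function; it never modifies member_variable. */
void touch(struct counter *p)
{
    p->calls++;
}

/* Reads the member twice; the compiler may have to reload it after touch(). */
int compute_reload(struct counter *p, int x, int y)
{
    int calc = p->member_variable * x;
    touch(p);
    int calc2 = p->member_variable * y;
    return calc + calc2;
}

/* Copies the member to a local first, so only one memory read is needed. */
int compute_cached(struct counter *p, int x, int y)
{
    int temp = p->member_variable;
    int calc = temp * x;
    touch(p);
    int calc2 = temp * y;
    return calc + calc2;
}
```

The two functions are equivalent only because `touch` leaves `member_variable` alone. If the callee did change it, the cached version would silently use a stale value.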

Efficiency of C Variable Declaration [duplicate]

How long does it take to declare a variable in C, for example int x or unsigned long long var? I am wondering if it would make my code any faster in something like this.
for (conditions) {
int var = 0;
// code
}
Would it be faster to do this, or is it easier not to?
int var;
for (conditions) {
var = 0;
// code
}
Thanks for the help.

One piece of advice: stop worrying about which language constructs are microscopically faster or slower than which others, and instead focus on which ones let you express yourself best.
Also, to find out where your code is spending time, use a profiler.
And as others have pointed out, declarations are purely compile-time things, they don't affect execution time.

It doesn't make any difference. In a traditional implementation the declaration itself (excluding initialization) generates no machine instructions. Function prologue code typically allocates space in the stack for all local variables at once, regardless of where they are declared.
However, where you declare your local variables can affect the performance of your code indirectly, in theory at least. When you declare the variables as locally as possible (your first variant), in the general case it results in a smaller stack frame being reserved by the function for its local variables (since the same location in the stack can be shared by different local variables at different times). Having a smaller stack frame reduces general stack memory consumption, i.e. as nested function calls are performed the stack doesn't grow as fast (especially noticeable with recursive functions). It generally improves performance, since new stack page allocations happen less often and stack memory locality becomes better.
The latter considerations are platform-dependent, of course. It might have very little or no effect on your platform and/or for your applications.

Whenever you have a question about performance, the best thing to do is wrap a loop around it (millions of iterations) and time it. But, in this case, you will likely find that it makes no difference.
It is more important to properly express the intentions of your code. If you need the variable outside your loop, declare it outside. If you only need the variable inside the loop, declare it inside.
You should always declare and initialize variables in the narrowest scope possible.
You shouldn't be worrying about those types of micro-optimizations anyway (except in the rarest, rarest of cases). If you really need to worry about potential nano-second performance improvements, measure the difference. It is very unlikely that your variable declarations will be the largest bottleneck in your application.
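A minimal sketch of the "wrap a loop around it and time it" advice, using the standard `clock()` function. The two loop functions and the `time_call` helper are hypothetical names made up for this example, and the loop body is deliberately trivial.

```c
#include <time.h>

/* Variant 1: the variable is declared inside the loop body. */
long sum_declared_inside(long n)
{
    long total = 0;
    for (long i = 0; i < n; i++) {
        int var = 0;
        var += (int)(i & 1);
        total += var;
    }
    return total;
}

/* Variant 2: the variable is declared once, outside the loop. */
long sum_declared_outside(long n)
{
    long total = 0;
    int var;
    for (long i = 0; i < n; i++) {
        var = 0;
        var += (int)(i & 1);
        total += var;
    }
    return total;
}

/* Runs fn(n) once and returns the CPU time it took, in seconds. */
double time_call(long (*fn)(long), long n, long *result)
{
    clock_t start = clock();
    *result = fn(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `time_call` with each variant and a large `n`, then compare the returned times. An optimizing compiler may simplify both loops heavily, so treat any single measurement with suspicion and repeat it several times.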

It takes no time at all. The memory for global variables is allocated at startup, and "declaring" variables on the stack simply involves how far "up" the stack pointer moves when the function is called.

Declarations are purely compile time; they cost nothing at runtime¹. But the first piece of code is still better than the second, for two reasons:
you should always initialize variables when you declare them; that way they can never have uninitialized values. This goes hand in hand with
always using the narrowest possible scope for variable declarations.
So your first example, while no faster than the second, is still better.
And all of the people who chimed in telling him not to prematurely or micro optimize his code are wrong. It is never bad to know how costly various bits of code are. The very best programmers have a solid, almost unconscious, grasp of the cost of various strategies and take that into account automatically when they design. The way you become that programmer is to ask just this sort of question when you are a beginner.
¹ In fact, there is a small cost when each function allocates space for local variables, but that cost is the same regardless of how many local variables there are*.
*ok that's not really true, but the cost depends only on the total amount of space, not the number of variables.

Declaration takes no time at all.
The compiler will interpret that line as a notification that space for it will need to exist on the stack.

As others have already said, it shouldn't take any time. Therefore you need to make this decision based on other factors: what would make your code more readable and less prone to bugs. It's generally considered a good practice to declare a variable as close as possible to its usage (so you can see the declaration and usage in one go). If it's only used in the inner scope then just declare it inside that scope - forget about performance on this one.

Declaring variables does take time, as it results in machine language instructions that allocate the space for the variables on the stack. This is simply an increment of the stack pointer, which takes a tiny, but non-zero amount of time.
I believe your question is whether more time will be required (i.e. more stack increment operations) if the variable is declared inside the loop. The answer is no, since the stack is incremented once only for the loop block, not each time the loop is executed. So, there will be no difference in time either way, even if the loop executes zillions of zillions of times.

Disclaimer: Precisely what happens depends on your compiler, architecture, etc. But conceptually here's what's going on:
When you declare a variable within a method, it is allocated on the stack. Allocating something on the stack only involves bumping up the stack pointer by the size of the variable. So, for example, if SP represents the memory address of the top of the stack, declaring char x results in SP += 1 and int x results in SP += 4 (on a 32 bit machine).
When the function exits, the stack pointer is returned to where it was before your method was called. So deallocating everything is fast, too.
So, either way it's just an add, which takes the same amount of time regardless of the amount of data.
A smart compiler will combine several variable declarations into a single add.
When you declare a variable within a loop, in theory it could be changing the stack pointer on each iteration through the loop, but again, a smart compiler probably won't do that.
(A notable exception is C++, which does extra work because it needs to call constructors and destructors when the stack-allocated object is created or destroyed.)
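One rough way to observe the "reserved once" behaviour described above is to record the address of a loop-local variable on each iteration and count how often it changes. This is only a sketch: the function name is hypothetical, the C standard does not guarantee any particular result, and `volatile` is used here merely to keep the variable in memory.

```c
#include <stdint.h>

/* Counts how many times the address of a loop-local variable differs
   from its address on the previous iteration. */
int count_address_changes(int iterations)
{
    uintptr_t prev = 0;
    int changes = 0;
    for (int i = 0; i < iterations; i++) {
        volatile int x = i;                      /* declared inside the loop */
        uintptr_t addr = (uintptr_t)(void *)&x;  /* taken while x is alive */
        if (i > 0 && addr != prev)
            changes++;
        prev = addr;
    }
    return changes;
}
```

On mainstream compilers this typically reports no changes, which is consistent with the slot being set up once in the function prologue rather than on every iteration.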

I wouldn't care about a nanosecond here or there. Unless you need to access its value after the for loop ends, leave the variable inside the loop: it will be closer to the code that uses it (your code will be more readable), and its scope will be bounded by the loop itself (your code will be more elegant and less bug-prone).

I bet the compiled binary will be identical for both cases.

Variable declaration is turned into stack space reservation by the compiler. How this works is entirely platform-dependent. On x86 and pretty much every popular architecture this is just a subtraction from the address of the stack frame and/or an indexed addressing mode to access from the top of the stack. All these come with the cost of a simple subtraction/addition, which is really irrelevant.
Technically the first example is less efficient, because the declaration happens on every entry into the loop scope, i.e. on every loop iteration. However, there is a 99.99% chance that the stack space will be reserved only once. Even the assignment operation will be optimized away, although technically it should be done every loop iteration. Now in C++ this can get much worse, if the variable has a constructor which will then be run on every loop iteration.
And as a bottom line, you really should not worry about any such issues without proper profiling. And even then there are much more valuable questions to ask yourself here, like "what is the most readable way to do this, what is easier to understand and maintain, etc.".