GNU Lightning - Lisp like "Apply" function

I'm trying to make a sort of Lisp-like "Apply" function with GNU Lightning: a function F that receives a pointer to a function G, an argument count, and an array of integers, and calls G with the right number of parameters.
My code is not working properly. What's wrong? How can I do this?
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <lightning.h>
int f0() {
printf("f0();\n");
}
int f1(int p1) {
printf("f1(%d);\n", p1);
}
int f2(int p1, int p2) {
printf("f2(%d, %d);\n", p1, p2);
}
int f3(int p1, int p2, int p3) {
printf("f3(%d, %d %d)\n", p1, p2, p3);
}
int main (int argc, char **argv) {
init_jit(argv[0]);
jit_state_t *_jit = jit_new_state();
int (*f)(int *g, int argc, int *argv);
jit_prolog();
jit_node_t *p_g = jit_arg();
jit_node_t *p_argc = jit_arg();
jit_node_t *p_argv = jit_arg();
jit_getarg(JIT_R0, p_g);
jit_getarg(JIT_R1, p_argc);
jit_getarg(JIT_R2, p_argv);
jit_prepare();
/* for ( ; argc; argc--, argv++) */
jit_node_t *label = jit_label();
jit_node_t *zero = jit_beqi(JIT_R1, 0);
// *argv
jit_ldr_i(JIT_V0, JIT_R2);
jit_pushargr(JIT_V0);
// Go next
// argv++
jit_addi(JIT_R2, JIT_R2, sizeof(int));
// argc--
jit_subi(JIT_R1, JIT_R1, 1);
//
jit_patch_at(jit_jmpi(), label);
jit_patch(zero);
jit_finishr(JIT_R0);
jit_reti(0);
jit_epilog();
f = jit_emit();
jit_clear_state();
f((void *)f0, 0, NULL);
int a1[] = {10};
f((void *)f1, 1, a1);
int a2[] = {100, 200};
f((void *)f2, 2, a2);
int a3[] = {1000, 2000, 3000};
f((void *)f3, 3, a3);
finish_jit();
return 0;
}
Expected output:
f0();
f1(10);
f2(100,200);
f3(1000,2000,3000);
Real output:
f0();
f1(10);
f2(200, 2);
f3(3000, 3 -13360)

If we add a call to jit_disassemble() to your code, we see that the generated code for your f function looks like this:
0x7f2511280000 sub $0x30,%rsp
0x7f2511280004 mov %rbx,0x28(%rsp)
0x7f2511280009 mov %rbp,(%rsp)
0x7f251128000d mov %rsp,%rbp
0x7f2511280010 sub $0x18,%rsp
0x7f2511280014 mov %rdi,%rax
0x7f2511280017 mov %rsi,%r10
0x7f251128001a mov %rdx,%r11
0x7f251128001d nopl (%rax)
0x7f2511280020 test %r10,%r10
0x7f2511280023 je 0x7f2511280040
0x7f2511280029 movslq (%r11),%rbx
0x7f251128002c mov %rbx,%rdi
0x7f251128002f add $0x4,%r11
0x7f2511280033 sub $0x1,%r10
0x7f2511280037 jmpq 0x7f2511280020
0x7f251128003c nopl 0x0(%rax)
0x7f2511280040 callq *%rax
0x7f2511280042 xor %rax,%rax
0x7f2511280045 mov %rbp,%rsp
0x7f2511280048 mov 0x28(%rsp),%rbx
0x7f251128004d mov (%rsp),%rbp
0x7f2511280051 add $0x30,%rsp
0x7f2511280055 retq
If we look at the code that's generated by jit_pushargr, the problem becomes apparent:
0x7f251128002c mov %rbx,%rdi
So this sets the value of *argv as the first argument of the called function (rdi being the register that holds the first argument in the x64 calling convention). It does so multiple times because it's in a loop, but it's always the first argument that's being set. So when the function is called after the loop, rdi will hold the value that was last written to it in the loop (i.e. the last value inside argv) and the other argument registers / memory locations won't have been written to at all.
The way that jit_pushargr works is that it will write to the first argument when you call it the first time, then the second argument when you call it the second time and so on. But in your code you only call it once while generating the code (the loop is part of the generated code, not of the generator), so it only ever writes the first argument.
So what you'd need to do to create an apply function using lightning would be to actually call jit_pushargr argc times. That means that instead of generating a general apply function, you'd want to define the apply function itself in normal C and instead generate a helper function that pushes the given number of arguments.
Alternatively you could do this entirely without lightning and instead use libffi, which would be the more traditional tool for uses like this and wouldn't incur the overhead of generating new code every time apply is called. Of course this wouldn't prevent you from calling your apply function from code generated by lightning.
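For a small, fixed upper bound on the argument count, a third option is an apply written in plain C that switches on argc. This is only a sketch of that idea, not the lightning or libffi approach described above; apply_int and the sum0..sum3 callees are hypothetical names, and the limit of three int arguments is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: call g with argc int arguments taken from argv.
   g is passed as a generic function pointer and cast back to its real
   type before the call. This sketch assumes 0 <= argc <= 3. */
int apply_int(void (*g)(void), int argc, const int *argv)
{
    assert(argc >= 0 && argc <= 3);
    switch (argc) {
    case 0:  return ((int (*)(void))g)();
    case 1:  return ((int (*)(int))g)(argv[0]);
    case 2:  return ((int (*)(int, int))g)(argv[0], argv[1]);
    default: return ((int (*)(int, int, int))g)(argv[0], argv[1], argv[2]);
    }
}

/* Example callees, made up for illustration. */
int sum0(void)                { return 0; }
int sum1(int a)               { return a; }
int sum2(int a, int b)        { return a + b; }
int sum3(int a, int b, int c) { return a + b + c; }
```

A call then looks like apply_int((void (*)(void))sum2, 2, a2) with int a2[] = {100, 200}. The cast back to the callee's real type is what makes the call well defined, which is also why this only works when the set of signatures is known in advance.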

Related

Buffer Overflow Return Address Attack

I am trying to learn more about buffer overflows so I have created a simple program to gain knowledge and try to exploit it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void failed(void)
{
puts("Did not exploit");
exit(0);
}
void pass(void)
{
puts("Good Job");
exit(1);
}
void foo()
{
char input[4];
gets(input);
}
int _main()
{
foo();
failed();
return 0;
}
I am trying to fill the buffer within foo() with random characters followed by the address of pass(), so that the return address of foo() gets overwritten with the starting address of pass(). I used the following GDB commands to get the relevant information:
x foo
-> 0x8049dd7 foo : 0xfb1e0ff3
disas foo
Dump of assembler code for function foo:
0x08049e09 <+0>: endbr32
0x08049e0d <+4>: push %ebp
0x08049e0e <+5>: mov %esp,%ebp
0x08049e10 <+7>: push %ebx
0x08049e11 <+8>: sub $0x14,%esp
0x08049e14 <+11>: call 0x8049e5a <__x86.get_pc_thunk.ax>
0x08049e19 <+16>: add $0x9b1e7,%eax
0x08049e1e <+21>: sub $0xc,%esp
0x08049e21 <+24>: lea -0xc(%ebp),%edx
0x08049e24 <+27>: push %edx
0x08049e25 <+28>: mov %eax,%ebx
0x08049e27 <+30>: call 0x8058850 <gets>
0x08049e2c <+35>: add $0x10,%esp
0x08049e2f <+38>: nop
0x08049e30 <+39>: mov -0x4(%ebp),%ebx
0x08049e33 <+42>: leave
0x08049e34 <+43>: ret
End of assembler dump.
I then created a Python program whose output is fed into my vulnerable program; it simply prints:
print('A'*15 + '\x08\x04\x9d\xd7')
The 'A'*15 is supposed to fill the buffer and the saved EBP, and the bytes after it are supposed to overwrite the return address with the address of foo (\x08\x04\x9d\xd7), but I continue to get segmentation faults. Any assistance would be great!

Any mistake and the attempt will segfault. You must:
have the right target address
put it in the right place on the stack
use the right byte order
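The first one is difficult because the kernel will randomize address spaces on load,
primarily because of these kinds of attacks.
The other two you've gotten wrong. The disassembly shows the buffer at -0xc(%ebp), so 12 bytes of input reach the saved EBP and 4 more cover it: the return address starts after 16 bytes, not 15. x86 is also little-endian, so the address bytes must be written least significant byte first, the reverse of what you print.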
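On the byte-order point: x86 stores a 32-bit value least significant byte first. A minimal sketch of that layout follows; store_le32 is a hypothetical helper name introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write a 32-bit value into out[0..3] in
   little-endian order, the way an x86 CPU lays it out in memory. */
void store_le32(unsigned char out[4], uint32_t value)
{
    assert(out != 0);
    out[0] = (unsigned char)(value & 0xffu);          /* lowest byte first */
    out[1] = (unsigned char)((value >> 8) & 0xffu);
    out[2] = (unsigned char)((value >> 16) & 0xffu);
    out[3] = (unsigned char)((value >> 24) & 0xffu);  /* highest byte last */
}
```

For the value 0x08049dd7 this yields the byte sequence d7 9d 04 08, the reverse of the order in which the digits are written.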
If you'd like to play with something similar, here's an example
that changes the return address. Because of C calling conventions,
the stack is corrupted at the end of main, which can be fixed by using
stdcall or pascal calling conventions for the test function.
Syntax for that is compiler dependent.
#include <stdio.h>
#include <stdlib.h>
void oops() {
printf("oops!\n");
}
void /*__stdcall*/ test(int t)
{
/* x86 stack is top down, int is same size as pointer */
int *return_is_at = &t - 1;
/* replace parameter with our return address, for oops to return to */
*(&t) = *return_is_at; /* just-in-case avoid optimization*/
/* replace our return address with address of oops */
*return_is_at = (int)oops;
}
int main(int argc, char **argv)
{
test(1);
printf("test returned\n");
/* unless stdcall, at this point our stack is corrupted
and this return will crash, so:
*/
exit(1);
}
Here's an alternative function that uses a local variable to calculate
the return address location instead of the parameter.
This assumes a standard stack frame, which the compiler may optimize away.
It also corrupts the stack.
void test2()
{
/* x86 stack is top down, int is same size as pointer */
/* this relies on consistently defined stack frames */
int l;
int *return_is_at = &l + 2;
/* copy our return address up one,
for oops to return to (corrupting the stack)
*/
return_is_at[1] = *return_is_at;
/* replace our return address with address of oops */
*return_is_at = (int)oops;
}
FYI - It's possible to use a similar technique to track unique call trees for a function
(by walking up the stack frames) in order to fail specific call instances during testing.

divide and store quotient and remainder in different arrays

The standard div() function returns a div_t struct, for example:
/* div example */
#include <stdio.h> /* printf */
#include <stdlib.h> /* div, div_t */
int main ()
{
div_t divresult;
divresult = div (38,5);
printf ("38 div 5 => %d, remainder %d.\n", divresult.quot, divresult.rem);
return 0;
}
My case is a bit different; I have this
#define NUM_ELTS 21433
int main ()
{
unsigned int quotients[NUM_ELTS];
unsigned int remainders[NUM_ELTS];
int i;
for(i=0;i<NUM_ELTS;i++) {
divide_single_instruction(&quotients[i],&remainders[i]);
}
}
I know that the assembly instruction for division does everything in a single instruction, so I need to do the same here to save CPU cycles, which basically means moving the quotient from EAX and the remainder from EDX into the memory locations where my arrays are stored. How can this be done without including asm {} or SSE intrinsics in my C code? It has to be portable.

Since you're writing to the arrays in-place (replacing numerator and denominator with quotient and remainder) you should store the results to temporary variables before writing to the arrays.
void foo (unsigned *num, unsigned *den, int n) {
int i;
for(i=0;i<n;i++) {
unsigned q = num[i]/den[i], r = num[i]%den[i];
num[i] = q, den[i] = r;
}
}
produces this main loop assembly
.L5:
movl (%rdi,%rcx,4), %eax
xorl %edx, %edx
divl (%rsi,%rcx,4)
movl %eax, (%rdi,%rcx,4)
movl %edx, (%rsi,%rcx,4)
addq $1, %rcx
cmpl %ecx, %r8d
jg .L5
There are some more complicated cases where it helps to save the quotient and remainder when they are first used. For example in testing for primes by trial division you often see a loop like this
for (p = 3; p <= n/p; p += 2)
if (!(n % p)) return 0;
It turns out that GCC does not use the remainder from the first division and therefore it does the division instruction twice which is unnecessary. To fix this you can save the remainder when the first division is done like this:
for (p = 3, q=n/p, r=n%p; p <= q; p += 2, q = n/p, r=n%p)
if (!r) return 0;
This speeds up the loop by a factor of two.
So in general GCC does a good job particularly if you save the quotient and remainder when they are first calculated.
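For the layout in the question, where quotients and remainders go to two separate arrays, the same idea might be sketched as follows. divide_all and its parameter names are made up for illustration, and whether a given compiler folds the / and % into one division instruction is something to confirm in the disassembly rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for each i, store num[i] / den[i] in quot[i] and
   num[i] % den[i] in rem[i]. The two results are computed next to each
   other into temporaries so the compiler can reuse a single division. */
void divide_all(const unsigned *num, const unsigned *den,
                unsigned *quot, unsigned *rem, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(den[i] != 0);            /* division by zero is undefined */
        unsigned q = num[i] / den[i];
        unsigned r = num[i] % den[i];
        quot[i] = q;
        rem[i] = r;
    }
}
```

This stays portable C with no inline assembly or intrinsics; the choice of instructions is left to the optimizer.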

The general rule here is to trust your compiler to do something fast. You can always disassemble the code and check that the compiler is doing something sane. It's important to realise that a good compiler knows a lot about the machine, often more than you or me.
Also let's assume you have a good reason for needing to "count cycles".
For your example code I agree that the x86 "idiv" instruction is the obvious choice. Let's see what my compiler (MS Visual C 2013) does if I just write out the most naive code I can:
struct divresult {
int quot;
int rem;
};
struct divresult divrem(int num, int den)
{
return (struct divresult) { num / den, num % den };
}
int main()
{
struct divresult res = divrem(5, 2);
printf("%d, %d", res.quot, res.rem);
}
And the compiler gives us:
struct divresult res = divrem(5, 2);
printf("%d, %d", res.quot, res.rem);
01121000 push 1
01121002 push 2
01121004 push 1123018h
01121009 call dword ptr ds:[1122090h] ;;; this is printf()
Wow, I was outsmarted by the compiler. Visual C knows how division works so it just precalculated the result and inserted constants. It didn't even bother to include my function in the final code. We have to read in the integers from console to force it to actually do the calculation:
int main()
{
int num, den;
scanf("%d, %d", &num, &den);
struct divresult res = divrem(num, den);
printf("%d, %d", res.quot, res.rem);
}
Now we get:
struct divresult res = divrem(num, den);
01071023 mov eax,dword ptr [num]
01071026 cdq
01071027 idiv eax,dword ptr [den]
printf("%d, %d", res.quot, res.rem);
0107102A push edx
0107102B push eax
0107102C push 1073020h
01071031 call dword ptr ds:[1072090h] ;;; printf()
So you see, the compiler (or this compiler at least) already does what you want, or something even more clever.
From this we learn to trust the compiler and only second-guess it when we know it isn't doing a good enough job already.

Hooking - hotpatching

I'm trying to hook the Windows API function FindWindowA(). I successfully did it with the code below without "hotpatching" it: I've overwritten the bytes at the beginning of the function. myHook() is called and a message box shows up when FindWindowA() is called.
user32.dll has hotpatching enabled and I'd like to overwrite the NOPs before the actual function instead of overwriting the function itself. However, the code below won't work when I set hotpatching to TRUE. It does nothing when FindWindowA() gets executed.
#include <stdio.h>
#include <windows.h>
void myHook()
{
MessageBoxA(NULL, "Hooked", "Hook", MB_ICONINFORMATION);
}
int main(int argc, char *argv[])
{
BOOLEAN hotpatching = FALSE;
LPVOID fwAddress = GetProcAddress(GetModuleHandleA("user32.dll"), "FindWindowA");
LPVOID fwHotpatchingAddress = (LPVOID)((DWORD)fwAddress - 5);
LPVOID myHookAddress = &myHook;
DWORD jmpOffset = (DWORD)&myHook - (DWORD)(!hotpatching ? fwAddress : fwHotpatchingAddress) - 5; // -5 because "JMP offset" = 5 bytes (1 + 4)
printf("fwAddress: %X\n", fwAddress);
printf("fwHotpatchingAddress: %X\n", fwHotpatchingAddress);
printf("myHookAddress: %X\n", myHookAddress);
printf("jmpOffset: %X\n", jmpOffset);
printf("Ready?\n\n");
getchar();
char JMP[1] = {0xE9};
char RETN[1] = {0xC3};
LPVOID offset0 = NULL;
LPVOID offset1 = NULL;
LPVOID offset2 = NULL;
if (!hotpatching)
offset0 = fwAddress;
else
offset0 = fwHotpatchingAddress;
offset1 = (LPVOID)((DWORD)offset0 + 1);
offset2 = (LPVOID)((DWORD)offset1 + 4);
DWORD oldProtect = 0;
VirtualProtect(offset0, 6, PAGE_EXECUTE_READWRITE, &oldProtect);
memcpy(fwAddress, JMP, 1);
memcpy(offset1, &jmpOffset, 4);
memcpy(offset2, RETN, 1);
VirtualProtect(offset0, 6, oldProtect, &oldProtect);
printf("FindWindowA() Patched");
getchar();
FindWindowA(NULL, "Test");
getchar();
return 0;
}
Could you tell me what's wrong?
Thank you.

Hotpatching-enabled executable images are prepared by the compiler and linker to allow replacing the image while in use. The following two changes are applied (x86):
The function entry point is set to a 2-byte no-op mov edi, edi (/hotpatch).
Five consecutive nops are prepended to each function entry point (/FUNCTIONPADMIN).
To illustrate this, here is a typical disassembly listing of a hotpatching-enabled function:
(2) 768C8D66 90 nop
768C8D67 90 nop
768C8D68 90 nop
768C8D69 90 nop
768C8D6A 90 nop
(1) 768C8D6B 8B FF mov edi,edi
(3) 768C8D6D 55 push ebp
768C8D6E 8B EC mov ebp,esp
(1) designates the function entry point with the 2-byte no-op. (2) is the padding provided by the linker, and (3) is where the non-trivial function implementation starts.
To hook into a function you have to overwrite (2) with a jump to your hook function jmp myHook, and make this code reachable by replacing (1) with a relative jump jmp $-5.
The hook function must leave the stack in a consistent state. It should be declared as __declspec(naked) to prevent the compiler from generating function prolog and epilog code. The final instruction must either perform stack cleanup in line with the calling convention of the hooked function, or jump back to the hooked function at the address designated by (3).
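The two jumps described above can be written out as bytes. The following is a sketch under stated assumptions: build_hotpatch is a hypothetical helper that only fills a buffer (it does not touch process memory), addresses are 32-bit, and the padding is assumed to start exactly 5 bytes before the entry point as in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: produce the 7 bytes that cover the padding (2)
   and the entry point (1).
     out[0..4]: E9 rel32   jmp hook, placed at entry - 5
     out[5..6]: EB F9      jmp short to entry - 5, placed at entry */
void build_hotpatch(uint8_t out[7], uint32_t entry, uint32_t hook)
{
    assert(out != 0);
    uint32_t pad = entry - 5;             /* first byte of the padding */
    uint32_t rel32 = hook - (pad + 5);    /* relative to the end of the jmp */

    out[0] = 0xE9;                        /* near jump opcode */
    out[1] = (uint8_t)(rel32 & 0xff);     /* rel32, little-endian */
    out[2] = (uint8_t)((rel32 >> 8) & 0xff);
    out[3] = (uint8_t)((rel32 >> 16) & 0xff);
    out[4] = (uint8_t)((rel32 >> 24) & 0xff);

    out[5] = 0xEB;                        /* short jump opcode */
    out[6] = 0xF9;                        /* -7: from entry + 2 back to entry - 5 */
}
```

The short jump is relative to the end of its own 2-byte encoding, which is why the displacement is -7 rather than -5. A real patcher would probably write the 5-byte jump into the padding first and replace the 2-byte entry last, so that the function is never entered in a half-patched state.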

How do I get info from the stack, using inline assembly, to program in c?

I have a task to do and I'm asking for some help (in plain C).
What do I need to do?
I need to check every instruction of the main C program (using interrupt 1) and print a message only if the next instruction is the same procedure that was sent earlier to the stack by some other procedure.
What do I want to do?
I want to take info from the stack, using inline assembly, and put it in a variable (volatile) that can be compared in the C program itself after returning to C.
This is the program:
#include <stdio.h>
#include <dos.h>
#include <conio.h>
#include <stdlib.h>
typedef void (*FUN_PTR)(void);
void interrupt (*Int1Save) (void); //pointer to interrupt num 1//
volatile FUN_PTR our_func;
char *str2;
void interrupt my_inter (void) //New interrupt//
{volatile FUN_PTR next_command;
asm { PUSH BP
MOV BP,SP
PUSH AX
PUSH BX
PUSH ES
MOV ES,[BP+4]
MOV BX,[BP+2]
MOV AX,ES:[BX]
MOV word ptr next_command,AX
POP ES
POP BX
POP AX
pop BP}
if (our_func==next_command) printf("procedure %s has been called\n",str2);}
void animate(int *iptr,char str[],void (*funptr)(), char fstr[])
{
str2=fstr;
our_func=funptr;
Int1Save = getvect(1); // save old interrupt//
setvect(1,my_inter);
asm { pushf //TF is ON//
pop ax
or ax,100000000B
push ax
popf}}
void unanimate()
{asm { pushf //TF is OFF//
pop ax
and ax,1111111011111111B
push ax
popf}
setvect (1,Int1Save); //restore old interrupt//}
void main(void)
{int i;
int f1 = 1;
int f2 = 1;
int fibo = 1;
animate(&fibo, "fibo", sleep, "sleep");
for(i=0; i < 8; i++)
{
sleep(2);
f1 = f2;
f2 = fibo;
fibo = f1 + f2;} // for//
unanimate();} // main//
My question...
Of course the problem is in the inline assembly of my_inter, but I can't figure it out.
What am I doing wrong? (Please take a look at the code above.)
I wanted to save the address of the specific procedure (sleep) in the volatile our_func, then take the info (the address of each next command) from the stack into the volatile next_command, and finally return to C and make the comparison each time. If both variables hold the same value (address), a specific message should be printed.
Hope I'm clear..
10x,
Nir B

Answered as a comment by the OP
I got the answer I wanted:
asm { MOV SI,[BP+18] //Taking the address of each command//
MOV DI,[BP+20]
MOV word ptr next_command+2,DI
MOV word ptr next_command,SI}
if ((*our_func)==(*next_command)) //Making the next_command compare//
printf("procedure %s has been called\n",str2);

Pointers and Pointer Functions

Studying the K&R book in C I had a few questions regarding complicated pointer declarations and pointer-array relationships.
1) What exactly is the difference between
char amessage[] = "this is a string";
and
char *pmessage;
pmessage = "this is a string";
and when would you use one or the other?
From my understanding the first one allocates some amount of memory according to the size of the string, and then stores the chars in the memory. Then when you access amessage[] you just directly access whatever char you're looking for. For the second one you also allocate memory except you just access the data through a pointer whenever you need it. Is this the correct way of looking at it?
2) The book says that arrays when passed into functions are treated as if you gave the pointer to the first index of the array and thus you manipulate the array through manipulating the pointer even though you can still do syntax like a[i]. Is this true if you just created an array somewhere and want to access it or is it only true if you pass in an array into a function? For example:
char amessage[]= "hi";
char x = *(amessage + 1); // can I do this?
3) The book says the use of static is great in this particular function:
/* month_name: return name of n-th month */
char *month_name(int n)
{
static char *name[] = {
"Illegal month",
"January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"
};
return (n < 1 || n > 12) ? name[0] : name[n];
}
I don't understand why exactly this is a good use of static. Is it because the char *name[] would get deleted after function return if it is not static (because its a local variable)? Then does that mean in c you can't do stuff like:
int testFunction(){
int x = 1;
return x;
}
Without x being deleted before you use the return value? (Sorry I guess this might not be a pointer question but it was in the pointer chapter).
4) There are some complicated declaration like
char (*(*x())[])()
I'm really confused as to what is going on. So the x() part means a function x that returns a pointer? But what kind of pointer does it return its just a "*" without like int or void or w/e. Or does that mean a pointer to a function (but I thought that would be like (*x)())? And then after you add brackets (because I assume brackets have the next precedence)...what is that? An array of functions?
This kind of ties to my confusion with function pointers. If you have something like
int (*func)()
That means a pointer to a function that returns an int, and the name of that pointer is func, but what does it mean when its like int (*x[3])(). I don't understand how you can replace the pointer name with an array.
Thanks for any help!
Kevin

1) What exactly is the difference between
char amessage[] = "this is a string";
and
char *pmessage;
pmessage = "this is a string";
and when would you use one or the other?
amessage will always refer to the memory holding this is a string\0. You cannot change the address it refers to. pmessage can be updated to point to any character in memory, whether or not it is part of a string. If you assign to pmessage, you might lose your only reference to this is a string\0. (It depends if you made references anywhere else.)
I would use char amessage[] if I intended to modify the contents of amessage[] in place. You cannot modify the memory that pmessage points to when it points at a string literal: doing so is undefined behavior, and on typical systems the literal lives in read-only memory. Try this little program; comment out amessage[0]='H'; and pmessage[0]='H'; one at a time and see that pmessage[0]='H'; causes a segmentation violation:
#include <stdio.h>
int main(int argc, char* argv[]) {
char amessage[]="howdy";
char *pmessage="hello";
amessage[0]='H';
pmessage[0]='H';
printf("amessage %s\n", amessage);
printf("pmessage %s\n", pmessage);
return 0;
}
Modifying a string that was hard-coded in the program is relatively rare; char *foo = "literal"; is probably more common, and the immutability of the string might be one reason why.
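The difference between the two declarations can also be seen through sizeof and strlen. This is a small sketch; array_size_demo and pointer_length_demo are names made up for the example, and the pointer is declared const to reflect that the literal must not be written to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The array form owns a private, writable copy of the characters. */
size_t array_size_demo(void)
{
    char amessage[] = "this is a string";   /* 16 characters plus '\0' */
    amessage[0] = 'T';                       /* fine: modifies our own copy */
    assert(amessage[0] == 'T');
    return sizeof amessage;                  /* size of the whole array */
}

/* The pointer form only refers to the literal; sizeof pmessage would be
   the size of a pointer, so the length has to come from strlen. */
size_t pointer_length_demo(void)
{
    const char *pmessage = "this is a string";
    return strlen(pmessage);                 /* characters before the '\0' */
}
```

Here array_size_demo() returns 17 and pointer_length_demo() returns 16: the array's size includes the terminating '\0', while strlen stops before it.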
2) The book says that arrays when passed into functions are treated as
if you gave the pointer to the first index of the array and thus you
manipulate the array through manipulating the pointer even though you
can still do syntax like a[i]. Is this true if you just created an
array somewhere and want to access it or is it only true if you pass
in an array into a function? For example:
char amessage[]= "hi";
char x = *(amessage + 1); // can I do this?
You can do that, however it is pretty unusual:
$ cat refer.c
#include <stdio.h>
int main(int argc, char* argv[]) {
char amessage[]="howdy";
char x = *(amessage+1);
printf("x: %c\n", x);
return 0;
}
$ ./refer
x: o
$
At least, I have never seen a "production" program that did this with character strings. (And I'm having trouble thinking of a program that used pointer arithmetic rather than array subscripting on arrays of other types.)
3) The book says the use of static is great in this particular
function:
/* month_name: return name of n-th month */
char *month_name(int n)
{
static char *name[] = {
"Illegal month",
"January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"
};
return (n < 1 || n > 12) ? name[0] : name[n];
}
I don't understand why exactly this is a good use of static. Is it
because the char *name[] would get deleted after function return if
it is not static (because its a local variable)? Then does that mean
in c you can't do stuff like:
int testFunction(){
int x = 1;
return x;
}
Without x being deleted before you use the return value? (Sorry I
guess this might not be a pointer question but it was in the pointer
chapter).
In this specific case, I believe the static is needless; at least GCC is able to determine that the strings are not modified and stores them in the .rodata read-only data segment. However, that might be an optimization with string literals. Your example with another primitive data type (int) also works fine because C passes everything by value both on function calls and function returns. However, if you're returning a pointer to an object allocated on the stack then the static is absolutely necessary, because it determines where in memory the object lives:
$ cat stackarray.c ; make stackarray
#include <stdio.h>
struct foo { int x; };
struct foo *bar() {
struct foo array[2];
array[0].x=1;
array[1].x=2;
return &array[1];
}
int main(int argc, char* argv[]) {
struct foo* fp;
fp = bar();
printf("foo.x: %d\n", fp->x);
return 0;
}
cc stackarray.c -o stackarray
stackarray.c: In function ‘bar’:
stackarray.c:9:2: warning: function returns address of local variable
If you change the storage duration of array to static, then the address that is being returned is not automatically allocated, and will continue to work even after the function has returned:
$ cat staticstackarray.c ; make staticstackarray ; ./staticstackarray
#include <stdio.h>
struct foo { int x; };
struct foo *bar() {
static struct foo array[2];
array[0].x=1;
array[1].x=2;
return &array[1];
}
int main(int argc, char* argv[]) {
struct foo* fp;
fp = bar();
printf("foo.x: %d\n", fp->x);
return 0;
}
cc staticstackarray.c -o staticstackarray
foo.x: 2
You can see where the memory allocation changes between stackarray and staticstackarray:
$ readelf -S stackarray | grep -A 3 '\.data'
[24] .data PROGBITS 0000000000601010 00001010
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601020 00001020
0000000000000010 0000000000000000 WA 0 0 8
$ readelf -S staticstackarray | grep -A 3 '\.data'
[24] .data PROGBITS 0000000000601010 00001010
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601020 00001020
0000000000000018 0000000000000000 WA 0 0 8
The .bss section in the version without static is 8 bytes smaller than the .bss section in the version with static. Those 8 bytes in the .bss section provide the persistent address that is returned.
So you can see that the case with strings didn't really make a difference -- at least GCC doesn't care -- but for pointers to other types of objects, the static makes all the difference in the world.
However, most functions that return data in function-local static storage have fallen out of favor. strtok(3), for example, extracts tokens from a string, and subsequent calls to strtok(3) pass NULL as the first argument to indicate that the function should re-use the string passed in the first call. This is neat, but it means a program can never tokenize two separate strings simultaneously, and multi-threaded programs cannot reliably use this routine. So a reentrant version is available, strtok_r(3), that takes an additional argument to store information between calls. man -k _r will show a surprising number of functions that have reentrant versions available, and the primary change is reducing static use in functions.
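To illustrate the point about tokenizing two strings at once, here is a minimal sketch using POSIX strtok_r(3). count_matching_tokens is a hypothetical function written for this example; both input strings are modified in place, exactly as strtok would modify them.

```c
#define _POSIX_C_SOURCE 200809L   /* request the POSIX declaration of strtok_r */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: walk two strings token by token in lockstep and
   count the positions where both tokens are equal. Each string has its
   own save pointer, which strtok's single hidden static cannot provide. */
int count_matching_tokens(char *a, char *b, const char *delims)
{
    assert(a != NULL && b != NULL && delims != NULL);
    char *save_a = NULL;
    char *save_b = NULL;
    char *ta = strtok_r(a, delims, &save_a);
    char *tb = strtok_r(b, delims, &save_b);
    int matches = 0;

    while (ta != NULL && tb != NULL) {
        if (strcmp(ta, tb) == 0)
            matches++;
        ta = strtok_r(NULL, delims, &save_a);   /* continue in string a */
        tb = strtok_r(NULL, delims, &save_b);   /* continue in string b */
    }
    return matches;
}
```

With plain strtok the second call on b would overwrite the state kept for a, so the interleaved loop above could not be written.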
4) There are some complicated declaration like
char (*(*x())[])()
I'm really confused as to what is going on. So the x() part means a
function x that returns a pointer? But what kind of pointer does it
return its just a "*" without like int or void or w/e. Or does that
mean a pointer to a function (but I thought that would be like
(*x)())? And then after you add brackets (because I assume brackets
have the next precedence)...what is that? An array of functions?
This kind of ties to my confusion with function pointers. If you have
something like
int (*func)()
That means a pointer to a function that returns an int, and the name
of that pointer is func, but what does it mean when its like int
(*x[3])(). I don't understand how you can replace the pointer name
with an array.
First, don't panic. You'll almost never need anything this complicated. Sometimes it is very handy to have a table of function pointers and call the next one based on a state transition diagram. Sometimes you're installing signal handlers with sigaction(2). You'll need slightly complicated function pointers then. However, if you use cdecl(1) to decipher what you need, it'll make sense:
struct sigaction {
void (*sa_handler)(int);
void (*sa_sigaction)(int, siginfo_t *, void *);
sigset_t sa_mask;
int sa_flags;
void (*sa_restorer)(void);
};
cdecl(1) only understands a subset of C native types, so replace siginfo_t with void and you can see roughly what is required:
$ cdecl
Type `help' or `?' for help
cdecl> explain void (*sa_sigaction)(int, void *, void *);
declare sa_sigaction as pointer to function
(int, pointer to void, pointer to void) returning void
Expert C Programming: Deep C Secrets has an excellent chapter devoted to understanding more complicated declarations, and even includes a version of cdecl, in case you wish to extend it to include more types and typedef handling. It's well worth reading.

This has to do with part 3 and is a kind of reply/addition to sarnold's comment. He's right that, with or without the static, the string literals are always going to be a part of the .data/.rodata segment and are essentially only created once. However, without the static keyword, the actual array, that is, the array of char pointers, will in fact be created on the stack each time the function is called.
With the use of static:
Dump of assembler code for function month_name:
0x08048394 <+0>: push ebp
0x08048395 <+1>: mov ebp,esp
0x08048397 <+3>: cmp DWORD PTR [ebp+0x8],0x0
0x0804839b <+7>: jle 0x80483a3 <month_name+15>
0x0804839d <+9>: cmp DWORD PTR [ebp+0x8],0xc
0x080483a1 <+13>: jle 0x80483aa <month_name+22>
0x080483a3 <+15>: mov eax,ds:0x8049720
0x080483a8 <+20>: jmp 0x80483b4 <month_name+32>
0x080483aa <+22>: mov eax,DWORD PTR [ebp+0x8]
0x080483ad <+25>: mov eax,DWORD PTR [eax*4+0x8049720]
0x080483b4 <+32>: pop ebp
0x080483b5 <+33>: ret
Without the use of static:
Dump of assembler code for function month_name:
0x08048394 <+0>: push ebp
0x08048395 <+1>: mov ebp,esp
0x08048397 <+3>: sub esp,0x40
0x0804839a <+6>: mov DWORD PTR [ebp-0x34],0x8048514
0x080483a1 <+13>: mov DWORD PTR [ebp-0x30],0x8048522
0x080483a8 <+20>: mov DWORD PTR [ebp-0x2c],0x804852a
0x080483af <+27>: mov DWORD PTR [ebp-0x28],0x8048533
0x080483b6 <+34>: mov DWORD PTR [ebp-0x24],0x8048539
0x080483bd <+41>: mov DWORD PTR [ebp-0x20],0x804853f
0x080483c4 <+48>: mov DWORD PTR [ebp-0x1c],0x8048543
0x080483cb <+55>: mov DWORD PTR [ebp-0x18],0x8048548
0x080483d2 <+62>: mov DWORD PTR [ebp-0x14],0x804854d
0x080483d9 <+69>: mov DWORD PTR [ebp-0x10],0x8048554
0x080483e0 <+76>: mov DWORD PTR [ebp-0xc],0x804855e
0x080483e7 <+83>: mov DWORD PTR [ebp-0x8],0x8048566
0x080483ee <+90>: mov DWORD PTR [ebp-0x4],0x804856f
0x080483f5 <+97>: cmp DWORD PTR [ebp+0x8],0x0
0x080483f9 <+101>: jle 0x8048401 <month_name+109>
0x080483fb <+103>: cmp DWORD PTR [ebp+0x8],0xc
0x080483ff <+107>: jle 0x8048406 <month_name+114>
0x08048401 <+109>: mov eax,DWORD PTR [ebp-0x34]
0x08048404 <+112>: jmp 0x804840d <month_name+121>
0x08048406 <+114>: mov eax,DWORD PTR [ebp+0x8]
0x08048409 <+117>: mov eax,DWORD PTR [ebp+eax*4-0x34]
0x0804840d <+121>: leave
0x0804840e <+122>: ret
As you can see in the second example (without static), the array is allocated on the stack each time:
0x08048397 <+3>: sub esp,0x40
and the pointers are loaded into the array:
0x0804839a <+6>: mov DWORD PTR [ebp-0x34],0x8048514
0x080483a1 <+13>: mov DWORD PTR [ebp-0x30],0x8048522
...
So there's obviously a little more to be set up each time the function is called if you decide not to use static.

3) It has nothing to do with that - static creates the array once, as opposed to creating it every time the function runs. Since the data in the array never changes, it is more efficient not to re-create it every time. Your example function would work fine, every time. It's a value. It won't be deleted before you can return it. That would be very unintuitive.

4) Adding some more information in the reply for the 4) point:
I'm using the following book to learn C: C for Pascal Programmers by Norman J. Landis. It's quite old and is meant as a bridge from Pascal to C, but I find it very useful, complete, and explained down to the lowest level of the machine. For me it's an awesome book.
Section 5.3.1 of Appendix A covers precisely this. (The quoted passages are extracted from the book.)
Definition of base type:
The type specifier appearing in the declaration containing the declarator is called the base type
Basically, in bool x => bool is the base type and in int x[] => the base type for the array is int and the base type for the x is array of int.
In order to interpret complex declarators, the following rules apply:
Apply asterisk operators first.
Apply the "function of base type" ( () ) and "array of returning base type" ( [] ) operators afterward, from right to left.
Of course, parentheses may enclose a declarator to alter the order of evaluation.
And here is the same example, with the letter x replaced by w:
How I 'parse' this: char (*(*w())[])();
I go from outside the parentheses to inside, applying the two rules above. Steps:
Outside any parentheses, we find the function declarator. So far, then, we have a function returning a char.
Now we enter the parentheses and process first the pointer and then the array.
The pointer is a pointer to "the upper base type", which is a function returning a char.
So far we have a pointer to a function returning a char.
Next comes the array: it is an array of "the upper base type", and "the upper base type" = pointer to function returning a char.
Now, going into the innermost parentheses, we find a pointer and a function. In the same manner: first the pointer, then the function.
We process the pointer => pointer to an array of pointers to functions returning a char.
And finally the function declarator, and we get: function returning a pointer to an array of pointers to functions returning a char.
I hope it's much clearer now.
You'll need some time and practice to really understand and handle this, but once you get it, it's pretty easy ;)
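To check the reading "function returning a pointer to an array of pointers to functions returning a char" against a compiler, the declaration can be turned into a definition. This is a sketch: first, second, table, char_fn and w_typedef are names invented for the example, and the parameter lists are written as (void) rather than the empty () of the original.

```c
#include <assert.h>

/* Pointer to function returning char: the element type of the array. */
typedef char (*char_fn)(void);

char first(void)  { return 'a'; }
char second(void) { return 'b'; }

/* An array of pointers to functions returning char. */
static char_fn table[2] = { first, second };

/* The declaration from the text, spelled out in full:
   w is a function returning a pointer to an array of
   pointers to functions returning char. */
char (*(*w(void))[])(void)
{
    assert(table[0] != 0);
    return &table;
}

/* The same type written with the typedef, which is usually easier to read. */
char_fn (*w_typedef(void))[]
{
    return &table;
}
```

A call reads from the inside out just like the declaration: (*w())[1]() calls w, dereferences the returned pointer to get the array, picks element 1, and calls that function, which here yields 'b'.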