How to change the local variable without its reference - c

Interview question: change the value of a local variable without passing a reference to it as a function argument and without returning a value from the function.
void func()
{
/*do some code to change the value of x*/
}
int main()
{
int x = 100;
printf("%d\n", x); // it will print 100
func(); // returns no value, and no reference to x is passed
printf("%d\n", x); // it needs to print 200
}
The value of x needs to be changed.

The answer is that you can’t.
The C programming language offers no way of doing this, and attempting to do so invariably causes undefined behaviour. This means that there are no guarantees about what the result will be.
Now, you might be tempted to exploit undefined behaviour to subvert C’s runtime system and change the value. However, whether and how this works entirely depends on the specific executing environment. For example, when compiling the code with a recent version of GCC or clang with optimisation enabled, the variable x simply ceases to exist in the output code: there is no memory location corresponding to its name, so you can’t even directly modify a raw memory address.
In fact, the above code yields roughly the following assembly output:
main:
subq $8, %rsp
movl $100, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
call func
movl $100, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
ret
As you can see, the value 100 is a literal directly stored in the ESI register before the printf call. Even if your func attempted to modify that register, the modification would then be overwritten by the compiled printf call:
…
movl $200, %esi /* This is the inlined `func` call! */
movl $100, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
…
However you dice it, the answer is: There is no x variable in the compiled output, so you cannot modify it, even accepting undefined behaviour. You could modify the output by overriding the printf function call, but that wasn’t the question.

By the design of the C language, and by the definition of a local variable, you cannot access it from outside without making it available in some way.
Some ways to make a local variable accessible to the outside world:
send a copy of it (the value);
send a pointer to it (don't save and use the pointer for too long, since the variable may be removed when its scope ends);
export it with extern if the variable is declared at file level (outside of all functions).

Hack
Only changing code in void func(), create a define.
Akin to @chqrlie's answer.
void func()
{
/*do some code to change the value of x*/
#define func() { x = 200; }
}
int main()
{
int x = 100;
printf("%d\n", x); // it will print 100
func(); // returns no value, and no reference to x is passed
printf("%d\n", x); // it needs to print 200
}
Output
100
200

The answer is that you can’t, but...
I perfectly agree with what @virolino and @Konrad Rudolph said, and I don't want my "solution" to this problem to be recognised as best practice, but since this is some sort of challenge, one can come up with this approach.
#include <stdio.h>
static int x;
#define int
void func() {
x = 200;
}
int main() {
int x = 100;
printf("%d\n", x); // it prints 100
func(); // returns no value, and no reference to x is passed
printf("%d\n", x); // it prints 200
}
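The define replaces int with nothing, so the line int x = 100; in main becomes the plain assignment x = 100; and refers to the file-scope static x rather than a new local variable. This compiles with a warning, since the line int main() { is now only main() {. It only compiles because of the old implicit int rule for declarations without a type, which was removed from the standard in C99 but is still accepted by some compilers.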

This approach is hacky and fragile, but that interviewer is asking for it. So here's an example for why C and C++ are such fun languages:
// Compiler would likely inline it anyway and that's necessary, because otherwise
// the return address would get pushed onto the stack as well.
inline
void func()
{
// volatile not required here as the compiler is told to work with the
// address (see lines below).
int tmp;
// With the line above we have pushed a new variable onto the stack.
// "volatile int x" from main() was pushed onto it beforehand,
// hence we can take the address of our tmp variable and
// decrement that pointer in order to point to the variable x from main().
*(&tmp - 1) = 200;
}
int main()
{
// Make sure that the variable doesn't get stored in a register by using volatile.
volatile int x = 100;
// It prints 100.
printf("%d\n", x);
func();
// It prints 200.
printf("%d\n", x);
return 0;
}

Boring answer: I would use a straightforward, global pointer variable:
int *global_x_pointer;
void func()
{
*global_x_pointer = 200;
}
int main()
{
int x = 100;
global_x_pointer = &x;
printf("%d\n", x);
func();
printf("%d\n", x);
}
I'm not sure what "sending reference" means. If setting a global pointer counts as sending a reference, then this answer obviously violates the stated problem's curious stipulations and isn't valid.
(On the subject of "curious stipulations", I've sometimes wished SO had another tag, something like driving-screws-with-a-hammer, because that's what these "brain teasers" always make me think of. Perfectly obvious question, perfectly obvious answer, but no, gotcha, you can't use that answer, you're stuck on a desert island and your C compiler's for statement got broken in the shipwreck, so you're supposed to be MacGyver and use a coconut shell and a booger instead. Occasionally these questions can demonstrate good lateral thinking skills and are interesting, but most of the time, they're just dumb.)

Related

Why isn't "Return" necessary?

The following code is for the lab https://cs50.harvard.edu/x/2021/labs/1/population/.
#include <cs50.h>
#include <stdio.h>
int main(void)
{
//Prompt for start size
int startPop;
do
{
startPop = get_int("Starting population: ");
}
while (startPop < 9);
//Prompt for end size
int endPop;
do
{
endPop = get_int("Ending population: ");
}
while (endPop < startPop);
int Years = 0;
while (startPop < endPop)
{
startPop = startPop + (startPop/3) - (startPop/4);
Years++;
}
printf("Total Years: %i", Years);
}
Why isn't return used after each integer is received? Like this
int startPop;
do
{
startPop = get_int("Starting population: ");
}
while (startPop < 9);
return startPop;
How do I know when and where to use it? What is the purpose of return?
It seems like every time I try to solve a problem I am completely off base and don't even know where/how to begin, even after hours of thinking about it and then the solution also mystifies me.
return will exit the function that you are currently in, and the remainder of the function will not execute. It may also return a value from that function, if it is a non-void function. Here is an example:
void foo(void); // declare foo before it is called
int main(void)
{
foo();
}
void foo(void)
{
//...some code...
return;
//...some more code...
}
In this case foo will be called and will run until it hits the return, at which point execution returns to main, and all the code in foo after the return will not execute.
In your program, there is only one function (main), and calling return from main will exit the program (not what we want in this case).
The return XX at the end of main is not necessary because the C standard says so:
C18 §5.1.2.2.3 Program termination
... reaching the } that terminates the main function returns a value of 0.
In other words, if there is no return xx the int main() function is presumed to return 0.
For all other functions that return a value, the function must explicitly return a value.
For void functions, the return statement is optional.
return is the C statement that the compiler typically translates, on x86, into a ret instruction.
What is a ret instruction? From Intel's x86 manual:
RET — Return from Procedure
Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction.
When you call and return from a function (here called a procedure), in reality you are executing the following code at assembly level:
push rbp // push rbp (base pointer) to the stack
mov rbp, rsp // store rsp (stack pointer) into rbp.
...
mov rsp, rbp // get back the value of rsp
pop rbp // pop rbp from the stack
ret
"The caller" is the original function in which the call to the other function, "the callee" happens.
Note that by default the C language will assume (rightfully) that every function has a return point, and that is why for void functions, as well as for main, a written return is not necessary.
The return along with a value (let say 20) is translated as (intel syntax):
mov eax, 20
ret
What is the relationship with your question?
In your example, there is no call to a callee function; you are executing code inside your function (the same as what would happen if you added "inline"), so the variable you are currently calculating, startPop, doesn't need a return.
PS: On a side note, having a do while with = does not make a lot of sense.

for(;...) or while(...) flow control? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Which one, 1 or 2, is better in any way (whatever can be considered better)? Are they exactly the same?
void method1(char **var1) {
//the last element of var1 is NULL
char **var2 = var1;
int count = 0;
//1
for (; *var2; (*var2)++, count++);
//2
while(*var2) {
(*var2)++;
count++;
}
}
You could examine the asm output at different optimization levels with your compiler, or just not worry about constructs that are semantically the same.
...
LBB0_1: ## =>This Inner Loop Header: Depth=1
movq -16(%rbp), %rax
cmpq $0, (%rax)
je LBB0_4
## BB#2: ## in Loop: Header=BB0_1 Depth=1
jmp LBB0_3
LBB0_3: ## in Loop: Header=BB0_1 Depth=1
movq -16(%rbp), %rax
movq (%rax), %rcx
addq $1, %rcx
movq %rcx, (%rax)
movl -20(%rbp), %edx
addl $1, %edx
movl %edx, -20(%rbp)
jmp LBB0_1
LBB0_4:
...
.subsections_via_symbols
method2:
...
LBB0_1: ## =>This Inner Loop Header: Depth=1
movq -16(%rbp), %rax
cmpq $0, (%rax)
je LBB0_3
## BB#2: ## in Loop: Header=BB0_1 Depth=1
movq -16(%rbp), %rax
movq (%rax), %rcx
addq $1, %rcx
movq %rcx, (%rax)
movl -20(%rbp), %edx
addl $1, %edx
movl %edx, -20(%rbp)
jmp LBB0_1
LBB0_3:
...
.subsections_via_symbols
Purpose of the code in question
Your code seems to be entirely wrong as it increments the target of var2 pointer, which also serves for ending the loop. You cannot expect an incrementing value to reach zero. I will assume that (1) you wanted to increment the temporary pointer to iterate over a list (technically an array) of character strings and (2) that you expect a NULL pointer as a sentinel.
Detailed explanation of the pointer incrementation issue
So what is the logic of the code we are writing? It takes an array of strings (lines in a file, list of names, etc...), counts the items, and then does whatever else you need to do. The input argument is represented by a pointer to pointer to char, which can be a bit confusing for the beginner. Pointers are used for multiple purposes in C and one is to point to the first item of a list (technically array). This is the case of the list pointer (type char **) which points to an array of pointers (type char * each) which in turn point to an array of byte/character values (type char each).
Therefore you need to increment a local char ** pointer to iterate over the items and a temporary char * pointer to iterate over characters of an item. If you just want to read data, you must never increment anything else than local (temporary) variables. Incrementing *item is nonsense and would alter the data in a bad way (the pointer would point to the second character instead of the first one), and checking the incremented pointer for being NULL is a double nonsense.
In other words, the idiom of iterating over an array using a temporary pointer requires the following actions:
Increment the temporary pointer (and nothing else) at each step.
Check the target of the pointer (and not the pointer itself) for the sentinel value.
Corrected code examples
Using C99 syntax, you probably wanted to do something like:
void method1(char **list) {
size_t count = 0;
for (char **item = list; *item; item++)
count++;
...
}
The older syntax is forcing you to do:
void method1(char **list) {
char **item;
size_t count = 0;
for (item = list; *item; item++)
count++;
...
}
A more intuitive version for people not fluent in pointers:
void method1(char **list) {
size_t count = 0;
for (size_t i = 0; list[i]; i++)
count++;
...
}
Note: The count is redundant as its value is kept the same as the value of i, so you could just do for (; list[count]; count++) with an empty body or while (list[count]) count++;.
A real function to just count the items would be:
size_t get_size(char **list)
{
size_t count = 0;
for (char **item = list; *item; item++)
count++;
return count;
}
Of course it could be simplified to (borrowing from other answer):
size_t get_size(char **list)
{
size_t count = 0;
for (; *list; list++)
count++;
return count;
}
Thanks to very specific circumstances where (1) it's easy to merge the condition and the increment and (2) you're not using the current item in the body, it can be turned to:
size_t get_size(char **list)
{
size_t count = 0;
while (*list++)
count++;
return count;
}
Attempt to answer the for versus while dilemma
While technically the while and for loops are equivalent, the for loop expresses the iteration idiom way better, as it keeps the iteration logic separate from the rest of the code and thus also makes it more reusable, i.e. you can use the same for header with a different body for any other iterative action on the list.
Bad usage of the for loop in the original code
A number of practices should be discouraged:
1) Don't modify the object from the for loop header.
for (... ; ...; (*item)++)
...
Any code matching the above pattern modifies the target object instead of performing the looping logic, whenever item is a temporary pointer to the actual data.
2) Don't decouple any non-looping code from the for loop header.
char **item = list;
...
for (; *item; *item++)
count++;
The assignment before the for loop seems out of place. If you copy-pasted the header of the for loop to iterate again over all list items, the list would seem empty because of the omitted initialization.
3) Don't perform any per-item actions in the increment of the for loop header.
for (char **item = list; *item; item++, count++)
;
The count++ here doesn't help the looping at all, instead it performs an actual action (counting one item). If you copy-pasted the header of the for loop and added an actual body, the count would get modified.
4) Don't use non-descriptive for arguments, use simple names for temporary variables.
for (char **var2 = var1; *var2; var2++)
count++;
The two variables differ in their purpose, yet their names are almost the same, only distinguished by a number. How exactly you name them is a matter of context and preference.
Note: Some people also prefer explicit comparison to NULL instead of relying on boolean evaluation of pointers. I'm not one of them, though. Stack Exchange seems to highlight list as a keyword but I don't think there's such a keyword in C or C++.
I would prefer the for loop, if you initialize var2 as the first argument of the for loop, i.e.
for(char **var2 = var1; *var2; var2++)
because then all conditions (initial, terminal, increment) are located in one place
I would also prefer to make the test explicit, i.e.,
for(char **var2 = var1; *var2 != NULL; var2++)
because it makes the terminal condition more visible.
Next: I would not place count++ in the for loop, because if count is not modified inside the loop it is redundant and can be calculated from var2 - var1. If count is modified inside the loop it should be done at a single spot.
But I assume this is a matter of taste only.
Both are probably the same; the compiler should not treat them differently.
First of all, both loops are wrong. They make no sense. I think you mean the following
int count = 0;
while ( *var1++ ) ++count;
It is the loop I would use.
Or, if you do not want var1 to be changed:
int count = 0;
for ( char **p = var1; *p; ++p ) ++count;
Also you could write
char **p = var1;
while ( *p ) ++p;
int count = p - var1;
You should make the loop condition stronger and more explicit to avoid bugs and infinite loops. Which one is better depends on your logic and code: a for loop is quicker to write and easier to read, but if the loop needs more complex logic, use a while loop.

Is it possible to wrap shellcode in a C function such that control is returned to the caller after completion?

Suppose I have some arbitrary x86 instructions that I want to have executed in the context of some program, and I convert these instructions automatically or manually into shellcode. For example, the following instructions.
movq $1, %rax
cpuid
There are various questions, such as here and here, about casting shellcode to a function pointer and executing it by using a standard function invocation. However, arbitrary asm will generally not have the instructions to return to the caller after all the instructions have been completed.
I am interested in writing an "interpreter" of sorts for arbitrary shellcode, so that it can execute a bunch of instructions (perhaps they are in a file somewhere), read out the value of certain registers, and return control to the main C program. I assume the shellcode does not do something like exec and change the process, but merely runs instructions like rdpmc or cpuid.
I imagine something that looks like this, but I am not sure how I can patch the shellcode so that it returns control to the right place.
void executeAndReadRegisters(char* shellcode, int length, uint64_t* rax, uint64_t* rbx, uint64_t* rcx) {
// Modify the shellcode in some way so that it returns control to the
// current program's code after execution, right after "read out registers".
char* modifiedShellCode = malloc((length + EXTRA_NEEDED) * sizeof(char));
// How do I modify the shellcode to return to "Read out registers?"
int (*func)();
func = (int (*)()) modifiedShellCode;
(int)(*func)();
// Read out registers
asm("\t movq %%rax,%0" : "=r"(*rax));
asm("\t movq %%rbx,%0" : "=r"(*rbx));
asm("\t movq %%rcx,%0" : "=r"(*rcx));
}
int main(int argc, char **argv)
{
// Suppose this comes from a file somewhere
char shellcode[] = "...";
int length = ; // Get from external source
uint64_t rax,rbx,rcx;
executeAndReadRegisters(shellcode, length, &rax,&rbx, &rcx);
printf("%lu %lu %lu\n", rax,rbx,rcx);
}

Segmentation fault creating a user-level thread with C and assembly

I am trying to understand some OS fundamentals using some assignments. I have already posted a similar question and got satisfying answers. This one is slightly different, and I haven't been able to debug it. So here's what I do:
What I want to do is to start a main program, malloc a space, use it as a stack to start a user-level thread. My problem is with return address. Here's the code so far:
[I'm editing my code to make it up-to-date to the current state of my answer ]
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#define STACK_SIZE 512
void switch_thread(int*,int*);
int k = 0;
void simple_function()
{
printf("I am the function! k is: %d\n",k);
exit(0);
}
void create_thread(void (*function)())
{
int* stack = malloc(STACK_SIZE + 32);
stack = (int* )(((long)stack & (-1 << 4)) + 0x10);
stack = (int* ) ((long)stack + STACK_SIZE);
*stack = (long) function;
switch_thread(stack,stack);
}
int main()
{
create_thread(simple_function);
assert(0);
return 0;
}
switch_thread is an assembly code I've written as follows:
.text
.globl switch_thread
switch_thread:
movq %rdi, %rsp
movq %rsi, %rbp
ret
This code runs really well under GDB and gives the expected output (which is passing control to simple_function and printing "I am the function! k is: 0"). But when run separately, it gives a segmentation fault. I'm baffled by this result.
Any help would be appreciated. Thanks in advance.
Two problems with your code:
Unless your thread is actually inside a proper procedure (or a nested procedure), there's no such thing as "base pointer". This makes the value of %rbp irrelevant since the thread is not inside a particular procedure at the point of initialization.
Contrary to what you think, when the ret instruction gets executed, the value that %rsp is referring to becomes the new value of the program counter. This means that instead of *(base_pointer + 1), *(base_pointer) will be consulted when it gets executed. Again, the value of %rbp is irrelevant here.
Your code (with minimal modification to make it run) should look like this:
void switch_thread(void *stack_pointer, void (*entry_point)());
void create_thread(void (*function)())
{
char *stack_pointer = malloc(STACK_SIZE + 8);
stack_pointer += STACK_SIZE; // char * arithmetic moves by bytes; you'd probably want to back up the original allocated address if you intend to free it later for any reason.
switch_thread(stack_pointer, function);
}
Your switch_thread routine should look like this:
.text
.globl switch_thread
switch_thread:
mov %rsp, %rax //move the original stack pointer to a scratch register
mov %rdi, %rsp //set stack pointer
push %rax //back-up the original stack pointer
call *%rsi //call the function (indirect call through the register)
pop %rsp //restore the original stack pointer
ret //return to create_thread
FYI: If you're initializing a thread on your own, I suggest that you first create a proper trampoline that acts as a thread entry point (e.g. ntdll's RtlUserThreadStart). This will make things much cleaner especially if you want to make your program multithreaded and also pass in any parameters to the start routine.
base_pointer needs to be suitably aligned to store void (*)() values, otherwise you're dealing with undefined behaviour. I think you mean something like this:
void create_thread(void (*function)())
{
size_t offset = STACK_SIZE + sizeof function - STACK_SIZE % sizeof function;
char *stack_pointer = malloc(offset + sizeof function);
void (**base_pointer)() = (void (**)())(stack_pointer + offset);
*base_pointer = function;
switch_thread(stack_pointer,base_pointer);
}
There is no need to cast malloc. It's generally a bad idea to cast pointers to integer types, or function pointers to object pointer types.
I understand that this is all portable-C nit-picky advice, but it really does help to write as much of your software as possible in portable code rather than relying upon undefined behaviour.

How can I print the contents of stack in C program?

I want to, as the title says, print the contents of the stack in my C program.
Here are the steps I took:
I made a simple assembly (helper.s) file that included a function to return the address of my ebp register and a function to return the address of my esp register
.globl get_esp
get_esp:
movl %esp, %eax
ret
# get_ebp is defined similarly, and included in the .globl section
I called the get_esp () and get_ebp () functions from my C program ( fpC = get_esp (); where fpC is an int)
I (successfully, I think) printed the address of my esp and ebp registers ( fprintf (stderr, "%x", fcP); )
I tried, and failed to, print out the contents of my esp register. (I tried fprintf (sderr, "%d", *fcP); and fprintf (sderr, "%x", *((int *)fcP));, among other methods). My program hits a segmentation fault at runtime when this line is processed.
What am I doing wrong?
EDIT: This must be accomplished by calling these assembly functions to get the stack pointers.
EDIT2: This is a homework assignment.
If you're using a GNU system, you may be able to use GNU's extension to the C library for dealing with backtraces, see here.
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
//call-a-lot-of-functions
}
void someReallyDeepFunction(void)
{
int count;
void *stack[50]; // can hold 50, adjust appropriately
char **symbols;
count = backtrace(stack, 50);
symbols = backtrace_symbols(stack, count);
for (int i = 0; i < count; i++)
puts(symbols[i]);
free(symbols);
}
get_esp returns esp as it is within the function. But this isn't the same as esp in the calling function, because the call operation changes esp.
I recommend replacing the function with a piece of inline assembly. This way esp won't change as you try to read it.
Also, printing to sderr wouldn't help. From my experience, stderr works much better.

Resources