How does this quine work? - c

I just came across this quine question, but no one really went into how it works: C/C++ program that prints its own source code as its output
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";main(){printf(s,34,s,34);}
What I especially don't understand is the following has the same output even though I changed the ints:
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";main(){printf(s,5,s,11);}
It still prints the 34s! Can someone walk me through this step by step?

Let's start by formatting the code to span multiple lines. This breaks the fact that it's a quine, but makes it easier to see what's happening:
char* s = "char*s=%c%s%c;main(){printf(s,34,s,34);}";
main() {
printf(s, 34, s, 34);
}
Essentially, this is a definition of a string s that serves as a printf format string, followed by a definition of a function main that calls printf with four arguments. (This definition of main relies on the old-fashioned "implicit int" rule in C, where a function is assumed to return int unless specified otherwise. That rule was removed from the language in C99, although many compilers still accept it with a warning, and it has never been legal C++.)
So what exactly is this printf call doing? Well, it might help to note that 34 is the ASCII code for a double-quote, so the line
printf(s, 34, s, 34);
is essentially
printf(s, '"', s, '"');
This means "print the string s with arguments ", s, and "." So what's s? It's shown here:
char* s = "char*s=%c%s%c;main(){printf(s,34,s,34);}";
This follows a common self-reference trick. Ignoring the %c%s%c part, this is basically a string representation of the rest of the program. The %c%s%c part occurs at the point where it becomes self-referential.
So what happens if you call printf(s, '"', s, '"')? This will fill in the placeholder %c%s%c with "char*s=%c%s%c;main(){printf(s,34,s,34);}", which is the contents of the string s wrapped in quotes. Combined with the rest of the string s, this therefore prints
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";main(){printf(s,34,s,34);}
which is the source code of the program. I think this is kinda neat - the closest English translation of a general Quine program I know is "print this string, the second time in quotes" (try it - see what happens!), and this basically does exactly that.
You were asking why changing the numbers to 5 and 11 didn't change the fact that 34 was being printed. That's correct! The string literal s has 34 hard-coded into it, so changing the 5 and 11 in the call to printf won't change that. What it will change is the two characters around the inner copy of s: instead of quotation marks, the program now prints the characters with codes 5 and 11, which are non-printing ASCII control characters.

Related

Quine Program Example in C [duplicate]

This question already has answers here:
C/C++ program that prints its own source code as its output
(10 answers)
Closed 3 years ago.
In my course slides, I have this example but without much explanation:
char*f="char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c";main(){printf(f,34,f,34,10);}
I understand what quine programs in general mean but I do not quite understand what's happening in the code above. This is the output I get if I run it:
char*f="char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c";main(){printf(f,34,f,34,10);}
But how is it reproducing its own code? I don't really understand how the output is produced.
Start by writing it out in a way that'll be clearer (not changing anything but the layout):
char*f="char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c";
main()
{
printf(f,34,f,34,10);
}
So we see a main function as we'd expect (it should return an int but you're allowed to get away with not in C; likewise for no function arguments).
And before that, a regular string. It's a funny-looking string, but it is not really that different to char*f="fish";.
Okay, so what if we expand the printf by putting the string in there by hand?
printf("char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c" ,34,f,34,10);
We can see that it's going to print out some guff, and substitute in some values along the way. They are:
First %c : 34 (the ASCII code for " (quotes))
Only %s : f (our string, once again)
Second %c : 34 (" again)
Third %c : 10 (the ASCII code for Newline)
Let's substitute those all in then too (though I've replaced the contents of the string with <the string>, and "'s with \"'s to make it actually work as a standalone statement):
main()
{
printf("char*f=\"<the string>\";main(){printf(f,34,f,34,10);}\n");
}
Well look at that! main simply prints out the line we first started with. Hurrah!
Edited to add:
Although I've basically spelled out the answer for you, there is still a puzzle remaining. Consider for yourself why we bother substituting in the 34, f, 34, 10, rather than just putting them directly into the string like I did in my final code.

Reason behind the following output

#include <stdio.h>
char *prg = "char *prg = %c%s%c;main(){printf(prg,34,prg,34);} " ;
void main (){
printf(prg,34,prg,34);
}
Reason behind the following output
char *prg = "char *prg = %c%s%c;main(){printf(prg,34,prg,34);} ";main(){printf(prg,34,prg,34);}
The content of *prg is printed, but in place of "%c%s%c" other content appears. Why is it embedded there?
That's the way printf() works - in the format (the first prg argument), each conversion specification (%c, %s) converts a subsequent argument (34, prg) according to the conversion specifier (c, s) and writes the result to the standard output.
That program is a classic example of a program that prints its own source code to stdout. 34 is the ASCII code for ", which is needed to print the delimiters of a C string without running into an infinite regress. Some characters cannot be used literally, because the compiler transforms them. One of these is ", which delimits a string literal and is not part of its contents; others are escape sequences such as \n and \t, which the compiler converts into a single, different character. This is the reason the source code must be all on one line (control characters are transformed by the compiler), no #include ... directives are allowed (because each one must end in a newline), and so on.
Compile it, and rearrange it (you modified it for clarity when posting) so that it outputs exactly the same source code that you give the compiler.
Note
The program code, to mimic exactly its source form in the output, must be written as:
char*p="char*p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
without a trailing newline at the end of the line.
If you substitute 34 by '\"', and p by its value, you'll get:
...
printf(
"char*p=%c%s%c;main(){printf(p,34,p,34);}",
'\"',
"char*p=%c%s%c;main(){printf(p,34,p,34);}",
'\"');
that will form the original string, putting the right delimiters at the proper positions. (note: the %c and %s in the third parameter are not further expanded)
Note2
This program depends on the ASCII encoding of ". It shouldn't work on EBCDIC systems (there you must use the EBCDIC code for " instead of the number 34).

Exploits in use of Externally-Controlled Format String

First, please look through this page (https://cwe.mitre.org/data/definitions/134.html). It shows some code that has a vulnerability, but I don't really understand where the vulnerable code they are talking about is.
The page presents three vulnerable code snippets: PrintWrapper, snprintf, and %1$d.
@CassieJade, you need to look at the documentation of these functions online.
printf and snprintf are pretty common functions. And by the way, this platform is not for school assignments. You are most welcome if you have tried something and want to continue from there.
http://www.cplusplus.com/reference/cstdio/printf/
http://www.cplusplus.com/reference/cstdio/snprintf/
The following explains beautifully about your concern of $.
(GCC) Dollar sign in printf format string
The notation %2$d means the same as %d (output a signed integer), except that it formats the argument with the given 1-based number (in your case the second argument, b).
int a = 3, b = 2;
printf("%2$d %1$d", a, b);
Here you would expect 3 2 to be printed, but it will print 2 3, because a becomes argument #1 and b becomes argument #2; %2$d comes first, so 2 is printed first, followed by %1$d, which is 3.
You may want to look at the man page of printf. It's a bit complex for newcomers, but it's the final source of truth.
The following is your print wrapper.
char buf[5012];
memcpy(buf, argv[1], 5012);
printWrapper(argv[1]);
return (0);
Your website says: When an attacker can modify an
externally-controlled format string, this can lead to buffer
overflows, denial of service, or data representation problems.
Now, if argv[1] can be provided by someone who is not trusted, they can supply any junk argument, which will go straight to printf. The goal of your task is to never feed printf() a format string that is externally controlled.
E.g. argv[1] can be a very long string (up to the maximum allowed).
Or, for example, if I am the one invoking your program and I pass "%d Hello World" as argv[1], your printWrapper will end up printing some junk like "-446798072 Hello World", because no integer is passed as an argument in printf(argv[1]).
Also, memcpy reads a fixed number of bytes from argv[1], which may hold a shorter string; in that case it is an invalid read (a read past the bounds).
snprintf(buf,128,argv[1]);
The exploit here is very clear: argv[1] can contain conversion specifiers such as %n, which stores the number of characters written so far through a pointer argument. With no matching argument supplied, that write goes to whatever address happens to be picked up in its place. By using %X in argv[1], an attacker can read values off the stack, which can be exploited further. All of this is vulnerable because an external, untrusted source is creating the format string that is used by your printf, snprintf, or sprintf call.
For example, suppose an attacker passes "%200d" in argv[1]. snprintf(buf,128,argv[1]);
will try to format a junk integer padded with spaces to a width of 200 characters, which is probably not intended at all. Since snprintf is a bounded function, it writes at most 127 characters plus the terminating null into buf, and in this case those characters are all padding spaces.
I hope it is clear now.

printf() working without double quotes, prints random characters [duplicate]

This question already has answers here:
Behaviour of printf when printing a %d without supplying variable name
(6 answers)
Closed 4 years ago.
I stumbled upon this C code in a college test and when I tested it on Dev-C++ 5.11 compiler, it printed random characters. I can't understand how or why this code works. Can someone enlighten me?
int main() {
char a[10] = "%s" ;
printf( a ) ;
}
This question has two parts: the missing quotes and the random characters.
printf() is just a function. You can pass strings and other values to functions as arguments. You don't have to use a literal. You can use both char *a = "something"; printf(a) (passing a variable as an argument) and printf("something") (passing a string literal as an argument).
printf() is also a variadic function. This means it can accept any number of arguments. You can use printf("hello world"), printf("%s", "hello world") and even printf("%s %s", "hello", "world"). Some older compilers don't verify you actually passed the right number of arguments based on the first argument which is the format string. This is why your code compiles even though it's missing an argument. When the program runs the code goes over the format string, sees "%s" and looks for the second argument to print it as a string. Since there is no second argument it basically reads random memory and you get garbage characters.
printf function signature is:
int printf(const char *format, ...);
It expects a format string as the first argument, followed by a variable number of arguments that are handled and printed according to the format specifiers in the format string. The variable a in your question provides the format string. The reason for the random characters is that the argument for the format specifier %s is missing. The following will correctly print a string:
printf( a, "Hello World!" );
A list of format specifiers can be seen here https://en.wikipedia.org/wiki/Printf_format_string
Why does it compile?
Because variadic arguments accepted by printf are processed at run time. Not all compilers do compile time checks for validating arguments against the format string. Even if they do they would at most throw a warning, but still compile the program.
It's using the string "%s" as a format string, and using uninitialized memory as the "data".
The only reason it does "something" is because the compiler was apparently not smart enough to recognize that the format string required one parameter and zero parameters were supplied. Or because compiler warnings were ignored and/or errors were turned off.
Just an FYI for anybody who bumps into this: "Always leave all warnings and errors enabled and fix your code until they're gone" This doesn't guarantee correct behaviour but does make "mysterious" problems less likely.

String behavior in C

I want to understand a number of things about strings in C:
I can't understand why you cannot change a string with a normal assignment (only through the functions of string.h). For example, I can't do d="aa" (where d is a pointer to char or an array of char).
Can someone explain to me what's going on behind the scenes? The compiler lets such a thing run, and you get a segmentation fault.
Something else: I ran a program in C that contains the following lines:
char c='a',*pc=&c;
printf("Enter a string:");
scanf("%s",pc);
printf("your first char is: %c",c);
printf("your string is: %s",pc);
If I enter more than 2 letters (in scanf) I get a segmentation fault. Why is this happening?
If I enter two letters, the first letter is printed correctly, but the string is printed with a lot of extra characters (incorrect).
If I enter one letter, the letter is printed correctly, but the string is printed with a lot of extra characters and something weird at the end (a square with four numbers containing zeros and ones).
Can anyone explain what is happening behind the scenes?
Please note: I do not want the program to work, I did not ask the question to get suggestions for another program, I just want to understand what happens behind the scenes in these situations.
Strings almost do not exist in C (except as C string literals like "abc" in some C source file).
In fact, strings are mostly a convention: a C string is an array of char whose last element is the zero char '\0'.
So declaring
const char s[] = "abc";
is exactly the same as
const char s[] = {'a','b','c','\0'};
in particular, sizeof(s) is 4 (3+1) in both cases (and so is sizeof("abc")).
The standard C library contains a lot of functions (such as strlen(3) or strncpy(3)...) which obey and/or presuppose the convention that strings are zero-terminated arrays of char-s.
Better code would be:
char buf[16]="a",*pc= buf;
printf("Enter a string:"); fflush(NULL);
scanf("%15s",pc);
printf("your first char is: %c",buf[0]);
printf("your string is: %s",pc);
Some comments: be afraid of buffer overflow. When reading a string, always give a bound to the read string, or else use a function like getline(3) which dynamically allocates the string in the heap. Beware of memory leaks (use a tool like valgrind ...)
When computing a string, be also aware of the maximum size. See snprintf(3) (avoid sprintf).
Often, you adopt the convention that a string is returned and dynamically allocated in the heap. You may want to use strdup(3) or asprintf(3) if your system provides it. But you should adopt the convention that the calling function (or something else, but well defined in your head) is free(3)-ing the string.
Your program can be semantically wrong and, by bad luck, still happen to work sometimes. Read carefully about undefined behavior. Avoid it absolutely (your points 1, 2, 3 are probably UB). Sadly, UB may sometimes happen to "work".
To explain some actual undefined behavior, you have to take into account your particular implementation: the compiler, the flags (notably optimization flags) passed to the compiler, the operating system, the kernel, the processor, the phase of the moon, etc. Undefined behavior is often not reproducible (e.g. because of ASLR); read about heisenbugs. To explain the behavior of points 1, 2, 3 you need to dive into implementation details; look into the assembler code (gcc -S -fverbose-asm) produced by the compiler.
I suggest that you compile your code with all warnings and debugging info (e.g. using gcc -Wall -g with GCC), improve the code until you get no warnings, and learn how to use the debugger (e.g. gdb) to run your code step by step.
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
Because memory is allocated for only one byte.
Look at char c being given "a": that string is 'a' followed by '\0', and it is being written into a one-byte memory location.
If scanf() uses this memory for reading more than one byte, then this is simply undefined behavior.
char c="a"; is a wrong declaration in C: even a single character enclosed in a pair of double quotes ("") is treated as a string, namely "a\0", since every string ends with a '\0' null character.
char c="a"; is wrong, whereas char c='c'; is correct.
Also note that the memory allocated for a char is only 1 byte, so it can hold only one character; memory allocation details for data types are described below.

Resources