Buffer Overflow skipping newline - c

I was experimenting with a stack buffer overflow:
#include <stdio.h>
#include <string.h>
struct something {
    char A[8];
    unsigned short B;
};
int main(void) {
    struct something st;
    st.B = 1979;
    strcpy(st.A, "excessive"); /* deliberately copies 10 bytes into an 8-byte array */
    printf("%d\n", st.B);
    printf("%s", st.A);
    return 0;
}
This worked very well. I got the output as
101
excessive
But the string "excessive" takes 10 bytes including the null terminator, so it should still overflow if I change the array to char A[9]. In that case, however, it doesn't overflow and the program prints the original value of B. Why is that? Where is the \0 going?

Memory layout follows what is known as natural alignment, i.e. a scalar object is typically aligned to a multiple of its own size.
unsigned short B is 2 bytes wide (on x86), so its address is aligned to a multiple of 2. When you have A[8], the array occupies an even number of bytes, and B starts at the byte just after the last byte of A. That is how your string overflowed into B's memory.
But A[9] occupies an odd number of bytes, so B cannot start at the next byte without breaking the alignment. The compiler inserts a padding byte to ensure that B has an address that is a multiple of 2. This extra byte gives the \0 character somewhere to land, hence the lucky escape. The write still goes past the end of A, so it remains undefined behaviour as far as the C standard is concerned; it just happens not to touch B.

With char A[8], the '\0' is written into the struct member B, which also explains why the first printf statement prints 101. The program appears to work because strcpy is not smashing the stack: it does not overrun the memory allocated for the structure variable st. The C standard guarantees that structure members are stored in the order they are declared. However, where necessary, padding is inserted between members (and after the last one) to ensure correct alignment.
Here, the strcpy call
strcpy(st.A, "excessive");
overruns the member array A and also writes into the member B. Luckily, the structure st is just large enough to hold the string literal "excessive": on my 32-bit machine sizeof st is 10, and strcpy also copies 10 bytes. That also explains why the statement
printf("%d\n", st.B);
prints 101 and not 1979: the contents of B have been overwritten by the strcpy call. The final 'e' of the string (ASCII 101) lands in the first byte of B and the '\0' in the second, which reads back as 101 on a little-endian machine. If you try to copy a string longer than 9 characters, or if you had declared the array char A[8]; after unsigned short B;, the call to strcpy would write outside st, which is undefined behaviour and will most likely crash the program with a segfault.
Also, '\0' is called the null character or the null byte, not the newline character which is '\n'.

Related

memory space required for char definition

char txt[20] = "Hello World!\0";
How many bytes are allocated by the above definition?
Considering one char occupies 1 byte, one int 2 bytes.
Note that there is only one ", and \0 at the end.
How do I calculate how many bytes the above definition occupies?
The statement char txt[20]="Hello World!\0" actually comprises two parts, a definition and an initialization. The definition, char txt[20], tells the compiler to reserve 20 elements of type char (20 bytes), regardless of the content you initialize the array with. The initialization, ="Hello World!\0", then prefills the reserved memory with the characters of the literal Hello World!\0; any remaining elements are set to zero. Note that it is not necessary to write \0 explicitly, since a string literal is already terminated by a \0 character. So you should write char txt[20]="Hello World!". It is fine for the string literal to be shorter than the array. If the literal is longer than the array, you get at least a compiler warning.
Note, however, that if you write char txt[]="Hello World!", the memory reserved will be exactly the size of the string literal, including its terminating \0 (13 bytes here).
Concerning array initialization, you might consult cppreference.com. Concerning the discussion on "variable definition" versus "variable declaration", I find this SO answer very helpful.
Anything inside double quotes in C is a string literal with a null terminator appended automatically. You don't have to add \0 at the end.
You can use strlen(arr)+1 to get the number of bytes the string itself occupies; the +1 is there because strlen doesn't count the null terminator. The array still takes sizeof(arr) bytes, which is 20 here.

regarding memory allocated using malloc() [duplicate]

int *ptr = malloc(sizeof(char));
*ptr = 100000;
printf("%d\n", *ptr); // 100000
Shouldn't that only allocate enough memory for a char, i.e. 1 byte? Therefore shouldn't the largest number be 255?
How does it still print 100000?
Update
Thanks for the answers. If it overwrites the next bytes, how does C then know that this number is larger than one byte, and not just look in the first byte?
Because C has no range-checking of memory. It allocates a byte, and then your assignment via the pointer overwrites it and the next three bytes. If you had allocated another bit of memory right after the first malloc, but before the assignment, you might have overwritten part of the heap (depending on how your malloc works).
This is why pointers can be very dangerous in C.
The type of the expression *ptr is int, so the compiler reads all four bytes (assuming a 4-byte int), and the %d in the format string tells printf to treat the argument as an int.
Note that if you really had assigned the value to a char, e.g. char *ptr = malloc(1); *ptr = 100000;
then with some compilers (and assuming plain char is signed by default) it would have printed -96, not 255 (or 127). This is because the value is not clamped to the highest value that fits (127 in a signed char, 255 in an unsigned char); it wraps around instead. Most compilers will warn that you are assigning a constant that does not fit in the variable.
The reason it is -96 is that 100000 % 256 is 160, which as a signed char reads as -(256-160).
Short answer: you're invoking undefined behaviour by writing to memory that doesn't belong to you, so you never know what you might get. It might just work, it might crash, or it might do any number of other things. In this case, you're putting an int at the position that ptr references, which writes the first byte of the int into the 1-byte allocated region, and smashes whatever lived in the following three bytes.
When you read the value back with the %d format specifier, printf reads sizeof(int) bytes from memory to print them as an int. If you want to output just the value of one byte, you need to do something like:
printf("%d\n", *(char*)ptr);
That is, tell the compiler that ptr refers to a char, then get that char value, which is promoted to an int in an argument list and subsequently output correctly by the %d specifier.
This is a 3-byte overflow, like the one in the Stack Overflow logo, but on the heap rather than the stack.
The simplest answer would be: it's not guaranteed to work.
Basically, you are corrupting the memory around the pointer.
To answer your updated question: C "sees" *ptr, i.e. the object that ptr points at, and ptr is declared as a pointer to int. So it reads an integer through that ptr. It has already forgotten that you only allocated one character. The two parts (allocation and then access) aren't linked by anything you've expressed in your code.
Generally, malloc will allocate memory with greater granularity than a single byte, so your request for a single byte will reserve a memory space rounded up to the nearest 8 bytes or something. You then write to this space, blithely going right past what you've reserved. Then you dereference the pointer, bringing back an int, and hand that to printf which happily prints it out.
As others have said, depending on this behavior is a bad, bad idea. Allocate exactly what you need and don't overrun your buffers.

pointer related queries

Guys, I have a few questions about pointers. Kindly help me resolve them.
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
question.1 : what is the difference between these 2 types of declaration ?
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
question.2 : i didn't get how is it working
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
question.3 how many bytes are being allocated to the pointer c?
when i try to input a string, it takes just a word and not the whole string. why so ?
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
question.4 when I try to input a character, why does it throw a segmentation fault?
Thanks in advance.. Waiting for your reply guys..
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
%s expects a pointer to the first element of a null-terminated array of chars, so you must pass b, not *b.
char *c=malloc(sizeof(char)); // you are allocating only 1 byte (one char), not an array of char!
scanf("%s",c); // %s needs a pointer to a buffer large enough for the whole word plus '\0'
printf("%s",c);// prints chars until it finds '\0', which may lie beyond your 1-byte allocation
You need to do this:
int sizeofstring = 200; // max size of buffer
char *c = malloc(sizeof(char) * sizeofstring); // roughly equivalent to declaring char c[200]
scanf("%199s",c); // the field width leaves room for the terminating '\0'
printf("%s",c);
question.3 how many bytes are being allocated to the pointer c? when i
try to input a string, it takes just a word and not the whole string.
why so ?
In your code, you are allocating only 1 byte, because sizeof(char) is 1 by definition (and a byte has at least 8 bits). You need to allocate sizeof(char)*N, where N is the size of your string including the terminating '\0'.
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
Here you are declaring two variables, a and b, and initializing them. "this is an array of characters" is a string literal, which in C has type array of char. a has type array of char. In this specific case, the array does not get converted to a pointer, and a gets initialized with the array "this is an array of characters". b has type pointer to char, the array gets converted to a pointer, and b gets initialized with a pointer to the array "this is an array of characters".
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
In an expression, *b dereferences the pointer b, so it evaluates to the char pointed to by b, i.e. 't'. This is not an address (which is what "%s" is expecting), so you get undefined behavior, most probably a crash (but don't try to do this on embedded systems, you could get mysterious behaviour and corrupted data, which is worse than a crash). In the second case, %s expects a pointer to a char, gets it, and can proceed to do its thing.
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
In C, sizeof returns the size in bytes of an object (= region of storage). In C, a char is defined to be the same as a byte, which has at least 8 bits, but can have more (but some standards put additional restrictions, e.g: POSIX requires 8-bit bytes, i.e: octets). So, you are allocating 1 byte. When you call scanf(), it writes in the memory pointed to by d without restraint, overwriting everything in sight. scanf() allows maximum field widths, so:
Allocate more memory, at least enough for what you want + 1 terminating ASCII NUL.
Tell scanf() to stop, e.g: scanf("%19s") for a maximum 19 characters (you'll need 20 bytes to store that, counting the terminating ASCII NUL).
And last:
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
c is not a pointer, so you are trying to store an address in a char, where it does not fit. In scanf, "%c" expects a pointer to char, which should point to an object (= region of storage) with enough space for the specified field width, 1 by default. Since c is not a pointer, the above may crash on some platforms (and cause worse things on others).
I see several problems in your code.
Question 1: The difference is:
a is an array that holds its own copy of the string in writable memory (the data segment if it is a global, the stack if it is a local). Here you can read and write as much as you want. sizeof a is the length of the string plus 1 for the string terminator (just a null byte).
b, however, is just a pointer to a string literal, which is typically placed in rodata, a data area that is read only. sizeof b is whatever the pointer size is on your system, maybe 4 or 8 on a PC or 2 on many embedded systems.
Question 2: The printf() format wants a pointer to a string. With *b, you dereference the pointer you have and give it the first byte of data, which is a t (ASCII 116). The callee, however, treats it as a pointer, dereferences it and BAM.
With b, however, everything goes fine, as it is exactly the right call.
Question 3: malloc(sizeof(char)) allocates exactly one byte. sizeof(char) is 1 by definition, so the call is effectively malloc(1). The input just takes a word because %s is defined that way.
Question 4:
char c=malloc(sizeof(char)); // 4)
should give you a warning: malloc() returns a pointer, which you try to put into a char. I think you meant char *...
As you continue, you give that pointer to scanf(), which receives e.g. instead of 0x80043214 a mere 0x14, interprets it as a pointer and BAM again.
The correct way would be
char * c=malloc(1024);
scanf("%1023s", c);
printf("%s", c);
Why? Well, you want to read a string. 1 byte is too small, better allocate more.
In scanf() you should take care that you don't allow reading more than your buffer can hold, hence the limit in the format specifier. The width counts characters only, so a 1024-byte buffer takes %1023s to leave room for the terminating null byte.
and on printing, you should use %s, because you want the whole string to be printed and not only the first character. (At least, I suppose so.)
Ad Q1: The first is an array of chars named a. sizeof(a) will return 31 (strlen(a)+1). Trying to assign something to a (like a = b) will fail, since an array is not assignable.
The second is a pointer to an array of char, hence sizeof(b) is usually 4 on 32-bit or 8 on 64-bit. Assigning something to b will work, since the pointer can take a new value.
Of course, *a or *b work on both.
Ad Q2: printf() with the %s argument takes a pointer to a char (those are the "strings" in C). Hence, printf("%s", *b) will crash, since the "pointer" used by printf() will contain the byte value of *b.
What you could do, is printf("%c", *b), but that would only print the first character.
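Ad Q3: sizeof(char) is 1 (by definition), hence you allocate 1 byte. The scanf will most likely read more than one byte (remember that each string will be terminated by a null character occupying one char). Hence the scanf will trash memory, which is likely to cause memory corruption problems sometime later on.
Ad Q4: Maybe that's the trashed memory, but more likely it is because c is declared as a char instead of a char *, so scanf() receives a character value where it expects an address.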
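The two declarations are not the same: a is an array that holds a copy of the string, whereas b is a pointer to a string literal.
b points to the first byte, so when you say *b you get the first character.
printf("%s", *b)
will fail, as %s expects a pointer to a string.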
char is one byte.

Why does scanf and malloc to char pointer work even if the size is not specified?

I am refreshing my C skills. I am using a char *s and using malloc to allocate memory for s. Then, using scanf, I read the input into s. But I haven't specified a size for the memory chunk, and yet the program works. How does the memory get allocated for an input string of arbitrary length? Is scanf simply incrementing the pointer and writing data into the location?
#include <stdio.h>
#include <stdlib.h>
int main() {
    char *s;
    s = (char *) malloc(sizeof(s)); //I did not specify how much like malloc(sizeof(s) * 128)
    if (s == NULL) {
        fprintf(stderr, "\nError allocating memory for string");
        exit(1);
    }
    scanf("%s", s);
    puts(s);
    free(s);
    return 0;
}
/*
Input:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Output:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
*/
With char *s;, sizeof(s) is the same as sizeof(char *) which is either 4 or 8 depending on whether you are on a 32 bit box or a 64 bit box.
If you are on a 32-bit box then you can store 3 characters plus the null 'end of string' character. If you store more it may explode.
sizeof(s) returns the size in bytes of s, which is of type char*. Typically this is 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine. So you actually have told malloc the number of bytes to allocate, and s will point to that region of memory.
You did specify a size: sizeof(s). Since s is a char *, sizeof(s) == sizeof(char *). Depending on your platform, this may be 4 or 8 bytes in length.
So, you've effectively allocated 4 (or 8) bytes to store a string. If you type more than 3 (or 7) characters on the command line, then you are going to start writing past the end of the allocated array, which triggers undefined behaviour. With undefined behaviour, anything could happen: your program might look like it works fine, the program might fill the rest of the memory with ZALGO, the program might segfault horribly, or you might encounter the ever-popular nasal demons. The C specification does not specify what happens (hence the term "undefined behaviour").
The fact that your program "works" at all is a complete fluke, and should never be relied upon.
sizeof(s) in your case returns the size of a character pointer, which will be 4 or 8 bytes depending on whether you are running on a 32- or 64-bit platform.
You want to multiply the number of elements by sizeof(*s) instead, e.g. malloc(128 * sizeof(*s)). However, the C standard specifies that sizeof(char) (which is what sizeof(*s) is) is one, so for character arrays you don't need the factor at all.
It will only allocate as many bytes as the size of a char *, and then, as you thought, scanf simply advances through memory and writes the data into each location.
The answer to why it works is that it is writing to memory that was not allocated to you. If that memory is unmapped, or is in use by another part of your own program, the program may crash or corrupt its data, so allocate a larger buffer.
You are only allocating as much memory as the size of a pointer. If you write longer strings into it, scanf will just overwrite the adjacent memory locations and your program will show unexpected behaviour.

