char pointer overflow - c

I am new to C and I was wondering whether the buffer behind a pointer can be overflowed by a vulnerable C function like strcpy(). I have seen it a lot in source code. Is there a way to avoid buffer overflows?

Yes it is. This is in fact the classic cause of buffer overflow vulnerabilities. The only way to avoid overflowing the buffer is to ensure that you don't do anything that can cause the overflow. In the case of strcpy, one option is strncpy, which takes the size of the buffer into which the string is being copied. Note that strncpy does not write a terminating null byte when the source is at least as long as that size, so you have to terminate the buffer yourself.

Sure, if you don't allocate enough space for a buffer, then you certainly can:
char* ptr = (char*)malloc(3);
strcpy(ptr, "this is very, very bad"); /* ptr only has 3 bytes allocated! */
However, what's really bad is that this code could work without giving you any errors, but it may overwrite some memory somewhere that could cause your program to blow up later, seemingly randomly, and you could have no idea why. Those are the source of hours (sometimes even days) of frustration, which anyone who's spent any significant amount of time writing C will tell you.
That is why with C, you have to be extremely careful with such things, and double, triple, nth degree check your code. After that, check it again.

Some other approaches are
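#include <stdio.h>
#include <string.h>
#define MAX_LENGTH_NAME 256
void foo(void)
{
    char a[MAX_LENGTH_NAME + 1]; /* You can also use malloc here */
    strncpy(a, "Foxy", MAX_LENGTH_NAME);
    a[MAX_LENGTH_NAME] = '\0'; /* strncpy does not terminate when the source fills the limit */
    snprintf(a, sizeof a, "%s", "Foxy"); /* snprintf always terminates */
}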
So it's good to know the size of the allocated memory and then use calls that take that size to avoid buffer overflows.
Static analysis of already written code may point out these kinds of mistakes and you can change it too.
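To make the "know the size, then use a bounded call" advice concrete, here is a minimal sketch of a copy helper that also reports truncation. The name copy_checked and its return convention are made up for this illustration; only snprintf is a standard function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which holds dst_size bytes.
 * Returns 0 on success, -1 if src did not fit and was truncated.
 * dst is always null-terminated when dst_size > 0. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;
    /* snprintf never writes more than dst_size bytes, terminator included */
    int n = snprintf(dst, dst_size, "%s", src);
    if (n < 0 || (size_t)n >= dst_size)
        return -1;   /* output error, or the source was longer than the buffer */
    return 0;
}
```

A caller would pass sizeof on the destination array, for example copy_checked(name, sizeof name, input), and treat -1 as "input too long" instead of silently continuing.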

Related

avoiding char buffer overflow more efficiently

I wrote a simple in/out program.
Whenever I run it and enter input that exceeds the char limit, I get
*** stack smashing detected ***: terminated Aborted (core dumped)
I searched it up and found it was a gcc thing for safety. I heard it might lead to seg faults, so I experimented with turning it off with -fno-stack-protector, and it ran normally when I exceeded the char limit.
But what if I want to write the program when the input length is unknown? Is there a safer way to do this, more efficient than increasing the value in char to a ridiculously large value?
The code:
#include <stdio.h>
int main()
{
char in[1];
printf("in: ");
scanf("%s\0", &in);
printf("\nout: %s\n", in);
}
P.S. I'm new to C, >2 days old, so a simple explanation would be appreciated.
char in[1]; can hold only the empty string (a single null terminating byte), which is impossible to use safely with scanf.
Also note that explicitly stating the null terminating byte in a string literal is superfluous, as they are implicitly null terminated.
But what if I want to write the program when the input length is unknown? Is there a safer way to do this, more efficient than increasing the value in char to a ridiculously large value?
The counter-questions here are:
What do you consider inefficient?
What do you define as ridiculously large?
As I see it, you have two options:
1. Use dynamically allocated memory to read strings of an arbitrary size.
2. Set a realistic upper limit on the length of input to expect.
An example of #1 can be seen in library functions like POSIX getline or getdelim. A re-implementation can be as simple as malloc (realloc), getchar, and a loop.
The use of #2 depends greatly on the context of your program, and what it is supposed to do. Perhaps you are reading a single line, and a smallish buffer will suffice. Maybe you are expecting a larger chunk of data, and need more memory. Only you can decide this for yourself.
In any case, it's up to you to avoid undefined behavior by preventing overflows before they happen. It is already too late if one has occurred.
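For the dynamically allocated option, a rough sketch of the "malloc (realloc), a character-reading call, and a loop" idea might look like this. The function name read_line, the starting capacity of 16 and the doubling policy are all choices made for this example, and it uses fgetc on a FILE * rather than getchar so it can read from any stream. It is not how POSIX getline is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp.
 * Returns a malloc'd string without the trailing newline, or NULL at
 * end-of-file with nothing read or on allocation failure.
 * The caller must free() the result. */
char *read_line(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {               /* keep one byte free for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling read_line(stdin) in a loop until it returns NULL reads every line of input without picking a maximum length up front.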
Use field-width specifiers when using %s:
char buf[512];
if (1 != scanf("%511s", buf))
/* read error */;
or use sane functions like fgets, which allow you to pass a buffer size as an argument.
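As a sketch of the fgets route: the wrapper name read_bounded below is hypothetical, while fgets and strcspn are standard. fgets stores at most size - 1 characters and always terminates the buffer, so it cannot overflow buf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read at most size - 1 characters of one line from
 * fp into buf and strip the trailing newline. Returns 1 on success,
 * 0 on end-of-file or read error. */
int read_bounded(FILE *fp, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if one was read */
    return 1;
}
```

If the line is longer than the buffer, the rest of it stays in the stream and is returned by the next read, so a real program would have to decide whether to discard or keep reading it.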
stack smashing detected
I searched it up and found it was a gcc thing for safety
That's indeed gcc's way of spotting run-time bugs in your code by inserting so-called "stack canaries" to spot stack corruption/overflows. More errors detected is a good thing.
I heard it might lead to seg faults
No, bugs in your application lead to seg faults. If the compiler provides ways to detect them before the OS, that's a good thing. Dormant but severe bugs in your program are a bad thing. However, the OS would possibly detect the bug too and say "seg fault".
so I experimented with turning it off with -fno-stack-protector, and it ran normally when I exceeded the char limit
Basically you know that you are an inexperienced driver and are afraid you might hit other cars. To solve this, you drive with your eyes closed instead, so you won't see those cars you could hit. That doesn't mean that they disappear.
char in[1]; can only hold 1 byte of data, and if you write out of bounds of this array, you invoke undefined behavior, which may manifest itself as stack smashing or seg faults, because you are trying to write to memory that doesn't belong to you. This is the bug, this is the problem. The correct solution is to allocate enough memory.
(You also have a bug scanf("%s\0", &in); -> scanf("%s\0", in);. The & isn't needed since in is an array and automatically "decays" into a pointer to its first element when you pass it to a function.)
One sensible way is to allocate 128 bytes or so, and then restrict the input so that it cannot be more than 128 bytes. The proper function to read strings with restricted input length is fgets. So you could either switch to fgets or you could accept that your beginner trial programs need not live up to production quality and just use scanf for now. (You can use scanf safely as shown in another answer, but IMO that's just more cumbersome than using fgets.)
Also I would strongly advise C beginners not to worry about if they allocate 10 bytes or 100 bytes. Learn programming by using a PC and then it won't matter. Optimizing memory consumption is an advanced topic which you will learn later on.

How come this piece of code is working? It should crash due to buffer over-run but I am getting the output as "stackoverflow"

This program should crash due to buffer overrun. But I am getting output as "stackoverflow". How?
#include<stdio.h>
#include<string.h>
int main()
{
char *src;
char dest[10];
src = (char*)malloc(5);
strcpy(src, "stackoverflow");
printf("%s\n", src);
return 0;
}
It does not have to crash due to a buffer overrun.
The behaviour of your code is undefined as you are overrunning your buffer. You can't expect the behaviour to be in any way predictable.
It's difficult - and not required by the c standard - to issue an appropriate diagnostic in such cases.
Buffer overflows are not guaranteed to crash you: they cause undefined behavior. While a lot of platforms make the sequence of events that may or may not culminate in a crash rather predictable, one very important thing to consider is that the possible crash almost never happens at the same time that the damage is caused.
In a stack buffer overflow, possible crashes happen when you read the value of a variable that sat on the stack and was overflowed onto, or when you return from the function and the return address has been overwritten.
However, you're not overflowing a stack buffer: you're overflowing a heap buffer that you got from malloc. Typically, possible crashes there happen when you free that buffer or try to use a buffer that happened to be contiguous to it (there is, on purpose, no way to predict this). You allocate only one buffer and never free it, so you're not going to observe any problem from a small overflow.
In addition, I don't know any mainstream malloc implementation on desktops that returns blocks of less than 32 bytes, so even though you said malloc(5), you probably have room for 32 bytes, so your short write is not overflowing on anything (although you must not rely on this).
The only case where an overflow will straight-up crash your program is if you overflow to a memory location that has not been assigned any meaning. For instance, if you do something like memset(dest, 'c', 100000000), that will probably happen because you'll be busting out of the memory area that is reserved to the stack and there is probably nothing next to it.
Copying to a buffer that is too small is undefined behavior; that doesn't necessarily mean it's guaranteed to crash. For all we know those other bytes occupying the "overflow\0" part of your string aren't being used anyway.
Because unless you are using some overrun-protection library/debugging tool, nothing will notice that you're writing to memory you shouldn't be. If you run this under valgrind it will display that you wrote to memory you shouldn't have. But malloc(5) returns a pointer into a likely larger block of memory, so the chances of the buffer overflow resulting in trying to access an unmapped address are low. If you had other malloc() calls, etc., you might notice the "overflow" part ending up in one of those other buffers, but it really depends on the implementation of malloc(), and which code the overflow breaks won't be deterministic.
Your buffer is allocated on the heap, so your pointer src points to a buffer of 5 chars, that is 5 bytes, because the size of char is 1 byte. If the bytes written past the end of this buffer still land in heap memory that nothing else is using, the program will appear to work. On the other hand, if they overwrite memory allocated through another pointer, or run past the end of the heap, you can get a crash.
In conclusion, avoid this kind of code, because its behavior is undefined and you will get unexpected results.
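For contrast, a version that sizes the heap buffer from the string being copied could be sketched as follows. The helper name dup_string is made up for this example. It does the same job as the POSIX strdup function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of src that is exactly big
 * enough, or NULL if malloc fails. The caller must free() the result. */
char *dup_string(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, src, n);     /* copies the terminator as well */
    return copy;
}
```

With this, src = dup_string("stackoverflow") allocates 14 bytes instead of 5, and the copy stays inside the block.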

malloc() in C not working as expected

I'm new to C. Sorry if this has already been answered, I couldn't find a straight answer, so here we go..
I'm trying to understand how malloc() works in C. I have this code:
#define MAXLINE 100
void readInput(char **s)
{
char temp[MAXLINE];
printf("Please enter a string: ");
scanf("%s", temp);
*s = (char *)malloc((strlen(temp)+1)*sizeof(char)); // works as expected
//*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
strcpy ((char *)*s, temp);
}
int main()
{
char *str;
readInput(&str);
printf("Your string is %s\n", str);
free(str);
return 0;
}
The question is why doesn't the program crash (or at least strip the remaining characters) when I call malloc() like this:
*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
Won't this cause a buffer overflow if I enter a string with more than two characters? As I understood malloc(), it allocates a fixed space for data, so surely allocating the space for only two chars would allow the string to be a maximum of one usable character ('\0' being the second), but it is still printing out all 10 chars entered.
P.S. I'm using Xcode if that makes any difference.
Thanks,
Simon
It works out fine because you're lucky! Usually, a block a little larger than just 2 bytes is given to your program by your operating system.
If the OS actually gave you 16 bytes when you asked for 2 bytes, you could write 16 bytes without the OS taking notice of it. However, if you had another malloc() in your program which used the other 14 bytes, you would write over that variable's content.
The OS doesn't care about you messing about inside your own program. Your program will only crash if you write outside what the OS has given you.
Try to write 200 bytes and see if it crashes.
Edit:
malloc() and free() use some of the heap space to maintain information about allocated memory. This information is usually stored in between the memory blocks. If you overflow a buffer, this information may get overwritten.
Yes, writing more data into an allocated buffer than it can hold is a buffer overflow. However, there is no buffer overflow check in C, and if there happens to be valid memory after your buffer then your code will appear to work correctly.
However what you have done is write into memory that you don't own and likely have corrupted the heap. Your next call to free or malloc will likely crash, or if not the next call, some later call could crash, or you could get lucky and malloc handed you a larger buffer than you requested, in which case you'll never see an issue.
Won't this cause a buffer overflow if I enter a string with more than two characters?
Absolutely. However, C does no bounds checking at runtime; it assumes you knew what you were doing when you allocated the memory, and that you know how much is available. If you go over the end of the buffer, you will clobber whatever was there before.
Whether that causes your code to crash or not depends on what was there before and what you clobbered it with. Not all overflows will kill your program, and overflow in the heap may not cause any (obvious) problems at all.
This is because even if you did not allocate the memory, the memory exists.
You are accessing data that is not yours, and with a good debugger or static analyzer you would probably have seen the error.
Also, if you have a variable that sits just behind the block you allocated, it will probably be overwritten by what you enter.
Simply put, this is a case of undefined behavior. You are unlucky that you are getting the expected result, because it hides the bug.
It does cause a buffer overflow. But C doesn’t do anything to prevent a buffer overflow. Neither do most implementations of malloc.
In general, a crash from a buffer overflow only occurs when...
It overflows a page—the unit of memory that malloc actually gets from the operating system. Malloc will fulfill many individual allocation requests from the same page of memory.
The overflow corrupts the memory that follows the buffer. This doesn’t cause an immediate crash. It causes a crash later when other code runs that depends upon the contents of that memory.
(...but these things depend upon the specifics of the system involved.)
It is entirely possible, if you are lucky, that a buffer overflow will never cause a crash. Although it may create other, less noticeable problems.
malloc() is declared in the stdlib.h header file. If you are using arrays, you have to fix the memory length before you use it. With malloc(), you can allocate memory when you need it and in the size required. When you allocate memory through malloc(), it searches the memory it manages for a free block that is large enough and returns the address of that block.
When your program has finished with the memory, you can free it. Memory obtained from malloc() lives in RAM only. Once you have processed your data and moved it to the hard disk or any other permanent storage, you can free the block so it can be used for other data.
If you are working through pointers, without malloc() you cannot create new data blocks at run time.
new is the corresponding keyword in C++.
When you don't know at programming time how much memory you will need, you can use the function malloc
void *malloc(size_t size);
The malloc() function shall allocate unused space for an object whose size in bytes is specified by size and whose value is unspecified.
How it works is the question...
So:
Your system has a free list, a chain that lists all the memory spaces available. malloc searches this list until it finds a space at least as big as you requested. Then it splits this space in two, gives you the part you asked for, and puts the other part back in the list. It splits into pieces of size 2^n so that you won't have odd sizes in your list, which makes things fit together easily, just like Lego.
When you call free, your block goes back to the free list.
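The search-and-split idea can be sketched as a toy first-fit allocator. Everything here is hypothetical and simplified: the names toy_malloc and toy_free, the 4096-byte arena, and the 16-byte rounding are assumptions for the example, the arena itself is borrowed from the real malloc, and blocks are rounded to a multiple of 16 rather than to powers of two. Real implementations are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy first-fit allocator; every name here is hypothetical. */
typedef struct block {
    size_t size;          /* payload bytes that follow the header */
    int is_free;          /* 1 while the block is on the free list */
    struct block *next;   /* next block in address order */
} block_t;

#define ARENA_SIZE 4096
#define ALIGNMENT  16     /* assumed to be enough for any object type */

static block_t *head = NULL;

static size_t round_up(size_t n)
{
    return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

void *toy_malloc(size_t size)
{
    const size_t hdr = round_up(sizeof(block_t));
    if (size == 0 || size > ARENA_SIZE)
        return NULL;
    if (head == NULL) {                        /* first call: one big free block */
        head = malloc(ARENA_SIZE);             /* the arena comes from the real malloc */
        if (head == NULL)
            return NULL;
        head->size = ARENA_SIZE - hdr;
        head->is_free = 1;
        head->next = NULL;
    }
    size = round_up(size);
    for (block_t *b = head; b != NULL; b = b->next) {
        if (!b->is_free || b->size < size)
            continue;                          /* in use, or too small */
        if (b->size >= size + hdr + ALIGNMENT) {   /* big enough to split in two */
            block_t *rest = (block_t *)((unsigned char *)b + hdr + size);
            rest->size = b->size - size - hdr;
            rest->is_free = 1;
            rest->next = b->next;
            b->size = size;
            b->next = rest;
        }
        b->is_free = 0;
        return (unsigned char *)b + hdr;       /* payload starts after the header */
    }
    return NULL;                               /* no free block is large enough */
}

void toy_free(void *p)
{
    if (p == NULL)
        return;
    block_t *b = (block_t *)((unsigned char *)p - round_up(sizeof(block_t)));
    assert(!b->is_free);                       /* catches a double free */
    b->is_free = 1;                            /* real allocators also merge neighbours */
}
```

Note how the header of the next block sits directly after each payload. Writing past the end of a block in this scheme would overwrite that header, which is one way an overflow ends up crashing a later malloc or free call rather than the write itself.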

Malloc -> how much memory has been allocated?

# include <stdio.h>
# include <stdbool.h>
# include <string.h>
# include <stdlib.h>
int main ()
{
char * buffer;
buffer = malloc (2);
if (buffer == NULL){
printf("big errors");
}
strcpy(buffer, "hello");
printf("buffer is %s\n", buffer);
free(buffer);
return 0;
}
I allocated 2 bytes of memory to the pointer/char buffer yet if I assign the C-style string hello to it, it still prints the entire string, without giving me any errors. Why doesn't the compiler give me an error telling me there isn't enough memory allocated? I read a couple of questions that ask how to check how much memory malloc actually allocates but I didn't find a concrete answer. Shouldn't the free function have to know exactly how much memory is allocated to buffer?
The compiler doesn't know. This is the joy and terror of C. malloc belongs to the runtime. All the compiler knows is that you have told it that malloc returns a void*. It has no idea how much memory that points to, or how much strcpy is going to copy.
Tools like valgrind detect some of these errors. Other programming languages make it harder to shoot yourself in the foot. Not C.
No production malloc() implementation should prevent you from trying to write past what you allocated. It is assumed that if you allocate 123 bytes, you will use all or less than what you allocated. malloc(), for efficiency's sake, has to assume that a programmer is going to keep track of their pointers.
Using memory that you didn't explicitly and successfully ask malloc() to give you is undefined behavior. You might have asked for n bytes but got n + x, due to the malloc() implementation optimizing for byte alignment. Or you could be writing to a black hole. You never can know, that's why it's undefined behavior.
That being said ...
There are malloc() implementations that give you built in statistics and debugging, however these need to be used in lieu of the standard malloc() facility just like you would if you were using a garbage collected variety.
I've also seen variants designed strictly for LD_PRELOAD that expose a function to allow you to define a callback with at least one void pointer as an argument. That argument expects a structure that contains the statistical data. Other tools like Electric Fence will simply halt your program on the exact instruction that resulted in an overrun or access to invalid blocks. As @R.. points out in comments, that is great for debugging but horribly inefficient.
In all honesty or (as they say) 'at the end of the day' - it's much easier to use a heap profiler such as Valgrind and its associated tools (massif) in this case which will give you quite a bit of information. In this particular case, Valgrind would have pointed out the obvious - you wrote past the allocated boundary. In most cases, however when this is not intentional, a good profiler / error detector is priceless.
Using a profiler isn't always possible due to:
Timing issues while running under a profiler (but those are common any time calls to malloc() are intercepted).
Profiler is not available for your platform / arch
The debug data (from a logging malloc()) must be an integral part of the program
We used a variant of the library that I linked in HelenOS (I'm not sure if they're still using it) for quite a while, as debugging at the VMM was known to cause insanity.
Still, think hard about future ramifications when considering a drop in replacement, when it comes to the malloc() facility you almost always want to use what the system ships.
How much malloc internally allocates is implementation-dependent and OS-dependent (e.g. multiples of 8 bytes or more). Writing into the unallocated bytes may overwrite other variables' values even if your compiler and run-time don't detect the error. The free function remembers the number of bytes allocated separately from the allocated region, for example in a free list.
Why doesn't the compiler give me an error telling me there isn't enough memory allocated?
C does not block you from using memory you should not. You can use that memory, but it is bad and results in Undefined Behaviour. You are writing in a place you should not. This program might appear to run correctly, but might later crash. This is UB. You do not know what might happen.
This is what is happening with your strcpy(). You write in place you do not own, but the language does not protect you from that. So you should make sure you always know what and where you are writing, or make sure you stop when you are about to exceed valid memory bounds.
I read a couple of questions that ask how to check how much memory malloc actually allocates but I didn't find a concrete answer. Shouldn't the 'free' function have to know how much memory is exactly allocated to 'buffer' ?
malloc() might allocate more memory than you request because of alignment padding.
More: http://en.wikipedia.org/wiki/Data_structure_alignment
free() frees exactly the block you allocated with malloc(), but it is not as smart as you might think. E.g.:
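#include <stdlib.h>
int main(void)
{
    char *ptr = malloc(10);
    if (ptr)
    {
        ++ptr;     /* now points one byte past the start of the block */
        free(ptr); /* Undefined Behaviour: not the pointer malloc returned */
    }
    return 0;
}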
You should always free() the exact pointer that malloc() returned, i.e. the one that points to the start of the block. Doing a free(0) is safe.
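One common way to respect that rule is to keep the pointer returned by malloc untouched and advance a second pointer instead. A small sketch follows, where the function name make_filled is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: fill a heap buffer through a moving cursor while
 * keeping the pointer malloc returned, so the caller can free() it. */
char *make_filled(size_t n, char fill)
{
    assert(n < (size_t)-1);       /* n + 1 below must not wrap around */
    char *base = malloc(n + 1);
    if (base == NULL)
        return NULL;
    char *cursor = base;          /* advance this one, never base */
    for (size_t i = 0; i < n; i++)
        *cursor++ = fill;
    *cursor = '\0';
    return base;                  /* the only pointer that may be passed to free */
}
```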
You've written past the end of the buffer you allocated. The result is undefined behavior. Some run time libraries with the right options have at least some ability to diagnose problems like this, but not all do, and even those that do only do so at run-time, and usually only when compiled with the correct options.
Malloc -> how much memory has been allocated?
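When you allocate memory using malloc, the allocator does not ask the operating system for exactly the amount you requested. On glibc, for example, the first call to malloc typically grows the heap by roughly 128k, and small requests are then served from that region.
What you requested is buffer = malloc (2);. Though you requested only 2 bytes, about 128k has been obtained from the system.
strcpy(buffer, "hello"); writes into that 128k region, so the string "hello" fits without touching unmapped memory, even though it overruns your 2-byte block and is still undefined behavior.
This program may make it clearer.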
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int *p = (int *) malloc(2); /* request is only 2 bytes */
    p[0]=100; /* every one of these writes is out of bounds */
    p[1]=200;
    p[2]=300;
    p[3]=400;
    p[4]=500;
    int i=0;
    for(;i<5;i++,p++)
        printf("%d\t",*p);
}
On the first call to malloc, it obtains about 128k, and from that it serves your request (2 bytes). The string "hello" fits into it. A second call to malloc is served from the same 128k.
Requests beyond 128k are served through the mmap interface. You can refer to the man page of malloc.
There is no compiler/platform independent way of finding out how much memory malloc actually allocated. malloc will in general allocate slightly more than you ask it for, see here:
http://41j.com/blog/2011/09/finding-out-how-much-memory-was-allocated/
On Linux you can use malloc_usable_size to find out how much memory you can use. On MacOS and other BSD platforms you can use malloc_size. The post linked above has complete examples of both these techniques.
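A minimal sketch of such a query, assuming either glibc on Linux (malloc_usable_size from malloc.h) or macOS (malloc_size from malloc/malloc.h). The wrapper name usable_size is made up, and other platforms may need a different header or function.

```c
#include <assert.h>
#include <stdlib.h>
#if defined(__APPLE__)
#include <malloc/malloc.h>   /* malloc_size */
#else
#include <malloc.h>          /* malloc_usable_size (glibc) */
#endif

/* Hypothetical wrapper: bytes actually usable in a block returned by malloc. */
size_t usable_size(void *p)
{
    assert(p != NULL);
#if defined(__APPLE__)
    return malloc_size(p);
#else
    return malloc_usable_size(p);
#endif
}
```

Calling usable_size(buffer) after buffer = malloc(2) will usually report more than 2, which is why the "hello" example appears to work. The number is only useful for diagnostics. Code should still treat the requested size as the limit.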

zeroing out memory

gcc 4.4.4 C89
I am just wondering what most C programmers do when they want to zero out memory.
For example, I have a buffer of 1024 bytes. Sometimes I do this:
char buffer[1024] = {0};
Which will zero all bytes.
However, should I declare it like this and use memset?
char buffer[1024];
.
.
memset(buffer, 0, sizeof(buffer));
Is there any real reason you have to zero the memory? What is the worst that can happen by not doing it?
The worst that can happen? You end up (unwittingly) with a string that is not NULL terminated, or an integer that inherits whatever happened to be to the right of it after you printed to part of the buffer. Yet, unterminated strings can happen other ways, too, even if you initialized the buffer.
Edit (from comments) The end of the world is also a remote possibility, depending on what you are doing.
Either is undesirable. However, unless completely eschewing dynamically allocated memory, most statically allocated buffers are typically rather small, which makes memset() relatively cheap. In fact, much cheaper than most calls to calloc() for dynamic blocks, which tend to be bigger than ~2k.
c99 contains language regarding default initialization values, I can't, however, seem to make gcc -std=c99 agree with that, using any kind of storage.
Still, with a lot of older compilers (and compilers that aren't quite c99) still in use, I prefer to just use memset()
I vastly prefer
char buffer[1024] = { 0 };
It's shorter, easier to read, and less error-prone. Only use memset on dynamically-allocated buffers, and then prefer calloc.
When you define char buffer[1024] inside a function without initializing it, its contents are indeterminate. For instance, Visual C++ in debug mode will fill uninitialized stack variables with 0xcc. In Release mode, it will simply allocate the memory and not care what happens to be in that block from previous use.
Also, your examples demonstrate runtime vs. compile time initialization. If your char buffer[1024] = { 0 } is a global or static declaration, it is initialized before the program starts. With a non-zero initializer it would be stored in the binary's data segment, increasing your binary size by about 1024 bytes. An all-zero buffer is typically placed in the zero-filled .bss section instead, which takes no space in the file. If the definition is in a function, it's stored on the stack and is allocated at runtime and not stored in the binary. If you provide an initializer in this case, the buffer is initialized at runtime, with the equivalent of a memset() for { 0 } or a memcpy() from data stored in the binary for a non-zero initializer.
Hopefully, this helps you decide which method works best for you.
In this particular case, there's not much difference. I prefer = { 0 } over memset because memset is more error-prone:
It provides an opportunity to get the bounds wrong.
It provides an opportunity to mix up the arguments to memset (e.g. memset(buf, sizeof buf, 0) instead of memset(buf, 0, sizeof buf)).
In general, = { 0 } is better for initializing structs too. It effectively initializes all members as if you had written = 0 to initialize each. This means that pointer members are guaranteed to be initialized to the null pointer (which might not be all-bits-zero, and all-bits-zero is what you'd get if you had used memset).
On the other hand, = { 0 } can leave padding bits in a struct as garbage, so it might not be appropriate if you plan to use memcmp to compare them later.
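The difference between the two forms for structs can be sketched like this. The struct record type and both function names are hypothetical, and the compound literal needs C99 or later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, used only for illustration. */
struct record {
    char tag;          /* padding usually follows this member */
    int count;
    const char *name;
};

/* Member-wise zeroing: name becomes a null pointer, padding is unspecified. */
void record_reset(struct record *r)
{
    assert(r != NULL);
    *r = (struct record){ 0 };
}

/* Byte-wise zeroing: every byte, padding included, becomes zero. */
void record_clear(struct record *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
}
```

Two records cleared with record_clear compare equal under memcmp, while record_reset is the form that guarantees name is a null pointer even on a platform where null is not all-bits-zero.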
The worst that can happen by not doing it is that you write some data in character by character and later interpret it as a string (and you didn't write a null terminator). Or you end up failing to realise a section of it was uninitialised and read it as though it were valid data. Basically: all sorts of nastiness.
Memset should be fine (provided you correct the sizeof typo :-)). I prefer that to your first example because I think it's clearer.
For dynamically allocated memory, I use calloc rather than malloc and memset.
One of the things that can happen if you don't initialize is that you run the risk of leaking sensitive information.
Uninitialized memory may have something sensitive in it from a previous use of that memory. Maybe a password or crypto key or part of a private email. Your code may later transmit that buffer or struct somewhere, or write it to disk, and if you only partially filled it the rest of it still contains those previous contents. Certain secure systems require zeroizing buffers when an address space can contain sensitive information.
I prefer using memset to clear a chunk of memory, especially when working with strings. I want to know without a doubt that there will be a null delimiter after my string. Yes, I know you can append a \0 on the end of each string and some functions do this for you, but I want no doubt that this has taken place.
A function could fail when using your buffer, and the buffer remains unchanged. Would you rather have a buffer of unknown garbage, or nothing?
This post has been heavily edited to make it correct. Many thanks to Tyler McHenery for pointing out what I missed.
char buffer[1024] = {0};
Will set the first char in the buffer to null, and the compiler will then expand all non-initialized chars to 0 too. In such a case it seems that the differences between the two techniques boil down to whether the compiler generates more optimized code for array initialization or whether memset is optimized faster than the generated compiled code.
Previously I stated:
char buffer[1024] = {0};
Will set the first char in the buffer to null. That technique is commonly used for null terminated strings, as all data past the first null is ignored by subsequent (non-buggy) functions that handle null terminated strings.
Which is not quite true. Sorry for the miscommunication, and thanks again for the corrections.
Depends how you're filling it: if you're planning on writing to it before even potentially reading anything, then why bother? It also depends what you're going to use the buffer for: if it's going to be treated as a string, then you just need to set the first byte to \0:
char buffer[1024];
buffer[0] = '\0';
However, if you're using it as a byte stream, then the contents of the entire array are probably going to be relevant, so memseting the entire thing or setting it to { 0 } as in your example is a smart move.
I also use memset(buffer, 0, sizeof(buffer));
The risk of not using it is that there is no guarantee that the buffer you are using is completely empty, there might be garbage which may lead to unpredictable behavior.
Always memset-ing to 0 after malloc, is a very good practice.
yup, calloc() method defined in stdlib.h allocates memory initialized with zeros.
I'm not familiar with the:
char buffer[1024] = {0};
technique. But assuming it does what I think it does, there's a (potential) difference to the two techniques.
The first one is done at COMPILE time, and the buffer will be part of the static image of the executable, and thus be 0's when you load.
The latter will be done at RUN TIME.
The first may incur some load time behaviour. If you just have:
char buffer[1024];
the modern loaders may well "virtually" load that...that is, it won't take any real space in the file, it'll simply be an instruction to the loader to carve out a block when the program is loaded. I'm not comfortable enough with modern loaders to say if that's true or not.
But if you pre-initialize it, then that will certainly need to be loaded from the executable.
Mind, neither of these have "real" performance impacts in the small. They may not have any in the "large". Just saying there's potential here, and the two techniques are in fact doing something quite different.

Resources