addressSanitizer: heap-buffer-overflow on address - c

I am at the very beginning of learning C.
I am trying to write a function to open a file, read a BUFFER_SIZE, store the content in an array, then track the character '\n' (because I want to get each line of the input).
when I set the BUFFER_SIZE very large, I can get the first line. when I set the BUFFER_SIZE reasonably small (say, 42) which is not yet the end of the first line , it prints out some weird symbol at the end, but I guess it is some bug in my own code.
however, when I set the BUFFER_SIZE very small, say = 10, and i use the -fsanitizer=address to check for memory leak. it throws a monster of error:
==90673==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000fb at pc 0x000108868a95 bp 0x7fff573979a0 sp 0x7fff57397998
READ of size 1 at 0x6020000000fb thread T0
If anyone can explain me in a general sense:
what is fsanitizer=address flag?
what is heap-buffer-overflow?
what is address and thread? what is the flag to see the thread in colors on screen?
and why it says 'read of size 1 at address.." ?
i would really appreciate <3

what is fsanitizer=address flag?
Usually C compiler doesn't add boundaries check for memory access.
Sometimes due to code error, there is read or write from outside the buffer, such an error is usually hard to detect. Using this flag the compiler add some boundaries check, to ensure you won't use a buffer to reach outside of its allocation.
what is heap-buffer-overflow?
use an array to reach after its allocation,
char* x = malloc(10);
char n=x[11]; //heap-buffer-overflow
(underflow is to reach before its allocation)
char* x = malloc(10);
char n=x[-11]; //heap-buffer-underflow
what is address and thread?
Address is position in memory, thread is part of process running sequence of code.
and why it says 'read of size 1 at address.." ?
It means you read single byte form the given address.
I think your problem is that you allocate the BUFFER_SIZE for the buffer and read the same BUFFER_SIZE into it. The correct approach is to always declare at least one more byte than you read.
like this:
char* buff = malloc(BUFFER_SIZE+1);//notice to +1
fread(buff,1,BUFFER_SIZE,fp);

In simple words it is segmentation fault with the variable created using new keyword as all that goes into heap area of memory.
Explanation - you are trying to access such an address for which you haven't declared your variable, to find all such errors revisit all your conditions and check if you are accessing something out of bounds or not.

This can also be rectified by making fast input output by:-
//For FAST I/O
**ios_base::sync_with_stdio(false);
cin.tie(NULL);**

Related

Why is my code working when I haven't allocated enough memory using malloc()?

I have I am doing this problem on SPOJ. http://www.spoj.com/problems/NHAY/. It requires taking input dynamically. In the code below even though I am not allocating memory to char *needle using malloc() - I am taking l = 1 - yet I am able to take input of any length and also it is printing out the entire string. Sometimes it gives runtime error. Why is this when I have not allocated enough memory for the string?
#include<stdio.h>
#include<malloc.h>
#include<ctype.h>
#include<stdlib.h>
int main()
{
long long l;
int i;
char *needle;
while(1){
scanf("%lld",&l);
needle =(char *)malloc(sizeof(char)*l);
scanf("%s",needle);
i=0;
while(needle[i]!='\0'){
printf("%c",needle[i]);
i++;
}
free(needle);
}
}
I also read on stackoverflow that a string is a char * so I should declare char *needle. How can I use this fact in the code? If I take l = 1 then no matter, what the length of the input string it should contain characters only up to the memory allocated for the char * pointer, i.e 1 byte. How can I do that?
Your code is producing an intentional buffer overflow by having sscanf copying a string bigger than the allocated space into the memory allocated by malloc. This "works" because in most cases, the buffer that is allocated is somewhere in the middle of a page so copying more data into the buffer "only" overwrites adjacent data. C (and C++) don't do any array bounds checking on plain C array and thus the error is uncaught.
In the cases where you end up with a runtime error, you most likely copied part of the string into unmapped and unallocated memory, which trigger an access violation.
Memory is usually allocated from the underlying OS in pages of a fixed size. For example, on x86 systems, pages are usually 4k in size. If the mapped address you are writing to is far enough away from the beginning and end of the page, the whole string will fit within the boundaries of the page. If you get close enough to the upper boundary, the code may attempt to write past the boundary, triggering the access violation.
[With some assumptions about the underlying system]
The reason it works for now is that the C library manages pools of memory allocated in pages from the operating system. The operating system only returns pages. The C library returns arbitrary amounts of data.
For your first allocation, you are getting read/write pages allocated by the operating system and managed by the pool. You are going off the edge of the data allocated by the library but are within the page returned by the operating system.
DOing what you are doing will corrupt the structure of the pool and a more extensive program using dynamic memory will eventually crash.
C language do not have default bound check. At the best it will crash while debugging, sometimes it will work as expected. Otherwise you will end up overwriting other memory blocks.
It will not always work. It is Undefined Behaviour.

C: dynamic char-array crashes heap

I have yet again a question about the workings of C. (ANSI-C compiled by VS2012)
I am refactoring a standalone program (.exe) into a .dll. This works fine so far but I stumble accross problems when it comes to logging. Let me explain:
The original program - while running - wrote a log-file and printed information to the screen. Since my dll is going to run on a webserver, accessed by many people simultaneously there is
no real chance to handle log-files properly (and clean up after them)
no console-window anyone would see
So my goal is to write everything that would be put in the log-file or on the screen into string-like variables (I know that there are no strings in C) which I then can later pass on requet to the caller (also a dll, but written in C#).
Since in C such a thing is not possible:
char z88rlog;
z88rlog="First log-entry\n";
z88rlog+="Second log-entry\n";
I have two possibilities:
char z88rlog[REALLY_HUGE];
dynamically allocating memory
In my mind the first way is to be ignored because:
The potential waste of memory is rather enormous
I still may need more memory than REALLY_HUGE, thus creating a buffer overflow
which leaves me with the second way. I have done some work on that and came up with two solutions, either of which doesn't work properly.
/* Solution 1 */
void logpr(char* tmpstr)
{
extern char *z88rlog;
if (z88rlog==NULL)
{
z88rlog=malloc(strlen(tmpstr)+1);
strcpy(z88rlog,tmpstr);
}
else
{
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr));
z88rlog=strcat(z88rlog,tmpstr);
}
}
In solution 1 (equal to solution 2 you will find) I pass my new log-entry through char tmpstr[255];. My "log-file" z88rlog is declared globally, so I need extern to access it. I then check if memory has been allocated for z88rlog. If no I allocate memory the size of my log-entry (+1 for my \0) and copy the contents of tmpstr into z88rlog. If yes I realloc memory for z88rlog in the size of what it has been + the length of tmpstr (+1). Then the two "string" are joined, using strcat. Using breakpoints an the direct-window I obtainded the following output:
z88rlog
0x00000000 <Schlechtes Ptr>
z88rlog
0x0059ef80 "start Z88R version 14OS"
z88rlog
0x0059ef80 "start Z88R version 14OS
opening file Z88.DYNÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍýýýý««««««««þîþîþîþ"
It shows three consecutive calls of logpr (breakpoint before strcpy/strcat). The indistinguable gibberish at the end results from memory allocation. After that VS gives out an error message that something caused the debugger to set a breakpoint in realloc.c. Because this obviously doesn't work I concocted my wonderful solution 2:
/* Solution 2 */
void logpr(char* tmpstr)
{
extern char *z88rlog;
char *z88rlogtmp;
if (z88rlog==NULL)
{
z88rlog=malloc(strlen(tmpstr)+1);
strcpy(z88rlog,tmpstr);
}
else
{
z88rlogtmp=malloc(strlen(z88rlog)+strlen(tmpstr+1));
z88rlogtmp=strcat(z88rlog,tmpstr);
free(z88rlog);
z88rlog=malloc(strlen(z88rlogtmp)+1);
memcpy(z88rlog,z88rlogtmp,strlen(z88rlogtmp)+1);
free(z88rlogtmp);
}
}
Here my aim is to create a copy of my log-file, free the originals' memory create new memory for the original in the new size and copy the contents back. And don't forget to free the temporary copy since it's allocated via malloc. This crashes instantly when it reaches free, again telling me that the heap might be broken.
So lets comment free for the time being. This does work better - much to my relief - but while building the log-string suddenly not all characters from z88rlogtmp get copied. But everything still works kind of properly. Until suddenly I am told again that the heap might be broken and the debugger puts a breakpoint at the end of _heap_alloc (size_t size) in malloc.c size has - according to the debugger - the value of 1041.
So I have 2 (or 3) ways I want to achieve this "string-growing" but none works. Might the error giving me the size point me to the conclusion that the array has become to big? I hope I explained well what I want to do and someone can help me :-) Thanks in advance!
irony on Maybee I should just go and buy some new heap for the computer. Does it fit in RAM-slots? Can anyone recomend a good brand? irony off
This is one mistake in Solution 1:
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr));
as no space is allocated for the terminating null character. Note that you must store the result of realloc() to a temporary variable to avoid memory leak in the event of failure. To correct:
char* tmp = realloc(z88rlog, strlen(z88rlog) + strlen(tmpstr) + 1);
if (tmp)
{
z88rlog = tmp;
/* ... */
}
Mistakes in Solution 2:
z88rlogtmp=malloc(strlen(z88rlog)+strlen(tmpstr+1));
/*^^^^^^^^^*/
it is calulating one less than the length of tmpstr. To correct:
z88rlogtmp=malloc(strlen(z88rlog) + strlen(tmpstr) + 1);
Pointer reassignment resulting in undefined behaviour:
z88rlogtmp=strcat(z88rlog,tmpstr);
/* Now, 'z88rlogtmp' and 'z88rlog' point to the same memory. */
free(z88rlog);
/* 'z88rlogtmp' now points to deallocated memory. */
z88rlog=malloc(strlen(z88rlogtmp)+1);
/* This call ^^^^^^^^^^^^^^^^^^ is undefined behaviour,
and from this point on anything can happen. */
memcpy(z88rlog,z88rlogtmp,strlen(z88rlogtmp)+1);
free(z88rlogtmp);
Additionally, if the code is executing within a Web Server it is almost certainly operating in a multi-threaded environment. As you have a global variable it will need synchronized access.
You seem to have many problems. To start with in your realloc call you don't allocate space for the terminating '\0' character. In your second solution you have strlen(tmpstr+1) which isn't correct. In your second solution you also use strcat to append to the existing buffer z88rlog, and if it's not big enough you overwrite unallocated memory, or over data allocated for something else. The first argument to strcat is the destination, and that is what is returned by the function as well so you loose the newly allocated memory too.
The first solution, with realloc, should work fine, if you just remember to allocate that extra character.
In solution 1, you would need to allocate space for terminating NULL character. Hence, the realloc should include one more space i.e.
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr) + 1);
In second solution, I am not sure of this z88rlogtmp=strcat(z88rlog,tmpstr); because z88rlog is the destination string. In case you wish to perform malloc only, then
z88rlogtmp=malloc(strlen(z88rlog)+1 // Allocate a temporary string
strcpy(z88rlogtmp,z88rlog); // Make a copy
free(z88rlog); // Free current string
z88rlog=malloc(strlen(z88rlogtmp)+ strlen(tmpstr) + 1)); //Re-allocate memory
strcpy(z88rlog, z88rlogtmp); // Copy first string
strcat(z88rlog, tmpStr); // Concatenate the next string
free(z88rlogtmp); // Free the Temporary string

Why does my program throw a segmentation fault while using heap-allocated memory?

After writing a program to reverse a string, I am having trouble understanding why I got a seg fault while trying to reverse the string. I have listed my program below.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void reverse(char *);
int main() {
char *str = calloc(1,'\0');
strcpy(str,"mystring0123456789");
reverse(str);
printf("Reverse String is: %s\n",str);
return 0;
}
void reverse(char *string) {
char ch, *start, *end;
int c=0;
int length = strlen(string);
start = string;
end = string;
while (c < length-1){
end++;
c++;
}
c=0;
while(c < length/2){
ch = *end;
*end = *start;
*start = ch;
start++;
end--;
c++;
}
}
1st Question:
Even though I have allocated only 1 byte of memory to the char pointer
str (calloc(1,'\0')), and I copied a 18 bytes string mystring0123456789 into it, and it didn't throw any error and the program worked fine without any SEGFAULT.
Why did my program not throw an error? Ideally it should throw some error as it don't have any memory to store that big string. Can someone throw light on this?
The program ran perfectly and gives me output Reverse String is: 9876543210gnirtsym.
2nd Question:
If the replace the statement
strcpy(str,"mystring0123456789");
with
str="mystring0123456789\0";
the program gives segmentation fault even though I have allocated enough memory for str (malloc(100)).
Why the program throwing segmentation fault?
Even though i have allocated only 1 byte of memory to the char pointer str(calloc(1,'\0')), and i copied a 18 bytes string "mystring0123456789" into it, and it didn't throw any error and the program worked fine without any SEGFAULT.
Your code had a bug -- of course it's not going to do what you expect. Fix the bug and the mystery will go away.
If the replace the statement
strcpy(str,"mystring0123456789");
with
str="mystring0123456789\0";
the program gives segmentation fault even though i have allocated enough memory for str (malloc(100)).
Because when you finish this, str points to a constant. This throws away the previous value of str, a pointer to memory you allocated, and replaces it with a pointer to that constant.
You cannot modify a constant, that's what makes it a constant. The strcpy function copies the constant into a variable which you can then modify.
Imagine if you could do this:
int* h = &2;
Now, if you did *h = 1; you'd be trying to change that constant 2 in your code, which of course you can't do.
That's effectively what you're doing with str="mystring0123456789\0";. It makes str point to that constant in your source code which, of course, you can't modify.
There's no requirement that it throw a segmentation fault. All that happens is that your broken code invokes undefined behavior. If that behavior has no visible effect, that's fine. If it formats the hard drive and paints the screen blue, that's fine too. It's undefined.
You're overwriting the pointer value with the address of a string literal, which totally doesn't use the allocated memory. Then you try to reverse the string literal which is in read-only memory, which causes the segmentation fault.
Your program did not throw an error because, even though you did the wrong thing, ncaught you (more below). You wrote data were you were not supposed to, but you got “lucky” and did not break anything by doing this.
strcpy(str,"mystring0123456789"); copies data into the place where str points. It so happens that, at that place, you are able to write data without causing a trap (this time). In contrast, str="mystring0123456789\0"; changes str to point to a new place. The place it points to is the place where "mystring0123456789\0" is stored. That place is likely read-only memory, so, when you try to write to it in the reverse routine, you get a trap.
More about 1:
When calloc allocates memory, it merely arranges for there to be some space that you are allowed to use. Physically, there is other memory present. You can write to that other memory, but you should not. This is exactly the way things work in the real world: If you rent a hotel room, you are allowed to use that hotel room, but it is wrong for you to use other rooms even if they happen to be open.
Sometimes when you trespass where you are not supposed to, in the real world or in a program, nobody will see, and you will get away with it. Sometimes you will get caught. The fact that you do not get caught does not mean it was okay.
One more note about calloc: You asked it to allocate space for one thing of zero size (the source code '\0' evaluates to zero). So you are asking for zero bytes. Various standards (such as C and Open Unix) may say different things about this, so it may be that, when you ask for zero bytes, calloc gives you one byte. However, it certainly does not give you as many bytes as you wrote with strcpy.
It sounds like you are writing C programs having come from a dynamic language or at least a language that does automatic string handling. For lack of a more formal definition, I find C to be a language very close to the architecture of the machine. That is, you make a lot of the programming decisions. A lot of your program problems are the result of your code causing undefined behavior.You got a segfault with strcpy, because you copied memory into a protected location; the behavior was undefined. Whereas, assigning your fixed string "mystring0123456789\0" was just assigning that pointer to str.
When you implement in C, you decide whether you want to define your storage areas at compile or run-time, or decide to have storage allocated from the heap (malloc/calloc). In either case, you have to write housekeeping routines to make sure you do not exceed the storage you have defined.
Assigning a string to a pointer merely assigns the string's address in memory; it does not copy the string, and a fixed string inside quotes "test-string" is read-only, and you cannot modify it. Your program may have worked just fine, having done that assignment, even though it would not be considered good C coding practice.
There are advantages to handling storage allocations this way, which is why C is a popular language.
Another case is that you can have a segfault when you use memory correct AND your heap became so big that your physical memory cannot manage it (without overlap with stack|text|data|bss -> link)
Proof: link, section Possible Cause #2

compression algorithm

I'm working on a compression algorithm wherein we have to write code in C. The program takes a file and removes the most significant bit in every character and stores the compressed text in another file. I wrote a function called compress as shown below. I'm getting a seg fault while freeing out_buf. Any help will be a great pleasure.
You close out_fd twice, so of course the second time it is an invalid file descriptor. But more than that, you need to review your use of sizeof() which is NOT the same as finding the buffer size of a dynamically-allocated buffer (sizeof returns the size of the pointer, not the buffer). You don't show the calling code, but using strcat() on a buffer passed-in is always worth a look too (is the buffer passed by the caller large enough for the result?).
Anyway, that should be enough to get you going again...
You're closing twice the same file descriptor
close(out_fd);
if ( close(out_fd) == -1 )
oops("Error closing output file", "");
Just remove the first close(out_fd)
The segmentation fault is because you moved the out_buf pointer.
If you want to put values inside his malloc'd area, use another temp pointer and move it through this memory area.
Like this:
unsigned char *out_buf = malloc(5400000*7/8);
unsigned char *tmp_buf = out_buf;
then subst every *out_buf++ with *tmp_buf++;
Change also the out_buf inside the write call with tmp_buf

c debug problem, free method

I have the problem in C program:
char *str = (char *) malloc(20);
strcpy_s(str, 10, "abcdefghij");
//here I change one byte before str and one byte after
*((int*)str-1) = 10;
*((int*)(str+20)) = 10;
//and it stops on the..
free(str);
line during the debug
what's wrong?
The part with overwriting not allocated memory is the part of the task. I know that usually it's not correct, but in this context it is the part of the task.
Why on earth would you think that you have the right to change anything located at str-1? You don't :)
It appears you have yet another problem, which, because of the vividness of the first one went past my attention. the addresses you may acces vary from str + 0 to str + 19. str + 20 is out of your realm :)
Both these things result in what's called undefined behavior. Which means you can't get surprised at any behavior! Including failing free, debugger crash, or whatever else
You're not allowed to write to str+20 because you only requested 20 bytes, so str+19 is the last byte you own. And the same goes for str-1.
Writing to memory outside what you allocated gives undefined behavior.
In this particular case, we can guess that the heap manager probably has some book-keeping information about each block of memory stored just before the memory that it hands to you. When you write to the byte before your allocated block, you're overwriting some part of that, so it can no longer free the block correctly.
*((int*)str-1) is often used to store the length of the allocated space, so that free knows how much byte to free...

Resources