How to solve heap buffer overflow - c

I am having a hard time solving a problem with my program. It seems to run as it should, but when I compile it with AddressSanitizer I get a heap buffer overflow. The function gets a 2D array in which replace[a][0] is a word to look for and the word next to it is the one that will later be used to replace it.
const char * findInArray(const char * (*replace)[2], char * start) // function goes through a array 'start' and searches if the word is also in 'replace'
{
char * copy = (char *) malloc (strlen(start)+1); // I made a copy of a field, so I do not modify the one I passed to function
char * total = (char *) malloc (strlen(start)+2); // Here I will add words from copy one by one
memmove(copy,start,strlen(start)+1); // I fill the copy array
char * tokens = strtok (copy," "); // I split copy array
while (tokens != NULL)
{
printf("%lu ",strlen(total));
memmove(total+strlen(total)+1,tokens,strlen(tokens)); // I add words to a new array one by one
memmove (total + strlen(total)," ",1); // at the end of each word I add space
for (int i = 0 ; replace[i][0] != NULL; i++) // I search if word is in array or not, if yes I return its adress
{const char *ptr = strstr(total,replace[i][0]);
if (ptr != NULL) // If there is match - I return pointer to a word that will be replaced
{
free (total);
return replace[i][0];
}
}
//printf("%s\n",tokens);
tokens = strtok(NULL, " ");
}
free (total);
return NULL;
}
The problem, if I understand it correctly, is that I am reading values in the total array even though it has already been allocated, which does not make a lot of sense to me.
Here is the error message:
==58229==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x000104d00bbb at pc 0x000102c7468c bp 0x00016d562b20 sp 0x00016d5622d8
READ of size 44 at 0x000104d00bbb thread T0
#0 0x102c74688 in wrap_strlen+0x164 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x14688)
#1 0x10289e4dc in findInArray test 4.c:39 // the line of moving memory to total
#2 0x10289ee14 in newSpeak test 4.c:129
#3 0x10289f880 in main test 4.c:186
#4 0x1b4df7e4c (<unknown module>)
0x000104d00bbb is located 0 bytes to the right of 43-byte region [0x000104d00b90,0x000104d00bbb)
allocated by thread T0 here:
#0 0x102c9eca8 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3eca8)
#1 0x10289e480 in findInArray test 4.c:31 // the line of allocating total array
#2 0x10289ee14 in newSpeak test 4.c:129
#3 0x10289f880 in main test 4.c:186
#4 0x1b4df7e4c (<unknown module>)
Thank you for any help.

Related

Segmentation fault in c (C90), Whats the problem?

Here's my main.c:
int main() {
char *x = "add r3,r5";
char *t;
char **end;
t = getFirstTok(x,end);
printf("%s",t);
}
And the function getFirstTok:
/* getFirstTok function returns a pointer to the start of the first token. */
/* Also makes *endOfTok (if it's not NULL) to point at the last char after the token. */
char *getFirstTok(char *str, char **endOfTok)
{
char *tokStart = str;
char *tokEnd = NULL;
/* Trim the start */
trimLeftStr(&tokStart);
/* Find the end of the first word */
tokEnd = tokStart;
while (*tokEnd != '\0' && !isspace(*tokEnd))
{
tokEnd++;
}
/* Add \0 at the end if needed */
if (*tokEnd != '\0')
{
*tokEnd = '\0';
tokEnd++;
}
/* Make *endOfTok (if it's not NULL) to point at the last char after the token */
if (endOfTok)
{
*endOfTok = tokEnd;
}
return tokStart;
}
Why do I get a segmentation fault when running this main program?
I'm programming a two-pass assembler and I need a function that parses a string by a delimiter, in this case whitespace. Is it better to use strtok for this purpose instead?
I need a command parser that extracts "add", and an operand parser (using , as the delimiter) that extracts "r3" and "r5". I wanted to check whether this getFirstTok function is good for this purpose, but when I try to run it I get a segmentation fault:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
Thank you.
As pointed out in the comments, string literals are read-only, as they are baked into the compiled program. If you don't want to go with the suggested solution of making your "source program" a stack-allocated array of characters (char x[] = "add r3,r5"), you can use a function like strdup(3) to make a readable/writable copy like so:
#include <string.h>
[...]
char *rw_code = strdup(x);
t = getFirstTok(rw_code, NULL); /* don't pass the question's uninitialized end; use NULL or the address of a real char * */
printf("%s", t);
free(rw_code); /* NOTE: invalidates _all_ references pointing at it! */
[...]
And as a little aside, I always declare pointers to string literals as const (const char *lit = "...";), as the compiler will usually warn me if I attempt to write to them later on.

Two-dimensional char array too large exit code 139

Hey guys, I'm attempting to read in workersinfo.txt and store it in a two-dimensional char array. The file is around 4,000,000 lines with around 100 characters per line. I want to store each line of the file in the array. Unfortunately, I get exit code 139 (Not enough memory). I'm aware I have to use malloc() and free(), but I've tried a couple of things and I haven't been able to make them work. Eventually I have to sort the array by ID number, but I'm stuck on declaring the array.
The file looks something like this:
First Name, Last Name,Age, ID
Carlos,Lopez,,10568
Brad, Patterson,,20586
Zack, Morris,42,05689
This is my code so far:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
FILE *ptr_file;
char workers[4000000][1000];
ptr_file =fopen("workersinfo.txt","r");
if (!ptr_file)
perror("Error");
int i = 0;
while (fgets(workers[i],1000, ptr_file)!=NULL){
i++;
}
int n;
for(n = 0; n < 4000000; n++)
{
printf("%s", workers[n]);
}
fclose(ptr_file);
return 0;
}
Stack memory is limited. As you pointed out in your question, you MUST use malloc to allocate such a big (need I say HUGE) chunk of memory, as the stack cannot contain it.
You can use ulimit to review the limits of your system (usually including the stack size limit).
On my Mac, the limit is 8MB. After running ulimit -a I get:
...
stack size (kbytes, -s) 8192
...
Or, test the limit using:
#include <sys/resource.h>
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur // the current (soft) stack limit, in bytes
I truly recommend you process each database entry separately.
As mentioned in the comments, assigning the memory as static memory would, in most implementations, circumvent the stack.
Still, IMHO, allocating 400MB of memory (or 4GB, depending which part of your question I look at), is bad form unless totally required - especially for a single function.
Follow-up Q1: How to deal with each DB entry separately
I hope I'm not doing your homework or anything... but I doubt your homework would include an assignment to load 400MB of data to the computer's memory... so... to answer the question in your comment:
The following sketch of single entry processing isn't perfect - it's limited to 1KB of data per entry (which I thought to be more than enough for such simple data).
Also, I didn't allow for UTF-8 encoding or anything like that (I followed the assumption that English would be used).
As you can see from the code, we read each line separately and perform error checks to check that the data is valid.
To sort the file by ID, you might consider either reading two lines at a time and sorting them (this would be a slow sort), or creating a sorted node tree with the ID data and the position of the line in the file (get the position before reading the line). Once you have sorted the binary tree, you can sort the data...
... The binary tree might get a bit big. Did you look up sorting algorithms?
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the age - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 1 on success or 0 on failure.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return 0;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' separators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a divider. we should handle it.
// if the ID field's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
return 0;
}
// replace the comma with 0 (separate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's position (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 1;
} else {
// we reached an EOL without enough ',' separators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
return 0;
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return 0;
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE* ptr_file;
ptr_file = fopen("workersinfo.txt", "r");
if (!ptr_file) {
perror("File Error");
return 1; // nothing to read, so stop instead of passing NULL to fgets
}
struct DBEntry line;
while (read_db_line(ptr_file, &line)) {
// do what you want with the data... print it?
printf(
"First name:\t%s\n"
"Last name:\t%s\n"
"Age:\t\t%s\n"
"ID:\t\t%s\n"
"--------\n",
line.first_name, line.last_name, line.age, line.id);
}
// close file
fclose(ptr_file);
return 0;
}
Follow-up Q2: Sorting array for 400MB-4GB of data
IMHO, 400MB is already touching on the issues related to big data. For example, implementing a bubble sort on your database would be agonizing as far as performance goes (unless it's a one-time task, where performance might not matter).
Creating an array of DBEntry objects will eventually get you a larger memory footprint than the actual data.
This will not be the optimal way to sort large data.
The correct approach will depend on your sorting algorithm. Wikipedia has a decent primer on sorting algorithms.
Since we are handling a large amount of data, there are a few things to consider:
It would make sense to partition the work, so different threads/processes sort a different section of the data.
We will need to minimize IO to the hard drive (as it will slow the sorting significantly and prevent parallel processing on the same machine/disk).
One possible approach is to create a heap for a heap sort, but only storing a priority value and storing the original position in the file.
Another option would probably be to employ a divide and conquer algorithm, such as quicksort, again, only sorting a computed sort value and the entry's position in the original file.
Either way, writing a decent sorting method will be a complicated task, probably involving threading, forking, tempfiles or other techniques.
Here's some simplified demo code... it is far from optimized, but it demonstrates the idea of a sorted structure that holds only the sorting value and the position of the data in the file. Note that, although the comments call it a tree or a heap, the structure in this demo is really a sorted singly linked list, so every insertion walks the list from the top.
Be aware that using this code will be both relatively slow (although not that slow) and memory intensive...
On the other hand, it will require about 24 bytes per entry. For 4 million entries, it's 96MB, somewhat better than 400MB and definitely better than the 4GB.
#include <stdlib.h>
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the age - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// this might be a sorting node (the nodes form a sorted singly linked list):
struct SortNode {
struct SortNode* next; // a pointer to the next node
fpos_t position; // the DB entry's position in the file
long value; // The computed sorting value
}* top_sorting_node = NULL;
// this function will free all the memory used by the global Sorting tree
void clear_sort_heap(void) {
struct SortNode* node;
// as long as there is a first node...
while ((node = top_sorting_node)) {
// step forward.
top_sorting_node = top_sorting_node->next;
// free the original first node's memory
free(node);
}
}
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 0 on success or -1 if no line could be read.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return -1;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' separators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a divider. we should handle it.
// if the ID field's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
clear_sort_heap();
exit(2);
}
// replace the comma with 0 (separate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's position (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 0;
} else {
// we reached an EOL without enough ',' separators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
clear_sort_heap();
exit(1);
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return -1; // report failure: line->id may still be NULL at this point
}
// read and sort a single line from the database.
// return 0 if there was no data to sort. return 1 if data was read and sorted.
int sort_line(FILE* fp) {
// allocate the memory for the node - use calloc for zero-out data
struct SortNode* node = calloc(1, sizeof(*node));
if (!node) // out of memory: treat it as "no more data"
return 0;
// store the position on file
fgetpos(fp, &node->position);
// use a stack allocated DBEntry for processing
struct DBEntry line;
// check that the read succeeded (read_db_line will return -1 on error)
if (read_db_line(fp, &line)) {
// free the node's memory
free(node);
// return no data (0)
return 0;
}
// compute sorting value - I'll assume all IDs are numbers up to long size.
sscanf(line.id, "%ld", &node->value);
// insertion into a sorted singly linked list (not really a heap sort).
// This is a questionable sort algorithm... or a questionable implementation.
// Also, I'll be using pointers to pointers, so it might be a headache to read
// (it's a headache to write, too...) ;-)
struct SortNode** tmp = &top_sorting_node;
// move along the list until we encounter a node with a larger value,
// OR until the list is finished.
while (*tmp && (*tmp)->value <= node->value)
tmp = &((*tmp)->next);
// update the node's `next` value.
node->next = *tmp;
// inject the new node into the list at the position we found
*tmp = node;
// return 1 (data was read and sorted)
return 1;
}
// writes the next line in the sorting
int write_line(FILE* to, FILE* from) {
struct SortNode* node = top_sorting_node;
if (!node) // are we done? top_sorting_node == NULL ?
return 0; // return 0 - no data to write
// step top_sorting_node forward
top_sorting_node = top_sorting_node->next;
// read data from one file to the other
fsetpos(from, &node->position);
char* buffer = NULL;
ssize_t length;
size_t buff_size = 0;
length = getline(&buffer, &buff_size, from);
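if (length <= 0) {
perror("Line Copy Error - Couldn't read data");
free(buffer); // getline may have allocated a buffer even on failure
free(node);
return 0;
}
fwrite(buffer, 1, length, to);
free(buffer); // getline allocates memory that we're in charge of freeing.
free(node); // the node was unlinked from the list above, so release it here
return 1;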
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE *fp_read, *fp_write;
fp_read = fopen("workersinfo.txt", "r");
fp_write = fopen("sorted_workersinfo.txt", "w+");
if (!fp_read) {
perror("File Error");
goto cleanup;
}
if (!fp_write) {
perror("File Error");
goto cleanup;
}
printf("\nSorting");
while (sort_line(fp_read))
printf(".");
// write all sorted data to a new file
printf("\n\nWriting sorted data");
while (write_line(fp_write, fp_read))
printf(".");
// clean up - close files and make sure the sorting tree is cleared
cleanup:
printf("\n");
if (fp_read) // fclose(NULL) is undefined behavior, so check first
fclose(fp_read);
if (fp_write)
fclose(fp_write);
clear_sort_heap();
return 0;
}

Memory in valgrind

I'm having an issue with memory in valgrind. I've been trying to figure out what's wrong but I can't seem to find it. Here is my issue:
==32233== Invalid write of size 1
==32233== at 0x4C2E1E0: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32233== by 0x4010C7: songCopy (song.c:102)
==32233== by 0x4009E6: main (songtest.c:82)
==32233== Address 0x51fda09 is 0 bytes after a block of size 9 alloc'd
==32233== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32233== by 0x4010A4: songCopy (song.c:101)
==32233== by 0x4009E6: main (songtest.c:82)
And this is where the issue is.
song *songCopy(const song *s)
{
//song *d = NULL ;
mtime *tmp = NULL ;
song *d = malloc(sizeof(song));
d->artist = malloc(sizeof(s->artist) + 1) ;
strcpy(d->artist, s->artist) ;
d->title = malloc(sizeof(s->title) + 1) ;
strcpy(d->title, s->title) ;
if (NULL != s->lastPlayed)
{
// copy the last played
tmp = mtimeCopy(s->lastPlayed) ;
d->lastPlayed = tmp ;
}
else
{
// set lastPlayed to NULL
d->lastPlayed = NULL ;
}
return d ;
}
I've tried dereferencing and adding more space to malloc. I know it's going wrong in the strcpy but I'm not sure why.
You did not show the declaration of song, but from the usage it looks like its artist and title members are char* pointers. You can use sizeof to measure an array, but not a block pointed to by a pointer. sizeof is the same for all char* pointers on your machine, no matter how long the string they point to is.
You need to use strlen(str)+1 instead of sizeof(str)+1 to fix this problem:
d->artist = malloc(strlen(s->artist) + 1) ;
strcpy(d->artist, s->artist) ;
d->title = malloc(strlen(s->title) + 1) ;
strcpy(d->title, s->title) ;

C KERN_INVALID_ADDRESS in regnexec()

I am writing a program in C that relies heavily on regular expressions. My mechanism for executing them works 99% of the time, but every once in a while it crashes the program and I am stumped as to why.
New_Sifter() takes a string representation of its regex and a processing function that takes an array of strings and returns a single string.
Sifter* New_Sifter(const char* exp, const char*(*func)(const char**, size_t)){
Sifter *sifter = malloc(sizeof(Sifter*));
sifter->strRegEx = exp;
if(regcomp(&(sifter->regEx), exp, REG_EXTENDED)){
printf("Could not compile regular expression\n");
exit(1);
}
sifter->Sift = &Base_;
sifter->Custom = func;
sifter->nGroups = sifter->regEx.re_nsub + 1;
sifter->captures = malloc(sifter->nGroups * sizeof(regmatch_t));
Register_Disposable(sifter->captures); //stores pointer in registry to be freed later
Register_Disposable(sifter); //stores pointer in registry to be freed later
return sifter;
}
const char* Base_(Sifter* self, const char* source){
if(regexec(&(self->regEx), source, self->nGroups, self->captures,
REG_EXTENDED) != 0){
printf("about to return null\n");
return NULL;
}
return self->Custom(
//Sift_() returns an array of the strings captured in the regexec
Sift_(source, self->captures, self->nGroups), self->nGroups);
}
The error I get sometimes when I run this (and debug some with gdb) looks like:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000008
0x00007fff90d72b9f in tre_reset_last_matched_branches ()
(gdb) bt
0x00007fff90d72b9f in tre_reset_last_matched_branches ()
0x00007fff90d72a58 in tre_fill_pmatch ()
0x00007fff90d72e56 in tre_match ()
0x00007fff90d72d35 in regnexec ()
0x00000001000030cf in Base_ (self=0x1001000e0, source=0x1000033ee "add 11111, 22222, 33333")
Sifter *sifter = malloc(sizeof(Sifter*));
This line has one * too many. You are allocating space for a pointer, not space for the struct. Take the * out of the sizeof.

elements in array of char* getting corrupted when copied into a new array

I have come across this before, and the answer has been either to null-terminate the string, or to be sure to allocate enough memory for the string. Here is the relevant snippet of code:
for (z; z<mountable_volumes; z++)
{
main_items[z] = malloc(strlen(volumes[z])+2);
main_items[z] = volumes[z];
printf("main_items[%i]: %s\n", z, volumes[z]);
}
main_items[z] = NULL;
return main_items;
volumes[] is correct, but when its contents get copied into main_items[], it goes bad. I have tried playing with malloc, even outright allocating way more RAM than necessary. I have also tried tacking a '\0' onto the end of each main_items[] element. I have tried using strcpy, strncpy, and sprintf with the same results.
Here is the log from my program:
volumes[0]: Unmount /sdcard
volumes[1]: Mount /system
volumes[2]: Unmount /cache
volumes[3]: Mount /data
volumes[0]: Unmount /sdc)☻
main_items[1]: Mount /syste)☻
main_items[2]: Unmount /cac‼
main_items[3]: Mount /data
What am I missing? Thanks! I can paste more of the function if its needed.
EDIT:
here is the entire function: (I have applied the strndup() and free() tips)
char** get_mount_menu_options()
{
Volume * device_volumes = get_device_volumes();
num_volumes = get_num_volumes();
char** volumes = malloc (num_volumes * sizeof (char *));
int mountable_volumes = 0;
int usb_storage_enabled = is_usb_storage_enabled();
int i;
for (i=0; i<num_volumes; i++)
{
volumes[i] = "";
Volume *v = &device_volumes[i];
char* operation;
if (is_path_mountable(v->mount_point) != -1)
{
if (is_path_mounted(v->mount_point)) operation = "Unmount";
else operation = "Mount";
volumes[mountable_volumes] = malloc(sizeof(char*));
printf("volumes[%i]: %s %s\n", mountable_volumes, operation, v->mount_point);
sprintf(volumes[mountable_volumes], "%s %s", operation, v->mount_point);
mountable_volumes++;
}
}
char **main_items = malloc (num_volumes * sizeof (char *));
int z;
for (z=0; z<mountable_volumes; z++)
{
main_items[z] = strndup(volumes[z], strlen(volumes[z]));
free(volumes[z]);
printf("main_items[%i]: %s\n", z, volumes[z]);
}
main_items[z] = NULL;
return main_items;
}
CURRENT LOG:
volumes[0]: Unmount /sdcard
volumes[1]: Mount /system
volumes[2]: Unmount /cache
volumes[3]: Mount /data
main_items[0]: Unmount /sdc)☻
main_items[1]: Mount /syste)☻
main_items[2]: Unmount ¿♠
main_items[3]:
Thanks everyone!
For copying strings, you need to use strcpy(). Assignment only copies the pointer, not the characters!
strcpy(main_items[z], volumes[z]);
And moreover,
main_items[z] = malloc(strlen(volumes[z])+2);
should be
main_items[z] = (char *) malloc(strlen(volumes[z]) + 1);
And I assume main_items[z] and volumes[z] are both char *.
As an alternative to everyone's examples of strcpy(), you can just use strdup(), and skip the malloc():
main_items[z] = strdup(volumes[z]);
Obviously you still have to free() your memory! This also relies on volumes[z] being NUL-terminated.
Edit: Or, as Peter Downs points out in the comments, you can use strndup() rather than relying on NUL-termination, if you know the lengths of your strings.
main_items[z] = volumes[z]; should be strcpy(main_items[z], volumes[z]);, otherwise the memory that you have allocated on the line just above is leaked, and the pointer main_items[z] becomes aliased to the pointer in volumes[z].
