Reading and writing large files in C

I would like to know what is wrong with the following code. It first opens an 800 MB file to fill the name array, and then accesses the stored data using indices read from another file. The problem is that after the array name has been read and filled, accessing any element of it causes a segfault. When the same code was tested on a smaller data set, it worked. What could be the reason? The hardware is an Intel i5 machine with 4 GB of RAM running 32-bit Linux.
#include <stdio.h>
#define MAXINT 61578414
int main(int argc,char** argv){
printf("Starting \n");
FILE* fp1 = fopen(argv[1],"r");
FILE* fp2 = fopen(argv[2],"r");
char** name;
name = (char**)malloc(MAXINT*sizeof(char*));
char* tname;
int i = 0;
int tmp1;
//reading to fill in name
while(i < MAXINT){
name[i] = (char*)malloc(20);
fscanf(fp1,"%d%s",&tmp1,name[i]);
i++;
}
//accessing elements from name
int i1,i2;
while(!feof(fp2)){
fscanf(fp2,"%d%d",&i1,&i2);
fprintf(stdout,"%s %s\n",*(name+i1),*(name+i2));
}
}

There is one potential problem with your allocation of name.
char** name;
name = (char**)malloc(MAXINT*sizeof(char*));
name is defined to be a pointer to pointer, used here as an array of pointers. The code is allocating MAXINT pointers, which is a very large number. You need to check whether name was successfully allocated or a NULL pointer was returned, i.e. whether malloc succeeded or not.
Similarly, name[i] = malloc(20) should also be checked for NULL pointer as your system may potentially run out of memory.
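The two checks described above could look roughly like this. It is only a sketch: the helper names alloc_names and free_names are hypothetical and not part of the question's code, and the caller chooses the string length (the question uses 20).

```c
#include <stdlib.h>

/* Hypothetical helper: frees the first `count` strings and the table itself. */
void free_names(char **names, size_t count)
{
    if (names == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(names[i]);
    free(names);
}

/* Hypothetical helper: allocates `count` strings of `len` bytes each.
   Returns NULL (after releasing any partial allocation) if malloc fails. */
char **alloc_names(size_t count, size_t len)
{
    char **names = malloc(count * sizeof *names);
    if (names == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        names[i] = malloc(len);
        if (names[i] == NULL) {
            free_names(names, i);   /* only the first i entries are valid */
            return NULL;
        }
    }
    return names;
}
```

With helpers like these, main could call alloc_names(MAXINT, 20) once and stop with an error message when it returns NULL, instead of continuing with an invalid table.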

Is this on Linux? If so, then (by default) a non-NULL return value from malloc doesn't tell you whether you actually have access to the amount of memory you requested. In practice, this means you can malloc far more memory than you have available. The memory is only really allocated when you first access it.
So write a simple loop that just reads or writes each byte you malloc'ed and see if that crashes. If it does, then that's your problem.
As a side-note, have you considered using mmap() instead to deal with large file/memory allocation?
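For reference, a minimal sketch of the mmap() approach on a POSIX system. The helper name count_newlines_mmap is hypothetical, and counting newlines merely stands in for whatever processing the file needs. Note that on a 32-bit system a mapping still consumes address space, so mapping a very large file in one piece can fail as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and counts its newline bytes.
   Returns -1 on any error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            lines++;

    munmap((void *)data, len);
    return lines;
}
```

The file contents are paged in on demand, so the program never has to copy the whole file into malloc'ed memory.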

I imagine that name = (char**)malloc(MAXINT*sizeof(char*)); actually fails, you should check for a NULL return value there.
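A rough estimate shows why a failure is plausible. The sketch below only adds up the bytes the question's code requests. It assumes 4-byte pointers (a 32-bit system) when called as shown, ignores the allocator's per-block overhead, and the function name min_bytes_requested is hypothetical.

```c
/* Lower bound on the bytes requested by the question's loop:
   one pointer per entry plus `str_size` bytes per string.
   Allocator bookkeeping overhead is deliberately ignored. */
unsigned long long min_bytes_requested(unsigned long long count,
                                       unsigned long long ptr_size,
                                       unsigned long long str_size)
{
    return count * ptr_size + count * str_size;
}
```

Calling min_bytes_requested(61578414, 4, 20) gives 1,477,881,936 bytes, about 1.38 GiB, before any allocator overhead. That is a large share of the address space available to a 32-bit process, so at least one of the malloc calls may well return NULL.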

Related

making char pointer pointing to null causes segmentation fault

On Linux, I am trying the code below, which causes a segmentation fault:
int main(int arg_count,char *args[]){
char *buffer;
if(arg_count>1)
buffer = args[1];
else
*buffer = 0;
}
I know that pointers point to read only part of the memory, so I changed my first try buffer[0]=0; to above. But I don't understand why this one is not working either?!
The final line of your function, *buffer = 0, is attempting to set the value referred to by the pointer buffer.
As buffer has never been initialised and therefore contains an indeterminate value, dereferencing buffer is very likely to cause a segfault.
For most projects you should never write argument-parsing code yourself. There are many robust and efficient libraries that will do a much better job than you (or I) could. As you are writing C on Linux, GNU getopt is a good option.
If you go through your program line by line, you'll see that if the user doesn't pass any arguments, buffer holds an indeterminate value. As another answer said, you need to initialize it. In your case I don't think you literally want to store the value 0 at the address that buffer points to. Here is code that shows how to handle arguments:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv){
char *buffer = NULL;
if(argc > 1){
buffer = argv[1];
}
else{
buffer = malloc(1024);
if(buffer == NULL)
return 1; //allocation failed
puts("please enter an argument");
if(fgets(buffer, 1024, stdin) != NULL){
//do stuff with buffer
}
free(buffer);
}
return 0;
}
In the code above, the program checks whether any arguments were passed. If none were, it allocates 1024 bytes, points buffer at that memory, and then asks the user for input. From this point you can do whatever you want with buffer.
The character pointer buffer is not initialised. Since buffer has automatic storage duration, it holds a garbage value. Writing through an uninitialised pointer is an invalid memory access, hence the segfault. Before accessing buffer, allocate memory for it using calloc or malloc.
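As a sketch of the same idea without dynamic allocation: the write of 0 is only safe once the pointer refers to storage the program owns. The helper name copy_first_arg is hypothetical, and the caller is assumed to pass a real array.

```c
#include <stdio.h>

/* Hypothetical helper: copies argv[1] into `out`, or stores an empty
   string when no argument was given. `out` must point to `out_size`
   bytes of valid storage owned by the caller. */
void copy_first_arg(int argc, char *argv[], char *out, size_t out_size)
{
    if (out_size == 0)
        return;
    if (argc > 1)
        snprintf(out, out_size, "%s", argv[1]);  /* truncates if too long */
    else
        out[0] = '\0';                           /* safe: out is valid storage */
}
```

A caller would declare something like char buffer[64]; in main and pass buffer together with sizeof buffer, so that out[0] = '\0' writes into the array rather than through an uninitialised pointer.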

Dynamic String array doesn't work

I'm trying to create a dynamic array of 1000 character long strings using calloc:
int i;
char** strarr =(char**)calloc(argc,sizeof(char)*1000);
if(strarr == NULL)
return 0;
strarr[0][0] ='a';
printf("%c\n",strarr[0][0]);
Every time I try to run this code I get a segmentation fault on the printf line. I don't understand why this happens (you can assume that argc is bigger than 0).
Thanks
P.S. I'm sorry that the code is in text format, but I'm on mobile so I don't have the code feature.
Try this:
const int num_of_strings = 255; //argc ?
const int num_of_chars = 1000;
int i;
char** strarr =(char**)malloc(sizeof(char*)*num_of_strings);
if(strarr == NULL)
return 0;
for (i = 0; i < num_of_strings; i++) strarr[i] = (char*)malloc(sizeof(char)*num_of_chars);
Hello, and welcome to the world of undefined behaviour, one of the darkest territories of the C language. Your code has several problems, each of which causes undefined behaviour. Execution nevertheless continues until you reach the printf line, where you access memory you have not allocated; the system finally catches this and raises a segmentation fault.
But I think, it would be better to walk ourselves through your code.
The variable i, which is declared in the int i; line is not used anywhere in the code you have posted, but I guess you need it later.
The first piece of code, that is not right, is in this second line, where you declare an array of strings or a char**. That means that you have a pointer to pointers to chars. So, what you really want to do there is allocate memory for those pointers and not for the chars they will point to. Note that a char consumes a different amount of memory than a char*. This line is, thus, the one to go with.
char** strarr = (char**) calloc(argc, sizeof(char*));
This allocates argc elements, each the size of a pointer: typically 4 bytes on a 32-bit system and 8 bytes on a 64-bit one.
You are doing a very good job of checking whether the calloc function returned NULL or not, which is a very good practice overall.
Next, you will want to allocate memory for the strings themselves, that are pointed to by the pointers, for which you allocated memory in the previous line. These lines will do it.
for (int i = 0; i < argc; i++) {
strarr[i] = (char*) calloc(1000, sizeof(char));
}
This allocates a 1000-character string for every element of our argc-sized string array.
After that, you can continue with your code as it is and I think that no errors will be produced. Please accept an additional piece of advice from me. Learn to love valgrind. It is a very helpful program, which you can run your code with, in order to analyse memory. It is my first step, whenever I get a segmentation fault.
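As an alternative that none of the answers here mentions: when every string has the same fixed length, the original idea of one big calloc call can be made to work by giving the result the right type, a pointer to arrays of 1000 chars. This is a sketch only; the names STR_LEN, fixed_str and alloc_fixed_strings are hypothetical.

```c
#include <stdlib.h>

#define STR_LEN 1000

/* One fixed-size string: an array of STR_LEN chars. */
typedef char fixed_str[STR_LEN];

/* Hypothetical helper: returns `count` zero-filled strings in ONE
   contiguous block, or NULL on failure. Release with a single free(). */
fixed_str *alloc_fixed_strings(size_t count)
{
    return calloc(count, sizeof(fixed_str));
}
```

With fixed_str *strarr = alloc_fixed_strings(argc);, the expression strarr[0][0] = 'a' is valid because strarr[0] is itself an array inside the block, not a pointer that still has to be set. The trade-off is that all strings must share the same maximum length.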

Segmentation fault while trying to make a dynamically allocated array of strings in C

I am trying to make a function which writes strings and integers to a file. A part of my approach involves creating a 2-D dynamic array to hold a series of strings. There are no compile time errors. However, when I try to run the program (the function plus a main to test it inside) I keep getting a segmentation fault error. I have traced it so far to a particular section of code that I have highlighted below. From what I can tell, it seems that my attempts to open the array are what is causing the seg fault (I have used a debug string with a %p character in it to make sure that the array is actually in the memory). I can't for the life of me tell why though. For convenience, I've only included the code that pertains to the seg fault as well as the declarations/initializations. If you want me to provide the whole main, please let me know. To be clear, I have tried Google for a solution, but everything I've found so far either I don't understand, or doesn't pertain to my situation.
int data_cat_num=0,current_cat=0,col_index=0;
char **cat_names;
printf("How many data categories will there be?:");
scanf("%d",&data_cat_num);
cat_names=malloc(data_cat_num*50);/*Currently defaulting to 50 bytes per name*/
while(current_cat<data_cat_num){
col_index=0;
printf("What is the name of category %d?",current_cat+1);
while(cat_names[current_cat][col_index]!='\n'){/*SEG FAULT TRACED TO THIS*/
cat_names[current_cat][col_index++]=getchar();/*LOOP!!!!!*/
}
cat_names[current_cat][col_index]='\n';
current_cat++;
}
This line:
cat_names=malloc(data_cat_num*50);
doesn't allocate 50 bytes per name; it allocates a single block of data_cat_num*50 bytes, which cat_names then treats as an array of uninitialized char * pointers. You need to do this:
cat_names = malloc(data_cat_num * sizeof(char *));
int i;
for(i = 0; i < data_cat_num; i++)
cat_names[i] = malloc(50);
The first line allocates data_cat_num names (i.e., char * variables), and later (in the for loop), 50 bytes for each name.
EDIT Also, when you free this memory, you need to do it this way:
for(i = 0; i < data_cat_num; i++)
free(cat_names[i]);
free(cat_names);
This line doesn't do what you think:
cat_names=malloc(data_cat_num*50);/*Currently defaulting to 50 bytes per name*/
cat_names contains pointers to the strings, not the strings themselves. So here you're allocating data_cat_num*50 bytes of space for pointers, but no space for the names themselves.
What you'll need to do is each time you add a name, malloc that as well. So...
cat_names[current_cat] = malloc(maxNameLength*sizeof(char));
char **cat_names;
This says cat_names will be a pointer to a pointer.
cat_names=malloc(data_cat_num*50);/*Currently defaulting to 50 bytes per name*/
This makes cat_names a pointer, but a pointer to garbage because the memory returned by malloc isn't initialized.
cat_names[current_cat][col_index]='\n';
Oops, you used the cat_names contents as a pointer, but it points to garbage.
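Separately from the allocation, the inner loop in the question tests cat_names[current_cat][col_index] before getchar() has stored anything there, and the earlier scanf leaves a newline in the input. One possible way to read each name once the 50-byte buffers exist is fgets. This is a sketch, and the helper name read_name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into `dst` (at most
   size-1 characters) and strips the trailing newline.
   Returns 1 on success, 0 on end of file or error. */
int read_name(FILE *in, char *dst, size_t size)
{
    if (fgets(dst, (int)size, in) == NULL)
        return 0;
    dst[strcspn(dst, "\n")] = '\0';   /* drop the newline if one was read */
    return 1;
}
```

It would be called as read_name(stdin, cat_names[current_cat], 50) after the leftover newline from scanf has been consumed. Unlike the original loop, fgets cannot write past the end of the buffer.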

fread a single int from bin file gives me a segmentation fault in C

I want your help in something that should be easy and I do not know why it does not work. I want to read the first data from a bin which I know that it is an int. I am using the following part of code, but I am getting a segmentation fault:
int main(int argc, char **argv)
{
int *data;
/*opening file*/
FILE *ptr_myfile;
ptr_myfile=fopen("myfile.bin","rb");
if (!ptr_myfile)
{
printf("Unable to open file!\n");
return 1;
}
/*input file opened*/
printf("I will read the first 32-bit number\n");
/*will read the first 32 bit number*/
fread(&data,sizeof(int),1, ptr_myfile);
printf("Data read is: %d\n",*data);
fclose(ptr_myfile);
return 0;
}
I also tried calling it like this:
fread(data,sizeof(int),1, ptr_myfile);
It should be something with the pointer but I cannot see what.
Change:
int *data; to int data;
printf("Data read is: %d\n",*data); to printf("Data read is: %d\n",data);
You are reading the int into a pointer, then trying to dereference the pointer (which has a value that's meaningless as a pointer).
You don't have any memory allocated to data and in the first example code you are not using the pointer correctly. This is an alternative that could work:
int data;
and then you would use it like so:
fread(&data,sizeof(int),1, ptr_myfile);
In your original code, here you are writing an int into a pointer:
fread(&data,sizeof(int),1, ptr_myfile) ;
and then this *data will be dereferencing an invalid pointer. In the alternative case:
fread(data,sizeof(int),1, ptr_myfile);
you will be using a pointer that has no memory allocated to it.
You are passing address of a non-allocated pointer. Remove * from data:
int data;
...
fread(&data, sizeof(int), 1, ptr_myfile);
You are overthinking the question of pointers. The function fread() needs to know the address at which to store the data it reads, so it is passed a pointer. That does not mean that the buffer used by fread should be a pointer, just that fread needs a pointer to that buffer.
So your first error is writing int *data; where you really just mean int data;. You say the data element to be read is an int, so you would just declare an int to hold it.
You correctly called fread to read the data by passing a pointer to actual allocated and valid memory (i.e. &data), avoiding a different beginner pitfall of passing the uninitialized pointer to integer you declared earlier.
Your actual error is caused by writing *data in the call to printf. The * operator takes a pointer and dereferences it. In other words, it assumes that the pointer points to something, and retrieves that value. However, when the pointer being dereferenced does not point to any valid something, you have a problem.
And here is where you were lucky. The value you read from the file was then treated by the * operator as an address in memory, but happened to be an address that was not part of the memory accessible to your process, which caused the system to take notice and halt the process with a segmentation fault. As a result, you learned immediately that there was an error in your code.
If you were not lucky, the integer you read from the file, when treated as a memory address, could have been the address of something already present in your process. You would not have caused a segmentation fault, and you would be puzzled by why printf produced a strange answer.
Now imagine if you read a value from an untrusted source and treated it as an arbitrary address. An attacker could use that error to learn something about your program. Worse, if you had written to the memory referenced by that pointer, the attacker could conceivably change the behavior of your program.
As a final point, note that reading and writing int (or any other data type) is safe and acceptable as long as that file will never be transferred to a different system. Not all systems agree on even the simplest definitions like the number of bytes occupied by an int (which you did handle correctly by using sizeof(int), but not in a way that would allow exchanging the file with a system with a different integer size), or even the order of those bytes in memory. Correctly addressing these issues is a large subject.
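One common way to sidestep the size and byte-order issues is to read the bytes individually and assemble the value yourself. The sketch below assumes the file stores the number as 4 bytes in little-endian order; that is an assumption about the file format, not something stated in the question. The helper name read_u32_le is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads 4 bytes and assembles them as a
   little-endian unsigned 32-bit value. Returns 1 on success, 0 otherwise. */
int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;                     /* short read or error */
    *out = (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
    return 1;
}
```

Because the result depends only on the bytes in the file, it is the same on any host regardless of sizeof(int) or native byte order. Checking the return value of fread also catches files that are shorter than expected.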

C Pointers question

I have a small conceptual question about pointers. This may be embarrassing but I need to know the answer.
I was trying to read a line from a file using getline function. getline takes char ** as its first argument and that is where the line pointer is stored. Please see the pasted code below and tell me the difference between them. Notice the declaration and use of readLine pointer.
The second code gave me segmentation fault when it reached printf(). I checked the value at *readLine with gdb(before printf()) and it was correct, but when it goes to printf(), boom SIGSEGV
this code works:
FILE *fp;
char *readLine;
readLine=NULL;
int s=0;
while(getline(&readLine,(size_t *)&s,fp) != -1){
printf("%s\n",readLine);
}
this code does not work:
FILE *fp;
char **readLine;
*readLine=NULL;
int s=0;
while(getline(readLine,(size_t *)&s,fp) != -1){
printf("%s\n",*readLine);
}
cheers...
rv
(size_t *)&s
This can crash on a 64-bit machine with 32-bit ints. The solution to this kind of problem is to declare the type which is required (size_t s;), not to cast anything. On x86-64, getline writes 8 bytes to a 4-byte location on the stack, which results in stack corruption. Because the overwrite happens in the called function, it could overwrite the return address, for example.
char **readLine;
*readLine=NULL;
This will also most likely crash immediately. You are assigning a value to the target of an uninitialized pointer, changing the bytes at some unknown point in memory.
In the first case, the variable readLine -- the value of which is stored on the stack, in its own special reserved area -- is a pointer-to-char. When you pass its address to getline(), you tell getline() to store the real pointer into the memory which is reserved for it. Everything works.
In the second case, readLine is a pointer-to-pointer-to-char, and again, there is space reserved for it on the stack. When you call getline(), though, you're telling getline() that the readLine variable holds the address in which the pointer-to-char should be stored. But readLine points to some random memory somewhere, not to a location which getline() should be allowed to store data. In fact, you've already started corrupting memory when you write
*readLine = NULL;
because as I said, readLine is pointing to memory you don't own. getline() just makes it worse.
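If a char ** variable is really wanted, it has to point at an actual char * first. The sketch below shows one way the second snippet could be made valid; the function name count_lines and the variable names are hypothetical, and getline is assumed to be available (POSIX.1-2008).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in `fp` using getline(). */
long count_lines(FILE *fp)
{
    char *line = NULL;          /* storage for the pointer getline fills in */
    char **readLine = &line;    /* readLine now points at real memory */
    size_t cap = 0;             /* size_t, so no cast is needed */
    long count = 0;

    while (getline(readLine, &cap, fp) != -1)
        count++;

    free(line);                 /* getline allocated the buffer */
    return count;
}
```

Here *readLine and line are the same object, so passing readLine is equivalent to passing &line in the working version.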
Please find an example below; it was compiled and executed on Ubuntu 18.04.
If you use Linux, please type "man getline"; man pages are your friend.
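#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
char *readLine = NULL; /* getline allocates the buffer when this is NULL */
FILE *fp;
size_t n = 0;
fp = fopen("example.c", "r");
if (fp == NULL) {
perror("fopen");
return 1;
}
/* getline returns -1 at end of file or on error */
while(getline(&readLine,&n,fp) != -1){
printf("%s",readLine); /* the line already ends with its newline */
}
free(readLine);
fclose(fp);
return 0;
}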
