Segmentation fault in file parsing code - C

I am getting a segmentation fault when I try to run my program that does matrix addition. I am trying to run the program separately ~1000 times (while timing each run and writing the result to a file).
The problem is, I get a segmentation fault after a number of runs - how far I get depends on the size of the matrix. For example, if I run a 10x10 matrix (each instance has randomly generated numbers), I get a segmentation fault after exactly 1013 runs. For a 100x100 matrix, I get a segfault at 260 runs.
A quick run-through of how the program works is as follows:
Numbers are randomly generated and written to a file, depending on the entered input (10x10, 100x100)
The numbers are read in from the file and sent to CUDA*
CUDA calculates the results and writes them to a results file (and also times how long the calculation took and writes that to another file)
*This step appears to be causing the segmentation fault according to the GDB debugger. Below is the error output from the debugger and the function that is causing the error.
>Program terminated with signal 11, Segmentation fault.
#0 0x0000000000402f4c in readFromFile(int, char, int&, int&, float*) ()
Here is the actual function:
void readFromFile(int fd, char byte, int &matrixWidth, int &matrixHeight, float *matrix)
{
    int tokenIndex = 0;
    char *token = (char *) malloc(500);
    int matrixIndex = 0;
    while (read(fd, &byte, 1)) {
        if (isdigit(byte) || byte == '.') {
            token[tokenIndex] = byte;
            tokenIndex++;
        }
        else if (byte == ' ' && matrixHeight == 0) {
            matrixWidth++;
            token[tokenIndex] = '\0';
            matrix[matrixIndex] = atof(token);
            //printf("Stored: %d\n", matrixOne[matrixIndex]);
            tokenIndex = 0;
            matrixIndex++;
        }
        else if (byte == '\n') {
            matrixHeight++;
            if (tokenIndex != 0) {
                token[tokenIndex] = '\0';
                matrix[matrixIndex] = atof(token);
                //printf("Stored: %d\n", matrixOne[matrixIndex]);
                tokenIndex = 0;
                matrixIndex++;
            }
        }
        else if (byte == ' ' && matrixHeight != 0) {
            token[tokenIndex] = '\0';
            matrix[matrixIndex] = atof(token);
            tokenIndex = 0;
            matrixIndex++;
        }
        //printf("Token: %s, number matrix: %f\n", token, matrix[matrixIndex - 1]);
    }
}
This code is repeatedly run until the segmentation fault occurs (each time, the file it reads has different numbers). If you need any more code, just let me know. Any help would be greatly appreciated.

There are many problems that could cause the segmentation fault in the code you posted. Let me list a few:
As Jens pointed out, how sure are you that the size of any token is actually < 500 characters?
How sure are you that there are no run-on numbers like 1.32.4? atof gives you no way to detect them, so use strtod instead.
Use a debugger to run the program, as macs pointed out; that will give you the line it crashed on, and you can inspect the values of the variables, which gives you a better idea.
Use safety checks when reading the input, as the run-on numbers could be the least of the problems. Multiple '\n' characters could mess it up.
Since everything is text, I would suggest you read using fscanf rather than building your own parser (see the sketch below).
Read up on row-major vs. column-major formats; depending on what the data might be, you may not need to keep track of dimensions.
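For instance, a minimal sketch of the fscanf approach (assuming the dimensions are known up front and the file holds whitespace-separated numbers; read_matrix and its parameters are illustrative names, not your code):
#include <stdio.h>

/* Read width*height floats from an already-opened stream.
   Returns 0 on success, -1 if the file ends early or a token
   is not a valid number. */
int read_matrix(FILE *fp, float *matrix, int width, int height)
{
    for (int i = 0; i < width * height; i++) {
        if (fscanf(fp, "%f", &matrix[i]) != 1)
            return -1;    /* malformed token or premature EOF */
    }
    return 0;
}
fscanf skips spaces and newlines for you and reports parse failures through its return value, which removes the hand-rolled tokenizing (and the fixed 500-byte token buffer) entirely.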

How do you allocate the memory for the matrix? Equally important, do you free it? Anyway, a hint: compile your program with -g option to generate debug information, and learn how to use a debugger. Then you will find the problem, whereas we can just guess.
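If the matrix lives on the heap, the pattern the answer is asking about would look something like this (a sketch only -- the caller isn't shown in the question, and it assumes the dimensions are known from the user's input before the read):
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical caller: size the buffer from the known dimensions,
   check the allocation, and release it after every run. */
float *matrix = (float *) malloc((size_t) width * height * sizeof *matrix);
if (!matrix) {
    perror("malloc");
    exit(EXIT_FAILURE);
}
readFromFile(fd, byte, width, height, matrix);
/* ... run the CUDA kernel, write the results ... */
free(matrix);
A buffer sized for one matrix but filled from a larger file would corrupt the heap in exactly the delayed, size-dependent way the question describes.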

Related

When using MPI, how can I fix the error: "_int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed"?

I'm using a scientific simulation code that my supervisor wrote about 10 years ago to run some calculations. An intermittent issue keeps arising when running it in parallel on our cluster (which has hyperthreading enabled) using mpirun. The error it produces is very terse, and simply tells me that a memory allocation has failed.
program-name: malloc.c:4036: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
[MKlabgroup:3448077] *** Process received signal ***
[MKlabgroup:3448077] Signal: Aborted (6)
[MKlabgroup:3448077] Signal code: (-6)
I've used the advice here to start the program and halt it so that I can attach a debugger session to one of the instances on a single core. The error occurs during the partitioning of the input mesh (using METIS), when a custom matrix function is called for the first time and requests space for ~4000 rows and 4 columns, with each element being an 8-byte integer. This particular function (below) uses an array of n pointers, each addressing an array of m integers:
int **matrix_int(int n, int m)
{
    int i;
    int **mat;

    // First: Assign the rows [n]
    mat = (int **) malloc(n * sizeof(int *));
    if (!mat)
    {
        program_error("ERROR[0]: Memory allocation failure for the rows #[matrix_int()]");
    }

    // Second: Assign the columns [m]
    for (i = 0; i <= n - 1; i++)
    {
        mat[i] = (int *) malloc(m * sizeof(int));
        if (!mat[i])
        {
            program_error("ERROR[0]: Memory allocation failure for the columns #[matrix_int()]");
        }
    }
    return mat;
}
My supervisor thinks that the issue has to do with automatic resource allocation on the CPU. As such, I've recently tried using the -rf option in mpirun in conjunction with a rankfile specifying which cores to use, but this has produced similarly intermittent results; sometimes a single process crashes, sometimes several, and sometimes it runs fine. It always runs reliably in serial, but the calculations are extremely slow on a single core.
Does anyone know of a change to the server configuration or the code itself that I can make (aside from globally disabling hyperthreading) that would allow this to run for certain every time?
(Any general tips on debugging in parallel would also be greatly appreciated! I'm still pretty new to C/C++ and MPI, and have another bug to chase after this one which is probably related.)
After using the compiler flags suggested by n. 1.8e9-where's-my-share m. to diagnose memory access violations, I've discovered that the memory corruption is indeed caused by a function that is called just before the one in my original question.
The offending function reads in data from a text file using sscanf, and would allocate a 3-element array for each line of the file (for the 3 numbers to be read in per line). The next part is conjecture, but I think the problem arose because sscanf writes a terminating NULL at the end of a sequence it reads. I'm surmising that this NULL was written to the next byte along from the 3 allocated, so that the next time malloc tried to allocate data, the first thing it saw was a NULL, causing it to return without actually having allocated any space. The next function to try to use that memory would then crash, because it was accessing unassigned memory that malloc had reported as allocated.
I was able to fix the bug by changing the size of the allocated array in the read function from 3 to 4 elements. This seems to leave room for the NULL character to be stored without it interfering with subsequent memory allocations.
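For illustration (a hypothetical sketch, not the actual read function): a string conversion in sscanf always appends a terminating '\0', so a buffer sized only for the characters overflows by exactly one byte:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *line = "abc def";
    char *buf = malloc(3);    /* room for three characters, but not the '\0' */
    if (!buf)
        return 1;
    /* %3s matches up to three characters, then writes a fourth byte,
       the terminator, one past the end of buf -- quietly trampling
       the allocator's bookkeeping for a neighbouring block. */
    sscanf(line, "%3s", buf);
    free(buf);
    return 0;
}
A one-byte heap overrun like this typically goes unnoticed until a later malloc or free walks the damaged metadata, which fits the intermittent crash in an unrelated function described above.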

Array & segmentation fault

I'm creating the below array:
int p[100];

int main()
{
    int i = 0;
    while (1)
    {
        p[i] = 148;
        i++;
    }
    return (0);
}
The program aborts with a segmentation fault after writing 1000 positions of the array, instead of at 100. I know that C doesn't check whether the program writes out of bounds; this is left to the OS. I'm running it on Ubuntu; the size of the stack is 8MB (limit -s). Why is it aborting only after 1000? How can I check how much memory my OS allocates for an array?
Sorry if this has been asked before; I've been googling but can't seem to find a specific explanation.
Accessing an invalid memory location leads to Undefined Behavior, which means that anything can happen. A segmentation fault is not guaranteed to occur.
...the size of the stack is 8MB (limit -s)...
The variable int p[100]; is not on the stack but in the data area, because it is defined as a global. It is not initialized, so it is placed in the BSS area and filled with zeros. You can check that by printing the array values right at the beginning of the main() function.
As others said, with p[i] = 148; you produced undefined behaviour. Filling 1000 positions, you most probably reached the end of the BSS area and got a segmentation fault.
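For instance (a minimal check using the same global array as the question):
#include <stdio.h>

int p[100];    /* uninitialized global: placed in the BSS and zero-filled */

int main()
{
    int i;
    /* every element reads as 0 before anything is assigned */
    for (i = 0; i < 100; i++)
        printf("%d ", p[i]);
    printf("\n");
    return 0;
}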
It appears that you clearly go past the 100 elements you defined (int p[100];), since your loop has no termination condition (while (1)).
I would suggest using a for loop instead:
for (i = 0; i < 100; i++) {
    // do your stuff...
}
Regarding your more specific question about the memory: consider that any out-of-range access (in your situation, beyond the 100 elements of the array) can produce an error. That you observed it at 1000 in your situation can change depending on memory usage by other programs.
It will fail once the CPU says
HEY! that's not Your memory, leave it!
The fact that the memory is not inside the array does not mean that the application is not allowed to manipulate it.
The program aborts with a segmentation fault after writing 1000 positions of the array, instead of the 100.
You cannot reason about Undefined Behavior. It's like asking: if 1000 people are under a coconut tree, will 700 of them always fall unconscious if a coconut smacks each of their heads?

Two-dimensional array increment

I need to bin into a hist variable like this, inside a loop over p and bin:
hist[p][bin] = hist[p][bin] + 1;
When I comment out this line, the code works (I verified this by printing the p and bin variables). However, when I include this line, the program terminates with a segmentation fault. Examining it further, the bin variable holds a huge negative integer (-214733313), which leads to the segmentation fault. The program runs normally when I comment out this line, and the bin variables are normal integers. Am I missing something obvious here?
Thanks
If you're getting a -2147..., you are basically reaching the maximum size of a signed integer, 2^31 - 1 (32 bits, 4 bytes, a C int).
If we assume this, then it's safe to say you're hitting memory that has $FFFFFFFF in it. I only ever see this in unallocated, generally random memory, so it is reasonable to assume you are going out of bounds with your access: hist[p][bin] could be at the very end of your array's memory, and adding 1 goes out of bounds.
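A defensive sketch (NP, NBINS, and bin_value are hypothetical names; the question doesn't show how hist is declared): validate both indices before touching hist, so a bad bin is reported instead of scribbling over memory:
#include <stdio.h>

#define NP    4      /* hypothetical number of histograms */
#define NBINS 64     /* hypothetical number of bins */

int hist[NP][NBINS];

/* Increment a bin only when both indices are in range; report
   out-of-range values instead of corrupting memory. */
void bin_value(int p, int bin)
{
    if (p >= 0 && p < NP && bin >= 0 && bin < NBINS)
        hist[p][bin]++;
    else
        fprintf(stderr, "bad index: p=%d, bin=%d\n", p, bin);
}
A check like this would have flagged the huge negative bin the moment it appeared, pointing at whatever computes bin rather than at the increment line.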

Segmentation Fault & printing an array without loops

I'm just trying to run a simple program to count the number of spaces, digits, and other characters using arrays. Below is my program:
#include <stdio.h>
#include <ctype.h>

void main()
{
    int digit_holders[10] = {0};
    int ch;
    int i, white_space = 0, other = 0;
    while ((ch = getchar()) != EOF) {
        if (isspace(ch))
            white_space++;
        else if (isdigit(ch))
            digit_holders[ch - '0']++;
        else
            other++;
    }
    digit_holders[12] = 20;    /* out-of-bounds write: the subject of question 1 */
    printf("\n White spaces=%d\n Other=%d\n", white_space, other);
    for (i = 0; i <= 9; i++)
        printf("\ndigit_holders[%d]=%d\n", i, digit_holders[i]);
    printf("\n digit_holder[12]=%d\n", digit_holders[12]);
}
2 Questions:
Why does digit_holders[12] still manage to print the assigned value despite it being outside the range? Why doesn't it cause a segmentation fault? The same happens when I change the for loop check to i<=11: it manages to print digit_holders[11]=0 (which it shouldn't). However, when I replace 11/10 with 1100, i.e. digit_holders[1100], in either case the program crashes (segmentation fault). Why so?
Is there an easier way to print the elements of this array without using for loop ?
-Thanks!
There is no range checking in C, so it gives it its best shot (i.e. enough rope to hang yourself and the rest of the family).
The segmentation fault comes from the OS: it occurs when the process tries to access memory not assigned to it.
As I wrote in the comment:
You will need a loop of some kind to print the content of your array in C.
You are assigning to the 13th element of an array with only 10 elements declared. It's risky to do and will be unstable, but it won't necessarily result in a seg fault, because if the OS hasn't modified that memory in the time between write and read, your pointer will resolve the value without error. But again, risky.
If you had declared an array with 13 elements, all 13 would be reserved in memory and there would be no chance of a seg fault. You are likely to get a seg fault when you access an array outside of its declared limits, and more so the further you go from the range you defined.
It's possible to print "out-of-range" array indices in C because digit_holders[12] is handled as though digit_holders were a pointer with an offset of 12 * sizeof(int) added on to it.
This won't necessarily cause a segmentation fault, since the memory address being read might still be earmarked as being "available" to the program. A good example of this is a struct containing an array:
typedef struct {
    int some_array[12];
    int another_int;
} my_struct;
If you were to read the value of some_array[12], you'd get the contents of another_int instead, since it's stored directly after the last "valid" index of some_array.
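For instance, this (deliberately out-of-range) probe will typically print 42 on implementations with no padding between the members -- an illustration of the layout argument, not something to rely on:
#include <stdio.h>

typedef struct {
    int some_array[12];
    int another_int;
} my_struct;

int main()
{
    my_struct s = { .some_array = {0}, .another_int = 42 };
    /* undefined behavior: reads one past the end of some_array,
       which usually lands on another_int */
    printf("%d\n", s.some_array[12]);
    return 0;
}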
As for the printing out of arrays: I'm not aware of any way to print out an array without using a loop in C, though it is possible in some other languages. For example, you could easily print out a list of strings in Python using print(", ".join(["This", "That", "The other"]))

fscanf writes output in index of variable

I am trying to write a C program to monitor the temperatures of my processor.
To test my program, I read integers from a file using fscanf. To my surprise, this does not work under certain conditions.
I am using the following code:
#include <stdio.h>

#define CORECOUNT 3

int main()
{
    int c = 0;
    int core[CORECOUNT] = {0};
    FILE *cmd[CORECOUNT];
    for (c = 0; c <= CORECOUNT; c++)    // read input and store it in 'core'
    {
        cmd[c] = fopen("testinput", "r");
        printf("c=%d\n", c);
        fscanf(cmd[c], "%d", &core[c]);
        printf("core[c]=%d, z=%d\n", core[c], c);
        fclose(cmd[c]);
    }
    for (c = 0; c <= CORECOUNT; c++)    // print input
    {
        printf("core%d: %d ", c, core[c]);
    }
    printf("\n");
}
It compiles without an error.
Everything works as expected until the third (and last) call of fscanf: then suddenly 'c' gets the value 42 (which 'core' should actually get):
c=0
core[z]=42, z=0
c=1
core[z]=42, z=1
c=2
core[z]=42, z=2
c=3
core[z]=1, z=42
Segmentation fault (core dumped)
The segmentation fault occurs because fclose tries to close cmd[42], which does not exist.
When using other values of CORECOUNT (e.g. 4), everything works as expected. However, when using a number whose last two binary digits are '11' (e.g. 3, 7, 11, 15, ...), the program crashes. When I declare another integer and set it to '0', the program works as expected when CORECOUNT has a value whose last two binary digits are '11'; if that is not the case, 'core' sometimes gets strange values (e.g. '15274000'), which are not the same every time the program is executed.
Strangely enough, this only happens with gcc 4.6.3, not with gcc 4.8.
Where is the error in my code? Or is it even something in the compiler (which I highly doubt)?
Just for now, I'll declare another variable ('tmp') and use it as the index in the call to fscanf:
printf("c=%d\n", c);
tmp = c;
fscanf(cmd[c], "%d", &core[tmp]);
I'm sorry if there are any spelling/grammar errors; English is not my native language.
In your for loops, change c<=CORECOUNT to c<CORECOUNT.
You have declared arrays of size [CORECOUNT]; thus, when CORECOUNT is 3, you have elements [0], [1], [2]. With c<=CORECOUNT in your for loop, you try to access element [3], so memory outside the array is being corrupted.
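With c <= CORECOUNT, the out-of-range fscanf writes through &core[3], which sits just past the array -- apparently right on top of c itself, which is how c suddenly became 42. A minimal sketch of the corrected bound:
#include <stdio.h>

#define CORECOUNT 3

int main(void)
{
    int core[CORECOUNT] = {0};
    int c;

    /* c < CORECOUNT visits exactly the valid indices 0 .. CORECOUNT-1 */
    for (c = 0; c < CORECOUNT; c++)
        printf("core%d: %d\n", c, core[c]);

    return 0;
}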
