Hello everyone. I'm new to C, but I've recently been getting a weird segfault error with my fopen:
FILE* thefile = fopen(argv[1],"r");
The problem I've been having is that this code works on other, smaller text files, but when I try a file around 400MB it gives a segfault error. I've even tried hardcoding the filename, but that doesn't help either. Could there be a problem in the rest of the code causing the segfault on this line? (I doubt it, but I would like to know if it's possible.) It's just really odd that no errors come up for a small text file, but a large text file does get errors.
Thanks!
EDIT: I didn't want to bog this down with too much code, but here it is:
int main(int argc, char *argv[])
{
    if (argc != 3)
    {
        printf("[ERROR] Invalid number of arguments. Please pass 2 arguments, input_bound_file (column 1: probe, columns 2,...: samples) and desired_output_file_name");
        exit(2);
    }

    int i, j;
    rankAvg = g_hash_table_new(g_direct_hash, g_direct_equal);
    rankCnt = g_hash_table_new(g_direct_hash, g_direct_equal);
    table = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL, g_free);

    getCounts(argv[1]);
    printf("nC = %i, nR = %i", nC, nR);

    double srcMat[nR][nC];
    int rankMat[nR][nC];
    double normMat[nR][nC];
    int sorts[nR][nC];

    char line[100];
    FILE* thefile = fopen(argv[1], "r");
    printf("%s\n", strerror(errno));
    FILE* output = fopen(argv[2], "w");

    char* rownames[100];
    i = 0; j = 1;
    int processedProbeNumber = 0;
    int previousStamp = 0;

    fgets(line, sizeof(line), thefile); // skip the header line
    while (fgets(line, sizeof(line), thefile) != NULL)
    {
        cleanSpace(line); // collapses runs of spaces to a single space
        char dest[100];
        int len = strlen(line);
        for (i = 0; i < len; i++)
        {
            if (line[i] == ' ') // read in rownames
            {
                rownames[j] = strncpy(dest, line, i);
                dest[i] = '\0';
                break;
            }
        }
        char* token = strtok(line, " ");
        token = strtok(NULL, " ");
        i = 1;
        while (token != NULL) // put words into array
        {
            rankMat[j][i] = abs(atof(token));
            srcMat[j][i] = abs(atof(token));
            token = strtok(NULL, " ");
            i++;
        }
        // set the first column as a row id
        j++;
        processedProbeNumber++;
        if ((processedProbeNumber - previousStamp) >= 10000)
        {
            previousStamp = processedProbeNumber;
            printf("\tnumber of loaded lines = %i", processedProbeNumber);
        }
    }
    printf("\ttotal number of loaded lines = %i \n", processedProbeNumber);
    fclose(thefile);
How do you know that fopen is segfaulting? If you're simply sprinkling printf calls through the code, there's a chance the standard output isn't flushed to the console before the error occurs. If you use a debugger instead, you will know exactly where the segfault occurred.
Looking at your code, nR and nC aren't defined, so I don't know how big rankMat and srcMat are, but two thoughts crossed my mind:
You don't check i and j to ensure that they don't exceed nR and nC
If nR and nC are sufficiently large, you may be using a very large amount of stack memory (srcMat, rankMat, normMat, and sorts are all huge). I don't know what environment you're running in, but some systems may not be able to handle huge stacks (Linux, Windows, etc. should be fine, but I do a lot of embedded work). I normally allocate very large structures on the heap (using malloc); see the sketch below.
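For illustration, a minimal sketch of moving one of those matrices to the heap, assuming C99 and that nR and nC have already been set by getCounts():

/* Heap-allocate srcMat instead of declaring double srcMat[nR][nC] on the
   stack; the srcMat[i][j] indexing syntax stays exactly the same. */
double (*srcMat)[nC] = malloc(sizeof(double[nR][nC]));
if (srcMat == NULL) {
    perror("malloc");
    exit(EXIT_FAILURE);
}
/* ... fill and use srcMat as before ... */
free(srcMat);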
Generally, files of 2GB (2**31 bytes) or larger are the ones you can expect to get this on. This is because you start to run out of room in a 32-bit integer for things like file offsets, and one bit is typically reserved for the sign so that relative offsets can be negative.
Supposedly, on Linux you can get around this issue by using the following macro definition:
#define _FILE_OFFSET_BITS 64
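Note that the macro only has an effect if it is defined before any system header is included. A sketch:

/* Either define it at the very top of the source file... */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
/* ...or pass -D_FILE_OFFSET_BITS=64 on the compiler command line.
   Either way, off_t becomes 64 bits wide on 32-bit Linux, and plain
   fopen()/fseeko()/ftello() can then handle files beyond 2GB. */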
Some systems also provide a separate API call for large-file opens (e.g., fopen64() in MKS).
400MB should not be considered a "large file" nowadays. I would reserve that term for files larger than, say, 2GB.
Also, just opening a file is very unlikely to give a segfault. Would you show us the code that accesses the file? I suspect some other factor is at play here.
UPDATE
I still can't tell exactly what's happening here. There are strange things that could be legitimate: you discard the first line and also the first token of each line.
You also assign to every rownames[j] (except the first one) the address of dest, a variable with block scope whose memory is most likely to be reused once the block exits. I hope you aren't relying on rownames[j] holding anything meaningful (but then why have them at all?) and never try to access them.
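If you do need the row names later, one minimal fix (a sketch, using the POSIX strdup) is to give each entry its own heap copy:

/* Store a private heap copy instead of the address of the
   block-scoped dest buffer. */
strncpy(dest, line, i);
dest[i] = '\0';
rownames[j] = strdup(dest);  /* remember to free() these when done */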
C99 allows you to mix variable declarations with actual instructions, but I would suggest a little cleanup to make the code clearer (better indentation would also help).
From the symptoms, I would look for memory corruption somewhere. On small files (and hence fewer tokens) it may go unnoticed, but with larger files (and many more tokens) it triggers a segfault.
I am trying to write a program that searches through a given directory and all of its sub-directories and files (and the sub-directories and files of those sub-directories, and so on) and prints out every file that has a given set of permissions (int target_perm).
It works fine on smaller input, but returns Segmentation fault (core dumped) when it has to recurse over directories with large quantities of files. Valgrind reveals that this is due to stack overflow.
Is there any way I can fix my function so it can work with arbitrarily large directories?
void recurse_dir(struct stat *sb, struct dirent *de, DIR *dr, int target_perm, char* curr_path) {
    if ((strcmp(".", de->d_name) != 0) && (strcmp("..", de->d_name) != 0)) {
        char full_file_name[strlen(curr_path) + strlen(de->d_name) + 1];
        strcpy(full_file_name, curr_path);
        strcpy(full_file_name + strlen(curr_path), de->d_name);
        full_file_name[strlen(curr_path) + strlen(de->d_name)] = '\0';
        if (stat(full_file_name, sb) < 0) {
            fprintf(stderr, "Error: Cannot stat '%s'. %s\n", full_file_name, strerror(errno));
        } else {
            char* curr_perm_str = permission_string(sb);
            int curr_perm = permission_string_to_bin(curr_perm_str);
            free(curr_perm_str);
            if (curr_perm == target_perm) {
                printf("%s\n", full_file_name);
            }
            if (S_ISDIR(sb->st_mode)) {
                DIR *dp;
                struct dirent *dent;
                struct stat b;
                dp = opendir(full_file_name);
                char new_path[PATH_MAX];
                strcpy(new_path, full_file_name);
                new_path[strlen(full_file_name)] = '/';
                new_path[strlen(full_file_name) + 1] = '\0';
                if (dp != NULL) {
                    if ((dent = readdir(dp)) != NULL) {
                        recurse_dir(&b, dent, dp, target_perm, new_path);
                    }
                    closedir(dp);
                } else {
                    fprintf(stderr, "Error: Cannot open directory '%s'. %s.\n", de->d_name, strerror(errno));
                }
            }
        }
    }
    if ((de = readdir(dr)) != NULL) {
        recurse_dir(sb, de, dr, target_perm, curr_path);
    }
}
The problem here is not actually the recursion, although I've addressed that particular problem below. The problem is that your directory hierarchy probably includes symbolic links which make some directories aliases for one of their parents. An example from a Ubuntu install:
$ ls -ld /usr/bin/X11
lrwxrwxrwx 1 root root 1 Jan 25 2018 /usr/bin/X11 -> .
$ # Just for clarity:
$ readlink -f /usr/bin/X11
/usr/bin
So once you encounter /usr/bin/X11, you enter into an infinite loop. This will rapidly exhaust the stack, but getting rid of recursion won't fix the problem, since the infinite loop is still an infinite loop.
What you need to do is either:
Avoid following symlinks, or
(better) Avoid following symlinks which resolve to directories, or
Keep track of all the directories you've encountered during the recursive scan, and check to make sure that any new directory hasn't already been examined.
The first two solutions are easier (you just need to check the filetype field in the struct stat) but they will fail to list some files you may be interested in (for example, when a symlink resolves to a directory outside of the directory structure you're examining.)
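A sketch of the first two options, using lstat(), which reports on the link itself rather than following it:

/* Detect symlinks before descending. */
struct stat lsb;
if (lstat(full_file_name, &lsb) < 0) {
    /* handle error */
} else if (S_ISLNK(lsb.st_mode)) {
    /* Option 1: skip all symlinks outright.
       Option 2: also stat() the target and skip only when the target
       is a directory (S_ISDIR on the stat() result). */
}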
Once you fix the above problem, you might want to consider these suggestions:
In recursive functions, it's always a good idea to reduce the size of a stack frame to the minimum possible. The maximum recursion depth during a directory walk shouldn't be more than the maximum number of path segments in a filename (but see point 3 below), which shouldn't be too big a number. (On my system, the maximum depth of a file in the /usr hierarchy is 16, for example.) But the amount of stack used is the product of the size of the stack frame and the maximum recursion depth, so if your stack frames are large, then you'll have less recursion capacity.
In pursuit of the above goal, you should avoid the use of local arrays. For example, the declaration
char new_path[PATH_MAX];
adds PATH_MAX bytes to every stack frame (on my system, that's 4k). And that's in addition to the VLA full_file_name. For what it's worth, I compiled your function on a 64-bit Linux system, and found that the stack frame size is 4,280 bytes plus the size of the VLA (rounded to a multiple of 16 for alignment). That's probably not going to use more than 150Kb of stack, assuming a reasonable file hierarchy, which is within the limits. But that could increase significantly if your system has a larger value of PATH_MAX (which, in any case, cannot be relied on to be the maximum size of a filepath).
Good style dictates using dynamically-allocated memory for variables like these ones. But an even better approach would be to avoid using so many different buffers.
Parenthetically, you also need to be aware of the cost of strlen. In order to compute the length of a string, the strlen function needs to scan all its bytes looking for the NUL terminator. C strings, unlike string objects in higher-level languages, do not contain any indication of their length. So when you do this:
char full_file_name[strlen(curr_path) + strlen(de->d_name)+1];
strcpy(full_file_name, curr_path);
strcpy(full_file_name + strlen(curr_path), de->d_name);
full_file_name[strlen(curr_path) + strlen(de->d_name)] = '\0';
you end up scanning curr_path three times, and de->d_name twice, even though the lengths of these strings will not change. Rather than doing that, you should save the lengths in local variables so that they can be reused.
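A sketch of the same concatenation with each length computed exactly once:

/* One strlen per string; the second memcpy copies the terminating
   NUL along with de->d_name. */
size_t path_len = strlen(curr_path);
size_t name_len = strlen(de->d_name);
char full_file_name[path_len + name_len + 1];
memcpy(full_file_name, curr_path, path_len);
memcpy(full_file_name + path_len, de->d_name, name_len + 1);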
Alternatively, you could find a different way to concatenate the strings. One simple possibility which also dynamically allocates the memory, is:
char* full_file_name;
asprintf(&full_file_name, "%s%s", curr_path, de->d_name);
Note: You should check the return value of asprintf, both to verify that there was no memory-allocation problem, and also to save the length of full_file_name in case you need it later. (asprintf is available on Linux and BSD derivatives, including OS X, but it's easy to implement using the POSIX-standard snprintf, and there are short, freely-reusable implementations available.)
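For instance, here is a short work-alike built on the standard vsnprintf (a sketch; it follows the same return-value convention as the real asprintf):

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

int my_asprintf(char **out, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(NULL, 0, fmt, ap);  /* measure only */
    va_end(ap);
    if (len < 0 || (*out = malloc((size_t)len + 1)) == NULL)
        return -1;
    va_start(ap, fmt);
    vsnprintf(*out, (size_t)len + 1, fmt, ap);  /* format for real */
    va_end(ap);
    return len;
}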
You could use asprintf to compute new_path, as well, again removing the stack allocation of a possibly large array (and avoiding the buffer overflow if PATH_MAX is not big enough to contain the new filepath, which is definitely possible):
char* new_path;
asprintf(&new_path, "%s/", full_file_name);
But that's kind of silly. You're copying an entire filepath just in order to add a single character at the end. Better would be to leave space for the slash when you create full_file_name in the first place, and fill it in when you need it:
char* full_file_name;
int full_file_name_len = asprintf(&full_file_name, "%s%s%c",
                                  curr_path, de->d_name, '\0');
if (full_file_name_len < 0) { /* handle error */ }
--full_file_name_len; /* The count includes the extra NUL written by %c */

/* Much later, instead of creating new_path: */
if (dp != NULL) {
    full_file_name[full_file_name_len] = '/';
    if ((dent = readdir(dp)) != NULL) {
        recurse_dir(&b, dent, dp, target_perm, full_file_name);
    }
    full_file_name[full_file_name_len] = '\0';
    closedir(dp);
}
There are other ways to do this. In fact, you really only need a single filepath buffer which you could pass down through the recursion. The recursion only appends to the filepath, so it's only necessary to restore the NUL byte at the end of every recursive call. However, in production code you would not want a fixed-length buffer, which might turn out to be too small, so you would need to implement some kind of reallocation strategy.
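A sketch of that single-buffer idea (fixed capacity here for brevity; production code would grow the buffer with realloc instead):

/* One shared path buffer passed down the recursion: each level appends
   its own segment, recurses, then truncates back for its caller. */
#include <string.h>

#define PATH_CAP 4096  /* hypothetical fixed capacity */
static char path_buf[PATH_CAP];

static void enter_dir(size_t parent_len, const char *name)
{
    size_t name_len = strlen(name);
    if (parent_len + name_len + 2 > PATH_CAP)
        return;  /* production code: realloc and retry */
    memcpy(path_buf + parent_len, name, name_len);
    path_buf[parent_len + name_len] = '/';
    path_buf[parent_len + name_len + 1] = '\0';
    /* ... scan the directory named by path_buf, recursing with
       parent_len + name_len + 1 as the new length ... */
    path_buf[parent_len] = '\0';  /* restore for the caller */
}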
While I was trying to figure out the actual stack frame size for your function, which required compiling it, I ran into this code which relies on some undeclared functions:
char* curr_perm_str = permission_string(sb);
int curr_perm = permission_string_to_bin(curr_perm_str);
free(curr_perm_str);
Making a guess about what these two functions do, I think you could safely replace the above with
int curr_perm = sb->st_mode & (S_IRWXU|S_IRWXG|S_IRWXO);
Or perhaps
int curr_perm = sb->st_mode
& (S_ISUID|S_ISGID|S_ISVTX|S_IRWXU|S_IRWXG|S_IRWXO);
if you want to include the setuid and sticky bits.
I have a C application, one of whose jobs is to call an executable file. That file has performance-measurement routines inserted during compilation, at the level of intermediate code. It can measure time or L1/L2/L3 cache misses. In other words, I have modified the LLVM compiler to insert a call to that function and print the result to stdout for any compiled program.
Now, as I mentioned at the beginning, I would like to execute the program (with this result returned to stdout) from a separate C application and save that result. The way I'm doing it right now is:
void executeProgram(const char* filename, char* time) {
    printf("Executing selected program %s...\n", filename);
    char filePath[100] = "/home/michal/thesis/Drafts/output/";
    strcat(filePath, filename);

    FILE *fp;
    fp = popen(filePath, "r");
    char str[30];
    if (fp == NULL) {
        printf("Failed to run command\n");
        exit(1);
    }
    while (fgets(str, sizeof(str) - 1, fp) != NULL) {
        strcat(time, str);
    }
    pclose(fp);
}
where filename is the name of the compiled executable to run. The result is saved to time string.
The problem is that the results I'm getting are quite different from, and less stable than, those returned by simply running the executable 'by hand' from the command line (./test16). Run by hand, the results look like:
231425
229958
230450
228534
230033
230566
231059
232016
230733
236017
213179
90515
229775
213351
229316
231642
230875
So they're mostly around 230000 us, with some occasional drops. The same executable, run from within the other application, produces:
97097
88706
91418
97970
97972
94597
95846
95139
91070
95918
107006
89988
90882
91986
90997
88824
129136
94976
102191
94400
95215
95061
92115
96319
114091
95230
114500
95533
102294
108473
105730
Note that it is the same executable that's being called. Yet the measured time it returns is different. The program that is being measured consists of a function call to a simple nested loop, accessing array elements. Here is the code:
#include "test.h"
#include <stdio.h>
float data[1000][1000] = {0};
void test(void)
{
int i0, i1;
int N = 80;
float mean[1000];
for (i0 = 0; i0 < N; i0++)
{
mean[i0] = 0.0;
for (i1 = 0; i1 < N; i1++) {
mean[i0] += data[i0][i1];
}
mean[i0] /= 1000;
}
}
I'm suspecting that there is something wrong in the way the program is invoked in the code, maybe the process should be forked or something? Any ideas?
You didn't specify where exactly your time-measuring subroutines are inserted, so all I can really offer is guesswork.
The results seem to hint at the exact opposite: running the application from the shell is slower, so I wouldn't worry about the way you're starting the process from the C code. My guess would be that when you run your program from the shell, it's the terminal that's slowing you down; when you run the process from your C code, you pipe the output back to your 'starter' application, which is already waiting for input on the pipe.
As a side note, consider switching from strcat to something safer, like strncat.
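For illustration, here is a minimal sketch of a bounded version of that loop. TIME_BUF_LEN is a hypothetical capacity for the caller's time buffer; it does not appear in the original code.

#define TIME_BUF_LEN 256  /* hypothetical: capacity the caller guarantees for `time` */

/* Inside executeProgram, bound the path construction and the appends: */
char filePath[100];
snprintf(filePath, sizeof filePath, "/home/michal/thesis/Drafts/output/%s", filename);

size_t used = strlen(time);
while (fgets(str, sizeof str, fp) != NULL && used + 1 < TIME_BUF_LEN) {
    strncat(time, str, TIME_BUF_LEN - used - 1);  /* never writes past the buffer */
    used = strlen(time);
}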
The idea behind this program is simply to access the RAM and dump its contents to a txt file.
Later I'll convert the txt file to JPEG, and hopefully it will be readable.
However, when I try to read from the RAM using NEW[], it takes waaaaaay too long to actually copy all the values into the file.
Isn't it supposed to be really fast? I mean, I save pictures every day and it doesn't even take a second.
Is there some other method I can use to dump memory to a file?
#include <stdio.h>
#include <stdlib.h>
#include <hw/pci.h>
#include <hw/inout.h>
#include <sys/mman.h>

main()
{
    FILE *fp;
    fp = fopen("test.txt", "w+d");

    int NumberOfPciCards = 3;
    struct pci_dev_info info[NumberOfPciCards];
    void *PciDeviceHandler1, *PciDeviceHandler2, *PciDeviceHandler3;
    uint32_t *Buffer;
    int *BusNumb; //int Buffer;
    uint32_t counter = 0;
    int i;
    int r;
    int y;
    volatile uint32_t *NEW, *NEW2;
    uintptr_t iobase;
    volatile uint32_t *regbase;

    NEW = (uint32_t *)malloc(sizeof(uint32_t));
    NEW2 = (uint32_t *)malloc(sizeof(uint32_t));
    Buffer = (uint32_t *)malloc(sizeof(uint32_t));
    BusNumb = (int *)malloc(sizeof(int));

    printf("\n 1");
    for (r = 0; r < NumberOfPciCards; r++)
    {
        memset(&info[r], 0, sizeof(info[r]));
    }

    printf("\n 2");
    // Here the attach takes place.
    for (r = 0; r < NumberOfPciCards; r++)
    {
        (pci_attach(r) < 0) ? FuncPrint(1, r) : FuncPrint(0, r);
    }

    printf("\n 3");
    info[0].VendorId = 0x8086; // Won't be using this one
    info[0].DeviceId = 0x3582; // Or this one
    info[1].VendorId = 0x10B5; // Will only be using this one, PLX 9054 chip
    info[1].DeviceId = 0x9054; // Also PLX 9054
    info[2].VendorId = 0x8086; // Not used
    info[2].DeviceId = 0x24cb; // Not used

    printf("\n 4");
    // I attach the device, give it a handler, and set some settings.
    if ((PciDeviceHandler1 = pci_attach_device(0, PCI_SHARE|PCI_INIT_ALL, 0, &info[1])) == 0)
    {
        perror("pci_attach_device fail");
        exit(EXIT_FAILURE);
    }

    for (i = 0; i < 6; i++)
    // This just prints out some details of the card.
    {
        if (info[1].BaseAddressSize[i] > 0)
            printf("Aperture %d: "
                   "Base 0x%llx Length %d bytes Type %s\n", i,
                   PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? PCI_MEM_ADDR(info[1].CpuBaseAddress[i]) : PCI_IO_ADDR(info[1].CpuBaseAddress[i]),
                   info[1].BaseAddressSize[i], PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? "MEM" : "IO");
    }
    printf("\nEnd of Device random info dump---\n");
    printf("\nNEW's Address : %d\n", *(int *)NEW);

    // Not sure if this is a legitimate way of memory allocation, but I can't seem to read the RAM any other way.
    NEW = mmap_device_memory(NULL, info[1].BaseAddressSize[3], PROT_READ|PROT_WRITE|PROT_NOCACHE, 0, info[1].CpuBaseAddress[3]);

    // Here is where things are starting to get messy and REALLY long, just running through all the RAM and dumping it.
    // Is there some other way I can dump the data in the RAM into a file?
    while (counter != info[1].BaseAddressSize[3])
    {
        fprintf(fp, "%x", NEW[counter]);
        counter++;
    }
    fclose(fp);
    printf("0x%x", *Buffer);
}
A few issues that I can see:
You are writing blocks of 4 bytes - that's quite inefficient. The stream buffering in the C library may help with that to a degree, but using larger blocks would still be more efficient.
Even worse, you are writing out the memory dump in hexadecimal notation, rather than the bytes themselves. That conversion is very CPU-intensive, not to mention that the size of the output is essentially doubled. You would be better off writing raw binary data using e.g. fwrite() (see the sketch after these points).
Depending on the specifics of your system (is this on QNX?), reading from I/O-mapped memory may be slower than reading directly from physical memory, especially if your PCI device has to act as a relay. What exactly is it that you are doing?
In any case I would suggest using a profiler to actually find out where your program is spending most of its time. Even a rudimentary system monitor would allow you to determine if your program is CPU-bound or I/O-bound.
As it is, "waaaaaay too long" is hardly a valid measurement. How much data is being copied? How long does it take? Where is the output file located?
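As promised above, here is a minimal sketch of the fwrite() approach. One caveat: whether BaseAddressSize[3] counts bytes or 32-bit words depends on your platform's PCI API, so verify the length argument before trusting the dump.

/* One bulk fwrite of the mapped region instead of a
   per-word fprintf("%x") loop. */
FILE *out = fopen("dump.bin", "wb");
if (out != NULL) {
    fwrite((const void *)NEW, 1, info[1].BaseAddressSize[3], out);
    fclose(out);
}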
P.S.: I also have some concerns w.r.t. what you are trying to do, but that is slightly off-topic for this question...
For fastest speed: write the data in binary form and use the open()/write()/close() APIs. Since your data is already available in a contiguous block of (virtual) memory, it is a waste to copy it to a temporary buffer (as the fwrite(), fprintf(), etc. APIs do).
The code using write() will be similar to:
#include <fcntl.h>     /* open */
#include <sys/stat.h>  /* S_IRWXU */
#include <unistd.h>    /* write, close */

int fd = open("filename.bin", O_WRONLY|O_CREAT|O_TRUNC, S_IRWXU);
write(fd, (void*)NEW, 4*info[1].BaseAddressSize[3]);
close(fd);
You will need to add error handling and make sure that the buffer size is specified correctly.
To reiterate, you get the speed-up from:
avoiding the conversion from binary to ASCII (as pointed out by others above)
avoiding many calls to libc
reducing the number of system-calls (from inside libc)
eliminating the overhead of copying data to a temporary buffer inside the fwrite()/fprintf() and related functions (buffering would be useful if your data arrived in small chunks, including the case of converting to ASCII in 4 byte units)
I intentionally ignore commenting on other parts of your code as it is apparently not intended to be production quality yet and your question is focused on how to speed up writing data to a file.
I am building a Linux shell, and my current headache is passing command-line arguments to forked/exec'ed programs and system functions.
Currently all input is tokenized on spaces and newlines into a global array, char *parsed_arguments[]. For example, the input dir /usa/folderb would be tokenized as:
parsed_arguments[0] = dir
parsed_arguments[1] = /usa/folderb
parsed_arguments tokenizes everything perfectly; my issue now is that I wish to take only a subset of parsed_arguments, excluding the command (the first argument, i.e. the path to the executable to run in the shell), and store it in a new array called passed_arguments.
So in the previous example, dir /usa/folderb:
parsed_arguments[0] = dir
parsed_arguments[1] = /usa/folderb
passed_arguments[0] = /usa/folderb
passed_arguments[1] = etc....
Currently I am not having any luck with this, so I'm hoping someone can help. Here is some of what I have working so far:
How I'm trying to copy arguments:
void command_Line()
{
    int i = 1;
    for (i; parsed_arguments[i] != NULL; i++)
        printf("%s", parsed_arguments[i]);
}
Function to read commands:
void readCommand(char newcommand[])
{
    printf("readCommand: %s\n", newcommand);
    //parsed_arguments = (char* malloc(MAX_ARGS));
    // strcpy(newcommand,inputstring);
    parsed = parsed_arguments;
    *parsed++ = strtok(newcommand, SEPARATORS);  // tokenize input
    while ((*parsed++ = strtok(NULL, SEPARATORS)))
        //printf("test1\n");  // last entry will be NULL
        //passed_arguments=parsed_arguments[1];
        if (parsed[0]) {
            char *initial_command = parsed[0];
            parsed = parsed_arguments;
            while (*parsed) fprintf(stdout, "%s\n ", *parsed++);
            // free (parsed);
            // free(parsed_arguments);
        } //end of if
    command_Line();
} //end of ReadCommand
Forking function:
else if (strstr(parsed_arguments[0], "./") != NULL)
{
    int pid;
    switch (pid = fork()) {
    case -1:
        printf("Fork error, aborting\n");
        abort();
    case 0:
        execv(parsed_arguments[0], passed_arguments);
    }
}
This is what my shell currently outputs. The first time I run it, it outputs something close to what I want, but every subsequent call breaks the program. In addition, each additional call appends the parsed arguments to the output.
This is what the original shell produces. Again it's close to what I want, but not quite. I want to omit the command (i.e. "./testline").
Your testline program is a sensible one to have in your toolbox; I have a similar program that I call al (for Argument List) that prints its arguments, one per line. It doesn't print argv[0] though (I know it is called al). You can easily arrange for your testline to skip argv[0] too. Note that Unix convention is that argv[0] is the name of the program; you should not try to change that (you'll be fighting against the entire system).
#include <stdio.h>

int main(int argc, char **argv)
{
    while (*++argv != 0)
        puts(*argv);
    return 0;
}
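And on the exec side of your shell, that convention means you can simply hand the whole vector to execv() (a sketch; the perror/_exit lines are my addition for when exec fails):

case 0:
    execv(parsed_arguments[0], parsed_arguments);  /* argv[0] = program name */
    perror("execv");  /* reached only if execv() fails */
    _exit(127);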
Your function command_line() is also reasonable except that it relies unnecessarily on global variables. Think of global variables as a nasty smell (H2S, for example); avoid them when you can. It should be more like:
void command_Line(char *argv[])
{
    for (int i = 1; argv[i] != NULL; i++)
        printf("<<%s>>\n", argv[i]);
}
If you're stuck with C89, you'll need to declare int i; outside the loop and use just for (i = 1; ...) in the loop control. Note that the printing here separates each argument on a line on its own, and encloses it in marker characters (<< and >> — change to suit your whims and prejudices). It would be fine to skip the newline in the loop (maybe use a space instead), and then add a newline after the loop (putchar('\n');). This makes a better, more nearly general purpose debug routine. (When I write a 'dump' function, I usually use void dump_argv(FILE *fp, const char *tag, char *argv[]) so that I can print to standard error or standard output, and include a tag string to identify where the dump is written.)
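For example, a sketch of that dumper:

/* General-purpose debug dump of an argument vector. */
void dump_argv(FILE *fp, const char *tag, char *argv[])
{
    fprintf(fp, "%s:", tag);
    for (int i = 0; argv[i] != NULL; i++)
        fprintf(fp, " <<%s>>", argv[i]);
    putc('\n', fp);
}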
Unfortunately, given the fragmentary nature of your readCommand() function, it is not possible to coherently critique it. The commented out lines are enough to elicit concern, but without the actual code you're running, we can't guess what problems or mistakes you're making. As shown, it is equivalent to:
void readCommand(char newcommand[])
{
    printf("readCommand: %s\n", newcommand);
    parsed = parsed_arguments;
    *parsed++ = strtok(newcommand, SEPARATORS);
    while ((*parsed++ = strtok(NULL, SEPARATORS)) != 0)
    {
        if (parsed[0])
        {
            char *initial_command = parsed[0];
            parsed = parsed_arguments;
            while (*parsed)
                fprintf(stdout, "%s\n ", *parsed++);
        }
    }
    command_Line();
}
The variables parsed and parsed_arguments are both globals and the variable initial_command is set but not used (aka 'pointless'). The if (parsed[0]) test is not safe; you incremented the pointer in the previous line, so it is pointing at indeterminate memory.
Superficially, judging from the screen shots, you are not resetting the parsed_arguments[] and/or passed_arguments[] arrays correctly on the second use; it might be an index that is not being set to zero. Without knowing how the data is allocated, it is hard to know what you might be doing wrong.
I recommend closing this question, going back to your system and producing a minimal SSCCE. It should be under about 100 lines; it need not do the execv() (or fork()), but should print the commands to be executed using a variant of the command_Line() function above. If this answer prevents you deleting (closing) this question, then edit it with your SSCCE code, and notify me with a comment to this answer so I get to see you've done that.
I wrote the following code under Mac OS X in Xcode. When I move the code over to a Solaris server, three extra lines are being counted and I cannot figure out why.
#include <stdio.h>

#define MAXLINE 281 // 281 is a prime number!!

char words[4][MAXLINE];            // words array to hold menu items
char displayfilename[4][MAXLINE];  // filename array to hold filename for display function
char exit_choice[4][MAXLINE];      // for user interaction at the end of each function
int i;                             // standard array variable
int loop = 1;                      // control variable for my loop

int main()
{
    printf("Enter filename: ");
    scanf("%s", displayfilename[i]);

    FILE *fp;
    int clo_c, clo_nc, clo_nlines;

    fp = fopen(*displayfilename, "r"); // open for reading
    if (fp == NULL)
    {
        printf("Cannot open for reading!\n");
    }

    clo_c = getc(fp);
    while (clo_c != EOF)
    {
        if (clo_c == '\n')
            clo_nlines++;
        clo_nc++;
        clo_c = getc(fp);
    }
    fclose(fp);

    if (clo_nc != 0)
    {
        printf("There are %d lines in this file.\n", clo_nlines);
    }
    else
        printf("File is empty, exiting!\n");
}
Can anyone explain to me why Solaris is adding three to clo_nlines?
You didn't initialize clo_nlines - therefore you got 'undefined behavior'.
Declaring a variable in C doesn't set its value to anything - it just allocates some memory for that variable, and whatever junk happens to be in that bit (well, not bit, but you get the idea >.>) of memory is what the variable starts out as.
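The fix is a one-line change (a sketch):

/* Initialize the counters before counting anything. */
int clo_c, clo_nc = 0, clo_nlines = 0;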
There are a couple of issues here.
First, from a bulletproof-code point of view, there is #Zilchonum's point: clo_nc and clo_nlines aren't being initialized, so you don't have any idea what's in them to start with, and therefore no idea what you'll end up with. Note that no C standard zero-initializes automatic (local) variables; only objects with static storage duration start out at zero. So fix that first.
Then there is Auri's point, that different machines use different newline conventions. However, I believe that Mac OS X uses a single character for newline, just as Solaris does.
Which brings us to the file itself. Try using od -c to see what's actually in the file. My guess is that you'll find the file on one system has \r\n line endings, but on the other has \n line endings, probably as a result of the settings of the file-transfer program you used: it has probably converted to Unix format on one but not the other.
Did you make sure you're not counting crlf as two linefeeds?