Read a txt file line by line without using fopen()

I need to read a text file line by line.
The problem is that my code cannot use fopen() and then fgets() to read the content of each line, because fopen() fails when I use a lot of threads (it seems to get overloaded because it is being called so many times).

Most operating systems have limits on how many files a particular process can open at the same time. Since fopen() is "just" a wrapper on top of open(), it won't help to use a lower-level interface if this is your problem.
You can verify that this is indeed your problem by checking errno when fopen() fails, i.e. if it returns NULL. You sound as if you've already done this, but you weren't very specific. I would expect EMFILE if you're running into the per-process limit (see the open() manual page).
You need to investigate what your particular limits are and see if you can change them, or re-design your program so that you don't open as many files in parallel.
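A minimal sketch of the errno check described above, assuming a POSIX system where fopen() sets errno on failure. The helper name open_or_explain is hypothetical and only serves the illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a file for reading and, on failure, report
   why. Returns the saved errno value (0 on success) so the caller can
   tell EMFILE (per-process limit) apart from other causes. */
static int open_or_explain(const char *path, FILE **out)
{
    errno = 0;
    *out = fopen(path, "r");
    if (*out != NULL)
        return 0;

    int saved = errno;  /* save it before another call can overwrite it */
    if (saved == EMFILE)
        fprintf(stderr, "%s: per-process open file limit reached\n", path);
    else if (saved == ENFILE)
        fprintf(stderr, "%s: system-wide open file limit reached\n", path);
    else
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
    return saved;
}
```

If the returned value is EMFILE, the fix lies in closing files sooner or raising the limit, not in switching to a different I/O interface.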

You can use open (man open (2)) and read (man read(2)).
What is the real problem with fopen()?
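As a rough sketch of what the open/read route looks like on a POSIX system: the hypothetical helper below finds line boundaries in a file using only open(), read() and close(). It only counts the lines; a real reader would copy the bytes of each line into its own buffer at the point where the counter is incremented. Note that it still consumes one file descriptor per open file, so it does not get around a per-process limit.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: count the lines in a file using only open(),
   read() and close(). A final line without a trailing newline is
   counted too. Returns -1 if the file cannot be opened or read. */
static long count_lines_fd(const char *path)
{
    char chunk[4096];
    long lines = 0;
    int in_line = 0;      /* nonzero while inside an unterminated line */
    ssize_t got;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    while ((got = read(fd, chunk, sizeof chunk)) > 0) {
        for (ssize_t i = 0; i < got; i++) {
            if (chunk[i] == '\n') {
                lines++;          /* a complete line ends here */
                in_line = 0;
            } else {
                in_line = 1;
            }
        }
    }
    close(fd);

    if (got == -1)
        return -1;
    return lines + in_line;
}
```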

If open/read works and fopen/fread doesn't work, this may be because you're running out of lock structures. Using getc_unlocked in a loop (reading until you get a newline) might be the easiest option.
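A minimal sketch of the unlocked character loop suggested above, assuming a POSIX system (the POSIX name of the function is getc_unlocked). The helper read_line_unlocked is hypothetical. It takes the stream lock once per line with flockfile() instead of once per character, and it still needs a FILE * obtained elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: read one line from fp into buf (capacity cap,
   which must be at least 1) using getc_unlocked().
   Returns the number of characters stored (newline excluded), or -1 if
   end of file was reached before any character was read. */
static long read_line_unlocked(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;       /* characters stored in buf */
    size_t seen = 0;    /* characters consumed from the stream */
    int c;

    flockfile(fp);      /* take the stream lock once per line */
    while ((c = getc_unlocked(fp)) != EOF && c != '\n') {
        seen++;
        if (n + 1 < cap)            /* characters that do not fit are dropped */
            buf[n++] = (char)c;
    }
    funlockfile(fp);

    buf[n] = '\0';
    if (c == EOF && seen == 0)
        return -1;
    return (long)n;
}
```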

You can use getline().
You would need to specify the delimiter and use a while or for loop that stops when it reaches EOF.
EDIT: you said you don't want to use fopen() and fgets(), so:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
    const string something = "target"; // hypothetical line content to look for
    string line;
    ifstream myfile ("example.txt");
    if (myfile.is_open())
    {
        // getline() returns the stream, which tests false at EOF or on error
        while (getline (myfile, line))
        {
            if (line == something)
            {
                // read the next line and print it
                if (getline (myfile, line))
                    cout << line << '\n';
            }
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}

Some folks recommended that you use open(2) and read(2) as replacements for fopen(3) and fgets(3), but since they are Unix (Linux and OS X) APIs, you cannot use them with Borland on Windows. The Win32 APIs that correspond to open(2) and read(2) on Windows are CreateFile and ReadFile:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365467.aspx
But I really doubt this is the way to go for you... You're trying to solve a problem which should never happen on a normal program. You need to find the root cause of your problem.

Related

How to make fprintf() write immediately

One way to write into a file is by using fprintf(). However, this function does not write the results into a file immediately. It rather seems to write everything at once when the program is terminated or finished.
My question is the following: I have a program that takes very long time to run (4-5 hours for big dataset). During this time, I want to see the intermediate results so that I don't have to wait for 5 hours. My university uses Sun Grid Engine for job submission. As most of you know, you have to wait until your job finishes to see your final results. Thus, I want to be able to write the intermediate results into a text file and see the updated results as the program is processing (Similarly if I am using printf).
How can I modify fprintf() to write anything I want immediately to the target file?

You can use the fflush function after each write to flush the output buffer to disk.
fprintf(fileptr, "writing to file\n");
fflush(fileptr);
If you're on a POSIX system (i.e. Linux, BSD, etc), and you really want to be sure the file is written to disk, i.e. you want to flush the kernel buffers as well as the userspace buffers, also use fsync:
fsync(fileno(fileptr));
But fflush should be sufficient. Don't bother with fsync unless you find that you need to.

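Maybe you can set the FILE pointer to _IONBF (unbuffered) mode. Then you would not need to call fflush after each write (fsync only matters if you also want to force the kernel buffers to disk).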
FILE *pFilePointor = fopen(...);
setvbuf(pFilePointor, NULL, _IONBF, 0);
fprintf(...)
fprintf(...)
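A small self-contained sketch of the unbuffered approach above. The helper name open_unbuffered is hypothetical; the calls themselves (fopen, setvbuf with _IONBF) are standard C.

```c
#include <stdio.h>

/* Hypothetical helper: open path for writing with stdio buffering
   turned off, so each fprintf() is handed to the OS immediately. */
static FILE *open_unbuffered(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;
    /* setvbuf() must be called before any other operation on the stream */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

The trade-off is that every fprintf() now turns into at least one write to the operating system, so this suits occasional progress lines better than high-volume output.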

fflush
This works on a FILE *. For your case it looks more appropriate. Please note that fflush(NULL) flushes all open output streams and may be CPU intensive, so you may want to avoid fflush(NULL) for performance reasons.
fsync
This works on an int file descriptor. It flushes not only the file data but also its metadata to the storage device, so the data can survive a system crash or reboot. You can check the man page for more details.
Personally I use fflush, and it works fine for me (on Ubuntu / Linux).

System Calls Function for prompting and getting user input

Ok, so I am writing a C program for my class, but I am only allowed to use system calls. Basically our program is our own cp command: we take two files as inputs from the command line and copy the first file into the second file. It is relatively simple and I have most of the code right, or just about right with maybe some small fixes. However, if the destination file already exists, we need to prompt the user to ask whether it should be overwritten, so I need to know how to get user input using a system call, i.e. I can't use scanf, fgets, gets etc. Basically the only function I can use from the standard library is printf. So I need to know which system call reads the user's answer to a prompt. This part of the code is supposed to work like cp -i, if that helps anyone. Thank you in advance.

You could use system call read. To read from standard input, fd (file descriptor) is 0.
$ man read
READ(2)                    Linux Programmer's Manual                    READ(2)
NAME
read - read from a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
If count is zero, read() returns zero and has no other results. If count is greater than SSIZE_MAX, the result is unspecified.
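A minimal sketch of a cp -i style prompt built only on the write() and read() system calls, assuming a POSIX system. The helper name confirm_overwrite and its yes/no convention are hypothetical. For a real terminal you would pass STDIN_FILENO (0) and STDOUT_FILENO (1).

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: print a prompt on out_fd with write(), then read
   the answer from in_fd with read(), one byte at a time up to newline.
   Returns 1 if the answer starts with 'y' or 'Y', 0 otherwise
   (including end of file or a read error). */
static int confirm_overwrite(int in_fd, int out_fd, const char *prompt)
{
    char first = '\0';
    char c;
    int have_first = 0;

    if (write(out_fd, prompt, strlen(prompt)) == -1)
        return 0;

    /* Consume the whole line so leftover input does not reach a later read */
    while (read(in_fd, &c, 1) == 1 && c != '\n') {
        if (!have_first) {
            first = c;
            have_first = 1;
        }
    }
    return first == 'y' || first == 'Y';
}
```

Reading one byte per call is slow in general but acceptable for a short interactive answer, and it avoids reading past the end of the line.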

fopen does not return

I used 'fopen' in a C program to open a file in read-only mode (r). But in my case I observed that the fopen call does not return. It returns neither NULL nor a valid pointer; execution gets blocked at the fopen call. The path of the file is absolutely correct (I have already verified that) and there are no permission-related issues. Can anybody please tell me what could be the reason for this kind of behavior? Any kind of help is really appreciated. Is there anything related to gcc or glibc?
EDIT
Here is the sample code
printf("%s %d\n",__FUNCTION__,__LINE__);
if ((fp = fopen(argv[1], "r")) == NULL) {
printf("%s %d\n",__FUNCTION__,__LINE__);
return;
}
printf("%s %d\n",__FUNCTION__,__LINE__);
When I run this code, I only get the first print (before calling fopen) and after that the program just halts. So fopen does not complete its operation. The file is a simple configuration file with a '.conf' extension, and it can be opened by all other means like vi, cat etc. There should not be any NFS-related issue. The filesystem is ext3.
Thanks in advance,
Souvik

Here are a few possible reasons:
You've corrupted memory somewhere, and all bets are off as to what's happening (run your program through valgrind).
You're calling this code inside a signal handler. fopen() is not async-signal-safe, so really anything could happen (a deadlock due to the FILE * internal mutex is common, though).
The file is a FIFO, in which case opening it will block until someone opens the other end (for reading or writing).
The file is on a stale NFS mount.
The file is a character/block special file with semantics such that open blocks until something interesting happens.
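Two of the causes listed above (a FIFO or a special file) can be ruled in or out before calling fopen. Here is a minimal sketch assuming a POSIX system; the helper name blocking_file_kind and its return codes are hypothetical. Note that stat() may itself hang on a stale NFS mount, so it does not help with that case.

```c
#include <sys/stat.h>

/* Hypothetical helper: classify a path before opening it.
   Returns 1 for a FIFO, 2 for a character or block special file,
   0 for anything else, and -1 if stat() itself fails. */
static int blocking_file_kind(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    if (S_ISFIFO(st.st_mode))
        return 1;
    if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode))
        return 2;
    return 0;
}
```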

So what? fopen is allowed to block until the file has been opened, or until it has been determined that access is denied. If you have a slow storage device, it is absolutely correct to wait until that becomes available. But that is an operating system issue then, not C's.

I notice you don't close the file if you open it successfully.
Is it possible that you have run it before and killed it, and now you have a process out there which has the file open and locked?
If so, then maybe fopen is waiting for the lock to be released.

Is it possible that you've redefined a symbol in the reserved namespace: either something beginning with two underscores, an underscore and a capital letter, or any of the standard C library functions? If so, that results in undefined behavior, and it's possible that fopen somehow ends up calling part of your code instead of the correct code in the standard library.
This question has a major "missing information" smell to it. I seriously doubt the code snippet in the question has the behavior OP has described when it appears by itself in main, and I wonder if OP hasn't done some bogus stuff he's not telling us about...

Check if writing to a file is finished

I am currently writing a plugin in C++. For my functionality I ask the API to save out a file. The API gives me a return value when the file is written... or so it seemed. The problem is that this return value is returned too early, so I cannot be sure that the file has been written completely.
Is there a possibility of checking the write completeness of the file independent of the api?

That's because the system does not write data to disk as soon as it's requested, but still returns. In C, you could use int fflush (FILE *stream), but I don't know how you'd do that in C++.

Not really. Even if we re-read the file to 'verify' that the write had taken place, we could still be looking at a kernel buffer.

You could compare the original buffer and the file you have written byte for byte, but I think it is better to trust your operating system with the corresponding fflush and fclose operations.

As mentioned, you can use fflush(). You can also call sync() / fsync(), depending on whether you are using a stream class or a file descriptor.

C fopen vs open

Is there any reason (other than syntactic ones) that you'd want to use
FILE *fdopen(int fd, const char *mode);
or
FILE *fopen(const char *path, const char *mode);
instead of
int open(const char *pathname, int flags, mode_t mode);
when using C in a Linux environment?

First, there is no particularly good reason to use fdopen if fopen is an option and open is the other possible choice. You shouldn't have used open to open the file in the first place if you want a FILE *. So including fdopen in that list is incorrect and confusing because it isn't very much like the others. I will now proceed to ignore it because the important distinction here is between a C standard FILE * and an OS-specific file descriptor.
There are four main reasons to use fopen instead of open.
fopen provides you with buffered I/O that may turn out to be a lot faster than what you're doing with open.
fopen does line ending translation if the file is not opened in binary mode, which can be very helpful if your program is ever ported to a non-Unix environment (though the world appears to be converging on LF-only (except IETF text-based networking protocols like SMTP and HTTP and such)).
A FILE * gives you the ability to use fscanf and other stdio functions.
Your code may someday need to be ported to some other platform that only supports ANSI C and does not support the open function.
In my opinion the line ending translation more often gets in your way than helps you, and the parsing of fscanf is so weak that you inevitably end up tossing it out in favor of something more useful.
And most platforms that support C have an open function.
That leaves the buffering question. In places where you are mainly reading or writing a file sequentially, the buffering support is really helpful and a big speed improvement. But it can lead to some interesting problems in which data does not end up in the file when you expect it to be there. You have to remember to fclose or fflush at the appropriate times.
If you're doing seeks (aka fsetpos or fseek, the second of which is slightly trickier to use in a standards-compliant way), the usefulness of buffering quickly goes down.
Of course, my bias is that I tend to work with sockets a whole lot, and there the fact that you really want to be doing non-blocking I/O (which FILE * totally fails to support in any reasonable way) with no buffering at all, and often have complex parsing requirements, really colors my perceptions.

open() is a low-level OS call. fdopen() converts an OS-level file descriptor to the higher-level FILE abstraction of the C language. fopen() calls open() in the background and gives you a FILE pointer directly.
There are several advantages to using FILE objects rather than raw file descriptors, which include greater ease of use but also other technical advantages such as built-in buffering. The buffering in particular generally results in a sizeable performance advantage.

fopen vs open in C
1) fopen is a library function while open is a system call.
2) fopen provides buffered I/O, which is usually faster than the unbuffered I/O you get with open.
3) fopen is portable while open is not (open is environment specific).
4) fopen returns a pointer to a FILE structure(FILE *); open returns an integer that identifies the file.
5) A FILE * gives you the ability to use fscanf and other stdio functions.

Unless you're part of the 0.1% of applications where using open is an actual performance benefit, there really is no good reason not to use fopen. As far as fdopen is concerned, if you aren't playing with file descriptors, you don't need that call.
Stick with fopen and its family of methods (fwrite, fread, fprintf, et al) and you'll be very satisfied. Just as importantly, other programmers will be satisfied with your code.

If you have a FILE *, you can use functions like fscanf, fprintf and fgets etc. If you have just the file descriptor, you have limited (but likely faster) input and output routines read, write etc.

open() is a system call and specific to Unix-based systems and it returns a file descriptor. You can write to a file descriptor using write() which is another system call.
fopen() is an ANSI C function call which returns a file pointer and it is portable to other OSes. We can write to a file pointer using fprintf.
In Unix:
You can get a file pointer from the file descriptor using:
fP = fdopen(fD, "a");
You can get a file descriptor from the file pointer using:
fD = fileno (fP);
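A minimal sketch that puts the two conversions above together, assuming a POSIX system. The helper name append_line is hypothetical; open, fdopen, fileno and fclose are the real calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: append one line to path by opening it with
   open() and then wrapping the descriptor in a FILE * with fdopen().
   Returns 0 on success, -1 on failure. */
static int append_line(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    FILE *fp = fdopen(fd, "a");   /* mode must be compatible with the open() flags */
    if (fp == NULL) {
        close(fd);                /* fdopen() failed, so fd is still ours to close */
        return -1;
    }

    /* fileno() gives back the same descriptor that was wrapped */
    int ok = (fileno(fp) == fd) && fprintf(fp, "%s\n", text) >= 0;

    /* fclose() flushes the buffer and closes fd as well; no separate close() */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```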

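Using open, read and write means you have to worry about signal interruptions.
If the call was interrupted by a signal handler, the functions will return -1
and set errno to EINTR.
So the proper way to close a file would be
while (retval = close(fd), retval == -1 && errno == EINTR) ;
Be aware that on Linux the descriptor is released even when close() fails with EINTR, and the close(2) manual page advises against retrying there; the retry pattern is safer for read() and write().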
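The EINTR retry idea applied to read(), as a minimal sketch assuming a POSIX system. The wrapper name read_retry is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read() that restarts when a signal handler
   interrupts the call before any data was transferred. */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t got;

    do {
        got = read(fd, buf, count);
    } while (got == -1 && errno == EINTR);

    return got;   /* bytes read, 0 at end of file, or -1 with errno set */
}
```

stdio functions such as fgets hide this detail from the caller on many systems, which is part of what the higher-level interface buys you.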

I changed to open() from fopen() for my application, because fopen was causing double reads every time I ran fopen fgetc . Double reads were disruptive of what I was trying to accomplish. open() just seems to do what you ask of it.

open() will be called at the end of each of the fopen() family functions. open() is a system call, and the fopen() functions are provided by libraries as wrapper functions for ease of use.

It also depends on what flags are required for the open. With respect to usage for writing and reading (and portability), the f* functions should be used, as argued above.
But if you want to specify more than the standard modes (like read/write and append), you will have to use a platform-specific API (like POSIX open) or a library that abstracts these details. The C standard does not have any such flags.
For example, you might want to open a file only if it exists. If you don't specify the create flag, the file must exist. If you add the exclusive flag to create, open will only create the file if it does not already exist. There are many more.
For example, on Linux systems there is an LED interface exposed through sysfs. It exposes the brightness of the LED through a file, which you read or write as a number in string form ranging from 0 to 255. Of course you don't want to create that file; you only want to write to it if it exists. The cool thing now: use fdopen to read/write this file using the standard calls.
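A minimal sketch of the two flag combinations described above, assuming a POSIX system. The helper names open_existing_for_write and create_new_for_write are hypothetical; O_CREAT, O_EXCL and fdopen are the real POSIX pieces.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open path for writing only if it already exists
   (no O_CREAT), then hand back a FILE * so stdio calls can be used. */
static FILE *open_existing_for_write(const char *path)
{
    int fd = open(path, O_WRONLY);      /* fails with ENOENT if path is missing */
    if (fd == -1)
        return NULL;
    FILE *fp = fdopen(fd, "w");
    if (fp == NULL)
        close(fd);
    return fp;
}

/* Hypothetical helper: create path only if it does not exist yet.
   O_CREAT | O_EXCL makes open() fail with EEXIST otherwise. */
static FILE *create_new_for_write(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return NULL;
    FILE *fp = fdopen(fd, "w");
    if (fp == NULL)
        close(fd);
    return fp;
}
```

For the LED case, the first helper would be pointed at the brightness file under sysfs (the exact path depends on the hardware), followed by an ordinary fprintf and fclose.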

Opening a file using fopen
Before we can read information from (or write information to) a file on disk, we must open the file. To open the file we call the function fopen. Conceptually:
1. First it searches the disk for the file to be opened.
2. Then it sets up a place in memory, called a buffer, that holds data loaded from the file.
3. It sets up a character pointer that points to the first character of the buffer.
This is how the fopen function behaves.
The buffering process adds work and, in some cases, may time out. So when comparing fopen (high-level I/O) with the open system call (low-level I/O), open can be the faster and more appropriate choice.
