I want to run wget from C and I am using popen to do so.
FILE *stdoutPtr = popen(command,"r");
fseek(stdoutPtr, 0L, SEEK_END);
long pos = ftell(stdoutPtr);
Here I want to get the size of the command's output so that I can allocate a buffer. But the pos variable is always -1.
pos is supposed to tell me the current position of the read pointer.
Please help....
The FILE returned by popen is not a regular file, but a thing called a pipe. (That's what the p stands for.) Data flows through the pipe from the stdout of the command you invoked to your program. Because it's a communications channel and not a file on disk, a pipe does not have a definite size, and you cannot seek to different locations in the data stream. Therefore, fseek and ftell will both fail when applied to this FILE, and that's what a -1 return value means. If you inspect errno immediately after the call to ftell you will discover that it has the value ESPIPE, which means "You can't do that to a pipe."
If you're trying to read all of the output from the command into a single char* buffer, the only way to do it is to repeatedly call one of the read functions until it indicates end-of-file, and enlarge the buffer as necessary using realloc. If the output is potentially large, it would be better to change your program to process the data in chunks, if there's any way to do that.
You can't use pipes that way. For one thing, the information would be obsolete the instant you got it, since more data could be written to the pipe by then. You have to use a different allocation strategy.
The most common strategy is to allocate a fixed-size buffer and just keep reading until you reach end of file. You can process the data as you read it, if you like.
If you need to process the data all in one chunk, you can allocate a large buffer and start reading into that. If it does get full, then use realloc to enlarge the buffer and keep going until you have it all.
A common pattern is to keep a buffer pointer, a buffer count, and an allocation size. Initially, set the allocation size to, say, 64K. Set the count to zero. Allocate a 64K buffer. Read up to size-count bytes into the buffer. If you hit EOF, stop. If the buffer is nearly full, bump up the allocation size by 50% and realloc the buffer.
I'm trying to parse some code which works with O_DIRECT files.
ssize_t written = write(fd, buf, size);
What is confusing is that size can be lower than the sector size of the disk, thus does write(fd,buf,size) write the entirety of buf to fd or only the first size bytes of buf to disk?
Without O_DIRECT this is simply the second case, but I can't find any documentation about the O_DIRECT case, and from what I've read it will still send buf to the disk, so the only thing I can think of is that it also tells the disk to only write size...
[...] does write(fd,buf,size) write the entirety of buf to fd or only the first size bytes of buf to disk?
If the write() call is successful it means all of the requested size bytes have been written, but the question becomes: written to where? You have to remember that opening a file with O_DIRECT is more of a hint that you want to bypass OS caches than an order. The filesystem could choose to simply write your I/O through the page cache, either because that's what it always does or because you broke the rules regarding alignment and using the page cache is a way of quietly fixing up your mistake. The only way to know would be to investigate the data path when the I/O was issued.
I'm opening a file using CreateFile() with the flags FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH for several reasons, and I've noticed a strange behavior:
Since for using those flags we have to allocate memory aligned to the sector size, let's say the sector size is 512.
Now, if I allocate 512 bytes with _aligned_malloc() and I read from the file, everything works fine if the file size is exactly a multiple of the sector size, let's say 512*4, or 2048. I read pieces of 512 bytes and the last piece makes ReadFile() return the EOF code, that is, return FALSE with GetLastError() set to ERROR_HANDLE_EOF.
The problem arises when the file size is not aligned to the sector size, that is, when the file's size is, say, 2048+13, or 2061 bytes.
I can successfully read the first four 512-byte chunks from the file, and a 5th call to ReadFile() lets me read the last 13 surplus bytes, but here is the strange thing: in that case ReadFile() doesn't return the EOF code! Even though I told ReadFile() to read 512 bytes and it read only 13 (so it reached the end of the file), it doesn't tell me that; it just returns 13 bytes read, with no further information.
So, when I have read the last 13 bytes and my loop is set to read until EOF, it calls ReadFile() again a 6th time, causing an error: ERROR_INVALID_PARAMETER. I guess this is correct, because I'm trying to read after I have passed the end of the file!
My question is: is this normal behavior, or am I doing something wrong? When using non-buffered I/O, should I expect no EOF code when I read the last non-sector-aligned chunk of a file? Or is there another way to detect it?
How can I tell that I've just passed the EOF?
I guess I could solve this problem by modifying the loop: instead of reading until EOF, I could read until EOF or until the number of bytes actually returned is less than the number requested. Is this a correct assumption?
NOTE: this does not happen when using files with normal flags, it only happens when I use FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH.
NOTE 2: I'm using I/O Completion Ports for reading files, but I guess this happens also without using them, by just using blocking I/O.
EOF is surprisingly hard. Even C's feof function is often misunderstood.
Basically, you get ERROR_HANDLE_EOF in the first case to distinguish the "512 bytes read, more to read" and "512 bytes read, nothing left" cases.
In the second case, this is not needed. "512 bytes requested, 13 bytes read, no error" already means that you're at EOF. Any other reason for a partial read would have been reported as an error.
I am writing a program that will generate hashes for files or for a string from stdin. However, fseek and ftell won't work on stdin, so I can't reserve a buffer, rewind stdin to the beginning, and slurp the entire stream. So is there any easy way to do this? Or should I just read n characters each time and send it to the state updating function for each hash?
If you believe that stdin is always textual, you could read entire lines from it using getline(3).
If you want to handle arbitrary input (including non-textual stdin) you should use fread(3) in a loop on some rather big block (e.g. 4K or 16K bytes) and take into account partial reads. The block may contain null bytes. You will update the state inside the loop.
You could have partial reads in the middle, e.g. if you are reading from a pipe (with popen(3) ....) on Linux...
I am struggling to understand the difference between these functions. Which one of them can be used if I want to read one character at a time?
fread()
read()
getc()
Depending on how you want to do it you can use any of those functions.
The easiest to use would probably be fgetc().
fread() : read a block of data from a stream (documentation)
read() : the lower-level POSIX read call, which fread() is typically built on top of (documentation)
getc() : get a character from a stream (documentation). Please consider using fgetc() (doc) instead, since getc() may be implemented as a macro that evaluates its stream argument more than once, which makes it slightly less safe.
fread() is a standard C function for reading blocks of binary data from a file.
read() is a POSIX function for doing the same.
getc() is a standard C function (a macro, actually) for reading a single character from a file - i.e., it's what you are looking for.
In addition to the other answers, also note that read is an unbuffered way to read from a file, while fread is buffered: it keeps an internal buffer (whose size you can tune with setvbuf). Each time you call read, a system call occurs which reads the number of bytes you asked for, whereas fread reads a whole chunk into its internal buffer and returns only the bytes you need. On each fread call it first checks whether it can serve you more data from the buffer; if not, it makes a system call (read) to fetch another chunk, and again returns only the portion you wanted.
Also, read works directly on a file descriptor number, whereas fread needs the file to be opened as a FILE pointer.
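A small illustration of the two call styles side by side (the helper names read_with_syscall and read_with_stdio are made up for the example):

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* read(): unbuffered, works on a raw file descriptor; each call is a
   system call for at most the bytes you ask for. */
ssize_t read_with_syscall(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}

/* fread(): buffered, works on a FILE pointer; libc may pull a whole
   block into its internal buffer and hand out slices of it. */
size_t read_with_stdio(const char *path, char *buf, size_t n)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t got = fread(buf, 1, n, f);
    fclose(f);
    return got;
}
```

Both fill the caller's buffer with the same bytes; the difference is how many system calls happen underneath and whether you hold an int descriptor or a FILE pointer.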
The answer depends on what you mean by "one character at a time".
If you want to ensure that only one character is consumed from the underlying file descriptor (which may refer to a non-seekable object like a pipe, socket, or terminal device) then the only solution is to use read with a length of 1. If you use strace (or similar) to monitor a shell script using the shell command read, you'll see that it repeatedly calls read with a length of 1. Otherwise it would risk reading too many bytes (past the newline it's looking for) and having subsequent processes fail to see the data on the "next line".
On the other hand, if the only program that should be performing further reads is your program itself, fread or getc will work just fine. Note that getc should be a lot faster than fread if you're just reading a single byte.
What I need to do is use the read function from unistd.h to read a file
line by line. I have this at the moment:
n = read(fd, str, size);
However, this reads to the end of the file, or up to size number of bytes.
Is there a way that I can make it read one line at a time, stopping at a newline?
The lines are all of variable length.
I am allowed only these two header files:
#include <unistd.h>
#include <fcntl.h>
The point of the exercise is to read in a file line by line, and
output each line as it's read in. Basically, to mimic the fgets()
and fputs() functions.
You can read character by character into a buffer and check for the linebreak symbols (\r\n for Windows and \n for Unix systems).
You'll want to create a buffer twice the length of your longest line you'll support, and you'll need to keep track of your buffer state.
Basically, each time you're called for a new line you'll scan from your current buffer position looking for an end-of-line marker. If you find one, good, that's your line. Update your buffer pointers and return.
If you hit your maxlength then you return a truncated line and change your state to discard. Next time you're called you need to discard up to the next end of line, and then enter your normal read state.
If you hit the end of what you've read in, then you need to read in another maxline chars, wrapping to the start of the buffer if you hit the bottom (ie, you may need to make two read calls) and then continue scanning.
All of the above assumes you can set a max line length. If you can't then you have to work with dynamic memory and worry about what happens if a buffer malloc fails. Also, you'll need to always check the results of the read in case you've hit the end of the file while reading into your buffer.
Unfortunately the read function isn't really suitable for this sort of input. Assuming this is some sort of artificial requirement from interview/homework/exercise, you can attempt to simulate line-based input by reading the file in chunks and splitting it on the newline character yourself, maintaining state in some way between calls. You can get away with a static position indicator if you carefully document the function's use.
This is a good question, but allowing only the read function doesn't help! :P
Loop over read calls to get a fixed number of bytes, search for the '\n' character, return the part of the string up to the '\n', and store the rest (minus the '\n') to prepend to the next file chunk.
Use dynamic memory.
The greater the buffer size, the fewer read calls you make (read is a system call, so it isn't cheap, even though nowadays kernels are preemptive).
...
Or simply fix a maximum line length, and use fgets, if you need to be quick...
If you need to read exactly 1 line (and not overstep) using read(), the only generally-applicable way to do that is by reading 1 byte at a time and looping until you get a newline byte. However, if your file descriptor refers to a terminal and it's in the default (canonical) mode, read will wait for a newline and return less than the requested size as soon as a line is available. It may however return more than one line, if data arrives very quickly, or less than 1 line if your program's buffer or the internal terminal buffer is shorter than the line length.
Unless you really need to avoid overstep (which is sometimes important, if you want another process/program to inherit the file descriptor and be able to pick up reading where you left off), I would suggest using stdio functions or your own buffering system. Using read for line-based or byte-by-byte IO is very painful and hard to get right.
Well, it will read line-by-line from a terminal.
Some choices you have are:
Write a function that uses read when it runs out of data but only returns one line at a time to the caller
Use the function in the library that does exactly that: fgets().
Read only one byte at a time, so you don't go too far.
If you open the file in text mode then Windows "\r\n" will be silently translated to "\n" as the file is read.
If you are on Unix you can use the non-standard1 glibc 'getline()' function.
1 The getline() function is standard in POSIX 2008.
Convert file descriptor to FILE pointer.
FILE* fp = fdopen(fd, "r");
Then you can use getline().