C: lseek() related question - c

I want to write some bogus text in a file ("helloworld" text in a file called helloworld), but not starting from the beginning. I was thinking to lseek() function.
If I use the following code (edited):
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#define fname "helloworld"
#define buf_size 16
int main(){
char buffer[buf_size];
int fildes,
nbytes;
off_t ret;
fildes = open(fname, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
if(fildes < 0){
printf("\nCannot create file + trunc file.\n");
}
//modify offset
if((ret = lseek(fildes, (off_t) 10, SEEK_END)) < (off_t) 0){
fprintf(stdout, "\nCannot modify offset.\n");
}
printf("ret = %d\n", (int)ret);
if(write(fildes, fname, 10) < 0){
fprintf(stdout, "\nWrite failed.\n");
}
close(fildes);
return (0);
}
, it compiles well and it runs without any apparent errors.
Still if i :
cat helloworld
The output is not what I expected, but:
helloworld
Can
Where is "Can" comming from, and where are my empty spaces ?
Should i expect for "zeros" instead of spaces ? If i try to open helloworld with gedit, an error occurs, complaining that the file character encoding is unknown.
LATER EDIT:
After I edited my program with the right buffer for writing, and then compile / run again, the "helloworld" file still cannot be opened with gedit.strong text
LATER EDIT
I understand the issue now. I've added to the code the following:
fildes = open(fname, O_RDONLY);
if(fildes < 0){
printf("\nCannot open file.\n");
}
while((nbytes = read(fildes, c, 1)) == 1){
printf("%d ", (int)*c);
}
And now the output is:
0 0 0 0 0 0 0 0 0 0 104 101 108 108 111 119 111 114 108 100
My problem was that i was expecting spaces (32) instead of zeros (0).

In this function call, write(fildes, fname, buf_size), fname has 10 characters (plus a trailing '\0' character, but you're telling the function to write out 16 bytes. Who knows what in the memory locations after the fname string.
Also, I'm not sure what you mean by "where are my empty spaces?".

Apart from expecting zeros to equal spaces, the original problem was indeed writing more than the length of the "helloworld" string. To avoid such a problem, I suggest letting the compiler calculate the length of your constant strings for you:
write(fildes, fname, sizeof(fname) - 1)
The - 1 is due to the NUL character (zero, \0) that is used to terminate C-style strings, and sizeof simply returning the size of the array that holds the string. Due to this you cannot use sizeof to calculate the actual length of a string at runtime, but it works fine for compile-time constants.
The "Can" you saw in your original test was almost certainly the beginning of one of the "\nCannot" strings in your code; after writing the 11 bytes in "helloworld\0" you continued to write the remaining bytes from whatever was following it in memory, which turned out to be the next string constant. (The question has now been amended to write 10 bytes, but the originally posted version wrote 16.)
The presence of NUL characters (= zero, '\0') in a text file may indeed cause certain (but not all) text editors to consider the file binary data instead of text, and possibly refuse to open it. A text file should contain just text, not control characters.

Your buf_size doesn't match the length of fname. It's reading past the buffer, and therefore getting more or less random bytes that just happened to sit after the string in memory.

Related

File reading problem (using GCC on Linux and Cygwin)

I am using the following program to find out the size of a file and allocate memory dynamically. This program has to be multi-platform functional.
But when I run the program on Linux machine and on a Windows machine using Cygwin, I see different outputs — why?
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
/*
Observation on Linux
When reading text file remember
the content in the text file if arranged in lines like below:
ABCD
EFGH
the size of file is 12, its because each line is ended by \r\n, so add 2 bytes for every line we read.
*/
off_t fsize(char *file) {
struct stat filestat;
if (stat(file, &filestat) == 0) {
return filestat.st_size;
}
return 0;
}
void ReadInfoFromFile(char *path)
{
FILE *fp;
unsigned int size;
char *buffer = NULL;
unsigned int start;
unsigned int buff_size =0;
char ch;
int noc =0;
fp = fopen(path,"r");
start = ftell(fp);
fseek(fp,0,SEEK_END);
size = ftell(fp);
rewind(fp);
printf("file size = %u\n", size);
buffer = (char*) malloc(sizeof(char) * (size + 1) );
if(!buffer) {
printf("malloc failed for buffer \n");
return;
}
buff_size = fread(buffer,sizeof(char),size,fp);
printf(" buff_size = %u\n", buff_size);
if(buff_size == size)
printf("%s \n", buffer);
else
printf("problem in file size \n %s \n", buffer);
fclose(fp);
}
int main(int argc, char *argv[])
{
printf(" using ftell etc..\n");
ReadInfoFromFile(argv[1]);
printf(" using stat\n");
printf("File size = %u\n", fsize(argv[1]));
return 0;
}
The problem is fread reading different sizes depends on compiler.
I have not tried on proper windows compiler yet.
But what would be the portable way to read contents from file?
Output on Linux:
using ftell etc..
file size = 34
buff_size = 34
ABCDEGFH
IJKLMNOP
QRSTUVWX
YX
using stat
File size = 34
Output on Cygwin:
using ftell etc..
file size = 34
buff_size = 30
problem in file size
ABCDEGFH
IJKLMNOP
QRSTUVWX
YX
_ROAMINGPRã9œw
using stat
File size = 34
Transferring comments into an answer.
The trouble is probably that on Windows, the text file has CRLF line endings ("\r\n"). The input processing maps those to "\n" to match Unix because you use "r" in the open mode (open text file for reading) instead of "rb" (open binary file for reading). This leads to a difference in the byte counts — ftell() reports the bytes including the '\r' characters, but fread() doesn't count them.
But how can I allocate memory, if I don't know the actual size? Even in this case also the return value of fread is 30/34, but my content is only of 26 bytes.
Define your content — there's a newline or CRLF at the end of each of 4 lines. When the file is opened on Windows (Cygwin) in text mode (no b), then you will receive 3 lines of 9 bytes (8 letters and a newline) plus one line with 3 bytes (2 letters and a newline), for 30 bytes in total. Compared to the 34 that's reported by ftell() or stat(), the difference is the 4 CR characters ('\r') that are not returned. If you opened the file as a binary file ("rb"), then you'd get all 34 characters — 3 lines with 10 bytes and 1 line with 4 bytes.
The good news is that the size reported by stat() or ftell() is bigger than the final number of bytes returned, so allocating enough space is not too hard. It might become wasteful if you have a gigabyte size file with every line containing 1 byte of data and a CRLF. Then you'd "waste" (not use) one third of the allocated space. You could always shrink the allocation to the required size with realloc().
Note that there is no difference between text and binary mode on Unix-like (POSIX) systems such as Linux. It does not do mapping of CRLF to NL line endings. If the file is copied from Windows to Linux without mapping the line endings, you will get CRLF at the end of each line on Linux If the file is copied and the line endings are mapped, you'll get a smaller size on Linux than under Cygwin. (Using "rb" on Linux does no harm; it doesn't do any good either. Using "rb" on Windows/Cygwin could be important; it depends on the behaviour you want.)
See also the C11 standard §7.21.2 Streams and also §7.21.3 Files.

Getting characters past a certain point in a file in C

I want to take all characters past location 900 from a file called WWW, and put all of these in an array:
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while((NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
I try to create a char array of length 1, since the read system call requires a pointer, I cannot use a regular char. The above code does not work. In fact, it does not print any characters to the terminal as expected by the loop. I think my logic is correct, but perhaps a misunderstanding of whats going on behind the scenes is what is making this hard for me. Or maybe i missed something simple (hope not).
If you already know how many bytes to read (e.g. in appropriatesize) then just read in that many bytes at once, rather than reading in bytes one at a time.
char everythingPast900[appropriatesize];
ssize_t bytesRead = read(WWW, everythingPast900, sizeof everythingPast900);
if (bytesRead > 0 && bytesRead != appropriatesize)
{
// only everythingPast900[0] to everythingPast900[bytesRead - 1] is valid
}
I made a test version of your code and added bits you left out. Why did you leave them out?
I also made a file named www.txt that has a hundred lines of "This is a test line." in it.
And I found a potential problem, depending on how big your appropriatesize value is and how big the file is. If you write past the end of EverythingPast900 it is possible for you to kill your program and crash it before you ever produce any output to display. That might happen on Windows where stdout may not be line buffered depending on which libraries you used.
See the MSDN setvbuf page, in particular "For some systems, this provides line buffering. However, for Win32, the behavior is the same as _IOFBF - Full Buffering."
This seems to work:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
int main()
{
int WWW = open("www.txt", O_RDONLY);
if(WWW < 0)
printf("Error opening www.txt\n");
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
int appropriatesize = 1000;
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while(i < appropriatesize && (NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
return 0;
}
As stated in another answer, read more than one byte. The theory behind "buffers" is to reduce the amount of read/write operations due to how slow disk I/O (or network I/O) is compared to memory speed and CPU speed. Look at it as if it is code and consider which is faster: adding 1 to the file size N times and writing N bytes individually, or adding N to the file size once and writing N bytes at once?
Another thing worth mentioning is the fact that read may read fewer than the number of bytes you requested, even if there is more to read. The answer written by #dreamlax illustrates this fact. If you want, you can use a loop to read as many bytes as possible, filling the buffer. Note that I used a function, but you can do the same thing in your main code:
#include <sys/types.h>
/* Read from a file descriptor, filling the buffer with the requested
* number of bytes. If the end-of-file is encountered, the number of
* bytes returned may be less than the requested number of bytes.
* On error, -1 is returned. See read(2) or read(3) for possible
* values of errno.
* Otherwise, the number of bytes read is returned.
*/
ssize_t
read_fill (int fd, char *readbuf, ssize_t nrequested)
{
ssize_t nread, nsum = 0;
while (nrequested > 0
&& (nread = read (fd, readbuf, nrequested)) > 0)
{
nsum += nread;
nrequested -= nread;
readbuf += nread;
}
return nsum;
}
Note that the buffer is not null-terminated as not all data is necessarily text. You can pass buffer_size - 1 as the requested number of bytes and use the return value to add a null terminator where necessary. This is useful primarily when interacting with functions that will expect a null-terminated string:
char readbuf[4096];
ssize_t n;
int fd;
fd = open ("WWW", O_RDONLY);
if (fd == -1)
{
perror ("unable to open WWW");
exit (1);
}
n = lseek (fd, 900, SEEK_SET);
if (n == -1)
{
fprintf (stderr,
"warning: seek operation failed: %s\n"
" reading 900 bytes instead\n",
strerror (errno));
n = read_fill (fd, readbuf, 900);
if (n < 900)
{
fprintf (stderr, "error: fewer than 900 bytes in file\n");
close (fd);
exit (1);
}
}
/* Read a file, printing its contents to the screen.
*
* Caveat:
* Not safe for UTF-8 or other variable-width/multibyte
* encodings since required bytes may get cut off.
*/
while ((n = read_fill (fd, readbuf, (ssize_t) sizeof readbuf - 1)) > 0)
{
readbuf[n] = 0;
printf ("Read\n****\n%s\n****\n", readbuf);
}
if (n == -1)
{
close (fd);
perror ("error reading from WWW");
exit (1);
}
close (fd);
I could also have avoided the null termination operation and filled all 4096 bytes of the buffer, electing to use the precision part of the format specifiers of printf in this case, changing the format specification from %s to %.4096s. However, this may not be feasible with unusually large buffers (perhaps allocated by malloc to avoid stack overflow) because the buffer size may not be representable with the int type.
Also, you can use a regular char just fine:
char c;
nread = read (fd, &c, 1);
Apparently you didn't know that the unary & operator gets the address of whatever variable is its operand, creating a value of type pointer-to-{typeof var}? Either way, it takes up the same amount of memory, but reading 1 byte at a time is something that normally isn't done as I've explained.
Mixing declarations and code is a no no. Also, no, that is not a valid declaration. C should complain about it along the lines of it being variably defined.
What you want is dynamically allocating the memory for your char buffer[]. You'll have to use pointers.
http://www.ontko.com/pub/rayo/cs35/pointers.html
Then read this one.
http://www.cprogramming.com/tutorial/c/lesson6.html
Then research a function called memcpy().
Enjoy.
Read through that guide, then you should be able to solve your problem in an entirely different way.
Psuedo code.
declare a buffer of char(pointer related)
allocate memory for said buffer(dynamic memory related)
Find location of where you want to start at
point to it(pointer related)
Figure out how much you want to store(technically a part of allocating memory^^^)
Use memcpy() to store what you want in the buffer

open() and read() system calls...program not executing

I'm trying to make a program that would copy 512 bytes from 1 file to another using said system calls (I could make a couple buffers, memcpy() and then fwrite() but I want to practice with Unix specific low level I/O). Here is the beginning of the code:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
int src, dest, bytes_read;
char tmp_buf[512];
if (argc < 3)
printf("Needs 2 arguments.");
printf("And this message I for some reason don't see.... o_O");
if ((src = open(argv[1], O_RDWR, 0)) == -1 || (dest = open(argv[2], O_CREAT, 0)) == -1)
perror("Error");
while ((bytes_read = read(src, tmp_buf, 512)) != -1)
write(dest, tmp_buf, 512);
return 0;
}
I know I didn't deal with the fact that the file read from isn't going to be a multiple of 512 in size. But first I really need to figure out 2 things:
Why isn't my message showing up? No segmentation fault either, so I end up having to just C-c out of the program
How exactly do those low level functions work? Is there a pointer which shifts with each system call, like say if we were using FILE *file with fwrite, where our *file would automatically increment, or do we have to increment the file pointer by hand? If so, how would we access it assuming that open() and etc. never specify a file pointer, rather just the file ID?
Any help would be great. Please. Thank you!
The reason you don't see the printed message is because you don't flush the buffers. The text should show up once the program is done though (which never happens, and why this is, is explained in a comment by trojanfoe and in an answer by paxdiablo). Simply add a newline at the end of the strings to see them.
And you have a serious error in the read/write loop. If you read less than the requested 512 bytes, you will still write 512 bytes.
Also, while you do check for errors when opening, you don't know which of the open calls that failed. And you still continue the program even if you get an error.
And finally, the functions are very simple: They call a function in the kernel which handles everything for you. If you read X bytes the file pointer is moved forward X bytes after the call is done.
The reason you don't see the message is because you're in line-buffered mode. It will only be flushed if it discovers a newline character.
As to why it's waiting forever, you'll only get -1 on an error.
Successfully reading to end of file will give you a 0 return value.
A better loop would be along the lines of:
int bytes_left = 512;
while ((bytes_left > 0) {
bytes_read = read(src, tmp_buf, bytes_left);
if (bytes_read < 1) break;
write(dest, tmp_buf, bytes_read);
bytes_left -= bytes_read;
}
if (bytes_left < 0)
; // error of some sort

read() syscall on windows fails to read binary file

I would like to read image file to keep them in memory before using them with SDL. I just realized that open() and read() on windows fails to read my file entirely but on linux/BSD it works!
This is my code:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#define IMGPATH "rabbit.png"
int
main(int argc, char *argv[])
{
int fd;
struct stat st;
void *data;
size_t nbread;
fd = open(IMGPATH, O_RDONLY);
if (fd < 0)
exit(1);
fstat(fd, &st);
data = malloc(st.st_size);
if (data == NULL)
exit(1);
nbread = read(fd, data, st.st_size);
if (nbread != st.st_size)
printf("Failed to read completely: expected = %ld, read = %ld\n", st.st_size, nbread);
}
On windows it will produce: Failed to read completely: expected = 19281, read = 5. perror() says no error and if I try to read() again it does not change and stuck at this 5 byte.
Is there some flag I should add to open() to read binary file?
This is the first PNG bytes file I try to read:
0000000 211 P N G \r \n 032 \n \0 \0 \0 \r I H D R
0000010 \0 \0 \0 \ \0 \0 \0 k \b 006 \0 \0 \0 <FA> 220 <E5>
Does it stops reading when '\n' appears?
I don't know how to fix this right now.
PS: please do not says "use libpng" because I just need to get the file buffer into memory before using it with SDL and my graphic library.
A few points:
read() is not guaranteed to read the count of bytes specified. It may return early or nothing at all. It's normal to have to call read() several times to fill large buffers. This is one of the reasons the stdio wrappers and fread() are useful.
On Windows, text and binary mode differ. Since you did not specifiy O_BINARY in your flags, Windows will handle '\n' characters differently for this file. Likely it is returning at the first '\n' encountered.
It's not necessary to check the file size before reading the file. The read() and indeed any wrapper around read() will always stop reading at the end of the file.
Update0
On further observation I see that the 5th and 6th characters are \r\n, this is handled specially by Windows when in text mode, and explains the early return as I mentioned above. If you don't pass O_BINARY to the open() call these 2 characters will be converted to a single \n.

C read binary stdin

I'm trying to build an instruction pipeline simulator and I'm having a lot of trouble getting started. What I need to do is read binary from stdin, and then store it in memory somehow while I manipulate the data. I need to read in chunks of exactly 32 bits one after the other.
How do I read in chunks of exactly 32 bits at a time? Secondly, how do I store it for manipulation later?
Here's what I've got so far, but examining the binary chunks I read further, it just doesn't look right, I don't think I'm reading exactly 32 bits like I need.
char buffer[4] = { 0 }; // initialize to 0
unsigned long c = 0;
int bytesize = 4; // read in 32 bits
while (fgets(buffer, bytesize, stdin)) {
memcpy(&c, buffer, bytesize); // copy the data to a more usable structure for bit manipulation later
// more stuff
buffer[0] = 0; buffer[1] = 0; buffer[2] = 0; buffer[3] = 0; // set to zero before next loop
}
fclose(stdin);
How do I read in 32 bits at a time (they are all 1/0, no newlines etc), and what do I store it in, is char[] okay?
EDIT: I'm able to read the binary in but none of the answers produce the bits in the correct order — they are all mangled up, I suspect endianness and problems reading and moving 8 bits around ( 1 char) at a time — this needs to work on Windows and C ... ?
What you need is freopen(). From the manpage:
If filename is a null pointer, the freopen() function shall attempt to change the mode of the stream to that specified by mode, as if the name of the file currently associated with the stream had been used. In this case, the file descriptor associated with the stream need not be closed if the call to freopen() succeeds. It is implementation-defined which changes of mode are permitted (if any), and under what circumstances.
Basically, the best you can really do is this:
freopen(NULL, "rb", stdin);
This will reopen stdin to be the same input stream, but in binary mode. In the normal mode, reading from stdin on Windows will convert \r\n (Windows newline) to the single character ASCII 10. Using the "rb" mode disables this conversion so that you can properly read in binary data.
freopen() returns a filehandle, but it's the previous value (before we put it in binary mode), so don't use it for anything. After that, use fread() as has been mentioned.
As to your concerns, however, you may not be reading in "32 bits" but if you use fread() you will be reading in 4 chars (which is the best you can do in C - char is guaranteed to be at least 8 bits but some historical and embedded platforms have 16 bit chars (some even have 18 or worse)). If you use fgets() you will never read in 4 bytes. You will read in at least 3 (depending on whether any of them are newlines), and the 4th byte will be '\0' because C strings are nul-terminated and fgets() nul-terminates what it reads (like a good function). Obviously, this is not what you want, so you should use fread().
Consider using SET_BINARY_MODE macro and setmode:
#ifdef _WIN32
# include <io.h>
# include <fcntl.h>
# define SET_BINARY_MODE(handle) setmode(handle, O_BINARY)
#else
# define SET_BINARY_MODE(handle) ((void)0)
#endif
More details about SET_BINARY_MODE macro here: "Handling binary files via standard I/O"
More details about setmode here: "_setmode"
I had to piece the answer together from the various comments from the kind people above, so here is a fully-working sample that works - only for Windows, but you can probably translate the windows-specific stuff to your platform.
#include "stdafx.h"
#include "stdio.h"
#include "stdlib.h"
#include "windows.h"
#include <io.h>
#include <fcntl.h>
int main()
{
char rbuf[4096];
char *deffile = "c:\\temp\\outvideo.bin";
size_t r;
char *outfilename = deffile;
FILE *newin;
freopen(NULL, "rb", stdin);
_setmode(_fileno(stdin), _O_BINARY);
FILE *f = fopen(outfilename, "w+b");
if (f == NULL)
{
printf("unable to open %s\n", outfilename);
exit(1);
}
for (;; )
{
r = fread(rbuf, 1, sizeof(rbuf), stdin);
if (r > 0)
{
size_t w;
for (size_t nleft = r; nleft > 0; )
{
w = fwrite(rbuf, 1, nleft, f);
if (w == 0)
{
printf("error: unable to write %d bytes to %s\n", nleft, outfilename);
exit(1);
}
nleft -= w;
fflush(f);
}
}
else
{
Sleep(10); // wait for more input, but not in a tight loop
}
}
return 0;
}
For Windows, this Microsoft _setmode example specifically shows how to change stdin to binary mode:
// crt_setmode.c
// This program uses _setmode to change
// stdin from text mode to binary mode.
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
int main( void )
{
int result;
// Set "stdin" to have binary mode:
result = _setmode( _fileno( stdin ), _O_BINARY );
if( result == -1 )
perror( "Cannot set mode" );
else
printf( "'stdin' successfully changed to binary mode\n" );
}
fgets() is all wrong here. It's aimed at human-readable ASCII text terminated by end-of-line characters, not binary data, and won't get you what you need.
I recently did exactly what you want using the read() call. Unless your program has explicitly closed stdin, for the first argument (the file descriptor), you can use a constant value of 0 for stdin. Or, if you're on a POSIX system (Linux, Mac OS X, or some other modern variant of Unix), you can use STDIN_FILENO.
fread() suits best for reading binary data.
Yes, char array is OK, if you are planning to process them bytewise.
I don't know what OS you are running, but you typically cannot "open stdin in binary". You can try things like
int fd = fdreopen (fileno (stdin), outfname, O_RDONLY | OPEN_O_BINARY);
to try to force it. Then use
uint32_t opcode;
read(fd, &opcode, sizeof (opcode));
But I have no actually tried it myself. :)
I had it right the first time, except, I needed ntohl ... C Endian Conversion : bit by bit

Resources