I'm trying to make a program that would copy 512 bytes from 1 file to another using said system calls (I could make a couple buffers, memcpy() and then fwrite() but I want to practice with Unix specific low level I/O). Here is the beginning of the code:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
int src, dest, bytes_read;
char tmp_buf[512];
if (argc < 3)
printf("Needs 2 arguments.");
printf("And this message I for some reason don't see.... o_O");
if ((src = open(argv[1], O_RDWR, 0)) == -1 || (dest = open(argv[2], O_CREAT, 0)) == -1)
perror("Error");
while ((bytes_read = read(src, tmp_buf, 512)) != -1)
write(dest, tmp_buf, 512);
return 0;
}
I know I didn't deal with the fact that the file read from isn't going to be a multiple of 512 in size. But first I really need to figure out 2 things:
Why isn't my message showing up? No segmentation fault either, so I end up having to just C-c out of the program
How exactly do those low level functions work? Is there a pointer which shifts with each system call, like say if we were using FILE *file with fwrite, where our *file would automatically increment, or do we have to increment the file pointer by hand? If so, how would we access it assuming that open() and etc. never specify a file pointer, rather just the file ID?
Any help would be great. Please. Thank you!
The reason you don't see the printed message is because you don't flush the buffers. The text should show up once the program is done though (which never happens, and why this is, is explained in a comment by trojanfoe and in an answer by paxdiablo). Simply add a newline at the end of the strings to see them.
And you have a serious error in the read/write loop. If you read less than the requested 512 bytes, you will still write 512 bytes.
Also, while you do check for errors when opening, you don't know which of the open calls that failed. And you still continue the program even if you get an error.
And finally, the functions are very simple: They call a function in the kernel which handles everything for you. If you read X bytes the file pointer is moved forward X bytes after the call is done.
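To make the "file pointer" part concrete, here is a minimal sketch, assuming a seekable file (not a pipe or socket). The helper names current_offset and read_and_tell are made up for illustration; the only real calls are read() and lseek(). The offset lives in the kernel, attached to the open file, and lseek(fd, 0, SEEK_CUR) reports it without moving it.

```c
#include <sys/types.h>
#include <unistd.h>

/* Return the current offset of fd without moving it, or -1 on error
 * (for example ESPIPE if fd is a pipe or socket). */
off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Read up to n bytes, then report where the kernel left the offset.
 * Returns the new offset, or -1 if the read or the lseek failed. */
off_t read_and_tell(int fd, void *buf, size_t n)
{
    if (read(fd, buf, n) < 0)
        return -1;
    return current_offset(fd);
}
```

Calling read_and_tell twice with a 4-byte buffer on a 10-byte file should report 4 and then 8: nothing has to be incremented by hand.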
The reason you don't see the message is because you're in line-buffered mode. It will only be flushed if it discovers a newline character.
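If you would rather not depend on a trailing newline, an explicit fflush() also works. This is only a sketch; say_now is a hypothetical helper name, and the line-buffering behaviour described above applies when stdout is a terminal.

```c
#include <stdio.h>

/* Print msg on stdout and push it out immediately, newline or not.
 * Returns 0 on success, -1 if writing or flushing failed. */
int say_now(const char *msg)
{
    if (fputs(msg, stdout) == EOF)
        return -1;
    return fflush(stdout) == EOF ? -1 : 0;
}
```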
As to why it's waiting forever, you'll only get -1 on an error.
Successfully reading to end of file will give you a 0 return value.
A better loop would be along the lines of:
int bytes_left = 512;
while (bytes_left > 0) {
    bytes_read = read(src, tmp_buf, bytes_left);
    if (bytes_read < 1) break;  // 0 is end of file, -1 is an error
    write(dest, tmp_buf, bytes_read);
    bytes_left -= bytes_read;
}
if (bytes_read < 0)
    ; // error of some sort
I want to take all characters past location 900 from a file called WWW, and put all of these in an array:
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while((NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
I create a char array of length 1 because the read system call requires a pointer, so I cannot use a regular char. The above code does not work. In fact, it does not print any characters to the terminal, as the loop should. I think my logic is correct, but perhaps a misunderstanding of what's going on behind the scenes is what is making this hard for me. Or maybe I missed something simple (hope not).
If you already know how many bytes to read (e.g. in appropriatesize) then just read in that many bytes at once, rather than reading in bytes one at a time.
char everythingPast900[appropriatesize];
ssize_t bytesRead = read(WWW, everythingPast900, sizeof everythingPast900);
if (bytesRead > 0 && bytesRead != appropriatesize)
{
// only everythingPast900[0] to everythingPast900[bytesRead - 1] is valid
}
I made a test version of your code and added bits you left out. Why did you leave them out?
I also made a file named www.txt that has a hundred lines of "This is a test line." in it.
And I found a potential problem, depending on how big your appropriatesize value is and how big the file is. If you write past the end of EverythingPast900 it is possible for you to kill your program and crash it before you ever produce any output to display. That might happen on Windows where stdout may not be line buffered depending on which libraries you used.
See the MSDN setvbuf page, in particular "For some systems, this provides line buffering. However, for Win32, the behavior is the same as _IOFBF - Full Buffering."
This seems to work:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
int main()
{
int WWW = open("www.txt", O_RDONLY);
if(WWW < 0)
printf("Error opening www.txt\n");
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
int appropriatesize = 1000;
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while(i < appropriatesize && (NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
return 0;
}
As stated in another answer, read more than one byte. The theory behind "buffers" is to reduce the amount of read/write operations due to how slow disk I/O (or network I/O) is compared to memory speed and CPU speed. Look at it as if it is code and consider which is faster: adding 1 to the file size N times and writing N bytes individually, or adding N to the file size once and writing N bytes at once?
Another thing worth mentioning is the fact that read may read fewer than the number of bytes you requested, even if there is more to read. The answer written by @dreamlax illustrates this fact. If you want, you can use a loop to read as many bytes as possible, filling the buffer. Note that I used a function, but you can do the same thing in your main code:
#include <sys/types.h>
#include <unistd.h>
/* Read from a file descriptor, filling the buffer with the requested
 * number of bytes. If the end-of-file is encountered, the number of
 * bytes returned may be less than the requested number of bytes.
 * On error, -1 is returned. See read(2) or read(3) for possible
 * values of errno.
 * Otherwise, the number of bytes read is returned.
 */
ssize_t
read_fill (int fd, char *readbuf, ssize_t nrequested)
{
  ssize_t nread = 0, nsum = 0;
  while (nrequested > 0
         && (nread = read (fd, readbuf, nrequested)) > 0)
    {
      nsum += nread;
      nrequested -= nread;
      readbuf += nread;
    }
  /* Report an error as -1, as documented above, not as a partial count. */
  return nread < 0 ? -1 : nsum;
}
Note that the buffer is not null-terminated as not all data is necessarily text. You can pass buffer_size - 1 as the requested number of bytes and use the return value to add a null terminator where necessary. This is useful primarily when interacting with functions that will expect a null-terminated string:
char readbuf[4096];
ssize_t n;
int fd;
fd = open ("WWW", O_RDONLY);
if (fd == -1)
{
perror ("unable to open WWW");
exit (1);
}
n = lseek (fd, 900, SEEK_SET);
if (n == -1)
{
fprintf (stderr,
"warning: seek operation failed: %s\n"
" reading 900 bytes instead\n",
strerror (errno));
n = read_fill (fd, readbuf, 900);
if (n < 900)
{
fprintf (stderr, "error: fewer than 900 bytes in file\n");
close (fd);
exit (1);
}
}
/* Read a file, printing its contents to the screen.
*
* Caveat:
* Not safe for UTF-8 or other variable-width/multibyte
* encodings since required bytes may get cut off.
*/
while ((n = read_fill (fd, readbuf, (ssize_t) sizeof readbuf - 1)) > 0)
{
readbuf[n] = 0;
printf ("Read\n****\n%s\n****\n", readbuf);
}
if (n == -1)
{
close (fd);
perror ("error reading from WWW");
exit (1);
}
close (fd);
I could also have avoided the null termination operation and filled all 4096 bytes of the buffer, electing to use the precision part of the format specifiers of printf in this case, changing the format specification from %s to %.4096s. However, this may not be feasible with unusually large buffers (perhaps allocated by malloc to avoid stack overflow) because the buffer size may not be representable with the int type.
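A variation on the same idea, as a sketch: instead of hard-coding the precision in the format string, pass it as an int argument with %.*s. The helper name print_chunk is made up here. Note that %s still stops at the first null byte, so for truly binary data fwrite is the safer choice.

```c
#include <stdio.h>

/* Print the first n bytes of buf, which need not be null-terminated.
 * The precision is passed as an int via '*', so n must fit in an int.
 * Returns what fprintf returns. */
int print_chunk(FILE *out, const char *buf, int n)
{
    return fprintf(out, "%.*s", n, buf);
}
```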
Also, you can use a regular char just fine:
char c;
nread = read (fd, &c, 1);
Apparently you didn't know that the unary & operator gets the address of whatever variable is its operand, creating a value of type pointer-to-{typeof var}? Either way, it takes up the same amount of memory, but reading 1 byte at a time is something that normally isn't done as I've explained.
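Mixing declarations and code is a no-no in C89 (C99 and later allow it). Also, an array whose size is not a compile-time constant is a variable-length array, which is only valid from C99 onward; a C89 compiler should complain that the array is variably sized.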
What you want is dynamically allocating the memory for your char buffer[]. You'll have to use pointers.
http://www.ontko.com/pub/rayo/cs35/pointers.html
Then read this one.
http://www.cprogramming.com/tutorial/c/lesson6.html
Then research a function called memcpy().
Enjoy.
Read through that guide, then you should be able to solve your problem in an entirely different way.
Pseudocode:
declare a buffer of char(pointer related)
allocate memory for said buffer(dynamic memory related)
Find location of where you want to start at
point to it(pointer related)
Figure out how much you want to store(technically a part of allocating memory^^^)
Use memcpy() to store what you want in the buffer
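One possible way to turn those steps into C, as a sketch only: it assumes a regular file (so that fstat reports a meaningful size), and read_tail is a hypothetical name. It also skips memcpy(), because read() already places the bytes directly in the allocated buffer.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read everything from byte offset `start` to the end of a regular file.
 * On success returns a malloc'd buffer (caller frees) and stores the
 * number of bytes in *len. Returns NULL on error or if start is at or
 * past the end of the file. */
char *read_tail(int fd, off_t start, size_t *len)
{
    struct stat st;
    if (fstat(fd, &st) == -1 || start < 0 || start >= st.st_size)
        return NULL;

    size_t want = (size_t)(st.st_size - start);
    char *buf = malloc(want);
    if (buf == NULL)
        return NULL;

    if (lseek(fd, start, SEEK_SET) == (off_t)-1) {
        free(buf);
        return NULL;
    }

    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) {            /* read error */
            free(buf);
            return NULL;
        }
        if (n == 0)             /* file shrank underneath us: stop early */
            break;
        got += (size_t)n;
    }
    *len = got;
    return buf;
}
```

For the question above, read_tail(WWW, 900, &len) would hand back everything past position 900 without having to guess appropriatesize.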
I would install valgrind to tell me what the problem is, but unfortunately I can't install any new programs on this computer... Could anyone tell me if there's an obvious problem with this "echo" program? I'm doing this for a friend, so I'm not sure what the layout of the client is on the other side, but I know that both reads and writes are valid socket descriptors, and I've tested that n = write(writes,"I got your message \n",20); and n = write(reads,"I got your message \n",20); both work, so I can confirm that it's not a case of an invalid fd. Thanks!
int
main( int argc, char** argv ) {
int reads = atoi(argv[1]) ;
int writes = atoi(argv[3]) ;
int n ;
char buffer[MAX_LINE];
memset(buffer, 0, sizeof(buffer));
int i = 0 ;
while (1) {
read(reads, buffer, sizeof(buffer));
n = write(writes,buffer,sizeof(buffer));
if (n < 0) perror("ERROR reading from socket");
}
There are a few problems, the most pressing of which is that you're likely pushing garbage data down the write socket by using sizeof(buffer) when writing. Let's say you read data from the reads socket and it's less than MAX_LINE bytes. When you go to write that data, you'll be writing whatever you read plus whatever is left at the end of the buffer. Even though you memset at the very beginning, reusing the same buffer without reacting to different read sizes will leave stale data from earlier reads in it.
Try getting the return value from read and using it in your write. If the read return indicates an error, clean up and either exit or try again, depending on how you want your program to behave.
int n, size;
while (1) {
size = read(reads, buffer, sizeof(buffer));
if (size > 0) {
n = write(writes, buffer, size);
if (n != size) {
// write error, do something
}
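} else {
// size == 0: end of input (the peer closed); size < 0: read error.
// Either way, leave the loop instead of spinning forever.
break;
}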
}
This, of course, assumes your writes and reads are valid file descriptors.
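On sockets, write() may also accept fewer bytes than you hand it, so a short write is not necessarily an error. Here is a sketch of one way to deal with that; write_all and echo_loop are names I made up, and the 4096-byte buffer is an arbitrary choice.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly n bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set by write(). */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copy from in_fd to out_fd until end of input.
 * Returns 0 at end of input, -1 on a read or write error. */
int echo_loop(int in_fd, int out_fd)
{
    char buffer[4096];
    for (;;) {
        ssize_t size = read(in_fd, buffer, sizeof buffer);
        if (size == 0)
            return 0;           /* end of file or peer closed */
        if (size < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buffer, (size_t)size) == -1)
            return -1;
    }
}
```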
These two lines look very suspicious:
int reads = atoi(argv[1]) ;
int writes = atoi(argv[3]) ;
Do you really get file/socket descriptor numbers on the command line? From where?
Check the return value of your read(2) and write(2), and then the value of errno(3) - they probably tell you that your file descriptors are invalid (EBADF).
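If you want to check the descriptors up front rather than wait for read or write to fail, one common trick is fcntl(fd, F_GETFD), which fails with EBADF on a descriptor that is not open. A small sketch, with fd_is_open as a hypothetical helper name:

```c
#include <errno.h>
#include <fcntl.h>

/* Return 1 if fd refers to an open file descriptor, 0 otherwise.
 * fcntl(F_GETFD) fails with EBADF when fd is not open. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

This only tells you the number is open in this process, not that it is the socket you expect.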
One point not made thus far: Although you know that the file descriptors are valid, you should include some sanity checking of the command line.
if (argc < 3) {
printf("usage: foo: input output\n");
exit(0);
}
Even with this sanity checking, passing parameters like this on a command line can be dangerous.
The memset() is not needed, provided you change the following (which you should do nevertheless).
read() has a result, telling you how much it has actually read. This you should give to write() in order to write only what you actually have, removing the need for zeroing.
MAX_LINE should be at least 512, if not more.
There probably are some more issues, but I think I have the most important ones.
I'm having a hard time trying to figure out why this piece of code doesn't work as it should. I am learning the basics of I/O operations and I have to come up with a C program that writes on a 'log.txt' file what is given from standard input and as the 'stop' word is entered, the program must halt.
So my code is:
#include "main.h"
#define SIZE 1024
int main(int argc, char *argv[])
{
int fd;
int readBytes;
int writBytes;
char *buffer;
if ((fd = open("log.txt", O_WRONLY|O_APPEND)) < 0)
{
perror("open");
}
buffer = (char *) calloc (SIZE, sizeof(char));
while ((readBytes = read(0, buffer, SIZE) < SIZE)&&(strncmp(buffer, "stop", 4) != 0));
if ((writBytes = write(fd, buffer, SIZE)) < 0)
{
perror("write");
}
if ((close(fd)) < 0)
{
perror("close");
}
}
If I enter:
this is just a text
stop
The output is
stop
is just a text
If I enter more than a sentence:
this is just a text
this is more text
and text again
stop
This is what is logged:
stop
ext again
xt
t
And on top of that if I try to edit the log.txt file from vim or just a text editor I can see '\00's. I guess \00 stands for all the bytes left empty from the 1024 available, right? How can I prevent that from happening?
It looks like you're expecting
readBytes = read(0, buffer, SIZE) < SIZE)
to somehow accumulate things in buffer. It doesn't. Every subsequent read will put whatever it read at the start of the buffer, overwriting what the previous read has read.
You need to put your write in the while block - one write for every read, and only write as much as you read, otherwise you'll write garbage (zeros from the calloc and/or leftovers from the previous read) in your log file.
Also note that while your technique will probably work most of the time for a line-buffered input stream, it will not do what you expect if you redirect from a file or a pipe. You should be using line-oriented or formatted input functions (like getline if your implementation has that, scanf, or fgets).
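A rough sketch of the fgets approach, assuming the stop word sits alone on its own line. The function name log_until_stop and the 1024-byte line buffer are my own choices, not something from the question.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy lines from `in` to the descriptor `fd` until a line that is
 * exactly "stop" (or end of input). Returns the number of chunks logged,
 * or -1 on a write error. A line longer than the buffer is logged in
 * several chunks. */
int log_until_stop(FILE *in, int fd)
{
    char line[1024];
    int logged = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strcmp(line, "stop\n") == 0 || strcmp(line, "stop") == 0)
            break;

        /* Write only the bytes actually read, never the whole buffer. */
        size_t len = strlen(line);
        size_t off = 0;
        while (off < len) {
            ssize_t w = write(fd, line + off, len - off);
            if (w < 0)
                return -1;
            off += (size_t)w;
        }
        logged++;
    }
    return logged;
}
```

Because each write uses strlen(line) rather than SIZE, no zero bytes end up in log.txt.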
I've written code that should ideally take in data from one document, encrypt it and save it in another document.
But when I try executing the code it does not put the encrypted data in the new file. It just leaves it blank. Someone please spot what's missing in the code. I tried but I couldn't figure it out.
I think there is something wrong with the read/write function, or maybe I'm implementing the do-while loop incorrectly.
#include <stdio.h>
#include <stdlib.h>
#include <termios.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main (int argc, char* argv[])
{
int fdin,fdout,n,i,fd;
char* buf;
struct stat fs;
if(argc<3)
printf("USAGE: %s source-file target-file.\n",argv[0]);
fdin=open(argv[1], O_RDONLY);
if(fdin==-1)
printf("ERROR: Cannot open %s.\n",argv[1]);
fdout=open(argv[2], O_WRONLY | O_CREAT | O_EXCL, 0644);
if(fdout==-1)
printf("ERROR: %s already exists.\n",argv[2]);
fstat(fd, &fs);
n= fs.st_size;
buf=malloc(n);
do
{
n=read(fd, buf, 10);
for(i=0;i<n;i++)
buf[i] ^= '#';
write(fd, buf, n);
} while(n==10);
close(fdin);
close(fdout);
}
You are using fd instead of fdin (in the fstat and read calls) and fdout (in the write call). fd is an uninitialized variable.
// Here...
fstat(fd, &fs);
// And here...
n=read(fd, buf, 10);
for(i=0;i<n;i++)
buf[i] ^= '#';
write(fd, buf, n);
You're reading and writing to fd instead of fdin and fdout. Make sure you enable all warnings your compiler will emit (e.g. use gcc -Wall -Wextra -pedantic). It will warn you about the use of an uninitialized variable if you let it.
Also, if you checked the return codes of fstat(), read(), or write(), you'd likely have gotten errors from using an invalid file descriptor. They are most likely failing with EBADF (bad file descriptor) errors.
fstat(fd, &fs);
n= fs.st_size;
buf=malloc(n);
And since we're here: allocating enough memory to hold the entire file is unnecessary. You're only reading 10 bytes at a time in your loop, so you really only need a 10-byte buffer. You could skip the fstat() entirely.
// Just allocate 10 bytes.
buf = malloc(10);
// Or heck, skip the malloc() too! Change "char *buf" to:
char buf[10];
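Putting the pieces together, the loop could look something like the sketch below. xor_copy is a hypothetical name, the 4096-byte buffer is arbitrary, and the caller is assumed to have opened fdin and fdout successfully.

```c
#include <unistd.h>

/* XOR every byte read from fdin with `key` and write the result to fdout.
 * Returns 0 on success, -1 on a read or write error. */
int xor_copy(int fdin, int fdout, unsigned char key)
{
    unsigned char buf[4096];
    ssize_t n;

    while ((n = read(fdin, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            buf[i] ^= key;

        /* write() may accept fewer bytes than asked, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(fdout, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;      /* n == 0 means end of file */
}
```

The loop ends on read() returning 0 rather than on a short read, so it does not depend on the file size being a multiple of the buffer size.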
Everything said so far is true; one more tip.
You should use a larger buffer, ideally a multiple of the file system block size; 8192 bytes is a common choice.
This will speed up your program significantly, because you issue roughly 800 times fewer read and write calls (8192 / 10). As you know, accessing the disk is very expensive in terms of time.
Another option is to use the stdio functions fread, fwrite, etc., which already take care of buffering, although you'll still have the function call overhead.
Roni
This looks like a simple question, but I didn't find anything similar here.
Since there is no file copy function in C, we have to implement file copying ourselves, but I don't like reinventing the wheel even for trivial stuff like that, so I'd like to ask the cloud:
What code would you recommend for file copying using fopen()/fread()/fwrite()?
What code would you recommend for file copying using open()/read()/write()?
This code should be portable (windows/mac/linux/bsd/qnx/younameit), stable, time tested, fast, memory efficient and etc. Getting into specific system's internals to squeeze some more performance is welcomed (like getting filesystem cluster size).
This seems like a trivial question but, for example, source code for CP command isn't 10 lines of C code.
This is the function I use when I need to copy from one file to another - with test harness:
/*
@(#)File: $RCSfile: fcopy.c,v $
@(#)Version: $Revision: 1.11 $
@(#)Last changed: $Date: 2008/02/11 07:28:06 $
@(#)Purpose: Copy the rest of file1 to file2
@(#)Author: J Leffler
@(#)Modified: 1991,1997,2000,2003,2005,2008
*/
/*TABSTOP=4*/
#include "jlss.h"
#include "stderr.h"
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_fcopy_c[] = "@(#)$Id: fcopy.c,v 1.11 2008/02/11 07:28:06 jleffler Exp $";
#endif /* lint */
void fcopy(FILE *f1, FILE *f2)
{
char buffer[BUFSIZ];
size_t n;
while ((n = fread(buffer, sizeof(char), sizeof(buffer), f1)) > 0)
{
if (fwrite(buffer, sizeof(char), n, f2) != n)
err_syserr("write failed\n");
}
}
#ifdef TEST
int main(int argc, char **argv)
{
FILE *fp1;
FILE *fp2;
err_setarg0(argv[0]);
if (argc != 3)
err_usage("from to");
if ((fp1 = fopen(argv[1], "rb")) == 0)
err_syserr("cannot open file %s for reading\n", argv[1]);
if ((fp2 = fopen(argv[2], "wb")) == 0)
err_syserr("cannot open file %s for writing\n", argv[2]);
fcopy(fp1, fp2);
return(0);
}
#endif /* TEST */
Clearly, this version uses file pointers from standard I/O and not file descriptors, but it is reasonably efficient and about as portable as it can be.
Well, except the error function - that's peculiar to me. As long as you handle errors cleanly, you should be OK. The "jlss.h" header declares fcopy(); the "stderr.h" header declares err_syserr() amongst many other similar error reporting functions. A simple version of the function follows - the real one adds the program name and does some other stuff.
#include "stderr.h"
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
void err_syserr(const char *fmt, ...)
{
int errnum = errno;
va_list args;
va_start(args, fmt);
vfprintf(stderr, fmt, args);
va_end(args);
if (errnum != 0)
fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
exit(1);
}
The code above may be treated as having a modern BSD license or GPL v3 at your choice.
As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).
Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.
There's a file-specific optimisation that GNU cp does, which I haven't bothered with here, that for long blocks of 0 bytes instead of writing you just extend the output file by seeking off the end.
void block(int fd, int event) {
struct pollfd topoll; // from <poll.h>; C requires the struct keyword
topoll.fd = fd;
topoll.events = event;
poll(&topoll, 1, -1);
// no need to check errors - if the stream is bust then the
// next read/write will tell us
}
int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
for(;;) {
char *pos; // char *, because ISO C has no arithmetic on void *
// read data to buffer
ssize_t bytestowrite = read(fdin, buf, bufsize);
if (bytestowrite == 0) break; // end of input
if (bytestowrite == -1) {
if (errno == EINTR) continue; // signal handled
if (errno == EAGAIN) {
block(fdin, POLLIN);
continue;
}
return -1; // error
}
// write data from buffer
pos = buf;
while (bytestowrite > 0) {
ssize_t bytes_written = write(fdout, pos, bytestowrite);
if (bytes_written == -1) {
if (errno == EINTR) continue; // signal handled
if (errno == EAGAIN) {
block(fdout, POLLOUT);
continue;
}
return -1; // error
}
bytestowrite -= bytes_written;
pos += bytes_written;
}
}
return 0; // success
}
// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
#define FILECOPY_BUFFER_SIZE (64*1024)
#endif
int copy_data(int fdin, int fdout) {
// optional exercise for reader: take the file size as a parameter,
// and don't use a buffer any bigger than that. This prevents
// memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
// is small.
for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
void *buffer = malloc(bufsize);
if (buffer != NULL) {
int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
free(buffer);
return result;
}
}
// could use a stack buffer here instead of failing, if desired.
// 128 bytes ought to fit on any stack worth having, but again
// this could be made configurable.
return -1; // errno is ENOMEM
}
To open the input file:
int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;
Opening the output file is tricksy. As a basis, you want:
int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0x1ff);
if (fdout == -1) {
close(fdin);
return -1;
}
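One portability wrinkle with the flags above: O_BINARY is not part of POSIX (it comes from DOS and Windows compilers), so on most Unix systems it has to be defined away. A sketch of how that is commonly done; open_output is a hypothetical name, and I've used mode 0666 here instead of 0x1ff so that the copy is not created executable.

```c
#include <fcntl.h>

/* O_BINARY is not part of POSIX; define it as a no-op where it is absent
 * so the same open() flags compile everywhere. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Open (create or truncate) an output file for writing.
 * 0666 is the requested mode; the process umask still applies. */
int open_output(const char *path)
{
    return open(path, O_WRONLY | O_BINARY | O_CREAT | O_TRUNC, 0666);
}
```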
But there are confounding factors:
you need to special-case when the files are the same, and I can't remember how to do that portably.
if the output filename is a directory, you might want to copy the file into the directory.
if the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
you might want the permissions of the output file to reflect those of the input file.
you might want other platform-specific meta-data to be copied.
you may or may not wish to unlink the output file on error.
Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".
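For the "same file" case, the usual approach on POSIX systems (it will not help on Windows) is to compare the device and inode numbers. A sketch, with same_file as a made-up name; it works on paths so that the check can run before the output is opened with O_TRUNC and the input is destroyed.

```c
#include <sys/stat.h>

/* Return 1 if both paths name the same file, 0 if they name different
 * files, -1 if either stat() fails (for example, ENOENT because the
 * output does not exist yet). POSIX identifies a file by the pair
 * (st_dev, st_ino). */
int same_file(const char *p1, const char *p2)
{
    struct stat a, b;
    if (stat(p1, &a) == -1 || stat(p2, &b) == -1)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A return of -1 with errno set to ENOENT for the output path simply means there is nothing to clash with.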
Btw, getting the filesystem's cluster size is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.
The size of each read should be a multiple of 512 (the sector size); 4096 is a good choice.
Here is a very easy and clear example: Copy a file. Since it is written in ANSI-C without any particular function calls I think this one would be pretty much portable.
Depending on what you mean by copying a file, it is certainly far from trivial. If you mean copying the content only, then there is almost nothing to do. But generally, you need to copy the metadata of the file, and that's surely platform dependent. I don't know of any C library which does what you want in a portable manner. Just handling the filename by itself is no trivial matter if you care about portability.
In C++, there is the filesystem library in Boost.
One thing I found when implementing my own file copy, and it seems obvious but it's not: I/O's are slow. You can pretty much time your copy's speed by how many of them you do. So clearly you need to do as few of them as possible.
The best results I found were when I got myself a ginormous buffer, read the entire source file into it in one I/O, then wrote the entire buffer back out in one I/O. If I even had to do it in 10 batches, it got way slow. Trying to read and write out each byte, like a naive coder might try first, was just painful.
The accepted answer, written by Steve Jessop, does not answer the first part of the question. Jonathan Leffler's answer does, but gets it wrong: the code should be written as
while ((n = fread(buffer, 1, sizeof(buffer), f1)) > 0)
if (fwrite(buffer, n, 1, f2) != 1)
/* we got write error here */
/* test ferror(f1) for read errors */
Explanation:
sizeof(char) is 1 by definition, always. It does not matter how many bits a char has, 8 (in most cases), 9, 11 or 32 (on some DSPs, for example): the size of char is one. Note that this is not an error here, just redundant code.
The fwrite function writes up to nmemb (third argument) elements of the specified size (second argument); it is not required to write exactly nmemb elements. To fix this, you must either write the rest of the data that was read, or write a single element of size n and let fwrite do all the work. (Whether fwrite must write all the data is open to question, but in my version short writes are impossible unless an error occurs.)
You should test for read errors too: just test ferror(f1) after the loop ends.
Note that you probably need to disable buffering on both the input and output files to prevent triple buffering: first on the read into f1's buffer, second in our code, third on the write into f2's buffer:
setvbuf(f1, NULL, _IONBF, 0);
setvbuf(f2, NULL, _IONBF, 0);
(Internal buffers should, probably, be of size BUFSIZ.)
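For completeness, here is a sketch of a stdio copy that reports both read and write errors to the caller. fcopy_checked is a hypothetical name, not code from either answer above. It keeps the fwrite(buffer, 1, n, out) form: as far as I can tell, the C standard says fwrite returns fewer than nmemb elements only if a write error occurs, so comparing the result with n should be enough.

```c
#include <stdio.h>

/* Copy the rest of `in` to `out`.
 * Returns 0 on success, -1 on a read or write error. */
int fcopy_checked(FILE *in, FILE *out)
{
    char buffer[BUFSIZ];
    size_t n;

    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n)
            return -1;          /* short count from fwrite: write error */
    }
    if (ferror(in))
        return -1;              /* loop ended on an error, not on EOF */
    return fflush(out) == EOF ? -1 : 0;
}
```

The final fflush matters because a write error can stay hidden in the output buffer until the data is actually pushed out.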