I'm quite new to C. I ran into a problem while studying the last chapter of K&R.
I'm trying to implement the fopen() and fillbuf() functions using the system calls open and read.
I copied the source code from the book exactly, but I repeatedly get a segmentation fault when I run the compiled program.
fp->fd = fd;
fp->cnt = 0;
fp->base = NULL;
fp->flag = (*mode=='r')? _READ : _WRITE;
Why does the error occur? Here is my complete code.
#include<fcntl.h>
#include<unistd.h>
#include<stdlib.h>
#define PERM 0644
#define EOF (-1)
#define BUFSIZE 1024
#define OPEN_MAX 20
typedef struct _iobuf{
int cnt;
char *ptr;
char *base;
int flag;
int fd;
} myFILE;
enum _flags {
_READ = 01,
_WRITE = 02,
_UNBUF = 04,
_EOF = 010,
_ERR = 020
};
myFILE _iob[OPEN_MAX]={
{0, (char *) 0, (char *) 0, _READ, 0 },
{0, (char *) 0, (char *) 0, _WRITE, 1 },
{0, (char *) 0, (char *) 0, _WRITE | _UNBUF, 2 }
};
#define stdin (&_iob[0])
#define stdout (&_iob[1])
#define stderr (&_iob[2])
#define getc(p) ( --(p)->cnt>=0 ? (unsigned char) *(p)->ptr++ : _fillbuf(p) )
int _fillbuf(myFILE *fp)
{
int bufsize;
if((fp->flag & (_READ|_EOF|_ERR))!=_READ)
return EOF;
bufsize=(fp->flag & _UNBUF)? 1 : BUFSIZE;
if(fp->base==NULL)
if((fp->base=(char *)malloc(bufsize))==NULL)
return EOF;
fp->ptr=fp->base;
fp->cnt=read(fp->fd, fp->ptr, bufsize);
if(--fp->cnt<0){
if(fp->cnt == -1)
fp->flag |= _EOF;
else
fp->flag |= _ERR;
return EOF;
}
return (unsigned char) *fp->ptr++;
}
myFILE *myfopen(char *name, char *mode)
{
int fd;
myFILE *fp;
if(*mode!='r' && *mode!='w' && *mode!='a')
return NULL;
for(fp=_iob; fp<_iob+OPEN_MAX; fp++)
if((fp->flag & (_READ | _WRITE))==0)
break;
if(fp>=_iob+OPEN_MAX)
return NULL;
if(*mode=='w')
fd=creat(name, PERM);
else if(*mode=='a'){
if((fd=open(name, O_WRONLY, 0))==-1)
fd=creat(name, PERM);
lseek(fd, 0L, 2);
} else
fd=open(name, O_RDONLY, 0);
if(fd==-1)
return NULL;
fp->fd = fd;
fp->cnt = 0;
fp->base = NULL;
fp->flag = (*mode=='r')? _READ : _WRITE;
return fp;
}
int main(int argc, char *argv[])
{
myFILE *fp;
int c;
if((fp=myfopen(argv[1], "r"))!=NULL)
write(1, "opened\n", sizeof("opened\n"));
while((c=getc(fp))!=EOF)
write(1, &c, sizeof(c));
return 0;
}
EDIT: Please see Jonathan Leffler's answer. It is more accurate and provides a better diagnosis. My answer works, but there is a better way to do things.
I see the problem.
myFILE *fp;
if(*mode!='r' && *mode!='w' && *mode!='a')
return NULL;
for(fp=_iob; fp<_iob+OPEN_MAX; fp++)
if((fp->flag & (_READ | _WRITE))==0) // marked line
break;
When you reach the marked line, you try to dereference the fp pointer. Since it is (likely, but not certainly) initialized to zero (but I should say NULL), you are dereferencing a null pointer. Boom. Segfault.
Here's what you need to change.
myFILE *fp = (myFILE *)malloc(sizeof(myFILE));
Be sure to #include <stdlib.h> to use malloc.
Also your close function should later free() your myFILE to prevent memory leaks.
A different analysis of the code in the question
The code shown in the question consists of parts, but not all, of the code from K&R "The C Programming Language, 2nd Edition" (1988; my copy is marked 'Based on Draft Proposed ANSI C'), pages 176-178, plus a sample main program that is not from the book at all. The name of the type was changed from FILE to myFILE too, and fopen() was renamed to myfopen(). I note that the expressions in the code in the question have many fewer spaces than the original code in K&R. The compiler doesn't mind; human readers generally prefer spaces around operators.
As stated in another (later) question and answer, the diagnosis given by Mark Yisri in the currently accepted answer is incorrect — the problem is not a null pointer in the for loop. The prescribed remedy works (as long as the program is invoked correctly), but the memory allocation is not necessary. Fortunately for all concerned, the fclose() function was not included in the implementations, so it wasn't possible to close a file once it was opened.
In particular, the loop:
for (fp = _iob; fp < _iob + OPEN_MAX; fp++)
if ((fp->flag & (_READ | _WRITE)) == 0)
break;
is perfectly OK because the array _iob is defined as:
FILE _iob[OPEN_MAX] = {
…initializers for stdin, stdout, stderr…
};
This is an array of structures, not structure pointers. The first three elements are initialized explicitly; the remaining elements are implicitly initialized to all zeros. Consequently, there is no chance of there being a null pointer in fp as it steps through the array. The loop might also be written as:
for (fp = &_iob[0]; fp < &_iob[OPEN_MAX]; fp++)
if ((fp->flag & (_READ | _WRITE)) == 0)
break;
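To see the implicit zero-initialization at work, here is a minimal sketch. The names struct slot, SLOT_MAX, slots and first_free_slot are hypothetical stand-ins for myFILE, OPEN_MAX, _iob and the search loop; they are not from K&R. Only three elements have initializers, yet the search finds a free entry without ever touching a null pointer.

```c
#include <stddef.h>

#define SLOT_MAX 20                 /* hypothetical stand-in for OPEN_MAX */

enum { SLOT_READ = 01, SLOT_WRITE = 02 };

struct slot {                       /* hypothetical stand-in for myFILE */
    int   cnt;
    char *ptr;
    char *base;
    int   flag;
    int   fd;
};

/* Only the first three elements have initializers; the remaining
   seventeen are implicitly initialized to all zeros. */
static struct slot slots[SLOT_MAX] = {
    { 0, NULL, NULL, SLOT_READ,  0 },
    { 0, NULL, NULL, SLOT_WRITE, 1 },
    { 0, NULL, NULL, SLOT_WRITE, 2 }
};

/* Return the index of the first unused slot, or -1 if all are in use. */
int first_free_slot(void)
{
    struct slot *sp;

    for (sp = slots; sp < slots + SLOT_MAX; sp++)
        if ((sp->flag & (SLOT_READ | SLOT_WRITE)) == 0)
            return (int)(sp - slots);
    return -1;
}
```

The pointer sp always points at a real array element, so sp->flag is a valid access on every iteration.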
Empirically, if the code shown in the question (including the main(), which was not — repeat not — written by K&R) is invoked correctly, it works without crashing. However, the code in the main() program does not protect itself from:
Being invoked without a non-null argv[1].
Being invoked with a non-existent or non-readable file name in argv[1].
These are very common problems, and with the main program as written, either could cause the program to crash.
Although it is hard to be sure 16 months later, it seems likely to me that the problem was in the way that the program was invoked rather than anything else. If the main program is written more-or-less appropriately, you end up with code similar to this (you also need to add #include <string.h> to the list of included headers):
int main(int argc, char *argv[])
{
myFILE *fp;
int c;
if (argc != 2)
{
static const char usage[] = "Usage: mystdio filename\n";
write(2, usage, sizeof(usage) - 1);
return 1;
}
if ((fp = myfopen(argv[1], "r")) == NULL)
{
static const char filenotopened[] = "mystdio: failed to open file ";
write(2, filenotopened, sizeof(filenotopened) - 1);
write(2, argv[1], strlen(argv[1]));
write(2, "\n", 1);
return 1;
}
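write(1, "opened\n", sizeof("opened\n") - 1); /* - 1: do not write the terminating null byte */
while ((c = getc(fp)) != EOF)
{
char ch = (char)c; /* write one byte, not all sizeof(int) bytes of c */
write(1, &ch, 1);
}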
return 0;
}
This can't use fprintf() etc because the surrogate implementation of the standard I/O library is not complete. Writing the errors direct to file descriptor 2 (standard error) with write() is fiddly, if not painful. It also means that I've taken shortcuts like assuming that the program is called mystdio rather than actually using argv[0] in the error messages. However, if it is invoked without any file name (or if more than one file name is given), or if the named file cannot be opened for reading, then it produces a more or less appropriate error message — and does not crash.
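One way to make the write() calls less fiddly is a small helper that takes a string and works out the length itself. This is only a sketch: put_string and report_open_failure are hypothetical names, not part of K&R or of the code above, and the file descriptor is passed in so that the caller can choose standard error (2).

```c
#include <string.h>
#include <unistd.h>

/* Write a null-terminated string to fd, retrying on short writes.
   Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int put_string(int fd, const char *s)
{
    size_t len = strlen(s);

    while (len > 0) {
        ssize_t n = write(fd, s, len);
        if (n <= 0)
            return -1;
        s   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Print "prog: failed to open file NAME\n" using only write(). */
int report_open_failure(int fd, const char *prog, const char *file)
{
    if (put_string(fd, prog) != 0 ||
        put_string(fd, ": failed to open file ") != 0 ||
        put_string(fd, file) != 0 ||
        put_string(fd, "\n") != 0)
        return -1;
    return 0;
}
```

With such a helper, the error branch in main() could call report_open_failure(2, argv[0], argv[1]) and use the real program name instead of a hard-coded one.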
Leading underscores
Note that the C standard reserves identifiers starting with underscores.
You should not create function, variable or macro names that start with an underscore, in general. C11 §7.1.3 Reserved identifiers says (in part):
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
See also What does double underscore (__const) mean in C?
In fairness, K&R were essentially describing the standard implementation of the standard I/O library at the time when the 1st Edition was written (1978), modernized sufficiently to be using function prototype notation in the 2nd Edition. The original code was on pages 165-168 of the 1st Edition.
Even today, if you are implementing the standard library, you would use names starting with underscores precisely because they are reserved for use 'by the implementation'. If you are not implementing the standard library, you do not use names starting with underscores because that uses the namespace reserved for the implementation. Most people, most of the time, are not writing the standard library — most people should not be using leading underscores.
I am a beginner at C programming. I need to efficiently read millions of records from a file into a struct. Below is an example of the input file.
2,33.1609992980957,26.59000015258789,8.003999710083008
5,15.85200023651123,13.036999702453613,31.801000595092773
8,10.907999992370605,32.000999450683594,1.8459999561309814
11,28.3700008392334,31.650999069213867,13.107999801635742
My current code is shown below. It prints "Error file",
suggesting that the file pointer is NULL, but the file exists and has data.
#include<stdio.h>
#include<stdlib.h>
struct O_DATA
{
int index;
float x;
float y;
float z;
};
int main ()
{
FILE *infile ;
struct O_DATA input;
infile = fopen("input.dat", "r");
if (infile == NULL);
{
fprintf(stderr,"\nError file\n");
exit(1);
}
while(fread(&input, sizeof(struct O_DATA), 1, infile))
printf("Index = %d X= %f Y=%f Z=%f", input.index , input.x , input.y , input.z);
fclose(infile);
return 0;
}
I need to efficiently read and store data from an input file to process it further. Any help would be really appreciated. Thanks in advance.
First figure out how to convert one line of text to data
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct my_data
{
unsigned int index;
float x;
float y;
float z;
};
struct my_data *
deserialize_data(struct my_data *data, const char *input, const char *separators)
{
char *p;
struct my_data tmp;
if(sscanf(input, "%u,%f,%f,%f", &data->index, &data->x, &data->y, &data->z) != 4) /* four fields expected */
return NULL;
return data;
}
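/* Alternative version using strtok(); keep only one definition of deserialize_data(). */
struct my_data *
deserialize_data(struct my_data *data, const char *input, const char *separators)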
{
char *p;
struct my_data tmp;
char *str = strdup(input); /* make a copy of the input line because we modify it */
if (!str) { /* I couldn't make a copy so I'll die */
return NULL;
}
p = strtok (str, separators); /* use line for first call to strtok */
if (!p) goto err;
tmp.index = strtoul (p, NULL, 0); /* convert text to integer */
p = strtok (NULL, separators); /* strtok remembers line */
if (!p) goto err;
tmp.x = atof(p);
p = strtok (NULL, separators);
if (!p) goto err;
tmp.y = atof(p);
p = strtok (NULL, separators);
if (!p) goto err;
tmp.z = atof(p);
memcpy(data, &tmp, sizeof(tmp)); /* copy values out */
goto out;
err:
data = NULL;
out:
free (str);
return data;
}
int main() {
struct my_data somedata;
deserialize_data(&somedata, "1,2.5,3.12,7.955", ",");
printf("index: %u, x: %2f, y: %2f, z: %2f\n", somedata.index, somedata.x, somedata.y, somedata.z);
}
Combine it with reading lines from a file:
just the main function here (insert the rest from the previous example)
int
main(int argc, char *argv[])
{
FILE *stream;
char *line = NULL;
size_t len = 0;
ssize_t nread;
struct my_data somedata;
if (argc != 2) {
fprintf(stderr, "Usage: %s <file>\n", argv[0]);
exit(EXIT_FAILURE);
}
stream = fopen(argv[1], "r");
if (stream == NULL) {
perror("fopen");
exit(EXIT_FAILURE);
}
while ((nread = getline(&line, &len, stream)) != -1) {
deserialize_data(&somedata, line, ",");
printf("index: %u, x: %2f, y: %2f, z: %2f\n", somedata.index, somedata.x, somedata.y, somedata.z);
}
free(line);
fclose(stream);
exit(EXIT_SUCCESS);
}
You've got an incorrect ; after your if (infile == NULL) test - try removing that...
[Edit: 2nd by 9 secs! :-)]
if (infile == NULL);
{ /* floating block */ }
The above if is a complete statement that does nothing regardless of the value of infile. The "floating" block is executed no matter what infile contains.
Remove the semicolon to 'attach' the "floating" block to the if
if (infile == NULL)
{ /* if block */ }
You already have solid responses in regard to syntax/structs/etc, but I will offer another method for reading the data in the file itself: I like Martin York's CSVIterator solution. This is my go-to approach for CSV processing because it requires less code to implement and has the added benefit of being easily modifiable (i.e., you can edit the CSVRow and CSVIterator defs depending on your needs).
Here's a mostly complete example using Martin's unedited code without structs or classes. In my opinion, and especially so as a beginner, it is easier to start developing your code with simpler techniques. As your code begins to take shape, it is much clearer why and where you need to implement more abstract/advanced devices.
Note this would technically need to be compiled with C++11 or greater because of my use of std::stod (and maybe some other stuff too I am forgetting), so take that into consideration:
//your includes
//...
#include"wherever_CSVIterator_is.h"
int main (int argc, char* argv[])
{
int index;
double tmp[3]; //since we know the shape of your input data
std::vector<double*> saved = std::vector<double*>();
std::vector<int> indices;
std::ifstream file(argv[1]);
for (CSVIterator loop(file); loop != CSVIterator(); ++loop) { //loop over rows
index = std::stoi((*loop)[0]); //convert the text of col 0 to int
indices.push_back(index); //store int index first, always col 0
for (int k=1; k < (*loop).size(); k++) { //loop across columns
tmp[k-1] = std::stod((*loop)[k]); //save double values now
}
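saved.push_back(new double[3]{tmp[0], tmp[1], tmp[2]}); //store a copy; pushing tmp itself would save the same address for every row (never freed here)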
}
/*now we have two vectors of the same 'size'
(let's pretend I wrote a check here to confirm this is true),
so we loop through them together and access with something like:*/
for (int j=0; j < (int)indices.size(); j++) {
double* saved_ptr = saved.at(j); //get pointer to first elem of each triplet
printf("\nindex: %d |", indices.at(j));
for (int k=0; k < 3; k++) {
printf(" %4.3f ", saved_ptr[k]);
}
printf("\n");
}
}
Less fuss to write, but more dangerous (if saved[] goes out of scope, we are in trouble). Also some unnecessary copying is present, but we benefit from using std::vector containers in lieu of knowing exactly how much memory we need to allocate.
Don't just give an example of the input file. Specify your input file format, at least on paper or in comments, e.g. in EBNF notation (your example is textual; it is not a binary file). Decide whether the numbers have to be on different lines (or whether you might accept a file with a single huge line made of a million bytes; read about the Comma Separated Values format). Then code a parser for that format. In your case, it is likely that some very simple recursive descent parsing is enough (and your particular parser won't even use recursion).
Read more about <stdio.h> and its routines. Take time to carefully read that documentation. Since your input is textual, not binary, you don't need fread. Notice that input routines can fail, and you should handle the failure case.
Of course, fopen can fail (e.g. because your working directory is not what you believe it is). You had better use perror or errno to find out more about the cause of the failure. So at least code:
infile = fopen("input.dat", "r");
if (infile == NULL) {
perror("fopen input.dat");
exit(EXIT_FAILURE);
}
Notice that semi-colons (or their absence) are very important in C (no semi-colon after condition of if). Read again the basic syntax of C language. Read about How to debug small programs. Enable all warnings and debug info when compiling (with GCC, compile with gcc -Wall -g at least). The compiler warnings are very useful!
Remember that fscanf does not treat the end of line (newline) differently from a space character. So if the input has to be on separate lines, you need to read every line separately.
You'll probably read every line using fgets (or getline) and parse every line individually. You could do that parsing with the help of sscanf (perhaps the %n could be useful) - and you want to use the return count of sscanf. You could also perhaps use strtok and/or strtod to do such a parsing.
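The fgets plus sscanf approach could look like the sketch below. The names struct record, parse_record and load_records are hypothetical (struct record mirrors the question's struct O_DATA), and the 256-byte line buffer is an assumption about the maximum line length.

```c
#include <stdio.h>

struct record {                 /* hypothetical; mirrors struct O_DATA */
    int   index;
    float x, y, z;
};

/* Parse one "index,x,y,z" line.
   Returns 0 on success, -1 if the line is malformed. */
int parse_record(const char *line, struct record *out)
{
    int consumed = 0;

    /* sscanf returns the number of converted fields; %n is not counted. */
    if (sscanf(line, "%d,%f,%f,%f %n",
               &out->index, &out->x, &out->y, &out->z, &consumed) != 4)
        return -1;
    /* %n records how far sscanf got; anything left over is trailing junk. */
    if (line[consumed] != '\0')
        return -1;
    return 0;
}

/* Read up to max records from fp, one per line, skipping malformed lines.
   Returns the number of records stored in out. */
long load_records(FILE *fp, struct record *out, long max)
{
    char line[256];             /* assumed upper bound on line length */
    long n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL)
        if (parse_record(line, &out[n]) == 0)
            n++;
    return n;
}
```

Checking the return count is what catches a line with a missing field, and the %n check catches extra text after the fourth number.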
Make sure that your parsing and your entire program is correct. With current computers (they are very fast, and most of the time your input file sits in the page cache) it is very likely that it would be fast enough. A million lines can be read pretty quickly (if on Linux, you could compare your parsing time with the time used by wc to count the lines of your file). On my computer (a powerful Linux desktop with AMD2970WX processor -it has lots of cores, but your program uses only one-, 64Gbytes of RAM, and SSD disk) a million lines can be read (by wc) in less than 30 milliseconds, so I am guessing your entire program should run in less than half a second, if given a million lines of input, and if the further processing is simple (in linear time).
You are likely to fill a large array of struct O_DATA and that array should probably be dynamically allocated, and reallocated when needed. Read more about C dynamic memory allocation. Read carefully about C memory management routines. They could fail, and you need to handle that failure (even if it is very unlikely to happen). You certainly don't want to re-allocate that array at every loop. You probably could allocate it in some geometrical progression (e.g. if the size of that array is size, you'll call realloc or a new malloc for some int newsize = 4*size/3 + 10; only when the old size is too small). Of course, your array will generally be a bit larger than what is really needed, but memory is quite cheap and you are allowed to "lose" some of it.
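A minimal sketch of that growth policy follows, assuming hypothetical names struct sample, struct sample_array and sample_array_push. It reuses the 4*size/3 + 10 formula from the paragraph above and reallocates only when the array is full.

```c
#include <stdlib.h>

struct sample {                 /* hypothetical; same fields as struct O_DATA */
    int   index;
    float x, y, z;
};

struct sample_array {           /* hypothetical growable array */
    struct sample *items;       /* NULL while empty */
    size_t         used;        /* number of elements stored */
    size_t         size;        /* number of elements allocated */
};

/* Append value, growing the array geometrically when it is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int sample_array_push(struct sample_array *a, struct sample value)
{
    if (a->used == a->size) {
        size_t newsize = 4 * a->size / 3 + 10;
        struct sample *tmp = realloc(a->items, newsize * sizeof *tmp);

        if (tmp == NULL)
            return -1;          /* a->items is still valid and still owned by a */
        a->items = tmp;
        a->size  = newsize;
    }
    a->items[a->used++] = value;
    return 0;
}
```

Start with struct sample_array a = { NULL, 0, 0 }; and call free(a.items) when done. Assigning the result of realloc to a temporary first avoids losing the old buffer if the allocation fails.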
But StackOverflow is not a "do my homework" site. I gave some advice above, but you should do your homework.
I need to get proper Polish characters "ąężźćśół". I have tried solutions such as setlocale, system chcp and wchar_t. Everything goes well as long as I don't use files or lists. wscanf, wprintf and wchar_t work perfectly.
But if I try to read something from a file, save it into a list (or even an array), and then print it to the screen, I can't get proper Polish characters. With lists, I get different results from time to time, for example z` or A2, like random characters from nowhere. I've been trying to get good results by using fscanf and fgets with their w (wide) variants, but it doesn't work. Did I do something wrong?
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
struct dyk{
wchar_t line[200];
struct dyk *next;
};
typedef struct dyk dyk;
void printdyk(char name[100]){
dyk *wyp;
wyp = malloc(sizeof(dyk));
wchar_t yt[100];
FILE *dyktando;
dyktando = fopen(name, "r+");
if(dyktando == NULL){
wprintf(L"Błąd otwarcia pliku!\n"); //Can't open file
}else{
fgets(&wyp->line, sizeof(dyk), dyktando); //reading from file and send to the list
wprintf(L"%s\n", wyp->line); //write text from the list on the screen
wchar_t yt[100];
wscanf(L"%s", &yt); //testing strings comparing, so I have to put some variables
int n=strcmp(yt, wyp->line); //str compare
printf("%d", n); //result, it gives me -1 every time
}
fclose(dyktando);
}
I tested the function with a txt file that contains only one character, "ż". It can't be read from the file properly. At the start of the main function I put these 2 lines:
system("chcp 852");
setlocale(LC_ALL, ".852");
I'm using Code::Blocks with the mingw32-gcc compiler and no flags.
You are not using wchar_t compatible functions everywhere in your code. In particular:
fgets(&wyp->line, sizeof(dyk), dyktando); //reading from file and send to the list
The wchar_t compatible version is fgetws. Also, wyp->line (without the & operator) is the correct argument.
int n=strcmp(yt, wyp->line); //str compare
wcscmp should be used instead.
Also note that sizeof on a wchar_t array is not correct when a function expects length in characters rather than bytes (like fgetws does).
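Putting those three points together, a corrected read-and-compare step might look like the sketch below. The names line_matches and LINE_CAPACITY are hypothetical, and the sketch assumes the locale has already been set with setlocale() before any non-ASCII text is read.

```c
#include <stdio.h>
#include <wchar.h>

#define LINE_CAPACITY 200       /* hypothetical; same as the question's line[200] */

/* Read one line from fp and compare it with expected.
   Returns 1 if equal, 0 if different, -1 at end of input or on error. */
int line_matches(FILE *fp, const wchar_t *expected)
{
    wchar_t line[LINE_CAPACITY];

    /* fgetws takes a count of wchar_t elements, not a size in bytes,
       and the array itself (no & operator) as the buffer. */
    if (fgetws(line, sizeof line / sizeof line[0], fp) == NULL)
        return -1;

    /* Drop the line ending so it does not affect the comparison. */
    line[wcscspn(line, L"\r\n")] = L'\0';

    /* wcscmp is the wide-character counterpart of strcmp. */
    return wcscmp(line, expected) == 0;
}
```

Note that fgetws keeps the newline in the buffer, which is one reason a comparison against text typed by the user can fail even when the characters look identical.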
A comment OP (Amatheon) made indicates that the true underlying problem is how to properly read files using wide-character functions.
To ensure maximum compatibility and portability, let's restrict to C99. Consider the following example program:
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#include <errno.h>
#ifdef USE_ERRNO_CONSTANTS
#define SET_ERRNO(value) (errno = (value))
#else
#define SET_ERRNO(value)
#endif
ssize_t get_wide_delimited(wchar_t **lineptr, size_t *sizeptr, wint_t delim, FILE *stream)
{
wchar_t *line = NULL;
size_t size = 0;
size_t used = 0;
wint_t wc;
if (!lineptr || !sizeptr || !stream) {
/* Invalid function parameters. NULL pointers are not allowed. */
SET_ERRNO(EINVAL);
return -1;
}
if (ferror(stream)) {
/* Stream is already in error state. */
SET_ERRNO(EIO);
return -1;
}
if (*sizeptr > 0) {
line = *lineptr;
size = *sizeptr;
} else {
*lineptr = NULL;
}
while (1) {
wc = fgetwc(stream);
if (wc == WEOF || wc == delim)
break;
if (used + 1 > size) {
/* Growth policy. We wish to allocate a chunk of memory at once,
so we don't need to do realloc() too often as it is a bit slow,
relatively speaking. On the other hand, we don't want to do
too large allocations, because that would waste memory.
Anything that makes 'size' larger than 'used' will work.
*/
if (used < 254)
size = 256;
else
if (used < 65536)
size = 2 * used;
else
size = (used | 65535) + 65521;
line = realloc(line, size * sizeof (wchar_t));
if (!line) {
/* Out of memory. */
SET_ERRNO(ENOMEM);
return -1;
}
*lineptr = line;
*sizeptr = size;
}
line[used++] = wc;
}
if (wc == WEOF) {
/* Verify that the WEOF did not indicate a read error. */
if (ferror(stream)) {
/* Read error. */
SET_ERRNO(EIO);
return -1;
}
}
/* Ensure there is enough room for the delimiter and end-of-string mark. */
if (used + 2 > size) {
/* We could reuse the reallocation policy here,
with the exception that the minimum is used + 2, not used + 1.
For simplicity, we use the minimum reallocation instead.
*/
size = used + 2;
line = realloc(line, size * sizeof (wchar_t));
if (!line) {
/* Out of memory. */
SET_ERRNO(ENOMEM);
return -1;
}
*lineptr = line;
*sizeptr = size;
}
/* Append the delimiter, unless end-of-stream mark. */
if (wc != WEOF)
line[used++] = wc;
/* Append the end-of-string nul wide char,
but do not include it in the returned length. */
line[used] = L'\0';
/* Success! */
return (ssize_t)used;
}
ssize_t get_wide_line(wchar_t **lineptr, size_t *sizeptr, FILE *stream)
{
return get_wide_delimited(lineptr, sizeptr, L'\n', stream);
}
int main(int argc, char *argv[])
{
wchar_t *line = NULL, *p;
size_t size = 0;
unsigned long linenum;
FILE *in;
int arg;
if (!setlocale(LC_ALL, ""))
fprintf(stderr, "Warning: Your C library does not support your current locale.\n");
if (fwide(stdout, 1) < 1)
fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s FILENAME [ FILENAME ... ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "This program will output the named files, using wide I/O.\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
for (arg = 1; arg < argc; arg++) {
in = fopen(argv[arg], "r");
if (!in) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
return EXIT_FAILURE;
}
if (fwide(in, 1) < 1) {
fprintf(stderr, "%s: Wide input is not supported from this file.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
linenum = 0;
while (get_wide_line(&line, &size, in) > 0) {
linenum++;
/* We use another pointer to the line for simplicity.
We must not modify 'line' (except via 'free(line); line=NULL; size=0;'
or a similar reallocation), because it points to dynamically allocated buffer. */
p = line;
/* Remove leading whitespace. */
while (iswspace(*p))
p++;
/* Trim off the line at the first occurrence of newline or carriage return.
(The line will also end at the first embedded nul wide character, L'\0',
if the file contains any.) */
p[wcscspn(p, L"\r\n")] = L'\0';
wprintf(L"%s: Line %lu: '%ls', %zu characters.\n", argv[arg], linenum, p, wcslen(p));
}
if (ferror(in)) {
fprintf(stderr, "%s: Read error.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
if (fclose(in)) {
fprintf(stderr, "%s: Delayed read error.\n", argv[arg]);
return EXIT_FAILURE;
}
wprintf(L"%s: Total %lu lines read.\n", argv[arg], linenum);
fflush(stdout);
}
free(line);
line = NULL;
size = 0;
return EXIT_SUCCESS;
}
Because the EINVAL, EIO, and ENOMEM errno constants are not defined in the C standards, the get_wide_line() and get_wide_delimited() functions only set errno if you define the USE_ERRNO_CONSTANTS preprocessor value.
The get_wide_line() and get_wide_delimited() functions are reimplementations of the getwline() and getwdelim() functions from ISO/IEC TR 24731-2:2010; the wide-character equivalents of the POSIX.1 getline() and getdelim() functions. Unlike fgets() or fgetws(), these use a dynamically allocated buffer to hold the line, so there are no fixed line length limits, other than available memory.
I've explicitly marked the code to be under Creative Commons Zero license: No Rights Reserved. It means you can use it in your own code, under whatever license you want.
Note: I would really love users to push their vendors and C standard committee members to get these included in the bog-standard C library part in the next version of the C standard. As you can see from above, they can be implemented in standard C already; it is just that the C library itself can do the same much more efficiently. The GNU C library is a perfect example of that (although even they are stalling with the implementation, because of the lack of standardization). Just think how many buffer overflow bugs would be avoided if people used getline()/getdelim()/getwline()/getwdelim() instead of fgets()/fgetws()! And avoid having to think about what the maximum reasonable line length in each instance would be, too. Win-win!
(In fact, we could switch the return type to size_t, and use 0 instead of -1 as the error indicator. That would limit the changes to the text of the C standard to the addition of the four functions. It saddens and irritates me to no end, to have such a significant group of trivial functions so callously and ignorantly overlooked, for no sensible reason. Please, bug your vendors and any C standards committee members you have access to about this, as incessantly and relentlessly as you can manage. Both you and they deserve it.)
The essential parts of the program are
if (!setlocale(LC_ALL, ""))
This tells the C library to use the locale the user has specified.
Please, do not hardcode the locale value into your programs. In most operating systems, all you need to do is to change the LANG or LC_ALL environment variable to the locale you want to use, before running your program.
You might think that "well, I can hardcode it this time, because this is the locale used for this data", but even that can be a mistake, because new locales can be created at any time. This is particularly annoying when the character set part is hardcoded. For example, the ISO 8859 single-byte character set used in Western Europe is ISO 8859-15, not ISO 8859-1, because ISO 8859-15 has the € character in it, whereas ISO 8859-1 does not. If you have hardcoded ISO 8859-1 in your program, then it cannot correctly handle the € character at all.
if (fwide(stream, 1) < 1) for both stdout and file handles
While the C library does internally do an equivalent of the fwide() call based on which type of I/O function you use on the file handle the very first time, the explicit check is much better.
In particular, if the C library cannot support wide I/O to the file or stream represented by the handle, fwide() will return negative. (Unless the second parameter is also zero, it should never return zero; because of the issues in standardization, I recommend a strict return value check approach in this case, to catch vendors who decide to try to make life as difficult as possible for programmers trying to write portable code while technically still fulfilling the standard text, like Microsoft is doing. They even stuffed the C standard committee with their own representatives, so they could tweak C11 away from C99 features they didn't want to support, plus get a stamp of approval of their own nonstandard extensions nobody used before, to help create barriers for developers writing portable C code. Yeah, I don't trust their behaviour at all.)
ssize_t len = get_wide_line(&line, &size, handle);
If you initialize wchar_t *line = NULL; and size_t size = 0; prior to first call to get_wide_line() or get_wide_delimited(), the function will dynamically resize the buffer as needed.
The return value is negative if and only if an error occurs. It is zero only when the end of input is reached before any wide characters are read.
When a line is read successfully, the return value reflects the number of wide characters in the buffer, including the delimiter if one was read (newline, L'\n', for get_wide_line()), and is always positive (greater than zero). The contents in the buffer will have a terminating end-of-wide-string character, L'\0', but it is not counted in the return value.
Note that when the delimiter is not L'\0', the buffer may contain embedded wide nul characters, L'\0'. In that case, len > wcslen(line).
The above example program skips any leading whitespace on each input line, and trims off the line at the first linefeed (L'\n'), carriage return (L'\r'), or nul (L'\0'). Because of this, the return value len is only checked for success (a positive return value greater than zero).
free(line); line = NULL; size = 0;
It is okay to discard the line at any point its contents are no longer needed. I recommend explicitly setting the line pointer to NULL, and the size to zero, to avoid use-after-free bugs. Furthermore, that allows any following get_wide_line() or get_wide_delimited() to correctly dynamically allocate a new buffer.
ferror(handle) after a wide input function fails
Just like with narrow streams and EOF, there are two cases why wide input functions might return WEOF (or return -1, depending on the function): because there is no more input, or because a read error occurred.
There is no reason whatsoever to write computer programs that ignore read or write errors, without reporting them to the user. Sure, they are rare, but not so rare that a programmer can sanely expect them to never occur. (In fact, with Flash memory on flimsy circuits stored in weak plastic housings and subjected to human-sized stresses (I've sat on mine time and time again), the errors aren't that rare.) It is just evil, rather similar to food preparers being too lazy to wash their hands, causing fecal bacteria outbreaks every now and then. Don't be a fecal bacteria spreader equivalent programmer.
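The check itself is short. As a minimal sketch, assuming a hypothetical helper named count_wide_chars, this reads a stream to its end with fgetwc() and only then asks ferror() which of the two cases ended the loop:

```c
#include <stdio.h>
#include <wchar.h>

/* Count the wide characters remaining in fp.
   Returns the count, or -1 if reading stopped because of an error. */
long count_wide_chars(FILE *fp)
{
    long n = 0;

    while (fgetwc(fp) != WEOF)
        n++;

    /* WEOF means either "no more input" or "something went wrong";
       ferror() tells the two apart. */
    if (ferror(fp))
        return -1;

    return n;
}
```

A caller can then report the -1 case to the user instead of silently treating a truncated read as a short file.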
Let's say you have a harebrained lecturer who does not allow you to use the above get_wide_line() or get_wide_delimited() functions.
Don't worry. We can implement the same program using fgetws(), if we restrict line to some fixed upper limit (of wide characters). Lines longer than that will read as two or more lines instead:
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#include <errno.h>
#ifndef MAX_WIDE_LINE_LEN
#define MAX_WIDE_LINE_LEN 1023
#endif
int main(int argc, char *argv[])
{
wchar_t line[MAX_WIDE_LINE_LEN + 1], *p;
unsigned long linenum;
FILE *in;
int arg;
if (!setlocale(LC_ALL, ""))
fprintf(stderr, "Warning: Your C library does not support your current locale.\n");
if (fwide(stdout, 1) < 1)
fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s FILENAME [ FILENAME ... ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "This program will output the named files, using wide I/O.\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
for (arg = 1; arg < argc; arg++) {
in = fopen(argv[arg], "r");
if (!in) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
return EXIT_FAILURE;
}
if (fwide(in, 1) < 1) {
fprintf(stderr, "%s: Wide input is not supported from this file.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
linenum = 0;
while (1) {
/* If line is an array, (sizeof line / sizeof line[0]) evaluates to
the number of elements in it. This does not work if line is a pointer
to dynamically allocated memory. In that case, you need to remember
number of wide characters you allocated for in a separate variable,
and use that variable here instead. */
p = fgetws(line, sizeof line / sizeof line[0], in);
if (!p)
break;
/* Have a new line. */
linenum++;
/* Remove leading whitespace. */
while (iswspace(*p))
p++;
/* Trim off the line at the first occurrence of newline or carriage return.
(The line will also end at the first embedded nul wide character, L'\0',
if the file contains any.) */
p[wcscspn(p, L"\r\n")] = L'\0';
wprintf(L"%s: Line %lu: '%ls', %zu characters.\n", argv[arg], linenum, p, wcslen(p));
}
if (ferror(in)) {
fprintf(stderr, "%s: Read error.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
if (fclose(in)) {
fprintf(stderr, "%s: Delayed read error.\n", argv[arg]);
return EXIT_FAILURE;
}
wprintf(L"%s: Total %lu lines read.\n", argv[arg], linenum);
fflush(stdout);
}
return EXIT_SUCCESS;
}
Aside from the function used to read each line, the difference is that instead of keeping the while loop condition as while ((p = fgetws(line, ...))) { ... }, I changed to the while (1) { p = fgetws(line, ...); if (!p) break; ... } form, which I believe is more readable.
I did deliberately show the longer, more complicated-looking one first, and this simpler one last, in the hopes that you would see that the more complicated-looking one actually has the simpler main() -- if we don't just count lines of code or something equally silly, but look at how many opportunities for mistakes there are.
As OP themselves wrote in a comment, the size of the buffer passed to fgets() or fgetws() is a real issue. There are rules of thumb, but they all suffer from being fragile against edits (especially the differences between arrays and pointers). With getline()/getdelim()/getwline()/getwdelim()/get_wide_line()/get_wide_delimited(), the rule of thumb is wchar_t *line = NULL; size_t size = 0; ssize_t len; and len = get_wide_line(&line, &size, handle);. No variations, and simple to remember and use. Plus it gets rid of any fixed limitations.
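To make that rule of thumb concrete, here is a minimal sketch of the same pattern using the narrow POSIX getline(), which has the same calling convention as the wide functions discussed above. The function name count_lines() is only a hypothetical example, and the sketch assumes a POSIX.1-2008 C library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count the lines in a stream using the getline() idiom:
   start with line = NULL and size = 0, and let getline() grow the buffer.
   count_lines() is a hypothetical helper, not part of any library. */
long count_lines(FILE *in)
{
    char   *line = NULL;   /* getline() allocates and resizes this */
    size_t  size = 0;      /* currently allocated size of line */
    ssize_t len;           /* length of the line just read, or -1 */
    long    count = 0;

    while ((len = getline(&line, &size, in)) != -1)
        count++;

    free(line);            /* one free(), however many lines were read */
    return ferror(in) ? -1 : count;
}
```

The declarations and the call never change, whatever the line length, which is what makes the pattern robust against later edits.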
I have a program that I'm doing for class where I need to take the content of one file, reverse it, and write that reversed content to another file. I have written a program that successfully does this (after much googling as I am new to the C programming language). The problem however is that my professor wants us to submit the program in a certain way with a couple supporting .h and .c files (which I understand is good practice). So I was hoping someone could help me understand exactly how I can take my already existing program and make it into one that is to his specifications, which are as follows:
he would like a file named "file_utils.h" that has function signatures and guards for the following two functions
int read_file( char* filename, char **buffer );
int write_file( char* filename, char *buffer, int size);
thus far I have created this file to try and accomplish this.
#ifndef UTILS_H
#define UTILS_H
int read_file(char* filename, char **buffer);
int write_file(char* filename, char *buffer, int size);
#endif
he would like a file named "file_utils.c" that has the implemented code for the previous two functions
he would like a file named "reverse.c" that accepts command arguments, includes a main function, and calls the functions from the previous two files.
now. I understand how this is supposed to work, but as I'm looking at the program I wrote my way I'm unsure how to actually accomplish the same result by adhering to the previously mentioned specifications.
Below is the program that successfully accomplishes the desired functionality
#include<stdlib.h>
#include<stdio.h>
#include<fcntl.h>
#include<string.h>
#include<sys/stat.h>
#include<unistd.h>
int main(int argc, char *argv[]) {
int file1, file2, char_count, x, k;
char buffer;
// if the number of parameters passed are not correct, exit
//
if (argc != 3) {
fprintf(stderr, "usage %s <file1> <file2>", argv[0]);
exit(EXIT_FAILURE);
}
// if the origin file cannot be opened for whatever reason, exit
// S_IRUSR specifies that this file is to be read by only the file owner
//
if ((file1 = open(argv[1], S_IRUSR)) < 0) {
fprintf(stderr, "The origin-file is inaccessible");
exit(EXIT_FAILURE);
}
// if the destination-file cannot be opened for whatever reason, exit
// S_IWUSR specifies that this file is to be written to by only the file owner
//
if ((file2 = creat(argv[2], S_IWUSR)) < 0) {
fprintf(stderr, "The destination-file is inaccessible");
exit(EXIT_FAILURE);
}
// SEEK_END is used to place the read/write pointer at the end of the file
//
char_count = lseek(file1, (off_t) 0, SEEK_END);
printf("origin-file size is %d\n", char_count - 1);
for (k = char_count - 1; k >= 0; k--) {
lseek(file1, (off_t) k, SEEK_SET);
x = read(file1, &buffer, 1);
if (x != 1) {
fprintf(stderr, "can't read 1 byte");
exit(-1);
}
x = write(file2, &buffer, 1);
if (x != 1) {
fprintf(stderr, "can't write 1 byte");
exit(-1);
}
}
write(STDOUT_FILENO, "Reversal & Transfer Complete\n", 5);
close(file1);
close(file2);
return 0;
}
any insight as to how I can accomplish this "re-factoring" of sorts would be much appreciated, thanks!
The assignment demands a different architecture than your program. Unfortunately, this will not be a refactoring but a rewrite.
You have most of the pieces of read_file and write_file already: opening the file, determining its length, error handling. Those can be copy-pasted into the new functions.
But read_file should call malloc and read the file into memory, which is different.
You should create a new function in reverse.c, called by main, to reverse the bytes in a memory buffer.
After that function runs, write_file should attempt to open the file, and only do its error checking at that point.
Your simple program is superior because it validates the output file before any I/O, and it requires less memory. Its behavior satisfies the assignment, but its form does not.
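As a rough sketch of the two new pieces, assuming a POSIX system: read_file() keeps the signature from the assignment, while reverse_buffer() is a name I made up for the helper that would live in reverse.c. Because the assignment's return type is int, this version only suits files smaller than INT_MAX bytes.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the whole file into a malloc()ed buffer stored in *buffer.
   Returns the number of bytes read, or -1 on error.
   The caller is responsible for calling free(*buffer). */
int read_file(char *filename, char **buffer)
{
    struct stat st;
    off_t done = 0;
    int fd = open(filename, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    /* Allocate at least one byte so an empty file still gets a valid buffer. */
    *buffer = malloc(st.st_size > 0 ? (size_t)st.st_size : 1);
    if (*buffer == NULL) {
        close(fd);
        return -1;
    }
    /* read() may return fewer bytes than requested, so loop until done. */
    while (done < st.st_size) {
        ssize_t n = read(fd, *buffer + done, (size_t)(st.st_size - done));
        if (n <= 0)
            break;
        done += n;
    }
    close(fd);
    if (done != st.st_size) {
        free(*buffer);
        *buffer = NULL;
        return -1;
    }
    return (int)done;
}

/* Hypothetical helper for reverse.c: reverse size bytes of buffer in place. */
void reverse_buffer(char *buffer, size_t size)
{
    if (size < 2)
        return;
    for (size_t i = 0, j = size - 1; i < j; i++, j--) {
        char tmp = buffer[i];
        buffer[i] = buffer[j];
        buffer[j] = tmp;
    }
}
```

main() would then call read_file(), reverse_buffer(), and write_file() in that order, checking each return value.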
Dear respected programmers. Please could you help me (again) on how to put the following code into functions for my program.
I have read on-line and understand how functions work but when I do it myself it all goes pear shaped/wrong(I am such a noob).
Please could you help with how to for example to write the code below into functions.(like opening the input file).
My initial code looks like:
main (int argc, char **argv)
{
int bytes_read, bytes_written;
struct stat inode;
int input_fd, output_fd;
char buffer[64];
int eof = 0;
int i;
/* Check the command line arguments */
if (argc != 3)
{
printf("syntax is: %s <fromfile> <tofile>\n", argv[0]);
exit (1);
}
/* Check the input file exists and is a file */
if ((stat(argv[1], &inode) == -1) || (!S_ISREG(inode.st_mode)))
{
printf("%s is not a file\n", argv[1]);
exit(2);
}
/* Check that the output file doesnt exist */
if (stat(argv[2], &inode) != -1)
{
printf("Warning: The file %s already exists. Not going to overwrite\n", argv[2]);
exit(2);
}
/* Open the input file for reading */
input_fd = open(argv[1], O_RDONLY, 0);
if (input_fd == -1)
{
printf("%s cannot be opened\n", argv[1]);
exit(3);
}
output_fd = open(argv[2], O_CREAT | O_WRONLY | O_EXCL , S_IRUSR|S_IWUSR);
if (output_fd == -1)
{
printf("%s cannot be opened\n", argv[2]);
exit(3);
}
/* Begin processing the input file here */
while (!eof)
{
bytes_read = read(input_fd, buffer, sizeof(buffer));
if (bytes_read == -1)
{
printf("%s cannot be read\n", argv[1]);
exit(4);
}
if (bytes_read > 0)
{
bytes_written = write(output_fd, buffer, bytes_read);
if (bytes_written == -1)
{
printf("There was an error writing to the file %s\n",argv[2]);
exit(4);
}
if (bytes_written != bytes_read)
{
printf("Devistating failure! Bytes have either magically appeared and been written or dissapeard and been skipped. Data is inconsistant!\n");
exit(101);
}
}
else
{
eof = 1;
}
}
close(input_fd);
close(output_fd);
}
My attempt at opening an output file:
void outputFile(int argc, char **argv)
{
/* Check that the output file doesnt exist */
if (stat(argv[argc-1], &inode) != -1)
{
printf("Warning: The file %s already exists. Not going to overwrite\n", argv[argc-1]);
return -1;
}
/*Opening ouput files*/
file_desc_out = open(argv[i],O_CREAT | O_WRONLY | O_EXCL , S_IRUSR|S_IWUSR);
if(file_desc_out == -1)
{
printf("Error: %s cannot be opened. \n",argv[i]); //insted of argv[2] have pointer i.
return -1;
}
}
Any help on how I would now reference to this in my program is appreciated thank you.
I tried:
ouputfile (but I cant figure out what goes here and why either).
Maybe the most useful function for you is:
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h> /* for exit() */
extern void error_exit(int rc, const char *format, ...); /* In a header */
void error_exit(int rc, const char *format, ...)
{
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
va_end(args);
exit(rc);
}
You can then write:
if (stat(argv[2], &inode) != -1)
error_exit(2, "Warning: The file %s exists. Not going to overwrite\n",
argv[2]);
Which has the merit of brevity.
You write functions to do sub-tasks. Deciding where to break up your code into functions is tricky - as much art as science. Your code is not so big that it is completely awful to leave it as it is - one function (though the error handling can be simplified as above).
If you want to practice writing functions, consider splitting it up:
open_input_file()
open_output_file()
checked_read()
checked_write()
checked_close()
These functions would allow your main code to be written as:
int main(int argc, char **argv)
{
int bytes_read;
int input_fd, output_fd;
char buffer[64];
if (argc != 3)
error_exit(1, "Usage: %s <fromfile> <tofile>\n", argv[0]);
input_fd = open_input_file(argv[1]);
output_fd = open_output_file(argv[2]);
while ((bytes_read = checked_read(input_fd, buffer, sizeof(buffer))) > 0)
checked_write(output_fd, buffer, bytes_read);
checked_close(input_fd);
checked_close(output_fd);
return 0;
}
Because you've tucked the error handling out of sight, it is now much easier to see the structure of the program. If you don't have enough functions yet, you can bury the loop into a function void file_copy(int fd_in, int fd_out). That removes more clutter from main() and leaves you with very simple code.
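One possible shape for those helpers, as a sketch only: the names match the list above, but the bodies, the exit status 4, and the 4096-byte buffer are my own assumptions, and perror() stands in for a proper error-reporting function.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* read() wrapper: terminates the program on error,
   otherwise returns the number of bytes read (0 at end of file). */
static ssize_t checked_read(int fd, void *buf, size_t size)
{
    ssize_t n = read(fd, buf, size);
    if (n == -1) {
        perror("read");
        exit(4);
    }
    return n;
}

/* write() wrapper: keeps writing until every byte is out,
   and terminates the program on error. */
static void checked_write(int fd, const char *buf, size_t size)
{
    while (size > 0) {
        ssize_t n = write(fd, buf, size);
        if (n == -1) {
            perror("write");
            exit(4);
        }
        buf  += n;
        size -= (size_t)n;
    }
}

/* Copy everything from fd_in to fd_out. */
void file_copy(int fd_in, int fd_out)
{
    char buffer[4096];
    ssize_t n;

    while ((n = checked_read(fd_in, buffer, sizeof buffer)) > 0)
        checked_write(fd_out, buffer, (size_t)n);
}
```

Looping inside checked_write() handles short writes, so the caller never has to compare the written and read byte counts.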
Given an initial attempt at a function to open the output file:
void outputFile(int argc, char **argv)
{
/* Check that the output file doesnt exist */
if (stat(argv[argc-1], &inode) != -1)
{
printf("Warning: The file %s already exists. Not going to overwrite\n", argv[argc-1]);
return -1;
}
/*Opening ouput files*/
file_desc_out = open(argv[i],O_CREAT | O_WRONLY | O_EXCL , S_IRUSR|S_IWUSR);
if(file_desc_out == -1)
{
printf("Error: %s cannot be opened. \n",argv[i]); //insted of argv[2] have pointer i.
return -1;
}
}
Critique:
You have to define the variables used by the function in the function (you will want to avoid global variables as much as possible, and there is no call for any global variable in this code).
You have to define the return type. You are opening a file - how is the file descriptor going to be returned to the calling code? So, the return type should be int.
You pass only the information needed to the function - a simple form of 'information hiding'. In this case, you only need to pass the name of the file; the information about file modes and the like is implicit in the name of the function.
In general, you have to decide how to handle errors. Unless you have directives otherwise from your homework setter, it is reasonable to exit on error with an appropriate message. If you return an error indicator, then the calling code has to test for it, and decide what to do about the error.
Errors and warnings should be written to stderr, not to stdout. The main program output (if any) goes to stdout.
Your code is confused about whether argv[i] or argv[argc-1] is the name of the output file. In a sense, this criticism is irrelevant once you pass just the filename to the function. However, consistency is a major virtue in programming, and using the same expression to identify the same thing is usually a good idea.
Consistency of layout is also important. Don't use both if( and if ( in your programs; use the canonical if ( notation as used by the language's founding fathers, K&R.
Similarly, be consistent with no spaces before commas, a space after a comma, and be consistent with spaces around operators such as '|'. Consistency makes your code easier to read, and you'll be reading your code a lot more often than you write it (at least, once you've finished your course, you will do more reading than writing).
You cannot have return -1; inside a function that returns no value.
When you are splitting up code into functions, you need to copy/move the paragraphs of code that you are extracting, leaving behind a call to the new function. You also need to copy the relevant local variables from the calling function into the new function - possibly eliminating the variables in the calling function if they are no longer used there. You do compile with most warnings enabled, don't you? You want to know about unused variables etc.
When you create the new function, one of the most important parts is working out what the correct signature of the function is. Does it return a value? If so, which value, and what is its type? If not, how does it handle errors? In this case, you probably want the function to bail out (terminate the program) if it runs into an error. In bigger systems, you might need to consistently return an error indicator (0 implies success, negative implies failure, different negatives indicating different errors). When you work with functions that return an error indicator, it is almost always crucial that you check the error indicators in the calling code. For big programs, big swathes of the code can be all about error handling. Similarly, you need to work out which values are passed into the function.
I'm omitting advice about things such as 'be const correct' as overkill for your stage in learning to program in C.
you seem to actually understand how to make a function. making a function really isn't that hard. first, you need to understand that a function has a type. in other words, just as argc has type int and argv has type char **, your function (currently) has return type void. void means it has no value, which means when you return, you return nothing.
however, if you look at your code, you do return -1. it looks like you want to return an integer. so you should change the top from void outputFile(...) to int outputFile(...).
next, your function must return a value on every path. most compilers only warn when a path falls off the end of a non-void function, but the caller then gets an undefined value. at the very bottom, if no errors happen, execution reaches the end of the function, and since you're no longer using "void" as the return type, you must return something there. so i suggest putting a return 1; to show that everything went great
There are several things.
The function return type isn't what you want. You either want to return a file descriptor or an error code. IIRC, the file descriptor is a nonnegative int, so you can use a return type of int rather than void. You also need to return something on either path, either -1 or file_desc_out.
You probably don't want to pass in the command-line arguments as a whole, but rather something like argv[argc - 1]. In that case, the argument should be something like char * filename rather than the argc/argv it has now. (Note that the argv[i] you've got in the last printf is almost certainly wrong.)
This means it would be called something like
int file_desc_out = outputFile(argv[argc - 1]);
You need to have all variables declared in the function, specifically inode and file_desc_out.
Finally, put an extra level of indentation on the code inside the { and } of the function itself.
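Putting those points together, a corrected version might look something like this sketch. The name open_output_file() and the exact messages are my own choices; it relies on O_EXCL to detect an existing file instead of a separate stat() call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Create filename for writing; refuse to overwrite an existing file.
   Returns the new file descriptor, or -1 after printing a message to stderr. */
int open_output_file(const char *filename)
{
    /* O_EXCL makes open() fail with EEXIST if the file already exists,
       so the existence check and the creation happen in one step. */
    int fd = open(filename, O_CREAT | O_WRONLY | O_EXCL, S_IRUSR | S_IWUSR);

    if (fd == -1) {
        if (errno == EEXIST)
            fprintf(stderr, "Warning: The file %s already exists. Not going to overwrite\n",
                    filename);
        else
            fprintf(stderr, "Error: %s cannot be opened: %s\n",
                    filename, strerror(errno));
        return -1;
    }
    return fd;
}
```

The function needs no variables from the caller, takes only the filename, and returns a value on both paths.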
This looks like a simple question, but I didn't find anything similar here.
Since there is no file copy function in C, we have to implement file copying ourselves, but I don't like reinventing the wheel even for trivial stuff like that, so I'd like to ask the cloud:
What code would you recommend for file copying using fopen()/fread()/fwrite()?
What code would you recommend for file copying using open()/read()/write()?
This code should be portable (windows/mac/linux/bsd/qnx/younameit), stable, time tested, fast, memory efficient and etc. Getting into specific system's internals to squeeze some more performance is welcomed (like getting filesystem cluster size).
This seems like a trivial question but, for example, source code for CP command isn't 10 lines of C code.
This is the function I use when I need to copy from one file to another - with test harness:
/*
#(#)File: $RCSfile: fcopy.c,v $
#(#)Version: $Revision: 1.11 $
#(#)Last changed: $Date: 2008/02/11 07:28:06 $
#(#)Purpose: Copy the rest of file1 to file2
#(#)Author: J Leffler
#(#)Modified: 1991,1997,2000,2003,2005,2008
*/
/*TABSTOP=4*/
#include "jlss.h"
#include "stderr.h"
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_fcopy_c[] = "#(#)$Id: fcopy.c,v 1.11 2008/02/11 07:28:06 jleffler Exp $";
#endif /* lint */
void fcopy(FILE *f1, FILE *f2)
{
char buffer[BUFSIZ];
size_t n;
while ((n = fread(buffer, sizeof(char), sizeof(buffer), f1)) > 0)
{
if (fwrite(buffer, sizeof(char), n, f2) != n)
err_syserr("write failed\n");
}
}
#ifdef TEST
int main(int argc, char **argv)
{
FILE *fp1;
FILE *fp2;
err_setarg0(argv[0]);
if (argc != 3)
err_usage("from to");
if ((fp1 = fopen(argv[1], "rb")) == 0)
err_syserr("cannot open file %s for reading\n", argv[1]);
if ((fp2 = fopen(argv[2], "wb")) == 0)
err_syserr("cannot open file %s for writing\n", argv[2]);
fcopy(fp1, fp2);
return(0);
}
#endif /* TEST */
Clearly, this version uses file pointers from standard I/O and not file descriptors, but it is reasonably efficient and about as portable as it can be.
Well, except the error function - that's peculiar to me. As long as you handle errors cleanly, you should be OK. The "jlss.h" header declares fcopy(); the "stderr.h" header declares err_syserr() amongst many other similar error reporting functions. A simple version of the function follows - the real one adds the program name and does some other stuff.
#include "stderr.h"
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
void err_syserr(const char *fmt, ...)
{
int errnum = errno;
va_list args;
va_start(args, fmt);
vfprintf(stderr, fmt, args);
va_end(args);
if (errnum != 0)
fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
exit(1);
}
The code above may be treated as having a modern BSD license or GPL v3 at your choice.
As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).
Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.
There's a file-specific optimisation that GNU cp does, which I haven't bothered with here, that for long blocks of 0 bytes instead of writing you just extend the output file by seeking off the end.
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>
void block(int fd, int event) {
struct pollfd topoll;
topoll.fd = fd;
topoll.events = event;
poll(&topoll, 1, -1);
// no need to check errors - if the stream is bust then the
// next read/write will tell us
}
int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
for(;;) {
char *pos; // char * rather than void *, so that pos += n below is standard C
// read data to buffer
ssize_t bytestowrite = read(fdin, buf, bufsize);
if (bytestowrite == 0) break; // end of input
if (bytestowrite == -1) {
if (errno == EINTR) continue; // signal handled
if (errno == EAGAIN) {
block(fdin, POLLIN);
continue;
}
return -1; // error
}
// write data from buffer
pos = buf;
while (bytestowrite > 0) {
ssize_t bytes_written = write(fdout, pos, bytestowrite);
if (bytes_written == -1) {
if (errno == EINTR) continue; // signal handled
if (errno == EAGAIN) {
block(fdout, POLLOUT);
continue;
}
return -1; // error
}
bytestowrite -= bytes_written;
pos += bytes_written;
}
}
return 0; // success
}
// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
#define FILECOPY_BUFFER_SIZE (64*1024)
#endif
int copy_data(int fdin, int fdout) {
// optional exercise for reader: take the file size as a parameter,
// and don't use a buffer any bigger than that. This prevents
// memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
// is small.
for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
void *buffer = malloc(bufsize);
if (buffer != NULL) {
int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
free(buffer);
return result;
}
}
// could use a stack buffer here instead of failing, if desired.
// 128 bytes ought to fit on any stack worth having, but again
// this could be made configurable.
return -1; // errno is ENOMEM
}
To open the input file:
int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;
Opening the output file is tricksy. As a basis, you want:
int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0x1ff);
if (fdout == -1) {
close(fdin);
return -1;
}
But there are confounding factors:
you need to special-case when the files are the same, and I can't remember how to do that portably.
if the output filename is a directory, you might want to copy the file into the directory.
if the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
you might want the permissions of the output file to reflect those of the input file.
you might want other platform-specific meta-data to be copied.
you may or may not wish to unlink the output file on error.
Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".
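For the first item in that list, one approach on POSIX systems (so not portable to everything) is to compare the device and inode numbers reported by stat(). This is only a sketch: same_file() is a hypothetical helper name, and there is an unavoidable race between this check and the later open().

```c
#include <sys/stat.h>

/* Return 1 if both paths name the same file, 0 otherwise.
   A path that cannot be examined (for example, an output file that
   does not exist yet) is treated as "not the same". POSIX only. */
int same_file(const char *path1, const char *path2)
{
    struct stat a, b;

    if (stat(path1, &a) == -1 || stat(path2, &b) == -1)
        return 0;
    /* Two names refer to the same file exactly when both the device
       and the inode number match. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

The check has to run before the output file is opened with O_TRUNC; otherwise the input has already been emptied by the time the match is noticed.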
Btw, getting the filesystem's cluster size is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.
The size of each read needs to be a multiple of 512 (the sector size); 4096 is a good choice.
Here is a very easy and clear example: Copy a file. Since it is written in ANSI-C without any particular function calls I think this one would be pretty much portable.
Depending on what you mean by copying a file, it is certainly far from trivial. If you mean copying the content only, then there is almost nothing to do. But generally, you need to copy the metadata of the file, and that's surely platform dependent. I don't know of any C library which does what you want in a portable manner. Just handling the filename by itself is no trivial matter if you care about portability.
In C++, there is the filesystem library in Boost.
One thing I found when implementing my own file copy, and it seems obvious but it's not: I/O's are slow. You can pretty much time your copy's speed by how many of them you do. So clearly you need to do as few of them as possible.
The best results I found were when I got myself a ginormous buffer, read the entire source file into it in one I/O, then wrote the entire buffer back out in one I/O. If I had to do it in even 10 batches, it got way slow. Trying to read and write each byte separately, as a naive coder might try first, was just painful.
The accepted answer, written by Steve Jessop, does not answer the first part of the question. Jonathan Leffler's answer does, but gets it wrong: the code should be written as
while ((n = fread(buffer, 1, sizeof(buffer), f1)) > 0)
if (fwrite(buffer, n, 1, f2) != 1)
/* we got write error here */
/* test ferror(f1) for a read errors */
Explanation:
sizeof(char) = 1 by definition, always: it does not matter how many bits a char has, 8 (in most cases), 9, 11 or 32 (on some DSPs, for example), the size of char is one. Note that this is not an error here, just redundant code.
The fwrite function writes up to nmemb (third argument) elements of the specified size (second argument); it is not required to write exactly nmemb elements. To fix this you must either write the rest of the data that was read, or write a single element of size n and let fwrite do all the work. (Whether fwrite must write all the data is open to question, but in my version a short write is impossible unless an error occurs.)
You should test for read errors too: just test ferror(f1) after the loop ends.
Note that you probably need to disable buffering on both the input and output files to prevent triple buffering: first on read into the f1 buffer, second in our code, third on write into the f2 buffer:
setvbuf(f1, NULL, _IONBF, 0);
setvbuf(f2, NULL, _IONBF, 0);
(Internal buffers should, probably, be of size BUFSIZ.)
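Pulling the error checks together, a complete copy loop could look like the sketch below; copy_stream() is a name chosen here for illustration. The C standard says fwrite() returns fewer than nmemb elements only if a write error occurs, so comparing the result against n (as here) or writing one element of size n and comparing against 1 both catch the failure.

```c
#include <stdio.h>

/* Copy the rest of in to out.
   Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    char buffer[BUFSIZ];
    size_t n;

    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n)
            return -1;          /* a short count from fwrite() means a write error */
    }
    if (ferror(in))
        return -1;              /* fread() stopped on an error, not on end of file */

    /* Flush so that a delayed write error is reported here, not at fclose(). */
    return fflush(out) == EOF ? -1 : 0;
}
```

The caller should still check the return value of fclose() on the output stream.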