Effect of Buffer Size in File I/O in Unix - c

I am trying to understand the inner workings of Unix-based OSs. I was reading about buffered I/O and how the buffer size affects the number of system calls made, which in turn affects the total time taken by, say, a copy program. To begin with, here is my program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/time.h>

long currentTimeMillis();

int main(int argc, char *argv[]) {
    int bufsize = atoi(argv[3]);
    printf("copying with buffer size %d\n", bufsize);
    char buf[bufsize];

    //open the source file
    int fd_from = open(argv[1], O_RDWR);
    if(-1 == fd_from) {
        printf("Error opening source file\n");
        return -1;
    }

    //file to be copied to
    int fd_to = open(argv[2], O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if(-1 == fd_to) {
        printf("Error opening destination file\n");
        return -1;
    }

    //copy
    long startTime = currentTimeMillis();
    long totalTimeForRead = 0;
    long totalTimeForWrite = 0;
    while(1) {
        long readStartTime = currentTimeMillis();
        int bytes_read = read(fd_from, buf, bufsize);
        long readEndTime = currentTimeMillis();
        if(0 == bytes_read) {
            break;
        }
        if(-1 == bytes_read) {
            printf("Error occurred while reading source file\n");
            return -1;
        }
        totalTimeForRead += readEndTime - readStartTime;

        long writeStartTime = currentTimeMillis();
        //write only the bytes that were actually read
        int bytes_written = write(fd_to, buf, bytes_read);
        long writeEndTime = currentTimeMillis();
        totalTimeForWrite += (writeEndTime - writeStartTime);
        if(-1 == bytes_written) {
            printf("Some error occurred while writing file\n");
            return -1;
        }
    }
    long endTime = currentTimeMillis();
    printf("Total time to copy%ld\n", endTime - startTime);
    printf("Total time to write%ld\n", totalTimeForWrite);
    printf("Total time to read%ld\n", totalTimeForRead);
}

long currentTimeMillis() {
    struct timeval time;
    gettimeofday(&time, NULL);
    return time.tv_sec * 1000 + time.tv_usec / 1000;
}
I am using a 16 GB MacBook Pro with a 2.9 GHz Intel i7 (in case this information is useful). The source file is 2.8 GB. I was a bit surprised to see that the total time taken by read() is much smaller than that of write(). Here are my findings with a buffer size of 16K:
./a.out largefile dest 16382
copying with buffer size 16382
Total time to copy5987
Total time to write5330
Total time to read638
From what I have read, write() returns immediately after transferring the data from the user buffer to the kernel buffer, so the time it takes is that copy plus the cost of initiating the system call. read() likewise copies from the kernel buffer to the user buffer, so the total time taken should be roughly the same (in both cases, there is no disk I/O).
Why, then, is there such a drastic difference in the results? Or am I benchmarking it wrong? Final question: is it alright to do such benchmarking on an SSD, which has limited write cycles?
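One way to sanity-check the benchmark is to also time how long the kernel takes to actually flush the buffered data, since write() only copies into the page cache. A minimal sketch that could be appended after the copy loop in the program above (it reuses fd_to and currentTimeMillis() from there; F_FULLFSYNC is macOS-specific and asks the drive to flush its own cache as well):
// Time the flush of the kernel's write-behind buffers so that delayed
// writes are charged to the write side of the benchmark.
long flushStartTime = currentTimeMillis();
if (fsync(fd_to) == -1) {          // push dirty pages for fd_to to the device
    printf("fsync failed\n");
    return -1;
}
#ifdef F_FULLFSYNC
fcntl(fd_to, F_FULLFSYNC);         // macOS: also flush the drive's cache
#endif
long flushEndTime = currentTimeMillis();
printf("Total time to flush %ld\n", flushEndTime - flushStartTime);
If most of the copy's cost shows up in the flush rather than in the write() calls, the original timings mostly measured memory copies into the page cache.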

Related

Hard Fault when optimization level 1 is enabled (FreeRTOS)

I have the following function used to test the file system (yaffs2). It is called by a FreeRTOS task with a stack size of 65535. In debug mode everything appears to work fine, but when I enable -O1 I get a hard fault immediately after printing "Testing filesystem...".
I tried changing the test size to a much lower value, thinking there might be a memory issue (note that the FreeRTOS heap is 7 MB), but nothing changed. Any ideas what might be causing this?
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "FreeRTOS.h"
#include "task.h"
#include "yaffs_trace.h"
#include "yaffsfs.h"

int fs_test(void)
{
    int fd;
    int test_size = 1000000;
    const char path[] = "nand/testfile.dat";
    int pattern = 0x55;
    uint8_t *test;
    uint8_t *test_read;
    uint32_t start_time, stop_time;

    DEBUG_PRINT("Testing filesystem...\n");

    fd = yaffs_open(path, O_CREAT | O_EXCL | O_WRONLY, S_IREAD | S_IWRITE);
    if(fd < 0) {
        DEBUG_PRINT("cannot create test file\n");
        return -1;
    }

    test = pvPortMalloc(test_size);
    test_read = pvPortMalloc(test_size);
    if(test == NULL || test_read == NULL) {
        DEBUG_PRINT("not enough memory\n");
        vPortFree(test);
        vPortFree(test_read);
        yaffs_close(fd);
        yaffs_rm(path);
        return -2;
    }

    memset(test, pattern, test_size);

    start_time = xTaskGetTickCount();
    yaffs_write(fd, test, test_size);
    yaffs_close(fd);
    stop_time = xTaskGetTickCount();
    DEBUG_PRINTF("Average write speed = %d kB/s\n", test_size / (stop_time - start_time));

    fd = yaffs_open(path, O_RDWR, S_IREAD | S_IWRITE);
    if(fd < 0) {
        DEBUG_PRINT("cannot open test file\n");
        vPortFree(test);
        vPortFree(test_read);
        yaffs_close(fd);
        yaffs_rm(path);
        return -1;
    }

    start_time = xTaskGetTickCount();
    yaffs_read(fd, test_read, test_size);
    yaffs_close(fd);
    stop_time = xTaskGetTickCount();
    DEBUG_PRINTF("Average read speed = %d kB/s\n", test_size / (stop_time - start_time));

    if (!memcmp(test, test_read, test_size))
        DEBUG_PRINT("file integrity test successful\n");
    else
        DEBUG_PRINT("file integrity test failed!\n");

    yaffs_rm(path);
    vPortFree(test);
    vPortFree(test_read);
    return 0;
}
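One thing that might be worth ruling out (a hedged suggestion, not a diagnosis): a task's stack usage can change between -O0 and -O1, and FreeRTOS can report how close a task has come to overflowing its stack via uxTaskGetStackHighWaterMark(). A minimal sketch, assuming INCLUDE_uxTaskGetStackHighWaterMark is set to 1 in FreeRTOSConfig.h (the report_stack_headroom name and the use of printf instead of your DEBUG_PRINT macro are placeholders):
#include <stdio.h>
#include "FreeRTOS.h"
#include "task.h"

// Call from inside the task that runs fs_test(), e.g. before and after the
// test, to see how much stack headroom the task has left at its worst point.
static void report_stack_headroom(const char *where)
{
    // NULL means "the calling task"; the value is in words, not bytes.
    UBaseType_t freeWords = uxTaskGetStackHighWaterMark(NULL);
    printf("%s: minimum free stack so far = %lu words\n",
           where, (unsigned long)freeWords);
}
If the reported headroom collapses once optimization is enabled, the hard fault is more likely a stack overflow than a heap problem.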

DIRECT I/O performance

I am trying to measure DIRECT I/O performance. My understanding is that DIRECT I/O bypasses the page cache and goes to the underlying device to fetch the data. Therefore, if we read the same file over and over again, DIRECT I/O should be slower than cached reads, since with regular I/O the file would already be in the page cache.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

char *DIRECT_FILE_PATH = "direct.dat";
char *NON_DIRECT_FILE_PATH = "no_direct.dat";
int FILE_SIZE_MB = 100;
int NUM_ITER = 100;

void lay_file(int direct_flag) {
    int flag = O_RDWR | O_CREAT | O_APPEND | O_DIRECT;
    mode_t mode = 0644;
    int fd;
    if (direct_flag) {
        fd = open(DIRECT_FILE_PATH, flag, mode);
    } else {
        fd = open(NON_DIRECT_FILE_PATH, flag, mode);
    }
    if (fd == -1) {
        printf("Failed to open file. Error: \t%s\n", strerror(errno));
    }
    ftruncate(fd, FILE_SIZE_MB*1024*1024);
    close(fd);
}

void read_file(int direct_flag) {
    mode_t mode = 0644;
    void *buf = malloc(FILE_SIZE_MB*1024*1024);
    int fd, flag;
    if (direct_flag) {
        flag = O_RDONLY | O_DIRECT;
        fd = open(DIRECT_FILE_PATH, flag, mode);
    } else {
        flag = O_RDONLY;
        fd = open(NON_DIRECT_FILE_PATH, flag, mode);
    }
    for (int i = 0; i < NUM_ITER; i++) {
        read(fd, buf, FILE_SIZE_MB*1024*1024);
        lseek(fd, 0, SEEK_SET);
    }
    close(fd);
    free(buf);
}

int main() {
    lay_file(0);
    lay_file(1);

    clock_t t;
    t = clock();
    read_file(1);
    t = clock() - t;
    double time_taken = ((double)t)/CLOCKS_PER_SEC; // in seconds
    printf("DIRECT I/O read took %f seconds to execute \n", time_taken);

    t = clock();
    read_file(0);
    t = clock() - t;
    time_taken = ((double)t)/CLOCKS_PER_SEC; // in seconds
    printf("NON DIRECT I/O read took %f seconds to execute \n", time_taken);
    return 0;
}
Using the above code to measure DIRECT I/O performance tells me that DIRECT I/O is faster than regular I/O involving the page cache. This is the output:
DIRECT I/O read took 0.824861 seconds to execute
NON DIRECT I/O read took 1.643310 seconds to execute
Please let me know if I am missing something. I have an NVMe SSD as the storage device. I wonder if it is simply too fast to show the performance difference between using and not using the page cache.
UPDATE:
Changing the buffer size to 4 KB shows that DIRECT I/O is slower. The large buffer size was probably producing large sequential reads from the underlying device, which helps, but I would still like some insights.
DIRECT I/O read took 0.000209 seconds to execute
NON DIRECT I/O read took 0.000151 seconds to execute
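One detail worth checking (an assumption on my part, not something visible in the numbers above): on Linux, O_DIRECT generally requires the user buffer address, the file offset, and the transfer length to be aligned to the device's logical block size, and misaligned read() calls can fail with EINVAL, in which case the "direct" loop measures almost nothing. A minimal sketch of a single aligned read, assuming a 4096-byte logical block size and reusing the direct.dat name from above:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* assumed logical block size of the device */

int main(void) {
    void *buf;
    /* posix_memalign returns a buffer whose address is a multiple of
       BLOCK_SIZE, which O_DIRECT needs in addition to an aligned length
       and file offset. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    int fd = open("direct.dat", O_RDONLY | O_DIRECT);
    if (fd == -1) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }

    ssize_t n = read(fd, buf, BLOCK_SIZE);   /* aligned length, offset 0 */
    if (n == -1)
        fprintf(stderr, "read: %s\n", strerror(errno));  /* EINVAL often means misalignment */
    else
        printf("read %zd bytes with O_DIRECT\n", n);

    close(fd);
    free(buf);
    return 0;
}
Checking the return value of every read() in the benchmark would also show whether the direct runs are actually transferring data.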

copying contents of file to another file n bytes at a time in c

I am trying to copy the contents of a file to another file n bytes at a time in C. I believe the code below works for copying one byte at a time, but I am not sure how to make it work for n bytes. I have tried making a character array of size n and changing the calls to read(sourceFile, &c, n) and write(destFile, &c, n), but the buffer doesn't appear to work that way.
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <time.h>

void File_Copy(int sourceFile, int destFile, int n){
    char c;
    while(read(sourceFile, &c, 1) != 0){
        write(destFile, &c, 1);
    }
}

int main(){
    int fd, fd_destination;
    fd = open("source_file.txt", O_RDONLY); //opening files to be read/created and written to
    fd_destination = open("destination_file.txt", O_RDWR | O_CREAT);
    clock_t begin = clock(); //starting clock to time the copying function
    File_Copy(fd, fd_destination, 100); //copy function
    clock_t end = clock();
    double time_spent = (double)(end - begin) / CLOCKS_PER_SEC; //timing display
    return 0;
}
how to make it work for n number of bytes
Just read N bytes at a time and copy however many bytes the read actually returned.
#define N 4096

void File_Copy(int sourceFile, int destFile, int n){
    char c[N];
    const size_t csize = sizeof(c)/sizeof(*c);
    while (1) {
        const ssize_t readed = read(sourceFile, c, csize);
        if (readed <= 0) {
            // nothing more to read
            break;
        }
        // copy to destination exactly as many bytes as we read
        const ssize_t written = write(destFile, c, readed);
        if (written != readed) {
            // we didn't transfer everything and destFile should be blocking
            // handle error
            abort();
        }
    }
}
You want to copy a buffer of size n at once:
void File_Copy(int sourceFile, int destFile, int n){
    char c[n];
    ssize_t st;
    while((st = read(sourceFile, c, n)) > 0){
        write(destFile, c, st);
    }
}
Note that n bytes are not necessarily copied at once; it might be fewer. You also have to check the return value of write() and handle the case where fewer bytes were written, as fits your needs.
One example is a loop (note that the buffer pointer has to be advanced past the bytes already written):
const char *p = c;
while (st > 0) {
    ssize_t w = write(destFile, p, st);
    if (w < 0) {
        perror("write");
        return;
    }
    p += w;   // skip the bytes that have already been written
    st -= w;
}
Another issue: when you create the destination file here
fd_destination = open("destination_file.txt", O_RDWR | O_CREAT);
you do not specify the third mode parameter. This leaves the new file with an indeterminate mode, which might cause this open() to fail the next time. So better add a valid mode, for example like this:
fd_destination = open("destination_file.txt", O_RDWR | O_CREAT, 0644);
This might have distorted your test results.
This is my version using lseek (no loop required). It relies on read and write always processing the complete buffer and never just a part of it (I don't know whether this is guaranteed):
void File_Copy(int sourceFile, int destFile)
{
    off_t s = lseek(sourceFile, 0, SEEK_END);
    lseek(sourceFile, 0, SEEK_SET);
    char* c = malloc(s);
    if (read(sourceFile, c, s) == s)
        write(destFile, c, s);
    free(c);
}
The following code does not rely on this assumption and can also be used with file descriptors that do not support lseek.
void File_Copy(int sourceFile, int destFile, int n)
{
    char* c = malloc(n);
    while (1)
    {
        ssize_t readStatus = read(sourceFile, c, n);
        if (readStatus == -1)
        {
            printf("error, read returned -1, errno: %d\n", errno);
            return;
        }
        if (readStatus == 0)
            break; // EOF
        ssize_t bytesWritten = 0;
        while (bytesWritten != readStatus)
        {
            ssize_t writeStatus = write(destFile, c + bytesWritten, readStatus - bytesWritten);
            if (writeStatus == -1)
            {
                printf("error, write returned -1, errno is %d\n", errno);
                return;
            }
            bytesWritten += writeStatus;
            if (bytesWritten > readStatus) // should not be possible
            {
                printf("how did 'bytesWritten > readStatus' happen?");
                return;
            }
        }
    }
    free(c);
}
On my system (PCIe SSD) I get the best performance with a buffer between 1 MB and 4 MB (you can also use dd to find this size). Bigger buffers don't make sense, and you need big files (try 50 GB) to see the effect.
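If you want to find that sweet spot empirically, a rough sketch like the following can help (not a tuned benchmark; "bigfile" and the size list are placeholders, and repeated passes may be served from the page cache unless the file is much larger than RAM). It times a plain read() loop for a few buffer sizes:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

static double elapsed_seconds(struct timeval a, struct timeval b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_usec - a.tv_usec) / 1e6;
}

int main(void) {
    const size_t sizes[] = { 4096, 65536, 1 << 20, 4 << 20 };  /* 4 KB .. 4 MB */
    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        size_t bufsize = sizes[i];
        char *buf = malloc(bufsize);
        int fd = open("bigfile", O_RDONLY);   /* placeholder test file */
        if (buf == NULL || fd == -1) {
            perror("setup");
            return 1;
        }
        struct timeval start, end;
        gettimeofday(&start, NULL);
        ssize_t n;
        long long total = 0;
        while ((n = read(fd, buf, bufsize)) > 0)   /* one full pass over the file */
            total += n;
        gettimeofday(&end, NULL);
        printf("bufsize %8zu: %lld bytes in %.3f s\n",
               bufsize, total, elapsed_seconds(start, end));
        close(fd);
        free(buf);
    }
    return 0;
}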

Filesystem VS Raw disk benchmarking in C

I am doing some benchmarking (on OS X) to see how the use of a file system influences bandwidth etc. I am using concurrency in the hope of creating fragmentation in the FS.
However, it looks like using the FS is more efficient than raw disk accesses. Why?
Here is my code:
#include <pthread.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define NO_THREADS (2)
#define PACKET_SIZE (1024 * 4)
#define SIZE_TO_WRITE (1024 * 1024 * 1024)

void write_buffer(void *arg) {
    int *p_start = arg;
    int start = *p_start;
    char buffer[PACKET_SIZE];
    char path[50];
    sprintf(path, "file%d", start);

    int fd = open(path, O_CREAT | O_WRONLY | O_APPEND, 0644);
    //int fd = open("/dev/rdisk0s4", O_WRONLY);
    if (fd < 0) {
        fprintf(stderr, "Could not open.\n");
        goto end;
    }
    //lseek(fd, start * SIZE_TO_WRITE, SEEK_SET);

    int current;
    for (current = start; current < start + SIZE_TO_WRITE; current += PACKET_SIZE) {
        int i;
        for (i = 0; i < PACKET_SIZE; ++i) {
            buffer[i] = i + current;
        }
        if (PACKET_SIZE != write(fd, buffer, PACKET_SIZE)) {
            fprintf(stderr, "Could not write packet %d properly.", current);
            goto close;
        }
    }
    fsync(fd);

close:
    close(fd);
end:
    pthread_exit(0);
}

void flush(void) {
    fflush(stdout);
    fflush(stderr);
}

int main(void) {
    pthread_t threads[NO_THREADS];
    int starts[NO_THREADS];
    int i;

    atexit(flush);

    for (i = 0; i < NO_THREADS; ++i) {
        starts[i] = i;
        if (pthread_create(threads + i, NULL, (void *) &write_buffer, (void *)(starts + i))) {
            fprintf(stderr, "Error creating thread no %d\n", i);
            return EXIT_FAILURE;
        }
    }
    for (i = 0; i < NO_THREADS; ++i) {
        if (pthread_join(threads[i], NULL)) {
            fprintf(stderr, "Error joining thread\n");
            return EXIT_FAILURE;
        }
    }

    puts("Done");
    return EXIT_SUCCESS;
}
With the help of the FS, the two threads write the files in 31.33 seconds. Without it, the same work takes minutes...
When you use /dev/rdisk0s4 instead of /path/to/normal/file%d, for every write you perform the OS will issue a disk I/O. Even if that disk is an SSD, that means that the round-trip time is probably at least a few hundred microseconds on average. When you write to the file instead, the filesystem isn't actually issuing your writes to disk until later. The Linux man page describes this well:
A successful return from write() does not make any guarantee that data has been committed to disk. In fact, on some buggy implementations, it does not even guarantee that space has successfully been reserved for the data. The only way to be sure is to call fsync(2) after you are done writing all your data.
So, the data you wrote is being buffered by the filesystem, which only requires that a copy be made in memory -- this probably takes on the order of a few microseconds at most. If you want to do an apples-to-apples comparison, you should make sure you're doing synchronous I/O for both test cases. Even running fsync after the whole test is done will probably allow the filesystem to be much faster, since it will batch up the I/O into one continuous streaming write, which could be faster than what your test directly on the disk can achieve.
In general, writing good systems benchmarks is incredibly difficult, especially when you don't know a lot about the system you're trying to test. I'd recommend using an off-the-shelf Unix filesystem benchmarking toolkit if you want high quality results -- otherwise, you could spend literally a lifetime learning about performance pathologies of the OS and FS you're testing... not that that's a bad thing, if you're interested in it like I am :-)
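If you do want the filesystem case to pay for every write up front rather than buffering it, one option (a sketch, assuming your filesystem honours the flag; "syncfile" is a placeholder path, and on OS X fcntl(fd, F_FULLFSYNC) gives an even stronger guarantee) is to open the file with O_SYNC so each write() returns only after the data has been handed to the device:
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    /* O_SYNC makes each write() block until the data (and the metadata
       needed to retrieve it) has been pushed to the device, so the timing
       is closer to what writing the raw disk measures. */
    int fd = open("syncfile", O_CREAT | O_WRONLY | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buffer[4096];
    memset(buffer, 'x', sizeof buffer);

    for (int i = 0; i < 256; ++i) {   /* 1 MiB of synchronous writes */
        if (write(fd, buffer, sizeof buffer) != (ssize_t)sizeof buffer) {
            perror("write");
            break;
        }
    }
    close(fd);
    return 0;
}
With the same flag applied to both test cases, the comparison is closer to apples-to-apples, at the cost of making both runs much slower.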

Why does allocating a lot of memory give worse results?

In my assignment I am testing how long different copying functions take. I am a bit curious about the results for one of them. One of my copy functions allocates memory like so:
int copyfile3(char* infilename, char* outfilename, int size) {
    int infile; //File handles for source and destination.
    int outfile;

    infile = open(infilename, O_RDONLY); // Open the input and output files.
    if (infile < 0) {
        open_file_error(infilename);
        return 1;
    }
    outfile = open(outfilename, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (outfile < 0) {
        open_file_error(outfilename);
        return 1;
    }

    int intch; // Number of bytes returned by read(); 0 at EOF, -1 on error.
    char *ch = malloc(sizeof(char) * (size + 1));

    gettimeofday(&start, NULL); // start and end are struct timeval globals defined elsewhere.
    // Read a buffer's worth at a time until read() reports EOF.
    while ((intch = read(infile, ch, size)) > 0) {
        write(outfile, ch, intch); // Write out exactly the bytes read.
    }
    gettimeofday(&end, NULL);

    // All done--close the files and return success code.
    close(infile);
    close(outfile);
    free(ch);
    return 0; // Success!
}
The main program lets the user supply the infile, outfile, and copy-function number. If 3 is chosen, the user can also supply a buffer size. I was testing copying a 6.3 MB file with different buffer sizes. A buffer of 1024 gives a difference of 42,000 microseconds, 2000 gives 26,000 microseconds, but 3000 gives 34,000 microseconds. My question is: why does it go back up? And how can you tell what buffer size will make the copy take the least time?
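There is no single perfect answer, but one common starting point (a hedged suggestion, not a rule) is the filesystem's preferred I/O block size, which stat() reports in st_blksize; buffer sizes that are a multiple of it tend to behave better than odd sizes like 3000. A minimal sketch that just prints it:
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    struct stat st;
    if (stat(argv[1], &st) == -1) {
        perror("stat");
        return 1;
    }
    /* st_blksize is the filesystem's preferred block size for I/O on this
       file; it (or a multiple of it) is a reasonable default buffer size
       before measuring anything else. */
    printf("preferred I/O block size: %ld bytes\n", (long)st.st_blksize);
    return 0;
}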
