Disk write does not work with malloc in C - c

I did write to disk using C code.
First I tried with malloc and found that write did not work (write returned -1):
fd = open('/dev/sdb', O_DIRECT | O_SYNC | O_RDWR);
void *buff = malloc(512);
lseek(fd, 0, SEEK_SET);
write(fd, buff, 512);
Then I changed the second line with this and it worked:
void *buff;
posix_memalign(&buff,512,512);
However, when I changed the lseek offset to 1: lseek(fd, 1, SEEK_SET);, write did not work again.
First, why didn't malloc work?
Then, I know that in my case, posix_memalign guarantees that start address of memory alignment must be multiple of 512. But, should not memory alignment and write be a separate process? So why I could not write to any offset that I want?

From the Linux man page for open(2):
The O_DIRECT flag may impose alignment restrictions on the length
and address of user-space buffers and the file offset of I/Os.
And:
Under Linux 2.4, transfer sizes, and the alignment of the user
buffer and the file offset must all be multiples of the logical block
size of the filesystem. Under Linux 2.6, alignment to 512-byte
boundaries suffices.
The meaning of O_DIRECT is to "try to minimize cache effects of the I/O to and from this file", and if I understand it correctly it means that the kernel should copy directly from the user-space buffer, thus perhaps requiring stricter alignment of the data.

Maybe the documentation doesn't say, but it's quite possible that write and reads to/from a block device is required to be aligned and for entire blocks to succeed (this would explain why you get failure in your first and last cases but not in the second). If you use linux the documentation of open(2) basically says this:
The O_DIRECT flag may impose alignment restrictions on the length and
address of user-space buffers and the file offset of I/Os. In Linux
alignment restrictions vary by file system and kernel version and
might be absent entirely. However there is cur‐
rently no file system-independent interface for an application to discover these restrictions for a given file or file system. Some
file systems provide their own interfaces for doing so, for example
the XFS_IOC_DIOINFO operation in xfsctl(3).
Your code shows lack of error handling. Every line in the code contains functions that may fail and open, lseek and write also reports the cause of error in errno. So with some kind of error handling it would be:
fd = open('/dev/sdb', O_DIRECT | O_SYNC | O_RDWR);
if( fd == -1 ) {
perror("open failed");
return;
}
void *buff = malloc(512);
if( !buff ) {
printf("malloc failed");
return;
}
if( lseek(fd, 0, SEEK_SET) == (off_t)-1 ) {
perror("lseek failed");
free(buff);
return;
}
if( write(fd, buff, 512) == -1 ) {
perror("write failed");
free(buff);
return;
}
in that case you would at least get a more detailed explaination on what goes wrong. In this case I suspect that you get EIO (Input/output error) from the write call.
Note that the above maybe isn't complete errorhandling as perror and printf themselves can fail (and you might want to do something about that possibility).

Related

Write atomically to a file using Write() with snprintf()

I want to be able to write atomically to a file, I am trying to use the write() function since it seems to grant atomic writes in most linux/unix systems.
Since I have variable string lengths and multiple printf's, I was told to use snprintf() and pass it as an argument to the write function in order to be able to do this properly, upon reading the documentation of this function I did a test implementation as below:
int file = open("file.txt", O_CREAT | O_WRONLY);
if(file < 0)
perror("Error:");
char buf[200] = "";
int numbytes = snprintf(buf, sizeof(buf), "Example string %s" stringvariable);
write(file, buf, numbytes);
From my tests it seems to have worked but my question is if this is the most correct way to implement it since I am creating a rather large buffer (something I am 100% sure will fit all my printfs) to store it before passing to write.
No, write() is not atomic, not even when it writes all of the data supplied in a single call.
Use advisory record locking (fcntl(fd, F_SETLKW, &lock)) in all readers and writers to achieve atomic file updates.
fcntl()-based record locks work over NFS on both Linux and BSDs; flock()-based file locks may not, depending on system and kernel version. (If NFS locking is disabled like it is on some web hosting services, no locking will be reliable.) Just initialize the struct flock with .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 to refer to the entire file.
Use asprintf() to print to a dynamically allocated buffer:
char *buffer = NULL;
int length;
length = asprintf(&buffer, ...);
if (length == -1) {
/* Out of memory */
}
/* ... Have buffer and length ... */
free(buffer);
After adding the locking, do wrap your write() in a loop:
{
const char *p = (const char *)buffer;
const char *const q = (const char *)buffer + length;
ssize_t n;
while (p < q) {
n = write(fd, p, (size_t)(q - p));
if (n > 0)
p += n;
else
if (n != -1) {
/* Write error / kernel bug! */
} else
if (errno != EINTR) {
/* Error! Details in errno */
}
}
}
Although there are some local filesystems that guarantee write() does not return a short count unless you run out of storage space, not all do; especially not the networked ones. Using a loop like above lets your program work even on such filesystems. It's not too much code to add for reliable and robust operation, in my opinion.
In Linux, you can take a write lease on a file to exclude any other process opening that file for a while.
Essentially, you cannot block a file open, but you can delay it for up to /proc/sys/fs/lease-break-time seconds, typically 45 seconds. The lease is granted only when no other process has the file open, and if any other process tries to open the file, the lease owner gets a signal. (If the lease owner does not release the lease, for example by closing the file, the kernel will automagically break the lease after the lease-break-time is up.)
Unfortunately, these only work in Linux, and only on local files, so they are of limited use.
If readers do not keep the file open, but open, read, and close it every time they read it, you can write a full replacement file (must be on the same filesystem; I recommend using a lock-subdirectory for this), and hard-link it over the old file.
All readers will see either the old file or the new file, but those that keep their file open, will never see any changes.

close() system call takes long time to finish

I am testing performance using sendfile() to copy big files under Linux 6.4.
My code follows the following logic and is compiled with gcc
read_fd = open (argv[1], O_RDONLY);
fstat(read_fd, &stat_buf);
write_fd = open (argv[2], O_WRONLY | O_CREAT, stat_buf.st_mode);
left_to_write = stat_buf.st_size;
while (left_to_write > 0) {
written=sendfile (write_fd, read_fd, &offset, stat_buf.st_size);
if (written == -1)
return -1;
else {
left_to_write -= written;
bytes_done=stat_buf.st_size-left_to_write;
printf("%ld bytes written, %ld bytes left to write\n", written, left_to_write);
}
}
close(read_fd);
close(write_fd); /* this takes minutes */
The sendfile() call is very fast; I can see it writing chunks of 2GB about every 5 seconds.
When the while loop is over, it stays at close(output) for minutes before finishing successfully.
Why does close(output) takes so long to run? Is it flushing a buffer? How can I make it faster?
It's due to writeback. close(output) must report any write errors that have not yet been reported; for example, it must ensure that all the blocks required for the data have been allocated, or else report ENOSPC.
The data are not necessarily written to disk when close() returns; if you want this, you'll need to call fsync() (and endure a longer wait).
See also the NOTES section of the manual page.

Copying files using memory map

I want to implement an effective file copying technique in C for my process which runs on BSD OS. As of now the functionality is implemented using read-write technique. I am trying to make it optimized by using memory map file copying technique.
Basically I will fork a process which mmaps both src and dst file and do memcpy() of the specified bytes from src to dst. The process exits after the memcpy() returns. Is msync() required here, because when I actually called msync with MS_SYNC flag, the function took lot of time to return. Same behavior is seen with MS_ASYNC flag as well?
i) So to summarize is it safe to avoid msync()?
ii) Is there any other better way of copying files in BSD. Because bsd seems to be does not support sendfile() or splice()? Any other equivalents?
iii) Is there any simple method for implementing our own zero-copy like technique for this requirement?
My code
/* mmcopy.c
Copy the contents of one file to another file, using memory mappings.
Usage mmcopy source-file dest-file
*/
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "tlpi_hdr.h"
int
main(int argc, char *argv[])
{
char *src, *dst;
int fdSrc, fdDst;
struct stat sb;
if (argc != 3)
usageErr("%s source-file dest-file\n", argv[0]);
fdSrc = open(argv[1], O_RDONLY);
if (fdSrc == -1)
errExit("open");
/* Use fstat() to obtain size of file: we use this to specify the
size of the two mappings */
if (fstat(fdSrc, &sb) == -1)
errExit("fstat");
/* Handle zero-length file specially, since specifying a size of
zero to mmap() will fail with the error EINVAL */
if (sb.st_size == 0)
exit(EXIT_SUCCESS);
src = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fdSrc, 0);
if (src == MAP_FAILED)
errExit("mmap");
fdDst = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
if (fdDst == -1)
errExit("open");
if (ftruncate(fdDst, sb.st_size) == -1)
errExit("ftruncate");
dst = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdDst, 0);
if (dst == MAP_FAILED)
errExit("mmap");
memcpy(dst, src, sb.st_size); /* Copy bytes between mappings */
if (msync(dst, sb.st_size, MS_SYNC) == -1)
errExit("msync");
enter code here
exit(EXIT_SUCCESS);
}
Short answer: msync() is not required.
When you do not specify msync(), the operating system flushes the memory-mapped pages in the background after the process has been terminated. This is reliable on any POSIX-compliant operating system.
To answer the secondary questions:
Typically the method of copying a file on any POSIX-compliant operating system (such as BSD) is to use open() / read() / write() and a buffer of some size (16kb, 32kb, or 64kb, for example). Read data into buffer from src, write data from buffer into dest. Repeat until read(src_fd) returns 0 bytes (EOF).
However, depending on your goals, using mmap() to copy a file in this fashion is probably a perfectly viable solution, so long as the files being coped are relatively small (relative to the expected memory constraints of your target hardware and your application). The mmap copy operation will require roughly 2x the total physical memory of the file. So if you're trying to copy a file that's a 8MB, your application will use 16MB to perform the copy. If you expect to be working with even larger files then that duplication could become very costly.
So does using mmap() have other advantages? Actually, no.
The OS will often be much slower about flushing mmap pages than writing data directly to a file using write(). This is because the OS will intentionally prioritize other things ahead of page flushes so to keep the system 'responsive' for foreground tasks/apps.
During the time the mmap pages are being flushed to disk (in the background), the chance of sudden loss of power to the system will cause loss of data. Of course this can happen when using write() as well but if write() finishes faster then there's less chance for unexpected interruption.
the long delay you observe when calling msync() is roughly the time it takes the OS to flush your copied file to disk. When you don't call msync() it happens in the background instead (and also takes even longer for that reason).

Why does fopen/fgets use both mmap and read system calls to access the data?

I have a small example program which simply fopens a file and uses fgets to read it. Using strace, I notice that the first call to fgets runs a mmap system call, and then read system calls are used to actually read the contents of the file. on fclose, the file is munmaped. If I instead open read the file with open/read directly, this obviously does not occur. I'm curious as to what is the purpose of this mmap is, and what it is accomplishing.
On my Linux 2.6.31 based system, when under heavy virtual memory demand these mmaps will sometimes hang for several seconds, and appear to me to be unnecessary.
The example code:
#include <stdlib.h>
#include <stdio.h>
int main ()
{
FILE *f;
if ( NULL == ( f=fopen( "foo.txt","r" )))
{
printf ("Fail to open\n");
}
char buf[256];
fgets(buf,256,f);
fclose(f);
}
And here is the relevant strace output when the above code is run:
open("foo.txt", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=9, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb8039000
read(3, "foo\nbar\n\n"..., 4096) = 9
close(3) = 0
munmap(0xb8039000, 4096) = 0
It's not the file that is mmap'ed - in this case mmap is used anonymously (not on a file), probably to allocate memory for the buffer that the consequent reads will use.
malloc in fact results in such a call to mmap. Similarly, the munmap corresponds to a call to free.
The mmap is not mapping the file; instead it's allocating memory for the stdio FILE buffering. Normally malloc would not use mmap to service such a small allocation, but it seems glibc's stdio implementation is using mmap directly to get the buffer. This is probably to ensure it's page-aligned (though posix_memalign could achieve the same thing) and/or to make sure closing the file returns the buffer memory to the kernel. I question the usefulness of page-aligning the buffer. Presumably it's for performance, but I can't see any way it would help unless the file offset you're reading from is also page-aligned, and even then it seems like a dubious micro-optimization.
from what i have read memory mapping functions are useful while handling large files. now the definition of large is something i have no idea about. but yes for the large files they are significantly faster as compared to the 'buffered' i/o calls.
in the example that you have posted i think the file is opened by the open() function and mmap is used for allocating memory or something else.
from the syntax of mmap function this can be seen clearly:
void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
the second last parameter takes the file descriptor which should be non-negative.
while in the stack trace it is -1
Source code of fopen in glibc shows that mmap can be actually used.
https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofopen.c;h=965d21cd978f3acb25ca23152993d9cac9f120e3;hb=HEAD#l36

Does Linux's splice(2) work when splicing from a TCP socket?

I've been writing a little program for fun that transfers files over TCP in C on Linux. The program reads a file from a socket and writes it to file (or vice versa). I originally used read/write and the program worked correctly, but then I learned about splice and wanted to give it a try.
The code I wrote with splice works perfectly when reading from stdin (redirected file) and writing to the TCP socket, but fails immediately with splice setting errno to EINVAL when reading from socket and writing to stdout. The man page states that EINVAL is set when neither descriptor is a pipe (not the case), an offset is passed for a stream that can't seek (no offsets passed), or the filesystem doesn't support splicing, which leads me to my question: does this mean that TCP can splice from a pipe, but not to?
I'm including the code below (minus error handling code) in the hopes that I've just done something wrong. It's based heavily on the Wikipedia example for splice.
static void splice_all(int from, int to, long long bytes)
{
long long bytes_remaining;
long result;
bytes_remaining = bytes;
while (bytes_remaining > 0) {
result = splice(
from, NULL,
to, NULL,
bytes_remaining,
SPLICE_F_MOVE | SPLICE_F_MORE
);
if (result == -1)
die("splice_all: splice");
bytes_remaining -= result;
}
}
static void transfer(int from, int to, long long bytes)
{
int result;
int pipes[2];
result = pipe(pipes);
if (result == -1)
die("transfer: pipe");
splice_all(from, pipes[1], bytes);
splice_all(pipes[0], to, bytes);
close(from);
close(pipes[1]);
close(pipes[0]);
close(to);
}
On a side note, I think that the above will block on the first splice_all when the file is large enough due to the pipe filling up(?), so I also have a version of the code that forks to read and write from the pipe at the same time, but it has the same error as this version and is harder to read.
EDIT: My kernel version is 2.6.22.18-co-0.7.3 (running coLinux on XP.)
What kernel version is this? Linux has had support for splicing from a TCP socket since 2.6.25 (commit 9c55e01c0), so if you're using an earlier version, you're out of luck.
You need to splice_all from pipes[0] to to every time you do a single splice from from to pipes[1] (the splice_all is for the amount of bytes just read by the last single splice) . Reason: pipes represents a finite kernel memory buffer. So if bytes is more than that, you'll block forever in your splice_all(from, pipes[1], bytes).

Resources