I have an application on the Linux platform which requires a server program to write data to a binary file continuously. At the same time, another program needs to read the written values. Should I be concerned if I am not locking the file during the reads and writes?
You should be concerned. I assume no other program (besides the two executables mentioned in your question) is accessing that file. You should indeed lock to serialize that access. Use flock(2), or lockf(3), which is implemented on top of fcntl(2).
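For illustration, here is a minimal sketch (my own, not from the question) of serializing access with flock(2); the record format and error handling are left out for brevity:

#include <sys/file.h>
#include <unistd.h>

/* Writer side: take an exclusive lock around each append. */
void append_record(int fd, const void *rec, size_t len)
{
    flock(fd, LOCK_EX);   /* blocks until no other process holds a lock */
    write(fd, rec, len);
    flock(fd, LOCK_UN);
}

/* Reader side: a shared lock suffices; several readers may hold it at once. */
ssize_t read_record(int fd, void *buf, size_t len)
{
    flock(fd, LOCK_SH);
    ssize_t n = read(fd, buf, len);
    flock(fd, LOCK_UN);
    return n;
}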
BTW, is the file read and written sequentially? Did you consider using something higher level, e.g. GDBM, or a database such as MariaDB, PostgreSQL, or MongoDB?
Everything depends on what your requirements are. Can you modify the server process? If so, you have endless possibilities. This is a well-studied problem: interprocess communication (see the Wikipedia article on IPC).
Otherwise, in my own test program, it seemed that no locking was necessary to have a producer and consumer operating on the same file. This is anecdotal evidence only; I make no guarantees.
Producer:
int main() {
    int fd = open("file", O_WRONLY | O_APPEND);
    const char *str = "str";
    const int str_len = strlen(str);
    int sum = 0;
    while (1) {
        sum += write(fd, str, str_len);
        printf("%d\n", sum);
    }
    close(fd);
}
Consumer:
int main() {
    int fd = open("file", O_RDONLY);
    char buf[10];
    const int buf_size = sizeof(buf);
    int sum = 0;
    while (1) {
        sum += read(fd, buf, buf_size);
        printf("%d\n", sum);
    }
    close(fd);
}
(Includes for both programs:)
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
These programs assume the file "file" exists already.
Just to add to what has already been said here: check your OS documentation. In principle there should not be a problem when reading, provided each read is atomic (i.e., no task switch happens during the operation). Also, the OS could have its own restrictions and locks, so be careful.
I'm trying to trigger some concurrent conflicts by having several processes write to the same file, but I couldn't:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>
void concurrent_write()
{
    int create_fd = open("bar.txt", O_CREAT | O_TRUNC, 0644);
    close(create_fd);
    int repeat = 20;
    int num = 4;
    for (int process = 0; process < num; process++)
    {
        int rc = fork();
        if (rc == 0)
        {
            // child
            int write_fd = open("bar.txt", O_WRONLY | O_APPEND, 0644);
            for (int idx = 0; idx < repeat; idx++)
            {
                sleep(1);
                write(write_fd, "child writing\n", strlen("child writing\n"));
            }
            close(write_fd);
            exit(0);
        }
    }
    for (int process = 0; process < num; process++)
    {
        wait(NULL);
        // wait for all children to exit
    }
    printf("write to `bar.txt`\n%d lines written by %d process\n", repeat * num, num);
    printf("wc:");
    if (fork() == 0)
    {
        // child
        char *args[3];
        args[0] = strdup("wc");
        args[1] = strdup("bar.txt");
        args[2] = NULL;
        execvp(args[0], args);
    }
}

int main(int argc, char *argv[])
{
    concurrent_write();
    return 0;
}
This program forks num children and has all of them write repeat lines to a file. But every time (however I change repeat and num) I get the same result: the length of bar.txt (the output file) matches the total number of lines written. Why are no concurrent conflicts triggered?
Writing to a file can be divided into a two-step process:
Locate where you want to write.
Write data into the file.
You opened the file with the O_APPEND flag, which ensures that this two-step process is atomic. So you will always find exactly as many lines in the file as you wrote.
See the open(2) man page:
O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as
if with lseek(2). The modification of the file offset and
the write operation are performed as a single atomic step.
In essence, one of the major design features of O_APPEND is precisely to prevent the sort of "concurrent conflicts" you mention. The typical example would be a log file that several processes must write to. Using O_APPEND ensures their messages do not overwrite each other.
Moreover, all data written by a single write call is written atomically, so provided that your write("child writing\n") successfully writes all its bytes (which for a regular file it usually would), they will not be interleaved with the bytes of any other such message.
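For contrast, here is a sketch (mine, reusing the question's bar.txt) of the non-atomic two-step version; if each child did this instead of opening with O_APPEND, two processes could read the same end-of-file offset and overwrite each other:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Racy "append": another process may write between the lseek() and the
     * write(), so two writers can land on the same offset and clobber each
     * other. O_APPEND fuses the two steps into one atomic operation. */
    int fd = open("bar.txt", O_WRONLY);
    const char *msg = "child writing\n";

    lseek(fd, 0, SEEK_END);        /* step 1: locate the end of the file */
    write(fd, msg, strlen(msg));   /* step 2: write at that offset       */

    close(fd);
    return 0;
}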
First, write() calls with the O_APPEND flag should be atomic. Per POSIX write():
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
But that's not enough when there are multiple threads or processes making parallel write() calls on the same file - that does not guarantee that parallel write() calls are atomic.
POSIX does guarantee that parallel write() calls are also atomic:
All of the following functions shall be atomic with respect to each
other in the effects specified in POSIX.1-2017 when they operate on
regular files or symbolic links:
...
write()
...
See also Is file append atomic in UNIX?
Beware, though. Reading that question and its answers shows that Linux filesystems such as ext3 are not POSIX compliant once you get past a relatively small size operation, or possibly if you cross page and/or file system sector boundaries. I suspect XFS and ZFS will support write() atomicity much better given their origins.
And none of this applies to Windows.
I am trying to speed up my C program to spit out data faster.
Currently I am using printf() to give some data to the outside world. It is a continuous stream of data, so I am unable to use return(data).
How can I use write() or fwrite() to send the data to the console instead of a file?
Overall, my setup consists of a program written in C whose output goes to a Python script, where the data is processed further. I form a pipe:
./program_in_c | script_in_python
This gives an additional benefit on the Raspberry Pi by using more of the processor's cores.
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
write() writes up to count bytes from the buffer starting at buf to
the file referred to by the file descriptor fd.
The standard output file descriptor is 1 (on Linux, at least).
Consider flushing the stdout buffer before calling the write system call, to make sure everything previously buffered has already been written out:
fflush(stdout); // Will now print everything in the stdout buffer
write(1, buf, count);
using fwrite:
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
The function fwrite() writes nmemb items of data, each size bytes
long, to the stream pointed to by stream, obtaining them from the
location given by ptr.
fflush(stdout);
int buf[8];
fwrite(buf, sizeof(int), sizeof(buf) / sizeof(int), stdout); /* nmemb is the element count, not the byte count */
Please refer to the man pages for further reading: fwrite(3) and write(2).
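Putting those two fragments together, a minimal, self-contained sketch (the message text is made up for the example) that flushes stdio's buffer and then emits the same data with both calls could look like this:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *msg = "some data for the pipe\n";

    /* Flush whatever printf() has already buffered, so the output order
     * stays consistent before a raw write() is mixed in. */
    fflush(stdout);

    /* Raw system call: file descriptor 1 is standard output. */
    write(1, msg, strlen(msg));

    /* Buffered equivalent: nmemb items of size bytes each. */
    fwrite(msg, sizeof(char), strlen(msg), stdout);
    fflush(stdout);

    return 0;
}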
Well, there's little or no win in trying to outdo the buffering system already used by the stdio package. If you try to use fwrite() with larger buffers, you'll probably gain no time and use more memory than necessary, as stdio selects a buffer size appropriate to the filesystem where the data is to be written.
A simple program like the following will show that speed is of no concern, as stdio is already buffering output.
#include <stdio.h>

int
main()
{
    int c;
    while ((c = getchar()) >= 0)
        putchar(c);
}
If you compare the program above with the one below:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main()
{
    char buffer[512];
    int n;
    while ((n = read(0, buffer, sizeof buffer)) > 0)
        write(1, buffer, n);
    if (n < 0) {
        perror("read");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
You will see that there's no significant difference, or even that the first program is faster, despite doing I/O on a per-character basis (it is the version B. Kernighan and D. Ritchie give in the first edition of "The C Programming Language"). Most probably the first program will win.
The calls to read() and write() involve a system call each, with a buffer size decided by you. The individual getchar() and putchar() calls don't: they just store the characters in a memory buffer whose size is chosen by the stdio library implementation, based on the filesystem, and flush that buffer once it is full. If you grow the buffer size in the second program, you'll see that results improve up to a point, but after that you'll see no further increase in speed. The number of calls made to the library is insignificant compared with the time spent doing the actual I/O, and selecting a very large buffer eats memory from your system (and a Raspberry Pi is limited in this sense, to 1 GB of RAM). If you end up swapping because of such a large buffer, you'll lose the battle completely.
Most filesystems have a preferred buffer size, because the kernel does read-ahead (on sequential reads it reads more than you asked for, anticipating that you'll continue reading after you have consumed the data), and this affects the optimum buffer size. The stat(2) system call tells you the optimum buffer size (the st_blksize field), and stdio uses that when it selects its actual buffer size.
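For instance, a small sketch (mine; "file" is just a placeholder name) that queries the preferred size with stat(2); st_blksize is the field meant here:

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;

    /* st_blksize is the filesystem's preferred I/O block size, the same
     * hint stdio consults when sizing its own buffer. */
    if (stat("file", &st) == 0)
        printf("preferred I/O block size: %ld bytes\n", (long)st.st_blksize);

    return 0;
}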
Don't think you are going to do better (or much better) than the first program listed above, even if you use large enough buffers.
What is not correct (or valid) is to intermix calls that do buffering (like everything in the stdio package) with basic system calls like read(2) or write(2), which do not buffer data (I've seen it recommended here to pair fflush(3) with write(2), which is totally incoherent). There is nothing to gain, and you'll probably get your output incorrectly ordered if you do part of your output with printf(3) and part with write(2). This happens more in pipelines like the one you plan, because there the buffers are not line oriented (another characteristic of buffered output in stdio).
Finally, I recommend you read "The Unix Programming Environment" by Brian Kernighan and Rob Pike. It will teach you a lot about Unix, and in particular how to use the stdio package and the Unix filesystem calls for reading and writing properly. With a little luck you'll find it as a PDF on the internet.
The next program shows you the effect of buffering:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main()
{
    int i;
    char *sep = "";
    for (i = 0; i < 10; i++) {
        printf("%s%d", sep, i);
        sep = ", ";
        sleep(1);
    }
    printf("\n");
}
One would assume you are going to see (on the terminal) the program writing the numbers 0 to 9, separated by ", " and paced at one-second intervals.
But due to the buffering, what you observe is quite different: the program waits for 10 seconds without writing anything at all to the terminal and then, when it terminates, writes everything in one shot, including the final line end, after which the shell shows you the prompt again.
If you change the program to this:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main()
{
    int i;
    char *sep = "";
    for (i = 0; i < 10; i++) {
        printf("%s%d", sep, i);
        fflush(stdout);
        sep = ", ";
        sleep(1);
    }
    printf("\n");
}
You'll see the expected output, because you have told stdio to flush the buffer on each loop pass. Both programs make 10 calls to printf(3), but in the first there is only one write(2), at the end, for the full buffer. In the second version you force stdio to do one such write(2) after each printf, and that sends the data out as the program passes through the loop.
Be careful, because another characteristic of stdio can confuse you: when you print to a terminal device, printf(3) flushes the output at each \n, but when the output goes through a pipe, it flushes only when the buffer fills up completely. This saves system calls (in FreeBSD, for example, the buffer size selected by stdio is around 32 KB, large enough to cover a couple of filesystem blocks per write(2) and already optimal: you'll not do better by going above that size).
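If you want the line-at-a-time behaviour even through a pipe, one option (not something the answer above prescribes, just a possibility) is to request line buffering explicitly with setvbuf(3) before doing any output:

#include <stdio.h>

int main(void)
{
    /* Ask stdio to flush stdout at every '\n', even when the output is a
     * pipe rather than a terminal. Must be called before any I/O on the
     * stream. */
    setvbuf(stdout, NULL, _IOLBF, BUFSIZ);

    printf("this line reaches the pipe immediately\n");
    return 0;
}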
The console output in C works almost the same way as a file. Once you have included stdio.h, you can write to the console output, named stdout (for "standard output"). In the end, the following statement:
printf("hello world!\n");
is the same as:
char str[] = "hello world\n";
fwrite(str, sizeof(char), sizeof(str) - 1, stdout);
fflush(stdout);
I wrote the code below:
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int main() {
    int fd = 3;
    char c[100] = "Testing\n";
    ssize_t nbytes = write(fd, (void *) c, strlen(c));
    return 0;
}
compiled/linked, and executed
$ ./io
$ ./io 3> io_3.txt
The first line produced no output. The second line gave me file io_3.txt containing Testing.
This is all expected behaviour (I guess).
Even though in my tests it produced the expected output, I am not certain whether, to avoid potential problems, undefined behavior, etc., I should do anything prior to the first write, like checking whether fd 3 is in use (and in that case, how... this may apply), whether it is suitably open, etc.
And I am not certain whether I should perform some action after the last write, for the same reasons.
Perhaps the way I did it is "non-risky", the only potential issue being that nothing is written, which I could detect by checking the value of nbytes... I wouldn't know.
Any clarification is welcome.
If you write a program like this, executing it without fd 3 open is a usage bug. Normally the only file descriptors that should be used by number without having opened them yourself are 0 (stdin), 1 (stdout), and 2 (stderr). If a program needs to take additional pre-opened file descriptors as input, the standard idiom is to pass the fd numbers on the command line or environment variables rather than hard-coding them. For example:
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2 || !isdigit(argv[1][0])) return 1;
    int fd = strtol(argv[1], 0, 0);
    char c[100] = "Testing\n";
    ssize_t nbytes = write(fd, (void *) c, strlen(c));
    return 0;
}
In practice, a trivial program like yours is probably safe with the write just failing if fd 3 wasn't open. But as soon as you do anything that might open file descriptors (possibly internal to the implementation, like syslog, or date/time functions opening timezone data, or message translation catalogs, etc.), it might happen that fd 3 now refers to such an open file, and you wrongly attempt a write to it. Using file descriptors like this is a serious bug.
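If you do want to detect that situation at run time, one possible approach (an illustration, not something the answer above requires) is to ask the kernel whether the descriptor is open with fcntl(2), and to check write()'s return value:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = 3;

    /* F_GETFD fails with EBADF if fd is not an open file descriptor. */
    if (fcntl(fd, F_GETFD) == -1) {
        fprintf(stderr, "fd %d is not open: %s\n", fd, strerror(errno));
        return 1;
    }

    const char *msg = "Testing\n";
    ssize_t nbytes = write(fd, msg, strlen(msg));
    if (nbytes == -1) {
        perror("write");
        return 1;
    }
    return 0;
}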
Recently I have been reading UNIX Systems Programming and doing some tests.
I'm confused about why, when I truncate the file with the truncate command in the terminal, the reading process's read of the same position returns 0, where before it returned the string "a". I thought the process caches the file once it opens it, because when I change the file content and then read, the result is the same, unchanged; so why does truncating the file affect the process immediately? UNIX Systems Programming says that the v-node includes the current file size, so the size isn't read from disk every time; it's cached in memory.
The reading process's source code:
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int fd = open("a.txt", O_RDONLY);
    char buf[1024];
    int n;
    for (int i = 0; i < 10; i++) {
        if (lseek(fd, 0, 0) == -1) {
            return 1;
        }
        n = read(fd, buf, 1);
        if (n < 0) {
            perror("error");
            return 1;
        }
        printf("%d\n", n);
        buf[n] = 0;
        printf("buf %s\n", buf);
        sleep(2);
    }
    return 0;
}
a.txt contains only one character:
a
Caching is usually intended to improve performance, without having any other detectable effects.
If the inode was cached by the process, as you describe, then modifications made by another process might not be visible by processes that had cached information. That would be bad.
In reality, disk blocks are cached, inodes are cached, various other structures might be cached, but in each case there is a single cache, in the kernel, which all processes share, so they all have a consistent view.
When one process truncates the file, the in-memory cache is updated, storing the new file size (which will eventually be written to disk, but probably not immediately). When the other process calls read again, the updated file size is read from the in-memory cache, not from disk.
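One way to watch that shared view directly (my own sketch; the file name matches the question) is to poll the size with fstat(2) from the reading process while running truncate from another shell:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("a.txt", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    /* The size comes from the kernel's shared inode cache, so a truncate
     * done by another process shows up on the very next call. */
    for (int i = 0; i < 10; i++) {
        struct stat st;
        if (fstat(fd, &st) == -1) {
            perror("fstat");
            return 1;
        }
        printf("size: %lld bytes\n", (long long)st.st_size);
        sleep(2);
    }
    close(fd);
    return 0;
}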
If I have a buffer which contains the data of a file, how can I get a file descriptor from it?
This is a question derived from how to untar file in memory
I wrote a simple example of how to make a file descriptor for a memory area:
#include <unistd.h>
#include <stdio.h>
#include <string.h>

char buff[] = "qwer\nasdf\n";

int main(){
    int p[2];
    pipe(p);
    if( !fork() ){
        // child: push the buffer into the pipe's write end
        for( int buffsize = strlen(buff), len = 0; buffsize > len; )
            len += write( p[1], buff + len, buffsize - len );
        return 0;
    }
    // parent: close the write end and read the pipe as a stdio stream
    close(p[1]);
    FILE *f = fdopen( p[0], "r" );
    char buff[100];
    while( fgets(buff, 100, f) ){
        printf("from child: '%s'\n", buff );
    }
    puts("");
}
Not possible in plain C. In plain C all file access happens via FILE * handles and these can only be created with fopen() and freopen() and in both cases must refer to a file path. As C tries to be as portable as possible, it limits I/O to the absolute bare minimum that probably all systems can support in some way.
If you have POSIX API available (e.g. Linux, macOS, iOS, FreeBSD, most other UNIX systems), you can use fmemopen():
char dataInMemory[] = "This is some data in memory";
FILE * fileDescriptor = fmemopen(dataInMemory, sizeof(dataInMemory), "r");
This is a true file handle that can be used with all of the C file API. It should also allow seeking, something not possible with pipes, as pipes do not support seeking (you can emulate forward seeking, but there is no way to ever seek backwards).
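A short usage sketch continuing that snippet (same names; the tokenising is only an example of ordinary stdio calls working on the memory stream):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char dataInMemory[] = "This is some data in memory";
    /* Despite the name used above, this is a FILE * stream, not a numeric
     * file descriptor. strlen() is used so the trailing '\0' is excluded. */
    FILE *fileDescriptor = fmemopen(dataInMemory, strlen(dataInMemory), "r");

    char word[32];
    while (fscanf(fileDescriptor, "%31s", word) == 1)
        printf("token: %s\n", word);

    fclose(fileDescriptor);
    return 0;
}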
You can't. Unlike C++, the C model of file I/O isn't open to extension.