I am trying to write several values to a FIFO pipe. The writes themselves work fine, but the program on the other end of the FIFO ends up reading some of the values before all of them have been written (I assume kernel scheduling isn't working in my favor). Below is roughly what my code looks like (it works well about half of the time):
write(out_fd, (void *)struct_1, sizeof(struct part_1));
write(out_fd, (void *)struct_2, sizeof(struct part_2));
write(out_fd, (void *)struct_3, sizeof(struct part_3));
What I assume is happening is that kernel scheduling interrupts my process somewhere between the sequential writes, while the program on the other end of the FIFO reads the values as they come in. When what it reads does not match the full set of values expected to be written at once, my program fails to operate correctly, because the data does not match the expected format.
Does anyone have any ideas on how all the writes can be done in bulk, i.e. preventing the scheduler from switching between the applications before all of the writes are done? My initial attempt was to malloc() enough space for all of the values, memcpy() each value to its respective offset in that buffer, and then write the whole buffer at once; however, this ended up causing heap memory corruption (after freeing everything), which only became apparent further down the line.
Any suggestions? Thank you.
A FIFO is just a stream of bytes; it provides no framing of any kind. In particular, there is no guarantee that read() will read the same chunks that write() wrote. Fix your receiver to handle this.
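For example, the receiver can loop until a full record has arrived before using it. A minimal sketch, assuming the same three structs as in the question (read_fully and in_fd are illustrative names, not part of the original code):

#include <unistd.h>
#include <errno.h>

/* Read exactly len bytes from fd into buf, looping over short reads.
   Returns 0 on success, -1 on error or premature EOF. */
static int read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted; just retry */
            return -1;             /* real error */
        }
        if (n == 0)
            return -1;             /* EOF before the record was complete */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Usage: the records are reassembled correctly no matter how the
   writes were chunked in transit:
       read_fully(in_fd, struct_1, sizeof(struct part_1));
       read_fully(in_fd, struct_2, sizeof(struct part_2));
       read_fully(in_fd, struct_3, sizeof(struct part_3));          */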
Related
Assume we have 2 threads in the process.
Now we run the following code in each thread:
read(fd, buf, 10);
where fd is some file descriptor which is shared among the threads (say static), and buf is an array which is not shared among the threads (local variable).
Now, assume that the file is 1 KB, the first 10 chars in the file are "AAAAAAAAAA", and all the rest are 'B's ("BBBBBB.....").
Now, if we have only one processor, what will the contents of the bufs be if I print them in each thread?
I know the answer is that one of the arrays will always contain only A's and the other only B's, but I don't fully understand why. I would think that a context switch could occur in the middle of this system call (read), and then both of the bufs would end up with A's in them.
Is it even possible for a context switch to occur in the middle of a system call? And if so, what do you think the bufs could contain at the end of the execution?
Modern disks cannot perform reads and writes at 10-byte granularity and instead perform reads and writes in units of sectors, traditionally 512 bytes for hard disk drives (HDDs).
Copying 10 characters to the thread's buffer happens very fast, typically before a context switch occurs, though this is not guaranteed.
A simple program to get a feel for this would be to have 2 threads printing to the same console, one printing '+' and the other '-'. Check how many '+' appear before the first '-', as in the sketch below.
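A rough sketch of that experiment using POSIX threads (compile with -pthread; the iteration count is arbitrary, and the interleaving will vary from run to run):

#include <pthread.h>
#include <stdio.h>

/* Each thread repeatedly prints its own character; the interleaving
   of '+' and '-' in the output reflects the scheduler's decisions. */
static void *printer(void *arg)
{
    char c = *(char *)arg;
    for (int i = 0; i < 1000; ++i)
        fputc(c, stdout);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    char plus = '+', minus = '-';
    pthread_create(&a, NULL, printer, &plus);
    pthread_create(&b, NULL, printer, &minus);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    fputc('\n', stdout);
    return 0;
}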
Anyway, back to the original question: change the size of the array to 1024, start the file with 1024 A's, and you will most probably see the difference.
Is there a portable way to discard a number of incoming bytes from a socket without copying them to userspace? On a regular file, I could use lseek(), but on a socket, it's not possible. I have two scenarios where I might need it:
A stream of records is arriving on a file descriptor (which can be a TCP socket, a SOCK_STREAM type UNIX domain socket, or potentially a pipe). Each record is preceded by a fixed-size header specifying its type and length, followed by data of variable length. I want to read the header first, and if it's not of the type I'm interested in, discard the following data segment without transferring it into a dummy buffer in user space.
A stream of records of varying and unpredictable length is arriving on a file descriptor. Due to the asynchronous nature of the stream, the records may still be incomplete when the fd becomes readable, or they may be complete but a piece of the next record may already be there when I try to read a fixed number of bytes into a buffer. I want to stop reading the fd at the exact boundary between records so I don't need to manage partially loaded records accidentally read from the fd. So I use recv() with the MSG_PEEK flag to read into a buffer, parse the record to determine its completeness and length, and then read again properly (thus actually removing the data from the socket) to the exact length. This copies the data twice; I want to avoid that by simply discarding the exact amount of data buffered in the socket.
On Linux, I gather it is possible to achieve that by using splice() and redirecting the data to /dev/null without copying them to userspace. However, splice() is Linux-only, and the similar sendfile() that is supported on more platforms can't use a socket as input. My questions are:
Is there a portable way to achieve this? Something that would work on other UNIXes (primarily Solaris) as well that do not have splice()?
Is splice()-ing into /dev/null an efficient way to do this on Linux, or would it be a waste of effort?
Ideally, I would love to have a ssize_t discard(int fd, size_t count) that simply removes count readable bytes from a file descriptor fd in the kernel (i.e. without copying anything to userspace), blocks on a blocking fd until the requested number of bytes has been discarded, and returns the number of successfully discarded bytes or fails with EAGAIN on a non-blocking fd, just like read() would do. And advances the seek position on a regular file, of course :)
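For reference, the copying fallback that such a function would replace might look like the sketch below; it reads the unwanted bytes into a throwaway buffer, which is exactly the userspace copy I'd like to avoid:

#include <unistd.h>
#include <errno.h>

/* Portable (copying) fallback: discard up to count bytes by reading
   them into a scratch buffer. Returns the number of bytes discarded,
   or -1 with errno set (including EAGAIN on a non-blocking fd),
   mirroring read() semantics. */
static ssize_t discard(int fd, size_t count)
{
    char scratch[4096];
    size_t done = 0;
    while (done < count) {
        size_t chunk = count - done;
        if (chunk > sizeof scratch)
            chunk = sizeof scratch;
        ssize_t n = read(fd, scratch, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted; retry */
            return done ? (ssize_t)done : -1;
        }
        if (n == 0)
            break;                         /* EOF */
        done += (size_t)n;
    }
    return (ssize_t)done;
}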
The short answer is No, there is no portable way to do that.
The sendfile() approach is Linux-specific, because on most other OSes implementing it, the source must be a file or a shared memory object. (I haven't even checked if, or in which Linux kernel versions, sendfile() from a socket descriptor to /dev/null is supported. I would be very suspicious of code that does that, to be honest.)
Looking at e.g. Linux kernel sources, and considering how little a ssize_t discard(fd, len) differs from a standard ssize_t read(fd, buf, len), it is obviously possible to add such support. One could even add it via an ioctl (say, SIOCISKIP) for easy support detection.
However, the problem is that you have designed an inefficient approach, and rather than fix the approach at the algorithmic level, you are looking for crutches that would make your approach perform better.
You see, it is very hard to show a case where the "extra copy" (from kernel buffers to userspace buffers) is an actual performance bottleneck. The number of syscalls (context switches between userspace and kernel space) sometimes is. If you sent a patch upstream implementing e.g. ioctl(socketfd, SIOCISKIP, bytes) for TCP and/or Unix domain stream sockets, they would point out that the performance increase this hopes to achieve is better obtained by not trying to obtain the data you don't need in the first place. (In other words, the way you are trying to do things, is inherently inefficient, and rather than create crutches to make that approach work better, you should just choose a better-performing approach.)
In your first case, a process receiving structured data framed by a type and length identifier, wishing to skip unneeded frames, is better fixed by fixing the transfer protocol. For example, the receiving side could inform the sending side which frames it is interested in (i.e., basic filtering approach). If you are stuck with a stupid protocol that you cannot replace for external reasons, you're on your own. (The FLOSS developer community is not, and should not be responsible for maintaining stupid decisions just because someone wails about it. Anyone is free to do so, but they'd need to do it in a manner that does not require others to work extra too.)
In your second case, you already read your data. Don't do that. Instead, use a userspace buffer large enough to hold two full-size frames. Whenever you need more data, but the start of the current frame is already past the midpoint of the buffer, memmove() the frame to the beginning of the buffer first.
When you have a partially read frame, and you have N unread bytes left from it that you are not interested in, read them into the unused portion of the buffer. There is always enough room, because you can overwrite the portion already used by the current frame, and its beginning is always within the first half of the buffer.
If the frames are small, say 65536 bytes maximum, you should use a tunable for the maximum buffer size. On most desktop and server machines, with high-bandwidth stream sockets, something like 2 MiB (2097152 bytes or more) is much more reasonable. It's not too much memory wasted, but you rarely do any memory copies (and when you do, they tend to be short). (You can even optimize the memory moves so that only full cachelines are copied, aligned, since leaving almost one cacheline of garbage at the start of the buffer is insignificant.)
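A minimal sketch of that buffering scheme (the names, the 65536-byte maximum frame size, and the parsing details are all illustrative; a real parser decides where each frame ends):

#include <string.h>
#include <unistd.h>

#define BUFSZ (2 * 65536)          /* room for two maximum-size frames */

struct stream {
    char   buf[BUFSZ];
    size_t start;                  /* offset of the current frame */
    size_t end;                    /* offset just past the buffered data */
    int    fd;
};

/* Make room and read more data after the current frame. If the frame
   starts past the midpoint, slide it back to the beginning first, so
   there is always unused space to read into without growing the buffer. */
static ssize_t fill(struct stream *s)
{
    if (s->start > BUFSZ / 2) {
        memmove(s->buf, s->buf + s->start, s->end - s->start);
        s->end -= s->start;
        s->start = 0;
    }
    ssize_t n = read(s->fd, s->buf + s->end, BUFSZ - s->end);
    if (n > 0)
        s->end += (size_t)n;
    return n;
}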
I do HPC with large datasets (including text-form molecular data, where records are separated by newlines, and custom parsers for converting decimal integers or floating-point values are used for better performance), and this approach does work well in practice. Simply put, skipping data already in your buffer is not something you need to optimize; it is insignificant overhead compared to simply avoiding doing the things you do not need.
There is also the question of what you wish to optimize by doing that: the CPU time/resources used, or the wall clock used in the overall task. They are completely different things.
For example, if you need to sort a large number of text lines from some file, you use the least CPU time if you simply read the entire dataset to memory, construct an array of pointers to each line, sort the pointers, and finally write each line (using either internal buffering and/or POSIX writev() so that you do not need to do a write() syscall for each separate line).
However, if you wish to minimize the wall clock time used, you can use a binary heap or a balanced binary tree instead of an array of pointers, and heapify or insert-in-order each line as soon as it is completely read, so that when the last line is finally read, you already have the lines in their correct order. This is because the storage I/O (for all but pathological input cases, something like single-character lines) takes longer than sorting them using any robust sorting algorithm! The sorting algorithms that work online (as data comes in) are typically not as CPU-efficient as those that work offline (on complete datasets), so this ends up using somewhat more CPU time; but because the CPU work is done at a time that is otherwise wasted waiting for the entire dataset to load into memory, the overall task is completed in less wall clock time!
If there is need and interest, I can provide a practical example to illustrate the techniques. However, there is absolutely no magic involved, and any C programmer should be able to implement these (both the buffering scheme and the sort scheme) on their own. (I do consider using resources like Linux man pages, and Wikipedia articles and pseudocode on, for example, binary heaps, to be doing it "on your own". As long as you do not just copy-paste existing code, I consider it doing it "on your own", even if somebody or some resource helps you find the good, robust ways to do it.)
I am having a really hard time understanding the depths of buffering, especially in C programming, and I have searched on this topic for a long time but haven't found anything satisfying so far.
I will be a little more specific:
I do understand the concept behind it (i.e., coordinating operations between different hardware devices and minimizing the difference in speed between these devices), but I would appreciate a fuller explanation of this and other potential reasons for buffering (and by full I mean the longer and deeper the better). It would also be really nice to see some concrete examples of how buffering is implemented in I/O streams.
The other question is that I noticed some buffer-flushing rules aren't followed by my programs, as weird as that sounds. Take the following simple fragment:
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    fputc('A', fp);
    getchar();
    fputc('A', fp);
    getchar();
    return 0;
}
The program is intended to demonstrate that impending input flushes any buffered stream immediately when the first getchar() is called, but this simply doesn't happen, however often I try it and whatever modifications I make. For stdout (with printf(), for example), the stream is flushed without any input being requested, which also contradicts the rule. So am I understanding this rule wrongly, or is there something else to consider?
I am using GNU GCC on Windows 8.1.
Update:
I forgot to ask: I have read on some sites that people refer to, e.g., string literals as buffers, or even arrays as buffers; is this correct, or am I missing something?
Please explain this point too.
The word buffer is used for many different things in computer science. In the more general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to the final destination (or other buffer).
As you hinted in the question there are many types of buffers, but as a broad grouping:
Hardware buffers: These are buffers where data is stored before being moved to a HW device, or where data is stored while being received from the HW device until it is processed by the application. This is needed because the I/O operation usually has memory and timing requirements, and these are fulfilled by the buffer. Think of DMA devices that read/write directly to memory: if the memory is not set up properly, the system may crash. Or sound devices that must have sub-microsecond precision or they will work poorly.
Cache buffers: These are buffers where data is grouped before writing into/read from a file/device so that the performance is generally improved.
Helper buffers: You move data into/from such a buffer, because it is easier for your algorithm.
Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1ms for just the call plus 1us for each byte (bear with me, things are more complicated in real world). Then, if you do:
FILE *f = fopen("file.txt", "w");
for (int i = 0; i < 1000000; ++i)
    fputc('x', f);
fclose(f);
Without buffering, this code would take 1000000 * (1ms + 1us); that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us), just about 1.1 seconds!
Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!
About your problem with flushing: files are usually flushed only when they are closed or when flushed manually. Some streams, such as stdout when connected to a terminal, are line-buffered, that is, they are flushed whenever a '\n' is written. Also, stdin and stdout are special: when you read from stdin, stdout is flushed first. Other files are untouched; this applies only to stdout. That is handy if you are writing an interactive program.
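You can also choose the buffering mode of a stream yourself with setvbuf(); a small sketch (the file name is arbitrary):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("log.txt", "w");
    if (!f)
        return 1;

    /* Line-buffered, like a terminal's stdout: the buffer is flushed
       whenever '\n' is written. _IOFBF (full buffering) and _IONBF
       (no buffering) are the other modes. */
    setvbuf(f, NULL, _IOLBF, BUFSIZ);

    fputs("first line\n", f);   /* flushed here because of the '\n' */
    fputs("partial", f);        /* stays in the buffer... */
    fclose(f);                  /* ...until the stream is closed */
    return 0;
}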
My case #3 is for example when you do:
FILE *f = fopen("x.txt", "r");
char buffer[1000];
fgets(buffer, sizeof(buffer), f);
int n;
sscanf(buffer, "%d", &n);
You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but in other APIs there may not be an equivalent function, and moreover you have more control this way: you can analyze the type of line, skip comments, count lines...
Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.
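A minimal sketch of that accumulate-until-Enter pattern:

#include <stdio.h>

int main(void)
{
    char line[256];
    size_t len = 0;
    int c;

    /* Accumulate one character at a time until Enter (or EOF),
       then process the whole line at once. */
    while ((c = getchar()) != EOF && c != '\n') {
        if (len + 1 < sizeof line)
            line[len++] = (char)c;
    }
    line[len] = '\0';
    printf("got line: \"%s\"\n", line);
    return 0;
}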
The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.
One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large sized chunks, whereas programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so you must record each whole chunk (in a buffer) or else lose the part you are not immediately ready to consume. Similar considerations apply to output.
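For instance, a program might pull one disk-friendly chunk into a buffer with a single read() and then consume it byte by byte; a rough sketch (the file name is a placeholder):

#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char buf[4096];                          /* one medium-sized chunk */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    ssize_t n = read(fd, buf, sizeof buf);   /* one large transfer */
    for (ssize_t i = 0; i < n; ++i) {
        /* consume buf[i] in whatever small pieces the program wants */
    }
    close(fd);
    return 0;
}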
As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.
The first question is a bit too broad. Buffering is used in many cases, including message storage before actual usage, DMA uses, speedup usages and so on. In short, the entire buffering thing can be summarized as "save my data, let me continue execution while you do something with the data".
Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.
So, please ask a more specific question. As a starting point, Wikipedia is almost always helpful; see its article on data buffers.
As for the code sample: I haven't found any mention of all output buffers being flushed upon getchar(). Buffers for files are generally flushed in three cases:
fflush() or equivalent
File is closed
The buffer becomes full.
Since none of these cases applies at the point where your program is waiting in getchar(), the file has not been flushed yet. (Note that normal termination, via exit() or returning from main(), does flush and close the streams; abnormal termination does not.)
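Applied to the program above, an explicit fflush() makes the character appear in the file while the program is still waiting in getchar(); a sketch:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    if (!fp)
        return 1;
    fputc('A', fp);
    fflush(fp);   /* case 1 from the list: 'A' is now really in the file */
    getchar();
    fputc('A', fp);
    getchar();
    fclose(fp);   /* case 2: closing flushes the rest */
    return 0;
}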
A buffer is a small area in your memory (RAM) that is responsible for storing information before it is sent to your program. As long as I am typing characters on the keyboard, they are stored in the buffer; as soon as I press the Enter key, they are transported from the buffer into your program. With the help of the buffer, all of these characters become available to your program at once (preventing lag and slowness) and can then be sent on to the output display screen.
For various reasons, I want to send pointers through a pipe (within one process, certainly not across processes).
E.g., some data should be recycled after everything is done, so the (pointer to the) structure is sent to a recycling thread via a pipe; or I want to send some structure from one worker thread to another. In these examples, no thread can afford to wait.
So, may I send pointers via pipes?
Think about the following scenario:
I write 8 bytes (a pointer, for example) to a pipe opened with O_NONBLOCK, but its buffer has only 7 bytes left, so the write() call returns 7. Then other threads might write another 8 bytes after more buffer space becomes available. When the reader thread reads from the pipe, it would get an invalid pointer: segmentation fault.
I send only pointers through this pipe. I've read some articles about pipes, and I noticed that writing or reading 8 bytes is an atomic operation. Is it confirmed that (buffer size of a pipe % sizeof(void*)) == 0, at least on x86_64 Linux? If so, can scenario (1) happen at all?
It would be quicker and easier to send the pointers directly. But if it is not safe, I will have to encode the pointers the way UTF-8 does, or think about other approaches.
Thanks a lot~
As long as you ensure that you are receiving 8 bytes before casting your buffer to a pointer, I don't see what is technically wrong with this. There's no point in doing any encoding.
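A sketch of both sides, with the receiver looping until all sizeof(void *) bytes have arrived before the value is used as a pointer (send_ptr and recv_ptr are illustrative names):

#include <unistd.h>

/* Sender: write the pointer value itself into the pipe. */
static int send_ptr(int wfd, void *p)
{
    return write(wfd, &p, sizeof p) == (ssize_t)sizeof p ? 0 : -1;
}

/* Receiver: accumulate exactly sizeof(void *) bytes before using it. */
static void *recv_ptr(int rfd)
{
    void *p;
    char *dst = (char *)&p;
    size_t got = 0;
    while (got < sizeof p) {
        ssize_t n = read(rfd, dst + got, sizeof p - got);
        if (n <= 0)
            return NULL;          /* error or EOF; no valid pointer */
        got += (size_t)n;
    }
    return p;
}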
However, there are certainly other, better ways of communicating pointers between threads. Note that using a pipe requires a system call (transition to kernel-mode) for each read and write.
For example, implement a queue in shared memory:
Designing a Queue to be a shared memory
A futex(2) will allow you to implement fast user-space locking (the definition of a futex!) around your data structures.
For pipes, writes smaller than the value of the macro PIPE_BUF are guaranteed to be atomic, i.e. they cannot be split up in the way you are concerned about. So your idea is safe.
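Since sizeof(void *) is far below PIPE_BUF (POSIX requires at least 512; Linux uses 4096), a single write of one pointer can never be split. Assuming C11, you can even assert this at compile time:

#include <limits.h>   /* PIPE_BUF */
#include <assert.h>   /* static_assert (C11) */

static_assert(sizeof(void *) <= PIPE_BUF,
              "pointer writes to a pipe would not be atomic");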
The writev function takes an array of struct iovec as input argument
ssize_t writev(int fd, const struct iovec *iov, int iovcnt);
The input is a list of memory buffers that need to be written to a file (say). What I want to know is:
Does writev internally do this:
for (each element in iov)
write(element)
such that every element of iov is written to file in a separate I/O call? Or does writev write everything to file in a single I/O call?
Per the standards, the for loop you mentioned is not a valid implementation of writev, for several reasons:
The loop could fail to finish writing one iov before proceeding to the next, in the event of a short write - but this could be worked around by making the loop more elaborate.
The loop could have incorrect behavior with respect to atomicity for pipes: if the total write length is smaller than PIPE_BUF, the pipe write is required to be atomic, but the loop would break the atomicity requirement. This issue cannot be worked around except by moving all the iov entries into a single buffer before writing when the total length is at most PIPE_BUF.
The loop might have cases where it could result in blocking, where the single writev call would be required to perform a partial write without blocking. As far as I know, this issue would be impossible to work around in the general case.
Possibly other reasons I haven't thought of.
I'm not sure about point #3, but it definitely exists in the opposite direction, when reading. Calling read in a loop could block if a terminal has some data (shorter than the total iov length) available followed by an EOF indicator; calling readv should return immediately with a partial read in this case. However, due to a bug in Linux, readv on terminals is actually implemented as a read loop in kernelspace, and it does exhibit this blocking bug. I had to work around this bug in implementing musl's stdio:
http://git.etalabs.net/cgi-bin/gitweb.cgi?p=musl;a=commit;h=2cff36a84f268c09f4c9dc5a1340652c8e298dc0
To answer the last part of your question:
Or does writev write everything to file in a single I/O call?
In all cases, a conformant writev implementation will be a single syscall. Getting down to how it's implemented on Linux: for ordinary files and for most devices, the underlying file driver has methods that implement iov-style io directly, without any sort of internal loop. But the terminal driver on Linux is highly outdated and lacks the modern io methods, causing the kernel to fallback to a write/read loop for writev/readv when operating on a terminal.
The most direct way to learn how code works is to read the source code.
see http://www.oschina.net/code/explore/glibc-2.9/sysdeps/posix/writev.c
It simply alloca()s or malloc()s a buffer, copies all the vectors into it, and calls write() once.
That's how it works. Nothing mysterious.
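A stripped-down version of that emulation (error handling beyond allocation failure is omitted; glibc's real code also chooses between alloca() and malloc() depending on the total size):

#include <sys/uio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Emulated writev: gather all the iovecs into one contiguous buffer
   and issue a single write(), as the glibc fallback does. */
static ssize_t my_writev(int fd, const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; ++i)
        total += iov[i].iov_len;

    char *buf = malloc(total);
    if (!buf && total > 0)
        return -1;

    char *p = buf;
    for (int i = 0; i < iovcnt; ++i) {
        memcpy(p, iov[i].iov_base, iov[i].iov_len);
        p += iov[i].iov_len;
    }

    ssize_t n = write(fd, buf, total);
    free(buf);
    return n;
}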
Or does writev write everything to file in a single I/O call?
I'm afraid not everything, though sys_writev tries its best to write everything in a single call. It depends on the VFS implementation: if the filesystem does not provide its own writev implementation, the kernel will call the VFS write() in a loop. So it's better to check the return value of writev()/readv() to see how many bytes were written, just as you do with write().
You can find the code for writev in the kernel, in fs/read_write.c:do_readv_writev().