I'm implementing a typical map-reduce architecture (for an O.S. class) and I'm free to decide how the master process will tell its N child processes to parse a log. So I'm stuck between these two possibilities:
1) count the number of rows and give X rows to each map, OR
2) each map reads the line matching its ID, and the next line to read = current_one + number_of_existent_maps
E.g.: with 3 maps, each one is going to read these lines:
Map1: 1, 4, 7, 10, 13
Map2: 2, 5, 8, 11, 14
Map3: 3, 6, 9, 12, 15
I have to do this in order to out-perform a single process that parses the entire log file, so the way I split the job between child processes has to be consistent with this objective.
Which one do you think is best? How can I do the scanf or fgets to adapt to 1) or 2)?
I would be happy with some example code for 2), because the fork/pipes are not my problem :P
RE-EDIT:
I'm not encouraged to use select() here, only between the map procs and the reduce process that will be monitoring the reads. I have some restrictions now:
I want each process to read total_lines/N lines. But it seems like I have to make the map procs open the file and then read their respective lines. So here are my doubts:
1- Is it bad, or even possible, to have every proc open the file simultaneously (or almost simultaneously)? Would that actually help speed things up?
2- If that isn't possible, I will have the parent open the file (instead of each child doing it) and send each map a struct with min and max limits; the map procs will then read whatever lines they are responsible for, process them, and give the reduce process a result (that part doesn't matter for the problem now).
How can I correctly divide the number of lines among the N maps and get them all reading at the same time? I think fseek() may be a good weapon, but I don't know HOW to use it. Help, please!
If I understood correctly, you want to have all processes reading lines from a single file. I don't recommend this, it's kinda messy, and you'll have to a) read the same parts of the file several times or b) use locking/mutex or some other mechanism to avoid that. It'll get complicated and hard to debug.
I'd have a master process read the file, and assign lines to a subprocess pool. You can use shared memory to speed this up, and reduce the need for data-copying IPC; or use threads.
As for examples, I answered a question about forking and IPC and gave a code snippet with an example function that forks and returns a pair of pipes for parent-child communication. Let me look that up (...) here it is =P Can popen() make bidirectional pipes like pipe() + fork()?
edit: I kept thinking about this =P. Here's an idea:
Have a master process spawn subprocesses with something similar to what I showed in the link above.
Each process starts by sending a byte up to the master to signal it's ready, and blocking on read().
Have the master process read a line from the file to a shared memory buffer, and block on select() on its children pipes.
When select() returns, read one of the bytes that signal readiness and send to that subprocess the offset of the line in the shared memory space.
The master process repeats (reads a line, blocks on select, reads a byte to consume the readiness event, etc.)
The children process the line in whatever way you need, then send a byte to the master to signal readiness once again.
(You can avoid the shared memory buffer if you want, and send the lines down the pipes, though it'll involve constant data-copying. If the processing of each line is computationally expensive, it won't really make a difference; but if the lines require little processing, it may slow you down).
I hope this helps!
edit 2 based on Newba's comments:
Okay, so no shared memory. Use the above model, only instead of sending down the pipe the offset of the line in the shared memory space, send the whole line. This may sound to you like you're wasting time when you could just read it from the file, but trust me, you're not. Pipes are orders of magnitude faster than reads from regular files on a hard disk, and if you want subprocesses to read directly from the file, you'll run into the problem I pointed out at the start of the answer.
So, master process:
Spawn subprocesses using something like the function I wrote (link above) that creates pipes for bidirectional communication.
Read a line from the file into a buffer (private, local, no shared memory whatsoever).
You now have data ready to be processed. Call select() to block on all the pipes that communicate you with your subprocesses.
Choose any of the pipes that have data available, read one byte from it, and then send the line you have waiting in the buffer down the corresponding pipe (remember, we had 2 per child process, one to go up, one to go down).
Repeat from step 2, i.e. read another line.
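To make that concrete, here is a minimal sketch of the master loop, assuming up[i]/down[i] are the per-child pipe ends created while spawning and nchildren is how many children were forked (an illustration, not a drop-in implementation):

/* Sketch of the master loop from steps 2-5 above. */
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

void master_loop(FILE *log, int nchildren, int up[], int down[])
{
    char line[4096];

    while (fgets(line, sizeof line, log) != NULL) {       /* step 2: read a line */
        fd_set rfds;
        int i, maxfd = -1;

        FD_ZERO(&rfds);
        for (i = 0; i < nchildren; i++) {                 /* step 3: block on all "up" pipes */
            FD_SET(up[i], &rfds);
            if (up[i] > maxfd)
                maxfd = up[i];
        }
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            break;

        for (i = 0; i < nchildren; i++) {
            if (FD_ISSET(up[i], &rfds)) {
                char ready;
                if (read(up[i], &ready, 1) == 1)          /* step 4: consume the readiness byte */
                    write(down[i], line, strlen(line));   /* ...and hand this child the line    */
                break;                                    /* step 5: go read the next line      */
            }
        }
    }
}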
Child processes:
When they start, they have a reading pipe and a writing pipe at their disposal. Send a byte down your writing pipe to signal the master process you are ready and waiting for data to process (this is the single byte we read in step 4 above).
Block on read(), waiting for the master process (that knows you are ready because of step 1) to send you data to process. Keep reading until you reach a newline (you said you were reading lines, right?). Note I'm following your model, sending a single line to each process at a time, you could send multiple lines if you wanted.
Process the data.
Return to step 1, i.e. send another byte to signal you are ready for more data.
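And a matching sketch of the child's side (in and out are the child's ends of the two pipes; names are illustrative only):

/* Steps 1-4 above, from the child's point of view. */
#include <unistd.h>

void child_loop(int in, int out)
{
    char buf[4096];

    for (;;) {
        char ready = 'R';
        ssize_t n = 0;

        write(out, &ready, 1);              /* step 1: tell the master we're idle */

        do {                                /* step 2: read until we see a newline */
            ssize_t r = read(in, buf + n, 1);
            if (r <= 0)
                return;                     /* master closed the pipe: we're done */
            n += r;
        } while (buf[n - 1] != '\n' && n < (ssize_t)sizeof buf - 1);
        buf[n] = '\0';

        /* step 3: process the line in buf however you need */
    }                                       /* step 4: loop back and signal readiness again */
}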
There you go, simple protocol to assign tasks to as many subprocesses as you want. It may be interesting to run a test with 1 child, n children (where n is the number of cores in your computer) and more than n children, and compare performances.
Whew, that was a long answer. I really hope I helped xD
Since each of the processes is going to have to read the file in its entirety (unless the log lines are all of the same length, which is unusual), there really isn't a benefit to your proposal 2.
If you are going to split up the work into 3, then I would do:
Measure (stat()) the size of the log file - call it N bytes.
Allocate the range of bytes 0..(N/3) to the first child.
Allocate the range of bytes (N/3)+1..2(N/3) to the second child.
Allocate the range of bytes 2(N/3)+1..end to the third child.
Define that the second and third children must synchronize by reading forward to the first line break after their start position.
Define that each child is responsible for reading to the first line break on or after the end of their range.
Note that the third child (last child) may have to do more work if the log file is growing.
Then the processes are reading independent segments of the file.
(Of course, with them all sharing the file, then the system buffer pool saves rereading the disk, but the data is still copied to each of the three processes, only to have each process throw away 2/3 of what was copied as someone else's job.)
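For illustration, here is roughly what one child might do with its byte range under this scheme; the parent would stat() the file and hand child i the range start = i*N/3, end = (i+1)*N/3 (buffer size and names are placeholders):

/* Each child owns every log line whose first byte falls inside [start, end).
   Children other than the first seek to start-1 and discard one (possibly
   partial) line, so a line straddling a boundary is handled exactly once. */
#include <stdio.h>

void process_range(const char *path, long start, long end)
{
    FILE *f = fopen(path, "r");
    char line[4096];

    if (f == NULL)
        return;

    if (start > 0) {
        fseek(f, start - 1, SEEK_SET);
        fgets(line, sizeof line, f);    /* synchronize: skip the previous child's last line */
    }

    while (ftell(f) < end && fgets(line, sizeof line, f) != NULL) {
        /* parse one complete log line in `line`; the last one may extend past `end` */
    }

    fclose(f);
}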
Another, more radical option:
mmap() the log file into memory.
Assign the children to different segments of the file along the lines described previously.
If you're on a 64-bit machine, this works pretty well. If your log files are not too massive (say 1 GB or less), you can do it on a 32-bit machine too. As the file size grows above 1 GB or so, you may start running into memory mapping and allocation issues, though you might well get away with it until you reach a size somewhat less than 4 GB (on a 32-bit machine). The other issue here is with growing log files. AFAIK, mmap() doesn't map extra memory as extra data is written to the file.
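If you go the mmap() route, a rough sketch might look like this (names are illustrative, error handling is trimmed, and the newline synchronization is the same idea as in the byte-range scheme above):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void process_segment(const char *path, int child, int nchildren)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0 || fstat(fd, &st) < 0)
        return;

    char *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED)
        return;

    size_t size  = (size_t)st.st_size;
    size_t start = (size_t)child * size / nchildren;
    size_t end   = (size_t)(child + 1) * size / nchildren;

    if (start > 0) {                        /* synchronize: all but the first child   */
        while (start < size && base[start - 1] != '\n')
            start++;                        /* skip to the first byte after a newline */
    }

    for (size_t pos = start; pos < end; ) {
        char *nl = memchr(base + pos, '\n', size - pos);
        size_t len = nl ? (size_t)(nl - (base + pos)) + 1 : size - pos;
        /* one complete log line is base[pos .. pos+len); it may cross `end` */
        pos += len;
    }

    munmap(base, size);
    close(fd);
}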
Use a master and slave queue pattern.
The master sets up the slaves which sit waiting on a queue for work items.
The master then reads the file line by line.
Each line then represents a work item that you put on the queue, together with a function pointer describing how to do the work.
One of the waiting slaves then takes the item off the queue.
A slave processes a work item.
When a slave has finished, it rejoins the work queue.
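For illustration, a minimal sketch of that pattern using POSIX threads as the slaves and a mutex/condition-variable protected queue (all names here are made up; error handling omitted):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct work_item {
    void (*fn)(char *line);              /* how to do the work          */
    char *line;                          /* the line read from the file */
    struct work_item *next;
};

static struct work_item *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;
static int done;

static void *slave(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL && !done)             /* sit waiting on the queue    */
            pthread_cond_wait(&queue_cond, &queue_lock);
        if (queue_head == NULL) {                       /* done and nothing left       */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        struct work_item *item = queue_head;            /* take the item off the queue */
        queue_head = item->next;
        pthread_mutex_unlock(&queue_lock);

        item->fn(item->line);                           /* process the work item       */
        free(item->line);
        free(item);
    }                                                   /* finished: rejoin the queue  */
}

static void enqueue(void (*fn)(char *), const char *line)
{
    struct work_item *item = malloc(sizeof *item);
    item->fn = fn;
    item->line = strdup(line);
    pthread_mutex_lock(&queue_lock);
    item->next = queue_head;                            /* LIFO, for brevity           */
    queue_head = item;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

The master would spawn the slaves with pthread_create(), call enqueue(do_the_work, line) for every line it reads from the file, and, when the file is exhausted, set done under the lock, pthread_cond_broadcast() the condition, and join the slaves.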
Assume we have 2 threads in the process.
now we run the following code:
read(fd, buf, 10);
where fd is some file descriptor which is shared among the threads (say static), and buf is an array which is not shared among the threads (local variable).
Now, assume that the file is 1 KB, the first 10 chars in the file are "AAAAAAAAAA" and all the rest are 'B's ("BBBBBB.....").
Now, if we have only one processor, what will the contents of the bufs be if I print them in each thread?
I know the answer is that one of the arrays will always have only A's and the other only B's, but I don't fully understand why, because I think there could be a context switch in the middle of this system call (read) and then both of the bufs would have A's in them.
Is it even possible for a context switch to occur in the middle of a system call? And if so, what do you think the bufs could contain at the end of the execution?
Modern disks cannot perform reads and writes at 10-byte granularity and instead perform reads and writes in units of sectors, traditionally 512 bytes for hard disk drives (HDDs).
Copying 10 characters into the thread's buffer happens very fast, usually before a context switch, though that is not guaranteed.
A simple program to get a feeling for this would be to have 2 threads printing to the same console, one printing '+' and the other '-'. Check how many '+'s appear before the first '-'.
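For what it's worth, a minimal version of that experiment (compile with -pthread):

#include <pthread.h>
#include <unistd.h>

static void *spam(void *arg)
{
    char c = *(char *)arg;
    for (int i = 0; i < 1000; i++)
        write(STDOUT_FILENO, &c, 1);   /* unbuffered: one character per system call */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    char plus = '+', minus = '-';

    pthread_create(&a, NULL, spam, &plus);
    pthread_create(&b, NULL, spam, &minus);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    write(STDOUT_FILENO, "\n", 1);
    return 0;
}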
Anyway, on to the original question: change the size of the array to 1024 and have 1024 A's to start with, and you will most probably see the difference.
I've read that pipes need to have a limited capacity. But I don't understand why. What happens if a process writes into a pipe without a limit?
It's due to buffering. Pipes are not "magical", pipes do not ensure all processes process each individual byte or character in lockstep. Instead pipes buffer inter-process output and then pass the buffer along. And this buffer size limit is what you're referring to. In many Linux distros and in macOS the buffer size is 64KiB.
Imagine there's a process that outputs 1 GB of data every second to stdout, and it's piped to another process that can only process 100 bytes of data every minute on stdin; those gigabytes of data have to go somewhere. If there were an infinitely sized buffer, then you would quickly fill up the memory space of whatever OS component owns the pipe and then start paging out to disk, and then your pagefile on disk would fill up, and that's not good.
By having maximum buffer sizes, the output process will be notified when it's filled the buffer and it's free to handle that event however is appropriate (e.g. by pausing output if it's a random number generator, by dropping data if it's a network monitor, by crashing, etc).
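On Linux you can even ask the kernel for a pipe's capacity and watch a writer run into it; here is a small sketch (F_GETPIPE_SZ is Linux-specific):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    char block[4096] = {0};
    long total = 0;

    if (pipe(fds) < 0)
        return 1;

    printf("pipe capacity: %d bytes\n", fcntl(fds[1], F_GETPIPE_SZ));

    /* make the write end non-blocking so a full pipe shows up as EAGAIN */
    fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL) | O_NONBLOCK);

    for (;;) {
        ssize_t n = write(fds[1], block, sizeof block);
        if (n <= 0)                    /* EAGAIN: the buffer is full; a blocking  */
            break;                     /* writer would simply sleep at this point */
        total += n;
    }
    printf("wrote %ld bytes before the pipe filled up\n", total);
    return 0;
}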
Internal mechanisms aside, I suspect the root issue behind the question is one of terminology. Pipes have limited capacity, but unlimited overall volume of data transferred.
The analogy to a piece of physical plumbing is pretty good: a given piece of water pipe has a characteristic internal volume defined by its length, its shape, and the cross section of its interior. At any given time, it cannot hold any more water than fits in that volume, so if you close a valve at its downstream end then water eventually (maybe immediately) stops flowing into its other end because all the available space within -- the pipe's capacity -- is full. Until and unless the pipe is permanently closed, however, there is no bound on how much water may be able to traverse it over its lifetime.
In my software I read information from a stream X (the stdout of another process) with process 1, then I send the information read to the other N-1 processes, and finally I collect in process 1 all the data processed by the N processes.
Now my question is: "What's the most efficient way to share the information read from the stream between processes?"
PS. Processes may also be on different computers connected through a network.
Here I list some possibilities:
Count the lines of the stream (M lines), save M/N lines to each of N files, and send 1 file to each process.
Count the lines of the stream (M lines), allocate enough memory to contain all the information, and send the information to each process directly.
But I think these can have some problems:
Writing so many files adds overhead, and sending files over a network isn't efficient at all.
I need enough memory in process 1, so that process can become a bottleneck.
What do you suggest, do you have better ideas?
I'm using MPI in C for this computation.
Using files is just fine if performance is not an issue. The advantage is that you keep everything modular, with the files as a decoupled interface. You can even use very simple command line tools:
./YOUR_COMMAND > SPLIT_ALL
split -n l/${N} -d SPLIT_ALL SPLIT_FILES
Set N in your shell or replace appropriately.
Note: Unfortunately you cannot pipe directly into split in this case, because it then cannot determine the total number of lines when reading from stdin. If round robin, rather than contiguous split is fine, you can pipe directly:
./YOUR_COMMAND | split -n r/${N} -d - SPLIT_FILES
Your second solution is also fine - if you have enough memory. Keep in mind to use appropriate collective operations, e.g. MPI_Scatter(v) for sending, and MPI_Gather or MPI_Reduce for receiving the data from the clients.
If you run out of memory, then buffer the input in chunks (of for instance 100,000 lines), and then scatter the chunks to your workers, compute, collect the result, and repeat.
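A hedged sketch of that chunked pattern with MPI in C (the fixed line width, the chunk size, and the per-line "work" are placeholders, and error handling is omitted):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_LEN        256     /* fixed width per line, '\0'-padded   */
#define LINES_PER_RANK  1000    /* lines handed to each rank per round */

int main(int argc, char **argv)
{
    int rank, nprocs, more = 1;
    long local = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *mine  = malloc((size_t)LINES_PER_RANK * LINE_LEN);   /* this rank's share  */
    char *chunk = (rank == 0)                                  /* root's read buffer */
                ? malloc((size_t)nprocs * LINES_PER_RANK * LINE_LEN) : NULL;

    while (more) {
        if (rank == 0) {                                 /* read one chunk of lines */
            memset(chunk, 0, (size_t)nprocs * LINES_PER_RANK * LINE_LEN);
            for (int i = 0; i < nprocs * LINES_PER_RANK; i++)
                if (!fgets(chunk + (size_t)i * LINE_LEN, LINE_LEN, stdin)) {
                    more = (i > 0);                      /* last (partial) round    */
                    break;
                }
        }
        MPI_Bcast(&more, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (!more)
            break;

        MPI_Scatter(chunk, LINES_PER_RANK * LINE_LEN, MPI_CHAR,
                    mine,  LINES_PER_RANK * LINE_LEN, MPI_CHAR,
                    0, MPI_COMM_WORLD);

        for (int i = 0; i < LINES_PER_RANK; i++)         /* placeholder "work":     */
            if (mine[(size_t)i * LINE_LEN] != '\0')      /* count non-empty lines   */
                local++;
    }

    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("processed %ld lines in total\n", total);

    free(chunk);
    free(mine);
    MPI_Finalize();
    return 0;
}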
My understanding of FUSE's multithreaded read cycle is something like this:
          .-> read --.
         /            \
open ---+----> read ---+---> release
         \            /
          `-> read --'
i.e., once a file is open'd, multiple read threads are spawned to read different chunks of the file. Then, when everything that was wanted has been read, there is a final, single release. All of these are per one's definition of open, read and release as FUSE operations.
I'm creating an overlay filesystem which converts one file type to another. Clearly, random access without any kind of indexing is a problem; so for the time being, I'm resorting to streaming. That is, in the above model, each read thread would begin the conversion process from the start, until it arrives at the correct bit of converted data to push out into the read buffer.
This is obviously very inefficient! To resolve this, a single file conversion process can start at the open stage and use a mutex and read cursor (i.e., "I've consumed this much, so far") that the reader threads can use to force sequential access. That is, the mutex gets locked by the thread that requests the data from the current cursor position and all other reader threads have to wait until it's their turn.
I don't see why this wouldn't work for streaming a file out. However, if any random/non-sequential access occurs we'll have a deadlock: if the requested offset is beyond or before the current cursor position, the mutex will never unlock for the appropriate reader to be able to reach that point. (Presumably we have no access to FUSE's threads with which to act as a supervisor. Also, I can't get around the problem by forcing the file to be a FIFO, as FUSE doesn't support writing to them.)
Indeed, I would only be able to detect when this happens if the mutex is locked and the cursor is beyond the requested offset (i.e., the "can't rewind" situation). In that case, I can return EDEADLK, but there's no way to detect "fast-forward" requests that can't be satisfied.
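For concreteness, here is roughly what that cursor check could look like, with convert_into()/convert_and_discard() standing in for whatever the converter actually does (all names are invented):

#include <errno.h>
#include <pthread.h>
#include <sys/types.h>

struct conv_stream {
    pthread_mutex_t lock;
    off_t cursor;                   /* converted bytes handed out so far */
    /* ... decoder state for the underlying file ... */
};

/* Assumed converter hooks, not real APIs. */
ssize_t convert_and_discard(struct conv_stream *s, size_t want);
ssize_t convert_into(struct conv_stream *s, char *buf, size_t size);

/* Called from the FUSE read handler: returns bytes copied, or -errno. */
ssize_t conv_read(struct conv_stream *s, char *buf, size_t size, off_t offset)
{
    ssize_t n;

    pthread_mutex_lock(&s->lock);

    if (offset < s->cursor) {               /* the "can't rewind" situation:    */
        pthread_mutex_unlock(&s->lock);     /* a forward-only stream can never  */
        return -EDEADLK;                    /* satisfy this request             */
    }

    while (s->cursor < offset) {            /* "fast-forward": convert and      */
        n = convert_and_discard(s, (size_t)(offset - s->cursor));
        if (n <= 0) {                       /* discard; if we hit EOF first,    */
            pthread_mutex_unlock(&s->lock); /* the requested offset is beyond   */
            return 0;                       /* the converted data               */
        }
        s->cursor += n;
    }

    n = convert_into(s, buf, size);         /* produce the next `size` bytes    */
    if (n > 0)
        s->cursor += n;

    pthread_mutex_unlock(&s->lock);
    return n;
}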
At the risk of the slightly philosophical question... What do I do?
I've got two processes communicating. The first one writes data to a pipe or a fifo (I've tried both) and the second one reads what's in it. The data sent is currently six floats which come directly from a sensor, hence the need to upload it often (10Hz).
The problem is the process reading data is somewhat "heavier" and it might not be able to read fast enough. The pipe will be full, and the reading will be delayed until the end of the program.
Because such a delay cannot be tolerated, I'd like to detect on the writing side whether the pipe is full (in which case, correct me if I'm wrong, the write will wait until there is enough room). If such detection is possible, how can I just clear the contents of the pipe so that the reading side immediately receives recent data and doesn't have to work through an entire pipe full of old stuff?
In short, is there a way to just empty a pipe of its data (not having to close and re-open would be a plus).
Thanks a lot,
This will reduce how often the pipe is full, but not eliminate it.
Paradigm change: adjust the reader to throw away a percentage of reads.
Writer
When the writer sees a full queue, the next number it writes is a special number (e.g. NaN), and then it writes the desired number.
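A rough sketch of that writer logic, assuming the write end of the pipe has been put into non-blocking mode so a full pipe shows up as EAGAIN instead of stalling the 10 Hz sensor loop (names are illustrative; samples produced while the pipe stays full are simply dropped):

#include <errno.h>
#include <math.h>
#include <unistd.h>

static int pending_marker;          /* set when a write found the pipe full */

int send_sample(int fd, float value)
{
    float marker = NAN;
    ssize_t n;

    if (pending_marker) {                           /* the next number we manage to  */
        if (write(fd, &marker, sizeof marker)       /* write is the NaN marker...    */
                != (ssize_t)sizeof marker)
            return 1;                               /* still full: drop this sample  */
        pending_marker = 0;
    }
    n = write(fd, &value, sizeof value);            /* ...followed by the real value */
    if (n < 0 && errno == EAGAIN) {
        pending_marker = 1;                         /* pipe just filled up           */
        return 1;
    }
    return n == (ssize_t)sizeof value ? 0 : -1;
}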
Reader
The reader is set to throw away P percent of numbers.
The reader reads a number - waiting as needed. If it is not a candidate to throw away, proceed as usual.
When the reader considers throwing away a number to meet its throw-away percentage, it first tests whether the pipe is empty. If so, the reader knows to reduce the percentage being thrown away and uses the number it first read. If the pipe is not empty, it reads the pipe again, throwing away the first number and using the second.
When the reader reads the special number, it knows it is not throwing away enough numbers and increases its percentage, then it reads the pipe again.
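And a matching sketch of the reader's throw-away policy, using poll() with a zero timeout to ask whether the pipe is currently empty (the 5% adjustment step is arbitrary; names are illustrative):

#include <math.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

static int skip_pct;                       /* 0..100, adjusted as we go */

static int read_float(int fd, float *out)
{
    return read(fd, out, sizeof *out) == (ssize_t)sizeof *out ? 0 : -1;
}

static int pipe_empty(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, 0) == 0;            /* nothing readable right now */
}

/* Returns 0 and fills *out with the next sample to actually use. */
int next_sample(int fd, float *out)
{
    if (read_float(fd, out) < 0)
        return -1;

    if (isnan(*out)) {                     /* writer saw a full pipe:          */
        if (skip_pct < 100)
            skip_pct += 5;                 /* throw away more from now on      */
        return read_float(fd, out);        /* the real value follows the mark  */
    }

    if (rand() % 100 < skip_pct) {         /* candidate for throwing away      */
        if (pipe_empty(fd)) {
            if (skip_pct > 0)
                skip_pct -= 5;             /* we're keeping up: back off and   */
            return 0;                      /* use the number we already read   */
        }
        return next_sample(fd, out);       /* discard it and use the next one  */
    }
    return 0;
}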
If the overall performance of the reader and writer sides does not vary too much, the reader will throw away a number every so often, balancing the swift writer against the plodding reader. If the tuning slightly favors an empty queue over a full one, the pipe will rarely fill and the reader will more often receive fresh numbers.