How can we read a whole file using multiple threads if the total number of lines is not given?
Maybe we could first calculate the total number of lines, but I don't think that's a good approach.
Can someone help?
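One common approach that avoids counting lines first is to split the file by byte offset and snap each slice to the next newline, so every line is handled exactly once. A rough C sketch of that idea (the file name and slice arithmetic are illustrative; thread creation is left out):

#include <stdio.h>

/* Process the lines whose first byte falls in [start, end).
   Assumes lines are shorter than the buffer. */
static void process_slice(const char *path, long start, long end)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return;
    fseek(fp, start > 0 ? start - 1 : 0, SEEK_SET);
    if (start > 0) {
        /* Skip the partial line at the boundary; the thread owning the
           previous slice finishes it. Starting one byte early means a
           line beginning exactly at `start` is kept by this thread. */
        int c;
        while ((c = fgetc(fp)) != EOF && c != '\n')
            ;
    }
    char line[4096];
    while (ftell(fp) < end && fgets(line, sizeof line, fp) != NULL) {
        /* ... handle one complete line ... */
    }
    fclose(fp);
}

Thread i would call process_slice(path, i * (filesize / nthreads), (i + 1) * (filesize / nthreads)), with the last thread's end set to filesize.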
Related
In my parallel program, there was a big matrix. Each process computed and stored a part of it. Then the program wrote the matrix to a file by letting each process write its own part of the matrix in the correct order. The output file is in "unformatted" form. But when I tried to read the file in a serial code (with the correct size of the big matrix allocated), I got an error which I don't understand.
My question is: in an MPI program, how do you produce a binary file identical to the serial version's output for a big matrix that is stored across different processes?
Here is my attempt:
! The root process writes its slice of psi first.
if (ThisProcs == RootProcs) then
   open(unit = file_restart%unit, file = file_restart%file, form = 'unformatted')
   write(file_restart%unit) psi
   close(file_restart%unit)
endif
#ifdef USEMPI
call mpi_barrier(mpi_comm_world, MPIerr)
#endif
! The remaining processes append their slices in rank order,
! with a barrier after each write to enforce the ordering.
do i = 1, NProcs - 1
   if (ThisProcs == i) then
      open(unit = file_restart%unit, file = file_restart%file, &
           form = 'unformatted', status = 'old', position = 'append')
      write(file_restart%unit) psi
      close(file_restart%unit)
   endif
#ifdef USEMPI
   call mpi_barrier(mpi_comm_world, MPIerr)
#endif
enddo
Psi is the big matrix; it is allocated as:
Psi(N_lattice_points, NPsiStart:NPsiEnd)
But when I tried to load the file in a serial code:
open(2, file=File1, form="unformatted")
read(2) psi

This fails with:

forrtl: severe (67): input statement requires too much data, unit 2

(I am using MSVS 2012 + Intel Fortran 2013.)
How can I fix the parallel part to make the binary file readable for the serial code? Of course one can combine them into one big matrix in the MPI program, but is there an easier way?
Edit 1
The two answers are really nice. I'll use access = "stream" to solve my problem. I also just figured out that I can use inquire to check whether a file is "sequential" or "stream".
This isn't a problem specific to MPI, but would also happen in a serial program which took the same approach of writing out chunks piecemeal.
Ignore the opening and closing for each process and look at the overall connection and transfer statements. Your connection is an unformatted file using sequential access. It's unformatted because you explicitly asked for that, and sequential because you didn't ask for anything else.
Sequential file access is based on records. Each of your write statements transfers out a record consisting of a chunk of the matrix. Conversely, your input statement attempts to read from a single record.
Your problem is that while you try to read the entire matrix from the first record of the file, that record doesn't contain the whole matrix. It doesn't contain anything like the correct amount of data. End result: "input statement requires too much data".
So, you need to either read in the data based on the same record structure, or move away from record files.
The latter is simple: use stream access
open(unit = file_restart%unit, file = file_restart%file, &
form = 'unformatted', access='stream')
Alternatively, read with a similar loop structure:
do i = 1, NProcs
   ! One read per record written; the slice bounds lo and hi are
   ! placeholders that must match what the i-th writer held.
   read(2) psi(:, lo:hi)
enddo
This of course requires understanding the correct slicing.
Alternatively, one can consider using MPI-IO for output, which is very similar to using stream output; read this back in with stream access. You can find more about this concept elsewhere on SO.
Fortran unformatted sequential writes to record files are not quite raw data: each write puts data before and after the record in a processor-dependent form. The size of your reads cannot exceed the record size of your writes. This means that if psi is written in two writes, you will need to read it back in two reads; you cannot read it all at once.
Perhaps the most straightforward option is to use stream access instead of sequential. A stream file is indexed by bytes (generally) and does not contain record start and end markers. Using this access method you can split the writes but read everything at once. Stream access is a Fortran 2003 feature.
If you stick with sequential access, you'll need to know how many MPI ranks wrote the file and loop over properly sized records to read the data as it was written. You could make the user specify the number of ranks or store that as the first record in the file and read that first to determine how to read the rest of the data.
If you are writing MPI, why not use MPI-IO? Each process calls MPI_File_set_view to set a subarray view of the file, then all processes collectively write the data with MPI_FILE_WRITE_ALL. This approach is likely to scale really well on big machines (though your approach will be fine up to, oh, maybe 100 processors).
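For illustration, here is a minimal C sketch of that pattern (the question's code is Fortran, but the same routines exist there; the matrix sizes and the equal column-block decomposition are assumptions made up for this example):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Invented sizes: nrows plays the role of N_lattice_points and each
       rank owns an equal block of columns, like psi(:, NPsiStart:NPsiEnd). */
    const int nrows = 1000, ncols = 4096;
    int mycols = ncols / nprocs;              /* assume it divides evenly */

    int gsizes[2] = { nrows, ncols };
    int lsizes[2] = { nrows, mycols };
    int starts[2] = { 0, rank * mycols };
    MPI_Datatype filetype;
    /* Column-major order so the file matches the global Fortran array. */
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_FORTRAN, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    double *psi = calloc((size_t)nrows * mycols, sizeof *psi);
    /* ... fill psi with this rank's slice ... */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "restart.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, psi, nrows * mycols, MPI_DOUBLE,
                       MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&filetype);
    free(psi);
    MPI_Finalize();
    return 0;
}

The file produced this way is raw data in global column-major order, so the serial Fortran code could read it in one shot, e.g. with access='stream'.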
I'd like to write a large array in C to a .csv file.
Would it be possible to write it in parallel, maybe using OpenMP?
The piece of code I'd like to parallelize is a typical file I/O operation.
Given a resultVector1 and a resultVector2 of size n:

FILE *fp = fopen("output.csv", "w+");
for (i = 0; i < n; i++) {
    fprintf(fp, "%f,%f\n", resultVector1[i], resultVector2[i]);
}
fclose(fp);
You are going to run into a number of problems trying to perform a parallel write to a single file.
w+ truncates an existing file to zero length before the write operations, or creates a new file. How are you going to coordinate the writing of parallel file pointers?
In any case, if you have multiple writers you will need to synchronize them, and you will lose any speed advantage you would have had over a sequential write. In fact, they will probably be slower, due to the synchronization overhead, than a single dedicated sequential write thread.
Thinking about your question a bit more: if you really had a huge array, say 500 million integers, and you really needed the fastest way to read/write it to a persistent file, you could divide the array by the number of dedicated threads you can allocate and write each segment to a separate file. You can then read the data back into your array by doing a parallel read. In this case you can use a parallel-for pattern and avoid the synchronization lock overhead you would have with a single file.
So in the example I gave, if you have 4 threads, you would divide the array into quarters, where each thread writes/reads its own quarter to and from its separate file.
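A minimal OpenMP sketch of that idea (the vector names and the per-thread file naming scheme are invented for illustration):

#include <omp.h>
#include <stdio.h>

/* Each thread writes its own slice of the two result vectors to its
   own file (output_0.csv, output_1.csv, ...). */
void parallel_dump(const double *v1, const double *v2, int n)
{
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
        int chunk = (n + nthreads - 1) / nthreads;   /* ceil(n/nthreads) */
        int lo = tid * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;

        char name[32];
        snprintf(name, sizeof name, "output_%d.csv", tid);
        FILE *fp = fopen(name, "w");
        if (fp != NULL) {
            for (int i = lo; i < hi; i++)
                fprintf(fp, "%f,%f\n", v1[i], v2[i]);
            fclose(fp);
        }
    }
}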
Note: if all the files are on the same disk drive, you may see some I/O slowdown due to the multiple simultaneous read/write operations going on at different parts of the disk. This effect can be mitigated if you are able to save each file to a different disk/server.
You could open 2 files and write each vector to its own file. This MIGHT help, but I won't bet on it; it would depend on the architecture of your platform, I think. Plus, if you need both in the same file, you still have to copy them together, which again takes time.
Also, the writes to the hard drive itself are probably the bottleneck here, so there is no need to speed up the way you fill the buffer going to the hard drive.
You might open two files on two different hard drives, but I still doubt this would give you a real speedup.
The question triggered me to write pread, a parallel read method implemented using the pthread library. Given the file size FILESIZE and the number of threads n, pread slices the input file into roughly equal chunks of size FILESIZE/n and assigns each chunk to a thread. Each thread then reads the file with fread from its own offset, in pieces of a predefined BUFFERSIZE, in parallel. You can find the implementation here.
This is an ongoing implementation; I'm still working on the parallel write side.
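In outline, the approach described above looks something like this (a sketch, not the linked code; BUFFERSIZE, the thread count, and the file name are placeholders):

#include <stdio.h>
#include <pthread.h>

#define BUFFERSIZE 4096
#define NTHREADS 4

typedef struct {
    const char *path;
    long offset;   /* where this thread's slice starts */
    long length;   /* how many bytes it should read    */
} slice_t;

static void *read_slice(void *arg)
{
    slice_t *s = arg;
    FILE *fp = fopen(s->path, "rb");   /* private handle per thread */
    if (fp == NULL)
        return NULL;
    fseek(fp, s->offset, SEEK_SET);
    char buf[BUFFERSIZE];
    long left = s->length;
    while (left > 0) {
        size_t want = left < BUFFERSIZE ? (size_t)left : BUFFERSIZE;
        size_t got = fread(buf, 1, want, fp);
        if (got == 0)
            break;
        /* ... process got bytes ... */
        left -= (long)got;
    }
    fclose(fp);
    return NULL;
}

int main(void)
{
    const char *path = "input.dat";    /* assumption */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 1;
    fseek(fp, 0, SEEK_END);
    long filesize = ftell(fp);
    fclose(fp);

    pthread_t tid[NTHREADS];
    slice_t s[NTHREADS];
    long chunk = filesize / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        /* Last thread picks up the remainder. */
        s[i] = (slice_t){ path, i * chunk,
                          i == NTHREADS - 1 ? filesize - i * chunk : chunk };
        pthread_create(&tid[i], NULL, read_slice, &s[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}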
In my software, I read information from a stream X (the stdout of another process) with process 1, then I send the information read to the other N-1 processes, and finally I collect in process 1 all the data elaborated by the N processes.
Now my question is: "What's the most efficient way to share the information read from the stream between processes?"
PS. Processes may also be on different computers connected through a network.
Here I list some possibilities:

1. Count the lines of the stream (M lines), save M/N lines to each of N files, and send one file to each process.
2. Count the lines of the stream (M lines), allocate enough memory to contain all the information, and send each process its share directly.
But I think each of these has a problem:

1. Writing so many files can be an overhead, and sending files over a network isn't efficient at all.
2. Process 1 needs enough memory to hold everything, so that process can become a bottleneck.
What do you suggest, do you have better ideas?
I'm using MPI in C for this computation.
Using files is just fine if performance is not an issue. The advantage is that you keep everything modular, with the files as a decoupled interface. You can even use very simple command line tools:
./YOUR_COMMAND > SPLIT_ALL
split -n l/${N} -d SPLIT_ALL SPLIT_FILES
Set N in your shell or replace appropriately.
Note: unfortunately you cannot pipe directly into split in this case, because split cannot determine the total number of lines when reading from stdin. If a round-robin rather than a contiguous split is fine, you can pipe directly:
./YOUR_COMMAND | split -n r/${N} -d - SPLIT_FILES
Your second solution is also fine - if you have enough memory. Keep in mind to use appropriate collective operations, e.g. MPI_Scatter(v) for sending, and MPI_Gather or MPI_Reduce for receiving the data from the clients.
If you run out of memory, then buffer the input in chunks (of, for instance, 100,000 lines), scatter the chunks to your workers, compute, collect the results, and repeat, as in the sketch below.
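A minimal C sketch of that chunked scatter/gather loop (chunk size, the use of doubles as a stand-in for parsed lines, and the placeholder computation are all assumptions):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 100000   /* values per round (assumption) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int per_rank = CHUNK / nprocs;           /* assume it divides evenly */
    double *all = NULL, *out = NULL;
    if (rank == 0) {
        all = malloc(CHUNK * sizeof *all);
        out = malloc(CHUNK * sizeof *out);
    }
    double *mine = malloc(per_rank * sizeof *mine);

    int more = 1;
    while (more) {
        if (rank == 0) {
            /* Read the next chunk from the stream; reading doubles from
               stdin stands in for parsing lines. */
            int got = 0;
            while (got < CHUNK && scanf("%lf", &all[got]) == 1)
                got++;
            more = (got == CHUNK);   /* simplification: drops a ragged tail */
        }
        MPI_Bcast(&more, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (!more)
            break;

        MPI_Scatter(all, per_rank, MPI_DOUBLE,
                    mine, per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        for (int i = 0; i < per_rank; i++)
            mine[i] *= 2.0;                  /* placeholder computation */
        MPI_Gather(mine, per_rank, MPI_DOUBLE,
                   out, per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        /* rank 0: write `out` somewhere before the next round */
    }
    free(mine); free(all); free(out);
    MPI_Finalize();
    return 0;
}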
I'm wondering if there is a mechanism that reads a file while it is being written and simultaneously removes the content that has been read. The purpose of doing this is that the file is stored in memory (a ramdisk), and as the file size increases we need to remove the part that has already been processed.
Thanks a lot!!!
PS: I'm using Linux and Java for this. :)
Data cannot be removed from the beginning or middle of a file. Process the data using multiple files and erase them as they are consumed.
Reading from a file while it is being written to is no big deal; this is the purpose of every tail program. However, deleting already-read content of an open file... I don't think it is possible.
You may want to think of a workaround. For example, you can have a number of files {0,n} with the same limit of bytes to write to. Start writing file_i, where i is the highest available number out of {0,n}, and write up to the limit. Reading starts from the lowest available file_i, reads up to the limit, and when done deletes the file just consumed.
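A tiny C sketch of that rotation scheme (the naming scheme and size limit are invented; the question is Java, but the idea is language-independent):

#include <stdio.h>

#define LIMIT (1L << 20)   /* bytes per segment file (assumption) */

/* Writer side: roll to the next numbered segment once LIMIT is reached. */
static FILE *next_segment(int *idx)
{
    char name[64];
    snprintf(name, sizeof name, "segment_%05d.dat", (*idx)++);
    return fopen(name, "wb");
}

/* Reader side: after fully consuming segment idx, delete it to free
   the ramdisk space it occupied. */
static void segment_done(int idx)
{
    char name[64];
    snprintf(name, sizeof name, "segment_%05d.dat", idx);
    remove(name);
}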
We still haven't heard what OS our friend user2386567 is using, but as a counterpoint to the other answers declaring that it's impossible to delete data from the middle of a file, I'd like to point out that Linux has FALLOC_FL_PUNCH_HOLE for that exact purpose.
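For completeness, a minimal sketch of punching out a consumed region on Linux (this is the C interface; calling it from Java would need JNI or similar, and it only works on filesystems that support hole punching, tmpfs included):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/* Deallocate [start, start + len) from the file backing fd. The file
   size is unchanged and later offsets remain readable; PUNCH_HOLE must
   be combined with KEEP_SIZE. */
static int punch_consumed(int fd, off_t start, off_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  start, len) == -1) {
        perror("fallocate");
        return -1;
    }
    return 0;
}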
I am writing a multi-process Fibonacci number calculator. I have a file that keeps track of the Fibonacci numbers. The first process opens the file, writes the first two Fibonacci numbers (0 and 1), and forks. Its child process reads the last two numbers, adds them up, writes the next number into the file, closes the file, and forks again. The process continues like that, with each child adding up the numbers and writing the calculated number into the file. Using fork inside a for loop is not a good solution, and neither is a recursive call. Is there any suggestion for this problem?
Here is a link to the problem; we are talking about the multi-process part, which is part 2:
http://cse.yeditepe.edu.tr/~sbaydere/fall2010/cse331/files/assignments/F10A1.pdf
Assuming you're calculating them in a "simple" way (i.e. without using a cunning formula), I don't think it's a good candidate for parallel processing at all.
It's easy to come up with an O(n) solution, but each result depends on the previous one, so it's inherently tricky to parallelize. I can't see any benefit in your current parallel approach, as after each process has finished its own job and forked a child to get the next number, it's basically done... so you might as well do the work of the forked child in the existing process.
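For reference, the O(n) single-process version mentioned above is just a loop:

/* Iterative Fibonacci: fib(0) = 0, fib(1) = 1. */
static unsigned long long fib(int n)
{
    unsigned long long a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        unsigned long long next = a + b;
        a = b;
        b = next;
    }
    return a;   /* a == fib(n) */
}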
Fibonacci number calculation is a really strange candidate for multiprocessing. Indeed, to calculate a number, you need to know the previous two, so multiple processes cannot calculate anything but the next number, and only the next one. Multiple processes would all be calculating that same next Fibonacci number; at best, you'd be double-checking each other's work.
You might want to look at this article:
http://cgi.cse.unsw.edu.au/~dons/blog/2007/11/29
There are more ideas here:
http://www.haskell.org/haskellwiki/The_Fibonacci_sequence
probably this isn't solving your problem, but here is a trivial (exponential-time) way to calculate the n-th Fibonacci number:

int fibo(int n) { return (n <= 1) ? n : fibo(n - 1) + fibo(n - 2); }