Suppose I have some program that at each iteration i outputs some data, a,b,c.
What is the best way to write this data to a file, with regard to both speed and precision? What are the Fortran standard methods/best practices for writing this data to a file?
My current considerations are:
Writing all data to an array and then writing this array to file.
Writing line-by-line to unformatted binary file.
Writing line-by-line to text file.
I understand that writing to an array then to file will be faster, but what about the precision in each case? What about the situation where I do not know the value of i and so cannot initially define an array size?
Related
In Matlab, I need to generate and save over 8 million lines of data to a file.
I have been using fprintf, but as the file gets larger everything seems to slow down. I am not sure but I suspect fprintf is loading the whole file into memory.
Right now I have been iterating like this: calculate values, write to file, repeat.
Would it be better to say first calculate 20,000 values at a time, then write them all to file at once?
I understand datastore is a good way to analyze the data; can it also be used to help write it to the file? If so, how?
I am using Intel Fortran with Visual Studio 2008 SP1.
My main question is: I would like to read a 2D array from a MATLAB .mat file into Fortran, and also save Fortran 2D matrices to (preferably) a .mat file. Currently I can save to a text file using:
write(unit = #, fmt = '(<linelength>F22.8)') matrixname
This line works, but I am not sure whether I lose any of my double precision. If I do not lose precision I can stick with it; otherwise I would need help. I will also need a way to read from a MATLAB file into Intel Fortran while keeping the precision. There are no characters in these arrays; they contain only numerical values.
I need to conserve the precision, since I am working with spherical functions, and they can be highly divergent.
MATLAB's internal ".mat" format may or may not be compressed, depending on the version. I think you do not want to use it for portable file transfer. (Having attempted to find good documentation on the subject, I wonder if @HPM was being sarcastic in his comment.)
A keep-it-simple approach for a single array is to simply exchange it as raw binary.
Example write in matlab:
a=[1. 2. ; 3. 4. ]
fileID = fopen('test.bin','w');
fwrite(fileID,a,'double');
fclose(fileID);
then in fortran
implicit none
double precision a(2,2)
open(unit=100,file='test.bin',access='stream',form='unformatted')
read(100) a
close(100)
Note that the data in the file is "flat": the reading program needs to know the array dimensions. You can of course write the dimensions to the file if you need to.
There are of course a number of potential portability issues with binary data (endianness in particular), but this will work in most cases, assuming you are reading and writing on the same hardware.
Is it possible to create an array in one program and then use it in other programs? The array I am looking to create is very large and its creation will take a while, so I don't want to make it anew every time I run the main program, but instead just use it after creating it once in the other program. Because of its size, I'm not sure whether printing it to a file and then reading it back in would also be quite inefficient?
It is an integer array of dimensions 1:300 000 and 100.
Long comment:
There are many formats in which you can save data: Fortran unformatted sequential, Fortran unformatted direct, Fortran unformatted stream, NetCDF, HDF5, VTK, ... It is really difficult to give a definitive answer here. We don't know how time-consuming the array is to compute, so we cannot judge whether saving it would be more time-consuming or not.
Definitely you should be looking for unformatted or binary formats.
Edit: your array is actually not that big. The saving and reading will be quick. Just use an unformatted file.
I have an array I've created which is of size: 256^3.
real*8, dimension(256,256,256) :: dense
open(unit=8,file=fname,form="unformatted")
write(8)dense(:,:,:)
close(8)
What would be the best way to write this out so Matlab can read it? I have some post processing I want to use.
I am using gfortran, so I can't use binary format :{ is this true? I set the form to "binary" and it isn't recognised. I don't have ifort installed either.
Write the array out using unformatted stream access. Stream access is the standard equivalent of binary. Stealing from IRO-bot's answer:
real(kind=kind(0.0d0)),dimension(256,256,256) :: dense
open(unit=8,file='test.dat',& ! Unformatted file, stream access
form='unformatted',access='stream')
write(unit=8) dense ! Write array
close(unit=8)
end
This is more than likely adequate and appropriate for your needs. Note, though, that for more convoluted or complicated output requirements, Matlab comes with a library of routines callable from a compiled language that allow you to write .mat files. Other libraries also exist that can facilitate this sort of data transfer - for example, HDF5.
Yes, you can write binary files using either stream access, as suggested by IanH, or direct access:
integer :: reclen
real(kind=kind(0.0d0)),dimension(256,256,256) :: dense
inquire(iolength=reclen)dense ! Inquire record length of the array dense
open(unit=8,file='test.dat',& ! Binary file, direct access
form='unformatted',access='direct',recl=reclen)
write(unit=8,rec=1)dense ! Write array into first record
close(unit=8)
end
Unless you specify the access attribute in the open statement, the file will be opened in sequential mode, which may be inconvenient for reading because each record is padded with information about the record length. By using direct access, you specify the record length explicitly, and in this case the size of the file written will be exactly 8*256^3 bytes, so assuming you know the array ordering and endianness, you can read it from your MATLAB script.
I'm working on creating a binary search algorithm in C that searches for a string in a .txt file. Each line is a string representing a stock ticker. Not being familiar with C, this is taking far too long. I have a few questions:
1.) Once I have opened a file using fopen, does it make more sense in terms of efficiency for the algorithm to step through the file using some function provided in the C library for scanning files, doing the compare directly from the file, or should I copy each line into an array and have the algorithm search the array?
2.) If I should compare directly from the file, what is the best way to step through it? Assume I have the number of lines in the file, is there some way to go directly to the middle line, scan the string and do the compare?
I'm sorry if this is too vague; I'm not too sure how to explain it better. Thanks for your time.
Unless your file is exceedingly big (> 2 GB), loading the file into memory prior to searching it is the way to go. If you cannot load the file into memory, you could hold the offset of each line in an int[] or (if the file contains too many lines...) create another binary file and write the offset of each line as an integer...
Having everything in memory is by far preferable, though.
You cannot binary search lines of a text-file without knowing the length of each line in advance, so you'll most likely want to read each line into memory at first (unless the file is very big).
But if your goal is only to search for a single given line as quickly as possible, you might as well just do linear search directly on the file. There's no point in getting O(log n) at the cost of a O(n) setup cost if the search is only done once.
Reading it all in with a bulk read and walking through it with pointers (to memory) is very fast. Avoid doing multiple I/O calls if you can.
I should also mention that memory mapped files can be very suitable for something like this. See mmap() if on Unix. This is definitely your best bet for really large files.
This is a great question!
The challenge of binary search is that its benefits come from being able to skip past half the elements at each step in O(1). This guarantees that, since you only do O(lg n) probes, the runtime is O(lg n). This is why, for example, you can do a fast binary search on an array but not on a linked list - in the linked list, finding the halfway point of the elements takes linear time, which dominates the time for the search.
When doing binary search on a file you are in a similar position. Since all the lines in the file might not have the same length, you can't easily jump to the nth line in the file given some number n. Consequently, implementing a good, fast binary search on a file will be a bit tricky. Somehow, you will need to know where each line starts and stops so that you can efficiently jump around in the file.
There are many ways you can do this. First, you could load all the strings from the file into an array, as you've suggested. This takes linear time, but once you have the array of strings in memory all future binary searches will be very fast. The catch is that if you have a very large file, this may take up a lot of memory, and could be prohibitively expensive. Consequently, another alternative might be not to store the actual strings in the array, but rather the offsets into the file at which each string occurs. This would let you do the binary search quickly - you could seek the file to the proper offset when doing a comparison - and for large strings can be much more space-efficient than the above. And, if all the strings are roughly the same length, you could just pad every line to some fixed size to allow for direct computation of the start position of each line.
If you're willing to expend some time implementing more complex solutions, you might want to consider preprocessing the file so that instead of having one string per line, instead you have at the top of the file a list of fixed-width integers containing the offsets of each string in the file. This essentially does the above work, but then stores the result back in the file to make future binary searches much faster. I have some experience with this sort of file structure, and it can be quite fast.
If you're REALLY up for a challenge, you could alternatively store the strings in the file using a B-tree, which would give you incredibly fast lookup times for each string by minimizing the number of disk reads that you need to do.
Hope this helps!
I don't see how you can do the compare directly from the file. You will have to have a buffer to store data read from disk and use that buffer. So the question doesn't really make sense; it is just impossible.
You cannot jump to a particular line in the file. Not unless you know the offset in bytes of the beginning of that line relative to the beginning of the file.
I'd recommend using mmap to map the file directly into memory and work with it as with a character array. The operating system makes the file operations (seeking, reading, writing) transparent to you, and you just work with it like a buffer in memory. Note that mmap is limited to 4 GB on 32-bit systems. But if the file is bigger than that, you probably need to ask why on earth such a big file is not in an indexed database.