FORTRAN: Save array and use in another programme - arrays

Is it possible to create an array in one programme and then use it in other programmes? The array I am looking to create is very large and its creation will take a while, so I don't want to make it anew every time I run the main programme; I would rather create it once in another programme and then just reuse it. Because of its size, I'm not sure whether printing it to a file and then reading it back in would also be quite inefficient.
It is an integer array of dimensions 1:300 000 and 100.

Long comment:
There are many formats in which you can save data: Fortran unformatted sequential, Fortran unformatted direct, Fortran unformatted stream, NetCDF, HDF5, VTK, ... It is really difficult to give a definite answer here. We don't know how time consuming the array is to compute, so we cannot judge whether saving and re-reading it would be more time consuming or not.
Definitely you should be looking for unformatted or binary formats.
Edit: your array is actually not that big (about 120 MB, assuming default 4-byte integers). The saving and reading will be quick. Just use an unformatted file.

Related

Writing data to large file in matlab

In Matlab, I need to generate and save over 8 million lines of data to a file.
I have been using fprintf, but as the file gets larger everything seems to slow down. I am not sure but I suspect fprintf is loading the whole file into memory.
Right now I have been iterating like this: calculate values, write to file, repeat.
Would it be better to say first calculate 20,000 values at a time, then write them all to file at once?
I understand data store is a good way to analyze the data, can it also be used to help write it to the file? If so, how?

Read Matlab 2D array to intel-fortran and write from intel-fortran to matlab file

I am using Intel Fortran with Visual Studio 2008 SP1.
My main question is: I would like to read a 2D array from a Matlab .mat file into Fortran. I would also like to save the output Fortran 2D matrices, preferably to a .mat file; currently I save them to a text file using:
write(unit = #, <linelength>F22.8>),matrixname
This line works, but I am not sure whether I lose any of my double precision. If I do not lose precision, I can stick with it; otherwise I would need help. I will then only need a way to read from a Matlab file into Intel Fortran while keeping the precision. There are no characters in these arrays; they contain only numerical values.
I need to conserve the precision, since I am working with spherical functions, and they can be highly divergent.
Matlab's internal ".mat" format may or may not be compressed, depending on the version, so I think you do not want to use it for portable file transfer. (Having attempted to find good documentation on the subject, I wonder if @HPM was being sarcastic in his comment.)
A keep it simple approach for a single array is to simply exchange as raw binary.
Example write in matlab:
a=[1. 2. ; 3. 4. ]
fileID = fopen('test.bin','w');
fwrite(fileID,a,'double');
fclose(fileID);
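Then in Fortran:
implicit none
double precision a(2,2)
! stream access reads the raw bytes, with no record markers expected
open(unit=100, file='test.bin', access='stream', form='unformatted', status='old')
read(100) a
close(100)
Note that the data here is actually "flat": the reading program needs to know the array dimensions. You can of course write the dimensions to the file if you need to. Since fwrite writes the elements in column order, which is also Fortran's storage order, the array comes back without transposition.
There are of course a number of potential portability issues with binary data, but this will work for most cases, assuming you are reading and writing on the same hardware.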

Most precise and efficient way to write data in Fortran

Suppose I have some program that at each iteration i outputs some data, a,b,c.
What is the best way to write this data to a file, with regards to both speed and precision? What are the Fortran standard methods/ best practices for writing this data to file?
My current considerations are:
Writing all data to an array and then writing this array to file.
Writing line-by-line to unformatted binary file.
Writing line-by-line to text file.
I understand that writing to an array then to file will be faster, but what about the precision in each case? What about the situation where I do not know the value of i and so cannot initially define an array size?

Standard format for writing Compressed Row Storage into text files?

I have a large sparse matrix stored in Compressed Row Storage (CRS) format. This is basically three arrays: an array containing the Values, an array for Column Index, and a final array containing the Row Pointers. E.g. http://web.eecs.utk.edu/~dongarra/etemplates/node373.html
I want to write this information into a text (.txt) file, which is intended to be read and put into three arrays using C. I currently plan to do this by writing all the entries in the Value array in one long line separated by commas. E.g. 5.6,10,456,78.2,... etc. Then do the same for the other two arrays.
My C code will then read the first line and put all the values into an array labeled "Value", and so on for the other two lines.
Question
Is this "correct"? Or is there a standard way of putting CRS data into text files?
No standard format that I'm aware of. You decide on a format that makes your life easy.
First, consider that if you want to look at one of these text files, you'll be instantly put off by the long lines. Some text editors might simply hate you. There's nothing wrong with splitting lines up.
Second, consider writing out the number of elements in each array (well, I suppose there's only two different array lengths for the three arrays) at the beginning of the file. This will let you preallocate your arrays. If you have all array lengths at hand, you have the option of doing a single memory allocation.
Finally, consider writing out some sensible tag names: some kind of header that identifies your file as being in the correct format, then something to denote the start of each array. It's kind of a sanity thing that lets your code detect problems with the file. It might just be one character, but it's something.
Now... call me a grungy old programmer, but I'd probably just write the whole lot in binary. Especially if it's floating point data, I wouldn't want to deal with the loss of precision you get when you write out numbers as text (or the space they can consume when you write them with full precision). Binary files are easy to write and quick to read. You just have to be careful if you're going to be using them across platforms with different endian order.
That's my 2 cents worth.. Hope it's useful to you.
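As a rough illustration of the suggestions above (a tag, the array lengths up front, then the raw arrays in binary), here is a minimal sketch in C. The file layout, the "CRS1" tag, the `CrsMatrix` struct, and the function names are all assumptions made for this example, not any standard format. It also assumes the file is written and read on machines with the same endianness and that `nnz` is greater than zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory CRS matrix (names are illustrative only). */
typedef struct {
    uint64_t nrows;   /* number of rows */
    uint64_t nnz;     /* number of stored values, assumed > 0 */
    double  *val;     /* nnz values */
    int32_t *col;     /* nnz column indices */
    int32_t *rowptr;  /* nrows + 1 row pointers */
} CrsMatrix;

/* Made-up 4-byte tag so a reader can reject files in another format. */
static const char CRS_TAG[4] = {'C', 'R', 'S', '1'};

/* Layout: tag, nrows, nnz, val[], col[], rowptr[]. Returns 0 on success. */
int crs_write(const char *path, const CrsMatrix *m)
{
    assert(path != NULL && m != NULL);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(CRS_TAG, 1, 4, f) == 4
          && fwrite(&m->nrows, sizeof m->nrows, 1, f) == 1
          && fwrite(&m->nnz, sizeof m->nnz, 1, f) == 1
          && fwrite(m->val, sizeof *m->val, m->nnz, f) == m->nnz
          && fwrite(m->col, sizeof *m->col, m->nnz, f) == m->nnz
          && fwrite(m->rowptr, sizeof *m->rowptr, m->nrows + 1, f)
                 == m->nrows + 1;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Reads the header first, so the arrays can be allocated at full size. */
int crs_read(const char *path, CrsMatrix *m)
{
    assert(path != NULL && m != NULL);
    char tag[4];
    memset(m, 0, sizeof *m);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int ok = fread(tag, 1, 4, f) == 4
          && memcmp(tag, CRS_TAG, 4) == 0
          && fread(&m->nrows, sizeof m->nrows, 1, f) == 1
          && fread(&m->nnz, sizeof m->nnz, 1, f) == 1;
    if (ok) {
        m->val    = malloc(m->nnz * sizeof *m->val);
        m->col    = malloc(m->nnz * sizeof *m->col);
        m->rowptr = malloc((m->nrows + 1) * sizeof *m->rowptr);
        ok = m->val && m->col && m->rowptr
          && fread(m->val, sizeof *m->val, m->nnz, f) == m->nnz
          && fread(m->col, sizeof *m->col, m->nnz, f) == m->nnz
          && fread(m->rowptr, sizeof *m->rowptr, m->nrows + 1, f)
                 == m->nrows + 1;
    }
    fclose(f);
    if (!ok) {
        free(m->val); free(m->col); free(m->rowptr);
        memset(m, 0, sizeof *m);
        return -1;
    }
    return 0;
}
```

A caller would fill a `CrsMatrix`, call `crs_write("matrix.bin", &m)`, and later recover it with `crs_read("matrix.bin", &m2)`, freeing the three arrays when done. Fixed-width integer types are used so the file does not depend on the size of `int` or `size_t`.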
If you want to stick to some widely-used standards, have a look at the Matrix Market. This is a repository with many matrices arising in a variety of engineering and science problems. You can find software libraries to save and read the matrices as well.

Creating a binary search of an alphabetically ordered .txt file in C

I'm working on creating a binary search algorithm in C that searches for a string in a .txt file. Each line is a string representing a stock ticker. Not being familiar with C, this is taking far too long. I have a few questions:
1.) Once I have opened a file using fopen, does it make more sense in terms of efficiency for the algorithm to step through the file using some function provided in the C library for scanning files, doing the compare directly from the file, or should I copy each line into an array and have the algorithm search the array?
2.) If I should compare directly from the file, what is the best way to step through it? Assume I have the number of lines in the file, is there some way to go directly to the middle line, scan the string and do the compare?
I'm sorry if this is too vague. Not too sure how to better explain. Thanks for your time
Unless your file is exceedingly big (> 2GB), loading the file into memory prior to searching it is the way to go. In case you cannot load the file into memory, you could hold the offset of each line in an int[] or (if the file contains too many lines...) create another binary file and write the offset of each line to it as integers...
Having everything in memory is by far preferable, though.
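A minimal sketch of the in-memory approach, assuming the file fits in memory, is sorted in `strcmp` order, and has one ticker per line. The `LineTable` struct and the function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Whole file in one buffer, plus a pointer to the start of each line. */
typedef struct {
    char  *data;   /* file contents, line endings replaced by '\0' */
    char **lines;  /* lines[i] points into data */
    size_t count;  /* number of lines */
} LineTable;

/* Reads the file with a single bulk fread and splits it in place. */
int load_lines(const char *path, LineTable *t)
{
    assert(path != NULL && t != NULL);
    t->data = NULL; t->lines = NULL; t->count = 0;

    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return -1; }
    rewind(f);

    t->data = malloc((size_t)size + 1);
    if (!t->data || fread(t->data, 1, (size_t)size, f) != (size_t)size) {
        fclose(f); free(t->data); t->data = NULL;
        return -1;
    }
    fclose(f);
    t->data[size] = '\0';

    /* Upper bound on the line count: number of '\n' plus one. */
    size_t max_lines = 1;
    for (long i = 0; i < size; i++)
        if (t->data[i] == '\n') max_lines++;
    t->lines = malloc(max_lines * sizeof *t->lines);
    if (!t->lines) { free(t->data); t->data = NULL; return -1; }

    char *p = t->data, *end = t->data + size;
    while (p < end) {
        char *nl = memchr(p, '\n', (size_t)(end - p));
        char *stop = nl ? nl : end;
        if (stop > p && stop[-1] == '\r') stop[-1] = '\0'; /* CRLF files */
        *stop = '\0';
        t->lines[t->count++] = p;
        p = stop + 1;
    }
    return 0;
}

/* Classic binary search over the half-open range [lo, hi). */
const char *find_line(const LineTable *t, const char *key)
{
    size_t lo = 0, hi = t->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(key, t->lines[mid]);
        if (c == 0) return t->lines[mid];
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;
}

void free_lines(LineTable *t)
{
    free(t->lines); free(t->data);
    t->lines = NULL; t->data = NULL; t->count = 0;
}
```

The load is done once; every later `find_line` call touches only memory, which is where the benefit over repeated file access comes from.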
You cannot binary search lines of a text-file without knowing the length of each line in advance, so you'll most likely want to read each line into memory at first (unless the file is very big).
But if your goal is only to search for a single given line as quickly as possible, you might as well just do a linear search directly on the file. There's no point in getting O(log n) search at the price of an O(n) setup cost if the search is only done once.
Reading it all in with a bulk read and walking through it with pointers (to memory) is very fast. Avoid doing multiple I/O calls if you can.
I should also mention that memory mapped files can be very suitable for something like this. See mmap() if on Unix. This is definitely your best bet for really large files.
This is a great question!
The challenge of binary search is that its benefits come from being able to skip past half the elements at each step in O(1). Since you only do O(lg n) probes, this guarantees that the runtime is O(lg n). This is why, for example, you can do a fast binary search on an array but not on a linked list: in the linked list, finding the halfway point of the elements takes linear time, which dominates the time for the search.
When doing binary search on a file you are in a similar position. Since all the lines in the file might not have the same length, you can't easily jump to the nth line in the file given some number n. Consequently, implementing a good, fast binary search on a file will be a bit tricky. Somehow, you will need to know where each line starts and stops so that you can efficiently jump around in the file.
There are many ways you can do this. First, you could load all the strings from the file into an array, as you've suggested. This takes linear time, but once you have the array of strings in memory all future binary searches will be very fast. The catch is that if you have a very large file, this may take up a lot of memory, and could be prohibitively expensive. Consequently, another alternative might be not to store the actual strings in the array, but rather the offsets into the file at which each string occurs. This would let you do the binary search quickly (you could seek the file to the proper offset when doing a comparison) and for large strings can be much more space-efficient than the above. And, if all the strings are roughly the same length, you could just pad every line to some fixed size to allow for direct computation of the start position of each line.
If you're willing to expend some time implementing more complex solutions, you might want to consider preprocessing the file so that, instead of having just one string per line, you have at the top of the file a list of fixed-width integers containing the offsets of each string in the file. This essentially does the above work, but then stores the result back in the file to make future binary searches much faster. I have some experience with this sort of file structure, and it can be quite fast.
If you're REALLY up for a challenge, you could alternatively store the strings in the file using a B-tree, which would give you incredibly fast lookup times for each string by minimizing the number of disk reads that you need to do.
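The offset-array idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the file is opened in binary mode, is sorted in `strcmp` order, and no line is longer than the made-up `MAX_LINE` limit; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256  /* assumed upper bound on line length, incl. newline */

/* One linear pass: record the byte offset at which each line starts.
   Returns a malloc'd array (caller frees) and stores its length in *count. */
long *build_offsets(FILE *f, size_t *count)
{
    assert(f != NULL && count != NULL);
    size_t cap = 1024, n = 0;
    char buf[MAX_LINE];
    long *offs = malloc(cap * sizeof *offs);
    if (!offs) return NULL;

    rewind(f);
    for (;;) {
        long pos = ftell(f);              /* start of the next line */
        if (!fgets(buf, sizeof buf, f)) break;
        if (n == cap) {                   /* grow the array when full */
            cap *= 2;
            long *tmp = realloc(offs, cap * sizeof *offs);
            if (!tmp) { free(offs); return NULL; }
            offs = tmp;
        }
        offs[n++] = pos;
    }
    *count = n;
    return offs;
}

/* Binary search that seeks to a line only when it has to compare it.
   Returns 1 if key is found, 0 if not, -1 on an I/O error. */
int file_bsearch(FILE *f, const long *offs, size_t count, const char *key)
{
    char buf[MAX_LINE];
    size_t lo = 0, hi = count;            /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fseek(f, offs[mid], SEEK_SET) != 0
            || !fgets(buf, sizeof buf, f))
            return -1;
        buf[strcspn(buf, "\r\n")] = '\0'; /* strip the line ending */
        int c = strcmp(key, buf);
        if (c == 0) return 1;
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return 0;
}
```

Only the offsets are held in memory (one `long` per line), and each search performs O(lg n) seeks and short reads instead of scanning the file.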
Hope this helps!
I don't see how you can do the comparison directly from the file. You will need a buffer to hold the data read from disk, and you will compare against that buffer. So it is not a matter of efficiency; it is simply impossible.
You cannot jump to a particular line in the file. Not unless you know the offset in bytes of the beginning of that line relative to the beginning of the file.
I'd recommend using mmap to map this file directly into memory and work with it as a character array. The operating system will make the file operations (seeking, reading, writing) transparent to you, and you will just work with the file as if it were a buffer in memory. Note that mmap is limited to 4 GB on 32-bit systems. But if the file is bigger than that, you probably need to ask why on earth someone keeps a file this big rather than an indexed database.
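A minimal sketch of the mmap approach, assuming a POSIX system, Unix line endings, and a file sorted in byte order. Instead of building an index, it binary-searches over byte positions and backs up to the start of the line it landed in. The function names are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Compare key with the line that starts at p and ends at '\n' or at end. */
static int cmp_line(const char *key, const char *p, const char *end)
{
    const char *nl = memchr(p, '\n', (size_t)(end - p));
    size_t len  = (size_t)((nl ? nl : end) - p);
    size_t klen = strlen(key);
    int c = memcmp(key, p, klen < len ? klen : len);
    if (c != 0) return c;
    return (klen > len) - (klen < len);  /* shorter string sorts first */
}

/* Returns 1 if key is a line of the file, 0 if not, -1 on error. */
int mmap_find(const char *path, const char *key)
{
    assert(path != NULL && key != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0)     { close(fd); return 0; }

    size_t size = (size_t)st.st_size;
    const char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                           /* the mapping stays valid */
    if (base == MAP_FAILED) return -1;

    const char *end = base + size;
    const char *lo = base, *hi = end;    /* both always at a line start */
    int found = 0;
    while (lo < hi) {
        const char *mid = lo + (hi - lo) / 2;
        while (mid > lo && mid[-1] != '\n')
            mid--;                       /* back up to the line start */
        int c = cmp_line(key, mid, end);
        if (c == 0) { found = 1; break; }
        if (c < 0) {
            hi = mid;                    /* key sorts before this line */
        } else {
            const char *nl = memchr(mid, '\n', (size_t)(end - mid));
            lo = nl ? nl + 1 : end;      /* continue after this line */
        }
    }
    munmap((void *)base, size);
    return found;
}
```

The operating system pages in only the parts of the file that the search touches, so a lookup in a large file reads a handful of pages rather than the whole thing.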
