creating an array of financial time series objects - arrays

I have a couple of fints, how do I preallocate a cell array so that I can loop through them later? I don't really care if they are stored as a cell array or array or anything different, I just want to be able to do the following
for i = 1:numel(stocks)
    figure(i);
    plot(stocks{i});
end
or something equivalent. Allocating with stocks = zeros(0,5) works fine at first, but fails when I try to insert the fints because MATLAB assumes the array holds doubles. How would you even go about preallocating arrays for financial time series objects, since each one you insert may have a different length?

From the MATLAB documentation on Preallocate Memory for a Cell Array:
Cell arrays do not require completely contiguous memory. However, each
cell requires contiguous memory, as does the cell array header that
MATLAB® creates to describe the array. For very large arrays,
incrementally increasing the number of cells or the number of elements
in a cell results in Out of Memory errors.
Initialize a cell array by calling the cell function, or by assigning
to the last element. For example, these statements are equivalent:
C = cell(25,50); C{25,50} = [];
MATLAB creates the header for a
25-by-50 cell array. However, MATLAB does not allocate any memory for
the contents of each cell.

Related

How to efficiently append items to large arrays in Swift?

I am working on a Swift project that involves very large, dynamically changing arrays. I am running into a problem where each successive operation takes longer than the previous one. I am reasonably sure this problem is caused by appending to the arrays, as I get the same problem with a simple test that just appends to a large array.
My Test Code:
import Foundation
func measureExecution(elements: Int, appendedValue: Int) -> Void {
    var array = Array(0...elements)
    //array.reserveCapacity(elements)
    let start = DispatchTime.now()
    array.append(appendedValue)
    let end = DispatchTime.now()
    print(Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000)
}
for i in 0...100 {
    measureExecution(elements: i*10000, appendedValue: 1)
}
This tries 100 different array sizes between 10000 and 1000000, timing how long it takes to append one item to the end of the array. As I understand it, Swift arrays are dynamic arrays that reallocate memory geometrically (allocating more and more memory each time a reallocation is needed), which Apple's documentation says should mean appending a single element to an array is an O(1) operation when averaged over many calls to the append(_:) method. As such, I don't think memory allocation is causing the issue.
However, there is a linear relationship between the length of the array and the time it takes to append an element. I graphed the times for a bunch of array lengths, and barring some outliers it is pretty clearly O(n). I also ran the same test with reserved capacity (commented out in the code block) to confirm that memory allocation was not the issue, and I got nearly identical results.
How do I efficiently append to the end of massive arrays (preferably without using reserveCapacity)?
From what I've read, Swift arrays pre-allocate storage. Each time you fill an Array's allocated storage, it doubles the space allocated. That way it doesn't perform a new memory allocation very often, and it also doesn't allocate a lot of space you don't need.
Array also has a reserveCapacity(_:) method. If you know how many elements you are going to store, you might want to try that.

Numpy concatenate is slow: any alternative approach?

I am running the following code:
for i in range(1000):
    My_Array = numpy.concatenate((My_Array, New_Rows[i]), axis=0)
The above code is slow. Is there any faster approach?
This is basically what is happening in all algorithms based on arrays.
Each time you change the size of the array, it needs to be resized and every element needs to be copied. This is happening here too. (Some implementations reserve some empty slots, e.g. by doubling the internal memory each time the array grows.)
If you have all your data when the np.array is created, just add it all at once (memory will then be allocated only once!).
If not, collect the pieces with something like a linked list (allowing O(1) append operations). Then read it into your np.array at once (again only one memory allocation).
This is not so much a numpy-specific topic as a question of data structures.
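The cost difference can be made concrete with a small counting model. The sketch below is plain Python and does not call NumPy. It only assumes, as described above, that every concatenation writes all elements of the result into freshly allocated memory. The function names and the row sizes are made up for illustration.

```python
def copies_when_growing(row_lengths):
    """Elements copied if the result is rebuilt after every new row."""
    total_copied = 0
    current_length = 0
    for n in row_lengths:
        current_length += n             # new array holds the old rows plus the new row
        total_copied += current_length  # every element is written into the new array
    return total_copied


def copies_when_joined_once(row_lengths):
    """Elements copied if all rows are collected first and joined in one step."""
    return sum(row_lengths)


rows = [10] * 1000  # 1000 rows of 10 elements each (hypothetical sizes)
growing = copies_when_growing(rows)
joined_once = copies_when_joined_once(rows)
print(growing, joined_once)
```

Under this model the loop-and-concatenate version copies about 500 times as many elements as the single join, and the gap widens as the number of rows grows.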
Edit: as this quite vague answer got some upvotes, I feel the need to make clear that my linked-list approach is one possible example. As indicated in the comments, Python's lists are more array-like (and definitely not linked lists). But the core fact is: list.append() in Python is fast (amortized O(1)), while that's not true for numpy arrays! There is also a small part about the internals in the docs:
How are lists implemented?
Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
(bold annotations by me)
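The over-allocation described in the quoted passage can be observed indirectly. This is a rough sketch that assumes CPython, where sys.getsizeof on a list reflects the currently allocated reference array. The exact growth pattern is an implementation detail and may differ between versions.

```python
import sys

items = []
resize_count = 0
last_size = sys.getsizeof(items)

for i in range(10_000):
    items.append(i)
    size = sys.getsizeof(items)
    if size != last_size:  # the underlying reference array was reallocated
        resize_count += 1
        last_size = size

print(f"{len(items)} appends, {resize_count} resizes")
```

The number of resizes should come out far smaller than the number of appends, which is what makes list.append() cheap on average.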
Maybe create an empty array with the correct size and then populate it?
If you have a list of arrays with the same dimensions, you could do
import numpy as np
arr = np.zeros((len(l),) + l[0].shape)
for i, v in enumerate(l):
    arr[i] = v
This works much faster for me, since it only requires one memory allocation.
It depends on what New_Rows[i] is, and what kind of array do you want. If you start with lists (or 1d arrays) that you want to join end to end (to make a long 1d array) just concatenate them all at once. Concatenate takes a list of any length, not just 2 items.
np.concatenate(New_Rows, axis=0)
or maybe use an intermediate list comprehension (for more flexibility)
np.concatenate([row for row in New_Rows])
or closer to your example.
np.concatenate([New_Rows[i] for i in range(1000)])
But if New_Rows elements are all the same length, and you want a 2d array, one New_Rows value per row, np.array does a nice job:
np.array(New_Rows)
np.array([i for i in New_Rows])
np.array([New_Rows[i] for i in range(1000)])
np.array is designed primarily to build an array from a list of lists.
np.concatenate can also build in 2d, but the inputs need to be 2d to start with. vstack and stack can take care of that. But all those stack functions use some sort of list comprehension followed by concatenate.
In general it is better/faster to iterate or append with lists, and apply the np.array (or concatenate) just once. appending to a list is fast; much faster than making a new array.
I think @thebeancounter's solution is the way to go.
If you do not know the exact size of your numpy array ahead of time, you can also take an approach similar to how the vector class (std::vector) is implemented in C++.
To be more specific, you can wrap the numpy ndarray into a new class which has a default size which is larger than your current needs. When the numpy array is almost fully populated, copy the current array to a larger one.
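A minimal sketch of such a wrapper follows. To keep it dependency-free it uses a plain Python list as the backing store. With NumPy the same idea would apply to an ndarray allocated once with np.zeros. The class name GrowableBuffer, the doubling factor, and the initial capacity are all assumptions made for illustration.

```python
class GrowableBuffer:
    """Fixed-capacity storage that doubles when full (hypothetical helper)."""

    def __init__(self, initial_capacity=8):
        self._data = [0.0] * initial_capacity  # preallocated slots
        self._size = 0                          # slots actually in use
        self.reallocations = 0

    def append(self, value):
        if self._size == len(self._data):
            # Full: allocate twice the capacity and copy the used part over.
            bigger = [0.0] * (2 * len(self._data))
            bigger[:self._size] = self._data
            self._data = bigger
            self.reallocations += 1
        self._data[self._size] = value
        self._size += 1

    def view(self):
        """Return only the populated part."""
        return self._data[:self._size]


buf = GrowableBuffer()
for k in range(1000):
    buf.append(float(k))
print(len(buf.view()), buf.reallocations)
```

Because the capacity doubles, 1000 appends trigger only a handful of reallocations instead of one per append.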
Assume you have a large list of 2D numpy arrays, with the same number of columns and different number of rows like this :
x = [numpy_array1(r_1, c),......,numpy_arrayN(r_n, c)]
concatenate like this:
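import numpy as np
# x is a list of 2D arrays that all have the same number of columns
while len(x) > 1:
    # Merge neighbours pairwise; an unpaired last array is carried over unchanged
    for i in range(0, len(x) - 1, 2):
        x[i] = np.concatenate((x[i], x[i + 1]))
    x = x[::2]
x = x[0]  # the single remaining array holds all rows in their original order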

Multi dimensional array with varying size

I want to make a 2D array "data" with the following dimensions: data(T,N)
T is a constant, and I don't know anything about N to begin with. Is it possible to do something like this in Fortran?
c = 0
do i = 1, T
    ! check a few flags
    if (all_flags_ok) then
        c = c + 1
        data(i, c) = some_value
    end if
end do
Basically I have no idea about the second dimension. Depending on some flags, if those flags are fine, I want to keep adding more elements to the array.
How can I do this?
There are several possible solutions. You could make data an allocatable array and guess the maximum value for N. As long as you don't exceed N, you keep adding data items. If a new item would exceed the array size, you create a temporary array, copy data to the temporary array, deallocate data and reallocate it with a larger dimension.
Another design choice would be to use a linked list. This is more flexible in that the length is indefinite. You lose "random access" in that the list is chained rather than indexed. You create a user-defined type that contains various data, e.g., scalars, arrays, whatever, and also a pointer. When you add a list item, the pointer points to that next item. This is possible in Fortran >= 90 since pointers are supported.
I suggest searching the web or reading a book about these data structures.
Assuming what you wrote is more-or-less how your code really goes, then you assuredly do know one thing: N cannot be greater than T. You would not have to change your do-loop, but you will definitely need to initialize data before the loop.
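The upper-bound idea can be sketched in a language-neutral way. The example below is Python rather than Fortran and is simplified to one stored value per accepted iteration. The flags_ok function and the value of T are placeholders, not part of the original question.

```python
T = 10  # number of loop iterations (hypothetical value)


def flags_ok(i):
    """Stand-in for the real flag checks: accept even i only."""
    return i % 2 == 0


values = [None] * T  # at most one value per iteration, so T slots suffice
c = 0                # number of slots filled so far
for i in range(1, T + 1):
    if flags_ok(i):
        values[c] = i * i  # placeholder for "some value"
        c += 1

values = values[:c]  # drop the unused tail
print(c, values)
```

In Fortran the analogous steps would be allocating the second dimension with extent T before the loop and using only the first c entries afterwards.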

How to increase array size on-the-fly in Fortran?

My program is running through a 3D array, labelling 'clusters' that it finds and then doing some checks to see if any neighbouring clusters have a label higher than the current cluster. There's a second array that holds the 'proper' cluster label. If it finds that the nth adjoining cluster is labelled correctly, that element is assigned to 0, otherwise it assigns it to the correct label (for instance if the nth site has label 2, and a neighbour is labelled 3, the 3rd element of the labelArray is set to 2). I've got a good reason to do this, honest!
All I want is to be able to assign the nth element of the labelArray on the fly. I've looked at allocatable arrays and declaring things as labelArray(*) but I don't really understand these, despite searching the web, and StackOverflow.
So any help on doing this would be awesome.
Here is a Stack Overflow question with some code examples showing several ways of using Fortran allocatable arrays: How to get priorly-unkown array as the output of a function in Fortran. It covers declaring, allocating, testing whether an array is already allocated, using the new move_alloc, and allocation on assignment. Not shown there is explicit deallocation, since the examples use move_alloc and automatic deallocation on exit of a procedure.
P.S. If you want to repeatedly add one element you should think about your data structure approach. Adding one element at a time by growing an array is not an efficient approach. To grow an array from N elements to N+1 in Fortran will likely mean creating a new array and copying all of the existing elements. A more appropriate data structure might be a linked list. You can create a linked list in Fortran by creating a user-defined type and using pointers. You chain the members together, pointing from one to the next. The overhead to adding another member is minor. The drawback is that it is easiest to access the members of the list in order. You don't have the easy ability of an array, using indices, to access the members in any order.
Info about linked lists in Fortran that I found on the web: http://www-uxsup.csx.cam.ac.uk/courses/Fortran/paper_12.pdf and http://www.iag.uni-stuttgart.de/IAG/institut/abteilungen/numerik/images/4/4c/Pointer_Introduction.pdf
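The trade-off described above (cheap appends, sequential access only) can be illustrated with a short sketch. It is written in Python rather than Fortran, and the class names Node and LinkedList are chosen here for illustration. A Fortran version would use a derived type with a pointer component instead.

```python
class Node:
    """One list member: a value plus a reference to the next member."""

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def append(self, value):
        """Add a member at the end without copying any existing members."""
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.length += 1

    def __iter__(self):
        """Walk the chain in order; there is no direct access by index."""
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


labels = LinkedList()
for label in (2, 0, 2, 3):
    labels.append(label)
print(list(labels))
```

Keeping a reference to the tail is what makes each append constant-time. Reaching the nth member still requires walking n links.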
If you declare an array allocatable, you use a deferred shape in the form
real, allocatable :: labelArray(:,:)
or
real, dimension(:,:), allocatable :: labelArray
with the number of colons giving the rank (the number of indexes) of your array.
If the array is unallocated you use
allocate(labelArray(shapeyouwant))
with the correct number of indexes. For example, allocate(labelArray(2:3,-1:5)) gives an array with indexes 2 to 3 in dimension 1 and -1 to 5 in dimension 2.
For change of dimension you have to deallocate the array first using
deallocate(labelArray)
To reallocate an allocated array to a new shape you first need to allocate a new array with the new shape, copy the existing array to the new array and move the reference of the old array to the new array using move_alloc().
allocate(tmp(size_old+n_enlarge))
tmp(1:size_old) = array(1:size_old)
call move_alloc(tmp, array)
The old array is deallocated automatically when the new array reference is moved by move_alloc().
Fortran 95 deallocates arrays automatically, if they fall out of scope (end of their subroutine for example).
Fortran 2003 has a nice feature of automatic allocation on assignment. If you say array1=array2 and array1 is not allocated, it is automatically allocated to have the correct shape.
It can also be used for re-allocation (see also Fortran array automatically growing when adding a value and How to add new element to dynamical array in Fortran 90)
labelArray = [labelArray, new_element]
Late comment... check Numerical Recipes for Fortran 90. They implemented a nice reallocate function that was Fortran 90 compliant. Your arrays must be pointer attributed in this case, not allocatable attributed.
The function receives the old array and desired size, and returns a pointer to the new resized array.
If at all possible, use Fortran 95 or 2003. If 2003 is impossible, then 95 is a good compromise. It provides better pointer syntax.

Preallocating arrays in Matlab?

I am using a simple for loop to crop a large amount of images and then storing them in a cell array. I keep getting the message:
The variable croppedSag appears to change size on every loop iteration. Consider preallocating for speed.
I have seen this several times before while coding in MATLAB. I have always ignored it and am curious how much preallocating will reduce the runtime if I have, say, 10,000 images or more?
Also, I have read about preallocating in the documentation and it says to use zeros() for that purpose. How would I use that for the code below?
croppedSag = {};
for i = 1:sagNum
    croppedSag{end+1} = imcrop(SagArray{i},rect);
end
I didn't quite follow the examples in the documentation.
Pre-allocating an array is always a good idea in Matlab. The alternative is to have an array which grows during each iteration through a loop. Each time an element is added to the end of the array, Matlab must produce a totally new array, copy the contents of the old array into the new one, and then, finally, add the new element at the end. Pre-allocating eliminates the need to allocate a new array and spend time copying the existing contents of the array into the new memory.
However, in your case, you might not see as much benefit as you might expect. When copying the cell array to a new, enlarged cell array, Matlab doesn't actually have to copy the contents of the cell array (the image data), but only pointers to that data.
Nonetheless, there is no reason not to pre-allocate (unless you actually don't know the final size in advance). Here's a pre-allocated version of your loop:
croppedSag = cell(1, sagNum);
for ii = 1:sagNum
    croppedSag{ii} = imcrop(SagArray{ii}, rect);
end
I also changed the index variable "i" to "ii" so that it doesn't over-write the imaginary unit.
You can also re-write this loop in one line using the cellfun function:
croppedSag = cellfun(@(im) imcrop(im, rect), SagArray, 'UniformOutput', false);
Here's a blog entry that might be informative:
Matlab - Speed up your Code by Preallocating the size of Arrays, Cells, and Structures
