I'm writing a neural network in Julia that tests random topologies. To save memory, I've left every index in an array of nodes that is not occupied by a node (but which may be under a future topology) undefined. When a node from an old topology is no longer connected to other nodes in a new topology, is there a way to uninitialize the index to which that node belongs? Also, are there any reasons not to do it this way?
local preSyns = Array(Vector{Int64}, (2,2))
println(preSyns)
preSyns[1] = [3]
println(preSyns)
The output
[#undef #undef
#undef #undef]
[[3] #undef
#undef #undef]
How do I make the first index undefined as it was during the first printout?
In case you don't believe me about the memory issue, take a look below.
function memtest()
    y = Array(Vector{Int64}, 100000000)
end

function memtestF()
    y = fill!(Array(Vector{Int64}, 100000000), [])
end
@time memtest()
@time memtestF()
Output
elapsed time: 0.468254929 seconds (800029916 bytes allocated)
elapsed time: 30.801266299 seconds (5600291712 bytes allocated, 69.42% gc time)
An uninitialized array takes 0.8 GB and an initialized one takes 5 GB.
My activity monitor also confirms this.
Undefined values are essentially null pointers, and there's no first-class way to "unset" an array element back to a null pointer. This becomes more difficult with very large arrays since you don't want to have much (or any) overhead for your sentinel values that represent unconnected nodes. On a 64 bit system, an array of 100 million elements takes up ~800MB for just the pointers alone, and an empty array takes up 48 bytes for its header metadata. So if you assign an independent empty array to each element, you end up with ~5GB worth of array headers.
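Working the arithmetic out, those figures line up with the @time output above:

100_000_000 elements × 8-byte pointer ≈ 0.8 GB for the uninitialized array
100_000_000 × (8-byte pointer + 48-byte array header) ≈ 5.6 GB for the initialized one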
The behavior of fill! in Julia 0.3 is a little flaky (it has been corrected in 0.4). If, instead of filling your array with [], you fill! it with an explicitly typed Int64[], every element will point to the same empty array. This means that your array and its elements will only take up 48 more bytes than the uninitialized array. But it also means that modifying that subarray for one of your nodes (e.g., with push!) will mean that all nodes gain that connection. This probably isn't what you want. You could still use the empty array as a sentinel value, but you'd have to be very careful not to modify it.
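A quick sketch of that aliasing trap, using the question's Julia 0.3 syntax:

shared = fill!(Array(Vector{Int64}, (2,2)), Int64[])
push!(shared[1], 3)   # meant to connect node 1 only...
shared[4]             # ...but every cell now holds [3]; they all alias one array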
If your array is going to be densely packed with subarrays, then there's no straightforward way around this overhead for the array headers. A more robust and forward-compatible way of initializing an array with independent empty arrays is with a comprehension: Vector{Int64}[Int64[] for i=1:2, j=1:2]. This will also be more performant in 0.3 as it doesn't need to convert [] to Int64[] for each element. If each element is likely to contain a non-empty array, you'll need to pay the cost of the array overhead in any case. To remove the node from the topology, you'd simply call empty!.
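For instance, a minimal sketch putting the comprehension and empty! together:

preSyns = Vector{Int64}[Int64[] for i=1:2, j=1:2]   # independent empty vectors
push!(preSyns[1], 3)   # node 1 gains a presynaptic connection
empty!(preSyns[1])     # node 1 is disconnected again, in place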
If, however, your array of nodes is going to be sparsely packed, you could try a different data structure that supports unset elements more directly. Depending upon your use-case, you could use a default dictionary that maps an index-tuple to your vectors (from DataStructures.jl; use a function to ensure that the "default" is a newly allocated and independent empty array every time), or try a package dedicated to topologies like Graphs.jl or LightGraphs.jl.
Finally, to answer the actual question you asked, yes, there is a hacky way to unset an element of an array back to #undef. This is unsupported and may break at any time:
function unset!{T}(A::Array{T}, I::Int...)
    isbits(T) && error("cannot unset! an element of a bits array")
    # View the array's memory as an array of raw pointers...
    P = pointer_to_array(convert(Ptr{Ptr{T}}, pointer(A)), size(A))
    # ...and null the chosen element so it reads as #undef again.
    P[I...] = C_NULL
    return A
end
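For instance, applied to the question's array (again, unsupported and liable to break):

unset!(preSyns, 1)   # preSyns[1] prints as #undef once more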
I am working on a Swift project that involves very large dynamically changing arrays. I am running into a problem where each successive operation takes longer than the last. I am reasonably sure this problem is caused by appending to the arrays, as I get the same problem with a simple test that just appends to a large array.
My Test Code:
import Foundation

func measureExecution(elements: Int, appendedValue: Int) -> Void {
    var array = Array(0...elements)
    //array.reserveCapacity(elements)
    let start = DispatchTime.now()
    array.append(appendedValue)
    let end = DispatchTime.now()
    print(Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000)
}

for i in 0...100 {
    measureExecution(elements: i*10000, appendedValue: 1)
}
This tries 100 different array sizes between 10000 and 1000000, timing how long it takes to append one item to the end of the array. As I understand it, Swift arrays are dynamic arrays that reallocate memory geometrically (allocating more and more memory each time a reallocation is needed), which Apple's documentation says should make appending a single element an O(1) operation when averaged over many calls to the append(_:) method (source). As such, I don't think memory allocation is causing the issue.

However, there is a linear relationship between the length of the array and the time it takes to append an element. I graphed the times for a number of array lengths, and barring some outliers it is pretty clearly O(n). I also ran the same test with reserved capacity (commented out in the code block) to confirm that memory allocation was not the issue, and I got nearly identical results.
How do I efficiently append to the end of massive arrays (preferably without using reserveCapacity)?
From what I've read, Swift arrays pre-allocate storage. Each time you fill an Array's allocated storage, it doubles the space allocated. That way you don't do a new memory allocation very often, and you also don't allocate a bunch of space you don't need.

Array does have a reserveCapacity(_:) method. If you know how many elements you are going to store, you might want to try that.
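For instance, a minimal sketch of reserving capacity up front (the element count is illustrative):

var array = [Int]()
array.reserveCapacity(1_000_000)   // one up-front allocation instead of repeated growth
for i in 0..<1_000_000 {
    array.append(i)                // amortized O(1), with no mid-loop reallocation
}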
I want to use reflect to get a slice's data as an array so I can manipulate it. For example:
func inject(data []int) {
    sh := (*reflect.SliceHeader)(unsafe.Pointer(&data))
    dh := (*[len(data)]int)(unsafe.Pointer(sh.Data))
    fmt.Printf("%v\n", dh)
}
This function produces a compile error because len(data) is not a constant. How should I fix it?
To add to @icza's comment, you can easily extract the underlying array by using &data[0]—assuming data is an initialized slice. IOW, there's no need to jump through the hoops here: the address of the slice's first element is actually the address of the first slot in the slice's underlying array—no magic here.
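As a sketch, the fix keeps the question's function name but drops reflect and unsafe entirely:

func inject(data []int) {
    p := &data[0]            // address of the first slot of the backing array
    fmt.Printf("%v\n", *p)   // the array's first element
}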
Since taking the address of an element of an array creates a reference to that memory—as far as the garbage collector is concerned—you can safely let the slice itself go out of scope without fear of that array's memory becoming inaccessible.

The only thing which you can't really do with the resulting pointer is passing around the result of dereferencing it. That's simply because arrays in Go have their length encoded in their type, so you'll be unable to create a function to accept such an array—because you do not know the array's length in advance.
Now please stop and think. After extracting the backing array from a slice, you have a pointer to the array's memory. To sensibly carry it around, you'll also need to carry around the array's length… but this is precisely what slices do: they pack the address of a backing array with the length of the data in it (and also the capacity).

Hence I really think you should reconsider your problem, as from where I stand it looks like a non-problem to begin with.

There are cases where wielding pointers to the backing arrays extracted from slices may help: for instance, when "pooling" such arrays (say, via sync.Pool) to reduce memory churn in certain situations. But these are concrete problems. If you have a concrete problem, please explain it, not your attempted solution to it—what @Flimzy said.
Update: I think I should explain the "the only thing which you can't really do with the resulting pointer is passing around the result of dereferencing it" bit.
A crucial point about arrays in Go (as opposed to slices) is that arrays—as everything in Go—are passed around by value, and for arrays that means their data is copied. That is, if you have

var a, b [8 * 1024 * 1024]byte
...
b = a

the statement b = a would really copy 8 MiB of data. The same obviously applies to arguments of functions.
Slices sidestep this problem by holding a pointer to the underlying (backing) array. So a slice value is a little struct type containing a pointer and two integers. Hence copying it is really cheap, but "in exchange" it has reference semantics: both the original value and its copy point to the same backing array—that is, reference the same data.
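As a sketch, that little struct looks roughly like the runtime's slice header (mirrored by reflect.SliceHeader):

type sliceHeader struct {
    Data uintptr // address of the backing array
    Len  int     // number of elements in use
    Cap  int     // total size of the backing array
}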
I really advise you to read these two pieces, in the indicated order:
https://blog.golang.org/go-slices-usage-and-internals
https://blog.golang.org/slices
It is said that an example of an O(1) operation is accessing an element in an array. According to one source, O(1) can be defined in the following manner:
[Big-O of 1] means that the execution time of the algorithm does not depend on the size of the input. Its execution time is constant.
However, if one wants to access an element in an array, doesn't the efficiency of the operation depend on the number of elements in the array? For example:
int[] arr = new int[1000000];
addElements(arr, 1000000); //A function which adds 1 million arbitrary integers to the array.
int foo = arr[55];
I don't understand how the last statement can be described as running in O(1); don't the 1,000,000 elements in the array have a bearing on the running time of the operation? Surely it'd take longer to find element 55 than element 1? If anything, this looks to me like O(n).

I'm sure my reasoning is flawed; I just wanted some clarification as to how this can be said to run in O(1).
An array is a data structure where objects are stored in contiguous memory locations. So in principle, if you know the address of the base object, you will be able to find the address of the ith object:

addr(a[i]) = addr(a[0]) + i*size(object)

This makes accessing the ith element of an array O(1).
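For instance, with a hypothetical base address and 4-byte ints:

addr(a[55]) = 0x1000 + 55*4 = 0x1000 + 220 = 0x10DC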
EDIT
Theoretically, when we talk about the complexity of accessing an array element, we talk about a fixed index i. The input size is O(n). To access the ith element, we compute addr(a[0]) + i*size(object). This term is independent of n, so it is said to be O(1). The multiplication still depends on i, but not on n; it is constant, O(1).
The address of an element in memory will be the base address of the array plus the index times the size of the element in the array. So to access that element, you just essentially access memory_location + 55 * sizeof(int).
This of course assumes multiplication takes constant time regardless of the size of its inputs, which is arguably incorrect if you are being very precise.
Finding an element isn't O(1), but accessing an element in the array has nothing to do with finding it. To be precise, you don't interact with other elements and you don't need to access anything but your single element: you always just calculate the address, regardless of how big the array is, and that is a single operation - hence O(1).
The machine code (or, in the case of Java, virtual machine code) generated for the statement
int foo = arr[55];
would be essentially:
Get the starting memory address of arr into A
Add 55 to A
Take the contents of the memory address in A, and put it in the memory address of foo
These three instructions all take O(1) time on a standard machine.
In theory, array access is O(1), as others have already explained, and I guess your question is more or less a theoretical one. Still I like to bring in another aspect.
In practice, array access will get slower if the array gets large. There are two reasons:
Caching: The array will not fit into cache or only into a higher level (slower) cache.
Address calculation: For large arrays, you need larger index data types (for example long instead of int). This will make address calculation slower, at least on most platforms.
If we say the subscript operator (indexing) has O(1) time complexity, we make this statement excluding the runtime of any other operations/statements/expressions/etc. So addElements does not affect the operation.
Surely it'd take longer to find element 55 than it would element 1?
"find"? Oh no! "Find" implies a relatively complex search operation. We know the base address of the array. To determine the value at arr[55], we simply add 551 to the base address and retrieve the value at that memory location. This is definitely O(1).
1 Since every element of an int array occupies at least two bytes (when using C), this is no exactly true. 55 needs to be multiplied by the size of int first.
Arrays store the data contiguously, unlike Linked Lists or Trees or Graphs or other data structures using references to find the next/previous element.
It is intuitive to you that the access time of the first element is O(1), yet you feel that the access time of the 55th element is O(55). That's where you go wrong. You know the address of the first element, so the access time to it is O(1). But you also know the address of the 55th element: it is simply the address of the 1st plus size_of_each_element*54. Hence you access that element, as well as any other element of the array, in O(1) time. And that is the reason why you cannot have elements of multiple types in an array: it would completely mess up the math for finding the address of the nth element. So, access to any element in an array is O(1), and all elements have to be of the same type.
I want to make a 2D array "data" with the following dimensions: data(T,N)
T is a constant, and I don't know anything about N to begin with. Is it possible to do something like this in Fortran?
do i = 1, T
  ! check a few flags
  if (all flags ok) then
    c = c + 1
    data(i,c) = some value
  end if
end do
Basically I have no idea about the second dimension. If those flags are fine, I want to keep adding more elements to the array.
How can I do this?
There are several possible solutions. You could make data an allocatable array and guess the maximum value for N. As long as you don't exceed N, you keep adding data items. If a new item would exceed the array size, you create a temporary array, copy data to the temporary array, deallocate data, reallocate it with a larger dimension, and copy the items back, as in the sketch below.
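A minimal sketch of that guess-and-grow pattern in Fortran 90 (all names are illustrative):

integer :: T, nguess, c
integer, allocatable :: data(:,:), tmp(:,:)
allocate(data(T, nguess))         ! nguess is the guessed maximum for N
! ... when item c would exceed the current size:
if (c > nguess) then
   allocate(tmp(T, nguess))
   tmp = data                     ! copy existing items to the temporary
   deallocate(data)
   allocate(data(T, 2*nguess))    ! reallocate with a larger dimension
   data(:, 1:nguess) = tmp        ! copy the items back
   deallocate(tmp)
   nguess = 2*nguess
end if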
Another design choice would be to use a linked list. This is more flexible in that the length is indefinite. You lose "random access" in that the list is chained rather than indexed. You create a user-defined type that contains various data, e.g., scalars, arrays, whatever, and also a pointer. When you add a list item, the pointer points to that next item. This is possible in Fortran >=90, since pointers are supported; a minimal node type is sketched below.
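For instance (Fortran 95 syntax, names illustrative):

type node
   integer :: value                        ! whatever data each item carries
   type(node), pointer :: next => null()   ! link to the next item in the chain
end type node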
I suggest searching the web or reading a book about these data structures.
Assuming what you wrote is more-or-less how your code really goes, then you assuredly do know one thing: N cannot be greater than T. You would not have to change your do-loop, but you will definitely need to initialize data before the loop.
My program is running through a 3D array, labelling 'clusters' that it finds and then doing some checks to see if any neighbouring clusters have a label higher than the current cluster. There's a second array that holds the 'proper' cluster label. If it finds that the nth adjoining cluster is labelled correctly, that element is assigned 0; otherwise it assigns it the correct label (for instance, if the nth site has label 2 and a neighbour is labelled 3, the 3rd element of labelArray is set to 2). I've got a good reason to do this, honest!

All I want is to be able to assign the nth element of labelArray on the fly. I've looked at allocatable arrays and declaring things as labelArray(*), but I don't really understand these, despite searching the web and Stack Overflow.
So any help on doing this would be awesome.
Here is a Stack Overflow question with some code examples showing several ways of using Fortran allocatable arrays: How to get priorly-unknown array as the output of a function in Fortran: declaring, allocating, testing whether an array is already allocated, using the new move_alloc, and allocation on assignment. Not shown there is explicit deallocation, since the examples use move_alloc and automatic deallocation on exit from a procedure.
P.S. If you want to repeatedly add one element you should think about your data structure approach. Adding one element at a time by growing an array is not an efficient approach. To grow an array from N elements to N+1 in Fortran will likely mean creating a new array and copying all of the existing elements. A more appropriate data structure might be a linked list. You can create a linked list in Fortran by creating a user-defined type and using pointers. You chain the members together, pointing from one to the next. The overhead to adding another member is minor. The drawback is that it is easiest to access the members of the list in order. You don't have the easy ability of an array, using indices, to access the members in any order.
Info about linked lists in Fortran that I found on the web: http://www-uxsup.csx.cam.ac.uk/courses/Fortran/paper_12.pdf and http://www.iag.uni-stuttgart.de/IAG/institut/abteilungen/numerik/images/4/4c/Pointer_Introduction.pdf
If you declare an array allocatable, you use deferred shape, in the form

real, allocatable :: labelArray(:,:)

or

real, dimension(:,:), allocatable :: labelArray

with the number of colons indicating the rank (the number of indexes) of your array.
If the array is unallocated you use
allocate(labelarray(shapeyouwant))
with the correct number of indexes. For example, allocate(labelarray(2:3,-1:5)) for an array with indexes 2 to 3 in dimension 1 and -1 to 5 in dimension 2.
To change the dimensions, you have to deallocate the array first, using
deallocate(labelArray)
To reallocate an allocated array to a new shape, you first allocate a new array with the new shape, copy the existing array into it, and then move the reference of the old array name to the new array using move_alloc():

allocate(tmp(size_old+n_enlarge))
tmp(1:size_old) = array(1:size_old)
call move_alloc(tmp, array)
The old array is deallocated automatically when the new array reference is moved by move_alloc().
Fortran 95 deallocates allocatable arrays automatically if they go out of scope (at the end of their subroutine, for example).
Fortran 2008 has a nice feature of automatic allocation on assignment. If you say array1=array2 and array1 is not allocated, it is automatically allocated to have the correct shape.
It can also be used for re-allocation (see also Fortran array automatically growing when adding a value and How to add new element to dynamical array in Fortran 90)
labelArray = [labelArray, new_element]
Late comment... check Numerical Recipes for Fortran 90. They implemented a nice reallocate function that is Fortran 90 compliant. Your arrays must have the pointer attribute in this case, not the allocatable attribute.

The function receives the old array and the desired size, and returns a pointer to the new, resized array.
If at all possible, use Fortran 95 or 2003. If 2003 is impossible, then 95 is a good compromise. It provides better pointer syntax.