Network of dependent values - how to only recalculate them once each? - arrays

I'm hoping one of you guys can figure this out.
I have an array containing lots of objects. Each object in the array contains two things:
A value which can change.
A list of zero or more of the other objects in my array; if any of their values change, this object needs to recalculate its own value. This can cascade many times from object to object, but there is no looping of dependencies.
I believe this is called a network (like a tree, but with multiple parents). Specifically, this is a Directed Acyclic Graph.
What I'm doing right now is this: when I change an object's value, I check every object in the array to see if it depends on the object I just changed. If it does, then I tell this child object to recalculate. Then the child tells its children in the same way, and so on.
This works (the values update correctly), but it's very, very slow when a change cascades wide and deep. If an object has many parents that change, it recalculates again for each one, and it also tells its children to recalculate each time, so they get several messages from just one parent. This quickly snowballs until many objects are recalculating dozens of times.
What's the best way to recalculate each object only once, after all of its parents have recalculated?
Thanks for your help.

It sounds like you want a Topological Sort of a Directed Acyclic Graph. See for example http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Graph/DAG/
If your graph isn't constantly changing, you should be able to sort it once; from then on you can execute your updates in order from left to right, knowing that at each step the set of nodes you will be adding to the list to be computed are all to the right of the current position. There are a few ways you could optimize that: maybe store the nodes in a simple heap, pick the leftmost value off each time, recalculate it, and add back any nodes it references; or, as someone else has suggested, if the full dependency graph is small enough, just store on each node the order in which it needs to be calculated (as found using the topological sort).
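For illustration, here's a sketch of that scheme in Swift (names like dependents and recalc are my own, not from the question): dependents[i] lists the nodes that depend on node i, Kahn's algorithm produces the topological order once, and a single left-to-right pass then recalculates each affected node exactly once.

    // Kahn's algorithm: repeatedly emit a node none of whose parents remain.
    func topologicalOrder(dependents: [[Int]]) -> [Int] {
        let n = dependents.count
        var inDegree = Array(repeating: 0, count: n)
        for edges in dependents {
            for j in edges { inDegree[j] += 1 }
        }
        var ready = (0..<n).filter { inDegree[$0] == 0 }   // nodes with no parents
        var order: [Int] = []
        while let i = ready.popLast() {
            order.append(i)
            for j in dependents[i] {
                inDegree[j] -= 1
                if inDegree[j] == 0 { ready.append(j) }    // all of j's parents are placed
            }
        }
        return order    // complete, because the graph is acyclic
    }

    // After a change, walk the precomputed order once; every affected node is
    // recalculated exactly once, after all of its parents have settled.
    func propagateChange(from changed: Int, order: [Int], dependents: [[Int]],
                         recalc: (Int) -> Void) {
        var dirty = Array(repeating: false, count: dependents.count)
        dirty[changed] = true
        for i in order where dirty[i] {
            if i != changed { recalc(i) }
            for j in dependents[i] { dirty[j] = true }
        }
    }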

Create an acyclic digraph with vertices given by the nodes in your array and an edge i --> j whenever a change in i necessitates a recalc of j (i.e. i is in the list for object j). The graph is acyclic iff your process is finite.
Now, when i changes, do a breadth first search to recalculate dependent nodes. At first pass, gather all nodes j such that i --> j. Recalculate those j. At the second pass, take each j that changed and get its dependents j --> k. Then recalculate those k at once. Continue by taking all the dependents of the ks that changed, and so on, until there are only leaves.
This requires you to keep a list of neighbors, which is the inverse of the information that you have. So you need to do one pass to get the directed edges (fill an array so that entry i has the array of all j for which i --> j).
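As a sketch of that one-time inversion pass in Swift (the Node shape here is hypothetical, standing in for the objects in the question):

    struct Node {
        var value: Double
        var parents: [Int]    // indices of the objects this one depends on
    }

    // Invert the parent lists into dependent lists:
    // edge i --> j for each parent i of object j.
    func buildDependents(_ nodes: [Node]) -> [[Int]] {
        var dependents = Array(repeating: [Int](), count: nodes.count)
        for (j, node) in nodes.enumerated() {
            for i in node.parents {
                dependents[i].append(j)
            }
        }
        return dependents
    }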

When a value gets updated, create a list of the other elements whose values still need to be recalculated. Before recalculating the value of an element in this list, make sure that none of the elements it depends on are also in the list. This ensures that each element gets recalculated only once. Since there are no circular dependencies, there will always be at least one element in the list whose dependencies have all already been recalculated.
Pseudo code:
Create a set S
Add initially updated element to S
while S is not empty
remove an element X from S whose value does not depend on any other element in S
recalculate X's value and add any elements that depend on X's value to S
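A direct Swift rendering of that pseudocode might look like this (assuming parents[x] lists the elements x depends on and dependents[x] the elements that depend on x; the naive scan for a ready element makes this quadratic in the worst case, which the topological-sort approach above avoids):

    func propagate(from updated: Int, parents: [[Int]], dependents: [[Int]],
                   recalculate: (Int) -> Void) {
        var s: Set<Int> = [updated]
        while !s.isEmpty {
            // remove an element X from S that depends on no other element in S
            guard let x = s.first(where: { elem in
                parents[elem].allSatisfy { !s.contains($0) }
            }) else { break }    // unreachable while the dependencies are acyclic
            s.remove(x)
            if x != updated { recalculate(x) }   // the trigger's value was already set
            s.formUnion(dependents[x])           // everything depending on X joins S
        }
    }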

Related

Why is looking for an item in a hash map faster than looking for an item in an array?

You might have come across the claim that it is faster to find elements in a hashmap/dictionary/table than in a list/array. My question is: WHY?
(My reasoning so far: in both data structures, as far as I can see, a search has to travel through the elements until it reaches the required one, so why should one be faster?)
Let’s reason by analogy. Suppose you want to find a specific shirt to put on in the morning. I assume that, in doing so, you don’t have to look at literally every item of clothing you have. Rather, you probably do something like checking a specific drawer in your dresser or a specific section of your closet and only look there. After all, you’re not (I hope) going to find your shirt in your sock drawer.
Hash tables are faster to search than lists because they employ a similar strategy - they organize data according to the principle that every item has a place it “should” be, then search for the item by just looking in that place. Contrast this with a list, where items are organized based on the order in which they were added and where there isn’t a particular pattern as to why each item is where it is.
More specifically: one common way to implement a hash table is with a strategy called chained hashing. The idea goes something like this: we maintain an array of buckets. We then come up with a rule that assigns each object a bucket number. When we add something to the table, we determine which bucket number it should go to, then jump to that bucket and then put the item there. To search for an item, we determine the bucket number, then jump there and only look at the items in that bucket. Assuming that the strategy we use to distribute items ends up distributing the items more or less evenly across the buckets, this means that we won’t have to look at most of the items in the hash table when doing a search, which is why the hash table tends to be much faster to search than a list.
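As a rough illustration, a bucket-based table along those lines could be sketched like this in Swift (a toy, not how real hash tables such as Swift's Dictionary are actually implemented):

    struct ChainedHashTable<Key: Hashable, Value> {
        private var buckets: [[(key: Key, value: Value)]]

        init(bucketCount: Int = 64) {
            buckets = Array(repeating: [], count: bucketCount)
        }

        // The rule that assigns every key a bucket number.
        private func bucketIndex(for key: Key) -> Int {
            Int(UInt(bitPattern: key.hashValue) % UInt(buckets.count))
        }

        mutating func insert(_ value: Value, forKey key: Key) {
            let i = bucketIndex(for: key)
            if let slot = buckets[i].firstIndex(where: { $0.key == key }) {
                buckets[i][slot].value = value      // key already present: replace
            } else {
                buckets[i].append((key: key, value: value))
            }
        }

        // Search looks only at the one bucket the key "should" be in.
        func lookup(_ key: Key) -> Value? {
            buckets[bucketIndex(for: key)].first(where: { $0.key == key })?.value
        }
    }

    var table = ChainedHashTable<String, Int>()
    table.insert(3, forKey: "shirts")
    print(table.lookup("shirts") ?? 0)   // 3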
For more details on this, check out these lecture slides on hash tables, which fill in more of the details about how this is done.
Hope this helps!
To understand this, think about how the elements are stored in these data structures.
A HashMap/Dictionary, as you know, is a key-value data structure. To store an element, you first compute the hash of the key (a function that maps a key to a bucket index; a simple hash function can be built with the modulo operation, for example). Then you basically store the value against this hashed key.
In a list, you basically keep appending elements to the end. The order of insertion matters in this data structure, and the memory allocated to it is not contiguous.
An array is similar to a list, but the memory allocated to it is contiguous. So if you know the address of the first index, you can find the address of the nth element.
Now think of the retrieval of the element from these Data structures:
From HashMap/Dictionary: when you search for an element, the first thing you do is compute the hash value for the key. Once you have that, you go straight to the slot for that hashed value and obtain the value. The amount of work performed is always constant; in asymptotic notation this is O(1).
From List: you literally need to iterate through the elements and check whether each one is the element you are looking for. In the worst case, your desired element is at the end of the list, so the amount of work varies and you may have to iterate the whole list. In asymptotic notation this is O(n), where n is the number of elements in the list.
From Array: to find an element in an array, all you need is the address of the first element. For any other element, you can do the math of how far it sits from the first index.
For example, let's say the address of the first element is 100 and each element takes 4 bytes of memory. The element you are looking for is at the 3rd position, so its address is 108. The math used is:
address of first element + (position of element - 1) * memory used per element
That is, 100 + (3 - 1) * 4 = 108.
Here too, as you can observe, the work performed to find an element is always constant; in asymptotic notation this is O(1).
Now to compare, O(1) will always be faster than O(n). And hence retrieval of elements from HashMap/Dictionary or array would always be faster than List.
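A quick, unscientific way to feel the difference in Swift (the timings are illustrative only; a real benchmark needs repetition and warm-up):

    import Foundation

    let n = 1_000_000
    let array = Array(0..<n)
    let dict = Dictionary(uniqueKeysWithValues: array.map { ($0, "value \($0)") })

    // O(n): scans elements until it finds a match
    let t0 = Date()
    _ = array.firstIndex(of: n - 1)
    print("array scan:", Date().timeIntervalSince(t0))

    // O(1): hashes the key and jumps straight to its bucket
    let t1 = Date()
    _ = dict[n - 1]
    print("dictionary lookup:", Date().timeIntervalSince(t1))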
I hope this helps.

Storing and replacing values in array continuously

I'm trying to read the amplitude from a waveform and shine a green, yellow or red light depending on the amplitude of the signal. I'm fairly new to LabVIEW and couldn't get an idea to work that would have worked in any other programming language I know. What I'm trying to do is take the value of the signal and, every time it updates, store the amplitude in an index of a large array, with each measurement stored at index n+1 of the array.
After a certain number of data points I want to start over and replace values in the array (I use the formula node with the modulus for this). By keeping a finite number of indexes to check for the max value, I restrict my amplitude check to a certain time period.
However, my problem is that whenever I use Replace Array Subset to insert a new value at index n, all the other index points get erased, rendering it pretty much useless. I was thinking the Initialize Array was causing problems, but I just can't seem to wrap my head around what to do here.
I tried creating basic arrays on the front panel, but those are either control or indicator arrays and can't be both written and read from: it's either a control (read but not write) or an indicator (write but not read). Maybe it's just not possible to do what I had in mind in an elegant way in LabVIEW. If it's not possible with arrays in LabVIEW, I will look for a different way to do it.
I'm pretty sure I have most of the rest of the code down, except for an unfinished part here and there. It's just my issue with the arrays not working as I want them to.
I expected the array to retain its previously inputted data at index n-1 when index n is written, and for a value only to be replaced once the index has come back around to that specific point.
Instead, it's like a new array is initialized every time a new index is written.
download link for the VI
What you want to do:
Transport the content of the modified array into the next iteration of the WHILE loop.
What happens:
On each iteration, the content of the array is the same. It is the content of the initial array you created outside.
To solve this, right-click the orange square on the left border of the loop, and make it a "shift register". The symbol changes, and a similar symbol appears on the right border. Now, wire the modified array to the symbol on the right. What flows out into that symbol on the right, comes in from the left symbol on the next iteration.
Edit:
I have optimized your code a little. There is a modulo function, and an IF clause can handle ranges: "..3" means "values lower than or equal to 3", the next case is "Default", the next "7..". Unfortunately, this only works for integers; otherwise, one would use nested IF clauses with the < comparator or similar.
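For readers more comfortable with text-based languages, the same modulo-buffer idea looks like this in Swift (hypothetical names; the LabVIEW shift register plays the role of the persistent buffer here):

    struct AmplitudeWindow {
        private var buffer: [Double]    // slots start at 0 until the window fills
        private var count = 0

        init(size: Int) {
            buffer = Array(repeating: 0, count: size)
        }

        // Equivalent to the shift register + Replace Array Subset pattern:
        // the buffer persists between updates instead of being re-initialized.
        mutating func record(_ amplitude: Double) {
            buffer[count % buffer.count] = amplitude   // modulo wraps back to index 0
            count += 1
        }

        // Max amplitude over the last `size` measurements.
        var peak: Double { buffer.max() ?? 0 }
    }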

Algorithm for finding the lowest price to get through an array

I've been thinking a lot about the following problem:
We are given an array of n numbers. We start at the first index and our task is to get to the last index. Every move we can jump one or two steps forward and the number at the index we jump to represents the cost we need to pay for visiting that index. We need to find the cheapest way for getting to the end of the array.
For example if the array looks like this: [2,1,4,2,5] the cheapest way to get to the end is 10: we visit the indexes 1->2->4->5 and we have to pay 2+1+2+5 = 10 which is the cheapest possible way. Let f(i) be the cheapest price to get to the index i. This we can calculate easily in O(n) time with dynamic programming by realizing that f(i) = arr[i] + min(f(i-1),f(i-2))
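That recurrence is a few lines in any language; as a quick sketch in Swift (the 1-based indices from the text become 0-based here):

    func cheapestCost(_ arr: [Int]) -> Int {
        guard arr.count > 1 else { return arr.first ?? 0 }
        var f = [arr[0], arr[1] + arr[0]]           // f(1), f(2)
        for i in 2..<arr.count {
            f.append(arr[i] + min(f[i - 1], f[i - 2]))
        }
        return f[arr.count - 1]                     // cheapest cost to reach the end
    }
    // cheapestCost([2, 1, 4, 2, 5]) == 10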
But here's the twist:
The array gets updated several times and after every update we need to be able to tell in O(logn) time what is the cheapest way at the moment. Updating the array happens by telling the index which will be changed and the number it will be changed to. For example the update could be arr[2] = 7 changing our example array to [2,7,4,2,5]. Now the cheapest way would be 11.
Now how can we support these updates in O(logn) time? Any ideas?
Here's what I've come up with so far:
First I would create an array f for the dynamic programming as described before. I would store the contents of this array in a segment tree s in the following way: s(i) = f(i) - f(i-1). This would allow me to update intervals of f (adding a constant to every value) in O(logn) time and to ask for the value at a given index in O(logn) time. This would come in handy since, after some updates, it often happens that all the values in f after some given index need to be increased by a constant. So by asking for the value at index n in the segment tree after every update, we would get the answer we need.
There are however different things that can happen after an update:
Only one value in f needs to be updated. For example if [2,1,4,2,5,4,9,3,7] gets changed to [2,1,9,2,5,4,9,3,7], only f(3) needs to be updated, since no cheapest way went through index 3 anyway.
All the values in f after a given index need to be updated by a constant. This is what the segment tree is good for.
Every other value in f after a given index needs to be updated by a constant.
Something more random.
Alright, I managed to solve the problem all by myself, so I decided to share the solution with you. :)
I was on the right track with dynamic programming and a segment tree, but I was feeding the segment tree in the wrong way in my previous attempts.
Here's how we can support the updates in O(logn) time:
The idea is to use a binary segment tree where the leaves of the tree represent the current array and every node stores 4 different values.
v1 = The lowest cost to get from the leftmost descendant to the rightmost descendant
v2 = The lowest cost to get from the leftmost descendant to the second rightmost descendant
v3 = The lowest cost to get from the second leftmost descendant to the rightmost descendant
v4 = The lowest cost to get from the second leftmost descendant to the second rightmost descendant
With descendants I mean the descendants of the node that are also leaves.
When updating the array, we update the value at the leaf and then all of its ancestors up to the root. Since at every node we already know all 4 values of its two children, we can easily calculate the new 4 values for the current parent node. Just to give an example: v1_current_node = min(v2_leftchild+v1_rightchild, v1_leftchild+v1_rightchild, v1_leftchild+v3_rightchild). The other three values can be calculated in a similar way.
Since there are only O(logn) ancestors for every leaf, and all 4 values are calculated in O(1) time it takes only O(logn) time to update the entire tree.
Now that we know the 4 values for every node, we can in a similar way calculate the lowest cost from the first to the nth node by combining the nodes whose ranges are the largest powers of 2 that add up to n. For example if n = 11, we want the lowest cost from the first to the eleventh node, and this can be computed from the node covering leaves 1-8, the node covering leaves 9-10, and the leaf node 11. For each of those three nodes we know the 4 values described, and we can combine that information in the same way to get the answer. At most we need to consider O(logn) nodes for doing this, so that is not a problem.
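Here is a sketch of that tree in Swift (my own rendering of the idea above, not the original code). It builds the tree over exactly n leaves, so the root's v1 is directly the answer for the whole array, sidestepping the prefix-combination step. An "impossible" path is encoded as a large sentinel, and a leaf's v4 = 0 encodes jumping straight over that element:

    struct JumpSegTree {
        // t[node] = [v1, v2, v3, v4] for the node's range, as defined above
        private let inf = Int.max / 4       // sentinel for "no such path"
        private var t: [[Int]]
        private var a: [Int]
        private let n: Int

        init(_ arr: [Int]) {
            n = arr.count
            a = arr
            t = Array(repeating: [0, 0, 0, 0], count: 4 * n)
            build(1, 0, n - 1)
        }

        // Cheapest cost from the first index to the last index.
        var answer: Int { t[1][0] }

        // Point update: set a[i] = value, then fix the O(logn) ancestors.
        mutating func update(_ i: Int, _ value: Int) {
            a[i] = value
            update(1, 0, n - 1, i)
        }

        private mutating func build(_ nd: Int, _ lo: Int, _ hi: Int) {
            if lo == hi { t[nd] = [a[lo], inf, inf, 0]; return }  // leaf: v4 = 0 skips it
            let m = (lo + hi) / 2
            build(2 * nd, lo, m)
            build(2 * nd + 1, m + 1, hi)
            t[nd] = merge(t[2 * nd], t[2 * nd + 1])
        }

        private mutating func update(_ nd: Int, _ lo: Int, _ hi: Int, _ i: Int) {
            if lo == hi { t[nd] = [a[lo], inf, inf, 0]; return }
            let m = (lo + hi) / 2
            if i <= m { update(2 * nd, lo, m, i) } else { update(2 * nd + 1, m + 1, hi, i) }
            t[nd] = merge(t[2 * nd], t[2 * nd + 1])
        }

        // Crossing from the left half to the right half is a jump of one
        // (last -> first) or a jump of two (last -> second, second-to-last -> first).
        private func merge(_ l: [Int], _ r: [Int]) -> [Int] {
            func c(_ x: Int) -> Int { min(x, inf) }   // saturate at "impossible"
            return [
                c(min(l[0] + r[0], l[0] + r[2], l[1] + r[0])),  // v1: first -> last
                c(min(l[0] + r[1], l[0] + r[3], l[1] + r[1])),  // v2: first -> second-to-last
                c(min(l[2] + r[0], l[2] + r[2], l[3] + r[0])),  // v3: second -> last
                c(min(l[2] + r[1], l[2] + r[3], l[3] + r[1]))   // v4: second -> second-to-last
            ]
        }
    }

    var tree = JumpSegTree([2, 1, 4, 2, 5])
    print(tree.answer)    // 10
    tree.update(1, 7)     // arr[2] = 7 in the question's 1-based indexing
    print(tree.answer)    // 11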

Difference between Array, Set and Dictionary in Swift

I am new to the Swift language and have seen lots of tutorials, but it's not clear – my question is: what's the main difference between the Array, Set and Dictionary collection types?
Here are the practical differences between the different types:
Arrays are effectively ordered lists and are used to store lists of information in cases where order is important.
For example, posts in a social network app being displayed in a tableView may be stored in an array.
Sets are different in the sense that order does not matter; they are used in cases where ordering is irrelevant.
Sets are especially useful when you need to ensure that an item only appears once in the set.
Dictionaries are used to store key/value pairs and are useful when you want to quickly find a value using a key, just like in a real dictionary.
For example, you could store a list of items and links to more information about these items in a dictionary.
Hope this helps :)
(For more information and to find Apple's own definitions, check out Apple's guides at https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/CollectionTypes.html)
Detailed documentation can be found here in Apple's guide. Below are some quick definitions extracted from there:
Array
An array stores values of the same type in an ordered list. The same value can appear in an array multiple times at different positions.
Set
A set stores distinct values of the same type in a collection with no defined ordering. You can use a set instead of an array when the order of items is not important, or when you need to ensure that an item only appears once.
Dictionary
A dictionary stores associations between keys of the same type and values of the same type in a collection with no defined ordering. Each value is associated with a unique key, which acts as an identifier for that value within the dictionary. Unlike items in an array, items in a dictionary do not have a specified order. You use a dictionary when you need to look up values based on their identifier, in much the same way that a real-world dictionary is used to look up the definition for a particular word.
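A minimal illustration of all three in Swift (made-up sample values):

    var posts: [String] = ["First post", "Second post"]   // Array: order matters
    posts.append("Third post")
    print(posts[0])                        // "First post" - same order as inserted

    var tags: Set<String> = ["swift", "ios"]
    tags.insert("swift")                   // Set: duplicates are ignored
    print(tags.count)                      // 2

    var links: [String: String] = ["swift": "https://swift.org"]  // Dictionary: key -> value
    print(links["swift"] ?? "unknown")     // look up by key, not by position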
Old thread, yet it's worth talking about performance.
With N elements inside an array or a dictionary, it's worth considering the performance when you access elements or add or remove objects.
Arrays
Accessing a random element costs the same as accessing the first or last: elements follow each other sequentially, so any of them can be addressed directly. It costs 1 cycle.
Inserting an element is costly. If you insert at the beginning or in the middle, the rest of the elements need to be shifted, which can cost as much as N cycles in the worst case (N/2 cycles on average). If you append to the end and there is enough room in the array, it costs 1 cycle; otherwise the whole array is copied, which costs N cycles. This is why it is important to reserve enough space for the array at the start of the operation.
Deleting from the end costs 1 cycle. Deleting from the beginning or the middle requires a shift operation, N/2 cycles on average.
Finding an element with a given property costs N/2 cycles on average.
So be very cautious with huge arrays.
Dictionaries
While dictionaries are unordered, they can bring you some benefits here. Since keys are hashed and stored in a hash table, any given operation costs 1 cycle. The only exception is finding an element by a given property, which can cost N/2 cycles in the worst case. With clever design, however, you can use the property values as dictionary keys, so the lookup costs only 1 cycle no matter how many elements are inside.
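The "property value as key" trick from the paragraph above, sketched in Swift (the User type and its fields are made up for illustration):

    struct User { let id: Int; let name: String }

    let users = [User(id: 1, name: "Ann"), User(id: 2, name: "Ben")]

    // Array: finding by property scans up to N elements
    let ben = users.first(where: { $0.id == 2 })

    // Dictionary keyed by that property: one hashed lookup, however many users there are
    let usersByID = Dictionary(uniqueKeysWithValues: users.map { ($0.id, $0) })
    let benFast = usersByID[2]

    print(ben?.name ?? "none", benFast?.name ?? "none")   // Ben Ben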
Swift Collections - Array, Dictionary, Set
Every collection is dynamic, which is why there are extra steps for expanding and collapsing: an Array must allocate more memory and copy the old data into the new storage, and a Dictionary must additionally recalculate the bucket indexes for every object inside.
Big-O notation describes the performance of an operation.
Array - ArrayList - a dynamic array of objects, based on a plain array. It is used for tasks where you frequently need access by index:
get by index - O(1)
find element - O(n) - in the worst case the element you want is the last one
insert/delete - O(n) - in the worst case the tail of the array has to be copied/moved every time
Dictionary - HashTable, HashMap - stores key/value pairs. It contains buckets/baskets (an array structure, accessed by index), each of which contains another structure (an array list, linked list, or tree). Collisions are resolved by separate chaining. The main idea is:
calculate the key's hash code (Hashable), and from this hash code calculate the index of the bucket (for example by using modulo (mod)).
Since the Hashable function returns an Int, it cannot guarantee that two different objects will have different hash codes. Moreover, the bucket count is far smaller than Int.max. When two different objects have the same hash code, or two objects with different hash codes land in the same bucket, we have a collision. That is why, once we know the index of the bucket, we check whether anything there is equal to our key, and Equatable comes to the rescue. If two objects are equal, the existing key/value object is replaced; otherwise a new key/value object is added.
find element - O(1) to O(n)
insert/delete - O(1) to O(n)
O(n) - in the case where the hash code is the same for every object, so we end up with only one bucket. That is why a hash function should distribute the elements evenly.
As you can see, a HashMap doesn't support access by index, but in other respects it has better performance.
Set - HashSet - based on a HashTable without values.
*You can also implement a kind of Java TreeMap/TreeSet, which is a sorted structure, but with O(log(n)) complexity to access an element.
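To see the replace-on-equal behaviour in Swift (a toy example):

    struct Point: Hashable {        // Hashable implies Equatable
        let x: Int, y: Int
    }

    var labels: [Point: String] = [:]
    labels[Point(x: 1, y: 2)] = "first"
    labels[Point(x: 1, y: 2)] = "second"   // equal key: the value is replaced, not added
    print(labels.count)                    // 1
    print(labels[Point(x: 1, y: 2)]!)      // "second"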

algorithm/data structure for this "enumerating all possibilities" task (combinatorial objects)

This is probably a common question that arises in search/store situations and there is a standard answer. I'm trying to do this from intuition and am somewhat out of my comfort zone.
I'm attempting to generate all of a certain kind of combinatorial object. Each object of size n can be generated from an object of size n-1, usually in multiple ways. From the single object of size 2, my search generates 6 objects of size 3, about 140 objects of size 4, and about 29,000 objects of size 5. As I generate the objects, I store them in a globally declared array. Before storing each object, I have to check all the previous ones stored for that size, to make sure I didn't generate it already from an earlier (n-1)-object. I currently do this in a naive way, which is just that I go through all the objects currently sitting in the array and compare them to the one currently being generated. Only if it's different from every single one there do I add it to the array and increment the number of objects currently in there. The new object is just added as the most recent object in the array, it is not sorted, and so this is obviously inefficient, and I can't hope to generate the objects of size 6 in this way.
(To give an idea of the problem of the growth of the array: the first couple of 4-objects, from among the 140 or so, give rise to over 2000 new 5-objects in a fraction of a second. By the time I've gotten to the last few 4-objects, with over 25,000 5-objects already stored, each 4-object generates only a handful of previously unseen 5-objects, but takes several seconds for the process for each 4-object. There is very little correlation between the order I generate new objects in, and their eventual position as a consequence of the comparison function I'm using.)
Obviously if I had a sorted array of objects, it would be much more efficient to find out whether I'm looking at a new object: using a binary midpoint search strategy I'd only have to look at roughly log_2(n) of the n objects currently stored, instead of all n of them. But placing the newly generated object at the right place in an array means moving half of the existing ones, on average, to make room for it. (I would implement this with an array of pointers pointing to the unsorted array of object structs, so that I only had to move pointers instead of moving data, but it still seems like a lot of pointers to have to repoint at each insert.)
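The binary midpoint search described above is the cheap half of the problem; as a sketch in Swift (areInIncreasingOrder stands in for whatever comparison function orders the objects):

    // Returns where `item` belongs in `sorted`, and whether it is already present.
    func insertionPoint<T>(of item: T, in sorted: [T],
                           by areInIncreasingOrder: (T, T) -> Bool) -> (index: Int, found: Bool) {
        var lo = 0, hi = sorted.count
        while lo < hi {
            let mid = (lo + hi) / 2
            if areInIncreasingOrder(sorted[mid], item) {
                lo = mid + 1
            } else {
                hi = mid
            }
        }
        // lo is the insertion point; check whether an equal item already sits there
        let found = lo < sorted.count && !areInIncreasingOrder(item, sorted[lo])
        return (lo, found)
    }

Finding the insertion point this way costs O(log n) comparisons; the expensive part the question worries about is making room at that index.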
The other option would be to place the objects in a linked list, as insertion is very cheap in that situation. But then I wouldn't have random access to the elements in the linked list--you can only find the right place to insert the newly generated object (if it's actually new) by traversing the list node by node and comparing. On average you'd have to traverse half the list before finding the right insertion point, which doesn't sound any better than repointing half the pointers.
Is there a third choice I'm missing? I could accomplish this very easily if I had both random access to stored elements so I could find the insertion point quickly (in log_2(n) steps), and I could insert new objects very cheaply, like in a linked list. Am I dreaming?
To summarise: I need to be able to determine whether an object is new or duplicates an existing one, and I need to be able to insert an object at the right place. I don't ever need to delete an object. Thank you.
