How to make a route from an array

I would like to create a route in Python from the following data:
([[ 0, 1],[ 0, 2],[ 0, 8],[ 1, 7],[ 2, 9],[ 3, 6],[ 4, 3],[ 5, 0],[ 6, 0],[ 7, 0],[ 8, 4],[ 9, 5],[10, 10]])
The outcome I would like is a set of routes like [0,1],[1,7],[7,0] (0-1-7-0) and [0,2],[2,9],[9,5],[5,0] (0-2-9-5-0) and [0,8],[8,4],[4,3],[3,6],[6,0] (0-8-4-3-6-0). I have tried making the first array into a tuple, and I think that would work, but I can't seem to find how to sort the array.
Is there some kind of loop which could help me sort it and thereafter make tuples of it?

The "data" that you have describes a graph. I suggest adding this keyword to your question.
The vertices of the graph are the integers 0 to 10. The edges of the graph are your 2-element lists [0, 1], [0, 2], [0, 8], etc.
Now you are looking for a path through your graph. What requirements do you have on the path you're looking for? Should it be the longest path in the graph? Should it be the shortest path between two particular nodes?
If you're only looking for a maximal path, i.e. a path that cannot be further extended, then a greedy algorithm suffices:
initialize the path with one arbitrary edge
loop:
    look for an edge in data that can extend the path
    pop the edge from data and add it to the path
    break the loop when no appropriate edge can be found in data
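The greedy idea above can be sketched in Python. Since the question actually asks for routes that return to node 0, this sketch adapts the greedy extension to stop as soon as the route closes back at its starting node; the function name `extract_route` and the stop-at-start rule are my additions, not part of the pseudocode:

```python
data = [[0, 1], [0, 2], [0, 8], [1, 7], [2, 9], [3, 6],
        [4, 3], [5, 0], [6, 0], [7, 0], [8, 4], [9, 5], [10, 10]]

def extract_route(edges, start=0):
    """Greedily pop edges from `edges` to build one route start -> ... -> start."""
    # begin with the first unused edge leaving `start`
    first = next(i for i, e in enumerate(edges) if e[0] == start)
    route = [edges.pop(first)]
    while route[-1][1] != start:
        # find an unused edge that continues from the route's current end
        nxt = next(i for i, e in enumerate(edges) if e[0] == route[-1][1])
        route.append(edges.pop(nxt))
    return route

routes = []
while any(e[0] == 0 for e in data):
    routes.append(extract_route(data, start=0))
# routes -> [[[0, 1], [1, 7], [7, 0]],
#            [[0, 2], [2, 9], [9, 5], [5, 0]],
#            [[0, 8], [8, 4], [4, 3], [3, 6], [6, 0]]]
```

The self-loop [10, 10] never leaves node 0, so it stays in data; a dead-ended route (an edge leaving 0 with no way back) would raise StopIteration in this sketch.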


extract blocks of columns (as separated subarrays) indicated by 1D binary array

Based on a 1D binary mask, for example np.array([0,0,0,1,1,1,0,0,1,1,0]), I would like to extract the columns of another array indicated by the 1's in the binary mask as sub-arrays/separate blocks, like [9, 3.5, 7] and [2.8, 9.1] (I am just making up the numbers to illustrate the point).
So far what I have (again just as a demo to illustrate what my goal is, not the data where this operation will be performed):
import numpy as np
import torch

arr = torch.from_numpy(np.array([0,0,0,1,1,1,0,0,1,1,0]))
split_idx = torch.where(torch.diff(arr) == 1)[0] + 1
torch.tensor_split(arr, split_idx.tolist())
The output is:
(tensor([0, 0, 0]),
tensor([1, 1, 1]),
tensor([0, 0]),
tensor([1, 1]),
tensor([0]))
What I would like to have in the end is:
(tensor([1, 1, 1]),
tensor([1, 1]))
Do you know how to implement this, preferably in PyTorch? NumPy functions are also fine. A million thanks in advance!
You can construct your tensor of slice indices with your approach. The only thing is that you were missing the indices for the end of each slice. You can do something like:
>>> slices = arr.diff().abs().nonzero().flatten() + 1
>>> slices
tensor([ 3,  6,  8, 10])
Then apply tensor_split and slice to only keep every other element:
>>> torch.tensor_split(arr, slices)[1::2]
(tensor([1, 1, 1]), tensor([1, 1]))
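The same idea also works in plain NumPy, where np.flatnonzero plays the role of nonzero().flatten() (a sketch using the mask from the question):

```python
import numpy as np

mask = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
# positions where the mask value changes; these are the split boundaries
boundaries = np.flatnonzero(np.diff(mask)) + 1
blocks = np.split(mask, boundaries)
# keep only the runs of ones
ones_blocks = [b for b in blocks if b[0] == 1]
# ones_blocks -> [array([1, 1, 1]), array([1, 1])]
```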

TensorFlow.js probabilities

I have multiple feature columns and a result column where I want to predict whether something happens or not.
So I'm training my model, and finally I do
const predictions = model.predict(xTest).argMax(-1);
This returns a tensor, and when getting the data with:
predictions.dataSync()
I get values like [0, 1, 1, 1, 0, ...]
Is there any way to get probabilities like in Python? [0.121, 0.421, 0.8621, ...]
I only found one result:
https://groups.google.com/a/tensorflow.org/g/tfjs/c/TvcB69MUj_I?pli=1
Is this still the case? Are there no probabilities in JavaScript?
tf.argMax returns the indices of the maximum values along the axis. If you want the maximum values themselves, you can use tf.max instead:
const x = tf.tensor2d([[1, 2, 3],[ 4, 8, 4]]);
x.max(-1).print() // [3, 8]
x.argMax(-1).print() // [2, 1]

Is there any way to increment past an array element in Python? (Like how you can with pointer arithmetic in C?)

I have
arr = [6, 5, 4, 3, 2, 1]
I want to use the ...5, 4, 3, 2, 1] part of the array in a recursive call while keeping the 6 in the array (at its current position) for future use.
It feels very similar to pointer arithmetic in C, I'm just not sure how to implement something like that in Python (ver 3.7). I'm lost as to how to preserve the 6 in the array at its position, which is essential as the array needs to be maintained in sorted descending order.
Any guidance on how to get around this is appreciated.
You can access the elements of the arr list in the following manner without disturbing its contents:
>>> arr[2:]
[4, 3, 2, 1]
>>> arr[1:]
[5, 4, 3, 2, 1]
>>> arr
[6, 5, 4, 3, 2, 1]
>>> arr[2:4]
[4, 3]
Slicing doesn't affect the elements of the original list. I hope this answers your question.
You can use part of an array in a recursive call, without changing the original array, by passing start and end indices.
With arr = [6,5,4,3,2,1], if you want to use arr from index 1 onward, just pass start and end, i.e. fun(arr, start, end), where start = 1 and end = len(arr) - 1. This does not modify the original array, and at the same time you can use the array from start to end in recursive calls.
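For example (a minimal sketch; descending_sum is a made-up stand-in for whatever your recursive function does), passing a start index lets the recursion walk arr[start:] while arr itself is never copied or modified:

```python
def descending_sum(arr, start=0):
    """Recurse over arr[start:] without copying or modifying arr."""
    if start == len(arr):
        return 0
    return arr[start] + descending_sum(arr, start + 1)

arr = [6, 5, 4, 3, 2, 1]
total = descending_sum(arr, 1)  # recurses over 5, 4, 3, 2, 1; the 6 stays put
# total -> 15, and arr is still [6, 5, 4, 3, 2, 1]
```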

Find edge points of numpy array for kmeans centroids initialization

I am working on implementing a kmeans algorithm in python.
I am testing out new ways of initializing my centroids and wanted to implement this and see what effect it would have on the clustering.
My idea is to select datapoints from my data set in a way that the centroids are initialized to edge points of my data.
A simple example with 2 attributes:
Let's say this is my input array:
input = array([[3,3], [1,1], [-1,-1], [3,-3], [-1,1], [-3,3], [1,-1], [-3,-3]])
From this array I would like to select the edge points, which would be [3,3], [-3,-3], [-3,3], [3,-3]. So if my k is 4, these points would be selected.
The data sets that I am working with have 4 and 9 attributes and around 300 data points each.
Note: I have not found a solution for when k is not equal to the number of edge points, but if k is greater than the number of edge points I think I would select these 4 points and then try to place the rest of them around the center point of the graph.
I have also thought about finding the max and min for each column and from there trying to find the edges of my data set, but I don't have an idea of an effective way of identifying the edges from these values.
If you believe this idea will not work I would love to hear what you have to say.
Questions
Does numpy have such a function to get the indexes of data points on the edge of my data set?
If not, how would I go at finding these edge points in my data set?
Use scipy and pairwise distances to find how far each point is from every other:
from scipy.spatial.distance import pdist, squareform
p=pdist(input)
Then, use squareform to turn the p vector into a matrix:
s=squareform(pdist(input))
Then, use numpy's argwhere to find the indices where the distance is at its maximum, and look up those indices in the input array:
input[np.argwhere(s==np.max(p))]
array([[[ 3, 3],
[-3, -3]],
[[ 3, -3],
[-3, 3]],
[[-3, 3],
[ 3, -3]],
[[-3, -3],
[ 3, 3]]])
The complete code would be:
import numpy as np
from scipy.spatial.distance import pdist, squareform

p = pdist(input)
s = squareform(p)
input[np.argwhere(s == np.max(p))]
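If you need exactly k initial centroids rather than just the single most distant pair, a greedy farthest-point ("farthest-first") selection can be sketched in pure NumPy. This is my own sketch of one common initialization heuristic, not something from the answer above, and the variable names are made up:

```python
import numpy as np

pts = np.array([[3, 3], [1, 1], [-1, -1], [3, -3],
                [-1, 1], [-3, 3], [1, -1], [-3, -3]])

# pairwise squared distances, computed without SciPy via broadcasting
diff = pts[:, None, :] - pts[None, :, :]
d2 = (diff ** 2).sum(axis=-1)

# greedy farthest-first selection of k indices: start from one point of
# the most distant pair, then repeatedly take the point whose distance
# to the already-chosen set is largest
k = 4
chosen = [int(np.unravel_index(np.argmax(d2), d2.shape)[0])]
while len(chosen) < k:
    rest = [i for i in range(len(pts)) if i not in chosen]
    chosen.append(max(rest, key=lambda i: d2[i, chosen].min()))
# chosen picks the four corner points [3,3], [-3,-3], [3,-3], [-3,3]
```

For 300 points the full distance matrix is tiny, so this brute-force approach is perfectly adequate.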

Usage of Pydatalog Aggregate Functions

I have been playing around with the various aggregate functions to get a feel for them, and after being confused for the past few days I am in need of clarification. I either get completely unintuitive behavior or unhelpful errors. For instance, I test:
(p[X]==min_(Y, order_by=Z)) <= Y.in_((4,6,2)) & Z.in_((6,))
looking at sample output:
p[0]==X,Y,Z
([(6,)], [4, 6, 2], [6, 6, 6])
p[1]==X,Y,Z
([(6,)], [6, 4, 2], [6, 6, 6])
p[2]==X,Y,Z
([(6,)], [4, 2, 6], [6, 6, 6])
1. Why is the minimum 6?
2. Why has the value bound to Z been repeated 3 times?
3. What exactly is the purpose of 'order_by' in relation to the list from which a minimum value is found?
4. Why does the output change based upon whether there are multiple values in the 'order_by' list; why does a specific value (6, in this case) in the 'order_by' list affect the output as it has?

Another example:
(p[X]==min_(Y, order_by=Z)) <= Y.in_((4,6,2)) & Z.in_((0,))
Output:
p[0]==X,Y,Z
([(6,)], [4, 6, 2], [0, 0, 0])
p[1]==X,Y,Z
([(6,)], [2, 6, 4], [0, 0, 0])
p[2]==X,Y,Z
([(2,)], [2, 6, 4], [0, 0, 0])
Why did the output of X change, from 6 to 2, based upon the index provided? Even though the output was wrong in the previous example, at least it was consistent across the indexes used; with there being only one min/max, this makes sense.
I at least get to see the output using the min_, max_, sum_ functions; but, I am lost when it comes to rank_ and running_sum_. I follow a similar process when defining my function:
(p[X]==running_sum_(Z, group_by=Z, order_by=Z)) <= Z.in_((43,34,65))
I try to view the output:
p[0]==X
I get the error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/UserList.py", line 16, in __repr__
def __repr__(self): return repr(self.data)
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 109, in data
self.todo.ask()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 566, in ask
self._data = Body(self.pre_calculations, self).ask()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 686, in ask
self._data = literal.lua.ask()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 909, in _
invoke(subgoal)
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 664, in invoke
todo.do() # get the thunk and execute it
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 640, in do
self.thunk()
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 846, in
aggregate.complete(base_subgoal, subgoal))
File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 820, in complete
result = [ tuple(l.terms) for l in list(base_subgoal.facts.values())]
AttributeError: 'bool' object has no attribute 'values'
What does this mean? What was done incorrectly? What are the relations shared by the running_sum_ (and rank_) parameters--'group_by' and 'order_by'?
As there seems to be no examples on the web, 2 or 3 short examples of rank_ and running_sum_ usage would be greatly appreciated.
Aggregate clauses are solved in 2 steps:
first resolve the unknowns in the clause, while ignoring the aggregate function
then apply the aggregate function on the result
Here is how you could write the first clause:
(p[None]==min_(Y, order_by=Y)) <= Y.in_((4,6,2))
The variable(s) in the brackets after p are used as the "group by" in SQL, and must also appear in the body of the clause. In this case, it does not vary, so I use None. The order_by variable is needed when you want to retrieve a value other than the one you order by.
Let's say you want to retrieve the names of the youngest pupil in each class of a school. The base predicate would be pupil(ClassName, Name, Age).
+ pupil('1A', 'John', 8)
+ pupil('1B', 'Joe', 9)
The aggregate clause would be:
(younger[ClassName] == min_(Name, order_by= Age)) <= pupil(ClassName, Name, Age)
The query would then be:
(younger[ClassName]==X)
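To make the group_by/order_by semantics concrete, here is a plain-Python equivalent of the younger[ClassName] aggregate (just an illustration of what the clause computes, not pyDatalog code; the pupil 'Ann' is an extra made-up row so one class actually has a minimum to pick):

```python
# Rows of the base predicate pupil(ClassName, Name, Age);
# ('1A', 'Ann', 10) is a hypothetical extra fact for illustration.
pupils = [('1A', 'John', 8), ('1A', 'Ann', 10), ('1B', 'Joe', 9)]

# group_by: the bracketed variable (ClassName) partitions the rows;
# order_by: within each group, min_ keeps the Name of the row with minimal Age
best = {}
for class_name, name, age in pupils:
    if class_name not in best or age < best[class_name][1]:
        best[class_name] = (name, age)

younger = {c: name for c, (name, age) in best.items()}
# younger -> {'1A': 'John', '1B': 'Joe'}
```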
