I'm likely missing something simple here, so I apologize in advance. I am also aware that there is likely a better approach to this, so I'm open to that as well.
I'm trying to run a PowerShell script that walks through an array of values, comparing consecutive elements to see the value of the difference between them.
Below is a sample data set I'm using to test with, imported into PowerShell from CSV:
1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.7, 2.9, 3.0
What I'm trying to accomplish is running through this list and comparing the second entry with the first, the third with the second, the fourth with the third, etc., adding an element to $export ONLY if its value is at least 0.2 greater than that of the previous element.
Here's what I've tried:
$import = Get-Content C:/pathtoCSVfile
$count = $import.Length-1;
$current = 0;
Do
{
    $current = $current+1;
    $previous = $current-1
    if (($import[$current]-$import[$previous]) -ge 0.2)
    {
        $export = $export+$import[$current]+"`r`n";
    }
}
until ($current -eq $count)
Now I've run this with Trace on, and on each pass through the loop it assigns values to $current and $previous and performs the subtraction described in the if condition. But ONLY for the value 2.7 ($import[14]-$import[13]) does it register that the if condition has been met, leaving just a single value of 2.7 in $export. I expected other values (1.7, 1.9, and 2.9) to be added to $export as well.
Again, this is probably something stupid/obvious I'm overlooking, but I can't seem to figure it out. Thanks in advance for any insight you can offer.
The problem is that decimal fractions have no exact representation in the implicitly used [double] data type, resulting in rounding errors that cause your -ge 0.2 comparison to yield unexpected results.
A simple example with [double] values, which are what PowerShell implicitly uses with number literals that have a decimal point:
PS> 2.7 - 2.5 -ge 0.2
True # OK, but only accidentally so, due to the specific input numbers.
PS> 1.7 - 1.5 -ge 0.2
False # !! Due to the inexact internally binary [double] representation.
If you force your calculations to use the [decimal] type instead, the problem goes away.
Applied to the above example (appending d to a number literal in PowerShell makes it a [decimal]):
PS> 1.7d - 1.5d -ge 0.2d
True # OK - Comparison is now exact, due to [decimal] values.
Applied in the context of a more PowerShell-idiomatic reformulation of your code:
# Sample input; note that floating-point number literals such as 1.0 default to [double]
# Similarly, performing arithmetic on *strings* that look like floating-point numbers
# defaults to [double], and Import-Csv always creates string properties.
$numbers = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.7, 2.9, 3.0
# Collect those array elements that are at least 0.2 greater than their preceding element
# in output *array* $exports.
$exports = foreach ($ndx in 1..($numbers.Count - 1)) {
    if ([decimal] $numbers[$ndx] - [decimal] $numbers[$ndx-1] -ge 0.2d) {
        $numbers[$ndx]
    }
}
# Output the result array.
# To create a multi-line string representation, use $exports -join "`r`n"
$exports
The above yields:
1.7
1.9
2.7
2.9
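If you'd rather keep the implicit [double] values, a rounding-based comparison is another option; below is a minimal sketch (my own variation, not part of the [decimal] approach above) that reuses the same $numbers array and rounds each difference to one decimal place before comparing, which absorbs the binary representation error:
# Alternative sketch: stay with [double], but round the difference first.
$exports = foreach ($ndx in 1..($numbers.Count - 1)) {
    if ([math]::Round($numbers[$ndx] - $numbers[$ndx-1], 1) -ge 0.2) {
        $numbers[$ndx]
    }
}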
Related
Trying to come up with a fast way to make sure an array is monotonic in Julia.
The slow (and obvious) way to do it that I have been using is something like this:
function check_monotonicity(
    timeseries::Array,
    column::Int
)
    current = timeseries[1, column]
    for row in 1:size(timeseries, 1)
        if timeseries[row, column] > current
            return false
        end
        current = copy(timeseries[row, column])
    end
    return true
end
So that it works something like this:
julia> using Distributions
julia> mono_matrix = hcat(collect(0:0.1:10), rand(Uniform(0.4,0.6),101), reverse(collect(0.0:0.1:10.0)))
101×3 Matrix{Float64}:
0.0 0.574138 10.0
0.1 0.490671 9.9
0.2 0.457519 9.8
0.3 0.567687 9.7
⋮
9.8 0.513691 0.2
9.9 0.589585 0.1
10.0 0.405018 0.0
julia> check_monotonicity(mono_matrix, 2)
false
And then for the opposite example:
julia> check_monotonicity(mono_matrix, 3)
true
Does anyone know a more efficient way to do this for long time series?
Your implementation is certainly not slow! It is very nearly optimally fast. You should, however, definitely get rid of the copy. Though it doesn't hurt when the matrix elements are just plain data, it can be bad in other cases, for example with BigInt.
This is close to your original effort, but also more robust with respect to indexing and array types:
function ismonotonic(A::AbstractMatrix, column::Int, cmp = <)
    current = A[begin, column]    # begin instead of 1
    for i in axes(A, 1)[2:end]    # skip the first element
        newval = A[i, column]     # don't use copy here
        cmp(newval, current) && return false
        current = newval
    end
    return true
end
Another tip: You don't need to use collect. In fact, you should almost never use collect. Do this instead (I removed Uniform since I don't have Distributions.jl):
mono_matrix = hcat(0:0.1:10, rand(101), reverse(0:0.1:10)) # or 10:-0.1:0
Or perhaps this is better, since you have more control over the number of elements in the range:
mono_matrix = hcat(range(0, 10, 101), rand(101), range(10, 0, 101))
Then you can use it like this:
1.7.2> ismonotonic(mono_matrix, 3)
false
1.7.2> ismonotonic(mono_matrix, 3, >=)
true
1.7.2> ismonotonic(mono_matrix, 1)
true
In mathematics, a series is typically defined to be monotonic if it is non-decreasing or non-increasing. If this is what you want, then do:
issorted(view(mono_matrix, :, 2), rev=true)
to check if it is non-increasing, and:
issorted(view(mono_matrix, :, 2))
to check if it is non-decreasing.
If you want a strictly decreasing or strictly increasing check instead, do:
issorted(view(mono_matrix, :, 3), rev=true, lt = <=)
for strictly decreasing, but treating 0.0 and -0.0 as equal, or:
issorted(view(mono_matrix, :, 3), lt = <=)
for strictly increasing, but treating 0.0 and -0.0 as equal.
If you want to distinguish 0.0 and -0.0 then do respectively:
issorted(view(mono_matrix, :, 3), rev=true, lt = (x, y) -> isequal(x, y) || isless(x, y))
issorted(view(mono_matrix, :, 3), lt = (x, y) -> isequal(x, y) || isless(x, y))
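A quick illustration of why the custom lt is needed here: == treats 0.0 and -0.0 as equal, while isequal and isless distinguish them:
julia> 0.0 == -0.0
true
julia> isequal(0.0, -0.0)
false
julia> isless(-0.0, 0.0)
true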
I would like to create a distribution for n categorical variables C_1, .., C_n whose event shape is n. Using JointDistributionSequentialAutoBatched, the event shape is a list [[], .., []]. For example, for n=2:
import tensorflow_probability.python.distributions as tfd
probs = [
    [0.8, 0.2],      # C_1 in {0,1}
    [0.3, 0.3, 0.4]  # C_2 in {0,1,2}
]
D = tfd.JointDistributionSequentialAutoBatched([tfd.Categorical(probs=p) for p in probs])
>>> D
<tfp.distributions.JointDistributionSequentialAutoBatched 'JointDistributionSequentialAutoBatched' batch_shape=[] event_shape=[[], []] dtype=[int32, int32]>
How do I reshape it to get event shape [2]?
A few different approaches could work here:
Create a batch of Categorical distributions and then use tfd.Independent to reinterpret the batch dimension as the event:
vector_dist = tfd.Independent(
    tfd.Categorical(
        probs=[
            [0.8, 0.2, 0.0],  # C_1 in {0,1}
            [0.3, 0.3, 0.4]   # C_2 in {0,1,2}
        ]),
    reinterpreted_batch_ndims=1)
Here I added an extra zero to pad out probs so that both distributions can be represented by a single Categorical object.
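As a quick sanity check (a sketch, assuming the imports from the question plus TensorFlow itself), the wrapped distribution now reports the desired event shape:
print(vector_dist.event_shape)  # (2,), i.e. a length-2 event vector
print(vector_dist.sample())     # e.g. a tensor like [0 2] with shape (2,), dtype int32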
Use the Blockwise distribution, which stuffs its component distributions into a single vector (as opposed to the JointDistribution classes, which return them as separate values):
vector_dist = tfd.Blockwise([tfd.Categorical(probs=p) for p in probs])
The closest to a direct answer to your question is to apply the Split bijector, whose inverse is Concat, to the joint distribution:
tfb = tfp.bijectors  # assumes the top-level import: import tensorflow_probability as tfp
D = tfd.JointDistributionSequentialAutoBatched(
    [tfd.Categorical(probs=[p]) for p in probs])
vector_dist = tfb.Invert(tfb.Split(2))(D)
Note that I had to awkwardly write probs=[p] instead of just probs=p. This is because the Concat bijector, like tf.concat, can't change the tensor rank of its argument: it can concatenate small vectors into a big vector, but not scalars into a vector. So we have to ensure that its inputs are themselves vectors. This could be avoided if TFP had a Stack bijector analogous to tf.stack / tf.unstack (it doesn't currently, but there's no reason such a bijector couldn't exist).
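The rank constraint is easy to see with the underlying TF ops; a minimal illustration (assuming TensorFlow is available as tf):
import tensorflow as tf
# tf.concat joins along an existing axis: vectors in, one longer vector out.
tf.concat([[1], [2]], axis=0)  # shape [2]
# tf.stack adds a new axis, turning scalars into a vector, which is what a
# hypothetical Stack bijector would mirror.
tf.stack([1, 2], axis=0)       # shape [2]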
I have a JaggedArray (awkward.array.jagged.JaggedArray) that contains indices pointing into positions in another JaggedArray. Both arrays have the same length, but each of the numpy.ndarrays that the JaggedArrays contain can be of different length. I would like to sort the second array using the indices of the first array, at the same time dropping the elements from the second array that are not indexed from the first array. The first array can additionally contain values of -1 (these could also be replaced by None if needed, but that is currently not the case) that mean there is no match in the second array. In such a case, the corresponding position in the output should be set to a default value (e.g. 0).
Here's a practical example and how I solve this at the moment:
import uproot
import numpy as np
import awkward
def good_index(my_indices, my_values):
    my_list = []
    for index in my_indices:
        if index > -1:
            my_list.append(my_values[index])
        else:
            my_list.append(0)
    return my_list
indices = awkward.fromiter([[0, -1], [3,1,-1], [-1,0,-1]])
values = awkward.fromiter([[1.1, 1.2, 1.3], [2.1,2.2,2.3,2.4], [3.1]])
new_map = awkward.fromiter(map(good_index, indices, values))
The resulting new_map is: [[1.1 0.0] [2.4 2.2 0.0] [0.0 3.1 0.0]].
Is there a more efficient/faster way of achieving this? I was thinking that one could use numpy functionality such as numpy.where, but due to the different lengths of the ndarrays this fails, at least in the ways that I tried.
If all of the subarrays in values are guaranteed to be non-empty (so that indexing with -1 returns the last subelement, not an error), then you can do this:
>>> almost = values[indices] # almost what you want; uses -1 as a real index
>>> almost.content = awkward.MaskedArray(indices.content < 0, almost.content)
>>> almost.fillna(0.0)
<JaggedArray [[1.1 0.0] [2.4 2.2 0.0] [0.0 3.1 0.0]] at 0x7fe54c713c88>
The last step is optional because without it, the missing elements are None, rather than 0.0.
If some of the subarrays in values are empty, you can pad them to ensure they have at least one subelement. All of the original subelements are indexed the same way they were before, since pad only increases the length, if need be.
>>> values = awkward.fromiter([[1.1, 1.2, 1.3], [], [2.1, 2.2, 2.3, 2.4], [], [3.1]])
>>> values.pad(1)
<JaggedArray [[1.1 1.2 1.3] [None] [2.1 2.2 2.3 2.4] [None] [3.1]] at 0x7fe54c713978>
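From there, the same recipe as before applies; a sketch (assuming awkward 0.x and an indices array rebuilt to match the new outer length of values):
padded = values.pad(1)  # every subarray now has at least one (possibly None) element
almost = padded[indices]
almost.content = awkward.MaskedArray(indices.content < 0, almost.content)
result = almost.fillna(0.0)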
I am trying to create time stamp arrays in Swift.
So, say I want to go from 0 to 4 seconds, I can use Array(0...4), which gives [0, 1, 2, 3, 4]
But how can I get [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]?
Essentially I want a flexible delta, such as 0.5, 0.05, etc.
You can use stride(from:through:by:):
let a = Array(stride(from: 0.0, through: 4.0, by: 0.5))
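If you build such arrays in several places, you could wrap this in a small convenience function (timestamps is a hypothetical helper name, not part of the standard library):
// Hypothetical convenience wrapper around stride(from:through:by:).
func timestamps(from start: Double, through end: Double, by delta: Double) -> [Double] {
    return Array(stride(from: start, through: end, by: delta))
}

let ticks = timestamps(from: 0.0, through: 4.0, by: 0.5)
// [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]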
An alternative for non-constant increments (even more viable in Swift 3.1)
The stride(from:through:by:) function covered in Alexander's answer is the fit-for-purpose solution here, but for cases where readers of this Q&A want to construct a sequence (/collection) of non-constant increments (in which case the linear-sequence-constructing stride(...) falls short), I'll also include another alternative.
For such scenarios, the sequence(first:next:) is a good method of choice; used to construct a lazy sequence that can be repeatedly queried for the next element.
E.g., constructing the first 5 ticks for a log10 scale (Double array)
let log10Seq = sequence(first: 1.0, next: { 10*$0 })
let arr = Array(log10Seq.prefix(5)) // [1.0, 10.0, 100.0, 1000.0, 10000.0]
Swift 3.1 is intended to be released in the spring of 2017, and with this (among lots of other things) comes the implementation of the following accepted Swift evolution proposal:
SE-0045: Add prefix(while:) and drop(while:) to the stdlib
prefix(while:) in combination with sequence(first:next:) provides a neat tool for generating sequences with everything from simple next methods (such as imitating the behaviour of stride(...)) to more advanced ones. The stride(...) example of this question is a good minimal (very simple) example of such usage:
/* this we can do already in Swift 3.0 */
let delta = 0.5
let seq = sequence(first: 0.0, next: { $0 + delta})
/* 'prefix(while:)' soon available in Swift 3.1 */
let arr = Array(seq.prefix(while: { $0 <= 4.0 }))
// [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
// ...
for elem in sequence(first: 0.0, next: { $0 + delta})
.prefix(while: { $0 <= 4.0 }) {
// ...
}
Again, this is not in contest with stride(...) in the simple case of this Q, but it is very viable as soon as the simple applications of stride(...) fall short, e.g. for constructing non-linear sequences.
How can one create an array filled with values within a range (having a begin and end value) and a step? It should support begin and end values of float type.
For floats with custom stepping you can use Numeric#step like so:
-1.25.step(by: 0.5, to: 1.25).to_a
# => [-1.25, -0.75, -0.25, 0.25, 0.75, 1.25]
If you are looking for how to do this with integer values only, see this post or that post on how to create ranges, and simply call .to_a at the end. Example:
(-1..1).step(0.5).to_a
# => [-1.0, -0.5, 0.0, 0.5, 1.0]
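If you want a single entry point that accepts float begin/end values and a step, a tiny wrapper will do (float_range is a hypothetical helper name):
# Hypothetical helper wrapping Numeric#step.
def float_range(from, to, step)
  from.step(by: step, to: to).to_a
end

float_range(0.0, 1.0, 0.25)
# => [0.0, 0.25, 0.5, 0.75, 1.0]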