Comparing two array values and deleting row if condition is fulfilled

I have an array with 2 columns. I want to compare the values of the first column, like this:
a[i+1]-a[i]<0.0025.
If this is true I need to delete the row with a[i+1].
This is my first attempt, but it doesn't work.
a = np.delete(a, np.diff(a[:,0])<0.0025, 0)
I get the following error:
ValueError: boolean array argument obj to delete must be one dimensional and match the axis length of 8628
8628 is the length of the array.
Another approach I've tried is:
a = a[~(np.diff(a[:,0]))<0.0025]
But then I get this error:
TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Can somebody help me with this?

You're on the right track. np.diff(a[:, 0]) < 0.0025 creates a boolean array with one element fewer than the 8628 rows of a, so when you pass it to np.delete its length no longer matches the original array.
I would go with your second attempt. Using < 0.0025 produces a mask of the items you want to delete, which you need to invert with ~ to get a mask of the items you want to keep. Make sure to place your parentheses correctly: ~( np.diff(a[:, 0]) < 0.0025 ). In your version, ~ is applied to the float result of np.diff before the comparison happens, and invert is not defined for floats, hence the TypeError. Alternatively, use >= 0.0025 to build the mask of items to keep directly.
Lastly, you have to make the dimensions match (given that np.diff returns one element fewer). You can do this by prepending True to signify that you always want to keep the first value. One way to do that is with np.r_.
Final code:
import numpy as np
a = np.random.rand(8628, 2) # Your array here
result = a[ np.r_[True, np.diff(a[:, 0]) >= 0.0025] ]
Detailed example:
Consider the array:      [ 1,  3,  2,  5,  3]
np.diff creates:         [ 2, -1,  3, -2]
Using threshold creates: [True, False, True, False]
Note that when the next element in the original array is less than the previous one,
the thresholding results in False too.
Finally, because there are now 4 values instead of 5, we prepend True.
This has the effect of always including the first element in the result:
Original: [   1,    3,     2,    5,     3]
Mask:     [True, True, False, True, False]
           ^^^^  ~~~~  ~~~~~  ~~~~  ~~~~~
Then using boolean indexing, we get the elements where
the mask contains True to obtain the final result:
[1, 3, 5]
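The walk-through above can be checked on a small two-column array. This is a minimal sketch: the helper name drop_small_steps and the sample values are made up for illustration, and the second column only shows that whole rows are kept or dropped together.

```python
import numpy as np

def drop_small_steps(a, threshold=0.0025):
    """Keep row 0, then keep row i+1 only if a[i+1, 0] - a[i, 0] >= threshold."""
    # np.diff returns len(a) - 1 values, so prepend True for the first row.
    keep = np.r_[True, np.diff(a[:, 0]) >= threshold]
    return a[keep]

# First column matches the example above; second column is arbitrary payload.
a = np.array([[1, 10], [3, 20], [2, 30], [5, 40], [3, 50]], dtype=float)
result = drop_small_steps(a)
print(result[:, 0])  # [1. 3. 5.]
```

Note that each difference is taken between neighbours in the original array, not between a row and the last row that was kept.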

Related

pyspark: From an array of structs, extract a scalar from the struct for even or odd index, then postprocess the array

I have a dataframe row that contains an ArrayType column named moves. moves is an array of StructType with a few fields in it. I can use a dotpath to fetch just one field from that struct and make a new array of just that field e.g. .select(df.moves.other) will create a same-length array as moves but only with the values of the other field. This is the result:
[null, [{null, null, [0:10:00]}], null, null, [{null, null, [0:10:00]}], [{null, null, [0:09:57]}], [{null, null, [0:09:56]}], [{null, null, [0:09:54]}], ...
So clearly other is not simple. Each element in the array is either null (idx 0,2,and 3 above) if 'other' is not in the struct (which is permitted) or an array of struct where the struct contains field clk which itself is an array (note that simple SPARK output does not list the field names, just the values. The nulls in the struct are unset fields). This is a two-player alternating move sequence; we need to do two things:
Extract the even idx elements and the odd idx elements.
From each, "simplify" the array so that each entry is either null or the value of the zeroth entry in the clk field.
This is the target:
even list: [null, null, "0:10:00", "0:09:56", ...
odd list: ["0:10:00", null, "0:09:57", ...
Lastly, we wish to walk these arrays (individually) and compute delta time (n+1 - n) iff both n+1 and n not null.
This is fairly straightforward in "regular" python using slicing e.g. [::2] for evens and [1::2] for odds and map and list comprehensions etc. etc. But I cannot seem to assemble the right functions in pyspark to create the simplified arrays (forget about converting 0:10:00 to something for the moment). For example, unlike regular python, pyspark slice does not accept a step argument and pyspark needs more conditional logic around nulls. transform is promising but I cannot get it to skip entries to arrive at a shorter list.
I tried going the other direction with a UDF. To start, my UDF returned the array that was passed to it:
moves_udf = F.udf(lambda z: z, ArrayType(StructType()))
df.select( moves_udf(df.moves.other) )
But this yielded a grim exception, possibly because the other array contains nulls:
py4j.protocol.Py4JJavaError: An error occurred while calling o55.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (192.168.0.5 executor driver): net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for pyspark.sql.types._create_row)
at net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
...
I know the UDF machinery works for simple scalars. I tested a toUpper() function on a different column and the UDF worked fine.
Almost all of the other move data is much more "SPARK friendly". It is the other field and the array-of-array substructure that is vexing.
Any guidance most appreciated.
P.S. All my pyspark logic is pipelined functions; no SQL. I would greatly prefer not to mix and match.

The trick is to use transform as a general loop exploiting the binary form of the lambda that also passes the current index into the array. Here is a solution:
# Translation:
# select 1: Get only even moves and call the output a "temporary" column 'X'
# select 2: X will look like this [ [{null,null,["0:03:00"]},null,{null,null,["0:02:57"]},...]
# This is because the dfg.moves.a is an array in the moves
# array. In this example, we do not further filter on the
# entries in the 'a' array; we know we want a[0] (the first one).
# We just want ["0:03:00",null,"0:02:57",...]
# x.clk will get the clk field but the value there is *also*
# an array so we must use subscript [0] *twice* to dig thru
# to the string we seek
# select 3: Convert all the "h:mm:ss" strings into timestamps
# select 4: Walk the deltas and return diff to the next neighbor.
# The last entry is always null.
dfx = dfg\
    .select( F.filter(dfg.moves.a, lambda x,i: i % 2 == 0).alias('X'))\
    .select( F.transform(F.col('X'), lambda x: x.clk[0][0]).alias('X'))\
    .select( F.transform(F.col('X'), lambda x: F.to_timestamp(x,"H:m:s").cast("long")).alias('X'))\
    .select( F.transform(F.col('X'), lambda x,i: x - F.col('X')[i+1]).alias('delta'))
dfx.show(truncate=False)
dfx.printSchema()
+---------------------------------------------------------------+
|delta |
+---------------------------------------------------------------+
|[2, 1, 2, 1, 4, 9, 16, 0, 6, 3, 8, 5, 2, 12, 4, 4, 10, 0, null]|
+---------------------------------------------------------------+
root
|-- delta: array (nullable = true)
| |-- element: long (containsNull = true)
If you prefer a more compact form, the first three selects can be merged into one:
dfx = dfg\
    .select( F.transform(F.filter(dfg.moves.a, lambda x,i: i % 2 == 0), lambda x: F.to_timestamp(x.clk[0][0],"H:m:s").cast("long")).alias('X') )\
    .select( F.transform(F.col('X'), lambda x,i: x - F.col('X')[i+1]).alias('delta'))
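For comparison, here is a plain-Python sketch of the same four steps (pick every second move, dig out clk[0], convert to seconds, subtract the next neighbour). It does not use Spark at all. The helper names to_seconds and clock_deltas are hypothetical, and the sample data is modelled on the output shown in the question, with each non-null entry as a list holding one dict with a clk list.

```python
def to_seconds(clock):
    """Convert an 'h:mm:ss' string to seconds; pass None through."""
    if clock is None:
        return None
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def clock_deltas(moves, parity=0):
    """Return current-minus-next clock differences for even (0) or odd (1) indexes."""
    picked = moves[parity::2]
    # Each entry is None or a list of dicts; take clk[0] of the first dict.
    secs = [to_seconds(m[0]["clk"][0]) if m else None for m in picked]
    deltas = []
    for i, current in enumerate(secs):
        following = secs[i + 1] if i + 1 < len(secs) else None
        both_set = current is not None and following is not None
        deltas.append(current - following if both_set else None)
    return deltas

moves = [
    None, [{"clk": ["0:10:00"]}], None, None,
    [{"clk": ["0:10:00"]}], [{"clk": ["0:09:57"]}],
    [{"clk": ["0:09:56"]}], [{"clk": ["0:09:54"]}],
]
even_deltas = clock_deltas(moves, parity=0)  # [None, None, 4, None]
odd_deltas = clock_deltas(moves, parity=1)   # [None, None, 3, None]
```

As in the Spark version, the last entry of each result is always None because it has no next neighbour.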

None in python array

arr=[1,2,3,4,5]
n=len(arr)
temp = n*[None]
flag = True
flag= bool(1-flag)
I'm new to python, so not sure what it really means.
I want to know what all three lines of code do. Thank you

The first line creates a list of five elements (Python's built-in sequence type is called a list rather than an array).
print(arr)
[1, 2, 3, 4, 5]
The second line creates a variable named n that holds the number of elements in the list.
print(n)
5
The third line creates a list of length 5 in which every element is None.
None is Python's built-in object for "no value". It is comparable to null in other languages, but it is a real object (the only instance of NoneType) and is not the same as 0, False, or an empty string.
print(temp)
[None, None, None, None, None]
The last line changes flag to False (the line before it sets flag to True).
In Python, bool is a subclass of int, so True behaves as 1 and False as 0 in arithmetic. 1 - flag is therefore 1 - 1 = 0, and bool(0) is False.
print(flag)
False
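As a small illustration of the points above, the sketch below repeats the snippet and compares the arithmetic toggle with the more common not operator. The variable names toggled and first_is_unset are made up for this example.

```python
arr = [1, 2, 3, 4, 5]
n = len(arr)
temp = n * [None]        # placeholder list: five references to the None object

flag = True
flag = bool(1 - flag)    # 1 - True == 0, and bool(0) is False

toggled = not True       # the usual way to flip a boolean; also False

# None is normally tested with "is", because there is only one None object.
first_is_unset = temp[0] is None
```

Both forms flip the flag; not flag is generally easier to read than bool(1 - flag).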

Ruby - compare two arrays for index matches and with the remainder if included

I am working on a project to recreate the game Mastermind. I need to compare two arrays and am running into some trouble.
I need to output two integers for the flow of the game to work.
The first integer is the number of correct choices where the index matches. The code I have for this appears to be working:
pairs = @code.zip(guess)
correct_position_count = pairs.select { |pair| pair[0] == pair[1] }.count
Here @code is a 4-element array and guess is also a 4-element array.
For the second part, I am having trouble working out how to do the comparison and return a count. The integer should cover the positions where the two arrays do not match by index (the code block above, but with !=): it should say how many elements of the guess array, excluding exact index matches, are also included in the code array, again excluding the exact index matches.
Any help would be greatly appreciated!

I am not completely sure I understand your problem, but if I understood correctly, you have two arrays: solution, holding the solution, and guess, holding the player's current guess.
Now, let's assume that the solution is 1234 and that the guess is 3335.
solution = [1, 2, 3, 4]
guess = [3, 3, 3, 5]
An element-by-element comparison produces an array of booleans.
diff = guess.map.with_index { |x,i| x == solution[i] }
# = [false, false, true, false]
Now you can easily compute the number of correct digits with diff.count true and the number of wrong digits with diff.count false. And, in case you need the indexes of the false and/or true values, you can do
diff.each_index.select { |i| diff[i] } # indexes with true
# = [2]
diff.each_index.select { |i| !diff[i] } # indexes with false
# = [0, 1, 3]

You can count all digit matches ignoring their positions and then subtract exact matches.
pairs = @code.zip(guess)
correct_position_count = pairs.select { |pair| pair[0] == pair[1] }.count
any_position_count = 0
code_digits = @code.clone # copy so that @code itself is not modified
guess.each do |digit|
  if code_digits.include?(digit)
    code_digits.delete_at(code_digits.find_index(digit)) # delete the found digit so it is not counted more than once
    any_position_count += 1
  end
end
inexact_position_count = any_position_count - correct_position_count
puts "The first value: #{correct_position_count}"
puts "The second value: #{inexact_position_count}"
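The same "count all matches, then subtract exact matches" idea can be sketched in Python. This is not a translation of the Ruby above: it swaps the delete-as-you-go loop for a multiset intersection with collections.Counter, which gives the same counts. The function name score_guess is made up for illustration.

```python
from collections import Counter

def score_guess(code, guess):
    """Return (exact, inexact) match counts for a Mastermind guess."""
    # Exact matches: same digit at the same index.
    exact = sum(c == g for c, g in zip(code, guess))
    # Matches in any position: per digit, the smaller of the two counts.
    any_position = sum((Counter(code) & Counter(guess)).values())
    return exact, any_position - exact

first = score_guess([1, 2, 3, 4], [3, 3, 3, 5])   # (1, 0)
second = score_guess([1, 2, 3, 4], [4, 3, 2, 1])  # (0, 4)
third = score_guess([1, 1, 2, 2], [1, 2, 1, 2])   # (2, 2)
```

In the first case only one of the three 3s in the guess counts, because the code contains a single 3 and it is already an exact match.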

What does the range method getValues() return and setValues() accept?

I want to get a range from my sheet. As recommended in Best practices, I am trying to get an array and manipulate it, but I'm confused:
const ss = SpreadsheetApp.getActive(),
  sh = ss.getSheetByName("Sheet1"),
  rg = sh.getRange("A1:C1"),//has 1,2,3
  values = rg.getValues();
console.log(values);
The logs show
[[1,2,3]]
As you can see, I got all three elements. But when I log the length of the array (values.length), it is just 1 instead of 3. When I test for the existence of an element using .indexOf or .includes, I get -1 or false.
const values = /*same as logged above*/[[1,2,3]];
console.log(values.indexOf(2));//got -1 expected 1
console.log(values.includes(1));//got false expected true
Why?
I have the same issue with setValues().
rg.setValues([1,2,3]);//throws error
The error is
"The parameters (number[]) don't match the method signature for SpreadsheetApp.Range.setValues."
My specific Question is: What exactly does getValues() return? Is it a special kind of array?

Documentation excerpts:
According to the official documentation, getValues() returns
"a two-dimensional array of values,"
It ALWAYS returns a two-dimensional array of values.
A one-dimensional array is
[1,2,3]
A two-dimensional array is
[[1,2,3]]
//or
[[1], [2], [3]]
That is, there are one or more arrays inside an outer array.
"indexed by row, then by column."
It is indexed by row first: the outer array contains the rows as inner arrays, and each inner array contains the column elements of that row. Consider the following simple spreadsheet:
   | A | B | C
1> | 1 | 2 | 3
2> | 2 | 3 | 4
3> | 3 | 4 | 5
A1:A3 contains 3 rows and each row contains 1 column element. This is represented as [[1],[2],[3]]. Similarly, the following ranges correspond to the following arrays. Try to guess the array structure based on the A1 notation:
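A1Notation | Number of Rows | Number of columns | Array Structure     | array.length | array[0].length
-----------+----------------+-------------------+---------------------+--------------+----------------
A1:A3      | 3              | 1                 | [[1],[2],[3]]       | 3            | 1
A1:C1      | 1              | 3                 | [[1,2,3]]           | 1            | 3
A1:B2      | 2              | 2                 | [[1,2],[2,3]]       | 2            | 2
B1:C3      | 3              | 2                 | [[2,3],[3,4],[4,5]] | 3            | 2
A2:C3      | 2              | 3                 | [[2,3,4],[3,4,5]]   | 2            | 3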
Note how the two dimensions convey direction: rows run down the outer array, and columns run across each inner array.
The following snippet prints each range as a table:
const test = {
'A1:A3': [[1], [2], [3]],
'A1:C1': [[1, 2, 3]],
'A1:B2': [
[1, 2],
[2, 3],
],
'B1:C3': [
[2, 3],
[3, 4],
[4, 5],
],
'A2:C3': [
[2, 3, 4],
[3, 4, 5],
],
};
Object.entries(test).forEach(([key, value]) => {
console.log(`The range is ${key}`);
console.table(value);
console.info(`The above table's JavaScript array notation is ${JSON.stringify(value)}`)
console.log(`=================================`);
});
The values may be of type Number, Boolean, Date, or String, depending on the value of the cell.
In the above example, the spreadsheet Number values are converted to the JavaScript number type. You can check a cell's spreadsheet type using =TYPE(). Each spreadsheet type maps to the corresponding JavaScript type.
Empty cells are represented by an empty string in the array.
Check using
console.log(values[0][0]==="")//logs true if A1 is empty
Remember that while a range index starts at 1, 1, the JavaScript array is indexed from [0][0].
Given the two-dimensional array structure, two indexes in the form array[row][column] are needed to access a value. In the table above, if A2:C3 is retrieved, C3 is accessed with values[1][2]. [1] is the second row of the range A2:C3; because the range itself starts on sheet row 2, the second row of the range is sheet row 3. [2] is the third column, C.
Notes:
Warning:
Values retrieved from a range are always two-dimensional, regardless of the range's height or width (even if it is just 1). For example, if A1 contains 1, getRange("A1").getValues() returns [[1]].
setValues() will accept the same array structure corresponding to the range to set. If a 1D array is attempted, the error
The parameters (number[]/string[]) don't match the method signature for SpreadsheetApp.Range.setValues.
is thrown.
If the array does NOT exactly correspond to the range being set, i.e., if the length of any inner array does not match the number of columns in the range, or the length of the outer array does not match the number of rows in the range, an error similar to the following is thrown:
The number of columns in the data does not match the number of columns in the range. The data has 5 but the range has 6.
Related answers to the above error:
https://stackoverflow.com/a/63770270
indexOf and includes compare with strict equality, so searching the outer array for a primitive such as 2 compares it against the inner array objects and never matches. You can use Array.prototype.flat to flatten the 2D array into a 1D one. Alternatively, use a plain for-loop to check the values.
const values = [[1,2,3]].flat();//flattened
console.log(values.indexOf(2));//expected 1
console.log(values.includes(1));//expected true
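The row-then-column structure can also be modelled outside Apps Script. The sketch below is plain Python with nested lists, not the Apps Script API: the get_values helper is a hypothetical stand-in for getRange(...).getValues() that takes 1-based row and column numbers, and sheet holds the sample spreadsheet from above.

```python
# Plain-Python model of the sample sheet; rows 1-3, columns A-C.
sheet = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]

def get_values(sheet, first_row, first_col, num_rows, num_cols):
    """Hypothetical stand-in: always returns a list of rows (a 2D structure)."""
    rows = sheet[first_row - 1:first_row - 1 + num_rows]
    return [row[first_col - 1:first_col - 1 + num_cols] for row in rows]

a1_c1 = get_values(sheet, 1, 1, 1, 3)  # [[1, 2, 3]]
a1_a3 = get_values(sheet, 1, 1, 3, 1)  # [[1], [2], [3]]
a2_c3 = get_values(sheet, 2, 1, 2, 3)  # [[2, 3, 4], [3, 4, 5]]

c3 = a2_c3[1][2]                       # second row of the range, third column
flat = [value for row in a1_c1 for value in row]

found_in_2d = 2 in a1_c1               # False: the only element is the list [1, 2, 3]
found_in_flat = 2 in flat              # True after flattening
```

The outer length is the number of rows and the length of each inner list is the number of columns, which is why a membership test for a single number on the outer structure fails until the rows are flattened.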

How to convert two associated arrays so that elements are evenly distributed?

There are two arrays: an array of images and an array of the corresponding labels (e.g. pictures of figures and their values).
The occurrences in the labels are unevenly distributed.
What I want is to cut both arrays in such a way, that the labels are evenly distributed. E.g. every label occurs 2 times.
To test I've just created two 1D arrays and it was working:
import numpy as np
labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1,])
images = np.array(['A','B','C','C','A','B','A','C','A','C','A',])
x, y = zip(*sorted(zip(images, labels)))
label = list(set(y))
new_images = []
new_labels = []
amount = 2
for i in label:
    start = y.index(i)
    stop = start + amount
    new_images = np.append(new_images, x[start: stop])
    new_labels = np.append(new_labels, y[start: stop])
What I get/want is this:
new_labels: [ 1. 1. 2. 2. 3. 3.]
new_images: ['A' 'A' 'B' 'B' 'C' 'C']
(It is not necessary, that the arrays are sorted)
But when I tried it with the real data (images.shape = (35000, 32, 32, 3), labels.shape = (35000,)) I got an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The existing explanations of this error message did not help me much.
I think that my solution is quite dirty anyhow. Is there a way to do it right?
Thank you very much in advance!

The error comes from sorted. zip(images, labels) creates tuples of the form (image, label), and sorted compares tuples element by element, starting with the first element. With your 1D test data the first element is a string, which compares fine. With your real data the first element is an array of shape (32, 32, 3); comparing two such arrays produces an array of booleans rather than a single True or False, so Python cannot decide the order and raises this error.
In more detail:
x, y = zip(*sorted(zip(images, labels)))
First, you zip your images and labels. This creates tuples from the corresponding elements of images and labels: the first element of images paired with the first element of labels, and so on.
With your real data, each array of shape (32, 32, 3) is paired with its label.
Second, you sort all those tuples. sorted first tries to order them by the first element of each tuple and only looks at the second element when the first elements are equal. Since the first elements are arrays here, it cannot compare them and throws the error.
You can solve this by explicitly telling sorted to sort only on the label, which is the second element of each tuple:
x, y = zip(*sorted(zip(images, labels), key=lambda pair: pair[1]))
If performance matters, itemgetter is usually a little faster than a lambda:
from operator import itemgetter
x, y = zip(*sorted(zip(images, labels), key=itemgetter(1)))
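Since the question notes that the result does not need to be sorted, one alternative is to skip sorting entirely and select rows by index. The sketch below is a different technique from the sorted-based approach: it collects the first amount positions of each label with np.flatnonzero and indexes both arrays with them. The function name balance_by_label and the small stand-in arrays are made up for illustration, and it assumes every label occurs at least amount times.

```python
import numpy as np

def balance_by_label(images, labels, amount):
    """Keep the first `amount` samples of every label, for both arrays."""
    keep = np.concatenate([
        np.flatnonzero(labels == label)[:amount]  # positions of this label
        for label in np.unique(labels)
    ])
    return images[keep], labels[keep]

labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1])
# Stand-in "images": one small 2x2 array per label instead of (32, 32, 3).
images = np.arange(labels.size * 4).reshape(labels.size, 2, 2)

new_images, new_labels = balance_by_label(images, labels, amount=2)
print(new_labels)        # [1 1 2 2 3 3]
print(new_images.shape)  # (6, 2, 2)
```

Indexing with an integer array keeps the image dimensions intact, whereas np.append without an axis argument flattens its inputs.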
