How to compare an integer with a string of integers (list) in a SQL query - Sybase

I have a column (col1) containing a list of integers separated by spaces:

   col1        col2  col3
   =========   ====  ====
1  1 2 3 4 5   10    20
2  6 7 8 9 10  10    20
I need to print the records when a certain condition is matched against the list.
For example:
(col >= 1) should print the first row
(col >= 9) should print the second row
I am using SQL Anywhere to perform this.
I tried using sa_split_list, tokenizing the integers from the list and comparing each integer with the input number, but I could not arrive at a solution.
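For what it's worth, a minimal sketch of that sa_split_list approach (assuming the data lives in a table named t, and using 9 as the example input; SQL Anywhere's sa_split_list system procedure returns one token per row in its row_value column):

-- Sketch only: table name t and the input value 9 are placeholders
SELECT t.*
FROM t
WHERE EXISTS (
    SELECT 1
    FROM sa_split_list(t.col1, ' ') AS s   -- split the space-separated list
    WHERE CAST(s.row_value AS INT) >= 9    -- condition on any value in the list
);

If your SQL Anywhere version rejects the correlated procedure call in the FROM clause, a LATERAL derived table or a small stored procedure that loops over the rows is the usual fallback.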

Related

How to vectorize my code which runs on row iteration?

I am trying to write a code that will list the total amount of time that a record with a specific value is found next to each other as shown in the attached picture:
I have written the following code and it works fine:
bit = False
exe = True
x = len(df)
for i in range(x):
    # Reset bit if type not Residential
    if df.loc[i, 'type'] != 'Residential':
        bit = False
    # Duplicate data if type same
    if bit and i != 0:
        df.loc[i, 'Test'] = df.loc[(i - 1), 'Test']
    # Finding max row for specific row
    if not bit and df.loc[i, 'type'] == 'Residential':
        counter = 0
        k = i
        while df.loc[k, 'type'] == 'Residential' and exe:
            bit = True
            counter += 1
            df.loc[i, 'Test'] = counter
            df.loc[i, 'MaxTimeStamp'] = df.loc[k, 'price']
            if k < (x - 1):
                k += 1
            else:
                exe = False
However, when I run it on a large dataframe, the run time is very slow. I have read that row iteration is not efficient and that it's better to vectorise - but I have not been able to vectorise the code above.
Any help would be highly appreciated.
You wrote that your code is based on row iteration, but actually the situation is even worse: your main loop runs over row numbers (which here double as index values) and then, in each turn of the loop, retrieves an individual row based on this index.
To make things worse, your code:
- repeats this retrieval on each access to any element of the "current" row (indicated by i),
- will fail if the index is anything other than consecutive integers starting from 0.
As I see from your picture of the DataFrame, what you actually want is to generate a Test column holding the number of consecutive rows with type == 'Residential' that occurred (not the amount of time, which would rather mean a difference between, e.g., timestamps).
For rows with other values, the Test column should contain an empty string.
To do it, you can run
df['Test'] = df.groupby((df.type.shift(1) != df.type).cumsum()).type\
.transform(lambda grp: grp.size if grp.iloc[0] == 'Residential' else '')
To check the above code, I prepared a source DataFrame - a copy of only the
type column from your picture:
type
0 Residential
1 Residential
2 Residential
3 Residential
4 Residential
5 Condo
6 Residential
7 Residential
8 Condo
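For reference, that frame can be recreated with (a minimal sketch):

import pandas as pd

df = pd.DataFrame({'type': ['Residential'] * 5 + ['Condo']
                           + ['Residential'] * 2 + ['Condo']})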
After I ran my code, the result was:
type Test
0 Residential 5
1 Residential 5
2 Residential 5
3 Residential 5
4 Residential 5
5 Condo
6 Residential 2
7 Residential 2
8 Condo
Your code sample also sets values in a MaxTimeStamp column, but:
- your picture does not contain this column,
- it is a weird combination that a column named MaxTimeStamp stores values from the price column,
- your statement of the problem did not mention this column,
so I left this detail untouched.
It is also worth noticing that the new column is of object dtype, because not all of its elements are numbers. If you want this column to be of, e.g., int type, change the "else" value from an empty string to some integer value, e.g. 0 or -1 (meaning "no data" here).
The "else" value could also be np.nan (the true "no data" marker), but then the column would be coerced to float dtype, because np.nan is just a special case of float.
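A quick standalone check of this coercion behaviour (illustrative only):

import numpy as np
import pandas as pd

print(pd.Series([5, '']).dtype)      # object: mixed int and str
print(pd.Series([5, np.nan]).dtype)  # float64: NaN is a float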
Edit following comments as of 29.08
Explanation of the solution
Run: df.type.shift(1) - it displays the type value from the previous row:
0 NaN
1 Residential
2 Residential
3 Residential
4 Residential
5 Residential
6 Condo
7 Residential
8 Residential
Name: type, dtype: object
For the first row the result is NaN - no data, since the first row has no
previous row.
Now run: df.type.shift(1) != df.type. It displays a Series of bool dtype,
answering the question: is the type in the previous row different from the
type in the current row:
0 True
1 False
2 False
3 False
4 False
5 True
6 True
7 False
8 True
Name: type, dtype: bool
As you can see, rows with index 0, 5, 6 and 8 start a new sequence
of identical values (compare with df).
Run: (df.type.shift(1) != df.type).cumsum(). The result is:
0 1
1 1
2 1
3 1
4 1
5 2
6 3
7 3
8 4
Name: type, dtype: int32
Note that rows with index 0 thru 4 (the first group of Residential)
have value of 1. Then row 5 (Condo) has value of 2.
The next group, with value of 3 are rows 6 and 7 (Residential).
And finally row 8 has value of 4.
To see the result of grouping, run:
gr = df.groupby((df.type.shift(1) != df.type).cumsum())
for key, grp in gr:
    print(f'\nGroup: {key}\n{grp}')
The result is:
Group: 1
type
0 Residential
1 Residential
2 Residential
3 Residential
4 Residential
Group: 2
type
5 Condo
Group: 3
type
6 Residential
7 Residential
Group: 4
type
8 Condo
so you see the division into groups, just as I described in the previous
point.
To be able to invoke the transformation function individually on each
group, and see the result, define:
trFun = lambda grp: grp.size if grp.iloc[0] == 'Residential' else ''
Then run:
for key, grp in gr:
    print(f'{key}: {trFun(gr.get_group(key).type)}')
The result is:
1: 5
2:
3: 2
4:
Note that transform works in such a way that the returned value is
replicated to all group members, so:
- each row from group 1 gets 5,
- each row from group 2 gets an empty string,
- each row from group 3 gets 2,
- each row from group 4 again gets an empty string.
And just these values are saved in the Test column.
The second question
This question can be interpreted in two ways:
1. How many rows in df have type equal to that of the previous row?
Start with a more basic question: which rows have type equal to the
previous row? The answer is:
df.type.shift(1) == df.type
getting:
0 False
1 True
2 True
3 True
4 True
5 False
6 False
7 True
8 False
Name: type, dtype: bool
So, as you can see, these are rows 1 through 4 and row 7.
And to answer the original question (how many...), run:
(df.type.shift(1) == df.type).sum(), getting 5.
2. How many groups of consecutive values does the type column of this
DataFrame contain?
The answer is: (df.type.shift(1) != df.type).sum(), getting 4.

EXCEL VBA: Removing rows from an array and adding those rows to another array

I am working in EXCEL VBA.
I have a 2 dimensional array (for this example, let's say its a 5 x 5 one-based array). This is my raw data (Array "A"):
6 7 7 8 5
9 9 9 9 7
1 3 6 9 3
7 3 2 9 9
4 9 6 5 2
I also have a separate array whose row space mirrors that of the first (e.g., a 5 x 3 one-based array). The 1st column of this array is the row number of the raw data (A). This is my meta data (Array "B"):
1 0 0
2 1 0
3 0 0
4 0 0
5 1 0
For every occurrence of "1" in the 2nd column of the meta data array (B), I need to remove the corresponding row from my raw data array (A) AND add that row to a third array (Array "C"), which will not contain any data at the beginning of this process. Therefore, in this example, I need to remove rows 2 & 5 from Array A and place them in Array C.
I also need to copy the 1st column of the Array B (the original row numbers of Array A) to both arrays A & C so that after some further processing I can re-combine the results and return the data to its original order.
I'm not sure how best to go about this. Any suggestions?
Thanks!
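One possible approach, sketched below: loop over B once to count the flagged rows, size the output arrays, then loop again to copy each row of A (prefixed with its original row number from B's first column) into either the kept array or Array C. The Sub and variable names are illustrative, and it assumes one-based arrays with at least one flagged and one unflagged row:

' Sketch only - names and the dimension assumptions are illustrative
Sub PartitionRows(A As Variant, B As Variant, keepArr As Variant, C As Variant)
    Dim n As Long, nCols As Long
    Dim i As Long, j As Long
    Dim nC As Long, nK As Long, rC As Long, rK As Long

    n = UBound(A, 1)
    nCols = UBound(A, 2)

    ' First pass: count rows flagged with 1 in column 2 of B
    For i = 1 To n
        If B(i, 2) = 1 Then nC = nC + 1
    Next i
    nK = n - nC

    ' Size the outputs; the extra first column holds the original row number
    ReDim keepArr(1 To nK, 1 To nCols + 1)
    ReDim C(1 To nC, 1 To nCols + 1)

    ' Second pass: copy each row of A, prefixed with B(i, 1), into C or keepArr
    For i = 1 To n
        If B(i, 2) = 1 Then
            rC = rC + 1
            C(rC, 1) = B(i, 1)
            For j = 1 To nCols: C(rC, j + 1) = A(i, j): Next j
        Else
            rK = rK + 1
            keepArr(rK, 1) = B(i, 1)
            For j = 1 To nCols: keepArr(rK, j + 1) = A(i, j): Next j
        End If
    Next i
End Sub

Keeping the original row numbers in column 1 of both outputs is what later lets you re-combine the results and restore the original order.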

How do you fill-in missing values despite differences in index values?

Here's my situation. I have a predicted values in the form of array (i.e. ([1,3,1,2,3,...3]) ) and a data frame column of missing NA's. Both array and column of data frame have the same dimensions. But, the indices don't match another.
For instance, the indices of predicted array are 0:100.
On the other hand, the indices of the column of NA's don't begin with 0, rather the first index where NA is observed in the dataFrame.
What's Pandas function will fill-in the first missing value with the first element of predicted array, second missing value with the second element, and so forth?
Assuming your missing data is represented in the DF as NaN/None values:
import pandas as pd

df = pd.DataFrame({'col1': [2, 3, 4, 5, 7, 6, 5],
                   'col2': [2, 3, None, 5, None, None, 5]})  # col2 has missing values
pred_vals = [11, 22, 33]  # predicted values to insert in place of the missing ones

print('Original:')
print(df)

missing = df[pd.isnull(df['col2'])].index  # find indices of the missing values
df.loc[missing, 'col2'] = pred_vals        # replace the missing values

print('\nFilled:')
print(df)
Result:
Original:
col1 col2
0 2 2
1 3 3
2 4 NaN
3 5 5
4 7 NaN
5 6 NaN
6 5 5
Filled:
col1 col2
0 2 2
1 3 3
2 4 11
3 5 5
4 7 22
5 6 33
6 5 5
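Since the question stresses that the predicted array is indexed 0:100 while the NA indices start elsewhere, here is a positional variant that ignores index labels entirely (pred_full and its values are made up for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2, 3, 4, 5, 7, 6, 5],
                   'col2': [2, 3, None, 5, None, None, 5]})
pred_full = np.array([11, 22, 33, 44, 55, 66, 77])  # one prediction per row, positions 0..n-1
mask = df['col2'].isna().to_numpy()                 # True at the missing positions
df.loc[mask, 'col2'] = pred_full[mask]              # aligns by position, not by index label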

Count based on column and row

I can't seem to find something quite like this problem...
I have an array table where each row contains a random assortment of numbers 1-N
On another sheet, I have a table with column and row headers numbered 1-N
I want to count how many rows in the array contain both the column and row headers for a given cell in the table. Since countifs only reference the current cell in the specified array, they don't seem to be working in this scenario.
Example array:
A  B  C  D
1  3  5  7
1  2  3  4
2  3  4  5
2  4  6  8
...
Table results (symmetrical about the diagonal):
A  B  C  D  E  F
.  1  2  3  4  5  ...
1  -  1  2  1  1
2  1  -  2  2  1
3  2  2  -  2  2
4  1  2  2  -  1
5  1  1  2  1  -
Would using nested countifs work?
I don't agree with your results corresponding to 4/2, which surely should be 3, not 2, but this formula, based on the array table being in Sheet1 A1:D4 and the results table being in Sheet2 A1:F6, placed in cell B2 of the latter, should work:
=IF($A2=B$1,"-",SUMPRODUCT(N(MMULT(N(COUNTIF(OFFSET(Sheet1!$A$1:$D$1,ROW(Sheet1!$A$1:$D$4)-MIN(ROW(Sheet1!$A$1:$D$4)),),CHOOSE({1,2},B$1,$A2))>0),{1;1})=2)))
Copy across and down as required.
Note: If your actual table is in fact much larger than the one given, it will probably be worth adding a simple clause to the above so that the results for approximately half of the cells are obtained from their symmetrical counterparts, rather than via this construction, thus saving resources.
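A sketch of such a clause (untested against your actual layout, with the results table still in Sheet2 A1:F6): cells below the diagonal simply read their mirrored counterparts above it, where <original formula> stands for the SUMPRODUCT construction given above:

=IF($A2=B$1,"-",IF($A2>B$1,INDEX($B$2:$F$6,MATCH(B$1,$A$2:$A$6,0),MATCH($A2,$B$1:$F$1,0)),<original formula>))

This way only roughly half of the cells evaluate the expensive construction.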
Regards

Associating / linking an array column with another column in the array

I have an array that has some calculations done on the second column. I would like the values in the third column to follow / be linked to the second column.
Test Code:
a1 = [1, 10,   -11;
      2, 70,   232;
      3, 33.2, -33;
      4, 40,    44]

a2calc = abs(a1(:,2) - max(a1(:,2)))   % calculation
a2 = [a1(:,1), a2calc, a1(:,3)]        % new array
Example:
original a1 Array
1 10 -11
2 70 232
3 33.2 -33
4 40 44
a2 Array after column 2 calculations looks like this
1 60 -11
2 0 232
3 36.8 -33
4 30 44
I'm trying to get the final array to look like this (column 3 values follow / are linked to the second column)
1 60 232
2 0 -11
3 36.8 44
4 30 -33
What I'm having problems with is that I'm not sure whether I should use the index values of column 2, and if so, how to get the result to look like the final output array I included in the question.
I might be wrong here, but it looks to me like the logic is:
After calculating the second column, change the order of the third column so that the third column is sorted the same way as the second. To see what I mean:
This represents the two columns, numbered from highest to lowest:
A = 1  1
    4  3
    2  2
    3  4
If I understand it right, you want the resulting matrix to be
A = 1  1
    4  4
    2  2
    3  3
If this is the right logic then you should check out sort with two outputs. You can use the second output to index the third column.
[~, idx] = sort(A(:, 2));   % idx is the row order that sorts column 2
sorted_3 = sort(A(:, 3));   % column 3 values, sorted ascending
A(idx, 3) = sorted_3;       % place the sorted values following column 2's order
The output from this is:
A =
   1.00000    60.00000   232.00000
   2.00000     0.00000   -33.00000
   3.00000    36.80000    44.00000
   4.00000    30.00000   -11.00000
Good luck!
