checking rows of row_number result on condition of another column

checking rows of row_number result on condition of another column - sql-server

I have a query that outputs some rows that I need to "filter".
The data I want to filter is like this:
rownum value
1 0
2 0
3 1
4 1
I need the first 2 rows, but only when they have "0" in the value-column.
The structure of the query is like this:
select count(x)
from
(
select row_number() over (partition by X order by y) as rownum, bla, bla
from [bla bla]
group by [bla bla]
) as temp
where
/* now this is where i want the magic to happen */
temp.rownum = 1 AND temp.value = 0
AND
temp.rownum = 2 AND temp.value = 0
So I want x only when row 1 and 2 have "0" in the value-column.
If either rownumber 1 or 2 have a "1" in the value-column, I dont want them.
I basically wrote the where-clause the way I wrote it here, but it's returning data sets that have "1" as value in either row 1 or 2.
How to fix this?

A value in a single row for a given column can never be both 1 and 2 at the same time.
So, first of all, either use OR (which is what I believe you intended):
WHERE (temp.rownum = 1 AND temp.value = 0)
OR (temp.rownum = 2 AND temp.value = 0)
Or simplify the query altogether:
WHERE temp.rownum <= 2 AND temp.value = 0
From here, you will get just the first two rows, and only if value = 0. If you only want rows when both rows are returned (i.e. both rows have value = 0), add
HAVING COUNT(1) = 2
to the query.

Related

Using the window function "last_value", when the values of the sorted field are same, the value snowflake returns is not the last value

As we all known, the window function "last_value" returns the last value within an ordered group of values.
In the following example, group by field "A" and sort by field "B" in positive order.
In the group of "A = 1", the last value is returned, which is, the C value 4 when B = 2.
However, in the group of "A = 2", the values of field "B" are the same.
At this time, instead of the last value, which is, the C value 4 in line 6, the first C value 1 in B = 2 is returned.
This puzzles me why the last value within an ordered group of values is not returned when I encounter the value I want to use for sorting.
Example
row_number
A
B
C
LAST_VALUE(C) IGNORE NULLS OVER (PARTITION BY A ORDER BY B ASC)
1
1
1
2
4
2
1
1
1
4
3
1
1
3
4
4
1
2
4
4
5
2
2
1
1
6
2
2
4
1

This puzzles me why the last value within an ordered group of values is not returned when I encounter the value I want to use for sorting.
For partition A equals 2 and column B, there is a tie:
The sort is NOT stable. To achieve stable sort a column or a combination of columns in ORDER BY clause must be unique.
To ilustrate it:
SELECT C
FROM tab
WHERE A = 2
ORDER BY B
LIMIT 1;
It could return either 1 or 4.

If you sort by B within A then any duplicate rows (same A and B values) could appear in any order and therefore last_value could give any of the possible available values.
If you want a specific row, based on some logic, then you would need to sort by all columns within the group to reflect that logic. So in your case you would need to sort by B and C

Good day Bill!
Right, the sorting is not stable and it will return different output each time.
To get stable results, we can run something like below
select
column1,
column2,
column3,
last_value(column3) over (partition by column1 order by
column2,column3) as column2_last
from values
(1,1,2), (1,1,1), (1,1,3),
(1,2,4), (2,2,1), (2,2,4)
order by column1;

SQL Equivalent of Countif or Index Match on row level instead of columnar level

I have a table which comprises of 30 columns, all adjacent to one another. Of these 5 are text fields indicating certain details pertaining to that entry and 25 are value fields. Value fields have the column name as Val00, Val01, Val02 .....upto Val24
Based on a logic appearing elsewhere, these value fields input a value for n amount of columns and then drop to 0 for all of the subsequent fields
e.g.
When n is 5 the output will be
Val00
Val01
Val02
Val03
Val04
Val05
Val06
Val07
Val24
1.5
1.5
1.5
1.5
1.5
0
0
0
0
As can be seen, all values starting from val05 will drop to 0 and all columns from Val 05 to Val24 will be 0.
Given this output, I want to find what this n value is and create a new column ValCount to store this value in it.
In Excel this would be fairly straight forward to achieve with the below formula
=COUNTIF(Val00:Val24,">0")
However I'm not sure how we would go about in terms of SQL. I understand that the count function works on a columnar level and not on a row level.

I found a solution which is rather long but it should do the job so I hope it helps you.
SELECT SUM(SUM(CASE WHEN Val00 >= 1 THEN 1 ELSE 0 END)
+ SUM(CASE WHEN Val01 >= 1 THEN 1 ELSE 0 END)
+ SUM(CASE WHEN Val02 >= 1 THEN 1 ELSE 0 END)) As ValCount

Create an ID to identify each loop in SAS

I have a log dataset in SAS like this, which has been ordered by TimeStamp ascendantly
TimeStamp Status
2015Dec01:1:00:00 1
2015Dec01:2:00:00 2
2015Dec01:3:00:00 3
2015Dec01:4:00:00 4
2015Dec01:5:00:00 1
2015Dec01:6:00:00 2
2015Dec01:7:00:00 2
2015Dec01:8:00:00 4
2015Dec01:9:00:00 5
2015Dec01:10:00:00 1
2015Dec01:11:00:00 3
2015Dec01:11:30:00 4
I wanted to create an ID to identify each loop which always started from status 1 and ended at status 4 (no matter what status between 1 and 4) like this:
Time Stamp Status ID
2015Dec01:1:00:00 1 1
2015Dec01:2:00:00 2 1
2015Dec01:3:00:00 3 1
2015Dec01:4:00:00 4 1
2015Dec01:5:00:00 1 2
2015Dec01:6:00:00 2 2
2015Dec01:7:00:00 2 2
2015Dec01:8:00:00 4 2
2015Dec01:9:00:00 5 .
2015Dec01:10:00:00 1 3
2015Dec01:11:00:00 3 3
2015Dec01:11:30:00 4 3
Does anyone can help me out? Thanks a lot

Define your rules (assumed):
Increment ID when status=1
If status>5 then ID is missing
data tmp;
set have;
retain ID_TMP 0; *initialize ID;
if status=1 then ID_TMP + 1;
if status<5 then ID=ID_TMP;
DROP ID_TMP;
run;

We know that a new group has started if current status is < previous status. We'll save the previous status using the lag function so that we can compare the current status to the previous one.
Create a temporary variable called count to count when we increment the ID. Since we are using if-then logic, we want to initialize ID and count with a value of 1 using the retain statement.
There are three cases to account for:
If the current ID is less than the previous ID, then increment count by 1 and set ID to be the value of count;
If the current ID is > 4, set ID to missing.
Any other time, ID will stay the same.
data want;
set log;
retain count ID 1;
Prior_Status = lag(Status);
if(Status < Prior_Status) then do;
count+1;
ID = count;
end;
else if(Status > 4) then call missing(ID);
drop count Prior_Status;
run;

Here's a couple of ways of doing it.
Method 1:
data want;
set have;
if status = 1 then tmp + 1;
if status <= 4 then id = tmp;
else id = .;
run;
Method 2:
data want;
set have;
if status = 1 then tmp + 1;
id = choosen((status<=4)+1, ., tmp);
run;

Updating multiple rows with respect to a change in one row

I am using PostgreSQL and let's say I have a tasks table to keep track of task items. Tasks table is as follows;
Id Name Index
7 name A 1
5 name B 2
6 name C 3
3 name D 4
Index column in tasks table stores the sort order of the tasks. Therefore I will output the tasks with respect to index in increasing order.
So When I change Task D(id = 3)' s index into 2 the new indexes should be as below;
Id Name Index
7 name A 1
5 name B 3
6 name C 4
3 name D 2
or when I change Task A(id = 7)' s index into 4 the new indexes should be as below;
Id Name Index
7 name A 4
5 name B 2
6 name C 3
3 name D 1
What I think is updating all row's index values one by one is pretty inefficient.
So what is the most efficient way to update all index values when I change one of the indexes in my Tasks table?
Edit :
First of all sorry for the confusion. What I am asking is not a simple exchanging two row indexes. If you look at the examples when I change Task D's index in to 2 more than one rows change. So when Task D is 2, Task B becomes 3 and Task C becomes 4.
For instance;
It is like when you drag Task D and drop below Task A so that it's index becomes 2 and B and C's index increases by 1.

SQL Fiddle
What you are doing is exchanging two row's indexes. So it is necessary to store the index value of the first updated one in a temp variable and setting it temporarily to a special value to avoid a unique index collision, that is, if the index is unique. If the index is not unique that step is unnecessary.
begin;
create temp table t as
select
(
select index
from tasks
where id = 3
) as index,
(
select id
from tasks
where index = 2
) as id
;
update tasks
set index = -1
where id = (select id from t)
;
update tasks
set index = 2
where id = 3
;
update tasks
set index = (select index from t)
where id = (select id from t)
;
drop table t;
commit;

The following assumes the index column (as well as id) is unique:
with swapped as (
select n1.id as id1,
n1.name as name1,
n1.index as index1,
n2.id as id2,
n2.name as name2,
n2.index as index2
from names n1
join names n2 on n2.index = 2 -- this is the value of the "new index"
where n1.id = 3 -- this is the id of the row where the index should be changed to the new value
)
update names as n
set index = case
when n.id = s.id1 then s.index2
when n.id = s.id2 then s.index1
end
from swapped s
where n.id in (s.id1, s.id2);
The CTE first creates a single row with the ids of the two rows to be swapped and then the update just compares the ids of the target table with those from the CTE, swapping the values.
SQLFiddle example: http://sqlfiddle.com/#!15/71dc2/1

TSQL - How to prevent optimisation in this query

I have a query analogous to:
update x
set x.y = (
select sum(x2.y)
from mytable x2
where x2.y < x.y
)
from mytable x
the point being, I'm iterating over rows and updating a field based on a subquery over those fields which are changing.
What I'm seeing is the subquery is being executed for each row before any updates occur, so the changed values for each row are not being picked up.
How can I force the subquery to be re-evaluated for each row of the update?
Is there a suitable table hint or something?
As an aside, I was doing the below and it did work, however since modifying my query somewhat (for logic purposes, not to try and solve this issue) this trick no longer works :(
declare #temp int
update x
set #temp = (
select sum(x2.y)
from mytable x2
where x2.y < x.y
),
x.y = #temp
from mytable x
I'm not particularly concerned about performance, this is a background task run over a few rows

It looks like task is incorrect or other rules should apply.
Let's see on example. Let's say you have values 4, 1, 2, 3, 1, 2
Sql will update rows based on original values. I.e. during single update statement newly calculated values is NOT mixing with original values:
-- only original values used
4 -> 9 (1+2+3+1+2)
1 -> null
2 -> 2 (1+1)
3 -> 6 (1+2+1+2)
1 -> null
2 -> 2 (1+1)
Based on your request you wants that update of each rows will count newly calculated values. (Note, that SQL does not guarantees the sequence in which rows will be processed.)
Let's do this calculation by processing rows from top to bottom:
-- from top
4 -> 9 (1+2+3+1+2)
1 -> null
2 -> 1 (1)
3 -> 4 (1+1+2)
1 -> null
2 -> 1 (1)
Do the same in other sequence - from bottom to top:
-- from bottom
4 -> 3 (2+1)
1 -> null
2 -> 1 (1)
3 -> 5 (2+2+1)
1 -> null
2 -> 2 (1+1)
How you can see your expected result is inconsistent. To make it right you need to correct the calculation rule - for instance define strong sequence of the rows to process (date, id, ...)
Also, if you want to do some recursive processing look at the common_table_expression:
http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx