PostgreSQL filter entity with intermediate table

I would like to create a query that returns only the entities related to all of a given set of rows in another table.
For example:
FIRST_TABLE
---------------------
|A_ID | TITLE |
|--------------------
|1 | TEST1 |
|2 | TEST2 |
|3 | TEST3 |
|4 | TEST4 |
---------------------
SECOND_TABLE
---------------------
|B_ID | NAME |
|--------------------
|1 | NAME1 |
|2 | NAME2 |
|3 | NAME3 |
|4 | NAME4 |
---------------------
INTERMEDIATE_TABLE
-----------------
|A_FK | B_FK|
|----------------
|2 | 1 |
|2 | 2 |
|2 | 3 |
|3 | 1 |
-----------------
QUERY
SELECT * FROM FIRST_TABLE ft
JOIN INTERMEDIATE_TABLE it
ON ft.A_ID = it.A_FK
WHERE it.B_FK = 1
AND it.B_FK = 2
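The query should return only entity 2 from FIRST_TABLE, because it is the only entity related to both NAME1 and NAME2. As written, it returns no rows, since a single row of INTERMEDIATE_TABLE can never have B_FK equal to 1 and 2 at the same time.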
How can I make this work?
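One common way to express "related to all of these values" is to filter with IN, group by the entity, and keep only groups that contain every requested value. The sketch below is a minimal illustration under assumptions: it runs against an in-memory SQLite database rather than PostgreSQL, and the lowercase table and column names are a transcription of the tables in the question. The SQL text itself uses only standard constructs.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (a_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE intermediate_table (a_fk INTEGER, b_fk INTEGER);
    INSERT INTO first_table VALUES
        (1, 'TEST1'), (2, 'TEST2'), (3, 'TEST3'), (4, 'TEST4');
    INSERT INTO intermediate_table VALUES (2, 1), (2, 2), (2, 3), (3, 1);
""")

# Keep only link rows for the wanted B values, then require that each
# entity has matched all of them (2 distinct values here).
query = """
    SELECT ft.a_id, ft.title
    FROM first_table ft
    JOIN intermediate_table it ON ft.a_id = it.a_fk
    WHERE it.b_fk IN (1, 2)
    GROUP BY ft.a_id, ft.title
    HAVING COUNT(DISTINCT it.b_fk) = 2
"""
rows_both = conn.execute(query).fetchall()
print(rows_both)
```

The number in the HAVING clause has to match the number of values in the IN list; entity 3 is dropped because it is linked to only one of the two.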

Related

Spark dataframe create explode with order

I have data like the following:
Input Df
+----------+---------------+--------+--------+--------+
| SALES_NO | SALE_LINE_NUM | CODE_1 | CODE_3 | CODE_2 |
+----------+---------------+--------+--------+--------+
| 123      | 1             | ABC    | E456   | GHF989 |
| 123      | 2             | EDF    | EFHJ   | WAEWA  |
| 234      | 1             | 2345   | 985E   | AWW    |
| 234      | 2             | WERWE  |        |        |
| 234      | 3             | ERC    | AERER  |        |
| 456      | 1             | WER    | AWER   |        |
+----------+---------------+--------+--------+--------+
The output should be built as follows: for each unique (SALES_NO, SALE_LINE_NUM) pair, create one new row per code column whose value is not null, together with an order number.
For code_1, order will be 1.
For code_2, order will be 2.
Output df
SALES_NO SALES_LINE_NUM CODE ORDER
123 1 ABC 1
123 1 E456 2
123 1 GHF989 3
123 2 EDF 1
123 2 EFHJ 2
123 2 WAEWA 3
234 1 2345 1
234 1 985E 2
234 1 AWW 3
234 2 WERWE 1
234 3 ERC 1
234 3 AERER 2
456 1 WER 1
456 1 AWER 2
Can anyone please help? Thanks in advance
For this dataset:
import spark.implicits._ // needed for toDF outside spark-shell
var ds = spark.sparkContext.parallelize(Seq(
(123, 1, "ABC", "E456", "GHF989"),
(123, 2, "EDF", "EFHJ", "WAEWA"),
(234, 1, "2345", "985E", "AWW"),
(234, 2, "WERWE", "", ""),
(234, 3, "ERC", "AERER", ""),
(456, 1, "WER", "AWER", ""),
)).toDF("SALES_NO", "SALE_LINE_NUM", "CODE_1", "CODE_3", "CODE_2")
We need to unpivot through stack as below:
ds = ds.selectExpr(
"SALES_NO",
"SALE_LINE_NUM",
"stack(3, CODE_1, '1', CODE_2, '2', CODE_3, '3') as (CODE, ORDER)"
)
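This gives the result below (only the first rows are shown). Note that stack emits a row for every code column, so empty codes are not filtered out by this step, and ORDER follows the column suffix rather than the position within the row: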
+--------+-------------+------+-----+
|SALES_NO|SALE_LINE_NUM|CODE |ORDER|
+--------+-------------+------+-----+
|123 |1 |ABC |1 |
|123 |1 |GHF989|2 |
|123 |1 |E456 |3 |
|123 |2 |EDF |1 |
|123 |2 |WAEWA |2 |
|123 |2 |EFHJ |3 |
|234 |1 |2345 |1 |
|234 |1 |AWW |2 |
|234 |1 |985E |3 |
|234 |2 |WERWE |1 |
+--------+-------------+------+-----+
More about unpivoting can be found here.
Good luck!
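The expected output in the question drops empty codes and numbers the remaining ones by their position within the row. As a language-neutral illustration of that logic (plain Python, not Spark code), here is a minimal sketch. The function name `unpivot_codes` is hypothetical, and it assumes that an empty string or None means "no code" and that ORDER is the position among the non-empty codes of a row.

```python
def unpivot_codes(rows):
    """Return (sales_no, line_num, code, order) for every non-empty code."""
    out = []
    for sales_no, line_num, *codes in rows:
        # Drop empty strings and None, keeping the original column order.
        non_empty = [c for c in codes if c]
        for order, code in enumerate(non_empty, start=1):
            out.append((sales_no, line_num, code, order))
    return out


rows = [
    (123, 1, "ABC", "E456", "GHF989"),
    (123, 2, "EDF", "EFHJ", "WAEWA"),
    (234, 1, "2345", "985E", "AWW"),
    (234, 2, "WERWE", "", ""),
    (234, 3, "ERC", "AERER", ""),
    (456, 1, "WER", "AWER", ""),
]
result = unpivot_codes(rows)
for row in result:
    print(row)
```

In Spark the same effect would need a filter on the unpivoted CODE column plus a renumbering step; the sketch only pins down the intended result.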

Conditional Order By In SQL Server

I am having some difficulty with a query that needs a conditional ORDER BY.
Here is some sample data:
|Id | Date-In | Date-Out |
|1 | 01/01/21 | NULL |
|2 | 03/01/21 | NULL |
|3 | 05/01/20 | 11/01/21 |
|3 | 12/01/21 | NULL |
|4 | 12/12/21 | 15/01/21 |
|5 | 17/01/21 | 21/01/21 |
I want to sort the data like this :
|Id | Date-In | Date-Out |
|5 | 17/01/21 | 21/01/21 |
|4 | 12/12/20 | 15/01/21 |
|3 | 05/01/21 | 11/01/21 |
|3 | 12/01/21 | NULL |
|2 | 03/01/21 | NULL |
|1 | 01/01/21 | NULL |
The sorting rules are:
When there is a move-out, sort by move-out DESC.
When there is a move-out and a move-in, sort by move-out DESC, move-in DESC, Id DESC (the bold rows).
Otherwise, sort by move-in DESC.
I tried this solution :
SELECT Id
,[move_in_date]
,[move_out_date]
,LAG(move_out_date) OVER(Partition By Id ORDER BY move_in_date) as Lag_out
FROM MyTable
ORDER BY isnull(move_out_date,LAG(move_out_date) OVER(Partition By Id ORDER BY move_in_date)) DESC
I don't know whether this will work for every scenario I mentioned in my first post.
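One way to check an ordering like this is to run it against the sample rows. The sketch below is only an illustration under several assumptions: it uses in-memory SQLite (3.25 or later, for window functions) instead of SQL Server, it reads the dates as dd/mm/yy and stores them as ISO strings, it takes the values from the desired-output table, and it adds explicit tie-breakers that the attempted query does not have. Both SQLite and SQL Server sort NULL as the lowest value, so NULL keys come last under DESC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for LAG()
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, move_in_date TEXT, move_out_date TEXT);
    INSERT INTO my_table VALUES
        (1, '2021-01-01', NULL),
        (2, '2021-01-03', NULL),
        (3, '2021-01-05', '2021-01-11'),
        (3, '2021-01-12', NULL),
        (4, '2020-12-12', '2021-01-15'),
        (5, '2021-01-17', '2021-01-21');
""")

# sort_key: the row's own move-out, else the previous move-out of the same id.
# Tie-breakers (assumed): rows with a move-out first, then move-in DESC, id DESC.
ordered = conn.execute("""
    SELECT id, move_in_date, move_out_date
    FROM (
        SELECT id, move_in_date, move_out_date,
               COALESCE(move_out_date,
                        LAG(move_out_date) OVER (PARTITION BY id
                                                 ORDER BY move_in_date)) AS sort_key
        FROM my_table
    ) AS keyed
    ORDER BY sort_key DESC,
             CASE WHEN move_out_date IS NULL THEN 1 ELSE 0 END,
             move_in_date DESC,
             id DESC
""").fetchall()
for row in ordered:
    print(row)
```

Without the tie-breakers, the two rows for Id 3 share the same sort key and the rows without any move-out all have a NULL key, so their relative order would not be guaranteed.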

Rank by 2 different levels of partitioning/grouping

I have this set of data using Microsoft SQL Server Management Studio
Category|pet name| date |food price|vet expenses|vat
A | jack |2017-08-28| 12.98 | 2424 |23
A | jack |2017-08-29| 2339 | 2424 |23
A | smithy |2017-08-28| 22.35 | 2324 |12
A | smithy |2017-08-29| 123.35 | 2432 |23
B | casio |2017-08-28| 11.38 | 44324 |32
B | casio |2017-08-29| 2.24 | 3232 |43
B | lala |2017-08-28| 343.36 | 42342 |54
B | lala |2017-08-29| 34.69 | 22432 |54
C | blue |2017-08-28| 223.02 | 534654 |78
C | blue |2017-08-29| 321.01 | 6654 |67
C | collie |2017-08-28| 232.05 | 4765 |43
C | collie |2017-08-29| 233.03 | 4654 |65
What I want to do is compute three rankings, each grouped by category and ordered by category, pet name, date: one ranked by food price, one by vet expenses, and one by vat.
I'm thinking this will be a join statement for the table above?
Something exactly like below:
Category|pet name| date |food price|vet expenses|vat|Rankfp|Rankve|Rankvat
A | jack |2017-08-28| 12.98 | 2424 |23 | 2 | 1 |1
A | jack |2017-08-29| 2339 | 2424 |23 | 1 | 2 |1
A | smithy |2017-08-28| 22.35 | 2324 |12 | 1 | 2 |2
A | smithy |2017-08-29| 123.35 | 2432 |22 | 2 | 1 |2
B | casio |2017-08-28| 11.38 | 44324 |32 | 2 | 1 |2
B | casio |2017-08-29| 2.24 | 3232 |43 | 2 | 2 |2
B | lala |2017-08-28| 343.36 | 42342 |54 | 1 | 2 |1
B | lala |2017-08-29| 34.69 | 22432 |54 | 1 | 1 |1
C | blue |2017-08-28| 223.02 | 534654 |78 | 2 | 1 |1
C | blue |2017-08-29| 321.01 | 6654 |67 | 1 | 1 |1
C | collie |2017-08-28| 232.05 | 4765 |43 | 1 | 2 |2
C | collie |2017-08-29| 233.03 | 4654 |65 | 2 | 2 |2
NB: this is not needed in the final output, but to make it more readable I have ordered the outcome by category, date, pet name:
Category|pet name| date |food price|vet expenses|vat|Rankfp|Rankve|Rankvat
A | jack |2017-08-28| 12.98 | 2424 |23 | 2 | 1 |1
A | smithy |2017-08-28| 22.35 | 2324 |12 | 1 | 2 |2
A | jack |2017-08-29| 2339 | 2424 |23 | 1 | 2 |1
A | smithy |2017-08-29| 123.35 | 2432 |22 | 2 | 1 |2
B | casio |2017-08-28| 11.38 | 44324 |32 | 2 | 1 |2
B | lala |2017-08-28| 343.36 | 42342 |54 | 1 | 2 |1
B | casio |2017-08-29| 2.24 | 3232 |43 | 2 | 2 |2
B | lala |2017-08-29| 34.69 | 22432 |54 | 1 | 1 |1
C | blue |2017-08-28| 223.02 | 534654 |78 | 2 | 1 |1
C | collie |2017-08-28| 232.05 | 4765 |43 | 1 | 2 |2
C | blue |2017-08-29| 321.01 | 6654 |67 | 1 | 1 |1
C | collie |2017-08-29| 233.03 | 4654 |65 | 2 | 2 |2
The code I have below only ranks by category, but does not group by food price, vet expenses and vat.
RANK ()OVER(PARTITION BY [Category], [Date] order by [Category] ,[Pet Name],[Date]) as 'Rank'
Would it be a case of grouping the costs separately then left joining the rankings on to the original data?
(I will be using pivots and slicers in excel so want to have all the data on one table/query)
After walking away for a while to refresh my brain, I had a eureka moment and solved this. It was actually easy once I thought about it.
The code to get the desired table goes something like this:
select *
, rank ()OVER(PARTITION BY [Category], [date] order by [food price], [Category] ,[pet name],[date]) as 'Rankfp'
, rank ()OVER(PARTITION BY [Category], [date] order by [vet expenses], [Category] ,[pet name], [date]) as 'Rankve'
, rank ()OVER(PARTITION BY [Category], [date] order by [vat], [Category] ,[pet name], [date]) as 'Rankvat'
from petcost
order by [category], [pet name]
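To see what partitioning by category and date does to each rank, here is a small runnable sketch. It assumes in-memory SQLite (3.25 or later) in place of SQL Server, uses only the category A rows from the question, and renames the columns to snake_case for convenience. Ranks are ascending, as in the query above; adding DESC to the inner ORDER BY would give rank 1 to the highest value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for RANK()
conn.executescript("""
    CREATE TABLE petcost (category TEXT, pet_name TEXT, cost_date TEXT,
                          food_price REAL, vet_expenses REAL, vat REAL);
    INSERT INTO petcost VALUES
        ('A', 'jack',   '2017-08-28', 12.98,  2424, 23),
        ('A', 'jack',   '2017-08-29', 2339,   2424, 23),
        ('A', 'smithy', '2017-08-28', 22.35,  2324, 12),
        ('A', 'smithy', '2017-08-29', 123.35, 2432, 23);
""")

# Each RANK() compares pets within the same category and date.
ranked = conn.execute("""
    SELECT category, pet_name, cost_date,
           RANK() OVER (PARTITION BY category, cost_date
                        ORDER BY food_price)   AS rank_fp,
           RANK() OVER (PARTITION BY category, cost_date
                        ORDER BY vet_expenses) AS rank_ve,
           RANK() OVER (PARTITION BY category, cost_date
                        ORDER BY vat)          AS rank_vat
    FROM petcost
    ORDER BY category, cost_date, pet_name
""").fetchall()
for row in ranked:
    print(row)
```

On 2017-08-29 both pets have the same vat, so both get rank 1 for it; RANK() assigns equal ranks to ties.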

Counting Retrieved Records from SQL Server

How can I create a dynamic numbering column based on the rows I retrieve from the database?
Ex.
Table Reservations
|ReservationNo | ClientNo | DateAdded | DateModified |
|1 | 1 | 01-01-01 | 01-01-01 |
|2 | 2 | 01-01-01 | 01-01-01 |
|3 | 2 | 01-01-01 | 01-01-01 |
|4 | 2 | 01-01-01 | 01-01-01 |
|5 | 1 | 01-01-01 | 01-01-01 |
|6 | 3 | 01-01-01 | 01-01-01 |
|7 | 3 | 01-01-01 | 01-01-01 |
|8 | 2 | 01-01-01 | 01-01-01 |
|9 | 1 | 01-01-01 | 01-01-01 |
|10 | 1 | 01-01-01 | 01-01-01 |
When I execute the statement below...
SELECT * FROM Table WHERE ClientNo = '1'
Desired result:
|Counter | ReservationNo | Client | DateAdded | DateModified |
|1 | 1 | 1 | 01-01-01 | 01-01-01 |
|2 | 5 | 1 | 01-01-01 | 01-01-01 |
|3 | 9 | 1 | 01-01-01 | 01-01-01 |
|4 | 10 | 1 | 01-01-01 | 01-01-01 |
You could use the row_number() function:
select row_number() over (order by ReservationNo) as Counter
, *
from YourTable
order by
ReservationNo
Looks like you're searching for the ROW_NUMBER function, see http://msdn.microsoft.com/de-de/library/ms186734.aspx
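Combined with the WHERE clause from the question, ROW_NUMBER() numbers only the rows that survive the filter, because window functions are evaluated after WHERE. A minimal sketch, assuming in-memory SQLite (3.25 or later) in place of SQL Server and snake_case column names; the date columns are left out for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for ROW_NUMBER()
conn.execute("CREATE TABLE reservations (reservation_no INTEGER, client_no INTEGER)")
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 2), (5, 1),
     (6, 3), (7, 3), (8, 2), (9, 1), (10, 1)],
)

# The counter restarts at 1 for whatever subset the WHERE clause selects.
numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY reservation_no) AS counter,
           reservation_no,
           client_no
    FROM reservations
    WHERE client_no = 1
    ORDER BY reservation_no
""").fetchall()
print(numbered)
```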
It seems you need the total number of rows returned by each condition/query.
If that's the case, COUNT(*) OVER() will meet your requirements.
SELECT ReservationNo
,ClientNo
,DateAdded
,DateModified
,COUNT(*) OVER() AS TotalRows -- TotalRows is an illustrative alias
FROM Reservations
WHERE ClientNo = 1 -- optional; replace with your own condition
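For comparison with a running counter, COUNT(*) OVER () attaches the same total to every returned row. A small sketch under the same assumptions as before: in-memory SQLite (3.25 or later) instead of SQL Server, snake_case names, and a hypothetical `total_rows` alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for window functions
conn.execute("CREATE TABLE reservations (reservation_no INTEGER, client_no INTEGER)")
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 2), (5, 1),
     (6, 3), (7, 3), (8, 2), (9, 1), (10, 1)],
)

# An empty OVER () makes the whole filtered result one window,
# so every row carries the count of rows that matched the WHERE clause.
with_total = conn.execute("""
    SELECT reservation_no,
           client_no,
           COUNT(*) OVER () AS total_rows
    FROM reservations
    WHERE client_no = 2
    ORDER BY reservation_no
""").fetchall()
print(with_total)
```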

Select top n records based on ordinal and attribute data

I have a case where I need to show only the top rows based on a setting in a table and the ordinal set.
The example dataset below shows two customers, each with different products.
Since NumRowsToShow is "1" I only want to show one row (the top row based on ordinal) for EACH Customer.
| CustomerID | ProductID | Ordinal | NumRowsToShow |
+------------+-----------+---------+---------------+
| 1 |A |1 |1 |
| 1 |B |2 |1 |
| 1 |C |3 |1 |
| 5 |D |1 |1 |
| 5 |E |2 |1 |
| 5 |F |3 |1 |
The result set after query is run should be
| CustomerID | ProductID |
+------------+-----------+
| 1 |A |
| 5 |D |
In the same scenario if NumRowsToShow were 1 for customerID 1 and 2 for CustomerID 5 I would see something like.
| CustomerID | ProductID | Ordinal | NumRowsToShow |
+------------+-----------+---------+---------------+
| 1 |A |1 |1 |
| 1 |B |2 |1 |
| 1 |C |3 |1 |
| 5 |D |1 |2 |
| 5 |E |2 |2 |
| 5 |F |3 |2 |
The result set after query is run should be
| CustomerID | ProductID |
+------------+-----------+
| 1 |A |
| 5 |D |
| 5 |E |
How can this be done?
It feels like "cheating in the exams":
SELECT CustomerID, ProductID
FROM tableX
WHERE Ordinal <= NumRowsToShow
If, as comments suggest, the Ordinal can have 10, 20, 30 values and not only 1, ..., n values, then this will work:
SELECT t.CustomerID, t.ProductID
FROM tableX AS t
JOIN tableX AS tt
ON tt.CustomerID = t.CustomerID
AND tt.Ordinal <= t.Ordinal
GROUP BY t.CustomerID
, t.ProductID
, t.NumRowsToShow
HAVING COUNT(*) <= t.NumRowsToShow
or, even better, with ROW_NUMBER():
SELECT CustomerID, ProductID
FROM
( SELECT CustomerID, ProductID, NumRowsToShow
, ROW_NUMBER() OVER( PARTITION BY CustomerID
ORDER BY Ordinal
) AS Rn
FROM tableX
) AS tmp
WHERE Rn <= NumRowsToShow ;
Test in: SQL-Fiddle
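To illustrate why the ROW_NUMBER() version is more robust than comparing Ordinal directly, the sketch below uses ordinals 10, 20, 30 instead of 1, 2, 3. It assumes in-memory SQLite (3.25 or later) rather than the asker's database, and the table name tableX is taken from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for ROW_NUMBER()
conn.execute("""
    CREATE TABLE tableX (CustomerID INTEGER, ProductID TEXT,
                         Ordinal INTEGER, NumRowsToShow INTEGER)
""")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?, ?, ?)",
    [(1, "A", 10, 1), (1, "B", 20, 1), (1, "C", 30, 1),
     (5, "D", 10, 2), (5, "E", 20, 2), (5, "F", 30, 2)],
)

# Direct comparison breaks once ordinals are no longer 1..n.
naive = conn.execute("""
    SELECT CustomerID, ProductID FROM tableX
    WHERE Ordinal <= NumRowsToShow
""").fetchall()

# Row numbers are always 1..n per customer, whatever the ordinal values are.
top_n = conn.execute("""
    SELECT CustomerID, ProductID
    FROM (
        SELECT CustomerID, ProductID, NumRowsToShow,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY Ordinal) AS Rn
        FROM tableX
    ) AS tmp
    WHERE Rn <= NumRowsToShow
    ORDER BY CustomerID, Rn
""").fetchall()
print(naive, top_n)
```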
Your table does not look normalized. The NumRowsToShow column holds duplicate information, and that can lead to update anomalies. This:
| CustomerID | ProductID | Ordinal | NumRowsToShow |
+------------+-----------+---------+---------------+
| 1 |A |1 |1 |
| 1 |B |2 |1 |
| 1 |C |3 |1 |
| 5 |D |1 |2 |
| 5 |E |2 |2 |
| 5 |F |3 |2 |
could be normalized to 2 tables:
| CustomerID | ProductID | Ordinal |
+------------+-----------+---------+
| 1 |A |1 |
| 1 |B |2 |
| 1 |C |3 |
| 5 |D |1 |
| 5 |E |2 |
| 5 |F |3 |
and:
| CustomerID | NumRowsToShow |
+------------+---------------+
| 1 |1 |
| 5 |2 |
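With the data split this way, the top-n query needs one extra join to pick up each customer's limit. A minimal sketch, assuming in-memory SQLite (3.25 or later) and the hypothetical table names `customer_products` and `customer_settings`, which do not appear in the original post:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for ROW_NUMBER()
conn.executescript("""
    CREATE TABLE customer_products (customer_id INTEGER, product_id TEXT,
                                    ordinal INTEGER);
    CREATE TABLE customer_settings (customer_id INTEGER PRIMARY KEY,
                                    num_rows_to_show INTEGER);
    INSERT INTO customer_products VALUES
        (1, 'A', 1), (1, 'B', 2), (1, 'C', 3),
        (5, 'D', 1), (5, 'E', 2), (5, 'F', 3);
    INSERT INTO customer_settings VALUES (1, 1), (5, 2);
""")

# Number each customer's products by ordinal, then keep as many
# rows as that customer's setting allows.
normalized_top_n = conn.execute("""
    SELECT customer_id, product_id
    FROM (
        SELECT p.customer_id, p.product_id, s.num_rows_to_show,
               ROW_NUMBER() OVER (PARTITION BY p.customer_id
                                  ORDER BY p.ordinal) AS rn
        FROM customer_products AS p
        JOIN customer_settings AS s ON s.customer_id = p.customer_id
    ) AS ranked
    WHERE rn <= num_rows_to_show
    ORDER BY customer_id, rn
""").fetchall()
print(normalized_top_n)
```

Changing a customer's limit is then a single-row update in the settings table.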
