This question already has an answer here:
SQL transpose full table
(1 answer)
Closed 8 years ago.
I have a table like this:
╔════════╦═══╦═══╦═══╦═══╦═══╗
║ row_id ║ 1 ║ 2 ║ 3 ║ 4 ║ 5 ║
╠════════╬═══╬═══╬═══╬═══╬═══╣
║ 1 ║ T ║ E ║ S ║ N ║ U ║
║ 2 ║ M ║ B ║ R ║ H ║ A ║
║ 3 ║ C ║ D ║ F ║ G ║ I ║
║ 4 ║ J ║ K ║ L ║ O ║ P ║
║ 5 ║ V ║ W ║ X ║ Y ║ Z ║
╚════════╩═══╩═══╩═══╩═══╩═══╝
I want to "pivot" the table to get an outcome where the row_id column is the first row, the 1 column the second etc.
The results should look like this:
╔════════╦═══╦═══╦═══╦═══╦═══╗
║ row_id ║ 1 ║ 2 ║ 3 ║ 4 ║ 5 ║
╠════════╬═══╬═══╬═══╬═══╬═══╣
║ 1 ║ T ║ M ║ C ║ J ║ V ║
║ 2 ║ E ║ B ║ D ║ K ║ W ║
║ 3 ║ S ║ R ║ F ║ L ║ X ║
║ 4 ║ N ║ H ║ G ║ O ║ Y ║
║ 5 ║ U ║ A ║ I ║ P ║ Z ║
╚════════╩═══╩═══╩═══╩═══╩═══╝
I've looked for ideas about pivoting without aggregates, but without much luck, mainly because the data I want to pivot is non-numeric.
I've set up the sample data in SQL Fiddle.
Thanks!
What you need is called "matrix transposition". The optimal SQL query will depend very much on how you actually store the data, so it wouldn't hurt to provide a more realistic example of your table's structure. Are you sure every matrix you will ever need to work with will be exactly 5*5? :)
UPD: Oh, I see you've found it.
I realized my mistake was looking for pivot and not for transpose.
I found an answer here and solved the problem with the following query:
SELECT *
FROM   (SELECT row_id,
               col,
               value
        FROM   table1
        UNPIVOT ( value
                  FOR col IN ([1],
                              [2],
                              [3],
                              [4],
                              [5]) ) unpiv) src
PIVOT ( Max(value)
        FOR row_id IN ([1],
                       [2],
                       [3],
                       [4],
                       [5]) ) piv
The results are on SQL Fiddle.
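In case the fiddle disappears, here is a minimal sketch of the sample data setup; the exact schema in the fiddle may differ, and the char(1) column types are an assumption:

-- Hypothetical re-creation of the sample table used above
CREATE TABLE table1 (
    row_id int,
    [1] char(1), [2] char(1), [3] char(1), [4] char(1), [5] char(1)
);

INSERT INTO table1 (row_id, [1], [2], [3], [4], [5]) VALUES
(1, 'T', 'E', 'S', 'N', 'U'),
(2, 'M', 'B', 'R', 'H', 'A'),
(3, 'C', 'D', 'F', 'G', 'I'),
(4, 'J', 'K', 'L', 'O', 'P'),
(5, 'V', 'W', 'X', 'Y', 'Z');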
Related
I have the following table; it displays the SalesQty and the StockQty grouped by Article, Supplier, Branch and Month.
╔════════╦════════╦══════════╦═════════╦══════════╦══════════╗
║ Month ║ Branch ║ Supplier ║ Article ║ SalesQty ║ StockQty ║
╠════════╬════════╬══════════╬═════════╬══════════╬══════════╣
║ 201811 ║ 333 ║ 2 ║ 3122 ║ 4 ║ 11 ║
║ 201811 ║ 345 ║ 1 ║ 1234 ║ 2 ║ 10 ║
║ 201811 ║ 345 ║ 1 ║ 4321 ║ 3 ║ 11 ║
║ 201812 ║ 333 ║ 2 ║ 3122 ║ 2 ║ 4 ║
║ 201812 ║ 345 ║ 1 ║ 1234 ║ 3 ║ 12 ║
║ 201812 ║ 345 ║ 1 ║ 4321 ║ 4 ║ 5 ║
║ 201901 ║ 333 ║ 2 ║ 3122 ║ 1 ║ 8 ║
║ 201901 ║ 345 ║ 1 ║ 1234 ║ 6 ║ 9 ║
║ 201901 ║ 345 ║ 1 ║ 4321 ║ 2 ║ 8 ║
║ 201902 ║ 333 ║ 2 ║ 3122 ║ 7 ║ NULL ║
║ 201902 ║ 345 ║ 1 ║ 1234 ║ 4 ║ 13 ║
║ 201902 ║ 345 ║ 1 ║ 4321 ║ 1 ║ 10 ║
╚════════╩════════╩══════════╩═════════╩══════════╩══════════╝
Now I want to sum the SalesQty, get the latest StockQty, and group the results by Article, Supplier and Branch.
The final result should look like this:
╔════════╦══════════╦═════════╦═════════════╦════════════════╗
║ Branch ║ Supplier ║ Article ║ SumSalesQty ║ LatestStockQty ║
╠════════╬══════════╬═════════╬═════════════╬════════════════╣
║ 333 ║ 2 ║ 3122 ║ 14 ║ NULL ║
║ 345 ║ 1 ║ 1234 ║ 15 ║ 13 ║
║ 345 ║ 1 ║ 4321 ║ 10 ║ 10 ║
╚════════╩══════════╩═════════╩═════════════╩════════════════╝
I already tried the query below, but it gives me an error, and I have no idea what to do in this case.
I've made this example so you can try it yourself: db<>fiddle
SELECT
Branch,
Supplier,
Article,
SumSalesQty = SUM(SalesQty),
-- my attempt
LatestStockQty = (SELECT StockQty FROM TestTable i
WHERE MAX(Month) = Month
AND TT.Branch = i. Branch
AND TT.Supplier = i.Branch
AND TT.Article = i.Branch)
FROM
TestTable TT
GROUP BY
Branch, Supplier, Article
Thank you for your help!
We can try using ROW_NUMBER here, to isolate the latest record for each group:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY Branch, Supplier, Article
ORDER BY Month DESC) rn,
SUM(SalesQty) OVER (PARTITION BY Branch, Supplier, Article) SumSalesQty
FROM TestTable t
)
SELECT
Month,
Branch,
Supplier,
Article,
SumSalesQty,
StockQty
FROM cte
WHERE rn = 1;
Inside the CTE we compute, for each Branch/Supplier/Article group, a row number, starting with 1 for the most recent month. We also compute the sum of the sales quantity over the same partition. Then we only need to select the rows from that CTE where the row number equals 1.
Demo
A similar approach, but without the CTE:
SELECT top 1 with ties
Branch
, Supplier
, Article
, SUM(SalesQty) OVER (PARTITION BY Branch, Supplier, Article) SumSalesQty
, tt.StockQty as LatestStockQty
FROM TestTable TT
order by ROW_NUMBER() OVER (PARTITION BY Branch, Supplier, Article ORDER BY Month DESC)
I have two tables named WH_table and Store_table. I am trying to create a query that produces the result shown below. Can anyone help me write this query?
Warehouse table
╔══════════════╦═════╗
║ Item ║ Qty ║
╠══════════════╬═════╣
║ Foot-ball ║ 1 ║
║ Foot-ball ║ 1 ║
║ Gloves ║ 1 ║
║ Track suites ║ 1 ║
╚══════════════╩═════╝
Store table
╔═══════════╦═════╗
║ Item ║ Qty ║
╠═══════════╬═════╣
║ Foot-ball ║ 1 ║
║ Foot-ball ║ 1 ║
║ Gloves ║ 1 ║
╚═══════════╩═════╝
Result
╔════════════╦═══════════╦══════════════╗
║ Item ║ Qty in WH ║ Qty in Store ║
╠════════════╬═══════════╬══════════════╣
║ Foot-ball ║ 2 ║ 2 ║
║ Gloves ║ 1 ║ 1 ║
║ Tracksuite ║ 1 ║ 0 ║
╚════════════╩═══════════╩══════════════╝
You can use a FULL JOIN:
SELECT ISNULL(w.Item,s.Item) Item,
ISNULL(w.Qty,0) Qty_In_WH,
ISNULL(s.Qty,0) Qty_In_Store
FROM ( SELECT Item,
SUM(Qty) Qty
FROM dbo.Warehouse
GROUP BY Item) w
FULL JOIN ( SELECT Item,
SUM(Qty) Qty
FROM dbo.Store
GROUP BY Item) s
ON w.Item = s.Item;
Question: What is the most computationally efficient way to determine if two bike riders rode together given a stream of data with time, latitude, and longitude?
Background: I'm an avid cyclist and want to reverse engineer how Strava groups bike riders together. Here is their method to determine if cyclists are riding together (they use time and lat/lon of a ride): https://support.strava.com/hc/en-us/articles/216919497-Why-don-t-I-get-grouped-in-Activities-when-I-rode-ran-with-others-
After a bike ride is complete I have a file of latitude and longitude every second.
Rider 1 Route:
Rider 2 Route:
You can see Rider 1 and 2 rode together, but Rider 2 started from a different spot and joined Rider 1 later.
I want to come up with the least computationally intensive way of determining that these two riders rode together, despite starting from different locations.
I think Strava's approach is good: basically, establish a proximity zone (150 meters) around each point on the route and compare the riders' routes to see if they spent 70% of their time within 150 meters of each other.
Rider 1 - Locations:
2016-03-27T11:47:45Z 42.113059 -87.736485
2016-03-27T11:47:46Z 42.113081 -87.736511
2016-03-27T11:47:47Z 42.113105 -87.736538
2016-03-27T11:47:48Z 42.113142 -87.736564
2016-03-27T11:47:49Z 42.113175 -87.736587
Rider 2 - Locations:
2016-03-27T11:47:45Z 42.113049 -87.736394 <= Find the sample with the same time for Rider 1 and determine whether the two points are within 150 meters. If < 150 meters assign 1, otherwise assign 0.
I would iterate over every point of Rider 2 against every point of Rider 1, then sum up the 1s and 0s. If that sum divided by the total number of points is greater than 70%, the riders are grouped together.
I think this method would generally work, but it seems very computationally intensive, especially if there are thousands of riders to evaluate. Also, the data does not always have latitude and longitude every second. One option would be to average the location over each minute and compare the per-minute averages, which would at least reduce the number of iterations by a factor of 60 (sketched below).
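For example, something like this per-minute averaging (the table and column names here are hypothetical, not from my actual data):

-- Hypothetical RiderPosition(RiderId, SamplingTime, Lat, Long) table,
-- with Lat/Long stored as float: average each rider's position per minute
SELECT
    RiderId,
    DATEADD(minute, DATEDIFF(minute, 0, SamplingTime), 0) AS MinuteBucket,
    AVG(Lat)  AS AvgLat,
    AVG(Long) AS AvgLong
FROM RiderPosition
GROUP BY
    RiderId,
    DATEADD(minute, DATEDIFF(minute, 0, SamplingTime), 0)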
I was hoping there was some statistical or GIS method to establish the "signature" of a route and compare signatures rather than a point by point comparison.
Any thoughts on how to compute the route comparison in the most efficient way?
Note: I posted a similar question on the GIS forum, but no one responded yet. Although, I do think the question written here is more clear. https://gis.stackexchange.com/questions/187019/strava-activity-route-grouping
I'm going to assume the following is true:
for each cyclist C, there is a data stream of time T, longitude X and latitude Y (we're using projected X and Y for simplicity and ignoring the choice of projection, although in practice we shouldn't)
the data stream can be written into a database or another kind of persistent data storage
the data stream for C is sampled at a rate of 1 s; since there is no guarantee that every sample is taken, we have to assume a sample exists in more than 50% of cases (preferably > 95%; 99.7% would be perfect)
In that case, one table in the database contains all the data needed for the analysis. Let's see what it looks like for two cyclists, C1 and C2, compared to one another.
╔════╦════╦════╦════╦════╦═══════╗
║ T ║ X1 ║ Y1 ║ X2 ║ Y2 ║ D ║
╠════╬════╬════╬════╬════╬═══════╣
║ 1 ║ 10 ║ 15 ║ - ║ - ║ - ║
║ 2 ║ 11 ║ 16 ║ - ║ - ║ - ║
║ 3 ║ 11 ║ 17 ║ 19 ║ 11 ║ 10,00 ║
║ 4 ║ 12 ║ 18 ║ 18 ║ 11 ║ 9,22 ║
║ 5 ║ 12 ║ 17 ║ 17 ║ 12 ║ 7,07 ║
║ 6 ║ - ║ - ║ 15 ║ 12 ║ - ║
║ 7 ║ 13 ║ 16 ║ 14 ║ 13 ║ 3,16 ║
║ 8 ║ 13 ║ 15 ║ 13 ║ 14 ║ 1,00 ║
║ 9 ║ 14 ║ 14 ║ 13 ║ 14 ║ 1,00 ║
║ 10 ║ 14 ║ 13 ║ 14 ║ 13 ║ 0,00 ║
║ 11 ║ 14 ║ 14 ║ 14 ║ 14 ║ 0,00 ║
║ 12 ║ 14 ║ 15 ║ 14 ║ 14 ║ 1,00 ║
║ 13 ║ 15 ║ 15 ║ 15 ║ 15 ║ 0,00 ║
║ 14 ║ 15 ║ 16 ║ 15 ║ 16 ║ 0,00 ║
║ 15 ║ 16 ║ 16 ║ 16 ║ 17 ║ 1,00 ║
║ 16 ║ 17 ║ 18 ║ 16 ║ 16 ║ 2,24 ║
╚════╩════╩════╩════╩════╩═══════╝
This comparison can easily be done with, e.g., a SELECT in the database, self-joining the table for the two cyclists. For a reasonable number of rows (e.g. < 10E5 or < 10E6) and correctly set indexes, this computation is not resource intensive at all, especially considering that the query can be written so that the value D is never output per position but is only calculated in order to aggregate (count). In that case, all you need is the ratio of the count of rows where D is less than or equal to your preferred threshold D0 to the total count of rows. If that ratio is equal to or greater than your limit (say, 70%), the cyclists went on a ride together.
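As a sketch, that ratio can even be computed in a single aggregate query over the CyclistPosition table described below; the {formula to get distance in meters} placeholder and the cyclist IDs 1 and 2 are the same assumptions used in the later examples:

-- Sketch: share of time-matched samples where the two cyclists were within @D0 meters
SELECT
    AVG(CASE WHEN {formula to get distance in meters} <= @D0
             THEN 1.0 ELSE 0.0 END) AS TogetherRatio
FROM CyclistPosition cp1
JOIN CyclistPosition cp2
    ON cp2.SamplingTime = cp1.SamplingTime
WHERE cp1.CyclistId = 1
  AND cp2.CyclistId = 2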
Let's see an example. Suppose there is a table in the database named CyclistPosition with the following columns:
CyclistId - identifier of the cyclist
SamplingTime - UTC time of the sample (position) taken
Long - longitude
Lat - latitude
...with the following data:
╔═══════════╦═══════════════════════╦═══════════╦════════════╗
║ CyclistId ║ SamplingTime ║ Long ║ Lat ║
╠═══════════╬═══════════════════════╬═══════════╬════════════╣
║ 1 ║ 2016-03-27T11:47:45Z ║ 42,113059 ║ -87,736485 ║
║ 1 ║ 2016-03-27T11:47:46Z ║ 42,113081 ║ -87,736511 ║
║ 1 ║ 2016-03-27T11:47:47Z ║ 42,113105 ║ -87,736538 ║
║ 1 ║ 2016-03-27T11:47:48Z ║ 42,113142 ║ -87,736564 ║
║ 1 ║ 2016-03-27T11:47:49Z ║ 42,113175 ║ -87,736587 ║
║ 2 ║ 2016-03-27T11:47:45Z ║ 42,113059 ║ -87,736394 ║
║ 2 ║ 2016-03-27T11:47:46Z ║ 42,113085 ║ -87,736481 ║
║ 2 ║ 2016-03-27T11:47:47Z ║ 42,113103 ║ -87,736531 ║
║ 2 ║ 2016-03-27T11:47:48Z ║ 42,113139 ║ -87,736572 ║
║ 2 ║ 2016-03-27T11:47:49Z ║ 42,113147 ║ -87,736595 ║
╚═══════════╩═══════════════════════╩═══════════╩════════════╝
...then we can extract the data for cyclists 1 and 2 using:
SELECT SamplingTime, Long, Lat FROM CyclistPosition WHERE CyclistId = 1
SELECT SamplingTime, Long, Lat FROM CyclistPosition WHERE CyclistId = 2
...and cross-reference that data using this query...
SELECT
cp1.SamplingTime,
Long1 = cp1.Long,
Lat1 = cp1.Lat,
Long2 = cp2.Long,
Lat2 = cp2.Lat
FROM
CyclistPosition cp1
JOIN CyclistPosition cp2
ON cp2.SamplingTime = cp1.SamplingTime
WHERE
cp1.CyclistId = 1
AND cp2.CyclistId = 2
We now have this kind of output, and if we add a roughly calculated distance Dm in meters (computed via X and Y from a Mercator projection), we get:
╔═══════════════════════╦═══════════╦════════════╦═══════════╦════════════╦══════════════╗
║ SamplingTime ║ Long1 ║ Lat1 ║ Long2 ║ Lat2 ║ Dm ║
╠═══════════════════════╬═══════════╬════════════╬═══════════╬════════════╬══════════════╣
║ 2016-03-27T11:47:45Z ║ 42,113059 ║ -87,736485 ║ 42,113059 ║ -87,736394 ║ 10,118517 ║
║ 2016-03-27T11:47:46Z ║ 42,113081 ║ -87,736511 ║ 42,113085 ║ -87,736481 ║ 3,334919 ║
║ 2016-03-27T11:47:47Z ║ 42,113105 ║ -87,736538 ║ 42,113103 ║ -87,736531 ║ 0,777079 ║
║ 2016-03-27T11:47:48Z ║ 42,113142 ║ -87,736564 ║ 42,113139 ║ -87,736572 ║ 0,890572 ║
║ 2016-03-27T11:47:49Z ║ 42,113175 ║ -87,736587 ║ 42,113147 ║ -87,736595 ║ 0,900635 ║
╚═══════════════════════╩═══════════╩════════════╩═══════════╩════════════╩══════════════╝
Note that for a rough calculation of the distance in meters you need a formula; I used the one from here:
http://bluemm.blogspot.hr/2007/01/excel-formula-to-calculate-distance.html
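As a rough sketch, an expression along these lines (based on the spherical law of cosines; treat it as an approximation, not necessarily the exact formula from that post) could stand in for the {formula to get distance in meters} placeholder below:

-- Approximate distance in meters between (Lat1, Long1) and (Lat2, Long2);
-- 6371000 m is the mean Earth radius
6371000 * ACOS(
      SIN(RADIANS(Lat1)) * SIN(RADIANS(Lat2))
    + COS(RADIANS(Lat1)) * COS(RADIANS(Lat2)) * COS(RADIANS(Long2 - Long1))
)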
Now we have to aggregate the data and count it. We have to limit the data to a start and end time (T1 and T2) and establish the maximum distance (D0) below which we say the cyclists are riding together. A simple way to do that in SQL would be:
-- Note: T-SQL variables use @; @T1, @T2 (time window) and @D0 (distance
-- threshold in meters) are assumed to be set beforehand
DECLARE @togetherPositions int
DECLARE @allPositions int
DECLARE @ratio decimal(18,2)

SELECT @togetherPositions = count(*)
FROM
    CyclistPosition cp1
    JOIN CyclistPosition cp2
        ON cp2.SamplingTime = cp1.SamplingTime
WHERE
    cp1.CyclistId = 1
    AND cp2.CyclistId = 2
    AND cp1.SamplingTime BETWEEN @T1 AND @T2
    AND {formula to get distance in meters} <= @D0

SELECT @allPositions = count(*)
FROM
    CyclistPosition cp1
    JOIN CyclistPosition cp2
        ON cp2.SamplingTime = cp1.SamplingTime
WHERE
    cp1.CyclistId = 1
    AND cp2.CyclistId = 2
    AND cp1.SamplingTime BETWEEN @T1 AND @T2

-- multiply by 1.0 before dividing so integer division doesn't truncate the ratio
SET @ratio = @togetherPositions * 1.0 / @allPositions
Now you just have to decide if the ratio is 0.7, 0.8, 0.85...
HTH
I have a table containing a set of links that form a hierarchy. The big problem is that each link may be used several times (in different positions). I need to be able to distinguish between each "instance" of each node.
For example in the following data, link "D-G" will show up several times:
╔════════════╦════════╗
║ SOURCE ║ TARGET ║
╠════════════╬════════╣
║ A ║ B ║
║ A ║ C ║
║ B ║ D ║
║ B ║ E ║
║ B ║ F ║
║ C ║ D ║
║ C ║ E ║
║ C ║ F ║
║ D ║ G ║
║ E ║ D ║
║ F ║ D ║
╚════════════╩════════╝
I can build the hierarchy using a recursive CTE without any problems, but I want to give each row in the results a unique ID and link it to the parent node's unique ID.
My original idea was to assign a unique ID to each row using Row_Number() + Max(ID) up to that point and have each row inherit its parent's ID, but further reading and trial & error showed that this won't work :-(
Does anybody have an idea how to solve this problem (or at least give me a clue)?
The results should be something like this:
╔═════════════╦═════════════╦═══════════╦═══════════╗
║ SOURCE_DESC ║ TARGET_DESC ║ Source_ID ║ Target_ID ║
╠═════════════╬═════════════╬═══════════╬═══════════╣
║ A ║ B ║ 0 ║ 1 ║
║ A ║ C ║ 0 ║ 2 ║
║ B ║ D ║ 1 ║ 6 ║
║ B ║ E ║ 1 ║ 7 ║
║ B ║ F ║ 1 ║ 8 ║
║ C ║ D ║ 2 ║ 3 ║
║ C ║ E ║ 2 ║ 4 ║
║ C ║ F ║ 2 ║ 5 ║
║ D ║ G ║ 3 ║ 13 ║
║ E ║ D ║ 4 ║ 11 ║
║ F ║ D ║ 5 ║ 10 ║
║ D ║ G ║ 6 ║ 14 ║
║ E ║ D ║ 7 ║ 12 ║
║ F ║ D ║ 8 ║ 9 ║
║ D ║ G ║ 9 ║ 18 ║
║ D ║ G ║ 10 ║ 17 ║
║ D ║ G ║ 11 ║ 16 ║
║ D ║ G ║ 12 ║ 15 ║
╚═════════════╩═════════════╩═══════════╩═══════════╝
Here the "D-G" link shows up several times, but in each instance it has a different ID and a different parent ID!
I've managed to do it, but I'm not happy with the way I did it. It doesn't seem very efficient (not important for this example, but very important for much larger sets!)
WITH JUNK_DATA
AS (SELECT *,
ROW_NUMBER()
OVER (
ORDER BY SOURCE) RN
FROM LINKS),
RECUR
AS (SELECT T1.SOURCE,
T1.TARGET,
CAST('ROOT' AS VARCHAR(MAX)) NAME,
1 AS RAMA,
CAST(T1.RN AS VARCHAR(MAX)) + ',' AS FULL_RAMA
FROM JUNK_DATA T1
LEFT JOIN JUNK_DATA T2
ON T1.SOURCE = T2.TARGET
WHERE T2.TARGET IS NULL
UNION ALL
SELECT JUNK_DATA.SOURCE,
JUNK_DATA.TARGET,
CASE
WHEN RAMA = 1 THEN (SELECT [DESC]
FROM NAMES
WHERE ID = JUNK_DATA.SOURCE)
ELSE NAME
END NAME,
RAMA + 1 AS RAMA,
FULL_RAMA
+ CAST(JUNK_DATA.RN AS VARCHAR(MAX)) + ','
FROM (SELECT *
FROM JUNK_DATA)JUNK_DATA
INNER JOIN (SELECT *
FROM RECUR) RECUR
ON JUNK_DATA.SOURCE = RECUR.TARGET),
FINAL_DATA
AS (SELECT T2.[DESC] SOURCE_DESC,
T3.[DESC] TARGET_DESC,
RECUR.*,
ROW_NUMBER()
OVER (
ORDER BY RAMA) ID
FROM RECUR
INNER JOIN NAMES T2
ON RECUR.SOURCE = T2.ID
INNER JOIN NAMES T3
ON RECUR.TARGET = T3.ID)
SELECT T1.SOURCE_DESC,
T1.TARGET_DESC,
ISNULL(T2.ID, 0) AS SOURCE_ID,
T1.ID TARGET_ID
FROM FINAL_DATA T1
LEFT JOIN (SELECT ID,
FULL_RAMA
FROM FINAL_DATA)T2
ON LEFT(T1.FULL_RAMA, LEN(T1.FULL_RAMA) - CHARINDEX(',',
REVERSE(T1.FULL_RAMA), 2))
+ ',' = T2.FULL_RAMA
ORDER BY SOURCE_ID,
TARGET_ID
Check it out on SQL fiddle.
The query:
select Escuser,Eslevel from WF_UserConfiguration
is returning the table below:
╔═════════╦═════════╗
║ Escuser ║ Eslevel ║
╠═════════╬═════════╣
║ A000    ║ 1       ║
║ A010    ║ 4       ║
║ A021    ║ 3       ║
║ ABCD    ║ 1       ║
║ C067    ║ 3       ║
║ C099    ║ 1       ║
║ C252    ║ 2       ║
╚═════════╩═════════╝
My problem is that I want to get the following output:
╔══════╦══════╦══════╦══════╗
║ 1    ║ 2    ║ 3    ║ 4    ║
╠══════╬══════╬══════╬══════╣
║ A000 ║ C252 ║ A021 ║ A010 ║
║ ABCD ║      ║ C067 ║      ║
║ C099 ║      ║      ║      ║
╚══════╩══════╩══════╩══════╝
The table headers 1, 2, 3 and 4 are the Eslevel values from the first query's result.
How can I get this result (i.e. what query should I use)?
The answer uses PIVOT, with ROW_NUMBER() partitioned by Eslevel so the Escuser values stack vertically within each level:
See live demo
select
[1],
[2],
[3],
[4]
from
(
select
Escuser,
Eslevel,
Row_number() over(partition by Eslevel order by escuser asc) as r
from WF_UserConfiguration
)src
pivot
(
max(escuser)
for Eslevel in
(
[1],[2],[3],[4]
)
)p