UNNEST array and assign to new columns with CASE WHEN - arrays

I have the following BigQuery table, which has a nested structure, i.e. the example below is one record in my table.
Id | Date | Time | Code
AQ5ME | 120520 | 0950 | 123
---------- | 150520 | 1530 | 456
My goal is to unnest the array to achieve the following structure (given that 123 is the Start Date code and 456 is the End Date code):
Id | Start Date | Start Time | End Date | End Time
AQ5ME | 120520 | 0950 | 150520 | 1530
I tried a basic UNNEST in BigQuery and my results are as follows:
Id | Start Date | Start Time | End Date | End Time
AQ5ME | 120520 | 0950 | NULL | NULL
AQ5ME | NULL | NULL | 150520 | 1530
Could you please help me unnest it correctly to get the structure described above?

You can calculate the min and max within the row and extract them as new columns.
Since you didn't show the full schema, I assume Date and Time are separate arrays.
In that case, you can use this query:
SELECT Id,
  (SELECT MIN(d) FROM UNNEST(Date) AS d) AS StartDate,
  (SELECT MIN(t) FROM UNNEST(Time) AS t) AS StartTime,
  (SELECT MAX(d) FROM UNNEST(Date) AS d) AS EndDate,
  (SELECT MAX(t) FROM UNNEST(Time) AS t) AS EndTime
FROM table

As in Sabri's response, using aggregate functions while unnesting works perfectly. To use these fields later on for sorting purposes (in an ORDER BY clause), [SAFE_OFFSET(0)] can be used, as in the example below:
...
ORDER BY StartDate[SAFE_OFFSET(0)] ASC
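
For reference, the CASE WHEN approach from the title can also be written as conditional aggregation over the unnested array. The sketch below rests on assumptions, since the full schema wasn't shown: it presumes the record holds a single repeated STRUCT column (called events here, with fields date, time and code) and that code 123 marks the start and 456 the end, as in the question.
SELECT
  Id,
  -- assumed schema: events is ARRAY<STRUCT<date STRING, time STRING, code INT64>>
  MAX(CASE WHEN e.code = 123 THEN e.date END) AS StartDate,
  MAX(CASE WHEN e.code = 123 THEN e.time END) AS StartTime,
  MAX(CASE WHEN e.code = 456 THEN e.date END) AS EndDate,
  MAX(CASE WHEN e.code = 456 THEN e.time END) AS EndTime
FROM mytable,
  UNNEST(events) AS e
GROUP BY Id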

Related

Group by a value if it exists otherwise group by another value of the same column

I have a table like this
| Id | ExternalId | Type | Date | StatusCode |
-------------------------------------------------------
| 1 | 123 | 25 | 2020-01-01 | A |
| 2 | 123 | 25 | 2020-01-02 | A |
| 5 | 125 | 25 | 2020-01-01 | A |
| 6 | 125 | 25 | 2020-01-02 | B |
| 3 | 124 | 25 | 2020-01-01 | B |
| 4 | 124 | 25 | 2020-01-02 | A |
I need to take just one row for each ExternalId having the Max(Date) and having the StatusCode = B if B exists, otherwise the StatusCode = A
So, the expected result is
| Id | ExternalId | Type | Date | StatusCode |
-------------------------------------------------------
| 2 | 123 | 25 | 2020-01-02 | A | <--I take Max Date and the StatusCode of the same row
| 6 | 125 | 25 | 2020-01-02 | B | <--I take Max Date and the StatusCode of the same row
| 3 | 124 | 25 | 2020-01-02 | B | <--I take Max Date and B, even if the Status code of the Max Date is A
Here the query I have tried to write:
SELECT ExternalId, Type, EntityType, Max(Date) as Date
From MyTable
group by ExternalId, Type, EntityType
But I cannot finish it.
If I understand your requirements, this could be what you want:
SELECT ExternalId, Type, MAX(Date) AS Date, MAX(StatusCode) AS StatusCode
FROM MyTable
GROUP BY ExternalId, Type
Explanation:
You want the Max of StatusCode, because B is greater than A. You want the Max of Date, no matter what StatusCode is shown. And you want it for each ExternalId, therefore you have to group by ExternalId.
Furthermore, you also need the Type shown, and since it's not an aggregate function, the query has to be grouped by Type as well. That's no problem though, because Type is dependent on ExternalId (at least it is in your example data).
As far as I understand from your SQL, you also need to group by Type and EntityType. If that's correct, you can write a MAX with a condition for 'B' and another MAX over all rows, and use those results in the ISNULL or COALESCE function like this:
SELECT
    t.ExternalId
    ,t.Type
    ,t.EntityType
    ,ISNULL(
        MAX(IIF(t.StatusCode = 'B', t.Date, NULL))
        ,MAX(t.Date)
    ) AS Date
FROM MyTable t
GROUP BY
    t.ExternalId
    ,t.Type
    ,t.EntityType
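To also return the StatusCode the question asks for, the same pattern can be extended. The following is a hedged sketch building on the answer above, not part of the original answer:
SELECT
    t.ExternalId
    ,t.Type
    ,t.EntityType
    ,ISNULL(MAX(IIF(t.StatusCode = 'B', t.Date, NULL)), MAX(t.Date)) AS Date
    -- falls back to MAX(StatusCode), which is 'A' when no 'B' exists for the group
    ,ISNULL(MAX(IIF(t.StatusCode = 'B', t.StatusCode, NULL)), MAX(t.StatusCode)) AS StatusCode
FROM MyTable t
GROUP BY
    t.ExternalId
    ,t.Type
    ,t.EntityType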
You want to filter instead of aggregate. One solution is to use row_number():
SELECT *
FROM (
    SELECT
        t.*,
        ROW_NUMBER() OVER (PARTITION BY ExternalId ORDER BY StatusCode DESC, Date DESC) AS rn
    FROM MyTable t
) t
WHERE rn = 1
The order by clause of row_number() puts rows with StatusCode = 'B' first, and then orders by descending date.
This works because StatusCode has only two values, and because 'B' > 'A'. If your real data has different values (or more than two values), then you would need something more explicit, like:
order by case when StatusCode = 'B' then 0 else 1 end, Date desc
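For clarity, here is a hedged sketch of the full query with that explicit ordering plugged in (the same row_number() approach as above, just spelled out):
SELECT *
FROM (
    SELECT
        t.*,
        ROW_NUMBER() OVER (
            PARTITION BY ExternalId
            ORDER BY CASE WHEN StatusCode = 'B' THEN 0 ELSE 1 END, Date DESC
        ) AS rn
    FROM MyTable t
) t
WHERE rn = 1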
Here is a query which can help you:
SELECT ExternalId, MAX([Date]) AS [Date], MAX(StatusCode) AS StatusCode FROM MyTable GROUP BY ExternalId
In your expected result you have added the Id column, which cannot be added here if you want values taken from multiple rows.
Result will be
|123|2020-01-02|A|
|124|2020-01-02|B|
|125|2020-01-02|B|

SQL Server find sum of values based on criteria within another table

I have a table consisting of ID, Year, Value
---------------------------------------
| ID | Year | Value |
---------------------------------------
| 1 | 2006 | 100 |
| 1 | 2007 | 200 |
| 1 | 2008 | 150 |
| 1 | 2009 | 250 |
| 2 | 2005 | 50 |
| 2 | 2006 | 75 |
| 2 | 2007 | 65 |
---------------------------------------
I then create a derived, aggregated table consisting of an ID, MinYear, and MaxYear
---------------------------------------
| ID | MinYear | MaxYear |
---------------------------------------
| 1 | 2006 | 2009 |
| 2 | 2005 | 2007 |
---------------------------------------
I then want to find the sum of Values between the MinYear and MaxYear for each ID in the aggregated table, but I am having trouble determining a proper query.
The final table should look something like this
----------------------------------------------------
| ID | MinYear | MaxYear | SumVal |
----------------------------------------------------
| 1 | 2006 | 2009 | 700 |
| 2 | 2005 | 2007 | 190 |
----------------------------------------------------
Right now I can perform all the joins to create the second table, but then I use a fast-forward cursor to iterate through each record of the second table, with the body of the loop looking like the following:
DECLARE @curMin int
DECLARE @curMax int
DECLARE @curID int

FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT SUM(Value)
    FROM ValTable
    WHERE Year >= @curMin AND Year <= @curMax AND ID = @curID
    GROUP BY ID

    FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
END
Having found the sum of values between the specified years, I can connect it back to the second table and wind up with the desired result (the third table).
However, the second table in reality is roughly 4 million rows, so this iteration is extremely time consuming (generating roughly 300 results a minute) and presumably not the best solution.
My question is, is there a way to generate the third table's results without having to use a cursor/for loop?
During a GROUP BY, the sum will only be for the ID in question; since the min year and max year belong to that same ID, you don't need to query twice. The query below should give you exactly what you need. If you have a different requirement, let me know.
SELECT ID, MIN(YEAR) as MinYear, MAX(YEAR) as MaxYear, SUM(VALUE) as SUMVALUE
FROM tablenameyoudidnotsay
GROUP BY ID
You could use a query like the one below, where TableA is your first table and TableB is the second one:
SELECT *,
    (SELECT SUM(Value)
     FROM TableA
     WHERE TableA.ID = TableB.ID
       AND TableA.Year BETWEEN TableB.MinYear AND TableB.MaxYear) AS SumValue
FROM TableB
You can put your criteria into a join and obtain the result all as one set which should be faster:
SELECT b.Id, b.MinYear, b.MaxYear, sum(a.Value)
FROM Table2 b
JOIN Table1 a ON a.Id=b.Id AND b.MinYear <= a.Year AND b.MaxYear >= a.Year
GROUP BY b.Id, b.MinYear, b.MaxYear

SSRS and ignored "having" clause

I'm trying to get a list of data whose last update date is older than the current date minus x months.
As an example, here is a sample table:
| DataGUID | UpdateDate |
|------------|---------------|
| AAA | 12-05-2017 |
| BBB | 22-06-2017 |
| AAA | 14-02-2017 |
| BBB | 16-05-2017 |
Currently, I have a SQL query looking somewhat like the following:
SELECT
    DataGUID,
    MAX(UpdateDate)
FROM
    Table
GROUP BY
    DataGUID
HAVING
    MAX(UpdateDate) <= DATEADD(mm, CAST('-' + @LastUpdatedXMonthAgo AS INT), GETDATE())
ORDER BY
    MAX(UpdateDate) DESC;
Expected result is the following (with @LastUpdatedXMonthAgo = 1, and current date 13-07-2017):
| DataGUID | UpdateDate |
|------------|---------------|
| AAA | 12-05-2017 |
It works in SSMS, but SSRS seems to ignore the HAVING clause and gives me this result:
| DataGUID | UpdateDate |
|------------|---------------|
| BBB | 22-06-2017 |
| AAA | 12-05-2017 |
It seems like SSRS just ignores the HAVING clause. Is there a way to make it work without using SSRS filters?
The problem appears to be that you are trying to concatenate a hyphen onto a variable which is a number. This results in the minus sign being ignored/dropped. Hence, your HAVING clause is working, but it is comparing each max date against one month in the future from now instead of one month in the past.
The following query will show you one way to fix this problem:
DECLARE @LastUpdatedXMonthAgo INT;
SET @LastUpdatedXMonthAgo = 1;

SELECT
    DATEADD(mm, CAST('-' + CAST(@LastUpdatedXMonthAgo AS VARCHAR(55)) AS INT), GETDATE());
This approach is to cast @LastUpdatedXMonthAgo to text before trying to "negate" the string. Another approach would be to just make @LastUpdatedXMonthAgo text, but maybe this would be inconvenient for the rest of your script.
Update:
In SSMS, we could simplify this even further to:
DATEADD(mm, -@LastUpdatedXMonthAgo, GETDATE());
Demo here:
Rextester
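For completeness, a hedged sketch of the original report query with the simplified expression applied (assuming @LastUpdatedXMonthAgo arrives as an integer parameter; [Table] stands in for the question's placeholder table name):
SELECT
    DataGUID,
    MAX(UpdateDate)
FROM
    [Table]
GROUP BY
    DataGUID
HAVING
    MAX(UpdateDate) <= DATEADD(mm, -@LastUpdatedXMonthAgo, GETDATE())
ORDER BY
    MAX(UpdateDate) DESC;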

TSQL Multiple column unpivot with named rows possible?

I know there are several unpivot / cross apply discussions here but I was not able to find any discussion that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, what I would like is to add several other columns, e.g. one named Bonus, which I also want to split across the Perc10, Perc25 and Median rows.
As an alternative, I also wrote a query with CROSS APPLY, but here it seems as if you cannot "force" the row order like you can with UNPIVOT. In other words, I cannot have a custom sort, only a sort based on a value within the table, if I understand correctly. At least here I do get the result I want, but the rows are in the wrong order and I don't have the row names like Perc10 etc., which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Am I missing something, or did I do something wrong? I'm using MSSQL 2014, and the output goes into SSRS. Thanks a lot in advance for any hints!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns to the table. In the table, I've got columns like Salary and Bonus for each JobID. The view then just computes the percentiles like this:
SELECT
    ID,
    JobID,
    Salary,
    PERCENTILE_CONT(0.1)
        WITHIN GROUP (ORDER BY Salary)
        OVER (PARTITION BY JobID) AS Salary_10,
    PERCENTILE_CONT(0.25)
        WITHIN GROUP (ORDER BY Salary)
        OVER (PARTITION BY JobID) AS Salary_25
FROM Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.
Might this be your approach?
After your edits I understand that your solution with CROSS APPLY comes back with the right data, but not in the correct output format. You can add constant values to your VALUES clause and do the sorting in a wrapper SELECT:
SELECT wrapped.Calculation,
       wrapped.Salary,
       wrapped.Bonus
FROM
(
    SELECT crossapplied.*
    FROM vCalculatedView v
    CROSS APPLY (
        VALUES
          (1, 'Perc10', Salary_10, Bonus_10)
        , (2, 'Perc25', Salary_25, Bonus_25)
        , (3, 'Median', Salary_Median, Bonus_Median)
    ) crossapplied (SortOrder, Calculation, Salary, Bonus)
    WHERE JobID = '1'
    GROUP BY crossapplied.SortOrder,
             crossapplied.Calculation,
             crossapplied.Salary,
             crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder

sum column with duplicates in another table

So I have two tables:
Order
Staging
The Order table has the following structure:
+-------+---------+-------------+---------------+----------+
| PO | cashAmt | ClaimNumber | TransactionID | Supplier |
+-------+---------+-------------+---------------+----------+
| 12345 | 100 | 99876 | abc123 | 0101 |
| 12346 | 50 | 99875 | abc123 | 0102 |
| 12345 | 100 | 99876 | abc123 | 0101 |
+-------+---------+-------------+---------------+----------+
The Staging table has the following structure:
+----------+------------+-------------+---------------+
| PONumber | paymentAmt | ClaimNumber | TransactionID |
+----------+------------+-------------+---------------+
| 12345 | 100 | 99876 | abc123 |
| 12346 | 50 | 99875 | abc123 |
+----------+------------+-------------+---------------+
The query I am executing is:
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM [order] with (nolock)
WHERE TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging with (nolock)
where TransactionID='abc123'
but the sum is getting messed up because there are duplicate rows in one of the tables.
How can I change the query so that it takes only unique rows from the Order table and the sums come out correct?
First ask yourself why are there duplicates in the Orders table? There must be a reason why they are there. I would deal with that first.
That issue aside, if the duplicates in the Orders table have a purpose and yet are not to be considered for this particular query, then you should be able to leave out the duplicates by simply changing the query to use DISTINCT on whatever field in the Orders table can reliably identify a duplicate.
select Distinct fieldname, sum(cashAmt)... etc.
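A hedged sketch of that DISTINCT idea (not from the original answer), assuming a duplicate can be identified by all of the Order columns shown in the question:
SELECT SUM(d.cashAmt) AS CheckAmount, COUNT(d.ClaimNumber) AS TotalLines
FROM (
    -- keep one copy of each fully duplicated row before aggregating
    SELECT DISTINCT PO, cashAmt, ClaimNumber, TransactionID, Supplier
    FROM [order]
    WHERE TransactionID = 'abc123'
) AS d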
Assuming duplicates in your table are OK: not sure why you are using NOLOCK; it seems like it shouldn't be included.
You could use a table variable to store the distinct values. You'll need to adjust the data types in the table variable to match your table structure.
I haven't tested the code below, but it should look something like this:
DECLARE @OrderTmp TABLE (
    cashAmt numeric(10,2)
    , ClaimNumber int
    , TransactionID varchar(50)   -- assumed type; adjust to match your table
)

INSERT INTO @OrderTmp
SELECT DISTINCT
    cashAmt
    ,ClaimNumber
    ,TransactionID
FROM
    [order]
WHERE TransactionID = 'abc123'

SELECT SUM(cashAmt) CheckAmount, COUNT(ClaimNumber) TotalLines
FROM @OrderTmp
WHERE TransactionID = 'abc123'
UNION
SELECT SUM(paymentAmt) CheckAmount, COUNT(ClaimNumber) TotalLines
FROM Staging
WHERE TransactionID = 'abc123'
