Internal SQL error (000603 (XX000)) due to 300010:2077141494 - snowflake-cloud-data-platform

I am getting the following error in dbt, using snowflake and I can't figure out what the issue is.
Database Error in model stg_bank_balances2 (models/staging/cas/vpapay/finance/stg_bank_balances.sql)
000603 (XX000): SQL execution internal error:
Processing aborted due to error 300010:2077141494; incident 5570604.
compiled SQL at target/run/cas_datawarehouse/staging/cas/vpapay/finance/stg_bank_balances.sql
I have a staging table that is running 100% when I open the file and run it manually.
However when I run it with
dbt run --models +stg_bank_balances
then I get this error... any ideas?
Compiled SQL code:
with
__dbt__CTE__dw_bank_balance_base as (
with
source as (select * from CAS_RAW.BANK_BALANCE_INFORMATION_FOR_DATAWAREHOUSE.FACILITY_DATA),
renamed as (
select
to_date(date) as date
,FACILITY_BALANCE as facility_balance
,FACILITY_LIMIT as facility_limit
,LVR as loan_to_value_ratio_expected
,UNENCUMBERED_CASH as unencumbered_cash
from source
)
select *
from renamed
),data_sheet as ( select *
,row_number() over (order by date) as row_num
from __dbt__CTE__dw_bank_balance_base
),
calendar as ( select *
from ANALYTICS.dev_avanwyk.stg_calendar
where date >= (select min(date) from data_sheet)
and date <= current_date()
),
creating_leads as (
select a.*
,a.date as date_from
,case
when b.date is null then current_date()
else b.date
end as date_to
from data_sheet a
left join data_sheet b on a.row_num = b.row_num-1
),
renamed as (
select cal.date as cal_date
,ds.date_from, ds.date_to
,ds.facility_balance
,ds.facility_limit
,ds.loan_to_value_ratio_expected
,ds.unencumbered_cash
from calendar cal
left join creating_leads ds on
ds.date_from <= cal.date
and
cal.date < ds.date_to
)
select *
from renamed

Your cte names are the same, try using in your models unique cte (common table expression) names. You can see you are referencing twice a cte called "renamed". Try changing this and write back what is Snowflake spitting out.

I think Mincho is right.
The first thing to note is that this is a Database Error (docs) — this means that Snowflake is returning the error, and dbt is just passing it on.
Here, Snowflake is having difficulty because you have two CTEs (common table expressions) with the same name — renamed. It looks like you have an upstream model named dw_bank_balance_base that is ephemeral, so it's being injected as a CTE.
You can:
Rename one of your renamed CTEs to something else
Make dw_bank_balance_base a view or table by changing the materialized config
Let me know if that fixes it!

Found the issue - dbt doesn't want me joining a table to itself.
Hence I created another CTE with the prev_row_num = row_num -1 to facilitate this.
with
__dbt__CTE__dw_bank_balance_base as (
with
source as (select * from CAS_RAW.BANK_BALANCE_INFORMATION_FOR_DATAWAREHOUSE.FACILITY_DATA),
renamed as (
select
to_date(date) as date
,FACILITY_BALANCE as facility_balance
,FACILITY_LIMIT as facility_limit
,LVR as loan_to_value_ratio_expected
,UNENCUMBERED_CASH as unencumbered_cash
from source
)
select *
from renamed
),data_sheet as ( select *
,row_number() over (order by date) as row_num
,(row_number() over (order by date))-1 as prev_row_num
from __dbt__CTE__dw_bank_balance_base
),
data_sheet1 as ( select *
,(row_number() over (order by date))-1 as prev_row_num
from __dbt__CTE__dw_bank_balance_base
),
calendar as ( select *
from ANALYTICS.dev_avanwyk.stg_calendar
where date >= (select min(date) from data_sheet)
and date <= current_date()
),
creating_leads as (
select
a.date as date_from
,a.facility_balance
,a.facility_limit
,a.loan_to_value_ratio_expected
,a.unencumbered_cash
,case
when b.date is null then current_date()
else b.date
end as date_to
from data_sheet a
left join data_sheet1 b on a.row_num = b.prev_row_num
),
staging as (
select cal.date as cal_date
,ds.date_from
, ds.date_to
,ds.facility_balance
,ds.facility_limit
,ds.loan_to_value_ratio_expected
,ds.unencumbered_cash
from calendar cal
left join creating_leads ds on
ds.date_from <= cal.date
and
cal.date < ds.date_to
)
select *
from staging

Related

snowflake unsupported subquery cannot be evaluated

/Table TEMP has customer hash, effective start date and effective end date. Table CDTLS has customer hash, effective start date.I want to customer hash, effective from, Customer name from TEMP and CDTLS. I am calculating CDTLS end date on the fly and comparing it with TEMP.EFFECTIVE_FROM and TEMP_EFFECTIVE_TO dates. I get an error that unsupported subquery cannot be evaluated./
SELECT
TEMP.CUSTOMER_HASH,
TEMP.EFFECTIVE_FROM,
TEMP.EFFECTIVE_TO,
CDTLS.NAME
FROM TEMP
LEFT CDTLS
ON
TEMP.CUSTOMER_HASH = CDTLS.CUSTOMER_HASH
AND
CDTLS.EFFECTIVE_FROM <= TEMP.EFFECTIVE_FROM
AND
(
SELECT VW.EFFECTIVE_TO FROM
(
SELECT CUSTOMER_HASH, EFFECTIVE_FROM, LEAD(EFFECTIVE_FROM, 1, '9999-12-31') OVER (PARTITION
BY CUSTOMER_HASH ORDER BY EFFECTIVE_FROM ASC) AS EFFECTIVE_TO
FROM CUST_DETAILS
) AS VW
WHERE CDTLS.CUSTOMER_HASH = VW.CUSTOMER_HASH AND CDTLS.EFFECTIVE_FROM = VW.EFFECTIVE_FROM
) >= TEMP.EFFECTIVE_TO
;
I suppose you wanted to run this query:
SELECT
TEMP.CUSTOMER_HASH,
TEMP.EFFECTIVE_FROM,
TEMP.EFFECTIVE_TO,
CDTLS.NAME
FROM TEMP
LEFT join CDTLS
ON
TEMP.CUSTOMER_HASH = CDTLS.CUSTOMER_HASH
AND
CDTLS.EFFECTIVE_FROM <= TEMP.EFFECTIVE_FROM
left join (
SELECT CUSTOMER_HASH, EFFECTIVE_FROM, LEAD(EFFECTIVE_FROM, 1, '9999-12-31') OVER (PARTITION
BY CUSTOMER_HASH ORDER BY EFFECTIVE_FROM ASC) AS EFFECTIVE_TO
FROM CUST_DETAILS
) AS VW on CDTLS.CUSTOMER_HASH = VW.CUSTOMER_HASH AND CDTLS.EFFECTIVE_FROM = VW.EFFECTIVE_FROM
where
VW.EFFECTIVE_TO >= TEMP.EFFECTIVE_TO
You could try using MIN / MAX / LISTAGG etc in the select query to make it deterministically scalar to check if that helps.
https://docs.snowflake.net/manuals/user-guide/querying-subqueries.html#differences-between-correlated-and-non-correlated-subqueries

Display of online users on the system

I don't know exactly where I'm wrong, but I need a list of all the workers who are currently at work (for the current day), this is my sql query:
SELECT
zp.ID,
zp.USER_ID,
zp.Arrive,
zp.Deppart,
zp.DATUM
FROM time_recording as zp
INNER JOIN personal AS a on zp.USER_ID, = zp.USER_ID,
WHERE zp.Arrive IS NOT NULL
AND zp.Deppart IS NULL
AND zp.DATUM = convert(date, getdate())
ORDER BY zp.ID DESC
this is what the data looks like with my query:
For me the question is, how can I correct my query so that I only get the last Arrive time for the current day for each user?
In this case to get only these values:
Try this below script using ROW_NUMBER as below-
SELECT * FROM
(
SELECT zp.ID, zp.USER_ID, zp.Arrive, zp.Deppart, zp.DATUM,
ROW_NMBER() OVER(PARTITION BY zp.User_id ORDER BY zp.Arrive DESC) RN
FROM time_recording as zp
INNER JOIN personal AS a
on zp.USER_ID = zp.USER_ID
-- You need to adjust above join relation as both goes to same table
-- In addition, as you are selecting nothing from table personal, you can drop the total JOIN part
WHERE zp.Arrive IS NOT NULL
AND zp.Deppart IS NULL
AND zp.DATUM = convert(date, getdate())
)A
WHERE RN =1
you can try this:
SELECT DISTINCT
USER_ID,
LAR.LastArrive
FROM time_recording as tr
CROSS APPLY (
SELECT
MAX(Arrive) as LastArrive
FROM time_recording as ta
WHERE
tr.USER_ID = ta.USER_ID AND
ta.Arrive IS NOT NULL
) as LAR

SQL Pull the latest datediff for each Category

I have two tables.
Repair -
RepairID, EquipID, RepairDate
Events -
EventID, EquipID, ReturnDate, CustomerID
I am trying to determine who the last customer was that returned the equipment, before the repair was done. Equipment could have been returned multiple times in the past, but I only need to track the very last customer that returned it.
Final result will include CustomerID, EquipID, ReturnDate, RepairDate
My SQLFiddle for a sample DDL and query:
http://sqlfiddle.com/#!3/f2691/6/0
This returns all the customers, not only the very last one that returned.
Does that return what you expect?
Option 1:
SELECT E.EquipID,
E.CustomerID,
max(E.ReturnDate) MAXRETURN
FROM Repair R
CROSS APPLY (
SELECT *,
row_number() OVER (
PARTITION BY EquipID ORDER BY ReturnDate DESC
) AS RN
FROM Event E
WHERE R.RepairDate > E.ReturnDate
AND E.EquipID = R.EquipID
) E
WHERE E.RN = 1
GROUP BY E.EquipID,
E.CustomerID
Option 2:
SELECT E.EquipID,
E.CustomerID,
max(E.ReturnDate) MAXRETURN
FROM (
SELECT E.*,
row_number() OVER (
PARTITION BY E.EquipID ORDER BY E.ReturnDate DESC
) AS RN
FROM Event E
INNER JOIN Repair R
ON E.EquipID = R.EquipID
WHERE R.RepairDate > E.ReturnDate
) E
WHERE E.RN = 1
GROUP BY E.EquipID,
E.CustomerID

SELECT most recent date out of group

I have a T-SQL query that is designed to weed out duplicate entries of a certain product training, grabbing only the one with the most recent DateTaken. For example, if someone has taken a certain training course 3 times, we only want to display one row, that row being the one that contains the most recent DateTaken. Here is what I have so far, however I am receiving the following error:
An expression of non-boolean type specified in a context where a condition is expected, near 'ORDER'.
The ORDER BY is necessary since we want to group all the results of this query by the expiration date. Below is the full query:
SELECT DISTINCT
p.ProductDescription as ProductDesc,
c.CourseDescription as CourseDesc,
c.Partner, a.DateTaken, a.DateExpired, p.Status
FROM
sNumberToAgentId u, AgentProductTraining a, Course c, Product p
WHERE
#agentId = u.AgentId
and u.sNumber = a.sNumber
and a.CourseCode = c.CourseCode
and (a.DateExpired >= #date or a.DateExpired IS NULL)
and a.ProductCode = p.ProductCode
and (p.status != 'D' or p.status IS NULL)
GROUP BY
(p.ProductDescription)
HAVING
MIN(a.DateTaken)
ORDER BY
DateExpired ASC
EDIT
I've made the following changes to the GROUP BY and HAVING clauses, however I am still receiving errors:
GROUP BY
(p.ProductDescription, c.CourseDescription)
HAVING
MIN(a.DateTaken) > GETUTCDATE()
In SQL Management Studio, and red line error marker appears under the ',' after p.ProductDescription, the ')' after c.CourseDescription, the 'a' in a.DateTaken, and the closing parenthesis ')' of GETUTCDATE(). If I simply leave the GROUP BY statement to include only p.ProductDescription I get this error message:
Column 'Product.ProductDescription' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I'm relatively new to SQL, could someone explain what's going on? Thank you!
My suggestion since you are using sql server is to implement row_number() and partition by the ProductDescription and CourseDescription. This will go in a subquery and then you apply a filter to return only those where the row number is equal to one or the most recent record:
select *
from
(
SELECT p.ProductDescription as ProductDesc,
c.CourseDescription as CourseDesc,
c.Partner, a.DateTaken, a.DateExpired, p.Status
row_number() over(partition by p.ProductDescription, c.CourseDescription order by a.DateTaken desc) rn
FROM sNumberToAgentId u
INNER JOIN AgentProductTraining a
ON u.sNumber = a.sNumber
AND (a.DateExpired >= #date or a.DateExpired IS NULL)
INNER JOIN Course c
ON a.CourseCode = c.CourseCode
INNER JOIN Product p
ON a.ProductCode = p.ProductCode
AND (p.status != 'D' or p.status IS NULL)
WHERE u.AgentId = #agentId
) src
where rn = 1
order by DateExpired
Its this line
HAVING MIN(a.DateTaken)
Should be a boolean type such as
HAVING MIN(a.DateTaken) > GETUTCDATE()
Have to return True or a False (Boolean)
Here is the final query I wound up using. It is similar to the suggestions above:
SELECT ProductDesc, CourseDesc, Partner, DateTaken, DateExpired, Status
FROM(
SELECT
p.ProductDescription as ProductDesc,
c.CourseDescription as CourseDesc,
c.Partner, a.DateTaken, a.DateExpired, p.Status,
row_number() OVER (PARTITION BY p.ProductDescription, c.CourseDescription ORDER BY abs(datediff(dd, DateTaken, GETDATE()))) as Ranking
FROM
sNumberToAgentId u, AgentProductTraining a, Course c, Product p
WHERE
#agentId = u.AgentId
and u.sNumber = a.sNumber
and a.CourseCode = c.CourseCode
and (a.DateExpired >= #date or a.DateExpired IS NULL)
and a.ProductCode = p.ProductCode
and (p.status != 'D' or p.status IS NULL)
) aa
WHERE Ranking = '1'

SQL Server T-SQL: Get a records set based on a priority value, startdate and enddate

I am working on an time attendance system. I have to following tables:
Schedule: Contains a Name nvarchar field and a Start and End DateTime fields.
Policy: Contains Start and End DateTime fields too.
PolicySchedule (Cross Table): Contains a Priority int field beside the foreign keys.
End datetime fields are nullable which indicates open periods.
The schedule of the greatest priority will be applied and activated within its time and the policy time.
I need to get a list of the applied schedules within each policy and their activation periods start and end time knowing that shedules may be intersected. Policies are not related here ..
Example:
Schedule_____________Start_________________________End
Schedule1____________01/01/2011 00:00______________04/01/2011 00:00
Schedule2____________04/01/2011 00:00______________11/01/2011 14:00
Schedule3____________11/01/2011 14:00______________02/15/2012 00:00
Schedule2____________02/15/2012 00:00______________01/01/2013 00:00
What is the most efficient way to get the requested result ?
If you have lots of policies and schedules (separately) but few schedules per policy, the most straightforward way would be quite efficient:
WITH dates (policyId, changeDate) AS
(
SELECT policyId, changeDate,
ROW_NUMBER() OVER (PARTITION BY policyId ORDER BY changeDate) AS rn
FROM (
SELECT policyId, startDate AS changeDate
FROM policySchedule ps
JOIN schedule s
ON s.id = ps.scheduleId
UNION
SELECT policyId, endDate AS changeDate
FROM policySchedule ps
JOIN schedule s
ON s.id = ps.scheduleId
) q
),
ranges (startDate, endDate)AS
(
SELECT d1.policyId,
d1.changeDate,
d2.changeDate
FROM dates d1
JOIN dates d2
ON d2.policyId = d1.policyId
AND d2.rn = d1.rn + 1
)
SELECT *
FROM policy p
JOIN ranges r
ON r.policyId = p.id
CROSS APPLY
(
SELECT TOP 1 s.*
FROM policySchedules ps
JOIN schedule s
ON ps.policyId = p.id
AND s.id = ps.scheduleId
WHERE ps.startDate BETWEEN r.startDate AND r.endDate
AND ps.endDate BETWEEN r.startDate AND r.endDate
ORDER BY
ps.Priority DESC
)
Assuming you're using SQL Server 2005 or later, you can use an outer apply to look up the policy with the highest priority:
select *
from Schedule s
outer apply
(
select top 1 *
from PolicySchedule ps
join Policy p
on p.id = ps.policyid
where s.StartTime <= p.EndTime
and p.StartTime <= s.EndTime
order by
ps.priority desc
) pol
If there are time periods that have to overlap, you can add a where clause in the outer apply.

Resources