DBT: Loading a Vertical(Key/value) table from multiple queries


I have a use case where I need to fill a vertical table with various product data. For internal reasons the join is messy: there is only one way to link product data to the internal billing system. To get around this I am trying to feed multiple queries into a single vertical table. I know I can use an incremental configuration to populate a table from a query, but I have not found anything in the documentation about loading the results of multiple queries into a narrower, vertical table.
The table would be structured like the following:
UNIQUEID  ACCOUNTID  PRODUCTID  KEY     VALUE  PPD  DATE
-----------------------------------------------------------
asdfasd   099080as   8998asfas  500.00  10     1.5  6/28/2022
EDIT: Adding some clarification, since I know a numerical key/value looks odd. In this context a given account ID can have only one product ID, and only one target/value for each product ID.
Through Snowflake tasks I would have several queries like:
INSERT INTO TABLE (fields...)
SELECT
    UNIQUEID,
    ACCOUNTID,
    PRODUCTID,
    TARGET,
    VALUE,
    PPD,
    CURRENT_DATE() AS date
FROM accounttable t
INNER JOIN producttable t2
    ON t.accountid = t2.accountid
WHERE t2.productid = '5898988asdfas'
  AND t.type = 'prodtype'
Without getting too deep into the data structure problem: I can only link to the value I need by the account ID, but I need to specify the product ID in each query that loads into the table, so that I get the right product and account data each time.
I am just looking for how to do something like this with dbt, if it is possible at all.

Assuming that your Product IDs (like 5898988asdfas) are not present in your database, or not otherwise related to the account IDs the way you need them to be, you have a couple of options:
Create a CSV that maps account IDs to product IDs (in Excel or Sheets) and upload it to your database using dbt seed. Then you would use that mapping to join the other tables together, like:
FROM accounttable t
INNER JOIN mappingtable on t.accountid = mappingtable.accountid
INNER JOIN producttable t2
ON t.accountid = t2.accountid
AND mappingtable.productid = t2.productid
where t.type = 'prodtype'
Alternatively, you could store the values of productid in a Jinja list and build a select statement using a union:
{% set ids = ["5898988asdfas", "6898988asdfas", "7898988asdfas"] %}
{% for id in ids %}
SELECT
UNIQUEID,
ACCOUNTID,
PRODUCTID,
TARGET,
VALUE,
PPD,
CURRENT_DATE() as date
FROM accounttable t
INNER JOIN producttable t2 ON
t.accountid = t2.accountid
where t2.productid = '{{ id }}'
and t.type = 'prodtype'
{% if not loop.last %} union all {% endif %}
{% endfor %}
Either of the above solutions is a single select statement, so each can be a single model in your dbt project. You can materialize it however you like, but probably not incrementally (incremental models are really intended for something like an event stream, where new immutable data is always being appended to the source dataset).
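To see that the two options select the same rows, here is a minimal sketch in Python using an in-memory SQLite database rather than Snowflake or dbt. The column layout (which table holds target, value and ppd), the sample rows and the helper name build_union_query are all assumptions made for illustration; only the table names and the join conditions come from the question.

```python
import sqlite3

# Hypothetical product IDs, mirroring the Jinja list in the answer.
PRODUCT_IDS = ["5898988asdfas", "6898988asdfas"]

COLUMNS = "t.uniqueid, t.accountid, t2.productid, t.target, t.value, t.ppd"


def build_union_query(n_ids):
    """Return one SELECT per product ID, glued together with UNION ALL."""
    one_select = (
        f"SELECT {COLUMNS} "
        "FROM accounttable t "
        "INNER JOIN producttable t2 ON t.accountid = t2.accountid "
        "WHERE t2.productid = ? AND t.type = 'prodtype'"
    )
    return " UNION ALL ".join([one_select] * n_ids)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: the question does not say where each column lives.
    CREATE TABLE accounttable (uniqueid TEXT, accountid TEXT, type TEXT,
                               target TEXT, value REAL, ppd REAL);
    CREATE TABLE producttable (accountid TEXT, productid TEXT);
    CREATE TABLE mappingtable (accountid TEXT, productid TEXT);
    INSERT INTO accounttable VALUES
        ('u1', 'a1', 'prodtype',  '500.00', 10, 1.5),
        ('u2', 'a2', 'prodtype',  '250.00', 20, 2.0),
        ('u3', 'a3', 'othertype', '100.00', 30, 3.0);
    INSERT INTO producttable VALUES
        ('a1', '5898988asdfas'), ('a2', '6898988asdfas'), ('a3', '7898988asdfas');
    -- Stand-in for the seeded CSV that maps accounts to products.
    INSERT INTO mappingtable VALUES
        ('a1', '5898988asdfas'), ('a2', '6898988asdfas');
""")

# Option 2: one SELECT per product ID, combined with UNION ALL.
union_rows = conn.execute(
    build_union_query(len(PRODUCT_IDS)), PRODUCT_IDS
).fetchall()

# Option 1: a single SELECT driven by the mapping table.
mapped_rows = conn.execute(f"""
    SELECT {COLUMNS}
    FROM accounttable t
    INNER JOIN mappingtable m ON t.accountid = m.accountid
    INNER JOIN producttable t2
        ON t.accountid = t2.accountid AND m.productid = t2.productid
    WHERE t.type = 'prodtype'
""").fetchall()

print(sorted(union_rows) == sorted(mapped_rows))
```

The mapping-table version keeps the list of IDs in data rather than in the query text, so adding a product means adding a row instead of editing the model.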

Related

How do I query three related tables?

I'm new to databases and I'm having a hard time figuring this out. Any assistance would be greatly appreciated
Deliveries Table
-- ID (PK)
-- DriverID (FK to Drivers Table)
Drivers Table
-- ID (PK)
-- LocationID (FK to Locations Table)
Locations Table
-- ID (PK)
-- RestaurantID (FK to Restaurants Table)
Restaurant Table
--ID (PK)
A restaurant can have multiple locations (1 to many). A location can have multiple drivers (1 to many). A driver can have multiple deliveries (1 to many). This design is supposed to break things out in 3rd normal form. So if I want to go to the deliveries table and get all of the deliveries associated with a particular restaurant, how would I query or do a join for that? Would I have to add a second foreign key to Deliveries that directly references the Restaurant table? I think after I see the query I can figure out what is going on. Thx

You can use a left or right outer join to build a combined table and then query it, or you can use a query with nested subqueries to get the required result without a join. Here is an example of the subquery approach for your use case:
SELECT ID FROM Deliveries De
WHERE De."DriverID" IN (SELECT ID FROM Drivers Dr
WHERE Dr."LocationID" IN (SELECT ID FROM Locations L
WHERE L."RestaurantID" IN (SELECT ID FROM Restaurant)))
To restrict the result to one restaurant, add a WHERE condition on ID to the innermost query. I hope this solves your issue without using a join statement.

You can use inner join or union depending on what you want to achieve. Example:
SELECT a."articleId" AS id, a.title, a."articleImage" AS "articleImage/url", c.category AS "category/public_id", a."createdOn", concat("firstName", ' ', "lastName") AS author
FROM articles a
INNER JOIN users u ON a."userId" = u."userId"
INNER JOIN categories c ON a."categoryId" = c."categoryId"
UNION
SELECT g."gifId" AS id, g.title, g."imageUrl" AS "articleImage/url", g.public_id AS "category/public_id", g."createdOn", concat("firstName", ' ', "lastName") AS author
FROM gifs g
INNER JOIN users u ON g."userId" = u."userId"
ORDER BY "createdOn" DESC
If you describe the results you want, I can give a more detailed query.

If I understand what you want to do, then it may be like this.
First you have to join all those tables to get the result you want.
The join condition will be:
select <your desired column names>
from Restaurant A,Locations B,Drivers C,Deliveries D
where A.ID = B.RestaurantID
and B.ID = C.LocationID
and C.ID = D.DriverID
Hope this is helpful; feel free to ask if anything is unclear.
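The join chain described in these answers can be tried out with a minimal sketch in Python and an in-memory SQLite database. The table and column names follow the question; the sample rows and the function name deliveries_for_restaurant are made up for illustration. No extra foreign key on Deliveries is needed, and the Restaurant table itself does not have to be joined when only its ID is used as the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Restaurant (ID INTEGER PRIMARY KEY);
    CREATE TABLE Locations  (ID INTEGER PRIMARY KEY,
                             RestaurantID INTEGER REFERENCES Restaurant(ID));
    CREATE TABLE Drivers    (ID INTEGER PRIMARY KEY,
                             LocationID INTEGER REFERENCES Locations(ID));
    CREATE TABLE Deliveries (ID INTEGER PRIMARY KEY,
                             DriverID INTEGER REFERENCES Drivers(ID));
    -- Made-up sample data: restaurant 1 has two locations, restaurant 2 has one.
    INSERT INTO Restaurant VALUES (1), (2);
    INSERT INTO Locations  VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO Drivers    VALUES (100, 10), (101, 11), (200, 20);
    INSERT INTO Deliveries VALUES (1000, 100), (1001, 101), (1002, 101), (2000, 200);
""")


def deliveries_for_restaurant(restaurant_id):
    """Walk Deliveries -> Drivers -> Locations and filter on the restaurant."""
    sql = """
        SELECT d.ID
        FROM Deliveries d
        INNER JOIN Drivers   dr ON dr.ID = d.DriverID
        INNER JOIN Locations l  ON l.ID  = dr.LocationID
        WHERE l.RestaurantID = ?
        ORDER BY d.ID
    """
    return [row[0] for row in conn.execute(sql, (restaurant_id,))]


print(deliveries_for_restaurant(1))  # deliveries made from either location
print(deliveries_for_restaurant(2))
```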

SQL query inside a query

Allow me to share my query in an informal way (not following the proper syntax) as I'm a newbie - my apologies:
select * from table where
(
(category = Clothes)
OR
(category = games)
)
AND
(
(Payment Method = Cash) OR (Credit Card)
)
This is one part from my query. The other part is that from the output of the above, I don’t want to show the records meeting these criteria:
Category = Clothes
Branch = B3 OR B4 OR B5
Customer = Chris
VIP Level = 2 OR 3 OR 4 OR 5
SQL is not part of my job but I’m doing it to ease things for me. So you can consider me a newbie. I searched online, maybe I missed the solution.
Thank you,
HimaTech

There are a few ways of doing this (specifically within SQL; not looking at MDX here).
Probably the easiest way to understand is to get the dataset that you want to exclude as a subquery, and use NOT EXISTS or NOT IN:
SELECT * FROM table
WHERE category IN ('clothes', 'games')
AND payment_method IN ('cash', 'credit card')
AND id NOT IN (
-- this is the subquery containing the results to exclude
SELECT id FROM table
WHERE category = 'clothes' [AND/OR]
branch IN ('B3', 'B4', 'B5') [AND/OR]
customer = 'Chris' [AND/OR]
vip_level IN (2, 3, 4, 5)
)
Another way is to left join the results you want to exclude onto the overall results, and filter them out using IS NULL, like so:
SELECT t1.*
FROM table AS t1
LEFT JOIN
(SELECT id FROM table
WHERE customer = 'chris' AND ...) -- results to exclude
AS t2 ON t1.id = t2.id
WHERE t2.id IS NULL
AND ... -- any other criteria
The trick here is that when doing a left join, if there is no result from the join then the value is null. But this is certainly more difficult to get your head around.
There will also be different performance impacts from doing it either way, so it may be worth looking into it. This is probably a good place to start:
What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?
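Both exclusion styles can be compared side by side in a small sketch using Python and an in-memory SQLite database. The table name purchases, its columns and the sample rows are hypothetical, and the four exclusion criteria are assumed to be combined with AND, which the answer leaves open as [AND/OR].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows; the question does not give a schema.
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, category TEXT,
                            payment_method TEXT, branch TEXT,
                            customer TEXT, vip_level INTEGER);
    INSERT INTO purchases VALUES
        (1, 'clothes', 'cash',        'B1', 'Ann',   1),
        (2, 'clothes', 'cash',        'B3', 'Chris', 2),
        (3, 'games',   'credit card', 'B3', 'Chris', 2),
        (4, 'clothes', 'credit card', 'B4', 'Chris', 1),
        (5, 'food',    'cash',        'B1', 'Bob',   1),
        (6, 'games',   'cheque',      'B2', 'Ann',   3);
""")

# Rows to hide. Assumption: all four criteria must hold at once (AND).
EXCLUDE = """
    category = 'clothes'
    AND branch IN ('B3', 'B4', 'B5')
    AND customer = 'Chris'
    AND vip_level IN (2, 3, 4, 5)
"""

# Style 1: NOT IN with a subquery that lists the ids to exclude.
not_in_ids = [row[0] for row in conn.execute(f"""
    SELECT id FROM purchases
    WHERE category IN ('clothes', 'games')
      AND payment_method IN ('cash', 'credit card')
      AND id NOT IN (SELECT id FROM purchases WHERE {EXCLUDE})
    ORDER BY id
""")]

# Style 2: LEFT JOIN the excluded ids and keep rows that found no match.
left_join_ids = [row[0] for row in conn.execute(f"""
    SELECT p.id
    FROM purchases AS p
    LEFT JOIN (SELECT id FROM purchases WHERE {EXCLUDE}) AS ex
        ON ex.id = p.id
    WHERE ex.id IS NULL
      AND p.category IN ('clothes', 'games')
      AND p.payment_method IN ('cash', 'credit card')
    ORDER BY p.id
""")]

print(not_in_ids, left_join_ids)
```

Because id is a primary key here, the subquery never yields NULL. With a nullable column, NOT IN can return no rows at all, which is one reason NOT EXISTS or the LEFT JOIN form is often preferred.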

SQL Server - Select from child-parent-child and return multiple results-set

I am using SQL Server 12/Azure and have 3 tables (T1, T2, T3), where T1 has a 1-to-many relationship with both T2 and T3. I want to select from T2 and return the information of the T1 records and their associated T3 records.
To give a simplified example, T1 is "Customer", T2 is "Orders", and T3 is "CustomerAddresses", so a customer can have many orders and multiple addresses. I want to query the orders and include the customer's information and addresses. To complicate things a little, the query for orders could include matching on the customer addresses, e.g. get the orders for these addresses.
Customer Table
----------------------
Id, Name,...
----------------------
Orders Table
------------------------------
OrderId, CustomerKey, Date,...
------------------------------
CustomerAddresses
-----------------------------------------------
AutoNumber, CustomerKey, Street, ZipCode,...
-----------------------------------------------
I am having trouble finding the best (most optimized) way to return all the results in one transaction while dynamically generating the SQL statements. This is how I think the results should come back:
Orders (T2) and customer information (T1) are returned in one result set/table, and CustomerAddresses (T3) are returned in another result set/table. I am using ADO.NET to generate and execute the queries and System.Data.SqlClient.SqlDataReader to loop over the returned results.
Example of how the results could come back:
Order-Customer Table
-------------------------------
Order.OrderId, Customer.Id, Customer.Name, Order.Date,....
-------------------------------
CustomerAddresses
-------------------------------
AutoNumber, CustomerKey, Street
-------------------------------
This is an example of a query that I currently generate:
SELECT [Order].[OrderId], [Order].[Date], [Customer].[Id], [Customer].[Name]
FROM [Order]
INNER JOIN [Customer] on [Order].[CustomerKey] = [Customer].[Id]
WHERE ([Order].[Date] > '2015-06-28')
Questions:
1. How do I extend the above query to also allow returning the CustomerAddresses in a separate result-set/table?
2. To enable matching on the CustomerAddresses I should be able to do a join with the Customer table and include whatever columns I need to match in the WHERE clause.
3. Is there a better, simpler and more optimized way to achieve what I want?
-------- UPDATE ---------------
To clarify more about how I use the returned data in my app:
I am using ADO.NET, SqlConnection, SqlCommand and SqlDataReader to parse the results. (I don't want to use Entity Framework or any other high-level DB Framework here)
My model object is a collection of T2 (Orders) which contains T1 (Customer information) and T3 (CustomerAddresses)
OrderClass:
OrderId, OrderDate, CustomerId, CustomerName, CustomerAddresses[],...
CustomerAddresses Class:
Street, ZipCode, ....
I found out that people usually return all the results within a single select statement in one table, which returns redundant data. What I prefer is to return the tables as they are (T1, T2 and T3), containing only the relevant information, so that I can process them in my app to create the model.
Another solution is to insert the IDs from the select statement into a Temp table then return the results in multiple select statements:
Select T1.* From T1 where Id in (
select Temp.T1Id from Temp )
Select T2.* From T2 where Id in (
select Temp.T2Id from Temp )
Select T3.* From T3 where Id in (
select Temp.T3Id from Temp )

This is a great question and is a common issue that is apparently not well addressed elsewhere. The issue, as you allude to, is that since there can be many orders and addresses for each customer, the number of results in a single query can be large. For example,
select * from customer
left outer join order
on (order.customer_id = customer.customer_id)
left outer join customer_address ca
on (ca.customer_id = customer.customer_id)
will generate the information you need but will return many rows. For example, if a customer has n orders and m addresses, the join returns m x n rows for that customer.
So the approach you allude to is a good one: get the customer_ids from the first query and then use those IDs to "generate" the order query and the address query.
Essentially, what you need to do is issue queries like:
select * from customer
where ....
to retrieve the customer information. Then
select * from order
where customer_id in [The customer_ids found in the above query]
and
select * from customer_address
where customer_id in [The customer_ids found in the above query]
You can use a temp table as you suggest, but table-valued parameters are going to be more efficient, and since you are using SQL Server 12 they are available to you. See the following link for more information: http://www.sommarskog.se/arrays-in-sql-2008.html
All these queries should be done in one transaction and you will need to pay attention to transaction isolation levels, which further complicates the issue.
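The difference between the single join and the separate queries can be seen in a minimal sketch using Python and an in-memory SQLite database. It stands in for SQL Server and ADO.NET only to show the shape of the approach: the IN list is built from bound placeholders instead of a table-valued parameter, and transaction handling is left out. The table layout, the sample rows and the function name load_orders_since are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout, loosely following the tables in the question.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, order_date TEXT);
    CREATE TABLE customer_address (address_id INTEGER PRIMARY KEY,
                                   customer_id INTEGER, street TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES
        (10, 1, '2015-07-01'), (11, 1, '2015-07-02'), (12, 2, '2015-05-01');
    INSERT INTO customer_address VALUES
        (100, 1, '1 Main St'), (101, 1, '2 Oak Ave'), (102, 2, '3 Elm Rd');
""")

# The single three-way join repeats every order once per address.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    LEFT JOIN customer_address a ON a.customer_id = c.customer_id
    WHERE c.customer_id = 1
""").fetchone()[0]


def load_orders_since(date):
    """Fetch orders first, then customers and addresses for the ids found."""
    orders = conn.execute(
        "SELECT order_id, customer_id FROM orders "
        "WHERE order_date > ? ORDER BY order_id",
        (date,),
    ).fetchall()
    customer_ids = sorted({cid for _, cid in orders})
    if not customer_ids:
        return []
    marks = ",".join("?" * len(customer_ids))  # one placeholder per id
    names = dict(conn.execute(
        f"SELECT customer_id, name FROM customer "
        f"WHERE customer_id IN ({marks})",
        customer_ids,
    ).fetchall())
    streets = {}
    for cid, street in conn.execute(
        f"SELECT customer_id, street FROM customer_address "
        f"WHERE customer_id IN ({marks}) ORDER BY address_id",
        customer_ids,
    ):
        streets.setdefault(cid, []).append(street)
    return [
        {"order_id": oid, "customer": names[cid], "addresses": streets.get(cid, [])}
        for oid, cid in orders
    ]


result = load_orders_since("2015-06-28")
print(joined_rows, result)
```

Each table is read once and the model is assembled in the application, so no customer or address data is repeated across rows.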

OK, first off: it will be easier for you, and is recommended, to create a stored procedure with your SQL logic, for example
'[dbo].[OrderDetails]'.
The reason is to avoid falling victim to SQL injection and to avoid recompiling and redistributing your assembly in case your query is updated.
Here is some interpreted pseudocode to guide you:
CREATE PROCEDURE [dbo].[OrderDetails]
(
@OrderDate DateTime,
@AddressFilter VARCHAR(128)
)
AS
BEGIN
IF(OBJECT_ID('tempdb..#OrderDetails') IS NOT NULL) DROP TABLE #OrderDetails
SELECT orders.[*ListYourOrdersColumnsHere*], customer.[*ListYourCustomerColumnsHere*], addresses.[*ListYourAddressColumnsHere*]
INTO #OrderDetails
FROM [Order] orders
INNER JOIN [Customer] customer on orders.[CustomerKey] = customer.[Id]
LEFT JOIN [CustomerAddresses] addresses ON addresses.[CustomerKey] = customer.[Id]
-- Or inner join if you need orders just with CustomerAddresses
--INNER JOIN [CustomerAddresses] addresses ON addresses.[CustomerKey] = customer.[Id]
WHERE orders.[Date] > @OrderDate
-- Note: filtering on addresses here drops orders that have no matching address
AND addresses.[Street] LIKE '%' + @AddressFilter + '%'
-- Orders
SELECT [*ListYourOrdersColumnsHere*]
FROM #OrderDetails
GROUP BY [*ListYourOrdersColumnsHere*]
-- Customers
SELECT [*ListYourCustomerColumnsHere*]
FROM #OrderDetails
GROUP BY [*ListYourCustomerColumnsHere*]
-- Addresses
SELECT [*ListYourAddressColumnsHere*]
FROM #OrderDetails
GROUP BY [*ListYourAddressColumnsHere*]
DROP TABLE #OrderDetails
END
Then, moving from the interpreted environment to your managed (compiled) code, you will need to look at:
SqlDataReader.NextResult()
This method advances the data reader to the next result set, as shown in the next block of pseudocode.
SqlConnection myConnection = new SqlConnection("*YourAzureDbConnectionString*");
SqlCommand myCommand = new SqlCommand();
SqlDataReader myReader;
myCommand.CommandType = CommandType.StoredProcedure;
myCommand.Connection = myConnection;
myCommand.CommandText = "[dbo].[OrderDetails]";
myCommand.Parameters.Add(new SqlParameter("@OrderDate", DateTime.Now));
myCommand.Parameters.Add(new SqlParameter("@AddressFilter", "Some or other address filter"));
int resultSetCount = 0;
try
{
myConnection.Open();
myReader = myCommand.ExecuteReader();
do
{
resultSetCount++;
while (myReader.Read())
{
switch (resultSetCount)
{
case 1:
// Populate Order information from 1st resultset
break;
case 2:
// Populate Customer information from 2nd resultset
break;
case 3:
// Populate CustomerAddress information from 3rd resultset
break;
}
}
}
while (myReader.NextResult());
}
catch (Exception ex)
{
// handle exception logic
}
finally
{
myConnection.Close();
}
Finally, this can also be done using SqlDataAdapter, which will return your stored procedure output as 3 separate DataTables.
Hope this helps!

How to compare the results of two separate queries that have a common field in Sql Server?

Maybe it's because it's Friday but I can't seem to get this and it feels like it should be really really easy.
I have one query (pulling data from multiple tables) that gives me the following result set:
Room Type | ImageID | Date
The next query (pulling from different tables than the one above) gives me:
ImageID | Date | Tagged
I just want to compare the results to see which ImageIDs are common to the two results, and which appear in only one of the lists.
I have tried inserting the results from each into temp tables and doing a join on ImageID, but SQL Server does NOT like that. Ideally I would like a solution that allows me to do this without creating temp tables.
I looked into using union, but because the two results don't have the same columns I avoided that route.
Thanks!

You can do this a number of different ways; for instance, you can use either an inner join or intersect, using the two sets as derived tables.
select ImageID from (your_first_query) query1
intersect
select ImageID from (your_second_query) query2
or
select query1.ImageID
from (your_first_query) query1
inner join (your_second_query) query2 on query1.ImageID = query2.ImageID

You don't explain why SQL Server does not like performing a join on ImageID. It shouldn't be a problem. As for your first question, you need to turn your two queries into subqueries and perform a Full Outer Join on them:
Select * from
(Select [Room Type], ImageID, Date ...) T1 Full Outer Join
(Select ImageID, Date, Tagged ...) T2 on T1.ImageId = T2.ImageId
Checking for NULL values on each side of the join should give you what you want.

SELECT TableA.ImageID
FROM TableA
WHERE TableA.ImageID
IN (SELECT TableB.ImageID FROM TableB)

select q1.ImageID
from (your_first_query) q1
WHERE EXISTS (select 1
from (your_second_query) q2
WHERE q2.ImageID = q1.ImageID)
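The three lists the question asks for (common IDs, and IDs that appear in only one result) can be sketched with set operators, using Python and an in-memory SQLite database and no temp tables. INTERSECT appears in the answers above; EXCEPT is used here as a swapped-in alternative to the full outer join for the one-sided lists. The two base tables, their rows and the helper name ids are hypothetical stand-ins for the two original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two source queries.
    CREATE TABLE rooms (room_type TEXT, image_id INTEGER, date TEXT);
    CREATE TABLE tags  (image_id INTEGER, date TEXT, tagged INTEGER);
    INSERT INTO rooms VALUES
        ('kitchen', 1, '2020-01-01'),
        ('bath',    2, '2020-01-02'),
        ('hall',    3, '2020-01-03');
    INSERT INTO tags VALUES
        (2, '2020-01-02', 1),
        (3, '2020-01-03', 0),
        (4, '2020-01-04', 1);
""")

# Only the shared column is selected, so the differing columns do not matter.
FIRST = "SELECT image_id FROM rooms"
SECOND = "SELECT image_id FROM tags"


def ids(sql):
    """Run a query that returns one id column and return the ids sorted."""
    return sorted(row[0] for row in conn.execute(sql))


common = ids(f"{FIRST} INTERSECT {SECOND}")    # in both results
only_first = ids(f"{FIRST} EXCEPT {SECOND}")   # in the first result only
only_second = ids(f"{SECOND} EXCEPT {FIRST}")  # in the second result only

print(common, only_first, only_second)
```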

SQL Same Column in one row

I have a lookup table that has a Name and an ID in it. Example:
ID NAME
-----------------------------------------------------------
5499EFC9-925C-4856-A8DC-ACDBB9D0035E CANCELLED
D1E31B18-1A98-4E1A-90DA-E6A3684AD5B0 31PR
The first record indicates an order status. The next indicates a service type.
In a query from an orders table I do the following:
INNER JOIN order.id = lut.Statusid
This returns the 'cancelled' name from my lookup table. I also need the service type in the same row. This is connected in the order table by orders.serviceid. How would I go about doing this?
CANCELLED doesn't connect to 31PR.
Orders connects to both. Orders has 2 fields in it called Servicetypeid and orderstatusid. That is how those 2 connect to the order. I need to return both names in the same order row.

I think many will tell you that having two different pieces of data in the same column violates first normal form. There is a reason why having one lookup table to rule them all is a bad idea. However, you can do something like the following:
Select ..
From order
Join lut
On lut.Id = order.StatusId
Left Join lut As l2
On l2.id = order.ServiceTypeId
If order.ServiceTypeId (or whatever the column is named) is not nullable, then you can use a Join (inner join) instead.

A lot of info was left out, but here goes:
SELECT orders.id, lut1.Name AS OrderStatus, lut2.Name AS ServiceType
FROM orders
INNER JOIN lut lut1 ON orders.orderstatusid = lut1.ID
INNER JOIN lut lut2 ON orders.servicetypeid = lut2.ID
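Joining the same lookup table twice under two aliases can be tried out with a minimal sketch in Python and an in-memory SQLite database. The lookup rows come from the question; the orders rows, the column types and the alias names status and service are assumptions. The second join is a LEFT JOIN so that an order without a service type still appears.

```python
import sqlite3

CANCELLED_ID = "5499EFC9-925C-4856-A8DC-ACDBB9D0035E"
SERVICE_ID = "D1E31B18-1A98-4E1A-90DA-E6A3684AD5B0"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lut (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         orderstatusid TEXT, servicetypeid TEXT);
""")
conn.executemany(
    "INSERT INTO lut VALUES (?, ?)",
    [(CANCELLED_ID, "CANCELLED"), (SERVICE_ID, "31PR")],
)
# Made-up orders: the second one has no service type.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, CANCELLED_ID, SERVICE_ID), (2, CANCELLED_ID, None)],
)

rows = conn.execute("""
    SELECT o.id, status.name AS order_status, service.name AS service_type
    FROM orders o
    INNER JOIN lut AS status  ON status.id  = o.orderstatusid
    LEFT  JOIN lut AS service ON service.id = o.servicetypeid
    ORDER BY o.id
""").fetchall()

print(rows)
```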
