Snowflake: concatenate data from 4 tables into a single table

I have 4 tables, all with the same columns and column types.
Example: these are the table names
data_2017
data_2018
data_2019
data_2020
For ease of data ingestion we have to create separate tables for each year.
Now I want to concatenate all the tables into one table.
How can I do this in Snowflake?

You could use UNION ALL to create a new table or view:
-- use CREATE OR REPLACE VIEW instead if you want a view
CREATE OR REPLACE TABLE concatenated_data
AS
SELECT * FROM data_2017 UNION ALL
SELECT * FROM data_2018 UNION ALL
SELECT * FROM data_2019 UNION ALL
SELECT * FROM data_2020;
In practice, * should be replaced with the actual column names.
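
As a rough illustration of the UNION ALL approach with explicit column names, here is a minimal sketch that runs the same kind of statement against an in-memory SQLite database from Python. SQLite is only a stand-in for Snowflake here, and the two columns (id, amount) are hypothetical, since the question does not show the real schema.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for Snowflake.
conn = sqlite3.connect(":memory:")
years = [2017, 2018, 2019, 2020]

# Hypothetical schema: every yearly table has the same two columns.
for year in years:
    conn.execute(f"CREATE TABLE data_{year} (id INTEGER, amount REAL)")
    conn.execute(f"INSERT INTO data_{year} VALUES (?, ?)", (year, 1.5))

# Build the UNION ALL query with explicit column names instead of *.
select_parts = [f"SELECT id, amount FROM data_{year}" for year in years]
union_sql = " UNION ALL ".join(select_parts)
conn.execute(f"CREATE TABLE concatenated_data AS {union_sql}")

# One row was inserted per yearly table, so four rows are expected.
row_count = conn.execute("SELECT COUNT(*) FROM concatenated_data").fetchone()[0]
ids = sorted(r[0] for r in conn.execute("SELECT id FROM concatenated_data"))
print(row_count, ids)
```

Generating the SELECT list from a list of years keeps the statement short when more yearly tables are added later.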

Related

How to append data from one table to another table in Snowflake

I have a table of all employees (employees_all) and then created a new table (employees_new) with the same structure that I would like to append to the original table to include new employees.
I was looking for the right command to use and found that INSERT lets me add data as in the following example:
create table t1 (v varchar);
insert into t1 (v) values
('three'),
('four');
But how do I append data coming from another table and without specifying the fields (both tables have the same structure and hundreds of columns)?

With additional research, I found this specific way to insert data from another table:
insert into employees_all
select * from employees_new;
This statement appends all rows from one table to another without listing the fields. Columns are matched by position, so both tables need the same columns in the same order.
Hope it helps!
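
The append can be tried out on a small scale with the sketch below, which uses Python's built-in sqlite3 module as a stand-in for Snowflake. The two-column structure and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-column structure shared by both tables.
conn.execute("CREATE TABLE employees_all (employee_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE employees_new (employee_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees_all VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO employees_new VALUES (?, ?)", [(3, "Cy")])

# Append every row; columns are matched by position, not by name.
conn.execute("INSERT INTO employees_all SELECT * FROM employees_new")

names = [r[0] for r in conn.execute(
    "SELECT name FROM employees_all ORDER BY employee_id")]
print(names)
```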

Your insert with a select statement is the simplest answer, but just for fun, here are some extra options that offer more flexibility.
You can generate the desired results in a select query using
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new;
This allows you to have a few more options with how you use this data downstream.
--use a view to preview the results without impacting the table
CREATE VIEW employees_all_preview
AS
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new;
--recreate the table using a sort,
-- generally not super common, but could help with clustering in some cases when the table
-- is very large and isn't updated very frequently.
INSERT OVERWRITE INTO employees_all
SELECT * FROM (
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new
) e ORDER BY name;
Lastly, you can also do a merge, which gives you some extra options. In this example, if your new table might contain records that match an existing record, you can update those records instead of inserting them and creating duplicates:
MERGE INTO employees_all a
USING employees_new n ON a.employee_id = n.employee_id
WHEN MATCHED THEN UPDATE SET attrib1 = n.attrib1, attrib2 = n.attrib2
WHEN NOT MATCHED THEN INSERT (employee_id, name, attrib1, attrib2)
VALUES (n.employee_id, n.name, n.attrib1, n.attrib2);
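
To see the update-or-insert behaviour of the merge on sample data, the sketch below emulates it in SQLite from Python. SQLite has no MERGE statement, so this is not the Snowflake syntax: the matched branch is written as an UPDATE and the not-matched branch as an INSERT with NOT EXISTS. The table contents and the single attrib1 column are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees_all "
             "(employee_id INTEGER PRIMARY KEY, name TEXT, attrib1 TEXT)")
conn.execute("CREATE TABLE employees_new "
             "(employee_id INTEGER PRIMARY KEY, name TEXT, attrib1 TEXT)")
conn.executemany("INSERT INTO employees_all VALUES (?, ?, ?)",
                 [(1, "Ann", "old"), (2, "Bob", "old")])
conn.executemany("INSERT INTO employees_new VALUES (?, ?, ?)",
                 [(2, "Bob", "new"), (3, "Cy", "new")])

# Equivalent of WHEN MATCHED THEN UPDATE: refresh ids present in both tables.
conn.execute("""
    UPDATE employees_all
    SET attrib1 = (SELECT n.attrib1 FROM employees_new n
                   WHERE n.employee_id = employees_all.employee_id)
    WHERE employee_id IN (SELECT employee_id FROM employees_new)
""")

# Equivalent of WHEN NOT MATCHED THEN INSERT: add only ids still missing.
conn.execute("""
    INSERT INTO employees_all (employee_id, name, attrib1)
    SELECT n.employee_id, n.name, n.attrib1
    FROM employees_new n
    WHERE NOT EXISTS (SELECT 1 FROM employees_all a
                      WHERE a.employee_id = n.employee_id)
""")

merged = conn.execute(
    "SELECT employee_id, name, attrib1 FROM employees_all ORDER BY employee_id"
).fetchall()
print(merged)
```

Employee 2 is updated in place and employee 3 is added, so no duplicate row is created for the shared id.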

SQL Server - More efficient way of joining three tables without parent table

This is an issue I have been working on for a while. I have three tables, all of which share 3 of the same columns, but there are rows that are unique to each table. I would like to combine all of the tables without duplicating rows. I have a working solution, but I feel like it might not be the most efficient. I tried using joins but found that, without a parent table, I wasn't getting the expected number of results. Here is my solution, which does yield the correct number of results (I've cut some columns for simplicity):
--Create table
CREATE TABLE #Temp
(
ID,
Date
)
-- Insert rows that are only in db1
INSERT INTO #Temp
SELECT
ID,
Date
FROM test.dbo.db1
-- Do not include rows shared by db1 and db2
EXCEPT
(
SELECT
ID,
Date
FROM test.dbo.db2
INTERSECT
SELECT
ID,
Date
FROM test.dbo.db1
)
EXCEPT
-- And not in db1 and db3
(
SELECT
ID,
Date
FROM test.dbo.db1
INTERSECT
SELECT
ID,
Date
FROM test.dbo.db3
)
EXCEPT
-- And not in db1, db2 and db3
** Code where I intersect all 3 tables
I repeat the above steps for all three tables and then add the intersections for each combined ID/Date (db1+db2+db3, db1+db2, etc.).
Does anyone know of a way to do this that is more direct and to the point? I have tried doing a full join of all of them, but without a parent table containing all of the IDs, the IDs that only appear in the other two tables don't show up.

SELECT
ID,
Date
FROM test.dbo.db1
UNION
SELECT
ID,
Date
FROM test.dbo.db2
UNION
SELECT
ID,
Date
FROM test.dbo.db3
The UNION takes care of removing duplicates.
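
The difference between UNION and UNION ALL can be checked with a small sketch like the one below. It uses SQLite through Python as a stand-in for SQL Server, and the sample rows are invented so that some ID/Date pairs appear in more than one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("db1", "db2", "db3"):
    conn.execute(f"CREATE TABLE {name} (ID INTEGER, Date TEXT)")

# Rows (1, ...) and (2, ...) each appear in two of the three tables.
conn.executemany("INSERT INTO db1 VALUES (?, ?)",
                 [(1, "2020-01-01"), (2, "2020-01-02")])
conn.executemany("INSERT INTO db2 VALUES (?, ?)",
                 [(2, "2020-01-02"), (3, "2020-01-03")])
conn.executemany("INSERT INTO db3 VALUES (?, ?)",
                 [(1, "2020-01-01"), (4, "2020-01-04")])

query = ("SELECT ID, Date FROM db1 {op} "
         "SELECT ID, Date FROM db2 {op} "
         "SELECT ID, Date FROM db3")

# UNION removes duplicate rows; UNION ALL keeps every row.
distinct_rows = sorted(conn.execute(query.format(op="UNION")).fetchall())
all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()
print(len(distinct_rows), len(all_rows))
```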

SELF JOIN /SUBQUERIES/SSIS performance

Hi, I have three tables:
Wrk_Order, Wrk_Info, Wrk_Driver
All three have DFirstName, DLastName and DUsername columns, and I want to anonymise them.
What I did was take the distinct values of each of the three columns in all three tables and load them into a lookup table:
Table name: LKP_driverinfo
Columns: [DriverInfo, DriverAnonInfo]
The DriverInfo column holds data as follows:
SELECT DISTINCT DFirstName FROM Wrk_Order
UNION
SELECT DISTINCT DLastName FROM Wrk_Order
UNION
SELECT DISTINCT DUsername FROM Wrk_Order
UNION
SELECT DISTINCT DFirstName FROM Wrk_Info
.
.
.
and similarly for the other two tables.
DriverAnonInfo holds the anonymised value of DriverInfo.
I need to update DFirstName, DLastName and DUsername in all three tables to the anonymised value [DriverAnonInfo] from the lookup table LKP_driverinfo.
What's the best way to achieve this: a self-join, inner queries, or SSIS?

You seem to create a sort of EAV model with your driver info, but you do not seem to keep any references for the originating tables.
Perhaps you can try adding 3 more columns to your driverinfo table, to hold the id of the table that the value comes from. That way you can directly update or join entries in the tables.

How to use INSERT SELECT?

I have the following table structure:
[Subjects]:
id int Identity Specification yes
Deleted bit
[Juridical]:
id int
Name varchar
typeid int
[Individual]:
id int
Name varchar
Juridical and Individual are child classes of the Subjects class. This means that matching rows in the Individual and Subjects tables have the same id.
Now I have a table:
[MyTable]:
typeid varchar
Name varchar
And I want to select data from this table and insert it into my table structure. But I don't know what to do. I tried to use OUTPUT:
INSERT INTO [Individual](Name)
OUTPUT false
INTO [Subjects].[Deleted]
SELECT [MyTable].[Name] as Name
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
But the syntax is not correct.

Just use:
INSERT INTO Individual(Name)
SELECT [MyTable].[Name] as Name
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
and
INSERT INTO Subjects(Deleted)
SELECT 0 AS Deleted
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
A plain INSERT ... SELECT targets a single table, so you need two separate statements. For that reason I split your initial query into two INSERT statements, one for Individual and one for Subjects. The second one selects the constant 0 (false), because Deleted is a bit column.
Just as @marc_s said, the SELECT must return exactly as many columns as the INSERT column list names.
Other than these two constraints, which are both related to syntax, you are free to do any filtering or complex logic in the SELECT part, as you would in a normal SELECT query.

You need to use this syntax:
INSERT INTO [Individual] (Name)
SELECT [MyTable].[Name]
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
You should define the list of columns to insert into on the INSERT INTO line, and the SELECT must return exactly that many columns (and the column types need to match, too).

Perform Query and count rows on multiple identical table

I have multiple tables, one created per date, to store information for that date.
For example History3108, History0109, etc. All of these tables share the same schema. Sometimes I need to query multiple tables and get both the rows and the record count. What is the fastest way of doing this in Oracle and SQL Server?
Currently I am doing it like this:
When I need the count across multiple tables: SELECT COUNT(*) for each table and add the results.
When I need the records from multiple tables: select * from table1, select * from table2 (basically select * for each table).
Would this perform better if we included all of the queries in one transaction?

With UNION you can get records from multiple tables, as long as each query returns the same number of columns with compatible data types. For example, if you want to see all records from multiple tables:
(select * from history3108)
union all
(select * from history0109)
union all
(select * from history0209)
/* [...] and so on */
and if you want to count all records from these tables:
select count(*) from (
(select * from history3108)
union all
(select * from history0109)
union all
(select * from history0209)
/* [...] and so on */
) t; -- the alias is required in SQL Server and accepted by Oracle
Oracle Docs - The UNION [ALL], INTERSECT, MINUS Operators
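
As a small check of the counting query, the sketch below builds the same UNION ALL over several tables and counts the combined rows. It runs on SQLite through Python rather than Oracle or SQL Server, and the one-column schema and row counts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
tables = ["history3108", "history0109", "history0209"]

# Hypothetical one-column schema; table k receives k rows (1, 2 and 3).
for n, table in enumerate(tables, start=1):
    conn.execute(f"CREATE TABLE {table} (event TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [("e",)] * n)

# One query over all tables instead of one COUNT(*) per table.
union_sql = " UNION ALL ".join(f"SELECT * FROM {t}" for t in tables)
total = conn.execute(f"SELECT COUNT(*) FROM ({union_sql}) AS t").fetchone()[0]
print(total)
```

UNION ALL is used rather than UNION so that identical rows in different tables are all counted.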
