Generate an SQL query from multiple tables, then create a new table - sql-server

I have a tricky SQL query I need to write. To best explain it, I will post some pictures showing three tables. The first two are tables which already contain data; the last is the table I need created using data from the first two:

You can use a JOIN for each column you want in the final table:
SELECT
Width.itemNumber,
Width.itemValue as 'Width',
Height.itemValue as 'Height',
[Type].valueID as 'Type',
Frame.valueID as 'Frame',
Position.valueID as 'Position'
INTO third_table_name
FROM itemMaster_itemValue Width
JOIN itemMaster_itemValue Height ON Width.itemNumber=Height.itemNumber AND Height.itemPropertyID='Height'
JOIN itemMaster_EnumValue [Type] ON Width.itemNumber=[Type].itemNumber AND [Type].itemPropertyID='Type'
JOIN itemMaster_EnumValue Frame ON Width.itemNumber=Frame.itemNumber AND Frame.itemPropertyID='Frame'
JOIN itemMaster_EnumValue Position ON Width.itemNumber=Position.itemNumber AND Position.itemPropertyID='Position'
WHERE Width.itemPropertyID='Width'

I'm not sure whether you actually want to create a table for the third view or just a query (in Access) / view (in MS SQL Server). Here is how I would do it:
In MS-Access:
Step 1 (you can stop here if all you need is a way to see the data in this format)
TRANSFORM Max(P.vid) AS MaxOfvid
SELECT P.inum
FROM (SELECT itemNumber as inum, itemPropertyID as ival, itemValue as vid
FROM itemMaster_itemValue
UNION
SELECT Enum.itemNumber AS inum, Enum.itemPropertyID AS ival, Enum.valueID AS vid
FROM itemMaster_EnumValue AS Enum) AS P
GROUP BY P.inum
PIVOT P.ival;
Step 2 (If you need to actually create an additional table)
Select * INTO tableName FROM previousPivotQueryName;
That will get you what you need in Access.
The SQL Server part is a little different and can all be done in one T-SQL statement; dbo.Test is the name of the table you will create. If you are creating a table for performance reasons, this statement can be put into a job and run nightly to rebuild the table. The Drop Table line will have to be removed before the first time you run it, or it will fail because the table does not exist yet:
Drop Table dbo.Test
Select * INTO dbo.Test FROM (
/*just use the following part if you only need a view*/
SELECT *
FROM (SELECT itemNumber as inum, itemPropertyID as ival, itemValue as vid
FROM dbo.itemMaster_itemValue
UNION
SELECT Enum.itemNumber AS inum, Enum.itemPropertyID AS ival, Enum.valueID AS vid
FROM dbo.itemMaster_EnumValue AS Enum) P
PIVOT (max(vid) FOR ival IN ([Width],[Height],[Type],[Frame],[Position])) PV) PVT;
And that should get you what you need in the most efficient way possible without using a bunch of joins. :)
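If you would rather not edit the script for the first run, a small variation (just a sketch) only drops the table when it already exists:
IF OBJECT_ID(N'dbo.Test', N'U') IS NOT NULL
    DROP TABLE dbo.Test;
On SQL Server 2016 and later, DROP TABLE IF EXISTS dbo.Test does the same thing in one line.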

Related

How to append data from one table to another table in Snowflake

I have a table of all employees (employees_all) and then created a new table (employees_new) with the same structure that I would like to append to the original table to include new employees.
I was looking for the right command to use and found that INSERT lets me add data as in the following example:
create table t1 (v varchar);
insert into t1 (v) values
('three'),
('four');
But how do I append data coming from another table and without specifying the fields (both tables have the same structure and hundreds of columns)?
With additional research, I found this specific way to insert data from another table:
insert into employees_all
select * from employees_new;
This script lets you append all rows from a table into another one without specifying the fields.
Hope it helps!
Your insert with a select statement is the simplest answer, but just for fun, here are some extra options that provide different flexibility.
You can generate the desired results in a select query using
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new;
This allows you to have a few more options with how you use this data downstream.
--use a view to preview the results without impacting the table
CREATE VIEW employees_all_preview
AS
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new;
--recreate the table using a sort,
-- generally not super common, but could help with clustering in some cases when the table
-- is very large and isn't updated very frequently.
INSERT OVERWRITE INTO employees_all
SELECT * FROM (
SELECT * FROM employees_all
UNION ALL
SELECT * FROM employees_new
) e ORDER BY name;
Lastly, you can also do a merge to give you some extra options. In this example, if your new table might have records that already match existing records, then instead of inserting them and creating duplicates, you can run an update for those records:
MERGE INTO employees_all a
USING employees_new n ON a.employee_id = n.employee_id
WHEN MATCHED THEN UPDATE SET attrib1 = n.attrib1, attrib2 = n.attrib2
WHEN NOT MATCHED THEN INSERT (employee_id, name, attrib1, attrib2)
VALUES (n.employee_id, n.name, n.attrib1, n.attrib2)

String or binary data would be truncated error in SQL Server. How to know which column is throwing this error

I have an insert query that inserts data using a SELECT query with certain joins between tables.
While running that query, it gives the error "String or binary data would be truncated".
There are thousands of rows and multiple columns I am trying to insert into that table,
so it is not possible to eyeball all the data and see which values are throwing this error.
Is there any specific way to identify which column is throwing this error, or which specific record is not getting inserted properly and causing it?
I found one article on this:
RareSQL
But that covers the case where we insert data using explicit values, one row at a time.
I am inserting multiple rows at the same time using SELECT statements.
E.g.,
INSERT INTO TABLE1 (COLUMN1, COLUMN2, ..) SELECT COLUMN1, COLUMN2, .. FROM TABLE2 JOIN TABLE3
Also, in my case, I am having multiple inserts and update statements and even not sure which statement is throwing this error.
You can do a selection like this:
select TABLE2.ID, TABLE3.ID, TABLE1.COLUMN1, TABLE1.COLUMN2, ...
FROM TABLE2
JOIN TABLE3
ON TABLE2.JOINCOLUMN1 = TABLE3.JOINCOLUMN2
LEFT JOIN TABLE1
ON TABLE1.COLUMN1 = TABLE2.COLUMN1 and TABLE1.COLUMN2 = TABLE2.COLUMN2, ...
WHERE TABLE1.ID IS NULL
The first join reproduces the selection you have been using for the insert and the second join is a left join, which will yield null values for TABLE1 if a row having the exact column values you wanted to insert does not exist. You can apply this logic to your other queries, which were not given in the question.
You might just have to do it the hard way. To make it a little simpler, you can do this:
Temporarily remove the insert command from the query, so you are getting a result set out of it. You might need to give some of the columns aliases if they don't come with one. Then wrap that select query as a subquery, and test likely columns (nvarchars, etc.) like this:
Select top 5 len(Col1), *
from (Select col1, col2, ... your query (without insert) here) A
Order by 1 desc
This will sort the rows with the largest values in the specified column first and just return the rows with the top 5 values - enough to see if you've got a big problem or just one or two rows with an issue. You can quickly change which column you're checking simply by changing the column name in the len(Col1) part of the first line.
If the subquery takes a long time to run, create a temp table with the same columns but with large string sizes (varchar(max), for example) so there are no errors. Then you can do the insert just once into that table and run your tests against it instead of re-running the subquery repeatedly.
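A variation on the same idea, if you would rather scan several suspect columns in one pass (Col1 and Col2 are placeholders for your actual column names):
Select max(len(Col1)) as MaxLenCol1, max(len(Col2)) as MaxLenCol2
from (Select col1, col2, ... your query (without insert) here) A
Compare those maximums against the defined sizes of the matching columns in the target table to spot the offender.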
From this answer, you can use a temp table and compare it with the target table.
For example, this:
Insert into dbo.MyTable (columns)
Select columns
from MyDataSource ;
becomes this:
Select columns
into #T
from MyDataSource;
select *
from tempdb.sys.columns as TempCols
full outer join MyDb.sys.columns as RealCols
on TempCols.name = RealCols.name
and TempCols.object_id = Object_ID(N'tempdb..#T')
and RealCols.object_id = Object_ID(N'MyDb.dbo.MyTable')
where TempCols.name is null -- no match for real target name
or RealCols.name is null -- no match for temp target name
or RealCols.system_type_id != TempCols.system_type_id
or RealCols.max_length < TempCols.max_length ;

Does MS SQL Server automatically create a temp table if the query contains a lot of IDs in the 'IN' clause?

I have a big query to get multiple rows by ID, like:
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slowly and ends up with a timeout exception.
A temporary fix is to query with a limit, breaking the query into 10 parts of 1,000 IDs each.
I heard that using temp tables may help in this case, but it also looks like MS SQL Server does this automatically under the hood.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
INSERT INTO #ids(Id) VALUES (1001),(1002),/*add your individual Ids here*/,(10000);
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query: the lookup can be done directly using an index (if one exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1001 OR Id=1002 OR /*...*/ OR Id=10000;
This would require evaluation of the expression Id=1001 OR Id=1002 OR /*...*/ OR Id=10000 for every row in [Table], which probably takes longer than with a temporary table. The example with a temporary table takes each Id in #ids and looks for a corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Ids between 1001 and 10000. Otherwise it would be easier to write
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.
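If that index does not exist yet, creating one is a one-liner (the index name here is just an assumption for illustration):
CREATE NONCLUSTERED INDEX IX_Table_Id ON [TABLE](Id);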

Correlation names using insert and outer join

I am trying to run a query that inserts rows into one table using rows from a table in a different database.
I had this:
INSERT [testDB].[dbo].[table1]
SELECT * FROM
[sourceDB].[dbo].[table1]
LEFT OUTER JOIN [testDB].[dbo].[table1]
ON [sourceDB].[dbo].[table1].[PKcolumn] = [testDB].[dbo].[table1].[PKcolumn]
WHERE [testDB].[dbo].[table1].[PKcolumn] IS NULL
However I was told to add correlation names so I made this:
INSERT test
SELECT * FROM
[sourceDB].[dbo].[table1] as source
LEFT OUTER JOIN
[testDB].[dbo].[table1] as test
ON
source.[PKcolumn] = test.[PKcolumn]
WHERE test.[PKcolumn] IS NULL
I ended up getting this as an error message:
Msg 208, Level 16, State 1, Line 1
Invalid object name 'test'.
Does anyone know what I'm doing wrong?
In the first line you should use the real table name as in
insert into testDB.dbo.table1
SQL Server does not accept an alias or correlation name in that spot, and I confirmed that by testing.
But you can use the alias later in the query and it can be quite useful to do so to avoid ambiguity about which table a column comes from.
Another potential problem in this query is the use of select *. This tries to insert the combined column set from sourcedb.dbo.table1 and testdb.dbo.table1 into testdb.dbo.table1. That can't work.
Instead of select * you could say...(assuming source and test have exactly the same columns)
select source.*
or you could call out the specific columns as in...
select source.colA, source.col3, etc....
I don't know the names of your columns.
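Putting those two points together (the real table name as the insert target, and source.* on the assumption that both tables have exactly the same columns), the statement would look something like this; note that if the target has an identity column you will still need an explicit column list, as the next answer explains:
INSERT INTO [testDB].[dbo].[table1]
SELECT source.*
FROM [sourceDB].[dbo].[table1] AS source
LEFT OUTER JOIN [testDB].[dbo].[table1] AS test
ON source.[PKcolumn] = test.[PKcolumn]
WHERE test.[PKcolumn] IS NULL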
INSERT test
SELECT *
FROM [sourceDB].[dbo].[table1] as source
LEFT OUTER JOIN [testDB].[dbo].[table1] as test
ON source.[PKcolumn] = test.[PKcolumn]
WHERE test.[PKcolumn] IS NULL
Let's talk about what is wrong with this. First, select * would have all the columns from source and test in it, which is clearly more columns than the table you plan to insert into has. It is never acceptable to use select * in an insert statement, for several reasons.
First, if anyone changes the order of the columns or the structure of the table, the insert breaks. Second, when you have a join like this, it has the wrong number of columns. Third, even if the tables have the same columns, if they are originally in a different order you may put the data into the wrong column. If they are similar datatypes and the data fits or can be implicitly converted, the database won't stop you from doing this.
Next, you can't use an alias from the select as the destination of an insert; you must reference the actual table name.
Finally, it is a very poor practice not to use a column list in every insert. This helps with maintenance and makes sure you can check that the columns in the select match up to the columns in the insert. Further, if you have an autogenerated field, you must use a column list or the insert will try to put data into the autogenerated field and error out.
So your statement should look something like this:
INSERT [testDB].[dbo].[table1] (field1, field2, field3)
SELECT source.field1, source.field2, source.field3
FROM [sourceDB].[dbo].[table1] as source
LEFT OUTER JOIN [testDB].[dbo].[table1] as test
ON source.[PKcolumn] = test.[PKcolumn]
WHERE test.[PKcolumn] IS NULL
Or (possibly more efficient, you will have to test in your particular situation):
INSERT [testDB].[dbo].[table1] (field1, field2, field3)
SELECT source.field1, source.field2, source.field3
FROM [sourceDB].[dbo].[table1] as source
WHERE NOT EXISTS (SELECT * FROM [testDB].[dbo].[table1] test
WHERE source.[PKcolumn]=test.[PKcolumn])

How to SELECT * but without "Column names must be unique in each view"

I need to encapsulate a set of table JOINs that we frequently make use of on a vendor's database server. We reuse the same JOIN logic in many places in extracts etc., and it seemed a VIEW would allow the JOINs to be defined and maintained in one place.
CREATE VIEW MasterView
AS
SELECT *
FROM entity_1 e1
INNER JOIN entity_2 e2 ON e2.parent_id = e1.id
INNER JOIN entity_3 e3 ON e3.parent_id = e2.id
/* other joins including business logic */
etc.
The trouble is that the vendor makes regular changes to the DB (column additions, name changes) and I want that to be reflected in the "MasterView" automatically.
SELECT * would allow this, but the underlying tables all have ID columns so I get the "Column names in each view must be unique" error.
I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
Is there any way to achieve the dynamism of SELECT * but effectively exclude certain columns (i.e. the ID ones)?
Thanks
I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
In this case, you can't avoid it. You must specify column names, and for those columns with duplicate names use an alias. Code generation can help with this many columns.
SELECT * is bad practice regardless - if someone adds a 2GB binary column to one of these tables and populates it, do you really want it to be returned?
One simple method to generate the columns you want is
select column_name+',' from information_schema.columns
where table_name='tt'
and column_name not in('ID')
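On SQL Server 2017 and later, STRING_AGG can build the whole comma-separated list in a single row (a sketch, keeping the same placeholder table name 'tt'):
select string_agg(quotename(column_name), ',') within group (order by ordinal_position)
from information_schema.columns
where table_name = 'tt'
and column_name not in ('ID');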
As well as Oded's answer (which I 100% agree with)...
If someone changes the underlying tables, you need view maintenance anyway (with sp_refreshview). The column changes will not appear in the view automatically. See "select * from table" vs "select colA, colB, etc. from table" interesting behaviour in SQL Server 2005
So your "reflected in the "MasterView" automatically requirement can't be satisfied anyway
If you want to ensure the view is up to date, use WITH SCHEMABINDING which will prevent changes to the underlying tables (until removed or dropped). Then make column changes, then re-apply the view
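For reference, the refresh itself is a one-liner once the underlying tables have changed (assuming the view is named dbo.MasterView as above):
EXEC sp_refreshview N'dbo.MasterView';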
I had the same issue, see example below:
ALTER VIEW Summary AS
SELECT * FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
and I encountered the error you mentioned. The easiest solution is to use the alias before * like this:
SELECT t1.* FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
You shouldn't see that error anymore.
I had gone with this in the end, building off of Madhivanan's suggestion. It's similar to what t-clausen.dk later suggested (thanks for your efforts) though I find the xml path style more elegant than cursors / rank partitions.
The following recreates the MasterView definition when run. All columns in the underlying tables are prepended with the table name, so I can include two similarly named columns in the view by default. This alone solves my original problem, but I also included the "WHERE column_name NOT IN" clause to specifically exclude certain columns that will never be used in the MasterView.
create procedure Utility_RefreshMasterView
as
begin
declare @entity_columns varchar(max)
declare @drop_view_sql varchar(max)
declare @alter_view_definition_sql varchar(max)
/* create comma separated string of columns from underlying tables, aliased to avoid name collisions */
select @entity_columns = stuff((
select ','+table_name+'.['+column_name+'] AS ['+table_name+'_'+column_name+']'
from information_schema.columns
where table_name IN ('entity_1', 'entity_2')
and column_name not in ('column to exclude 1', 'column to exclude 2')
for xml path('')), 1, 1, '')
set @drop_view_sql = 'if exists (select * from sys.views where object_id = object_id(N''[dbo].[MasterView]'')) drop view MasterView'
set @alter_view_definition_sql =
'create view MasterView as select ' + @entity_columns + '
from entity_1
inner join entity_2 on entity_2.id = entity_1.id
/* other joins follow */'
exec (@drop_view_sql)
exec (@alter_view_definition_sql)
end
If you have a SELECT * and you are using a JOIN, the result might include columns with the same name, and that is not allowed in a view. If you run the query by itself it works fine, but not when creating the view.
For example:
Table A
ID, CatalogName, CatalogDescription
Table B
ID, CatalogName, CatalogDescription
After the JOIN query
ID, CatalogName, CatalogDescription, ID, CatalogName, CatalogDescription
That's not possible in a View.
Specify a unique name for each column in the view. Using just * is not a very good practice.
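A minimal sketch of that, using the hypothetical Table A / Table B layout above (the view and table names are assumptions) and aliasing each duplicate column:
CREATE VIEW CatalogView
AS
SELECT a.ID AS A_ID, a.CatalogName AS A_CatalogName, a.CatalogDescription AS A_CatalogDescription,
b.ID AS B_ID, b.CatalogName AS B_CatalogName, b.CatalogDescription AS B_CatalogDescription
FROM TableA AS a
INNER JOIN TableB AS b ON b.ID = a.ID;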
