SQL Server Indexes - Column Order - sql-server

Going off the diagram here: I'm confused about columns 1 and 3.
I am working on a data warehouse table, and two columns are used together as a key that gets you the primary key.
The first column is the source system; there are three possible values, let's say IBM, SQL, and ORACLE. The second part of the composite key is the transaction ID, which could be numeric or varchar. There is no third column, other than the secret key, which would be generated by Identity(1,1) as the record gets loaded. So, looking at the graph below, I imagine passing in a query like this:
Select a.Patient,
b.SourceSystem,
b.TransactionID
from Patient A
right join Transactions B
on A.sourceSystem = B.sourceSystem and
a.transactionID = B.transactionID
where SourceSystem = 'SQL'
The graph leads me to think that column 1 in the index should be set to the SourceSystem. Since it would immediately split the drill down into the next level of index by a 3rd. But when showing this graph to a coworker, they interpreted it as column 1 would be the transactionID, and column 2 as the source system.
Cols
  1   2   3
-------------
|   | 1 |   |
| A |---|   |
|   | 2 |   |
|---|---|   |
|   |   |   |
|   | 1 | 9 |
| B |   |   |
|   |---|   |
|   | 2 |   |
|   |---|   |
|   | 3 |   |
|---|---|   |

First, you should qualify all column names in a query. Second, a left join usually makes more sense than a right join (the semantics are "keep all rows in the first table"). Finally, if you have proper foreign key relationships, then you probably don't need an outer join at all.
Let's consider this query:
Select p.Patient, t.SourceSystem, t.TransactionID
from Patient p join
Transactions t
on t.sourceSystem = p.sourceSystem and
t.transactionID = p.transactionID
where t.SourceSystem = 'SQL';
The correct index for this query is Transactions(SourceSystem, TransactionId).
Notes:
Outer joins affect the choice of indexes. Basically if one of the tables has to be scanned anyway, then an index might be less useful.
t.SourceSystem = 'SQL' and p.SourceSystem = 'SQL' would probably optimize differently.
Does the patient really have a transaction id? That seems strange.
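As a minimal sketch of that index (the index name is made up here; the table and column names are taken from the query above), with the equality-filtered column leading:

```sql
-- Hypothetical index name; table and columns follow the query above.
CREATE INDEX IX_Transactions_SourceSystem_TransactionID
    ON Transactions (SourceSystem, TransactionID);

-- With SourceSystem as the leading column, an equality filter on it
-- can seek to the 'SQL' portion of the index, and TransactionID is
-- then available in order within that portion for the join.
SELECT t.SourceSystem, t.TransactionID
FROM Transactions t
WHERE t.SourceSystem = 'SQL';
```

In terms of the diagram, SourceSystem plays the role of column 1 and TransactionID the role of column 2.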

Related

Create SQL Server Select/Delete Query from value in other table

I have a master table named Master_Table and the columns and values in the master table are below:
| ID | Database | Schema | Table_name | Common_col | Value_ID |
+-------+------------+--------+-------------+------------+----------+
| 1 | Database_1 | Test1 | Test_Table1 | Test_ID | 1 |
| 2 | Database_2 | Test2 | Test_Table2 | Test_ID | 1 |
| 3 | Database_3 | Test3 | Test_Table3 | Test_ID2 | 2 |
I have another table, Value_Table, which consists of the values that need to be deleted.
| Value_ID | Common_col | Value |
+----------+------------+--------+
| 1 | Test_ID | 110 |
| 1 | Test_ID | 111 |
| 1 | Test_ID | 115 |
| 2 | Test_ID2 | 999 |
I need to build a query that generates a SQL statement to delete the values from the table listed in Master_Table, whose database and schema are given in the same row. The column to filter on is given in the Common_col column of the master table, and the values to delete are in the Value column of Value_Table.
My query should generate a statement like one of the following:
DELETE FROM Database_1.Test1.Test_Table1 WHERE Test_ID=110;
or
DELETE FROM Database_1.Test1.Test_Table1 WHERE Test_ID in (110,111,115);
These queries should run inside a loop so that I can delete all the rows from all the databases and tables listed in the master table.
Queries don't really create queries.
One way to do what you're saying, which could be useful if this is a one time thing or very occasional thing, is to use SSMS to generate query statements, then copy them to the clipboard, paste them into the window, and execute there.
SELECT 'DELETE FROM Database_1.Test1.Test_Table1 WHERE '
+ common_col
+ ' = '
+ convert(VARCHAR(10),value)
+ ';'
FROM Value_Table
WHERE Value_ID = 1 -- the Value_ID that Master_Table assigns to Test_Table1
This probably isn't what you want; it sounds more like you want to automate cleanup or something.
You can turn this into one big query if you don't mind repeating yourself a little:
DELETE T1
FROM Database_1.Test1.Test_Table1 T1
INNER JOIN Database_1.Test1.Value_Table VT ON
(VT.common_col = 'Test_ID' and T1.Test_ID=VT.Value) OR
(VT.common_col = 'Test_ID2' and T1.Test_ID2=VT.Value)
You can also use dynamic SQL combined with the first part ... but I hate dynamic SQL so I'm not going to put it in my answer.
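For readers who do want the dynamic SQL route, here is a rough sketch of the loop the question asks for. It assumes Master_Table and Value_Table are in the current database, that the two tables are linked on Value_ID as in the sample rows, and that Value is numeric; the cursor and variable names are invented for illustration. It only prints the statements until the EXEC line is uncommented.

```sql
-- Sketch only: assumes Master_Table and Value_Table are in the current
-- database, are linked on Value_ID, and that Value is numeric.
DECLARE @sql NVARCHAR(MAX);

DECLARE delete_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT N'DELETE FROM '
         + QUOTENAME(m.[Database]) + N'.'
         + QUOTENAME(m.[Schema]) + N'.'
         + QUOTENAME(m.Table_name)
         + N' WHERE ' + QUOTENAME(m.Common_col)
         + N' = ' + CONVERT(NVARCHAR(20), v.Value) + N';'
    FROM Master_Table m
    JOIN Value_Table v
      ON v.Value_ID = m.Value_ID;

OPEN delete_cursor;
FETCH NEXT FROM delete_cursor INTO @sql;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @sql;                      -- inspect the generated statement first
    -- EXEC sys.sp_executesql @sql;  -- uncomment to actually run it
    FETCH NEXT FROM delete_cursor INTO @sql;
END;

CLOSE delete_cursor;
DEALLOCATE delete_cursor;
```

QUOTENAME wraps each identifier in brackets, which guards against odd characters in the stored names. If Value were a string column, the value would need to be quoted or passed as a parameter instead of concatenated.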

Optimize SQL query Select on Select Case

I looked through some threads here that mention query optimization, but I couldn't resolve my problem.
I need to run a query in SQL Server that uses a select case in my primary select. This is the description of the main table:
WS:
| Oid | model_code | product_code | year |
In my query, I need to select all of these columns plus an extra column that indicates whether, by some criteria, the values from my main table exist in another table. Let me describe the other table first and then explain what I mean.
TA:
| Oid | model_code | product_code | year |
Both tables have matching columns, so for example, if on my table WS I have this result:
| Oid | model_code | product_code | year |
| 1 | 13 | 123 | 2018 |
And on my TA table I have this:
| Oid | model_code | product_code | year |
| 1 | 25 | 134 | 2016 |
| 2 | 13 | 123 | 2018 |
| 3 | 67 | 582 | 2017 |
I need to print an "Exist" result on that row because the row in my main table matches exactly on these three column values.
So my query on that row should print something like this:
| model_code | product_code | year | Exist |
| 13 | 123 | 2018 | Yes |
The query I was trying to use to make this happen, was this:
SELECT
WS.Oid, WS.model_code, WS.product_code, Ws.year,
(SELECT
CASE
WHEN EXISTS (SELECT 1 FROM TA
WHERE TA.model_code = Ws.model_code
AND TA.product_code = Ws.product_code
AND TA.[Year] = Ws.[Year])
THEN 'Yes'
ELSE 'No'
END) as 'Exist'
FROM
Ws
And it works. The problem is that my real tables have more columns and more rows (about 960,000). For example, a query returning around 50,000 rows takes more than a minute with this approach, while the same query without the select case takes about 2 seconds, so the difference is immense.
I'm sure a more viable way to achieve this in less time exists, but I don't know how. Any recommendations?
Unless already there, an index on ta (model_code, product_code, year) might help.
CREATE INDEX ta_model_code_product_code_year
ON ta (model_code,
product_code,
year);
Though chances are that the optimizer already rewrites your query in such a way, another thing you could try is to (explicitly) rewrite the query using a left join. I assume oid is NOT NULL in ta.
SELECT ws.oid,
ws.model_code,
ws.product_code,
ws.year,
CASE
WHEN ta.oid IS NULL THEN
'No'
ELSE
'Yes'
END exist
FROM ws
LEFT JOIN ta
ON ta.model_code = ws.model_code
AND ta.product_code = ws.product_code
AND ta.year = ws.year;
With that you want the index from above, and maybe try one on ws (model_code, product_code, year) too.
CREATE INDEX ws_model_code_product_code_year
ON ws (model_code,
product_code,
year);
You might also want to play with the order of the columns in the indexes. If for a column more distinct values exist in ta, put it before a column where fewer distinct values exist in ta. But keep the order in both indexes identical, i.e. if you shift a column in the index on ta also move it in the index on ws the same way.
What you want to do is join the two tables together, instead of looking for a matching record for each record. Try something like this:
SELECT
WS.model_code, WS.product_code, Ws.year,
CASE
WHEN TA.OID IS NOT NULL THEN 'Yes'
ELSE 'No'
END As 'Exist'
FROM WS LEFT OUTER JOIN TA ON
TA.model_code = Ws.model_code
AND TA.product_code = Ws.product_code
AND TA.[Year] = Ws.[Year]
That will print all of the records from the WS table, and if there's a matching record in the TA table, the 'Exist' column will say 'Yes', otherwise it will say 'No'.
This uses one query to do everything. Your original approach would do a completely separate sub-query to check the TA table, and that is creating your performance issue.
You may also want to look at putting indexes on these 3 fields in each table to make the matching go even faster.
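One caveat with the LEFT JOIN rewrites: if TA can hold several rows for the same (model_code, product_code, year), each WS row comes back once per match, whereas the EXISTS version returns it exactly once. A minimal sketch of one way around that, assuming those three columns are the only match criteria, is to join to a de-duplicated derived table:

```sql
SELECT ws.Oid,
       ws.model_code,
       ws.product_code,
       ws.[year],
       -- ta.model_code can only be NULL here when no TA row matched
       CASE WHEN ta.model_code IS NULL THEN 'No' ELSE 'Yes' END AS [Exist]
FROM WS ws
LEFT JOIN (
    -- one row per distinct key, so a WS row can match at most once
    SELECT DISTINCT model_code, product_code, [year]
    FROM TA
) ta
  ON  ta.model_code   = ws.model_code
  AND ta.product_code = ws.product_code
  AND ta.[year]       = ws.[year];
```

Whether this beats the plain join or the EXISTS form depends on the data and indexes, so it is worth comparing the execution plans.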

SQL Server delete on multiple foreign keyed tables - performance

I am trying to remove old data from a SQL Server database, given a list of ID's, but I'm trying to figure out how to get it to run faster. Currently deleting a list of 250 ID's takes around 1 hour. These ID's are attached to our 'root' objects, example below. Each of these has foreign key constraints.
Products
| productID | description | price |
+-----------------+-------------------+-------------+
| 1 | item 1 | 5.00 |
| 2 | item 2 | 5.00 |
| 3 | item 3 | 5.00 |
| ... | ... | ... |
Sales
| saleID | productID |
+-----------------+-------------------+
| 4 | 1 |
| 5 | 2 |
| 6 | 3 |
| ... | ... |
Taxes
| taxID | saleID |
+-----------------+-------------------+
| 7 | 4 |
| 8 | 5 |
| 9 | 6 |
| ... | ... |
Currently, we are just passing a list of product ID's and cascading through manually, such as
DECLARE @ProductIDsRemoval AS TABLE (id int);
INSERT INTO @ProductIDsRemoval VALUES (1);
-- Delete from the deepest child table first, then work up to the root
DELETE t
FROM dbo.Taxes t
INNER JOIN dbo.Sales s ON (s.saleID = t.saleID)
INNER JOIN @ProductIDsRemoval p ON (s.productID = p.id);
DELETE s
FROM dbo.Sales s
INNER JOIN @ProductIDsRemoval p ON (s.productID = p.id);
DELETE p
FROM dbo.Products p
INNER JOIN @ProductIDsRemoval p2 ON (p.productID = p2.id);
This works fine, however my issue is that my table structure has ~70 tables and at least a couple thousand rows in each to remove, if not a couple million. Currently, my query takes anywhere from 1 to 6 hours to run, depending on the number of base ID's we're removing (my structure doesn't actually use Products/Taxes/Sales, but it's a decent analogy, and the number we're aiming to remove is ~750 base ids, which we are estimating 3-5 hours for runtime)
I've seen other Stack Overflow answers saying to drop all constraints, add ON DELETE CASCADE, and then re-add the constraints, but this also takes quite a long time, as I would need to: 1. drop the constraints; 2. rebuild them with ON DELETE CASCADE; 3. run my query; 4. drop the constraints again; 5. re-add them without ON DELETE CASCADE.
I've also been looking at possibly just selecting everything I need into temp tables, truncating all of the other tables, and then re-inserting all of my values back and re-setting the indexes based on the last item I added, but again I would need to edit all foreign keys, which I would prefer to not do.

What is the best SQL query method to export data from related groups?

I have an old data table (in MS ACCESS, if you can believe it) that is supposed to be 'related products' from an older ecommerce store. I'm trying to salvage these related products for my new store.
The dataset with the following fields/data sample:
+---------+------------+-----------------+
| GroupID | ProductId | Sku |
+---------+------------+-----------------+
| 1001 | 12473 | C2S-44682-AMB |
| 1001 | 3628 | C-43604-1 |
+---------+------------+-----------------+
The "groupID" is the association -- productIds in the same group are related to each other. So these two products are related to each other because they both belong in GroupId 1001. There are some 3500 rows of data total.
What I need is to export these related products into a new table so that I can import into the new store and retain the related relationship. The new data needs a different formatted structure:
ParentId (the first product), ChildId (the second, related product)
So -- using my example from above:
12473, 3628 (the first product should display the second)
3628, 12473 (the second product should display the first)
I'm not sure how to author the correct SQL query to locate, loop through, and write these new records into a new DB.
I thought perhaps a "For/Each" loop, but in looking for references, I couldn't seem to locate the proper context (lots of PHP examples, but I'm not strong in PHP and really think there has to be a SQL method to do this). You can run aggregates on "having" clauses on SQL, but again, that didn't seem right to me either.
Any suggestions on how to proceed?
No need for a loop... just a self-join. Notice I added some records to the test data for a more in depth example.
declare @oldTable table (GroupID int, ProductId int, Sku varchar(64));
insert into @oldTable
values
(1001,12473,'C2S-44682-AMB'),
(1001,3628,'C-43604-1'),
(1001,4896,'C-43-558604-1'),
(1099,4458,'C-xxx-1'),
(1099,5217,'C-asbf3-1');
-- pair every product with every other product in the same group
select
t1.ProductId as parent
,t2.ProductId as Child
from
@oldTable t1
left join
@oldTable t2 on
t1.GroupID = t2.GroupID
and t1.ProductId <> t2.ProductId;
RETURNS
+--------+-------+
| parent | Child |
+--------+-------+
| 12473 | 3628 |
| 12473 | 4896 |
| 3628 | 12473 |
| 3628 | 4896 |
| 4896 | 12473 |
| 4896 | 3628 |
| 4458 | 5217 |
| 5217 | 4458 |
+--------+-------+
You need a self-join where you match all products to all other products in the group, but exclude the initial product. I believe this will do the trick for you.
Just replace dbo.Product with your actual table name.
select
ProductID = P.ProductID,
RelatedProductID = R.ProductID
from dbo.Product P
join dbo.Product R on P.GroupID = R.GroupID and P.ProductID != R.ProductID
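To actually write the pairs into a new table for the import, something along these lines should work. It is only a sketch: dbo.RelatedProducts is a made-up target name, and ParentId/ChildId are the column names the question asks for.

```sql
-- SELECT ... INTO creates the target table from the result set.
-- dbo.RelatedProducts is a hypothetical name; it must not exist yet.
SELECT P.ProductID AS ParentId,
       R.ProductID AS ChildId
INTO dbo.RelatedProducts
FROM dbo.Product P
JOIN dbo.Product R
  ON  P.GroupID = R.GroupID
  AND P.ProductID <> R.ProductID;
```

If the target table already exists, an INSERT INTO ... SELECT with the same join would be the equivalent.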

SQL - Query Multiple Dissimilar Tables (not UNION)

Language: T-SQL
Server: SQL Server 2008 R2 - SQL Server 2014
I have what, based on searching here and elsewhere, appears to be either a unique problem or one I can't properly put into words. I'd like to query across multiple dissimilar tables that have dissimilar field structures and JOIN them to a single other table. We have a table of ASSETS tb_assets and a table of LICENSES tb_licenses. I'd like to query across both of these and JOIN them to the table of VENDORS tb_vendors.
Like this:
----------------------                  ---------------------------
| TB_ASSETS          |                  | TB_LICENSES             |
----------------------                  ---------------------------
| f_assetvendor      | <~~~        ~~~> | f_licensevendor         |
| f_assettag         |    |        |    | f_licensename           |
| f_assetname        |    |        |    | f_licenseexpirationdate |
|                    |    |        |    | f_licensequantity       |
----------------------    |        |    ---------------------------
                          |        |
                ~~~~~~~~~~~        ~~~~~~~~~~~
                |   ----------------------   |
                |   | TB_VENDORS         |   |
                |   ----------------------   |
                ~~> | f_vendorGUID       | <~~
                    | f_vendorname       |
                    ----------------------
For a short example, I want to search for a vendor name (f_vendorname) of Amazon, I'd like to query against tb_assets as well as against tb_licenses. The query I tried below errors with Invalid column name 'f_assetvendor', so I'm doing something wrong.
SELECT
f_assetvendor AS 'AssetVendor', f_licensevendor as 'LicenseVendor'
FROM
tb_assets, tb_licenses
LEFT JOIN
tb_vendors assven ON assven.f_vendorGUID = f_assetvendor
LEFT JOIN
tb_vendors licven ON licven.f_vendorGUID = f_licensevendor
WHERE
f_vendorname LIKE '%Amazon%'
Regarding my title stating "not UNION", I can't use a UNION here because with UNION column names for the final result set are taken from the first query, the columns must have the same data types, and both tables must have the same number of columns.
Give this a go;
SELECT
v.f_vendorGUID,
v.f_vendorname,
a.f_assetvendor AssetVendor,
l.f_licensevendor LicenseVendor
FROM
TB_VENDORS v
JOIN
TB_ASSETS a ON v.f_vendorGUID = a.f_assetvendor
JOIN
TB_LICENSES l ON v.f_vendorGUID = l.f_licensevendor
WHERE
v.f_vendorname LIKE '%Amazon%'
You can use TB_VENDORS as the main table and join the other two tables to it; in this instance (inner join) there's no particular order they should be in. You've shown in your diagram that there's a join between these tables. If there's a chance of missing data in either TB_ASSETS or TB_LICENSES, use LEFT JOIN instead of JOIN.
Please get out of the habit of using that old-style comma join in your FROM clause; it's a really old way of doing it.
I know you said no to a union statement but I think this gives you what you need unless I am misunderstanding your query.
Select [t].[Vendor]
, [t].[VendorType]
From ( Select [assven].[f_vendorname] As 'Vendor'
, 'Asset' As 'VendorType'
From [tb_assets]
Left Join [tb_vendors] [assven]
On [assven].[f_vendorGUID] = [f_assetvendor]
Union All
Select [licven].[f_vendorname] As 'Vendor'
, 'License' As 'VendorType'
From [tb_licenses]
Left Join [tb_vendors] [licven]
On [licven].[f_vendorGUID] = [f_licensevendor]
) [t]
Where [t].[Vendor] Like '%Amazon%';
However you get there, you will need to make your tables similar enough to report together.
