TSQL to Eliminate Repetitive Query - sql-server

I have a pretty basic table schema.
Table A
TEMPLATE_ID TEMPLATE_NAME
Table A has the following rows
1 Procs
2 Letter
3 Retire
4 Anniversary
5 Greet
6 Event
7 Meeting
8... etc.
Table B
TEMPLATE_ID VALUE
Table B has 100K+ rows with TEMPLATE_ID connecting the two tables.
Now the execs want a sample of 20 records for each of types 1-5 from Table A. I could do something basic... which is about my speed when it comes to TSQL.
SELECT TOP(20) B.VALUE FROM TableB B
JOIN TableA A ON
B.TEMPLATE_ID = A.TEMPLATE_ID
AND A.TEMPLATE_NAME IN ('Procs', 'Letter'...)
But that isn't quite right, as I end up with only 20 rows in total; I was expecting 100 rows, 20 for each type.
Is this one of those areas where partitioning could be used? I can see how I would break TableB into partitions for each template (TableA), but I'm not sure how I would limit each partition to 20 rows.
OK, so I could just cut and paste 20 rows from each partition into Excel... I could also write 5 very basic queries... but this is more of an academic, improve-my-knowledge pursuit.
So to clarify: 20 records from each of the first 5 template types.
TIA

You can use ROW_NUMBER, partition the data by template name, and return only 20 rows from each partition:
SELECT * FROM
(
SELECT B.VALUE, A.TEMPLATE_NAME,
ROW_NUMBER() OVER ( PARTITION BY A.TEMPLATE_NAME ORDER BY ( SELECT NULL)) as seq -- arbitrary order within each template
FROM
TableB B
JOIN TableA A ON
B.TEMPLATE_ID = A.TEMPLATE_ID
WHERE A.TEMPLATE_ID <= 5 -- template types 1-5
) T
where T.seq <=20
order by T.TEMPLATE_NAME, T.VALUE
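To try the ROW_NUMBER approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later supports window functions). It assumes synthetic data: TableA holds the template names from the question and TableB gets 50 made-up rows per template. SQLite has no TOP or NEWID(), and the partition is ordered by VALUE only to keep the result deterministic.

```python
import sqlite3


def top_n_per_template(conn, names, n):
    """Return up to n (TEMPLATE_NAME, VALUE) rows for each template in names."""
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT TEMPLATE_NAME, VALUE
        FROM (
            SELECT A.TEMPLATE_NAME, B.VALUE,
                   -- number the rows 1, 2, 3, ... separately inside each template
                   ROW_NUMBER() OVER (PARTITION BY A.TEMPLATE_ID ORDER BY B.VALUE) AS seq
            FROM TableB AS B
            JOIN TableA AS A ON B.TEMPLATE_ID = A.TEMPLATE_ID
            WHERE A.TEMPLATE_NAME IN ({placeholders})
        ) AS T
        WHERE T.seq <= ?   -- keep the first n rows of every partition
        ORDER BY TEMPLATE_NAME, VALUE
    """
    return conn.execute(sql, (*names, n)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (TEMPLATE_ID INTEGER PRIMARY KEY, TEMPLATE_NAME TEXT)")
conn.execute("CREATE TABLE TableB (TEMPLATE_ID INTEGER, VALUE INTEGER)")
names = ["Procs", "Letter", "Retire", "Anniversary", "Greet", "Event", "Meeting"]
conn.executemany("INSERT INTO TableA VALUES (?, ?)", list(enumerate(names, start=1)))
# Synthetic data: 50 rows per template
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [(t, t * 1000 + i) for t in range(1, 8) for i in range(50)],
)

rows = top_n_per_template(conn, names[:5], 20)
print(len(rows))  # 100: 20 rows for each of the 5 templates
```

The key point is that the `seq <= 20` filter is applied outside the subquery, after ROW_NUMBER has restarted its count for every template.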

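Could you try this? ROW_NUMBER caps each template at exactly 20 rows; DENSE_RANK gives duplicate values the same rank, so a template with repeated values could return more than 20.
SELECT B.VALUE
FROM
(
SELECT TEMPLATE_ID, VALUE, ROW_NUMBER() OVER (PARTITION BY TEMPLATE_ID ORDER BY VALUE DESC) AS RANK_NO
FROM TableB
) B INNER JOIN TableA A ON (A.TEMPLATE_ID = B.TEMPLATE_ID)
WHERE A.TEMPLATE_NAME IN ('Procs', 'Letter'...)
AND B.RANK_NO <= 20
;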
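The difference between ROW_NUMBER and DENSE_RANK only shows up when values repeat. This small sqlite3 sketch (synthetic data: one template whose 30 rows contain only three distinct values) counts how many rows survive a `<= 20` filter under each function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableB (TEMPLATE_ID INTEGER, VALUE INTEGER)")
# Synthetic data: template 1 has 30 rows but only three distinct values (1, 2, 3)
conn.executemany(
    "INSERT INTO TableB VALUES (1, ?)",
    [(v,) for v in (1, 2, 3) for _ in range(10)],
)


def count_kept(rank_fn, n=20):
    """Count rows whose rank under rank_fn ('ROW_NUMBER' or 'DENSE_RANK') is <= n."""
    sql = f"""
        SELECT COUNT(*)
        FROM (
            SELECT {rank_fn}() OVER (PARTITION BY TEMPLATE_ID ORDER BY VALUE DESC) AS RANK_NO
            FROM TableB
        )
        WHERE RANK_NO <= ?
    """
    return conn.execute(sql, (n,)).fetchone()[0]


print(count_kept("ROW_NUMBER"))  # 20
print(count_kept("DENSE_RANK"))  # 30: tied values share ranks 1, 2 and 3
```

ROW_NUMBER keeps 20 rows; DENSE_RANK keeps all 30, because every row has a rank of 1, 2 or 3.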

You use a ranking function. You first partition your data, order each partition and apply the ranking function:
select seq = row_number() over (
partition by table_catalog , table_schema , table_name
order by column_name
) ,
*
from information_schema.COLUMNS
The above code partitions the rows in information_schema.COLUMNS by the fully qualified name of the table or view they belong to. Each partition is then ordered alphabetically by column name and numbered with row_number().
That query then gets wrapped in another select that makes use of the numbering. This code pulls the first 3 columns (alphabetically by column name) of each table in the system and provides some information about them:
select t.table_catalog ,
t.table_schema ,
t.table_name ,
t.table_type ,
c.seq ,
c.ordinal_position ,
c.COLUMN_NAME ,
data_type = c.data_type + coalesce('('+convert(varchar(10),c.character_maximum_length)+')','')
+ case c.is_nullable when 'YES' then ' null' else ' not null' end
from information_schema.tables t
join ( select seq = row_number() over (
partition by table_catalog , table_schema , table_name
order by column_name
) ,
*
from information_schema.COLUMNS
) c on c.table_catalog = t.table_catalog
and c.table_schema = t.table_schema
and c.table_name = t.table_name
where c.seq <= 3
order by t.table_catalog ,
t.table_schema ,
t.table_name ,
c.seq

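Ordering by NEWID() inside each partition picks a random 20 rows per template, which suits a "sample":
SELECT * FROM
( SELECT B.VALUE, A.TEMPLATE_NAME,
ROW_NUMBER() OVER ( PARTITION BY A.TEMPLATE_ID ORDER BY NEWID() ) as rn -- random order within each template
FROM TableB B
JOIN TableA A
ON A.TEMPLATE_ID = B.TEMPLATE_ID
AND A.TEMPLATE_ID <= 5
) T
where T.rn <= 20
order by T.TEMPLATE_NAME, T.VALUE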
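The same random per-template sample can be sketched in SQLite, where RANDOM() plays the role of NEWID() as the sort key inside each partition. The data is synthetic (50 rows per template, with VALUE encoded as template * 1000 + i so each row's template can be checked), and `TEMPLATE_ID <= 5` stands in for types 1-5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableB (TEMPLATE_ID INTEGER, VALUE INTEGER)")
# Synthetic data: 50 rows for each of templates 1-7, VALUE = template * 1000 + i
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [(t, t * 1000 + i) for t in range(1, 8) for i in range(50)],
)

sample = conn.execute("""
    SELECT TEMPLATE_ID, VALUE
    FROM (
        SELECT TEMPLATE_ID, VALUE,
               -- RANDOM() shuffles each template's rows, like NEWID() in T-SQL
               ROW_NUMBER() OVER (PARTITION BY TEMPLATE_ID ORDER BY RANDOM()) AS rn
        FROM TableB
        WHERE TEMPLATE_ID <= 5
    )
    WHERE rn <= 20
    ORDER BY TEMPLATE_ID, VALUE
""").fetchall()

print(len(sample))  # 100 rows, a different selection on each run
```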

Related

When joining tables adapt the "on" statement depending on the query results

I have 2 tables:
Table_1 with columns col_A, col_B , col_C , col_D , col_E
Table_2 with columns col_A, col_B , col_C , col_D , col_F
I would like to join them on columns col_A, col_B , col_C , col_D.
For the rows in Table_1 that do not get joined this way (as they don't have a match in Table_2), I would like to join them only on columns col_A, col_B , col_C.
If there are still rows in Table_1 that did not get joined, I would like to join them only on columns col_A, col_B.
And once that is done, if there are still rows in Table_1 that did not get joined, I would like to join them only on column col_A.
I wrote the following script, which uses a new table to get this result.
Is there a more efficient way to do this, preferably by creating a view rather than a table?
create table new_table (col_A nvarchar(50) , col_B nvarchar(50) , col_C nvarchar(50)
, col_D nvarchar(50) , col_E nvarchar(50) , col_F nvarchar(50) )
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
and Table_1.col_C=Table_2.col_C
and Table_1.col_D=Table_2.col_D
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
and Table_1.col_C=Table_2.col_C
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
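You could join them on col_A alone and then compute a number that records which of the other columns also match (below, t1 and t2 abbreviate Table_1 and Table_2, and A to F abbreviate col_A to col_F):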
WITH cte AS(
SELECT
CASE WHEN t1.D = t2.D THEN 100 ELSE 0 END +
CASE WHEN t1.C = t2.C THEN 10 ELSE 0 END +
CASE WHEN t1.B = t2.B THEN 1 ELSE 0 END as whatMatched,
t1.*, t2.F -- t1.*, t2.* would repeat A, B, C and D, which a CTE does not allow
FROM
t1 JOIN t2 on t1.A = t2.A
)
Now if a row got 111 we know that all of A, B, C and D matched; if it got 0, only A matched, and so on.
So we can keep only the rows whose matches form one of the allowed combinations (ABCD, ABC, AB or A):
SELECT * FROM cte WHERE whatmatched IN (111,11,1,0)
And lastly, if there were multiple matches (matching on just A might produce duplicates), we can number each t1 row's candidates in descending order of whatMatched and take only the first one:
SELECT x.* FROM
(SELECT *, ROW_NUMBER() OVER(PARTITION BY A, B, C, D, E ORDER BY whatmatched DESC) rown -- one partition per t1 row
FROM cte WHERE whatmatched IN (111,11,1,0)) x
WHERE x.rown = 1
If it suits you better to use letters, we can record the matches as a string, keep only A, AB, ABC or ABCD, and then pick the most specific one by looking at the LENgth of the match string:
WITH cte AS(
SELECT
'A' +
CASE WHEN t1.B = t2.B THEN 'B' ELSE '' END +
CASE WHEN t1.C = t2.C THEN 'C' ELSE '' END +
CASE WHEN t1.D = t2.D THEN 'D' ELSE '' END as whatMatched,
t1.*, t2.F
FROM
t1 JOIN t2 on t1.A = t2.A
)
SELECT x.* FROM
(SELECT *, ROW_NUMBER() OVER(PARTITION BY A, B, C, D, E ORDER BY LEN(whatmatched) DESC) rown
FROM cte WHERE whatmatched IN ('A','AB','ABC','ABCD')) x
WHERE x.rown = 1
If you want ties (i.e. a row from t1 that matches two rows from t2 equally well, because their A/B/C are the same and only D differs), use DENSE_RANK instead of ROW_NUMBER so they end up tied for first place.
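A runnable sketch of the score-and-rank idea, using sqlite3 with tiny hypothetical t1/t2 tables (columns A to E and A to F, as in the answer). It partitions by all of t1's columns, which assumes t1 has no fully duplicated rows; with a real key you would partition by that instead. The data is chosen so that each t1 row has exactly one best candidate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (A TEXT, B TEXT, C TEXT, D TEXT, E TEXT)")
conn.execute("CREATE TABLE t2 (A TEXT, B TEXT, C TEXT, D TEXT, F TEXT)")
# Hypothetical rows; comments give each t1 row's best available match
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", [
    ("a", "b", "c", "d", "row1"),  # ABCD with F1
    ("a", "q", "x", "x", "row2"),  # AB with F3
    ("z", "z", "z", "z", "row3"),  # A only with F4
    ("a", "b", "k", "w", "row4"),  # ABC with F2
])
conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?, ?)", [
    ("a", "b", "c", "d", "F1"),
    ("a", "b", "k", "q", "F2"),
    ("a", "q", "q", "q", "F3"),
    ("z", "q", "q", "q", "F4"),
])

best = conn.execute("""
    WITH cte AS (
        SELECT t1.*, t2.F,
               CASE WHEN t1.D = t2.D THEN 100 ELSE 0 END
             + CASE WHEN t1.C = t2.C THEN 10 ELSE 0 END
             + CASE WHEN t1.B = t2.B THEN 1 ELSE 0 END AS whatMatched
        FROM t1 JOIN t2 ON t1.A = t2.A
    )
    SELECT E, F, whatMatched
    FROM (
        SELECT *,
               -- one partition per t1 row, best score first
               ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E
                                  ORDER BY whatMatched DESC) AS rown
        FROM cte
        WHERE whatMatched IN (111, 11, 1, 0)  -- ABCD, ABC, AB, A
    )
    WHERE rown = 1
    ORDER BY E
""").fetchall()

for row in best:
    print(row)
```

Because the whole thing is a single query, the CTE form could also be saved as a view, which is what the question asked for.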

Faster execution of non nulls for a column

I need to get the percentage of nulls for a given column in a table. The table currently contains close to 368,081,344 records, and the number of records grows by about 20 million each day. Below is the query I am using:
SELECT 100.0 * COUNT_BIG(<column>) / COUNT_BIG(*)
from <table>
I then subtract the above output from 100 to get the required percentage of nulls.
Please let me know the best possible solution that can yield a faster result.
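Have you tried the method below? It builds a small test table variable with 100 NULL and 100 non-NULL values, then computes the total, non-null and null counts and the null percentage in a single pass:
DECLARE @T TABLE
(
Id INT
)
-- Recursive CTE: SeqNo runs from 1 to 100, Val is always NULL
;WITH CTE
AS
(
SELECT
SeqNo = 1,
NULL "Val"
UNION ALL
SELECT
SeqNo = SeqNo+1,
Val
FROM CTE
WHERE SeqNo<100
)
-- 100 NULL rows followed by 100 non-NULL rows
INSERT INTO @T(Id)
SELECT Val FROM CTE
UNION ALL
SELECT SeqNo FROM CTE
-- One scan: COUNT(1) counts every row, the CASE sums separate NULLs from non-NULLs
SELECT
TotCount = COUNT(1),
ValCount = SUM(CASE WHEN Id IS NULL THEN 0 ELSE 1 END),
NullCount = SUM(CASE WHEN Id IS NOT NULL THEN 0 ELSE 1 END),
NullPercent = (CAST(SUM(CASE WHEN Id IS NOT NULL THEN 0 ELSE 1 END) AS FLOAT)/CAST(COUNT(1) AS FLOAT))*100
FROM @T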
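A small sqlite3 sketch of the single-pass null count, on a hypothetical four-row table. COUNT(col) skips NULLs while COUNT(*) counts every row, so one scan gives both numbers. The sketch also shows why the ratio needs a non-integer operand: dividing two integer counts truncates, and SQL Server behaves the same way with the bigint results of COUNT_BIG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
# Hypothetical data: 3 NULLs and 1 non-NULL value, i.e. 75% NULL
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (None,), (7,)])

# Integer division truncates: 1 / 4 -> 0, so this reports 0%.
truncated = conn.execute("SELECT (COUNT(col) / COUNT(*)) * 100 FROM t").fetchone()[0]

# Multiplying by 100.0 first makes the division floating point.
null_pct = conn.execute(
    "SELECT 100.0 * (COUNT(*) - COUNT(col)) / COUNT(*) FROM t"
).fetchone()[0]

print(truncated, null_pct)  # 0 75.0
```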
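Partial answer only; I'm not sure how to get the count for a specific column.
You can speed up the total row count by reading it from the catalog views instead of scanning the table (note that sys.partitions.rows is documented as an approximate count):
SELECT SUM(P.ROWS) AS TotalRows -- summed, so partitioned tables are counted fully
FROM SYS.OBJECTS AS O INNER JOIN SYS.PARTITIONS AS P
ON O.OBJECT_ID = P.OBJECT_ID
WHERE O.NAME = 'PARENT' AND
P.INDEX_ID < 2 -- heap (0) or clustered index (1)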

Updating multiple row with random data from another table?

Combining some examples, I came up with the following query (field and table names have been anonymised, so I hope I didn't introduce typos).
UPDATE destinationTable
SET destinationField = t2.value
FROM destinationTable t1
CROSS APPLY (
SELECT TOP 1 'SomeRequiredPrefix ' + sourceField as value
FROM #sourceTable
WHERE sourceField <> ''
ORDER BY NEWID()
) t2
Problem
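Currently, all records get the same value in destinationField; the value needs to be random and different for each row. I'm probably missing something here.
Here's a possible solution. The CROSS APPLY subquery in the question does not reference t1, so SQL Server can evaluate it once and reuse that single row for every destination row. Instead, use CTEs to assign row numbers to both tables in random order, join the tables on that row number, and update the rows accordingly.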
;WITH
dt AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNum
FROM dbo.destinationtable),
st AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNum
FROM #sourcetable)
UPDATE dt
SET dt.destinationfield = 'SomeRequiredPrefix ' + st.sourcefield
FROM dt
JOIN st ON dt.RowNum = st.RowNum
UPDATED SOLUTION
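I used a CROSS JOIN to get all combinations, since you have fewer rows in the source table than in the destination table. Then I assign random row numbers and take only 1 row for each destination row. This partitions by destinationfield, so it assumes that field is unique per row; if it is not, partition by the destination table's primary key instead.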
;WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY destinationfield ORDER BY NEWID()) AS Rownum
FROM destinationtable
CROSS JOIN #sourcetable
WHERE sourcefield <> ''
)
UPDATE cte
SET cte.destinationfield = 'SomeRequiredPrefix ' + sourcefield
WHERE cte.Rownum = 1
SELECT * FROM dbo.destinationtable
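A sketch of the CROSS JOIN + ROW_NUMBER idea in SQLite. It partitions by a hypothetical `id` primary key on the destination table rather than by the field being updated, so rows that share a field value still each get their own pick. RANDOM() stands in for NEWID(), and to stay compatible with older SQLite versions the sketch selects the picks first and applies them with plain UPDATE statements instead of updating through a CTE. Table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: destination has an id key, source has fewer rows
conn.execute("CREATE TABLE destination (id INTEGER PRIMARY KEY, destination_field TEXT)")
conn.execute("CREATE TABLE source (source_field TEXT)")
conn.executemany("INSERT INTO destination (id) VALUES (?)", [(i,) for i in range(1, 51)])
conn.executemany(
    "INSERT INTO source VALUES (?)",
    [("",), ("alpha",), ("beta",), ("gamma",), ("delta",), ("epsilon",)],
)

# Every destination row is paired with every non-empty source value, the pairs
# are shuffled per destination id, and the first pair of each id is kept.
picks = conn.execute("""
    SELECT id, 'SomeRequiredPrefix ' || source_field
    FROM (
        SELECT d.id, s.source_field,
               ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY RANDOM()) AS rn
        FROM destination AS d
        CROSS JOIN source AS s
        WHERE s.source_field <> ''
    )
    WHERE rn = 1
""").fetchall()

conn.executemany(
    "UPDATE destination SET destination_field = ? WHERE id = ?",
    [(value, row_id) for row_id, value in picks],
)
conn.commit()
```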

SQL Server 2012, exclude column

I have this SQL query:
; with cte as
(
SELECT
PARSENAME(REPLACE(replace(replace(replace(replace(dbo.IDENTITY_MAP.Name, 'My Company\', ''), '-VLAN2', ''), '.VLAN2\', ''), '.Instr\', '') , '\' , '.'), 1) as "Site",
Count (CASE
WHEN dbo.SEM_AGENT.AGENT_VERSION LIKE '11.%' THEN 1
END) AS 'SEP-11',
Count (CASE
WHEN dbo.SEM_AGENT.AGENT_VERSION LIKE '12.%' THEN 1
END) AS 'SEP-12'
FROM
dbo.sem_computer
INNER JOIN
[dbo].[V_SEM_COMPUTER] ON [dbo].[V_SEM_COMPUTER].COMPUTER_ID = SEM_COMPUTER.COMPUTER_ID
WHERE
dbo.IDENTITY_MAP.Name NOT LIKE '%Servers%'
GROUP BY
PARSENAME(REPLACE(replace(replace(replace(replace(dbo.IDENTITY_MAP.Name,'My Company\',''),'-VLAN2',''),'.VLAN2\',''),'.Instr\','') , '\' , '.'),1)
)
select *
from cte
join SEPM_site ss on cte.Site = ss.Site
That gives almost the output I am looking for, i.e.
Site SEP-11 SEP-12 Rackcode Circuit Site
I only need one column for Site.
I tried creating a temporary table with the columns, dropping the duplicate column, and then dropping the table, i.e.
; with cte as (SELECT ...)
select * into temptable
from cte
join SEPM_site ss
on cte.Site = ss.Site
alter table temptable
drop column cte.Site
select * from temptable
drop table temptable
But I get the error
Incorrect syntax near '.'
And if I don't specify which table Site comes from, I get the error
Column names in each table must be unique. Column name 'Site' in table 'temptable' is specified more than once.
But that's exactly why I am trying to remove the duplicate column!
Thanks!
Just specify the columns you want in your select statement:
select cte.Site, cte.[SEP-11], cte.[SEP-12], ss.Rackcode, ss.Circuit
from cte
join SEPM_site ss
on cte.Site = ss.Site
You can also select all columns from cte and just the ones you want from ss:
select cte.*, ss.Rackcode, ss.Circuit
from cte
join SEPM_site ss
on cte.Site = ss.Site
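The duplicate Site column comes from SELECT * expanding both sides of the join. A quick sqlite3 check, using a hypothetical one-row stand-in for cte and SEPM_site, compares the result column names of SELECT * with those of an explicit column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-row stand-in for SEPM_site
conn.execute("CREATE TABLE SEPM_site (Site TEXT, Rackcode TEXT, Circuit TEXT)")
conn.execute("INSERT INTO SEPM_site VALUES ('HQ', 'R1', 'C1')")

# Hypothetical one-row stand-in for the question's cte
CTE = """WITH cte AS (SELECT 'HQ' AS Site, 3 AS "SEP-11", 5 AS "SEP-12") """


def column_names(select_sql):
    """Run the query and return its result column names."""
    return [d[0] for d in conn.execute(CTE + select_sql).description]


star = column_names("SELECT * FROM cte JOIN SEPM_site ss ON cte.Site = ss.Site")
explicit = column_names(
    "SELECT cte.*, ss.Rackcode, ss.Circuit FROM cte JOIN SEPM_site ss ON cte.Site = ss.Site"
)
print(star)      # Site appears twice
print(explicit)  # Site appears once
```

Naming the columns up front avoids the temp table entirely, which is also why SELECT INTO failed: it cannot create a table with two columns both called Site.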

SQL Server Full Text Search - Weighting Certain Columns Over Others

If I have the following full text search query:
SELECT *
FROM dbo.Product
INNER JOIN CONTAINSTABLE(Product, (Name, Description, ProductType), 'model') ct
ON ct.[Key] = Product.ProductID
Is it possible to weigh the columns that are being searched?
For example, I care more about the word model appearing in the Name column than in the Description or ProductType columns.
Of course, if the word is in all 3 columns then I would expect it to rank higher than if it were just in the Name column. Is there any way to have a row rank higher if the word appears only in Name vs. only in Description/ProductType?
You can do something like the following query. Here, WeightedRank is a weighted sum: each column's match rank is multiplied by that column's weight (100 for Name, 10 for Description, 1 for ProductType) and the results are added together. NOTE: unfortunately I don't have Northwind installed, so I couldn't test this; look at it more like pseudocode and let me know if it doesn't work.
declare @searchTerm varchar(50) = 'model';
-- Weighted sum of the per-column ranks; a column with no match contributes 0
SELECT 100 * coalesce(ct1.RANK, 0) +
10 * coalesce(ct2.RANK, 0) +
1 * coalesce(ct3.RANK, 0) as WeightedRank,
*
FROM dbo.Product
LEFT JOIN
CONTAINSTABLE(Product, Name, @searchTerm) ct1 ON ct1.[Key] = Product.ProductID
LEFT JOIN
CONTAINSTABLE(Product, Description, @searchTerm) ct2 ON ct2.[Key] = Product.ProductID
LEFT JOIN
CONTAINSTABLE(Product, ProductType, @searchTerm) ct3 ON ct3.[Key] = Product.ProductID
-- Keep only products that matched in at least one column
WHERE ct1.[Key] IS NOT NULL OR ct2.[Key] IS NOT NULL OR ct3.[Key] IS NOT NULL
order by WeightedRank desc
Listing 3-25, "Sample Column Rank-Multiplier Search", from Pro Full-Text Search in SQL Server 2008:
SELECT *
FROM (
SELECT Commentary_ID
,SUM([Rank]) AS Rank
FROM (
SELECT bc.Commentary_ID
,c.[RANK] * 10 AS [Rank]
FROM FREETEXTTABLE(dbo.Contributor_Birth_Place, *, N'England') c
INNER JOIN dbo.Contributor_Book cb ON c.[KEY] = cb.Contributor_ID
INNER JOIN dbo.Book_Commentary bc ON cb.Book_ID = bc.Book_ID
UNION ALL
SELECT c.[KEY]
,c.[RANK] * 5
FROM FREETEXTTABLE(dbo.Commentary, Commentary, N'England') c
UNION ALL
SELECT ac.[KEY]
,ac.[RANK]
FROM FREETEXTTABLE(dbo.Commentary, Article_Content, N'England') ac
) s
GROUP BY Commentary_ID
) s1
INNER JOIN dbo.Commentary c1 ON c1.Commentary_ID = s1.Commentary_ID
ORDER BY [Rank] DESC;
This is similar to Henry's solution, but simplified, tested, and using the details the question provided.
NB: I ran performance tests on both the UNION and the LEFT JOIN styles and found that the UNION style below required far fewer logical reads with my datasets; YMMV.
declare @searchTerm varchar(50) = 'model';
declare @nameWeight int = 100;
declare @descriptionWeight int = 10;
declare @productTypeWeight int = 1;
SELECT ranksGroupedByProductID.*, outerProduct.*
FROM (SELECT [key],
Sum([rank]) AS WeightedRank
FROM (
-- Each column that needs to be weighted separately
-- should be added here and unioned with the other queries
SELECT [key],
[rank] * @nameWeight as [rank]
FROM Containstable(dbo.Product, [Name], @searchTerm)
UNION ALL
SELECT [key],
[rank] * @descriptionWeight as [rank]
FROM Containstable(dbo.Product, [Description], @searchTerm)
UNION ALL
SELECT [key],
[rank] * @productTypeWeight as [rank]
FROM Containstable(dbo.Product, [ProductType], @searchTerm)
) innerSearch
-- Grouping by key allows us to sum each ProductID's ranks for all the columns
GROUP BY [key]) ranksGroupedByProductID
-- This join is just to get the full Product table columns
-- and is optional if you only need the ordered ProductIDs
INNER JOIN dbo.Product outerProduct
ON outerProduct.ProductID = ranksGroupedByProductID.[key]
ORDER BY WeightedRank DESC;
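SQLite has no CONTAINSTABLE, so this sketch replaces each CONTAINSTABLE call with a hypothetical table of (key, rank) hits for one column and keeps the rest of the UNION ALL / GROUP BY shape, with the same 100/10/1 weights. The rank values are made up; it only illustrates how the weighted sums order the products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [(1, "name match"), (2, "description match"), (3, "all three")],
)
# Hypothetical stand-ins for CONTAINSTABLE output: one (key, rank) table per column
for table in ("name_hits", "description_hits", "producttype_hits"):
    conn.execute(f"CREATE TABLE {table} ([key] INTEGER, [rank] INTEGER)")
conn.executemany("INSERT INTO name_hits VALUES (?, ?)", [(1, 50), (3, 50)])
conn.executemany("INSERT INTO description_hits VALUES (?, ?)", [(2, 200), (3, 200)])
conn.executemany("INSERT INTO producttype_hits VALUES (?, ?)", [(3, 100)])

weights = {"name": 100, "description": 10, "producttype": 1}
ranked = conn.execute("""
    SELECT p.ProductID, r.WeightedRank
    FROM (
        -- sum the weighted per-column ranks of each product
        SELECT [key], SUM([rank]) AS WeightedRank
        FROM (
            SELECT [key], [rank] * :name AS [rank] FROM name_hits
            UNION ALL
            SELECT [key], [rank] * :description FROM description_hits
            UNION ALL
            SELECT [key], [rank] * :producttype FROM producttype_hits
        )
        GROUP BY [key]
    ) AS r
    JOIN Product AS p ON p.ProductID = r.[key]
    ORDER BY r.WeightedRank DESC
""", weights).fetchall()

print(ranked)
```

Product 3 matches in all three columns and ranks first; product 1 (Name only) outranks product 2 (Description only) even though its raw rank is lower, because the Name weight is ten times the Description weight.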
