tuning in sql server - views

tuning in sql server - views - sql-server

I created a view in sql server 2012, such as:
create myview as
select mytable2.name
from mytable1 t1
join myTable2 t2
on t1.id = t2.id
I want that join table1 and table2 will be with correct index (id), but when I do:
select * from myview
where name = 'abcd'
I want that the last select will be with index of column 'name'.
What is the correct syntax in sql server with hints (tuning), that do the best run, as I have described?
I want to force using of index for join purpose only (the column = id), and forcing index name when doing:
select name from myview
where name = 'abcd'.
Something like
create myview as
select mytable2.name
/* index hint name on column name */
from mytable1 t1
join myTable2 t2
/* index hint name on column id - just for join */
on t1.id = t2.id
I don't want to force end-user that uses the view add hint when doing the view - just bring him the view as his with proper index hints.
(or, if it is not possible - how can I do that).
Need samples, please.
Thanks :)

I reckon creating an Index on the Name column would use the index, when selecting from view with the above shown where clause, you dont have to explicitly give any query hints to make that view use the index.
Index should be something like...
Index
CREATE NONCLUSTERED INDEX [IX_MyTable1_Name]
ON [dbo].[myTable2] ([CompanyName] ASC)
GO
View Definition
CREATE VIEW myview
AS
SELECT t2.name --<-- Use alias here since you have alised your table in from clause
FROM mytable1 t1
INNER JOIN myTable2 t2 ON t1.id = t2.id

Indexes in SqlServer are built from two sets of columns.
Create index IX on table B (Filter Columns,Sorting Columns) INCLUDE (Additional columns to be included).
And when selecting from views, the optimizer will incorporate indexes on the referenced tables.
The first set is the indexing table itself. Best practice is to place the columns by which you filter first, and then the columns by which you sort.
The second set (Include), are additional columns you add to the indexing table, so all the data you require is in the index (to prevent key look ups - dpending on your table design).
In your case, the order will be
1) Go to MyTable2 by name, and get all of the matching ID's.
2) With the Id's from step 1, find the matching ID's in Mytable1
Your indexes should be :
1) An index on Table2(Name,ID) or Table2(Name)Include(ID)
2) An index on Table1(ID)
There shouldn't be any hint used in this case.
And in general, you should avoid using hints.

Related

Database Index when SQL statement includes "IN" clause

I have SQL statement which takes really a lot of time to execute and I really had to improve it somehow.
select * from table where ID=1 and GROUP in
(select group from groupteam where
department= 'marketing' )
My question is if I should create index on columns ID and GROUP would it help?
Or if not should I create index on second table on column DEPARTMENT?
Or I should create two indexes for both tables?
First table has 249003.
Second table has in total 900 rows while query in that table returns only 2 rows.
That is why I am surprised that response is so slow.
Thank you

You can also use EXISTS, depending on your database like so:
select * from table t
where id = 1
and exists (
select 1 from groupteam
where department = 'marketing'
and group = t.group
)
Create a composite index on individual indexes on groupteam's department and group
Create a composite index or individual indexes on table's id and group
Do an explain/analyze depending on your database to review how indexes are being used by your database engine.

Try a join instead:
select * from table t
JOIN groupteam gt
ON d.group = gt.group
where ID=1 AND gt.department= 'marketing'
Index on table group and id column and table groupteam group column would help too.

Shortcut for adding table to column name SQL-server 2014

Stupidly simple question, but I just don't know what to google!
If I create a query like this:
Select id, data
from table1
Now I want to join with table2. I can immediately see that the id column is no longer unique and I have to change it to
table1.id
Is there any smart way (like a keyboard-shortcut) to do this, instead of manually adding table1 to every column? Either before I add the Join to secure that all columns will be unique, or after with suggestions based on the different possible tables.

No, there is no helper.
But do not you can alias the table name:
select x.Col1, y.Col2
from ALongTableName x
inner join AReallyReallyLongTableName y on x.Id = y.OtherId
which can also make queries clearer, and is very much necessary when doing self joins.

First of all, you should start using aliases:
SQL aliases are used to give a database table, or a column in a table,
a temporary name.
Basically aliases are created to make column names more readable.
This will narrow down your problem and make your code maintenance easier. If that's not enough, I guess you could start using auto-completion tools, such as these:
SQL Complete
SQL Prompt
ApexSQL Complete
These have your desired functionality, however, they do not always work as expected (at least for me).

Oh! You can use alias table name. Like this:
SELECT A.ID, A.data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
You just only use A. or B. if two table have same this column selected. If they different, you don't need: Like this:
SELECT A.ID, data -- if Table B not have column data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
Or:
Select A.*, B.ID
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID

Searching for the unique key (not in meta)

With which SQL Server standard tool it is possible to search unique key in the table's data (but not in meta declaration)?
P.S. I am thinking to write such script by myself. May be you could point a snippet for
combinatorics in t-sql? e.g. for generation all Combinations from n by 1..n ?
P.P.S About problem complexity for those who do not see it. It is important that we do not need to analyze the whole data to dismiss the hypnotize that those two columns is the 'unique key'. With real world, 'report-like', sorted data even after analysing first two rows, I think, it is possible to remove many of columns combinations. So I feel such algorithm should have 'before full table compare' phase. But there it is a question for what portion of data to choose for this 'before full table compare' phase . The best candidate about which I think is the 'page'... If data unique in the page we could test the uniqueness on whole table, if not unique (on the page), then go to the next column set.

select t1.col, count(*)
from table t1
join table t2
on t1.col = t2.col
group by t1.col
having count(*) > 1
if zero rows are returned then it is unique
more than one column
select t1.cola, t1.colb, count(*)
from table t1
join table t2
on t1.cola = t2.cola
and t1.colb = t2.colb
group by t1.cola, t2.colb
having count(*) > 1

How to SELECT * but without "Column names must be unique in each view"

I need to encapsulate a set of tables JOINs that we freqently make use of on a vendor's database server. We reuse the same JOIN logic in many places in extracts etc. and it seemed a VIEW would allow the JOINs to be defined and maintained in one place.
CREATE VIEW MasterView
AS
SELECT *
FROM entity_1 e1
INNER JOIN entity_2 e2 ON e2.parent_id = entity_1.id
INNER JOIN entity_3 e3 ON e3.parent_id = entity_2.id
/* other joins including business logic */
etc.
The trouble is that the vendor makes regular changes to the DB (column additions, name changes) and I want that to be reflected in the "MasterView" automatically.
SELECT * would allow this, but the underlying tables all have ID columns so I get the "Column names in each view must be unique" error.
I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
Is there any way to achieve the dynamism of SELECT * but effectively exclude certain columns (i.e. the ID ones)
Thanks

I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
In this case, you can't avoid it. You must specify column names and for those columns with duplicate names use an alias. Code generation can help with these many columns.
SELECT * is bad practice regardless - if someone adds a 2GB binary column to one of these tables and populates it, do you really want it to be returned?

One simple method to generate the columns you want is
select column_name+',' from information_schema.columns
where table_name='tt'
and column_name not in('ID')

As well as Oded's answer (100% agree with)...
If someone changes the underlying tables, you need view maintenance anyway (with sp_refreshview). The column changes will not appear in the view automatically. See "select * from table" vs "select colA, colB, etc. from table" interesting behaviour in SQL Server 2005
So your "reflected in the "MasterView" automatically requirement can't be satisfied anyway
If you want to ensure the view is up to date, use WITH SCHEMABINDING which will prevent changes to the underlying tables (until removed or dropped). Then make column changes, then re-apply the view

I had the same issue, see example below:
ALTER VIEW Summary AS
SELECT * FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
and I encountered that error you mentioned, the easiest solution is using the alias before * like this:
SELECT t1.* FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
You shouldn't see that error anymore.

I had gone with this in the end, building off of Madhivanan's suggestion. It's similar to what t-clausen.dk later suggested (thanks for your efforts) though I find the xml path style more elegant than cursors / rank partitions.
The following recreates the MasterView definition when run. All columns in the underlying tables are prepended with the table name, so I can include two similarly named columns in the view by default. This alone solves my original problem, but I also included the "WHERE column_name NOT IN" clause to specifically exclude certain columns that will never be used in the MasterView.
create procedure Utility_RefreshMasterView
as
begin
declare #entity_columns varchar(max)
declare #drop_view_sql varchar(max)
declare #alter_view_definition_sql varchar(max)
/* create comma separated string of columns from underlying tables aliased to avoid name collisions */
select #entity_columns = stuff((
select ','+table_name+'.['+column_name+'] AS ['+table_name+'_'+column_name+']'
from information_schema.columns
where table_name IN ('entity_1', 'entity_2')
and column_name not in ('column to exclude 1', 'column to exclude 2')
for xml path('')), 1, 1, '')
set #drop_view_sql = 'if exists (select * from sys.views where object_id = object_id(N''[dbo].[MasterView]'')) drop view MasterView'
set #alter_view_definition_sql =
'create view MasterView as select ' + #entity_columns + '
from entity_1
inner join entity_2 on entity_2 .id = entity_1.id
/* other joins follow */'
exec (#drop_view_sql)
exec (#alter_view_definition_sql)
end

If you have a Select * and then you are using the JOIN, the result might include columns with the same name and that cannot be possible in a view.If you run the query by itself, works fine but not when creating the View.
For example:
**Table A**
ID, CatalogName, CatalogDescription
**Table B**
ID, CatalogName, CatalogDescription
**After the JOIN query**
ID, CatalogName, CatalogDescription, ID, CatalogName, CatalogDescription
That's not possible in a View.
Specify a unique name for each column in the view. Using just * is not a very good practice.

How can I "subtract" one table from another?

I have a master table A, with ~9 million rows. Another table B (same structure) has ~28K rows from table A. What would be the best way to remove all contents of B from table A?
The combination of all columns (~10) are unique. Nothing more in the form a of a unique key.

If you have sufficient rights you can create a new table and rename that one to A. To create the new table you can use the following script:
CREATE TABLE TEMP_A AS
SELECT *
FROM A
MINUS
SELECT *
FROM B
This should perform pretty good.

DELETE FROM TableA WHERE ID IN(SELECT ID FROM TableB)
Should work. Might take a while though.

one way, just list out all the columns
delete table a
where exists (select 1 from table b where b.Col1= a.Col1
AND b.Col2= a.Col2
AND b.Col3= a.Col3
AND b.Col4= a.Col4)

Delete t2
from t1
inner join t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
and t1.col4 = t2.col4
and t1.col5 = t2.col5
and t1.col6 = t2.col6
and t1.col7 = t2.col7
and t1.col8 = t2.col8
and t1.col9 = t2.col9
and t1.col10 = t2.col0
This is likely to be very slow as you would have to have every col indexed which is highly unlikely in an environment when a table this size has no primary key, so do it during off peak. What possessed you to have a table with 9 million records and no primary key?

If this is something you'll have to do on a regular basis, the first choice should be to try to improve the database design (looking for primary keys, trying to get the "join" condition to be on as few columns as possible).
If that is not possible, the distinct second option is to figure out the "selectivity" of each of the columns (i.e. how many "different" values does each column have, 'name' would be more selective than 'address country' than 'male/female').
The general type of statement I'd suggest would be like this:
Delete from tableA
where exists (select * from tableB
where tableA.colx1 = tableB.colx1
and tableA.colx2 = tableB.colx2
etc. and tableA.colx10 = tableB.colx10).
The idea is to list the columns in order of the selectivity and build an index on colx1, colx2 etc. on tableB. The exact number of columns in tableB would be a result of some trial&measure. (Offset the time for building the index on tableB with the improved time of the delete statement.)
If this is just a one time operation, I'd just pick one of the slow methods outlined above. It's probably not worth the effort to think too much about this when you can just start a statement before going home ...

Is there a key value (or values) that can be used?
something like
DELETE a
FROM tableA a
INNER JOIN tableB b
on b.id = a.id

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

tuning in sql server - views - sql-server

Related

Database Index when SQL statement includes "IN" clause

Shortcut for adding table to column name SQL-server 2014

Searching for the unique key (not in meta)

How to SELECT * but without "Column names must be unique in each view"

How can I "subtract" one table from another?

Categories

Resources