SQL Server: Select in vs or? - sql-server

Which is faster?
SELECT UserName
FROM dbo.UserTable
WHERE UserID in (1,3,4)
SELECT UserName
FROM dbo.UserTable
WHERE UserID = 1
OR UserID = 3
OR UserID = 4

Due to Sql Server's optimization of queries these will run at the same speed since they are logically equivalent.
i favor the IN syntax for brevity and readability though.

Actually it is the same.
If you display the estimated execution plan you will see that it is performing the same action.

This is relevant when it comes to producing SQL via Linq. There are some instances where the sql created is in the form of: field='xxx' OR field='yyy'.

Related

How to write a parameterized SQL query?

I am trying this query to fetch my value from a MS SQL database on basis of two conditions but still I am getting exception in the syntax part.
Can anybody tell me what is the correct way to write a parameterized query in R?
Following is the query I used:
query<- paste0("SELECT [value] FROM [RiskDashboard].[dbo].[tbl_Simulation]
where Row_Id=", row_id[c(1),] ," AND Script_Id=", script_id[c(1),] ,)
T_data<-sqlQuery(ch,query)
print(T_data)
Parameterizing data is very important - especially from a security perspective. The examples you have are string concatenations and are subject to SQL injections.
The RODBCext package does have support for parameterization.
First - standard SQL parameterization syntax:
SELECT ColA, ColB FROM MyTable where FirstName = ? and LastName = ?
Each ? mark indicates in order the values that will appear in a vector. This syntax is true for ODBC regardless of platform. Others have extended to support position. eg. OLEDB supports #P1, #P2 etc.
While maybe not important for your R queries - in a multi-user system parameterized queries execute faster because the query plan is stored by the database-server (true of both Oracle and SQL Server).
To semi-plagiarize from the documentation:
library(RODBCext)
connHandle <- odbcConnect("myDatabase")
data <- sqlExecute(connHandle, "SELECT * FROM myTable WHERE column = ?", 'myValue', fetch = TRUE)
odbcClose(connHandle)
Documentation is here: https://cran.r-project.org/web/packages/RODBCext/vignettes/Parameterized_SQL_queries.html
More discussion here: Parameterized queries with RODBC
i got the correct way as follows:
query<- paste0("SELECT [Row_Id],[Script_id],[value] FROM
[RiskDashboard].[dbo]. [tbl_Simulation] where Row_Id='",row_id[c(i),],"'
AND Script_Id='",script_id[c(i),],"'")

SQL Performance using MAX on joined tabled

I'm using SQL Server 2008R2
I'd like your views on the the two SQL statements below as regards to performance and best practice.
select
*
from BcpSample1
where dataloadid = (select MAX(id) from LoadControl_BcpSample1 where Status = 'completed')
And
select
a.*
from CiaBcpSample1 a
inner join (select ActiveDataLoadId = MAX(id) from LoadControl_BcpSample1 where Status = 'completed') as t
on a.DataLoadId = t.ActiveDataLoadId
I tried the Query plan in SQL Server Studio but after 2 runs both are returning showing the same query plans.
Thanks
In "regards to performance and best practice." it depends on many things. What works well one time might not be the best the next. You have to test, measure the performance and then choose.
You say the plan generated by SQL Server is the same, so in this instance there shouldn't be any difference. Choose the query easiest to maintain and move on to the next problem.

Microsoft SQL Server: How to improve the performance of a dumb query?

I have been asked to help with performance issue of a SQL server installation. I am not a SQL Server expert, but I decided to take a look. We are using a closed source application that appears to work OK. However after a SQL Server upgrade from 2000 to 2005, application performance has reportedly suffered considerably. I ran SQL profiler and caught the following query (field names changed to protect the innocent) taking about 30 seconds to run. My first thought was that I should optimize the query. But that is not possible, given that the application is closed source and the vendor is not helpful. So I am left, trying to figure out how to make this query run fast without changing it. It is also not clear to me how this query ran faster on the older SQL server 2000 product. Perhaps there was some sort of performance tuning applied to on that instance that did not carry over or does not work on the new SQL server. DBCC PINTABLE comes to mind.
Anyway, here is the offending query:
select min(row_id) from Table1 where calendar_id = 'Test1'
and exists
(select id from Table1 where calendar_id = 'Test1' and
DATEDIFF(day, '12/30/2010 09:21', start_datetime) = 0
)
and exists
(select id from Table1 where calendar_id = 'Test1' and
DATEDIFF(day, end_datetime, '01/17/2011 09:03') = 0
);
Table1 has about 6200 entries and looks like this. I have tried creating various indices to no effect.
id calendar_id start_datetime end_datetime
int, primary key varchar(10) datetime datetime
1 Test1 2005-01-01... 2005-01-01...
2 Test1 2005-01-02... 2005-01-02...
3 Test1 2005-01-03... 2005-01-03...
...
I would be very grateful if somebody could help resolve this mystery.
Thanks in advance.
The one thing that should help is a covering index on calendar_id:
create index <indexname>
on table (calendar_id, id)
include (start_datetime, end_datetime);
This will satisfy the calendar_id = 'Test1' predicates, the min(row_id) sort and will provide the material to evaluate the non-SARG-able DATEFIFF predicates. If there are no other columns in the table, then this is probably the clustered index you need and the id primary key should be a non-clustered one.
Make sure the indexes made the conversion. Then update statistics.
Check the differences between the execution plan on the old sql server and the new one. http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
About the other only thing you can do beyond Remus Rusanu's index suggestion, is to upgrade to the Enterprise edition which has a more advanced scan feature (on both SQL Server 2005 and 2008 Enterprise Edition) which allows multiple tasks to share full table scans.
Beyond that, I do not think there is anything you can do if you cannot change the query. The reason is that the query is doing a comparison against a function result in the Where clause. That means it will force SQL Server to do a table scan on Table1 each time it is executed.
Reading Pages (more info about Advanced Scanning)

SQL Server Linked Server Example Query

While in Management Studio, I am trying to run a query/do a join between two linked servers.
Is this a correct syntax using linked db servers:
select foo.id
from databaseserver1.db1.table1 foo,
databaseserver2.db1.table1 bar
where foo.name=bar.name
Basically, do you just preface the db server name to the db.table ?
The format should probably be:
<server>.<database>.<schema>.<table>
For example:
DatabaseServer1.db1.dbo.table1
Update: I know this is an old question and the answer I have is correct; however, I think any one else stumbling upon this should know a few things.
Namely, when querying against a linked server in a join situation the ENTIRE table from the linked server will likely be downloaded to the server the query is executing from in order to do the join operation. In the OP's case, both table1 from DB1 and table1 from DB2 will be transferred in their entirety to the server executing the query, presumably named DB3.
If you have large tables, this may result in an operation that takes a long time to execute. After all it is now constrained by network traffic speeds which is orders of magnitude slower than memory or even disk transfer speeds.
If possible, perform a single query against the remote server, without joining to a local table, to pull the data you need into a temp table. Then query off of that.
If that's not possible then you need to look at the various things that would cause SQL server to have to load the entire table locally. For example using GETDATE() or even certain joins. Others performance killers include not giving appropriate rights.
See http://thomaslarock.com/2013/05/top-3-performance-killers-for-linked-server-queries/ for some more info.
SELECT * FROM OPENQUERY([SERVER_NAME], 'SELECT * FROM DATABASE_NAME..TABLENAME')
This may help you.
For those having trouble with these other answers , try OPENQUERY
Example:
SELECT * FROM OPENQUERY([LinkedServer], 'select * from [DBName].[schema].[tablename]')
If you still find issue with <server>.<database>.<schema>.<table>
Enclose server name in []
You need to specify the schema/owner (dbo by default) as part of the reference. Also, it would be preferable to use the newer (ANSI-92) join style.
select foo.id
from databaseserver1.db1.dbo.table1 foo
inner join databaseserver2.db1.dbo.table1 bar
on foo.name = bar.name
select * from [Server].[database].[schema].[tablename]
This is the correct way to call.
Be sure to verify that the servers are linked before executing the query!
To check for linked servers call:
EXEC sys.sp_linkedservers
right click on a table and click script table as select
select name from drsql01.test.dbo.employee
drslq01 is servernmae --linked serer
test is database name
dbo is schema -default schema
employee is table name
I hope it helps to understand, how to execute query for linked server
Usually direct queries should not be used in case of linked server because it heavily use temp database of SQL server. At first step data is retrieved into temp DB then filtering occur. There are many threads about this. It is better to use open OPENQUERY because it passes SQL to the source linked server and then it return filtered results e.g.
SELECT *
FROM OPENQUERY(Linked_Server_Name , 'select * from TableName where ID = 500')
For what it's worth, I found the following syntax to work the best:
SELECT * FROM [LINKED_SERVER]...[TABLE]
I couldn't get the recommendations of others to work, using the database name. Additionally, this data source has no schema.
In sql-server(local) there are two ways to query data from a linked server(remote).
Distributed query (four part notation):
Might not work with all remote servers. If your remote server is MySQL then distributed query will not work.
Filters and joins might not work efficiently. If you have a simple query with WHERE clause, sql-server(local) might first fetch entire table from the remote server and then apply the WHERE clause locally. In case of large tables this is very inefficient since a lot of data will be moved from remote to local. However this is not always the case. If the local server has access to remote server's table statistics then it might be as efficient as using openquery More details
On the positive side T-SQL syntax will work.
SELECT * FROM [SERVER_NAME].[DATABASE_NAME].[SCHEMA_NAME].[TABLE_NAME]
OPENQUERY
This is basically a pass-through. The query is fully processed on the remote server thus will make use of index or any optimization on the remote server. Effectively reducing the amount of data transferred from the remote to local sql-server.
Minor drawback of this approach is that T-SQL syntax will not work if the remote server is anything other than sql-server.
SELECT * FROM OPENQUERY([SERVER_NAME], 'SELECT * FROM DATABASE_NAME.SCHEMA_NAME.TABLENAME')
Overall OPENQUERY seems like a much better option to use in majority of the cases.
I have done to find out the data type in the table at link_server using openquery and the results were successful.
SELECT * FROM OPENQUERY (LINKSERVERNAME, '
SELECT DATA_TYPE, COLUMN_NAME
FROM [DATABASENAME].INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME =''TABLENAME''
')
Its work for me
Following Query is work best.
Try this Query:
SELECT * FROM OPENQUERY([LINKED_SERVER_NAME], 'SELECT * FROM [DATABASE_NAME].[SCHEMA].[TABLE_NAME]')
It Very helps to link MySQL to MS SQL
PostgreSQL:
You must provide a database name in the Data Source DSN.
Run Management Studio as Administrator
You must omit the DBName from the query:
SELECT * FROM OPENQUERY([LinkedServer], 'select * from schema."tablename"')
For MariaDB (and so probably MySQL), attempting to specify the schema using the three-dot syntax did not work, resulting in the error "invalid use of schema or catalog". The following solution worked:
In SSMS, go to Server Objects > Linked Servers > Providers > MSDASQL
Ensure that "Dynamic parameter", "Level zero only", and "Allow inprocess" are all checked
You can then query any schema and table using the following syntax:
SELECT TOP 10 *
FROM LinkedServerName...[SchemaName.TableName]
Source: SELECT * FROM MySQL Linked Server using SQL Server without OpenQuery
Have you tried adding " around the first name?
like:
select foo.id
from "databaseserver1".db1.table1 foo,
"databaseserver2".db1.table1 bar
where foo.name=bar.name

'LINQ query plan' horribly inefficient but 'Query Analyser query plan' is perfect for same SQL!

I have a LINQ to SQL query that generates the following SQL :
exec sp_executesql N'SELECT COUNT(*) AS [value]
FROM [dbo].[SessionVisit] AS [t0]
WHERE ([t0].[VisitedStore] = #p0) AND (NOT ([t0].[Bot] = 1)) AND
([t0].[SessionDate] > #p1)',N'#p0 int,#p1 datetime',
#p0=1,#p1='2010-02-15 01:24:00'
(This is the actual SQL taken from SQL Profiler on SQL Server 2008.)
The query plan generated when I run this SQL from within Query Analyser is perfect.
It uses an index containing VisitedStore, Bot, SessionDate.
The query returns instantly.
However when I run this from C# (with LINQ) a different query plan is used that is so inefficient it doesn't even return in 60 seconds. This query plan is trying to do a key lookup on the clustered primary key which contains a couple million rows. It has no chance of returning.
What I just can't understand though is that the EXACT same SQL is being run - either from within LINQ or from within Query Analyser yet the query plan is different.
I've ran the two queries many many times and they're now running in isolation from any other queries. The date is DateTime.Now.AddDays(-7), but I've even hardcoded that date to eliminate caching problems.
Is there anything i can change in LINQ to SQL to affect the query plan or try to debug this further? I'm very very confused!
This is a relatively common problem that surprised me too when I first saw it. The first thing to do is ensure your statistics are up to date. You can check the age of statistics with:
SELECT
object_name = Object_Name(ind.object_id),
IndexName = ind.name,
StatisticsDate = STATS_DATE(ind.object_id, ind.index_id)
FROM SYS.INDEXES ind
order by STATS_DATE(ind.object_id, ind.index_id) desc
Statistics should be updated in a weekly maintenance plan. For a quick fix, issue the following command to update all statistics in your database:
exec sp_updatestats
Apart from the statistics, another thing you can check is the SET options. They can be different between Query Analyzer and your Linq2Sql application.
Another possibility is that SQL Server is using an old cached plan for your Linq2Sql query. Plans can be cached on a per-user basis, so if you run Query Analyser as a different user, that can explain different plans. Normally you could add Option (RECOMPILE) to the application query, but I guess that's hard with Linq2Sql. You can clear the entire cache with DBCC FREEPROCCACHE and see if that speeds up the Linq2Sql query.
switched to a stored procedure and the same SQL works fine. would really like to know what's going on but can't spend any more time on this now. fortunately in this instance the query was not too dynamic.
hopefully this at least helps anyone in the same boat as me

Resources