SQL Server 2000 to 2008 Migration - ORDER BY Issue when using DISTINCT - sql-server

We're currently testing our app's migration from SQL Server 2000 to 2008 R2.
The following statement works in 2000 but not in 2008.
select distinct left(tz.Zipcode,5) as zipCode
from TerritoryZip tz
order by tz.Zipcode
The error message is:
Msg 145, Level 15, State 1, Line 1
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
The fix is simple:
select distinct left(tz.Zipcode,5) as zipCode
from TerritoryZip tz
order by left(tz.Zipcode,5)
However there is a risk that we may not find all the instances of this type of SQL.
So one solution might be to set the compatibility level to 2000 - what are the cons of doing that (e.g. performance of not updating the SQL to use this more strict approach)?
And are there other options/settings, e.g. is there a 'strictness' setting that is enforcing better practices, etc...?
Thanks!!

You could change the semantics slightly by doing this:
SELECT ZipCode FROM
(
SELECT DISTINCT ZipCode = LEFT(tz.Zipcode, 5)
FROM dbo.TerritoryZip AS tz
) AS x
ORDER BY ZipCode;
It doesn't really solve the problem, though, since the query is more verbose and you still can't avoid touching it. The fix you've already suggested is better in my mind because it is more explicit about what is going on.
Not to be harsh, but if you don't think you'll be able to "find all the instances of this type of SQL" then how are you trusting your testing at all?
I would suggest that keeping 2000 compatibility mode is not the optimal answer. It can cause other syntax to break (e.g. the way you might call dynamic management functions - see this Paul Randal blog post and my comment to it), and you also run the risk of perpetuating code that should be fixed (e.g. old-style *= / =* joins that are valid in 2000 compat mode, but won't be valid when you go from 2008 R2 to Denali, which doesn't support 2000 compat).
There is no "strictness" setting but you can vote for Erland Sommarskog's SET STRICT_CHECKS ON; suggestion.

So one solution might be to set the compatibility level to 2000 - what are the cons of doing that (e.g. performance of not updating the SQL to use this more strict approach)?
I suggest you look at the documentation on ALTER DATABASE Compatibility Level (Transact-SQL), which lists dozens of differences between the compatibility levels along with the possible impact of each (Low, Medium, or High).
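For reference, checking and changing the level is a one-liner each way; a minimal sketch (YourDatabase is a placeholder name):
-- Check the current compatibility level (80 = SQL Server 2000, 100 = 2008/2008 R2)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'YourDatabase';
-- Switch to 2000 compatibility mode (or back to 100) to compare behaviour
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 80;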
You should also probably run the Upgrade Advisor, which looks through your components for potential problems that you would need to fix.

MSDN has a really good article showing the differences between compatibility levels in SQL Server 2008 (including performance notes and best practices): http://msdn.microsoft.com/en-us/library/bb510680.aspx. Even in the example you gave, the SQL written for the 2008 behavior is more intuitive and enforces a better practice.

Related

Large difference in performance of complex SQL query based on initial "Use dB" statement

Why does a complex SQL query run worse with a Use statement and implicit dB references than with a Use master statement and full references to the user dB?
I'm using SQL Server Std 64-bit Version 13.0.4466.4 running on Windows Server 2012 R2. This is an "academic" question raised by one of my users, not an impediment to production.
By "complex" I mean several WITH clauses and a CROSS APPLY, simplified query structure below. By "worse" I mean 3 min. vs. 1 sec for 239 Rows, repeatably. The "plain" Exec Plan for fast query will not show, however, the Exec Plan w/ Live Query Stats runs for both, analysis further below. Tanx in advance for any light shed on this!
USE Master versus USE <userdb>;
DECLARE @chartID INTEGER = 65;
WITH
with1 AS
( SELECT stuff FROM <userdb>.schema1.userauxtable ),
with2 AS
( SELECT lotsastuff FROM <userdb>.dbo.<views w/ JOINS> ),
with3 AS
( SELECT allstuff FROM with2 WHERE TheDate IN (SELECT MAX(TheDate) FROM with2 GROUP BY <field>, CAST(TheDate AS DATE)) ),
with4 AS
( SELECT morestuff FROM with1 WHERE with1.ChartID = @chartID )
SELECT finalstuff FROM with3
CROSS APPLY ( SELECT littelstuff FROM with4 WHERE
with3.TheDate BETWEEN with4.PreDate AND with4.AfterDate
AND with4.MainID = with3.MainID ) as AvgCross
The Exec Plan w/ Live Query Stats for slow query has ~41% Cost ea. (83% total) in two ops:
a) Deep under the 5th Step (of 15) Hash match (Inner Join) Hash Keys Build ... 41% Cost to Index Scan (non-clustered) of ...
b) Very deep under the 4th Step (of 15) Nested Loops (Left Semi Join) -- 42% Cost to near-identical Index Scan per (1) except addition of (... AND datediff(day,Date1,getdate() ) to Predicate.
While the Exec Plan w/ Live Query Stats for the fast query shows an 83% Cost in a Columnstore Idx Scan (non-clustered) of ..., quite deep under the 9th Step (of 12) Hash Match (Inner Join) Hash Keys Build.
It would seem that the difference is in the Columnstore Idx, but why does the Use master stmt send the Execution down that road?
There may be several possible reasons for this kind of behaviour; however, to identify them all you would need people like Paul Randal or Kalen Delaney to answer this.
With my limited knowledge and understanding of MS SQL Server, I can think of at least 2 possible causes.
1. (Most plausible one) The queries are actually different
If, as you are saying, the query text is sufficiently lengthy and complex, it is completely possible to miss a single object (table, view, user-defined function, etc.) when adding database qualifiers and leave it with no DB prefix.
Now, if an object by that name somehow ended up in both the master and your UserDB databases then different objects will be picked up depending on the current database context, the data might be different, indices and their fragmentation, even data types... well, you get the idea.
This way, queries become different depending on the database context, and there is no point comparing their performance.
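One quick way to test this hypothesis (a sketch; UserDB stands in for your actual user database name) is to look for object names that exist in both databases:
-- Any name returned here is a candidate for resolving differently per database context
SELECT name FROM master.sys.objects
INTERSECT
SELECT name FROM UserDB.sys.objects;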
2. Compatibility level of user database
Back in the heyday of the 2005 version, I had a database with its compatibility level set to 80, so that ANSI SQL-89 outer joins generated by some antiquated ORM in legacy client apps would keep working. Most of the tasty new stuff worked too, with one notable exception however: the pivot keyword.
A query with PIVOT, when executed in the context of that database, threw an error saying the keyword is not recognised. However, when I switched the context to master and prefixed everything with user database's name, it ran perfectly fine.
Of course, this is not exactly your case, but it's a good demonstration of what I'm talking about. There are lots of internal SQL Server components, invisible to the naked eye, that affect the execution plan, performance and sometimes even results (or your ability to retrieve them, as in the example above), and that depend on settings such as the database's compatibility level, trace flags and other similar things.
As a possible cause, I can think of the new cardinality estimator which was introduced in SQL Server 2014. The version of the SQL Server instance you mentioned corresponds to 2016 SP1 CU7, however it is still possible that:
your user database may be in compatibility with 2012 version (for example, if it was restored from 2012 backup and nobody bothered to check its settings after that), or
trace flag 9481 is set either for the session or for the entire SQL Server instance, or
database scoped configuration option LEGACY_CARDINALITY_ESTIMATION is set for the database, etc.
(Thankfully, SQL Server doesn't allow you to change the compatibility level of the master database, so it's always at the latest supported level. Which is probably good, as no one can screw up the database engine itself - not this way, at least.)
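A quick way to check those three things (a sketch; UserDB is a placeholder, and sys.database_scoped_configurations only exists on 2016 and later):
-- Compatibility level of the user database (130 = 2016, 110 = 2012, etc.)
SELECT name, compatibility_level FROM sys.databases WHERE name = N'UserDB';
-- Is trace flag 9481 enabled (check the Status/Global/Session columns in the output)?
DBCC TRACESTATUS(9481);
-- Is the legacy cardinality estimator forced via database-scoped configuration?
SELECT name, value
FROM UserDB.sys.database_scoped_configurations
WHERE name = N'LEGACY_CARDINALITY_ESTIMATION';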
I'm pretty sure that I have only scratched the surface of the subject, so while checking the aforementioned places definitely wouldn't hurt, what you need to do is to identify the actual cause of the difference (if it's not #1 above, that is). This can be done by looking at actual execution plans of the queries (forget the estimated ones, they are worthless) with a tool other than vanilla SSMS. As an example, SentryOne Plan Explorer might be a good thing to begin with. Even without that, saving plans in .sqlplan files and opening them with any XML-capable viewer/editor will show you much more, including possible leads that might explain the difference you observe.

How to remove case sensitivity for Table and column names in Sybase

I'm new to Sybase and I really find it annoying to write SQL with the appropriate case for table names and column names. For example, if the table name is 'Employee' I can't query it as:
select * from employee
Is there a way to change this behavior in Sybase?
I don't want to change the sort order or anything. I'm looking for a hack to bypass this issue.
Cheers!!
As correctly pointed out in the other responses, this is a server-level configuration setting, which can be changed.
However, what is not mentioned is that in ASE, case sensitivity applies to identifiers as well as to data comparison. So if you configure a case-insensitive sort order as discussed here, the effect will also be that 'Johnson' is now considered equal to 'JOHNSON' - and this could potentially cause trouble in applications.
In this sense, ASE is different from other databases where these two aspects of case-sensitivity are decoupled.
This behavior is a result of the server's sort order. This is a server-level setting, not a database-level setting, so the change will affect all databases on the server. Also, if the database is in replication, all connected servers will need their sort order changed as well.
Changing the sort order will also require you to rebuild all the indexes in your system.
Here is the correct documentation on selecting or changing character sets and sort orders.
Configuring Character Sets, Sort Orders, and Languages
As mentioned in the comments, it will require DBA-level access, and the server will have to be restarted before the changes take effect.
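Before making any change, it's worth checking what the server is currently configured with; the standard ASE system procedure sp_helpsort reports the default character set and sort order:
-- Run from isql or your client of choice; case-sensitive servers typically
-- report a binary sort order here, case-insensitive ones a "nocase" order.
sp_helpsort
go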

pagination in SQL server 2008

I am a newbie to SQL Server. I'm keeping this question as a reference. My doubt is:
why doesn't Microsoft SQL Server have something like LIMIT in MySQL, so that you are now forced to write either a stored procedure or an inner query for pagination? I think creating a temporary view/table or using an inner query will be slower than a simple query. And I believe there must be a strong reason for deprecating this; I'd like to know the reason.
If anyone knows it, please share it.
I never knew SQL Server supported something like TOP 10,20 - are you really totally sure?? Wasn't that some other system maybe??
Anyway: SQL Server 2011 (code-named "Denali") will be adding more support for this when it comes out by the end of 2011 or so.
The ORDER BY clause will get new additional keywords OFFSET and FETCH - read more about them here on MSDN.
You'll be able to write statements like:
-- Specifying variables for OFFSET and FETCH values
DECLARE @StartingRowNumber INT = 150, @FetchRows INT = 50;
SELECT
DepartmentID, Name, GroupName
FROM
HumanResources.Department
ORDER BY
DepartmentID ASC
OFFSET @StartingRowNumber ROWS
FETCH NEXT @FetchRows ROWS ONLY;
SQL Server 2005 Paging – The Holy Grail (requires free registration).
(Although it says SQL Server 2005 it is still applicable to SQL Server 2008)
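The paging techniques for 2005/2008 are generally built around ROW_NUMBER(); a minimal sketch of the pattern, reusing the AdventureWorks table from the example above (not necessarily the linked article's exact optimization):
-- Rows 151-200 (a page of 50 rows), using ROW_NUMBER(); works on 2005 and 2008
DECLARE @StartingRowNumber INT, @FetchRows INT;
SET @StartingRowNumber = 150;
SET @FetchRows = 50;
SELECT DepartmentID, Name, GroupName
FROM
(
    SELECT DepartmentID, Name, GroupName,
           ROW_NUMBER() OVER (ORDER BY DepartmentID ASC) AS RowNum
    FROM HumanResources.Department
) AS numbered
WHERE RowNum > @StartingRowNumber
  AND RowNum <= @StartingRowNumber + @FetchRows
ORDER BY DepartmentID;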
I agree 100%! MySQL has the LIMIT clause that makes a very easy syntax to return a range of rows.
I don't know for sure that temporary table syntax is slower because SQL Server may be able to make some optimizations. However, a LIMIT clause would be far easier to type. And I would expect there would be more opportunities for optimization too.
I brought this up once before, and the group I was talking to just didn't seem to agree.
As far as I'm concerned, there is no reason not to have a LIMIT clause (or equivalent), and I strongly suspect SQL Server eventually will!

Full Text Query takes minutes instead of sub seconds after upgrade

We just upgraded our SQL Server 2005 to SQL Server 2008 R2 and noticed some performance problems.
The query below was already slow, but now on 2008 it just times out. We rebuilt the catalog to make sure it's freshly built on 2008:
DECLARE @FREETEXT varchar(255) = 'TEN-T'
select Distinct ...
from
DOSSIER_VERSION
inner join
DOSSIER_VERSION_LOCALISED ...
where
CONTAINS(DOSSIER_VERSION.*,@FREETEXT)
or
CONTAINS(DOSSIER_VERSION_LOCALISED.*,@FREETEXT)
The query takes minutes if you have both conditions enabled.
If you just put the following in the where
CONTAINS(DOSSIER_VERSION.*,@FREETEXT)
It's super fast. The same goes if it's just
CONTAINS(DOSSIER_VERSION_LOCALISED.*,@FREETEXT)
Since we are OR'ing the results, I would expect the running time of this query to be at most the sum of the two, but as stated above it takes minutes or times out.
Can anyone tell me what is going on here? If I use a UNION (which is conceptually the same as the OR) the performance problem is gone, but I would like to know what issue I am running into here, since I want to avoid rewriting queries.
Regards, Tom
See my answers to these very similar questions:
Adding more OR searches with CONTAINS Brings Query to Crawl
SQL Server full text query across multiple tables - why so slow?
The basic idea is that using LEFT JOINs to CONTAINSTABLE (or FREETEXTTABLE) performs significantly better than having multiple CONTAINS (or FREETEXT) ORed together in the WHERE clause.
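To make that concrete, a sketch of what the rewrite might look like for the tables in the question (the key and join column names here are guesses - CONTAINSTABLE joins on whatever the table's full-text key column is in your schema):
DECLARE @FREETEXT varchar(255) = 'TEN-T';
SELECT DISTINCT dv.DossierVersionID          -- hypothetical key/column names
FROM DOSSIER_VERSION dv
INNER JOIN DOSSIER_VERSION_LOCALISED dvl
    ON dvl.DossierVersionID = dv.DossierVersionID
LEFT JOIN CONTAINSTABLE(DOSSIER_VERSION, *, @FREETEXT) ft1
    ON ft1.[KEY] = dv.DossierVersionID
LEFT JOIN CONTAINSTABLE(DOSSIER_VERSION_LOCALISED, *, @FREETEXT) ft2
    ON ft2.[KEY] = dvl.DossierVersionLocalisedID
WHERE ft1.[KEY] IS NOT NULL
   OR ft2.[KEY] IS NOT NULL;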

Is this query equivalent to SQL Server 2008's OPTIMIZE FOR UNKNOWN?

I'm maintaining stored procedures for SQL Server 2005 and I wish I could use a new feature in 2008 that allows the query hint: "OPTIMIZE FOR UNKNOWN"
It seems as though the following query (written for SQL Server 2005) estimates the same number of rows (i.e. selectivity) as if OPTION (OPTIMIZE FOR UNKNOWN) were specified:
CREATE PROCEDURE SwartTest(@productid INT)
AS
DECLARE @newproductid INT
SET @newproductid = @productid
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @newproductid
This query avoids parameter sniffing by declaring and setting a new variable. Is this really a SQL Server 2005 work-around for the OPTIMIZE-FOR-UNKNOWN feature? Or am I missing something? (Authoritative links, answers or test results are appreciated).
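For reference, the 2008 version I have in mind would be the same query with the hint attached (the procedure name here is just for illustration):
CREATE PROCEDURE SwartTest2008(@productid INT)
AS
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @productid
OPTION (OPTIMIZE FOR UNKNOWN);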
More Info:
A quick test on SQL Server 2008 tells me that the number of estimated rows in this query is in fact the same as if OPTIMIZE FOR UNKNOWN was specified. Is this the same behavior on SQL Server 2005? I think I remember hearing once that without more info, the SQL Server Optimizing Engine has to guess at the selectivity of the parameter (usually at 10% for inequality predicates). I'm still looking for definitive info on SQL 2005 behavior though. I'm not quite sure that info exists though...
More Info 2:
To be clear, this question is asking for a comparison of the UNKNOWN query hint and the parameter-masking technique I describe.
It's a technical question, not a problem solving question. I considered a lot of other options and settled on this. So the only goal of this question was to help me gain some confidence that the two methods are equivalent.
I've used that solution several times recently to avoid parameter sniffing on SQL 2005, and it seems to me to do the same thing as OPTIMIZE FOR UNKNOWN on SQL 2008. It's fixed a lot of problems we had with some of our bigger stored procedures sometimes just hanging when passed certain parameters.
Okay, so I've done some experimenting. I'll write up the results here, but first I want to say that based on what I've seen and know, I'm confident that using temporary parameters in 2005 and 2008 is exactly equivalent to using 2008's OPTIMIZE FOR UNKNOWN. At least in the context of stored procedures.
So this is what I've found.
In the procedure above, I'm using the AdventureWorks database. (But I use similar methods and get similar results for any other database.) I ran:
dbcc show_statistics ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
And I see statistics with 200 steps in its histogram. Looking at its histogram I see that there are 66 distinct range rows (i.e. 66 distinct values that weren't included in stats as equality values). Add the 200 equality rows (from each step), and I get an estimate of 266 distinct values for ProductId in Sales.SalesOrderDetail.
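That distinct-value count is what drives the column's density ("All density" = 1 / distinct values), and you can read it directly from the same statistics object:
-- Shows the density vector; "All density" for ProductID should be about 1/266 = 0.00376
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
WITH DENSITY_VECTOR;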
With 121317 rows in the table, I can estimate that each ProductId has 456 rows on average. And when I look at the query plan for my test procedure (in xml format) I see something like:
...
<QueryPlan DegreeOfParallelism="1" >
<RelOp NodeId="0"
PhysicalOp="Index Seek"
LogicalOp="Index Seek"
EstimateRows="456.079"
TableCardinality="121317" />
...
<ParameterList>
<ColumnReference
Column="#newproductid"
ParameterRuntimeValue="(999)" />
</ParameterList>
</QueryPlan>
...
So I know where the EstimateRows value is coming from (accurate to three decimals), and notice that the ParameterCompiledValue attribute is missing from the query plan. This is exactly what a plan looks like when using 2008's OPTIMIZE FOR UNKNOWN.
Interesting question.
There's a good article on the SQL Programming and API Development Team blog here which lists the pre-SQL Server 2008 workaround solutions as:
use RECOMPILE hint so the query is recompiled every time
unparameterise the query
give specific values in OPTIMIZE FOR hint
force use of a specific index
use a plan guide
Which leads me on to this article, which mentions your workaround of using local parameters and how it generates an execution plan based on statistics. How similar this process is to the new OPTIMIZE FOR UNKNOWN hint, I don't know. My hunch is that it is a reasonable workaround.
I've been using this parameter-masking technique for at least the past year because of odd performance problems, and it has worked well, but it is VERY annoying to have to do all the time.
I have ALSO been using WITH RECOMPILE.
I do not have controlled tests because I can't selectively turn the usage of each on and off automatically in the system, but I suspect that the parameter masking will only help IF the parameter is used. I have some complex SPs where the parameter is not used in every statement, and I expect that WITH RECOMPILE was still necessary because some of the "temporary" work tables are not populated (or even indexed identically, if I'm trying to tune) the same way on every run, and some later statements don't rely on the parameter once the work tables are already appropriately populated. I have broken some processes up into multiple SPs precisely so that work done to populate a work table in one SP can be properly analyzed and executed against WITH RECOMPILE in the next SP.
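For what it's worth, on 2005 and later the recompile can also be scoped to just the parameter-sensitive statement instead of the whole procedure, which avoids recompiling the statements that only touch the work tables. A rough sketch (the procedure name is made up; the table is AdventureWorks again):
CREATE PROCEDURE dbo.SwartTestStatementRecompile(@customerid INT)
AS
BEGIN
    -- This statement doesn't depend on the parameter, so it keeps its cached plan.
    SELECT COUNT(*) FROM Sales.SalesOrderHeader;

    -- Only this statement is recompiled each run, with the actual @customerid value seen.
    SELECT SalesOrderID, OrderDate
    FROM Sales.SalesOrderHeader
    WHERE CustomerID = @customerid
    OPTION (RECOMPILE);
END;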
