I very rarely use SQL Server and in a professional context I keep well clear! I'm working on a pet project though and I'm have problems with a script creation.
I've got an online database that I need to extract everything out of. I use the Tasks > Generate Scripts option within SQL Server Management Studio. The following is an example of one insert statement the script creates (I have 1,000s of these inserts):
INSERT [dbo].[NewComics] ([NewComicId], [Title], [Subtitle], [ReleaseDate], [CollectionId]) VALUES (366, N'Hawk & Dove 1: ', N'First Strikes ', CAST(0x00009F6F00000000 AS DateTime), 248)
I have two issues with this:
(a) I want to strip all the whitespace from the two title elements
(b) I don't want a HEX date - I want something readable like 2006-09-01 (yyyy-mm-dd)
INSERT [dbo].[NewComics] ([NewComicId], [Title], [Subtitle], [ReleaseDate], [CollectionId]) VALUES (366, N'Hawk & Dove 1:', N'First Strikes', '2006-09-01', 248)
What would be the quickest way to change about 3,000 insert statements to this revised format?
FYI - this is the design of the table:
[NewComicId] [int] NOT NULL,
[Title] [nchar](100) NOT NULL,
[Subtitle] [nchar](100) NULL,
[ReleaseDate] [datetime] NOT NULL,
[CollectionId] [int] NOT NULL,
Thanks in advance!
Yes, generate scripts sadly scripts datetime columns as CONVERT(binary_value, Datetime). I'll try to get an answer as to why (or more importantly if there is a way to change the behavior). I suspect the reason is to avoid any issues with running the scripts on a different machine with different locale / regional settings etc. I don't know if there's a way to change that from happening in the meantime, but Management Studio isn't the only way to script your data... you could look into 3rd party products like Red-Gate's SQL Data Compare.
If it's really only 3,000 rows, and you intend to run the generated script on a different server, stop using the wizard and do this (on first glance this looks horrific, but it does several of the things you'll want - outputs a script ready to copy, paste and run, with nicely formatted and readable dates, inserts batched into multi-row VALUES by 1000 with GO commands in between, and even deals with potentially NULL values in title, subtitle and collectionid):
DECLARE #newtable SYSNAME = 'dbo.NewComics';
SET NOCOUNT ON;
;WITH x AS (SELECT TOP (4000) s = '('
+ CONVERT(VARCHAR(12), NewComicId) + ','
+ COALESCE('N''' + REPLACE(RTRIM(Title), '''', '''''') + '''', 'NULL') + ','
+ COALESCE('N''' + REPLACE(RTRIM(SubTitle), '''', '''''') + '''', 'NULL')
+ ', ''' + CONVERT(CHAR(8), ReleaseDate, 112) + ' '
+ CONVERT(CHAR(8), ReleaseDate, 108) + ''','
+ CONVERT(VARCHAR(12), COALESCE(CollectionId, 'NULL')) + ')',
rn = ROW_NUMBER() OVER (ORDER BY NewComicId)
FROM dbo.OldComics ORDER BY NewComicId
),
y AS
(
SELECT [/*a*/] = 1, [/*b*/] = 'SET NOCOUNT ON;
GO
INSERT ' + #newtable + ' VALUES'
UNION ALL
SELECT 2, s = CASE WHEN rn > 1 THEN ',' ELSE '' END + s
FROM x WHERE rn BETWEEN 1 AND 1000
UNION ALL
SELECT 3, 'GO' UNION ALL
SELECT 4, s = CASE WHEN rn > 1001 THEN ',' ELSE '' END + s
FROM x WHERE rn BETWEEN 1001 AND 2000
UNION ALL
SELECT 5, 'GO' UNION ALL
SELECT 6, s = CASE WHEN rn > 2001 THEN ',' ELSE '' END + s
FROM x WHERE rn BETWEEN 2001 AND 3000
UNION ALL
SELECT 7, 'GO' UNION ALL
SELECT 8, s = CASE WHEN rn > 3001 THEN ',' ELSE '' END + s
FROM x WHERE rn BETWEEN 3001 AND 4000
)
SELECT [/*b*/] FROM y ORDER BY [/*a*/];
(You might have to play with it if you have exactly 3000 or 3001 rows, or add another couple of unions if you have more than 4000, etc.)
If you are moving the data to a different table or different database on the same instance, use the script that #swasheck provided (and again, stop using the wizard).
You may have noticed a common trend here: stop using the generate scripts wizard if you don't like the binary format it outputs for dates.
So if this was me, what I'd do would be to build up the table structure in a separate database:
CREATE TABLE NewComics (
[NewComicId] [int] identity (0,1) NOT NULL,
[Title] [nvarchar](100) NOT NULL,
[Subtitle] [nvarchar](100) NULL,
[ReleaseDate] [datetime] NOT NULL,
[CollectionId] [int] NOT NULL
);
ALTER TABLE [NewComics]
ADD CONSTRAINT PK_NewComicsID PRIMARY KEY;
And then use SQL to clean the data like so:
INSERT INTO [NewDatabase].[dbo].[NewComics] (Title, Subtitle, ReleaseDate, CollectionID)
SELECT
LTRIM(RTRIM(Title))
, LTRIM(RTRIM(Subtitle))
, CAST(ReleaseDate as Datetime)
, CollectionID
FROM [OldDatabase].[dbo].[NewComics];
Alternatively, you can use this same SELECT query:
SELECT
NewComicID
, LTRIM(RTRIM(Title))
, LTRIM(RTRIM(Subtitle))
, CAST(ReleaseDate as Datetime)
, CollectionID
FROM [OldDatabase].[dbo].[NewComics];
as the source for an Import/Export Data Task (in the same menu that you've used to Generate Scripts). [OldDatabase] on this server would be the source and [NewDatabase] on this server would be the destination. Make sure you check the box to all identity inserts.
Related
In a SqlServer database I use, the database name is something like StackExchange.Audio.Meta, or StackExchange.Audio or StackOverflow . By sheer luck this is also the url for a website. I only need split it on the dots and reverse it: meta.audio.stackexchange. Adding http:// and .com and I'm done. Obviously Stackoverflow doesn't need any reversing.
Using the SqlServer 2016 string_split function I can easy split and reorder its result:
select value
from string_split(db_name(),'.')
order by row_number() over( order by (select 1)) desc
This gives me
| Value |
-----------------
| Meta |
| Audio |
| StackExchange |
As I need to have the url in a variable I hoped to concatenate it using this answer so my attempt looks like this:
declare #revname nvarchar(150)
select #revname = coalesce(#revname +'.','') + value
from string_split(db_name(),'.')
order by row_number() over( order by (select 1)) desc
However this only returns me the last value, StackExchange. I already noticed the warnings on that answer that this trick only works for certain execution plans as explained here.
The problem seems to be caused by the order by clause. Without that I get all values, but then in the wrong order. I tried to a add ltrimand rtrim function as suggested in the Microsoft article as well as a subquery but so far without luck.
Is there a way I can nudge the Sql Server 2016 Query Engine to concatenate the ordered result from that string_split in a variable?
I do know I can use for XML or even a plain cursor to get the result I need but I don't want to give up this elegant solution yet.
As I'm running this on the Stack Exchange Data Explorer I can't use functions, as we lack the permission to create those. I can do Stored procedures but I hoped I could evade those.
I prepared a SEDE Query to experiment with. The database names to expect are either without dots, aka StackOverflow, with 1 dot: StackOverflow.Meta or 2 dots, `StackExchange.Audio.Meta, the full list of databases is here
I think you are over-complicating things. You could use PARSENAME:
SELECT 'http://' + PARSENAME(db_name(),1) +
ISNULL('.' + PARSENAME(db_name(),2),'') + ISNULL('.'+PARSENAME(db_name(),3),'')
+ '.com'
This is exactly why I have the Presentation Sequence (PS) in my split function. People often scoff at using a UDF for such items, but it is generally a one-time hit to parse something for later consumption.
Select * from [dbo].[udf-Str-Parse]('meta.audio.stackexchange','.')
Returns
Key_PS Key_Value
1 meta
2 audio
3 stackexchange
The UDF
CREATE FUNCTION [dbo].[udf-Str-Parse] (#String varchar(max),#delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse]('meta.audio.stackexchange','.')
-- Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
-- Select * from [dbo].[udf-Str-Parse]('id26,id46|id658,id967','|')
Returns #ReturnTable Table (Key_PS int IDENTITY(1,1) NOT NULL , Key_Value varchar(max))
As
Begin
Declare #intPos int,#SubStr varchar(max)
Set #IntPos = CharIndex(#delimeter, #String)
Set #String = Replace(#String,#delimeter+#delimeter,#delimeter)
While #IntPos > 0
Begin
Set #SubStr = Substring(#String, 0, #IntPos)
Insert into #ReturnTable (Key_Value) values (#SubStr)
Set #String = Replace(#String, #SubStr + #delimeter, '')
Set #IntPos = CharIndex(#delimeter, #String)
End
Insert into #ReturnTable (Key_Value) values (#String)
Return
End
Probably less elegant solution but it takes only a few lines and works with any number of dots.
;with cte as (--build xml
select 1 num, cast('<str><s>'+replace(db_name(),'.','</s><s>')+'</s></str>' as xml) str
)
,x as (--make table from xml
select row_number() over(order by num) rn, --add numbers to sort later
t.v.value('.[1]','varchar(50)') s
from cte cross apply cte.str.nodes('str/s') t(v)
)
--combine into string
select STUFF((SELECT '.' + s AS [text()]
FROM x
order by rn desc --in reverse order
FOR XML PATH('')
), 1, 1, '' ) name
Is there a way I can nudge the Sql Server 2016 Query Engine to concatenate the ordered result from that string_split in a variable?
You can just use CONCAT:
DECLARE #URL NVARCHAR(MAX)
SELECT #URL = CONCAT(value, '.', #URL) FROM STRING_SPLIT(DB_NAME(), '.')
SET #URL = CONCAT('http://', LOWER(#URL), 'com');
The reversal is accomplished by the order of parameters to CONCAT. Here's an example.
It changes StackExchange.Garage.Meta to http://meta.garage.stackexchange.com.
This can be used to split and reverse strings in general, but note that it does leave a trailing delimiter. I'm sure you could add some logic or a COALESCE in there to make that not happen.
Also note that vNext will be adding STRING_AGG.
To answer the 'X' of this XY problem, and to address the HTTPS switch (especially for Meta sites) and some other site name changes, I've written the following SEDE query which outputs all site names in the format used on the network site list.
SELECT name,
LOWER('https://' +
IIF(PATINDEX('%.Mathoverflow%', name) > 0,
IIF(PATINDEX('%.Meta', name) > 0, 'meta.mathoverflow.net', 'mathoverflow.net'),
IIF(PATINDEX('%.Ubuntu%', name) > 0,
IIF(PATINDEX('%.Meta', name) > 0, 'meta.askubuntu.com', 'askubuntu.com'),
IIF(PATINDEX('StackExchange.%', name) > 0,
CASE SUBSTRING(name, 15, 200)
WHEN 'Audio' THEN 'video'
WHEN 'Audio.Meta' THEN 'video.meta'
WHEN 'Beer' THEN 'alcohol'
WHEN 'Beer.Meta' THEN 'alcohol.meta'
WHEN 'CogSci' THEN 'psychology'
WHEN 'CogSci.Meta' THEN 'psychology.meta'
WHEN 'Garage' THEN 'mechanics'
WHEN 'Garage.Meta' THEN 'mechanics.meta'
WHEN 'Health' THEN 'medicalsciences'
WHEN 'Health.Meta' THEN 'medicalsciences.meta'
WHEN 'Moderators' THEN 'communitybuilding'
WHEN 'Moderators.Meta' THEN 'communitybuilding.meta'
WHEN 'Photography' THEN 'photo'
WHEN 'Photography.Meta' THEN 'photo.meta'
WHEN 'Programmers' THEN 'softwareengineering'
WHEN 'Programmers.Meta' THEN 'softwareengineering.meta'
WHEN 'Vegetarian' THEN 'vegetarianism'
WHEN 'Vegetarian.Meta' THEN 'vegetarianism.meta'
WHEN 'Writers' THEN 'writing'
WHEN 'Writers.Meta' THEN 'writing.meta'
ELSE SUBSTRING(name, 15, 200)
END + '.stackexchange.com',
IIF(PATINDEX('StackOverflow.%', name) > 0,
CASE SUBSTRING(name, 15, 200)
WHEN 'Br' THEN 'pt'
WHEN 'Br.Meta' THEN 'pt.meta'
ELSE SUBSTRING(name, 15, 200)
END + '.stackoverflow.com',
IIF(PATINDEX('%.Meta', name) > 0,
'meta.' + SUBSTRING(name, 0, PATINDEX('%.Meta', name)) + '.com',
name + '.com'
)
)
)
)
) + '/'
)
FROM sys.databases WHERE database_id > 5
Please look at the below query..
select name as [Employee Name] from table name.
I want to generate [Employee Name] dynamically based on other column value.
Here is the sample table
s_dt dt01 dt02 dt03
2015-10-26
I want dt01 value to display as column name 26 and dt02 column value will be 26+1=27
I'm not sure if I understood you correctly. If I'am going into the wrong direction, please add comments to your question to make it more precise.
If you really want to create columns per sql you could try a variation of this script:
DECLARE #name NVARCHAR(MAX) = 'somename'
DECLARE #sql NVARCHAR(MAX) = 'ALTER TABLE aps.tbl_Fabrikkalender ADD '+#name+' nvarchar(10) NULL'
EXEC sys.sp_executesql #sql;
To retrieve the column name from another query insert the following between the above declares and fill the placeholders as needed:
SELECT #name = <some colum> FROM <some table> WHERE <some condition>
You would need to dynamically build the SQL as a string then execute it. Something like this...
DECLARE #s_dt INT
DECLARE #query NVARCHAR(MAX)
SET #s_dt = (SELECT DATEPART(dd, s_dt) FROM TableName WHERE 1 = 1)
SET #query = 'SELECT s_dt'
+ ', NULL as dt' + RIGHT('0' + CAST(#s_dt as VARCHAR), 2)
+ ', NULL as dt' + RIGHT('0' + CAST((#s_dt + 1) as VARCHAR), 2)
+ ', NULL as dt' + RIGHT('0' + CAST((#s_dt + 2) as VARCHAR), 2)
+ ', NULL as dt' + RIGHT('0' + CAST((#s_dt + 3) as VARCHAR), 2)
+ ' FROM TableName WHERE 1 = 1)
EXECUTE(#query)
You will need to replace WHERE 1 = 1 in two places above to select your data, also change TableName to the name of your table and it currently puts NULL as the dynamic column data, you probably want something else there.
To explain what it is doing:
SET #s_dt is selecting the date value from your table and returning only the day part as an INT.
SET #query is dynamically building your SELECT statement based on the day part (#s_dt).
Each line is taking #s_dt, adding 0, 1, 2, 3 etc, casting as VARCHAR, adding '0' to the left (so that it is at least 2 chars in length) then taking the right two chars (the '0' and RIGHT operation just ensure anything under 10 have a leading '0').
It is possible to do this using dynamic SQL, however I would also consider looking at the pivot operators to see if they can achieve what you are after a lot more efficiently.
https://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
How can I bulk import lat, long from csv file into sql server as spatial data type, not two separate columns of type double?
Given input .csv file with two columns:
Latitude, Longitude
want to create sql db that has one column corresponding to a spatial data type.
Did some research and found this article:
Convert Latitude/Longitude (Lat/Long) to Geography Point
So as given in article, I've created table and inserted some test data using given script:
CREATE TABLE [dbo].[Landmark] (
[ID] INT IDENTITY(1, 1),
[LandmarkName] VARCHAR(100),
[Location] VARCHAR(50),
[Latitude] FLOAT,
[Longitude] FLOAT
)
GO
INSERT INTO [dbo].[Landmark] ( [LandmarkName], [Location], [Latitude], [Longitude] )
VALUES ( 'Statue of Liberty', 'New York, USA', 40.689168,-74.044563 ),
( 'Eiffel Tower', 'Paris, France', 48.858454, 2.294694),
( 'Leaning Tower of Pisa', 'Pisa, Italy', 43.72294, 10.396604 ),
( 'Great Pyramids of Giza', 'Cairo, Egypt', 29.978989, 31.134632 ),
( 'Sydney Opera House', 'Syndey, Australia', -33.856651, 151.214967 ),
( 'Taj Mahal', 'Agra, India', 27.175047, 78.042042 ),
( 'Colosseum', 'Rome, Italy', 41.890178, 12.492378 )
GO
And then added calculated column using this query:
ALTER TABLE [dbo].[Landmark]
ADD [GeoLocation] AS geography::STPointFromText('POINT(' + CAST([Longitude] AS VARCHAR(20)) + ' ' + CAST([Latitude] AS VARCHAR(20)) + ')', 4326)
GO
Now when I query table using
SELECT *
FROM dbo.Landmark
I'm getting calculated spatial results too:
And results in axis:
Hopefully I understood you right.
Update:
I'm not sure if this will satisfy you. It's quite dirty, but it does the job:
That's how I formatted CSV file. I've used same structure as in previous example:
Statue of Liberty| New York, USA| 40.689168|-74.044563
Eiffel Tower| Paris, France| 48.858454| 2.294694
Leaning Tower of Pisa| Pisa, Italy| 43.72294| 10.396604
Great Pyramids of Giza| Cairo, Egypt| 29.978989| 31.134632
Sydney Opera House| Syndey, Australia| -33.856651| 151.214967
Taj Mahal| Agra, India| 27.175047| 78.042042
Colosseum| Rome, Italy| 41.890178| 12.492378
Columns seperator is | symbol and rows seperator is break symbol.
So what I did is, I used OPENROWSET to open CSV file and format this into rows instead having one long string( that's how OPENROWSET read my csv file, unfortunately). I've used this SplitString function:
https://stackoverflow.com/a/19935594/3680098
Now I need to turn these buddies into columns instead of one string. I've used this answer provided on SO: https://stackoverflow.com/a/15108499/3680098
Summing things up, that's the query:
SELECT LTRIM(RTRIM(xmlValue.value('/values[1]/value[1]','nvarchar(100)'))) AS LandmarkName
, LTRIM(RTRIM(xmlValue.value('/values[1]/value[2]','nvarchar(100)'))) AS Location
, LTRIM(RTRIM(xmlValue.value('/values[1]/value[3]','nvarchar(20)'))) AS Lon
, LTRIM(RTRIM(xmlValue.value('/values[1]/value[4]','nvarchar(20)'))) AS Lat
, GEOGRAPHY::STPointFromText('POINT(' + xmlValue.value('/values[1]/value[4]','nvarchar(20)') + ' ' + xmlValue.value('/values[1]/value[3]','nvarchar(20)') + ')', 4326)
FROM dbo.SplitString((SELECT Document.* FROM OPENROWSET(BULK N'C:\Temp\test.csv', SINGLE_CLOB) AS Document), CHAR(10)) AS T
CROSS APPLY (SELECT CAST('<values><value>' + REPLACE(T.Value, '|', '</value><value>') + '</value></values>' AS XML)) AS T1(xmlValue);
It generates me required data as in my first screenshot and it seems just fine.
So what I need to do, is to create my table and insert these into it:
CREATE TABLE [dbo].[Landmark] (
[ID] INT IDENTITY(1, 1),
[LandmarkName] VARCHAR(100),
[Location] VARCHAR(50),
[GeoLocation] GEOGRAPHY
)
GO
INSERT INTO dbo.Landmark (LandmarkName, Location, GeoLocation)
SELECT LTRIM(RTRIM(xmlValue.value('/values[1]/value[1]','nvarchar(100)'))) AS LandmarkName
, LTRIM(RTRIM(xmlValue.value('/values[1]/value[2]','nvarchar(100)'))) AS Location
, GEOGRAPHY::STPointFromText('POINT(' + xmlValue.value('/values[1]/value[4]','nvarchar(20)') + ' ' + xmlValue.value('/values[1]/value[3]','nvarchar(20)') + ')', 4326)
FROM dbo.SplitString((SELECT Document.* FROM OPENROWSET(BULK N'C:\Temp\test.csv', SINGLE_CLOB) AS Document), CHAR(10)) AS T
CROSS APPLY (SELECT CAST('<values><value>' + REPLACE(T.Value, '|', '</value><value>') + '</value></values>' AS XML)) AS T1(xmlValue)
Results:
I was looking at different ways of writing a stored procedure to return a "page" of data. This was for use with the ASP ObjectDataSource, but it could be considered a more general problem.
The requirement is to return a subset of the data based on the usual paging parameters; startPageIndex and maximumRows, but also a sortBy parameter to allow the data to be sorted. Also there are some parameters passed in to filter the data on various conditions.
One common way to do this seems to be something like this:
[Method 1]
;WITH stuff AS (
SELECT
CASE
WHEN #SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
WHEN #SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
WHEN #SortBy = ...
ELSE ROW_NUMBER() OVER (ORDER BY whatever)
END AS Row,
.,
.,
.,
FROM Table1
INNER JOIN Table2 ...
LEFT JOIN Table3 ...
WHERE ... (lots of things to check)
)
SELECT *
FROM stuff
WHERE (Row > #startRowIndex)
AND (Row <= #startRowIndex + #maximumRows OR #maximumRows <= 0)
ORDER BY Row
One problem with this is that it doesn't give the total count and generally we need another stored procedure for that. This second stored procedure has to replicate the parameter list and the complex WHERE clause. Not nice.
One solution is to append an extra column to the final select list, (SELECT COUNT(*) FROM stuff) AS TotalRows. This gives us the total but repeats it for every row in the result set, which is not ideal.
[Method 2]
An interesting alternative is given here (https://web.archive.org/web/20211020111700/https://www.4guysfromrolla.com/articles/032206-1.aspx) using dynamic SQL. He reckons that the performance is better because the CASE statement in the first solution drags things down. Fair enough, and this solution makes it easy to get the totalRows and slap it into an output parameter. But I hate coding dynamic SQL. All that 'bit of SQL ' + STR(#parm1) +' bit more SQL' gubbins.
[Method 3]
The only way I can find to get what I want, without repeating code which would have to be synchronized, and keeping things reasonably readable is to go back to the "old way" of using a table variable:
DECLARE #stuff TABLE (Row INT, ...)
INSERT INTO #stuff
SELECT
CASE
WHEN #SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
WHEN #SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
WHEN #SortBy = ...
ELSE ROW_NUMBER() OVER (ORDER BY whatever)
END AS Row,
.,
.,
.,
FROM Table1
INNER JOIN Table2 ...
LEFT JOIN Table3 ...
WHERE ... (lots of things to check)
SELECT *
FROM stuff
WHERE (Row > #startRowIndex)
AND (Row <= #startRowIndex + #maximumRows OR #maximumRows <= 0)
ORDER BY Row
(Or a similar method using an IDENTITY column on the table variable).
Here I can just add a SELECT COUNT on the table variable to get the totalRows and put it into an output parameter.
I did some tests and with a fairly simple version of the query (no sortBy and no filter), method 1 seems to come up on top (almost twice as quick as the other 2). Then I decided to test probably I needed the complexity and I needed the SQL to be in stored procedures. With this I get method 1 taking nearly twice as long as the other 2 methods. Which seems strange.
Is there any good reason why I shouldn't spurn CTEs and stick with method 3?
UPDATE - 15 March 2012
I tried adapting Method 1 to dump the page from the CTE into a temporary table so that I could extract the TotalRows and then select just the relevant columns for the resultset. This seemed to add significantly to the time (more than I expected). I should add that I'm running this on a laptop with SQL Server Express 2008 (all that I have available) but still the comparison should be valid.
I looked again at the dynamic SQL method. It turns out I wasn't really doing it properly (just concatenating strings together). I set it up as in the documentation for sp_executesql (with a parameter description string and parameter list) and it's much more readable. Also this method runs fastest in my environment. Why that should be still baffles me, but I guess the answer is hinted at in Hogan's comment.
I would most likely split the #SortBy argument into two, #SortColumn and #SortDirection, and use them like this:
…
ROW_NUMBER() OVER (
ORDER BY CASE #SortColumn
WHEN 'Name' THEN Name
WHEN 'OtherName' THEN OtherName
…
END *
CASE #SortDirection
WHEN 'DESC' THEN -1
ELSE 1
END
) AS Row
…
And this is how the TotalRows column could be defined (in the main select):
…
COUNT(*) OVER () AS TotalRows
…
I would definitely want to do a combination of a temp table and NTILE for this sort of approach.
The temp table will allow you to do your complicated series of conditions just once. Because you're only storing the pieces you care about, it also means that when you start doing selects against it further in the procedure, it should have a smaller overall memory usage than if you ran the condition multiple times.
I like NTILE() for this better than ROW_NUMBER() because it's doing the work you're trying to accomplish for you, rather than having additional where conditions to worry about.
The example below is one based off a similar query I'm using as part of a research query; I have an ID I can use that I know will be unique in the results. Using an ID that was an identity column would also be appropriate here, though.
--DECLARES here would be stored procedure parameters
declare #pagesize int, #sortby varchar(25), #page int = 1;
--Create temp with all relevant columns; ID here could be an identity PK to help with paging query below
create table #temp (id int not null primary key clustered, status varchar(50), lastname varchar(100), startdate datetime);
--Insert into #temp based off of your complex conditions, but with no attempt at paging
insert into #temp
(id, status, lastname, startdate)
select id, status, lastname, startdate
from Table1 ...etc.
where ...complicated conditions
SET #pagesize = 50;
SET #page = 5;--OR CAST(#startRowIndex/#pagesize as int)+1
SET #sortby = 'name';
--Only use the id and count to use NTILE
;with paging(id, pagenum, totalrows) as
(
select id,
NTILE((SELECT COUNT(*) cnt FROM #temp)/#pagesize) OVER(ORDER BY CASE WHEN #sortby = 'NAME' THEN lastname ELSE convert(varchar(10), startdate, 112) END),
cnt
FROM #temp
cross apply (SELECT COUNT(*) cnt FROM #temp) total
)
--Use the id to join back to main select
SELECT *
FROM paging
JOIN #temp ON paging.id = #temp.id
WHERE paging.pagenum = #page
--Don't need the drop in the procedure, included here for rerunnability
drop table #temp;
I generally prefer temp tables over table variables in this scenario, largely so that there are definite statistics on the result set you have. (Search for temp table vs table variable and you'll find plenty of examples as to why)
Dynamic SQL would be most useful for handling the sorting method. Using my example, you could do the main query in dynamic SQL and only pull the sort method you want to pull into the OVER().
The example above also does the total in each row of the return set, which as you mentioned was not ideal. You could, instead, have a #totalrows output variable in your procedure and pull it as well as the result set. That would save you the CROSS APPLY that I'm doing above in the paging CTE.
I would create one procedure to stage, sort, and paginate (using NTILE()) a staging table; and a second procedure to retrieve by page. This way you don't have to run the entire main query for each page.
This example queries AdventureWorks.HumanResources.Employee:
--------------------------------------------------------------------------
create procedure dbo.EmployeesByMartialStatus
#MaritalStatus nchar(1)
, #sort varchar(20)
as
-- Init staging table
if exists(
select 1 from sys.objects o
inner join sys.schemas s on s.schema_id=o.schema_id
and s.name='Staging'
and o.name='EmployeesByMartialStatus'
where type='U'
)
drop table Staging.EmployeesByMartialStatus;
-- Populate staging table with sort value
with s as (
select *
, sr=ROW_NUMBER()over(order by case #sort
when 'NationalIDNumber' then NationalIDNumber
when 'ManagerID' then ManagerID
-- plus any other sort conditions
else EmployeeID end)
from AdventureWorks.HumanResources.Employee
where MaritalStatus=#MaritalStatus
)
select *
into #temp
from s;
-- And now pages
declare #RowCount int; select #rowCount=COUNT(*) from #temp;
declare #PageCount int=ceiling(#rowCount/20); --assuming 20 lines/page
select *
, Page=NTILE(#PageCount)over(order by sr)
into Staging.EmployeesByMartialStatus
from #temp;
go
--------------------------------------------------------------------------
-- procedure to retrieve selected pages
create procedure EmployeesByMartialStatus_GetPage
#page int
as
declare #MaxPage int;
select #MaxPage=MAX(Page) from Staging.EmployeesByMartialStatus;
set #page=case when #page not between 1 and #MaxPage then 1 else #page end;
select EmployeeID,NationalIDNumber,ContactID,LoginID,ManagerID
, Title,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours
, CurrentFlag,rowguid,ModifiedDate
from Staging.EmployeesByMartialStatus
where Page=#page
GO
--------------------------------------------------------------------------
-- Usage
-- Load staging
exec dbo.EmployeesByMartialStatus 'M','NationalIDNumber';
-- Get pages 1 through n
exec dbo.EmployeesByMartialStatus_GetPage 1;
exec dbo.EmployeesByMartialStatus_GetPage 2;
-- ...etc (this would actually be a foreach loop, but that detail is omitted for brevity)
GO
I use this method of using EXEC():
-- SP parameters:
-- #query: Your query as an input parameter
-- #maximumRows: As number of rows per page
-- #startPageIndex: As number of page to filter
-- #sortBy: As a field name or field names with supporting DESC keyword
DECLARE #query nvarchar(max) = 'SELECT * FROM sys.Objects',
#maximumRows int = 8,
#startPageIndex int = 3,
#sortBy as nvarchar(100) = 'name Desc'
SET #query = ';WITH CTE AS (' + #query + ')' +
'SELECT *, (dt.pagingRowNo - 1) / ' + CAST(#maximumRows as nvarchar(10)) + ' + 1 As pagingPageNo' +
', pagingCountRow / ' + CAST(#maximumRows as nvarchar(10)) + ' As pagingCountPage ' +
', (dt.pagingRowNo - 1) % ' + CAST(#maximumRows as nvarchar(10)) + ' + 1 As pagingRowInPage ' +
'FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ' + #sortBy + ') As pagingRowNo, COUNT(*) OVER () AS pagingCountRow ' +
'FROM CTE) dt ' +
'WHERE (dt.pagingRowNo - 1) / ' + CAST(#maximumRows as nvarchar(10)) + ' + 1 = ' + CAST(#startPageIndex as nvarchar(10))
EXEC(#query)
At result-set after query result columns:
Note:
I add some extra columns that you can remove them:
pagingRowNo : The row number
pagingCountRow : The total number of rows
pagingPageNo : The current page number
pagingCountPage : The total number of pages
pagingRowInPage : The row number that started with 1 in this page
Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collation?
Example. If I create a test database with collation SQL_Ukrainian_CP1251_CS_AS and then do CHAR(255) it returns я.
If I try the following on a database with SQL_Latin1_General_CP1_CS_AS collation however
SELECT CHAR(255) COLLATE SQL_Ukrainian_CP1251_CS_AS
It returns y
SELECT CHAR(255)
Returns ÿ so it is obviously going first via the database's default collation then trying to find the closest equivalent to that in the explicit collation. Can this be avoided?
Actually I have found an answer to my question now. A bit clunky but does the job unless there's a better way out there?
SET NOCOUNT ON;
CREATE TABLE #Collations
(
code TINYINT PRIMARY KEY
);
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1), --2
E02(N) AS (SELECT 1 FROM E00 a, E00 b), --4
E04(N) AS (SELECT 1 FROM E02 a, E02 b), --16
E08(N) AS (SELECT 1 FROM E04 a, E04 b) --256
INSERT INTO #Collations
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1
FROM E08
DECLARE #AlterScript NVARCHAR(MAX) = ''
SELECT #AlterScript = #AlterScript + '
RAISERROR(''Processing' + name + ''',0,1) WITH NOWAIT;
ALTER TABLE #Collations ADD ' + name + ' CHAR(1) COLLATE ' + name + ';
EXEC(''UPDATE #Collations SET ' + name + '=CAST(code AS BINARY(1))'');
EXEC(''UPDATE #Collations SET ' + name + '=NULL WHERE ASCII(' + name + ') <> code'');
'
FROM sys.fn_helpcollations()
WHERE name LIKE '%CS_AS'
AND name NOT IN /*Unicode Only Collations*/
( 'Assamese_100_CS_AS', 'Bengali_100_CS_AS',
'Divehi_90_CS_AS', 'Divehi_100_CS_AS' ,
'Indic_General_90_CS_AS', 'Indic_General_100_CS_AS',
'Khmer_100_CS_AS', 'Lao_100_CS_AS',
'Maltese_100_CS_AS', 'Maori_100_CS_AS',
'Nepali_100_CS_AS', 'Pashto_100_CS_AS',
'Syriac_90_CS_AS', 'Syriac_100_CS_AS',
'Tibetan_100_CS_AS' )
EXEC (#AlterScript)
SELECT * FROM #Collations
DROP TABLE #Collations
While MS SQL supports both code pages and Unicode unhelpfully it doesn't provide any functions to convert between the two so figuring out what character is represented by a value in a different code page is a pig.
There are two potential methods I've seen to handle conversions, one is detailed here
http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571
and involves bolting a custom conversion program onto the database and using that for conversions.
The other is to construct a db table consisting of
[CodePage], [ANSI Value], [UnicodeValue]
with the unicode value stored as either the int representing the unicode character to be converted using nchar()or the nchar itself
Your using the collation SQL_Ukrainian_CP1251_CS_AS which is code page 1251 (CP1251 from the centre of the string). You can grab its translation table here http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
Its a TSV so after trimming the top off the raw data should import fairly cleanly.
Personally I'd lean more towards the latter than the former especially for a production server as the former may introduce instability.