I have a view that is defined like so:
CREATE VIEW dbo.v_ListingTestView
WITH SCHEMABINDING
AS
SELECT
[ListingID],
[BusinessName],
[Description],
[ProductDescription],
[Website],
[ListingTypeID],
ISNULL([BusinessName], '') + ' ' + ISNULL([Description], '') + ' ' + ISNULL([ProductDescription], '') + ' ' AS [ComputedText]
FROM dbo.Listings;
I use this view when a user searches for a record in the database. The keyword the user provides when searching is compared to the computed column. The columns that are included in the computed column are all NVARCHAR columns. I would like to create an index on this column to help speed up searching.
I was following a tutorial to add an index to the computed column, but I ran into an issue where my computed column was non-deterministic and could not complete the tutorial. If anyone has suggestions on how to accomplish this I would appreciate it. Or if I should go about this a different way.
What length are your columns? ISNULL() is a deterministic function and shouldn't cause any issues.
I'm guessing that's length of your columns that causes it. As you might know, index cannot be longer than 900 symbols. Since you're storing them as NVARCHAR, it takes double space as a varchar and needs additional two bytes - this means your index can store up to 448 symbols.
Variable-length Unicode string data. n defines the string length and
can be a value from 1 through 4,000. max indicates that the maximum
storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is
two times the actual length of data entered + 2 bytes. The ISO
synonyms for nvarchar are national char varying and national character
varying.
So please try to cast your column at specific length, perhaps 400?
CREATE VIEW dbo.v_ListingTestView
WITH SCHEMABINDING
AS
SELECT [ListingID]
, [BusinessName]
, [Description]
, [ProductDescription]
, [Website]
, [ListingTypeID]
, CAST(ISNULL([BusinessName], '') + ' ' + ISNULL([Description], '') + ' ' + ISNULL([ProductDescription], '') AS NVARCHAR(400)) AS [ComputedText]
FROM dbo.Listings;
On top of that, why you'd be creating a view on top of that? Perhaps it would make more sense to alter your table accordingly and only then add index on it:
ALTER TABLE dbo.Listings
ADD [ComputedText] AS CAST(ISNULL([BusinessName], '') + ' ' + ISNULL([Description], '') + ' ' + ISNULL([ProductDescription], '') AS NVARCHAR(400));
Now knowing that your NVARCHARS are stored as MAX, perhaps it would be a better idea to start using Full Text Search? This question might be worth looking at then: How do you implement a fulltext search over multiple columns in sql server?
Related
I have a lot of questions from a survey im using for a pivot table. To collect all the questions to my pivot dynamically im using stuff and for xml path. However it seems like question text > 130 in length is not showing.
And i can select all the columns from my cte Questions, so I know the data is there.
UPDATE: If I select my output, my total length is around 8.000 could it be something about the nvarchar(max) not storing more than 8.000 even though it should be able to store around 2gb?
What am I doing wrong?
SELECT QuestionList = cast(STUFF((
SELECT ',' + QUOTENAME(cast(question AS NVARCHAR(max)))
FROM questions
ORDER BY [AgpdbQuestionID]
FOR XML PATH('')
), 1, 1, '') AS NVARCHAR(max))
This is because of QUOTENAME, if input is larger than 128 it returns NULL because it is supposed to handle sysname, not (N)VARCHAR:
"character_string is sysname and is limited to 128 characters. Inputs greater than 128 characters return NULL."
Instead try:
SELECT QuestionList = cast(STUFF((
SELECT ',' + '[' + (cast(question AS NVARCHAR(max)) + ']')
FROM (
VALUES (REPLICATE('a', 130))
)q(question)
FOR XML PATH('')
), 1, 1, '') AS NVARCHAR(max))
Just as another way of achieving this. This method achieves the same without using XML so that you aren't restricted to certain characters. It itterates through your table, building the string with each row, with the last instance being set to your variable #QuestionList.
Declare #QuestionList AS NVARCHAR(max)
SELECT
#QuestionList = isnull(#QuestionList + ', ', '') + question
FROM
questions
ORDER BY
AgpdbQuestionID
It is important to use the isnull, as this achives ommiting the first comma when the existing string is null.
I'd be intregeagued to see how efficient this is compared to the XML method, but this has been useful for myself when I've needed ceratin characters like >, <, " and '
I have a data set of about 33 million rows and 20 columns. One of the columns is a raw data tab I'm using to extract relevant data from, inlcuding ID's and account numbers.
I extracted a column for User ID's into a temporary table to trim the User ID's of spaces. I'm now trying to add the trimmed User ID column back into the original data set using this code:
SELECT *
FROM [dbo].[DATA] AS A
INNER JOIN #TempTable AS B ON A. [RawColumn] = B. [RawColumn]
Extracting the User ID's and trimming the spaces took about a minute for each query. However, running this last query I'm at the 2 hour mark and I'm only 2% of the way through the dataset.
Is there a better way to run the query?
I'm running the query in SQL Server 2014 Management Studio
Thanks
Update:
I continued to let it run through the night. When I got back into work, only 6 million rows had been completed of the 33 million rows. I cancelled the execution and I'm trying to add a smaller primary key (The only other key I could see on the table was the [RawColumn], which was a very long string of text) using:
ALTER TABLE [dbo].[DATA]
ADD ID INT IDENTITY(1,1)
Right now I'm an hour into the execution.
Next, I'm planning to make it the primary key using
ALTER TABLE dbo.[DATA]
ADD CONSTRAINT PK_[DATA] PRIMARY KEY(ID)
I'm not familiar with using Indexes.. I've tried looking up on Stack Overflow how to create one, but from what I'm reading it sounds like it would take just as long to create an index as it would to run this query. Am I wrong about that?
For context on the RawColumn data, it looks something like this:
FirstName: John LastName: Smith UserID: JohnS Account#: 000-000-0000
Update #2:
I'm now learning that using "ALTER TABLE" is a bad idea. I should have done a little bit more research into how to add a primary key to a table.
Update #3
Here's the code I used to extract the "UserID" code out of the "RawColumn" data.
DROP #TEMPTABLE1
GO
SELECT [RAWColumn],
SUBSTRING([RAWColumn], CHARINDEX('USERID:', [RAWColumn])+LEN('USERID:'), CHARINDEX('Account#:', [RAWColumn])-Charindex('Username:', [RAWColumn]) - LEN('Account#:') - LEN('USERID:')) AS 'USERID_NEW'
INTO #TempTable1
FROM [dbo].[DATA]
Next I trimmed the data from the temporary tables
DROP #TEMPTABLE2
GO
SELECT [RawColumn],
LTRIM([USERID_NEW]) AS 'USERID_NEW'
INTO #TempTable2
FROM #TempTable1
So now I'm trying to get the data from #TEMPTABLE2 back into my original [DATA] table. Hopefully this is more clear now.
So I think your parsing code is a little bit wrong. Here's an approach that doesn't assume that the values appear in any particular order. It does assume that the header/tag name has a space after the colon character and it assumes that the value end at the subsequent space character. Here's a snippet that manipulates a single value.
declare #dat varchar(128) = 'FirstName: John LastName: Smith UserID: JohnS Account#: 000-000-0000';
declare #tag varchar(16) = 'UserID: ';
/* datalength() counts the trailing space character unlike len() */
declare #idx int = charindex(#tag, #dat) + datalength(#tag);
select substring(#dat, #idx, charindex(' ', #dat + ' ', #idx + 1) - #idx) as UserID
To use it in a single query without the temporary variable, the most straightforward approach is to just replace each instance of "#idx" with the original expression:
declare #tag varchar(16) = 'UserID: ';
select RawColumn,
substring(
RawColumn,
charindex(#tag, RawColumn) + datalength(#tag),
charindex(
' ', RawColumn + ' ',
charindex(#tag, RawColumn) + datalength(#tag) + 1
) - charindex(#tag, RawColumn) + datalength(#tag)
) as UserID
from dbo.DATA;
As an update it looks something like this:
declare #tag varchar(16) = 'UserID: ';
update dbo.DATA
set UserID =
substring(
RawColumn,
charindex(#tag, RawColumn) + datalength(#tag),
charindex(
' ', RawColumn + ' ',
charindex(#tag, RawColumn) + datalength(#tag) + 1
) - charindex(#tag, RawColumn) + datalength(#tag)
) as UserID;
You also appear to be ignoring upper/lower case in your string matches. It's not clear to me whether you need to consider that more carefully.
I have a column with data like this:
COMPRESSIN 1(PB4),
COMPRESSIN 12(PB4),
COMPRESSIN 3(PB4).
I want to order the column by the COMPRESSION. So 1, 3, and 12 respectively as shown below:
COMPRESSION 1(PB4),
COMPRESSION 3(PB4),
COMPRESSION 12(PB4).
You need to order by a substring.
...
order by
substring(yourColumn,12,99)
This basically sorts on everything after 'COMPRESSION ' It's important to know that this is still sorting based on varchar order, which is using unicode decimal values. Thus, b comes before A due to it's case sensitivity, etc...
If your columns all end with the (XXX) pattern, you can use this to circumvent this.
order by
left(substring(yourColumn,12,99),len(substring(yourColumn,12,99)) - 5)
Another way you can do this is to remove the non-numeric portion while ordering:
ORDER BY CAST(
REPLACE(
REPLACE(YourColumn, 'COMPRESSION ','')
, '(PB4)','')
AS int) ASC
Here is a potential solution or at least one that can help put you in the right direction. I recommend you consider the clarity of your question and add more detail.
SELECT
ColumnName
FROM
TableName
ORDER BY
CAST(SUBSTRING(ColumnName, CHARINDEX(' ', ColumnName) + 1, CHARINDEX('(', ColumnName) - CHARINDEX(' ', ColumnName) - 1) AS int)
This works by using CHARINDEX to find the space after compression as a starting position (+1 to begin after the space) and the open parenthesis as the ending position to isolate the numeric value for sorting. You also need to convert this value to some type of numeric value like an integer.
I have a field in my table which has multiple reason codes concatenated in 1 column.
e.g. 2 records
Reason_Codes
Record1: 001,002,004,009,010
Record2: 001,003,005,006
In my SSRS report the user will be searching for data using one of the above reason codes. e.g.
001 will retrieve both records.
005 will retrieve the second record
and so on.
Kindly advise how this can be achieved using SQL or Stored Procedure.
Many thanks.
If you are just passing in a single Reason Code to search on, you don't even need to bother with splitting the comma-separated list: you can just use a LIKE clause as follows:
SELECT tb.field1, tb.field2
FROM SchemaName.TableName tb
WHERE ',' + tb.Reason_Codes + ',' LIKE '%,' + #ReasonCode + ',%';
Try the following to see:
DECLARE #Bob TABLE (ID INT IDENTITY(1, 1) NOT NULL, ReasonCodes VARCHAR(50));
INSERT INTO #Bob (ReasonCodes) VALUES ('001,002,004,009,010');
INSERT INTO #Bob (ReasonCodes) VALUES ('001,003,005,006');
DECLARE #ReasonCode VARCHAR(5);
SET #ReasonCode = '001';
SELECT tb.ID, tb.ReasonCodes
FROM #Bob tb
WHERE ',' + tb.ReasonCodes + ',' LIKE '%,' + #ReasonCode + ',%';
-- returns both rows
SET #ReasonCode = '005';
SELECT tb.ID, tb.ReasonCodes
FROM #Bob tb
WHERE ',' + tb.ReasonCodes + ',' LIKE '%,' + #ReasonCode + ',%';
-- returns only row #2
I have blogged about something like this a long time ago. May be this will help: http://dotnetinternal.blogspot.com/2013/10/comma-separated-to-temp-table.html
The core solution would be to convert the comma separated values into a temporary table and then do a simple query on the temporary table to get your desired result.
Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collation?
Example. If I create a test database with collation SQL_Ukrainian_CP1251_CS_AS and then do CHAR(255) it returns я.
If I try the following on a database with SQL_Latin1_General_CP1_CS_AS collation however
SELECT CHAR(255) COLLATE SQL_Ukrainian_CP1251_CS_AS
It returns y
SELECT CHAR(255)
Returns ÿ so it is obviously going first via the database's default collation then trying to find the closest equivalent to that in the explicit collation. Can this be avoided?
Actually I have found an answer to my question now. A bit clunky but does the job unless there's a better way out there?
SET NOCOUNT ON;
CREATE TABLE #Collations
(
code TINYINT PRIMARY KEY
);
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1), --2
E02(N) AS (SELECT 1 FROM E00 a, E00 b), --4
E04(N) AS (SELECT 1 FROM E02 a, E02 b), --16
E08(N) AS (SELECT 1 FROM E04 a, E04 b) --256
INSERT INTO #Collations
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1
FROM E08
DECLARE #AlterScript NVARCHAR(MAX) = ''
SELECT #AlterScript = #AlterScript + '
RAISERROR(''Processing' + name + ''',0,1) WITH NOWAIT;
ALTER TABLE #Collations ADD ' + name + ' CHAR(1) COLLATE ' + name + ';
EXEC(''UPDATE #Collations SET ' + name + '=CAST(code AS BINARY(1))'');
EXEC(''UPDATE #Collations SET ' + name + '=NULL WHERE ASCII(' + name + ') <> code'');
'
FROM sys.fn_helpcollations()
WHERE name LIKE '%CS_AS'
AND name NOT IN /*Unicode Only Collations*/
( 'Assamese_100_CS_AS', 'Bengali_100_CS_AS',
'Divehi_90_CS_AS', 'Divehi_100_CS_AS' ,
'Indic_General_90_CS_AS', 'Indic_General_100_CS_AS',
'Khmer_100_CS_AS', 'Lao_100_CS_AS',
'Maltese_100_CS_AS', 'Maori_100_CS_AS',
'Nepali_100_CS_AS', 'Pashto_100_CS_AS',
'Syriac_90_CS_AS', 'Syriac_100_CS_AS',
'Tibetan_100_CS_AS' )
EXEC (#AlterScript)
SELECT * FROM #Collations
DROP TABLE #Collations
While MS SQL supports both code pages and Unicode unhelpfully it doesn't provide any functions to convert between the two so figuring out what character is represented by a value in a different code page is a pig.
There are two potential methods I've seen to handle conversions, one is detailed here
http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571
and involves bolting a custom conversion program onto the database and using that for conversions.
The other is to construct a db table consisting of
[CodePage], [ANSI Value], [UnicodeValue]
with the unicode value stored as either the int representing the unicode character to be converted using nchar()or the nchar itself
Your using the collation SQL_Ukrainian_CP1251_CS_AS which is code page 1251 (CP1251 from the centre of the string). You can grab its translation table here http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
Its a TSV so after trimming the top off the raw data should import fairly cleanly.
Personally I'd lean more towards the latter than the former especially for a production server as the former may introduce instability.