String manipulation in SQL Server: adding placeholder characters

I'm a little green when it comes to SQL Server string manipulation functions. If I have a string with six characters in it, say:
DECLARE @p_MyStringVariable VARCHAR(100)
SET @p_MyStringVariable = 'FANFFF'
And I want to insert, say, the letter 'M' in the first and seventh positions of the final string and assign that to another VARCHAR variable to read 'MFANFFMF', how can I best do that? And am I correct in reading that SQL Server strings are indexed starting from one, instead of zero? I'm thinking of the SUBSTRING() function, for instance.
(Note that some strings will be up to 100 characters in length, thus the VARCHAR(100) declaration above, even for a six-character string)
Thanks much for your help.

You could also take a look at the STUFF function.
SELECT STUFF(STUFF(@p_MyStringVariable, 1, 0, 'M'), 7, 0, 'M')

Yes, SQL Server indexes character positions in varchar (and similar) values starting at one.
To insert at specific points, use STUFF (see Joe's answer for an example).
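As a minimal sketch of the nested STUFF approach using the variable from the question (`@p_Result` is a made-up name for illustration), the inner call inserts at position 1 and the outer call then works on the already lengthened string:

```sql
-- @p_MyStringVariable comes from the question; @p_Result is a hypothetical name.
DECLARE @p_MyStringVariable VARCHAR(100);
DECLARE @p_Result VARCHAR(100);
SET @p_MyStringVariable = 'FANFFF';

-- Inner STUFF: delete 0 characters at position 1, insert 'M' -> 'MFANFFF'.
-- Outer STUFF: delete 0 characters at position 7, insert 'M' -> 'MFANFFMF'.
SET @p_Result = STUFF(STUFF(@p_MyStringVariable, 1, 0, 'M'), 7, 0, 'M');

SELECT @p_Result AS Result,                      -- MFANFFMF
       SUBSTRING(@p_Result, 1, 1) AS FirstChar;  -- M (positions are 1-based)
```

One thing to watch: STUFF returns NULL when the start position is greater than the length of the input string, so inputs shorter than six characters would need separate handling with this pattern.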

Related

SQL Server: STRING_SPLIT() result in a computed column

I couldn't find good documentation on this, but I have a table that has a long string as one of its columns. Here's some example data of what it looks like:
Hello:Goodbye:Apple:Orange
Example:Seagull:Cake:Chocolate
I would like to create a new computed column using the STRING_SPLIT() function to return the third value in the string table.
Result #1: "Apple"
Result #2: "Cake"
What is the proper syntax to achieve this?
At this time, what you are asking for is not possible with STRING_SPLIT. From the documentation:
The output rows might be in any order. The order is not guaranteed to
match the order of the substrings in the input string.
STRING_SPLIT reference
There is no way to guarantee which item was the third item in the list using string_split and the order may change without warning.
If you're willing to build your own, I'd recommend reading up on the work done by
Brent Ozar and Jeff Moden.
You shouldn't be storing data like that in the first place. This points to a potentially serious database design problem. BUT you could convert this string into JSON by replacing : with ",", surrounding it with [" and "], and retrieving the third array element, e.g.:
declare @value nvarchar(200)='Example:Seagull:Cake:Chocolate'
select json_value('["' + replace(@value,':','","' )+ '"]','$[2]')
The string manipulations convert the string value to :
["Example","Seagull","Cake","Chocolate"]
After that, JSON_VALUE parses the JSON string and retrieves the 3rd item in the array using a JSON PATH expression.
Needless to say, this will be slow and can't take advantage of indexing. If those values are meant to be read or written individually, they should be stored in separate columns. They'll probably take less space than one long string.
If you have a lot of optional fields but only a subset contain values at any time, you could use sparse columns. This way you could have thousands of columns, only a few of which would contain data at any time.
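For the computed column the question asks about, the JSON trick can be placed directly in the column definition. The following is only a sketch, assuming SQL Server 2016 or later (for JSON_VALUE); the table name `dbo.Items` and the column names `Packed` and `ThirdValue` are hypothetical:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE dbo.Items
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    Packed NVARCHAR(200) NOT NULL,
    -- '$[2]' is the third element because JSON arrays are zero-based.
    ThirdValue AS JSON_VALUE('["' + REPLACE(Packed, ':', '","') + '"]', '$[2]')
);

INSERT INTO dbo.Items (Packed)
VALUES (N'Hello:Goodbye:Apple:Orange'),
       (N'Example:Seagull:Cake:Chocolate');

-- ThirdValue is 'Apple' for the first row and 'Cake' for the second.
SELECT Packed, ThirdValue
FROM dbo.Items;
```

Values that contain double quotes or backslashes would not form valid JSON after the REPLACE, so this sketch assumes such characters never occur in the column.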

Unable to return query Thai data

I have a table with NVARCHAR(255) columns that contain both Thai and English text data.
In SSMS I can query the table and return all the rows easy enough. But if I then query specifically for one of the Thai results it returns no rows.
SELECT TOP 1000 [Province]
,[District]
,[SubDistrict]
,[Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]
Returns
Province District SubDistrict Branch
อุตรดิตถ์ ลับแล ศรีพนมมาศ Northern
Bangkok Khlong Toei Khlong Tan SSS1
But this query:
SELECT [Province]
,[District]
,[SubDistrict]
,[Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]
where [Province] LIKE 'อุตรดิตถ์'
Returns no rows.
What do I need to do to get the expected results?
The collation set is Latin1_General_CI_AS.
The data is displayed and inserted with no errors; I just can't search it.
Two problems:
First, the string being passed into the LIKE clause is VARCHAR due to not being prefixed with a capital "N". For example:
SELECT 'อุตรดิตถ์' AS [VARCHAR], N'อุตรดิตถ์' AS [NVARCHAR]
-- ????????? อุตรดิตถ์
What is happening here is that when SQL Server is parsing the query batch, it needs to determine the exact type and value of all literals / constants. So it figures out that 12 is an INT and 12.0 is a NUMERIC, etc. It knows that N'ดิ' is NVARCHAR, which is an all-inclusive character set, so it takes the value as is. BUT, as noted before, 'ดิ' is VARCHAR, which is an 8-bit encoding, which means that the character set is controlled by a Code Page. For string literals and variables / parameters, the Code Page used for VARCHAR data is the Database's default Collation. If there are characters in the string that are not available on the Code Page used by the Database's default Collation, they are either converted to a "best fit" mapping, if such a mapping exists, else they become the default replacement character: ?.
Technically speaking, since the Database's default Collation controls string literals (and variables), and since there is a Code Page for "Thai" (available in Windows Collations), then it would be possible to have a VARCHAR string containing Thai characters (meaning: 'ดิ', without the "N" prefix, would work). But that would require changing the Database's default Collation, and that is A LOT more work than simply prefixing the string literal with "N".
For an in-depth look at this behavior, please see my two-part series:
Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition? (Part A of 2: “Duck”)
Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition? (Part B of 2: “Rabbit”)
Second, you need to add the wildcard characters to both ends:
N'%อุตรดิตถ์%'
The end result will look like:
WHERE [Province] LIKE N'%อุตรดิตถ์%'
EDIT:
I just edited the question to format the "results" to be more readable. It now appears that the following might also work (since no wildcards are being used in the LIKE predicate in the question):
WHERE [Province] = N'อุตรดิตถ์'
EDIT 2:
A string (i.e. something inside of single-quotes) is VARCHAR if there is no "N" prefixed to the string literal. It doesn't matter what the destination datatype is (e.g. an NVARCHAR(255) column). The issue here is the datatype of the source data, and that source is a string literal. And unlike a string in .NET, SQL Server handles 'string' as an 8-bit encoding (VARCHAR; ASCII values 0 - 127 same across all Code Pages, Extended ASCII values 128 - 255 determined by the Code Page, and potentially 2-byte sequences for Double-Byte Character Sets) and N'string' as UTF-16 Little Endian (NVARCHAR; Unicode character set, 2-byte sequences for BMP characters 0 - 65535, two 2-byte sequences for Code Points above 65535). Using 'string' is the same as passing in a VARCHAR variable. For example:
DECLARE @ASCII VARCHAR(20);
SET @ASCII = N'อุตรดิตถ์';
SELECT @ASCII AS [ImplicitlyConverted]
-- ?????????
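To see the effect of the "N" prefix on a search in isolation, here is a small self-contained sketch. It assumes a database whose default collation uses a non-Thai code page (such as the Latin1_General_CI_AS mentioned in the question); the table variable `@Branches` is a hypothetical stand-in for the real table:

```sql
-- Hypothetical table variable standing in for allDistricsBranches.
DECLARE @Branches TABLE (Province NVARCHAR(255));

INSERT INTO @Branches (Province)
VALUES (N'อุตรดิตถ์'), (N'Bangkok');

-- No N prefix: the literal is VARCHAR and goes through the database's code page first.
SELECT COUNT(*) AS WithoutN
FROM @Branches
WHERE Province = 'อุตรดิตถ์';

-- N prefix: the literal is NVARCHAR and keeps the Thai characters.
SELECT COUNT(*) AS WithN
FROM @Branches
WHERE Province = N'อุตรดิตถ์';
```

Under that assumption, the first query should find no rows, because the literal has already become a run of question marks by the time it is compared, while the second should find the Thai row.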
Could be a number of things!
First off, print out the value of the column and your query string in hex.
SELECT CONVERT(varbinary(20), Province) AS [stored], CONVERT(varbinary(20), 'อุตรดิตถ์') AS [query] FROM allDistricsBranches;
This should give you some insight to the problem. I think the most likely cause is the ั, ิ, characters being typed in the wrong sequence. They are displayed as part of the main letter but are stored internally as separate characters.

SQL LIKE Operator doesn't work with Asian Languages (SQL Server 2008)

Dear Friends,
I've run into a problem I never expected. It seems too simple, but I can't find a solution to it.
I have a SQL Server database column of type NVARCHAR that is filled with standard Persian characters. When I run a very simple query on it that uses the LIKE operator, the result set is empty, although I know the search term is present in the table. Here is a very simple example query that doesn't behave correctly:
SELECT * FROM T_Contacts WHERE C_ContactName LIKE '%ف%'
ف is a Persian character, and the C_ContactName column contains multiple entries that contain that character.
Please tell me how should I rewrite the expression or what change should I apply. Note that my database's collation is SQL_Latin1_General_CP1_CI_AS.
Thank you very much
Also, if those values are stored as NVARCHAR (which I hope they are!!), you should always use the N'..' prefix for any string literals to make sure you don't get any unwanted conversions back to non-Unicode VARCHAR.
So you should be searching:
SELECT * FROM T_Contacts
WHERE C_ContactName COLLATE Persian_100_CI_AS LIKE N'%ف%'
Shouldn't it be:
SELECT * FROM T_Contacts WHERE C_ContactName LIKE N'%ف%'
ie, with the N in front of the comparing string, so it treats it like an nvarchar?

How to get number of chars in string in Transact SQL, the "other way"

We faced a very strange issue (really strange for such a mature product):
how to get the number of characters in a Unicode string using Transact-SQL statements.
The key problem is that the LEN() T-SQL function returns the number of characters excluding trailing blanks. The other option is to use DATALENGTH (which returns the number of bytes) and divide by 2 to get the number of Unicode characters. But Unicode characters can be surrogate pairs, so that won't work either.
We have two candidate solutions: the first is to use LEN(REPLACE()), and the second is to append a single character and then subtract 1 from the result. But IMO both variants are rather ugly.
declare @txt nvarchar(10)
set @txt = 'stack '
select @txt as variable,
len(@txt) as lenBehaviour,
DATALENGTH(@txt)/2 as datalengthBehaviour,
len(replace(@txt,' ','O')) as ReplaceBehaviour,
len(@txt+'.')-1 as addAndMinusBehaviour
Any other ideas how to count chars in string with trailing spaces?
I can't leave a comment, so I will have to leave an answer (or shut up).
My vote would be for the addAndMinusBehaviour
I haven't got a good third alternative. There may be some obscure whitespace rules to fiddle with in the options / SET / collation assignment, but I don't know more detail off the top of my head.
But really, addAndMinusBehaviour is probably the easiest to implement, the fastest to execute and, if you document it, fairly maintainable as well.
CREATE FUNCTION [dbo].[ufn_CountChar] ( @pInput VARCHAR(1000), @pSearchChar CHAR(1) )
RETURNS INT
BEGIN
RETURN (LEN(@pInput) - LEN(REPLACE(@pInput, @pSearchChar, '')))
END
GO
My understanding is that DATALENGTH(#txt)/2 should always give you the number of characters. SQL Server stores Unicode characters in UCS-2 which does not support surrogate pairs.
http://msdn.microsoft.com/en-us/library/ms186939.aspx
http://en.wikipedia.org/wiki/UCS2
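The variants discussed in this thread can be compared side by side. The sketch below uses the question's `@txt` variable plus a hypothetical `@emoji` variable; the second query assumes SQL Server 2012 or later, where supplementary-character (`_SC`) collations are available. Note that NVARCHAR values can hold surrogate pairs, in which case DATALENGTH / 2 counts UTF-16 code units rather than characters:

```sql
DECLARE @txt NVARCHAR(10);
SET @txt = N'stack ';   -- five letters plus one trailing space

SELECT LEN(@txt)            AS LenTrimmed,       -- 5: the trailing space is ignored
       DATALENGTH(@txt) / 2 AS Utf16CodeUnits,   -- 6: 12 bytes / 2
       LEN(@txt + N'.') - 1 AS LenWithTrailing;  -- 6: the sentinel protects the space

-- A character outside the BMP occupies two UTF-16 code units (a surrogate pair).
-- @emoji is a hypothetical variable used only for this illustration.
DECLARE @emoji NVARCHAR(10);
SET @emoji = N'😀';

SELECT DATALENGTH(@emoji) / 2 AS Utf16CodeUnits,                      -- 2
       LEN(@emoji COLLATE Latin1_General_100_CI_AS_SC) AS LenWithSC;  -- 1
```

The sentinel approach has one edge to keep in mind: concatenating two non-MAX NVARCHAR values is capped at 4000 characters, so a value that already fills an NVARCHAR(4000) would lose the appended character.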

Do I have to use the N prefix in the "insert into" statement for Unicode?

Like:
insert into table (col) values (N'multilingual unicode strings')
I'm using SQL Server 2008 and I already use nVarChar as the column data type.
You need the N'' syntax only if the string contains characters which are not inside the default code page. "Best practice" is to have N'' whenever you insert into an nvarchar or ntext column.
Yes, you do if you have Unicode characters in the strings.
From books online (http://msdn.microsoft.com/en-us/library/ms191313.aspx)...
"Unicode string constants that appear in code executed on the server, such as in stored procedures and triggers, must be preceded by the capital letter N. This is true even if the column being referenced is already defined as Unicode. Without the N prefix, the string is converted to the default code page of the database. This may not recognize certain characters. The requirement to use the N prefix applies to both string constants that originate on the server and those sent from the client."
It is preferable for compatibility's sake.
Best practice is to use parameterisation in which case you don't need the N prefix.
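To check what actually gets stored with and without the prefix, one option is to insert the same character both ways and compare the bytes. This is a sketch only; the table variable `@Demo` is hypothetical, and the result for the unprefixed row depends on the database's default code page:

```sql
-- Hypothetical table variable; col mirrors the NVARCHAR column in the question.
DECLARE @Demo TABLE (id INT, col NVARCHAR(50));

INSERT INTO @Demo (id, col) VALUES (1, 'ف');    -- VARCHAR literal, converted via the code page
INSERT INTO @Demo (id, col) VALUES (2, N'ف');   -- NVARCHAR literal, stored as typed

-- Compare the stored bytes of the two rows.
SELECT id, col, CONVERT(VARBINARY(10), col) AS StoredBytes
FROM @Demo
ORDER BY id;
```

Row 2 should show 0x4106, the little-endian UTF-16 encoding of U+0641. On a database whose code page has no Persian characters, row 1 will most likely show 0x3F00, which is a stored question mark.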

Resources