mssql: manipulating text (not string) values

I want to run:
len(someTextField)
ltrim(someTextField)
but this doesn't work, because these functions apparently only accept string types, not text.
Are there alternative ways to do the same thing?

Use varchar(max) instead of text on SQL Server 2005 and later.
If you are stuck with text or ntext, you can use DATALENGTH in place of LEN. It behaves differently (no implied RTRIM, and it measures bytes rather than characters), but your options are limited.
There is no LTRIM equivalent for the old data types.
You can CAST the columns though:
CREATE TABLE dbo.TypeTest (OldCol text NULL)
GO
INSERT dbo.TypeTest (OldCol) VALUES ('abcdefg')
INSERT dbo.TypeTest (OldCol) VALUES ('abcdefghijklm')
GO
SELECT LEN(CAST(OldCol AS varchar(max))), LTRIM(CAST(OldCol AS varchar(max))) FROM dbo.TypeTest
GO
DROP TABLE dbo.TypeTest
GO
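
To make the LEN versus DATALENGTH difference mentioned above concrete, here is a rough Python emulation. It is only a sketch: the helper names are made up for illustration, and it assumes a single-byte code page (cp1252) for text and UTF-16 for ntext.

```python
def sql_len(value: str) -> int:
    """Approximate T-SQL LEN: character count, ignoring trailing spaces."""
    return len(value.rstrip(" "))


def sql_datalength(value: str, unicode_column: bool = False) -> int:
    """Approximate T-SQL DATALENGTH: bytes stored, trailing spaces included.

    Assumption: text uses a single-byte code page (cp1252 here),
    while ntext stores two bytes per character (UTF-16).
    """
    encoding = "utf-16-le" if unicode_column else "cp1252"
    return len(value.encode(encoding))


sample = "abcdefg   "  # seven letters followed by three trailing spaces
print(sql_len(sample))                              # 7
print(sql_datalength(sample))                       # 10
print(sql_datalength(sample, unicode_column=True))  # 20
```

So on an ntext column, DATALENGTH has to be divided by 2 to get a character count, and trailing spaces are always counted.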

Related

Is there a way to auto truncate column values while bulk insert in snowflake

When inserting long values in Snowflake, we get this error:
String is too long and would be truncated.
Is there any way to tell Snowflake to truncate automatically and proceed with the insert?
I am using:
INSERT INTO TABLE_1 VALUES (SELECT * FROM TABLE_2)
I found this article, but its approach does not work with Snowflake: (https://www.mytecbits.com/microsoft/sql-server/avoid-error-string-or-binary-data-would-be-truncated )
SET ANSI_WARNINGS OFF
INSERT INTO Table_A VALUES ('long value. ....');
SET ANSI_WARNINGS ON
I can use the left() or substring() function, but I wanted to know if there is any other way.

If you use COPY (which is much more performant for batch inserts) instead of INSERT INTO, you can use the TRUNCATECOLUMNS parameter.
COPY INTO:
TRUNCATECOLUMNS
Boolean that specifies whether to truncate text strings that exceed the target column length:
If TRUE, strings are automatically truncated to the target column length.
If FALSE, the COPY statement produces an error if a loaded string exceeds the target column length.
This copy option supports CSV data, as well as string values in semi-structured data when loaded into separate columns in relational tables.

text encodings in .net, sql server processing

I have an application that reads terms from a DB into a list of strings. The DB column was set up as nvarchar so that it can hold foreign characters. In some cases characters like ä come through correctly when I read the terms from the DB, and they also display correctly in the table.
When importing Japanese or Arabic characters, all I see is ????????.
I have tried converting the data in different ways: first by converting it to UTF-8 and back, and second by using HttpUtility.HtmlEncode, which works perfectly for these characters but also converts quotes and other characters that I don't want changed.
I told the DB designer that he needs to fix something on his side, but am I wrong to expect that the DB should display all these characters and make it easy to query them and add them to my search list? If not, is there a consistent way of getting all international characters to display correctly in SQL and VB.NET?
I know that when I read from text files, I just used the Microsoft.VisualBasic.FileIO.TextFieldParser reader with the encoding set to UTF-8, and this was not an issue.

If the database field is nvarchar, then it will store data correctly, as you have seen.
Somewhere before it gets to the database, the data is being lost or changed to varchar: stored procedure, parameters, file encoding, ODBC translation, etc.
DECLARE @foo nvarchar(100), @foo2 varchar(100)
-- with Arabic and Japanese and a proper N literal
SELECT @foo = N'العربي 日本語', @foo2 = N'العربي 日本語'
SELECT @foo, @foo2 -- gives العربي 日本語 for @foo, ?????? ??? for the varchar @foo2
-- now a varchar literal
SELECT @foo = 'العربي 日本語', @foo2 = 'العربي 日本語'
SELECT @foo, @foo2 -- gives ?????? ??? for both
-- from my Swiss German keyboard. These are part of my code page.
SELECT @foo = 'öéäàüè', @foo2 = 'öéäàüè'
SELECT @foo, @foo2 -- gives öéäàüè for both
So, apologise to the nice DB monkey... :-)
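
The same loss can be reproduced outside the database. The following Python sketch forces text through a single-byte code page, which is roughly what happens when a value passes through varchar on the way in. The function name is hypothetical, and cp1252 is an assumed code page; the real one depends on the collation.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent are replaced with '?',
    similar to converting Unicode data to varchar.
    """
    encoded = text.encode(code_page, errors="replace")
    return encoded.decode(code_page)


# Arabic and Japanese are not in cp1252, so each character becomes '?'.
print(to_code_page("العربي 日本語"))

# These accented Latin characters are part of cp1252 and survive unchanged.
print(to_code_page("öéäàüè"))
```

Once a character has been replaced with '?', the original is gone, so converting back to UTF-8 or nvarchar afterwards cannot recover it.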

Always use NVARCHAR or NTEXT to store foreign characters.
You cannot store Unicode in the varchar or text data types.
Also put an N before the string value, like this:
UPDATE [USER]
SET Name = N'日本語'
WHERE ID = XXXX;

How to Show Eastern Letter(Chinese Character) on SQL Server/SQL Reporting Services?

I need to insert Chinese characters into my database, but they always show up as ????.
Example:
Insert this record:
微波室外单元-Apple
Then it becomes ???.
Result:
??????-Apple
I really need help. Thanks in advance.
I am using MSSQL Server 2008.

Make sure you specify a unicode string with a capital N when you insert like:
INSERT INTO Table1 (Col1) SELECT N'微波室外单元-Apple' AS [Col1]
and that Table1 (Col1) is an NVARCHAR data type.

Make sure the column you're inserting to is nchar, nvarchar, or ntext. If you insert a Unicode string into an ANSI column, you really will get question marks in the data.
Also, be careful to check that when you pull the data back out you're not just seeing a client display problem but are actually getting the question marks back:
SELECT Unicode(YourColumn), YourColumn FROM YourTable
Note that the Unicode function returns the code of only the first character in the string.
Once you've determined whether the column is really storing the data correctly, post back and we'll help you more.
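
The check above relies on the fact that a stored question mark has code 63, while a real Chinese character has a much larger code. A small Python sketch of the same idea, with hypothetical helper names, applied to a value after it has been read back by the client:

```python
def first_code_point(value: str) -> int:
    """Return the code point of the first character, like UNICODE() in T-SQL."""
    return ord(value[0])


def looks_lost(value: str) -> bool:
    """True if the first character is a literal question mark (code 63)."""
    return first_code_point(value) == ord("?")


print(looks_lost("??????-Apple"))        # the data itself holds question marks
print(looks_lost("微波室外单元-Apple"))   # the data is intact
```

If the value comes back with a code other than 63 but still displays as question marks, the problem is in the client's font or display settings rather than in the stored data.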

Try adding the appropriate languages to your Windows locale settings. You'll have to make sure your development machine is set to display non-Unicode characters in the appropriate language.
And of course you need to use nvarchar for foreign-language fields.

Make sure that the database uses an encoding that supports these characters. UTF-8 is the de facto standard encoding: it is ASCII compatible but can represent all 1,114,112 Unicode code points.

SELECT 'UPDATE table SET msg=UNISTR('''||ASCIISTR(msg)||''') WHERE id='''||id||'''' FROM table WHERE id = '123344556';

SQL Server Text Datatype Maxlength = 65,535?

Software I'm working with uses a text field to store XML. From my searches online, the text datatype is supposed to hold 2^31 - 1 characters. Currently SQL Server is truncating the XML at 65,535 characters every time. I know this is caused by SQL Server, because if I add a 65,536th character to the column directly in Management Studio, it states that it will not update because characters will be truncated.
Is the max length really 65,535 or could this be because the database was designed in an earlier version of SQL Server (2000) and it's using the legacy text datatype instead of 2005's?
If this is the case, will altering the datatype to Text in SQL Server 2005 fix this issue?

That is a limitation of SSMS, not of the text data type, but you should use varchar(max) since text is deprecated.
Here is a quick test:
create table TestLen (bla text)
insert TestLen values (replicate(convert(varchar(max),'a'), 100000))
select datalength(bla)
from TestLen
Returns 100000 for me

MSSQL 2000 should allow up to 2^31 - 1 characters (non unicode) in a text field, which is over 2 billion. Don't know what's causing this limitation but you might wanna try using varchar(max) or nvarchar(max). These store as many characters but allow also the regular string T-SQL functions (like LEN, SUBSTRING, REPLACE, RTRIM,...).

If you're able to convert the column, you might as well, since the text data type will be removed in a future version of SQL Server. See here.
The recommendation is to use varchar(MAX) or nvarchar(MAX). In your case, you could also use the XML data type, but that may tie you to certain database engines (if that's a consideration).

You should have a look at:
XML Support in Microsoft SQL Server 2005
Beginning SQL Server 2005 XML Programming
So I would rather try to use the data type appropriate for the use. Not make a datatype fit your use from a previous version.

Here's a little script I wrote for getting all the data out:
DECLARE @data NVARCHAR (MAX) = N'huge data';
DECLARE @readSentence NVARCHAR (MAX) = N'';
DECLARE @dataLength INT = ( SELECT LEN (@data));
DECLARE @currIndex INT = 1; -- SUBSTRING positions are 1-based
WHILE @data <> @readSentence
BEGIN
DECLARE @temp NVARCHAR (MAX) = N'';
SET @temp = ( SELECT SUBSTRING (@data, @currIndex, 65535));
SELECT @temp; -- output one chunk of at most 65535 characters
SET @readSentence += @temp;
SET @currIndex += 65535;
END;

Sql Server XML-type column duplicate entry detection

In Sql Server I am using an XML type column to store a message. I do not want to store duplicate messages.
I only will have a few messages per user. I am currently querying the table for these messages, converting the XML to string in my C# code. I then compare the strings with what I am about to insert.
Unfortunately, Sql Server pretty-prints the data in the XML typed fields. What you store into the database is not necessarily exactly the same string as what you get back out later. It is functionally equivalent, but may have white space removed, etc.
Is there an efficient way to compare an XML string that I am considering inserting with those that are already in the database? As an aside, if I detect a duplicate I need to delete the older message then insert the replacement.

0 - Add a hash column to your table.
1 - When you receive a new message, convert the whole XML to uppercase, remove all blanks and carriage returns/line feeds, then compute the hash value of the normalized string.
2 - Check whether you already have a row with the resulting hash code in it.
If yes, it is a duplicate; treat it accordingly.
If not, store the original XML along with the hash in a new row.
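
The steps above could be sketched on the client side as follows. This is only an illustration in Python: the function names are hypothetical, SHA-256 is an assumed choice of hash, and the set of stored hashes stands in for a lookup against the hash column.

```python
import hashlib


def normalized_hash(xml_text: str) -> str:
    """Uppercase the XML, strip all whitespace, and hash the result."""
    normalized = "".join(xml_text.upper().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_duplicate(new_xml: str, stored_hashes: set) -> bool:
    """Check the new message's hash against hashes already in the table."""
    return normalized_hash(new_xml) in stored_hashes


stored = {normalized_hash("<message>You like apples</message>")}

# Same message, reformatted with extra line breaks and indentation.
reformatted = "<message>\n  You like apples\n</message>\n"
print(is_duplicate(reformatted, stored))
print(is_duplicate("<message>You like oranges</message>", stored))
```

Note that this normalization also ignores case and whitespace inside element content, so two messages that differ only in those respects are treated as duplicates.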

I'm not 100% sure on your exact implementation but here is something I played around with. The idea being a stored procedure would do the inserting. Inserting into the messages table does a basic check on existing messages (SQL 2008 syntax):
declare @messages table (msg xml)
insert into @messages values
('<message>You like oranges</message>')
,('<message>You like apples</message>')
declare @newMessage xml = '<message>You like apples</message>'
insert into @messages (msg)
select @newMessage
where @newMessage.value('(message)[1]', 'nvarchar(50)') not in (
select msg.value('(message)[1]', 'nvarchar(50)')
from @messages
)

One solution is to stop using the XML typed field. Store the XML string into a varchar typed field.
I don't really like this solution, but I don't really like p.marino's solution either. It doesn't seem right to store a hash of something that is already in the row in the table.

What if you use OPENXML on each row in the table and query the actual XML information for key nodes and/or key attributes? But then you need to do it row by row, I don't think OPENXML works with a whole set of table rows.