SQL Server - trying to convert column to XML fails

I'm in the process of importing data from a legacy MySQL database into SQL Server 2005.
I have one table in particular that's causing me grief. I've imported it from MySQL using a linked server and the MySQL ODBC driver, and I end up with this:
Col Name          Datatype  MaxLen
OrderItem_ID      bigint    8
PDM_Structure_ID  int       4
LastModifiedDate  datetime  8
LastModifiedUser  varchar   20
CreationDate      datetime  8
CreationUser      varchar   20
XMLData           text      -1
OrderHeader_ID    bigint    8
Contract_Action   varchar   10
ContractItem      int       4
My main focus is on the XMLData column - I need to clean it up and make it so that I can convert it to an XML datatype to use XQuery on it.
So I set the table option "large value types out of row" to 1:
EXEC sp_tableoption 'OrderItem', 'large value types out of row', 1
and then I go ahead and convert XMLData to VARCHAR(MAX) and do some cleanup of the XML stored in that field. All fine so far.
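For reference, that conversion step is just an ALTER (a minimal sketch; the XML cleanup itself depends on the data):
ALTER TABLE dbo.OrderItem
ALTER COLUMN XMLData VARCHAR(MAX)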
But when I now try to convert that column to XML datatype:
ALTER TABLE dbo.OrderItem
ALTER COLUMN XMLData XML
I get this message here:
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8077 which is greater than the allowable maximum row size of 8060. The statement has been terminated.
which is rather surprising, seeing that the columns besides XMLData only make up roughly 90 bytes, and I specifically instructed SQL Server to store all "large data" off-row.
So why does SQL Server refuse to convert that column to the XML datatype? Any ideas? Anything I can check or change in my approach?
Update: I don't know what changed, but on a second attempt to import the raw data from MySQL into SQL Server, I was able to convert that NTEXT -> VARCHAR(MAX) column to XML in the end. Odd, but it works now. Thanks everyone for your input and recommendations, highly appreciated!

If you have sufficient storage space, you could try selecting from the VARCHAR(MAX) version of the table into a new table with the same schema but with XMLData set up as XML - either using SELECT INTO or by explicitly creating the table before you begin.
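A minimal sketch of the SELECT INTO variant, using the column names from the question (the CAST performs the conversion on the way in; the target table name is illustrative):
SELECT OrderItem_ID, PDM_Structure_ID, LastModifiedDate, LastModifiedUser,
       CreationDate, CreationUser,
       CAST(XMLData AS xml) AS XMLData,
       OrderHeader_ID, Contract_Action, ContractItem
INTO dbo.OrderItem_Converted
FROM dbo.OrderItem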
PS - it's a side issue unrelated to your problem, but you might want to check that you're not losing Unicode characters in the original MySQL XMLData field by this conversion since the text/varchar data types won't support them.

Can you ADD a new column of type xml?
If so, add the new xml column, update the table to set the new column equal to the XmlData column and then drop the XmlData column.
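Something along these lines (a sketch using the table and column names from the question; the temporary column name is illustrative):
ALTER TABLE dbo.OrderItem ADD XMLData_new xml NULL
GO
UPDATE dbo.OrderItem SET XMLData_new = CAST(XMLData AS xml)
GO
ALTER TABLE dbo.OrderItem DROP COLUMN XMLData
GO
EXEC sp_rename 'dbo.OrderItem.XMLData_new', 'XMLData', 'COLUMN'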
Edit
I have a table "TestTable" with an nvarchar(max) column.
select * from sys.tables where name = 'TestTable'
This gives a result containing:
[lob_data_space_id]  [text_in_row_limit]  [large_value_types_out_of_row]
1                    0                    0
yet I can happily save 500k characters in my nvarchar(max) field.
What do you get if you query sys.tables for your OrderItems table?
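For example (assuming the table is named OrderItems):
SELECT name, lob_data_space_id, text_in_row_limit, large_value_types_out_of_row
FROM sys.tables
WHERE name = 'OrderItems'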
If your [text_in_row_limit] is not zero, try this, which should convert any existing in-row strings into BLOBs:
exec sp_tableoption 'OrderItems', 'text in row', 0
and then try to switch from nvarchar(max) to xml.
From BOL:
Disabling the text in row option or reducing the limit of the option will require the conversion of all BLOBs; therefore, the process can be long, depending on the number of BLOB strings that must be converted. The table is locked during the conversion process.

Related

How to cast a text column into integer column in MS SQL Express?

I am using Microsoft SQL Express and SQL Server Management Studio.
I am following a tutorial to create a small table from scratch and enter some values, as per the code below. The tutorial teaches how to correctly cast a column if it was declared with the wrong type in the first place.
CREATE TABLE transactions(
transaction_date date,
amount integer,
fee text
);
SELECT * FROM transactions;
INSERT INTO transactions (transaction_date, amount, fee)
VALUES ('2018-09-24', 5454, '30');
The 'fee' column was wrongly created as text. I am trying to cast this column to integer using the code below, but it gives the following error. Any suggestions?
SELECT transaction_date, amount + CAST (fee AS integer) AS net_amount
FROM transactions;
Explicit conversion from data type text to int is not allowed.
The error is telling you the problem here; you can't explicitly (or implicitly) convert/cast a text value to an int. text has been deprecated for 16 years, so you should not be using it. It was replaced by varchar(MAX) way back in 2005 (along with nvarchar(MAX) for ntext and varbinary(MAX) for image).
Instead, you'll need to convert the value to a varchar first, and then an int. I also recommend using TRY_CONVERT for the latter, as a value like '3.0' will fail to convert:
SELECT TRY_CONVERT(int,CONVERT(varchar(MAX),fee))
FROM dbo.transactions;
Of course, what you should really be doing is fixing the table:
ALTER TABLE dbo.transactions ADD TextFee varchar(MAX) NULL; -- to retain any data that couldn't be converted
GO
UPDATE dbo.transactions
SET fee = CONVERT(varchar(MAX), TRY_CONVERT(int, CONVERT(varchar(MAX), fee))), -- int can't be assigned to a text column directly, so convert the result back to varchar
    TextFee = fee; -- the right-hand side sees the pre-update value, so this preserves the original text
GO
ALTER TABLE dbo.transactions ALTER COLUMN fee int;
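Afterwards, the rows that couldn't be converted are easy to find (an illustrative check based on the columns above):
SELECT * FROM dbo.transactions WHERE fee IS NULL AND TextFee IS NOT NULL;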

String variable is truncated when inserted into an nvarchar(max) column

I take XML from a webservice and it's very long (> 1 million characters). I put the XML in an SSIS variable.
I want to put the raw XML from the variable into a SQL Server 2012 table. Table column is nvarchar(max).
From a SQL task I use a simple
Insert (xml) values (@variable)
However, when I look at the column length in SQL Server, only 500k chars are there!
Why is this?

How to query for rows containing <Unable to read data> in a column?

I have a SQL table in which some columns, when viewed in SQL Server Manager, contain <Unable to read data>. Does anyone know how to query for <Unable to read data>? I can individually modify the data in this column with update table set column = NULL where key = 'value', but how can I find whether additional rows exist with this bad data?
I would recommend against replacing the data. There is nothing wrong with it; it's just that SSMS cannot display it properly in the Edit panel. From your description, the data in the database itself is perfectly fine.
This script shows the problem:
create table test (id int not null identity(1,1) primary key,
large_value numeric(38,0));
go
insert into test (large_value) values (1);
insert into test (large_value) values (12345678901234567890123456789012345678);
insert into test (large_value) values (1234567890123456789012345678901234567);
insert into test (large_value) values (123456789012345678901234567890123456);
insert into test (large_value) values (12345678901234567890123456789012345);
insert into test (large_value) values (1234567890123456789012345678901234);
insert into test (large_value) values (123456789012345678901234567890123);
insert into test (large_value) values (12345678901234567890123456789012);
insert into test (large_value) values (1234567890123456789012345678901);
insert into test (large_value) values (123456789012345678901234567890);
insert into test (large_value) values (12345678901234567890123456789);
insert into test (large_value) values (NULL);
go
select * from test;
go
The SELECT will work fine, but opening Edit Top 200 Rows in Object Explorer will not: the out-of-range values display as <Unable to read data>.
There is a Connect item for this issue. SSMS 2012 still exhibits the same problem.
If we look at the numeric and decimal storage details, we'll see that the problem occurs at a weird boundary, at precision 29, which is actually not a SQL Server boundary (precision 28 is):
Precision  Storage bytes
1 - 9      5
10 - 19    9
20 - 28    13
29 - 38    17
If we check the .Net (SSMS is a managed application) decimal precision table we can see quickly where the crux of the issue is: Precision is 28-29 significant digits. So the .Net decimal type cannot map high precision (>29) SQL Server numeric/decimal types.
This will affect not only the SSMS display, but your applications as well. Specialized applications like SSIS will use a high-precision representation like DT_NUMERIC:
DT_NUMERIC: An exact numeric value with a fixed precision and scale. This data type is a 16-byte unsigned integer with a separate sign, a scale of 0 - 38, and a maximum precision of 38.
Now back to your problem: you can discover the invalid entries by simply looking at the value. Knowing that the C# decimal representation can accommodate values in the range of approximately (-7.9 x 10^28 to 7.9 x 10^28) / 10^(0 to 28) (the range depends on the scale), you can search each column for values outside that range (the actual bounds to search between will depend on the column's scale). But that begs the question: what do you replace the data with?
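For a scale-0 column like large_value above, a sketch of that search (the bound is .NET's decimal.MaxValue, 2^96 - 1, roughly 7.9 x 10^28):
SELECT * FROM test
WHERE large_value > 79228162514264337593543950335
   OR large_value < -79228162514264337593543950335;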
I would instead recommend using dedicated tools for import/export, tools that are capable of handling high-precision numeric values. SSIS is the obvious candidate, but even the modest bcp.exe would fit the bill.
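For instance, a bcp export of the table might look like this (illustrative only: the server name is a placeholder; -c selects character mode and -T a trusted connection):
bcp "SELECT id, large_value FROM tempdb.dbo.test" queryout test_data.txt -c -T -S YourServer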
BTW, if your values are actually incorrect (i.e. true corruption) then I would recommend running DBCC CHECKTABLE (...) WITH DATA_PURITY:
DATA_PURITY
Causes DBCC CHECKDB to check the database for column values that are not valid or out-of-range. For example, DBCC CHECKDB detects columns with date and time values that are larger than or less than the acceptable range for the datetime data type; or decimal or approximate-numeric data type columns with scale or precision values that are not valid.
For databases created in SQL Server 2005 and later, column-value integrity checks are enabled by default and do not require the DATA_PURITY option. For databases upgraded from earlier versions of SQL Server, column-value checks are not enabled by default until DBCC CHECKDB WITH DATA_PURITY has been run error free on the database. After this, DBCC CHECKDB checks column-value integrity by default.
Q: How can this issue arise for a datetime column?
use tempdb;
go
create table test(d datetime)
insert into test (d) values (getdate())
select %%physloc%%, * from test;
-- Row is on page 0x9100000001000000 (file 1, page 145 = 0x91)
dbcc traceon(3604,-1);  -- send DBCC output to the client
dbcc page(2,1,145,3);   -- dump page 145 of file 1 in database id 2 (tempdb)
Memory Dump #0x000000003FA1A060
0000000000000000: 10000c00 75f9ff00 6aa00000 010000 ....uùÿ.j .....
Slot 0 Column 1 Offset 0x4 Length 8 Length (physical) 8
dbcc writepage(2,1,145, 100, 8, 0xFFFFFFFFFFFFFFFF) -- overwrite 8 bytes at page offset 100, corrupting the stored datetime
dbcc checktable('test') with data_purity;
Msg 2570, Level 16, State 3, Line 2
Page (1:145), slot 0 in object ID 837578022, index ID 0, partition ID 2882303763115671552, alloc unit ID 2882303763120062464 (type "In-row data"). Column "d" value is out of range for data type "datetime". Update column to a legal value.
As suggested above, these errors usually occur when precision and scale are not preserved. If you're comfortable with SSIS, you can use it to get the rows that are corrupt. Taking the values which Martin Smith created:
CREATE TABLE T(ID int, C DECIMAL(38,0));
INSERT INTO T VALUES(1, 9999999999999999999999999999999999999)
The above table reproduces the error. Here the first column represents the primary key. I inserted around 1000 rows, of which a few were corrupted values. Below is the SSIS package design.
In the Data Conversion, I took the column C, which had errors, and tried to cast it to DECIMAL(38,0). Since a conversion or truncation error will occur, I redirected the error rows to an OLE DB Command that updates the table and sets the column to NULL:
Update T
Set C=NULL
where ID=?
The values of C and ID are directed to the OLE DB Command. If there is no error, I just insert into a table (there's actually no need to do this). This will work if you have a primary key column in your table.
If there are errors in a datetime column, a SQL query can be written to verify the format of the datetime values. Please go through the MSDN documentation on valid datetime values.
Select * from YourTable where ISDATE(Col)!=1
You could also fetch the data with a cursor. Try a cursor-based approach such as the query below (note: @Column1, @Column2, ... must be declared to match the column types, and #MyTable2 must already exist):
DECLARE VerifyCursor CURSOR FOR
SELECT * FROM MyTable
OPEN VerifyCursor
WHILE 1=1 BEGIN
BEGIN TRY
FETCH NEXT FROM VerifyCursor INTO @Column1, @Column2, ...
INSERT INTO #MyTable2 (Column1, Column2, ...)
VALUES (@Column1, @Column2, ...)
END TRY
BEGIN CATCH
END CATCH
IF (@@FETCH_STATUS<>0) BREAK
END
CLOSE VerifyCursor
DEALLOCATE VerifyCursor
Replacing the bad data is simple with an update:
UPDATE table SET column = NULL WHERE key_column = 'Some value'

XML input getting truncated

I have an XML doc (size: 3.59 MB) with 3,765,815 total characters in it. My SQL Server 2008 database table has a column with the xml data type. When I try to insert this XML into the column, it seems to truncate it.
I thought the xml data type could handle 2 GB of data. Is this a correct understanding, or am I missing something?
Thanks
Here is the query I am using:
declare @printxml nvarchar(max)
select @printxml = cast(inputxml as varchar(max))
from TableA
where SomeKey = '<some key>'
print @printxml
Select the data directly instead of printing it to the messages window:
SELECT
inputxml
FROM TableA
WHERE SomeKey = '<somekey>'
The caveat is that you have to set up Management Studio to be able to return all the data to the window. You do that via Tools > Options > Query Results > SQL Server > Results to Grid, by raising the "XML data" retrieval limit (the default setting is 2 MB).

SQL Server Text Datatype Maxlength = 65,535?

Software I'm working with uses a text field to store XML. From my searches online, the text datatype is supposed to hold 2^31 - 1 characters. Currently SQL Server is truncating the XML at 65,535 characters every time. I know this is caused by SQL Server, because if I add a 65,536th character to the column directly in Management Studio, it states that it will not update because characters will be truncated.
Is the max length really 65,535 or could this be because the database was designed in an earlier version of SQL Server (2000) and it's using the legacy text datatype instead of 2005's?
If this is the case, will altering the datatype to Text in SQL Server 2005 fix this issue?
That is a limitation of SSMS, not of the text field, but you should use varchar(max) anyway since text is deprecated.
Here is also a quick test
create table TestLen (bla text)
insert TestLen values (replicate(convert(varchar(max),'a'), 100000))
select datalength(bla)
from TestLen
Returns 100000 for me
MSSQL 2000 should allow up to 2^31 - 1 characters (non-Unicode) in a text field, which is over 2 billion. I don't know what's causing this limitation, but you might want to try using varchar(max) or nvarchar(max). These store just as many characters but also allow the regular string T-SQL functions (like LEN, SUBSTRING, REPLACE, RTRIM, ...).
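A quick way to see the difference, reusing the TestLen table from the test above (a sketch; LEN rejects text directly, so we convert first):
-- select len(bla) from TestLen  -- fails: the text data type is invalid for LEN
select len(convert(varchar(max), bla)) from TestLen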
If you're able to convert the column, you might as well, since the text data type will be removed in a future version of SQL Server. See here.
The recommendation is to use varchar(MAX) or nvarchar(MAX). In your case, you could also use the XML data type, but that may tie you to certain database engines (if that's a consideration).
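If you do convert in place, a minimal sketch (the table and column names here are placeholders, not from the question):
ALTER TABLE dbo.YourTable ALTER COLUMN YourTextColumn varchar(max)
-- optional: rewrite each value so the data is physically restructured into the varchar(max) format
UPDATE dbo.YourTable SET YourTextColumn = YourTextColumn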
You should have a look at:
XML Support in Microsoft SQL Server 2005
Beginning SQL Server 2005 XML Programming
So I would rather use the data type appropriate for the job than make a data type from a previous version fit your use.
Here's a little script I wrote for getting all the data out:
DECLARE @data NVARCHAR(MAX) = N'huge data';   -- placeholder for the real value
DECLARE @readSentence NVARCHAR(MAX) = N'';
DECLARE @dataLength INT = LEN(@data);
DECLARE @currIndex INT = 1;                   -- SUBSTRING positions are 1-based
WHILE @data <> @readSentence
BEGIN
    DECLARE @temp NVARCHAR(MAX) = N'';
    SET @temp = SUBSTRING(@data, @currIndex, 65535);  -- grab the next 65,535-character chunk
    SELECT @temp;                                      -- emit it as its own result set
    SET @readSentence += @temp;
    SET @currIndex += 65535;
END;
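Each iteration returns one chunk as a separate result set; the 65,535-character chunk size matches the largest "Non XML data" retrieval limit SSMS offers for grid results, so each chunk can be displayed in full and stitched back together.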
