I have over a million rows in a SQL Server 2005 database, with a text column that contains XML strings. I want to cast the text to the XML datatype in order to extract parts of the data.
The problem is that some records will throw errors when cast (i.e. invalid XML). How can I ignore these errors so that all the valid XML is cast correctly and invalid XML is stored as NULL?
Once, in a similar situation, I added the XML column to the same table as the text column. Then I used an RBAR process to attempt to copy the "XML" from the text column to the new XML column (not the fastest, but it commits single writes, and this will be a one-time thing, right?). This assumes your table has an int primary key.
declare @minid int, @maxid int;
select @minid = min(ID), @maxid = max(ID) from XMLTable;
while @minid <= @maxid
begin
    begin try
        update t
        set XMLColumn = cast(TextColumn as XML)
        from XMLTable t
        where ID = @minid;
        set @minid = @minid + 1;
    end try
    begin catch
        print('XML transform failed on record ID: ' + cast(@minid as varchar));
        -- advance to the next record
        set @minid = @minid + 1;
    end catch
end
I know this is SQL Server 2012+ functionality, but since this question is the top Google result, here it is:
SELECT
COALESCE(TRY_CONVERT(xml, '</bad xml>'), 'InvalidXML')
You can find the documentation here: TRY_CONVERT (Transact-SQL)
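Applied to the original problem, TRY_CONVERT allows one set-based pass instead of the row-by-row loop above; a sketch, reusing the XMLTable/TextColumn/XMLColumn names from the first answer:
-- Valid documents convert normally; anything unparseable becomes NULL instead of erroring.
-- The legacy TEXT column is cast through VARCHAR(MAX) to avoid TEXT-type restrictions.
UPDATE XMLTable
SET XMLColumn = TRY_CONVERT(xml, CAST(TextColumn AS VARCHAR(MAX)));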
There are ~10 different subquestions that could be answered here, but the main question is in the title. TLDR version: I have a table like the example below and I want to replace all double quote marks across the whole table. Is there a simple way to do this?
My cursor-based solution seems fairly straightforward. I know there's some CURSOR hatred in the SQL Server community (bad runtime?). At what point (number of rows and/or columns) would a CURSOR stink at this?
Create Reproducible Example Table
DROP TABLE IF EXISTS #example;
CREATE TABLE #example (
NumCol INT
,CharCol NVARCHAR(20)
,DateCol NVARCHAR(100)
);
INSERT INTO #example VALUES
(1, '"commas, terrible"', '"2021-01-01 20:15:57,2021:04-08 19:40:50"'),
(2, '"loadsrc,.txt"', '2020-01-01 00:00:05'),
(3, '".txt,from.csv"','1/8/2021 10:14')
Right now, my identified solutions are:
Manually update each column: UPDATE X SET CharCol = REPLACE(CharCol, '"',''). Horribly annoying to do with any more than 2 columns, IMO.
Use a CURSOR to update (similar to the annoyingly complicated-looking solution at SQL Server - SQL Replace on all columns in all tables across an entire DB).
REPLACE character using CURSOR
This gets a little convoluted with all the cursor-related script, but seems to work well otherwise.
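(The #colnames table the cursor reads from isn't shown; here is a sketch of one way it might be built from tempdb metadata, assuming the #example table above:)
-- Collect the character-typed column names of #example into #colnames.
-- Temp tables live in tempdb, so query tempdb's catalog views.
DROP TABLE IF EXISTS #colnames;
SELECT c.name AS ColName
INTO #colnames
FROM tempdb.sys.columns AS c
JOIN tempdb.sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('tempdb..#example')
  AND t.name IN ('varchar', 'nvarchar');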
-- declare variable to store colnames, cursor to filter through list, string for dynamic sql code
DECLARE @colname VARCHAR(10)
       ,@sql VARCHAR(MAX)
       ,@namecursor CURSOR;
-- run cursor over colnames and update the table
SET @namecursor = CURSOR FOR SELECT ColName FROM #colnames;
OPEN @namecursor;
FETCH NEXT FROM @namecursor INTO @colname;
WHILE (@@FETCH_STATUS <> -1) -- alt: WHILE @@FETCH_STATUS = 0
BEGIN;
    SET @sql = 'UPDATE #example SET ' + @colname + ' = REPLACE(' + @colname + ', ''"'','''')';
    EXEC(@sql); -- parentheses VERY important: EXEC(sql-as-string) NOT EXEC storedprocedure
    FETCH NEXT FROM @namecursor INTO @colname;
END;
CLOSE @namecursor;
DEALLOCATE @namecursor;
GO
-- see results
SELECT * FROM #example
Subquestion: While I've seen it in our database elsewhere, for this particular example I'm opening a .csv file in Excel and exporting it as tab delimited. Is there a way to change the settings to export without the double quotes? If I remember correctly, BULK INSERT doesn't have a way to handle that or a way to handle importing a csv file with extra commas.
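(As an aside on that last point: since SQL Server 2017, BULK INSERT actually can strip the quotes itself via FORMAT = 'CSV' and FIELDQUOTE; a minimal sketch, with a hypothetical file path:)
-- CSV mode (SQL Server 2017+); FIELDQUOTE names the character that wraps fields
-- FIRSTROW = 2 assumes the file has a header row
BULK INSERT #example
FROM 'C:\data\example.csv'
WITH (FORMAT = 'CSV', FIELDQUOTE = '"', FIRSTROW = 2);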
And yes, I'm going to pretend that I'm fine that there's a list of datetimes in the date column (necessitating varchar data type).
Why not just dynamically build the SQL?
Presumably it's a one-time task, so just run the query below for your table, paste the output into SSMS, and run it. If not, you could build an automated process to execute it; better, of course, to properly sanitize the data when inserting it!
select
'update <table> set ' +
String_Agg(QuoteName(COLUMN_NAME) + '=Replace(' + QuoteName(column_name) + ',''"'','''')',',')
from INFORMATION_SCHEMA.COLUMNS
where table_name='<table>' and TABLE_SCHEMA='<schema>' and data_type in ('varchar','nvarchar')
example DB<>Fiddle
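For a permanent table with the same columns as #example (INFORMATION_SCHEMA doesn't list temp tables), the generated statement comes out roughly as:
update <table> set [CharCol]=Replace([CharCol],'"',''),[DateCol]=Replace([DateCol],'"','')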
You might try this approach; it's not fast, but it's easy to type (or generate).
SELECT NumCol = y.value('(NumCol/text())[1]','int')
,CharCol = y.value('(CharCol/text())[1]','nvarchar(100)')
,DateCol = y.value('(DateCol/text())[1]','nvarchar(100)')
FROM #example e
CROSS APPLY(SELECT e.* FOR XML PATH('')) A(x)
CROSS APPLY(SELECT CAST(REPLACE(A.x,'"','') AS XML)) B(y);
The idea in short:
The first APPLY will transform all columns to a root-less XML.
Without ,TYPE the result is implicitly of type nvarchar(max).
The second APPLY first replaces every " in the whole text (which is actually one row) and casts the result to XML.
The SELECT uses .value to fetch the values type-safe from the XML.
Update: Just add INTO dbo.SomeNotExistingTableName right before FROM to create a new table with this data. This is cleaner than updating the existing table (the target might be a #-table too). I'd treat this as a staging environment...
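Concretely (dbo.Example_Clean is a hypothetical name; the target table must not already exist):
SELECT NumCol = y.value('(NumCol/text())[1]','int')
      ,CharCol = y.value('(CharCol/text())[1]','nvarchar(100)')
      ,DateCol = y.value('(DateCol/text())[1]','nvarchar(100)')
INTO dbo.Example_Clean
FROM #example e
CROSS APPLY(SELECT e.* FOR XML PATH('')) A(x)
CROSS APPLY(SELECT CAST(REPLACE(A.x,'"','') AS XML)) B(y);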
Good luck, messy data is always a pain in the neck :-)
How can I insert into a T-SQL TEXT field containing XML?
I can create custom fields in one of my applications, which uses MSSQL as a back-end. When I create those custom fields, they all go into a single field called fldxml in a table called MIITEM. I want to write INSERT and UPDATE statements, but I don't know how to insert a record into the fldxml field between <field></field>.
<field1></field1> is custFld1 (Custom Field 1)
<field2></field2> is custFld2 (Custom Field 2)
<field3></field3> is custFld3 (Custom Field 3)
<field4></field4> is custFld4 (Custom Field 4)
Here is how the data looks in the field:
<fields><field3>PFB652S6</field3><field1></field1><field2></field2><field4></field4></fields>
The column's data type is TEXT (shown in a screenshot in the original post).
Indeed you should not use the TEXT datatype. For this purpose, use XML instead.
Regardless of the datatype, you can modify XML in T-SQL by using the XML DML functionality. This makes it possible to write statements like insert, replace value of, and delete to modify XML documents.
Below is an example demonstrating this on your document:
-- First declare a variable of type XML
DECLARE @fields xml;
-- Here are the values we will be manipulating
DECLARE @nodeToReplace VARCHAR(MAX), @newValue VARCHAR(MAX);
SET @nodeToReplace = 'field3';
SET @newValue = 'PFB652S6';
-- Then fetch the value from the database. Insert the correct where clause
SELECT @fields = CAST(fldxml AS XML) FROM MIITEM WHERE .....
-- Now @fields will contain your XML
SELECT @fields AS OldValue;
-- When the value of the node is empty, you have to insert the text node as follows.
SET @fields.modify('
insert text {sql:variable("@newValue")} as last into (/fields/*[ local-name()=sql:variable("@nodeToReplace") ])[1]
');
SELECT @fields AS NewInsertedValue;
-- When the value is present already, a slightly different syntax must be used to update it
SET @newValue = 'BLABLA';
SET @fields.modify('
replace value of (/fields/*[local-name()=sql:variable("@nodeToReplace")][1]/text())[1]
with sql:variable("@newValue")
');
SELECT @fields AS NewUpdatedValue;
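To persist the modified document, write the variable back through VARCHAR(MAX) (a sketch; fill in the same WHERE clause you used to fetch the row):
-- XML must be converted back before it can be stored in the TEXT column
UPDATE MIITEM
SET fldxml = CAST(CAST(@fields AS VARCHAR(MAX)) AS TEXT)
WHERE .....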
Feel free to let me know if this sufficiently answers your questions. I could provide more specific help if needed.
So I have a script to update/insert the XML value of the following node to True:
<Submitted>False</Submitted>
The issue is that not all rows contain the node, and because of this it throws the error: "Mutator 'modify()' on '@temp' cannot be called on a null value."
What do I need to do to filter out the rows which do not contain the Submitted node within the XML?
Note: I have all these crazy CASTs because the column type is TEXT and cannot be changed, because the client originally set it up that way.
DECLARE @temp XML;

SELECT @temp = CAST(CAST(TicorOregon..tbl_Module_RequestForms_Items.XML AS NTEXT) AS XML)
FROM TicorOregon..tbl_Module_RequestForms_Items
WHERE CAST(CAST(TicorOregon..tbl_Module_RequestForms_Items.XML AS NTEXT) AS XML).value('(//Record/Submitted)[1]', 'NVARCHAR(max)') <> 'True';

-- modification to local XML var
SET @temp.modify('replace value of (//Record/Submitted[1]/text())[1] with "True"');

-- write it back into the table as TEXT column
UPDATE TicorOregon..tbl_Module_RequestForms_Items
SET XML = CAST(CAST(@temp AS VARCHAR(MAX)) AS TEXT)
WHERE CAST(CAST(TicorOregon..tbl_Module_RequestForms_Items.XML AS NTEXT) AS XML).value('(//Record/Submitted)[1]', 'NVARCHAR(max)') <> 'True'
  AND CAST(CAST(TicorOregon..tbl_Module_RequestForms_Items.XML AS NTEXT) AS XML).value('(//Record/Submitted)[1]', 'NVARCHAR(max)') IS NOT NULL;
Test your XML variable for null before trying to update.
if @temp is not null
begin
    -- modification to local XML var
    SET @temp.modify(...)

    -- write it back into the table as TEXT column
    UPDATE ...
end
Note: You might have trouble with this code if more than one row has <Submitted>False</Submitted>. You will end up with the XML from only one row in @temp (probably the last one, according to some index), but you will update all rows where <Submitted>False</Submitted> with that XML.
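A sketch of a per-row loop that sidesteps both issues; it uses exist() to skip rows without the node, and assumes a hypothetical ID primary key column:
DECLARE @id INT, @temp XML;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT r.ID
    FROM TicorOregon..tbl_Module_RequestForms_Items AS r
    -- exist() returns 1 only when the node is present, so node-less rows are skipped
    WHERE CAST(CAST(r.XML AS NTEXT) AS XML).exist('//Record/Submitted[text() = "False"]') = 1;

OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @temp = CAST(CAST(r.XML AS NTEXT) AS XML)
    FROM TicorOregon..tbl_Module_RequestForms_Items AS r
    WHERE r.ID = @id;

    SET @temp.modify('replace value of (//Record/Submitted[1]/text())[1] with "True"');

    UPDATE TicorOregon..tbl_Module_RequestForms_Items
    SET XML = CAST(CAST(@temp AS VARCHAR(MAX)) AS TEXT)
    WHERE ID = @id;

    FETCH NEXT FROM c INTO @id;
END
CLOSE c;
DEALLOCATE c;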
I've been going through this tutorial:
http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
and then made a stored procedure like this:
CREATE PROCEDURE [dbo].[spTEST_InsertXMLTEST_TEST](@UpdatedProdData ntext)
AS
DECLARE @hDoc int;
EXEC sp_xml_preparedocument @hDoc OUTPUT, @UpdatedProdData;

INSERT INTO TBL_TEST_TEST(NAME)
SELECT XMLProdTable.NAME
FROM OPENXML(@hDoc, 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST', 2)
WITH (
    ID   int,
    NAME varchar(100)
) XMLProdTable;

EXEC sp_xml_removedocument @hDoc;
Now my requirements call for a mass insert and a mass update, one after another. So first: can I merge those into one SP? I am not sure how it works with OPENXML, but I would think it's just a matter of making sure the XPath is right.
Next, what happens if something goes wrong while this combined SP is running? Would it roll back all the records, or would it just stop, leaving the records inserted before the crash?
A transaction is atomic: either all inserted records are committed, or all are rolled back. A statement always does its work as part of a transaction, so this INSERT will either commit completely or roll back with no row inserted at all.
In SQL Server 2005 you should avoid the NTEXT type and OPENXML. They are inefficient, NTEXT is deprecated, and there are much better alternatives:
use XML datatype instead of NTEXT
use the XML data type methods instead of OPENXML:
create procedure usp_insertxml (@data xml)
as
begin
    insert into TBL_TEST_TEST (id, name)
    select x.value('(ID)[1]', 'int'),
           x.value('(NAME)[1]', 'varchar(100)')
    from @data.nodes('ArrayOfTBL_TEST_TEST/TBL_TEST_TEST') t(x);
end
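And a sketch of the merged insert-plus-update procedure the question asks about, wrapped in an explicit transaction so the whole batch commits or rolls back together (the ID-based matching is hypothetical, and assumes ID is a key rather than an identity column):
create procedure usp_upsertxml (@data xml)
as
begin
    set xact_abort on;  -- any runtime error rolls back the whole transaction
    begin transaction;

    -- update rows that already exist
    update t
    set t.NAME = x.value('(NAME)[1]', 'varchar(100)')
    from TBL_TEST_TEST t
    join @data.nodes('ArrayOfTBL_TEST_TEST/TBL_TEST_TEST') s(x)
      on t.ID = x.value('(ID)[1]', 'int');

    -- insert rows that do not
    insert into TBL_TEST_TEST (ID, NAME)
    select x.value('(ID)[1]', 'int'), x.value('(NAME)[1]', 'varchar(100)')
    from @data.nodes('ArrayOfTBL_TEST_TEST/TBL_TEST_TEST') s(x)
    where not exists (select 1 from TBL_TEST_TEST t
                      where t.ID = x.value('(ID)[1]', 'int'));

    commit;
end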
We have two columns in a database which are currently of type varchar(16). Thing is, they contain numbers and always will contain numbers. We therefore want to change their type to integer. But the problem is that they of course already contain data.
Is there any way we can change the type of those columns from varchar to int without losing all the numbers that are already in there? Hopefully some sort of SQL we can just run, without having to create temporary columns or write a C# program to do the conversion and so forth... I imagine it could be pretty easy if SQL Server has some function for converting strings to numbers, but I am very shaky on SQL. I pretty much only work with C# and access the database through LINQ to SQL.
Note: Yes, making the columns varchar in the first place was not a very good idea, but that is unfortunately the way they did it.
The only reliable way to do this will be using a temporary table, but it will not be much SQL:
select * into #tmp from bad_table
truncate table bad_table
alter table bad_table alter column silly_column int
insert bad_table
select cast(silly_column as int), other_columns
from #tmp
drop table #tmp
The easiest way to do this is:
alter table myTable alter column vColumn int;
This will work as long as
all of the data will fit inside an int
all of the data can be converted to int (e.g. a value of "car" will fail; see the check after this list)
there are no indexes that include vColumn. If there are indexes, you will need to include a drop and create for them to get back to where you were.
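Before running the ALTER, it can be worth listing the rows that would break it. A quick check, assuming SQL Server 2012+ for TRY_CONVERT:
-- Rows whose value cannot be converted to int (NULLs are excluded, since they convert fine)
SELECT *
FROM myTable
WHERE TRY_CONVERT(int, vColumn) IS NULL
  AND vColumn IS NOT NULL;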
Just change the datatype in SQL Server Management Studio.
(You may need to go to menu Tools → Options → Designers, and disable the option that prevents saving changes that re-create the table.)
I totally appreciate the previous answers, but also thought a more complete answer would be helpful to other searchers...
There are a couple of caveats that are helpful to know if you are making the changes on a production-type table.
If you have an identity column defined on the table you will have to set IDENTITY_INSERT on and off around the re-insert of data. You will also have to use an explicit column list.
If you want to be sure of not killing data in the database, use TRANSACTIONS around the truncate/alter/reinsert process
If you have a lot of data, then trying to just make the change in SQL Server Management Studio could fail with a timeout and you could lose data.
To expand on the answer that @cjk gave, look at the following:
Note: 'tuc' is just a placeholder in this script for the real table name
begin try
    begin transaction;
    print 'Selecting Data...';

    select * into #tmp_tuc from tuc;

    print 'Truncating Table...';
    truncate table tuc;

    alter table tuc alter column {someColumnName} {someDataType} [not null]
    -- ... repeat the alter above until done

    print 'Reinserting data...';
    set identity_insert tuc on;
    insert tuc (
        <Explicit column list (all columns in table)>
    )
    select
        <Explicit column list (all columns in table - same order as above)>
    from #tmp_tuc;
    set identity_insert tuc off;

    drop table #tmp_tuc;
    commit;
    print 'Successful!';
end try
begin catch
    print 'Error - Rollback';
    if @@trancount > 0
        rollback;

    declare @ErrMsg nvarchar(4000), @ErrSeverity int;
    select @ErrMsg = ERROR_MESSAGE(), @ErrSeverity = ERROR_SEVERITY();
    set identity_insert tuc off;
    RAISERROR(@ErrMsg, @ErrSeverity, 1);
end catch