SQL Server: Find out what row caused the TSQL to fail (SSIS) - sql-server

SQL Server 2005 Question:
I'm working on a data conversion project where I'm taking 80k+ rows and moving them from one table to another. When I run the TSQL, it bombs with various errors having to do with converting types, or whatever. Is there a way to find out what row caused the error?
=====================
UPDATE:
I'm performing an INSERT INTO TABLE1 (...) SELECT ... FROM TABLE2
TABLE2 is just a bunch of varchar fields, whereas TABLE1 has the right types.
This script will be put into a sproc and executed from an SSIS package. The SSIS package first imports 5 large flat files into TABLE2.
Here is a sample error message: "The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value."
There are many date fields. In TABLE2, there are data values like '02/05/1075' for Birthdate. I want to examine each row that is causing the error, so I can report to the department responsible for the bad data so they can correct it.

This is not the way to do it with SSIS. You should have the data flow from your source, to your destination, with whatever transformations you need in the middle. You'll be able to get error details, and in fact, error rows by using the error output of the destination.
I often send the error output of a destination to another destination - a text file, or a table set up to permit everything, including data that would not have been valid in the real destination.
Actually, if you do this the standard way in SSIS, then data type mismatches should be detected at design time.

What I do is split the rowset in half with a WHERE clause:
INSERT MyTable (id, datecol) SELECT id, datecol FROM OtherTable WHERE id BETWEEN 0 AND 40000
and then keep changing the values in the BETWEEN part of the WHERE clause. I've done this by hand many times, but it occurs to me that you could automate the splitting with a little .NET code in a loop, trapping exceptions and narrowing it down, little by little, to just the row throwing the exception.
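You don't have to leave T-SQL to automate it, either. Here is a minimal sketch of that narrowing loop as a binary search, assuming an integer id column, that the whole range is known to fail, and that you want the first offending row; the MAX(CONVERT(...)) is just a cheap way to force conversion of every row in the probed range:

DECLARE @lo int, @hi int, @mid int, @probe datetime;
SELECT @lo = MIN(id), @hi = MAX(id) FROM OtherTable;

WHILE @lo < @hi
BEGIN
    SET @mid = (@lo + @hi) / 2;
    BEGIN TRY
        -- probe the lower half of the current range
        SELECT @probe = MAX(CONVERT(datetime, datecol))
        FROM OtherTable
        WHERE id BETWEEN @lo AND @mid;
        SET @lo = @mid + 1;   -- lower half converts cleanly; look higher
    END TRY
    BEGIN CATCH
        SET @hi = @mid;       -- a bad row is somewhere in the lower half
    END CATCH
END

SELECT * FROM OtherTable WHERE id = @lo;  -- the first offending row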

I assume you are doing the update with an INSERT INTO ... SELECT.
Instead, try doing it with a cursor: use exception handling to catch the error and log everything you need, such as the row number it failed on.
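A minimal sketch of that approach using SQL Server 2005's TRY/CATCH, assuming hypothetical Id and Birthdate columns (the real tables have many more):

DECLARE @id int, @birthdate varchar(20);
DECLARE row_cursor CURSOR FAST_FORWARD FOR
    SELECT Id, Birthdate FROM TABLE2;
OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @id, @birthdate;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        INSERT INTO TABLE1 (Id, Birthdate)
        VALUES (@id, CONVERT(datetime, @birthdate));
    END TRY
    BEGIN CATCH
        -- log the failing row instead of aborting the whole load
        PRINT 'Row ' + CAST(@id AS varchar(20)) + ' failed: ' + ERROR_MESSAGE();
    END CATCH
    FETCH NEXT FROM row_cursor INTO @id, @birthdate;
END
CLOSE row_cursor;
DEALLOCATE row_cursor;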

Not exactly a cursor, but just as effective - I had over 4 million rows to examine, with multiple conversion failures. Here is what I used, and it resulted in two temp tables: one with all my values and assigned row numbers, and one that simply contained a list of rows in the first temp table that failed to convert.
select row_number() over (order by timeID) as rownum, timeID into #TestingTable from MyTableWithBadData
set nocount on
declare @row as int
declare @last as int
set @row = 1 -- row_number() numbering starts at 1
select @last = count(*) from #TestingTable
declare @timeid as decimal(24,0)
create table #fails (rownum int)
while @row <= @last
begin
    begin try
        select @timeid = cast(timeID as decimal(24,0)) from #TestingTable where rownum = @row
    end try
    begin catch
        print cast(@row as varchar(25)) + ' : failed'
        insert into #fails (rownum) values (@row)
    end catch
    set @row = @row + 1
end
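From there you can join the two temp tables to pull out the actual offending values:

select t.rownum, t.timeID
from #TestingTable t
join #fails f on f.rownum = t.rownum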

If you are looping, add PRINT statements inside the loop.
If you are using set-based operations, add a restrictive WHERE condition and run it. Keep running it, each time making it more and more restrictive, until you find the bad row in the data. If you can run it for blocks of N rows, just select out those rows and look at them.
Add CASE statements to catch the problems (converting the bad value to NULL or whatever) and put a value in a new FlagColumn telling you the type of problem:
CASE WHEN ISNUMERIC(x)!=1 then NULL ELSE x END as x
,CASE WHEN ISNUMERIC(x)!=1 then 'not numeric' else NULL END AS FlagColumn
then select out the newly converted data where FlagColumn IS NOT NULL.
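Pulling that together, a hedged sketch; the column names Age and Birthdate are assumptions, not the real schema:

SELECT *,
       CASE WHEN ISNUMERIC(Age) <> 1    THEN 'Age not numeric'
            WHEN ISDATE(Birthdate) <> 1 THEN 'Birthdate not a date'
            ELSE NULL
       END AS FlagColumn
FROM TABLE2
-- note: ISDATE('02/05/1075') returns 0, because datetime only covers years 1753-9999
WHERE ISNUMERIC(Age) <> 1 OR ISDATE(Birthdate) <> 1;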
You could also try SELECT statements with the ISNUMERIC() or ISDATE() functions on the various columns of the source data.
EDIT
There are many date fields. In TABLE2, there are data values like '02/05/1075' for Birthdate. I want to examine each row that is causing the error, so I can report to the department responsible for the bad data so they can correct it.
Use this to return all bad date rows:
SELECT * FROM YourTable WHERE ISDATE(YourDateColumn)!=1

If you are working with cursors, yes, and it is trivial. If you are not working with cursors, I don't think so, because a set-based SQL statement is atomic: either the whole INSERT succeeds or the whole thing rolls back, so there is no per-row context to report.
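A tiny throwaway demo of that atomicity (SQL Server 2005 TRY/CATCH; all names hypothetical):

CREATE TABLE #t (d datetime);
BEGIN TRY
    INSERT INTO #t (d)
    SELECT v FROM (SELECT '2001-01-01' AS v
                   UNION ALL
                   SELECT 'not a date') AS s;  -- second row cannot convert
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();                     -- conversion error reported here
END CATCH
SELECT COUNT(*) AS rows_inserted FROM #t;      -- 0: the good row was rolled back too
DROP TABLE #t;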

John Saunders has the right idea; there are better ways to do this kind of processing using SSIS. However, learning SSIS and redoing your package to completely change the process may not be an option at this time, so I offer this advice. You appear to be having trouble with the dates being incorrect. So first run a query to identify the records which are bad and insert them into an exceptions table. Then do your insert with only the records that are left. Something like:
insert exceptiontable (field1, field2)
select field1, field2 from table2 where isdate(field2) = 0
insert table1 (field1, field2)
select field1, field2 from table2 where isdate(field2) = 1
Then of course you can send the contents of the exception table to the people who provide the bad data.

Related

Use Sybase triggers to write dynamic statement using all old and new values for creating your own replication transaction statement log?

PROBLEM SUMMARY
I have to write I/U/D-statement-generating triggers for a bucardo/SymmetricDS-inspired, homemade bidirectional replication system between groups of Sybase ADS and PostgreSQL 11 nodes. The idea is BEFORE triggers on any PostgreSQL or Sybase DB that create Insert/Update/Delete commands based on the command entered on a replicating source table. For example, an INSERT INTO PERSON (first_name, last_name, gender, age, ethnicity) VALUES ('John','Doe','M',42,'C') would be manipulated into a corresponding INSERT statement; an UPDATE would capture OLD and NEW values to dynamically build an UPDATE statement; and a DELETE would capture OLD values to build a DELETE command. Each command would then run on a destination at some interval.
I know this is difficult and that nobody does it this way, but it is for a job, I have no other options, and I can't push back or offer a different solution. I have no teammates or human resources to help outside of SO and something like Codementors, which was not so helpful. My idea/strategy is to copy parts of bucardo/SymmetricDS when inserting OLD and NEW values for generating a statement/command to run on the destination. Right now I am snapshotting the whole table to a CSV rather than working by individual command, but generating and saving commands by looping through the table will make the job much easier.
One big issue is that the tables come from Sybase ADS with a mixed key/index structure (many tables have NO PK), and that structure is mirrored in PostgreSQL, so I am trying to write PK-less, all-column statements to get around the PK-less tables. Also, only certain columns of certain tables will be replicated, so I have a column in a metadata table where the column names are inserted delimited by ';'; I then split that into an array and link the column names to the values for each statement, to generate a full I/U/D command. Hopefully. I am open to other strategies, but this is a big solo project and I have gone at it many ways with much difficulty.
I mostly come from a DBA background and have some programming experience with the fundamentals, so I am mostly pseudocoding each major sequence, googling for syntax piece by piece, and adjusting as I go or when I hit a language limitation. I am thankful for any help given, as I am getting a bit desperate and discouraged.
WHAT I HAVE TRIED
I have to do this for both Sybase ADS and PostgreSQL, but this question initially covers ADS, since it is the more challenging and older platform.
To have one "Log" table which tracks row changes for each of the replicating tables and records and ultimately dynamically generates a command is the goal for both platforms. I am trying to make trigger statements like:
CREATE TRIGGER PERSON_INSERT
ON PERSON
BEFORE
INSERT
BEGIN
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) SELECT ID, 'INSERT', 'READY', NOW(), 'first_name;last_name;gender;age;ethnicity' FROM __new;
END;
CREATE TRIGGER PERSON_UPDATE
ON PERSON
BEFORE
UPDATE
BEGIN
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) SELECT ID, 'UPDATE', 'READY', NOW(), 'first_name;last_name;gender;age;ethnicity' FROM __new;
UPDATE Backlog SET OldValues = (SELECT 'first_name;last_name;gender;age;ethnicity' FROM __old) WHERE SourceTableID = (SELECT ID FROM __old);
END;
CREATE TRIGGER PERSON_DELETE
ON PERSON
BEFORE
DELETE
BEGIN
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, OldValues) SELECT ID, 'DELETE', 'READY', NOW(), 'first_name;last_name;gender;age;ethnicity' FROM __old;
END;
but I would like the 'first_name;last_name;gender;age;ethnicity' list to come from another table as a value, to make it dynamic, since multiple tables will write their values and statement info to the single log table. It could then be put into a variable and split, linking each column name to its corresponding value, so that the I/U/D statements can be built and executed on the destination one at a time.
ATTEMPTED INCOMPLETE SAMPLE TRIGGER CODE
CREATE TRIGGER PERSON_INSERT
ON PERSON
BEFORE
INSERT
BEGIN
--DECLARE @Columns STRING
--@Columns = SELECT Columns FROM metatable WHERE tablename = 'PERSON'
--String-split @Columns on ';' into an array to correspond to the NEW and OLD VALUES
--@NewValues = @['@Columns=' + NEW.@Columns + '']
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) SELECT ID, 'INSERT', 'READY', NOW(), 'first_name;last_name;gender;age;ethnicity' FROM __new;
END;
CREATE TRIGGER PERSON_UPDATE
ON PERSON
BEFORE
UPDATE
BEGIN
--DECLARE @Columns STRING
--@Columns = SELECT Columns FROM metatable WHERE tablename = 'PERSON'
--String-split @Columns on ';' into an array to correspond to the NEW and OLD VALUES
--@NewValues = @['@Columns=' + NEW.@Columns + '']
--@OldValues = @['@Columns=' + OLD.@Columns + '']
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) SELECT ID, 'UPDATE', 'READY', NOW(), 'first_name;last_name;gender;age;ethnicity' FROM __new;
UPDATE Backlog SET OldValues = (SELECT 'first_name;last_name;gender;age;ethnicity' FROM __old) WHERE SourceTableID = (SELECT ID FROM __old);
END;
CREATE TRIGGER PERSON_DELETE
ON PERSON
BEFORE
DELETE
BEGIN
--DECLARE @Columns STRING
--@Columns = SELECT Columns FROM metatable WHERE tablename = 'PERSON'
--String-split @Columns on ';' into an array to correspond to the NEW and OLD VALUES
--@OldValues = @['@Columns=' + OLD.@Columns + '']
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, OldValues) SELECT ID, 'DELETE', 'READY', NOW(), 'first_name;last_name;gender;age;ethnicity' FROM __old;
END;
CONCLUSION
For each row inserted, updated, or deleted, I am trying to generate a corresponding 'INSERT INTO PERSON (' + @Columns + ') VALUES (' + @NewValues + ')'-type statement (or an UPDATE or DELETE) in a COMMAND column in the log table. A ForEach service will then run each command value, ordered by create time, as the main replication service.
To be clear, I am trying to make my sample trigger code write all old and new values to a column in a dynamic way, without hardcoding the columns in each trigger (since it will be used for multiple tables), writing the values into a single column delimited by a comma or semicolon.
An even bigger wish, and the goal behind all this, is to find a way to save/script each I/U/D command and then be able to run them on the subscriber servers/DBs on the PostgreSQL and Sybase platforms, thereby making my own log-based replication.
It is a complex but solvable problem that will take time and careful planning. I think what you are looking for is the EXECUTE IMMEDIATE command in ADS SQL syntax. With it you can build a statement dynamically and then execute it once construction of the SQL string is complete. Save each desired column value to a temp table by carefully constructing the statement as a string, then execute it with EXECUTE IMMEDIATE. For example:
DECLARE TableColumns CURSOR;
DECLARE FldName Char(100);
...
OPEN TableColumns AS SELECT *
  FROM system.columns
  WHERE parent = cTableName
    AND field_type < 21   // ADS_ROWVERSION
    AND field_type <> 6   // ADS_BINARY
    AND field_type <> 7;  // ADS_IMAGE
WHILE FETCH TableColumns DO
  FldName = Trim( TableColumns.Name );
  StrSql = 'SELECT n.[' + Trim( FldName ) + '] newVal ' +
           'INTO #myTmpTable FROM __new n';
After constructing the statement as a string it can then be executed like this:
EXECUTE IMMEDIATE STRSQL ;
You can pickup old and new values from __old and __new temp tables that are always available to triggers. Insert values into temp table myTmpTable and then use it to update the target. Remember to drop myTmpTable at the end.
Furthermore, I would think you can create a function on the DD that can actually be called from each trigger on the tables you want to keep track of instead of writing a long trigger for each table and cTableName can be a parameter sent to the function. That would make maintenance a little easier.

SQL Server System.OutOfMemoryException

I am generating XML from 60 tables and storing the XML in a table.
Table Name: Final_XML_Table

  PK | FK | XML_Content (type xml)
  ---+----+------------------------------------------
  1  | 1  | "XML that I am generating from 60 tables"

When I run the query below, it gives a memory exception:
Select * from Final_XML_Table
Things I have tried:
1. Results to Text: I get only a few lines of the XML as text in the output window.
2. Results to File: I get only a few lines of the XML in the file.
Please suggest a fix, and also tell me whether I will have to make any such change on the server's SQL Server as well at deployment.
I have also set the "XML data" result size to Unlimited.
This is not an answer, but too much for a comment...
The fact that you are able to store the XML shows clearly that the XML is not too big for the database.
The fact that you get an out-of-memory exception with Select * from Final_XML_Table shows clearly that SSMS has a problem reading/displaying your XML.
You might try to check the length like here:
DECLARE #tbl TABLE (x XML);
INSERT INTO #tbl VALUES('<root><test>blah</test><test /><test2><x/></test2></root>');
SELECT * FROM #tbl; --This does not work for you
SELECT DATALENGTH(x) FROM #tbl; --This returns just "82" in this case
It might be that, due to a logical error in your XML's creation (a wrong join?), the XML contains multiple/repeated elements. You could run a query like this to get a count of nodes, to check whether the number is realistic:
SELECT x.value('count(//*)','int') FROM #tbl
For the example above, this returns 5.
You might do the same with your original XML.
With a query like the following you can retrieve all node names of the first level, the second level and so on. You can check if this looks okay:
SELECT firstLevel.value('local-name(.)','varchar(max)') AS l1_node
,SecondLevel.value('local-name(.)','varchar(max)') AS l2_node
--add more
FROM #tbl
OUTER APPLY x.nodes('/*') AS A(firstLevel)
OUTER APPLY A.firstLevel.nodes('*') AS B(SecondLevel)
--add more
And, of course, you might open Resource Monitor to look at the actual memory usage...
Come back with more details...
That error isn't a SQL Server error; it's from SSMS, and it means that SSMS has run out of memory.
SSMS is only a 32-bit application, so it can only address 2GB of RAM. If it tries to address more than that, the error occurs. If you've had SSMS open and have returned some very large datasets, that RAM will get used up.
In all honesty, if you're running a query like SELECT * FROM Final_XML_Table, then I would hazard a guess that the dataset is huge. Add a WHERE clause, or don't return the dataset on screen. If you really need to view the data (all of it), export it to something else. But I very much doubt you need to look at every row if you're returning around 2GB of data.
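Before deciding how to view it, it may help to check how much data the query would actually return; a small sketch using the names from the question:

SELECT PK, DATALENGTH(XML_Content) / 1048576.0 AS SizeMB
FROM Final_XML_Table;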

SQL Conversion failed when converting the varchar value '*' to data type int

Aside: please note that this is not a duplicate of the countless other string-to-int issues in SQL. This is a specific, cryptic error message (with the asterisk) that hides another problem.
Today I came across an issue in my SQL that took me all day to solve. I was passing a data table parameter into my stored procedure and then inserting part of it into another table. The code used was similar to the following:
INSERT INTO tblUsers
(UserId,ProjectId)
VALUES ((SELECT CAST(CL.UserId AS int) FROM #UserList AS UL), @ProjectId)
I didn't seem to get the error message when using test data, only when making the call from the dev system.
What is the issue and how should the code look?
I would bet that (SELECT CAST(CL.UserId AS int) FROM #UserList AS UL) returns more than one row, and that your test scenario had only one row. But that may be just me.
Anyway, the way the code should look is:
INSERT INTO tblUsers (UserId, ProjectId)
SELECT CAST(UL.UserId AS int),
       @ProjectId
FROM #UserList AS UL
After some time trawling through Google and such places, I determined that this SQL code is wrong. I believe the correct code is something more like this:
INSERT INTO tblUsers
(UserId, ProjectId)
SELECT CAST(UL.UserId AS int), @ProjectId
FROM #UserList AS UL
The issue is that the other way attempts to insert one record containing the data of all of the rows in the table parameter. I needed to remove the VALUES clause. Also, to add other data, I can simply include it as part of the SELECT, as you would otherwise.

error when insert into linked server

I want to insert some data on the local server into a remote server, and used the following sql:
select * into linkservername.mydbname.dbo.test from localdbname.dbo.test
But it throws the following error
The object name 'linkservername.mydbname.dbo.test' contains more than the maximum number of prefixes. The maximum is 2.
How can I do that?
I don't think the new table created with the INTO clause supports 4 part names.
You would need to create the table first, then use INSERT..SELECT to populate it.
(See the note in the Arguments section of the MSDN reference.)
The SELECT...INTO [new_table_name] statement supports a maximum of 2 prefixes: [database].[schema].[table]
NOTE: it is more performant to pull the data across the link using SELECT INTO vs. pushing it across using INSERT INTO:
SELECT INTO is minimally logged.
SELECT INTO does not implicitly start a distributed transaction, typically.
I say typically, in point #2, because in most scenarios a distributed transaction is not created implicitly when using SELECT INTO. If a profiler trace tells you SQL Server is still implicitly creating a distributed transaction, you can SELECT INTO a temp table first, to prevent the implicit distributed transaction, then move the data into your target table from the temp table.
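A minimal sketch of that temp-table workaround, assuming a linked server [server_a] and matching column lists:

-- Pull across the link into a local temp table; SELECT INTO stays minimally
-- logged and, typically, avoids the implicit distributed transaction.
SELECT *
INTO #staging
FROM [server_a].[database].[schema].[table];

-- Then load the real target from the local temp table.
INSERT INTO [database].[schema].[table]
SELECT * FROM #staging;

DROP TABLE #staging;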
Push vs. Pull Example
In this example we are copying data from [server_a] to [server_b] across a link. This example assumes query execution is possible from both servers:
Push
Instead of connecting to [server_a] and pushing the data to [server_b]:
INSERT INTO [server_b].[database].[schema].[table]
SELECT * FROM [database].[schema].[table]
Pull
Connect to [server_b] and pull the data from [server_a]:
SELECT * INTO [database].[schema].[table]
FROM [server_a].[database].[schema].[table]
I've been struggling with this for the last hour.
I now realise that using the syntax
SELECT orderid, orderdate, empid, custid
INTO [linkedserver].[database].[dbo].[table]
FROM Sales.Orders;
does not work with linked servers. You have to go onto your linked server and manually create the table first, then use the following syntax:
INSERT INTO [linkedserver].[database].[dbo].[table]
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = 'UK';
I've experienced the same issue and I've performed the following workaround:
If you are able to log on to the remote server where you want to insert the data (with SSMS or sqlcmd), rebuild your query the other way around:
so from:
SELECT * INTO linkservername.mydbname.dbo.test
FROM localdbname.dbo.test
to the following:
SELECT * INTO localdbname.dbo.test
FROM linkservername.mydbname.dbo.test
In my situation it works well.
@2Toad: For sure INSERT INTO is better/more efficient. However, for small queries and quick operations, SELECT * INTO is more flexible because it creates the table on the fly and inserts your data immediately, whereas INSERT INTO requires creating the table (identity options and so on) before you carry out your insert operation.
I may be late to the party, but this was the first post I saw when I searched for the 4 part table name insert issue to a linked server. After reading this and a few more posts, I was able to accomplish this by using EXEC with the "AT" argument (for SQL2008+) so that the query is run from the linked server. For example, I had to insert 4M records to a pseudo-temp table on another server, and doing an INSERT-SELECT FROM statement took 10+ minutes. But changing it to the following SELECT-INTO statement, which allows the 4 part table name in the FROM clause, does it in mere seconds (less than 10 seconds in my case).
EXEC ('USE MyDatabase;
BEGIN TRY DROP TABLE TempID3 END TRY BEGIN CATCH END CATCH;
SELECT Field1, Field2, Field3
INTO TempID3
FROM SourceServer.SourceDatabase.dbo.SourceTable;') AT [DestinationServer]
GO
The query is run on DestinationServer, changes to the right database, ensures the table does not already exist, and selects from SourceServer. Minimally logged, and no fuss. This information may already be out there somewhere, but I hope it helps anyone searching for similar issues.

SQL Server 2000: search through out database

Somehow some records in my table are getting updated with a value of 'xyz' in a certain column. Out of hundreds of stored procedures, functions, and triggers, how can I determine which code is doing this? Is there a way to search through each and every script of code in the database?
Please help.
One approach is to check syscomments:
"Contains entries for each view, rule, default, trigger, CHECK constraint, DEFAULT constraint, and stored procedure within the database. The text column contains the original SQL definition statements."
e.g. select text from syscomments
If you are having trouble finding that literal string, the values could be coming from a table, or they could be being concatenated within a routine.
Try this
Select text from syscomments
where CharIndex('x', text) > 0
and CharIndex('y', text) > 0
and CharIndex('z', text) > 0
That might help you either find the right routine, or further indicate that the values are coming from a table.
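If the value really is the literal string 'xyz', a variant worth trying (a sketch against the SQL Server 2000 system tables) is to search for it directly and report which object each match belongs to:

SELECT o.name, o.type
FROM syscomments c
JOIN sysobjects o ON o.id = c.id
WHERE CHARINDEX('xyz', c.text) > 0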
This is going to be nearly impossible to do in SQL Server 2000, because the update might very well be from a variable that has that value, or a join to another table that has that value, rather than hard-coded into the stored proc, trigger, etc. The update could also be coming from a DTS package, a job, a piece of dynamic code run by the app, or even from Query Analyzer, so the code itself may not be recorded in the database anywhere.
Perhaps a better approach might be to create an audit table for the table in question and have it record the user and the code from the spid that generated the change, as well as the old and new values. You'll have to wait until it happens again, but then you would know exactly what changed the value and what value to put it back to if need be.
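For illustration, a hedged sketch of such an audit setup in SQL Server 2000 syntax; the names MyTable, Id, and MyColumn are placeholders for your real table, key, and suspect column:

CREATE TABLE MyTable_Audit (
    AuditId   int IDENTITY(1,1) PRIMARY KEY,
    Id        int,
    OldValue  varchar(255),
    NewValue  varchar(255),
    ChangedBy varchar(128),
    ChangedAt datetime DEFAULT GETDATE()
)
GO
CREATE TRIGGER trg_MyTable_Audit ON MyTable FOR UPDATE AS
    INSERT INTO MyTable_Audit (Id, OldValue, NewValue, ChangedBy)
    SELECT d.Id, d.MyColumn, i.MyColumn, SYSTEM_USER
    FROM deleted d
    JOIN inserted i ON i.Id = d.Id
    WHERE d.MyColumn <> i.MyColumn   -- only rows whose value actually changed
GO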
Alternatively you could run profiler on the system until it happens but profiler tends to hurt performance and is not usually a good idea to run on a production system. If it is happening very often, it might be an acceptable alternative.
Here's a hint as to how you might get some of the info you want for the eventual trigger code you write:
create table #temp (eventtype nvarchar(1000), parameters int, eventinfo nvarchar(4000), myspid int)
declare @myspid int
select @myspid = @@spid
insert #temp (eventtype, parameters, eventinfo)
exec ('dbcc inputbuffer (@@spid)')
update #temp
set myspid = @myspid
select hostname, program_name, eventinfo
from #temp t
join sysprocesses s on t.myspid = s.spid
where spid = @myspid
You might use SQL Profiler to trace the updates of a given table/column.
