SQL 2000 'TRY CATCH like' Error Handling - sql-server

This is a SQL 2000 database that I am working with.
I have what I call a staging table that is a raw dump of data, so everything is ntext or nvarchar(255).
I need to cast/convert all of this data into the appropriate data types (i.e. int, decimal, nvarchar, etc.).
The way I was going to do this was to iterate through all records using a while loop and attempt a CAST on each column of a single record during each iteration; after I visit a particular record I flag it as processed (bit field).
But how can I log the error when/if it occurs while still allowing the while loop to continue?
At first I implemented this using a TRY CATCH in a local SQL 2005 instance (to get the project going) and all was working well, but I learned today that the dev and production databases that the international DBAs have set up are SQL 2000 instances, so I have to conform.
EDIT: I am using an SSIS package to populate the staging table. I see now that I must revisit that package and implement a script component to handle the conversions. Thanks, guys.
EDIT: I am doing this on a record-by-record basis, not a batch insert, so the transaction idea seems like it would be feasible, but I'm not sure how to trap @@ERROR and allow the stored procedure to continue.
EDIT: I really like Guy's approach, I am going to implement it this way.

Generally I don't like "loop through the records" solutions, as they tend to be slow and you end up writing a lot of custom code.
So...
Depending on how many records are in your staging table, you could post process the data with a series of SQL statements that test the columns for correctness and mark any records that fail the test.
i.e.
UPDATE staging_table
SET status_code = 'FAIL_TEST_1'
WHERE status_code IS NULL
AND ISDATE(ntext_column1) = 0;
UPDATE staging_table
SET status_code = 'FAIL_TEST_2'
WHERE status_code IS NULL
AND ISNUMERIC(ntext_column2) = 0;
etc...
Finally
INSERT INTO results_table ( mydate, myprice )
SELECT ntext_column1 AS mydate, ntext_column2 AS myprice
FROM staging_table
WHERE status_code IS NULL;
DELETE FROM staging_table
WHERE status_code IS NULL;
And the staging table then holds all the errors, which you can export and report on.

What are you using to import the file? DTS has scripting abilities that can be used for data validation. If you're not using DTS, are you using a custom tool? If so, do your validation there.
But I think this is what you're looking for:
http://www.sqlteam.com/article/using-dts-to-automate-a-data-import-process
IF @@ERROR <> 0
GOTO LABEL
@op: In SSIS, the "red line" from a data import task can redirect bad rows to a separate destination or transform. I haven't played with it in a while, but I hope it helps.

Run each cast in a transaction; after each cast, check @@ERROR, and if it's clear, commit and move on.
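A rough sketch of that pattern is below; the table and column names (and the @current_id variable driving the loop) are placeholders, and note the caveat in the next answer that on SQL 2000 a hard conversion error can abort the whole batch before @@ERROR is ever checked.
-- Sketch of the per-record pattern on SQL 2000; all names are placeholders.
DECLARE @current_id int
SET @current_id = 1   -- in practice this comes from the surrounding WHILE loop
BEGIN TRAN
UPDATE staging_table
SET    typed_amount = CAST(raw_amount AS decimal(18, 2))
WHERE  id = @current_id
IF @@ERROR <> 0
BEGIN
    -- cast failed: undo and flag the record instead of stopping
    ROLLBACK TRAN
    UPDATE staging_table SET error_flag = 1 WHERE id = @current_id
END
ELSE
BEGIN
    UPDATE staging_table SET processed = 1 WHERE id = @current_id
    COMMIT TRAN
END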

It looks like you are doomed. See this document.
TL;DR: A data conversion error always causes the whole batch to be aborted - your SQL script will not continue to execute no matter what you do. Transactions won't help. You can't check @@ERROR because execution will already have aborted.
I would first reexamine why you need a staging database full of varchar(255) columns - can whatever fills that database do the conversion?
If not, I guess you'll need to write a program/script to select from the varchar columns, convert, and insert into the prod db.

You could try checking for the data type before casting and actually avoid throwing errors.
You could use functions like:
ISNUMERIC - to check if the data is of a numeric type
ISDATE - to check if it can be cast to DATETIME
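For example, a minimal sketch of pre-validating before the cast so no error is ever raised; staging_table, clean_table, and the column names are assumptions, not from the question:
-- Sketch only: flag rows whose raw text will not cast cleanly, then cast the rest.
UPDATE staging_table
SET error_flag = 1
WHERE ISNUMERIC(raw_amount) = 0   -- note: ISNUMERIC still accepts values like '1e5' or '$1'
   OR ISDATE(raw_date) = 0;

INSERT INTO clean_table (amount, sale_date)
SELECT CAST(raw_amount AS decimal(18, 2)),
       CAST(raw_date AS datetime)
FROM staging_table
WHERE error_flag = 0;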

Related

What is the fastest way of copying a data from a DataWindow/DataStore to SQL Server table using Powerbuilder

We have a datastore (the PowerBuilder datawindow's twin sister) that contains over 40,000 rows, which takes more than 30 minutes to insert into a Microsoft SQL Server table.
Currently, I am using a script generator that generates the SQL table definition and an insert command for each row. At the end, the full script is sent to SQL Server for execution.
I have already found that the script-generation process consumes more than 97% of the whole task.
Could you please help me find a more efficient way of copying my client's data to a SQL Server table?
Edit1 (after NoazDad's comments):
Before answer, please bear in mind that:
Table structure is dynamic;
I am trying to avoid using datastore.Update() method;
Not sure it would be faster, but you could save the data from the datastore in a tab-delimited file and then do a BULK INSERT via SQL. Something like:
BULK INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
GO
You can try saving the datastore contents into a string variable via the ds.object.datawindow.data syntax, then save that to a file, then execute the SQL.
The way I read this, you're saying that the table that the data is being inserted into doesn't even exist in the schema until the user presses "GO" and initiates the script? And then you create embedded SQL statements that create the table, and insert rows 1 by 1 in a loop?
That's... Well, let's just say I wouldn't do it this way.
Do you not have any idea what the schema will look like ahead of time? If you do, then paint the datastore against that table, and use ds_1.Update() to generate the INSERT statements. Use the datawindow for what it's good for.
If that's not possible, and you must use embedded SQL, then at least perform a COMMIT every 1000 rows or so. Otherwise, SQL Server is accumulating rollback information in the transaction log, in case something goes wrong and the inserts have to be rolled back.
Other ideas...
Disable triggers on the updated table while it is being updated (if possible)
Use the PB Pipeline object; it has settings for commit frequency. It might be faster, but not by much.
Best idea: do something on the server side. I'd create the SQL statements for your 40K inserts and call a stored procedure, sending all 40K insert/update statements, and let the stored procedure handle the inserts/updates.
Create a dummy table with a few columns, one being a long text column; update it with a block of SQL statements like those mentioned in the last idea, and have a process that splits on the delimiter and executes the SQL statements.
Some variant of above but using bulk insert as mentioned by Matt. Bulk insert is the fastest way to insert many rows.
Maybe try something with autocommit so that you commit only at the end, or every 10k rows as mentioned by someone already.
PB has an async option in the transaction object (connection); maybe you could let the update go in the background and let the user continue. This doesn't work with all databases and may not work in your situation. I haven't had much luck using the async option.
The reason your process is so slow is that PB does each update separately, so you are hitting the network and database constantly. There may be triggers on the updated table, and those are getting hammered too. Slamming the rows in on the server eliminates network lag and is much faster. Using bulk load is even faster yet because it doesn't run triggers and eliminates a lot of the database management overhead.
Expanding on the idea of sending SQL statements to a procedure, you can create the SQL very easily by doing a dw_1.saveas( SQL! ) (syntax is not exact) and send it to the server all at once. Let the server parse it and run the SQL.
Send something like this to the server via a procedure; it should update pretty fast as it is only one round trip:
INSERT INTO TABLE (col1, col2) VALUES ('a', 'b')|INSERT INTO TABLE (col1, col2) VALUES ('a', 'b')|INSERT INTO TABLE (col1, col2) VALUES ('a', 'b')
In procedure:
Parse the sql statements, and run them. Easy peasy.
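A minimal sketch of such a procedure, assuming SQL Server 2005+ (varchar(max)); the procedure name, the parameter, and the '|' delimiter are illustrative, not from the question:
-- Sketch only: split an incoming '|'-delimited string and run each statement.
CREATE PROCEDURE dbo.ExecuteDelimitedSql
    @Batch varchar(max)
AS
BEGIN
    DECLARE @Pos int, @Stmt varchar(max)

    SET @Pos = CHARINDEX('|', @Batch)
    WHILE LEN(@Batch) > 0
    BEGIN
        -- take everything up to the next delimiter (or the rest of the string)
        IF @Pos > 0
        BEGIN
            SET @Stmt  = LEFT(@Batch, @Pos - 1)
            SET @Batch = SUBSTRING(@Batch, @Pos + 1, LEN(@Batch))
        END
        ELSE
        BEGIN
            SET @Stmt  = @Batch
            SET @Batch = ''
        END

        IF LEN(@Stmt) > 0
            EXEC (@Stmt)   -- run one INSERT/UPDATE statement

        SET @Pos = CHARINDEX('|', @Batch)
    END
END
The client then makes a single call, e.g. EXEC dbo.ExecuteDelimitedSql @Batch = '<the pipe-delimited string>', instead of 40,000 round trips.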
While Matt's answer is probably best, I have another option. (Options are good, right?)
I'm not sure why you're avoiding the datastore.Update() method. I'm assuming it's because the schema doesn't exist at the time of the update. If that's the only reason, it can still be used, thus eliminating 40,000 instances of string manipulation to generate valid SQL.
To do it, you would first create the table. Then, you would use datastore.SyntaxFromSQL() to create a datastore that's bound to the table. It might take a couple of Modify() statements to make the datastore update-able. Then you'd move the data from your original datastore to the update-able, bound datastore. (Look at RowsMove() or dot notation.) After that, an Update() statement generates all of your SQL without the overhead of string parsing and looping.

SQL Capture BULK INSERT error 4863

I have a bulk insert inside a TRY...CATCH block:
BEGIN TRY
BULK INSERT dbo.EQUIP_STATUS_CODE
FROM 'filepath\filename.csv'
WITH ( MAXERRORS = 1, FIELDTERMINATOR = ',')
END TRY
BEGIN CATCH
EXECUTE dbo.ERROR_LOG_CSV;
END CATCH
I would like to be able to capture the following error when it occurs:
Bulk load data conversion error (truncation)
But it seems that I can't, even though the severity level is 16, which falls within the TRY...CATCH range. I was wondering if there is a way to capture this error when it occurs.
Before I set MAXERRORS to 1, I got this error:
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
Since the former error is much more descriptive of the problem, that is the one I'd like to record.
Though my competence is more Oracle than SQL Server, I'll try to help with this issue. I discovered that your situation is already in the SQL Server bug tracker (bug id: 592960) with status "Won't fix" since 2010. You can see the corresponding discussion on connect.microsoft.com yourself (at the moment the host is unreachable, so I used Google's cache).
Alexander has given you the answer, but you have to read the bug log very carefully and consider what might be going on (SQL Server bug id: 592960).
You are trying to bulk insert directly from a data file to a data table?
From the article, there is a mismatch in data types or truncation. The SQL engine has a bug that does not report this as an error.
Quote from first person reporting the bug - "Inspite of the severity level being 16 I don't see the error being caught by TRY / CATCH construct. The code doesn't break and proceeds smoothly as if no error has occurred."
Have you investigated which fields may contain bad data?
Here are some suggestions.
1 - COMMA DELIMITED FILES ARE PROBLEMATIC - I always hate the comma-delimited format since commas can be in the data stream. Try using a character like a tilde (~) as the delimiter, which occurs less often. Could the problem be that a text field has a comma in it, thus adding an extra field to the data stream?
2 - USE STAGING TABLE - It is sometimes better to import the data from the file into a staging table that is defined with columns as varchar(x). This allows the data to get into a table (see the sketch after this answer).
Then write a stored procedure to validate the data in the columns before transferring it to the production table. Mark any bad rows as suspect.
Insert the data from the staging table into production, leaving behind any bad rows.
Send an email for someone to look at the bad data. If this is a recurring data file transfer, you will want to fix it at the source.
3 - REWRITE PROCESS WITH AN ETL TOOL - Skip writing this stuff in the engine. SQL Server Integration Services (SSIS) is a great Extract, Transform, Load (ETL) tool.
There are options in the connection where you can state that text is quoted (""), which eliminates the extra-comma issue above. You can send rows that fail to import into the production table to a hospital table for review.
In summary, there is a bug in the engine.
However, I would definitely consider changing to a tilde-delimited file and/or using a staging table. Better yet, if you have the time, rewrite the process with an SSIS package!
Sincerely
J
PS: I am giving Alexander points since he did find the bug on SQL connect. However, I think the format of the file is the root cause.
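As a sketch of suggestions 1 and 2 combined - a tilde-delimited file loaded into an all-varchar staging table, validated, and only then moved to the real table. The staging/suspect table names, the column names, and the file path are assumptions, not from the question:
-- Staging table matches the file exactly, all varchar, so the load itself cannot fail on conversion.
CREATE TABLE dbo.EQUIP_STATUS_CODE_STAGE
(
    status_code  varchar(50),
    description  varchar(255),
    effective_dt varchar(50)
);

BULK INSERT dbo.EQUIP_STATUS_CODE_STAGE
FROM 'filepath\filename.txt'
WITH (FIELDTERMINATOR = '~', ROWTERMINATOR = '\n');

-- Set suspect rows aside for review, then load the clean ones.
SELECT * INTO dbo.EQUIP_STATUS_CODE_SUSPECT
FROM dbo.EQUIP_STATUS_CODE_STAGE
WHERE ISDATE(effective_dt) = 0;

INSERT INTO dbo.EQUIP_STATUS_CODE (status_code, description, effective_dt)
SELECT status_code, description, CAST(effective_dt AS datetime)
FROM dbo.EQUIP_STATUS_CODE_STAGE
WHERE ISDATE(effective_dt) = 1;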
This will probably catch this error because it catches error Msg 4860:
Q:
TRY doesn't CATCH error in BULK INSERT
BEGIN TRY
DECLARE @cmd varchar(1000)
SET @cmd = 'BULK INSERT [dbo].[tblABC]
FROM ''C:\temp.txt''
WITH (DATAFILETYPE = ''widechar'',FIELDTERMINATOR = '';'',ROWTERMINATOR = ''\n'')'
EXECUTE (@cmd)
END TRY
BEGIN CATCH
select error_message()
END CATCH

SQL 2008 All records in column in table updated to NULL

About 5 times a year one of our most critical tables has a specific column where all the values are replaced with NULL. We have run log explorers against this and we cannot see any login/hostname populated with the update, we can just see that the records were changed. We have searched all of our sprocs, functions, etc. for any update statement that touches this table on all databases on our server. The table does have a foreign key constraint on this column. It is an integer value that is established during an update, but the update is identity key specific. There is also an index on this field. Any suggestions on what could be causing this outside of a t-sql update statement?
I would start by denying any client-side dynamic SQL if at all possible. It is much easier to audit stored procedures to make sure they execute the correct SQL, including a proper WHERE clause. Unless your SQL Server is terribly broken, the only way data gets updated is because of the SQL you are running against it.
All stored procs, scripts, etc. should be audited before being allowed to run.
If you don't have the mojo to enforce no dynamic client SQL, add application logging that captures each client SQL statement before it is executed. Personally, I would have the logging routine throw an exception (after logging it) when a WHERE clause is missing, but at a minimum, you should be able to figure out where the data gets blown out next time by reviewing the log. Make sure your log captures enough information that you can trace it back to the exact source. Assign a unique "name" to each possible dynamic SQL statement executed, e.g. assign a 3-character code to each program and then number each possible call 1..nn in your program, so you can tell which call blew up your data at "abc123" as well as the exact SQL that was defective.
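The answer describes doing this in the application; a server-side variant of the same idea might look like the sketch below, where every piece of client SQL goes through a wrapper that logs it and rejects statements without a WHERE clause. All object names and the caller-code format are hypothetical.
CREATE TABLE dbo.ClientSqlLog (
    LogId      int IDENTITY(1,1) PRIMARY KEY,
    CallerCode char(6)       NOT NULL,   -- e.g. 'ABC001' per the 3-char-code-plus-number idea above
    SqlText    nvarchar(max) NOT NULL,
    LoggedAt   datetime      NOT NULL DEFAULT GETDATE()
);
GO
CREATE PROCEDURE dbo.ExecuteLoggedSql
    @CallerCode char(6),
    @SqlText    nvarchar(max)
AS
BEGIN
    -- log first, so even a rejected statement leaves a trace
    INSERT INTO dbo.ClientSqlLog (CallerCode, SqlText)
    VALUES (@CallerCode, @SqlText);

    -- crude guard: refuse UPDATE/DELETE statements that have no WHERE clause
    IF (@SqlText LIKE '%UPDATE%' OR @SqlText LIKE '%DELETE%')
       AND @SqlText NOT LIKE '%WHERE%'
        RAISERROR('Statement rejected: missing WHERE clause.', 16, 1);
    ELSE
        EXEC sp_executesql @SqlText;
END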
ADDED COMMENT
Thought of this later. You might be able to add or modify the update trigger on the SQL table to look at the number of rows updated and prevent the update if the number of rows exceeds a threshold that makes sense for you. So, I did a little searching and found someone has already written an article on this, as in this snippet:
CREATE TRIGGER [Purchasing].[uPreventWholeUpdate]
ON [Purchasing].[VendorContact]
FOR UPDATE AS
BEGIN
DECLARE @Count int
SET @Count = @@ROWCOUNT;
IF @Count >= (SELECT SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE OBJECT_ID = OBJECT_ID('Purchasing.VendorContact' )
AND index_id = 1)
BEGIN
RAISERROR('Cannot update all rows',16,1)
ROLLBACK TRANSACTION
RETURN;
END
END
Though this is not really the right fix, if you log this appropriately, I bet you can figure out what tried to screw up your data and fix it.
Best of luck
A transaction log explorer should be able to show who executed the command, when, and exactly what the command looked like.
Which log explorer do you use? If you are using ApexSQL Log you need to enable connection monitor feature in order to capture additional login details.
This might be like using a sledgehammer to drive in a thumb tack, but have you considered using SQL Server Auditing (provided you are using SQL Server Enterprise 2008 or greater)?
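If auditing is an option, a minimal sketch is below (SQL Server 2008 Enterprise syntax); the audit names, file path, database, and table name are placeholders:
-- Server audit writing to a file; server audits are created in master.
USE master;
CREATE SERVER AUDIT CriticalTableAudit
    TO FILE (FILEPATH = 'D:\AuditLogs\');
ALTER SERVER AUDIT CriticalTableAudit WITH (STATE = ON);
GO
-- Database audit specification recording every UPDATE against the table.
USE YourDatabase;
CREATE DATABASE AUDIT SPECIFICATION CriticalTableUpdates
    FOR SERVER AUDIT CriticalTableAudit
    ADD (UPDATE ON OBJECT::dbo.YourCriticalTable BY public)
    WITH (STATE = ON);
GO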

Correct error handling when dropping and adding columns

I have a table that is currently using a couple of columns named DateFrom and DateTo. I'm trying to replace them with a single NewDate column, populated for existing rows with the value from DateFrom.
I need good error/transaction handling as, if the change fails, I don't want a halfway in-between table, I want to revert.
I've tried a number of things but can't get it to work properly. Any help is appreciated as I'm far from experienced with this.
I started with
BEGIN TRAN
ALTER TABLE TableName
ADD NewDate DATETIME
IF @@ERROR = 0 AND @@TRANCOUNT = 1
UPDATE TableName
SET NewDate = ValidFrom
....
This fails immediately as NewDate is not currently a column in the table. Fine, so I add a GO in there. This breaks it into two batches and it now runs, except it makes the @@ERROR check pointless. I also can't use a local variable as those are lost after GO as well. Ideally I'd like to use a TRY...CATCH to avoid checking errors after each statement but I can't use a GO with that as it needs to be one batch.
None of the articles I've found talk about this situation (error handling with GO). So the question is: Is there any way I can get the transaction-with-error-handling approach I'm looking for when adding and updating a column (which seems to necessitate a GO somewhere)?
Or am I going to have to settle for doing it in several batches, without the ability to roll back to my original table if anything goes wrong?
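One way people often keep this in a single batch (and therefore inside one TRY...CATCH) is to run the UPDATE through EXEC so it is compiled only after the ALTER TABLE has executed. The sketch below uses the table and column names from the question and is an illustration of that workaround, not a tested migration script:
BEGIN TRY
    BEGIN TRAN

    ALTER TABLE TableName ADD NewDate DATETIME

    -- compiled at run time, so NewDate does not need to exist when the batch is parsed
    EXEC ('UPDATE TableName SET NewDate = DateFrom')

    ALTER TABLE TableName DROP COLUMN DateFrom
    ALTER TABLE TableName DROP COLUMN DateTo

    COMMIT TRAN
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRAN
    -- re-raise a simple message so the caller knows everything was reverted
    RAISERROR('Column migration failed; all changes were rolled back.', 16, 1)
END CATCH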
Why are you worried about creating the new column in the transaction? Just create the column and then populate it. You don't even need an explicit tran when populating it. If it fails (which is very unlikely), just do the update again.
I would do the following steps:
Add new column
Update new column
Check to see if the data in the new column looks correct
Drop the old columns no longer needed (you may want to check where these columns are being used before dropping them e.g. are they used in any stored procedures, reports, front-end application code)
Also, it is worth adding more context to your question. I assume you are testing a script against a test database and will later apply the script to a prod database. Is the prod database very big? Very busy? Mission critical? Backed up on a schedule?

Find out which row caused the error

I have a big fat query that's written dynamically to integrate some data. Basically what it does is query some tables, join some other ones, treat some data, and then insert it into a final table.
The problem is that there's too much data, and we can't really trust the sources, because there could be some errored or inconsistent data.
For example, I've spent almost an hour looking for an error while developing against a customer's database, because somewhere in the middle of my big fat query there was an error converting some varchar to datetime. It turned out that they had some sales dated '2009-02-29', an out-of-range date.
And yes, I know. Why was that stored as varchar? Well, the source database has 3 columns for dates: 'Month', 'Day' and 'Year'. I have no idea why it's like that, but still, it is.
But how the hell would I treat that, if the source is not trustworthy?
I can't HANDLE exceptions; I really need the error to come up to another level with the original message, but I wanted to provide some more info, so that the user could at least try to solve it before calling us.
So I thought about displaying to the user the row number, or some ID that would at least give him some idea of which record he'd have to correct. That's also a hard job, because there will be times when the integration will run over 80,000 records.
And in an 80,000-record integration, a single dummy error message - 'The conversion of a varchar data type to a datetime data type resulted in an out-of-range datetime value' - means nothing at all.
So any idea would be appreciated.
Oh I'm using SQL Server 2005 with Service Pack 3.
EDIT:
OK, so from what I've read in the answers, the best thing to do is to check each column that could be critical in raising errors, and if any fails the condition, raise an error myself with a message I find more descriptive, adding some info that could have been stored in a separate table or some variables, for example the ID of the row or some other root information.
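A minimal sketch of that idea, assuming the three date parts are stored as strings; the table and column names (SourceSales, SaleId, SaleYear/SaleMonth/SaleDay) are hypothetical stand-ins for the real source:
-- Collect the IDs of rows whose Year/Month/Day parts do not form a valid date,
-- then raise a descriptive error instead of letting the CAST fail anonymously.
DECLARE @badIds varchar(4000);

SELECT @badIds = COALESCE(@badIds + ', ', '') + CAST(SaleId AS varchar(20))
FROM dbo.SourceSales
WHERE ISDATE(SaleYear + RIGHT('0' + SaleMonth, 2) + RIGHT('0' + SaleDay, 2)) = 0;

IF @badIds IS NOT NULL
    RAISERROR('Integration aborted: rows with invalid dates (ids: %s).', 16, 1, @badIds);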
For dates you can use the ISDATE function:
select ISDATE('20090229'),ISDATE('20090227')
I usually insert into a staging table, do my checks and then insert into the real tables
My suggestion would be to pre-validate the incoming data, and as you encounter errors, set aside the record. For example, check for invalid dates. Say you find 20 in a set of 80K. Pull those 20 out into a separate table, with the error message attached to the record. Run your other validation, then finally import the remaining (all valid) records into the desired target table(s).
This might have too much impact on performance, but would allow you to easily point out the errors and allow them to be corrected and then inserted in a second pass.
This sounds like a standard ETL issue: Extract, Transform, and Load. (Unless you have to run this query over and over again against the same set of data, in which case you'd pretty much do the same thing, only over and over again. So how critical is performance?)
What kind of error handling and/or "reporting of bad data" are you allowed to provide? If you have everything as "one big fat query", your options become very limited -- either the query works or it doesn't, and if it doesn't I'm guessing you get at best one RAISERROR message to tell the caller what's what.
In a situation like this, the general framework I'd try to set up is:
Starting with the source table(s)
Produce an interim set of tables (SQLMenace's staging tables) that you know are consistent and properly formed (valid data, keys, etc.)
Write the "not quite so big and fat query" against those tables
Done this way, you should always be able to return (or store) a valid data set... even if it is empty. The trick will be in determining when the routine fails -- when is the data too corrupt to process and produce the desired results, so you return a properly worded error message instead?
try something like this to find the rows:
...big fat query here...
WHERE ISDATE(YourBadVarcharColumn)!=1
Load the Data into a staging table, where most columns are varchar and allow NULLs, where you have a status column.
Run an UPDATE command like
UPDATE Staging
SET Status='X'
WHERE ISDATE(YourCharYear + YourCharMonth + YourCharDay) != 1
OR OtherColumn<4...
Then just insert from your staging table where Status!='X'
INSERT INTO RealTable
(col1, col2...)
SELECT
col1, col2, ...
FROM Staging
WHERE Status != 'X'
