Data integration using SQL statements, as opposed to cursors/triggers - sql-server

I am integrating my .NET/SQL application with two other products. Having read about and experienced the performance and other issues with cursors and triggers, I decided to use a batch approach built from a series of SQL INSERT and UPDATE statements. In some places I need to look up IDs from the incoming feeds and map them to IDs in my system. I also need to do a fair amount of error handling in my SQL batch code: for instance, if a related ID is missing or NULL, I write that row to an error log and don't process it at all. The system will process a large initial batch of hundreds or thousands of records, and once in production we will read incoming feeds on an hourly basis. So far so good.
The problem is that I can't possibly pre-determine every single error the incoming feed could produce. After days of testing I am still seeing one thing or another fail, and the batch doesn't process. When an error happens in a batch or set-based integration (rather than a cursor/trigger), I have no way of pinpointing the record at which the statement failed. I can figure out which SQL statement in my batch failed, but not which exact row.
Whereas if I used a cursor, I would know exactly which record bombed as the cursor processed it, and I could tuck it away into an error log. Isn't this one reason why cursors are helpful in some cases?
Also, is there any way, using my current set-based batching method, that I could pinpoint which row's insert/update failed and have the batch move on with the rest of the processing?
Thanks.
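For what it's worth, the usual set-based answer is to treat validation as its own set-based step: detect the rows that would fail, log them, and only then run the real INSERT/UPDATE against the rows that pass. A minimal sketch, with invented table and column names (IncomingFeed, IdMap, ImportErrorLog, TargetTable):

    -- Log rows whose external ID is missing or has no mapping,
    -- instead of letting them break the batch
    INSERT INTO dbo.ImportErrorLog (FeedRowId, ErrorMessage, LoggedAt)
    SELECT f.FeedRowId, 'Missing or unmapped ExternalId', GETDATE()
    FROM dbo.IncomingFeed AS f
    LEFT JOIN dbo.IdMap AS m ON m.ExternalId = f.ExternalId
    WHERE f.ExternalId IS NULL
       OR m.InternalId IS NULL;

    -- Process only the rows that passed validation
    INSERT INTO dbo.TargetTable (InternalId, SomeValue)
    SELECT m.InternalId, f.SomeValue
    FROM dbo.IncomingFeed AS f
    JOIN dbo.IdMap AS m ON m.ExternalId = f.ExternalId
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ImportErrorLog AS e
                      WHERE e.FeedRowId = f.FeedRowId);

Each new failure you discover then becomes one more validation query feeding the error log, rather than another statement that aborts the whole batch.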

Related

SQL process without task state blocks other processes

I'm having an issue with processes that lock my SQL Server even though they appear to be finished.
The blocking processes are four simple SELECT GETDATE() commands that just don't finish, for reasons unknown to me. SQL Server Profiler doesn't really show any activity except the SELECT GETDATE() repeating every four minutes. It is possible that the same connection sent an UPDLOCK request before that.
I mostly want to find an explanation for that behaviour. I can't really influence those blocking requests, as they are sent by an external company.
The suspended process is also called from within the Business Central server, but it circumvents the standard interpretation layer to optimize performance. To do this, it calls the SQL .NET class to execute the SQL query directly.
If I kill the process, the server throws an error and the whole execution falls apart.
P.S. For the people who work with BC here and think this is not a good idea:
This code won't run in daily business; it's just for migrating data during upgrades from NAV to BC. We built a tool that allows us to map fields between C/AL and AL solutions and generate AL extensions based on those mappings. These extensions grab the data from a copy of the original DB and write it directly into the destination files. We need the SQL commands because some of our customers have accumulated so much data over the years that an upgrade would otherwise take more than a week if the data were processed in AL.
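No answer here, but one way to gather more evidence while the blocking is happening is to query the dynamic management views. A generic sketch (not specific to Business Central); the session ID in the second query is just an example:

    -- Which requests are blocked, by whom, and what both sides are running
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,
           t.text AS current_sql
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.blocking_session_id <> 0;

    -- Locks held by the suspected blocker (replace 57 with its session_id);
    -- a sleeping session that still holds key or object locks points to an
    -- open, uncommitted transaction on that connection
    SELECT resource_type, resource_description, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE request_session_id = 57;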

How does SQL Server insert data in parallel between applications?

I have two applications.
One inserts data into the database continuously, as if in an infinite loop.
What will happen when the second application inserts data into the same database and table?
Does it wait until the other application has finished inserting, and if so, what handles that?
Or will it say it is busy?
Or will the code throw an exception?
SQL Server accepts more than one connection to the database at any particular time (client libraries usually manage these through a connection pool), and that's where the easy bit ends.
If, for example, you connect to the database from two applications at the same time and insert data into different tables from each, the two inserts can happily happen at the same time without issue.
If, however, those applications want to do something like edit the same row, then "locking" comes into play...
Essentially, any operation on a SQL database acquires a lock on a table, page, or row; depending on the configuration of the server, it's hard to say exactly what will happen in your case.
So the simple answer is:
Yes, SQL Server can make things like inserts happen at the same time, but with some caveats.
And the long answer...
...requires in-depth knowledge of locking and of your database and server configuration.
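A small experiment in two query windows makes this concrete. The table is a throwaway example, and the behaviour described assumes the default READ COMMITTED isolation level:

    -- Setup (either window)
    CREATE TABLE dbo.Demo (Id int PRIMARY KEY, Payload varchar(50));

    -- Window 1: insert inside an open transaction and hold the locks
    BEGIN TRANSACTION;
    INSERT INTO dbo.Demo (Id, Payload) VALUES (1, 'from app 1');
    -- ...do not commit yet...

    -- Window 2: inserting a different row succeeds immediately,
    -- because only the new row from window 1 is locked
    INSERT INTO dbo.Demo (Id, Payload) VALUES (2, 'from app 2');

    -- Window 2: touching the uncommitted row blocks until window 1
    -- commits or rolls back
    UPDATE dbo.Demo SET Payload = 'changed' WHERE Id = 1;

    -- Window 1: release the locks and let window 2 continue
    COMMIT TRANSACTION;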

SQL Server Nested Triggers not firing as expected

I have upgraded a SQL Server 6.5 database to SQL Server 2012 by scripting the schema from 6.5, fixing any syntax issues in this script and then I have used this script to create a 2012 database.
At the same time I have upgraded the front-end application from PowerBuilder 6 to 12.5.
When I perform a certain action in the application, it inserts data into a given table. This table has a trigger associated with the INSERT action, and within this trigger other tables are updated. This causes additional triggers to fire on those tables as well.
Initially the PowerBuilder application reports the following error:
Row changed between retrieve and update.
No changes made to database.
Now I understand what this error message means but this is where it gets really 'interesting'!
In order to understand what was happening in the triggers, I decided to insert data into a logging table from within the triggers so that I could better follow the flow of events. This had a rather unexpected side effect: the PowerBuilder application no longer reports any errors, and when I check the database, all the data has been written as expected.
If I remove these lines of logging, the application once again fails with the error message previously listed.
My question is: can anyone explain why adding a few lines of logging could possibly have this side effect? It almost seems as if the act of writing data to a logging table slows things down, or somehow serializes the triggers so that they fire in the correct order...
Thanks in advance for any insight you can offer :-)
Well, let's recap why this message comes up (I have a longer explanation at http://www.techno-kitten.com/PowerBuilder_Help/Troubleshooting/Database_Irregularities/database_irregularities.html). It's basically because the database can no longer find the data it's trying to UPDATE, based on the WHERE clause generated by the DataWindow. Triggers cause this by changing data in columns of the updated table, so the equality tests in the generated WHERE clause no longer match.
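A minimal illustration of that mechanism, with invented table, column and trigger names (this is not the asker's schema): a trigger quietly changes columns on a row the client has already retrieved, so the client's next optimistic UPDATE matches nothing.

    -- Trigger on the inserted-into table that updates a second table
    CREATE TRIGGER trg_OrderLines_Insert ON dbo.OrderLines
    AFTER INSERT
    AS
        UPDATE o
        SET    o.LineCount    = o.LineCount + 1,
               o.LastModified = GETDATE()
        FROM   dbo.Orders AS o
        JOIN   inserted   AS i ON i.OrderId = o.OrderId;
    GO

    -- Meanwhile the client still holds the Orders row it retrieved earlier
    -- and generates an UPDATE comparing the retrieved values. Because the
    -- trigger has since changed LineCount/LastModified, this matches 0 rows
    -- and PowerBuilder reports "Row changed between retrieve and update."
    UPDATE dbo.Orders
    SET    Status = 'Confirmed'
    WHERE  OrderId      = 42
      AND  Status       = 'Open'
      AND  LineCount    = 3
      AND  LastModified = '2024-01-01 09:30:00';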
If I were troubleshooting this, I'd do the following for both versions of the trigger:
1. Retrieve your data in the app, and also in a DBMS tool; cache the data from both (data from the Original buffer in PB, a debugger breakpoint in PB may help) and compare the columns you expect to be in the WHERE clause (client-side manipulation of data and status flags can also cause this problem).
2. Make your data changes and initiate the save.
3. From a breakpoint in the SQLPreview events (this is likely multiple rows if it's trigger-caused), cache the pending UPDATE statements.
4. While still paused in SQLPreview, use the WHERE clause in the UPDATE statements to SELECT the data with the DBMS tool.
Somewhere through all this, you'll identify where the process is breaking down in the failure case, and figure out why it passes in the good case. I suspect you'll find a much simpler solution than you're hypothesizing.
Good luck,
Terry

Transaction Size Limit in SQL Server

I'm loading large amounts of data from a text file into SQL Server. Currently each record is inserted (or updated) in a separate transaction, but this leaves the DB in a bad state if a record fails.
I'd like to put it all in one big transaction. In my case, I'm looking at ~250,000 inserts or updates and maybe ~1,000,000 queries. The text file is roughly 60MB.
Is it unreasonable to put the entire operation into one transaction? What's the limiting factor?
Not only is it not unreasonable, it's a must if you want to preserve integrity when a record fails, so that you get the "all or nothing" import you describe. 250,000 inserts or updates will be no problem for SQL Server to handle, but I would take a look at what those million queries are. If they're not needed to perform the data modification, I would take them out of the transaction so they don't slow down the whole process.
You have to consider that an open transaction (regardless of size) holds locks on the tables it touches, and a lengthy transaction like yours may block other users who try to read those tables at the same time. If you expect the import to be big and time-consuming and the system to be under load, consider running the whole process overnight (or in any non-peak hours) to mitigate the effect.
About the size: there is no specific transaction size limit in SQL Server; a transaction can theoretically modify any amount of data without problems. The practical limit is really the size of the transaction log file of the target database. The engine records the modified data in this file while the transaction is in progress (so it can roll it back if needed), so the file will grow. It must be allowed enough growth in the database properties, and there must be enough disk space for the file to grow. Also, the row or table locks that the engine takes on the affected tables consume memory, so the server must have enough free memory for all this plumbing too. Anyway, 60 MB is generally too little to worry about, and 250,000 rows is considerable but not that much either, so any decent-sized server will be able to handle it.
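A bare-bones shape for the single-transaction import, assuming SQL Server 2005 or later for TRY/CATCH; the staging and target tables are placeholders, not anything from the question:

    BEGIN TRY
        BEGIN TRANSACTION;

        -- insert the rows that are new...
        INSERT INTO dbo.Target (Id, Value)
        SELECT s.Id, s.Value
        FROM   dbo.Staging AS s
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.Id = s.Id);

        -- ...and update the ones that already exist
        UPDATE t
        SET    t.Value = s.Value
        FROM   dbo.Target  AS t
        JOIN   dbo.Staging AS s ON s.Id = t.Id;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- any failure undoes the whole import
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        -- re-raise so the caller sees what went wrong
        DECLARE @msg nvarchar(2048);
        SET @msg = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END CATCH;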
SQL Server can handle transactions of that size. We use a single transaction to bulk load several million records.
The most expensive part of a database operation is usually the client-server round trips and traffic. For inserts/updates, indexing and logging are also expensive, but you can mitigate those costs by using the correct loading techniques (see below). You really want to limit the number of round trips and the amount of data transferred between client and server.
To that end, you should consider bulk loading the data using SSIS or C# with SqlBulkCopy. Once you have bulk loaded everything, you can use set-based operations ON THE SERVER to update or verify your data.
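Staying in T-SQL rather than SSIS or SqlBulkCopy, the same pattern can be sketched with BULK INSERT into a staging table followed by server-side set-based checks; the file path, delimiters, and table names here are only illustrative:

    -- Load the whole file into a staging table in one shot
    BULK INSERT dbo.FeedStaging
    FROM 'C:\loads\feed.txt'
    WITH (FIELDTERMINATOR = '\t',
          ROWTERMINATOR  = '\n',
          BATCHSIZE      = 50000,
          TABLOCK);

    -- Then verify on the server with set-based statements, e.g. flag
    -- staged rows whose key already exists in the target table
    UPDATE s
    SET    s.AlreadyExists = 1
    FROM   dbo.FeedStaging AS s
    JOIN   dbo.Target      AS t ON t.Id = s.Id;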
Take a look at this question for more suggestions about optimizing data loads. The question is related to C#, but a lot of the information is also useful for SSIS or other loading methods: What's the fastest way to bulk insert a lot of data in SQL Server (C# client).
There is no issue with doing an all-or-nothing bulk operation, unless a complete rollback is problematic for your business. In fact, a single transaction is the default behavior for a lot of bulk insert utilities.
I would strongly advise against a single operation per row. If you want to weed out bad data, you can load the data into a staging table first, programmatically determine the "bad data", and skip those rows.
Personally, I never load imported data directly into my prod tables, and I weed out all the records that won't pass muster long before I ever get to the point of loading. Some kinds of errors kill the import completely; others just send the record to an exception table to be returned to the provider and fixed for the next load. Typically I also have logic that determines whether there are too many exceptions and kills the package if so.
For instance, suppose city is a required field in your database, and in a file of 1,000,000 records you have ten with no city. It is probably best to send those ten to an exception table and load the rest. But suppose you have 357,894 records with no city; then you need to have a conversation with the data provider to get the data fixed before loading. It certainly affects prod less if you can determine that the file is unusable before you ever touch the production tables.
Also, why are you doing this one record at a time? You can often go much faster with set-based processing, especially if you have already cleaned the data beforehand. You may still need to work in batches, but one record at a time can be very slow.
If you really want to roll back the whole thing when any part errors, then yes, you need a transaction. If you do this in SSIS, you can put the transaction around just the part of the package that affects the prod tables, and not worry about it during the staging and clean-up steps.
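Using the city example above, the exception-table routing and the "too many exceptions" guard might look roughly like this against a staging table (all table and column names are invented, and the 10% threshold is arbitrary):

    -- Route rows that fail validation to an exception table
    INSERT INTO dbo.LoadExceptions (StagingId, Reason)
    SELECT StagingId, 'City is required'
    FROM   dbo.FeedStaging
    WHERE  City IS NULL OR City = '';

    DECLARE @rejects int, @total int;
    SELECT @rejects = COUNT(*) FROM dbo.LoadExceptions;
    SELECT @total   = COUNT(*) FROM dbo.FeedStaging;

    -- If the reject rate suggests the file itself is bad, complain and load
    -- nothing; otherwise load only the clean rows
    IF @rejects * 10 > @total
        RAISERROR('Reject rate too high; send the file back to the provider.', 16, 1);
    ELSE
        INSERT INTO dbo.Customers (Name, City)
        SELECT s.Name, s.City
        FROM   dbo.FeedStaging AS s
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.LoadExceptions AS e
                           WHERE e.StagingId = s.StagingId);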

Getting stored procedure usage data on SQL Server 2000

What is the best way to get stored procedure usage data for a specific database out of SQL Server 2000?
The data I need is:
Total of all stored procedure calls over X time
Total of each specific stored procedure call over X time.
Total time spent processing all stored procedures over X time.
Total time spent processing specific stored procedures over X time.
My first hunch was to set up SQL Profiler with a bunch of filters to gather this data. What I don't like about this solution is that the data has to be written to a file or table somewhere, and I have to do the number crunching myself to get the results I need. I would also like to get these results over the course of many days as I apply changes, to see how those changes impact the database.
I do not have direct access to the server to run SQL Profiler so I would need to create the trace template file and submit it to my DBA and have them run it over X time and get back to me with the results.
Are there any better solutions to get the data I need? I would like to get even more data if possible but the above data is sufficient for my current needs and I don't have a lot of time to spend on this.
Edit: Maybe there are some recommended tools out there that can work on the trace file that Profiler creates and give me the stats I want?
Two options I see:
Re-script and recompile your sprocs to call a logging sproc. That logging sproc would be called by every sproc you want perf tracking on, and it would write to a table the calling sproc's name, the current datetime, and anything else you'd like (a minimal sketch follows below the two options).
Pro: easily reversible, as you'd have a copy of your sprocs in a script that you could easily back out. Easily queryable!
Con: performance hit on each run of the sprocs that you are trying to gauge.
Recompile your data access layer with code that writes to a log text file at the start and end of each sproc call. Do you inherit your DAL from a single class where you can insert this logging code in one place? Pro: no DB messiness, and you can swap the assembly out when you want to stop the perf measurement; it could even be toggled on/off in app.config. Con: disk I/O.
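For option 1, one possible shape for the logging table and sproc (names invented, SQL 2000-compatible syntax):

    CREATE TABLE dbo.SprocUsageLog (
        SprocName  sysname  NOT NULL,
        CalledAt   datetime NOT NULL DEFAULT (GETDATE()),
        DurationMs int      NULL
    );
    GO

    CREATE PROCEDURE dbo.LogSprocCall
        @SprocName  sysname,
        @DurationMs int = NULL
    AS
        INSERT INTO dbo.SprocUsageLog (SprocName, CalledAt, DurationMs)
        VALUES (@SprocName, GETDATE(), @DurationMs);
    GO

    -- Inside each sproc you want to measure:
    --   DECLARE @start datetime, @name sysname, @ms int
    --   SELECT @start = GETDATE(), @name = OBJECT_NAME(@@PROCID)
    --   ...existing body...
    --   SET @ms = DATEDIFF(ms, @start, GETDATE())
    --   EXEC dbo.LogSprocCall @name, @ms

    -- The totals asked for then become simple aggregates
    SELECT SprocName,
           COUNT(*)        AS Calls,
           SUM(DurationMs) AS TotalMs
    FROM   dbo.SprocUsageLog
    WHERE  CalledAt >= DATEADD(day, -7, GETDATE())
    GROUP  BY SprocName;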
Perhaps creating a SQL Server Trace outside of SQL Profiler might help.
http://support.microsoft.com/kb/283790
This solution involves creating a text file with all your tracing options, and the trace output is also written to a file. Perhaps it could be modified to dump into a log table instead.
Monitoring the traces: http://support.microsoft.com/kb/283786/EN-US/
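If the DBA hands back the trace file itself, something along these lines turns it into the totals the question asks for. The path is made up, and it assumes the trace captured the SP:Completed event with the ObjectName and Duration columns (on SQL Server 2000, Duration is in milliseconds and fn_trace_gettable needs the :: prefix):

    -- Pull the trace file into a table
    SELECT *
    INTO   dbo.ProcTrace
    FROM   ::fn_trace_gettable('C:\traces\sproc_usage.trc', default);

    -- Total calls and total time per procedure over the traced period
    SELECT ObjectName,
           COUNT(*)      AS Calls,
           SUM(Duration) AS TotalDurationMs
    FROM   dbo.ProcTrace
    WHERE  EventClass = 43          -- SP:Completed
    GROUP  BY ObjectName
    ORDER  BY TotalDurationMs DESC;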
