Insert/Update rows in a DataTable, VB.NET - sql-server

I'm having issues with a piece of code that should load a database table into a TypedTable and insert rows (or update them if the key is already present); the update part, though, runs extremely slowly.
Most of the tables I handle require a full refresh, so I wipe the data and re-add everything from another table into the typed table using a simple AddTableRow(row) procedure, which works just fine. But when I need to update the data I use the LoadDataRow(row, fAcceptChanges) function, and even wrapped in .BeginLoadData() / .EndLoadData() it gets extremely slow (2-3 updates per second) on a table containing around 500k rows of data (every row has about 15 columns).
I'm pretty new to VB.NET, so I don't know much about the alternatives for updating the DataTable, but if anyone knows a way to speed it up I'd be really glad to hear about it.
Some more info:
The main reason I'm inserting the data row by row is that I need to check the constraints on my table so I can handle exceptions raised by the insert, and the automatic constraint checking of the TypedDataTable is pretty good, considering I have to handle more than 10 DB tables.
My update code currently runs like this:
Table = Parser.GetData()
TypedTable = TableAdapter.GetData()
For Each row In Table
    Try
        Dim TypedRow = TypedTable.NewRow()
        LoadNotTypedIntoTyped(row, TypedRow)
        TypedTable.BeginLoadData()
        TypedTable.LoadDataRow(TypedRow.ItemArray, True) 'TODO speed up this
        TypedTable.EndLoadData()
    Catch ex As Exception
        'Generic exception handling here
    End Try
Next
SqlBulkCopyLoadProcedure()

I found a good solution to my particular problem. Using a typed table means I have more control over the table constraints, because my data source mirrors the DB table, so I created a new empty typed table to load the new data into, then loaded the current data from the DB and used Table1.Merge(Table2) to merge them.
In my case this is possible because the amount of data I handle is not too big (around 500k records). If memory becomes a problem, I think a viable alternative would be to create a support table and merge directly in SQL (see the sketch after the code below), but I'm a DB newbie, so correct me if I'm wrong here.
Code of what I did:
Dim SupportTable As TypedTable = MyTypedTable.Clone()
For Each row In TableToLoad
    Dim NewTypedRow = SupportTable.NewRow()
    For Each col In Columns
        'Load every column
    Next
    SupportTable.AddTypedRow(NewTypedRow)
Next
MyTypedTable.Merge(SupportTable)
MyTypedTable.AcceptChanges()
'Load to database
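For completeness, the SQL-side merge I mentioned would look roughly like this; the table and column names here (StagingTable, TargetTable, Id, Col1) are just placeholders, not my real schema:
-- Hypothetical staging/target tables: MERGE updates existing keys and inserts new ones in one statement.
MERGE TargetTable AS t
USING StagingTable AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Col1 = s.Col1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1) VALUES (s.Id, s.Col1);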

Related

Access Linked to SQL: Wrong data shown for a newly created record, autonumber primary keys

This is similar to another question and I have given it the same name. But my situation is a bit different.
The first question for reference: Access Linked to SQL: Wrong data shown for a newly created record
I have an Access front end linked to tables in SQL Server. For all relevant tables, there is an autonumber (int with Identity Specification) as Primary Key. About half of the linked tables have the following issue, the others do not, despite being set up similarly:
When adding a new record to the table, the record is inserted into the SQL database, but then in the Access front-end view, be it a table or a form, the added record is filled with data from another record.
In the other question, it was explained that Access queries SQL Server with @@IDENTITY. I saw the same thing in a trace. In my case it tries SELECT @@IDENTITY twice, then attempts to pull the new record with sp_prepexec-generated SQL that I can't read, and consistently gets the wrong one, in certain tables but not in others, which are set up basically the same.
The wrong record being returned seems to be an earlier autonumber in the table, and if I do it several times in a row, it returns a series of autonumbers in sequence, for instance 18347, 18348, 18349. (These are the incorrect autonumbers being displayed, along with all the data from their records, instead of the newly created record.) But if I wait a few minutes, there will be a gap; it might return 18456 next, for instance.
Refreshing does bring the correct record into view.
The autonumber fields do show up in Access design view as Primary Keys.
The Access front end is an .mdb file. We are using Access for Microsoft 365 MSO 64 bit.
As a general rule, this issue should not show up.
However, there are two cases to keep in mind.
First case:
In Access, when you START typing in a record with an Access back end (BE), the autonumber is generated and displayed instantly, and this occurs EVEN before the record is saved.
In fact the record may never be saved (the user hits the Esc key, or un-do from the menu, or even Ctrl-Z); at that point the record is not dirty and will not be saved. And of course this means gaps can and will appear in the autonumber.
WHEN using a table linked to SQL Server? You can start typing, and the record becomes dirty, but the AUTONUMBER will NOT display and has NOT yet been generated. Thus your code cannot use the autonumber quite yet. The record has to be saved first before you can get/grab/use the autonumber.
Now for a form + sub form? Well, they work, because Access (for SQL or Access tables) ALWAYS saves the main form's record when focus moves to the child form. So these setups should continue to work.
I note and mention the above since SOME code that uses or requires the autonumber during a record-add process MIGHT exist in your application. That code will have to be changed. Now, to be fair, even in a fairly large application I tend to find few places where this occurs.
Often the simple solution is to modify the code, and simply force the record to be written, and then you have use of the autonumber.
You can do this:
If Me.NewRecord = True Then
    If Me.Dirty = True Then Me.Dirty = False
End If
' code here that needs the PK autonumber
lngNewID = Me!ID  ' the autonumber is now generated and available for use
The next common issue (and likely YOUR issue).
The table(s) in question have triggers. You have to modify the stored procedure (trigger) code to re-select the PK id, and if you don't, then you see the symptoms you describe. If the stored procedure updates other tables, it can still work, but its last line will need to re-select the PK id.
So, in the last line of the stored procedure that is attached to the table? You need to re-select the existing PK value.
e.g.:
SELECT @MyPK as ID
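As a rough sketch of where that re-select goes (the table, trigger, and column names here are made up for illustration, not from your database):
-- Hypothetical AFTER INSERT trigger on dbo.MyTable with PK column ID.
CREATE TRIGGER trg_MyTable_Insert ON dbo.MyTable
AFTER INSERT
AS
BEGIN
    -- ... existing trigger logic, possibly inserting into other tables ...

    -- Last statement: hand back the PK of the row(s) just inserted,
    -- so the client sees the new record's key rather than a value
    -- generated by the trigger's own inserts.
    SELECT ID FROM inserted;
END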

High volume inserts for SQL Server

I'm looking for some advice on how to implement a process for mass inserts, to the tune of 400 records per second. The data is coming from an outside real-time trigger, and the app will get notified when a data change happens. When that data change happens, I need to consume it.
I've looked at several different implementations for doing batch processing including using datatables/sqlbulkcopy or writing to csv and consuming.
What can you recommend?
400 inserts per second doesn't feel like it should present any major challenge. It depends on what you're inserting, whether there are any indexes that could suffer page splits due to the inserts, and whether you have any extra logic going on in your insert proc or script.
If you want to insert them one by one, I would recommend just building a barebones stored procedure which does a simple insert of its parameters into a staging table with no indexes, constraints, or anything else. That will allow you to very quickly get the data into the database, and you can have a separate process come through every minute or so and work off the rows in batches; a sketch of such a procedure is below.
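A minimal sketch, assuming a hypothetical staging table dbo.EventStaging and procedure dbo.InsertEventStaging (adjust the columns to your data):
-- Hypothetical staging table: no indexes or constraints, just a heap to land rows in.
CREATE TABLE dbo.EventStaging
(
    EventId INT,
    Payload NVARCHAR(500)
);
GO

CREATE PROCEDURE dbo.InsertEventStaging
    @EventId INT,
    @Payload NVARCHAR(500)
AS
BEGIN
    SET NOCOUNT ON;
    -- Bare-bones insert; a downstream job works off these rows in batches.
    INSERT INTO dbo.EventStaging (EventId, Payload)
    VALUES (@EventId, @Payload);
END
GO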
Alternatively, you could have your application store up records until you reach a certain number, and then insert them into the database with a proc using a table-valued parameter. Then you'll only have one insert of however many rows you chose to batch up. The cost of that should be pretty trivial. Do note however that if your application crashes before it's inserted enough rows, those will be lost.
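A sketch of the table-valued-parameter variant (again, the type, procedure, and table names are hypothetical):
-- Hypothetical TVP type matching the rows the application batches up.
CREATE TYPE dbo.EventTableType AS TABLE
(
    EventId INT,
    Payload NVARCHAR(500)
);
GO

CREATE PROCEDURE dbo.InsertEventBatch
    @Events dbo.EventTableType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based insert for the whole batch handed over by the app.
    INSERT INTO dbo.TargetEvents (EventId, Payload)
    SELECT EventId, Payload
    FROM @Events;
END
GO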
SqlBulkCopy is a powerful tool, but as the name suggests, it's built more for bulk loading of tables. If you have a constant stream of insert requests coming in, I would not recommend using it to load up your data. That might be a good approach if you want to batch up a LOT of requests to load all at once, but not as a recurring and frequent activity.
This works pretty well for me, though I can't guarantee you 400 per second:
private async Task BulkInsert(string tableName, DataTable dt)
{
    if (dt == null)
        return;

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy("./sqlserver..."))
    {
        bulkCopy.DestinationTableName = tableName;
        await bulkCopy.WriteToServerAsync(dt);
    }
}

How to improve the update?

Description
I use Postgres together with Python 3.
There are 17 million rows in the table, and the max ID is 3,000 million+.
My task is to run select id, link from table where data is null;, do some processing in code, and then run update table set data = %s where id = %s for each row.
I tested it: a single update takes 0.1 s.
My thoughts
These are my ideas:
Try a new database; I have heard Redis is fast, but I don't know how to use it.
In addition, what is the best number of connections?
I used to make 5-6 connections.
Now I use only two connections, which works better: one hour updates 2 million rows.
If there is any way you can push the calculation of the new value into the database, i.e. issue a single large UPDATE statement like
UPDATE "table"
SET data = [calculation here]
WHERE data IS NULL;
you would be much faster.
But for the rest of this discussion I'll assume that you have to calculate the new values in your code, i.e. run one SELECT to get all the rows where data IS NULL and then issue a lot of UPDATE statements, each targeting a single row.
In that case, there are two ways you can speed up processing considerably:
Avoid index updates
Updating an index is more expensive than adding a tuple to the table itself (the appropriately so-called heap, onto which it is quick and easy to pile up entries). So by avoiding index updates, you will be much faster.
There are two ways to avoid index updates:
Drop all indexes after selecting the rows to change and before the UPDATEs and recreate them after processing is completed.
This will be a net win if you update enough rows.
Make sure that there is no index on data and that the table has been created with a fillfactor of less than 50. Then there is enough room in the data pages to write the update into the same page as the original row version, which obviates the need to update the index (this is known as a HOT update).
This is probably not an option for you, since you probably didn't create the table with a fillfactor like that, but I wanted to add it for completeness' sake.
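As a sketch (the index name and indexed column are placeholders), the two options would look something like this:
-- Option 1: drop the indexes before the mass update and recreate them afterwards.
DROP INDEX table_link_idx;
-- ... run all the UPDATE statements here ...
CREATE INDEX table_link_idx ON "table" (link);

-- Option 2: a low fillfactor, which only affects pages written after it is set,
-- so it really needs to be in place when the table is created or rewritten.
ALTER TABLE "table" SET (fillfactor = 45);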
Bundle many updates in a single transaction
By default, each UPDATE will run in its own transaction, which is committed at the end of the statement. However, each COMMIT forces the transaction log (WAL) to be written out to disk, which slows down processing considerably.
You do that by explicitly issuing a BEGIN before the first UPDATE and a COMMIT after the last one. That will also make the whole operation atomic, so that all changes are undone automatically if processing is interrupted.
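In plain SQL (placeholder values), the pattern is simply:
BEGIN;
UPDATE "table" SET data = 'value 1' WHERE id = 101;
UPDATE "table" SET data = 'value 2' WHERE id = 102;
-- ... thousands more single-row updates ...
COMMIT;  -- one WAL flush for the whole batch instead of one per UPDATE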

How to bulk insert and validate data against existing database data

Here is my situation: my client wants to bulk insert 100,000+ rows into the database from a CSV file, which is simple enough, but the values need to be checked against data that is already in the database (does this product type exist? is this product still sold? etc.). To make things worse, these files will also be uploaded into the live system during the day, so I need to make sure I'm not locking any tables for long. The data that is inserted will also be spread across multiple tables.
I've been loading the data into a staging table, which takes seconds. I then tried creating a web service to start processing the table using LINQ, marking any erroneous rows with an invalid flag (this can take some time). Once the validation is done I need to take the valid rows and update/add them to the appropriate tables.
Is there a process for this that I am unfamiliar with?
For a smaller dataset I would suggest
IF EXISTS (SELECT blah FROM blah WHERE ....)
    UPDATE (blah)
ELSE
    INSERT (blah)
You could do this in chunks to avoid server load, but this is by no means a quick solution, so SSIS would be preferable.
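Translated to the staging-table setup in the question, a set-based version of that check-then-write might look like this (all table and column names are hypothetical):
-- Hypothetical tables: dbo.ProductStaging holds validated CSV rows, dbo.Product is live.
-- Update rows that already exist.
UPDATE p
SET    p.Price = s.Price
FROM   dbo.Product AS p
JOIN   dbo.ProductStaging AS s
    ON s.ProductCode = p.ProductCode
WHERE  s.IsValid = 1;

-- Insert rows that do not exist yet.
INSERT INTO dbo.Product (ProductCode, Price)
SELECT s.ProductCode, s.Price
FROM   dbo.ProductStaging AS s
WHERE  s.IsValid = 1
  AND NOT EXISTS (SELECT 1
                  FROM dbo.Product AS p
                  WHERE p.ProductCode = s.ProductCode);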

Correct error handling when dropping and adding columns

I have a table that is currently using a couple of columns named DateFrom and DateTo. I'm trying to replace them with a single NewDate column, populated for existing rows with the value from DateFrom.
I need good error/transaction handling because, if the change fails, I don't want a halfway in-between table; I want to revert.
I've tried a number of things but can't get it to work properly. Any help is appreciated as I'm far from experienced with this.
I started with
BEGIN TRAN

ALTER TABLE TableName
    ADD NewDate DATETIME

IF @@ERROR = 0 AND @@TRANCOUNT = 1
    UPDATE TableName
    SET NewDate = DateFrom
....
This fails immediately as NewDate is not currently a column in the table. Fine, so I add a GO in there. This breaks it into two batches and it now runs, except it makes the @@ERROR check pointless. I also can't use a local variable, as those are lost after GO as well. Ideally I'd like to use TRY...CATCH to avoid checking errors after each statement, but I can't use a GO with that as it needs to be one batch.
None of the articles I've found talk about this situation (error handling with GO). So the question is: Is there any way I can get the transaction-with-error-handling approach I'm looking for when adding and updating a column (which seems to necessitate a GO somewhere)?
Or am I going to have to settle for doing it in several batches, without the ability to roll back to my original table if anything goes wrong?
Why are you worried about creating the new column in the transaction? Just create the column and then populate it. You don't even need an explicit tran when populating it. If it fails (which is very unlikely), just do the update again.
I would do the following steps (sketched below):
1. Add the new column
2. Update the new column
3. Check that the data in the new column looks correct
4. Drop the old columns that are no longer needed (you may want to check where these columns are being used before dropping them, e.g. are they used in any stored procedures, reports, or front-end application code)
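A rough sketch of those steps, using the table and column names from the question (run and verify each step before moving on):
-- 1. Add the new column.
ALTER TABLE TableName
    ADD NewDate DATETIME;
GO

-- 2. Populate it from the old column.
UPDATE TableName
SET NewDate = DateFrom;
GO

-- 3. Spot-check the result before dropping anything.
SELECT TOP (100) DateFrom, DateTo, NewDate
FROM TableName;

-- 4. Once happy (and after checking dependencies), drop the old columns.
ALTER TABLE TableName
    DROP COLUMN DateFrom, DateTo;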
Also, it is worth adding more context to your question. I assume you are testing a script against a test database and will later apply the script to a prod database. Is the prod database very big? Very busy? Mission critical? Backed up on a schedule?
