Do relational databases execute insert statements in parallel or sequentially?

If two users were to execute INSERT INTO statements on the same target table at the same time, would these be executed in parallel or in sequence?
Will this behavior change based on whether the target table has a primary key or not?
Is this a defined rule for all relational databases or do different vendors implement this in different ways?

In general they will (should) be executed in parallel, even if a primary key is defined.
The behaviour depends heavily on the DBMS. MySQL with MyISAM will, for example, block any further access to a table while DML is being executed against it. The same is true for SQL Server in the default installation and for older DB2 versions.
In general, if the DBMS uses MVCC (Oracle, PostgreSQL, Firebird, MySQL/InnoDB, ...) then you can expect inserts to run in parallel.
One thing that can block concurrent inserts is two transactions inserting the same primary key value. In that case the second transaction has to wait for the first one to either commit (the second one then gets a primary key violation error) or roll back (the second one then succeeds).
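A minimal way to see this, assuming a simple table t with an integer primary key (the names here are just for illustration):

-- setup
CREATE TABLE t (id integer PRIMARY KEY, val varchar(20));

-- session 1
BEGIN;
INSERT INTO t (id, val) VALUES (1, 'first');   -- key 1 now belongs to an open transaction

-- session 2, at the same time
BEGIN;
INSERT INTO t (id, val) VALUES (2, 'second');  -- different key: proceeds in parallel
INSERT INTO t (id, val) VALUES (1, 'dup');     -- same key: blocks until session 1 commits
                                               -- (then fails with a primary key violation)
                                               -- or rolls back (then it succeeds)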

Related

SET NULL works in SQLite but not in SQL Server when a table references itself

The problem is that "DeleteBehavior.SetNull" works only in SQLite and doesn't work at all in SQL Server. Is this some limitation of SQL Server with SET NULL?
I have the "User" model:
User.Id
User.Name
And I also have the "Partner" model:
Partner.Id
Partner.Title
Partner.ParentId
Partner.Parent (virtual)
Scenario:
I create Partner 1
I create Partner 2 and define that the ParentId is Partner 1 (1 is the father of 2)
I try to delete Partner 1 (I try to delete the parent)
At that moment, SQLite sets the ParentId of Partner 2 to NULL. That's correct, that's the behavior I want, but in SQL Server I can't do that at all; I tried innumerable ways and keep running into errors, which follow below:
Errors:
Delete Error:
Microsoft.Data.SqlClient.SqlException (0x80131904): The DELETE statement conflicted with the SAME TABLE REFERENCE constraint "FK_Partners_Partners_ParentId". The conflict occurred in database "master", table "dbo.Partners", column 'ParentId'.
Migrations Error:
Introducing FOREIGN KEY constraint 'FK_Partners_Partners_ParentId' on table 'Partners' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
Could not create constraint or index. See previous errors.
I even found some old posts saying that this is a SQL Server limitation, but it's already 2023 and this limitation still exists? Is there an easy way to work around it that applies to every table in the database?
I already tried all the DeleteBehavior options and none works like SQLite. I was developing 100% on SQLite and managed to build a system where everything works, but when generating the migration and trying to use SQL Server I ran into this problem.
The same thing is asked at dba.stackexchange.com. The answers explain in detail why this isn't so easy to implement. Relational databases operate on sets of rows at a time, not individual rows. Deleting or updating rows one by one is the slowest way possible.
While SQLite is built to handle a few thousand rows for a single application running inside a watch, SQL Server has to handle thousands of concurrent operations on the same table, which may contain several million rows spread across multiple partitions. The self-referencing ON DELETE SET NULL has to work reliably and predictably when deleting 1 row in a 1,000-row table and when deleting 10K rows in a 50M-row table.
As Mikael Eriksson explains in the first answer, ON DELETE SET NULL converts a DELETE operation on a table to an UPDATE operation on the same table.
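In T-SQL terms, the constraint the migration tries to create looks roughly like the following (a sketch; the column types are assumed from the model above). SQL Server refuses to create a self-referencing SET NULL action and raises the "cycles or multiple cascade paths" error quoted in the question:

CREATE TABLE dbo.Partners (
    Id       int IDENTITY PRIMARY KEY,
    Title    nvarchar(100) NULL,
    ParentId int NULL,
    CONSTRAINT FK_Partners_Partners_ParentId
        FOREIGN KEY (ParentId) REFERENCES dbo.Partners (Id)
        ON DELETE SET NULL   -- rejected for a self-referencing foreign key
);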
This DBA question on cascading DELETEs shows what's involved in the easy case. There the server:
Finds the rows that need to be deleted in the first table,
Removes them from the parent table, which means marking rows and pages for deletion and writing records to the transaction log,
Spools the deleted keys so they can be used on the related table,
Repeats steps 1-2 on the child table.
When all that finishes, it commits the transaction by committing all changes in the data pages and the transaction log.
And that's just a single operation. ON DELETE SET NULL, on the other hand, converts the DELETE operation into a DELETE and an UPDATE on the same table. The database would have to both DELETE and UPDATE index rows on the ParentId index to make this happen. Different kinds of locks would have to be taken on the same table at the same time, and some of them could conflict with each other.
There's a similar statement that does multiple operations at once, MERGE. Aaron Bertrand's Use Caution with SQL Server's MERGE Statement shows a list of 30 bugs for that statement alone. MERGE isn't even atomic and the UPDATE/DELETE/INSERT operations are executed separately, which is the cause of some of the bugs.
I'd rather not have ON DELETE SET NULL at all than have a slow or unreliable one.
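A common workaround (a sketch, not the only option; @PartnerId is a placeholder parameter) is to declare the self-referencing foreign key with the default NO ACTION and perform the SET NULL step yourself in the same transaction:

BEGIN TRANSACTION;
    -- detach the children first, then delete the parent
    UPDATE dbo.Partners SET ParentId = NULL WHERE ParentId = @PartnerId;
    DELETE FROM dbo.Partners WHERE Id = @PartnerId;
COMMIT TRANSACTION;

In EF Core terms this is roughly what DeleteBehavior.ClientSetNull does for dependents that are loaded and tracked by the context; rows that aren't tracked still have to be handled in SQL as above.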
While trying to reproduce this I found an SQLite limitation - foreign keys aren't enforced by default for compatibility with the way it worked over a decade ago. The docs warn this can change in the future:
Foreign key constraints are disabled by default (for backwards compatibility), so must be enabled separately for each database connection. (Note, however, that future releases of SQLite might change so that foreign key constraints are enabled by default. Careful developers will not make any assumptions about whether or not foreign keys are enabled by default but will instead enable or disable them as necessary.) The application can also use a PRAGMA foreign_keys statement to determine if foreign keys are currently enabled.
This can seem like an illogical restriction in 2023 until one remembers that SQLite was built to run on the weakest possible devices (microcontrollers, not even processors) where the very fact of checking constraints can cause significant problems. Those devices can easily be inside a car or other hardware device with a lifetime of decades.
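So when testing this scenario directly in SQLite, enforcement has to be switched on per connection, along these lines:

PRAGMA foreign_keys = ON;   -- enable foreign key enforcement for this connection
PRAGMA foreign_keys;        -- returns 1 if enforcement is currently enabled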

Why does postgres lock one table when inserting into another

My source table, called Event, sits in a different database and has millions of rows. Each event can have an action of DELETE, UPDATE or NEW.
We have a Java process that goes through these events in the order they were created, applies all sorts of rules, and then inserts the results into multiple tables for lookup, analysis etc.
I am using JdbcTemplate's batchUpdate to delete and upsert to the Postgres DB sequentially right now, but I'd like to be able to parallelize it too. Each batch is 1,000 entities to be inserted/upserted or deleted.
However, even running sequentially, Postgres blocks queries somehow, and I don't understand why.
Here is some of the code:
entityService.deleteBatch(deletedEntities);
indexingService.deleteBatch(deletedEntities);
...
entityService.updateBatch(allActiveEntities);
indexingService.updateBatch(....);
Each of these services does inserts/deletes into a different table. They are all in one transaction though.
The following query
SELECT
activity.pid,
activity.usename,
activity.query,
blocking.pid AS blocking_id,
blocking.query AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid));
returns
Query being blocked: "insert INTO ENTITY (reference, seq, data) VALUES($1, $2, $3) ON CONFLICT ON CONSTRAINT ENTITY_c DO UPDATE SET data = $4"
Blocking query: delete from ENTITY_INDEX where reference = $1
There are no foreign key constraints between these tables. And we do have indexes so that we can run the queries needed for our processing.
Why would a completely different table block the other tables? And how can we go about resolving this?
Your query is misleading.
What it shows as “blocking query” is really the last statement that ran in the blocking transaction.
It was probably a previous statement in the same transaction that caused entity (or rather a row in it) to be locked.
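To see which locks the blocking transaction actually holds (rather than just its last statement), you can join pg_locks with pg_stat_activity; a sketch:

SELECT l.pid,
       l.locktype,
       l.relation::regclass AS relation,   -- the table the lock is on, if any
       l.mode,
       l.granted,
       a.state,
       a.query                             -- last statement of that backend
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE a.datname = current_database()
ORDER BY l.granted, l.pid;

Ungranted entries are the waiters; the granted row and relation locks held by the other pid show what the earlier statements of that transaction touched.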

Synchronize data from Oracle to PostgreSQL

We would like to synchronize data (insert, update) from Oracle (11g) to PostgreSQL (10). Our approach was the following:
A trigger on the table in Oracle updates a column with nextval from a sequence before insert and update.
PostgreSQL knows the last sequence number processed and fetches the rows from Oracle with a sequence number > lastSequenceNumberFetched.
We now have the following problem:
Session 1 in Oracle inserts a row, sequence number (let's say 45) is written but no COMMIT is done in Oracle.
Session 2 in Oracle inserts a row, sequence number is written (let's say 49 (because sequences in Oracle can have gaps)) and a COMMIT is done in Oracle.
Session in PostgreSQL fetches rows from Oracle with sequenceNumber > 44 (because the lastSequenceNumberFetched is 44) and gets the row with sequenceNumber 49. So this is the new lastSequenceNumberFetched.
Session 1 in Oracle makes a commit.
Session in PostgreSQL fetches rows from Oracle with sequenceNumber > 49. Problem is that the row with sequenceNumber 45 is never fetched.
Are there any better approaches for our use case avoiding our problem with missing data?
If you don't have delete operations on your tables and the tables are not very big, then I suggest using the Oracle System Change Number (SCN) at the row level, which is returned by the pseudo column ORA_ROWSCN. This is the commit time represented as a number. By default the SCN is tracked per data block, but you can enable tracking at the row level (keyword ROWDEPENDENCIES), so you have to recreate your table with this keyword. At the start of the sync procedure you get the current SCN with the function call dbms_flashback.get_system_change_number, then scan all tables where ora_rowscn between _last_scn_value_ and _current_scn_value_. The disadvantage is that this pseudo column is not indexed, so you will get full table scans, which is slow for big tables.
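A sketch of that approach (table and column names are placeholders):

-- the table must be (re)created with row-level SCN tracking
CREATE TABLE events_copy (
    id    NUMBER PRIMARY KEY,
    data  VARCHAR2(4000)
) ROWDEPENDENCIES;

-- at the start of each sync run, take the current SCN
SELECT dbms_flashback.get_system_change_number AS current_scn FROM dual;

-- fetch everything committed since the previous run
SELECT id, data, ora_rowscn
FROM events_copy
WHERE ora_rowscn > :last_scn_value
  AND ora_rowscn <= :current_scn_value;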
If you use delete statements then you have to track the records that were deleted. For this purpose you can use a single log table with the following columns: table_name, table_id_value, operation (insert/update/delete). The table is filled by triggers on the base tables. So in your case, when session 1 commits data in the base table, you then have a record in the log table to process, and you don't see it until the session commits. So there are no issues with the sequence numbers you described.
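A sketch of such a log table and trigger (all names here are placeholders):

CREATE TABLE sync_log (
    table_name      VARCHAR2(30),
    table_id_value  NUMBER,
    operation       VARCHAR2(6)
);

CREATE OR REPLACE TRIGGER trg_base_table_sync
AFTER INSERT OR UPDATE OR DELETE ON base_table
FOR EACH ROW
BEGIN
    INSERT INTO sync_log (table_name, table_id_value, operation)
    VALUES ('BASE_TABLE',
            CASE WHEN DELETING THEN :OLD.id ELSE :NEW.id END,
            CASE WHEN INSERTING THEN 'INSERT'
                 WHEN UPDATING  THEN 'UPDATE'
                 ELSE                'DELETE' END);
END;
/

The PostgreSQL side would then consume rows from sync_log and delete (or flag) the ones it has processed, rather than tracking a single high-water mark, so a late-committing session cannot be skipped.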
Hope that helps.
Is this purely a data project or do you have some client here? If you do have a middle tier you could use an ORM to abstract some of this and write to both. Do you care whether the sequences are the same? It would be possible to do something like collect all the data to synchronize since a particular timestamp (every table would have to have a UTC timestamp) and then take a hash of all the data and compare it with what is in Postgres.
It might be useful to know more of your requirements for the data synchronization and the reasoning behind it, e.g.:
Do the keys need to be the same in both environments? Why?
Who views the data? Is the same consumer looking at both sources?
Why wouldn't you just use an ORM and target only one DB? Why do you need both Oracle and Postgres?
I have seen a similar setup: an application on Postgres, mostly for reporting and other secondary tasks, while the main app was on Oracle.
Some of the main app tables are cached in Postgres for convenience. But this setup brings in the sync problem.
The compromise solution was a mix of incremental sequence-based sync during the daytime and a full table copy overnight.
Regarding other solutions proposed here:
Postgres FDW is slow for complex queries and it puts extra load on the foreign DB, especially when the WHERE clause refers to both local and foreign tables. The same query will run much faster if the foreign table is cached in Postgres.
Incremental/differential sync using sequence numbers - tried this and it works acceptably for small tables, but the nightmare starts with child relations; maybe an ORM can help here.
The ideal solution in my opinion would probably be to stream Oracle changes to Postgres, or to an intermediary process that replicates the changes to Postgres. I have no clue how to do this; as I understood it, it requires the Oracle GoldenGate product (plus a licence).

Insert from memory optimized table to physical table

Imagine this scenario in SQL Server 2016: we have two tables, A and B
A is a memory optimized table
B is a normal table
We join A and B; nothing unusual happens and 1,000 rows are returned in minimal time.
But when we want to insert this result set into another table (a memory-optimized table OR a normal table or even a temp table), it takes 10 to 20 seconds.
Any ideas?
UPDATE: Execution plans for the normal scenario and for the memory-optimized table added.
When a DML statement targets a Memory-Optimized table, the query cannot run in parallel, and the server will employ a serialized plan. So, your first statement runs in a single-core mode.
In the second instance, the DML statement leverages the fact that "SELECT INTO / FROM" is parallelizable. This behavior was added in SQL Server 2014. Thus, you get a parallel plan for that. Here is some information about this:
Reference: What's New (Database Engine) - SQL Server 2014
I have run into this problem countless times with Memory-Optimized targets. One solution I have found, if the I/O requirements are high on the retrieval, is to stage the result of the SELECT statement into a temporary table or other intermediate location, then insert from there into the Memory-Optimized table.
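A sketch of that staging pattern (table and column names are placeholders):

-- the expensive retrieval can run with a parallel plan into a temp table
SELECT a.Col1, b.Col2
INTO #staging
FROM dbo.SourceTableA AS a
JOIN dbo.SourceTableB AS b ON b.AId = a.Id;

-- the serialized part is then just a cheap insert from the staged rows
INSERT INTO dbo.MemOptTarget (Col1, Col2)
SELECT Col1, Col2 FROM #staging;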
The third issue is that, by default, statements that merely read from a Memory-Optimized table, even if that table is not the target of DML, are also run in serialized fashion. There is a hotfix for this, which you can enable with a query hint.
The hint is used like this:
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'))
Reference: Update enables DML query plan to scan query memory-optimized tables in parallel in SQL Server 2016
In either case, any DML that has a memory-optimized table as a target is going to run on a single core. This is by design. If you need to exploit parallelism, you cannot do it if the Memory-Optimized table is the target of the statement. You will need to benchmark different approaches to find the one that performs best for your scenario.

Database view performance issue

I am using SQL Server 2008. I have two tables with the same schema, and I created a view that unions the content of the two tables to provide a single "table" view for external access.
One of the tables is read-only and the other receives bulk insert/delete operations (on that table I use bulk insert at some interval to insert several thousand rows, and another SQL Job removes several million rows daily).
My question is: if the second table is under a bulk insert/delete operation, will the physical table be locked so that external users' access to the union view of the two tables is also blocked? (I am wondering whether lock escalation applies in this scenario: row locks eventually escalate to a table lock, which finally blocks access through the view?)
if the other table is under a bulk insert/delete operation, will the physical table be locked so that external users' access to the union view of the two tables is also blocked?
Yes, with the caveat that, if the optimiser can find a way to execute the query that does not involve accessing the bulk insert table, then access will not be blocked.
If you are looking to optimise bulk loading times, make sure you have a read of this blog post.
EDIT
What is the actual problem you are experiencing? Do you really need to use this view everywhere (for example, are there places that only need data from one table but are querying it via the view)?
If you want your view to be "online" all the time, consider either snapshot isolation, or, if you are loading full sets into the bulk table (e.g. the full content is replaced daily), you can load the data into a separate table and sp_rename it in (inside a transaction).
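A sketch of that rename swap (object names are placeholders; the staging table is loaded outside the transaction first):

BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.BulkTable', 'BulkTable_old';
    EXEC sp_rename 'dbo.BulkTable_staging', 'BulkTable';
COMMIT TRANSACTION;
-- the view now reads the freshly loaded data; BulkTable_old can be truncated or dropped afterwards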
Most likely yes. It depends on lock escalation.
Some ways to work around it (not all options; a sketch follows this list):
Use the WITH (NOLOCK) table hint to ignore existing locks and not take shared locks yourself. If used on the view, it also applies to both underlying tables.
Use WITH (READPAST) if you don't mind skipping locked rows in the BCP table.
Change the lock granularity for the BCP table: use sp_tableoption and set "table lock on bulk load" = false.
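A sketch of those options (the view and table names are placeholders):

SELECT Col1, Col2 FROM dbo.CombinedView WITH (NOLOCK);    -- dirty reads; the hint propagates to both underlying tables
SELECT Col1, Col2 FROM dbo.CombinedView WITH (READPAST);  -- skips rows that are currently locked
EXEC sp_tableoption 'dbo.BulkTable', 'table lock on bulk load', 'false';  -- keep the bulk load at row-lock granularity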
Edit: Now I've had coffee...
If you need to query the bulk table during load/delete operations, get accurate results, and not suffer performance hits, I suggest you consider SNAPSHOT isolation.
Edit 2: SNAPSHOT isolation
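A sketch of enabling it (the database name is a placeholder):

ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- readers then opt in per session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT Col1, Col2 FROM dbo.CombinedView;   -- reads a consistent row-versioned snapshot, not blocked by the bulk load

-- or switch the default read-committed behaviour to row versioning for all sessions:
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;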
