Multiple users issue on same table - oracle-data-integrator

I have 10 mappings In ODI12c,and all are using same target table(single table),but due to some performance issue,i want that at a time max 2 users can Execute mappings(Max 2 mappings), since they are using same target table,if more then 2 users uses that same target then it should not be executed.How can i implement this in ODI12c?

Are these mappings executed in different packages or the same?
This is a lock-release question.
Depending on your details, you will need to have your mappings have access to a lock mechanism (i.e. increasing/decreasing a variable stored in a db) to manage the maximum amount of parallelism.
Pay attention to failing scenarios though that can leave the lock closed forever.
If they are in the same package, just execute 2 at a time and wait for children

Related

How to deal with race condition in case when it's possible to have multiple servers (and each of them can have multiple threads)

Let's say we have an inventory system that tracks the available number of products in a shop (quantity). So we can have something similar to this:
Id
Name
Quantity
1
Laptop
10
We need to think about two things here:
Be sure that Quantity is never negative
If we have simultaneous requests for a product we must ensure valid Quantity.
In other words, we can have:
request1 for 5 laptops (this request will be processed on thread1)
request2 for 1 laptop (this request will be processed on thread2)
When both requests are processed, the database should contain
Id
Name
Quantity
1
Laptop
4
However, that might not be the case, depending on how we write our code.
If on our server we have something similar to this:
var product = _database.GetProduct();
if (product.Quantity - requestedQuantity >= 0)
{
product.Quantity -= requestedQuantity;
_database.Save();
}
With this code, it's possible that both requests (that are executed on separate threads) would hit the first line of the code at the exact same time.
thread1: _database.GetProduct(); // Quantity is 10
thread2: _database.GetProduct(); // Quantity is 10
thread1: _product.Quantity = 10 - 5 = 5
thread2: _product.Quantity = 10 - 1 = 9
thread1: _database.Save(); // Quantity is 5
thread2: _database.Save(); // Quantity is 9
What has just happened? We have sold 6 laptops, but we reduced just one from the inventory.
How to approach this problem?
To ensure only positive quantity we can use some DB constraints (to imitate unsigned int).
To deal with race condition we usually use lock, and similar techniques.
And depending on a case that might work, if we have one instance of a server...But, what should we do when we have multiple instances of the server and the server is running on multithreading environment?
It seems to me that the moment you have more than one web server, your only reasonable option for locking is the database. Why do I say reasonable? Because we have Mutex.
A lock allows only one thread to enter the part that's locked and the lock is not shared with any other processes.
A mutex is the same as a lock but it can be system-wide (shared by multiple processes).
Now...This is my personal opinion, but I expect that managing Mutex between a few processes in microservice-oriented world where a new instance of the server can spin up each second or where the existing instance of the server can die each second is tricky and messy (Do we have some Github example?).
How to solve the problem then?
Stored procedure* - offload the responsibility to the database. Write a new stored procedure and wrap the whole logic into a transaction. Each of the servers will call this SP and we don't need to worry about anything. But this might be slow?
SELECT ...FOR UPDATE - I saw this while I was investigating the problem. With this approach, we still try to solve the problem on 'database' level.
Taking into account all of the above, what should be the best approach to solve this problem? Is there any other solution I am missing? What would you suggest?
I am working in .NET and using EF Core with PostgreSQL, but I think that this is really a language-agnostic question and that principle for solving the issue is similar in all environments (and similar for many relational databases).
After reading the majority of the comments let's assume that you need a solution for a relational database.
The main thing that you need to guarantee is that the write operation at the end of your code only happens if the precondition is still valid (e.g. product.Quantity - requestedQuantity).
This precondition is evaluated at the application side in memory. But the application only sees a snapshot of the data at the moment, when database read happened: _database.GetProduct(); This might become obsolete as soon as someone else is updating the same data. If you want to avoid using SERIALIZABLE as a transaction isolation level (which has performance implications anyway), the application should detect at the moment of writing if the precondition is still valid. Or said differently, if the data is unchanged while it was working on it.
This can be done by using offline concurrency patterns: Either an optimistic offline lock or a pessimistic offline lock. Many ORM frameworks support these features by default.

How to prevent update anomalies with multiple clients running non-atomic computations concurrently in PostgreSQL?

I am running three PostgreSQL instances using replication (1 master, 2 slaves) which are accessed by two separate servers:
The first (unexposed) server basically iterates over every row in a particular table and continuously updates specific columns (resources) every tick (based on production rate of those resources) for each user.
The second server is a public API that exposes various functions such as spending a certain amount of those resources.
In order to access and manipulate the data I am using an ORM library which allows me to write code as follows:
const resources = await repository.findById(1337);
// some complex computation
resources.iron = computeNewIron(resources.iron);
await repository.save(resources);
Of course it might occur that the API wants to deduct a specific amount of resources right when the server handling the ticks is trying to update the amount of resources which can cause either of the servers to assume a certain amount of resources that is incorrect, basically your typical UPDATE anomaly.
My problem is that I am not just writing a "simple" atomic query such UPDATE table SET iron = iron + 42 WHERE id = :id. The ORM library is internally using a direct assignment that is not self-referencing the respective columns which yields something akin to UPDATE table SET iron = 123 WHERE id = :id where the amount has been computed previously.
I can just assume that it's possible to prevent the mentioned anomaly if I use manually written queries that are incrementing/decrementing the values atomically with self-references. I'd like to know which other options can alleviate the issue. Should I wrap my SELECT/computation/UPDATE in a transaction? Does this suffice?
Your question is a bit unclear, but if your transaction spans several statements, yet needs to have a consistent state of the database, there are basically two options:
Use pessimistic locking: when you read values from the database, do it with SELECT ... FOR UPDATE. Then the rows are locked for the duration of your transaction, and no concurrent transaction can modify them.
Use optimistic locking: start your transaction in REPEATABLE READ isolation level. Then you see a consistent snapshot of the database for the whole duration of your transaction. If somebody else modifies your data after you read them, your UPDATE will cause a serialization error and you'll have to retry the transaction.
Optimistic locking is better if conflicts are rare, while pessimistic locking is preferable if conflicts are likely.

Library design methodology

I want to make the "TRAP AGENT" library. The trap agent library keeps the tracks of the various parameter of the client system. If the parameter of the client system changes above threshold then trap agent library at client side notifies to the server about that parameter. For example, if CPU usage exceeds beyond threshold then it will notify the server that CPU usage is exceeded. I have to measure 50-100 parameters (like memory usage, network usage etc.) at client side.
Now I have the basic idea about the design, but I am stuck with the entire library design.
I have thought of below solutions:
I can create a thread for each parameter (i.e. each thread will monitor single parameter).
I can create a process for each parameter (i.e. each process will monitor single parameter).
I can classify the various parameters into the various groups, like data usage parameter will fall into network group, CPU memory usage parameter will fall into the system group, and then will create thread for each group.
Now 1st solution is looking good as compare to 2nd. If I am adopting 1st solution then it may fail when I want to upgrade my library for 100 to 1000 parameters. Because I have to create 1000 threads at that time, which is not good design (I think so; if I am wrong correct me.)
3rd solution is good, but response time will be high since many parameters will be monitored in single thread.
Is there any better approach?
In general, it's a bad idea to spawn threads 1-to-1 for any logical mapping in your code. You can quickly exhaust the available threads of the system.
In .NET this is very elegantly handled using thread pools:
Thread vs ThreadPool
Here is a C++ discussion, but the concept is the same:
Thread pooling in C++11
Processes are also high overhead on Windows. Both designs sound like they would ironically be quite taxing on the very resources you are trying to monitor.
Threads (and processes) give you parallelism where you need it. For example, letting the GUI be responsive while some background task is running. But if you are just monitoring in the background and reporting to a server, why require so much parallelism?
You could just run each check, one after the other, in a tight event loop in one single thread. If you are worried about not sampling the values as often, I'd say that's actually a benefit. It does no help to consume 50% CPU to monitor your CPU. If you are spot-checking values once every few seconds that is probably fine resolution.
In fact high resolution is of no help if you are reporting to a server. You don't want to denial-of-service-attack your server by doing a HTTP call to it multiple times a second once some value triggers.
NOTE: this doesn't mean you can't have a pluggable architecture. You could create some base class that represents checking a resource and then create subclasses for each specific type. Your event loop could iterate over an array or list of objects, calling each one successively and aggregating the results. At the end of the loop you report back to the server if any are out of range.
You may want to add logic to stop checking (or at least stop reporting back to the server) for some "cool down period" once a trap hits. You don't want to tax your server or spam your logs.
You can follow below methodology:
1.You can have two threads one thread is dedicated to measure emergency parameter and second thread monitors non emergency parameter.
hence response time for emergency parameter will be less.
2.You can define 3 threads.First thread will monitor the high priority(emergency parameter).Second thread will monitor the intermediate priority parameter. and last thread will monitor lowest priority parameter.
So overall response time will be improved as compared to first solution.
3.If response time is not concern then you can monitor all the parameters in single thread.But in this case response time becomes worst when you upgrade your library to monitor 100 to 1000 parameters.
So in 1st case there will be more response time for non emergency parameter.While in 3rd case there will be definitely very high response time.
So solution 2 is better.

When I update/insert a single row should it lock the entire table?

I have two long running queries that are both on transactions and access the same table but completely separate rows in those tables. These queries also perform some update and inserts based on those queries.
It appears that when these run concurrently that they encounter a lock of some kind and it’s preventing the task from finishing and locks up when it goes to update one of the rows. I’m using an exclusive row lock on the rows being read and the lock that shows up on the process is a lck_m_ix lock.
Two questions:
When I update/insert a single row does it lock the entire table?
What can be done to work around this sort of issue?
Typically no, but it depends (most often used answer for SQL Server!)
SQL Server will have to lock the data involved in a transaction in some way. It has to lock the data in the table itself, and the data any affected indexes, while you perform a modification. In order to improve concurrency, there are several "granularities" of locking that the server might decide to use, in order to allow multiple processes to run: row locks, page locks, and table locks are common (there are more). Which scale of locking is in play depends on how the server decides to execute a given update. Complicating things, there are also classifications of locks like shared, exclusive, and intent exclusive, that control whether the locked object can be read and/or modified.
It's been my experience that SQL Server mainly uses page locks for changes to small portions of tables, and past some threshold will automatically escalate to a table lock, if a larger portion of a table seems (from stats) to be affected by an update or delete. The idea is that it is faster to lock a table (one lock) than obtaining and managing thousands of individual row or page locks for a big update.
To see what is happening in your specific case, you'd need to look at the query logic and, while your stuff is running, examine the locking/blocking conditions in sys.dm_tran_locks, sys.dm_os_waiting_tasks or other DMV's. You would want to discover what exactly is getting locked by what step in each of your processes, to discover why one is blocking the other.
The short version:
No
Fix your code.
The long version:
LCK_M_IX is an intent lock, meaning the operation will place an X lock on a subordinate element. Eg. When updating a row in a table, the operation table takes an IX lock on the table before locking X the row being updated/inserted/deleted. Intent locks are common strategy to deal with hierarchies, like table/page/row, because the lock manager cannot understand the physical structure of resources requested to be locked (ie. it cannot know that an X-lock on page P1 is incompatible with an S-lock on row R1 because R1 is contained in P1). For more details, see Lock Modes.
The fact that you are seeing contention on intent locks means you are trying to obtain high level object locks, like table locks. You will need to analyze your source code for the request being blocked (the one requesting the lock incompatible with LCK_M_IX) and remove the cause of the object level lock request. What that means will depend on your source code, I cannot know what you're doing there. My guess is that you use an erroneous lock hint.
A more general approach is to rely on SNAPSHOT ISOLATION. But this, most likely, will not solve the problem you're seeing, since snapshot isolation can only benefit row level contention issues, not applications that request table locks.
A frequent aim of using transactions: keep them as short and sweet as possible. I get the sense from your wording in the question that you are opening a transaction, then doing all kinds of things, some of which take a long time. Then expecting multiple users to be able to run this same code concurrently. Unfortunately, if you perform an insert at the beginning of that set of code, then do 40 other things before committing or rolling back, it is possible that that insert will block everyone else from running the same type of insert, essentially turning your operation from free-for-all to serial.
Find out what each query is doing, and if you are getting lock escalations that you wouldn't expect. Just because you say WITH (ROWLOCK) on a query doesn't mean SQL Server will be able to comply... if you are touched multiple indexes, indexed views, persisted computed columns etc. then there are all kinds of reasons why your rowlock may not hold any water. You also might have things later in the transaction that are taking longer than you think, and maybe you don't realize that the locks on all of the objects involved in the transaction (not just the statement that is currently running) can be held for the duration of the transaction.
Different databases have different locking mechanisms, but ones like SQL Server and Oracle have different types of locking.
The default on SQL Server appears to be pessimistic Page locking - so if you have a small number of records then all of them may get locked.
Most databases should not lock when running a script, so I'm wondering whether you're potentially running multiple queries concurrently without transactions.

Architecting a Work Item Processing System with Modified FIFO Semantics in Windows

I’m building a system that generates “work items” that are queued up for back-end processing. I recently completed a system that had the same requirements and came up with an architecture that I don’t feel is optimal and was hoping for some advice for this new system.
Work items are queued up centrally and need to be processed in an essentially FIFO order. If this were the only requirement, then I would probably favor an MSMQ or SQL Server Service Broker solution. However, in reality, I need to select work items in a modified FIFO order. A work item has several attributes, and they need to be assigned in FIFO order where certain combinations of attribute values exist.
As an example, a work item may have the following attributes: Office, Priority, Group Number and Sequence Number (within group). When multiple items are queued for the same Group Number, they are guaranteed to be queued in Sequence Number order and will have the same priority.
There are several back-end processes (currently implemented as Windows Services) that pull work times in modified FIFO order given certain configuration parameters for the given service. The service running Washington, DC is configured to process only work items for DC, while the service in NY may be configured to process both NY and DC items (mainly to increase overall throughput). In addition to this type of selectivity, higher priority items should be processed first, and items that contain the same “Group Number” must be processed in Sequence Number order. So if the NY service is working on a DC item in group 100 with sequence 1, I don’t want the DC service to pull off DC item in group 100 sequence 2 because sequence 1 is not yet complete. Items in other groups should remain eligible for processing.
In the last system, I implemented the queues with SQL tables. I created stored procedures to submit items and, more importantly, to “assign” items to the Windows Services that were responsible for processing them. The assignment stored procedures contain the selection logic I described above. Each Windows Service would call the assignment stored procedure, passing it the parameters that were unique to that instance of the service (e.g. the eligible offices). This assignment stored procedure stamps the work item as assigned (in process) and when the work is complete, a final stored procedure is called to remove the item from the “queue” (table).
This solution does have some advantages in that I can quickly examine the state of these “queues” by a simple SQL select statement. I’m also able to manipulate the queues easily (e.g. I can bump priorities with a simple SQL update statement). However, on the downside, I occasionally have to deal with deadlocks on these queue tables and have the burden of writing these stored procedures (which gets tedious after a while).
Somehow I think that either MSMQ (with or without WCS) or Service Broker should be able to provide a more elegant solution. Rolling my own queuing/work-item-processing system just feels wrong. But as far as I know, these technologies don’t offer the flexibility that I need in the assignment process. I am hoping that I am wrong. Any advice would be welcome.
It seems to me that your concept of an atomic unit of work is a Group. So I would suggest that you only queue up a message that identified a Group Id, and then your worker will have to go to a table that maps Group Id to 1 or more Work Items.
You can handle your other problems by using more than one queue - NY-High, NY-Low, DC-High, DC-Low, etc.
In all honesty, though, I think you are better served to fix your deadlock issues in your current architecture. You should be reading the TOP 1 message from your queue table with Update Lock and Read Past hints, ordered by your priority logic and whatever filter criteria you want (Office/Location). Then you process your 1 message, change it's status or move it to another table. You should be able to call that stored procedure in parallel without a deadlock issue.
Queues are for FIFO order, not random access order. Even though you are saying that you want FIFO order, you want FIFO order with respect to a random set of variables, which is essentially random order. If you want to use queues, you need to be able to determine order before the message goes in the queue, not after it goes in.

Resources