I am going to build an appointment system, and there is still one question on my mind.
A common requirement for this kind of system is to avoid duplicate bookings of the same slot. My question is about how to actually implement this.
Suppose I do the booking logic within a transaction (with its isolation level set to serializable); my proposed procedure would be:
1. Select data of current booking status.
2. Check if the slot is free to book. If yes, reserve the slot by doing an insertion. If not, reserve another slot (with some logic behind picking the other slot).
My question is: should I place an exclusive lock (XLOCK) in step 1 to prevent other transactions from seeing the booking status before the current transaction has finished?
It seems that issuing a SELECT with XLOCK is rarely done and can introduce deadlocks, and it could probably slow down the system. (This is what I found when researching on the web.)
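To make my plan concrete, here is a rough sketch of what I have in mind, just as an illustration: it assumes PostgreSQL with psycopg2 and a hypothetical bookings table, not my real schema:
import psycopg2
from psycopg2 import errors
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

def book_slot(conn, slot_id, user_id, max_retries=5):
    # Must be set before any statement runs in the current transaction.
    conn.set_session(isolation_level=ISOLATION_LEVEL_SERIALIZABLE)
    for attempt in range(max_retries):
        try:
            with conn.cursor() as cur:
                # Step 1: read current booking status (no explicit lock).
                cur.execute("SELECT 1 FROM bookings WHERE slot_id = %s", (slot_id,))
                if cur.fetchone():
                    conn.rollback()
                    return False  # slot already taken; caller picks another slot
                # Step 2: reserve the slot.
                cur.execute(
                    "INSERT INTO bookings (slot_id, user_id) VALUES (%s, %s)",
                    (slot_id, user_id))
            conn.commit()
            return True
        except errors.SerializationFailure:
            # Another transaction booked concurrently; SERIALIZABLE detected the
            # conflict, so roll back and retry instead of locking up front.
            conn.rollback()
    return False
I would also put a unique constraint on slot_id so duplicates are rejected even if something slips past the isolation level.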
Thanks.
I have two tables in DynamoDB. One has data about homes, one has data about businesses. The homes table has a list of the closest businesses to it, with walking times to each of them. That is, the homes table has a list of IDs which refer to items in the businesses table. Since businesses are constantly opening and closing, both these tables need to be updated frequently.
The problem I'm facing is that, when either one of the tables is updated, the other table will have incorrect data until it is updated itself. To make this clearer: let's say one business closes and another one opens. I could update the businesses table first to remove the old business and add the new one, but the homes table would then still refer to the now-removed business. Similarly, if I updated the homes table first to refer to the new business, the businesses table would not yet have that new business's data. Whichever table I update first, there will always be a period of time where the two tables are not in sync.
What's the best way to deal with this problem? One way I've considered is to do all the updates to a secondary database and then swap it with my primary database, but I'm wondering if there's a better way.
Thanks!
DynamoDB only offers atomic operations at the item level, not at the transaction level, but you can get something similar to an atomic transaction by enforcing some rules in your application.
Let's say you need to run a transaction with two operations:
Delete Business(id=123) from the table.
Update Home(id=456) to remove association with Business(id=123) from the home.businesses array.
Here's what you can do to mimic a transaction:
Generate a timestamp for locking the items
Let's say our current timestamp is 1234567890. Using a timestamp will allow you to clean up failed transactions (I'll explain later).
Lock the two items
Update both Business-123 and Home-456 and set an attribute lock=1234567890.
Do not change any other attributes yet on this update operation!
Use a ConditionExpression (check the Developer Guide and API) to verify that attribute_not_exists(lock) before updating. This way you're sure there's no other process using the same items.
Handle update lock responses
Check if both updates succeeded to Home and Business. If yes to both, it means you can proceed with the actual changes you need to make: delete the Business-123 and update the Home-456 removing the Business association.
For extra care, also use a ConditionExpression in both updates again, but now ensuring that lock == 1234567890. This way you're extra sure no other process overwrote your lock.
If both updates succeed again, you can consider the two items updated and consistent to be read by other processes. To do this, run a third update removing the lock attribute from both items.
If one of the operations fails, you may retry, say, X times. If it fails all X times, make sure the process cleans up the lock that did succeed previously.
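To make the locking step concrete, here is a minimal sketch in Python with boto3; the table name, key shape and timestamp format are assumptions, not necessarily your schema:
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
businesses = dynamodb.Table("Businesses")  # hypothetical table name

def acquire_lock(table, key, lock_ts):
    # Conditionally set lock=<timestamp> only if no lock attribute exists yet.
    try:
        table.update_item(
            Key=key,
            UpdateExpression="SET #lk = :ts",
            ConditionExpression="attribute_not_exists(#lk)",
            ExpressionAttributeNames={"#lk": "lock"},
            ExpressionAttributeValues={":ts": lock_ts},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # someone else already holds the lock
        raise

lock_ts = int(time.time())
locked = acquire_lock(businesses, {"id": "123"}, lock_ts)  # key name is assumed
Releasing the lock is the mirror image: an update with UpdateExpression "REMOVE #lk" and ConditionExpression "#lk = :ts".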
Enforce the transaction lock throughout your code
Always use a ConditionExpression in any part of your code that may update/delete Home and Business items. This is crucial for the solution to work.
When reading Home and Business items, you'll need to do this (it may not be necessary in all reads; you'll decide whether you need to ensure consistency from start to finish while working with an item read from the DB):
Retrieve the item you want to read
Generate a lock timestamp
Update the item with lock=timestamp using a ConditionExpression
If the update succeeds, continue using the item normally; if not, wait one or two seconds and try again.
When you're done, update the item to remove the lock.
Regularly clean up failed transactions
Every minute or so, run a background process to look for potentially failed transactions. If your processes take at most 60 seconds to finish and there's an item whose lock value is older than, say, 5 minutes (remember the lock value is the time the transaction started), it's safe to say that the transaction failed at some point and whatever process was running it didn't properly clean up the locks.
This background job ensures that no item stays locked for eternity.
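For illustration, a cleanup pass could look roughly like this in Python with boto3; table, key and attribute names are assumptions, and scan pagination is omitted for brevity:
import time
import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Businesses")  # hypothetical table name

def clear_stale_locks(table, max_age_seconds=300):
    cutoff = int(time.time()) - max_age_seconds
    # Find items whose lock timestamp is older than the cutoff.
    resp = table.scan(FilterExpression=Attr("lock").lt(cutoff))
    for item in resp["Items"]:
        try:
            table.update_item(
                Key={"id": item["id"]},                 # key name is assumed
                UpdateExpression="REMOVE #lk",
                ConditionExpression="#lk = :ts",        # only if the lock is unchanged
                ExpressionAttributeNames={"#lk": "lock"},
                ExpressionAttributeValues={":ts": item["lock"]},
            )
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise  # a changed lock just means another process took over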
Be aware that this implementation does not ensure a truly atomic and consistent transaction in the sense traditional ACID databases do. If this is mission critical for you (e.g. you're dealing with financial transactions), do not attempt to implement this. Since you said you're OK with atomicity being broken on rare failure occasions, you may live with it happily. ;)
Hope this helps!
I have to create a unique ID for invoices. I have a table with an id and another column for this unique number. I use the serializable isolation level. Using
var seq = #"SELECT invoice_serial + 1 FROM invoice WHERE ""type""=#type ORDER BY invoice_serial DESC LIMIT 1";
doesn't help, because even with FOR UPDATE it won't read the correct value under the serializable isolation level.
The only solution seems to be to add some retry code.
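For reference, the retry approach could look roughly like this; this is only a sketch, assuming PostgreSQL and psycopg2 rather than whatever data-access layer the snippet above uses, and assuming the connection already runs at the serializable level:
from psycopg2 import errors

def next_invoice_serial(conn, invoice_type, retries=10):
    # conn: an open psycopg2 connection already set to SERIALIZABLE.
    for _ in range(retries):
        try:
            with conn.cursor() as cur:
                cur.execute(
                    'SELECT COALESCE(MAX(invoice_serial), 0) + 1 '
                    'FROM invoice WHERE "type" = %s',
                    (invoice_type,))
                serial = cur.fetchone()[0]
                cur.execute(
                    'INSERT INTO invoice ("type", invoice_serial) VALUES (%s, %s)',
                    (invoice_type, serial))
            conn.commit()
            return serial
        except errors.SerializationFailure:
            conn.rollback()  # another transaction took the same number; try again
    raise RuntimeError("could not allocate an invoice serial")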
Sequences do not generate gap-free sets of numbers, and there's really no way of making them do that because a rollback or error will "use" the sequence number.
I wrote up an article on this a while ago. It's directed at Oracle but is really about the fundamental principles of gap-free numbers, and I think the same applies here.
Well, it’s happened again. Someone has asked how to implement a requirement to generate a gap-free series of numbers and a swarm of nay-sayers have descended on them to say (and here I paraphrase slightly) that this will kill system performance, that it’s rarely a valid requirement, that whoever wrote the requirement is an idiot, blah blah blah.
As I point out on the thread, it is sometimes a genuine legal requirement to generate gap-free series of numbers. Invoice numbers for the 2,000,000+ organisations in the UK that are VAT (sales tax) registered have such a requirement, and the reason for this is rather obvious: it makes it more difficult to hide the generation of revenue from the tax authorities. I’ve seen comments that it is a requirement in Spain and Portugal, and I’d not be surprised if it were a requirement in many other countries.
So, if we accept that it is a valid requirement, under what circumstances are gap-free series of numbers a problem? Group-think would often have you believe that they always are, but in fact they are only a potential problem under very particular circumstances:
The series of numbers must have no gaps.
Multiple processes create the entities to which the number is associated (eg. invoices).
The numbers must be generated at the time that the entity is created.
If all of these requirements must be met then you have a point of serialisation in your application, and we’ll discuss that in a moment.
First let’s talk about methods of implementing a series-of-numbers requirement if you can drop any one of those requirements.
If your series of numbers can have gaps (and you have multiple processes requiring instant generation of the number) then use an Oracle Sequence object. They are very high performance and the situations in which gaps can be expected have been very well discussed. It is not too challenging to minimise how many numbers are skipped by making design efforts to reduce the chance of a process failure between generation of the number and committing the transaction, if that is important.
If you do not have multiple processes creating the entities (and you need a gap-free series of numbers that must be instantly generated), as might be the case with the batch generation of invoices, then you already have a point of serialisation. That in itself may not be a problem, and may be an efficient way of performing the required operation. Generating the gap-free numbers is rather trivial in this case. You can read the current maximum value and apply an incrementing value to every entity with a number of techniques. For example if you are inserting a new batch of invoices into your invoice table from a temporary working table you might:
insert into invoices (
    invoice#,
    ...)
with curr as (
    select Coalesce(Max(invoice#), 0) max_invoice#
    from   invoices)
select
    curr.max_invoice# + rownum,
    ...
from
    curr,
    tmp_invoice
...
Of course you would protect your process so that only one instance can run at a time (probably with DBMS_Lock if you're using Oracle), and protect the invoice# with a unique key constraint, and probably check for missing values with separate code if you really, really care.
If you do not need instant generation of the numbers (but you need them gap-free and multiple processes generate the entities) then you can allow the entities to be generated and the transaction commited, and then leave generation of the number to a single batch job. An update on the entity table, or an insert into a separate table.
So what if we need the trifecta of instant generation of a gap-free series of numbers by multiple processes? All we can do is try to minimise the period of serialisation in the process. I offer the following advice, and welcome any additional advice (or counter-advice, of course).
Store your current values in a dedicated table. DO NOT use a sequence.
Ensure that all processes use the same code to generate new numbers by encapsulating it in a function or procedure.
Serialise access to the number generator with DBMS_Lock, making sure that each series has its own dedicated lock.
Hold the lock in the series generator until your entity-creation transaction is complete, by releasing the lock on commit.
Delay the generation of the number until the last possible moment.
Consider the impact of an unexpected error after generating the number and before the commit is completed — will the application rollback gracefully and release the lock, or will it hold the lock on the series generator until the session disconnects later? Whatever method is used, if the transaction fails then the series number(s) must be “returned to the pool”.
Can you encapsulate the whole thing in a trigger on the entity’s table? Can you encapsulate it in a table or other API call that inserts the row and commits the insert automatically?
Original article
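Outside Oracle, the same "dedicated number table, serialised access, lock held until commit" pattern from the article can be sketched like this in Python with psycopg2/PostgreSQL; the invoice_counter table and all column names are hypothetical:
def create_invoice_with_gapfree_number(conn, series, invoice_row):
    # conn: an open psycopg2 connection. Assumes a row for the series
    # already exists in invoice_counter(series, last_value).
    with conn.cursor() as cur:
        # Serialise on the counter row; concurrent callers queue here and the
        # row lock taken by FOR UPDATE is held until COMMIT.
        cur.execute(
            "SELECT last_value FROM invoice_counter WHERE series = %s FOR UPDATE",
            (series,))
        next_no = cur.fetchone()[0] + 1
        cur.execute(
            "UPDATE invoice_counter SET last_value = %s WHERE series = %s",
            (next_no, series))
        # Generate the number at the last possible moment, then insert.
        cur.execute(
            "INSERT INTO invoices (invoice_no, customer_id, amount) "
            "VALUES (%s, %s, %s)",
            (next_no, invoice_row["customer_id"], invoice_row["amount"]))
    conn.commit()  # lock released here; a rollback would return the number
    return next_no
Because the counter update and the invoice insert are in the same transaction, a rollback returns the number to the pool automatically, which is exactly the failure behaviour the article asks you to consider.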
You could create a sequence with no cache, then get the next value from the sequence and use that as your counter.
CREATE SEQUENCE invoice_serial_seq START 101 CACHE 1;
SELECT nextval('invoice_serial_seq');
More info here
You either lock the table against inserts, and/or need to have retry code. There's no other option available. If you stop to think about what can happen with:
parallel processes rolling back
locks timing out
you'll see why.
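For illustration, the "lock the table against inserts" variant might look like this; a sketch assuming PostgreSQL/psycopg2 and the invoice table from the question, while the retry variant is sketched under the question above:
def next_serial_with_table_lock(conn, invoice_type):
    # conn: an open psycopg2 connection. EXCLUSIVE mode blocks concurrent
    # writers (so no one else can grab the same serial) but still allows
    # plain reads; the lock is held until commit/rollback.
    with conn.cursor() as cur:
        cur.execute("LOCK TABLE invoice IN EXCLUSIVE MODE")
        cur.execute(
            'SELECT COALESCE(MAX(invoice_serial), 0) + 1 '
            'FROM invoice WHERE "type" = %s',
            (invoice_type,))
        serial = cur.fetchone()[0]
        cur.execute(
            'INSERT INTO invoice ("type", invoice_serial) VALUES (%s, %s)',
            (invoice_type, serial))
    conn.commit()
    return serial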
In 2006, someone posted a gapless-sequence solution to the PostgreSQL mailing list: http://www.postgresql.org/message-id/44E376F6.7010802@seaworthysys.com
I am wondering how resource-expensive it is to begin a transaction on a connection, immediately update/insert a row, and then leave that transaction hanging for several hours. Basically I just want to reserve a "series number" for my document management system. My series are very custom, and whenever a user presses the "Add new document" button, I want the next value to be allocated in my series allocation table. To allocate it I would insert a row into the allocation table.
The next time another user asks for the next value, the read would use the NOLOCK hint so that it sees my pending inserted value and therefore also knows the next value. If the user cancels the form that adds a new document, I would simply roll back the transaction on my open connection. If the connection is lost while I am in "add" mode, I would check whether the transaction ID under which I allocated my series still matches the current one; if not, I would allocate another one. It is not a problem if a user loses a series number due to a lost connection.
What do you think? It feels like very bad practice to me, because it contradicts an idea I learned over my several years of software development: open the connection as late as possible and close it as soon as possible.
Thank you in advance!
I would consider using sequences. If they do not fit, I would do something like the following:
Have separate transactions to manage your "series numbers".
These transactions are very short and only do e.g. "get next number".
Have a "state" column to know whether something is in progress.
Lock the whole table to manage its contents.
Avoid NOLOCK. Avoid long running transactions.
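A rough sketch of what those short allocation transactions could look like (PostgreSQL/psycopg2; the series_allocation table, its columns and the state values are all assumptions):
def reserve_next_number(conn, series):
    # Keep this transaction tiny: lock, read max, insert, commit.
    with conn.cursor() as cur:
        cur.execute("LOCK TABLE series_allocation IN EXCLUSIVE MODE")
        cur.execute(
            "SELECT COALESCE(MAX(number), 0) + 1 FROM series_allocation "
            "WHERE series = %s", (series,))
        number = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO series_allocation (series, number, state) "
            "VALUES (%s, %s, 'reserved')", (series, number))
    conn.commit()  # released immediately; no long-running transaction
    return number

def finish_number(conn, series, number, used):
    # Later, in another short transaction, record the outcome in the state column.
    with conn.cursor() as cur:
        new_state = 'used' if used else 'cancelled'
        cur.execute(
            "UPDATE series_allocation SET state = %s "
            "WHERE series = %s AND number = %s", (new_state, series, number))
    conn.commit()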
Try to keep your transactions as small as possible: get the sequence number in a separate transaction and then start your actual process. This way there will be fewer connections waiting on your in-progress transactions.
You can also consider using Read Uncommitted or another isolation level for certain cases, such as reports on last week's, last month's, or yearly sales, where the data needed is already present or a minor error is acceptable.
Consider having proper indexes and properly sequenced joins in order to lower execution time.
I work at a company that used a single-table Access database for its outbound CMS, which I moved to a SQL Server based system. There's a data list table (not normalized) and a calls table. The system currently sees about one update per second.
In order to ensure speed, live updates to this system are stored in a duplicate of the calls table fields in the data list table. These are then copied to the calls table in a batch process at the end of the day.
The reason for this is obviously not the speed at which a new record can be added to the calls table live, but that when the user app is closed/opened and loads the user's data set again, I need to check which records have not been called today. Without the duplicate fields I would need to run a stored procedure on the server that picks the most recent call for each record from the calls table and checks whether its call date matches today's date, which I believe is a more expensive query than checking if a field in the data list table is NULL.
With this setup I only run the expensive query at the end of each day.
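For reference, the check I would need looks something like this (table and column names are illustrative, not my exact schema), and I assume it would need an index on calls(data_list_id, call_datetime) to stay cheap:
import pyodbc

def uncalled_today(conn, agent_id):
    # conn: an open pyodbc connection to the SQL Server database.
    sql = """
        SELECT d.id
        FROM   data_list AS d
        WHERE  d.agent_id = ?
          AND  NOT EXISTS (
                 SELECT 1
                 FROM   calls AS c
                 WHERE  c.data_list_id = d.id
                   AND  c.call_datetime >= CAST(GETDATE() AS date)
               )
    """
    cur = conn.cursor()
    cur.execute(sql, agent_id)
    return [row.id for row in cur.fetchall()]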
There are many pitfalls in this design; the main limitation is my inexperience. This is my first SQL Server system. It's pretty critical, and I had to ensure it would work and that I could easily dump data back to the Access DB during a live failure. It has worked for 11 months now (with no live failure and less downtime than the old system).
I have created pretty well normalized databases for other things (with far fewer users), but I'm hesitant to implement this for the calling database.
Specifically, I would like to know your thoughts on whether the duplication of the calls fields in the data list table is necessary in my current setup or whether I should be able to use the calls table. Please try and answer this from my perspective. I know you DBAs may be cringing!
Redesigning an already-working database may turn out to be the major flaw here. Rather, try to optimize what you currently have running instead of starting from scratch. Think of indices, referential integrity, key-assignment methods, proper usage of joins and the like.
In fact, have a look here:
Database development mistakes made by application developers
This outlines some very useful pointers.
The thing the "Normalisation Nazis" out there forget is that database design typically has two stages, the "Logical Design" and the "Physical Design". The logical design is for normalisation, and the physical design is for "now lets get the thing working", considering among other things the benefits of normalisation vs. the benefits of breaking nomalisation.
The classic example is an Order table and an Order-Detail table and the Order header table has "total price" where that value was derived from the Order-Detail and related tables. Having total price on Order in this case still make sense, but it breaks normalisation.
A normalised database is meant to give your database high maintainability and flexibility. But performance is one of the things physical design takes into account. Look at reporting databases, for example. And don't get me started about storing time-series data.
Ask yourself, has my maintainability or flexibility been significantly hindered by this decision? Does it cause me lots of code changes or data redesign when I change something? If not, and you're happy that your design is working as required, then I wouldn't worry.
I think whether to normalize it depends on how much you can do, and what may be needed.
For example, as Ian mentioned, it has been working for so long; are there features they want to add that will impact the database schema?
If not, then just leave it as it is, but, if you need to add new features that change the database, you may want to see about normalizing it at that point.
You wouldn't need to call a stored procedure, you should be able to use a select statement to get the max(id) by the user id, or the max(id) in the table, depending on what you need to do.
Before deciding to normalize, or to make any major architectural changes, first look at why you are doing it. If you are doing it just because you think it needs to be done, then stop, and see if there is anything else you can do, perhaps add unit tests, so you can get some times for how long operations take. Numbers are good before making major changes, to see if there is any real benefit.
I would ask you to be a little more clear about the specific dilemma you face. If your system has worked so well for 11 months, what makes you think it needs any change?
I'm not sure you are aware of the fact that "Database design fundamentals" might relate to "logical database design fundamentals" as well as "physical database design fundamentals", nor whether you are aware of the difference.
Logical database design fundamentals should not (and actually cannot) be "sacrificed" for speed, precisely because speed is determined only by physical design choices, for which the prime decision factor is precisely speed and performance.
Some code that I'm writing has accidentally been slowly turning into a DB system of its own, with incremental indexing and freeform "documents" (of the CouchDB kind) which can have arbitrary properties... anyway... I decided to keep evolving it, mainly for educational purposes, and also to tightly customize it just for my needs and keep it lean, since I'm not trying to make it useful for anyone but myself (how generous :) )...
Anyway, I was wondering if anyone has opinions/more info on how Mnesia implements transactions under the hood.
Ulf W., I always appreciate your posts on the net, so maybe you have some deeper info about this?
Mnesia uses a two-phase commit protocol to manage distributed transactions.
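For anyone unfamiliar with the protocol, the overall shape of a two-phase commit is roughly the following; this is a plain Python sketch of the general idea, not Mnesia's actual implementation, and the participant interface is made up:
def two_phase_commit(participants, transaction):
    # Phase 1: ask every participant to prepare; each one persists enough
    # state to be able to commit later and votes yes/no.
    votes = [p.prepare(transaction) for p in participants]
    if all(votes):
        # Phase 2a: everyone voted yes, so tell all of them to commit.
        for p in participants:
            p.commit(transaction)
        return True
    # Phase 2b: at least one "no" (or failure) means everyone aborts.
    for p in participants:
        p.abort(transaction)
    return False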
So I've been thinking a bit more about everything...
Transaction locks could be kind of hacked on by having a "Lock" element in each tuple that represents a row of a table... that element would contain the Pid of the process that holds the lock while executing the current transaction (and that was spawned by the transaction manager), or the Pid would be stored somewhere else for efficiency reasons; the point is that there's a Pid per row. If another transaction wanted to write to or read from a locked row, the transaction manager would just not execute it and would leave it in the queue for later attempts (the next time it tail-recurses). I would have to think more about how checkpoints would work as well... but overall I'm starting to understand how things are structured, at least conceptually... it's gonna be ugly ;)) and probably orders of magnitude slower than what Mnesia pulls off, but at least I'll learn plenty...
About distributed transactions... I'm guessing that the transaction fun is sent over the wire to another node by converting it into a binary first and then reconstructing it on the other end... now, a question about that. Since the fun is a closure, say I'm using a variable in the fun that's bound outside it, holding say a list of 10 elements, and the closure is then passed in as a transaction to be executed on another node (transparently by the transaction manager). I'm assuming, for the closure semantics to hold, that the list of 10 elements would be sent along as part of the lexical environment the closure closes over... am I missing something here? Just thinking about how one would implement distributed transactions...
thanks
As Mnesia is open source, you can have a look at the code itself. Similarly with CouchDB.