System design for concurrent requests on an e-commerce platform - database

Suppose you have 1 item in stock and 1 million requests arrive at the same instant to purchase it. How would you design a system that prevents the product from being sold more than once?

This is where the ACID properties of a relational database come into the picture. If one item were sold more than once, the DB would be in an inconsistent state (for example, 10 items in stock but 11 sold), and the database's transaction handling prevents this, as per the C (consistency) in ACID. In this case, one transaction will go through while all the others will be aborted.
Apart from this, you could also implement locking on the resource (optimistic vs. pessimistic locking). For example, you could lock a particular itemId, either in the database layer or in the application layer.
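One common way to enforce this at the database layer is a conditional update, where the stock check and the decrement happen in a single atomic statement. A minimal sketch, using an in-memory SQLite database as a stand-in for the production store (table and column names are illustrative):

```python
import sqlite3

# In-memory SQLite stands in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 1)")  # one unit in stock
conn.commit()

def try_purchase(conn, item_id):
    # The WHERE clause makes the check and the decrement one atomic
    # statement: only one concurrent purchase can match stock > 0.
    cur = conn.execute(
        "UPDATE items SET stock = stock - 1 WHERE item_id = ? AND stock > 0",
        (item_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the request that won

results = [try_purchase(conn, 1) for _ in range(5)]
print(results)  # [True, False, False, False, False]
```

However many requests race, only the one whose UPDATE actually matched a row succeeds; the rest see `rowcount == 0` and can be told the item is sold.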
Under pessimistic locking, one thread locks the item up front, and all other threads are blocked until it finishes, at which point they are told the item is sold.
Under optimistic locking, no thread holds a lock; each thread reads the item along with a version, and the first one to commit (say, after its payment completes) marks the item sold. All other threads discover at the very last moment, when their commit fails the version check, that the item is sold.
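The optimistic variant is often implemented with a version column: read the version, do the slow work (payment) without holding any lock, then commit only if the version is unchanged. A minimal sketch with SQLite (schema and names are illustrative):

```python
import sqlite3

# Optimistic locking via a version column; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_id INTEGER PRIMARY KEY, sold INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO items VALUES (1, 0, 0)")
conn.commit()

def mark_sold(conn, item_id, expected_version):
    # No lock is held while the buyer completes payment; the UPDATE only
    # succeeds if nobody else changed the row since we read it.
    cur = conn.execute(
        "UPDATE items SET sold = 1, version = version + 1 "
        "WHERE item_id = ? AND version = ? AND sold = 0",
        (item_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1

# Two buyers read the same version, then both try to commit.
version = conn.execute("SELECT version FROM items WHERE item_id = 1").fetchone()[0]
first = mark_sold(conn, 1, version)   # commit succeeds
second = mark_sold(conn, 1, version)  # version check fails: item already sold
print(first, second)  # True False
```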
This is a short answer, but hopefully, this gives you an idea.

The answer to this question depends on a lot of factors driven by the business & technical details.
There's a good discussion already on this topic here.
Also, if you're asking this from a systems-design-interview perspective, the interviewer is more interested in having a discussion of the sort in the link above than in just hearing a single "correct" answer.

Related

MongoDB atomic schema planning

My program has users and groups.
Use-case: Users have credits they can give to the groups.
I want no credits to be lost.
The problem with an RDBMS-like schema is that Mongo's atomicity is document-level. I reduce the user's credits and then increase the group's credits; if the application crashes between the two operations, the credits being sent are lost. (If I swap the order of the two actions, credits can appear in the group without being withdrawn from the user.)
One solution could be a collection called transactions. The user's balance can then be calculated as their credits minus the sum of the credits they have sent, and the group's balance as the sum of the credits sent to it. The problem is that as the database grows, computing these sums takes longer.
Can you give me an acceptable, scalable and robust solution for this with NoSQL? (I know it is really easy in an RDBMS.)
Two-phase commits in MongoDB
You can perform multi-document updates, or multi-document transactions, using a two-phase commit approach.
Using two-phase commit ensures that data is consistent and that, in case of an error, the state that preceded the transaction is recoverable. During the procedure, however, documents can represent pending data and states.
For more details about two-phase commits you can refer to the docs here
NOTE
Because only single-document operations are atomic with MongoDB, two-phase commits can only offer transaction-like semantics. It is possible for applications to return intermediate data at intermediate points during the two-phase commit or rollback.
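The pattern walks each transfer through explicit states (initial, pending, applied, done) and records the transaction id on the affected documents so a crashed transfer can be found and resumed. A minimal in-memory simulation of that state machine (plain dicts stand in for the users, groups and transactions collections; field names are illustrative):

```python
# In-memory simulation of the two-phase commit pattern; dicts stand in
# for the users, groups and transactions collections.
users = {"alice": {"credits": 10, "pending": []}}
groups = {"devs": {"credits": 0, "pending": []}}
transactions = {}

def transfer(txn_id, user, group, amount):
    # 1. Create the transaction document in state "initial".
    transactions[txn_id] = {"state": "initial", "user": user,
                            "group": group, "amount": amount}
    # 2. Move to "pending", then apply both sides, recording the txn id
    #    on each document so a crashed transfer can be found and resumed.
    transactions[txn_id]["state"] = "pending"
    users[user]["credits"] -= amount
    users[user]["pending"].append(txn_id)
    groups[group]["credits"] += amount
    groups[group]["pending"].append(txn_id)
    # 3. Mark applied, then clean up the pending markers and finish.
    transactions[txn_id]["state"] = "applied"
    users[user]["pending"].remove(txn_id)
    groups[group]["pending"].remove(txn_id)
    transactions[txn_id]["state"] = "done"

transfer("t1", "alice", "devs", 3)
print(users["alice"]["credits"], groups["devs"]["credits"])  # 7 3
```

A recovery job would scan for transactions stuck in "pending" or "applied" and replay or roll back the remaining steps; as the note above says, readers can still observe the intermediate states in between.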

Appointment System, Making Reservation in a sql transaction

I am going to build an appointment system. There is still one question in my mind.
The common requirement for this kind of system is to avoid duplicate booking of the same slot. My question is about the actual way to implement this.
Suppose I am going to do the booking logic within a transaction (with its isolation level set to serializable); my proposed procedure would be:
1. Select data of current booking status.
2. Check if the slot is free to book. If yes, reserve the slot by doing insertion. If not, reserve other slot (with some logic behind for picking other slots).
My question is: should I place an exclusive lock (XLOCK) in step 1 to prevent other transactions from reading the booking status before the current transaction finishes?
It seems that a SELECT with XLOCK is rarely used and can introduce deadlocks, and it could probably slow down the system. (This is what I found researching on the web.)
Thanks.
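One common way to get step 2 right without an exclusive lock on the SELECT is to let a unique constraint on the slot reject the losing insert: both transactions may read "free", but only one insert can succeed. A minimal sketch with SQLite (schema and slot format are illustrative):

```python
import sqlite3

# Let a UNIQUE constraint on the slot reject double bookings instead of
# taking an exclusive lock on the read. Schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " booking_id INTEGER PRIMARY KEY,"
    " slot TEXT UNIQUE,"     # at most one booking per slot
    " customer TEXT)"
)

def book(conn, slot, customer):
    try:
        conn.execute(
            "INSERT INTO bookings (slot, customer) VALUES (?, ?)",
            (slot, customer),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # slot already taken; caller picks another slot
        return False

print(book(conn, "2024-01-01 09:00", "alice"))  # True
print(book(conn, "2024-01-01 09:00", "bob"))    # False
```

The constraint makes the database, not the application, the arbiter of the race, which avoids both the XLOCK and the deadlock concern for this particular check.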

Is there an Entity Group Max Size?

I have an Entity that represents a Payment Method. I want to have an entity group for all the payment attempts performed with that payment method.
The 1 write-per-second limitation is fine and actually good for my use case, as there is no good reason to charge a specific credit card more frequently than that, but I could not find any specifications on the max size of an entity group.
My concern is: would a very active corporate account hit any limitation on the number of records within an entity group (say, when they perform their millionth transaction with us)?
No, there isn't a limit on entity group size; all datastore-related limits are documented at Limits.
But be aware that the entity group size matters when it comes to data contention, see Keep entity groups small. Please note that contention happens not only when writing entities, but also when reading them inside a transaction (see Contention problems in Google App Engine) or, occasionally, maybe even outside transactions (see TransactionFailedError on GAE when no transaction).
IMHO your use case is not worth the risk of dealing with these issues (which are fairly difficult to debug and address); I wouldn't use a single entity group in this case.

Entity group in transaction (contention)

I'm reading a book on GAE. In a chapter about transactions, it says:
Updating an entity in a group can potentially cancel updates to any other entity in the group by another process. You should design your data model so that entity groups do not need to be updated by many users simultaneously. Be especially careful if the number of simultaneous updates to a single group grows as your application gets more users. In this case, you usually want to spread the load across multiple entity groups, and increase the number of entity groups automatically as the user base grows. Scalable division of a data resource like this is known as sharding.
An often used example for an entity group is the message board, where the board is the ancestor of messages belonging to that board.
However, if updating a message (i.e. editing it) causes contention, and does so more often as the user base grows, isn't it bad design to create a huge group of messages with the board as their ancestor? The write rate of an entity group is limited to 1 per second. Does that mean any message within the board can be updated at most once per second?
Also, does merely adding an entity to a group (i.e. posting a new message) also count as "updating" and cause contention?
Yes, such a design may be considered a bad one, as it doesn't scale with the number of users. I can't see a good reason why messages would need the board as an ancestor.
Yes, creating a new entity in a group counts as an entity group update and all updates can contribute to contention.
Side note: you might find this clarification useful: https://stackoverflow.com/a/39309022/4495081 (but for designs which have good reasons for using multi-entity groups).
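The sharding the book describes usually means spreading writes across N independent entity groups and aggregating on read. A minimal sketch of the idea in plain Python (shard count, key naming, and the dict standing in for the datastore are all illustrative):

```python
import random

# Spread writes across N entity groups and aggregate on read.
NUM_SHARDS = 5
shards = {f"board-shard-{i}": 0 for i in range(NUM_SHARDS)}

def post_message(shards):
    # Each write picks a random shard, so the 1-write/sec limit applies
    # per shard rather than to the board as a whole.
    shard = f"board-shard-{random.randrange(NUM_SHARDS)}"
    shards[shard] += 1

def message_count(shards):
    # Reads aggregate across all shards.
    return sum(shards.values())

for _ in range(100):
    post_message(shards)
print(message_count(shards))  # 100
```

Increasing `NUM_SHARDS` as the user base grows raises the aggregate write throughput, at the cost of a fan-in on reads.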

Database: Are there any vendors support column level locking?

I'm studying database mechanisms and see that there are two locking mechanisms: table-level locking and row-level locking. I don't see column-level locking, and when I google, I find no document about it except this link: database locking. From that link:
A column level lock just means that some columns within a given row in a given table are locked. This form of locking is not commonly used because it requires a lot of resources to enable and release locks at this level. Also, there is very little support for column level locking in most database vendors.
So, which vendors support column-level locking? And can you explain in more detail why column-level locking requires more resources than row-level locking?
Thanks :)
A lock cannot, in and of itself, require anything. It's an abstract verb acting on an abstract noun. Why should locking a column cost any more than locking a byte, or a file, or a door? So I wouldn't put a lot of stock in your link.
The answer to your question lies in why locks exist -- what they protect -- and how DBMSs are engineered.
One of the primary jobs of a DBMS is to manage concurrency: to give each user, insofar as possible, the illusion that all the data belong to each one, all the time. Different parties are changing the database, and the DBMS ensures that those changes appear to all users as a transaction, meaning no one sees "partial changes", and no one's changes ever "step on" another's. You and I can both change the same thing, but not at the same time: the DBMS makes sure one of us goes first, and can later show who that was. The DBMS uses locks to protect the data while they're being changed, or to prevent them from being changed while they're being viewed.
Note that when we "want to change the same thing", the thing is a row (or rows). Rows represent the things out there in the real world, the things we're counting and tracking. Columns are attributes of those things.
Most DBMSs are organized internally around rows of data. The data are in memory pages and disk blocks, row-by-row. Locks in these systems protect row-oriented data structures in memory. To lock individual rows is expensive; there are a lot of rows. As an expedient, many systems lock sets of rows (pages) or whole tables. Fancier ones have elaborate "lock escalation" to keep the lock population under control.
There are some DBMSs organized around columns. That's a design choice; it makes inserts more expensive, because a row appears in several physical places (one per column), not neatly nestled between two other rows. But the tradeoff is that summarization of individual columns is cheaper in terms of I/O. It could be, in such systems, that there are "column locks", and there's no reason to think they'd be particularly expensive. Observe, however, that for insertion they'd affect concurrency in exactly the same way as a table lock: you can't insert a row into a table whose column is locked. (There are ways to deal with that too. DBMSs are complex, with reason.)
So the answer to your question is that most DBMSs don't have "columns" as internal structures that a lock could protect. Of those that do, a column lock would be a specialty item, something that permits a certain degree of column-wise concurrency at the expense of otherwise being basically a table lock.
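The granularity tradeoff running through this answer can be illustrated outside a database entirely: finer-grained locks allow more concurrency but mean more lock objects to create, track, and release. A toy sketch using Python's threading locks (all names illustrative):

```python
import threading

# Coarse granularity: one lock guards the whole "table".
table_lock = threading.Lock()

# Fine granularity: one lock per "row" -- more concurrency, but many
# more lock objects to manage (the cost the quoted link alludes to).
row_locks = {row_id: threading.Lock() for row_id in range(1000)}

def update_row_coarse(rows, row_id, value):
    with table_lock:  # blocks every other writer, even on other rows
        rows[row_id] = value

def update_row_fine(rows, row_id, value):
    with row_locks[row_id]:  # blocks only writers of this same row
        rows[row_id] = value

rows = {i: 0 for i in range(1000)}
update_row_coarse(rows, 1, "a")
update_row_fine(rows, 2, "b")
print(rows[1], rows[2])  # a b
```

A hypothetical column lock would push this one level further: even more lock objects per unit of data, which is why most engines stop at rows, pages, or tables (with lock escalation in between).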
