Is there any stock management design pattern? - database

We want to design an e-commerce application, and we are mental about consistent stock numbers. We don't want our customers finding out, after they have bought an item, that the item is out of stock; that's a big thing here. The average order here has about 60 different items, which makes things even trickier.
Let's imagine these two scenarios:
1st Scenario:
1) Customer C1 opens the online store and finds a product he/she wants to buy;
2) That product is shown as "in stock" (but the current stock is 1);
3) Customer C1 puts 1 item in the basket;
4) Customer C2 visits the website and puts the same item in the basket; it is still marked as "in stock" (stock is still 1);
5) Customer C1 goes to checkout and confirms the purchase, and the application decreases the current stock for that item to 0;
6) Customer C2 keeps shopping, adding, say, 35 other distinct items (it takes customer C2 20 minutes to select them);
7) Customer C2 goes to checkout and confirms the purchase, but the first item he picked is no longer available (and we CANNOT sell it);
8) The application warns customer C2 that the first item is no longer available and that he has to check his basket;
9) Customer C2 gets pissed and closes the browser without buying anything.
2nd scenario (but I think it is unnecessarily complex and buggy):
1) Customer C1 opens the online store and finds a product he/she wants to buy;
2) That product is shown as "in stock" (but the current stock is 1);
3) Customer C1 puts 1 item in the basket (and the application decreases the current stock for that item to 0);
4) Customer C2 visits the website and sees that the item he/she wanted is out of stock;
5) Customer C2 leaves the website;
6) Customer C1 keeps buying items (the stock decreases for each of these items);
7) Customer C1 closes the browser;
8) Every now and then some batch routine kicks in to restore the stock of items that were decremented but never bought/confirmed.
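For illustration, the second scenario can be sketched as time-limited basket holds. This is only a minimal in-memory sketch under assumptions that are not in the question: the class name `StockHolds`, the 15-minute timeout and the one-hold-per-customer-per-product rule are all hypothetical.

```python
import time

class StockHolds:
    """On-hand stock plus time-limited basket holds (in-memory sketch)."""

    def __init__(self, on_hand, hold_seconds=900):
        self.on_hand = dict(on_hand)      # product_id -> units physically in stock
        self.hold_seconds = hold_seconds  # hypothetical basket timeout
        self.holds = {}                   # (customer, product_id) -> (qty, expires_at)

    def available(self, product_id, now=None):
        """Units that can still be put in a basket; expired holds are ignored."""
        now = time.time() if now is None else now
        held = sum(qty for (_, pid), (qty, exp) in self.holds.items()
                   if pid == product_id and exp > now)
        return self.on_hand.get(product_id, 0) - held

    def hold(self, customer, product_id, qty, now=None):
        """Step 3: reserve stock when the item goes into the basket."""
        now = time.time() if now is None else now
        if self.available(product_id, now) < qty:
            return False
        self.holds[(customer, product_id)] = (qty, now + self.hold_seconds)
        return True

    def confirm(self, customer, product_id, now=None):
        """Checkout: turn a still-valid hold into a real stock decrement."""
        now = time.time() if now is None else now
        qty, exp = self.holds.pop((customer, product_id), (0, 0))
        if qty == 0 or exp <= now:
            return False
        self.on_hand[product_id] -= qty
        return True

    def purge_expired(self, now=None):
        """Step 8: the batch routine that drops abandoned holds."""
        now = time.time() if now is None else now
        expired = [key for key, (_, exp) in self.holds.items() if exp <= now]
        for key in expired:
            del self.holds[key]
        return len(expired)
```

Because `available()` already ignores expired holds, the batch routine in this sketch only frees memory; an abandoned basket stops blocking other customers as soon as its hold times out.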
We have just a few distinct products, but we have been selling about 30,000,000 items by phone, and some products sell as many as 2,000,000 units every day. The row holding the stock for such a product may therefore receive many updates at the same time, so it's important that we get good performance.
Those are the usual scenarios, but is there any design pattern that gives the user a better experience while keeping the stock numbers consistent and still delivering great application performance?
Any help will be much appreciated.
Cheers

First off, taking a step back, do you really need to solve the inventory management problem on the front end? Since you're selling large volumes of a relatively small set of products, it should be relatively easy to manage your inventory so that you are never out of stock or, if you are, it doesn't prevent you from fulfilling orders. There is a great deal of literature and examples that deal with calculating safety stock which requires just a bit of statistics to follow. It would make far more sense to me to focus your attention on giving the company the tools (if it doesn't already have them) to manage their inventory to prevent stock-out situations rather than trying to prevent them from happening in the sales portal.
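To give an idea of how little statistics is involved, here is a sketch of one common textbook safety-stock formula. It assumes roughly normally distributed, independent daily demand and a fixed lead time; the function names and the sample figures are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def safety_stock(daily_demand, lead_time_days, service_level=0.99):
    """Extra units to hold: z * sigma_daily * sqrt(lead time in days)."""
    z = NormalDist().inv_cdf(service_level)  # z-score for the target service level
    return z * stdev(daily_demand) * sqrt(lead_time_days)

def reorder_point(daily_demand, lead_time_days, service_level=0.99):
    """Stock level at which to reorder: expected lead-time demand + safety stock."""
    expected = mean(daily_demand) * lead_time_days
    return expected + safety_stock(daily_demand, lead_time_days, service_level)

# Hypothetical sales history (units per day) and a 4-day supplier lead time.
history = [100, 120, 80, 110, 90]
print(round(reorder_point(history, lead_time_days=4)))
```

A higher service level raises the z-score and therefore the buffer, which is the trade-off the business has to decide on.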
That being said, I'm not quite sure that I follow your problem with the two scenarios you outline. Even if the database performance were flawless, if you have only 1 of item A in stock and you can't sell an item that's not in stock, then, by definition, one of the two potential customers is going to lose out. If in the first scenario C2 is going to go away without buying anything if any of his 35 items are not in stock (which seems unlikely if he spent 20 minutes filling his cart), there is nothing you can do in the database to prevent that. Your interface could potentially have some AJAX that alerts customers while they're shopping that one of the items in their cart is out of stock, much like StackExchange notifies you while you're entering an answer that someone else has entered an answer. It's not at all clear to me that telling C2 about the problem earlier is going to be beneficial: if he's going to leave if he can't buy all 35 items in one transaction, he's going to leave no matter when you tell him that C1 bought the item. Realistically, there is no way to design the system so as not to disappoint one of the two customers in that case.
It might help if you can explain a bit more about why your application and your customers are so sensitive to stock-out situations. Most customers and most retailers are relatively accustomed to the fact that sometimes, after placing an order, they get notified that the retailer isn't going to be able to fulfill the order as quickly as they had expected and are given the option to cancel that part of their order, cancel the whole order (assuming the remaining items haven't shipped yet), or wait for the item to come back into stock. Assuming that you do something to notify customers while they're browsing that inventory is relatively low (e.g. Amazon will tell you "N items in stock" if you're looking at an item for which they only have a handful left), most customers are reasonably understanding when they get to the checkout 20 minutes later and are told that the item is now out of stock, since they knew in advance that they needed to order quickly. And most retailers are comfortable that even if they run out of stock of their most popular items, they can still satisfy more requests than they have inventory in hand, because they undoubtedly have new inventory arriving in the next day or two or they can rush an order for new inventory.

Your 1st scenario is what most companies do, and that's why stock management systems have the concept of a back order.
Your 2nd scenario is more beneficial to the customer, but will reduce your sales somewhat, as well as be more complicated to manage.
This really isn't a database decision. This is a management decision on how you want to handle your inventory.
Most relational databases supported with sufficient hardware can handle 2 million changes a day.
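On the concurrency worry specifically, one common way to keep a stock row consistent is a single conditional UPDATE, so the check and the decrement happen atomically in the database. A minimal sketch using SQLite as a stand-in; the `stock` table and column names are assumptions, not something from the question.

```python
import sqlite3

def try_decrement(conn, product_id, qty):
    """Atomically take `qty` units; returns False if stock is insufficient."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE product_id = ? AND qty >= ?",
            (qty, product_id, qty),
        )
    # Exactly one row changes only when enough stock was available.
    return cur.rowcount == 1

# Hypothetical schema with a single unit of product 'A' in stock.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product_id TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('A', 1)")
conn.commit()

print(try_decrement(conn, "A", 1))  # first buyer gets the item
print(try_decrement(conn, "A", 1))  # second buyer is refused, stock never goes negative
```

Whichever customer's statement runs second simply updates zero rows, so the application can tell them immediately instead of overselling.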

You could try to find out how other online retailers do it, and emulate them. For example, when Amazon is almost out of a product, they'll often display a notice saying, "Only n left in stock!" Try to find a product like that, then add it to your cart in one browser and use a different browser to see what happens to the inventory.

I am with Justin and Gilbert. This is more about logistics than front-end. There is also the Amazon solution of saying "I want all these things shipped in the same packet" (i.e. that will take longer, as all the bits have to wait for the slowest one) or "send them separately, as soon as they are available". Basically, you give yourself time to restock.
I think the most infuriating scenario is booking airline/ferry tickets, where, when you get to the paying part, they either time out, do "not have that price available anymore" or some such nonsense. Particularly annoying, as it is not exactly buying cruise boat propellers.
You could do a kan-ban routine, where you basically say that when you have 10 (or whatever) left of something, it is shown as "1 item remaining" in the front end. That means that customerA and customerB buying at the same time both get their stuff. And then the warning goes to procurement within the company: "we are "out" of objectN".
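The buffer idea boils down to a couple of lines. A rough sketch, where the function names and the hidden reserve of 9 units (so that 10 real units show as 1) are only illustrative:

```python
def displayed_stock(actual_qty, hidden_reserve=9):
    """Quantity shown in the front end; the reserve is never advertised."""
    return max(0, actual_qty - hidden_reserve)

def procurement_warning(actual_qty, hidden_reserve=9):
    """True once the front end shows the product as "out" of stock."""
    return displayed_stock(actual_qty, hidden_reserve) == 0

print(displayed_stock(10))      # 1
print(procurement_warning(9))   # True
```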
I would be interested in knowing what kind of stuff your client is selling, given that most customers buy 60 objects.

Related

Life Time Value Model (LTV)... where do I even start?

I recently was picked to lead a longitudinal LTV model for our analytics dept. The final deliverable will be for external stakeholders, so essentially how the users on our platform (can't specify the company) are providing life time value to our external partners.
We'll be building this model from the ground up. We have nothing in place for this currently, just a sea of data (assume very generic assets, e.g. users, sign ups, user interaction with platform, etc.)
So... where do I even start? I've just been reading random docs on google for the time being. Any specific resources that are good? Are there different LTV methodologies? What's the "best" one (please take that with a grain of salt)?
I know this is an extremely broad topic so any answers even loosely related to LTV will hold significant value. Thanks all
I haven't tried anything yet. Just reading up on a few resources.
The first thing you want to do is lay out the reasoning for having LTV: what it's going to be used for and by whom. I'll give some examples, but you will have to tailor it to your industry and your business.
Next, you hold a series of meetings with all the stakeholders so that they agree on a good definition of LTV, under the tight guidance of someone who understands the data, or at least which dimensions have to influence it and what format it has to be in.
An example would be: you have an app that offers seven products. The first two products are freebies. Another requires an email to get. The fourth product is just one buck per month, the fifth costs a hundred as a one-time payment, the sixth is $20/month and the final product is an enterprise/b2b level solution.
An arbitrary model would be to have something like:
No products (guests) => LTV = 0
Product 1 => LTV + 1
Product 2 => LTV + 1
Product 3 => LTV + 3
Product 4 => LTV + 10/month of subscription
Product 5 => LTV + 1000
Product 6 => LTV + 200/month of subscription
Product 7 => LTV + 10k/month of subscription
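As code, the arbitrary model above is just a lookup plus a multiplication for the subscription products. A small sketch; the function name and the input shape (product number mapped to months subscribed) are assumptions made for illustration.

```python
# Points taken from the example list above.
ONE_TIME_POINTS = {1: 1, 2: 1, 3: 3, 5: 1000}
MONTHLY_POINTS = {4: 10, 6: 200, 7: 10_000}

def ltv_score(products):
    """products maps product number -> months subscribed.

    The month count is ignored for one-time products. A guest with no
    products gets an LTV of 0.
    """
    score = 0
    for product, months in products.items():
        if product in ONE_TIME_POINTS:
            score += ONE_TIME_POINTS[product]
        elif product in MONTHLY_POINTS:
            score += MONTHLY_POINTS[product] * months
        else:
            raise ValueError(f"unknown product: {product}")
    return score

print(ltv_score({}))               # 0
print(ltv_score({4: 12, 5: 0}))    # 1120
```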
Then the LTV stakeholders, mainly business owners and PMs, refine the model depending on what kinds of analysis they typically need conducted. That basically depends on what and how they report to their executives or the board.
This is if you want to go with a simple integer as an LTV, which is most commonly used for weighting users. Going with an integer is a very comfortable starting point, since it allows for easy mathematical aggregations and makes your user-based analysis more robust. Say you found out that 2% of your users encounter a certain issue that blocks them from navigating somewhere or finishing a process. How should it be prioritized? Should it just be ignored? Should it be addressed immediately?
Well, that depends on who those users are. If they're just free users or even just guests and the error is not blocking them from product onboarding, then it's worth adding a ticket to the backlog, but realistically it won't get released any time soon, if ever.
However, if those users are enterprise customers, then the issue not only has to be hotfixed. It has to be hotfixed immediately. Probably paying overtime to the devs, qa and devops to work till late today.
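To make the prioritization concrete, you can compare the share of users affected with the share of total LTV affected. A sketch with made-up numbers; `issue_impact` and the user ids are hypothetical.

```python
def issue_impact(ltv_by_user, affected_users):
    """Compare how many users an issue hits with how much LTV it hits."""
    total_ltv = sum(ltv_by_user.values())
    affected_ltv = sum(ltv_by_user[user] for user in affected_users)
    return {
        "user_share": len(affected_users) / len(ltv_by_user),
        "ltv_share": affected_ltv / total_ltv if total_ltv else 0.0,
    }

# Hypothetical population: a guest, a free user, a subscriber, an enterprise account.
ltv_by_user = {"guest": 0, "free": 1, "subscriber": 120, "enterprise": 20_000}

print(issue_impact(ltv_by_user, {"guest"}))       # 25% of users, 0% of LTV
print(issue_impact(ltv_by_user, {"enterprise"}))  # 25% of users, almost all LTV
```

Both issues hit the same number of users, but only the second one justifies the immediate hotfix.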
Generally, LTV should be a user-level dimension. There are implementations of it as a session-level, but it's way more difficult.
From the technical standpoint, LTV is most commonly implemented on the tracking stage, so commonly in a TMS, say, GTM by a tracking specialist.
Another way it's implemented is in or after ETL, by the data engineers or data scientists.

Customer Deduplication in Booking Application

We have a booking system where tens of thousands of reservations are made every day. Because a customer can create a reservation without being logged in, a new customer id/row is created for every reservation, even if the very same customer has already reserved in the system before. That results in a lot of customer duplicates.
The engineering team has decided that, in order to deduplicate the customers, they will run a nightly script which checks for these duplicates based on some business rules (email, address, etc). The logic for the deduplication then is:
If a new reservation is created, check if the (newly created) customer for this reservation has already an old customer id (by comparing email and other aspects).
If it has one or more old reservations, detach that reservation from the old customer id, and link it to a new customer id. Literally by changing the customer ID of that old reservation to the newly created customer.
I don't have a very strong technical background, but to me this smells like terrible design. As we have several operational applications relying on that data, this creates a massive sync issue. Besides that, I was hoping to understand why exactly, in terms of application architecture, this is bad design and what would be a better solution for this problem of deduplication (if it even has to be solved in "this" application domain).
I would appreciate very much any help so I can drive the engineering team to the right direction.
In General
What's the problem you're trying to solve? Free-up disk space, get accurate analytics of user behavior or be more user friendly?
It feels a bit risky, and depends on how critical it is that you get the re-matching 100% correct. You need to ask "what's the worst that can happen?" and "does this open the system to abuse" - not because you should be paranoid, but because to not think that through feels a bit negligent. E.g. if you were a govt department matching private citizen records then that approach would be way too cavalier.
If the worst that can happen is not so bad, and the 80% you get right gets you the outcome you need, then maybe it's ok.
If there's not a process for validating the identity of the user then by definition your customer id/row is storing sessions, not Customers.
In terms of the nightly job - If your backend system is an old legacy system then I can appreciate why a nightly batch job might be the easiest option; that said, if done correctly and with the right architecture, you should be able to do that check on the fly as needed.
Specifics
"...check if the (newly created) customer for this reservation has already an old customer id (by comparing email..."
Are you validating the email - e.g. by getting users to confirm it through a confirmation email mechanism? If yes, and if email is a mandatory field, then this feels ok, and you could probably use the email exclusively.
"... and other aspects."
What are those? Sometimes getting more data just makes it harder unless there's good data hygiene in place. E.g. what happens if you're checking phone numbers (and other data) and someone does a typo on the phone number which matches with some other customer - so you simultaneously match with more than one customer?
"If it has one or more old reservations, detach that reservation from the old customer id, and link it to a new customer id. Literally by changing the customer ID of that old reservation to the newly created customer."
Feels dangerous. What happens if the detaching process screws up? I've seen situations where instead of updating the delta, the system did a total purge then full re-import... when the second part fails the entire system is blank. It's not your exact situation but you are creating the possibility for similar types of issue.
"As we have several operational applications relying on that data, this creates a massive sync issue."
...case in point.
In your case, doing the swap in a transaction would be wise. You may want to consider tracking all Cust ID swaps so that you can revert if something goes wrong.
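A rough sketch of what "swap in a transaction and track every swap" could look like, using SQLite as a stand-in. The table names (`reservations`, `customer_id_swaps`) and function names are assumptions, not your actual schema.

```python
import sqlite3

def reassign_reservations(conn, old_customer_id, new_customer_id):
    """Move reservations to the new customer id and log each swap, atomically."""
    with conn:  # one transaction: either everything is applied or nothing is
        rows = conn.execute(
            "SELECT id FROM reservations WHERE customer_id = ?", (old_customer_id,)
        ).fetchall()
        conn.executemany(
            "INSERT INTO customer_id_swaps (reservation_id, old_customer_id, new_customer_id) "
            "VALUES (?, ?, ?)",
            [(row[0], old_customer_id, new_customer_id) for row in rows],
        )
        conn.execute(
            "UPDATE reservations SET customer_id = ? WHERE customer_id = ?",
            (new_customer_id, old_customer_id),
        )
    return len(rows)

def revert_swaps(conn, new_customer_id):
    """Undo every logged swap that pointed reservations at new_customer_id."""
    with conn:
        swaps = conn.execute(
            "SELECT reservation_id, old_customer_id FROM customer_id_swaps "
            "WHERE new_customer_id = ?", (new_customer_id,)
        ).fetchall()
        conn.executemany(
            "UPDATE reservations SET customer_id = ? WHERE id = ?",
            [(old_id, reservation_id) for reservation_id, old_id in swaps],
        )
        conn.execute(
            "DELETE FROM customer_id_swaps WHERE new_customer_id = ?", (new_customer_id,)
        )
    return len(swaps)

# Hypothetical schema and data for trying it out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)")
conn.execute("CREATE TABLE customer_id_swaps (reservation_id INTEGER, old_customer_id INTEGER, new_customer_id INTEGER)")
conn.executemany("INSERT INTO reservations VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
conn.commit()
```

The swap log only reverts reservations that were actually moved, so reservations the new customer id already owned are left alone.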
Option - Phased Introduction Based on Testing
You could try this:
Keep the system as-is for now.
Add the logic which does the checks you are proposing, but have it create trial data on the side - i.e. don't change the real records, just make a copy that is what the new data would be. Do this in production - you'll get a way better sample of data.
Run extensive tests over the trial data, looking for instances where you got it wrong. What's more likely, and what you could consider building, is a "scoring" algorithm. If you are checking more than one piece of data then you'll get different combinations with different likelihood of accuracy. You can use this to gauge how good your matching is. You can then decide in which circumstances it's safe to do the ID switch and when it's not.
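A scoring algorithm of the kind described above might start out as simple as this. The fields, weights and thresholds are placeholders that you would tune against the trial data, not recommendations.

```python
# Hypothetical weights (in points) per matching field; they sum to 100.
WEIGHTS = {"email": 60, "phone": 25, "postcode": 15}

def normalize(field, value):
    """Basic data hygiene so trivial formatting differences do not block a match."""
    value = (value or "").strip().lower()
    if field == "phone":
        value = "".join(ch for ch in value if ch.isdigit())
    return value

def match_score(record_a, record_b):
    """Sum the weights of all non-empty fields that agree after normalization."""
    score = 0
    for field, weight in WEIGHTS.items():
        a = normalize(field, record_a.get(field))
        b = normalize(field, record_b.get(field))
        if a and a == b:
            score += weight
    return score

def decide(score, merge_at=80, review_at=50):
    """Only switch IDs automatically when the match is strong."""
    if score >= merge_at:
        return "merge"
    if score >= review_at:
        return "review"
    return "keep separate"

new = {"email": "Ann@Example.com ", "phone": "+1 555-0100", "postcode": "A1"}
old = {"email": "ann@example.com", "phone": "1 (555) 0100", "postcode": "B2"}
print(match_score(new, old), decide(match_score(new, old)))  # 85 merge
```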
Once you're happy, implement as you see fit - either just the algorithm & result, or the scoring harness as well so you can observe its performance over time - especially if you introduce changes.
Alternative Customer/Session Approach
Treat all bookings (excluding personal details) as bookings, with customers (little c, i.e. Sessions) but without Customers.
Allow users to optionally be validated as "Customers" (big C).
Bookings created by a validated Customer then link to each other. All bookings relate to a customer (session) which never changes, so you have traceability.
I can tweak the answer once I know more about what problem it is you are trying to solve - i.e. what your motivations are.
I wouldn't say that's a terrible design; it's just a simple approach to solving this particular problem, with some room for improvement. It's not optimal because the runtime of that job depends on the number of new bookings received during the day, which may vary from day to day, so other workflows that depend on it will be impacted.
This approach can be improved by processing new bookings in parallel, and using an index to get a fast lookup when checking if a new e-mail already exists or not.
You can also check out Bloom Filters - an efficient data structure that is able to tell you if an element is not in a given set.
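For reference, a Bloom filter fits in a few lines. This is a toy sketch (the sizes and the SHA-256-based hashing are arbitrary choices for illustration); a "no" answer is definite, while a "yes" only means "maybe, go check the real store".

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means the item was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen_emails = BloomFilter()
seen_emails.add("ann@example.com")
print(seen_emails.might_contain("ann@example.com"))  # True
```

In the nightly job, only emails for which `might_contain` returns True would need the more expensive duplicate lookup.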
The way I would do it is to store the bookings in a NoSQL DB table keyed off the user email. You get the user email in both situations, whether the user has an account or makes a booking without one, so you just have to do a lookup to get the bookings by email, which makes that deduplication job redundant.
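The idea can be shown with a plain dictionary standing in for the key-value store. `BookingStore` and `email_key` are hypothetical names, and lowercasing/trimming the email is an assumption about what counts as "the same" address.

```python
from collections import defaultdict

def email_key(email):
    """Normalize the email so the same address always maps to the same key."""
    return email.strip().lower()

class BookingStore:
    """Dictionary stand-in for a key-value table keyed by user email."""

    def __init__(self):
        self._by_email = defaultdict(list)

    def add_booking(self, email, booking):
        # Logged-in or guest, the booking lands under the same key.
        self._by_email[email_key(email)].append(booking)

    def bookings_for(self, email):
        return list(self._by_email.get(email_key(email), []))

store = BookingStore()
store.add_booking("Ann@Example.com", {"id": 1, "guest": True})
store.add_booking("ann@example.com ", {"id": 2, "guest": False})
print(len(store.bookings_for("ann@example.com")))  # 2
```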

What storage mechanism can I use to store the data related to user interaction of my website for a day

I store information about which items were accessed. That's it initially. I will store the id and type of each item that was accessed. For example, in a relational table it would be:
id  type           view
1   dairy product  100
2   meat           88
Later on, at the end of the day, I will transfer this data to the actual table of the product.
products
id  name             view
1   Cheesy paradise  100
This is a web site, and I don't want to update the table every time a user visits a product, because the products are in a relational database and it would be very unprofessional. I want to make a service in Node.js so that when the user visits a product, stays for 5 seconds and scrolls to the bottom of the page, I increment a counter in some high-speed storage, and at the end of the day I update the related products in "one go".
I will handle only 300 visits to different products a day. But, of course, I want my system to grow, and it should be able to keep track of, for example, a thousand product views per minute. When I thought about this feature, I thought about using Mongo. But I don't know, it seems like a lot for this simple task. What technology would fit this situation better?
I would recommend MongoDB, since you are mostly "dumping" data into a database. That also allows you to dump more information in the future than you do now, no matter what kind of documents you dump now. Mongo is totally fine for a "dump" database structure.
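Whatever storage you pick, the "count now, write once a day" pattern itself is small. Here is a language-neutral sketch written in Python rather than Node.js, with an in-memory counter as the fast store and SQLite standing in for the relational database; `ViewBuffer` is a hypothetical name and the table follows the example in the question.

```python
import sqlite3
from collections import Counter

class ViewBuffer:
    """Counts views in memory and writes them to the database in one batch."""

    def __init__(self):
        self.counts = Counter()

    def record_view(self, product_id):
        # Called once the "5 seconds and scrolled to the bottom" rule is met.
        self.counts[product_id] += 1

    def flush(self, conn):
        """End-of-day job: apply all pending increments in one transaction."""
        pending, self.counts = self.counts, Counter()
        with conn:
            conn.executemany(
                "UPDATE products SET view = view + ? WHERE id = ?",
                [(n, product_id) for product_id, n in pending.items()],
            )
        return sum(pending.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, view INTEGER NOT NULL)")
conn.execute("INSERT INTO products VALUES (1, 'Cheesy paradise', 100)")
conn.commit()

buffer = ViewBuffer()
for _ in range(3):
    buffer.record_view(1)
print(buffer.flush(conn))  # 3
```

The obvious trade-off is that an in-memory counter is lost if the process crashes before the flush, which is the main reason to move the counter into an external store once traffic grows.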

Finding unique products (never seen before by a user) in a datastore sorted by a dynamically changing value (i.e. product rating)

been trying to solve this problem for a week and couldn't come up with any solutions in all my research so I thought I'd ask you all.
I have a "Product" table and a "productSent" table; here's a quick schema to help explain:
from google.appengine.ext import ndb

class Product(ndb.Model):
    name = ndb.StringProperty()
    rating = ndb.IntegerProperty()

# The key name of each productSent entity is md5(Product key + UUID).
class productSent(ndb.Model):
    pId = ndb.KeyProperty(kind=Product)
    uuId = ndb.KeyProperty(kind=userData)  # userData is a model defined elsewhere
    action = ndb.StringProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
My goal is to show users the highest rated product that they've never seen before--fast. So to keep track of the products users have seen, I use the productSent table. I created this table instead of using Cursors because every time the rating order changes, there's a possibility that the cursor skips the new higher ranking product. An example: assume the user has seen products 1-24 in the db. Next, 5 users liked product #25, making it the #10 product in the database--I'm worried that the product will never be shown again to the user (and possibly mess things up on a higher scale).
The problem with the way I'm doing it right now is that, once the user has blown past the first 1,000 products, it really starts slowing down the query performance. Because I'm literally pulling 1,000+ results, checking if they've been sent by querying against the productSent table (doing a keyName lookup to speed things up) and going through the loop until 15 new ones have been detected.
One solution I thought of was to add a repeated property (listProperty) to the Product table of all the users who have seen a product. Or if I don't want to have inequality filters I could put a repeated property of all the users who haven't seen a product. That way when I query I can dynamically take those out. But I'm afraid of what happens when I have 1,000+ users:
a) I'll go through the roof on the limit of repeated properties in one entity.
b) The index size will increase size costs
Has anyone dealt with this problem before (I'm sure someone has!) Any tips on the best way to structure it?
update
Okay, so I had another idea. In order to minimize the changes that take place when a rating (number of likes) changes, I could have a secondary column that only has 3 possible values: positive, neutral, negative. And sort by that? Of course, items that have a rating of 0 and get a 'like' (making them a positive) would still have a chance of being out of order or skipped by the cursor, but it'd be less likely. What do y'all think?
Sounds like the inverse, productNotSent would work well here. Every time you add a new product, you would add a new productNotSent entity for each user. When the user wants to see the highest rated product they have not seen, you will only have to query over the productNotSent entities that match that user. If you put the rating directly on the productNotSent you could speed the query up even more, since you will only have to query against one Model.
Another idea would be to limit the number of productNotSent entities per user, so each user only has ~100 of these entities at a time. This would mean your query cost would be constant for each user, regardless of the number of products or users you have. The creation of new productNotSent entities would become more complex, though. You'd have to have a cron job or something that "tops up" a user's collection of productNotSent entities when they use some up. You may also want to double-check that products rated higher than those already within the user's set of productNotSent entities get pushed in there. These are a little more difficult and will require some design trade-offs.
Hope this helps!
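To illustrate the capped productNotSent idea without tying it to the datastore API, here is a plain-Python sketch in which a dict of ratings stands in for the Product table and a list stands in for one user's productNotSent entities. The function names and data shapes are assumptions.

```python
import heapq

def refresh_queue(seen, ratings, target_size=100):
    """What the cron job would do: rebuild the user's queue as the
    highest-rated products they have not seen yet, best first."""
    unseen = (product_id for product_id in ratings if product_id not in seen)
    return heapq.nlargest(target_size, unseen, key=ratings.get)

def next_products(queue, seen, count=15):
    """Serve the next batch from the small per-user queue and mark it seen."""
    batch = queue[:count]
    del queue[:count]
    seen.update(batch)
    return batch

ratings = {"p1": 5, "p2": 9, "p3": 7, "p4": 1}
seen = {"p2"}
queue = refresh_queue(seen, ratings, target_size=2)
print(queue)                          # ['p3', 'p1']
print(next_products(queue, seen, 1))  # ['p3']
```

Rebuilding the queue on each refresh also covers the case where a product's rating jumps above what is already queued, at the cost of a scan over unseen products in the background job instead of in the user's request.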
I do not know your expected volumes and exact issues (only did a quick perusal of your question), but you may consider using Json TextProperty storage as part of your plan. Create dictionaries/lists and store them in records by json.dump()ing them to a TextProperty. When the client calls, simply send the TextProperties to the client, and figure everything out on the client side once you JSON.parse() them. We have done some very large array/object processing in JS this way, and it is very fast (particularly indexed arrays). When the user clicks on something, send a transaction back to update their record. Set up some pull or push queue processes to handle your overall product listing updates, major customer rec updates, etc.
One downside is higher bandwidth going out of your app, but I think this cost will be minimal given the potential processing savings on GAE. If you structure this right, you may be able to use get_by_id() to replace all or most of your planned indices and queries. We have found json.loads() and json.dumps() to be very fast inside the app, but we only use simple dictionary/list structures. This approach will be, though, a big, big quantum measure lower than your planned use of queries. The other potential issue is that very large objects may run into soft memory limits. Be sure that your JSON objects are fairly simple and lightweight to avoid this (e.g. do not include product description, sub-objects, etc. in the JSON item, just the basics such as product number). HTH, -stevep

Invoicing database design

I created an application a few days ago that deals with invoicing. I would like to know how best to integrate a discount into my invoices. Should I put it as a negative item (in the invoice_items table) or should I create a "discount" column in the invoice table?
I would have it as a negative-valued item. The reasons are:
With invoicing, it's very important that the calculated value remains constant forever; even if your calculation formula later changes, you can correctly reproduce any given invoice. This is even true if the value was incorrectly calculated at the time - it was what it was.
Having a value amount means that manual adjustments for exceptional circumstances are easily handled - e.g. your marketing manager/accountant may decide to give a one-off discount of $100 because of a late delivery. This is trivial with negative values (just add another row), but a hassle with discount rates.
You can have multiple discount amounts per invoice
It's totally flexible - it has its own space to exist and be whatever it needs to be. In fact, I would make the discount another "product" (maybe even multiple products, one for each distinct discount reason, e.g. xmas, coupon, referral, etc.)
With its own item, you can add a reason description just like any other "product" - eg "10% discount for paying cash" or whatever
You don't need any special code or database columns! Just total items up as before and print them on the invoice. "There is no spoon (discount)": It's just another line item - what could be more simple than no code/db changes required?
Not all items should be discounted - eg refunds, returns, subscriptions (if applicable). It becomes too complicated and it's unnecessary to represent the business logic of discounts in the database. Leave the calculation etc in the app code, store the result in the db
Having its own item means the calculation can be arbitrarily complex. This means no db maintenance as the complexity grows. It's a whole lot easier to maintain/alter code than it is to maintain/alter a database
Finally, I successfully built an invoicing system, and I took the "item" approach and it worked really well
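To show how little code the "item" approach needs, here is a sketch where a discount is just another row with a negative amount. The item descriptions and amounts are made up for illustration.

```python
from decimal import Decimal

def invoice_total(items):
    """Total an invoice; discounts are ordinary line items with negative amounts."""
    return sum((Decimal(item["amount"]) for item in items), Decimal("0"))

# Hypothetical invoice_items rows for one invoice.
items = [
    {"description": "Widget x 3", "amount": "300.00"},
    {"description": "Delivery", "amount": "25.00"},
    {"description": "10% discount for paying cash", "amount": "-32.50"},
    {"description": "One-off discount: late delivery", "amount": "-100.00"},
]

print(invoice_total(items))  # 192.50
```

The stored amounts are the result of whatever calculation the app did at the time, so the invoice reproduces exactly even if the discount rules change later.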
What consequences would either of those choices have for you down the road? For example, would you like to have multiple discounts, or very specified discounts later on? If there will only be one discount per invoice, then I wouldn't make it any more complicated than need be. In my opinion it's easier and clearer to have it in the invoice table - having it as a negative item will make the processing of items more difficult, I think.
I fully agree with making it as simple as possible, but one thing to consider is if any item should be exempted from the discount? In that case you need to add a bool field in the details to remember which line should have discount.
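A sketch of the exemption flag, assuming a hypothetical boolean `discountable` on each line and a percentage discount computed in app code:

```python
from decimal import Decimal, ROUND_HALF_UP

def discount_amount(items, rate):
    """Apply a percentage discount only to lines flagged as discountable."""
    base = sum(
        (Decimal(item["amount"]) for item in items if item.get("discountable", True)),
        Decimal("0"),
    )
    return (base * Decimal(rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Hypothetical invoice lines; the subscription is exempt from the discount.
items = [
    {"description": "Widget x 3", "amount": "300.00", "discountable": True},
    {"description": "Subscription", "amount": "50.00", "discountable": False},
]

print(discount_amount(items, "0.10"))  # 30.00
```

The computed amount can then be stored either in a discount column or as a negative line item, whichever of the two designs you settle on.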
