I has a structural performance question.
I am preparing a book selling web site. I has books, authors, categories ect. I want to give users some discounts. But these discount can be about book, category, author and even can be about user. I am storing discounts at 'discounts' table and storing types. When people list products, i want to show discounts too. If i join discounts table and product table and check for total discount for each record that is a huge performance issue for listing. Do you have any suggestion for this and something like this situations?
(Note: I can put discount into product table and calculate it daily with a sql job but i has more tables with that problem. Discounts is just a sample and i wanna learn alternative ways for something like these issues.)
Sorry for my bad english, thanks.
If i join discounts table and product table and check for total
discount for each record that is a huge performance issue for listing.
Ah. No. Tried 10 years ago for an online shop we created and we figured out that with some simple caching (shop products, with all elements together) in memory (we could live showing 5 minutes outdated prices in case of a change) plus output cache of the non-super-simple way we could mitigate pretty much all the performance impact.
Make a clear database. Then make a smart architecture on top of it. Not every item must all the time be queried from the database.
Related
I'm writing a simple transactional database to practice my T-SQL skills.
If I sell an umbrella in my sales.orderdetails table and it's getting the current retailprice of that umbrella from the items table and putting it in the invoice, how do I keep from having incorrect historical report data 6 months from now when I jack up the retail price of the umbrella by $10?
How do i store that umbrella sold price in the orderdetails table so it's unaffected by any changes in the items table in the future?
I know you can use an SCD for a datawarehouse for this kind of issue but was wondering how to do it in an OLTP system. Computed persisted column? Can't seem to get that to work in the object explorer when I try to enter the items.retailprice as the computed value for the salesorderdetails.cost column.
The way I have seen this done in the past, without using a technique like SCD, was to have the order detail have the price that was charged and then use a foreign key to another table, possibly products or productprices, that contains the current price.
In a full-on transactional system, you'd want the order detail row to record full retail (MSRP, or what have you), current price (in case you had the item posted at a discount that day), and price charged (in case the customer used a promo/coupon code to reduce the price themselves). Unless you log all three, you're at the mercy of whatever the price changes to tomorrow or next week or next year, which makes for bad analytics.
You probably also want to capture current cost of goods, too, since that's subject to change over time, especially in an average costing scenario. Otherwise, margin calculations will be suspect.
But then, yes, a foreign key or keys to any other descriptive tables for those less ephemeral characteristics of the product.
I have been studying datawarehouse in the last couple days, particularly, i have been reading The Data Wharehouse Toolkit - The Definitive Guide to Dimensional Modeling by Kimball and Ross.
Uppon that reading, i came to the 1st exapmle where there is a sales fact and it related to a product dimension, as you can see in the bellow image:
I think i can grasp the gist of how this relationship allows us to rotate the "cube" slicing and dicing data, however this is where i get lost:
In this example and many others on the web product is a one-to-one relationship with sales, which is fine i guess for most cases. But this generates a sales registtry for at least each kind of product that was in one sale.
So supposing i bought 1 banana, 2 apples and 1 orange, this would yield at least 3 sales registry. Again, which is fine i guess as it is storing the sale's ticket ID in the sales fact, we still can relate all itens in a given sale.
However if this was an use case: relate products on sales say i want to get every sale that had a banana and get stuff like: how many items each of these sales had, their price cost, their profit, stuff like that...
Wouldn't be better if the fact-product relation were Fact-one_to_many-Product relationship? Where fact would hold the sale's ticket ID and products would have its foreign key referencing where they are from or something?
I reckon these metrics should be in the fact table, and not in the product table as i think i would want. So, is this me not fighting my urge to normalize it or does it make sense in the way i would want to do that kind of filtering -> [given all sales with X product, get data from other products in the same sale].
If i were to follow the guidelines, product dimension would have one registry for every exclusive kind of product the store would have correct? And all the measurements i want i would store it on the fact itself, like price cost, sales price, profit, etc...
On the other hand, if i were to one-to-many product dimension would have many copies of each product. Which is bad, i think. However, i think it would give me better queries in that regard.
As you can see, i'm a beginer and really in the early stages of this path, so if you would endulge me in a Explain Like I'm Five kind of answer I would appreciate.
EDITED:
Sorry #Nick.McDermaid, you are right. I meant from the perspective of the sales fact where for every sale fact i will have only one product, but are correct that for one product it can have N sales related. And so, we have one record of product in the database for every different product on our store. This is the right way to do it, how to rightfully model it. Also, the many indicator is the "sales quantity" i'm guessing.
Anyhow, while this allows for slicing and dicing when/if we have sales as the point of view, but what if i want to for example:
Get all sales that had a banana in it, with all the other items in those sales. We can still do it with this structure but its harder than if the products were repeated and we had the sale id as a foreign key in the product table.
Cuz ultimetly i want to get all the sales(and products within that sale) that had a banana. And then take metrics out of them.
What you are somewhat hinting at would be a degenerate dimension, consisting of the sales id/invoice #/purchase order # of the transaction that took place. The whole purpose of a degenerate dimension is to group items that are related by a meaningless piece of data. For example, a PO # of A1234 is meaningless on its own, it doesn't tell you anything about the purchase. However, it can be used to identify other meaningful data, such as the date of purchase of the products for the customer. In that context, the PO # is defined by the collection of the entities it brings together to describe an event.
Another critical concept in data-warehousing is the abstraction of the schema in the database from the model in the cube. You don't join and group data in a cube model. You slice and filter. There are no foreign keys in a cube model. Those are used in the underlying data schema, but all of that work is handled behind the scenes of the cube model.
I do descriptive analytics and reporting at a company that sells a wide range of products. We record sales transactions and everytime an item is sold, the following is recorded:
Customer ID (each customer has a unique ID)
Product ID (each product has a unique ID)
Sale date
(Other fields are recorded too - location of purchase, quantity, payment type, etc.)
We sell a few big ticket items, and what I'm wondering is if it's possible to predict whether a customer will buy one of the big ticket items based on their purchase history, using transactional data as described above. We have about 2 million rows of sales data spanning seven years, and in that time maybe 14,000 big ticket items have been sold to 5,000 out of 50,000 customers.
I use SQL Server 2008 R2 which has the data mining feature. I did some brief reading on it but can't figure out what model would be best, or if it's something that's even doable. Can someone point me in the right direction to get started?
Not sure if the SQL server data mining feature is useful. I took a look at the one for SQL 2012 and decided it wasn't.
As for your prediction, this would be a supervised learning problem (just pick any simple algorithm), where each customer is a row and your features would be the different products. Your positive labels then would then be the rows of customers that had bought big ticket items.
what you are looking for is called sequential pattern mining and the specific technique that you are looking for is called discrete event prediction. However with that being said, I don't think you will be able to do what you want to do with an out of the box solution on sql server.
I am new to Dimension Modeling and I am working on doing a dimension model for a university. The current business process that I have picked up is actually sales/revenue. I have been reading different chapters of different books and although I think I have a good understanding of facts and dimensions I am having some tough time fitting the sales process on to the paper.
Ideally the sales process in the school is similar to other businesses where students are customers and the product is the "courses" they take. However in certain situation there are different product types and I don't know how to fit the product type. For example student pays an Application fee, late fee or transcript request fee which is not associated with any course. How do I fit these different type of revenue streams in my star?
What I have done so far is like this
Sales_FACT
====
Date_Key_FK
Product_Key_FK
Campus_Key_FK
Student_Key_FK
ChargeCredit_SKU
Amount
Product_Key
------
Product_Key_PK
SectionID
AcademicYear
AcademicTerm
AcademicSession
CourseCode
CourseName
ProductType???
Now for certain type of products (e.g. a transcript request fee) - I do not have the coursename,code, year term,session -- I am struggling how this will work.
Anyone has any input on this? or any helpful material/schema examples will definately appreciate them
Thanks,
You will find a plenty of those cases in future. Generally, you are experiencing this problem because you are mixing two different types of 'product' in your case.
It can be resolved logically or technically.
During the ETL process, in cleansing step, you can rewrite your null fields with sql (nvl, coalesce, CASE WHEN) with something related to field
nvl(CourseCode, 'No Course Code') as CourseCode
Then, when you group by ProductType, and CourseName you shoud get something like this:
ProductType CourseName sum(Amount)
------------------------------------------
AppFee Course1 345.13
AppFee Course4 8901.00
TranscriptFee No Course Name 245.99
Or, you can put it in separate tables. Even that is contradictory to your business process (can't have different products in fact row), sometimes terms you want to merge (i.e ApplicationFee and TranscriptFee) have many different grouping levels which is often too hard to map.
Edit:
No, snowflake make sense when there exists big dimension tables, high cardinality, many levels, as well as many to many relationship (i.e movies, categories). In your case good idea is to follow ERP/CRM database design, because it's current working solution. If there is no such reporting possibilty you want, you can make more generic dimension table:
Product-Service Dimension
--------------------------------------------
SurogateKey
NaruralKey
Type(Product/Sevrice/Other)
Level1(ProductType/ServiceType)
Level2(ProductSubType/ServiceSubType)
Level3
Level4
Attribute1
Attribute2
I would be grateful if somebody could help me to find an elegant solution to this database design problem. There is a company with a lot of different products (P1,P2,P3,P4) and a lot of customers (C1, C2, C3, C4). Now they have a simple database table to deal with orders, something like
20101027 C2 P1 qty status
20101028 C1 P2 qty status
Now I would like to create groups of products ( eg. (P1+P3+p4) and (P2+P3)) that could be purchase together for a reduced price. What is the best way to represent such groups in a database system? Dealing with these groups as individual products doesn't work, because I need the functionality of replacing, adding or removing products from the groups. So I need to keep the currently given table of products.
Thanks for reading. I hope I will get some help.
Add a new table product_group_promotions, with an ID, name and discount price. Then create a table product_group_promotions_products that links products to product group promotions. This will contain a product group ID and a product ID. This way, you can place one product in multiple groups, and let groups contain multiple products (of course).
Jan's answer is correct but incomplete.
You'll also need start and end dates of the promotion. You'll probably want to enter next week promotions so they are ready but not apply them until appropriate.
Discount price may not be enough either. You also will need to get business rules from business people as to how to apply the discount. It could be a percentage or a free item or a fixed amount. If a percentage do you distribute the discount evenly, proportionally, on the cheapest product, the most expensive? If a free item, which one in the set is free. It could also be a fixed amount, $10 off if you buy x, y, and z. Is the discount applied more than once. If someone buys 5x of P2 and P3 do they get the discount on all of them or just the first ones. Is there a limit over a time period. As in the past example, if you don't give me the discount on all 5, I would just fill out 5 orders of 1 each and get the discount you were trying to prevent. If so you'd have to go back through previous purchase by that customer to see if they've received that discount.
You can see how ugly this can get. I would clarify with the business EXACTLY what they plan to use this new feature for and run through these use cases with them.
As Q, asked, if the basket of purchased items is large enough there could be more than one discount possible. Do you have to determine what to give, do you present a list of choices back to the UI... and the recalculate.
This is why I have mercy on department stores who screw this stuff up. It's not simple.
On the surface this is simple sounding but in reality it's very complex.
"Dealing with these groups as individual products doesn't work, because I need the functionality of replacing, adding or removing products from the groups."
Here are several things you need to look at:
Current Inventory vs Past Orders
How do you deal with Price Changes on P1, P2, P3
How do you handle adding a new product or products to an existing group
How do you handle removing a product or products from an existing group
In my opinion you need two sets of tables.
Tables that make up your current inventory
Tables that record what customers purchased (historical data tables)
If you need to reconstruct a customers purchase from six months ago, you can not rely on the current data givin the fact that a grouping may not look the same today as it did six months ago. So, if you do not already have a set of historical data tables for customer records then, I recommend you create them.
With a set of historical data tables that house what was bought by the customer you can pretty much do what every you want to the current invetory data. Change prices, regroup products, make products obsolete, Temporarily suspend a product, etc.