I do descriptive analytics and reporting at a company that sells a wide range of products. We record sales transactions and everytime an item is sold, the following is recorded:
Customer ID (each customer has a unique ID)
Product ID (each product has a unique ID)
Sale date
(Other fields are recorded too - location of purchase, quantity, payment type, etc.)
We sell a few big ticket items, and what I'm wondering is if it's possible to predict whether a customer will buy one of the big ticket items based on their purchase history, using transactional data as described above. We have about 2 million rows of sales data spanning seven years, and in that time maybe 14,000 big ticket items have been sold to 5,000 out of 50,000 customers.
I use SQL Server 2008 R2 which has the data mining feature. I did some brief reading on it but can't figure out what model would be best, or if it's something that's even doable. Can someone point me in the right direction to get started?
Not sure if the SQL server data mining feature is useful. I took a look at the one for SQL 2012 and decided it wasn't.
As for your prediction, this would be a supervised learning problem (just pick any simple algorithm), where each customer is a row and your features would be the different products. Your positive labels then would then be the rows of customers that had bought big ticket items.
what you are looking for is called sequential pattern mining and the specific technique that you are looking for is called discrete event prediction. However with that being said, I don't think you will be able to do what you want to do with an out of the box solution on sql server.
Related
I'm writing a simple transactional database to practice my T-SQL skills.
If I sell an umbrella in my sales.orderdetails table and it's getting the current retailprice of that umbrella from the items table and putting it in the invoice, how do I keep from having incorrect historical report data 6 months from now when I jack up the retail price of the umbrella by $10?
How do i store that umbrella sold price in the orderdetails table so it's unaffected by any changes in the items table in the future?
I know you can use an SCD for a datawarehouse for this kind of issue but was wondering how to do it in an OLTP system. Computed persisted column? Can't seem to get that to work in the object explorer when I try to enter the items.retailprice as the computed value for the salesorderdetails.cost column.
The way I have seen this done in the past, without using a technique like SCD, was to have the order detail have the price that was charged and then use a foreign key to another table, possibly products or productprices, that contains the current price.
In a full-on transactional system, you'd want the order detail row to record full retail (MSRP, or what have you), current price (in case you had the item posted at a discount that day), and price charged (in case the customer used a promo/coupon code to reduce the price themselves). Unless you log all three, you're at the mercy of whatever the price changes to tomorrow or next week or next year, which makes for bad analytics.
You probably also want to capture current cost of goods, too, since that's subject to change over time, especially in an average costing scenario. Otherwise, margin calculations will be suspect.
But then, yes, a foreign key or keys to any other descriptive tables for those less ephemeral characteristics of the product.
I am building a new business application for my personal business which has close to ~100 transactions of sale and purchase per day. I am thinking of having Separate tables to record the sale and purchase with another linked table for Items that were sold and a seperate linked table with items that were purchased.
Example:
**SaleTable**
InvoiceNo
TotalAmt
**SaleTableDetail**
LinkedInvNo
ProductID
Quantity
Amount
etc.,
would this design be better or would it be more efficient to have one transactiontable with a column stating sale or purchase?
-From an App/Database/Query/Reporting Perspective
An invoice is not the same as a sales order. An invoice is a request for payment. A sales order is an agreement to sell products to a party at a price on a date.
A sales order is almost exactly the same as a purchase order, except you are the customer, and a sales order line item can reference a purchase order line item. You can put them in separate tables, but you should probably use Table Inheritance (CTI, extending from an abstract Order). Putting them in the same table with a "type" column is called Single Table Inheritance and is nice and simple.
Don't store totals in your operational db. You can put them in your analytic db though (warehouse).
You are starting small, thats a quick way to do. But, I am sure, very shortly you will run into differences between sale and purchase transactions, some fields will describe only a sale and some fields that will be applicable only for purchases.
In due course, you may want to keep track of modifications or a modification audit. Then you start having multiple rows for the same transaction with fields indicating obsoletion or you have to move history records to another table.
Also, consider the code-style-overhead in all your queries, you got to mention the Transaction Type as sale or purchase for simple queries.
It would be better to design your database with a model that maps business reality closest. At the highest level, everything may abstract to a "transaction", with date, amount and some kind of tag to indicate amount is paid or received against what context. That shouldn't mean we can have a table with Tag, Date, Amount, PayOrReceive to handle all the diverse transactions.
I has a structural performance question.
I am preparing a book selling web site. I has books, authors, categories ect. I want to give users some discounts. But these discount can be about book, category, author and even can be about user. I am storing discounts at 'discounts' table and storing types. When people list products, i want to show discounts too. If i join discounts table and product table and check for total discount for each record that is a huge performance issue for listing. Do you have any suggestion for this and something like this situations?
(Note: I can put discount into product table and calculate it daily with a sql job but i has more tables with that problem. Discounts is just a sample and i wanna learn alternative ways for something like these issues.)
Sorry for my bad english, thanks.
If i join discounts table and product table and check for total
discount for each record that is a huge performance issue for listing.
Ah. No. Tried 10 years ago for an online shop we created and we figured out that with some simple caching (shop products, with all elements together) in memory (we could live showing 5 minutes outdated prices in case of a change) plus output cache of the non-super-simple way we could mitigate pretty much all the performance impact.
Make a clear database. Then make a smart architecture on top of it. Not every item must all the time be queried from the database.
I'm trying to make the database system point of sale, however I am confused between the entity and product inventory entity. What are the differences between product and inventory?
I know that the inventory should control the amount of product available .... but i have all that in products.
product code
name
description
cost
unit price
Subcategory code
brand code
amount available
Minimum quantity for rehearing
state
tax code
weight
amount wholesales
wholesales price
perishable
due date
creation date
upgrade date
what i should have in inventory? I have researched and according to what I read I need to have the product, the description, the quantity, purchase price, sale price, profit or gain and date of the transactions. But almost everything is in the Products table, what should I do?
A Product is an abstract Good or Service. A Good is the specification of an Asset.
Example "2014 Mazda 3" is a good. A "2014 Mazda 3 with VIN 12345" is an Asset.
A Catalog is the list of products that you want to sell. They don't need to exist yet, or you could be selling them for someone else.
Items Held For Sale are assets that you keep around to sell. These could be consigned (owned by someone else).
Inventory is an accounting concept. It is the dollar value of the items held for sale that you own, plus inbound and outbound goods that you're responsible for, plus any costs associated with holding that inventory.
You can track the value of inventory in a variety of ways such as FIFO and LIFO
I think you can store the inventory in the products table. There will certainly be transaction tables for purchases for the products and sales and even adjusting records (when items get count and the number differs from what's stored in the database), but you can easily work with the stock stored in the production table itself, thus not having to scan the whole database and sum up all purchases and sales and corrections every time (and never being able to delete old transaction data from the database, as that would invalidate the calculations).
However there are reasons to have stock stored in an inventory table instead. For instance if you want to store different statusses, e.g. you have 100 pieces in store plus twenty just arrived and still unchecked. Or you have a store with goods plus a warehouse housing additional stock. Or you have charges (different model numbers for example for a slightly altered product) which you offer as the same product, but still want to know how many old and how many new ones are in stock. And so on.
So make your mind up, if you want to store additional data with product stock, which would result in an 1:n relation instead of 1:1 which you have now.
first of all sorry for my bad english hehehe I need some help, I want to design a database for a website, like a mini Amazon. This database will manage every kind of products (TV, cars, computers, books, videogames, penciles, tables, pants...), but also, each product must have some properties (that will be indexed) for example, if the product is a book, the properties will be something like genre, year, author. If the product is a TV, the properties will be something like size, color, also year. And if the product is a car, the properties will be something like year, color, model, for example. So, this is my idea:
One table to manage departments (like electronics, books...)
One table to manage categories of the departments, this table will be a child of the previous. If the department is electronics, here will be audio, tv and video, games... (each category belongs to one department, the relationship is one department to many categories)
One table to manage the products (each product belongs to one category, the relationship is one category to many products)
One table to manage properties (like year, color, genre, model...)
One table to engage products with properties, this table will be called ProductProperties
Im not sure if this is the best way, the database will be huge, I will develop the database on MySQL. But, I think this is not the best way, this article talks about "Database Abstraction: Aggregation and Generalization" http://cs-exhibitions.uni-klu.ac.at/index.php?id=433, in other words generic objects (I think), but this way is old (70s). In this article http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/ in the section "One table to hold all domain values" says that this is a wrong way... Im saying all of this because of the table ProductProperties, I dont know if I make this table or if I make especific tables for each kind of products.
Do you have any suggestion? Or do you have a better idea?
Thanks in advance, take care!!!
1.One table to manage departments (like electronics, books...)
2.One table to manage categories of the departments, this table will be a
child of the previous. If the
department is electronics, here will
be audio, tv and video, games... (each
category belongs to one department,
the relationship is one department to
many categories)
Why? One table, categories, forming a hierarchy. More flexible.
3.One table to manage the products (each product belongs to one category,
the relationship is one category to
many products)
Why? Allow m:n here. A product in many categorries.
Im not sure if this is the best way,
the database will be huge
Ah - no. Sorry. Nontrivial, yes. Hugh? No. Just to get you an idea of hugh - I have a db I am adding 1.2 billion rows PER DAY to a specific table. On average. THIS is big. YOu end up with what - 100.000 items? not even worth mentioning.
Pablo89, the description of what you want is very close to what the AdventureWorks database for SQL Server does. There are many examples of using AdventureWorks on the Web from web applicatons to reporting to BI.
Download and install SQL Server Express 2008 R2. Download and install the sample database for the above product. Inspect the database design for AdventureWorks.
Use AdventureWorks as examples in questions you may post.
I use AdventureWorks because I use SQL Server. I do not say it is better than other database products I say this because I know AdventureWorks.
I do not think that some database can work fast with 500,000,000 items. Complete tree of products categories for amazon.com contains 51,000 nodes (amazoncategories.info). Also the data is updated hourly, so saved product information can be incorrect. I think the optimal way is to store categories tree only get the product data at runtime using Amazon's API.