In database design, one order can have multiple products, which is no problem. But one product can always be only in one order, isn't it? For example, how can the same iPhone be placed in two different orders?
I think it should be a one to many relationship, instead of many to many. What's wrong with the logic?
Say you are an e-commerce company and have 70000 IPhones in stock.
According to the design you are suggesting, in your "Product" DB table you would store 70000 rows (1 row for each IPhone). This way each "Order" will reference a unique IPhone. Do you see how quickly your db would have a data explosion.
What you should do instead is store the general characteristics of an IPhone (name, price, brand etc) in 1 row of the table and to keep track of the inventory count using a column called "inventory"
Something like this below (Note you can improve this design below further but this will be sufficient to understand the many-to-many logic here)
Product
-------------
id | name | brand | price | inventory
1 | IPhone | Apple | 600 | 70000
Now for every order that has an IPhone in it, the order will reference the same row of the product table. Hence 1 product will be in multiple orders.
Let me know if you need any more clarification. Cheers!
the answer is that one order can have many products and also one product can belongs to many orders
as an example:
you can order many books from an E-commerce website and any book can be ordered many times
Product-Order many to many relationship means that in one order u can order many products e.g. you can buy many products at once and seller will give you one receipt (one order). And the opposite site is one product type can be sealed many times e.g. store ordered (import) iPhone and sealed it. It turns out that the same product has many orders.
For more information have look at Many-to-many (data model) in Wikipedia.
Hope i helped you!
Related
I have found org tables to be very powerful and useful. I feel like I have movement, table restructuring and basic formulas down fairly well. But I am having a difficult time wrapping my head around how I should structure this for tracking large collections. Not sure if I can do this in one table or if I need multiple tables.
Say I have a business that buys and sells trading cards. There are baseball, basketball and football cards. I want to track purchase price, sale price, purchase date, sale date, average sale price, last sale price, quantity in stock, and item condition for every card sold or in stock.
Is it possible to do this in a single table or do I need multiple tables?
I'd like to track statistics such as:
"What is the average price of all football cards sold in the last six months?"
"In the last month, did I buy more basketball cards or baseball cards?
And for a more lengthy example:
"Last year I sold 4 Mickey Mantle cards. 2 in Mint condition, 1 in Excellent condition, 1 in Poor condition and 1 unsold. What percentage of Mint Mickey Mantle cards were sold last year?"
To reiterate, in org-mode can all this be accomplished within a single table? How would it be structured if say, you knew Tops only made 2000 unique cards in a particular year, would table only contain 2000 rows? (plus the header)
If it can't be accomplished in a single table, I'm just going to use a postgres database structured much like the one mentioned here. I was really hoping there was a snazzy way to do this with org-table alone. But it looks like there are other ways to manipulate databases within emacs.
Sorry if most of this sounds like a high school math problem with no code but I'm sure most people (at least here) know what a single org table with the mentioned columns and a finite set of rows would look like.
Edit1: Can org references be used to link tables together to help get the results I'm looking for?
Edit2: The reason why I thought this was possible in org-mode, was because I did not think a foreign key was necessary. Here is a very similar example not using a foreign key. When reading about construction of spreadsheets in org-mode, foreign keys seemed to be the only obvious hurdle. Anyone have thoughts on this?
I have been studying datawarehouse in the last couple days, particularly, i have been reading The Data Wharehouse Toolkit - The Definitive Guide to Dimensional Modeling by Kimball and Ross.
Uppon that reading, i came to the 1st exapmle where there is a sales fact and it related to a product dimension, as you can see in the bellow image:
I think i can grasp the gist of how this relationship allows us to rotate the "cube" slicing and dicing data, however this is where i get lost:
In this example and many others on the web product is a one-to-one relationship with sales, which is fine i guess for most cases. But this generates a sales registtry for at least each kind of product that was in one sale.
So supposing i bought 1 banana, 2 apples and 1 orange, this would yield at least 3 sales registry. Again, which is fine i guess as it is storing the sale's ticket ID in the sales fact, we still can relate all itens in a given sale.
However if this was an use case: relate products on sales say i want to get every sale that had a banana and get stuff like: how many items each of these sales had, their price cost, their profit, stuff like that...
Wouldn't be better if the fact-product relation were Fact-one_to_many-Product relationship? Where fact would hold the sale's ticket ID and products would have its foreign key referencing where they are from or something?
I reckon these metrics should be in the fact table, and not in the product table as i think i would want. So, is this me not fighting my urge to normalize it or does it make sense in the way i would want to do that kind of filtering -> [given all sales with X product, get data from other products in the same sale].
If i were to follow the guidelines, product dimension would have one registry for every exclusive kind of product the store would have correct? And all the measurements i want i would store it on the fact itself, like price cost, sales price, profit, etc...
On the other hand, if i were to one-to-many product dimension would have many copies of each product. Which is bad, i think. However, i think it would give me better queries in that regard.
As you can see, i'm a beginer and really in the early stages of this path, so if you would endulge me in a Explain Like I'm Five kind of answer I would appreciate.
EDITED:
Sorry #Nick.McDermaid, you are right. I meant from the perspective of the sales fact where for every sale fact i will have only one product, but are correct that for one product it can have N sales related. And so, we have one record of product in the database for every different product on our store. This is the right way to do it, how to rightfully model it. Also, the many indicator is the "sales quantity" i'm guessing.
Anyhow, while this allows for slicing and dicing when/if we have sales as the point of view, but what if i want to for example:
Get all sales that had a banana in it, with all the other items in those sales. We can still do it with this structure but its harder than if the products were repeated and we had the sale id as a foreign key in the product table.
Cuz ultimetly i want to get all the sales(and products within that sale) that had a banana. And then take metrics out of them.
What you are somewhat hinting at would be a degenerate dimension, consisting of the sales id/invoice #/purchase order # of the transaction that took place. The whole purpose of a degenerate dimension is to group items that are related by a meaningless piece of data. For example, a PO # of A1234 is meaningless on its own, it doesn't tell you anything about the purchase. However, it can be used to identify other meaningful data, such as the date of purchase of the products for the customer. In that context, the PO # is defined by the collection of the entities it brings together to describe an event.
Another critical concept in data-warehousing is the abstraction of the schema in the database from the model in the cube. You don't join and group data in a cube model. You slice and filter. There are no foreign keys in a cube model. Those are used in the underlying data schema, but all of that work is handled behind the scenes of the cube model.
I have an existing database that models all products, a company is either producing or consuming. The database is quite simple:
Table: companies {PK: company_id}
+------------+--------------+
| company_id | company_name |
+------------+--------------+
Table: products {PK: product_id}
+------------+--------------+---------------+
| company_id | product_id | product_price |
+------------+--------------+---------------+
Now, if I need to add location information to it, it starts to get complicated.
Basically, now a company has many locations and each location has many products.
To further complicate matters, some attributes of the product e.g. price may not be the same at each location. I would like to share other common attributes at all locations (Basically, I want to avoid creating three copies of product A that's used at all three locations).
I'm not sure what the best way to model this is. I can think of
Table: company_location
+------------+-------------+
| company_id | location_id |
+------------+-------------+
Table: location_product
+-------------+------------+
| location_id | product_id |
+-------------+------------+
But this design would not allow product attributes to change per location, without creating an entirely different product for each location. I also don't have a way to maintain a master product list per company.
Any help is appreciated.
PS: I'm using a postgreSQL database
The rules of normalization would tell you that you need your non-key attributes to depend on all of the key values (and nothing else).
If price is determined by:
- The company who makes it
- The location that sells it
- What the product actually is
Then that implies that PRICE needs a candidate key that specifies company, location and production.
The issue becomes what the relationships are between companies, products and locations. Also, what else do you know (what columns do you have) about these three kinds of things?
If they are all totally independent, for example, the products are commodities and don't depend at all on companies and the locations are independent distributors, which have nothing to do with either companies or what kind of products are sold there, then really a single three-way join is probably your best bet.
However, if there are some linkages between company, product and location, then you need to normalize these items out appropriately. At the end, you may still find yourself tempted to keep price as the only attribute in a three-way join. Alternatively, you may find that your data is actually more hierarchical (companies have locations which sell products that are fundamentally different in some meaningful way from similar products sold at other locations). In such a case the price might live on the leaf level of a tree structure.
It's really hard to say for sure what would work best for you without understanding your business rules better.
The bottom line is, you should aim for third normal form (3NF).
You probably want something like this:
I am designing a database in Sqlite that is intended to help facilitate the creation of predictive algorithms for FIRST Robotics Competition. On the surface, things looked pretty easy, but I'm struggling with a single issue: How to store past ratings of a team. I have looked over previous questions pertaining to how historical data is stored, but I'm not sure any of it applies well to my situation (though it could certainly be that I just don't understand it well enough).
Each team has an individual rating, and after each match that team participates in the rating gets revised. Now, there are several ways I could go about storing them, but none of them seem particularly good. I'll go through the ones that I have thought through, in no particular order.
Option 1:
Each Team has it's own table.It would include the match_id and the rating after the match was done, and could possibly also include the rating before. The problem is, that there would be bordering on 10,000 tables. I'm pretty sure that's inefficient, especially considering I believe that it's also unnormalized (correct me if I'm wrong).
Table Name: Team_id
match_id | rating_after
Option 2:
Historical rating ratings for each team or stored in the match table, and current ratings are stored in the team table. A simplified version of the team table looks like this:
Table : Team_list
team_id | team_name | team_rating
That isn't really the problem, the problem is with the historical data. The historical data would be stored with the match. Likely, it would be each teams rating before the match.
The problems I have with this one, are how tough of a search this will be to find the previous ratings. This comes from the structure of how FRC works. There are 3 teams on each side (forming what is known as an alliance) for a total of 6 teams. (These alliances are normally designated by the colors Red and Blue)
These alliances are randomly assigned ahead of time, and could include any team at the event, on either side.) In other words the match table would look like this (simplified):
Table: match_table
match_id | Red1 | Red2 | Red3 | Blue1 | Blue2 | Blue3 | RedScore | BlueScore | Red1Rating | Red2Rating | etc.....
So each team has to be included in the match info, as well as a rating for each team. If were to create more than one rating (such as an updated rating design that I want to do a pure comparison test with), things could get clogged really fast.
In order to find the previous rating for team # 67, for instance, I'd have to search Red1, Red2, Red3, Blue1, etc. and then look at the column that pertains to the position, all while being sure that this really is the most recent match.
Note: This might involve knowing not only the year of the data, the week it was taken in (I would get this data from a join with an event table), but the match level(whether it was qualifications or playoffs), and match #(which is not match_id).
Sure, this option is normalized, but it's also got a weird search pattern, and isn't easy from a front end standpoint(I might build a front-end for some of the data in the future, so I want to keep that in mind as well).
My question: Is there an easier/more efficient option that I am missing?
Because both designs feel somewhat inefficient. The first has too many tables, the other has a table that will have well over 100,000 entries and will have to be searched in a convoluted pattern. I feel as if there is some simple design solution that I simply haven't thought of.
There's only one sane answer:
team_rating:
team_id, rating, start_date, end_date
Making all ranges closed by using the creation date of the team as the first rating's start_date, and some arbitrarily distant future date (eg 2199-01-01) as the end_date for the current row. all dates being inclusive.
Queries to find the rating at any date are then a simple
select rating
from team_rating
where team_id = $id
and $date between start_date and end_date
and rating history is just
select start_date, rating
from team_rating
where team_id = $id
order by start_date
It's key that both start and end dates are stored, otherwise the queries are trainwrecks.
Right now, I am designing the database, as such I don't have any code. I am looking to use sql server, asp.net if that is relevant.
I have a big number of stores and a big number of products too, both in some thousands. For the same pId, prices may vary by sId. I would build it like this:
1. one "store" table containing fields (sId, name, location),
2. one "products" table containing fields (pId, name size, category, sub-category) and
3. "max(sId)" number of price tables containing fields (pId, mrp, availability).
where max(sId) is the total number of stores.
I would rather not make "max(pId)" number of tables containing fields (sId, mrp, availability) as I need to provide a UI to each store so that they can update the details about product prices and availability at their respective stores. I also need to display some products of a particular store but I never need to display some stores for any specific product. That is, search for stores by product is not required, but listing of products by store would be required.
Is this a good way or can I do better?
You appear to be on the right track and I will offer some recommendations. Although there is no requirement to display some stores for any specific products, you should always think about how the requirements will change and how your system can handle that. Build your system so that you can answer questions like these easily - What stores have product ABC priced under $3/piece?
Store table should contain, as you mentioned, information about stores. Take Aaron Bertrand's comment seriously. Name the fields in a way that the next developer can read and figure out what it is. User StoreID instead of sID.
StoreID StoreName ...other fields
------- --------------
1 North Chicago
2 East Los Angeles
Product table should contain information about products. It would be better to store category and sub-category into a different table.
ProductID ProductName ...other fields
--------- --------------
1 Bread
2 Soap
Categories can be located in its own table with hierarchal structure. See Hierarchal Data and how to use hierarchyid data type. This may help in finding out the depth of each top level category and help management decide if they are going overboard with categorization and making life miserable for everybody, including themselves unknowingly.
Many-to-many ProductCategory table can link products to categories. Also keep a history table. When a product's category is changed, keep track of what it was and what it is set to. It may help in answering questions such as - How many products were moved from Agriculture to Construction category in the last 6 months?
Many-to-many StoreProductPrice can bring together store and product and a price can be defined there. Also remember - prices may differ by customers also. Some customers may get discounts at a certain level. Although this may be too much to discuss here, it should be kept in the back of the mind in case a requirement to support customer discount structure comes up.
StoreProductID StoreID ProductID Price
-------------- ------- --------- -----
1 1 1 $4.00
2 1 2 $1.00
3 2 1 $4.05
4 2 2 $1.02
Availability of the product should be done through the inventory management database table(s). For example, you may have a master table of Warehouse and master table of Location. Bringing them together would be WearhouseLocation table. A WarehouseProduct table may bring together warehouse, product and units available.
Alternatively, your production or procurement facility might be dumping data into ProcuredProduct table. Your manufacturing unit might be putting locks on a subset of products while building something out of it. Your sales unit might be putting locks on a subset of products they are trying to sell. In other words, your products may be continually get allocated. You may run queries to find out availability of a certain product and that can be a little taxing. During any such allocation, the number of available units can be updated in a single table (which contains calculated available products that you can comfortably rely on).
So...depending on your customer's needs, the system you are building can get fairly complicated. I am recommending that you think about these things and keep your database structure flexible to anticipated changes. Normalization is a good thing, and de-normalization has its place also. Use them wisely.