I am attempting to compress duplicates in semi-inconsistent data for better user experience and storage. Due to some natural (and important) inconsistency, combining entries may require an additional table to prevent loss of data. Additionally, a unique identifier must be made so the same (previously squashed) entries are not pulled down again. What is the best way to represent this on a relational database?
Context:
I am currently designing an application that takes information available on a dining hall menu (json format) and makes it rate-able for the user so they are notified when their favorite items are available. I was successful at directly extracting this data in-app, but eventually realized that the source had a bit of inconsistency between naming and ids, so that the same item (Burger, or Char-grilled Burger) at two dining halls would appear twice (sometimes different names). As such, I decided to create a database that would allow me to edit entries so the user would only have to rate once for both items.
Problem:
Tables:
What I eventually realized was that, due to the inconsistencies in the table, two separate items (food) would link to two different serving locations (courses), each at different dining halls (meal). For example, Burgers and Char-grilled Burger are stored as two separate rows even though they are identical aside from the name and the dining hall they are served at. When trying to combine two separate food items into one using my current schema, one of the serving locations would be lost. I could introduce another column, but this does not scale to additional dining halls and wastes memory (most items have the same location consistently between dining halls, like burgers at the grill). I also thought about adding some kind of intermediary table, but I am not sure how to go about doing so without being redundant or taking up too much space. I also believe that the location-time combination in Meal must remain, as the dining halls can close at different times. Additionally, since the items are being renamed, I must have some sort of unique identifier to prevent storing the original item again (the outside data has ids, but these are unique to each of the differently named items).
Current Schema:
Calendar
Links a date with the Meal and Food item
Example: On 07/07/2017, Cinnamon Rolls will be served at Breakfast # South
Fields: date, meal, food
Meal
Refers to the particular time and place the meal is served
Example: Breakfast # South, Dinner # North
Fields: id, name, location, starttime, endtime
Food
Information about a particular food item
Example: Cinnamon Roll, Burger, Char-grilled Burger
Fields: id, name, nutritionurl, course
Course
Refers to what section of the dining hall the item is served at
Example: Grill Station, Salad Bar, SDH Grill
Fields: name
Calendar has many rows but only stores information a week in advance, while the others contain cumulative data (food items will be added once they appear on the menu for the first time).
Example data input (from source):
"White Dinner Rolls" (Food, outside id: 57) are at "South Dining Hall" at "Dinner" (Meal) at the "SDH Grill" (Course) on a particular day
"Dinner Rolls" (Food, outside id:56) are at "North Dining Hall" at "Dinner" (Meal) at the "Salad Bar" (Course) on a particular day (could be same)
For my database, I would try and compress the two dinner roll entries as one food, but then with the current schema, one of the courses would be lost. Similarly, I also need to compress some of the Course entries as they specify the dining hall, even when both have the same section ("Grill" vs "SDH Grill").
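A minimal sketch of the intermediary-table idea, assuming SQLite driven from Python. The table names `source_food` and `food_serving`, and the merged names "Dinner Rolls" and "Grill", are hypothetical choices for illustration and are not part of the schema above. Each outside id maps to one merged food row, so an item that was already squashed is recognized on the next import, and the per-hall course lives in its own table so neither serving location is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE food (            -- one row per merged (canonical) item
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE course (          -- station names with the hall prefix removed
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE source_food (     -- hypothetical: every outside id seen so far
    outside_id  INTEGER PRIMARY KEY,
    source_name TEXT NOT NULL,
    food_id     INTEGER NOT NULL REFERENCES food(id)
);
CREATE TABLE food_serving (    -- hypothetical: course of a food per hall
    food_id   INTEGER NOT NULL REFERENCES food(id),
    location  TEXT NOT NULL,
    course_id INTEGER NOT NULL REFERENCES course(id),
    PRIMARY KEY (food_id, location)
);
""")

# The two dinner roll entries from the example input, merged into food 1.
conn.execute("INSERT INTO food VALUES (1, 'Dinner Rolls')")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Grill"), (2, "Salad Bar")])
conn.executemany("INSERT INTO source_food VALUES (?, ?, ?)",
                 [(57, "White Dinner Rolls", 1), (56, "Dinner Rolls", 1)])
conn.executemany("INSERT INTO food_serving VALUES (?, ?, ?)",
                 [(1, "South Dining Hall", 1), (1, "North Dining Hall", 2)])


def food_for_outside_id(outside_id):
    """Return the merged food id for a source id, or None if never seen."""
    row = conn.execute(
        "SELECT food_id FROM source_food WHERE outside_id = ?",
        (outside_id,),
    ).fetchone()
    return row[0] if row else None


# Both serving locations survive the merge.
servings = conn.execute("""
    SELECT fs.location, c.name
    FROM food_serving fs JOIN course c ON c.id = fs.course_id
    WHERE fs.food_id = 1
    ORDER BY fs.location
""").fetchall()
```

During import, a lookup that returns None means the item is new; anything else means the item was already merged and only the Calendar row needs to be written.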
How my question is different:
There may be a similar answer but my search has been unsuccessful so far (not sure about keywords to look for). As opposed to many others, my question relates specifically to the relationship between tables and how to efficiently work with semi-inconsistent data.
I have found org tables to be very powerful and useful. I feel like I have movement, table restructuring and basic formulas down fairly well. But I am having a difficult time wrapping my head around how I should structure this for tracking large collections. Not sure if I can do this in one table or if I need multiple tables.
Say I have a business that buys and sells trading cards. There are baseball, basketball and football cards. I want to track purchase price, sale price, purchase date, sale date, average sale price, last sale price, quantity in stock, and item condition for every card sold or in stock.
Is it possible to do this in a single table or do I need multiple tables?
I'd like to track statistics such as:
"What is the average price of all football cards sold in the last six months?"
"In the last month, did I buy more basketball cards or baseball cards?"
And for a more lengthy example:
"Last year I sold 4 Mickey Mantle cards. 2 in Mint condition, 1 in Excellent condition, 1 in Poor condition and 1 unsold. What percentage of Mint Mickey Mantle cards were sold last year?"
To reiterate, in org-mode can all this be accomplished within a single table? How would it be structured if, say, you knew Topps only made 2000 unique cards in a particular year? Would the table only contain 2000 rows (plus the header)?
If it can't be accomplished in a single table, I'm just going to use a postgres database structured much like the one mentioned here. I was really hoping there was a snazzy way to do this with org-table alone. But it looks like there are other ways to manipulate databases within emacs.
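For comparison, here is a rough sketch of the database fallback, using SQLite from Python rather than postgres so it runs without a server. It assumes one row per physical card copy; the column names, the football players "Player A" and "Player B", and all prices and dates are made up for illustration. Figures like average sale price and quantity in stock are computed by queries instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE card (
        id             INTEGER PRIMARY KEY,
        sport          TEXT NOT NULL,
        player         TEXT NOT NULL,
        card_condition TEXT NOT NULL,
        purchase_price REAL NOT NULL,
        purchase_date  TEXT NOT NULL,   -- ISO dates compare correctly as text
        sale_price     REAL,            -- NULL while the card is in stock
        sale_date      TEXT
    )
""")

# Invented sample rows: one per physical copy of a card.
rows = [
    ("baseball", "Mickey Mantle", "Mint", 100.0, "2016-01-10", 150.0, "2016-06-01"),
    ("baseball", "Mickey Mantle", "Mint", 100.0, "2016-01-10", None, None),
    ("football", "Player A", "Excellent", 20.0, "2016-02-01", 30.0, "2016-11-15"),
    ("football", "Player B", "Poor", 10.0, "2016-03-01", 20.0, "2016-12-01"),
]
conn.executemany(
    "INSERT INTO card (sport, player, card_condition, purchase_price,"
    " purchase_date, sale_price, sale_date) VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)


def avg_sale_price(sport, since):
    """Average sale price of cards of one sport sold on or after a date."""
    return conn.execute(
        "SELECT AVG(sale_price) FROM card WHERE sport = ? AND sale_date >= ?",
        (sport, since),
    ).fetchone()[0]


def percent_sold(player, condition, year):
    """Share (in percent) of matching copies that were sold in a given year."""
    return conn.execute(
        """
        SELECT 100.0 * SUM(CASE WHEN sale_date LIKE ? THEN 1 ELSE 0 END)
               / COUNT(*)
        FROM card
        WHERE player = ? AND card_condition = ?
        """,
        (f"{year}-%", player, condition),
    ).fetchone()[0]


def in_stock(sport):
    """Number of unsold copies for one sport."""
    return conn.execute(
        "SELECT COUNT(*) FROM card WHERE sport = ? AND sale_date IS NULL",
        (sport,),
    ).fetchone()[0]
```

With this shape the row count follows the number of copies handled, not the number of unique cards printed, so 2000 unique cards does not imply 2000 rows.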
Sorry if most of this sounds like a high school math problem with no code but I'm sure most people (at least here) know what a single org table with the mentioned columns and a finite set of rows would look like.
Edit1: Can org references be used to link tables together to help get the results I'm looking for?
Edit2: The reason why I thought this was possible in org-mode, was because I did not think a foreign key was necessary. Here is a very similar example not using a foreign key. When reading about construction of spreadsheets in org-mode, foreign keys seemed to be the only obvious hurdle. Anyone have thoughts on this?
This might be a stupid question but I have very little experience. I have encountered an issue where I am working with an Excel spreadsheet for a small factory.
It has a huge list of products that are grouped into families.
analogy: Corolla, Avensis, Landcruiser = Toyota
Furthermore the products have a list of tasks associated with them.
Corolla:
Step 1
Step 2
Step 3...
All products share tasks in the first few stages even across different families.
But some occur at a different stage during production
What may be step 6 in productX is step 5 in productY.
But productX and productY share 1-5. (And this is true across the board.)
I have three questions.
Is it possible to structure a database polymorphically? Common tasks would be placed in the base class and get more specific in subclasses (as is common in OO).
If it is not, can you create a central database of unordered tasks and give some sort of priority to each product's database, so that the products give the tasks some order?
My final question: has anyone encountered such a problem? I have a feeling there has to be a design pattern for this. It feels like a solution is just beyond my grasp.
Edit 1. The spreadsheet is mostly blank for the time being. Worksheets are the product names. The string-integer combinations are the product numbers. Values will be put in underneath, i.e. time/hr and the amount of product that should be made in the time specified.
So, this is what I understood:
You need to store a mapping between products and tasks/steps. The latter should be stored in the order in which they are to be performed.
Some initial tasks are always common for all products.
You'd like to structure your database 'polymorphically'. Since you didn't mention what kind of database you are using, I'll assume it to be a relational one.
You can create your tables so:
Product: each row stores data on one product. Primary key: product-name (or product-id, whatever)
Task: information on a task, such as time taken to finish it etc. Primary key: task-name/id.
ProductTaskMapping: contains the mapping of which tasks are to be done for which product, in order. Its schema will be as follows. You can also think of having the first two columns as foreign keys.
product-name- refers to the primary-key in Product table.
task-name- refers to the primary-key in Task table.
priority, or sequence-number
CommonTask: Two columns:
task-name
priority
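Also, most relational databases offer no built-in way to define 'inheritance' between two tables (PostgreSQL's table inheritance is one exception), so the shared tasks are modeled with ordinary tables instead.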
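The tables described above could be sketched roughly as follows, assuming SQLite from Python. The task names (cut, weld, paint, inspect, pack) are invented, and so is the convention that product-specific priorities continue numbering after the common ones. The full task list for a product is the common tasks plus its own tasks, sorted by priority.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (name TEXT PRIMARY KEY);
CREATE TABLE task    (name TEXT PRIMARY KEY);
CREATE TABLE common_task (           -- steps shared by every product
    task_name TEXT PRIMARY KEY REFERENCES task(name),
    priority  INTEGER NOT NULL UNIQUE
);
CREATE TABLE product_task_mapping (  -- steps specific to one product
    product_name TEXT NOT NULL REFERENCES product(name),
    task_name    TEXT NOT NULL REFERENCES task(name),
    priority     INTEGER NOT NULL,
    PRIMARY KEY (product_name, priority)
);
""")

conn.executemany("INSERT INTO product VALUES (?)",
                 [("productX",), ("productY",)])
conn.executemany("INSERT INTO task VALUES (?)",
                 [("cut",), ("weld",), ("paint",), ("inspect",), ("pack",)])
conn.executemany("INSERT INTO common_task VALUES (?, ?)",
                 [("cut", 1), ("weld", 2)])
conn.executemany("INSERT INTO product_task_mapping VALUES (?, ?, ?)", [
    ("productX", "paint", 3), ("productX", "inspect", 4),
    ("productY", "inspect", 3), ("productY", "pack", 4),
])


def steps_for(product_name):
    """Return the ordered task names for one product."""
    rows = conn.execute(
        """
        SELECT task_name, priority FROM common_task
        UNION ALL
        SELECT task_name, priority FROM product_task_mapping
        WHERE product_name = ?
        ORDER BY priority
        """,
        (product_name,),
    ).fetchall()
    return [name for name, _ in rows]
```

Here "inspect" is step 4 for productX but step 3 for productY, while both share steps 1 and 2 without repeating them per product.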
Note: This is a rough copy; I didn't include constraints, weak entities, etc. yet. I still need to have a solid understanding of this question.
Questions:
To keep track of which theater company manages a performer, and which performers are in two theater companies, do I have to make a unique code for each entity set in the other entity sets to keep track of them?
Can start_Location simply point to Place for the theatre company entity?
Can an Actor be Born in a place, or does it have to have an attribute that points to place?
Do my relationships so far make sense?
Are there any redundant attributes such as Short_Descript in Plays?
Can I make an attribute in Place called "Town, State/Department/Province"? Or does it have to be a composed attribute?
Please note: I will be editing and updating my diagram if I have more questions and such...
I would appreciate any suggestions or hints.
ERD:
Question Information:
An actor is born in a place and he/she lives presently in a place (this information is mandatory).
We store in the database only the last known place where the actor lives.
We need the following information for an actor: actor number, actor name, date when the actor was born, and date when the actor died (check if died > born).
An actor is a performer, or/and a theater director.
We store for performer the date when he/she started to perform.
We store for theater director the date when starts his/her last employment as theater director
We consider in DBActors the following types of plays: drama, comedy and tragedy.
For each we like to store the following data: play’s number, play’s title, play’s short description, year when it was written, date when it was first presented on stage (p_date_p, date).
For dramas we store also the drama type, name of the main positive character, and name of the main negative character.
The drama type is one of the following: “classical”, “medieval”, “renaissance”, “nineteenth-century”, “modern”, and “contemporary”.
For comedies we store the comedy type, the name of the main character, and the name of the second character.
The comedy type is one of the following: “ancient roman”, “ancient greek”, “farce”, “comedy of humors”, “comedy of manners”, “commedia dell’arte”, and “theater of absurd”.
For tragedies we store the tragedy type (t_type, varchar(20)) and the name of the main character.
The tragedy type is one of the following: “Greek”, “Roman”, “Renaissance”, “Neo-classical”, and “Modern”.
A play is written by one or many dramatists
It is possible that we do not know the dramatist for certain plays.
We store in the database all known plays even if they were not performed (“closet plays”)
Some actors are also dramatists.
We store in the database all known dramatists.
An actor is hired by a unique theater company at any timestamp.
He/she will stay in the same company the whole year when he/she was hired.
We store in the database the year when he/she was hired by the theater company (small integer).
It is possible that the actor changes the theater company where he/she is working many times during his/her life.
It is possible that an actor is hired by the same company many times in different years.
He/she can perform in one or many plays (at least one) which are presented by theater companies.
It is possible that an actor is hired by a theater company and performs in a play presented by another theater company.
It is unusual but possible that the same performer plays in the same play presented by different theater companies.
A theater company performs/presents one or many plays every year.
Same play can be performed by one or many distinct theater companies.
We like to store in the database the date when the play starts to be performed by a theater company.
It is possible that the same play is performed by different theater companies starting at same date.
We need to store for a dramatist his/her dramatist number, his/her name.
A dramatist wrote one or many plays (at least one).
The information to be stored in the database for each theater company is: theater company number, theater company name, date when the theater company started.
For each theater company we store in the database the first location (place) where the theater company started.
There might be more than one theater company starting in the same place.
A theater company must hire at least one actor.
Each theater company has a unique theater director.
He/she starts his/her work at a specific date.
It is possible that the same theater company has different theater directors but at distinct times, and the same theater director manages different theater companies in distinct times (never at the same date).
It is possible that the same theater director manages the same theater company at different dates.
The information to be stored for place is: place number, town and state/department/province, place country
Here are my responses to your questions:
Whenever you look at two tables and see a many-to-many relationship, you can solve the problem easily using a linker table, also known as a junction table, which “is a database table that contains common fields from two or more other database tables within the same database. It is on the many side of a one-to-many relationship with each of the other tables. Junction tables are known under many names, among them cross-reference table, bridge table, join table, map table, intersection table, linking table, many-to-many resolver, link table, pairing table, transition table, crosswalk, associative entity or association table.” (Wikipedia) You saw me use these tables in your previous question. In this case you are stating that an actor can be managed by many theater companies and a theater company can also manage many actors. This is a many-to-many relationship, so you would create a link table between those tables, and for every relationship between the two you’d add a new row to the link table that contains only a theater company id and an actor id. If an actor was managed by many theater companies, you’d add several rows to the link table, each holding the same actor id but a different theater company’s id.
Yes, you can have start_Location point directly to place. This means that the Start_Location attribute must be a Foreign Key (FK) pointing the theater company to the Primary Key (PK) of the related Place record.
By all means an actor can be born in a place, but just like above, you need a column in Actor that is a FK to the Place table’s PK. You could call this column Birth_Place, and all it’d hold is the PK of the record in Place that relates to the actor’s birth place. This column would also need to be NOT NULL because all actors need a Birth_Place.
So far it seems like your diagram will work to solve this problem, yes. Just see question 1’s answer for that follow up addition.
You’re getting good at removing redundancies. Your diagram looks good. The only suggestion I’d make is this: why do you have a play table and then 3 separate play type tables? Why not combine them into one table called Play? It’d sit exactly where Play currently sits in your diagram and contain the same attributes it already does, but you’d also add the following:
a. Type – Would be a string that you could place “Drama”, “Comedy”, or “Tragedy” in so you’d know exactly what type of play it is. Also this would allow you to add future play types to the plays table and not have to add a whole new table to the DB.
b. Sub_Type – Would also be a string and hold the type that you currently have under the separate tables. They are all essentially the same attribute in each table and would just hold different type descriptors depending on the parent Type.
c. Main_Character – Would be a string that holds the main character, because in your three separate tables, you have main characters. You’re just calling them 3 separate things. (Get the direction I’m going in here?)
d. Secondary_Character – Would be a string that holds the secondary character. You have a secondary character in your dramas and comedies, but none in your tragedies, so in tragedy records this column would wind up being null. See what I did there? You now have one table where you used to have 4, and in that one table you can retrieve all the same information you had in those 4 separate tables. Hopefully that’ll make your life easier.
You can do whatever you like, but I’m assuming you mean by best practices, and it would generally be considered best practice to separate this single attribute into its simple attribute sub-parts, i.e. make it a composed attribute.
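To make the link table and the Place foreign keys concrete, here is a small sketch assuming SQLite from Python. The table name `hired_by`, the column names, and the sample rows are illustrative and are not taken from your diagram. The primary key on (actor_no, hire_year) is one way to express the rule that an actor belongs to a single company for a whole year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE place (
    place_no INTEGER PRIMARY KEY,
    town     TEXT NOT NULL,
    state    TEXT NOT NULL,
    country  TEXT NOT NULL
);
CREATE TABLE actor (
    actor_no    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    birth_place INTEGER NOT NULL REFERENCES place(place_no)
);
CREATE TABLE theater_company (
    company_no     INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    start_location INTEGER NOT NULL REFERENCES place(place_no)
);
CREATE TABLE hired_by (      -- link table resolving the many-to-many
    actor_no   INTEGER NOT NULL REFERENCES actor(actor_no),
    company_no INTEGER NOT NULL REFERENCES theater_company(company_no),
    hire_year  INTEGER NOT NULL,
    PRIMARY KEY (actor_no, hire_year)
);
""")

# Invented sample data.
conn.execute("INSERT INTO place VALUES (1, 'Some Town', 'Some State', 'Some Country')")
conn.execute("INSERT INTO actor VALUES (1, 'Actor A', 1)")
conn.executemany("INSERT INTO theater_company VALUES (?, ?, ?)",
                 [(1, "Company A", 1), (2, "Company B", 1)])
conn.executemany("INSERT INTO hired_by VALUES (?, ?, ?)",
                 [(1, 1, 2001), (1, 2, 2002), (1, 1, 2003)])


def companies_for(actor_no):
    """Names of every company that has hired the actor, without repeats."""
    rows = conn.execute(
        """
        SELECT DISTINCT tc.name
        FROM hired_by h
        JOIN theater_company tc ON tc.company_no = h.company_no
        WHERE h.actor_no = ?
        ORDER BY tc.name
        """,
        (actor_no,),
    ).fetchall()
    return [name for (name,) in rows]


def rejected(sql, params):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True
```

An actor whose birth_place does not exist in place, or a second hiring in the same year, is refused by the database itself, so the application does not need to check these rules.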
I am trying to design a database which contains data about street parking. Each parking spot has GPS coordinates, time restrictions by day, day-of-week rules (some days are permitted, others restricted), and a free or paid status. In the end, I need to run queries that select parking by these criteria.
For a first draft I tried something like this:
Parking
-------
parkingId
Lat
Long
Days (1234567)
Time -- already here comes trouble
But it's not normalized and will quickly bloat the database. How should I design the data in the best way?
Update: For now I have two approaches.
The first one is:
I try to use restriction tables with many-to-many links (this is an example for days and months). But the queries will be complicated and I don't know how to link time with day.
The second approach is:
Using one restriction table with a Type field that will have priority. But this solution is also not normalized.
Just to be clear about what data I have:
ParkingId Coords String Description (NO PARKING 11:30AM TO 1PM THURS)
And I want to show the user where they can find street parking by area, time and day.
Thanks to all for your help and time.
This seems like a difficult task. Just a few thoughts.
Are you only concerned with street parking? Parking houses have multiple floors so GPS coordinates won't work unless you stay on the streets.
What is the accuracy of the coordinates? Would it be easier to identify each parking space individually by some other standard, like unique identifiers of the painted parking squares? (But what happens if people don't park in the squares? Or if the GPS coordinates are not exact enough because of illegal parking? Do you intend to keep records of the parking tickets too?)
Some thoughts on the tables or information you need to take into account:
time: opening hours, days
price: maybe a different price for different time intervals?
exceptions: holidays, maintenance (maybe not so important, you could just make parking space status active/inactive)
parking slot: id (GPS/random id), status
The three or four tables above could be linked by an intermediate table which reveals the properties of a parking space for every possible parking time (like a prototype for all possible combinations). That information could be linked to another table where you keep records of actual parking events (so you can, for example, keep records of people who have or have not paid their bills if you need to).
There is a lot of stuff that affects your implementation, so you really need to list all the rules of the parking space (and event?). The database structure can be done (and redone) later, after you have an understanding of the properties of the events you need to keep records of. And that's the key to everything: understanding what you need to do so you can design and create the implementation. If the implementation (application) doesn't work, change the implementation. If the design is faulty, redesign. If you don't understand the whole process (what you really need), everything you do is bound to fail. (Unless you are incredibly lucky, but I wouldn't count on luck...)
Try using two tables with an intersection entity between them.
Table parking will have parking_id, lat and long columns. Table Restrictions will have all the type of restrictions that you have in your scenario with something like restriction_id, restriction_day, restriction_time and restriction_status and maybe restriction_type.
Then you can link the two tables with foreign key constraints in the intersection entity.
Example: each row of the intersection entity pairs one parking_id with one restriction_id.
This way a parking can have more than one restriction and a restriction can be applied to more than one parking.
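A rough sketch of this layout, assuming SQLite from Python. Splitting restriction_time into start_time and end_time, storing times as zero-padded 'HH:MM' text, numbering days 1 to 7 with Thursday as 4, and the name `parking_restriction` for the intersection table are all assumptions made for the example. The coordinates are placeholders, and the longitude column is spelled `lng` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parking (
    parking_id INTEGER PRIMARY KEY,
    lat        REAL NOT NULL,
    lng        REAL NOT NULL
);
CREATE TABLE restriction (
    restriction_id   INTEGER PRIMARY KEY,
    restriction_day  INTEGER NOT NULL,   -- assumed: 1 = Monday ... 7 = Sunday
    start_time       TEXT NOT NULL,      -- assumed: zero-padded 'HH:MM'
    end_time         TEXT NOT NULL,
    restriction_type TEXT NOT NULL
);
CREATE TABLE parking_restriction (       -- the intersection entity
    parking_id     INTEGER NOT NULL REFERENCES parking(parking_id),
    restriction_id INTEGER NOT NULL REFERENCES restriction(restriction_id),
    PRIMARY KEY (parking_id, restriction_id)
);
""")

# Placeholder coordinates; spots 1 and 3 share the same Thursday restriction.
conn.executemany("INSERT INTO parking VALUES (?, ?, ?)",
                 [(1, 0.0, 0.0), (2, 0.0, 0.1), (3, 0.1, 0.0)])
conn.execute(
    "INSERT INTO restriction VALUES (1, 4, '11:30', '13:00', 'NO PARKING')")
conn.executemany("INSERT INTO parking_restriction VALUES (?, ?)",
                 [(1, 1), (3, 1)])


def available(day, time):
    """Parking ids with no restriction covering the given day and time."""
    rows = conn.execute(
        """
        SELECT p.parking_id
        FROM parking p
        WHERE NOT EXISTS (
            SELECT 1
            FROM parking_restriction pr
            JOIN restriction r ON r.restriction_id = pr.restriction_id
            WHERE pr.parking_id = p.parking_id
              AND r.restriction_day = ?
              AND r.start_time <= ?
              AND ? < r.end_time
        )
        ORDER BY p.parking_id
        """,
        (day, time, time),
    ).fetchall()
    return [pid for (pid,) in rows]
```

The "NO PARKING 11:30AM TO 1PM THURS" rule is stored once and linked to every spot it applies to, which is what keeps the restriction table small.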
As you seem to have heard of normalization, and following the comment from Damien, you should use different tables to represent different things.
You should then think about how to link those tables together, and in the process define the type of relationship between the two. It could be one-to-one (this is the one where you could be tempted to put everything in the same table, but a simple foreign key in a linked table is cleaner), one-to-many (this is where the trouble would begin if you put everything in one table, because now there will be several lines in the linked table with the same foreign key, and if everything was in the same table, you'd have to multiply the fields in that table), or many-to-many (where you would need to add a table only to make the link between 2 other tables, thus with 2 foreign key fields pointing to records in both tables).
For example, in your case, a Parking table could hold the parking name, coordinates, etc.
A second table TimeTable could hold the opening days/time for each parking, with a foreign key to the parkingId (making it a one-to-many relationship: 1 parking can have many opening frames). The fields of this table could for example be DayOfWeek (number indicating the day), openingTime, closingTime. This would allow you to define several timeframes on the same day, or a single one (if it's always open for example), giving in this case 7 records in this table for this parking (=> one-to-many relationship).
You could then imagine a 3rd table Price where you put data concerning the price of that parking (probably a one-to-many too, with records for hourly rates/long stay/...), and so on depending on the needs and the different "objects" you would need to represent.
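As a sketch only, the Parking and TimeTable pair could look like this in SQLite from Python. The lot names, the 'HH:MM' text format for times, and the use of '24:00' as an end-of-day marker are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parking (
    parking_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE time_table (
    id           INTEGER PRIMARY KEY,
    parking_id   INTEGER NOT NULL REFERENCES parking(parking_id),
    day_of_week  INTEGER NOT NULL CHECK (day_of_week BETWEEN 1 AND 7),
    opening_time TEXT NOT NULL,   -- assumed: zero-padded 'HH:MM'
    closing_time TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO parking VALUES (?, ?)",
                 [(1, "Lot A"), (2, "Lot B")])

# Lot A is always open: one frame per day, 7 records in total.
frames = [(1, day, "00:00", "24:00") for day in range(1, 8)]
# Lot B is open on days 1-5 with a midday break: two frames per day.
for day in range(1, 6):
    frames.append((2, day, "08:00", "12:00"))
    frames.append((2, day, "14:00", "18:00"))
conn.executemany(
    "INSERT INTO time_table (parking_id, day_of_week, opening_time,"
    " closing_time) VALUES (?, ?, ?, ?)",
    frames,
)


def open_parkings(day, time):
    """Names of the parkings that have an opening frame covering day/time."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM parking p
        JOIN time_table t ON t.parking_id = p.parking_id
        WHERE t.day_of_week = ?
          AND t.opening_time <= ?
          AND ? < t.closing_time
        ORDER BY p.name
        """,
        (day, time, time),
    ).fetchall()
    return [name for (name,) in rows]


lot_a_frames = conn.execute(
    "SELECT COUNT(*) FROM time_table WHERE parking_id = 1").fetchone()[0]
```

A Price table would hang off parking_id in the same way, with one row per rate.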
Please note these are only rough examples. Database design can sometimes be very tricky and it's a subject I'm no specialist in, but I think this advice can help you go further, and you can come back with another question if you get stuck.
Good luck !
I'm looking to design an inventory database that tracks a snack bar. As this would be a single person/computer access and needs to be easily movable to another system, I plan to use SQLite as the DB engine. The basic concept is to track inventory bought from a wholesale warehouse such as Sam's Club, and then keep track of the inventory.
The main obstacle I'm trying to overcome is how to track bulk vs individual items in the products database. For example if a bulk item is purchased, let us say a 24 pack of coke, how do I maintain in the product database, the bulk item and that it contains 24 of the individual items. The solution would be fairly easy if all bulk items only contained multiple of 1 item, but in variety packs, such as a carton of chips that contains 5 different individual items all with separate UPCs, the solution becomes a bit more difficult.
So far I have come up with the multiple pass approach where the DB would be scanned multiple times to obtain all of the information.
Product_Table
SKU: INT
Name: TEXT
Brand: TEXT
PurchasePrice: REAL
UPC: BIGINT
DESC: TEXT
BULK: BOOLEAN
BulkList: TEXT // comma separated list of SKUs for each individual item
BulkQty: TEXT // comma separated list corresponding to each SKU above representing the quantity
Transaction_Table
SKU: INT
Qty: INT
// Other stuff but that is the essential
When I add a bulk item to the inventory (a positive-quantity transaction), it should instead add all of its individual items, as I can't think of any time I would keep the bulk item in stock to sell. I would like to keep the bulk items in the database, however, to help with receiving and adding them to the inventory.
One way to do it is to create a 1:N mapping between bulk objects and their contents:
create table bulk_item (
    bulk_product_id integer not null,
    item_product_id integer not null,
    qty integer not null,
    primary key(bulk_product_id, item_product_id),
    foreign key(bulk_product_id) references product(sku),
    foreign key(item_product_id) references product(sku)
);
A comma-separated list is certainly fine (it might make it harder to do certain queries such as find all bulk objects that contain this SKU etc...).
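To show the kind of query the bulk_item table makes easy, here is a small sketch using SQLite from Python. The trimmed-down product table and the sample products and quantities are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    sku  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE bulk_item (
    bulk_product_id INTEGER NOT NULL,
    item_product_id INTEGER NOT NULL,
    qty             INTEGER NOT NULL,
    PRIMARY KEY (bulk_product_id, item_product_id),
    FOREIGN KEY (bulk_product_id) REFERENCES product(sku),
    FOREIGN KEY (item_product_id) REFERENCES product(sku)
);
""")

# Invented products: a single-item pack and a two-item variety carton.
conn.executemany("INSERT INTO product VALUES (?, ?)", [
    (1, "Coke 24-pack"), (2, "Coke can"),
    (3, "Chips variety carton"), (4, "Chips A"), (5, "Chips B"),
])
conn.executemany("INSERT INTO bulk_item VALUES (?, ?, ?)",
                 [(1, 2, 24), (3, 4, 10), (3, 5, 10)])


def bulks_containing(sku):
    """Names of all bulk products that include the given item SKU."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM bulk_item b
        JOIN product p ON p.sku = b.bulk_product_id
        WHERE b.item_product_id = ?
        ORDER BY p.name
        """,
        (sku,),
    ).fetchall()
    return [name for (name,) in rows]


def contents_of(bulk_sku):
    """(item SKU, quantity) pairs packed inside one bulk product."""
    return conn.execute(
        "SELECT item_product_id, qty FROM bulk_item"
        " WHERE bulk_product_id = ? ORDER BY item_product_id",
        (bulk_sku,),
    ).fetchall()
```

With the comma-separated columns, the first lookup would need string parsing on every row; here it is a plain indexed join.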
I have to both agree and disagree with jspcal. I agree with the "bulk_item" table, but I would not say that it's "fine" to use a comma separated list. I suspect that they were only being polite and would not endorse a design that isn't in first normal form.
The design that jspcal has suggested is commonly called "Bill of Materials" and is the only sane way to approach a problem like composite products.
In order to use this effectively with your transaction table, you should include a transaction type code along with the SKU and quantity. There are different reasons why your stock in any given SKU might go up or down. The most common are receiving new stock and customers buying stock. However, there are other things like manual inventory adjustments to take into consideration clerical errors and shrinkage. There are also stock conversions like when you decide to bust up a variety pack into individual products for sale. Don't think you can count on whether the quantity is positive or negative to give you enough information to be able to make sense of your inventory levels and how (and why) they've changed.
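A minimal sketch of a transaction table with a type code, assuming SQLite from Python. The table names `txn_type` and `stock_txn`, the four codes, and the helper functions are hypothetical. Receiving a bulk SKU is expanded into RECEIVE rows for its individual items, which is the behaviour asked for in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE product  (sku INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bulk_item (
    bulk_product_id INTEGER NOT NULL REFERENCES product(sku),
    item_product_id INTEGER NOT NULL REFERENCES product(sku),
    qty             INTEGER NOT NULL,
    PRIMARY KEY (bulk_product_id, item_product_id)
);
CREATE TABLE txn_type (code TEXT PRIMARY KEY);   -- hypothetical code list
CREATE TABLE stock_txn (
    id        INTEGER PRIMARY KEY,
    sku       INTEGER NOT NULL REFERENCES product(sku),
    qty       INTEGER NOT NULL,                  -- signed change in stock
    type_code TEXT NOT NULL REFERENCES txn_type(code)
);
""")
conn.executemany("INSERT INTO txn_type VALUES (?)",
                 [("RECEIVE",), ("SALE",), ("ADJUST",), ("CONVERT",)])
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, "Coke 24-pack"), (2, "Coke can")])
conn.execute("INSERT INTO bulk_item VALUES (1, 2, 24)")


def record(sku, qty, type_code):
    """Append one stock movement."""
    conn.execute(
        "INSERT INTO stock_txn (sku, qty, type_code) VALUES (?, ?, ?)",
        (sku, qty, type_code),
    )


def receive(sku, qty):
    """Receive qty units; a bulk SKU is exploded into its individual items."""
    contents = conn.execute(
        "SELECT item_product_id, qty FROM bulk_item WHERE bulk_product_id = ?",
        (sku,),
    ).fetchall()
    lines = [(item, qty * per_pack) for item, per_pack in contents] or [(sku, qty)]
    for item_sku, item_qty in lines:
        record(item_sku, item_qty, "RECEIVE")


def on_hand(sku):
    """Current stock level: the sum of all signed movements for the SKU."""
    return conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM stock_txn WHERE sku = ?", (sku,)
    ).fetchone()[0]


def total_for(sku, type_code):
    """Net quantity moved for one SKU by one kind of transaction."""
    return conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM stock_txn"
        " WHERE sku = ? AND type_code = ?",
        (sku, type_code),
    ).fetchone()[0]


receive(1, 2)            # two 24-packs arrive as 48 cans
record(2, -5, "SALE")    # five cans sold
record(2, -1, "ADJUST")  # one can written off as shrinkage
```

Because each row carries its reason, a sale of five cans and a write-off of one can stay distinguishable even though both lower the stock level.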