Requesting feedback on database design - sql-server

I am building a database for a Christmas tree growing operation. I have put together what I believe to be a workable schema. I am hoping to get some feedback from someone, and I have no one. You are my only hope.
So, there are 3 growing plots; we will call them Orchards. Each Orchard has rows and columns, and each row/column intersection can have zero or one trees planted in it. The rows/columns are numbers and letters, so row 3, column f, etc. Each row/column intersection has a status (empty, in use). A tree can be of a different species (denoted by a manually created GID {Genetic ID}), modified (have a different species grafted on), or moved to a different location. So a plant can have one or many locations, and a location can contain, through history, one or many trees, but only one at a time.
Here is a diagram I put together:

So I was thinking that, for historical purposes, I would use the treelocation table. Do you think it is unnecessary?
No, but in that case you should have the information pertaining to the tree's location in the tree location table. For instance "MovedYear". If a tree moves multiple times, don't you want to keep the Year of each Move, instead of just one MovedYear for each tree?
It's fine to have a history table the way you do, but right now, if TreeId 1 has been in 3 different locations, how could you query your database to see which location it's in NOW? All you'll see is:
TreeId  LocationId
1       1
1       2
1       3
You won't know in what order the moves took place. (Unless you have some business rule that says trees can only move from 1 to 2 and from 2 to 3, and never follow any other order).
The usual way to solve this is to have a StartDate and EndDate in the history table.
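As a minimal sketch of the StartDate/EndDate idea: the table name TreeLocation, the sample dates, and the use of SQLite through Python (standing in for SQL Server) are all assumptions made for illustration. With those in place, the current location is simply the row whose EndDate is still NULL, and the order of moves comes from sorting on StartDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TreeLocation (
        TreeId     INTEGER NOT NULL,
        LocationId INTEGER NOT NULL,
        StartDate  TEXT    NOT NULL,  -- ISO dates sort correctly as text
        EndDate    TEXT               -- NULL means the tree is still here
    )
""")

# Hypothetical history: tree 1 started in location 2, moved to 3, then to 1.
conn.executemany(
    "INSERT INTO TreeLocation VALUES (?, ?, ?, ?)",
    [
        (1, 3, "2017-03-15", "2019-05-02"),
        (1, 1, "2019-05-02", None),
        (1, 2, "2015-04-01", "2017-03-15"),
    ],
)

# Where is the tree now? The open-ended row answers that.
current = conn.execute(
    "SELECT LocationId FROM TreeLocation WHERE TreeId = ? AND EndDate IS NULL",
    (1,),
).fetchone()[0]

# In what order did the moves happen? Sort by StartDate.
history = [
    row[0]
    for row in conn.execute(
        "SELECT LocationId FROM TreeLocation WHERE TreeId = ? ORDER BY StartDate",
        (1,),
    )
]
```

Here `current` is the location the tree occupies today and `history` lists every location in the order the tree passed through it, regardless of the order in which rows were inserted.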

It seems you are saying:
"A plant can have one or many locations"
No, a plant has one location, but it can move.
To model this, we need to:
Have a location foreign key (FK) in the Tree table, showing the tree's current location.
Make this FK mandatory (expressing the "has a" relationship).
Put a unique key constraint on this FK column, to prevent multiple trees from having the same location.
A plant can move, so to trace a plant's location history:
We will need a plant-location-history table.
"Each row/column intersection has a status (empty, in use)"
So the intersection's status can take a limited set of predefined values.
Do we need a LocationStatus table? I don't think so. Status can be a static field in the Location table with a check constraint (1 = empty, 2 = in use, 3 = etc.).
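The constraints listed above could be sketched roughly as follows. This is only an illustration: the column names (Orchard, RowNo, ColLetter, GID), the sample rows, and the choice of SQLite through Python instead of SQL Server are assumptions, and the status list is limited to the two values named in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Location (
        LocationId INTEGER PRIMARY KEY,
        Orchard    INTEGER NOT NULL,
        RowNo      INTEGER NOT NULL,
        ColLetter  TEXT    NOT NULL,
        -- 1 = empty, 2 = in use
        Status     INTEGER NOT NULL DEFAULT 1 CHECK (Status IN (1, 2)),
        UNIQUE (Orchard, RowNo, ColLetter)
    );
    CREATE TABLE Tree (
        TreeId     INTEGER PRIMARY KEY,
        GID        TEXT    NOT NULL,
        -- mandatory (NOT NULL) and at most one tree per location (UNIQUE)
        LocationId INTEGER NOT NULL UNIQUE REFERENCES Location (LocationId)
    );
""")

conn.execute(
    "INSERT INTO Location (LocationId, Orchard, RowNo, ColLetter, Status) "
    "VALUES (1, 1, 3, 'f', 2)"
)
conn.execute("INSERT INTO Tree (TreeId, GID, LocationId) VALUES (1, 'GID-001', 1)")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second tree in an occupied spot violates the UNIQUE constraint.
second_tree_same_spot = try_insert(
    "INSERT INTO Tree (TreeId, GID, LocationId) VALUES (2, 'GID-002', 1)"
)
# A tree with no location violates NOT NULL.
tree_without_location = try_insert(
    "INSERT INTO Tree (TreeId, GID, LocationId) VALUES (3, 'GID-003', NULL)"
)
# An unknown status code violates the CHECK constraint.
bad_status = try_insert(
    "INSERT INTO Location (LocationId, Orchard, RowNo, ColLetter, Status) "
    "VALUES (2, 1, 3, 'g', 9)"
)
```

All three attempted inserts are rejected by the database itself, so the rules do not depend on application code remembering to check them.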

Related

Duplicate data vs Calculated data in database

I'm starting to track a host of variables around my life (QuantifiedSelf). I have a lot of input sources, and I'm working on sticking it all into a database. I plan on using this database with R to ask arbitrary questions about my life ("Which routes are the fastest to work", or "What foods affect my mood", etc)
The key question I'm trying to answer here is "Do I process the input before sticking it into the database?"
Examples of "process":
Some of my input is a list of moods (one for each day). As of right now, there are only 5 available moods (name with a rating between -2 and 2). Do I normalize this data and create two tables: A Mood table (with 5 items) and a DailyMood table?
If I process the data then I lose the original data. Perhaps I change a mood to have a different name. If I do this in a normalized database, then I lose the information that before the change, I had a mood "oldName"
If I don't process the data, then I have duplication of data
Another input is a list of GPS locations (lat, long). However, most of my day is spent in a single spot, or spent driving. Do I process this data to create two tables "Locations" and "Routes"?
If I don't process the data, then I have a whole bunch of duplicate locations (at different timestamps), which is difficult to query and get good data out of.
If I process the data, then I lose the original data. I end up with a nice set of Locations and Routes that is easy to query, but if those locations or routes are wrong, I would have to redownload the input source and rebuild the database.
However, I feel like I'm stuck between two opposing "ideals":
If I process the data, then I don't have the original data.
If I don't process the data, then I have duplicate, hard to use data.
I've considered storing both the original and the calculated. This feels like I'm getting the worst of both worlds: Some of my tables aren't original, and would need a full recalculation if they are wrong, while other tables are original but hard to use and have duplicate data.
To some of the points in the comments, I think which data you store depends on the needs of your application, and I would approach each set of data through a use case lens.
For the first use case, mood data, it sounds like there is value in being able to see this data over time (i.e. it appears that over the last month, my mood has been improving) as well as to pull up individual events (i.e. on date x, I ate a hamburger, how did this affect my mood in the subsequent mood entry after date x).
If it were me, I would create a Mood table, with two attributes:
Name
Id (pk)
This table would essentially serve as a definition table. Here you could add attributes specific to the mood (such as description).
I would then create a MoodHistory table with the following attributes:
- Timestamp
- MoodId
- IsCurrent (Boolean)
Before you enter a mood in your application, UPDATE MoodHistory SET IsCurrent = 0 WHERE IsCurrent = 1, and then insert your new record with IsCurrent = 1. This structure is normalized and by indexing or partitioning by the IsCurrent column (and honestly even without any indexing/partitioning), even as your table grows quite large, you should always be able to query the current mood super quickly.
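A minimal sketch of that update-then-insert step, assuming SQLite through Python and two made-up mood names ("Great" and "Bad"); the helper `record_mood` is hypothetical. Wrapping both statements in one transaction keeps the table from ever holding zero or two current rows if something fails halfway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mood (
        Id   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    CREATE TABLE MoodHistory (
        Timestamp TEXT    NOT NULL,
        MoodId    INTEGER NOT NULL REFERENCES Mood (Id),
        IsCurrent INTEGER NOT NULL
    );
    INSERT INTO Mood (Id, Name) VALUES (1, 'Great'), (2, 'Bad');
""")


def record_mood(timestamp, mood_id):
    """Clear the old current flag, then insert the new entry as current."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("UPDATE MoodHistory SET IsCurrent = 0 WHERE IsCurrent = 1")
        conn.execute(
            "INSERT INTO MoodHistory (Timestamp, MoodId, IsCurrent) VALUES (?, ?, 1)",
            (timestamp, mood_id),
        )


record_mood("2024-01-01T08:00", 2)
record_mood("2024-01-02T08:00", 1)

current_mood = conn.execute(
    "SELECT m.Name FROM MoodHistory h JOIN Mood m ON m.Id = h.MoodId "
    "WHERE h.IsCurrent = 1"
).fetchone()[0]
current_rows = conn.execute(
    "SELECT COUNT(*) FROM MoodHistory WHERE IsCurrent = 1"
).fetchone()[0]
total_rows = conn.execute("SELECT COUNT(*) FROM MoodHistory").fetchone()[0]
```

Both entries stay in MoodHistory, but only the latest one carries the current flag.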
For your second use case, this is quite dependent not only on your planned usage, but where the data is coming from (particularly for routes). I'm not sure how you are planning on grouping locations into "routes" but if you clarify in the comments, I'm happy to add to my answer.
For locations however, I'm assuming you're taking a Location Snapshot during some set time interval. I would create a LocationSnapshot table structured similarly to the MoodHistory table:
It would have the following attributes:
Timestamp
Latitude
Longitude
IsCurrent
By processing your IsCurrent data in a similar way to your MoodHistory data, it should be quite straightforward to grab the last entered location. You could also do some additional processing if you want to avoid duplicates. Essentially, before updating IsCurrent, query the row where IsCurrent = 1. Then compare that record's Latitude and Longitude to your new Latitude and Longitude before inserting the new record. If there is any change, proceed with the insert; otherwise, there is no need to insert a new record.
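The duplicate check described above might look something like this. It is a sketch under assumptions: SQLite through Python, a hypothetical helper named `record_location`, invented coordinates, and an exact equality comparison (real GPS readings jitter, so a distance tolerance would probably be needed in practice).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE LocationSnapshot (
        Timestamp TEXT    NOT NULL,
        Latitude  REAL    NOT NULL,
        Longitude REAL    NOT NULL,
        IsCurrent INTEGER NOT NULL
    )
""")


def record_location(timestamp, lat, lon):
    """Insert a snapshot only if the position changed; return True if inserted."""
    current = conn.execute(
        "SELECT Latitude, Longitude FROM LocationSnapshot WHERE IsCurrent = 1"
    ).fetchone()
    if current == (lat, lon):
        return False  # same spot as the current row, nothing to store
    with conn:  # flag update and insert succeed or fail together
        conn.execute("UPDATE LocationSnapshot SET IsCurrent = 0 WHERE IsCurrent = 1")
        conn.execute(
            "INSERT INTO LocationSnapshot (Timestamp, Latitude, Longitude, IsCurrent) "
            "VALUES (?, ?, ?, 1)",
            (timestamp, lat, lon),
        )
    return True


results = [
    record_location("2024-01-01T07:00", 40.7128, -74.0060),
    record_location("2024-01-01T07:05", 40.7128, -74.0060),  # unchanged position
    record_location("2024-01-01T09:00", 40.7580, -73.9855),
]
row_count = conn.execute("SELECT COUNT(*) FROM LocationSnapshot").fetchone()[0]
```

The second call is skipped because the position matches the current row, so only two snapshots are stored.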
You could also create a table of known locations such as KnownLocation:
Latitude
Longitude
Name
Joining to this table ON Latitude and Longitude should tell you when you were spending time at a particular location, say "Home" vs "Work"
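A rough sketch of that join, again assuming SQLite through Python and invented coordinates. It relies on snapshot coordinates matching the known ones exactly, which in practice would likely mean rounding or snapping readings before storing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LocationSnapshot (
        Timestamp TEXT, Latitude REAL, Longitude REAL, IsCurrent INTEGER
    );
    CREATE TABLE KnownLocation (
        Latitude REAL, Longitude REAL, Name TEXT
    );
    INSERT INTO KnownLocation VALUES
        (40.7128, -74.0060, 'Home'),
        (40.7580, -73.9855, 'Work');
    INSERT INTO LocationSnapshot VALUES
        ('2024-01-01T07:00', 40.7128, -74.0060, 0),
        ('2024-01-01T08:30', 40.7400, -73.9900, 0),  -- somewhere in between
        ('2024-01-01T09:00', 40.7580, -73.9855, 1);
""")

# Only snapshots taken at a known place survive the inner join.
visits = conn.execute("""
    SELECT s.Timestamp, k.Name
    FROM LocationSnapshot AS s
    JOIN KnownLocation AS k
      ON k.Latitude = s.Latitude AND k.Longitude = s.Longitude
    ORDER BY s.Timestamp
""").fetchall()
```

The 08:30 snapshot has no matching KnownLocation row, so it drops out of `visits`; a LEFT JOIN would keep it with a NULL name instead.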

How can I polymorphically structure a database?

This might be a stupid question, but I have very little experience. I have encountered an issue where I am working with an Excel spreadsheet for a small factory.
It has a huge list of products that are grouped into families.
analogy: Corolla, Avensis, Landcruiser = Toyota
Furthermore the products have a list of tasks associated with them.
Corolla:
Step 1
Step 2
Step 3...
All products share tasks in the first few stages even across different families.
But some occur at a different stage during production
What may be step 6 in productX is step 5 in productY.
But productX and productY share 1-5. (And this is true across the board.)
I have three questions.
Is it possible to polymorphically structure a database? Common tasks can be placed in the base class and get more specific (common for OO).
If it is not, can you create a central database of unordered tasks and give some sort of priority to each database of a product, so that they give the tasks some order?
Final question: has anyone encountered such a problem? I have a feeling there has to be a design pattern for this. It feels like a solution is just beyond my grasp.
Edit 1. The spreadsheet is mostly blank for the time being. Worksheets are the product names. The string-integer combinations are the product numbers. Values will be put in underneath, i.e. Time/hr and the amount of product that should be made in the time specified.
So, this is what I understood:
You need to store a mapping between products and tasks/steps. The latter should be stored in the order in which they are to be performed.
Some initial tasks are always common for all products.
You'd like to structure your database 'polymorphically'. Since you didn't mention what kind of database you are using, I'll assume it to be a relational one.
You can create your tables so:
Product: each row stores data on one product. Primary key: product-name (or product-id, whatever)
Task: information on a task, such as time taken to finish it etc. Primary key: task-name/id.
ProductTaskMapping: contains the mapping of which tasks are to be done for which product, in order. Its schema will be as follows. You can also think of having the first two columns as foreign keys.
product-name- refers to the primary-key in Product table.
task-name- refers to the primary-key in Task table.
priority, or sequence-number
CommonTask: Two columns:
task-name
priority
Also, there's no way to define 'inheritance' between two tables.
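The three-table layout described above could be sketched as follows, assuming SQLite through Python. The task names (cut, weld, paint, inspect) and the helper `steps_for` are invented for illustration; productX and productY share their first two steps and then diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Product (product_name TEXT PRIMARY KEY);
    CREATE TABLE Task    (task_name    TEXT PRIMARY KEY);
    CREATE TABLE ProductTaskMapping (
        product_name    TEXT    NOT NULL REFERENCES Product (product_name),
        task_name       TEXT    NOT NULL REFERENCES Task (task_name),
        sequence_number INTEGER NOT NULL,
        PRIMARY KEY (product_name, sequence_number)
    );
    INSERT INTO Product VALUES ('productX'), ('productY');
    INSERT INTO Task VALUES ('cut'), ('weld'), ('paint'), ('inspect');
    INSERT INTO ProductTaskMapping VALUES
        ('productX', 'cut', 1), ('productX', 'weld', 2),
        ('productX', 'paint', 3), ('productX', 'inspect', 4),
        ('productY', 'cut', 1), ('productY', 'weld', 2),
        ('productY', 'inspect', 3), ('productY', 'paint', 4);
""")


def steps_for(product_name):
    """Return the task names for one product in production order."""
    rows = conn.execute(
        "SELECT task_name FROM ProductTaskMapping "
        "WHERE product_name = ? ORDER BY sequence_number",
        (product_name,),
    )
    return [row[0] for row in rows]


steps_x = steps_for("productX")
steps_y = steps_for("productY")
```

Each task is defined once in Task, and only the mapping rows differ between products, which is how the same task can be step 3 for one product and step 4 for another.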

Database design: ordered set

task_set is a table with two columns (id, task):
id task
1 shout
2 bark
3 walk
4 run
Assume there is another table with two columns (employee, task_order).
task_order is an ordered set of tasks, for example (2,4,3,1).
Generally, the task_order is unchanged, but sometimes a task may be inserted or deleted, e.g. (2,4,9,3,1), (2,4,1).
How do I design such a database? I mean, how do I realize the ordered set?
If, and ONLY if, you don't need to search inside the task_order column or update one of its values (i.e. change 4,2,3 to 4,2,1), keeping that column as a delimited string might be an easy solution.
However, if you ever plan on searches or updates for specific values inside the task_set, then you better normalize that structure into a table that will hold employee id, task id, and task order.
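A minimal sketch of the normalized variant, assuming SQLite through Python. The table name employee_task, the column name position, the employee "alice", task 9 ("jump"), and the helper functions are all hypothetical. It reproduces the question's examples: (2,4,3,1) becomes (2,4,9,3,1) after an insert, and deletions leave the remaining order intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE task_set (id INTEGER PRIMARY KEY, task TEXT NOT NULL);
    CREATE TABLE employee_task (
        employee TEXT    NOT NULL,
        task_id  INTEGER NOT NULL REFERENCES task_set (id),
        position INTEGER NOT NULL,
        PRIMARY KEY (employee, task_id)  -- a set: each task once per employee
    );
    INSERT INTO task_set VALUES
        (1, 'shout'), (2, 'bark'), (3, 'walk'), (4, 'run'), (9, 'jump');
    INSERT INTO employee_task VALUES
        ('alice', 2, 1), ('alice', 4, 2), ('alice', 3, 3), ('alice', 1, 4);
""")


def task_order(employee):
    """Return the employee's task ids as an ordered tuple."""
    rows = conn.execute(
        "SELECT task_id FROM employee_task WHERE employee = ? ORDER BY position",
        (employee,),
    )
    return tuple(row[0] for row in rows)


def insert_task(employee, task_id, position):
    """Insert a task at the given position, shifting later tasks down by one."""
    with conn:
        conn.execute(
            "UPDATE employee_task SET position = position + 1 "
            "WHERE employee = ? AND position >= ?",
            (employee, position),
        )
        conn.execute(
            "INSERT INTO employee_task (employee, task_id, position) VALUES (?, ?, ?)",
            (employee, task_id, position),
        )


def delete_task(employee, task_id):
    """Remove a task; gaps in position do not affect the relative order."""
    with conn:
        conn.execute(
            "DELETE FROM employee_task WHERE employee = ? AND task_id = ?",
            (employee, task_id),
        )


initial = task_order("alice")
insert_task("alice", 9, 3)
after_insert = task_order("alice")
delete_task("alice", 9)
delete_task("alice", 3)
after_delete = task_order("alice")
```

Because each task is its own row, finding every employee who performs task 4 or swapping one task for another becomes an ordinary query or update instead of string parsing.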

Database hierarchy, aggregation, relations in LibreOffice Base

I am working on a homework project where we design a website for a store, and I have been assigned the database. This is my first database attempt. I am using LibreOffice Base for the design, and cannot find any guides on how to make subtypes. For example, for every shirt in the inventory, there'd be a different group of colors it comes in and for every different color a list of individual sizes and how many of each size is in stock. However, I can't find aggregation anywhere in "Table Relations."
So I make a table for shirts with the base information (brand, price, etc), and then a separate table with just 2 columns (size and number of units in stock --- we're letting the possibility of multiple colors wait for now). I then make a form for the shirt with the base information and a subform with 2 columns: size and number available. Both of the forms are tables rather than labeled text boxes. However, the subform for shirt size does not maintain separate information for each row in the main form (ie the one with the base information for the shirts). How the heck do I do this?
Lastly, since this is my first crack at databases, I would not be at all surprised if I'm going at it all wrong, and if so would gladly appreciate a push in the right direction or a webpage explaining how to do this that I didn't find due to not entering the correct search terms.
You need to create linked fields in the master table. The shirt table has a primary key; refer to that in the subordinate table. Alternatively, create a primary key in the subordinate table and refer to it in the master table. Then in the subform --> properties, designate the appropriate link between the master and slave fields. The functionality is described in the LibreOffice Base handbook (p.105)

Using b trees in a database

I have to implement a database using B trees for a school project. The database is for storing audio files (songs), and a number of different queries can be made, like asking for all the songs of a given artist or a specific album.
The intuitive idea is to use one B tree for each field (songs, albums, artists, ...). The problem is that one can be asked to delete any member of any field, and if you delete an artist you have to delete all his albums and songs from the other B trees, keeping in mind that, for example, all the songs of a given artist don't have to be near each other in the B tree that corresponds to songs.
My question is: is there a way to do so (delete the songs after an artist has been deleted) without having to iterate over all elements of the other B trees? I'm not looking for code, just ideas, because all the ones I've come up with are brute-force ones.
This is my understanding and may not be entirely right.
Typically, in a database implementation, B Trees are used for indexes, so unless you want to force your user to index every column, defaulting to creating a B Tree for each field is unnecessary. Although that many indexes will lead to a fast read in virtually every case (with an index on everything, you won't have to do a full table scan), it will also make inserts, updates, and deletes extremely slow, as the corresponding data has to be updated in each tree. As I'm sure you know, modern databases force you to have at least one index (the primary key), so you will have at least one B Tree keyed on the primary key, with a pointer to the appropriate node.
Every node in a B Tree index should have a pointer/reference to the full object it represents.
Any further indexes you create would include the attributes you specify in the index, such as song name, artist, etc., but would still contain the pointer/reference to the corresponding node. Thus, when you modify, let's say, the song title, you will want to modify the referenced node which all the indexes reference. If you have any indexes that have the modified attribute as a key, you will have to modify the values in those indexes as well.
Unfortunately, I believe you are correct that you will have to brute-force your way through the other B Trees when deleting/updating, and this is one of the downsides of using a lot of indexes (slower update/delete time). If you just delete the referenced nodes, you will likely end up with pointers to deleted objects, which will (depending on your language) give you some form of a NullPointerException. In order to prevent this, the references will have to be removed from all the trees.
Keep in mind though that doing a full scan of your indexes will still be much better than doing full table scans.

Resources