Database design - Multiple object types shown hierarchically?

I have been struggling and searching for a solution to this for a couple of days, but I cannot find any "best practices" or good explanations of how to achieve what I want.
Let's say that I have a database consisting of the following tables (just an example):
Customers (Fields: Id, CustomerName, Location)
Products (Fields: Id, ProductName, ProductCode)
Groups (Fields: Id, GroupName)
I then need to link these together to be shown in a Treeview. For example:
Customer1
|
|-Group1
|  |-Product1
|  |-Product2
|
|-Group2
|  |-Product2
|  |-Product3
|
|-Group3
   |-Product1
   |-Product4
As I said, this is just an example. The real solution consists of other types.
Since the products can occur in several places, I need to create a "link table" to display the hierarchical data.
So I created another table looking like this:
Id (int)
ParentId (int)
ObjectType (int)
GroupId (int)
ProductId (int)
CustomerId (int)
The ObjectType field tells me which table I need to check for the item's name etc. to display in the treeview.
My question now: Is there any other way to design this database?
I am developing in C# using LINQ etc.

From your example, each level of the tree should be a new link table.
You do not show whether Group1 is repeated for more than one customer, and if so, whether the contents of Group1 should also be repeated. I assume Group1's contents are the same no matter which customers it is associated with.
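The "one link table per level" idea can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for the real database (the question uses C# and LINQ, so this is illustrative only). The table names customer, product_group, product, customer_group and group_product and the helper customer_tree are hypothetical. Because a group's products are stored once, Group1 shows the same products for every customer it is linked to, matching the assumption above.

```python
import sqlite3

# In-memory database standing in for the real one; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- level 1 -> 2: which groups belong to which customer
CREATE TABLE customer_group (
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    group_id    INTEGER NOT NULL REFERENCES product_group(id),
    PRIMARY KEY (customer_id, group_id)
);
-- level 2 -> 3: which products belong to which group (shared by all customers)
CREATE TABLE group_product (
    group_id   INTEGER NOT NULL REFERENCES product_group(id),
    product_id INTEGER NOT NULL REFERENCES product(id),
    PRIMARY KEY (group_id, product_id)
);
INSERT INTO customer VALUES (1, 'Customer1');
INSERT INTO product_group VALUES (1, 'Group1'), (2, 'Group2'), (3, 'Group3');
INSERT INTO product VALUES (1, 'Product1'), (2, 'Product2'), (3, 'Product3'), (4, 'Product4');
INSERT INTO customer_group VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO group_product VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 4);
""")

def customer_tree(customer_id):
    """Return {group name: [product names]} for one customer, ready to bind to a tree view."""
    rows = conn.execute("""
        SELECT g.name, p.name
        FROM customer_group cg
        JOIN product_group g  ON g.id = cg.group_id
        JOIN group_product gp ON gp.group_id = g.id
        JOIN product p        ON p.id = gp.product_id
        WHERE cg.customer_id = ?
        ORDER BY g.name, p.name
    """, (customer_id,)).fetchall()
    tree = {}
    for group_name, product_name in rows:
        tree.setdefault(group_name, []).append(product_name)
    return tree

print(customer_tree(1))
```

The trade-off: every new object type or level needs its own link table, but every foreign key can be enforced by the database.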
If you can truly link anything to anything, then the ObjectType approach is the way to go, but you would have something like:
parentId
ParentObjectType
childId
childObjectType
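A minimal sketch of that four-column link table, again with Python's sqlite3 as a stand-in engine. The table name node_link, the numeric type codes and the NAME_TABLES mapping are hypothetical; as in the question, the object type tells the code which table holds the display name.

```python
import sqlite3

# Hypothetical type codes; each maps to the table that holds that object's name.
CUSTOMER, GROUP, PRODUCT = 1, 2, 3
NAME_TABLES = {CUSTOMER: "customer", GROUP: "product_group", PRODUCT: "product"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE node_link (
    parent_id   INTEGER NOT NULL,
    parent_type INTEGER NOT NULL,
    child_id    INTEGER NOT NULL,
    child_type  INTEGER NOT NULL,
    PRIMARY KEY (parent_id, parent_type, child_id, child_type)
);
INSERT INTO customer VALUES (1, 'Customer1');
INSERT INTO product_group VALUES (1, 'Group1'), (2, 'Group2');
INSERT INTO product VALUES (1, 'Product1'), (2, 'Product2'), (3, 'Product3');
-- Customer1 -> Group1, Group2; Group1 -> Product1, Product2; Group2 -> Product2, Product3
INSERT INTO node_link VALUES
    (1, 1, 1, 2), (1, 1, 2, 2),
    (1, 2, 1, 3), (1, 2, 2, 3),
    (2, 2, 2, 3), (2, 2, 3, 3);
""")

def name_of(obj_id, obj_type):
    # The table name comes from our own whitelist, never from user input.
    table = NAME_TABLES[obj_type]
    return conn.execute(f"SELECT name FROM {table} WHERE id = ?", (obj_id,)).fetchone()[0]

def build_tree(obj_id, obj_type):
    """Recursively build (name, [children]) starting from one node."""
    children = conn.execute(
        "SELECT child_id, child_type FROM node_link"
        " WHERE parent_id = ? AND parent_type = ? ORDER BY child_type, child_id",
        (obj_id, obj_type),
    ).fetchall()
    return (name_of(obj_id, obj_type), [build_tree(cid, ctype) for cid, ctype in children])

print(build_tree(1, CUSTOMER))
```

The cost of this flexibility is that the database cannot enforce a foreign key on parent_id or child_id, because the table they point to depends on the type column; the application has to keep those links valid.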

Related

Simple database design - some columns have multiple values

Caveat: very new to database design/modeling, so bear with me :)
I'm trying to design a simple database that stores information about images in an archive. Along with file_name (which is one distinct string), I have fields like genre and starring, where each field might contain multiple strings (if an image is associated with multiple genres, and/or if an image has multiple actors in it).
Right now the database is just a single table keyed on file_name, and fields like starring and genre just store multiple comma-separated values. I can query it fine using wildcards and the LIKE and IN operators, but I'm wondering if there's a more elegant way to break out the data so that it is easier to use/query. For instance, I'd like to be able to find how many unique actors are represented in the archive, but I don't think that's possible with the current model.
I realize this is a pretty elementary question about data modeling, but any guidance anyone can provide or reading you can direct me to would be greatly appreciated!
Thanks!
You need to create extra tables to keep the schema normalized. In your situation you need 4 extra tables (genre, image_genres, stars, image_stars) to represent these many-to-many (n:m) relations; 2 extra tables would be enough if the relations were one-to-many (1:n).
Tables:
image(id, file_name)
genre(id, name)
image_genres(image_id, genre_id)
stars(id, name, ...)
image_stars(image_id, star_id)
And some sample data:
image table
id | file_name
-------------------------------------
1  | /users/home/song/empire.png
2  | /users/home/song/promiscuous.png
genre table
id | name
----------
1  | pop
2  | blues
3  | rock
image_genres table
image_id | genre_id
-------------------
1        | 2
1        | 3
2        | 1
stars table
id | name
------------------
1  | Jay-Z
2  | Alicia Keys
3  | Nelly Furtado
4  | Timbaland
image_stars table
image_id | star_id
------------------
1        | 1
1        | 2
2        | 3
2        | 4
To count the unique actors in the database, you can simply run the SQL query below:
SELECT COUNT(name) FROM stars
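To check the design, here is the schema and sample data above loaded into SQLite through Python's sqlite3 module (the engine is an assumption; the question doesn't name one). COUNT(*) on stars counts every stored actor once; COUNT(DISTINCT star_id) on image_stars counts only actors linked to at least one image. The last query lists genres per image without any comma-separated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (id INTEGER PRIMARY KEY, file_name TEXT NOT NULL UNIQUE);
CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE stars (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE image_genres (image_id INTEGER REFERENCES image(id),
                           genre_id INTEGER REFERENCES genre(id),
                           PRIMARY KEY (image_id, genre_id));
CREATE TABLE image_stars  (image_id INTEGER REFERENCES image(id),
                           star_id  INTEGER REFERENCES stars(id),
                           PRIMARY KEY (image_id, star_id));
INSERT INTO image VALUES (1, '/users/home/song/empire.png'), (2, '/users/home/song/promiscuous.png');
INSERT INTO genre VALUES (1, 'pop'), (2, 'blues'), (3, 'rock');
INSERT INTO stars VALUES (1, 'Jay-Z'), (2, 'Alicia Keys'), (3, 'Nelly Furtado'), (4, 'Timbaland');
INSERT INTO image_genres VALUES (1, 2), (1, 3), (2, 1);
INSERT INTO image_stars  VALUES (1, 1), (1, 2), (2, 3), (2, 4);
""")

# Every actor stored in the archive
total_actors = conn.execute("SELECT COUNT(*) FROM stars").fetchone()[0]
# Only actors linked to at least one image
actors_in_images = conn.execute("SELECT COUNT(DISTINCT star_id) FROM image_stars").fetchone()[0]
# Genres of each image, one row per (image, genre) pair
genres_by_image = conn.execute("""
    SELECT i.file_name, g.name
    FROM image i
    JOIN image_genres ig ON ig.image_id = i.id
    JOIN genre g ON g.id = ig.genre_id
    ORDER BY i.id, g.name
""").fetchall()

print(total_actors, actors_in_images)  # 4 4
print(genres_by_image)
```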

Nested FutureBuilder vs nested calls for lazy loading from database

I need to choose the better of two approaches.
I have a Flutter app that uses sqflite to save data; inside the database I have two tables:
Employee:
+-------------+-----------------+------+
| employee_id | employee_name |dep_id|
+-------------+-----------------+------+
| e12 | Ada Lovelace | dep1 |
+-------------+-----------------+------+
| e22 | Albert Einstein | dep2 |
+-------------+-----------------+------+
| e82 | Grace Hopper | dep3 |
+-------------+-----------------+------+
SQL:
CREATE TABLE Employee(
employee_id TEXT NOT NULL PRIMARY KEY,
employee_name TEXT NOT NULL ,
dep_id TEXT,
FOREIGN KEY(dep_id) REFERENCES Department(dep_id)
ON DELETE SET NULL
);
Department:
+--------+-----------+-------+
| dep_id | dep_title |dep_num|
+--------+-----------+-------+
| dep1 | Math | dep1 |
+--------+-----------+-------+
| dep2 | Physics | dep2 |
+--------+-----------+-------+
| dep3 | Computer | dep3 |
+--------+-----------+-------+
SQL:
CREATE TABLE Department(
dep_id TEXT NOT NULL PRIMARY KEY,
dep_title TEXT NOT NULL,
dep_num INTEGER
);
I need to show a ListGrid of the departments that are referenced in the Employee table. I look at the Employee table and fetch the department ids from it, which is easy, but after fetching those dep_id values I need to make a card for each of them, so I need information from the Department table.
The complete information for the ids fetched from the Employee table is in the Department table.
There are thousands of rows in each table.
I have a database helper class to connect to the database.
DbHelper is something like this:
Future<List<String>> getDepartmentIds() async {
  // fetch all dep_id values from the Employee table
}
Future<Department> getDepartment(String id) async {
  // fetch the Department record for a specific id from the Department table
}
Future<List<Department>> getEmployeeDepartments() async {
  // 1. fetch all dep_id values from the Employee table
  // 2. for each id, fetch the Department record from the Department table
  final ids = await getDepartmentIds();
  final deps = <Department>[];
  // A plain for loop awaits each lookup; forEach with an async callback
  // would return before any of the lookups had finished.
  for (final id in ids) {
    deps.add(await getDepartment(id));
  }
  return deps;
}
There are two approaches:
First approach:
Define a function in DbHelper that returns all dep_id values from the Employee table (getDepartmentIds) and another function that returns a Department object (model) for a specific id (getDepartment).
Then I need two nested FutureBuilders: one for fetching the ids and the other for fetching each Department model.
Second approach:
Define a single function that first fetches the ids and then, inside that function, maps each id to a Department model (getEmployeeDepartments).
Then I need only one FutureBuilder.
Which one is better?
Should I let the FutureBuilders handle it, or should I put the burden on DbHelper?
If I use the first approach, then (as far as I can imagine!) I have to put the second future call (the one that fetches the Department model by id, getDepartment) in the build function, which is not recommended.
The problem with the second approach is that it makes a lot of nested calls in DbHelper.
I used ListView.builder for performance.
I tested both with some data but couldn't figure out which one is better. I guess it depends on both Flutter and SQLite (sqflite).
Which one is better, or is there a better approach?
Given that there isn't much code in this example, I'll give a high-level answer to your questions.
Evaluate Approach One
Right off the bat, this part sticks out: "returns all dep_id from Employee table"
I would scratch that, since "return all" is rarely a good solution, especially since you mention your tables have a lot of rows.
Evaluate Approach Two
I'm not sure how its performance differs from the first approach; it seems bad for the same reasons. I think this one just changes your UI logic a bit.
Typical 'Endless' List Approach
1. Query the Employee table with a join to the Department table.
2. Implement pagination in your UI and pass its values into the query from step 1.
At a basic level you'll need these variables: Take, Skip, HasMore.
Take: the number of items to request in each query.
Skip: the number of items to skip on the next query; this is the number of items already in the in-memory list driving your UI.
HasMore: set from the response of each query, so the UI knows whether there are more items.
As you scroll down the list and reach the bottom, request more items.
Initially, issue a query such as Take: 10, Skip: 0.
When you hit the bottom of the list, issue the next one: Take: 10, Skip: 10.
etc.
Example SQL query (SQLite syntax, as used by sqflite; SQLite uses LIMIT/OFFSET rather than OFFSET ... FETCH NEXT):
SELECT *
FROM Employee E
JOIN Department D ON D.dep_id = E.dep_id
ORDER BY E.employee_name
LIMIT {TAKE#} OFFSET {SKIP#};
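The Take/Skip/HasMore loop can be sketched like this, using Python's sqlite3 as a stand-in for sqflite (both run SQLite, but the Dart API differs). The helper fetch_page is hypothetical; it asks for one extra row so HasMore can be computed without a separate COUNT query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (dep_id TEXT PRIMARY KEY, dep_title TEXT NOT NULL);
CREATE TABLE Employee (employee_id TEXT PRIMARY KEY, employee_name TEXT NOT NULL,
                       dep_id TEXT REFERENCES Department(dep_id));
INSERT INTO Department VALUES ('dep1', 'Math'), ('dep2', 'Physics'), ('dep3', 'Computer');
INSERT INTO Employee VALUES ('e12', 'Ada Lovelace', 'dep1'),
                            ('e22', 'Albert Einstein', 'dep2'),
                            ('e82', 'Grace Hopper', 'dep3');
""")

def fetch_page(take, skip):
    """Return (rows, has_more) for one page of employees joined to their department."""
    rows = conn.execute("""
        SELECT E.employee_name, D.dep_title
        FROM Employee E
        JOIN Department D ON D.dep_id = E.dep_id
        ORDER BY E.employee_name
        LIMIT ? OFFSET ?
    """, (take + 1, skip)).fetchall()  # one extra row reveals whether more exist
    return rows[:take], len(rows) > take

items = []
has_more = True
while has_more:  # in the app, one iteration per "reached the bottom" event
    page, has_more = fetch_page(take=2, skip=len(items))
    items.extend(page)
print(items)
```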
Hopefully this helps.
I'm not entirely sure what you're trying to do in terms of code.
As far as I can tell, what you're looking to do is get a list of employees with relevant info including department.
If that's the case, then it's tailor-made for INNER JOIN. Something like this:
SELECT Employee.*, Department.dep_id, Department.dep_title
FROM Employee INNER JOIN Department
ON Employee.dep_id = Department.dep_id;
(although you may want to double-check that; my SQL is a bit rusty).
This would do what you need in one step. However, there is still the issue of what you're asking which seems to be "Is it more efficient to do many small requests or one big one, and what are the performance ramifications".
The answer to that is somewhat specific to Flutter. When you make a request with sqflite, it processes whatever you've passed to it and sends it to the Java/Objective-C side, which may do more processing and pushes the work to a background thread. That thread calls into the SQLite library, which does more processing to understand the request and then actually reads the data on disk to perform the operation. The result goes back to the Java/Objective-C layer, which pushes the response to the UI thread, which in turn responds back to Dart.
If that doesn't sound particularly efficient, that's because it isn't =D. If you're doing this a few times (or even a few hundred) it's probably fine, but if you're getting into the thousands, as you say, it might start slowing down.
The alternative you've proposed is to do one large request. You will know better than I whether that is wise; if it's a couple thousand rows but only ever a couple thousand, and the data you're returning is always going to be relatively small (i.e. just a 10-20 character name and department name), then you'll have, say, (20+20)*2000 = 80,000 bytes, or about 80 KB of data. Even if you assume the overhead will double that size, 160 KB of data shouldn't be enough to faze any relatively recent smartphone (after all, that's much smaller than any single photo!).
Now, using some domain-specific knowledge, you could optimize this. For example, if you know the number of departments is much smaller than the number of employees (e.g. fewer than 100), you could skip the joins entirely: request all departments before this begins, put them in a map (dep_id => dep_title), and once you've requested the employees, do the lookup from dep_id to dep_title yourself. That way your requests wouldn't have to include the dep_title over and over again.
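A minimal sketch of the lookup-map idea, assuming the department list is small enough to load once. Python's sqlite3 stands in for sqflite, and the name dep_titles is hypothetical. Employees are fetched without a join and each title is resolved in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (dep_id TEXT PRIMARY KEY, dep_title TEXT NOT NULL);
CREATE TABLE Employee (employee_id TEXT PRIMARY KEY, employee_name TEXT NOT NULL,
                       dep_id TEXT REFERENCES Department(dep_id));
INSERT INTO Department VALUES ('dep1', 'Math'), ('dep2', 'Physics'), ('dep3', 'Computer');
INSERT INTO Employee VALUES ('e12', 'Ada Lovelace', 'dep1'),
                            ('e22', 'Albert Einstein', 'dep2'),
                            ('e82', 'Grace Hopper', 'dep3');
""")

# Load every department once: dep_id -> dep_title
dep_titles = dict(conn.execute("SELECT dep_id, dep_title FROM Department"))

# Fetch employees without a join, then resolve titles from the map.
employees = [
    (name, dep_titles.get(dep_id))  # .get() yields None if dep_id is NULL or unknown
    for name, dep_id in conn.execute(
        "SELECT employee_name, dep_id FROM Employee ORDER BY employee_name")
]
print(employees)
```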
That being said, you may want to consider paging the employee lookup whether or not you use a join. You'd do this by requesting 100 employees (or whatever number) at a time rather than the entire batch - that way you don't have the overhead of 1000+ calls through the stack, but you also don't have a large block of data all in memory all at once.
SELECT * FROM Employee
WHERE employee_name > LastValue
ORDER BY employee_name
LIMIT 100;
Unfortunately, that doesn't fit as well with how Flutter does lists, so you'd probably need something like an 'EmployeeDatabaseManager' that does the actual requests, with your list calling into it to get the data. That's probably beyond the scope of this question, though.
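The "seek" style paging query above can be sketched end to end with Python's sqlite3 (standing in for sqflite). One caveat beyond the original answer: if two employees share a name, comparing on employee_name alone can skip or repeat rows, so this sketch adds employee_id as a tie-breaker. The helper next_page and the second 'Grace Hopper' row (added only to show the tie) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (employee_id TEXT PRIMARY KEY, employee_name TEXT NOT NULL, dep_id TEXT);
INSERT INTO Employee VALUES ('e12', 'Ada Lovelace', 'dep1'),
                            ('e22', 'Albert Einstein', 'dep2'),
                            ('e82', 'Grace Hopper', 'dep3'),
                            ('e90', 'Grace Hopper', 'dep1');  -- duplicate name, for illustration
""")

def next_page(last=None, size=2):
    """Return the page after `last` (a (name, id) pair from the previous page), or the first page."""
    if last is None:
        return conn.execute(
            "SELECT employee_name, employee_id FROM Employee"
            " ORDER BY employee_name, employee_id LIMIT ?", (size,)).fetchall()
    name, emp_id = last
    # Strictly after the last row seen, using employee_id to break ties between equal names.
    return conn.execute(
        "SELECT employee_name, employee_id FROM Employee"
        " WHERE employee_name > ? OR (employee_name = ? AND employee_id > ?)"
        " ORDER BY employee_name, employee_id LIMIT ?",
        (name, name, emp_id, size)).fetchall()

pages = []
page = next_page()
while page:
    pages.append(page)
    page = next_page(last=page[-1])
print(pages)
```

Unlike OFFSET, each query here starts directly after the last row seen, so its cost does not grow with how far the user has scrolled (given an index on employee_name, employee_id).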

Designing a schedule in a sports database

I will try to be as specific as possible, but I am having trouble conceptualizing the problem. As a hobby, I am trying to design an NFL database that takes raw statistics and stores them for future evaluation for fantasy league analysis. One of the primary things I want to see is whether certain players/teams perform well against specific teams and which defenses are susceptible to either the pass or the run. The issue I am having is trying to design the schedule/event table. My current model is as follows.
TEAMS
TeamID, Team
SCHEDULE
ScheduleID, TeamID, OpponentID, Season, Week, Home_Away, PointsFor, PointsAgainst
In this scenario I will be duplicating every game, but when I use an event table with TeamAway and TeamHome columns, I find my queries impossible to write, since I have to query both AwayTeam and HomeTeam to find the games for a specific team.
In general, though, I cannot get a query to work where a table has two relationships back to the same table; even with the schedule table my query does not work.
I have also considered dropping the team table and just storing NE, PIT, etc. for the Team and Opponent fields so I do not have to deal with the cross-relationships back to the team table.
How can I design this so I am not running queries for TeamID = OpponentID AND TeamID?
I am doing this in MS Access.
Edit
The issue I am having is when I query two tables, Team (TeamID, Team) and Event (TeamHomeID, TeamAwayID), with relationships built between TeamID - TeamHomeID and TeamID - TeamAwayID: I had issues building the query in MS Access.
The SQL would look something like:
SELECT Teams.ID, Teams.Team, Event.HomeTeam
FROM Teams INNER JOIN (Event INNER JOIN Result ON Event.ID = Result.EventID)
ON (Teams.ID = Result.LosingTeamID) AND (Teams.ID = Result.WinningTeamID)
AND (Teams.Team = Event.AwayTeam) AND (Teams.Team = Event.HomeTeam);
It was looking for teams that had IDs of both the losing team and the winning team (which does not exist).
I think I might have fixed this problem. I didn't realize that the relationships in the database design are only defaults, and that within the query builder I can change the joins a particular query is built on. I discovered this by deleting all the AND portions of the returned SQL statement, and was then able to return the names of all winning teams.
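The general fix for "two relationships back to one table" is to join the Teams table twice under two different aliases, one per relationship, instead of ANDing both conditions onto a single copy. A minimal sketch using Python's sqlite3 rather than MS Access (in the Access query designer the same effect comes from adding the Teams table to the query twice), with the Event(TeamHomeID, TeamAwayID) layout from the edit and hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teams (ID INTEGER PRIMARY KEY, Team TEXT NOT NULL);
CREATE TABLE Event (ID INTEGER PRIMARY KEY,
                    TeamHomeID INTEGER NOT NULL REFERENCES Teams(ID),
                    TeamAwayID INTEGER NOT NULL REFERENCES Teams(ID));
INSERT INTO Teams VALUES (1, 'NE'), (2, 'PIT'), (3, 'GB');
INSERT INTO Event VALUES (1, 1, 2), (2, 3, 1), (3, 2, 3);
""")

# Teams is joined twice: alias h for the home side, alias a for the away side.
games = conn.execute("""
    SELECT e.ID, h.Team AS Home, a.Team AS Away
    FROM Event e
    JOIN Teams h ON h.ID = e.TeamHomeID
    JOIN Teams a ON a.ID = e.TeamAwayID
    ORDER BY e.ID
""").fetchall()

# All games for one team, whichever side it played on: OR the two columns.
ne_games = conn.execute(
    "SELECT ID FROM Event WHERE TeamHomeID = ? OR TeamAwayID = ? ORDER BY ID", (1, 1)
).fetchall()

print(games)     # [(1, 'NE', 'PIT'), (2, 'GB', 'NE'), (3, 'PIT', 'GB')]
print(ne_games)  # [(1,), (2,)]
```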
This is an interesting concept - and good practice.
First off - it sounds like you need to narrow down exactly what kind of data you want so you know what to store. I mean, hell, what about storing the weather conditions?
I would keep Team, but I would also add City (because Teams could switch cities).
I would keep Games (Schedule) with columns GameID, HomeTeamID, AwayTeamID, ScheduleDate.
I would have another table Results with columns ResultID, GameID, WinningTeamID, LosingTeamID, Draw (Y/N).
Data could look like this:
TeamID | TeamName | City
------------------------
1 | PATS | NE
------------------------
2 | PACKERS | GB
GameID | HomeTeamID | AwayTeamID | ScheduleDate | Preseason
-----------------------------------------------------------
1 | 1 | 2 | 1/1/2016 | N
ResultID | GameID | WinningTeamID | LosingTeamID | Draw
------------------------------------------------------------
1 | 1 | 1 | 2 | N
Given that, you could easily get the W/L/D for any scheduled game and date, and you could easily sum a team's wins, its wins at home or away, during the preseason or regular season, its wins against a particular team, etc.
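For example, total wins and home wins per team can be computed from the Teams/Games/Results layout above as sketched below. Python's sqlite3 stands in for the real engine, only the sample rows from the answer are loaded, and the date is stored as ISO text (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teams (TeamID INTEGER PRIMARY KEY, TeamName TEXT NOT NULL, City TEXT);
CREATE TABLE Games (GameID INTEGER PRIMARY KEY,
                    HomeTeamID INTEGER NOT NULL REFERENCES Teams(TeamID),
                    AwayTeamID INTEGER NOT NULL REFERENCES Teams(TeamID),
                    ScheduleDate TEXT, Preseason TEXT);
CREATE TABLE Results (ResultID INTEGER PRIMARY KEY,
                      GameID INTEGER NOT NULL REFERENCES Games(GameID),
                      WinningTeamID INTEGER REFERENCES Teams(TeamID),
                      LosingTeamID INTEGER REFERENCES Teams(TeamID),
                      Draw TEXT);
INSERT INTO Teams VALUES (1, 'PATS', 'NE'), (2, 'PACKERS', 'GB');
INSERT INTO Games VALUES (1, 1, 2, '2016-01-01', 'N');
INSERT INTO Results VALUES (1, 1, 1, 2, 'N');
""")

# Wins per team, split into total and home wins; LEFT JOINs keep winless teams at 0.
standings = conn.execute("""
    SELECT t.TeamName,
           COUNT(r.ResultID) AS Wins,
           COALESCE(SUM(g.HomeTeamID = t.TeamID), 0) AS HomeWins
    FROM Teams t
    LEFT JOIN Results r ON r.WinningTeamID = t.TeamID
    LEFT JOIN Games g   ON g.GameID = r.GameID
    GROUP BY t.TeamID, t.TeamName
    ORDER BY Wins DESC, t.TeamName
""").fetchall()
print(standings)  # [('PATS', 1, 1), ('PACKERS', 0, 0)]
```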
I guess if you wanted to get really technical you could even create a Season table that stores SeasonID, StartDate, EndDate. This would just make sure you were 100% able to tell what games were played in which season (between what dates) and from there you could infer weather statistics, whether or not a player was out during that time frame, etc.

Database, table inheritance?

This is about database structure. (inheritance)
Say you have Place, and Restaurant and Cafe are two subtypes of Place.
1. You can create a Place table to hold the info common to the subtypes,
and create a foreign key to connect it to a Restaurant or Cafe instance.
or
2. You can duplicate the common fields in both Restaurant and Cafe.
I'm coming from a Django background, and many seem to prefer #2 over #1.
Is there a compelling scenario where you should pick one over another?
One scenario I think I need the #1 is when you are going to sort all Places collectively. (Can we use #2 for this?)
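Option #1 can be sketched as a shared place table plus one table per subtype whose primary key is also a foreign key to place; sorting all places together is then a plain ORDER BY on one table. Python's sqlite3 stands in for the real database, the names and likes reuse the sample rows from the answer below, and the kind, cuisine and has_wifi columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Common columns live in place; subtype tables add only their own columns.
CREATE TABLE place (id INTEGER PRIMARY KEY, kind TEXT NOT NULL,
                    name TEXT NOT NULL, likes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE restaurant (place_id INTEGER PRIMARY KEY REFERENCES place(id), cuisine TEXT);
CREATE TABLE cafe       (place_id INTEGER PRIMARY KEY REFERENCES place(id), has_wifi INTEGER);
INSERT INTO place VALUES (1, 'restaurant', 'Steakhouse', 5),
                         (2, 'restaurant', 'Italian Food', 3),
                         (3, 'cafe', 'Starbucks', 0);
INSERT INTO restaurant VALUES (1, 'steak'), (2, 'italian');
INSERT INTO cafe VALUES (3, 1);
""")

# Sorting every place collectively needs only the parent table.
all_places = conn.execute("SELECT name, kind FROM place ORDER BY likes DESC").fetchall()

# Subtype details come from a join when they are needed.
restaurants = conn.execute("""
    SELECT p.name, r.cuisine
    FROM place p JOIN restaurant r ON r.place_id = p.id
    ORDER BY p.likes DESC
""").fetchall()
print(all_places)   # [('Steakhouse', 'restaurant'), ('Italian Food', 'restaurant'), ('Starbucks', 'cafe')]
print(restaurants)  # [('Steakhouse', 'steak'), ('Italian Food', 'italian')]
```

For reference, Django's multi-table inheritance produces this layout (#1), while an abstract base class produces #2.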
I think I would go for #2, as you don't have to think about relations and foreign keys and the model itself is complete, so you could just copy the database and use it for something else.
Furthermore, you only have to query one table instead of two.
If you need to sort Restaurant and Cafe, you can use the SQL UNION Operator.
Let's assume you have these two simple tables:
restaurant
id | name | likes
-------------------------
1 | Steakhouse | 5
2 | Italian Food | 3
cafe
id | name | likes
--------------------------
1 | Starbucks | 0
You can query them using the UNION operator like this:
SELECT * FROM cafe
UNION
SELECT * FROM restaurant
ORDER BY likes DESC
Which will return a list of cafes and restaurants ordered by likes as if they are coming from the same table.
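A variation worth considering (an addition, not part of the original answer): UNION removes duplicate rows, so identical rows from the two tables would collapse into one, and the result no longer shows which table a row came from. UNION ALL plus a literal kind column avoids both, and naming the columns explicitly guards against the two tables' column order drifting apart. Sketched with Python's sqlite3 and the sample tables above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT, likes INTEGER);
CREATE TABLE cafe       (id INTEGER PRIMARY KEY, name TEXT, likes INTEGER);
INSERT INTO restaurant VALUES (1, 'Steakhouse', 5), (2, 'Italian Food', 3);
INSERT INTO cafe VALUES (1, 'Starbucks', 0);
""")

# UNION ALL keeps every row; the literal first column records its source table.
places = conn.execute("""
    SELECT 'cafe' AS kind, id, name, likes FROM cafe
    UNION ALL
    SELECT 'restaurant', id, name, likes FROM restaurant
    ORDER BY likes DESC
""").fetchall()
print(places)
# [('restaurant', 1, 'Steakhouse', 5), ('restaurant', 2, 'Italian Food', 3), ('cafe', 1, 'Starbucks', 0)]
```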

How do I create nested categories in a Database?

I am making a video website where categories will be nested, e.g.:
Programming -> C Language -> MIT Videos -> Video 1
Programming -> C Language -> Stanford Video -> Video 1
Programming -> Python -> Video 1
These categories and sub-categories will be created by users on the fly. I will need to show them as people create them in the form of a navigable menu, so that people can browse the collection easily.
Could someone please help me with how I can go about creating such a database?
Make a categories table with the following fields:
CategoryID - Integer
CategoryName - String/Varchar/Whatever
ParentID - Integer
Your ParentID will then reference back to the CategoryID of its parent.
Example:
CategoryID CategoryName ParentID
---------------------------------
1 Dog NULL
2 Cat NULL
3 Poodle 1
4 Dachshund 1
5 Persian 2
6 Toy Poodle 3
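With this parent-child layout, a whole branch can be read in one query using a recursive common table expression (an addition to the answer; recursive CTEs are available in SQLite 3.8.3+, PostgreSQL and MySQL 8.0+, among others, with small syntax differences). Sketched with Python's sqlite3 and the example rows above; the helper branch is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    CategoryID   INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL,
    ParentID     INTEGER REFERENCES categories(CategoryID)  -- NULL for top-level categories
);
INSERT INTO categories VALUES
    (1, 'Dog', NULL), (2, 'Cat', NULL), (3, 'Poodle', 1),
    (4, 'Dachshund', 1), (5, 'Persian', 2), (6, 'Toy Poodle', 3);
""")

def branch(root_id):
    """Return (name, depth) for a category and all of its descendants."""
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth, path) AS (
            SELECT CategoryID, CategoryName, 0, CategoryName
            FROM categories WHERE CategoryID = ?
            UNION ALL
            SELECT c.CategoryID, c.CategoryName, t.depth + 1, t.path || '/' || c.CategoryName
            FROM categories c JOIN tree t ON c.ParentID = t.id
        )
        -- Ordering by the path string is a simplification that gives a depth-first menu order here.
        SELECT name, depth FROM tree ORDER BY path
    """, (root_id,)).fetchall()

print(branch(1))  # [('Dog', 0), ('Dachshund', 1), ('Poodle', 1), ('Toy Poodle', 2)]
```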
Quassnoi said:
You should use either nested sets or parent-child models.
I have implemented both of them. What I can say is:
Use the nested set architecture if your categories table doesn't change often: selects are fast, and a single query gets the whole branch of the hierarchy for a given entry. But inserts and updates take more time than in a parent-child model, because the left and right (or lower and upper in the example below) fields have to be updated.
Another point, quite trivial I must admit, but:
It's very difficult to change the hierarchy by hand directly in the database (which could happen during development). So be sure to first implement an interface for working with the nested set (changing a node's parent, moving a branch, deleting a node or a whole branch, etc.).
Here are two articles on the subject:
Storing Hierarchical Data in a Database
Managing Hierarchical Data in MySQL
Last thing, I didn't try it, but I read somewhere that you can have more than one tree in a nested set table, I mean several roots.
You should use either nested sets or parent-child models.
Parent-child:
typeid parent name
1 0 Buyers
2 0 Sellers
3 0 Referee
4 1 Electrical
5 1 Mechanic
SELECT *
FROM mytable
WHERE group IN
(
SELECT typeid
FROM group_types
START WITH
typeid = 1
CONNECT BY
parent = PRIOR typeid
)
will select all buyers in Oracle.
Nested sets:
typeid lower upper Name
1 1 2 Buyers
2 3 3 Sellers
3 4 4 Referee
4 1 1 Electrical
5 2 2 Mechanic
SELECT *
FROM group_types
JOIN mytable
ON group BETWEEN lower AND upper
WHERE typeid = 1
will select all buyers in any database.
See this answer for more detail.
Nested sets are easier to query, but harder to update and harder to build a tree structure from.
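For comparison, a sketch of the more common nested-set layout, where each node stores lft/rgt bounds and a node's descendants are exactly the rows strictly inside its bounds. This lft/rgt numbering is the textbook variant, not the lower/upper layout above; Python's sqlite3 is used, the node names reuse the Buyers/Electrical/Mechanic example, and the helpers descendants and insert_leaf are hypothetical. insert_leaf shows why updates are the expensive part: every bound to the right of the insertion point shifts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_types (typeid INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
-- Buyers (1..6) contains Electrical (2..3) and Mechanic (4..5);
-- Sellers and Referee are separate roots in the same table.
INSERT INTO group_types VALUES
    (1, 'Buyers', 1, 6), (4, 'Electrical', 2, 3), (5, 'Mechanic', 4, 5),
    (2, 'Sellers', 7, 8), (3, 'Referee', 9, 10);
""")

def descendants(typeid):
    """All nodes strictly inside the given node's bounds: one non-recursive query."""
    return [name for (name,) in conn.execute("""
        SELECT child.name
        FROM group_types parent
        JOIN group_types child ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.typeid = ?
        ORDER BY child.lft
    """, (typeid,))]

def insert_leaf(new_id, name, parent_id):
    """Insert a new last child: every bound at or after the parent's rgt shifts by 2."""
    (prgt,) = conn.execute("SELECT rgt FROM group_types WHERE typeid = ?", (parent_id,)).fetchone()
    conn.execute("UPDATE group_types SET rgt = rgt + 2 WHERE rgt >= ?", (prgt,))
    conn.execute("UPDATE group_types SET lft = lft + 2 WHERE lft > ?", (prgt,))
    conn.execute("INSERT INTO group_types VALUES (?, ?, ?, ?)", (new_id, name, prgt, prgt + 1))

print(descendants(1))  # ['Electrical', 'Mechanic']
```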
From the example in your question, it looks like you'd want a given category to be able to have multiple parents (e.g., "Video 1" under "MIT Videos" as well as under "Stanford Video"), in which case simply adding a ParentID column would not be sufficient.
I would recommend creating two tables: a simple Categories table with CategoryID and CategoryName columns, and a separate CategoryRelationships table with ParentCategoryID and ChildCategoryID columns. This way you can specify as many parent-child relationships as you want for any particular category. It would even be possible using this model to have a dual relationship where two categories are each other's parent and child simultaneously. (Off the top of my head, I can't think of a great use for this scenario, but at least it illustrates how flexible the model is.)
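A sketch of that two-table design using Python's sqlite3. The Categories and CategoryRelationships names follow the answer; the rows mirror the question's paths, treating "Video 1" as one category with three parents, as the answer does. The index name ix_child and the helpers parents and children are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT NOT NULL);
CREATE TABLE CategoryRelationships (
    ParentCategoryID INTEGER NOT NULL REFERENCES Categories(CategoryID),
    ChildCategoryID  INTEGER NOT NULL REFERENCES Categories(CategoryID),
    PRIMARY KEY (ParentCategoryID, ChildCategoryID)
);
-- Index for walking upward (child -> parents); the primary key covers parent -> children.
CREATE INDEX ix_child ON CategoryRelationships (ChildCategoryID);
INSERT INTO Categories VALUES (1, 'Programming'), (2, 'C Language'), (3, 'MIT Videos'),
                              (4, 'Stanford Video'), (5, 'Python'), (6, 'Video 1');
INSERT INTO CategoryRelationships VALUES
    (1, 2), (2, 3), (2, 4), (1, 5),
    (3, 6), (4, 6), (5, 6);   -- 'Video 1' has three parents
""")

def parents(category_id):
    return [name for (name,) in conn.execute("""
        SELECT c.CategoryName FROM CategoryRelationships r
        JOIN Categories c ON c.CategoryID = r.ParentCategoryID
        WHERE r.ChildCategoryID = ? ORDER BY c.CategoryName
    """, (category_id,))]

def children(category_id):
    return [name for (name,) in conn.execute("""
        SELECT c.CategoryName FROM CategoryRelationships r
        JOIN Categories c ON c.CategoryID = r.ChildCategoryID
        WHERE r.ParentCategoryID = ? ORDER BY c.CategoryName
    """, (category_id,))]

print(parents(6))   # ['MIT Videos', 'Python', 'Stanford Video']
print(children(2))  # ['MIT Videos', 'Stanford Video']
```

Nothing in this schema prevents cycles, so if they are not wanted, the application has to check for them before inserting a relationship.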
What you need is a basic parent-child relationship:
Category (ID: int, ParentID: nullable int, Name: nvarchar(1000))
A better way to store the parent_id of the table is to encode it within the ID itself,
e.g.:
100000 Programming
110000 C Language
111000 Video 1 Programming
111100 C Language
111110 Stanford Video
etc. All you need is a script that processes the ID so that the first digit represents the top-level category, the second digit the next level, and so on as you go deeper down the hierarchy.
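Such a script could look like the sketch below, assuming fixed six-digit IDs where each digit is one level and 0 marks an unused level (so at most six levels and nine children per parent). The functions parent_id and ancestors are hypothetical.

```python
def parent_id(category_id, width=6):
    """Parent of a digit-encoded category ID, or None for a top-level category.

    Assumes each digit is one level and trailing zeros mark unused levels,
    e.g. 111000 -> 110000 -> 100000 -> None.
    """
    digits = str(category_id).zfill(width)
    depth = len(digits.rstrip("0"))   # number of levels actually used
    if depth <= 1:
        return None                   # top-level (or empty) ID has no parent
    return int(digits[:depth - 1] + "0" * (width - depth + 1))

def ancestors(category_id, width=6):
    """Chain of ancestor IDs from the immediate parent up to the top level."""
    chain = []
    current = parent_id(category_id, width)
    while current is not None:
        chain.append(current)
        current = parent_id(current, width)
    return chain

print(ancestors(111110))  # [111100, 111000, 110000, 100000]
```

The catch with this scheme is that moving a category to a different parent means renumbering it and every category below it, and any other table referencing those IDs.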
