Many to Many Database Relationship Design - to enable Word Clouds - database

I'm relatively new to database design and struggling to introduce a many-to-many relationship in a SSAS Tabular model.
I have some 'WordGroup' performance data in one table, like so;
WordGroup | IndexedVolume
Dining | 1,000
Sports | 2,000
Movies | 1,600
... and so on
Then I have 'Words' contained within these 'WordGroups' sitting in another category table, like so;
WordGroup | Word
Dining | Restaurant
Dining | Food
Dining | Dinner
Sports | Football
Sports | Basketball
... and so on
I can't see Performance data (IndexedVolume) by 'Word' detail - only by the 'WordGroup' that it is contained within. For example above, I can't look at 'Football' IndexedVolume on it's own, I can only choose the 'Sports' WordGroup that contains Football.
However, when analysing by 'WordGroup' I would still like users to understand what 'Words' are included (ideally in a Word Cloud Visualisation). Therefore, I wanted to develop a relationship between these two tables, so when someone chooses a Word Group (or multiple) we can return the Words that are contained within the Word Group(s) - i.e. below.
User selects Dining WordGroup
<<<Word Cloud or Flat Table would show Words below>>>
Restaurant
Food
Dinner
I looked at Concatenate / Strings etc, but was deterred as the detail here is much more complex and each WordGroup may contain 10+ Words, with translations.
Any advice would be greatly appreciated!

If analizing by WordGroup is an obligatory requirement, you sholud use these tables:
The many-to-many aplies beacuse your words may be conected to one or more groups, e.g. tree is conected to enviroment, forest, etc.
and obviously one word_group is conected to many words.
To see performance data by Word use :
select w.idword , w.name, sum(wg.index_volume)
from word w
left join word_group_has_word wgw
on w.idword=wgw.word_id
left join word_group wg
on wg.idword_group=wgw.word_group_id
group by w.idword
So you will see the sum of all the index volume of all the group_words conected to the words. ANd if you wanna see the words conected to the word groups use:
select distinct w.idword , w.name
from word w
left join word_group_has_word wgw
on w.idword=wgw.word_id
where wgw.word_group_id in [listWordGroupsId]

Related

Nested FutureBuilder vs nested calls for lazy loading from database

I need to choose best approach between two approaches that I can follow.
I have a Flutter app that use sqflite to save data, inside the database I have two tables:
Employee:
+-------------+-----------------+------+
| employee_id | employee_name |dep_id|
+-------------+-----------------+------+
| e12 | Ada Lovelace | dep1 |
+-------------+-----------------+------+
| e22 | Albert Einstein | dep2 |
+-------------+-----------------+------+
| e82 | Grace Hopper | dep3 |
+-------------+-----------------+------+
SQL:
CREATE TABLE Employee(
employee_id TEXT NOT NULL PRIMARY KEY,
employee_name TEXT NOT NULL ,
dep_id TEXT,
FOREIGN KEY(dep_id) REFERENCES Department(dep_id)
ON DELETE SET NULL
);
Department:
+--------+-----------+-------+
| dep_id | dep_title |dep_num|
+--------+-----------+-------+
| dep1 | Math | dep1 |
+--------+-----------+-------+
| dep2 | Physics | dep2 |
+--------+-----------+-------+
| dep3 | Computer | dep3 |
+--------+-----------+-------+
SQL:
CREATE TABLE Department(
dep_id TEXT NOT NULL PRIMARY KEY,
dep_title TEXT NOT NULL ,
dep_num INTEGER,
);
I need to show a ListGrid of departments that are stored in the Employee table. I should look at Employee table and fetch department id from it, This is easy but after fetching that dep_id I need to make a card from those ids so I need information from Department table.
complete inforamtion for thoses id I had fetched from Emplyee table is inside Department table.
There are thousands of rows in each table.
I have a database helper class to connect to the database :
DbHelper is something like this:
Future<List<String>> getDepartmentIds() async{
'fetch all dep_id from Employee table'
}
Future<Department> getDepartment(String id) async{
'fetch Department from Department table for a specific id'
}
Future<List<Department>> getEmployeeDepartments() async{
'''1.fetch all dep_id from Employee table
2.for each id fetch Department records from Department table'''
var ids = await getDepartmentIds();
List<Departments> deps=[];
ids.forEach((map) async {
deps.add(await getDepartment(map['dep_id']));
});
}
There is two approaches:
First One:
Define a function in dbhelper that returns all dep_id from Employee table(getDepartmentIds and another function that returns a department object(model) for that specific id.(getDepartment)
Now I need two FutureBuilder inside each other, one for fetching ids and the other one for fetching department model.
second One:
Define a function that first fetch ids then inside that function each id is maped to department model.(getEmployeeDepartments)
So I need one FutureBuilder .
Which one is better??
should I let FutureBuilders handle it or I should put pressure on dbHelper to habdle it?
If I use the first approach then I have to(as far as I can imagine!) put the the second future call(the one that fetch Department Object(model) based on it's id(getDepartment)) on build function and it's recommended no to do so.
And the problem with second one is that it does a lot of nested call in dbHelper.
I used ListView.builder for performance.
I checked both with some data but couldn't figure out which one is better. I guess it depends both on flutter and sqlite(sqflite).
which one is better or is there any better approach?
Given that I don't see too much code on this example, I'll do a high-level answer on your questions.
Evaluate Approach One
Right off the bat this part sticks out: "returns all dep_id from Employee table"
I would say scratch that, since "return all" is typically never a good solution, especially since you mention your tables have a lot of rows.
Evaluate Approach Two
I'm not sure what the difference in performance this has compared to the first approach, seems also bad for the same reasons. I think this one just changes your UI logic a big is all.
Typical 'Endless' List Approach
You would do a query on the Employees table with a join to the Departments table.
You would implement Pagination on your UI and pass in your values to the query from step one.
At a basic level you'll need these variables: Take, Skip, HasMore
Take: The count # of items to request each query
Skip: The count # of items to skip on the next query, this will be the size of the number of items you currently have in your List in memory driving your UI.
HasMore: You can set this on the response of each query, to let the UI know if there are still more items or not.
As you scroll down the list, when you get to the bottom, you will request more items .
Initially issue a query for example: Take: 10, Skip: 0
Next query when you hit the bottom of the UI: Take: 10, Skip: 10
etc..
Example sql query:
SELECT *
FROM Employees E
JOIN Departments D on D.id = E.dept_id
order by E.employee_name
offset {SKIP#} rows
FETCH NEXT {TAKE#} rows only
Hopefully, this helps, I'm not fully sure what you're trying to do actually - in terms of Code.
As far as I can tell, what you're looking to do is get a list of employees with relevant info including department.
If that's the case, then it's tailor made for INNER JOIN. Something like this:
SELECT Employee.*, Department.dep_id, Department.dep_title
FROM Employee INNER JOIN Department
ON Employee.dep_id = Department.dep_id;
(although you may want to double check that, my SQL is a bit rusty).
This would do what you need in one step. However, there is still the issue of what you're asking which seems to be "Is it more efficient to do many small requests or one big one, and what are the performance ramifications".
The answer to that is a bit specific to Flutter. What's happening when you do a request with SQFLITE, is that it is processing whatever you've passed to it, sending it to java/objc and possibly doing more processing and pushes processing to a backround thread, which then calls to the SQLITE library which does more processing to understand the request, then actually reads the data on the disk to do the operation, then returns back to the java/objc layer, which pushes the response to the UI thread, which in turns responds back to dart.
If that doesn't sound particularly efficient, that's because it isn't =D. If you're doing this a few times (or even a few hundred) it's probably fine, but if you're getting into thousands as you state it might start slowing down.
The alternative you've proposed is to do one large request. You will know better than I whether that is wise; if it's a couple thousand but only ever a couple thousand, and the data you're returning is always going to be relatively small (i.e. just a 10-20 character name and department name), then you'll have say (20+20)*2000 = 8000b = 80kb of data. Even if you assume the overhead will double that size, 160 kb of data shouldn't be enough to faze any relatively recent smartphone (after all that's much smaller than any single photo!).
Now, taking some domain specific knowledge, you could optimize this. For example, if you know the number of departments is much smaller than employees (i.e. < 100 or something), you could skip the entire issue of doing joins, and simply request all departments before this begins and put it in a map (dep_id => dep_title), and then once you've requested employees you could just simply do that lookup from dep_id to dep_title yourself. That way your requests wouldn't have to include the dep_title over and over again.
That being said, you may want to consider paging the employee lookup whether or not you use a join. You'd do this by requesting 100 employees (or whatever number) at a time rather than the entire batch - that way you don't have the overhead of 1000+ calls through the stack, but you also don't have a large block of data all in memory all at once.
SELECT * FROM Employee
WHERE employee_name >= LastValue
ORDER BY employee_name
LIMIT 100;
Unfortunately that doesn't fit in as well with how flutter does lists, so you'd probably need to have something like a 'EmployeeDatabaseManager' that does the actual requests, and your list would call into it to get the data. That's probably beyond the scope of this question though.

Is this database structure designed efficiently?

I need to setup a DB where users can create categories within categories and inside the categories are multiple objects with multiple different stats
hierarchy example:
Cartoon > category
Simpsons > category (within the cartoon category)
Homer > object
Homer object stats > stupidity: 117
Homer object stats > color: yellow
I cant just make a table like:
-----------------------------
|character| Stupidy | color |
-----------------------------
|Homer | 111 |yellow |
because i need the users to be able to take away and add different object stats on their own (like add a stat of weight and remove color stats) for each object the stat types will be different plus with thousands of categories i don't want thousands of tables being generated by the users for the different objects.
My DB setup is in the google spreadsheet below which I feel works but I'm not the best with DB setups so I'm checking if there are some improvements.
I need to be able to display tables from the data displaying all item values that have a relationships with certain categories and/ or objects e.g. all simpsons characters weights or just homers weight.
https://docs.google.com/spreadsheets/d/1fHrugYv6JA3id_5zDxWaE-5IUNNGFo5yA1DZ1xQlqtg/edit?usp=sharing
Here is your database design:
Explanation:
There may be many categories and each category may have many sub categories
Each sub category may have many objects and
Each object may have many keys and values
Cheers!!

Database design for voting

I am implementing a voting feature to allow users to vote for their favourite images. They are able to vote for only 3 images. Nothing more or less. Therefore, I am using checkboxes to do validation for it. I need to store these votes in my database.
Here is what i have so far :
|voteID | name| emailAddress| ICNo |imageID
(where imageID is a foreign key to the Images table)
I'm still learning about database systems and I feel like this isn't a good database design considering some of the fields like email address and IC Number have to be repeated.
For example,
|voteID | name| emailAddress | ICNo | imageID
1 BG email#example.com G822A28A 10
2 BG email#example.com G822A28A 11
3 BG email#example.com G822A28A 12
4 MO email2#example.com G111283Z 10
You have three "things" in your system - images, people, and votes.
An image can have multiple votes (from different people), and a person can have multiple votes (for different images).
One way to represent this in a diagram is as follows:
So you store information about a person in one place (the Person table), about Images in one place (the Images table), and Votes in one place. The "chicken feet" relationships between them show that one person can have many votes, and one image can have many votes. ("Many" meaning "more than one").

How to store sets of objects that have occurred together during events?

I'm looking for an efficient way of storing sets of objects that have occurred together during events, in such a way that I can generate aggregate stats on them on a day-by-day basis.
To make up an example, let's imagine a system that keeps track of meetings in an office. For every meeting we record how many minutes long it was and in which room it took place.
I want to get stats broken down both by person as well as by room. I do not need to keep track of the individual meetings (so no meeting_id or anything like that), all I want to know is daily aggregate information. In my real application there are hundreds of thousands of events per day so storing each one individually is not feasible.
I'd like to be able to answer questions like:
In 2012, how many minutes did Bob, Sam, and Julie spend in each conference room (not necessarily together)?
Probably fine to do this with 3 queries:
>>> query(dates=2012, people=[Bob])
{Board-Room: 35, Auditorium: 279}
>>> query(dates=2012, people=[Sam])
{Board-Room: 790, Auditorium: 277, Broom-Closet: 71}
>>> query(dates=2012, people=[Julie])
{Board-Room: 190, Broom-Closet: 55}
In 2012, how many minutes did Sam and Julie spend MEETING TOGETHER in each conference room? What about Bob, Sam, and Julie all together?
>>> query(dates=2012, people=[Sam, Julie])
{Board-Room: 128, Broom-Closet: 55}
>>> query(dates=2012, people=[Bob, Sam, Julie])
{Board-Room: 22}
In 2012, how many minutes did each person spend in the Board-Room?
>>> query(dates=2012, rooms=[Board-Room])
{Bob: 35, Sam: 790, Julie: 190}
In 2012, how many minutes was the Board-Room in use?
This is actually pretty difficult since the naive strategy of summing up the number of minutes each person spent will result in serious over-counting. But we can probably solve this by storing the number separately as the meta-person Anyone:
>>> query(dates=2012, rooms=[Board-Room], people=[Anyone])
865
What are some good data structures or databases that I can use to enable this kind of querying? Since the rest of my application uses MySQL, I'm tempted to define a string column that holds the (sorted) ids of each person in the meeting, but the size of this table will grow pretty quickly:
2012-01-01 | "Bob" | "Board-Room" | 2
2012-01-01 | "Julie" | "Board-Room" | 4
2012-01-01 | "Sam" | "Board-Room" | 6
2012-01-01 | "Bob,Julie" | "Board-Room" | 2
2012-01-01 | "Bob,Sam" | "Board-Room" | 2
2012-01-01 | "Julie,Sam" | "Board-Room" | 3
2012-01-01 | "Bob,Julie,Sam" | "Board-Room" | 2
2012-01-01 | "Anyone" | "Board-Room" | 7
What else can I do?
Your question is a little unclear because you say you don't want to store each individual meeting, but then how are you getting the current meeting stats (dates)? In addition any table given the right indexes can be very fast even with alot of records.
You should be able to use a table like log_meeting. I imagine it could contain something like:
employee_id, room_id, date (as timestamp), time_in_meeting
Where foreign keys to employee id to employee table, and room id key to room table
If you index employee id, room id, and date you should have a pretty quick lookup as mysql multiple-column indexes go left to right such that you gain index on (employee id, employee id + room id, and employee id + room id + timestamp) when do searches. This is explained more in the multi-index part of:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
By refusing to store meetings (and related objects) individually, you are loosing the original source of information.
You will not be able to compensate for this loss of data, unless you memorize on a regular basis the extensive list of all potential daily (or monthly or weekly or ...) aggregates that you might need to question later on!
Believe me, it's going to be a nightmare ...
If the number of people are constant and not very large you can then assign a column to each person for present or not and store the room, date and time in 3 more columns this can remove the string splitting problems.
Also by the nature of your question I feel first of all you need to assign Ids to everything rooms,people, etc. No need for long repetitive string in DB. Also try reducing any string operation and work using individual data in each column for better intersection performance. Also you can store a permutation all the people in a table and assign a id for them then use one of those ids in the actual date and time table. But all techniques will require that something be constant either people or rooms.
I do not understand whether you know all "questions" in design time or it's possible to add new ones during development/production time - this approach would require to keep all data all the time.
Well if you would know all your questions it seems like classic "banking system" which recalculates data on daily basis.
How I think about it.
Seems like you have limited number of rooms, people, days etc.
Gather logging data on daily basis, one table per day. Just one event, one database row, all information (field) what you need.
Start to analyse data using some crone script at "midnight".
Update stats for people, rooms, etc. Just increment number of hours spent by Bob in xyz room etc. All what your requirements need.
As analyzed data are limited and relatively small as you analyzed (compress) them, your system can contain also various queries as indexes would be relatively small etc.
You could be able to use scalable map/reduce algorithm.
You can't avoid storing the atomic facts as follows: (the meeting room, the people, the duration, the day), which is probably only a weak consolidation when the same people meet multiple times in the same room on the same day. Maybe that happens a lot in your office :).
Making groups comparable is an interesting problem, but as long as you always compose the member strings the same, you can probably do it with string comparisons. This is not "normal" however. To normalise you'll need a relation table (many to many) and compose a temporary table out of your query set so it joins quickly, or use an "IN" clause and a count aggregate to ensure everyone is there (you'll see what I mean when you try it).
I think you can derive the minutes the board room was in use as meetings shouldn't overlap, so a sum will work.
For storage efficiency, use integer keys for everything with lookup tables. Dereference the integers during the query parsing, or just use good old joins if you are feeling traditional.
That's how I would do it anyway :).
You'll probably have to store individual meetings to get the data you need anyway.
However you'll have to make sure you aggregate and anonymise it properly before creating your reports. Make sure to separate concerns and access levels to stay within the proper legal limits on data.

What is the most effective way to handle lots of tables in a database?

I am new to database programming and am using sqlite and python. As an example lets say I have a database named Animals.db which I open with and get the cursor for in python. Now if I wanted to separate the animals by species I would have a different table per species and since it can get even more specific I would likely need something more specific than just a table of species.
I am a bit confused on how one allocates the correct data to the correct area of a database, how is it separated. Are there tables of tables?
if I wanted to lets say have a table for every land animal and another for every animal of the sea, but each table would need further specification(homo sapiens, etc), how can I do that?
Now if I wanted to separate the
animals by species I would have a
different table per species
Maybe. Maybe not. You might use a table that looked like this. It depends entirely on what you mean by "separate the animals by species". Here's one reasonable interpretation.
Animal_name Sex Species
------
Jack M Leopardus pardalis
Susie F Leopardus pardalis
Kimmie M Leopardus pardalis
Susie F Stenella clymene
Ginger F Stenella clymene
Mary Ann F Stenella clymene
To find all the Clymene dolphins, you might use a query along these lines.
select Animal_name
from animals
where species = 'Stenella clymene'
order by Animal_name
Animal_name
--
Ginger
Mary Ann
Susie
Start by collecting data. Your goal is to collect a set of representative sample data. Sample data, because the full population is too big to handle. Representative, because ideally it represents all the problems you're likely to run into with the full population. If "animal name" to you doesn't mean "Jack" or "Ginger", but "ocelot" and "Clymene dolphin", representative sample data will make that clear.

Resources