Modelling a voting poll in a graph database - database

I've modelled a voting poll for a RDBMS system. The structure is a bit more complicated than a conventional voting poll since users can choose to vote either for an option on the poll or pass on their vote to another user for a given poll.
My structure looks something like this:
Polls
id | title
----------
1 | Who should be president
Options
id | poll_id | title
--------------------
1 | 1 | Obama
2 | 1 | Bush
Vote
id | poll_id | user_id | vote_type | vote_id
--------------------------------------------
1 | 1 | 1 | option | 1
2 | 1 | 2 | user | 1
In this case, option 1 would receive 2 votes since user 2 gave his vote to user 1 who votes for option 1.
I realize that the data I am going to store is going to be fairly complicated to query in a RDBMS system if I want to visualise how the votes move between users. However, I don't have much experience with graph databases and would like some hints as to how I go around modelling this.

It's always preferable, when making a DB model, to start with an information design model, and then transform this into a DB model.
In an information design model for your problem, options would be componenents of polls (so the UML class diagram would have a composition between Option and Poll), and votes would be relationships/links between users and options (so the UML class diagram would have a *many-to-many association between Option and User, the instances of which are the votes). In addition, there is a ternary association User-delegates-his-vote-in-Poll-to-User, the instances of which are the delegations.
From this, I get the following DB model:
Poll( id, question)
Option( poll_id, option_sequence_no, possible_vote)
Vote( user_id, poll_id, option_sequence_no, nmr_of_votes)
Delegation( user_id, poll_id, delegate_id)
Of course, we have to add a constraint that the number of votes by a use in a poll is the number of delegations plus 1.

Related

What is the name of the approach where you store the history in the same table?

I'm developing an application that uses a mysql database and we wanted to do an approach for history purposes, that we store the current state and the history in the same table for performance reasons (on updates the application doesn't have the id for an entity just a key pair, so it is easier just to insert a new row).
The table looks like this:
+------+-------+-----------+------------------------------+
| id |user_id| type |content |
+------+-------+-----------+------------------------------+
| 1 |'1-2-3'| position | *creation |
| 2 |'1-2-3'| position | *something_changed |
| 3 |'1-2-3'| device | *creation |
| 4 |'1-2-4'| position | *creation |
| 5 |'1-2-4'| device | *creation |
| 6 |'1-2-4'| device | *something_changed |
+------+-------+-----------+------------------------------+
Every entity is described with the user_id and type "key" pair, when something is changed in the entity a new row is inserted. The current state of an entity is selected by the highest id row from the group, which is grouped by the user_id and type. Performance wise the updates should be super fast and the selects can be slower, because those are not used often.
I would like to look up best practices and other people experiences with this method, but I don't know how to search for them. Can you help me? I'm interested in your experiences or opinions on this topic as well.
I know about Kafka and other streaming platforms, but that was sadly not an option for this.

How to deal with Variable data over time in associations

In linked models (let's say a drink transaction, a waiter, and a restaurant), when you want to display data, you look for informations in your linked content :
Where was that beer bought ?
Fetch Drink transaction => Fetch its Waiter => Fetch this waiter's Restaurant : this is where the beer was purchased
So at time T, when I display all transactions, I fetch my data following associations, thus I can display this :
TransactionID Waiter Restaurant
1 Julius Caesar's palace
2 Cleo Moe's tavern
Let's say now that my waiter is moved to another restaurant.
If I refresh this table, the result will be
TransactionID Waiter Restaurant
1 Julius Moe's tavern
2 Cleo Moe's tavern
But we know that the transaction n°1 was made in Caesar's palace !
Solution 1
Don't modify the waiter Julius, but clone it.
Upside : I keep an association between models, and still can filter with every field of every associated models.
Downside : Every modification on every model duplicates content, which can do a LOT when time passes.
Solution 2
Keep a copy of the current state of your associated models when you create the transaction.
Upside : I don't duplicate the contents.
Downside : You can't anymore use fields on your content to display, sort or filter them, as your original and real data is inside, let's say, a JSON field. So you have to, if you use MySQL, filter your data by makin plain-search queries in that field.
What is your solution ?
[EDIT]
The problem goes further, as it's not only a matter when association changes : a simple modification on an associated model causes a problem too.
What I mean :
What's the amount of this order ?
Fetch Drink transaction => Fetch its product => Fetch this product's Price => Multiply by order quantity : this is the total amount of the order
So at time T, when I display all transactions, I fetch my data following associations, thus I can display this :
TransactionID Qty ProductId
1 2 1
ProductID Title Price
1 Beer 3
==> Amount of order n°1 : 6.
Let's say now that the beer costs 2,5.
If I refresh this table, the result will be
TransactionID Qty ProductId
1 2 1
ProductID Title Price
1 Beer 2,5
==> Amount of order n°1 : 5.
So, once again, the 2 solutions are available : do I clone the beer product when its price is changed ? Do I save a copy of beer in my order when the order is made ? Do you have any third solution ?
I can't just add an "amount" attribute on my orders : yes it can solve that problem (partially) but it's not a scalable solution as many other attributes will be in the same situation and I can't multiply attributes like this.
Event Sourcing
This is a good use case for Event Sourcing. Martin Fowler wrote a very good article about it, I advise you to read it.
there are times when we don't just want to see where we are, we also want to know how we got there.
The idea is to never overwrite data but instead create immutable transactions for everything you want to keep a history of. In your case you'll have WaiterRelocationEvents and PriceChangeEvents. You can recreate the status of any given time by applying every event in order.
If you don't use Event Sourcing, you lose information. Often it's acceptable to forget historic information, but sometimes it's not.
Lambda Architecture
As you don't want to recalculate everything on every single request, it's advisable to implement a Lambda Architecture. That architecture is often explained with BigData technology and frameworks, but you could implement it with Plain Old Java and CronJobs.
It consists of three parts: Batch Layer, Service Layer and Speed Layer.
The Batch Layer regularly calculates an aggregated version of the data, for example you'll calculate the monthly income once per day. So the current month's income will change every night until the month is over.
But now you want to know the income in real-time. Therefore you add a Speed Layer, which will apply all events of the current date immediately. Now if a request of the current month's income arrives, you'll add up the last result of the Batch Layer and the Speed Layer.
The Service Layer allows more advanced queries by combing multiple batch results and the Speed Layer results into one query. For example you can calculate the year's income by summing the monthly incomes.
But as said before, only use the Lambda approach if you need the data often and fast, because it adds extra complexity. Calculations which are rarely needed, should be run on-the-fly. For example: Which waiter creates the most income at Saturday evenings?
Example
Restaurants:
| Timestamp | Id | Name |
| ---------- | -- | --------------- |
| 2016-01-01 | 1 | Caesar's palace |
| 2016-11-01 | 2 | Moe's tavern |
Waiters:
| Timestamp | Id | Name | FirstRestaurant |
| ---------- | -- | -------- | --------------- |
| 2016-01-01 | 11 | Julius | 1 |
| 2016-11-01 | 12 | Cleo | 2 |
WaiterRelocationEvents:
| Timestamp | WaiterId | RestaurantId |
| ---------- | -------- | ------------ |
| 2016-06-01 | 11 | 2 |
Products:
| Timestamp | Id | Name | FirstPrice |
| ---------- | -- | -------- | ---------- |
| 2016-01-01 | 21 | Beer | 3.00 |
PriceChangeEvent:
| Timestamp | ProductId | NewPrice |
| ---------- | --------- | -------- |
| 2016-11-01 | 21 | 2.50 |
Orders:
| Timestamp | Id | ProductId | Quantity | WaiterId |
| ---------- | -- | --------- | -------- | -------- |
| 2016-06-14 | 31 | 21 | 2 | 11 |
Now let's get all information about order 31.
get order 31
get price of product 21 at 2016-06-14
get last PriceChangeEvent before the date or use FirstPrice if none exists
calculate total price by multiplying retrieved price with quantity
get waiter 11
get waiter's restaurant at 2016-06-14
get last WaiterRelocationEvent before the date or use FirstRestaurant if none exists
get restaurant name by retrieved restaurant id of the waiter
As you can see it becomes complicated, therefore you should only keep history of useful data.
I wouldn't involve the relocation events in the calculation. They could be stored, but I would store the restaurant id and the waiter id in the order directly.
The price history on the other hand could be interesting to check if orders went down after a price change. Here you could use the Lambda Architecure to calculate a full order with prices from the raw order and the price history.
Summary
Decide of which data you want to keep the history.
Implement Event Sourcing for that data.
Use the Lambda Architecture to speed up commonly used queries.
I like the question as it raises something very straightforward and also something more subtle.
The common principle in both cases is that ‘History must not change’, meaning if we run a query over a specified past date range today the results are the same as when we run that same query at any point in the future.
Waiters Case
When a waiter changes restaurants we must not change the history of sales. If waiter Julius sells a drink yesterday in restaurant 1 then he switches to sell more drinks today in restaurant 2 we must retain those details.
Thus we want to be able to answer queries such as ‘how many drinks has Julius sold in restaurant 1’ and ‘how many drinks has Julius sold in all restaurants’.
To achieve this you have to abstract away from Julius as a waiter by bringing in a concept of staff. Julius is a member of staff. Staff work as waiters. When working in restaurant 1 Julius is waiter A and when he works in another restaurant he is waiter B, but always the same member of staff – Julius. With an entity ‘Staff’ the queries can be answered easily.
Upside:
No loss of historic data or excessive duplications.
Downside New entity Staff must be managed. But waiter table content is reduced making net overhead of data storage is low.
In summary - abstract data subject to change into a new entity and refer back to it from transactions.
Value of Order Case
The extended use case regarding ‘what is the value of this order’ is more involved. I work in cross-currency transactions where value for the observer (user) in the price list changes from day to day as currency fluctuations occur.
But there are good reasons to lock the order value in place. For example invoice processing systems have tolerance for a small difference between their expected invoice value and that of the submitted invoice, but any large difference can lead to late payment whilst invoice handlers check the issue. Also, if customers run reports on their historic purchases then the values of those orders must remain consistent despite fluctuations in currency rates over time.
The solution is to save into the order line:
the value of product in the customers currency,
or the rate between custom and supplier currency,
but ideally do both to avoid rounding errors.
What this does is provide a statement that ‘on the date that this order was placed line 1 cost $44.56 at exchange rate 1.1 $/£’. Having this data locked in allows you to invoice to the customers expectation and provide consistent spend reports over time.
Upside: Consistent historic data. Fast database performance as no look-ups required against historic rate tables.
Downside: Some data duplication. However, trading off against overhead of storage and indexation for historic rate storage plus indexation then this is possibly an upside.
Regarding adding 'amount' to your order table - you have to do this if you want to achieve a consistent data history. If you only work in one currency then amount is the only additional storage concern. And by adding this one attribute you have protected history. Your other alternative is to store a historic cost table for drinks so you know in January beer was $1, in February it as $1.10 etc and then store the cost-table key in the transaction so that you can look up the cost if anyone asks about a historic order. But the overhead on storing the key PLUS the indexes needed to make this practicable will outweigh the storage cost of cloning 'amount' onto the order record.
In summary - clone cost data that will change over time.

Database Design - Drop Down Input Box Issue

I'm trying to create a friendship site. The issue I'm having is when a user joins a website they have to fill out a form. This form has many fixed drop down items the user must fill out. Here is an example of one of the drop downs.
Drop Down (Favorite Pets)
Items in Favorite Pets
1. Dog
2. Cat
3. Bird
4. Hampster
What is the best way to store this info in a database. Right now the profile table has a column for each fixed drop down. Is this correct database design. See Example:
User ID | Age | Country | Favorite Pet | Favorite Season
--------------------------------------------------------------
1 | 29 | United States | Bird | Summer
Is this the correct database design? right now I have probably 30 + columns. Most of the columns are fixed because they are drop down and the user has to pick one of the options.
Whats the correct approach to this problem?
p.s. I also thought about creating a table for each drop down but this would really complex the queries and lead to lots of tables.
Another approach
Profile table
ID | username | age
-------------------
1 | jason | 27
profileDropDown table:
ID | userID | dropdownID
------------------------
1 | 1 | 2
2 | 1 | 7
Drop Down table:
ID | dropdown | option
---------------------
1 | pet | bird
2 | pet | cat
3 | pet | dog
4 | pet | Hampster
5 | season | Winter
6 | Season | Summer
7 | Season | Fall
8 | Season | spring
"Best way to approach" or "correct way" will open up a lot of discussion here, which risks this question being closed. I would recommend creating a drop down table that has a column called "TYPE" or "NAME". You would then put a unique identifier of the drop down in that column to identify that set. Then have another column called "VALUE" that holds the drop down value.
For example:
ID | TYPE | VALUE
1 | PET | BIRD
2 | PET | DOG
3 | PET | FISH
4 | SEASON | FALL
5 | SEASON | WINTER
6 | SEASON | SPRING
7 | SEASON | SUMMER
Then to get your PET drop down, you just select all from this table where type = 'PET'
Will the set of questions (dropdowns) to be asked every user ever be changed? Will you (or your successor) ever need to add or remove questions over time? If no, then a table for users with one column per question is fine, but if yes, it gets complex.
Database purists would require two tables for each question:
One table containing a list of all valid answers for that question
One table containing the many to many relation between user and answer to “this” question
If a new question is added, create new tables; if a question is removed, drop those tables (and, of course, adjust all your code. Ugh.) This would work, but it's hardly efficient.
If, as seems likely, all the questions and answer sets are similar, then a three-table model suggests itself:
A table with one row per question (QuestionId, QuestionText)
A table with one row for each answer for each Question (QuestionId, AnswerId, AnswerText)
A table with one row for each user-answered question (UserId, QuestionId, AnswerId)
Adding and removing questions is straightforward, as is identifying skipped or unanswered questions (such as, if you add a new question a month after going live).
As with most everything, there’s a whole lot of “it depends” behind this, most of which depends on what you want your system to do.

Friendship Website Database Design

I'm trying to create a database for a frienship website I'm building. I want to store multiple attributes about the user such as gender, education, pets etc.
Solution #1 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobbie. . .
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | Female | University | Dog | Ping Pong
The problem I'm having is the list of attributes goes on and on and right now my user table has 20 something columns.
I feel I could normalize this by creating another table for each attribute see below. However this would create many joins and I'm still left with a lot of columns in the user table.
Solution #2 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobbies
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | 0 | 0 | 0 | 0
Pets table:
id | Pet Type
---------------
0 | Dog
Anyone have any ideas how to approach this problem it feels like both answers are wrong. What is the proper table design for this database?
There is more to this than meets the eye: First of all - if you have tons of attributes, many of which will likely be null for any specific row, and with a very dynamic selection of attributes (i.e. new attributes will appear quite frequently during the code's lifecycle), you might want to ask yourself, whether a RDBMS is the best way to materialize this ... essentially non-schema. Maybe a document store would be a better fit?
If you do want to stay in the RDBMS world, the canonical answer is to have either one or one-per-datatype property table plus a table of properties:
Users.id | .name | .birthdate | .Gender | .someotherfixedattribute
----------------------------------------------------------
1743 | Me. | 01/01/1970 | M | indeed
Propertytpes.id | .name
------------------------
234 | pet
235 | hobby
Poperties.uid | .pid | .content
-----------------------------
1743 | 234 | Husky dog
You have a comment and an answer that recommend (or at least suggest) and Entity-Attribute-Value (EAV) model.
There is nothing wrong with using EAV if your attributes need to be dynamic, and your system needs to allow adding new attributes post-deployment.
That said, if your columns and relationships are all known up front, and they don't need to be dynamic, you are much better off creating an explicit model. It will (generally) perform better and will be much easier to maintain.
Instead of a wide table with a field per attribute, or many attribute tables, you could make a skinny table with many rows, something like:
Attributes (id,user_id,attribute_type,attribute_value)
Ultimately the best solution depends greatly on how the data will be used. People can only have one DOB, but maybe you want to allow for multiple addresses (billing/mailing/etc.), so addresses might deserve a separate table.

how to see a difference between entity and a column

Sometimes I am having a hard time seeing a difference between an entity and a column when I am starting to make a diagram. I don't know when it is supposed to be a entity or a column. For example, in some game if you have a user and that user can play by itself or it can play in the group. Would you make that two different entities User and GroupUser ?
Also, for example if the User has levels, status and badges they earn which is part of the game. Would these be entities also or they would just be in one entity which would be part of the User ?
Entity could be a Person (e.g. Student), Place (e.g. Room Name), Object (e.g. Books), Abstract Concept (e.g. Course, Order) that could be represented in your database and normally could become a Table in your Database.
Column(s) on the other hand is/are the attribute(s) of your Entity.
So, in your case you have a User entity and the possible columns or attributes (or fields) are
UserID, UserLevel, UserStatus, Badges, PlayStatus (values could be individual or group).
Your Badges although is a column could turn into Entity if it violates the Normalization rules.
For example if you have this Table for User:
Table: Users
UserID UserName UserStatus PlayStatus Badges
------ -------- ---------- ---------- ------
1 Surefire Active Single Private, Warrior, Platoon Leader
2 FastMachine Active Group Private, Warrior
3 BeatTheGeek Inactive Group Private
The Badges here violates the 1NF (1st Normal Form) in Normalization rules which says that there should be no repeating groups or in this case no Multi-valued columns. So, this could be normalized like:
Table: Users
UserID UserName UserStatus PlayStatus
------ -------- ---------- ----------
1 Surefire Active Single
2 FastMachine Active Group
3 BeatTheGeek Inactive Group
Table: Badges
BadgeID BadgeName
------ --------
1 Private
2 Indie
3 Warrior
4 Platoon Leader
5 Colonel
6 1 Star General
7 2 Star General
8 3 Star General
9 4 Star General
10 5 Star General
11 Hero
Table: UserBadgesHistory
UserID BadgeID ReceiveDate
------ -------- -----------
1 1 12/01/2013
1 3 12/05/2013
1 4 1/5/2014
2 1 2/5/2014
2 3 2/10/2014
3 2 11/10/2013
In general, an entity has multiple columns (i.e. attributes) of its own, and a column (or attribute) does not.
In your example, if the only data you're interested in storing is a User's current level, then level is unlikely to be an entity. This is because it would have only a single attribute of name/number. If you wanted to find all Users currently at level 4, you would simply do a query with level = 4.
On the other hand, if you had a reason to add additional data about the level, such as what abilities are associated with that level or the date a given User achieved the level, then you would want to make Level a separate entity.
A Level entity would have an ID, a number or name, and whatever other attributes you need as data.
ID | Prerequisite | Ability
----+--------------+--------------
1 | NULL | May gain foos
2 | Gain 10 foos | May gain bars
3 | Gain 20 bars | 30 free foos
In a fully normalized state, you would have another entity called UserLevel in which you would store data about, for example, when a certain User gained a level.
The UserLevel entity would contain the LevelID and the UserID as foreign keys (links back to the other entities), and a DateAchieved column for when the User achieved the level.
LevelID | UserID | DateAchieved
---------+--------+-------------
1 | 1 | 2014-02-01
1 | 2 | 2014-02-01
2 | 1 | 2014-02-05
3 | 1 | 2014-02-09
2 | 2 | 2014-02-11
4 | 1 | 2014-02-13
This shows User 1 and User 2 starting at Level 1 on the same day and leveling up at different rates.

Resources