Assuming that all playlists are subsets of a user's main library of music, how should a main library as well as playlists be managed in the database? It seems like a playlists table would grow extremely quickly for even a moderate amount of users. Would this be a decent use case for a nosql database having a list of playlists in each User collection, as opposed to a giant playlists table incorporating all users in the same place?
You haven't given a lot of details, so I'm answering as best I can. I think a relational database is a good fit for this problem. Although you might end up with millions of records in the playlists and playlists_songs tables, any modern RDBMS should be able to handle that without problems.
You may or may not need a table for albums; I've included it here for the sake of completeness.
albums
id unsigned int(P)
artist_id unsigned int(F artists.id)
name varchar(50)
...
+----+-----------+-----------------------------------+-----+
| id | artist_id | name | ... |
+----+-----------+-----------------------------------+-----+
| 1 | 1 | The Last in Line | ... |
| 2 | 3 | American IV: The Man Comes Around | ... |
| 3 | 2 | Animal House Soundtrack | ... |
| 4 | 4 | None or Unknown | ... |
| .. | ......... | ................................. | ... |
+----+-----------+-----------------------------------+-----+
Like albums, you may or may not want a table for artists but I've included it in case you want to show that kind of data.
artists
id unsigned int(P)
name varchar(50)
...
+----+-------------+
| id | name |
+----+-------------+
| 1 | Dio |
| 2 | Various |
| 3 | Johnny Cash |
| 4 | Unknown |
| 5 | Sam Cooke |
| .. | ........... |
+----+-------------+
I view playlists as very basic: a user can have an unlimited number of them and they have a name. In my example data we see that bob has two playlists "Mix" and "Speeches" while mary has only one "Oldies".
playlists
id unsigned int(P)
user_id unsigned int(F users.id)
name varchar(50)
+----+---------+----------+
| id | user_id | name |
+----+---------+----------+
| 1 | 1 | Mix |
| 2 | 1 | Speeches |
| 3 | 2 | Oldies |
| .. | ....... | ........ |
+----+---------+----------+
We have to keep track of what songs are on each playlist. In my example data you can see that "Egypt (The Chains Are On)" and "Hurt" are on the "Mix" playlist while the "Town Hall speech" is on the "Speeches" playlist and "Egypt (The Chains Are On)", "Hurt" and "Twistin' the Night Away" are all on the "Oldies" playlist.
playlists_songs
id unsigned int(P)
playlist_id unsigned int(F playlists.id)
song_id unsigned int(F songs.id)
+----+-------------+---------+
| id | playlist_id | song_id |
+----+-------------+---------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 4 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
| 6 | 3 | 3 |
| .. | ........... | ....... |
+----+-------------+---------+
Even though millions of users might all have the song "Hurt" in their collection, we only need to store information about each song once. So in the songs table we store information about each song, including where the actual audio file is located. The file locations in my example are just off the top of my head; how you actually organize the files in the filesystem could easily be very different.
songs
id unsigned int(P)
album_id unsigned int(F albums.id) // Default NULL
artist_id unsigned int(F artists.id)
name varchar(50)
filename varchar(255)
...
+----+----------+-----------+---------------------------+---------------------------+-----+
| id | album_id | artist_id | name | filename | ... |
+----+----------+-----------+---------------------------+---------------------------+-----+
| 1 | 1 | 1 | Egypt (The Chains Are On) | /media/audio/1/1/9.mp3 | ... |
| 2 | 2 | 3 | Hurt | /media/audio/3/2/2.mp3 | ... |
| 3 | 3 | 5 | Twistin' the Night Away | /media/audio/5/2/3.mp3 | ... |
| 4 | NULL | 4 | Town Hall speech | /media/audio/4/4/<id>.mp3 | ... |
| .. | ........ | ......... | ......................... | ......................... | ... |
+----+----------+-----------+---------------------------+---------------------------+-----+
And of course your users table.
users
id unsigned int(P)
username varchar(32)
password varbinary(255)
...
+----+----------+----------+-----+
| id | username | password | ... |
+----+----------+----------+-----+
| 1 | bob | ******** | ... |
| 2 | mary | ******** | ... |
| .. | ........ | ........ | ... |
+----+----------+----------+-----+
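To show how these tables fit together, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under some assumptions: the column types are simplified to what SQLite offers, the albums and artists tables are left out, and the index name and the helper songs_on_playlist are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE songs (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE playlists (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    name TEXT NOT NULL
);
CREATE TABLE playlists_songs (
    id INTEGER PRIMARY KEY,
    playlist_id INTEGER NOT NULL REFERENCES playlists(id),
    song_id INTEGER NOT NULL REFERENCES songs(id)
);
-- Assumed index (not in the answer) so lookups by playlist stay cheap.
CREATE INDEX idx_playlists_songs_playlist ON playlists_songs (playlist_id);
""")

# Same example data as in the tables above.
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "bob"), (2, "mary")])
conn.executemany("INSERT INTO songs VALUES (?, ?)", [
    (1, "Egypt (The Chains Are On)"),
    (2, "Hurt"),
    (3, "Twistin' the Night Away"),
    (4, "Town Hall speech"),
])
conn.executemany("INSERT INTO playlists VALUES (?, ?, ?)", [
    (1, 1, "Mix"), (2, 1, "Speeches"), (3, 2, "Oldies"),
])
conn.executemany("INSERT INTO playlists_songs VALUES (?, ?, ?)", [
    (1, 1, 1), (2, 1, 2), (3, 2, 4), (4, 3, 1), (5, 3, 2), (6, 3, 3),
])


def songs_on_playlist(conn, username, playlist_name):
    """Return the song names on one user's playlist, in insertion order."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM users u
        JOIN playlists p ON p.user_id = u.id
        JOIN playlists_songs ps ON ps.playlist_id = p.id
        JOIN songs s ON s.id = ps.song_id
        WHERE u.username = ? AND p.name = ?
        ORDER BY ps.id
        """,
        (username, playlist_name),
    ).fetchall()
    return [name for (name,) in rows]


print(songs_on_playlist(conn, "mary", "Oldies"))
```

Each song is stored once in songs; a playlist only adds small rows to playlists_songs, which is why the table can grow to millions of rows without duplicating song data.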
I think a conceptual design like the one below will help.
The key here is to store the media files outside the application's database and link to them by each file's path.
Some RDBMSs provide APIs for accessing the file system, such as Oracle BFILE or SQL Server FILESTREAM.
Whether to use a relational or a NoSQL solution depends on the application's business requirements.
Each comes with its own pros and cons; a comparison can be found here.
Related
I'm building an app for trading cards for a given game. This means a user can have multiple cards, and even repeated cards. This is my approach, but I don't know if it's correct (or even possible):
Users
---------------------------
|id| name | cards_ids |
---------------------------
|20| John | 31, 40, 50, 50|
---------------------------
Cards
-------------------------------
|id| name | type |
-------------------------------
|31| Monster31 | Aqua Monster|
-------------------------------
|50| Monster50 | Rock Monster|
-------------------------------
|40| Monster40 | Air Monster |
-------------------------------
As you can see, a user can have many cards, even if they are the same. Would this approach of duplicating foreign keys work fine? I will do this using Postgres, if that's relevant.
You need to think in terms of third normal form when designing your database.
In this case you want to add the number of cards as a property:
Users
-----------
|id| name |
-----------
|20| John |
-----------
CardsOwned
--------------------------------
|user_id| card_type_id | count |
--------------------------------
|20 | 31 | 1 |
|20 | 40 | 1 |
|20 | 50 | 2 |
--------------------------------
Or, even better, each physical card should have its own id. Even when two cards are the same monster, they can have different attributes such as "Near Mint" or "Mint".
Your Cards definition should become something like cards_type, where you define each kind of card. The cards that people actually own go in a separate table, where two copies of the same card type still get different ids because they are two different cards:
------------------------------------------
| card_id | card_type_id | condition |
------------------------------------------
| 1 | 31 | Mint |
| 2 | 40 | Near Mint |
| 3 | 50 | Used |
| 4 | 50 | Mint |
------------------------------------------
Then you need the ownership table to track who owns what:
CardsOwned:
| card_id | owner_id |
| 1 | 20 |
| 2 | 20 |
| 3 | 20 |
| 4 | 20 |
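A rough sketch of this per-copy design, using Python's sqlite3 module rather than Postgres so that it runs anywhere. The table names card_types, cards and cards_owned and the helper cards_per_type are assumptions chosen for illustration; the data is the same as in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE card_types (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL
);
-- One row per physical card, so two copies of one type get two ids.
CREATE TABLE cards (
    card_id INTEGER PRIMARY KEY,
    card_type_id INTEGER NOT NULL REFERENCES card_types(id),
    condition TEXT NOT NULL
);
-- card_id is the primary key: a physical card has exactly one owner.
CREATE TABLE cards_owned (
    card_id INTEGER PRIMARY KEY REFERENCES cards(card_id),
    owner_id INTEGER NOT NULL REFERENCES users(id)
);
""")

conn.execute("INSERT INTO users VALUES (20, 'John')")
conn.executemany("INSERT INTO card_types VALUES (?, ?, ?)", [
    (31, "Monster31", "Aqua Monster"),
    (40, "Monster40", "Air Monster"),
    (50, "Monster50", "Rock Monster"),
])
conn.executemany("INSERT INTO cards VALUES (?, ?, ?)", [
    (1, 31, "Mint"), (2, 40, "Near Mint"), (3, 50, "Used"), (4, 50, "Mint"),
])
conn.executemany("INSERT INTO cards_owned VALUES (?, ?)", [
    (1, 20), (2, 20), (3, 20), (4, 20),
])


def cards_per_type(conn, owner_id):
    """Count how many copies of each card type one user owns."""
    return conn.execute(
        """
        SELECT ct.name, COUNT(*)
        FROM cards_owned o
        JOIN cards c ON c.card_id = o.card_id
        JOIN card_types ct ON ct.id = c.card_type_id
        WHERE o.owner_id = ?
        GROUP BY ct.id, ct.name
        ORDER BY ct.id
        """,
        (owner_id,),
    ).fetchall()


print(cards_per_type(conn, 20))
```

The count column from the first variant is no longer stored; it is derived with COUNT(*) whenever it is needed.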
So I have a database for selling products. The orders table holds the customer's order info, such as shipping province and shipping method:
+--------------------+----+
| order | |
+--------------------+----+
| someOtherInfo | |
| shippingMethodID | fk |
| shippingProvinceID | fk |
| basePrice | |
+--------------------+----+
Basically, what happens now is that we have a base price for shipping that depends on the province and the shipping method. I now need to add that base price to my orders table. The base price is a matrix (a 2D array) where column (province) + row (shipping method) = baseCost, and that is throwing me off on how to implement it. I've never had to deal with this, so I don't even know what to look up.
Example of what the matrix looks like:
+--------+----+----+----+-----+
| | BC | AB | SK | etc |
+--------+----+----+----+-----+
| ground | 9 | 9 | 9 | |
| Air | 15 | 21 | 21 | |
| etc | | | | |
+--------+----+----+----+-----+
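One common way to store a matrix like this in a relational database is a table with one row per cell, keyed by the pair (shipping method, province). The sketch below uses Python's sqlite3 module; the table names shipping_methods, provinces and shipping_rates, the id values and the helper base_price are all hypothetical, and only the six prices shown in the matrix are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipping_methods (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE provinces (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
-- One row per matrix cell; the composite key allows one price per pair.
CREATE TABLE shipping_rates (
    shipping_method_id INTEGER NOT NULL REFERENCES shipping_methods(id),
    province_id INTEGER NOT NULL REFERENCES provinces(id),
    base_price NUMERIC NOT NULL,
    PRIMARY KEY (shipping_method_id, province_id)
);
""")

# Assumed ids: 1 = ground, 2 = Air; 1 = BC, 2 = AB, 3 = SK.
conn.executemany("INSERT INTO shipping_methods VALUES (?, ?)",
                 [(1, "ground"), (2, "Air")])
conn.executemany("INSERT INTO provinces VALUES (?, ?)",
                 [(1, "BC"), (2, "AB"), (3, "SK")])
conn.executemany("INSERT INTO shipping_rates VALUES (?, ?, ?)", [
    (1, 1, 9), (1, 2, 9), (1, 3, 9),
    (2, 1, 15), (2, 2, 21), (2, 3, 21),
])


def base_price(conn, method_id, province_id):
    """Look up one matrix cell; returns None when no rate is defined."""
    row = conn.execute(
        """
        SELECT base_price FROM shipping_rates
        WHERE shipping_method_id = ? AND province_id = ?
        """,
        (method_id, province_id),
    ).fetchone()
    return None if row is None else row[0]


print(base_price(conn, 2, 2))  # Air to AB
```

Since the order already carries shippingMethodID and shippingProvinceID, the base price can be looked up from those two keys when the order is created and copied into the order's basePrice column, so later rate changes do not alter past orders.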
So I'm currently working on a stored procedure that returns a table comprising data about the user (user ID) and performance metrics (field, metric_obtained). Originally I returned all the data, but I was thinking it would be more efficient to return only the people who meet the minimum to be recognized. I've already filtered them based on the minimum requirement, but the thing is that they can qualify based on a combination of things. So if I have 3 metrics A, B, and C, one recognition can be for A and B, while another is just for C. I'm also not limited to a maximum of 3 metrics, or this wouldn't be a problem.
My tables look like this:
|Employee | Metric | Obtained|
|_________|________|_________|
| John | Email | .98 |
| Sue | Email | .99 |
| Sue | Phone | .82 |
| Larry | Email | .93 |
| Larry | Phone | .83 |
| Jess | Phone | .9 |
| Jess | Email | .94 |
| Bob | Phone | .99 |
So if I need to get back both Phone AND Email my results would look like this:
|Employee | Metric | Obtained|
|_________|________|_________|
| Sue | Email | .99 |
| Sue | Phone | .82 |
| Larry | Email | .93 |
| Larry | Phone | .83 |
| Jess | Phone | .9 |
| Jess | Email | .94 |
Like I said, this would be easy if I had a guaranteed number of metrics, but I don't. Any thoughts?
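One possible approach for a variable number of required metrics is to count the distinct matching metrics per employee and keep only the employees whose count equals the size of the required set. Below is a minimal sketch with Python's sqlite3 module; the table name results and the helper employees_with_all are made up for the example, and inside a stored procedure the same GROUP BY ... HAVING idea would apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (employee TEXT, metric TEXT, obtained REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    ("John", "Email", 0.98),
    ("Sue", "Email", 0.99), ("Sue", "Phone", 0.82),
    ("Larry", "Email", 0.93), ("Larry", "Phone", 0.83),
    ("Jess", "Phone", 0.9), ("Jess", "Email", 0.94),
    ("Bob", "Phone", 0.99),
])


def employees_with_all(conn, required):
    """Return rows only for employees who have every required metric."""
    required = sorted(set(required))
    if not required:
        return []
    placeholders = ",".join("?" * len(required))
    sql = f"""
        SELECT employee, metric, obtained
        FROM results
        WHERE metric IN ({placeholders})
          AND employee IN (
              SELECT employee
              FROM results
              WHERE metric IN ({placeholders})
              GROUP BY employee
              HAVING COUNT(DISTINCT metric) = ?
          )
        ORDER BY employee, metric
    """
    params = [*required, *required, len(required)]
    return conn.execute(sql, params).fetchall()


for row in employees_with_all(conn, ["Phone", "Email"]):
    print(row)
```

With ["Phone", "Email"] this keeps Sue, Larry and Jess and drops John and Bob, and the list can hold any number of metrics.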
We are working on an application where millions of users will be entering information at the same time. Suppose the application allows people to rate geographic regions on where they would like to live. Each participant is allowed to rate each region using a decimal value from 0-10. Each person belongs to one or more groups based upon attributes such as gender, and people that consider themselves active, or enjoy culture.
Every time a rating is made, we need to have a view which shows us the average rating for each region/group. I'm aware that most DBs have an "average" function, but for our purposes we need to be able to use our own function, as we may use the geometric mean instead of the arithmetic mean.
Below are some tables which might be used. Note: for brevity, I did not include the relationship table PeopleGroups, which maps each person to the groups they are a member of.
Regions People Groups RegionScoresByPerson
+-----+------------+ +-----+-------+ +-----+----------+ +-----+-----+-------+
| RID | NAME | | PID | Name | | GID | Name | | RID | PID | Score |
+-----+------------+ +-----+-------+ +-----+----------+ +-----+-----+-------+
| 1 | Florida | | P1 | Alice | | G0 | Everyone | | 1 | P1 | 6 |
| 2 | California | | P2 | Bob | | G1 | Women | | 1 | P2 | 8 |
+-----+------------+ | P3 | Frank | | G2 | Men | | 1 | P3 | 3 |
| P4 | Mary | | G3 | Active | | 1 | P4 | 2 |
+-----+-------+ | G4 | Culture | | 1 | P1 | 7 |
+-----+----------+ | 1 | P2 | 5 |
| 1 | P3 | 8 |
| 1 | P4 | 2 |
+-----+-----+-------+
Our current implementation uses a similar set of tables for storing ratings, but we don't calculate averages in real time. Any time we need the results (e.g. show me the average score of California for women), we have to pull all the information into memory and run the calculations manually.
I was wondering how I can leverage database technologies such as views, triggers, stored procedures, etc. to present a simple table that lets me get scores for people and groups, so we don't have to run the calculations manually.
I would like a table like the following, where everything is handled by the DB. Any insert, update or delete on the RegionScoresByPerson or Groups tables would automatically be reflected in this table. If it is not apparent, the rows marked with * are calculated rows. In this case I'm using a simple arithmetic average, but the design should allow for any type of function.
EID stands for entity ID (a person or group)
Besides deciding how to build such a view, I'm unsure of what sort of datatypes to use (and index) for People and Groups. I suppose I'd like the index to be integers, but that would prevent me from creating the table below because I couldn't distinguish between Person 1 and Group 1 -- Would having ID's such as P1 and G1 be a performance hit? I'm obviously concerned about the design being scalable.
ScoreView
+-----------+-----+-------+
| RID | EID | Score |
| 1 | P1 | 6 |
| 1 | P2 | 8 |
| 1 | P3 | 3 |
| 1 | P4 | 2 |
| 1 | P1 | 7 |
| 1 | P2 | 5 |
| 1 | P3 | 8 |
| 1 | P4 | 2 |
| 1 | G0 | 4.75 |*
| 1 | G1 | 4 |*
| 1 | G2 | … |*
| 1 | G3 | … |*
+-----------+-----+-------+
Apache Flume is the open source tool designed to solve this kind of problem. Also have a look at Google Cloud Dataflow.
https://flume.apache.org/
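As for letting the database apply a custom averaging function, some engines allow user-defined aggregates. The sketch below uses Python's sqlite3 module, whose create_aggregate registers a class with step and finalize methods. It is only an illustration: the table names region_scores and people_groups and the aggregate name geo_mean are made up, the group memberships are assumed because the question leaves PeopleGroups out, and only the first four scores for region 1 are loaded.

```python
import math
import sqlite3


class GeometricMean:
    """Aggregate computing exp(mean(log(x))); yields 0 if any score is 0."""

    def __init__(self):
        self.log_sum = 0.0
        self.count = 0
        self.has_zero = False

    def step(self, value):
        if value is None:
            return  # ignore NULL scores
        if value == 0:
            self.has_zero = True
        else:
            self.log_sum += math.log(value)
        self.count += 1

    def finalize(self):
        if self.count == 0:
            return None
        if self.has_zero:
            return 0.0
        return math.exp(self.log_sum / self.count)


conn = sqlite3.connect(":memory:")
conn.create_aggregate("geo_mean", 1, GeometricMean)
conn.executescript("""
CREATE TABLE region_scores (rid INTEGER, pid TEXT, score REAL);
CREATE TABLE people_groups (pid TEXT, gid TEXT);
""")
conn.executemany("INSERT INTO region_scores VALUES (?, ?, ?)", [
    (1, "P1", 6), (1, "P2", 8), (1, "P3", 3), (1, "P4", 2),
])
# Assumed memberships: everyone in G0, Alice and Mary in G1, Bob and Frank in G2.
conn.executemany("INSERT INTO people_groups VALUES (?, ?)", [
    ("P1", "G0"), ("P2", "G0"), ("P3", "G0"), ("P4", "G0"),
    ("P1", "G1"), ("P4", "G1"), ("P2", "G2"), ("P3", "G2"),
])


def group_scores(conn):
    """Return {(rid, gid): geometric mean of the member scores}."""
    rows = conn.execute(
        """
        SELECT rs.rid, pg.gid, geo_mean(rs.score)
        FROM region_scores rs
        JOIN people_groups pg ON pg.pid = rs.pid
        GROUP BY rs.rid, pg.gid
        ORDER BY rs.rid, pg.gid
        """
    ).fetchall()
    return {(rid, gid): score for rid, gid, score in rows}


for key, score in group_scores(conn).items():
    print(key, round(score, 3))
```

Swapping geo_mean for AVG in the query gives the arithmetic averages from the question (4.75 for G0 and 4 for G1). Whether recomputing per query is fast enough at millions of users is a separate question that this sketch does not address.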
I'm a newbie with databases and would like some advice, please.
I have agencies who can download photos.
By default, each agency can download "medium" and "large" photos.
Now I would like them to be able to create extra custom presets from their account page and manage those.
I looked at how the database of some blog software handles categories and based my design on that example. Is this the right approach?
Cheers
agency 1 has preset "medium" & "large"
agency 2 has preset "medium", "large" & "Bill custom"
-----------
| presets |
-----------------------------------------------
| preset_id | preset_name | preset_dimensions |
-----------------------------------------------
| 1 | medium | 800x600 |
| 2 | large | 3000x2000 |
| 3 | Bill custom | 640x420 |
-----------------------------------------------
----------------
| preset_assoc |
------------------------------------------------------------
| presassoc_id | presassoc_preset_id | presassoc_agency_id |
------------------------------------------------------------
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 3 | 2 |
------------------------------------------------------------
------------
| agencies |
---------------------------
| agency_id | agency_name |
---------------------------
| 1 | Joe ltd |
| 2 | Bill inc |
---------------------------
The approach is right. Because you have a many-to-many relation (one agency can have multiple presets, and the same preset can be used by multiple agencies), you need a joining table. The only questionable thing is that preset_assoc doesn't need presassoc_id, because the other two columns can be used as a combined primary key.
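Here is a small sketch of the joining table with a combined primary key, using Python's sqlite3 module. The column types and the helper presets_for_agency are assumptions for illustration; the table and column names and the data come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agencies (agency_id INTEGER PRIMARY KEY, agency_name TEXT NOT NULL);
CREATE TABLE presets (
    preset_id INTEGER PRIMARY KEY,
    preset_name TEXT NOT NULL,
    preset_dimensions TEXT NOT NULL
);
-- No surrogate presassoc_id: the pair of foreign keys is the primary key.
CREATE TABLE preset_assoc (
    presassoc_preset_id INTEGER NOT NULL REFERENCES presets(preset_id),
    presassoc_agency_id INTEGER NOT NULL REFERENCES agencies(agency_id),
    PRIMARY KEY (presassoc_agency_id, presassoc_preset_id)
);
""")
conn.executemany("INSERT INTO agencies VALUES (?, ?)",
                 [(1, "Joe ltd"), (2, "Bill inc")])
conn.executemany("INSERT INTO presets VALUES (?, ?, ?)", [
    (1, "medium", "800x600"),
    (2, "large", "3000x2000"),
    (3, "Bill custom", "640x420"),
])
# Rows are (preset_id, agency_id), matching the column order above.
conn.executemany("INSERT INTO preset_assoc VALUES (?, ?)", [
    (1, 1), (2, 1), (1, 2), (2, 2), (3, 2),
])


def presets_for_agency(conn, agency_id):
    """Return the preset names available to one agency."""
    rows = conn.execute(
        """
        SELECT p.preset_name
        FROM preset_assoc a
        JOIN presets p ON p.preset_id = a.presassoc_preset_id
        WHERE a.presassoc_agency_id = ?
        ORDER BY p.preset_id
        """,
        (agency_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(presets_for_agency(conn, 2))
```

A side effect of the combined key is that assigning the same preset to the same agency twice is rejected by the database (sqlite3 raises IntegrityError), which the surrogate presassoc_id alone would not prevent.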