Best way to store results data in database? [duplicate] - database

This question already has answers here:
Is storing a delimited list in a database column really that bad?
(10 answers)
Closed 9 years ago.
I have results data like this:
1. account, name, #, etc
2. account, name, #, etc
...
10. account, name, #, etc
I have approximately 1 set of results data generated each week.
Currently it's stored like so:
DATETIME DATA_BLOB
Which is annoying because I can't query any of the data without parsing the BLOB into a custom object. I'm thinking of changing this.
I'm thinking of having one giant table:
DATETIME RANK ACCOUNT NAME NUMBER ... ETC
date1 1 user1 nn #
date1 2 user2 nn #
...
date1 10 userN nn #
date2 1 user5 nn #
date2 2 user12 nn #
...
date2 10 userX nn #
I don't know anything about database design principles, so can someone give me feedback on whether this is a good approach or there might be a better one?
Thanks

I think it is ok to have a table like that, if there are not one-to-many relationships. In that case, it would be more efficient to have multiple tables like in my example below. Here are some general tips as well:
Tip: Good practice My professor told me that it's always good to have an "ID" column, which is a unique number identifier for each item in the table (1, 2, 3… etc.). (Perhaps that was the intent of your "Number" column.) I think SQLite forces each table to have an ID column anyways.
Tip: Saving storage space - Also, if there is a one-to-many relationship (example: one name has many accounts) then it might save space to have a separate table for the accounts, and then store the ID of the name in the first table- so that way you are storing many ints instead of duplicate strings.
Tip: Efficiency - Some databases have specific frameworks designed to handle relationships such as many-to-one or many-to-many, so if you use their framework for that (I don't remember exactly how to do it) it will probably work more efficiently.
Tip: Saving storage space - If you make your own ID column it might be a waste if it automatically includes an "ID" column anyways - so you might want to check for that possibility.
Conceptual Example: (Storing multiple accounts for the same name)
Poor Solution:
Storing everything in 1 table (inefficient, because it duplicates Bob's name, rank, and datetime):
ID NAME RANK DATETIME ACCOUNT
1 Bob 1 date1 bob_account_1
2 Joe 2 date2 user2_joe
3 Bob 1 date1 bob_account_2
4 Bob 1 date1 bobs_third_account
Better Solution: Having 2 tables to prevent duplicated information (Also demonstrates the usefulness of ID's). I named the 2 tables "Account" and "Name."
Table 1: "Account" (Note that NAME_ID refers to the ID column of Table 2)
ID NAME_ID ACCOUNT
1 1 bob_account_1
2 2 user2_joe
3 1 bob_account_2
4 1 bobs_third_account
Table 2: "Name"
ID NAME RANK DATETIME
1 Bob 1 date1
2 Joe 2 date2
I'm not a database expert so this is just some of what I learned in my internet programming class. I hope this helps lead you in the right direction in further research.

Related

Simple database design - some columns have multiple values

Caveat: very new to database design/modeling, so bear with me :)
I'm trying to design a simple database that stores information about images in an archive. Along with file_name (which is one distinct string), I have fields like genre and starring where each field might contains multiple strings (if an image is associated with multiple genres, and/or if an image has multiple actors in it).
Right now the database is just a single table keyed on file_name, and the fields like starring and genre just have multiple comma-separated values stored. I can query it fine by using wildcards and like and in operators, but I'm wondering if there's a more elegant way to break out the data such that it is easier to use/query. For instance, I'd like to be able to find how many unique actors are represented in the archive, but I don't think that's possible with the current model.
I realize this is a pretty elementary question about data modeling, but any guidance anyone can provide or reading you can direct me to would be greatly appreciated!
Thanks!
You need to create extra tables in order to stick with the normalization. In your situation you need 4 extra tables to represent these n->m relations(2 extra would be enough if the relations were 1->n).
Tables:
image(id, file_name)
genre(id, name)
image_genres(image_id, genre_id)
stars(id, name, ...)
image_stars(image_id, star_id)
And some data in tables:
image table
id
file_name
1
/users/home/song/empire.png
2
/users/home/song/promiscuous.png
genre table
id
name
1
pop
2
blues
3
rock
image_genres table
image_id
genre_id
1
2
1
3
2
1
stars table
id
name
1
Jay-Z
2
Alicia Keys
3
Nelly Furtado
4
Timbaland
image_stars table
image_id
star_id
1
1
1
2
2
3
2
4
For unique actor count in database you can simply run the sql query below
SELECT COUNT(name) FROM stars

How to model arbitrarily ordering items in database?

I accepted a new feature to re-order some items by using Drag-and-Drop UI and save the preference for each user to the database. What's the best way to do so?
After reading some questions on StackOverflow, I found this solution.
Solution 1: Use decimal numbers to indicate order
For example,
id item order
1 a 1
2 b 2
3 c 3
4 d 4
If I insert item 4 between item 1 and 2, the order becomes,
id item order
1 a 1
4 d 1.5
2 b 2
3 c 3
In this way, every new order = order[i-1] + order[i+1] / 2
If I need to save the preference for every user, then I need to another relationship table like this,
user_id item_id order
1 1 1
1 2 2
1 3 3
1 4 1.5
I need num_of_users * num_of_items records to save this preference.
However, there's a solution I can think of.
Solution 2: Save the order preference in a column in the User table
This is straightforward by adding a column in the User table to record the order. Each value would be parsed as an array of item_ids that ranked by the index of the array.
user_id . item_order
1 [1,4,2,3]
2 [1,2,3,4]
Is there any limitation of this solution? Or is there any other ways to solve this problem?
Usually, an explicit ordering deals with the presentation or some specific processing of data. Hence, it's a good idea to separate entities of theirs presentation/processing. For example
users
-----
user_id (PK)
user_login
...
user_lists
----------
list_id, user_id (PK)
item_index
item_index can be a simply integer value :
ordered continuously (1,2...N): DELETE/INSERT of the whole list are normally required to change the order
ordered discretely with some seed (10,20...N): you can insert new items without reordering the whole list
Another reason to separate entity data and lists: reordering lists should be done in transaction that may lead to row/table locks. In case of separated tables only data in list table is impacted.

Database design for multiple similar types?

Say I have two question types: Multiple Choice and Range. A Range question allows users to answer by specifying a range of values in their answer (1-10 or 2-4 for example).
I inherited a database where the answers to these question types are stored in the same table which is structured like so:
Answers
-------
Id
QuestionId
choice
range_from
range_to
This results in data like below:
1 1 null 1 10
2 1 null 2 4
3 2 Pants null null
4 2 Hat null null
Does it make sense to include columns from every answer type in the answer table? Or should they be broken out into separate tables?
This is a very slimmed-down version of my real database. In reality there are about 8 question types, so with every answer there are several columns that are left unused.
Does it make sense to include columns from every answer type in the answer table?
This is "all classess in the same table" strategy for implementing inheritance, which is suitable for small number of classes. As the number of classes grows, you might consider one of the other strategies. There is no predefined "cut-off point" for that - you'll have to measure and decide for yourself.
The alternative would be an EAV-like system as proposed by blotto, but that would shift the enforcement of data consistency away from the DBMS. This is a valid solution if you don't know the structure of data at design-time and want to avoid DML at run-time, but if you do know the structire of data at design-time better stick with inheritance.
You could have a single field that represents the 'type' of question, that seems best suited in the Question table ( not the Answer table). For example:
question_type ENUM('choice', 'range', 'type_3', 'type_4'..)
Then make a one-to-many link ( a join table ) that represents the Question-to-Answers relationship
AnswerId (pk) | QuestionId (fk)
1 1
2 1
3 2
4 2
Finally, your Answer table is a collection of values for each Answer . It can designate each record more specifically by having its own ENUM.
answer_type ENUM('low_range', 'high_range', 'choice', etc)
Id (pk)| AnswerId (fk) | Type | Value
1 1 low_range 1
2 1 high_range 10
3 2 low_range 2
4 2 high_range 4
5 3 choice Pants
6 4 choice Hat
This is much more scalable, and basically pivots the fields in your previous table to values in the answers table. So you can always add new 'Type's both for questions an answers without adding new fields to the schema.

Query a Lookup Table in Excel or Access

I've got a mental block about what I'm sure is a common scenario:
I have some data in a csv file that I need to do some very basic reporting from.
The data is essentially a table with Resources as column headings and People as row headings, the rest of the table consists of Y/N flag, "Y" if the person has access to the resource, "N" if they don't. Both the resources and the people have unique names.
Sample data:
Res1 Res2 Res3
Bob Y Y N
Tom N N N
Jim Y N Y
The table is too large to simply view it as whole in Excel(say 300 resources and 600 people), so I need a way to easily query and display (A simple list would be ok) what resources a person has access to, given the person's name.
The person that will need to use this has MS Office, and not much else on their PC.
So, the question is: What is the best way to manipulate this data to get the report I need? My gut says MS Access would be the best, but I can't figure out to automatically import data like this into a normal relational database. If not Access, perhaps there are some functions in Excel that could help me out?
You should normalize your data. This will make it easier to query against. For example:
table users:
UserID UserName
1 Bob
2 Tim
3 Jim
table resources:
ResourceID ResourceDesc
1 Printer #1
2 Fax Machine
3 Bowling Ball Wax
table users_resources:
LinkID UserID ResourceID
1 1 1
2 1 2
3 3 1
4 3 3
SELECT ResourceID
FROM users_resources, users
WHERE users.UserName="Bob"

GROUP_CONCAT and DISTINCT are great, but how do i get rid of these duplicates i still have?

i have a mysql table set up like so:
id uid keywords
-- --- ---
1 20 corporate
2 20 corporate,business,strategy
3 20 corporate,bowser
4 20 flowers
5 20 battleship,corporate,dungeon
what i WANT my output to look like is:
20 corporate,business,strategy,bowser,flowers,battleship,dungeon
but the closest i've gotten is:
SELECT DISTINCT uid, GROUP_CONCAT(DISTINCT keywords ORDER BY keywords DESC) AS keywords
FROM mytable
WHERE uid !=0
GROUP BY uid
which outputs:
20 corporate,corporate,business,strategy,corporate,bowser,flowers,battleship,corporate,dungeon
does anyone have a solution? thanks a ton in advance!
What you're doing isn't possible with pure SQL the way you have your data structured.
No SQL implementation is going to look at "Corporate" and "Corporate, Business" and see them as equal strings. Therefore, distinct won't work.
If you can control the database,
The first thing I would do is change the data setup to be:
id uid keyword <- note, not keyword**s** - **ONE** value in this column, not a comma delimited list
1 20 corporate
2 20 corporate
2 20 business
2 20 strategy
Better yet would be
id uid keywordId
1 20 1
2 20 1
2 20 2
2 20 3
with a seperate table for keywords
KeywordID KeywordText
1 Corporate
2 Business
Otherwise you'll need to massage the data in code.
Mmm, your keywords need to be in their own table (one record per keyword). Then you'll be able to do it, because the keywords will then GROUP properly.
Not sure if MySql has this, but SQL Server has a RANK() OVER PARTITION BY that you can use to assign each result a rank...doing so would allow you to only select those of Rank 1, and discard the rest.
You have two options as I see it.
Option 1:
Change the way your store your data (keywords in their own table, join the existing table with the keywords table using a many-to-many relationship). This will allow you to use DISTINCT. DISTINCT doesn't work currently because the query sees "corporate" and "corporate,business,strategy" as two different values.
Option 2:
Write some 'interesting' sql to split up the keywords strings. I don't know what the limits are in MySQL, but SQL in general is not designed for this.

Resources