How to model arbitrary ordering of items in a database?

I took on a new feature: letting users re-order items through a drag-and-drop UI and saving each user's preferred order to the database. What's the best way to do so?
After reading some questions on StackOverflow, I found this solution.
Solution 1: Use decimal numbers to indicate order
For example,
id item order
1 a 1
2 b 2
3 c 3
4 d 4
If I insert item 4 between item 1 and 2, the order becomes,
id item order
1 a 1
4 d 1.5
2 b 2
3 c 3
In this way, every new order = (order[i-1] + order[i+1]) / 2
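A minimal sketch of the midpoint rule, using the four items from the example. The function name `order_between` is made up for illustration, and `Fraction` is used here only so the arithmetic stays exact; a binary floating-point column would eventually run out of precision after many insertions at the same spot.

```python
from fractions import Fraction

def order_between(prev_order, next_order):
    """Return a sort key strictly between two neighbours (their midpoint)."""
    return (Fraction(prev_order) + Fraction(next_order)) / 2

# Rows as (item, order), mirroring the example table.
rows = [("a", Fraction(1)), ("b", Fraction(2)), ("c", Fraction(3)), ("d", Fraction(4))]

# Move item d between a (order 1) and b (order 2): only d's row changes.
new_order = order_between(1, 2)
rows = [(item, new_order if item == "d" else order) for item, order in rows]
rows.sort(key=lambda row: row[1])

print([item for item, _ in rows])  # ['a', 'd', 'b', 'c']
```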
If I need to save the preference for every user, then I need another relationship table like this,
user_id item_id order
1 1 1
1 2 2
1 3 3
1 4 1.5
I need num_of_users * num_of_items records to save this preference.
However, there's another solution I can think of.
Solution 2: Save the order preference in a column in the User table
This is straightforward: add a column to the User table to record the order. Each value would be parsed as an array of item_ids, ranked by their index in the array.
user_id item_order
1 [1,4,2,3]
2 [1,2,3,4]
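A rough sketch of how such a column could be handled in application code, assuming the array is stored as JSON text. The helper name `move_item` is hypothetical. Note that every re-order rewrites the whole value, and the database cannot check that the ids in the array still exist.

```python
import json

def move_item(item_order_json, item_id, new_index):
    """Parse the stored array, move one item_id to new_index, and serialize it back."""
    order = json.loads(item_order_json)
    order.remove(item_id)            # raises ValueError if the id is not in the list
    order.insert(new_index, item_id)
    return json.dumps(order)

stored = "[1,2,3,4]"                 # item_order value for user 2 in the example
updated = move_item(stored, 4, 1)    # drag item 4 to the second position
print(updated)                       # [1, 4, 2, 3]
```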
Are there any limitations to this solution? Or are there any other ways to solve this problem?

Usually, an explicit ordering deals with the presentation or some specific processing of data. Hence, it's a good idea to separate entities from their presentation/processing. For example:
users
-----
user_id (PK)
user_login
...
user_lists
----------
list_id, user_id (PK)
item_index
item_index can simply be an integer value:
ordered continuously (1,2...N): changing the order normally requires a DELETE/INSERT of the whole list
ordered discretely with some step (10,20...N): you can insert new items into the gaps without reordering the whole list
Another reason to separate entity data and lists: reordering a list should be done in a transaction, which may lead to row/table locks. With separate tables, only the data in the list table is impacted.
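The gapped variant could look roughly like the sketch below, run against an in-memory SQLite database. The `item_id` column, the `STEP` constant, and the helper names are assumptions added for illustration; they are not part of the schema above. A move touches one row while a gap is free and renumbers the user's list only when the gap is used up, all inside one transaction.

```python
import sqlite3

STEP = 10  # assumed gap between neighbouring item_index values

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_lists (user_id INTEGER, item_id INTEGER, item_index INTEGER, "
    "PRIMARY KEY (user_id, item_id))"
)
conn.executemany("INSERT INTO user_lists VALUES (1, ?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 40)])

def current_order(conn, user_id):
    """Return the user's item_ids sorted by item_index."""
    return [r[0] for r in conn.execute(
        "SELECT item_id FROM user_lists WHERE user_id = ? ORDER BY item_index", (user_id,))]

def move_after(conn, user_id, item_id, prev_item_id):
    """Place item_id right after prev_item_id in the user's list."""
    with conn:  # commits on success, rolls back on error
        rows = conn.execute(
            "SELECT item_id, item_index FROM user_lists "
            "WHERE user_id = ? AND item_id != ? ORDER BY item_index",
            (user_id, item_id)).fetchall()
        pos = [r[0] for r in rows].index(prev_item_id)
        prev_index = rows[pos][1]
        next_index = rows[pos + 1][1] if pos + 1 < len(rows) else prev_index + 2 * STEP
        if next_index - prev_index >= 2:
            # A free slot exists: update only the moved row.
            conn.execute(
                "UPDATE user_lists SET item_index = ? WHERE user_id = ? AND item_id = ?",
                ((prev_index + next_index) // 2, user_id, item_id))
        else:
            # Gap exhausted: renumber this user's whole list with fresh gaps.
            ids = [r[0] for r in rows]
            ids.insert(pos + 1, item_id)
            conn.executemany(
                "UPDATE user_lists SET item_index = ? WHERE user_id = ? AND item_id = ?",
                [((i + 1) * STEP, user_id, iid) for i, iid in enumerate(ids)])

move_after(conn, 1, 4, 1)
print(current_order(conn, 1))  # [1, 4, 2, 3]
```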

Related

Simple database design - some columns have multiple values

Caveat: very new to database design/modeling, so bear with me :)
I'm trying to design a simple database that stores information about images in an archive. Along with file_name (which is one distinct string), I have fields like genre and starring where each field might contain multiple strings (if an image is associated with multiple genres, and/or if an image has multiple actors in it).
Right now the database is just a single table keyed on file_name, and the fields like starring and genre just have multiple comma-separated values stored. I can query it fine by using wildcards and like and in operators, but I'm wondering if there's a more elegant way to break out the data such that it is easier to use/query. For instance, I'd like to be able to find how many unique actors are represented in the archive, but I don't think that's possible with the current model.
I realize this is a pretty elementary question about data modeling, but any guidance anyone can provide or reading you can direct me to would be greatly appreciated!
Thanks!
You need to create extra tables to keep the design normalized. In your situation you need 4 extra tables to represent these n->m relations (2 extra would be enough if the relations were 1->n).
Tables:
image(id, file_name)
genre(id, name)
image_genres(image_id, genre_id)
stars(id, name, ...)
image_stars(image_id, star_id)
And some data in tables:
image table
id  file_name
-------------------------------------
1   /users/home/song/empire.png
2   /users/home/song/promiscuous.png
genre table
id  name
----------
1   pop
2   blues
3   rock
image_genres table
image_id  genre_id
------------------
1         2
1         3
2         1
stars table
id  name
------------------
1   Jay-Z
2   Alicia Keys
3   Nelly Furtado
4   Timbaland
image_stars table
image_id  star_id
-----------------
1         1
1         2
2         3
2         4
To count the unique actors in the database, you can simply run the SQL query below:
SELECT COUNT(name) FROM stars
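As a sketch, part of this schema can be tried out in an in-memory SQLite database with the sample data above (the genre tables are left out for brevity). The `UNIQUE` constraints are an assumption added here: counting rows in `stars` only equals the number of unique actors if each actor is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (id INTEGER PRIMARY KEY, file_name TEXT UNIQUE);
CREATE TABLE stars (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE image_stars (
    image_id INTEGER REFERENCES image(id),
    star_id  INTEGER REFERENCES stars(id),
    PRIMARY KEY (image_id, star_id)   -- one row per (image, star) pair
);
INSERT INTO image VALUES (1, '/users/home/song/empire.png'),
                         (2, '/users/home/song/promiscuous.png');
INSERT INTO stars VALUES (1, 'Jay-Z'), (2, 'Alicia Keys'),
                         (3, 'Nelly Furtado'), (4, 'Timbaland');
INSERT INTO image_stars VALUES (1, 1), (1, 2), (2, 3), (2, 4);
""")

# Number of unique actors in the archive.
actor_count = conn.execute("SELECT COUNT(*) FROM stars").fetchone()[0]

# All actors appearing in one image, found through the junction table.
stars_in_image = [row[0] for row in conn.execute(
    "SELECT s.name FROM stars s "
    "JOIN image_stars x ON x.star_id = s.id "
    "JOIN image i ON i.id = x.image_id "
    "WHERE i.file_name = ? ORDER BY s.name",
    ("/users/home/song/empire.png",))]

print(actor_count)     # 4
print(stars_in_image)  # ['Alicia Keys', 'Jay-Z']
```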

Best way to store results data in database? [duplicate]

This question already has answers here:
Is storing a delimited list in a database column really that bad?
I have results data like this:
1. account, name, #, etc
2. account, name, #, etc
...
10. account, name, #, etc
I have approximately 1 set of results data generated each week.
Currently it's stored like so:
DATETIME DATA_BLOB
Which is annoying because I can't query any of the data without parsing the BLOB into a custom object. I'm thinking of changing this.
I'm thinking of having one giant table:
DATETIME RANK ACCOUNT NAME NUMBER ... ETC
date1 1 user1 nn #
date1 2 user2 nn #
...
date1 10 userN nn #
date2 1 user5 nn #
date2 2 user12 nn #
...
date2 10 userX nn #
I don't know anything about database design principles, so can someone give me feedback on whether this is a good approach or there might be a better one?
Thanks
I think it is OK to have a table like that if there are no one-to-many relationships. If there are, it would be more efficient to have multiple tables, as in my example below. Here are some general tips as well:
Tip: Good practice - My professor told me that it's always good to have an "ID" column, which is a unique number identifier for each item in the table (1, 2, 3… etc.). (Perhaps that was the intent of your "Number" column.) I think SQLite forces each table to have an ID column anyway.
Tip: Saving storage space - Also, if there is a one-to-many relationship (example: one name has many accounts) then it might save space to have a separate table for the accounts, and then store the ID of the name in the first table- so that way you are storing many ints instead of duplicate strings.
Tip: Efficiency - Some databases have specific frameworks designed to handle relationships such as many-to-one or many-to-many, so if you use their framework for that (I don't remember exactly how to do it) it will probably work more efficiently.
Tip: Saving storage space - If you make your own ID column it might be a waste if it automatically includes an "ID" column anyways - so you might want to check for that possibility.
Conceptual Example: (Storing multiple accounts for the same name)
Poor Solution:
Storing everything in 1 table (inefficient, because it duplicates Bob's name, rank, and datetime):
ID NAME RANK DATETIME ACCOUNT
1 Bob 1 date1 bob_account_1
2 Joe 2 date2 user2_joe
3 Bob 1 date1 bob_account_2
4 Bob 1 date1 bobs_third_account
Better Solution: Having 2 tables to prevent duplicated information (Also demonstrates the usefulness of ID's). I named the 2 tables "Account" and "Name."
Table 1: "Account" (Note that NAME_ID refers to the ID column of Table 2)
ID NAME_ID ACCOUNT
1 1 bob_account_1
2 2 user2_joe
3 1 bob_account_2
4 1 bobs_third_account
Table 2: "Name"
ID NAME RANK DATETIME
1 Bob 1 date1
2 Joe 2 date2
I'm not a database expert so this is just some of what I learned in my internet programming class. I hope this helps lead you in the right direction in further research.

Keep the order of current and future posts

One of my curiosities these days is how to order posts. I will give you a clear example, one that most of you have probably come across.
The Facebook Timeline, to which you can add posts and events at any point in time you want. In this case, I assume, the posts are ordered by date. When you add a new status in the past, you have to assign it a date, so it is easy to keep them in order.
What I want to do is to have posts and the option to add a new post after a specific one. I don't have the option to add a date to it, but I need a way to retrieve them in that order.
So, every post has an id and a date (the creation date). If a new post is added between 2 posts, I can't increment the ids of all the "upper" posts so that I can order by id. I can't order by date either, because in the future I may add a post between two older posts.
What solution do you imagine for this? What criteria should I order by (I am ready to make some database schema changes if needed)?
OK, there's another possible solution. Use two columns for ordering: Order1 is the basic order, and Order2 is the order within the group defined by Order1. You assign your posts to some arbitrary groups (e.g. all posts from a single day, or every 100 posts) by setting Order1. Within each group, posts are ordered according to the value in Order2.
If a new post must be inserted into some group you only need to renumber Order2 values for posts in this particular group - not all posts in table.
Of course when retrieving the rows you order by Order1, Order2.
So your table looks like this:
PostName Order1 Order2
------------------------
A 1 1
B 1 2
C 1 3
D 1 4
E 2 1
F 2 2
G 2 3
Now if you want to insert X between F and G you renumber only items in group Order1 = 2, so that rows become:
PostName Order1 Order2
------------------------
A 1 1
B 1 2
C 1 3
D 1 4
E 2 1
F 2 2
G 2 4
X 2 3
Now, if you want to insert Z between A and B you only renumber Order2 in posts in group Order1 = 1:
PostName Order1 Order2
------------------------
A 1 1
B 1 3
C 1 4
D 1 5
E 2 1
F 2 2
G 2 4
X 2 3
Z 1 2
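The two inserts above can be sketched against an in-memory SQLite database. The table name `posts`, the lowercase column names, and the helper `insert_after` are assumptions for illustration; the point is that the `UPDATE` only touches rows that share the target's Order1 value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (name TEXT PRIMARY KEY, order1 INTEGER, order2 INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
    ("A", 1, 1), ("B", 1, 2), ("C", 1, 3), ("D", 1, 4),
    ("E", 2, 1), ("F", 2, 2), ("G", 2, 3),
])

def insert_after(conn, new_name, after_name):
    """Insert new_name directly after after_name, renumbering only that group."""
    with conn:  # single transaction
        group, pos = conn.execute(
            "SELECT order1, order2 FROM posts WHERE name = ?", (after_name,)).fetchone()
        # Shift the later posts of the same group down by one slot.
        conn.execute(
            "UPDATE posts SET order2 = order2 + 1 WHERE order1 = ? AND order2 > ?",
            (group, pos))
        conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (new_name, group, pos + 1))

insert_after(conn, "X", "F")   # X between F and G
insert_after(conn, "Z", "A")   # Z between A and B

ordered = [r[0] for r in conn.execute("SELECT name FROM posts ORDER BY order1, order2")]
print(ordered)  # ['A', 'Z', 'B', 'C', 'D', 'E', 'F', 'X', 'G']
```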
If you're not going to order by some criterion (a date, for example), you will need something else to order by. It doesn't have to be the Id column: you could add a column that is not the PK and serves as an ordinalPosition.
So, when you insert at the end, you get the max ordinalPosition and use ordinalPosition+1.
If you want to insert between two posts, look up the ordinal positions of those two posts, increment by 1 the ordinalPosition of every post from the greater of the two onward, and then (now that there is a "hole" in the sequence) insert the new post with an ordinalPosition equal to the lesser of the two + 1 (which equals the old position of the greater one).
Then, you will get the posts ordered by your ordinalPosition:
Select fields from Posts
-- where your criteria goes here
order by ordinalPosition
Maybe you consider that's not a good way, because every time you insert a post between two others you have to do an update, but the DB is not magic: it has to order by some criterion. You also have to make sure no two posts have the same ordinal position, so you will probably want to add a unique constraint or something similar.
You're probably not going to insert among very old posts often, so I don't think there will be that many rows to update each time you add a post between two others.

yet another tsql question

I have three tables:
documents
attributes
attributevalues
Documents can have many attributes,
and these attributes have values in the attributevalues table.
What I want is a single query that returns all documents with their assigned attributes, one row per document
(I assume every document has the same attributes assigned; I don't need the complexity of different attributes for now).
For example:
docid attvalue1 attvalue2
1 2 2
2 2 2
3 1 1
How can I do that in a single query?
Off the top of my head, I don't think you can do this without dynamic SQL.
The crux of the Entity-Attribute-Value (EAV) technique (which is what you are using) is to store columns as rows. What you want to do is convert those rows back to columns for the purpose of this query. Using PIVOT makes this possible. However, PIVOT requires knowing the number of rows that need to be converted to columns at the time the query is written. So assuming you are using EAV because you need flexible attributes/values, you won't know this information when you write the query.
So the solution would be to use dynamic SQL in conjunction with PIVOT. Did a quick search and this looks promising (didn't really read the whole thing):
http://www.simple-talk.com/community/blogs/andras/archive/2007/09/14/37265.aspx
For the record, I am not a fan of dynamic SQL and would recommend finding another approach to the larger problem (e.g. pivoting in application code).
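Pivoting in application code could look roughly like this. The rows and attribute names are taken from the sample values the asker gives further down; the function name `pivot` and the shape of the result are assumptions. The query itself stays a plain `SELECT docid, attid, attvalue`, and the number of attributes does not need to be known when it is written.

```python
from collections import defaultdict

# (docid, attid, attvalue) rows as they would come back from attributevalues.
rows = [
    (1, 1, "accounting doc"), (1, 2, 0),
    (2, 1, "humen r doc"),    (2, 2, 1),
]
# attid -> name, as stored in the attributes table.
attribute_names = {1: "subject", 2: "recursive"}

def pivot(rows, attribute_names):
    """Turn one-row-per-attribute data into one dict per document."""
    docs = defaultdict(dict)
    for docid, attid, value in rows:
        docs[docid][attribute_names[attid]] = value
    return dict(docs)

result = pivot(rows, attribute_names)
print(result[1])  # {'subject': 'accounting doc', 'recursive': 0}
```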
If you know all the attributes (and their IDs) at design-time:
SELECT d.docid,
       a1.attvalue AS attvalue1,
       a2.attvalue AS attvalue2
FROM documents d
JOIN attributevalues a1 ON d.docid = a1.docid
JOIN attributevalues a2 ON d.docid = a2.docid
WHERE a1.attrid = 1
  AND a2.attrid = 2
If you don't, things get quite a bit messier and difficult to answer without knowing your schema.
Let's make an example.
Columns of the documents table:
docid,docname,createddate,createduser
and values
1 account.doc 10.10.2010 aeon
2 hr.doc 10.11.2010 aeon
Columns of the attributes table:
attid,name,type
and values
1 subject string
2 recursive int
Columns of the attributevalues table:
attvalueid,docid,attid,attvalue(sql_variant)
and values
1 1 1 "accounting doc"
1 1 2 0
1 2 1 "humen r doc"
1 2 2 1
and I want this query result:
docid,name,atribvalue1,atribvalue2,...,atribvalueN
1 account.doc "accounting doc" 0
2 hr.doc "humen r doc" 1
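With this example data and the attribute IDs known in advance, one alternative to the self-joins is conditional aggregation, which is a different technique from the PIVOT and join approaches in the answers. The sketch below runs on an in-memory SQLite database, not SQL Server; how `MAX` behaves on a `sql_variant` column in T-SQL is not verified here, and the column list is trimmed to what the result needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (docid INTEGER PRIMARY KEY, docname TEXT);
CREATE TABLE attributevalues (docid INTEGER, attid INTEGER, attvalue);
INSERT INTO documents VALUES (1, 'account.doc'), (2, 'hr.doc');
INSERT INTO attributevalues VALUES
    (1, 1, 'accounting doc'), (1, 2, 0),
    (2, 1, 'humen r doc'),    (2, 2, 1);
""")

# One output column per known attribute id; MAX ignores the NULLs
# produced by the CASE for rows that belong to other attributes.
query = """
SELECT d.docid, d.docname,
       MAX(CASE WHEN v.attid = 1 THEN v.attvalue END) AS attvalue1,
       MAX(CASE WHEN v.attid = 2 THEN v.attvalue END) AS attvalue2
FROM documents d
LEFT JOIN attributevalues v ON v.docid = d.docid
GROUP BY d.docid, d.docname
ORDER BY d.docid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```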

How do I create nested categories in a Database?

I am making a videos website where categories will be nested:
e.g. Programming -> C Language -> MIT Videos -> Video 1
Programming -> C Language -> Stanford Video -> Video 1
Programming -> Python -> Video 1
These categories and sub-categories will be created by users on the fly. I will need to show them as people create them in the form of a navigable menu, so that people can browse the collection easily.
Could someone please help me with how I can go about creating such a database?
Make a categories table with the following fields:
CategoryID - Integer
CategoryName - String/Varchar/Whatever
ParentID - Integer
Your ParentID will then reference back to the CategoryID of its parent.
Example:
CategoryID CategoryName ParentID
---------------------------------
1 Dog NULL
2 Cat NULL
3 Poodle 1
4 Dachshund 1
5 Persian 2
6 Toy Poodle 3
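To build a navigable menu from this table, a whole branch can be fetched with a recursive common table expression. This is a sketch on an in-memory SQLite database (recursive CTEs need SQLite 3.8.3 or later); the table name `categories` and the helper `descendants` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories ("
    "CategoryID INTEGER PRIMARY KEY, "
    "CategoryName TEXT NOT NULL, "
    "ParentID INTEGER REFERENCES categories(CategoryID))"
)
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
    (1, "Dog", None), (2, "Cat", None), (3, "Poodle", 1),
    (4, "Dachshund", 1), (5, "Persian", 2), (6, "Toy Poodle", 3),
])

def descendants(conn, category_id):
    """Return (name, depth) for a category and everything below it."""
    sql = """
    WITH RECURSIVE subtree(CategoryID, CategoryName, depth) AS (
        SELECT CategoryID, CategoryName, 0
        FROM categories WHERE CategoryID = ?
        UNION ALL
        SELECT c.CategoryID, c.CategoryName, s.depth + 1
        FROM categories c JOIN subtree s ON c.ParentID = s.CategoryID
    )
    SELECT CategoryName, depth FROM subtree ORDER BY depth, CategoryName
    """
    return conn.execute(sql, (category_id,)).fetchall()

dog_branch = descendants(conn, 1)
print(dog_branch)  # [('Dog', 0), ('Dachshund', 1), ('Poodle', 1), ('Toy Poodle', 2)]
```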
Quassnoi said:
You should use either nested sets or parent-child models.
I have implemented both of them. What I can say is:
Use the nested set architecture if your categories table doesn't change often, because SELECT queries are fast and a single request can fetch the whole branch of the hierarchy for a given entry. But an INSERT or UPDATE takes more time than in a parent-child model, because the left and right (or lower and upper in the example below) fields have to be updated.
Another point, quite trivial I must admit, but:
It's very difficult to change the hierarchy by hand directly in the database (which could happen during development). So be sure to first implement an interface for manipulating the nested set (changing a parent node, moving a branch node, deleting a node or a whole branch, etc.).
Here are two articles on the subject:
Storing Hierarchical Data in a Database
Managing Hierarchical Data in MySQL
Last thing: I haven't tried it, but I read somewhere that a nested set table can hold more than one tree, i.e. several roots.
You should use either nested sets or parent-child models.
Parent-child:
typeid parent name
1 0 Buyers
2 0 Sellers
3 0 Referee
4 1 Electrical
5 1 Mechanic
SELECT *
FROM mytable
WHERE group IN
(
SELECT typeid
FROM group_types
START WITH
typeid = 1
CONNECT BY
parent = PRIOR typeid
)
will select all buyers in Oracle.
Nested sets:
typeid lower upper Name
1 1 2 Buyers
2 3 3 Sellers
3 4 4 Referee
4 1 1 Electrical
5 2 2 Mechanic
SELECT *
FROM group_types
JOIN mytable
ON group BETWEEN lower AND upper
WHERE typeid = 1
will select all buyers in any database.
See this answer for more detail.
Nested sets are easier to query, but harder to update, and it is harder to build a tree structure from them.
From the example in your question it looks like you'd want it to be possible for a given category to have multiple parents (e.g., "MIT Videos -> Video 1 Programming" as well as "Video -> Video 1 Programming"), in which case simply adding a ParentID column would not be sufficient.
I would recommend creating two tables: a simple Categories table with CategoryID and CategoryName columns, and a separate CategoryRelationships table with ParentCategoryID and ChildCategoryID columns. This way you can specify as many parent-child relationships as you want for any particular category. It would even be possible using this model to have a dual relationship where two categories are each other's parent and child simultaneously. (Off the top of my head, I can't think of a great use for this scenario, but at least it illustrates how flexible the model is.)
What you need is a basic parent-child relationship:
Category (ID: int, ParentID: nullable int, Name: nvarchar(1000))
A better way to store the parent_id of the table is to have it nested within the ID
e.g
100000 Programming
110000 C Language
111000 Video 1 Programming
111100 C Language
111110 Stanford Video
etc., so all you need is a script to process the ID, such that the first digit represents the top-level category and so on as you go deeper down the hierarchy.
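Such a script might look like the sketch below. It assumes fixed-width six-digit IDs where each level uses one digit from 1 to 9 and unused levels are padded with zeros; the function names are hypothetical. Under those assumptions a category can have at most nine children and the tree at most six levels.

```python
def ancestors(category_id, width=6):
    """Return the IDs of all ancestors, top-level first."""
    digits = str(category_id).zfill(width)
    depth = len(digits.rstrip("0"))  # number of levels actually used
    return [int(digits[:level].ljust(width, "0")) for level in range(1, depth)]

def subtree_range(category_id, width=6):
    """Return the (low, high) ID range covering a category and its descendants."""
    digits = str(category_id).zfill(width)
    depth = len(digits.rstrip("0"))
    return category_id, category_id + 10 ** (width - depth) - 1

print(ancestors(111100))      # [100000, 110000, 111000]
print(subtree_range(110000))  # (110000, 119999)
```

The range form means a whole branch could be selected with a single `BETWEEN` on the ID column.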
