Need help for DB structure - database

In my application, users can post questions and comments. Each question has many comments, and each comment belongs to one question. The application may have millions of users. We need to extract tags from both the questions and the comments.
We plan to use a DB structure like this:
questions table (id, question)
comments table (id, comment)
tags table (id, tag_name)
tags_questions_comments (id, tag_id, question_id, comment_id)
I am not sure whether this is correct. Sample rows for tags_questions_comments:
id  tag_id  question_id  comment_id
--------------------------------------
1   1       1            NULL        -> tag from question
2   2       1            NULL
3   3       1            1           -> tag from comment
Thanks

I would use two tables instead of:
tags_questions_comments (id, tag_id, question_id, comment_id)
One holds the tag-to-question pairs:
tags_questions (tag_id, question_id)
And another one links the comments to their questions:
questions_comments (question_id, comment_id)
unless you add a questionId column to the comments table, which could be a better approach depending on your needs.
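A minimal sketch of that layout, using Python's sqlite3 module with an in-memory database. It takes the "question_id column on comments" variant. The tags_comments table and the sample text are assumptions added here so that tags extracted from comments have somewhere to live; the answer above does not name such a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (id INTEGER PRIMARY KEY, question TEXT NOT NULL);
CREATE TABLE comments (
    id          INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    comment     TEXT NOT NULL
);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE);
CREATE TABLE tags_questions (
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    question_id INTEGER NOT NULL REFERENCES questions(id),
    PRIMARY KEY (tag_id, question_id)
);
-- Assumption: hypothetical table, not named in the answer above.
CREATE TABLE tags_comments (
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    comment_id INTEGER NOT NULL REFERENCES comments(id),
    PRIMARY KEY (tag_id, comment_id)
);
-- Same shape as the sample rows in the question: tags 1 and 2 come from
-- question 1, tag 3 comes from comment 1 on that question.
INSERT INTO questions VALUES (1, 'How do I index a table?');
INSERT INTO comments VALUES (1, 1, 'Try a composite index.');
INSERT INTO tags VALUES (1, 'sql'), (2, 'index'), (3, 'performance');
INSERT INTO tags_questions VALUES (1, 1), (2, 1);
INSERT INTO tags_comments VALUES (3, 1);
""")

def tags_for_question(question_id):
    """Tags attached to a question directly or through one of its comments."""
    rows = conn.execute("""
        SELECT t.tag_name
        FROM tags t
        JOIN tags_questions tq ON tq.tag_id = t.id
        WHERE tq.question_id = ?
        UNION
        SELECT t.tag_name
        FROM tags t
        JOIN tags_comments tc ON tc.tag_id = t.id
        JOIN comments c ON c.id = tc.comment_id
        WHERE c.question_id = ?
        ORDER BY 1
    """, (question_id, question_id)).fetchall()
    return [r[0] for r in rows]

print(tags_for_question(1))
```

With two narrow junction tables, neither one needs a nullable column to say where a tag came from.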

Related

How to Select a record in Access if it is not a duplicate value

Ok, from the title it seems to be impossible to understand, I'll try to be as clear as possible.
Basically, I have a table, let's call it 'records'. In this table I have some products, of which I store 'id', 'codex' (which is a unique identifier for a certain product in the whole database), 'price' and 'situation'. This last one is a string which tells me whether the product has just entered the store (in that case it is set to 'IN'), or it has already been sold ('OUT' in this case).
The database was not created by us, I HAVE to work with that although it is horribly structured... The guy who originally designed the database decided to record a product's change from 'IN' to 'OUT' in the following way: instead of UPDATEing the corresponding value in the table, he would take the row with 'IN' as its situation and DUPLICATE it, this time setting 'OUT' as the situation.
Just to sum up: if a product has not been sold yet, it will have one row of dedicated data; otherwise those rows will be two, identical except for the 'situation' field.
What I need to do is: select a product if (and ONLY if) there is no duplicate for it. Basically, I can (and should) look for a 'codex', and if Count(codex) for it ends up being >1, I do not select the row.
I hope the explanation of the process is clear enough...
I tried many alternatives (no, SELECT DISTINCT is not a solution): does anyone have an idea of how to do that? Because really, none of the three of us could come up with a good solution!
Here is the schema for the table, I hope it is sufficiently clear, and if not do not hesitate asking for more details.
Just as a reminder: the project is in (sigh...) VB.net, the database is in Microsoft Access (mdb).
I could not find a solution on StackOverFlow, I hope this is not a duplicate question! Thanks in advance for the help.
id  codex  price  situation
1   1      2.50   IN
2   1      2.50   OUT
3   2      3.45   IN
4   3      21.50  IN
5   2      3.45   OUT
6   4      1.50   IN
To check that I understand your problem: in your example table you just want the rows with IDs 4 and 6, right?
If that is what you want, and you only want the unsold products, try this command:
SELECT *
FROM records
WHERE codex NOT IN (
    SELECT codex
    FROM records
    WHERE situation = 'OUT'
)
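As a quick check, the query can be run against the sample rows from the question. This sketch uses Python's sqlite3 module as a stand-in for Access (an assumption made only to keep the example self-contained; the original project uses VB.net and an .mdb file). It also tries a GROUP BY / HAVING variant, which is closer to the Count(codex) idea in the question and is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, codex INTEGER, price REAL, situation TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2.50, "IN"),
        (2, 1, 2.50, "OUT"),
        (3, 2, 3.45, "IN"),
        (4, 3, 21.50, "IN"),
        (5, 2, 3.45, "OUT"),
        (6, 4, 1.50, "IN"),
    ],
)

# The query from the answer: drop every codex that has an 'OUT' row.
not_sold = [row[0] for row in conn.execute("""
    SELECT id
    FROM records
    WHERE codex NOT IN (
        SELECT codex FROM records WHERE situation = 'OUT'
    )
    ORDER BY id
""")]

# Alternative: keep only the codex values that appear exactly once.
single_row = [row[0] for row in conn.execute("""
    SELECT MIN(id) AS id
    FROM records
    GROUP BY codex
    HAVING COUNT(*) = 1
    ORDER BY id
""")]

print(not_sold, single_row)
```

Both forms agree on this data. They would differ if a codex ever had an 'OUT' row without its 'IN' row, or if codex could be NULL, since NOT IN behaves unexpectedly when the subquery returns NULLs.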

Database design for multiple similar types?

Say I have two question types: Multiple Choice and Range. A Range question allows users to answer by specifying a range of values in their answer (1-10 or 2-4 for example).
I inherited a database where the answers to these question types are stored in the same table which is structured like so:
Answers
-------
Id
QuestionId
choice
range_from
range_to
This results in data like below:
1 1 null 1 10
2 1 null 2 4
3 2 Pants null null
4 2 Hat null null
Does it make sense to include columns from every answer type in the answer table? Or should they be broken out into separate tables?
This is a very slimmed-down version of my real database. In reality there are about 8 question types, so with every answer there are several columns that are left unused.
Does it make sense to include columns from every answer type in the answer table?
This is the "all classes in the same table" strategy for implementing inheritance, which is suitable for a small number of classes. As the number of classes grows, you might consider one of the other strategies. There is no predefined "cut-off point" for that; you'll have to measure and decide for yourself.
The alternative would be an EAV-like system as proposed by blotto, but that would shift the enforcement of data consistency away from the DBMS. This is a valid solution if you don't know the structure of the data at design-time and want to avoid DDL at run-time, but if you do know the structure of the data at design-time, you are better off sticking with inheritance.
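A minimal sketch of what "enforcement by the DBMS" can look like when all answer types share one table, written with Python's sqlite3 module. The answer_type discriminator column and the CHECK rules are assumptions added for illustration; the inherited table in the question has neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE answers (
        id          INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL,
        -- Hypothetical discriminator column, not in the original table.
        answer_type TEXT NOT NULL CHECK (answer_type IN ('choice', 'range')),
        choice      TEXT,
        range_from  INTEGER,
        range_to    INTEGER,
        -- Each type may fill only its own columns.
        CHECK (
            (answer_type = 'choice' AND choice IS NOT NULL
                 AND range_from IS NULL AND range_to IS NULL)
         OR (answer_type = 'range' AND choice IS NULL
                 AND range_from IS NOT NULL AND range_to IS NOT NULL
                 AND range_from <= range_to)
        )
    )
""")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a CHECK rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO answers "
                "(question_id, answer_type, choice, range_from, range_to) "
                "VALUES (?, ?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted_range = try_insert((1, "range", None, 1, 10))
accepted_choice = try_insert((2, "choice", "Pants", None, None))
rejected_mixed = try_insert((2, "choice", "Hat", 2, 4))  # choice plus a range
stored = conn.execute("SELECT COUNT(*) FROM answers").fetchone()[0]
print(accepted_range, accepted_choice, rejected_mixed, stored)
```

With about eight question types the CHECK expression grows accordingly, which is one practical signal for the "measure and decide" point above.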
You could have a single field that represents the 'type' of question; it seems best suited to the Question table (not the Answer table). For example:
question_type ENUM('choice', 'range', 'type_3', 'type_4', ...)
Then make a one-to-many link (a join table) that represents the Question-to-Answers relationship:
AnswerId (pk) | QuestionId (fk)
1             | 1
2             | 1
3             | 2
4             | 2
Finally, your Answer table is a collection of values for each Answer. It can designate each record more specifically by having its own ENUM:
answer_type ENUM('low_range', 'high_range', 'choice', etc)
Id (pk) | AnswerId (fk) | Type       | Value
1       | 1             | low_range  | 1
2       | 1             | high_range | 10
3       | 2             | low_range  | 2
4       | 2             | high_range | 4
5       | 3             | choice     | Pants
6       | 4             | choice     | Hat
This is much more scalable, and basically pivots the fields in your previous table into values in the answers table. So you can always add new 'Type's, both for questions and answers, without adding new fields to the schema.
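To see what reading this layout back involves, here is a sketch in Python's sqlite3 module that loads the rows above and pivots the range values back into one row per answer. The table names question_answers and answer_values are assumptions, and since SQLite has no ENUM type, a CHECK constraint stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table names; the answer above only shows the columns.
CREATE TABLE question_answers (
    answer_id   INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL
);
CREATE TABLE answer_values (
    id        INTEGER PRIMARY KEY,
    answer_id INTEGER NOT NULL REFERENCES question_answers(answer_id),
    type      TEXT NOT NULL
              CHECK (type IN ('low_range', 'high_range', 'choice')),
    value     TEXT NOT NULL  -- every value is stored as text
);
INSERT INTO question_answers VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO answer_values VALUES
    (1, 1, 'low_range', '1'), (2, 1, 'high_range', '10'),
    (3, 2, 'low_range', '2'), (4, 2, 'high_range', '4'),
    (5, 3, 'choice', 'Pants'), (6, 4, 'choice', 'Hat');
""")

# Pivot the two range rows of each answer back into columns.
ranges = conn.execute("""
    SELECT qa.answer_id,
           CAST(MAX(CASE WHEN v.type = 'low_range'  THEN v.value END) AS INTEGER),
           CAST(MAX(CASE WHEN v.type = 'high_range' THEN v.value END) AS INTEGER)
    FROM question_answers qa
    JOIN answer_values v ON v.answer_id = qa.answer_id
    WHERE qa.question_id = ?
    GROUP BY qa.answer_id
    ORDER BY qa.answer_id
""", (1,)).fetchall()

print(ranges)
```

The cast and the pivot are the price of the flexibility: the database no longer knows that a range has exactly two integer ends, so the query (or the application) has to reassemble and validate that.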

relational database design for multidimensional matrix questions

I am designing a relational DB for an online survey.
However, I am not sure what is the best relational database design for storing multidimensional matrix questions.
Let's say, I have the following question (sorry, it does not allow me to insert HTML table):
What was your experience of...
         | Not friendly | (2) | Very friendly | Length of stay | Visited in the last year? |
--------------------------------------------------------------------------------------------
Sydney   | radio button | rb  | rb            | drop down      | check box                 |
--------------------------------------------------------------------------------------------
New York | rb           | rb  | rb            | drop down      | check box                 |
--------------------------------------------------------------------------------------------
London   | rb           | rb  | rb            | drop down      | check box                 |
--------------------------------------------------------------------------------------------
Do you think I should do something along the following lines or is there a better way?
To hold all the questions:
Question
    questionID
    question
QuestionMatrix2d
    matrix2dID
    questionID
    subquestionID
    subquestion
QuestionMatrix
    questionID
    matrix2dID
    question_parentID
And to hold all the responses:
QuestionResponse
    questionID
    response_code
QuestionMatrix2dResponse
    questionID
    subquestionID
    response_code
Thank you for your help.
I disagree with ryan1234. This totally is a relational problem, and there is very little reason not to put it into a database.
I have to do a bit of guesswork, though, about what you're trying to achieve here. You have an online survey, so I assume it will be used by more than one person. Your database will need to accommodate that by having a session or user table. I'll go with the latter since it is clearer to read.
Secondly, you have a list of locations (Sydney, New York, London). I assume this list can either change over time or even from one questionnaire to the next.
Then you have a set of questions. You don't explicitly state whether these are variable or fixed. Since you designed a set of tables for them, I assume they are supposed to be variable. Please note that your questions are not a matrix, but a list. Even if they are hierarchical, they still do not compose a matrix.
Last but not least, you've got answers to those questions.
Let's create a users table:
user_id  user_name
1        me
2        somebody else
The second table is just as simple: locations
location_id  location_name
1            Sydney
2            New York
3            London
Third table is a bit more complicated - and to be honest: just plain ugly. But this is what you get if you design a database in a database, and the alternatives (using DDL or storing that information in XML/JSON or even outside the database) are not pretty either. If there is a hierarchical question (your examples don't show them), you could add a "parent_question_id" column.
question_id  question_text      question_type  question_type_info
1            How do you rate    RADIO          0 to 5
2            Length of stay     COMBOBOX       1 day, 2 days, whatever
3            Visited last year  CHECKBOX
Finally you need a fourth table to store all the answers
user_id  location_id  question_id  value
1        1            1            2      <-- value here means "rating of 2"
1        1            2            5      <-- value here means "5 days"
1        1            3            1      <-- value here means "yes, visited last year"
Yep. ugly as well. If you had a fixed list of questions I could provide you with a pretty database :)
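A minimal sketch of these four tables in Python's sqlite3 module, loaded with the sample rows above and followed by one query that reads a user's answers back with readable labels. The table name answers, the column types, and the composite primary key are assumptions; the answer only gives column names and sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE locations (
    location_id   INTEGER PRIMARY KEY,
    location_name TEXT NOT NULL
);
CREATE TABLE questions (
    question_id        INTEGER PRIMARY KEY,
    question_text      TEXT NOT NULL,
    question_type      TEXT NOT NULL,
    question_type_info TEXT
);
-- Assumed key: one answer per user, location and question.
CREATE TABLE answers (
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    location_id INTEGER NOT NULL REFERENCES locations(location_id),
    question_id INTEGER NOT NULL REFERENCES questions(question_id),
    value       INTEGER NOT NULL,
    PRIMARY KEY (user_id, location_id, question_id)
);
INSERT INTO users VALUES (1, 'me'), (2, 'somebody else');
INSERT INTO locations VALUES (1, 'Sydney'), (2, 'New York'), (3, 'London');
INSERT INTO questions VALUES
    (1, 'How do you rate', 'RADIO', '0 to 5'),
    (2, 'Length of stay', 'COMBOBOX', '1 day, 2 days, whatever'),
    (3, 'Visited last year', 'CHECKBOX', NULL);
INSERT INTO answers VALUES (1, 1, 1, 2), (1, 1, 2, 5), (1, 1, 3, 1);
""")

def answers_for_user(user_id):
    """Return (location, question, value) for every answer a user gave."""
    return conn.execute("""
        SELECT l.location_name, q.question_text, a.value
        FROM answers a
        JOIN locations l ON l.location_id = a.location_id
        JOIN questions q ON q.question_id = a.question_id
        WHERE a.user_id = ?
        ORDER BY l.location_id, q.question_id
    """, (user_id,)).fetchall()

print(answers_for_user(1))
```

The meaning of value still depends on question_type, so the application has to interpret it per question, which is the "ugly" part mentioned above.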
Edit: In answer to your comments: to link your questions to a survey, you'll need a few more tables that define which questions are asked for which locations. The following database layout lets you specify a list of locations and a list of questions to ask, as well as a survey name.
Table surveys:
survey_id survey_name
1 Spring 2013 London Travel Survey
2 Spring 2013 Northern Hemisphere Short Survey
Table survey_questions:
survey_id question_id
1 1
1 2
1 3
2 1
Table survey_locations:
survey_id location_id
1 3
2 2
2 3
The contents I put in here give you two surveys. Survey #1 asks all three questions about just one location: London. Survey #2 asks just one question about both London and New York. If you want to ask different questions about different locations, your table layout will have to accommodate that, but such a system won't fit into your original table-like layout.
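A sketch of how the survey tables could be combined into the grid a survey page has to render, using Python's sqlite3 module. The location ids in survey_locations are chosen to follow the prose description (survey 1 covers London, survey 2 covers New York and London); the survey_grid helper and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, location_name TEXT);
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, question_text TEXT);
CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, survey_name TEXT);
CREATE TABLE survey_questions (
    survey_id   INTEGER REFERENCES surveys(survey_id),
    question_id INTEGER REFERENCES questions(question_id),
    PRIMARY KEY (survey_id, question_id)
);
CREATE TABLE survey_locations (
    survey_id   INTEGER REFERENCES surveys(survey_id),
    location_id INTEGER REFERENCES locations(location_id),
    PRIMARY KEY (survey_id, location_id)
);
INSERT INTO locations VALUES (1, 'Sydney'), (2, 'New York'), (3, 'London');
INSERT INTO questions VALUES
    (1, 'How do you rate'), (2, 'Length of stay'), (3, 'Visited last year');
INSERT INTO surveys VALUES
    (1, 'Spring 2013 London Travel Survey'),
    (2, 'Spring 2013 Northern Hemisphere Short Survey');
INSERT INTO survey_questions VALUES (1, 1), (1, 2), (1, 3), (2, 1);
INSERT INTO survey_locations VALUES (1, 3), (2, 2), (2, 3);
""")

def survey_grid(survey_id):
    """Every (location, question) cell that a survey asks about."""
    return conn.execute("""
        SELECT l.location_name, q.question_text
        FROM survey_locations sl
        JOIN survey_questions sq ON sq.survey_id = sl.survey_id
        JOIN locations l ON l.location_id = sl.location_id
        JOIN questions q ON q.question_id = sq.question_id
        WHERE sl.survey_id = ?
        ORDER BY l.location_id, q.question_id
    """, (survey_id,)).fetchall()

print(survey_grid(2))
```

Because the two link tables are joined only on survey_id, every question is asked for every location of the survey. That is exactly the limitation noted above: per-location question lists would need a different layout.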
Having done things similar to this, I would recommend considering not turning this into a relational problem. What if you have objects and just serialize them to something like JSON and store that?
Doing this relationally you'll end up spending quite a bit of time making tables and wiring together complex drawing code in your application to make sure the questions/answers draw in the right order etc.
Otherwise I think you can make your approach work. There is no silver bullet for designing survey stuff in an RDBMS.
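For comparison, a rough sketch of the serialize-and-store idea using Python's json and sqlite3 modules. The survey_responses table and every key name in the document are assumptions for illustration; the answer above does not specify a format.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey_responses ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT NOT NULL)"
)

# Hypothetical shape for one user's answers to the matrix question.
response = {
    "question": "What was your experience of...",
    "rows": {
        "Sydney": {
            "friendliness": 2,
            "length_of_stay": "5 days",
            "visited_last_year": True,
        },
    },
}

# Store the whole response as one JSON string.
conn.execute(
    "INSERT INTO survey_responses (user_id, body) VALUES (?, ?)",
    (1, json.dumps(response)),
)

# Read it back and rebuild the Python object.
(stored,) = conn.execute(
    "SELECT body FROM survey_responses WHERE user_id = ?", (1,)
).fetchone()
restored = json.loads(stored)
print(restored["rows"]["Sydney"]["friendliness"])
```

The trade-off is the one argued in the other answer: the database can no longer check or easily query individual answers, because it only sees an opaque string.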

How to design the schema for something like StackOverflow questions tags?

I have 3 plans:
1. In the questions table:
question
------------------------------------
id  title  content  ...  tags
------------------------------------
1   aaa    bbb      ...  tag1,tag2,tag3  (tags separated by commas)
2. In a tags table, still comma-separated:
tags
------------------------------------
id  tag
------------------------------------
1   tag1,tag2,tag3  (tags separated by commas)
3. In a tags table, one tag per row:
tags
------------------------------------
id  tag
------------------------------------
1   tag1
2   tag2
3   tag3
I think that plan 3 is better, but what's your opinion?
Any other good ideas for this implementation?
Thanks for the help :)
These patterns are called mysqlicious, scuttle and toxi (from the least to the most normalized).
They all have their benefits and drawbacks. You can read quite a good analysis here:
http://forge.mysql.com/wiki/TagSchema (WayBackMachine Version)
Note that mysqlicious heavily depends on your database's ability to perform FULLTEXT searches efficiently.
This means that for MySQL with InnoDB and for some other systems it's very impractical.
The relationship between tags and content is many-to-many. What this means is that one tag can be associated with several units of content, and one unit of content can be associated with several tags.
To implement this in a database, you can use an auxiliary table called ContentTags. The relationship of Content to ContentTags is one-to-many; the relationship of Tags to ContentTags is one-to-many.
#Tags Table
Id Text
1 'Tag1'
2 'Tag2'
3 'Tag3'
#Content Table
Id Content
1 "some content"
2 "other content"
3 "more content"
#ContentTags Table
ContentId TagId
1 1
1 2
2 1
2 2
2 3
3 1
As you can see, the relationship is clearly reflected (content 1 is associated with tags 1 and 2; content 2 is associated with tags 1, 2, and 3; content 3 is only associated with tag 1)
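The three tables above can be tried out directly. This sketch loads the same rows with Python's sqlite3 module and shows the two lookups a tag system usually needs; the helper function names and the lowercase table names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, text TEXT NOT NULL UNIQUE);
CREATE TABLE content (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE content_tags (
    content_id INTEGER NOT NULL REFERENCES content(id),
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (content_id, tag_id)  -- a tag is attached at most once
);
INSERT INTO tags VALUES (1, 'Tag1'), (2, 'Tag2'), (3, 'Tag3');
INSERT INTO content VALUES
    (1, 'some content'), (2, 'other content'), (3, 'more content');
INSERT INTO content_tags VALUES
    (1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1);
""")

def tags_of(content_id):
    """All tag texts attached to one unit of content."""
    rows = conn.execute("""
        SELECT t.text
        FROM content_tags ct
        JOIN tags t ON t.id = ct.tag_id
        WHERE ct.content_id = ?
        ORDER BY t.id
    """, (content_id,))
    return [r[0] for r in rows]

def content_with(tag_text):
    """Ids of all content units carrying a given tag."""
    rows = conn.execute("""
        SELECT ct.content_id
        FROM content_tags ct
        JOIN tags t ON t.id = ct.tag_id
        WHERE t.text = ?
        ORDER BY ct.content_id
    """, (tag_text,))
    return [r[0] for r in rows]

print(tags_of(2), content_with("Tag1"))
```

Both lookups are plain equality joins, so ordinary indexes can serve them without any full-text search.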
Depends on how normalized you want your data to be.
Firstly, I cringe when I see an "id" column in a table that isn't unique. At least rename the column to "question_id".
Secondly, it depends on whether you want a quick listing of all tags defined. If so, you'd want a separate tag table defining the set of possible tags, and then an intermediate table between questions and tags that provides a many-to-many association.
The correct approach is to create a one-to-many relation, that is, one comment with multiple tags. From Wikipedia:
In database technology, a one-to-many (also known as to-many) relationship occurs when one entity is related to many occurrences in another entity. For example, one club has many members.
And the main concept in database design is database normalization.
So I'd do it like this.
comments
------------------------------------
id_comment title content
------------------------------------
12 aaa bbb
tags
------------------------------------
id_tag comment_id tag
------------------------------------
1 12 tag1
2 12 tag2
3 12 tag3

Why are comment ids in Wordpress continuous (unique)?

I notice that in WP blogs the comments are ordered by continuous id (id=203, id=204 and so on).
While I understand the class names "parent" and "child" (used to sort replies into threads), I can't figure out why the ids are continuous. Maybe to hide/show specific comments?
You need to improve your question.
Comments have UNIQUE IDs because they are stored in their own table, and a one-to-many relationship links Posts to Comments, so a single 'global' counter numbers all of them.
Comment ID 1 => Post ID 1
Comment ID 2 => Post ID 1
Comment ID 3 => Post ID 2
And so on: post 1 has two comments, post 2 has one.
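The 'global' counter behaviour can be reproduced with any auto-incrementing key. This is a sketch in Python's sqlite3 module with made-up table and column names; it is not the actual WordPress schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- one counter for the table
        post_id    INTEGER NOT NULL,                   -- which post it belongs to
        body       TEXT NOT NULL
    )
""")

# Comments arrive in this order, on two different posts.
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(1, "first!"), (1, "nice post"), (2, "thanks")],
)

ids = conn.execute(
    "SELECT comment_id, post_id FROM comments ORDER BY comment_id"
).fetchall()
per_post = conn.execute(
    "SELECT post_id, COUNT(*) FROM comments GROUP BY post_id ORDER BY post_id"
).fetchall()
print(ids, per_post)
```

The ids keep counting across posts because the counter belongs to the comments table, not to any single post.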
If you meant "unique ID in the HTML":
Every ID in HTML has to be unique.
Even if there are no links to the separate comments on the page itself, like on the page which you linked to, IDs are still very handy when someone else wants to point to a particular comment from within another page. Like, when I want to point someone to the comment of Jeremy regarding email updates, I can simply do so.

Resources