How to organize the data in application database? - google-app-engine

My application should contain list of questions + user answers. How should I organize the database:
            question1  question2  question3  ...  questionN
user_id_1   yes        no         yes        ...  yes
user_id_2   no         no         yes        ...  no
...
user_id_N   yes        yes        yes        ...  yes
It looks like I need to create a separate table for the questions and assign an id to each question. What should the other table look like, since the number of columns is not fixed? Or should I have two more tables?
Later on I will also need to:
calculate how many users answered 'yes' to questionN;
calculate how many friends (another table or JSON data) of *user_id_N* answered 'yes' to questionN.
Should I query the database each time to get these numbers, or should I keep separate counters and update them each time a user answers (this looks possible for item 1 only, since a friends list can change at any time)?

A standard way to do this is to store each answer as a separate entity - conceptually the same as you depict, but without the requirement to modify your structure as you add new questions. Here's an example set of model definitions that achieves this:
from google.appengine.ext import db

class UserInfo(db.Model):
    # Anything you want to store about the user
    pass

class Question(db.Model):
    text = db.TextProperty(required=True)
    # Anything else you want to store about the question

class Answer(db.Model):
    user = db.ReferenceProperty(UserInfo, required=True)
    question = db.ReferenceProperty(Question, required=True)
    answer_text = db.TextProperty(required=True)

If you are only storing what each user answered, three tables should be enough: one for the questions, with a one-to-many relationship to a table of answers, and one for the users, with a one-to-many relationship to the answers they provided.

Using a separate table to keep track of the questions themselves might be a good idea. By the way, if you weren't simply leaving out a header for it, your list of user IDs should itself be a specific column in the answer table. It would probably be a good idea to use a separate table to keep track of who is friends with who, though.
Also, while I'm not experienced with accessing a GAE datastore, it's fairly simple to take a count of specific answers in a single column using SQL, at least. SELECT COUNT(questionN) FROM AnswerTable WHERE questionN='yes' would be what you'd use as an SQL query.
Note that if you went with Limey's suggestion for the design, the equivalent SQL query would be more like SELECT COUNT(answer) FROM AnswerTable WHERE questionID='questionN' AND answer='yes'.
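The two counts asked about in the question can be sketched against the one-row-per-answer design. This is a minimal illustration using Python's built-in sqlite3 rather than the GAE datastore; the table and column names (question, answer, friendship) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
    CREATE TABLE answer (
        user_id     INTEGER NOT NULL,
        question_id INTEGER NOT NULL REFERENCES question(id),
        answer      TEXT NOT NULL,
        PRIMARY KEY (user_id, question_id)  -- one answer per user per question
    );
    -- hypothetical friends table: one row per (user, friend) pair
    CREATE TABLE friendship (user_id INTEGER NOT NULL, friend_id INTEGER NOT NULL);
""")
conn.execute("INSERT INTO question VALUES (1, 'Do you like tea?')")
conn.executemany("INSERT INTO answer VALUES (?, ?, ?)",
                 [(1, 1, "yes"), (2, 1, "no"), (3, 1, "yes")])
conn.executemany("INSERT INTO friendship VALUES (?, ?)", [(1, 2), (1, 3)])

def count_yes(question_id):
    """Number of users who answered 'yes' to the given question."""
    row = conn.execute(
        "SELECT COUNT(*) FROM answer WHERE question_id = ? AND answer = 'yes'",
        (question_id,)).fetchone()
    return row[0]

def count_friends_yes(user_id, question_id):
    """Number of friends of user_id who answered 'yes' to the given question."""
    row = conn.execute(
        """SELECT COUNT(*) FROM answer AS a
           JOIN friendship AS f ON f.friend_id = a.user_id
           WHERE f.user_id = ? AND a.question_id = ? AND a.answer = 'yes'""",
        (user_id, question_id)).fetchone()
    return row[0]

print(count_yes(1))             # users 1 and 3 answered 'yes'
print(count_friends_yes(1, 1))  # of user 1's friends (2 and 3), only 3 said 'yes'
```

Because the friend count is computed at query time, it stays correct when the friends list changes, which is the case the question worried a stored counter could not handle.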

Related

SQL Questionaire Database Design (EAV Model) Issue

I'm building a friendship site where I try to match users who share similar interests.
I have 25 questions with predefined (drop-down) answers that the user must fill out.
I'm using an entity–attribute–value model to store the user's id, the question id, and the id of the answer the user selects.
I then use the COUNT function to see which users have the most matches to my profile.
Current Table Structure
Question Table
Answer Table
Question_Answer_User Table
The problem I'm running into is that I have two more questions and I'm not sure where the best place to store them is.
The first question is: what is your country?
The second question is: what is your state?
I'm not sure if I should store them with the other 25 questions or in three separate tables, as seen below.
country table
state table
user_country_state Table
There are going to be a lot of answer entries for these two questions. For example, there are 25 countries and a total of 900 states/provinces that the user can choose from.
I want to be able to count the user's location as a similarity, but I'm not sure what the best approach to incorporate this is.
I think the selected country and state should live in the user table, along with the other necessary user information such as name and email address. I don't think it belongs in the Answers table, but it would work there.
For the list of options the user selects from when setting up an account, your predefined Q and A tables are as good a place as any. It depends, I guess, on how your data and functionality are broken apart, so that the Q&A tables don't cross responsibility boundaries by storing both survey-type answers and user-setup answers.
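The match counting described in the question can be sketched as a self-join on the Question_Answer_User table. This is only an illustration with Python's sqlite3; the column names and sample rows are assumptions, and if country and state were stored as two extra question ids they would be counted by the same query without any change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE question_answer_user (
        user_id     INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        answer_id   INTEGER NOT NULL,
        PRIMARY KEY (user_id, question_id)
    )""")
# Sample data: three users answering three questions (ids are made up).
conn.executemany(
    "INSERT INTO question_answer_user VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (1, 3, 30),
     (2, 1, 10), (2, 2, 20), (2, 3, 31),
     (3, 1, 11), (3, 2, 20), (3, 3, 32)])

def best_matches(user_id):
    """Return (other_user_id, matches) pairs, most similar user first."""
    return conn.execute(
        """SELECT other.user_id, COUNT(*) AS matches
           FROM question_answer_user AS mine
           JOIN question_answer_user AS other
             ON other.question_id = mine.question_id
            AND other.answer_id   = mine.answer_id
            AND other.user_id    <> mine.user_id
           WHERE mine.user_id = ?
           GROUP BY other.user_id
           ORDER BY matches DESC, other.user_id""",
        (user_id,)).fetchall()

print(best_matches(1))  # user 2 shares two answers with user 1, user 3 shares one
```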

Should a table with only 1 field of useful data be its own table

Just got a question here about a database table. If the table only has a primary key (identity) and 1 column of useful data, is it okay to be its own table or should it be in the parent table as just the data?
The table stores security questions that the user sets up when they make their account and that are used to reset the password if they want to change it or have forgotten it. The table holds the ID of the question and the question string.
The reason I have it in its own table is that the same question could be used by many users, so why store the question many times in the parent table? That's my thinking; I just wanted a few others' opinions on this.
EDIT: The Security Questions are going to be input by my team, not the user themselves. The user will pick one of the questions to use.
I would suggest this sample design using bridge table:
A user can have multiple questions, each with its own answer, and the same question can be used by multiple users.
You should always try to prevent duplicates, which is why your solution is the best.
It will also keep your database smaller: an integer foreign key is smaller than a string.
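A bridge-table layout along these lines might look like the following sketch. It uses Python's sqlite3, and every table and column name here (app_user, security_question, user_security_answer) is hypothetical, since the original diagram is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Each question string is stored exactly once.
    CREATE TABLE security_question (id INTEGER PRIMARY KEY, text TEXT NOT NULL UNIQUE);
    -- Bridge table: which user picked which question, and their answer.
    CREATE TABLE user_security_answer (
        user_id     INTEGER NOT NULL REFERENCES app_user(id),
        question_id INTEGER NOT NULL REFERENCES security_question(id),
        answer      TEXT NOT NULL,
        PRIMARY KEY (user_id, question_id)
    );
""")
conn.executemany("INSERT INTO app_user VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.execute("INSERT INTO security_question VALUES (1, 'Name of your first pet?')")
# Two users pick the same question; only the small integer id is repeated.
conn.executemany("INSERT INTO user_security_answer VALUES (?, ?, ?)",
                 [(1, 1, "rex"), (2, 1, "tom")])

question_rows = conn.execute("SELECT COUNT(*) FROM security_question").fetchone()[0]
answer_rows = conn.execute("SELECT COUNT(*) FROM user_security_answer").fetchone()[0]
print(question_rows, answer_rows)
```

The composite primary key keeps each user to one answer per question, while the foreign keys reject answers that point at a question or user that does not exist.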

Designing tables in Amazon dynamodb

I am new to DynamoDB and I am confused about what my tables should look like.
I have read the best practices here (recommended for anyone who hasn't read them yet):
http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/BestPractices.html
And now I have some dilemmas that I think everyone who starts using DynamoDB will have.
First,
my tables: STUDENTS, TEAMS, PROJECTS
STUDENTS: id, age ...
TEAMS: id, student-1-id, student-2-id, current-project, prev-project, last-updated-on
PROJECTS: id, team-id, list of questions, list student1answers, list student2answers
some comments:
as you can see, I don't use a range key. Do I need to?
each answer is a JSON object of (question number, text, date inserted)
every student can be in multiple teams.
My dilemmas:
I want to get all the teams of a specific student that were updated after a specific date.
For now I am using two scan operations: one searches student-1-id and the second searches student-2-id.
**Is there a better way?**
I have thought about adding a new table, user-Battles: student-id, team-id,
so I can query the teams for a specific student and then batch_get_item all the teams.
But what about last-updated-on? How can I also filter by it inside batch_get_item?
When a project is over I don't use it anymore. What should I do with the old items?
Delete them? Move them to another table?
In the PROJECTS table, the only attributes that can be updated are the answers attributes,
so I am thinking of moving them to another table for performance.
Do I really need to move them if they are updated just twice (when student1 sends an answer and when student2 sends an answer, after which the project is old)?
If I create a new table for the answers, I will not have to store them in JSON format.
How would you design the tables? Please let me know.
Nice question with lots of details :)
If I had only one piece of advice, it would be:
keep in mind that, with NoSQL, it is not only OK but normal, even recommended, to de-normalize your data.
That said, for your "dilemma", your suggestion was pretty good. You should de-normalize with the date as the range_key. One way could be to add a table like this:
hash_key: student
range_key: date
team: team_id
But still, this is not perfect, as the table would keep on growing: each update inserts a new item, because it is not possible to edit a key. You would have to write your own cleanup code.
In DynamoDB, you do not have to worry about performance slowdown caused by "old" items (except for scans); this is the main strength of DynamoDB. Nonetheless, it is always good practice to keep data clean, but be consistent: if you start moving expired projects, move all of them, or you will end up not knowing where your data is.
Last suggestion: are you sure "ids" are the best thing to describe your objects? Most of the time, a name, date or any unique attribute makes a better key.
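To illustrate how the student/date table suggested above would answer "teams of a student updated after a date", here is a small in-memory stand-in written in plain Python. It does not call DynamoDB or any AWS library; the dict, the function names, and the sample ids and dates are all hypothetical and only mimic a lookup by hash key followed by a range condition on the date.

```python
from bisect import bisect_right
from collections import defaultdict

# Stand-in for the de-normalized table:
# hash key (student id) -> list of (date, team_id) items kept sorted by date.
team_updates = defaultdict(list)

def record_update(student_id, updated_on, team_id):
    """Insert a new item; keys are never edited, so every update adds a row."""
    items = team_updates[student_id]
    items.append((updated_on, team_id))
    items.sort()

def teams_updated_after(student_id, date):
    """Teams of one student with at least one update strictly after `date`."""
    items = team_updates.get(student_id, [])
    dates = [d for d, _ in items]
    start = bisect_right(dates, date)  # first item whose date is > `date`
    # A team may appear several times because old items are never overwritten.
    return sorted({team for _, team in items[start:]})

# ISO-formatted date strings sort in chronological order.
record_update("s1", "2012-05-01", "team-a")
record_update("s1", "2012-06-10", "team-b")
record_update("s1", "2012-06-15", "team-a")
record_update("s2", "2012-06-20", "team-c")

print(teams_updated_after("s1", "2012-06-01"))
```

The duplicate "team-a" entries show the growth problem mentioned above: without cleanup code, each update leaves the older item behind.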

Database design: Likes table [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 months ago.
I got two tables (for now):
Topic
Post (post is a comment for a topic)
I want to add the option to Like those objects.
So I thought about creating one Likes table and using an enum to indicate which kind of object was liked (along with the object's id, of course).
By the way, if I choose this option, should it be an enum or another table that lists all the object types:
id object_name
1 Topic
2 Post
Another option is to create a likes table for every object.
What is the best practice?
I think creating a separate table for each object is better.
I don't see what you gain by using only one table. You also can't use foreign keys properly with a single table.
I mean you can't add a single object_id column to your table, because you do not know which table it will point to. In this case you have to add two columns, topic_id and post_id, and one of the two will always be NULL.
Just create another table for the likes:
tbl_posts_likes (likeID, userID, postID, like = 1, unlike = -1)
then you could write a subquery like:
SELECT SUM(like) AS likeCount, SUM(unlike) AS unlikeCount
FROM tbl_posts_likes
WHERE postID = posts.postID
GROUP BY postID
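A runnable variant of this per-post likes table can be sketched with Python's sqlite3. It departs from the answer in one respect, which is an assumption of this sketch: instead of separate like and unlike columns (LIKE is an SQL keyword, so a column with that name needs quoting in most databases), it stores a single vote column holding 1 or -1. The posts table and the sample rows are also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE tbl_posts_likes (
        like_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES posts(post_id),
        vote    INTEGER NOT NULL CHECK (vote IN (1, -1)),  -- 1 = like, -1 = unlike
        UNIQUE (user_id, post_id)                          -- one vote per user per post
    );
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
conn.executemany(
    "INSERT INTO tbl_posts_likes (user_id, post_id, vote) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, -1), (1, 2, 1)])

def like_counts():
    """Return (post_id, like_count, unlike_count) for every post."""
    return conn.execute(
        """SELECT p.post_id,
                  SUM(CASE WHEN l.vote = 1  THEN 1 ELSE 0 END) AS like_count,
                  SUM(CASE WHEN l.vote = -1 THEN 1 ELSE 0 END) AS unlike_count
           FROM posts AS p
           LEFT JOIN tbl_posts_likes AS l ON l.post_id = p.post_id
           GROUP BY p.post_id
           ORDER BY p.post_id""").fetchall()

print(like_counts())
```

The LEFT JOIN keeps posts that nobody has voted on in the result, with both counts at zero.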
Depending on how you are tracking 'likes', I would recommend adding another table called likes to the following effect:
likes (like_id, like_type)
From this point, you would simply COUNT() the number of 'likes' for each like_type (topic/post) as each time someone likes either a topic or a post, a record would be inserted. However, if you plan to track 'likes' by user, you would need to add another column for user.
If you wanted to track likes on individual posts or topics, you would set up a table for each object and create a foreign key constraint on the topic or post ID.
topic_likes (tl_id, topic_id)
post_likes (pl_id, post_id)
The design above would create an entry for each like. If you are only concerned with the total number of likes for each object, you could set up something like so:
likes (like_id, like_type, likes)
A table for likes is overkill. How often does anyone need to check who liked whose post? Very rarely. It's better to just maintain counts of likes and unlikes in the post table, and keep a likes table only for archive and audit purposes. Basically, use the counts for all regular operations and the likes table for audit needs.
This avoids joins to calculate the number of likes. Frankly, the accuracy of likes and unlikes is not that crucial; 99% accuracy is good enough. Inconsistency between API calls may cause an issue, but that is very rare and happens only under high load.
I also find that operations like SUM on a database server are very costly; a JOIN followed by a SUM is painful and time-consuming. Instead, move the operation to the compute layer: do the ++ or -- in the API. That takes the load off the DB for operations like this.
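The counter approach from this answer could look roughly like the sketch below, again with Python's sqlite3. The table names (posts, likes_audit) and the like_post helper are hypothetical, and as a simplification the increment is done inside the UPDATE statement instead of in application code; the point is only that reads use the stored count while the audit table records who liked what.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id    INTEGER PRIMARY KEY,
        like_count INTEGER NOT NULL DEFAULT 0   -- read this for display
    );
    CREATE TABLE likes_audit (                  -- kept only for archive/audit
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, post_id)
    );
    INSERT INTO posts (post_id) VALUES (1);
""")

def like_post(user_id, post_id):
    """Record a like and bump the counter; return False for a repeated like."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("INSERT INTO likes_audit VALUES (?, ?)", (user_id, post_id))
            conn.execute(
                "UPDATE posts SET like_count = like_count + 1 WHERE post_id = ?",
                (post_id,))
        return True
    except sqlite3.IntegrityError:
        return False

def like_count(post_id):
    """Read the stored counter; no JOIN or SUM is needed."""
    row = conn.execute(
        "SELECT like_count FROM posts WHERE post_id = ?", (post_id,)).fetchone()
    return row[0]

first = like_post(1, 1)
second = like_post(2, 1)
repeat = like_post(1, 1)   # same user again: rejected by the audit table's key
print(like_count(1))
```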

Save multiple questions in one row

I need to design a schema for a database. My problem is that I have multiple questions that belong to one exam; that is, one exam has multiple questions. I don't know how to solve that. I tried to fix it with a table between "tabQuestions" and "tabTest", but it doesn't seem to be the correct approach.
I have the following tables:
tabTest: ID, Name, FK_Categorie, FK_Questions
tabQuestions: ID, Question, FK_Answer
tabAnswers: ID, Answer, FK_Solution
tabSolution: ID, Solution
Thank you very much for the help!
Luca
You don't need the FK_Questions field in your tabTest. What you need is an FK_Test field in your tabQuestions table, where you store the id of the test the question belongs to.
...if I understood you right...?
And if I understood you right, then you should do the same for the rest of the schema too. This means you need a reference in your solutions table that stores the answer the solution belongs to, and so on.
You need to create two tables for this. One for exam (test) and one for questions.
The table exam (test) should have:
test_id, test_name
The table question should have:
test_id (references test_id from test table),
question_id ,
question_text.
Now you can have a 1:n relationship where one test has many questions.
But do not, I repeat: do not, store multiple questions in one row. That violates every principle of good database design. Your selects, updates and inserts will be nearly impossible to write.
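The two-table layout described above can be tried out with Python's sqlite3. The table and column names follow the answer (test_id, test_name, question_id, question_text); the sample exam and questions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
    CREATE TABLE test (
        test_id   INTEGER PRIMARY KEY,
        test_name TEXT NOT NULL
    );
    CREATE TABLE question (
        question_id   INTEGER PRIMARY KEY,
        test_id       INTEGER NOT NULL REFERENCES test(test_id),
        question_text TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO test VALUES (1, 'Geography')")
# One row per question; each row points back to its test.
conn.executemany(
    "INSERT INTO question (test_id, question_text) VALUES (?, ?)",
    [(1, "What is the capital of France?"), (1, "Name one ocean.")])

def questions_for(test_id):
    """Return the question texts of one test, in insertion order."""
    rows = conn.execute(
        "SELECT question_text FROM question WHERE test_id = ? ORDER BY question_id",
        (test_id,))
    return [text for (text,) in rows]

print(questions_for(1))
```

Adding another question to the exam is a single INSERT, with no change to the test row or to the table structure.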
This website seems to have very good pointers for you.

Resources