Is AWS DynamoDB fit for data collecting and analysis app?

I'm building a simple app, let's say a survey app. I have the following entities:
User:
- name
- surname
- age
Question:
- question (text)
- type
- author
Answer:
- value
- date
I want users to give answers to questions, and I want to be able to query the following:
Get user's questions and answers
Get question and its answers
Get user's questions and answers by (or where) type
Get answers to questions by value (or where value)
Count answers to a question
Get answers to a question by user age
Get answers to questions over time (by user or general)
So far I came up with the following solution:
So here the Partition Key is the ID and the Sort Key is type
So the first problem is the first requirement:
Get user's questions and answers
Shall I add another type which would be user ?
Now how to:
get a question and its answers ?
I can query id = 1 and type starts with question but then I can get a lot of unnecessary user records.
Now the next one:
get user's answers to questions of specific type (type as question attribute)
How to count same answers to a question?
I'm new to DynamoDB so any help is greatly appreciated!

I would add an id (uuid generated from code) for each table.
Get user's questions and answers
Set userId as a field in both Question and Answer, then create an index on it. This lets you get all questions and all answers of a particular user (with 2 separate queries).
get a question and its answers ?
Answer will have questionId as a field, and the same index principle applies.
get user's answers to questions of specific type (type as question attribute)
You can add a field to Answer, questionType, and filter on that. Remember that a NoSQL schema is designed around the queries you need, not around perfect normalisation.
How to count same answers to a question?
I am guessing you meant how many answers a question has. If you know the questionId, just follow "get a question and its answers ?" and set .withSelect(Select.COUNT) on the query.
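For illustration, here is roughly what that count could look like from Python. This is only a sketch: the table name Answer, the index name questionId-index and both helper functions are assumptions, not something from the question. The request uses the low-level Query parameters (Select set to COUNT), and the loop sums Count across pages, because one Query call only covers up to 1 MB of data before it returns a LastEvaluatedKey.

```python
from typing import Any, Callable, Dict


def build_count_params(question_id: str) -> Dict[str, Any]:
    """Build Query parameters that ask DynamoDB for a count only."""
    return {
        "TableName": "Answer",              # hypothetical table name
        "IndexName": "questionId-index",    # hypothetical index on questionId
        "KeyConditionExpression": "questionId = :qid",
        "ExpressionAttributeValues": {":qid": {"S": question_id}},
        "Select": "COUNT",                  # return counts instead of items
    }


def count_matching(query: Callable[..., Dict[str, Any]],
                   params: Dict[str, Any]) -> int:
    """Sum Count over every page of a Query result."""
    total = 0
    request = dict(params)
    while True:
        page = query(**request)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return total
        # More data remains: continue from where the last page stopped.
        request["ExclusiveStartKey"] = last_key
```

With boto3 installed, passing boto3.client("dynamodb").query as the first argument should work, since count_matching only needs a callable that accepts these keyword arguments.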
EDIT
User:
- userId (hash key)
- name
- surname
- age
Question:
- questionId (hash key)
- question (text)
- type
- author
- askedByUserId (index - hash key)
Answer:
- answerId (hash key)
- value
- date
- answeredByUserId (index - hash key)
- questionId (index - hash key)
So everything depends on how you query the data (and of course you may even end up switching to SQL; NoSQL is not a 1:1 replacement). Referring to items just by id is not wrong. I would prefer immutable questions and answers over editing a question every time it receives an answer. So take into consideration that you can't use joins, and prefer immutable data (add a new value instead of editing an existing one).
EDIT 2
In order to get all questions and answers from a User, create and manage this table:
UserItems:
- userId (hash key)
- question (full json of a question, optional)
- answer (full json of an answer, optional)
So every time you create a new question for example, add it to both Question and UserItems. Do a simple query based on hash key and you will get all questions and answers, with full data. Again, this is based on your querying needs.
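A minimal sketch of that double write, expressed as a TransactWriteItems request so both copies are written together. Two things here are my assumptions: the helper name build_question_writes, and an extra sort key itemId on UserItems. I added the sort key because a table keyed on userId alone can hold only one item per user.

```python
import json
from typing import Any, Dict


def build_question_writes(question: Dict[str, str]) -> Dict[str, Any]:
    """Build a TransactWriteItems request that stores a question twice."""
    # Copy 1: the question itself, one string attribute per field.
    question_item = {name: {"S": value} for name, value in question.items()}
    # Copy 2: the denormalised row under the asking user.
    user_item = {
        "userId": {"S": question["askedByUserId"]},
        "itemId": {"S": "QUESTION#" + question["questionId"]},  # assumed sort key
        "question": {"S": json.dumps(question, sort_keys=True)},
    }
    return {
        "TransactItems": [
            {"Put": {"TableName": "Question", "Item": question_item}},
            {"Put": {"TableName": "UserItems", "Item": user_item}},
        ]
    }
```

With boto3 this would be sent as boto3.client("dynamodb").transact_write_items(**build_question_writes(q)); either both puts succeed or neither does.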

Ok, so it took me a few good hours to figure out the answer.
Let's start with the entities and their relationships, here is the diagram:
Where:
USR - User
ANS - Answer
QUE - Question
As can be seen above, I don't have any many-to-many relationships, so the only pattern I will use for retrieving the data is the Primary Key (Partition Key + Sort Key) pattern.
If I had a many-to-many relationship in the model, apart from PK+SK I would also use a GSI (Global Secondary Index).
A GSI can also be used to create a different view, for example by swapping the PK with the SK.
Now the DB access patterns:
The above unfortunately doesn't cover all the access patterns, a few more I can think of are:
Query answers by user age
Query answers to a question and calculate percentage of answers by type (should this be calculated outside of the DynamoDB?)
And now the DB Table design:
So based on the access patterns in the Access Patterns table with this design I can query:
All the answers of a user to a question
SELECT PK=USR_1 WHERE SK begins_with(AR_QUE_2)
All the answers to all the questions by a user
SELECT PK=USR_1 WHERE SK begins_with(AR_QUE) - in this case I would have to include the question metadata (the question text itself) in the attributes of AR_QUE_X_ANS_Y, so that I don't have to query the table multiple times to get the metadata of all the questions.
All the answers to a question
SELECT PK=QUE_2
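To sanity-check the key design without touching AWS, the queries above can be mimicked in memory. This is only a sketch: the helper names and the sample rows are made up. It also shows one thing to watch for: the prefix AR_QUE_2 matches question 20 as well, so the prefix should include the trailing delimiter (AR_QUE_2_).

```python
from typing import Dict, List

Item = Dict[str, str]


def answer_sort_key(question_id: int, answer_id: int) -> str:
    """Sort key in the AR_QUE_X_ANS_Y shape used above."""
    return f"AR_QUE_{question_id}_ANS_{answer_id}"


def query(items: List[Item], pk: str, sk_prefix: str = "") -> List[Item]:
    """Mimic a Query: exact match on PK, begins_with on SK, ordered by SK."""
    hits = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i["SK"])


# Made-up rows, only to exercise the key layout.
TABLE = [
    {"PK": "USR_1", "SK": answer_sort_key(2, 1), "value": "yes"},
    {"PK": "USR_1", "SK": answer_sort_key(2, 5), "value": "no"},
    {"PK": "USR_1", "SK": answer_sort_key(20, 7), "value": "maybe"},
    {"PK": "USR_2", "SK": answer_sort_key(2, 9), "value": "yes"},
]

too_broad = query(TABLE, "USR_1", "AR_QUE_2")    # also picks up question 20
exact = query(TABLE, "USR_1", "AR_QUE_2_")       # only question 2
```

Against a real table, the equivalent boto3 key condition would be Key("PK").eq("USR_1") & Key("SK").begins_with("AR_QUE_2_"), using Key from boto3.dynamodb.conditions.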
I'd really love to hear someone's opinion on this one.
I also appreciate that in some cases I would have to denormalise data and insert metadata information in Attributes.
I am still very curious how I could calculate number of answers by type to a question and calculate for example a percentage of answers of different type.
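On the percentage question: a DynamoDB Query has no GROUP BY, so one option is to fetch the answers to a question and aggregate in application code. A minimal sketch, assuming the answer values have already been read into a list (the function name is made up):

```python
from collections import Counter
from typing import Dict, Iterable


def answer_percentages(values: Iterable[str]) -> Dict[str, float]:
    """Return the share of each distinct answer value, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}  # no answers yet, nothing to report
    return {value: 100.0 * n / total for value, n in counts.items()}
```

For a question with many answers, reading every item just to count them gets expensive; keeping a running counter per answer value, updated on each write, is a possible alternative.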

Related

Managing both fixed and user-defined values

I'm designing the database for an application in which the user is presented with questions, and he must answer them. Think of it either as a questionnaire or as a quiz game, the concept applies to both. I plan to have:
a table with the questions
a table with the possible answers, each of them linked to the question it belongs to with a foreign key (let's keep things simple and assume it's a 1:many relationship, where answers cannot be shared between questions)
a table with the answers that users provided (with foreign keys to the question, the answer and the user ID)
Since many of the questions will be common cases, like yes/no, I decided I'd specify a "question type" enumeration to each question. If the application sees a yes/no question, for example, it means there are no answers in the database, and the application will automatically add the two answers, "Yes" and "No". This saves me hundreds or thousands of useless rows in the answers table.
However, I'm not sure how I should define the table to record user answers. Without the special types of questions, I'd just record the question ID, the answer ID and the user ID, which means "user X answered Y to question Z". However, "yes/no" questions would not have a matching answer in the table, so I can't use the answer ID.
Even making the answers shareable between questions (by making a many-to-many relationship between questions and answers) is not a good solution. Sure, it would allow me to define "Yes" and "No" as regular answers, but then applications should be aware that a "yes/no" question uses answers (say) 7 and 8 - or, when creating a "yes/no" question answers 7 and 8 should be bound to that question. But this means that these "special" answers' IDs must be hardcoded somewhere else. Also, this would not scale well should I add more special types of question in the future.
How should I proceed? Ideally, I need to store in each row of my "user answers" table either a fixed value or a foreign key to the answers table. Is there a better solution than using two columns, one of which is NULL?
I'm using SQL Server, if that matters.

Based on your description I think I'd go on the route of adding another column to the table and making the FK column nullable.
You'd probably have only a few choices for those special questions, so a nullable TINYINT datatype would cut it, and it is only 1 extra byte per answer row. If this extra column happens to push the number of columns past a multiple of eight, say you go from 8 to 9 or from 16 to 17, then you pay another extra byte for the growth of the null bitmap. But that's 2 extra bytes per row, worst case.
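The bitmap arithmetic can be checked with a one-liner, assuming (as described above) that the bitmap uses one bit per column, rounded up to whole bytes. The function name is made up for the example.

```python
def null_bitmap_bytes(column_count: int) -> int:
    """Bytes taken by the NULL bitmap itself: one bit per column, rounded up."""
    return (column_count + 7) // 8


# Going from 8 to 9 columns (or 16 to 17) costs one more bitmap byte.
growth_8_to_9 = null_bitmap_bytes(9) - null_bitmap_bytes(8)
growth_9_to_10 = null_bitmap_bytes(10) - null_bitmap_bytes(9)
```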

Database diagram for multiple selection

I am having a logic problem while designing the database for an online survey. Part of my software consists of a survey generator where the user can create a question and add checkboxes and radio buttons as options. So every question has an ID, but one question can have many options.
So far I have only worked with software where the user chooses from a predetermined range of options, so I can collect the input and store it in the respective column of a table. This case is different for me because the columns do not exist yet. I have been trying to figure out a workaround, but I have been blocked for hours.

Assuming by options you mean Possible answers to a question
Question -Tracks the question of the survey
QID PK
Answers - Tracks the available answers for a survey item
AID PK
QuestionAnswers - Identifies which answers can be used on which questions
QID PK
AID PK
Users - identifies a person taking a survey
UID PK
UserAnswers - Identifies the answers a user has selected for a survey. It relates to both Users and QuestionAnswers, which limits the answers that can be used for a question, while each row lets the user select 1-M answers as needed.
UID PK
QID PK
AID PK
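Here is a small sketch of how this layout restricts answers to the options defined for a question. It uses SQLite through Python's sqlite3 module only so that it runs anywhere; the text columns and sample rows are made up, and the question does not say which database is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE Question (QID INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE Answers (AID INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE QuestionAnswers (
        QID INTEGER REFERENCES Question(QID),
        AID INTEGER REFERENCES Answers(AID),
        PRIMARY KEY (QID, AID)
    );
    CREATE TABLE Users (UID INTEGER PRIMARY KEY);
    CREATE TABLE UserAnswers (
        UID INTEGER REFERENCES Users(UID),
        QID INTEGER,
        AID INTEGER,
        PRIMARY KEY (UID, QID, AID),
        FOREIGN KEY (QID, AID) REFERENCES QuestionAnswers(QID, AID)
    );
    -- Made-up sample data: answer 3 is not an option for question 1.
    INSERT INTO Question VALUES (1, 'Favourite colours?');
    INSERT INTO Answers VALUES (1, 'Red'), (2, 'Blue'), (3, 'Monday');
    INSERT INTO QuestionAnswers VALUES (1, 1), (1, 2);
    INSERT INTO Users VALUES (10);
""")


def record_answer(uid: int, qid: int, aid: int) -> bool:
    """Store one selected option; return False if the database rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO UserAnswers VALUES (?, ?, ?)", (uid, qid, aid))
        return True
    except sqlite3.IntegrityError:
        return False
```

A checkbox question simply produces several UserAnswers rows for the same user and question, while an option that was never linked to the question is refused by the composite foreign key.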

normalization of Repeating Groups

My data consists of questions and answers. It also records the versionNr of each answer and which person changed which answer. It's possible that some questions have the same answer.
questionID  Question     Answer      VersionNr  User          date
1           Who is....?  W.H. Smith  1.0        ...@test.com  1/1/14
                                     1.1        ...@test.com  3/8/14
2           What is...?  3%          1.0        ...@test.com  1/2/14
RG = Repeating Group
Bold = Composite/Primary Key
0NF:
(questionID, question, AnswerID, answer, RG{versionNr, user, date})
1NF:
(questionID, question, AnswerID, answer)
(questionID, AnswerID, VersionNr, user, date)
2NF/3NF:
Q(questionID, question, AnswerID)
Ans(AnswerID, answer)
Version(questionID, AnswerID, VersionNr, user, date)
My question is whether I should remove questionID from Version, because the versionNr, date and User gives information about the Answer and Not the question.

If there can be many version records for one answer record, then yes, QuestionID should not be in Version, as it is redundant data: You can always tell the QuestionID of a Version by looking up the Answer record.
That said, it's not clear to me what you're trying to model. When you say that a user can change an answer, do you mean that this is a quiz and users can change what the "right answer" is? Or do you mean that they can change the answer they gave to a quiz or survey?
But if there are many Version records for one Answer, then you are not recording what the answer was at each step, you are just recording who made the change. Maybe that's all you need to know for your purposes. But if the idea is that you want to record what the answer was for each version, then Answer and Version should be combined into a single table. In that case you need questionID in the Answer/Version record, because otherwise you have no way to know which question any answer other than the latest is for.
If the idea is that there is just one Answer record for each Question, and then many versions, then Question and Answer should be combined into a single record. Well, you say that several questions can share the same answer. As I asked in a comment, what does this mean? Are you talking about answers that are the same by coincidence, like one question is "How much is 2+2?" and another is "What is the square root of 16?" and both answers happen to be 4? Or do you mean that they are really the same answer, in the sense that if the answer to one question changed, the answer to the other would logically and inevitably have to change to the same thing? Like, "Who is the current vice president of the United States?" and "If the president of the United States died, who would become president?"
I'd think the logical schema would be this:
Question (questionID, question_text)
Answer (answerID, questionID, version_number, answer_text, user, change_date)
Then the current answer is
select answer_text
from answer
where questionID = @qid
  and version_number = (select max(version_number)
                        from answer
                        where questionID = @qid)
This requires no redundant data.
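To try the query out, here is a self-contained sketch using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The sample rows are made up, ? placeholders take the place of the query parameter, and the user column is named changed_by here because USER is a reserved word in SQL Server.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (questionID INTEGER PRIMARY KEY, question_text TEXT);
    CREATE TABLE answer (
        answerID INTEGER PRIMARY KEY,
        questionID INTEGER REFERENCES question(questionID),
        version_number REAL,
        answer_text TEXT,
        changed_by TEXT,
        change_date TEXT
    );
    -- Made-up sample data: two versions of the answer to question 1.
    INSERT INTO question VALUES (1, 'Sample question');
    INSERT INTO answer VALUES (1, 1, 1.0, 'first wording', 'alice', '2014-01-01');
    INSERT INTO answer VALUES (2, 1, 1.1, 'second wording', 'bob', '2014-02-01');
""")


def current_answer(question_id: int) -> Optional[str]:
    """Return the text of the highest-numbered version, or None if there is none."""
    row = conn.execute(
        """
        SELECT answer_text
        FROM answer
        WHERE questionID = ?
          AND version_number = (SELECT MAX(version_number)
                                FROM answer
                                WHERE questionID = ?)
        """,
        (question_id, question_id),
    ).fetchone()
    return row[0] if row else None
```

Older versions stay in the table untouched, so the full history of a question's answer is still available with a plain select on questionID.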
An obvious denormalization for performance and simplicity would be to put the answerID of the current answer in the Question record.

If I understand your problem correctly, I believe you can do the following to achieve a working 3NF schema.
Q(questionID, question)
Ans(AnswerID,answer,QuestionID)
Version(VersionNr,AnswerID,User,date)
(Italics are foreign keys)
So the primary key of Version is the triple {VersionNr, AnswerID, User}. Please let me know if something is not correct in my solution.
So, to sum up, no, you don't need to have questionID inside version, since you can find it with joins.
UPDATE
I think I understand your problem and I believe the correct solution is the following.
Q(questionID,question)
Ans(AnswerID,answer)
Version(QuestionID,AnswerID,VersionNr,user,date)
Actually, your relationship between answers and question is many to many, since many answers are connected to many questions. So, you can use Version as the intermediate table to construct this many to many relationship.
Additionally, you can add the version number, the user and the date in that intermediate table, to have all the necessary information.

Questionnaire to database design

I have read through a lot of the threads here and have found a good amount of useful input...but there are a couple of questions that remain unanswered.
I am storing questions & answers from a questionnaire in a database.
I have the tables:
Survey (surveyID)
Question (questionID, surveyID, questionType, Question)
Answer (answerID, userID, questionID, answer)
User (userID, username)
Question 1: multi-value questions...I would have a separate row for each value in the answer table....but have the same questionID and userID. But then how would you work the following:
-what are your coping strategies (multi-value)
-how frequently do you use each coping strategy?
i.e. a one-one relationship of coping strategy-frequency.
The solution above (i.e. one row per answer) doesn't work because you need the relation between the specific coping strategy and the frequency.
A similar question is for the following:
have you been involved in conflicts over land-use rights?
with whom? (multi-value)
for what reasons?
(i.e. what were your reasons for conflict with the neighbours, what were your reasons for conflict with the authorities?) ...i.e. one to many on a multi-value attribute
Thank you in advance, I hope I have explained my query sufficiently well.
Becky

Database design of online exam system

I am designing a simple database for an online exam system, but I cannot figure out how the questions and the answers should be stored. I am thinking of Question and Answer as different entities. There will be both MCQs and short questions in the same question set, and the number of questions in a set may be dynamic (chosen by the teacher).

As I see you're looking for something like this:
User table - everyone who will be answering to the questions. It will have UserId and other profile information - name, class, photo, etc.
Question table - it will have questionid, created by (userid) and text of the question
AnswerOption - it will have optionid, link to question, text of answer option
UserAnswer - it will have useranswerid, questionid, optionid
So for example you have this question: "How much is 2*2?" and answer options are "4", "5", "6".
In this case you will have 1 record in question table and 3 records in AnswerOption table.
Now when someone answers the question, you insert a record into UserAnswer table with respective userid, questionid and optionid.
Is this what you had been looking for?
And of course you should also think how to group questions in test etc.

Since you've got multi-choice questions (MCQ, I assume), you need to consider carefully whether the alternatives in an MCQ are part of the question or are answers with a status (wrong, part of the right answer, correct). If a question has multiple answers, keep them in separate tables. If a question has just one answer, then keep them in a single table.

Start from your smallest item. A question can have multiple answer choices, one of which is good. So you could have an answer table.
ANSWER:
AnswerID
QuestionID
Choice
Text
Good (boolean qualifier)
QUESTION:
QuestionID
Text
Points
This is just a suggestion. It all depends on what you want to do. But first, you lay out by categories what your items are. Prefer loose leaf paper.