Designing a database for a personal project

I am pretty new to designing databases, and I am currently working on a large personal project that requires a fairly big database. I therefore have a couple of questions to get my database ready for implementation. (Keep in mind that this project is built on Laravel.)
Question 1:
My project makes use of posts, but not just one kind. I have a system where three sorts of posts can be created: a standard post, a profile post and a company post. All these posts can contain images. Currently, I have a column called Post_photo in each of these post tables. Is this the right way to store pictures associated with a post? It is illustrated in the image below.
Image: https://imgur.com/a/b9FWL
Question 2:
Every post can contain comments, and to connect these comments to a post they need to reference one. But because I have three different variations of posts, I set my comments table up like this: the comment table has a Post_ID column and a Company_post_ID column instead of a single Post_ID. Is this the right way to connect comments to posts? Or do I need to make another table called company_comments? If not, how can I accomplish this?
I have the same system on my likes and category tables as well, because I need to link likes and categories to posts. Is this the right way? To get a visual of what I am talking about, see the picture above.
Thanks for taking the time to read this!

The following assumes that you are using a relational database.
Answer 1: If there can be more than one picture or file per post, then the best practice would be creating a table for photos that references the post's ID.
This way, when you load the post, you would query the photos table for rows whose PostID field matches your post's id.
Answer 2: If the three types of post are very similar (and contain similar data), consider having only one post table, and include a field that indicates the type of post. For example, a field called postType could store an integer (0-2) that corresponds to the type. This would simplify your comments table, as you would only reference the postID.
As a final note, you might find this thread about storing binary data in databases helpful: (Storing files in SQL Server)
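To make both answers concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (posts, post_type, post_photos, file_path, comments) are illustrative assumptions rather than the schema from the question, and in Laravel the same layout would normally be written as migrations. It shows one posts table with a type field, a separate photos table, and a comments table that only needs a single post_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    post_type INTEGER NOT NULL CHECK (post_type IN (0, 1, 2)),  -- 0 standard, 1 profile, 2 company
    body      TEXT NOT NULL
);
CREATE TABLE post_photos (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL REFERENCES posts(id),
    file_path TEXT NOT NULL   -- assumption: files live on disk, only the path is stored
);
CREATE TABLE comments (
    id      INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id),
    body    TEXT NOT NULL
);
""")

# One company post with two photos and one comment.
post_id = conn.execute(
    "INSERT INTO posts (post_type, body) VALUES (?, ?)", (2, "Company news")
).lastrowid
conn.executemany(
    "INSERT INTO post_photos (post_id, file_path) VALUES (?, ?)",
    [(post_id, "photos/a.jpg"), (post_id, "photos/b.jpg")],
)
conn.execute("INSERT INTO comments (post_id, body) VALUES (?, ?)", (post_id, "Nice!"))

# Loading the post's photos: rows whose post_id matches the post's id.
photos = [row[0] for row in conn.execute(
    "SELECT file_path FROM post_photos WHERE post_id = ? ORDER BY id", (post_id,)
)]
comment_count = conn.execute(
    "SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,)
).fetchone()[0]
```

Because every kind of post lives in the same table, likes and categories could reference posts(id) in the same way as comments do here.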

Related

Best way to store comments with mentions (#FirstName) in database

Was wondering what is the best way to store comments in a database (sql) that allows mentioning of other users by a non-unique natural name?
E.g. This is a comment for #John.
The client application would also need to detect and link to corresponding user profile if his/her name was clicked.
My initial thought was to replace the user's first name with the id and some metadata and store that in the DB: This is a comment for <John_51/> where 51 is the id of that user. Clients can then parse that and display the appropriate user name and profile links.
Is this a good approach?
Some background:
What I would like to achieve is similar to facebook posts where it allows you to 'tag' a user by just mentioning their name (not the unique username) in a post. It doesn't have to be as complex as facebook as what I need it for isn't for a post, but just comments (which can only be text, as opposed to posts which could be text mixed with videos/images/etc).
The solution would affect the database side (how the comments are stored) and also the client side (how the comments are parsed and displayed to the user). The clients are mobile apps for iOS and Android but also looking to expand to a web application as well.
I don't think the language matters as much but for completeness sake, I'm using Python's Flask with SQLAlchemy frameworks on the backend.
Current DB schema for comments
COMMENT TABLE:
id (<PK>)
post_id (id of the post that the comment is for: <FK on a post object>)
author_id (id of the creator of the comment: <FK on a user object>)
text (comment text: <String>)
timestamp (comment date: <Date>)
Edit:
I ended up going with metadata in the comment. E.g.
Hey <mention userid="785" tagname="JohnnyBravo"/>!
I included the user's name (tagname) as well so that client application can extract the name directly from the comment text instead of adding another step to look up who user 785 is.
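For illustration, a client could pull the mentions out of that markup roughly like this. This is only a sketch: the helper names are made up, and it assumes the tag is always written exactly in the form shown above (same attribute order, double quotes) and that user-typed text is escaped so it cannot contain such a tag.

```python
import re

# Matches the mention tag exactly as written in the example above.
MENTION_RE = re.compile(r'<mention userid="(\d+)" tagname="([^"]*)"/>')

def extract_mentions(text):
    """Return (user_id, tag_name) pairs for every mention tag in the text."""
    return [(int(uid), name) for uid, name in MENTION_RE.findall(text)]

def render_plain(text):
    """Replace each mention tag with the display name stored in it."""
    return MENTION_RE.sub(lambda m: m.group(2), text)

comment = 'Hey <mention userid="785" tagname="JohnnyBravo"/>!'
mentions = extract_mentions(comment)   # ids a client can turn into profile links
plain = render_plain(comment)          # text with the names filled in
```

One trade-off of storing tagname in the text is that it goes stale if the user later changes their name; the userid is still there for a fresh lookup.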
The big problem here is that if the username is not a stable reference, you need to abstract it to an id reference, while still keeping the text reconstructable and the references queryable.
Embedded collections and dynamic typing are a great option if you're using a NoSQL database. It would be fairly straightforward.
{
_id: ...,
text: [
"Wow ",
51,
", your selfie looks really great, even better than ",
72,
"'s does."
],
...
}
That way you could query references, while still easily reconstructing the content. BUT since you're using SQLAlchemy, that's a no go. Your methodology seems fine, but because you're doing magic in the string, you'll need to escape your delimiters (as well as the escape character itself) if they appear in the text. Personally, I would use # as the delimiter since it's already a special character. You'd also need to mark the end of the id, in case the user sticks a bunch of numbers after the #mention, so:
Wow #51#, your selfie looks really great, even better than #72#'s does. email me! john\#foo.com. Division time!!! with backslashes! 12\\4 = 3
If querying posts for references is also important to you, you'll also need to maintain a separate POST__USER junction table that stores one row per post and mentioned user id, so that when you load an object into memory, you can construct a collection. You could decide to add the junction table later, but it would be a fairly expensive migration.
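A rough sketch of that escaping scheme, assuming a backslash as the escape character and #id# as the mention form, as in the example above. The function names are hypothetical. The parser returns text chunks and integer user ids, and the ids are what you would write to the junction table.

```python
def escape_text(raw):
    """Escape user-typed text so it can never be mistaken for a mention."""
    return raw.replace("\\", "\\\\").replace("#", "\\#")

def parse_comment(stored):
    """Split stored text into str chunks and int user ids."""
    parts, buf, i = [], [], 0
    while i < len(stored):
        ch = stored[i]
        if ch == "\\" and i + 1 < len(stored):
            buf.append(stored[i + 1])      # escaped character is taken literally
            i += 2
        elif ch == "#":
            end = stored.find("#", i + 1)  # closing delimiter marks the end of the id
            if end == -1 or not stored[i + 1:end].isdecimal():
                raise ValueError(f"malformed mention at offset {i}")
            if buf:
                parts.append("".join(buf))
                buf = []
            parts.append(int(stored[i + 1:end]))
            i = end + 1
        else:
            buf.append(ch)
            i += 1
    if buf:
        parts.append("".join(buf))
    return parts

stored = r"Wow #51#, better than #72#'s. Mail john\#foo.com, 12\\4 = 3"
parts = parse_comment(stored)
mentioned_ids = [p for p in parts if isinstance(p, int)]  # rows for the junction table
```

Text typed by the user goes through escape_text before the mention markers are inserted, so parse_comment(escape_text(s)) gives back the original text as a single chunk.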
If #name is not unique, you have to somehow associate the non-unique name, via the session, with the unique owner of the natural name, and do this ideally before storing it in the database. Storing a non-unique name in the database, if it cannot be resolved to its unique owner, is not of much value.
Since you mention "sql" I assume you're using a relational database. If that is the case, once you have resolved #name to its unique owner, I would create a one-to-many relationship between posting or comment and userids; that would allow a comment or post to reference more than one user.
TABLE: COMMENT_MENTIONEDUSERS
commentid
userid
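As a sketch of how that table might be used, here it is in SQLite via Python's sqlite3 module. The comments table, its columns and the sample user ids are assumptions made up for the example; only the COMMENT_MENTIONEDUSERS columns come from the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE comment_mentionedusers (
    commentid INTEGER NOT NULL REFERENCES comments(id),
    userid    INTEGER NOT NULL,
    PRIMARY KEY (commentid, userid)   -- a user is recorded once per comment
);
""")
conn.execute("INSERT INTO comments (id, text) VALUES (1, 'Thanks John and Mary')")
conn.execute("INSERT INTO comments (id, text) VALUES (2, 'Well done, John')")
conn.executemany(
    "INSERT INTO comment_mentionedusers (commentid, userid) VALUES (?, ?)",
    [(1, 51), (1, 72), (2, 51)],
)

def comments_mentioning(user_id):
    """Return the ids of all comments that mention the given user."""
    rows = conn.execute(
        """SELECT c.id FROM comments c
           JOIN comment_mentionedusers m ON m.commentid = c.id
           WHERE m.userid = ? ORDER BY c.id""",
        (user_id,),
    )
    return [r[0] for r in rows]
```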
I would recommend storing the comment as markdown since it's now quite widespread. In your case, "This is a comment for [#John](/user/johnID)".
Markdown is pretty standard and you shouldn't have an issue finding a package for editing / viewing.

Representing topics and posts in one table vs. two tables?

I'm currently building a forum component for a larger application and I'm considering different approaches to certain parts of the database schema. In particular, I am considering representing topics and posts in a single table. While I view topics and posts as practically the same, I feel a bit apprehensive as this may make things less flexible in the future.
When topics of a particular forum are queried, the title and first post will be shown as well as some of the user information (basically the name and avatar). In this application, there are various attributes that are used by both topics and posts, except for views and replies, and perhaps title and forum_id (forum_id because storing it on every post would mean potentially hundreds of records need to be updated when a topic is moved to another forum, as opposed to changing the single forum_id attribute in the topic relation).
The tables look something like what I have below here:
TOPIC        POST
topic_id     poster_id
forum_id     topic_id
poster_id    content
title        upvote
views        dnvote
replies      closed
post_id      deleted
             last_edited
             last_editor
             parent_id
             content
             post_id
Doing it this way, using table inheritance, generating the posts in the topic would require a 4-table join via TOPIC, POST, USER, and TOPIC_TYPE.
On the other hand, if I decide to take the single table approach, should I simply leave the views, replies, title, and forum_id attributes as null if the topic_type is a regular post? (topic_type references an appropriate icon for the type of topic displayed, and will be used for statistics and etc.)
If you are definitely committed to using relational technology (I would consider NoSQL db for this like Mongo, etc. as well) I would separate into two tables as you proposed.
Your case here is a fundamental of relational master-detail design or whole-parts and I think that two tables are appropriate.
I think simple normalization is preferable in this scenario. It will also make it easier to generate different types of reports. A single table could be used, but given the tables you have designed, two tables would be more manageable and would avoid entering the same value multiple times.
It might be worthwhile to distinguish between a "topic" and a "topic starter" as such. A "topic starter" is a comment that is not a reply. Every topic has exactly one topic starter, which could be referenced by a foreign key in the topic table.
Other than that, I agree with both your analysis and your design.

Am I expecting too much of one database table?

I'm working on a proprietary feedback application. I have a table named topics that I will use to store suggestions, questions, and problems.
topics [ id, user_id, title, content, type[suggestion, question, problem] ]
I can easily store this data in one table using a type column to distinguish between the three different topic types.
However, there's another wrinkle: Each topic has its own responses too, and responses are very similar to the topics themselves. I'm tempted to store them in the same table as well. So now I have type (suggestion, question, problem) and subtype (topic, response).
Am I asking too much of my topics table? Should I split my data into separate tables? I'm using Postgres and Rails for this particular project.
The best way to visualise this is to compare it to Stack Overflow. SO stores questions and answers in the same posts table. Now suppose instead of only questions SO decided to allow suggestions and problems. Would they still use the same table?
How often will you want to query both topics and responses in one action? Maybe when searching, but sometimes you will also want to search only topics or only responses. And how often will you need to query only one of them? Most of the time.
Go for two separate tables; you can use views with a UNION clause if you want to query them together. At the application level you can also build an inheritance model on top of the relational database, say a Post object with Topic and Response subclasses. Some libraries like Hibernate will transparently translate a query for "all posts that..." into two separate queries and union the results together.
Another approach (also one of the ways to deal with inheritance in a relational store) would be to have... three tables! Posts, Topics and Responses, the last two having a foreign key to Posts. This way common columns are in one table and type-specific columns are kept separately.
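The three-table variant might look roughly like this. It is a sketch in SQLite through Python's sqlite3 module rather than Postgres, and the column names beyond those in the question (post_id, topic_id) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Columns shared by topics and responses.
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    content TEXT NOT NULL
);
-- Topic-only columns; the primary key doubles as the foreign key to posts.
CREATE TABLE topics (
    post_id INTEGER PRIMARY KEY REFERENCES posts(id),
    title   TEXT NOT NULL,
    type    TEXT NOT NULL CHECK (type IN ('suggestion', 'question', 'problem'))
);
-- Response-only columns.
CREATE TABLE responses (
    post_id  INTEGER PRIMARY KEY REFERENCES posts(id),
    topic_id INTEGER NOT NULL REFERENCES topics(post_id)
);
INSERT INTO posts VALUES (1, 10, 'How do I export data?');
INSERT INTO topics VALUES (1, 'Export', 'question');
INSERT INTO posts VALUES (2, 11, 'Use the Export button.');
INSERT INTO responses VALUES (2, 1);
""")

topic = conn.execute(
    """SELECT t.title, t.type, p.content
       FROM topics t JOIN posts p ON p.id = t.post_id
       WHERE t.post_id = 1"""
).fetchone()
replies = conn.execute(
    """SELECT p.content
       FROM responses r JOIN posts p ON p.id = r.post_id
       WHERE r.topic_id = 1"""
).fetchall()
```

The cost of this layout is an extra join on every read; the benefit is that a search over all content only has to look at the posts table.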
Keeping topics and responses in one table is better for forums. (A lot depends on the functionality you plan to have. Is it a forum or a news/articles/reviews site?)
Most forum frameworks use this design. Including SO as you mentioned. One distinction to make clear - note that what you are defining as "topic" is generally "post". So "responses" are also posts. What other frameworks call "topic" is the thread info.
Here's an image of phpBB's schema (warning, 1MB). Notice the phpbb_posts table with post_text and topic_id (where topic_ is the title, forum id, view count, etc. but not the post_text).
StackOverflow: The PostTypeId in the Posts table - "1 is a question, 2 is an answer. Answers will have a ParentId field populated to link back to the question post."
See this related question and google for others: How would you structure a forum's DB schema?
You could query a post + responses using something like:
select t.id, t.user_id, t.title, t.content, t.type, t.parent_id,
r.id, r.user_id, r.title, r.content, r.type, r.parent_id
from topics t
left join topics r on r.parent_id = t.id
where t.parent_id = 0 and t.id = <specific id>
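To see what that query returns, here is a small runnable sketch using SQLite through Python's sqlite3 module. The sample rows are invented, and it assumes, as the query does, that top-level topics are stored with parent_id = 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (
    id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, content TEXT,
    type TEXT, parent_id INTEGER NOT NULL DEFAULT 0   -- 0 means top-level topic
);
INSERT INTO topics VALUES (1, 10, 'Login fails', 'Cannot log in', 'problem', 0);
INSERT INTO topics VALUES (2, 11, NULL, 'Try resetting your password', 'problem', 1);
INSERT INTO topics VALUES (3, 12, NULL, 'Fixed in the next release', 'problem', 1);
INSERT INTO topics VALUES (4, 10, 'Dark mode', 'Please add it', 'suggestion', 0);
""")

def topic_with_responses(topic_id):
    """One row per response; the topic columns repeat on every row."""
    return conn.execute(
        """SELECT t.id, t.title, r.id, r.content
           FROM topics t
           LEFT JOIN topics r ON r.parent_id = t.id
           WHERE t.parent_id = 0 AND t.id = ?
           ORDER BY r.id""",
        (topic_id,),
    ).fetchall()

rows = topic_with_responses(1)
```

Because it is a LEFT JOIN, a topic without responses still comes back as a single row, with NULL (None in Python) in the response columns.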
The part you should separate is this: if you want to show thread summaries like the Stack Overflow Questions/Active/Newest pages, or a forum index with latest topics, responses, posters, etc., then maintaining a thread_info table would help database performance, especially if you expect high visitor volume and/or many threads and posts.
Now suppose instead of only questions SO decided to allow suggestions and problems. Would they still use the same table?
For Suggestions that depends. Look at comments for example. Different table. The nature(model) + functionality of comments is different enough to be stored separately.
Taking another example: on news / reviews sites or like in wordpress, articles and their responses would be stored separately because of the same reasons. Articles would have relations to site authors, related articles, formatting, categories, etc. Responses would be threaded, possibly unformatted, etc.
Use multiple tables here, you said it yourself: there's another wrinkle: Each topic has its own respons*es* too
Multiple of anything usually requires another table.
I'd see it like this:
[TOPICS] [ topicID, user_id, title, content, type[suggestion, question, problem] ]
[SUGGESTION] [ suggID <fields here> *topicID ]
Note topicID as a foreign key in [SUGGESTION]

bulletin-board database design

suppose I needed to design a database for a bulletin-board website.
something like stackoverflow which means there is a topic and a series of posts
but, no threaded posts (not a tree-based design)
I thought about two main options:
1. Topic table and Post table. Post has "topic_id" field
2. no Topic table. only one big Post table.
what do you think is the more preferable option?
Well, Stack Overflow is a tag-based design, where a post may have multiple topics/tags.
So to capture this in a relational-style design, you would have three tables:
1. POST (post_id, author, etc.)
2. TOPIC (topic_id, name, etc.)
3. POSTTOPIC (post_id, topic_id)
The reason for POSTTOPIC is that a post may have multiple tags. Using #3, it becomes easy to assign/unassign tags to a post or to find posts with certain topics, neither of which a single column in POST would be able to accommodate.
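A minimal sketch of those three tables in SQLite, via Python's sqlite3 module. The sample data and the helper function are made up for the example; the table and key names follow the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post  (post_id  INTEGER PRIMARY KEY, author TEXT NOT NULL);
CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE posttopic (
    post_id  INTEGER NOT NULL REFERENCES post(post_id),
    topic_id INTEGER NOT NULL REFERENCES topic(topic_id),
    PRIMARY KEY (post_id, topic_id)   -- each tag is assigned at most once per post
);
INSERT INTO post  VALUES (1, 'alice'), (2, 'bob');
INSERT INTO topic VALUES (1, 'sql'), (2, 'design');
INSERT INTO posttopic VALUES (1, 1), (1, 2), (2, 1);
""")

def posts_with_topic(name):
    """Return the ids of all posts tagged with the given topic name."""
    rows = conn.execute(
        """SELECT pt.post_id FROM posttopic pt
           JOIN topic t ON t.topic_id = pt.topic_id
           WHERE t.name = ? ORDER BY pt.post_id""",
        (name,),
    )
    return [r[0] for r in rows]

sql_posts = posts_with_topic("sql")
```

Unassigning a tag is then a single DELETE on posttopic, with no change to the post or topic rows themselves.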

Efficient way to store a dynamic questionnaire?

In reference to this question, I am facing almost the same scenario, except that in my case the questions are mostly static (they are subject to change from time to time). I still think it's not a good idea to add a column for each question, and even if I decided to add them, how should the answers be specified and retrieved? The answers also come in different types: for example, an answer could be yes/no, a list item, free text, a list item OR free text (Other, please specify), multiple selectable list items, etc.
What would be an efficient way to implement this?
Shimmy, I have written a four-part article that addresses this issue - see Creating a Dynamic, Data-Driven User Interface. The article looks at how to let a user define what data to store about clients, so it's not an exact examination of your question, but it's pretty close. Namely, my article shows how to let an end user define the type of data to store, which is along the lines of what you want.
The following ER diagram gives the gist of the data model:
Here, DynamicAttributesForClients is the table that indicates what user-created attributes a user wants to track for his clients. In short, each attribute has a DataTypeId value, which indicates whether it's a Boolean attribute, a Text attribute, a Numeric attribute, and so on. In your case, this table would store the questions of the survey.
The DynamicValuesForClients table holds the values stored for a particular client for a particular attribute. In your case, this table would store the answers to the questions of the survey. The actual value is stored in the DynamicValue column, which is of type sql_variant, allowing any type of data - numeric, bit, string, etc. - to be stored there.
My article does not address how to handle multiple-choice questions, where a user may select one option from a preset list of options, but enhancing the data model to allow this is pretty straightforward. You would create a new table named DynamicListOptions with the following columns:
DynamicListOptionId - a primary key
DynamicAttributeId - specifies what attribute this option is associated with
OptionText - the option text
So if you had an attribute that was a multiple-choice option you'd populate the drop-down list in the user interface with the options returned from the query:
SELECT OptionText
FROM DynamicListOptions
WHERE DynamicAttributeId = ...
Finally, you would store the selected DynamicListOptionId value in the DynamicValuesForClients.DynamicValue column to record the list option they selected (or use NULL if they did not choose an item).
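As a rough illustration of that extension, here is a sketch in SQLite through Python's sqlite3 module instead of SQL Server. SQLite has no sql_variant type, so an untyped column stands in for it. The AttributeName and ClientId columns, the DataTypeId value 5 and the sample rows are assumptions for the example and are not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DynamicAttributesForClients (
    DynamicAttributeId INTEGER PRIMARY KEY,
    AttributeName      TEXT NOT NULL,      -- assumed column: the question text
    DataTypeId         INTEGER NOT NULL
);
CREATE TABLE DynamicListOptions (
    DynamicListOptionId INTEGER PRIMARY KEY,
    DynamicAttributeId  INTEGER NOT NULL
        REFERENCES DynamicAttributesForClients(DynamicAttributeId),
    OptionText          TEXT NOT NULL
);
CREATE TABLE DynamicValuesForClients (
    ClientId           INTEGER NOT NULL,   -- assumed column: the respondent
    DynamicAttributeId INTEGER NOT NULL
        REFERENCES DynamicAttributesForClients(DynamicAttributeId),
    DynamicValue,                          -- untyped; stands in for sql_variant
    PRIMARY KEY (ClientId, DynamicAttributeId)
);
INSERT INTO DynamicAttributesForClients VALUES (1, 'Preferred contact method', 5);
INSERT INTO DynamicListOptions VALUES (1, 1, 'Email'), (2, 1, 'Phone');
""")

# Options for the drop-down list of attribute 1.
options = conn.execute(
    """SELECT DynamicListOptionId, OptionText FROM DynamicListOptions
       WHERE DynamicAttributeId = ? ORDER BY DynamicListOptionId""",
    (1,),
).fetchall()

# Client 42 picks 'Phone': store the option id as the value.
conn.execute("INSERT INTO DynamicValuesForClients VALUES (?, ?, ?)", (42, 1, 2))
chosen = conn.execute(
    """SELECT o.OptionText FROM DynamicValuesForClients v
       JOIN DynamicListOptions o ON o.DynamicListOptionId = v.DynamicValue
       WHERE v.ClientId = ? AND v.DynamicAttributeId = ?""",
    (42, 1),
).fetchone()[0]
```

A multiple-select question would need one row per chosen option, which this key (one value per client and attribute) does not allow; that would call for a separate link table.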
Give the article a read through. There is a complete, working demo you can download, which includes the complete database and its model. Also, the four articles that make up the series explore the data model in depth and show how to build a web-based (ASP.NET) user interface for letting users define dynamic attributes, how to display them for data entry, and so forth.
Happy Programming!
This may not fit you exactly, but here's what I've got at my part-time job.
I have a questions table, an answers table, and a survey table. For each new survey I create a survey build (because each survey is unique, but questions and answers are repeated a lot). I then have a respondent table that contains some information about the respondent (and it also links back to the survey table, I forgot that in the diagram). I also have a response table that links the respondent and the survey build. This probably isn't the best way, but it's the way that works for me, and it works pretty fast (we're at about 1 million+ rows in the response table and it handles like a dream).
With this model i get reusable questions, reusable answers (a lot of our questions use "Yes" and "No"), and a rather slim response table.
