PostgreSQL: Database structure for Chat Conversation

I am designing a table for chat conversations. Instead of creating two tables, Conversation and Message, I designed just one table, Conversation, and use a JSONB field for the messages.
Please check this photo: [image of the table design not included]
Is this database structure a good or a bad solution? And if it's bad, are there other solutions for me?

I would strongly recommend normalizing your table structure.
The participants should go into a separate table with the columns id_conversation and id_user. That is better for searching and updating than a (JSON) array.
Same thing with the messages. Why not store them in a separate table with the columns id_conversation, timestamp, id_user, message_text? That design is much better for searching and updating as well, and it makes your conversation table much smaller.
Additionally: what is that participants column for? If you have the messages for each conversation, you can easily ask the table for all users who posted a message to the conversation with something like
SELECT DISTINCT id_user FROM messages WHERE id_conversation = 42
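To make the normalized layout concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; in PostgreSQL you would pick proper types such as timestamptz. The conversations table name, its title column, and the sample rows are assumptions for illustration, not part of the original design.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (
    id    INTEGER PRIMARY KEY,
    title TEXT                      -- hypothetical column
);
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    id_conversation INTEGER NOT NULL REFERENCES conversations(id),
    id_user         INTEGER NOT NULL,
    timestamp       TEXT    NOT NULL,
    message_text    TEXT    NOT NULL
);
""")

conn.execute("INSERT INTO conversations (id, title) VALUES (42, 'demo')")
conn.executemany(
    "INSERT INTO messages (id_conversation, id_user, timestamp, message_text) "
    "VALUES (?, ?, ?, ?)",
    [
        (42, 1, "2024-01-01T10:00:00", "hi"),
        (42, 2, "2024-01-01T10:01:00", "hello"),
        (42, 1, "2024-01-01T10:02:00", "how are you?"),
    ],
)

# The participants are derived from the messages, no extra column needed.
participants = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT id_user FROM messages "
        "WHERE id_conversation = 42 ORDER BY id_user"
    )
]
print(participants)
```

Note that this only finds users who have already written something. If a conversation can have silent participants, the separate participants table described above is still needed.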
Edit:
In principle: 1M rows is a lot, but it is not a gigantic table. Postgres with a good table design should not have any problems with it. And I assume a single conversation has far fewer messages, so you can do a lot with filtering and indexing.
1.
I strongly recommend thinking about some well-chosen indexes for your tables, which should make searching really quick. An index on the message timestamps could help, and one on the conversation IDs:
CREATE INDEX idx_messages_timestamp
ON messages (timestamp);
CREATE INDEX idx_messages_conversations
ON messages (id_conversation);
If you want to fetch the newer messages, it could be helpful to create the indexes with a DESC order (... ON messages(... DESC))
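As a sketch of how the two ideas can be combined: a single composite index on (id_conversation, timestamp DESC) covers the typical "latest messages of one conversation" query. This is my own variation rather than the two separate indexes shown above, and it runs against SQLite via Python's sqlite3 only so that it is self-contained; the index name and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.executescript("""
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    id_conversation INTEGER NOT NULL,
    timestamp       TEXT    NOT NULL,
    message_text    TEXT    NOT NULL
);
-- Hypothetical composite index: filter by conversation, newest first.
CREATE INDEX idx_messages_conversation_timestamp
    ON messages (id_conversation, timestamp DESC);
""")
conn.executemany(
    "INSERT INTO messages (id_conversation, timestamp, message_text) "
    "VALUES (?, ?, ?)",
    [
        (42, "2024-01-01T10:00:00", "first"),
        (42, "2024-01-01T10:05:00", "second"),
        (42, "2024-01-01T10:09:00", "third"),
        (7, "2024-01-01T10:07:00", "other conversation"),
    ],
)

query = """
SELECT message_text FROM messages
WHERE id_conversation = ?
ORDER BY timestamp DESC
LIMIT 2
"""
latest = [row[0] for row in conn.execute(query, (42,))]
print(latest)

# Show what the planner decided to do with the query.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, (42,)):
    print(row[-1])
```

In PostgreSQL, running EXPLAIN on the same query is the way to check whether the planner actually picks the index.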
2.
For really huge tables (I mean REALLY huge tables) it could be helpful to partition them. This splits your table internally by a certain criterion, for example by timestamp (monthly or yearly). So if you mostly fetch newer data, the older rows are kept in separate tables internally, and the query only touches the rows of the requested smaller table.
But this is kind of advanced: https://www.postgresql.org/docs/current/static/ddl-partitioning.html

Related

Am I modelling my warehouse tables the right way?

I'm designing a website where users answer surveys. I need to design a data warehouse to aggregate their responses. So far in my model I have:
A dim table for Users.
A dim table for Questions.
A fact table for UserResponses. <= This is where I'm having the problem.
So the problem I have is that additional comments can be added to their responses. For example, somebody may come in and make 2 comments against a single response. How should I model this in the database?
I was thinking of creating another fact table for "Comments", and linking it to a record in UserResponses. Is this the right thing to do? This additional table would have something like the below columns:
CommentText
Foreign key relationship to fact.UserResponses.
Yes, your idea to create another table is correct. I would typically call it a "child" table rather than calling it another fact table.
The key thing that you didn't mention is that the table comments still needs an ID field. A table without an ID would be bad design (although it is indeed possible to create the table with no ID) since you would have no simple way to refer to individual comments.
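A minimal sketch of such a child table, assuming SQLite through Python's sqlite3 so it runs as-is. The table names user_responses and response_comments and all columns other than CommentText and the foreign key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE user_responses (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    answer      TEXT    NOT NULL
);
-- Child table: its own ID plus a foreign key back to the response.
CREATE TABLE response_comments (
    id               INTEGER PRIMARY KEY,
    user_response_id INTEGER NOT NULL REFERENCES user_responses(id),
    comment_text     TEXT    NOT NULL
);
""")

conn.execute(
    "INSERT INTO user_responses (id, user_id, question_id, answer) "
    "VALUES (1, 10, 100, 'Yes')"
)
conn.executemany(
    "INSERT INTO response_comments (user_response_id, comment_text) VALUES (?, ?)",
    [(1, "first comment"), (1, "second comment")],
)

# Because every comment has an ID, a single comment can be edited directly.
conn.execute("UPDATE response_comments SET comment_text = 'edited' WHERE id = 2")

comments = conn.execute(
    "SELECT id, comment_text FROM response_comments "
    "WHERE user_response_id = 1 ORDER BY id"
).fetchall()
print(comments)
```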
In a dimension model, fact tables are never linked to each other, as the grain of the data will be compromised.
The back-end database of a client application is not usually a data warehouse schema, but more of an online transactional processing (OLTP) schema. This is because transactional systems work better with third normal form. Analytical systems work better with dimensional models because the data can be aggregated (i.e., "sliced and diced") more easily.
I would recommend switching back to an OLTP database. It can still be aggregated when needed, but maintains third normal form for easier transactional processing.
Here is a good comparison between a dimensional model (OLAP) and a transactional system (OLTP):
https://www.guru99.com/oltp-vs-olap.html

Data Modeling: Is it bad practice to store IDs from various sources in the same column?

I am attempting to merge data from various sources into an existing data model. Each source uses different types of IDs (such as GUID, Salesforce IDs, etc.). For example, if I were to merge data from two different sources, the table may look like the following (where the first two SalesPersonIDs are GUID IDs and the second two are Salesforce IDs):
Is this a bad practice? I could also imagine a table where each ID type was its own column and could be left blank if it was not applicable. Something like the following:
I apologize, I am a bit new to this. Thanks in advance for any insight, I greatly appreciate it!
The big roles of an ID column are to act as a key connecting data in different tables, and to help indexing - quickly find rows so your queries run fast.
The second solution wouldn't work well for these purposes, and will lead to big headaches in queries: every time you want to group by the ID, you'll have to combine the info from 2 columns in some way, hopefully getting a correct unique result every time.
On the one hand, all you might ever need from an ID is for it to be unique. The first solution might be fine in this respect - but are you sure you'll never, ever get data about one SalesPerson from more than one source?
I'd suggest keeping all the IDs in one column, and adding a column to say what kind of ID this is. At least this way, you won't lose any information and can do other things in the future.
One thing you might consider is making a separate table of SalesPerson with all their possible IDs, and have this keyed to other (Sales?) data by a unique ID used only in your database.
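Here is a rough sketch of that last idea, run against SQLite through Python's sqlite3 so it is self-contained. The table and column names, the find_salesperson helper, and the example IDs are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE salesperson (
    id   INTEGER PRIMARY KEY,       -- ID used only inside this database
    name TEXT NOT NULL
);
CREATE TABLE salesperson_external_id (
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(id),
    source_system  TEXT    NOT NULL, -- says what kind of ID this is
    external_id    TEXT    NOT NULL,
    PRIMARY KEY (source_system, external_id)
);
""")

conn.execute("INSERT INTO salesperson (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO salesperson_external_id VALUES (?, ?, ?)",
    [
        (1, "guid", "0f8fad5b-d9cb-469f-a165-70867728950e"),  # made-up GUID
        (1, "salesforce", "0035e00000AbCdEFGH"),              # made-up ID
    ],
)


def find_salesperson(source_system, external_id):
    """Resolve an external ID to the internal salesperson ID, or None."""
    row = conn.execute(
        "SELECT salesperson_id FROM salesperson_external_id "
        "WHERE source_system = ? AND external_id = ?",
        (source_system, external_id),
    ).fetchone()
    return row[0] if row else None


print(find_salesperson("guid", "0f8fad5b-d9cb-469f-a165-70867728950e"))
print(find_salesperson("salesforce", "0035e00000AbCdEFGH"))
```

Both lookups resolve to the same internal ID, so sales data only ever has to join on salesperson.id, no matter which source a record came from.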

How can I store an indefinite amount of stuff in a field of my database table?

Here's a simple version of the website I'm designing: Users can belong to one or more groups. As many groups as they want. When they log in they are presented with the groups they belong to. Ideally, in my Users table I'd like an array or something that is unbounded to which I can keep on adding the IDs of the groups that user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table which has an indefinite amount of user IDs which belong in that group. (side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it like some super long varchar and having the list JSON encoded in there or something, but ewww
Please and thanks
Oh and it's a MySQL database (my website is in PHP), but after 2 years of PHP development I've recently decided PHP sucks and I hate it and ASP.NET web applications are the only way for me, so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
In database design, this might be represented as follows:
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."
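A minimal sketch of the joining table, using SQLite via Python's sqlite3 so it can be run directly. The users and site_groups table names and the sample data are assumptions; user_group_membership follows the naming suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE site_groups (          -- hypothetical name for the group table
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per (user, group) pair; the composite key prevents duplicates.
CREATE TABLE user_group_membership (
    user_id  INTEGER NOT NULL REFERENCES users(id),
    group_id INTEGER NOT NULL REFERENCES site_groups(id),
    PRIMARY KEY (user_id, group_id)
);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO site_groups VALUES (?, ?)",
    [(10, "chess"), (20, "hiking"), (30, "cooking")],
)
conn.executemany(
    "INSERT INTO user_group_membership VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 20)],
)

# Groups shown to user 1 after login.
groups_of_ann = [
    row[0]
    for row in conn.execute(
        "SELECT g.name FROM site_groups AS g "
        "JOIN user_group_membership AS m ON m.group_id = g.id "
        "WHERE m.user_id = ? ORDER BY g.name",
        (1,),
    )
]

# The same table answers the reverse question: who is in group 20?
members_of_hiking = [
    row[0]
    for row in conn.execute(
        "SELECT u.name FROM users AS u "
        "JOIN user_group_membership AS m ON m.user_id = u.id "
        "WHERE m.group_id = ? ORDER BY u.name",
        (20,),
    )
]
print(groups_of_ann, members_of_hiking)
```

Since one table serves both directions, there is no need for a second list of user IDs on the group side.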
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form. FNF is the first step in a design pattern called data normalization. Data normalization is a major aspect of database design. Normalized design is usually good design although there are some situations where a different design pattern might be better adapted.
If your data is not in FNF, you will end up doing sequential scans for some queries where a normalized database would be accessed via a quick lookup. For a table with a billion rows, this could mean delaying an hour rather than a few seconds. FNF guarantees a direct access lookup path for each item of data.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.

Suggestions for chat system schema design

I need a suggestion about an SQL table schema. I've created a table and named it Chats. Would it be better for me to add two columns (like ID and Messages) or one that will contain both the IDs and the messages? And which of them will work faster?
Personally I'd model this as two tables:
Chats
- ID
- Name
Messages
- ID
- ChatID
- Message
- SentDate
There should be a foreign key from Messages.ChatID to Chats.ID.
Otherwise you're going to have to create duplicate chats each time someone sends a message.
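As a sketch, here is the two-table layout with the foreign key in place, run against SQLite through Python's sqlite3 so it is self-contained. The column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Chats (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Messages (
    ID       INTEGER PRIMARY KEY,
    ChatID   INTEGER NOT NULL REFERENCES Chats(ID),
    Message  TEXT    NOT NULL,
    SentDate TEXT    NOT NULL
);
""")

conn.execute("INSERT INTO Chats (ID, Name) VALUES (1, 'general')")
conn.executemany(
    "INSERT INTO Messages (ChatID, Message, SentDate) VALUES (?, ?, ?)",
    [(1, "hello", "2024-01-01T10:00:00"), (1, "anyone here?", "2024-01-01T10:01:00")],
)

# One chat row, many message rows: the chat is never duplicated.
message_count = conn.execute(
    "SELECT COUNT(*) FROM Messages WHERE ChatID = 1"
).fetchone()[0]

# The foreign key rejects messages that point at a chat that does not exist.
try:
    conn.execute(
        "INSERT INTO Messages (ChatID, Message, SentDate) VALUES (?, ?, ?)",
        (999, "orphan", "2024-01-01T10:02:00"),
    )
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(message_count, fk_rejected)
```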
I would strongly recommend against keeping IDs and values in the same column. It makes it near impossible to join on and will create all sorts of problems later on.
There is no reason to use a single column. Use a separate column for each piece of data, each with its own data type, because you will then be able to filter and sort the table by each column. You will also be able to add constraints, indexes, statistics, etc. if needed.
Any query performed on that table will work faster if you use separate columns.

Database design query

I'm trying to work out a sensible approach for designing a database where I need to store a wide range of continuously changing information about pets. The categories of data can be broken down into, for example, behaviour, illness etc. Data will be submitted on a regular basis relating to these categories, so I need to find a good way to design the db to efficiently accommodate this. A simple approach would be just to store multiple records for each pet within each relevant table - e.g. the behaviour table would store the behaviour data and would simply have a timestamp for each record along with the identifier for that pet. When querying the db, it would be straightforward to query the one table with the pet id, using the timestamps to output the correct history of submissions. Is there a more sensible way around this or does that make sense?
I would use a combination of lookup tables with a strong use of foreign keys. I think what you are suggesting is very common. For example, a query for all the reported illnesses of a particular pet during a given date range would look something like:
SELECT *
FROM table_illness
WHERE table_illness.pet_id = <value>
AND <date> BETWEEN table_illness.start_date AND table_illness.finish_date
You could do that for any of the tables. The lookup tables will be a link between, for example, table_illness.illness_type and illness_types.illness_type. The illness_types table is where you would store the details on the types of illnesses.
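To illustrate, here is a small runnable sketch of the illness table with its lookup table, using SQLite via Python's sqlite3. The pet table, the description column, and the sample rows are assumptions; dates are stored as ISO strings so that BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE pet (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Lookup table: details about each kind of illness live here, once.
CREATE TABLE illness_types (
    illness_type TEXT PRIMARY KEY,
    description  TEXT
);
CREATE TABLE table_illness (
    id           INTEGER PRIMARY KEY,
    pet_id       INTEGER NOT NULL REFERENCES pet(id),
    illness_type TEXT    NOT NULL REFERENCES illness_types(illness_type),
    start_date   TEXT    NOT NULL,
    finish_date  TEXT    NOT NULL
);
""")

conn.executemany("INSERT INTO pet VALUES (?, ?)", [(1, "Rex"), (2, "Tom")])
conn.executemany(
    "INSERT INTO illness_types VALUES (?, ?)",
    [("arthritis", "joint inflammation"), ("kennel cough", "respiratory infection")],
)
conn.executemany(
    "INSERT INTO table_illness (pet_id, illness_type, start_date, finish_date) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "arthritis", "2024-01-01", "2024-12-31"),
        (1, "kennel cough", "2024-03-01", "2024-03-10"),
        (2, "kennel cough", "2024-03-02", "2024-03-12"),
    ],
)


def illnesses_on(pet_id, day):
    """Illness types recorded for one pet that were active on the given day."""
    rows = conn.execute(
        "SELECT illness_type FROM table_illness "
        "WHERE pet_id = ? AND ? BETWEEN start_date AND finish_date "
        "ORDER BY illness_type",
        (pet_id, day),
    )
    return [row[0] for row in rows]


print(illnesses_on(1, "2024-03-05"))
print(illnesses_on(1, "2024-06-01"))
```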
When designing a database you should build your tables to mimic real-life objects or concepts. So in that sense the design you suggest makes sense. Each pet should have its own record in a pet table which doesn't change. Changing information should then be placed into the appropriate table which has the pet's id. The time stamp method you suggest is probably what I would do -- unless of course this is for a vet or something. Then I'd create an appointment table with the date and connect the illness or behavior to the appointment as well.
