Database design / data model for social networking - database

I am designing a database for a social media website (uni assignment).
I have been struggling with linking messages to members. The Message table would need two foreign keys referencing the same primary key: one for the sending member and one for the receiving member. I was unsure whether this was possible or a good idea, so I was thinking of assigning members to an inbox (Many Members - One Inbox) and then assigning all messages to the inbox (One Inbox - Many Messages).
Member Many------1 Inbox 1------Many Messages
The tables look like this:
##################
Member Profiles
Member ID (PK)
Name
Gender
Inbox ID (FK)
##################
Inbox
Inbox ID
##################
Message
Message ID (PK)
Inbox ID (FK)
Message Direction (either "to" or "from", then the member's name)
Member ID (FK)
That's what I've got so far. I'd appreciate some pointers if I've gone off the right path, because the more I look at my design, the less I like it.

The draft model may help you. [Diagrams: message creation; message consumption (after the send-message process)]

No, don't do that.
Messages can belong to multiple members and members can have multiple messages, so you need what is called a join table:
MemberMessage
MemberID
MessageID
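A minimal sketch of that join table, using Python's built-in sqlite3 module. The Name and Body columns and the sample rows are illustrative assumptions; the composite primary key is what stops the same member being linked to the same message twice.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE Member (
            MemberID INTEGER PRIMARY KEY,
            Name     TEXT NOT NULL
        );
        CREATE TABLE Message (
            MessageID INTEGER PRIMARY KEY,
            Body      TEXT NOT NULL
        );
        -- Join table: one row per (member, message) pair.
        CREATE TABLE MemberMessage (
            MemberID  INTEGER NOT NULL REFERENCES Member(MemberID),
            MessageID INTEGER NOT NULL REFERENCES Message(MessageID),
            PRIMARY KEY (MemberID, MessageID)
        );
    """)


def messages_for(conn: sqlite3.Connection, member_id: int) -> list:
    """Bodies of all messages linked to one member."""
    rows = conn.execute(
        """SELECT m.Body
             FROM Message m
             JOIN MemberMessage mm ON mm.MessageID = m.MessageID
            WHERE mm.MemberID = ?
            ORDER BY m.MessageID""",
        (member_id,),
    )
    return [body for (body,) in rows]


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.executemany("INSERT INTO Member VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO Message VALUES (10, 'Hi Bob')")
# The same message is linked to both participants.
conn.executemany("INSERT INTO MemberMessage VALUES (?, ?)", [(1, 10), (2, 10)])
print(messages_for(conn, 2))  # ['Hi Bob']
```

On its own, this table records who is involved in a message but not who sent it and who received it.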

Have you considered changing the MemberID field on the Message table to something like SendingMemberID, and then adding another field called ReceivingMemberID? This would allow you to avoid the somewhat confusing Message Direction field.
Sure, you'd need to join to it twice to get all incoming and outgoing messages for a user, but that's really not a bad thing. The alternative option of having two records for each message (one for the sender, one for the receiver) has its own drawbacks.
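This also answers the original worry: a table can hold two foreign keys that reference the same primary key. Below is a minimal sqlite3 sketch of the SendingMemberID/ReceivingMemberID layout; the Body column and sample data are assumptions for illustration. Member is joined twice, once under the alias for the sender and once for the receiver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Member (
        MemberID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    -- Two foreign keys that both reference Member's primary key.
    CREATE TABLE Message (
        MessageID         INTEGER PRIMARY KEY,
        SendingMemberID   INTEGER NOT NULL REFERENCES Member(MemberID),
        ReceivingMemberID INTEGER NOT NULL REFERENCES Member(MemberID),
        Body              TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO Member VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO Message VALUES (?, ?, ?, ?)",
                 [(10, 1, 2, "Hi Bob"), (11, 2, 1, "Hi Alice"), (12, 3, 2, "Hey")])


def conversation_log(conn, member_id):
    """All incoming and outgoing messages for one member, with both names."""
    return conn.execute(
        """SELECT s.Name, r.Name, m.Body
             FROM Message m
             JOIN Member s ON s.MemberID = m.SendingMemberID    -- sender
             JOIN Member r ON r.MemberID = m.ReceivingMemberID  -- receiver
            WHERE ? IN (m.SendingMemberID, m.ReceivingMemberID)
            ORDER BY m.MessageID""",
        (member_id,),
    ).fetchall()


for sender, receiver, body in conversation_log(conn, 1):
    print(f"{sender} -> {receiver}: {body}")
# Alice -> Bob: Hi Bob
# Bob -> Alice: Hi Alice
```

No Message Direction field is needed: the direction follows from which of the two columns holds the member's ID.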
Good luck!

Related

skype main.db - difference between Chats and Conversations

I've been dissecting the Skype database main.db for a couple of days, and there is something I haven't yet figured out. (Disclaimer: this question is very specific to the Skype main.db structure.)
It seems that all the information I need is in the tables Conversations, Messages and Participants.
The Messages table contains the actual log of what was said, the recipient(s), the timestamp, and the convo_id foreign key (although not enforced) that connects each message to the Conversation it belongs to.
Conversations exists to aggregate Messages and the Contacts that participate in them.
The Participants table works as a many-to-many connector table between the Conversations and Contacts tables.
What gets me are the Chats and Chatmembers tables. Chatmembers is to Chats what Participants is to Conversations: it connects Contacts to the conversations, or 'chats'.
What's in Chats is similar to Conversations, except that it has no aggregate over the Messages table. It is impossible to map from the Messages table to the Chats table to find which chat a message log (a row of Messages) belongs to.
Chats and Conversations are linked by a foreign key: the Conversations table has a column named chat_dbid which joins to the Chats table. But some rows in Conversations have a null chat_dbid, and not every row in Chats has an id that corresponds to a chat_dbid in the Conversations table.
The Chats table is still being updated, and I recognize some of the chats (or conversations) I've had recently based on their timestamps and members.
Does anyone know exactly what Chats table does? Or rather, what's the difference and justification for Chats table and Conversations table?
When I searched frantically for this, I could find only one link that talked about the main.db structure, and it wasn't very helpful.
According to that link, Chats
Provides the chats in which the user participated.
and Conversations
Provides a list of the conversations in which the user participated.
What do they mean by Chats and Conversations? How are they different?
It's been driving me crazy.
Yesterday I was also going through the main.db tables in Skype. Below are my findings.
The Conversations table uniquely identifies a conversation with a particular contact (or a group contact you have created). A conversation entails all communication with that contact: chat messages, voice messages, file transfers and calls. Most of the other tables reference entries in this table:
Messages has convo_id,
Chats has conv_dbid,
Transfers has convo_id, and so on.
Messages table: entries are not always chat messages. If an entry is a chat message, its chatname field is populated.
It seems that chats and messages have a one-to-many relationship: a chat is a collection of messages grouped by some identifier (most probably by day, but I'm not sure). "type=61" seems to be a normal message, i.e. one typed by the user. Other types seem to be auto-generated messages, e.g. the message you get when a call is disconnected.
Hope this helps.
It looks like Chats are redundant. Messages are grouped into chats as an afterthought: you can have several Chats inside one Conversation, and some messages outside any Chat. The rules for grouping are unclear, perhaps by time.
Grouping is done by setting the chatname field of a bunch of messages to the same value. Chat names look like #SenderId/$TargetId;ChatId, or #SenderId/ChatId for Chats in a groupchat.
ChatIds don't seem to hold any particular meaning and can be different on different PCs.
Not every Chat gets an entry in the Chats table: SELECT DISTINCT(chatname) FROM Messages gives many more entries than SELECT * FROM Chats. Not everything that goes into chatname is the name of a chat from Chats; sometimes it's a conversation id (== groupchat id or skypename).
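A small sketch of that comparison with Python's sqlite3 module. It assumes Chats stores the chat identifier in a column named `name` (check with `PRAGMA table_info(Chats)` on your own copy), and it runs against a hypothetical toy table set rather than a real profile. The `WHERE name IS NOT NULL` in the subquery matters: a single NULL in a `NOT IN` list makes the whole test return no rows.

```python
import sqlite3


def chatnames_missing_from_chats(conn):
    """chatname values used in Messages that have no matching row in Chats.

    Assumes the Chats identifier column is called `name`.
    """
    rows = conn.execute(
        """SELECT DISTINCT m.chatname
             FROM Messages m
            WHERE m.chatname IS NOT NULL
              AND m.chatname NOT IN (SELECT name FROM Chats WHERE name IS NOT NULL)
            ORDER BY m.chatname"""
    )
    return [name for (name,) in rows]


# Toy stand-in for main.db with only the columns used above (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Chats (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, convo_id INTEGER, chatname TEXT);
    INSERT INTO Chats (name) VALUES ('#alice/$bob;1a2b');
    INSERT INTO Messages (convo_id, chatname) VALUES
        (1, '#alice/$bob;1a2b'),   -- has a row in Chats
        (1, '#alice/$bob;3c4d'),   -- same conversation, no row in Chats
        (2, 'bob');                -- a skypename used as chatname
""")
print(chatnames_missing_from_chats(conn))  # ['#alice/$bob;3c4d', 'bob']
```

To try it on real data, open a copy of main.db read-only, for example with `sqlite3.connect("file:main.db?mode=ro", uri=True)`.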
Different Skype instances also group the same synchronized messages into Chats differently.
So basically Chats are not important, they group messages arbitrarily, they don't contain any key data about who sent what to whom.
This is how I understand other tables work:
Contacts - this is everyone whose skypename is mentioned in the database, even people you never knew about (who said something in a groupchat you were listening to at the time). is_permanent marks those in your contact list.
Conversations - this is a union of your actual contacts and the groupchats you have ever joined. This is what one should see as the "contact list". If you also need contacts you've never messaged, add Contacts WHERE is_permanent=1. If you only want current contacts, filter by is_bookmarked or something like that.
There seem to be no duplicates or splits. One contact = one conversation, one groupchat = one conversation. If you're talking with a contact one-on-one and you add another party, previous messages remain in that contact's Conversation, and the following ones get their own Conversation.
Messages - this is all messages and events ever sent or received:
convo_id - always set, always references a conversation. This is how you identify which contact / groupchat the message was sent to.
chatname - always set, sometimes references a chat from Chats, sometimes a chat which is not in Chats, sometimes a groupchat id or skypename from Conversations. Mostly this can be ignored, or you can group messages by this field visually.
author, from_name - who sent this message and their nick at the time, always set properly.
dialog_partner - very unreliable, different values for the same message on different PCs
participant_count - sometimes set, sometimes not, same as with dialog_partner: unreliable.
identities - mentions all skypenames related to the event, or sometimes doesn't. Rules are unclear, unreliable.
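Following that advice, a sketch that reads one conversation's messages through convo_id and ignores chatname. The column names (Conversations.identity and displayname; Messages.author, from_name, timestamp and body_xml) are assumptions based on the main.db layout discussed here, and the rows are hypothetical toy data. The two messages to bob carry different chatname values but come back together because they share a convo_id.

```python
import sqlite3


def messages_by_conversation(conn, identity):
    """Messages for one contact or groupchat, keyed on convo_id, not chatname."""
    return conn.execute(
        """SELECT c.displayname, m.author, m.from_name, m.timestamp, m.body_xml
             FROM Messages m
             JOIN Conversations c ON c.id = m.convo_id
            WHERE c.identity = ?
            ORDER BY m.timestamp""",
        (identity,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Conversations (id INTEGER PRIMARY KEY, identity TEXT, displayname TEXT);
    CREATE TABLE Messages (
        id INTEGER PRIMARY KEY, convo_id INTEGER, chatname TEXT,
        author TEXT, from_name TEXT, timestamp INTEGER, body_xml TEXT
    );
    INSERT INTO Conversations VALUES (1, 'bob', 'Bob'), (2, 'carol', 'Carol');
    INSERT INTO Messages (convo_id, chatname, author, from_name, timestamp, body_xml) VALUES
        (1, '#alice/$bob;1a2b', 'alice', 'Alice', 100, 'hi'),
        (2, '#alice/$carol;9f', 'carol', 'Carol', 103, 'yo'),
        (1, '#alice/$bob;3c4d', 'bob', 'Bob', 105, 'hello');
""")
for row in messages_by_conversation(conn, "bob"):
    print(row)
# ('Bob', 'alice', 'Alice', 100, 'hi')
# ('Bob', 'bob', 'Bob', 105, 'hello')
```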

Need guidance in designing a database

I'm trying to design a website that would allow sports teams to more easily organize matches:
A user signs up and joins a team. Members can browse available teams and send them private messages to organize matches. After a match is done, teams can post comments on each other's pages mentioning their skill, sportsmanship etc. Here is what I imagined the database to look like:
User
* UserID
* Username
* email
Team
* TeamID
* TeamName
* OtherInfo
Review
* FromID
* ToID
* Date
* Comments
Message
* FromID
* ToID
* Content
UserTeam (junction table)
* pk (UserID, TeamID)
I'm not too sure how to model the reviews and the messages. A review has a from and a to field, so I can't just normalize the design like I would in a many-to-many situation, by using junction tables.
Note: Messages can be sent by either members or teams and can be received by either members or teams.
Not sure what the question is, but if your messages (and reviews?) can be sent and received by both teams and members, you need to differentiate the message purpose. Add a tinyint or some other column which indicates whether a message is from a person or a team and to a person or a team (so your query knows which table the FromID and ToID refer to).
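A minimal sqlite3 sketch of that idea, with separate FromType/ToType columns. The codes 0 = user and 1 = team, the column names and the sample data are all hypothetical. Each LEFT JOIN only matches when the type column points at its table, and a CASE picks whichever name was found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (UserID INTEGER PRIMARY KEY, Username TEXT NOT NULL);
    CREATE TABLE Team (TeamID INTEGER PRIMARY KEY, TeamName TEXT NOT NULL);
    -- FromType/ToType: 0 = user, 1 = team (hypothetical codes).
    -- FromID/ToID cannot be declared as foreign keys, because the table
    -- they point to depends on the type column.
    CREATE TABLE Message (
        MsgID    INTEGER PRIMARY KEY,
        FromType INTEGER NOT NULL CHECK (FromType IN (0, 1)),
        FromID   INTEGER NOT NULL,
        ToType   INTEGER NOT NULL CHECK (ToType IN (0, 1)),
        ToID     INTEGER NOT NULL,
        Content  TEXT NOT NULL
    );
    INSERT INTO User VALUES (1, 'sam');
    INSERT INTO Team VALUES (1, 'Rovers'), (2, 'United');
    INSERT INTO Message VALUES
        (1, 0, 1, 1, 2, 'Fancy a match on Saturday?'),  -- user sam -> team United
        (2, 1, 2, 1, 1, 'Sure, 3pm?');                  -- team United -> team Rovers
""")


def message_names(conn):
    """(sender name, receiver name, content), resolving each side via its type."""
    return conn.execute(
        """SELECT CASE m.FromType WHEN 0 THEN fu.Username ELSE ft.TeamName END,
                  CASE m.ToType   WHEN 0 THEN tu.Username ELSE tt.TeamName END,
                  m.Content
             FROM Message m
             LEFT JOIN User fu ON m.FromType = 0 AND fu.UserID = m.FromID
             LEFT JOIN Team ft ON m.FromType = 1 AND ft.TeamID = m.FromID
             LEFT JOIN User tu ON m.ToType   = 0 AND tu.UserID = m.ToID
             LEFT JOIN Team tt ON m.ToType   = 1 AND tt.TeamID = m.ToID
            ORDER BY m.MsgID"""
    ).fetchall()


for row in message_names(conn):
    print(row)
```

The trade-off is that the database can no longer check FromID and ToID for you; the application has to keep them valid.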
What about the team sizes, and what do you mean by saying: "Members can browse available teams and send them private messages to organize matches."? Are all the teams the same size, and can people move from team to team freely? If so, you need to keep track of each team's members and whether there is any need for additional members. You can do this in the Team table or the UserTeam junction table.
I'm also guessing your website needs a login functionality. It might be a good idea to differentiate between basic team members and official representatives of the team. So only the official members (or whatever) are able to send (and receive!) messages between teams. (If you mean to implement a simple guestbook type of solution this point might be useless.)
Edit: an option for normalizing.
I don't see anything wrong in your current schema but you could combine Review and Message tables like this:
Communication
MsgID (PK)
FromID (FK to user, NOT NULL)
AnswerTo (FK to MsgID, NULL)
Timestamp
Review/Message (tinyint [what type the communication is], NOT NULL)
Text (NOT NULL)
The Review/Message column could also differentiate messages between persons and teams, so FromID could be an FK to TeamID as well.
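A sketch of the combined Communication table in sqlite3. Because "Review/Message" is not a valid column name, it is called Kind here, with hypothetical codes 0 = message and 1 = review; the User table and the sample rows are also illustrative. AnswerTo is a nullable self-reference, so a reply points at the communication it answers.

```python
import sqlite3

MESSAGE, REVIEW = 0, 1  # hypothetical values for the Kind column

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE User (UserID INTEGER PRIMARY KEY, Username TEXT NOT NULL);
    CREATE TABLE Communication (
        MsgID     INTEGER PRIMARY KEY,
        FromID    INTEGER NOT NULL REFERENCES User(UserID),
        AnswerTo  INTEGER REFERENCES Communication(MsgID),  -- NULL unless a reply
        Timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        Kind      INTEGER NOT NULL CHECK (Kind IN (0, 1)),   -- 0 = message, 1 = review
        Text      TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO User VALUES (?, ?)", [(1, "sam"), (2, "alex")])
conn.executemany(
    "INSERT INTO Communication (MsgID, FromID, AnswerTo, Kind, Text) VALUES (?, ?, ?, ?, ?)",
    [(1, 1, None, MESSAGE, "Match on Saturday?"),
     (2, 2, 1, MESSAGE, "Yes, 3pm."),
     (3, 2, None, REVIEW, "Great sportsmanship.")],
)


def answers_to(conn, msg_id):
    """Direct replies to one communication, oldest first."""
    return [text for (text,) in conn.execute(
        "SELECT Text FROM Communication WHERE AnswerTo = ? ORDER BY MsgID", (msg_id,))]


def reviews(conn):
    """All communications of the review kind."""
    return [text for (text,) in conn.execute(
        "SELECT Text FROM Communication WHERE Kind = ? ORDER BY MsgID", (REVIEW,))]


print(answers_to(conn, 1))  # ['Yes, 3pm.']
print(reviews(conn))        # ['Great sportsmanship.']
```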

How to solve this with a DB

I have a table Users (UserID, FirstName, LastName...) and a table Messages. Table Messages stores messages, which are sent between users.
So, I could create this table as:
Messages (SenderID, ReceiverID...) with 2 FKs to Users, but this approach seems incorrect and does not let me set up cascading delete/update on the FKs.
Multiple messages are not allowed.
Also, I can't set "Set Null" for both relationships. Why? It would be very useful.
Which structure is correct in this case?
You have to look at the problem from the users perspective.
Do you think that the receiver wants their message deleted when the sender deletes the message from their outbox?
No. In other words: Create one copy of the message for each user.
Multiple messages are not allowed
Insane requirement. It IS two different messages.
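A minimal sqlite3 sketch of "one copy per user". The OwnerID column (whose mailbox a copy lives in) and the helper functions are hypothetical names; sending writes two rows in one transaction, and deleting removes only the caller's copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, FirstName TEXT NOT NULL);
    -- One row per copy: OwnerID is the user whose mailbox holds this copy.
    CREATE TABLE Messages (
        MessageID  INTEGER PRIMARY KEY,
        OwnerID    INTEGER NOT NULL REFERENCES Users(UserID),
        SenderID   INTEGER NOT NULL REFERENCES Users(UserID),
        ReceiverID INTEGER NOT NULL REFERENCES Users(UserID),
        Body       TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])


def send(conn, sender_id, receiver_id, body):
    """Store one copy for the sender's outbox and one for the receiver's inbox."""
    with conn:  # both rows are committed together or not at all
        conn.executemany(
            "INSERT INTO Messages (OwnerID, SenderID, ReceiverID, Body) VALUES (?, ?, ?, ?)",
            [(sender_id, sender_id, receiver_id, body),
             (receiver_id, sender_id, receiver_id, body)],
        )


def delete_from_mailbox(conn, owner_id, message_id):
    """Delete only the owner's copy; the other participant's copy is untouched."""
    with conn:
        conn.execute("DELETE FROM Messages WHERE MessageID = ? AND OwnerID = ?",
                     (message_id, owner_id))


def inbox(conn, user_id):
    return [b for (b,) in conn.execute(
        "SELECT Body FROM Messages WHERE OwnerID = ? AND ReceiverID = ?", (user_id, user_id))]


def outbox(conn, user_id):
    return [b for (b,) in conn.execute(
        "SELECT Body FROM Messages WHERE OwnerID = ? AND SenderID = ?", (user_id, user_id))]


send(conn, 1, 2, "Hello Ben")
ann_copy = conn.execute(
    "SELECT MessageID FROM Messages WHERE OwnerID = 1 AND SenderID = 1").fetchone()[0]
delete_from_mailbox(conn, 1, ann_copy)
print(outbox(conn, 1), inbox(conn, 2))  # [] ['Hello Ben']
```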

private message database design

I'm creating a simple private message system and I'm not sure which database design is better.
The first design is a table for messages, and a table for message comments:
Message
---------------
id
recipientId
senderId
title
body
created_at
MessageComment
---------------
id
messageId
senderId
body
created_at
The second design is one table for both messages and comments, with an additional field messageId so I'll be able to chain messages as comments:
Message
---------------
id
recipientId
senderId
messageId
title
body
created_at
I'd like to hear your opinion!
In this case, I'd vote for one table.
In general, whenever the data in two tables is the same or very similar and the logical concepts they represent are closely related, I'd put them in a single table. If there are lots of differences in the data or the concepts are really different, I'd make them two tables.
If you make two tables and you find yourself regularly writing queries that do a union of the two, that's an indication that they should be combined.
If you make one table but you find there are many fields that are always null for case A and other fields that are always null for case B, or if you're giving awkward double-meanings to fields, like "for type A this field is the zip code but for type B it's the product serial number", that's an indication they should be broken out.
Using a single table is the most advantageous.
It allows better message threading possibilities and it reduces duplication of effort, i.e. what happens when you want to add a column.
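One way the single table can support threading is a recursive query over messageId, sketched here with sqlite3 under the assumption that messageId holds the parent message's id (NULL for a new thread). The recursive CTE is a technique chosen for illustration, not something the answers specify; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Message (
        id          INTEGER PRIMARY KEY,
        recipientId INTEGER NOT NULL,
        senderId    INTEGER NOT NULL,
        messageId   INTEGER REFERENCES Message(id),  -- parent; NULL starts a thread
        title       TEXT,
        body        TEXT NOT NULL,
        created_at  TEXT NOT NULL
    );
    INSERT INTO Message VALUES
        (1, 2, 1, NULL, 'Lunch?', 'Lunch on Friday?', '2024-01-01 09:00'),
        (2, 1, 2, 1,    NULL,     'Sure, where?',     '2024-01-01 09:05'),
        (3, 2, 1, 2,    NULL,     'The usual place.', '2024-01-01 09:07'),
        (4, 3, 1, NULL, 'Report', 'Report attached.', '2024-01-02 10:00');
""")


def thread(conn, root_id):
    """The root message and every reply beneath it, oldest first."""
    rows = conn.execute(
        """WITH RECURSIVE t(id) AS (
               SELECT id FROM Message WHERE id = ?
               UNION ALL
               SELECT m.id FROM Message m JOIN t ON m.messageId = t.id
           )
           SELECT m.body FROM Message m JOIN t ON t.id = m.id
           ORDER BY m.created_at""",
        (root_id,),
    )
    return [body for (body,) in rows]


print(thread(conn, 1))  # ['Lunch on Friday?', 'Sure, where?', 'The usual place.']
```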
I would rather use the first one and add a field del_code to both tables. That way you'll be able to hide deleted messages and still have them in your database.
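A sketch of del_code on the two-table design, in sqlite3. The codes (0 = visible, 1 = deleted) are assumptions, and hiding a message's comments along with it is one possible policy rather than part of the answer. Nothing is physically removed; every read simply filters on del_code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Message (
        id INTEGER PRIMARY KEY, recipientId INTEGER NOT NULL, senderId INTEGER NOT NULL,
        title TEXT, body TEXT NOT NULL, created_at TEXT NOT NULL,
        del_code INTEGER NOT NULL DEFAULT 0   -- 0 = visible, 1 = deleted (assumed codes)
    );
    CREATE TABLE MessageComment (
        id INTEGER PRIMARY KEY, messageId INTEGER NOT NULL REFERENCES Message(id),
        senderId INTEGER NOT NULL, body TEXT NOT NULL, created_at TEXT NOT NULL,
        del_code INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO Message (id, recipientId, senderId, title, body, created_at) VALUES
        (1, 2, 1, 'Hi', 'First message', '2024-01-01'),
        (2, 2, 1, 'Again', 'Second message', '2024-01-02');
    INSERT INTO MessageComment (id, messageId, senderId, body, created_at) VALUES
        (1, 1, 2, 'A reply', '2024-01-01');
""")


def soft_delete_message(conn, message_id):
    """Hide a message and its comments without removing any rows."""
    with conn:
        conn.execute("UPDATE Message SET del_code = 1 WHERE id = ?", (message_id,))
        conn.execute("UPDATE MessageComment SET del_code = 1 WHERE messageId = ?",
                     (message_id,))


def visible_messages(conn, recipient_id):
    return [body for (body,) in conn.execute(
        "SELECT body FROM Message WHERE recipientId = ? AND del_code = 0 ORDER BY id",
        (recipient_id,))]


soft_delete_message(conn, 1)
print(visible_messages(conn, 2))  # ['Second message']
```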

App Engine: how would you... snapshotting entities

Let's say you have two kinds, Message and Contact, related by a db.ListProperty of keys on Message. A user creates a message, adds some contacts as recipients, and emails the message. Later, the user deletes one of the contact entities that was a recipient of the message. Our application should delete the appropriate Contact entity, but we want to preserve the original recipient list for the message that was sent for the user's records. In essence, we want a snapshot of the message entity at the time it was sent. If we naively delete the contact entity, though, we lose snapshot integrity; if not, we are left with an invalid key.
How would you handle this situation, either in controller logic or model changes?
```python
from google.appengine.ext import db

class User(db.Model):
    email = db.EmailProperty(required=True)

class Contact(db.Model):
    email = db.EmailProperty(required=True)
    user = db.ReferenceProperty(User, collection_name='contacts')

class Message(db.Model):
    recipients = db.ListProperty(db.Key)  # keys of Contact entities
    sender = db.ReferenceProperty(User, collection_name='messages')
    body = db.TextProperty()
    is_emailed = db.BooleanProperty(default=False)
```
I would add a boolean field "deleted" (or something spiffier, such as the date and time of deletion) to the Contact model -- so that contacts are never physically deleted, but rather only "logically" deleted when that field is set. (This also lets you offer other cool features such as "show my old now-deleted contacts", "undelete" functionality, etc, if you wish).
This is a common approach in all storage systems that are required to maintain historical integrity (and/or similar requirements such as "auditability").
In cases where the sheer number of logically deleted entities threatens to damage system performance, the classic alternative is a separate, identical model "DeletedContacts". Foreign key constraints then require more work: e.g. the Message class would need both recipients and deleted_recipients fields if you needed foreign key integrity (but using just keys, as you're doing, this extra work would not be needed).
I doubt the average user will delete such a huge percentage of their contacts as to warrant the optimization explained in the last paragraph, so in this case I'd go with the simple "deleted" field.
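The datastore isn't available outside the App Engine SDK, so this plain-Python sketch only models the logic of the "deleted" field; the in-memory ContactStore and its methods are hypothetical. In the real model the marker would be an ordinary property such as a `db.DateTimeProperty()` on Contact, and queries for the contact list would filter on it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Contact:
    key: str
    email: str
    deleted_at: Optional[datetime] = None  # set on "delete"; the record is kept


@dataclass
class Message:
    recipients: List[str]  # Contact keys, as in the original model
    body: str


class ContactStore:
    """Hypothetical in-memory stand-in for the datastore."""

    def __init__(self) -> None:
        self._contacts: Dict[str, Contact] = {}

    def add(self, key: str, email: str) -> None:
        self._contacts[key] = Contact(key, email)

    def delete(self, key: str) -> None:
        """Logical delete: mark the contact instead of removing it."""
        self._contacts[key].deleted_at = datetime.now(timezone.utc)

    def undelete(self, key: str) -> None:
        self._contacts[key].deleted_at = None

    def active(self) -> List[Contact]:
        """What the user sees in their contact list."""
        return [c for c in self._contacts.values() if c.deleted_at is None]

    def recipient_emails(self, message: Message) -> List[str]:
        """Every key in a sent message still resolves, deleted or not."""
        return [self._contacts[k].email for k in message.recipients]


store = ContactStore()
store.add("c1", "ann@example.com")
store.add("c2", "ben@example.com")
sent = Message(recipients=["c1", "c2"], body="See you Friday")
store.delete("c2")
print([c.email for c in store.active()])  # ['ann@example.com']
print(store.recipient_emails(sent))       # ['ann@example.com', 'ben@example.com']
```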
Alternately, you could refactor your Contact model by moving the email address into the key name and setting the user as the parent entity. Your recipients property would change to a string list of raw email addresses. This gives you a static list of email recipients without having to fetch a set of corresponding entities for each one, or requiring that such entities still exist. If you want to fetch the contact entities, you can easily construct their keys from the user and the recipient address.
One limitation here is that the email address on an existing contact entity cannot be changed, but I think you have that problem anyway. Changing a contact address with your existing model would retroactively change the recipients of a sent message, which we know is a problem.
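A plain-Python model of the key-name approach, since the datastore itself can't run here. A tuple (parent user id, kind, key name) stands in for a datastore key; all names are hypothetical. In the `db` API the equivalent key would be built roughly as `db.Key.from_path('Contact', address, parent=user.key())`. The message keeps raw addresses, so deleting a contact leaves the recipient list intact, and lookups simply return nothing for the missing contact.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a datastore key: (parent user id, kind, key name).
Key = Tuple[str, str, str]


def contact_key(user_id: str, email: str) -> Key:
    """The email address is the key name and the user is the parent."""
    return (user_id, "Contact", email)


@dataclass
class Contact:
    nickname: str  # other contact data; the email now lives in the key


@dataclass
class Message:
    sender_id: str
    recipients: List[str]  # raw email addresses: a fixed snapshot
    body: str


datastore: Dict[Key, Contact] = {}


def recipient_contacts(message: Message) -> Dict[str, Optional[Contact]]:
    """Rebuild each contact key from sender + address; None if the contact is gone."""
    return {addr: datastore.get(contact_key(message.sender_id, addr))
            for addr in message.recipients}


datastore[contact_key("u1", "ann@example.com")] = Contact("Ann")
datastore[contact_key("u1", "ben@example.com")] = Contact("Ben")
sent = Message("u1", ["ann@example.com", "ben@example.com"], "See you Friday")
del datastore[contact_key("u1", "ben@example.com")]
print(sent.recipients)  # ['ann@example.com', 'ben@example.com']
print(recipient_contacts(sent))
```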
