I've been dissecting the Skype database main.db for a couple of days, and this is something I haven't yet figured out. Disclaimer: this question is very specific to the structure of Skype's main.db.
It seems that all the information I need is in the tables Conversations, Messages and Participants.
The Messages table contains the actual log of what has been said, the recipient(s), the timestamp, and the convo_id foreign key (although not enforced) connecting each message to the Conversation it belongs to.
Conversations exists to hold the aggregate of Messages and the Contacts that participate in them.
The Participants table works as a many-to-many connector table between the Conversations table and the Contacts table.
What gets me are the Chats and Chatmembers tables. Chatmembers is to Chats what Participants is to Conversations: it connects Contacts with the conversations, or 'chats'.
What's in Chats is similar to Conversations, except that it has no aggregate link to the Messages table. It is impossible to map from a row of the Messages table to the row of the Chats table that the message belongs to.
Chats and Conversations share a foreign key: the Conversations table has a column named chat_dbid which joins to the Chats table. But there are rows in the Conversations table with a null chat_dbid field, and not every row in Chats has an id that appears as a chat_dbid in the Conversations table.
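The mismatch described above can be checked with two small queries. This is only a sketch using Python's sqlite3 against a tiny in-memory stand-in, not a real main.db: the columns id and chat_dbid come from the description above, while the sample rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for main.db; the real tables have many more columns.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Chats (id INTEGER PRIMARY KEY);
    CREATE TABLE Conversations (id INTEGER PRIMARY KEY, chat_dbid INTEGER);
    INSERT INTO Chats (id) VALUES (10), (11), (12);
    INSERT INTO Conversations (id, chat_dbid) VALUES (1, 10), (2, NULL), (3, 11);
""")

# Conversations that point at no chat at all.
convs_without_chat = [r[0] for r in con.execute(
    "SELECT id FROM Conversations WHERE chat_dbid IS NULL ORDER BY id")]

# Chats that no conversation points at (anti-join).
orphan_chats = [r[0] for r in con.execute("""
    SELECT c.id FROM Chats AS c
    LEFT JOIN Conversations AS v ON v.chat_dbid = c.id
    WHERE v.id IS NULL
    ORDER BY c.id
""")]

print(convs_without_chat, orphan_chats)
```

To try it on real data, one would presumably connect to a copy of main.db instead of ":memory:" and skip the executescript call.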
The Chats table is still being updated, and I recognize some of the chats (or conversations) I've had recently based on the timestamp and the members in them.
Does anyone know exactly what the Chats table does? Or rather, what's the difference between the Chats table and the Conversations table, and the justification for having both?
When I searched frantically for this, I could find only one link that talked about the main.db structure, and it wasn't very helpful.
According to the link Chats
Provides the chats in which the user participated.
and Conversations
Provides a list of the conversations in which the user participated.
What's their terminology about Chats and Conversations? How are they different?
It's been driving me crazy.
Yesterday I was also going through the main.db tables in Skype. Below are my findings.
The Conversations table uniquely identifies a conversation with a particular contact (or a group contact you have created). A conversation entails all communication with a particular contact: chat messages, voice messages, file transfers and calls. Most of the tables have references to the entry in this table:
Messages table has convo_id,
Chats table has conv_dbid,
Transfers has convo_id, and likewise.
Messages table: message entries are not always chats. If an entry is a chat, then its chatname field is populated.
It seems that chats and messages have a one-to-many relation. A chat is a collection of messages maintained per some identifier (most probably per day, I'm not sure). "type=61" seems to be a normal message: one typed by the user. Other types seem to be auto-generated messages, e.g. the message you get if a call is disconnected.
Hope this helps.
It looks like Chats are redundant. Messages are grouped into chats as an after-thought, you can have several Chats inside one Conversation and then some messages outside any Chat. The rules for grouping are unclear, perhaps by time.
Grouping is done by setting chatname field of a bunch of messages to the same value. Chat names look like #SenderId/$TargetId;ChatId or #SenderId/ChatId for Chats over groupchat.
ChatIds don't seem to hold any particular meaning and can be different on different PCs.
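For what it's worth, the two chatname shapes described above can be pulled apart with a regular expression. This is a rough sketch based only on the patterns given in this answer; the function name and the sample values are made up, and real data may contain other shapes.

```python
import re

# "#SenderId/$TargetId;ChatId" or "#SenderId/ChatId", as described above.
CHATNAME_RE = re.compile(
    r"^#(?P<sender>[^/]+)/(?:\$(?P<target>[^;]+);)?(?P<chat_id>.+)$")

def parse_chatname(chatname):
    """Return a dict with sender, target and chat_id, or None if the value
    is not a chat name (e.g. a plain skypename or groupchat id)."""
    m = CHATNAME_RE.match(chatname)
    return m.groupdict() if m else None

print(parse_chatname("#alice/$bob;1a2b"))
print(parse_chatname("#alice/9f9f"))
print(parse_chatname("bob"))
```

For the one-on-one form, target is filled in; for the groupchat form it is None.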
Not every Chat gets an entry in Chats table: SELECT DISTINCT(chatname) FROM Messages gives a great many more entries than SELECT * FROM Chats. Not everything that goes into chatname is a name of chat from Chats. Sometimes it's a conversation id (== groupchat id or skypename).
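The comparison above can be made a bit more precise by listing the chatname values that have no row in Chats. A minimal sketch with sqlite3 on made-up data; it assumes that Chats stores the chat name in a column called name, which is an assumption and not something stated in this thread.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Chats (id INTEGER PRIMARY KEY, name TEXT);  -- 'name' is assumed
    CREATE TABLE Messages (id INTEGER PRIMARY KEY, convo_id INTEGER, chatname TEXT);
    INSERT INTO Chats (name) VALUES ('#alice/$bob;aaa');
    INSERT INTO Messages (convo_id, chatname) VALUES
        (1, '#alice/$bob;aaa'),
        (1, '#alice/$bob;aaa'),
        (1, '#alice/$bob;bbb'),
        (1, 'bob');
""")

distinct_chatnames = con.execute(
    "SELECT COUNT(DISTINCT chatname) FROM Messages").fetchone()[0]
chat_rows = con.execute("SELECT COUNT(*) FROM Chats").fetchone()[0]

# chatname values that have no matching row in Chats.
unmatched = [r[0] for r in con.execute("""
    SELECT DISTINCT m.chatname FROM Messages AS m
    WHERE NOT EXISTS (SELECT 1 FROM Chats AS c WHERE c.name = m.chatname)
    ORDER BY m.chatname
""")]

print(distinct_chatnames, chat_rows, unmatched)
```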
Different Skype instances also group the same synchronized messages into Chats differently.
So basically Chats are not important, they group messages arbitrarily, they don't contain any key data about who sent what to whom.
This is how I understand other tables work:
Contacts - this is everyone whose skypename is mentioned in the database, even people you never knew about (who said something in a groupchat you were listening to at the time). is_permanent marks those in your contact list.
Conversations - this is a union of your actual contacts and the groupchats you have ever joined. This is what one should see as the "contact list". If you need contacts you've never messaged, add Contacts WHERE is_permanent=1. If you only want present contacts, filter by is_bookmarked or something like that.
There seem to be no duplicates or splits. One contact = one conversation, one groupchat = one conversation. If you're talking with a contact one on one and you add another party, previous messages remain in that contact's Conversation, and the following ones get their own Conversation.
Messages - this is all messages and events ever sent or received:
convo_id - always set, always references a conversation. This is how you identify which contact / groupchat the message was sent to.
chatname - always set, sometimes references a chat from Chats, sometimes a chat which is not in Chats, sometimes a groupchat id or skypename from Conversations. Mostly this can be ignored, or you can group messages by this field visually.
author, from_name - who sent this message and their nick at the time, always set properly.
dialog_partner - very unreliable, different values for the same message on different PCs
participant_count - sometimes set, sometimes not, same as with dialog_partner: unreliable.
identities - mentions all skypenames related to the event, or sometimes doesn't. Rules are unclear, unreliable.
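Putting the above together, reading one conversation's log only needs convo_id and can ignore chatname entirely. A minimal sketch with sqlite3 on made-up data; the columns identity (on Conversations) and body_xml (on Messages) are assumptions for illustration, and type 30 is just an arbitrary non-61 value standing in for an auto-generated event.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Conversations (id INTEGER PRIMARY KEY, identity TEXT);
    CREATE TABLE Messages (
        id INTEGER PRIMARY KEY, convo_id INTEGER, author TEXT,
        timestamp INTEGER, type INTEGER, body_xml TEXT);
    INSERT INTO Conversations (id, identity) VALUES (1, 'bob'), (2, 'carol');
    INSERT INTO Messages (convo_id, author, timestamp, type, body_xml) VALUES
        (1, 'me',    100, 61, 'hi'),
        (1, 'bob',   110, 61, 'hello'),
        (1, 'bob',   120, 30, NULL),
        (2, 'carol',  90, 61, 'ping');
""")

def typed_messages(con, identity):
    """Return (timestamp, author, body) for typed messages in one conversation."""
    return con.execute("""
        SELECT m.timestamp, m.author, m.body_xml
        FROM Messages AS m
        JOIN Conversations AS c ON c.id = m.convo_id
        WHERE c.identity = ? AND m.type = 61
        ORDER BY m.timestamp
    """, (identity,)).fetchall()

log = typed_messages(con, "bob")
print(log)
```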
Related
My question is a bit of a logical one. I hope my title is not misleading.
I'm working on a mail application like website where users can send or receive documents.
Documents are kept in a database table which holds attributes like Sender, Receiver, DeleteDate, DeleteUserId etc.
Let's look at this scenario.
A sends Document1 to B.
Document1 is at A's outbox and B's Inbox
A wants to delete it from the outbox
At this moment my deletion mechanism kicks in and sets the DeleteDate and DeleteUserId of Document1 to the current date and the Id of A respectively.
The problem is that the document is now logically deleted (DeleteDate and DeleteUserId are not null anymore), so neither A nor B can see it, because the listing stored procedures don't include "deleted" items in the list.
What kind of logic should be implemented so that B can still see it and A cannot?
Not the best solution, but you could update the sender_id in the document table; the sender then loses the connection with the document. But if some other logic depends on that column, you will create other errors.
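One common alternative, sketched here under assumptions and not taken from the question's actual schema, is to track deletion per side instead of per document. The table and column names (Documents, sender_deleted_at, receiver_deleted_at) and the helper functions are hypothetical, and SQLite stands in for whatever database and stored procedures the site really uses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Documents (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL,
        receiver_id INTEGER NOT NULL,
        sender_deleted_at TEXT,      -- set when the sender removes it from the outbox
        receiver_deleted_at TEXT);   -- set when the receiver removes it from the inbox
    INSERT INTO Documents (id, sender_id, receiver_id) VALUES (1, 100, 200);
""")

def delete_for(con, doc_id, user_id):
    """Mark the document deleted only for the side that user_id is on."""
    con.execute("""UPDATE Documents SET sender_deleted_at = datetime('now')
                   WHERE id = ? AND sender_id = ?""", (doc_id, user_id))
    con.execute("""UPDATE Documents SET receiver_deleted_at = datetime('now')
                   WHERE id = ? AND receiver_id = ?""", (doc_id, user_id))

def outbox(con, user_id):
    return [r[0] for r in con.execute(
        "SELECT id FROM Documents WHERE sender_id = ? AND sender_deleted_at IS NULL",
        (user_id,))]

def inbox(con, user_id):
    return [r[0] for r in con.execute(
        "SELECT id FROM Documents WHERE receiver_id = ? AND receiver_deleted_at IS NULL",
        (user_id,))]

delete_for(con, 1, 100)   # A (user 100) deletes Document1 from the outbox
print(outbox(con, 100), inbox(con, 200))
```

With this layout A's outbox no longer lists the document while B's inbox still does; the row could be purged physically once both columns are set.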
I am designing a database for a social media website (uni assignment).
I have been struggling with the link between messages and members. There will be a need for two foreign keys in messages that reference the same primary key: one for the sending member and one for the receiving member. I was unsure if this was possible or a good idea, so I was thinking of assigning each member to an inbox (Many Members - One Inbox) and then assigning all messages to the inbox (One Inbox - Many Messages).
Member Many------1 Inbox 1------Many Messages
Tables look like....
##################
Member Profiles
Member ID (PK)
Name
Gender
Inbox ID (FK)
##################
Inbox
Inbox ID
##################
Message
Message ID (PK)
Inbox ID (FK)
Message Direction .... either to or from (then the members name)
Member ID (FK)
That's what I've got so far. I'd appreciate some pointers if I've gone off the right path, because the more I look at my design the less I like it.
The draft model may help you:
Message creation:
Message consumption: (after send message process)
No, don't do that.
Messages can belong to multiple members and members can have multiple messages, so you need what is called a join table.
MemberMessage
Memberid
Message id
Have you considered changing the MemberID field on the Message table to something like SendingMemberID, and then adding another field called ReceivingMemberID? This would allow you to avoid the somewhat confusing Message Direction field.
Sure you'd need to join to it twice to get all incoming and outgoing messages for a user, but that's really not a bad thing. The alternative option of having two records for each message (one for sender, one for receiver) has its own drawbacks.
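A minimal sketch of that layout, using SQLite through Python just to have something runnable. SendingMemberID and ReceivingMemberID are the columns suggested above; the Body column and the sample rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
con.executescript("""
    CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Message (
        MessageID INTEGER PRIMARY KEY,
        SendingMemberID INTEGER NOT NULL REFERENCES Member(MemberID),
        ReceivingMemberID INTEGER NOT NULL REFERENCES Member(MemberID),
        Body TEXT);
    INSERT INTO Member VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO Message (SendingMemberID, ReceivingMemberID, Body) VALUES
        (1, 2, 'hello Ben'), (2, 1, 'hello Ann');
""")

# Join Member twice, once per role, to show who sent what to whom.
rows = con.execute("""
    SELECT s.Name, r.Name, m.Body
    FROM Message AS m
    JOIN Member AS s ON s.MemberID = m.SendingMemberID
    JOIN Member AS r ON r.MemberID = m.ReceivingMemberID
    ORDER BY m.MessageID
""").fetchall()

print(rows)
```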
Good luck!
I have a table Users (UserID, FirstName, LastName...) and a table Messages. Table Messages stores messages, which are sent between users.
So, I can create this table like:
Messages (SenderID, ReceiverID...) and create 2 FKs to Users, but this approach seems incorrect and does not allow cascade delete/update on both FKs.
Multiple messages are not allowed.
Also, I can't set "Set Null" for both relationships. Why? It would be very useful.
Which structure is correct in this case?
You have to look at the problem from the users perspective.
Do you think that the receiver wants their message deleted when the sender deletes their copy from the outbox?
No. In other words: Create one copy of the message for each user.
Multiple messages are not allowed
Insane requirement. It IS two different messages.
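A rough sketch of the "one copy per user" idea, under assumptions: the OwnerID and Folder columns and the send/folder helpers are hypothetical names for illustration, and SQLite stands in for the asker's database. Each copy has a single owning user, so only one cascading FK is needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE Messages (
        MessageID INTEGER PRIMARY KEY,
        OwnerID INTEGER NOT NULL REFERENCES Users(UserID) ON DELETE CASCADE,
        Folder TEXT NOT NULL CHECK (Folder IN ('outbox', 'inbox')),
        SenderID INTEGER,     -- informational only, no cascading FK
        ReceiverID INTEGER,   -- informational only, no cascading FK
        Body TEXT);
    INSERT INTO Users VALUES (1, 'Ann'), (2, 'Ben');
""")

def send(con, sender_id, receiver_id, body):
    """Store one copy in the sender's outbox and one in the receiver's inbox."""
    con.executemany(
        "INSERT INTO Messages (OwnerID, Folder, SenderID, ReceiverID, Body) "
        "VALUES (?, ?, ?, ?, ?)",
        [(sender_id, "outbox", sender_id, receiver_id, body),
         (receiver_id, "inbox", sender_id, receiver_id, body)])

def folder(con, owner_id, name):
    return [r[0] for r in con.execute(
        "SELECT Body FROM Messages WHERE OwnerID = ? AND Folder = ?",
        (owner_id, name))]

send(con, 1, 2, "hello")
# Ann deletes her outbox copy; Ben's inbox copy is a separate row.
con.execute("DELETE FROM Messages WHERE OwnerID = 1 AND Folder = 'outbox'")
print(folder(con, 1, "outbox"), folder(con, 2, "inbox"))
```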
What's the standard relational database idiom for setting permissions for items?
Answers should be general; however, they should be able to be applied to example below. Anything flies: adding columns, adding another table—whatever as long as it works well.
Application / Example
Assume the Twitter database is extremely simple: we have one User table, which contains a login and user id; we have a Tweet table, which contains a tweet id, tweet text, and creator id; and we have a Follower table, which contains the id of the person being followed and the follower.
Now, assume Twitter wants to enable advanced privacy settings (viewing permissions), so that users can pick exactly which followers can view tweets. The settings can be:
Everyone on Twitter
Only current followers (which would of course have to be approved by the user, this doesn't really matter though) EDIT: Current as in, I get a new follower, he sees it; I remove a follower, he stops seeing it.
Specific followers (e.g., user id 5, 10, 234, and 1)
Only the owner
Under these circumstances, what's the best way to represent viewing permissions? The priorities, in order, are speed of lookup (you want to be able to figure out what tweets to display to a user quickly), speed of creation (you don't want to take forever to post a tweet), and efficient use of space (every time I post a tweet to everyone on my followers' list, I shouldn't have to add a row for each and every follower I have to some table.)
Looks like a typical many-to-many relationship -- I don't see any restrictions on what you desire that would allow space savings wrt the typical relational DB idiom for those, i.e., a table with two columns (both foreign keys, one into users and one into tweets)... since the current followers can and do change all the time, posting a tweet to all the followers that are current at the instant of posting (I assume that's what you mean?) does mean adding that many (extremely short) rows to that relationship table (the alternative of keeping a timestamped history of follower sets so you can reconstruct who was a follower at any given tweet-posting time appears definitely worse in time and not substantially better in space).
If, on the other hand, you want to check followers at the time of viewing (rather than at the time of posting), then you could make a special userid artificially meaning "all followers of the current user" (just like you'll have one meaning "all users on Twitter"); the needed SQL to make the lookup fast, in that case, looks hairy but feasible (a UNION or OR with "all tweets for which I'm a follower of the author and the tweet is readable by [the artificial userid representing] all followers"). I'm not getting deep into that maze of SQL until and unless you confirm that it is this peculiar meaning that you have in mind (rather than the simple one which seems more natural to me but doesn't allow any space savings on the relationship table for the action of "post tweet to all followers").
Edit: the OP has clarified they mean the approach I mention in the second paragraph.
Then, assume userid is the primary key of the Users table, the Tweets table has a primary key tweetid and a foreign key author for the userid of each tweet's author, the Followers table is a typical many-to-many relationship table with two columns (both foreign keys into Users) follower and followee, and the Canread table is a not-so-typical many-to-many relationship table, still with two columns -- foreign key into Users is column reader, foreign key into Tweets is column tweet (phew;-). Two special users #everybody and #allfollowers are defined with the above meanings (so that posting to everybody, all followers, or "just myself", all add only one row to Canread -- only selective posting to a specific list of N people adds N rows).
So the SQL for the set of tweet IDs a user #me can read is, I think, something like:
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON(Tweets.tweetid=Canread.tweet)
WHERE Canread.reader IN (#me, #everybody)
UNION
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON(Tweets.tweetid=Canread.tweet)
JOIN Followers ON(Tweets.author=Followers.followee)
WHERE Canread.reader=#allfollowers
AND Followers.follower=#me
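To try the query out, here is a small runnable sketch with SQLite via Python. The #me, #everybody and #allfollowers placeholders become bound parameters; representing the two special users as the reserved ids -1 and -2 is an assumption made for this example, as are the sample rows.

```python
import sqlite3

EVERYBODY, ALL_FOLLOWERS = -1, -2  # reserved pseudo-user ids (assumed values)

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Tweets (tweetid INTEGER PRIMARY KEY, author INTEGER);
    CREATE TABLE Followers (follower INTEGER, followee INTEGER);
    CREATE TABLE Canread (reader INTEGER, tweet INTEGER);
    INSERT INTO Tweets VALUES (1, 10), (2, 10), (3, 10), (4, 10);
    INSERT INTO Followers VALUES (20, 10);   -- user 20 follows user 10
    INSERT INTO Canread VALUES
        (-1, 1),    -- tweet 1: everybody
        (-2, 2),    -- tweet 2: all current followers of the author
        (30, 3),    -- tweet 3: only user 30
        (10, 4);    -- tweet 4: only the owner
""")

def readable(con, me):
    """Return the sorted tweet ids that user `me` may read."""
    sql = """
        SELECT Tweets.tweetid FROM Tweets
        JOIN Canread ON Tweets.tweetid = Canread.tweet
        WHERE Canread.reader IN (:me, :everybody)
        UNION
        SELECT Tweets.tweetid FROM Tweets
        JOIN Canread ON Tweets.tweetid = Canread.tweet
        JOIN Followers ON Tweets.author = Followers.followee
        WHERE Canread.reader = :allfollowers AND Followers.follower = :me
        ORDER BY 1
    """
    params = {"me": me, "everybody": EVERYBODY, "allfollowers": ALL_FOLLOWERS}
    return [r[0] for r in con.execute(sql, params)]

print(readable(con, 20), readable(con, 30))
```

Because the follower check happens at read time, removing the row from Followers immediately hides tweet 2 from user 20 without touching Canread.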
Let's say you have two kinds, Message and Contact, related by a
db.ListProperty of keys on Message. A user creates a message, adds
some contacts as recipients, and emails the message. Later, the user
deletes one of the contact entities that was a recipient of the
message. Our application should delete the appropriate Contact
entity, but we want to preserve the original recipient list for the
message that was sent for the user's records. In essence, we want a
snapshot of the message entity at the time it was sent. If we naively
delete the contact entity, though, we lose snapshot integrity; if not,
we are left with an invalid key.
How would you handle this situation,
either in controller logic or model changes?
from google.appengine.ext import db

class User(db.Model):
    email = db.EmailProperty(required=True)

class Contact(db.Model):
    email = db.EmailProperty(required=True)
    user = db.ReferenceProperty(User, collection_name='contacts')

class Message(db.Model):
    recipients = db.ListProperty(db.Key)  # keys of Contact entities
    sender = db.ReferenceProperty(User, collection_name='messages')
    body = db.TextProperty()
    is_emailed = db.BooleanProperty(default=False)
I would add a boolean field "deleted" (or something spiffier, such as the date and time of deletion) to the Contact model -- so that contacts are never physically deleted, but rather only "logically" deleted when that field is set. (This also lets you offer other cool features such as "show my old now-deleted contacts", "undelete" functionality, etc, if you wish).
This is a common approach in all storage systems that are required to maintain historical integrity (and/or similar requirements such as "auditability").
In cases where the sheer number of logically deleted entities threatens to damage system performance, the classic alternative is to have a separate, identical model "DeletedContacts", but foreign key constraints require more work, e.g. the Message class would have to have both recipients and deleted_recipients fields if you needed foreign key integrity (but using just keys, as you're doing, this extra work would not be needed).
I doubt the average user will delete such a huge percentage of their contacts as to warrant the optimization explained in the last paragraph, so in this case I'd go with the simple "deleted" field.
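The idea can be illustrated without the datastore at all. The sketch below uses plain Python dataclasses as stand-ins for the models, so none of it is App Engine API; deleted_at, soft_delete and visible_contacts are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Contact:
    email: str
    deleted_at: Optional[datetime] = None  # None means the contact is live

    def soft_delete(self):
        """Mark the contact as deleted instead of removing it."""
        self.deleted_at = datetime.now(timezone.utc)

@dataclass
class Message:
    body: str
    recipients: list = field(default_factory=list)  # Contact objects

def visible_contacts(contacts):
    """Contacts to show in the address book: everything not soft-deleted."""
    return [c for c in contacts if c.deleted_at is None]

alice, bob = Contact("alice@example.com"), Contact("bob@example.com")
contacts = [alice, bob]
msg = Message("hello", recipients=[alice, bob])

bob.soft_delete()
print([c.email for c in visible_contacts(contacts)])
print([c.email for c in msg.recipients])
```

The address book hides bob, while the sent message still resolves both of its original recipients.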
Alternately, you could refactor your Contact model by moving the email address into the key name and setting the user as the parent entity. Your recipients property would change to a string list of raw email addresses. This gives you a static list of email recipients without having to fetch a set of corresponding entities for each one, or requiring that such entities still exist. If you want to fetch the contact entities, you can easily construct their keys from the user and the recipient address.
One limitation here is that the email address on an existing contact entity cannot be changed, but I think you have that problem anyway. Changing a contact address with your existing model would retroactively change the recipients of a sent message, which we know is a problem.