I was wondering how to set up a database for storing actions people have recently done when they travel. For example, if someone goes to a museum, the database would store the text "Bob went to this museum" along with the user ID and a timestamp. Should these events all be stored in one table, so that when I want the events of a single person I just query that table by user ID?
Similarly, I want to store the 50 users each user has "recently met", meaning the last 50 users they have been around during their travels. I was thinking this could also be stored in one table, with pairs of user IDs and no duplicates. I'm just afraid the table might get too big.
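A minimal sketch of the single-table design described in the question, using Python's built-in sqlite3 module as a stand-in for whatever database is actually used. The table and column names (events, recently_met, created_at, met_at) are hypothetical. An index on (user_id, created_at) lets the per-user query avoid scanning the whole table, and trimming recently_met after every insert keeps it at no more than 50 rows per user.

```python
import sqlite3

RECENT_LIMIT = 50  # keep at most this many "recently met" users per user

def create_schema(conn):
    conn.executescript("""
        CREATE TABLE events (
            id          INTEGER PRIMARY KEY,
            user_id     INTEGER NOT NULL,
            description TEXT    NOT NULL,  -- e.g. Bob went to this museum
            created_at  INTEGER NOT NULL   -- Unix timestamp
        );
        -- Lets "all events for one user, newest first" use an index, not a full scan.
        CREATE INDEX idx_events_user_time ON events (user_id, created_at);

        CREATE TABLE recently_met (
            user_id       INTEGER NOT NULL,
            other_user_id INTEGER NOT NULL,
            met_at        INTEGER NOT NULL,
            PRIMARY KEY (user_id, other_user_id)  -- no duplicate pairs
        );
    """)

def events_for_user(conn, user_id):
    rows = conn.execute(
        "SELECT description FROM events WHERE user_id = ? ORDER BY created_at DESC",
        (user_id,))
    return [r[0] for r in rows]

def record_meeting(conn, user_id, other_user_id, met_at):
    with conn:  # both statements commit together
        # Insert the pair, or replace it (refreshing met_at) if it already exists.
        conn.execute(
            "INSERT OR REPLACE INTO recently_met (user_id, other_user_id, met_at) "
            "VALUES (?, ?, ?)",
            (user_id, other_user_id, met_at))
        # Keep only the newest RECENT_LIMIT rows for this user.
        # SQLite accepts LIMIT inside this subquery; MySQL, for example, does not.
        conn.execute(
            """DELETE FROM recently_met
               WHERE user_id = ?
                 AND other_user_id NOT IN (
                     SELECT other_user_id FROM recently_met
                     WHERE user_id = ?
                     ORDER BY met_at DESC
                     LIMIT ?)""",
            (user_id, user_id, RECENT_LIMIT))
```

With the cap in place, the recently_met table grows only with the number of users, not with how much they travel.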
Personally I would go with an ER structure like this:


How do I avoid selecting the same rows twice in two Cassandra SELECT queries?

I am working on a social networking project with Cassandra. Users can subscribe to a profile and have access to the list of people who have subscribed to that same profile. My goal is to retrieve the list of people subscribed to a profile from a table called users_follows:
CREATE TABLE users_follows (to_id text, from_id text, followed_at timestamp, PRIMARY KEY(to_id, from_id))
The problem is that some profiles can have thousands of subscribers, and I don't want to get them all at once. That's why I'd like to fetch the list in increments of 20 as the user scrolls down. My problem is that I can't see how to retrieve the following parts of the list after the first SELECT, because Cassandra always returns the same users:
SELECT * FROM users_follows where to_id = 'xxxxx'
One possible solution would be to sort by a timestamp, but that would not work for the reverse query (retrieving the list of profiles a user is subscribed to). Another would be to use materialized views, but I'm not sure that would be efficient given the size of the table. Or I could create two different tables, user_follows and user_followers, but I don't think that is really recommended...

How can I fit these 3 columns into one?

For a project, I have a database with several tables.
They are all related to one another; for example, the organization table has relationships with offer and user.
However, I have some columns that only serve to link two tables together.
Take a look at the diagram. I use users_interests to link each user to their interests. The same goes for badges and groups.
It doesn't feel very efficient. When I want to get a user's interests, I first have to query users_interests and then use the result to fetch the interest details.
Is there a way to request a user's interests without having to go through a second table?
There are a couple of things you should consider when looking at this question/functionality and what you're going to be doing with the data.
Do you have a unique set of interests that will remain the same or be added to, without duplicates?
Do you need to have multiple interests for a single user?
If you answered "yes" to both of these questions, then you should leave things the way they are. You'll be able to reuse the same interests entry for multiple users as well as each user being able to have multiple interests.
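Keeping the junction table does not mean two round trips: a single JOIN returns a user's interest details in one query. A minimal sketch with Python's sqlite3 module; the table names users, interests and users_interests follow the question, but the column names are assumptions since the diagram is not reproduced here.

```python
import sqlite3

def create_schema(conn):
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- Junction table: one row per (user, interest) pair.
        CREATE TABLE users_interests (
            user_id     INTEGER NOT NULL REFERENCES users(id),
            interest_id INTEGER NOT NULL REFERENCES interests(id),
            PRIMARY KEY (user_id, interest_id)
        );
    """)

def interests_of(conn, user_id):
    # One query: the join goes from users_interests to interests in the database.
    rows = conn.execute("""
        SELECT i.name
        FROM users_interests AS ui
        JOIN interests AS i ON i.id = ui.interest_id
        WHERE ui.user_id = ?
        ORDER BY i.name""", (user_id,))
    return [r[0] for r in rows]
```

Each interest name is stored once in interests, no matter how many users share it.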
If you only want one interest per user, but want to keep a "master list" (aka: lookup table) of interests, then you can move the interest_id to the users table.
If you don't care about duplicates, but you still want each user to have multiple interests, move the name column to the users_interests table.
Finally, if you don't care about duplicates and you want each user to have a single interest, move the name column to the users table.
FYI, the way your tables are currently structured will likely take up less disk space for a large number of users.
It's unclear what you're trying to do, so I included all the options. Each option has its own set of pros and cons, and each will be needed by one project or another at some point. You might want to learn the differences, and the reasoning behind choosing one option over another, now, so you don't have to think too hard about it later.
You can do it if you put the user reference directly in the interests table,
but then you will lose the uniqueness of your interests: you will have many rows with the same name only because they are linked to different users. As far as I can tell, your database is perfectly OK as it is; it is designed the way it should be.

How can I store an indefinite amount of stuff in a field of my database table?

Here's a simple version of the website I'm designing: users can belong to one or more groups, as many as they want. When they log in, they are presented with the groups they belong to. Ideally, in my Users table I'd like an array or some other unbounded structure to which I can keep adding the IDs of the groups the user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table that holds an indefinite number of IDs of the users who belong to that group. (Side question: would that be more efficient than getting all the users of a group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column with an indefinite list of IDs. The only way I can think of is making it some super long varchar with the list JSON-encoded in it or something, but ewww.
Oh, and it's a MySQL database (my website is in PHP).
Your intuition is correct; you don't want one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns user_id and group_id.
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
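A minimal sketch of that membership table, using Python's sqlite3 module as a stand-in for MySQL (the SQL is close to what MySQL accepts but not identical). The users and site_groups tables, their columns, and the helper functions are hypothetical. The same table answers both directions, a user's groups and a group's members, which is why a separate list column on the group table is unnecessary; the extra index on group_id keeps the second direction an index lookup too.

```python
import sqlite3

def create_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript("""
        CREATE TABLE users       (user_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE site_groups (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- One row per membership, the composite key prevents duplicates.
        CREATE TABLE user_group_membership (
            user_id  INTEGER NOT NULL REFERENCES users(user_id),
            group_id INTEGER NOT NULL REFERENCES site_groups(group_id),
            PRIMARY KEY (user_id, group_id)
        );
        -- Secondary index so "members of a group" is also an index lookup.
        CREATE INDEX idx_membership_group ON user_group_membership (group_id);
    """)

def join_group(conn, user_id, group_id):
    # Joining the same group twice is a no-op (MySQL spelling: INSERT IGNORE).
    conn.execute("INSERT OR IGNORE INTO user_group_membership VALUES (?, ?)",
                 (user_id, group_id))

def groups_of(conn, user_id):
    rows = conn.execute("""SELECT g.name FROM user_group_membership AS m
                           JOIN site_groups AS g ON g.group_id = m.group_id
                           WHERE m.user_id = ? ORDER BY g.name""", (user_id,))
    return [r[0] for r in rows]

def members_of(conn, group_id):
    rows = conn.execute("""SELECT u.name FROM user_group_membership AS m
                           JOIN users AS u ON u.user_id = m.user_id
                           WHERE m.group_id = ? ORDER BY u.name""", (group_id,))
    return [r[0] for r in rows]
```

Joining a new group is one INSERT; nothing about the users or groups tables has to change.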
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
In database design, this might be represented as follows:
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form (1NF). 1NF is the first step in a design pattern called data normalization, which is a major aspect of database design. A normalized design is usually a good design, although there are some situations where a different pattern might be better suited.
If your data is not in 1NF, you will end up doing sequential scans for some queries where a normalized database could be accessed via a quick lookup. For a table with a billion rows, this could mean waiting an hour rather than a few seconds. 1NF makes a direct-access lookup path possible for each item of data.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.

Where should I break up my user records to keep track of revisions?

I am putting together a staff database, and I need to be able to revise staff members' information while also keeping track of all the revisions. How should I structure the database so that I can have multiple revisions of the same user's data but still query against the most recent revision? The information changes rarely (last name, for example), but I need to be able to query it by out-of-date values. So if Jenny Smith changes her name to Jenny James, I need to be able to find her current information when I search for her old name.
I assume I will need at least two tables, one containing the uid and another containing the revisions. Then I would join them and query against the most recent revision. But should I break it out even further, depending on how often the data changes or the type of data? There are about 40 fields per record, and probably only one or two fields will change per update. Also, I cannot remove any data from the database; I need to be able to look back on all previous records.
A simple way of doing this is to add a deleted flag and instead of updating records you set the deleted flag on the existing record and insert a new record.
You can of course also write the existing record to an archive table, if you prefer. But if changes are infrequent and the table is not big I would not bother.
To get the active record, query with 'where deleted = 0'; the speed impact will be minimal if there is an index on this field.
Typically this is augmented with some other fields like a revision number, when the record was last updated, and who updated it. The revision number is very useful to get the previous versions and also to do optimistic locking. The 'who updated this last and when' questions usually come once the system is running instead of during requirements gathering, and are useful fields to put in any table containing 'master' data.
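A minimal sketch of the deleted-flag approach with a revision number and who/when columns, using Python's sqlite3 module. The staff table, its columns, and the revise and current_by_any_name helpers are hypothetical, and only two of the roughly 40 fields are shown. Each revision flags the current row as deleted and inserts a new row in the same transaction, and the expected revision number acts as an optimistic-lock check.

```python
import sqlite3

def create_schema(conn):
    conn.executescript("""
        CREATE TABLE staff (
            record_id  INTEGER PRIMARY KEY,   -- one per stored revision
            staff_id   INTEGER NOT NULL,      -- stable id of the person
            revision   INTEGER NOT NULL,
            deleted    INTEGER NOT NULL DEFAULT 0,
            last_name  TEXT NOT NULL,
            first_name TEXT NOT NULL,
            updated_by TEXT NOT NULL,
            updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (staff_id, revision)
        );
        CREATE INDEX idx_staff_active ON staff (staff_id, deleted);
        CREATE INDEX idx_staff_last_name ON staff (last_name);
    """)

def revise(conn, staff_id, expected_revision, changes, updated_by):
    """Mark the current row deleted and insert a copy with `changes` applied."""
    with conn:  # commits both statements, or rolls back if anything raises
        cur = conn.execute(
            "UPDATE staff SET deleted = 1 "
            "WHERE staff_id = ? AND revision = ? AND deleted = 0",
            (staff_id, expected_revision))
        if cur.rowcount != 1:
            # Someone else revised the record first (optimistic locking).
            raise RuntimeError("stale revision")
        old = conn.execute(
            "SELECT last_name, first_name FROM staff WHERE staff_id = ? AND revision = ?",
            (staff_id, expected_revision)).fetchone()
        row = {"last_name": old[0], "first_name": old[1], **changes}
        conn.execute(
            "INSERT INTO staff (staff_id, revision, last_name, first_name, updated_by) "
            "VALUES (?, ?, ?, ?, ?)",
            (staff_id, expected_revision + 1,
             row["last_name"], row["first_name"], updated_by))

def current_by_any_name(conn, last_name):
    # Match the name in any revision, then return that person's active row.
    return conn.execute("""
        SELECT first_name, last_name FROM staff
        WHERE deleted = 0
          AND staff_id IN (SELECT staff_id FROM staff WHERE last_name = ?)""",
        (last_name,)).fetchall()
```

Searching for "Smith" after Jenny's name change returns her current row, which covers the old-name lookup from the question.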
I would use the separate table, because then the unique identifier that all the other child records point to is also the PK of the main table, which I think makes data integrity issues less likely. For instance, say Mary Jones has records in the address table, the email table, the performance evaluation table, etc. If you add a change record to the main table, how are you going to relink all the existing information? With a separate history table, it isn't a problem.
With a deleted field in one table, you then need both a non-autogenerated person ID and an autogenerated record ID.
You also risk people forgetting the where deleted = 0 clause that is needed in almost every query. (If you do use the deleted flag field, do yourself a favor: create a view with where deleted = 0 and require developers to use the view in queries instead of the original table.)
With the deleted flag field, you will also need a trigger to ensure that one and only one record is marked as active.
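A minimal sketch of the separate-history-table approach from the first part of this answer, using Python's sqlite3 module; all table, column and trigger names are hypothetical. Child tables such as address reference the stable person_id, so a name change never requires relinking. Here an SQLite trigger copies the old values into person_history on every update (a trigger used for history, not the one-active-record check mentioned above), and because only current rows live in person, no deleted = 0 filter is needed.

```python
import sqlite3

def create_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript("""
        -- Current data only, person_id is the stable key other tables point to.
        CREATE TABLE person (
            person_id  INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL
        );
        CREATE TABLE address (
            address_id INTEGER PRIMARY KEY,
            person_id  INTEGER NOT NULL REFERENCES person(person_id),
            street     TEXT NOT NULL
        );
        -- Every superseded version of a person row.
        CREATE TABLE person_history (
            person_id   INTEGER NOT NULL REFERENCES person(person_id),
            first_name  TEXT NOT NULL,
            last_name   TEXT NOT NULL,
            replaced_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX idx_history_last_name ON person_history (last_name);
        -- Copy the old values on every update, so nothing is ever lost.
        CREATE TRIGGER person_keep_history AFTER UPDATE ON person
        BEGIN
            INSERT INTO person_history (person_id, first_name, last_name)
            VALUES (OLD.person_id, OLD.first_name, OLD.last_name);
        END;
    """)

def find_current(conn, last_name):
    # Match the current name or any former one, and return the current row.
    return conn.execute("""
        SELECT person_id, first_name, last_name FROM person
        WHERE last_name = ?
           OR person_id IN (SELECT person_id FROM person_history WHERE last_name = ?)""",
        (last_name, last_name)).fetchall()
```

Application code simply updates person; the history is written by the database, so it cannot be forgotten.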
Peter Tillemans' suggestion is a common way to accomplish what you're asking for, but I don't like it.
The structure of a database should reflect the real-world facts that are being modeled.
I would create a separate table for obsolete_employee, and just store the historical information that would need to be searched in the future. This way you can keep your real employee data table clean and keep only the old data that is necessary. This approach will also simplify reporting and other features of the application that are not related to searching historical data.
Just think of that warm feeling you'll get when you type select * from employee and nothing but current, correct goodness comes flowing back!

Database design query

I'm trying to work out a sensible approach to designing a database where I need to store a wide range of continuously changing information about pets. The data can be broken down into categories such as behaviour, illness, etc. Data will be submitted regularly for these categories, so I need a good way to design the database to accommodate this efficiently. A simple approach would be to store multiple records for each pet in each relevant table; e.g. the behaviour table would store the behaviour data, with a timestamp and the pet's identifier on each record. When querying the database, it would be straightforward to query one table by pet ID, using the timestamps to output the history of submissions in order. Is there a more sensible way to do this, or does that make sense?
I would use a combination of lookup tables with strong use of foreign keys. I think what you are suggesting is very common. For example, "get me all the reported illnesses for a particular pet during this date range" would look something like:
SELECT *
FROM table_illness
WHERE table_illness.pet_id = <value>
  -- the illness overlaps the range [<range_start>, <range_end>]
  AND table_illness.start_date <= <range_end>
  AND table_illness.finish_date >= <range_start>
You could do that for any of the tables. The lookup tables will be a link between, for example, table_illness.illness_type and illness_types.illness_type. The illness_types table is where you would store the details on the types of illnesses.
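A minimal sketch of that lookup-table layout, using Python's sqlite3 module. table_illness and illness_types follow the answer; the pet table, the remaining columns, and storing dates as ISO-8601 text (which compares correctly as strings) are assumptions. The query joins to the lookup table and returns one pet's illnesses that overlap a given date range.

```python
import sqlite3

def create_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript("""
        CREATE TABLE pet (pet_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Lookup table: one row per kind of illness.
        CREATE TABLE illness_types (
            illness_type TEXT PRIMARY KEY,
            description  TEXT NOT NULL
        );
        CREATE TABLE table_illness (
            illness_id   INTEGER PRIMARY KEY,
            pet_id       INTEGER NOT NULL REFERENCES pet(pet_id),
            illness_type TEXT NOT NULL REFERENCES illness_types(illness_type),
            start_date   TEXT NOT NULL,   -- YYYY-MM-DD
            finish_date  TEXT NOT NULL    -- YYYY-MM-DD
        );
        CREATE INDEX idx_illness_pet ON table_illness (pet_id, start_date);
    """)

def illnesses_in_range(conn, pet_id, range_start, range_end):
    # An illness overlaps [range_start, range_end] if it starts on or before the
    # range ends and finishes on or after the range starts.
    rows = conn.execute("""
        SELECT t.illness_type, it.description, t.start_date, t.finish_date
        FROM table_illness AS t
        JOIN illness_types AS it ON it.illness_type = t.illness_type
        WHERE t.pet_id = ?
          AND t.start_date <= ?
          AND t.finish_date >= ?
        ORDER BY t.start_date""", (pet_id, range_end, range_start))
    return rows.fetchall()
```

The same pattern (pet_id plus dates or timestamps, with a lookup table for the category) carries over to the behaviour table.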
When designing a database, you should build your tables to mimic real-life objects or concepts. In that sense, the design you suggest makes sense. Each pet should have its own record in a pet table, which doesn't change. Changing information should then be placed in the appropriate table, which holds the pet's ID. The timestamp method you suggest is probably what I would do, unless of course this is for a vet or similar. Then I'd create an appointment table with the date and connect the illness or behaviour to the appointment as well.
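For the vet scenario, a minimal sketch of the appointment variant, again using Python's sqlite3 module with hypothetical table and column names: each behaviour observation points to the appointment at which it was recorded, and the appointment carries the date and the pet.

```python
import sqlite3

def create_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript("""
        CREATE TABLE pet (pet_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE appointment (
            appointment_id INTEGER PRIMARY KEY,
            pet_id         INTEGER NOT NULL REFERENCES pet(pet_id),
            visit_date     TEXT NOT NULL          -- YYYY-MM-DD
        );
        -- Observations hang off the appointment, the pet is reached through it.
        CREATE TABLE behaviour (
            behaviour_id   INTEGER PRIMARY KEY,
            appointment_id INTEGER NOT NULL REFERENCES appointment(appointment_id),
            note           TEXT NOT NULL
        );
    """)

def behaviour_history(conn, pet_id):
    # Chronological behaviour history for one pet, dated by appointment.
    rows = conn.execute("""
        SELECT a.visit_date, b.note
        FROM appointment AS a
        JOIN behaviour AS b ON b.appointment_id = a.appointment_id
        WHERE a.pet_id = ?
        ORDER BY a.visit_date, b.behaviour_id""", (pet_id,))
    return rows.fetchall()
```

An illness table could reference appointment the same way, so everything recorded at one visit shares a single date.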
