I'm creating a calendar application in which each date has one of 3 states: available, maybe available, and unavailable. Trying to figure out the best schema for this situation.
One thought might be to have a UserDate model with a state field. The problem with this is that the DB will have (number of users) x 365 rows for each year, which seems like it would grow too quickly for a modestly sized app.
Another thought might be to have a default state, and only create a UserDate object when the user has signified that their availability on that date is different from the default. This seems convoluted though.
Has anyone dealt with this situation before? Any suggestions on the best way to go about this?
When you create a new user, you do not want to be inserting records for the next 50 years of their life. Only creating the UserDate object when there is a non-default value is what you should do.
You could consider storing a range of dates for a user if you are likely to have lots of consecutive dates with the same status. For example, if they are unavailable for all of December, then this could be represented as a single row.
Think about the sort of information you want to extract from your database, and how difficult this will be with each of your possible designs.
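To make the two suggestions above concrete, here is a minimal sketch in Python. It assumes "available" is the default state and that ranges for a user do not overlap; the names `AvailabilityRange`, `state_on`, and `DEFAULT_STATE` are made up for illustration and are not from the question.

```python
from dataclasses import dataclass
from datetime import date

DEFAULT_STATE = "available"  # assumption: the state used when nothing is stored


@dataclass(frozen=True)
class AvailabilityRange:
    """One stored row: a run of consecutive dates sharing a non-default state."""
    user_id: int
    start: date  # inclusive
    end: date    # inclusive
    state: str   # "maybe" or "unavailable"


def state_on(ranges, user_id, day):
    """Return the stored state covering `day`, or the default if no row covers it."""
    for r in ranges:
        if r.user_id == user_id and r.start <= day <= r.end:
            return r.state
    return DEFAULT_STATE


ranges = [
    # All of December is a single row instead of 31 rows.
    AvailabilityRange(1, date(2024, 12, 1), date(2024, 12, 31), "unavailable"),
    # A single day is just a range whose start and end coincide.
    AvailabilityRange(1, date(2025, 1, 10), date(2025, 1, 10), "maybe"),
]

print(state_on(ranges, 1, date(2024, 12, 15)))
print(state_on(ranges, 1, date(2024, 11, 30)))
```

In a real database the loop would become a `WHERE start <= :day AND :day <= end` lookup. The trade-off is that edits in the middle of a range have to split it into two rows.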
Let's say I want to model a graph with sales people. They belong to an organisation, have a manager, etc. They are assigned to specific territories and/or client accounts. Your company may work with external partners, which must be managed, and so on. A nice, non-trivial graph.
Elements in this graph keep on changing all the time: sales people come and go, or move within the organisation and thus change responsibilities; customers sign contracts or cancel them, ...
In my specific use cases, the point in time is very important. What did the graph look like at the end of last month? At the end of the last fiscal year? Last Monday, when we ran job ABC? E.g. what was the manager hierarchy at the end of last month? Which clients did a sales person manage at the end of last month? And so on.
In our use cases, DELETE doesn't delete anything; instead, some sort of end_date gets updated. UPDATE doesn't update anything; instead, a new version of the record is created.
I'm sure I can add CREATED and START-/END_DATE properties to nodes as well as relations, and for sure I can also create queries. But these queries are a pain to write, and almost unreadable, with tons of repeating where clauses everywhere.
I wish graph databases (and their graphical query builder) would allow me to travel in time more easily, e.g. by setting a session variable to a point in time, and all the where clauses are automatically added for all nodes and references that have the start/end date properties. The algorithm should not fail for objects that don't have these properties, but consider the condition met.
What are your thoughts about this use case, and what help does Memgraph provide for it?
thanks a lot
Juergen
As far as I am aware, no graph database directly supports the type of functionality you are asking about, although, as #buda points out, you can model and query against time series data. I agree with #buda that the way in which you would like this to work seems a bit undefined and very application specific, so I would not expect this to be a feature of any database.
The closest I can think of to out-of-the-box support for something like this would be to use a TinkerPop-enabled database with a PartitionStrategy or SubgraphStrategy to create the subgraph of only the times you wanted and then query against that. Another option would be creating a domain specific language to minimize how often you need to repeat code in your queries.
PartitionStrategy
SubgraphStrategy
Domain Specific Languages
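To illustrate the idea of writing the time filter once and reusing it, here is a rough sketch in plain Python over an in-memory graph. It does not use the Memgraph or TinkerPop APIs; the helpers `valid_at` and `snapshot`, and the property names `start_date` and `end_date`, are assumptions for illustration. As asked in the question, objects without these properties are treated as always valid.

```python
from datetime import date


def valid_at(props, as_of):
    """True if `as_of` lies in [start_date, end_date); a missing bound counts as open."""
    start = props.get("start_date")
    end = props.get("end_date")
    return (start is None or start <= as_of) and (end is None or as_of < end)


def snapshot(nodes, edges, as_of):
    """Return only the nodes and edges that are valid at `as_of`."""
    live_nodes = {nid: p for nid, p in nodes.items() if valid_at(p, as_of)}
    live_edges = [
        e for e in edges
        if valid_at(e["props"], as_of)
        and e["src"] in live_nodes
        and e["dst"] in live_nodes
    ]
    return live_nodes, live_edges


nodes = {
    "alice": {"start_date": date(2020, 1, 1)},
    "bob": {"start_date": date(2021, 6, 1), "end_date": date(2023, 3, 1)},
    "acme": {},  # no temporal properties: the condition is considered met
}
edges = [
    {"src": "bob", "dst": "acme", "type": "MANAGES",
     "props": {"start_date": date(2021, 6, 1), "end_date": date(2023, 3, 1)}},
    {"src": "alice", "dst": "acme", "type": "MANAGES",
     "props": {"start_date": date(2023, 3, 1)}},
]

# Who managed the client at the end of 2022?
live_nodes, live_edges = snapshot(nodes, edges, date(2022, 12, 31))
print([e["src"] for e in live_edges])
```

A DSL or a subgraph strategy would play the role of `snapshot` here: the point-in-time predicate lives in one place and every query runs against the filtered view.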
I'm making an API for movies/TV/actors etc. with Web API 2 and SQL Server. The database now has >30 tables, most of them storing data users will be able to edit.
How should I store old versions of entries?
Say someone edits the description, runtime and tagline for an entry (a movie) in the movies table.
I'll have a table (movies_old) where I store the editable fields from 'movies', plus who edited them and when.
All in the same database. The '???_old' tables have no relationships.
I'm very new to database design. Is there something obviously wrong with this?
To my mind, there are two issues here: what table you store the data in, and what goes in the "historical value" field.
On the first question, there are two obvious options: Store old and new records in the same table, with some sort of indication of which is "current" and which is "history", or have a separate table for history.
The main advantage of one table is that you have a simpler schema. This is especially true if the table contains many fields. If there are two tables, then all the field definitions are duplicated. When you move data from the current table to the history table, you have to copy every field, and if the list of fields changes, or their formats change, you have to remember to update the copy. Any queries that show the history have to read two tables. Etc. But with one table, all that goes away. Converting a record from current to history just means changing the setting of the "is_current" flag or however you indicate it.
The main advantages of two tables are, (a) Access is probably somewhat faster, as you don't have so many irrelevant records to skip over. (b) When reading the current table you don't have to worry about excluding the history records.
Oh, an annoying thing about SQL: In principle you could put a date on each record, and then the record with the latest date is the current one. In practice this is a pain: you usually have to have an inner query to find the latest date, and then feed this back in to an outer query that re-reads the record with that date. (Some SQL engines have ways around this. Postgres, for example.) So in practice, you need an "is_current" flag, probably 1 for current and 0 for history or some such.
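For what it's worth, here is a small sketch of the single-table approach with an "is_current" flag, using Python's built-in sqlite3 module. The column names and the `save_version` helper are made up for illustration; only `movies`, the tagline field, and "is_current" come from the discussion above. The last query shows the inner-query form you would need if you relied on the date alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movies (
        row_id     INTEGER PRIMARY KEY,
        movie_id   INTEGER NOT NULL,   -- stable id shared by all versions
        tagline    TEXT,
        edited_by  TEXT,
        edited_at  TEXT,               -- ISO date string
        is_current INTEGER NOT NULL DEFAULT 1
    )
""")


def save_version(conn, movie_id, tagline, edited_by, edited_at):
    """Demote the current row to history, then insert the new current row."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE movies SET is_current = 0 WHERE movie_id = ? AND is_current = 1",
            (movie_id,),
        )
        conn.execute(
            "INSERT INTO movies (movie_id, tagline, edited_by, edited_at) "
            "VALUES (?, ?, ?, ?)",
            (movie_id, tagline, edited_by, edited_at),
        )


save_version(conn, 1, "First draft tagline", "ann", "2024-01-01")
save_version(conn, 1, "Revised tagline", "bob", "2024-02-01")

# With the flag, reading the current version is a plain filter.
current = conn.execute(
    "SELECT tagline FROM movies WHERE movie_id = 1 AND is_current = 1"
).fetchall()

# Without the flag, an inner query has to find the latest date first.
latest_by_date = conn.execute("""
    SELECT tagline FROM movies m
    WHERE movie_id = 1
      AND edited_at = (SELECT MAX(edited_at) FROM movies WHERE movie_id = m.movie_id)
""").fetchall()

print(current, latest_by_date)
```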
The other issue is what to put in the contents. If you're dealing with short fields, customer number and amount billed and so forth, then the simple and easy thing to do is just store the complete old contents in one record and the complete new contents in the new record. But if you're dealing with a long text block, like a plot synopsis or a review, there could be many small editorial changes. If every time someone fixes a grammar or spelling error, we have a whole new record with the entire 1000 characters, of which 5 characters are different, this could really clutter up the database. If that's the case you might want to investigate ways to store changes more efficiently. May or may not be an issue to you.
I want to use a database for a program I'm creating. Let's say that I will have to manage clients that can make "posts" and every post has a series of properties.
To store the information about the users I have created a table. I'm not sure how to design the table for "posts". Every post has some properties that are text and about ten boolean properties.
My question is: Would it be better to have only one column with a Y,N,N,Y.... and then do a split in the program to know every status of these properties or is it better to have every property in a column with a boolean type?
I anticipate a large number of clients and a large number of posts, so I don't know whether this last option is faster and cheaper or not. What do you think? My program will serve data to mobile phones.
It is better to have a column for each of the boolean properties. I would also recommend using a bit or tinyint column with 1/0 values instead of Y/N, as they take up less space and are easier to manipulate for reporting purposes.
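A quick sketch of what that looks like, using Python's built-in sqlite3 module. SQLite has no bit or tinyint type, so this assumes INTEGER columns restricted to 0 and 1; the column names `is_published` and `is_pinned` are hypothetical stand-ins for the ten properties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE posts (
        post_id      INTEGER PRIMARY KEY,
        client_id    INTEGER NOT NULL,
        body         TEXT,
        is_published INTEGER NOT NULL DEFAULT 0 CHECK (is_published IN (0, 1)),
        is_pinned    INTEGER NOT NULL DEFAULT 0 CHECK (is_pinned IN (0, 1))
    )
""")
conn.executemany(
    "INSERT INTO posts (client_id, body, is_published, is_pinned) VALUES (?, ?, ?, ?)",
    [(1, "first", 1, 0), (1, "second", 1, 1), (2, "third", 0, 0)],
)

# Filtering on a property is a plain WHERE clause; no string splitting needed.
pinned = conn.execute(
    "SELECT body FROM posts WHERE is_published = 1 AND is_pinned = 1"
).fetchall()

# 1/0 values can be summed directly for reporting.
published_per_client = conn.execute(
    "SELECT client_id, SUM(is_published) FROM posts "
    "GROUP BY client_id ORDER BY client_id"
).fetchall()

print(pinned, published_per_client)
```

With a packed "Y,N,N,Y" string, both queries would have to pull every row into the program and split it there.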
I am designing a database that needs to store transaction time and valid time, and I am struggling with how to effectively store the data and whether or not to fully time-normalize attributes. For instance I have a table Client that has the following attributes: ID, Name, ClientType (e.g. corporation), RelationshipType (e.g. client, prospect), RelationshipStatus (e.g. Active, Inactive, Closed). ClientType, RelationshipType, and RelationshipStatus are time varying fields. Performance is a concern as this information will link to large datasets from legacy systems. At the same time the database structure needs to be easily maintainable and modifiable.
I am planning on splitting out audit trail and point-in-time history into separate tables, but I’m struggling with how to best do this.
Some ideas I have:
1) Three tables: Client, ClientHist, and ClientAudit. Client will contain the current state. ClientHist will contain any previously valid states, and ClientAudit will be for auditing purposes. For ease of discussion, let’s forget about ClientAudit and assume the user never makes a data entry mistake. Doing it this way, I have two ways I can update the data. First, I could always require the user to provide an effective date and save a record out to ClientHist, which would result in a record being written to ClientHist each time a field is changed. Alternatively, I could only require the user to provide an effective date when one of the time varying attributes (i.e. ClientType, RelationshipType, RelationshipStatus) changes. This would result in a record being written to ClientHist only when a time varying attribute is changed.
2) I could split out the time varying attributes into one or more tables. If I go this route, do I put all three in one table or create two tables (one for RelationshipType and RelationshipStatus and one for ClientType)? Creating multiple tables for time varying attributes does significantly increase the complexity of the database design. Each table will have associated audit tables as well.
Any thoughts?
A lot depends (or so I think) on how frequently the time-sensitive data will be changed. If changes are infrequent, then I'd go with (1), but if changes happen a lot and not necessarily to all the time-sensitive values at once, then (2) might be more efficient--but I'd want to think that over very carefully first, since it would be hard to manage and maintain.
I like the idea of requiring users to enter effective dates, because this could serve to reduce just how much detail you are saving. For example, however many changes they make today, it only produces the one History row that comes into effect tomorrow (though the audit table might get pretty big). But can you actually get users to enter what is somewhat abstract data?
You might want to try a single Client table with 4 date columns to handle the 2 temporal dimensions.
Something like (client_id, ..., valid_dt_start, valid_dt_end, audit_dt_start, audit_dt_end).
This design is very simple to work with, and I would try it and see how it scales before going with something more complicated.
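Here is a minimal sketch of how such a table could be queried, using Python's built-in sqlite3 module. The four date columns are from the suggestion above; everything else is an assumption for illustration: half-open intervals, ISO date strings, '9999-12-31' as the "still open" marker, the sample data, and the `status_as_of` helper.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed sentinel meaning "no end yet"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE client (
        client_id           INTEGER NOT NULL,
        relationship_status TEXT NOT NULL,
        valid_dt_start      TEXT NOT NULL,  -- when the fact became true
        valid_dt_end        TEXT NOT NULL,  -- when it stopped being true (exclusive)
        audit_dt_start      TEXT NOT NULL,  -- when this row was recorded
        audit_dt_end        TEXT NOT NULL   -- when this row was superseded (exclusive)
    )
""")
conn.executemany("INSERT INTO client VALUES (?, ?, ?, ?, ?, ?)", [
    # Recorded in January: prospect with no known end. Superseded on 1 June.
    (1, "Prospect", "2023-01-01", OPEN_END, "2023-01-01", "2023-06-01"),
    # Recorded on 1 June: the prospect period ended, the active period began.
    (1, "Prospect", "2023-01-01", "2023-06-01", "2023-06-01", OPEN_END),
    (1, "Active", "2023-06-01", OPEN_END, "2023-06-01", OPEN_END),
])


def status_as_of(conn, client_id, valid_on, known_on):
    """Status that held on `valid_on`, according to what was recorded by `known_on`."""
    row = conn.execute("""
        SELECT relationship_status FROM client
        WHERE client_id = ?
          AND valid_dt_start <= ? AND ? < valid_dt_end
          AND audit_dt_start <= ? AND ? < audit_dt_end
    """, (client_id, valid_on, valid_on, known_on, known_on)).fetchone()
    return row[0] if row else None


print(status_as_of(conn, 1, "2023-07-01", "2023-05-01"))
print(status_as_of(conn, 1, "2023-07-01", "2023-08-01"))
```

The two calls ask about the same valid date but at different audit dates, which is how the audit trail and the point-in-time history end up sharing one table.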
I'm going to try to keep this question database agnostic, but I have an interesting problem that I need to tackle and I thought I'd open up the floor for suggestions and feedback.
I need to be able to download data from a feed source and store it in a database of some kind. The data needs to be merged into the existing data, and I need to be able to query for the data as of any given date. It's that last part, querying as of a given date, that I'd like to talk about.
Essentially what this problem boils down to is that I need to persist an object graph to an OLTP database and be able to query it temporally.
In the simple case of one table this problem is very simple: you have a date range indicating the valid time span for the record, and then you pass in an as-of date and only select rows that are valid at that point in time. The issues arise when you have more than one table.
Let's take the case of having two tables, Order-*Item.
When we query for an order we can apply the same as-of date filter to the item table. All is well, but what happens if we want to modify an order? Now we need to copy the order row and set the date ranges so that the valid-to on the old row and the valid-from on the new row are both set to now. We also have to copy the items or, if we change our model, copy the references to the items.
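To make the two-table case concrete, here is a rough sketch using Python's built-in sqlite3 module. It is one possible arrangement, not a recommendation: it assumes items reference the order's stable id rather than a specific row version, so modifying an order does not force the items to be copied. The column names, the '9999-12-31' open-end marker, and the helpers `modify_order` and `order_as_of` are all made up for illustration.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed sentinel meaning "still valid"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT,
                         valid_from TEXT, valid_to TEXT);
    CREATE TABLE items  (item_id INTEGER, order_id INTEGER, qty INTEGER,
                         valid_from TEXT, valid_to TEXT);
""")
conn.execute("INSERT INTO orders VALUES (1, 'Acme', '2024-01-01', ?)", (OPEN_END,))
conn.execute("INSERT INTO items VALUES (10, 1, 5, '2024-01-01', ?)", (OPEN_END,))


def modify_order(conn, order_id, customer, now):
    """Close the open version of the order at `now` and insert its successor."""
    with conn:
        conn.execute(
            "UPDATE orders SET valid_to = ? WHERE order_id = ? AND valid_to = ?",
            (now, order_id, OPEN_END),
        )
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)",
            (order_id, customer, now, OPEN_END),
        )


def order_as_of(conn, order_id, as_of):
    """Join order and items, applying the same as-of filter to both tables."""
    return conn.execute("""
        SELECT o.customer, i.item_id, i.qty
        FROM orders o
        JOIN items i ON i.order_id = o.order_id
        WHERE o.order_id = ?
          AND o.valid_from <= ? AND ? < o.valid_to
          AND i.valid_from <= ? AND ? < i.valid_to
    """, (order_id, as_of, as_of, as_of, as_of)).fetchall()


modify_order(conn, 1, "Acme Corp", "2024-03-01")
print(order_as_of(conn, 1, "2024-02-01"))
print(order_as_of(conn, 1, "2024-03-15"))
```

The repeated validity predicate per table is exactly the part that gets painful as the number of joined tables grows.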
Even in this simple case things are starting to get complicated.
My problem is exacerbated because I have a self-referential object graph, so to use the above model you'd have Order-*Item-*Order.
What would you do? How do you structure your databases when you need versioning of rows and temporal queries?
Back in the day, Developing Time-Oriented Database Applications in SQL was the best source of info for temporal databases. Published in 1999, the copyright has reverted to the author, and the link goes to his PDF version of the book. Look here for more of his publications, and for a link to the compressed content of the CDROM.