Representing complex scheduled recurrence in a database - database

I have an interesting problem trying to represent complex schedule data in a database. As a guideline, I need to be able to represent the entirety of what the iCalendar -- ics -- format can represent, but in a database. I'm not actually implementing anything relating to ics, but it gives a good scope of the type of rules I need to be able to model for my particular project.
I need to allow representation of a single event or a recurring event based on multiple times per day, days of the week, week of a month, month, year, or some combination of those. For example, the third Thursday in November annually, or the 25th of December annually, or every two weeks starting November 2 and continuing until September 8 the following year.
I don't care about insertion efficiency but query efficiency is critical. The operation I will be doing most often is providing either a single date/time or a date/time range, and trying to determine if the defined schedule matches any part of the date/time range. Other operations can be slower. For example, given January 15, 2010 at 10:00 AM through January 15, 2010 at 11:00 AM, find all schedules that match at least part of that time. (i.e. a schedule that covers 10:30 - 11:00 still matches.)
Any suggestions? I looked at How would one represent scheduled events in an RDBMS? but it doesn't cover the scope of the type of recurrence rules I'd like to model.

In the end, this post was most helpful:
iCal "Field" list (for database schema based on iCal standard)
We decided to follow the iCal model pretty exactly since the guys who wrote that standard had a great feel for the problem domain.

The way I did something similar to this was to have two tables. If an event had no recurring pattern, then just store the date, start time, and end time. Your query checks if the time you're searching for is greater than the start time of any entry and less than or equal to the end time of that same event.
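As a rough sketch of the single-event lookup described above: this assumes a hypothetical `events` table with ISO-8601 text timestamps (the table and column names are illustrative, not from the original post) and uses an in-memory SQLite database. It applies the general range-overlap test (the event starts before the range ends and ends after the range starts), so a partial match such as 10:30 - 11:00 is still found.

```python
import sqlite3

def find_overlapping(conn, range_start, range_end):
    """Return titles of events that overlap the given time range."""
    rows = conn.execute(
        "SELECT title FROM events "
        "WHERE start_time < ? AND end_time > ? "
        "ORDER BY start_time",
        (range_end, range_start),
    )
    return [title for (title,) in rows]

# Hypothetical schema: timestamps stored as sortable ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (title TEXT, start_time TEXT, end_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("early", "2010-01-15 08:00", "2010-01-15 09:00"),
        ("partial", "2010-01-15 10:30", "2010-01-15 11:00"),
        ("late", "2010-01-15 11:00", "2010-01-15 12:00"),
    ],
)

# Search range from the question: January 15, 2010, 10:00 to 11:00.
matches = find_overlapping(conn, "2010-01-15 10:00", "2010-01-15 11:00")
```

Here an event that only touches the range boundary (starting exactly at 11:00) is treated as not overlapping; whether that is the right choice depends on how the application defines a match.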
For recurring events, I'm not too familiar with how iCalendar stores recurrences, but if you store each event by day of the week (you might have to have multiple rows for a single event if it repeats on more than one day a week), then you can search it almost the same way as the above table. For stranger recurrences like the third Tuesday of the month, you could have an extra column describing the specific condition. I might be able to give you a better answer for this if you could tell me more about how ics represents that kind of recurrence.
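A condition such as "the third Thursday of the month" could be stored as a weekday plus an ordinal and checked against a candidate date. The following is only a sketch of that check; the function name and the (weekday, n) encoding are assumptions for illustration, not something the answer specifies.

```python
from datetime import date

def is_nth_weekday_of_month(d, weekday, n):
    """True if d is the n-th occurrence of weekday (Monday=0) in its month."""
    return d.weekday() == weekday and (d.day - 1) // 7 + 1 == n

THURSDAY = 3  # Python's date.weekday() numbers Monday as 0

# Find the third Thursday of November 2010 by testing every day of the month.
third_thursday_nov = [
    date(2010, 11, day)
    for day in range(1, 31)
    if is_nth_weekday_of_month(date(2010, 11, day), THURSDAY, 3)
]
```

The ordinal works because days 1-7 always hold the first occurrence of every weekday, days 8-14 the second, and so on.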
I hope that helps. I don't have much time right now. You can contact me later if you want to discuss this. I'm currently in Missouri so my availability for the next week is going to be erratic.

This might be a trivial solution, but what would be the drawbacks of adding a column that defines the recurrence of the event (i.e. every x weeks, annually, weekly, etc) and using that as the result criterion?
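To make that idea concrete, here is a minimal sketch of an "every x weeks" rule evaluated against a candidate date. The column names (`start`, `interval_weeks`, `until`) and the years in the example are assumptions. One drawback it makes visible is that the match has to be computed per schedule row, which is harder to index than a plain start/end comparison.

```python
from datetime import date

def occurs_on(d, start, interval_weeks, until):
    """True if a schedule repeating every interval_weeks from start hits date d."""
    if d < start or d > until:
        return False
    return (d - start).days % (7 * interval_weeks) == 0

# "Every two weeks starting November 2 and continuing until September 8
# the following year" (years chosen arbitrarily for the example).
start = date(2009, 11, 2)
until = date(2010, 9, 8)

on_schedule = occurs_on(date(2009, 11, 16), start, 2, until)
off_week = occurs_on(date(2009, 11, 9), start, 2, until)
```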

Related

Filtering and boosting on field based on time in SOLR

I want to use filtering and boosting on documents based on a field value which is dependent on data collected on time ranges.
Ex.
I am dealing with multiple vendors to supply me goods.
Suppose there are 3 vendors and they have different time they can sell their goods. I want the vendor list based on certain product, I want to boost the vendor who can sell the product now.
I have information like this:
Vendor Does not sell on any sunday
It will definitely sell on monday from 2:00 to 4:00 PM
The rest of the time, I am not sure whether they are available to sell the product or not, so I will not boost their result.
How should I store this information and use the result in solr query?
You can just map the availability (or, in a separate field, the non-availability) as an integer, probably at an hour resolution. So, if Monday is your week start, you can have 14,15,16 as availability for the second condition. And then you just boost with a standard number or range search (availability:15 or availability:[14 TO 16]).
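A small sketch of that mapping, assuming Monday 00:00 is slot 0 and one slot per hour. The helper names are hypothetical, and how the resulting clause is attached to the Solr request as a boost depends on the query parser in use, which is not shown here.

```python
from datetime import datetime

def hour_of_week(dt):
    """Map a datetime to an integer hour slot, with Monday 00:00 as slot 0."""
    return dt.weekday() * 24 + dt.hour

def availability_clause(dt, field="availability"):
    """Build a field:value clause for the slot containing dt."""
    return f"{field}:{hour_of_week(dt)}"

# Monday, January 18, 2010 at 3 PM falls in slot 15.
monday_3pm = datetime(2010, 1, 18, 15, 0)
clause = availability_clause(monday_3pm)
```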
For more complex searches, there is a trick to use geographical filtering for these kinds of things.
You basically map your availability (or, in a separate field, your non-availability) as x,y coordinates in some space, like a box in the calendar. So, your Sunday to Saturday is 0-6 on X and your hours are 0-24 on Y. Then your condition 2 is the rectangle from 1,14 to 1,16, and you map the current time into that same space and see if your dot is inside any of the rectangles in a boost query.
Basically, you decide on granularity and map your time into numbers or rectangles.
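The rectangle idea can be illustrated outside Solr with a plain point-in-rectangle test. This sketch only shows the mapping (Sunday = 0 on X, hour of day on Y); the function names are hypothetical, and the actual spatial field type and filter syntax on the Solr side are not covered here.

```python
from datetime import datetime

def to_point(dt):
    """Map a datetime to (x, y): x is the day with Sunday=0, y is the hour as a float."""
    x = (dt.weekday() + 1) % 7  # Python numbers Monday as 0, so shift by one
    y = dt.hour + dt.minute / 60
    return (x, y)

def in_rectangle(point, rect):
    """True if the point lies inside the (x_min, y_min, x_max, y_max) rectangle."""
    x, y = point
    x_min, y_min, x_max, y_max = rect
    return x_min <= x <= x_max and y_min <= y <= y_max

# Condition 2 from the question: Monday, 2:00 to 4:00 PM.
MONDAY_2_TO_4_PM = (1, 14, 1, 16)

monday_hit = in_rectangle(to_point(datetime(2010, 1, 18, 15, 30)), MONDAY_2_TO_4_PM)
sunday_miss = in_rectangle(to_point(datetime(2010, 1, 17, 15, 30)), MONDAY_2_TO_4_PM)
```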

Problems with incorrect timezones and locale-specific display of time

So, there is absolutely no reason why we should be having this problem in this day and age, but we do. Our database has datetime columns, and when the dates are pulled out from the database, they are retrieved as CDT (this time of year, CST in the winter). That time is then passed as CDT to the UI through JSON. This could not be more wrong.
The time stored in the database is the time relative to the location specified in the data. So, if we have a trip going from Los Angeles at 6AM PDT to New York at 11PM EDT, then the start time will be retrieved from the database as 6AM CDT and the end time will be retrieved as 11PM CDT.
Requirements:
The UI needs to display in the time local to the data, 6AM and 11PM in the previous example.
The UI needs to indicate items in the near future, such as arrivals within the next 3 hours.
When the data is edited, it needs to be input in the same manner, the user enters 6AM in Los Angeles, and 11PM in New York.
The user also needs to be able to enter times relative to the current time, such as "H+5" for 5 hours from now.
The UI needs to sort based on time. This is more of a nice to have as the application they are used to using doesn't do this right either.
Our current solution is just burying our head in the sand and displaying it (untested for browsers in other timezones, such as those in our California office), which is actually surprisingly effective, even though it is not at all semantically correct.
11PM CDT is read from the database
It gets displayed as 11PM, and is understood by the user to be EDT
The user edits it, puts in 10PM
It is parsed back as 10PM CDT and saved that way, exactly as we want it.
Where the current solution fails miserably is in times relative to the current time.
11PM CDT is read from the database
It gets displayed as 11PM, and is understood by the user to be EDT
The user edits it, puts in "NOW" (assume the user's local clock is 9PM CDT)
It is parsed back as 9PM CDT and saved that way, but it should have been 10PM because it is 10PM in New York!
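To make the failing case concrete, here is a small sketch of the arithmetic involved. It is written in Python purely for illustration (the actual stack is Rails and JS), uses fixed summer offsets instead of a real time zone database, and the date and helper names are assumptions. The idea is to keep the clock reading from the database but swap the wrong zone label for the zone of the data's location, and to convert the user's clock into that zone before resolving "NOW" or "H+5".

```python
from datetime import datetime, timedelta, timezone

# Fixed summer offsets, for illustration only; real code would need DST rules.
CDT = timezone(timedelta(hours=-5), "CDT")
EDT = timezone(timedelta(hours=-4), "EDT")

def relabel(dt, data_zone):
    """Keep the clock reading but replace the (wrong) zone label."""
    return dt.replace(tzinfo=data_zone)

def resolve_relative(expr, user_now, data_zone):
    """Resolve 'NOW' or 'H+n' to a time expressed in the data's zone."""
    now_there = user_now.astimezone(data_zone)
    if expr == "NOW":
        return now_there
    if expr.startswith("H+"):
        return now_there + timedelta(hours=int(expr[2:]))
    raise ValueError(f"unsupported expression: {expr!r}")

from_db = datetime(2014, 7, 1, 23, 0, tzinfo=CDT)   # 11PM, mislabeled as CDT
arrival = relabel(from_db, EDT)                      # 11PM in New York
user_now = datetime(2014, 7, 1, 21, 0, tzinfo=CDT)  # user's clock: 9PM CDT

entered_now = resolve_relative("NOW", user_now, EDT)
arrives_soon = timedelta(0) <= arrival - user_now <= timedelta(hours=3)
```

With the label corrected, "NOW" resolves to 10PM in New York, and the arrival is correctly seen as one hour away when checking for items in the next 3 hours.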
I'm looking for a way to handle these five cases that isn't totally hideous. I am open to a solution in any layer (architecture detailed above), but there are constraints because we share the database with another application. If there are additional tools/frameworks that would be useful and fit with what we already have, I am open to using them.
Database: SQL Server 2008
API: Rails with JSON responses
Frontend: JS + Moment + other stuff unrelated to dates
Any attempt to correct the data and/or schema is totally out of the question as we would break the other application.
The addition of new views/table columns/tables/stored procedures is usually possible.
The addition of indexes is NOT allowed. The status of any more exotic features is unknown.
There are many tables/endpoints that are affected by this problem, so any brute-force solution is going to be incredibly tedious.
Any solution only needs to work in the Continental US.
Note that this is not a simple timezone conversion as the timezone we get back from the database is straight up wrong, so conversions of the timezone will also be wrong.

What is better: make "Date" composite attribute or atomic?

In a scenario where I need to use the entire date (i.e. day, month, year) as a whole, and never need to extract the day, the month, or the year part of the date in my database application program, what is the best practice:
Making Date an atomic attribute
Making Date a composite attribute (composed of day, month, and year)
Edit:- The question can be generalized as:
Is it a good practice to make composite attributes where possible, even when we need to deal with the attribute as a whole only?
Actually, the specific question and the general question are significantly different, because the specific question refers to dates.
For dates, the component elements aren't really part of the thing you're modelling - a day in history - they're part of the representation of the thing you're modelling - a day in the calendar that you (and most of the people in your country) use.
For dates I'd say it's best to store it in a single date type field.
For the generalized question I would generally store them separately. If you're absolutely sure that you'll only ever need to deal with it as a whole, then you could use a single field. If you think there's a possibility that you'll want to pull out a component for separate use (even just for validation), then store them separately.
With dates specifically, the vast majority of modern databases store and manipulate dates efficiently as a single Date value. Even in situations when you do want to access the individual components of the date I'd recommend you use a single Date field.
You'll almost inevitably need to do some sort of date arithmetic eventually, and most database systems and programming languages give some sort of functionality for manipulating dates. These will be easier to use with a single date variable.
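As a small illustration of that point (in Python rather than in any particular database), a single date value handles month rollover and differences directly, while the components remain available when needed. The dates are arbitrary examples.

```python
from datetime import date, timedelta

stored = date(2010, 1, 31)  # one atomic date value

# Arithmetic works on the whole value; month rollover is handled for us.
next_day = stored + timedelta(days=1)
days_until_march = (date(2010, 3, 1) - stored).days

# The components can still be read off when they are wanted.
parts = (stored.year, stored.month, stored.day)
```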
With dates, the entire composite date identifies the primary real world thing you're identifying.
The day / month / year are attributes of that single thing, but only for a particular way of describing it - the western calendar.
However, the same day can be represented in many different ways - the Unix epoch, the Gregorian calendar, a lunar calendar; in some calendars we're in a completely different year. All of these representations can be different, yet refer to the same individual real world day.
So, from a modelling point of view, and from a database / programmatic efficiency point of view, for dates, store them in a single field as far as possible.
For the generalisation, it's a different question.
Based on experience, I'd store them as separate components. If you were really really sure you'd never ever want to access component information, then yes, one field would be fine. For as long as you're right. But if there's even an ability to break the information up, I personally would separate them from the start.
It's much easier to join fields together than to separate fields from a composite string. That's both from a programming / algorithmic viewpoint and from a compute resource point of view.
Some of the most painful problems I've had in programming have been trying to decompose a single field into component elements. They'd initially been stored as one element, and by the time the business changed enough to realise they needed the components... it had become a decent sized challenge.
Most composite data items aren't like dates. Where a date is a single item, that is sometimes (ok, generally in the western world) represented by a Day-Month-Year composite, most composite data elements actually represent several concrete items, and only the combination of those items truly uniquely represent a particular thing.
For example a bank account number (in New Zealand, anyway) is a bit like this:
A bank number - 2 or 3 digits
A branch number - 4 to 6 digits
An account / customer number - 8 digits
An account type number - 2 or 3 digits.
Each of those elements represents a single real world thing, but together they identify my account.
You could store these as a single field, and it'd largely work. You might decide to use a hyphen to separate the elements, in case you ever needed to.
If you really never need to access a particular piece of that information then you'd be good with storing it as a composite.
But if 3 years down the track one bank decides to charge a higher rate, or needs different processing; or if you want to do a regional promotion and could key that on the branch number, now you have a different challenge, and you'll need to pull out that information. We chose hyphens as separators, so you'll have to parse each row into its component elements to find them. (These days disk is pretty cheap, so if you do this, you'll store them. In the old days it was expensive so you had to decide whether to pay to store it, or pay to re-calculate it each time).
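A short sketch of the asymmetry described above: joining stored components is one call, while splitting a hyphen-separated string means parsing and validating every row. The field names and the account number below are made up for illustration; the components are kept as strings so leading zeros survive.

```python
from typing import NamedTuple

class AccountNumber(NamedTuple):
    bank: str
    branch: str
    account: str
    account_type: str

def join_account(parts):
    """Build the display form from separately stored components."""
    return "-".join(parts)

def split_account(text):
    """Recover the components from a hyphen-separated string."""
    fields = text.split("-")
    if len(fields) != 4:
        raise ValueError(f"expected 4 hyphen-separated parts, got {len(fields)}")
    return AccountNumber(*fields)

# Hypothetical account number in the bank-branch-account-type layout.
parsed = split_account("01-0123-01234567-00")
rebuilt = join_account(parsed)
```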
Personally, in the bank account case (and probably the majority of other examples that I can think of) I'd store them separately, and probably set up reference tables to allow validation to happen (e.g. you can't enter a bank that we don't know about).

Determining calendar availability, search algorithm and db structure

I need to design an availability application.
The users would mark on a calendar all their events, just like a regular calendar. Then I would need to search for something like, "I need someone from next Thursday afternoon through Saturday morning". So what I'm searching for is the negative of what the user puts in - the user puts in time slots when they're NOT available, and I search for the slots when they are.
The simplest thing I can think of is to just put down the calendar info in these 2 tables,
User (id, name, etc.... )
Events (id, user_id_foreign_key, time_start, time_end, type)
The "type" in Event is probably going to be used for something like "daily", "weekly", etc. for infinitely repeating events. Haven't really figured out how to handle those yet...
So all I really have to do is check, for each user, whether any of their events overlap the time range I gave (i.e., the event's start time is earlier than the range's end time AND the event's end time is later than the range's start time). If any do, then the user is unavailable. I'd of course have to cycle through all the users to do this.
Does this sound like the most efficient way to do this? Came up with this myself so would like to get some feedback. Thanks!
No, it doesn't sound like the most efficient way to do this if the number of users and events grows large in your system.
For example, if you would store the FREE time intervals in your database instead, you would be able to find all the available users with a single database query (find all free intervals that contain the interval you search for, and join this with the users data table). The free intervals data table should be indexed with both start and end times of the intervals.
Maintaining free intervals is a bit more complicated, because when you add an event you normally would split a free interval. But for the search it would be a better data structure.
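The two operations on free intervals could look roughly like this. It is a sketch in plain Python with intervals as (start, end) tuples; the function names are hypothetical, and in the database the containment test would be a query over the indexed start and end columns.

```python
from datetime import datetime

def split_free_interval(free, event):
    """Remove an event from one free interval and return the remaining pieces."""
    free_start, free_end = free
    event_start, event_end = event
    if event_end <= free_start or event_start >= free_end:
        return [free]  # the event does not touch this interval
    pieces = []
    if free_start < event_start:
        pieces.append((free_start, event_start))
    if event_end < free_end:
        pieces.append((event_end, free_end))
    return pieces

def is_available(free_intervals, start, end):
    """True if some free interval fully contains the requested range."""
    return any(fs <= start and end <= fe for fs, fe in free_intervals)

free = (datetime(2010, 1, 14), datetime(2010, 1, 17))
event = (datetime(2010, 1, 15, 9), datetime(2010, 1, 15, 17))
remaining = split_free_interval(free, event)

can_book = is_available(remaining, datetime(2010, 1, 15, 18), datetime(2010, 1, 16, 12))
clashes = is_available(remaining, datetime(2010, 1, 15, 8), datetime(2010, 1, 15, 10))
```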

Availability Calendar Ideas

I have an interesting situation here this time.
This is more of a conceptual question, less of a technical code question.
I need to have an 'availability calendar' for a new web application.
The calendar needs to allow users to easily choose dates that are unavailable at the cottage (and update them in the future).
Any ideas for a calendar or a simple and effective method to do this would be great.
I've seen various jQuery and similar calendars, however none of them seem to make it easy to select dates in various months, etc.
Thanks in advance,
Craig
One way to make a calendar easy to use for the selection of date ranges is to imitate certain aspects of how a roulette board is used to select ranges of numbers. For example, clicking on a region either side of a week would select the whole week, while clicking on a region above/below a particular day of the week would select all days of the week for that month.
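The date sets behind those two click regions could be computed along these lines. This is only a sketch of the selection logic, not of any calendar widget; the function names are made up, and it assumes weeks run Monday to Sunday.

```python
import calendar
from datetime import date

def select_weekday_column(year, month, weekday):
    """All dates in the month that fall on the given weekday (Monday=0)."""
    return [
        d
        for d in calendar.Calendar().itermonthdates(year, month)
        if d.month == month and d.weekday() == weekday
    ]

def select_week_row(year, month, day):
    """All dates of the month in the same Monday-to-Sunday week as the given day."""
    target = date(year, month, day)
    for week in calendar.Calendar().monthdatescalendar(year, month):
        if target in week:
            return [d for d in week if d.month == month]
    return []

# Clicking above "Saturday" in July 2010, or beside the week of July 14.
saturdays = select_weekday_column(2010, 7, 5)
week_of_14th = select_week_row(2010, 7, 14)
```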
Roulette - this came to my mind
http://www.marca.com/deporte/futbol/mundial/sudafrica-2010/calendario-english.html
