Suggested method of handling non-overlapping ranges (e.g. scheduling) - database

I've seen this sort of problems a few times and am trying to decide the best way of storing ranges in a non-overlapping manner. For example, when scheduling some sort of resource that only one person can use at a time. Mostly what I've seen is something like this:
PERSON ROOM START_TIME END_TIME
Col. Mustard Library 08:00 10:00
Prof. Plum Library 10:00 12:00
What's the best way of preventing new entries from overlapping an existing schedule, like say if Miss Scarlet wants to reserve the library from 11:00 to 11:30? An inline constraint won't work and I don't think this can easily be done in a trigger. A procedure that handles all inserts that initially looks for an existing conflict in the table?
Secondly, what's the best way to handle concurrency issues? Say Miss Scarlet wants the library from 13:00 to 15:00 and Mrs. White wants it from 14:00 to 16:00. The procedure from (1) would find both these schedules acceptable, but clearly taken together, they are not. The only thing I can think of is manual locking of the table or some sort of mutex.
What's a good primary key for the table above, (room, start_time)?

Fast working way for the cases, where you have fixed time ranges, you can store all ranges in separate table then simply link it to the "reserves" table. It can do the trick for fixed ranges, for example you can reserver library only with 30min interval and working hours is like from 8am to 8pm, just 24 records needed.
--Person table---------------
ID PERSON ROOM
1 Col. Mustart Library
2 Proof. Plum Library
--Timeshift table------------
ID START_TIME END_TIME
1 08:00 08:30
2 08:30 09:00
....
24 19:30 20:00
--Occupy table----
DATE TIMESHIFT PERSON
TRUNC(SYSDATE) TS_ID P_ID
08/12/2012 4 1
08/12/2012 5 1
08/12/2012 9 2
08/12/2012 10 2
Now you make it PK or UK and your database driven check is ready. It will be fast, with little data overhead. However using the same routine for every second will be not that effective.
More universal and complex way is to let some procedure (or trigger) check, is your range occupied or not and you will have to check all current records.

Related

Manufacturing process cycle time database design

I want to create a database to store process cycle time data. For example:
Say a particular process for a certain product, say welding, theoretically takes about 10 seconds to do (the process cycle time). Due to various issues, the machine's actual cycle time would vary throughout the day. I would like to store the machine's actual cycle time throughout the day and analyze it over time (days, weeks, months). How would i go about designing the database for this?
I considered using a time series database, but i figured it isn't suitable - cycle time data has a start time and an end time - basically i'm measuring time performance over time - if this even makes sense. At the same time, I was also worried that using relational database to store and then display/analyze time related data is inefficient.
Any thoughts on a good database structure would be greatly appreciated. Let me know if any more info is needed and i will gladly edit this question
You are tracking the occurrence of an event. The event (weld) starts at some time and ends at some time. It might be tempting to model the event entity like so:
StationID StartTime StopTime
with each welding station having a unique identifier. Some sample data might look like this:
17 08:00:00 09:00:00
17 09:00:00 10:00:00
For simplicity, I've set the times to large values (1 hour each) and removed the date values. This tells you that welding station #17 started a weld at 8am and finished at 9am, at which time the second weld started which finished at 10am.
This seems simple enough. Notice, however, that the StopTime of the first entry matches the StartTime of the second entry. Of course it does, the end of one weld signals the start of the next weld. That's how the system was designed.
But this sets up what I call the Row Spanning Dependency antipattern: where the value of one field of a row must be synchronized with the value of another field in a different row.
This can create any number of problems. For example, what if the StartTime of the second entry showed '09:15:00'? Now we have a 15 minute gap between the end of the first weld and the start of the next. The system does not allow for gaps -- the end of each weld also starts the next weld. How should be interpret this gap. Is the StopTime of the first row wrong. Is the StartTime of the second row wrong? Are both wrong? Or was there another row between them that was somehow deleted? There is no way to tell which is the correct interpretation.
What if the StartTime of the second entry showed '08:45'? This is an overlap where the start of the second cycle supposedly started before the first cycle ended. Again, we can't know which row contains the erroneous data.
A row spanning dependency allows for gaps and overlaps, neither of which is allowed in the data. There would need to be a large amount of database and application code required to prevent such a situation from ever occurring, and when it does (as assuredly it will) there is no way to determine which data is correct and which is wrong -- not from within the database, that is.
An easy solution is to do away with the StopTime field altogether:
StationID StartTime
17 08:00:00
17 09:00:00
Each entry signals the start of a weld. The end of the weld is indicated by the start of the next weld. This simplifies the data model, makes it impossible to have a gap or overlap, and more precisely matches the system we are modeling.
But we need the data from two rows to determine the length of a weld.
select w1.StartTime, w2.StartTime as StopTime
from Welds w1
join Welds w2
on w2.StationID = w1.StationID
and w2.StartTime =(
select Max( StartTime )
from Welds
where StationID = w2.StationID
and StartTime < w2.StartTime );
This may seem like a more complicated query that if the start and stop times were in the same row -- and, well, it is -- but think of all that checking code that no longer has to be written, and executed at every DML operation. And since the combination of StationID and StartTime would be the obvious PK, the query would use only indexed data.
There is one addition to suggest. What about the first weld of the day or after a break (like lunch) and the last weld of the day or before a break? We must make an effort not to include the break time as a cycle time. We could include the intelligence to detect such situation in the query, but that would increase the complexity even more.
Another way would be to include a status value in the record.
StationID StartTime Status
17 08:00:00 C
17 09:00:00 C
17 10:00:00 C
17 11:00:00 C
17 12:00:00 B
17 13:00:00 C
17 14:00:00 C
17 15:00:00 C
17 16:00:00 C
17 17:00:00 B
So the first few entries represent the start of a cycle, whereas the entry for noon and 5pm represents the start of a break. Now we just need to append the line
where w1.Status = 'C'
to the end of the query above. Thus the 'B' entries supply the end times of the previous cycle but do not start another cycle.

How to store sets of objects that have occurred together during events?

I'm looking for an efficient way of storing sets of objects that have occurred together during events, in such a way that I can generate aggregate stats on them on a day-by-day basis.
To make up an example, let's imagine a system that keeps track of meetings in an office. For every meeting we record how many minutes long it was and in which room it took place.
I want to get stats broken down both by person as well as by room. I do not need to keep track of the individual meetings (so no meeting_id or anything like that), all I want to know is daily aggregate information. In my real application there are hundreds of thousands of events per day so storing each one individually is not feasible.
I'd like to be able to answer questions like:
In 2012, how many minutes did Bob, Sam, and Julie spend in each conference room (not necessarily together)?
Probably fine to do this with 3 queries:
>>> query(dates=2012, people=[Bob])
{Board-Room: 35, Auditorium: 279}
>>> query(dates=2012, people=[Sam])
{Board-Room: 790, Auditorium: 277, Broom-Closet: 71}
>>> query(dates=2012, people=[Julie])
{Board-Room: 190, Broom-Closet: 55}
In 2012, how many minutes did Sam and Julie spend MEETING TOGETHER in each conference room? What about Bob, Sam, and Julie all together?
>>> query(dates=2012, people=[Sam, Julie])
{Board-Room: 128, Broom-Closet: 55}
>>> query(dates=2012, people=[Bob, Sam, Julie])
{Board-Room: 22}
In 2012, how many minutes did each person spend in the Board-Room?
>>> query(dates=2012, rooms=[Board-Room])
{Bob: 35, Sam: 790, Julie: 190}
In 2012, how many minutes was the Board-Room in use?
This is actually pretty difficult since the naive strategy of summing up the number of minutes each person spent will result in serious over-counting. But we can probably solve this by storing the number separately as the meta-person Anyone:
>>> query(dates=2012, rooms=[Board-Room], people=[Anyone])
865
What are some good data structures or databases that I can use to enable this kind of querying? Since the rest of my application uses MySQL, I'm tempted to define a string column that holds the (sorted) ids of each person in the meeting, but the size of this table will grow pretty quickly:
2012-01-01 | "Bob" | "Board-Room" | 2
2012-01-01 | "Julie" | "Board-Room" | 4
2012-01-01 | "Sam" | "Board-Room" | 6
2012-01-01 | "Bob,Julie" | "Board-Room" | 2
2012-01-01 | "Bob,Sam" | "Board-Room" | 2
2012-01-01 | "Julie,Sam" | "Board-Room" | 3
2012-01-01 | "Bob,Julie,Sam" | "Board-Room" | 2
2012-01-01 | "Anyone" | "Board-Room" | 7
What else can I do?
Your question is a little unclear because you say you don't want to store each individual meeting, but then how are you getting the current meeting stats (dates)? In addition any table given the right indexes can be very fast even with alot of records.
You should be able to use a table like log_meeting. I imagine it could contain something like:
employee_id, room_id, date (as timestamp), time_in_meeting
Where foreign keys to employee id to employee table, and room id key to room table
If you index employee id, room id, and date you should have a pretty quick lookup as mysql multiple-column indexes go left to right such that you gain index on (employee id, employee id + room id, and employee id + room id + timestamp) when do searches. This is explained more in the multi-index part of:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
By refusing to store meetings (and related objects) individually, you are loosing the original source of information.
You will not be able to compensate for this loss of data, unless you memorize on a regular basis the extensive list of all potential daily (or monthly or weekly or ...) aggregates that you might need to question later on!
Believe me, it's going to be a nightmare ...
If the number of people are constant and not very large you can then assign a column to each person for present or not and store the room, date and time in 3 more columns this can remove the string splitting problems.
Also by the nature of your question I feel first of all you need to assign Ids to everything rooms,people, etc. No need for long repetitive string in DB. Also try reducing any string operation and work using individual data in each column for better intersection performance. Also you can store a permutation all the people in a table and assign a id for them then use one of those ids in the actual date and time table. But all techniques will require that something be constant either people or rooms.
I do not understand whether you know all "questions" in design time or it's possible to add new ones during development/production time - this approach would require to keep all data all the time.
Well if you would know all your questions it seems like classic "banking system" which recalculates data on daily basis.
How I think about it.
Seems like you have limited number of rooms, people, days etc.
Gather logging data on daily basis, one table per day. Just one event, one database row, all information (field) what you need.
Start to analyse data using some crone script at "midnight".
Update stats for people, rooms, etc. Just increment number of hours spent by Bob in xyz room etc. All what your requirements need.
As analyzed data are limited and relatively small as you analyzed (compress) them, your system can contain also various queries as indexes would be relatively small etc.
You could be able to use scalable map/reduce algorithm.
You can't avoid storing the atomic facts as follows: (the meeting room, the people, the duration, the day), which is probably only a weak consolidation when the same people meet multiple times in the same room on the same day. Maybe that happens a lot in your office :).
Making groups comparable is an interesting problem, but as long as you always compose the member strings the same, you can probably do it with string comparisons. This is not "normal" however. To normalise you'll need a relation table (many to many) and compose a temporary table out of your query set so it joins quickly, or use an "IN" clause and a count aggregate to ensure everyone is there (you'll see what I mean when you try it).
I think you can derive the minutes the board room was in use as meetings shouldn't overlap, so a sum will work.
For storage efficiency, use integer keys for everything with lookup tables. Dereference the integers during the query parsing, or just use good old joins if you are feeling traditional.
That's how I would do it anyway :).
You'll probably have to store individual meetings to get the data you need anyway.
However you'll have to make sure you aggregate and anonymise it properly before creating your reports. Make sure to separate concerns and access levels to stay within the proper legal limits on data.

Large amount of timecourses in database

I have a rather large amount of data (~400 mio datapoints) which is organized in a set of ~100,000 timecourses. This data may change every day and for reasons of revision-safety has to be archived daily.
Obviously we are talking about way too much data to be handled efficiently, so I made some analysis on sample data. Approx. 60 to 80% of the courses do not change at all between two days and for the rest only a very limited amount of the elements changes. All in all I expect much less than 10 mio datapoints change.
The question is, how do I make use of this knowledge? I am aware of concepts like the Delta-Trees used by SVN and similar techniques, however I would prefer, if the database itself would be capable of handling such semantic compression. We are using Oracle 11g for storage and the question is, is there a better way than a homebrew solution?
Clarification
I am talking about timecourses representing hourly energy-currents. Such a timecourse might start in the past (like 2005), contains 8760 elements per year and might end any time up to 2020 (currently). Each timecourse is identified by one unique string.
The courses themselves are more or less boring:
"Course_XXX: 1.1.2005 0:00 5; 1.1.2005 1:00 5;1.1.2005 2:00 7,5;..."
My task is making day-to-day changes in these courses visible and to do so, each day at a given time a snapshot has to be taken. My hope is, that some loss-free semantical compression will spare me from archiving ~20GB per day.
Basically my source data looks like this:
Key | Value0 | ... | Value23
to archive that data I need to add an additional dimension which directly or indirectly tells me the time at which the data was loaded from the source-system, so my archive-database is
Key | LoadID | Value0 | ... | Value23
Where LoadID is more or less the time the source-DB was accessed.
Now, compression in my scenario is easy. LoadIDs are growing with each run and I can give a range, i.e.
Key | LoadID1 | LoadID2 | Value0 | ... | Value23
Where LoadID1 gives me the ID of the first load where the 24 values where observed and LoadID2 gives me the ID of the last consecutive load where the 24 values where observed.
In my scenario, this reduces the amount of data stored in the database to 1/30th

Storing and searching opening/closing times for stores

I'm writing an application that indexes data for our stores, some of which are open late (8 am - 2 am). We need to be able to search this database quickly -- basically, to run a query to find which stores are open at a given point in time (now, Sunday at 1 am, whatever).
In addition, the open/close times can vary day-by-day -- some stores are closed on Sundays, for example.
The obvious solution to me would be to make a table where I have a row with the store ID, day, open time, and close time. For something like Monday, 8 am - 2 am, that would actually be two rows, one for Monday 0800 - 2400, and one for Tuesday 0000 - 0200.
We have a lot of stores, so the search has to perform well (basically, the data has to be index-friendly), but I'll also have to display this data back out in a human-readable format. With my current solution, that'd look something like this:
Monday: 8:00 - Midnight
Tuesday: Midnight - 2:00 am; 8:00 am - Midnight
I'm just wondering if anybody else has alternative solutions before I jump right to an implementation. Thanks!
When PBS (the US Public Broadcasting System) faced this same problem a couple of years ago, they invented the idea of the "30 hour day" -- Where 00:00 is midnight at the start of the day, 24:00 is midnight at the end of the day, 25:00 is 1am the next day, 30:00 is 6am the next day. That way Mon closing time of 26:00 is 2am Tues morning.
Rather than two records representing a single store's times for a day, it may be more object oriented to think of the "store day" as the object. That way 1 record = 1 store's times for a day. If you want to store the two sets of open/close times, just use four fields in the record instead of two--and adjust your queries appropriately.
Remember that your queries should use a library/api that you write and publish. The library will then deal with the data store and its data layout. No one but your library should be looking at the db directly.
Time zones are very important in this sort of app too. (Hopefully) at some point, the store chain will expand to cover more than one time zone. You'll then need to determine the local time of the query. -- May not the same as the time zone of your server which is handling the queries.
Further thoughts--
I now see that you're standardizing to GMT. Good. You could also use datetime values (vs time values) and standardize to a given week in time. Eg open time is Sun Jan 1, 1995 10am - Mon Jan 2, 1995 2am (using Jan 1, 1995 as a base since it was a Sunday).
Then rationalize your "current time and date" to match the same point in the week of Jan 1, 1995. Then query to find open store days.
HTH,
Larry

Booking system dates in database

I need some help with the following:
I am setting up a booking system (kind of hotel booking) and I have inserted a check in date and a check out date into database, how do I go to check if a room is already booked?
I have no clue how to manage the already booked days in the database. Is there anyone who can give me a clue how this works? And maybe where I can find more information?
Well, I didn't understand very well your question, but my suggestion is to you to add a state field, in which you can control the current state of the "booked" item. something like
Available
Under Maintenance
Occupied
Or whatever bunch of states that work for you.
[EDIT]
The analysis that I use to do for that case is as follows:
Take for instance, your room is currently booked with these date range:
Init Date: Feb 8
End Date: Feb 14
Success Booking Examples
Init Date: Feb 2
End Date: Feb 6
Init Date: Feb 15
End Date: Feb 24
You should check that the booking attempt satisfies these conditions:
Neither "Booking Init Date" nor "Booking End Date" can be inside of the already booked date.
Example:
Init Date: Feb 2
End Date: Feb 10 (Inside the current range (Feb 8 to 14))
Init Date: Feb 12 (Inside the current range (Feb 8 to 14))
End Date: Feb 27
if "Booking Init Date" is less than current init date, "Booking End Date" should also be less than current init date
Init Date: Feb 2.
End Date: Feb 27 (Init date before, but end date later)
This is an interesting question - not least because I don't believe that there is a single ideal answer as it will depend to some extent on the nature of the "Hotel" and the way in which people are placed in rooms and to a further extent on the scale of the question.
At the most basic level you have two ways that you can track occupancy of rooms:
You can have a table with one row per room per day that defines the state of that room on that date (and possibly the related booking if occupied)
For each room you maintain a list of "bookings" (which as already suggested will have to include states for when a room is unavailable for maintenance).
The reason that its an interesting question is that these both immediately present you with certain challenges when maintaining the data and searching for occupancy in the case of the former you're holding a lot of data that may not be needed and in the case of the latter finding gaps in the occupancy for new bookings is perhaps more interesting.
You then (in both cases) have a futher problem of resource allocation - if your bookings tend to be for two days and you system results in 1 day gaps between bookings you're going to need to re-arrange the occupancy to optimise usage... but then you have to be careful with bookings that need to be explicitly tied to specific rooms.
Ideally you would defer assigning a booking to a room for as long as possible (which is why it depends on the hotel - fine for 400 modular rooms rather less so for a dozen unique ones) - so long as there are sufficient rooms of the necessary standard available (which you invert, so long as there are fewer rooms booked than real rooms) during a target period you can take the booking. Of course you've still then got to represent the state of the rooms so this is in addition to the data you've got to manage.
All of which is what makes it interesting - there is considerable depth to the problem, you need to have a fairly decent understanding of the specific problem to produce an appropriate solution.
I have come to the booking problem from the perspective of avoiding a highly populated table, giving that the inventory is thousands rather than hundreds.
solution one - intervals
solution two - populate slots only when are occupied using the smallest unit (1 day)
solution three - generate slots in advance for each resource and manage the status
Solution one has smallest size footprint but since you cannot guess if your searched range is already in an interval or not - you have to read and compare the whole table.
Solution two solves this problem and you can search for a specific time-frame only but the table contains more lines. However since the empty slots are not written anywhere a high vacancy will reduce the size of the table.
Another advantage is that old bookings can be transferred to a separate table.
Solution three increases the size of the table to a maximum minSlotresourcetime and the lines are generated in advance. The only advantage that I can think of is the cost of finding empty slots with a simple select.
However generating the slots in advance looks like a terrible idea to me.

Resources