Checking for overlapping car reservations - database

I'm writing a simple booking program for a car rental (a school assignment). My buddy and I are trying to make the system a little more advanced than the assignment dictates, but we're having some problems we hoped you could help us with.
The idea is that you can reserve a certain car type, and when you get the car it will be one of that type (you don't reserve a specific car, as our assignment dictates, but only a type). Only one customer can have the car on a specific date. As the reservations tick in, we have to make sure that we don't hire out more cars of each type than we've got. The reservations are basically stored with a start date, an end date, and a car type.
If we ignore the car type for now (let's say we only have one type), then the reservations could graphically look something like this:
1/12  2/12  3/12  4/12  5/12  6/12  7/12
|-----------------|
                  |-----------------|
|-----|
                              |-----|
|-----------|
                        |-----------|
If the rental only has three cars, it would be possible to rent a car from 3/12 to 5/12, since each of those days has only 2 reservations. But how do we know this? Do we have to check each date and count() the number of reservations that span over that date?
And what if somebody had reserved a car on 4/12? Then 3/12 and 5/12 would still have only 2 reservations, but 4/12 would have 3.
Would it be possible to do with a query somehow, or do we have to step through each date in the program to check that the number of reservations doesn't exceed the number of cars?
(This is easy enough with only full dates, but consider the scenario where you could rent the cars on an hourly basis, not only on a daily basis as here. Then it could be a tough one to step through each hour if we have a lot of reservations and cars and the timespan is long...)
Hope you have some nice ideas that will help us along. Thanks for taking the time to read the question :)
Mikkel, Denmark

Assume you have such a reservation situation in real life:
      1/12  2/12  3/12  4/12  5/12  6/12  7/12
Car1: |-----------------|
Car2:                   |-----------------|
Car3: |-----|     |-----------|     |-----|
Car4: |-----------|
Table car
| id | type | registration |
|----|------|--------------|
| 1  | 1    | HH1111       |
| 2  | 1    | HH3333       |
| 3  | 2    | HH77         |
| 4  | 3    | DD999        |
Table reservation
| car_id | date_from  | date_to    |
|--------|------------|------------|
| 1      | 2013-12-01 | 2013-12-04 |
| 2      | 2013-12-04 | 2013-12-07 |
| 3      | 2013-12-01 | 2013-12-02 |
| 3      | 2013-12-03 | 2013-12-05 |
| 3      | 2013-12-06 | 2013-12-07 |
| 4      | 2013-12-01 | 2013-12-03 |
Now, by really simple logic, you must select all available cars for the period
from 2013-12-05 to 2013-12-06:
"Select ALL cars which do not have any reservation with dates that block them for usage"
with a brilliant MySQL select:
SELECT * FROM car
WHERE NOT EXISTS (
    SELECT * FROM reservation
    WHERE car.id = reservation.car_id
      AND date_from < '2013-12-06'
      AND date_to   > '2013-12-05'
);
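The original question asks about availability per type rather than per car. Because this answer pins every reservation to a concrete car, the same NOT EXISTS test extends naturally: a booking of a given type fits if at least one car of that type is free for the whole period. A minimal sketch, assuming the tables above and a requested type of 1:
-- Cars of the requested type with no reservation blocking the period;
-- if free_cars > 0, the new booking fits.
SELECT COUNT(*) AS free_cars
FROM car
WHERE car.type = 1
  AND NOT EXISTS (
      SELECT *
      FROM reservation
      WHERE reservation.car_id = car.id
        AND reservation.date_from < '2013-12-06'  -- requested end
        AND reservation.date_to   > '2013-12-05'  -- requested start
  );
If free_cars is greater than zero, assign the new reservation to one of those cars (in a real system, lock the row with SELECT ... FOR UPDATE or equivalent so two concurrent bookings can't grab the same car).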

"Would it be possible to do with a query some how, or do we have to step through each date in the program to check the number of reservations didn't exceed the number of cars? (This is easy enough with only full dates,"
The nature of your problem is that a violation of the constraint could appear on any individual date. So logically speaking, it is indeed necessary to do the check for each individual date comprised in a new reservation. The only optimisation possible would be to do the check at the level of "smallest intervals". To do that, you must first compute all the intervals that already appear in the database and that overlap with your new reservation.
For example, a new reservation for 4/12-6/12 would have to be split into 4/12-5/12 (second line) and 5/12-6/12 (third line). Those individual intervals might be longer than one single day, and you can do the checks on the level of those individual intervals. (They are the same as individual days in this particular example, but a reservation 7/12-19/12 would not have to be split at all.)
However, computing this might prove difficult, and there's another caveat: when you're looking at multi-row inserts, you should also be splitting over the other rows to be inserted (and that requires you to record all the inserted rows in a temporary table; otherwise you won't be able to access them).
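For the day-granularity case, that boundary check can be expressed in one query. This is only a sketch against the questioner's schema as described (reservation rows holding car_type, date_from, date_to; the names are illustrative): the overlap count can only rise where some reservation starts, so it is enough to evaluate the count at the proposed start date plus every existing start date inside the proposed period.
-- Proposed reservation: type 1, 2013-12-03 to 2013-12-05.
-- The booking fits if worst_case stays below the number of cars of type 1.
SELECT MAX(concurrent) AS worst_case
FROM (
    SELECT b.boundary, COUNT(*) AS concurrent
    FROM (
        -- candidate boundaries: the proposed start, plus every existing
        -- reservation start that falls inside the proposed period
        SELECT '2013-12-03' AS boundary
        UNION
        SELECT date_from FROM reservation
        WHERE car_type = 1
          AND date_from > '2013-12-03' AND date_from < '2013-12-05'
    ) AS b
    JOIN reservation r
      ON r.car_type  = 1
     AND r.date_from <= b.boundary
     AND r.date_to   >  b.boundary
    GROUP BY b.boundary
) AS counts;
The same query works unchanged at hourly granularity, since it only visits interval boundaries and never steps through individual dates or hours.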

Related

How to sample data that has to be distributed across different criteria

I am looking for a way to sample data using 2 different criteria; is there anyone who can assist?
I have a data set that I have cleaned, with 2000 records. I would like to sample 100 clients, distributed at 80% employed and 20% self-employed; furthermore, I have to apply another criterion. Each of the employed and self-employed samples will have to be further distributed by profession: 20% Lawyers, 10% Doctors, 50% Engineers and 20% Accountants.
This is what the data looks like:
Client ID | Self employed | Profession
----------+---------------+-----------
123456    | yes           | lawyer
123457    | no            | doctor
123458    | yes           | accountant
123459    | yes           | accountant
123460    | yes           | engineer
123461    | yes           | lawyer
123462    | no            | engineer
123456    | yes           | doctor
123456    | yes           | lawyer
123456    | yes           | engineer
I can't help with the SQL, but the basic idea is straightforward. You need to cross the categories of employment by the professions, with the desired percentages in the margins. Then fill out the table by multiplying the row and column percentages:
           | employed | self-employed |
-----------+----------+---------------+-----
Lawyer     |   16%    |       4%      | 20%
Doctor     |    8%    |       2%      | 10%
Engineer   |   40%    |      10%      | 50%
Accountant |   16%    |       4%      | 20%
-----------+----------+---------------+-----
           |   80%    |      20%      |
The entries in the table are what percentage of each crossed category you want in your sample. Since you want a total sample size of 100, multiply each percentage by 100 to get the desired sample size. Given your stated proportions, you want 16 employed lawyers, 4 self-employed lawyers, 8 employed doctors, etc.
Divide your data into subsets corresponding to the 8 categories, and randomly select the appropriate number from each subset. I don't know if SQL provides a random shuffling capability, but if so, that's an easy way to select the sample without replacement. Shuffle the employed lawyers and take the first 16, shuffle the self-employed lawyers and take the first 4, and so on. Note that this presumes that each category has enough elements to supply the desired sample size.
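MySQL, for one, can do the shuffle with ORDER BY RAND(). A sketch of a single stratum, assuming a clients table shaped like the sample above (table and column names are illustrative):
-- The 16 employed lawyers: shuffle the stratum, keep the first 16.
-- Repeat per stratum (8 queries, or a UNION ALL of them) with the
-- sizes from the crossed-percentage table.
SELECT client_id
FROM clients
WHERE self_employed = 'no'   -- employed
  AND profession = 'lawyer'
ORDER BY RAND()
LIMIT 16;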

SSAS - MDX calculated member

I've a fact table that details individual line amounts for orders placed by my organisation. In this fact table, at line level, I've included the total order amount, as it's possible we might need that level of detail at some point.
Here's an example of what I've got:-
+------------+------------+---------------+------------+---------------------+
| BookingKey | Booking_ID | Category_FKey | Line_Value | Total_Booking_Value |
+------------+------------+---------------+------------+---------------------+
| 1 | 12 | 8 | 150 | 700 |
| 2 | 12 | 4 | 150 | 700 |
| 3 | 12 | 5 | 300 | 700 |
| 4 | 12 | 4 | 100 | 700 |
+------------+------------+---------------+------------+---------------------+
As you can see, the Total_Booking_Value here is the sum of the Line_Value for the booking in the example (Booking_ID = 12).
The Category_FKey looks up to a Categories dimension.
Using this structure I've created a simple cube, and this works fine, mostly.
The issue I have is that I'd like to be able to view the Total Line_Value amount, and somehow include the Total_Booking_Value alongside it.
So, for example I might add the Categories dimension as a filter and want to filter by say Category_FKey = 4.
If this was the case I'd want the aggregates to tell me that the total Line_Value was 250 (for BookingKeys 2 and 4), and the Total_Booking_Value should be 700. Using normal aggregation (ie SUM) I'm getting the Total_Booking_Value as 1400 (obviously - because it's adding 700 * 2 for the two rows the cube would return).
So, the way I see it I'd like to create an MDX calculation that somehow takes the Total_Booking_Value and gives just the value for the Booking in question.
Should this be done using some kind of average, or division by the Distinct number of items? I can't figure this out. I tried something like this:-
create member currentcube.measures.[Calculated Booking Value]
as
[Measures].[Total_Booking_Value] / count(Measures.Booking_ID);
But this isn't working.
Hopefully this makes sense and you can point me in the right direction.
I find it strange that booking_ID is a measure - intuitively it strikes me as something that would be an attribute and therefore a hierarchy - in which case you'd be able to do the count like this:
[Measures].[Total_Booking_Value]
/
COUNT(EXISTING [Booking].[Booking_ID].[Booking_ID].members)
A straightforward solution would be to have two fact tables: one with granularity booking key and one with granularity booking id. The first would contain all columns except total booking value, and the second would contain columns booking id and total booking value.
Then both measures would easily be summable.
The reference type between the second fact table and the category dimension could be configured as many-to-many via the first fact table. Thus, you would see the full values of the involved bookings for each selected category, automatically eliminating double counting.
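A sketch of how the second fact table could be derived from the first, assuming the line-grain fact is called FactBookingLine (a hypothetical name; the CREATE TABLE ... AS syntax is MySQL-style, SQL Server would use SELECT ... INTO):
-- One row per booking, so summing Total_Booking_Value no longer
-- double-counts across lines.
CREATE TABLE FactBooking AS
SELECT Booking_ID,
       MAX(Total_Booking_Value) AS Total_Booking_Value  -- constant per booking
FROM FactBookingLine
GROUP BY Booking_ID;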

How to deal with Variable data over time in associations

In linked models (let's say a drink transaction, a waiter, and a restaurant), when you want to display data, you look for information in your linked content:
Where was that beer bought?
Fetch the drink transaction => fetch its waiter => fetch this waiter's restaurant: this is where the beer was purchased.
So at time T, when I display all transactions, I fetch my data following associations, thus I can display this :
TransactionID | Waiter | Restaurant
1             | Julius | Caesar's palace
2             | Cleo   | Moe's tavern
Let's say now that my waiter is moved to another restaurant.
If I refresh this table, the result will be
TransactionID | Waiter | Restaurant
1             | Julius | Moe's tavern
2             | Cleo   | Moe's tavern
But we know that the transaction n°1 was made in Caesar's palace !
Solution 1
Don't modify the waiter Julius, but clone it.
Upside : I keep an association between models, and still can filter with every field of every associated models.
Downside: Every modification on every model duplicates content, which can add up to a LOT as time passes.
Solution 2
Keep a copy of the current state of your associated models when you create the transaction.
Upside : I don't duplicate the contents.
Downside: You can't use fields on your content anymore to display, sort or filter, as your original and real data is inside, let's say, a JSON field. So if you use MySQL, you have to filter your data by making plain-text search queries in that field.
What is your solution ?
[EDIT]
The problem goes further, as it's not only a matter of an association changing: a simple modification on an associated model causes a problem too.
What I mean:
What's the amount of this order?
Fetch the drink transaction => fetch its product => fetch this product's price => multiply by the order quantity: this is the total amount of the order.
So at time T, when I display all transactions, I fetch my data following associations, thus I can display this :
TransactionID | Qty | ProductId
1             | 2   | 1
ProductID | Title | Price
1         | Beer  | 3
==> Amount of order n°1: 6.
Let's say now that the beer costs 2.5.
If I refresh this table, the result will be
TransactionID | Qty | ProductId
1             | 2   | 1
ProductID | Title | Price
1         | Beer  | 2.5
==> Amount of order n°1: 5.
So, once again, the 2 solutions are available : do I clone the beer product when its price is changed ? Do I save a copy of beer in my order when the order is made ? Do you have any third solution ?
I can't just add an "amount" attribute to my orders: yes, it can solve that problem (partially), but it's not a scalable solution, as many other attributes will be in the same situation and I can't multiply attributes like this.
Event Sourcing
This is a good use case for Event Sourcing. Martin Fowler wrote a very good article about it; I advise you to read it.
there are times when we don't just want to see where we are, we also want to know how we got there.
The idea is to never overwrite data but instead create immutable transactions for everything you want to keep a history of. In your case you'll have WaiterRelocationEvents and PriceChangeEvents. You can recreate the state at any given time by applying every event in order.
If you don't use Event Sourcing, you lose information. Often it's acceptable to forget historic information, but sometimes it's not.
Lambda Architecture
As you don't want to recalculate everything on every single request, it's advisable to implement a Lambda Architecture. That architecture is often explained with Big Data technology and frameworks, but you can implement it with plain old Java and cron jobs.
It consists of three parts: the Batch Layer, the Serving Layer and the Speed Layer.
The Batch Layer regularly calculates an aggregated version of the data, for example you'll calculate the monthly income once per day. So the current month's income will change every night until the month is over.
But now you want to know the income in real-time. Therefore you add a Speed Layer, which will apply all events of the current date immediately. Now if a request of the current month's income arrives, you'll add up the last result of the Batch Layer and the Speed Layer.
The Serving Layer allows more advanced queries by combining multiple batch results and the Speed Layer results into one query. For example, you can calculate the year's income by summing the monthly incomes.
But as said before, only use the Lambda approach if you need the data often and fast, because it adds extra complexity. Calculations which are rarely needed, should be run on-the-fly. For example: Which waiter creates the most income at Saturday evenings?
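As a sketch of how batch and speed results combine, assume a hypothetical daily_income_batch(income_day, income) table filled by the nightly job, plus the Orders and Products tables from the example below (the price lookup is deliberately simplified to FirstPrice):
-- Month-to-date income: the batch layer covers fully processed days,
-- the speed layer adds today's not-yet-batched orders on the fly.
SELECT
    (SELECT COALESCE(SUM(income), 0)
     FROM daily_income_batch
     WHERE income_day >= '2016-06-01' AND income_day < CURDATE())
  + (SELECT COALESCE(SUM(o.Quantity * p.FirstPrice), 0)
     FROM Orders o
     JOIN Products p ON p.Id = o.ProductId
     WHERE o.Timestamp >= CURDATE())
    AS month_to_date_income;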
Example
Restaurants:
| Timestamp | Id | Name |
| ---------- | -- | --------------- |
| 2016-01-01 | 1 | Caesar's palace |
| 2016-11-01 | 2 | Moe's tavern |
Waiters:
| Timestamp | Id | Name | FirstRestaurant |
| ---------- | -- | -------- | --------------- |
| 2016-01-01 | 11 | Julius | 1 |
| 2016-11-01 | 12 | Cleo | 2 |
WaiterRelocationEvents:
| Timestamp | WaiterId | RestaurantId |
| ---------- | -------- | ------------ |
| 2016-06-01 | 11 | 2 |
Products:
| Timestamp | Id | Name | FirstPrice |
| ---------- | -- | -------- | ---------- |
| 2016-01-01 | 21 | Beer | 3.00 |
PriceChangeEvent:
| Timestamp | ProductId | NewPrice |
| ---------- | --------- | -------- |
| 2016-11-01 | 21 | 2.50 |
Orders:
| Timestamp | Id | ProductId | Quantity | WaiterId |
| ---------- | -- | --------- | -------- | -------- |
| 2016-06-14 | 31 | 21 | 2 | 11 |
Now let's get all information about order 31.
get order 31
get the price of product 21 at 2016-06-14:
    get the last PriceChangeEvent before that date, or use FirstPrice if none exists
    calculate the total price by multiplying the retrieved price by the quantity
get waiter 11
get the waiter's restaurant at 2016-06-14:
    get the last WaiterRelocationEvent before that date, or use FirstRestaurant if none exists
    get the restaurant name from the retrieved restaurant id
As you can see it becomes complicated, therefore you should only keep history of useful data.
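The price part of that walk translates to a small query. A sketch against the example tables, for order 31's product and date:
-- Price of product 21 as of 2016-06-14: the latest PriceChangeEvent on
-- or before the order date, falling back to the product's FirstPrice.
SELECT COALESCE(
         (SELECT pce.NewPrice
          FROM PriceChangeEvent pce
          WHERE pce.ProductId = 21
            AND pce.Timestamp <= '2016-06-14'
          ORDER BY pce.Timestamp DESC
          LIMIT 1),
         (SELECT p.FirstPrice FROM Products p WHERE p.Id = 21)
       ) AS price_at_order_date;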
I wouldn't involve the relocation events in the calculation. They could be stored, but I would store the restaurant id and the waiter id in the order directly.
The price history, on the other hand, could be interesting, to check whether orders went down after a price change. Here you could use the Lambda Architecture to calculate a full order with prices from the raw order and the price history.
Summary
Decide which data you want to keep a history of.
Implement Event Sourcing for that data.
Use the Lambda Architecture to speed up commonly used queries.
I like the question as it raises something very straightforward and also something more subtle.
The common principle in both cases is that 'history must not change': if we run a query over a specified past date range today, the results must be the same as when we run that same query at any point in the future.
Waiters Case
When a waiter changes restaurants we must not change the history of sales. If waiter Julius sells a drink yesterday in restaurant 1 and then switches to selling more drinks today in restaurant 2, we must retain those details.
Thus we want to be able to answer queries such as ‘how many drinks has Julius sold in restaurant 1’ and ‘how many drinks has Julius sold in all restaurants’.
To achieve this you have to abstract away from Julius as a waiter by bringing in a concept of staff. Julius is a member of staff. Staff work as waiters. When working in restaurant 1 Julius is waiter A and when he works in another restaurant he is waiter B, but always the same member of staff – Julius. With an entity ‘Staff’ the queries can be answered easily.
Upside: No loss of historic data or excessive duplication.
Downside: A new entity, Staff, must be managed. But the waiter table's content is reduced, so the net data-storage overhead is low.
In summary - abstract data subject to change into a new entity and refer back to it from transactions.
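A sketch of that abstraction in SQL (all names hypothetical): staff holds the person, waiter holds one row per stint in a restaurant, and each transaction references the waiter row that was current when it happened:
CREATE TABLE staff (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL            -- e.g. 'Julius'
);
CREATE TABLE waiter (
    id            INT PRIMARY KEY AUTO_INCREMENT,
    staff_id      INT NOT NULL,
    restaurant_id INT NOT NULL,           -- the restaurant for this stint
    FOREIGN KEY (staff_id) REFERENCES staff(id)
);
-- 'How many drinks has Julius sold in all restaurants' then joins
-- through staff:
-- SELECT COUNT(*)
-- FROM transaction t
-- JOIN waiter w ON w.id = t.waiter_id
-- JOIN staff s ON s.id = w.staff_id
-- WHERE s.name = 'Julius';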
Value of Order Case
The extended use case regarding 'what is the value of this order' is more involved. I work in cross-currency transactions, where the value the observer (user) sees in the price list changes from day to day as currency fluctuations occur.
But there are good reasons to lock the order value in place. For example invoice processing systems have tolerance for a small difference between their expected invoice value and that of the submitted invoice, but any large difference can lead to late payment whilst invoice handlers check the issue. Also, if customers run reports on their historic purchases then the values of those orders must remain consistent despite fluctuations in currency rates over time.
The solution is to save into the order line:
the value of the product in the customer's currency,
or the rate between the customer's and the supplier's currency,
but ideally both, to avoid rounding errors.
What this does is provide a statement that 'on the date this order was placed, line 1 cost $44.56 at an exchange rate of 1.1 $/£'. Having this data locked in allows you to invoice to the customer's expectation and provide consistent spend reports over time.
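A sketch of an order-line table that locks both figures in at order time (column names illustrative):
CREATE TABLE order_line (
    order_id      INT NOT NULL,
    line_no       INT NOT NULL,
    product_id    INT NOT NULL,
    quantity      INT NOT NULL,
    unit_price    DECIMAL(10,2) NOT NULL,  -- price in the customer's currency, as sold
    exchange_rate DECIMAL(12,6) NOT NULL,  -- customer/supplier rate at order time
    PRIMARY KEY (order_id, line_no)
);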
Upside: Consistent historic data. Fast database performance, as no look-ups are required against historic rate tables.
Downside: Some data duplication. However, traded off against the storage and indexation overhead of keeping historic rate tables, this is possibly an upside.
Regarding adding 'amount' to your order table: you have to do this if you want to achieve a consistent data history. If you only work in one currency, then amount is the only additional storage concern, and by adding this one attribute you have protected history. Your other alternative is to store a historic cost table for drinks, so you know that in January beer was $1, in February it was $1.10, etc., and then store the cost-table key in the transaction so that you can look up the cost if anyone asks about a historic order. But the overhead of storing the key PLUS the indexes needed to make this practicable will outweigh the storage cost of cloning 'amount' onto the order record.
In summary - clone cost data that will change over time.

Database Design - how to store quantities that are measured in different ways

I would like to know if the database design I have in mind for an online food store is good according to the usually followed standards and conventions.
Basically, the confusion I have is how to store items whose quantity is measured in different ways.
For example, there are items that are measured in terms of kilograms, and then there are items measured in terms of number of packets.
For example, rice is measured in kilograms, and something like, say, noodles would be measured in terms of number of packets.
So the tables are planned to have the fields below:
Items table with the fields: category, name, company, variant, and a boolean variable named measured_in_packets.
For items where measured_in_packets is set to true, an entry in another table will hold the available packet sizes:
packet_sizes table with item_id and packet_size.
So if one product is available in multiple packet sizes (250 g, 500 g, etc.), a row would be made for each available size against the item id.
Does this sound like a good database design?
In a nutshell, you have items which have a quantity value, but that quantity value can be measured in different kinds of measurement types. You gave examples such as kilograms, packages, and we can perhaps add others such as litres for liquids, etc.
One of the problems with the current solution is that it doesn't allow for any easy alteration or expansion. It also relies on checking a boolean field in order to make decisions (such as which table to join, I believe, based on your description).
Instead, a better approach would be to create a table containing the possible measurement types, such as kilograms or packets. Your items then simply have a foreign key to this table, and that tells you how the item is measured. This allows you to expand the types in the future, and no need to maintain a boolean flag, or do any other manual work.
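A sketch of that pair of tables, with names matching the sample data below:
CREATE TABLE measurement_types (
    id                 INT PRIMARY KEY AUTO_INCREMENT,
    name               VARCHAR(50) NOT NULL,
    measurement_symbol VARCHAR(10) NOT NULL
);
CREATE TABLE items (
    id                   INT PRIMARY KEY AUTO_INCREMENT,
    name                 VARCHAR(100) NOT NULL,
    quantity             DECIMAL(10,2) NOT NULL,
    measurement_types_id INT NOT NULL,
    FOREIGN KEY (measurement_types_id) REFERENCES measurement_types(id)
);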
So if the data in these tables looked like this:
items
+----+---------+----------+----------------------+
| id | name | quantity | measurement_types_id |
+----+---------+----------+----------------------+
| 1 | Rice | 50 | 1 |
| 2 | Noodles | 75 | 2 |
+----+---------+----------+----------------------+
measurement_types
+----+-----------+--------------------+
| id | name | measurement_symbol |
+----+-----------+--------------------+
| 1 | Kilograms | kg |
| 2 | Packets | packets |
+----+-----------+--------------------+
A practical example of this data using the following query:
SELECT items.name, items.quantity, measurement_types.measurement_symbol
FROM items
INNER JOIN measurement_types
ON measurement_types.id = items.measurement_types_id;
would yield this result:
+---------+----------+--------------------+
| name | quantity | measurement_symbol |
+---------+----------+--------------------+
| Rice | 50 | kg |
| Noodles | 75 | packets |
+---------+----------+--------------------+

Summarizing across multiple columns in sql or crystal

I was wondering if there was a way to get a distinct count on a certain column based on the value of a second column while still getting a total count of the first column. This is an example of the issue I'm facing. I have a query that returns an i-Vent type, ID, Status, and linked medication orders for a pharmacy intervention system. The interventions are grouped by i-Vent type. The Status can be one of five values or NULL. I need to be able to count how many i-Vents were recorded as each of the six possible values for Status.
An example set may look similar to this:
Type    | ID   | Status   | Linked Meds
--------+------+----------+--------------------------
IV2PO   | 1234 | Accepted | pantoprazole IV
IV2PO   | 1234 | Accepted | pantoprazole PO
IV2PO   | 1235 | NULL     | NULL
IV2PO   | 1236 | Pending  | metoclopramide IV
IV2PO   | 1236 | Pending  | metoclopramide PO
IV2PO   | 1236 | Pending  | Pharmacy Consult - IV2PO
Consult | 1237 | Rejected | NULL
The group summary should list IV2PO having a total count of 3 with a count of 1 for "Accepted", 1 for "NULL", and 1 for "Pending"; and Consult having a total count of 1 with a count of 1 for "Rejected".
Please take notice of the duplicate values caused by having more than one medication/order linked to an i-Vent.
Ultimately I'm building the final report in Crystal Reports so if there is a way to get the correct counts there that would be fine as well. I have a version of this which uses a subreport to get the linked medications/orders, but I'd like to find a better alternative to take less time to run and use fewer resources.
Does anyone know of a way to do this?
Thanks!
In Crystal Reports you can use the Distinct Count summary option.
When creating a "Summary", using the Count function may not be desirable. It is often the case that a report must only return the number of unique contact records, as other tables (e.g. History) may contain multiple rows for each customer.
Select Insert | Summary.
Select the fieldname you wish to summarize.
Make sure to select Distinct Count as the Summary Operation.
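If you'd rather compute the counts in SQL before the report, a sketch, assuming the result set above sits in a table or view named ivents (a hypothetical name): count distinct IDs per Type and Status, and distinct IDs per Type for the total, which deduplicates the extra rows created by multiple linked meds.
-- Per-status counts: IV2PO yields 1 Accepted, 1 NULL, 1 Pending.
SELECT Type, Status, COUNT(DISTINCT ID) AS status_count
FROM ivents
GROUP BY Type, Status;
-- Per-type totals: IV2PO yields 3, Consult yields 1.
SELECT Type, COUNT(DISTINCT ID) AS total_count
FROM ivents
GROUP BY Type;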
