Can a join table be combined with another table? - database

The first image below is my database schema for a project that will use psql, ruby and active record.
While writing my schema, things got a bit complex. My "special_days" table ended up becoming a join table for "days_of_week" and "organizations". I'm assuming that this is not best practice and will end up causing me trouble.
In the second schema below, I made a separate join table for "days of week" and "organizations". My special_days table still needs to be associated with a day_of_week and an organization, so I think I have to keep the joining information in the special_days table. Is there a better way to do this? It seems that my second attempt is too repetitive.
These are my relationships:
days of week & organizations | many to many
city & organizations | one to many
organization & special days | one to many
day of week & special days | one to many

Some more information about the requirements would be useful. Are you required to define which days are holidays, which days are paydays, etc. (e.g. many organizations would define Sat and Sun as non-work days and many U.S. organizations would define July 4 as a non-work day)? I'm not sure what days_of_week represents. Is this a table with 7 records in it (M, T, W, Th, F, S, Sun)? If I'm guessing correctly at the requirements, it might be better to have a table called special_day that has a date column and a recurrence column (e.g. weekly, monthly, yearly, etc.). You could then have an organization_special_day table that is a many-to-many join table on organization and special_day.
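A minimal sketch of that suggestion, using Python's built-in sqlite3 module purely so it can be run anywhere (the question itself targets PostgreSQL and Active Record). Only special_day and organization_special_day come from the answer; the organization columns, the sample rows and the allowed recurrence values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE special_day (
    id         INTEGER PRIMARY KEY,
    day        TEXT NOT NULL,  -- ISO date, e.g. '2015-07-04'
    recurrence TEXT NOT NULL
        CHECK (recurrence IN ('none', 'weekly', 'monthly', 'yearly'))
);
-- Many-to-many join table between organization and special_day.
CREATE TABLE organization_special_day (
    organization_id INTEGER NOT NULL REFERENCES organization(id),
    special_day_id  INTEGER NOT NULL REFERENCES special_day(id),
    PRIMARY KEY (organization_id, special_day_id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO organization VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO special_day VALUES (?, ?, ?)",
                 [(1, "2015-07-04", "yearly"),   # July 4, repeats every year
                  (2, "2015-01-03", "weekly")])  # a Saturday, repeats weekly
conn.executemany("INSERT INTO organization_special_day VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def special_days_for(org_id):
    """Return (day, recurrence) pairs linked to one organization."""
    return conn.execute(
        """SELECT sd.day, sd.recurrence
           FROM special_day AS sd
           JOIN organization_special_day AS osd
             ON osd.special_day_id = sd.id
           WHERE osd.organization_id = ?
           ORDER BY sd.day""",
        (org_id,),
    ).fetchall()
```

With this layout a special day is defined once and shared by any number of organizations, so the join table carries no information beyond the pairing itself.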

Related

DynamoDB table design for business hours

I am trying to create a business hours application using DynamoDB.
I saw lots of examples and schema designs for different databases but just can't find the right table design for DynamoDB.
Here are my requirements:
every business should have default working hours (Monday 08:00 - 14:00, 16:00 - 20:00) and special events (26/11/2020 shop is closed / opened between 10:00 - 14:00 due to Thanksgiving)
every day can have multiple work durations (08:00 - 14:00, 16:00 - 20:00)
Those are the operations I need to allow:
Create / edit working hours for each business (including special events)
Check whether a business (or each business in a list) is open right now - by providing a list of business ids.
Get business (or list of businesses) working hours between 2 dates (for example 23/04/2020 - 25/04/2020) by providing a list of business ids and date range for each id
What I've tried:
Defined a table where business id is the partition key (HASH) and special dates / day of week is the sort key (RANGE).
The problem with this approach is that I cannot query the hours of multiple businesses at once unless I use the scan API, which is not recommended because it is an expensive operation.
Please advise what kind of table design I should use for this application.
You probably need to first construct your overarching logic outside of DynamoDB to decide whether a business is open or not, and only use queries in Dynamo for a subset of that logic.
Let's say, though, that we use DynamoDB for querying normal working hours only, and leave out logic like holidays and special cases; you can apply that as a filter after you access Dynamo. You can't construct one query in Dynamo to answer all your questions; that is more like what you can do in SQL.
So let's say we have a table (or subset of values) that relates to the normal working day. You have something like this:
Partition Key (PK): business, Range Key (RK): dayOfWeek, and attributes, opens & closes.
We can then create 2 GSIs:
PK dayOfWeek RK opens
PK dayOfWeek RK closes
Now we can do two queries to check whether a store is open between 3-4pm on Monday:
PK == MONDAY & opens <= 3 pm
PK == MONDAY & closes >= 4 pm
And collect only the values which appear in both queries.
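The "two queries, then intersect" idea can be sketched without touching DynamoDB at all. In the sketch below the two helper functions merely stand in for queries against the two GSIs; the item list, the function names and the "HH:MM" string encoding of times (which compares correctly as text) are all assumptions made for illustration.

```python
# Hypothetical items; in DynamoDB each would be one row keyed by
# (business, dayOfWeek) with opens/closes attributes.
items = [
    {"business": "bakery", "dayOfWeek": "MONDAY", "opens": "08:00", "closes": "14:00"},
    {"business": "cafe", "dayOfWeek": "MONDAY", "opens": "09:00", "closes": "18:00"},
    {"business": "bar", "dayOfWeek": "MONDAY", "opens": "17:00", "closes": "23:00"},
    {"business": "cafe", "dayOfWeek": "TUESDAY", "opens": "09:00", "closes": "18:00"},
]


def opened_by(day, time):
    """Stand-in for a query on the (dayOfWeek, opens) index."""
    return {i["business"] for i in items
            if i["dayOfWeek"] == day and i["opens"] <= time}


def closes_after(day, time):
    """Stand-in for a query on the (dayOfWeek, closes) index."""
    return {i["business"] for i in items
            if i["dayOfWeek"] == day and i["closes"] >= time}


def open_between(day, start, end):
    """Keep only businesses that appear in both result sets."""
    return opened_by(day, start) & closes_after(day, end)


open_monday_3_to_4 = open_between("MONDAY", "15:00", "16:00")
```

Here the bakery is dropped because it closes too early and the bar because it opens too late, leaving only the cafe. Holidays and split shifts would still need the post-filtering described above.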
Obviously though, having a PK of day is probably not a great idea, as you will only have 7 partitions. So what do you do? Well, you probably have more criteria in your query than simply the day, for example the type of store, the city the store is located in, etc. That would mean you would have a PK of something like: city-category-dayOfWeek.
Similarly on the sorting side, you might want higher rated stores to be the first option, so you might have something like: {rating}-{open} & {rating}-{closes}.
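As a rough illustration of such composite keys, the helpers below just concatenate the parts into strings. The function names, the sample values and the single-digit rating are assumptions; a real design would need to settle on padding and separators so that the string order matches the intended order.

```python
def partition_key(city, category, day_of_week):
    """Build a city-category-dayOfWeek style partition key."""
    return f"{city}-{category}-{day_of_week}"


def sort_key(rating, time):
    """Build a {rating}-{time} style sort key.

    Assumes a single-digit rating and zero-padded HH:MM times so that
    plain string comparison orders the keys sensibly.
    """
    return f"{rating}-{time}"


pk = partition_key("austin", "bakery", "MONDAY")

# Reading the keys in descending order puts higher ratings first.
keys = [sort_key(3, "09:00"), sort_key(5, "10:00"), sort_key(5, "08:00")]
best_first = sorted(keys, reverse=True)
```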
You will just have to get creative: first lay out all the queries you need, and only then design your tables. I really like this video on table design, it's terrific.

SQL Schema - merge data from two tables

I'm trying to create a schema that will allow me to define times, when a supplier website is non operational (planned not unplanned).
I've gone for non-operational as opposed to operational because many suppliers work 24/7, so non-operating times represent the least number of rows.
For example, a supplier might not work:
On a Sunday
On a recognised holiday date - '1/1/2015'
On a Saturday after 5pm
I'm not overly confident with SQL Server, but have come up with a schema that 'does the job'. However, as we all know, there are good ways, not so good ways, and bad ways, that all work in a fashion, so would appreciate comments and advice on what I have to date.
One of the key features is to use data from WorkingDays and Holidays together to represent a WorkingPeriod entity.
I would appreciate comments no matter how small.
Holiday
Contains all recognised holidays - Easter Monday, Good Friday etc.
HolidayDate
Contains dates of holidays. For instance, this year Easter Monday is 6th Apr 2015.
WorkingDay
Sunday through to Saturday, mapped to Asp.Net day of week enums.
WorkingPeriodType
A lookup table containing 2 rows - Holiday, or Day of Week
WorkingPeriod
Merges the Holiday table and the WorkingDay table to represent a single WorkingPeriod entity that can be used in the SupplierNonWorkingTimes table.
SupplierNonWorkingTimes
Contains the ID representing the WorkingDay/Holiday and the times of non-operation.
This is a very subjective question, as you've already observed there's no right and wrong, just different ways. I'm a database guy but I don't know your specific circumstances, so this is just some observations - you'll have to judge for yourself whether any of them are appropriate to you.
I like my naming to be crystal clear, it saves all the
misunderstanding by other people later on. If [WorkingDay] holds the
7 days of the week I would call it [WeekDay]. If you intend
[Holiday] to hold whole-day holidays I would call it [HolidayDay].
The main table [SupplierNonWorkingTime] is about 'non-working' so I
would call the [WorkingPeriod] table [NonWorkingPeriod]. The term
'period' always refers to a whole day, so I would replace 'period'
with 'day' (let's ignore start/stop time for now).
My first impression was that your design is over-normalised. The
[WorkingPeriodType] table has 2 rows that will never change,
[WorkingDay] has 7. For these very low numbers I sometimes prefer a
char(1) with a check constraint. Normalisation is generally good,
but lots of JOINs for trivial queries is not so good. You could
eliminate [WorkingPeriodType] and [WorkingDay] but you've mentioned
.Net enums in your question so if you've got some sort of ORM in
your .Net code this level of normalisation might be right for you.
I'd add a Year field to the [HolidayDate] table, then the PK
becomes a better HolidayID+Year - unless you know somewhere that has
lots of Christmases :)
I'd add an IsAllDay field to the [SupplierNonWorkingTime] table,
otherwise you have to use 'magic values' to represent 'all day' and
magic values are bad. There should be a check constraint to enforce
start/stop times can only be entered if IsAllDay = false.
Like I said, just my thoughts, hope it's helpful.
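The IsAllDay suggestion could look roughly like this. It is a sketch run against SQLite through Python's sqlite3 module rather than SQL Server, and the column names other than IsAllDay (SupplierId, WorkingPeriodId, StartTime, EndTime) are assumptions, since the original schema is not shown in full.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SupplierNonWorkingTime (
    SupplierId      INTEGER NOT NULL,
    WorkingPeriodId INTEGER NOT NULL,
    IsAllDay        INTEGER NOT NULL CHECK (IsAllDay IN (0, 1)),
    StartTime       TEXT,
    EndTime         TEXT,
    -- Times are allowed (and required) only when IsAllDay is false.
    CHECK (
        (IsAllDay = 1 AND StartTime IS NULL AND EndTime IS NULL)
        OR (IsAllDay = 0 AND StartTime IS NOT NULL AND EndTime IS NOT NULL)
    )
)
""")


def try_insert(row):
    """Insert one row; return False if a check constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO SupplierNonWorkingTime VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


all_day_ok = try_insert((1, 7, 1, None, None))         # closed all day
partial_ok = try_insert((1, 6, 0, "17:00", "23:59"))   # closed after 5pm
contradictory = try_insert((1, 5, 1, "09:00", "10:00"))  # all day AND times
stored = conn.execute(
    "SELECT COUNT(*) FROM SupplierNonWorkingTime").fetchone()[0]
```

The constraint removes the need for magic values such as 00:00 to 23:59, because an all-day row simply carries no times.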

Best practice database schema for property booking calendar

I am working on a booking system for multiple properties, and the best-practice schema design is giving me a headache. Assume the site hosts, for example, 5000 properties, each of which is maintained by one user. Each property has a booking calendar. My current implementation is a two-table system with one table for the available dates and the other for the unavailable dates, with a granularity of 1 day each.
property_dates_available (property_id, date);
property_dates_booked (property_id, date);
However, I am unsure whether this is a good solution. In another question I read about a single-table solution with both states represented, but I wonder if it is a good idea to mix them up. Also, should the booking calendar be mapped into the database table for a full year, with all of its 365 days, or would it be better to map only the days a property is available for booking? I am thinking of the dramatically increasing number of rows every year. I am also thinking of searching the database later for available properties, and I am not sure whether looking through 5000 * 365 rows might be a bad idea compared to, say, only 5000 * 100 rows on average.
What would you generally recommend? Is this aspect negligible? What is the best practice for implementing this?
I don't see why you need a separate table for available dates. If you have a table for booked dates (property_id, date), then you can easily query this table to find out which properties are available for a given date
select properties.property_name
from properties where not exists
(select 1 from property_dates_booked
where properties.property_id = property_dates_booked.property_id
and property_dates_booked.date = :date)
:date being a parameter to the query
Only enter actual bookings into the property_dates_booked table (it would be easier to rename the table 'bookings'). If a property is not available for certain dates because of maintenance, then enter a booking for those dates where the customer is 'special' (maybe the 'customer' has a negative id).
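To make the NOT EXISTS approach concrete, here is a small runnable sketch using Python's sqlite3 module. The table and column names follow the question and answer; the sample properties, the dates and the primary key on (property_id, date) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (
    property_id   INTEGER PRIMARY KEY,
    property_name TEXT NOT NULL
);
CREATE TABLE property_dates_booked (
    property_id INTEGER NOT NULL REFERENCES properties(property_id),
    date        TEXT NOT NULL,  -- ISO date, one row per booked day
    PRIMARY KEY (property_id, date)
);
""")
conn.executemany("INSERT INTO properties VALUES (?, ?)",
                 [(1, "Beach house"), (2, "City flat"), (3, "Cabin")])
conn.executemany("INSERT INTO property_dates_booked VALUES (?, ?)",
                 [(1, "2020-07-01"), (1, "2020-07-02"), (3, "2020-07-02")])


def available_on(date):
    """Names of properties with no booking row for the given date."""
    rows = conn.execute(
        """SELECT properties.property_name
           FROM properties
           WHERE NOT EXISTS (
               SELECT 1
               FROM property_dates_booked
               WHERE property_dates_booked.property_id = properties.property_id
                 AND property_dates_booked.date = :date)
           ORDER BY properties.property_name""",
        {"date": date},
    ).fetchall()
    return [name for (name,) in rows]
```

Only booked days are stored, so the table grows with actual bookings rather than with 365 rows per property per year, and the primary key doubles as the index the subquery needs.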

Database Optimization - Store each day in a different column to reduce rows

I'm writing an application that stores different types of records by user and day. These records are divided in categories.
When designing the database, we create a table User and then for each record type we create a table RecordType and a table Record.
Example:
To store data related to user events we have the following tables:
Event          EventType
-----          ---------
UserId         Id
EventTypeId    Name
Value
Day
Our boss pointed out (with some reason) that we're gonna store a lot of rows ( Users * Days ) and suggested an idea that seems a little crazy to me: Create a table with a column for each day of the year, like so:
EventTypeId | UserId | Year | 1 | 2 | 3 | 4 | ... | 365 | 366
This way we only have 1 row per user per year, but we're gonna get pretty big rows.
Since most ORMs (we're going with rails3 for this project) use select * to get the database records, aren't we optimizing something to "deoptimize" another?
What are the community's thoughts about this?
This is a violation of First Normal Form. It's an example of repeating groups across columns.
Example of why this is bad: Write a query to find which day a given event occurred. You'll need a WHERE clause with 366 terms, separated by OR. This is tedious to write, and impossible to index.
Relational databases are designed to work well even if you have a lot of rows. Say you have 10000 users, and on average every user generates 10 events every day. After 10 years, you will have 10000*366*10*10 rows, or 366,000,000 rows. That's a fairly large database, but not uncommon.
If you design your indexes carefully to match the queries you run against this data, you should be able to have good performance for a long time. You should also have a strategy for partitioning or archiving old data.
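For comparison, here is a sketch of the normalized one-row-per-event layout with an index chosen to match the "which days did this event occur" query. It runs on SQLite via Python's sqlite3 module; the index name and the sample rows are assumptions, and the column names follow the Event table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Event (
    UserId      INTEGER NOT NULL,
    EventTypeId INTEGER NOT NULL,
    Value       INTEGER NOT NULL,
    Day         TEXT NOT NULL  -- ISO date
);
-- Index matched to lookups by user and event type, ordered by day.
CREATE INDEX idx_event_user_type_day ON Event (UserId, EventTypeId, Day);
""")
conn.executemany("INSERT INTO Event VALUES (?, ?, ?, ?)", [
    (1, 2, 5, "2011-03-01"),
    (1, 2, 1, "2011-03-09"),
    (1, 3, 4, "2011-03-09"),
    (2, 2, 8, "2011-03-01"),
])


def days_with_event(user_id, event_type_id):
    """Which days did this event type occur for this user?"""
    rows = conn.execute(
        "SELECT Day FROM Event WHERE UserId = ? AND EventTypeId = ? ORDER BY Day",
        (user_id, event_type_id),
    ).fetchall()
    return [day for (day,) in rows]


def events_between(user_id, first_day, last_day):
    """Date-range search: one BETWEEN instead of one term per day column."""
    return conn.execute(
        "SELECT COUNT(*) FROM Event WHERE UserId = ? AND Day BETWEEN ? AND ?",
        (user_id, first_day, last_day),
    ).fetchone()[0]


days = days_with_event(1, 2)

# Ask SQLite how it plans to run the lookup (useful when tuning indexes).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Day FROM Event WHERE UserId = ? AND EventTypeId = ?",
    (1, 2),
).fetchall()
```

Both queries are a single predicate on the Day column, which is exactly what becomes a 366-term OR in the one-column-per-day design.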
That breaks the database normal form principles:
http://databases.about.com/od/specificproducts/a/normalization.htm
If it's applicable, why don't you replace the Day columns with a single DateTime column in your Event table, with a default value (GetDate(), if we are talking about SQL Server)?
Then you could group by date ...
I wouldn't do it. As long as you take the time to index the table appropriately, the database server should work well with tables that have lots of rows. If it's significantly slowing down your database performance, I'd start by making sure your queries aren't forcing a lot of full table scans.
Some other potential problems I see:
It probably will hurt ORM performance.
It's going to create maintainability problems on down the road. You probably don't want to be working with objects that have 366 fields for every day of the year, so there's probably going to have to be a lot of boilerplate glue code to keep track of.
Any query that wants to search against a range of dates is going to be an unholy mess.
It could be even more wasteful of space. These rows are big, and the number of rows you have to create for each customer is going to be the sum of the maximum number of times each different kind of event happened in a single day. Unless the rate at which all of these events happens is very constant and regular, those rows are likely to be mostly empty.
If anything, I'd suggest sharding the table based on some other column instead if you really do need to get the table size down. Perhaps by UserId or year?

storing weekly targets in database

I have the following requirement:
Sales Officer: Bob
        Week1  Week2  Week3  .................  Week52
Prod1   10     15     12     .................  14
Prod2   20     14     10     .................  17
.
.
.
Sales supervisor will set the targets for each sales officer on weekly basis.
Sales officer may enter actual sales on daily basis for each product through a similar grid against the set targets e.g.
Edit
In the above case Supervisor has set target of 10 units for week 1 Now the sales Officer will enter the sales on daily basis as 1,2,0,1,3,2=9(Actual Sale for Week 1) so against the target of 10 unit he has sold 9 units in week one.
I have already created the Employee and Product tables. Can anyone advise on the best practice for storing days and weeks in the database, so that targets can be stored against them and actual sales can be recorded?
I am thinking storing data in following table
EmpSales (EmployeeID,ProductID,SaleTarget,Actual Sale,Date,WeekNo,Month)
Thanks in advance
This one is really easy in pure Relational modelling terms. I do not see the need for "denormalisation" of any kind.
Sales Data Model.
If you are unfamiliar with the Standard for modelling Relational databases, the IDEF1X Notation may be helpful.
Pure 5NF; full Declarative referential Integrity; no Nulls, no Update Anomalies; no GROUP BYs; pure Date arithmetic.
The SaleTarget is compared against SaleActual by projection, and may be in the same result set.
If you have Monthly and Annual Sales accounting, the extension required is a common calendar table with a bit of control or structure; eg. similar to Week, including rows for each Month and Year. Just let me know, and I will update the model.
I say 5NF because that is the minimum I provide in order to eliminate Update Anomalies, and most modellers are familiar with it. But if it does not scare you off, the two Sales tables are actually Sixth Normal Form.
This allows full Pivoting (weeks or months across the top; Products or Employees down the side; vice versa; any combination) without temporary tables or complex SQL. (Just ask.)
I think it may even be self-explanatory, but I will supply the Verb Phrases which spell out the Business Rules, only because there are three Parents involved in each:
Each Employee is scheduled SaleTarget of Product for Week
Each Product is scheduled SaleTarget By Employee for Week
Each Employee did SaleActual of Product on Day
Each Product did SaleActual by Employee on Day
Comparison
I should have mentioned. Notice there is no vertical (rows) or horizontal (columns) duplication. When columns are duplicated eg, StartDate and EndDate, you have broken 3NF (introduced Functional Dependencies), and introduced an Update Anomaly. The EndDate in any row, is the StartDate in the next row (that, minus 1 second counts as a dupe, is a contrivance); when updating, now two rows instead of one have to be changed. More important, this structure is so simple (it is not a Time Series, or "temporal" requirement), the EndDate is not required.
Response to Comments
The Data Model has been updated to include Month and Year requirements. You now need a Check Constraint on SaleTarget to ensure that DateType is W for week. Loading the Date table is simple, you do not need the nonsense code (manually repeated cut-and-paste) that is posted on SQLTeam; they are famous for being stupid and sub-standard.
The SaleActual table now contains Daily, Weekly, Monthly, and Annual values, which of course you summarise programmatically on the first day of each Week, Month, and Year. First add the new row to Date.
5NF is pretty much the minimum required for standard compliance these days, so you need to get used to it. Basically there was a lot of argument among the academics (plus places like Wikipedia posting completely incorrect entries) about the NFs between 3NF and 5NF. The short and sweet definition of 5NF is that it is what 3NF was intended to be, with zero data duplication and zero Update Anomalies (no duplicated columns to be updated transactionally).
Forget about 6NF for now. Any table that is in 6NF is in 5NF (and 4NF and BCNF and 3NF). Just treat the two Sales tables as 5NF. When you have to write a pivoted report, say a year from now, that's when you will realise the value of this structure.
I personally would store the targets and actuals in separate rows, and most probably in separate tables:
Targets:
EmployeeId, PeriodId, ProductId, TargetValue
Sales:
EmployeeId, PeriodId, ProductId, SalesValue
In fact, in an integrated system, the second table is usually unnecessary (assuming that you have a complete sales recording system, this should be a projection/view of the actual recorded sales - with appropriate assignment of employee, period and product based on the model of that subsystem).
In order to fit your calendar requirements, I would almost certainly have a date table which will allow you to ensure all your various business rules for definitions of weeks and months without complex date logic. Determining periods and aggregating is then just facilitated with joins to the calendar table.
So the ActualSales would look something like this (with just a generic Period table, which might itself be a period and date table):
SELECT sp.EmployeeId
, p.ProductId
, pd.PeriodType
, pd.PeriodId
, SUM(id.Quantity * id.UnitPrice) AS TotalSales
FROM Invoice AS i
INNER JOIN InvoiceDetail AS id
ON id.InvoiceId = i.InvoiceId
INNER JOIN Employee AS sp
ON sp.EmployeeId = i.SalesPersonId
INNER JOIN Product AS p
ON id.ProductId = p.ProductId
INNER JOIN Period AS pd
ON pd.StartDate <= i.InvoiceDate
AND pd.EndDate > i.InvoiceDate
GROUP BY sp.EmployeeId, p.ProductId, pd.PeriodType, pd.PeriodId
In this case, data would be duplicated if you had overlapping periods (like daily, weekly, monthly), so you would need to aggregate ONLY one type of period - that's why I've specifically included it in this example view although it's redundant here.
I expect a generic Period table would look like:
PeriodId
PeriodType
StartDate
EndDate
This would be prepopulated with the various periods you want to report on:
'Q', 1/1/2010, 4/1/2010
'M', 1/1/2010, 2/1/2010
'M', 2/1/2010, 3/1/2010
'M', 3/1/2010, 4/1/2010
'W', 1/3/2010, 1/10/2010
'W', 1/10/2010, 1/17/2010
etc.
'D', 1/1/2010, 1/2/2010
'D', 1/2/2010, 1/3/2010
etc.
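A compact way to see why only one period type should be aggregated at a time is to run the join on a few rows. The sketch below uses Python's sqlite3 module and a simplified Sale table in place of Invoice and InvoiceDetail; that table, the sample amounts and the function names are assumptions. Periods are half-open (StartDate inclusive, EndDate exclusive), as in the join condition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Period (
    PeriodId   INTEGER PRIMARY KEY,
    PeriodType TEXT NOT NULL,  -- 'M' month, 'W' week, ...
    StartDate  TEXT NOT NULL,  -- inclusive, ISO date
    EndDate    TEXT NOT NULL   -- exclusive, ISO date
);
CREATE TABLE Sale (
    EmployeeId INTEGER NOT NULL,
    ProductId  INTEGER NOT NULL,
    SaleDate   TEXT NOT NULL,
    Amount     INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO Period VALUES (?, ?, ?, ?)", [
    (1, "M", "2010-01-01", "2010-02-01"),
    (2, "M", "2010-02-01", "2010-03-01"),
    (3, "W", "2010-01-03", "2010-01-10"),
    (4, "W", "2010-01-10", "2010-01-17"),
])
conn.executemany("INSERT INTO Sale VALUES (?, ?, ?, ?)", [
    (1, 10, "2010-01-04", 5),
    (1, 10, "2010-01-12", 7),
    (1, 10, "2010-02-01", 3),
])


def totals_by(period_type):
    """Total sales per period, restricted to a single period type."""
    return conn.execute(
        """SELECT pd.PeriodId, SUM(s.Amount)
           FROM Sale AS s
           JOIN Period AS pd
             ON pd.StartDate <= s.SaleDate AND pd.EndDate > s.SaleDate
           WHERE pd.PeriodType = ?
           GROUP BY pd.PeriodId
           ORDER BY pd.PeriodId""",
        (period_type,),
    ).fetchall()


def total_across_all_period_types():
    """What happens without the filter: overlapping periods double count."""
    return conn.execute(
        """SELECT SUM(s.Amount)
           FROM Sale AS s
           JOIN Period AS pd
             ON pd.StartDate <= s.SaleDate AND pd.EndDate > s.SaleDate"""
    ).fetchone()[0]


monthly = totals_by("M")
weekly = totals_by("W")
```

The real sales total here is 15, but joining against every period type at once counts the January sales once for their month and again for their week.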
It makes very little sense to worry about holidays except that you probably aren't going to assign a target if they aren't working and this is mainly about managing the assignments so that they are presumably realistic. You can have a calendar table of days with various flags
Calendar
DateId
Date
IsHoliday
Then you can include that when you join to count the number of holidays/weekends in a period etc.
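A sketch of such a calendar table, populated for January 2010 and queried for working days in a period. It uses Python's sqlite3 module; the IsWeekend flag, the choice of 1 January as the only holiday and the function names are assumptions added for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Calendar (
    DateId    INTEGER PRIMARY KEY,
    Date      TEXT NOT NULL UNIQUE,  -- ISO date
    IsHoliday INTEGER NOT NULL DEFAULT 0,
    IsWeekend INTEGER NOT NULL DEFAULT 0
)
""")

# Populate one row per day for January 2010.
first = date(2010, 1, 1)
rows = []
for offset in range(31):
    d = first + timedelta(days=offset)
    is_holiday = 1 if d == date(2010, 1, 1) else 0
    is_weekend = 1 if d.weekday() >= 5 else 0  # Saturday=5, Sunday=6
    rows.append((offset + 1, d.isoformat(), is_holiday, is_weekend))
conn.executemany("INSERT INTO Calendar VALUES (?, ?, ?, ?)", rows)


def holidays_in(start, end):
    """Count holidays in the half-open range [start, end)."""
    return conn.execute(
        "SELECT COUNT(*) FROM Calendar WHERE Date >= ? AND Date < ? AND IsHoliday = 1",
        (start, end),
    ).fetchone()[0]


def working_days(start, end):
    """Count days in [start, end) that are neither holiday nor weekend."""
    return conn.execute(
        """SELECT COUNT(*) FROM Calendar
           WHERE Date >= ? AND Date < ?
             AND IsHoliday = 0 AND IsWeekend = 0""",
        (start, end),
    ).fetchone()[0]
```

A count like working_days could then be used to scale a weekly target for short weeks, without any date arithmetic in the query itself.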
This is typically an accounting/business thing, but you may want to look into standardizing your calendar. For instance, in media buys for TV advertising, they make each "quarter" equal and make each "month" standardized - 4 weeks, 4 weeks, 5 weeks. Obviously they make exceptions for holiday and special TV events, but this helps to smooth out the accounting and compare like periods more easily.
Personally I would go for a more generic "period" table.
period(periodId,startDate,endDate,weekNo,Month,Year)
and then add
empSales(EmployeeId,ProductId,SaleTarget,ActualSale,periodId).
This is a bit more flexible (you can easily introduce different time spans and either make the relative week field null, or define some rule that maps the period on a "standard" week), there is less redundancy (note how the month and week have been moved away from the empSales table) and it allows you to do reporting and calculation (btw, you didn't include a Year field, is there a reason?).
Tallying up stuff should be easier, because assuming you have sales sorted by day, summing these up between intervals is easier unless you want to duplicate the "week" field all over the DB.
Note also that you can easily have targets on different, overlapping periods.
For example, you can set a weekly target for the week of 22-28 November (I am using the European convention of having the week start on Monday) and have a special one-day period set on Black Friday.
So:
Period:
periodId|startDate |endDate |weekNo|Month | Year|
0030020|22-NOV-2010|28-NOV-2010| 47 |November | 2010 |
0030026|26-NOV-2010|26-NOV-2010| null |November | 2010 |
empSales:
EmployeeId|ProductId|SaleTarget|ActualSale|periodId|
567689| 788585| 58 | 42 | 0030020|
567689| 788585| 28 | 32 | 0030026|
Note how Employee 567689 missed his weekly target but managed to go over his Black Friday target.
Btw, while working on this example I think you better drop the "empSales" table, renaming it to "empTargets":
empTargets(EmployeeId,ProductId,SaleTarget,periodId).
because the Actual Sales is easily calculated on the fly either with a UDF or placed in a view - after all, it's just a
select sum(items_sold)
from sales
where sales.employeeId = empTargets.employeeId and
sales.ProductId = empTargets.ProductId and
sales.saleDate between period.startDate and
period.endDate
so no need to store it directly in the table (in fact it could become a burden in case of returned items or other future corrections).
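One way to place that calculation in a view is sketched below with Python's sqlite3 module. The period and empTargets columns follow the answer, and the two targets reuse the numbers from the example above. The sales table layout, the view name targetVsActual, the individual sale rows and the use of plain integers for periodId are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period (
    periodId  INTEGER PRIMARY KEY,
    startDate TEXT NOT NULL,  -- inclusive, ISO date
    endDate   TEXT NOT NULL   -- inclusive, ISO date
);
CREATE TABLE empTargets (
    EmployeeId INTEGER NOT NULL,
    ProductId  INTEGER NOT NULL,
    SaleTarget INTEGER NOT NULL,
    periodId   INTEGER NOT NULL REFERENCES period(periodId)
);
CREATE TABLE sales (
    employeeId INTEGER NOT NULL,
    ProductId  INTEGER NOT NULL,
    saleDate   TEXT NOT NULL,
    items_sold INTEGER NOT NULL
);
-- Actual sales are derived per target row instead of being stored.
CREATE VIEW targetVsActual AS
SELECT t.EmployeeId, t.ProductId, t.periodId, t.SaleTarget,
       COALESCE((SELECT SUM(s.items_sold)
                 FROM sales AS s
                 WHERE s.employeeId = t.EmployeeId
                   AND s.ProductId = t.ProductId
                   AND s.saleDate BETWEEN p.startDate AND p.endDate), 0)
           AS ActualSale
FROM empTargets AS t
JOIN period AS p ON p.periodId = t.periodId;
""")
conn.executemany("INSERT INTO period VALUES (?, ?, ?)", [
    (30020, "2010-11-22", "2010-11-28"),  # the week
    (30026, "2010-11-26", "2010-11-26"),  # Black Friday only
])
conn.executemany("INSERT INTO empTargets VALUES (?, ?, ?, ?)", [
    (567689, 788585, 58, 30020),
    (567689, 788585, 28, 30026),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (567689, 788585, "2010-11-23", 10),
    (567689, 788585, "2010-11-26", 32),
])

report = conn.execute(
    "SELECT periodId, SaleTarget, ActualSale FROM targetVsActual ORDER BY periodId"
).fetchall()
```

Because the actual figure is computed from the sales rows, a later correction such as a returned item only has to be recorded once in sales.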
