Store weekdays and time-of-day table in a database

I have the following data (this is not a real table in the database; it's just a group of information I need to store with each post):
X = yes / true
O = no / false
Weekday   | Morning | Day | Evening | Night
----------|---------|-----|---------|------
Monday    | X       | O   | O       | X
Tuesday   | X       | O   | O       | X
Wednesday | O       | O   | X       | X
Thursday  | O       | X   | O       | O
Friday    | X       | X   | X       | X
Saturday  | O       | O   | X       | O
Sunday    | X       | X   | X       | O
How should I store data like this in a database? I'm not too experienced with database design, and all the approaches I could think of waste a lot of space. Normalization is not a requirement here.
I don't need to query by this data; I just need to store it efficiently with its parent object/entity.

From what you've said, I'd create a table with 4 columns (Morning, Day, Evening, Night) plus a primary key to reference the individual datasets (like the one you've shown). Then I'd use a bitfield for each row entry: Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8, Friday = 16, Saturday = 32, and Sunday = 64.
The dataset you provided (here as PK 001) could be saved in a single row as:
PK  | Morning | Day | Evening | Night
----|---------|-----|---------|------
001 | 83      | 88  | 116     | 23
The morning value is 83, because Monday (1) + Tues (2) + Friday (16) + Sunday (64) = 83.
Each column only ever needs to hold a value from 0 to 127 (one bit per weekday), so a one-byte integer is enough. The exact datatype depends on which database you use, but many have a TINYINT or similar small integer type that would work.
You would then use the bitwise & operator to test if a day is represented by a particular value:
83 (Morning Value) & 16 (Friday) = 16 (Friday, therefore true)
88 (Day value) & 4 (Wednesday) = 0 (therefore false)
116 (Evening value) & 32 (Saturday) = 32 (Saturday, therefore true)
Alternatively, you could create a bitfield column for each Day of the Week, and the value would be Morning = 1, Day = 2, Evening = 4, and Night = 8.
Your given dataset would be represented as a single row in the database as:
PK  | Mon | Tue | Wed | Thu | Fri | Sat | Sun
----|-----|-----|-----|-----|-----|-----|----
001 | 9   | 9   | 12  | 2   | 15  | 4   | 7
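For concreteness, here is a minimal sketch of the first layout (SQL Server syntax; the table and column names are my own illustrations, not anything prescribed):

-- One row of schedule flags per post; each period column packs 7 weekday bits
-- (Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8, Friday = 16,
-- Saturday = 32, Sunday = 64).
CREATE TABLE PostSchedule (
    PostId  INT PRIMARY KEY,
    Morning TINYINT NOT NULL,   -- holds 0..127
    [Day]   TINYINT NOT NULL,
    Evening TINYINT NOT NULL,
    Night   TINYINT NOT NULL
);

-- The example dataset above, stored as a single row.
INSERT INTO PostSchedule (PostId, Morning, [Day], Evening, Night)
VALUES (1, 83, 88, 116, 23);

-- Test a single flag with bitwise AND: is the post active on Friday morning?
SELECT CASE WHEN Morning & 16 = 16 THEN 'true' ELSE 'false' END AS FridayMorning
FROM PostSchedule
WHERE PostId = 1;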

You may think of creating a table with 28 columns, each group of 4 columns corresponding to the 4 periods of one day. Something like:
CREATE TABLE <name> (
    SUNDAY_MORNING  VARCHAR(1),
    SUNDAY_DAY      VARCHAR(1),
    SUNDAY_EVENING  VARCHAR(1),
    SUNDAY_NIGHT    VARCHAR(1),
    MONDAY_MORNING  VARCHAR(1),
    MONDAY_DAY      VARCHAR(1),
    MONDAY_EVENING  VARCHAR(1),
    MONDAY_NIGHT    VARCHAR(1),
    ...
);
Of course, KEYs may need to be defined, but it is not possible to suggest anything based on the provided info.
As for your concern about SPACE, this structure will use MUCH LESS space than you might imagine.

Data warehouse design - periodic snapshot with frequently changing dimension keys

Imagine a fact table with a summation of measures over a time period, say 1 hour.
Start Date          | Measure 1 | Measure 2
--------------------|-----------|----------
2018-09-08 00:00:00 | 5         | 10
2018-09-08 01:00:00 | 12        | 20
Ideally we want to maintain the grain such that each row is exactly 1 hour. However, each row references dimensions which might ‘break’ the grain. For instance:
Start Date          | Measure 1 | Measure 2 | Dim 1
--------------------|-----------|-----------|------
2018-09-08 00:00:00 | 5         | 10        | val 1
2018-09-08 01:00:00 | 12        | 20        | val 2
It is possible that the dimension value may change 30 minutes into the hour in which case, the above would be inaccurate and should be represented like this:
Start Date          | Measure 1 | Measure 2 | Dim 1
--------------------|-----------|-----------|------
2018-09-08 00:00:00 | 5         | 10        | val 1
2018-09-08 00:30:00 | 5         | 10        | val 2
2018-09-08 01:00:00 | 12        | 20        | val 2
In our scenario, the data needs to be sliced by at least 5 dimension keys with queries like:
sum(measure1) where dim1 = x and dim2 = y..
Is there a design pattern for this requirement? I have considered ‘periodic snapshots’ but I have not read anywhere about this kind of row splitting on dimension changes.
I can see only two options:
Option 1: Store the dimension value that was most present on each row (e.g. if a dimension value was true for the majority of the hour, use that value). This would lead to some loss of accuracy.
Option 2: Split each row on every dimension change. This is complex in the ETL, creates more data, and breaks the granularity rule in the fact table.
Option 2 is the current solution and serves the purpose but is harder to maintain. Is there a better way to do this, or other options?
By way of a real example, this system records production data in a manufacturing environment, so the data is something like:
Line   | Date                | Crew   | Product   | Running Time (mins)
-------|---------------------|--------|-----------|--------------------
Line 1 | 2018-09-08 00:00:00 | Crew A | Product A | 60
As noted, the crew, product, or any of the other dimensions may change multiple times within the hour.
You shouldn't need to split the time portion of your fact table, since you clearly want to report hourly data, but you should have two records for that hour, one for each dimension value. If this is an aggregate of a transactional fact table, the process that loads the hourly table should group each record by each dimension key. So in your example above, you would have records like so:
Start Date          | Measure 1 | Measure 2 | Dim 1
--------------------|-----------|-----------|------
2018-09-08 00:00:00 | 5         | 10        | val 1
2018-09-08 01:00:00 | 5         | 10        | val 1
2018-09-08 01:00:00 | 12        | 10        | val 2
You will need to take into account the other measures as well and make sure they all go into the correct bucket (val 1 or val 2). I split them evenly in the example.
Now if you slice by hour 1 and by Dim 1 Value 2, you will only see 12 (measure 1), and if you slice on hour 1, dim 1 value 1, you will only see 5, and if you only slice on hour 1, you will see 17.
Remember, your grain is defined by the level of each dimension, not just the time dimension. HTH.
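As a rough sketch of that load step (the table and column names below are illustrative assumptions, not from the original question), grouping the transactional rows by hour and by every dimension key yields one hourly row per distinct dimension combination:

-- Aggregate a transactional fact table into the hourly snapshot,
-- one row per hour per combination of dimension keys (SQL Server syntax).
INSERT INTO fact_hourly (start_hour, dim1_key, dim2_key, measure1, measure2)
SELECT
    DATEADD(HOUR, DATEDIFF(HOUR, 0, event_time), 0) AS start_hour,  -- truncate to the hour
    dim1_key,
    dim2_key,
    SUM(measure1) AS measure1,
    SUM(measure2) AS measure2
FROM fact_transactions
GROUP BY
    DATEADD(HOUR, DATEDIFF(HOUR, 0, event_time), 0),
    dim1_key,
    dim2_key;

A slicing query like sum(measure1) where dim1 = x then remains correct, because each hour's total is simply the sum of that hour's per-dimension rows.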

SSRS 'where clause'

I've got a table that contains sales information for several companies. Each sales transaction a company makes is stored in the table, along with the week of the year (1-52) in which the sale took place. Here's a small example of the database table that I'm querying to produce the SSRS report.
Company   | Week | Sales_Transaction
----------|------|------------------
Alpha     | 20   | 1.00
Alpha     | 20   | 2.00
Beta      | 20   | 9.00
Alpha     | 21   | 5.00
Coolbeans | 21   | 5.50
Alpha     | 22   | 2.00
Alpha     | 22   | 2.00
Coolbeans | 22   | 3.00
I have a matrix with a row group which produces a line in the matrix for each company. The matrix has 52 additional columns for each week of the year. Here's a condensed version of the matrix and data I want to see.
Company   | # Sales Wk 20 | # Sales Wk 21 | # Sales Wk 22
----------|---------------|---------------|--------------
Alpha     | 2             | 1             | 2
Beta      | 1             | 0             | 0
Coolbeans | 0             | 1             | 1
To count the number of sales transactions for each week for each company, I'm using an expression like this for each column:
=Count(IIF(Fields!Sales_Week_Number.Value = "20", Fields!Sales.Value, 0))
Using the example expression above (placed in the # Sales Wk 20 matrix column), the problem is that instead of counting ONLY the transactions that occurred in week 20, it counts transactions for all weeks for the company: Count counts every non-Nothing value, and the 0 returned by the false branch of the IIF is still counted. The result is that column # Sales Wk 20 shows a 5 for Alpha, a 1 for Beta, and a 2 for Coolbeans.
What do I need to do to make it only count the sales transaction from the specific week?
Side Note: Regarding the 52 columns for each week of the year, I intentionally did not use a column group for this b/c I need to do some other calculations/comparisons with another matrix which doesn't play nice when column groups are used. I did, however, use a row group for the companies.
Your expression should use SUM instead of Count, adding 1 only for rows in the target week:
=SUM(IIF(Fields!Sales_Week_Number.Value = "20", 1, 0))
I think you may be going down the wrong path here. Since you're using a matrix in SSRS, the easiest way is to let SSRS handle the separation for you rather than building a WHERE clause.
Try just adding =CountRows() as part of your formula, and SSRS handles the grouping for you. I'll check the format of the command when I'm properly online rather than on my phone.
Use this expression in your matrix's value column:
=IIf(Fields!Sales_Transaction.Value > 0, Count(Fields!Sales_Transaction.Value), 0)
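If you can change the dataset query instead, the same per-week counts can be computed with conditional aggregation in SQL before the data ever reaches SSRS (a sketch; I'm assuming a source table named Sales with the columns shown above):

-- One row per company, one column per week, counting transactions.
SELECT
    Company,
    SUM(CASE WHEN Week = 20 THEN 1 ELSE 0 END) AS SalesWk20,
    SUM(CASE WHEN Week = 21 THEN 1 ELSE 0 END) AS SalesWk21,
    SUM(CASE WHEN Week = 22 THEN 1 ELSE 0 END) AS SalesWk22
FROM Sales
GROUP BY Company;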

Using arithmetic in SQL on my own columns to fill a third column where it is zero (complicated, only when certain criteria are met)

So here is my question. Brace yourself as it takes some thinking just to wrap your head around what I am trying to do.
I'm working with Quarterly Census of Employment and Wages (QCEW) data. QCEW data has something called suppression codes. If a data denomination (the data comes as overall, location quotient, and over-the-year, for each year and quarter) is suppressed, then all the data for that denomination is zero. I have my table set up in the following way (only showing the columns that are relevant for the question):
a County_ID column,
an Industry_ID column,
a Year column,
a Qtr column,
a Suppressed column (0 for not suppressed, 1 for suppressed),
a Data_Category column (1 for overall, 2 for location quotient, 3 for over-the-year),
a Data_Denomination column (1-8, indicating which specific figure is being looked at in that category, e.g. monthly employment, taxable wage, etc.),
and a Value column (which will be zero if the Data_Category is suppressed, since all the data denomination values will be zero).
Now, if overall data (cat 1) for, say, 1991 quarter 1 is suppressed, but the next year's quarter 1 has both overall and over-the-year (cats 1 and 3) NOT suppressed, then we can infer what the value would be for that first year's suppressed data, since OTY1991q1 = (Overall1991q1 - Overall1990q1). So to find that suppressed data we would just subtract our cat 1 (denom 1-8) values from our cat 3 (denom 1-8) values to replace the zeroes that are in our suppressed values from the year before. It's fairly easy to grasp mathematically; the difficulty is that there are millions of rows on which to check these criteria. I'm trying to write some kind of SQL query that would do this for me: check that overall qtr-n is suppressed, then look to see whether the next year is unsuppressed for both overall and OTY (maybe in some sort of complicated CASE statement?), and if those criteria are met, perform the arithmetic for the two Data_Cat/Data_Denom categories and replace the zero in the respective Cat-Denom values.
Below is a simple sample (non-relevant data_cats removed) that I hope will help get what I'm trying to do across.
CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value
---------|------------|------|-----|------------|----------|------------|------
5        | 10         | 1990 | 1   | 1          | 1        | 1          | 0
5        | 10         | 1990 | 1   | 1          | 1        | 2          | 0
5        | 10         | 1990 | 1   | 1          | 1        | 3          | 0
5        | 10         | 1991 | 1   | 0          | 1        | 1          | 5
5        | 10         | 1991 | 1   | 0          | 1        | 2          | 15
5        | 10         | 1991 | 1   | 0          | 1        | 3          | 25
5        | 10         | 1991 | 1   | 0          | 3        | 1          | 20
5        | 10         | 1991 | 1   | 0          | 3        | 2          | 20
5        | 10         | 1991 | 1   | 0          | 3        | 3          | 35
So basically what we're trying to do here is take the over-the-year data (I removed lq, data_cat 2, because it isn't relevant) for each data_denom (which I've narrowed down from 8 to 3 for simplicity) in 1991 and subtract the corresponding overall 1991 value; that gives you the applicable value for the previous year's suppressed 1990 cat 1 rows. So here data_cat 1 data_denom 1 would be 15 (20 - 5), denom 2 would be 5 (20 - 15), and denom 3 would be 10 (35 - 25); (OTY 1991q1 - Overall 1991q1) = Overall 1990q1. I hope this helps. Like I said, the problem isn't the math, it's formulating a query that will check these criteria millions and millions of times.
If you want to find suppressed data that has 2 rows of unsuppressed data for the next year and quarter, we could use cross apply() to do something like this:
test setup: http://rextester.com/ORNCFR23551
Using cross apply() to return rows with a valid derived value:
select t.*
     , NewValue = cat3.value - cat1.value
from t
cross apply (
    select i.value
    from t as i
    where i.CountyID   = t.CountyID
      and i.IndustryID = t.IndustryID
      and i.Data_Denom = t.Data_Denom
      and i.Year       = t.Year + 1
      and i.Qtr        = t.Qtr
      and i.Suppressed = 0
      and i.Data_Cat   = 1
    ) cat1
cross apply (
    select i.value
    from t as i
    where i.CountyID   = t.CountyID
      and i.IndustryID = t.IndustryID
      and i.Data_Denom = t.Data_Denom
      and i.Year       = t.Year + 1
      and i.Qtr        = t.Qtr
      and i.Suppressed = 0
      and i.Data_Cat   = 3
    ) cat3
where t.Suppressed = 1
  and t.Data_Cat = 1
returns:
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| 5 | 10 | 1990 | 1 | 1 | 1 | 1 | 0 | 15 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 2 | 0 | 5 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 3 | 0 | 10 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
Using outer apply() to return all rows
select t.*
     , NewValue = coalesce(nullif(t.value, 0), cat3.value - cat1.value, 0)
from t
outer apply (
    select i.value
    from t as i
    where i.CountyID   = t.CountyID
      and i.IndustryID = t.IndustryID
      and i.Data_Denom = t.Data_Denom
      and i.Year       = t.Year + 1
      and i.Qtr        = t.Qtr
      and i.Suppressed = 0
      and i.Data_Cat   = 1
    ) cat1
outer apply (
    select i.value
    from t as i
    where i.CountyID   = t.CountyID
      and i.IndustryID = t.IndustryID
      and i.Data_Denom = t.Data_Denom
      and i.Year       = t.Year + 1
      and i.Qtr        = t.Qtr
      and i.Suppressed = 0
      and i.Data_Cat   = 3
    ) cat3
returns:
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| 5 | 10 | 1990 | 1 | 1 | 1 | 1 | 0 | 15 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 2 | 0 | 5 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 3 | 0 | 10 |
| 5 | 10 | 1991 | 1 | 0 | 1 | 1 | 5 | 5 |
| 5 | 10 | 1991 | 1 | 0 | 1 | 2 | 15 | 15 |
| 5 | 10 | 1991 | 1 | 0 | 1 | 3 | 25 | 25 |
| 5 | 10 | 1991 | 1 | 0 | 3 | 1 | 20 | 20 |
| 5 | 10 | 1991 | 1 | 0 | 3 | 2 | 20 | 20 |
| 5 | 10 | 1991 | 1 | 0 | 3 | 3 | 35 | 35 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
Ok, I think I get it.
If you're just wanting to make that one inference, then the following may help. (If this is just the first of many inferences you want to make in filling data gaps, you may find that a different method leads to a more efficient solution for doing both/all of them, but I guess cross that bridge when you get there...)
While much of the basic logic stays the same, how you'd tweak it depends on whether you want a query that just provides the values you would infer (e.g. to drive an UPDATE statement), or whether you want to use this logic inline in a bigger query. For performance reasons, I suspect the former makes more sense (especially if you can do the update once and then read the resulting dataset many times), so I'll start by framing things that way and come back to the other in a moment...
It sounds like you have a single table (I'll call it QCEW) with all these columns. In that case, use joins to associate each suppressed overall datapoint (c_oa in the following code) with the corresponding overall and oty datapoints from a year later:
SELECT c_oa.*, n_oa.value - n_oty.value inferred_value
FROM QCEW c_oa --current yr/qtr overall
inner join QCEW n_oa --next yr (same qtr) overall
on c_oa.countyId = n_oa.countyId
and c_oa.industryId = n_oa.industryId
and c_oa.year = n_oa.year - 1
and c_oa.qtr = n_oa.qtr
and c_oa.data_denom = n_oa.data_denom
inner join QCEW n_oty --next yr (same qtr) over-the-year
on c_oa.countyId = n_oty.countyId
and c_oa.industryId = n_oty.industryId
and c_oa.year = n_oty.year - 1
and c_oa.qtr = n_oty.qtr
and c_oa.data_denom = n_oty.data_denom
WHERE c_oa.SUPPRESSED = 1
AND c_oa.DATA_CAT = 1
AND n_oa.SUPPRESSED = 0
AND n_oa.DATA_CAT = 1
AND n_oty.SUPPRESSED = 0
AND n_oty.DATA_CAT = 3
Now it sounds like the table is big, and we've just joined 3 instances of it; so for this to work you'll need good physical design (appropriate indexes/stats for join columns, etc.). And that's why I'd suggest doing an update based on the above query once; sure, it may run long, but then you can read the inferred values in no time.
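For example, the one-time update could be driven directly by that query; here is a sketch, assuming SQL Server's UPDATE ... FROM syntax (adjust for your platform):

-- Fill in each suppressed overall value from next year's overall and OTY rows.
UPDATE c_oa
SET c_oa.value = n_oa.value - n_oty.value
FROM QCEW c_oa                 --current yr/qtr overall (suppressed)
inner join QCEW n_oa           --next yr (same qtr) overall
   on c_oa.countyId = n_oa.countyId
  and c_oa.industryId = n_oa.industryId
  and c_oa.year = n_oa.year - 1
  and c_oa.qtr = n_oa.qtr
  and c_oa.data_denom = n_oa.data_denom
inner join QCEW n_oty          --next yr (same qtr) over-the-year
   on c_oa.countyId = n_oty.countyId
  and c_oa.industryId = n_oty.industryId
  and c_oa.year = n_oty.year - 1
  and c_oa.qtr = n_oty.qtr
  and c_oa.data_denom = n_oty.data_denom
WHERE c_oa.SUPPRESSED = 1
  AND c_oa.DATA_CAT = 1
  AND n_oa.SUPPRESSED = 0
  AND n_oa.DATA_CAT = 1
  AND n_oty.SUPPRESSED = 0
  AND n_oty.DATA_CAT = 3;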
But if you really want to merge this directly into a query of the data you could modify it some to show all values, with inferred values mixed in. We need to switch to outer joins to do this, and I'm going to do some slightly weird things with join conditions to make it fit together:
SELECT src.COUNTYID
, src.INDUSTRYID
, src.YEAR
, src.QTR
, case when (n_oa.value - n_oty.value) is null
then src.suppressed
else 2
end as SUPPRESSED_CODE -- 0=NOT SUPPRESSED, 1=SUPPRESSED, 2=INFERRED
, src.DATA_CAT
, src.DATA_DENOM
, coalesce(n_oa.value - n_oty.value, src.value) as VALUE
FROM QCEW src --a source row from which we'll generate a record
left join QCEW n_oa --next yr (same qtr) overall (if src is suppressed/overall)
on src.countyId = n_oa.countyId
and src.industryId = n_oa.industryId
and src.year = n_oa.year - 1
and src.qtr = n_oa.qtr
and src.data_denom = n_oa.data_denom
and src.SUPPRESSED = 1 and n_oa.SUPPRESSED = 0
and src.DATA_CAT = 1 and n_oa.DATA_CAT = 1
left join QCEW n_oty --next yr (same qtr) over-the-year (if src is suppressed/overall)
on src.countyId = n_oty.countyId
and src.industryId = n_oty.industryId
and src.year = n_oty.year - 1
and src.qtr = n_oty.qtr
and src.data_denom = n_oty.data_denom
and src.SUPPRESSED = 1 and n_oty.SUPPRESSED = 0
and src.DATA_CAT = 1 and n_oty.DATA_CAT = 3

Sum based on values in other cells

I have a table in Excel like the one below.
Date       | Type     | Value
-----------|----------|------
21/01/2012 | Other    | 1000
22/02/2012 | Existing | 1000
23/01/2012 | Existing | 1000
24/01/2012 | Other    | 1000
12/02/2012 | Other    | 1000
13/02/2012 | Existing | 1000
16/02/2012 | Other    | 1000
19/01/2012 | Other    | 1000
I want a formula that will add up all values of each client type for each month, so for example it would say 1000 for January Existing and 3000 for January Other.
I have tried everything I know, but I can't seem to make it work.
=SUMIFS(P2:P74,N2:N74,">="&N13,N2:N74,"<="&N43,O2:O74,"other")
where N13 is the first and N43 the last day of the month, P is your value range, and O is the column with Other/Existing.
Try SUMIFS after extracting the month from the date column.

Conditional SUM using multiple tables in EXCEL

I have a table that I'm trying to populate based on the values of two reference tables.
I have various different projects 'Type 1', 'Type 2' etc. that each run for 4 months and cost different amounts depending on when in their life cycle they are. These costings are shown in Ref Table 1.
Ref Table 1
Month  | a   | b   | c   | d
-------|-----|-----|-----|----
Type 1 | 1   | 2   | 3   | 4
Type 2 | 10  | 20  | 30  | 40
Type 3 | 100 | 200 | 300 | 400
Ref Table 2 shows my schedule of projects for the next 3 months. With 2 new ones starting in Jan, one being a Type 1 and the other being a Type 2. In Feb, I'll have 4 projects, the first two entering their second month and two new ones start, but this time a Type 1 and a Type 3.
Ref table 2
Date   | Jan | Feb | Mar
-------|-----|-----|----
Type 1 | a   | b   | c
Type 1 |     | a   | b
Type 2 | a   | b   | c
Type 2 |     |     | a
Type 3 |     | a   | b
I'd like to create a table which calculates the total costs spent per project type each month. Example results are shown below in Results table.
Results
Date   | Jan | Feb | Mar
-------|-----|-----|----
Type 1 | 1   | 3   | 5
Type 2 | 10  | 20  | 40
Type 3 | 0   | 100 | 200
I tried doing it with an array formula:
Res!b2 = {sum(if((Res!A2 = Ref2!A2:A6) * (Res!A2 = Ref1!A2:A4) * (Ref2!B2:D6 = Ref1!B1:D1), Ref!B2:E4))}
However it doesn't work and I believe that it's because of the third condition trying to compare a vector with another vector rather than a single value.
Does anyone have any idea how I can do this? Happy to use arrays, index, match, vector, lookups but NOT VBA.
Thanks
Assuming that the months in the results table headers are in the same order as in Ref table 2 (as per your example), try this formula in Res!B2:
=SUM(SUMIF(Ref1!$B$1:$E$1,IF(Ref2!$A$2:$A$6=Res!$A2,Ref2!B$2:B$6),INDEX(Ref1!$B$2:$E$4,MATCH(Res!$A2,Ref1!$A$2:$A$4,0),0)))
confirm with CTRL+SHIFT+ENTER and copy down and across
That gives me the same results as you get in your results table
If the months might be in different orders then you can add something to check that too - I assumed that the types in results table row labels might be in a different order to Ref table 1, but if they are always in the same order too (as per your example) then the INDEX/MATCH part at the end can be simplified to a single range
