How to split the start time and end time? - database

This question is about how to clean up the time data.
I am doing a database migration.
Old system:
For the time column, I have values such as:
9:30am - 12.30pm
9.30am - 12.30pm
9:30am-12.30pm
3pm to 7pm
3-9pm
1900-2000
1900h-2000h
etc etc...
New System:
Start time | End Time
0930 | 1230
0930 | 1230
0930 | 1230
1500 | 1900
1500 | 2100
1900 | 2000
1900 | 2000
Problem: How to efficiently convert the OLD values into NEW system where there are two columns with start and end time? (Or must I do it manually? The database has over 10000 records)

Although the data is of inconsistent format it looks quite easy to convert to the required format, mostly with a series of simple find/replace. For example in this order (or equivalent):
to > -
. > :
space > nothing
a > spacea
p > spacep
h > nothing
It is already more consistent:
9:30 am-12:30 pm
9:30 am-12:30 pm
9:30 am-12:30 pm
3 pm-7 pm
3-9 pm
1900-2000
1900-2000
Using Excel's Text to Columns with - as the delimiter would then give (in separate text columns):
9:30 am 12:30 pm
9:30 am 12:30 pm
9:30 am 12:30 pm
3 pm 7 pm
3 9 pm
1900 2000
1900 2000
Entering 1 somewhere in the spreadsheet, copying that cell, selecting the cells containing M (with Find All and Ctrl+A), and then using Paste Special, Multiply would coerce the selection into Time values that can be formatted as hhmm, giving:
0930 1230
0930 1230
0930 1230
1500 1900
3 2100
1900 2000
1900 2000
The likes of 3 would need some manual intervention (to guess whether AM or PM) but once that was addressed export as .csv should provide data ready for insertion.
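If scripting is an option, the same clean-up can be sketched in a few lines of Python. This is only a rough sketch for the sample formats shown in the question; the names `parse_token` and `split_range` are made up for illustration. It assumes that a start time without am/pm (as in 3-9pm) inherits the marker from the end time, which matches the desired output in the question but is still a guess, so those rows deserve a manual check.

```python
import re

# One time token: hours, optional minutes (":" or "." separator, or packed as
# in "1900"), an optional trailing "h", and an optional am/pm marker.
TOKEN = re.compile(r"^(\d{1,2})(?:[:.]?(\d{2}))?h?\s*(am|pm)?$")


def parse_token(token, default_meridiem=None):
    """Convert one time such as '9.30am', '3pm' or '1900h' to 'hhmm'."""
    m = TOKEN.match(token.strip().lower())
    if not m:
        raise ValueError(f"unrecognised time: {token!r}")
    hour, minute = int(m.group(1)), int(m.group(2) or 0)
    meridiem = m.group(3) or default_meridiem
    # Only apply am/pm to 12-hour values; '1900' is already 24-hour.
    if meridiem and hour <= 12:
        hour = hour % 12 + (12 if meridiem == "pm" else 0)
    if hour > 23 or minute > 59:
        raise ValueError(f"out of range: {token!r}")
    return f"{hour:02d}{minute:02d}"


def split_range(text):
    """Split 'start - end' or 'start to end' into ('hhmm', 'hhmm')."""
    parts = re.split(r"\s*(?:-|to)\s*", text.strip().lower())
    if len(parts) != 2:
        raise ValueError(f"expected exactly one separator: {text!r}")
    start, end = parts
    # Assumption: a bare start time borrows am/pm from the end time.
    end_meridiem = "pm" if end.endswith("pm") else "am" if end.endswith("am") else None
    return parse_token(start, end_meridiem), parse_token(end)
```

Anything that raises ValueError can be written to a separate list and fixed by hand, which keeps the manual work limited to the genuinely odd records.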

Related

Formula for a conditional cumulative sum in Mac Numbers

I have a spreadsheet in Mac[1] Numbers[2] that tracks amounts due and amounts paid, ordered by date.
Transaction Date | Customer Name | Amount Due | Amount Paid
... | ... | ... | ...
16 Nov 2022 | Name1 | $70.00 | $70.00
16 Nov 2022 | Name2 | $70.00 | $0.00
16 Nov 2022 | Name3 | $0.00 | $70.00
16 Nov 2022 | Name2 | $0.00 | $70.00
... | ... | ... | ...
I would like to add an additional column, Running Total, that shows for each transaction, the accumulated credit/amount due for that row's customer up to that row's date.
Transaction Date | Customer Name | Amount Due | Amount Paid | Running Total
... | ... | ... | ... | ...
16 Nov 2022 | Name1 | $70.00 | $70.00 | $0.00
16 Nov 2022 | Name2 | $70.00 | $0.00 | -$70.00
16 Nov 2022 | Name3 | $0.00 | $70.00 | $70.00
16 Nov 2022 | Name2 | $0.00 | $70.00 | $0.00
... | ... | ... | ... | ...
I have a separate sheet that shows the complete running sum for each customer from the beginning of time through the present, but I'm at a loss for how to create a row-by-row, grouped running sum (or even whether Numbers allows for it).
I have tried to create formulas with SUMIF, but have not made headway in understanding how to write the kind of filter I need, so I have not even created a runnable test formula. Various Google searches for creating running sums by group/category in Numbers, Excel, and Google Sheets have not yielded results.
In a database, this would be trivial, but I'm restricted to Numbers.
[1] macOS Monterey 12.5
[2] Numbers 12.2
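To pin down what the Running Total column is supposed to contain, here is the logic as a small Python sketch. It is not a Numbers formula; the function name `running_totals` and the tuple layout are assumptions made for illustration. For each row it adds Amount Paid minus Amount Due to that customer's balance so far, which is what a SUMIF-style formula limited to the rows up to the current one would have to express.

```python
from collections import defaultdict
from decimal import Decimal


def running_totals(rows):
    """Return the per-customer running balance for each row.

    rows: iterable of (date, customer, amount_due, amount_paid),
    already sorted by date as in the sheet. Amounts are strings or
    numbers without the currency sign.
    """
    balance = defaultdict(Decimal)  # customer -> balance so far (starts at 0)
    totals = []
    for _date, customer, due, paid in rows:
        balance[customer] += Decimal(paid) - Decimal(due)
        totals.append(balance[customer])
    return totals


sample = [
    ("16 Nov 2022", "Name1", "70.00", "70.00"),
    ("16 Nov 2022", "Name2", "70.00", "0.00"),
    ("16 Nov 2022", "Name3", "0.00", "70.00"),
    ("16 Nov 2022", "Name2", "0.00", "70.00"),
]
```

Run on the four sample rows, this reproduces the Running Total column shown in the second table.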

How to compare and select non-changing variables in panel data

I have unbalanced panel data and need to exclude observations (in t) for which the income changed during the year before (t-1), while keeping other observations of these people. Thus, if a change in income happens in year t, then year t should be dropped (for that person).
clear
input year id income
2003 513 1500
2003 517 1600
2003 518 1400
2004 513 1500
2004 517 1600
2004 518 1400
2005 517 1600
2005 513 1700
2005 518 1400
2006 513 1700
2006 517 1800
2006 518 1400
2007 513 1700
2007 517 1600
2007 518 1400
2008 513 1700
2008 517 1600
2008 518 1400
end
xtset id year
xtline income, overlay
To illustrate what's going on, I add an xtline plot, which follows the income per person over the years. ID=518 is the perfect non-changing case (keep all obs). ID=513 has a one-time jump (drop year 2005 for that person). ID=517 has something like a peak, perhaps a one-time measurement error (drop 2006 and 2007).
I think there should be some form of loop. Initialize the first value for each person (because this cannot be compared), say t0. Then compare t1-t0, drop if changed, else compare t2-t1 etc. Because the data is unbalanced, there might be missing year observations. Thanks for any advice.
Update/Goal: The purpose is to prepare the data for a fixed effects regression analysis. There is another variable, reported for the entire "last year". Income, however, is reported at the interview date (point in time). I need to get close to something like "last year income" to relate it to this variable. The procedure is suggested and followed by several publications. I am trying to replicate and understand it.
Solution:
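* Drop every observation whose income differs from the same person's previous observation
bysort id (year) : drop if income != income[_n-1] & _n > 1
* Alternatively, flag such observations instead of dropping them (run this in place of the drop)
bysort id (year) : gen byte flag = (income != income[_n-1]) if _n > 1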
list, sepby(id)
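For readers who do not use Stata, the drop rule can be sketched in Python as follows. This is an illustrative re-statement, not part of the original answer, and the function name `drop_changed` is made up. Like the Stata line, it compares each observation with the same person's previous observation in the data, whether or not that one is itself dropped, and in an unbalanced panel "previous" means the previous available year.

```python
def drop_changed(rows):
    """Keep observations whose income equals the person's previous one.

    rows: iterable of (year, person_id, income). The first observation
    of each person is always kept because it has nothing to compare to.
    """
    kept = []
    previous = {}  # person_id -> income of that person's previous observation
    for year, pid, income in sorted(rows, key=lambda r: (r[1], r[0])):
        if pid not in previous or income == previous[pid]:
            kept.append((year, pid, income))
        previous[pid] = income  # compare against the raw series, as Stata does
    return kept


panel = [
    (2003, 513, 1500), (2003, 517, 1600), (2003, 518, 1400),
    (2004, 513, 1500), (2004, 517, 1600), (2004, 518, 1400),
    (2005, 517, 1600), (2005, 513, 1700), (2005, 518, 1400),
    (2006, 513, 1700), (2006, 517, 1800), (2006, 518, 1400),
    (2007, 513, 1700), (2007, 517, 1600), (2007, 518, 1400),
    (2008, 513, 1700), (2008, 517, 1600), (2008, 518, 1400),
]
```

On the example data this drops 2005 for ID 513 and 2006 and 2007 for ID 517, and keeps everything for ID 518, as described in the question.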
The procedure is VERY IFFY methodologically. There is no need to prepare for the fixed effects analysis other than xtsetting the data; and there rarely is any excuse to create missing data... let alone do so to squeeze the data into the limits of what (other) researchers know about statistics and econometrics. I understand that this is a replication study, but whatever you do with your replication and wherever you present it, you need to point out that the original authors did not have much clue about regression to begin with. Don't try too hard to understand it.

Query for records within time range using UTC and time zones

I am looking for a solution to quickly and efficiently search for records using start and stop times that are inclusive of the current system time. My hope is that I can find a solution that doesn't require SQL Server to perform mathematics or conversion on the format of the data that is stored in the tables. Accounting for daylight saving time can be done using a second set of values, and the query logic can determine prior to execution if US daylight saving rules are in effect. My intent is to use UTC for the current time so the logic could move to Azure or other data centers and be location agnostic.
Solutions I've considered but found reasons they fall short:
Storing integer-based offsets from UTC in hourly or minute increments. As time ranges cross the 24-hour mark, I can't perform a between operation like this: @CURRENT_HOUR BETWEEN 16 AND 4.
Storing only time based strings (ex: 09:00 & 18:00). These values have to be converted to a date time value to be compared to another date time value.
The current solution in use calculates the current time for each record using UTC offset and that is requiring a lot more work of SQL Server than should be required for this operation.
The best solution I've found so far is to use a DATETIMEOFFSET format, setting the values to a unified 1900-01-01 format so that the records are date agnostic, and only the current system time has to be adjusted accordingly.
DATA
ZIP CITY ST START_OFFSET STOP_OFFSET START_OFFSET_DST STOP_OFFSET_DST
----------------------------------------------------------------------------------------------------------------------
10001 New York NY 1900-01-01 09:00 -5:00 1900-01-01 18:00 -5:00 1900-01-01 09:00 -4:00 1900-01-01 18:00 -4:00
60601 Chicago IL 1900-01-01 09:00 -6:00 1900-01-01 18:00 -6:00 1900-01-01 09:00 -5:00 1900-01-01 18:00 -5:00
80202 Denver CO 1900-01-01 09:00 -7:00 1900-01-01 18:00 -7:00 1900-01-01 09:00 -6:00 1900-01-01 18:00 -6:00
85001 Phoenix AZ 1900-01-01 09:00 -7:00 1900-01-01 18:00 -7:00 1900-01-01 09:00 -7:00 1900-01-01 18:00 -7:00
90001 Los Angeles CA 1900-01-01 09:00 -8:00 1900-01-01 18:00 -8:00 1900-01-01 09:00 -7:00 1900-01-01 18:00 -7:00
QUERY
-- Get the current UTC date/time and adjust it to be 1900-01-01 with the current time and offset
DECLARE @UTC_TIME AS DATETIMEOFFSET = DATEADD(DAY, (DATEDIFF(DAY, SWITCHOFFSET(SYSDATETIMEOFFSET(), '+00:00'), '1900-01-01')) * 1, SWITCHOFFSET(SYSDATETIMEOFFSET(), '+00:00'))
SELECT @UTC_TIME -- EXAMPLE: 1900-01-01 17:00:00.000000 +00:00
SELECT TimeZoneId
FROM dbo.TimeZones
WHERE @UTC_TIME BETWEEN START_OFFSET AND STOP_OFFSET -- Ignoring DST logic for this example
The issue with this approach is once UTC rolls over 23:59, the date/time in the example will be 1900-01-01 00:00:00 instead of 1900-01-02 00:00:00 and won't fall between the intended range values when they are compared. I'm hoping someone will help me spot the forest for the trees on calculating a better offset or offer a new approach for the data structure that will still allow the query to avoid calculations.
This logic needs to run on SQL Server 2008 and I do not want to implement CLR to solve it.
Edit:
The purpose of this data query is to determine reasonable call times for B2B marketing across US zones. We mostly use zip codes with their offsetting time zones and fall back to states if we don't have zip code data for the prospect. Our data is 99% US based, with a few records in Guam, Puerto Rico, etc. If a record gives a false positive it's not the end of the world, but we do try to restrict our marketing phone calls to 9A to 6P local time for these prospects.
If I wasn't clear before, I am looking for a way to track a time range only, not a date range on these records.
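One way to picture a time-only range that survives the midnight rollover is to store the start and stop as minutes since midnight UTC and treat start > stop as a range that wraps. The Python below is only a sketch of that comparison logic under that assumed storage format; the function names are hypothetical and nothing here has been run against SQL Server 2008. In SQL the same test would be two plain integer comparisons joined by OR, with no per-row date arithmetic.

```python
MINUTES_PER_DAY = 24 * 60


def local_to_utc_minutes(hour, minute, utc_offset_hours):
    """Convert a local wall-clock time to minutes since midnight UTC."""
    return (hour * 60 + minute - utc_offset_hours * 60) % MINUTES_PER_DAY


def in_window(now_min, start_min, stop_min):
    """True if now falls in [start, stop], all in minutes since midnight UTC.

    When start > stop the window crosses midnight UTC, so the test
    becomes "after start OR before stop" instead of a plain BETWEEN.
    """
    if start_min <= stop_min:
        return start_min <= now_min <= stop_min
    return now_min >= start_min or now_min <= stop_min


# Los Angeles, standard time (UTC-8): 09:00-18:00 local is 17:00-02:00 UTC.
la_start = local_to_utc_minutes(9, 0, -8)   # 1020
la_stop = local_to_utc_minutes(18, 0, -8)   # 120, so the window wraps
```

A second pair of columns holding the daylight saving values would work the same way, with the caller choosing which pair to compare against before the query runs.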

Database design for business hours considering holidays and specials cases

I saw a few examples on stackoverflow on how to design a database table to store business hours, but they don't answer all my needs. They don't support defining different hours depending on the period of the year, and they also don't support holidays and special times of the year when the business can be closed.
My needs
support business hours that overlap 2 days. Example, a bar can open at 6pm and close at 3am
support multiple opening hours in the same day
support dates when they are closed
support different cases where the open/close hours are different during a certain period of time
Scenarios
In general, the store #1 will have these open hours
Monday, 9am to 12pm and 1pm to 5pm
Tuesday, 9am to 12pm and 1pm to 5pm
Wednesday, 9am to 12pm and 1pm to 5pm
Thursday, 9am to 12pm and 1pm to 9pm
Friday, 9am to 12pm and 1pm to 9pm
Saturday, 10am to 5pm
Sunday, closed
During the month of December, the opening hours are different
Monday, 9am to 12pm and 1pm to 9pm
Tuesday, 9am to 12pm and 1pm to 9pm
Wednesday, 9am to 12pm and 1pm to 9pm
Thursday, 9am to 12pm and 1pm to 9pm
Friday, 9am to 12pm and 1pm to 9pm
Saturday, 10am to 5pm
Sunday, 10am to 5pm
They are closed on these dates:
December 25
January 1
And for some reason, they have special cases where the opening hours can be different:
July 10, 1pm to 9pm
September 20, 1pm to 9pm
My solution so far
StoreId BeginDate EndDate DayOfWeek OpenHour Duration
1 2015-01-01 2015-11-30 2 09:00 180
1 2015-01-01 2015-11-30 2 13:00 240
1 2015-01-01 2015-11-30 3 09:00 180
1 2015-01-01 2015-11-30 3 13:00 240
...
1 2015-12-01 2015-12-31 2 09:00 180
1 2015-12-01 2015-12-31 2 13:00 480
1 2015-12-01 2015-12-31 3 09:00 180
1 2015-12-01 2015-12-31 3 13:00 480
...
The problems that I see
I'm not sure that the BeginDate/EndDate should be in that table. Maybe I should have another table that will define Periods and have a foreign key on the OpenHours table that will link to a period.
Where should I define closed dates (holidays)?
Where should I define a special date where the open/close hours are different? Like an override of what is defined?
Stop thinking about rules. Think about rows.
This is dead simple if you just store open hours. PostgreSQL has particularly good support for this kind of thing.
create table business_hours (
open tstzrange primary key,
exclude using gist (open with &&)
);
The exclusion constraint guarantees no overlapping open hours. If it takes two rows per day, a year's data is little more than 700 rows. 100 years of data is only 70k rows. This is the most flexible option, development and testing time is almost nil, and a minimum-wage clerk can verify that the hours you're about to advertise match the hours you're going to be open.
The normal hours
-- The "normal" hours for the week starting Apr 13, 2015 (a Monday).
insert into business_hours values
-- Mon
(tstzrange('2015-04-13 09:00', '2015-04-13 12:00')),
(tstzrange('2015-04-13 13:00', '2015-04-13 17:00')),
-- Tue
(tstzrange('2015-04-14 09:00', '2015-04-14 12:00')),
(tstzrange('2015-04-14 13:00', '2015-04-14 17:00')),
-- Wed
(tstzrange('2015-04-15 09:00', '2015-04-15 12:00')),
(tstzrange('2015-04-15 13:00', '2015-04-15 17:00')),
-- Thu
(tstzrange('2015-04-16 09:00', '2015-04-16 12:00')),
(tstzrange('2015-04-16 13:00', '2015-04-16 21:00')),
-- Fri
(tstzrange('2015-04-17 09:00', '2015-04-17 12:00')),
(tstzrange('2015-04-17 13:00', '2015-04-17 21:00')),
-- Sat
(tstzrange('2015-04-18 10:00', '2015-04-18 17:00'));
-- Sun
-- Closed.
It should be clear from inserting just the "normal" hours that this kind of table can accommodate any kind of logic, whether good or bad.
You can wrap that kind of statement in a stored function in such a way that you can generate a week, a month, or a year of "normal" hours at one time. Update as needed.
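As a rough illustration of generating "normal" hours in bulk, here is a Python sketch that produces one (opens, closes) pair per open period for every day in a date range. It is not the stored function mentioned above; `NORMAL_HOURS` and `generate_hours` are hypothetical names, the weekly pattern is copied from the question, and time zones are left out for brevity. The resulting pairs could then be inserted as rows.

```python
from datetime import date, datetime, time, timedelta

# Weekly pattern from the question: weekday (Monday=0) -> open periods.
NORMAL_HOURS = {
    0: [(time(9), time(12)), (time(13), time(17))],
    1: [(time(9), time(12)), (time(13), time(17))],
    2: [(time(9), time(12)), (time(13), time(17))],
    3: [(time(9), time(12)), (time(13), time(21))],
    4: [(time(9), time(12)), (time(13), time(21))],
    5: [(time(10), time(17))],
    # Sunday (6) is closed, so it has no entry.
}


def generate_hours(first_day, last_day, pattern=NORMAL_HOURS):
    """Return a list of (opens, closes) datetimes, one per open period."""
    rows = []
    day = first_day
    while day <= last_day:
        for opens, closes in pattern.get(day.weekday(), []):
            rows.append((datetime.combine(day, opens),
                         datetime.combine(day, closes)))
        day += timedelta(days=1)
    return rows


week = generate_hours(date(2015, 4, 13), date(2015, 4, 19))
```

Holidays and special dates are then handled by deleting or replacing the generated rows for those days. Periods that run past midnight, like the bar example, would need the closing datetime moved to the next day, which this sketch does not do.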
In other dbms, you can use two timestamp columns and some check constraints. Checking for overlapping rows would probably have to be done as an exception report rather than as a constraint enforced by the dbms.
create table business_hours (
opens timestamp not null,
closes timestamp not null,
check (closes > opens),
primary key (opens, closes)
);
Using the pair of columns as a primary key lets the optimizer use index-only scans.
I'm working on the same problem, and so far I have designed a similar approach to yours. In regard to your questions:
1. Yes, it helps to move the start and end dates to a related table, for reasons seen in #3.
2. Closed dates could just be when the open hour is null.
3. Default hours would have no start and end dates. All special hours would have defined start and end dates. The app looks up which groups of dates include today's date. It calculates the interval between the start and end of all groups. The smallest interval wins.
BTW, I am probably going to store my open hour by minute of the day, and leave formatting for later...
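The "smallest interval wins" lookup could look something like the following Python sketch. The function name `pick_group` and the tuple layout are assumptions for illustration; a group with no start and end date is treated as the default and given an infinite span so that any dated group covering today beats it.

```python
from datetime import date


def pick_group(today, groups):
    """Return the name of the most specific group that covers today.

    groups: list of (name, start, end). start/end of None marks the
    default hours, which always apply but lose to any dated group.
    """
    best_name, best_span = None, None
    for name, start, end in groups:
        if start is None or end is None:
            span = float("inf")            # default: widest possible
        elif start <= today <= end:
            span = (end - start).days      # narrower means more specific
        else:
            continue                       # group does not cover today
        if best_span is None or span < best_span:
            best_name, best_span = name, span
    return best_name


groups = [
    ("default", None, None),
    ("december", date(2015, 12, 1), date(2015, 12, 31)),
    ("christmas", date(2015, 12, 25), date(2015, 12, 25)),
]
```

With these three groups, December 25 resolves to the one-day holiday entry, other December days to the December hours, and everything else to the default.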
I would store the opening hours as RFC 5545 RRules and ExRules.
Some libraries have functions to show the rule in English text.
I find it's easy to generate RRules and ExRules using Google Calendar.

What is the Datatype for 10:00 AM PST in SQL Server

I have a table that needs to store time slots such as 10:00 AM PST, 10:15 AM PST, etc. at a constant interval of 15 minutes.
I am wondering if there is any Time datatype available for this kind of data?
I need to store them as 11:00 AM PST, 11:15 AM PST, 11:30 AM PST, 11:45 AM PST, 12:00 PM PST, 12:15 PM PST, 12:30 PM PST, 12:45 PM PST, 01:00 PM PST, etc.
Also in future if the business requirement is for 20 min interval, I should be able to change it easily.
Thanks
There is a time data type you can use (SQL Server 2008).
CREATE TABLE Table1 ( Column1 time(7) )
The range is:
00:00:00.0000000 through 23:59:59.9999999
You can use a CHECK CONSTRAINT to ensure that the minutes part is one of (0, 15, 30, 45).
You could use the Time data type.
If that's overkill for you since you only need to store the interval at the level of minutes, you could store the minutes since midnight, within the range 0-1439.
You might even consider storing the number of 15-minute intervals since midnight, such that 2:00 AM is 8 and 2:15 AM is 9 (though unless you can think of a great name for such a column, it wouldn't be very clear).
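To show how the interval-number idea converts back and forth, here is a small Python sketch. The function names are made up, and the slot length is a parameter so that a later switch from 15 to 20 minutes only changes one value; existing stored numbers would of course have to be recalculated at that point.

```python
from datetime import time


def slot_to_time(slot, minutes_per_slot=15):
    """Convert a slot number since midnight to a time of day."""
    slots_per_day = 24 * 60 // minutes_per_slot
    if not 0 <= slot < slots_per_day:
        raise ValueError(f"slot must be in 0..{slots_per_day - 1}")
    total = slot * minutes_per_slot
    return time(total // 60, total % 60)


def time_to_slot(t, minutes_per_slot=15):
    """Convert a time of day to its slot number; reject off-grid times."""
    total = t.hour * 60 + t.minute
    if total % minutes_per_slot:
        raise ValueError(f"{t} is not on a {minutes_per_slot}-minute boundary")
    return total // minutes_per_slot
```

With 15-minute slots, 2:00 AM maps to 8 and 2:15 AM to 9, as in the answer above, and the last slot of a normal day is 95 (11:45 PM).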
If you're looking to constrain a time to 15 minute intervals, the easiest way to do it would be to just store it in an int constrained between 0-100 (some daylight savings days have 25 hours) and calculate when the exact time is if you need it. I've worked on a few large applications that have used this for stuff like weather forecasting data and it has worked quite well.

Resources