Populating a departure date field which is after the arrival date - sql-server

Step 1
Arrival Date (Already generated) – 1.35 Million Times
Step 2
Randomise a number between 0 and 1
Step 3
Use the Randomised number produced above to create the script below
UPDATE BOOKINGS
SET DepartureDate
CASE WHEN RAND() Result = Between 0 and 0.3 = Departure Date will be 2 Nights Later
CASE WHEN RAND() Result = Between 0.3 and 0.4 = Departure Date will be 3 Nights Later
CASE WHEN RAND ()Result >0.4 = Departure Date will be either 1,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 Nights Later

Do not use RAND() with a changing seed. It makes for terribly randomized data.
To get to your solution you need to create "buckets" of possible values. 3 days is supposed to happen in 10% of the cases; that makes the smallest bucket, so we need ten buckets. 2 days goes into 3 buckets. The other values go into 2 buckets each. then just use modulo to select one of the 10 buckets like this:
CREATE TABLE dbo.booking(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,days INT);
GO
INSERT INTO dbo.booking(days)
SELECT TOP(100000) 0 FROM sys.columns A,sys.columns B,sys.columns C,sys.columns D;
GO
UPDATE b
SET days = rndm.days
FROM dbo.booking b
CROSS APPLY (
SELECT days
FROM (VALUES(0,2),(1,2),(2,2),(3,3),(4,1),(5,1),(6,4),(7,4),(8,28),(9,28))dn(n,days)
WHERE n = ABS(CHECKSUM(NEWID(),b.Id))%10
)rndm;
GO
SELECT days,COUNT(1) cnt
FROM dbo.booking
GROUP BY days;
GO
EDIT: Updated code to not use case statement.

Just to let you know the final solution I used was:
UPDATE BOOKINGS
SET DepartureDate =
DATEADD(day,
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0 and 0.3 THEN 2 ELSE
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0.3 and 0.5 THEN 3 ELSE
Round(Rand(CHECKSUM(NEWID())) * 28,0) END END,ArrivalDate)
Thanks
Wayne

Related

Need to generate from and to numbers based on the result set with a specified interval

I have below requirement.
Input is like as below.
Create table Numbers
(
Num int
)
Insert into Numbers
values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15)
Create table FromTo
(
FromNum int
,ToNum int
)
Select * From FromTo
Output should be as below.
FromNum ToNum
1 5
6 10
11 15
Actual Requirement is as below.
I need to load the data for a column into a table which will have thousands of records with different no's.
Consider like below.
1,2,5,7,9,11,15,34,56,78,98,123,453,765 etc..
I need to load these into other table which is having FROM and TO columns with the intervals of 5000. For example in the first 5000 if i have the no's till 3000, my 1st row should have FromNo as 1 and ToNum as 3000. second row: if the data is not having till 10000 and the next no started as 12312(This is the 2nd Row FromNum) the ToNum value should be +5000 i.e 17312. Here also if we don't have the no's data till 17312 it need to consider the ToNum between the 12312 and 17312
Output should be as below.
FromNum ToNum
1 3205
1095806 1100805
1100808 1105806
1105822 1110820
Can you guys please help me with the solution for the above.
Thanks in advance.
What you may try in this situation is to group data and get the expected results:
DECLARE #interval int = 5
INSERT INTO FromTo (FromNum, ToNum)
SELECT MIN(Num) AS FromNum, MAX(Num) AS ToNum
FROM Numbers
GROUP BY (Num - 1) / #interval

Update table with random numbers in kdb+q

when I run the following script:
tbl: update prob: 1?100 from tbl;
I was expecting that I get a new column created with each row having a random number. However, I get back a column containing the same number for all the rows in the table.
How do I resolve this? I need to update my existing table and not create a table from scratch.
When you are using 1?100 you are only requesting 1 random value within the range of 0-100. If you use 10?100, you will be returned a list of 10 random values between 0-100.
So to do this in an update you want to use something like this
tbl:([]time:5?.z.p;sym:5?`3;price:5?10f;qty:5?10)
time sym price qty
-----------------------------------------------
2012.02.19D18:34:27.148501760 gkn 8.376952 9
2008.07.29D20:23:13.601434560 odo 7.041609 3
2007.02.07D08:17:59.482332864 pbl 0.955069 9
2001.04.27D03:36:44.475531384 aph 1.127308 2
2010.03.03D03:35:55.253069888 mgi 0.7663449 6
update r:abs count[i]?0h from tbl
time sym price qty r
-----------------------------------------------------
2012.02.19D18:34:27.148501760 gkn 8.376952 9 23885
2008.07.29D20:23:13.601434560 odo 7.041609 3 19312
2007.02.07D08:17:59.482332864 pbl 0.955069 9 10372
2001.04.27D03:36:44.475531384 aph 1.127308 2 25281
2010.03.03D03:35:55.253069888 mgi 0.7663449 6 27503
Note that I am using type short and abs to return positive values.
You need to seed your initial data, using something like rand(time), otherwise it will use the same seed, and thus, give the same sequence of random numbers.
EDIT: Per https://code.kx.com/wiki/Reference/SystemCommands
Use \S?n, where n is any integer.
EDIT2: Check out https://code.kx.com/wiki/Reference/SystemCommands#.5CS_.5Bn.5D_-_random_seed for how to use random numbers, please.
Just generate as many random numbers as you have rows using count tbl:
First create your table tbl:
tbl:([]date:reverse .z.d-til 100;price:sums 100?1f)
date price
--------------------
2018.04.26 0.2426471
2018.04.27 0.6163571
2018.04.28 1.179559
..
Then add a column of random numbers between 0 and 100:
update rdn:(count tbl)?100 from tbl
date price rdn
------------------------
2018.04.26 0.2426471 25
2018.04.27 0.6163571 33
2018.04.28 1.179559 13
..

Counting Columns with conditions, assigning values based on count

I have a table with call logs. I need to assign time slots for next call based on which time slot the phone number was reachable in.
The relevant columns of the table are:
Phone Number | CallTimeStamp
CallTimeStamp is a datetime object.
I need to calculate the following:
Time Slot: From the TimeStamp, I need to calculate the count for each time slot (eg. 0800-1000, 1001-1200, etc.) for each phone number. Now, if the count is greater than 'n' for a particular time slot, then I need to assign that time slot to that number. Otherwise, I select a default time slot.
Weekday Slot: Same as above, but with weekdays.
Priority: Basically a count of how many times a number was reached
Here's I have gone about solving these issues:
Priority
To calculate the number of times a phone number is called is straight forward. If a number exists in the call log, I know that it was called. In that case, the following query will give me the call count for each number.
SELECT DISTINCT(PhoneNumber), COUNT(PhoneNumber) FROM tblCallLog
GROUP BY PhoneNumber
However, my problem is that I need to change the values in the field Count(PhoneNumber) based on the value in that column itself. How do I go about achieving this? (eg. If Count(PhoneNumber) gives me a value > 20, I need to change it to 5).
Time Slot / Weekday
This is where I'm completely stumped and am looking for the "database" way of doing things.
Unfortunately, I can't get out of my iterative process of thinking. For example, if I was aggregating for a certain phone number (say '123456') and in a certain time slot (say between 0800-1000 hrs), I can write a query like this:
DECLARE #T1Start time = '08:00:00.0000'
DECLARE #T2End time = '10:00:00.0000'
SELECT COUNT(CallTimeStamp) FROM tblCallLog
WHERE PhoneNumber = '123456' AND FORMAT(CallTimeStamp, 'hh:mm:ss') >= #T1Start AND FORMAT(CallTimeStamp, 'hh:mm:ss') < #T2End
Now, I could go through each and every Distinct Phone Number in the table, count the values for each time slot and then assign a slot value for the phone number. However, there has to be a way that does not involve me iterating through a database.
So, I am looking for suggestions on how to solve this.
Thanks
You can use DATEPART Function to get week day slot.
To calculate time slot you can try dividing number of minutes from beginning of day and dividing it by size of the time slot. It would return you slot number. You can use either CASE statement to translate it to proper string or look table where you can store slot descriptions.
SELECT
PhoneNumber
, DATEPART(WEEKDAY, l.CallTimeStamp) AS DayOfWeekSlot
, DATEDIFF(MINUTE, CONVERT(DATE, l.CallTimeStamp), l.CallTimeStamp) / 120 AS TwoHourSlot /*You can change number of minutes to get different slot size*/
, COUNT(*) AS Count
FROM tblCallLog l
GROUP BY PhoneNumber
, DATEPART(WEEKDAY, l.CallTimeStamp)
, DATEDIFF(MINUTE, CONVERT(DATE, l.CallTimeStamp), l.CallTimeStamp) / 120
You could try this to return the phone number, the day of the week and a 2 hour slot. If the volume of calls is greater than 20 the value is set to 5 (not sure why to 5?). The code for the 2 hour section is adapted from this question How to Round a Time in T-SQL where the value 2 in (24/2) is the number of hours in your time period.
SELECT
PhoneNumber
, DATENAME(weekday,CallTimeStamp) as [day]
, CONVERT(smalldatetime,ROUND(CAST(CallTimeStamp as float) * (24/2),0)/(24/2)) AS RoundedTime
, CASE WHEN COUNT(*) > 20 THEN 5 ELSE COUNT(*) END
FROM
tblCallLog
GROUP BY
PhoneNumber
, DATENAME(weekday,dateadd(s,start_ts,'01/01/1970'))

ms sql table adding rows whenever level changes by more than 1 so that every row has difference of 1 in start_level and end_level

(This is my first stack overflow question. So please let me know suggestions for posing a better question, if you cannot understand.)
I have a table of around 500 people(users) who are going up the stairs from floor x (0=x, max(y) = 50). A person can climb zero/one or many levels in a single go which corresponds to a single row of the table along with the time taken to do so in seconds.
I want to find average time taken to go from floor a to a+1 where a is any of the floor number. To do so I intend to divide every row of the mentioned table into rows which have start_level+1= end_level. Duration will be divided equally as shown in EXPECTED OUTPUT TABLE for user b.
GIVEN TABLE INPUT
start_level end_level duration user
1 1 10 a
1 2 5 a
2 5 27 b
5 6 3 c
EXPECTED OUTPUT
start_level end_level duration user
1 1 10 a
1 2 5 a
2 3 27/3 b
3 4 27/3 b
4 5 27/3 b
5 6 3 c
Note: level jumps are in integers only.
After getting expected output, I can simply create a column sum(duration)/count(distinct users) at a start_level level to get average time taken to get one floor above from each floor.
Any help is appreciated.
You can use a Numbers table to "create" the incremental steps. Here's my setup:
CREATE TABLE #floors
(
[start_level] INT,
[end_level] INT,
[duration] DECIMAL(10, 4),
[user] VARCHAR(50)
)
INSERT INTO #floors
([start_level],
[end_level],
[duration],
[user])
VALUES (1,1,10,'a'),
(1,2,5,'a'),
(2,5,27,'b'),
(5,6,3,'c')
Then, using a Numbers table and some LEFT JOIN/COALESCE logic:
-- Create a Numbers table
;WITH Numbers_CTE
AS (SELECT TOP 50 [Number] = ROW_NUMBER()
OVER(
ORDER BY (SELECT NULL))
FROM sys.columns)
SELECT [start_level] = COALESCE(n.[Number], f.[start_level]),
[end_level] = COALESCE(n.[Number] + 1, f.[end_level]),
[duration] = CASE
WHEN f.[end_level] = f.[start_level] THEN f.[duration]
ELSE f.[duration] / ( f.[end_level] - f.[start_level] )
END,
f.[user]
FROM #floors f
LEFT JOIN Numbers_CTE n
ON n.[Number] BETWEEN f.[start_level] AND f.[end_level]
AND f.[end_level] - f.[start_level] > 1
Here are the logical steps:
LEFT JOIN the Numbers table for cases where end_level >= start_level + 2 (this has the effect of giving us multiple rows - one for each incremental step)
new start_level = If the LEFT JOIN "completes": take Number from the Numbers table, else: take the original start_level
new end_level = If the LEFT JOIN "completes": take Number + 1, else: take the original end_level
new duration = If end_level = start_level: take the original duration (to avoid divide by 0), else: take the average duration over end_level - start_level

SQL: Is it possible to SUM() fields of INTERVAL type?

I am trying to sum INTERVAL. E.g.
SELECT SUM(TIMESTAMP1 - TIMESTAMP2) FROM DUAL
Is it possible to write a query that would work both on Oracle and SQL Server? If so, how?
Edit: changed DATE to INTERVAL
I'm afraid you're going to be out of luck with a solution which works in both Oracle and MSSQL. Date arithmetic is something which is very different on the various flavours of DBMS.
Anyway, in Oracle we can use dates in straightforward arithmetic. And we have a function NUMTODSINTERVAL which turns a number into a DAY TO SECOND INTERVAL. So let's put them together.
Simple test data, two rows with pairs of dates rough twelve hours apart:
SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss'
2 /
Session altered.
SQL> select * from t42
2 /
D1 D2
-------------------- --------------------
27-jul-2010 12:10:26 27-jul-2010 00:00:00
28-jul-2010 12:10:39 28-jul-2010 00:00:00
SQL>
Simple SQL query to find the sum of elapsed time:
SQL> select numtodsinterval(sum(d1-d2), 'DAY')
2 from t42
3 /
NUMTODSINTERVAL(SUM(D1-D2),'DAY')
-----------------------------------------------------
+000000001 00:21:04.999999999
SQL>
Just over a day, which is what we would expect.
"Edit: changed DATE to INTERVAL"
Working with TIMESTAMP columns is a little more labourious, but we can still work the same trick.
In the following sample. T42T is the same as T42 only the columns have TIMESTAMP rather than DATE for their datatype. The query extracts the various components of the DS INTERVAL and converts them into seconds, which are then summed and converted back into an INTERVAL:
SQL> select numtodsinterval(
2 sum(
3 extract (day from (t1-t2)) * 86400
4 + extract (hour from (t1-t2)) * 3600
5 + extract (minute from (t1-t2)) * 600
6 + extract (second from (t1-t2))
7 ), 'SECOND')
8 from t42t
9 /
NUMTODSINTERVAL(SUM(EXTRACT(DAYFROM(T1-T2))*86400+EXTRACT(HOURFROM(T1-T2))*
---------------------------------------------------------------------------
+000000001 03:21:05.000000000
SQL>
At least this result is in round seconds!
Ok, after a bit of hell, with the help of the stackoverflowers' answers I've found the solution that fits my needs.
SELECT
SUM(CAST((DATE1 + 0) - (DATE2 + 0) AS FLOAT) AS SUM_TURNAROUND
FROM MY_BEAUTIFUL_TABLE
GROUP BY YOUR_CHOSEN_COLUMN
This returns a float (which is totally fine for me) that represents days both on Oracle ant SQL Server.
The reason I added zero to both DATEs is because in my case date columns on Oracle DB are of TIMESTAMP type and on SQL Server are of DATETIME type (which is obviously weird). So adding zero to TIMESTAMP on Oracle works just like casting to date and it does not have any effect on SQL Server DATETIME type.
Thank you guys! You were really helpful.
You can't sum two datetimes. It wouldn't make sense - i.e. what does 15:00:00 plus 23:59:00 equal? Some time the next day? etc
But you can add a time increment by using a function like Dateadd() in SQL Server.
In SQL Server as long as your individual timespans are all less than 24 hours you can do something like
WITH TIMES AS
(
SELECT CAST('01:01:00' AS DATETIME) AS TimeSpan
UNION ALL
SELECT '00:02:00'
UNION ALL
SELECT '23:02:00'
UNION ALL
SELECT '17:02:00'
--UNION ALL SELECT '24:02:00' /*This line would fail!*/
),
SummedTimes As
(
SELECT cast(SUM(CAST(TimeSpan AS FLOAT)) as datetime) AS [Summed] FROM TIMES
)
SELECT
FLOOR(CAST(Summed AS FLOAT)) AS D,
DATEPART(HOUR,[Summed]) AS H,
DATEPART(MINUTE,[Summed]) AS M,
DATEPART(SECOND,[Summed]) AS S
FROM SummedTimes
Gives
D H M S
----------- ----------- ----------- -----------
1 17 7 0
If you wanted to handle timespans greater than 24 hours I think you'd need to look at CLR integration and the TimeSpan structure. Definitely not portable!
Edit: SQL Server 2008 has a DateTimeOffset datatype that might help but that doesn't allow either SUMming or being cast to float
I also do not think this is possible. Go with custom solutions that calculates the date value according to your preferences.
You can also use this:
select
EXTRACT (DAY FROM call_end_Date - call_start_Date)*86400 +
EXTRACT (HOUR FROM call_end_Date - call_start_Date)*3600 +
EXTRACT (MINUTE FROM call_end_Date - call_start_Date)*60 +
extract (second FROM call_end_Date - call_start_Date) as interval
from table;
You Can write you own aggregate function :-). Please read carefully http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm
You must create object type and its body by template, and next aggregate function what using this object:
create or replace type Sum_Interval_Obj as object
(
-- Object for creating and support custom aggregate function
duration interval day to second, -- In this property You sum all interval
-- Object Init
static function ODCIAggregateInitialize(
actx IN OUT Sum_Interval_Obj
) return number,
-- Iterate getting values from dataset
member function ODCIAggregateIterate(
self IN OUT Sum_Interval_Obj,
ad_interval IN interval day to second
) return number,
-- Merge parallel summed data
member function ODCIAggregateMerge(
self IN OUT Sum_Interval_Obj,
ctx2 IN Sum_Interval_Obj
) return number,
-- End of query, returning summary result
member function ODCIAggregateTerminate
(
self IN Sum_Interval_Obj,
returnValue OUT interval day to second,
flags IN number
) return number
)
/
create or replace type body Sum_Interval_Obj is
-- Object Init
static function ODCIAggregateInitialize(
actx IN OUT Sum_Interval_Obj
) return number
is
begin
actx := Sum_Interval_Obj(numtodsinterval(0,'SECOND'));
return ODCIConst.Success;
end ODCIAggregateInitialize;
-- Iterate getting values from dataset
member function ODCIAggregateIterate(
self IN OUT Sum_Interval_Obj,
ad_interval IN interval day to second
) return number
is
begin
self.duration := self.duration + ad_interval;
return ODCIConst.Success;
exception
when others then
return ODCIConst.Error;
end ODCIAggregateIterate;
-- Merge parallel calculated intervals
member function ODCIAggregateMerge(
self IN OUT Sum_Interval_Obj,
ctx2 IN Sum_Interval_Obj
) return number
is
begin
self.duration := self.duration + ctx2.duration; -- Add two intervals
-- return = All Ok!
return ODCIConst.Success;
exception
when others then
return ODCIConst.Error;
end ODCIAggregateMerge;
-- End of query, returning summary result
member function ODCIAggregateTerminate(
self IN Sum_Interval_Obj,
returnValue OUT interval day to second,
flags IN number
) return number
is
begin
-- return = All Ok, too!
returnValue := self.duration;
return ODCIConst.Success;
end ODCIAggregateTerminate;
end;
/
-- You own new aggregate function:
CREATE OR REPLACE FUNCTION Sum_Interval(
a_Interval interval day to second
) RETURN interval day to second
PARALLEL_ENABLE AGGREGATE USING Sum_Interval_Obj;
/
Last, check your function:
select sum_interval(duration)
from (select numtodsinterval(1,'SECOND') as duration from dual union all
select numtodsinterval(1,'MINUTE') as duration from dual union all
select numtodsinterval(1,'HOUR') as duration from dual union all
select numtodsinterval(1,'DAY') as duration from dual);
Finally You can create SUM function, if you want.

Resources