date_trunc value in Unix seconds, CrateDB - Npgsql

I'm trying to group a set of data using date_trunc, but after executing the query I get the grouped data in date format (DD/MM/YYYY). How can I get it in Unix seconds using the query?
Table Sensor
+----------+---------+------------+
| sensorid | reading | timestamp  |
+----------+---------+------------+
| 1        | 100     | 1612331498 |
| 2        | 100     | 1614752263 |
| 1        | 10      | 1614752263 |
+----------+---------+------------+
select date_trunc('day', v.timestamp) as day, sum(reading)
from sensor v(timestamp, sensorid)
where sensorid = 1
group by (DAY);
Output:
day                      sum
03/02/2021 12:00:00 am   100
03/03/2021 12:00:00 am   10
Expected result:
day          sum
1612331498   100
1614752263   10
I'm using the Npgsql C# client and CrateDB.
CREATE TABLE sensor (
sensorid int,
reading double,
timestamp double
);
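As an illustration only, here is a small Python sketch of the bucketing the question asks for, assuming `timestamp` holds Unix seconds and that "day" means a UTC day. Each timestamp is truncated to midnight by floor division by 86400 (seconds per day), and `reading` is summed per bucket for `sensorid = 1`. The same arithmetic could in principle be written in the query itself (roughly `floor(timestamp / 86400) * 86400`), but that SQL form is an untested assumption for CrateDB, and `day_start_unix`/`sum_by_day` are hypothetical helper names.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400

# Rows from the Sensor table above: (sensorid, reading, timestamp in Unix seconds)
rows = [
    (1, 100.0, 1612331498),
    (2, 100.0, 1614752263),
    (1, 10.0, 1614752263),
]

def day_start_unix(ts: float) -> int:
    """Truncate a Unix timestamp (seconds) to the start of its UTC day."""
    return int(ts // SECONDS_PER_DAY) * SECONDS_PER_DAY

def sum_by_day(rows, sensor_id):
    """Like GROUP BY day with SUM(reading), keyed by the day start in Unix seconds."""
    totals = defaultdict(float)
    for sid, reading, ts in rows:
        if sid == sensor_id:
            totals[day_start_unix(ts)] += reading
    return dict(sorted(totals.items()))

print(sum_by_day(rows, 1))  # {1612310400: 100.0, 1614729600: 10.0}
```

1612310400 and 1614729600 are midnight UTC on 2021-02-03 and 2021-03-03, the two days shown in the question's output; they are the day starts rather than the raw timestamps.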

Related

Results changed after sorting a table

I have the following scenario:
I created a batch job using the SQL API.
final TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());
I load the data from CSV files and convert/aggregate it using the SQL API.
At some stage I have a table:
CREATE VIEW ohlc_current_day as
SELECT
CAST(transact_time as DATE) as `day`,
instrument_id,
first_value(price) as `open`,
min(price) AS `low`,
max(price) AS `high`,
last_value(price) as `close`,
count(*) AS `count`,
sum(quantity) AS volume,
sum(quantity * price) AS turnover
FROM trades -- table loaded from CSV
group by CAST(transact_time as DATE), instrument_id
Now when check the results:
select * from ohlc_current_day where instrument_id=14
+------------+---------------+---------+---------+--------+---------+-------+-----------+---------------+
| day | instrument_id | open | low | high | close | count | volume | turnover |
+------------+---------------+---------+---------+--------+---------+-------+-----------+---------------+
| 2021-04-11 | 14 | 1723.0 | 1709.0 | 1743.0 | 1728.0 | 679 | 487470.0 | 8.4114803E8 |
+------------+---------------+---------+---------+--------+---------+-------+-----------+---------------+
The results are repeatable and correct (checked with reference).
Then, for further processing, I need the OHLC values from the previous day, which are already stored in a database:
CREATE TABLE ohlc_database (
`day` TIMESTAMP,
instrument_id INT,
`open` float,
`low` FLOAT,
`high` FLOAT,
`close` FLOAT,
`count` BIGINT,
volume FLOAT,
turnover FLOAT
) WITH (
'connector' = 'jdbc',
'url' = 'url',
'table-name' = 'ohlc',
'username' = 'user',
'password' = 'password'
)
Let's now merge ohlc_current_day with ohlc_database:
CREATE VIEW ohlc_raw as
SELECT * from ohlc_current_day
UNION ALL
select
CAST(`day` as DATE) as `day`,
instrument_id,
`open`,
`low`,
`high`,
`close`,
`count`,
volume,
turnover
FROM ohlc_database
WHERE `day` = '2021-04-10' -- hardcoded previous day's date
And check the results:
select * from ohlc_raw where instrument_id=14
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
| day | instrument_id | open | low | high | close | count | volume | turnover |
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
| 2021-04-10 | 14 | 1696.0 | 1654.0 | 1703.0 | 1691.0 | 936 | 1040888.0 | 1.74619264E9 |
| 2021-04-11 | 14 | 1723.0 | 1709.0 | 1743.0 | 1728.0 | 679 | 487470.0 | 8.4114829E8 |
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
The results are OK; the values are the same as in the previous SELECT query.
Now let's order by day:
CREATE VIEW ohlc as
SELECT * from ohlc_raw ORDER BY `day`
Check the results:
select * from ohlc where instrument_id=14
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
| day | instrument_id | open | low | high | close | count | volume | turnover |
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
| 2021-04-10 | 14 | 1696.0 | 1654.0 | 1703.0 | 1691.0 | 936 | 1040888.0 | 1.74619264E9 |
| 2021-04-11 | 14 | 1729.0 | 1709.0 | 1743.0 | 1732.0 | 679 | 487470.0 | 8.4114854E8 |
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
open and close are wrong compared to the previous values. They are calculated using the first_value() and last_value() functions, which depend on the order of the rows. So my guess is that the ORDER BY in the last query has changed that order, and this is why the results differ.
Is my understanding correct? How can I fix it?
I think first_value() and last_value() are themselves order-dependent operations. When you ORDER BY in the last query, the row order that last_value() relied on gets changed. Maybe you can write the result to the database after the UNION ALL, and then apply the ORDER BY on the date after extracting the data from the database, if possible.
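The order dependence discussed above can be illustrated outside Flink with a small Python sketch (made-up trades; this is not how Flink implements its aggregates). `ohlc_by_arrival` mimics an aggregate whose open/close come from whichever rows arrive first and last, while `ohlc_by_time` picks open and close by an explicit key, `transact_time`, so reordering the input cannot change them. Both function names are hypothetical.

```python
import random

# Hypothetical trades for one instrument on one day: (transact_time, price).
trades = [(1, 1723.0), (2, 1709.0), (3, 1743.0), (4, 1728.0)]

def ohlc_by_arrival(rows):
    """Open/close taken from the first/last row in input order,
    like an order-dependent FIRST_VALUE/LAST_VALUE aggregate."""
    prices = [price for _, price in rows]
    return prices[0], min(prices), max(prices), prices[-1]

def ohlc_by_time(rows):
    """Open/close taken from the earliest/latest transact_time,
    so the result does not depend on input order."""
    prices = [price for _, price in rows]
    open_price = min(rows, key=lambda r: r[0])[1]
    close_price = max(rows, key=lambda r: r[0])[1]
    return open_price, min(prices), max(prices), close_price

shuffled = trades[:]
random.Random(42).shuffle(shuffled)  # simulate a different row order

print(ohlc_by_time(trades) == ohlc_by_time(shuffled))  # True
print(ohlc_by_arrival(list(reversed(trades))))  # (1728.0, 1709.0, 1743.0, 1723.0)
```

Low and high are order-independent either way; only open and close change when the input order changes, which matches the symptom in the question. Tying open/close to an explicit ordering key is one general way to get repeatable results; how best to express that in Flink SQL is not covered here.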

Selecting unique records based on date of effect, ending on date of discontinue

I have an interesting conundrum and I am using SQL Server 2012 or SQL Server 2016 (T-SQL obviously). I have a list of products, each with their own UPC code. These products have a discontinue date and the UPC code gets recycled to a new product after the discontinue date. So let's say I have the following in the Item_UPCs table:
Item Key | Item Desc | UPC | UPC Discontinue Date
123456 | Shovel | 0009595959 | 2018-04-01
123456 | Shovel | 0007878787 | NULL
234567 | Rake | 0009595959 | NULL
As you can see, I have a UPC that gets recycled to a new product. Unfortunately, I don't have an effective date for the item UPC table, but I do in an items table for when an item was added to the system. But let's ignore that.
Here's what I want to do:
For every inventory record up to the discontinue date, show the unique UPC associated with that date. An inventory record consists of the "Inventory Date", the "Purchase Cost", the "Purchase Quantity", the "Item Description", and the "Item UPC".
Once the discontinue date has passed (i.e., it's the next day), start showing only the UPC that is in effect.
Make sure that no duplicate data exists and the UPCs are truly being "attached" to each row per whatever the date is in the query.
Here is an example of the inventory details table:
Inv_Key | Trans_Date | Item_Key | Purch_Qty | Purch_Cost
123 | 2018-05-12 | 123456 | 12.00 | 24.00
108 | 2018-03-22 | 123456 | 8.00 | 16.00
167 | 2018-07-03 | 234567 | 12.00 | 12.00
An example query:
SELECT DISTINCT
s.SiteID
,id.Item_Key
,iu.Item_Desc
,iu.Item_Department
,iu.Item_Category
,iu.Item_Subcategory
,iu.UPC
,iu.UPC_Discontinue_Date
,id.Trans_Date
,id.Purch_Cost
,id.Purch_Qty
FROM Inventory_Details id
INNER JOIN Item_UPCs iu ON iu.Item_Key = id.Item_Key
INNER JOIN Sites s ON s.Site_Key = id.Site_Key
The real query I have is far too long to post here. It has three CTEs and the resultant query. This is simply a mockup. Here is an example result set:
Site_ID | Item_Key | Item_Desc | Item_Department | Item_Category | UPC | UPC_Discontinue Date | Trans_Date | Purch_Cost | Purch_Qty
2457 | 123456 | Shovel | Digging Tools | Shovels | 0009595959 | 2018-04-01 | 2018-03-22 | 16.00 | 8.00
2457 | 123456 | Shovel | Digging Tools | Shovels | 0007878787 | NULL | 2018-03-22 | 16.00 | 8.00
2457 | 234567 | Rakes | Garden Tools | Rakes | 0009595959 | NULL | 2018-07-03 | 12.00 | 12.00
2457 | 123456 | Shovel | Digging Tools | Shovels | 0007878787 | NULL | 2018-05-12 | 24.00 | 12.00
Do any of you know how I can "assign" a UPC to a specific range of dates in my query and then "assign" an updated UPC to the item for every effective date thereafter?
Many thanks!
Given your current Item_UPC table, you can generate effective start dates from the Discontinue Date using the LAG analytic function:
With Effective_UPCs as (
    select [Item_Key]
         , [Item_Desc]
         , [UPC]
         , coalesce(lag([UPC_Discontinue_Date])
                        over (partition by [Item_Key]
                              order by coalesce([UPC_Discontinue_Date],
                                                datefromparts(9999,12,31))),
                    lag([UPC_Discontinue_Date])
                        over (partition by [UPC]
                              order by coalesce([UPC_Discontinue_Date],
                                                datefromparts(9999,12,31)))
           ) [UPC_Start_Date]
         , [UPC_Discontinue_Date]
    from Item_UPCs i
)
select * from Effective_UPCs;
This yields the following results:
| Item_Key | Item_Desc | UPC | UPC_Start_Date | UPC_Discontinue_Date |
|----------|-----------|------------|----------------|----------------------|
| 123456 | Shovel | 0007878787 | 2018-04-01 | (null) |
| 123456 | Shovel | 0009595959 | (null) | 2018-04-01 |
| 234567 | Rake | 0009595959 | 2018-04-01 | (null) |
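To see how the two `lag` calls combine, here is a Python re-implementation of the CTE's logic on the sample `Item_UPCs` rows (a sketch for checking the reasoning, not a replacement for the SQL; the helper names are hypothetical). Each partition is sorted by discontinue date with NULL treated as 9999-12-31, the previous row's discontinue date becomes the candidate start date, and `coalesce` prefers the per-item value over the per-UPC value.

```python
from datetime import date

FAR_FUTURE = date(9999, 12, 31)  # plays the role of datefromparts(9999,12,31)

# Item_UPCs sample rows: (Item_Key, Item_Desc, UPC, UPC_Discontinue_Date); None = NULL
item_upcs = [
    ("123456", "Shovel", "0009595959", date(2018, 4, 1)),
    ("123456", "Shovel", "0007878787", None),
    ("234567", "Rake", "0009595959", None),
]

def sort_key(row):
    """ORDER BY coalesce(UPC_Discontinue_Date, 9999-12-31)."""
    return row[3] if row[3] is not None else FAR_FUTURE

def lag_discontinue(rows, partition_col):
    """lag(UPC_Discontinue_Date) over (partition by <col> order by sort_key)."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[partition_col], []).append(row)
    lagged = {}
    for part in partitions.values():
        previous = None  # LAG returns NULL for the first row of a partition
        for row in sorted(part, key=sort_key):
            lagged[row] = previous
            previous = row[3]
    return lagged

def effective_upcs(rows):
    by_item = lag_discontinue(rows, 0)  # partition by Item_Key
    by_upc = lag_discontinue(rows, 2)   # partition by UPC
    result = []
    for row in rows:
        # coalesce(lag by item, lag by UPC)
        start = by_item[row] if by_item[row] is not None else by_upc[row]
        result.append((row[0], row[2], start, row[3]))
    return result

for r in effective_upcs(item_upcs):
    print(r)
```

The per-UPC `lag` is what gives the Rake its 2018-04-01 start date: the Rake has only one row in its own item partition, so the start date has to come from the previous holder of the recycled UPC.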
This produces a possibly open-ended interval: the start date, the discontinue date, or both may be null, and when both are null the UPC is effective for all time. To use this in your query, simply reference the Effective_UPCs CTE in place of the Item_UPCs table and add a couple of additional predicates to take the effective dates into account:
SELECT DISTINCT
s.SiteID
,id.Item_Key
,iu.Item_Desc
,iu.Item_Department
,iu.Item_Category
,iu.Item_Subcategory
,iu.UPC
,iu.UPC_Discontinue_Date
,id.Trans_Date
,id.Purch_Cost
,id.Purch_Qty
FROM Inventory_Details id
INNER JOIN Effective_UPCs iu
ON iu.Item_Key = id.Item_Key
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date < id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date <= iu.UPC_Discontinue_Date)
INNER JOIN Sites s ON s.Site_Key = id.Site_Key
Note that the above query uses a half-open range (UPC_Start_Date < Trans_Date <= UPC_Discontinue_Date, instead of <= for both inequalities). This prevents transactions that occur exactly on the discontinue date from matching both the prior and the next record for that Item_Key. If transactions that occur exactly on the discontinue date should match the new record rather than the old one, simply swap the two inequalities:
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date <= id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date < iu.UPC_Discontinue_Date)
instead of
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date < id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date <= iu.UPC_Discontinue_Date)
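A quick way to sanity-check the boundary handling is to apply the same predicates in Python to the Effective_UPCs rows from the result table above (a sketch; `matching_upcs` and its `new_wins_on_boundary` flag are illustrative names, not part of the answer). Each sample transaction should match exactly one UPC, and a transaction dated exactly 2018-04-01 should go to the old or the new UPC depending on which form of the inequalities is used.

```python
from datetime import date

# Effective_UPCs rows from the result table: (Item_Key, UPC, UPC_Start_Date, UPC_Discontinue_Date)
# None stands for NULL.
effective_rows = [
    ("123456", "0007878787", date(2018, 4, 1), None),
    ("123456", "0009595959", None, date(2018, 4, 1)),
    ("234567", "0009595959", date(2018, 4, 1), None),
]

def matching_upcs(item_key, trans_date, new_wins_on_boundary=False):
    """Apply the join predicates for one inventory transaction.

    Default:  Start <  Trans_Date <= Discontinue (old UPC keeps the boundary day).
    Swapped:  Start <= Trans_Date <  Discontinue (new UPC takes the boundary day).
    """
    matches = []
    for key, upc, start, end in effective_rows:
        if key != item_key:
            continue
        if new_wins_on_boundary:
            after_start = start is None or start <= trans_date
            before_end = end is None or trans_date < end
        else:
            after_start = start is None or start < trans_date
            before_end = end is None or trans_date <= end
        if after_start and before_end:
            matches.append(upc)
    return matches

print(matching_upcs("123456", date(2018, 3, 22)))  # ['0009595959']
print(matching_upcs("123456", date(2018, 5, 12)))  # ['0007878787']
print(matching_upcs("234567", date(2018, 7, 3)))   # ['0009595959']
```

With these predicates the 2018-03-22 shovel transaction no longer appears twice, which is the duplication visible in the question's example result set.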

Date difference for same ID

I've got a data set similar to:
+----+------------+------------+------------+
| ID | Udate | last_code | Ddate |
+----+------------+------------+------------+
| 1 | 05/11/2018 | ACCEPTED | 13/10/2018 |
| 1 | 03/11/2018 | ATTEMPT | 13/10/2018 |
| 1 | 01/11/2018 | INFO | 13/10/2018 |
| 1 | 22/10/2018 | ARRIVED | 13/10/2018 |
| 1 | 15/10/2018 | SENT | 13/10/2018 |
+----+------------+------------+------------+
I'm trying to get the date difference between consecutive codes based on Udate, but for the first date I want the difference between Udate and Ddate.
So I've been trying:
DATEDIFF(DAY,LAG(Udate) OVER (PARTITION BY Shipment_Number ORDER BY Udate), Udate)
to get the difference between dates and it works so far, but I also need the first date difference between Udate and Ddate.
I was thinking about ISNULL().
Also, in the end I need the average number of days between codes as well; usually they follow the same pattern. Sample output data:
+----+------------+------------+------------+------------+
| ID | Udate | last_code | Ddate | Difference |
+----+------------+------------+------------+------------+
| 1 | 05/11/2018 | ACCEPTED | 13/10/2018 | 2 |
| 1 | 03/11/2018 | ATTEMPT | 13/10/2018 | 2 |
| 1 | 01/11/2018 | INFO | 13/10/2018 | 10 |
| 1 | 22/10/2018 | ARRIVED | 13/10/2018 | 7 |
| 1 | 15/10/2018 | SENT | 13/10/2018 | 2 |
+----+------------+------------+------------+------------+
Notice that when there is no previous code, the date diff is between Udate and Ddate.
Would appreciate any idea.
Thank you.
Well, ISNULL is the way to go here.
Since you also want the average difference, you can use a common table expression to get the difference, and query it to get the average:
First, create and populate sample data (please save us this step in your future questions):
-- This would not be needed if you had used ISO 8601 for date strings (yyyy-mm-dd | yyyymmdd)
SET DATEFORMAT DMY;
DECLARE @T AS TABLE
(
ID int,
UDate date,
last_code varchar(10),
Ddate date
) ;
INSERT INTO @T (ID, Udate, last_code, Ddate) VALUES
(1, '05/11/2018', 'ACCEPTED', '13/10/2018'),
(1, '03/11/2018', 'ATTEMPT' , '13/10/2018'),
(1, '01/11/2018', 'INFO' , '13/10/2018'),
(1, '22/10/2018', 'ARRIVED' , '13/10/2018'),
(1, '15/10/2018', 'SENT' , '13/10/2018');
The CTE:
WITH CTE AS
(
SELECT ID,
Udate,
last_code,
Ddate,
DATEDIFF(
DAY,
ISNULL(
LAG(Udate) OVER(PARTITION BY ID ORDER BY Udate),
Ddate
),
UDate
) As Difference
FROM @T
)
The query:
SELECT *, AVG(Difference) OVER(PARTITION BY ID) As AverageDifference
FROM CTE;
Results:
ID Udate last_code Ddate Difference AverageDifference
1 15.10.2018 SENT 13.10.2018 2 4
1 22.10.2018 ARRIVED 13.10.2018 7 4
1 01.11.2018 INFO 13.10.2018 10 4
1 03.11.2018 ATTEMPT 13.10.2018 2 4
1 05.11.2018 ACCEPTED 13.10.2018 2 4
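The same calculation in Python, as a cross-check of the numbers above (a sketch mirroring the CTE, not T-SQL; `differences` is a hypothetical helper). It also shows why AverageDifference is 4 rather than 4.6: in SQL Server, AVG over an int column returns an int, so 23 / 5 is truncated. Casting Difference to a decimal type first (for example `AVG(CAST(Difference AS decimal(10,2)))`) should give the fractional average if that is what you need.

```python
from datetime import date

# Sample rows from the question: (ID, Udate, last_code, Ddate)
rows = [
    (1, date(2018, 11, 5), "ACCEPTED", date(2018, 10, 13)),
    (1, date(2018, 11, 3), "ATTEMPT", date(2018, 10, 13)),
    (1, date(2018, 11, 1), "INFO", date(2018, 10, 13)),
    (1, date(2018, 10, 22), "ARRIVED", date(2018, 10, 13)),
    (1, date(2018, 10, 15), "SENT", date(2018, 10, 13)),
]

def differences(rows):
    """DATEDIFF(DAY, ISNULL(LAG(Udate) OVER (PARTITION BY ID ORDER BY Udate), Ddate), Udate)."""
    by_id = {}
    for row in rows:
        by_id.setdefault(row[0], []).append(row)
    out = []
    for id_, group in by_id.items():
        previous = None  # LAG is NULL for the earliest Udate of each ID
        for _, udate, code, ddate in sorted(group, key=lambda r: r[1]):
            base = previous if previous is not None else ddate  # ISNULL(..., Ddate)
            out.append((id_, code, (udate - base).days))
            previous = udate
    return out

diffs = differences(rows)
values = [d for _, _, d in diffs]
print(diffs)
print(sum(values) / len(values))   # 4.6
print(sum(values) // len(values))  # 4
```

For these non-negative values, floor division matches SQL Server's integer truncation.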

Limit RANGE with condition in Window function

Take as an example the following transaction table, with transaction values for each department in each trimester.
TransactionID | Department | Trimester | Year | Value | Moving Avg
1 | Dep1 | T1 | 2014 | 13 |
2 | Dep1 | T1 | 2014 | 43 |
3 | Dep1 | T2 | 2014 | 36 |
300 | Dep1 | T1 | 2017 | 28 |
301 | Dep2 | T1 | 2014 | 24 |
I would like to calculate a moving average for each transaction over the transactions of the same department, using a window that runs from 6 trimesters to 2 trimesters before the current row's trimester. For example, for transaction 300 in T1 2017, I'd like the average of the transaction values for Dep1 from T1 2015 to T2 2016.
How can I achieve this with a sliding window function in SQL Server 2014? My thought is that I should use something like:
SELECT
AVG([Value]) OVER
(PARTITION BY Department ORDER BY [Year], Trimester
RANGE [take the range from the previous 6 to 2 trimesters])
How would we define the RANGE clause? I suppose I cannot use ROWS because the number of rows in the window is unknown.
The same question applies to the median: how would we rewrite this to calculate the median instead of the mean?
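One way to reason about the window is to turn (Year, Trimester) into a consecutive period number, so that "6 to 2 trimesters back" becomes a plain numeric range. The Python sketch below assumes three trimesters per year (T1 to T3), which matches the example where T1 2017 maps to T1 2015 through T2 2016; the rows are made-up values, and `period_index`/`window_values` are illustrative names. As far as I know, SQL Server's RANGE frame only accepts UNBOUNDED PRECEDING, CURRENT ROW and UNBOUNDED FOLLOWING, so in T-SQL the same period number would more likely drive a self-join or correlated subquery than the window frame itself.

```python
from statistics import mean, median

TRIMESTERS_PER_YEAR = 3  # assumption: T1..T3 per year

def period_index(year, trimester):
    """Map e.g. (2017, 'T1') to a consecutive integer; 'N trimesters back' is then cur - N."""
    return year * TRIMESTERS_PER_YEAR + int(trimester[1:]) - 1

def window_values(rows, department, year, trimester, first_back=6, last_back=2):
    """Values of the same department from first_back to last_back trimesters before (year, trimester)."""
    current = period_index(year, trimester)
    return [
        value
        for dep, tri, yr, value in rows
        if dep == department
        and current - first_back <= period_index(yr, tri) <= current - last_back
    ]

# Hypothetical rows: (Department, Trimester, Year, Value)
rows = [
    ("Dep1", "T3", 2014, 50),    # 7 trimesters before T1 2017: outside
    ("Dep1", "T1", 2015, 10),    # 6 before: inside
    ("Dep1", "T3", 2015, 20),    # 4 before: inside
    ("Dep1", "T2", 2016, 60),    # 2 before: inside
    ("Dep1", "T3", 2016, 99),    # 1 before: outside
    ("Dep2", "T2", 2016, 1000),  # other department: ignored
]

values = window_values(rows, "Dep1", 2017, "T1")
print(values, mean(values), median(values))
```

Because the window is materialized as a list of values, switching from the mean to the median is just a different function applied to the same set, which is also the shape a self-join based T-SQL solution would take.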

Delta of values in same table over time - SQL Server

I have a table of data as below in SQL Server:
+-------+------------+-------------------------+--------------------------+
| ID | IP | Date | NumFails |
+-------+------------+-------------------------+--------------------------+
| 21365 | 172.16.2.1 | 2016-05-16 00:20:54.000 | 200 |
| 21457 | 172.16.3.1 | 2016-05-16 00:21:05.000 | 295 |
| 21478 | 172.16.4.1 | 2016-05-16 00:22:46.000 | 128 |
| 24255 | 172.16.2.1 | 2016-05-16 12:22:01.000 | 213 |
| 24318 | 172.16.3.1 | 2016-05-16 12:22:12.000 | 297 |
| 24366 | 172.16.4.1 | 2016-05-16 12:23:52.000 | 243 |
| 25699 | 172.16.2.1 | 2016-05-16 18:21:31.000 | 226 |
| 25794 | 172.16.3.1 | 2016-05-16 18:21:41.000 | 347 |
| 25811 | 172.16.4.1 | 2016-05-16 18:22:51.000 | 270 |
| 27142 | 172.16.2.1 | 2016-05-17 00:22:45.000 | 227 |
| 27193 | 172.16.3.1 | 2016-05-17 00:22:55.000 | 347 |
| 27251 | 172.16.4.1 | 2016-05-17 00:23:59.000 | 270 |
+-------+------------+-------------------------+--------------------------+
I have an idea of how to do this programmatically, but I'm too new to SQL to know how to do it in a query. I want to get the delta of NumFails over a specific time period. For this, I want to be able to run a query that:
Selects each IP address from time period A (>2016-05-17 00:00:00.000 and <2016-05-17 01:00:00.000) and the matching IP address from time period B (>2016-05-16 00:00:00.000 and <2016-05-16 01:00:00.000), and returns the IP address and the difference: period A's NumFails minus period B's NumFails. This is done for every unique IP address in period A (all are unique), compared against period B.
Is there an easy way to do this? I want to run the report on a daily basis, so period A will shift to today's date and period B will be the previous day's date. I can pre-populate those dates in the calling SQL, but I have no clue what to build to grab the two values, compute the difference and report it.
This solution assumes each IP will exist in both timeframes.
declare @StartPeriodA datetime
declare @EndPeriodA datetime
declare @StartPeriodB datetime
declare @EndPeriodB datetime
set @StartPeriodA = '2016-05-17 00:00:00.000'
set @EndPeriodA = '2016-05-17 01:00:00.000'
set @StartPeriodB = '2016-05-16 00:00:00.000'
set @EndPeriodB = '2016-05-16 01:00:00.000'
select a.IP, a.PeriodAFailures - b.PeriodBFailures as 'FailureDifference'
from
(
select IP, sum(NumFails) as PeriodAFailures
from YourTable
where Date between @StartPeriodA and @EndPeriodA
group by IP
) a
inner join
(
select IP, sum(NumFails) as PeriodBFailures
from YourTable
where Date between @StartPeriodB and @EndPeriodB
group by IP
) b on a.IP = b.IP
You can manipulate the dates as needed.
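To try the answer's query without a SQL Server instance, here is a Python sketch that runs an equivalent query against an in-memory SQLite database loaded with the sample rows. It is an approximation: SQLite stores the dates as text (which compares correctly in this fixed format) and uses `:name` parameters instead of T-SQL variables. BETWEEN is inclusive at both ends, unlike the strict bounds described in the question; that makes no difference for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE YourTable (ID INTEGER, IP TEXT, "Date" TEXT, NumFails INTEGER)')
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (21365, "172.16.2.1", "2016-05-16 00:20:54.000", 200),
        (21457, "172.16.3.1", "2016-05-16 00:21:05.000", 295),
        (21478, "172.16.4.1", "2016-05-16 00:22:46.000", 128),
        (24255, "172.16.2.1", "2016-05-16 12:22:01.000", 213),
        (24318, "172.16.3.1", "2016-05-16 12:22:12.000", 297),
        (24366, "172.16.4.1", "2016-05-16 12:23:52.000", 243),
        (25699, "172.16.2.1", "2016-05-16 18:21:31.000", 226),
        (25794, "172.16.3.1", "2016-05-16 18:21:41.000", 347),
        (25811, "172.16.4.1", "2016-05-16 18:22:51.000", 270),
        (27142, "172.16.2.1", "2016-05-17 00:22:45.000", 227),
        (27193, "172.16.3.1", "2016-05-17 00:22:55.000", 347),
        (27251, "172.16.4.1", "2016-05-17 00:23:59.000", 270),
    ],
)

# Same shape as the answer: aggregate each period per IP, then inner join on IP.
query = """
SELECT a.IP, a.PeriodAFailures - b.PeriodBFailures AS FailureDifference
FROM (SELECT IP, SUM(NumFails) AS PeriodAFailures FROM YourTable
      WHERE "Date" BETWEEN :start_a AND :end_a GROUP BY IP) AS a
INNER JOIN (SELECT IP, SUM(NumFails) AS PeriodBFailures FROM YourTable
      WHERE "Date" BETWEEN :start_b AND :end_b GROUP BY IP) AS b
  ON a.IP = b.IP
ORDER BY a.IP
"""
params = {
    "start_a": "2016-05-17 00:00:00.000", "end_a": "2016-05-17 01:00:00.000",
    "start_b": "2016-05-16 00:00:00.000", "end_b": "2016-05-16 01:00:00.000",
}
result = conn.execute(query, params).fetchall()
print(result)  # [('172.16.2.1', 27), ('172.16.3.1', 52), ('172.16.4.1', 142)]
```

Because of the inner join, an IP that appears in only one of the two periods would be dropped, which is the assumption the answer states up front.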
