Does DATE_TRUNC Remove Data? [PostgreSQL]

I want to use an SQL query with DATE_TRUNC(). I saw this entry: Snowflake date_trunc to remove time from date
I tested it in local Docker containers and it worked fine. Just to be sure, does DATE_TRUNC remove/pop timestamps? It sounds like truncate :) Thanks for your time.
e.g.
SELECT
  DATE_TRUNC('month', production_timestamp) AS production_to_month,
  COUNT(id) AS count
FROM watch
GROUP BY DATE_TRUNC('month', production_timestamp);
I want to calculate the monthly number of records in a column, without updating any data.

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
It truncates a timestamp to the accuracy you specify, returning that new value. It doesn't change any data in tables.
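A quick way to convince yourself is to select the original column and the truncated value side by side. A minimal sketch against the watch table from the question (the LIMIT is only to keep the illustration short):
SELECT production_timestamp,
       DATE_TRUNC('month', production_timestamp) AS production_to_month
FROM watch
LIMIT 5;
The rows in the table stay exactly as they were; only the SELECT output contains the truncated values.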

Related

Best way to handle time-consuming queries in InfluxDB

We have an API that queries an Influx database and a report functionality was implemented so the user can query data using a start and end date.
The problem is that when a longer period is chosen (usually more than 8 weeks), we get a timeout from Influx; the query takes around 13 seconds to run. When the query returns a dataset successfully, we store it in a cache.
The most time-consuming part of the query is probably the comparisons and averages we do, something like this:
SELECT mean("value") AS "mean", min("value") AS "min", max("value") AS "max"
FROM $MEASUREMENT
WHERE time >= $startDate AND time < $endDate
AND ("field" = 'myFieldValue' )
GROUP BY "tagname"
What would be the best approach to fix this? I can of course limit the number of weeks the user can choose, but I guess that's not the ideal fix.
How would you approach this? Increase timeout? Batch query? Any database optimization to be able to run this faster?
In cases like this, where you allow the user to select a range in days, I would suggest having another table that stores the result (min, max and avg) of each day as a document. This table can be populated by a job that runs after the end of each day.
You can also consider changing the document granularity from per day to per week or per month, based on how you plot the values. You can also add more fields, such as tagname and the other fields in your case.
The reason this is superior to using a cache: a cache only stores the results of queries that have already run, so every new combination still has to be computed in real time. With pre-aggregated documents, the cumulative results are already available and there is a much smaller dataset left to compute over.
Based on your query, I assume you are using InfluxDB 1.x. You could try Continuous Queries, which are InfluxQL queries that run automatically and periodically on real-time data and store the query results in a specified measurement.
In your case, for each report, you could generate a CQ and let your users query it.
e.g.:
Step 1: create a CQ
CREATE CONTINUOUS QUERY "cq_basic_rp" ON "db"
BEGIN
SELECT mean("value") AS "mean", min("value") AS "min", max("value") AS "max"
INTO "mean_min_max"
FROM $MEASUREMENT
WHERE "field" = 'myFieldValue' // note that the time filter is not here
GROUP BY time(1h), "tagname" // here you can define the job interval
END
Step 2: Query against that CQ
SELECT * FROM "mean_min_max"
WHERE time >= $startDate AND time < $endDate -- here you can pass the user's time filter
Since you are already asking InfluxDB to run these aggregates continuously at the specified interval, you should be able to trade space for time.

AnyLogic: How to create a plot from a database table?

In my AnyLogic model I successfully create plots of datasets that count the number of trucks arriving from terminals each hour in my simulation. Now I want to add the actual/"observed" number of trucks arriving at a terminal, to compare my simulation to these numbers. I added these numbers to a database table (see picture below). Is there a simple way of adding this data to the plot?
I tried creating a variable that reads the database table every hour and adds the value to a dataset (as can be seen in the pictures below), but unfortunately this did not work (the plot was empty).
Maybe simply delete the variable and fill the dataset at the start of the model by looping through the dbase table data. Use the dbase query wizard to create a for-loop. Something like this should work:
int numEntries = (int) selectFrom(observed_arrivals).count();
DataSet myDataSet = new DataSet(numEntries);
List<Tuple> rows = selectFrom(observed_arrivals).list();
for (Tuple row : rows) {
    myDataSet.add(row.get(observed_arrivals.hour), row.get(observed_arrivals.terminal_a));
}
myChart.addDataSet(myDataSet);
You don't explain why it "didn't work" (what errors/problems did you get?), nor where you defined these elements.
(1) Since you want both observed (empirical) and simulated arrivals per terminal, datasets for each should be in the Terminal agent. And then the replicated plot (in Main) can have two data entries referring to data sets terminals(index).observedArrivals and terminals(index).simulatedArrivals or whatever you name them.
(2) Using getHourOfDay to add to the observed dataset is wrong because that just returns 0-23 (i.e., the hour in the current day for the current model date). Your database table looks like it has hours since model start, so you just want time(HOUR) to get the model time in elapsed hours (irrespective of what the model time unit is). Or possibly time(HOUR) - 1 if you only want to update the empirical arrivals for the hour at the end of that hour (i.e., at the same time that you updated the simulated arrivals).
(3) Using a Variable to get the database value each hour doesn't work because a variable's initial value is only evaluated once at model initialisation. You want an hourly cyclic Event in Terminal instead which adds the relevant row's value. (You need to use the Insert Database Query wizard to generate the relevant Java code for the query you need in the event's action.)
(4) Because you have a database table with specifically-named columns for each terminal (columns terminal_a and presumably terminal_b etc.) that makes it slightly more awkward. (This isn't proper relational table design where, instead of 4 columns for the 4 terminals, you'd instead have two columns for terminal_id and observed_value with a row for each time period and terminal combination.)
So your database query expression (in your Terminal agents) will need to use the SQL format (not the QueryDSL format) so that you can 'stitch in' the correct column name into the SQL.
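For illustration, a hypothetical sketch of the SQL-format query a Terminal agent could run each hour (the table and column names are taken from the question's observed_arrivals table; the terminal_a column would be swapped in per terminal, and 5 stands in for the current model hour):
SELECT terminal_a FROM observed_arrivals WHERE hour = 5;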

Extract data by day from SQL Server

I need to get all the values from a SQL Server database by day (24 hours). I have a timestamp column in the TestAllData table and I want to select the data that corresponds to a specific day only.
For instance, there are timestamps of DateTime type like '2019-03-19 12:26:03.002', '2019-03-19 17:31:09.024' and '2019-04-10 14:45:12.015', so I want to load the data for the day 2019-03-19 and separately for the day 2019-04-10. Basically, I need to group DateTime values that share the same date.
Is this possible to use some functions like DatePart or DateDiff for that?
And how can I solve such problem overall?
In this case, I do not know the exact difference in hours between a timestamp and the end of the day (because there are various timestamps for one day), so I need to extract the day itself from the timestamp. After that, I need to group the data by day and retrieve it block by block. For example:
'2019-03-19' - 1200 records
'2019-04-10' - 3500 records
'2019-05-12' - 10000 records and so on
I'm looking for a more generic solution that does not supply a timestamp (like '2019-03-19') as a boundary or in a WHERE clause, because the problem is not about simply filtering the data by some date.
UPDATE: In my dataset, I have about 1,000,000 records and more than 100 unique dates. I was thinking about extracting the set of unique dates and then running a query in a loop, filtering the data by each date in turn. It would look like this:
select * from TestAllData where dayColumn = '2019-03-19'
select * from TestAllData where dayColumn = '2019-04-10'
select * from TestAllData where dayColumn = '2019-05-12'
...
I might use these queries in my code, running them in a loop from a Scala function. However, I am not sure whether it would be OK, performance-wise, to run a separate query per unique date.
Depending on whether you want to be able to work with all the dates (rather than just a subset), one of the easiest ways to achieve this is with a cast:
;with cte as (SELECT cast(my_datetime as date) as my_date, * from TestAllData)
SELECT * FROM cte where my_date = '2019-02-14'
Note that when casting datetime to date, the time is truncated, i.e. just the date part is extracted.
As I say though, whether this is efficient depends on your needs, as all datetime values from all records will be cast to date before the data is filtered. If you want to select several dates (as opposed to just one or two), however, it may prove quicker overall, as it reads the whole table once and then gives you a column on which you can filter much more efficiently.
If this is a permanent requirement, though, I would probably use a persisted computed column, which effectively means the cast is done once up front and then again only when the underlying value changes. For a large table I would also strongly consider an index on the computed column.
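A minimal sketch of that approach, assuming the datetime column is called my_datetime (adjust the names to your schema):
ALTER TABLE TestAllData ADD my_date AS CAST(my_datetime AS date) PERSISTED;
CREATE INDEX IX_TestAllData_my_date ON TestAllData (my_date);
The per-day record counts from the question then become a plain aggregate over the indexed column:
SELECT my_date, COUNT(*) AS record_count
FROM TestAllData
GROUP BY my_date
ORDER BY my_date;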

Display an image on a report based on two dates

I am using SQL Server 2005 Reporting Services to generate a report based on a database. There are two columns of datetime type, ColumnA and ColumnB. The report displays a KPI image by comparing these two columns. Below is the expression for selecting the image:
SWITCH(DateDiff("d",Fields!ColumnA.Value,Fields!ColumnB.Value)<0,"kpi_r",
DateDiff("d",Fields!ColumnA.Value,Fields!ColumnB.Value)>0,"kpi_g",
DateDiff("d",Fields!ColumnA.Value,Fields!ColumnB.Value)=0,"kpi_y")
For most of the records, the image is correct. Only for one record, the result is very strange.
For this record
ColumnA=2010-04-23 08:00:00 ColumnB=2010-04-22 17:00:00
It should display kpi_r, but it displayed kpi_y. I have checked the equivalent DATEDIFF(d, ColumnA, ColumnB) in SSMS, and the value is -1. Why does it display kpi_y? Has anyone encountered this problem before?
The difference is that the SSMS DATEDIFF function counts the interval boundaries crossed between the two dates, whereas the Report Builder DateDiff counts the actual intervals. In SSMS, if you cross midnight you have triggered a day boundary, so in your example you get -1. Report Builder is looking for a full 24 hours between the two values, so you get 0. If you change the time on ColumnA to '2010-04-23 17:00:00' you will see the value change to -1 as you expected. For your comparison it would probably make sense to strip the time component from ColumnA and ColumnB when you do this SWITCH statement.
The above answer is spot on.
Here are a few ways to strip the time off a date, depending on your preferences:
1. Do it in RS: use an expression like dateserial(year(Fields!ColumnA.Value), month(Fields!ColumnA.Value), day(Fields!ColumnA.Value)) in your SWITCH expression
2. Do it in SQL: use an expression like cast(round(cast(ColumnA as float), 0, 1) as datetime) in your query
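As a quick sanity check of both the boundary-counting behaviour described above and the SQL strip in option 2, using the values from the problem record (T-SQL):
SELECT
    DATEDIFF(day, '2010-04-23 08:00:00', '2010-04-22 17:00:00') AS boundary_diff, -- -1: one midnight boundary crossed
    CAST(ROUND(CAST(CAST('2010-04-23 08:00:00' AS datetime) AS float), 0, 1) AS datetime) AS date_only -- 2010-04-23 00:00:00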

How to display time in HH:MM format?

How can I display a time in HH:MM format?
Using SQL Server 2000.
In my database the time column's datatype is varchar.
Example
Table1
Time
08:00:00
09:00:23
214:23:32
Here I want to get only 08:00, 09:00, 214:23.
How can I write a query for this?
Whilst you could choose to turn the varchar into a datetime and format it there, assuming you do not want rounding, you can shortcut the process (assuming the time format in the varchar is consistent):
select left('08:00:00',5)
Edit: the question has been altered, so now I would use
select substring('243:00:00', 1, len('243:00:00') - 3)
and replace the value I used with the appropriate field
Cheap and cheerful.
I think Andrew was onto a correct solution; he just didn't address all of the possibilities:
SELECT LEFT(Time, LEN(TIME)-3)
should trim off the last 3 characters.
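Against the table from the question, a minimal usage sketch (assuming the varchar column is named Time in Table1, as shown above):
SELECT LEFT([Time], LEN([Time]) - 3) AS hh_mm
FROM Table1;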
Now, if you want to round up, that's another story....
