I'm trying to use table decorator ranges on one of the App Engine RequestLog tables in BigQuery. According to the documentation, log entries are objects of type LogEntry: https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry.
There are two columns, timestamp and receiveTimestamp. The first is described as "The time the event described by the log entry occurred" and the second as "The time the log entry was received by Stackdriver Logging".
I compared the time range and the number of records returned by two queries against the same table: one filtering on the timestamp column, the other using a table decorator range.
Query using the timestamp column:
SELECT count(*), MIN( timestamp ), max( timestamp )
FROM [project_id:dataset.appengine_googleapis_com_request_log_20170622]
WHERE timestamp between timestamp('2017-06-22 01:00:00') and
date_add(timestamp('2017-06-22 01:00:00'), 1, 'hour')
Query result.
1698320 | 2017-06-22 01:00:00 UTC | 2017-06-22 01:59:59 UTC
Query where I'm using table decorator range.
-- Decorator bounds were computed with:
-- select timestamp_to_msec(timestamp('2017-06-22 01:00:00')) as time1,
--        timestamp_to_msec(date_add(timestamp('2017-06-22 01:00:00'), 1, 'hour')) as time2
SELECT count(*), min(timestamp), max(timestamp)
FROM [project_id:dataset.appengine_googleapis_com_request_log_20170622#1498093200000-1498096800000]
Query result.
1534754 | 2017-06-22 00:40:45 UTC | 2017-06-22 01:35:59 UTC
I did not get the same date range or the same number of records. What does each of these three timestamps mean? And how do table decorator ranges work under the hood? (Does BigQuery make snapshots of tables, and if so, when?)
The Table Decorators documentation explains that decorators work on snapshots: a decorator "References a snapshot of the table at " a given time, and that time is when the data was ingested into BigQuery. It is unrelated to the timestamp fields in your table, because those fields represent the time the event happened, not the time the row was ingested into BigQuery. That is why the decorator query returns rows whose event timestamps run from 00:40:45 to 01:35:59 rather than covering exactly 01:00:00 to 02:00:00.
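To make the distinction concrete, here is a minimal Python sketch (not BigQuery code). It reproduces the millisecond bounds used in the decorator above, then filters a few made-up rows once by event time and once by ingestion time. The three rows and the idea of a separate "ingested at" value per row are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

def to_msec(dt):
    """Epoch milliseconds for a timezone-aware datetime."""
    return int(dt.timestamp() * 1000)

start = datetime(2017, 6, 22, 1, 0, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=1)
bounds = (to_msec(start), to_msec(end))

# Hypothetical rows: (event timestamp, time the row reached the table).
rows = [
    (start - timedelta(minutes=19), start + timedelta(minutes=1)),   # older event, ingested inside the window
    (start + timedelta(minutes=10), start + timedelta(minutes=30)),  # event and ingestion both inside the window
    (start + timedelta(minutes=50), end + timedelta(minutes=15)),    # event inside the window, ingested after it
]

# Filtering on the event time (like WHERE timestamp BETWEEN ...).
by_event_time = [r for r in rows if start <= r[0] <= end]
# Filtering on the ingestion time (what a decorator range is assumed to do here).
by_ingest_time = [r for r in rows if start <= r[1] <= end]

print(bounds)
print(len(by_event_time), len(by_ingest_time))
```

The two filters select different rows, and the ingestion-based one picks up an event that happened before 01:00:00, which matches the shape of the results in the question.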
I have a cron job that loads the previous day's data from a table that is refreshed daily. I want to change the job to read from the origination table instead, which holds an entire year of data, while still capturing and loading only the previous day of data. The origination table has a date updated field in timestamptz format (e.g. 2022-08-01 20:20:20.736+00). Any recommendation on which function to use so that the job picks up:
last_updated >= 2022-08-01 10:00:00 and last_updated <= 2022-08-02 10:00:00, assuming I am running this on 2022-08-02 11:00:00?
Thanks,
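One way to reason about the window is to derive both bounds from the run time. The Python sketch below does that, under the assumption (inferred only from the example in the question) that the window ends one hour before the top of the hour in which the job runs and spans exactly 24 hours. The function name previous_day_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def previous_day_window(run_time):
    """Return (start, end): the 24 hours ending one hour before the top of the run hour."""
    end = run_time.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    return end - timedelta(days=1), end

# Run time taken from the example in the question.
run_time = datetime(2022, 8, 2, 11, 0, 0, tzinfo=timezone.utc)
start, end = previous_day_window(run_time)
print(start.isoformat(), end.isoformat())
```

The two values could then be passed as parameters to a query of the form last_updated >= start and last_updated <= end, or the same arithmetic could be written with the database's own date functions.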
I am trying to create a calendar using Power Query functions, and for that I used the syntax below in a blank query:
Source= Duration.TotalDays(DateTime.LocalNow() - #datetime(2014,01,01,00,00,00)) * 24
Date= List.DateTimes(#datetime(2014,01,01,00,00,00), Source ,#duration(0,1,0,0))
Then I convert the list to a table and apply the query.
I connect the dimension date table to the date column in the fact table.
The error occurs when I try to mark the table as a date table:
‘The date column can only have one timestamp per day. The date column
can’t have gaps in dates’
What have I done wrong?
As the error message says:
The date column can only have one timestamp per day.
But you are generating 24 values per day, one for each hour. See the requirements for setting a table as a date table:
if it is a Date/Time data type, it has the same timestamp across each value
That is, you can have only one value for each date, and if the column is a datetime rather than a date, the time portion must be the same in every row.
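The requirement can be illustrated with a short Python sketch (not Power Query) that builds one entry per calendar day with no gaps. The function name and the date range are hypothetical. In the query from the question, the likely equivalent would be to drop the "* 24", round the day count to a whole number, and step by #duration(1,0,0,0) instead of #duration(0,1,0,0), though that variant is an untested assumption.

```python
from datetime import date, timedelta

def daily_calendar(first, last):
    """One entry per calendar day from first to last, inclusive."""
    days = (last - first).days + 1
    return [first + timedelta(days=i) for i in range(days)]

calendar = daily_calendar(date(2014, 1, 1), date(2014, 1, 10))

# Each date appears once, and consecutive entries are exactly one day apart.
gaps = [b - a for a, b in zip(calendar, calendar[1:])]
print(len(calendar), len(set(calendar)), set(gaps))
```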
I'm trying to rank a series of transactions. However, my source data does not capture the time of a transaction, and transactions can happen multiple times a day. The only other field I can use is a timestamp field. Will this be ranked correctly?
Here's the code
SELECT [LT].[StockCode]
, [LT].[Warehouse]
, [LT].[Lot]
, [LT].[Bin]
, [LT].[TrnDate]
, [LT].[TrnQuantity]
, [LT].[TimeStamp]
, LotRanking = Rank() Over (Partition By [LT].[Warehouse],[LT].[StockCode],[LT].[Lot] Order By [LT].[TrnDate] Desc, [LT].[TimeStamp] Desc)
From [LotTransactions] [LT]
Results being returned are as below
StockCode |Warehouse |Lot |Bin |TrnDate |TrnQuantity |TimeStamp |LotRanking
2090 |CB |3036 |CB |2016-02-16 00:00:00.000 |2.000000 |0x0000000000500AB9 |1
2090 |CB |3036 |CB |2016-02-16 00:00:00.000 |2.000000 |0x0000000000500A4E |2
First, you should be using rowversion rather than timestamp for keeping track of row versioning information. I believe timestamp is deprecated. At the very least, the documentation explicitly suggests rowversion.
Second, I would strongly recommend that you add an identity column to the table. This will provide the information that you really need -- as well as a nice unique key for the table.
In general, a timestamp or rowversion is used just to determine whether or not a row has changed -- not to determine the ordering. But, based on this description, what you are doing might be correct:
Each database has a counter that is incremented for each insert or
update operation that is performed on a table that contains a
timestamp column within the database. This counter is the database
timestamp. This tracks a relative time within a database, not an
actual time that can be associated with a clock. A table can have only
one timestamp column. Every time that a row with a timestamp column is
modified or inserted, the incremented database timestamp value is
inserted in the timestamp column.
I would caution that this might not be safe. The quoted description only gives a reason why such an approach might make sense. Let me repeat the recommendation: add an identity column, so that you capture this ordering information correctly, at least for the future.
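As a rough illustration of why the ranking in the question comes out the way it does, the Python sketch below treats the two TimeStamp values from the question's output as integers and sorts on (TrnDate, TimeStamp) descending. It assumes the 8-byte value compares as a big-endian counter; the helper name is hypothetical.

```python
def rowversion_to_int(hex_value):
    """Interpret a hex literal such as '0x0000000000500AB9' as an integer."""
    return int(hex_value, 16)

# (TrnDate, TimeStamp) pairs copied from the result rows in the question.
rows = [
    ("2016-02-16", "0x0000000000500A4E"),
    ("2016-02-16", "0x0000000000500AB9"),
]

# Same ordering as: Order By TrnDate Desc, TimeStamp Desc
ranked = sorted(rows, key=lambda r: (r[0], rowversion_to_int(r[1])), reverse=True)
for rank, row in enumerate(ranked, start=1):
    print(rank, row)
```

Because the counter is also incremented on updates, this order reflects the last modification of each row, not necessarily the order in which the transactions were inserted.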
You can use something like this to get datetime of transaction:
SELECT LEFT(CONVERT(nvarchar(50),[LT].[TrnDate],121),10) + RIGHT(CONVERT(nvarchar(50),CAST([LT].[TimeStamp] as datetime),121),13)
For the first row it will be:
2016-02-16 04:51:25.417
And use this for ranking.
I have two tables, TicketReport and TimeTracker.
TicketReport Columns:
Ticket_Number
Report_DT (DateTime ticket was reported)
Response_Time (Hourly value. How long it took for this ticket to be responded to. This is where I need to place the hours. DateTime Reported minus DateTime started)
TimeTracker Column:
Ticket_Number
Time_Start( DateTime that work started on this ticket)
Right now, for every row in my TicketReport table, the Response_Time column contains either NULL or a test value of 1 (hour).
I need to calculate how many hours it took for a ticket to be responded to (the difference between Report_DT and Time_Start), and then store that hourly value in the Response_Time column for each row in the TicketReport table.
I did some research and found DATEDIFF, but I think it only returns days, and even if it does return hours, I'm not sure how to use it.
How could I accomplish this in a stored procedure?
To update the response_time column with the number of hours, you can use a simple update query (which you could wrap in a stored procedure if you need to).
The query could look like this:
-- uncomment the next line to create a sp...
-- create proc update_response_time as
update tr
set response_time = datediff(hour, report_dt, time_start)
from TicketReport tr
join TimeTracker tt on tr.Ticket_Number = tt.Ticket_Number
If you want to run it without updating (to see what the values would be) you can run it as a select query:
select *, datediff(hour, report_dt, time_start) as diff_in_hours
from TicketReport tr
join TimeTracker tt on tr.Ticket_Number = tt.Ticket_Number
Be aware that these queries assume that there is just one matching row in the TimeTracker table for each ticket. If there can be multiple rows, you would need another solution. Also note that datediff with the hour datepart counts hour boundaries crossed and ignores minutes, so if report_dt is at 12:00 and time_start is at 12:30 it is reported as 0 hours. A finer granularity than hours, such as minutes, may be more suitable.
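The boundary-counting behaviour can be sketched in Python as follows. This is an emulation for illustration, not SQL Server code: datediff_hours truncates both values to the hour before subtracting, and elapsed_hours shows the fractional alternative. Both function names are hypothetical.

```python
from datetime import datetime

def datediff_hours(start, end):
    """Count the hour boundaries crossed between start and end."""
    def floor_to_hour(dt):
        return dt.replace(minute=0, second=0, microsecond=0)
    return int((floor_to_hour(end) - floor_to_hour(start)).total_seconds() // 3600)

def elapsed_hours(start, end):
    """Actual elapsed time in hours, including the fractional part."""
    return (end - start).total_seconds() / 3600

half_hour = (datetime(2016, 1, 1, 12, 0), datetime(2016, 1, 1, 12, 30))
two_minutes = (datetime(2016, 1, 1, 12, 59), datetime(2016, 1, 1, 13, 1))

print(datediff_hours(*half_hour), elapsed_hours(*half_hour))
print(datediff_hours(*two_minutes), elapsed_hours(*two_minutes))
```

A 30-minute response inside one clock hour counts as 0, while a 2-minute response that straddles 13:00 counts as 1, which is why computing the difference in minutes and dividing by 60 may give a more useful value.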
You can use the DATEDIFF function.
Syntax:
DATEDIFF ( datepart , startdate , enddate )
Pass hh (or hour) as the datepart to get hours.
I have a table of database size information. The data is collected daily. However, some days are missed for various reasons. Additionally, we have databases which come and go over time, or the size does not get recorded for several databases for a day or two. This all leads to very inconsistent data collection regarding dates. I want to construct a SQL procedure which will generate a percentage of change between any two dates (1 week, monthly, quarterly, etc.) for ALL databases. The problem is what to do if a chosen date is missing (no rows for that date, or no row for one or more databases for that date). What I want to be able to do is get the nearest available date for each database for the two dates (begin and end).
For instance, if database Mydb has these recording dates:
2015-05-03
2015-05-04
2015-05-05
2015-05-08
2015-05-09
2015-05-10
2015-05-11
2015-05-12
2015-05-14
and I want to compare 2015-05-06 with 2015-05-14
The 2015-05-06 date is missing, and so is 2015-05-07, so I would want to use the next available date, which is 2015-05-08. Keep in mind, MyOtherDb may only be missing the 2015-05-06 date but have the 2015-05-07 date available. So, for MyOtherDb I would be using 2015-05-07 for my comparison.
Is there a way to proceduralize this with SQL WITHOUT using a CURSOR?
You're overthinking this. Simply use BETWEEN in your WHERE clause with the two parameters.
In your example, if you perform the query:
SELECT * FROM DATABASE_AUDIT WHERE DATE BETWEEN param1 /*2015-05-06*/ and param2 /*2015-05-14*/
It will give you the desired results.
select (b.dbsize - a.dbsize ) / a.dbsize *100 dbSizecChangePercent from
( select top 1 * from dbAudit where auditDate = (select min(auditDate) from dbAudit where auditDate between '01/01/2015' and '01/07/2015')) a
cross join
(select top 1 * from dbAudit where auditDate = (select max(auditDate) from dbAudit where auditDate between '01/01/2015' and '01/07/2015')) b
The top 1 can be replaced by a group by. This assumes only one db audit per day.
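The same idea, applied per database, can be sketched in Python: for each database take the earliest and the latest recorded date inside the requested range and compute the change between them. This follows the min/max approach of the query above rather than a true "nearest date" search, and the sizes in the sample data are made up for illustration.

```python
from datetime import date

def percent_change(samples, begin, end):
    """samples: {db_name: {date: size}}. Returns {db_name: (first_date, last_date, pct)}."""
    result = {}
    for db, by_date in samples.items():
        in_range = sorted(d for d in by_date if begin <= d <= end)
        # Skip databases with fewer than two readings in range or a zero starting size.
        if len(in_range) < 2 or by_date[in_range[0]] == 0:
            continue
        first, last = in_range[0], in_range[-1]
        pct = (by_date[last] - by_date[first]) / by_date[first] * 100
        result[db] = (first, last, pct)
    return result

# Hypothetical sizes; the dates follow the example in the question.
samples = {
    "Mydb": {date(2015, 5, 5): 90, date(2015, 5, 8): 100, date(2015, 5, 14): 125},
    "MyOtherDb": {date(2015, 5, 7): 200, date(2015, 5, 14): 300},
}
changes = percent_change(samples, date(2015, 5, 6), date(2015, 5, 14))
for db, (first, last, pct) in changes.items():
    print(db, first, last, pct)
```

In SQL the per-database step would correspond to grouping by database name when picking the minimum and maximum audit date, which avoids a cursor.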