How to query the duration of a state in a day in TDengine?

I'm writing a program to process device data, using TDengine as the time-series database. I have a table with a status field representing the status of the device; its value is 1, 2, 3, or 4, one for each of four states. Whenever the device changes status, my program inserts one row into TDengine.
There is a requirement: query how long the device spent in each state during a given day. I can't figure out how to write the SQL for it. Do I need to use a time window?
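For what it's worth, here is a minimal sketch, assuming TDengine 3.x and a hypothetical table device_events (ts TIMESTAMP, status INT) with one row per state change. TDengine's STATE_WINDOW clause groups consecutive rows with the same status into one window:

```sql
-- State-window sketch: _wstart and _wduration are TDengine 3.x window
-- pseudo-columns; the table and column names are illustrative.
SELECT _wstart, _wduration AS duration, FIRST(status) AS status
FROM device_events
WHERE ts >= '2024-05-01 00:00:00' AND ts < '2024-05-02 00:00:00'
STATE_WINDOW(status);

-- Caveat: if a row is written only when the status changes, each state
-- window contains a single row and its duration comes out as zero. A
-- common workaround is to resample the stream, carrying the last status
-- forward, so every returned row below stands for one minute in that state:
SELECT _wstart, LAST(status) AS status
FROM device_events
WHERE ts >= '2024-05-01 00:00:00' AND ts < '2024-05-02 00:00:00'
INTERVAL(1m) FILL(PREV);
```

Summing the window durations (or counting the sampled minutes) per status value then gives the time spent in each state for the day.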

Related

Converting Large Data Table To Use Partitions

I have a single MSSQL 2017 Standard table, let's call it myTable, with data going back to 2015, containing 206.4 million rows. Once INSERTed by the application, these rows are never modified or deleted. The table is actively collecting data, 24/7.
My goal is to reduce the data in this table to only the most recent full 6 months plus the current month, organized into monthly partitions for easy monthly pruning. myTable.dateCreated would determine which partition the data ultimately resides in.
(Unrelated, but mentioning in case it ends up being relevant: I have an existing application that replicates all data that gets stored in myTable out to a data warehouse for long term storage every 15 minutes; the main application is able to query myTable for recent data and the data warehouse for older data as needed.)
Because I want to prune the oldest month's worth of data out of myTable each time a new month starts, partitioning myTable by month makes the most sense - I can simply SWITCH the oldest partition to a staging table, then truncate that staging table, without causing downtime or a performance hit on the main table.
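For reference, a hedged sketch of that pruning step (T-SQL; the partition number and table names are illustrative). SWITCH is a metadata-only operation, so it is effectively instant:

```sql
-- Move the oldest month out of the live table, then release the space.
-- The staging table must match myTable's schema and sit on the same
-- filegroup as the partition being switched out.
ALTER TABLE dbo.myTable
    SWITCH PARTITION 1 TO dbo.myTable_staging;  -- oldest monthly partition
TRUNCATE TABLE dbo.myTable_staging;
```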
I've come up with the following plan, and my questions are simple: Is this the best way to approach this task, and will it keep downtime/performance degradation to a minimum?
Create a new table, myTable_pending, with exactly the same structure as myTable, EXCEPT that it will have a total of 7 monthly partitions (6 months' retention plus the current month) configured;
In one atomic step: rename myTable to myTable_transfer, and rename myTable_pending to myTable. The net effect is that incoming data continues to be stored, but now lands in the partition for the month of 2023-01;
Step 3 is where I need advice... which of the following might be best to get the remaining 6 months + current data back into the now-partitioned myTable, or are there additional options I should consider?
OPTION 1: Run a bulk insert of just the most recent 6 months of data from myTable_transfer back into myTable, causing the data to land in the correct partitions in the process (sketched below, after the options). I understand this may still take some time, but not as long as a stream of row-by-row INSERTs chewing through the transaction log;
OPTION 2: Run a DELETE against myTable_transfer to get rid of all data except the most recent full 6 months + current month, then set up and apply partitions on THIS table, causing SQL Server to reorganize the data into those partitions without affecting access or performance on myTable. After that I could simply SWITCH the partitions from myTable_transfer into myTable for immediate access. (Related issue: since myTable is still collecting current data, and myTable_transfer will contain data from the current month as well, can the current-month partitions be merged?)
OPTION 3: Any other way to do this, so that myTable ends up with 6 months worth of data, properly partitioned, without significant downtime?
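For OPTION 1, a hedged sketch of what the reload might look like (T-SQL; the column list and cutoff date are illustrative, and minimal logging additionally depends on the recovery model and the target's indexes):

```sql
-- Reload the most recent ~6 months in one pass; TABLOCK is generally
-- required for the insert to be minimally logged.
INSERT INTO dbo.myTable WITH (TABLOCK) (dateCreated, col1, col2)
SELECT dateCreated, col1, col2
FROM dbo.myTable_transfer
WHERE dateCreated >= '2022-07-01';  -- illustrative 6-month cutoff
```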
We ended up revising our solution: since the original table was replicated to a data warehouse anyway, we simply renamed the table and created a new, partitioned one to start collecting data from the rename point. This gave us the least downtime, the fastest schema change, and the partitioning we needed to maintain the table efficiently going forward.
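A sketch of the swap itself (T-SQL; hedged, since only the table names come from the plan above). Running both renames in one transaction means writers never see a missing table:

```sql
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.myTable',         'myTable_transfer';
    EXEC sp_rename 'dbo.myTable_pending', 'myTable';
COMMIT TRANSACTION;
```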

SpringBatch application periodically pulling data from DB

I am working on a Spring Batch service that pulls data from a DB on a schedule (e.g. every day at 12 pm).
I am using JdbcPagingItemReader to read the data and a scheduler (@Scheduled, provided by Spring) to launch the job. The problem I have now is: every time the job runs, it pulls all the data from the beginning rather than from the "last read" row.
The data in the DB changes every day (old rows are deleted and new ones added), and all I have is a timestamp column to track them.
Is there a way to "remember" the last row read from the last execution of the job and read data only later than that row?
Since you need to pull data on a daily basis, and your records have a timestamp, you can design your job instances to be based on a given date (i.e. using the date as an identifying job parameter). With this approach, you do not need to "remember" the last processed record. All you need to do is process the records for a given date by using the correct SQL query. For example:
| Job instance ID | Date       | Job parameter   | SQL                                               |
|-----------------|------------|-----------------|---------------------------------------------------|
| 1               | 2021-03-22 | date=2021-03-22 | Select c1, c2 from table where date = 2021-03-22  |
| 2               | 2021-03-23 | date=2021-03-23 | Select c1, c2 from table where date = 2021-03-23  |
| ...             | ...        | ...             | ...                                               |
With that in place, you can use any cursor-based or paging-based reader to process the records of a given date. If a job instance fails, you can restart it without risk of interfering with other job instances; the restart can even happen several days after the failure, since the job instance will always process the same data set. Moreover, on failure and restart, Spring Batch will reprocess records from the last checkpoint of the previous (failed) run.
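As a hedged illustration (the :runDate parameter name and sort column are hypothetical), the reader's query is simply parameterized by the identifying job parameter instead of hard-coding the date:

```sql
-- :runDate is bound from the identifying job parameter at launch time.
-- A paging reader also needs a deterministic sort key, hence the ORDER BY.
SELECT c1, c2
FROM table
WHERE date = :runDate
ORDER BY c1;
```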
Just want to post an update to this question.
So in the end I created two more steps to achieve what I wanted to do initially.
Since I don't have the privilege to modify the table I read the data from, I couldn't use the "process indicator" pattern, which involves adding a column to mark whether a record has been processed. Instead, I created another table to store the last-read record's timestamp and use it to update the SQL query.
step 0: a tasklet that reads the bookmark from a table and passes it into the job context
step 1: a chunk step that gets the bookmark from the context and uses JdbcPagingItemReader to read the data
step 2: a tasklet that updates the bookmark
Doing it this way, though, I have to be very careful with the bookmark table: if I lose it, I lose everything.
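For illustration, a minimal sketch of the bookmark table and the three steps' queries (generic SQL; all names are made up, and the timestamp type should match your database):

```sql
-- One row per job holds the high-water mark.
CREATE TABLE job_bookmark (
    job_name  VARCHAR(100) PRIMARY KEY,
    last_read TIMESTAMP NOT NULL
);

-- step 0: read the bookmark into the job context
SELECT last_read FROM job_bookmark WHERE job_name = 'daily-pull';

-- step 1: the reader's query, bounded below by the bookmark
SELECT c1, c2, ts FROM source_table WHERE ts > :lastRead ORDER BY ts;

-- step 2: advance the bookmark only after the run succeeds
UPDATE job_bookmark SET last_read = :maxTsRead WHERE job_name = 'daily-pull';
```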

Maintaining Row Sequence across multiple tables

I have triggers on multiple tables that capture updates into a single table (the Capture Table). Once that insert completes, the trigger finishes and control returns to the calling application. This capture table is expected to receive about 1.75 million records per day.
I then have a Windows service that runs and takes the data from the Capture Table, adds some context by referencing other tables in our database, then inserts records into a notification table that is used by another process.
Our Capture Table has an IDENTITY column and a DATETIME to help with sequencing. For the DATETIME, we set a variable to GETDATE() and use the variable so that all records in the same transaction will have the exact same DATETIME.
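A hedged sketch of that pattern inside a trigger body (T-SQL; table and column names are illustrative):

```sql
-- One timestamp for the whole statement, so every captured row in this
-- trigger invocation shares it; (CapturedAt, IdentityCol) then gives a
-- total order for downstream readers.
DECLARE @now DATETIME = GETDATE();

INSERT INTO dbo.CaptureTable (SourceTable, SourceId, CapturedAt)
SELECT 'Orders', i.OrderId, @now
FROM inserted AS i;  -- "inserted" is the trigger's pseudo-table
```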
I am looking at making the process more robust. At this point the Windows service calls a stored procedure and processes one transaction at a time in sequence. This could become a bottleneck. We may need to process multiple transactions at the same time.
How can I guarantee the sequence is the same between the Capture Table and the notification table? It’s critical that the records stay in the same order. Thanks for your input!

SQL Server query with paging - different time for different pages

I have a table with 12,000 records, and I have a query where this table is joined with a few tables, plus paging. I measure time using SET STATISTICS TIME ON/OFF. The first pages are very fast, but the closer I get to the last page, the more time it takes. Is this normal?
This is normal, because SQL Server has no way to seek directly to a given page of a logical query. It scans through the stream of results until it arrives at the page you asked for.
If you want constant-time paging, you need to provide some kind of seek key on an index. For example, if you can guarantee that your ID int column has consecutive values starting with 1, you can get any page in constant time simply by saying WHERE ID >= ... AND ID < ....
I'm sure you'll find other approaches on the web but there's nothing built into the product.
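To make the seek-key idea concrete, a hedged sketch of keyset ("seek") paging (T-SQL; assumes an indexed ID column and a page size of 50):

```sql
-- Instead of scanning past N rows, remember the last ID of the previous
-- page and seek straight past it on the index.
SELECT TOP (50) ID, Col1, Col2
FROM dbo.MyTable
WHERE ID > @lastSeenId
ORDER BY ID;
```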

Send an Email when no inserts occur for a specified time period in SQL Server

One of the computers in my department runs an Excel macro every 30 seconds. When it finishes, it inserts a row into a table in a SQL Server DB that I control (I don't have the ability to change the macro).
I would like to demonstrate how frequently this setup fails by sending an email when there have been no new records during a given period. I'm familiar with sending emails through SQL Server, but I don't know how to accomplish the timing.
How do I trigger an email when there is no activity?
You can create a job that runs every half hour, selects the most recent row from the table based on the ID or date, and sends an email with the result. If there is no row recent enough, you can conclude the feed has failed.
You could set up a job that runs periodically - every minute, five minutes, etc. - and have it check the date/time of the last inserted record. If it's been longer than a predefined amount of time, you fire off an email.
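A hedged sketch of what that agent-job step could run (T-SQL; the table, column, threshold, and mail profile are all illustrative, and Database Mail must already be configured):

```sql
DECLARE @lastInsert DATETIME = (SELECT MAX(CreatedAt) FROM dbo.MacroLog);

-- Alert if nothing has arrived in the last 5 minutes.
IF @lastInsert IS NULL OR @lastInsert < DATEADD(MINUTE, -5, GETDATE())
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = 'DefaultProfile',
         @recipients   = 'team@example.com',
         @subject      = 'No new rows from the Excel macro feed',
         @body         = 'No rows have been inserted in the last 5 minutes.';
```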
