TDengine all streams stopped working for no reason - tdengine

All streams stopped working at the same time for no apparent reason.
One of the stream creation statements is:
CREATE STREAM IF NOT EXISTS cq_action_info_1m_s
  TRIGGER WINDOW_CLOSE
  into cq_action_info_1m SUBTABLE(CONCAT('cai1_', tbname)) AS
SELECT _wstart as _ts,
  avg(ave_time_top) as ave_time_top,
  avg(average_time) as average_time,
  max(handle_now_count) as handle_now_count,
  max(last_time) as last_time,
  max(max_time_top) as max_time_top,
  max(p_error_rate) as p_error_rate,
  max(profession_error) as profession_error,
  max(req_count) as req_count,
  max(req_second_count) as req_second_count,
  max(total_time) as total_time,
  max(unusual_count) as unusual_count,
  action, action_name, host, port, shelltype
FROM zzdb.action_info
PARTITION BY action, action_name, host, port, shelltype
INTERVAL(1m);
Some system information:
TDengine version: 3.0.2.2
I'm not sure whether the result of the last write is complete.
My latest query result still stops at the 30th. I don't know whether there is a local log file I could check to find the cause.

Related

Flink SQL Tumble Aggregation result not written out to filesystem locally

Context
I have a Flink job written with the Python SQL API. It consumes source data from Kinesis and produces results to Kinesis. I want to test it locally to make sure the application code is correct, so I mocked out both the Kinesis source and the Kinesis sink with the filesystem connector and ran the test pipeline locally. The local Flink job always runs successfully, but the sink file is always empty. The same happens when I run the code in the Flink SQL Client.
Here is my code:
CREATE TABLE incoming_data (
  requestId VARCHAR(4),
  groupId VARCHAR(32),
  userId VARCHAR(32),
  requestStartTime VARCHAR(32),
  processTime AS PROCTIME(),
  requestTime AS TO_TIMESTAMP(SUBSTR(REPLACE(requestStartTime, 'T', ' '), 0, 23), 'yyyy-MM-dd HH:mm:ss.SSS'),
  WATERMARK FOR requestTime AS requestTime - INTERVAL '5' SECOND
) WITH (
  'connector' = 'filesystem',
  'path' = '/path/to/test/json/file.json',
  'format' = 'json',
  'json.timestamp-format.standard' = 'ISO-8601'
);
CREATE TABLE user_latest_request (
  groupId VARCHAR(32),
  userId VARCHAR(32),
  latestRequestTime TIMESTAMP
) WITH (
  'connector' = 'filesystem',
  'path' = '/path/to/sink',
  'format' = 'csv'
);
INSERT INTO user_latest_request
SELECT groupId,
  userId,
  MAX(requestTime) as latestRequestTime
FROM incoming_data
GROUP BY TUMBLE(processTime, INTERVAL '1' SECOND), groupId, userId;
Curious what I am doing wrong here.
Note:
I am using Flink 1.11.0
If I dump data directly from source to sink without windowing and grouping, it works fine, so the source and sink tables are set up correctly. The problem seems to be with the tumbling window and grouping on the local filesystem.
This code works fine with Kinesis source and sink.
Have you enabled checkpointing? This is required if you are in `STREAMING` mode, which appears to be the case. See https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/file_sink/
The most likely cause is that there isn't enough data in the file being read to keep the job running long enough for the window to close. You have a processing-time-based window that is 1 second long, which means that the job will have to run for at least one second to guarantee that the first window will produce results.
Otherwise, once the source runs out of data the job will shut down, regardless of whether the window contains unreported results.
If you switch to event-time-based windowing, then when the file source runs out of data it will send one last watermark with the value MAX_WATERMARK, which will trigger the window.
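The difference between the two window types can be illustrated with a toy model in plain Python. This is only a sketch of the reasoning above, not Flink's API: the function names, the sample timestamps, and the simplified firing rule (a window fires once the clock or watermark reaches its end) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

# Stand-in for the final watermark a bounded source emits when it finishes.
MAX_WATERMARK = float("inf")


def _count_per_window(timestamps_s: Iterable[float], window_s: float) -> Dict[float, int]:
    """Assign each timestamp to the tumbling window that contains it."""
    return dict(Counter((t // window_s) * window_s for t in timestamps_s))


def processing_time_windows(arrival_times_s, window_s, job_end_s):
    """Windows keyed by wall-clock arrival time.

    A window only emits if the job is still running when the window ends.
    """
    counts = _count_per_window(arrival_times_s, window_s)
    return {start: n for start, n in counts.items() if start + window_s <= job_end_s}


def event_time_windows(event_times_s, window_s, final_watermark):
    """Windows keyed by the timestamp carried in each record.

    A window emits once the watermark passes the window end.
    """
    counts = _count_per_window(event_times_s, window_s)
    return {start: n for start, n in counts.items() if start + window_s <= final_watermark}


# Hypothetical run: a small file is read in 0.1 s of wall-clock time.
fast_read = processing_time_windows([0.01, 0.02, 0.05], window_s=1.0, job_end_s=0.1)
print("processing time:", fast_read)

# Same records windowed by their own timestamps, flushed by the final watermark.
flushed = event_time_windows([10.2, 10.7, 11.1], window_s=1.0, final_watermark=MAX_WATERMARK)
print("event time:", flushed)
```

In this model the processing-time run emits nothing because the job ends before the 1-second window closes, while the event-time run emits every window because the final watermark is larger than any window end.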

Flink job submitted through sql-client.sh sometimes runs without any checkpoint (how can I change that?), and how to recover in case of failure

For example, in sql-client.sh embedded mode:
insert into wap_fileused_daily(orgId, pdate, platform, platform_count) select u.orgId, u.pdate, coalesce(p.platform,'other'), sum(u.isMessage) as platform_count from users as u left join ua_map_platform as p on u.uaType = p.uatype where u.isMessage = 1 group by u.orgId, u.pdate, p.platform
After the job is submitted, it never takes any checkpoint.
Questions:
1) How do I trigger checkpoints (alter the job)?
2) How do I recover in case of failure?
You can specify execution configuration parameters in the SQL Client YAML file. For example, the following should work:
configuration:
  execution.checkpointing.interval: 42
There is a related feature request for Flink: https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished

Deadlock wait resource RID

Can anyone help me with the steps to identify the exact culprit of a deadlock when the wait resource is a RID, and also how to remove that deadlock?
You can use the sp_who and sp_who2 system stored procedures.
sp_who
sp_who returns information about system and user activity. It returns the following columns:
Spid: System process ID that requested the lock.
Ecid: Execution context of the thread associated with the spid. Zero means the main thread; all other numbers mean sub-threads.
Status: Runnable, sleeping, or background. Runnable means the process is actually performing work; sleeping means the process is connected to the server but is idle at the moment.
Loginname: The login that has initiated the lock request.
Hostname: The name of the computer where the lock request was initiated.
Blk: The connection that is blocking the lock request from the current connection.
Dbname: Database name where the lock has been requested.
Cmd: General command type that requested the lock.
The syntax of sp_who also allows specifying a single login; however, most of the time it will be executed with no parameters. The output of sp_who is very similar to the output of sp_who2.
sp_who2
sp_who2 is a newer version of sp_who. It returns some additional information:
Spid: System process ID that requested the lock.
Status: Background, sleeping, or runnable.
Login: The login name that has requested the lock.
HostName: The computer where the lock request has been initiated.
BlkBy: The spid of the connection that is blocking the current connection.
DbName: The database name where the lock request has been generated.
Command: General command type that requested the lock.
CPUTime: The number of milliseconds the request has used.
DiskIO: Disk input/output that the command has used.
LastBatch: Date and time of the last batch executed by the connection.
ProgramName: The name of the application that issued the connection.
Spid: In case you can't read the spid from the beginning of the output, it is repeated here.
Example: sp_who2
Blocking details:
--============================================
--View Blocking in Current Database
--Author: Timothy Ford
--http://thesqlagentman.com
--============================================
SELECT DTL.resource_type,
    CASE
        WHEN DTL.resource_type IN ('DATABASE', 'FILE', 'METADATA') THEN DTL.resource_type
        WHEN DTL.resource_type = 'OBJECT' THEN OBJECT_NAME(DTL.resource_associated_entity_id)
        WHEN DTL.resource_type IN ('KEY', 'PAGE', 'RID') THEN
            (
                SELECT OBJECT_NAME([object_id])
                FROM sys.partitions
                WHERE sys.partitions.hobt_id = DTL.resource_associated_entity_id
            )
        ELSE 'Unidentified'
    END AS requested_object_name,
    DTL.request_mode,
    DTL.request_status,
    DOWT.wait_duration_ms,
    DOWT.wait_type,
    DOWT.session_id AS [blocked_session_id],
    sp_blocked.[loginame] AS [blocked_user],
    DEST_blocked.[text] AS [blocked_command],
    DOWT.blocking_session_id,
    sp_blocking.[loginame] AS [blocking_user],
    DEST_blocking.[text] AS [blocking_command],
    DOWT.resource_description
FROM sys.dm_tran_locks DTL
    INNER JOIN sys.dm_os_waiting_tasks DOWT
        ON DTL.lock_owner_address = DOWT.resource_address
    INNER JOIN sys.sysprocesses sp_blocked
        ON DOWT.[session_id] = sp_blocked.[spid]
    INNER JOIN sys.sysprocesses sp_blocking
        ON DOWT.[blocking_session_id] = sp_blocking.[spid]
    CROSS APPLY sys.[dm_exec_sql_text](sp_blocked.[sql_handle]) AS DEST_blocked
    CROSS APPLY sys.[dm_exec_sql_text](sp_blocking.[sql_handle]) AS DEST_blocking
WHERE DTL.[resource_database_id] = DB_ID()
sp-who-to-find-dead-locks-in-SQL-Server
different-techniques-to-identify-blocking-in-sql-server
understanding-sql-server-blocking

How to resolve Application timeout issue due to SQL Query in 'Killed/ROLLBACK' Scenario

I have an application with a database in SQL Server 2012; the application uses Entity Framework to communicate with the database. One feature updates a record in a table based on a WHERE condition on the primary key field, so the update only ever touches a single record (there is no loop, just an update to a single record). This is the background.
Here is the issue I'm facing: I get a timeout error from the application when I invoke that update feature. I checked the query execution in SQL Server using Activity Monitor, and under the Processes tab I could see that the Command of this query changes to 'KILLED/ROLLBACK' after some wait types (such as LOGBUFFER, PAGEIOLATCH_XX, etc.).
I tried executing the update query directly in SQL Server, and that does not respond either. So it's clear that the issue is with SQL Server, which takes too long to execute the update query. But the execution plan looks good, and the PK field is used in the WHERE condition. Is it related to disk latency?
Note: This issue is not consistent; sometimes it works.
Here are the file stats. How can I interpret these data or reach a conclusion from them?
My DB
DbId FileId TimeStamp NumberReads BytesRead IoStallReadMS NumberWrites BytesWritten IoStallWriteMS IoStallMS BytesOnDisk FileHandle
2 1 -1152466625 21199845 1351872315392 2528572322 21869447 1424883785728 10201419039 12729991361 28266332160 0x0000000000000C64
2 2 -1152466625 1063 45187072 87119 1013000 61628433920 178901888 178989007 6945505280 0x0000000000000CC4
TempDB
DbId FileId TimeStamp NumberReads BytesRead IoStallReadMS NumberWrites BytesWritten IoStallWriteMS IoStallMS BytesOnDisk FileHandle
18 1 -1152466625 390905 27728437248 52640514 196501 6927843328 817347538 869988052 58378551296 0x0000000000001F3C
18 2 -1152466625 24840 1596173312 645024 56563 3335298048 2590871 3235895 938344448 0x00000000000012BC
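
One common way to read these counters is to divide each stall total by the matching I/O count, which gives an average wait per read and per write. The sketch below does that arithmetic in Python on the four rows above. The class and field names are made up for illustration, and the interpretation assumes the counters are cumulative totals since the instance last started, so the averages cannot show when the stalls happened.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """One row of the file stats above (only the columns needed here)."""
    label: str
    db_id: int
    file_id: int
    number_reads: int
    io_stall_read_ms: int
    number_writes: int
    io_stall_write_ms: int

    def avg_read_stall_ms(self) -> float:
        # Average time a read waited, in milliseconds per read.
        return self.io_stall_read_ms / self.number_reads if self.number_reads else 0.0

    def avg_write_stall_ms(self) -> float:
        # Average time a write waited, in milliseconds per write.
        return self.io_stall_write_ms / self.number_writes if self.number_writes else 0.0


# Values copied from the rows posted in the question.
STATS = [
    FileStats("My DB", 2, 1, 21199845, 2528572322, 21869447, 10201419039),
    FileStats("My DB", 2, 2, 1063, 87119, 1013000, 178901888),
    FileStats("TempDB", 18, 1, 390905, 52640514, 196501, 817347538),
    FileStats("TempDB", 18, 2, 24840, 645024, 56563, 2590871),
]

for s in STATS:
    print(
        f"{s.label} (DbId {s.db_id}, FileId {s.file_id}): "
        f"{s.avg_read_stall_ms():.1f} ms per read, "
        f"{s.avg_write_stall_ms():.1f} ms per write"
    )
```

For the first row this works out to roughly 119 ms per read and 466 ms per write. Averages in the hundreds of milliseconds would be consistent with the disk latency suspicion raised in the question, although comparing two snapshots taken a few minutes apart would give a more reliable picture than lifetime totals.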

Resources