What causes data on a read replica to be an "old snapshot" and cause a conflict?

After encountering the following error several times (on an RDS Postgres instance):
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
I ran (on the hot standby):
SELECT *
FROM pg_stat_database_conflicts;
and found that all the conflicts are of the confl_snapshot kind, which the documentation explains as:
confl_snapshot: Number of queries in this database that have been canceled due to old
snapshots
What might be causing this conflict (being an old snapshot)?
If it helps, here are some of the relevant settings (from running SHOW ALL; on the standby):
hot_standby: on
hot_standby_feedback: off
max_standby_archive_delay: 30s
max_standby_streaming_delay: 1h
old_snapshot_threshold: -1
vacuum_defer_cleanup_age: 0
vacuum_freeze_min_age: 50000000
vacuum_freeze_table_age: 150000000
vacuum_multixact_freeze_min_age: 5000000
vacuum_multixact_freeze_table_age: 150000000
wal_level: replica
wal_receiver_status_interval: 10s
wal_receiver_timeout: 30s
wal_retrieve_retry_interval: 5s
wal_segment_size: 16MB
wal_sender_timeout: 30s
wal_writer_delay: 200ms
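The settings above point at the usual cause: with hot_standby_feedback off, the primary's VACUUM does not know which row versions long-running standby queries still need, so it removes them; once the standby has to replay that cleanup (after max_standby_streaming_delay runs out), the conflicting queries are cancelled and counted under confl_snapshot. A minimal sketch of the common mitigation, assuming a self-managed standby (on RDS the same setting is changed through the replica's DB parameter group rather than ALTER SYSTEM):

```sql
-- On the standby: report the oldest snapshot back to the primary so VACUUM
-- retains the row versions standby queries still need
-- (the trade-off is extra table bloat on the primary).
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

-- Verify the new value:
SHOW hot_standby_feedback;
```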

Related

MongoDB transactions aborted: which parameters should I increase?

Some of my transactions were aborted in MongoDB. From the log file, I dug up the info "transaction parameters:... terminationCause:aborted timeActiveMicros:205 timeInactiveMicros:245600632 ...
I increased transactionLifetimeLimitSeconds on the server from 60 to 3000; that is 50 minutes, which should be plenty, since the transaction should take at most 10 minutes. Still not working.
The second thing I tweaked was on the client side (pymongo): I changed wtimeout on write_concern from 1000 to 500000000, but I am still getting the same error.
Any other parameters I should change?
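For reference, the server-side limit mentioned above can be inspected and changed at runtime with standard admin commands; a sketch in mongosh, assuming sufficient privileges on the admin database:

```javascript
// In mongosh, against the admin database.
// Check the current value:
db.adminCommand({ getParameter: 1, transactionLifetimeLimitSeconds: 1 });

// Raise it to 3000 seconds, as described in the question
// (on a replica set this must be applied on every member):
db.adminCommand({ setParameter: 1, transactionLifetimeLimitSeconds: 3000 });
```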

How can I check the max-sql-memory and cache settings for an already running CockroachDB instance?

I have a CockroachDB instance running in production and would like to know the values of --max-sql-memory and --cache that were specified when the database was started. I am trying to improve performance by following the production checklist, but I am not able to infer the settings from either the dashboard or the SQL console.
Where can I check the values of max-sql-memory and cache?
Note: I do have access to the CockroachDB admin console and the SQL tables.
You can find this information in the logs, shortly after node startup:
I190626 10:22:47.714002 1 cli/start.go:1082 CockroachDB CCL v19.1.2 (x86_64-unknown-linux-gnu, built 2019/06/07 17:32:15, go1.11.6)
I190626 10:22:47.815277 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 31 GiB, using system memory
I190626 10:22:47.815311 1 server/config.go:386 system total memory: 31 GiB
I190626 10:22:47.815411 1 server/config.go:388 server configuration:
max offset 500000000
cache size 7.8 GiB <====
SQL memory pool size 7.8 GiB <====
scan interval 10m0s
scan min idle time 10ms
scan max idle time 1s
event log enabled true
If the logs have been rotated, the values depend on the flags passed at startup.
The defaults for v19.1 are 128MB, with the recommended setting being .25 (a quarter of system memory).
The settings are not currently logged periodically or exported through metrics.
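Since the values only appear in the startup banner, one way to recover them is to search the current log for the lines shown above; a sketch, assuming the default log location under the store directory (adjust the path to your --log-dir):

```shell
# Print the cache and SQL memory pool lines from the CockroachDB startup
# banner. The log path is an assumption; adjust it to your deployment.
grep -E "cache size|SQL memory pool size" cockroach-data/logs/cockroach.log
```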

Why does the log always say "No Data Available" while the cube is building?

In the sample case from the Kylin official website, while I was building the cube, the log for the first step, Create Intermediate Flat Hive Table, always says No Data Available, and the status is always running.
The cube build has been executing for more than three hours.
I checked the Hive database table kylin_sales and there is data in the table.
I also found that the intermediate flat Hive table kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c
has been created successfully in Hive, but there is no data in it.
hive> show tables;
OK
...
kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c
kylin_sales
...
Time taken: 9.816 seconds, Fetched: 10000 row(s)
hive> select * from kylin_sales;
OK
...
8992 2012-04-17 ABIN 15687 0 13 95.5336 17 10000975 10000507 ADMIN Shanghai
8993 2013-02-02 FP-non GTC 67698 0 13 85.7528 6 10000856 10004882 MODELER Hongkong
...
Time taken: 3.759 seconds, Fetched: 10000 row(s)
The deploy environment is as follows:
zookeeper-3.4.14
hadoop-3.2.0
hbase-1.4.9
apache-hive-2.3.4-bin
apache-kylin-2.6.1-bin-hbase1x
openssh5.3
jdk1.8.0_144
I deployed the cluster with Docker and created 3 containers: one master, two slaves.
The Create Intermediate Flat Hive Table step is still running.
"No Data Available" means this step's log has not yet been captured by Kylin. Usually the log is only recorded once the step exits (success or failure); then you will see the data.
In this case it usually indicates the job is pending in Hive, which can happen for many reasons. The simplest way to diagnose it is to watch Kylin's log: you will see the Hive command that Kylin executes, and you can then run it manually in a console to reproduce the problem. Please also check whether your Hive/Hadoop cluster has enough resources (CPU, memory) to execute such a query.
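To pull out the Hive command for manual replay, grepping Kylin's main log is usually enough; a sketch, assuming a default install layout with logs under $KYLIN_HOME/logs (the flat-table step runs an INSERT OVERWRITE into the intermediate table, but the exact log wording may differ between Kylin versions):

```shell
# Show the most recent flat-table statement Kylin logged.
# $KYLIN_HOME/logs/kylin.log is an assumption; adjust to your layout.
grep -i "insert overwrite" "$KYLIN_HOME/logs/kylin.log" | tail -1
```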

Almost empty plan cache

I am experiencing a strange situation - my plan cache is almost empty. I use the following query to see what's inside:
SELECT dec.plan_handle,qs.sql_handle, dec.usecounts, dec.refcounts, dec.objtype
, dec.cacheobjtype, des.dbid, des.text,deq.query_plan
FROM sys.dm_exec_cached_plans AS dec
join sys.dm_exec_query_stats AS qs on dec.plan_handle=qs.plan_handle
CROSS APPLY sys.dm_exec_sql_text(dec.plan_handle) AS des
CROSS APPLY sys.dm_exec_query_plan(dec.plan_handle) AS deq
WHERE cacheobjtype = N'Compiled Plan'
AND objtype IN (N'Adhoc', N'Prepared')
One moment it shows me 82 rows, the next 50, then 40, then 55, and so on, while an hour earlier I couldn't reach the end of the plan cache with the same command. The point is that SQL Server keeps the plan cache very, very small.
The main reason for my investigation is high CPU compared to our baselines: constantly 65-80%, without any heavy load, under the normal during-the-day workload.
Perfmon counters show low values for Plan Cache Hit Ratio (around 30-50%), high compilations (400 out of 2000 batch requests per second), and high CPU (73% on average). What could cause this behaviour?
The main purpose of the question is to learn the possible reasons for an empty plan cache.
Memory is OK: min server memory 0, max server memory 245000.
I also didn't notice any signs of memory pressure: PLE, lazy writes, free list stalls, and disk activity were all fine, and the logs did not tell me a thing.
I came here for possible causes of this so I could proceed with investigation.
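Two server-level settings worth ruling out early, since one stubs out single-use plans and the other bounds the memory the plan cache lives in; a quick check (a sketch, read-only):

```sql
-- 'optimize for ad hoc workloads' replaces first-use plans with small stubs;
-- 'max server memory (MB)' caps the buffer pool the plan cache shares.
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN (N'optimize for ad hoc workloads', N'max server memory (MB)');
```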
EDIT: I have also considered this thread:
SQL Server 2008 plan cache is almost always empty
But none of the recommendations/possible reasons are relevant.
The main purpose of the question is to learn the possible reasons for an empty plan cache.
If it is to learn, the answer from Martin Smith in the thread you referred to will help you.
If you want to know specifically why the plan cache is being emptied, I recommend using Extended Events; try the extended event below.
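The answer stops short of the actual session definition; a hedged sketch of one possibility, watching plans being evicted from the cache (the event name sp_cache_remove and the target file name are assumptions for this sketch; confirm the event exists on your build first):

```sql
-- Confirm which cache-related events your build exposes:
SELECT name
FROM sys.dm_xe_objects
WHERE object_type = 'event' AND name LIKE '%cache%';

-- A minimal session capturing cache removals:
CREATE EVENT SESSION plan_cache_removal ON SERVER
ADD EVENT sqlserver.sp_cache_remove
ADD TARGET package0.event_file (SET filename = N'plan_cache_removal.xel');

ALTER EVENT SESSION plan_cache_removal ON SERVER STATE = START;
```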

SQL Server 2012 AlwaysOn has a high HADR_LOGCAPTURE_WAIT wait type

From reading, I can see the HADR_WORK_QUEUE wait can safely be ignored, but I can't find much about HADR_LOGCAPTURE_WAIT. This is from BOL: "Waiting for log records to become available. Can occur either when waiting for new log records to be generated by connections or for I/O completion when reading log not in the cache. This is an expected wait if the log scan is caught up to the end of log or is reading from disk."
Average disk sec/write is basically 0 for both SQL Servers, so I'm guessing this wait type can safely be ignored?
Here are the top 10 waits from the primary:
wait_type pct running_pct
HADR_LOGCAPTURE_WAIT 45.98 45.98
HADR_WORK_QUEUE 44.89 90.87
HADR_NOTIFICATION_DEQUEUE 1.53 92.40
BROKER_TRANSMITTER 1.53 93.93
CXPACKET 1.42 95.35
REDO_THREAD_PENDING_WORK 1.36 96.71
HADR_CLUSAPI_CALL 0.78 97.49
HADR_TIMER_TASK 0.77 98.26
PAGEIOLATCH_SH 0.66 98.92
OLEDB 0.53 99.45
Here are the top 10 waits from the secondary:
wait_type pct running_pct
REDO_THREAD_PENDING_WORK 66.43 66.43
HADR_WORK_QUEUE 31.06 97.49
BROKER_TRANSMITTER 0.79 98.28
HADR_NOTIFICATION_DEQUEUE 0.79 99.07
Don't troubleshoot problems on your server by looking at total waits. If you want to troubleshoot what is causing you problems, then you need to look at current waits. You can do that by either querying sys.dm_os_waiting_tasks or by grabbing all waits (like you did above), waiting for 1 minute, grabbing all waits again, and subtracting them to see what waits actually occurred over that minute.
See the webcast I did for more info: Troubleshooting with DMVs
That aside, HADR_LOGCAPTURE_WAIT is a background wait type and does not affect any running queries. You can ignore it.
No, you can't simply ignore HADR_LOGCAPTURE_WAIT. This wait type occurs when SQL Server is either waiting for new log data to be generated or experiencing latency while reading data from the log file. Internal and external fragmentation of the log file, or slow storage, can contribute to this wait type as well.
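The snapshot-and-diff approach from the first answer can be sketched as follows (a minimal version; the one-minute interval is an arbitrary choice):

```sql
-- Snapshot cumulative waits, wait a minute, then diff to see
-- which waits actually accrued during the interval.
SELECT wait_type, wait_time_ms, waiting_tasks_count
INTO #waits_before
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:01:00';

SELECT w.wait_type,
       w.wait_time_ms - b.wait_time_ms               AS wait_time_ms_delta,
       w.waiting_tasks_count - b.waiting_tasks_count AS tasks_delta
FROM sys.dm_os_wait_stats AS w
JOIN #waits_before AS b ON b.wait_type = w.wait_type
WHERE w.wait_time_ms > b.wait_time_ms
ORDER BY wait_time_ms_delta DESC;

DROP TABLE #waits_before;
```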