2017-07-19 09:04:17.542 [0%] [944896 sec remaining] Web synchronization progress: 94% complete.
Article Upload Statistics:
FILE_REPLICA:
Relative Cost: 4.87%
PUBLISH_DOCUMENTS:
Updates: 827
Relative Cost: 76.73%
WF_ACTIVE_ROUTING_HISTORY:
Relative Cost: 4.29%
WF_RUN_ROUTING_HISTORY_REV:
Relative Cost: 1.87%
WF_RUN_STAGE_RES_LIST_PRES:
Relative Cost: 1.83%
WF_RUN_STAGE_STATUS_PRES:
Relative Cost: 1.83%
ORDER_RES_GROUP:
Relative Cost: 5.54%
WF_RUN_ROUTING_HISTORY:
Relative Cost: 3.04%
Article Download Statistics:
FILE_REPLICA:
Relative Cost: 7.61%
PUBLISH_DOCUMENTS:
Relative Cost: 4.18%
WF_ACTIVE_ROUTING_HISTORY:
Relative Cost: 29.20%
WF_RUN_ROUTING_HISTORY_REV:
Relative Cost: 13.25%
WF_RUN_STAGE_RES_LIST_PRES:
Relative Cost: 19.39%
WF_RUN_STAGE_STATUS_PRES:
Relative Cost: 6.54%
ORDER_RES_GROUP:
Relative Cost: 9.05%
WF_RUN_ROUTING_HISTORY:
Relative Cost: 10.78%
Session Statistics:
Upload Updates: 827
Deadlocks encountered: 18
Change Delivery Time: 753 sec
Schema Change and Bulk Insert Time: 5 sec
Delivery Rate: 1.10 rows/sec
Total Session Duration: 6556 sec
=============================================================
2017-07-19 09:04:17.596 Connecting to Subscriber 'VMSQL2014'
2017-07-19 09:04:17.609 The upload message to be sent to Publisher 'VMSQL2014' is being generated
2017-07-19 09:04:17.613 The merge process is using Exchange ID '86D0215F-E4E3-4FC1-99F4-BC9E05ACDA21' for this web synchronization session.
2017-07-19 09:04:20.168 Uploading data changes to the Publisher
2017-07-19 09:04:22.980 A query executing on Subscriber 'VMSQL2014' failed because the connection was chosen as the victim in a deadlock. Please rerun the merge process if you still see this error after internal retries by the merge process.
2017-07-19 09:04:25.513 [0%] [1227049 sec remaining] Request message generated, now making it ready for upload.
2017-07-19 09:04:25.561 [0%] [1227049 sec remaining] Upload request size is 260442 bytes.
2017-07-19 09:04:27.462 [0%] [1227049 sec remaining] Uploaded a total of 55 chunks.
2017-07-19 09:04:27.466 [0%] [1227049 sec remaining] The request message was sent to 'https://webserver/SQLReplication/replisapi.dll'
2017-07-19 09:09:28.676 The operation timed out
2017-07-19 09:09:28.679 Category:NULL
Source: Merge Process
Number: -2147209502
Message: The operation timed out
2017-07-19 09:09:28.680 Category:NULL
Source: Merge Process
Number: -2147209502
Message: The processing of the response message failed.
It says deadlocks were encountered. A deadlock occurs when two transactions each hold a lock the other one needs, so neither can proceed and SQL Server kills one of them (the deadlock "victim"). Most likely another program or user is writing to the same rows your merge needs to write, so your session keeps being chosen as the victim.
You can:
Implement a retry procedure so your merge tries again if it is deadlocked (see the sketch after this list).
Lock other programs/users out of the database while you run the merge.
There are likely other options to get around this issue. Google: "avoid deadlock"
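A minimal retry sketch (Python with pyodbc is assumed here; the connection string, the retry budget, and the statement being retried are placeholders, not part of the original setup):

    import time
    import pyodbc

    MAX_RETRIES = 5  # hypothetical retry budget

    def run_with_deadlock_retry(conn_str, sql):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                with pyodbc.connect(conn_str) as conn:
                    conn.execute(sql)
                    conn.commit()
                return
            except pyodbc.Error as err:
                # SQL Server reports deadlock victims as error 1205
                if "1205" in str(err) and attempt < MAX_RETRIES:
                    time.sleep(2 ** attempt)  # back off, then retry
                    continue
                raise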
Related
Some of my transactions were aborted in MongoDB. From the log file, I dug up the info "transaction parameters:... terminationCause:aborted timeActiveMicros:205 timeInactiveMicros:245600632 ..."
I increased transactionLifetimeLimitSeconds on the server from 60 to 3000, which is 50 minutes and should be plenty; the transaction should take at most 10 minutes. Still not working.
The second thing I tweaked was on the client side (pymongo): I changed wtimeout on the write_concern from 1000 to 500000000, and I am still getting the same error.
Are there any other parameters I should change?
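For reference, the two knobs mentioned above look roughly like this in pymongo (a sketch only; the URI, database, and collection names are placeholders):

    from pymongo import MongoClient, WriteConcern

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI

    # Server side: raise the transaction lifetime limit to 3000 seconds
    client.admin.command("setParameter", 1, transactionLifetimeLimitSeconds=3000)

    # Client side: a write concern with a very large wtimeout (in milliseconds)
    coll = client.mydb.get_collection(
        "mycoll", write_concern=WriteConcern(w=1, wtimeout=500000000))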
I have a simple Apache Flink job:
**DataSource (Apache Kafka) - Filter - KeyBy - CEP Pattern (with timer) - PatternProcessFunction - KeyedProcessFunction (*here I have a ValueState(Boolean) and register a timer for 5 minutes. If the ValueState is not null I update the ValueState (sending nothing to the collector) and reset the timer. If the ValueState is null, I store TRUE in the state, send the input event to the collector, and set the timer. When the onTimer method fires, I clear my ValueState*) - Sink (Apache Kafka)**.
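The KeyedProcessFunction logic described above looks roughly like this (sketched here in PyFlink with processing-time timers; only the Boolean ValueState and the 5-minute timer come from the description, everything else is assumed):

    from pyflink.common.typeinfo import Types
    from pyflink.datastream import KeyedProcessFunction, RuntimeContext
    from pyflink.datastream.state import ValueStateDescriptor

    class DedupForFiveMinutes(KeyedProcessFunction):

        def open(self, runtime_context: RuntimeContext):
            self.seen = runtime_context.get_state(
                ValueStateDescriptor("seen", Types.BOOLEAN()))

        def process_element(self, value, ctx):
            timer_ts = ctx.timer_service().current_processing_time() + 5 * 60 * 1000
            if self.seen.value() is None:
                self.seen.update(True)   # first event for this key
                ctx.timer_service().register_processing_time_timer(timer_ts)
                yield value              # forward only the first event
            else:
                # key already seen: refresh state and timer, emit nothing
                self.seen.update(True)
                ctx.timer_service().register_processing_time_timer(timer_ts)

        def on_timer(self, timestamp, ctx):
            self.seen.clear()            # drop the key after 5 minutes of silence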
Job settings (sketched in code below the list):
**Checkpointing interval: 5000ms**
**Incremental checkpointing: true**
**Semantic: Exactly Once**
**State Backend: RocksDB**
**Parallelism: 4**
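A sketch of how those settings map to code (PyFlink again, with Flink 1.13+ class names assumed; on older versions the RocksDB backend class is named differently):

    from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode
    from pyflink.datastream.state_backend import EmbeddedRocksDBStateBackend

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(4)
    env.set_state_backend(
        EmbeddedRocksDBStateBackend(enable_incremental_checkpointing=True))
    env.enable_checkpointing(5000)  # 5000 ms between checkpoints
    env.get_checkpoint_config().set_checkpointing_mode(CheckpointingMode.EXACTLY_ONCE)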
Logically my job works perfectly, but I have some problems.
I ran two tests on my cluster (2 job managers and 3 task managers):
**First test:**
I started my job against an empty Apache Kafka topic and then looked at the **Checkpointing Statistics** in the Flink Web UI:
1) Latest Acknowledgement - Trigger Time = 5000 ms (matching my checkpoint interval)
2) State size = 340 KB at each 5-second interval
3) All statuses were completed (blue).
**Second test:**
I started sending JSON messages with distinct keys (from "1" to Integer.MAX_VALUE) into the Apache Kafka topic at 1000 messages/sec, and then looked at the **Checkpointing Statistics** in the Flink Web UI:
1) Latest Acknowledgement - Trigger Time = 1-6 minutes
**My question #1: Why is this time growing? Is it bad or OK?**
2) State size was constantly growing. I sent messages into Kafka for about 10 minutes (1000 x 60 x 10 = 600,000 messages). After sending, the state size was 100-150 MB.
3) After sending I waited about an hour and saw that:
Latest Acknowledgement - Trigger Time = 5000 ms (matching my checkpoint interval)
State size was still 100-150 MB at each 5-second interval.
**My question #2: Why doesn't it decrease? I checked my job logs and saw 600,000 records saying the ValueState for a key was cleared (the onTimer method ran successfully), and the job logic (see the description of my KeyedProcessFunction) was working fine.**
What did I try?
1) setting a pause between checkpoints
2) disabling incremental checkpoints
3) enabling async checkpoints (in flink-conf.yml)
None of it made any difference.
**My question #3: What should I do? On the production server the rate is *10 million messages/hour*, and the checkpoint size grows immediately.**
I send emails using a cron job and a task queue. The job runs every 15 minutes and the queue has the following setup:
- name: send-emails
  rate: 1/m
  max_concurrent_requests: 1
  retry_parameters:
    task_retry_limit: 0
But quite often an apiproxy_errors.OverQuotaError exception occurs. I check Quota Details and see that I am still within the daily quotas (Recipients Emailed, Attachment Data Sent, etc.), and I believe I can't be over the per-minute limit, since the rate I use is just 1 task per minute (i.e. send no more than 1 mail per minute).
Where am I wrong and what should I check?
How many emails are you sending? You have not set bucket_size, so it defaults to 5. The rate controls how quickly tokens are put back into the bucket, so with rate: 1/m the queue can burst up to 5 tasks and is then throttled to roughly one task per minute, i.e. about 15-20 tasks per 15-minute window. If you are adding more emails than that to the queue every 15 minutes, the backlog keeps growing and the queue will eventually go over quota.
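A toy simulation of that token bucket (this assumes rate is a refill rate, i.e. one token added per minute, with the default bucket_size of 5, and a queue that always has work waiting):

    # bucket_size 5, rate 1/m: one token refilled per minute, burst of at most 5
    tokens, executed = 5.0, 0
    for second in range(15 * 60):              # simulate a 15-minute window
        tokens = min(5.0, tokens + 1 / 60.0)   # refill
        if tokens >= 1:                        # a queued task may run
            tokens -= 1
            executed += 1
    print(executed)                            # roughly 20 tasks in 15 minutes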
I have not tried this myself, but when you catch the apiproxy_errors.OverQuotaError exception, does the message contain any detail as to why it is over quota/which quota has been exceeded?
import logging
from google.appengine.runtime import apiproxy_errors

try:
    send_mail_here()  # your mail.send_mail(...) call
except apiproxy_errors.OverQuotaError as message:
    logging.error(message)
From reading, I can see that the HADR_WORK_QUEUE wait can safely be ignored, but I can't find much about HADR_LOGCAPTURE_WAIT. This is from BOL: "Waiting for log records to become available. Can occur either when waiting for new log records to be generated by connections or for I/O completion when reading log not in the cache. This is an expected wait if the log scan is caught up to the end of log or is reading from disk."
Average disk sec/write is basically 0 on both SQL Servers, so I'm guessing this wait type can safely be ignored?
Here are the top 10 waits from the primary:
wait_type pct running_pct
HADR_LOGCAPTURE_WAIT 45.98 45.98
HADR_WORK_QUEUE 44.89 90.87
HADR_NOTIFICATION_DEQUEUE 1.53 92.40
BROKER_TRANSMITTER 1.53 93.93
CXPACKET 1.42 95.35
REDO_THREAD_PENDING_WORK 1.36 96.71
HADR_CLUSAPI_CALL 0.78 97.49
HADR_TIMER_TASK 0.77 98.26
PAGEIOLATCH_SH 0.66 98.92
OLEDB 0.53 99.45
Here are the top 10 waits from the secondary:
wait_type pct running_pct
REDO_THREAD_PENDING_WORK 66.43 66.43
HADR_WORK_QUEUE 31.06 97.49
BROKER_TRANSMITTER 0.79 98.28
HADR_NOTIFICATION_DEQUEUE 0.79 99.07
Don't troubleshoot problems on your server by looking at total waits. If you want to troubleshoot what is causing you problems, then you need to look at current waits. You can do that by either querying sys.dm_os_waiting_tasks or by grabbing all waits (like you did above), waiting for 1 minute, grabbing all waits again, and subtracting them to see what waits actually occurred over that minute.
See the webcast I did for more info: Troubleshooting with DMVs
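A sketch of that snapshot-and-diff approach (Python with pyodbc is assumed here; conn_str is a placeholder for your connection string):

    import time
    import pyodbc

    QUERY = "SELECT wait_type, wait_time_ms FROM sys.dm_os_wait_stats"

    def snapshot(conn):
        return {wait_type: ms for wait_type, ms in conn.execute(QUERY).fetchall()}

    conn = pyodbc.connect(conn_str)
    before = snapshot(conn)
    time.sleep(60)                    # watch the server for one minute
    after = snapshot(conn)

    # waits that actually accumulated during that minute, largest first
    delta = {w: after[w] - before.get(w, 0) for w in after}
    for wait_type, ms in sorted(delta.items(), key=lambda kv: -kv[1])[:10]:
        print(wait_type, ms)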
That aside, HADR_LOGCAPTURE_WAIT is a background wait type and does not affect any running queries. You can ignore it.
No, you can't simply ignore HADR_LOGCAPTURE_WAIT. This wait type occurs when SQL Server is either waiting for new log data to be generated or experiencing latency while reading from the log file. Internal and external fragmentation of the log file, as well as slow storage, can contribute to this wait type.
I am running Apache2 on Linux (Ubuntu 9.10).
I am trying to monitor the load on my server using mod_status.
There are 2 things that puzzle me (see cut-and-paste below):
The CPU load is reported as a ridiculously small number,
whereas "uptime" reports a number between 0.05 and 0.15 at the same time.
The "requests/sec" is also ridiculously low (0.06)
when I know there are at least 10 requests coming in per second right now.
(You can see there are close to a quarter million "accesses" - this sounds right.)
I am wondering whether this is a bug (if so, is there a fix/workaround),
or maybe a configuration error (but I can't imagine how).
Any insights would be appreciated.
-- David Jones
- - - - -
Current Time: Friday, 07-Jan-2011 13:48:09 PST
Restart Time: Thursday, 25-Nov-2010 14:50:59 PST
Parent Server Generation: 0
Server uptime: 42 days 22 hours 57 minutes 10 seconds
Total accesses: 238015 - Total Traffic: 91.5 MB
CPU Usage: u2.15 s1.54 cu0 cs0 - 9.94e-5% CPU load
.0641 requests/sec - 25 B/second - 402 B/request
11 requests currently being processed, 2 idle workers
- - - - -
After I restarted my Apache server, I realized what is going on. The "requests/sec" figure is calculated over the entire lifetime of the server. So if your Apache server has been running for 3 months, this tells you nothing at all about the current load on your server. Instead, it simply reports the total number of requests divided by the total number of seconds the server has been up.
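For the numbers in the status output above, that arithmetic checks out (Python; the figures come straight from the mod_status page shown earlier):

    uptime = 42*86400 + 22*3600 + 57*60 + 10   # 3,711,430 seconds
    total_accesses = 238015
    cpu_seconds = 2.15 + 1.54                  # u + s from the CPU Usage line

    print(total_accesses / float(uptime))      # ~0.0641 requests/sec
    print(100 * cpu_seconds / uptime)          # ~9.94e-05 (% CPU load)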
It would be nice if there was a way to see the current load on your server. Any ideas?
Anyway, ... answered my own question.
-- David Jones
The Apache status value "Total Accesses" is the total access count since the server started; the per-second delta of that counter is what "requests per second" actually means.
Here is one way to get it:
1) Apache monitor script for zabbix
https://github.com/lorf/zapache/blob/master/zapache
2) Install and configure zabbix agentd, adding:
UserParameter=apache.status[*],/bin/bash /path/apache_status.sh $1 $2
3) In Zabbix, create an Apache template and a monitored item:
Key: apache.status[{$APACHE_STATUS_URL}, TotalAccesses]
Type: Numeric(float)
Update interval: 20
Store value: Delta (speed per second) -- this is the key option
Zabbix will calculate the increment of the Apache request counter and store the delta value, which gives you "requests per second".
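The same per-second delta can be sketched outside Zabbix (a minimal example; the URL is a placeholder, the 20-second sleep mirrors the update interval above, and ?auto is mod_status's machine-readable output):

    import re
    import time
    import urllib.request

    URL = "http://localhost/server-status?auto"   # placeholder

    def total_accesses():
        page = urllib.request.urlopen(URL).read().decode()
        return int(re.search(r"Total Accesses: (\d+)", page).group(1))

    first = total_accesses()
    time.sleep(20)
    second = total_accesses()
    print((second - first) / 20.0, "requests/sec")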