I have set a delayer(30000) instruction into a route. When I perform a Graceful Shutdown on this route, during those 30 seconds, the message is immediately transfered to the next instruction. Is it normal?
Actually it is pretty smart but how can I still delay the exchange?
PS : sorry, camel 2.2.0
Camel 2.2.0 is much old version.
I don't think you can get some free support here.
Related
I am using IBM WebSphere MQ to set up a durable subscription for Pub/Sub. I am using their C APIs. I have set up a subscription name and have MQSO_RESUME in my options.
When I set a wait interval for my subscriber and I properly close my subscriber, it works fine and restarts fine.
But if I force crash my subscriber (Ctrl-C) and I try to re open it, I get a MQSUB ended with reason code 2429 which is MQRC_SUBSCRIPTION_IN_USE.
I use MQWI_UNLIMITED as my WaitInterval in my MQGET and use MQGMO_WAIT | MQGMO_NO_SYNCPOINT | MQGMO_CONVERT as my MQGET options
This error pops up only when the topic has no pending messages for that subscription. If it has pending messages that the subscription can resume, then it resumes and it ignores the first published message in that topic
I tried changing the heartbeat interval to 2 seconds and that didn't fix it.
How do I prevent this?
This happens because the queue manager has not yet detected that your application has lost its connection to the queue manager. You can see this by issuing the following MQSC command:-
DISPLAY CONN(*) TYPE(ALL) ALL WHERE(APPLTYPE EQ USER)
and you will see your application still listed as connected. As soon as the queue manager notices that your process has gone you will be able to resume the subscription again. You don't say whether your connection is a locally bound connection or a client connection, but there are some tricks to help speed up the detection of connections depending on the type of connection.
You say that in the times when you are able to resume you don't get the first message, this is because you are retrieving this messages with MQGMO_NO_SYNCPOINT, and so that message you are not getting was removed from the queue and was on its way down the socket to the client application at the time you forcibly crashed it, and so that message is gone. If you use MQGMO_SYNCPOINT, (and MQCMIT) you will not have that issue.
You say that you don't see the problem when there are still messages on the queue to be processed, that you only see it when the queue is empty. I suspect the difference here is whether your application is in an MQGET-wait or processing a message when you forcibly crash it. Clearly, when there are no messages left on the queue, you are guarenteed with the use of MQWL_UNLIMITED, to be in the MQGET-wait, but when processing messages, you probably spend more time out of the MQGET than in it.
You mention tuning down the heartbeat interval, to try to reduce the time frame, this was a good idea. You said it didn't work. Please remember that you have to change it at both ends of the channel, or you will still be using the default 5 minutes.
At some point my site, running on Apache2 with mod_wsgi just stops processing requests. The connection to server is maintained and client waits for responce, but it never is returned by apache. The server at this time is at 0% CPU, and nothing is processing. I think, apache just sends request to queue and never gets them out of there.
When I perform apache2ctl graceful the problem does not resolve. Only after apache2ctl restart.
My site is a 4 instance wsgi application of Pyramid and 2 instances of Zope 3. It is running normaly and does not have speed problems, that I am aware of.
versions:
Ubuntu 10.04
apache2 2.2.14-5ubuntu8.9
libapache2-mod-wsgi 2.8-2ubuntu1
Sounds like you are using embedded mode to run the multiple applications and you are using third party C extensions that have problems in sub interpreters, resulting in potential deadlock. Else your code is internally deadlocking or blocking on external services and never returning, causing exhaustion of available processes/threads.
For a start, you should look at using daemon mode and delegate each web application to a distinct daemon process group and then forcing each to run in the main interpreter.
See:
http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide#Delegation_To_Daemon_Process
http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Python_Simplified_GIL_State_API
Otherwise use debugging tips described in:
http://code.google.com/p/modwsgi/wiki/DebuggingTechniques
for getting stack traces about what application is doing.
I am making an application which runs on our every PC in random times. It works fine, however if the PC is currently shutting down, then I can't read the WMI and i get some errors. So I need to determinate if a PC is shutting down currently, and so i could avoid these errors. Does anyone has an idee?
Thanks!
Call GetSystemMetrics with index SM_SHUTTINGDOWN (0x2000).
Create a hidden top-level window and listen for WM_ENDSESSION messages. The value of wParam will tell you whether the entire system is going down, or whether the user is logging off.
If your app is a console app then use SetConsoleCtrlHandler to register to receive shutdown notifications.
Any attempt to detect this situation will have a race condition: the system shutdown might start immediately after you detect that it's not shutting down, but before you try to perform the operations that won't work during shutdown. Thus your approach to fixing the problem is wrong. Instead you just need to handle the WMI read failures and determine if they're cause by system shutdown, and in this case abort the operation or proceed in whatever alternate way makes sense.
It might be possible to use a sort of synchronous shutdown detection mechanism where you can actually lock/delay the shutdown for a brief interval before it proceeds, and do your processing in that interval. If so, that would also be a safe approach without race conditions.
My application takes a checkpoint every few 100 milliseconds by using the fork system call. However, I notice that my application slows down significantly when using checkpointing (forking). I tested the time taken by fork call and it came out to be 1 to 2 ms. So why is fork slowing down my application so much. Note that I only keep 1 checkpoint (forked process) at a time and kill the previous checkpoint whenever I take a new one. Also, my computer has a huge RAM.
Notice that my forked process just sleeps after creation. It is only awoken when rollback needs to be done. So, it should not be scheduled by the OS. One thing that comes to my mind is that since fork is a copy-on-write mechanism, there are page faults occuring whenever my application modifies a page. But should that slow down the application significantly? Without checkpointing (forking), my application finishes in approximately 3.1 seconds and with it, it takes around 3.7 seconds. Any idea, what is slowing down my application?
You are probably observing the cost of the copy-on-write mechanism, as you hypothesize. That's actually quite expensive -- it is the reason vfork still exists. (The main cost is not the extra page faults themselves, but the memcpy of each page as it is touched, and the associated cache and TLB flushes.) It's not showing up as a cost of fork because the page faults don't happen inside the system call.
You can confirm the hypothesis by looking at the times reported by getrusage -- if this is correct, the extra time elapsed should be nearly all "system" time (CPU burnt inside the kernel). oprofile or perf will let you pin down the problem more specifically... if you can get them to work at all, which is nontrivial, alas.
Unfortunately, copy-on-write is also the reason why your checkpoint mechanism works in the first place. Can you get away with taking checkpoints at longer intervals? That's the only quick fix I can think of.
I suggest using oprofile to find out.
oprofile is believed to be able to profile a system (and not only a single process).
You could compare with what other checkpointing packages do, e.g. BLCR
Forking is by nature very expensive, as you're creating a copy of the existing process as an entirely new process. If speed is important to you, you should use threads.
Additionally, you say that the forked process sleeps until a 'rollback' is needed. I'm not sure what you mean by rollback, but provided its something that you can put in a function, you ought to just place it in a function and then create a thread that just runs that function and exits when you detect the need for the rollback. As an added bonus, if you use that method you only create the thread if you need it.
I have an system running embedded linux and it is critical that it runs continuously. Basically it is a process for communicating to sensors and relaying that data to database and web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling(eg sockets & uart communications). How do I ensure none of the threads get hung up or exit unexpectedly? Is there an easy to use watchdog that is threading friendly?
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Which leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but a yet easier solution would be to use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
That way, no extra programs needed, and only portable POSIX stuff used.
The gist of it is:
You need to detect if the program is still running and not hung.
You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look a the timestamp for the file, and if it is stale, then it can assume that the appliation is dead or deadlocked.
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider making your application that runs "continuously", into an application that runs once, but then use "cron" or some other application to continuously rerun that single-run application.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan can help detect potential deadlock scenarios and other threading issues with static analysis.
You could create a CRON job to check if the process is running with start-stop-daemon from time to time.
use this script for running your application
#!/bin/bash
while ! /path/to/program #This will wait for the program to exit successfully.
do
echo “restarting” # Else it will restart.
done
you can also put this script on your /etc/init.d/ in other to start as daemon