How can I determine if an ETW session is dropping events?
If it is dropping events, how can I configure the tracing session so that events are not dropped?
I've written a custom ETW provider to help with some debugging efforts. I'm currently capturing the trace data using logman.exe.
In viewing the results, it appears that some of the events are being dropped. Basically I'm seeing something like:
Event A
Event C
where there should be an intervening Event B, but one does not appear in the trace file. It should be impossible for that to happen, which leads me to believe that ETW is dropping events.
Of course, I'd like to verify that the problem I'm seeing is due to dropped events, and not caused by a bug in my code. I've tried Google, but wasn't able to come up with anything. Does anyone know how I can check whether events are being dropped?
It doesn't answer the question directly (how to detect drops), but it might explain drops:
EVENT_TRACE_NO_PER_PROCESSOR_BUFFERING
Writes events that were logged on different processors to a common
buffer. Using this mode can eliminate the issue of events appearing
out of order when events are being published on different processors
using system time. This mode can also eliminate the issue with
circular logs appearing to drop events on multiple processor
computers.
If you do not use this mode and you use system time, the events may
appear out of order on multiple processor computers. This is because
ETW buffers are associated with a processor instead of a thread. As a
result, if a thread is switched from one CPU to another, the buffer
associated with the latter CPU can be flushed to disk before the one
associated with the former CPU.
If you expect a high volume of events (for example, more than 1,000
events per second), you should not use this mode.
Note that the processor number is not included with the event. Not
available prior to Windows 7 and Windows Server 2008 R2.
I've been using logman to capture the results. It looks like tracelog will give me info about lost events, and I can tweak its buffer parameters to reduce the event loss.
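You can also check for drops programmatically: the session's statistics can be queried with ControlTrace and EVENT_TRACE_CONTROL_QUERY, and the returned EVENT_TRACE_PROPERTIES includes an EventsLost counter along with the buffer counters. A minimal sketch in C, assuming a session named "MySession" started with logman (the session name is a placeholder):

#include <windows.h>
#include <evntrace.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* "MySession" is a placeholder; use the name you gave logman */
    LPCWSTR sessionName = L"MySession";

    /* the properties block must be followed by room for the
       logger name and log file name */
    ULONG size = sizeof(EVENT_TRACE_PROPERTIES) + 2 * 1024 * sizeof(WCHAR);
    EVENT_TRACE_PROPERTIES *props = (EVENT_TRACE_PROPERTIES *)calloc(1, size);
    if (!props) return 1;

    props->Wnode.BufferSize = size;
    props->LoggerNameOffset = sizeof(EVENT_TRACE_PROPERTIES);
    props->LogFileNameOffset = sizeof(EVENT_TRACE_PROPERTIES) + 1024 * sizeof(WCHAR);

    ULONG status = ControlTraceW(0, sessionName, props, EVENT_TRACE_CONTROL_QUERY);
    if (status != ERROR_SUCCESS) {
        fprintf(stderr, "ControlTrace failed: %lu\n", status);
        return 1;
    }

    /* EventsLost counts events dropped because no buffer was free;
       a nonzero value confirms ETW (not your provider) is the culprit */
    printf("EventsLost:          %lu\n", props->EventsLost);
    printf("LogBuffersLost:      %lu\n", props->LogBuffersLost);
    printf("RealTimeBuffersLost: %lu\n", props->RealTimeBuffersLost);
    printf("BuffersWritten:      %lu\n", props->BuffersWritten);

    free(props);
    return 0;
}

If EventsLost keeps climbing, increase the session's buffer size and buffer count (logman's -bs and -nb options) until it stays at zero.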
If you use xperf to collect the logs, it generates a warning when events are lost. With xperf you can also experiment with the buffer size and split the logging across several loggers.
Related
In Event Tracing for Windows, StartTrace accepts an EVENT_TRACE_PROPERTIES structure that allows for a FlushTimer, which specifies how frequently not-yet-full buffers should be flushed.
The thing is, FlushTimer is a ULONG representing seconds, but I want it to be very small so that it's nearly instantaneous (on the order of milliseconds).
I don't know how Process Monitor manages to get ETW events in real-time, but it does, so surely there must be a way to do it.
So the question is: How can I receive real-time events, you know, in real time?
ETW does not support real-time notifications. Even the so-called EVENT_TRACE_REAL_TIME_MODE isn't really real-time, as the documentation clearly says.
The premise of your question is wrong: Sysinternals Process Monitor does not use ETW to get its synchronous, kind-of-real-time process, thread, module, file, and Registry events. You've got two options:
Use ETW - which is not what ProcMon does - and get events the way ETW provides them to you.
Do what ProcMon does - which is not consume ETW events - and get events synchronously, like ProcMon gets them.
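For completeness, if you settle for option 1, this is roughly what a real-time ETW consumer looks like - a sketch assuming a session named "MySession" has already been started with EVENT_TRACE_REAL_TIME_MODE. Note that events still arrive in buffer-sized batches, which is exactly why this isn't truly real-time:

#include <windows.h>
#include <evntrace.h>
#include <evntcons.h>
#include <stdio.h>

/* called once per event on the ProcessTrace thread */
static VOID WINAPI OnEvent(PEVENT_RECORD rec)
{
    printf("event: %u bytes of user data\n", rec->UserDataLength);
}

int main(void)
{
    /* "MySession" is a placeholder for a session already started
       with EVENT_TRACE_REAL_TIME_MODE */
    EVENT_TRACE_LOGFILEW log = {0};
    log.LoggerName = (LPWSTR)L"MySession";
    log.ProcessTraceMode = PROCESS_TRACE_MODE_REAL_TIME |
                           PROCESS_TRACE_MODE_EVENT_RECORD;
    log.EventRecordCallback = OnEvent;

    TRACEHANDLE h = OpenTraceW(&log);
    if (h == INVALID_PROCESSTRACE_HANDLE) {
        fprintf(stderr, "OpenTrace failed: %lu\n", GetLastError());
        return 1;
    }

    /* blocks, delivering events as ETW flushes its buffers -
       hence the buffer-size/FlushTimer-bounded latency */
    ProcessTrace(&h, 1, NULL, NULL);
    CloseTrace(h);
    return 0;
}

The best you can do with this path is shrink the buffers and the flush timer; you cannot make delivery synchronous.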
I've been looking at Windows's File System Filter Drivers. I started with this "FsFilter" example:
http://www.codeproject.com/Articles/43586/File-System-Filter-Driver-Tutorial
With effort, I managed to get it built and signed in versions that work on everything from 64-bit Win8 to 32-bit WinXP. (Well, as long as I run Bcdedit.exe -set TESTSIGNING ON to allow it to accept my test certificate, since I didn't pay Microsoft $250 to sign my .SYS file. :-/)
Now I want to modify FsFilter. I'd like write accesses to certain types of files to be trapped by the filter. I then want the user to receive a dialog box, in which they can either allow the access or deny it.
Perhaps obviously... the kernel-mode code cannot display the UI. It will have to signal some user-mode process, which will (after an arbitrarily latent period of time) signal the user's wish back to the driver. I've looked a bit over
User-Mode Interactions: Guidelines for Kernel-Mode Drivers (here's Google's Cache as HTML, instead of .DOC)
I don't know what the best way to attack this is. The only example I've yet found to study is SysInternals FileMon. The driver it installs gathers data in a buffer, which is periodically requested by the .EXE according to a WM_TIMER loop:
// Have driver fill Stats buffer with information
if ( !DeviceIoControl( SysHandle, IOCTL_FILEMON_GETSTATS,
                       NULL, 0, &Stats, sizeof Stats,
                       &StatsLen, NULL ) )
{
    Abort( hWnd, _T("Couldn't access device driver"), GetLastError() );
    return TRUE;
}
Should I use a similar technique? Perhaps the filter driver, upon receiving a request it wants to check, could place a record to track the request in a buffer that would contain two HEVENTs. It would then WaitForMultipleObjects on these two HEVENTs, which represent a signaled "YES" or "NO" from user mode on whether to allow access.
Periodically the monitor process (running in user mode) will poll the driver from another thread using a custom IOCTL. The filter driver would return the request information... as well as the two HEVENTs that request is waiting on. The monitor would wait for the user's feedback, and when available signal the appropriate event.
I could also invert this model. The user mode code could use a custom IOCTL to pass in data... such as HEVENTs which could be signaled by the driver, and just implement some kind of safe protocol. This would eliminate the need for polling.
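To make the no-polling variant concrete, here's roughly what I picture the user-mode side looking like, using a pended IOCTL (the driver simply holds the "get request" IOCTL until a write is trapped) rather than shared HEVENTs. The device name, IOCTL codes, and structs below are all made up for illustration, not taken from the FsFilter sample:

#include <windows.h>
#include <winioctl.h>
#include <tchar.h>
#include <stdio.h>

/* everything below is illustrative: device name, IOCTL codes and
   PENDING_REQUEST layout are hypothetical, not from FsFilter */
#define IOCTL_FSFILTER_GET_REQUEST \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_FSFILTER_SET_VERDICT \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

typedef struct _PENDING_REQUEST {
    WCHAR Path[260];   /* file whose write access was trapped */
    ULONG RequestId;   /* driver-assigned id to pair with the verdict */
} PENDING_REQUEST;

typedef struct _VERDICT {
    ULONG RequestId;
    ULONG Allow;       /* 1 = allow the write, 0 = deny */
} VERDICT;

int main(void)
{
    HANDLE dev = CreateFile(_T("\\\\.\\FsFilter"),
                            GENERIC_READ | GENERIC_WRITE,
                            0, NULL, OPEN_EXISTING, 0, NULL);
    if (dev == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    for (;;) {
        PENDING_REQUEST req;
        DWORD got;

        /* the driver pends this IOCTL until a write is trapped, so
           the monitor blocks here instead of polling on a WM_TIMER */
        if (!DeviceIoControl(dev, IOCTL_FSFILTER_GET_REQUEST,
                             NULL, 0, &req, sizeof req, &got, NULL))
            break;

        /* ask the user; the trapped write stays blocked meanwhile */
        VERDICT v;
        v.RequestId = req.RequestId;
        v.Allow = (MessageBoxW(NULL, req.Path, L"Allow write access?",
                               MB_YESNO | MB_ICONQUESTION) == IDYES);

        DeviceIoControl(dev, IOCTL_FSFILTER_SET_VERDICT,
                        &v, sizeof v, NULL, 0, &got, NULL);
    }

    CloseHandle(dev);
    return 0;
}

The driver side would queue the IRP, complete it when a write is trapped, and hold the trapped write until the matching verdict IOCTL arrives.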
Basically I'm just looking for guidance on method, or a working example on the web! I'd also be interested to know what the mechanics would be for an asynchronous file access. I assume there's a way for a client making an async call that is being checked to keep running, and only be held up when it waits on the request to finish...?
(Note: Along the way of getting the filters built and debugged, I learned there are some more modern techniques via "miniFilters"--which are part of something called the Filter Manager Model. But for the moment, I'm not that concerned as long as the legacy model is supported. It looks rather similar anyway.)
You (a.k.a. I) have pretty much enumerated the possibilities. Either poll the way FileMon does, or pass an event. Passing the event is probably a bit more error prone, and if you aren't a threading guru then there's probably more chance for error. But if you tend to make lots of mistakes then device drivers may not be for you...skydiving might be a poor choice too.
I'll offer taking a look at this project, but please note the disclaimers in the README. (It is only a test and investigation):
https://github.com/hostilefork/CloneLocker
And yes, to the extent that Microsoft and their driver model is to be something one worries about, miniFilters are the better choice these days.
I am working on an embedded application without any OS that needs the use of a file system. I've been over this many times with the people on the project, and some agree with me that the system must perform a proper shutdown whenever there is a power failure, or else the file system might go crazy.
Some people say that it doesn't matter if you simply power off the system and let nature run its course, but I think that's one of the worst things to do, especially if you know this will cause a problem and probably shorten your product's life span.
In the last paragraph I just assumed that it is a problem, but my question remains:
Does a power down have any effect on the file system?
Here is a list of various techniques to help an embedded system tolerate a power failure. These may not be practical for your particular application.
1. Use a Journaling File System - can tolerate incomplete writes due to power failure, OS crash, etc. Most modern filesystems are journaled, but do your homework to confirm.
2. Unless your application needs the write performance, disable all write caching. Check your disk drivers for caching options. Under Linux/Unix, consider mounting the filesystem in sync mode.
3. Unless it must be writable, make it read-only. Try to keep your application executables and operating system files on their own partition(s), with write protections in place (e.g. mount read-only in Linux). Your read/write data should be on its own partition. Even if your application data gets corrupted, your system should still be able to boot (albeit with a fail-safe default configuration).
3a. For data that is only written once (e.g. configuration settings), try to keep it mounted as read-only most of the time. If there is a settings change, mount it as R/W temporarily, update the data, and then unmount/remount it as read-only (see the sketch after this list for one way to make such an update power-safe).
3b. Use a technique similar to 3a to handle application/OS updates in the field.
3c. If it is impractical for you to mount the FS as read-only, at least consider opening individual files as read-only (e.g. fp=fopen("configuration.ini", "r")).
4. If possible, use separate devices for your storage. Keeping things in separate partitions provides some protection, but there are still edge cases where a partition table may become corrupt and render the entire drive unreadable. Using physically separate devices further isolates against one corrupt device bringing down the whole system. In a perfect world, you would have at least 4 separate devices:
4a. Boot Loader
4b. Operating System & Application Code
4c. Configuration Settings
4d. Application Data
5. Know the characteristics of your storage devices, and control the brand/model/revision of devices used. Some hard disks ignore cache flush commands from the OS. We had cases where some models of CompactFlash cards would corrupt themselves during a power failure, but the "industrial" models did not have this problem. Of course, this information was not published in any datasheet, and had to be gathered by experimental testing. We developed a list of approved CF cards, and kept inventory of those cards. We periodically had to update this list as older cards became obsolete, or the manufacturer made a revision.
6. Put your temporary files in a RAM disk. If you keep those writes off-disk, you eliminate them as a potential source of corruption. You also reduce flash wear and tear.
7. Develop automated corruption detection and recovery methods. All of the above techniques will not help you if the application simply hangs because of a missing config file. You need to be able to recover as gracefully as possible:
7a. Your system should maintain at least two copies of its configuration settings, a "primary" and a "backup". If the primary fails for some reason, switch to the backup. You should also consider mechanisms for making backups whenever the configuration is changed, or after a configuration has been declared "good" by the user (testing vs production mode).
7b. Did your Application Data partition fail to mount? Automatically run chkdsk/fsck.
7c. Did chkdsk/fsck fail to fix the problem? Automatically re-format the partition and get it back to a known state.
7d. Do you have a Boot Loader or other method to restore the OS and application after a failure?
7e. Make sure your system will beep, flash an LED, or something to indicate to the user what happened.
8. Power failures should be part of your system qualification testing. The only way you will be sure you have a robust system is to test it. Yank the power cord from the system and document what happens. Try yanking the power at multiple points in the system operation (during runtime, while booting, mid configuration, etc). Repeat each test multiple times.
9. If you cannot mitigate all power-failure problems, incorporate a battery or supercapacitor into the system. Keep in mind that you will need a background process in your OS to initiate a graceful shutdown when power gets low. Also, batteries will require periodic testing and replacement as they age.
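To illustrate points 3a and 7a, here's a minimal sketch (POSIX C, with illustrative file names) of a power-fail-tolerant settings update: write a temp file, fsync it, then rename it over the old file, so a power cut leaves either the old version or the new one, never a torn mix:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* file names are illustrative */
int save_settings(const char *data)
{
    const char *tmp = "settings.tmp", *live = "settings.ini";

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }

    /* force the bytes to the medium before making them "live";
       without this a power cut can leave a zero-length file */
    if (fsync(fd) != 0) { close(fd); return -1; }
    close(fd);

    /* rename() atomically replaces the old settings, so after a
       power failure you see the old file or the new one, never a mix */
    return rename(tmp, live) == 0 ? 0 : -1;
}

int main(void)
{
    return save_settings("volume=7\nmode=production\n") == 0 ? 0 : 1;
}

For full durability you would also fsync the containing directory after the rename, so the new directory entry itself is committed.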
An addition to msemack's response; unfortunately my rating is too low to post a comment on his answer, so this has to be a separate answer.
Does a power down have any effect on the file system?
Yes, if proper measures aren't put in place to prevent corruption. See the previous answers for file-system options to help mitigate this. However, if ATA flush/sleep isn't properly implemented on your device, you may run into the scenario we did. In our scenario the device was corrupt beyond the file system, and fdisk/format would not recover the device.
Instead, an ATA security-erase was required to recover the device once corruption occurred. To avoid this, we implemented an ATA sleep command prior to power loss. This required a hold-up time of 400 ms to cover the 160 ms the ATA sleep took, leaving some headroom for degradation of the caps over the life of the product.
Notes from our scenario:
fdisk/format failed to repair/recover the drive.
Our power-safe file system's check disk utility returned that the device had bad blocks, but there really weren't any.
flush/sync returned success quickly, and most likely weren't actually implemented.
Once corrupt, dd could not read the device beyond the first partition boundary and returned I/O errors afterward.
hdparm was used to issue the ATA security-erase, as the only method of recovery for some corruption scenarios.
For a non-journalling filesystem, an unexpected power-off can mean corruption of certain data, including the directory structure. This happens if there's unsaved data in the cache, or if the FS is in the middle of a multi-block update and the interruption happens when only some blocks have been written.
Journalling addresses this problem, mostly - if there's an interruption in the middle, the recovery routine or a check-and-repair operation done by the FS (usually implicitly) brings the filesystem to a consistent state. However, this state is not always the latest - i.e. if there was some data in the memory cache, it can be lost even with journalling. This is because journalling saves you from corruption of the filesystem itself, but doesn't do magic.
Write-through mode (no write caching) reduces the possibility of data loss but doesn't solve the problem completely, as the journal itself briefly acts as a cache.
So, unfortunately, backup or data duplication are the main ways to prevent data loss.
It totally depends on the file system you are using and whether it is acceptable to lose some data at power-off, based on your project requirements.
One could imagine using a file system that is secured against unattended power-off and is able to recover from a partial write sequence. So on the application side, if you don't have critical data that absolutely needs to be written before shutting down, there is no need for a specific power-off detection procedure.
Now if you want a more specific answer for your project you will have to give more information on the file system you are using and your project requirements.
Edit: As you have critical application data to save before power-off, I think you have answered the question yourself. The only way to secure against unattended power-off is to have brown-out detection that alerts your embedded device, coupled with some hardware circuitry that can keep delivering enough power to the device to perform the shutdown procedure.
The FAT file system is particularly prone to corruption if a write is in progress or a file is open on shutdown - specifically, if there is a buffered operation that is not flushed. On one project I worked on, the solution was to run a file-system integrity check and repair (essentially chkdsk/scandisk) on start-up. This strategy did not prevent data loss, but it did prevent the file system from becoming unusable.
A number of vendors provide journalling add-on components for FAT to counter exactly this problem. These include Segger, Quadros and Micrium for example.
Either way, your system should generally adopt an open-write-close approach to file access, or open-write-flush if you feel the need to keep the file open.
I'm writing a program on Linux to control about 1000 patient monitors at the same time over UDP sockets. I've successfully written a library to parse and send messages to collect the data from a single patient-monitor device. There are various scheduling constraints on the devices, listed below:
Each device must constantly get an alive-request from the computer client within a max time period of 300 milliseconds (may differ for different devices), otherwise the connection is lost.
The computer client must send a poll-request to a device in order to fetch the data within some time period. I'm polling for about 5 seconds of averaged data from the patient monitor; therefore, I'm required to send a poll-request every 5 * 3 = 15 seconds. If I fail to send the request within the 15-second time frame, I lose the connection to the device.
Now, I'm trying to extend my current program so that it is capable of handling 1000+ devices at the same time. Right now, my program can efficiently handle and parse responses from just one device. In the case of handling multiple devices, it is necessary to synchronize the responses from the different devices, serialize them, and stream them over a TCP socket so that remote computers can also analyze the data. That is not a problem, because it is a well-known multiple-producer, single-consumer problem. My main concern is: what approach should I use in order to maintain alive connections to 1000+ devices?
After reading around the Internet and browsing similar questions on this website, I'm mainly considering two options:
Use one thread per device. In order to control 1000+ devices, I would end up making 1000+ threads, which does not look feasible to me.
Use a multiplexing approach: select the FDs that require attention and deal with them one at a time. I'm not sure how I would go about it, and whether a multiplexing approach would be able to maintain alive connections with all the devices, considering the above two constraints.
I need some suggestions and advice on how to deal with this situation, where you need to control 1000+ real-time devices over UDP sockets. Each device requires an alive-signal every 300 milliseconds (differs per device), and a poll request within about 3 times the averaging interval negotiated during the association phase. For example, patient monitors in the ICU may require real-time (1-second averaged) data, whereas patient monitors in general wards may require 10-second averaged data; therefore, the poll periods for the two devices would be 3*1 (3 seconds) and 3*10 (30 seconds) respectively.
Thanks
Shivam Kalra
For the most part, either approach is functionally capable of handling what you describe, but by the sounds of things performance will be a crucial issue. From the figures you have provided, it seems that the application could be CPU-bound.
A multithreaded approach has the advantage of using all of the available CPU cores on the machine, but multithreaded programs are notorious for being difficult to make reliable and robust.
You could also use Apache's old tried-and-true forked-worker model - create, say, a separate process to handle a maximum of 100 devices each. You would then need to write code to manage the mapping of connections to processes.
You could also use multiple hosts and some mechanism to distribute devices among them. This would have the advantage of making it easier to handle recovery situations. It sounds like your application could well be mission critical, and it may need to be architected so that if any one piece of hardware breaks then other hardware will take over automatically.
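To make the multiplexing option concrete, here's a minimal sketch of a single-threaded epoll loop on Linux, with a timerfd tick driving the keep-alive and poll deadlines. The send_keepalive/send_poll/handle_response stubs stand in for your existing single-device library and are assumptions, not real APIs:

#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>

#define MAX_DEVICES 1024

struct device {
    int fd;                   /* connected UDP socket for this monitor */
    uint64_t next_alive_ms;   /* next keep-alive deadline */
    uint64_t next_poll_ms;    /* next poll-request deadline */
    uint64_t alive_period_ms; /* e.g. 250, to stay under a 300 ms limit */
    uint64_t poll_period_ms;  /* e.g. 3x the averaging interval */
};

/* stand-ins for the existing single-device protocol library */
static void send_keepalive(struct device *d)  { (void)d; }
static void send_poll(struct device *d)       { (void)d; }
static void handle_response(struct device *d) { (void)d; /* recvfrom + parse */ }

static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

int main(void)
{
    static struct device devs[MAX_DEVICES];
    int ndev = 0; /* filled in by the association phase (omitted here) */

    int ep = epoll_create1(0);

    /* one timer drives all deadlines; 100 ms tick */
    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
    struct itimerspec tick = { {0, 100000000L}, {0, 100000000L} };
    timerfd_settime(tfd, 0, &tick, NULL);

    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = NULL };
    epoll_ctl(ep, EPOLL_CTL_ADD, tfd, &ev);

    for (int i = 0; i < ndev; i++) {
        struct epoll_event dev_ev = { .events = EPOLLIN, .data.ptr = &devs[i] };
        epoll_ctl(ep, EPOLL_CTL_ADD, devs[i].fd, &dev_ev);
    }

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.ptr == NULL) {   /* timer tick */
                uint64_t expirations;
                read(tfd, &expirations, sizeof expirations);
                uint64_t t = now_ms();
                for (int j = 0; j < ndev; j++) {
                    if (t >= devs[j].next_alive_ms) {
                        send_keepalive(&devs[j]);
                        devs[j].next_alive_ms = t + devs[j].alive_period_ms;
                    }
                    if (t >= devs[j].next_poll_ms) {
                        send_poll(&devs[j]);
                        devs[j].next_poll_ms = t + devs[j].poll_period_ms;
                    }
                }
            } else {                            /* datagram from a device */
                handle_response((struct device *)events[i].data.ptr);
            }
        }
    }
}

Set each device's alive period somewhat below its 300 ms limit so the 100 ms tick granularity can't push a keep-alive past the deadline. A sorted timer queue would scale better than the linear scan, but the scan is fine at 1000 devices.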
I have a process wherein a program running in an application server must access a table in an Oracle database server whenever at least one row exists in this table. Each row of data relates to a client requesting some number crunching performed by the program. The program can only perform this number crunching serially (that is, for one client at a time rather than multiple clients in parallel).
Thus, the program needs to be informed of when data is available in the database for it to process. I could either
have the program poll the database, or
have the database trigger the program.
QUESTION 1: Is there any conventional wisdom why one approach might be better than the other?
QUESTION 2: I wonder if programs have any issues "running" for months at a time (would any processes on the server stop or disrupt the program from running? If so, I don't know how I'd learn there was a problem, unless from angry customers). Does anyone have experience running programs on a server for a long time without issues? Or, if the server does crash, is there a way to auto-start a program (i.e. a C executable) after the server reboots, so a human isn't required to start it manually?
Any advice appreciated.
UPDATE 1: Client is waiting for results, but a couple seconds additional delay (from polling) isn't a deal breaker.
I would like to give a more generic answer...
There is no right answer that applies every time. Sometimes you need a trigger, and sometimes it's better to poll.
But… 9 out of 10 times, polling is much more efficient, safer, and faster than triggering.
It's really simple. A trigger needs to instantiate a single program, of whatever nature, for every shot. That is just not efficient most of the time. Some people will argue that triggers are required when response time is a factor, but even then, half the time polling is better because:
1) Resources: With triggers and, say, 100 messages, you will need resources for 100 threads; with 1 thread processing a packet of 100 messages, you need resources for 1 program.
2) Monitoring: A thread processing packets can constantly report the time consumed for a defined packet size, clearly indicating how it is performing and when and how performance is being affected. Try that with a billion triggers jumping around…
3) Speed: Instantiating threads and allocating their resources is very expensive. And don't get me started if you are opening a transaction for each trigger. A simple program processing a 100-message packet will always be much faster than initiating 100 triggers…
4) Reaction time: With polling you cannot react to things the instant they happen. So the only exception to polling is when a user is waiting for the message to be processed. But then you need to be very careful, because if you have lots of clients doing the same thing at the same time, triggering might respond LATER than if you were doing fast polling.
My 2cts. This has been learned the hard way ..
1) Have the program poll the database, since you don't want your database to be able to start host programs (because you'd have to make sure that only "your" program can be started this way).
The classic (and most convenient, IMO) way of doing this in Oracle would be through the DBMS_ALERT package.
The first program would signal an alert with a certain name, passing an optional message. A second program which registered for the alert would wait and receive it immediately after the first program commits. A rollback of the first program would cancel the alert.
Of course you can have many sessions signaling and waiting for alerts. However, an alert is a serialization device, so if one program has signaled an alert, other programs signaling the same alert name will be blocked until the first one commits or rolls back.
Table DBMS_ALERT_INFO contains all the sessions which have registered for an alert. You can use this to check if the alert-processing is alive.
2) Autostarting or background execution depends on your host platform and OS. On Windows you can use SRVANY.EXE to run any executable as a service.
I recommend using a C program to poll the database and a utility such as monit to restart the C program if there are any problems. Your C program can touch a file once in a while to indicate that it is still functioning properly, and monit can monitor the file. Monit can also check the process directly and make sure it isn't using too much memory.
For more information, see my answer to this other question:
When a new row in database is added, an external command line program must be invoked
Alternatively, if people aren't sitting around waiting for the computation to finish, you could use a cron job to run the C program on a regular basis (e.g. every minute). Then monit would be less necessary, because your C program would start and stop all the time anyway.
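Putting those pieces together, here's a sketch of the polling daemon plus the heartbeat file for monit. The db_has_work/db_process_one stubs stand in for your actual OCI or Pro*C data access, and the paths and table name are illustrative:

#include <stdio.h>
#include <unistd.h>

/* stand-ins for the real database access (OCI, Pro*C, etc.) */
static int db_has_work(void)
{
    /* e.g. SELECT COUNT(*) FROM request_queue -- hypothetical table */
    return 0;
}

static void db_process_one(void)
{
    /* crunch one client's request, then delete its row */
}

int main(void)
{
    const char *heartbeat = "/var/run/cruncher.alive"; /* illustrative path */

    for (;;) {
        /* touch the heartbeat; monit alarms if its mtime goes stale */
        FILE *f = fopen(heartbeat, "w");
        if (f) fclose(f);

        /* serial processing: one client at a time, as required */
        while (db_has_work())
            db_process_one();

        sleep(5); /* poll interval; a couple seconds' delay is acceptable */
    }
}

monit can then alert, or restart the daemon, whenever the heartbeat file's modification time goes stale or the process misbehaves.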
You might want to look into Oracle's "Change Notification":
http://docs.oracle.com/cd/E11882_01/appdev.112/e25518/adfns_cqn.htm
I don't know how well this integrates with a "regular" C program though.
It's also available through .Net and Java/JDBC
http://docs.oracle.com/cd/E11882_01/win.112/e23174/featChange.htm
http://docs.oracle.com/cd/E11882_01/java.112/e16548/dbchgnf.htm
There are simple job managers like Gearman that you can use to send a job message from the database to a worker. Gearman has, among others, a MySQL user-defined function interface, so it is probably easy to build one for Oracle as well.