I'm writing a multiprocess server in C and I'm wondering what the best tools are to debug and test my programs. Specifically, I want to see what is being sent to the client and vice versa.
Thank you for your help.
Every process should write a log. It's not exactly a debugging tool like gdb, but it's very, very useful.
Every log entry should contain a precise timestamp, the process ID, and the socket data. You can write the log to a file (or files), to a database, or maybe to a log server. Logging to a database (e.g. SQLite) is useful because it's easy to filter the log for a specific time range, for a specific client connection, etc. It's also easy to merge the logs of different processes (SQLite: ATTACH DATABASE). On Linux, I would consider using syslog.
Specify different logging levels. Detailed logging helps you debug your code in the development phase. Basic logging will help you track down the rare errors that emerge in the long term. Make sure you can turn logging on and off and change logging levels easily, without restarting your server.
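For example, on Linux a minimal sketch using the standard syslog(3) API might look like this (the LOG_LOCAL0 facility and the SIGUSR1-based level toggle are illustrative choices, not requirements):

    #include <signal.h>
    #include <syslog.h>

    static volatile sig_atomic_t want_verbose = 0;

    static void on_sigusr1(int sig) { (void)sig; want_verbose = !want_verbose; }

    int main(void)
    {
        /* LOG_PID stamps every entry with the process ID;
           syslogd adds the timestamp for us. */
        openlog("myserver", LOG_PID | LOG_NDELAY, LOG_LOCAL0);
        signal(SIGUSR1, on_sigusr1);

        for (;;) {
            /* apply the requested level from the main loop, not the handler */
            setlogmask(LOG_UPTO(want_verbose ? LOG_DEBUG : LOG_NOTICE));

            syslog(LOG_NOTICE, "accepted connection");
            syslog(LOG_DEBUG, "sent %d bytes on fd %d", 42, 7); /* dropped unless verbose */
            break; /* placeholder: real accept/serve loop goes here */
        }
        closelog();
        return 0;
    }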
I've just started using Nagios to monitor a group of broadcast transmitters. Each transmitter is defined as a host, and each aspect of the transmitter I wish to monitor (RF forward, RF reflected, power supply voltages, etc) is defined as a service. In doing so, I can get an alarm if any of these aspects are out of tolerance, and can use the performance data to graph each aspect (using pnp4nagios, in this case).
To check the transmitters' telemetry data, I wrote some scripts, one to address the unique facilities of each make/model of transmitter involved. In keeping with the way I've seen other Nagios checks work, an argument to the script allows you to select which aspect you want reported.
At first I was content with this. It worked like any more-traditional use of Nagios I'd encountered. But then I hit a snag.
Because each service check is scheduled individually, diagnosing an alarm condition can be tricky, since the various services aren't all being checked at the same time - and therefore the set of values I'm looking at is unlikely to be time-aligned. If all the service check values were from the same moment in time, it would be easier to detect correlations (since the set of values would essentially be a snapshot).
My first thought would be to deal with this by running a single instance of a single command, which would return values for multiple services. This would also seem far more efficient than opening as many connection instances as there are services to be checked. From a scripting perspective, this is easily done. But from a Nagios config perspective, I don't know how (or if?) you'd do that.
I know I could also divorce the data collection from the Nagios check, caching the telemetry values all at once periodically, and feeding Nagios values from the cache. But I don't want to introduce added delays if I can help it.
Thoughts?
My first thought would be to deal with this by running a single instance of a single command, which would return values for multiple services. This would also seem far more efficient than opening as many connection instances as there are services to be checked. From a scripting perspective, this is easily done. But from a Nagios config perspective, I don't know how (or if?) you'd do that.
There's nothing strange about this from a Nagios perspective, because what you're essentially doing is writing your own plugin, and plugins can be as general or specific as you want them to be.
When writing your own plugin, it's good to remember:
Your script is responsible for all failures, so make sure you handle garbage responses, failed connections and whatever other errors you predict may happen in the plugin itself, and exit with appropriate error levels.
Since you may encounter errors you didn't expect, it probably makes sense to have the plugin write what it's doing to a log file, as well as what responses it got.
The plugin must use exit codes to alert Nagios correctly. If you want performance data, it needs to be given in the correct syntax; see the development guidelines and the sketch below.
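For illustration, here's a skeleton of such a plugin in C; the metric and thresholds are made up, but the exit codes and the perfdata syntax after the '|' follow the standard plugin conventions:

    /* Minimal Nagios-style plugin: one status line, performance data
       after the '|', and the standard exit codes. The metric and
       thresholds are made-up placeholders. */
    #include <stdio.h>

    #define OK       0
    #define WARNING  1
    #define CRITICAL 2
    #define UNKNOWN  3

    int main(void)
    {
        double fwd  = 950.0;              /* e.g. RF forward power, watts */
        double warn = 900.0, crit = 800.0;
        int state = (fwd < crit) ? CRITICAL : (fwd < warn) ? WARNING : OK;
        const char *label[] = { "OK", "WARNING", "CRITICAL", "UNKNOWN" };

        /* perfdata syntax: 'label'=value[UOM];warn;crit;min;max */
        printf("%s - forward power %.0fW|fwd=%.0fW;%.0f;%.0f;0;1000\n",
               label[state], fwd, fwd, warn, crit);
        return state;
    }

Several metrics can be reported from one check by space-separating them after the '|', which is one way to get the time-aligned snapshot discussed above.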
I'm considering submitting the service data passively. It would solve all the problems I mentioned. But it would create a few minor new ones - now there are external processes to keep running, and it's a little outside the mainstream way of doing things (it might put a future admin through a little pain to figure out how it works).
I don't think this is a better solution than writing your own plugin, unless the data is coming from nodes actively pushing it out.
For example, in an IoT context, the nodes you are monitoring may actually be sending passive check results directly to the Nagios instance. In that setting, passive checks make sense, because you just want to take whatever someone else gives you and act when no results come in (freshness).
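For reference, a passive result is just a line written to Nagios's external command file; a rough sketch in C (the command-file path is typical but varies per installation):

    #include <stdio.h>
    #include <time.h>

    /* Submit one passive service check result to Nagios by writing a
       PROCESS_SERVICE_CHECK_RESULT line to the external command file. */
    int submit_passive(const char *host, const char *svc,
                       int code, const char *output)
    {
        FILE *f = fopen("/usr/local/nagios/var/rw/nagios.cmd", "w");
        if (!f)
            return -1;
        fprintf(f, "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
                (long)time(NULL), host, svc, code, output);
        fclose(f);
        return 0;
    }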
In your case, it sounds like writing your own script would take care of both the timing issue and whatever else additional logic you want in your script, and as far as Nagios is concerned it should only run it on a schedule and watch the exit codes, then act as configured if it fails.
What are my best options for logging 3k events per second from a C program? Following are the options that come to my mind. I'm not able to decide which would be the most robust solution, with fewer failure points, higher reliability, and lower latency.
Use a messaging server to relay events as they happen
Use syslog for logging
Use Unix pipe
Use logging agents like Fluentd, which will send events to an analysis server
Write a log file locally, rotate it periodically, and ship it to the analysis server using something like rsync
Try syslog. There's no reason to make it too complicated. With syslog-ng you can do local logging through UDP, then set up the local syslogd to forward everything over TCP to a central syslog server. You might need to run without fsync on the central syslog server to keep up with that load (but test first); that risk can be mitigated by forwarding everything to two separate machines. This gives you asynchronous performance locally and enough reliability that you should almost never lose events.
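On the C side, nothing special is needed; the plain syslog(3) API already gives you the asynchronous local hop. The sketch below just makes the UDP leg explicit by sending RFC 3164-style datagrams to the local syslogd ('myapp' and the event text are placeholders):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Fire-and-forget one RFC 3164-style datagram at the local syslogd.
       <134> = facility local0 (16 * 8) + severity info (6). */
    static int log_udp(int fd, const char *msg)
    {
        struct sockaddr_in sa = {0};
        char buf[1024];

        sa.sin_family = AF_INET;
        sa.sin_port   = htons(514);
        inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);

        snprintf(buf, sizeof buf, "<134>myapp: %s", msg);
        return (int)sendto(fd, buf, strlen(buf), 0,
                           (struct sockaddr *)&sa, sizeof sa);
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        for (int i = 0; i < 3000; i++) {      /* roughly one second's worth */
            char line[64];
            snprintf(line, sizeof line, "event %d", i);
            log_udp(fd, line);
        }
        close(fd);
        return 0;
    }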
Another option I've used is to log events into Redis, Riak, or some other NoSQL data store (I usually don't recommend them for anything complex, but event logging is right up their alley). Set up mirroring for redundancy and they should be able to keep up with far more than 3k events per second.
I'm quite new to SQL Server and was wondering what the difference is between the SQL Server log and a custom log (in my case, using log4net)? I guess there's more choice about what to log using log4net, but what things are automatically logged by the database? For example, if a user signs up to my site, would I have to manually log that transaction, or would it be recorded in the database's log automatically? I'm currently starting a project and would like to figure out exactly what I should bother logging.
Thanks
Apples and Oranges.
Log4net and other custom 'logging' is just a way to capture events an application is reporting. 'Log' in this context refers to whatever store this infrastructure uses to persist information about those events.
The database log, on the other hand, is something completely different. In order to maintain consistency and atomicity, databases use a so-called Write-Ahead Logging (WAL) protocol. With WAL, all changes are first durably written to a journal, or log, before being applied to the data. This allows recovery to replay the log (the journal) and get the data back into a consistent state, rolling back any uncommitted work.
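As a toy illustration of the write-ahead idea (real engines use binary record formats and far more machinery than this):

    #include <stdio.h>
    #include <unistd.h>

    /* The change is durably journaled first, and only then applied to
       the data file. Recovery replays the journal to restore consistency. */
    void wal_update(FILE *journal, FILE *data,
                    long offset, const char *newval)
    {
        fprintf(journal, "UPDATE offset=%ld newval=%s\n", offset, newval);
        fflush(journal);
        fsync(fileno(journal));      /* the log hits disk before the data */

        fseek(data, offset, SEEK_SET);
        fputs(newval, data);         /* now the data itself can be changed */
    }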
Database logs have absolutely nothing to do with your application code. Any database update will be automatically logged by the engine, simply because this is how any data is updated in a database. You cannot modify that, nor do you have any access to what's written in the log (strictly speaking you can look into the log, but you won't find any useful information for your application).
The SQL log handles transaction logging for rolling back or committing data. It is usually only dealt with by someone who knows what they are doing, such as when restoring backups or shipping the logs for use as backups.
Log4net and other logging frameworks handle in-code logging of exceptions, warnings, or debug-level info that you would like to output for your own purposes. The output can be sent to a table in a database, a command window, a flat file, or a web service. Common logging scenarios are catching unhandled exceptions at the application level to help track down bugs, or writing out the stack trace in try/catch statements.
It keeps track of the transactions so it can roll them back, or replay them in case of a crash. That's quite a bit more involved than simple logging.
The two are almost completely unrelated.
A database log is used to rollback transactions, recover from crashes, etc. All good things to ensure database consistency. It has updates/inserts/deletes in it--not really anything about intent or what your app is trying to do unless it directly affects data in the database.
The application log on the other hand (with Log4Net) can be extremely useful when building and debugging your application. It is driven by you and should contain information that traces what your app is doing. This is something that can safely be turned off or reduced (by toggling the log level) when you no longer need it.
The SQL Server log file is actually used for maintaining its own stability, but it's not terribly useful to normal developers. It's not what you might think (and what I thought): a list of SQL statements that have been run. It's a proprietary format designed to help SQL Server recover from a crash or roll back transactions.
If you need to track what's going on in the system, the SQL transaction log won't be helpful, and it would be very difficult to get that information back out of it. Instead, I would suggest adding triggers on your tables that write information off to another table, or adding some code in your data layer that saves a log of what's going on. It could be as simple as wrapping the SQL command object with your own implementation, which saves SQL statements off to log4net in addition to executing the normal code.
It is the mechanism by which the RDBMS can assure atomicity and consistency; see ACID.
I've got an application in test that's logging events for a departmental Windows forms application to a SQL Server database. The application is nearly ready for production. The logging database is completely separate from the application database.
My question is, do I really need to create a production version of the logging database, or can I just log production and test events to the same database? The obvious answer is yes, of course I need a separate database. You never want test and production environments to mix. But in this case, the data being written to the database isn't really production data, exactly, it's just logging details that we use to troubleshoot problems. It has no business value, and nothing significant would be lost if the data were to be inadvertently dropped or the database were to be temporarily unavailable. And having it all in a test environment would make it much simpler for me to manage.
So on a pros and cons basis using a single logging database for all environments seems like a better solution. But it just doesn't feel right. Can anybody give me any specific reasons why this is a bad idea?
Your logging may not work if your dev server is down but prod isn't. I can guarantee that will happen exactly when something critical you need logged occurs on prod. In our case, prod and dev are not in the same physical location, which would mean sending logging data across our network, causing pipeline bottlenecks and cranky network guys.
Plus what if you decide to change the logging process? While you are doing new development, the entire prod process might break.
And there will be times when someone might read the logs and panic at some error, forgetting that it happened on dev. Or worse, someone might see a bad error that they thought was happening on dev but was really happening on prod.
I would say:
Use a standard methodology for logging (a single DLL or similar) and actually house the logging DB in production.
That way, your logging database can be considered a "Logging Server" and ALL apps (Dev, Staging, Test, and Prod) can log to it, since you are using a vetted library.
Of course, you have to still watch out that you don't flood the server...
I don't see anything wrong with keeping it on the dev box, unless your app will fail if it's unable to properly log, or unless the information being logged is more valuable than you indicate.
On the upside, keeping your logging db on your dev server will help take the load for handling this data off of the production server - a definite plus where performance is concerned.
Keeping logs in a database is a bad idea in the first place. What happens when you can't connect to the DB for some reason or another? I suggest you use log4net and implement RollingFileAppenders. They write log entries to a file, and when the file hits a certain limit, log4net starts writing to a new file. If you have questions getting set up, feel free to ask. I would be glad to help!
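The rolling idea itself is simple. Here's a sketch of the same pattern in C (the size limit and file names are arbitrary); log4net's RollingFileAppender does all of this, and much more, for you:

    #include <stdio.h>

    #define MAX_LOG_BYTES (1024L * 1024L)   /* roll after roughly 1 MB */

    static FILE *logf;
    static long  written;
    static int   generation;

    /* Append one line; once the size limit is hit, close the current
       file and start a new one (app.log.0, app.log.1, ...). */
    void log_line(const char *msg)
    {
        if (!logf || written >= MAX_LOG_BYTES) {
            char name[64];
            if (logf)
                fclose(logf);
            snprintf(name, sizeof name, "app.log.%d", generation++);
            logf = fopen(name, "a");
            written = 0;
        }
        if (logf)
            written += fprintf(logf, "%s\n", msg);
    }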
Note: This is not for unit testing or integration testing. This is for when the application is running.
I am working on a system which communicates to multiple back end systems, which can be grouped into three types
Relational database
SOAP or WCF service
File system (network share)
Due to the environment this will run in, there are no guarantees that any of those will be available at run time. In fact some of them seem pretty brittle and go down multiple times a day :(
The thinking is to have a small bit of test code which runs before the actual code. If there is a problem, then persist the request and poll the target system until it is available. Tests could also be rerun at logical points within the code to check the target is still available. The ultimate goal is to have a very stable system, regardless of the stability (or lack thereof) of the systems it communicates with.
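A rough sketch of the kind of pre-flight probe described above, assuming the targets can be reached over TCP (host names and ports are placeholders):

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Return 0 if a TCP connect to host:port succeeds, -1 otherwise.
       A real probe would also set a connect timeout and log failures. */
    int probe(const char *host, const char *port)
    {
        struct addrinfo hints, *res;
        int fd, rc = -1;

        memset(&hints, 0, sizeof hints);
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;
        fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) == 0)
            rc = 0;
        if (fd >= 0)
            close(fd);
        freeaddrinfo(res);
        return rc;
    }

    /* e.g. persist the request, then: while (probe("dbhost", "1433")) sleep(30); */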
My questions around this design are:
Are there major issues with it? (small things like the fact it may fail between the test completing and the code running are understandable)
Are there better ways to implement this sort of design?
Would using traditional exception handling and/or transactions be better?
Updates
The system needs to talk to the back end systems in a coordinated way.
The system is very async in nature so using things like queuing technologies is fine.
The system must run even if one or more backend systems are down as others may be up and processing of some information is possible.
You will be needing that traditional exception handling no matter what, since as you point out there's always the chance that things'll fail between your last check and the actual request. So I really think any solution you find should try to interact smoothly with this.
You don't say whether these flaky resources need to interact in some kind of coordinated manner; if they do, you should probably be using a transaction manager of some sort. I do not believe you want to get into the footwork of transaction management in application code for most needs.
Sometimes I have also seen people use AOP to encapsulate retry logic for back-end systems that fail (for instance due to time-out issues). Used sparingly, this may be a decent solution.
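Outside of AOP, the same encapsulation is just a wrapper around a callback; a sketch in plain C (the attempt count and delay are arbitrary):

    #include <unistd.h>

    /* Run op(ctx) up to max_tries times with a fixed delay between
       attempts; returns 0 on the first success, or the last failure. */
    int with_retry(int (*op)(void *), void *ctx, int max_tries, unsigned delay_s)
    {
        int rc = -1;
        for (int i = 0; i < max_tries; i++) {
            rc = op(ctx);
            if (rc == 0)
                return 0;
            if (i + 1 < max_tries)
                sleep(delay_s);       /* back off before the next attempt */
        }
        return rc;
    }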
In some cases you can also use message queuing technology to alleviate unstable back-ends. You could for instance commit to a message queue as part of a transaction, and only pop off the queue when successful. But this design is normally only possible when you're able to live with an asynchronous process.
And as always, real stability can only be achieved by attacking the root cause of the problem. I once had a 25-year-old bug in a mainframe TCP/IP stack fixed because we were overrunning it, so it is possible.
The Microsoft Smartclient framework provides a ConnectionMonitor class. It should be easy to use or duplicate.
Our approach to this kind of issue was to run a really basic 'sanity tester' prior to bringing up our main application. This was a thick client, so we could run the test every time the app started. This sanity test would go out and check things like database availability and external network (extranet) access, and it could have been extended to cover web services as well.
If there was a failure, the user was informed, and crucially an email was also sent to the support/dev team. These emails soon became unwieldy because so many were being created, but we then set up filters so we knew when something really bad was happening. Overall the approach worked pretty well; our biggest win was being able to tell users that the system was down before they had entered data and gotten partway through a long-winded process. They absolutely loved it.
At a technical level, the sanity tester was written in C#; it used exception handling in a conventional way to find the problems it was looking for. The sanity program became a mini app in its own right, standalone from the main app. If I were doing it again I'd use a logging framework to capture issues, which is more flexible than our hard-coded approach.