OmniMark file processing fails - SGML

We have an OmniMark script that takes a 2 GB SGML file as input and outputs a file of around 2.2 GB. The script is called from a Unix shell script, and we are facing an issue where sometimes the script runs successfully and sometimes it just aborts with no error. Any ideas or suggestions on how to debug this?

I have seen this type of issue before when running OmniMark v5.3, where the script bombs due to a lack of server resources/memory.
If you've specified writing to a log file, e.g. using -log logfilename.txt, then you would see something like error code #3000, "insufficient memory error".
http://developers.omnimark.com/docs/html/error/3000.htm
If there is no log file, then the first step would be to run the script in a console session so that any such abort message is visible.
Stilo have a page listing fixes in various versions of OmniMark:
http://developers.omnimark.com/docs/html/concept/806.htm
This mentions a variety of memory-related issues in various versions of the software (e.g. the use of certain translate rules), which may help your investigation.
Alternatively, you could add debug logging to the script, with a global switch to turn debugging on or off so you don't waste further I/O resources when you don't need it. The debug log file should be unbuffered. Add a message at certain breakpoints in the script; the more verbose, the better for narrowing down where and when the error occurs, but given the size of the file I suggest it's an I/O or memory error.
It also depends on what version of OmniMark you're using.
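For what it's worth, the pattern looks roughly like the sketch below. It is plain C rather than OmniMark and all the names are made up, but the idea (a global debug switch, an unbuffered log file, and checkpoint messages sprinkled through the script) carries over directly.

#include <stdarg.h>
#include <stdio.h>

static int   debug_enabled = 1;        /* global switch: set to 0 to turn debugging off */
static FILE *debug_log     = NULL;

static void debug_open(const char *path)
{
    if (!debug_enabled)
        return;
    debug_log = fopen(path, "a");
    if (debug_log != NULL)
        setvbuf(debug_log, NULL, _IONBF, 0);   /* unbuffered, so messages survive an abort */
}

static void debug_msg(const char *fmt, ...)
{
    if (!debug_enabled || debug_log == NULL)
        return;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(debug_log, fmt, ap);
    va_end(ap);
    fputc('\n', debug_log);
}

int main(void)
{
    debug_open("debug.log");
    debug_msg("checkpoint: starting to process input");
    /* ... the real work happens here ... */
    debug_msg("checkpoint: finished writing output");
    return 0;
}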

Related

MiNiFi GetFile processor fails to get large files

I'm running Apache MiNiFi C++. The flow starts with a GetFile processor.
The input directory includes some large files, and when I run MiNiFi the files above ~1.5 GB fail and do not get queued.
The log file states:
[org::apache::nifi::minifi::processors::GetFile] [Warning] failed to stat large_file_path_here
The other smaller files are queued as expected.
Does anyone have a clue what can be wrong? Why can't the processor manage the larger files?
Thanks in advance.
What you found seems to be a bug that is still present in the current MiNiFi implementation. The issue is that, at the file sizes you mentioned, a narrowing exception happens when trying to determine the length of the file to be written into the content repository.
We will try to fix this issue ASAP.
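I don't have the MiNiFi source in front of me, so the following is only a sketch of the kind of narrowing problem being described: a 64-bit file size squeezed into a 32-bit integer stops round-tripping once the file is large enough, and a checked narrowing turns that into an error rather than silent corruption.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int64_t file_size = 3LL * 1024 * 1024 * 1024;   /* e.g. a 3 GiB file */
    int32_t narrowed  = (int32_t)file_size;         /* does not fit in 32 bits */

    if ((int64_t)narrowed != file_size)
        fprintf(stderr, "narrowing lost information: %lld became %ld\n",
                (long long)file_size, (long)narrowed);
    return 0;
}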

Application calls old source functions

There is an application on a remote machine with Linux OS (Fedora) that writes to a log file when certain events occur. Some time ago I changed the format of the messages being written to the log file. But recently it turned out that, in some rare cases, log files with old-format messages still appear there. I know for sure that no part of my code can write such strings. Also, there is no instance of the old application running. Does anyone have any ideas why this can happen? It's not possible to check which process writes those files, because nothing like auditctl is installed there, and there is no package manager or yum to get or install it. The application is written in C.
You can use the fuser command to find out all the processes that are using that file:
`fuser file.log`

Is there any way to replicate a memory can't be read error message in my C# application?

Let me state upfront that I truly appreciate any assistance on this issue.
I have a C# (2.0) application. This is a relatively simple application that executes stored procedures based on an XML file that is passed as a command line parameter.
We use it as a tool to call different stored procedures. This application does some logging and for the most part works very well.
The application reads the stored procedure name and parameters from an XML file. It sets up a connection string and SQL Command object (System.Data.SqlClient.SqlCommand).
Then it runs the stored procedure with the ExecuteReader method.
Unfortunately on a handful of occasions this application has generated the following error:
"Application popup: StoredProcLauncher.exe - Application Error : The instruction
at "0x7c82c912" referenced memory at "0x00000000". The memory could not be "read"."
This error has appeared on multiple servers so it must be a code issue.
It seems that when our production server rolls a certain number it belches out this memory error.
The problem is I don’t see this issue on development. I can’t replicate it so I’m stuck.
Is there any way to simulate this error? Can I fill up the memory on my local PC somehow to attempt to replicate it?
Does anyone know some common coding issues that might result in an error like this?
Does anyone have some rope I can borrow?
One way to do this is to wrap the offending code in a try/catch block and write the stack trace and error message to the Windows application event log, a text file, email, etc.
This will give you some line numbers and additional information.
Also note, you may need to deploy this in debug mode, or at least copy the .pdb file along with the application exe/dll, so it can get the debug symbols. I can't remember off the top of my head how that works, but I think when you deploy in release mode you may lose some valuable debug information.
The instruction at "0x7c82c912" referenced memory at "0x00000000"
This is an access violation:
An access violation occurs in unmanaged or unsafe code when the code attempts to read or write to memory that has not been allocated, or to which it does not have access. This usually occurs because a pointer has a bad value.
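In native terms, the textbook way to produce exactly that message is reading through a null pointer. A minimal C sketch, purely illustrative and unrelated to the application in question:

#include <stdio.h>

int main(void)
{
    int *p = NULL;             /* a pointer with a "bad value" */
    printf("%d\n", *p);        /* reads address 0x00000000 -> access violation */
    return 0;
}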
Why does your program have unmanaged/unsafe code? For what you described, it needs no native code.
Alas, the code crashes, and now is not the time to wonder how it ends up calling native code. To solve the issue you're going to have to capture a dump and analyze it. See Capturing Application Crash Dumps. There are tools that specialize in this, like Breakpad. There are also services that can help you collect and track crashes generated from your app, like crittercism.com or AirBrake. I even created one for myself and made it public: bugcollect.com.

Foreach Loop Container with Foreach File Enumerator option iterates all files twice

I am using the SSIS Foreach Loop Container to iterate through files with a certain pattern on a network share.
I am encountering a kind of unreproducible malfunction of the Loop Container:
Sometimes the loop is executed twice. After all files have been processed, it starts over with the first file.
Has anyone encountered a similar bug?
Maybe not directly using SSIS but accessing files on a Windows share with some kind of technology?
Could this error relate to some network issues?
Thanks.
I found this to be the case whilst working with Excel files and using the *.xlsx wildcard to drive the foreach.
Once I put logging in place, I noticed that when the Excel file was opened it produced a file prefixed with ~$. This was picked up by the foreach loop.
So I used a trick similar to http://geekswithblogs.net/Compudicted/archive/2012/01/11/the-ssis-expression-wayndashskipping-an-unwanted-file.aspx to exclude files with a ~$ in the filename.
What error message (SSIS log / Eventvwr messages) do you get?
Similar to @Siva, I've not come across this, but here are some ideas you could use to try and diagnose it. You may be doing some of these already; I've just written them down for completeness from my thought processes...
Log all files processed: write a line to a log file/table before processing each file, then after processing it. Keep the full path of each file. This is actually something we do as standard with our ETL implementations, as users are often coming back to us with questions about when/what has been loaded. This will allow you to see whether files are actually being processed twice.
Perhaps try moving each file to a different directory after it is processed. That will make it more difficult to have a file processed a second time, and the problem may disappear. (If you are processing them from a "master" area and so can't move them, consider copying the files to a "waiting" folder, then processing them and moving them to a "processed" folder.)
@Siva's comment is interesting - look at the "traverse subfolders" check box.
Check your Event Viewer for odd network events, or application events (SQL Server restarting?).
Use Perfmon to see if there is anything odd happening in terms of network load on your server (a bit of a random idea!).
Try running your whole process with files on a local disk instead of a network disk. If your mean time between failures is around 10 runs, you could run the load locally 20-30 times; if you don't get an error, it may be a network issue.
Nothing helped, so I implemented the following workaround: a Script Task in the foreach iterator which tracks all files. If a file was already loaded, a warning is fired and the file is not processed again. Anyway, it seems to be some network-related problem...

Daemon logging in Linux

So I have a daemon running on a Linux system, and I want to have a record of its activities: a log. The question is, what is the "best" way to accomplish this?
My first idea is to simply open a file and write to it.
FILE* log = fopen("logfile.log", "w");
/* daemon works...needs to write to log */
fprintf(log, "foo%s\n", (char*)bar);
/* ...all done, close the file */
fclose(log);
Is there anything inherently wrong with logging this way? Is there a better way, such as some framework built into Linux?
Unix has had a special logging framework called syslog for a long while. Type this in your shell:
man 3 syslog
and you'll get the help for the C interface to it.
Some examples
#include <stdio.h>
#include <unistd.h>
#include <syslog.h>
int main(void) {
    openlog("slog", LOG_PID|LOG_CONS, LOG_USER);
    syslog(LOG_INFO, "A different kind of Hello world ... ");
    closelog();
    return 0;
}
This is probably going to be a one-horse race, but yes, the syslog facility, which exists in most if not all Un*x derivatives, is the preferred way to go. There is nothing wrong with logging to a file, but it does leave a number of tasks on your shoulders:
Is there a file system at your logging location to save the file?
What about buffering (for performance) vs. flushing (to get logs written before a system crash)?
If your daemon runs for a long time, what do you do about the ever-growing log file?
Syslog takes care of all this, and more, for you. The API is similar to the printf clan, so you should have no problems adapting your code.
One other advantage of syslog in larger (or more security-conscious) installations: The syslog daemon can be configured to send the logs to another server for recording there instead of (or in addition to) the local filesystem.
It's much more convenient to have all the logs for your server farm in one place rather than having to read them separately on each machine, especially when you're trying to correlate events on one server with those on another. And when one gets cracked, you can't trust its logs any more... but if the log server stayed secure, you know nothing will have been deleted from its logs, so any record of the intrusion will be intact.
I spit a lot of daemon messages out to daemon.info and daemon.debug when I am unit testing. A line in your syslog.conf can stick those messages in whatever file you want.
http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/040/4036/4036s1.html has a better explanation of the C API than the man page, imo.
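For reference, the routing mentioned above is a couple of lines in syslog.conf. The paths below are only examples, and the exact selector syntax can vary slightly between syslog implementations:

# Hypothetical /etc/syslog.conf entries:
# send daemon-facility messages at info and above, and at debug and above,
# to separate files
daemon.info     /var/log/daemon-info.log
daemon.debug    /var/log/daemon-debug.log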
Syslog is a good option, but you may wish to consider looking at log4c. The log4[something] frameworks work well in their Java and Perl implementations, and allow you to choose, from a configuration file, whether to log to syslog, the console, flat files, or user-defined log writers. You can define specific log contexts for each of your modules, have each context log at a different level (trace, debug, info, warn, error, critical) as defined by your configuration, and have your daemon re-read that configuration file on the fly by trapping a signal, allowing you to manipulate log levels on a running server.
As stated above, you should look into syslog. But if you want to write your own logging code, I'd advise you to use the "a" (append) mode of fopen.
A few drawbacks of writing your own logging code are: log rotation handling, locking (if you have multiple threads), and synchronization (do you want to wait for the logs to be written to disk?). One of the drawbacks of syslog is that the application doesn't know whether the logs have been written to disk (they might have been lost).
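A minimal sketch of that append-mode approach (the log path is just an example); note that closing the file is what flushes the line out, which matters if the daemon can die unexpectedly:

#include <stdio.h>
#include <time.h>

/* Append one timestamped line per event; "a" mode never truncates the file. */
static void log_line(const char *msg)
{
    FILE *log = fopen("/var/tmp/mydaemon.log", "a");
    if (log == NULL)
        return;                 /* decide for yourself: ignore, retry, or abort */

    char stamp[32];
    time_t now = time(NULL);
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", localtime(&now));

    fprintf(log, "%s %s\n", stamp, msg);
    fclose(log);                /* closing flushes the buffered line to disk */
}

int main(void)
{
    log_line("daemon started");
    log_line("something worth recording happened");
    return 0;
}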
If you use threading and you use logging as a debugging tool, you will want to look for a logging library that uses some sort of thread-safe, but unlocked, ring buffers. One buffer per thread, with a global lock only when strictly needed.
This avoids logging causing serious slowdowns in your software, and it avoids creating heisenbugs that change behavior when you add debug logging.
If it has a high-speed compressed binary log format that doesn't waste time with format operations during logging and some nice log parsing and display tools, that is a bonus.
I'd provide a reference to some good code for this but I don't have one myself. I just want one. :)
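I don't have a library to point at either, but a stripped-down sketch of the idea might look like this (hypothetical names throughout): each thread appends to its own thread-local ring buffer without taking any lock, and a global mutex is taken only when a buffer is flushed to the shared output.

#include <pthread.h>
#include <stdio.h>

#define RING_SLOTS 128
#define MSG_LEN    120

struct ring {
    char     slots[RING_SLOTS][MSG_LEN];
    unsigned head;              /* next slot to write               */
    unsigned count;             /* number of slots currently in use */
};

static _Thread_local struct ring tls_ring;                 /* one buffer per thread */
static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a message in the calling thread's own buffer: no lock taken. */
static void ring_log(const char *msg)
{
    struct ring *r = &tls_ring;
    snprintf(r->slots[r->head], MSG_LEN, "%s", msg);
    r->head = (r->head + 1) % RING_SLOTS;                  /* overwrite oldest when full */
    if (r->count < RING_SLOTS)
        r->count++;
}

/* Dump the calling thread's buffer to a shared stream: lock only here. */
static void ring_flush(FILE *out)
{
    struct ring *r = &tls_ring;
    unsigned first = (r->head + RING_SLOTS - r->count) % RING_SLOTS;

    pthread_mutex_lock(&flush_lock);
    for (unsigned i = 0; i < r->count; i++)
        fprintf(out, "%s\n", r->slots[(first + i) % RING_SLOTS]);
    fflush(out);
    pthread_mutex_unlock(&flush_lock);

    r->head = r->count = 0;
}

int main(void)                  /* build with: cc -std=c11 -pthread ... */
{
    ring_log("daemon started");
    ring_log("something unexpected happened");
    ring_flush(stderr);
    return 0;
}

The fast path stays lock-free only because each buffer is flushed by the thread that owns it; a collector thread reading other threads' buffers would need the kind of careful synchronization a real library provides.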
Our embedded system doesn't have syslog, so the daemons I write do debugging to a file using the "a" open mode, similar to how you've described it. I have a function that opens a log file, spits out the message and then closes the file (I only do this when something unexpected happens). However, I also had to write code to handle log rotation, as other commenters have mentioned, which consists of 'tail -c 65536 logfile > logfiletmp && mv logfiletmp logfile'. It's pretty rough and maybe should be called "log frontal truncation", but it stops our small RAM-disk-based filesystem from filling up with log files.
There are a lot of potential issues: for example, if the disk is full, do you want your daemon to fail? Also, you will be overwriting your file every time. Often a circular file is used so that you have space allocated on the machine for your file, but you can keep enough history to be useful without taking up too much space.
There are tools like log4c that can help you. If your code is C++, then you might consider log4cxx in the Apache project (apt-get install liblog4cxx9-dev on Ubuntu/Debian), but it looks like you are using C.
So far nobody has mentioned the Boost.Log library, which has a nice and easy way to redirect your log messages to files, a syslog sink, or even the Windows event log.
