Logstash unique output to file

Does anybody know how to configure logstash to obtain distinct results in the output file?
For example, if I have a couple of equal lines from the input source, I will get duplicated results in my output file. I am interested in something like 'uniq' in Unix.
Thanks in advance for your help.

The only way you could do this with logstash would be to write a plugin for it, but since logstash is a long-running process, you'd have to be very careful about how you go about it. To only output unique items, you have to keep track in memory of every item you have ever seen. So without some way to purge that data from memory, your logstash process will consume all of the JVM's memory and eventually crash.
So it may make more sense to preprocess your log file with some external process that does a sort/unique and then writes the output to a log file that you can then ingest with logstash.
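A minimal sketch of that preprocessing step in shell, assuming a plain-text log (the file names here are just placeholders, not anything logstash requires):
# deduplicate the raw log before logstash ingests it
sort -u /var/log/app/raw.log > /var/log/app/deduped.log
# then point the logstash file input at deduped.log instead of raw.log
Note that sort -u reorders the lines; if you need to keep the original order, awk '!seen[$0]++' does the same deduplication without sorting.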

Related

identifying data file type

I have a huge 1.9 GB data file without an extension that I need to open and get some data from. The problem is that this data file is extension-less, and I need to know what extension it should be and what software I can open it with to view the data in a table.
Here is a picture of it (screenshot omitted). It's only a 2-line file. I already tried CSV in Excel, but that did not work. Any help?
I have never used it, but you could try this:
http://mark0.net/soft-tridnet-e.html
explained here:
http://www.labnol.org/software/unknown-file-extensions/20568/
The third "column" of that line looks 99% chance to be from php's print_r function (with newlines imploded to be able to stored on a single line).
There may not be a "format" or program to open it with if its just some app's custom debug/output log.
A quick google found a few programs to split large files into smaller units. THat may make it easier to load into something (may or may not be n++) for reading.
It shouldnt be too hard to mash out a script to read the lines and reconstitute the session "array" into a more readable format (read: vertical, not inline), but it would to be a one-off custom job, since noone other than the holder of your file would have a use for it.

Managing log file size

I have a program which logs its activity.
I want to implement a log file mechanism to keep the log file under a certain size, let's say 10 MB.
The log file itself just holds commands the program executed; those commands are variable length.
Right now, the program runs in a Windows environment, but I'm likely to port it to UNIX soon.
I've come up with two methods for managing the log files:
1. Keep multiple smaller files, and if writing the new command would push the current file over its limit, truncate the oldest file to zero size and start writing there.
2. Keep a header in the file which holds metadata about the first command in the file and the next place to write to. Also, I think each command should then hold metadata about its own length.
My questions are as follows:
In terms of efficiency which of these methods would you use, and why?
Is there a Unix command / function to do this easily?
Thanks a lot for your help,
Nihil.
On UNIX/Linux platforms there's a logrotate program that manages logfiles. Details can be found for example here:
http://linuxcommand.org/man_pages/logrotate8.html
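For illustration, a minimal logrotate configuration along those lines might look like this (the path and the limits are just examples, not taken from your program):
/var/log/myprogram/commands.log {
    # rotate once the file reaches roughly 10 MB, keep 5 compressed copies
    size 10M
    rotate 5
    compress
    missingok
    notifempty
}
logrotate is normally driven by cron, so the size check only happens each time it runs; it won't truncate the file the instant it crosses the limit.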

Check if another program has a file open

After doing tons of research and not being able to find a solution to my problem, I decided to post here on Stack Overflow.
Well, my problem is kind of unusual, so I guess that's why I wasn't able to find an answer:
I have a program that is recording stuff to a file. Then I have another one that is responsible for transferring that file. Finally I have a third one that gets the file and processes it.
My problem is:
The file transfer program needs to send the file while it's still being recorded. The problem is that when the file transfer program reaches end of file, that doesn't mean the file is actually complete, since it is still being recorded.
It would be nice to have a way to check whether the recorder still has the file open or has already closed it, so I could judge whether an end of file is really the end or there simply isn't any further data to read yet.
Hope you can help me out with this one. Maybe you have another idea on how to solve this problem.
Thank you in advance.
GeKod
Simply put, you can't without using filesystem notification mechanisms; Windows, Linux and OS X all have flavors of this. I forget how Windows does it off the top of my head, but Linux has 'inotify' and OS X has 'knotify'.
The easy way to handle this is to record to a tmp file; when the recording is done, move the file into the 'ready-to-transfer-the-file' directory. If you do this so that the files are on the same filesystem when you do the move, the move will be atomic and instant, ensuring that any time your transfer utility 'sees' a new file, it'll be wholly formed and ready to go.
Or just have your tmp files have no extension, then when a file is done, rename it to an extension that the transfer agent is polling for.
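A rough sketch of that approach in shell (all paths and commands here are made-up placeholders); the mv is only atomic if the temporary file and the outbox directory are on the same filesystem:
# recorder side: write under a temporary name that the transfer agent ignores
TMP=/data/outbox/.recording.$$
your_recorder_command > "$TMP"
mv "$TMP" /data/outbox/session1.dat
# transfer side: only ever picks up fully renamed files
for f in /data/outbox/*.dat; do
    [ -e "$f" ] || continue
    your_transfer_command "$f" && rm "$f"
done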
Have you considered using a stream interface between the recorder program and the one that grabs the recorded data/file? If you have access to a stream interface (say an OS/stack service) which also provides a reliable end-of-stream signal/primitive, you could consider using that to replace the file interface.
There are no functions/libraries available in C to do this. But a simple alternative is to rename the file once an activity is over. For example, the recorder can open the file with the name file.record and, once done with recording, rename file.record to file.transfer. The transfer program looks for file.transfer; once the transfer is done, it renames the file to file.read, the reader reads that, and finally renames it to file.done!
You can check whether the file is open or not as follows:
FILE_NAME="filename"
FILE_OPEN=$(lsof | grep "$FILE_NAME")
if [ -z "$FILE_OPEN" ]; then
    echo "File NOT open"
else
    echo "File open"
fi
Refer to http://linux.about.com/library/cmd/blcmdl8_lsof.htm
I think an advisory lock will help. If a process tries to use a file that another program is working on, it will be blocked or get an error; if you access the file by force anyway, the action succeeds but the result is unpredictable. In order to maintain consistency, all of the processes that want to access the file should obey the advisory lock rule. I think that will work.
When the file is closed, the lock is freed too, and other processes can try to acquire it.
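As an illustration, the flock(1) utility from util-linux can take such an advisory lock from a shell script; it only works if both programs cooperate, and the path and recorder command below are placeholders:
# recorder: hold an exclusive advisory lock for as long as it is writing
flock /data/recording.dat -c ./your_recorder.sh
# transfer side: try to take the lock without blocking;
# failure means the recorder is still holding the file
if flock -n /data/recording.dat -c true; then
    echo "recorder has released the file"
else
    echo "file is still being recorded"
fi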

Following multiple log files efficiently

I'm intending to create a programme that can permanently follow a large dynamic set of log files to copy their entries over to a database for easier near-realtime statistics. The log files are written by diverse daemons and applications, but the format of them is known so they can be parsed. Some of the daemons write logs into one file per day, like Apache's cronolog that creates files like access.20100928. Those files appear with each new day and may disappear when they're gzipped away the next day.
The target platform is an Ubuntu Server, 64 bit.
What would be the best approach to efficiently reading those log files?
I could think of scripting languages like PHP that either open the files themselves and read new data, or use system tools like tail -f to follow the logs, or other runtimes like Mono. Bash shell scripts probably aren't so well suited for parsing the log lines and inserting them into a database server (MySQL), not to mention easy configuration of my app.
If my programme reads the log files itself, I think it should stat() each file once a second or so to get its size and open the file when it has grown. After reading the file (which should hopefully only return complete lines) it could call tell() to get the current position, and next time directly seek() to the saved position to continue reading. (These are C function names, but actually I wouldn't want to do that in C. And Mono/.NET or PHP offer similar functions as well.)
Is that constant stat()ing of the files and subsequent opening and closing a problem? How would tail -f do that? Can I keep the files open and be notified about new data with something like select()? Or does it always return at the end of the file?
In case I'm blocked in some kind of select() or external tail, I'd need to interrupt that every 1, 2 minutes to scan for new or deleted files that shall (no longer) be followed. Resuming with tail -f then is probably not very reliable. That should work better with my own saved file positions.
Could I use some kind of inotify (file system notification) for that?
If you want to know how tail -f works, why not look at the source? In a nutshell, you don't need to periodically interrupt or constantly stat() to scan for changes to files or directories. That's what inotify does.
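If you do want notifications instead of polling, a rough sketch using inotifywait from the inotify-tools package (the directory is only an example) could look like this:
# print one line per write/create/delete event instead of stat()ing in a loop
inotifywait -m -e modify -e create -e delete /var/log/apache2/ |
while read -r dir event file; do
    echo "event $event on $dir$file"
    # here you would seek() to your saved position in $dir$file and read the new lines
done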

A simple log file format

I'm not sure if it was asked, but I couldn't find anything like this.
My program uses a simple .txt file for log purposes. It just creates/opens a file and appends lines.
After some time, I started to log quite a lot of activities, so the file became too large and hardly readable. I know that it's not the right way to do this, but I simply need to have a readable file.
So I thought maybe there's a simple file format for log files and a tool to view it, or perhaps you have other suggestions on this question?
Thanks for help in advance.
UPDATE:
It's an Access 97 application. I'm logging some activities like form loading and SELECT/INSERT/UPDATE calls to MS SQL Server ...
The log file isn't really big, I just write the duration of operations, so I need a simple way to do this.
The Log file is created on a user's machine. It's used for monitoring purposes logging some activities' durations.
Is there a way of viewing that kind of simple Log file highlighted with an existing tool?
Simply, I'd like to:
1) Write something like "'CurrentTime' 'ActivityName' 'Duration in milliseconds'" (maybe some additional information like the query string) into a file.
2) Open it with a tool and view it highlighted or somehow more readable.
ANSWER: I've found a nice tool to do all I've wanted. Check my answer.
The 3 W's: when, what and where.
For viewing, try something like multitail ("tail on steroids"): http://www.vanheusden.com/multitail/
Or for pure MS Windows, try mtail: http://ophilipp.free.fr/op_tail.htm
And to keep your files readable, you might want to start a new file when the size of the current log file exceeds a certain limit. Example:
activity0.log (1 mb)
activity1.log (1 mb)
activity2.log (1 mb)
activity3.log (1 mb)
activity4.log (205 bytes)
A fairly standard way to deal with logging from an application into a plain text file is to:
split the logs into different program functional areas.
rotate the logs on a daily/weekly basis (i.e. split the log on a size or date basis)
so the current log is "mylog.log" or whatever, and yesterday's was "mylog.log.1" or "mylog.ddmmyyyy.log"
This keeps the size of the active log manageable. And then you can just have expiry rules so that old logs get thrown away on a regular basis.
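For instance, a hypothetical expiry rule in shell (path, pattern and age are purely illustrative) can be as simple as:
# throw away rotated logs older than 30 days
find /var/log/myprogram -name 'mylog.*.log' -mtime +30 -delete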
In addition it would be a good idea to have different log levels for your application (info/warning/error/fatal) so that you're not logging more than is necessary.
First, check you're only logging things that are useful.
If it's all useful, make sure it is easily parsable by tools such as grep; that way you can find the info you want. Make sure the type of log entry and the date/time all conform to a consistent layout.
Build yourself a few scripts to extract the information for you.
Alternatively, use separate log files for different types of entries.
Basically, you'd better just split logs according to severity. You'll rarely need to read all logs for the whole system. For example, Apache lets you configure an error log and an access log, and it's pretty obvious what information each of them holds.
If you're on a Linux system, grep is your best tool for searching through logs for specific entries.
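For example (the log name and layout are hypothetical):
# all error entries from one day
grep 'ERROR' activity.log | grep '2010-09-28'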
Look at popular logfiles like /var/log/syslog on Unix to get ideas:
MMM DD HH:MM:SS hostname process[pid]: message
Example out of my syslog:
May 11 12:58:39 raphaelm anacron[1086]: Normal exit (1 job run)
But to give you the perfect answer we'd need more information about what you are logging, how much and how you want to read the logs.
If only the size of the log file is the problem, I recommend using logrotate or something similar. logrotate watches log files and, depending on how you configure it, after a given time or when the log file exceeds a given size, it moves the log file to an archive directory and optionally compresses it. Then the original log file is truncated. For example, you could configure it to archive the log file every 24 hours or whenever the file's size exceeds 500 KB.
If this is a program, you might investigate the Apache logging libraries (http://logging.apache.org/). Out of the box, they'll give you a decent logging format. They're also customizable, so you can simplify your parsing job.
If this is a script, see some of the other answers.
LogExpert
I've found it here. The filter is better than in mtail. There's an option for highlighting just by adding a string, and the app is nice and readable. You can customize columns as you like.
