Reading data from a continuously updating log file in C

I have a log file (similar to a web server's access log) which I need to read continuously, using a regular expression to fetch a value from each line.
For example, if you could imagine reading from a web server's access log, I would be getting the IP address of each visitor as their visit is written to the log.
Of course this is easy to do from the Linux command line (some combination of tail and sed, or the like), but I want to do it in C code.
I guess I could open the file, read X lines, save the number of the last line I read, and then on the next round open the file again, skip to line X, and read from there, and so on, but this seems very clumsy.
Is there a known way or best practice for reading data from a continuously updating log file?
Thanks
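The usual trick is the one tail -f itself uses: read until EOF, clear the stream's EOF flag with clearerr(), sleep briefly, and try again, so you never have to reopen the file or count lines. A minimal sketch, assuming an append-only log at a made-up path and a crude IPv4 pattern via POSIX regex.h:

#include <stdio.h>
#include <unistd.h>
#include <regex.h>

int main(void)
{
    FILE *fp = fopen("/var/log/access.log", "r");  /* hypothetical path */
    if (!fp) return 1;
    fseek(fp, 0, SEEK_END);            /* start at the current end, like tail -f */

    regex_t re;                        /* crude IPv4 pattern, assumed for the example */
    regcomp(&re, "([0-9]{1,3}\\.){3}[0-9]{1,3}", REG_EXTENDED);

    char line[4096];
    for (;;) {
        if (fgets(line, sizeof line, fp)) {
            regmatch_t m;
            if (regexec(&re, line, 1, &m, 0) == 0)
                printf("%.*s\n", (int)(m.rm_eo - m.rm_so), line + m.rm_so);
        } else {
            clearerr(fp);              /* clear EOF so the next fgets sees new data */
            sleep(1);                  /* wait for the writer to append more */
        }
    }
}

This sketch doesn't handle log rotation; for that you'd periodically stat() the path and reopen when the inode changes, which is what tail -F does.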

Related

Changing and removing lines from a text file using SWI-Prolog

I'm using text files as a database for saving users' information for a game which I made using SWI-Prolog. The information is saved like this: user(Name,Password,Age,Points). What I want to do is change a user's Points without having to rewrite the entire database. In other words, I am looking for something that will work like retractall(user(Name,_,_,_)), but on the text file. I know how to find the specific user using read/2, and how to assert a new fact using write/2, but I don't know how to delete one specific line from the text file.
Thank you for helping.
Take a look at SWI-Prolog's library(persistency). It removes a fact by appending a line recording that the fact was removed. If the file grows too large from all the add/remove lines, it provides db_sync/1 to write a clean file. File system operations do not allow removing part of a file (other than truncating the end). The normal way to do this is to write a new file and, if successful, rename it over the existing one, so nothing is lost if you crash while writing the new file.
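The same write-then-rename pattern is easy to do from any language; here is a sketch in C (the file names are invented for illustration):

#include <stdio.h>

int update_db(void)
{
    FILE *in  = fopen("users.db", "r");        /* hypothetical file names */
    FILE *out = fopen("users.db.tmp", "w");
    if (!in || !out) {
        if (in)  fclose(in);
        if (out) fclose(out);
        return -1;
    }

    char line[512];
    while (fgets(line, sizeof line, in)) {
        /* copy every record, rewriting or skipping the ones to change */
        fputs(line, out);
    }
    fclose(in);
    if (fclose(out) != 0) return -1;

    /* on POSIX, rename() replaces the target atomically: a crash leaves
       either the old file or the new one, never a half-written mix */
    return rename("users.db.tmp", "users.db");
}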

Write to beginning of file in Scheme

I want to create a log file in Scheme, but every time I add a new entry, I want it to be at the beginning of the file, so when I read X number of logs from the file again, it reads the X newest entries from new to old.
Example:
22/02/14 13:50 Newest log entry
22/02/14 13:45 Older log entry
22/02/14 13:40 Oldest log entry
Does anyone know how to do this using the 'open-input-file' and 'open-output-file' procedures?
The functionality you are requesting means rewriting the whole logfile every time you add a new entry, because the new entry overwrites what was previously the first one. Programs usually don't keep the already-committed parts of a logfile in memory, so this increases memory usage, and your program must also know when the log is rotated so it can clear its buffer.
The standard way is to append a new entry, which leaves the previous log entries where the last log write put them.
As a compromise, you might look for a program that displays a log file in reverse order, and perhaps tails it that way too. It's easy to implement, so I'd guess one already exists; writing such an app yourself would be trivial if it doesn't.
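To give an idea of how trivial: a throwaway reverse viewer in C (the log file name is invented) can just load all lines and print them newest-first. It holds everything in memory, so it's only suitable for modest files:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 100000                /* arbitrary cap for the sketch */

int main(void)
{
    static char *lines[MAX_LINES];
    int n = 0;
    char buf[1024];
    FILE *fp = fopen("game.log", "r");  /* hypothetical file name */
    if (!fp) return 1;
    while (n < MAX_LINES && fgets(buf, sizeof buf, fp))
        lines[n++] = strdup(buf);       /* keep a copy of each line */
    fclose(fp);
    while (n-- > 0) {                   /* print back to front */
        fputs(lines[n], stdout);
        free(lines[n]);
    }
    return 0;
}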

Unknown Master List .dat file, issues retrieving information

I come to you completely stumped. I do some side work for a company that uses an old DOS-based program to input and retrieve data. This is a legacy piece of software, and they have since moved to either QuickBooks or Outlook for all of their address and billing related needs. However, some changes have been made since, and they still work with this database fairly regularly. Since the computer this software runs on is an XP machine (and none of the other computers in the office can run it), they're looking to phase the software out before the computer inevitably explodes.
TL;DR: I have an old .csv file (roughly two years old) that has a good chunk of the information, but again, it's two years out of date. I have another file called ml.dat (I'm assuming masterlist.dat) that's in the same folder as the legacy software. When I open it with Notepad or Excel I'm presented with information like this:
S;Û).;PÃS;*p(â'a,µ,
The above chunk of text is actually one of the more recognizable parts; most of the file shows up as unrecognized squares in Notepad or Excel.
Some of the information is readable, however. I can, for example, make out the occasional town name or person's name, but I'm unable to get all of the information since so much is missing. Perhaps the data isn't in Unicode or something? I have no idea. Any suggestions? I'm ultimately trying to take this information and put it into either QuickBooks or Outlook.
Please help!
Thanks
Edit: I'm guessing the file might be encrypted since .dat's are usually clear text? Any thoughts?
.DAT files can be anything; they are usually just application data. Since there is readable text, it is very unlikely that this file is encrypted. Instead you are seeing ASCII representations of the bytes of other (binary) content. http://www.asciitable.com/ Assuming single-byte values, the number 77 might appear in the file somewhere as M.
Your options:
1. Search for some utility to load and translate the .dat file for that application.
2. Set up an appropriate DOS emulator so you can run this application on another box, or even a virtual machine running FreeDOS or something.
3. Figure out the file format and then write a program to translate the data.
For #3, you can attach a debugger to the application to trace how the file is read and written. Alternatively you can try to figure out record boundaries (if all the records are the same size, then things are a little bit easier.) Then you can use known values to try to find field boundaries. If you can find (or reverse compile) the source code, then that could also give you insight into the file format.
#1 is your best bet, and #2 will buy you some time so that you don't need the original machine anymore. #3 would likely be something to outsource.
If you can find the source or file format, then you just recreate whatever data structure was dumped to the file and read the file into it.
To find which exe opens it, you can do something like:
for %f in (*.exe) do find /c "ml.dat" %f
Assuming the original application was written in C, there would be code something like this to read the first record from the file:
#include <stdio.h>

/* guessed layout; the real field types and sizes have to be reverse-engineered */
struct SecretData
{
    int first;
    double money;
    char city[10];
};

FILE* input;
struct SecretData secretdata;

input = fopen("ml.dat", "rb");
fread(&secretdata, sizeof(secretdata), 1, input);  /* read the first record */
fclose(input);
(The file would have been written with fwrite.) Basically you need to figure out the innards of the SecretData structure to be able to read the file. Bear in mind that the compiler may pad the structure, so the on-disk layout also depends on how the original program was compiled.
There likely wasn't a separate utility used to make the file, dumping data and reading it back from a file is relatively easy in most languages.
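For option #3, a throwaway hexdump is a quick way to eyeball record boundaries: if records are fixed-size, the readable fragments (names, towns) will repeat at a constant stride. A sketch:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    FILE *fp = fopen("ml.dat", "rb");
    if (!fp) return 1;
    unsigned char row[16];
    size_t n, off = 0;
    while ((n = fread(row, 1, sizeof row, fp)) > 0) {
        printf("%08zx  ", off);                    /* byte offset */
        for (size_t i = 0; i < sizeof row; i++) {
            if (i < n) printf("%02x ", row[i]);    /* hex view */
            else       printf("   ");
        }
        printf(" ");
        for (size_t i = 0; i < n; i++)             /* printable ASCII view */
            putchar(isprint(row[i]) ? row[i] : '.');
        putchar('\n');
        off += n;
    }
    fclose(fp);
    return 0;
}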

Following multiple log files efficiently

I intend to create a program that permanently follows a large, dynamic set of log files and copies their entries into a database for easier near-realtime statistics. The log files are written by diverse daemons and applications, but their format is known, so they can be parsed. Some of the daemons write logs into one file per day, like Apache's cronolog, which creates files like access.20100928. Those files appear with each new day and may disappear when they're gzipped away the next day.
The target platform is an Ubuntu Server, 64 bit.
What would be the best approach to efficiently reading those log files?
I could think of scripting languages like PHP that either open the files themselves and read new data, or use system tools like tail -f to follow the logs; or other runtimes like Mono. Bash shell scripts probably aren't well suited for parsing the log lines and inserting them into a database server (MySQL), not to mention easy configuration of my app.
If my program reads the log files itself, I'd think it should stat() each file once a second or so to get its size, and open the file when it has grown. After reading the file (which should hopefully return only complete lines) it could call tell() to get the current position, and next time directly seek() to the saved position to continue reading. (These are C function names, but Mono/.NET and PHP offer similar functions, and I wouldn't actually want to do this in C.)
Is that constant stat()ing of the files and subsequent opening and closing a problem? How would tail -f do that? Can I keep the files open and be notified about new data with something like select()? Or does it always return at the end of the file?
In case I'm blocked in some kind of select() or an external tail, I'd need to interrupt it every minute or two to scan for new or deleted files that should (no longer) be followed. Resuming with tail -f then is probably not very reliable; that should work better with my own saved file positions.
Could I use some kind of inotify (file system notification) for that?
If you want to know how tail -f works, why not look at the source? In a nutshell, you don't need to periodically interrupt or constantly stat() to scan for changes to files or directories. That's what inotify does.
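A minimal inotify sketch (Linux-only; the watched directory is made up). read() blocks until something changes, so there is no polling loop: IN_MODIFY fires on appends, and IN_CREATE/IN_DELETE catch daily files appearing and being rotated away:

#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    int fd = inotify_init();
    if (fd < 0) return 1;
    inotify_add_watch(fd, "/var/log/myapp",         /* hypothetical directory */
                      IN_MODIFY | IN_CREATE | IN_DELETE);

    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);    /* blocks until an event */
        if (len <= 0) break;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            printf("event mask=0x%x name=%s\n",
                   (unsigned)ev->mask, ev->len ? ev->name : "(watched dir)");
            /* here: open/seek the touched file and read its new lines */
            p += sizeof(*ev) + ev->len;
        }
    }
    close(fd);
    return 0;
}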

C: Remove the first line from a text file without rewriting the file

I've got a service which runs all the time and also keeps a log file. It basically adds new lines to the log file every few seconds. I've written a small program which reads these lines and then parses them into various actions. The question I have is: how can I delete the lines I have already parsed from the log file, without disrupting the writing of the log file by the service?
Usually when I need to delete a line from a file, I open the original and a temporary one, and write all the lines to the temp file except the one I want to delete. Obviously this method will not work here.
So how do I go about deleting them?
In most commonly used file systems you can't delete a line from the beginning of a file without rewriting the entire file. I'd suggest instead of one large file, use lots of small files and rotate them for example once per day. The old files are deleted when you no longer need them.
Can't be done, unfortunately, without rewriting the file, either in-place or as a separate file.
One thing you may want to look at is to maintain a pointer in another file, specifying the position of the first unprocessed line.
Then your process simply opens the file and seeks to that location, processes some lines, then updates the pointer.
You'll still need to roll over the files at some point lest they continue to grow forever.
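A sketch of that pointer-file idea in C (the file names are invented): read the saved offset, seek to it, process whatever new lines are there, then persist the new offset for the next run:

#include <stdio.h>

int process_new_lines(void)
{
    long pos = 0;
    FILE *pf = fopen("service.log.pos", "r");  /* hypothetical side file */
    if (pf) { fscanf(pf, "%ld", &pos); fclose(pf); }

    FILE *log = fopen("service.log", "r");
    if (!log) return -1;
    fseek(log, pos, SEEK_SET);                 /* jump to the first unprocessed line */

    char line[1024];
    while (fgets(line, sizeof line, log)) {
        /* parse the line and trigger the corresponding action here */
    }
    pos = ftell(log);                          /* remember where we stopped */
    fclose(log);

    pf = fopen("service.log.pos", "w");
    if (!pf) return -1;
    fprintf(pf, "%ld\n", pos);
    fclose(pf);
    return 0;
}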
I'm not sure, but I'm thinking of it this way:
A newline is just a character, so you have to delete all the characters of that line plus its newline character.
And "moving" all the remaining characters back (to overwrite the old line) amounts to copying each character to a different position and removing it from its old one.
So no, I don't think you can just delete a line; you have to rewrite the whole file.
You can't, that just isn't how files work.
It sounds like you need some sort of message logging service / library that your program could connect to in order to log messages, which could then hide the underlying details of file opening / closing etc.
If each log line has a unique identifier (or even just a line number), you could simply have your parser store the identifier up to which it has parsed. That way you don't have to change anything in the log file.
If the log file then starts to get too big, you could switch to a new one each day (for example).
