Replay a file-based data stream

I have a live stream of data based on files in different formats. The data comes over the network and is written to files in certain subdirectories of a directory hierarchy. From there it is picked up and processed further. I would like to replay e.g. one day of this data stream for testing and simulation purposes. I could duplicate the data stream to a second machine for one day and "record" it that way, simply by letting the files pile up without processing or moving them.
I need something simple like a Perl script which takes a base directory, looks at all the files in its subdirectories and their creation times, and then copies each file at the same time of day to a different base directory.
Simple example: I have files a/file.1 2012-03-28 15:00, b/file.2 2012-03-28 09:00, c/file.3 2012-03-28 12:00. If I run the script/program on 2012-03-29 at 08:00 it should sleep until 09:00, copy b/file.2 to ../target_dir/b/file.2, then sleep until 12:00, copy c/file.3 to ../target_dir/c/file.3, then sleep until 15:00 and copy a/file.1 to ../target_dir/a/file.1.
Does a tool like this already exist? It seems I’m missing the right search keywords to find it.
The environment is Linux, command line preferred. For one day it would be thousands of files with a few GB in total. The timing does not have to be ultra-precise: second resolution would be good, minute resolution would be sufficient.
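A minimal sketch of such a replay script in Python (assuming Python 3 is acceptable instead of Perl; it uses mtime as a stand-in for creation time, since Linux file systems do not generally expose one, and works at second resolution):

#!/usr/bin/env python3
# replay.py -- sketch: copy each file from SRC to DST at the same time of day
# as the file's mtime (used here as a stand-in for creation time).
import os
import shutil
import sys
import time
from datetime import datetime, timedelta

src, dst = sys.argv[1], sys.argv[2]

# Collect (time of day in seconds, relative path) for every file under src.
entries = []
for root, _dirs, files in os.walk(src):
    for name in files:
        path = os.path.join(root, name)
        t = datetime.fromtimestamp(os.path.getmtime(path))
        entries.append((t.hour * 3600 + t.minute * 60 + t.second,
                        os.path.relpath(path, src)))
entries.sort()

midnight = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
for seconds, rel in entries:
    wait = (midnight + timedelta(seconds=seconds) - datetime.now()).total_seconds()
    if wait > 0:
        time.sleep(wait)                      # sleep until the original time of day
    target = os.path.join(dst, rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copy2(os.path.join(src, rel), target)
    print(f"{datetime.now():%H:%M:%S} copied {rel}")

Run it e.g. as python3 replay.py /data/recorded /data/target_dir before the first file's time of day; files whose time of day has already passed are copied immediately.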

Related

Copy subdirectories one at a time with a pause

I have a folder tree with several subdirectories within subdirectories. I am looking for a batch script to move one bottom-level subdirectory, wait 10 minutes, move the next, wait 10 minutes, and so on. Alternatively I could move one file, wait a minute, then move the next and so on, as long as it replicates the directory structure.
I have been able to COPY files from a folder with a delay between each file using instructions found here: Copy Paste Files with delay
However I want to MOVE files and include subdirectories.
The reason behind this request is that I am moving a large amount of media (video specifically) and it needs to be checked into a database once it lands in the scanned location. Because the database is in use, dumping the entire contents of the top-level folder at once would overload the system. At the moment I am manually moving one bottom-level folder at a time and it is taking up way too much of my time.
Thanks!
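A rough sketch of the move-one-bottom-level-folder-at-a-time idea, in Python rather than batch (SRC, DST and the 10-minute pause are placeholders):

#!/usr/bin/env python3
# Sketch: move bottom-level folders one at a time, pausing between moves.
import os
import shutil
import time

SRC = r"D:\media\incoming"   # placeholder source tree
DST = r"E:\media\scanned"    # placeholder destination root
PAUSE = 10 * 60              # 10 minutes between moves

# A bottom-level folder is one that contains no further subdirectories.
bottom = [root for root, dirs, _files in os.walk(SRC) if not dirs]

for i, folder in enumerate(bottom):
    rel = os.path.relpath(folder, SRC)
    target = os.path.join(DST, rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)  # replicate the structure
    shutil.move(folder, target)
    print(f"moved {rel} ({i + 1}/{len(bottom)})")
    if i + 1 < len(bottom):
        time.sleep(PAUSE)

The same idea works per file instead of per folder by walking the files and sleeping 60 seconds between moves.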

Date in NLog file name and limit the number of log files

I'd like to achieve the following behaviour with NLog for rolling files:
1. prevent renaming or moving the file when starting a new file, and
2. limit the total number or size of old log files to avoid capacity issues over time
The first requirement can be achieved e.g. by adding a timestamp like ${shortdate} to the file name. Example:
logs\trace2017-10-27.log <-- today's log file to write
logs\trace2017-10-26.log
logs\trace2017-10-25.log
logs\trace2017-10-24.log <-- keep only the last 2 files, so delete this one
According to other posts, however, it is not possible to use a date in the file name together with archive parameters like maxArchiveFiles. If I use maxArchiveFiles, I have to keep the log file name constant:
logs\trace.log <-- today's log file to write
logs\archive\trace2017-10-26.log
logs\archive\trace2017-10-25.log
logs\archive\trace2017-10-24.log <-- keep only the last 2 files, so delete this one
But in this case, on the first write of each day, it moves yesterday's trace file to the archive and starts a new one.
The reason I'd like to prevent moving the trace file is that we use a Splunk log monitor that watches the files in the log folder for updates, reads the new lines and feeds them to Splunk.
My concern is that if I have an event written at 23:59:59.567, the next event at 00:00:00.002 clears the previous content before the log monitor is able to read it in that fraction of a second.
To be honest I haven't tested this scenario, as it would be complicated to set up (my team doesn't own Splunk, etc.), so please correct me if this cannot happen.
Note also that I know it is possible to feed Splunk directly in other ways, such as via a network connection, but the current Splunk setup at our company reads from log files, so it would be easier that way.
Any idea how to solve this with NLog?
When using NLog 4.4 (or older) you have to go into Halloween mode and do some trickery.
This example makes hourly log files in the same folder and ensures that archive cleanup is performed after 840 hours (35 days):
fileName="${logDirectory}/Log.${date:format=yyyy-MM-dd-HH}.log"
archiveFileName="${logDirectory}/Log.{#}.log"
archiveDateFormat="yyyy-MM-dd-HH"
archiveNumbering="Date"
archiveEvery="Year"
maxArchiveFiles="840"
archiveFileName - Using {#} allows the archive cleanup to generate a proper file wildcard.
archiveDateFormat - Must match the ${date:format=} of the fileName (so remember to change both date formats if one of them needs changing).
archiveNumbering=Date - Configures the archive cleanup to parse the file names as dates.
archiveEvery=Year - Activates the archive cleanup, but also the archive file operation. Because the configured fileName already rolls to a new file on its own, we don't want any additional archive operations (e.g. this avoids generating extra empty files at midnight).
maxArchiveFiles - How many archive files to keep around.
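Put together, those attributes go on a File target roughly like this (a sketch; the target name, the logging rule and the logDirectory value are placeholders):

<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Sketch: "logfile" name, the rule and the logDirectory value are placeholders -->
  <variable name="logDirectory" value="logs" />
  <targets>
    <target xsi:type="File" name="logfile"
            fileName="${logDirectory}/Log.${date:format=yyyy-MM-dd-HH}.log"
            archiveFileName="${logDirectory}/Log.{#}.log"
            archiveDateFormat="yyyy-MM-dd-HH"
            archiveNumbering="Date"
            archiveEvery="Year"
            maxArchiveFiles="840" />
  </targets>
  <rules>
    <logger name="*" minlevel="Trace" writeTo="logfile" />
  </rules>
</nlog>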
With NLog 4.5 (still in beta) it will be a lot easier, as you just have to specify maxArchiveFiles. See also https://github.com/NLog/NLog/pull/1993

Strange timestamp duplication when renaming and recreating a file

I'm trying to rename a log file named appname.log to the form appname_DDMMYY.log for archiving purposes and then recreate an empty appname.log for further writing. When doing this on Windows 7 using C++ and either WinAPI or Qt calls (which may be the same internally), the newly created .log file strangely inherits the timestamps (last modified, created) from the renamed file.
This behaviour can also be observed when renaming a file in Windows Explorer and creating a file with the same name shortly afterwards in the same directory. But it has to be done fast: after clicking "New Text File" the timestamps are normal, but after the rename they change to the timestamps the renamed file had or still has.
Is this some sort of bug? How can I rename a file and recreate it shortly afterwards without getting the timestamps messed up?
This looks like it is by design, perhaps to preserve the creation time across "atomic saves". If an application does something like (save to temp, delete original, rename temp to original) to eliminate the risk of a mangled file, then without this behaviour the creation time would move forward on every save, and a file you have been editing for years would appear to have been created today. This kind of save pattern is very common.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724320(v=vs.85).aspx
If you rename or delete a file, then restore it shortly thereafter, Windows searches the cache for file information to restore. Cached information includes its short/long name pair and creation time.
Notice that modification time is not restored. So after saving the file appears to have been modified and the creation time is the same as before.
If you create "a-new" and rename it back to "a" you get the old creation time of "a". If you delete "a" and recreate "a" you get the old creation time of "a".
This behaviour is called "file tunneling". File tunneling exists "...to enable compatibility with programs that rely on file systems being able to hold onto file meta-info for a short period of time". Basically it provides backward compatibility for older Windows programs that use a "safe save" pattern: save the new contents to a temp file, delete the original, then rename the temp file to the original name.
Please see the following KB article: https://support.microsoft.com/en-us/kb/172190 (archive)
As a test example: create FileA, rename FileA to FileB, then create FileA again (within 15 seconds) and its creation date will be the same as FileB's.
This behaviour can be disabled in the registry as per the KB article above. This behaviour is also quite annoying when "forensicating" Windows machines.
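If I remember the KB article correctly, tunneling is controlled by DWORD values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem: MaximumTunnelEntries (0 disables tunneling) and MaximumTunnelEntryAgeInSeconds (how long entries are cached; the 15-second default matches the ~15-second window in the repro script below). A .reg sketch, with the value name to be double-checked against the KB article:

Windows Registry Editor Version 5.00

; Sketch: disable file tunneling entirely (verify the value name against KB 172190)
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"MaximumTunnelEntries"=dword:00000000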
Regards
Adam B
Here's a simple python script that repro's the issue on my Windows7 64bit system:
import time
import os

def touch(path):
    # open for append (creating the file if needed) and refresh its timestamps
    with open(path, 'ab'):
        os.utime(path, None)

touch('a')
print(" 'a' timestamp: ", os.stat('a').st_ctime)
os.rename('a', 'a-old')
time.sleep(15)  # above ~15 seconds the tunneling cache has expired
touch('a')
print("new 'a' timestamp: ", os.stat('a').st_ctime)
os.unlink('a')
os.unlink('a-old')
With the sleep time ~15 seconds I'll get the following output:
'a' timestamp: 1436901394.9
new 'a' timestamp: 1436901409.9
But with the sleep time <= ~10 seconds one gets this:
'a' timestamp: 1436901247.32
new 'a' timestamp: 1436901247.32
Both files... created 10 seconds apart have the same created-timestamp!

Why are compressed files modified at the end of compression?

Using 7-Zip I compressed ~15 GB worth of pictures, split across folders, into 15 volumes of 1024 MB each.
Compression method: LZMA2; level: Ultra; dictionary size: 64M.
At the end of compression some of the files had their "last modified" time changed to the time of completion, while some of the files didn't.
Why is this?
And if I have already uploaded most of the files will I be able to unarchive them successfully?
You would need to ask the author of the program for an explanation of why it modifies volumes at the end of the operation. If I had to make an educated guess, it might be because 7-zip doesn't know which is the last volume until it's finished (because this would depend on the compression ratio of the files being archived, which can't be predicted), and so it needs to go back and update parts of the volume file headers accordingly.
In general, though, quoting the relevant 7-zip help file entry:
NOTE: Please don't use volumes (and don't copy volumes) before
finishing archiving. 7-Zip can change any volume (including first
volume) at the end of archiving operation.
The only safe assumption is that you can't reliably use any of your individual 1GB volumes until 7-zip has finished processing the whole 15GB archive.

One log file for several batch processes locks up

I have very basic batch file knowledge. My first script was something I found to export Oracle Discoverer reports via Windows Task Scheduler, and that's basically all I know. I've got several of these (maybe 40 or so) that run at various times, some every 30 minutes, and they sometimes overlap in time.
My issue is not the specific Discoverer export but the logging of errors. I want to log everything to a single log file. With Excel and Access processes I can loop until the file is free and all is good; with the Discoverer batch files, the log file gets locked at the beginning of the run and doesn't let anything else write to it until the run is done. Some of these Discoverer reports may take 30 minutes or more, messing up all my runs.
Here's an example of my bat file:
@echo off
echo my process %date% %time% >>c:\test.log
c:\orant\DISCVR4\DIS4USR.EXE /connect MyUserID/MyPassword#myserver /open "c:\DiscoReport.DIS" /export xls "c:\MyFile.xls" /batch 1>>c:\test.log 2>&1
I have a master bat file that calls several of those individual process bat files so that they run one at a time, and that works fine. But when a run takes longer than estimated, the next run fails, because they all start by writing to the log and running Discoverer, and the log file stays locked until the very end. Is there something I can do to open and close the log only at the moment the results are added?
I've looked for answers, and I believe something might be done with TEE, or by redirecting the results somewhere else and then piping them into the log, but I don't really know how to do it. I've looked and tried for weeks and can't get anything working. I'm sure those who know can do this with a single line. Please help.
Essentially, NO - if you want the log file to contain all events in time-order.
You could have the discoverer processes create their individual logfiles and then
type discoverer.log.file >>logfile
del discoverer.log.file
which would group all of the discoverer process output together in the logfile.
Otherwise, you'd have to put up with more than one log.
I severely doubt TEE could do it, as TEE would then itself need to hold the log open, so you're back where you started - but I'll emphasise I haven't tried it.
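Concretely, each Discoverer wrapper .bat could combine its own temp log with the type/del step above, roughly like this (a sketch; the temp path and the %RANDOM% naming are placeholders):

@echo off
rem Sketch: write to a private temp log, then append it to the shared log in one go.
rem Assumes c:\temp exists; %RANDOM% just keeps the temp names from colliding.
set TMPLOG=c:\temp\disco_%RANDOM%_%RANDOM%.log
echo my process %date% %time% > %TMPLOG%
c:\orant\DISCVR4\DIS4USR.EXE /connect MyUserID/MyPassword#myserver /open "c:\DiscoReport.DIS" /export xls "c:\MyFile.xls" /batch >> %TMPLOG% 2>&1
type %TMPLOG% >> c:\test.log
del %TMPLOG%

The shared c:\test.log is then only held for the instant the type command runs, rather than for the whole Discoverer session, at the cost of the entries being grouped per run rather than strictly time-ordered.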
