Bro: Disable ALL log generation when extracting files

I created a Bro script with the objective of extracting all files, for all possible protocols, from a pcap file. But I don't want Bro to write any logs. Bro creates a log file for each protocol, for example 'http.log', 'smtp.log', etc.; even a 'weird.log' is generated. My pcap files are large (20 GB), so each log file contains over 30 MB of data, and generating these logs reduces the performance of the file extraction.
I can disable 'conn.log' with the line Log::disable_stream(Conn::LOG), but what about the logging for all the other protocols?
This is my script:
@load base/files/extract
event bro_init()
    {
    Log::disable_stream(Conn::LOG);
    }
event file_sniff(f: fa_file, meta: fa_metadata)
    {
    local ext = "";
    # Use the MIME subtype (e.g. "pdf" from "application/pdf") as the file extension.
    if ( meta?$mime_type )
        ext = split_string(meta$mime_type, /\//)[1];
    local fname = fmt("%s-%s.%s", f$source, f$id, ext);
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
    }

You can use the none writer like this:
bro -r packets.pcap Log::default_writer=Log::WRITER_NONE
I'm not totally convinced that writing these logs is harming your performance in any real way though. Typically, writing the files to disk is what causes the biggest overhead.

Here's a way to turn off whatever logging's been turned on (prior to bro_init), without having to know which stream IDs are relevant:
event bro_init()
    {
    # We don't want any output other than from this script.
    for ( id in Log::active_streams )
        Log::disable_stream(id);
    }
This construct makes me twitch a little because it modifies a table while iterating over it, but it seems to work, and I can't find any way to peek at a single key in a table without iterating over it. I suppose one could write:
event bro_init()
    {
    while ( |Log::active_streams| > 0 )
        {
        for ( id in Log::active_streams )
            {
            Log::disable_stream(id);
            break;
            }
        }
    }
but that's hideous and I'm not going to use it unless I discover that I have to.

I achieved this with this line of code in main.bro:
Log::remove_filter(Conn::LOG, "default");


Spring Batch FlatFileItemWriter does not write data to a file

I am new to Spring Batch. I am trying to use FlatFileItemWriter to write data into a file. The problem is that the application creates the file at the given path, but does not write the actual content into it.
Here is the relevant code:
// dataFileList (a List<String>) contains the data that I want to write to the file
FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource("C:\\Desktop\\test"));
writer.open(new ExecutionContext());
writer.setLineAggregator(new PassThroughLineAggregator<>());
writer.setAppendAllowed(true);
writer.write(dataFileList);
writer.close();
This only creates the file in the right place; the contents are not written into it.
Am I missing something? Any help is highly appreciated.
Thanks!
This is not the proper way to use a Spring Batch writer to write data. You need to declare the writer as a bean first:
1. Define a Job bean.
2. Define a Step bean.
3. Use your writer bean in the Step.
Have a look at the following examples:
https://github.com/pkainulainen/spring-batch-examples/blob/master/spring-boot/src/main/java/net/petrikainulainen/springbatch/csv/in/CsvFileToDatabaseJobConfig.java
https://spring.io/guides/gs/batch-processing/
You probably need to force a sync to disk. From the FlatFileItemWriter docs (https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/FlatFileItemWriter.html):
setForceSync
public void setForceSync(boolean forceSync)
Flag to indicate that changes should be force-synced to disk on flush. Defaults to false, which means that even with a local disk changes could be lost if the OS crashes in between a write and a cache flush. Setting to true may result in slower performance for usage patterns involving many frequent writes.
Parameters:
forceSync - the flag value to set
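To illustrate what this flag asks for, here is a minimal plain-Java sketch (not Spring Batch code) that writes lines through a java.nio FileChannel and, when requested, calls FileChannel.force(false) to push the OS-cached content to the storage device. The class and method names are hypothetical, and the assumption is that forceSync triggers a comparable force call when the writer flushes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Plain-Java illustration (hypothetical names) of "force-sync on flush".
class ForceSyncSketch {
    // Appends each item as one line; if forceSync is true, asks the OS to write its
    // cached data for this file to the storage device before returning.
    static void writeLines(Path target, List<String> items, boolean forceSync) {
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (String item : items) {
                ByteBuffer bytes = ByteBuffer.wrap(
                        (item + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
                while (bytes.hasRemaining()) {
                    channel.write(bytes);
                }
            }
            if (forceSync) {
                channel.force(false); // false: sync file content; metadata updates not required
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back, e.g. to check that the lines actually arrived.
    static List<String> readLines(Path source) {
        try {
            return Files.readAllLines(source, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keep in mind that force-syncing mainly protects against data loss if the OS crashes; content that was written and closed normally is already visible to other readers without it, so if the file is still empty after close(), the cause may lie elsewhere.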

How do I append to a file in an Azure storage file share?

I want to write entries to a log file stored in Azure file storage. I currently have this:
var log = "My log entry";
var client = _storageAccount.CreateCloudFileClient();
var share = client.GetShareReference(Config.LogShare);
share.CreateIfNotExists();
var root = share.GetRootDirectoryReference();
var logfile = root.GetFileReference("log.txt");
if (!logfile.Exists()) logfile.Create(0);
// What goes here to append to the file...?
I can see plenty of examples of how to do this with Blobs, or how to upload an entire file, but how do I just append to an existing file?
I have tried this:
var buffer = Encoding.GetEncoding("UTF-8").GetBytes(log.ToCharArray());
using (var fileStream = logfile.OpenWrite(0)) {
    fileStream.Write(buffer, (int)logfile.Properties.Length, buffer.Length);
}
But then I get this error:
The remote server returned an error: (416) The range specified is invalid for the current size of the resource.
I managed to work this out myself. You just need to increase the size of the file by the number of new bytes you want to write to it, and then write the new data to that new empty space at the end of the file, like this:
var client = _storageAccount.CreateCloudFileClient();
var share = client.GetShareReference(Config.LogShare);
share.CreateIfNotExists();
var root = share.GetRootDirectoryReference();
var logfile = root.GetFileReference("log.txt");
if (!logfile.Exists()) logfile.Create(0);
var buffer = Encoding.UTF8.GetBytes($"{log}\r\n");
logfile.Resize(logfile.Properties.Length + buffer.Length);
using (var fileStream = logfile.OpenWrite(null)) {
    fileStream.Seek(buffer.Length * -1, SeekOrigin.End);
    fileStream.Write(buffer, 0, buffer.Length);
}
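The resize-then-write idea can be tried out locally. This Java sketch uses java.io.RandomAccessFile on a local file as a stand-in for the Azure CloudFile (it does not call the Azure SDK): it grows the file by the size of the new entry, then seeks to the start of the freshly added tail and writes the bytes there. The name appendByResizing is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Local-file analog (hypothetical names) of "resize, then write into the new space".
class ResizeThenWriteSketch {
    // Appends one CRLF-terminated entry and returns the new file length.
    static long appendByResizing(String path, String entry) {
        byte[] buffer = (entry + "\r\n").getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            long newLength = file.length() + buffer.length;
            file.setLength(newLength);             // step 1: like logfile.Resize(...)
            file.seek(newLength - buffer.length);  // step 2: like Seek(-buffer.Length, SeekOrigin.End)
            file.write(buffer);                    // step 3: fill the new empty space
            return newLength;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Computing the write offset as newLength - buffer.length mirrors seeking backwards from the end, so the existing content is never overwritten.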
You can do this with blobs https://blogs.msdn.microsoft.com/windowsazurestorage/2015/04/13/introducing-azure-storage-append-blob/
Shame it doesn't work with files too
Azure file storage REST API doesn't support appending to an existing file. To achieve this, please mount the file share to your machine as a drive, and append to the file just like simple local files.
Actually, I don't think you really need appending functionality per your code above. You can specify the file size in CloudFile.OpenWrite() / CloudFile.Create(), or try CloudFile.UploadFromStream() instead of CloudFile.OpenWrite().
This error could also be caused by multi-threaded access.
I bet that if you lock the file before accessing it, you will not face this problem.
There are many ways to update the file.
Since you have already managed to get the share, the root, the folder and the file, here is a portion of my code that worked for me:
if (!fileLock.IsWriteLockHeld) fileLock.EnterWriteLock();
try
{
    using (var stream = new MemoryStream(content, false))
    {
        file.UploadFromStream(stream, null, options);
    }
}
catch (Exception ex)
{
    File.AppendAllText(FileName, ex.ToString());
}
finally
{
    if (fileLock.IsWriteLockHeld)
        fileLock.ExitWriteLock();
}
Where fileLock is declared as:
protected ReaderWriterLockSlim fileLock = new ReaderWriterLockSlim();
Having said that, I am not claiming this is the best possible way to do it.
Two things I would like you to keep in mind:
1. Lock any resource that is likely to be accessed by more than one thread (this is very common in Azure).
2. Get familiar with the asynchronous methods that Azure provides, and use them where they fit.
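As a rough Java analog of the locking advice in point 1 (the snippet above uses .NET's ReaderWriterLockSlim), java.util.concurrent's ReentrantReadWriteLock provides the same write-lock/read-lock split. In this sketch the StringBuilder is a hypothetical stand-in for the remote file; no Azure API is called.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Java analog of the ReaderWriterLockSlim pattern; remoteContent is a hypothetical
// stand-in for the cloud file, not an Azure API.
class GuardedFile {
    private final ReentrantReadWriteLock fileLock = new ReentrantReadWriteLock();
    private final StringBuilder remoteContent = new StringBuilder();

    // Replaces the whole content (as an upload does) while holding the write lock.
    void upload(String content) {
        fileLock.writeLock().lock();
        try {
            remoteContent.setLength(0);
            remoteContent.append(content);
        } finally {
            fileLock.writeLock().unlock();
        }
    }

    // Readers may run concurrently with each other, but never during an upload.
    String read() {
        fileLock.readLock().lock();
        try {
            return remoteContent.toString();
        } finally {
            fileLock.readLock().unlock();
        }
    }
}
```

Acquiring the lock immediately before try and releasing it in finally makes IsWriteLockHeld-style checks unnecessary. Note that such a lock only coordinates threads inside one process; it cannot stop another machine from writing to the same share.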
Coming back to your original problem of appending to an existing file:
All the methods of CloudFile overwrite the existing file. Cloud files are not meant for frequent writes: writing to them frequently does hurt performance, and once you add the cost of locking, performance will be horrible.
Cloud files are meant to store a large bulk of data once and for all; if you want to add another bulk, you can create another file.
Keep all your data on the client until it reaches a certain size, then use a naming scheme to choose the file name and upload it all at once.
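A minimal Java sketch of that batching advice: entries accumulate on the client, and each full batch is handed off as a brand-new, sequentially named file instead of being appended to one remote file. BatchingLogWriter, the log-00001.txt naming scheme and the uploader callback are all hypothetical; in a real app the callback would create and upload one cloud file per batch.

```java
import java.util.function.BiConsumer;

// Hypothetical helper: buffers log entries and uploads each full batch as a new file.
class BatchingLogWriter {
    private final int maxChars;                         // flush threshold (characters, not bytes)
    private final BiConsumer<String, String> uploader;  // (fileName, content) -> create one new file
    private final StringBuilder pending = new StringBuilder();
    private int batchNumber = 0;

    BatchingLogWriter(int maxChars, BiConsumer<String, String> uploader) {
        this.maxChars = maxChars;
        this.uploader = uploader;
    }

    synchronized void log(String entry) {
        pending.append(entry).append("\r\n");
        if (pending.length() >= maxChars) {
            flush();
        }
    }

    // Uploads whatever is buffered as the next numbered file, e.g. log-00001.txt.
    synchronized void flush() {
        if (pending.length() == 0) {
            return;
        }
        batchNumber++;
        String fileName = String.format("log-%05d.txt", batchNumber);
        uploader.accept(fileName, pending.toString());
        pending.setLength(0);
    }
}
```

Calling flush() when the application shuts down ensures the last, partially filled batch is not lost.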

Hadoop Map Whole File in Java

I am trying to use Hadoop in Java with multiple input files. At the moment I have two files: a big one to process and a smaller one that serves as a sort of index.
My problem is that I need to keep the whole index file unsplit while the big file is distributed across the mappers. Does the Hadoop API provide any way to do this?
In case I have not expressed myself clearly, here is a link to a picture that represents what I am trying to achieve: picture
Update:
Following the instructions provided by Santiago, I am now able to insert a file (or the URI, at least) from Amazon's S3 into the distributed cache like this:
job.addCacheFile(new Path("s3://myBucket/input/index.txt").toUri());
However, when the mapper tries to read it a 'file not found' exception occurs, which seems odd to me. I have checked the S3 location and everything seems to be fine. I have used other S3 locations to introduce the input and output file.
Error (note the single slash after the s3:)
FileNotFoundException: s3:/myBucket/input/index.txt (No such file or directory)
The following is the code I use to read the file from the distributed cache:
URI[] cacheFile = output.getCacheFiles();
BufferedReader br = new BufferedReader(new FileReader(cacheFile[0].toString()));
String line;
while ((line = br.readLine()) != null) {
    //Do stuff
}
I am using Amazon's EMR, S3 and the version 2.4.0 of Hadoop.
As mentioned above, add your index file to the Distributed Cache and then access it in your mapper. Behind the scenes, the Hadoop framework ensures that the index file is sent to all the task trackers before any task is executed, so it is available for your processing. This way, the data is transferred only once and is available to all the tasks of your job.
However, instead of adding the index file to the Distributed Cache in your code, make your driver class implement the Tool interface (so it can be launched through ToolRunner) and override its run method. This gives you the flexibility of passing the index file to the Distributed Cache from the command line when you submit the job.
If you are using ToolRunner, you can add files to the Distributed Cache directly from the command line when you run the job. There is no need to copy the files to HDFS first. Use the -files option, with a comma-separated list and no spaces:
hadoop jar yourjarname.jar YourDriverClassName -files cachefile1,cachefile2,cachefile3
You can access the files in your Mapper or Reducer code as below:
File f1 = new File("cachefile1");
File f2 = new File("cachefile2");
File f3 = new File("cachefile3");
You could push the index file to the distributed cache, and it will be copied to the nodes before the mapper is executed.
See this SO thread.
Here's what helped me to solve the problem.
Since I am using Amazon's EMR with S3, I needed to change the syntax a bit, as described on the following site.
It was necessary to add the name the system should use to read the file from the cache, as follows:
job.addCacheFile(new URI("s3://myBucket/input/index.txt" + "#index.txt"));
This way, the program understands that the file placed in the cache is named just index.txt. I also needed to change how the file is read from the cache: instead of using the entire path stored in the distributed cache, only the file name has to be used, as follows:
URI[] cacheFile = output.getCacheFiles();
// Open the local link named by the "#index.txt" fragment, not the s3:// URI itself
BufferedReader br = new BufferedReader(new FileReader("index.txt"));
String line;
while ((line = br.readLine()) != null) {
    //Do stuff
}
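The single slash in the original error comes from passing the URI string to FileReader: java.io.File treats it as a local path and, on Unix-like systems, collapses the double slash, so s3://myBucket/input/index.txt becomes s3:/myBucket/input/index.txt. Below is a small, Hadoop-free sketch with hypothetical names that builds the cache URI with a #link-name fragment and recovers the local file name a mapper should open. Falling back to the last path segment when no fragment is present is an assumption about how the framework names the link.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helpers for naming distributed-cache files; no Hadoop classes are used.
class CacheFileNames {
    // Builds "location#linkName"; the fragment is the name the cached copy should get
    // in the task's working directory.
    static URI withLinkName(String location, String linkName) {
        try {
            return new URI(location + "#" + linkName);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad cache file location: " + location, e);
        }
    }

    // Local name to open in the mapper: the fragment if there is one, otherwise
    // (assumption) the last segment of the URI path.
    static String localName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

In the mapper this could be used as new FileReader(CacheFileNames.localName(cacheFile[0])), assuming the framework has created the link in the task's working directory.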

Neo4j store is not cleanly shut down; Recovering from inconsistent db state from interrupted batch insertion

I was importing TTL ontologies to DBpedia following the blog post http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html. The post uses BatchInserters to speed up the task. It mentions:
Batch insertion is not transactional. If something goes wrong and you don't shutDown() your database properly, the database becomes inconsistent.
I had to interrupt one of the batch insertion tasks because it was taking much longer than expected, which left my database in an inconsistent state. I get the following message:
db_name store is not cleanly shut down
How can I recover my database from this state? Also, for the future, is there a way to commit after importing each file, so that reverting to the last good state would be trivial? I thought of git, but I am not sure it would help with a binary file like index.db.
There are some cases where you cannot recover from an unclean shutdown when using the batch inserter API. Note that its package name, org.neo4j.unsafe.batchinsert, contains the word unsafe for a reason: the batch inserter is intended to operate as fast as possible.
If you want to guarantee a clean shutdown, use a try/finally block:
BatchInserter batch = BatchInserters.inserter(<dir>);
try {
    // do the insert operations here
} finally {
    batch.shutdown();
}
Another alternative for special cases is registering a JVM shutdown hook. See the following snippet as an example:
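final BatchInserter batch = BatchInserters.inserter(<dir>);
// Register the hook before doing any work, so it is already in place if the import is interrupted
Runtime.getRuntime().addShutdownHook(new Thread() {
    @Override
    public void run() {
        batch.shutdown();
    }
});
// do some operations potentially throwing exceptions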
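The two approaches can also be combined so that shutdown() runs exactly once: on the normal path when the work finishes, or from a JVM shutdown hook if the process is stopped first (for example with Ctrl-C). This is a sketch under assumptions: Inserter is a hypothetical one-method interface standing in for BatchInserter, so a real inserter would be passed as batch::shutdown, and SafeShutdown is an invented name.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for BatchInserter: only shutdown() matters here.
interface Inserter {
    void shutdown();
}

// Calls shutdown() exactly once, from close() (e.g. try-with-resources) or from a
// JVM shutdown hook if the process is stopped before close() runs.
class SafeShutdown implements AutoCloseable {
    private final Inserter inserter;
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final Thread hook;

    SafeShutdown(Inserter inserter) {
        this.inserter = inserter;
        this.hook = new Thread(this::close);
        Runtime.getRuntime().addShutdownHook(hook);
    }

    @Override
    public void close() {
        if (!done.compareAndSet(false, true)) {
            return; // already shut down
        }
        try {
            // Normal path: the hook is no longer needed.
            Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down (we are running inside the hook).
        }
        inserter.shutdown();
    }
}
```

Usage would look like try (SafeShutdown guard = new SafeShutdown(batch::shutdown)) { /* inserts */ }. A shutdown hook does not run if the process is killed forcibly (e.g. kill -9), so the unsafe caveat above still applies.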

Provide a database packaged with the .APK file or host it separately on a website?

Here is some background about my app:
I am developing an Android app that will display a random quote or verse to the user. For this I am using an SQLite database. The DB would contain approximately 5K to 10K records, possibly growing to 1M in later versions as new quotes and verses are added. Thus users would need to update the DB whenever newer versions of the app or the DB are released.
After reading through some forums online, there seem to be two feasible ways I could provide the DB:
1. Bundle it along with the .APK file of the app, or
2. Upload it to my app's website from where users will have to download it
I want to know which method would be better (if there is yet another approach other than these, please do let me know).
After pondering this problem for some time, I have these thoughts regarding the above approaches:
Approach 1:
Users will obtain the DB along with the app, and won't have to download it separately. Installation would thereby be easier. But, users will have to reinstall the app every time there is a new version of the DB. Also, if the DB is large, it will make the installable too cumbersome.
Approach 2:
Users will have to download the full DB from the website (although I can provide a small, sample version of the DB via Approach 1). But, the installer will be simpler and smaller in size. Also, I would be able to provide future versions of the DB easily for those who might not want newer versions of the app.
Could you please tell me from a technical and an administrative standpoint which approach would be the better one and why?
If there is a third or fourth approach better than either of these, please let me know.
Thank you!
Andruid
I built a similar app for Android which gets periodic updates with data from a government agency. It's fairly easy to build an Android-compatible DB off the device using Perl or similar and download it to the phone from a website. This works rather well, and the user gets current data whenever they download the app. It's also supposed to be possible to put the data on the SD card if you want to avoid using primary storage space, which is a bigger concern for my app, since it has a ~6 MB database.
In order to make Android happy with the DB, I believe you have to do the following (I build my DB using Perl):
$st = $db->prepare( "CREATE TABLE \"android_metadata\" (\"locale\" TEXT DEFAULT 'en_US')");
$st->execute();
$st = $db->prepare( "INSERT INTO \"android_metadata\" VALUES ('en_US')");
$st->execute();
I have an update activity which checks whether updates are available and, if so, presents an "update now" screen. The download process looks like this and lives in a DatabaseHelperClass.
public void downloadUpdate(final Handler handler, final UpdateActivity updateActivity) {
    try {
        // Replace the existing database file with a fresh, empty one
        close();
        File f = new File(getDatabasePath());
        if (f.exists()) {
            f.delete();
        }
        getReadableDatabase();
        close();
        URL url = new URL("http://yourserver.com/" + currentDbVersion + ".sqlite");
        URLConnection urlconn = url.openConnection();
        // -1 if the server does not report a length
        final int contentLength = urlconn.getContentLength();
        Log.i(TAG, String.format("Download size %d", contentLength));
        handler.post(new Runnable() {
            public void run() {
                updateActivity.setProgressMax(contentLength);
            }
        });
        InputStream is = urlconn.getInputStream();
        // Open the empty db as the output stream
        OutputStream os = new FileOutputStream(f);
        try {
            // Transfer bytes until the end of the stream (read() returns -1),
            // instead of trusting contentLength, which may be missing or wrong
            byte[] buffer = new byte[1024 * 1000];
            int written = 0;
            int length;
            while ((length = is.read(buffer)) != -1) {
                os.write(buffer, 0, length);
                written += length;
                final int currentprogress = written;
                handler.post(new Runnable() {
                    public void run() {
                        Log.i(TAG, String.format("progress %d", currentprogress));
                        updateActivity.setCurrentProgress(currentprogress);
                    }
                });
            }
            os.flush();
        } finally {
            // Close the streams even if the transfer fails
            os.close();
            is.close();
        }
        Log.i(TAG, "Download complete");
        openDatabase();
    } catch (Exception e) {
        Log.e(TAG, "bad things", e);
    }
    handler.post(new Runnable() {
        public void run() {
            updateActivity.refreshState(true);
        }
    });
}
Also note that I keep a version number in the filename of the db files, and a pointer to the current one in a text file on the server.
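The version check behind that scheme can be sketched in a few lines of plain Java. This assumes the pointer text file on the server contains nothing but the current version number and that database files are named <version>.sqlite, matching the URL built in downloadUpdate(); DbUpdateCheck and its methods are hypothetical names, and fetching the pointer file over HTTP is left out.

```java
// Hypothetical helpers for the "version in the file name + pointer file" scheme.
class DbUpdateCheck {
    // The pointer file is assumed to contain only the current version number, e.g. "42\n".
    static int parseCurrentVersion(String pointerFileContent) {
        return Integer.parseInt(pointerFileContent.trim());
    }

    // Versioned database file name, e.g. http://yourserver.com/42.sqlite
    static String dbUrl(String baseUrl, int version) {
        return baseUrl + version + ".sqlite";
    }

    // Offer the "update now" screen only when the server has a newer version.
    static boolean updateAvailable(int installedVersion, int serverVersion) {
        return serverVersion > installedVersion;
    }
}
```

Because each version gets its own file name, a client never downloads a half-replaced file while the server is publishing a new one; the pointer file is only switched once the new file is in place.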
It sounds like your app and your DB are tightly bound (that is, the DB is useless without the app and the app is useless without the DB), so I'd say go ahead and put them both in the same .apk.
That being said, if you expect the db to change very slowly over time, but the app to change quicker, and you don't want your users to have to download the db with each new app revision, then you might want to unbundle them. To make this work, you can do one of two things:
Install them as separate applications, but make sure they share the same user ID using the android:sharedUserId attribute in the AndroidManifest.xml file.
Install them as separate applications, and create a ContentProvider for the database. This way other apps could make use of your database as well (if that is useful).
If you are going to store the DB on your website, then I would recommend that you just make RPC calls to your web server and get the data that way, so the device never has to deal with a local database. Using a cache manager to avoid repeated lookups will help as well, so pages do not have to look up data each time they reload. Also, if you need to update the data, you do not have to ship a new app every time. Using HttpClient is pretty straightforward; if you need any examples, please let me know.
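A minimal sketch of the cache-manager idea: a small least-recently-used cache in front of the remote call, so repeated lookups of the same quote do not hit the server again. CachedLookup is a hypothetical name, and the remoteCall function stands in for whatever HTTP/RPC request the app actually makes; the eviction policy uses LinkedHashMap's access order and removeEldestEntry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache in front of a remote lookup (e.g. an RPC to the web server).
class CachedLookup<K, V> {
    private final Function<K, V> remoteCall;
    private final Map<K, V> cache;

    CachedLookup(int maxEntries, Function<K, V> remoteCall) {
        this.remoteCall = remoteCall;
        // accessOrder = true: iteration order runs from least to most recently used
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    // Returns the cached value, or performs the remote call once and caches the result.
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = remoteCall.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

Bounding the cache keeps memory use predictable on the device while still avoiding a network round trip each time a page reloads.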
