Rolling catalina.YYYY-MM-DD.log file - tomcat6

For a specific reason I need to retain both my app-specific log and catalina.log. I have configured log4j to use RollingFileAppender for my app-specific logs and it is working fine. Is there any way to use a similar logging mechanism for the catalina logs as well?
Can I do this by somehow tweaking the logging.properties file under conf?

You can use logrotate if you are running Ubuntu.
Create this file:
/etc/logrotate.d/tomcat
Copy the following contents into the above file:
/var/log/tomcat/catalina.out {
copytruncate
daily
rotate 7
compress
missingok
size 5M
}
Make sure that the path /var/log/tomcat/catalina.out above is adjusted to point to your Tomcat's catalina.out.
copytruncate - copies catalina.out and then truncates it in place, so Tomcat can keep writing to the same open file
daily - rotates catalina.out daily
rotate - keeps at most 7 rotated log files
compress - compresses the rotated files
missingok - does not report an error if the log file is missing
size - rotates if the size of catalina.out is bigger than 5M
That's it.
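To check the configuration before relying on it, you can run logrotate by hand (assuming a standard logrotate installation):
sudo logrotate -d /etc/logrotate.d/tomcat
does a dry run and prints what would be rotated, and
sudo logrotate -f /etc/logrotate.d/tomcat
forces a rotation immediately.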

Related

AWS Sagemaker failure after successful training "ClientError: Artifact upload failed:Insufficient disk space"

I'm training a network using a custom Docker image. The first training run with 50,000 steps went fine, but when I tried to increase it to 80,000 I got the error "ClientError: Artifact upload failed:Insufficient disk space". I only increased the number of steps, which is weird to me. There are no errors in the CloudWatch log; my last entry is:
Successfully generated graphs: ['pipeline.config', 'tflite_graph.pb',
'frozen_inference_graph.pb', 'tflite_graph.pbtxt',
'tflite_quant_graph.tflite', 'saved_model', 'hyperparameters.json',
'label_map.pbtxt', 'model.ckpt.data-00000-of-00001',
'model.ckpt.meta', 'model.ckpt.index', 'checkpoint']
Which basically means that those files have been created, because it is a simple:
graph_files = os.listdir(model_path + '/graph')
Which disk space is it talking about? Also, looking at the training job, the disk utilization chart shows the rising curve peaking at around 80%...
I expect that after the successful creation of the aforementioned files, everything is uploaded to my S3 bucket, where no disk space issues are present. Why do 50,000 steps work and 80,000 not?
It is my understanding that the number of training steps doesn't influence the size of the model files.
Adding storage to the training job by setting "additional storage volume per instance (gb)" to 5 GB at creation seems to solve the problem. I still don't understand why, but the problem seems solved.
When the SageMaker training job completes, the contents of the /opt/ml/model directory in the container are uploaded to S3. If the artifact to be uploaded is too large, the error ClientError: Artifact upload failed:... is thrown.
Increasing the volume size only fixes the problem superficially; in most cases the model does not have to be that large.
The odds are that your model itself is not too large, but that you are also saving your checkpoints to /opt/ml/model (a bug in the training script).
In the end, SageMaker packs everything in that directory (the model and all checkpoints) into model.tar.gz in order to upload it to S3, runs out of volume space, and hence the error. You can confirm whether this is the reason by checking the size of the uploaded model.tar.gz file on S3.
Why do 50,000 steps work and 80,000 not?
With 80,000 steps the number of checkpoints has also increased, and the final model.tar.gz to be uploaded to S3 has become too big to fit on the current volume.
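A minimal sketch of the fix on the training-script side, assuming the standard SageMaker container layout (the directory names and the SM_MODEL_DIR environment variable are SageMaker conventions; the helper functions below are hypothetical): write periodic checkpoints outside /opt/ml/model and copy only the final exported files there.
import os
import shutil

# Assumed SageMaker conventions: everything under /opt/ml/model is packed into
# model.tar.gz and uploaded to S3 when training ends; /opt/ml/checkpoints is
# synced separately only if checkpoint_s3_uri is configured on the estimator.
MODEL_DIR = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
CHECKPOINT_DIR = "/opt/ml/checkpoints"
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

def checkpoint_path(step):
    # Save intermediate checkpoints here, not under MODEL_DIR, so they never
    # end up inside model.tar.gz.
    return os.path.join(CHECKPOINT_DIR, "model.ckpt-%d" % step)

def export_final_model(graph_dir):
    # After training, copy only the final exported graph files (the ones listed
    # in the log above) into MODEL_DIR for upload.
    for name in os.listdir(graph_dir):
        src = os.path.join(graph_dir, name)
        dst = os.path.join(MODEL_DIR, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)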

How to upload .gz files into Google Big Query?

I have a 90 GB .csv file that I want to build on my local computer and then upload into Google BigQuery for analysis. I create this file by combining thousands of smaller .csv files into 10 medium-sized files, and then combining those medium-sized files into the 90 GB file, which I then want to move to GBQ. I am struggling with this project because my computer keeps crashing from memory issues.
From this video I understood that I should first transform the medium-sized .csv files (about 9 GB each) into .gz files (about 500 MB each), then upload those .gz files into Google Cloud Storage, and then create an empty table (in Google BigQuery / Datasets) and append all of those files to the created table. The issue I am having is finding any kind of tutorial or documentation on how to do this.
I am new to the Google platform, so maybe this is a very easy job that can be done with one click somewhere, but all I was able to find was the video that I linked above. Where can I find some help, documentation, tutorials or videos on how people do this? Do I have the correct idea of the workflow? Is there a better way (like using some downloadable GUI to upload stuff)?
See the instructions here:
https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile
As Abdou mentions in a comment, you don't need to combine them ahead of time. Just gzip all of your small CSV files, upload them to a GCS bucket, and use the "bq.py load" command to create a new table. Note that you can use a wildcard syntax to avoid listing all of the individual file names to load.
The --autodetect flag may let you avoid specifying a schema manually, although it relies on sampling your input, so you may need to supply a schema yourself if detection fails in certain cases.
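The load command then looks something like this (a sketch; the bucket path, dataset and table names are placeholders, and --skip_leading_rows=1 assumes each file has a header row):
bq load --source_format=CSV --autodetect --skip_leading_rows=1 mydataset.mytable "gs://my-bucket/medium-*.csv.gz"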

Monitoring for changes in folder without continuously running

This question has been asked several times. Many programs, like Dropbox, use some form of file system API interaction to instantaneously keep track of changes that take place within a monitored folder.
As far as my understanding goes, however, this requires some daemon to be online at all times to wait for callbacks from the file system API. However, I can shut Dropbox down, update files and folders, and when I launch it again it still knows what changes I made to my folder. How is this possible? Does it exhaustively search the whole tree for updates?
The short answer is YES.
Let's use Google Drive as an example, since its local database is not encrypted, and it's easy to see what's going on.
Basically it keeps a snapshot of the Google Drive folder.
You can browse the snapshot.db (typically under %USER%\AppData\Local\Google\Drive\user_default) using DB browser for SQLite.
Looking at a sample snapshot.db from my computer, you can see that it tracks (among other things):
Last write time (looks like Unix time)
Checksum
Size, in bytes
Whenever Google Drive starts up, it queries all the files and folders under your "Google Drive" folder and compares them against the snapshot (you can see the queries using Procmon).
Note that changes can also sync down from the server.
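To illustrate the idea, here is a minimal sketch of the general snapshot-and-rescan approach (not Dropbox's or Google Drive's actual code): persist one record per file, and on the next start walk the tree and diff it against the saved snapshot.
import hashlib
import json
import os

SNAPSHOT_FILE = "snapshot.json"  # hypothetical local database

def scan(root):
    # Build {relative_path: [size, mtime, checksum]} for every file under root.
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            stat = os.stat(path)
            with open(path, "rb") as f:
                checksum = hashlib.md5(f.read()).hexdigest()  # fine for a sketch
            snapshot[rel] = [stat.st_size, stat.st_mtime, checksum]
    return snapshot

def diff(old, new):
    # Compare the previous snapshot with a fresh scan.
    added = [p for p in new if p not in old]
    removed = [p for p in old if p not in new]
    changed = [p for p in new if p in old and new[p] != old[p]]
    return added, removed, changed

if __name__ == "__main__":
    root = os.path.expanduser("~/Google Drive")  # placeholder path
    old = {}
    if os.path.exists(SNAPSHOT_FILE):
        with open(SNAPSHOT_FILE) as f:
            old = json.load(f)
    new = scan(root)
    print(diff(old, new))  # added / removed / changed since the last run
    with open(SNAPSHOT_FILE, "w") as f:
        json.dump(new, f)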
There are also NTFS Change Journals, but I don't think that Dropbox or Google Drive use them:
To avoid these disadvantages, the NTFS file system maintains an update sequence number (USN) change journal. When any change is made to a file or directory in a volume, the USN change journal for that volume is updated with a description of the change and the name of the file or directory.

Neo4j and big log files

I am trying to use Neo4j in my app, but I have a problem with big log files. Are they necessary, or is there some way to reduce their number and size?
At the moment I see files like:
nioneo_logical.log.v0
nioneo_logical.log.v1
nioneo_logical.log.v2
etc.
and they are ~26 MB each (over 50% of the neo4j folder).
These files are created whenever the logical logs are rotated.
You can configure rules for them in the server properties file.
See details here: http://docs.neo4j.org/chunked/stable/configuration-logical-logs.html
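For example, a retention rule along these lines (an illustrative snippet for conf/neo4j.properties; check the linked page for the exact property name and values supported by your version):
keep_logical_logs=100M size
or, to keep logs by age instead:
keep_logical_logs=7 days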
You can safely remove them (but only the *.v* files) if your database is shut down and in a clean state. Don't remove them while the database is running, because they could be needed for recovery after a crash.

How can I mount a partition in the middle of a full disk image in userspace?

It is perfectly possible to mount a partition from within a full disk image using mount's or losetup's "offset" parameter. However, to mount an image in userspace without requiring root permission or setting up fstab etc., fuseiso is needed (I think).
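For reference, the root-requiring approach mentioned above looks roughly like this, assuming the partition starts at sector 2048 of a 512-byte-sector image:
sudo mount -o loop,offset=$((2048*512)) disk.img /mnt/part1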
Unfortunately fuseiso doesn't seem to have the offset parameter needed to reach into the image to the partition.
I'm not sure if I'm out of luck, or if there might be some trick to get it to work by perhaps producing some sort of phantom file that effectively starts at the offset of the other file (I don't want to make a copy, otherwise dd would do the trick).
Use Case
My use case is building disk images on an autobuild server where the build job is not allowed to have root permissions, and the server should not need a custom setup for particular build jobs.
TL;DR
Can fuseiso reach into a full disk image (not just partition image)?
If not, is it possible to create a file/link that points into the middle of another file, to fake out fuseiso?
Is there another option besides fuseiso?
