How to prevent data loss on restart of an InfluxDB server?

I'm running an instance of InfluxDB on a relatively small device to collect time-series IoT sensor data. Sometimes it is necessary to reboot the device (for updates or something similar), and I noticed that after rebooting there is data loss in my InfluxDB history when I query the data. With my current config I lost around 16 hours of data due to a reboot.
How can I make sure all data is still present after a reboot of the device?
Thanks!

In InfluxDB, persistent data is stored in the /wal, /data, and /meta directories. The /wal directory contains the write-ahead log, which acts as a staging area where recently written points are held before they are compacted into TSM (Time-Structured Merge Tree) files for long-term storage in the /data directory.
On Unix systems, the /tmp directory is sometimes cleared on reboot. Since your wal-dir config setting points at /tmp/.influxdb/wal, the data still sitting in the write-ahead log may be wiped on reboot. This explains why you are losing only recent data: older data has already been flushed to the /data directory, which isn't cleared on reboot.
In short, the wal-dir config setting needs to be set to /data/.influxdb/wal or another directory that is not cleared on reboot.
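A minimal sketch of that change, assuming a typical Linux package install where the config file is /etc/influxdb/influxdb.conf and /var/lib/influxdb sits on persistent storage (substitute /data/.influxdb/wal or whatever persistent path you prefer):

# 1. Create a WAL directory on storage that survives reboots.
sudo mkdir -p /var/lib/influxdb/wal
sudo chown -R influxdb:influxdb /var/lib/influxdb/wal

# 2. In /etc/influxdb/influxdb.conf, under the [data] section, change
#      wal-dir = "/tmp/.influxdb/wal"
#    to
#      wal-dir = "/var/lib/influxdb/wal"

# 3. Restart the service so the new WAL location takes effect.
sudo systemctl restart influxdb

If the old WAL under /tmp still contains segment files when you make the switch, copy them into the new wal directory (keeping the influxdb ownership) before restarting, so that any points that haven't been compacted yet should be replayed rather than lost.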

Related

Google Colab: about lifetime of files on VM

Does anyone know the lifetime of files on the Colab virtual machine?
For example, in a Colab notebook I save data to a CSV file as:
data.to_csv('data.csv')
How long will data.csv exist?
This is the scenario:
I want to maintain and update over 3000 small datasets every day, but the interaction between Colab and Google Drive via PyDrive is pretty slow (I need to check every dataset daily). So if the lifetime of files on the virtual machine is long enough, I can update the files on the VM every day (which would be much faster) and then synchronize them to Google Drive every few days rather than daily.
VMs are discarded after a period of inactivity, so your best bet is to save files to Drive that you'd like to keep as generated.
With pydrive, this is possible, but a bit cumbersome. An easier method is to use a FUSE interface to Drive so that you can automatically sync files as they are saved normally.
For an example, see:
https://colab.research.google.com/drive/1srw_HFWQ2SMgmWIawucXfusGzrj1_U0q
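For reference, a minimal sketch of the FUSE approach as it looks in a Colab notebook today; /content/drive is the standard mount point, data is the DataFrame from the question, and the datasets folder name is made up:

# Mount Google Drive into the Colab VM via FUSE; files written under the
# mount point are synced back to Drive automatically.
from google.colab import drive
drive.mount('/content/drive')

import os
out_dir = '/content/drive/My Drive/datasets'   # example folder inside Drive
os.makedirs(out_dir, exist_ok=True)

# Write the CSV straight into Drive instead of the VM's ephemeral disk.
data.to_csv(os.path.join(out_dir, 'data.csv'))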

Monitoring for changes in folder without continuously running

This question has been asked several times. Many programs like Dropbox use some form of file system API to instantaneously keep track of changes that take place within a monitored folder.
As far as I understand, however, this requires some daemon to be online at all times, waiting for callbacks from the file system API. However, I can shut Dropbox down, update files and folders, and when I launch it again it still knows what changes I made to my folder. How is this possible? Does it exhaustively search the whole tree for updates?
Short answer is YES.
Let's use Google Drive as an example, since its local database is not encrypted, and it's easy to see what's going on.
Basically it keeps a snapshot of the Google Drive folder.
You can browse the snapshot.db (typically under %USERPROFILE%\AppData\Local\Google\Drive\user_default) using DB Browser for SQLite.
Here's a sample from my computer; you can see that it tracks (among other things):
Last write time (looks like Unix time).
Checksum.
Size, in bytes.
Whenever Google Drive starts up, it queries all the files and folders under your "Google Drive" folder (you can see this using Procmon).
Note that changes can also sync down from the server.
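To make that idea concrete, here is a minimal snapshot-and-compare sketch (GNU find on Linux; /path/to/monitored and the snapshot file names are placeholders, and on the first run you would just create the baseline):

# Take a snapshot of the monitored tree: path, mtime and size for every file.
# (A real client also stores a per-file checksum, as Google Drive does above.)
find /path/to/monitored -type f -printf '%p\t%T@\t%s\n' | sort > snapshot.new

# Anything added, removed or modified while the watcher was offline shows up
# here; comparing snapshots is essentially what the sync clients do on startup.
diff snapshot.old snapshot.new

# Keep the new snapshot for the next comparison.
mv snapshot.new snapshot.old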
There are also Change Journals, but I don't think Dropbox or Google Drive use them:
To avoid these disadvantages, the NTFS file system maintains an update sequence number (USN) change journal. When any change is made to a file or directory in a volume, the USN change journal for that volume is updated with a description of the change and the name of the file or directory.
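On an NTFS volume you can inspect the journal yourself with fsutil from an elevated command prompt (the C: drive here is just an example):

rem Show the change journal's metadata (journal ID, first/next USN) for C:.
fsutil usn queryjournal C:

rem Dump the individual change records (file name, USN, reason flags).
fsutil usn readjournal C: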

PostgreSQL: Find PostgreSQL database files on another drive and restore them

I am working on a PostgreSQL database, and we recently had a server upgrade during which we changed our drive from a 2 TB RAID hard disk to an SSD. I have now mounted the RAID drive on a partition and can access it.
Next, I would like to get the database out of the mounted drive and restore it into the currently running PostgreSQL. How can I achieve this?
root@check03:/mnt/var/lib/postgresql/9.1/main/global# ls
11672 11674 11805 11809 11811 11813_fsm 11816 11820 11822 11824_fsm 11828 11916 11920 pg_internal.init
11672_fsm 11675 11807 11809_fsm 11812 11813_vm 11818 11820_fsm 11823 11824_vm 11829 11918 pg_control pgstat.stat
11672_vm 11803 11808 11809_vm 11813 11815 11819 11820_vm 11824 11826 11914 11919 pg_filenode.map
root@check03:/mnt/var/lib/postgresql/9.1/main/global# cd ..
As you can see I am able to access the drives and the folders, but I don't know what to do next. Kindly let me know. Thanks a lot.
You need the same major version of PostgreSQL (9.1) and the same or a later minor version. Copy main/ and everything below it to the new location. Copy the configuration of the old instance and adapt the paths to the new location (main/ is the "data directory", also sometimes called PGDATA). Start the new instance and look carefully at the logs. You should probably rebuild any indexes.
Also read about the file layout in the fine documentation.
EDIT: If you have any chance to run the old instance, read about backup and restore; that is a much safer way to transfer data.
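For completeness, the backup-and-restore route looks roughly like this, assuming you can still bring the old 9.1 cluster up somewhere (the ports below are the defaults and just examples):

# On a machine where the old 9.1 cluster is running: dump all databases
# plus roles into one SQL file.
pg_dumpall -p 5432 > old_cluster.sql

# On the new server, with the new (empty) cluster running: replay the dump.
psql -p 5432 -f old_cluster.sql postgres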
the Postgres binaries must be the same version
make sure that postgres is not running
copy using cp -rfp, tar | tar, or cpio, or whatever you like. Make sure you preserve the file owners and modes (the top-level directory must be 0700, owned by postgres)
make sure that the postgres startup script (in /etc/init.d/postxxx) refers to the new directory; sometimes there is an environment variable $PGDATA containing the name of the postgres data directory; you may also need to make changes to new_directory/postgresql.conf (pg_log et al.)
for safety, rename the old data directory
restart Postgres
try to connect to it; check the logs.
Extra:
Seasoned Unix administrators (like the BOFH ;-)) might want to juggle mount points and/or symlinks instead of copying. Be my guest. YMMV.
Seasoned DBAs might want to create a tablespace, point it at the new location and (selectively) move databases, schemas or tables to the new location.
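Putting the steps above together, a minimal sketch for a Debian-style layout, assuming the old cluster is mounted under /mnt/var/lib/postgresql/9.1/main (as in the question) and the new cluster lives at /var/lib/postgresql/9.1/main; adjust paths, service name and log file name to your system:

# Stop the running cluster first.
sudo service postgresql stop

# Keep the freshly initialised data directory out of the way, for safety.
sudo mv /var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main.fresh

# Copy the old data directory, preserving owners, modes and timestamps.
sudo cp -a /mnt/var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main
sudo chown -R postgres:postgres /var/lib/postgresql/9.1/main
sudo chmod 700 /var/lib/postgresql/9.1/main

# Start again and watch the logs.
sudo service postgresql start
sudo tail -n 50 /var/log/postgresql/postgresql-9.1-main.log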

Re-process all nagios data with pnp4nagios

I have moved to a new Nagios installation (new server). I have transferred the data from the original server (/var/log/nagios2/archives) to my new server (/var/log/nagios3/archives); I think they have compatible formats. Now I would like to regenerate the pnp4nagios graphs from this historical data.
This is the command I have to process data on the fly:
/usr/lib/pnp4nagios/libexec/process_perfdata.pl --bulk=/var/lib/pnp4nagios/perfdata/host-perfdata
But this is just processing new data in /var/lib/pnp4nagios/perfdata/host-perfdata.
I have several questions:
Where does pnp4nagios store the processed data (graphs)?
How can I force pnp4nagios to regenerate all graphs?
pnp4nagios calls process_perfdata.pl, which itself invokes rrdtool and stores the graph data in round-robin databases (RRDs). Within your pnp4nagios configuration you should find the on-disk path for those, letting you back up or move that data.
I'm just guessing that you already have pnp4nagios 0.6 (if not, look for the 0.4 docs):
http://docs.pnp4nagios.org/pnp-0.6/webfe_cfg (look for rrdbase)
But keep in mind: you cannot move RRDs between different architectures (e.g. old server i386, new server amd64); just copying the files with rsync/scp does not work at all. Look here for details: http://oss.oetiker.ch/rrdtool/doc/rrddump.en.html
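If your old and new servers do differ in architecture, the portable route is to dump each RRD to XML on the old box and restore it on the new one. A minimal sketch, assuming the default perfdata layout of one directory per host (the perfdata path matches the question, everything else is an example):

# On the old server: dump every RRD into its architecture-independent XML form.
cd /var/lib/pnp4nagios/perfdata
for rrd in */*.rrd; do
    rrdtool dump "$rrd" > "${rrd%.rrd}.dump.xml"
done

# Copy the tree to the new server, leaving the old-architecture .rrd files
# behind (pnp4nagios' own per-service .xml metadata files come along).
rsync -a --exclude='*.rrd' /var/lib/pnp4nagios/perfdata/ newserver:/var/lib/pnp4nagios/perfdata/

# On the new server: rebuild binary RRDs from the dumps, then fix ownership
# to whatever user pnp4nagios/npcd runs as on your system.
cd /var/lib/pnp4nagios/perfdata
for xml in */*.dump.xml; do
    rrdtool restore "$xml" "${xml%.dump.xml}.rrd"
done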
Further, you should consider moving from synchronous mode (which is what you have) to "bulk mode with npcd and npcdmod", which adds asynchronous spooling and keeps the core from blocking when perfdata processing hangs (and lowers latency).
If you have further questions on pnp4nagios itself, you might also like to post them to monitoring-portal.org, where the devs read too.
The nagios.log (and the log archives) do not contain the perfdata information. What you need to do is move the .rrd and .xml files from the perfdata directory to the new server.
Nagios doesn't log or retain the perfdata; at runtime, if perfdata processing is enabled, it processes the data and then washes its hands of it.

Mule ESB: How can I detect partial files to avoid transferring partly-uploaded files?

I have a folder on my Mule ESB server which several sources can point to (one SCP, one SFTP, along with others). Whenever I detect a file, I want to move it to another directory. The problem is that I'm moving partially-uploaded files, which causes me to lose data.
I've tried using the File connector's fileAge attribute, but it doesn't seem to work reliably. I'm trying to keep latency down as much as possible, since most files will be under 10 KB, but some will be as large as 100 MB.
Is there any way to know whether a file is partial or complete? I know WinSCP uses the .filepart extension, but that's just one application, and even then the extension can be changed (or removed entirely) in WinSCP's preferences.
I solved my own problem.
With a bit more testing, it turns out the problem is at the OS level. On Red Hat, the "last modified" timestamp is only updated roughly every 1000 ms, which is too much latency for the 4 KB files. On Windows it's even worse: the "last modified" time is only set when the transfer starts and then reset when it completes. The MuleSoft team is technically correct to derive fileAge from this timestamp, but the OS isn't updating it often enough for that to work. They should use file size instead. I'll be submitting a work-around ticket.
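Until something like that lands, one workaround is to gate the pickup directory yourself and only hand a file to Mule once its size has stopped changing between two polls. A minimal shell sketch; the directory names and the poll interval are placeholders and need tuning against your transfer speeds:

#!/bin/sh
# Move files into the Mule pickup directory only after their size has been
# stable across two checks, i.e. the upload has (very likely) finished.
UPLOAD_DIR=/data/incoming        # where the SCP/SFTP clients write
PICKUP_DIR=/data/mule-inbox      # where the Mule file endpoint polls
INTERVAL=2                       # seconds between the two size checks

for f in "$UPLOAD_DIR"/*; do
    [ -f "$f" ] || continue
    size1=$(stat -c %s "$f")
    sleep "$INTERVAL"
    size2=$(stat -c %s "$f")
    # Size unchanged across the interval: assume the transfer is complete.
    if [ "$size1" = "$size2" ]; then
        mv "$f" "$PICKUP_DIR"/
    fi
done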
