How to migrate a DolphinDB cluster onto another machine

Our company uses DolphinDB on a daily basis. Over time, our old server accumulated too much data, so we decided to migrate our cluster onto another machine, a Dell PowerEdge server with 128 cores. We did this by copying all the data onto the new machine, but when we started the new cluster and tried to open a DFS database with the command:
db = database("dfs://rangeDB");
it reported an error message:
The chunk meta returned from name node didn't contain any site.
How can we solve this problem?

DolphinDB is a distributed database system. A database consists of metadata, stored on the controller, and chunk data, stored on multiple data nodes. Simply copying all the directories from one machine to another doesn't work.
The best practice for migrating DolphinDB is to use the built-in backup and restore functions: first, use the backup function to export the data to a shared disk, then use the restore function on the new cluster to import the data from the backup directory.
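A minimal sketch of the backup/restore workflow, driven from Python. The helpers only build the DolphinDB script text for backup() and restore(), and run() hands that text to a connected session. The backup directory, the table name pt and the keyword-argument forms are assumptions for illustration. Check the argument lists against the DolphinDB manual for your server version, which may also require the target database and table to exist on the new cluster before restore() is called. The session object is assumed to come from the dolphindb Python package (dolphindb.session()).

```python
# Sketch: build DolphinDB backup/restore scripts and run them through a session.
# Assumptions (hypothetical, verify for your DolphinDB version): the shared
# backup directory, the table name "pt", and the keyword-argument forms below.


def backup_script(backup_dir: str, db_path: str, table: str) -> str:
    """Script for the OLD cluster: export every row of one DFS table."""
    # <select ...> is DolphinDB metacode describing the data to back up.
    # force=true is assumed to request a full rather than incremental backup.
    return (
        f'backup(backupDir="{backup_dir}", '
        f'sqlObj=<select * from loadTable("{db_path}", "{table}")>, '
        'force=true)'
    )


def restore_script(backup_dir: str, db_path: str, table: str) -> str:
    """Script for the NEW cluster: import all partitions from the backup."""
    # partition="%" is a wildcard matching every backed-up partition.
    return (
        f'restore(backupDir="{backup_dir}", dbPath="{db_path}", '
        f'tableName="{table}", partition="%")'
    )


def run(session, script: str):
    """Execute a script on the node `session` is connected to.

    `session` is assumed to be a connected dolphindb.session() object from
    the `dolphindb` Python package; any object with a .run(str) method works.
    """
    return session.run(script)


if __name__ == "__main__":
    shared = "/mnt/shared/dolphindb_backup"  # hypothetical shared-disk path
    print(backup_script(shared, "dfs://rangeDB", "pt"))
    print(restore_script(shared, "dfs://rangeDB", "pt"))
```

Run the backup script against the old cluster. Then make the shared disk visible to the new machine and run the restore script against the new cluster.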

Related

Easiest way to replicate (copy? Export and import?) a large, rarely changing postgreSQL database

I have imported about 200 GB of census data into a PostgreSQL 9.3 database on a Windows 7 box. The import process involves many files and has been complex and time-consuming. I'm just using the database as a convenient container. The existing data will rarely if ever change, and I will be updating it with external data at most once a quarter (though I'll be adding and modifying intermediate result columns much more frequently). I'll call the data in the database on my desktop the “master.” All queries will come from the same machine, not from remote terminals.
I would like to put copies of all that data on three other machines: two laptops, one running Windows 7 and one running Windows 8, and an Ubuntu virtual machine on my Windows 7 desktop. I have installed PostgreSQL 9.3 on each of these machines, and they currently contain no data. I need to be able to do both reads and writes on the copies. It is OK, and indeed I would prefer it, if changes in the daughter databases do not propagate back to the master database on my desktop. I'd want to update the daughters from the master 1 to 4 times a year. If this wiped out intermediate results on the daughter databases, that would not bother me.
Most of the replication techniques I have read about are concerned with transaction-by-transaction replication of a live, constantly changing server, and with a perfect history of queries and changes. That is overkill for me. Is there a way to replicate by just copying certain files from one PostgreSQL instance to another? (If "replication" is the name of a specific form of copying, I'm asking the more generic question.) Or maybe by restoring each (empty) instance from a backup file of the master? Or by asking PostgreSQL to create and export (ideally to an external hard drive) some kind of binary of the data that another PostgreSQL instance can import, without my having to define all the tables, data types and so forth again?
This question is also motivated by my desire to work around a home Wi-Fi/LAN setup that is very slow: a tenth or less of the speed of file copies to an external hard drive. So if there is a straightforward way to get the imported data from one machine to another by transferring (ideally compressed) binary files, that would work best for my situation.
While you could perhaps copy the data directory directly, as mentioned by Nick Barnes in the comments above, I would recommend using a combination of pg_dump and pg_restore, which produces a self-contained dump that can then be distributed to the other copies.
Run pg_dump on the master to get a dump of the DB. I would recommend a binary archive format instead of plain SQL, since it should be much smaller and perhaps faster as well. The custom format (-Fc) gives you a single compressed file. If you also want to dump several tables at once, use the directory format with, for example, -Fd -j3, because pg_dump only supports parallel jobs (-j) with the directory format. -j3 dumps 3 tables at once. Adjust the number up or down depending on the disk throughput of your machine and the number of cores it has.
Then run dropdb on each copy, createdb to recreate an empty DB with the same name, and pg_restore to restore the dump into that new empty DB. Use the options -d <dbname> -j3 and give the dump file (or dump directory) as the final argument, again adjusting the -j value to the machine. pg_restore's -f option names an output script file, not the input dump.
When you want to refresh the copies with new content from the master DB, simply repeat the above steps.
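The refresh cycle can be scripted. This is a Python sketch that only assembles the command lines and, while dry_run is left on, prints them instead of running them. It assumes the PostgreSQL 9.3+ client tools are on PATH and that connection settings come from the usual environment variables (PGHOST, PGUSER and so on). The database name and dump path in the example are hypothetical. It uses the directory format (-Fd) because pg_dump only accepts -j with that format. pg_restore accepts -j with both the custom and directory formats.

```python
import subprocess
from typing import List

JOBS = 3  # parallel jobs; tune to the machine's disk throughput and core count


def dump_cmd(dbname: str, out_dir: str, jobs: int = JOBS) -> List[str]:
    """pg_dump command for the master.

    The directory format (-Fd) is used because pg_dump only supports
    parallel jobs (-j) with that format; its files are compressed by default.
    """
    return ["pg_dump", "-Fd", "-j", str(jobs), "-f", out_dir, dbname]


def refresh_cmds(dbname: str, dump_dir: str, jobs: int = JOBS) -> List[List[str]]:
    """Commands to rebuild one copy from a dump: drop, recreate, restore."""
    return [
        ["dropdb", dbname],
        ["createdb", dbname],
        # The dump location is a positional argument; -f would name an OUTPUT file.
        ["pg_restore", "-d", dbname, "-j", str(jobs), dump_dir],
    ]


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print (dry run) or execute each command, stopping at the first failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, run run_all([dump_cmd("census", "E:/census.dump")], dry_run=False) on the master and carry the dump directory over on the external drive. Then run run_all(refresh_cmds("census", "E:/census.dump"), dry_run=False) on each copy. On a copy that has no database yet, dropdb fails, so skip that first command the first time.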

The fastest backup/restore strategy for Azure SQL databases?

What is the fastest way to backup/restore Azure SQL database?
The background: we have a database of about 40 GB, and restoring it from the .bacpac file (~4 GB of compressed data) the native way, via the Azure SQL Database Import/Export Service, takes up to 6-8 hours. Creating the .bacpac is also very slow and takes about 2 hours.
UPD: Creating a (transactionally consistent) copy of the database with CREATE DATABASE [DBBackup] AS COPY OF [DB] takes only 15 minutes for the 40 GB database, and the restore is a simple database rename.
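The copy-and-rename approach comes down to a few T-SQL statements. The Python sketch below only generates them. It assumes they are executed while connected to the master database of the logical server, which is where Azure SQL Database expects CREATE DATABASE ... AS COPY OF and ALTER DATABASE ... MODIFY NAME. The copy runs asynchronously, so copy_state_sql() builds a query on sys.databases to poll until the copy is ONLINE. The _broken suffix for the old database is a hypothetical naming choice.

```python
from typing import List


def q(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any embedded ']'."""
    return "[" + name.replace("]", "]]") + "]"


def copy_sql(source: str, copy_name: str) -> str:
    """Start a transactionally consistent copy (runs asynchronously)."""
    return f"CREATE DATABASE {q(copy_name)} AS COPY OF {q(source)};"


def copy_state_sql(copy_name: str) -> str:
    """Poll this until state_desc changes from COPYING to ONLINE."""
    literal = "N'" + copy_name.replace("'", "''") + "'"
    return f"SELECT name, state_desc FROM sys.databases WHERE name = {literal};"


def restore_by_rename_sql(db: str, backup: str, old_suffix: str = "_broken") -> List[str]:
    """'Restore' by moving the damaged DB aside and renaming the copy into place."""
    return [
        f"ALTER DATABASE {q(db)} MODIFY NAME = {q(db + old_suffix)};",
        f"ALTER DATABASE {q(backup)} MODIFY NAME = {q(db)};",
    ]
```

Keeping the damaged database under a new name, rather than dropping it, leaves it available for later investigation.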
UPD, Dec 2014: Let me share the fastest DB migration scheme we ended up with.
First of all, the data-tier application (.bacpac) approach turned out not to be viable for us once the DB grew slightly bigger. It also will not work for you if you have at least one non-clustered index larger than 2 GB in total, unless you disable the non-clustered indexes before the export. This is due to the Azure SQL transaction log limit.
We settled on the Azure Migration Wizard, which for data transfer simply runs BCP for each table (the BCP parameters are configurable). It's ~20% faster than the .bacpac approach.
Here are some pitfalls we encountered with the Migration Wizard:
- We ran into encoding trouble with non-Unicode strings. Make sure that the BCP import and export run with the same collation. That's the -C ... configuration switch, and you can find the parameters BCP is called with in the .config file of the MW application.
- Take into account that MW (at least the version current at the time of writing) runs BCP with parameters that leave the constraints in a non-trusted state, so do not forget to check all non-trusted constraints after the BCP import.
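For the non-trusted constraints, the catalog views sys.check_constraints and sys.foreign_keys expose an is_not_trusted flag. Each affected constraint can be re-validated with ALTER TABLE ... WITH CHECK CHECK CONSTRAINT. The sketch below keeps the T-SQL in Python strings: run FIND_UNTRUSTED_SQL with whatever client you use, then feed the returned rows to retrust_statements(). Re-checking validates every existing row, so it can take a while on large tables.

```python
from typing import Iterable, List, Tuple

# Lists constraints that are enabled (is_disabled = 0) but not trusted.
FIND_UNTRUSTED_SQL = """
SELECT s.name AS schema_name, t.name AS table_name, c.name AS constraint_name
FROM sys.check_constraints AS c
JOIN sys.tables AS t ON c.parent_object_id = t.object_id
JOIN sys.schemas AS s ON t.schema_id = s.schema_id
WHERE c.is_not_trusted = 1 AND c.is_disabled = 0
UNION ALL
SELECT s.name, t.name, f.name
FROM sys.foreign_keys AS f
JOIN sys.tables AS t ON f.parent_object_id = t.object_id
JOIN sys.schemas AS s ON t.schema_id = s.schema_id
WHERE f.is_not_trusted = 1 AND f.is_disabled = 0;
"""


def q(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any embedded ']'."""
    return "[" + name.replace("]", "]]") + "]"


def retrust_statements(rows: Iterable[Tuple[str, str, str]]) -> List[str]:
    """One ALTER TABLE per (schema, table, constraint) row."""
    # WITH CHECK re-validates existing rows; on success the constraint is trusted again.
    return [
        f"ALTER TABLE {q(s)}.{q(t)} WITH CHECK CHECK CONSTRAINT {q(c)};"
        for s, t, c in rows
    ]
```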
If your database is 40 GB, it's long past time to consider having a redundant database server that's ready to go as soon as the main one becomes faulty.
You should have a second server running alongside the main DB server that has no actual routines except to sync with the main server on an hourly/daily basis (depending on how often your data changes, and how long it takes to run this process). You can also consider creating backups from this database server, instead of the main one.
If your main DB server goes down - for whatever reason - you can change the host address in your application to the backup database, and spend the 8 hours debugging your other server, instead of twiddling your thumbs waiting for the Azure Portal to do its thing while your clients complain.
Your database shouldn't be taking 6-8 hours to restore from backup though. If you are including upload/download time in that estimate, then you should consider storing your data in the Azure datacenter, as well as locally.
For more info see this article on Business Continuity on MSDN:
http://msdn.microsoft.com/en-us/library/windowsazure/hh852669.aspx
You'll want to specifically look at the Database Copies section, but the article is worth reading in full if your DB is so large.
Azure now supports Point-in-Time Restore, Geo-Restore and Geo-Replication (GeoDR) features. You can combine these to get quick backup and restore. Point-in-Time Restore and Geo-Restore come at no additional cost, while you have to pay for Geo-Replication.
There are multiple ways to do backup, restore and copy jobs on Azure:
- Point-in-Time Restore: the service takes full backups, multiple differential backups, and transaction-log backups every 5 minutes.
- Geo-Restore: the same as Point-in-Time Restore, except that it uses a redundant copy kept in blob storage in a different region.
- Geo-Replication: similar to SQL Server Availability Groups, with up to 4 asynchronous, readable replicas. You select a region to act as a hot standby.
Azure SQL Database already has these local replicas that Liam is referring to. You can find more details on these three local replicas here http://social.technet.microsoft.com/wiki/contents/articles/1695.inside-windows-azure-sql-database.aspx#High_Availability_with_SQL_Azure
Also, SQL Database recently introduced new service tiers that include new point-in-time-restore. Full details at http://msdn.microsoft.com/en-us/library/azure/hh852669.aspx
The key is also to use a data management strategy that fits your objective. The wrong architecture, or an approach of putting everything in the cloud, can prove disastrous. Here's more to read: http://archdipesh.blogspot.com/2014/03/windows-azure-data-strategies-and.html

Access Newly Created PostgreSQL Cluster

I am looking to create a new database cluster in PostgreSQL because I want this cluster to point to a different (and larger) data directory than my current cluster on localhost:5432. To create the new cluster, I ran the command below. However, after restarting PostgreSQL I don't see the new cluster in pgAdmin and can't connect to a server on port 5435. How do I connect to this new cluster? Alternatively, I thought I could create a new tablespace within the old cluster that points to this larger data directory, but I'm assuming that populating a database using this tablespace would still result in files being written to the cluster's data directory? I'm running PostgreSQL 9.3 on Ubuntu 12.04.
$ pg_createcluster -d /home/foo 9.3 test_env
Creating new cluster 9.3/test_env ...
config /etc/postgresql/9.3/test_env
data /home/foo/
locale en_US.UTF-8
port 5435
You need to specify option '--start' if you want it to actually start the database. Without that it will just create the data directory.
However, using tablespaces would probably be a better solution, since running multiple database clusters introduces the overhead of running completely separate postmaster processes. If you create a database in a tablespace, all files associated with that database go in the tablespace's directory. The system catalog will still be in the cluster's data directory, but it is unlikely to be large enough to cause you any problems.
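The tablespace route needs two SQL statements. The Python helpers below generate them with the identifiers and the path quoted. The tablespace name bigspace, the directory /home/foo/pg_tblspc and the database name census are hypothetical. The directory must be an absolute path that already exists, is empty and is owned by the postgres operating-system user, and CREATE TABLESPACE requires superuser rights.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text: str) -> str:
    """Single-quote a PostgreSQL string literal, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def tablespace_sql(space: str, location: str, dbname: str) -> List[str]:
    # CREATE TABLESPACE needs superuser rights and an absolute path to an
    # existing, empty directory owned by the postgres OS user.
    return [
        f"CREATE TABLESPACE {quote_ident(space)} LOCATION {quote_literal(location)};",
        # The new database's files are then created under the tablespace directory.
        f"CREATE DATABASE {quote_ident(dbname)} TABLESPACE {quote_ident(space)};",
    ]
```

Run the generated statements in psql against the existing cluster on port 5432. No second cluster or postmaster is needed.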

Copying a tablespace from one postgresql instance to another

I'm looking for a way to quickly "clone" a database from one PostgreSQL server to another.
Assuming that:
I have a PostgreSQL server running on HostA, serving 2 databases
I have 2 devices mounted on HostA, each storing the data for one of the databases (i.e. 1 database => 1 tablespace => 1 device)
I am able to obtain a "safe" point-in-time snapshot through some careful method
I am able to consistently and safely produce a clone of either of the 2 devices (I could even assume both databases only ever receive reads)
The hosts involved are CentOS 5.4
Is it possible to host a second postgresql server on HostB, mount one of the cloned devices and get the corresponding database to "pop" into existence? (Kind of like when you copy the MyISAM table files in MySQL).
If this is at all possible, what is the mechanism (i.e. what DDL should I look into or pg commands)?
It's important for me to be able to move individual databases in isolation of each other. But if this isn't possible, would a similar approach work at a server level (trying to clone and respawn a server by copying the datadir over to a host with the same postgresql install) ?
Not easily, because there are a number of files that are shared between databases which means each database in the same install is dependent on this.
You can do it at the server level, or at the cluster level, but not at individual database level. Just be sure to copy/clone over the whole data directory and all external tablespaces. As long as you can make the clone atomically (either on the same filesystem or with a system that can do atomic clones across filesystems), you don't even need to stop the database on hostA to do it.

Copying databases to remote locations

Our EPOS system copies data by compressing the database into a zip file, and manually copying to each till, using shared directories.
Each branch is linked to the main location via VPN, which can be problematic but is required for the file sharing to work correctly.
Since our database system currently does not support replication, is there another solution for copying data or should we migrate our software to another database?
Replication is the "right" way to go, so if migrating to another database is an option (is it really?), that's the best route.
You might consider a utility that queries all the tables for raw data (in CSV?), sending that to files. Then at least you don't have to take the database down to do the backup.
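A minimal sketch of such a utility, assuming a Python DB-API connection. The EPOS database isn't named, so sqlite3 from the standard library stands in for it here. The table-listing query against sqlite_master is SQLite-specific and would need replacing for another engine. Each table is written to <table>.csv with a header row. Tables are read one after another, so if the tills are writing during the export, the files may not be mutually consistent unless the reads share one snapshot.

```python
import csv
import os
import sqlite3
from typing import List


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def export_tables_to_csv(conn: sqlite3.Connection, out_dir: str) -> List[str]:
    """Write every user table to <out_dir>/<table>.csv with a header row."""
    os.makedirs(out_dir, exist_ok=True)
    cur = conn.cursor()
    # Table discovery is engine-specific: sqlite_master is SQLite's catalog.
    cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    tables = [row[0] for row in cur.fetchall()]

    written = []
    for table in tables:
        path = os.path.join(out_dir, f"{table}.csv")
        cur.execute(f"SELECT * FROM {quote_ident(table)}")
        header = [col[0] for col in cur.description]
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cur)  # streams rows without loading the whole table
        written.append(path)
    return written
```

The resulting CSV files can be compressed and copied to each till, and the database keeps serving requests while the export runs.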
