Best practices for exporting mongo collections to SQL Server - sql-server

We are using MongoDB (on Linux) as our main database. However, we need to periodically (e.g. nightly) export some of the collections from Mongo to a Microsoft SQL Server instance to run analytics.
I am thinking about the following approach:
Back up the Mongo database (probably from a replica) using mongodump
Restore the database onto a Windows machine where Mongo is installed
Write a custom-made app to import the collections from Mongo into SQL Server (possibly handling any required normalization).
Run analytics on the Windows SQL Server installation.
Are there any other "tried and true" alternatives?
Thanks,
Stefano
EDIT: for point 4, the analytics is to be run on SQL Server, not Mongo.

Overall this looks fine, but I can suggest two things:
Skip the backup/restore steps and read the data directly from the Linux MongoDB, because backing up and restoring the database will only get harder as it grows.
Instead of a fully custom-made app, use Quartz.NET to schedule the nightly export; it is easy to use and can handle any other scheduled tasks as well (a rough sketch of the export itself follows below).
I can also suggest looking into newer approaches such as CQRS and event sourcing, which basically let you avoid export tasks altogether. You can handle messages and store the data in both data sources (the Linux MongoDB and the Windows SQL Server) in near real time with a small delay, or even analyze the data from the messages and store the results in MongoDB.
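For what it's worth, here is a minimal sketch of such a direct export in Python, assuming pymongo and pyodbc are available; the orders collection, the dbo.Orders table and their columns are made up for illustration, and the job itself could be scheduled by Quartz.NET, cron, or anything similar:

```python
# Sketch of a nightly export: read a collection directly from the Linux
# MongoDB replica and upsert the documents into SQL Server for analytics.
# Collection, table and column names here are illustrative assumptions.
from pymongo import MongoClient
import pyodbc

def export_orders():
    mongo = MongoClient("mongodb://replica-host:27017/")
    collection = mongo["shop"]["orders"]

    sql = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=analytics-host;DATABASE=Analytics;Trusted_Connection=yes;"
    )
    cursor = sql.cursor()

    for doc in collection.find({}, {"_id": 1, "customer": 1, "total": 1, "created_at": 1}):
        # Flatten/normalize each document into relational columns.
        cursor.execute(
            """
            MERGE dbo.Orders AS target
            USING (SELECT ? AS MongoId) AS src ON target.MongoId = src.MongoId
            WHEN MATCHED THEN UPDATE SET Customer = ?, Total = ?, CreatedAt = ?
            WHEN NOT MATCHED THEN INSERT (MongoId, Customer, Total, CreatedAt)
                VALUES (?, ?, ?, ?);
            """,
            str(doc["_id"]),
            doc.get("customer"), doc.get("total"), doc.get("created_at"),
            str(doc["_id"]), doc.get("customer"), doc.get("total"), doc.get("created_at"),
        )
    sql.commit()

if __name__ == "__main__":
    export_orders()
```

The same logic carries over to a .NET job if you go the Quartz.NET route; the point is simply that the export can read straight from the replica without a backup/restore round trip.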

Related

Elastic Search 5 and SQL Server synchronisation

I am starting an Elasticsearch 5 project from data that currently lives in SQL Server, so I am starting from scratch:
I am thinking about how to import the data from my SQL Server, and especially how to synchronise the data when records are updated or added.
I saw here that it is advised not to run batches too frequently.
But how do I build these synchronisation batches? Do I have to write them myself, or are there widely used tools and practices?
The River and JDBC feeder plugins appear to have been widely used, but they don't work with Elasticsearch 5.x.
Any help would be very welcome.
I'd recommend using Logstash:
It's easy to use and set up
You can do your own ETL in logstash configuration files
You can have multiple JDBC sources in one file
You'll have to figure out how to make incremental (batched) updates to sync your data; it really depends on your data model (a rough sketch of the incremental idea follows below).
This is a nice blog piece to begin with:
https://www.elastic.co/blog/logstash-jdbc-input-plugin
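Logstash aside, the incremental idea (which the JDBC input plugin handles with its :sql_last_value bookmark) boils down to something like the following Python sketch; the Products table, its ModifiedAt column and the products index are assumptions for illustration:

```python
# Sketch of an incremental (batched) sync from SQL Server to Elasticsearch.
# The Products table, its ModifiedAt column and the index name are assumptions.
import pyodbc
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def sync_since(last_seen):
    sql = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=Shop;Trusted_Connection=yes;"
    )
    es = Elasticsearch("http://localhost:9200")

    rows = sql.cursor().execute(
        "SELECT Id, Name, Price, ModifiedAt FROM dbo.Products "
        "WHERE ModifiedAt > ? ORDER BY ModifiedAt",
        last_seen,
    ).fetchall()

    actions = (
        {
            "_index": "products",
            "_id": row.Id,
            "_source": {"name": row.Name, "price": float(row.Price)},
        }
        for row in rows
    )
    bulk(es, actions)

    # Persist the new bookmark so the next batch only picks up later changes.
    return max((row.ModifiedAt for row in rows), default=last_seen)
```

Whether you keep the bookmark in a file, a table, or let Logstash manage it for you is a detail; the batching pattern is the same.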

Streaming data from SQL to Mongo

I am working with industrial equipment that inserts some text data into a SQL Server 2008 database every time it cycles (about every 25 seconds). I am looking to forward that data to a mongo database in real time to use with an internal Meteor application.
Would there be any obvious starting point? The closest answer I have found is at: https://github.com/awatson1978/meteor-cookbook/blob/master/cookbook/datalayer.md
Q: Well, how am I supposed to use the data in my SQL database then?
Through REST interfaces and/or exposing the SQL database as a JSON stream. We put the ORM outside of Meteor. So, the trick is to move your data from your SQL database into Meteor's Mongo database, and have Mongo act as an object store or caching layer.
Apologies if it is something obvious.
You need to use Mongo, but only as a simple repository for your SQL database.
This keeps all of Meteor's characteristics and uses Mongo as a temporary repository for your MySql or PostgreSql database.
A brilliant attempt at that is mysql-shadow by @perak (https://github.com/perak/mysql-shadow). It does what it says: it keeps Mongo synchronized both ways with MySql and lets you work with your data in MySql.
The bad news is that the developer will not continue maintaining it, but what is already there is enough for simple scenarios where you don't have complex triggers that update other tables and the like.
This works with MySql, of course, but if you look at the code, an MS SQL implementation would not be hard.
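If one-way flow from SQL Server into Meteor's Mongo store is enough to start with, even a crude polling loop gets the idea across. This is only a sketch; the CycleLog table, its columns and the connection details are assumptions based on the question:

```python
# Rough one-way sync: poll SQL Server 2008 for new cycle records and upsert
# them into the Mongo database Meteor reads from. Table and column names
# are assumptions; the equipment writes roughly every 25 seconds.
import time
import pyodbc
from pymongo import MongoClient

sql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=plant-db;DATABASE=Equipment;UID=reader;PWD=secret"
)
cycles = MongoClient("mongodb://localhost:27017/")["meteor"]["cycles"]

last_id = 0
while True:
    rows = sql.cursor().execute(
        "SELECT Id, MachineId, Payload, LoggedAt FROM dbo.CycleLog WHERE Id > ? ORDER BY Id",
        last_id,
    ).fetchall()
    for row in rows:
        cycles.replace_one(
            {"_id": row.Id},
            {"machine": row.MachineId, "payload": row.Payload, "loggedAt": row.LoggedAt},
            upsert=True,
        )
        last_id = row.Id
    time.sleep(10)  # poll a bit faster than the ~25 s cycle time
```

Tools like mysql-shadow or SymmetricDS (below) do essentially this, plus the reverse direction and proper change detection.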
For full-featured synchronization you can use SymmetricDS (http://www.symmetricds.org), a very well-tested database replicator. This involves setting up a new Java server, of course, but it is by far the best way to be sure you can turn your Mongo database into a simple repository for your real MySql, PostgreSQL, SQL Server or Informix database. I have yet to try it myself.
For now MySQL Shadow seems like a good enough solution.
One advantage of this approach is that you can still use all the standard Meteor features, packages, Meteor deployment and so on. You don't have to do anything but set up the sync mechanism, and you are not breaking anything.
Also, if the Meteor team someday puts some of the money it has raised into SQL integration, your app is more likely to keep working as is.

Suitable method for synchronising online and offline data

I have two applications, each with its own database.
1.) A desktop application with a VB.NET WinForms interface; it runs on an offline enterprise network and stores its data in a central database [SQL Server].
**All data entry and other office operations are carried out and stored in the central database.
2.) The second application is built in PHP. It has HTML pages and runs as a website online. It stores all of its data in a MySQL database.
**This application is accessed by registered members only, and it provides them with various reports on the data processed by the first application.
Now I have to synchronize data between the online and offline database servers. I am planning the following:
1.) Write a small program to export all the data from SQL Server [the offline server] to a file in CSV format.
2.) Log in to the admin section of the live server.
3.) Upload the exported CSV file to the server.
4.) Import the data from the CSV file into the MySQL database.
Is the method I am planning good, or can it be tuned to perform better? I would also appreciate other good ways of synchronising the data that do not involve changing the applications, i.e. porting the network application to something else that uses the MySQL database. (A rough sketch of steps 1 and 4 follows.)
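For reference, here is a rough sketch of steps 1 and 4 in Python, assuming pyodbc on the offline side and mysql-connector-python on the live side; the Orders table and its columns are made up, and values are shipped as plain text:

```python
# Step 1 (offline side): dump a SQL Server table to CSV.
# Step 4 (live side): load the CSV into MySQL.
# Table and column names are illustrative only.
import csv
import pyodbc
import mysql.connector

def export_to_csv(path):
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=central-db;DATABASE=Office;Trusted_Connection=yes;"
    )
    rows = conn.cursor().execute("SELECT Id, Customer, Amount, CreatedAt FROM dbo.Orders")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "customer", "amount", "created_at"])
        writer.writerows(rows)

def import_from_csv(path):
    conn = mysql.connector.connect(host="localhost", user="web", password="secret", database="site")
    cur = conn.cursor()
    cur.execute("TRUNCATE TABLE orders")  # full overwrite of the live table, as in step 4
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        cur.executemany(
            "INSERT INTO orders (id, customer, amount, created_at) VALUES (%s, %s, %s, %s)",
            list(reader),
        )
    conn.commit()
```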
What you are asking for does not actually sound like bidirectional sync (i.e. moving data both ways, from SQL Server to MySQL and from MySQL to SQL Server), which is a good thing as it really simplifies things for you. Although I suspect your method of using CSVs (I assume you would use something like BCP for this) would work, one of the issues is that you are moving ALL of the data every time you run the process and basically overwriting the whole MySQL database each time. This is obviously somewhat inefficient, not to mention that during that window the MySQL database would not be in a usable state.
One alternative (assuming you have SQL Server 2008 or higher) would be to look into this technique along with Change Tracking or Change Data Capture. These are capabilities within SQL Server that let you determine which data has changed since a certain point in time. What you could do is create a process that extracts only the changes since the last time you checked into a CSV file and then applies those to MySQL. If you do this, don't forget to apply the deletes as well.
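As a hedged illustration of the Change Tracking route (assuming change tracking has already been enabled on the database and on the table, and using an invented dbo.Orders table), the extraction step might be driven like this:

```python
# Pull only the rows changed since the last sync using SQL Server Change
# Tracking (SQL Server 2008+). Assumes ALTER DATABASE / ALTER TABLE ...
# ENABLE CHANGE_TRACKING has already been run; table name is illustrative.
import pyodbc

def changes_since(last_version):
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=central-db;DATABASE=Office;Trusted_Connection=yes;"
    )
    cur = conn.cursor()
    current_version = cur.execute("SELECT CHANGE_TRACKING_CURRENT_VERSION()").fetchval()

    cur.execute(
        """
        SELECT ct.Id, ct.SYS_CHANGE_OPERATION, o.Customer, o.Amount
        FROM CHANGETABLE(CHANGES dbo.Orders, ?) AS ct
        LEFT JOIN dbo.Orders AS o ON o.Id = ct.Id
        """,
        last_version,
    )
    inserts_updates, deletes = [], []
    for row in cur.fetchall():
        if row.SYS_CHANGE_OPERATION == "D":
            deletes.append(row.Id)  # remember to delete these on the MySQL side too
        else:
            inserts_updates.append((row.Id, row.Customer, row.Amount))
    return current_version, inserts_updates, deletes
```

The returned version number becomes the bookmark for the next run, so each sync only carries the delta rather than the whole table.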
I don't think there's an off-the-shelf solution for what you want that you can use without customization, but the MS Sync Framework (http://msdn.microsoft.com/en-us/sync/default) sounds close.
You will probably need to write a provider for MySQL to make it work, which may well be less work than writing the whole data synchronization logic from scratch. Voclare is right about the challenges you could face in writing your own synchronization mechanism...
Do look into SQL Server Integration Services (SSIS) as a good alternative.

Export from a standalone database to an embedded database

I have a two-part application, where there is a central database that is edited, and then at certain times, the data is released and distributed as its own application. I would like to use a standalone database for the central database (MySQL, Postgres, Oracle, SQL Server, etc.) and then have a reliable export to an embedded database (probably SQLite) for distribution.
What tools/processes are available for such an export, or is it a practice to be avoided?
EDIT: A couple of additional pieces of information. The distributed application should be able to run without having to connect to another server (e.g. your spellchecker still works even if you don't have internet), and I don't want to install a full DB server just for read-only access to the data.
If you really only want your clients to have read-only access to the offline data, it should not be that difficult to update the client data manually.
A good practice would be to use the same product for the server database and the client database. You wouldn't have to write your SQL statements twice, since both use the same SQL dialect and the same features.
Firebird, for example, offers both a server and an embedded version.
Microsoft also offers SQL Server in a mobile version (Compact Edition), and there are Synchronization Services provided by Microsoft as well (a good blog post describing Sync Services in Visual Studio: http://keithelder.net/blog/archive/2007/09/23/Sync-Services-for-SQL-Server-Compact-Edition-3.5-in-Visual.aspx).
MySQL has a product called "MySQLMobile", but I have never actually used it.
I can also recommend SQLite as an embedded database since it is very easy to use.
Depending on your bandwidth and the amount of data, you could even download the whole database and delete the old one (in Firebird, for example, you only copy the database files, and this also works with the mobile version). Very easy, BUT you have to know whether it will work for your scenario. If you have more data you will need something more flexible and sophisticated that only updates the data that really changed.
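As a hedged sketch of the "reliable export to SQLite" step, assuming the central database happens to be PostgreSQL reachable via psycopg2 and that a single products table is being published:

```python
# Build a fresh SQLite file from the central (here: PostgreSQL) database
# for distribution with the application. Table/column names are assumptions.
import sqlite3
import psycopg2

def publish_sqlite(path):
    src = psycopg2.connect("dbname=catalog user=editor host=central-db")
    dst = sqlite3.connect(path)
    dst.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )

    with src.cursor() as cur:
        cur.execute("SELECT id, name, price FROM products ORDER BY id")
        dst.executemany("INSERT INTO products VALUES (?, ?, ?)", cur.fetchall())

    dst.commit()
    dst.close()
    src.close()

# The resulting SQLite file is shipped read-only with the distributed app.
```

Since the distributed copy is read-only, regenerating the whole file on each release is usually simpler than trying to update it in place.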

Updating database on website from another data store

I have a client who owns a business with a handful of employees. He has a product website that has several hundred static product pages that are updated periodically via FTP.
We want to change this to a data-driven website, but the database (which will be hosted at an ISP) will have to be updated from data on my client's servers.
How best to do this on a shoestring? Can the database be hot-swapped via FTP, or do we need to build a web service we can push changes to?
Ask the ISP about the options. Some ISPs allow you to FTP-upload the .mdf (database file).
Some will allow you to connect with SQL Server Management Studio.
Some will allow both.
You have to ask the ISP.
The last time I did this, we created XML documents that were FTP'd to the website. We had an admin page that would clear out the old data by running some stored procedures to truncate the tables, then import the XML documents into the SQL tables.
Since we didn't have the whole server to ourselves, there was no access to SQL Server DTS to schedule this stuff.
There is a Database Publishing Wizard from Microsoft which will take all your data and create a SQL script that can then be run on the ISP's server. It can also, though I've never tried it, push directly to an ISP database; there is an option button on one of the wizard screens that does it.
It does require the user to have a little training, and it's still a manual process, so maybe it's not what you're after, but I think it will do the job.
Long term, building a service to upload the data is probably the cleanest solution, as the app can then control its own import procedures. You could go grossly simple with this and just have the local copy dump some sort of XML that the app could read, making it not much harder than uploading a file while still staying in the automatable category. Having this import procedure would also help with development, as you would then have an automated and repeatable way to sync data.
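A hedged sketch of that "grossly simple" variant, with invented element, table and column names: the local side dumps an XML file that gets uploaded, and the site-side import truncates and reloads (the site connection is assumed to be a DB-API connection such as MySQL's, with %s placeholders):

```python
# Local side: dump products to an XML file that gets uploaded (e.g. via FTP).
# Site side: truncate and reload the products table from that XML file.
# Element, table and column names are illustrative assumptions.
import sqlite3  # stand-in for whatever the client's local data store is
import xml.etree.ElementTree as ET

def dump_products_xml(db_path, xml_path):
    root = ET.Element("products")
    for pid, name, price in sqlite3.connect(db_path).execute(
        "SELECT id, name, price FROM products"
    ):
        item = ET.SubElement(root, "product", id=str(pid))
        ET.SubElement(item, "name").text = name
        ET.SubElement(item, "price").text = str(price)
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

def import_products_xml(xml_path, site_db):
    cur = site_db.cursor()
    cur.execute("TRUNCATE TABLE products")  # clear old data, as in the admin-page approach
    for item in ET.parse(xml_path).getroot():
        cur.execute(
            "INSERT INTO products (id, name, price) VALUES (%s, %s, %s)",
            (item.get("id"), item.findtext("name"), item.findtext("price")),
        )
    site_db.commit()
```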
This is what I usually do:
You could use a tool like Red Gate's SQL Data Compare to do this. The tool compares data between two catalogs (on the same or different servers) and generates a script for syncing them.
