Elasticsearch 5 and SQL Server synchronisation - sql-server

I am starting an Elasticsearch 5 project with data that currently lives in SQL Server, so I am starting from scratch:
I am thinking about how to import the data from my SQL Server, and especially how to keep it synchronised when data is updated or added.
I have seen it advised here to use batches that are not too frequent.
But how do I build those synchronisation batches? Do I have to write them myself, or are there widely used tools and practices?
The river mechanism and the JDBC feeder plugin appear to have been widely used, but they don't work with Elasticsearch 5.x.
Any help would be very welcome.

I'd recommend using Logstash:
It's easy to use and set up
You can do your own ETL in Logstash configuration files
You can have multiple JDBC sources in one file
You'll have to figure out how to make incremental (batched) updates to keep your data in sync; that really depends on your data model (a rough sketch of the idea follows below).
This is a nice blog piece to begin with:
https://www.elastic.co/blog/logstash-jdbc-input-plugin
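Logstash's JDBC input handles the incremental part with a tracking column and its `:sql_last_value` placeholder, but the underlying idea is just a watermark query. A minimal sketch of the same pattern in plain Python, with hypothetical table, column and index names and assuming pyodbc and the elasticsearch client are available:

```python
# Minimal sketch of an incremental (watermarked) sync from SQL Server into
# Elasticsearch. Table, column and index names are hypothetical; adjust to
# your own schema. Requires pyodbc and the elasticsearch Python client.
import pyodbc
from elasticsearch import Elasticsearch, helpers

sql = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret")
es = Elasticsearch(["http://localhost:9200"])

def sync(last_seen):
    """Index every row modified since last_seen and return the new watermark."""
    cur = sql.cursor()
    cur.execute("SELECT Id, Name, Price, ModifiedAt FROM dbo.Products "
                "WHERE ModifiedAt > ? ORDER BY ModifiedAt", last_seen)
    rows = cur.fetchall()
    actions = ({"_index": "products",
                "_id": row.Id,            # stable id, so updates overwrite old docs
                "_source": {"name": row.Name, "price": float(row.Price)}}
               for row in rows)
    helpers.bulk(es, actions)
    return rows[-1].ModifiedAt if rows else last_seen
```

Note that deletes never show up in a watermark query, so they need separate handling (soft-delete flags or periodic rebuilds); the same caveat applies to the Logstash approach.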

Related

Importing data from multiple SQL servers

We are looking at collecting data from partners' Microsoft SQL Servers and importing it into our own SQL Server. Part of what we want to do is take all of their data separately and then combine it, so that we can create baselines on how they are performing against one another. I am curious what best practices or recommendations there might be for achieving this.
The easiest approach I can think of is to set them up as linked servers on our SQL Server and then write stored procedures (automated on a schedule with SQL Server Agent) to import the data from each into local tables. I've also started looking at third-party systems to do this (e.g. stitchdata), but I am not seeing ones that will import data back locally; most of them appear to import data into a cloud DB solution.
Has anyone done something similar before and can help steer us in the right direction?
Thank you!
To solve this problem with SQL Server tooling, one approach is to create a staging database into which you load all of the external information.
To gather the data you can use SSIS packages that connect directly to the sources, and schedule those packages with SQL Server Agent.
I avoid using linked servers for ETL purposes for many reasons, but the most important ones to me are:
If the remote server is unavailable, the whole ETL process can break.
The process becomes tightly coupled to the source, and if the source changes you will need to rebuild a lot.
You may or may not use stored procedures to load and compare the tables between the staging and final databases; that depends on whether they are on the same server, on performance, and so on (a sketch of the staging-to-final step follows below).
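For the load-and-compare step, one common pattern is a MERGE from the staging table into the final one after SSIS has filled staging. A minimal sketch, with hypothetical table and column names and assuming pyodbc is available to run the T-SQL:

```python
# Sketch of the staging-to-final reconciliation step. Table and column names
# (staging.PartnerSales, dbo.Sales) are hypothetical; SSIS is assumed to have
# already loaded the staging table from the partners' servers.
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;DATABASE=warehouse;Trusted_Connection=yes")

MERGE_SQL = """
MERGE dbo.Sales AS target
USING staging.PartnerSales AS source
      ON target.PartnerId = source.PartnerId
     AND target.SaleId    = source.SaleId
WHEN MATCHED THEN
    UPDATE SET target.Amount = source.Amount, target.SaleDate = source.SaleDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PartnerId, SaleId, Amount, SaleDate)
    VALUES (source.PartnerId, source.SaleId, source.Amount, source.SaleDate);
"""

cur = conn.cursor()
cur.execute(MERGE_SQL)   # compare staging against final and apply the differences
conn.commit()
```

Keeping the comparison inside the database this way works whether or not staging and final sit on the same server, which is the trade-off the last point above alludes to.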

Streaming data from SQL to Mongo

I am working with industrial equipment that inserts some text data into a SQL Server 2008 database every time it cycles (about every 25 seconds). I am looking to forward that data to a mongo database in real time to use with an internal Meteor application.
Would there be any obvious starting point? The closest answer I have found is at: https://github.com/awatson1978/meteor-cookbook/blob/master/cookbook/datalayer.md
Q: Well, how am I supposed to use the data in my SQL database then?
Through REST interfaces and/or exposing the SQL database as a JSON stream. We put the ORM outside of Meteor. So, the trick is to move your data from your SQL database into Meteor's Mongo database, and have Mongo act as an object store or caching layer.
Apologies if this is something obvious.
You need to use Mongo, but as a simple repository for your MySql database.
This preserves all of Meteor's characteristics and uses Mongo as a temporary repository for your MySql or PostgreSql databases.
A brilliant attempt at that is mysql-shadow by @perak (https://github.com/perak/mysql-shadow). It does what it says: it keeps Mongo synchronized both ways with MySql and lets you work with your data in MySql.
The bad news is that the developer will not continue maintaining it, but what is there is enough for simple scenarios where you don't have complex triggers that update other tables and the like.
This works with MySql of course, but if you look at the code, an MS SQL implementation would not be hard.
For a full-featured synchronization you can use SymmetricDS (http://www.symmetricds.org), a very well tested database replicator. This involves setting up a new Java server, of course, but it is by far the best way to be sure you can turn your Mongo database into a simple repository of your real MySql, PostgreSQL, SQL Server or Informix database. I have yet to try it myself.
For now MySQL Shadow seems like a good enough solution.
One advantage of this approach is that you can still use all standard Meteor features, packages, meteor deployment and so on. You don't have to do anything but set up the sync mechanism, and you are not breaking anything.
Also, if someday the Meteor team spends some of the dollars it has raised on SQL integration, your app is more likely to keep working as is.
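Back to the original SQL Server to Mongo question: since the equipment only writes about every 25 seconds, a simple polling forwarder is often enough instead of true streaming. A minimal sketch, with hypothetical table, column and collection names, assuming pyodbc and pymongo:

```python
# Minimal polling forwarder: copy each new SQL Server row into MongoDB so the
# Meteor app can read it. Table, column and collection names are hypothetical.
import time
import pyodbc
from pymongo import MongoClient

sql = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=plantdb;DATABASE=machines;Trusted_Connection=yes")
cycles = MongoClient("mongodb://localhost:27017")["meteor"]["cycles"]

last_id = 0   # highest CycleId copied so far

while True:
    cur = sql.cursor()
    cur.execute("SELECT CycleId, MachineName, Payload, CycleTime "
                "FROM dbo.Cycles WHERE CycleId > ? ORDER BY CycleId", last_id)
    for row in cur.fetchall():
        cycles.replace_one({"_id": row.CycleId},          # idempotent upsert
                           {"machine": row.MachineName,
                            "payload": row.Payload,
                            "cycleTime": row.CycleTime},
                           upsert=True)
        last_id = row.CycleId
    time.sleep(10)   # comfortably under the ~25 second cycle time
```

Keying the documents on the SQL identity column makes the copy idempotent: restarting the forwarder just rewrites the latest documents instead of duplicating them.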

Best practices for exporting mongo collections to SQL Server

We are using MongoDB (on Linux) as our main database. However, we need to periodically (e.g. nightly) export some of the collections from Mongo to an MS SQL Server to run analytics.
I am thinking about the following approach:
Backup the Mongo database (probably from a replica) using mongodump
Restore the database onto a Windows machine where Mongo is installed
Write a custom made app to import the collections from Mongo into SQL (possibly handling any required normalization).
Run analytics on the Windows SQL Server installation.
Are there any other "tried and true" alternatives?
Thanks,
Stefano
EDIT: for point 4, the analytics is to be run on SQL Server, not Mongo.
Overall it looks fine, but I can suggest two things:
Skip the backup/restore steps and read the data directly from the Linux MongoDB, because backing up and restoring the database will get harder and harder as it grows (a sketch of the direct export follows below).
Instead of a custom-made app, use Quartz.NET for the nightly export; it is easy to use and can handle any other scheduled tasks as well.
I can also suggest looking into newer approaches such as CQRS and event sourcing, which basically let you avoid export tasks altogether. You just handle the messages and write the data to both data sources (Linux MongoDB, Windows SQL Server) in near real time with a small delay, or even analyse the data from the messages and store the results in MongoDB.
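Reading straight from the Linux MongoDB and bulk-inserting into SQL Server, the core of the nightly export might look roughly like this (hypothetical collection, field and table names, assuming pymongo and pyodbc; Quartz.NET or any other scheduler would simply run the equivalent job each night):

```python
# Nightly export sketch: read documents straight from the Linux MongoDB and
# flatten them into a SQL Server table for analytics. Collection, field and
# table names are hypothetical.
import pyodbc
from pymongo import MongoClient

orders = MongoClient("mongodb://mongo-replica:27017")["shop"]["orders"]

sql = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=analytics;DATABASE=reporting;Trusted_Connection=yes")
cur = sql.cursor()
cur.fast_executemany = True          # makes the bulk insert far faster

cur.execute("TRUNCATE TABLE dbo.OrdersImport")
rows = [(str(doc["_id"]), doc.get("customerId"), doc.get("total"), doc.get("createdAt"))
        for doc in orders.find({}, {"customerId": 1, "total": 1, "createdAt": 1})]
cur.executemany("INSERT INTO dbo.OrdersImport (MongoId, CustomerId, Total, CreatedAt) "
                "VALUES (?, ?, ?, ?)", rows)
sql.commit()
```

The flattening in the list comprehension is where any normalisation mentioned in point 3 of the plan would go.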

Syncing Data between MS SQL Server and Oracle Server

My colleague and I are trying to find the best way to sync up an Oracle database with a SQL Server database. There are about 80k+ rows with ~19 columns of data in each row. We have a linked server set up between the two servers and we have a query that works, but for 80k records the query took 10 hours to copy the records over. I can post the query we used, but I would like a fresh set of eyes. This is a new process, so we aren't trying to retrofit a solution to existing code. Like I said before, permissions aren't an issue; it is just a matter of getting the data from Point A to Point B in the quickest time. This is to be used on a ColdFusion-supported web site, and the client would like to click a button to sync up the data, but again, this is just the "wish list" of requirements we are working with.
Additional thoughts I'd like to add:
We have tried OPENQUERY and the linked server, but both took about the same time to complete.
Most columns are varchar(64), a couple are varchar(128) and a couple are varchar(12).
One suggestion someone else made was to write the data to a flat file, FTP the flat file to Point B and then import it. That is a viable solution, but the more steps we include, the more chances there are of something breaking.
Thanks in advance. I look forward to seeing y'all's solutions.
I've had more success with an SSIS package than with linked servers. If you use the Oracle DLLs, it's not too bad.
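If a scripted job is acceptable instead of (or alongside) SSIS, the same idea can be sketched directly: pull rows in large chunks and bulk-insert them, avoiding the row-by-row round trips that tend to make linked-server copies slow. A rough sketch with hypothetical table and column names, assuming the data flows from SQL Server to Oracle and that pyodbc and cx_Oracle are available (swap the roles if it flows the other way):

```python
# Chunked copy sketch: pull rows from SQL Server and bulk-insert them into
# Oracle. Table and column names are hypothetical; requires pyodbc and cx_Oracle.
import pyodbc
import cx_Oracle

src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=sqlprod;DATABASE=app;Trusted_Connection=yes")
dst = cx_Oracle.connect("app_user", "secret", "oraprod/ORCL")

read = src.cursor()
read.execute("SELECT Id, Col1, Col2 FROM dbo.SourceTable")

write = dst.cursor()
insert_sql = "INSERT INTO target_table (id, col1, col2) VALUES (:1, :2, :3)"

while True:
    batch = read.fetchmany(5000)                 # move big chunks, not single rows
    if not batch:
        break
    write.executemany(insert_sql, [tuple(r) for r in batch])
    dst.commit()
```

The batching is the point: whatever tool you end up with, 80k rows should move in a handful of round trips, not one per row.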
Have you looked at Oracle Transparent Gateway? Here is the reference manual. It drives SQL Server from Oracle instead of the other way around.
With Zidsoft CompareData you can set up the sync task visually and also schedule it to run from the command line. Disclosure: I am the developer of this product.

Suitable method for synchronising online and offline data

I have two applications, each with its own database.
1.) A desktop application with a VB.NET WinForms interface; it runs in an offline enterprise network and stores data in a central database [SQL Server].
**All data entry and other office operations are carried out and stored in the central database.
2.) The second application is built on PHP. It has HTML pages and runs as a website in an online environment. It stores all its data in a MySQL database.
**This application is accessed by registered members only, and it provides them with various reports on the data processed by the first application.
Now I have to synchronise data between the online and offline database servers. I am planning the following:
1.) Write a small program to export all the data from SQL Server [the offline server] to a file in CSV format.
2.) Log in to the admin section of the live server.
3.) Upload the exported CSV file to the server.
4.) Import the data from the CSV file into the MySQL database.
Is the method I am planning sound, or can it be tuned to perform better? I would also appreciate other good ways of doing the data synchronisation that don't involve changing the applications (i.e. moving the network application to some other one that uses the MySQL database).
What you are asking for does not actually sound like bidirectional sync (i.e. moving data both ways, from SQL Server to MySQL and from MySQL to SQL Server), which is a good thing, as it really simplifies things for you. Although I suspect your method of using CSVs (for which I assume you would use something like BCP) would work, one issue is that you are moving ALL of the data every time you run the process and basically overwriting the whole MySQL database each time. This is obviously somewhat inefficient, not to mention that during that window the MySQL database would not be in a usable state.
One alternative (assuming you have SQL Server 2008 or higher) would be to keep this technique but combine it with Change Tracking or Change Data Capture. These are SQL Server capabilities that let you determine which data has changed since a certain point in time. You could then create a process that extracts only the changes since the last run into a CSV file and applies those to MySQL. If you do this, don't forget to apply the deletes as well (a rough sketch follows below).
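With Change Tracking already enabled on the database and on the table, the extract-changes-to-CSV step might look roughly like this. A minimal sketch with hypothetical table and column names, assuming pyodbc; the watermark version is kept in a small local file:

```python
# Sketch of an incremental extract using SQL Server Change Tracking. Table and
# column names are hypothetical, and Change Tracking must already be enabled on
# both the database and dbo.Orders. The last synced version is kept in a file.
import csv
import os
import pyodbc

VERSION_FILE = "last_ct_version.txt"

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=offline-srv;DATABASE=office;Trusted_Connection=yes")
cur = conn.cursor()

# Watermark from the previous run; 0 means "everything since tracking began".
last_version = int(open(VERSION_FILE).read()) if os.path.exists(VERSION_FILE) else 0

cur.execute("""
    SELECT ct.SYS_CHANGE_OPERATION, ct.OrderId, o.CustomerId, o.Total
    FROM CHANGETABLE(CHANGES dbo.Orders, ?) AS ct
    LEFT JOIN dbo.Orders AS o ON o.OrderId = ct.OrderId
""", last_version)

with open("orders_delta.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["operation", "order_id", "customer_id", "total"])  # I / U / D
    writer.writerows(cur.fetchall())           # deleted rows come back with NULL values

cur.execute("SELECT CHANGE_TRACKING_CURRENT_VERSION()")
open(VERSION_FILE, "w").write(str(cur.fetchone()[0]))   # next run starts from here
```

On the MySQL side the 'D' rows become DELETEs and the 'I'/'U' rows become inserts or updates, which covers the "don't forget the deletes" caveat above.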
I don't think there's an off the shelf solution for what you want that you can use without customization - but the MS Sync framework (http://msdn.microsoft.com/en-us/sync/default) sounds close.
You will probably need to write a provider for MySQL to make it go - which may well be less work than writing the whole data synchronization logic from scratch. Voclare is right about the challenges you could face with writing your own synchronization mechanism...
Do look into SQL Server Integration Services (SSIS) as a good alternative.
