I have two databases, A and B (B is a replica of A). A live web application is entering data into A, and I want the entries made in A to be reflected in B (i.e., the changes in A should automatically appear in B).
My sole purpose is synchronizing the two databases, and while searching for a way to do that I came across Talend. Specifically, I found Talend MDM and installed it, but from what I have read I cannot tell whether it does database synchronization or not. Since there are other Talend products such as ESB, Data Integration, etc., which of them is actually meant for syncing?
Please advise.
IMHO, if you are looking for data replication between two databases having the same structure, then Talend is not what you are looking for.
Talend is an ETL tool (Extract, Transform, and Load). It would be applicable if, in your case, your B database had a different structure than A. For that particular use case, you would use Talend to define some processing rules:
How do I extract data from A (Extract)
How do I transform A's data into B's data (Transform)
How do I store B's data (Load)
As mentioned by @jayadevan above, I would definitely look at the built-in replication offered by your database. For example, if both A and B happen to be MySQL, native replication can be set up with a few statements, as sketched below.
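A minimal sketch, assuming both databases are MySQL with binary logging enabled on A; the host, credentials, and log coordinates are placeholders (read the real coordinates from SHOW MASTER STATUS on A):

-- On A (the source): create an account the replica will connect as.
CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On B (the replica): point it at A and start replicating.
CHANGE MASTER TO
    MASTER_HOST = 'a.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'binlog.000001',
    MASTER_LOG_POS = 4;
START SLAVE;

From then on, every change committed on A is applied to B automatically; no ETL tool is involved.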
Related
I have tables in three databases whose data I want to copy to another database in an automated way, and the data is quite large. My servers are running on AWS. What is the simplest and most reliable way to do this?
Edit
I want them to stay in sync (as a DevOps engineer, I want this to be an automated process).
The databases are all MySQL, and everything runs on AWS EC2. The data ranges between 100 GiB and 200 GiB.
Currently, Maxwell takes the data from the tables and pushes it to Kafka, and a script written in Java feeds the other database.
I believe you can use AWS Database Migration Service (DMS) to replicate tables from each source into a single target. You would have a single target endpoint and three source endpoints. You would have three replication tasks that would take data from each source and put it into your target. DMS can keep data in sync via ongoing replication. Be sure to read up on the documentation before proceeding as it isn't the most intuitive service to use, but it should be able to do what you are asking.
https://docs.aws.amazon.com/dms/latest/userguide/Welcome.html
We have a requirement to move data between different database instances on a regular basis (e.g., some customers are willing to pay more for better performance). So this is not going to be a one-off.
The database tables have referential integrity. Is there a way this can be done without rewriting a SQL script (or some other method) every time we migrate a customer's data?
I came across this: How to move data between multiple database's table while maintaining foreign-key relationships/referential integrity?. However, it appears that we have to write a script every time we migrate data (please correct me if I misunderstood the answer in that thread).
Thanks
Edit:
Both servers are running SQL Server 2012 (the same version). It's an Azure SQL database.
They are not necessarily linked (no firewall between them)
We are only transferring some data, not the whole database. This is only for certain customers who opted to pay more.
The schema is exactly the same in both databases.
Preyash - please see the documentation on the Split-Merge tool. The Split-Merge tool enables you to move data between databases, as you have described, based on a sharding key (e.g., customer ID). One modification that you will need for your application is to add a shard map (i.e., a database that understands the global state of which customer resides in which database).
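At its simplest, a shard map can be pictured as a lookup table, though the actual Split-Merge/Elastic Database tooling manages a richer structure for you; a hypothetical sketch:

-- Hypothetical minimal shard map: which database holds each customer.
CREATE TABLE ShardMap (
    CustomerId   INT PRIMARY KEY,
    DatabaseName SYSNAME NOT NULL
);
-- The application looks up DatabaseName by CustomerId before connecting,
-- and the row is updated whenever a customer is moved.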
Have a look at Azure Data Sync. It is much more aligned with your requirements, though you may end up with another Azure SQL DB to maintain as a hub: Azure Data Sync follows a hub-and-spoke pattern and will let you configure flexible directional syncs with a syncing gap of a few minutes. It is simpler, and you can set it up very quickly without any scripts, as you wanted.
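If you do end up scripting the move yourself, note that the script only has to be written once, parameterized by customer; the same statements serve every migration. A minimal sketch with hypothetical Customers and Orders tables, assuming cross-database queries are available (on Azure SQL Database that means elastic query, or routing the rows through an application):

-- Insert parents before children so foreign keys are satisfied.
DECLARE @CustomerId INT = 42;  -- the customer being moved

INSERT INTO TargetDb.dbo.Customers (CustomerId, Name)
SELECT CustomerId, Name
FROM SourceDb.dbo.Customers
WHERE CustomerId = @CustomerId;

INSERT INTO TargetDb.dbo.Orders (OrderId, CustomerId, Total)
SELECT OrderId, CustomerId, Total
FROM SourceDb.dbo.Orders
WHERE CustomerId = @CustomerId;

-- Clean up the source in reverse order (children first).
DELETE FROM SourceDb.dbo.Orders    WHERE CustomerId = @CustomerId;
DELETE FROM SourceDb.dbo.Customers WHERE CustomerId = @CustomerId;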
Data warehousing seems to be a big trend these days, and is very interesting to me. I'm trying to acquaint myself with its concepts, and am having a problem "seeing the forest for the trees", because all of the data warehouse models and descriptions I can find online are theoretical and don't give examples with actual technologies being used. I'm a contextual learner, so abstract, theoretical explanations don't really help me out all that much.
Now, there seem to be many "data warehousing models", but all of them share some similar characteristics. There is usually an "ODS" (operational data store) that aggregates data from multiple sources into the same place. A process known as "ETL" then converts data in this ODS into a "data vault", and again into "data marts" and/or "strategy marts".
Can someone provide an example of the technologies that would be used for each of these components (ODS, ETL, data vault, data/strategy marts)?
It sounds like the ODS could just be any ordinary database, but the data vault seems to have some special things going on, because these "marts" pull their data from it.
ETL is the biggest thing I'm choking on by far. Is this a language? A framework? An algorithm?
I think once I see a concrete example of what's going on at each step of the way, I'll finally get it. Thanks in advance!
ETL is a process. The abbreviation stands for Extract-Transform-Load, which describes what is done with the data during the process. The process can be implemented anywhere you need to create a bridge between two systems with different data formats. First, you pull (extract) data from a source system (database, flat files, web service, etc.). Then the data is processed (transform) to comply with the format of a target storage (again, this can vary: databases, files, API calls). During the transform step, further actions can be performed on the data set, such as enrichment with data from other sources, cleansing, and improving its quality. The last step is loading the transformed data into the target storage.
Typically, an ETL process is employed for loading a data warehouse, migrating data from one system or database to another when moving from a legacy system to a new one, or synchronizing data between two or more systems. It is also used as an intermediate layer in broader MDM and BI solutions.
In terms of specific software, there are many ETL tools on the market, ranging from robust solutions by big players such as Informatica, IBM DataStage, and Oracle Data Integrator, to more affordable and open-source providers such as CloverETL, Talend, or Pentaho. Most of these tools offer a GUI where the flow and processing of data is defined through diagrams.
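As a toy illustration of the three steps in plain SQL (all table and column names here are hypothetical, and a real ETL tool would usually move the data between two different systems rather than within one database):

-- Extract, transform, and load in a single statement.
INSERT INTO dw_orders (order_id, customer_id, order_total, load_time)
SELECT o.order_id,                 -- extract: read rows from the source
       o.customer_id,
       o.quantity * o.unit_price,  -- transform: derive the order total
       CURRENT_TIMESTAMP           -- enrich: stamp the load time
FROM staging_orders AS o
WHERE o.quantity IS NOT NULL
  AND o.unit_price IS NOT NULL;    -- cleanse: skip incomplete rows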
For Microsoft SQL Server 2005 and later, the ETL tool is called SSIS (SQL Server Integration Services). If you install at least the Standard edition of SQL Server, you get the Business Intelligence Development Studio, with which you can design your data flows. Basically, what an ETL tool does is take data from one or more sources (tables, flat files, ...), then transform it (add columns, join, filter, map to different data types, etc.), and finally store it again in one or more tables or files.
To get a basic understanding of how an ETL tool works, you can watch e.g. this video or this one (both from midnightdba). They're a bit lengthy, but you get an idea. They certainly helped me in understanding the basic functionality of an ETL tool.
Unfortunately, I have not yet dug into other platforms or tools.
I'd highly recommend checking out some of the books by Ralph Kimball and Margy Ross (The Data Warehouse Toolkit, The Data Warehouse Lifecycle Toolkit) for an introduction to data warehousing.
My company's data warehouse is built using the Oracle Warehouse Builder tool for ETL. The OWB is a GUI tool that generates PL/SQL code on the database to manipulate the data. After manipulation and cleansing, the data is published to an Oracle datamart. The datamart is a database instance that users access for ad-hoc querying via Oracle Discoverer (Java software).
If one has a number of databases (due to separate application front-ends) that provide a complete picture - for example a CRM, accounting, and product database - what methods are available to centralize/abstract this data for easy reporting?
Essentially, I'm wondering if there is a way to automatically pull data from multiple databases into a central repository that is continuously updated from the three databases and which can be used for reporting?
I'm also open to alternative best-practice suggestions.
Look into building a Data warehouse.
It is difficult to provide very specific info, since no version of SQL Server is given, but SQL Server Data Warehouse Cribsheet has some general information.
You can have views that join data from all your other databases, as in the sketch below.
Or do you want replicated data on all servers?
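On the views suggestion: a minimal sketch, assuming SQL Server with hypothetical Crm and Accounting databases on the same instance (other engines offer similar cross-database or federated mechanisms):

-- A reporting view that joins tables from two databases.
CREATE VIEW dbo.CustomerRevenue AS
SELECT c.CustomerId,
       c.Name,
       i.InvoiceTotal
FROM Crm.dbo.Customers AS c
JOIN Accounting.dbo.Invoices AS i
    ON i.CustomerId = c.CustomerId;

Reporting queries then treat the view as if the data lived in one place, though each query still runs against the source databases at query time; a true warehouse copies the data instead.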
I have a database in MySQL and another database that runs on MS SQL.
MySQL is the backend database for my website, which runs on Joomla.
I have an ERP running my store. This ERP is made by a third party in .NET.
A table called orders gets updated whenever a user places an order on my website.
The order details must then be pushed to the orders table in my ERP.
The table structures in the two databases are totally different, so I will do the mapping myself.
My questions are:
How frequently should I transfer the data from my MySQL database to MS SQL?
Someone suggested that I could write a web service that would periodically pump data to my table in the ERP, so I started thinking about NuSOAP web services. Is this the right way, or is there a better way to do it?
I will also have to retrieve inventory-related information from my ERP to my MySQL database.
1: Depends on how often your data is changing, and how often you need to sync up (i.e., depends on your business).
2 & 3: A web service to transfer data could work just fine. But unless you're trying to come up with a general solution, this sounds like a lot more trouble than it's worth.
If I were doing this, I would export the data from SQL Server to a file, then import that file into MySQL (mysql my_db < file.sql).
Getting data OUT of SQL Server in this format isn't so easy (there's no equivalent to mysqldump on SQL Server). But check out this question for some ideas.
If the data itself is compatible between systems (if the columns are equivalent data types), you can overcome the table structure differences by just creating a query in SQL Server which exports the data in the correct order.
In fact, you may be able to create a query whose output is the file.sql for import into MySQL. For example, a query such as:
SELECT CONCAT(
    'INSERT INTO MYTABLE VALUES (',
    myColumn,
    ',',
    myOtherColumn,
    ');'
) AS SQL_STATEMENT
FROM MySourceTable  -- placeholder for the table being exported;
                    -- string columns would additionally need quoting
Produces output something like:
INSERT INTO MYTABLE VALUES (myColumnValue1, myOtherColumnValue1);
INSERT INTO MYTABLE VALUES (myColumnValue2, myOtherColumnValue2);
....
I've exported data from sql server that way on at least one occasion.
How up to date do you need the MS SQL database to be? That is going to be the deciding factor.
I don't see any huge advantage to this being a web service.
This isn't a question.
Deciding how often you transfer the orders across is a business decision, not a technical one. But it is hard to see what competitive advantage you might gain from not processing your customers' orders as soon as possible, so it ought to be a no-brainer.
Without knowing a lot more about your infrastructure and architecture, we cannot give you definitive advice about the approach. I would expect a decently written ERP package to include interfaces for importing and exporting information; alas, such expectations are often confounded. If you do need to write your own interface, avoid web services. Unless you have a very peculiar set-up, all a web service will mean is that it takes longer to satisfy your customers, and I think we have already agreed that is not a good idea.
Considerations for a Synchronization API:
You need to track which new orders have not been transferred to the ERP database. A flag is clumsy; a queue is perhaps more elegant (see the sketch after this list).
Have a job/daemon polling continuously to identify orders which need to be transferred, and transfer them in near-real time.
Have a plan for handling the unavailability of the ERP database.
Construct the mapping in a modular fashion, so you do not have to rewrite the entire thing just because of a change to the structure of one of your tables.
The inventory data will probably have to be pulled from the MySQL database, as it seems unlikely that the third party will allow you to put code into their database. But it's worth reading the contract.
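To make the queue idea concrete, here is a minimal sketch on the MySQL side; every table and column name is hypothetical:

-- An outbox/queue table recording which orders still need transferring.
CREATE TABLE order_transfer_queue (
    order_id    INT PRIMARY KEY,
    enqueued_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    transferred TINYINT(1) NOT NULL DEFAULT 0
);

-- Enqueue each new order as it is placed.
CREATE TRIGGER enqueue_order AFTER INSERT ON orders
FOR EACH ROW
    INSERT INTO order_transfer_queue (order_id) VALUES (NEW.order_id);

-- The polling job repeatedly picks up pending orders...
SELECT o.*
FROM orders AS o
JOIN order_transfer_queue AS q ON q.order_id = o.order_id
WHERE q.transferred = 0;

-- ...maps them to the ERP schema, and marks each one done once the
-- corresponding insert into the MS SQL side has succeeded.
UPDATE order_transfer_queue SET transferred = 1 WHERE order_id = 123;  -- example id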
Okay, based on the replies I got, I will rephrase my question with more details.
I have an eCommerce portal running on Joomla and VirtueMart (never mind what they are!).
The backend database here is MySQL.
I have an ERP written in .NET by my friend, and the DB used there is MS SQL.
Now I am going to host my eCommerce portal.
The following are the actions that will take place, and the questions related to those actions.
Action 1:
At the start of the day, my friend updates the inventory of various products in an ERP table.
Question 1:
I want the updated inventory from the ERP (MS SQL) to be reflected in my website database (MySQL) automatically. How do I do it?
Action 2:
People come to my site and place orders. These orders are stored in an orders table on my website (MySQL).
Question 2:
I want this order-related data from my website (MySQL) to be updated in a corresponding table in my ERP (MS SQL).
Moreover, the DB structures of the tables in my ERP and my website are completely different.