Import data from MySQL to Neo4j using Pentaho Kettle Spoon - sql-server

I was trying to migrate my MySQL database to Neo4j, and for that I was using Pentaho Kettle. I have also downloaded the Neo4j plugins for Kettle.
I am a beginner with Kettle and don't know an efficient way to bulk load data from a MySQL database into Neo4j using it. I am thinking of doing the following:
First, take input from a table by writing an SQL query.
Then connect it to the Execute Script step and write a Cypher query that creates the nodes and relationships corresponding to the table schema.
Is there a better way of doing this? My database is large, so I would have to write many SQL and Cypher queries to import my SQL database into Neo4j. I am looking for some bulk-load feature in Pentaho Kettle Spoon.
Here are snapshots of my transformation (screenshots: Transformation, Input Table, Execute Script).
My PDI version is 8.1.
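For reference, the usual way to speed up per-row Cypher is to send rows in batches rather than one CREATE per row. The sketch below is only an illustration: it assumes the installed Neo4j plugin provides a Cypher step that can pass a batch of incoming rows as a list parameter (here called $rows), and the label and property names are made up.
// Batched alternative to one CREATE per row.
// Assumes $rows is a list of maps supplied by the calling step; Person/id/name are placeholder names.
UNWIND $rows AS row
CREATE (p:Person {id: row.id, name: row.name})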

Related

Data Migration from On Prem to Azure SQL (PaaS)

We have an on-prem SQL Server database (SQL Server 2017, compatibility level 140) that is about 1.2 TB. We need to do a repeatable migration of just the data to Azure SQL Database (PaaS). The on-prem database has procedures and functions that do cross-database queries, which rules out the Data Migration Assistant. Many of the tables we need to migrate are system-versioned tables (just to make this more fun). Ideally we would like to move the data into a different schema of a different database so we can avoid the use of external tables (we are worried about performance).
Moving the data is just the first step, as we also need to run an ETL job on the data to massage it into the new table structure.
We are looking at using ADF (Azure Data Factory), but it has trouble with versioned tables unless we turn versioning off first.
What other options can we look at to do this quickly and repeatably? Do we need to change to IaaS or use a third-party tool? Did we miss options in ADF that handle this?
If I summarize your requirements, you are not just migrating a database to the cloud but the complete architecture of your SQL Server, which includes:
1.2 TB of data,
Continuous data migration afterwards,
Procedures and functions for cross DB queries,
Versioned tables
Points 1, 3 and 4 can be handled by creating a .bacpac file with SQL Server Management Studio (SSMS), exporting it from on-premises to Azure Blob Storage, and then importing that file into Azure SQL Database. The .bacpac file created in SSMS lets us include all system-versioned tables, which we can then import into the destination database.
Follow this third-party tutorial by sqlshack to migrate data to Azure SQL Database.
The stored procedures can also be moved using SQL scripts. Follow the steps below:
Go to the server in Management Studio.
Select the database and right-click on it, then go to Tasks.
Select the Generate Scripts option under Tasks.
Once the wizard starts, select the stored procedures you want to copy and generate a script file for them.
Then run the script from that file against the Azure SQL database, which you can log in to from SSMS.
The repeatable migration of the data is the challenging part. You can try Change Data Capture (CDC), though I'm not sure that is exactly what your requirement calls for. You can enable CDC at the database level using the command below:
USE <databasename>;
EXEC sys.sp_cdc_enable_db;
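CDC then has to be enabled for each table you want to track as well; a minimal sketch, with placeholder schema and table names:
-- Enable CDC for one table (dbo.MyTable is a placeholder).
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'MyTable',
    @role_name     = NULL;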
To know more, refer to https://www.qlik.com/us/change-data-capture/cdc-change-data-capture#:~:text=Change%20data%20capture%20(CDC)%20refers,a%20downstream%20process%20or%20system.

Migrating from SQL Server to Neo4j with Pentaho Kettle Spoon

I want to migrate from SQL Server to Neo4j. I have used CSV for this, but I need an ETL tool to solve this problem in the simplest way.
For this reason I use Pentaho Kettle Spoon.
I used this to connect to Neo4j with Pentaho Kettle Spoon.
How can I migrate from SQL Server to Neo4j with Pentaho Kettle Spoon?
Which tools in Pentaho Kettle Spoon can help me?
I faced this problem and was able to solve it. :)
First you need to add a Table Input step to get the records from SQL Server, then add an Execute SQL Script step from the Scripting category (a sample Table Input query is sketched after these steps).
Create a hop from the Table Input to the Execute SQL Script step, then Get fields and tick:
Execute for each row?
Execute as a single statement
Then you can add your Cypher query, like this:
CREATE (NodeName:NodeLabel {field1: ?, field2: ?, field3: ?, ...})
Execute the transformation and enjoy it! :)
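For context, the Table Input step only needs a query whose column order matches the ? placeholders in the Cypher statement above. A minimal sketch, with made-up table and column names:
-- Hypothetical source query; the column order must line up with the ? markers.
SELECT field1, field2, field3
FROM dbo.SourceTable;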
--------------------------------------------------------
Edited:
The LOAD CSV command in Neo4j is much faster than creating all the nodes one by one, and you can take advantage of LOAD CSV from Pentaho Kettle Spoon. For this we need two transformations: the first transformation exports the data to CSV and the second transformation loads the CSV into Neo4j.
For the first transformation:
Add a Table Input and a Text File Output step to the transformation, and configure the connection string and other settings for each.
To configure the Neo4j connection string, refer to this.
For the second transformation:
Add an Execute SQL Script step to the transformation, configure the connection string, and write the code below in it:
LOAD CSV FROM 'file:///C:/test.csv' AS Line
CREATE (NodeName:NodeLabel {field1: Line[0], field2: Line[1], field3: Line[2], ...})
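If the file is large, a variant with headers and periodic commits may load noticeably faster. This is only a sketch: the file path and property names are assumptions, and USING PERIODIC COMMIT applies to Neo4j 3.x/4.x (Neo4j 5 replaces it with CALL ... IN TRANSACTIONS).
// Commit every 1000 rows instead of building one huge transaction.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///C:/test.csv' AS line
CREATE (n:NodeLabel {field1: line.field1, field2: line.field2, field3: line.field3})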
Finally, create a Job and add both transformations to it.

Sqoop Export into SQL Server vs. Bulk Insert into SQL Server

I have a question regarding Apache Sqoop. I have imported data into HDFS files using the Apache Sqoop import facility.
Next, I need to put the data back into another database (basically I am performing a data transfer from one database vendor to another) using Hadoop (Sqoop).
To put the data into SQL Server, there are two options:
1) Use the Sqoop export facility to connect to my RDBMS (SQL Server) and export the data directly.
2) Copy the HDFS data files (which are in CSV format) to my local machine using the copyToLocal command and then run BCP (or a BULK INSERT query) on those CSV files to put the data into the SQL Server database.
I would like to understand which is the correct approach, and which of the two is faster: the bulk insert, or a Sqoop export from HDFS into the RDBMS?
Are there any other ways, apart from the two mentioned above, that can transfer the data faster from one database vendor to another?
I am using 6-7 mappers (around 20-25 million records need to be transferred).
Please suggest, and kindly let me know if my question is unclear.
Thanks in advance.
If all you are doing is ETL from one vendor to another, then going through Sqoop/HDFS is a poor choice. Sqoop makes perfect sense if the data originates in HDFS or is meant to stay in HDFS. I would also consider Sqoop if the data set were so large as to warrant a large cluster for the transformation stage, but a mere 25 million records is not worth it.
With SQL Server it is imperative, on large imports, to achieve minimal logging, which requires bulk insert. Although 25 million rows is not so large as to make the bulk option imperative, AFAIK neither Sqoop nor Sqoop2 supports bulk insert for SQL Server yet.
I recommend SSIS instead. It is much more mature than Sqoop, it has a Bulk Insert task, and it has a rich transformation feature set. Your small import is well within the size SSIS can handle.
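To illustrate the bulk path the answer refers to, here is a minimal BULK INSERT sketch; the table name, file path and delimiters are assumptions, and TABLOCK is what allows minimal logging under the simple or bulk-logged recovery model:
-- Load one exported CSV file into the target table (names and path are placeholders).
BULK INSERT dbo.TargetTable
FROM 'C:\data\part-00000.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);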

What is the best way to move data between PostgreSQL and SQL Server databases

If we have the same database schema in a database on PostgreSQL and on SQL Server (tables, primary keys, indexes and triggers are the same), what would be the best way to move data from one database to the other? Currently we have an in-house .NET program that does the following through two ODBC connections:
read a row from source database table 1
construct an insert statement
write a row into destination database table 1
Go to 1 if there are more rows in the table
Move to next table in database and go to 1
Needless to say, this is a very slow process and I would be interested to know if there is a better/faster solution.
If it's a "one off" migration, there's a tool you get with SQL Server which allows you to move data around between databases (I'm not on a Windows machine right now, so can't tell you what it's called - something like import/export tool).
If it's an ongoing synchronisation, you can look at the MS Sync framework, which plays nice with SQL Server and Postgres.
The answer is bulk export and bulk loading. You can go much faster by using the COPY command in PostgreSQL (https://www.postgresql.org/docs/current/static/sql-copy.html) to dump the tables to CSV, and then using bulk insert in SQL Server (see Import CSV file into SQL Server). A rule of thumb is to harness parallelism for the process: check whether you can load the CSV files into SQL Server in parallel, and if you have many tables you can also parallelise at the level of separate tables. By the way, loading or migrating data row by row is one of the slowest ways to do it.
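To make the export side concrete, here is a minimal sketch of the PostgreSQL COPY step; the table name and path are assumptions, and the SQL Server side can then use a BULK INSERT like the sketch in the previous answer:
-- PostgreSQL side: dump one table to CSV (server-side COPY needs file-write privileges;
-- psql's \copy is the client-side equivalent).
COPY my_table TO '/tmp/my_table.csv' WITH (FORMAT csv);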

Best way to migrate export/import from SQL Server to Oracle

I'm faced with needing access, for reporting, to some data that lives in Oracle and other data that lives in a SQL Server 2000 database. For various reasons these live on different sides of a firewall. Now we're looking at doing an export/import from SQL Server to Oracle and I'd like some advice on the best way to go about it... The procedure will need to be fully automated and run nightly, so that excludes using the SQL Developer tools. I also can't make a live link between the databases from our (Oracle) side, as the firewall is in the way. The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting.
What I'm thinking of is writing a monster query for SQL Server (which I mostly have already) that denormalises and reads out the data to a flat file, run as a scheduled task through the SQL Server equivalent of sqlplus, dumping into a well-known location; then on the Oracle side a cron job copies the file down, loads it with SQL*Loader and rebuilds indexes, etc.
This is all doable, but very manual. Is there one tool, or a combination of FOSS or standard Oracle/SQL Server tools, that could automate this for me? The irreducible complexity is the query on one side and building the indexes on the other, but I would love not to have to write the CSV-dumping detail or the SQL*Loader script: just say "dump this view out to CSV" on one side, and on the other "truncate and insert into this table from CSV", without worrying about mapping column names and all the other arcane sqlldr voodoo...
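For what it's worth, one way to sidestep hand-written SQL*Loader control files on the Oracle side is to read the CSV through an external table and then do the truncate-and-insert in plain SQL. This is only a sketch: the directory object, file name, columns and target table are all assumptions.
-- Assumes a directory object pointing at the well-known location, e.g.:
-- CREATE DIRECTORY data_dir AS '/path/to/well_known_location';
CREATE TABLE report_ext (
  col1  VARCHAR2(100),
  col2  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('report.csv')
);

-- Nightly load: clear the reporting table and append the fresh extract.
TRUNCATE TABLE report_target;
INSERT /*+ APPEND */ INTO report_target SELECT col1, col2 FROM report_ext;
COMMIT;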
best practices? thoughts? comments?
Edit: I have 50+ columns, all of varying types and lengths, in my dataset, which is why I'd prefer not to have to write out how to generate and map each single column...
"The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting."
You are really looking for an ETL tool. If you have no money in the till, I suggest you check out the open-source Talend and Pentaho offerings.
