Merge Delta Lake data to SQL Server

I am trying to write a merge from Delta Lake (Databricks) to SQL Server and couldn't find a way to do a merge.
I am thinking along the lines of foreachPartition: open a connection for each partition and have it write accordingly, but the Spark SQL connector mostly supports append or overwrite.
Please let me know if I am missing something here.
Connector: https://learn.microsoft.com/en-us/sql/big-data-cluster/spark-mssql-connector?view=sql-server-ver15
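To illustrate what I mean, here is a rough sketch of the foreachPartition idea in PySpark, assuming pyodbc is available on the workers; the connection string, table, columns and Delta path below are just placeholders:

import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret")  # placeholder

MERGE_SQL = """
MERGE dbo.target_table AS t
USING (VALUES (?, ?)) AS s (id, value)
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);
"""

def merge_partition(rows):
    # one connection per partition, one MERGE statement per row
    conn = pyodbc.connect(CONN_STR)
    cur = conn.cursor()
    for row in rows:
        cur.execute(MERGE_SQL, row["id"], row["value"])
    conn.commit()
    conn.close()

df = spark.read.format("delta").load("/mnt/delta/my_table")  # placeholder path
df.foreachPartition(merge_partition)

Row-by-row merging like this gets slow on large partitions, so I have also seen the pattern of appending to a staging table with the connector and then running a single server-side MERGE, but I'd prefer something more direct if the connector supports it.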

Related

Bulk load of tables from SQL Server into Snowflake

I want to copy tables in various schemas from SQL Server to Snowflake. I understand that Snowflake's COPY works well for loading huge amounts of data into Snowflake, provided I have CSV data as input.
However, I am unable to figure out an efficient way to export SQL Server data in CSV format. I went through some of the threads in this forum on this topic and found that the PowerShell cmdlet Export-Csv is a good option. But does it work well with thousands of tables in SQL Server?
If not, what other option should I try to move the data from SQL Server to Snowflake? Please note that this is not a one-time data load. I am looking for a process that can run daily to load data from SQL Server to Snowflake.
Thanks in advance!
P.S.: I tried the SQL Server bcp tool, but it doesn't generate a standardized CSV file.
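For illustration, the kind of per-table CSV dump I have in mind could be sketched in Python like this (pyodbc, the connection string and the table list are placeholders on my part, not something I have settled on):

import csv
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")
cursor = conn.cursor()

# in practice the table list would come from INFORMATION_SCHEMA.TABLES
for schema, table in [("dbo", "orders"), ("sales", "customers")]:
    cursor.execute(f"SELECT * FROM [{schema}].[{table}]")
    with open(f"{schema}.{table}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        writer.writerow([col[0] for col in cursor.description])  # header row
        for row in cursor:
            writer.writerow(row)

But I am not sure something like this scales to thousands of tables on a daily schedule, which is why I am asking for other options.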

How to parse SQL files in pandas?

I am in an odd situation where I cannot connect to the server using Python. I can, however, connect to the server in other ways using SQL Server Management Studio, so from that end I can execute any query. The problem is parsing, in pandas, the data retrieved from SQL Server Management Studio. As far as I am aware, the data can be exported as csv, txt or rpt. Parsing any of these formats is a pain in the neck, and it's not always the same for all tables. My question is then: what is the fastest way to parse, in pandas, any of the file formats that SQL Server Management Studio can output? Is there a standard format it can output that is parsed the same way in pandas for all tables? Has anyone faced this problem, or is there another workaround?
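For example, the kinds of reads I have been trying look roughly like this in pandas (the file names, separators and encodings are guesses, and they differ per export):

import pandas as pd

# grid export saved via "Save Results As..." (comma separated)
df_csv = pd.read_csv("results.csv", encoding="utf-8-sig")

# "Results to File" output, often tab delimited
df_txt = pd.read_csv("results.txt", sep="\t", encoding="utf-16")

# .rpt output is fixed width; the second line is a dashed underline
df_rpt = pd.read_fwf("results.rpt", skiprows=[1])

None of these settings work consistently across tables, which is the core of my problem.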

Querying a Hive table from SQL Server 2016 or later

I'm trying to query my Hortonworks cluster Hive tables from SQL Server. My scenario below:
HDP 2.6, Ambari, HiveServer2
SQL Server 2016 Enterprise
Kerberos configuration for secure logins in HDP
I was reading about the PolyBase service in SQL Server 2016 and, I suppose, later versions. However, according to the documentation, what this service provides in SQL Server is a bridge to reach my HDFS and recreate external tables based on that data source.
What I'm expecting instead is to query Hive objects as if they were SQL Server objects, for example through a linked server.
Does anyone have an example, or know whether this is possible between SQL Server and Hive?
Thanks so much
Hive acts more as a job compiler than a database. This means every SQL statement you write will be translated into a job for Hadoop, sent over to the cluster and executed there. From the user's perspective it looks like querying a table.
The already mentioned approach of reading the HDFS data source and re-creating it in SQL Server is the correct one. Since Hive and the database server are different technologies, something like a linked server seems technically not possible to me.
Hive nowadays provides a JDBC interface which can be used to connect to it. But even with Hive JDBC, every query will end up as a cluster job for distributed computing, running over the files in HDFS, creating a result set and presenting it to you.
If you want to query Hive from SQL Server, you can download an ODBC driver (Microsoft or Hortonworks) and create a Data Source Name (DSN) for Hive. In the Advanced options, check Use Native Query. Then just create a new linked server in SQL Server whose data source is the same name as the DSN in the ODBC driver.
Write an openquery statement, something like:
select top 100 * from
openquery(HadoopLinkedServer,
'select column1, column2 from databaseInHadoop.tableInHadoop')

Migrating from SQL server to Neo4j with Pentaho Kettle Spoon

I want to migrate from SQL Server to Neo4j. I used CSV for this, but I need an ETL tool to solve this problem in the simplest way.
For this reason I use Pentaho Kettle (Spoon).
I used this to connect to Neo4j with Pentaho Kettle Spoon.
How can I migrate from SQL Server to Neo4j with Pentaho Kettle Spoon?
Which tools within Pentaho Kettle Spoon can help me?
I faced this problem and was able to solve it. :)
First, you need to add a Table Input step to get the records from SQL Server; then you can add an Execute SQL Script step from the Scripting tools.
Create your transformation from Table Input to Execute SQL Script, then use Get fields and check:
Execute for each row?
Execute as a single Statement
Then you can add your Cypher query, like this:
CREATE (NodeName:NodeLabel {field1: ?, field2: ?, field3: ?, ...})
Execute the transformation and enjoy it! :)
--------------------------------------------------------
Edited:
The LOAD CSV command in Neo4j is much faster than creating all the nodes one by one. You can take advantage of LOAD CSV in Pentaho Kettle Spoon. For this we need two transformations: the first transformation exports the data to CSV, and the second transformation loads the CSV into Neo4j.
For the first transformation:
add a Table Input and a Text File Output step to the transformation, and configure the connection string and their other settings.
To configure the Neo4j connection string, refer to this.
For the second transformation:
add an Execute SQL Script step to the transformation, configure the connection string, and write the code below in it:
LOAD CSV FROM 'file:///C:/test.csv' AS Line
CREATE (NodeName:NodeLabel {field1: Line[0], field2: Line[1], field3: Line[2], ...})
Finally, create a job and add the transformations to it.

How do I convert my SQL Server data into a SAS table?

I am using Enterprise Miner 6.2 and want to create a data source, but my only option is a SAS table. How do I go about exporting SQL Server or Excel data into a SAS table?
SAS has many ways of connecting to and/or reading data from disparate sources. I haven't used Enterprise Miner, so I'm not sure which of SAS' methods are available to you directly from within EM, but it's likely there will be someone at your site who has some interface to Base SAS and who can help you/advise what data access products are installed and how you can use them.
For SQL Server data, SAS/Access to SQL Server or SAS/Access to OLE DB will allow you to read directly from SQL Server tables in place. Alternatively, someone could provide you with a dump of the data you need from the SQL Server database.
For Excel data, there are also SAS/Access products, but SAS also has native capabilities to read in the data if saved as, for example, a .csv or .txt file.
To help answer you further, perhaps you could come back with some details about which SAS products/interfaces are available to you?
