Is it possible to use the JDBC connector (https://docs.databricks.com/data/data-sources/sql-databases.html) to get data from a local SQL Server (and export it to Delta Lake)?
Using:
jdbcUrl = "jdbc:mysql://{0}:{1}/{2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": "com.mysql.jdbc.Driver"
}
Irrespective of whether you have MySQL or SQL Server, the Databricks driver supports both, as outlined in the article you linked.
From the perspective of access to on-prem, the answer is yes; however, Databricks must be able to connect to it. Usually this means deploying your Databricks clusters into a VNET that has access to your on-prem resources, e.g. following the guidance here.
Alternatively you could use Azure Data Factory self-hosted integration runtime to move the data to a staging/"Bronze" storage in the cloud and pick it up with a Databricks task to move it to a Delta table.
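For illustration, here is a minimal PySpark sketch of that pattern, assuming the cluster can reach the on-prem host over the VNET; all hostnames, secret-scope names, and table names below are placeholders, not values from the linked article:

# Placeholders throughout - replace with your own host, database, and tables.
jdbcHostname = "onprem-sql.internal.example.com"   # must be reachable from the cluster's VNET
jdbcPort = 1433
jdbcDatabase = "SalesDB"
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)

# Pull credentials from a (hypothetical) Databricks secret scope
jdbcUsername = dbutils.secrets.get(scope="jdbc", key="username")
jdbcPassword = dbutils.secrets.get(scope="jdbc", key="password")

connectionProperties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}

# Read the source table over JDBC and persist it as a Delta table
df = spark.read.jdbc(url=jdbcUrl, table="dbo.Orders", properties=connectionProperties)
df.write.format("delta").mode("overwrite").save("/mnt/delta/orders")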
I am not able to copy data from ADLS Gen2 to SQL Server (it's not Azure SQL) using ADF.
What I have done is this:
Created two datasets: an ADLS Gen2 dataset (src) and a SQL Server dataset (tgt).
But it doesn't allow me to choose tgt as my sink, though it does list sinks to choose from when the dataset is from Azure SQL or Data Lake.
You will have to create an Integration Runtime and configure the same in your SQL Server Linked Service in ADF.
SQL Server is supported as a sink; you can find the details here.
As SQL Server runs in a different compute environment than Azure, you will have to create an IR (Integration Runtime) so that Azure and SQL Server can communicate with each other.
Integration Runtime
If you want to create an on-premises SQL Server dataset, you must install the self-hosted integration runtime manually:
A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It also can dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. The installation of a self-hosted integration runtime needs an on-premises machine or a virtual machine inside a private network.
If you're using Data Flow, note that Data Flow doesn't support the self-hosted integration runtime, so we can't use SQL Server as a connector there:
You must use the Copy activity instead.
HTH.
I have an Amazon EC2 instance on AWS where SQL Server is installed. I want to migrate the SQL Server data and SSIS packages to Azure VMs.
I cannot afford any loss of information.
What would be the best way to do so ?
I would recommend using Azure Blob Storage as the transport.
SQL Server out of the box can create backups and restore databases directly to/from Azure Blob containers.
For instance:
-- Storage account name:
CREATE CREDENTIAL sql_backup_credential
WITH IDENTITY = 'your-storage-account',
-- Storage account key from the portal:
SECRET = 'pvv99UFQvuLadBEb7ClZhRsf9zE8/OA9B9E2ZV2kuoDXu7hy0YA5OTgr89tEAqZygH+3ckJQzk8a4+mpmjN7Lg==';

BACKUP DATABASE Test
TO URL = N'https://your-storage-account.blob.core.windows.net/your-container/test.bak'
WITH CREDENTIAL = 'sql_backup_credential';
So, this way all user databases + SSISDB can be transferred from one isolated box to another.
Of course, configure the firewall settings of the SQL Server VMs to allow outbound and inbound HTTPS connections.
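On the receiving Azure VM, the restore side can be scripted as well. Below is a hedged Python sketch using pyodbc (assuming pyodbc and the Microsoft ODBC driver are installed on the machine running it; server names, credentials, and the storage key are placeholders):

import pyodbc

# autocommit is required because RESTORE cannot run inside a transaction
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-azure-vm;DATABASE=master;UID=your-user;PWD=your-password",
    autocommit=True)

# Recreate the credential on the target instance, then restore from the container
conn.execute("CREATE CREDENTIAL sql_backup_credential "
             "WITH IDENTITY = 'your-storage-account', "
             "SECRET = '<storage-account-key>';")
conn.execute("RESTORE DATABASE Test "
             "FROM URL = N'https://your-storage-account.blob.core.windows.net/your-container/test.bak' "
             "WITH CREDENTIAL = 'sql_backup_credential';")
conn.close()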
I would like to connect to an on-premise database (say SQL Server) from Azure Databricks notebook, via REST API Call. Also, I would like to perform an UPSERT operation on a table in the database from the same.
Is it possible?
Kindly provide the necessary steps.
You can use JDBC to connect to an on-premise database (say SQL Server) from Azure Databricks notebook.
You could reference this Azure document: SQL Databases using JDBC:
This article covers how to use the DataFrame API to connect to SQL databases using JDBC and how to control the parallelism of reads through the JDBC interface. This article provides detailed examples using the Scala API, with abbreviated Python and Spark SQL examples at the end.
You could also reference the document Connect your Azure Databricks Workspace to your on-premises network that Sebastian Inones provided in the comment.
Also see: Connecting to on-prem SQL Server through Azure Databricks
Hope this helps.
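The linked article covers JDBC reads and plain writes; for the UPSERT part there is no single-call API, but one common pattern is to stage the rows over JDBC and then issue a MERGE against SQL Server. Here is a hedged sketch of that pattern; the table names, host, credentials, and the updates_df DataFrame are assumptions for illustration, not from the article:

# Stage the incoming rows, then MERGE them into the target on the SQL Server side.
jdbcUrl = "jdbc:sqlserver://onprem-sql.internal.example.com:1433;database=SalesDB"
props = {"user": jdbcUsername, "password": jdbcPassword,
         "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"}

# 1) Land the incoming rows in a staging table
updates_df.write.jdbc(url=jdbcUrl, table="dbo.Orders_Staging",
                      mode="overwrite", properties=props)

# 2) Run the MERGE through the JVM's JDBC DriverManager (a common notebook trick)
merge_sql = """
MERGE dbo.Orders AS tgt
USING dbo.Orders_Staging AS src ON tgt.OrderId = src.OrderId
WHEN MATCHED THEN UPDATE SET tgt.Amount = src.Amount
WHEN NOT MATCHED THEN INSERT (OrderId, Amount) VALUES (src.OrderId, src.Amount);
"""
conn = spark.sparkContext._gateway.jvm.java.sql.DriverManager.getConnection(
    jdbcUrl, jdbcUsername, jdbcPassword)
try:
    conn.createStatement().execute(merge_sql)
finally:
    conn.close()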
I'm trying to use AWS QuickSight to analyse some data stored in a SQL Server database on Azure.
According to QuickSight, it can connect to a SQL Server, but whenever I try to validate the connection, the process hangs for about a minute and then comes back with 'Cannot open server "..." requested by the login. The login failed.'
I initially suspected that this was an issue with the firewall on the MS SQL server on Azure. Accordingly, I looked up the IP regions here
The region I am using is US West (Oregon) (us-west-2), so I white-listed the IP range 54.70.204.128 to 54.70.204.159 - but I am still getting exactly the same 1 minute (or so) pause, before the error comes back in Quick Sight.
The exact error is:
Your database generated a SQL exception. This can be caused by query timeouts, resource constraints, unexpected DDL alterations before or during a query, and other database errors. Check your database settings and your query, and try again.
If I click "Show Details" then I get a further message saying:
Error details
region: us-west-2
timestamp: XXX
requestId: XXX
sourceErrorCode:40532
sourceErrorMessage: Cannot open server "..." requested by the login. The login failed. ClientConnectionId:*
sourceErrorState: S0001
sourceException: com.microsoft.sqlserver.jdbc.SQLServerException
sourceType: SQLSERVER
Obviously some of the above has been redacted.
I cannot believe that QuickSight cannot connect to an Azure MS SQL database, so I'm wondering if anyone else had had this problem, and what their solution was?
I myself had this issue and it seems many others did. However, as noted above, there is little to no documentation that provides the steps to connect Quicksight and Azure Sql Server.
The issues for me were primarily in the details I gave QuickSight. Most connections to an Azure SQL Server database work seamlessly with your basic information:
Server, Port, Database Name, Username, Password
However, AWS QuickSight was trying to connect to my Azure SQL Server with JDBC authentication.
JDBC authentication requires your Username input to be "username"@"servername".
Example of Correct Connection:
Server: "servername".database.windows.net
Port: 1433
Database Name: "databasename"
Username: "username"#"servername"
Password: "password"
Lastly, I turned off the SSL checkbox. It did not work with SSL connection.
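If you want to sanity-check that login format outside QuickSight first, a small pyodbc test works; everything below is a placeholder, and it assumes the Microsoft ODBC driver is installed on the machine running it:

import pyodbc

# Azure SQL expects the user@server login form; Encrypt=yes is standard for Azure SQL
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=servername.database.windows.net,1433;"
    "DATABASE=databasename;"
    "UID=username@servername;"
    "PWD=password;"
    "Encrypt=yes;")
print(conn.execute("SELECT @@VERSION").fetchone()[0])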
Please see this document: Relational Data Sources
You can use any of the following relational data stores as data sources for Amazon QuickSight:
Amazon Athena
Amazon Aurora
Amazon Redshift
Amazon Redshift Spectrum
Amazon S3
Amazon S3 Analytics
Apache Spark 2.0 or later
MariaDB 10.0 or later
Microsoft SQL Server 2012 or later
MySQL 5.1 or later
PostgreSQL 9.3.1 or later
Presto 0.167 or later
Snowflake
Teradata 14.0 or later
Note
You can access additional data sources not listed here by linking or importing them through supported data sources.
You can retrieve data from tables and materialized views in PostgreSQL instances, and from tables in all other database instances.
Amazon Redshift clusters, Amazon Athena databases, and Amazon Relational Database Service (RDS) instances must be in AWS. Other database instances must be in one of the following environments to be accessible from Amazon QuickSight:
Amazon EC2
On your local network
In a data center or some other internet-accessible environment
The AWS QuickSight documentation doesn't say it supports Azure SQL Database; it only lists the supported database environments.
Others have asked about this problem in the AWS Discussion Forums, and no one, including AWS QuickSight officials, has given an answer.
Reference: Can Quicksight connect to Azure SQL Database?
What we can guess is that it doesn't support Azure SQL for now.
Hope this helps.
According to any Googling I have done, and the responses posted here, it appears that while there is no specific statement from AWS or Azure saying the two cannot be connected, equally there is no response to say that they can. Interestingly, nobody has responded to say that they have already got it working. My feeling at the moment is that it cannot work.
While Azure SQL is not explicitly listed as one of the QuickSight data sources, you can still use it as a data source. TLS/SSL is also supported now and I have tested it personally.
You just need to make sure to use the "username"@"servername" format for the Username, as mentioned by Scotty Smith.
We found the following in the documentation:
AWS Glue can connect to the following data stores by using the JDBC protocol:
• Amazon Redshift
• Amazon Relational Database Service (MySQL, PostgreSQL, Aurora, and MariaDB)
• Publicly accessible databases (Amazon Redshift, MySQL, PostgreSQL, Aurora, and MariaDB)
Is it possible to make a JDBC connection to SQL Server as a data store? I'm trying to create a Crawler with a data store in SQL Server.
Should I create new instance of SQL Server on RDS?
Thanks
It would be possible if the correct JDBC driver were integrated into AWS Glue, but it is not. One of the downsides of a serverless environment is that you can't add drivers to the server.
AWS reps have informed me that at present, you cannot connect to a database outside an Amazon VPC. This is obviously frustrating. I believe they are putting it on the roadmap.
If you are able to set up an RDS instance with a database engine they didn't explicitly name, you should try setting up a Glue job to connect to it. If it fails at first because it lacks the necessary driver, I would imagine you should be able to connect to it by supplying the JDBC driver yourself.
You can connect to SQL Server using JDBC; here is an article on how to do it:
https://www.progress.com/tutorials/jdbc/accessing-data-using-jdbc-on-aws-glue
Although it's for Salesforce, you can use similar steps for SQL Server too. Just replace the Salesforce JDBC driver with the SQL Server JDBC driver.
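As a rough illustration of where that driver plugs in, here is a hedged sketch of a Glue ETL script reading SQL Server over JDBC, assuming the SQL Server JDBC driver jar has been uploaded to S3 and passed to the job (e.g. via the --extra-jars job parameter); all connection details are placeholders:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read the SQL Server table over JDBC using the driver supplied with the job
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://your-host:1433;databaseName=YourDb")
      .option("dbtable", "dbo.YourTable")
      .option("user", "your-user")
      .option("password", "your-password")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

# Land the data in S3 for downstream Glue jobs / crawlers
df.write.mode("overwrite").parquet("s3://your-bucket/your-prefix/")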