Is there a way to connect to a sample Hadoop database online? I'm trying to test a Kerberos connection, but I need to test against an actual Hadoop database. I only need to read from the database. Is there such a service?
Thanks!
I typically use pyodbc when running Jupyter notebooks from my machine, but this does not work on Azure ML. My assumption is that this happens because Azure ML is not on my company's network; I typically need a VPN to reach the server when I'm not in the office. The only solutions I can find online involve copying the data over with Azure Data Factory. I need to avoid this if possible: there are many tables I will need to experiment with, nothing is intended to be long term, and I'm unsure what I will even end up using.
Ideally there is a way to make pyodbc work, but any other suggestions are welcome. I have researched integration runtimes but was unsure whether that would solve my problem here.
Unfortunately, Azure ML supports only a fixed set of data sources, and on-premises SQL Server is not among them.
Approach 1)
You can copy your data from the on-premises SQL database to Azure SQL with the copy tool in Azure Data Factory, then connect Azure Machine Learning directly to Azure SQL as a data source.
You can also use a self-hosted integration runtime to connect your data factory to the on-premises SQL Server.
In the integration runtime setup, choose Option 2 to download the integration runtime, then install it on your local machine using the registration keys shown there.
Approach 2)
If there is a large amount of data, you can automate the entire copy process from the on-premises SQL Server to Azure SQL by using an Azure DevOps pipeline.
References:
https://learn.microsoft.com/en-us/answers/questions/775844/unable-to-connect-sql-server-to-azure-ml-pipeline By Ramr-msft
How To: Azure Data Factory CI/CD with Azure DevOps pipelines — The YAML WAY! | by Raghavendra Bharadwaj | Servian
Is there a way to establish a connection from AWS to SQL Server and pull the data? I am aware of the method of using the CData connector with Glue jobs, and it looks promising, but I want to explore other options. The idea is to pull the data from SQL Server into an S3 bucket.
You can directly use the from_options method of AWS Glue to pull data from the following data stores:
s3, mysql, postgresql, redshift, sqlserver, oracle, and dynamodb
and write it wherever you need.
More information:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame-reader.html
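As a rough sketch of what that could look like for the SQL Server to S3 case in the question: the helper names, host, database, table, and bucket below are hypothetical, and `glue_context` is assumed to be a `GlueContext` created inside a Glue job. The connection option keys (`url`, `dbtable`, `user`, `password`) follow the Glue JDBC connection options as I understand them, so check them against the documentation linked above.

```python
def sqlserver_options(host, database, table, user, password, port=1433):
    """Build a connection_options dict for connection_type="sqlserver".

    All argument values are placeholders supplied by the caller.
    """
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def copy_table_to_s3(glue_context, options, s3_path, fmt="parquet"):
    """Read one SQL Server table as a DynamicFrame and write it to S3.

    glue_context is assumed to be an awsglue.context.GlueContext.
    """
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="sqlserver",
        connection_options=options,
    )
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format=fmt,
    )
    return frame
```

Inside a Glue job this would be called with something like `copy_table_to_s3(glue_context, sqlserver_options("db.example.internal", "sales", "dbo.orders", user, password), "s3://example-bucket/orders/")`. The job still needs network access to the SQL Server, which this sketch does not cover.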
My client has his data stored on a SQL Server instance hosted on an on-premises network. I established a VPN connection from Google Cloud to the network, but I don't know how to proceed from here. My final goal is to process his data using Cloud Functions. Any suggestions?
PS: I read that Shared VPC can be used to accomplish this, but I don't have a proper organization for this purpose :/
Edit: I followed the suggestions in the comments, but I still can't extract the data because pyodbc is not pre-installed on Cloud Functions. Any ideas on how to query an on-premises SQL Server database through Cloud Functions?
I would like to connect to an on-premises database (say SQL Server) from an Azure Databricks notebook via a REST API call. I would also like to perform an UPSERT operation on a table in that database from the same notebook.
Is it possible?
Kindly share the necessary steps.
You can use JDBC to connect to an on-premises database (say SQL Server) from an Azure Databricks notebook.
You can refer to this Azure document, SQL Databases using JDBC:
This article covers how to use the DataFrame API to connect to SQL
databases using JDBC and how to control the parallelism of reads
through the JDBC interface. This article provides detailed examples
using the Scala API, with abbreviated Python and Spark SQL examples
at the end.
You can also refer to the document Connect your Azure Databricks Workspace to your on-premises network, which @Sebastian Inones provided in the comments.
See also: Connecting to on-prem SQL Server through Azure Databricks
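For illustration, a minimal Python sketch of a JDBC read with the DataFrame API might look like the following. The function name and all connection values are hypothetical, `spark` is assumed to be the SparkSession that a Databricks notebook provides, and the workspace is assumed to already have a network route to the on-premises server. This only covers reading; it does not show an UPSERT.

```python
def read_sqlserver_table(spark, host, database, table, user, password, port=1433):
    """Load one SQL Server table into a Spark DataFrame over JDBC.

    spark is assumed to be an existing SparkSession; the other
    arguments are placeholders for your own server and credentials.
    """
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        # Class name of the Microsoft JDBC driver for SQL Server.
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )
```

In a notebook you would call it as `df = read_sqlserver_table(spark, "db.example.internal", "sales", "dbo.orders", user, password)`, ideally pulling the credentials from a secret store rather than hard-coding them.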
Hope this helps.
We found the following in the documentation:
AWS Glue can connect to the following data stores by using the JDBC protocol:
• Amazon Redshift
• Amazon Relational Database Service (MySQL, PostgreSQL, Aurora, and MariaDB)
• Publicly accessible (Amazon Redshift, MySQL, PostgreSQL, Aurora, and MariaDB) databases
Is it possible to make a JDBC connection to SQL Server as a data store? I'm trying to create a crawler with a SQL Server data store.
Should I create a new instance of SQL Server on RDS?
Thanks
It would be possible if the correct JDBC driver were integrated into AWS Glue, but it is not. One of the downsides of a serverless environment is that you can't add drivers to the server.
AWS reps have informed me that at present, you cannot connect to a database outside an Amazon VPC. This is obviously frustrating. I believe they are putting it on the roadmap.
If you are able to set up an RDS instance with a database they didn't explicitly name, you should try setting up a Glue job to connect to it. If it fails at first because it lacks the necessary driver, I would imagine you should be able to connect to it by supplying the JDBC driver.
You can connect to SQL Server using JDBC. Here is an article on how to do it:
https://www.progress.com/tutorials/jdbc/accessing-data-using-jdbc-on-aws-glue
Although it is written for Salesforce, you can follow similar steps for SQL Server. Just replace the Salesforce JDBC driver with the SQL Server JDBC driver.
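To make the driver swap concrete, here is a small sketch that adds a custom driver to a set of Glue JDBC connection options. The helper name and the S3 path are hypothetical. The `customJdbcDriverS3Path` and `customJdbcDriverClassName` keys are the Glue options for supplying your own driver jar as far as I know, so verify them against the current Glue documentation before relying on this.

```python
# Class name of the Microsoft JDBC driver for SQL Server.
SQLSERVER_DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def with_custom_driver(options, driver_s3_path, driver_class=SQLSERVER_DRIVER_CLASS):
    """Return a copy of JDBC connection options that points at a driver jar on S3.

    driver_s3_path is a placeholder for wherever you uploaded the jar.
    The input dict is left unchanged.
    """
    merged = dict(options)
    merged["customJdbcDriverS3Path"] = driver_s3_path
    merged["customJdbcDriverClassName"] = driver_class
    return merged
```

The returned dict would then be passed as the connection options of the Glue job's JDBC read, with the jar uploaded to the given S3 location beforehand.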