Is there a way to establish a connection from AWS to SQL Server and pull the data? I am aware of the approach of using the CData connector with Glue jobs, and it looks promising, but I want to explore other options. The goal is to pull the data from SQL Server into an S3 bucket.
You can use Glue's from_options method directly to pull data from any of the following data stores:
s3, mysql, postgresql, redshift, sqlserver, oracle, and dynamodb
and write it wherever you need.
More information:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame-reader.html
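As a rough illustration of the from_options approach, here is a minimal sketch that reads one SQL Server table and writes it to S3 as Parquet. The host, database, table, credentials and bucket path are placeholders, and the helper names (sqlserver_source_options, copy_table_to_s3) are made up for this example. It assumes the Glue job can reach the SQL Server instance over the network and that glue_context is the GlueContext created at the top of the job script.

```python
def sqlserver_source_options(host, port, database, table, user, password):
    """Build the connection_options dict for connection_type="sqlserver"."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def copy_table_to_s3(glue_context, source_options, s3_path):
    """Read one SQL Server table into a DynamicFrame and write it to S3 as Parquet."""
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="sqlserver",
        connection_options=source_options,
    )
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="parquet",
    )
    return frame
```

Inside a Glue job this would be called as copy_table_to_s3(glue_context, sqlserver_source_options(...), "s3://your-bucket/prefix/"). In practice the password would more likely come from a Glue connection or a secrets store than from the script itself.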
Related
My client has his data stored in SQL Server hosted on an on-premise network. I established a VPN connection from Google Cloud to the network, but I don't know how to proceed from here. My final goal is to process his data using Cloud Functions. Any suggestions?
PS: I read that Shared VPC can be used to accomplish this, but I don't have a proper organization for this purpose :/
Edit: I followed the suggestions in the comments, but now I'm stuck on extracting the data, since pyodbc is not pre-installed on Cloud Functions. Any ideas on how to query an on-prem SQL Server database through Cloud Functions?
I'm trying to use AWS QuickSight to analyse some data that is being stored in SQL Server on an Azure SQL server.
According to QuickSight, it can connect to an SQL Server, but whenever I try to validate the connection, the process hangs for about a minute then comes back with 'Cannot open server "..." requested by the login. The login failed.'
I initially suspected that this was an issue with the firewall on the MS SQL server on Azure. Accordingly, I looked up the IP regions here
The region I am using is US West (Oregon) (us-west-2), so I whitelisted the IP range 54.70.204.128 to 54.70.204.159, but I am still getting exactly the same pause of about a minute before the error comes back in QuickSight.
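In case a firewall tool asks for CIDR notation rather than a start/end pair, the range quoted above collapses to a single block. A quick check with Python's standard ipaddress module, using only the two addresses from the range above:

```python
import ipaddress

first = ipaddress.ip_address("54.70.204.128")
last = ipaddress.ip_address("54.70.204.159")

# summarize_address_range yields the smallest set of CIDR blocks covering first..last
blocks = list(ipaddress.summarize_address_range(first, last))
print(blocks)  # [IPv4Network('54.70.204.128/27')]
```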
The exact error is:
Your database generated a SQL exception. This can be caused by query timeouts, resource constraints, unexpected DDL alterations before or during a query, and other database errors. Check your database settings and your query, and try again.
If I click "Show Details" then I get a further message saying:
Error details
region: us-west-2
timestamp: XXX
requestId: XXX
sourceErrorCode:40532
sourceErrorMessage: Cannot open server "..." requested by the login. The login failed. ClientConnectionId:*
sourceErrorState: S0001
sourceException: com.microsoft.sqlserver.jdbc.SQLServerException
sourceType: SQLSERVER
Obviously some of the above has been redacted.
I cannot believe that QuickSight cannot connect to an Azure MS SQL database, so I'm wondering if anyone else has had this problem, and what their solution was?
I had this issue myself, and it seems many others did too. However, as noted above, there is little to no documentation describing the steps to connect QuickSight to Azure SQL Server.
In my case the problem was mainly in the connection details I gave QuickSight. Most clients connect to an Azure SQL Server database seamlessly with just the basic information:
Server, Port, Database Name, Username, Password
However, AWS QuickSight was connecting to my Azure SQL server with JDBC authentication.
JDBC authentication requires the Username field to be "username"@"servername"
Example of Correct Connection:
Server: "servername".database.windows.net
Port: 1433
Database Name: "databasename"
Username: "username"@"servername"
Password: "password"
Lastly, I turned off the SSL checkbox. The connection did not work with SSL enabled.
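To make the Username rule concrete, here is a small sketch that derives the login from the server host name, assuming the user@servername form that Azure SQL Database expects, where servername is the first label of the host. The function names and the sample values are made up for illustration.

```python
def azure_sql_login(username: str, server_host: str) -> str:
    """Return the login in user@servername form for an Azure SQL host name."""
    # "myserver.database.windows.net" -> "myserver"
    server_name = server_host.split(".", 1)[0]
    # Leave the login alone if it already carries an @servername suffix.
    if "@" in username:
        return username
    return f"{username}@{server_name}"


def quicksight_fields(server_host, database, username, password, port=1433):
    """Collect the values to type into the QuickSight SQL Server data source form."""
    return {
        "Server": server_host,
        "Port": port,
        "Database Name": database,
        "Username": azure_sql_login(username, server_host),
        "Password": password,
    }
```

For example, quicksight_fields("myserver.database.windows.net", "mydb", "report_user", "secret") gives a Username of report_user@myserver.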
Please see this document: Relational Data Sources
You can use any of the following relational data stores as data sources for Amazon QuickSight:
Amazon Athena
Amazon Aurora
Amazon Redshift
Amazon Redshift Spectrum
Amazon S3
Amazon S3 Analytics
Apache Spark 2.0 or later
MariaDB 10.0 or later
Microsoft SQL Server 2012 or later
MySQL 5.1 or later
PostgreSQL 9.3.1 or later
Presto 0.167 or later
Snowflake
Teradata 14.0 or later
Note
You can access additional data sources not listed here by linking or importing them through supported data sources.
You can retrieve data from tables and materialized views in PostgreSQL instances, and from tables in all other database instances.
Amazon Redshift clusters, Amazon Athena databases, and Amazon Relational Database Service (RDS) instances must be in AWS. Other database instances must be in one of the following environments to be accessible from Amazon QuickSight:
Amazon EC2
On your local network
In a data center or some other internet-accessible environment
The AWS QuickSight documentation doesn't say that it supports Azure SQL Database; it only lists the supported environments for other database instances.
Others have asked about this problem in the AWS Discussion Forums, and neither other users nor the AWS QuickSight team have given an answer.
Reference: Can Quicksight connect to Azure SQL Database?
My guess is that it doesn't support Azure SQL for now.
Hope this helps.
According to all the Googling I have done, and the responses posted here, it appears that while there is no specific statement from AWS or Azure saying the two cannot be connected, equally there is no statement saying that they can. Interestingly, nobody has responded to say that they have already got it working. My feeling at the moment is that it cannot work.
While Azure SQL is not explicitly listed as one of the QuickSight data sources, you can still use it as a data source. TLS/SSL is also supported now, and I have tested it personally.
You just need to make sure to use the "username"@"servername" format for the Username, as mentioned by Scotty Smith.
We found the following in the documentation:
AWS Glue can connect to the following data stores by using the JDBC protocol:
• Amazon Redshift
• Amazon Relational Database Service (MySQL, PostgreSQL, Aurora, and MariaDB)
• Publicly accessible (Amazon Redshift, MySQL, PostgreSQL, Aurora, and MariaDB) databases
Is it possible to make a JDBC connection to SQL Server as a data store? I'm trying to create a Crawler with a SQL Server data store.
Should I create a new instance of SQL Server on RDS?
Thanks
It would be possible if the correct JDBC driver was integrated into AWS Glue but it is not. One of the downsides of a serverless environment is you can't add drivers to the server.
AWS reps have informed me that at present, you cannot connect to a database outside an Amazon VPC. This is obviously frustrating. I believe they are putting it on the roadmap.
If you are able to set up an RDS instance with a database they didn't explicitly name, you should try setting up a Glue job to connect to it. If it fails at first because it lacks the necessary driver, I would imagine you should be able to connect to it by supplying the JDBC driver yourself.
You can connect to SQL Server using JDBC. Here is an article on how to do it:
https://www.progress.com/tutorials/jdbc/accessing-data-using-jdbc-on-aws-glue
Although it's written for Salesforce, you can follow similar steps for SQL Server. Just replace the Salesforce JDBC driver with the SQL Server JDBC driver.
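A minimal sketch of what the SQL Server variant might look like from the Spark side of a Glue job. It assumes the Microsoft JDBC driver JAR has already been made available to the job (for example through the job's dependent JARs setting) and that spark is the job's SparkSession. The host, database, table and credentials are placeholders, and the helper names are made up for this example.

```python
# Driver class shipped in Microsoft's JDBC driver for SQL Server.
SQLSERVER_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Build the option set for Spark's generic "jdbc" data source."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "driver": SQLSERVER_DRIVER,
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, options):
    """Load the table as a Spark DataFrame through the JDBC data source."""
    return spark.read.format("jdbc").options(**options).load()
```

The resulting DataFrame can then be written to S3 with the usual DataFrame writer, or converted to a DynamicFrame if the rest of the job uses Glue transforms.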
I have an on premise oracle database. Can I use anything on AWS e.g. API Gateway to query the database and expose the results via API? I know I could do API Gateway -> Lambda -> Oracle DB where the code in the Lambda function would query the database (assuming query takes less than 5 mins). Are there any other easy options that would be serverless and with minimal amount of code?
Basically I would like to find the simplest way to create an API layer over the top of an existing on premise oracle database so that applications (hosted on AWS) can access this data without connecting directly to the database. Does AWS provide anything out of the box?
There does not seem to be an out of the box way provided by AWS to connect API Gateway to your on premise Oracle DB. So basically the way you provided (API Gateway->Lambda->Oracle) should be the way to go.
Now the question is whether you want to connect to your Oracle DB directly, or create a replica of your database in RDS with a sync mechanism between RDS and your on-premise Oracle DB, which keeps the DB highly responsive and available (in case of a network failure between AWS and your local network). I think that depends on how you access your DB on premise.
If you don't create a replica in RDS, you should at least use a VPN connection to your local network to keep the data transfer between the on-premise Oracle DB and AWS secure.
Yes it is possible to use AWS Lambda and expose the API through API
Gateway. But that is the easy part.
The tough part is to get your On-Premise database connected to AWS
infrastructure. If you have an on premise database, and you are
working in a large enterprise, you will need to get through a lot of
approvals to set up a VPN or an AWS Direct Connect link.
The ideal solution is to use AWS Direct Connect to extend your
corporate infrastructure to connect to AWS and then use Lambda to
connect to the DB.
Also there is no out of the box solution in AWS to connect to
OracleDB. At the most, you can wrap all business logic in Stored
Procedures, and execute them in the lambda function. You can always
use the JDBC from Lambda to connect and query your database.
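As a rough sketch of the API Gateway -> Lambda -> Oracle path in Python rather than JDBC: the handler below assumes the python-oracledb package is bundled with the function, that the Lambda runs in a VPC that can reach the on-premise database over the VPN or Direct Connect link, and that API Gateway uses the Lambda proxy integration. The environment variable names, the orders table and the customer_id parameter are all hypothetical.

```python
import json
import os


def connect_from_env():
    """Open a connection with python-oracledb; the env var names are hypothetical."""
    import oracledb  # third-party package, must be packaged with the function

    return oracledb.connect(
        user=os.environ["ORACLE_USER"],
        password=os.environ["ORACLE_PASSWORD"],
        dsn=os.environ["ORACLE_DSN"],  # e.g. "host:1521/service_name"
    )


def make_handler(connect=connect_from_env):
    """Build a Lambda handler; the connection factory is injectable for testing."""

    def handler(event, context):
        params = event.get("queryStringParameters") or {}
        customer_id = params.get("customer_id")
        if customer_id is None:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": "customer_id is required"}),
            }
        conn = connect()
        try:
            cur = conn.cursor()
            # Bind variables keep user input out of the SQL text.
            cur.execute(
                "SELECT order_id, status FROM orders WHERE customer_id = :cid",
                {"cid": customer_id},
            )
            rows = [{"order_id": r[0], "status": r[1]} for r in cur.fetchall()]
        finally:
            conn.close()
        return {"statusCode": 200, "body": json.dumps(rows)}

    return handler


# Entry point to configure on the Lambda function.
lambda_handler = make_handler()
```

Opening a connection per request keeps the sketch simple. A real function would probably reuse the connection across invocations and read the credentials from a secrets store.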
Try this from AWS Marketplace https://aws.amazon.com/marketplace/pp/B01MU8W71L
Basically, I need to work on streaming data from a SQL Server table using Node, and I was wondering if there is a server somewhere that provides some test data to work with.
You can use Ragic, Amazon SimpleDB, Google Cloud SQL, or dedicated DB servers; you can find some more options at cloudboost.