AWS EMR Serverless: connecting to SQL Server over JDBC

I have been connecting to SQL Server from an EMR Serverless application (release 6.8.0 for Spark). I have tested the code on my local machine as well as on EC2, but when I ran it on the serverless application I got an error.
Note: my VPC security group allows all traffic on all ports.
This is my job submission:
# submitted via the boto3 EMR Serverless client (client = boto3.client("emr-serverless") assumed)
response = client.start_job_run(
    applicationId=app_id,
    executionRoleArn="my-role",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://emr-studio-rts/scripts/ms-sql-fetch.py",
            "entryPointArguments": ["s3://emr-studio-rts/output"],
            "sparkSubmitParameters": "--jars https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar --conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1",
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://emr-studio-rts/logs"}
        }
    },
)
The failure occurs at the following lines:
spark = SparkSession \
    .builder \
    .appName('test-db') \
    .config('spark.driver.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
    .config('spark.executor.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
    .config("spark.executor.cores", "1") \
    .getOrCreate()

# read table data into a Spark dataframe
df1 = spark.read.format("jdbc") \
    .option("url", f"jdbc:sqlserver://{my_host}:1433;databaseName={my_database};") \
    .option("dbtable", table_name) \
    .option("user", my_user) \
    .option("password", my_password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()
The reported error is:
Status Details: Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: : com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 3.12.0.70, port 1433 has failed. Error: "Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.". py4j.protocol.Py4JJavaError: An error occurred while calling o93.load.
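Timeouts like this on EMR Serverless usually point at VPC connectivity rather than at the Spark code: unless the application itself is given a network configuration placing it in your VPC subnets, its workers have no route to hosts inside the VPC, however permissive the security group is. A minimal sketch of attaching the application to a VPC via boto3 (the application may need to be stopped first; the subnet and security group IDs are placeholders):

import boto3

client = boto3.client("emr-serverless", region_name="us-east-2")

# Place the application's workers in VPC subnets so they can reach the database.
# subnet-xxxxxxxx and sg-xxxxxxxx are placeholders for your own IDs.
client.update_application(
    applicationId=app_id,
    networkConfiguration={
        "subnetIds": ["subnet-xxxxxxxx"],
        "securityGroupIds": ["sg-xxxxxxxx"],
    },
)

Separately, spark.driver.extraClassPath and spark.executor.extraClassPath expect paths on the local filesystem; on EMR Serverless the driver jar is usually referenced with an s3:// URI in --jars and distributed automatically.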

Related

Apache Spark Connector Driver not suitable

Trying to connect to SQL Server using the Azure Apache Spark connector, I get the following error:
java.sql.SQLException: No suitable driver
The Databricks cluster has com.microsoft.azure:spark-mssql-connector_2.12:1.2.0 for Apache Spark 3.1.2, Scala 2.12, as stated in the documentation.
That is the only library I have installed on the cluster.
I went through the docs at
https://github.com/microsoft/sql-spark-connector
jdbcHostname = "server name"
jdbcPort = 1433
jdbcDatabase = "database name"
Table = "tbl.name"
JDBC_URL = '"jdbc:sqlserver://{0}:{1};database={2}"'.format(jdbcHostname, jdbcPort, jdbcDatabase)
username = "user"
password = "pass"
jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", JDBC_URL) \
    .option("dbtable", Table) \
    .option("user", username) \
    .option("password", password) \
    .load()

ERROR - Import from SQL Server to GCS using Apache Sqoop & Dataproc

I am trying to import data from SQL Server to Google Cloud Storage, which I will later load into BigQuery. I am doing all of this through Google's Cloud Shell.
I have done the initial steps of downloading Sqoop and the SQL Server JDBC file, then uploading them to a Google Cloud Storage bucket. I have also created a Google Dataproc cluster to submit the Sqoop job, but when I try to use the submission code it throws a couple of errors.
I am following this process (https://medium.com/datamindedbe/import-sql-server-data-in-bigquery-d640441d5d56); in my case I am trying to extract one table first. The code to submit a job through Dataproc is below.
WHAT I HAVE TRIED
I do have the SQL Server JDBC .jar file (mssql-jdbc-8.2.1.jre8.jar) in Cloud Storage with the other dependent files.
I have also checked the TCP/IP settings on my SQL Server 2014 instance, and they are in the state the error message recommends.
CODE I USED TO SUBMIT A SQOOP JOB TO DATAPROC CLUSTER
CLUSTERNAME="sqoop-cluster"
BUCKET="gs://sqoop-bucket-20092021"
libs=`gsutil ls $BUCKET/jars | paste -sd, --`
JDBC_STR="jdbc:sqlserver://RUKSQLRS01:1433;databaseName=RUKDataWarehouse"
SQL_USER="RUKSQLDataWarehouse_Reporting"
SQL_PASS="gs://sqoop-bucket-20092021/creds/sqoop.password"
TABLE="LBD_Task"
SCHEMA="dbo"
gcloud dataproc jobs submit hadoop \
--region europe-west2 \
--cluster="$CLUSTERNAME"\
--jars=$libs \
--class=org.apache.sqoop.Sqoop \
-- \
import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
-Dmapreduce.job.user.classpath.first=true \
--connect "$JDBC_STR" \
--username "$SQL_USER" \
--password-file "$SQL_PASS" \
--table "$SCHEMA.$TABLE" \
--warehouse-dir "$BUCKET/output/$TABLE" \
--num-mappers 1 \
--as-avrodatafile
ERROR I AM GETTING
21/09/22 11:30:46 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
21/09/22 11:30:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
21/09/22 11:30:48 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
21/09/22 11:30:48 INFO manager.SqlManager: Using default fetchSize of 1000
21/09/22 11:30:48 INFO tool.CodeGenTool: Beginning code generation
21/09/22 11:31:02 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host RUKSQLRS01, port 1433 has failed. Error: "RUKSQLRS01. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host RUKSQLRS01, port 1433 has failed. Error: "RUKSQLRS01. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:227)
at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:284)
at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2435)
at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:635)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2010)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:904)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
21/09/22 11:31:02 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1677)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
This seems to be a networking issue. Your SQL Server is outside of GCP, and you are trying to reach it through a bare hostname that will not resolve from inside GCP. You need to either use its external IP and set up firewall rules on the SQL Server side to allow access from GCP, or set up a VPN between your GCP VPC network and the SQL Server's network and reach SQL Server through its internal IP.
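Before rerunning the Sqoop job, it can help to confirm basic TCP reachability from inside the GCP network (for example from an SSH session on a cluster node). A minimal sketch using only the Python standard library; host and port mirror the JDBC string above:

import socket

host = "RUKSQLRS01"  # replace with the external IP once firewall rules or the VPN are in place
port = 1433

try:
    # DNS resolution failures and firewall drops both surface as OSError here.
    with socket.create_connection((host, port), timeout=5):
        print("TCP connection to {}:{} succeeded".format(host, port))
except OSError as exc:
    print("TCP connection to {}:{} failed: {}".format(host, port, exc))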

How to install and scrape metrics for NGINX and MSSQL in Prometheus

I am trying to set up the Prometheus monitoring tool for a project. While I am able to configure Prometheus to scrape server metrics with node_exporter and DNS with blackbox_exporter, I am having a hard time setting up metrics for NGINX and MSSQL. I would appreciate any resources or walkthrough for the exporters for both, as well as the installation process for Prometheus to scrape the data. The project is hosted on the DigitalOcean cloud.
Both appear straightforward:
The NGINX exporter
The official Microsoft SQL Server exporter
Please update your question and explain:
Exactly what you tried (include code and config)
What errors you received
Example
This would be better under e.g. Docker Compose, but...
nginx.conf:
events {
  use epoll;
  worker_connections 128;
}

http {
  server {
    location = /basic_status {
      stub_status;
    }
  }
}
Run NGINX (defaults to :80):
docker run \
--interactive --tty --rm \
--net=host \
--volume=${PWD}/nginx.conf:/etc/nginx/nginx.conf:ro \
nginx
Test it:
curl \
--request GET \
http://localhost/basic_status
Run Exporter:
docker run \
--interactive --tty --rm \
--net=host \
nginx/nginx-prometheus-exporter \
-nginx.scrape-uri=http://localhost/basic_status
Test it:
curl \
--request GET \
http://localhost:9113/metrics
Yields:
# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 1
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
# HELP nginx_connections_handled Handled client connections
# TYPE nginx_connections_handled counter
nginx_connections_handled 1
# HELP nginx_connections_reading Connections where NGINX is reading the request header
# TYPE nginx_connections_reading gauge
nginx_connections_reading 0
# HELP nginx_connections_waiting Idle client connections
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 0
# HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
# TYPE nginx_connections_writing gauge
nginx_connections_writing 1
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 2
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
# HELP nginxexporter_build_info Exporter build information
# TYPE nginxexporter_build_info gauge
nginxexporter_build_info{commit="5f88afbd906baae02edfbab4f5715e06d88538a0",date="2021-03-22T20:16:09Z",version="0.9.0"} 1
prometheus.yml:
scrape_configs:
  # Self
  - job_name: "prometheus-server"
    static_configs:
      - targets:
          - "localhost:9090"

  # NGINX
  - job_name: "nginx"
    static_configs:
      - targets:
          - "localhost:9113"
Then:
docker run \
--interactive --tty --rm \
--net=host \
--volume=${PWD}/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus:v2.26.0 \
--config.file=/etc/prometheus/prometheus.yml
Then browse the Prometheus UI on localhost:9090: the Targets page should list both jobs as up, and the Graph page can chart the nginx_* metrics (screenshots omitted).
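The same target state is also available without the UI through Prometheus's HTTP API at /api/v1/targets. A small sketch using only the Python standard library, assuming Prometheus is listening on localhost:9090 as configured above:

import json
import urllib.request

# Fetch the active scrape targets and print each one's job, URL, and health.
with urllib.request.urlopen("http://localhost:9090/api/v1/targets") as resp:
    payload = json.load(resp)

for target in payload["data"]["activeTargets"]:
    print(target["labels"]["job"], target["scrapeUrl"], target["health"])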

Issue using pg_dump

Ok, here's the issue: I want to upload a DB I've got locally and have been using on localhost, but when I run:
PGPASSWORD=mypassword pg_dump -Fc --no-acl --no-owner -h localhost -U root app_db > app_db.dump
I always get:
pg_dump: [archiver (db)] connection to database "app_db" failed: could not connect to server: Connection refused
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?
I think the problem is that I'm writing localhost instead of my Heroku app's address. Can anyone tell me how I can get the correct address?
"Connection refused" on localhost:5432 means nothing is listening there: you need to have PostgreSQL installed and running locally before pg_dump can connect (pg_isready is a quick way to check). If the intent is to dump the Heroku database instead, point pg_dump at the connection string from heroku config:get DATABASE_URL rather than at localhost.
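As a sketch of the second route: Heroku exposes the database location in the DATABASE_URL config var, and pg_dump accepts a libpq connection URI in place of the host/user flags. The app name below is a placeholder, and the Heroku CLI is assumed to be installed and logged in:

import subprocess

# Fetch the remote connection string via the Heroku CLI ("my-app" is a placeholder).
db_url = subprocess.check_output(
    ["heroku", "config:get", "DATABASE_URL", "--app", "my-app"],
    text=True,
).strip()

# pg_dump accepts a connection URI directly in place of the -h/-U/dbname flags.
subprocess.run(
    ["pg_dump", "-Fc", "--no-acl", "--no-owner", db_url, "-f", "app_db.dump"],
    check=True,
)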

Sqoop connection to MS SQL timeout

I am attempting to connect to Microsoft SQL Server using Sqoop. I have installed the JDBC driver from Microsoft by following the instructions for the Sqoop Connector and the JDBC Driver. Next I attempt to list the databases on the server. I have tried the following commands:
sqoop list-databases --connect 'jdbc:sqlserver://<HOST>' --username <USER> --password <PASS>
sqoop list-databases --connect 'jdbc:sqlserver://<HOST>;username=<USER>;password=<PASS>'
sqoop list-databases --connect 'jdbc:sqlserver://<HOST>;username=<USER>;password=<PASS>' --username <USER> --password <PASS>
Each of these commands produces the same error messages.
13/01/02 10:44:52 ERROR sqoop.ConnFactory: Error loading ManagerFactory information from file <MY SQOOP DIRECTORY>/conf/managers.d/mssqoop-sqlserver: java.io.IOException: the content of connector file must be in form of key=value
at org.apache.sqoop.ConnFactory.addManagersFromFile(ConnFactory.java:219)
at org.apache.sqoop.ConnFactory.loadManagersFromConfDir(ConnFactory.java:294)
at org.apache.sqoop.ConnFactory.instantiateFactories(ConnFactory.java:85)
at org.apache.sqoop.ConnFactory.<init>(ConnFactory.java:62)
at com.cloudera.sqoop.ConnFactory.<init>(ConnFactory.java:36)
at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:200)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:44)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
13/01/02 10:45:08 ERROR manager.CatalogQueryManager: Failed to list databases
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host <HOST>, port 1433 has failed. Error: "connect timed out. Verify the connection properties, check that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port, and that no firewall is blocking TCP connections to the port.".
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1033)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:817)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:700)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:842)
at java.sql.DriverManager.getConnection(DriverManager.java:579)
at java.sql.DriverManager.getConnection(DriverManager.java:221)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:665)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:56)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
I have connected to the database using Microsoft SQL Server Management Studio to ensure the database is operational and that the host/username/password are all correct. Additionally, I have verified that the port is open and that MSSQL is on the other side with the following:
sudo nmap -sS -p 1433 <HOST>
Starting Nmap 5.21 ( http://nmap.org ) at 2013-01-02 11:04 PST
Nmap scan report for <HOST> (<HOST IP>)
Host is up (0.00070s latency).
rDNS record for <HOST IP>: <HOST FQDN>
PORT STATE SERVICE
1433/tcp filtered ms-sql-s
Nmap done: 1 IP address (1 host up) scanned in 0.33 seconds
Any suggestions on where I should go from here? I have not been able to find any documentation on this error. Thanks
I am currently attempting to verify that the SQL server is reachable using OSQL from FreeTDS. Will update this post with my findings.
After much searching and talking to many people, I have determined that this issue was caused by the port being blocked. I am still not 100% certain why this error in particular occurred: if I try an invalid username or password, it correctly reports that they are invalid; it is only when making the actual query that the connection is blocked. Most likely different ports are used.
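That last guess matches how SQL Server named instances behave: only a default instance listens on TCP 1433, while named instances take a dynamic port that clients discover by querying the SQL Browser service on UDP 1434. A rough sketch of that discovery query; the single-byte 0x03 request and the response layout are from the MC-SQLR protocol as I understand it, so verify the details against the spec:

import socket

host = "<HOST>"  # same placeholder as in the question

# Ask the SQL Browser service (UDP 1434) to enumerate instances and their TCP ports.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.settimeout(5)
    sock.sendto(b"\x03", (host, 1434))  # CLNT_UCAST_EX request byte (assumed)
    data, _ = sock.recvfrom(65535)

# Skip the 3-byte response header; the body is an ASCII string such as
# "ServerName;...;InstanceName;...;tcp;<port>;;"
print(data[3:].decode("ascii", errors="replace"))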
