Apache Spark Connector: "No suitable driver" - sql-server

Trying to connect to SQL Server using the Azure Apache Spark connector, I get the following error:
java.sql.SQLException: No suitable driver
The Databricks cluster has com.microsoft.azure:spark-mssql-connector_2.12:1.2.0 installed for Apache Spark 3.1.2, Scala 2.12, as stated in the documentation.
That's the only library installed on the cluster.
I went through the docs at
https://github.com/microsoft/sql-spark-connector
jdbcHostname = "server name"
jdbcPort = 1433
jdbcDatabase = "database name"
Table = "tbl.name"
# The original single-quoted string kept literal double quotes inside the URL
# ('"jdbc:..."'), so DriverManager saw a URL starting with '"' and matched no
# driver -- hence "No suitable driver". The stray quotes are removed here.
JDBC_URL = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
username = "user"
password = "pass"

jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", JDBC_URL) \
    .option("dbtable", Table) \
    .option("user", username) \
    .option("password", password) \
    .load()

Related

AWS EMR Serverless: connecting to SQL Server over JDBC

I am connecting to SQL Server from an EMR Serverless application (v6.8.0) for Spark.
The code works on my local machine as well as on EC2, but when I run it on the serverless application I get an error.
Note: my VPC security group allows all traffic on all ports.
This is my job submission (client is the boto3 "emr-serverless" client):
response = client.start_job_run(
    applicationId=app_id,
    executionRoleArn="my-role",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://emr-studio-rts/scripts/ms-sql-fetch.py",
            "entryPointArguments": ["s3://emr-studio-rts/output"],
            "sparkSubmitParameters": "--jars https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar --conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1",
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://emr-studio-rts/logs"}
        }
    },
)
The error occurs when running this code:
spark = SparkSession \
    .builder \
    .appName('test-db') \
    .config('spark.driver.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
    .config('spark.executor.extraClassPath', 'https://emr-studio-rts.s3.us-east-2.amazonaws.com/jars/sqljdbc42.jar') \
    .config("spark.executor.cores", "1") \
    .getOrCreate()

# read table data into a spark dataframe
df1 = spark.read.format("jdbc") \
    .option("url", f"jdbc:sqlserver://{my_host}:1433;databaseName={my_database};") \
    .option("dbtable", table_name) \
    .option("user", my_user) \
    .option("password", my_password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()
and the job fails as follows:
Status Details: Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: : com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 3.12.0.70, port 1433 has failed. Error: "Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.". py4j.protocol.Py4JJavaError: An error occurred while calling o93.load.
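Since the failure is a plain TCP timeout rather than a driver or authentication error, it is worth confirming that the SQL Server endpoint is reachable from inside the job environment before debugging Spark itself. A minimal sketch using only the Python standard library (host and port taken from the error message above):
import socket

# Sketch: test raw TCP reachability of the SQL Server endpoint from the same
# environment the Spark job runs in. A timeout here points at VPC routing,
# security-group, or firewall configuration rather than at Spark or JDBC.
host, port = "3.12.0.70", 1433
try:
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as exc:
    print(f"TCP connection to {host}:{port} failed: {exc}")
Note that the security group on the SQL Server side must allow inbound traffic from the EMR Serverless application, not just outbound traffic from your own VPC.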

ERROR - Import from SQL Server to GCS using Apache Sqoop & Dataproc

I am trying to import data from SQL Server into Google Cloud Storage, which I will later load into BigQuery. I am doing all of this through Google Cloud Shell.
I have completed the initial steps: downloading Sqoop and the SQL Server JDBC driver and uploading them to a Google Cloud Storage bucket. I have also created a Dataproc cluster to submit the Sqoop job, but when I run the submission code it throws a couple of errors.
I am following this process (https://medium.com/datamindedbe/import-sql-server-data-in-bigquery-d640441d5d56); in my case I am trying to extract one table first.
WHAT I HAVE TRIED
- The SQL Server JDBC .jar (mssql-jdbc-8.2.1.jre8.jar) is in Cloud Storage with the other dependent files.
- I have checked the TCP/IP settings on my SQL Server 2014 instance, and they are in the state the error message recommends.
CODE I USED TO SUBMIT A SQOOP JOB TO DATAPROC CLUSTER
CLUSTERNAME="sqoop-cluster"
BUCKET="gs://sqoop-bucket-20092021"
libs=`gsutil ls $BUCKET/jars | paste -sd, --`
JDBC_STR="jdbc:sqlserver://RUKSQLRS01:1433;databaseName=RUKDataWarehouse"
SQL_USER="RUKSQLDataWarehouse_Reporting"
SQL_PASS="gs://sqoop-bucket-20092021/creds/sqoop.password"
TABLE="LBD_Task"
SCHEMA="dbo"
gcloud dataproc jobs submit hadoop \
    --region europe-west2 \
    --cluster="$CLUSTERNAME" \
    --jars=$libs \
    --class=org.apache.sqoop.Sqoop \
    -- \
    import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    -Dmapreduce.job.user.classpath.first=true \
    --connect "$JDBC_STR" \
    --username "$SQL_USER" \
    --password-file "$SQL_PASS" \
    --table "$SCHEMA.$TABLE" \
    --warehouse-dir "$BUCKET/output/$TABLE" \
    --num-mappers 1 \
    --as-avrodatafile
ERROR I AM GETTING
21/09/22 11:30:46 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
21/09/22 11:30:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
21/09/22 11:30:48 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
21/09/22 11:30:48 INFO manager.SqlManager: Using default fetchSize of 1000
21/09/22 11:30:48 INFO tool.CodeGenTool: Beginning code generation
21/09/22 11:31:02 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host RUKSQLRS01, port 1433 has failed. Error: "RUKSQLRS01. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host RUKSQLRS01, port 1433 has failed. Error: "RUKSQLRS01. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:227)
at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:284)
at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2435)
at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:635)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2010)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:904)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
21/09/22 11:31:02 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1677)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
This seems to be a networking issue. Your SQL Server is outside of GCP and you are trying to reach it by hostname. You need to either use its external IP and set up firewall rules on the SQL Server side to allow access from GCP, or set up a VPN between your GCP VPC network and the SQL Server's network and reach SQL Server through its internal IP.
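Note the quoted part of the error, Error: "RUKSQLRS01. ...": when the JDBC driver cannot resolve the hostname, the underlying exception message is just the hostname itself, so this typically indicates a name-resolution failure rather than a blocked port. A quick check from the GCP side, sketched with only the Python standard library:
import socket

# Sketch: check whether the SQL Server hostname resolves from GCP.
# RUKSQLRS01 is the hostname from the error above; an on-premises name like
# this is usually not resolvable from a Dataproc node without VPN/DNS setup.
host = "RUKSQLRS01"
try:
    print(f"{host} resolves to {socket.gethostbyname(host)}")
except socket.gaierror as exc:
    print(f"Cannot resolve {host}: {exc}")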

How to install and scrape metrics for NGINX and MSSQL in Prometheus

I am trying to set up the Prometheus monitoring tool for a project. While I am able to configure Prometheus to scrape server metrics with node_exporter and DNS with blackbox_exporter, I am having a hard time setting up metrics for NGINX and MSSQL. I would appreciate any resources or walkthrough covering the exporters for both, as well as the installation process for Prometheus to scrape the data. The project is hosted on the DigitalOcean cloud.
Both appear straightforward:
- The NGINX exporter
- The official Microsoft SQL Server exporter
Please update your question and explain:
- exactly what you tried (include code and config)
- what errors you received
Example
This would be better under e.g. Docker Compose, but...
nginx.conf:
events {
    use epoll;
    worker_connections 128;
}

http {
    server {
        location = /basic_status {
            stub_status;
        }
    }
}
Run NGINX (defaults to :80):
docker run \
--interactive --tty --rm \
--net=host \
--volume=${PWD}/nginx.conf:/etc/nginx/nginx.conf:ro \
nginx
Test it:
curl \
--request GET \
http://localhost/basic_status
Run Exporter:
docker run \
--interactive --tty --rm \
--net=host \
nginx/nginx-prometheus-exporter \
-nginx.scrape-uri=http://localhost/basic_status
Test it:
curl \
--request GET \
http://localhost:9113/metrics
Yields:
# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 1
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
# HELP nginx_connections_handled Handled client connections
# TYPE nginx_connections_handled counter
nginx_connections_handled 1
# HELP nginx_connections_reading Connections where NGINX is reading the request header
# TYPE nginx_connections_reading gauge
nginx_connections_reading 0
# HELP nginx_connections_waiting Idle client connections
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 0
# HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
# TYPE nginx_connections_writing gauge
nginx_connections_writing 1
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 2
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
# HELP nginxexporter_build_info Exporter build information
# TYPE nginxexporter_build_info gauge
nginxexporter_build_info{commit="5f88afbd906baae02edfbab4f5715e06d88538a0",date="2021-03-22T20:16:09Z",version="0.9.0"} 1
prometheus.yml:
scrape_configs:
  # Self
  - job_name: "prometheus-server"
    static_configs:
      - targets:
          - "localhost:9090"
  # NGINX
  - job_name: "nginx"
    static_configs:
      - targets:
          - "localhost:9113"
Then:
docker run \
--interactive --tty --rm \
--net=host \
--volume=${PWD}/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus:v2.26.0 \
--config.file=/etc/prometheus/prometheus.yml
Prometheus then shows both scrape jobs on its Targets page, and the nginx metrics can be plotted on the Graph page.

Sqoop export failing while exporting parquet files from S3 to SQL Server

I am trying to export a parquet file from S3 to SQL Server using Sqoop and I get this error:
19/07/09 16:12:57 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:s3://mybucket/data-lake/serving-zone/part-00002-b5a1da42.snappy.parquet
Check that JARs for s3 datasets are on the classpath
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:s3://mybucket/data-lake/serving-zone/part-00002-b5a1da42.snappy.parquet
Check that JARs for s3 datasets are on the classpath
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:128)
at org.kitesdk.data.Datasets.load(Datasets.java:103)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:83)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:434)
at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
The dataset is present at the above location and there is no issue with the path URI; exporting a CSV file from the same path worked.
Below is my Sqoop Export Command:
sqoop export --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --connection-manager org.apache.sqoop.manager.SQLServerManager \
    --connect "jdbc:sqlserver://localhost:1433;databaseName=salesdb" \
    --table DimEmployee_test --num-mappers 128 \
    --export-dir s3://mybucket/data-lake/serving-zone/part-00002-b5a1da42.snappy.parquet \
    --username db-user --password mypassword
Your --connect URI seems awkward; try using this format instead (note that it requires the jTDS driver rather than Microsoft's):
jdbc:jtds:sqlserver://<HOST>:<PORT>/<DATABASE>
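Note that the jTDS URL only swaps the driver; it does not touch the Kite dataset:s3 lookup that is actually failing. As an alternative sketch (not from the original answer), reading the parquet files with Spark and appending them over plain JDBC sidesteps Sqoop's Kite-based parquet handling entirely; the bucket, table, and credentials below are taken from the question:
from pyspark.sql import SparkSession

# Sketch: load the parquet data from S3 and write it to SQL Server over JDBC,
# avoiding the Kite dataset layer Sqoop uses for parquet. Assumes the mssql
# JDBC jar and the S3 filesystem jars are on the Spark classpath.
spark = SparkSession.builder.appName("s3-parquet-to-sqlserver").getOrCreate()

df = spark.read.parquet("s3://mybucket/data-lake/serving-zone/")
df.write.format("jdbc") \
    .option("url", "jdbc:sqlserver://localhost:1433;databaseName=salesdb") \
    .option("dbtable", "DimEmployee_test") \
    .option("user", "db-user") \
    .option("password", "mypassword") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("append") \
    .save()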

Query SQL Server from Spark Scala - How to?

Env: Spark 1.6 with Scala, Cloudera
SQL Server 2012, version 11.0
I am trying to query SQL Server from Spark.
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ConnTest extends App {
  val conf = new SparkConf()
  val sc = new SparkContext(conf.setAppName("Spark Ingestion").setMaster("local[*]"))
  val sqlcontext = new SQLContext(sc)
  val prop = new Properties()
  val url2 = "jdbc:sqlserver://xxx.xxx.xxx:1511;user=username;password=mypassword;database=SessionMonitor"
  prop.setProperty("user", "username")
  prop.setProperty("password", "mypassword")
  val test = sqlcontext.read.jdbc(url2, "Service", prop)
  // Register the DataFrame as a temp table; without this, sqlcontext.sql
  // has no table named "Service" to resolve.
  test.registerTempTable("Service")
  val dd = sqlcontext.sql("select count(*) as TOT from Service")
  dd.foreach(println)
}
My pom.xml has this dependency:
<!-- https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc -->
<dependency>
    <groupId>com.microsoft.sqlserver</groupId>
    <artifactId>mssql-jdbc</artifactId>
    <version>6.1.0.jre8</version>
</dependency>
I did not download any jar file separately, install a jar into the local Maven repository, or add a jar to the classpath. My Hadoop cluster has no connection to the internet. After building the Maven package, I tried to submit using
spark-submit --class ConnTest /Hadoopshare/tmp/sqldb-1.0-SNAPSHOT.jar
Error:
Exception in thread "main" java.sql.SQLException: No suitable driver
This should be added to your code (the question targets SQL Server, so the SQL Server driver class is the one needed):
prop.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
The same option works with the DataFrame reader:
val jdbcDF = sqlContext.read
  .format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://<Servername>:1433;databaseName=<DatabaseName>")
  .option("dbtable", "(SELECT id, name FROM partner) tmp")
  .option("user", "username")
  .option("password", "******")
  .load()
Hope this works.
