Export Data from Hadoop using sql-spark-connector (Apache) - sql-server

I am trying to export data from Hadoop to MS SQL using the Apache Spark SQL Connector, as instructed here (sql-spark-connector), but it fails with the exception: java.lang.NoSuchMethodError: com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(Lcom/microsoft/sqlserver/jdbc/ISQLServerBulkRecord;)V
According to the official documentation (Supported Versions), my environment should be supported.
My Development Environment:
Hadoop Version: 2.7.0
Spark Version: 2.4.5
Scala Version: 2.11.12
MS SQL Version: 2016
My Code:
package com.company.test

import org.apache.spark.sql.SparkSession

object TestETL {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .getOrCreate()

    import spark.implicits._

    // create DataFrame
    val export_df = Seq(1, 2, 3).toDF("id")
    export_df.show(5)

    // Connection string
    val server_name = "jdbc:sqlserver://ip_address:port"
    val database_name = "database"
    val url = server_name + ";" + "databaseName=" + database_name + ";"

    export_df.write
      .format("com.microsoft.sqlserver.jdbc.spark")
      .mode("append")
      .option("url", url)
      .option("dbtable", "export_test")
      .option("user", "username")
      .option("password", "password")
      .save()
  }
}
My SBT
build.sbt
The command line I executed:
/mapr/abc.company.com/user/dir/spark-2.4.5/bin/spark-submit --class com.company.test.TestETL /mapr/abc.company.com/user/dir/project/TestSparkSqlConnector.jar
JDBC Exception
I decompiled mssql-jdbc-8.2.0.jre8.jar to check whether it is missing the SQLServerBulkCopy.writeToServer method implementation, but that doesn't seem to be the case.
Any insights on how I can fix this?

It is a compatibility error. Please refer to this link; it explains the error. Otherwise, just choose compatible versions: GitHub link
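As a concrete illustration, here is a minimal build.sbt sketch that keeps the connector and the JDBC driver on a matching pair of versions. The coordinates and version numbers below are assumptions chosen for illustration only; take the actual compatible pair for Spark 2.4 / Scala 2.11 from the compatibility table in the linked repository.
// build.sbt -- sketch only, versions are placeholders to be checked against the connector's README
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // keep the connector and mssql-jdbc on the pair the README lists as compatible;
  // otherwise the connector may call a writeToServer signature the driver on the
  // classpath does not provide, which is what the NoSuchMethodError above indicates
  "com.microsoft.azure" % "spark-mssql-connector" % "1.0.2",     // assumed version
  "com.microsoft.sqlserver" % "mssql-jdbc" % "7.2.1.jre8"        // assumed version
)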

Related

backend error when put file with python connector

With the Python connector, SQL queries work fine. However, when I use a PUT instruction (PUT file:///localfile), I get an error:
TypeError: __init__() missing 1 required positional argument: 'backend'
With SnowSQL on the same server, it works.
The code used:
import snowflake.connector

ctx = snowflake.connector.connect(
    user='lincavo',
    account='*****',
    password='*******',
    database='DEV_POC_VELOS_DB',
    schema='DATALAB',
    role='dev_data_analyst'
)
cur = ctx.cursor()

FILE_NAME = "/home/lincoln/DEV/snowflake/100003097-SC.json"
sql = "PUT file:///home/lincoln/DEV/snowflake/100003097-SC.json #local_velos_json auto_compress=false"
cur.execute(sql)
snowflake-connector-python 2.6.2
Can you help me, please? Thanks.

Spark - Reading from SQL Server using com.microsoft.azure

I am trying to read from a table using com.microsoft.azure. Below is the code snippet:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import com.microsoft.azure.sqldb.spark.query._
import org.apache.spark.sql.functions.to_date

val spark = SparkSession.builder().master("local[*]").appName("DbApp").getOrCreate()
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")

val config = Config(Map(
  "url"            -> "jdbc:sqlserver://localhost:1433",
  "databaseName"   -> "Student",
  "dbTable"        -> "dbo.MemberDetail",
  "authentication" -> "SqlPassword",
  "user"           -> "test",
  "password"       -> "****"
))

val df = spark.sqlContext.read.sqlDB(config)
println("Total rows: " + df.count)
However, I am getting the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product$class
at com.microsoft.azure.sqldb.spark.config.SqlDBConfigBuilder.<init>(SqlDBConfigBuilder.scala:31)
at com.microsoft.azure.sqldb.spark.config.Config$.apply(Config.scala:254)
at com.microsoft.azure.sqldb.spark.config.Config$.apply(Config.scala:235)
at DbApp$.main(DbApp.scala:55)
at DbApp.main(DbApp.scala)
MSSQL JDBC Version: mssql-jdbc-7.2.2.jre8
azure-sqldb-spark version: 1.0.2
Could anyone kindly guide me on what I am doing wrong?
The driver class doesn't seem to be set in your config, nor specified anywhere else. Class.forName just validates the presence of the JDBC driver. That driver is also for microsoft.sqlserver, which is a different library.
Consider using this:
import com.microsoft.sqlserver.jdbc.SQLServerDriver
import java.util.Properties
val jdbcHostname = "localhost"
val jdbcPort = 1433
val jdbcDatabase = "Student"
val jdbcTable = "dbo.MemberDetail"
val MyDBUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
val MyDBProperties = new Properties()
MyDBProperties.put("user", "test")
MyDBProperties.put("password", "****")
MyDBProperties.setProperty("Driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
val df = spark.read.jdbc(MyDBUrl, jdbcTable, MyDBProperties)
This approach was most stable in my environment (using Databricks and Azure SQL DB).
A related knowledge base article is available here.
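As a small follow-up, and assuming the same Student/MemberDetail table from the question, the row count check used there works unchanged on the DataFrame returned by spark.read.jdbc:
// same check as in the question, applied to the df built above
println("Total rows: " + df.count)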
Since you are using azure-sqldb-spark to connect to SQL Server: all connection properties of the Microsoft JDBC Driver for SQL Server are supported in this connector. Add connection properties as fields in the com.microsoft.azure.sqldb.spark.config.Config object.
You don't need to load the JDBC driver with Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver") again.
Your code should be like this:
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

val config = Config(Map(
  "url"            -> "localhost",
  "databaseName"   -> "MyDatabase",
  "dbTable"        -> "dbo.Clients",
  "user"           -> "username",
  "password"       -> "*********",
  "connectTimeout" -> "5", // seconds
  "queryTimeout"   -> "5"  // seconds
))

val collection = sqlContext.read.sqlDB(config)
collection.show()
Please refer to:
Connect Spark to SQL DB using the connector
azure-sqldb-spark
Hope this helps.
This issue is due to a version conflict between com.microsoft.azure.sqldb and the Microsoft JDBC driver (the versions are mentioned in the question itself). After downloading com.microsoft.azure.sqldb with all its dependencies from the link below, it worked.
Note: com.microsoft.azure.sqldb works on Java 8, so I downgraded my Java runtime version.
Click here for com.microsoft.azure.sqldb with all dependencies

sbt run won't find external libraries

I have mssql defined as an external library like this in my build.sbt:
libraryDependencies ++= Seq(
  ...
  "com.typesafe.slick" %% "slick" % "3.3.2",
  "com.typesafe.slick" %% "slick-hikaricp" % "3.3.2",
  "com.microsoft.sqlserver" % "mssql-jdbc" % "7.4.1.jre8"
)
Now, in order to run my main object, I do the following
sbt
run
choose the main object
However, it now seems that the driver, i.e. the library, cannot be found.
java.lang.RuntimeException: Failed to get driver instance for jdbc
...
Caused by: java.sql.SQLException: No suitable driver
I assume it's simply not included in the class path. Any suggestions on how to fix this?
Edit: I use the following way to acquire a database connection.
import slick.basic.DatabaseConfig
import slick.jdbc.JdbcProfile

object DatabaseUtils {
  private val cfg: DatabaseConfig[JdbcProfile] = DatabaseConfig.forConfig("database")
  def db: JdbcProfile#Backend#Database = cfg.db
}
With this configuration
database = {
  profile = "slick.jdbc.SQLServerProfile$"
  db {
    host = "<IP>"
    port = <port>
    databaseName = "<dbname>"
    url = "jdbc:sqlserver://"${database.db.host}":"${database.db.port}";databaseName="${database.db.databaseName}
    user = "<user>"
    password = "<pass>"
  }
}
I think you are missing the database driver. From the documentation:
tsql {
  driver = "slick.driver.H2Driver$"
  db {
    connectionPool = disabled
    driver = "org.h2.Driver"
    url = "jdbc:h2:mem:tsql1;INIT=runscript from 'src/main/resources/create-schema.sql'"
  }
}
I don't use Slick, but in our project the driver class for MSSQL is com.microsoft.sqlserver.jdbc.SQLServerDriver.
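Applied to the configuration from the question, a minimal sketch could look like the one below. The driver key inside db mirrors the H2 example above; whether your Slick/HikariCP setup still needs it spelled out for mssql-jdbc is an assumption to verify locally.
database = {
  profile = "slick.jdbc.SQLServerProfile$"
  db {
    # driver class named in the answer above, mirroring the "driver" key of the H2 example
    driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url = "jdbc:sqlserver://<IP>:<port>;databaseName=<dbname>"
    user = "<user>"
    password = "<pass>"
  }
}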

How correctly connect to Oracle 12g database in Play Framework?

I am new to the Play Framework (Scala) and need some advice.
I use Scala 2.12 and Play Framework 2.6.20. I need to use several databases in my project. Right now I have connected a MySQL database as described in the documentation. How do I correctly connect the project to a remote Oracle 12g database?
application.conf:
db {
  mysql.driver = com.mysql.cj.jdbc.Driver
  mysql.url = "jdbc:mysql://host:port/database?characterEncoding=UTF-8"
  mysql.username = "username"
  mysql.password = "password"
}
First of all, I put the ojdbc8.jar file from the Oracle website into the lib folder.
Then I added libraryDependencies += "com.oracle" % "ojdbc8" % "12.1.0.1" to the build.sbt file. Finally, I wrote the settings to the application.conf file.
After that step I noticed an error in the terminal:
[error] (*:update) sbt.ResolveException: unresolved dependency: com.oracle#ojdbc8;12.1.0.1: not found
[error] Total time: 6 s, completed 10.11.2018 16:48:30
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
EDIT:
application.conf:
db {
  mysql.driver = com.mysql.cj.jdbc.Driver
  mysql.url = "jdbc:mysql://#host:#port/#database?characterEncoding=UTF-8"
  mysql.username = "#username"
  mysql.password = "#password"

  oracle.driver = oracle.jdbc.driver.OracleDriver
  oracle.url = "jdbc:oracle:thin:#host:#port/#sid"
  oracle.username = "#username"
  oracle.password = "#password"
}
ERROR:
play.api.UnexpectedException: Unexpected exception[CreationException: Unable to create injector, see the following errors:
1) No implementation for play.api.db.Database was bound.
while locating play.api.db.Database
for the 1st parameter of controllers.GetMarkersController.<init>(GetMarkersController.scala:14)
while locating controllers.GetMarkersController
for the 7th parameter of router.Routes.<init>(Routes.scala:45)
at play.api.inject.RoutesProvider$.bindingsFromConfiguration(BuiltinModule.scala:121):
Binding(class router.Routes to self) (via modules: com.google.inject.util.Modules$OverrideModule -> play.api.inject.guice.GuiceableModuleConversions$$anon$1)
GetMarkersController.scala:
package controllers

import javax.inject._
import akka.actor.ActorSystem
import play.api.Configuration
import play.api.mvc.{AbstractController, ControllerComponents}
import play.api.libs.ws._
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future, Promise}
import services._
import play.api.db.Database

class GetMarkersController @Inject()(db: Database, conf: Configuration, ws: WSClient, cc: ControllerComponents, actorSystem: ActorSystem)(implicit exec: ExecutionContext) extends AbstractController(cc) {

  def getMarkersValues(start_date: String, end_date: String) = Action.async {
    getValues(1.second, start_date, end_date).map {
      message => Ok(message)
    }
  }

  private def getValues(delayTime: FiniteDuration, start_date: String, end_date: String): Future[String] = {
    val promise: Promise[String] = Promise[String]()
    val service: GetMarkersService = new GetMarkersService(db)
    actorSystem.scheduler.scheduleOnce(delayTime) {
      promise.success(service.get_markers(start_date, end_date))
    }(actorSystem.dispatcher)
    promise.future
  }
}
You cannot access Oracle without credentials. You need to have an account with Oracle. Then add something like the following to your build.sbt file
resolvers += "Oracle" at "https://maven.oracle.com"
credentials += Credentials("Oracle", "maven.oracle.com", "username", "password")
More information about accessing the OTN: https://docs.oracle.com/middleware/1213/core/MAVEN/config_maven_repo.htm#MAVEN9012
If you have the jar hard-coded in the project, you don't need to include it as a managed dependency; see unmanagedDependencies: https://www.scala-sbt.org/1.x/docs/Library-Dependencies.html
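As a rough sketch, and assuming ojdbc8.jar has already been copied into the project's lib folder as described in the question, the managed dependency line can simply be dropped: sbt puts every jar under the unmanaged base directory (lib by default) on the classpath.
// build.sbt -- sketch; assumes ojdbc8.jar sits in <project root>/lib,
// which sbt scans by default (unmanagedBase), so no "com.oracle" % "ojdbc8"
// entry is required in libraryDependencies.
// Shown only to make the default explicit; adjust if the jar lives elsewhere:
unmanagedBase := baseDirectory.value / "lib"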

Spark query sql server

I'm trying to query SQL Server using Spark/Scala and am running into an issue.
Here is the code:
import org.apache.spark.{SparkConf, SparkContext}

object temp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("temp").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    val jdbcSqlConnStr = "jdbc:sqlserver://XXX.XXX.XXX.XXX;databaseName=test;user=XX;password=XXXXXXX;"
    val jdbcDbTable = "[test].dbo.[Persons]"

    val jdbcDF = sqlContext.read.format("jdbc").options(
      Map("url" -> jdbcSqlConnStr,
          "dbtable" -> jdbcDbTable)).load()

    jdbcDF.show(10)
    println("Complete")
  }
}
Below is the error. I assume it is complaining about the main method, but why? How do I fix it?
Error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:888)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:70)
at apachetika.temp$.main(sqltemp.scala:24)
at apachetika.temp.main(sqltemp.scala)
18/09/28 16:04:40 INFO spark.SparkContext: Invoking stop() from shutdown hook
As far as I can tell, this is due to a Scala version mismatch.
The spark-core dependency is compiled against Scala 2.11, not Scala 2.10. Build your project with Scala 2.11.8+, as sketched below.
Hope this helps.
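A minimal build.sbt sketch of that alignment is shown here; the Spark version number is an assumption for illustration, so match it to the Spark distribution you actually run:
// build.sbt -- sketch only; align scalaVersion with the Scala binary version of the Spark artifacts
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // %% resolves the _2.11 builds once scalaVersion is 2.11.x; the Spark version is assumed
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql"  % "2.2.0"
)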
