Spark query sql server - sql-server

i'm trying to query SQL server using Spark/scala and running into an issue
here is the code
import org.apache.spark.SparkContext
object temp {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("temp").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jdbcSqlConnStr = "jdbc:sqlserver://XXX.XXX.XXX.XXX;databaseName=test;user=XX;password=XXXXXXX;"
val jdbcDbTable = "[test].dbo.[Persons]"
val jdbcDF = sqlContext.read.format("jdbc").options(
Map("url" -> jdbcSqlConnStr,
"dbtable" -> jdbcDbTable)).load()
jdbcDF.show(10)
println("Complete")
}
}
below is the error and i assume it is complaining about main method - but why ?how to fix it.
error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:888)
at org.apache.spark.sql.SQLContext.(SQLContext.scala:70)
at apachetika.temp$.main(sqltemp.scala:24)
at apachetika.temp.main(sqltemp.scala)
18/09/28 16:04:40 INFO spark.SparkContext: Invoking stop() from shutdown hook

As far as I can tell this is due to a scala version mismatch
The library compiled with spark_core dependence with scala 2.11 instead of scala 2.10. Use scala 2.11.8+.
Hope this helps.

Related

Snowflake Scala Connector(snowpark)

I am new to scala and trying to create a snowflake connector in scala using snowflake snowpark scala library.
Here is my simple code
package com.abc.commons.rest.snowflake
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
object ScalaConnector {
def main(args: Array[String]): Unit = {
// Replace the <placeholders> below.
val configs = Map (
"URL" -> "https://xxx.snowflakecomputing.com:443",
"USER" -> "xxx",
"PASSWORD" -> "xxx",
"ROLE" -> "xxx",
"WAREHOUSE" -> "xxx",
"DB" -> "xxx",
"SCHEMA" -> "xxx"
)
val session = Session.builder.configs(configs).create
session.sql("show tables").show()
session.close();
}
}
I have provided correct credentials and then on running the above code in IntelliJ I am getting below errors:
Exception in thread "main" java.lang.ExceptionInInitializerError
at com.abc.commons.rest.snowflake.ScalaConnector$.main(ScalaConnector.scala:17)
at com.abc.commons.rest.snowflake.ScalaConnector.main(ScalaConnector.scala)
Caused by: java.lang.NullPointerException
at com.snowflake.snowpark.Session$.getActiveSession(Session.scala:1204)
at com.snowflake.snowpark.SnowparkClientException.<init>(SnowparkClientException.scala:15)
at com.snowflake.snowpark.internal.ErrorMessage$.createException(ErrorMessage.scala:380)
at com.snowflake.snowpark.internal.ErrorMessage$.MISC_SCALA_VERSION_NOT_SUPPORTED(ErrorMessage.scala:340)
at com.snowflake.snowpark.internal.Utils$.checkScalaVersionCompatibility(Utils.scala:243)
at com.snowflake.snowpark.internal.Utils$.checkScalaVersionCompatibility(Utils.scala:233)
at com.snowflake.snowpark.Session$.<init>(Session.scala:1129)
at com.snowflake.snowpark.Session$.<clinit>(Session.scala)
... 2 more
libraryDependencies I use in build.sbt
"com.snowflake" % "snowpark" % "1.4.0"
Can anyone point out the issue in my code?
There's no issue with the code, you're just using an unsupported Scala version. From your stacktrace:
MISC_SCALA_VERSION_NOT_SUPPORTED
We only support Scala 2.12 as explained here.

Spark - Reading from SQL Server using com.microsoft.azure

I am trying to read from a table using com.microsoft.azure. Below is the code snippet
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import com.microsoft.azure.sqldb.spark.query._
import org.apache.spark.sql.functions.to_date
val spark = SparkSession.builder().master("local[*]").appName("DbApp").getOrCreate()
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
val config = Config(Map(
"url" -> "jdbc:sqlserver://localhost:1433",
"databaseName" -> "Student",
"dbTable" -> "dbo.MemberDetail",
"authentication" -> "SqlPassword",
"user" -> "test",
"password" -> "****"
))
val df = spark.sqlContext.read.sqlDB(config)
println("Total rows: " + df.count)
However I am getting below error
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product$class
at com.microsoft.azure.sqldb.spark.config.SqlDBConfigBuilder.<init>(SqlDBConfigBuilder.scala:31)
at com.microsoft.azure.sqldb.spark.config.Config$.apply(Config.scala:254)
at com.microsoft.azure.sqldb.spark.config.Config$.apply(Config.scala:235)
at DbApp$.main(DbApp.scala:55)
at DbApp.main(DbApp.scala)
MSSQL JDBC Version: mssql-jdbc-7.2.2.jre8
azure-sqldb-spark version: 1.0.2
Could anyone kindly guide me what am I doing wrong.?
The class doesn't seem to be set in your config nor specified anywhere else. Class.forName just validates presence of the JDBC driver. The driver is also for microsoft.sqlserver, which is different library.
Consider using this:
import com.microsoft.sqlserver.jdbc.SQLServerDriver
import java.util.Properties
val jdbcHostname = "localhost"
val jdbcPort = 1433
val jdbcDatabase = "Student"
val jdbcTable = "dbo.MemberDetail"
val MyDBUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
val MyDBProperties = new Properties()
MyDBProperties.put("user", "test")
MyDBProperties.put("password", "****")
MyDBProperties.setProperty("Driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
val df = spark.read.jdbc(MyDBUrl, jdbcTable, MyDBProperties)
This approach was most stable in my environment (using Databricks and Azure SQL DB).
Related knowledgebase article available here.
Since you are using azure-sqldb-spark to connect to SQL server.
All connection properties in Microsoft JDBC Driver for SQL Server are supported in this connector. Add connection properties as fields in the com.microsoft.azure.sqldb.spark.config.Config object.
You don't need to create the jdbc driver Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver") again.
Your cold should be like this:
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
val config = Config(Map(
"url" -> "locaohost",
"databaseName" -> "MyDatabase",
"dbTable" -> "dbo.Clients",
"user" -> "username",
"password" -> "*********",
"connectTimeout" -> "5", //seconds
"queryTimeout" -> "5" //seconds
))
val collection = sqlContext.read.sqlDB(config)
collection.show()
Please ref:
Connect Spark to SQL DB using the connector
azure-sqldb-spark
Hope this helps.
This issue is due to version (versions are mentioned in the question itself) conflict between com.microsoft.azure.sqldb and com.microsoft.jdbc driver, after downloading com.microsoft.azure.sqldb with all its dependencies from below link it worked.
Note: com.microsoft.azure.sqldb works on Java 8, I downgraded my java runtime version.
Click here to com.microsoft.azure.sqldb with all dependencies

Stuck at: Could not find a suitable table factory

While playing around with Flink, I have been trying to upsert data into Elasticsearch. I'm having this error on my STDOUT:
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSinkFactory' in
the classpath.
Reason: Required context properties mismatch.
The following properties are requested:
connector.hosts=http://elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200
connector.index=transfers-sum
connector.key-null-literal=n/a
connector.property-version=1
connector.type=elasticsearch
connector.version=6
format.json-schema={ \"curr_careUnit\": {\"type\": \"text\"}, \"sum\": {\"type\": \"float\"} }
format.property-version=1
format.type=json
schema.0.data-type=VARCHAR(2147483647)
schema.0.name=curr_careUnit
schema.1.data-type=FLOAT
schema.1.name=sum
update-mode=upsert
The following factories have been considered:
org.apache.flink.streaming.connectors.kafka.Kafka09TableSourceSinkFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
at org.apache.flink.table.factories.TableFactoryService.filterByContext(TableFactoryService.java:322)
...
Here is what I have in my scala Flink code:
def main(args: Array[String]) {
// Create streaming execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
// Set properties per KafkaConsumer API
val properties = new Properties()
properties.setProperty("bootstrap.servers", "kafka.kafka:9092")
properties.setProperty("group.id", "test")
// Add Kafka source to environment
val myKConsumer = new FlinkKafkaConsumer010[String]("raw.data4", new SimpleStringSchema(), properties)
// Read from beginning of topic
myKConsumer.setStartFromEarliest()
val streamSource = env
.addSource(myKConsumer)
// Transform CSV (with a header row per Kafka event into a Transfers object
val streamTransfers = streamSource.map(new TransfersMapper())
// create a TableEnvironment
val tEnv = StreamTableEnvironment.create(env)
// register a Table
val tblTransfers: Table = tEnv.fromDataStream(streamTransfers)
tEnv.createTemporaryView("transfers", tblTransfers)
tEnv.connect(
new Elasticsearch()
.version("6")
.host("elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local", 9200, "http")
.index("transfers-sum")
.keyNullLiteral("n/a")
.withFormat(new Json().jsonSchema("{ \"curr_careUnit\": {\"type\": \"text\"}, \"sum\": {\"type\": \"float\"} }"))
.withSchema(new Schema()
.field("curr_careUnit", DataTypes.STRING())
.field("sum", DataTypes.FLOAT())
)
.inUpsertMode()
.createTemporaryTable("transfersSum")
val result = tEnv.sqlQuery(
"""
|SELECT curr_careUnit, sum(los)
|FROM transfers
|GROUP BY curr_careUnit
|""".stripMargin)
result.insertInto("transfersSum")
env.execute("Flink Streaming Demo Dump to Elasticsearch")
}
}
I am creating a fat jar and uploading it to my remote flink instance. Here is my build.gradle dependencies:
compile 'org.scala-lang:scala-library:2.11.12'
compile 'org.apache.flink:flink-scala_2.11:1.10.0'
compile 'org.apache.flink:flink-streaming-scala_2.11:1.10.0'
compile 'org.apache.flink:flink-connector-kafka-0.10_2.11:1.10.0'
compile 'org.apache.flink:flink-table-api-scala-bridge_2.11:1.10.0'
compile 'org.apache.flink:flink-connector-elasticsearch6_2.11:1.10.0'
compile 'org.apache.flink:flink-json:1.10.0'
compile 'com.fasterxml.jackson.core:jackson-core:2.10.1'
compile 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.10.1'
compile 'org.json4s:json4s-jackson_2.11:3.7.0-M1'
Here is how the farJar command is built for gradle:
jar {
from {
(configurations.compile).collect {
it.isDirectory() ? it : zipTree(it)
}
}
manifest {
attributes("Main-Class": "main" )
}
}
task fatJar(type: Jar) {
zip64 true
manifest {
attributes 'Main-Class': "flinkNamePull.Demo"
}
baseName = "${rootProject.name}"
from { configurations.compile.collect { it.isDirectory() ? it : zipTree(it) } }
with jar
}
Could anybody please help me to see what I am missing? I'm fairly new to Flink and data streaming in general. Hehe
Thank you in advance!
Is the list in The following factories have been considered: complete? Does it contain Elasticsearch6UpsertTableSinkFactory? If not as far as I can tell there is a problem with the service discovery dependencies.
How do you submit your job? Can you check if you have a file META-INF/services/org.apache.flink.table.factories.TableFactory in the uber jar with an entry for Elasticsearch6UpsertTableSinkFactory?
When using maven you have to add a transformer to properly merge service files:
<!-- The service transformer is needed to merge META-INF/services files -->
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
I don't know how do you do it in gradle.
EDIT:
Thanks to Arvid Heise
In gradle when using shadowJar plugin you can merge service files via:
// Merging Service Files
shadowJar {
mergeServiceFiles()
}
You should use the shadow plugin to create the fat jar instead of doing it manually.
In particular, you want to merge service descriptors.

How correctly connect to Oracle 12g database in Play Framework?

I am new in Play Framework (Scala) and need some advise.
I use Scala 2.12 and Play Framework 2.6.20. I need to use several databases in my project. Right now I connected MySQL database as it says in documentation. How correctly connect project to remote Oracle 12g database?
application.conf:
db {
mysql.driver = com.mysql.cj.jdbc.Driver
mysql.url = "jdbc:mysql://host:port/database?characterEncoding=UTF-8"
mysql.username = "username"
mysql.password = "password"
}
First of all to lib folder I put ojdbc8.jar file from oracle website.
Then add libraryDependencies += "com.oracle" % "ojdbc8" % "12.1.0.1" code to sbt.build file. Finally I wrote settings to aplication.conf file.
After that step I notice error in terminal:
[error] (*:update) sbt.ResolveException: unresolved dependency: com.oracle#ojdbc8;12.1.0.1: not found
[error] Total time: 6 s, completed 10.11.2018 16:48:30
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
EDIT:
application.conf:
db {
mysql.driver = com.mysql.cj.jdbc.Driver
mysql.url = "jdbc:mysql://#host:#port/#database?characterEncoding=UTF-8"
mysql.username = "#username"
mysql.password = "#password"
oracle.driver = oracle.jdbc.driver.OracleDriver
oracle.url = "jdbc:oracle:thin:#host:#port/#sid"
oracle.username = "#username"
oracle.password = "#password"
}
ERROR:
play.api.UnexpectedException: Unexpected exception[CreationException: Unable to create injector, see the following errors:
1) No implementation for play.api.db.Database was bound.
while locating play.api.db.Database
for the 1st parameter of controllers.GetMarkersController.<init>(GetMarkersController.scala:14)
while locating controllers.GetMarkersController
for the 7th parameter of router.Routes.<init>(Routes.scala:45)
at play.api.inject.RoutesProvider$.bindingsFromConfiguration(BuiltinModule.scala:121):
Binding(class router.Routes to self) (via modules: com.google.inject.util.Modules$OverrideModule -> play.api.inject.guice.GuiceableModuleConversions$$anon$1)
GetMarkersController.scala:
package controllers
import javax.inject._
import akka.actor.ActorSystem
import play.api.Configuration
import play.api.mvc.{AbstractController, ControllerComponents}
import play.api.libs.ws._
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future, Promise}
import services._
import play.api.db.Database
class GetMarkersController #Inject()(db: Database, conf: Configuration, ws: WSClient, cc: ControllerComponents, actorSystem: ActorSystem)(implicit exec: ExecutionContext) extends AbstractController(cc) {
def getMarkersValues(start_date: String, end_date: String) = Action.async {
getValues(1.second, start_date: String, end_date: String).map {
message => Ok(message)
}
}
private def getValues(delayTime: FiniteDuration, start_date: String, end_date: String): Future[String] = {
val promise: Promise[String] = Promise[String]()
val service: GetMarkersService = new GetMarkersService(db)
actorSystem.scheduler.scheduleOnce(delayTime) {
promise.success(service.get_markers(start_date, end_date))
}(actorSystem.dispatcher)
promise.future
}
}
You cannot access Oracle without credentials. You need to have an account with Oracle. Then add something like the following to your build.sbt file
resolvers += "Oracle" at "https://maven.oracle.com"
credentials += Credentials("Oracle", "maven.oracle.com", "username", "password")
More information about accessing the OTN: https://docs.oracle.com/middleware/1213/core/MAVEN/config_maven_repo.htm#MAVEN9012
If you have the hard coded jar, you don't need to include as a dependency. See unmanagedDependencies https://www.scala-sbt.org/1.x/docs/Library-Dependencies.html

Error in Jenkins using SQL Server Driver

I'm trying to run a Groovy script to connect to Microsoft SQL Server within Jenkins and insert new data . I want to use the SQL Server driver and I placed the driver in Jenkins\war\WEB-INF\lib. I get an error when I Tried to run the following code in the build step - Execute Groovy Script:
import groovy.sql.Sql
import com.microsoft.sqlserver.jdbc.SQLServerDriver
class Connection {
def sqlConnection
def route = "xxxx"
def user = "xxxx"
def password = "xxxxx"
def driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
Connection(db){
this.route+=db.toString()
this.sqlConnection = Sql.newInstance( route, user, password, driver )
}
static main(args) {
Connection con = new Connection("nameDataBase")
}
}
The error is:
1 error org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
E:\Jenkins\workspace\pruebaDB\hudson1035758401251924782.groovy: 2:
unable to resolve class com.microsoft.sqlserver.jdbc.SQLServerDriver # line 2, column 1.
import com.microsoft.sqlserver.jdbc.SQLServerDriver`
The most evident case of such exception is that necessary class not yet in classpath during script execution.
During a which build step script is executed?
If my assumptions are right and you can't shift script execution time to the other build step (when all necessary classes/libs will be in the cp) try to use class loader.
Here is a few links which should help you with this:
class lodding fun
class loading in groovy

Resources