Spark - jdbc write fails in Yarn cluster mode but works in spark-shell - sql-server

I am using Spark 1.6.2, Hadoop 2.6, Scala 2.10.5 and Java 1.7
I am using JDBC to read data from MSSQL and this works without any problem:
val hqlContext = new HiveContext(sc)
val url = "jdbc:sqlserver://1.1.1.1:1111;database=CIQOwnershipProcessing;user=OwnershipUser;password=Ownership123"
val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
val df1 = hqlContext.read.format("jdbc").options(
  Map("url" -> url, "driver" -> driver,
    "dbtable" -> "(select * from OwnershipStandardization_PositionSequence_tbl) as ps")).load()
And, while writing the dataframe back to MSSQL, I am using the JDBC write as shown below. This works fine in spark-shell but fails when I do spark-submit in yarn-cluster mode. What am I missing?
val prop = new java.util.Properties
df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop",prop)
This is what my spark-submit command looks like. As you can see, I am passing the SQLJDBC jar path too, and I have also specified the jdbc jar path in the "spark.executor.extraClassPath" property in spark-defaults.conf on all nodes of the cluster. Since the JDBC read is working, I doubt it has anything to do with the classpath.
spark-submit --class com.spgmi.csd.OshpStdCarryOver --master yarn --deploy-mode cluster --conf spark.yarn.executor.memoryOverhead=2048 --num-executors 1 --executor-cores 2 --driver-memory 3g --executor-memory 8g --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,/usr/share/java/sqljdbc_4.1/enu/sqljdbc41.jar --files $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/lib/spark-poc2-17.1.0.jar
The error thrown in the Yarn-Cluster mode is:
17/01/05 10:21:31 ERROR yarn.ApplicationMaster: User class threw
exception: java.lang.InstantiationException:
org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper
java.lang.InstantiationException:
org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper
at java.lang.Class.newInstance(Class.java:368)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:46)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:53)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278)
at com.spgmi.csd.OshpStdCarryOver$.main(SparkOshpStdCarryOver.scala:175)
at com.spgmi.csd.OshpStdCarryOver.main(SparkOshpStdCarryOver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)

I was facing the same issue. I resolved it by setting the driver class explicitly in the connection properties:
val prop = new java.util.Properties
prop.setProperty("driver", "com.mysql.jdbc.Driver")
Now pass this prop in:
df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop", prop)
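The answerer's environment was MySQL; for the SQL Server setup in the question, the same fix would use the driver class already shown in the read path. A minimal sketch, reusing the question's url and target table:
val prop = new java.util.Properties
prop.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop", prop)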

Your problem feels very similar to SPARK-14204 and SPARK-14162 -- although that bug was supposed to be fixed in Spark 1.6.2 (?!)
With a Type 4 JDBC driver you should not have to explicitly mention the "driver" property; the JAR should automatically register the URL prefix that it supports (here jdbc:sqlserver:).
But because of the bug, the Spark JDBC module may not use that "registration" to find the driver that implicitly matches the URL.
In other words: for reading, you force the "driver" property and the connection works; for writing, you don't force it, and it does not work. Aha!

Related

Unable to use extended_choice_parameter.ExtendedChoiceParameterValue for security reasons in jenkins

Please find below the error thrown while trying to pass the parameters during the build:
java.lang.UnsupportedOperationException: Refusing to marshal com.cwctravel.hudson.plugins.extended_choice_parameter.ExtendedChoiceParameterValue for security reasons; see https://jenkins.io/redirect/class-filter/
at hudson.util.XStream2$BlacklistedTypesConverter.marshal(XStream2.java:541)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:64)
at com.thoughtworks.xstream.converters.collections.CollectionConverter.marshal(CollectionConverter.java:74)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84)
at hudson.util.RobustReflectionConverter.marshallField(RobustReflectionConverter.java:264)
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:251)
Caused: java.lang.RuntimeException: Failed to serialize hudson.model.ParametersAction#parameters for class hudson.model.ParametersAction
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:255)
at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:223)
at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138)
at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:209)
at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:64)
at com.thoughtworks.xstream.converters.collections.CollectionConverter.marshal(CollectionConverter.java:74)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84)
at hudson.util.RobustReflectionConverter.marshallField(RobustReflectionConverter.java:264)
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:251)
Caused: java.lang.RuntimeException: Failed to serialize hudson.model.Actionable#actions for class org.jenkinsci.plugins.workflow.job.WorkflowRun
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:255)
at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:223)
at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138)
at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:209)
at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:82)
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37)
at com.thoughtworks.xstream.XStream.marshal(XStream.java:1026)
at com.thoughtworks.xstream.XStream.marshal(XStream.java:1015)
at com.thoughtworks.xstream.XStream.toXML(XStream.java:988)
at hudson.XmlFile.write(XmlFile.java:195)
at hudson.model.Run.save(Run.java:2077)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.forRun(EnvActionImpl.java:136)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:149)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:142)
at org.jenkinsci.plugins.workflow.cps.CpsScript.getProperty(CpsScript.java:121)
at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:174)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.getProperty(ScriptBytecodeAdapter.java:456)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:355)
at org.kohsuke.groovy.sandbox.GroovyInterceptor.onGetProperty(GroovyInterceptor.java:68)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:354)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:353)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:357)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.getProperty(SandboxInvoker.java:29)
at com.cloudbees.groovy.cps.impl.PropertyAccessBlock.rawGet(PropertyAccessBlock.java:20)
Caused: java.io.IOException
at hudson.XmlFile.write(XmlFile.java:202)
at hudson.model.Run.save(Run.java:2077)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.forRun(EnvActionImpl.java:136)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:149)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:142)
at org.jenkinsci.plugins.workflow.cps.CpsScript.getProperty(CpsScript.java:121)
at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:174)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.getProperty(ScriptBytecodeAdapter.java:456)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:355)
at org.kohsuke.groovy.sandbox.GroovyInterceptor.onGetProperty(GroovyInterceptor.java:68)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:354)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:353)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:357)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.getProperty(SandboxInvoker.java:29)
at com.cloudbees.groovy.cps.impl.PropertyAccessBlock.rawGet(PropertyAccessBlock.java:20)
at WorkflowScript.run(WorkflowScript:8)
at ___cps.transform___(Native Method)
at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.get(PropertyishBlock.java:74)
at com.cloudbees.groovy.cps.LValueBlock$GetAdapter.receive(LValueBlock.java:30)
at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.fixName(PropertyishBlock.java:66)
at jdk.internal.reflect.GeneratedMethodAccessor629.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
at com.cloudbees.groovy.cps.Next.step(Next.java:83)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:237)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:331)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:243)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:231)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:136)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Finished: FAILURE
We have started to see a security issue related to extended choice parameters in Jenkins, which take a JSON array as input. Please find the error above. Can you please help with this? FYI, there was no Jenkins or plugin update done recently. This started happening all of a sudden (2 weeks back), and all the "extended choice parameters" got deleted without any clue in the Jenkins build.
Extended Choice Parameter plugin version: 0.78
Jenkins version: 2.263.1
The issue was fixed by replacing the Extended Choice Parameter jar file with the latest version, which has the fix for the extended_choice_parameter.ExtendedChoiceParameterValue parameters. Also, make sure to whitelist the class by adding
export JDK_JAVA_OPTIONS="-Dhudson.remoting.ClassFilter=com.cwctravel.hudson.plugins.extended_choice_parameter.ExtendedChoiceParameterValue"
to the .bash_profile of your Jenkins machine.

[Flink] Task manager initialization failed

I am new to Flink. I am trying to run the Flink example on my local PC (Windows).
However, after I run start-cluster.bat and log in to the dashboard, it shows the task manager count is 0.
I checked the log and it seems initialization fails:
2020-02-21 23:03:14,202 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - TaskManager initialization failed.
org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerSecurely$2(TaskManagerRunner.java:322)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerSecurely(TaskManagerRunner.java:321)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:287)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback keys: []) is not set
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
at java.util.Arrays$ArrayList.forEach(Arrays.java:3880)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
... 7 more
2020-02-21 23:03:14,217 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
Basically, it looks like the required option 'taskmanager.cpu.cores' is not set. However, I can't find this property in flink-conf.yaml or in the documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html) either.
I am using Flink 1.10.0. Any help would be highly appreciated!
That configuration option is intended for internal use only -- it shouldn't be user configured, which is why it isn't documented.
The Windows start-cluster.bat is failing because of a bug introduced in Flink 1.10. See https://jira.apache.org/jira/browse/FLINK-15925.
One workaround is to use the bash script, start-cluster.sh, instead.
See also this mailing list thread: https://lists.apache.org/thread.html/r7693d0c06ac5ced9a34597c662bcf37b34ef8e799c32cc0edee373b2%40%3Cdev.flink.apache.org%3E
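If a bash environment is available on your Windows machine (for example WSL or Cygwin; this is an assumption about your setup), the workaround amounts to:
# run from the Flink distribution root in a bash shell (e.g. WSL or Cygwin)
./bin/start-cluster.sh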

Flink: Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath

I am using Flink's Table API. I receive data from Kafka, register it as a table, process it with a SQL statement, and finally convert the result back to a stream and write it to a directory. The code looks like this:
def main(args: Array[String]): Unit = {
  val sEnv = StreamExecutionEnvironment.getExecutionEnvironment
  sEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  val tEnv = TableEnvironment.getTableEnvironment(sEnv)
  tEnv.connect(
    new Kafka()
      .version("0.11")
      .topic("user")
      .startFromEarliest()
      .property("zookeeper.connect", "")
      .property("bootstrap.servers", "")
  )
    .withFormat(
      new Json()
        .failOnMissingField(false)
        .deriveSchema() // use the table's schema
    )
    .withSchema(
      new Schema()
        .field("username_skey", Types.STRING)
    )
    .inAppendMode()
    .registerTableSource("user")
  val userTest: Table = tEnv.sqlQuery(
    """
      select ** from ** join **""".stripMargin)
  val endStream = tEnv.toRetractStream[Row](userTest)
  endStream.writeAsText("/tmp/sqlres", WriteMode.OVERWRITE)
  sEnv.execute("Test_New_Sign_Student")
}
I was successful in the local test, but when I submit the job to the cluster, I get the following error:
=======================================================
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:426)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath.
Reason: No factory implements 'org.apache.flink.table.factories.DeserializationSchemaFactory'.
The following properties are requested:
connector.properties.0.key=zookeeper.connect
....
schema.9.name=roles
schema.9.type=VARCHAR
update-mode=append
The following factories have been considered:
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
org.apache.flink.streaming.connectors.kafka.Kafka011TableSourceSinkFactory
at org.apache.flink.table.factories.TableFactoryService$.filterByFactoryClass(TableFactoryService.scala:176)
at org.apache.flink.table.factories.TableFactoryService$.findInternal(TableFactoryService.scala:125)
at org.apache.flink.table.factories.TableFactoryService$.find(TableFactoryService.scala:100)
at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.scala)
at org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.getDeserializationSchema(KafkaTableSourceSinkFactoryBase.java:259)
at org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.createStreamTableSource(KafkaTableSourceSinkFactoryBase.java:144)
at org.apache.flink.table.factories.TableFactoryUtil$.findAndCreateTableSource(TableFactoryUtil.scala:50)
at org.apache.flink.table.descriptors.ConnectTableDescriptor.registerTableSource(ConnectTableDescriptor.scala:44)
at org.clay.test.Test_New_Sign_Student$.main(Test_New_Sign_Student.scala:64)
at org.clay.test.Test_New_Sign_Student.main(Test_New_Sign_Student.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
===================================
Can someone tell me what caused this? I am very confused about this.

If you are using the maven-shade-plugin, make sure the SPI transformer is included. Flink uses the Java Service Provider Interface (SPI) to discover source/sink connectors. Without this transformer, you will 100% encounter "org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory", which happened to me.
The Flink documentation points this out; search for "SPI" on this page:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#update-mode
<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
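For context, the transformers element belongs inside the maven-shade-plugin configuration in your pom.xml. A minimal sketch, with the plugin version omitted and the rest of the build assumed:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merges META-INF/services files so Flink's SPI lookup still finds the connector factories in the fat JAR -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>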
You have to add the JAR dependencies of the connectors (Kafka) and formats (JSON) that you are using to the classpath of your program, i.e., either build a fat JAR that includes them or provide them to the classpath of the Flink cluster by copying them into the ./lib folder.
Check the Flink documentation for links to download the respective dependencies.
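For the Kafka 0.11 connector and JSON format used in the question, the dependencies are likely along these lines; the artifact names are my assumption based on the factory names in the error, so verify the coordinates against the Flink docs for your Flink and Scala versions:
<!-- hypothetical coordinates; verify against the Flink docs for your version -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-json</artifactId>
  <version>${flink.version}</version>
</dependency>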
I have met the same problem; just adding the parameter --connector.type kafka when you run your application will solve this.

Zeppelin 0.7.2 org.apache.thrift.transport.TTransportException with Spark and HighCharts

If I add the artifact com.knockdata:spark-highcharts:0.6.4 to Zeppelin, it gives the error org.apache.thrift.transport.TTransportException
Even a simple example like this causes the error:
val x = Array(1,2,3,4)
val rdd = sc.parallelize(x)
The problem is definitely related to %spark as %md and %sh work. I have Spark version spark-2.1.0-bin-hadoop2.6.
There are no messages in the Spark logs. In zeppelin-interpreter-spark-root-(hostname).log it says:
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_ARRAY but was BEGIN_OBJECT at line 1 column 2
at com.google.gson.Gson.fromJson(Gson.java:802)
at com.google.gson.Gson.fromJson(Gson.java:757)
at com.google.gson.Gson.fromJson(Gson.java:706)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.convert(RemoteInterpreterServer.java:425)
org.apache.zeppelin.interpreter.InterpreterException: Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.wrapRefArray([Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
spark-highcharts:0.6.4 does not support Zeppelin 0.7.2. spark-highcharts has a dependency which clearly states which Zeppelin version to use, and it is not binary compatible; that is why the error is reported.
The version has been bumped to spark-highcharts:0.6.5 to support Zeppelin 0.7.2 (Spark 2.1), so loading com.knockdata:spark-highcharts:0.6.5 instead should resolve it.

Problems accessing datasource in Karaf 4 from Apache Camel

I've created a datasource in Karaf 4 (ServiceMix 7) that works from the karaf console - I can list tables, execute queries and so on.
My issue is when I try to use it from my Camel route.
Excerpt from my blueprint:
...
<reference filter="(osgi.jndi.service.name=jdbc/erp)" id="erpDataSource" interface="javax.sql.DataSource"/>
...
<to id="erpSelectQuery" uri="jdbc:erpDataSource"/>
...
It finds my dataSource but the blueprint can't start due to:
"java.lang.IllegalArgumentException: connectionFactory must be specified"
My datasource was created using:
jdbc:ds-create -dbName erp -dt DataSource -dn mysql -u erp -dc com.mysql.jdbc.Driver -p pre jdbc/erp
I'm at a loss here.
I have never done it via the jdbc command syntax. I followed the guides from the OPS4J wiki on datasource creation, which I like for one reason alone: this method creates a simple text file that can be administered by more than just a Java developer, i.e. it is easier to modify and troubleshoot.
For the sake of not subjecting my answer to link rot, I will just outline the procedure here.
Create a datasource configuration file (a simple text file) in /servicemixhome/etc with the following naming convention: org.ops4j.datasource-give_your_datasource_a_name.cfg.
In the config file, configure the appropriate settings; an example of mine looks like this:
osgi.jdbc.driver.class = com.mysql.jdbc.Driver
databaseName=dhData
user=foo
url=jdbc:mysql://192.199.199.199:3306/dhData
password=somepassword
dataSourceName=myDSName
Make sure you have installed the required OPS4J features:
feature:install pax-jdbc-mysql pax-jdbc-config
Now list the datasources using the following syntax:
karaf#root()> service:list javax.sql.DataSource
This will echo back something like the list below.
[javax.sql.DataSource]
----------------------
osgi.jdbc.driver.class = com.mysql.jdbc.Driver
databaseName=dhData
user=foo
url=jdbc:mysql://192.199.199.199:3306/dhData
password=somepassword
dataSourceName=myDSName
Provided by :
OPS4J Pax JDBC Config (216)
At this point you can reference the datasource using an OSGi filter in the blueprint.xml with the following syntax:
<reference filter="(&amp;(objectClass=javax.sql.DataSource)(dataSourceName=myDSName))" id="myData" interface="javax.sql.DataSource"/>
Then, to reference this as a property of a bean, for example, you could do the following:
<bean class="foo.bar" id="ImsCbrEventsBean">
<property name="dataSource" ref="myData"/>
</bean>
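Applied back to the Camel route in the question, the id of the reference is what the jdbc endpoint URI resolves in the registry. A sketch reusing the question's ids and assuming the datasource name configured in the .cfg above:
<reference filter="(&amp;(objectClass=javax.sql.DataSource)(dataSourceName=myDSName))" id="erpDataSource" interface="javax.sql.DataSource"/>
...
<to id="erpSelectQuery" uri="jdbc:erpDataSource"/>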
Keep in mind this creates a single connection to the database; you should really create a connection pool.
This can be done by installing the pax-jdbc-pool-dbcp2 feature (or any of the other connection pool features, but use only one at a time) and then modifying the datasource config file to carry the appropriate information, like the example below:
osgi.jdbc.driver.name = mysql
databaseName=dhData
user=foo
url=jdbc:mysql://192.199.199.199:3306/dhData
password=somepassword
dataSourceName=myDSName
jdbc.pool.maxTotal=32
jdbc.pool.blockWhenExhausted=true
jdbc.pool.lifo=false
jdbc.pool.maxIdle=24
jdbc.pool.maxWaitMillis=5000
jdbc.pool.minEvictableIdleTimeMillis=1800000
jdbc.pool.minIdle=16
jdbc.pool.numTestsPerEvictionRun=3
jdbc.pool.softMinEvictableIdleTimeMillis=-1
jdbc.pool.testOnBorrow=true
jdbc.pool.testOnCreate=true
jdbc.pool.testOnReturn=true
jdbc.pool.testWhileIdle=true
jdbc.pool.timeBetweenEvictionRunsMillis=3600000
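Installing the pooling feature follows the same feature:install pattern as before (a usage sketch for the dbcp2 pool named above):
feature:install pax-jdbc-pool-dbcp2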
