Stuck at: Could not find a suitable table factory - apache-flink

While playing around with Flink, I have been trying to upsert data into Elasticsearch. I'm having this error on my STDOUT:
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSinkFactory' in
the classpath.
Reason: Required context properties mismatch.
The following properties are requested:
connector.hosts=http://elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200
connector.index=transfers-sum
connector.key-null-literal=n/a
connector.property-version=1
connector.type=elasticsearch
connector.version=6
format.json-schema={ \"curr_careUnit\": {\"type\": \"text\"}, \"sum\": {\"type\": \"float\"} }
format.property-version=1
format.type=json
schema.0.data-type=VARCHAR(2147483647)
schema.0.name=curr_careUnit
schema.1.data-type=FLOAT
schema.1.name=sum
update-mode=upsert
The following factories have been considered:
org.apache.flink.streaming.connectors.kafka.Kafka09TableSourceSinkFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
at org.apache.flink.table.factories.TableFactoryService.filterByContext(TableFactoryService.java:322)
...
Here is what I have in my scala Flink code:
def main(args: Array[String]) {
// Create streaming execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
// Set properties per KafkaConsumer API
val properties = new Properties()
properties.setProperty("bootstrap.servers", "kafka.kafka:9092")
properties.setProperty("group.id", "test")
// Add Kafka source to environment
val myKConsumer = new FlinkKafkaConsumer010[String]("raw.data4", new SimpleStringSchema(), properties)
// Read from beginning of topic
myKConsumer.setStartFromEarliest()
val streamSource = env
.addSource(myKConsumer)
// Transform CSV (with a header row per Kafka event into a Transfers object
val streamTransfers = streamSource.map(new TransfersMapper())
// create a TableEnvironment
val tEnv = StreamTableEnvironment.create(env)
// register a Table
val tblTransfers: Table = tEnv.fromDataStream(streamTransfers)
tEnv.createTemporaryView("transfers", tblTransfers)
tEnv.connect(
new Elasticsearch()
.version("6")
.host("elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local", 9200, "http")
.index("transfers-sum")
.keyNullLiteral("n/a")
.withFormat(new Json().jsonSchema("{ \"curr_careUnit\": {\"type\": \"text\"}, \"sum\": {\"type\": \"float\"} }"))
.withSchema(new Schema()
.field("curr_careUnit", DataTypes.STRING())
.field("sum", DataTypes.FLOAT())
)
.inUpsertMode()
.createTemporaryTable("transfersSum")
val result = tEnv.sqlQuery(
"""
|SELECT curr_careUnit, sum(los)
|FROM transfers
|GROUP BY curr_careUnit
|""".stripMargin)
result.insertInto("transfersSum")
env.execute("Flink Streaming Demo Dump to Elasticsearch")
}
}
I am creating a fat jar and uploading it to my remote flink instance. Here is my build.gradle dependencies:
compile 'org.scala-lang:scala-library:2.11.12'
compile 'org.apache.flink:flink-scala_2.11:1.10.0'
compile 'org.apache.flink:flink-streaming-scala_2.11:1.10.0'
compile 'org.apache.flink:flink-connector-kafka-0.10_2.11:1.10.0'
compile 'org.apache.flink:flink-table-api-scala-bridge_2.11:1.10.0'
compile 'org.apache.flink:flink-connector-elasticsearch6_2.11:1.10.0'
compile 'org.apache.flink:flink-json:1.10.0'
compile 'com.fasterxml.jackson.core:jackson-core:2.10.1'
compile 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.10.1'
compile 'org.json4s:json4s-jackson_2.11:3.7.0-M1'
Here is how the farJar command is built for gradle:
jar {
from {
(configurations.compile).collect {
it.isDirectory() ? it : zipTree(it)
}
}
manifest {
attributes("Main-Class": "main" )
}
}
task fatJar(type: Jar) {
zip64 true
manifest {
attributes 'Main-Class': "flinkNamePull.Demo"
}
baseName = "${rootProject.name}"
from { configurations.compile.collect { it.isDirectory() ? it : zipTree(it) } }
with jar
}
Could anybody please help me to see what I am missing? I'm fairly new to Flink and data streaming in general. Hehe
Thank you in advance!

Is the list in The following factories have been considered: complete? Does it contain Elasticsearch6UpsertTableSinkFactory? If not as far as I can tell there is a problem with the service discovery dependencies.
How do you submit your job? Can you check if you have a file META-INF/services/org.apache.flink.table.factories.TableFactory in the uber jar with an entry for Elasticsearch6UpsertTableSinkFactory?
When using maven you have to add a transformer to properly merge service files:
<!-- The service transformer is needed to merge META-INF/services files -->
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
I don't know how do you do it in gradle.
EDIT:
Thanks to Arvid Heise
In gradle when using shadowJar plugin you can merge service files via:
// Merging Service Files
shadowJar {
mergeServiceFiles()
}

You should use the shadow plugin to create the fat jar instead of doing it manually.
In particular, you want to merge service descriptors.

Related

Zeppelin Python Flink cannot print to console

I'm using Kinesis Data Analytics Studio which provides a Zeppelin environment.
Very simple code:
%flink.pyflink
from pyflink.common.serialization import JsonRowDeserializationSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer
# create env = determine app runs locally or remotely
env = s_env or StreamExecutionEnvironment.get_execution_environment()
env.add_jars("file:///home/ec2-user/flink-sql-connector-kafka_2.12-1.13.5.jar")
# create a kafka consumer
deserialization_schema = JsonRowDeserializationSchema.builder() \
.type_info(type_info=Types.ROW_NAMED(
['id', 'name'],
[Types.INT(), Types.STRING()])
).build()
kafka_consumer = FlinkKafkaConsumer(
topics='nihao',
deserialization_schema=deserialization_schema,
properties={
'bootstrap.servers': 'kakfa-brokers:9092',
'group.id': 'group1'
})
kafka_consumer.set_start_from_earliest()
ds = env.add_source(kafka_consumer)
ds.print()
env.execute('job1')
I can get this working locally can sees change logs being produced to console. However I cannot get the same results in Zeppelin.
Also checked STDOUT in Flink web console task managers, nothing is there too.
Am I missing something? Searched for days and could not find anything on it.
I'm not 100% sure but I think you may need a sink to begin pulling data through the datastream, you could potentially use the included Print Sink Function

Use a config file in jenkins pipeline

I have a Jenkins pipeline job, which is parametrized. It will take a string input and pass that to a bat script.
Input : https://jazz-test3.com/web/projects/ABC
I wanted to include a config file, it should be placed in a location (may be in the Jenkins machine locally or what can be the best option?).
Sample config file: server_config.txt
test1 : https://jazz-test1.com
test2 : https://jazz-test2.com
test3 : https://jazz-test3.com
Once the user input is received , the job should access this file and check whether the server in the input link is present in the config file.
If present call the bat script if not present throw an error , the server is not supported.
Expected result:
Input 1 : https://jazz-test3.com/web/projects/ABC
Job should run
Input 2 : https://jazz-test4.com/web/projects/ABC
Job should fail with error message
What is the best way of achieving this ?
Can we do it directly from the pipeline or a separate script will be required to perform this ?
Thank you for the help!!
Jenkins supports configuration files stored on the controller via the config-file-provider plugin. See Manage Jenkins | Managed Files for the collection of managed configuration files.
You can add a JSON config file with id "test-servers" that stores your test server configuration:
{
"test1":"https://jazz-test-1.com",
"test2":"https://jazz-test-1.com"
}
Then the job pipeline would do something like:
servers = [:]
stage("Setup") {
steps {
configFileProvider([fileId:'test-servers', variable:'test_servers_json']) {
script {
// Read the json data from the file.
echo "Loading configured servers from file '${test_servers_json}' ..."
servers = readJSON(file:test_servers_json)
}
}
}
}
stage('Test') {
steps {
script {
// Check if 'server' is supported.
if (!servers.containsKey(params.server)) {
error "Unsupported server"
}
build job:'...', parameters:[...]
}
}
}

Failed to register Prometheus Gauge to Flink

I am trying to expose a Prometheus Gauge in a Flink app:
#transient def metricGroup: MetricGroup = getRuntimeContext.getMetricGroup
.addGroup("site", site)
.addGroup("sink", counterBaseName)
#transient var failedCounter: Counter = _
def expose(metricName: String, gaugeValue: Int, context: SinkFunction.Context[_]): Unit = {
try {
metricGroup.addGroup("hostname", metricName).gauge[Int, ScalaGauge[Int]]("test", ScalaGauge[Int](() => gaugeValue))
}
} catch {
case _: Throwable => failedCounter.inc()
}
}
The app runs locally just fine and expose the metrics without any problem.
While trying to move to production I encounter the following exception in Flink task manager:
WARN org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while registering metric. java.lang.NullPointerException
Not sure, what am I missing here.
Why does the local app expose metrics while on the cluster it fails to register gauge?
I use Prometheus in order to expose other metrics from Flink, for example, failedCounter (in the code) which is a counter.
This is the first time I exposed gauge in Flink so I bet something in my implementation is broken.
Please help.

Spark query sql server

i'm trying to query SQL server using Spark/scala and running into an issue
here is the code
import org.apache.spark.SparkContext
object temp {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("temp").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jdbcSqlConnStr = "jdbc:sqlserver://XXX.XXX.XXX.XXX;databaseName=test;user=XX;password=XXXXXXX;"
val jdbcDbTable = "[test].dbo.[Persons]"
val jdbcDF = sqlContext.read.format("jdbc").options(
Map("url" -> jdbcSqlConnStr,
"dbtable" -> jdbcDbTable)).load()
jdbcDF.show(10)
println("Complete")
}
}
below is the error and i assume it is complaining about main method - but why ?how to fix it.
error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:888)
at org.apache.spark.sql.SQLContext.(SQLContext.scala:70)
at apachetika.temp$.main(sqltemp.scala:24)
at apachetika.temp.main(sqltemp.scala)
18/09/28 16:04:40 INFO spark.SparkContext: Invoking stop() from shutdown hook
As far as I can tell this is due to a scala version mismatch
The library compiled with spark_core dependence with scala 2.11 instead of scala 2.10. Use scala 2.11.8+.
Hope this helps.

problem with loading a jar file which has dependency to another jar file

I have a problem while loading my jar file at run time.
My hotel.jar is loaded and a method of it (makeReservation) is invoked using the following code:
File file = new File("c:/ComponentFiles/hotel.jar");
URL jarfile = new URL("jar", "", "file:" + file.getAbsolutePath() + "!/");
URLClassLoader cl = URLClassLoader.newInstance(new URL[]{jarfile});
Class componentClass = cl.loadClass("HotelPackage.HotelMgt");
Object componentObject = componentClass.newInstance();
Method setMethod = componentClass.getDeclaredMethod("makeReservation", null);
setMethod.invoke(componentObject, null);
The problem is in the class HotelPackage.HotelMgt of my jar file , I have a class variable of another class (HotelPackage.Hotel) which is in another jar file.
I tried to open and load the other jar file with the same code as above but I receive the exception that cannot find the class def.:
Exception in thread "main" java.lang.NoClassDefFoundError: BeanPackage/Hotel
what is the solution?
You can specify dependencies between JARs by defining the Class-Path property in the JARs' manifest files. Then the JVM will load the dependency JARs automatically as needed.
More information is here: http://java.sun.com/docs/books/tutorial/deployment/jar/downman.html
Thanks, but I found another solution that really works. Since I know whole the component series that are going to work with each other, I load them all with one class loader instance (array of URLs). then the classloader itself manages the dependencies.

Resources