SnappyData + Zeppelin + Kafka streaming - error while creating streaming table - apache-zeppelin

I'm trying to create a SnappyData streaming table using Zeppelin.
I have an issue with the stream table definition, specifically with the 'rowConverter' argument.
The Zeppelin notebook is divided into a few paragraphs:
Paragraph 1:
import org.apache.spark.sql.Row
import org.apache.spark.sql.streaming.{SchemaDStream, StreamToRowsConverter}
class RowsConverter extends StreamToRowsConverter with Serializable {
  override def toRows(message: Any): Seq[Row] = {
    val log = message.asInstanceOf[String]
    val fields = log.split(",")
    val rows = Seq(Row.fromSeq(Seq(new java.sql.Timestamp(fields(0).toLong),
      fields(1),
      fields(2),
      fields(3),
      fields(4),
      fields(5).toDouble,
      fields(6)
    )))
    rows
  }
}
Paragraph 2:
snsc.sql(
  "CREATE STREAM TABLE adImpressionStream if not exists (sensor_id string, metric string) " +
  "using kafka_stream " +
  "options (storagelevel 'MEMORY_AND_DISK_SER_2', " +
  "rowConverter 'RowsConverter', " +
  "zkQuorum 'localhost:2181', " +
  "groupId 'streamConsumer', topics 'test')"
)
The first paragraph returns an error:
error: not found: type StreamToRowsConverter
class RowsConverter extends StreamToRowsConverter with Serializable {
^
<console>:13: error: not found: type Row
override def toRows(message: Any): Seq[Row] = {
^
<console>:16: error: not found: value Row
val rows = Seq(Row.fromSeq(Seq(new java.sql.Timestamp(fields(0).toLong),
The second paragraph returns:
java.lang.RuntimeException: Failed to load class : java.lang.ClassNotFoundException: RowsConverter
I have also tried the default code from the git repo:
snsc.sql("create stream table streamTable (userId string, clickStreamLog string) " +
"using kafka_stream options (" +
"storagelevel 'MEMORY_AND_DISK_SER_2', " +
" rowConverter 'io.snappydata.app.streaming.KafkaStreamToRowsConverter' ," +
"kafkaParams 'zookeeper.connect->localhost:2181;auto.offset.reset->smallest;group.id->myGroupId', " +
"topics 'test')")
but I get a similar error:
java.lang.RuntimeException: Failed to load class : java.lang.ClassNotFoundException: io.snappydata.app.streaming.KafkaStreamToRowsConverter
Could you help me with this issue?
Thank you a lot.

You need to provide your application-specific classes on the classpath. Please refer to the classpath setup step here; Zeppelin will pick up the classpath set in your spark-env.sh:
https://github.com/SnappyDataInc/snappy-poc#lets-get-this-going
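For illustration, one way to do that is to append the jar that contains RowsConverter to the classpath exported from conf/spark-env.sh; the exact variable name and jar path below are assumptions, not part of the original answer:
# conf/spark-env.sh sketch -- jar path is a placeholder
export SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/path/to/rowconverter.jar"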

Add the SnappyData interpreter to Apache Zeppelin as described here: https://snappydatainc.github.io/snappydata/howto/use_apache_zeppelin_with_snappydata/
This runs Zeppelin in the lead node so that the code executes in embedded mode. In particular, you need to add the required jars with the "-classpath" option in the cluster configuration.
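A minimal sketch of such a lead configuration, assuming the RowsConverter class has been packaged into a jar (the host name and jar path are placeholders):
# conf/leads -- one entry per lead node; -classpath adds the application jar to the lead's classpath
localhost -zeppelin.interpreter.enable=true -classpath=/path/to/rowconverter.jar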

Related

Error when connecting android (kotlin) app to database

I get the following error when connecting to my database:
" Caused by: org.postgresql.util.PSQLException: Something unusual has occurred to cause the driver to fail. Please report this exception."
package code.with.cal.timeronservicetutorial

import kotlinx.android.synthetic.main.activity_main.*
import java.sql.DriverManager

class ConnectionHelper {
    fun ConnectDB() {
        val jdbcUrl = "jdbc:postgresql://HOST/USER"
        // get the connection
        val connection = DriverManager.getConnection(jdbcUrl, "USER", "PW")
        // prints true if the connection is valid
        println(connection.isValid(0))
    }
}
I suspect I didn't include the right version of the dependency in my Gradle file, but I don't know how to find out which driver version I have. I added this one:
implementation 'org.postgresql:postgresql:42.2.5'

Apache Beam ReadFromKafka using Python runs in Flink but no published messages are passing through

I have a local cluster running in Minikube. My pipeline job is written in python and is a basic consumer of Kafka. My pipeline looks as follows:
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions, StandardOptions

def run():
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",
        "--flink_master=localhost:8081",
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000",
        "--streaming",
        "--flink_submit_uber_jar"
    ])
    options.view_as(SetupOptions).save_main_session = True
    options.view_as(StandardOptions).streaming = True
    with beam.Pipeline(options=options) as p:
        (p
         | 'Create words' >> ReadFromKafka(
             topics=['mullerstreamer'],
             consumer_config={
                 'bootstrap.servers': '192.168.49.1:9092,192.168.49.1:9093',
                 'auto.offset.reset': 'earliest',
                 'enable.auto.commit': 'true',
                 'group.id': 'BEAM-local'
             })
         | 'print' >> beam.Map(print)
        )

if __name__ == "__main__":
    run()
The Flink runner shows no records passing through in "Records received"
Am I missing something basic?
--environment_type=EXTERNAL means you are starting up the workers manually, and is primarily for internal testing. Does it work if you don't specify an environment_type/config at all?
def run(bootstrap_servers, topic, pipeline_args):
    bootstrap_servers = 'localhost:9092'
    topic = 'wordcount'
    pipeline_args.append('--flink_submit_uber_jar')
    pipeline_options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=localhost:8081",
        "--flink_version=1.12",
    ] + pipeline_args,
        save_main_session=True, streaming=True)
    with beam.Pipeline(options=pipeline_options) as pipeline:
        _ = (
            pipeline
            | ReadFromKafka(
                consumer_config={'bootstrap.servers': bootstrap_servers},
                topics=[topic])
            | beam.FlatMap(lambda kv: log_ride(kv[1])))
I'm facing another issue with the latest Apache Beam 2.30.0 and Flink 1.12.4:
2021/06/10 17:39:42 Initializing python harness: /opt/apache/beam/boot --id=1-2 --provision_endpoint=localhost:42353
2021/06/10 17:39:50 Failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc =
2021-06-10 17:39:53,076 WARN org.apache.flink.runtime.taskmanager.Task [] - [3]ReadFromKafka(beam:external:java:kafka:read:v1)/{KafkaIO.Read, Remove Kafka Metadata} -> [1]FlatMap(<lambda at kafka-taxi.py:88>) (1/1)#0 (9d941b13ae9f28fd1460bc242b7f6cc9) switched from RUNNING to FAILED.
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: No container running for id d727ca3c0690d949f9ed1da9c3435b3ab3af70b6b422dc82905eed2f74ec7a15

Flink adding rebalance to stream cause to job failure when StreamExecutionEnvironment set using TimeCharacteristic.IngestionTime

I am trying to run a streaming job that consumes messages from Kafka, transforms them, and sinks them to Cassandra.
The current code snippet is failing:
val env: StreamExecutionEnvironment = getExecutionEnv("dev")
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
.
.
.
.
val source = env.addSource(kafkaConsumer)
  .uid("kafkaSource")
  .rebalance
val transformedObjects = source.process(new EnrichEventWithIngestionTimestamp)
  .setParallelism(dataSinkParallelism)
sinker.apply(transformedObjects, dataSinkParallelism)

class EnrichEventWithIngestionTimestamp extends ProcessFunction[RawData, TransforemedObjects] {
  override def processElement(rawData: RawData,
                              context: ProcessFunction[RawData, TransforemedObjects]#Context,
                              collector: Collector[TransforemedObjects]): Unit = {
    val currentTimestamp = context.timerService().currentProcessingTime()
    context.timerService().registerProcessingTimeTimer(currentTimestamp)
    collector.collect(TransforemedObjects.fromRawData(rawData, currentTimestamp))
  }
}
but if rebalance is commented out, or the job is changed to use TimeCharacteristic.EventTime and watermark assignment, as in the following snippet, then it works:
val env: StreamExecutionEnvironment = getExecutionEnv("dev")
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
.
.
val source = env.addSource(kafkaConsumer)
  .uid("kafkaSource")
  .rebalance
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessRawDataTimestampExtractor[RawData](Time.seconds(1)))
val transformedObjects = source.map(rawData => TransforemedObjects.fromRawData(rawData))
  .setParallelism(dataSinkParallelism)
sinker.apply(transformedObjects, dataSinkParallelism)
The stack trace is:
java.lang.Exception: java.lang.RuntimeException: 1
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.checkThrowSourceExecutionException(SourceStreamTask.java:217)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.processInput(SourceStreamTask.java:133)
at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:301)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:406)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: 1
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:143)
at org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:45)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$AutomaticWatermarkContext.processAndCollect(StreamSourceContexts.java:176)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$AutomaticWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:194)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:91)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:156)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:203)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.getBufferBuilder(RecordWriter.java:246)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:169)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:154)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:120)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 16 more
Am I doing something wrong?
Or is there a limitation on using the rebalance function when the TimeCharacteristic is set to IngestionTime?
Thank you in advance...
Can you provide the Flink version that you are using?
Your issue seems to be related to this Jira ticket:
https://issues.apache.org/jira/browse/FLINK-14087
Did you use rebalance only once in your task? The RecordWriters may share the same ChannelSelector, which decides where a record will be forwarded. Your stack trace shows it is trying to select an out-of-bounds channel.

How correctly connect to Oracle 12g database in Play Framework?

I am new to Play Framework (Scala) and need some advice.
I use Scala 2.12 and Play Framework 2.6.20. I need to use several databases in my project. Right now I have connected a MySQL database as described in the documentation. How do I correctly connect the project to a remote Oracle 12g database?
application.conf:
db {
  mysql.driver = com.mysql.cj.jdbc.Driver
  mysql.url = "jdbc:mysql://host:port/database?characterEncoding=UTF-8"
  mysql.username = "username"
  mysql.password = "password"
}
First of all, I put the ojdbc8.jar file from the Oracle website into the lib folder.
Then I added libraryDependencies += "com.oracle" % "ojdbc8" % "12.1.0.1" to the build.sbt file. Finally, I wrote the settings in the application.conf file.
After that step I noticed this error in the terminal:
[error] (*:update) sbt.ResolveException: unresolved dependency: com.oracle#ojdbc8;12.1.0.1: not found
[error] Total time: 6 s, completed 10.11.2018 16:48:30
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
EDIT:
application.conf:
db {
  mysql.driver = com.mysql.cj.jdbc.Driver
  mysql.url = "jdbc:mysql://#host:#port/#database?characterEncoding=UTF-8"
  mysql.username = "#username"
  mysql.password = "#password"

  oracle.driver = oracle.jdbc.driver.OracleDriver
  oracle.url = "jdbc:oracle:thin:#host:#port/#sid"
  oracle.username = "#username"
  oracle.password = "#password"
}
ERROR:
play.api.UnexpectedException: Unexpected exception[CreationException: Unable to create injector, see the following errors:
1) No implementation for play.api.db.Database was bound.
while locating play.api.db.Database
for the 1st parameter of controllers.GetMarkersController.<init>(GetMarkersController.scala:14)
while locating controllers.GetMarkersController
for the 7th parameter of router.Routes.<init>(Routes.scala:45)
at play.api.inject.RoutesProvider$.bindingsFromConfiguration(BuiltinModule.scala:121):
Binding(class router.Routes to self) (via modules: com.google.inject.util.Modules$OverrideModule -> play.api.inject.guice.GuiceableModuleConversions$$anon$1)
GetMarkersController.scala:
package controllers

import javax.inject._
import akka.actor.ActorSystem
import play.api.Configuration
import play.api.mvc.{AbstractController, ControllerComponents}
import play.api.libs.ws._
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future, Promise}
import services._
import play.api.db.Database

class GetMarkersController @Inject()(db: Database, conf: Configuration, ws: WSClient, cc: ControllerComponents, actorSystem: ActorSystem)(implicit exec: ExecutionContext) extends AbstractController(cc) {
  def getMarkersValues(start_date: String, end_date: String) = Action.async {
    getValues(1.second, start_date: String, end_date: String).map {
      message => Ok(message)
    }
  }

  private def getValues(delayTime: FiniteDuration, start_date: String, end_date: String): Future[String] = {
    val promise: Promise[String] = Promise[String]()
    val service: GetMarkersService = new GetMarkersService(db)
    actorSystem.scheduler.scheduleOnce(delayTime) {
      promise.success(service.get_markers(start_date, end_date))
    }(actorSystem.dispatcher)
    promise.future
  }
}
You cannot access the Oracle Maven repository without credentials. You need an account with Oracle. Then add something like the following to your build.sbt file:
resolvers += "Oracle" at "https://maven.oracle.com"
credentials += Credentials("Oracle", "maven.oracle.com", "username", "password")
More information about accessing the OTN: https://docs.oracle.com/middleware/1213/core/MAVEN/config_maven_repo.htm#MAVEN9012
If you already have the jar on disk, you don't need to include it as a managed dependency; see unmanagedDependencies: https://www.scala-sbt.org/1.x/docs/Library-Dependencies.html
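For the unmanaged route, a minimal build.sbt sketch (the directory name is an assumption; by default sbt already picks up any jar placed in lib/, in which case the unresolved "com.oracle" % "ojdbc8" line can simply be removed):
// build.sbt sketch: only needed if the jar is kept somewhere other than the default lib/ folder
unmanagedBase := baseDirectory.value / "custom_lib"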

Apache Zeppelin - error: overloaded method value run with alternatives

I am trying to use the angular binding feature available in Apache Zeppelin in the following code:
val ab10 = z.sqlContext.sql("select " + z.angular("selectVari0") + " from MyDF")
ab10.toDF.registerTempTable("ab0")
z.angularBind("abb0", ab10)
val selvar = z.getInterpreterContext()
z.angularUnwatch("abb0")
z.angularWatch("abb0", (before: Object, after: Object) => {
  z.run(15, selvar)
})
I get the following error:
ab10: org.apache.spark.sql.DataFrame = [BMI: double]
warning: there was one deprecation warning; re-run with -deprecation for details
selvar: org.apache.zeppelin.interpreter.InterpreterContext = org.apache.zeppelin.interpreter.InterpreterContext@216b8218
<console>:31: error: overloaded method value run with alternatives:
(x$1: java.util.List[Object],x$2: org.apache.zeppelin.interpreter.InterpreterContext)Unit <and>
(x$1: String,x$2: String)Unit
cannot be applied to (Int, org.apache.zeppelin.interpreter.InterpreterContext)
z.run(15, selvar)
^
I tried another example from here. I got similar errors, and I was not able to find any documentation that addresses this error.
As stated in the error message, there are two overloaded .run methods, with the following signatures:
(x$1: java.util.List[Object],x$2: org.apache.zeppelin.interpreter.InterpreterContext)Unit <and>
(x$1: String,x$2: String)Unit
Please, try to call it like this:
import collection.JavaConverters._
z.run(List(15).asJava, selvar)
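For reference, the asJava conversion used above comes from scala.collection.JavaConverters and turns a Scala List into the java.util.List[Object] that the first overload expects; spelling the element type out makes that match explicit (the variable name below is illustrative):
import scala.collection.JavaConverters._
// Build the java.util.List[Object] of paragraph indexes expected by this z.run overload
val paragraphIndexes: java.util.List[Object] = List[Object](Integer.valueOf(15)).asJava
z.run(paragraphIndexes, selvar)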
