FlinkQueryableState: configuration issues on a local cluster - apache-flink

I am running flink from IDE. Storing data in the queryable is working,
but somehow when I query it, it throws an exception.
Exception
Failure(akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink#127.0.0.1:6123/), Path(/user/jobmanager)])
My code:
config.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,"localhost")
config.setString(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY,"6123")
#throws[Throwable]
def recover(failure: Throwable): Future[Array[Byte]] = if (failure.isInstanceOf[AssertionError]) return Futures.failed(failure)
else {
// At startup some failures are expected
// due to races. Make sure that they don't
// fail this test.
return Patterns.after(retryDelay, TEST_ACTOR_SYSTEM.scheduler, TEST_ACTOR_SYSTEM.dispatcher, new Callable[Future[Array[Byte]]]() {
#throws[Exception]
def call: Future[Array[Byte]] = return getKvStateWithRetries(queryName, key, serializedKey)
})
}
}
#SuppressWarnings(Array("unchecked"))
private def getKvStateWithRetries(queryName: String,
keyHash: Int,
serializedKey: Array[Byte]): Future[Array[Byte]] = {
val kvState = client.getKvState(jobID, queryName, keyHash, serializedKey)
kvState.recoverWith(recover(queryName, keyHash, serializedKey))
}
def onSuccess = new OnSuccess[Array[Byte]]() {
#throws(classOf[Throwable])
override def onSuccess(result: Array[Byte]): Unit = {
println("found record ")
val value = KvStateRequestSerializer.deserializeValue(result, valueSerializer)
println(value)
}
}
override def invoke(query: QueryMetaData): Unit = {
println("getting inside querystore"+query.record)
val serializedResult = flinkQuery.getResult(query.record, queryName)
serializedResult.onSuccess(onSuccess)
I am not spawning a new mini-cluster or cluster.submit
like https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/query/QueryableStateITCase.java
as I want to this in the same cluster in the same environment as main app running with env.execute. Is that step necessary.
From the documentation by default flink runs at localhost:6123
Is there problem with connection? Do I need to submit job in separate cluster?

After a lot of googling i found a solution.
I am using LocalStreamEnvironment and getting the same error, Until a found this thread RemoteEnv connect failed. The error described is for a different setup(not locally) but the gist example contained in the topic used for testing is creating the LocalFlinkMiniCluster with the parameter "useSingleActorSystem" set to false.
Looking at the implementation of LocalStreamEnvironment the MiniCluster is created with "useSingleActorSystem" set to true.
I simply created a class LocalQueryableStreamEnvironment extending LocalStreamEnvironment where the mini cluster is created with "useSingleActorSystem" set to true, and everything is working from IDE.
Now my code is as follow:
Configuration:
Configuration config = new Configuration();
config.setLong(TaskManagerOptions.MANAGED_MEMORY_SIZE, 6);
config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
config.setInteger(JobManagerOptions.WEB_PORT, JobManagerOptions.WEB_PORT.defaultValue());
config.setBoolean(QueryableStateOptions.SERVER_ENABLE, true);
config.setString(JobManagerOptions.ADDRESS, "localhost");
config.setInteger(JobManagerOptions.PORT,JobManagerOptions.PORT.defaultValue());
**config.setInteger(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, 2);**
NOTE: QueryableState only works with this config LOCAL_NUMBER_TASK_MANAGER set to value more then 1!
Instantiate/execute environment:
LocalQueryableStreamEnvironment env = LocalQueryableStreamEnvironment.createLocalEnvironment(3, config);
...
env.addSource(anySource)
.keyby(anyAtribute)
.flatmap(new UpdateMyStateToBeQueriedLaterMapper())
.addSink(..); //etc
...
env.execute("JobNameHere");
And to create the client:
final Configuration config = new Configuration();
config.setString(JobManagerOptions.ADDRESS, "localhost");
config.setInteger(JobManagerOptions.PORT, JobManagerOptions.PORT.defaultValue());
HighAvailabilityServices highAvailabilityServices = HighAvailabilityServicesUtils
.createHighAvailabilityServices(
config,
Executors.newSingleThreadScheduledExecutor(),
HighAvailabilityServicesUtils.AddressResolution.TRY_ADDRESS_RESOLUTION
);
return new QueryableStateClient(config,highAvailabilityServices);
For more info access:
Queryable States in ApacheFlink - Implementation
Queryable State Client with 1.3.0-rc0
My dependencies:
compile group: 'org.apache.flink', name: 'flink-java', version: '1.3.1'
compile group: 'org.apache.flink', name: 'flink-jdbc', version: '1.3.1'
compile group: 'org.apache.flink', name: 'flink-streaming-java_2.11', version: '1.3.1'
compile group: 'org.apache.flink', name: 'flink-clients_2.11', version: '1.3.1'
compile group: 'org.apache.flink', name: 'flink-cep_2.11', version: '1.3.1'
compile group: 'org.apache.flink', name: 'flink-connector-kafka-0.10_2.11', version: '1.3.1'
compile 'org.apache.flink:flink-runtime-web_2.11:1.3.1'

Related

I am getting an error in Kapt Debug Kotlin. I have update versions of dependencies in gradle file. still facing this issue

My app was running smoothly but I am getting this error now.I am getting an error in Kapt Debug Kotlin. I have update versions of dependencies in gradle file. still facing this issue. How it can be resolved? I saw somewhere to see your room database , dao and data class. still not able to figure out what is the issue.
The error is showing this file
ROOM DATABASE
#Database(entities = [Transaction::class], version = 1, exportSchema = false)
abstract class MoneyDatabase : RoomDatabase(){
abstract fun transactionListDao():transactionDetailDao
companion object {
// Singleton prevents multiple instances of database opening at the
// same time.
#Volatile
private var INSTANCE: MoneyDatabase? = null
fun getDatabase(context: Context): MoneyDatabase {
// if the INSTANCE is not null, then return it,
// if it is, then create the database
return INSTANCE ?: synchronized(this) {
val instance = Room.databaseBuilder(
context.applicationContext,
MoneyDatabase::class.java,
"transaction_database"
).build()
INSTANCE = instance
// return instance
instance
}
}
}
}
DAO
#Dao
interface transactionDetailDao {
#Insert(onConflict = OnConflictStrategy.IGNORE)
suspend fun insert(transaction : Transaction)
#Delete
suspend fun delete(transaction : Transaction)
#Update
suspend fun update(transaction: Transaction)
#Query("SELECT * FROM transaction_table ORDER BY id ASC")
fun getalltransaction(): LiveData<List<Transaction>>
}
DATA CLASS
enum class Transaction_type(){
Cash , debit , Credit
}
enum class Type(){
Income, Expense
}
#Entity(tableName = "transaction_table")
data class Transaction(
val name : String,
val amount : Float,
val day : Int,
val month : Int,
val year : Int,
val comment: String,
val datePicker: String,
val transaction_type : String,
val category : String,
val recurring_from : String,
val recurring_to : String
){
#PrimaryKey(autoGenerate = true) var id :Long=0
}
The error is resolved. I was using the kotlin version 1.6.0. I reduced it to 1.4.32. As far as I understood, above(latest) version of Kotlin along with Room and coroutines doesn’t work well.
I believe that your issue is due to the use of an incorrect class being inadvertently used, a likely culprit being Transaction as that it also a Room class
Perhaps in transactionDetailDao (although it might be elsewhere)
See if you have import androidx.room.Transaction? (or any other imports with Transaction)?
If so delete or comment out the import
As an example with, and :-
And with the import commented out :-
Imported from github, had a play issue definitely appears to be with co-routines. commented out suspends in the Dao :-
#Dao
interface transactionDetailDao {
#Insert(onConflict = OnConflictStrategy.IGNORE)
suspend fun insert(transaction : Transaction)
#Delete
suspend fun delete(transaction : Transaction)
#Update
suspend fun update(transaction: Transaction)
#Query("SELECT * FROM transaction_table ORDER BY id ASC")
fun getalltransaction(): LiveData<List<Transaction>>
}
Compiled ok and ran and had a play e.g. :-

Gatling Test in Blazemeter creates ClassNotFoundException

I used the Taurus Gatling guide to create a simple performance test and uploaded the yaml and scala file to blazemeter. When i start the test in blazemeter there is no test result and the bzt.log contains a ClassNotFoundException.
The validator for the yaml says its fine and i can't find anything so I'm lost...
My blazemleter.yaml:
execution:
- executor: gatling
scenario: products
iterations: 15
concurrency: 3
ramp-up: 2
scenarios:
products:
script: productSimulation.scala
simulation: test.productSimulation
My productSimulation.scala is mostly copied from their documentation:
package test
import io.gatling.core.Predef._
import io.gatling.http.Predef._
class productSimulation extends Simulation {
// parse load profile from Taurus
val t_iterations = Integer.getInteger("iterations", 100).toInt
val t_concurrency = Integer.getInteger("concurrency", 10).toInt
val t_rampUp = Integer.getInteger("ramp-up", 1).toInt
val t_holdFor = Integer.getInteger("hold-for", 60).toInt
val t_throughput = Integer.getInteger("throughput", 100).toInt
val httpConf = http.baseURL("https://mydomain/")
val header = Map(
"Content-Type" -> """application/x-www-form-urlencoded""")
val sessionHeaders = Map("Authorization" -> "Bearer ${access_token}",
"Content-Type" -> "application/json")
// 'forever' means each thread will execute scenario until
// duration limit is reached
val loopScenario = scenario("products").forever() {
// auth
exec(http("POST OAuth Req")
.post("https://oauth-provider")
.formParam("client_secret", "...")
.formParam("client_id", "...")
.formParam("grant_type", "client_credentials")
.headers(header)
.check(status.is(200))
.check(jsonPath("$.access_token").exists
.saveAs("access_token")))
// read products
.exec(http("products")
.get("/products")
.queryParam("limit", 200)
.headers(sessionHeaders))
}
val execution = loopScenario
.inject(rampUsers(concurrency) over rampUp) // during for gatling 3.x
.protocols(httpConf)
setUp(execution).maxDuration(rampUp + holdFor)
}
After learning that i can execute the scala file as a test directly if i click the file directly and not the yaml i got better exceptions.
Basicly i made two mistakes:
my variables are named t_concurrency, ... while the execution definition uses a different name. ups.
since gatling 3.x the keyword for the inject is during, so the correct code is: rampUsers(t_concurrency) during t_rampUp
Now everything works.

Spark streaming nested execution serialization issues

I am trying to connect DB2 database in the spark streaming application and the database query execution statement causing "org.apache.spark.SparkException: Task not serializable" issues. Please advise. Below is the sample code I have for reference.
dataLines.foreachRDD{rdd=>
val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)
val dataRows=rdd.map(rs => rs.value).map(row =>
row.split(",")(1)-> (row.split(",")(0), row.split(",")(1), row.split(",")(2)
, "cvflds_"+row.split(",")(3).toLowerCase, row.split(",")(4), row.split(",")(5), row.split(",")(6))
)
val db2Conn = getDB2Connection(spark,db2ConParams)
dataRows.foreach{ case (k,v) =>
val table = v._4
val dbQuery = s"(SELECT * FROM $table ) tblResult"
val df=getTableData(db2Conn,dbQuery)
df.show(2)
}
}
Below is other function code:
private def getDB2Connection(spark: SparkSession, db2ConParams:scala.collection.immutable.Map[String,String]): DataFrameReader = {
spark.read.format("jdbc").options(db2ConParams)
}
private def getTableData(db2Con: DataFrameReader,tableName: String):DataFrame ={
db2Con.option("dbtable",tableName).load()
}
object SparkSessionSingleton {
#transient private var instance: SparkSession = _
def getInstance(sparkConf: SparkConf): SparkSession = {
if (instance == null) {
instance = SparkSession
.builder
.config(sparkConf)
.getOrCreate()
}
instance
}
}
Below is the error log:
2018-03-28 22:12:21,487 [JobScheduler] ERROR org.apache.spark.streaming.scheduler.JobScheduler - Error running job streaming job 1522289540000 ms.0
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:916)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:915)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:915)
at ncc.org.civil.receiver.DB2DataLoadToKudu$$anonfun$createSparkContext$1.apply(DB2DataLoadToKudu.scala:139)
at ncc.org.civil.receiver.DB2DataLoadToKudu$$anonfun$createSparkContext$1.apply(DB2DataLoadToKudu.scala:128)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: org.apache.spark.sql.DataFrameReader
Serialization stack:
- object not serializable (class: org.apache.spark.sql.DataFrameReader, value: org.apache.spark.sql.DataFrameReader#15fdb01)
- field (class: ncc.org.civil.receiver.DB2DataLoadToKudu$$anonfun$createSparkContext$1$$anonfun$apply$2, name: db2Conn$1, type: class org.apache.spark.sql.DataFrameReader)
- object (class ncc.org.civil.receiver.DB2DataLoadToKudu$$anonfun$createSparkContext$1$$anonfun$apply$2, )
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 30 more
Ideally you should keep the closure in dataRows.foreach clear of any connection objects, since the closure is meant to be serialized to executors and run there. This concept is covered in depth # this official link
In your case below line is the closure that is causing the issue:
val df=getTableData(db2Conn,dbQuery)
So, instead of using spark to get the DB2 table loaded, which in your case becomes(after combining the methods):
spark.read.format("jdbc").options(db2ConParams).option("dbtable",tableName).load()
Use plain JDBC in the closure to achieve this. You can use db2ConParams in the jdbc code. (I assume its simple enough to be serializable). The link also suggests using rdd.foreachPartition and ConnectionPool to further optimize.
You have not mentioned what you are doing with the table data except df.show(2). If the rows are huge, then you may discuss more about your use case. Perhaps, you need to consider a different design then.

Jenkins 'Pipeline script from SCM' polls infinitely

My Jenkins pipeline workflow for building project is as below.
There are two files checked into repository -- JenkinsfileAllBranches and Jenkinsfile
1) JenkinsfileAllBranches - Polls all branches for changes
def scm_branch = ''
pipeline {
agent any
triggers {
pollSCM('* * * * *') //used this for quick debugging
}
stages {
stage ('SCM') {
steps {
script {
def git_scm = checkout([$class: 'GitSCM', branches: [[name: '**']],
doGenerateSubmoduleConfigurations: false, extensions: [],
submoduleCfg: [], userRemoteConfigs: [[url: <repository_url>]]])
scm_branch = git_scm.GIT_BRANCH.substring('origin\\'.length())
}
}
}
stage('Call Jenkinsfile for specific branch') {
steps {
print("branch:${scm_branch}")
build job:'Build_Project', parameters:
[[$class:'StringParameterValue', name: 'BRANCH', value: scm_branch]]
}
}
}
}
2) Jenkinsfile - For ease, I am providing the simplified Jenkinsfile
pipeline {
agent any
options {
disableConcurrentBuilds()
}
parameters {
string(name: 'BRANCH', defaultValue:'', description: 'Enter a branch name to build.')
}
stages {
stage ('SCM') {
steps {
script {
print("Parameter BRANCH: ${params.BRANCH}")
}
git url: <repo_url>, branch: params.BRANCH
}
}
}
}
Problem: First job is expected to poll every minute for changes in branches and second job is expected to build that specific branch where changes are found.
Everything works as expected when I leave the pipeline scripts in place.
But when I select 'Pipeline from SCM', JenkinsfileAllBranches behaves weirdly. It keeps polling the same branch again and again. How do I resolve this loop?
In the Stage -- Call Jenkinsfile for specific branch, I notice that it's always executing the branch as master instead of branch1 or branch2 where scm changes are found.

Scala, Slick connect to MSSQL server

I am trying to connect to a MSSQL database using the slick framework. The following code shows my first attempt but I can't figure out what is wrong.
This error occurs when leaving it as shown below:
[1] value create is not a member of scala.slick.lifted.DDL
Now I delete the line because I do not necessarily need to create the table within my scala code. But then another error arises:
[2] value map is not a member of object asd.asd.App.Coffees
package asd.asd
import scala.slick.driver.SQLServerDriver._
import scala.slick.session.Database.threadLocalSession
object App {
object Coffees extends Table[(String, Int, Double)]("COFFEES") {
def name = column[String]("COF_NAME", O.PrimaryKey)
def supID = column[Int]("SUP_ID")
def price = column[Double]("PRICE")
def * = name ~ supID ~ price
}
def main(args : Array[String]) {
println( "Hello World!" )
val db = slick.session.Database.forURL(url = "jdbc:jtds:sqlserver", user = "test", password = "test", driver = "scala.slick.driver.SQLServerDriver")
db withSession {
Coffees.ddl.create [1]
// Coffees.insertAll(
// ("Colombian", 101, 7.99),
// ("Colombian_Decaf", 101, 8.99),
// ("French_Roast_Decaf", 49, 9.99)
// )
val q = for {
c <- Coffees [2]
} yield (c.name, c.price, c.supID)
println(q.selectStatement)
q.foreach { case (n, p, s) => println(n + ": " + p) }
}
}
}
Problem solved. What I did was the following: Update to the latest Slick version and then adjust the code like demonstrated here. Afterwards you need to exchange the line
import scala.slick.driver.H2Driver.simple._
with
import scala.slick.driver.SQLServerDriver.simple._
And modify the connect string to
[...]
Database.forURL("jdbc:jtds:sqlserver://localhost:1433/<DB>;instance=<INSTANCE>", driver = "scala.slick.driver.SQLServerDriver") withSession {
[...]
After this worked out, I decided to use a c3p0 pooled connection (Which makes Slick very much faster and thus first usable, I do highly recommend to use connection pooling!). This left me with the following database object.
package utils
import scala.slick.driver.SQLServerDriver.simple._
import com.mchange.v2.c3p0.ComboPooledDataSource
object DatabaseUtils {
private val ds = new ComboPooledDataSource
ds.setDriverClass("scala.slick.driver.SQLServerDriver")
ds.setUser("supervisor")
ds.setPassword("password1")
ds.setMaxPoolSize(20)
ds.setMinPoolSize(3)
ds.setTestConnectionOnCheckin(true)
ds.setIdleConnectionTestPeriod(300)
ds.setMaxIdleTimeExcessConnections(240)
ds.setAcquireIncrement(1)
ds.setJdbcUrl("jdbc:jtds:sqlserver://localhost:1433/db_test;instance=SQLEXPRESS")
ds.setPreferredTestQuery("SELECT 1")
private val _database = Database.forDataSource(ds)
def database = _database
}
You can use this like displayed below.
DatabaseUtils.database withSession {
implicit session =>
[...]
}
Last the Maven dependency for c3p0 and the latest Slick version.
<dependency>
<groupId>com.typesafe.slick</groupId>
<artifactId>slick_2.10</artifactId>
<version>2.0.0-M3</version>
</dependency>
<dependency>
<groupId>com.mchange</groupId>
<artifactId>c3p0</artifactId>
<version>0.9.2.1</version>
</dependency>

Resources