I'm a beginner with the PyFlink framework and I would like to know if my use case is possible with it...
I need to create a tumbling window and apply a Python UDF (a scikit-learn clustering model) on it.
The use case is: every 30 seconds I want to apply my UDF to the previous 30 seconds of data.
For the moment I have succeeded in consuming data from Kafka in streaming mode, but then I'm not able to create a 30-second window on a non-keyed stream with the Python API.
Do you know of an example for my use case? Do you know if the PyFlink API allows this?
Here is my first attempt:
from pyflink.common import Row
from pyflink.common.serialization import JsonRowDeserializationSchema, JsonRowSerializationSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer, FlinkKafkaProducer
from pyflink.common.watermark_strategy import TimestampAssigner, WatermarkStrategy
from pyflink.common import Duration
import time
from utils.selector import Selector
from utils.timestampAssigner import KafkaRowTimestampAssigner
# 1. create a StreamExecutionEnvironment
env = StreamExecutionEnvironment.get_execution_environment()
# the sql connector for kafka is used here as it's a fat jar and could avoid dependency issues
env.add_jars("file:///flink-sql-connector-kafka_2.11-1.14.0.jar")
deserialization_schema = JsonRowDeserializationSchema.builder() \
    .type_info(type_info=Types.ROW_NAMED(
        ["labelId", "freq", "timestamp"],
        [Types.STRING(), Types.DOUBLE(), Types.STRING()])).build()

kafka_consumer = FlinkKafkaConsumer(
    topics='events',
    deserialization_schema=deserialization_schema,
    properties={'bootstrap.servers': 'localhost:9092'})

# watermark_strategy = WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_seconds(5)) \
#     .with_timestamp_assigner(KafkaRowTimestampAssigner())
ds = env.add_source(kafka_consumer)
ds.print()
ds = ds.windowAll()
# ds.print()
env.execute()
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/home/dorian/dataScience/pyflink/pyflink_env/lib/python3.6/site-packages/pyflink/lib/flink-dist_2.11-1.14.0.jar) to field java.util.Properties.serialVersionUID
WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Traceback (most recent call last):
File "/home/dorian/dataScience/pyflink/project/__main__.py", line 35, in <module>
ds = ds.windowAll()
AttributeError: 'DataStream' object has no attribute 'windowAll'
Thx
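For reference, below is a hedged sketch (not a verified answer) of one possible workaround: key the stream by a constant so that a keyed 30-second tumbling window can be used. It assumes a PyFlink release in which keyed windows are exposed in the Python DataStream API (this may require a version newer than 1.14), and ClusterWindow is a made-up stand-in for the scikit-learn clustering UDF.
from pyflink.common.time import Time
from pyflink.common.typeinfo import Types
from pyflink.datastream.functions import ProcessWindowFunction
from pyflink.datastream.window import TumblingProcessingTimeWindows

class ClusterWindow(ProcessWindowFunction):
    # stand-in for the scikit-learn clustering UDF (hypothetical)
    def process(self, key, context, elements):
        rows = list(elements)  # every row received during the 30-second window
        # fit/apply the clustering model on `rows` here
        yield "window with {} rows".format(len(rows))

ds = ds.key_by(lambda row: 0) \
    .window(TumblingProcessingTimeWindows.of(Time.seconds(30))) \
    .process(ClusterWindow(), Types.STRING())
ds.print()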
I was doing an experiment with OpenPulse on the calibration of a qubit and I stumbled upon this error:
Traceback (most recent call last):
Input In [73] in <cell line: 1>
frequency_sweep_results = job.result(timeout=120) # timeout parameter set to 120 seconds
File /opt/conda/lib/python3.8/site-packages/qiskit/providers/ibmq/job/ibmqjob.py:290 in result
raise IBMQJobFailureError(
IBMQJobFailureError: 'Unable to retrieve result for job 627bcffafd267c3dbc4f42f7. Job has failed: The Qobj pulse type is not supported by the selected backend. Error code: 1108.'
Error code 1108 being :
Run the job on a backend that supports open pulse. Whether a backend supports open pulse can be found in its configuration data.
Use %tb to get the full traceback
I have used ibmq_santiago, ibmq_manila and ibmq_lima so far, all giving me the same error.
Could someone suggest a backend that supports Qiskit Pulse?
You can check out the backends with pulse support in the table view of the IBM services list.
Programmatically, you can list the backends with pulse support you have access to in the following way:
from qiskit import IBMQ
provider = IBMQ.load_account()
backends_supporting_openpulse = provider.backends(filters=lambda b: b.configuration().open_pulse)
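As a small, hypothetical follow-up (assuming the filtered list is non-empty), you can print the names and pick the least busy pulse-enabled backend:
from qiskit.providers.ibmq import least_busy

print([b.name() for b in backends_supporting_openpulse])  # names of pulse-enabled backends
backend = least_busy(backends_supporting_openpulse)       # backend with the shortest queue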
I'm using Kinesis Data Analytics Studio, which provides a Zeppelin environment.
Very simple code:
%flink.pyflink
from pyflink.common.serialization import JsonRowDeserializationSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer
# create env = determine app runs locally or remotely
env = s_env or StreamExecutionEnvironment.get_execution_environment()
env.add_jars("file:///home/ec2-user/flink-sql-connector-kafka_2.12-1.13.5.jar")
# create a kafka consumer
deserialization_schema = JsonRowDeserializationSchema.builder() \
    .type_info(type_info=Types.ROW_NAMED(
        ['id', 'name'],
        [Types.INT(), Types.STRING()])
    ).build()

kafka_consumer = FlinkKafkaConsumer(
    topics='nihao',
    deserialization_schema=deserialization_schema,
    properties={
        'bootstrap.servers': 'kakfa-brokers:9092',
        'group.id': 'group1'
    })
kafka_consumer.set_start_from_earliest()
ds = env.add_source(kafka_consumer)
ds.print()
env.execute('job1')
I can get this working locally and can see change logs being produced to the console. However, I cannot get the same results in Zeppelin.
I also checked STDOUT in the Flink web console task managers; nothing is there either.
Am I missing something? I have searched for days and could not find anything on it.
I'm not 100% sure, but I think you may need a sink to begin pulling data through the datastream; you could potentially use the included Print Sink Function (or another explicit sink, as sketched below).
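To illustrate, here is a hedged sketch (my own, not part of the original answer) of wiring a different, explicit sink instead of relying on stdout: a Kafka producer writing the rows back to a separate topic, so the data flow can be verified from outside the notebook. The output topic name 'nihao-out' is made up, and the schema mirrors the one used for deserialization above.
from pyflink.common.serialization import JsonRowSerializationSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream.connectors import FlinkKafkaProducer

# mirror the schema used by the deserializer above
serialization_schema = JsonRowSerializationSchema.builder() \
    .with_type_info(Types.ROW_NAMED(['id', 'name'], [Types.INT(), Types.STRING()])) \
    .build()

kafka_producer = FlinkKafkaProducer(
    topic='nihao-out',  # hypothetical output topic
    serialization_schema=serialization_schema,
    producer_config={'bootstrap.servers': 'kakfa-brokers:9092'})

ds.add_sink(kafka_producer)
# then env.execute('job1') as in the original snippet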
I am trying to run a simulation with the nothingFor function under Gatling 3.4.1. However, compilation fails with the error could not find implicit value for evidence parameter of type io.gatling.core.controller.inject.InjectionProfileFactory[Product with Serializable].
Simulation
package abs
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
class DefaultSimulation extends Simulation {
  ...

  val httpProtocol = http
    .baseUrl(Setting.baseUrl)
    .userAgentHeader("Gatling/3.4.1")

  setUp(
    getScenario.inject(
      nothingFor(20 seconds),
      rampConcurrentUsers(0) to(20) during(10 seconds),
      constantConcurrentUsers(20) during (600 seconds)
    ),
    setScenario.inject(
      rampConcurrentUsers(0) to(20) during(30 seconds),
      constantConcurrentUsers(20) during (600 seconds)
    )
  ).protocols(httpProtocol)
}
Compilation error
[ERROR] i.g.c.ZincCompiler$ - C:/Installation/gatling-3.4.1/user-files/simulations/DefaultSimulation.scala:50:30: could not find implicit value for evidence parameter of type io.gatling.core.controller.inject.InjectionProfileFactory[Product with Serializable]
getScenario.inject(
^
[ERROR] i.g.c.ZincCompiler$ - one error found
[ERROR] i.g.c.ZincCompiler$ - Compilation crashed
xsbt.InterfaceCompileFailed: null
at xsbt.CachedCompiler0.handleErrors(CompilerBridge.scala:183)
at xsbt.CachedCompiler0.run(CompilerBridge.scala:172)
at xsbt.CachedCompiler0.run(CompilerBridge.scala:134)
at xsbt.CompilerBridge.run(CompilerBridge.scala:39)
at sbt.internal.inc.AnalyzingCompiler.compile(AnalyzingCompiler.scala:89)
at sbt.internal.inc.MixedAnalyzingCompiler.$anonfun$compile$7(MixedAnalyzingCompiler.scala:185)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at sbt.internal.inc.MixedAnalyzingCompiler.timed(MixedAnalyzingCompiler.scala:240)
at sbt.internal.inc.MixedAnalyzingCompiler.$anonfun$compile$4(MixedAnalyzingCompiler.scala:175)
at sbt.internal.inc.MixedAnalyzingCompiler.$anonfun$compile$4$adapted(MixedAnalyzingCompiler.scala:156)
at sbt.internal.inc.JarUtils$.withPreviousJar(JarUtils.scala:232)
at sbt.internal.inc.MixedAnalyzingCompiler.compileScala$1(MixedAnalyzingCompiler.scala:156)
at sbt.internal.inc.MixedAnalyzingCompiler.compile(MixedAnalyzingCompiler.scala:203)
at sbt.internal.inc.IncrementalCompilerImpl.$anonfun$compileInternal$1(IncrementalCompilerImpl.scala:571)
at sbt.internal.inc.IncrementalCompilerImpl.$anonfun$compileInternal$1$adapted(IncrementalCompilerImpl.scala:571)
at sbt.internal.inc.Incremental$.$anonfun$apply$5(Incremental.scala:174)
at sbt.internal.inc.Incremental$.$anonfun$apply$5$adapted(Incremental.scala:172)
at sbt.internal.inc.Incremental$$anon$2.run(Incremental.scala:459)
at sbt.internal.inc.IncrementalCommon$CycleState.next(IncrementalCommon.scala:115)
at sbt.internal.inc.IncrementalCommon$$anon$1.next(IncrementalCommon.scala:56)
at sbt.internal.inc.IncrementalCommon$$anon$1.next(IncrementalCommon.scala:52)
at sbt.internal.inc.IncrementalCommon.cycle(IncrementalCommon.scala:248)
at sbt.internal.inc.Incremental$.$anonfun$incrementalCompile$8(Incremental.scala:414)
at sbt.internal.inc.Incremental$.withClassfileManager(Incremental.scala:499)
at sbt.internal.inc.Incremental$.incrementalCompile(Incremental.scala:401)
at sbt.internal.inc.Incremental$.apply(Incremental.scala:166)
at sbt.internal.inc.IncrementalCompilerImpl.compileInternal(IncrementalCompilerImpl.scala:571)
at sbt.internal.inc.IncrementalCompilerImpl.$anonfun$compileIncrementally$1(IncrementalCompilerImpl.scala:489)
at sbt.internal.inc.IncrementalCompilerImpl.handleCompilationError(IncrementalCompilerImpl.scala:332)
at sbt.internal.inc.IncrementalCompilerImpl.compileIncrementally(IncrementalCompilerImpl.scala:419)
at sbt.internal.inc.IncrementalCompilerImpl.compile(IncrementalCompilerImpl.scala:137)
at io.gatling.compiler.ZincCompiler$.doCompile(ZincCompiler.scala:258)
at io.gatling.compiler.ZincCompiler$.delayedEndpoint$io$gatling$compiler$ZincCompiler$1(ZincCompiler.scala:265)
at io.gatling.compiler.ZincCompiler$delayedInit$body.apply(ZincCompiler.scala:40)
at scala.Function0.apply$mcV$sp(Function0.scala:39)
at scala.Function0.apply$mcV$sp$(Function0.scala:39)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
at scala.App.$anonfun$main$1$adapted(App.scala:80)
at scala.collection.immutable.List.foreach(List.scala:431)
at scala.App.main(App.scala:80)
at scala.App.main$(App.scala:78)
at io.gatling.compiler.ZincCompiler$.main(ZincCompiler.scala:40)
at io.gatling.compiler.ZincCompiler.main(ZincCompiler.scala)
I have tried to explicitly specify the imports, but compilation fails with the same exception:
import io.gatling.core.Predef.{nothingFor, rampConcurrentUsers, constantConcurrentUsers, _}
This simulation compiles and works without nothingFor(20 seconds).
Gatling has 2 different families of injection profile steps:
- open, where you control the users' arrival rate
- closed, where you control the number of concurrent users
You can't mix them because those are 2 completely different and incompatible behaviors.
nothingFor belongs to the open family, while rampConcurrentUsers belongs to the closed one.
Use constantConcurrentUsers(0) during (20) instead.
I've decided to experiment with Apache Flink a bit. I decided to use the Scala console (or more precisely http://ammonite.io/) to read some data from a CSV file and print it locally... just for debugging and experiments.
import $ivy.`org.apache.flink:flink-csv:1.10.0`
import $ivy.`org.apache.flink::flink-scala:1.10.0`
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.extensions._
val env = ExecutionEnvironment.createLocalEnvironment()
val lines = env.readCsvFile[(String, String, String)]("/home/slovic/Dokumenty/test.csv")
lines.collect()
//java.lang.NullPointerException: Cannot find compatible factory for specified execution.target (=local)
//org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:104)
//org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:937)
//org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:860)
//org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:844)
//org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:495)
//org.apache.flink.api.scala.DataSet.collect(DataSet.scala:739)
//ammonite.$sess.cmd24$.<init>(cmd24.sc:1)
//ammonite.$sess.cmd24$.<clinit>(cmd24.sc)
What do I need to do to run this code locally? (tested with Scala 2.11 & 2.12)
EDIT: SOLUTION BY Piyush_Rana
We need an additional import:
import $ivy.`org.apache.flink::flink-streaming-scala:1.10.0` //Piyush_Rana's advice. !!!FIX!!!
I also got the same error and figured out that I was missing one dependency:
val flinkVersion = "1.10.0"
"org.apache.flink" %% "flink-streaming-scala" % flinkVersion,
or in ammonite repl:
import $ivy.`org.apache.flink::flink-streaming-scala:1.10.0`
You didn't execute the Flink program.
Try to add the execute command at the end:
env.execute("unique name")
Since the log4j javadoc says:
WARNING: This version of JDBCAppender is very likely to be completely replaced in the future. Moreoever, it does not log exceptions.
What should I do to log to a database?
If you are looking for a database appender which not only works, but also supports connection pooling, is maintained and properly documented, then consider logback's DBAppender.
Ironically enough, the warning in the javadocs about removing JDBCAppender in future versions of log4j was written by me.
You can use an alternative appender, but really Log4j 1.2 is going to be around and standard for a long time. They developed DBAppender as part of their receivers companions, which isn't officially released, but you can download the source code and get your own going as well.
Unless the issue of not logging exceptions bothers you, JDBCAppender is just fine. Any further upgrade to 2.0 is going to be more radical than just changing JDBCAppender (if 2.0 happens), so I wouldn't worry about using it, despite the warning. They clearly don't have a solid roadmap or timeline for introducing a new version, and 1.2.15 was released in 2007.
**log4j.properties file**
# Define the root logger with appender file
log4j.rootLogger = DEBUG, DB
# Define the DB appender
log4j.appender.DB=org.apache.log4j.jdbc.JDBCAppender
# Set JDBC URL
log4j.appender.DB.URL=jdbc:mysql://localhost/log
# Set Database Driver
log4j.appender.DB.driver=com.mysql.jdbc.Driver
# Set database user name and password
log4j.appender.DB.user=root
log4j.appender.DB.password=root
# Set the SQL statement to be executed.
log4j.appender.DB.sql=INSERT INTO actionlg(user_id, dated, logger, level, message) values('%X{userId}',' %d{yyyy-MM-dd-HH-mm}','%C','%p','%m')
# Define the layout for file appender
log4j.appender.DB.layout=org.apache.log4j.PatternLayout
**Java Class**
Log4jExample.java
import java.sql.*;
import java.io.*;
import org.apache.log4j.Logger;
import org.apache.log4j.MDC;
public class Log4jExample {

   /* Get actual class name to be printed on */
   static Logger log = Logger.getLogger(Log4jExample.class.getName());

   public static void main(String[] args) throws IOException, SQLException {
      MDC.put("userId", "1234");   // set the MDC value before logging so %X{userId} is populated
      log.error("Error");
   }
}
**libs required**
- mysql-connector-java-3.1.8-bin.jar
- log4j-1.2.17.jar