I am trying to run some code on a Flink server but am unable to get it to work.
Below is my code:
DataStream<UserInfo> keyedStream = executionEnvironment
.addSource(new UserDataSource());
keyedStream.assignTimestampsAndWatermarks(new MessageWaterEmitter());
tableEnv.registerDataStream("test", keyedStream, "userId,ticks,startime.rowtime");
Table table = tableEnv
.sqlQuery(
"SELECT userId,COUNT(userId) as ticks,TUMBLE_END(startime,INTERVAL '5' SECOND) as startime FROM test "
+ "GROUP BY TUMBLE(startime,INTERVAL '5' SECOND),userId");
DataStream<Row> userInfoDataStream = tableEnv.toRetractStream(table, Row.class)
.filter(new FilterFunction<Tuple2<Boolean, Row>>() {
@Override
public boolean filter(Tuple2<Boolean, Row> booleanUserInfoTuple2) throws Exception {
return booleanUserInfoTuple2.f0;
}
}).map(new MapFunction<Tuple2<Boolean, Row>, Row>() {
@Override
public Row map(Tuple2<Boolean, Row> booleanUserInfoTuple2) throws Exception {
return booleanUserInfoTuple2.f1;
}
});
JdbcSink sink = new JdbcSink();
userInfoDataStream.addSink(sink);
Below is the error I am getting:
java.lang.RuntimeException: Rowtime timestamp is null. Please make sure that a proper TimestampAssigner is defined and the stream environment uses the EventTime time characteristic.
at DataStreamSourceConversion$651.processElement(Unknown Source)
at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:707)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:660)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:305)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:394)
at flink.source.UserDataSource.run(UserDataSource.java:20)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:208)
Can anybody help me with this problem? Thank you in advance.
The root cause of this problem is that while you are calling assignTimestampsAndWatermarks on keyedStream, you aren't doing anything with the result of this call. If you rework the code like this, it will work:
DataStream<UserInfo> keyedStream = executionEnvironment
.addSource(new UserDataSource())
.assignTimestampsAndWatermarks(new MessageWaterEmitter());
tableEnv.registerDataStream("test", keyedStream, "userId,ticks,startime.rowtime");
Calling assignTimestampsAndWatermarks on a stream doesn't modify that stream, but instead returns a new stream that has timestamps and watermarks.
This could also be fixed like this, which might make it a bit clearer what is going on:
DataStream<UserInfo> streamWithTSandWMs = keyedStream
.assignTimestampsAndWatermarks(new MessageWaterEmitter());
tableEnv.registerDataStream("test", streamWithTSandWMs, "userId,ticks,startime.rowtime");
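For this to work end to end, the environment also has to run in event time (executionEnvironment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) in pre-1.12 versions), and MessageWaterEmitter has to extract a timestamp from every UserInfo. Below is one possible shape for such an assigner; it is only a sketch, not the asker's actual MessageWaterEmitter, and it assumes UserInfo exposes a getTimestamp() accessor returning epoch milliseconds:
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class MessageWaterEmitter implements AssignerWithPeriodicWatermarks<UserInfo> {

    private long currentMaxTimestamp = Long.MIN_VALUE;

    @Override
    public long extractTimestamp(UserInfo element, long previousElementTimestamp) {
        // Assumed getter: the event time of the record in epoch milliseconds.
        long ts = element.getTimestamp();
        currentMaxTimestamp = Math.max(currentMaxTimestamp, ts);
        return ts;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // Lag the highest seen timestamp by one second to tolerate
        // slightly out-of-order records.
        return currentMaxTimestamp == Long.MIN_VALUE
                ? new Watermark(Long.MIN_VALUE)
                : new Watermark(currentMaxTimestamp - 1000);
    }
}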
Related
I posted a question a few days back: Flink Jdbc sink.
Now I am trying to use the JdbcSink provided by Flink.
I have written the code and it runs, but nothing gets saved in the DB and no exceptions are thrown. With my previous sink the job kept running (which is what should happen, since it is a streaming app); with the code below I get no errors, yet nothing is saved to the DB.
public class CompetitorPipeline implements Pipeline {
private final StreamExecutionEnvironment streamEnv;
private final ParameterTool parameter;
private static final Logger LOG = LoggerFactory.getLogger(CompetitorPipeline.class);
public CompetitorPipeline(StreamExecutionEnvironment streamEnv, ParameterTool parameter) {
this.streamEnv = streamEnv;
this.parameter = parameter;
}
@Override
public KeyedStream<CompetitorConfig, String> start(ParameterTool parameter) throws Exception {
CompetitorConfigChanges competitorConfigChanges = new CompetitorConfigChanges();
KeyedStream<CompetitorConfig, String> competitorChangesStream = competitorConfigChanges.run(streamEnv, parameter);
//Add to JBDC Sink
competitorChangesStream.addSink(JdbcSink.sink(
"insert into competitor_config_universe(marketplace_id,merchant_id, competitor_name, comp_gl_product_group_desc," +
"category_code, competitor_type, namespace, qualifier, matching_type," +
"zip_region, zip_code, competitor_state, version_time, compConfigTombstoned, last_updated) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
(ps, t) -> {
ps.setInt(1, t.getMarketplaceId());
ps.setLong(2, t.getMerchantId());
ps.setString(3, t.getCompetitorName());
ps.setString(4, t.getCompGlProductGroupDesc());
ps.setString(5, t.getCategoryCode());
ps.setString(6, t.getCompetitorType());
ps.setString(7, t.getNamespace());
ps.setString(8, t.getQualifier());
ps.setString(9, t.getMatchingType());
ps.setString(10, t.getZipRegion());
ps.setString(11, t.getZipCode());
ps.setString(12, t.getCompetitorState());
ps.setTimestamp(13, Timestamp.valueOf(t.getVersionTime()));
ps.setBoolean(14, t.isCompConfigTombstoned());
ps.setTimestamp(15, new Timestamp(System.currentTimeMillis()));
System.out.println("sql"+ps);
},
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl("jdbc:mysql://127.0.0.1:3306/database")
.withDriverName("com.mysql.cj.jdbc.Driver")
.withUsername("xyz")
.withPassword("xyz#")
.build()));
return competitorChangesStream;
}
}
You need to enable auto-commit mode for the JDBC sink.
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl("jdbc:mysql://127.0.0.1:3306/database;autocommit=true")
It looks like SimpleBatchStatementExecutor only works in auto-commit mode. If you need to commit and roll back batches yourself, you have to write your own JdbcBatchStatementExecutor.
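If you do go down that road, a rough sketch of such an executor is shown below. It assumes the JdbcBatchStatementExecutor interface from flink-connector-jdbc (prepareStatements/addToBatch/executeBatch/closeStatements) and JdbcStatementBuilder; both are low-level API, and wiring a custom executor into a sink needs extra plumbing around JdbcOutputFormat, so treat this only as an illustration, not a drop-in implementation:
import org.apache.flink.connector.jdbc.JdbcStatementBuilder;
import org.apache.flink.connector.jdbc.internal.executor.JdbcBatchStatementExecutor;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Buffers records, writes them as a JDBC batch, and commits explicitly
// when the connection is not in auto-commit mode.
public class CommittingBatchStatementExecutor<T> implements JdbcBatchStatementExecutor<T> {

    private final String sql;
    private final JdbcStatementBuilder<T> parameterSetter;
    private final List<T> buffer = new ArrayList<>();

    private transient Connection connection;
    private transient PreparedStatement statement;

    public CommittingBatchStatementExecutor(String sql, JdbcStatementBuilder<T> parameterSetter) {
        this.sql = sql;
        this.parameterSetter = parameterSetter;
    }

    @Override
    public void prepareStatements(Connection connection) throws SQLException {
        this.connection = connection;
        this.statement = connection.prepareStatement(sql);
    }

    @Override
    public void addToBatch(T record) throws SQLException {
        buffer.add(record);
    }

    @Override
    public void executeBatch() throws SQLException {
        if (buffer.isEmpty()) {
            return;
        }
        for (T record : buffer) {
            parameterSetter.accept(statement, record);
            statement.addBatch();
        }
        statement.executeBatch();
        if (!connection.getAutoCommit()) {
            // Make the batch visible even without auto-commit.
            connection.commit();
        }
        buffer.clear();
    }

    @Override
    public void closeStatements() throws SQLException {
        if (statement != null) {
            statement.close();
        }
    }
}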
Have you tried including JdbcExecutionOptions? By default the sink buffers records into fairly large batches (batch size 5000, with no time-based flush), so with a low-volume stream it can take a long time before anything reaches the database.
dataStream.addSink(JdbcSink.sink(
sql_statement,
(statement, value) -> {
/* Prepared Statement */
},
JdbcExecutionOptions.builder()
.withBatchSize(5000)
.withBatchIntervalMs(200)
.withMaxRetries(2)
.build(),
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl("jdbc:mysql://127.0.0.1:3306/database")
.withDriverName("com.mysql.cj.jdbc.Driver")
.withUsername("xyz")
.withPassword("xyz#")
.build()));
Flink Streaming: I have a DataStream[String] from a Kafka consumer, where each element is a JSON string:
stream = env
.addSource(new FlinkKafkaConsumer[String]("topic", new SimpleStringSchema(), properties))
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html
I have to sink this stream using StreamingFileSink, which needs a DataStream[GenericRecord]:
val schema: Schema = ...
val input: DataStream[GenericRecord] = ...
val sink: StreamingFileSink[GenericRecord] = StreamingFileSink
.forBulkFormat(outputBasePath, AvroWriters.forGenericRecord(schema))
.build()
input.addSink(sink)
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html
Question: how do I convert DataStream[String] to DataStream[GenericRecord] before sinking, so that I can write Avro files?
Exception while converting the String stream to a GenericRecord stream:
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: Task not serializable
at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:408)
at org.apache.flink.api.scala.ClosureCleaner$.org$apache$flink$api$scala$ClosureCleaner$$clean(ClosureCleaner.scala:400)
at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:168)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scalaClean(StreamExecutionEnvironment.scala:791)
at org.apache.flink.streaming.api.scala.DataStream.clean(DataStream.scala:1168)
at org.apache.flink.streaming.api.scala.DataStream.map(DataStream.scala:617)
at com.att.vdcs.StreamingJobKafkaFlink$.main(StreamingJobKafkaFlink.scala:128)
at com.att.vdcs.StreamingJobKafkaFlink.main(StreamingJobKafkaFlink.scala)
Caused by: java.io.NotSerializableException: org.apache.avro.Schema$RecordSchema
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:586)
at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:406)
... 7 more
After initializing the schema in the mapper, I get a cast exception:
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: scala.Tuple2 cannot be cast to java.util.Map
(The schema and the msg value were shown in a screenshot.)
I got past the cast exception by converting the Scala map to a Java map:
record.put(0,scala.collection.JavaConverters.mapAsJavaMapConverter(msg._1).asJava)
Now the streaming job works, except that extra escape characters are added:
,"body":"\"{\\\"hdr\\\":{\\\"mes
There are extra escape characters (\); it should look like:
,"body":"\"{\"hdr\":{\"mes
The extra escaping went away after changing toString to getAsString.
Now it is working as expected. Next I need to try SNAPPY compression of the stream.
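For reference, the toString vs getAsString difference behind that fix, assuming the msg fields are Gson JsonElements (the library is an assumption; with Jackson the equivalent pair would be toString() vs asText()):
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class EscapeDemo {
    public static void main(String[] args) {
        // "body" holds a JSON document that was stored as a string value.
        // JsonParser.parseString requires Gson 2.8.6+.
        JsonObject obj = JsonParser.parseString("{\"body\":\"{\\\"hdr\\\":1}\"}").getAsJsonObject();

        // toString() re-serializes the element as JSON, so the value keeps its surrounding
        // quotes and its inner quotes stay escaped -- this is where the extra \ came from.
        System.out.println(obj.get("body").toString());    // "{\"hdr\":1}"

        // getAsString() returns the raw string value without re-quoting or escaping.
        System.out.println(obj.get("body").getAsString()); // {"hdr":1}
    }
}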
You need to transform your stream of Strings into a stream of GenericRecords, for example using a .map() function.
Example:
DataStream<String> strings = env.addSource( ... );
DataStream<GenericRecord> records = strings.map(inputStr -> {
GenericData.Record rec = new GenericData.Record(schema);
rec.put(0, inputStr);
return rec;
});
Please note that using GenericRecord can lead to poor performance, because the schema needs to be serialized with each record over and over again.
It is better to generate an Avro Pojo, as it won't need to ship the schema.
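For example, with a class generated from the Avro schema (UserEvent and its body field are hypothetical here, e.g. produced by the avro-maven-plugin), the schema travels with the class instead of with every record:
// Map the JSON strings onto the generated SpecificRecord and write it with
// AvroWriters.forSpecificRecord, which does not ship the schema per record.
DataStream<UserEvent> events = strings
        .map(s -> UserEvent.newBuilder().setBody(s).build())
        .returns(UserEvent.class);   // helps type extraction for the lambda

StreamingFileSink<UserEvent> sink = StreamingFileSink
        .forBulkFormat(outputBasePath, AvroWriters.forSpecificRecord(UserEvent.class))
        .build();

events.addSink(sink);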
In Java, you can use a RichMapFunction to do the conversion to DataStream<GenericRecord>, adding a transient Schema field to build the GenericRecord. I don't know how to do this in Scala; the following is just for reference.
DataStream<GenericRecord> records = maps.map(new RichMapFunction<Map<String, Object>, GenericRecord>() {
private transient DatumWriter<IndexedRecord> datumWriter;
/**
* Output stream to serialize records into byte array.
*/
private transient ByteArrayOutputStream arrayOutputStream;
/**
* Low-level class for serialization of Avro values.
*/
private transient Encoder encoder;
/**
* Avro serialization schema.
*/
private transient Schema schema;
@Override
public GenericRecord map(Map<String, Object> stringObjectMap) throws Exception {
GenericRecord record = new GenericData.Record(schema);
stringObjectMap.entrySet().forEach(entry->{record.put(entry.getKey(), entry.getValue());});
return record;
}
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
// Parse the schema first, so that the writer below is created with a non-null schema.
try {
this.schema = new Schema.Parser().parse(avroSchemaString);
} catch (SchemaParseException e) {
throw new IllegalArgumentException("Could not parse Avro schema string.", e);
}
this.arrayOutputStream = new ByteArrayOutputStream();
this.encoder = EncoderFactory.get().binaryEncoder(arrayOutputStream, null);
this.datumWriter = new GenericDatumWriter<>(schema);
}
});
final StreamingFileSink<GenericRecord> sink = StreamingFileSink
.forBulkFormat(new Path("D:\\test"), AvroWriters.forGenericRecord(mongoSchema))
.build();
records.addSink(sink);
Flink 1.9.1
I read a CSV file and want to use a long-typed column in a TUMBLE window.
I use a UDF to convert the Long to a Timestamp, but it doesn't work.
Error message: Window can only be defined over a time attribute column.
I tried to debug: the converted column is not a TimeIndicatorRelDataType, only a plain Timestamp, and I don't know how to convert it or why that is required.
def isTimeIndicatorType(relDataType: RelDataType): Boolean = relDataType match {
case ti: TimeIndicatorRelDataType => true
case _ => false
}
CODE
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.setParallelism(1);
// read csv
URL fileUrl = HotItemsSql.class.getClassLoader().getResource("UserBehavior-less.csv");
CsvTableSource csvTableSource = CsvTableSource.builder().path(fileUrl.getPath())
.field("userId", BasicTypeInfo.LONG_TYPE_INFO)
.field("itemId", BasicTypeInfo.LONG_TYPE_INFO)
.field("categoryId", BasicTypeInfo.LONG_TYPE_INFO)
.field("behavior", BasicTypeInfo.LONG_TYPE_INFO)
.field("optime", BasicTypeInfo.LONG_TYPE_INFO)
.build();
// trans to stream
DataStream<Row> csvDataStream=csvTableSource.getDataStream(env).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Row>() {
@Override
public long extractAscendingTimestamp(Row element) {
return Timestamp.valueOf(element.getField(5).toString()).getTime();
}
}).broadcast();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
tableEnv.registerDataStream("T_UserBehavior",csvDataStream,"userId,itemId,categoryId,behavior,optime");
tableEnv.registerFunction("Long2DateTime",new DateTransFunction());
Table result = tableEnv.sqlQuery("select userId," +
"TUMBLE_START(Long2DateTime(optime), INTERVAL '10' SECOND) as window_start," +
"TUMBLE_END(Long2DateTime(optime), INTERVAL '10' SECOND) as window_end " +
"from T_UserBehavior " +
"group by TUMBLE(Long2DateTime(optime),INTERVAL '10' SECOND),userId");
tableEnv.toRetractStream(result, Row.class).print();
UDF
import java.sql.Timestamp;
public class DateTransFunction extends ScalarFunction {
public Timestamp eval(Long longTime) {
try {
Timestamp t = new Timestamp(longTime);
return t;
} catch (Exception e) {
return null;
}
}
}
error stack
Exception in thread "main" org.apache.flink.table.api.ValidationException: Window can only be defined over a time attribute column.
at org.apache.flink.table.plan.rules.datastream.DataStreamLogicalWindowAggregateRule.getOperandAsTimeIndicator$1(DataStreamLogicalWindowAggregateRule.scala:85)
at org.apache.flink.table.plan.rules.datastream.DataStreamLogicalWindowAggregateRule.translateWindowExpression(DataStreamLogicalWindowAggregateRule.scala:90)
at org.apache.flink.table.plan.rules.common.LogicalWindowAggregateRule.onMatch(LogicalWindowAggregateRule.scala:68)
at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560)
at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419)
at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256)
at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215)
at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202)
at org.apache.flink.table.plan.Optimizer.runHepPlanner(Optimizer.scala:228)
at org.apache.flink.table.plan.Optimizer.runHepPlannerSequentially(Optimizer.scala:194)
at org.apache.flink.table.plan.Optimizer.optimizeNormalizeLogicalPlan(Optimizer.scala:150)
at org.apache.flink.table.plan.StreamOptimizer.optimize(StreamOptimizer.scala:65)
at org.apache.flink.table.planner.StreamPlanner.translateToType(StreamPlanner.scala:410)
at org.apache.flink.table.planner.StreamPlanner.org$apache$flink$table$planner$StreamPlanner$$translate(StreamPlanner.scala:182)
Since you already managed to assign a timestamp in DataStream API, you should be able to call:
tableEnv.registerDataStream(
"T_UserBehavior",
csvDataStream,
"userId, itemId, categoryId, behavior, rt.rowtime");
The .rowtime suffix instructs the API to create the column from the timestamp stored in every stream record coming from the DataStream API.
The community is currently working on making programs like yours easier to write. In Flink 1.10 you should be able to define your CSV table with a rowtime column directly in SQL DDL.
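As a rough illustration of that DDL route (the computed column assumes optime holds epoch milliseconds, and the connector/format properties are placeholders that depend on your Flink version and setup):
// Declare the long column, derive an event-time column from it, and put a watermark on it.
tableEnv.sqlUpdate(
    "CREATE TABLE T_UserBehavior (" +
    "  userId BIGINT," +
    "  itemId BIGINT," +
    "  categoryId BIGINT," +
    "  behavior BIGINT," +
    "  optime BIGINT," +
    "  rt AS TO_TIMESTAMP(FROM_UNIXTIME(optime / 1000))," +
    "  WATERMARK FOR rt AS rt - INTERVAL '5' SECOND" +
    ") WITH (" +
    "  'connector.type' = 'filesystem'," +
    "  'connector.path' = 'file:///path/to/UserBehavior-less.csv'," +
    "  'format.type' = 'csv'" +
    ")");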
I am reading data from Kafka using Flink 1.4.2 and parsing it to ObjectNode using JSONDeserializationSchema. If an incoming record is not valid JSON, my Flink job fails. I would like to skip the broken record instead of failing the job.
FlinkKafkaConsumer010<ObjectNode> kafkaConsumer =
new FlinkKafkaConsumer010<>(TOPIC, new JSONDeserializationSchema(), consumerProperties);
DataStream<ObjectNode> messageStream = env.addSource(kafkaConsumer);
messageStream.print();
I am getting the following exception if the data in Kafka is not valid JSON.
Job execution switched to status FAILING.
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'This': was expecting ('true', 'false' or 'null')
at [Source: [B@4f522623; line: 1, column: 6]
Job execution switched to status FAILED.
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
The easiest solution is to implement your own DeserializationSchema and wrap JSONDeserializationSchema. You can then catch the exception and either ignore it or perform custom action.
As suggested by @twalthr, I implemented my own DeserializationSchema by copying JSONDeserializationSchema and adding exception handling.
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;
import java.io.IOException;
public class CustomJSONDeserializationSchema extends AbstractDeserializationSchema<ObjectNode> {
private ObjectMapper mapper;
@Override
public ObjectNode deserialize(byte[] message) throws IOException {
if (mapper == null) {
mapper = new ObjectMapper();
}
ObjectNode objectNode;
try {
objectNode = mapper.readValue(message, ObjectNode.class);
} catch (Exception e) {
ObjectMapper errorMapper = new ObjectMapper();
ObjectNode errorObjectNode = errorMapper.createObjectNode();
errorObjectNode.put("jsonParseError", new String(message));
objectNode = errorObjectNode;
}
return objectNode;
}
@Override
public boolean isEndOfStream(ObjectNode nextElement) {
return false;
}
}
In my streaming job:
messageStream
.filter((event) -> {
if(event.has("jsonParseError")) {
LOG.warn("JsonParseException was handled: " + event.get("jsonParseError").asText());
return false;
}
return true;
}).print();
Flink has since improved null-record handling for the FlinkKafkaConsumer.
There are two possible design choices when the DeserializationSchema encounters a corrupted message. It can either throw an IOException, which causes the pipeline to be restarted, or it can return null, in which case the Flink Kafka consumer silently skips the corrupted message.
For more details, you can see this link.
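A minimal sketch of a schema that relies on that null contract (the class name is made up; in older versions such as 1.4 you may still need the marker-record approach shown above):
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

public class SkippingJsonDeserializationSchema extends AbstractDeserializationSchema<ObjectNode> {

    private transient ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        try {
            return mapper.readValue(message, ObjectNode.class);
        } catch (IOException e) {
            // Corrupted record: returning null lets the Kafka consumer skip it
            // instead of failing and restarting the job.
            return null;
        }
    }
}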
I am querying an Oracle database using the Flink DataSet API. For this I have customised Flink's JDBCInputFormat to return java.sql.ResultSet, as I need to perform further operations on the result set using Flink operators.
public static void main(String[] args) throws Exception {
ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
environment.setParallelism(1);
@SuppressWarnings("unchecked")
DataSource<ResultSet> source
= environment.createInput(JDBCInputFormat.buildJDBCInputFormat()
.setUsername("username")
.setPassword("password")
.setDrivername("driver_name")
.setDBUrl("jdbcUrl")
.setQuery("query")
.finish(),
new GenericTypeInfo<ResultSet>(ResultSet.class)
);
source.print();
environment.execute();
}
Following is the customised JDBCInputFormat:
public class JDBCInputFormat extends RichInputFormat<ResultSet, InputSplit> implements ResultTypeQueryable {
@Override
public void open(InputSplit inputSplit) throws IOException {
Class.forName(drivername);
dbConn = DriverManager.getConnection(dbURL, username, password);
statement = dbConn.prepareStatement(queryTemplate, resultSetType, resultSetConcurrency);
resultSet = statement.executeQuery();
}
@Override
public void close() throws IOException {
if(statement != null) {
statement.close();
}
if(resultSet != null)
resultSet.close();
if(dbConn != null) {
dbConn.close();
}
}
@Override
public boolean reachedEnd() throws IOException {
isLastRecord = resultSet.isLast();
return isLastRecord;
}
@Override
public ResultSet nextRecord(ResultSet row) throws IOException{
if(!isLastRecord){
resultSet.next();
}
return resultSet;
}
}
This works with the query below, which limits the number of rows fetched:
SELECT a,b,c from xyz where rownum <= 10;
but when I try to fetch all the rows (approximately 1 million), I get the exception below after a random number of rows have been fetched:
java.sql.SQLRecoverableException: Io exception: Socket closed
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:101)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:199)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:263)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.fetch(T4CPreparedStatement.java:1024)
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
at oracle.jdbc.driver.ScrollableResultSet.cacheRowAt(ScrollableResultSet.java:1839)
at oracle.jdbc.driver.ScrollableResultSet.isValidRow(ScrollableResultSet.java:1823)
at oracle.jdbc.driver.ScrollableResultSet.isLast(ScrollableResultSet.java:349)
at JDBCInputFormat.reachedEnd(JDBCInputFormat.java:98)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketOutputStream.socketWrite0(Native Method)
So for my case, how can I solve this issue?
I don't think it is possible to ship a ResultSet like a regular record. A ResultSet is a stateful object that internally maintains a connection to the database server. Using a ResultSet as a record that is transferred between Flink operators means that it may be serialized, shipped over the network to another machine, deserialized, and handed to a different thread in a different JVM process. That does not work.
Depending on the setup, a ResultSet might happen to stay on the same machine in the same thread, which is probably why the limited query worked for you. If you want to query a database from within an operator, you could implement the function as a RichMapPartitionFunction. Otherwise, I'd read the ResultSet in the data source and forward the resulting rows.
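A minimal sketch of that last approach, loosely modeled on Flink's own JDBCInputFormat: change the input format to RichInputFormat<Row, InputSplit>, keep the ResultSet inside the format, and emit one org.apache.flink.types.Row per database row. The hasNext field is assumed to be initialized with resultSet.next() at the end of open(); java.sql.SQLException handling is shown inline.
@Override
public boolean reachedEnd() throws IOException {
    // No isLast() call needed; a plain forward-only result set is enough.
    return !hasNext;
}

@Override
public Row nextRecord(Row reuse) throws IOException {
    try {
        int arity = resultSet.getMetaData().getColumnCount();
        Row row = new Row(arity);
        for (int i = 0; i < arity; i++) {
            row.setField(i, resultSet.getObject(i + 1)); // JDBC columns are 1-based
        }
        hasNext = resultSet.next(); // advance the cursor for the next call
        return row;
    } catch (SQLException e) {
        throw new IOException("Could not read the next row from the result set.", e);
    }
}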