Flink SQL - How to parse a TIMESTAMP with custom pattern? - apache-flink

From the documentation it looks like Flink SQL can only parse timestamps in a certain format, namely:
TIMESTAMP string: Parses a timestamp string in the form "yy-mm-dd hh:mm:ss.fff" to a SQL timestamp.
Is there any way to pass in a custom DateTimeFormatter to parse a different kind of timestamp format?

You can implement arbitrary parsing logic with a user-defined scalar function (UDF).
In Scala this would look as follows:
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.flink.table.functions.ScalarFunction

class TsParser extends ScalarFunction {
  def eval(s: String): Timestamp = {
    // parse with any custom pattern, e.g. "dd-MM-yyyy HH:mm:ss", and convert to java.sql.Timestamp
    Timestamp.valueOf(LocalDateTime.parse(s, DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss")))
  }
}
Once defined, the function has to be registered at the TableEnvironment:
tableEnv.registerFunction("tsParser", new TsParser())
Now you can use the function tsParser just like any built-in function.
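For example, assuming a registered table MyTable with a string column ts_string (both names are illustrative, not from the original answer), the UDF can be called directly in a query:
tableEnv.sqlQuery("SELECT tsParser(ts_string) AS ts FROM MyTable")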
See the documentation for details.

Related

How to convert RowData into Row when using DynamicTableSink

I have a question regarding the new sourceSinks interface in Flink. I am currently implementing a new custom DynamicTableSinkFactory, DynamicTableSink, SinkFunction and OutputFormat. I use the JDBC connector as an example and I use Scala.
All data that is fed into the sink has the type Row. So the OutputFormat serialisation is based on the Row Interface:
override def writeRecord(record: Row): Unit = {...}
As stated in the documentation:
records must be accepted as org.apache.flink.table.data.RowData. The
framework provides runtime converters such that a sink can still work
on common data structures and perform a conversion at the beginning.
The goal here is to keep the Row data structure and only convert Row into RowData when inserted into the SinkFunction. So in this way the rest of the code does not need to be changed.
class MySinkFunction(outputFormat: MyOutputFormat) extends RichSinkFunction[RowData] with CheckpointedFunction
So the resulting question is: How to convert RowData into Row when using a DynamicTableSink and OutputFormat? Where should the conversion happen?
links:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sourceSinks.html
https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc
Thanks.
You can obtain a converter instance in the Context provided in org.apache.flink.table.connector.sink.DynamicTableSink#getSinkRuntimeProvider.
// create type information for the DeserializationSchema
final TypeInformation<RowData> producedTypeInfo =
        context.createTypeInformation(producedDataType);
// most of the code in DeserializationSchema will not work on internal data structures
// create a converter for conversion at the end
final DataStructureConverter converter =
        context.createDataStructureConverter(producedDataType);
The instance is Java serializable and can be passed into the sink function. You should also call the converter.open() method in your sink function.
A more complex example can be found here (for sources but sinks work in a similar way). Have a look at SocketDynamicTableSource and ChangelogCsvFormat in the same package.
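A minimal sketch of where that conversion could happen, assuming the converter is passed into the sink function (the class name and the final hand-off to your Row-based OutputFormat are illustrative, not from the answer):
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.table.connector.RuntimeConverter;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.data.RowData;
import org.apache.flink.types.Row;

class MySinkFunction extends RichSinkFunction<RowData> {
    private final DynamicTableSink.DataStructureConverter converter;

    MySinkFunction(DynamicTableSink.DataStructureConverter converter) {
        this.converter = converter;
    }

    @Override
    public void open(Configuration parameters) {
        // the converter must be opened before it is used
        converter.open(RuntimeConverter.Context.create(MySinkFunction.class.getClassLoader()));
    }

    @Override
    public void invoke(RowData value, Context context) {
        // toExternal() converts the internal RowData into the external structure
        // declared by the consumed DataType, i.e. org.apache.flink.types.Row here
        Row row = (Row) converter.toExternal(value);
        // hand the converted Row to the existing Row-based OutputFormat / JDBC logic
    }
}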

Flink json serialization timezone issue

I use JsonRowSerializationSchema to serialize Flink's Row into JSON. The SQL timestamp serialization has timezone issues.
val row = new Row(1)
row.setField(0, new Timestamp(0))
val tableSchema = TableSchema
.builder
.field("c", DataTypes.TIMESTAMP(3).bridgedTo(classOf[Timestamp]))
.build
val serializer = JsonRowSerializationSchema.builder()
.withTypeInfo(tableSchema.toRowType)
.build()
println(new String(serializer.serialize(row)))
{"c":"1969-12-31T16:00:00Z"}
I see that it uses PST (my local time zone) to interpret the timestamp, but then outputs it as UTC (note the Z in the output).
If I do TimeZone.setDefault(TimeZone.getTimeZone("UTC")), then it prints {"c":"1970-01-01T00:00:00Z"}. My timestamps are created for UTC time, and I want Flink to interpret them as UTC.
Looking at the Flink implementation, the following two methods are involved.
private JsonNode convertLocalDateTime(ObjectMapper mapper, JsonNode reuse, Object object) {
    return mapper.getNodeFactory()
        .textNode(RFC3339_TIMESTAMP_FORMAT.format((LocalDateTime) object));
}

private JsonNode convertTimestamp(ObjectMapper mapper, JsonNode reuse, Object object) {
    Timestamp timestamp = (Timestamp) object;
    return convertLocalDateTime(mapper, reuse, timestamp.toLocalDateTime());
}
It looks like the implementation is hardcoded; is there any way to tell Flink to use UTC without changing the system time zone?
The java.sql.Timestamp is very problematic because it depends on a time zone. This is why we replaced it with the new java.time.* classes in the new Table/SQL type system.
We recommend configuring all Flink JVMs to the UTC time zone when using this outdated implementation.
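If you stay with the old types, one way to do this (an assumption about a flink-conf.yaml-based deployment, not part of the original answer) is to pass the time zone to the JVMs via env.java.opts:
env.java.opts: -Duser.timezone=UTC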
For Table/SQL, we use the new org.apache.flink.formats.json.JsonRowDataSerializationSchema, but this works on internal data structures. I would recommend just copying the source code of JsonRowSerializationSchema and implementing the format as you need it. Or use the Jackson library directly, which would avoid dealing with TypeInformation at all.
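If you go the Jackson route, a minimal sketch (the field name "c" and the epoch-0 value mirror the question; the snippet is a fragment to place inside any method) of writing a java.sql.Timestamp as a UTC RFC 3339 string, independent of the JVM default time zone:
import java.sql.Timestamp;
import java.time.format.DateTimeFormatter;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

ObjectMapper mapper = new ObjectMapper();
ObjectNode node = mapper.createObjectNode();
Timestamp ts = new Timestamp(0);
// Timestamp.toInstant() is time-zone independent, so the formatted value is always UTC
node.put("c", DateTimeFormatter.ISO_INSTANT.format(ts.toInstant()));
System.out.println(node); // {"c":"1970-01-01T00:00:00Z"}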

How to get a document from mongodb by using its string id and string collection name in SpringData

I know that generally, we need to do something similar to this for getting a document back from mongodb in spring data:
Define a class and annotate it with @Document:
@Document("persons")
public class Person
Use MongoTemplate:
mongoOps.findById(p.getId(), Person.class);
The problem is that in runtime I don't know the class type of the document, I just have its string collection name and its string Id. How is it possible to retrieve the document using SpringData? Something like this:
db.myCollectionName.findOne({_id: myId})
The result object type is not a concern; it can even be an Object. I just want to map it to a Jackson JsonNode.
As a possible workaround, you can use the aggregate function of MongoOperations like this:
AggregationResults<Object> aggResults = mongoOps.aggregate(
        newAggregation(match(Criteria.where("_id").is(myId))),
        myCollectionName, Object.class);
return aggResults.getUniqueMappedResult();
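Since the question asks for a Jackson JsonNode, the untyped result can then be converted with an ObjectMapper (a small sketch, not part of the original answer, assuming an ObjectMapper instance is at hand):
JsonNode node = new ObjectMapper().valueToTree(aggResults.getUniqueMappedResult());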

Registering an Aggregate UDF in Apache Flink

I am trying to follow the steps here to create a basic Flink Aggregate UDF. I've added the dependencies and implemented
public class MyAggregate extends AggregateFunction<Long, TestAgg> {..}
I've implemented the mandatory methods as well as a few others: accumulate, merge, etc. All of this builds without errors. Now, according to the docs, I should be able to register it as
StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment sTableEnv = StreamTableEnvironment.getTableEnvironment(sEnv);
sTableEnv.registerFunction("MyMin", new MyAggregate());
But registerFunction seems to accept only a ScalarFunction as input. I am getting an incompatible type error: The method registerFunction(String, ScalarFunction) in the type TableEnvironment is not applicable for the arguments (String, MyAggregate)
Any help would be great.
You need to import the StreamTableEnvironment for your chosen language, which in your case is org.apache.flink.table.api.java.StreamTableEnvironment.
org.apache.flink.table.api.StreamTableEnvironment is a common abstract class for the Java and Scala variants of StreamTableEnvironment. We've noticed that this part of the API is confusing for users and we will improve it in the future.
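With the Java-specific import in place, the snippet from the question should compile (this assumes the pre-1.10 style API the question uses, where getTableEnvironment is still available):
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment sTableEnv = StreamTableEnvironment.getTableEnvironment(sEnv);
// the Java StreamTableEnvironment also overloads registerFunction for AggregateFunction
sTableEnv.registerFunction("MyMin", new MyAggregate());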

Passing indefinite Query Parameters with RESTful URL and reading them in RESTEasy

I have a requirement to design a RESTful Service using RESTEasy. Clients can call this common service with any number of Query Parameters they would want to. My REST code should be able to read these Query Params in some way. For example if I have a book search service, clients can make the following calls.
http://domain.com/context/rest/books/searchBook?bookName=someBookName
http://domain.com/context/rest/books/searchBook?authorName=someAuthor&pubName=somePublisher
http://domain.com/context/rest/books/searchBook?isbn=213243
http://domain.com/context/rest/books/searchBook?authorName=someAuthor
I have to write a service class like below to handle this.
#Path("/books")
public class BookRestService{
// this is what I currently have, I want to change this method to in-take all the
// dynamic parameters that can come
#GET
#Path("/searchBook")
public Response searchBook(#QueryParam("bookName") String bookName,#QueryParam("isbn") String isbn) {
// fetch all such params
// create a search array and pass to backend
}
#POST
#Path("/addBook")
public Response addBook(......) {
//....
}
}
Sorry for the bad format (I couldn't figure out how code formatting works in this editor!). As you can see, I need to change the method searchBook() so that it will take any number of query parameters.
I saw a similar post here, but couldn't find the right solution.
How to design a RESTful URL for search with optional parameters?
Could any one throw some light on this please?
The best thing to do in this case would be to use a DTO containing all the fields of your search criteria. For example, you mentioned four distinct parameters:
Book Name (bookName)
Author Name (authorName)
Publisher Name (pubName)
ISBN (isbn)
Create a DTO with a @QueryParam annotation on every property you want to map a parameter to:
public class CriteriaDTO {

    @QueryParam("isbn")
    private String isbn;

    // ... getters and setters for the other properties
}
Here is a method doing that for your reference:
@GET
@Produces("application/json")
@Path("/searchBooks")
public ResultDTO search(@Form CriteriaDTO dto) {
}
Using the following URL will populate the CriteriaDTO's isbn property automatically:
your.server.ip:port/URL/Mapping/searchBooks?isbn=123456789&pubName=testing
A similar question was asked here: How do you map multiple query parameters to the fields of a bean on Jersey GET request?
I went with kensen john's answer (UriInfo) instead. It allowed me to simply iterate over the query parameters to check which ones were passed, as in the sketch below.
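A minimal sketch of that UriInfo approach (method, path, and the loop body are illustrative, not from the original posts):
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MultivaluedMap;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.UriInfo;

@GET
@Path("/searchBook")
public Response searchBook(@Context UriInfo uriInfo) {
    MultivaluedMap<String, String> params = uriInfo.getQueryParameters();
    // each key is a query parameter name; each value is the list of values passed for it
    params.forEach((name, values) -> {
        // build the search criteria from whatever parameters the client sent
    });
    return Response.ok().build();
}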
