Hi, I have a Tuple2<String, Integer> that I want to convert to a string and then send to Kafka.
I'm trying to figure out a way to iterate over the tuple and create one string from it, so that if I have N elements in my tuple I get a single string containing all of them.
I tried flatMap, but I'm getting a new string for each element of the tuple.
SingleOutputStreamOperator<String> s = t.flatMap(new FlatMapFunction<Tuple2<String, Integer>, String>() {
    @Override
    public void flatMap(Tuple2<String, Integer> stringIntegerTuple2, Collector<String> collector) throws Exception {
        collector.collect(stringIntegerTuple2.f0 + stringIntegerTuple2.f1);
    }
});
What is the correct way to create one string out of the tuple?
You can just override the .toString() method of a tuple in a custom subclass and use that. Like this:
import org.apache.flink.api.java.tuple.Tuple3;

public class CustomTuple3 extends Tuple3 {
    @Override
    public String toString() {
        return "measurement color=" + this.f0.toString() + " color=" + this.f1.toString() + " color=" + this.f2.toString();
    }
}
So now just use a CustomTuple3 object instead of a Tuple3; when you populate it and call .toString() on it, it will produce that formatted string.
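For completeness, a minimal usage sketch (the stream name tuples is mine, not from the question):

// Assuming a DataStream<CustomTuple3> named tuples, map each tuple to its
// formatted string; the resulting DataStream<String> can then be handed to
// a Kafka producer sink such as FlinkKafkaProducer<String>.
DataStream<String> lines = tuples.map(t -> t.toString());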
Related
I have a bean class
@Getter
@Setter
public class Employee {
    String id;
    String name;
    String depart;
    String address;
    final String pipe = "|";

    @Override
    public String toString() {
        return id + pipe + name + pipe + depart;
    }
}
And I have a JavaRDD<Employee> emprdd;
and when I do emprdd.saveAsTextFile(path); I get the output based on the toString method.
Now I want to write it in Parquet format after converting it to a DataFrame, but I need only (id, name, depart). I tried sqlContext.createDataFrame(rdd, Employee.class); (syntax ignored), but I don't need all the properties.
Can anyone guide me through this? (This is a sample; I have a bean class with 350+ attributes.)
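A hedged sketch of one way to do this, assuming Spark 2.x with a SparkSession named spark (with the older SQLContext the same select-then-write chain applies):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Build the DataFrame from the bean, then project only the columns you need
// before writing Parquet; every other attribute is simply dropped.
Dataset<Row> df = spark.createDataFrame(emprdd, Employee.class)
        .select("id", "name", "depart");
df.write().parquet("/path/to/output");  // output path is a placeholder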
I am new to Flink and am doing something very similar to what's in the link below.
Cannot see message while sinking kafka stream and cannot see print message in flink 1.2
I am also trying to add JSONDeserializationSchema() as the deserializer for my Kafka input JSON messages, which have no key.
But I found that JSONDeserializationSchema() is not present.
Please let me know if I am doing anything wrong.
JSONDeserializationSchema was removed in Flink 1.8, after having been deprecated earlier.
The recommended approach is to write a deserializer that implements DeserializationSchema<T>. Here's an example, which I've copied from the Flink Operations Playground:
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;

/**
 * A Kafka {@link DeserializationSchema} to deserialize {@link ClickEvent}s from JSON.
 */
public class ClickEventDeserializationSchema implements DeserializationSchema<ClickEvent> {

    private static final long serialVersionUID = 1L;

    private static final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public ClickEvent deserialize(byte[] message) throws IOException {
        return objectMapper.readValue(message, ClickEvent.class);
    }

    @Override
    public boolean isEndOfStream(ClickEvent nextElement) {
        return false;
    }

    @Override
    public TypeInformation<ClickEvent> getProducedType() {
        return TypeInformation.of(ClickEvent.class);
    }
}
For a Kafka producer you'll want to implement KafkaSerializationSchema<T>, and you'll find examples of that in that same project.
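For illustration, here's a minimal sketch of what such a producer-side schema could look like; the class name, constructor-supplied topic, and reuse of ClickEvent are my assumptions, not code copied from that project:

import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

// A sketch, not the playground's actual code: serialize each ClickEvent to
// JSON bytes and wrap it in a ProducerRecord for the configured topic.
public class ClickEventSerializationSchema implements KafkaSerializationSchema<ClickEvent> {

    private static final ObjectMapper objectMapper = new ObjectMapper();
    private final String topic;  // assumption: topic name passed in by the caller

    public ClickEventSerializationSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(ClickEvent event, Long timestamp) {
        try {
            return new ProducerRecord<>(topic, objectMapper.writeValueAsBytes(event));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not serialize record: " + event, e);
        }
    }
}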
To solve the problem of reading non-keyed JSON messages from Kafka I used a case class and a JSON parser.
The following code defines a case class and parses the JSON fields using the Play API.
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import play.api.libs.json.{JsValue, Json}

import scala.util.Try

object CustomerModel {
  def readElement(jsonElement: JsValue): Customer = {
    val id = (jsonElement \ "id").get.toString().toInt
    val name = (jsonElement \ "name").get.toString()
    Customer(id, name)
  }

  case class Customer(id: Int, name: String)
}

def main(args: Array[String]): Unit = {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val properties = new Properties()
  properties.setProperty("bootstrap.servers", "xxx.xxx.0.114:9092")
  properties.setProperty("group.id", "test-grp")

  val consumer = new FlinkKafkaConsumer[String]("customer", new SimpleStringSchema(), properties)
  val stream1 = env.addSource(consumer).rebalance

  // Parse each record; on failure, emit a Customer that carries the error text instead.
  val stream2: DataStream[CustomerModel.Customer] = stream1.map { str =>
    Try(CustomerModel.readElement(Json.parse(str)))
      .getOrElse(CustomerModel.Customer(0, Try(CustomerModel.readElement(Json.parse(str))).toString))
  }

  stream2.print("stream2")
  env.execute("This is Kafka+Flink")
}
The Try method lets you overcome exceptions thrown while parsing the data,
and it returns the exception in one of the fields (if we want), or else it can just return the case class object with any given or default fields.
The sample output of the code is:
stream2:1> Customer(1,"Thanh")
stream2:1> Customer(5,"Huy")
stream2:3> Customer(0,Failure(com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
at [Source: ; line: 1, column: 0]))
I am not sure if it is the best approach but it is working for me as of now.
Here the DataStream returns the key/value pair as an ObjectNode; I need the key and value directly, not wrapped in an object, because I need to group the values based on the key.
DataStream<ObjectNode> stream = env
    .addSource(new FlinkKafkaConsumer<>("test5", new JSONKeyValueDeserializationSchema(false), properties));
// stream.keyBy("record1").print();
When I call stream.keyBy("record1").print(); it shows:
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: This type (GenericType<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode>) cannot be used as key.
    at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
    at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
    at ReadFromKafka.main(ReadFromKafka.java:27)
David Anderson's response is correct; as an addition, you can simply create a KeySelector that will extract the key as a String. It could look like this:
public class JsonKeySelector implements KeySelector<ObjectNode, String> {
    @Override
    public String getKey(ObjectNode jsonNodes) throws Exception {
        return jsonNodes.get("key").asText();
    }
}
This obviously assumes that the key is supposed to be a String.
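For completeness, a one-line usage sketch with the stream from the question:

// Key the ObjectNode stream by the String the selector extracts.
stream.keyBy(new JsonKeySelector()).print();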
There are several ways to specify the key selector in a Flink keyBy. For example, if you have a POJO of type Event with the String key in a field named "id", any of these will work:
stream.keyBy("id")
stream.keyBy(event -> event.id)
stream.keyBy(
new KeySelector<Event, String>() {
#Override
public String getKey(Event event) throws Exception {
return event.id;
}
}
)
So long as you can compute the key from the object in a deterministic way, you can make this work.
I have a DataSet<Tuple3<String,String,Double>> values which has the following data:
<Vijaya,Chocolate,5>
<Vijaya,Chips,10>
<Rahul,Chocolate,2>
<Rahul,Chips,8>
I want the DataSet<Tuple5<String,String,Double,String,Double>> values1 as following:
<Vijaya,Chocolate,5,Chips,10>
<Rahul,Chocolate,2,Chips,8>
My code looks like the following:
DataSet<Tuple5<String, String, Double, String, Double>> values1 = values.fullOuterJoin(values)
    .where(0)
    .equalTo(0)
    .with(new JoinFunction<Tuple3<String, String, Double>, Tuple3<String, String, Double>, Tuple5<String, String, Double, String, Double>>() {
        private static final long serialVersionUID = 1L;

        public Tuple5<String, String, Double, String, Double> join(Tuple3<String, String, Double> first, Tuple3<String, String, Double> second) {
            return new Tuple5<String, String, Double, String, Double>(first.f0, first.f1, first.f2, second.f1, second.f2);
        }
    })
    .distinct(1, 3)
    .distinct(1);
In the above code I tried doing a self join. I want the output in that particular format but I am unable to get it.
How can I do this?
Please help.
Since you don't want the output to have the same item repeated, you can use a flat join, in which you output only those records whose value in the 2nd position is not equal to the value in the 4th position. Also, if you want only "Chocolate" in the 2nd position, that can be checked inside the FlatJoinFunction as well. Please find below the link to Flink's documentation about joins with a flat-join function; a sketch of the idea follows it.
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/dataset_transformations.html#join-with-flat-join-function
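For illustration, a hedged sketch of that flat-join idea, reusing the types from the question (I've used an inner join, since both sides are the same dataset):

DataSet<Tuple5<String, String, Double, String, Double>> values1 = values
    .join(values)
    .where(0)
    .equalTo(0)
    .with(new FlatJoinFunction<Tuple3<String, String, Double>,
                               Tuple3<String, String, Double>,
                               Tuple5<String, String, Double, String, Double>>() {
        @Override
        public void join(Tuple3<String, String, Double> first,
                         Tuple3<String, String, Double> second,
                         Collector<Tuple5<String, String, Double, String, Double>> out) {
            // Emit only pairs whose item names differ, so self-pairs like
            // <Vijaya,Chocolate,5,Chocolate,5> are dropped without distinct().
            // A check such as first.f1.compareTo(second.f1) < 0 would also
            // keep just one ordering of each pair.
            if (!first.f1.equals(second.f1)) {
                out.collect(new Tuple5<>(first.f0, first.f1, first.f2, second.f1, second.f2));
            }
        }
    });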
Approach using GroupReduceFunction:
values
    .groupBy(0)
    .reduceGroup(new GroupReduceFunction<Tuple3<String, String, Double>, Tuple2<String, String>>() {
        @Override
        public void reduce(Iterable<Tuple3<String, String, Double>> in, Collector<Tuple2<String, String>> out) {
            StringBuilder output = new StringBuilder();
            String name = null;
            for (Tuple3<String, String, Double> item : in) {
                name = item.f0;
                output.append(item.f1 + "," + item.f2 + ",");
            }
            out.collect(new Tuple2<String, String>(name, output.toString()));
        }
    });
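With the sample input above, this emits one concatenated string per key, along these lines (element order within a group is not guaranteed):

(Vijaya,Chocolate,5.0,Chips,10.0,)
(Rahul,Chocolate,2.0,Chips,8.0,)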
I want to retrieve a list of arrays from an XML file. I use an integration tool for querying. But what should I do if I want to create a list of arrays without any foreach loop? (Reason: in this case foreach cannot be applied.)
XML File Format:
<arr name="ArrayinXML"><str>dsfadasfsdasda</str><str>gdhsdhshhfb</str></arr>
In Index.cshtml:
@p.ArrayinXML.FirstOrDefault()
In the above case, it returns only the first string value and not the second one.
Can you make an extension method that does the foreach for you?
Something like this:
public static class IEnumerableExtensions
{
    public static string ToString<T>(this IEnumerable<T> collection, string separator)
    {
        if (collection == null)
            return String.Empty;

        return String.Join(separator, collection);
    }
}
You could, of course, just call @String.Join(", ", p.ArrayinXML) in your code, but I think the extension method makes it a bit more elegant.
Then add the extension namespace to your web.config, and you can do this in the view:
@p.ArrayinXML.ToString(", ")
Edit:
Here's the extension with a transform parameter so you can customize further:
public static string ToString<T>(this IEnumerable<T> collection, string separator, Func<T, object> transform) where T : class
{
    if (collection == null)
        return String.Empty;

    return String.Join(separator, collection.Select(s => transform(s).ToString()));
}