Giraph: Using Text as VertexId

I am trying out Giraph, using Text as the VertexId type with an edge-based input format. If I use Text as the VertexId I get an error; with LongWritable everything is OK.
Questions:
1. Is it OK to use Text as the VertexId?
2. If yes, what am I doing wrong?
Error:
14/10/15 14:59:28 INFO worker.InputSplitsCallable: call: Loaded 1 input splits in 0.08243016 secs, (v=0, e=12) 0.0 vertices/sec, 145.57777 edges/sec
14/10/15 14:59:28 ERROR utils.LogStacktraceCallable: Execution of callable failed
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.giraph.utils.UnsafeArrayReads.readFully(UnsafeArrayReads.java:103)
at org.apache.hadoop.io.Text.readFields(Text.java:265)
at org.apache.giraph.utils.ByteStructVertexIdDataIterator.next(ByteStructVertexIdDataIterator.java:65)
at org.apache.giraph.edge.AbstractEdgeStore.addPartitionEdges(AbstractEdgeStore.java:161)
Custom format:
public class TextDoubleTextEdgeInputFormat extends
TextEdgeInputFormat<Text, DoubleWritable> {
/** Splitter for endpoints */
private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
...
Main class:
public class HelloWorld extends
BasicComputation<Text, Text, NullWritable, NullWritable> {
@Override
public void compute(Vertex<Text, Text, NullWritable> vertex,
Iterable<NullWritable> messages) {
System.out.print("Hello world from the: " + vertex.getId().toString()
+ " who is following:");
for (Edge<Text, NullWritable> e : vertex.getEdges()) {
System.out.print(" " + e.getTargetVertexId());
}
System.out.println("");
vertex.voteToHalt();
}
public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new GiraphRunner(), args));
}
}
Test starter:
@Test
public void textDoubleTextEdgeInputFormatTest() throws Exception {
String[] graph = { "1 2 1.0", "2 1 1.0", "1 3 1.0", "3 1 1.0",
"2 3 2.0", "3 2 2.0", "3 4 2.0", "4 3 2.0", "3 5 1.0",
"5 3 1.0", "4 5 1.0", "5 4 1.0" };
GiraphConfiguration conf = new GiraphConfiguration();
conf.setComputationClass(HelloWorld.class);
conf.setEdgeInputFormatClass(TextDoubleTextEdgeInputFormat.class);
// conf.setEdgeInputFormatClass(TextLongL6TextEdgeInputFormat.class);
// conf.setVertexOutputFormatClass(IdWithValueTextOutputFormat.class);
InternalVertexRunner.run(conf, null, graph);
}

I'm not experienced in Giraph, but in Apache Spark GraphX the VertexId has type Long.
I wouldn't be surprised if design patterns from Apache Giraph are reused in Apache Spark GraphX, so I would guess that Long is the way to go, which matches your observation that the LongWritable implementation works.
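For illustration, here is a minimal sketch of the same computation keyed by LongWritable ids, i.e. the variant the question reports as working (the class name and imports are my additions, not the original code):

import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

// Sketch: HelloWorld keyed by LongWritable ids instead of Text.
public class HelloWorldLong extends
        BasicComputation<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    public void compute(Vertex<LongWritable, Text, NullWritable> vertex,
            Iterable<NullWritable> messages) {
        System.out.print("Hello world from the: " + vertex.getId()
                + " who is following:");
        for (Edge<LongWritable, NullWritable> e : vertex.getEdges()) {
            System.out.print(" " + e.getTargetVertexId());
        }
        System.out.println("");
        vertex.voteToHalt();
    }
}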

Related

I cannot get events with ProcessingTimeWindows

I am trying to write demo code with the Flink DataStream API. I create a source with fromElements and process the data with *ProcessingTimeWindows, but I cannot get any events in the window.
public class OrderSummaryJob {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<OrderItem> orderItemDataStream = env.fromElements(
"B001 Jack 1671537515000 3",
"B002 Jack 1671537515000 2",
"B001 Tom 1671537518000 1",
"B003 Jason 1671537519000 5",
"B002 Jason 1671537519000 7",
"B005 Jason 1671537519000 -1",
"B001 Green 1671537539000 4"
).flatMap(new OrderItemFlatMapFunction())
.filter(new OrderItemFilterFunction());
orderItemDataStream.print();
/*
The following code seemed cannot be run with *ProcessingTimeWindows
*/
DataStream<OrderSkuQty> orderSkuQtyDataStream = orderItemDataStream
.keyBy(OrderItem::getSkuNo)
//.window(TumblingProcessingTimeWindows.of(Time.seconds(30)))
.window(SlidingProcessingTimeWindows.of(Time.seconds(60),Time.seconds(60)))
.process(new OrderSkuQtyProcessWindowFunction());
orderSkuQtyDataStream.print();
env.execute("Order Item");
}
}
The output shows that the process functions are never called.
If I use *EventTimeWindows the window fires, but with *ProcessingTimeWindows it does not.
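For context, a minimal sketch of what the OrderItem POJO and OrderItemFlatMapFunction referenced above might look like, assuming the four space-separated fields are SKU number, user, epoch-millis timestamp and quantity (all names other than getSkuNo are my assumptions):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Assumed shape of the POJO used in the pipeline above.
public class OrderItem {
    private final String skuNo;
    private final String user;
    private final long timestamp;
    private final int qty;

    public OrderItem(String skuNo, String user, long timestamp, int qty) {
        this.skuNo = skuNo;
        this.user = user;
        this.timestamp = timestamp;
        this.qty = qty;
    }

    public String getSkuNo() { return skuNo; }
    public int getQty() { return qty; }
}

// Assumed parser: turns "B001 Jack 1671537515000 3" into an OrderItem.
class OrderItemFlatMapFunction implements FlatMapFunction<String, OrderItem> {
    @Override
    public void flatMap(String line, Collector<OrderItem> out) {
        String[] fields = line.split(" ");
        out.collect(new OrderItem(fields[0], fields[1],
                Long.parseLong(fields[2]), Integer.parseInt(fields[3])));
    }
}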

How can I read int values from a JSON file in Java?

I want to take "ids" and assign it to an ArrayList. The config.json file is here:
{
"users":[
{"user id":1,"user name":"A","user type":"bot1", "ids": [1 ,2]},
{"user id":2,"user name":"B","user type":"bot2","ids": [1 ,2]},
{"user id":3,"user name":"C","user type":"bot3","ids": [2 ,3]}
]
}
To read this json file I have tried this:
JSONArray jsonArrayForUsers = (JSONArray) jsonObject.get("users");
for (int i=0; i<jsonArrayForUsers.size(); i++) {
JSONObject obj2 = (JSONObject) jsonArrayForUsers.get(i);
long userId = (long) obj2.get("user id");
String userName = (String) obj2.get("user name");
String userType = (String) obj2.get("user type");
JSONArray jsonDatasetIds = (JSONArray) jsonObject.get("ids");
List <Integer> list = Arrays.asList(jsonDatasetIds);// Trying to covert JSONArray into an array but error occurs
//Type mismatch: cannot convert from List<JSONArray> to List<Integer>
users.add(new User((int)userId, userName, userType,list));
}
I can read "user id", "user name", and "user type" correctly, but I don't know how to read the "ids": [1, 2] part and create an ArrayList from it.
Edit:
The first problem was solved after I changed the code to this:
//It should've been 'obj2.get()'
JSONArray jsonDatasetIds = (JSONArray) obj2.get("ids");
for(int a = 0; i<jsonDatasetIds.size();i++){
list.add((int)jsonDatasetIds.get(a));
}
users.add(new User((int)userId, userName, userType,list));
But now I cannot add the values from jsonDatasetIds to list.
In other words, I cannot take the values out of the JSONArray.
There are many ways to improve the code; let's start by reformulating the problem: you need to go from a JSON string to a model object with all the attributes mapped.
There are good libraries for doing just this, like Jackson and Gson, but it can also be done with json-simple 1.1.1 as you are doing.
Assuming your model object looks like this:
public class User {
private int id;
private String name;
private String type;
private List<Integer> ids;
// and constructor + getters + setters
}
And starting from the JSON you provided:
{
"users":[
{"user id":1, "user name":"A", "user type":"bot1", "ids":[1, 2]},
{"user id":2, "user name":"B", "user type":"bot2", "ids":[1, 2]},
{"user id":3, "user name":"C", "user type":"bot3", "ids":[2, 3]}
]
}
We can make a first attempt that looks like this:
public static void main(String[] args) throws ParseException {
String json =
"{ \"users\":[" +
" {\"user id\":1,\"user name\":\"A\",\"user type\":\"bot1\", \"ids\": [1 ,2]}," +
" {\"user id\":2,\"user name\":\"B\",\"user type\":\"bot2\",\"ids\": [1 ,2]}," +
" {\"user id\":3,\"user name\":\"C\",\"user type\":\"bot3\",\"ids\": [2 ,3]}" +
" ]" +
"}";
List<User> result = new ArrayList<>();
JSONParser parser = new JSONParser();
JSONObject root = (JSONObject) parser.parse(json);
List<JSONObject> users = (List<JSONObject>) root.get("users");
for (JSONObject user: users) {
List<Integer> userIds = new ArrayList<>();
for (Long id : (List<Long>) user.get("ids")) {
userIds.add(id.intValue());
}
result.add(new User(((Long)user.get("user id")).intValue(), (String) user.get("user name"), (String) user.get("user type"), userIds));
}
System.out.println(result);
}
As you can see, a JSONArray can be cast directly to a List, and if we look at the signature of the JSONArray class we can understand why:
public class JSONArray extends ArrayList implements List, JSONAware, JSONStreamAware
JSONArray actually extends ArrayList, so you do not need to create a new one; that is why the line List<JSONObject> users = (List<JSONObject>) root.get("users"); works.
Now that you have a list, it's possible to iterate over the elements with the usual foreach loop for (JSONObject user: users) {...} and process each element.
For the internal "ids" we could do the same thing, but from your code it seems the ids must be integers, and json-simple returns Long objects, so we need to convert them; that is why the second loop is there (if the model object accepted a List<Long>, it would not be needed).
A different way of doing the same thing is with Java 8 streams, specifying the steps needed to convert the source object into the destination one:
public static void main(String[] args) throws ParseException {
String json =
"{ \"users\":[" +
" {\"user id\":1,\"user name\":\"A\",\"user type\":\"bot1\", \"ids\": [1 ,2]}," +
" {\"user id\":2,\"user name\":\"B\",\"user type\":\"bot2\",\"ids\": [1 ,2]}," +
" {\"user id\":3,\"user name\":\"C\",\"user type\":\"bot3\",\"ids\": [2 ,3]}" +
" ]" +
"}";
JSONParser parser = new JSONParser();
JSONObject root = (JSONObject) parser.parse(json);
List<User> result = ((List<JSONObject>)root.get("users"))
.stream()
.map(user -> new User(
((Long)user.get("user id")).intValue(),
(String) user.get("user name"),
(String) user.get("user type"),
((List<Long>) user.get("ids"))
.stream()
.map(Long::intValue)
.collect(Collectors.toList())))
.collect(Collectors.toList());
System.out.println(result);
}
In my opinion the Stream API is clearer in situations where you need to transform a list of objects from one type to another step by step. In this example it does not make much of a difference, but in more complex situations with many intermediate steps it can be a really useful tool.
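For completeness, here is a sketch of how the same mapping could look with Jackson (mentioned at the start of this answer), using @JsonProperty to handle the JSON field names that contain spaces; the wrapper class and field names are my own:

import java.util.List;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonExample {
    // Wrapper for the top-level {"users": [...]} object.
    public static class UsersWrapper {
        public List<JacksonUser> users;
    }

    // Model with the JSON names (which contain spaces) mapped to Java fields.
    public static class JacksonUser {
        @JsonProperty("user id")   public int id;
        @JsonProperty("user name") public String name;
        @JsonProperty("user type") public String type;
        @JsonProperty("ids")       public List<Integer> ids;
    }

    public static void main(String[] args) throws Exception {
        String json = "{\"users\":[{\"user id\":1,\"user name\":\"A\",\"user type\":\"bot1\",\"ids\":[1,2]}]}";
        UsersWrapper wrapper = new ObjectMapper().readValue(json, UsersWrapper.class);
        System.out.println(wrapper.users.get(0).ids); // prints [1, 2]
    }
}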

SimpleScheduledRoutePolicy does not fire at the specified time

I'm developing a web application where a user adds an issue, specifying the date and time at which he/she should get a notification mail. I'm new to Apache Camel and the Quartz scheduler.
I have written sample code as below; IssueDTO is just a POJO. If the issue is repetitive, I configure a cron schedule, which works properly: if I specify a frequency of 5, I get the expected output, which is a println statement to the console. But if the issue is not repetitive, I use SimpleScheduledRoutePolicy with a hardcoded date and time at which the Processor's process() method should run. I simply set the date and time to 2 minutes after the current system time to check whether the code is working, but it never enters the process method and never prints this statement => System.out.println("*****************" + issueDTO.getIssueId() + " running at " + gc.getTime().toString());
@Override
public void configure() throws Exception
{
System.out.println("in ReminderRouteBuilder configure()");
System.out.println("Issue ID : " + issueDTO.getIssueId());
System.out.println("Issue Frequency : " + issueDTO.getFrequency());
System.out.println("Is Repetative : " + issueDTO.getIsRepetitive());
// if Repetitive
if (StringUtil.getBoolean(issueDTO.getIsRepetitive()))
{
String fromString = "quartz2://" + issueDTO.getIssueId() + "?cron=0/" + issueDTO.getFrequency() + "+*+*+*+*+?";
from(fromString).process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception
{
System.out.println(issueDTO.getIssueId() + " running every " + issueDTO.getFrequency() + " sec...");
}
});
}
// if not Repetitive
else
{
SimpleScheduledRoutePolicy policy = new SimpleScheduledRoutePolicy();
GregorianCalendar gc = new GregorianCalendar(2019, Calendar.AUGUST, 31, 13, 45);
policy.setRouteStartDate(gc.getTime());
from("direct:start").routeId(issueDTO.getIssueId()).routePolicy(policy).process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception
{
System.out.println("*****************" + issueDTO.getIssueId() + " running at " + gc.getTime().toString());
}
});
}
}
Am I missing something?
A direct endpoint needs to be triggered manually by some event. If you need something that is triggered automatically after the route starts, you can use a Timer endpoint with repeatCount=1 or a Quartz endpoint with fireNow=true.
E.g. this will trigger an Exchange only once, after route startup:
from("timer:start?repeatCount=1").routeId(issueDTO.getIssueId()).routePolicy(policy)
OK, I got the solution :)
I used a cron expression specifying the exact date and time, and it worked.
@Override
public void configure() throws Exception
{
System.out.println("in ReminderRouteBuilder configure()");
System.out.println("Issue ID : " + issueDTO.getIssueId());
System.out.println("Issue Frequency : " + issueDTO.getFrequency());
System.out.println("Is Repetative : " + issueDTO.getIsRepetitive());
// if Repetitive
if (StringUtil.getBoolean(issueDTO.getIsRepetitive()))
{
String fromString = "quartz2://" + issueDTO.getIssueId() + "?cron=0/" + issueDTO.getFrequency() + "+*+*+*+*+?";
from(fromString).process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception
{
System.out.println(issueDTO.getIssueId() + " running every " + issueDTO.getFrequency() + " sec...");
}
});
}
// if not Repetitive
else
{
String fromString = "quartz2://" + issueDTO.getIssueId() + "?cron=0 40 12 4 SEP ? 2019";
from(fromString).process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception
{
System.out.println(issueDTO.getIssueId() + " running now");
}
});
}
}

Does Apache Flink support multiple events with the same timestamp?

It seems like Apache Flink does not handle two events with the same timestamp well in certain scenarios.
According to the docs, a Watermark of t indicates that any new events will have a timestamp strictly greater than t. Unless you can completely rule out the possibility of two events having the same timestamp, you are never safe to emit a Watermark of t. Enforcing distinct timestamps also limits the number of events per second a system can process to 1000.
Is this really an issue in Apache Flink or is there a workaround?
For those of you who'd like a concrete example to play with, my use case is to build an hourly aggregated rolling word count over an event-time-ordered stream. Here is the data sample that I copied into a file (notice the duplicate 9):
mario 0
luigi 1
mario 2
mario 3
vilma 4
fred 5
bob 6
bob 7
mario 8
dan 9
dylan 9
dylan 11
fred 12
mario 13
mario 14
carl 15
bambam 16
summer 17
anna 18
anna 19
edu 20
anna 21
anna 22
anna 23
anna 24
anna 25
And the code:
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment()
.setParallelism(1)
.setMaxParallelism(1);
env.setStreamTimeCharacteristic(EventTime);
String fileLocation = "full file path here";
DataStreamSource<String> rawInput = env.readFile(new TextInputFormat(new Path(fileLocation)), fileLocation);
rawInput.flatMap(parse())
.assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks<TimestampedWord>() {
@Nullable
@Override
public Watermark checkAndGetNextWatermark(TimestampedWord lastElement, long extractedTimestamp) {
return new Watermark(extractedTimestamp);
}
@Override
public long extractTimestamp(TimestampedWord element, long previousElementTimestamp) {
return element.getTimestamp();
}
})
.keyBy(TimestampedWord::getWord)
.process(new KeyedProcessFunction<String, TimestampedWord, Tuple3<String, Long, Long>>() {
private transient ValueState<Long> count;
@Override
public void open(Configuration parameters) throws Exception {
count = getRuntimeContext().getState(new ValueStateDescriptor<>("counter", Long.class));
}
@Override
public void processElement(TimestampedWord value, Context ctx, Collector<Tuple3<String, Long, Long>> out) throws Exception {
if (count.value() == null) {
count.update(0L);
setTimer(ctx.timerService(), value.getTimestamp());
}
count.update(count.value() + 1);
}
@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tuple3<String, Long, Long>> out) throws Exception {
long currentWatermark = ctx.timerService().currentWatermark();
out.collect(new Tuple3(ctx.getCurrentKey(), count.value(), currentWatermark));
if (currentWatermark < Long.MAX_VALUE) {
setTimer(ctx.timerService(), currentWatermark);
}
}
private void setTimer(TimerService service, long t) {
service.registerEventTimeTimer(((t / 10) + 1) * 10);
}
})
.addSink(new PrintlnSink());
env.execute();
}
private static FlatMapFunction<String, TimestampedWord> parse() {
return new FlatMapFunction<String, TimestampedWord>() {
@Override
public void flatMap(String value, Collector<TimestampedWord> out) {
String[] wordsAndTimes = value.split(" ");
out.collect(new TimestampedWord(wordsAndTimes[0], Long.parseLong(wordsAndTimes[1])));
}
};
}
private static class TimestampedWord {
private final String word;
private final long timestamp;
private TimestampedWord(String word, long timestamp) {
this.word = word;
this.timestamp = timestamp;
}
public String getWord() {
return word;
}
public long getTimestamp() {
return timestamp;
}
}
private static class PrintlnSink implements org.apache.flink.streaming.api.functions.sink.SinkFunction<Tuple3<String, Long, Long>> {
@Override
public void invoke(Tuple3<String, Long, Long> value, Context context) throws Exception {
long timestamp = value.getField(2);
System.out.println(value.getField(0) + "=" + value.getField(1) + " at " + (timestamp - 10) + "-" + (timestamp - 1));
}
}
I get
mario=4 at 1-10
dylan=2 at 1-10
luigi=1 at 1-10
fred=1 at 1-10
bob=2 at 1-10
vilma=1 at 1-10
dan=1 at 1-10
vilma=1 at 10-19
luigi=1 at 10-19
mario=6 at 10-19
carl=1 at 10-19
bambam=1 at 10-19
dylan=2 at 10-19
summer=1 at 10-19
anna=2 at 10-19
bob=2 at 10-19
fred=2 at 10-19
dan=1 at 10-19
fred=2 at 9223372036854775797-9223372036854775806
dan=1 at 9223372036854775797-9223372036854775806
carl=1 at 9223372036854775797-9223372036854775806
mario=6 at 9223372036854775797-9223372036854775806
vilma=1 at 9223372036854775797-9223372036854775806
edu=1 at 9223372036854775797-9223372036854775806
anna=7 at 9223372036854775797-9223372036854775806
summer=1 at 9223372036854775797-9223372036854775806
bambam=1 at 9223372036854775797-9223372036854775806
luigi=1 at 9223372036854775797-9223372036854775806
bob=2 at 9223372036854775797-9223372036854775806
dylan=2 at 9223372036854775797-9223372036854775806
Notice dylan=2 at 0-9 where it should be 1.
No, there isn't a problem with having stream elements with the same timestamp. But a Watermark is an assertion that all events that follow will have timestamps greater than the watermark, so this does mean that you cannot safely emit a Watermark t for a stream element at time t, unless the timestamps in the stream are strictly monotonically increasing -- which is not the case if there are multiple events with the same timestamp. This is why the AscendingTimestampExtractor produces watermarks equal to currentTimestamp - 1, and you should do the same.
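A minimal sketch of the assigner from the question adjusted along these lines (only the emitted watermark changes):

.assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks<TimestampedWord>() {
    @Nullable
    @Override
    public Watermark checkAndGetNextWatermark(TimestampedWord lastElement, long extractedTimestamp) {
        // Emit a watermark one less than the element's timestamp, so a later
        // element carrying the same timestamp does not violate the contract.
        return new Watermark(extractedTimestamp - 1);
    }

    @Override
    public long extractTimestamp(TimestampedWord element, long previousElementTimestamp) {
        return element.getTimestamp();
    }
})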
Notice that your application is actually reporting that dylan=2 at 0-10, not at 0-9. This is because the watermark resulting from dylan at time 11 is triggering the first timer (the timer set for time 10, but since there is no element with a timestamp of 10, that timer doesn't fire until the watermark from "dylan 11" arrives). And your PrintlnSink uses timestamp - 1 to indicate the upper end of the timespan, hence 11 - 1, or 10, rather than 9.
There's nothing wrong with the output of your ProcessFunction, which looks like this:
(mario,4,11)
(dylan,2,11)
(luigi,1,11)
(fred,1,11)
(bob,2,11)
(vilma,1,11)
(dan,1,11)
(vilma,1,20)
(luigi,1,20)
(mario,6,20)
(carl,1,20)
(bambam,1,20)
(dylan,2,20)
...
It is true that by time 11 there have been two dylans. But the report produced by PrintlnSink is misleading.
Two things need to be changed to get your example working as intended. First, the watermarks need to satisfy the watermarking contract, which isn't currently the case, and second, the windowing logic isn't quite right. The ProcessFunction needs to be prepared for the "dylan 11" event to arrive before the timer closing the window for 0-9 has fired. This is because the "dylan 11" stream element precedes the watermark generated from it in the stream.
Update: events whose timestamps are beyond the current window (such as "dylan 11") can be handled as follows (a sketch follows the list):
1. keep track of when the current window ends
2. rather than incrementing the counter, add events for times after the current window to a list
3. after a window ends, consume events from that list that fall into the next window
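A rough sketch of those three steps, written as a drop-in replacement for the KeyedProcessFunction in the question (it keeps the cumulative count, assumes the same window length of 10 used by setTimer, and additionally needs ListState, ListStateDescriptor, java.util.ArrayList and java.util.List imports; this is not the original author's code):

.process(new KeyedProcessFunction<String, TimestampedWord, Tuple3<String, Long, Long>>() {
    private transient ValueState<Long> count;
    private transient ValueState<Long> windowEnd;          // exclusive end of the current window
    private transient ListState<TimestampedWord> buffered; // events beyond the current window

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("counter", Long.class));
        windowEnd = getRuntimeContext().getState(new ValueStateDescriptor<>("windowEnd", Long.class));
        buffered = getRuntimeContext().getListState(new ListStateDescriptor<>("buffered", TimestampedWord.class));
    }

    @Override
    public void processElement(TimestampedWord value, Context ctx, Collector<Tuple3<String, Long, Long>> out) throws Exception {
        if (windowEnd.value() == null) {
            // first element for this key: open the window that contains it
            windowEnd.update(((value.getTimestamp() / 10) + 1) * 10);
            count.update(0L);
            ctx.timerService().registerEventTimeTimer(windowEnd.value());
        }
        if (value.getTimestamp() < windowEnd.value()) {
            count.update(count.value() + 1);   // belongs to the current window
        } else {
            buffered.add(value);               // beyond the current window: keep it for later
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tuple3<String, Long, Long>> out) throws Exception {
        out.collect(new Tuple3<>(ctx.getCurrentKey(), count.value(), timestamp));
        // open the next window and replay buffered events that now fall into it
        long nextEnd = timestamp + 10;
        long replayed = 0;
        List<TimestampedWord> stillBuffered = new ArrayList<>();
        for (TimestampedWord w : buffered.get()) {
            if (w.getTimestamp() < nextEnd) {
                replayed++;
            } else {
                stillBuffered.add(w);
            }
        }
        buffered.update(stillBuffered);
        count.update(count.value() + replayed);
        windowEnd.update(nextEnd);
        if (ctx.timerService().currentWatermark() < Long.MAX_VALUE) {
            ctx.timerService().registerEventTimeTimer(nextEnd);
        }
    }
})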

How to format a float number in Jason to show only two decimals?

How to format a number as a currency with two decimals in Jason?
The code below illustrates the case:
products([["Banana",1], ["Apple",2], ["Pinapple",2.5]]).
margin(2).
!printPrices.
+!printPrices: products(List) & margin(Z)<-
.length(List,LLenght);
-+listSize(0);
while(listSize(Sz) & Sz < LLenght)
{
.random(Y);
.nth(Sz,List,Item);
.nth(0,Item,Name);
.nth(1,Item,Price);
.print("Product(",Sz,"): ",Name," Price $",Y*Z+Price);
-+listSize(Sz+1);
}.
The output is below. I'd like to make it more readable; notice that the floating-point numbers have many digits:
[sampleagent] Product(0): Banana Price $1.3689469979841409
[sampleagent] Product(1): Apple Price $2.0475157980624523
[sampleagent] Product(2): Pinapple Price $3.4849443740416803
In fact there is no default internal action in Jason to format it as you want. However, you can create your own internal action like this:
import jason.asSemantics.*;
import jason.asSyntax.*;
public class formatCurrency extends DefaultInternalAction {
private static final long serialVersionUID = 1L;
@Override
public Object execute(TransitionSystem ts, Unifier un, Term[] args) throws Exception {
StringTerm result = new StringTermImpl(String.format("%.2f", Float.valueOf(args[0].toString())));
un.unifies(result, args[1]);
return true;
}
}
In your agent, you can call this action by:
package_name.formatCurrency(10.5555,Price);
