Flink session window not working as expected - flink-streaming

The session window in Flink is not working as expected in the prod environment (the same logic works locally). The idea is to emit the count of 'sample_event_two' for a specific user id and record id if there is at least one event of type 'sample_event_one' for the same user id and record id. ProcessingTimeSessionWindows with a session gap of 30 minutes is used here, and the ProcessWindowFunction has the logic below (I do a keyBy on the user id and record id fields before setting the window):
public void process(
String s,
Context context,
Iterable<SampleEvent> sampleEvents,
Collector<EnrichedSampleEvent> collector)
throws Exception {
EnrichedSampleEvent event = null;
boolean isSampleEventOnePresent = false;
int count = 0;
for (SampleEvent sampleEvent : sampleEvents) {
if (sampleEvent.getEventName().equals("sample_event_one_name")) {
Logger.info("Received sample_event_one for userId: {}", sampleEvent.getUserId());
isSampleEventOnePresent = true;
} else {
// Calculate the count for sample_event_two
count++;
if (Objects.isNull(event)) {
event = new EnrichedSampleEvent();
event.setUserId(sampleEvent.getUserId());
}
}
}
if (isSampleEventOnePresent && Objects.nonNull(event)) {
Logger.info(
"Created EnrichedSampleEvent for userId: {} with count: {}",
event.getUserId(),
event.getCount());
collector.collect(event);
} else if (Objects.nonNull(event)) {
Logger.info(
"No sampleOneEvent event found sampleTwoEvent with userId: {}, count: {}",
event.getUserId(),
count);
}
}
Although sample_event_one is present in the collection (confirmed by checking that the log message "Received sample_event_one" was present) and the count is calculated correctly, I don't see any output event being created. Instead of an EnrichedSampleEvent being emitted, I see the log message "No sampleOneEvent event found sampleTwoEvent with userId: 123, count: 5". Can someone help me fix this?

Your ProcessWindowFunction will be called for each key individually. Since the key is a combination of user id and record id, it's not enough to know that "Received sample_event_one" appears in the logs for the same user. Even though it was the same user, it might have had a different record id.
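To make the point concrete, here is a minimal sketch in plain JavaScript (it is not Flink code) that partitions events by the same composite key and applies the same decision per group. The sample events, the "sample_event_two_name" value, and the groupByKey and summarize helpers are all made up for illustration.

```javascript
// Hypothetical events: the same user, but two different record ids.
const events = [
  { userId: "123", recordId: "A", eventName: "sample_event_one_name" },
  { userId: "123", recordId: "B", eventName: "sample_event_two_name" },
  { userId: "123", recordId: "B", eventName: "sample_event_two_name" },
];

// Partition events the way a keyBy on (userId, recordId) would.
function groupByKey(items) {
  const groups = new Map();
  for (const e of items) {
    const key = `${e.userId}|${e.recordId}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(e);
  }
  return groups;
}

// Same decision as the window function, applied to one group at a time.
function summarize(group) {
  const hasEventOne = group.some(e => e.eventName === "sample_event_one_name");
  const count = group.filter(e => e.eventName !== "sample_event_one_name").length;
  return { hasEventOne, count, emits: hasEventOne && count > 0 };
}

const summaries = new Map();
for (const [key, group] of groupByKey(events)) {
  summaries.set(key, summarize(group));
}
```

Here the group for record A logs "Received sample_event_one" but has nothing to count, while the group for record B has a count but no sample_event_one, so neither group emits anything even though both log lines mention the same user.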


Helping me understand session api Gatling

I am new to Gatling.
I am trying to loop over a JSON response, find the country code that I am looking for, and take the id corresponding to that country code.
Sample algorithm:
list.foreach( value => { if (value.countryCode == "PL") then store value.id })
In Gatling:
def getOffer() = {
exec(
http("GET /offer")
.get("/offer")
.check(status.is(Constant.httpOk))
.check((bodyString.exists),
jsonPath("$[*]").ofType[Map[String,Any]].findAll.saveAs("offerList")))
.foreach("${offerList}", "item"){
exec(session => {
val itemMap = session("item").as[Map[String,Any]]
val countryCodeId = itemMap("countryCode")
println("****" + countryCodeId)
// => prints every country code in the list
if (countryCodeId =="PL"){ // if statement condition
println("*************"+ itemMap("offerId")); // print the id eg : "23"
session.set("offerId", itemMap("offerId")); // set the id on the session
}
println("$$$$$$$$$$$$$$" + session("offerId")) // check whether the session contains the offerId (it does not)
session
})
}
}
When I try to print session("offerId"), it prints "item" and not the offerId.
I looked at the documentation but I didn't understand the behaviour. Could you please explain it to me?
It's all in the documentation.
Session instances are immutable!
Why is that so? Because Sessions are messages that are dealt with in a
multi-threaded concurrent way, so immutability is the best way to deal
with state without relying on synchronization and blocking.
A very common pitfall is to forget that set and setAll actually return
new instances.
val session: Session = ???
// wrong usage
session.set("foo", "FOO") // wrong: the result of this set call is just discarded
session.set("bar", "BAR")
// proper usage
session.set("foo", "FOO").set("bar", "BAR")
So what you want is:
val newSession =
if (countryCodeId =="PL"){ // if statement condition
println("*************"+ itemMap("offerId")); // print the id eg : "23"
session.set("offerId", itemMap("offerId")); // set the id on the session
} else {
session
}
// verify that the session contains the offerId
println("$$$$$$$$$$$$$$" + newSession("offerId").as[String])
newSession
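The same pitfall can be reproduced outside Gatling. Below is a minimal sketch in JavaScript, assuming a made-up ImmutableSession class that only imitates the "set returns a new instance" behaviour; it is not Gatling's Session class and the attribute values are invented.

```javascript
// Hypothetical stand-in for an immutable session; not Gatling's real class.
class ImmutableSession {
  constructor(attributes = {}) {
    this.attributes = Object.freeze({ ...attributes });
  }
  // Returns a NEW session; the receiver is left unchanged.
  set(key, value) {
    return new ImmutableSession({ ...this.attributes, [key]: value });
  }
  get(key) {
    return this.attributes[key];
  }
}

const session = new ImmutableSession({ item: "some-offer" });

// Wrong usage: the returned session is thrown away.
session.set("offerId", "23");
const discarded = session.get("offerId"); // undefined

// Proper usage: keep the returned session and continue with it.
const newSession = session.set("offerId", "23");
const kept = newSession.get("offerId"); // "23"
```

This is why the function passed to exec has to return the session produced by set, not the one it received.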

How to keep track of the number of times a user has been mentioned?

In my bot, I have a message counter that stores the number of times a user sent a message in the server.
I am trying to count how many times a user has been mentioned in the server. Does anyone know how I could do it?
You can use message.mentions.members (or message.mentions.users) to see the mentions in a message. You can store the number of mentions for every user: every time they are mentioned, you increase the count.
var mention_count = {};
client.on('message', message => {
for (let id of message.mentions.users.keyArray()) {
if (!mention_count[id]) mention_count[id] = 1;
else mention_count[id]++;
}
});
Please note that mention_count will be reset every time you restart your bot, so remember to store it in a file or in a database to avoid losing it.
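As a rough sketch of the "store it in a file" option, the counts could be kept in a JSON file using Node's built-in fs module. The file location and the loadCounts, saveCounts and addMentions helpers are assumptions made for this example; a database would be the sturdier choice for a busy bot.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical location; a real bot would use a fixed path or a database.
const COUNTS_FILE = path.join(os.tmpdir(), 'mention_count.json');

// Read the saved counts, or start empty if the file is missing or unreadable.
function loadCounts(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    return {};
  }
}

// Overwrite the file with the current counts.
function saveCounts(file, counts) {
  fs.writeFileSync(file, JSON.stringify(counts));
}

// Increase the count of every mentioned user id by one.
function addMentions(counts, ids) {
  for (const id of ids) {
    counts[id] = (counts[id] || 0) + 1;
  }
  return counts;
}
```

In the bot you would call loadCounts(COUNTS_FILE) once at startup to fill mention_count, then call addMentions(mention_count, message.mentions.users.keyArray()) followed by saveCounts(COUNTS_FILE, mention_count) inside the message handler.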
Edit: below you can see your code applied to mentions: every time there's a mention to count, it gets stored in the level value of the score.
client.on('message', message => {
if (!message.guild) return;
for (let id of message.mentions.users.keyArray()) if (id != message.author.id) {
let score = client.getScore.get(id, message.guild.id);
if (!score) score = {
id: `${message.guild.id}-${id}`,
user: id,
guild: message.guild.id,
points: 0,
level: 0
};
score.level++;
client.setScore.run(score);
}
});

How do we detect repeat visitor IP and create a condition in liquid?

I have a custom page, created with Shopify liquid -> https://shop.betterbody.co/pages/nazreen-joel-video-sales-letter-16-july
I have set the timer so that the sales elements appear after 3 seconds.
The question is: I would like to set an if/else condition to show these sales elements immediately for repeat visitors. There is a timer.js that sets the time for the sales elements to appear. If it's a new visitor, the timer will trigger; otherwise the timer should not be loaded. I can't seem to find any solution online. Do I detect the visitor's IP, or is there a better solution?
Here is the code inside theme.liquid:
{{ 'timer.js' | asset_url | script_tag }}
Timer.js code:
$(document).ready(function(){
setTimeout(function() {
$(".refference").css({paddingTop: "350px"});
// $("#early-cta, #guarentee, #payments, #info, #details").show();
$("#early-cta, #guarentee, #payments, #info, #details").fadeIn(3000);
}, 3000);
});
Please help.
You could look into localStorage to do this.
https://www.w3schools.com/htmL/html5_webstorage.asp
Localstorage is used to store data within the browser without any expiration date.
When a visitor visits the site for the first time, you could use localStorage to detect if the user has been to your site, if the user hasn’t, you run the timer, and set a variable that the user has visited.
Upon revisiting the site, you use localStorage and check against the variable to see if the user has been to your site or not, and trigger the script accordingly.
Expounding on @Jacob's answer and my comment, you can do what you need to do with JavaScript and localStorage.
So something like this to add:
function setVisited() {
// object to be added to localStorage
let obj = {
timestamp: new Date().getTime()
// add more stuff here if you need
// someValue: "foo",
// anotherValue: "bar"
}
// add object to localStorage
localStorage.setItem(
"hasVisited", // key that you will use to retrieve data
JSON.stringify(obj)
);
}
and something like this to retrieve:
function getVisited() {
return JSON.parse(localStorage.getItem("hasVisited"));
}
// returns: {timestamp: 1533398672593} or null
Also, as an additional condition to your event, you can choose to "expire" the user's localStorage value by checking the timestamp against the current timestamp and comparing it against a predefined expiration duration.
For example, if I wish to treat a visitor who has not returned within 7 days as a new visitor:
let expiration = 86400 * 1000 * 7; // 7 days in milliseconds
function isNewVisitor() {
// get current timestamp
let timeNow = new Date().getTime();
let expired = true;
// if getVisited doesn't return null..
if (getVisited()) {
let timeDiff = timeNow - getVisited().timestamp;
expired = timeDiff > expiration;
}
// if there is no stored visit, or the stored visit has expired, treat the visitor as new
if (!getVisited() || expired) {
return true;
} else {
return false;
}
}
So, to integrate with your current function, we will just check the condition before setting the timeout:
// let timeout be 3000 if the visitor is new
let timeout = isNewVisitor() ? 3000 : 0;
setTimeout(function() {
$(".refference").css({paddingTop: "350px"});
$("#early-cta, #guarentee, #payments, #info, #details").fadeIn(3000);
}, timeout);
// set the visited object for new visitors, update for old visitors
setVisited();
Check out the fiddle here: https://jsfiddle.net/fr9hjvc5/15/

Pagination in Google cloud endpoints + Datastore + Objectify

I want to return a List of "Posts" from an endpoint with optional pagination.
I need 100 results per query.
The code I have written is below; it doesn't seem to work.
I am referring to an example in the Objectify wiki.
Another option I know of is using query.offset(100);
But I read somewhere that this just loads the entire table and then ignores the first 100 entries, which is not optimal.
I guess this must be a common use case and an optimal solution will be available.
public CollectionResponse<Post> getPosts(@Nullable @Named("cursor") String cursor,User auth) throws OAuthRequestException {
if (auth!=null){
Query<Post> query = ofy().load().type(Post.class).filter("isReviewed", true).order("-timeStamp").limit(100);
if (cursor!=null){
query.startAt(Cursor.fromWebSafeString(cursor));
log.info("Cursor received :" + Cursor.fromWebSafeString(cursor));
} else {
log.info("Cursor received : null");
}
QueryResultIterator<Post> iterator = query.iterator();
for (int i = 1 ; i <=100 ; i++){
if (iterator.hasNext()) iterator.next();
else break;
}
log.info("Cursor generated :" + iterator.getCursor());
return CollectionResponse.<Post>builder().setItems(query.list()).setNextPageToken(iterator.getCursor().toWebSafeString()).build();
} else throw new OAuthRequestException("Login please.");
}
This is the code using offsets, which seems to work fine.
@ApiMethod(
name = "getPosts",
httpMethod = ApiMethod.HttpMethod.GET
)
public CollectionResponse<Post> getPosts(@Nullable @Named("offset") Integer offset,User auth) throws OAuthRequestException {
if (auth!=null){
if (offset==null) offset = 0;
Query<Post> query = ofy().load().type(Post.class).filter("isReviewed", true).order("-timeStamp").offset(offset).limit(LIMIT);
log.info("Offset received :" + offset);
log.info("Offset generated :" + (LIMIT+offset));
return CollectionResponse.<Post>builder().setItems(query.list()).setNextPageToken(String.valueOf(LIMIT + offset)).build();
} else throw new OAuthRequestException("Login please.");
}
Be sure to assign the result back to the query:
query = query.startAt(Cursor.fromWebSafeString(cursor));
Objectify's API uses a functional style. startAt() does not mutate the object; it returns a new query.
Try the following:
Remove your for loop -- not sure why it is there. But just iterate through your list and build out the list of items that you want to send back. You should stick to the iterator and not force it for 100 items in a loop.
Next, once you have iterated through it, use the iterator.getStartCursor() as the value of the cursor.

In Firebase, is there a way to get the number of children of a node without loading all the node data?

You can get the child count via
firebase_node.once('value', function(snapshot) { alert('Count: ' + snapshot.numChildren()); });
But I believe this fetches the entire sub-tree of that node from the server. For huge lists, that seems RAM and latency intensive. Is there a way of getting the count (and/or a list of child names) without fetching the whole thing?
The code snippet you gave does indeed load the entire set of data and then counts it client-side, which can be very slow for large amounts of data.
Firebase doesn't currently have a way to count children without loading data, but we do plan to add it.
For now, one solution would be to maintain a counter of the number of children and update it every time you add a new child. You could use a transaction to count items, like in this code tracking upvotes:
var upvotesRef = new Firebase('https://docs-examples.firebaseio.com/android/saving-data/fireblog/posts/-JRHTHaIs-jNPLXOQivY/upvotes');
upvotesRef.transaction(function (current_value) {
return (current_value || 0) + 1;
});
For more info, see https://www.firebase.com/docs/transactions.html
UPDATE:
Firebase recently released Cloud Functions. With Cloud Functions, you don't need to create your own server. You can simply write JavaScript functions and upload them to Firebase. Firebase will be responsible for triggering the functions whenever an event occurs.
If you want to count upvotes for example, you should create a structure similar to this one:
{
"posts" : {
"-JRHTHaIs-jNPLXOQivY" : {
"upvotes_count":5,
"upvotes" : {
"userX" : true,
"userY" : true,
"userZ" : true,
...
}
}
}
}
And then write a javascript function to increase the upvotes_count when there is a new write to the upvotes node.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);
exports.countlikes = functions.database.ref('/posts/{postid}/upvotes').onWrite(event => {
return event.data.ref.parent.child('upvotes_count').set(event.data.numChildren());
});
You can read the Documentation to know how to Get Started with Cloud Functions.
Also, another example of counting posts is here:
https://github.com/firebase/functions-samples/blob/master/child-count/functions/index.js
Update January 2018
The firebase docs have changed so instead of event we now have change and context.
The given example throws an error complaining that event.data is undefined. This pattern seems to work better:
exports.countPrescriptions = functions.database.ref(`/prescriptions`).onWrite((change, context) => {
const data = change.after.val();
const count = Object.keys(data).length;
return change.after.ref.child('_count').set(count);
});
This is a little late in the game as several others have already answered nicely, but I'll share how I might implement it.
This hinges on the fact that the Firebase REST API offers a shallow=true parameter.
Assume you have a post object and each one can have a number of comments:
{
"posts": {
"$postKey": {
"comments": {
...
}
}
}
}
You obviously don't want to fetch all of the comments, just the number of comments.
Assuming you have the key for a post, you can send a GET request to
https://yourapp.firebaseio.com/posts/[the post key]/comments.json?shallow=true
This will return an object of key-value pairs, where each key is the key of a comment and its value is true:
{
"comment1key": true,
"comment2key": true,
...,
"comment9999key": true
}
The size of this response is much smaller than requesting the equivalent data, and now you can calculate the number of keys in the response to find your value (e.g. commentCount = Object.keys(result).length).
This may not completely solve your problem, as you are still calculating the number of keys returned, and you can't necessarily subscribe to the value as it changes, but it does greatly reduce the size of the returned data without requiring any changes to your schema.
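A minimal sketch of the counting step could look like this. It assumes a runtime with a global fetch (modern browsers or Node 18+); baseUrl and postKey are placeholders, and the countKeys and fetchCommentCount helper names are made up for this example.

```javascript
// Counts the keys of a shallow response such as {"comment1key": true, ...}.
// A node with no children comes back as null, so treat that as zero.
function countKeys(shallowResponse) {
  return shallowResponse ? Object.keys(shallowResponse).length : 0;
}

// baseUrl and postKey are placeholders; the REST API expects the .json suffix.
async function fetchCommentCount(baseUrl, postKey) {
  const url = `${baseUrl}/posts/${encodeURIComponent(postKey)}/comments.json?shallow=true`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return countKeys(await response.json());
}
```

For example, fetchCommentCount('https://yourapp.firebaseio.com', somePostKey) would resolve to the number of comment keys, provided the database rules allow the read.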
Save the count as you go, and use validation to enforce it. I hacked this together for keeping a count of unique votes, a problem which keeps coming up. This time I have tested my suggestion (notwithstanding cut/paste errors!).
The 'trick' here is to use the node priority as the vote count...
The data is:
vote/$issueBeingVotedOn/user/$uniqueIdOfVoter = thisVotesCount, priority=thisVotesCount
vote/$issueBeingVotedOn/count = 'user/'+$idOfLastVoter, priority=CountofLastVote
,"vote": {
".read" : true
,".write" : true
,"$issue" : {
"user" : {
"$user" : {
".validate" : "!data.exists() &&
newData.val()==data.parent().parent().child('count').getPriority()+1 &&
newData.val()==newData.getPriority()"
// user can only vote once && count must be one higher than current count && data value must be the same as priority.
}
}
,"count" : {
".validate" : "data.parent().child(newData.val()).val()==newData.getPriority() &&
newData.getPriority()==data.getPriority()+1 "
}
// count (last voter really): the vote must exist and its count must equal the new count, && the new count (priority) can only go up by one.
}
}
Test script to add 10 votes by different users (for this example the ids are faked; use auth.uid in production). Count down (i--) by 10 to see validation fail.
<script src='https://cdn.firebase.com/v0/firebase.js'></script>
<script>
window.fb = new Firebase('https:...vote/iss1/');
window.fb.child('count').once('value', function (dss) {
votes = dss.getPriority();
for (var i=1;i<=10;i++) vote(dss,i+votes);
} );
function vote(dss,count)
{
var user='user/zz' + count; // replace with auth.id or whatever
window.fb.child(user).setWithPriority(count,count);
window.fb.child('count').setWithPriority(user,count);
}
</script>
The 'risk' here is that a vote is cast but the count is not updated (hacking or script failure). This is why the votes have a unique 'priority': the script should really start by ensuring that there is no vote with a priority higher than the current count, and if there is, it should complete that transaction before doing its own. Get your clients to clean up for you :)
The count needs to be initialised with a priority before you start - forge doesn't let you do this, so a stub script is needed (before the validation is active!).
Write a Cloud Function to update the node count.
// The function below stores the number of children of the given node.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);
exports.userscount = functions.database.ref('/users/')
.onWrite(event => {
console.log('users number : ', event.data.numChildren());
return event.data.ref.parent.child('count/users').set(event.data.numChildren());
});
Refer to: https://firebase.google.com/docs/functions/database-events
root--|
|-users ( this node contains all users list)
|
|-count
|-userscount :
(this node added dynamically by cloud function with the user count)
