Page metadata in keen.io - analytics

I have a question around best practices for attaching metadata to our keen.io pageview events. Internally we use 3 different keyword categories to identify a piece of content, and those keywords live in <meta> tags on every page. A good example would be something like this:
<meta name="namespace:tier1" content="Programming" />
<meta name="namespace:tier2" content="Web Development, Web Operations" />
<meta name="namespace:tier3" content="JavaScript, Analytics, jQuery, HTML, CSS" />
We want to be able to segment our users based on those tiers, and do queries like this:
See all traffic segmented by tier1 keywords
See the most popular tier2 keywords that belong to a specific tier1 keyword
... and so on.
Here's my question: It seems like we could just send this metadata along with the pageview event, but we'll end up with a lot of redundant data that could live in a separate place. For example, if we scraped the keywords every day for our pages, we could index them by URL and not have all that duplicate metadata in keen.io.
How would you approach this? Am I stuck in SQL land, and should I just not worry about the duplicate data?
A related question: our keywords are basically lists, and the keen.io documentation says we should stay away from lists. Would I need to create a Metadata event for every single keyword, then? It seems like overkill to send 10+ requests on every pageview.

Short answer – don’t worry about the duplication. When it comes to event data, denormalization is your friend. Keen's query interface is at its most powerful when each event contains a lot of properties – effectively the state of the world at that time.
Michelle wrote a guide to thinking about event data that contrasts it with relational data. Many of us (including me) have been stuck in SQL land before and have found this guide to be helpful :)
As far as lists - it's mostly lists of objects that you want to avoid. In this case your list is one of strings, so you can still do a fair amount of querying against that property.
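By way of illustration, here is a minimal sketch of what a denormalized pageview event could look like using Keen's Java SDK. The builder/addEvent calls reflect that SDK as I know it, but treat the setup details (and the example URL) as assumptions and check the SDK docs:

import io.keen.client.java.JavaKeenClientBuilder;
import io.keen.client.java.KeenClient;
import io.keen.client.java.KeenProject;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PageviewTracker {
    public static void main(String[] args) {
        // Client setup per my reading of the Keen Java SDK; verify against its docs.
        KeenClient client = new JavaKeenClientBuilder().build();
        client.setDefaultProject(new KeenProject("PROJECT_ID", "WRITE_KEY", "READ_KEY"));

        // Denormalize: every pageview carries its own copy of the tier keywords,
        // as lists of strings (not lists of objects).
        Map<String, Object> event = new HashMap<>();
        event.put("url", "http://example.com/articles/some-article"); // hypothetical page
        event.put("tier1", Arrays.asList("Programming"));
        event.put("tier2", Arrays.asList("Web Development", "Web Operations"));
        event.put("tier3", Arrays.asList("JavaScript", "Analytics", "jQuery", "HTML", "CSS"));

        client.addEvent("pageviews", event);
    }
}

A count query grouped by the tier1 property then answers "all traffic segmented by tier1 keywords" directly, with no join against a separate metadata store.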
For more information about Keen & lists of objects check out this SO question: Nested JSON Objects In Keen IO.

Related

Portlet events: sending an array of objects

We have multiple projects with multiple portlets and need to send an array of objects between them.
Our situation:
One of the portlets is like a "Master-portlet": it will be responsible for all the REST calls and will consume the JSON data and parse it into Java objects.
All the other portlets will receive an array of objects and show them to the user.
Our thoughts and solution:
We wanted to implement this by sending arrays of objects through events. One of the "smaller" portlets will send an event to the "Master-portlet", and the "Master-portlet" will then answer with a new event, sending the right array of objects back.
Our problem:
We don't know how to send arrays of objects through events. Is this even possible?
We are also not sure if this is the right way to solve this. Are events meant to send larger amounts of data?
Is there a better solution for our case? Maybe it would be better to implement a database and have all the portlets get the information from there?
Consider portlet events (and portlets) the UI layer of your application. Based on this, judge whether the amount of data that you send back and forth makes sense or not. Also, if you closely couple the portlets, you're just hiding the fact that they can only function together - at the least a questionable idea. Rather, you want them to react to common circumstances (events), but not rely on a specific source of events (a master portlet) being available.
That being said: the more complex the data you send as the payload of a JSR-286 event, the more easily you run into classloading problems when your portlets are in different web applications. If you restrict yourself to native Java types (e.g. String, Map, etc.) you will avoid classloader problems.
Typically you want to communicate changes to the current context (e.g. a new "current customer" has been selected - send an identifier), but not all of the associated data (e.g. the new customer's name and order history). The rest of the data typically comes through the business layer anyway.
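As a rough sketch of that pattern (the event name, namespace and portlet classes below are invented for illustration, and the event must also be declared in portlet.xml), the publishing portlet sends only an identifier and the receiving portlet fetches the rest itself:

import javax.portlet.ActionRequest;
import javax.portlet.ActionResponse;
import javax.portlet.EventRequest;
import javax.portlet.EventResponse;
import javax.portlet.GenericPortlet;
import javax.portlet.ProcessEvent;
import javax.xml.namespace.QName;

// CustomerListPortlet.java
public class CustomerListPortlet extends GenericPortlet {
    private static final QName CURRENT_CUSTOMER =
            new QName("http://example.com/events", "currentCustomer");

    @Override
    public void processAction(ActionRequest request, ActionResponse response) {
        // Publish only the identifier - a String payload is classloader-safe
        // even when the portlets live in different web applications.
        response.setEvent(CURRENT_CUSTOMER, request.getParameter("customerId"));
    }
}

// OrderHistoryPortlet.java
class OrderHistoryPortlet extends GenericPortlet {
    @ProcessEvent(qname = "{http://example.com/events}currentCustomer")
    public void onCurrentCustomer(EventRequest request, EventResponse response) {
        String customerId = (String) request.getEvent().getValue();
        // Load the order history from the business layer; the event carries
        // only the context change, not the data itself.
        response.setRenderParameter("customerId", customerId);
    }
}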
That's not to say that you absolutely must not couple your portlets - just that my preference is to rather have them very loosely coupled, so that I can add individual small portlets that replace those that I thought of yesterday.
If you have some time: I covered a bit of this in a webinar last year; I hope it adds some clarification where I was too vague in this quick answer.

Seamless Integration with REST API

Many examples on the net show you how to use ng-repeat with in-memory data, but in my case I have a long table with infinite scroll that gets data by sending requests to a REST API (scroll down - fetch some data, scroll down again - fetch some more data, etc.). It works, but I'm wondering how I can integrate that with filters.
Right now I have to call a specific method of the API service that makes a request based on the text in the "search" input box, and then the controller updates $scope.data.
Is it possible to build a custom filter that would do that? Then my view would be utterly decoupled from the service, and I could declaratively tell it how to group, order and filter data, regardless of whether it's in memory or comes from a remote server - a server that can serve only a limited number of records at a time.
I'm also going to need grouping and ordering later. I'm tempted to download the entire dataset and lock the parts of the app responsible for grouping, searching and ordering (until all data is on the client), but:
a) that dataset is huge (hundreds of thousands of records)
b) nobody wants to deal with cache invalidation headaches
c) doing so feels so damn wrong, you don't really expect me to 'keep' all that data in-memory, right?
Can you guys point me to some open-source examples where I can steal ideas from?
Basically I need to build a service and filters that let me work with my "pageable" data coming from the API as if it were in-memory data.
Regardless of how you choose to solve it (there are many ways to infinite-scroll with Angular; here is one: http://binarymuse.github.io/ngInfiniteScroll/), as of its latest beta version ng-repeat works really badly with large amounts of data - and so do filters. The reason is obvious: polling so much information for changes is a tough job. Moreover, ng-repeat by default will re-draw your complete list every time something changes.
There are many solutions you can explore in this area, here are the ones I found productive:
http://kamilkp.github.io/angular-vs-repeat/#?tab=8
http://www.williambrownstreet.net/blog/2013/07/angularjs-my-solution-to-the-ng-repeat-performance-problem/
https://github.com/allaud/quick-ng-repeat
You should also consider the following, which really helps with large amounts of data.
https://github.com/Pasvaz/bindonce
Updated
I guess you can't really control your server's output; filtering and ordering large amounts of data are better done on the server side.
I was pointing out the links above because, even though writing your own filters (and order-bys) is quite simple to do - http://jsfiddle.net/gdefpfqL/ - (filter by a company name, then click the "Add More" button to add more items), ordering is virtually impossible if you can't control the data coming from the server; the only option is getting it all, ordering it, and then lazy-loading from the client's memory. So if each of your list items doesn't have many bindings by itself (as in the example I've added) and the list item is a fairly simple one (for instance, you simply present the result as plain text in <li>{{item.name}}</li>), then Angular's ng-repeat might work for you. In this case, filters will work as expected - say you filter by the searched text:
<li ng-repeat="item in items | filter:searchedText"></li>
Even for new items added after the user has searched, it will still work, thanks to the magic of binding.

Game player object design

I'm supporting a server for an online card game, and while thinking about refactoring it into a better state I have found myself unable to decide on a proper object model for my needs.
I have a Player class which has a lot of attributes. The first problem is just that - the class is too big. The second problem is that I don't know how to refactor it. I will list some of the attributes and issues with these.
Some attributes are very tightly bound to a player: nick, email, last login &c. These, I suppose, are to be kept directly in the player class and in the same table in the DB.
Now, some attributes are a little more difficult, like the money and gold amounts. The problem with these is that they are historically stored in a different table, there may be more currencies later on, and they MUST be synced to the database at their own pace.
A third category of attributes is loosely coupled to the player: status string, experience, achievements, statistics &c. These are stored in different tables in the DB and MUST be stored, retrieved, cached and synchronized at their own pace.
Note that one of the big problems here is that we have to implement relatively complex database synchronization schemes, because we have a lot of online players, our game is soft-realtime, and we have to keep the load on the DB as low as possible.
My questions are:
How to determine which attributes to store within a player class and which not to? Say, experience, nickname, money amount?
When one has attributes that may be grouped together, like (strength, agility, endurance, &c.) or (handItem, headItem, feetItem, weapon), when should they be grouped and when not?
What to do with complex database synchronization schemes? Make a separate model for each attribute that needs to be synced independently, or make some DataManager classes to take them apart and work with them?
What to do with the need for a class to have several different "data representations" for external consumers? Like XML, JSON, another XML for some external service, a human-readable string, &c.
I'm sorry if my questions are bogus; I'm not really good at OOP design, I'm more of an FP guy. And my English is not very good =).
There is no "limit" to what you can store in a player class. As long as it concerns him and him only, it should be in his class. But one thing you should consider is making several player classes. The idea is: if you don't need it, don't query it. You may have PlayerView_Small, PlayerBuying, PlayerFighting, PlayerSettings (depending on your game, they may not fulfil the exact same purpose)... This way, for each "need" for info on a player, you only load the player data you need and can handle it properly. Also, you may use inheritance if some class is only a more detailed version of another.
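A rough sketch of that split, using the class names from this answer (each class would live in its own file; the fields are only illustrative guesses):

import java.util.Date;

// Cheap to load: list screens, lobbies. Maps only to the columns it needs.
class PlayerView_Small {
    String nick;
    Date lastLogin;
}

// Money and gold live in their own table and sync at their own pace.
class PlayerBuying {
    long money;
    long gold;
}

// Loaded only when a match starts.
class PlayerFighting {
    PlayerAttributes attributes;
}

// The grouped stats from question 2, kept together as one small class.
class PlayerAttributes {
    int strength, agility, endurance;
}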
If you are talking about the grouped attributes, they may live in a sub-class PlayerAttributes, an instance of which is contained in PlayerFighting and PlayerView_Detailed. In the database, it might be interesting to store it as a string (conveniently output by the class, and accepted in its constructor) to avoid having too many fields, but you will lose the ability to sort on them. That's probably not a problem in your case, but it might be in some others.
Blank for now; I don't understand where the synchronization comes in, and will edit when informed.
In your PlayerViewDetailInfo (or your PlayerAllData, depending on what you need), you place methods such as ToXmlClient1(), ToJson(), ToHumanReadableString() (although that might be a bit confusing to the eye; you should consider HTML^^). The class having the method should be the class with the least (but still sufficient to provide the answer) data. When requested, you load the Player... which has the method giving the correct output, and you write it directly in the response.
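Sketched in code, with the hypothetical names from this answer (naive string escaping, just to show the shape):

public class PlayerDetailView {
    private final String nick;
    private final int experience;

    public PlayerDetailView(String nick, int experience) {
        this.nick = nick;
        this.experience = experience;
    }

    // One output method per consumer; the class holds only the data
    // sufficient to produce these representations.
    public String toJson() {
        return "{\"nick\":\"" + nick + "\",\"experience\":" + experience + "}";
    }

    public String toXmlClient1() {
        return "<player nick=\"" + nick + "\" experience=\"" + experience + "\"/>";
    }

    public String toHumanReadableString() {
        return nick + " (" + experience + " XP)";
    }
}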

Self Tracking Entities Traffic Optimization

I'm working on a personal project using WPF with Entity Framework and Self-Tracking Entities. I have a WCF web service which exposes some methods for the CRUD operations. Today I decided to run some tests to see what actually travels over this service, and even though I expected something like this, I got really disappointed. The problem is that for a simple update (or delete) operation on just one object - let's say a Category - I send the whole object graph to the server, including all of its parent categories, their items, child categories and their items, etc. In my case it was a 170 KB XML file on a really small database (2 main categories, about 20 categories total, and about 60 items). I can't imagine what will happen with a really big database.
I tried to google for some articles concerning traffic optimization with STE, but with no success, so I decided to ask here if somebody has done something similar, knows some good practices, etc.
One of the possible ways I came up with is to get the data I need per object with more service calls:
return context.Categories.ToList();//only the categories
...
return context.Items.ToList();//only the items
Instead of:
return context.Categories.Include("Items").ToList();
This way the categories and the items will be separated, and when making changes or deleting some objects, less data will be sent over the wire.
Have any of you faced a similar problem, and if so, how did you solve it?
We've encountered similar challenges. The first step, as you already mentioned, is to keep the entities as small as possible (as dictated by the desired client functionality). Second, when sending entities back over the wire to be persisted: strip all navigation properties (nested objects) that haven't changed. This sounds very simple but is not at all trivial. What we do is recursively dig into the entities present in the trackable collections of, say, the "topmost" entity (and their trackable collections, and theirs, and...) and remove them when their change-tracking state is "Unchanged". But be careful with this, because in some cases you still need these entities: they may have been removed from or added to trackable collections of their parent entity (and then you shouldn't remove them).
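The question is about .NET's self-tracking entities, but the pruning recursion itself is language-neutral; here is a hypothetical sketch of the idea (all names invented):

import java.util.ArrayList;
import java.util.List;

public class TrackedEntity {
    enum ChangeState { UNCHANGED, MODIFIED, ADDED, DELETED }

    ChangeState state = ChangeState.UNCHANGED;
    List<TrackedEntity> children = new ArrayList<>(); // navigation properties

    // Drop every child whose entire subtree is Unchanged. Added or Deleted
    // children must survive even if their own fields look untouched, which
    // is the caveat mentioned above.
    static void strip(TrackedEntity entity) {
        entity.children.removeIf(TrackedEntity::isEntirelyUnchanged);
        entity.children.forEach(TrackedEntity::strip);
    }

    static boolean isEntirelyUnchanged(TrackedEntity e) {
        return e.state == ChangeState.UNCHANGED
                && e.children.stream().allMatch(TrackedEntity::isEntirelyUnchanged);
    }
}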
This, what we call "StripEntity", is also mentioned (though without any code sample) in Julie Lerman's Programming Entity Framework.
And although it might not be as efficient as a more purist approach, the use of STEs saves a lot of code for queries against the database. We don't need optimal performance in a high-traffic situation, so STEs suit our needs and take away a lot of the code needed to communicate with the database. You have to decide what the "best" solution is for your situation. Good luck!
You can find an Entity Framework project item at http://selftrackingentity.codeplex.com/. With version 0.9.8, I added a method called GetObjectGraphChanges() that returns an optimized entity object graph with only objects that have changes.
Also, there are two helper methods: EstimateObjectGraphSize() and EstimateObjectGraphChangeSize(). The first returns the estimated size of the whole entity object along with its object graph; the latter returns the estimated size of the optimized entity object graph with only the objects that have changes. With these two helper methods, you can decide whether it makes sense to call GetObjectGraphChanges() or not.

How to model Data Transfer Objects for different front ends?

I've run into a recurring problem for which I haven't found any good examples or patterns.
I have one core service that performs all the heavy database operations and sends results to different front ends (HTML, Silverlight/Flash, web services, etc.).
One of the service operations is "GetDocuments", which provides a list of documents based on different filter criteria. If I only had one front end, I would package the result in a list of Document DTOs (Data Transfer Objects) that just contain the data. However, different front ends need different amounts of "metadata". The simplest client just needs the document headline and a link reference. Other clients want a short text snippet of the document, another also wants a thumbnail, and a third wants the name of the author. It's basically all up to the GUI implementation what needs to be displayed.
What's the best way to model this?
- As a lot of different DTOs (Document, DocumentWithThumbnail, DocumentWithTextSnippet) - tends to become a lot of classes.
- As one DTO containing all the data, where the client chooses what to display - lots of unnecessary data sent.
- As one DTO where certain fields are populated based on what the client requested - tends to become a very large class that needs to be extended over time.
- As one DTO with some kind of generic "Metadata" field containing the requested metadata.
Or are there other options?
Since I want a high performance service, I need to think about both network load and caching strategies.
Does anyone have any good patterns or practices that might help me?
What I would do is give the front end the ability to request exactly the metadata it wants (say, getDocument(WITH_THUMBNAILS | WITH_TEXT_SNIPPET)).
The DTO is then built with only the requested information.
Adding all the possible metadata is, as you said, unacceptable.
I would certainly stay with one class defining all the possible methods (getTitle(), getThumbnail()), and if possible it would return a placeholder when the thumbnail was not requested. Something like "Image not available".
If you want to model this like a pattern, take a look at the factory patterns.
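A minimal sketch of that flag-plus-factory idea (all names are hypothetical, and Document stands in for your domain entity with the obvious getters):

// DocumentDto.java - one class defines every possible field.
public class DocumentDto {
    public final String title;
    public final String link;
    public String thumbnailUrl;
    public String snippet;
    public String author;

    public DocumentDto(String title, String link) {
        this.title = title;
        this.link = link;
    }
}

// DocumentDtoFactory.java - builds the DTO with only the requested metadata.
public class DocumentDtoFactory {
    public static final int WITH_THUMBNAIL    = 1 << 0;
    public static final int WITH_TEXT_SNIPPET = 1 << 1;
    public static final int WITH_AUTHOR       = 1 << 2;

    public static DocumentDto build(Document doc, int flags) {
        DocumentDto dto = new DocumentDto(doc.getTitle(), doc.getLink());
        // Placeholder when the thumbnail was not requested, as suggested above.
        dto.thumbnailUrl = (flags & WITH_THUMBNAIL) != 0
                ? doc.getThumbnailUrl()
                : "image-not-available.png";
        if ((flags & WITH_TEXT_SNIPPET) != 0) dto.snippet = doc.getSnippet();
        if ((flags & WITH_AUTHOR) != 0) dto.author = doc.getAuthorName();
        return dto;
    }
}

A call like DocumentDtoFactory.build(doc, WITH_THUMBNAIL | WITH_TEXT_SNIPPET) then mirrors the getDocument(...) example above.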
Hope this helps you.
Is there any noticeable cost to creating a DTO that has all the data any of your views could need and using it everywhere? I would do that, especially since it insulates you from a requirement change down the line where one of the views needs to incorporate data one of the other views already uses.
For example, maybe your Silverlight/Flash view doesn't show the title itself because it's in the thumbnail now, but later they decide they want to sort by it.
To clarify, I don't necessarily think you need to pass down all of the data every time, but I think your DTO class should define all of it. Just don't fall into the pits of premature optimization or analysis paralysis. Do the simplest thing first, then justify added complexity. Throw it all in and profile it. If the performance is unacceptable, optimize and try again.
