I'm trying to find the best solution (both performant and accurate) for keeping a static list of objects in a web service.
Some web methods will make amendments to these objects and return the state of the object after the amendment, while others will request the current state.
This needs to be accurate at every operation, as it's money related. This web service will be bombarded with requests from different areas of our large-scale project.
I've been looking at ConcurrentDictionary, and while reading some other SO questions I came across the following answer: https://stackoverflow.com/a/1966462/151625
The following paragraph describes exactly the situation I want to avoid:
Now consider this. In the store with one clerk, what if you get all the way to the front of the line, and ask the clerk "Do you have any toilet paper", and he says "Yes", and then you go "Ok, I'll get back to you when I know how much I need", then by the time you're back at the front of the line, the store can of course be sold out. This scenario is not prevented by a threadsafe collection.
So essentially I'm asking: should I lock values within a ConcurrentDictionary, or does that defeat its whole purpose? If I should/can, what is the best way to do it? If not, what alternative do I have?
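To illustrate the kind of operation I mean, here's a minimal sketch of an amendment kept atomic with AddOrUpdate (the account/balance naming is made up for illustration):

    using System.Collections.Concurrent;

    class AmendmentDemo
    {
        static readonly ConcurrentDictionary<string, decimal> balances =
            new ConcurrentDictionary<string, decimal>();

        // The whole read-modify-write happens inside AddOrUpdate, so no other
        // thread can slip in between the read and the write (the "toilet paper"
        // race in the quoted analogy).
        static decimal ApplyAmendment(string accountId, decimal delta)
        {
            return balances.AddOrUpdate(accountId, delta, (key, current) => current + delta);
        }
    }

The part I'm unsure about is when an amendment spans several keys or has to pair with external work; then AddOrUpdate alone can't help, and an explicit lock around the whole unit of work seems to be the only honest answer, since a thread-safe collection only makes individual operations atomic.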
I have a skill that elicits a U.S. state and county from the user and then retrieves some data. The backend is working fine, but I am concerned about how to structure the conversation. So far, I have created an intent called GetInfoIntent, which has two custom slots: state_name and county_name.
There are about 3,000 U.S. counties, with many duplicate names. It seems silly to me that I am asking for a county without first narrowing down by state. Another way I can think of to structure the conversation is to have 50 intents: "GetNewHampshireInfo", "GetCaliforniaInfo", etc. If I did it this way, I'd need a custom slot type for each state, like nh_counties, ca_counties, etc.
This must be a pretty generic problem. Is there a standard approach, or best practice, I can use?
My (not necessarily best practice) practice tips:
Use a single slot per data type. That means having only one slot for a four-digit number, even if you use it in more than one place for two different purposes in the skill.
Use as few intents as you need: no more, no less. You certainly can and should break up the back-end code with helper code, but try not to break the intents into too many smaller pieces. That can make it difficult for Alexa to choose the intended intent.
Keep it voice-focused. How would you ask in a conversation? Voice-first development is always the way to go.
For the slot filling, I think it is fine to ask for both state and county.
If the match is not correct, ask for confirmation.
Another option is to not use automatic slot filling within the Alexa skill and to use the dialog interface instead. Ask for the county first, and only when it matches counties in more than one state (i.e. it is ambiguous) continue the dialog to fill the state.
Even if you did have 50 separate intents, you really never want two slots that can be filled by the same word. For example, having an mo_counties slot and a ky_counties slot that "Clark" satisfies is ambiguous and can cause unneeded difficulty.
So for someone looking for the "best practice": I have learned that there isn't one yet (and maybe never will be). Do what makes sense for the conversation, keep it as simple as it needs to be, and put the complexity in the back end.
I also find it helpful to have a non-developer test your conversation flow.
This wasn't really technical and is all opinion, but that is a lot of what Alexa development is. I would suggest the Tuesday Alexa office hours at https://www.twitch.tv/amazonalexa: they are very helpful, and you can ask questions about things like this.
We have multiple projects with multiple portlets and need to send an array of objects between them.
Our situation:
One of the portlets is like a "Master-portlet": it will be responsible for all the REST calls, consuming JSON data and parsing it into Java objects.
All the other portlets will receive an array of objects and show them to the user.
Our thoughts and solution:
We wanted to implement this by sending arrays of objects through events. One of the "smaller" portlets will send an event to the "Master-portlet", and the "Master-portlet" will then answer with a new event carrying the right array of objects.
Our problem:
We don't know how to send arrays of objects through events. Is this even possible?
We are also not sure whether this is the right way to solve this. Are events meant to carry larger amounts of data?
Is there a better solution for our case? Maybe it would be better to use a database and have all the portlets get the information from there?
Consider portlet events (and portlets) the UI layer of your application. Based on this, judge if the amount of data that you send back and forth makes sense or not. Also, if you closely couple the portlets, you're just hiding the fact that they can only function together - at least a questionable idea. You rather want them to react to common circumstances (events), but not rely on a specific source of events (master portlet) being available.
That being said: the more complex the data you send as the payload of a JSR-286 event, the more easily you run into classloading problems when your portlets live in different web applications. If you restrict yourself to native Java types (e.g. String, Map, etc.) you will avoid those classloader problems.
Typically you want to communicate changes to the current context (e.g. new "current customer" selected - and an identifier) but not all of the particular data (e.g. the new customer's name and order history). The rest of the data typically comes through the business layer anyway.
That's not to say that you absolutely must not couple your portlets - just that my preference is to rather have them very loosely coupled, so that I can add individual small portlets that replace those that I thought of yesterday.
If you have some time: I covered a bit of this in a webinar last year; I hope it adds some clarification where I was too vague in this quick answer.
Abstract: Is MapReduce a good idea for processing (mutating) a collection of data from the database, as opposed to computing an answer to a somewhat complex (or just big) question?
I would like to sync a set of syndication sources (e.g. URLs like http://xkcd.com/rss.xml ), which are stored in GAE's datastore as a collection/table. I see two options; one is straightforward: make simple tasks that you put in a queue, where each task handles 100 or 1000 or whatever natural number of sources seems to fit in a task. The other option is MapReduce.
In the latter case, the map does everything and the reduce does nothing. Moreover, the map has no result; it just alters the 'state' (of the datastore).
    @Override
    public void map(Entity entity) {
        String url = (String) entity.getProperty("url");
        // One source fans out into many posts; nothing is emitted for reduce.
        for (Post p : www.fetchPostsFromFeed(url)) {
            p.save();
        }
    }
As you can see, one source can map to many posts, so my map might as well be called "Explode".
So there are no emits and nothing for reduce to do. The reason I like this map approach is that I tell Google: here, take my collection/table, split it however you see fit across different mappers, and then store the posts wherever you like. The datastore uses 'high replication', so availability of the data is high anyway, and choosing the 'best' computational unit for a given entity doesn't really reduce network communication. The same goes for saving the posts, as they need to go to all datastore units. What I do like is that MapReduce has some fault recovery for map computations that get stuck, and that it knows how many tasks to send to which node, instead of my queueing some number of entities somewhere and hoping it makes sense.
Maybe my way of thinking here is wrong, in which case, please correct me. Anyhow, is this approach 'wrong' because the reduce does nothing and the map is really an 'explode'?
Nope, the map pretty much does the same as your manual enqueuing of tasks.
I'm working on a personal project using WPF with Entity Framework and Self-Tracking Entities. I have a WCF web service which exposes some methods for the CRUD operations. Today I decided to run some tests and see what actually travels over this service, and even though I expected something like this, I got really disappointed. The problem is that for a simple update (or delete) operation on just one object, let's say a Category, I send the whole object graph to the server, including all of its parent categories, their items, child categories and their items, etc. In my case it was a 170 KB XML file on a really small database (2 main categories, about 20 categories in total, and about 60 items). I can't imagine what will happen with a really big database.
I tried to google for some articles concerning traffic optimization with STE, but with no success, so I decided to ask here if somebody has done something similar, knows some good practices, etc.
One of the possible ways I came up with is to get the data I need per object type with separate service calls:
return context.Categories.ToList(); // only the categories
...
return context.Items.ToList(); // only the items
Instead of:
return context.Categories.Include("Items").ToList();
This way the categories and the items are separated, and when changing or deleting objects, less data is sent over the wire.
Have any of you faced a similar problem, and if so, how did you solve it?
We've encountered similar challenges. The first thing, as you already mentioned, is to keep the entities as small as possible (as dictated by the desired client functionality). The second, when sending entities back over the wire to be persisted, is to strip all navigation properties (nested objects) that haven't changed. This sounds very simple but is not at all trivial. What we do is recursively dig into the entities present in the trackable collections of, say, the "topmost" entity (and their trackable collections, and theirs, and...) and remove them when their change-tracking state is "Unchanged". But be careful with this, because in some cases you still need these entities: when they have been removed from or added to the trackable collections of their parent entity, you shouldn't strip them.
This, what we call "StripEntity", is also mentioned (though without any code sample) in Julie Lerman's Programming Entity Framework.
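To make the recursion concrete, here is a minimal sketch using a toy entity model; the Node type, ObjectState enum, and method names are all made up for illustration and are not the real STE-generated types (which, as noted above, need extra care around added/removed entries):

    using System.Collections.Generic;
    using System.Linq;

    enum ObjectState { Unchanged, Added, Modified, Deleted }

    // Toy stand-in for a self-tracking entity with one trackable collection.
    class Node
    {
        public ObjectState State = ObjectState.Unchanged;
        public List<Node> Children = new List<Node>();
    }

    static class Stripper
    {
        // Remove children whose whole subtree is unchanged, so they don't
        // travel over the wire; anything Added/Modified/Deleted (or with such
        // descendants) is kept.
        public static void Strip(Node entity)
        {
            entity.Children.RemoveAll(IsSubtreeUnchanged);
            foreach (var child in entity.Children)
                Strip(child);
        }

        static bool IsSubtreeUnchanged(Node n)
        {
            return n.State == ObjectState.Unchanged && n.Children.All(IsSubtreeUnchanged);
        }
    }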
And although it might not be as efficient as a more purist approach, using STEs saves a lot of code for queries against the database. We are not in need of optimal performance in a high-traffic situation, so STEs suit our needs and take away a lot of the code needed to communicate with the database. You have to decide what the "best" solution is for your situation. Good luck!
You can find an Entity Framework project item at http://selftrackingentity.codeplex.com/. With version 0.9.8, I added a method called GetObjectGraphChanges() that returns an optimized entity object graph with only objects that have changes.
Also, there are two helper methods: EstimateObjectGraphSize() and EstimateObjectGraphChangeSize(). The first returns the estimated size of the whole entity object along with its object graph; the latter returns the estimated size of the optimized entity object graph with only the objects that have changes. With these two helper methods, you can decide whether it makes sense to call GetObjectGraphChanges() or not.
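A hypothetical usage sketch; I'm assuming here that these are exposed as methods on the entity itself, as the names suggest (check the project for the exact shape), and service.UpdateCategory is a made-up service call:

    // Only send the optimized graph when it is actually smaller.
    if (category.EstimateObjectGraphChangeSize() < category.EstimateObjectGraphSize())
    {
        service.UpdateCategory(category.GetObjectGraphChanges());
    }
    else
    {
        service.UpdateCategory(category); // little to gain from optimizing
    }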
A friend of mine is beginning to build a NetHack bot (a bot that plays the Roguelike game: NetHack). There is a very good working bot for the similar game Angband, but it works partially because of the ease in going back to the town and always being able to scum low levels to gain items.
In NetHack, the problem is much more difficult, because the game rewards ballsy experimentation and is built basically as 1,000 edge cases.
Recently I suggested using some kind of naive Bayesian analysis, in much the same way spam is filtered.
Basically the bot would at first build a corpus by trying every possible action with every item or creature it finds, and storing that information along with, for instance, how close to death, injury, or a negative effect it was. Over time it seems like you could generate a reasonably playable model.
Can anyone point us in the right direction of what a good start would be? Am I barking up the wrong tree or misunderstanding the idea of bayesian analysis?
Edit: My friend put up a github repo of his NetHack patch that allows python bindings. It's still in a pretty primitive state but if anyone's interested...
Although Bayesian analysis encompasses much more, the Naive Bayes algorithm well known from spam filters is based on one very fundamental assumption: all variables are essentially independent of each other. For instance, in spam filtering each word is usually treated as a variable, so this means assuming that if the email contains the word 'viagra', that knowledge does not affect the probability that it will also contain the word 'medicine' (or 'foo' or 'spam' or anything else). The interesting thing is that this assumption is quite obviously false when it comes to natural language, but it still manages to produce reasonable results.
Now, one way people sometimes get around the independence assumption is to define variables that are technically combinations of things (like searching for the token 'buy viagra'). That can work if you know specific cases to look for, but in general, in a game environment, it means that you can't remember anything. So each time you move, perform an action, etc., it's completely independent of anything else you've done so far. I would say that even for the simplest games, this is a very inefficient way to learn the game.
I would suggest looking into q-learning instead. Most of the examples you'll find are usually just simple games anyway (like learning to navigate a map while avoiding walls, traps, monsters, etc.). Reinforcement learning is a type of online learning that does really well in situations that can be modeled as an agent interacting with an environment, like a game (or robots). It works by trying to figure out what the optimal action is at each state in the environment (where each state can include as many variables as needed, much more than just 'where am I'). The trick then is to maintain just enough state to help the bot make good decisions, without having a distinct point in your state 'space' for every possible combination of previous actions.
To put that in more concrete terms: if you were to build a chess bot, you would probably run into trouble if you tried to create a decision policy based on all previous moves, since the set of all possible combinations of chess moves grows very quickly. Even a simpler model of where every piece is on the board is still a very large state space, so you have to find a way to simplify what you keep track of. But notice that you do get to keep some state, so that your bot doesn't just keep trying to make a left turn into a wall over and over again.
The Wikipedia article is pretty jargon-heavy, but this tutorial does a much better job of translating the concepts into real-world examples.
The one catch is that you do need to be able to define rewards to provide as the positive 'reinforcement'. That is, you need to be able to define the states that the bot is trying to get to; otherwise it will just wander forever.
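For reference, the core of tabular q-learning is only a few lines. A minimal sketch (the state/action encodings and the constants are made up for illustration; a real NetHack bot would need a much richer state representation):

    using System;
    using System.Collections.Generic;

    class QLearner
    {
        // Q maps a (state, action) pair to the expected long-term reward.
        readonly Dictionary<(string state, string action), double> q =
            new Dictionary<(string state, string action), double>();

        const double Alpha = 0.1; // learning rate
        const double Gamma = 0.9; // discount factor for future rewards

        public double Get(string s, string a)
        {
            return q.TryGetValue((s, a), out var v) ? v : 0.0;
        }

        // Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        public void Update(string s, string a, double reward,
                           string next, IEnumerable<string> nextActions)
        {
            double best = 0.0;
            foreach (var na in nextActions)
                best = Math.Max(best, Get(next, na));
            q[(s, a)] = Get(s, a) + Alpha * (reward + Gamma * best - Get(s, a));
        }
    }

The reward definition is exactly the hard part mentioned above: for example, positive for experience gained or dungeon depth reached, strongly negative for dying.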
There is precedent: the monstrous rog-o-matic program succeeded in playing rogue and even returned with the Amulet of Yendor a few times. Unfortunately, rogue was only released as a binary, not source, so it has died (unless you can set up a 4.3BSD system on a MicroVAX), leaving rog-o-matic unable to play any of the clones. It just hangs because they're not close enough emulations.
However, rog-o-matic is, I think, my favourite program of all time, not only because of what it achieved but because of the readability of the code and the comprehensible intelligence of its algorithms. It used "genetic inheritance": a new player would inherit a combination of preferences from a previous pair of successful players, with some random offset, then be pitted against the machine. More successful preferences would migrate up in the gene pool and less successful ones down.
The source can be hard to find these days, but searching "rogomatic" will set you on the path.
I doubt bayesian analysis will get you far because most of NetHack is highly contextual. There are very few actions which are always a bad idea; most are also life-savers in the "right" situation (an extreme example is eating a cockatrice: that's bad, unless you are starving and currently polymorphed into a stone-resistant monster, in which case eating the cockatrice is the right thing to do). Some of those "almost bad" actions are required to win the game (e.g. coming up the stairs on level 1, or deliberately falling in traps to reach Gehennom).
What you could try is working at the "meta" level. Design the bot to choose randomly among a variety of "elementary behaviors". Then measure how these bots fare. Then extract the combinations of behaviors that seem to promote survival; Bayesian analysis could do that across a wide corpus of games along with their "success level". For instance, if there are behaviors "pick up daggers" and "avoid engaging monsters in melee", I would assume the analysis would show that those two behaviors fit well together: bots which pick up daggers without using them, and bots which try to throw missiles at monsters without gathering such missiles, will probably fare worse.
This somehow mimics what learning gamers often ask for in rec.games.roguelike.nethack. Most questions are similar to: "Should I drink unknown potions to identify them?" or "What level should my character be before going that deep in the dungeon?" Answers to those questions depend heavily on what else the player is doing, and there is no good absolute answer.
A difficult point here is how to measure the success at survival. If you simply try to maximize the time spent before dying, then you will favor bots which never leave the first levels; those may live long but will never win the game. If you measure success by how deep the character goes before dying then the best bots will be archeologists (who start with a pick-axe) in a digging frenzy.
Apparently there are a good number of NetHack bots out there. Check out this listing:
In NetHack, unknown actions usually have a boolean effect: either you gain or you lose. Bayesian networks are based around "fuzzy logic" values: an action may give a gain with a given probability. Hence, you don't need a Bayesian network, just a list of "discovered effects" and whether they are good or bad.
No need to eat the Cockatrice again, is there?
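A sketch of what that "discovered effects" memory could look like (the type and names are invented for illustration):

    using System.Collections.Generic;

    class EffectMemory
    {
        // One boolean verdict per (action, target) pair the bot has tried.
        readonly Dictionary<(string action, string target), bool> effects =
            new Dictionary<(string action, string target), bool>();

        public void Record(string action, string target, bool wasGood)
        {
            effects[(action, target)] = wasGood;
        }

        // Null means "still unknown", i.e. worth experimenting with.
        public bool? Lookup(string action, string target)
        {
            return effects.TryGetValue((action, target), out var good) ? good : (bool?)null;
        }
    }

    // e.g. memory.Record("eat", "cockatrice corpse", false); // never again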
All in all, it depends how much "knowledge" you want to give the bot to start with. Do you want it to learn everything "the hard way", or will you feed it spoilers till it's stuffed?