I have created two Mule runtimes on different Windows VMs and clustered them via Runtime Manager.
I created a Mule app with an Object Store configured as persistent and deployed it to the cluster. I updated the Object Store value on one server, and when I tried to fetch it from the other, it did not return the updated value.
What setting am I missing here?
FYI: this is on Mule 4.2.2
To be shared between the cluster nodes, the object store has to be set to non-persistent. A persistent object store usually means it persists to disk. To be shared in a cluster it needs to be in memory: the cluster backend has a shared-memory implementation that shares the object store across nodes.
Do not rely on the Object Store, especially in production.
Use it only for non-essential data sharing.
It is easily corrupted when multiple requests come from different servers at the same moment.
https://simpleflatservice.com/mule4/IgnoreUnreliableObjectStorage.html
How do different components in a distributed system know where to send messages to access certain services?
For example, let's say I have a service which handles authentication and a service which handles searching. How does the component which handles searching know where to send an authentication request? Are subdomains more commonly used? If so, how does replication work in this scenario? Is there some registry of local IP addresses which handles all this routing?
The problem you are describing is called service lookup / service registry / resource lookup / ..., and the answer depends on how large your system is and how dynamic it is.
If you only have a few components, it might be feasible to store the necessary information in a config file, or pass it as a parameter. Many systems use DNS as a lookup mechanism, but it is not considered a good one, due to caching and long latency.
I think most distributed systems use Zookeeper to store this information for them. This way, all the services only need to know the IP addresses of the Zookeeper cluster. If you have replication, you just store multiple addresses inside Zookeeper, and depending on which system you are using, you either need to choose an address on your own, or the driver does it (in case you're connecting to a replicated database, for instance).
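For illustration, here is a minimal lookup sketch against a ZooKeeper ensemble. The /services/auth path, the child-node layout, and the host names are hypothetical assumptions for the example, not a standard convention:

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ServiceLookup {
    public static void main(String[] args) throws Exception {
        // The client only needs to know the ZooKeeper ensemble up front.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 3000, event -> { });

        // Assume each auth-service replica registered a child node under
        // /services/auth whose data is its "host:port" address.
        List<String> replicas = zk.getChildren("/services/auth", false);
        String chosen = replicas.get(0); // naive pick; a real client might load-balance here
        byte[] address = zk.getData("/services/auth/" + chosen, false, null);

        System.out.println("auth service at " + new String(address, StandardCharsets.UTF_8));
        zk.close();
    }
}

In practice a wrapper such as Apache Curator is often used instead of the raw client, but the idea is the same.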
Another way to do this is to use a message queue, like ZMQ, which will forward the messages to the correct instances. ZMQ can deal with replication and load balancing as well.
This is a very general question regarding GAE / Java Memcache:
I have an object called Network which is stored in Memcache, and I have another object called User which has a reference to Network.
Network network=Memcache.get(networkKey);
user._network=network; //user references an object that came from Memcache...
Now I have a bunch of code that changes user._network (without touching Memcache):
user._network= // do some changes to the objects , without touching Memcache
OK, so here's the question:
Looking at the code as it is, did the last line update the network object in Memcache? I have servlets that need to access the updated network object; the question is whether I'm wrong to think of Memcache objects as regular objects.
Perhaps the right way is this?
Network network=Memcache.get(networkKey);
network.doUpdate();
Memcache.put(networkKey, network);
The latter code example you provided is the correct way to update the network object in memcache, but it will work differently than you might expect.
Your user object refers to an instance of the network object that has been retrieved from memcache. Each time you retrieve the object from memcache you will get a different instance. And, because the instances are logically distinct from the original within the cache, as soon as the cached value is updated, those instances will become out of sync with the cached version.
For example:
Network network=Memcache.get(networkKey);
Network networkTwo = Memcache.get(networkKey);
user._network=network;
networkTwo.doUpdate();
// network and networkTwo are now different
Memcache.put(networkKey,networkTwo);
// user._network still refers to the initial instance of network, not networkTwo
You'll need additional logic in your code if you want your User objects to always refer to the version of network that is in memcache.
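For reference, the same read-modify-write cycle with the actual App Engine Memcache API looks roughly like this (a sketch; it assumes Network implements Serializable and networkKey is a String):

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

// get() hands back a freshly deserialized copy, not a shared live instance
Network network = (Network) cache.get(networkKey);

network.doUpdate();             // mutate the local copy...
cache.put(networkKey, network); // ...and write it back so other code and instances see it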
The code does update the Network object in memcache, but references to the object in other parts of your application that already have a handle on that Network instance are not updated: Memcache stores SERIALIZED copies of objects.
The best way to use memcache in GAE is in conjunction with something like Objectify for caching datastore objects and for storing large collections of objects that are expensive to generate. It's not well suited to storing objects that change frequently or that are used by many other objects in your application.
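As a sketch of that combination (assuming Objectify 4+ is on the classpath), annotating the entity with @Cache makes Objectify keep a copy in memcache and fall back to the datastore on a miss:

import com.googlecode.objectify.annotation.Cache;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

@Entity
@Cache  // Objectify transparently caches this entity in memcache
public class Network {
    @Id
    private Long id;
    // ... other persistent fields
}

Loads such as ofy().load().type(Network.class).id(someId).now() then check memcache first, and saves made through Objectify keep the cached copy in sync.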
I'm building a GAE app that requires a cryptographic key to operate. I would like to avoid storing the key in code or in a persistent datastore, and instead upload the key whenever I start my app so that it will only reside in memory for the duration of the app's lifetime (from the time I upload the key until no instances are running.)
I understand that this is possible to do with a resident backend, but that seems too expensive (the cheapest backend is currently $58/month) just to keep one value in memory and serve it to other instances on demand.
Note that I'm not looking for a general robust shared-memory solution, just one value that is basically written once and read many times. Thanks.
I don't think that this can work the way you hope. The sources of data in GAE:
1. Files deployed with your app (war or whatever).
2. Per-instance memory (front-end or back-end).
3. Memcache.
4. Datastore (or SQL now, I suppose).
5. Blobstore.
6. Information retrieved via http requests (i.e. store it somewhere else).
1 and 4 are out, as per your question. 2 doesn't work by itself because the starting and stopping of instances is out of your control (it wouldn't scale otherwise), and persistent instances are expensive. 3 doesn't work by itself because Memcache can be cleared at any time. 5 is really no different from the datastore, as it is permanently stored on Google's servers. Maybe you could try 6 (store it somewhere else) and retrieve it into per-instance memory during instance startup. But I suspect that is no better security-wise (and, for that matter, doesn't match what you said you wanted).
It seems that a Memcache and local-memory solution might work if you have your server instances clear the memcached key on exit, and have running instances write/refresh the key regularly (for example, on every request).
That way the key will likely be there as long as an instance is operational, and most likely not be there when starting up cold.
The same mechanism could also be used to propagate a new key and/or cycle server instances in the event of a key change.
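A minimal sketch of that pattern with the App Engine Memcache API (the class and cache-key names are hypothetical, and the handler through which the operator uploads the key is assumed to exist separately):

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class KeyHolder {
    private static final String CACHE_KEY = "app-crypto-key"; // hypothetical key name
    private static final MemcacheService CACHE = MemcacheServiceFactory.getMemcacheService();
    private static volatile byte[] key; // this instance's in-memory copy

    // Called by the operator-facing upload handler when the key is (re)uploaded.
    public static void setKey(byte[] uploadedKey) {
        key = uploadedKey;
        CACHE.put(CACHE_KEY, uploadedKey);
    }

    // Called on every request: refresh the memcache copy so it is unlikely to be evicted.
    public static byte[] getKey() {
        if (key == null) {
            key = (byte[]) CACHE.get(CACHE_KEY); // null after a full cold start
        }
        if (key != null) {
            CACHE.put(CACHE_KEY, key);
        }
        return key; // null means the operator has to upload the key again
    }

    // Called from a shutdown hook: clear the key on exit; surviving instances
    // will re-add it from their local copies on their next request.
    public static void clearOnExit() {
        CACHE.delete(CACHE_KEY);
    }
}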
I have a pretty basic question: in GAE, if I use memcache to store some data once it has been retrieved for the first time from the db, and that data then remains in the cache for, say, 2 days, do ALL instances of the application get to "see" it and retrieve it from the cache? Or is the cache separate for each application instance?
I'm asking this because GAE spawns separate VM processes (not threads) for each new instance an application needs, so state that used to be consistent across all instances (in the thread model) is now fragmented per instance (process): for example, the servlet application context is NOT propagated across all instances of the same application.
So, again, is memcache consistent across multiple instances of the same application, or does it create per-instance/process sets of cached data?
It's consistent; GAE memcache runs as a service separate from your instances.
I am curious as to how caching works in Google App Engine or any cloud-based application. Since there is no guarantee that requests are sent to the same server, does that mean that if data is cached on the 1st request on Server A, then a 2nd request processed by Server B will not be able to access the cache?
If that's the case (the cache is only local to a server), won't it be unlikely (depending on the number of users) that a request hits the cache? E.g., Google probably has thousands of servers.
With App Engine you cache using memcached. This means that a cache server holds the data in memory (rather than each application server). The application servers (for a given application) all talk to the same cache server (conceptually; there could be sharding or replication going on under the hood).
In-memory caching on the application server itself will potentially not be very effective, because there is more than one of them (although for a given application only a few instances are active; it is not spread out over all of Google's servers), and also because Google is free to shut them down at any time (which is a real problem for Java apps that take some time to boot up again, so you can now pay to keep idle instances alive).
In addition to these performance/effectiveness issues, in-memory caching on the application server could lead to consistency problems (every refresh shows different data when the caches are not in sync).
It depends on the type of caching you want to achieve.
Caching on the application server itself can be interesting if you have a complex in-memory object structure that takes time to rebuild from data loaded from the database. In that specific case, you may want to cache the result of the computation. If the structure is large, loading it from a local cache will be faster than from a shared memcache.
If having a consistent value between the in-memory copy and the database is paramount, you can check a checksum/timestamp against a value stored in the datastore every time you use the cached value. Keeping the checksum/timestamp on a small object or in a global cache will speed up that check.
One big issue with using a global memcache is ensuring proper synchronization when "refilling" it, i.e. when a value is not yet present or has been flushed. If multiple servers do the check at the exact same time, you may end up with several distinct servers doing the refill simultaneously. If the operation is idempotent, this is not a problem; if not, it is a potential and very hard to trace bug.
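One common way to soften this with the App Engine Memcache API is an atomic add, so only the first refiller's value is actually stored. A minimal sketch, where computeExpensiveValue is a hypothetical placeholder for the rebuild:

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheService.SetPolicy;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class CacheRefill {
    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

    // Return the cached value for key, refilling it if missing; concurrent refills
    // from other instances are tolerated because only one atomic add can win.
    public Object getOrRefill(String key) {
        Object value = cache.get(key);
        if (value == null) {
            value = computeExpensiveValue(key);
            boolean stored = cache.put(key, value, Expiration.byDeltaSeconds(600),
                    SetPolicy.ADD_ONLY_IF_NOT_PRESENT);
            if (!stored) {
                // Another instance won the race; read back its value for consistency.
                Object winner = cache.get(key);
                if (winner != null) {
                    value = winner;
                }
            }
        }
        return value;
    }

    // Hypothetical placeholder for the expensive rebuild (e.g. from the datastore).
    private Object computeExpensiveValue(String key) {
        return "expensive-result-for-" + key;
    }
}

Note this only avoids duplicate writes; the expensive computation itself can still run on several instances at once, which is fine as long as it is idempotent.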