Hazelcast much slower than Ignite on single-node map - benchmarking

I was running a simple benchmark on Hazelcast (using JMH), comparing it with Apache Ignite.
This is for a single node deployment.
The cache configuration is left at the default:
final Config config = new Config();
return Hazelcast.newHazelcastInstance(config);
And I use put and get on the map:
private IMap<Long, Customer> normalCache = hazelcast.getMap(CacheName.NORMAL.getCacheName());

public void saveToCache(Customer customer) {
    normalCache.put(customer.getId(), customer);
}
From the results, it seems that Ignite is 3-4X faster than Hazelcast.
I had figured the difference would be much smaller.
For both Ignite and Hazelcast, I haven't applied any other optimizations (near caches etc.); I just went with the default configuration (the result is throughput, in ops/sec).
Is this the expected performance difference, or are the results wrong?
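For reference, here is a simplified sketch of the benchmark harness (the Customer constructor shown here is an abbreviation of my real entity):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Benchmark)
public class HazelcastPutBenchmark {
    private HazelcastInstance hazelcast;
    private IMap<Long, Customer> normalCache;
    private long id;

    @Setup
    public void setup() {
        hazelcast = Hazelcast.newHazelcastInstance(new Config());
        normalCache = hazelcast.getMap(CacheName.NORMAL.getCacheName());
    }

    @TearDown
    public void tearDown() {
        hazelcast.shutdown();
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput) // result is ops/sec
    public void put() {
        normalCache.put(id++, new Customer(id)); // single benchmark thread by default
    }
}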

Please run in a client/server setup, or with multiple nodes.
AFAIK, in the case of Ignite, a local call is executed on the calling thread instead of being offloaded to the partition threads.
That's nice for benchmarks, but not very useful for production environments, because most calls will not be local (in a client/server setup, no call is local).

You're using a distributed system for a single-node deployment??? Really, don't do that; it's like using a Porsche to drive over a muddy field. If you want a single node, try something else like EhCache. I'm guessing you don't need high availability, backups, etc.
If you do want to choose between Ignite and Hazelcast, run a real distributed test: start 3 nodes using client/server and then see which one is fastest.
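For the Hazelcast side, a client/server run could look roughly like this (a sketch: it assumes a member is already started in a separate JVM on localhost, and Customer and the map name come from the question):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ClientSideBenchmark {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig().addAddress("127.0.0.1:5701"); // the member's address
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        IMap<Long, Customer> normalCache = client.getMap("normal"); // placeholder map name
        normalCache.put(1L, new Customer(1L)); // now every call crosses the network
        client.shutdown();
    }
}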


Using another FileSystem configuration while creating a job

Summary
We are currently facing an issue with the FileSystem abstraction in Flink. We have a job that can dynamically connect to an S3 source (meaning the source is defined at runtime).
We discovered a bug in our code, and it could be due to a wrong assumption about the way the FileSystem works.
Bug explanation
During the initialization of the job (so in the job manager), we manipulate the FS to check that some files exist, in order to fail gracefully before the job is executed.
In our case, we need to set the FS dynamically. It can be HDFS, S3 on AWS, or S3 on MinIO.
We want the FS configuration to be specific to the job, and different from the cluster's (different access key, different endpoint, etc.).
Here is an extract of the code we are using to do so:
private void validateFileSystemAccess(Configuration configuration) throws IOException {
    // Create a plugin manager from the configuration
    PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
    // Initialize the FileSystem from the configuration
    FileSystem.initialize(configuration, pluginManager);
    // Validate the FileSystem: an exception is thrown if the FS configuration is wrong
    Path archiverPath = new Path(this.archiverPath);
    archiverPath.getFileSystem().exists(new Path("/"));
}
After starting that specific kind of job, we notice that:
checkpointing does not work for this job; it throws a credential error;
the job manager cannot upload the artifacts needed by the history server for any job already running, of any kind (not only this specific kind of job).
If we do not deploy that kind of job, the upload of artifacts and the checkpointing work as expected on the cluster.
We think this issue comes from FileSystem.initialize(), which overrides the configuration for all FileSystems. Because of this, the next call to FileSystem.get() returns the FileSystem we configured in validateFileSystemAccess instead of the one configured for the cluster.
Questions
Could our hypothesis be correct? If so, how could we provide a specific configuration for the FileSystem without impacting the whole cluster?
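One workaround we are considering for the validation step, sketched below (untested), is to bypass Flink's process-wide FileSystem entirely and use a job-local Hadoop FileSystem instance instead. It assumes the Hadoop S3A client is on the classpath; the keys and endpoint are placeholders:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

private void validateFileSystemAccessLocally(String archiverUri) throws IOException {
    Configuration hadoopConf = new Configuration();
    hadoopConf.set("fs.s3a.access.key", "<job-access-key>");
    hadoopConf.set("fs.s3a.secret.key", "<job-secret-key>");
    hadoopConf.set("fs.s3a.endpoint", "<job-endpoint>");
    // newInstance() returns a FileSystem bound to this Configuration only,
    // without touching any global/static state
    try (FileSystem fs = FileSystem.newInstance(URI.create(archiverUri), hadoopConf)) {
        fs.exists(new Path("/"));
    }
}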

How to collect the results from worker nodes and print them in IntelliJ?

My code is here
I use this code in IntelliJ; my steps are:
① mvn clean
② mvn package
③ run
This code is used for connecting to a remote cluster from IntelliJ.
print() makes the results get saved on a random TaskManager on a random node in the cluster,
so I need to look for the results in $FLINK_HOME/log/*.out.
Is there a way to collect these results and print them in IntelliJ's console window?
Thanks for your help.
If you run the job within IntelliJ using a local stream execution environment, e.g., via
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
rather than on a remote cluster, print() will show its results in the console. But with a remote stream execution environment, the results will end up in the task managers' file systems, as you have noted.
I don't believe there is a convenient way to collect these results. Flink is designed around scalability, and thus the parallel sinks are designed to avoid any sort of bottleneck. Anything that unifies all of these output streams is a hindrance to scalability.
But what you could do, if you want all of the results to show up in one place, is reduce the parallelism of the print sink to 1. This won't bring the results into IntelliJ, but it does mean you'll find all of the output in one file, on one task manager. You would do that via
.print()
.setParallelism(1)
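Putting it together, a minimal runnable sketch (the source elements and job name are just illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PrintDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("a", "b", "c")
            .print()
            .setParallelism(1); // all output funnels through one sink instance
        env.execute("print-demo");
    }
}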

Mongo shell: is there a way to execute JavaScript code remotely instead of doing the work on the local machine?

I have a MongoDB instance running on MongoDB Atlas, and I have a local machine.
I want to execute a script, but I would like this script to be executed on the Mongo instance.
I have tried several things, like Robo3T and the mongo shell, and it looks like the behaviour is not the one I want.
Suppose we have this script:
print(db.users.find({}).toArray().length);
My users collection has around 30k documents. I deliberately use toArray() to force the creation of a JS array. But I want this array to be created on the MongoDB instance, or close to it; not on the machine where I launched the mongo shell (or Robo3T).
Counting the number of users is obviously not my real use case; if I really just wanted the number of users, I would have used .count(), which would have been faster. I just want to illustrate the fact that the code is not run at the location where I want it to be run.
Suppose you connect to a remote machine over ssh, and you have a very poor local connection.
If you do something like
wget http://moviedatabase.com/rocky.mp4
to fetch a 1 TB movie,
it will take the same time whether your own connection is blazing fast or amazingly slow: what counts is the bandwidth of the server you are connected to.
In my example, by contrast, everything depends on the connection of the machine you launch the mongo shell on.
If that machine has a good connection, it will be faster than if it has a bad one.
What is the way to execute JS code "closer" to the MongoDB instance?
How is this behaviour not a problem when you administer a MongoDB instance?
Thanks in advance,
Jerome
It depends on what you are trying to do.
There is no generic context where you can run arbitrary code, but you can store a JavaScript function on the server, which can then be used in $where or mapReduce.
Note that server-side JavaScript can be disabled with the security.javascriptEnabled configuration parameter.
I would expect that Atlas disables this for its free and shared tiers.
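If the goal is simply to keep the heavy work next to the data, an aggregation pipeline is usually a better fit than server-side JS: the pipeline executes on the server, and only the result crosses the wire. A sketch using the Java driver (the connection string, database, and collection names are placeholders):

import java.util.Collections;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

public class ServerSideCount {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb+srv://<cluster-uri>")) {
            // $count runs on the server; only one small document is returned
            Document result = client.getDatabase("mydb")
                    .getCollection("users")
                    .aggregate(Collections.singletonList(new Document("$count", "n")))
                    .first();
            System.out.println(result == null ? 0 : result.getInteger("n"));
        }
    }
}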

Entity Framework 6.2 database first warm up

I have an EF 6.2 project in my MVC solution.
It uses a SQL Server database and has about 40 tables with many foreign keys.
The first query is very slow, around 20 seconds.
If I immediately hit the same page again, changing the user parameter, the query takes less than 1 second.
So this looks like a warm-up issue in EF6. That's fine, and apparently there are loads of things I can do to sort it.
The model cache (part of EF 6.2) looks like it could be beneficial, but everywhere I read about it refers to model first, nothing about DB first. Would this still work with database first?
There are also the Entity Framework 6 Power Tools, which allow me to generate views. I tried this and it doesn't seem to make any difference. Is this still a valid route?
Any other ideas?
EF DbContexts incur a one-off cost to resolve their entity mappings. For web applications, you can mitigate this by having your application start-up kick off a simple query against the DbContext, so this warm-up happens then rather than during your first user-triggered query. Simply new-ing up a context doesn't trigger the initialization; running a query does. So for ASP.NET MVC, in Application_Start, after initializing everything:
using (var context = new MyContext())
{
    var warmup = context.MyTable.Count(); // against a small table
}
You can test this behaviour with unit tests by having a suite of timed tests that read data from the DbContext, and putting a break-point in the DbContext's OnModelCreating method. It will be executed just once, on the first query of the first test. You can add a OneTimeSetUp in a test fixture to run the quick count above ahead of the tests, so this cost is incurred before you measure the performance of the test runs.
So, the answer was to update EF to 6.2 and then use the newest feature:
public class MyDbConfiguration : DbConfiguration
{
    public MyDbConfiguration() : base()
    {
        var path = Path.GetDirectoryName(this.GetType().Assembly.Location);
        SetModelStore(new DefaultDbModelStore(path));
    }
}
For the full story, check out this link: https://entityframework.net/why-first-query-slow
You're going to take a small performance hit at startup, but then it all moves a lot faster.
For anyone using an Azure web app: you can use a deployment slot (https://stackify.com/azure-deployment-slots/), which allows you to publish into a non-production slot and warm it up before swapping it in as the production slot.

under what circumstances (if any) can I continue to run "out of date" GWT clients when I update my GAEJ version?

Following on from this question:
GWT detect GAE version changes and reload
I would like to further clarify some things.
I have an enterprise app (GWT 2.4 & GAEJ 1.6.4, using GWT-RPC) that my users typically run all day in their browsers; indeed, some don't bother refreshing the browser from day to day. I make new releases on a pretty regular basis, so I am trying to streamline the process for minimal impact on my users. Not all releases concern all users, so I'd like to minimize the number of restarts.
I was hoping it might be possible to categorize my releases as follows:
1) releases that will cause an IncompatibleRemoteServiceException to be thrown,
and 2) those that don't: i.e. releases that only affect the server, or the client, but not the RPC interface.
Then I could make lots of changes to the client and server without affecting the interface between the two. As long as I don't modify the RPC interface, presumably I can change server code and/or client code and the exception won't be thrown. Right? Or will any redeployment of GAE cause an old client to get an IncompatibleRemoteServiceException?
If I were able to do that, I could batch up interface-busting changes into fairly infrequent releases and notify my users that a restart will be required.
Many thanks for any help.
I needed an answer pretty quickly, so I thought I'd just do some good old-fashioned testing to see what's possible. Hopefully this will be useful for others with production systems using GWT-RPC.
The goal is to be able to release updates and fixes without requiring all connected browsers to refresh. It turns out there is quite a lot you can do.
So, after my testing, here's what you can and can't do:
No problem:
add a new call to a RemoteService
just update some code on the server, e.g. a simple bug fix, and redeploy
just update some client (GWT) code and redeploy (of course, anyone wanting the new client functionality will have to refresh the browser, but others are unaffected)
Limited problems:
add a parameter to an existing RemoteService method - this one is interesting: that particular call will throw IncompatibleRemoteServiceException (of course), but all other calls to the same RemoteService, or to other RemoteServices (Impls), are unaffected.
add a new type (as a parameter) to any method within a RemoteService - this is the most interesting one, and is what led me to do this testing. It renders the whole RemoteService out of date for existing clients, which get IncompatibleRemoteServiceException. However, you can still use the other RemoteServices. I need to do some more testing here to fully understand, or perhaps someone else knows more?
So if you know what you're doing, you can do quite a lot without having to bother your users with refreshes or release announcements.
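Since a stale client surfaces as an IncompatibleRemoteServiceException on the failing call, you can also catch it at the RPC boundary and prompt a refresh. A sketch of that client-side handling (the service, argument, and result types are illustrative):

import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.user.client.rpc.IncompatibleRemoteServiceException;

myServiceAsync.doSomething(arg, new AsyncCallback<String>() {
    @Override
    public void onFailure(Throwable caught) {
        if (caught instanceof IncompatibleRemoteServiceException) {
            // an out-of-date client hit a changed RPC interface: ask for a reload
            Window.alert("A new version has been deployed; the page will now reload.");
            Window.Location.reload();
        }
    }

    @Override
    public void onSuccess(String result) {
        // normal handling
    }
});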
