Is there a framework for distributing browser automation testing over a cluster of EC2 instances? - browser-automation

I am aiming to simulate a large number of 'real users' hitting and realistically using our site at the same time, and to ensure they can all get through their use cases. I am looking for a framework that combines EC2 grid management with a web automation tool (such as Geb or Watir). The ideal 'pushbutton' operation would do all of the following (a rough sketch of such a flow appears after the list):
1. Start up a configurable number of EC2 instances (using a specified AMI preconfigured with my browser automation framework and test scripts).
2. Start the web automation framework test(s) running on all of them, in parallel. I guess they would have to be headless.
3. Wait for completion.
4. Aggregate results.
5. Shut down the EC2 instances.
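For reference, a minimal sketch of such a push-button flow, written with boto3 (Python), might look like the following. The AMI id, bucket name and run script are hypothetical placeholders; it assumes an AMI that runs the headless test suite from its user-data script on boot, uploads a results file to S3, and then powers itself off.

```python
# Minimal sketch only -- not a framework. Assumes an AMI that runs the
# headless browser tests from user data on boot, writes results to S3,
# and stops itself when finished.
import boto3

AMI_ID = "ami-0123456789abcdef0"         # hypothetical test-runner AMI
INSTANCE_COUNT = 20                      # configurable number of runners
RESULTS_BUCKET = "my-load-test-results"  # hypothetical S3 bucket

ec2 = boto3.resource("ec2")
s3 = boto3.client("s3")

# 1. Start the instances; user data kicks off the test run on boot.
instances = ec2.create_instances(
    ImageId=AMI_ID,
    InstanceType="c5.large",
    MinCount=INSTANCE_COUNT,
    MaxCount=INSTANCE_COUNT,
    UserData="#!/bin/bash\n/opt/tests/run_headless_suite.sh\n",  # hypothetical script
)

# 2./3. Wait for completion: each instance stops itself when the suite ends.
for instance in instances:
    instance.wait_until_stopped()

# 4. Aggregate the result files each instance uploaded to S3.
results = []
for obj in s3.list_objects_v2(Bucket=RESULTS_BUCKET).get("Contents", []):
    body = s3.get_object(Bucket=RESULTS_BUCKET, Key=obj["Key"])["Body"].read()
    results.append(body.decode())

# 5. Shut the EC2 instances down for good.
for instance in instances:
    instance.terminate()

print(f"Collected {len(results)} result files")
```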

While not a framework per se, I've been really happy with http://loader.io/
It has an API for your own custom integration, plus reporting and analytics for analysis.
PS: I'm not affiliated with them, just a happy customer.
However, in my experience, you need to do both load testing and actual client testing. Even loader.io will only hit your service from a handful of hosts, and it skips a major part: the client-side performance as seen from a number of different clients' browsers.
This video has more on that topic:
http://www.youtube.com/watch?v=Il4swGfTOSM&feature=youtu.be

BrowserMob used to offer such a service. It looks like they were acquired.

Moving app to GAE

Context:
We have a web application written in Java EE on GlassFish, with MySQL via JDBC for storage, an XMPP connection for GCM CCS upstream messaging, and the Quartz scheduler library. Those are the main components of the app.
We're wrapping up our app development stage and trying to figure out the best option for deploying it, considering future scalability both in the number of users and in their geographical location (e.g., multiple VMs on multiple continents, if needed). Currently we're using a single VM instance from DigitalOcean for both the application server and the MySQL server, which we can then scale up with some effort (not much, but probably more than with GAE).
Questions:
1) From what I understood, Cloud SQL is replicated in multiple datacenters across the world, so storage-wise the geographical scalability is assured. That is a bonus. However, we would need to update the DB interaction code in the app to make use of the Cloud SQL structure, thus being locked in to their platform. Has this been a problem for you? (I haven't looked at their DB interaction code yet, so I might be off on this one.)
2) Do GAE instances work pretty much like a normal VM would? Are there any differences in this respect? I've understood the auto-scaling capability, but how are they geographically scalable? Do you have to select a datacenter location for each individual instance? Or how can you achieve multiple worldwide instance locations as Cloud SQL does right out of the box?
3) How would the XMPP connection fare with multiple instances? Considering each instance is a separate VM, each instance would have its own XMPP connection to the GCM CCS server. That would cause problems, e.g. if more than 10 instances are opened, the limit of 10 simultaneous XMPP connections would be hit.
4) How would the Quartz scheduler fare with the instances? Right now we save the scheduled jobs in the MySQL database and schedule them at each server start. With multiple instances, the jobs would be scheduled on each instance, i.e. multiple times. We wouldn't want that.
So, if problems 3 and 4 are indeed as I described, what would be the solution to them? Having a single instance for the XMPP connection as well as a single instance for the Quartz scheduler?
1) Although Cloud SQL is a managed, replicated RDBMS, you still need to choose a region when creating an instance. That said, you cannot expect seamlessly low latency worldwide; you would still need to design a proper architecture to achieve that.
2) GAE isn't a VM in any sense. GAE is a PaaS and, therefore, a runtime environment. You should expect several differences: you are restricted to Java, PHP, Go and Python, to the mechanisms GAE provides out of the box, and to compatible third-party libraries. You cannot install middleware there, for example. On the other hand, you don't have to deal with anything from the infrastructure standpoint, and you get transparent high availability and scalability.
3) XMPP is not my strong suit, but GAE offers an XMPP service; you should take a look at it. More info: https://developers.google.com/appengine/docs/java/xmpp/?hl=pt-br
4) GAE offers a cron service that works pretty well. You wouldn't need to use Quartz.
My last advice is: if you want to migrate an existing application, your best choice would probably be GCE + CloudSQL, not GAE.
Good luck!
Cheers!

Is it a best practice in continuous deployment to deploy to all production servers immediately or just to a subset at first?

We use CD in our project, and since the application is used worldwide we use more than one data center (one per region). Each data center hosts an isolated instance of the application (each regional deployment uses its own DB, application server, etc.). Data is not shared between data centers.
There are two different approaches that we can take:
1. Deploy to an integration server (I) where all tests are run, then deploy to the first data center A and then (once the deployment to A is finished) to data center B.
2. Region A has a smaller user base, so to prevent an outage in both A and B caused by a software bug that was not caught on the integration server (I), an alternative is to deploy to the integration server, then "bake" the code in region A for 24 hours, and deploy the application to data center B only after it has been tested in production for 24 hours. Does this alternative go against CI best practices, since there is no "continuous" deployment in this case?
There is a big difference between Continuous Integration and Continuous Deployment. CI is for the situation where multiple users work on the same code base and integration tests are run repeatedly for every check-in, so that integration failures are handled quickly and programmatically. Continuous Deployment is a paradigm that encapsulates deploying quickly and programmatically, with your acceptance tests as the gate, so that you deploy as quickly as feasible (instead of the usual ticketing delays that exist in most IT organizations). The question you are asking is a mix of both.
As for your specific question, your practice does go against the best practices. If you have two different data centers, you have the chance of running into separate issues in the different data centers.
I would rather design your data centers to have the flexibility to switch between the current and next versions. That way, you can deploy your code to the "next" environment and run your tests there. Once your testing confirms that the new environment is good to go, you can switch your environments from current to next.
The best practice, as Paul Hicks commented, is probably to decouple deployment from feature delivery with feature flags. That said, organizations with many production servers usually protect their uptime either by deploying to a subset of servers ("canary deployment") and monitoring before deploying to all, or by using blue-green deployment. Once the code is deployed, one can hedge one's bets further by flipping a feature flag only for a subset of users and, again, monitoring before exposing the feature to all.
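As a toy illustration of that last point (not any particular feature-flag product), a percentage rollout can be as simple as hashing the user id so each user gets a stable yes/no decision, then raising the percentage as monitoring stays green:

```python
# Toy percentage-based feature flag: each user maps to a stable bucket
# 0-99, so raising rollout_percent gradually exposes more users.
import hashlib

def feature_enabled(feature_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True for a stable pseudo-random subset of users."""
    digest = hashlib.sha256(f"{feature_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# Example: expose a (hypothetical) new checkout flow to 5% of users first.
if feature_enabled("new-checkout", "user-42", rollout_percent=5):
    serve_new_checkout = True   # new code path
else:
    serve_new_checkout = False  # current behaviour
```

The same stable-hash trick can back a canary-style rollout across servers if you bucket by host name instead of user id.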

Fastest Open Source Content Management System for Cloud/Cluster deployment

Currently clouds are mushrooming like crazy and people are starting to deploy everything to the cloud, including CMS systems, but so far I have not seen anyone succeed in deploying a popular CMS system to a load-balanced cluster in the cloud. Some performance hurdles seem to prevent standard open-source CMS systems from being deployed to the cloud like this.
CLOUD: A cloud, or better a load-balanced cluster, has at least one frontend server, one network-connected(!) database server and one cloud-storage server. This fits well with Amazon Beanstalk and Google App Engine. (This specifically excludes a CMS on a single computer or a Linux server with MySQL on the same "CPU".)
Deploying a standard CMS in such a load-balanced cluster requires a cloud-ready CMS with the following characteristics:
- The CMS must deal with the latency of queries to still be responsive and render pages in less than a second so they can be cached (or use a precaching strategy).
- The filesystem probably must be connected to a remote storage (Amazon S3, Google Cloud Storage, etc.).
Currently I know of Python/Django and WordPress having middleware modules or plugins that can connect to cloud storage instead of a filesystem, but there might be other cloud-ready CMS implementations (Java, PHP, ?) and systems.
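For Django, such a plugin is typically wired in through the file-storage setting; a rough settings.py fragment using the django-storages package might look like this (the bucket and region names are hypothetical):

```python
# settings.py fragment -- route all FileField/ImageField uploads to S3
# via django-storages instead of the local filesystem.
INSTALLED_APPS = [
    # ... the usual apps ...
    "storages",
]

DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-cms-media"   # hypothetical bucket
AWS_S3_REGION_NAME = "eu-west-1"           # hypothetical region
```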
I myself have failed to deploy django-CMS to the cloud, ultimately due to the query latency of the remote DB. So here is my question:
Did you deploy an open-source CMS that still performs well in rendering pages and the backend admin? Please post your average page-rendering stats in milliseconds for uncached pages.
IMPORTANT: Please describe your configuration, the problems you encountered and which modules had to be optimized in the CMS to make it work. Don't post a simple "this works"; contribute your experience and knowledge.
Such a CMS probably has to make fewer than 10 queries per page (if more, the queries must be made in parallel) and has to deal with filesystem access times of ~100 ms for a stat and query delays of ~40 ms.
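To make the "in parallel" point concrete, here is a small generic sketch (fake_query just stands in for one real DB round trip): fired concurrently, five 40 ms queries cost roughly one round trip instead of the ~200 ms they would take sequentially.

```python
# Generic sketch: run independent per-page queries concurrently so the
# page pays roughly max(query latency) rather than the sum of latencies.
from concurrent.futures import ThreadPoolExecutor
import time

def fake_query(name: str, delay: float = 0.04) -> str:
    """Stand-in for one DB round trip (~40 ms in the question)."""
    time.sleep(delay)
    return f"result of {name}"

queries = ["navigation", "page_body", "sidebar", "footer", "breadcrumbs"]

with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    results = list(pool.map(fake_query, queries))   # ~40 ms total, not ~200 ms
```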
Related:
Slow MySQL Remote Connection
Have you tried Umbraco?
It relies on a database, but it keeps layers of cache so you aren't doing selects on every request.
http://umbraco.com/azure
It works great on Azure too!
I have found an excellent performance test of WordPress on App Engine. It appears that Google has spent some time optimizing this system for load-balanced cluster and remote-DB deployment:
http://www.syseleven.de/blog/4118/google-app-engine-php/
Scaling test from the report:

parallel hits    GAE     1&1     Sys11
1                1.5     2.6     8.5
10               9.8     8.5     69.4
100              14.9    -       146.1

Conclusion from the report: the system is slower than on traditional hosting but scales much better.
http://developers.google.com/appengine/articles/wordpress
We have managed to deploy Python django-CMS (www.django-cms.org) on Google App Engine with Cloud SQL as the DB and Cloud Storage as the filesystem. Cloud Storage was attached by forking and fixing a django.storage module by Christos Kopanos (http://github.com/locandy/django-google-cloud-storage).
After that, a second set of problems came up as we discovered access times of up to 17 s for a single page access. We investigated this and found that easy-thumbnails 1.4 accessed the normal filesystem for mod_time requests while writing results to the store (rendering all thumbnail images on every request). We switched to the development version, where that was already fixed.
Then we worked with SmileyChris to fix the unnecessary access of mod_times (stat of the file) on every request for every image, by tracing the problem and posting issues to http://github.com/SmileyChris/easy-thumbnails
This reduced access times from 12-17 s to 4-6 s per public page on the CMS, basically eliminating all storage/"file"-system access. Once that was fixed, easy-thumbnails replaced (by design) filesystem accesses with queries to the DB to check on every request whether a thumbnail's source image has changed.
One thing for the web designer: using an image.width statement in a template forces an ugly, slow read on the "filesystem", because image widths are not cached.
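One generic Django workaround (a sketch, not the actual django-CMS model) is to let the database cache the dimensions by pairing the ImageField with width_field/height_field; templates can then read picture.image_width from the DB row instead of picture.image.width, which would open the file on the storage backend.

```python
# Sketch: cache image dimensions in DB columns so templates never have to
# stat or open the file on the remote storage just to learn its width.
from django.db import models

class Picture(models.Model):
    image = models.ImageField(
        upload_to="pictures/",
        width_field="image_width",    # Django fills these in on save
        height_field="image_height",
    )
    image_width = models.PositiveIntegerField(editable=False, null=True)
    image_height = models.PositiveIntegerField(editable=False, null=True)
```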
Further investigation led to the conclusion that DB accesses are very costly too, taking about 40 ms per round trip.
So far the deployment has been unsuccessful, mostly due to DB access times in the cloud leading to 4-5 s delays when rendering a page before it is cached.

Load testing using JMeter with 5k users, website crashed

I was running 5K users against a website while load testing with JMeter. It was taking too long and my colleagues were not able to access the site, so I stopped the test. Now the website is completely dead and not accessible. What should I do now?
You should restart your browser and/or your entire system.
The best approach would be to use a ramp-up technique. This does not overload the server all at once but adds load slowly until it reaches the peak. This way it is easier to find the crash point.
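JMeter exposes this through the thread group's ramp-up period; as an illustration of the same idea outside JMeter, the toy script below (the target URL and counts are made up) starts virtual users gradually so you can watch where response times begin to degrade:

```python
# Toy ramp-up: start one virtual user every couple of seconds instead of
# all of them at once, and print per-request timings as the load grows.
import threading
import time
import urllib.request

TARGET = "http://localhost:8080/"   # hypothetical site under test
TOTAL_USERS = 50
RAMP_UP_SECONDS = 100               # one new user every 2 seconds

def virtual_user(user_id: int) -> None:
    for _ in range(10):                        # each user makes 10 requests
        started = time.time()
        try:
            urllib.request.urlopen(TARGET, timeout=10).read()
            print(f"user {user_id}: {time.time() - started:.2f}s")
        except OSError as exc:                 # timeouts, refused connections
            print(f"user {user_id}: error: {exc}")

threads = []
for i in range(TOTAL_USERS):
    t = threading.Thread(target=virtual_user, args=(i,), daemon=True)
    t.start()
    threads.append(t)
    time.sleep(RAMP_UP_SECONDS / TOTAL_USERS)  # the gradual ramp-up

for t in threads:
    t.join()
```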
Restarting the server is a good way to bring the application back up. Also check the server logs.
First of all, I would recommend trying to understand how many users are going to use your system, so you can set clear test objectives.
I find these articles useful:
Stress testing web applications with JMeter (Part 1)
Stress testing web applications with JMeter (Part 2)
Once you've set your testing goals clearly, follow the JMeter performance-testing best-practices steps.
I would also recommend getting acquainted with Load Testing Cloud, an alternative cloud solution for running your .jmx tests instead of running them locally, which is often constrained by the resources of your own machine.

Load testing end to end Solr performance

We want to test the performance of the Solr search engine at all tiers. So far I have used SolrMeter to test the performance of the Solr core under stress, and SoapUI + LoadUI to test the RESTful service tier. We are now implementing the user-interface tier. Which tool would you recommend for testing the performance of Solr under stress at this UI tier? I can think of Selenium/WebDriver but was wondering whether there is a dedicated tool for this.
I recommend using JMeter.
First, use the JMeter Proxy to record your test scenario (it records your actions in the browser).
Then set up a test plan that fires the test scenario in a thread group (you have to provide the number of users/threads, startup time, load time and unload time), with the results recorded by whichever Listener elements (graphs, tables, ...) you add to your test.
Don't forget to use JMeter Plugins (an invaluable JMeter contrib).
