How to use Chaos Monkey on local cluster - database

I have a cluster back in my office for testing purposes. I have there a database and i would like to make all kind of "monekybusiness" to those test machines, long before i want to go to production.
I zipped 2-3 coffees all this morning trying to figure out HOW to make the "inFAMOUS" Simian Army to chew my nerves here on my local machines.
Everywhere i read, saw all kind of setups for AWS.
Question : Is there a possibility to deploy the Monkeys on my local cluster? Or is there any other alternative to Simian Army?

What is the question that you want to answer with your tests?
ChaosMonkey is a resilience tool that was design for the cloud, its main purpose is to verify that AWS' Auto Scaling Groups (ASG) will be able to re-provision faulty/offline nodes and that the application is capable to perform in a stable way when this process is being done. Do you have an automated process like this in your local cluster?

Related

How to sync two postgresql databases in real time

I have two postgresql databases configured for replication. Master node is sending new data in real time to secondary node and it works just fine. But there is one disadvantage - secondary node is read-only, so no write requests are accepted.
What I need is to be able to perform write/read operations on both of the databases and still be able to have them in perfect sync, so they are identical.
What is the best solution for such requirements?
Just for the context of this question:
I have two web app instances that are deployed in two different locations in the world (very high delay in sending requests is the reason I decided to deploy one instance locally in each location). Both instances are fetching the same data but they are also able to generate some data and input it into the DB. It is impossible to have only one DB due to too big delay when fetching data.
Maybe my solution is not perfect, I'm open for any suggestion really because I'm out of ideas how to make it work smoothly and maybe I'm lacking some knowledge.
Thanks

Web scraping vs Cloud Storage with AWS

My team has run into a design conflict. We are working on a project that involves scraping historical data from yahoo for all stocks for the last year to run some ML analysis on it. The latency is unbearably slow, not sure if it's the network or the web scraper. I proposed we use AWS RDS to store the data so we can access it quicker. However, a team member said that storing the data in the cloud would not solve our latency issue. I rebutted with the fact that the data will be organized and stored in a way to access the data significantly faster. He came back with something else and this went on. Is it true that a cloud DB won't offer any additional speed compared to a scraper? If so does AWS have a service that allows us to access the data we store faster through another service, almost as if the database was on our own server?
I am not that all familiar with cloud services but I do understand databases pretty well. So please dumb down the AWS stuff if you wish and feel free to point me to any duplicates or links that may help me understand this more.
Lots of good reasons to use RDS as a database, but speeding up your scraping isn't one of them - it likely isn't your bottleneck.
I have written lots of scrapers over the years, and by far the biggest performance boost will be to have a fast network connection between the scraper machine(s) and the host you are scraping, and even then, using a multi-threaded scraper for each scraping machine will give you another HUGE speed improvement.
Most time spent scraping is waiting on the host to return the results to you, not parsing the page and not saving the database to a database.
A MySQL DB on AWS RDS would be the same as the one that you'd install yourself on some machine. So, it isn't going to be different or slower just because it is in the cloud.
If you scrape some data and process it only once, then there is no point in introducing a DB in between. But if your scraper is slow and you process scraped data multiple times, then storing it in a DB should improve latencies. That is because the latencies of a DB read will be much lesser than that of scraping (assuming you design your DB schema properly; your hosts are in the same availability zones, or at least regions, as your DB etc.).
For e.g., if scraping a webpage takes ~10s and you process the scraped data twice, it'd take you ~20s if you don't have a DB. If you have a DB which has latencies of ~500ms you'd only take ~11s.

Is it a best practice in continuous deployment to deploy to all production servers immediately or just to a subset at first?

We use CD in our project and since the application is used world wide we use more than one data center (one per region). Each data center hosts an isolated instance of the application (each regional deployment uses its own DB, application server etc). Data is not shared between data centers.
There are two different approaches that we can take:
Deploy to integration server (I) where all tests are run, then
deploy to the first data center A and then (once the deployment to A
is finished) to a data center B.
Region A has a smaller user base and to prevent outage in both A
and B caused by a software bug that was not caught on the
integration server (I), an alternative is to deploy to the integration server and then "bake" the code in
region A for 24 hours and deploy the application to data center B
only after it was tested in production for 24 hours. Does this
alternative go against CI best practices since there is no
"continuous" deployment in this case?
There is a big difference between Continuous Integration and Continuous Deploy. CI is for a situation where multiple users work on the same code base and integration tests are run repetitevely for multiple check ins so that integration failures are handled quickly and programatically. Continuous deploy is a pradigm which encapsulate deploying quickly and programmatically accepting your acceptance tests so that you deploy as quick as feasible (instead of the usual ticketing delays that exist in most IT organizations). The question you are asking is a mix for both
As per your specific question, your practice does go against the best practices. If you have 2 different data centers, you have the chance of running into separate issues on the different data centers.
I would rather design your data centers to have the flexibility to switch between current and next version. That way , you can deploy your code to the "next" environment, run your tests there. Once your testing confirms that your new environment is good to go , you can switch your environments from current to next .
The best practice, as Paul Hicks commented, is probably to decouple deployment from feature delivery with feature flags. That said, organizations with many production servers usually protect their uptime by either deploying to a subset of servers ("canary deployment") and monitoring before deploying to all, or by using blue-green deployment. Once the code is deployed, one can hedge one's bet further by flipping a feature flag only for a subset of users and, again, monitoring before exposing the feature to all.

Create SidekiqWorkers on a server by receiving information from another server

Sorry If my title didn't made any sense, but my problem is that, I have one application which is hosted on a server, the application uses a database which is hosted on the same server, also the same server is using sidekiq to process a lot of queues.
One problem, is that a lot of memory is used, and everything works very slow, and even if I have a 8 core processor, I can't take advantage of it when processing queues because the application was developed on MRI and is using Unicorn.
I was thinking at moving all the part which is used to process the queues on a different server, there install Puma, and jRuby and process the queues in there(this process should be a lot faster by taking advantage of multiple cores.
All the data processed by sidekiq, is coming from a Database and is stored in a Database(currently is the same database from where it takes the info and where is storing the data). Most of the sidekiq workers are receiving some information, and are using that information to get other informations so they need to connect to the same db as the app.
What will be a good solution, to serve the same database to 2 different applications?
And is it a good idea to have another server with Puma and jRuby installed for sidekiq only(maybe other things in the future)?
Thank you
Even with MRI and Unicorn you can take advantage of multiple cores: Just start unicorn multiple times or use the clustered mode provided by Puma. Same for Sidekiq. No need to switch to JRuby right away.
Accessing the database from multiple application is no problem. But do yourself a favor and use a dedicated database server. Makes added more application servers way easier.

Central data management for custom desktop applications

I have a background in web programming where both the data and the code live on the server. Web hosts with mysql or the like are plentiful and cheap so using the application from multiple pcs was never a problem.
However I'm considering switching to building desktop applications but the only factor that annoys me is the syncing of data across the many pcs I use. I was thinking of perhaps setting up a light amazon ec2 instance with a postgresql on it and having my desktop applications use that.
I have a few questions:
I'm curious as to what latency I might expect by running the database on ec2 instead of the local network, any experience or insight is appreciated.
Are there better/more obvious/cheaper solutions?
I've looked at the pricing and it seems to come down to 24.48$ per month for a yearly contract. Whilst not really expensive, it is not exactly cheap either. At what point does it become more interesting to run a local server?
I'm obviously not using my applications for large parts of the day (sleep, work,...). I was wondering if I can have the amazon server go into a sort of "sleep" mode and wake up when poked. An initial delay for the first desktop application is acceptable. The reason behind this behavior would be to save money on the instance if it is only actually needed for 10% of the day.
I welcome any feedback at all on how this problem is best tackled.
This could get ugly. Every single query you do will have latency associated with it. If you have a lot of queries, this can add up very fast. So keep your query count low, and try to pre-fetch and cache data when possible.
Not enough information to answer that question.
Depends on the cost of your local server. Keep in mind that you will need to pay for electricity to keep it on.
You can stop your instance when you are not needing it, with the exception of high utilization reservations, you wont get billed when its in stopped state. With high utilization reservations you will still pay the full cost.

Resources