How often do Virtual Machines migrate in Google Cloud? - google-app-engine

I just read an interesting article about Google Transparent Maintenance and now I am thinking about how often a virtual machine migrates per day.
I didn't find any statistics, so I am wondering if anybody can give me some information about this?
Another point that comes to mind is using live migration not only for maintenance but also for saving energy, by consolidating virtual machines and powering off physical hosts. I found many papers discussing this topic, so it seems to be a big subject in current research.
Does anybody know whether any cloud provider is already doing something like this?

How often do scheduled infrastructure maintenance events happen?
Infrastructure maintenance events don't have a set interval between occurrences, but generally happen once every couple of months.
How do I know if an instance will be undergoing an infrastructure maintenance event?
Shortly before a maintenance event, Compute Engine changes a special attribute in a virtual machine's metadata server before any attempts to live migrate or terminate and restart the virtual machine as part of a pending infrastructure maintenance event. The maintenance-event attribute will be updated before and after an event, allowing you to detect when these events are imminent. You can use this information to help automate any scripts or commands you want to run before and/or after a maintenance event. For more information, see the Transparent maintenance notice documentation.
SOURCE: Compute Engine Frequently Asked Questions.
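To illustrate what such a check might look like, here is a minimal C# sketch that polls the metadata server's maintenance-event attribute. The endpoint and Metadata-Flavor header are documented by Compute Engine; the polling interval and the handling logic are just placeholders.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class MaintenanceWatcher
{
    // Compute Engine metadata endpoint for maintenance events.
    // The value is typically "NONE" when no event is pending.
    private const string Url =
        "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event";

    static async Task Main()
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Metadata-Flavor", "Google");

        while (true)
        {
            string status = await client.GetStringAsync(Url);
            if (status != "NONE")
            {
                // Placeholder: run whatever preparation you need before the event.
                Console.WriteLine($"Maintenance event pending: {status}");
            }
            await Task.Delay(TimeSpan.FromSeconds(10)); // polling interval is arbitrary here
        }
    }
}
```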

Related

Can Snowflake be used to mitigate application failure for business continuity?

I would like your opinions or experiences around the following possible solution idea. I know Snowflake is primarily a data analytics platform. But why could we not use it for some creative scenarios like business continuity?
Problem
Imagine an application that supports a critical business process. There is a risk that the application could become unavailable for an extended period. The application in this case is a SaaS solution by a reputable vendor, Salesforce. So it does not go down often. And when it does, they normally restore it in less than a day. But the business process is a critical medical logistics process - meaning if a transaction is delayed for a few days, lives may be lost.
Background
Our transaction volumes are moderate. We probably serve 25 new patients per day, with a few hundred interactions each day to support those. In the event of an outage, a subset of those might need immediate manual intervention to keep things moving. Others might be able to wait a couple of days.
We already use Snowflake to store replicas of the application's data. We use Looker to write analytics reports.
Proposed Solution
Write reports that expose critical data that may be needed if the primary application fails. Then, when the primary application fails, users can view reports using the latest replicated data to enable manual activities to keep things going until the primary application is restored to working order.
If data changes are needed, they must be written down somewhere and then applied to the application once its availability is restored.
Your only issue could be latency; as it stands today, Snowflake is built for OLAP workloads, not OLTP workloads.
If the latency you get when running queries from Snowflake is acceptable, then you have a valid use case.
Snowflake can be used as an application backend, particularly when the queries are about historical analysis and a latency of a few seconds (as opposed to immediate results) is acceptable.
See: https://www.snowflake.com/workloads/data-applications/
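If the latency turns out to be acceptable, the fallback reports could even be queried directly from application code rather than only through Looker. A minimal sketch, assuming the Snowflake .NET connector (the Snowflake.Data NuGet package) and a hypothetical PATIENT_INTERACTIONS replica table:

```csharp
using System;
using Snowflake.Data.Client; // NuGet: Snowflake.Data (Snowflake's .NET connector)

class ContinuityReport
{
    static void Main()
    {
        using var conn = new SnowflakeDbConnection();
        // Connection parameters are placeholders; substitute your own account details.
        conn.ConnectionString =
            "account=myaccount;user=report_user;password=***;db=REPLICA_DB;schema=PUBLIC;warehouse=REPORT_WH";
        conn.Open();

        using var cmd = conn.CreateCommand();
        // Hypothetical replica table holding the data needed for manual processing.
        cmd.CommandText =
            "SELECT PATIENT_ID, STATUS, NEXT_ACTION FROM PATIENT_INTERACTIONS WHERE STATUS = 'PENDING'";

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            Console.WriteLine($"{reader["PATIENT_ID"]}: {reader["NEXT_ACTION"]}");
        }
    }
}
```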

Improving availability for Azure SQL?

What's a good strategy to maintain availability with Azure SQL? We've been noticing way too many service interruptions, with error messages that basically kill our application entirely. The SLA is far from 99.9%, and honestly we're not interested in getting a refund, just reliable availability so our customers don't experience app outages. We're actually having better uptimes with a single IaaS VM running SQL than relying on Azure SQL (which totally blows our minds).
Anyway, leaving all those observations behind, programmatically, what is the most economical approach to having better than the advertised 99.99% availability (say an order of magnitude better - 99.999% - 5 mins downtime/year) with Azure SQL? Any specific data access programming pattern(s) and operating procedures anyone recommends?
EDIT: We're already using the Microsoft EntLib 6.0 Transient Fault Handling Application Block library ... 10 retries with 100ms inter-attempt timings. However, it's not 'transient' when these outages are 5+ hours long ...
Have a look at the passive and active geo-replication features of SQL Database -
http://azure.microsoft.com/blog/2014/07/12/spotlight-on-sql-database-active-geo-replication/
Active geo-replication seamlessly creates additional copies of the database in another data centre, which you can fall back to in case of an issue in a particular data centre.
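On the application side, one way to combine retries with geo-replication is to retry against the primary and then fall back to a readable secondary for read-only work. A rough sketch (the connection strings and retry counts are placeholders, and this is not the EntLib block itself):

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

static class ResilientSql
{
    // Placeholder connection strings: primary database and its geo-replicated secondary.
    const string Primary   = "Server=tcp:primary.database.windows.net;Database=App;...";
    const string Secondary = "Server=tcp:secondary.database.windows.net;Database=App;...";

    public static SqlConnection OpenWithFallback(int retries = 3)
    {
        for (int attempt = 0; attempt < retries; attempt++)
        {
            try
            {
                var conn = new SqlConnection(Primary);
                conn.Open();
                return conn;
            }
            catch (SqlException)
            {
                Thread.Sleep(100 * (attempt + 1)); // simple backoff between attempts
            }
        }

        // Primary unreachable: fall back to the secondary (read-only queries only).
        var fallback = new SqlConnection(Secondary);
        fallback.Open();
        return fallback;
    }
}
```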
We use stovepiped solutions in each datacenter with an external load-balancer/failover solution. URLs are created to monitor data center faults and fail over to another region.
In other words, we built our own business continuity solution to guarantee 5x9s.
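The monitoring URLs in a setup like that can be as simple as a health-check endpoint that the external load balancer probes. A minimal ASP.NET MVC-style sketch (the connection string and route are hypothetical):

```csharp
using System.Data.SqlClient;
using System.Web.Mvc;

public class HealthController : Controller
{
    // GET /health - probed by the external load balancer; a non-200 response
    // tells it to fail traffic over to another region.
    public ActionResult Index()
    {
        try
        {
            using (var conn = new SqlConnection("Server=tcp:regional-db;...")) // placeholder
            {
                conn.Open();
                using (var cmd = new SqlCommand("SELECT 1", conn))
                {
                    cmd.ExecuteScalar();
                }
            }
            return new HttpStatusCodeResult(200, "Healthy");
        }
        catch (SqlException)
        {
            return new HttpStatusCodeResult(503, "Database unreachable");
        }
    }
}
```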

Profiling and output caching in ASP.NET MVC

So I was recently hired by a big department of a Fortune 50 company, straight out of college. I'll be supporting a brand new ASP.NET MVC app - over a million lines of code written by contractors over 4 years. The system works great with up to 3 or 4 simultaneous requests, but becomes very slow with more. It's supposed to go live in 2 weeks ... I'm looking for practical advice on how to drastically improve the scalability.
The advice I was given in Uni is to always run a profiler first. I've already secured a sizeable tools budget with my manager, so price wouldn't be a problem. What is a good or even the best profiler for ASP.NET MVC?
I'm also looking at adding caching. There is currently no second-level or query cache configured for NHibernate. My current thinking is to use Redis for that purpose. I'm also looking at output caching, but unfortunately the majority of the users will log in to the site. Is there a way to still cache parts of the pages served by MVC?
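On that last question: one common ASP.NET MVC approach is "donut-hole" caching with child actions, where the expensive, user-independent fragments are cached while the surrounding page stays personalized. A minimal sketch with a hypothetical product-list partial:

```csharp
using System.Web.Mvc;

public class CatalogController : Controller
{
    // Rendered from a layout or view via @Html.Action("ProductList", "Catalog").
    // The cached fragment is shared by all users, so it must not contain
    // anything user-specific; the rest of the page is still rendered per request.
    [ChildActionOnly]
    [OutputCache(Duration = 300)]
    public ActionResult ProductList()
    {
        var products = LoadProductsFromDatabase(); // hypothetical, expensive query
        return PartialView("_ProductList", products);
    }

    private object LoadProductsFromDatabase()
    {
        // Placeholder for the real NHibernate/repository call.
        return new[] { "Widget", "Gadget" };
    }
}
```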
Do you have any monitoring or instrumentation set up for the application? If not, I would highly recommend starting there. I've been using New Relic for a few years with ASP.NET apps and have been very happy with it.
Right off the bat you get a nice graph of request response times, broken down into 3 kinds of tasks that contribute to the response time:
.NET CLR - Time spent running .NET code
Database - Time spent waiting on SQL requests
Request Queue - Time spent waiting for application workers to become available
It also breaks down performance by MVC action so you can see which ones are the slowest. You also get a breakdown of performance per database query. I've used this many times to detect procedures that were way too slow for heavy production loads.
If you want to, you can have New Relic add some unobtrusive JavaScript to your pages that lets you instrument browser load times. This helps you figure out things like "my users outside North America spend on average 500ms loading images. I need to move my images to a CDN!"
I would highly recommend you use some instrumentation software like this. It will definitely get you pointed in the right direction and help you keep your app available and healthy.
SQL Server Profiler is a handy tool for watching how apps communicate with your database and debugging odd behaviour. It's not a long-term solution for performance instrumentation, given that it puts a load on your server and the results require quite a bit of laborious processing and digestion to paint a clear picture for you.
Random thought: check your application pool configuration and keep an eye out in the event log for too many recycling events. When an application pool recycles, it takes a long time to become responsive again. It's just one of those things that can kill performance, and you can rip your hair out trying to track it down. Improper recycling settings bit me recently, so that's why I mention it.
For NHibernate analysis (session queries, caching, execution time) you could use the HibernatingRhinos NHibernate Profiler. It's developed by the same people who develop NHibernate, so you know it will work really well with it.
Here is the URL for it:
http://hibernatingrhinos.com/products/nhprof
You could give it a try and decide if it helps you or not.
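If you do try it, wiring up NHibernate Profiler is essentially a one-line call at application startup. A minimal sketch; the namespace and method names below come from the HibernatingRhinos appender package, so treat the exact names as an assumption and check the product docs:

```csharp
using HibernatingRhinos.Profiler.Appender.NHibernate; // NHProf appender package (assumed name)

public static class ProfilerBootstrap
{
    public static void Start()
    {
        // Call once at startup (e.g. Application_Start), and only in
        // development/test builds; the appender streams query data to the
        // NHibernate Profiler UI.
#if DEBUG
        NHibernateProfiler.Initialize();
#endif
    }
}
```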

Central data management for custom desktop applications

I have a background in web programming where both the data and the code live on the server. Web hosts with MySQL or the like are plentiful and cheap, so using an application from multiple PCs was never a problem.
However, I'm considering switching to building desktop applications, and the one factor that annoys me is syncing data across the many PCs I use. I was thinking of setting up a light Amazon EC2 instance with PostgreSQL on it and having my desktop applications use that.
I have a few questions:
I'm curious as to what latency I might expect by running the database on EC2 instead of the local network; any experience or insight is appreciated.
Are there better/more obvious/cheaper solutions?
I've looked at the pricing and it seems to come down to $24.48 per month on a yearly contract. While not really expensive, it is not exactly cheap either. At what point does it become more interesting to run a local server?
I'm obviously not using my applications for large parts of the day (sleep, work,...). I was wondering if I can have the amazon server go into a sort of "sleep" mode and wake up when poked. An initial delay for the first desktop application is acceptable. The reason behind this behavior would be to save money on the instance if it is only actually needed for 10% of the day.
I welcome any feedback at all on how this problem is best tackled.
This could get ugly. Every single query you do will have latency associated with it. If you have a lot of queries, this can add up very fast. So keep your query count low, and try to pre-fetch and cache data when possible.
Not enough information to answer that question.
Depends on the cost of your local server. Keep in mind that you will need to pay for electricity to keep it on.
You can stop your instance when you are not using it; with the exception of high-utilization reservations, you won't get billed while it is in the stopped state. With high-utilization reservations you will still pay the full cost.
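The stop/start itself can be scripted so the desktop app (or a small helper) "pokes" the instance awake before connecting. A rough sketch, assuming the AWS SDK for .NET (AWSSDK.EC2), default credentials, and a placeholder instance id and region:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon;
using Amazon.EC2;
using Amazon.EC2.Model;

static class DbInstanceControl
{
    const string InstanceId = "i-0123456789abcdef0"; // placeholder

    public static async Task WakeAsync()
    {
        using var ec2 = new AmazonEC2Client(RegionEndpoint.EUWest1);
        await ec2.StartInstancesAsync(new StartInstancesRequest
        {
            InstanceIds = new List<string> { InstanceId }
        });
        // The instance takes a minute or so to boot and start PostgreSQL,
        // so the first connection attempt should retry for a while.
    }

    public static async Task SleepAsync()
    {
        using var ec2 = new AmazonEC2Client(RegionEndpoint.EUWest1);
        await ec2.StopInstancesAsync(new StopInstancesRequest
        {
            InstanceIds = new List<string> { InstanceId }
        });
    }
}
```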

Rich database frontend - how to correctly handle low quality networks?

I have very limited experience with database programming, and the applications I have written that access databases are simple ones :). Until now :(. I need to create a medium-size desktop application (is that called a rich client?) that will use a database on the network to share data between multiple users. Most probably I will use C# and MSSQL/MySQL/SQLite.
I have performed a few test runs and discovered that on low-quality networks database access is not so smooth. On one company's LAN there is a lot of data being transferred over the network and the servers are under constant load, so it is common for a simple INSERT or SELECT query to take 1-2 minutes or even fail with a timeout / network error.
Are there any best practices for handling such situations? Of course I can split my app into a GUI thread and a DB thread so that network problems don't lead to a frozen GUI. But what do I do with lots of network errors? Displaying them to the user too often would not be very good :(. I'm thinking about automatically creating a local copy of the database on each computer my app runs on: update the local database first and synchronize it in the background, simply retrying on network errors. This would allow the app to function even if the network has big lags / problems.
Any hints or buzzwords I can look into? Maybe there are best practices already available that I don't know about :)
Sorry, this is probably not the answer you are looking for, but you mention that a simple INSERT / SELECT could take 1-2 minutes or even fail with a timeout / network error.
To me this sounds like there may be a problem other than the network itself. If you're working on a corporate network, there would have to be insane levels of traffic for this sort of behaviour. I would do everything in your power to look at improving the network before proceeding. Can you post the result of a ping to the DB box?
If you're going to architect your application around this type of network, it will significantly alter the end product and could even result in a poor-quality product for other clients.
Depending upon the nature of the application, maybe look at implementing an async persistence queue and caching data on startup, or even embedding a copy of the DB in your application.
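To make the "async persistence queue" idea concrete, here is a very rough C# sketch: writes go into an in-memory queue immediately, and a background task keeps retrying to flush them to the database whenever the network cooperates. Real code would also persist the queue locally (e.g. to SQLite) so pending writes survive a restart; everything here is a placeholder, not a finished design.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading.Tasks;

public class WriteQueue
{
    private readonly ConcurrentQueue<string> _pendingSql = new ConcurrentQueue<string>();
    private readonly string _connectionString;

    public WriteQueue(string connectionString)
    {
        _connectionString = connectionString;
        Task.Run(FlushLoopAsync); // background flusher; the GUI thread never blocks on the network
    }

    // Called from the GUI thread: enqueue and return immediately.
    public void Enqueue(string sql) => _pendingSql.Enqueue(sql);

    private async Task FlushLoopAsync()
    {
        while (true)
        {
            if (_pendingSql.TryPeek(out var sql))
            {
                try
                {
                    using (var conn = new SqlConnection(_connectionString))
                    {
                        await conn.OpenAsync();
                        using (var cmd = new SqlCommand(sql, conn))
                        {
                            await cmd.ExecuteNonQueryAsync();
                        }
                    }
                    _pendingSql.TryDequeue(out _); // remove only after a successful write
                }
                catch (Exception)
                {
                    await Task.Delay(TimeSpan.FromSeconds(30)); // network problem: back off and retry
                }
            }
            else
            {
                await Task.Delay(TimeSpan.FromSeconds(1));
            }
        }
    }
}
```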
Even though async behaviour, queues, caching, and copying the database to each local instance will help address the symptoms, the underlying problem will remain. If the network really is that bad, then I'd raise it with their IT department or the project manager, and build some performance requirements from their side of things into the contract.
