Defining Continuous Deployment - continuous-deployment

Questions we get a lot:
What is Continuous Deployment?
What do we deploy 'continuously'?
How does it differ from Continuous Delivery? Is there a difference?
My attempt at answering these questions is in my blog post: What is Continuous Deployment?
What is the generally accepted definition and distinction between Continuous Delivery and Continuous Deployment?

Continuous delivery is a series of practices designed to ensure that code can be rapidly and safely deployed to production by delivering every change to a production-like environment and ensuring business applications and services function as expected through rigorous automated testing. Since every change is delivered to a staging environment using complete automation, you can have confidence the application can be deployed to production with a push of a button when the business is ready.
Continuous deployment is the next step of continuous delivery: Every change that passes the automated tests is deployed to production automatically. Continuous deployment should be the goal of most companies that are not constrained by regulatory or other requirements.
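The distinction above can be illustrated with a minimal sketch. The names `Change` and `release_decision` are hypothetical and not taken from any particular CI/CD tool; the only point is that both modes share the same automated gate and differ in whether a human approves the final push to production.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A hypothetical change moving through the pipeline."""
    commit: str
    tests_passed: bool


def release_decision(change: Change, mode: str, approved: bool = False) -> str:
    """Return where a change ends up.

    mode is "delivery" (a human approves the production push) or
    "deployment" (every change that passes the tests goes to production).
    """
    if mode not in ("delivery", "deployment"):
        raise ValueError(f"unknown mode: {mode}")
    if not change.tests_passed:
        return "rejected"
    if mode == "deployment":
        return "production"
    # Continuous delivery: the change always reaches staging, and reaches
    # production only once the business approves the push.
    return "production" if approved else "staging"
```

With `mode="delivery"`, a passing change stays in staging until `approved=True` is given; with `mode="deployment"`, the same change goes straight to production.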

I would say continuous deployment is one of the steps of a continuous delivery system. Martin Fowler of ThoughtWorks has written a very enlightening series of blog posts on Continuous * (integration, delivery, testing, etc.). I would suggest you read through it.
There are a lot of facets to the entire Continuous * ecosystem (build, integration, testing, deploy, UAT, delivery) that cannot be covered in a single comment or answer thread. It surely deserves its own space on a blog/wiki/bliki. You should probably read a few blogs and search for understanding there.

Related

Can Snowflake be used to mitigate application failure for business continuity?

I would like your opinions or experiences around the following possible solution idea. I know Snowflake is primarily a data analytics platform. But why could we not use it for some creative scenarios like business continuity?
Problem
Imagine an application that supports a critical business process. There is a risk that the application could become unavailable for an extended period. The application in this case is a SaaS solution by a reputable vendor, Salesforce. So it does not go down often. And when it does, they normally restore it in less than a day. But the business process is a critical medical logistics process - meaning if a transaction is delayed for a few days, lives may be lost.
Background
Our transaction volumes are moderate. We probably serve 25 new patients per day, with a few hundred interactions each day to support those. In the event of an outage, a subset of those might need immediate manual intervention to keep things moving. Others might be able to wait a couple of days.
We already use Snowflake to store replicas of the application's data. We use Looker to write analytics reports.
Proposed Solution
Write reports that expose critical data that may be needed if the primary application fails. Then, when the primary application fails, users can view reports using the latest replicated data to enable manual activities to keep things going until the primary application is restored to working order.
If data changes are needed, they must be written down somewhere and then applied to the application when its availability is restored.
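The "write it down and apply it later" step could be sketched as a simple journal. This is only an illustration under assumptions: `PendingChange`, `OutageJournal` and the `apply` callback are hypothetical names, and the callback stands in for whatever mechanism would actually write to the primary application once it is back.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class PendingChange:
    """One manual change noted down while the primary application is down."""
    record_id: str
    field_name: str
    new_value: str
    recorded_at: datetime


@dataclass
class OutageJournal:
    entries: List[PendingChange] = field(default_factory=list)

    def record(self, record_id: str, field_name: str, new_value: str) -> None:
        """Note a change in the order it happened."""
        self.entries.append(
            PendingChange(record_id, field_name, new_value,
                          datetime.now(timezone.utc))
        )

    def replay(self, apply: Callable[[PendingChange], bool]) -> List[PendingChange]:
        """Apply entries in recorded order; keep and return the ones that fail."""
        failed = [entry for entry in self.entries if not apply(entry)]
        self.entries = failed
        return failed
```

Entries that fail to apply stay in the journal, so `replay` can be called again after the cause has been fixed.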
Your only issue could be latency: as it stands today, Snowflake is built for OLAP workloads, not OLTP workloads.
If the latency you get when running queries from Snowflake is fine, then you have a valid use case.
Snowflake is used as an application backend, particularly when the queries are about historical analysis and a latency of a few seconds, as opposed to an immediate response, is acceptable.
See: https://www.snowflake.com/workloads/data-applications/

Can vaadin-flow applications be run (reliably) on GAE standard environment? If not, what has to be done to make them run (reliably)?

This is a wrap-up of several open questions in the vaadin forum (which is moving to stack-overflow) - see 2, 3, 4, ...
The basic question is: "How can one run a (recent V14 LTS or V19/20) vaadin-flow application reliably on the Google App Engine (GAE) standard environment, with manual or automatic scaling (i.e. no Docker, no Google Compute Engine), without experiencing constant refreshing of components?"
Vaadin has a tutorial for deploying Vaadin applications to GAE flexible (not standard, which is what this question is about). This tutorial doesn't mention that one might run into trouble when GAE switches the server instance. Even GAE mentions Vaadin as one of the supported frameworks.
Old Vaadin 8 release notes state, that support for GAE has been dropped.
According to the questions in the Vaadin forum, vaadin-flow on GAE will lead (or at least has led in older versions of Vaadin) to constant refreshing of components and/or loss of session state.
If one uses manual scaling, one's application can rely on the state of the memory over time, so vaadin-flow applications should not be troubled by switches of the server instance (which will eventually occur when an instance is shut down due to an error or for maintenance reasons).
So the first question is: "When running a vaadin-flow application on GAE standard with manual scaling, will it lead to constant refreshing of components and/or loss of session state even when the instance is not switched?"
If that works, then vaadin-flow is fine on GAE standard when one needs neither dynamic scaling nor high availability. If that does not work, vaadin-flow on GAE standard would be a no-go for any type of application (and one needs to switch to another provider or to GAE flexible and Docker).
The next question is: "What has to be done to make vaadin-flow application on GAE standard run reliably, even when instances are scaled or switched due to maintenance?"
The following suggestions were made in the forum, but no one ever confirmed that they work:
One could have set <sessions-enabled>true</sessions-enabled> for Java 8. This setting no longer exists for Java 11. Even when the number of instances changes or instances are restarted, this could have been the solution, since session data is stored in memcache, which is available across all instances.
When instances are moved or shut down, Google sends a shutdown notification, so one could implement a shutdown hook and try to serialize all session state (if Vaadin provides a way to serialize it manually and automatically deserialize it when another instance takes over).
Has anyone found a reliable solution for this?
I think it's not possible.
According to https://stackoverflow.com/a/10640357/377320 GAE uses Datastore to store the session information and only synchronizes objects set via session.setAttribute(). However, according to https://mvysny.github.io/vaadin-14-session-replication/ Vaadin doesn't call setAttribute(), and it does so on purpose.
That means that GAE won't synchronize the state properly, but the requests will land on random nodes (since session affinity/sticky sessions are not supported on GAE), so some requests will reach nodes with obsolete Vaadin state. That leads me to believe that the most likely outcome is that Vaadin components will constantly lose state, and Vaadin will frequently attempt to resync the UI state by performing browser reloads.
Vaadin Flow works best with sticky sessions and long-running servers while GAE sounds like an opposite of that.
I believe your best bet is to use Vaadin Fusion with a stateless server.
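To illustrate why non-sticky routing loses in-memory UI state, here is a toy simulation. It is not Vaadin or GAE code: `Node` and `simulate` are hypothetical, each node keeps a per-session click counter only in its own memory, and a simple round-robin rotation stands in for the routing described above (real routing is not round-robin).

```python
import itertools


class Node:
    """A server instance that keeps UI state only in its own memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session id -> click counter

    def handle_click(self, session_id):
        count = self.sessions.get(session_id, 0) + 1
        self.sessions[session_id] = count
        return count


def simulate(nodes, clicks, sticky):
    """Return the counter value the user sees after each click."""
    rotation = itertools.cycle(nodes)  # stand-in for non-sticky routing
    pinned = nodes[0]                  # sticky: always the same node
    seen = []
    for _ in range(clicks):
        node = pinned if sticky else next(rotation)
        seen.append(node.handle_click("session-1"))
    return seen
```

With two nodes and four clicks, the sticky run counts 1, 2, 3, 4, while the non-sticky run shows 1, 1, 2, 2 because each node only knows about the clicks it handled itself.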
Regarding serializing all session state (eg. on a shutdown hook): this requires that all of your stateful objects, like all of your data beans, Vaadin Flow views, compositions etc., are serializable. In other words, all classes must implement java.io.Serializable. This can be done, as all classes from Vaadin should be serializable, but it's up to the application developer to make sure that all custom instantiable classes in the codebase (and any instantiable classes in the dependencies) can be serialized and deserialized. Based on practical experience, this is not a trivial requirement - it usually drives the application's architecture by limiting or changing the design patterns used in the code. Making an existing codebase fully serializable will likely incur significant refactoring work. This is one of the big reasons why sticky sessions (which are not available in GAE) are the recommended approach for deploying Vaadin Flow applications in multi-node environments.
My application has been running for months now and I must say that it has worked just fine. BUT we never needed more than one instance (basic_scaling: max_instances: 1), so no sticky sessions were needed, and my conclusions might not be valid if one needs more than one instance:
both F- and B- instance-class worked fine - not a single switch has occurred in the past, no sessions were lost
if the instance needs to be re-started due to idle-timeout, this is just a matter of ~10 seconds which is ok for the test-instance. For production I would recommend manual scaling with 24 instance-hours per day (so the instance will never be re-started)
Session-affinity can be set in app.yaml according to https://cloud.google.com/appengine/docs/flexible/java/using-websockets-and-session-affinity. So it might as well work with multiple instances

Is it a best practice in continuous deployment to deploy to all production servers immediately or just to a subset at first?

We use CD in our project and since the application is used world wide we use more than one data center (one per region). Each data center hosts an isolated instance of the application (each regional deployment uses its own DB, application server etc). Data is not shared between data centers.
There are two different approaches that we can take:
Deploy to integration server (I) where all tests are run, then deploy to the first data center A and then (once the deployment to A is finished) to a data center B.
Region A has a smaller user base and to prevent outage in both A and B caused by a software bug that was not caught on the integration server (I), an alternative is to deploy to the integration server and then "bake" the code in region A for 24 hours and deploy the application to data center B only after it was tested in production for 24 hours. Does this alternative go against CI best practices since there is no "continuous" deployment in this case?
There is a big difference between continuous integration and continuous deployment. CI is for a situation where multiple developers work on the same code base and integration tests are run repeatedly for every check-in, so that integration failures are handled quickly and programmatically. Continuous deployment is a paradigm of deploying quickly by having your acceptance tests approve changes programmatically, so that you deploy as quickly as feasible (instead of going through the usual ticketing delays that exist in most IT organizations). The question you are asking mixes the two.
As for your specific question, your practice does go against the best practices. If you have two different data centers, you have the chance of running into separate issues in the different data centers.
I would rather design your data centers to have the flexibility to switch between the current and the next version. That way, you can deploy your code to the "next" environment and run your tests there. Once your testing confirms that the new environment is good to go, you can switch your environments from current to next.
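The current/next switch described here could look roughly like the following sketch. `BlueGreenRouter` and the `smoke_test` callback are hypothetical; in practice the switch would be a load balancer or DNS change rather than an attribute assignment.

```python
class BlueGreenRouter:
    """Tracks which of two environments is live and which is idle."""

    def __init__(self, current_version):
        self.live = "blue"
        self.versions = {"blue": current_version, "green": None}

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy_next(self, version):
        """Install the new version on the idle environment only."""
        self.versions[self.idle] = version

    def switch(self, smoke_test):
        """Promote the idle environment only if its tests pass."""
        candidate = self.versions[self.idle]
        if candidate is None or not smoke_test(candidate):
            return False
        self.live = self.idle
        return True

    def live_version(self):
        return self.versions[self.live]
```

Until `switch` succeeds, users keep seeing the old version, and the previous environment stays around afterwards as a quick rollback target.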
The best practice, as Paul Hicks commented, is probably to decouple deployment from feature delivery with feature flags. That said, organizations with many production servers usually protect their uptime by either deploying to a subset of servers ("canary deployment") and monitoring before deploying to all, or by using blue-green deployment. Once the code is deployed, one can hedge one's bet further by flipping a feature flag only for a subset of users and, again, monitoring before exposing the feature to all.
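Flipping a feature flag for only a subset of users is often done by hashing each user into a stable bucket. The sketch below assumes that approach; `bucket` and `is_enabled` are hypothetical helpers, not the API of any specific feature-flag product.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a user to a stable bucket in 0..99 for the given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    """Enable the flag for users whose bucket falls below the rollout percentage."""
    return bucket(user_id, flag) < rollout_percent
```

Because the bucket depends only on the flag name and the user ID, raising `rollout_percent` from 10 to 50 keeps the feature on for everyone who already had it and only adds new users.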

Web-Application Deployment Process, Scrum Sprints, Git-flow, Versioning [closed]

We have a small Scrum team that develops a website with a lot of users. We would like to improve our development and release/deployment process, but there are a lot of concepts out there which, in our eyes, don't fit together perfectly. Since we're not the only company developing a website, I thought it might be a good idea to ask here ;)
First of all, the software is not in good shape, so things like continuous delivery or continuous deployment are not possible because of bad test coverage, and that's nothing we can change soon.
We have two-week sprints, so we develop new features or resolve bugs in that period. After the final sprint meeting we merge the feature branches into master (we use feature branches in git and pull requests for review and merging), do some testing, and deploy master to a public beta environment. There we usually find some bugs and resolve them, and after that we deploy master, including the beta fixes, to the production environment. After that we start our next sprint.
But that process is far from perfect. First of all, it's difficult to show features with all those branches in our sprint review meeting, and because we only have one master branch which gets deployed, we can't easily deploy hotfixes to different environments. Sometimes we need more time for testing in our beta environment, so we can't merge new features to master. The same goes for production: once deployed, we can't deploy hotfixes if we are already testing new features on beta. If we expect bugs in beta or production due to larger changes, development has to keep feature branches around for a longer period, so merging feature branches later on becomes more and more painful.
First we thought about long-running branches, like master for development, plus production and beta. That way we could merge whatever feature we want into any of the three branches. But we really like to work with pull requests (reviews, feedback and deleting the feature branch after merging); they are really nice to work with, but a pull request can only be applied to one target branch. So here we could merge to master without deleting the branch, and would then have to switch to another tool to merge the feature to beta or production, or create new pull requests for beta and production. It works that way, but it's not as nice a workflow as merging only to one master branch.
We also thought about git-flow (Vincent Driessen's branching model), which looks nice, but it seems to be better suited to software with traditional release cycles and versioning, and not 100% suited to web applications, which have no real versions but deploy everything that is ready after a sprint. It solves the hotfix problems and has an extra develop branch, but it requires release versions. So we could create a release branch, get problems sorted out, release it to production and delete the release branch. We could open a pull request for merging the release branch into master, but problems start if we want to release it to beta (we use Capistrano for deployment), because the branch changes every sprint. And what if we want to test features in our beta environment? We could use the release branch for beta, but this way a release to production has to wait until all features in the release/beta branch are ready. If testing of large features takes long, we can't ship small updates to production.
The next problem is versioning. We use Jira, and Jira likes to use versions for releases. We could use versions like "1, 2, 3"..., but should one sprint be one version? That doesn't feel right, because sprint planning is not the same as release planning, or should it be the same when developing a web application? Sometimes we develop features in a sprint which take longer and are released later, after being completed in the next 1-2 sprints. With git-flow, these changes could not be merged to develop until they are ready, because every release is branched from the develop branch. So this way, some feature branches are not merged for a long time and merging them becomes more and more difficult. It's the opposite of continuous integration.
And last but not least, the deployment and QA process does not fit Scrum very well. We don't have our own web operations team. When a sprint is finished, the product owner has to review the stories, and we have to test them, deploy them to beta/production, and fix the problems in them immediately, which also interrupts our next sprint. We are not sure when the right time is to merge the feature branches. Before the review meeting? That way we could have merged features which are not accepted by the product owner into the branch which should be released soon. But if we don't merge them, we have to demonstrate each feature in its own branch and merge every accepted feature branch afterwards, so integration and testing could only start after the sprint review meeting. Integration and testing could take days and need development resources, so when should the next sprint start? After the release to production? But then we could not start a sprint every two weeks, and not every developer is needed for integration and QA, so what should the others work on? Currently we start the planning meeting for the next sprint immediately after the review meeting of the last sprint, but if we do it this way we are not sure how much time we need for the release, which draws resources from the sprint...
So how do you release web applications?
What workflow do you use?
What's the best way if we want to integrate pull requests in the workflow?
How do you integrate QA and deployment into the scrum workflow?
I'll try to make a backlog and prioritize it:
Prio 1: Refit your Definition of Done.
I think you are fooling yourselves about how many features you can develop in a sprint.
Scrum defines the result of a sprint as a "useful increment of software".
What you produce is a "testable increment of software".
You have to include the testing in the sprint. All stories that are not tested and do not have a "production ready" stamp on them are simply not DONE.
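The effect of such a Definition of Done on what a sprint really delivers can be shown with a small sketch. `Story` and `velocity` are hypothetical names, and the three flags are just one possible reading of "tested and production ready".

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    points: int
    coded: bool = False
    tested: bool = False
    production_ready: bool = False

    def is_done(self) -> bool:
        """A story counts as done only when every criterion is met."""
        return self.coded and self.tested and self.production_ready


def velocity(stories) -> int:
    """Sum the points of stories that are actually done."""
    return sum(story.points for story in stories if story.is_done())
```

A story that is coded but untested contributes nothing to `velocity`, which makes it visible how much of a sprint's output is really shippable.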
Prio 1: Do your retrospectives
Get the team together and talk about this release mess. You need a lightweight way to test stories. You and your team know best what's available and what is hindering you.
If you are already doing those retrospectives, this one is done. ;)
Prio 1: Automated Tests
How can you get any story done without unit tests?
Start writing them. I would propose that you search for the most delicate/important component in the system and bring test coverage (EclEmma/JaCoCo helps) up to at least 60% for this component.
Commit a whole sprint to this. I don't know your DoD, but this is work that has been put off and now has to be made up.
If a manager tells you it's impossible, remind him of the good old waterfall days, when two weeks of development would not have raised an eyebrow, so why should it now...
Prio 2: Integration testing
You have unit tests in place and you do some additional manual testing in the sprints.
What you want is to include the integration test (merging with master) in the sprint.
This has to happen, because you have to(!) fix bugs in the sprint in which they were introduced. So early integration has to happen.
To get there it is important that during a sprint you work by priorities: the highest-priority task/story has to be tackled by the whole team first. The goal is to have it coded and unit tested; then you hand it over to your tester for smoke testing. Bugs found have the priority of the story. When it works well enough, this part is deployed to the integration environment and tested there.
(Of course, most of the time not all team members can work effectively on one story, so another story is processed in parallel. But whenever something blocks your Prio 1 story, the whole team has to help there.)
Prio 2: Reduce new features, increase quality
To include more quality you have to lower expectations regarding the number of new features. Working harder is not what Scrum tells you to do. It tells you to deliver more quality, because this will speed up your development in the medium term.
Regarding QA and deployment, I suggest the following:
Commit your code early to the main branch, and commit often. I am totally unfamiliar with git so I can't comment on how you're using it but I've always worked off the main trunk (using Subversion) with all the developers committing new code to the trunk. When we create a release we branch the trunk to create a release branch.
If you haven't done so already then set up a continuous integration system (we use TeamCity, but it'll depend to some degree on your stack). Get what unit tests you do have running at each commit.
Create a definition of 'done' and use it. This will stop you from being in the situation of testing features implemented during the previous sprint. Make sure your definition of done includes the passing of acceptance tests that have been co-written by your Product Owner; in my opinion the PO should not be rejecting features during the Sprint Review.

Is GAE a viable platform for my application? (if not, what would be a better option?)

Here's the requirement at a very high level.
We are going to distribute desktop agents (or browser plugins) to collect certain information from tons of users (in thousands or possibly millions down the road).
These agents collect data and periodically upload it to a server app.
The server app will allow for analyzing collected data (filter, sort etc based on 4-5 attributes) and summarize in form of charts etc.
We should also be able to export some of the collected data (csv or pdf)
We are looking for a platform to host the server app. GAE seems attractive because of its low administrative cost and scalability (as the user base increases, the platform will handle the scale... hopefully!).
Is GAE a viable option for us?
One important consideration is that sometimes the volume of uploads from the agents can exceed 50MB per upload cycle. We will have users in places where Internet connections could be very slow too. Apparently GAE has a limit on the duration a request can last. The upload volume may cause the request (transferring data from an agent to the server) to last longer than 30 seconds. How would one handle such a situation?
Thanks!
The time of the upload is not considered part of the script execution time, so no worries there.
Google App Engine is very good at performing a vast number of smaller jobs, but not so good at complex, long-running background jobs (because of the 30-second limit plus an even shorter database connection time limit). So GAE would probably be a very good platform for GATHERING the data, but not for actually ANALYZING it. You would probably want to separate the two.
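If one nevertheless wanted each upload request to stay small, one option (an assumption on my part, not something the answers here prescribe) is to split the agent's payload into numbered chunks and reassemble them on the server. `split_into_chunks` and `reassemble` are hypothetical helpers, and the transport itself is left out.

```python
from typing import Iterable, Iterator, Tuple


def split_into_chunks(payload: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Agent side: yield (index, data) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, start in enumerate(range(0, len(payload), chunk_size)):
        yield index, payload[start:start + chunk_size]


def reassemble(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Server side: order chunks by index and join them, whatever order they arrived in."""
    return b"".join(data for _, data in sorted(chunks))
```

Each chunk then becomes one small job of the kind GAE handles well, and a chunk that fails on a slow connection can be resent on its own.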
We went ahead and implemented the first version on GAE anyway. The experience has been very much what is described here: http://www.carlosble.com/?p=719
For a proof-of-concept prototype, what we have built so far is acceptable. However, we have decided not to go with GAE (at least in its current shape) for the production version. The pains somewhat outweigh the benefits in our case.
The problems we faced were numerous. Unlike in my experience with J2EE stacks, when you run into an issue on GAE, it is often a dead end. Workarounds, if you can find one, are very complicated and ugly.
By writing good prototypes one can figure out whether GAE is right for the solution being built. The hype is a problem, however: many newcomers are going to get overly excited about GAE and end up failing badly, because they will choose GAE for all kinds of purposes that it is not suitable for.

Resources