How does Eclipse Che back up running workspaces? - eclipse-che

I had a look at the overall architecture of Eclipse Che from here.
https://www.eclipse.org/che/docs/che-7/che-architecture.html#high-level-che-architecture_che-architectural-elements
But there is no info on how container workspaces are persisted in case of a container crash or cluster machine crash. Is this handled by Kubernetes or by the Che workspace controller?

If you configure your workspaces to be persistent (and there is an option to have them be ephemeral, in which case there is no persistent storage), then a Persistent Volume Claim is made for the workspace, and that PVC is made available inside each container in the workspace, as a shared volume. Any files written to the shared volume by Che (or by any other method) will be persisted by the storage back-end - and that's a Kubernetes construct.
In Kubernetes, either a "StorageClass" is defined to say what to do with persistent volume claims (allocate a new Ceph block device for a volume, for example), or the cluster admin has pre-created Persistent Volumes (as, for example, NFS shares) which can, if they are available, be matched with the claim.
There is one important Che config option that affects this - Che can be configured to use a PVC per workspace (very wasteful if your source code and build files only amount to a few megabytes and you've allocated a 1 GB or 2 GB volume), or to use one large PVC shared by all workspaces (more of a security concern, but for most deployments this will be the convenient option). You can find information on this configuration in the section "how the Che server uses PVCs and PVs for storage": https://www.eclipse.org/che/docs/che-6/kubernetes-admin-guide.html
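As a rough sketch of how that option is set on a Kubernetes install (the property name, deployment name, and namespace below are assumptions and vary between Che versions and installers):

# Hedged sketch: switch the Che server's PVC strategy.
# "common" = one shared PVC for all workspaces; "per-workspace" = one PVC per workspace.
# The property name, deployment name ("che") and namespace ("che") may differ on your install.
kubectl set env deployment/che -n che CHE_INFRA_KUBERNETES_PVC_STRATEGY=per-workspace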

Related

K8s: which access mode should be used in a PersistentVolumeClaim for a database deployment?

I want to store the data from a PostgreSQL database in a PersistentVolumeClaim.
(on a managed Kubernetes cluster on Microsoft Azure)
And I am not sure which access mode to choose.
Looking at the available access modes:
ReadWriteOnce
ReadOnlyMany
ReadWriteMany
ReadWriteOncePod
I would say, I should choose either ReadWriteOnce or ReadWriteMany.
Thinking about the fact that I might want to migrate the database pod to another node pool at some point, I would intuitively choose ReadWriteMany.
Is there any disadvantage if I choose ReadWriteMany instead of ReadWriteOnce?
You are correct that for migrating the database pod between nodes, the access mode should be ReadWriteMany.
Generally, if you use access mode ReadWriteOnce on a multi-node cluster on Microsoft Azure and multiple pods need to access the database volume, Kubernetes will force all of those pods to be scheduled onto the node that mounts the volume first, and that node can become overloaded with pods. If you have a DaemonSet, where one pod is scheduled on each node, this poses a problem. In that scenario you are better off creating the PVC and PV with access mode ReadWriteMany.
Therefore:
if you want multiple pods, scheduled across multiple nodes, to have read and write access to the DB volume, use access mode ReadWriteMany
if you only need the pods/DB on one node and know for sure that it will stay that way, use access mode ReadWriteOnce
You should choose ReadWriteOnce.
I'm a little more familiar with AWS, so I'll use it as a motivating example. In AWS, the easiest kind of persistent volume to get is backed by an Amazon Elastic Block Storage (EBS) volume. This can be attached to only one node at a time, which is the ReadWriteOnce semantics; but, if nothing is currently using the volume, it can be detached and reattached to another node, and the cluster knows how to do this.
Meanwhile, in the case of a PostgreSQL database storage (and most other database storage), only one process can be using the physical storage at a time, on one node or several. In the best case a second copy of the database pointing at the same storage will fail to start up; in the worst case you'll corrupt the data.
So:
It never makes sense to have the volume attached to more than one pod at a time
So it never makes sense to have the volume attached to more than one node at a time
And ReadWriteOnce volumes are very easy to come by, but ReadWriteMany may not be available by default
This logic probably applies to most use cases, particularly in a cloud environment, where you'll also have your cloud provider's native storage system available (AWS S3 buckets, for example). Sharing files between processes is fraught with peril, especially across multiple nodes. I'd almost always pick ReadWriteOnce absent a really specific need to use something else.
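For reference, the access mode is just a field on the claim; a minimal sketch (the claim name and size are made up, and the storage class you use has to actually support the mode you request):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data        # illustrative name
spec:
  accessModes:
    - ReadWriteOnce          # switch to ReadWriteMany only if your storage class supports it
  resources:
    requests:
      storage: 10Gi
EOF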

Kubernetes volumes: when is StatefulSet necessary?

I'm approaching k8s volumes and best practices, and I've noticed that when reading the documentation it seems you always need to use a StatefulSet resource if you want persistence in your cluster:
"StatefulSet is the workload API object used to manage stateful applications."
I've worked through some tutorials; some of them use StatefulSet, others don't.
In fact, say I want to persist some data: I can have stateless Pods (even MySQL server Pods!) that use a PersistentVolumeClaim to persist the state. If I stop and restart the cluster, I can resume the state from the volume with no need for a StatefulSet.
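Roughly, the kind of setup I mean looks like this (a simplified sketch, not the exact manifests from the repo linked below; all names are placeholders):

# Simplified sketch: a plain Deployment mounting a separately created PVC.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme           # example only
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mysql-data       # PVC created separately
EOF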
Here is an example GitHub repo with a stateful app (MySQL) and no StatefulSet at all:
https://github.com/shri-kanth/kuberenetes-demo-manifests
So do I really need to use a StatefulSet resource for databases in k8s? Or are there specific cases where it is necessary?
Persistent storage (PVCs) is not the only reason to use StatefulSets over Deployments.
As the Kubernetes manual states:
StatefulSets are valuable for applications that require one or more of the following:
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
You can read more about database considerations on Kubernetes in "To run or not to run a database on Kubernetes".
StatefulSet is not the same as PV+PVC.
A StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
In other words, it manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
So do I really need to use a StatefulSet resource for databases in k8s?
It depends on what you would like to achieve.
StatefulSet gives you:
Possibility to have a stable network ID (so your Pods will always be named $(statefulset name)-$(ordinal))
Possibility to have stable storage: when a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes associated with its PersistentVolumeClaims.
...MySql and no StatefulSet...
As you can see, if your goal is just to run a single RDBMS Pod (for example MySQL) that stores all its data (the DB itself) on a PV+PVC, then a StatefulSet is definitely overkill.
However, if you need to run a Redis cluster (a distributed DB) :-D it will be close to impossible to do that without a StatefulSet (to the best of my knowledge, and based on numerous threads about this on Stack Overflow).
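To make the contrast concrete, here is a minimal sketch of a StatefulSet with volumeClaimTemplates; every replica gets a sticky name and its own PVC (image, names, and sizes are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db            # headless Service that provides the stable network IDs
  replicas: 2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: mysql
          image: mysql:8
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme            # example only
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:      # Pods db-0, db-1 each get their own PVC: data-db-0, data-db-1
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
EOF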
I hope that this info helps you.

Should programmers avoid writing to the local file system in the cloud?

Should programmers avoid writing to the local file system when writing applications to be deployed to the cloud?
Does this recommendation apply only to this particular cloud provider (Cloud Foundry)?
In short, yes, you probably should avoid it.
Most cloud providers - just like Cloud Foundry - recommend that you only keep ephemeral data (like caches) on your local disk, since a single machine may fail or reboot for upgrade or re-balancing at any time and you don't necessarily get the same machine back after a restart.
Many providers also offer alternative SAN/SMB-mountable disks which you can use for persistent data.

How are sparse files handled in Google Cloud Storage?

We have a 200GB sparse file which is about 80GB in actual size (VMware disk).
How does Google calculate the space for this file, 200GB or 80GB?
What would be the best practice to store it in Google Cloud using gsutil (similar to rsync -S)?
Would it be solved by using tar cSf and then uploading via gsutil? How slow would it be?
We have a 200GB sparse file which is about 80GB in actual size (VMware disk).
How does Google calculate the space for this file, 200GB or 80GB?
Google Cloud Storage does not introspect your files to understand what they are, so it's the actual size (80GB) that it takes on disk that matters.
What would be the best practice to store it in Google Cloud using gsutil (similar to rsync -S)?
There's gsutil rsync, but it does not support -S, so that won't be very efficient. Also, Google Cloud Storage does not store files as blocks that can be accessed and rewritten randomly, but as blobs keyed by bucket name + object name, so you'll essentially be uploading the entire 80GB file every time.
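In other words, the best you can do against Cloud Storage is whole-object copies, roughly like this (the bucket and file names are made up):

# Whole-object upload: gsutil just streams the file's bytes; there is no sparse-aware or delta transfer.
gsutil cp disk-flat.vmdk gs://my-backup-bucket/images/
# gsutil rsync also works object-by-object and has no equivalent of rsync -S:
gsutil rsync -r ./vm-images gs://my-backup-bucket/images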
One alternative you might consider is to use Persistent Disks which provide block-level access to your files with the following workflow:
One-time setup:
create a persistent disk and use it only for storage of your VM image
Pre-sync setup:
create a Linux VM instance with its own boot disk
attach the persistent disk in read-write mode to the instance
mount the attached disk as a file system
Synchronize:
use ssh+rsync to synchronize your VM image to the persistent disk on the VM
Post-sync teardown:
unmount the disk within the instance
detach the persistent disk from the instance
delete the VM instance
You can automate the setup and teardown steps with scripts so it should be very easy to run on a regular basis whenever you want to do the synchronization.
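With the gcloud CLI, that workflow might look roughly like this (zone, names, and mount point are placeholders; the filesystem only needs to be created the first time):

# One-time setup: a persistent disk dedicated to the VM image
gcloud compute disks create image-store --size=200GB --zone=us-central1-a

# Pre-sync: boot a small helper instance, attach and mount the disk
gcloud compute instances create sync-helper --zone=us-central1-a
gcloud compute instances attach-disk sync-helper --disk=image-store --zone=us-central1-a
gcloud compute ssh sync-helper --zone=us-central1-a \
  --command='sudo mkdir -p /mnt/image-store && sudo mount /dev/sdb /mnt/image-store'

# Synchronize: rsync -S preserves sparseness (assumes ssh access to the instance is configured)
rsync -avS --progress disk-flat.vmdk sync-helper:/mnt/image-store/

# Post-sync teardown
gcloud compute ssh sync-helper --zone=us-central1-a --command='sudo umount /mnt/image-store'
gcloud compute instances detach-disk sync-helper --disk=image-store --zone=us-central1-a
gcloud compute instances delete sync-helper --zone=us-central1-a --quiet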
Would it be solved by using tar cSf and then uploading via gsutil? How slow would it be?
The method above will be limited by your network connection, and would be no different from ssh+rsync to any other server. You can estimate the speed by artificially throttling your bandwidth to another server on your own network to match your external upload speed, and running rsync over ssh against it.
Something not covered above is pricing, so I'll leave these pointers here, as cost may also be relevant to your analysis.
Using the Google Cloud Storage approach, you'll incur:
Google Cloud Storage pricing: currently $0.026 / GB / month
Network egress (ingress is free): varies by total amount of data
Using the Persistent Disk approach, you'll incur:
Persistent Disk pricing: currently $0.04 / GB / month
VM instance: needs to be up only while you're running the sync
Network egress (ingress is free): varies by total amount of data
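To make that concrete at those (historical) list prices: roughly 80 GB in Cloud Storage is about 80 x $0.026 ≈ $2/month, while a 200GB persistent disk is billed on its provisioned size, about 200 x $0.04 = $8/month, plus the instance hours while each sync runs and any network egress in either case.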
The actual amount of data you will download should be small, since that's what rsync is supposed to minimize, so most of the data should be uploaded rather than downloaded, and hence your network cost should be low, but that is based on the actual rsync implementation which I cannot speak for.
Hope this helps.

Any distributed file system which supports constant-time cloning?

Lustre or the Google File System (GFS) split a file into blocks and store them across various nodes, so they can achieve scalability and distribute traffic.
ZFS, btrfs, and WAFL support constant-time cloning. With this, they can achieve fast cloning, writable snapshots, and storage savings.
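For example, on ZFS a writable clone is just a snapshot plus a clone, both near-instant and sharing unmodified blocks with the original (the pool and dataset names are made up):

zfs snapshot tank/images@base            # constant-time, copy-on-write snapshot
zfs clone tank/images@base tank/clone1   # writable clone that shares blocks with @base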
I have been looking for a file system that supports both of the above features.
There are a lot of file systems that support constant-time cloning, but I can't find any distributed file system that supports it. The Lustre team appears to be developing Lustre on ZFS (which would also support cloning), but it hasn't been released yet (moreover, it isn't in the 2.0 beta, so it probably won't appear any time soon).
Nexenta storage seemed to support these features via "namespace NFS", but it doesn't: it only distributes data at the file level. That means if a file exceeds the size of one node's volume, it can't be handled; and if a lot of cloned files grow into big files, they can't handle that either (at least, they would have to really copy the original file to another node rather than just sharing blocks). Maybe I could attach SAN disks to a zvol on a ZFS node, but I'm very worried about concentrating all the traffic on that one ZFS node.
So I'm looking for a file system or a solution which can handle the above two issues.
One working solution is to combine the Lustre filesystem with the Robinhood Policy Engine in backup mode to continuously back up your filesystem. This mode makes it possible to back up a Lustre v2.x filesystem to external storage. It tracks modifications in the filesystem using the Lustre 2+ changelog feature (FS events) and copies modified files to the backend storage according to admin-defined migration policies. You can configure your own upcall commands in Robinhood, for example to provide a scalable way to clone your filesystem and schedule sync tasks on several nodes.
With Lustre on ZFS, it should be possible to use the ZFS snapshot feature, but that ZFS stack is not yet ready for production (it is currently being tested on the top-1 supercomputer, Sequoia, at LLNL).
