What are the correct /etc/exports settings for Kubernetes NFS Storage?

I have a simple NFS server (followed instructions here) connected to a Kubernetes (v1.24.2) cluster as a storage class. When a new PVC is created, it creates a PV as expected with a new directory on the NFS server.
The NFS provider was deployed as instructed here.
My issue is that containers don't seem to be able to perform all the functions they expect to when interacting with the NFS server. For example:
A PVC and PV are created with the following yml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-data
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
This creates a directory on the NFS server as expected.
Then this deployment is created to use the PVC:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 30
      hostname: mssqlinst
      securityContext:
        fsGroup: 10001
      containers:
        - name: mssql
          image: mcr.microsoft.com/mssql/server:2019-latest
          ports:
            - containerPort: 1433
          env:
            - name: MSSQL_PID
              value: "Developer"
            - name: ACCEPT_EULA
              value: "Y"
            - name: SA_PASSWORD
              value: "Password123"
          volumeMounts:
            - name: mssqldb
              mountPath: /var/opt/mssql
      volumes:
        - name: mssqldb
          persistentVolumeClaim:
            claimName: mssql-data
The server comes up and responds to requests but does so with the error:
[S0002][823] com.microsoft.sqlserver.jdbc.SQLServerException: The operating system returned error 1117(The request could not be performed because of an I/O device error.) to SQL Server during a read at offset 0x0000000009a000 in file '/var/opt/mssql/data/master.mdf'. Additional messages in the SQL Server error log and operating system error log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
My /etc/exports file has the following contents:
/srv *(rw,no_subtree_check,no_root_squash)
When the SQL container starts, it doesn't undergo any container restarts, but the SQL service inside the container appears to enter some sort of restart loop until a connection is attempted; it then throws the error above and appears to stop.
Is there something I'm missing in the /etc/exports file? I tried variations with sync, async, and insecure but can't seem to get past the SQL error.
I gather from the error that this has something to do with the container's ability to read/write from/to the disk. Am I in the right ballpark?

The config that ended up working was:
/srv *(rw,no_root_squash,insecure,sync,no_subtree_check)
This was after a reinstall of the cluster. There were no significant changes elsewhere, but it still seems like there may have been more to the issue than this one config.
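For anyone verifying their own setup: after editing /etc/exports, the export table has to be re-applied before clients see the change. A minimal sketch, assuming a standard Linux NFS server running nfs-kernel-server:
$ sudo exportfs -ra   # re-export everything listed in /etc/exports
$ sudo exportfs -v    # print the active exports with their effective options
Comparing the output of exportfs -v against the intended options is a quick way to confirm that flags like insecure and sync actually took effect.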

Related

Tanzu Kubernetes NotAuthenticated is set on the volume on virtualmachine

I'm running a brand-new Tanzu Kubernetes cluster in a dev environment. I'm trying to install MS SQL Server 2019 and am hitting a wall with this error once I apply the manifest.
The SQL Server pod fails with this:
sqltkc-workers-mpqdb-556696d6f6-rhpsw
Warning FailedMount 50s kubelet, sqltkc-workers-mpqdb-556696d6f6-rhpsw Unable to attach or mount volumes: unmounted volumes=[mssql-persistent-storage], unattached volumes=[default-token-qzt5k mssql-persistent-storage]: timed out waiting for the condition
Warning FailedAttachVolume 45s (x9 over 2m53s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-697e8f96-a23b-4255-9b19-fa04aeed98ee" : rpc error: code = Internal desc = observed Error: "ServerFaultCode: NotAuthenticated" is set on the volume "fbc91ad5-b62e-4bec-8132-4f2d1c5160f0-697e8f96-a23b-4255-9b19-fa04aeed98ee" on virtualmachine "sqltkc-workers-mpqdb-556696d6f6-rhpsw"
The pv and pvc all are bound:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-697e8f96-a23b-4255-9b19-fa04aeed98ee 10Gi RWO Delete Bound default/mssql-data-claim pstore-high 67m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/mssql-data-claim Bound pvc-697e8f96-a23b-4255-9b19-fa04aeed98ee 10Gi RWO pstore-high 67m
The deployment manifest is just what I pieced together from various tutorials on the web:
apiVersion: v1
kind: Service
metadata:
  name: mssql-deployment
spec:
  selector:
    app: mssql
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 10
      securityContext:
        fsGroup: 1000
      restartPolicy: Always
      containers:
        - name: mssql
          resources:
            requests:
              memory: 8000Mi
          image: mcr.microsoft.com/mssql/server:2019-latest
          ports:
            - containerPort: 1433
          env:
            - name: MSSQL_PID
              value: "Developer"
            - name: ACCEPT_EULA
              value: "Y"
            - name: SA_PASSWORD
              value: VMware123!
          volumeMounts:
            - name: mssql-persistent-storage
              mountPath: /var/opt/mssql
      volumes:
        - name: mssql-persistent-storage
          persistentVolumeClaim:
            claimName: mssql-data-claim
Here is the pvc yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-data-claim
spec:
  accessModes:
    - ReadWriteOnce
  # storageClassName: vsan-default-storage-policy
  storageClassName: pstore-high
  resources:
    requests:
      storage: 10Gi
The storage class exists. I have tried this with both the default vSAN and other storage classes and always hit the same volume authentication issue.
I've searched high and low, can't find any related docs. Was hoping to see if someone knew more.
Thanks so much!!
Thanks again for the help; our team was able to fix this. We found out that our vCenter root password had expired. Once we reset the password, our persistent volumes were able to mount to the containers without any errors. If you are running Tanzu, I highly suggest making sure your vCenter is fully updated.
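If you hit something similar, a quick way to watch for these attach failures while testing the fix is to filter cluster events by reason; this is plain kubectl, nothing Tanzu-specific:
$ kubectl get events --field-selector reason=FailedAttachVolume --all-namespaces
$ kubectl describe pod <pod-name>   # the Events section shows the same errors per pod
Once the vCenter credentials are valid again, the FailedAttachVolume events should stop appearing for new pods.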

Every other SQL Server deployment fails due to access denied on Kubernetes

I am deploying a SQL Server 2019 to Kubernetes with the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sql
spec:
  selector:
    matchLabels:
      app: 'sql'
  template:
    metadata:
      labels:
        app: sql
    spec:
      hostname: sql-dev
      securityContext:
        fsGroup: 10001
      initContainers:
        - name: volume-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 10001:0 /var/opt/mssql"]
          volumeMounts:
            - mountPath: "/var/opt/mssql"
              name: mssqldb
      containers:
        - name: sql
          image: localhost:32000/sql:dev-latest
          env:
            - name: MSSQL_SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mssql
                  key: SA_PASSWORD
            - name: ACCEPT_EULA
              value: "Y"
          ports:
            - containerPort: 1433
          resources:
            limits:
              memory: 2Gi
              cpu: 1
          volumeMounts:
            - name: mssqldb
              mountPath: /var/opt/mssql
      volumes:
        - name: mssqldb
          persistentVolumeClaim:
            claimName: sqldev-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: sql-svc
spec:
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
      nodePort: 31113
  selector:
    app: sql
And this is the pv/pvc manifest:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sqldev-pv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: sql
  hostPath:
    path: /usr/sql
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sqldev-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: sql
  resources:
    requests:
      storage: 1Gi
If the deployment is not yet present on the cluster, the deployment itself works and the server is available.
The next deployment fails with the following message:
2021-01-20 12:02:34.98 Server Error: 17113, Severity: 16, State:
2021-01-20 12:02:34.98 Server Error 5(Access is denied.) occurred while opening file '/var/opt/mssql/data/master.mdf' to obtain
configuration information at startup. An invalid startup option might
have caused the error. Verify your startup options, and correct or
remove them if necessary.
Doing another deployment or simply restarting it with kubectl rollout restart deployment/sql comes up fine, while the next one fails again.
The pattern is a consistent good - bad - good - bad - ...
Please explain why this is happening and how I can resolve it.
Update: Apparently one instance of mssql exclusively locks the database files - which makes total sense. You don't want 2 instances of brain going haywire on your sole instance of childhood memories.
So what I think is happening is:
Instance A exists and is up and running
Instance B's deployment starts and wants to access the same volume as A
Only once B is created is A terminated, with a grace period of 30 seconds
B tries to access the mdf while it is still exclusively locked by A, which is still terminating
I have a crude solution involving a sleep 30 bash script before initializing mssql inside the pod, but right now I want to investigate, if there is a more elegant solution.
My first approach to solve this was to delay the boot up time of mssql until the previous pod was terminated using a bash loop:
echo "Waiting 35 seconds grace period."
for i in {0..35}
do
sleep 1
echo "$i seconds waited"
done
While this technically solved the problem, it isn't very elegant. If the grace period of the pod is changed to something > 35 seconds, this will need to be changed too.
Changing the deployment strategy from its implicit default RollingUpdate to Recreate did the trick too. The effect is that the previous pod is terminated before the new one is spun up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sql
spec:
  selector:
    matchLabels:
      app: 'sql'
  strategy:
    type: Recreate
Documentation: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment
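To double-check that the change is active after applying the manifest, the strategy type can be read back with jsonpath; this should print Recreate for the deployment above:
$ kubectl get deployment sql -o jsonpath='{.spec.strategy.type}'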

How do I connect a kubernetes cluster to an external SQL Server database using docker desktop?

I need to know how to connect my Kubernetes cluster to an external SQL Server database running in a Docker container outside of the Kubernetes cluster.
I currently have two pods running in my cluster, each with a different image created from an ASP.NET Core application. There is a completely separate Docker container (outside of Kubernetes but running locally on my machine, localhost,1433) that hosts a SQL Server database. I need the applications in my Kubernetes pods to be able to reach and manipulate that database. I have tried creating a YAML file and configuring different ports, but I do not know how to get this working, or how to test that it actually is working after setting it up. I need the exact steps/commands to create a service capable of routing a connection from the images in my cluster to the DB and back.
Docker SQL Server creation (elevated powershell/docker desktop):
docker pull mcr.microsoft.com/mssql/server:2017-latest
docker run -d -p 1433:1433 --name sql -v "c:/Temp/DockerShared:/host_mount" -e SA_PASSWORD="aPasswordPassword" -e ACCEPT_EULA=Y mcr.microsoft.com/mssql/server:2017-latest
definitions.yaml
# Pods in the cluster
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    app: podnet
    type: module
spec:
  containers:
    - name: container1
      image: username/image1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-2
  labels:
    app: podnet
    type: module
spec:
  containers:
    - name: container2
      image: username/image2
---
# Service created in an attempt to contact external SQL Server DB
apiVersion: v1
kind: Service
metadata:
  name: ext-sql-service
spec:
  ports:
    - port: 1433
      targetPort: 1433
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: ext-sql-service
subsets:
  - addresses:
      - ip: (Docker IP for DB Instance)
    ports:
      - port: 1433
Ideally I would like applications in my kubernetes cluster to be able to manipulate the SQL Server I already have set up (running outside of the cluster but locally on my machine).
When running from local Docker, your connection string is NOT your local machine.
It is the local Docker "world" that happens to be running on your machine.
host.docker.internal:1433
The above is how a Docker container talks to your local machine. Obviously, the port could be different based on how you exposed it.
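For example, an ADO.NET-style connection string from inside a container on Docker Desktop might look like the sketch below; the database name and credentials are placeholders:
Server=host.docker.internal,1433;Database=MyDb;User Id=sa;Password=aPasswordPassword;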
......
If you're trying to get your running container to talk to sql-server which is ALSO running inside of the docker world, that connection string looks like:
ServerName:
my-mssql-service-deployment-name.$_CUSTOMNAMESPACENAME.svc.cluster.local
Where $_CUSTOMNAMESPACENAME is probably "default", but you may be running a different namespace.
my-mssql-service-deployment-name is the name of YOUR deployment (I have it stubbed here)
Note there is no port number here.
This is documented here:
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services
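A simple way to confirm the in-cluster DNS name resolves is to run a throwaway pod and query it; the service name here is the stub from above:
$ kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup my-mssql-service-deployment-name.default.svc.cluster.local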
The problem may be the kind of Service you created. ClusterIP only enables connections among pods inside the cluster.
To connect to an external service, you should change the Service kind to NodePort.
Try changing the service definition:
# Service created in an attempt to contact external SQL Server DB
apiVersion: v1
kind: Service
metadata:
  name: ext-sql-service
spec:
  type: NodePort
  ports:
    - port: 1433
      targetPort: 1433
and execute the command:
$ kubectl apply -f your_service_definition_file_name.yaml
Remember to run this command in the proper namespace, where your deployment is configured.
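After applying, you can read back the node port Kubernetes assigned; it will appear in the PORT(S) column as something like 1433:3xxxx/TCP:
$ kubectl get svc ext-sql-service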
It is also bad practice to overlay an environment variable onto the container and pass that variable's VALUE to the container with docker run, as in the command above:
$ docker run -d -p 1433:1433 --name sql -v "c:/Temp/DockerShared:/host_mount" -e SA_PASSWORD="aPasswordPassword" -e ACCEPT_EULA=Y mcr.microsoft.com/mssql/server:2017-latest
Leaving the DB password visible like this is insecure. Use Kubernetes Secrets instead.
You can find more information here: kubernetes-secret.
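A minimal sketch of that approach, reusing the secret name and key that the earlier manifests on this page already use:
$ kubectl create secret generic mssql --from-literal=SA_PASSWORD='aPasswordPassword'
and then reference it from the container spec instead of a literal value:
env:
  - name: SA_PASSWORD
    valueFrom:
      secretKeyRef:
        name: mssql
        key: SA_PASSWORD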

Where is the cassandra.yaml location on the Mac?

I am working in a Docker environment and executed docker exec -it mycassandra cqlsh. Then, while inserting data, I got the following error:
WriteTimeout - Error from server: code=1100
This tells me that I need to find the cassandra.yaml file and amend the write timeout, but I cannot find it on my Mac.
Could you tell me how I can find it and how to amend the file?
Thanks.
For those who have installed it with brew install cassandra, the yaml file will be located in /usr/local/etc/cassandra.
If you are running the official cassandra image then cassandra.yaml may be found at /etc/cassandra/cassandra.yaml in the container. If you want to create a custom cassandra.yaml file then you may try to overwrite it in your Dockerfile or docker-compose.yml file. For example, in my docker-compose.yml file I have something like:
services:
  cassandra:
    image: cassandra:3.11.4
    volumes:
      - ./cassandra.yaml:/etc/cassandra/cassandra.yaml
which causes the cassandra.yaml file in the container to be overwritten by my local cassandra.yaml.
I hope this helps.
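Once the container is running, a quick sanity check that the override took effect is to grep the mounted file for the relevant setting; write_request_timeout_in_ms is the timeout behind WriteTimeout errors (the container name here is the one from the question):
$ docker exec -it mycassandra grep write_request_timeout_in_ms /etc/cassandra/cassandra.yaml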
From the provided example, it seems that the database runs inside a container, so the cassandra.yaml that you're looking for is created on the fly when the container starts up, based on the configuration that you provided.
We have set up Cassandra containers with Kubernetes (executed in Docker) based on the instructions found here, and have been able to modify the settings in the cassandra.yaml file through the statefulset configuration, by updating the variables in the env section of the container spec.
For example, to modify the seeds list, the cluster name, and the datacenter and rack of the C* cluster named c-test-qa:
apiVersion: apps/v1
kind: StatefulSet
...
spec:
  serviceName: c-test-qa
  replicas: 1
  selector:
    matchLabels:
      app: c-test-qa
  template:
    metadata:
      labels:
        app: c-test-qa
    spec:
      containers:
        - name: c-test-qa
          image: cassandra:3.11
          imagePullPolicy: IfNotPresent
          ...
          env:
            - name: CASSANDRA_SEEDS
              value: c-test-qa-0.c-test-qa.qa.svc.cluster.local
            - name: CASSANDRA_CLUSTER_NAME
              value: "testqa"
            - name: CASSANDRA_DC
              value: "DC1"
            - name: CASSANDRA_RACK
              value: "CustomRack1"
          ...
On macOS:
It will be found in one of the locations below:
Cassandra package installations: /etc/cassandra
Cassandra tarball installations: install_location/conf
DataStax Enterprise package installations: /etc/dse/cassandra
DataStax Enterprise tarball installations: install_location/resources/cassandra/conf
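If none of those paths exist on your machine, a brute-force search works on any Unix-like system (it may take a while and print permission warnings):
$ find / -name cassandra.yaml 2>/dev/null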

How to provision persistent volume claim for software install in kubernetes

I am trying to provision a PVC for a Solr deployment in k8s and mount it as /opt/solr, which is the default Solr installation directory. This way I plan to keep both the Solr installation and the data under it on the PVC. However, while storage gets provisioned just fine and the statefulset gets created, my deployment doesn't work because /opt/solr ends up empty. What is the proper way to do this? Here is my deployment.yaml:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: solr
  labels:
    app: solr
spec:
  volumeClaimTemplates:
    - metadata:
        name: datadir
        annotations:
          volume.alpha.kubernetes.io/storage-class: slow
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 2Gi
  serviceName: solr-svc
  replicas: 1
  template:
    metadata:
      labels:
        app: solr
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                      - solr-pool
              topologyKey: "kubernetes.io/hostname"
      terminationGracePeriodSeconds: 300
      containers:
        - name: solr
          image: solr:6.5.1
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              memory: 512M
              cpu: 500m
          ports:
            - containerPort: 8983
              name: solr-port
              protocol: TCP
          env:
            - name: VERBOSE
              value: "yes"
          command:
            - bash
            - -c
            - "exec /opt/solr/bin/solr start"
          volumeMounts:
            - name: solr-script
              mountPath: /docker-entrypoint-initdb.d/
            - name: datadir
              mountPath: /opt/solr/
      volumes:
        - name: solr-script
          configMap:
            name: solr-configs
      nodeSelector:
        pool: solr-pool
Provisioned storage is empty by default, and there might be a Delete reclaim policy on the provisioned storage, so be sure to check those configurations. You can also exec into your pod and examine the mounted volume to see whether it's working properly (permission issues, read-only file system, etc.).
In my case there was a conflict between the Docker container configuration, which used /opt/solr as the location for the Solr install, and my attempt to mount a separate PV at the same location. Once this PV is mounted, I obviously lose the Solr install. The fixes for this are:
create another Docker image which uses a separate location
change the Solr config to use a different location for the data (see the sketch below)
change the PV volume location
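A sketch of the second option: assuming the official docker-solr image, which honors the SOLR_HOME environment variable, the PV can be mounted at a dedicated data path instead of over the install directory. The path name here is just an example, and the directory must contain a valid solr.xml (e.g. copied in by an init script):
env:
  - name: SOLR_HOME
    value: /opt/solr-data   # example path, not the image default
volumeMounts:
  - name: datadir
    mountPath: /opt/solr-data
This keeps the image's /opt/solr install intact while the index data lives on the PVC.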
