I tried to run my first cluster, as I'm currently learning so I can hopefully work in cloud engineering.
What I did:
I have 3 cloud servers (Ubuntu 20.04), all in one network.
I've successfully set up my etcd cluster (cluster-health shows me all 3 network IPs of the servers: 1 leader, 2 not leader).
Now I've installed k3s on my first server:
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="https://10.0.0.2:2380,https://10.0.0.4:2380,https://10.0.0.3:2380"
I've done the same on the 2 other servers; the only difference is that I added the token value, which I checked beforehand in:
cat /var/lib/rancher/k3s/server/token
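Roughly, the command on the other two servers looked like this (token value omitted; shown here passed via the K3S_TOKEN variable):
curl -sfL https://get.k3s.io | K3S_TOKEN=<token-from-server-1> sh -s - server \
  --datastore-endpoint="https://10.0.0.2:2380,https://10.0.0.4:2380,https://10.0.0.3:2380"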
Now everything seems to have worked, but when I try kubectl get nodes, it just shows me one node...
Does anyone have any tips or answers for me?
k3s service file:
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \
        '--node-external-ip=78.46.241.153' \
        '--node-name=node-1' \
        '--flannel-iface=ens10' \
        '--advertise-address=10.0.0.2' \
        '--node-ip=10.0.0.2' \
        '--datastore-endpoint=https://10.0.0.2:2380,https://10.0.0.4:2380,https://10.0.0.3:2380'
I am going to keep this simple and ask: is there a way to see which pods have an active connection to an endpoint, like a database endpoint?
My cluster contains a few hundred namespaces, and my database provider just told me that the maximum number of connections is almost reached, so I want to pinpoint the pod(s) that use multiple connections to our database endpoint at the same time.
I can see from my database cluster that the connections come from my cluster nodes' IPs... but that won't say which pods... and I have quite a lot of pods...
Thanks for the help
Each container uses its own network namespace, so to check the network connections inside the container you need to run the command inside that namespace.
Luckily, all containers in a Pod share the same network namespace, so you can add a small sidecar container to the pod that prints the open connections to its log.
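For example, the sidecar's command could be as simple as a loop like this (a sketch; it assumes an image that ships netstat):
# sidecar sketch: periodically log established connections of the pod's network namespace
sh -c 'while true; do netstat -tunaple | grep ESTABLISHED; sleep 60; done'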
Alternatively, you can run the netstat command inside the pod (if the pod has it on its filesystem):
kubectl get pods | grep Running | awk '{ print $1 }' | xargs -I % sh -c 'echo == Pod %; kubectl exec -ti % -- netstat -tunaple' >netstat.txt
# or
kubectl get pods | grep Running | awk '{ print $1 }' | xargs -I % sh -c 'echo == Pod %; kubectl exec -ti % -- netstat -tunaple | grep ESTABLISHED' >netstat.txt
After that you'll have a file on your disk (netstat.txt) with all information about connections in the pods.
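If you only care about the database endpoint, the same one-liner can be filtered by its address (the IP below is just a placeholder):
# placeholder address for the database endpoint; replace it with the real one
DB_IP=203.0.113.10
kubectl get pods | grep Running | awk '{ print $1 }' | xargs -I % sh -c "echo == Pod %; kubectl exec % -- netstat -tunaple | grep ESTABLISHED | grep $DB_IP" > netstat_db.txt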
The third way is the most complex. You need to find the container ID using docker ps and run the following command to get the PID:
$ pid="$(docker inspect -f '{{.State.Pid}}' "container_name_or_uuid")"
Then, you need to create a named namespace
(you can use any name you want, or the container name/UUID/pod name as a replacement for namespace_name):
sudo mkdir -p /var/run/netns
sudo ln -sf /proc/$pid/ns/net "/var/run/netns/namespace_name"
Now you can run commands in that namespace:
sudo ip netns exec "namespace_name" netstat -tunaple | grep ESTABLISHED
You need to do that for each pod on each node. So, it might be useful to troubleshoot particular containers, but it needs some more automation for your task.
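A rough way to automate that on one node could look like this (run as root; it assumes the Docker runtime and that netstat exists on the host):
# run as root on a node; assumes the Docker runtime and netstat on the host
mkdir -p /var/run/netns
for cid in $(docker ps -q); do
  pid="$(docker inspect -f '{{.State.Pid}}' "$cid")"
  ln -sf "/proc/$pid/ns/net" "/var/run/netns/$cid"
  echo "== container $cid"
  ip netns exec "$cid" netstat -tunaple | grep ESTABLISHED
done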
It might be helpful for you to install Istio in your cluster. It has several interesting features, mentioned in this answer.
The easiest way is to run netstat on all your Kubernetes nodes:
$ netstat -tunaple | grep ESTABLISHED | grep <ip address of db provider>
The last column is the PID/Program name column, and that's a program that is running in a container (with a different internal container PID) in your pod on that specific node. There are all kinds of different ways to find out which container/pod it is. For example,
# Loop through all containers on the node with
$ docker top <container-id>
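A small loop like this can map a host PID from the netstat output back to a container (the PID below is hypothetical, and the pod-name label assumes a kubelet-managed Docker runtime):
# hypothetical host PID taken from the netstat output above
TARGET_PID=12345
for cid in $(docker ps -q); do
  if docker top "$cid" | awk 'NR>1 {print $2}' | grep -qx "$TARGET_PID"; then
    echo "PID $TARGET_PID belongs to container $cid"
    # kubelet-managed containers usually carry the pod name as a label
    docker inspect -f '{{ index .Config.Labels "io.kubernetes.pod.name" }}' "$cid"
  fi
done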
Then, after you find the container ID, look through your pods:
$ kubectl get pod <pod-id> -o=yaml
and you can find it in the status, for example:
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-11-09T23:01:36Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2018-11-09T23:01:38Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2018-11-09T23:01:38Z
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2018-11-09T23:01:36Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://f64425b3cd0da74a323440bcb03d8f2cd95d3d9b834f8ca5c43220eb5306005d
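As a shortcut, you can also search all pods for that container ID directly (the short ID below is just taken from the example output above):
CID=f64425b3cd0d
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.containerStatuses[*].containerID}{"\n"}{end}' | grep "$CID"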
I start my MariaDB with
/etc/init.d/mysql start
Then I get
starting MariaDB database server mysqld
No more messages.
When I call
service mysql status
I get
MariaDB is stopped
Why?
My my.cnf is:
# Example mysql config file.
[client-server]
socket=/tmp/mysql-dbug.sock
port=3307
# This will be passed to all mysql clients
[client]
password=XXXXXX
# Here are entries for some specific programs
# The following values assume you have at least 32M ram
# The MySQL server
[mysqld]
temp-pool
key_buffer_size=16M
datadir=/etc/mysql/data
loose-innodb_file_per_table
[mariadb]
datadir=/etc/mysql/data
default-storage-engine=aria
loose-mutex-deadlock-detector
max- connections=20
[mariadb-5.5]
language=/my/maria-5.5/sql/share/english/
socket=/tmp/mysql-dbug.sock
port=3307
[mariadb-10.1]
language=/my/maria-10.1/sql/share/english/
socket=/tmp/mysql2-dbug.sock
[mysqldump]
quick
max_allowed_packet=16M
[mysql]
no-auto-rehash
loose-abort-source-on-error
Thank you for your help.
If your SELinux is set to permissive, please try to adjust the permissions:
Files in /var/lib/mysql should be 660.
The /var/lib/mysql directory should be 755, and any of its subdirectories should be 700.
If your SELinux is set to enforcing, please apply the right context.
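A sketch of those fixes, assuming the default /var/lib/mysql datadir (adjust the path if your datadir is elsewhere, e.g. /etc/mysql/data as in the my.cnf above):
# assumes the default datadir /var/lib/mysql; adjust the path if yours differs
chown -R mysql:mysql /var/lib/mysql
chmod 755 /var/lib/mysql
find /var/lib/mysql -mindepth 1 -type d -exec chmod 700 {} +
find /var/lib/mysql -type f -exec chmod 660 {} +
# with SELinux enforcing, restore the default context
restorecon -Rv /var/lib/mysql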
I'm in the middle of configuring SolrCloud with ZooKeeper, but I'm struggling to load the config into ZK.
Here are my steps:
Configure an ensemble of 3 ZK nodes; I see 1 leader and 2 followers.
Configure a small SolrCloud cluster of 2 nodes, started as follows:
/bin/solr start -c -z <ip1>:2181,<ip2>:2181,<ip3>:2181 -noprompt
Then I tried to load the config into ZK using zkCli.sh:
./bin/zkCli.sh -zkhost <ip1>:2181,<ip2>:2181,<ip3>:2181 -cmd upconfig -confname config1 -confdir /folder/with/schema.xml (it comes from standalone Solr)
Create a Solr collection using the API:
http://<solr_ip>:8983/solr/admin/collections?action=CREATE&name=collection_test&numShards=2&replicationFactor=2&maxShardsPerNode=2
Link the config to the collection, again using zkCli.sh:
./bin/zkCli.sh -zkhost 127.0.0.1:2181 -cmd linkconfig -collection collection_test -confname config1
At this point I should see the config loaded, but nothing happens.
I used the steps below to configure SolrCloud in my VM.
SolrCloud Setup Instructions
Infrastructure
a. Unix boxes: 3
b. ELB: 1
c. Create CNAMEs for the Unix boxes: cl-solr1, cl-solr2, cl-solr3
Installations
a. Install zookeeper-3.4.6 (zookeeper-3.4.6.tar.gz)
b. solr-5.2.1 (solr-5.2.1.tgz)
c. OpenJDK Runtime Environment 1.7.0_79
Setup
a. Set JAVA_HOME
b. On cl-solr1, cl-solr2, and cl-solr3, create a zoo.cfg file at /opt/myname/zookeeper-3.4.6/conf with the content below:
tickTime=2000
dataDir=/var/lib/zookeeper/data
clientPort=2181
initLimit=5
syncLimit=2
server.1=cl-solr1:2888:3888
server.2=cl-solr2:2888:3888
server.3=cl-solr3:2888:3888
c. Create a myid file on each ZooKeeper server (cl-solr1, cl-solr2 & cl-solr3) using the commands below:
$ mkdir -p /var/lib/zookeeper/data/
$ echo 1 > /var/lib/zookeeper/data/myid   # 1 for cl-solr1, 2 for cl-solr2, 3 for cl-solr3
Start ZooKeeper
a. /opt/myname/zookeeper-3.4.6/bin/zkServer.sh start
b. /opt/myname/zookeeper-3.4.6/bin/zkServer.sh status
c. Check the status in detail via:
echo stat | nc cl-solr1 2181
Start Solr
a. cl-solr1$ /opt/myname/solr-5.2.1/bin/solr start -c -z cl-solr1:2181,cl-solr2:2181,cl-solr3:2181 -h cl-solr1
b. cl-solr2$ /opt/myname/solr-5.2.1/bin/solr start -c -z cl-solr1:2181,cl-solr2:2181,cl-solr3:2181 -h cl-solr2
c. cl-solr3$ /opt/myname/solr-5.2.1/bin/solr start -c -z cl-solr1:2181,cl-solr2:2181,cl-solr3:2181 -h cl-solr3
Create a new Collection
a. From one of the nodes (cl-solr1), fire the commands below:
i. mkdir -p /opt/myname/solr-5.2.1/server/solr/pats/conf
ii. Copy the conf folder from the current system
iii. /opt/myname/solr-5.2.1/bin/solr create -c my_colln_name -d /opt/myname/solr-5.2.1/server/solr/pats/conf -n myname_cfg -shards 2 -replicationFactor 2
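As an optional sanity check (assuming the ZooKeeper CLI path from the setup above), you can verify what actually landed in ZooKeeper:
# list the uploaded config sets and check which config the collection points to
/opt/myname/zookeeper-3.4.6/bin/zkCli.sh -server cl-solr1:2181 ls /configs
/opt/myname/zookeeper-3.4.6/bin/zkCli.sh -server cl-solr1:2181 get /collections/my_colln_name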
This is the process we perform manually.
$ sudo su - gvr
[gvr/DB:DEV3FXCU]/home/gvr>
$ ai_dev.env
Gateway DEV3 $
$ gw_report integrations long
report is ******
Now I am attempting to automate this process using a shell script:
#!/bin/ksh
sudo su - gvr
. ai_dev3.env
gw_report integrations long
but this is not working. It gets stuck after entering the environment.
It's stuck at this place (Gateway DEV3 $).
You're not running the same commands in the two examples - gw_report long != gw_report integrations long. Maybe the latter takes much longer (or hangs).
Also, in the original code you run ai_dev.env and in the second you source it. Any variables set when running a script are gone when returning from that script, so I suspect this accounts for the different behavior.
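For example, one way to rework the script (a sketch; the env file name is assumed from the question) is to run the whole sequence as the target user in a single command, sourcing the env file in that same shell:
#!/bin/ksh
# run everything as gvr in one shell; env file name assumed from the question
sudo su - gvr -c '. ai_dev.env && gw_report integrations long'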
My organization is using Nagios with the check_mk plugin to monitor our nodes. My question is: is it possible to run a manual check from the command line? It is important, process-wise, to be able to test a configuration change before deploying it.
For example, I've prepared a configuration change which uses the ps.perf check type to check the number of httpd processes on our web servers. The check looks like this:
checks = [
( ["web"], ALL_HOSTS, "ps.perf", "Number of httpd processes", ( "/usr/sbin/httpd", 1, 2, 80, 100 ) )
]
I would like to test this configuration change before committing and deploying it.
Is it possible to run this check via the command line, without first adding it to main.mk? I'm envisioning something like:
useful_program -H my.web.node -c ps.perf -A /usr/sbin/httpd,1,2,80,100
I don't see any way to do something like this in the check_mk documentation, but am hoping there is a way to achieve something like this.
Thanks!
That is easy to check.
Just make your config changes and then run:
cmk -nv HOSTNAME
That (-n) will try to run everything and (-v) return the output.
So you can see the same results as you will later see in the GUI.
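Applied to the example from the question, that would be something like (the host name is hypothetical):
cmk -nv my.web.node
# optionally filter the output for the new service
cmk -nv my.web.node | grep "Number of httpd processes"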
List the check:
$ check_mk -L | grep ps.perf
If it lists ps.perf, then run the following command:
$ check_mk --checks=ps.perf -I Hostname