DC/OS installation failure during preflight - mesosphere

I am using 5 cloud-based VMs to install DC/OS:
1 Mesos master
3 Mesos agents
1 launching VM
I installed Docker on my launching VM and started installing DC/OS. The install_prereqs stage completes without any errors, but preflight fails with the errors below on each of my VMs.
STDERR:
Connection to 129.114.18.235 closed.
STDOUT:
Running preflight checks /opt/dcos_install_tmp/dcos_install.sh: line 225: getenforce: command not found
Checking if docker is installed and in PATH: FAIL
Checking if unzip is installed and in PATH: FAIL
Checking if ipset is installed and in PATH: FAIL
Checking if systemd-notify is installed and in PATH: FAIL
/opt/dcos_install_tmp/dcos_install.sh: line 387: systemctl: command not found
Checking if systemctl is installed and in PATH: FAIL
Checking Docker is configured with a production storage driver: /opt/dcos_install_tmp/dcos_install.sh: line 285: docker: command not found
Do I need to install all the required software on my master and agent VMs? Please guide.

We have a similar setup but using plain VMs. We found that Docker needs to be running on all nodes, including masters, before running the install. Also, make sure you look at /etc/sysconfig/docker-storage and have DOCKER_STORAGE_OPTIONS= -s overlay set in the file on all nodes.
I don't believe this is the production setup, but it should get you running. You may also want to check the privileges of the user executing the install on the remote nodes: does it have permission to see and run systemctl?
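A quick way to see what a node is missing before re-running preflight is to check for the same tools the preflight output names. This is only a minimal sketch, not part of the DC/OS installer; the function name missing_tools is made up for illustration, and the script assumes it is run on each master and agent as the same user the installer uses.

```shell
# Print each named command that cannot be found in PATH.
# missing_tools is a hypothetical helper, not a DC/OS command.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the preflight output above.
missing=$(missing_tools docker unzip ipset systemd-notify systemctl getenforce)
if [ -n "$missing" ]; then
    printf 'Not found in PATH:\n%s\n' "$missing"
else
    echo "All preflight tools found in PATH"
fi
```

Anything it lists has to be installed on that node (or added to that user's PATH) before preflight can pass.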

I had the same error with the DC/OS web installer in version 1.9.
I solved the error after double-checking the bootstrap machine's private key in the web form. To create the key, log into the bootstrap machine and run:
$ ssh-keygen -t rsa
$ for i in `cat dcos-ips.txt`; do ssh-copy-id root@$i; done
$ cat ~/.ssh/id_rsa
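After copying the key, it may help to confirm that key-based login really works for every node before pasting the private key into the web form. The sketch below assumes the same dcos-ips.txt file (one address per line) and the root user from the commands above. check_ssh_hosts and the SSH override variable are hypothetical names added only so the loop can be tried without real hosts.

```shell
# Try a non-interactive login to every host listed in the file.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH is a hypothetical override hook; it defaults to the real ssh client.
check_ssh_hosts() {
    hosts_file=$1
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        # -n keeps ssh from swallowing the rest of the host list on stdin.
        if "${SSH:-ssh}" -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
            echo "$host ok"
        else
            echo "$host FAILED"
        fi
    done < "$hosts_file"
}

# Only runs when the host list from the answer above is present.
if [ -f dcos-ips.txt ]; then
    check_ssh_hosts dcos-ips.txt
fi
```

Any host reported as FAILED would still ask for a password, so the installer could not log in to it with the key.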

Related

SSH Agent Plugin v1.17 with Jenkins Declarative Pipeline not working with Windows

I have been having issues getting my multibranch pipeline to perform git commands with an SSH key via the SSH Agent plugin on Windows.
I am able to successfully perform a git clone over SSH from Git Bash on the Windows server that is running Jenkins.
In my pipeline log I am getting the following error when trying to use the sshagent plugin:
[ssh-agent] Looking for ssh-agent implementation... Could not find
ssh-agent: IOException: Cannot run program "ssh-agent": CreateProcess
error=2, The system cannot find the file specified Check if ssh-agent
is installed and in PATH [ssh-agent] FATAL: Could not find a suitable
ssh-agent provider
I have seen that installing Apache Tomcat Native libraries has helped some people, but the steps for doing so are not very descriptive.
Any help is appreciated. Thanks!

Starting Jetty fails on Ubuntu 14

I installed the solr-jetty package in an Ubuntu 14 container running in a Cloud9 workspace.
To install the package I ran the following command:
sudo apt-get install solr-jetty
The installation didn't return any error.
Then I tried to start Solr with the following command:
sudo service jetty start
But I received the following error:
* Starting Jetty servlet engine. jetty
* Jetty servlet engine started, reachable on http://host-solr-3694477:8983/. jetty
...fail!
In the log file of jetty I get the following message:
failed setting default capabilities.
set_caps(CAPS) failed for user 'jetty'
Service exit with a return value of 4
How can I resolve this issue?
To resolve the problem I had to change the user that runs Jetty from jetty to root.
This can be configured by editing the /etc/default/jetty file.
I don't think this is the most correct solution, because it can introduce security problems. If anyone has a better solution ...
Docker user here, same problem, but - this worked for me (and this is as unadvised as changing the user to 'root', suggested above):
https://docs.docker.com/engine/reference/run/#/runtime-privilege-and-linux-capabilities
Set the following on your 'docker run' command when creating a container:
--privileged=true
I'm just using docker for development, so not overly concerned yet with the security implications of this.

Mesosphere installation PermissionError:/genconf/config.yaml

I got Mesosphere EE and am installing it on a Fedora 23 server (kernel 4.4) with:
$ bash dcos_generate_config.ee.sh --web -v
then output:
Running mesosphere/dcos-genconf docker with BUILD_DIR set to /home/mesos-ee/genconf
Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
07:53:46:: Logger set to DEBUG
07:53:46:: ====> Starting DCOS installer in web mode
07:53:46:: DCOS Installer v1
07:53:46:: Starting server ('0.0.0.0', 9000)
Then I started Firefox through VNC (the VNC session runs as root) and got:
07:53:57:: Root page requested.
07:53:57:: Serving /usr/local/lib/python3.4/site-packages/dcos_installer/templates/index.html
07:53:58:: Request for configuration type made.
07:53:58:: Configuration file not found, /genconf/config.yaml. Writing new one with all defaults.
07:53:58:: Error handling request
PermissionError: [Errno 13] Permission denied: '/genconf/config.yaml'
But I already have a genconf/config.yaml. It looks like this:
bootstrap_url: http://<bootstrap_public_ip>:<your_port>
cluster_name: '<cluster-name>'
exhibitor_storage_backend: zookeeper
exhibitor_zk_hosts: <host1>:2181,<host2>:2181,<host3>:2181
exhibitor_zk_path: /dcos
master_discovery: static
master_list:
- <master-private-ip-1>
- <master-private-ip-2>
- <master-private-ip-3>
superuser_username: <username>
superuser_password_hash: <hashed-password>
resolvers:
- 8.8.8.8
- 8.8.4.4
I do not know what's going on. If you have any idea, please let me know. Thank you very much!
Disable SELinux!
Set SELINUX=disabled in the /etc/selinux/config file and then reboot.
Make sure SELinux is disabled by running the getenforce command:
$ getenforce
Disabled
Correctly installing the Enterprise Edition depends on having the correct system prerequisites. Anyway, I suppose you're still on the bootstrap node, so I will give you a path to succeed in your current task.
Run the script as root or as a user issuing sudo dcos_generate_config.ee.sh
The script will also generate the config file automatically. If you want to use your own configuration file, create a folder named genconf and put the file inside it before running the script. You should change the values inside <> to match your specific configuration. If you need more help for your specific case, send me an email at infofs2 at gmail.com
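Before re-running the generator, a short check of the two things mentioned in these answers (who can write to genconf, and the SELinux mode) may narrow down the cause. This is a rough sketch that assumes it is run from the directory containing the genconf folder. check_writable_dir is a hypothetical helper, and a writable directory on the host does not by itself rule out SELinux blocking the container.

```shell
# Report whether the current user can write into the given directory.
# check_writable_dir is a hypothetical helper, not part of the DC/OS installer.
check_writable_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ -w "$dir" ]; then
        echo "writable"
    else
        echo "not writable"
    fi
}

echo "genconf: $(check_writable_dir ./genconf)"

# getenforce prints Enforcing, Permissive, or Disabled when the SELinux tools are installed.
if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux mode: $(getenforce)"
else
    echo "getenforce not found; SELinux tools may not be installed"
fi
```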

Managed VM Deployment hangs on "Copying certificates for secure access..."

I'm running the following command to deploy my Managed VMs app (on Windows 10):
gcloud preview app deploy app.yaml --project=<PROJECT> --promote
The deployment starts but hangs on the following line:
Copying certificates for secure access. You may be prompted to create an SSH keypair.
And after some time I get the error:
ERROR: (gcloud.preview.app.deploy) Unable to copy certificates.
I've already:
Made sure that there are SSH keys in ~\.ssh\google_compute_engine
Tried to run with --quiet - same results
Renamed ssh-term.exe to ssh.exe - same results
Run the command as an administrator.
Run the command with --verbosity debug, which prints the following line multiple times: DEBUG: File [f] does not exist locally.
Any help will be much appreciated!
Found the cause! It was the project's firewall that blocked SSH by default. Fixed that and it worked.
Glad you fixed it. I had the same problem and will use your fix. I did come across a workaround: using the Container Build API to perform the build.
Enter the command
gcloud config set app/use_cloud_build true
before you run
gcloud preview app deploy
Cite: https://github.com/isusanin/google-cloud-sdk/issues/533

Unable to install Nagios agent on CentOS7

When I try to install the agent downloaded from (http://assets.nagios.com/downloads/nagiosxi/agents/linux-nrpe-agent.tar.gz) I get a firewalld error. The only solution I can find is to enable the firewall, but I do not want to.
I also tried the command "cat /dev/null > 4-firewall", but the installer then came back with the error "The script that failed was: './A-subcomponents'".
Is there a workaround?
============================
Nagios Linux Agent Installer
============================
This script will install the Nagios Linux Agent by executing all necessary
sub-scripts.
IMPORTANT: This script should only be used on a clean installed system:
RedHat Enterprise, CentOS, Fedora, or Oracle
OpenSUSE or SUSE Enterprise
Ubuntu or Debian
Do NOT use this on a system running any other distro or that
does not allow additional package installation.
Do you want to continue? [Y/n] y
Proceeding with installation...
Running './0-repos'...
Configuring Repos...
epel-release RPM installed OK
Repos configured OK
RESULT=0
Running './1-prereqs'...
Installing prerequisites...
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: anorien.csc.warwick.ac.uk
* epel: epel.besthosting.ua
* extras: centos.hyve.com
* rpmforge: apt.sw.be
* updates: centos.mirroring.pulsant.co.uk
Package autoconf-2.69-11.el7.noarch already installed and latest version
Package gcc-4.8.3-9.el7.x86_64 already installed and latest version
Package glibc-2.17-78.el7.x86_64 already installed and latest version
Package libmcrypt-devel-2.5.8-13.el7.x86_64 already installed and latest version
Package 1:make-3.82-21.el7.x86_64 already installed and latest version
Package 1:openssl-devel-1.0.1e-42.el7.9.x86_64 already installed and latest version
Package sudo-1.8.6p7-13.el7.x86_64 already installed and latest version
Package sysstat-10.1.5-7.el7.x86_64 already installed and latest version
Package 2:xinetd-2.3.15-12.el7.x86_64 already installed and latest version
Package bc-1.06.95-13.el7.x86_64 already installed and latest version
Nothing to do
Prerequisites installed OK
RESULT=0
Running './2-usersgroups'...
Adding users and groups...
useradd: user 'nagios' already exists
groupadd: group 'nagios' already exists
useradd: user 'nagios' already exists
groupadd: group 'nagcmd' already exists
Users and groups added OK
RESULT=0
Running './3-services'...
/etc/services updated
RESULT=0
Running './4-firewall'...
The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
FirewallD is not running
RESULT=252
===================
INSTALLATION ERROR!
===================
Installation step failed - exiting.
Check for error messages in the install log (install.log).
If you require assistance in resolving the issue, please include install.log
in your communications with Nagios XI technical support.
The script that failed was: './4-firewall'
In the same folder that contains fullinstall, try creating an empty installed.firewall file by executing:
touch installed.firewall
Then try re-running
./fullinstall
This is based on line 12 of 4-firewall:
# Was this step already completed?
if [ -f installed.firewall ]; then
echo "Firewall rules already configured - skipping."
exit 0
fi
If a file named installed.firewall exists in that directory, the firewall configuration step should get skipped.
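The marker-file check can be tried in isolation to see how it behaves. The snippet below is only an illustration run in a temporary directory. run_step is a hypothetical stand-in for the installer's per-step check, not the real 4-firewall script.

```shell
# Hypothetical stand-in for the installer's "was this step already completed?" check.
run_step() {
    step=$1
    if [ -f "installed.$step" ]; then
        echo "skipped"
        return 0
    fi
    echo "ran"
}

# Work in a scratch directory so no real installer files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

run_step firewall          # marker absent: prints "ran"
touch installed.firewall
run_step firewall          # marker present: prints "skipped"
```

The installer only tests whether the file exists, so an empty file created with touch is enough.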
You can also enable firewalld on your server with the following commands.
To enable firewalld:
systemctl enable firewalld
To start firewalld:
systemctl start firewalld
To check the status of firewalld:
systemctl status firewalld
