Riak crashes 30 seconds after start - Solr

Riak crashes about 30 seconds after riak start. I have changed the following settings in my riak.conf:
search = on
storage_backend = leveldb
riak_control = on
crash.log contains the following:
2016-06-30 14:49:38 =ERROR REPORT====
** Generic server yz_solr_proc terminating
** Last message in was {check_solr,0}
** When Server state == {state,"./data/yz",#Port<0.9441>,8093,8985}
** Reason for termination ==
** "solr didn't start in alloted time"
2016-06-30 14:49:38 =CRASH REPORT====
crasher:
initial call: yz_solr_proc:init/1
pid: <0.582.0>
registered_name: yz_solr_proc
exception exit: {"solr didn't start in alloted time",[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,744}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
ancestors: [yz_solr_sup,yz_sup,<0.578.0>]
messages: [{'EXIT',#Port<0.9441>,normal}]
links: [<0.580.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 376
stack_size: 27
reductions: 16170
neighbours:
2016-06-30 14:49:38 =SUPERVISOR REPORT====
Supervisor: {local,yz_solr_sup}
Context: child_terminated
Reason: "solr didn't start in alloted time"
Offender: [{pid,<0.582.0>},{name,yz_solr_proc},{mfargs,{yz_solr_proc,start_link,["./data/yz","./data/yz_temp",8093,8985]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]
2016-06-30 14:49:39 =ERROR REPORT====
** Generic server yz_solr_proc terminating
** Last message in was {#Port<0.12204>,{exit_status,1}}
** When Server state == {state,"./data/yz",#Port<0.12204>,8093,8985}
** Reason for termination ==
** {"solr OS process exited",1}
2016-06-30 14:49:39 =CRASH REPORT====
crasher:
initial call: yz_solr_proc:init/1
pid: <0.7631.0>
registered_name: yz_solr_proc
exception exit: {{"solr OS process exited",1},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,744}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
ancestors: [yz_solr_sup,yz_sup,<0.578.0>]
messages: [{'EXIT',#Port<0.12204>,normal}]
links: [<0.580.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 8968
neighbours:
2016-06-30 14:49:39 =SUPERVISOR REPORT====
Supervisor: {local,yz_solr_sup}
Context: child_terminated
Reason: {"solr OS process exited",1}
Offender: [{pid,<0.7631.0>},{name,yz_solr_proc},{mfargs,{yz_solr_proc,start_link,["./data/yz","./data/yz_temp",8093,8985]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]
2016-06-30 14:49:39 =SUPERVISOR REPORT====
Supervisor: {local,yz_solr_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.7631.0>},{name,yz_solr_proc},{mfargs,{yz_solr_proc,start_link,["./data/yz","./data/yz_temp",8093,8985]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]
2016-06-30 14:49:39 =SUPERVISOR REPORT====
Supervisor: {local,yz_sup}
Context: child_terminated
Reason: shutdown
Offender: [{pid,<0.580.0>},{name,yz_solr_sup},{mfargs,{yz_solr_sup,start_link,[]}},{restart_type,permanent},{shutdown,5000},{child_type,supervisor}]
2016-06-30 14:49:39 =SUPERVISOR REPORT====
Supervisor: {local,yz_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.580.0>},{name,yz_solr_sup},{mfargs,{yz_solr_sup,start_link,[]}},{restart_type,permanent},{shutdown,5000},{child_type,supervisor}]

Make sure the ports used by Solr are available. The defaults are 8093 for search and 8985 for JMX.
Tune your system to improve performance, following the Improving Performance guide for Linux.
In riak.conf, increase the JVM's heap size; the default of 1 GB is often not enough. For example, search.solr.jvm_options = -d64 -Xms2g -Xmx4g -XX:+UseStringCache -XX:+UseCompressedOops (see Search Settings).
On a slow machine, Solr may simply take longer to start. Try increasing search.solr.start_timeout.
The Solr directories must be writable (usually /var/lib/riak/data/yz*), and a compatible JVM must be used.
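Putting those suggestions together, a minimal riak.conf sketch might look like this (the heap sizes and timeout are illustrative assumptions; tune them to your hardware):
search = on
search.solr.start_timeout = 60s
search.solr.jvm_options = -d64 -Xms2g -Xmx4g -XX:+UseStringCache -XX:+UseCompressedOops
search.solr.port = 8093
search.solr.jmx_port = 8985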

Riak's internal Solr uses localhost and 127.0.0.1 as its default host, so the following should be defined in the /etc/hosts file:
127.0.0.1 localhost
FYI, if you use Windows, your hosts file may be in a different location.

Related

Azure SQL Edge container fails to start on M1 when mapping a volume to a relative path

On an M1 MacBook, I followed online examples and successfully started an Azure SQL Edge container with a basic configuration.
Then I wanted to map a volume (mySpecialFolder) as a "path on the host, relative to the Compose file".
That is, we want "./mySpecialFolder:/tmp", not "mySpecialFolder:/tmp".
services:
  mssql:
    container_name: mssql
    image: "mcr.microsoft.com/azure-sql-edge:latest"
    environment:
      SA_PASSWORD: "something"
      ACCEPT_EULA: "Y"
    expose:
      - 1433
    ports:
      - 1433:1433
    networks:
      - sql
    volumes:
      - ./mySpecialFolder:/tmp
      - mssqlsystem:/var/opt/mssql
It fails to start and reports:
Azure SQL Edge will run as non-root by default.
This container is running as user mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2140520.
2022/07/29 11:00:39 [launchpadd] INFO: Extensibility Log Header: <timestamp> <process> <sandboxId> <sessionId> <message>
2022/07/29 11:00:39 [launchpadd] WARNING: Failed to load /var/opt/mssql/mssql.conf ini file with error open /var/opt/mssql/mssql.conf: no such file or directory
2022/07/29 11:00:39 [launchpadd] INFO: DataDirectories = /bin:/etc:/lib:/lib32:/lib64:/sbin:/usr/bin:/usr/include:/usr/lib:/usr/lib32:/usr/lib64:/usr/libexec/gcc:/usr/sbin:/usr/share:/var/lib:/opt/microsoft:/opt/mssql-extensibility:/opt/mssql/mlservices:/opt/mssql/lib/zulu-jre-11:/opt/mssql-tools
2022/07/29 11:00:39 Drop permitted effective capabilities.
2022/07/29 11:00:39 [launchpadd] INFO: Polybase remote hadoop bridge disabled
2022/07/29 11:00:39 [launchpadd] INFO: Launchpadd is connecting to mssql on localhost:1431
2022/07/29 11:00:39 [launchpadd] WARNING: Failed to connect to SQL because: dial tcp 127.0.0.1:1431: connect: connection refused, will reattempt connection.
This program has encountered a fatal error and cannot continue running at Fri Jul 29 11:00:40 2022
The following diagnostic information is available:
Reason: 0x00000007
Status: 0xc0000002
Message: Failed to load KM driver [Npfs]
Stack Trace:
file://package4/windows/system32/sqlpal.dll+0x000000000030E879
file://package4/windows/system32/sqlpal.dll+0x000000000030DB54
file://package4/windows/system32/sqlpal.dll+0x000000000030AB96
file://package4/windows/system32/sqlpal.dll+0x000000000030961D
file://package4/windows/system32/sqlpal.dll+0x000000000034EE01
Stack:
IP Function
---------------- --------------------------------------
0000aaaac9c2ba70 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::~_Sp_counted_base()+0x25d0
0000aaaac9c2b618 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::~_Sp_counted_base()+0x2178
0000aaaac9c39d74 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::~_Sp_counted_base()+0x108d4
0000aaaac9c3a75c std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::~_Sp_counted_base()+0x112bc
0000aaaac9ced6c4 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_
0000ffffb9e44df8 S_SbtUnimplementedInstruction+0x2542b4
0000ffffb9e4472c S_SbtUnimplementedInstruction+0x253be8
0000ffffb9e45238 S_SbtUnimplementedInstruction+0x2546f4
0000ffffb9e3ca90 S_SbtUnimplementedInstruction+0x24bf4c
0000ffffb9e395dc S_SbtUnimplementedInstruction+0x248a98
0000ffffb9ed8ddc S_SbtUnimplementedInstruction+0x2e8298
0000ffffb9e38e44 S_SbtUnimplementedInstruction+0x248300
0000ffffb9e38b98 S_SbtUnimplementedInstruction+0x248054
0000ffffb9e38604 S_SbtUnimplementedInstruction+0x247ac0
0000ffffb9e38ffc S_SbtUnimplementedInstruction+0x2484b8
0000ffffbdb248a4 CallGuestFunction+0x84
0000ffffbdb1f964 Sbt::Dispatcher::SimulateCpu(Sbt::GuestCtx*)+0x2c
0000ffffbdb20d9c Sbt::RuntimeImpl::SimulateCpu(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)+0x3c8
0000ffffbdb219e4 Sbt::SimulateCpu(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)+0x30
0000ffffbdb22c04 SbtRtSimulateCpu+0x84
0000aaaac9c42164 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::~_Sp_counted_base()+0x18cc4
0000aaaac9c3fe34 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::~_Sp_counted_base()+0x16994
Process: 24 - sqlservr
Thread: 28 (application thread 0x4)
Instance Id: 76bd6c34-28e2-4a7f-9e5a-f3ffa17d9c1a
Crash Id: a022551e-96fe-4a59-ada3-4da01d244653
Build stamp: 06cd67626d2ebedd8721dc1bd892cdda65157cdcd6ac004bb81acdd6498ec618
Distribution: Ubuntu 18.04.6 LTS aarch64
Processors: 5
Total Memory: 8232747008 bytes
Timestamp: Fri Jul 29 11:00:40 2022
Last errno: 2
Last errno text: No such file or directory

PiHole with Recursive DNS not Handshaking with Wireguard setup via PiVPN

Expected Behaviour:
I've set my router to use my Pi-hole, along with WireGuard so I can use it as a VPN. I set it up using PiVPN and some tutorials on YouTube. I have included screenshots of my router and its setup, along with my WireGuard config files and setup. My Pi-hole is set up to use recursive DNS; I have set up DDNS with my router, made sure to disable my router's built-in DHCP service, set the Pi-hole as my primary DNS, and reserved its address. My Pi-hole is working nicely, but none of my devices are connecting to the WireGuard VPN.
Actual Behaviour:
My phone/Mac should be handshaking with the VPN, but it's not. My Pi-hole is working correctly, but WireGuard is not.
I have been working on this for the better part of the day and am utterly at a loss; any help whatsoever would be greatly appreciated. Thanks!
The three main YouTube videos I used to help me set this up were:
For the Wireguard and Pi-Hole interaction
https://www.youtube.com/watch?v=DUpIOSbbvKk&t=595s
https://www.youtube.com/watch?v=lnYYmC-A4S0
For my Recursive Pi-Hole DNS server
https://www.youtube.com/watch?v=FnFtWsZ8IP0&t=939s
I did end up using PiVPN to set things up.
PI Ifconfig Results
pi@raspberrypi:~ $ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.155 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::xxx:xxxx:xxxx:xxxx prefixlen 64 scopeid 0x20<link>
ether xx:xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 8349 bytes 1644604 (1.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3440 bytes 943688 (921.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 1388 bytes 123499 (120.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1388 bytes 123499 (120.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wg0: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1420
inet 10.6.0.1 netmask 255.255.255.0 destination 10.6.0.1
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 58 overruns 0 carrier 0 collisions 0
wlan0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether xx:xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Debug Token:
*** [ INITIALIZING ]
[i] 2022-01-30:11:31:18 debug log has been initialized.
[i] System has been running for 0 days, 0 hours, 46 minutes
*** [ INITIALIZING ] Sourcing setup variables
[i] Sourcing /etc/pihole/setupVars.conf...
*** [ DIAGNOSING ]: Core version
[i] Core: v5.8.1 (https://discourse.pi-hole.net/t/how-do-i-update-pi-hole/249)
[i] Remotes: origin https://github.com/pi-hole/pi-hole.git (fetch)
origin https://github.com/pi-hole/pi-hole.git (push)
[i] Branch: master
[i] Commit: v5.8.1-0-g875ad04
*** [ DIAGNOSING ]: Web version
[i] Web: v5.10.1 (https://discourse.pi-hole.net/t/how-do-i-update-pi-hole/249)
[i] Remotes: origin https://github.com/pi-hole/AdminLTE.git (fetch)
origin https://github.com/pi-hole/AdminLTE.git (push)
[i] Branch: master
[i] Commit: v5.10.1-0-gcb7a866
*** [ DIAGNOSING ]: FTL version
[✓] FTL: v5.13
*** [ DIAGNOSING ]: lighttpd version
[i] 1.4.59
*** [ DIAGNOSING ]: php version
[i] 7.4.25
*** [ DIAGNOSING ]: Operating system
[i] dig return code: 0
[i] dig response: "Raspbian=9,10,11 Ubuntu=16,18,20,21 Debian=9,10,11 Fedora=33,34 CentOS=7,8"
[✓] Distro: Raspbian
[✓] Version: 11
*** [ DIAGNOSING ]: SELinux
[i] SELinux not detected
*** [ DIAGNOSING ]: FirewallD
[i] Firewalld service inactive
*** [ DIAGNOSING ]: Processor
[✓] armv7l
*** [ DIAGNOSING ]: Disk usage
Filesystem Size Used Avail Use% Mounted on
/dev/root 29G 1.6G 26G 6% /
devtmpfs 333M 0 333M 0% /dev
tmpfs 462M 1.1M 461M 1% /dev/shm
tmpfs 185M 716K 184M 1% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
/dev/mmcblk0p1 253M 50M 203M 20% /boot
tmpfs 93M 0 93M 0% /run/user/999
tmpfs 93M 0 93M 0% /run/user/1000
*** [ DIAGNOSING ]: Networking
[✓] IPv4 address(es) bound to the eth0 interface:
192.168.0.155/24
[✓] IPv6 address(es) bound to the eth0 interface:
fe80::a7c:c1a2:460f:f20b/64
[i] Default IPv4 gateway: 192.168.0.1
* Pinging 192.168.0.1...
[✓] Gateway responded.
*** [ DIAGNOSING ]: Ports in use
[✓] udp:0.0.0.0:53 is in use by pihole-FTL
udp:0.0.0.0:68 is in use by dhcpcd
udp:0.0.0.0:51820 is in use by <unknown>
udp:127.0.0.1:5335 is in use by unbound
udp:0.0.0.0:5353 is in use by avahi-daemon
udp:0.0.0.0:51038 is in use by avahi-daemon
[✓] udp:*:53 is in use by pihole-FTL
udp:*:51820 is in use by <unknown>
udp:*:5353 is in use by avahi-daemon
udp:*:37789 is in use by avahi-daemon
[✓] tcp:127.0.0.1:4711 is in use by pihole-FTL
[✓] tcp:0.0.0.0:80 is in use by lighttpd
[✓] tcp:0.0.0.0:53 is in use by pihole-FTL
tcp:0.0.0.0:22 is in use by sshd
tcp:127.0.0.1:5335 is in use by unbound
tcp:127.0.0.1:8953 is in use by unbound
[✓] tcp:[::1]:4711 is in use by pihole-FTL
[✓] tcp:[::]:80 is in use by lighttpd
[✓] tcp:[::]:53 is in use by pihole-FTL
tcp:[::]:22 is in use by sshd
*** [ DIAGNOSING ]: Name resolution (IPv4) using a random blocked domain and a known ad-serving domain
[✓] mail.chileexe77.com is 0.0.0.0 on lo (127.0.0.1)
[✓] mail.chileexe77.com is 0.0.0.0 on eth0 (192.168.0.155)
[✓] No IPv4 address available on wlan0
[✓] mail.chileexe77.com is 0.0.0.0 on wg0 (10.6.0.1)
[✓] doubleclick.com is 172.217.15.238 via a remote, public DNS server (8.8.8.8)
*** [ DIAGNOSING ]: Name resolution (IPv6) using a random blocked domain and a known ad-serving domain
[✓] file.firefoxupdata.com is :: on lo (::1)
[✓] file.firefoxupdata.com is :: on eth0 (fe80::a7c:c1a2:460f:f20b)
[✓] No IPv6 address available on wlan0
[✓] No IPv6 address available on wg0
[✗] Failed to resolve doubleclick.com via a remote, public DNS server (2001:4860:4860::8888)
*** [ DIAGNOSING ]: Discovering active DHCP servers (takes 10 seconds)
Scanning all your interfaces for DHCP servers
Timeout: 10 seconds
WARN: Could not sendto() in send_dhcp_discover() (/__w/FTL/FTL/src/dhcp-discover.c:233): Operation not permitted
DHCP packets received on interface wlan0: 0
DHCP packets received on interface eth0: 0
DHCP packets received on interface lo: 0
*** [ DIAGNOSING ]: Pi-hole processes
[✓] lighttpd daemon is active
[✓] pihole-FTL daemon is active
*** [ DIAGNOSING ]: Pi-hole-FTL full status
● pihole-FTL.service - LSB: pihole-FTL daemon
Loaded: loaded (/etc/init.d/pihole-FTL; generated)
Active: active (exited) since Sun 2022-01-30 10:44:27 MST; 47min ago
Docs: man:systemd-sysv-generator(8)
Process: 637 ExecStart=/etc/init.d/pihole-FTL start (code=exited, status=0/SUCCESS)
CPU: 143ms
Jan 30 10:44:24 raspberrypi systemd[1]: Starting LSB: pihole-FTL daemon...
Jan 30 10:44:25 raspberrypi pihole-FTL[637]: Not running
Jan 30 10:44:25 raspberrypi su[665]: (to pihole) root on none
Jan 30 10:44:25 raspberrypi su[665]: pam_unix(su:session): session opened for user pihole(uid=999) by (uid=0)
Jan 30 10:44:27 raspberrypi pihole-FTL[738]: FTL started!
Jan 30 10:44:27 raspberrypi systemd[1]: Started LSB: pihole-FTL daemon.
*** [ DIAGNOSING ]: Setup variables
PIHOLE_INTERFACE=eth0
IPV4_ADDRESS=192.168.0.155/24
IPV6_ADDRESS=
QUERY_LOGGING=true
INSTALL_WEB_SERVER=true
INSTALL_WEB_INTERFACE=true
LIGHTTPD_ENABLED=true
CACHE_SIZE=10000
BLOCKING_ENABLED=true
PIHOLE_DNS_1=127.0.0.1#5335
DNS_FQDN_REQUIRED=true
DNS_BOGUS_PRIV=true
DNSSEC=false
REV_SERVER=false
DNSMASQ_LISTENING=local
*** [ DIAGNOSING ]: Dashboard and block page
[✗] Block page X-Header: X-Header does not match or could not be retrieved.
HTTP/1.1 200 OK
Content-type: text/html; charset=UTF-8
Expires: Sun, 30 Jan 2022 18:31:35 GMT
Cache-Control: max-age=0
Date: Sun, 30 Jan 2022 18:31:35 GMT
Server: lighttpd/1.4.59
[✓] Web interface X-Header: X-Pi-hole: The Pi-hole Web interface is working!
*** [ DIAGNOSING ]: Gravity Database
-rw-rw-r-- 1 pihole pihole 220K Jan 30 03:21 /etc/pihole/gravity.db
*** [ DIAGNOSING ]: Info table
property value
-------------------- ----------------------------------------
version 15
updated 1643538072
gravity_count 2046
Last gravity run finished at: Sun 30 Jan 2022 03:21:12 AM MST
----- First 10 Gravity Domains -----
advanbusiness.com
aoldaily.com
aolon1ine.com
applesoftupdate.com
arrowservice.net
attnpower.com
aunewsonline.com
avvmail.com
bigdepression.net
bigish.net
*** [ DIAGNOSING ]: Groups
id enabled name date_added date_modified description
---- ------- -------------------------------------------------- ------------------- ------------------- --------------------------------------------------
0 1 Default 2022-01-30 01:54:48 2022-01-30 01:54:48 The default group
*** [ DIAGNOSING ]: Domainlist (0/1 = exact white-/blacklist, 2/3 = regex white-/blacklist)
*** [ DIAGNOSING ]: Clients
*** [ DIAGNOSING ]: Adlists
id enabled group_ids address date_added date_modified comment
----- ------- ------------ ---------------------------------------------------------------------------------------------------- ------------------- ------------------- --------------------------------------------------
2 1 0 http://www.malwaredomainlist.com/hostslist/hosts.txt 2022-01-30 02:05:09 2022-01-30 02:05:09
*** [ DIAGNOSING ]: contents of /etc/pihole
-rw-r--r-- 1 root root 0 Jan 30 01:54 /etc/pihole/custom.list
-rw-r--r-- 1 root root 65 Jan 30 03:21 /etc/pihole/local.list
-rw-r--r-- 1 root root 234 Jan 30 01:54 /etc/pihole/logrotate
/var/log/pihole.log {
su root root
daily
copytruncate
rotate 5
compress
delaycompress
notifempty
nomail
}
/var/log/pihole-FTL.log {
su root root
weekly
copytruncate
rotate 3
compress
delaycompress
notifempty
nomail
}
-rw-rw-r-- 1 pihole root 127 Jan 30 01:54 /etc/pihole/pihole-FTL.conf
PRIVACYLEVEL=0
*** [ DIAGNOSING ]: contents of /etc/dnsmasq.d
-rw-r--r-- 1 root root 1.4K Jan 30 02:14 /etc/dnsmasq.d/01-pihole.conf
addn-hosts=/etc/pihole/local.list
addn-hosts=/etc/pihole/custom.list
localise-queries
no-resolv
cache-size=10000
log-queries
log-facility=/var/log/pihole.log
log-async
server=127.0.0.1#5335
domain-needed
expand-hosts
bogus-priv
local-service
-rw-r--r-- 1 root root 38 Jan 30 02:14 /etc/dnsmasq.d/02-pivpn.conf
addn-hosts=/etc/pivpn/hosts.wireguard
-rw-r--r-- 1 root root 2.2K Jan 30 01:54 /etc/dnsmasq.d/06-rfc6761.conf
server=/test/
server=/localhost/
server=/invalid/
server=/bind/
server=/onion/
*** [ DIAGNOSING ]: contents of /etc/lighttpd
-rw-r--r-- 1 root root 0 Jan 30 01:54 /etc/lighttpd/external.conf
-rw-r--r-- 1 root root 3.7K Jan 30 01:54 /etc/lighttpd/lighttpd.conf
server.modules = (
"mod_access",
"mod_accesslog",
"mod_auth",
"mod_expire",
"mod_redirect",
"mod_setenv",
"mod_rewrite"
)
server.document-root = "/var/www/html"
server.error-handler-404 = "/pihole/index.php"
server.upload-dirs = ( "/var/cache/lighttpd/uploads" )
server.errorlog = "/var/log/lighttpd/error.log"
server.pid-file = "/run/lighttpd.pid"
server.username = "www-data"
server.groupname = "www-data"
server.port = 80
accesslog.filename = "/var/log/lighttpd/access.log"
accesslog.format = "%{%s}t|%V|%r|%s|%b"
index-file.names = ( "index.php", "index.html", "index.lighttpd.html" )
url.access-deny = ( "~", ".inc", ".md", ".yml", ".ini" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi" )
mimetype.assign = (
".ico" => "image/x-icon",
".jpeg" => "image/jpeg",
".jpg" => "image/jpeg",
".png" => "image/png",
".svg" => "image/svg+xml",
".css" => "text/css; charset=utf-8",
".html" => "text/html; charset=utf-8",
".js" => "text/javascript; charset=utf-8",
".json" => "application/json; charset=utf-8",
".map" => "application/json; charset=utf-8",
".txt" => "text/plain; charset=utf-8",
".eot" => "application/vnd.ms-fontobject",
".otf" => "font/otf",
".ttc" => "font/collection",
".ttf" => "font/ttf",
".woff" => "font/woff",
".woff2" => "font/woff2"
)
include_shell "cat external.conf 2>/dev/null"
include_shell "/usr/share/lighttpd/use-ipv6.pl " + server.port
include_shell "find /etc/lighttpd/conf-enabled -name '*.conf' -a ! -name 'letsencrypt.conf' -printf 'include \"%p\"
' 2>/dev/null"
$HTTP["url"] =~ "^/admin/" {
setenv.add-response-header = (
"X-Pi-hole" => "The Pi-hole Web interface is working!",
"X-Frame-Options" => "DENY"
)
}
$HTTP["url"] =~ "^/admin/\.(.*)" {
url.access-deny = ("")
}
$HTTP["url"] =~ "/(teleporter|api_token)\.php$" {
$HTTP["referer"] =~ "/admin/settings\.php" {
setenv.add-response-header = ( "X-Frame-Options" => "SAMEORIGIN" )
}
}
expire.url = ( "" => "access plus 0 seconds" )
*** [ DIAGNOSING ]: contents of /etc/cron.d
-rw-r--r-- 1 root root 1.8K Jan 30 01:54 /etc/cron.d/pihole
21 3 * * 7 root PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updateGravity >/var/log/pihole_updateGravity.log || cat /var/log/pihole_updateGravity.log
00 00 * * * root PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole flush once quiet
#reboot root /usr/sbin/logrotate --state /var/lib/logrotate/pihole /etc/pihole/logrotate
*/10 * * * * root PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker local
34 16 * * * root PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker remote
#reboot root PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker remote reboot
*** [ DIAGNOSING ]: contents of /var/log/lighttpd
-rw-r--r-- 1 www-data www-data 770 Jan 30 10:44 /var/log/lighttpd/error.log
-----head of error.log------
2022-01-30 01:53:27: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 01:54:35: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 01:54:35: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 01:59:39: Wrong token! Please re-login on the Pi-hole dashboard.
2022-01-30 02:06:15: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 02:06:53: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 02:17:36: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 02:18:02: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 09:17:22: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 10:44:02: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 10:44:25: server.c.1513) server started (lighttpd/1.4.59)
-----tail of error.log------
2022-01-30 01:53:27: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 01:54:35: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 01:54:35: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 01:59:39: Wrong token! Please re-login on the Pi-hole dashboard.
2022-01-30 02:06:15: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 02:06:53: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 02:17:36: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 02:18:02: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 09:17:22: server.c.1513) server started (lighttpd/1.4.59)
2022-01-30 10:44:02: server.c.1976) server stopped by UID = 0 PID = 1
2022-01-30 10:44:25: server.c.1513) server started (lighttpd/1.4.59)
*** [ DIAGNOSING ]: contents of /var/log
-rw-r--r-- 1 pihole pihole 55K Jan 30 11:00 /var/log/pihole-FTL.log
-----head of pihole-FTL.log------
[2022-01-30 01:54:42.959 11980M] Using log file /var/log/pihole-FTL.log
[2022-01-30 01:54:42.959 11980M] ########## FTL started on raspberrypi! ##########
[2022-01-30 01:54:42.959 11980M] FTL branch: master
[2022-01-30 01:54:42.959 11980M] FTL version: v5.13
[2022-01-30 01:54:42.959 11980M] FTL commit: b197b69
[2022-01-30 01:54:42.959 11980M] FTL date: 2022-01-05 18:19:34 +0000
[2022-01-30 01:54:42.959 11980M] FTL user: pihole
[2022-01-30 01:54:42.959 11980M] Compiled for armv7hf (compiled on CI) using arm-linux-gnueabihf-gcc (Debian 6.3.0-18) 6.3.0 20170516
[2022-01-30 01:54:42.959 11980M] Creating mutex
[2022-01-30 01:54:42.959 11980M] Creating mutex
[2022-01-30 01:54:42.961 11980M] Starting config file parsing (/etc/pihole/pihole-FTL.conf)
[2022-01-30 01:54:42.961 11980M] SOCKET_LISTENING: only local
[2022-01-30 01:54:42.961 11980M] AAAA_QUERY_ANALYSIS: Show AAAA queries
[2022-01-30 01:54:42.961 11980M] MAXDBDAYS: max age for stored queries is 365 days
[2022-01-30 01:54:42.961 11980M] RESOLVE_IPV6: Resolve IPv6 addresses
[2022-01-30 01:54:42.961 11980M] RESOLVE_IPV4: Resolve IPv4 addresses
[2022-01-30 01:54:42.962 11980M] DBINTERVAL: saving to DB file every minute
[2022-01-30 01:54:42.962 11980M] DBFILE: Using /etc/pihole/pihole-FTL.db
[2022-01-30 01:54:42.962 11980M] MAXLOGAGE: Importing up to 24.0 hours of log data
[2022-01-30 01:54:42.962 11980M] PRIVACYLEVEL: Set to 0
[2022-01-30 01:54:42.962 11980M] IGNORE_LOCALHOST: Show queries from localhost
[2022-01-30 01:54:42.962 11980M] BLOCKINGMODE: Null IPs for blocked domains
[2022-01-30 01:54:42.962 11980M] ANALYZE_ONLY_A_AND_AAAA: Disabled. Analyzing all queries
[2022-01-30 01:54:42.962 11980M] DBIMPORT: Importing history from database
[2022-01-30 01:54:42.962 11980M] PIDFILE: Using /run/pihole-FTL.pid
[2022-01-30 01:54:42.962 11980M] PORTFILE: Using /run/pihole-FTL.port
[2022-01-30 01:54:42.962 11980M] SOCKETFILE: Using /run/pihole/FTL.sock
[2022-01-30 01:54:42.962 11980M] SETUPVARSFILE: Using /etc/pihole/setupVars.conf
[2022-01-30 01:54:42.962 11980M] MACVENDORDB: Using /etc/pihole/macvendor.db
[2022-01-30 01:54:42.962 11980M] GRAVITYDB: Using /etc/pihole/gravity.db
[2022-01-30 01:54:42.962 11980M] PARSE_ARP_CACHE: Active
[2022-01-30 01:54:42.962 11980M] CNAME_DEEP_INSPECT: Active
[2022-01-30 01:54:42.963 11980M] DELAY_STARTUP: No delay requested.
[2022-01-30 01:54:42.963 11980M] BLOCK_ESNI: Enabled, blocking _esni.{blocked domain}
[2022-01-30 01:54:42.963 11980M] NICE: Set process niceness to -10 (default)
-----tail of pihole-FTL.log------
[2022-01-30 10:44:26.702 738M] ADDR2LINE: Enabled
[2022-01-30 10:44:26.702 738M] REPLY_WHEN_BUSY: Permit queries when the database is busy
[2022-01-30 10:44:26.702 738M] BLOCK_TTL: 2 seconds
[2022-01-30 10:44:26.702 738M] BLOCK_ICLOUD_PR: Enabled
[2022-01-30 10:44:26.702 738M] CHECK_LOAD: Enabled
[2022-01-30 10:44:26.702 738M] CHECK_SHMEM: Warning if shared-memory usage exceeds 90%
[2022-01-30 10:44:26.702 738M] CHECK_DISK: Warning if certain disk usage exceeds 90%
[2022-01-30 10:44:26.702 738M] Finished config file parsing
[2022-01-30 10:44:26.707 738M] Database version is 9
[2022-01-30 10:44:26.708 738M] Resizing "FTL-strings" from 40960 to (81920 * 1) == 81920 (/dev/shm: 1.1MB used, 483.8MB total, FTL uses 1.1MB)
[2022-01-30 10:44:26.710 738M] Imported 0 alias-clients
[2022-01-30 10:44:26.710 738M] Database successfully initialized
[2022-01-30 10:44:27.558 738M] New upstream server: 127.0.0.1:5335 (0/256)
[2022-01-30 10:44:27.570 738M] Imported 207 queries from the long-term database
[2022-01-30 10:44:27.571 738M] -> Total DNS queries: 207
[2022-01-30 10:44:27.571 738M] -> Cached DNS queries: 67
[2022-01-30 10:44:27.571 738M] -> Forwarded DNS queries: 140
[2022-01-30 10:44:27.571 738M] -> Blocked DNS queries: 0
[2022-01-30 10:44:27.571 738M] -> Unknown DNS queries: 0
[2022-01-30 10:44:27.571 738M] -> Unique domains: 44
[2022-01-30 10:44:27.571 738M] -> Unique clients: 5
[2022-01-30 10:44:27.572 738M] -> Known forward destinations: 1
[2022-01-30 10:44:27.572 738M] Successfully accessed setupVars.conf
[2022-01-30 10:44:27.579 738M] listening on 0.0.0.0 port 53
[2022-01-30 10:44:27.579 738M] listening on :: port 53
[2022-01-30 10:44:27.586 741M] PID of FTL process: 741
[2022-01-30 10:44:27.588 741/T742] Listening on port 4711 for incoming IPv4 telnet connections
[2022-01-30 10:44:27.589 741M] INFO: FTL is running as user pihole (UID 999)
[2022-01-30 10:44:27.589 741/T744] Listening on Unix socket
[2022-01-30 10:44:27.591 741/T743] Listening on port 4711 for incoming IPv6 telnet connections
[2022-01-30 10:44:27.603 741M] Reloading DNS cache
[2022-01-30 10:44:28.601 741/T745] Compiled 0 whitelist and 0 blacklist regex filters for 5 clients in 2.7 msec
[2022-01-30 10:44:29.597 741M] Blocking status is enabled
[2022-01-30 11:00:01.881 741/T747] SQLite3 message: database is locked in "SELECT name FROM network_addresses WHERE name IS NOT NULL AND ip = ?;" (5)
[2022-01-30 11:00:01.881 741/T747] getNameFromIP("192.168.0.128") - SQL error prepare: database is locked
*** [ DIAGNOSING ]: contents of /dev/shm
-rw------- 1 pihole pihole 668K Jan 30 11:31 /dev/shm/FTL-clients
-rw------- 1 pihole pihole 240 Jan 30 10:44 /dev/shm/FTL-counters
-rw------- 1 pihole pihole 4.0K Jan 30 10:44 /dev/shm/FTL-dns-cache
-rw------- 1 pihole pihole 4.0K Jan 30 10:44 /dev/shm/FTL-domains
-rw------- 1 pihole pihole 56 Jan 30 10:44 /dev/shm/FTL-lock
-rw------- 1 pihole pihole 12K Jan 30 10:44 /dev/shm/FTL-overTime
-rw------- 1 pihole pihole 4.0K Jan 30 10:44 /dev/shm/FTL-per-client-regex
-rw------- 1 pihole pihole 176K Jan 30 10:44 /dev/shm/FTL-queries
-rw------- 1 pihole pihole 12 Jan 30 10:44 /dev/shm/FTL-settings
-rw------- 1 pihole pihole 80K Jan 30 10:44 /dev/shm/FTL-strings
-rw------- 1 pihole pihole 156K Jan 30 10:44 /dev/shm/FTL-upstreams
*** [ DIAGNOSING ]: contents of /etc
-rw-r--r-- 1 root root 24 Jan 30 01:54 /etc/dnsmasq.conf
conf-dir=/etc/dnsmasq.d
-rw-r--r-- 1 root root 47 Jan 30 10:44 /etc/resolv.conf
nameserver 127.0.0.1
*** [ DIAGNOSING ]: Pi-hole diagnosis messages
*** [ DIAGNOSING ]: Locale
LANG=en_US.UTF-8
*** [ DIAGNOSING ]: Pi-hole log
-rw-r--r-- 1 pihole pihole 88K Jan 30 11:31 /var/log/pihole.log
-----head of pihole.log------
Jan 30 01:54:48 dnsmasq[11982]: started, version pi-hole-2.87test4-18 cachesize 10000
Jan 30 01:54:48 dnsmasq[11982]: DNS service limited to local subnets
Jan 30 01:54:48 dnsmasq[11982]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n IDN DHCP DHCPv6 Lua TFTP no-conntrack ipset no-nftset auth cryptohash DNSSEC loop-detect inotify dumpfile
Jan 30 01:54:48 dnsmasq[11982]: using nameserver 127.0.0.1#5335
Jan 30 01:54:48 dnsmasq[11982]: using nameserver 127.0.0.1#5335
Jan 30 01:54:48 dnsmasq[11982]: using only locally-known addresses for onion
Jan 30 01:54:48 dnsmasq[11982]: using only locally-known addresses for bind
Jan 30 01:54:48 dnsmasq[11982]: using only locally-known addresses for invalid
Jan 30 01:54:48 dnsmasq[11982]: using only locally-known addresses for localhost
Jan 30 01:54:48 dnsmasq[11982]: using only locally-known addresses for test
Jan 30 01:54:48 dnsmasq[11982]: read /etc/hosts - 5 addresses
Jan 30 01:54:48 dnsmasq[11982]: read /etc/pihole/custom.list - 0 addresses
Jan 30 01:54:48 dnsmasq[11982]: failed to load names from /etc/pihole/local.list: No such file or directory
Jan 30 02:00:04 dnsmasq[11982]: exiting on receipt of SIGTERM
Jan 30 02:00:07 dnsmasq[13277]: started, version pi-hole-2.87test4-18 cachesize 10000
Jan 30 02:00:07 dnsmasq[13277]: DNS service limited to local subnets
Jan 30 02:00:07 dnsmasq[13277]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n IDN DHCP DHCPv6 Lua TFTP no-conntrack ipset no-nftset auth cryptohash DNSSEC loop-detect inotify dumpfile
Jan 30 02:00:07 dnsmasq[13277]: using nameserver 127.0.0.1#5335
Jan 30 02:00:07 dnsmasq[13277]: using only locally-known addresses for onion
Jan 30 02:00:07 dnsmasq[13277]: using only locally-known addresses for bind
-----tail of pihole.log------
Jan 30 11:31:20 dnsmasq[741]: query[AAAA] ns1.pi-hole.net from 127.0.0.1
Jan 30 11:31:20 dnsmasq[741]: forwarded ns1.pi-hole.net to 127.0.0.1
Jan 30 11:31:20 dnsmasq[741]: reply ns1.pi-hole.net is 205.251.193.151
Jan 30 11:31:20 dnsmasq[741]: reply ns1.pi-hole.net is 2600:9000:5301:9700::1
Jan 30 11:31:22 dnsmasq[741]: query[A] mail.chileexe77.com from 127.0.0.1
Jan 30 11:31:22 dnsmasq[741]: gravity blocked mail.chileexe77.com is 0.0.0.0
Jan 30 11:31:22 dnsmasq[741]: query[A] mail.chileexe77.com from 192.168.0.155
Jan 30 11:31:22 dnsmasq[741]: gravity blocked mail.chileexe77.com is 0.0.0.0
Jan 30 11:31:22 dnsmasq[741]: query[A] mail.chileexe77.com from 10.6.0.1
Jan 30 11:31:22 dnsmasq[741]: gravity blocked mail.chileexe77.com is 0.0.0.0
Jan 30 11:31:22 dnsmasq[741]: query[PTR] 155.0.168.192.in-addr.arpa from 127.0.0.1
Jan 30 11:31:23 dnsmasq[741]: config 155.0.168.192.in-addr.arpa is <PTR>
Jan 30 11:31:23 dnsmasq[741]: query[PTR] 1.0.6.10.in-addr.arpa from 127.0.0.1
Jan 30 11:31:23 dnsmasq[741]: config 1.0.6.10.in-addr.arpa is <PTR>
Jan 30 11:31:23 dnsmasq[741]: query[AAAA] file.firefoxupdata.com from ::1
Jan 30 11:31:23 dnsmasq[741]: gravity blocked file.firefoxupdata.com is ::
Jan 30 11:31:23 dnsmasq[741]: query[AAAA] file.firefoxupdata.com from fe80::a7c:c1a2:460f:f20b
Jan 30 11:31:23 dnsmasq[741]: gravity blocked file.firefoxupdata.com is ::
Jan 30 11:31:24 dnsmasq[741]: query[PTR] b.0.2.f.f.0.6.4.2.a.1.c.c.7.a.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa from 127.0.0.1
Jan 30 11:31:24 dnsmasq[741]: config b.0.2.f.f.0.6.4.2.a.1.c.c.7.a.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa is <PTR>
********************************************
********************************************
[✓] ** FINISHED DEBUGGING! **

Paragraph execution in Zeppelin goes to pending state after some time and the Hadoop application status for Zeppelin is FINISHED

I have been using Zeppelin for the last 3 months and noticed this strange problem recently. Every morning I have to restart Zeppelin for it to work; otherwise paragraph execution goes to the pending state and never runs. I tried to dig deeper to find the problem. The state of the Zeppelin application in YARN is FINISHED. I checked the log and it shows the error below, but I couldn't make anything of it.
2017-06-28 22:04:08,986 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56876 for container-id container_1498627544571_0001_01_000002: 1.2 GB of 4 GB physical memory used; 4.0 GB of 20 GB virtual memory used
2017-06-28 22:04:08,995 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56787 for container-id container_1498627544571_0001_01_000001: 330.2 MB of 1 GB physical memory used; 1.4 GB of 5 GB virtual memory used
2017-06-28 22:04:09,964 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1498627544571_0001_01_000002 is : 1
2017-06-28 22:04:09,965 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1498627544571_0001_01_000002 and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2017-06-28 22:04:09,972 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-06-28 22:04:09,972 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1498627544571_0001_01_000002
2017-06-28 22:04:09,972 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
I am the only user in that environment, and no one else is using it. There isn't any other process running at that time either. I can't understand why this is happening.

Flink Streaming Job Fails Automatically

I am running a Flink streaming job with parallelism 1.
Suddenly, after 8 hours, the job failed. It showed:
Association with remote system [akka.tcp://flink@192.168.3.153:44863] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
2017-04-12 00:48:36,683 INFO org.apache.flink.yarn.YarnJobManager - Container container_e35_1491556562442_5086_01_000002 is completed with diagnostics: Container [pid=64750,containerID=container_e35_1491556562442_5086_01_000002] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 2.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e35_1491556562442_5086_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 64750 64748 64750 64750 (bash) 0 0 108654592 306 /bin/bash -c /usr/java/jdk1.7.0_67-cloudera/bin/java -Xms724m -Xmx724m -XX:MaxDirectMemorySize=1448m -Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native/ -Dlog.file=/var/log/hadoop-yarn/container/application_1491556562442_5086/container_e35_1491556562442_5086_01_000002/taskmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnTaskManagerRunner --configDir . 1> /var/log/hadoop-yarn/container/application_1491556562442_5086/container_e35_1491556562442_5086_01_000002/taskmanager.out 2> /var/log/hadoop-yarn/container/application_1491556562442_5086/container_e35_1491556562442_5086_01_000002/taskmanager.err
|- 64756 64750 64750 64750 (java) 269053 57593 2961149952 524252 /usr/java/jdk1.7.0_67-cloudera/bin/java -Xms724m -Xmx724m -XX:MaxDirectMemorySize=1448m -Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native/ -Dlog.file=/var/log/hadoop-yarn/container/application_1491556562442_5086/container_e35_1491556562442_5086_01_000002/taskmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnTaskManagerRunner --configDir .
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
There are no application/code-side errors.
I need help understanding what the cause could be.
The job is killed because it exceeds the memory limits set in YARN.
See this part of your error message:
Container [pid=64750,containerID=container_e35_1491556562442_5086_01_000002] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 2.9 GB of 4.2 GB virtual memory used. Killing container.
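If this is a Flink 1.x job submitted to YARN (which the container paths above suggest), one common mitigation is simply to request a larger TaskManager container when starting the job, so that the JVM heap plus off-heap/direct memory stays under the YARN limit. A sketch using the Flink 1.x CLI options (the memory value and jar path are placeholders):
flink run -m yarn-cluster -yn 1 -ytm 4096 path/to/your-streaming-job.jar
Alternatively, keep the container size and reduce the job's memory footprint (state size, network buffers, direct memory used by connectors).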

Restart an MPI slave after checkpoint before failure on ARMv6

UPDATE
I have a university project in which I should build up a cluster with RPis.
We now have a fully functional system with BLCR/MPICH on it.
BLCR works very well with normal processes linked against the library.
The demonstrations we have to show from our management web interface are:
parallel execution of a job
migration of processes across the nodes
fault tolerance with MPI
We are allowed to use the simplest computations.
The first one we got easily, with MPI too. The second point we currently have working only with normal processes (without MPI). Regarding the third point, I have little idea how to implement a master-slave MPI scheme in which I can restart a slave process; this also affects point two, because we should/can/have to checkpoint the slave process, kill/stop it, and restart it on another node. I know that I have to handle the MPI errors myself, but how do I restore the process? It would be nice if someone could at least post a link or a paper (with explanations).
Thanks in advance
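For context, the usual first step toward handling a dead slave inside the program is to make MPI errors non-fatal on the communicator via MPI_Comm_set_errhandler. Below is a minimal sketch assuming MPICH and MPI_ERRORS_RETURN; it only detects the failure and does not by itself restore the killed rank, and many MPI implementations may still abort the whole job when a peer process dies, so real recovery needs support outside MPI, such as restarting from the BLCR checkpoint externally.

/* sketch.c: detect a failed slave instead of aborting (illustrative only) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, rc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Default is MPI_ERRORS_ARE_FATAL: any failure aborts the whole job.
     * With MPI_ERRORS_RETURN, calls return an error code instead. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0) {
        int dummy = 0;
        /* If the slave (rank 1) has died, this may return an error code
         * that the master can inspect and react to. */
        rc = MPI_Send(&dummy, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "master: send to slave failed: %s\n", msg);
            /* recovery logic (restart slave from checkpoint) would go here */
        }
    } else if (rank == 1) {
        int dummy;
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}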
UPDATE:
As written earlier, our BLCR+MPICH setup works, or seems to.
But... when I start MPI processes, checkpointing seems to work well.
Here is the proof:
... snip ...
Benchmarking: dynamic_5: md5($s.$p.$s) [32/32 128x1 (MD5_Body)]... DONE
Many salts: 767744 c/s real, 767744 c/s virtual
Only one salt: 560896 c/s real, 560896 c/s virtual
Benchmarking: dynamic_5: md5($s.$p.$s) [32/32 128x1 (MD5_Body)]... [proxy:0:0@node2] requesting checkpoint
[proxy:0:0@node2] checkpoint completed
[proxy:0:1@node1] requesting checkpoint
[proxy:0:1@node1] checkpoint completed
[proxy:0:2@node3] requesting checkpoint
[proxy:0:2@node3] checkpoint completed
... snip ...
If I kill one slave process on any node, I get this:
... snip ...
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
... snip ...
That is OK, because we have a checkpoint, so we can restart our application.
But it doesn't work:
pi 7380 0.0 0.2 2984 1012 pts/4 S+ 16:38 0:00 mpiexec -ckpointlib blcr -ckpoint-prefix /tmp -ckpoint-num 0 -f /tmp/machinefile -n 3
pi 7381 0.1 0.5 5712 2464 ? Ss 16:38 0:00 /usr/bin/ssh -x 192.168.42.101 "/usr/local/bin/mpich/bin/hydra_pmi_proxy" --control-port masterpi:47698 --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
pi 7382 0.1 0.5 5712 2464 ? Ss 16:38 0:00 /usr/bin/ssh -x 192.168.42.102 "/usr/local/bin/mpich/bin/hydra_pmi_proxy" --control-port masterpi:47698 --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
pi 7383 0.1 0.5 5712 2464 ? Ss 16:38 0:00 /usr/bin/ssh -x 192.168.42.105 "/usr/local/bin/mpich/bin/hydra_pmi_proxy" --control-port masterpi:47698 --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 2
pi 7438 0.0 0.1 3548 868 pts/1 S+ 16:40 0:00 grep --color=auto mpi
I don't know why, but the first time I restart the app, the process seems to be restarted on every node (I can see it with top or ps aux | grep "john"), but no output is shown on the management console/terminal. It just hangs after showing me:
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp -ckpoint-num 0 -f /tmp/machinefile -n 3
Warning: Permanently added '192.168.42.102' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.101' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.105' (ECDSA) to the list of known hosts.
My plan B is just to test with our own application whether the BLCR/MPICH stuff really works. Maybe there is some trouble with john.
Thanks in advance
UPDATE
Next problem, with a simple hello world. I'm slowly despairing. Maybe I'm just too confused.
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/ -ckpoint-interval 3 -f /tmp/machinefile -n 4 ./hello
Warning: Permanently added '192.168.42.102' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.105' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.101' (ECDSA) to the list of known hosts.
[proxy:0:0@node2] requesting checkpoint
[proxy:0:0@node2] checkpoint completed
[proxy:0:1@node1] requesting checkpoint
[proxy:0:1@node1] checkpoint completed
[proxy:0:2@node3] requesting checkpoint
[proxy:0:2@node3] checkpoint completed
[proxy:0:0@node2] requesting checkpoint
[proxy:0:0@node2] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:0@node2] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:0@node2] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@node2] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@node1] requesting checkpoint
[proxy:0:1@node1] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:1@node1] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:1@node1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@node1] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2@node3] requesting checkpoint
[proxy:0:2@node3] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:2@node3] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:2@node3] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@node3] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@masterpi] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec@masterpi] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@masterpi] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@masterpi] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
hello.c
/* C example: each rank busy-waits for a while (so there is time to take
 * checkpoints), then reports its hostname and PID.
 * Build with: mpicc -o hello hello.c */
#include <stdio.h>
#include <unistd.h>   /* gethostname(), getpid() */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    volatile long i;   /* volatile so the busy loop is not optimized away */
    char hostname[1024];

    hostname[1023] = '\0';
    gethostname(hostname, 1023);

    MPI_Init(&argc, &argv);                 /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */

    /* burn some CPU time so checkpoints can be requested while the ranks run */
    for (i = 0; i < 400000000L; i++) {
        /* empty busy loop */
    }

    printf("%s done...", hostname);
    printf("%s: %d is alive (rank %d of %d)\n", hostname, (int)getpid(), rank, size);

    MPI_Finalize();
    return 0;
}
