WireGuard keeps generating a new keypair on one client

Client 2 successfully maintains a WireGuard connection.
Client 1 repeatedly creates and destroys keypairs.
Both profiles work fine on Client 2 (Android, mobile connection).
Neither profile works properly on Client 1 (Windows, cloud VM), even though both did in the past.
I've restarted the wg0 interface. Continuous pings from Client 1 to the WireGuard server show no issues; I just can't load any pages, which I suspect is because the keypairs keep changing. No other device is using Client 1's profile. One odd thing is that two keypairs seem to be created at the same time, which is likely why they keep being destroyed.
How can I troubleshoot what is causing the keypair to be recreated so frequently?
2022-07-25 07:21:25.332: [TUN] [client1] Keypair 11 created for peer 1
2022-07-25 07:21:25.332: [TUN] [client1] Sending keepalive packet to peer 1 (:51820)
2022-07-25 07:21:25.518: [TUN] [client1] Receiving keepalive packet from peer 1 (:51820)
2022-07-25 07:21:41.666: [TUN] [client1] Receiving handshake initiation from peer 1 (:51820)
2022-07-25 07:21:41.666: [TUN] [client1] Sending handshake response to peer 1 (:51820)
2022-07-25 07:21:41.667: [TUN] [client1] Keypair 10 destroyed for peer 1
2022-07-25 07:21:41.667: [TUN] [client1] Keypair 12 created for peer 1
2022-07-25 07:21:41.882: [TUN] [client1] Receiving keepalive packet from peer 1 (:51820)
2022-07-25 07:22:28.060: [TUN] [client1] Receiving keepalive packet from peer 1 (:51820)
2022-07-25 07:22:52.214: [TUN] [client1] Retrying handshake with peer 1 (:51820) because we stopped hearing back after 15 seconds
2022-07-25 07:22:52.214: [TUN] [client1] Sending handshake initiation to peer 1 (:51820)
2022-07-25 07:22:52.385: [TUN] [client1] Receiving handshake response from peer 1 (:51820)
2022-07-25 07:22:52.385: [TUN] [client1] Keypair 11 destroyed for peer 1
2022-07-25 07:22:52.385: [TUN] [client1] Keypair 13 created for peer 1
2022-07-25 07:22:52.385: [TUN] [client1] Sending keepalive packet to peer 1 (:51820)
2022-07-25 07:22:52.571: [TUN] [client1] Receiving keepalive packet from peer 1 (:51820)
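One way to quantify the churn is to pull the keypair events out of the log and measure the gap between successive creations. Each creation marks a completed handshake. The Python sketch below assumes the log has been saved as plain text in exactly the WireGuard for Windows format shown above. The function names are hypothetical. For comparison, the WireGuard protocol normally rekeys roughly every 120 seconds (Rekey-After-Time in the WireGuard paper). Gaps much shorter than that, like the ~16-second one above, may indicate that handshakes are being forced by something else, such as the "stopped hearing back after 15 seconds" retries.

```python
import re
from datetime import datetime

# Matches keypair lines in the log format shown above, e.g.
# "2022-07-25 07:21:41.667: [TUN] [client1] Keypair 12 created for peer 1"
KEYPAIR_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[TUN\] \[[^\]]+\] "
    r"Keypair (?P<id>\d+) (?P<event>created|destroyed) for peer \d+"
)


def keypair_events(lines):
    """Yield (timestamp, event, keypair_id) for each keypair line; other lines are ignored."""
    for line in lines:
        m = KEYPAIR_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m["event"], int(m["id"])


def creation_intervals(lines):
    """Seconds between consecutive keypair creations (each creation = one completed handshake)."""
    created = [ts for ts, event, _ in keypair_events(lines) if event == "created"]
    return [(later - earlier).total_seconds() for earlier, later in zip(created, created[1:])]


if __name__ == "__main__":
    sample = [
        "2022-07-25 07:21:25.332: [TUN] [client1] Keypair 11 created for peer 1",
        "2022-07-25 07:21:41.667: [TUN] [client1] Keypair 10 destroyed for peer 1",
        "2022-07-25 07:21:41.667: [TUN] [client1] Keypair 12 created for peer 1",
        "2022-07-25 07:22:52.385: [TUN] [client1] Keypair 11 destroyed for peer 1",
        "2022-07-25 07:22:52.385: [TUN] [client1] Keypair 13 created for peer 1",
    ]
    print(creation_intervals(sample))  # [16.335, 70.718]
```

Running the same functions over a longer saved log shows whether the short gaps are the norm or occasional spikes.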

I ran into this exact issue (or a very similar one), so here is what worked for me:
If the client is unable to finish the handshake, I'd suggest trying to regenerate the keys, as suggested here:
https://serverfault.com/questions/1040165/wireguard-not-completing-handshake
This, however, did not solve my issue.
First, verify that the WAN IP address reported in the WireGuard server log matches the client's. Mine was off in the last octet, which is very strange.
My client is hosted on an ISP network that shares one public IP address among many households. So it's possible (speculation) that UDP packets were ending up on some other WireGuard instance. I can't tell exactly, as I have almost no visibility into the ISP's internal networking.
Second, check the client's WAN IP (for example with curl -s ipinfo.io/ip, or by searching "what is my IP" on Google). If it does not match, you may want to change the client's wg0.conf to listen on a port other than ListenPort = 51820. I adjusted it like so:
root@wg-client2:~# cat /etc/wireguard/wg0.conf
[Interface]
PrivateKey = ###MODERATED###
Address = ###MODERATED###/24
ListenPort = 51810
This client config change does not require any server config change.
I ran wg-quick down wg0 && wg-quick up wg0 on both the server and the client, and communication was then established (it's possible that only the client needs to be restarted).
To check the WireGuard kernel logs:
echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control
dmesg -wT | grep wireguard
To dump all clients on the server: wg show all dump
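The dump output is tab-separated and documented in wg(8). With `all`, every line starts with the interface name. Interface lines have five fields and peer lines have nine. Each peer line ends with latest-handshake (Unix time, 0 if never), transfer-rx, transfer-tx and persistent-keepalive.

The Python sketch below parses that output. It lists each peer's endpoint, which is the public IP:port the server last saw for that client, so it is what to compare against the client's own WAN IP. It also flags peers whose last handshake looks old. The helper names and the 180-second threshold are assumptions. An idle peer with no traffic will not re-handshake, so "stale" only means something for peers that should be active.

```python
import time


def parse_wg_dump(text):
    """Parse `wg show all dump` output into one dict per peer.

    With `all`, every line starts with the interface name: interface lines
    have 5 tab-separated fields, peer lines have 9 (see wg(8)).
    """
    peers = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # interface line or blank line
        (iface, public_key, _preshared, endpoint, allowed_ips,
         latest_handshake, rx, tx, keepalive) = fields
        peers.append({
            "interface": iface,
            "public_key": public_key,
            "endpoint": endpoint,                        # public IP:port last seen for the peer
            "allowed_ips": allowed_ips,
            "latest_handshake": int(latest_handshake),  # Unix time; 0 = never
            "rx_bytes": int(rx),
            "tx_bytes": int(tx),
            "persistent_keepalive": keepalive,           # "off" or seconds
        })
    return peers


def stale_peers(peers, max_age_s=180, now=None):
    """Peers that never completed a handshake or whose last one is older than max_age_s.

    The 180 s default is an assumption, a little above WireGuard's ~120 s rekey interval.
    """
    now = time.time() if now is None else now
    return [p for p in peers
            if p["latest_handshake"] == 0 or now - p["latest_handshake"] > max_age_s]


if __name__ == "__main__":
    # Made-up dump for illustration; on a real server use the output of `wg show all dump`.
    sample = (
        "wg0\tSERVER_PRIVATE_KEY\tSERVER_PUBLIC_KEY\t51820\toff\n"
        "wg0\tCLIENT1_PUBLIC_KEY\t(none)\t203.0.113.7:51810\t10.0.0.2/32\t1658733712\t1024\t2048\toff\n"
    )
    for p in parse_wg_dump(sample):
        print(p["public_key"], p["endpoint"], p["latest_handshake"])
```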
A good article on WireGuard logging: https://www.procustodibus.com/blog/2021/03/wireguard-logs/

I never came back with an update, but the issue turned out to be packets being fragmented somewhere along the path to the host. I believe the change was on their end, since it had worked for years and then suddenly stopped. I reduced the MTU in the WireGuard config and the problem was resolved.
Posting this in case someone else runs into something similar.
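The answer does not say which MTU value was chosen. As a rough guide, WireGuard adds a fixed overhead to every packet:

- the outer IP header (20 bytes for IPv4, 40 for IPv6),
- an 8-byte UDP header,
- a 16-byte data-message header,
- a 16-byte Poly1305 authentication tag.

That totals 60 or 80 bytes. The tunnel MTU, set with `MTU = ...` in the `[Interface]` section, therefore has to be at least that much below the path MTU. The Python sketch below does the arithmetic. The path MTU values in it are examples, not measurements from this case.

```python
# Bytes WireGuard adds around every inner packet.
OUTER_IP_HEADER = {4: 20, 6: 40}  # minimal outer IPv4 / IPv6 header
UDP_HEADER = 8
WG_DATA_HEADER = 16  # type (1) + reserved (3) + receiver index (4) + counter (8)
POLY1305_TAG = 16    # authentication tag appended to the encrypted payload


def wg_overhead(outer_ip_version):
    return OUTER_IP_HEADER[outer_ip_version] + UDP_HEADER + WG_DATA_HEADER + POLY1305_TAG


def max_tunnel_mtu(path_mtu, outer_ip_version=6):
    """Largest tunnel MTU whose encrypted packets still fit in path_mtu without fragmenting.

    Assuming an IPv6 outer header is the conservative choice: the result also fits over IPv4.
    """
    return path_mtu - wg_overhead(outer_ip_version)


if __name__ == "__main__":
    print(max_tunnel_mtu(1500))     # 1420, the common default on a 1500-byte link
    print(max_tunnel_mtu(1500, 4))  # 1440
    print(max_tunnel_mtu(1492))     # 1412, e.g. a 1492-byte PPPoE link
```

If the real path MTU is unknown, the usual approach is to lower the tunnel MTU stepwise until the problem stops, as the answer did.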

Related

Unable to SSH into wireguard IP until I ping another server from inside the server

I have WireGuard set up on a machine (call it MachineA, with the IP 10.42.0.19). I have my laptop configured with the IP 10.42.0.15; call it LaptopB. I am able to SSH into MachineA from LaptopB when I connect both peers, using ssh root@MachineA. Then, if I wait a while, I can no longer SSH into MachineA from LaptopB. For example, the same command ssh root@MachineA just hangs.
Using -vvvv shows me this:
$ ssh -vvvv root@10.42.0.19
OpenSSH_8.3p1 Ubuntu-1ubuntu0.1, OpenSSL 1.1.1f 31 Mar 2020
debug1: Reading configuration data /home/xrd/.ssh/config
...
debug2: ssh_connect_direct
debug1: Connecting to 10.42.0.19 [10.42.0.19] port 22.
It never connects.
There is a simple fix: from inside the machine, ping any other WireGuard machine on the network. MachineA is a DigitalOcean droplet. If I log in through the web console and ping any other peer on the network (say 10.42.0.4), the SSH connection completes immediately after the ping starts.
How do I troubleshoot this?
I have not restarted WireGuard on either LaptopB or MachineA. Both appear to be connected.
My wg0.conf on both ends looks more or less like this:
[Interface]
Address = 10.42.0.19/24
PrivateKey = DontYouWishYouHadThis
DNS = 10.42.0.1,8.8.8.8
[Peer]
PublicKey = SomePublicKeyIsHere
AllowedIPs = 10.42.0.0/24
Endpoint = 33.33.33.33.:51280

Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host in taskmanager

When I start my Apache Flink 1.10 taskmanager service in a Kubernetes (v1.15.2) cluster, it shows logs like this:
2020-05-01 08:34:55,847 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:34:55,847 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:34:55,848 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,874 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:35:08,877 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,878 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:35:21,907 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
The taskmanager never registers successfully. When I logged into the taskmanager pod, I found that I could successfully ping the jobmanager, like this:
flink@flink-taskmanager-54d85f57c7-nl9cf:~$ ping flink-jobmanager
PING flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171) 56(84) bytes of data.
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=3 ttl=64 time=0.079 ms
So why would this happen, and what should I do to fix it?
Try installing nmap in your Kubernetes taskmanager pod's container:
apt-get update
apt-get install nmap -y
Then scan the jobmanager and make sure the pod's exposed port 6123 is accessible (in my case, I found I could not access port 6123 from the current pod).
nmap -T4 <your-jobmanager's-pod-ip>
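A successful ping only shows that ICMP gets through. What matters here is whether a TCP connection to port 6123 succeeds. If installing nmap is not an option (for example in a minimal image), a small Python check run inside the taskmanager pod does the same single-port test. This assumes Python 3 is available in the container. `check_tcp` is a hypothetical helper name.

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Try a TCP connect; return None on success, or the OSError that was raised.

    Covers ConnectionRefusedError, timeouts, DNS failures (socket.gaierror) and
    EHOSTUNREACH ("No route to host"), all of which are OSError subclasses/instances.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return exc
```

Inside the pod, `print(check_tcp("flink-jobmanager", 6123))` (host and port taken from the logs above) prints `None` if the port accepts connections. Otherwise it prints the error, for example a "No route to host" OSError.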
Hope this helps.

Replacing a ZooKeeper server in a ZooKeeper ensemble (with SolrCloud)

I have a SolrCloud (6.6) cluster set up with an external ZooKeeper ensemble (3.4.8) of 5 nodes. Recently, one machine (ip1:port1) that ran one ZooKeeper server with id=1 went down. This is what I did to replace that ZooKeeper server:
Start ZooKeeper on another machine with the same id (=1).
Change zoo.cfg on the 4 live ZooKeeper servers to match the new ZooKeeper server, and restart them.
Update the ZK_HOST variable in solr.in.sh to match the new ZooKeeper server.
Restart Solr.
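Because the replacement keeps id=1, two things must agree on the new host: every `server.N` line in zoo.cfg, and the ZK_HOST string in solr.in.sh. Each server's myid file must also still match its `server.N` id. The Python sketch below generates both the `server.N` lines and ZK_HOST from a single id-to-host map, so the copies cannot drift apart. The host names and the default ports 2181/2888/3888 are assumptions, since the real ones are elided as ip1:port1 above. `server_lines` and `zk_host` are hypothetical helpers.

```python
# Hypothetical ensemble: ZooKeeper id -> host. Id 1 now points at the replacement machine.
ENSEMBLE = {
    1: "new-zk1.example.internal",
    2: "zk2.example.internal",
    3: "zk3.example.internal",
    4: "zk4.example.internal",
    5: "zk5.example.internal",
}


def server_lines(ensemble, peer_port=2888, election_port=3888):
    """zoo.cfg `server.N=host:peerPort:electionPort` lines, identical on every node."""
    return [f"server.{sid}={host}:{peer_port}:{election_port}"
            for sid, host in sorted(ensemble.items())]


def zk_host(ensemble, client_port=2181, chroot=""):
    """Comma-separated connect string for ZK_HOST in solr.in.sh (optional chroot such as "/solr")."""
    return ",".join(f"{host}:{client_port}" for _, host in sorted(ensemble.items())) + chroot


if __name__ == "__main__":
    print("\n".join(server_lines(ENSEMBLE)))
    print(f'ZK_HOST="{zk_host(ENSEMBLE)}"')
```

Grepping every node's zoo.cfg and solr.in.sh for the old ip1 afterwards is a quick way to confirm no copy still references it.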
After that, my Solr cluster seemed to be functioning well, but judging by solr.log, the Solr client and the ZooKeeper servers were still trying to connect to the old ZooKeeper server:
Solr log:
2017-12-01 15:04:38.782 WARN (Timer-0-SendThread(ip1:port1)) [ ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 30029ms for sessionid 0x0
2017-12-01 15:04:40.807 WARN (Timer-0-SendThread(ip1:port1)) [ ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 31030ms for sessionid 0x0
Zookeeper log:
2017-12-01 13:53:57,972 [myid:] - INFO [main-SendThread(ip1:port1):ClientCnxn$SendThread@1032] - Opening socket connection to server ip1:port1. Will not attempt to authenticate using SASL (unknown error)
2017-12-01 13:54:03,972 [myid:] - WARN [main-SendThread(ip1:port1):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2017-12-01 13:54:05,074 [myid:] - INFO [main-SendThread(ip1:port1):ClientCnxn$SendThread@1032] - Opening socket connection to server ip1:port1. Will not attempt to authenticate using SASL (unknown error)
2017-12-01 13:54:06,974 [myid:] - WARN [main-SendThread(ip1:port1):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
I've searched for how to add/remove ZooKeeper servers but didn't find any documentation for it. My ZooKeeper version (3.4.7) does not support dynamic reconfiguration (which arrived in ZooKeeper 3.5).
Is there a way I can manually remove/add a ZooKeeper server from the ensemble?
Thanks for your attention!

Web application doesn't run on port 80, but runs on 4200

I have an angular web app which I'm trying to deploy. When I run it on port 4200, and access the website using the external IP, I can see the web page.
However, when I run the same application on port 80, it starts, but the website is no longer reachable (Connection refused).
I can see the process listening on port 80. Here's the output of netstat:
user@localhost:/etc$ sudo netstat -ant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 aaa.bbb.ccc.ddd:80      0.0.0.0:*               LISTEN
tcp        0      0 aaa.bbb.ccc.ddd:22      aaa.bbb.ccc.eee:51422   ESTABLISHED
tcp        0    320 aaa.bbb.ccc.ddd:22      aaa.bbb.ccc.eee:51421   ESTABLISHED
tcp6       0      0 :::21                   :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
I have opened port 80, but it's still the same. I'm running Ubuntu 16.04.
Try a different port. Port 80 may be disabled or blocked on your side; use port 90, for example.
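Before switching ports, look at the bind addresses in the netstat output above. sshd on port 22 is bound to 0.0.0.0 (all IPv4 addresses), whereas the port-80 listener is bound to the single address aaa.bbb.ccc.ddd. A socket bound to a specific address only accepts connections sent to that address. It can therefore be worth comparing each listener's bind address with the IP clients actually use.

The Python sketch below extracts the listening sockets from `netstat -ant` output. It assumes the Linux column layout shown above, and the function names are hypothetical.

```python
WILDCARD_ADDRESSES = {"0.0.0.0", "::"}


def listening_sockets(netstat_output):
    """Return (proto, bind_address, port) for every LISTEN row of `netstat -ant` output."""
    result = []
    for line in netstat_output.splitlines():
        cols = line.split()
        # Data rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(cols) == 6 and cols[0] in ("tcp", "tcp6") and cols[5] == "LISTEN":
            address, _, port = cols[3].rpartition(":")  # rpartition copes with ":::22"
            result.append((cols[0], address, int(port)))
    return result


def accepts_on_all_addresses(bind_address):
    return bind_address in WILDCARD_ADDRESSES


if __name__ == "__main__":
    sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 aaa.bbb.ccc.ddd:80      0.0.0.0:*               LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
"""
    for proto, addr, port in listening_sockets(sample):
        scope = "all addresses" if accepts_on_all_addresses(addr) else f"only {addr}"
        print(f"{proto} port {port}: {scope}")
```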

not attempt to authenticate using SASL (unknown error)

I am trying to set up ZooKeeper on two EC2 instances, as described here and here.
I am trying to run ZooKeeper, but it fails with an error:
Command: bin/zkCli.sh -server localhost:2181
2015-03-15 00:22:35,644 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3ff0efca
Welcome to ZooKeeper!
2015-03-15 00:22:35,671 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2015-03-15 00:22:35,677 [myid:] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1102] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[zk: localhost:2181(CONNECTING) 0] 2015-03-15 00:22:36,796 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-03-15 00:22:36,797 [myid:] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1102] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
zoo.cfg is as below:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=localhost:2888:3888
server.2=<My ec2 private IPs>:2889:3889
Also, I have created the myid file on both EC2 instances: /var/lib/zookeeper/myid
I also tried editing the /etc/hosts file, but I'm still facing the same issue.
Also, how can I start both ZooKeeper instances with one command?
Note: the server starts successfully if I try the bin/zkCli.sh start command.
Thanks in advance!
Look at the ZooKeeper log (zookeeper.out); if it shows a connection limit error, add the following to zoo.cfg:
# the maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=60
This is a temporary error; for me, it went away after some time.
This is my zoo.conf file:
Dir=../data
clientPort=2181
tickTime=2000
initLimit=5
This error occurred when I forgot to run %ZOOKEEPER_HOME%\bin\zkserver.cmd.
Running it resolved the problem.
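"Connection refused" from zkCli.sh usually means nothing accepted the TCP connection on localhost:2181, which fits the answer above: the server was not running. ZooKeeper's `ruok` four-letter command gives a quick liveness check. A running server replies `imok` and closes the connection. On newer ZooKeeper releases, four-letter commands may need to be enabled via `4lw.commands.whitelist`, so a different reply can simply mean `ruok` is not allowed.

The Python sketch below assumes the default client port 2181. `zk_ruok` is a hypothetical helper name.

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout=3.0):
    """Send ZooKeeper's `ruok` four-letter command and return the reply ("imok" if healthy).

    Raises ConnectionRefusedError if nothing is listening on host:port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


if __name__ == "__main__":
    try:
        print(zk_ruok())
    except OSError as exc:
        print(f"could not reach ZooKeeper on localhost:2181: {exc!r}")
```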
Correct this property in server.properties. The default is localhost; change it to match the ZooKeeper server's startup IP and port:
zookeeper.connect=0.0.0.0:2181
