Vagrant VMs can talk to each other, but I can't reach HTTP from the host - solr

I have 5 VMs on 192.168.56.*:
.19 - Zookeeper
.20 - Solr1
.21 - Solr2
.22 - Solr3
.23 - Solr4
This is my Vagrantfile:
Vagrant.configure(2) do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.
  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://atlas.hashicorp.com/search.
  # config.vm.box = "base"
  (1..4).each do |x|
    ip = 20
    config.vm.define "solr#{x}" do |solr|
      solr.vm.box = 'ubuntu/wily64'
      solr.vm.network "private_network", ip: "192.168.56.#{ip}", bridge: "Intel(R) Centrino(R) Advanced-N 6205"
      ip = ip + 1
      solr.vm.provider "virtualbox" do |v|
        v.memory = 2048
        #v.cpus = 1
      end
    end
  end
end
I have Apache HTTP on port 80 and Solr on port 8983. I can do wget 192.168.56.20:8983 from the ZooKeeper VM and it downloads the main page. When I try to hit 192.168.56.20:8983 from the host OS, it just hangs. Firewall rules are in place that open up those ports, so no idea why .19 can access Solr, but the host cannot.
Any ideas?
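One way to narrow this down from the host is to tell a refused connection apart from a silent timeout. The following is a minimal Python sketch, assuming Python 3 is available on the host; the function names and the 3-second timeout are illustrative and not part of the setup described above. A "refused" result means the VM answered but nothing accepted the connection on that port, while a "timeout" means the packets were dropped or never reached the VM.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connection attempt as 'open', 'refused', 'timeout' or another error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target replied with a reset: reachable, but nothing listens on that port.
        return "refused"
    except socket.timeout:
        # No reply at all within the timeout: dropped by a firewall or not routed.
        return "timeout"
    except OSError as err:
        # Anything else, for example "no route to host".
        return "error: %s" % err


def report(hosts, ports):
    """Print one line per host/port combination."""
    for host in hosts:
        for port in ports:
            print("%s:%d -> %s" % (host, port, probe(host, port)))
```

Calling report(["192.168.56.20"], [80, 8983]) on the host and again inside the ZooKeeper VM would show whether the two vantage points really see different behaviour on both ports.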

Related

Asimbench benchmark running in gem5 fails with "fatal: Unable to find destination for [0x40008000:0x40008040] on system.iobus"

I downloaded the Asimbench files provided on the gem5.org website and modified configs/common/FSConfig.py with the following changes:
def makeArmSystem(..)
    ..................
    self.cf0 = CowIdeDisk(driveID='master')
    self.cf2 = CowIdeDisk(driveID='master')
    self.cf0.childImage(mdesc.disk())
    self.cf2.childImage(disk("sdcard-1g-mxplayer.img"))
    # Old platforms have a built-in IDE or CF controller. Default to
    # the IDE controller if both exist. New platforms expect the
    # storage controller to be added from the config script.
    if hasattr(self.realview, "ide"):
        #self.realview.ide.disks = [self.cf0]
        self.realview.ide.disks = [self.cf0, self.cf2]
    elif hasattr(self.realview, "cf_ctrl"):
        #self.realview.cf_ctrl.disks = [self.cf0]
        self.realview.cf_ctrl.disks = [self.cf0, self.cf2]
    else:
        self.pci_ide = IdeController(disks=[self.cf0])
        pci_devices.append(self.pci_ide)
I used this command:
./build/ARM/gem5.opt configs/example/fs.py --mem-size=8192MB \
    --disk-image=/home/yaz/gem5/full_system_images/disks/ARMv7a-ICS-Android.SMP.Asimbench-v3.img \
    --kernel=/home/yaz/gem5/full_system_images/binaries/vmlinux.smp.ics.arm.asimbench.2.6.35 \
    --os-type=android-ics --cpu-type=MinorCPU --machine-type=VExpress_GEM5 \
    --script=/home/yaz/gem5/full_system_images/boot/adobe.rcS
warn: CheckedInt already exists in allParams. This may be caused by the Python 2.7 compatibility layer.
warn: Enum already exists in allParams. This may be caused by the Python 2.7 compatibility layer.
warn: ScopedEnum already exists in allParams. This may be caused by the Python 2.7 compatibility layer.
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 20.0.0.3
gem5 compiled Jul 7 2020 16:17:12
gem5 started Jul 16 2020 04:41:50
gem5 executing on yazeed-OptiPlex-9010, pid 3367
command line: ./build/ARM/gem5.opt configs/example/fs.py --mem-size=8192MB --disk-image=/home/yaz/gem5/full_system_images/disks/ARMv7a-ICS-Android.SMP.Asimbench-v3.img --kernel=/home/yaz/gem5/full_system_images/binaries/vmlinux.smp.ics.arm.asimbench.2.6.35 --os-type=android-ics --cpu-type=MinorCPU --machine-type=VExpress_GEM5 --script=/home/yaz/gem5/full_system_images/boot/adobe.rcS
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
info: kernel located at: /home/yaz/gem5/full_system_images/binaries/vmlinux.smp.ics.arm.asimbench.2.6.35
system.vncserver: Listening for connections on port 5900
system.terminal: Listening for connections on port 3456
system.realview.uart1.device: Listening for connections on port 3457
system.realview.uart2.device: Listening for connections on port 3458
system.realview.uart3.device: Listening for connections on port 3459
0: system.remote_gdb: listening for remote gdb on port 7000
info: Using bootloader at address 0x80000000
info: Using kernel entry physical address at 0x140008000
warn: DTB file specified, but no device tree support in kernel
**** REAL SIMULATION ****
warn: Existing EnergyCtrl, but no enabled DVFSHandler found.
info: Entering event queue @ 0. Starting simulation...
fatal: Unable to find destination for [0x40008000:0x40008040] on system.iobus
Memory Usage: 8786764 KBytes
Thanks for helping

J-Link connection to Cortex-A53 (Raspberry PI3b+)

I've got a JTAG (more precisely, J-Link) related problem.
I'm trying to connect to a Raspberry Pi 3B+ (bare-metal) with a J-Link.
The probe finds the CPU and reads the CoreSight ROM table, but the information about the Cross Trigger Interface (CTI) is missing. According to the ARM documentation, these units are present in the CPU.
It is possible to write a special J-Link script to set up the CPU, but the documentation is poor and I do not know how to do it.
Has anyone run into a similar problem? Any advice on how to get it running?
Below is output from run of JLinkExe with attempt to connect:
./JLinkExe
SEGGER J-Link GDB Server V6.52c Command Line Version
JLinkARM.dll V6.52c (DLL compiled Oct 11 2019 15:44:50)
Command line: -if jtag -device Cortex-A53 -endian little -speed auto -port 2331 -swoport 2332 -telnetport 2333 -vd -ir -localhostonly 1 -singlerun -strict -timeout 0 -nogui -jlinkscriptfile /home/piotr/rpi.JLinkScript
-----GDB Server start settings-----
GDBInit file: none
GDB Server Listening port: 2331
SWO raw output listening port: 2332
Terminal I/O port: 2333
Accept remote connection: localhost only
Generate logfile: off
Verify download: on
Init regs on start: on
Silent mode: off
Single run mode: on
Target connection timeout: 0 ms
------J-Link related settings------
J-Link Host interface: USB
J-Link script: /home/piotr/rpi.JLinkScript
J-Link settings file: none
------Target related settings------
Target device: Cortex-A53
Target interface: JTAG
Target interface speed: auto
Target endian: little
Connecting to J-Link...
J-Link is connected.
Firmware: J-Link V10 compiled Oct 8 2019 14:57:57
Hardware: V10.10
S/N: 260111336
OEM: SEGGER-EDU
Feature(s): FlashBP, GDB
Checking target voltage...
Target voltage: 3.09 V
Listening on TCP/IP port 2331
Connecting to target...ERROR: CTI connected to core not found. Debugging not possible
ERROR: CTI connected to core not found. Debugging not possible
ERROR: Could not connect to target.
Target connection failed. GDBServer will be closed...Restoring target state and closing J-Link connection...
Shutting down...
Could not connect to target.
Please check power, connection and settings.
piotr  /  opt  JLink  ./JLinkExe
SEGGER J-Link Commander V6.52c (Compiled Oct 11 2019 15:44:58)
DLL version V6.52c, compiled Oct 11 2019 15:44:50
Connecting to J-Link via USB...O.K.
Firmware: J-Link V10 compiled Oct 8 2019 14:57:57
Hardware version: V10.10
S/N: 260111336
License(s): FlashBP, GDB
OEM: SEGGER-EDU
VTref=3.085V
Type "connect" to establish a target connection, '?' for help
J-Link>connect
Please specify device / core. <Default>: CORTEX-A53
Type '?' for selection dialog
Device>
Please specify target interface:
J) JTAG (Default)
S) SWD
T) cJTAG
TIF>J
Device position in JTAG chain (IRPre,DRPre) <Default>: -1,-1 => Auto-detect
JTAGConf>
Specify target interface speed [kHz]. <Default>: 4000 kHz
Speed>
Device "CORTEX-A53" selected.
Connecting to target via JTAG
TotalIRLen = 4, IRPrint = 0x01
JTAG chain detection found 1 devices:
#0 Id: 0x4BA00477, IRLen: 04, CoreSight JTAG-DP
Scanning AP map
AP scan stopped (required AP found)
AP[0]: APB-AP
Scanning ROMTbl @ 0x80000000
[0]Comp[0] @ 0x80010000: Cortex-A53
[0]Comp[1] @ 0x80011000: PMU-A53
[0]Comp[2] @ 0x80012000: Cortex-A53
[0]Comp[3] @ 0x80013000: PMU-A53
[0]Comp[4] @ 0x80014000: Cortex-A53
[0]Comp[5] @ 0x80015000: PMU-A53
[0]Comp[6] @ 0x80016000: Cortex-A53
[0]Comp[7] @ 0x80017000: PMU-A53
End of ROM table
TotalIRLen = 4, IRPrint = 0x01
JTAG chain detection found 1 devices:
#0 Id: 0x4BA00477, IRLen: 04, CoreSight JTAG-DP
Scanning AP map
AP scan stopped (required AP found)
AP[0]: APB-AP
Scanning ROMTbl @ 0x80000000
[0]Comp[0] @ 0x80010000: Cortex-A53
[0]Comp[1] @ 0x80011000: PMU-A53
[0]Comp[2] @ 0x80012000: Cortex-A53
[0]Comp[3] @ 0x80013000: PMU-A53
[0]Comp[4] @ 0x80014000: Cortex-A53
[0]Comp[5] @ 0x80015000: PMU-A53
[0]Comp[6] @ 0x80016000: Cortex-A53
[0]Comp[7] @ 0x80017000: PMU-A53
End of ROM table
****** Error: CTI connected to core not found. Debugging not possible
TotalIRLen = 4, IRPrint = 0x01
JTAG chain detection found 1 devices:
#0 Id: 0x4BA00477, IRLen: 04, CoreSight JTAG-DP
Scanning AP map
AP scan stopped (required AP found)
AP[0]: APB-AP
Scanning ROMTbl @ 0x80000000
[0]Comp[0] @ 0x80010000: Cortex-A53
[0]Comp[1] @ 0x80011000: PMU-A53
[0]Comp[2] @ 0x80012000: Cortex-A53
[0]Comp[3] @ 0x80013000: PMU-A53
[0]Comp[4] @ 0x80014000: Cortex-A53
[0]Comp[5] @ 0x80015000: PMU-A53
[0]Comp[6] @ 0x80016000: Cortex-A53
[0]Comp[7] @ 0x80017000: PMU-A53
End of ROM table
TotalIRLen = 4, IRPrint = 0x01
JTAG chain detection found 1 devices:
#0 Id: 0x4BA00477, IRLen: 04, CoreSight JTAG-DP
Scanning AP map
AP scan stopped (required AP found)
AP[0]: APB-AP
Scanning ROMTbl @ 0x80000000
[0]Comp[0] @ 0x80010000: Cortex-A53
[0]Comp[1] @ 0x80011000: PMU-A53
[0]Comp[2] @ 0x80012000: Cortex-A53
[0]Comp[3] @ 0x80013000: PMU-A53
[0]Comp[4] @ 0x80014000: Cortex-A53
[0]Comp[5] @ 0x80015000: PMU-A53
[0]Comp[6] @ 0x80016000: Cortex-A53
[0]Comp[7] @ 0x80017000: PMU-A53
End of ROM table
****** Error: CTI connected to core not found. Debugging not possible
Cannot connect to target.
You may find some of the information you need in this (detailed) article.
The (DBG/)CTI addresses you are looking for would then be:
set DBGBASE {0x80010000 0x80012000 0x80014000 0x80016000}
set CTIBASE {0x80018000 0x80019000 0x8001a000 0x8001b000}
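The two lists follow a regular layout, which can be sketched in Python as below. The debug bases match the Cortex-A53 entries in the ROM table scan from the question (every other 4 KiB page, alternating with the PMU entries). The CTI bases come only from this answer, since the ROM table scan does not list them. The constant and function names are illustrative.

```python
# Per-core base addresses for the four Cortex-A53 cores, as quoted in the answer.
DBG_FIRST = 0x80010000   # first debug block, also visible in the ROM table scan
DBG_STRIDE = 0x2000      # debug blocks alternate with PMU blocks
CTI_FIRST = 0x80018000   # first CTI block, taken from the answer (not in the ROM table)
CTI_STRIDE = 0x1000
NUM_CORES = 4


def core_bases(first, stride, count=NUM_CORES):
    """Return the list of per-core base addresses for a regularly spaced block."""
    return [first + i * stride for i in range(count)]


dbg = core_bases(DBG_FIRST, DBG_STRIDE)
cti = core_bases(CTI_FIRST, CTI_STRIDE)

for core, (d, c) in enumerate(zip(dbg, cti)):
    print("core %d: DBG=0x%08x CTI=0x%08x" % (core, d, c))
```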
Please note that you can test using OpenOCD, since it does support your J-Link EDU, by using the tcl/interface/jlink.cfg definition file, before resuming your testing with J-Link Commander.
You should use the trunk version of OpenOCD, since 0.10 does not support Armv8, from what I understand. The git repository is located here: git://repo.or.cz/openocd.git
Regarding the J-Link-specific script, you can search the SEGGER support forum archives now that you know the CTI addresses; the information you are looking for may be available there. If not, just ask for help on the forum.
Update: you already asked for support on the forum I guess.
From the manual, page 218, you may just need to set CORESIGHT_Core-BaseAddr in your script - there is an example here.

Trying to reach localhost from inside Selenium docker

I'm trying to run my tests using Selenium in Docker.
I have a local grunt server running on port 9000, and I launched the following Selenium container:
docker run -d -p 4444:4444 -p 5900:5900 selenium/standalone-chrome-debug
Then I launched my tests (using Capybara) and opened VNC to watch them, but all I get is the Chrome message "This site can’t be reached".
capybara.rb:
isWindows = (/cygwin|mswin|mingw|bccwin|wince|emx/ =~ RUBY_PLATFORM) != nil
require 'capybara/rspec'
require 'capybara'
require 'capybara/dsl'
require_relative 'sinatra_proxy'
require 'selenium/webdriver'
require 'selenium/webdriver/remote/http/curb' if !isWindows
Capybara.register_driver :selenium_chrome do |app|
  http_client = isWindows ? nil : Selenium::WebDriver::Remote::Http::Curb.new
  options = {
    http_client: http_client,
    browser: :chrome,
    # service_log_path: 'chromedriver.out', # Enable Selenium logs
    switches: ["--disable-web-security", '--user-agent="Chrome under Selenium for Capybara"']
  }
  options[:url] = "http://172.17.0.2:4444/wd/hub"
  Capybara::Selenium::Driver.new app, options
end
Capybara.default_driver = :selenium_chrome
Capybara.app = SinatraProxy.new
Capybara.app_host = "http://127.0.0.1:9000"
Capybara.server_host = '0.0.0.0'
ip addr show docker0
6: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:22:ec:65:9e:f1 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe40::42:ecdd:fe73:9ef4/64 scope link
valid_lft forever preferred_lft forever
The app host needed to be the IP of the Docker host. I obtained it with:
ip route show | grep docker0 | awk '{print $9}'
and used it for Capybara.app_host (DOCKER_HOST_IP:PORT) and Capybara.server_host (DOCKER_HOST_IP).
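The awk one-liner relies on the address being the ninth field of the docker0 route line. A small Python sketch of the same lookup, which searches for the src keyword instead of counting fields, might look like this. The sample line is an assumption about typical ip route show output and was not captured from the system in the question; the function names are illustrative.

```python
import subprocess


def docker0_host_ip(route_output):
    """Return the address that follows 'src' on the docker0 route line, or None."""
    for line in route_output.splitlines():
        fields = line.split()
        if "docker0" in fields and "src" in fields:
            idx = fields.index("src")
            if idx + 1 < len(fields):
                return fields[idx + 1]
    return None


def current_docker0_host_ip():
    """Run `ip route show` on this machine and extract the docker0 source address."""
    result = subprocess.run(["ip", "route", "show"],
                            capture_output=True, text=True, check=True)
    return docker0_host_ip(result.stdout)


# Assumed sample of a docker0 route line, for illustration only.
SAMPLE = "172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1"
print(docker0_host_ip(SAMPLE))  # prints 172.17.0.1
```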
Use:
Capybara.app_host = "http://yourhostip:9000"
rather than localhost. Inside a Docker container, localhost refers to the container itself.
Also, I recommend not addressing the container by its internal IP. Since port 4444 is published to the host, just use:
options[:url] = "http://localhost:4444/wd/hub"
But solve the former problem first.
Regards

How to access homestead on other devices from mac, If running more than one projects in one vagrant box?

Is there any way to access Homestead from other devices when it runs more than one project? For example, I have set up my Homestead.yaml file with three different domains, as given below:
---
box: laravel/homestead-7
ip: "192.168.10.10"
memory: 2048
cpus: 1
provider: virtualbox
authorize: ~/Laravelhomestead.pub
keys:
    - ~/Laravelhomestead
folders:
    - map: ~/Code
      to: /home/vagrant/Code
sites:
    - map: siteone.app
      to: /home/vagrant/Code/siteone/public
    - map: sitetwo.app
      to: /home/vagrant/Code/sitetwo/public
    - map: sitethree.app
      to: /home/vagrant/Code/sitethree/public
databases:
    - siteone
    - sitetwo
    - sitethree
variables:
    - key: APP_ENV
      value: local
# blackfire:
#     - id: foo
#       token: bar
#       client-id: foo
#       client-token: bar
# ports:
#     - send: 93000
#       to: 9300
#     - send: 7777
#       to: 777
#       protocol: udp
When I try to access the box from another device on the same network, using the IP allocated to my machine (192.168.1.6:8000), it works, but it goes to sitethree.app, and I want to access siteone.app. If anyone knows how to do that, please leave an answer.
Any help is appreciated.
Thanks.
Assuming the other machines run Linux or macOS, you need to add the following line to the /etc/hosts file on each of them:
192.168.1.6 siteone.app sitetwo.app sitethree.app
On MS Windows machines the file would be something like C:\WINDOWS\system32\drivers\etc\hosts
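The hosts entry matters because the three sites share one IP and port, so the web server can only tell them apart by the hostname the client sends. That is consistent with a request to the bare IP landing on whichever site acts as the default. The following Python sketch shows the idea by sending the hostname explicitly in the Host header, without touching the hosts file. The function name is illustrative, and the address and port in the usage note are the ones from the question.

```python
import http.client


def fetch_site(ip, port, hostname, path="/", timeout=5.0):
    """Request `path` from ip:port while presenting `hostname` in the Host header.

    Returns (status code, response body as bytes).
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Passing Host explicitly replaces the header http.client would derive from `ip`.
        conn.request("GET", path, headers={"Host": hostname})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

From another machine on the network, fetch_site("192.168.1.6", 8000, "siteone.app") would then be expected to return the siteone page rather than the default site, which is the same effect the hosts entry achieves for a browser.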

How can I have DNS name resolving running while other protocols seem to be down?

We are trying to implement software on a Moxa UC-7112-LX embedded computer (uClinux OS). We use a Cinterion MC52i GSM modem (regular GPRS service) and standard pppd to connect to the Internet.
Everything seems fine right after the connection is established. The ping utility works, and the socket functions in my program work normally too. However, after some time the ppp connection breaks in a very peculiar way. These are the symptoms:
When I call ping with a host name as the parameter, the system is able to resolve its IP and starts sending ICMP packets, but gets no response. I try different web resource names, so the system cannot have their addresses cached. Whatever I choose, the system correctly resolves the IP but gets no ping response.
The connect() and write() functions in my application return no error, but read() returns with errno set to ECONNRESET (Connection reset by peer). The program uses standard socket functions (TCP).
The ppp link is shown as running (ifconfig ppp0).
So, the situation that I have is: the link is good enough to keep DNS resolution working (UDP is working?) but NOT good enough to run a TCP connection or receive ping echoes...
The problem does not appear all the time. Sometimes the system works normally for days without any problem. Whenever the problem appears, a simple reset solves everything.
I know that the system we use is quite exotic, and the situation described here may be connected with a buggy TCP stack or pppd implementation. Since the system is preconfigured by the manufacturer, I have no option to rebuild or change the OS firmware.
Still, I hope that someone has seen a similar situation on any Linux-like system. Is there any way to find out why DNS name resolution works while the other network traffic does not? Is it possible to get rid of such a connection state with some pppd settings?
Edit:
First of all, I'd like to address the possibility of local caching of the IP addresses. I don't have the dig utility, and I have no idea how to check which host supplies the result to getaddrinfo(). Still, I'm sure the addresses are not cached, because I'm trying to ping totally random host names. Also, given the slow GPRS response time, no timing utility is needed to see that ping takes 1-2 seconds or more to resolve the IP before it starts sending packets. Furthermore, neither nscd, BIND, nor any other DNS server runs locally on the machine. I understand that you may not see this as proof, but that's what I have, given the utility set available on my system.
I'd like to give some additional information concerning the internet connection operation.
Normal connection state
The rc script at system load runs another script as background process:
sh /etc/connect &
The connect script is as follows:
#!/bin/sh
echo First connect attempt > /etc/ppp/conn.info
while true
do
    date >> /etc/ppp/conn.info
    pppd call mts
    echo Reconnecting... >> /etc/ppp/conn.info
done
The reason I've made a loop here is simple: the connection persists for several hours and after that it always breaks. Unfortunately, my implementation of pppd does not support the logfile option (so I can't see why it breaks). The persist option does not seem to work either, so I ended up with the connect script above. The pppd options are:
/dev/ttyM0 115200 crtscts
connect 'chat -f /etc/ppp/peers/mts.chat'
noauth
user mts
password mts
noipdefault
usepeerdns
defaultroute
ifconfig ppp0 gives:
ppp0 Link encap:Point-Point Protocol
inet addr:172.22.22.109 P-t-P:192.168.254.254 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:34 errors:0 dropped:0 overruns:0 frame:0
TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:3
RX bytes:3130 (3.0 KiB) TX bytes:2250 (2.1 KiB)
And that's where it starts getting strange. Whenever I connect I get a different inet addr, but P-t-P is always the same: 192.168.254.254. This is the same address that appears in the default gateway entry, as given by netstat -rn:
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.254.254 0.0.0.0 255.255.255.255 UH 0 0 0 ppp0
192.168.4.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
192.168.15.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.0.0 192.168.15.1 255.255.0.0 UG 0 0 0 eth0
0.0.0.0 192.168.254.254 0.0.0.0 UG 0 0 0 ppp0
route -Cevn is unavailable on my system, route gives the same info as above.
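Checking whether the default route still points out ppp0 can be automated by parsing this table. The following is a minimal Python sketch, assuming the eight-column netstat -rn layout shown above. The function name is illustrative, and the embedded table is copied from the output in this question.

```python
def default_route(netstat_output):
    """Return (gateway, interface) of the default route in `netstat -rn` output, or None."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns:
        # Destination Gateway Genmask Flags MSS Window irtt Iface
        if (len(fields) == 8
                and fields[0] in ("0.0.0.0", "default")
                and fields[2] == "0.0.0.0"):
            return fields[1], fields[7]
    return None


TABLE = """\
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.254.254 0.0.0.0 255.255.255.255 UH 0 0 0 ppp0
192.168.4.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
192.168.15.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.0.0 192.168.15.1 255.255.0.0 UG 0 0 0 eth0
0.0.0.0 192.168.254.254 0.0.0.0 UG 0 0 0 ppp0
"""

print(default_route(TABLE))  # prints ('192.168.254.254', 'ppp0')
```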
But I'm never able to ping the 192.168.254.254, not even when everything is working as intended: tcp connection, ping, DNS etc. Here is the result of traceroute:
traceroute to kernel.org (149.20.4.69), 30 hops max, 40 byte packets
1 172.16.4.210 (172.16.4.210) 528.765 ms 545.269 ms 616.67 ms
2 172.16.4.226 (172.16.4.226) 563.034 ms 526.176 ms 537.07 ms
3 10.250.85.161 (10.250.85.161) 572.805 ms 564.073 ms 556.766 ms
4 172.31.250.9 (172.31.250.9) 556.513 ms 563.383 ms 580.724 ms
5 172.31.250.10 (172.31.250.10) 518.15 ms 526.403 ms 537.574 ms
6 pub2.kernel.org (149.20.4.69) 538.058 ms 514.222 ms 538.575 ms
7 pub2.kernel.org (149.20.4.69) 537.531 ms 538.52 ms 537.556 ms
8 pub2.kernel.org (149.20.4.69) 568.695 ms 523.099 ms 570.983 ms
9 pub2.kernel.org (149.20.4.69) 526.511 ms 534.583 ms 537.994 ms
##### traceroute loops here - why?? #######
So, I can assume that 172.16.4.210 is peer's address. Such address is pingable in any case (see below). I have no idea why the structure of traceroute output is like this (packets come from internal network of ISP right to the destination, 'loop' at the destination address - it just should not be like this).
Also I would like to note that I can ping DNS server but traceroute does not go all the way up to it.
You may notice that there are eth0 and eth1 devices. They are irrelevant to the case. eth1 is not connected and eth0 is connected to lan without internet access.
Bad connection state
So, some time passes and the situation in question appears. I can't ping anything but the DNS server (and the peer, whose address I get from the traceroute to the DNS server), and I can't communicate with a remote host via TCP. DNS resolution still works.
The network utilities give the same output as in the normal state. I have the same unpingable peer (192.168.254.254 from the ifconfig result), and the routing table is the same:
# ifconfig ppp0
ppp0 Link encap:Point-Point Protocol
inet addr:172.22.22.109 P-t-P:192.168.254.254 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:297 errors:0 dropped:0 overruns:0 frame:0
TX packets:424 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:3
RX bytes:33706 (32.9 KiB) TX bytes:27451 (26.8 KiB)
# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.254.254 * 255.255.255.255 UH 0 0 0 ppp0
192.168.4.0 * 255.255.255.0 U 0 0 0 eth1
192.168.15.0 * 255.255.255.0 U 0 0 0 eth0
192.168.0.0 192.168.15.1 255.255.0.0 UG 0 0 0 eth0
default 192.168.254.254 0.0.0.0 UG 0 0 0 ppp0
Note that the original ppp connection (one which I used to provide the output from normal state) persisted. My /etc/connect script did not loop (there was no new record in a makeshift log the script makes).
Here goes the ping to DNS server:
# cat /etc/resolv.conf
#search moxa.com
nameserver 213.87.0.1
nameserver 213.87.1.1
# ping 213.87.0.1
PING 213.87.0.1 (213.87.0.1): 56 data bytes
64 bytes from 213.87.0.1: icmp_seq=0 ttl=59 time=559.8 ms
64 bytes from 213.87.0.1: icmp_seq=1 ttl=59 time=509.9 ms
64 bytes from 213.87.0.1: icmp_seq=2 ttl=59 time=559.8 ms
And traceroute:
# traceroute 213.87.0.1
traceroute to 213.87.0.1 (213.87.0.1), 30 hops max, 40 byte packets
1 172.16.4.210 (172.16.4.210) 542.449 ms 572.858 ms 595.681 ms
2 172.16.4.214 (172.16.4.214) 590.392 ms 565.887 ms 676.919 ms
3 * * *
4 217.8.237.62 (217.8.237.62) 603.1 ms 569.078 ms 553.723 ms
5 * * *
6 * * *
## and so on ###
The * * * lines may look like trouble, but I'm getting the same traceroute for that DNS server in the normal state.
ping to 172.16.4.210 works fine as well.
Now to TCP. I've started a simple echo server on my PC and tried to connect via telnet to it (the actual ip address is not shown):
# telnet XXX.XXX.XXX.XXX 9060
Trying XXX.XXX.XXX.XXX(25635)...
Connected to XXX.XXX.XXX.XXX.
Escape character is '^]'.
aaabbbccc
Connection closed by foreign host.
So that's what happened here. A successful connect(), just like in my custom application, is followed by Connection closed... when telnet calls read(). The actual server did not receive any incoming connection. Why connect() returned normally (it could not have received the handshake response from the host!) is beyond my knowledge.
Sure enough same telnet test works fine in normal state.
Note:
I did not publish this on serverfault because of the embedded nature of my system. As far as I understand, serverfault deals with more conventional systems (like x86 machines running 'normal' Linux). I just hope that stackoverflow has more embedded experts who know systems such as my Moxa.
Q: How can I have DNS name resolving running while other protocols seem to be down?
A: Your local DNS resolver (bind is another possibility besides nscd) might be caching the first response. dig will tell you where you are getting the response from:
[mpenning@Bucksnort ~]$ dig cisco.com
; <<>> DiG 9.6-ESV-R4 <<>> +all cisco.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22106
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
;; QUESTION SECTION:
;cisco.com. IN A
;; ANSWER SECTION:
cisco.com. 86367 IN A 198.133.219.25
;; AUTHORITY SECTION:
cisco.com. 86367 IN NS ns2.cisco.com.
cisco.com. 86367 IN NS ns1.cisco.com.
;; Query time: 1 msec <----------------------- 1msec is usually cached
;; SERVER: 127.0.0.1#53(127.0.0.1) <--------------- Answered by localhost
;; WHEN: Wed Dec 7 04:41:21 2011
;; MSG SIZE rcvd: 79
[mpenning@Bucksnort ~]$
If you are getting a very quick (low milliseconds) answer from 127.0.0.1, then it's very likely that you're getting a locally cached answer from a prior query of the same DNS name (and it's quite common for people to use caching DNS resolvers on a ppp connection to reduce connection time, as well as achieving a small load reduction on the ppp link).
If you suspect a cached answer, do a dig on some other DNS name to see whether it can resolve too.
If random DNS names continue to resolve and you still cannot make a TCP connection to a certain host, this is worth noting when you edit the question after this investigation.
If random DNS names don't resolve, then this is indicative of something like the loss of your default route, or the ppp connection going down.
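Where dig is not available, timing the resolver call is a rough substitute for the Query time line above. The following is a minimal Python sketch; it assumes a Python interpreter on the machine under test, which may not hold for a small uClinux target, and the function names are illustrative. A very fast answer is only a hint that a cache is involved, not proof.

```python
import socket
import time


def timed_resolve(name):
    """Resolve `name` with getaddrinfo; return (sorted unique addresses, elapsed seconds)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(name, None)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed


def report(names):
    """Print the addresses and the resolution time for each name."""
    for name in names:
        try:
            addresses, elapsed = timed_resolve(name)
            print("%s -> %s in %.0f ms" % (name, ", ".join(addresses), elapsed * 1000))
        except socket.gaierror as err:
            print("%s -> resolution failed: %s" % (name, err))
```

Running report() on a few names that have never been looked up before, and then on the same names again, shows whether repeated lookups become much faster than first-time ones.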
Other diagnostic information
If you find yourself in either of the last situations I described, you need to do some IP and ppp-level debugs before this can be isolated further. As someone mentioned, tcpdump is quite valuable at this point, but it sounds like you don't have it available.
I assume you are not making a TCP connection to the same IP address of your DNS server. There are many possibilities at this point... If you can still resolve random DNS names, but TCP connections are failing, it is possible that the problem you are seeing is on the other side of the ppp connection, that the kernel routing cache (which holds a little TCP state information like MSS) is getting messed up, you have too much packet loss for tcp, or any number of things.
Let's assume your topology is like this:
          10.1.1.2/30           10.1.1.1/30
           [ppp0]                 [pppX]
uCLinux----------------------AccessServer---->[To the rest of the network]
When you initiate your ppp connection, take note of your IP address and the address of your default gateway:
ip link show ppp0 # display the link status of your ppp0 intf (is it up?)
ip addr show ppp0 # display the IP address of your ppp0 interface
ip route show # display your routing table
route -Cevn # display the kernel's routing cache
Similar results can be found if you don't have the iproute2 package as part of your distro (iproute2 provides the ip utility):
ifconfig ppp0 # display link status and addresses on ppp0
netstat -rn # display routing table
route -Cevn # display the kernel's routing cache
For those with the iproute2 utilities (which is almost everybody these days), ifconfig has been deprecated and replaced by the ip commands; however, if you have an older 2.2 or 2.4-based system you may still need to use ifconfig.
Troubleshooting steps:
When you start having the problem, first check whether you can ping the address of pppX on your access server.
If you can not ping the ip address of pppX on the other side, then it is highly unlikely your DNS is getting resolved by anything other than a cached response on your uCLinux machine.
If you can ping pppX, then try to ping the ip address of your TCP peer and the IP address of the DNS (if it is not on localhost). Unless there is a firewall involved, you must be able to ping it successfully for any of this to work.
If you can ping the ip address of pppX but you cannot ping your TCP peer's ip address, check your routing table to see whether your default route is still pointing out ppp0
If your default route points through ppp0, check whether you can still ping the ip address of the default route.
If you can ping your default route and you can ping the remote host that you're trying to connect to, check the kernel's routing cache for the IP address of the remote TCP host.... look for anything odd or suspicious
If you can ping the remote TCP host (and you need to do about 200 pings to be sure... tcp is sensitive to significant packet loss & GPRS is notoriously lossy), try making a successful telnet <remote_host> <remote_port>. If both are successful, then it's time to start looking inside your software for clues.
If you still can't untangle what is happening, please include the output of the aforementioned commands when you come back... as well as how you're starting the ppp connection.
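The checklist above can be condensed into a small decision function. This is a sketch of one reading of those steps, not a definitive procedure: the parameter names are illustrative, each argument is the result of a manual check, and the branch marked as an inference is not spelled out in the steps themselves.

```python
def next_step(ping_pppx, ping_tcp_peer, default_via_ppp0, ping_gateway, telnet_ok):
    """Map the observations from the troubleshooting steps to the next suggested action."""
    if not ping_pppx:
        # The far end of the ppp link is unreachable, so DNS cannot be answered remotely.
        return "DNS answers are most likely coming from a local cache"
    if not ping_tcp_peer:
        if not default_via_ppp0:
            return "default route no longer points out ppp0"
        if not ping_gateway:
            return "default gateway is unreachable"
        # Inference: the steps do not name this case explicitly.
        return "problem lies beyond the access server"
    if not telnet_ok:
        return "check the kernel routing cache and packet loss (about 200 pings)"
    return "network path looks fine; look inside the application"


# The state described in the question: the peer answers ping, the TCP target does not,
# the default route still uses ppp0, and its gateway 192.168.254.254 never answers ping.
print(next_step(True, False, True, False, False))
```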
Pings should never be part of an end-user application (see note), and no program should rely on ping to function. At best, ping might tell us that a part of the TCP/IP stack is running on the remote host. See my argument here.
What the OP describes as a problem doesn't seem to be a problem. All network connections fail, the resolver may or may not use the network, and ping isn't really helpful. I would guess that the OP can check that the modem is connected or not, and if it isn't connect again.
edit: Pseudo code
do until success
    try
        connect "foobar.com"
        try
            write data
            read response
        catch
            not success
        endtry
    catch error
        'modem down - reconnect
        not success
    end try
loop
Note: the exception would be if you are writing a network monitoring application for a networking person.
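The pseudo code above could be written in Python roughly as follows. This is a sketch, not the application from the question: reconnect is a hypothetical caller-supplied hook (for example, one that restarts pppd), and the attempt count, delay and timeout are arbitrary illustrative defaults.

```python
import socket
import time


def exchange(host, port, payload, timeout=30.0):
    """One attempt: resolve, connect, send `payload`, return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        reply = sock.recv(4096)
        if not reply:
            raise ConnectionError("peer closed the connection without replying")
        return reply


def exchange_with_retry(host, port, payload, reconnect, attempts=5, delay=1.0):
    """Retry exchange() up to `attempts` times, calling reconnect() after each failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return exchange(host, port, payload)
        except OSError as err:
            # OSError covers ECONNRESET, timeouts and DNS failures (socket.gaierror).
            last_error = err
            reconnect()        # hypothetical hook: bring the modem link back up
            time.sleep(delay)
    raise last_error
```

The design point is the same as in the pseudo code: the application learns about a dead link from its own failed exchange, not from a separate ping.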

Resources