In DolphinDB, why can't the controller node find the agent node?

I'm configuring a multi-physical node cluster deployment.
1. The controller node runs on Debian Linux in an Oracle VM (bridged networking mode).
2. The agent node and data nodes run on Windows on the host machine.
3. The agent node and controller node can ping each other, so why can't the controller node find the agent node?
Agent node log: HeartBeatsSender exception: Failed to read response header from the socket with IO error type
Agent node log: Failed to enable TCP_NODELAY with error code 10038
Data node log: AsynchronousRemoteExecutor::closeConnection to master #10 numConnections=0 Failed to connect
Data node log: close connection to master #10 with error: Failed to connect
Any suggestions will be appreciated.

Why not deploy DolphinDB entirely on Linux? There is no documented case of deploying a single cluster across different operating systems.
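If the cross-OS setup must stay, the usual checklist is that the agent advertises an address the controller can actually route to, and that both sides agree on the controller's site. Below is a minimal sketch of the relevant config files; all IPs, ports, and aliases are placeholder assumptions (the Windows host and the bridged VM are assumed to share the 192.168.1.x subnet, with the Windows firewall open on the listed ports, and the comments are illustrative only):

```ini
# agent.cfg on the Windows host (all addresses are placeholders)
mode=agent
localSite=192.168.1.20:8710:agent1        # an address the controller can reach
controllerSite=192.168.1.10:8900:ctl8900  # the Linux VM's bridged IP

# cluster.nodes on the controller (Linux VM) must list the same sites
localSite,mode
192.168.1.20:8710:agent1,agent
192.168.1.20:8711:node1,datanode
```

If `localSite` on the agent resolves to an address the controller cannot route to, the heartbeat failures in the logs above are the typical symptom.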

Related

When using TDengine, database not ready

Bug Description
Database not ready.
To Reproduce
Steps to reproduce the behavior:
Deploy 3 nodes on k8s following the docs.
Configure 3 mnodes (every node is an mnode).
Sometimes a node hits an error and the taos-check fails, so it keeps restarting.
Meanwhile, the first node fails to sync because the second and third nodes failed to start.
In the end I had no choice but to remove the taos-check and redeploy, but it still fails with:
DND ERROR failed to send status msg since Database not ready, need retry, numOfEps:3 inUse:0
Environment:
OS: k8s
Memory, CPU, current Disk Space: 4C8G
TDengine Version: 3.0.16

Unable to start corosync: Job for corosync.service failed because the control process exited with error code

I'm trying to start a Pacemaker cluster and got an error when starting the nodes.
Node configuration: Ubuntu 18.04
mssql-server 2017 edition
The 2 nodes are added to /etc/hosts as:
localhost node1
192.168.43.64 node2
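One thing worth checking here (an assumption on my part, not something the logs confirm): corosync binds its ring to the IP that each node name resolves to, so mapping node1 to localhost/127.0.0.1 can prevent the two nodes from ever forming a cluster. The usual layout has every node name, including the local one, resolving to a routable address on both machines; a sketch, where node1's address is a guessed placeholder on the same subnet:

```
# /etc/hosts on both nodes
192.168.43.63   node1    # placeholder; use node1's real LAN address
192.168.43.64   node2
```

After changing /etc/hosts, re-authenticating and re-creating the cluster with pcs is typically needed for the new addresses to take effect.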
On running sudo pcs cluster start --all
I get an error: unable to start corosync.
I followed this link exactly.
As in the above link, I executed pcs cluster destroy.
I tried creating log files in the /var/log/cluster/ folder as described in the link; it didn't work.
Part of my corosync.conf contains:
logging {
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: yes
}
And the error I get at the end:
Response Code: 400
--Debug Response Start-- Starting Cluster...
Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xe" for details.
Error: unable to start corosync
node1: Error connecting to node1 - (HTTP error: 400)
node2: Error connecting to node2 - (HTTP error: 400)
Error: unable to start all nodes
node1: Error connecting to node1 - (HTTP error: 400)
node2: Error connecting to node2 - (HTTP error: 400)
I need both the nodes to be connected.
Any help would be great; thanks in advance.

Task Manager not able to connect to Job Manager

I'm trying to upgrade our Flink cluster from 1.4.2 to 1.7.2.
When I bring up the cluster, the task managers refuse to connect to the job managers with the following error.
2019-03-14 10:34:41,551 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@cluster:22671] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@cluster:22671]] Caused by: [cluster: Name or service not known]
Now, this works correctly if I add the following line into the /etc/hosts file.
x.x.x.x job-manager-address.com cluster
Why is Flink 1.7.2 connecting to JM using cluster in the address? Flink 1.4.2 used to have the job manager's address instead of the word cluster.
The jobmanager.sh script was being invoked with a second argument called cluster.
${FLINK_HOME}/bin/jobmanager.sh start cluster
Prior to 1.5, the script expected an execution mode (local or cluster) but this is no longer the case. Invoking the script without the second argument solved this issue.
${FLINK_HOME}/bin/jobmanager.sh start
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-7-2-Task-Manager-not-able-to-connect-to-Job-Manager-td26707.html
https://github.com/apache/flink/commit/d61664ca64bcb82c4e8ddf03a2ed38fe8edafa98
https://github.com/apache/flink/blob/c6878aca6c5aeee46581b4d6744b31049db9de95/flink-dist/src/main/flink-bin/bin/jobmanager.sh#L21-L25

Node mssql module error on AWS Lambda

I am facing an issue with the mssql Node module. It works very well in my local environment, but after deploying the code base to AWS Lambda, executing the function there fails with the error below.
START RequestId: 5ea5fc6b-9d45-11e7-9dca-ffdf71e8f823
Version: $LATEST2017-09-19T14:18:28.733Z 5ea5fc6b-9d45-11e7-9dca-ffdf71e8f823
SQL connection error { ConnectionError: Failed to connect to 10.71.12.16:49001 in 15000ms
at Connection.tedious.once.err
Note: even locally, the client must be on the allocated network to get a response.
Any help would be appreciated.
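This symptom (works locally, 15 s connect timeout from Lambda) is usually networking rather than the mssql module itself: by default a Lambda function has no route to a private address like 10.71.12.16, so the function must be attached to VPC subnets that can reach the SQL Server, with a security group allowing outbound TCP on port 49001. For reference, a minimal sketch of the connection config the mssql package accepts; only the server, port, and timeout come from the error log above, and the credentials and database name are placeholder assumptions:

```javascript
// Placeholder config for the Node "mssql" package; server, port, and
// connectionTimeout are taken from the error log, the rest are assumed.
const config = {
  server: '10.71.12.16',     // private IP from the error message
  port: 49001,               // port from the error message
  user: 'app_user',          // placeholder credentials
  password: 'secret',
  database: 'mydb',          // placeholder database name
  connectionTimeout: 15000,  // matches the 15000ms timeout in the log
};

// sql.connect(config) would then be called as usual; if it still times out
// from Lambda, check the function's VPC subnets and security groups first.
console.log(`${config.server}:${config.port}`); // prints "10.71.12.16:49001"
```

If the config matches what works locally, the fix is almost always on the Lambda VPC side rather than in this object.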

Selenium grid on Virtual XP machine - connection error

I am running my tests against Firefox 6.0 on a virtual Windows XP machine. The command line
java -jar selenium-server-standalone-2.5.0.jar -role webdriver -hub http://192.168.1.149:4444/grid/register -port 5558 -host 10.0.2.15 -browser "browserName=firefox, version=6, platform=WINDOWS"
Gives me the following result:
17:53:22.667 INFO - Started SocketListener on 0.0.0.0:5559
17:53:22.667 INFO - Started org.openqa.jetty.jetty.Server#5dccce3c
17:53:22.668 INFO - using the json request : {"class":"org.openqa.grid.common.RegistrationRequest","capabilities":[{" version":"6","browserName":"firefox"," platform":"WINDOWS"}],"configuration":{"port":5559,"host":"192.168.1.135","hubHost":"192.168.1.149","registerCycle":5000,"hub":"http://192.168.1.149:4444/grid/register","url":"http://10.0.2.15:5559/wd/hub","register":true,"singleWindow":"-role","proxy":"org.openqa.grid.selenium.proxy.WebDriverRemoteProxy","maxSession":5,"browser":"browserName=firefox, version=6, platform=WINDOWS","role":"webdriver","hubPort":4444}}
17:53:22.669 INFO - starting auto register thread. Will try to register every 5000 ms.
17:53:22.669 INFO - Registering the node to hub :http://192.168.1.149:4444/grid/register
Unfortunately, it never gets as far as:
17:53:25.486 INFO - Executing: org.openqa.selenium.remote.server.handler.Status#43a6684f at URL: /status)
17:53:25.488 INFO - Done: /status
It sort of hangs up on the word "register".
Consequently, when running the tests I get an error:
** Erubis 2.6.6
Loaded suite test/selenium/website_smoke_tests
Started
E
Finished in 21.022798 seconds.
1) Error:
test_top_page(WebsiteSmokeTest):
Errno::ETIMEDOUT: Connection timed out - connect(2)
The node is visible on http://192.168.1.149:4444/grid/console.
The problem is solved. It was an error in my network connection: the node on the VM never got the response from the hub. My bad.
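For anyone hitting the same symptom, one plausible culprit (an assumption on my part, not confirmed in the resolution above) is the `-host 10.0.2.15` flag: 10.0.2.x is a VirtualBox NAT-side address that the hub on the 192.168.1.x network cannot reach, so the hub's callbacks to the node never arrive even though registration appears to start. A sketch of the same registration command using the node's LAN address instead (192.168.1.135, taken from the JSON request above):

```shell
java -jar selenium-server-standalone-2.5.0.jar -role webdriver \
  -hub http://192.168.1.149:4444/grid/register \
  -port 5558 -host 192.168.1.135 \
  -browser "browserName=firefox, version=6, platform=WINDOWS"
```

The `-host` value must be an address the hub can open a connection to, not the address the VM sees itself on behind NAT.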
