My code, written in C against the ZooKeeper C client binding, runs perfectly on my local computer using the same IP (not localhost:2181). However, when I compile and run it on another computer, it fails with a connection loss error. I was not able to connect to my ZooKeeper server using my public IP (I found my public IP by searching "whatsmyip" on Google). Running ifconfig in my terminal gave me 10.111.129.199, which I assume is a private IP since it starts with 10.
The machine I have ssh'd into runs Solaris. Because of this, I changed a single function in the ZooKeeper source code from sync_fetch_and_add (I think) to atomic_add, since sync_fetch... is not supported on Solaris. According to the ZooKeeper documentation, Solaris is not currently supported. Still, I can compile ZooKeeper without problems, and I am told that someone else in my company had set up ZooKeeper on our systems before.
My program tries to create a single node on the ZooKeeper server. My code looks like this:
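#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zookeeper/zookeeper.h>

//zh is a global zookeeper_handle for now.
static zhandle_t *zh;
// Server address (assumed value; matches the address in the error log)
static const char *host_port = "10.111.129.190:2181";

int create(char *path, char *data);
void my_watcher_func(zhandle_t *zzh, int type, int state,
                     const char *path, void *watcherCtx);

int main(int argc, char *argv[]){
    zh = zookeeper_init(host_port, my_watcher_func, 20000, 0, NULL, 0);
    if(zh == NULL){
        fprintf(stderr, "Error connecting to ZooKeeper Server!! \n");
        exit(EXIT_FAILURE);
    }
    int retval = create("/TFS/pool", "1");
    printf("return value of create = %d\n", retval);
    zookeeper_close(zh);
    return 0;
}

int create(char *path, char *data){
    int value_length = -1;
    if(data != NULL){
        value_length = (int) strlen(data);
    }
    printf("creating node at path: %s with data %s\n", path, data);
    // Persistent node (mode 0), no buffer for the created path
    int retval = zoo_create(zh, path, data, value_length,
                            &ZOO_OPEN_ACL_UNSAFE, 0, NULL, 0);
    return retval;
}

// Empty watcher function (I have no idea why this is needed.)
void my_watcher_func(zhandle_t *zzh, int type, int state,
                     const char *path, void *watcherCtx) {}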
Both systems use the GCC compiler. I don't think the problem is in the code, since it runs fine locally; it seems to be a connection issue.
I would expect zookeeper_init() to return NULL if the connection to ZooKeeper failed. However, that does not happen, and the program continues to create().
creating node at path: /TFS/pool with data abc
2018-07-16 10:30:44,232:16332(0x2):ZOO_ERROR#handle_socket_error_msg#1670: Socket [10.111.129.190:2181] zk retcode=-4, errno=0(Error 0): connect() call failed
return value of create = -4
When I telnet to the IP and port, it connects. I also know that ZooKeeper detects the telnet connection, because I run the server in the foreground. This is the output of zkServer.sh, running in the foreground, when I connect with telnet 10.111.129.190 2181:
2018-07-16 11:04:03,807 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /10.7.1.70:61479
The expected output should have been:
creating node at path: /TFS/pool with data 1
2018-07-16 12:14:37,078:3180(0x70000c98c000):ZOO_INFO#check_events#1764: initiated connection to server [10.111.129.190:10101]
2018-07-16 12:14:37,107:3180(0x70000c98c000):ZOO_INFO#check_events#1811: session establishment complete on server [10.111.129.190:10101], sessionId=0x10000590d2b0000, negotiated timeout=20000
return value of create = 0
This output has always confused me, because the ZooKeeper connection is established after the handle is created: it happens during zoo_create() rather than in zookeeper_init(). It doesn't affect anything, but it is an interesting time to establish a connection.
I understand that retcode=-4 means CONNECTIONLOSS, but the client is not even able to establish a connection with the server. If there is any way I could fix this, please tell me!
A question on the wallclock time of socket communication.
I have a function that finds the servers registered at a central server.
I am adding a network check on top of this function: I extract the URL and port number of each server and try to connect to it as a simple TCP client.
If connect() returns 0, the network is working fine; if it returns -1, the network is broken.
printf("--Checking for network connectivity--\n");
for(size_t i = 0; i < serverOnNetworkSize; i++) {
    UA_ServerOnNetwork *server = &serverOnNetwork[i];
    // Copy the discovery URL into a NUL-terminated string
    A[i] = (char *)UA_malloc(server->discoveryUrl.length+1);
    memcpy(A[i], server->discoveryUrl.data, server->discoveryUrl.length);
    A[i][server->discoveryUrl.length] = 0;
    // Discovery URLs are of the form: opc.tcp://hostname:port
    // Skip the 10-character "opc.tcp://" prefix
    B[i] = A[i] + 10;
    // Extract the port after the last ':'
    char *p = strrchr(B[i], ':');
    if (p == NULL)
        continue;                      // no port in this URL
    int port = (int)strtoul(p+1, NULL, 10);
    // Remove the port: terminates both A[i] and B[i] at the ':'
    // (works for any number of port digits)
    *p = '\0';
    // Hostname without the "opc.tcp://" prefix
    C[i] = A[i] + 10;
    // Find the IP of that host (the first entry is skipped)
    if(i != 0){
        char ip_address[50];
        if (find_ip_address(C[i], ip_address) == 0)
            socketCommunication(ip_address, C[i], port);
    }
}
printf("--Checks done!--\n");
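Extracting the host and port by fixed offsets (such as length-5) only works when the port has exactly four digits. A minimal sketch of a parser that splits on the last ':' instead, assuming the URL is exactly opc.tcp://hostname:port with no trailing path and no IPv6 literal; parse_discovery_url() is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Split "opc.tcp://host:port" into host and port.
   Returns 0 on success, -1 if the URL does not have that shape,
   the port is invalid, or the host does not fit in host_len bytes. */
int parse_discovery_url(const char *url, char *host, size_t host_len, int *port)
{
    static const char prefix[] = "opc.tcp://";
    size_t plen = sizeof prefix - 1;            /* 10 characters */

    if (strncmp(url, prefix, plen) != 0)
        return -1;
    const char *start = url + plen;
    const char *colon = strrchr(start, ':');    /* last ':' separates the port */
    if (colon == NULL || colon == start || colon[1] == '\0')
        return -1;

    char *end;
    unsigned long p = strtoul(colon + 1, &end, 10);
    if (*end != '\0' || p == 0 || p > 65535)    /* digits only, valid TCP port */
        return -1;

    size_t n = (size_t)(colon - start);
    if (n >= host_len)                          /* leave room for the NUL */
        return -1;
    memcpy(host, start, n);
    host[n] = '\0';
    *port = (int)p;
    return 0;
}
```

For example, parsing "opc.tcp://o755-gksr:4841" yields the host "o755-gksr" and port 4841, and the original URL string is left unmodified.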
Global functions:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Resolve hostname and copy its first IPv4 address into ip_address
// (needs room for at least 16 bytes). Returns 0 on success, 1 on failure.
int find_ip_address(char *hostname, char *ip_address)
{
    struct hostent *host_name;
    struct in_addr **ipaddress;

    if((host_name = gethostbyname(hostname)) == NULL)
    {
        herror("IP Address Not Found");
        return 1;
    }
    ipaddress = (struct in_addr **) host_name->h_addr_list;
    if(ipaddress[0] != NULL)
    {
        strcpy(ip_address, inet_ntoa(*ipaddress[0]));
        return 0;
    }
    return 1;
}

// Try a blocking TCP connect to ip_address:port and report a failure.
void socketCommunication(char *ip_address, char *hostname, int port){
    int clientSocket, ret;
    struct sockaddr_in serverAddr;

    clientSocket = socket(AF_INET, SOCK_STREAM, 0);
    if(clientSocket < 0){
        printf("Error in connection \n");
        exit(1);
    }
    memset(&serverAddr, '\0', sizeof(serverAddr));
    serverAddr.sin_port = htons(port);
    serverAddr.sin_family = AF_INET;
    serverAddr.sin_addr.s_addr = inet_addr(ip_address);
    // Blocks until the handshake completes or the kernel gives up
    ret = connect(clientSocket, (struct sockaddr*)&serverAddr, sizeof(serverAddr));
    if(ret < 0){
        printf("\nLOOKS LIKE NETWORK CONNECTION HAS FAILED. HAVE A LOOK AT THE NETWORK CONNECTIVITY at host : %s\n", hostname);
        printf("\n----Updated Status Information----:\n");
        printf("Discovery URL : opc.tcp://%s:%d\n", hostname, port);
        printf("Status:CONNECTON TIMED OUT\n");
        printf("\n");
    }
    close(clientSocket);  // release the descriptor on every path
}
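gethostbyname() is obsolete in POSIX; getaddrinfo() is its replacement and reports failures through a return code instead of h_errno. A minimal sketch of the same lookup, restricted to IPv4 to match the inet_ntoa() usage above; resolve_ipv4() is a hypothetical name, not part of the original code.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve hostname to its first IPv4 address in dotted form.
   Returns 0 on success, or a non-zero getaddrinfo() error code
   (printable with gai_strerror()). */
int resolve_ipv4(const char *hostname, char *out, size_t out_len)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;         /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;   /* one entry per address, not per socket type */

    int rc = getaddrinfo(hostname, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_len) == NULL)
        rc = EAI_SYSTEM;               /* e.g. out buffer too small; see errno */
    freeaddrinfo(res);
    return rc;
}
```

A buffer of INET_ADDRSTRLEN bytes is always large enough for the result, and on failure the caller can print gai_strerror(rc) instead of using herror().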
To test this, I switch off the network on one of the registered servers.
When I measure the time, I get inconsistent values: 18 seconds, 24, 38 seconds, and so on.
These values occur when I switch off the server's network and then run my application. On a second run of the same application, the time drops to 2 seconds, or sometimes 1 second.
Output:
LOOKS LIKE NETWORK CONNECTION HAS FAILED. HAVE A LOOK AT THE NETWORK CONNECTIVITY at host : o755-gksr
----Updated Status Information----:
Discovery URL : opc.tcp://o755-gksr:4841
Status:CONNECTON TIMED OUT
--Checks done!--
Time measured: 18 seconds.
Output on another try
--Checking for network connectivity--
LOOKS LIKE NETWORK CONNECTION HAS FAILED. HAVE A LOOK AT THE NETWORK CONNECTIVITY at host : o755-gksr
----Updated Status Information----:
Discovery URL : opc.tcp://o755-gksr:4841
Status:CONNECTON TIMED OUT
--Checks done!--
Time measured: 0 seconds.
My question is: why does it show inconsistent values? If the connection is not possible, shouldn't connect() return -1 and report the error quickly?
Is there some background process that tries to establish the connection a finite number of times before giving up?
Please let me know.
Regards,
Rakshan
The behavior of connect() and its timeouts depends heavily on the underlying network. There are several reasons why connect() fails when the target machine is down. The most common errors are:
ETIMEDOUT: the client sent SYNs but received no response at all. This is a TCP timeout and can be quite long (minutes).
EHOSTUNREACH: the local ARP query failed, or the client sent a SYN and received an ICMP Host Unreachable error. An ARP query failure is detected within a few seconds. ICMP Host Unreachable is usually returned by a remote router when its own ARP query fails.
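Since the question's code always prints "TIMED OUT", it can help to look at errno right after connect() fails. A minimal sketch that maps the two errors listed above to a short diagnosis; it also covers ECONNREFUSED (the host answered with a reset because nothing listens on the port) and falls back to strerror(). describe_connect_error() is a hypothetical helper name.

```c
#include <errno.h>
#include <string.h>

/* Map an errno value from a failed connect() to a short diagnosis. */
const char *describe_connect_error(int err)
{
    switch (err) {
    case ETIMEDOUT:
        return "no reply to SYN (TCP timeout)";
    case EHOSTUNREACH:
        return "host unreachable (ARP failed or ICMP Host Unreachable)";
    case ECONNREFUSED:
        return "host is up but nothing listens on the port";
    default:
        return strerror(err);        /* any other error: system message */
    }
}
```

Usage: save errno immediately after connect() returns -1 (int e = errno;), because later printf() calls may overwrite it, then print describe_connect_error(e) in the status line.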
So here is what happens in your case if the server is on the same network as your client:
The client has the server's MAC address in its ARP cache.
You "switch off the network from one of the registered servers." You probably unplug the server's cable or something similar.
The client calls connect(). The SYN is sent directly to the MAC address from the ARP cache, and in the worst case connect() returns ETIMEDOUT after about two minutes.
The client deletes the entry from its ARP cache.
A subsequent connect() needs ARP resolution. It either fails after 3 ARP requests (about 3 seconds), or fails immediately if a negative entry in the ARP cache is still valid. Such an entry may be valid for only a few seconds.
If the server is on a remote network, the situation is similar, but the remote router's ARP cache is responsible. If the remote router cannot resolve the IP address to a MAC address, it sends ICMP Host Unreachable almost immediately. But if the remote router still has the destination IP in its ARP cache, it takes some time before it realizes the cache entry is stale and the MAC address is unreachable.
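Because the kernel's own SYN retry timeout can run to minutes, a connectivity check that must finish quickly usually sets its own deadline: put the socket in non-blocking mode, start connect(), poll() for writability with a timeout, then read the real result from SO_ERROR. A minimal POSIX sketch, assuming IPv4 literal addresses; connect_with_timeout() is a hypothetical name, and err_out receives an errno value (ETIMEDOUT when our own deadline expires).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to connect to ip:port within timeout_ms.
   Returns 0 on success, -1 on failure with *err_out set to an errno value. */
int connect_with_timeout(const char *ip, int port, int timeout_ms, int *err_out)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        *err_out = EINVAL;                       /* not an IPv4 literal */
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        *err_out = errno;
        return -1;
    }

    /* Non-blocking: connect() returns at once (usually EINPROGRESS)
       instead of waiting for the kernel's SYN retransmission timeout. */
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    if (rc < 0 && errno == EINPROGRESS) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {                           /* our own deadline expired */
            errno = ETIMEDOUT;
            rc = -1;
        } else if (rc > 0) {                     /* handshake finished: fetch its result */
            int so_err = 0;
            socklen_t len = sizeof so_err;
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_err, &len);
            errno = so_err;
            rc = (so_err == 0) ? 0 : -1;
        }                                        /* rc < 0: poll() itself failed */
    }
    *err_out = (rc == 0) ? 0 : errno;            /* capture before close() */
    close(fd);
    return (rc == 0) ? 0 : -1;
}
```

In socketCommunication() this could replace the blocking connect(), for example connect_with_timeout(ip_address, port, 2000, &err) for a two-second limit. The trade-off is that a short deadline also reports a slow but alive server as unreachable.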
I'm creating a C library that manages many peripherals of my embedded device. The OS is a Linux distro built with Yocto. I'm trying to write functions that connect my device to a known Wi-Fi router through netlink (using libnl). With the help of this community, I've developed a function that scans the routers in the area. Does anyone know how to use libnl to connect my device to a Wi-Fi router?
I've developed the following code, which tries to connect to an AP called "Validator_Test" (which has no authentication password). The code returns no error, but my device still remains disconnected from the AP. Does anyone know what is wrong with my code? Unfortunately, I haven't found any example or documentation for this operation.
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <net/if.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>
#include <linux/nl80211.h>

static int ap_conn() {
    struct nl_msg *msg = nlmsg_alloc();
    int if_index = if_nametoindex("wlan0"); // Wireless interface to connect.
    // Open socket to kernel.
    struct nl_sock *socket = nl_socket_alloc(); // Allocate new netlink socket in memory.
    genl_connect(socket); // Create file descriptor and bind socket.
    int driver_id = genl_ctrl_resolve(socket, "nl80211"); // Find the nl80211 driver ID.
    genlmsg_put(msg, 0, 0, driver_id, 0, (NLM_F_REQUEST | NLM_F_ACK), NL80211_CMD_CONNECT, 0);
    nla_put_u32(msg, NL80211_ATTR_IFINDEX, if_index); // Add message attribute, which interface to use.
    nla_put(msg, NL80211_ATTR_SSID, strlen("Validator_Test"), "Validator_Test");
    nla_put(msg, NL80211_ATTR_MAC, strlen("00:1e:42:21:e4:e9"), "00:1e:42:21:e4:e9");
    int ret = nl_send_auto_complete(socket, msg); // Send the message.
    printf("NL80211_CMD_CONNECT sent %d bytes to the kernel.\n", ret);
    ret = nl_recvmsgs_default(socket); // Retrieve the kernel's answer (ACK or error).
    nlmsg_free(msg);
    if (ret < 0) {
        printf("ERROR: nl_recvmsgs_default() returned %d (%s).\n", ret, nl_geterror(-ret));
        return ret;
    }
nla_put_failure:
    return -ENOSPC;
}
Thanks to all of you!
Thanks for the code.
Based on your code, I made some modifications and tested it here; it works. The source code is at:
https://github.com/neojou/nl80211/blob/master/test_connect/src/test_connect_nl80211.c
Some suggestions:
Make sure the test environment is correct.
Before testing the code, you could try the test with iw, an open-source tool that also uses netlink. Run "sudo iw wlan0 connect Validator_Test" and then use iwconfig to check whether it is connected (assuming, as you said, that the AP has no security settings).
There are two differences between your source code and mine:
(1) You don't need to set NL80211_ATTR_MAC.
(2) ret = nl_recvmsgs_default(socket);
I'm not sure whether you check the return value of ap_conn(), but it seems better for ap_conn() to return 0 when nl_recvmsgs_default() returns 0.
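On point (1): if the BSSID is set at all, the kernel expects NL80211_ATTR_MAC as a 6-byte (ETH_ALEN) binary address, not the 17-character text "00:1e:42:21:e4:e9" that the question's code passes with strlen(). A minimal sketch of the conversion; parse_mac() is a hypothetical helper, not a libnl function.

```c
#include <stdio.h>

/* Convert "aa:bb:cc:dd:ee:ff" into 6 raw bytes.
   Returns 0 on success, -1 if the text is not exactly six hex pairs. */
int parse_mac(const char *text, unsigned char mac[6])
{
    unsigned int b[6];
    char extra;   /* catches trailing characters after the sixth pair */
    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;
    for (int i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}
```

Usage inside ap_conn(): unsigned char bssid[6]; if (parse_mac("00:1e:42:21:e4:e9", bssid) == 0) nla_put(msg, NL80211_ATTR_MAC, 6, bssid);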
I have two simple programs: a client and a server. I'm trying to use zstr_sendfm and zstr_recv to send and receive a simple string. Roughly speaking, I'm using the code from the file transfer test in the zeromq tutorial. Here's the server function:
#include <czmq.h>

#define PIPELINE 10

int server()
{
    char *name = "someName";
    zctx_t *ctx = zctx_new();
    void *router = zsocket_new(ctx, ZMQ_ROUTER);
    zsocket_set_hwm(router, PIPELINE*2);
    // zsocket_connect() returns 0 on success
    if (0 != zsocket_connect(router, "tcp://127.0.0.1:6000"))
    {
        printf("failed to connect to router.\n");
    }
    printf("sending name %s\n", name);
    zstr_sendfm(router, name);
    return 0;
}
Here's the client function:
#include <czmq.h>

int client()
{
    zctx_t *ctx = zctx_new();
    void *dealer = zsocket_new(ctx, ZMQ_DEALER);
    zsocket_bind(dealer, "tcp://*:6000");
    char *receivedName = zstr_recv(dealer);
    printf("received the following name: %s\n", receivedName);
    return 0;
}
Both of these are run in two separate programs (which do nothing other than run their respective functions) on the same computer.
Here's how things always play out:
Start client function, which holds at "zstr_recv" as it's supposed to
Start server function, which connects successfully, claims to have sent the data, and exits
The client function keeps waiting and never receives anything from the server.
What am I missing here? I've added a bunch of error checking and even tried this out in gdb with no luck.
Help and advice appreciated.
I think you have your client and server mixed up, although in ZeroMQ the roles of client and server are not as strict as with normal sockets. Normally you would create a server with a REP socket that binds/receives/sends and a client with a REQ socket that connects/sends/receives. You should try this first and then experiment with ROUTER for the server (instead of REP) and DEALER for the client (instead of REQ).
I have 3 programs running: a client, a main server and a backup server. I want to determine whether the main server is up (or has crashed) so that, if it is not, I can send the message to the backup instead. I tried if(send(....) >= 0){....}, which obviously didn't work. Any other ideas?
From your client, you need to continuously try to read data from the server.
Something like this, in case you are using a Linux-based server/client:
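#include <stdio.h>
#include <unistd.h>

void read_from_server(int socket_fd)
{
    char recvBuffer[1024];
    ssize_t n;
    while ( (n = read(socket_fd, recvBuffer, sizeof(recvBuffer)-1)) > 0)
    {
        recvBuffer[n] = 0;   // NUL-terminate before printing
        if(fputs(recvBuffer, stdout) == EOF)
            printf("\n Error : error in Fputs\n");
    }
    // n == 0: the server closed the connection; n < 0: read error
}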
You can create this socket_fd with socket() and then connect() it to the server's address.
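A related non-blocking check, sketched under the assumption of a Linux or BSD system (MSG_DONTWAIT is not in POSIX): peek at the socket without consuming data. recv() returning 0 means the server closed the connection, which also happens when a crashed server process's sockets are closed by its kernel; an error such as ECONNRESET also means it is gone. This cannot detect a server machine that lost power or network, since no FIN or RST arrives; that case needs a read timeout or heartbeat. peer_alive() is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if the peer on fd still looks connected, 0 if it closed
   the connection or the socket reports an error.
   Never blocks and never consumes data (MSG_PEEK). */
int peer_alive(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return 1;              /* data is waiting: peer is up */
    if (n == 0)
        return 0;              /* orderly shutdown (EOF) */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;              /* nothing to read yet, but no EOF either */
    return 0;                  /* e.g. ECONNRESET */
}
```

Usage: before sending, if (!peer_alive(primary_fd)) switch to the backup server's socket. Note also that send() to a peer that has closed the connection can raise SIGPIPE, which terminates the process by default; on Linux, passing MSG_NOSIGNAL to send() makes it fail with EPIPE instead.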