This is potentially a newbie question, but if I open a socket, write some data to it, and then exit the subroutine so the socket goes out of scope, and then later try to read the data from another program, will the data still be there, or does it die when the original declarations go out of scope?
Thanks,
N.
Further information :
I am trying to rewrite two programs that use files as their communication interface. The general flow is:
Main Process : Write Data.
Main Process : Spawn secondary process(es) onto other nodes in a cluster
Main Process : Wait until Secondary Process finished.
Secondary Process : Read Data (written by main)
Secondary Process : Write Data
Secondary Process : exit
Main Process : Read data.
So I essentially want to replace the write/read/write/read of files with sockets (which should be much faster!)
For TCP sockets, you need a bidirectional connection established before sending data, so the question is moot if there is no receiving side.
For UDP, if no one is listening on the socket at the time you send the data, no one will receive it, unless you manage to start a listening program fast enough that the data is still travelling inside the networking drivers. But don't count on it: the localhost loopback inside the driver shouldn't take more than a few microseconds to deliver the data.
P.S. Perhaps you can get a more suitable answer if you describe your exact situation in more detail. What are you trying to achieve?
Regarding your "further information": you can't do this with sockets by simply replacing the files with sockets and keeping the current scheme. However, you can change the scheme by first spawning the child processes and only then sending them the data via sockets. When the children finish, they return an answer to the parent via a socket and exit.
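The revised scheme is easy to sketch. On a real cluster the secondary processes would be spawned on other nodes and connect back over TCP; in this self-contained sketch, fork() and socketpair() stand in for that step, but the point is the order of operations: the child exists before any data is sent.

/* Minimal sketch: spawn first, then exchange data over a connected
 * socket. fork()/socketpair() stand in for spawning on another node. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sv[2];                      /* sv[0]: parent end, sv[1]: child end */
    char buf[128];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        perror("socketpair");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {                 /* secondary process */
        close(sv[0]);
        ssize_t n = read(sv[1], buf, sizeof buf - 1);   /* read data from main */
        buf[n > 0 ? n : 0] = '\0';
        printf("child got: %s\n", buf);
        write(sv[1], "result", 6);                      /* write result back */
        close(sv[1]);
        _exit(0);
    }

    close(sv[1]);                   /* main process */
    write(sv[0], "input data", 10); /* send only after the child exists */
    ssize_t n = read(sv[0], buf, sizeof buf - 1);       /* wait for the result */
    buf[n > 0 ? n : 0] = '\0';
    printf("parent got: %s\n", buf);
    waitpid(pid, NULL, 0);
    return 0;
}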
There's an inefficiency here in a sense, because you have to send the same data to each child separately (unless you can use multicasting).
I'm not sure sockets will be much faster than files for you, but they will certainly be safer for a more complex scheme, and they will also allow distribution across machines that don't share a file system.
When using a plain (connection-oriented) socket, if there isn't another endpoint connected at the time you write the data, the data will be lost. The only way to write the data without first connecting to the other endpoint is to use UDP, in which case the data is simply discarded by the receiving system if no matching endpoint is available.
If you want to have asynchronous delivery you will need to use a message passing system that allows delayed delivery. In this case, the receiver of the message is actually a system process that stores the message until a client requests it. The actual communication takes place between a client on one system and the system process on the other, with the client on the other system obtaining the data locally. You can read more about message passing and its variants at http://en.wikipedia.org/wiki/Message_passing.
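As an illustration, POSIX message queues are one such system available on Linux: the kernel stores the message until some process reads it, so the two sides don't have to overlap in time. A minimal sketch (the queue name "/demo_q" is arbitrary; both sides are collapsed into one program here for brevity; link with -lrt):

/* Asynchronous ("store and forward") delivery with a POSIX message
 * queue: the kernel holds the message until a reader asks for it. */
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 128 };
    mqd_t q = mq_open("/demo_q", O_CREAT | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    /* Sender side: the message is stored even if nobody is reading yet. */
    mq_send(q, "hello", 5, 0);

    /* Receiver side (possibly another process, possibly much later). */
    char buf[128];
    ssize_t n = mq_receive(q, buf, sizeof buf, NULL);
    if (n >= 0) printf("received %.*s\n", (int)n, buf);

    mq_close(q);
    mq_unlink("/demo_q");
    return 0;
}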
This is a question that appeared in a past exam at my college, and I am not at all sure how to do it. It involves reading data from a number of processes connected to a central process via different kinds of IPC mechanisms (pipes, FIFOs; one is used with popen()). The central process is also listening on two connection-oriented socket file descriptors. The main challenge of the question is to send the data read from the former processes to ALL the clients using the send() system call only once, and without loops.
The second pitfall in this question is the echo server E. The central process passes one client to E after getting a signal from another process P4. Afterwards, that client is served by E.
The problem statement is very complicated, which is why I could not explain it with clarity. So I am providing the picture of the actual question; please go through it for the details.
I have two main questions:
How do I send a message to all recipients using connection-oriented sockets?
How do I pass a file descriptor to another (unrelated) process?
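I can't reconstruct the full exam setup, but for the second question the standard Linux mechanism is descriptor passing over an AF_UNIX socket with an SCM_RIGHTS control message. A minimal send-side sketch (unix_sock is assumed to be a connected AF_UNIX stream socket to the receiving process, which performs the mirror-image recvmsg()):

/* Pass an open file descriptor to an unrelated process over an
 * AF_UNIX socket using an SCM_RIGHTS control message. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int send_fd(int unix_sock, int fd_to_pass)
{
    char data = 'F';                       /* must send at least 1 byte */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { 0 };

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* "I am passing descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(unix_sock, &msg, 0);    /* receiver uses recvmsg() */
}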
Thank you for reading. I'm currently implementing both the server and the client for a socket server in C on Linux. Currently I have a working "chat" system where both the server and the client can send unique messages, and the other end receives that message with the correct length.
example output:
Server side
You:Hello!
client:hi, how are you?
You: fine thanks.
client: blabla
...and the client side would look as follows:
server: Hello!
you:hi,how are you?
etc etc.
My question is: is there any way for the client/server to send multiple messages before the other replies?
I currently have an endless while loop that waits for a receive and then proceeds to send, repeating until the connection is lost. Using this method I can only send one message before I am forced to wait for a receive. I'm not sure of the correct implementation, as I'm still quite new to both sockets and C! Thanks :)
Yes, it is possible.
The key is that the main body of your code should not block waiting for data on the socket; it should read the socket only when data is already available, which you can determine with the select() function. After the select() call returns, the code reads the socket to display any received messages, and sends the user's messages to the other peer when input is ready.
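A minimal sketch of that idea, assuming sockfd is the already-connected socket from your chat program: watch both standard input and the socket, and service whichever becomes ready, so neither side has to wait for its "turn":

/* select()-based chat loop: no fixed send/receive order. */
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void chat_loop(int sockfd)
{
    char buf[512];
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(STDIN_FILENO, &rfds);         /* user typed something? */
        FD_SET(sockfd, &rfds);               /* peer sent something?  */

        if (select(sockfd + 1, &rfds, NULL, NULL, NULL) == -1)
            break;

        if (FD_ISSET(sockfd, &rfds)) {       /* message from the peer */
            ssize_t n = recv(sockfd, buf, sizeof buf, 0);
            if (n <= 0) break;               /* peer closed or error  */
            printf("peer: %.*s", (int)n, buf);
        }
        if (FD_ISSET(STDIN_FILENO, &rfds)) { /* message from the user */
            if (!fgets(buf, sizeof buf, stdin)) break;
            send(sockfd, buf, strlen(buf), 0);
        }
    }
}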
A generic solution: you can use threading, and I'd propose running the receiving part in a separate thread.
You first code the main thread to manage only sending, as if the application couldn't receive at all. Apparently you have an edit field somewhere (and a message loop of some kind); each time the user presses Enter, you send from within the edit field's callback function.
Then you code a separate thread that calls (and blocks on) receive. Each time the receive returns (i.e. data came in), you do something with the data and then jump back to the receive call. This goes on until you terminate the socket, or otherwise decide not to return to the receive call.
The only situation where the two threads touch each other is when both want to write text to the same chat window. Each should do so immediately as the transmission happens, but they may try to access the chat window at exactly the same moment, causing a crash. Hence you must apply a locking mechanism here: whichever thread tries to access the chat window first gets it, while the locking mechanism keeps the other on hold until the first releases the lock; then the second one can do its job. The locking is, after all, only a matter of microseconds.
These are immediate actions, independent of each other. You don't need to queue multiple messages; each one gets processed as it happens.
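In terms of plain sockets rather than a GUI, the same structure looks roughly like this (a sketch, assuming sockfd is the connected socket and stdout plays the role of the chat window; compile with -pthread):

/* Receiver thread blocks on recv(); main thread handles user input.
 * A mutex serializes access to the shared output. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;

static void *recv_thread(void *arg)
{
    int sockfd = *(int *)arg;
    char buf[512];
    ssize_t n;
    while ((n = recv(sockfd, buf, sizeof buf, 0)) > 0) {
        pthread_mutex_lock(&out_lock);       /* one writer at a time */
        printf("peer: %.*s", (int)n, buf);
        pthread_mutex_unlock(&out_lock);
    }
    return NULL;                             /* peer closed or error */
}

void run_chat(int sockfd)
{
    pthread_t tid;
    pthread_create(&tid, NULL, recv_thread, &sockfd);

    char line[512];                          /* main thread: send only */
    while (fgets(line, sizeof line, stdin)) {
        send(sockfd, line, strlen(line), 0);
        pthread_mutex_lock(&out_lock);
        printf("you: %s", line);
        pthread_mutex_unlock(&out_lock);
    }
    shutdown(sockfd, SHUT_RDWR);             /* wake the receiver up */
    pthread_join(tid, NULL);
}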
I have 20 threads all sending and receiving data on a single TCP socket. When I run my application I don't see any synchronization issues, but as I understand it, there should be problems when two threads simultaneously write to the TCP socket, or when one thread is writing while another is reading.
If my understanding is correct, why don't I see any errors?
Sometimes when you don't look both ways before crossing the street, you still get to the other side of the street safely. That doesn't mean it will work successfully every time you do it.
Here's the thing, you say "you don't see any synchronization issues", but that's only because it happens to do what you want it to do. Flip this around -- the reason you don't see any synchronization issues is because you happen to want it to do what it happens to do. Someone who expected it to do something else would see synchronization issues with the very same code.
In other words, you flipped a coin that could come up heads or tails. You expected it to come up heads, knowing that it was not guaranteed. And it came up heads. There's no mystery -- the explanation is that you expected what it happened to do. Had you expected something else, even had it done the very same thing, it would not have done what you expected.
First, the send and receive streams of each socket are independent. There should be no problem with one thread sending while another is receiving.
If multiple threads attempt to write to one socket, the behaviour is, in general, undefined. In practice, a write call from one of the threads takes a lock inside the TCP stack's state machine first, preventing any other thread from entering; it writes its data, releases the lock, and exits the stack, allowing write calls from other threads to proceed. This serializes individual write calls. If your protocol implementation can send every PDU with one write call, fine. If a PDU requires more than one write call, your outgoing PDUs can get sliced up as the write calls from the multiple threads interleave.
Making receive calls from multiple threads on one socket is just asking for trouble. Even if the stack's internal synchronization allows only one receive call per socket at a time, the streaming nature of TCP will split the received data across the threads in a pseudo-arbitrary manner. Just don't do it; it's crazy.
TCP already has a mechanism for multiplexing data streams - multiple sockets. You should use them correctly.
If you need to multiplex data streams across one socket, you should add a data routing protocol on top of TCP and implement this protocol in just one receive thread. This thread can keep a list of virtual connections and so service stream/message requests from other threads.
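To make the last two points concrete, here is a hedged sketch of such a routing layer: a tiny header (stream id + length, a layout invented for this example) in front of each PDU, with a mutex so every PDU leaves in one uninterleaved write. The single receive thread would read the header first, then len payload bytes, and route by stream id.

/* Send one PDU = [stream id][length][payload] in a single locked write,
 * so PDUs from different threads cannot interleave on the wire. */
#include <arpa/inet.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define MAX_PAYLOAD 4096

static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

int send_pdu(int sockfd, uint16_t stream_id, const void *payload, uint16_t len)
{
    if (len > MAX_PAYLOAD)
        return -1;

    unsigned char frame[4 + MAX_PAYLOAD];
    uint16_t id_n = htons(stream_id);   /* which virtual connection */
    uint16_t len_n = htons(len);        /* payload byte count */
    memcpy(frame, &id_n, 2);
    memcpy(frame + 2, &len_n, 2);
    memcpy(frame + 4, payload, len);

    pthread_mutex_lock(&send_lock);
    /* a full implementation would loop here to handle partial writes */
    ssize_t rc = send(sockfd, frame, 4 + (size_t)len, 0);
    pthread_mutex_unlock(&send_lock);
    return rc == (ssize_t)(4 + len) ? 0 : -1;
}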
I'm trying to implement a BitTorrent client. In order to receive pieces from different peers, the client must manage multiple sockets.
The well-known solutions I know of are:
1. Each thread has one socket.
2. Using the select() call with non-blocking I/O.
3. A mix of 1 and 2.
The first solution requires too many threads. The second wastes CPU time, since it keeps checking up to 50 sockets. Also, with the third solution, I don't know how many threads a single process should use.
Which solution is the best one, to receive a fairly large file?
Is there any web page that give me a good solution?
Any advice would be awesome.
Some high-level ideas from my side. :)
Have a main thread in which you will be doing the "select" / "poll" call for all the connections.
Have a thread pool of worker threads
If, for a particular connection, select indicates that there is data to read, pass the socket plus some additional information to one of the free worker threads to receive/send data on that connection.
Upon completing the work, the worker thread returns to the free-worker queue, ready to be used for another connection (see the sketch below).
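A rough sketch of that hand-off, using a small queue of ready descriptors protected by a mutex and condition variable (the re-arming of a descriptor in the select set after a worker finishes is only hinted at):

/* The select thread pushes ready descriptors; workers pop and do I/O. */
#include <pthread.h>
#include <sys/socket.h>

#define QCAP 64

static int queue[QCAP];
static int qhead, qtail, qcount;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;

static void push_ready(int fd)              /* called by the select thread */
{
    pthread_mutex_lock(&qlock);
    if (qcount < QCAP) {
        queue[qtail] = fd;
        qtail = (qtail + 1) % QCAP;
        qcount++;
        pthread_cond_signal(&qcond);        /* wake one free worker */
    }
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *arg)
{
    (void)arg;
    char buf[1024];
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (qcount == 0)
            pthread_cond_wait(&qcond, &qlock);
        int fd = queue[qhead];
        qhead = (qhead + 1) % QCAP;
        qcount--;
        pthread_mutex_unlock(&qlock);

        ssize_t n = recv(fd, buf, sizeof buf, 0);   /* do the actual I/O */
        (void)n;
        /* ... process n bytes, then tell the select thread to watch fd again ... */
    }
    return NULL;
}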
Hope this helps
You're right, the first solution is the worst.
The second one, with select(), can do the job, but there's a problem: select() scans the entire descriptor set on every call, so its cost grows linearly with the number of sockets, and it is capped by FD_SETSIZE. You should use /dev/poll, epoll(), kqueue(), or the like; don't use select().
Don't use one thread per socket! You will lose a lot of time to context switches.
You should have:
A listener thread: it just does all the accepts and hands each new socket to a worker thread.
Multiple worker threads: they do all the other work. Each checks whether data is available and handles it; one worker thread manages many sockets.
Take a look at Dan Kegel's C10K page if you want more information.
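For illustration, a minimal epoll event loop (a sketch assuming listen_fd is an already-listening TCP socket; the read branch is where you would hand the data to a worker thread):

/* epoll replaces select(): you pay per event, not per descriptor. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void event_loop(int listen_fd)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);   /* block until activity */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {                /* new peer connection */
                int peer = accept(listen_fd, NULL, NULL);
                struct epoll_event pev = { .events = EPOLLIN, .data.fd = peer };
                epoll_ctl(ep, EPOLL_CTL_ADD, peer, &pev);
            } else {                              /* data ready on 'fd' */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof buf);
                if (r <= 0) {                     /* peer gone: clean up */
                    epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                }
                /* ... otherwise process buf, or dispatch it to a worker ... */
            }
        }
    }
}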
Check out some open-source BitTorrent clients and read their code to get ideas; it's the best thing you can do.
I recommend looking at BitTorrent (in C) or Hadouken (in C#), for example:
https://github.com/bittorrent
https://github.com/hadouken/hdkn
I have to receive data from 15 different clients, each of them sending on 5 different ports: 15 * 5 = 75 sockets in total.
For each client the port numbers are defined and fixed, for example client 1 uses ports 3001 to 3005, client 2 uses ports 3051 to 3055, and so on. The ports have one thing in common: the first port of each range (3001, 3051, ...) is used to send commands; the other ports send data.
After receiving the data I have to verify the checksum, keep track of received packets, re-request any packet that is lost, and also write the data to files on disk.
Restriction: I cannot change the above design, and I cannot change from UDP to TCP.
The two methods I'm aware of after reading up are:
Asynchronous multiplexing using select().
A thread per socket.
I tried the first one, and I'm stuck at the point where I get the data. I'm able to receive it, but I have some processing to do, so I want to start a thread for each socket, or a thread for each group of sockets to handle (say all the first ports, all the second ports, etc., i.e. 3001, 3051, and so on).
But here, if a client sends any data, FD_ISSET becomes true, so if I start a thread there, it becomes a thread for every message.
Question:
How do I add the thread code here? If I put pthread_create inside if(FD_ISSET ...), then I create a thread for every message that I receive, but I wanted a thread per socket.
while(1)
{
    int nready = 0;
    read_set = active_set;   /* select() overwrites the set, so work on a copy */
    if((nready = select(fdmax+1, &read_set, NULL, NULL, NULL)) == -1)
    {
        perror("select");
        exit(EXIT_FAILURE);
    }
    printf("number of ready descriptors = %d\n", nready);
    for(index = 1; index <= 15*5; index++)
    {
        if(FD_ISSET(sock_fd[index], &read_set))   /* test the set select() filled */
        {
            sockaddr_in_length = sizeof(client_sockaddr_in);  /* recvfrom() modifies it */
            rc = recvfrom(sock_fd[index], clientmsgInfo, MSG_SIZE, 0,
                          (struct sockaddr *)&client_sockaddr_in,
                          &sockaddr_in_length);
            if(rc < 0)
            {
                printf("socket %d down\n", sock_fd[index]);
                continue;   /* don't print a packet we didn't receive */
            }
            printf("Received packet from %s: %d\nData: %s\n\n",
                   inet_ntoa(client_sockaddr_in.sin_addr),
                   ntohs(client_sockaddr_in.sin_port),
                   clientmsgInfo);   /* print the buffer recvfrom() filled */
        }
    } /* for */
} /* while */
Create the threads at program startup and divide them up to handle data, commands, etc.
How?
1. Let's say you create two threads, one for data and another for commands.
2. Make them wait on a lock that the main thread holds; the main thread keeps one lock for each of them.
3. When client data or a command arrives at the main thread's recvfrom(), copy the buffer, according to its type (data or command), into memory shared between the main thread and the worker threads, and unlock the corresponding mutex.
4. In the worker threads, lock the mutex so the main thread can't corrupt the data; once processing is done, unlock it and go back to waiting.
A better design would be a queue that the main thread fills and the other threads consume element-wise, as in the sketch below.
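A sketch of that queue, assuming the main thread copies each received buffer in as producer and the data/command threads block on a condition variable as consumers (the sizes and the type field are illustrative):

/* Producer/consumer message queue: main thread fills it, workers drain it. */
#include <pthread.h>
#include <string.h>

#define MSG_SIZE 1024
#define MSG_QCAP 32

struct msg { char buf[MSG_SIZE]; int len; int type; };  /* type: data/command */

static struct msg mq[MSG_QCAP];
static int mhead, mtail, mcount;
static pthread_mutex_t mlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t mnonempty = PTHREAD_COND_INITIALIZER;

void enqueue(const char *buf, int len, int type)   /* main thread */
{
    pthread_mutex_lock(&mlock);
    if (mcount < MSG_QCAP) {                       /* drop if full (UDP-like) */
        mq[mtail].len = len;
        mq[mtail].type = type;
        memcpy(mq[mtail].buf, buf, len);
        mtail = (mtail + 1) % MSG_QCAP;
        mcount++;
        pthread_cond_signal(&mnonempty);
    }
    pthread_mutex_unlock(&mlock);
}

void dequeue(struct msg *out)                      /* worker threads */
{
    pthread_mutex_lock(&mlock);
    while (mcount == 0)
        pthread_cond_wait(&mnonempty, &mlock);     /* sleep until produced */
    *out = mq[mhead];
    mhead = (mhead + 1) % MSG_QCAP;
    mcount--;
    pthread_mutex_unlock(&mlock);
}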
I assume that each client context is independent of the others, i.e. one client's socket group can be managed on its own, and the data pulled from those sockets can be processed alone.
You describe two possibilities for handling the problem:
Asynchronous multiplexing: in this setting, the sockets are all managed by one single thread. This thread selects which socket must be read next and pulls data out of it.
Thread per socket: in this scenario, you have as many threads as there are sockets, or more likely groups of sockets, i.e. clients; this is the interpretation I will build on.
In both cases, threads must keep ownership of their respective resources, meaning sockets. If you start moving sockets around between threads, you will make things more difficult than they need to be.
Outside the work that needs to be done, you will need to handle thread management:
How do threads get started?
How and when are they stopped?
What are the error handling policies?
Your question doesn't cover these issues, but they might play a significant role in your final design.
Scenario (2) seems simpler: you have one main "template" (I use the word in a general sense here) for handling a group of sockets, using select on them, and receiving and processing the data in the same thread. It's quite straightforward to implement, with a struct containing the context-specific data (socket ports, a pointer to the packet-processing function) and a single function that loops on select and process, plus perhaps some other checks for errors and thread life management.
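A hedged sketch of that template (the struct fields and the process_packet callback are illustrative, not from the question):

/* One context per client; one thread runs select-and-process over
 * that client's five sockets. */
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

#define PORTS_PER_CLIENT 5

struct client_ctx {
    int sock[PORTS_PER_CLIENT];                   /* this client's sockets */
    void (*process_packet)(int sock, const char *buf, int len);
};

void *client_thread(void *arg)
{
    struct client_ctx *ctx = arg;
    char buf[2048];
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        int maxfd = -1;
        for (int i = 0; i < PORTS_PER_CLIENT; i++) {
            FD_SET(ctx->sock[i], &rfds);
            if (ctx->sock[i] > maxfd) maxfd = ctx->sock[i];
        }
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) == -1)
            break;                                /* error handling policy here */
        for (int i = 0; i < PORTS_PER_CLIENT; i++) {
            if (FD_ISSET(ctx->sock[i], &rfds)) {
                ssize_t n = recv(ctx->sock[i], buf, sizeof buf, 0);
                if (n > 0)
                    ctx->process_packet(ctx->sock[i], buf, (int)n);
            }
        }
    }
    return NULL;
}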
Scenario (1) requires a different setup: one I/O thread reads all the packets and passes them on to specialized worker threads for processing. If a processing error occurs, a worker thread has to generate the ad hoc packet to be sent back to the client and pass it to the I/O thread for sending. You will need packet queues in both directions to allow communication between the I/O thread and the workers, and the I/O thread has to check the worker queues somehow for resend requests. So this solution is a bit more expensive in terms of development, but it reduces the I/O contention to one single point. It's also more flexible, in case some processing must be done across data coming from several clients, or if you want to chain up processing somehow. For instance, you could instead have one thread per client socket, and then another thread per client group of sockets further down the work pipeline, with each step of the pipeline interconnected by a packet queue.
A blend of both solutions can of course be implemented, with one I/O thread per client and pipelined worker threads.
The advantage of both outlined solutions is the fixed number of threads: no need to spawn and destroy threads on demand (although you could design a thread pool to handle that as well).
For a solution involving moving sockets between threads, the questions are:
When should these resources be passed on? What happens after a worker thread has read a packet? Should it return the socket to the I/O thread, or risk a blocking read on the socket for the next packet? If it does a select to poll the socket for more packets, we fall into scenario (2), where each client will have its own I/O thread when there is network traffic from all of them, in which case what is the gain of the initial I/O thread doing the select?
If it passes the socket back, should the I/O thread wait for all workers to give back their sockets before initiating another select? If it waits, it risks making unserved clients wait for packets already in the network buffers, inducing processing lag. If it does not wait, and returns to select to avoid lag on unserved sockets, then the served ones will have to wait for the next wake-up to see their sockets back in the select pool.
As you can see, the problem is difficult to handle. That's why I recommend exclusive socket ownership by threads, as described in scenarios (1) and (2).
Your solution requires a fixed, relatively small, number of connections.
Create a helper procedure that spawns thread procedures, one listening on each of the client's five ports; each thread blocks on recvfrom(), processes the data, and blocks again. You can then call the helper 15 times, once per client, to create all the threads.
This avoids all polling and lets Linux schedule each thread when its I/O completes. No CPU is used while waiting, and this can scale to somewhat larger solutions.
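A sketch of that helper, assuming the sockets are already created and bound (the port_arg struct and the processing hook are illustrative):

/* One thread per socket, each blocking on recvfrom(): no polling. */
#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>

struct port_arg { int sockfd; };

static void *port_thread(void *p)
{
    struct port_arg *arg = p;
    char buf[2048];
    struct sockaddr_in peer;
    socklen_t plen;
    for (;;) {
        plen = sizeof peer;
        ssize_t n = recvfrom(arg->sockfd, buf, sizeof buf, 0,
                             (struct sockaddr *)&peer, &plen);
        if (n < 0)
            break;                     /* socket closed or fatal error */
        /* checksum check, sequence tracking, write to disk, ... */
    }
    return NULL;
}

/* Helper: spawn one blocking-receive thread per socket. */
void spawn_port_threads(int *sockfds, int count, pthread_t *tids,
                        struct port_arg *args)
{
    for (int i = 0; i < count; i++) {
        args[i].sockfd = sockfds[i];
        pthread_create(&tids[i], NULL, port_thread, &args[i]);
    }
}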
If you need to scale massively, why not use a single set of ports and get the peer address from the client_sockaddr_in structure? If the processing takes a material amount of time, you could extend the scheme by keeping a pool of threads available: assign one each time a message is received, continue processing the message on that thread, and return the thread to the pool after the response is sent.