What's the difference between Linux SOCK_RAW and SOCK_STREAM?

What's the difference between Linux SOCK_RAW and SOCK_STREAM? - c

I've been studying networking with c code and cryptography lately and upon pondering random questions I stumbled across a block of code that's used for packet sniffing and I had a question on the actual socket that gets used in the function recvfrom(). The socket gets initialized through the following sock function rawSock = socket(AF_INET, SOCK_RAW, 0).
I understand that SOCK_STREAM and SOCK_RAW are macros that represent an integer; but the question isn't about the values, it's about the results.
When would I use SOCK_STREAM over SOCK_RAW and vice versa?
I understand basic client and server communications using SOCK_STREAM. I'm working with C and in Linux

Read the man page.
For the prototype
int socket(int domain, int type, int protocol);
The types can be
SOCK_STREAM Provides sequenced, reliable, two-way, connection-
based byte streams. An out-of-band data transmission
mechanism may be supported.
or
SOCK_RAW Provides raw network protocol access.
In one line, SOCK_STREAM is for connection oriented sockets, where the underlying OS creates and manages the headers for L4 (TCP), L3 and L2. OTOH SOCK_RAW provides more fine-grained control over header and packet construction, where the user has to construct and supply the headers and can also manage the contents.
To elaborate:
Sockets of type SOCK_STREAM are full-duplex byte streams. They do
not preserve record boundaries. A stream socket must be in a
connected state before any data may be sent or received on it. A
connection to another socket is created with a connect(2) call. Once
connected, data may be transferred using read(2) and write(2) calls
or some variant of the send(2) and recv(2) calls. When a session has
been completed a close(2) may be performed. Out-of-band data may
also be transmitted as described in send(2) and received as described
in recv(2).
and
SOCK_RAW sockets allow sending of datagrams to
correspondents named in sendto(2) calls. Datagrams are generally
received with recvfrom(2), which returns the next datagram along with
the address of its sender.

Related

Questions on using raw sockets for sniffing UDP packets

I want to use raw sockets for packet analysis/sniffing limited to UDP. Yes I am aware of wireshark and libpcap. The scope of my analyzer is very limited and has to be implemented in C on Linux.
I know that we start with:
sd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
Then while reading, using recvfrom, we use the flag MSG_PEEK.
Questions:
Is the MSG_PEEK flag required if we use SOCK_RAW? .
If I read from this socket, will I get all incoming UDP packets on all interfaces and ports?
Does the read (with MSG_PEEK flag or otherwise) allow reading non-blocking sockets just like it would be had MSG_PEEK flag not been used? (I am using epoll). In epoll_ctl, one has to specify the fd to be monitored. Are all UDP sockets mapped to this single socket? So, if data is available on any UDP socket opened by other processes, this socket's EPOLLIN event will be triggered?
Will this reading (with MSG_PEEK or without) impact the reading of sockets in the actual program that use the packets for real applications?

Raw sockets receiving messages sent by itself

I was trying to write some codes using raw sockets, while I observed some strange phenomenon. Consider the code:
int rsfd = socket(AF_INET,SOCK_RAW,253);
if(rsfd<0)
{
perror("Raw socket not created");
}
else
{
struct sockaddr_in addr2;
memset(&addr2,0,sizeof(addr2));
addr2.sin_family = AF_INET;
addr2.sin_addr.s_addr = inet_addr("127.0.0.2");
/* if(connect(rsfd,(struct sockaddr*)&addr2,sizeof(addr2))<0)
{
perror("Could not connect");continue;
} */
}
Now if I remove the commented portion, whatever message I am sending through this rsfd is also being received by itself. On the other end I have already bound a socket with the ip address 127.0.0.2. When I printed the ip address of the sender socket, it is printing 127.0.0.1 but still it is receiving packets which is meant for 127.0.0.2. This problem was solved when I added that connect request which is mentioned in the commented portion. This seems weird because on the other side, no one is accepting or listen on this address and moreover, I am using sendto and recvfrom functions for sending and receiving packets which is used for connection less sockets. My question is, why is this happening? How is this connect request solvong the problem here?

Now if I [don't connect() the socket], whatever message I am sending through this rsfd is also being received by itself.
I note first that raw sockets are an extension to POSIX. Linux offers them, and I think other systems do too, but details of their behavior are not certain to be consistent across implementations.
With that said, the problem seems likely to be that you are not bind()ing your socket to any address. On Linux, for example, the docs for raw sockets note that
A raw socket can be bound to a specific local address using the
bind(2) call. If it isn't bound, all packets with the specified IP
protocol are received.
(Emphasis added.) On a system where raw sockets have that behavior, if you're sending packets to an IP loopback address via a raw IP socket that is neither bound nor connected then yes, the source socket will receive them, or at least may do.
It's unclear why connecting the socket solves the problem, or why it is even successful at all. The behavior of connect() is unspecified for socket types other than the standard ones, SOCK_DGRAM, SOCK_STREAM, and SOCK_SEQPACKET. However, the behavior you observe is consistent with connect() having an effect on raw sockets like that it has on datagram sockets, which are also connectionless:
If the socket sockfd is of type SOCK_DGRAM, then addr is the address
to which datagrams are sent by default, and the only address from
which datagrams are received.
Instead of relying on that discovered behavior, however, I suggest following the documented (at least on Linux) procedure of binding the socket to an address (including a port), and communicating with it at that address.

What is the HOPOPT protocol and how does socket() work?

I'm messing with sockets in C and this protocol continues to show up, I couldn't find anything about it, so what is it used for? What's the difference between HOPOPT and IP?
Also i'm don't get why the last argument of the socket() function should be 0. According to the man page:
The protocol specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family, in which case protocol can be specified as 0. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is specific to the “communication domain” in which communication is to take place; see protocols(5). See getprotoent(3) on how to map protocol name strings to protocol numbers.
As far as I understand setting the last argument to 0 will let the standard library to decide which protocol to use but in which case would one use a number other than 0?

HOPOPT is the acronym of the Hop-by-Hop IPv6 extension header. It is a header that allows to add even more options to an IPv6 packet. It is normal that IPv6 packets include this header.
socket() is the system call that BSD and others (Linux et al.) provide to create a new socket, that is the internal representation of a network connection. When creating a new socket, the desired protocol must be specified: TCP, UDP, etc., which may go over IPv4, IPv6, etc.
The paragraph that you are citing explains that one or many protocols may exist for each socket type.
If only one exists, the protocol argument must be zero. For instance, SOCK_STREAM sockets are only implemented by TCP:
int sk = socket(AF_INET, SOCK_STREAM, 0);
If more exist, than you must specify which protocol in particular you want to use. For example, the SOCK_SEQPACKET type can be implemented with the SCTP protocol:
int sk = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
So, in conclusion:
If you want to create a socket, choose what protocol to use, for instance TCP over IPv4.
HOPOPT is totally normal in an IPv6 packet. If you see it appear in your traces, because you created an IPv6 socket (using AF_INET6), it is OK.

Purpose of sendto address for C raw socket?

I'm sending some ping packets via a raw socket in C, on my linux machine.
int sock_fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
This means that I specify the IP packet header when I write to the socket (IP_HDRINCL is implied).
Writing to the socket with send fails, telling me I need to specify an address.
If I use sendto then it works. For sendto I must specify a sockaddr_in struct to use, which includes the fields sin_family, sin_port and sin_addr.
However, I have noticed a few things:
The sin_family is AF_INET - which was already specified when the socket was created.
The sin_port is naturally unused (ports are not a concept for IP).
It doesn't matter what address I use, so long as it is an external address (the IP packet specifies 8.8.8.8 and the sin_addr specifies 1.1.1.1).
It seems none of the extra fields in sendto are actually used to great extent. So, is there a technical reason why I have to use sendto instead of send or is it just an oversight in the API?

Writing to the socket with send fails, telling me I need to specify an address.
It fails, because the send() function can only be used on connected sockets (as stated here). Usually you would use send() for TCP communication (connection-oriented) and sendto() can be used to send UDP datagrams (connectionless).
Since you want to send "ping" packets, or more correctly ICMP datagrams, which are clearly connectionless, you have to use the sendto() function.
It seems none of the extra fields in sendto are actually used to great
extent. So, is there a technical reason why I have to use sendto
instead of send or is it just an oversight in the API?
Short answer:
When you are not allowed to use send(), then there is only one option left, called sendto().
Long answer:
It is not just an oversight in the API. If you want to send a UDP datagram by using an ordinary socket (e.g. SOCK_DGRAM), sendto() needs the information about the destination address and port, which you provided in the struct sockaddr_in, right? The kernel will insert that information into the resulting IP header, since the struct sockaddr_in is the only place where you specified who the receiver will be. Or in other words: in this case the kernel has to take the destination info from your struct as you don't provide an additional IP header.
Because sendto() is not only used for UDP but also raw sockets, it has to be a more or less "generic" function which can cover all the different use cases, even when some parameters like the port number are not relevant/used in the end.
For instance, by using IPPROTO_RAW (which automatically implies IP_HDRINCL), you show your intention that you want to create the IP header on your own. Thus the last two arguments of sendto() are actually redundant information, because they're already included in the data buffer you pass to sendto() as the second argument. Note that, even when you use IP_HDRINCL with your raw socket, the kernel will fill in the source address and checksum of your IP datagram if you set the corresponding fields to 0.
If you want to write your own ping program, you could also change the last argument in your socket() function from IPPROTO_RAW to IPPROTO_ICMP and let the kernel create the IP header for you, so you have one thing less to worry about. Now you can easily see how the two sendto()-parameters *dest_addr and addrlen become significant again because it's the only place where you provide a destination address.
The language and APIs are very old and have grown over time. Some APIs can look weird from todays perspective but you can't change the old interfaces without breaking a huge amount of existing code. Sometimes you just have to get used to things that were defined/designed many years or decades ago.
Hope that answers your question.

The send() call is used when the sockets are in a TCP SOCK_STREAM connected state.
From the man page:
the send() call may be used only when the socket is in a connected
state (so that the intended recipient is known).
Since your application obviously does not connect with any other socket, we cannot expect send() to work.

In addition to InvertedHeli's answer, the dest_addr passed in sendto() will be used by kernel to determine which network interface to used.
For example, if dest_addr has ip 127.0.0.1 and the raw packet has dest address 8.8.8.8, your packet will still be routed to the lo interface.

Making a reliable UDP by socket function in c

I am having this doubt in socket programming which I could not get cleared by reading the man pages.
In c the declaration of socket function is int socket(int domain, int type, int protocol);
The linux man page says that while type decides the stream that will be followed the protocol number is the one that decides the protocol being followed.
So my question is that suppose I give the type parameter as SOCK_STREAM which is reliable and add the protocol number for UDP would it give me a reliable UDP which is same as TCP but without flow control and congestion control.
Unfortunately I can't test this out as I have a single machine so there is no packet loss happening.
Could anyone clear this doubt? Thanks a lot...

UDP cannot be made reliable. Transmission of the packets is done on a "best effort" capacity, but any router/host along the chain is free to drop the packet in the garbage and NOT inform the sender that it has done so.
That's not to say you can't impose extra semantics on the sending and receiving ends to expect packets within a certain time frame and say "hey, we didn't get anything in the last X seconds". But that can't be done at the protocol level. UDP is a "dump it into the outbox and hope it gets there" protocol.

No. For an IPV4 or IPV6 protocol stack, SOCK_STREAM is going to get you TCP and the type SOCK_DGRAM is going to give you UDP. The protocol parameter is not used for either of the choices and the socket library is typically expecting a value of 0 to be specified there.

If you do this:
socket(AF_INET,SOCK_STREAM,IPPROTO_UDP):
socket() will return -1 and sett errno to
EPROTONOSUPPORT
The protocol type or the specified protocol
is not supported within this domain.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight