Invoke web page from Linux C

I need to read all the HTML text from a URL like http://localhost/index.html into a string in C.
I know that if I run telnet www.google.com 80 and type a GET request for the page, it returns all the HTML.
How do I do this in a Linux environment with C?

I would suggest using a couple of libraries, which are commonly available on most Linux distributions:
libcurl and libxml2
libcurl provides a comprehensive suite of HTTP features, and libxml2 provides a module for parsing HTML, called HTMLParser.
Hope that points you in the right direction

Below is a rough outline of code (i.e. not much error checking, and I haven't tried to compile it) to get you started, but use http://www.tenouk.com/cnlinuxsockettutorials.html to learn socket programming. Look up gethostbyname if you need to translate a hostname (like google.com) into an IP address. You may also need to parse the content length out of the HTTP response and then keep calling recv until you have received all the bytes.
#include <arpa/inet.h>   // inet_addr
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>      // close
void getWebpage(char *buffer, int bufsize, char *ipaddress)
{
    int sockfd;
    ssize_t received;
    struct sockaddr_in destAddr;
    if((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1){
        fprintf(stderr, "Error opening client socket\n");
        return; // socket() failed, so there is no descriptor to close
    }
    memset(&destAddr, 0, sizeof(destAddr));
    destAddr.sin_family = AF_INET;
    destAddr.sin_port = htons(80); // HTTP port is 80
    destAddr.sin_addr.s_addr = inet_addr(ipaddress); // Get int representation of IP
    if(connect(sockfd, (struct sockaddr *)&destAddr, sizeof(destAddr)) == -1){
        fprintf(stderr, "Error with client connecting to server\n");
        close(sockfd);
        return;
    }
    // Send http request; the blank line (\r\n\r\n) marks the end of the request
    const char *httprequest = "GET / HTTP/1.0\r\n\r\n";
    send(sockfd, httprequest, strlen(httprequest), 0);
    // One recv may return only part of the response; keep room for the NUL
    received = recv(sockfd, buffer, bufsize - 1, 0);
    buffer[received > 0 ? received : 0] = '\0';
    close(sockfd);
    // Now buffer has the HTTP response which includes the webpage. You can either
    // trim off the HTTP header, or just leave it in depending on what you are doing
    // with the page
}
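The remark above about parsing the content length could be sketched roughly like this. It is only a sketch under assumptions: the helper names http_body_start and http_content_length are made up for illustration, the whole header block is assumed to already sit in a NUL-terminated buffer, and chunked transfer encoding is ignored.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* Hypothetical helper: returns a pointer to the first byte after the blank
 * line that ends the HTTP headers, or NULL if the headers are incomplete. */
const char *http_body_start(const char *response)
{
    const char *end = strstr(response, "\r\n\r\n");
    return end ? end + 4 : NULL;
}

/* Hypothetical helper: returns the Content-Length value, or -1 if the
 * header is absent or malformed. Header names are case-insensitive. */
long http_content_length(const char *response)
{
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof(name) - 1;
    const char *body = http_body_start(response);
    if (body == NULL)
        return -1;

    const char *header_end = body - 4;              /* start of "\r\n\r\n" */
    const char *line = strstr(response, "\r\n");    /* end of status line */

    while (line != NULL && line < header_end) {
        line += 2;                                  /* start of next header */
        if (strncasecmp(line, name, name_len) == 0) {
            const char *p = line + name_len;
            while (*p == ' ' || *p == '\t')
                p++;                                /* skip optional blanks */
            if (!isdigit((unsigned char)*p))
                return -1;
            return strtol(p, NULL, 10);
        }
        line = strstr(line, "\r\n");                /* end of this header */
    }
    return -1;
}
```

With the length known, the caller can keep calling recv until the number of bytes after http_body_start() reaches that value.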

If you really don't feel like messing around with sockets, you could always create a named temp file, fork off a process and execvp() it to run wget -O with that file's name, and then read the input from that temp file.
Although this would be a pretty lame and inefficient way to do things, it would mean you wouldn't have to mess with TCP and sending HTTP requests.

You use sockets, interrogate the web server with HTTP (where you have "http://localhost/index.html") and then parse the data which you have received.
Helpful if you are a beginner in socket programming: http://beej.us/guide/bgnet/
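As a rough illustration of the "receive the data" step, here is a minimal sketch that keeps calling recv until the server closes the connection, which is what normally happens after an HTTP/1.0 response. The function name recv_all is an assumption made up for this example, and the socket is assumed to be already connected with the request already sent.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: reads from sockfd until the peer closes the
 * connection and returns a malloc'd, NUL-terminated buffer (caller frees).
 * Returns NULL on error. *out_len, if non-NULL, receives the byte count. */
char *recv_all(int sockfd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len + 1 == cap) {               /* keep one byte for the NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = recv(sockfd, buf + len, cap - len - 1, 0);
        if (n == 0)
            break;                          /* peer closed: end of response */
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted, try again */
            free(buf);
            return NULL;
        }
        len += (size_t)n;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

The returned string still starts with the HTTP status line and headers; the page itself begins after the first empty line.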

Assuming you know how to read a file into a string, I'd try
const char *url_contents(const char *url) {
    // create w3m command and pass it to popen()
    // (the URL is pasted into a shell command, so it must not contain a single quote)
    int bufsize = strlen(url) + 100;
    char *buf = malloc(bufsize);
    snprintf(buf, bufsize, "w3m -dump_source '%s'", url);
    // get a file handle, read all the html from it, close, and return
    FILE *html = popen(buf, "r");
    free(buf);
    const char *s = read_file_into_string(html); // you write this function
    pclose(html); // a stream opened with popen() is closed with pclose()
    return s;
}
You fork a process, but it's a lot easier to let w3m do the heavy lifting.
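For completeness, one possible shape for the read_file_into_string helper that the snippet leaves to the reader. This is a sketch of my own, not the answerer's code: it grows a heap buffer as it reads and returns a NUL-terminated string that the caller must free.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from fp until EOF into a malloc'd, NUL-terminated
 * string. Returns NULL on allocation failure or read error. */
char *read_file_into_string(FILE *fp)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len + 1 == cap) {               /* keep one byte for the NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        if (n == 0)
            break;                          /* EOF or error */
        len += n;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

It works on any FILE *, including the one returned by popen().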

Related

Listening to virtual network interface

Don't get confused by me talking about L2TP. Although my problem is related to L2TP, it is not an L2TP problem per se. It's more of a networking problem.
Background
I'm writing an application working with L2TP. This is my first time working with L2TP and the Linux L2TP subsystem, so I hope I got all this right.
When creating an L2TP Ethernet session, the subsystem automatically creates a virtual network interface.
After bringing the interface up I can check with Wireshark, and indeed the desired data is sent to the interface. This is without any packaging, though. It's not inside an Ethernet frame or anything, but just the data bytes which were included in the L2TP packet.
I have no control over actually creating the device, but I can query its name and therefore its index etc., so far so good.
The actual problem
My question is actually pretty simple: How do I get the data which is sent to a virtual interface into my userspace application?
I don't have a lot of experience with networking on Unix, but my expectation would be that this is a fairly simple problem, solvable by either obtaining a file descriptor on which I can use read / recv or somehow binding a socket to just that network interface.
I couldn't find any (gen-)netlink / ioctl API (or anything else) to do this or something comparable.
Although my application is written in Go, not in C, a solution in C would be completely sufficient. To be honest, at this point I would be happy about any approach to solve this issue programmatically. :)
Thanks a lot in advance
I just found a tutorial which answers my own question. It was actually really easy using AF_PACKET sockets.
There is a lovely tutorial on microhowto.info, which explains how AF_PACKET sockets work, better than I ever could. It even includes a section "Capture only from a particular network interface".
Here is a minimal example, which worked for my use case:
#include <stdlib.h>
#include <stdio.h>
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <linux/if_packet.h>
#include <sys/socket.h>
// [...]
// Create socket
int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (fd == -1) {
    perror("ERROR socket");
    exit(1);
}
// Interface index (i.e. obtainable via ioctl SIOCGIFINDEX)
int ifindex = 1337;
// create link layer socket address
struct sockaddr_ll addr = {0};
addr.sll_family = AF_PACKET;
addr.sll_ifindex = ifindex;
addr.sll_protocol = htons(ETH_P_ALL);
if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
    perror("ERROR bind");
    exit(1);
}
char buffer[65535];
ssize_t len;
do {
    len = recv(fd, buffer, sizeof(buffer) - 1, 0);
    if (len < 0) {
        perror("ERROR recv");
        exit(1);
    }
    printf("received data (length: %i)\n", (int) len);
} while (len > 0);
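The hard-coded interface index in the example can be looked up by name. A minimal sketch, using if_nametoindex() instead of the SIOCGIFINDEX ioctl mentioned in the comment; the helper name make_packet_addr is made up for illustration.

```c
#include <string.h>
#include <arpa/inet.h>
#include <net/if.h>          /* if_nametoindex */
#include <net/ethernet.h>    /* ETH_P_ALL */
#include <linux/if_packet.h> /* struct sockaddr_ll */
#include <sys/socket.h>

/* Hypothetical helper: fills addr so that it can be passed to bind() on an
 * AF_PACKET socket to capture only from the interface named ifname.
 * Returns 0 on success, -1 if no such interface exists. */
int make_packet_addr(const char *ifname, struct sockaddr_ll *addr)
{
    unsigned int ifindex = if_nametoindex(ifname);  /* 0 means "not found" */
    if (ifindex == 0)
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sll_family = AF_PACKET;
    addr->sll_ifindex = (int)ifindex;
    addr->sll_protocol = htons(ETH_P_ALL);
    return 0;
}
```

The filled structure takes the place of addr in the bind() call above. Creating the AF_PACKET socket itself still requires root or CAP_NET_RAW.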

Bluetooth on the EV3

Before I get started. Yes, I could use leJOS, ev3dev, or some others, but I'd like to do it this way because that is how I learn.
I am using the CodeSourcery arm-2009q1 arm toolchain. I fetched the required libraries (bluetooth) from here: https://github.com/mindboards/ev3sources.
I am uploading the programs to the brick by using this tool: https://github.com/c4ev3/ev3duder
I have also fetched the brick's shared libraries, but I cannot get them to work properly, and there is no documentation on how to write a C program for the EV3 using the shared libraries. If I could get that working I might be able to use the c_com module to handle Bluetooth, but right now BlueZ and RFCOMM in conjunction with https://github.com/c4ev3/EV3-API for motor and sensor control seem to be my best bet.
Now, with that out of the way:
I'd like to run the EV3 as a Bluetooth "server", meaning that I start a program on it and the program opens a socket, binds it, listens for a connection, and then accepts a single connection.
I am able to open a socket and bind it to anything but channel 1 (I believe this might be the crux of my issue), and I am able to listen. These calls all return 0 (OK) and everything is fine.
Then I try to accept a connection. That instantly returns -1 and sets the remote address to 00:00:00:00:00:00.
My code is pretty much the same as can be found here: https://people.csail.mit.edu/albert/bluez-intro/x502.html
Here it is:
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/rfcomm.h>
#include <ev3.h>
int main(int argc, char **argv)
{
    InitEV3();
    struct sockaddr_rc loc_addr = { 0 }, rem_addr = { 0 };
    char buf[1024] = { 0 };
    int sock, client, bytes_read;
    socklen_t opt = sizeof(rem_addr);
    sock = socket(AF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
    loc_addr.rc_family = AF_BLUETOOTH;
    loc_addr.rc_bdaddr = *BDADDR_ANY;
    loc_addr.rc_channel = 2; // <-- Anything but 1. 1 seems to be taken
    bind(sock, (struct sockaddr *)&loc_addr, sizeof(loc_addr));
    listen(sock, 1);
    // accept one connection <-- PROGRAM FAILS HERE AS accept() returns -1
    client = accept(sock, (struct sockaddr *)&rem_addr, &opt);
    // ---- All following code is irrelevant because accept fails ----
    ba2str( &rem_addr.rc_bdaddr, buf );
    fprintf(stderr, "accepted connection from %s\n", buf);
    memset(buf, 0, sizeof(buf));
    bytes_read = read(client, buf, sizeof(buf));
    if( bytes_read > 0 )
        printf("received [%s]\n", buf);
    close(client);
    close(sock);
    FreeEV3();
    return 0;
}
I am able to get the same code working on my Raspberry Pi, even with communication back and forth, when the ev3api-specific functions are commented out. I just can't fathom why it won't work on the EV3.
I figured it out.
On my Raspberry Pi, the accept call worked as expected with no quirks. On the EV3, however, the accept call is non-blocking even though it has not been told to behave that way.
The solution was to place the accept call in a loop until an incoming connection was in the queue. Note that the third argument of accept is a pointer to the address length (opt in the code above), not the length itself.
while (errno == EAGAIN && ButtonIsUp(BTNEXIT) && client < 0)
    client = accept(sock, (struct sockaddr*)&rem_addr, &opt);
I'll upload the code to github. Contact me if you'd like to do something similar with the EV3.
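The retry loop above spins on accept. A possible alternative, sketched here with poll() rather than the busy loop from the answer, is to sleep until the listening socket reports a pending connection. This assumes poll() behaves on the EV3's RFCOMM sockets as it does on ordinary sockets, which I have not verified; accept_with_timeout is a made-up name.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: waits up to timeout_ms for a pending connection on
 * a listening socket, then accepts it. Returns the client descriptor, or
 * -1 on timeout or error (errno is set to ETIMEDOUT on timeout). */
int accept_with_timeout(int listen_fd, struct sockaddr *addr,
                        socklen_t *addrlen, int timeout_ms)
{
    struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

    for (;;) {
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0) {
            errno = ETIMEDOUT;              /* nobody connected in time */
            return -1;
        }
        if (ready < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted, wait again */
            return -1;
        }
        int client = accept(listen_fd, addr, addrlen);
        if (client >= 0)
            return client;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;
        /* the connection went away between poll() and accept(): wait again */
    }
}
```

Calling it repeatedly with a short timeout leaves room to check ButtonIsUp(BTNEXIT) between attempts, as the original loop does.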

UDP socket at webassembly

I'm trying to port my desktop app, written in C and C++, to the WebAssembly platform and am investigating whether that is possible at all. One of the important things the app does is communicate by sending and receiving UDP messages. I have implemented a minimal UDP client which just creates a UDP socket and sends packets to a server (which is built natively and runs as a separate executable on the same machine). The socket, bind and sendto APIs return no error and everything appears to work, but no messages are received on the server side and Wireshark shows no activity on that port.
Is the UDP socket just stubbed in the WebAssembly libc port, or is it implemented on top of some web-standard connection (e.g. WebRTC)?
The client code is below. I checked that the native build works properly.
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#define BUFLEN 512
#define NPACK 100
#define PORT 9930
void diep(char *s)
{
    perror(s);
    exit(1);
}
#define SRV_IP "127.0.0.1"
int main(void)
{
    struct sockaddr_in si_other;
    int s, i, slen=sizeof(si_other);
    char buf[BUFLEN];
    if ((s=socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP))==-1)
        diep("socket");
    memset((char *) &si_other, 0, sizeof(si_other));
    si_other.sin_family = AF_INET;
    si_other.sin_port = htons(PORT);
    if (inet_aton(SRV_IP, &si_other.sin_addr)==0) {
        fprintf(stderr, "inet_aton() failed\n");
        exit(1);
    }
    for (i=0; i<NPACK; i++) {
        printf("Sending packet %d\n", i);
        sprintf(buf, "This is packet %d\n", i);
        if (sendto(s, buf, BUFLEN, 0, (struct sockaddr*)&si_other, slen)==-1)
            diep("sendto()");
    }
    close(s);
    return 0;
}
I followed instructions from http://webassembly.org/getting-started/developers-guide/ to build and run it.
Thanks in advance for any help or clues!
I found out how UDP sockets are implemented in WebAssembly. They are actually emulated by WebSockets. It would probably work if both client and server were WebAssembly modules, but my server is built natively. As wasm doesn't support dynamic linking, all the code (including the libc implementation) is bundled into one JS file, where we can find the UDP sendto implementation:
// if we're emulating a connection-less dgram socket and don't have
// a cached connection, queue the buffer to send upon connect and
// lie, saying the data was sent now.
if (sock.type === 2) {
    if (!dest || dest.socket.readyState !== dest.socket.OPEN) {
        // if we're not connected, open a new connection
        if (!dest || dest.socket.readyState === dest.socket.CLOSING || dest.socket.readyState === dest.socket.CLOSED) {
            dest = SOCKFS.websocket_sock_ops.createPeer(sock, addr, port);
        }
        dest.dgram_send_queue.push(data);
        return length;
    }
}
Anything that runs in the browser will not give you native socket access, and I suspect that browser vendors would strongly object to any such access, as it is a potential security violation.
Perhaps browser vendors will change their stance as more native applications move to the web and the performance gap shrinks thanks to WebAssembly and similar initiatives, but until then, anything that wants direct socket control has to remain a native app.

winsock2 P2P without port forwarding

I am new to winsock2 and networking in general, but I am not new to C.
My goal is to make a program that can send and receive data from one computer to another.
Basically I want to make something like this:
Computer one initiates a transfer with computer two. Computer one knows the IP address of computer two, but computer two does not know the IP address of computer one. In other words, computer two can be thought of as a server and computer one as a client.
I would like this to work without either of the users needing to mess with router settings such as port forwarding. My idea was to make something like an HTTP server. The reason is that pretty much every router lets users browse webpages, which regularly involves sending and receiving data, and that is my goal. And just like in what I want to do, the server does not know the client's IP address until the client requests something from the server. With that said, I realize that I should model my program on HTTP. I decided to first write a simple program testing the program's ability to send a webpage.
#include <Winsock2.h>
#include <windows.h>
#include <stdio.h>
static const char html[]="HTTP/1.1 200 OK\r\n"
"Connection: close\r\n"
"Content-type: text/html\r\n\r\n"
"<html>\r\n"
"<head>\r\n"
"<title>Html Test</title>\r\n"
"</head>\r\n"
"<body>\r\n";
static const char htmlend[]="</body>\r\n"
"</html>\r\n\r\n";
static const char * defaultStr="Default";
int main(void){
    int exit=0;
    WSADATA wsa;
    char buffer[512];
    int bytes;
    SOCKET s,client;
    SOCKADDR_IN localAddress;
    WSAStartup(MAKEWORD(2,2),&wsa);
    while(!exit){
        char * str;
        s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        localAddress.sin_family = AF_INET;
        localAddress.sin_port = htons(80);
        localAddress.sin_addr.s_addr = INADDR_ANY;
        bind(s, (SOCKADDR*)&localAddress, sizeof(localAddress));
        listen(s, SOMAXCONN);
        client = accept(s,NULL,NULL);
        closesocket(s);
        bytes = recv(client,buffer,512,0);
        fputs(buffer,stdout);
        str=strstr(buffer,"GET /")+5;
        if(str){
            char *str3;
            if(*str==' '){
                str=(char*)defaultStr;
            }else{
                char *str2=strstr(str," ");
                *str2=0;
                if(!strcmp(str,"Exit"))
                    exit=1;
            }
            puts(str);
            str3=malloc(strlen(html)+strlen(htmlend)+strlen(str)+6);
            strcpy(str3,html);
            strcat(str3,str);
            strcat(str3,"\r\n");
            strcat(str3,htmlend);
            fputs(str3,stdout);
            send(client,str3,strlen(str3),0);
            free(str3);
        }
        shutdown(client,SD_BOTH);
        closesocket(client);
    }
    WSACleanup();
    return 0;
}
The issue with the above program is that it works only on the internal network. When I try to access it from the Internet, nothing happens. Am I even taking the right approach (simulating HTTP for P2P)? What is wrong with my program that keeps it from working over the Internet? If anyone has an answer to either or both of these questions, I thank you in advance.

How to create a simple Proxy to access web servers in C

I’m trying to create a small web proxy in C. First, I’m trying to get a webpage by sending a GET request to the server.
I don’t know what I have missed, but I am not receiving any response. I would really appreciate it if you could help me find what is missing in this code.
int main (int argc, char** argv) {
    int cache_size, //size of the cache in KiB
        port,
        port_google = 80,
        dir,
        mySocket,
        socket_google;
    char google[] = "www.google.es", ip[16];
    struct sockaddr_in socketAddr;
    char buffer[10000000];
    if (GetParameters(argc,argv,&cache_size,&port) != 0)
        return -1;
    GetIP (google, ip);
    printf("ip2 = %s\n",ip);
    dir = inet_addr (ip);
    printf("ip3 = %i\n",dir);
    /* Creation of a socket with Google */
    socket_google = conectClient (port_google, dir, &socketAddr);
    if (socket_google < 0) return -1;
    else printf("Socket created\n");
    sprintf(buffer,"GET /index.html HTTP/1.1\r\n\r\n");
    if (write(socket_google, (void*)buffer, MESSAGE_LENGTH+1) < 0 )
        return 1;
    else printf("GET frame sent\n");
    strcpy(buffer,"\n");
    read(socket_google, buffer, sizeof(buffer));
    // strcpy(message,buffer);
    printf("%s\n", buffer);
    return 0;
}
And this is the code I use to create the socket. I think this part is OK, but I include it just in case.
int conectClient (int puerto, int direccion, struct sockaddr_in *socketAddr) {
    int mySocket;
    char error[1000];
    if ( (mySocket = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        printf("Error when creating the socket\n");
        return -2;
    }
    socketAddr->sin_family = AF_INET;
    socketAddr->sin_addr.s_addr = direccion;
    socketAddr->sin_port = htons(puerto);
    if (connect (mySocket, (struct sockaddr *)socketAddr,sizeof (*socketAddr)) == -1) {
        snprintf(error, sizeof(error), "Error in %s:%d\n", __FILE__, __LINE__);
        perror(error);
        printf("%s\n",error);
        printf ("-- Error when stablishing a connection\n");
        return -1;
    }
    return mySocket;
}
Thanks!
First, you're not checking how many bytes the write(2) call actually wrote to the socket. The return value of the call tells you that. The same goes for read(2). A TCP socket is a bi-directional stream, so as a rule always do both in a loop until the expected number of bytes is transferred, EOF is read (zero return from read(2)), or an error occurs (which you are not checking for when reading either).
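The "write in a loop" rule could look roughly like this. It is a minimal sketch: the name write_all is made up for illustration, and it assumes a blocking descriptor.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write(2) until all len bytes have
 * been written. Returns 0 on success, -1 on error (errno is set). */
int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        p += n;                     /* skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```

For the request in the question this would be called as write_all(socket_google, buffer, strlen(buffer)), so that only the request text is sent and a short write is retried.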
Also, HTTP is a rather complex protocol. Make yourself familiar with RFC 2616, especially application-level connection management and transfer encodings.
Edit 0:
Hmm, there's no such thing as a "simple" proxy. You need to manage multiple connections (at least client-to-proxy and proxy-to-server), so it's probably best to look into the select(2)/poll(2)/epoll(7)/kqueue(2) family of system calls, which allow you to multiplex I/O. This is usually combined with non-blocking sockets. Look into helper libraries like libevent. Look at how this is done in good web servers/proxies like nginx. It sounds like there's a lot for you to discover, but don't worry, it's fun :)
Since you didn't post the GetIP routine, I can't be certain that your hostname lookup is correct, and from the looks of it, I am not sure that you are using the inet_addr function correctly.
Nikolai has made some very good points (and I fully agree). In fact, your GET request is actually broken: when I tested it against my own local Apache web server, it didn't work.
sprintf(buffer,"GET /index.html HTTP/1.1\r\n\r\n");
if (write(socket_google, (void*)buffer, MESSAGE_LENGTH+1) < 0 )
    return 1;
else printf("GET frame sent\n");
...
strcpy(buffer,"\n");
read(socket_google, buffer, sizeof(buffer));
should be replaced with
snprintf(buffer, sizeof(buffer),
         "GET / HTTP/1.1\r\nHost: %s\r\nUser-Agent: TEST 0.1\r\n\r\n",
         google);
/* send strlen(buffer) bytes: the terminating NUL is not part of the request */
if (write(socket_google, buffer, strlen(buffer)) < 0 ) {
    close(socket_google);
    return 1;
} else
    printf("GET frame sent\n");
...
buffer[0] = '\0';
/* Read message from socket, leaving room for the terminating NUL */
bytes_recv = read(socket_google, buffer, sizeof(buffer) - 1);
if (bytes_recv < 0) {
    fprintf(stderr, "socket read error: %s\n", strerror(errno));
    close(socket_google);
    exit(10);
}
buffer[bytes_recv] = '\0'; /* NUL character */
/* strcpy(message,buffer); */
printf("%s\n", buffer);
...
You should also close the socket before exiting the program. Enable standard C89/90 or C99 mode of your compiler (e.g. -std=c99 for gcc) and enable warnings (e.g. -Wall for gcc), and read them. And #include the necessary header files (assuming Linux in my case) for function prototypes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h> /* for gethostbyname() */
There is some casting of pointers and structs involved in resolving the hostname / IP address, which can be confusing and is an easy place to make a mistake, so verify that it works as you expect.
in_addr_t ip;
...
GetIP(google, &ip); /* I changed the parameters */
printf("IP address = %x (%s)\n",
ip,
inet_ntoa(*((struct in_addr*)&ip)));
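Since the GetIP routine was never posted, here is a sketch of what a lookup with the changed signature might look like. It is a stand-in of my own, not the asker's code: the name resolve_ipv4 is made up, and it uses getaddrinfo() rather than gethostbyname().

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Hypothetical stand-in for GetIP(): stores the first IPv4 address of host
 * in *ip, in network byte order. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, in_addr_t *ip)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to match in_addr_t */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0 || res == NULL)
        return -1;

    *ip = ((struct sockaddr_in *)res->ai_addr)->sin_addr.s_addr;
    freeaddrinfo(res);
    return 0;
}
```

The value stored in *ip can be assigned directly to sin_addr.s_addr without going through inet_addr, because it is already in network byte order.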
Actually, I've been implementing a small web proxy using my library called rzsocket.
One of the most difficult things I found when implementing the web proxy, and perhaps this is also your problem, was that I had to turn keep-alive off in order to make the proxy work properly. One way of doing this in Firefox is to open the about:config address and set the value of network.http.proxy.keep-alive to false.

Resources