while(byte_count != 0){
    byte_count = SSL_read(conn, get_buffer, sizeof(get_buffer));
    printf("%s", get_buffer);
    write_to_file(get_buffer, html, byte_count); // func to write to file
}
I've been trying to write an HTTP/HTTPS client using sockets and SSL in C. The task is to save the HTML of a given website's landing page to a file on my system. I've handled the HTTP redirections, and I was able to read only a portion of the HTTP payload because I called recv/SSL_read just once. When I put the call in a while loop it reads a few more 16 KB segments and then the connection times out. Is there any other way I can obtain the whole HTML file? (Sorry if this question seems vague, I'll be glad to make edits according to your responses.)
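A minimal sketch of what the read loop could look like, assuming the request was sent with a Connection: close header so the server ends the stream once the body is complete, conn is a connected SSL*, and html is an open FILE* (names taken from the snippet above):

char get_buffer[16384];
int byte_count;

// Loop on the return value of SSL_read itself; stop on 0 (clean shutdown) or a negative value (error).
while ((byte_count = SSL_read(conn, get_buffer, sizeof(get_buffer))) > 0) {
    fwrite(get_buffer, 1, (size_t)byte_count, html); // write exactly the bytes read; the buffer is not NUL-terminated
}

if (byte_count < 0) {
    fprintf(stderr, "SSL_read failed: %d\n", SSL_get_error(conn, byte_count));
}

Without Connection: close (or without honoring Content-Length), a keep-alive server leaves the socket open and the final read blocks until the connection times out, which matches the behaviour described above.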
I have an application project running in a Linux environment which includes libuv and another third-party library. The third-party library provides APIs for starting a TCP connection to a remote server (say xxx_connect()) and for getting the file descriptor of the active connection (say xxx_get_socket()). So far I have managed to get a valid file descriptor from xxx_get_socket() after xxx_connect() completes successfully, and I initialize a uv_poll_t handle with that file descriptor in my program.
Currently I am working on a reconnect function. After reconnecting to the same server (by running xxx_connect() again), xxx_get_socket() returns a different file descriptor, which means it is necessary to update the io_watcher.fd member of the uv_poll_t handle to receive data on the new active connection.
AFAIK uv_poll_init() internally invokes uv__io_check_fd(), uv__nonblock() and uv__io_init(), so it seems possible to modify io_watcher.fd of a uv_poll_t handle without closing the handle and then initializing it again (which would add extra latency); see the sample code below. However, I'm not sure whether that is safe: I don't know whether the io_watcher.fd member of a uv_poll_t handle is referenced elsewhere in libuv (e.g. in uv_run()), which makes things more complex. Is my approach feasible, or should I re-initialize the uv_poll_t handle in such a case? I appreciate any feedback.
Possible approach, simplified sample code:
int uv_poll_change_fd(uv_poll_t *handle, int new_fd) {
    int err;

    if (uv__fd_exists(handle->loop, new_fd)) {
        // ..... some code ....
    }
    err = uv__io_check_fd(handle->loop, new_fd);
    if (err) {
        // ..... some code ....
    }
    err = uv__nonblock(new_fd, 1);
    // ..... some code ....
    handle->io_watcher.fd = new_fd;
    return 0;
}
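For comparison, a sketch of the conservative alternative mentioned above, using only the public libuv API: stop and close the old poll handle, then initialize it again with the new fd once the close callback fires (replace_poll_fd and on_readable are placeholder names, not libuv functions, and error handling is omitted):

#include <uv.h>

static int pending_fd; // fd from xxx_get_socket() after the new xxx_connect()

static void on_readable(uv_poll_t *handle, int status, int events) {
    // read from the third-party connection here
}

static void on_poll_closed(uv_handle_t *handle) {
    uv_poll_t *poll = (uv_poll_t *)handle;
    uv_poll_init(handle->loop, poll, pending_fd); // re-init the same handle with the new fd
    uv_poll_start(poll, UV_READABLE, on_readable);
}

void replace_poll_fd(uv_poll_t *poll, int new_fd) {
    pending_fd = new_fd;
    uv_poll_stop(poll);
    uv_close((uv_handle_t *)poll, on_poll_closed); // uv_close is asynchronous, so the re-init happens in the callback
}

Whether the in-place io_watcher.fd update is safe is a separate question; this version simply avoids touching libuv internals, at the cost of one extra loop iteration of latency.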
I'm trying to create a server with C sockets that will allow file uploads over HTTP. The problem I'm having is that I get a good chunk of the HTTP content, but then the client just stops sending and the server hangs, perhaps waiting for a response or something? But since recv never returns 0, it never gets to my response. (Not sure that's even the reason why.)
I looked around on Google, but most answers are about receiving data and looping to receive more, which I'm already doing.
Here's the code:
fp = fopen("fileName", "a");
for(;;)
{
    ret = recv(fd, buffer, BUFSIZE, 0);
    if(ret == 0){
        // Never gets here unless I cancel the web request manually (pressing x where refresh usually is)
        printf("Finished receiving");
        char* sendHeader = "HTTP/1.1 200 OK\nContent-Type: text/plain\nContent-Length: 8\n\nRecieved";
        write(fd, sendHeader, strlen(sendHeader));
        break;
    }
    if(ret == -1) {
        printf("Error receiving");
        break;
    }
    fprintf(fp, "%s", buffer);
}
fclose(fp);
fclose(fp);
Right now I'm just taking the file contents and appending them to a file.
The actual result I'm getting is:
(using ... to abbreviate)
--WebKitFormBoundaryRMGRl...
Content-Disposition: form-data; name="filetoUpload"; filename...;
Content-Type: application/octet-stream
\n
Actual file contents
\n
--WebKitFormBoundaryRMGRl...
Content-Disposition: form-data; name="submit"
Upload License
--WebKitFormBoundaryRMGRl...
Then it begins writing the file contents again, writes about 10 lines, and then hangs until I manually cancel the request.
When I print the byte counts, I fill the buffer twice; the third time it doesn't completely fill and just hangs waiting.
Any ideas?
But since recv never returns 0 it never gets to my response ...
recv will return 0 only if the client shuts down the connection. But the client will not shut down the connection, since it wants to receive the response (OK, it could shut down for writing only), and it may want to send more requests (HTTP persistent connections).
Instead you have to parse the HTTP request to figure out how much data the client will send in the body. The usual way to do this is with the Content-Length header, which gives the size of the body. If the size is not known up front, the client might instead use chunked transfer encoding, where each chunk is prefixed by its length (in hex).
Or in other words: if you are trying to implement HTTP, please make yourself familiar with the standard by studying it, not by making assumptions. That's what standards are for.
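As a rough illustration of the Content-Length approach described above (a sketch, not a robust parser; fd is the connected socket and fp the output file from the question, and strcasestr is a GNU extension):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

static void read_request(int fd, FILE *fp) {
    char buf[8192];
    size_t used = 0;
    char *body = NULL;

    // 1. Read until "\r\n\r\n" marks the end of the request headers.
    while (body == NULL && used < sizeof(buf) - 1) {
        ssize_t n = recv(fd, buf + used, sizeof(buf) - 1 - used, 0);
        if (n <= 0) return;                      // error or client closed early
        used += (size_t)n;
        buf[used] = '\0';
        body = strstr(buf, "\r\n\r\n");
    }
    if (body == NULL) return;                    // headers too large / malformed
    body += 4;                                   // first byte of the body

    // 2. Parse Content-Length (a complete server must also handle "Transfer-Encoding: chunked").
    long content_length = 0;
    char *cl = strcasestr(buf, "Content-Length:");
    if (cl) content_length = strtol(cl + 15, NULL, 10);

    // 3. Some body bytes may already be in buf; write them, then recv the rest.
    long have = (long)(used - (size_t)(body - buf));
    fwrite(body, 1, (size_t)have, fp);
    for (long remaining = content_length - have; remaining > 0; ) {
        char chunk[4096];
        ssize_t n = recv(fd, chunk, sizeof(chunk), 0);
        if (n <= 0) break;
        fwrite(chunk, 1, (size_t)n, fp);         // raw bytes, not "%s"
        remaining -= n;
    }
}

A complete server would also strip the multipart boundaries before writing the payload to disk, and only then send the 200 response.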
I will give a detailed explanation of the program and then lead into the issue regarding the use of netlink socket communication.
The last paragraph asks the actual question I need an answer for, so you might want to start by reading that first.
Disclaimer before I start:
- I searched before asking here and did not find a complete solution / alternative to my issue.
- I know how to initialize a module and insert it into the kernel.
- I know how to handle communication between a module and user space without using netlink sockets, i.e. using struct file_operations function-pointer assignments that the module invokes whenever a user attempts to read/write etc., answering the user with copy_to_user / copy_from_user.
- This topic refers to Linux, Mint 17 distribution.
- The language is C.
Okay, so I am building a system with 3 components:
1. user.c : user application (user types commands here)
2. storage.c : storage device ('virtual' disk-on-key)
3. device.ko : kernel module (used as proxy between 1. and 2.)
The purpose of this system is to allow the user to:
- Copy files to the virtual disk-on-key device (2) - like an "upload" from a local directory that belongs to the user.
- Save files from the virtual device to a local directory - like a "download" from the device storage to the user's directory.
Design:
Assuming programs (1) and (2) are compiled and running, and (3) has been successfully inserted with the bash command 'sudo insmod device.ko', the flow should work like this (a simulation, of course):
Step 1 (in user.c) -> the user types 'download file.txt'
Step 2 (in device.ko) -> the device recognizes that the user has tried to 'write' to it (actually the user is just passing the string "download file.txt") and invokes the 'write' implementation of the method we set in struct file_operations earlier in module_init().
The device (kernel module) now passes the data (a string with a command) to the storage.c application, expecting an answer to later be returned to the user.c application.
Step 3 (in storage.c) -> now, let's say this program performs a busy-wait loop of 'readmsg()' and that is how a request-from-module event is triggered and recognized; the storage device now recognizes that the module has sent a request (a string with a command / data). The storage program then performs some function 'X' that sends the requested data using sendmsg() somewhere inside it.
Now, here comes the issue.
Usually, in all of the examples I've seen on the web, the communication between the kernel module and user space (the storage.c program in our case) over netlink is initiated by user space and not vice versa. That is, a sendmsg() call from user space invokes the 'request(struct sk_buff *skb)' method, which is set up in module_init() as follows:
struct netlink_kernel_cfg cfg = {
    .input = request // when storage.c sends something, it invokes the request function
};
so when storage.c performs something like:
sendmsg(sock_fd, &msg, 0); // send a msg to the module
the module then invokes and runs:
static void request(struct sk_buff *skb) {
    char *msg = "Hello from kernel";
    msg_size = strlen(msg);
    netlink_holder = (struct nlmsghdr *)skb->data;
    printk(KERN_INFO "Netlink received msg payload:%s\n", (char *)nlmsg_data(netlink_holder));
    pid = netlink_holder->nlmsg_pid; // pid of sending process
    skb_out = nlmsg_new(msg_size, 0);
    if(!skb_out){
        printk(KERN_ERR "Failed to allocate new skb\n");
        return;
    }
    netlink_holder = nlmsg_put(skb_out, 0, 0, NLMSG_DONE, msg_size, 0); // add a new netlink message to an skb. more info: http://elixir.free-electrons.com/linux/v3.2/source/include/net/netlink.h#L491
    NETLINK_CB(skb_out).dst_group = 0; // not in a multicast group
    strncpy(nlmsg_data(netlink_holder), msg, msg_size); // copy the data as char* (variable msg)
    result = nlmsg_unicast(sock_netlink, skb_out, pid); // send data to storage. more info: http://elixir.free-electrons.com/linux/latest/source/include/net/netlink.h#L598
    if(result < 0)
        printk(KERN_INFO "Error while sending back to user\n");
}
Out of all that big chunk, the only thing I'm actually interested in is doing this:
result = nlmsg_unicast(sock_netlink, skb_out, pid); // send data to storage.
BUT I can't use nlmsg_unicast() without having the struct sk_buff* that is provided to me automatically whenever storage.c invokes the module!
To sum up everything:
How do I send a msg from device.ko (the kernel module) to user space without having to wait for request() to be invoked, i.e. without relying on the struct sk_buff parameter of the 'request()' method shown earlier?
Hope this sums up the point.
Thanks.
The only real issue here is that you need the user-space program to connect to the kernel side first, so that you can get the pid of your user program.
After getting the pid, you can construct skb_out yourself and send it out through netlink_unicast or nlmsg_unicast.
The pid is always needed; you can store it (e.g. in a static variable) and have your user-space program connect to your device.ko once to establish a long-lived link.
Although this question was asked in 2017, I believe the OP has already found the answer :D
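A sketch of what such a module-side helper could look like, reusing the pid and sock_netlink globals that request() above already fills in; no incoming sk_buff is needed because skb_out is allocated on the spot (push_to_user is a made-up name):

static int push_to_user(const char *msg)
{
    size_t len = strlen(msg);
    struct sk_buff *skb_out = nlmsg_new(len, GFP_KERNEL); // allocate the outgoing buffer ourselves
    struct nlmsghdr *nlh;

    if (!skb_out)
        return -ENOMEM;

    nlh = nlmsg_put(skb_out, 0, 0, NLMSG_DONE, len, 0);
    NETLINK_CB(skb_out).dst_group = 0;                    // unicast, no multicast group
    memcpy(nlmsg_data(nlh), msg, len);

    return nlmsg_unicast(sock_netlink, skb_out, pid);     // pid saved earlier in request()
}

This can be called from anywhere in the module (a timer, a workqueue, the file_operations write handler), as long as the user-space program has already sent at least one message so pid is valid.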
Background:
I'm working on my first C program with libcurl, and I need to gather the responses from each command sent to an SMTP server.
I've gotten as far as sending commands to the SMTP server and printing the response headers using curl_easy_setopt(curl_handle, CURLOPT_HEADERFUNCTION, parse_head), but I'm using the multithreaded options, and it is not at all clear, when I get a response, which command caused it. I am assuming that responses will not necessarily be received in the same order the commands were sent. Is that correct?
Making it more difficult, since the library handles some calls (like setting up the initial connection) without my explicit request, I will need to handle more headers than explicit requests. That would be predictable and repeatable, but it definitely adds an extra level of complexity.
Question:
Is there a "good" way to determine exactly which command resulted in which response header when using multiple threads?
Also, moderately related: does the library support returning the numeric response code, or do I have to parse that out manually? Looking through the library, it seems that it doesn't. I just want to be sure.
I am assuming that they will not necessarily be received in the same order sent. Is that correct?
Yes, it is. That's how multithreading works.
Is there a "good" way to determine exactly which command resulted in which response header using multi thread?
Yes. You can set user data (context info, whatever you want to call it) using the CURLOPT_HEADERDATA option; it will be passed in as the 4th argument of your header function. So you can write code like this:
CURL *hndl = curl_easy_init();
// ...
curl_easy_setopt(hndl, CURLOPT_HEADERFUNCTION, parse_head);
curl_easy_setopt(hndl, CURLOPT_HEADERDATA, some_pointer_that_identifies_the_thread);
// ...

size_t parse_head(void *buf, size_t sz, size_t nmemb, void *context)
{
    // context will be the pointer identifying the thread
    return sz * nmemb; // tell libcurl the header data was consumed
}
does the library support returning the numeric return code or do I have to manually parse that out?
Yes, it does:
long httpStatus;
curl_easy_getinfo(hndl, CURLINFO_RESPONSE_CODE, &httpStatus);

if (200 <= httpStatus && httpStatus < 300) {
    // HTTP 2XX OK
} else {
    // Error (4XX, 5XX) or redirect (3XX)
}
I have a website where each webpage is compiled into a binary (I have 100 webpages, therefore I have 100 binaries). Apache's .htaccess contains the line "SetHandler cgi-script", which instructs Apache to use CGI when a binary (webpage) is requested.
How can I modify this website to use FastCGI instead of CGI?
Do I just have to include this header, use this while loop (from FastCGI.com) in each of the 100 binaries, and change .htaccess to "SetHandler fastcgi-script"?
#include "fcgi_stdio.h" // instead of stdio.h
while(FCGI_Accept() >= 0)
So how exactly will FastCGI work? Will Apache dispatch webpages using one persistent process for the entire website, or will there be one persistent process for each of the 100 binaries?
A FastCGI script is a network server that listens for connections in a loop. The web server forwards requests to the FCGI server, which sends back dynamically generated content - all over a socket connection. Thus an FCGI script is faster than CGI because it is not re-spawned for each request.
I don't understand why you need 100 binaries for 100 pages. A single script is enough to generate the content for all 100 pages, based on some request parameter. The FCGI server should also scale pretty well for multiple connections, as it is usually written to poll on the socket file descriptor. (Look at the code of the implementation to make sure of this.)
To generate 100 pages you don't necessarily need 100 if statements. Consider this pseudo-code:
hash_table page_generators; // map page types to function objects (or function pointers)
page_generators["login_page"] = handle_login_page_fn;
page_generators["contact_page"] = handle_contact_page_fn;
// ... and so on

// request handler
page_type = request.get("page_type");
fn = page_generators[page_type];
if (fn == NULL)
    return "<html><body>Invalid request</body></html>";
else
    return fn(request);
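A hypothetical single FastCGI binary combining the FCGI_Accept() loop from the question with the dispatch idea above; the "page" query parameter and the handler names are illustrative choices, not anything the FastCGI library defines:

#include "fcgi_stdio.h" // instead of stdio.h; redefines printf & co. for FastCGI
#include <stdlib.h>
#include <string.h>

static void login_page(void)   { printf("<html><body>Login</body></html>"); }
static void contact_page(void) { printf("<html><body>Contact</body></html>"); }

int main(void)
{
    // One persistent process serving many requests: the loop body runs once per request.
    while (FCGI_Accept() >= 0) {
        const char *query = getenv("QUERY_STRING");
        printf("Content-Type: text/html\r\n\r\n");

        if (query && strstr(query, "page=login"))
            login_page();
        else if (query && strstr(query, "page=contact"))
            contact_page();
        else
            printf("<html><body>Invalid request</body></html>");
    }
    return 0;
}

Since Apache's FastCGI modules typically keep a separate persistent process (or pool) per FastCGI executable, collapsing the 100 binaries into one dispatcher like this also keeps the number of resident processes down.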