Large file uploading to a libevent-based HTTP server (C)

I'm trying to write an HTTP-to-ZeroMQ proxy with libevent (2.0.4) which should be able to handle very large (up to 4 GB) file uploads.
The problem is that I don't know how large POST requests (larger than available memory) are handled by libevent, so if you have hints on how to implement large file uploads, please lead me down the right path.

Have you read the libevent source code? It's very readable.
If you're using its HTTP code, I think it uses the 'bufferevent' feature. You can simply set callbacks that fire when the input buffer reaches the high-water mark.
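For illustration, here is a minimal sketch of that bufferevent approach, draining the upload in chunks instead of letting it pile up in memory. forward_to_zmq() is a hypothetical stand-in for whatever you do with the data, and the watermark values are arbitrary:

```c
#include <event2/bufferevent.h>
#include <event2/buffer.h>

/* Hypothetical sink for the upload data (e.g. a ZeroMQ socket). */
void forward_to_zmq(const void *data, size_t len);

static void read_cb(struct bufferevent *bev, void *ctx)
{
    struct evbuffer *input = bufferevent_get_input(bev);
    char chunk[16384];
    int n;

    /* Drain the input buffer in fixed-size chunks so the upload
     * never has to fit in memory all at once. */
    while ((n = evbuffer_remove(input, chunk, sizeof(chunk))) > 0)
        forward_to_zmq(chunk, (size_t)n);
}

static void setup_bufferevent(struct bufferevent *bev)
{
    /* Fire read_cb once at least 4 KB is buffered, and stop reading
     * from the socket when more than 64 KB is pending (backpressure). */
    bufferevent_setwatermark(bev, EV_READ, 4096, 65536);
    bufferevent_setcb(bev, read_cb, NULL, NULL, NULL);
    bufferevent_enable(bev, EV_READ);
}
```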

Maybe you will find some info at http://mongrel2.org/home, since it is an HTTP server and proxy that uses ZeroMQ for its backend handlers.

Related

Sending requests using Sockets through a SOCKS5 Proxy Server in C

I have a simple networking program for sending and responding to HTTP requests/responses. However, if I wanted to send an HTTP request (or another kind of request) via a SOCKS5 proxy, how would I go about this? I'm using Unix sockets in C.
I could solve this by creating a proxy server on Linux. However, my intention is to send the request to a proxy server I do not own so that it can be forwarded to the destination server. I couldn't find a library. I did find and read the RFC (https://www.rfc-editor.org/rfc/rfc1928.txt), but I'm still not 100% sure how to format my request.
Am I supposed to send the handshake segments as hex? If so, would I send a string like 0F3504, or \x0F\x35\x04, or 0x0F3504? Another question: do I need to denote header = value in the message, or does the SOCKS5 server know which field I am referring to by the position of the byte it is looking at in the message I've sent?
Any clarification would be very much appreciated.
Some time ago I wrote an open source C library that may help you: https://github.com/brechtsanders/proxysocket
Maybe you can use it. It's quite easy: just replace your connect() call with the library's equivalent, and keep the rest of your code that uses the returned socket.
Or you can take a peek in the code to see how it's done there.
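On the byte-format part of the question: per RFC 1928 the handshake is raw binary octets written straight to the socket, not an ASCII hex string, and the server identifies each field purely by its position. A minimal sketch of the initial greeting, assuming sockfd is already connected to the proxy (a real client should also loop on short reads/writes):

```c
#include <stdint.h>
#include <unistd.h>

/* SOCKS5 greeting: VER = 0x05, NMETHODS = 1, METHODS = { 0x00 } ("no auth").
 * The bytes go on the wire as-is; there are no header names. */
int socks5_greet(int sockfd)
{
    uint8_t greeting[3] = { 0x05, 0x01, 0x00 };
    uint8_t reply[2];

    if (write(sockfd, greeting, sizeof(greeting)) != sizeof(greeting))
        return -1;
    /* The server answers with VER and the chosen METHOD
     * (0x00 = no auth, 0xFF = no acceptable method). */
    if (read(sockfd, reply, sizeof(reply)) != sizeof(reply))
        return -1;
    return (reply[0] == 0x05 && reply[1] == 0x00) ? 0 : -1;
}
```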

HTTPS protocol file integrity

I understand that when you send a file from a client to a server using HTTP/HTTPS, you have the guarantee that all data sent successfully arrived at the destination. However, if you are sending a huge file and the internet connection suddenly goes down, not all packets are sent and, therefore, you lose the logical integrity of the file.
Is there any point I am missing in my statement?
I would like to know if there is a way for the destination node to check file logical integrity without using a "custom code/api".
HTTPS is just HTTP over a TLS layer, so all of this applies to HTTPS, too:
HTTP is typically transported over TCP/IP. Now, TCP retransmits lost packets and carries checksums (i.e. the probability that data gets altered without the receiver noticing and re-requesting a packet is minor). So if you're really just transferring data, you're basically set, as long as your HTTP server is configured to send the length of your file in bytes, which, at least for static files, it usually is.
If the transfer stops before the full size advertised in the server's reply to the HTTP GET has been received, your client will know! Many HTTP libraries/clients can resume interrupted transfers (if the server supports range requests).
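For instance, with libcurl a client can resume a partial download by asking the server to continue at a byte offset. A minimal sketch, assuming the server honours range requests (error handling omitted):

```c
#include <stdio.h>
#include <curl/curl.h>

/* Resume a download at `already_have` bytes; libcurl sends a
 * "Range: bytes=<offset>-" request for us and the default write
 * callback appends the received data to `out`. */
void resume_download(const char *url, curl_off_t already_have, FILE *out)
{
    CURL *curl = curl_easy_init();

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
    curl_easy_setopt(curl, CURLOPT_RESUME_FROM_LARGE, already_have);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
}
```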
The HTTP spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15) even specifies an MD5 checksum header field (Content-MD5). You can configure web servers to use that field, and clients might use it to verify the overall file integrity.
EDIT: Content-MD5 as specified by RFC 2616 seems to be deprecated. You can now use a content digest header instead, which is much more flexible.
Also, you mention that you want to check the file that a client sends to a server. That problem might be quite a bit harder -- whilst you're usually in total control of your web server, you can't force an arbitrary client (e.g. a browser) to hash its file before uploading.
If, on the other hand, you are in fact in control of the client's HTTP implementation, you could most probably also use something more file-transfer oriented than plain HTTP -- think WebDAV, AtomPub etc., which are protocols on top of HTTP, or even more file-exchange oriented protocols like rsync (which I'd heartily recommend if you're actually syncing stuff -- it reduces network usage to a minimum if both sides' versions only differ partially). If for some reason your users share most of their data within a well-defined circle (for example, you're building something where photographers share their albums), you might even just use BitTorrent, which has per-chunk hashing, extensive load-balancing options, and allows for "plain old HTTP seeds".
There are several issues here:
As Marcus stated in his answer, TCP protects your bytes from being accidentally corrupted, but it doesn't help if the download was interrupted
HTTPS additionally ensures that those bytes weren't tampered with between the server and the client (you)
If you want to verify the integrity of a file (whether or not its transfer was interrupted), you should use a checksum designed to protect against accidental corruption, e.g. CRC32 (there may be better ones; you should check); see the sketch after this list
If in addition you use HTTPS, then you're safe from intentional attacks too, because you know your checksum is OK and that the file parts you got weren't tampered with
If you use a checksum but don't use HTTPS (though you really should), then you're safe against accidental data corruption but not against malicious attacks. That could be mitigated, but it's outside the scope of this question
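To illustrate the checksum idea, here is a small sketch that computes zlib's CRC32 over a received file, so it can be compared against a value published by the sender (how that value is communicated is up to you):

```c
#include <stdio.h>
#include <zlib.h>

/* Compute zlib's CRC32 over a whole file, streaming it in chunks. */
unsigned long file_crc32(const char *path)
{
    FILE *f = fopen(path, "rb");
    unsigned char buf[65536];
    unsigned long crc = crc32(0L, Z_NULL, 0);   /* initial CRC value */
    size_t n;

    if (!f)
        return 0;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        crc = crc32(crc, buf, (uInt)n);
    fclose(f);
    return crc;
}
```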
In HTTP/1.1, the recipient can always detect whether it received a complete message (either by comparing the Content-Length, or by properly handling transfer-encoding: chunked).
(Adding content hashes can help if you suspect bit errors on the transport layer.)

libcurl (linux, C) - Is there a built-in way to POST/PUT a gzipped request?

I want to perform an HTTP POST and/or PUT (using libcurl) with the request body compressed using gzip. I haven't been able to find any native support for this in libcurl, and am wondering whether I just haven't found the right documentation or whether there really is no support for this. (i.e. Will I have to implement my own wrapper to gzip the request body?)
HTTP has no "automatic" or negotiated compression for requests; you need to compress the body yourself before sending the data.
Also, I disagree with Aditya, who provided another answer (given my bias that's not that strange), and would say that libcurl is one of the best possible options you have for doing HTTP requests with C (or C++, or other languages) ...
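As an illustration, you can gzip the body yourself with zlib and label it with a Content-Encoding: gzip request header; whether the server accepts gzipped request bodies is entirely up to the server. A rough sketch (error handling trimmed):

```c
#include <curl/curl.h>
#include <zlib.h>
#include <stdlib.h>
#include <string.h>

/* Gzip `src` into a freshly allocated buffer; windowBits 15 + 16 selects
 * the gzip container rather than raw deflate. Returns NULL on failure. */
static unsigned char *gzip_buffer(const unsigned char *src, size_t srclen,
                                  size_t *outlen)
{
    z_stream zs;
    unsigned char *dst;
    uLong cap;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED,
                     15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return NULL;

    cap = deflateBound(&zs, srclen);
    dst = malloc(cap);
    zs.next_in = (Bytef *)src;
    zs.avail_in = (uInt)srclen;
    zs.next_out = dst;
    zs.avail_out = (uInt)cap;

    if (deflate(&zs, Z_FINISH) != Z_STREAM_END) {
        deflateEnd(&zs);
        free(dst);
        return NULL;
    }
    *outlen = zs.total_out;
    deflateEnd(&zs);
    return dst;
}

static void post_gzipped(const char *url, const unsigned char *body, size_t bodylen)
{
    size_t gzlen = 0;
    unsigned char *gz = gzip_buffer(body, bodylen, &gzlen);
    CURL *curl = curl_easy_init();
    /* Tell the server what we did to the body. */
    struct curl_slist *hdrs = curl_slist_append(NULL, "Content-Encoding: gzip");

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, gz);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long)gzlen);
    curl_easy_perform(curl);

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    free(gz);
}
```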
I would recommend not using libcurl if you are interested in automatic proxy detection and the system certificate store (think HTTPS/SSL support), along with things like gzipping requests.
You could use zlib to get out of this particular situation, but what about other scenarios? It is better to use platform APIs for network requests even if the overall program is platform-independent C++.

How does a mainstream web server implement this feature?

This means, for example, a module can start compressing the response from a backend server and stream it to the client before the module has received the entire response from the backend.
Nice!
I know it's some kind of asynchronous IO, but that alone doesn't explain it.
Does anyone know how it works?
Without looking at the source code of an actual implementation, I'm speculating here:
It's most likely some kind of stream (abstract buffered IO) that is passed from one module to the other ("chaining"). One module (maybe a servlet container) writes to a stream that is read by another module (the compression module in your example), which then writes its output to another stream. The contents of that stream may then be processed further or transmitted to the client.
The backend may need to wait on IO before it can fully produce the page. Modules can begin compressing the start of the message before the backend is entirely done writing it.
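As a tiny sketch of that chaining idea, here is a compression filter that is fed the response in chunks and pushes gzip output downstream as it goes, so sending to the client can start before the upstream side has finished. send_downstream() is a hypothetical stand-in for the next link in the chain:

```c
#include <string.h>
#include <zlib.h>

/* Hypothetical next stage in the filter chain. */
void send_downstream(const unsigned char *data, size_t len);

void compress_filter_init(z_stream *zs)
{
    memset(zs, 0, sizeof(*zs));
    /* windowBits 15 + 16 = gzip wrapper rather than raw deflate. */
    deflateInit2(zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8,
                 Z_DEFAULT_STRATEGY);
}

/* Feed one chunk of the (possibly still incomplete) response through
 * the compressor; `last` is non-zero for the final chunk. */
void compress_filter(z_stream *zs, const unsigned char *chunk, size_t len, int last)
{
    unsigned char out[16384];

    zs->next_in = (Bytef *)chunk;
    zs->avail_in = (uInt)len;
    do {
        zs->next_out = out;
        zs->avail_out = sizeof(out);
        deflate(zs, last ? Z_FINISH : Z_NO_FLUSH);
        /* Whatever the compressor produced so far goes straight on
         * downstream; we never wait for the whole response. */
        send_downstream(out, sizeof(out) - zs->avail_out);
    } while (zs->avail_out == 0);
}
```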
To understand why this is useful, you need to understand how nginx is structured. nginx is a server that relies on non-blocking input and output. Normally, a server will use blocking input and output: it will listen for a connection, and when one arrives, it will process the page. In order to increase throughput, multiple threads are spawned, called 'workers'.
Contrast this with nginx: it continually asks the kernel, "Are any of my IO requests ready?" This allows it to handle the same number of pages with 1) less overhead from all the different processes and 2) lower memory usage. It has some downsides, however. For extremely low-volume applications, nginx may use more CPU than a blocking server. Second, it's much less portable, since Windows uses an entirely different model for non-blocking IO.
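To give a feel for that "ask the kernel what's ready" style on Linux, here is a bare-bones epoll loop; handle_ready() is a hypothetical per-connection handler, and accepting new connections is glossed over:

```c
#include <stdint.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

/* Hypothetical handler that reads/writes whatever is ready on fd
 * without ever blocking. */
void handle_ready(int fd, uint32_t events);

void event_loop(int listen_fd)
{
    struct epoll_event ev, events[MAX_EVENTS];
    int epfd = epoll_create1(0);

    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        /* One thread asks the kernel which of its sockets are ready,
         * then services only those; nothing blocks on a single client. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++)
            handle_ready(events[i].data.fd, events[i].events);
    }
}
```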
Getting back to your original question: compressing the beginning of a page is useful because that compressed start can already be on its way to the client while the backend is still accessing a database, reading from disk, or what have you.

Real-time embeddable HTTP server library required

Having looked at several available HTTP server libraries, I have not yet found what I am looking for, and I'm sure I can't be the first to have this set of requirements.
I need a library that presents a 'pipelined' API. Pipelining describes the HTTP feature where multiple HTTP requests can be sent across a TCP link without waiting for the responses. I want a similar feature in the library API, where my application can receive all of those requests without having to send a response first (I will respond, but I want the ability to process multiple requests at a time to reduce the impact of internal latency).
So the web server library will need to support the following flow
1) HTTP Client transmits http request 1
2) HTTP Client transmits http request 2 ...
3) Web Server Library receives request 1 and passes it to My Web Server App
4) My Web Server App receives request 1 and dispatches it to My System
5) Web Server receives request 2 and passes it to My Web Server App
6) My Web Server App receives request 2 and dispatches it to My System
7) My Web Server App receives response to request 1 from My System and passes it to Web Server
8) Web Server transmits HTTP response 1 to HTTP Client
9) My Web Server App receives response to request 2 from My System and passes it to Web Server
10) Web Server transmits HTTP response 2 to HTTP Client
Hopefully this illustrates my requirement. There are two key points to recognise: responses to the Web Server Library are asynchronous, and there may be several HTTP requests passed to My Web Server App with responses outstanding.
Additional requirements are
Embeddable into an existing 'C' application
Small footprint; I don't need all the functionality available in Apache etc.
Efficient; will need to support thousands of requests a second
Allows asynchronous responses to requests; there is a small latency on responses, and given the required request throughput a synchronous architecture is not going to work for me.
Support persistent TCP connections
Support use with Server-Push Comet connections
Open Source / GPL
Support for HTTPS
Portable across Linux and Windows; preferably more.
I will be very grateful for any recommendation
Best Regards
You could try libmicrohttpd.
Use the Onion, Luke. It is a lightweight and easy-to-use HTTP server library in C.
For future reference, take a look at libasyncd, which meets your requirements.
I'm one of the contributors.
Embeddable into an existing 'C' application
It's written in C.
Small footprint; I don't need all the functionality available in Apache etc.
Very compact.
Efficient; will need to support thousands of requests a second
It's a libevent-based framework. It can handle more than that.
Allows asynchronous responses to requests;
It's asynchronous. It also supports pipelining.
Support persistent TCP connections
Sure, keep-alive.
Support use with Server-Push Comet connections
It's up to how you code your logic.
Open Source / GPL
Under the BSD license.
support for HTTPS
Yes, it supports HTTPS with OpenSSL.
Portable across linux, windows; preferably more.
Portable, though not to Windows at this moment; it should be portable to Windows.
What you want is something that supports HTTP pipelining. You should make yourself familiar with how pipelining works if you are not already.
Yes, go for libmicrohttpd. It has support for SSL etc. and works on both Unix and Windows.
However, Christopher is spot on in his comment. If you have a startup time for each response, you are not going to gain much by pipelining. However, if only the first request has a significant response time, you may win something.
On the other hand, if each response has a startup time, you may gain a lot by not using pipelining, but instead issuing each request on its own connection. Then each request can have its own thread, absorbing the startup costs in parallel. All responses will then be sent "at once" in the optimal case. libmicrohttpd supports this mode of operation with its MHD_USE_THREAD_PER_CONNECTION threading model.
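For reference, a minimal libmicrohttpd daemon in that thread-per-connection mode looks roughly like this (recent libmicrohttpd versions use enum MHD_Result as the handler return type; older ones use int). The handler just returns a fixed page:

```c
#include <microhttpd.h>
#include <stdio.h>
#include <string.h>

static enum MHD_Result handler(void *cls, struct MHD_Connection *conn,
                               const char *url, const char *method,
                               const char *version, const char *upload_data,
                               size_t *upload_data_size, void **con_cls)
{
    const char *page = "hello";
    struct MHD_Response *resp =
        MHD_create_response_from_buffer(strlen(page), (void *)page,
                                        MHD_RESPMEM_PERSISTENT);
    enum MHD_Result ret = MHD_queue_response(conn, MHD_HTTP_OK, resp);
    MHD_destroy_response(resp);
    return ret;
}

int main(void)
{
    /* One thread per connection: each request can block on its own
     * backend work without stalling the other connections. */
    struct MHD_Daemon *d =
        MHD_start_daemon(MHD_USE_THREAD_PER_CONNECTION, 8080,
                         NULL, NULL, &handler, NULL, MHD_OPTION_END);
    if (!d)
        return 1;
    getchar();              /* run until a key is pressed */
    MHD_stop_daemon(d);
    return 0;
}
```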
Following up on previous comments and updates...
You don't say how many concurrent connections you'll have, just "a TCP link".
If it's a single connection, then you'll be using HTTP pipelining as previously mentioned, so you would only need a handful of threads, rather than thousands, to process the requests at the head of the pipeline.
So you wouldn't need to have a thread for every request; just a small pool of workers for each connection.
Have you done any testing or implementation so far to show whether you actually do have problems with response latency for pipelined connections?
If your embedded device is powerful enough to cope with thousands of requests per second, including doing TLS setup, encryption and decryption, I would worry about premature optimisation at this level.
Howard,
Have you taken a look at lighttpd? It meets all of your requirements except that it isn't explicitly an embeddable web server. But it is open source, and compiling it into your application shouldn't be too hard. You can then write a custom plugin to handle your requests.
Can't believe no one has mentioned nginx. I've read large portions of the source code and it is extremely modular. You could probably get the parts you need working pretty quickly.
uIP or lwIP could work for you. I personally use uIP. It's good for a small number of clients and concurrent connections (or, as you call it, "pipelining"). However, it's not as scalable or as fast at serving up content as lwIP, from what I've read. I went with the simplicity and small size of uIP instead of the power of lwIP, as my app usually has only one user.
I've found uIP pretty limited as the concurrent connections increase. However, I'm sure that's a limitation of the MAC receive buffers I have available and not of uIP itself. I think lwIP uses significantly more memory in some way to get around this. I just don't have enough Ethernet RAM to support a ton of request packets coming in. That said, I can do background AJAX polling with about 15 ms latency on a 56 MHz processor.
http://www.sics.se/~adam/software.html
I've actually modified uIP in several ways (adding a DHCP server and supporting multipart POST for file uploads are the big ones). Let me know if you have any questions.
