Writing HTML from HTTP response into a file in C

char get_buffer[10000];
SSL_write(conn, https_request_get, strlen(https_request_get));
printf("GET Sent...\n");
byte_count = SSL_read(conn, get_buffer, sizeof(get_buffer));
printf("recv()'d %d bytes of data in get_buff\n", byte_count);
printf("%s", get_buffer);
fprintf(html, "%s", get_buffer);
fwrite(get_buffer, sizeof(get_buffer), 1, html); // html is the file pointer
I've written a C program with sockets that "downloads" the landing-page HTML of a given website from the HTTP response. I was able to store the entire HTTP response in a char array, and I was also able to print it in the console using printf("%s", buffer_name).
Now I am trying to write the same array into a file with fprintf(), but it only writes the HTTP response headers, excluding the HTML of the page. I understand that there can be null characters in the buffer, so I also tried
fwrite(buffer, sizeof(buffer), 1, file_ptr)
which gave me the same output as before (it didn't put the HTML into the file).
Can anyone help me out with this?
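A minimal sketch of the likely fix, assuming conn is the established SSL* and html the open FILE* from the snippet above: "%s" stops at the first NUL byte, and fwrite(get_buffer, sizeof(get_buffer), 1, html) writes all 10000 bytes of the buffer regardless of how much was actually received. Writing exactly byte_count bytes, in a loop because a single SSL_read() rarely returns the whole page, avoids both problems:

/* Sketch: write exactly what was received, looping until the
   server closes the connection. NUL bytes are no longer an issue
   because fwrite() does not treat the buffer as a C string. */
int byte_count;
while ((byte_count = SSL_read(conn, get_buffer, sizeof(get_buffer))) > 0) {
    fwrite(get_buffer, 1, byte_count, html);
}
fclose(html);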

Related

Uploading a file to Camel Rest route

I'm trying to upload a file using multipart/form-data to a Camel route.
All is good; however, I can't get the original file name.
Camel version is: 3.14.1
Update
With the following modification to the route, I managed to process binary files (getting the file name and storing them). However, with text files, the file is appended with the boundary footer:
------WebKitFormBoundary7BH9nQ2RqDXvTRAJ--
The route definition:
rest("/v1/file-upload-form")
.post()
.consumes(MediaType.MULTIPART_FORM_DATA_VALUE)
.route()
.process((exchange) -> {
InputStream is = exchange.getIn().getBody(InputStream.class);
MimeBodyPart mimeMessage = new MimeBodyPart(is);
DataHandler dh = mimeMessage.getDataHandler();
exchange.getIn().setBody(dh.getInputStream());
exchange.getIn().setHeader(Exchange.FILE_NAME, dh.getName());
})
.to("file://" + incomingFolder);
Thank you in advance
Edwardo
Edit: Since you have everything else already working, I'd recommend the Stream Caching option.
As Nicolas suggested, check out Camel's MIME Multipart data format.
Also, the reason you're getting "Missing start boundary" is that your processor is consuming the InputStream. You can try to reset() it, but it might be better to just consume the InputStream once, or enable Stream Caching.
Instead of stream caching, you could also just convert the stream to a string. Before your processor, add:
.convertBodyTo(String.class)
The string can be read over and over. If you still get the missing start boundary error, try logging the body before the unmarshal operation. Make sure the message is intact and that it indeed contains the start boundary.
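For reference, here is roughly what the multipart body looks like on the wire (boundary taken from the question; the filename is a hypothetical example). Each part is delimited by two leading dashes plus the boundary, and the final delimiter carries an extra trailing --; a consumer that copies the stream to its end, instead of stopping at that final delimiter, leaks the footer into the saved file, which is exactly the symptom described above:

------WebKitFormBoundary7BH9nQ2RqDXvTRAJ
Content-Disposition: form-data; name="file"; filename="example.txt"
Content-Type: text/plain

...file contents...
------WebKitFormBoundary7BH9nQ2RqDXvTRAJ--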

React Fetch POST removes characters with header Content-type = application/x-www-form-urlencoded

I am using React with fetch to send an image to the server.
I have tried using Content-type = application/x-www-form-urlencoded to pass my data to the server, but it replaces special characters with spaces (e.g. + becomes a space).
I have switched the header to be Content-type: multipart/form-data but that throws the error
Failed to load resource: the server responded with a status of 500
(Internal Server Error).
I have added a boundary to the Content-type as boundary=abcdefg.
That did not change anything and I am not sure what my boundary would be.
Finding a clear answer with straightforward examples about boundaries has been impossible.
The data that I am sending is a large string.
If needed I can post that as well.
Here is a sample of the code that is causing the problem:
SaveTest4(data: string) {
    const options = {
        method: 'post',
        headers: {
            "Content-type": "multipart/form-data; boundary=abcdefg"
        },
        body: 'data=' + data
    };
    fetch('api/DataPoint/AddTest4', options);
}
Based on part of your analysis, it sounds like you're trying to send base64-encoded data. A content type of application/x-www-form-urlencoded will result in the server performing URL decoding, which will replace each instance of + in the content body with a space character.
When you used a content type of multipart/form-data, the server fails with status 500 because the data you provided wasn't a properly constructed MIME document.
My psychic debugging powers tell me that you're trying to post a base64-encoded file to an ASP.NET MVC WebAPI endpoint that's expecting a JSON document. You might have a controller method that looks like this:
[HttpPost("api/DataPoint/AddTest4")]
public void AddTest4([FromBody] string data) { ... }
If you send with a content type of application/json, this endpoint will expect a document that looks like this:
"{base64-encoded-data}"
Note that there are quotes around the data, because a quoted string is a proper JSON document. You'd just use JSON.stringify() to create the quoted string in this case, which would escape any quotes within the string correctly.
If you send with application/x-www-form-urlencoded, you'd need to send a document that looks like this:
data={base64-encoded-data}
But, as you note, you'd have to make sure you escape all of the special characters in the payload; you can do this using window.encodeURIComponent(), which would translate each "+" to "%2B", among other things.
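For example, with a hypothetical base64 payload:

data=iVBORw0KGgo+AAA=          sent raw, the server decodes "+" as a space
data=iVBORw0KGgo%2BAAA%3D      after encodeURIComponent(): "+" becomes %2B, "=" becomes %3D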
If the files that you're uploading to this endpoint are large, it would be significantly better to use an instance of FormData. This would allow the browser to stream the file to the server in chunks instead of reading it into memory and base64-encoding it in JavaScript.

pdf.js and protected files not otherwise viewable

I am using the PDF.js library to display PDF files within my site (using pdf_viewer.js to display documents on-screen). The PDF files I am displaying are confidential, so I need to show them within the site while blocking unauthorized visitors from viewing the same files just by typing in their URLs and seeing them show up right in their browser.
I tried to add the Deny from all line in my htaccess file, but that of course also blocked the viewer from showing the docs, so that seems to be a no-go. Clearly anyone could simply look at the inspector and see the PDF file that is being read by the viewer, so a direct URL is not going to be secure in any way.
I did read about PDF.js being able to read binary data, but I have no knowledge of how I might read in a PDF from my own file system and prep it for use by the library, even if that means it is all a bit slower in loading to get the file contents and prep it on the fly.
Anyone have a solution that allows PDFJS to work without revealing the source PDF URL, or to otherwise read the file using local file calls?
Okay, after some testing, the solution is very easy:
1. Get the PDF data using an Ajax-called function that can figure out which actual file is to be viewed.
2. In that PHP file, read the file into memory, using fopen and fread normally.
3. Convert it to base64 using base64_encode().
4. Pass that string back to the calling JavaScript.
5. In the original calling function, use the following to convert the string to a Uint8 array and then pass that to the PDF.js library...
// The function that turns the base64 string into whatever a Uint8 array is...
function base64ToUint8Array(base64) {
    var raw = atob(base64);
    var uint8Array = new Uint8Array(raw.length);
    for (var i = 0; i < raw.length; i++) {
        uint8Array[i] = raw.charCodeAt(i);
    }
    return uint8Array;
}

// The guts that gets the file data, calls the above function to convert it, and then calls PDF.js to display it
$.ajax({
    type: "GET",
    data: {file: <a file id or whatever distinguishes this PDF>},
    url: 'getFilePDFdata.php', // the PHP file that reads the data and returns it encoded
    success: function(base64Data){
        var pdfData = base64ToUint8Array(base64Data);
        // Loading document.
        PDFJS.getDocument(pdfData).then(function (pdfDocument) {
            // Document loaded, specifying document for the viewer and
            // the (optional) linkService.
            pdfViewer.setDocument(pdfDocument);
            pdfLinkService.setDocument(pdfDocument, null);
        });
    }
});

How to read the body of ssl response using BIO

Hello, I've got a problem trying to create a Twitter console app. When getting a response, I get some garbage in it.
Here is my code:
char new_request[1024] = "GET /1.1/statuses/user_timeline.json?count=4&screen_name=twitterapi HTTP/1.1\r\nHost: api.twitter.com\r\nUser-Agent: twitter-terminal-app$
strcat(new_request, bearer);
strcat(new_request, "\r\nAccept-Encoding: gzip\r\n\r\n\0");
BIO_write(bio, new_request, strlen(new_request));
printf("%s\n", new_request);
p = BIO_read(bio, ans, 2047); // Getting header
ans[p] = 0;
printf("%s\n", ans);
char ans2[100000] = "";
p = BIO_read(bio, ans2, 10000); // Getting body
BIO_should_retry(bio);
FILE *file = fopen("result.txt", "w+");
fputs(ans2, file);
printf("%s\n%i\n", ans2, p);
The answer I get in the body looks like this:
▒▒is▒HǿJ▒_
▒▒▒▒V▒▒▒▒.▒▒T▒▒f
▒Tm▒m
▒P▒▒1▒|▒}▒%▒▒▒▒Q^▒▒▒▒?▒cx{Չ
F%▒C*;▒▒ŴD/#▒L▒▒▒▒?L▒A▒▒▒N▒▒ĝ▒▒▒▒$▒El▒▒▒▒X▒▒B0▒~%▒▒5▒˲#Y)GY▒ctz▒h&-▒▒▒O>AG▒▒▒l6b▒z▒:K
▒T▒\▒▒2+▒b▒H▒&B ▒▒▒B▒qV▒▒▒▒▒▒
▒▒a_▒l▒?#▒o▒▒▒W▒▒u%▒▒▒;▒1M▒v▒▒L▒▒
Maybe the answer is encoded somehow and that's the problem. I tried to surf dev.twitter.com but didn't find any answer. If I use BIO_gets() instead of BIO_read() the answer is -2. Any ideas?
strcat(new_request, "\r\nAccept-Encoding: gzip\r\n\r\n\0");
...
p = BIO_read(bio, ans, 2047); // Getting header
...
p = BIO_read(bio, ans2, 10000); // Getting body
... Maybe the answer is encoded somehow
You've declared in your HTTP request that you support data compressed with gzip, and that's why the server has sent you the data compressed. If you did not just read and ignore the HTTP response header, but actually took a look at it, you would probably notice:
Content-Encoding: gzip
Apart from explicitly allowing compressed data, you are making an HTTP/1.1 request. This means you also need to be able to deal with chunked transfer encoding. HTTP/1.1 also implies that connections are persistent by default, so you need to properly parse the header to find out where the response really ends, instead of relying on the connection ending. You also cannot rely on a fixed size of the header, or on the header and body being readable with separate BIO_read calls. For example, you might need multiple reads for the body, or the body might already be included in the single read you do for the header.
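For illustration, a chunked body interleaves hexadecimal chunk sizes with the chunk data, and a zero-length chunk marks the end; the sketch below carries the nine-byte body "Wikipedia" in two chunks:

HTTP/1.1 200 OK
Transfer-Encoding: chunked

4
Wiki
5
pedia
0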
Unless you really want to deal with all these problems yourself, I recommend using an existing library which implements this properly, and thus gets you the correct response reliably and not by chance.
If you instead want to learn how this is done, I recommend you start by learning more about HTTP, e.g. by reading the Wikipedia entry, and then continue with the standards referenced there. I suggest starting with HTTP/1.0, since it is simpler than HTTP/1.1.
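As an illustration only (not part of the original answer), the simplest way to sidestep most of this is to drop the Accept-Encoding header so the body arrives uncompressed, send "Connection: close", and keep reading until the stream ends; bio is assumed to be the connected BIO from the question:

/* Sketch: read a whole response over an established BIO.
   Assumes the request omitted Accept-Encoding (no gzip) and sent
   "Connection: close", so end-of-stream marks the end of the body. */
char buf[4096];
FILE *out = fopen("result.txt", "w");
int n;
while ((n = BIO_read(bio, buf, sizeof(buf))) > 0) {
    fwrite(buf, 1, n, out);  /* write exactly n bytes; no string assumptions */
}
/* n == 0 means orderly close; n < 0: check BIO_should_retry(bio) */
fclose(out);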

Parse XML Data in Arduino

I am trying to read XML data in Arduino from a Google spreadsheet published to the web, using an HTTP GET request to the following link.
https://spreadsheets.google.com/feeds/cells/SpreadsheetID/5/public/basic?&range=D10
I receive the following reply, along with some headers, which I can observe on the serial port.
I want to parse the data written in bold in the above reply. The data can be a real number, positive or negative. Please help me find a way to parse this data.
If you need only one or two simple nodes (without attributes), you can use my function :)
String xmlTakeParam(String inStr, String needParam)
{
  // Note: indexOf() returns 0 when the tag starts the string, so test >= 0
  if (inStr.indexOf("<" + needParam + ">") >= 0) {
    int CountChar = needParam.length();
    int indexStart = inStr.indexOf("<" + needParam + ">");
    int indexStop = inStr.indexOf("</" + needParam + ">");
    // Skip past "<name>" (CountChar + 2 characters) to the start of the value
    return inStr.substring(indexStart + CountChar + 2, indexStop);
  }
  return "not found";
}
I searched for a solution for a long time, so I'm leaving this here for clarity.
It won't work for complex XML with repeating nodes, though.
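A plain-C variant of the same idea (a hypothetical sketch, not from the original answer) avoids the String class, and with it heap fragmentation on small boards; strstr() and a caller-supplied buffer do the extraction:

#include <stdio.h>
#include <string.h>

/* Copy the text between <tag> and </tag> into out (NUL-terminated).
   Returns 0 on success, -1 if the tag is missing or out is too small.
   Tag names longer than 29 characters would be truncated by snprintf. */
int xml_take_param(const char *xml, const char *tag, char *out, size_t out_len)
{
    char open_tag[32], close_tag[32];
    snprintf(open_tag, sizeof(open_tag), "<%s>", tag);
    snprintf(close_tag, sizeof(close_tag), "</%s>", tag);

    const char *start = strstr(xml, open_tag);
    if (!start) return -1;
    start += strlen(open_tag);

    const char *end = strstr(start, close_tag);
    if (!end) return -1;

    size_t len = (size_t)(end - start);
    if (len >= out_len) return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}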
Have you checked out this library?
https://web.archive.org/web/20160622041818/http://interactive-matter.eu/how-to/ajson-arduino-json-library/
You're better off converting your XML to JSON and giving that a go considering the memory availability on Arduino.
Otherwise, if you really want to work with XML, there are always these resources:
https://github.com/RobTillaart/Arduino/tree/master/libraries/XMLWriter
http://john.crouchley.com/blog/archives/454
