URL-encoding weird characters in another charset - C

I'm learning C, and I'm using libcURL to send a POST request to log into a website.
I'm stumbling upon a problem: my password contains the ü character.
Reading the POST request from my browser, I can see it gets encoded as %FC.
However, when using curl_easy_escape() to encode it, it encodes as %C3%BC.
I searched around and found out it's a different encoding, I think ISO-8859-1, because the page has this meta tag: <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
However, I can't figure out how to do the conversion.
Now, how would I go about URL-encoding ü as %FC?

Use of non-UTF-8 encodings for POST is an utter mess, and the behavior actually varied quite a bit between browsers, so it's considered very bad practice. But since you're stuck with a site that does this, you'll have to work around it.
I can't find a cURL API for doing percent encoding with alternate charsets, so you might have to do it yourself: first use iconv to convert from your system's native encoding (hopefully UTF-8) to ISO-8859-1 (Latin-1), then do the percent encoding manually.
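A minimal sketch of that two-step approach, assuming glibc-style iconv; the helper name urlencode_latin1 is made up, and error handling is omitted for brevity:

```c
#include <stdio.h>
#include <string.h>
#include <iconv.h>

/* Hypothetical helper: convert a UTF-8 string to ISO-8859-1 with iconv,
   then percent-encode the Latin-1 bytes. Error handling omitted. */
static void urlencode_latin1(const char *utf8, char *out, size_t outsize)
{
    char latin1[256];
    char *in = (char *)utf8, *dst = latin1;
    size_t inleft = strlen(utf8), outleft = sizeof latin1;

    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
    iconv(cd, &in, &inleft, &dst, &outleft);
    iconv_close(cd);

    size_t n = sizeof latin1 - outleft;   /* Latin-1 bytes produced */
    char *p = out;
    for (size_t i = 0; i < n && (size_t)(p - out) + 4 < outsize; i++) {
        unsigned char c = (unsigned char)latin1[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~')
            *p++ = (char)c;                          /* unreserved: copy as-is */
        else
            p += sprintf(p, "%%%02X", (unsigned)c);  /* e.g. ü -> %FC */
    }
    *p = '\0';
}

int main(void)
{
    char enc[1024];
    urlencode_latin1("pässwörd-ü", enc, sizeof enc);
    printf("%s\n", enc);   /* prints p%E4ssw%F6rd-%FC */
    return 0;
}
```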
One idea - are you sure you should even be doing your own escaping? My impression is that curl_easy_escape() is just meant for URLs, and the curl API for POSTing forms may already do escaping internally (not sure), in which case you probably just have to tell it the right content type.
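If you do end up encoding the body yourself, here's a hedged sketch of sending it with libcurl and declaring the charset; the URL and field names are made up, and the body is assumed to have been percent-encoded already (e.g. by the helper above):

```c
#include <curl/curl.h>

/* Sketch: POST a pre-encoded Latin-1 form body and declare its charset.
   URL and field names are placeholders. */
int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();

    struct curl_slist *hdrs = curl_slist_append(NULL,
        "Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-1");

    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/login");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    /* body already percent-encoded as Latin-1 */
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "user=joe&pass=h%FCnter");
    curl_easy_perform(curl);

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```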

Related

Model Monitor Capture data - EndpointOutput Encoding is BASE64

https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture-endpoint.html
I have followed the steps mentioned in this link, and it appears I cannot change the encoding for EndpointOutput in the data capture file. It comes out as BASE64 for the XGBoost model. I am using the latest version, 1.2.3.
The monitoring scheduler requires both EndpointOutput and EndpointInput to have the same encoding. My EndpointInput is CSV, but EndpointOutput comes out as BASE64, and nothing I have tried changes it.
This causes an issue when the analyzer runs: after the baseline is generated and data is captured, the monitoring schedule's analyzer throws an encoding-mismatch error, since EndpointOutput and EndpointInput must have the same encoding for it to run.
From what I can see, nothing changes the encoding of the output. I also tried the LightGBM and CatBoost algorithms and found that for these the EndpointOutput encoding is JSON, which is readable but still doesn't solve the problem.
Is there a way to change the EndpointOutput encoding for data capture?
I have tried the deserializer option on the predictor, using both JSONDeserializer and CSVDeserializer, but I still get BASE64 with XGBoost and JSON with the LightGBM and CatBoost algorithms.

Pass value to C binary through cURL (or FTP)

I've got a web application that I know to be written in C, that's running on a specified IP address and port. I can access the application either with telnet or nc. With each of those, once connecting I'm prompted for input.
Since I've got a copy of the binary, running it through strings and hd showed me that the application is looking for a particular string to validate.
There's a file sitting on that domain that I'd like to access, which I can't seem to do with telnet or nc, so I'm thinking that either cURL or ftp would be the better bet here.
However, since the string validation that happens with this running service isn't really a password, I'm not sure how to pass this string value into the service with cURL or ftp. My gut tells me that I probably need to structure the command as a POST, but since this definitely isn't an HTTP service, I'm not sure how to proceed.
Any ideas?
(not an answer, but too long to post as a comment)
when dealing with undocumented custom protocols, telnet/curl/wget are definitely not suitable, and nc is not practical. write your own client (a minimal sketch follows below).
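for example, a bare-bones sketch of such a client; the IP, port, and candidate string are placeholders, and a dictionary attack would just loop this over a wordlist:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* bare-bones client: connect, send one candidate string, print the reply.
   the IP, port, and candidate below are made up. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(4000);                          /* assumed port */
    inet_pton(AF_INET, "203.0.113.10", &addr.sin_addr);   /* assumed IP */

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("connect");
        return 1;
    }

    const char *candidate = "some-string\n";   /* string under test */
    write(fd, candidate, strlen(candidate));

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf - 1); /* grab the response */
    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
    close(fd);
    return 0;
}
```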
you say the server is expecting a string? well, maybe you could run a dictionary attack on it: make a client that tries everything in a large dictionary (like this?), look for any non-standard response, and go from there. add any strings you find in the binary too, of course
if that gets you nowhere, maybe the binary is vulnerable to timing attacks? maybe you can extract a string that it is looking for, through a timing attack
and because you already have the binary, you could run it through a disassembler and study the assembly code. it should reveal both whether or not it's timing-attack vulnerable and, if the strings are hardcoded, what string it is looking for, albeit reading compiled assembly code is really difficult (game crackers do this all the time for cracking video game copy protections)

Unreadable characters in a file on a remote server when viewing in a browser

I have to work with a text file on a remote server. The file can be accessed by a direct link using any browser, in the form http://server.school.com/files/people.all (not a real link, since the access requires a password). When I view it in Firefox, some of the characters are unreadable, for example: 'José Luis Paniagua Sánchez'. I have a few questions.
Could the issue be caused by incorrect settings of my browser or could there be a problem with the file itself?
Is opening a file in a web browser and copying the entire content to a text editor using copy/paste inherently different from downloading the information with a script? Could it affect the encoding of the data?
Thanks.
Select the encoding in the browser, likely UTF-8 (in Firefox: View - Character Encoding). The problem is that the server does not specify the file's encoding (or specifies a default encoding).
A binary download, like downloading an image file (which you could try), should keep the file as-is.
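For a scripted download, a minimal libcurl sketch that writes the bytes to disk untouched, reusing the placeholder URL from the question (authentication options omitted):

```c
#include <stdio.h>
#include <curl/curl.h>

/* Download the file as raw bytes, exactly as stored on the server. */
int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    FILE *out = fopen("people.all", "wb");   /* "b": no newline translation */

    curl_easy_setopt(curl, CURLOPT_URL, "http://server.school.com/files/people.all");
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);  /* default callback fwrites here */
    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        fprintf(stderr, "download failed: %s\n", curl_easy_strerror(res));

    fclose(out);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```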
Cut-copy-paste using the right encoding in the browser should work for UTF-8.
Assuming it is indeed UTF-8 (multibyte sequences for the special characters) and you are working on Windows (where the default code pages are single-byte), you'd be better off using a programmer's editor like Notepad++ or jEdit, both free. They can set the encoding explicitly, and even convert between encodings.

Mercurial: which config property controls the encoding of file contents?

Is there a dedicated Mercurial configuration property which specifies the encoding of file contents and hence should be used by a Mercurial client to properly display a file?
I've found web.encoding which does not seem to mean exactly what I'm looking for. Also, Google gave some results for ui.encoding as well, but I couldn't find any hints in the reference.
Mercurial is not concerned with the encoding of the files you put in your repository: Mercurial is happy to store files with any encoding (or no particular encoding at all).
This means that you can add files with UTF-8, Latin-1, or any other encoding to your repository and Mercurial will check them out exactly as they were when you added them.
The encoding of each file is not stored anywhere in Mercurial and it is up to the client to recognize the encoding (perhaps based on file content where it makes sense, e.g., for XML files).
For a Mercurial desktop client (as per your comments below) I suggest looking at the file content:
Can you decode it as UTF-16?
Can you decode it as UTF-8?
Are there NUL bytes in the file? Then stop and declare it to be "binary".
Fallback on a Latin-N encoding such as Latin-1 for Western Europe.
The UTF-16 and UTF-8 encodings are nice since they are structured, which makes it possible to detect that a file isn't valid UTF-8 encoded, say. The above list is written with a European perspective — you should probably also consult someone with knowledge about Shift JIS and other encodings used in Asia.
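A rough sketch of that detection order in C, assuming BOM-based UTF-16 detection and skipping overlong-sequence checks; real detectors are considerably smarter:

```c
#include <stdio.h>
#include <string.h>

/* UTF-16 is easiest to spot by its byte-order mark. */
static int looks_like_utf16(const unsigned char *b, size_t n)
{
    return n >= 2 && ((b[0] == 0xFF && b[1] == 0xFE) ||
                      (b[0] == 0xFE && b[1] == 0xFF));
}

/* Structural UTF-8 check: each lead byte must be followed by the right
   number of 0x80..0xBF continuation bytes. Overlong forms not rejected. */
static int is_valid_utf8(const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; ) {
        int extra;
        if      (b[i] < 0x80)           extra = 0;
        else if ((b[i] & 0xE0) == 0xC0) extra = 1;
        else if ((b[i] & 0xF0) == 0xE0) extra = 2;
        else if ((b[i] & 0xF8) == 0xF0) extra = 3;
        else return 0;
        if (i + extra >= n) return 0;        /* truncated sequence */
        for (int j = 1; j <= extra; j++)
            if ((b[i + j] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

static const char *guess_encoding(const unsigned char *b, size_t n)
{
    if (looks_like_utf16(b, n)) return "UTF-16";
    if (is_valid_utf8(b, n))    return "UTF-8";   /* plain ASCII lands here too */
    if (memchr(b, '\0', n))     return "binary";
    return "Latin-1";                             /* Western-Europe fallback */
}

int main(void)
{
    const unsigned char sample[] = "Jos\xC3\xA9";               /* "José" in UTF-8 */
    printf("%s\n", guess_encoding(sample, sizeof sample - 1));  /* UTF-8 */
    return 0;
}
```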
In any case, I would only expect a Mercurial client to do a best effort attempt at showing me a file with an encoding other than ASCII.
Some alternative interpretations of your question:
If you're really asking about how to make your files look "correct" when you view them in hgweb, then it's a matter of using a consistent encoding in the repository and setting web.encoding.
If you're really asking how to ensure that text files get the OS-native line endings on different platforms (\n on Unix, \r\n on Windows), then take a look at the eol extension that comes with Mercurial.
No. The encoding (charset) is a property of each file in the repository, not something Mercurial tracks in its configuration.

Are QR codes guaranteed to work?

Not sure if this will get closed as "not a real question" but I asked this on Superuser and it was closed for that very reason. We are thinking of implementing a QR code which will be sent to a number of users via a letter.
Now, I'm aware that you can just Google 'QR codes' and there is a plethora of options that allow you to make a QR code. My question is this: if we do go with this solution, can we guarantee that it will work cross-platform, i.e. on Android, iOS, Symbian, etc.? Once a QR code is generated, will it work with ANY app on ANY platform?
Thanks, and apologies if this is not really a 'programming question'.
Kiran
The only time I have had problems is when the resolution of the screen displaying the QR code, or of the printer that printed it, was too low. Ensure that your QR code is big enough for the data it's holding and you should be OK.
I found that if I padded out the data in my QR code with spaces or zeros, I could maintain a set size, which helped me place it on a page and always know its size.
Kind of a suck-it-and-see problem, though.
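As an illustration, a sketch of that padding trick using libqrencode (just one library choice among many; the payload and the 64-byte target length are made up):

```c
#include <stdio.h>
#include <string.h>
#include <qrencode.h>   /* libqrencode; link with -lqrencode */

/* Pad the payload to a fixed length so every generated code ends up
   the same version (module count). 64 bytes is an arbitrary target. */
int main(void)
{
    char payload[65];
    memset(payload, ' ', sizeof payload - 1);   /* space padding */
    payload[sizeof payload - 1] = '\0';
    memcpy(payload, "https://example.com/letter/12345", 32);

    QRcode *code = QRcode_encodeString(payload, 0 /* auto version */,
                                       QR_ECLEVEL_M, QR_MODE_8, 1);
    if (!code) {
        perror("QRcode_encodeString");
        return 1;
    }
    printf("version %d, %d modules per side\n", code->version, code->width);
    QRcode_free(code);
    return 0;
}
```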
Edit
PS. Don't use QR codes for the sake of using QR codes - if you're sending the user the QR code by email, why not just send them a hyperlink?!
There are QR code readers for all platforms. It is a well-established ISO standard. Whether its content will be interpreted how you want is a bit more platform-dependent; the QR code spec says nothing about what to do with the content.
However, if you're just putting text or URLs in a QR code, it will certainly be handled as you expect on all known platforms. It might be more hit-and-miss when encoding things like vCard data.
See this guide to QR code contents.
