Jsoup - how to deal with URLs that have space - google-app-engine

I'm using Jsoup to connect to a URL with a space in it but keep getting a URI syntax error. Any ideas on how deal with this?
Thanks!!

You should percent-encode URLs. Space is represented as %20.
Additionally, some characters/symbols are used as delimiters in URLs (&, ?, =, etc.) and thus they cannot be put in the URL unless they are meant as delimiters. For example, if you wanted to pass the string "1 &1" I would have to encode it as "1%20%261" or else it will not parse properly.
See http://en.wikipedia.org/wiki/Percent-encoding for more details.

URLs cannot have spaces, by RFC1738 (section 2.2): http://www.ietf.org/rfc/rfc1738.txt
For some additional background on how to handle this, see: http://www.w3schools.com/tags/ref_urlencode.asp
In particular, one mechanism for encoding URLs can be found at:
HTTP URL Address Encoding in Java

Related

Use a path (with "/") as a param in a URL

I'm using React Router v6 (Beta) and have a requirement to allow a path as a param, eg.
"some-url/:folderPath" -> "some-url/some/path/to/a/folder"
This is proving trickier than I expected, though I recognize the obvious confusion caused by slashes being URL delimiters. I found this answer that suggests that at some point React Router allowed for a "+" that would allow the URL param to include everything following that point in the URL, though it no longer appears supported.
Is there an explicit way to approach this problem using the React APIs?
You can encode any character using URL encoding. The encoding for forward slash is %2F. If you replace all the slashes with %2F it should work. You can also call a function that will URL-encode a string.
See: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent
EDIT:
That being said, the comment by #Linda Paiste is a superior solution.

Encoded slash (%2F) in query parameter causes redirection to fail in Angular Application

In my angular application, I need to make GET call to a Tomcat server. This GET call requires query parameters which could contain special characters too like "+", "/", "/+"
GET call is being made from angular controller using $window.open with target as "_blank"
Currently the redirection is getting failed without any encoding.
So, I added encoding in .js file before the GET call is being made by using encodeURIComponent.
Then I added decoding logic using URLDecode.decode in backend java code to decode query parameters.
But still it doesn't work.
It works only if I encode query parameters twice within the .js file using encodeURIComponent twice.
I am trying to find the root cause for double encoding but no luck yet. I would greatly appreciate if anyone could share any inputs.
Made it work by adding a * in path parameter in app.js. Adding a star means that the request will include multiple path parameters separated by /, and so angular will not try to encode / in the request.
Double encoding could also work but then the server side logic has to be modified to decode the request parameters twice and replace %2B2F by %2F

Angular Routing with an encoded backslash

I am building in invite/registration form for my site. The idea is that one user invites another user, which sends a code with a url to register with. The problem is that the code can have an encoded backslash in it. When that encoded backslash is processed in Angular, it seems to get decoded and ends up busting the routing.
http://localhost:54464/ang/register/owi0%2fCQCrjzBcwqEORVVHhrICIANGKxtxMJ2Kh91y%2bNhhB%2br06appZzEVPhpkP2C
becomes:
http://localhost:54464/ang/register/owi0/CQCrjzBcwqEORVVHhrICIANGKxtxMJ2Kh91y+NhhB+r06appZzEVPhpkP2C
How can I stop this behavior?
Try using a route like:
/register/*code
The code will contain the string with the slashes
source
It is typically used for path-like url arguments...but I don't see why this wouldn't work for your case.

Angular ui-router - Curly brackets etc in url

I have been searching for a way to have curly brackets{} and slashes / in the URL. But the URL looks like this:
URL appearance now:
http://localhost/#/?ph=tes:testes%2Ftest%7Blb0%7D
URL wanted appearance:
http://localhost/#/?ph=tes:tests/test{lb0}
How do I get an URL without the %2f etc?
Thanks in advance!
All valid characters that can be used in a URI (a URL is a type of URI) are defined in RFC 3986.
All other characters can be used in a URL provided that they are "URL Encoded" first. This involves changing the invalid character for specific "codes" (usually in the form of the percent symbol (%) followed by a hexadecimal number).
This link, HTML URL Encoding Reference, contains a list of the encodings for invalid characters.
Happy Helping!
These (%2F, %7B) are escape sequences. They represent the special characters that are in your URL, as a URL cannot contain these character directly.
Where do you want exactly to process this URL?
You can use Javascript's unescape(url) to get the escaped URL string back to the original one if you're going to process it in Javascript itself.

mail clients stripping part of angular url

I am sending a signup activation email containing a signup confirmation url with a confirmation token that points to an angular front end app:
...
Activate
...
Note that the token is a JWT and is fairly long.
This works find for most users, but for some clicking on the link takes them to https://domain/com only without the confirm-signup?token=...
It seems as though the mail client may be stripping off everything after the #, but I can't find any evidence of others having this problem, nor can I reproduce it.
My best guess so far is that some mail clients are seeing the # and somehow treating the trailing part as an internal anchor and stripping it...?
Has anyone else encountered this sort of problem? If so, have you found any solution short of replacing the whole mechanism with something else?
Some clients treat the hash-link just fine. Others don't. There's a conversation about Outlook being dirty about this here: Outlook strips URL hash from email
What we did to resolve this at our company is simply create a handler on our server that redirects. Your email link would become http://domain.com/email-link?url=https%3A%2F%2Fdomain.com%2F%23%2Fconfirm-signup%3Ftoken%3D1234 and your server side script would grab the query param url and immediately trigger a redirect.
You'd need to make sure that you find all links in your emails and replace them. Here's a PHP function for that, but you could do this in whatever backend language you're using. Regex here may be helpful at least.
function replaceLinks($html,$hash) {
return preg_replace_callback('/<a [^>]*href=[\"\']{1}(.+?)[\"\\\']{1}/', function($matches) use ($hash) {
return str_replace($matches[1],"http://domain.com/email-link?url=".rawurlencode($matches[1]),$matches[0]);
}, $html);
}
Yes I have encountered this issue before because of the #, I was trying to link to a anchor on a landingpage.. My solution ended up using a short.url service to "hide" the # from the html e.g. https://goo.gl/
Looks like you need percent encoding!
A lot of times when your href gets parsed (by angular in this case) it doesn't handle the special characters right, or strips them. Find your problem characters and replace them with %3F for ?, %26 for &, and %23 for #. The rest are in a chart in the link.
Once the encoded address hits the browser the url will be decoded in your url bar.

Resources