Is it always safe to remove a trailing slash from a URL? - database

I'm storing URLs in a database, and I want to be able to know if two URLs are identical.
Generally, a trailing slash at the end doesn't change the response you'd get from a server. (i.e. http://www.google.com/ is the same as http://www.google.com)
Can I always blindly remove the trailing slash from any URL, without looking at anything?
Is that safe?
What I mean by "without looking at anything" is that I'd remove the slash from:
http://www.google.com/q?xxx=something&yyy=something/
I know the web server could theoretically return completely different things if it wanted, and I know sometimes going to a URL without the slash will redirect to one with the slash. My only intention here is determining if both URLs are the same.
Is this method safe?

No, it is not always safe. A web server could interpret the path part of the URL any way it likes. You cannot tell what it will do (resolve the URI) without using a GET or HEAD on the URL.

It may be safe in the sense that you'll get the same response with or without a trailing slash (and I can't guarantee that's true), but they can definitely mean different things. Consider a URL that references a directory, or something presented by the site as a directory. Using the URL
http://www.somesite.com/directory/
...makes it clear you're asking for a directory. If you hack off the trailing slash:
http://www.somesite.com/directory
...the site's going to take this as a request for a file called "directory", and get all confused for a moment. It'll likely interpret this as a request for a directory, but the meanings are not the same, and you might not get what you expect.
See this article for more detail.

No. I've encountered situations where, depending on the settings in a .htaccess file, some directories or "clean URLs" (such as those generated by a CMS) could not be accessed without a trailing slash. It's rare and it might be a mistake on the part of the webmaster, but it can happen.

As others have noted, it's not always safe. If it will work for you, my recommendation is to store the URLs with the slashes, and strip them off when you do your comparison. You'll take a performance hit, but I'd think that's better than sending someone to the wrong web page.
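The approach from the last answer (store the URL as given, normalize only when comparing), as an added example that is not from the original thread. `comparison_key`, `same_url` and the `strip_slash` switch are hypothetical names; only the equivalence of an empty path and "/" comes from RFC 3986, so stripping any other trailing slash is a heuristic that can be switched off, and the query string is never touched.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def comparison_key(url, strip_slash=True):
    """Build a key used only for equality checks; the stored URL is unchanged."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()   # userinfo is ignored (assumption)
    if ":" in host:                         # IPv6 literal: restore the brackets
        host = "[" + host + "]"
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = "{}:{}".format(host, port)
    # RFC 3986 section 6.2.3: for http(s) an empty path equals "/".
    path = parts.path or "/"
    # Heuristic from the answer above: /directory/ is treated like /directory.
    # A server is free to serve different content for the two.
    if strip_slash and len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # The query is kept byte for byte: a slash there is data, not structure.
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))


def same_url(a, b, strip_slash=True):
    """Compare two URLs by their comparison keys."""
    return comparison_key(a, strip_slash) == comparison_key(b, strip_slash)


if __name__ == "__main__":
    # URLs taken from the question and answers above.
    pairs = [
        ("http://www.google.com/", "http://www.google.com"),
        ("http://www.somesite.com/directory/", "http://www.somesite.com/directory"),
        ("http://www.google.com/q?xxx=something&yyy=something/",
         "http://www.google.com/q?xxx=something&yyy=something"),
    ]
    for a, b in pairs:
        print(same_url(a, b), same_url(a, b, strip_slash=False), a, b)
```
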

Related

Use Apache to Rewrite URLs with Database Parameters as Nice URLs

For years our database driven websites have had URLs that look like:
https://www.example.com/product?id=30
But nowadays, and especially for SEO purposes, we want our URLs to be "nice" and look like:
https://www.example.com/30/myproduct
We use Zope 2.13.x running on Debian and using Apache 2.4 as the front-end webserver. I know that not too many people use Zope, but utilizing Apache's mod_rewrite we should be able to proxy the rewrite and have nice URLs that still pass the database arguments necessary in order to properly serve the pages to the end users.
There used to be a Zope Cookbook where I wrote a bunch of really detailed tutorials on Zope functionality but that no longer seems to exist and I wanted to share this with the SE community.
The awesome thing is that this is not specific to Zope, but will/should work with any rewrite of a parameter based URL into a nice URL and it's super easy once it's all working.
For complete transparency, I am going to answer my own question so that it's documented for everyone.
Using the rewrite engine in Apache, decide how you want your URLs to look to the end user in their web browser.
For example, if you are calling to a database and have a url that looks like
https://www.example.com/products?id=30&product_name=myproduct
but you want that URL to look like
https://www.example.com/products/30/myproduct
you would use a rewrite rule as follows:
RewriteRule ^/products/(.*)/(.*) /products?id=$1&product_name=$2 [L,P,NE,QSA]
To explain that further:
^/products/(.*)/(.*) is saying that anytime domain.com/products is accessed, look for two variables in the next directory names, i.e. /(.*)/(.*)
If you only wanted one variable you would do ^/products/(.*)
Likewise if you wanted three variables you would do ^/products/(.*)/(.*)/(.*)
From there we need to tell Apache how to interpret that URL in order to rewrite and still allow Zope (or whatever db you may be using) to pass the correct URL parameters. That part is:
/products?id=$1&product_name=$2
Apache will now take the first (.*) and treat that as $1. It will take the second (.*) and treat that as $2 and so on.
The part in the brackets is extremely important
L = This makes Apache stop processing the rewrite ruleset if the rule matches. This is important because you don't want Apache to get confused and start trying other rewrites.
P = Proxy the request. This makes sure that the browser does not display a different URL than https://www.example.com/products/30/myproduct (i.e. we do not want the end user seeing the rewritten URL as https://www.example.com/products?id=30&product_name=myproduct)
NE = No Escaping any URL characters. You need this to ensure that the URL rewrite does not try and escape the special characters like $ = & as these are important to URL parameters
QSA = This allows multiple variables (or URL parameters) to exist
Please Note: It is very important to consider how you want your URLs to look (the nice URLs) because that is what you want to submit to the search engines. If you change your URL structure, those nice URLs will no longer work and your search engine rankings may decrease.
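What the pattern in the rule above captures on a few request paths, as an added example that is not part of the original answer. Python's `re` stands in for Apache's regex matching of this simple pattern; `STRICT_RULE`, the `rewrite` helper and the sample paths are assumptions made for the illustration. Because the rule uses NE, whatever the groups capture is placed in the query string as-is, so it matters that `(.*)` also matches slashes and empty segments.

```python
import re

# The pattern from the answer, and a stricter variant (an assumption of this
# example): numeric id, one name segment, optional trailing slash, anchored.
PAGE_RULE = re.compile(r"^/products/(.*)/(.*)")
STRICT_RULE = re.compile(r"^/products/(\d+)/([A-Za-z0-9_-]+)/?$")


def rewrite(pattern, path):
    """Return the substituted target, or None when the rule does not match."""
    match = pattern.match(path)
    if match is None:
        return None
    # Mirrors the answer's substitution: /products?id=$1&product_name=$2
    return "/products?id={}&product_name={}".format(match.group(1), match.group(2))


# Constructed request paths.
SAMPLES = [
    "/products/30/myproduct",        # the intended shape
    "/products/30/myproduct/",       # trailing slash
    "/products/30/myproduct/extra",  # greedy $1 swallows a path segment
    "/products//myproduct",          # empty id
    "/products/abc/myproduct",       # non-numeric id
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample)
        print("  page rule  :", rewrite(PAGE_RULE, sample))
        print("  strict rule:", rewrite(STRICT_RULE, sample))
```

In an Apache configuration the equivalent tightening would be to replace each `(.*)` with a character class that fits the parameter and to anchor the end of the pattern.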

Why use random strings for filenames instead of logical names?

I see in websites such as Facebook or Twitter, images such as profile pictures have filenames and locations such as 640122062739084800/BXK8aBbv.jpg.
This is quite clearly generated. But why do websites do this? Why not instead have (user_id)/image.jpg instead which is much more logical?
Is there a security risk or is there another reason? Thanks.
There is a script behind every 'token' you see in a URL.
Tokens are a way to control what's happening and when with security.
Some character sequences are specifications of the request, even if you don't understand them at first.
In short: yes, it's generally for security purposes, but for controls and request specifications too.
Hope it was useful.

Wordpress URL Change on submit

I currently have a website I'm working on that I have taken over from another individual. I dumped his SQL file into my database and everything seems to be ok apart from one thing. Whenever I try to log in to the back end or if I try to go elsewhere, it will add an additional .co.uk to the address bar, making it like so:
From: www.domain.co.uk to www.domain.co.uk.co.uk
I've had a dig in the database but I really can't find anything and I've never faced this issue before. Could anyone shed some light on this for me? Maybe just let me know where I could look within the database to identify the problem. Many thanks.
Take a look at the .htaccess file in the root folder, which is hidden and may contain rewrite rules.
Also, I recommend you use this plugin for migrations:
http://wordpress.org/extend/plugins/wp-migrate-db/
I use it whenever I move from localhost to a live site and vice versa. It will also ensure your widgets are preserved, since doing a find replace will cause the object serialisation syntax WordPress uses to break.
After migrating, you need to visit Settings > Permalinks so the .htaccess file can be updated according to the new URL for rewrites.

Apache hashbang url problems

I set up an older Rails 2 project on a brand new Apache#Debian#squeeze. The project itself could be a single pager, using links to scroll page up and down. My links look like that:
http://mydomain.com/en/#home
These links do fine as long as JavaScript intercepts the click event and simply scrolls to the intended section. In case the user leaves the single page and opens one where these links (still the same) cannot be followed via JavaScript, I only receive an:
Forbidden
You don't have permission to access /en/ on this server.
If I change the link to:
http://mydomain.com/en#home
everything works fine and as expected. But I do not want to change my link structure. It already worked well at an older Debian5 box.
I expect that to be an Apache2 configuration issue, but do not find anything useful in the net.
Looking forward to any kind of enlightenment.
Thx
Felix
I don't know how or where you are working with javascript related to this problem, but let me tell you this.
Everything after the hashtag # is never passed to the server. It's HTTP standardization, it is just not passed to the server.
It is only intended to navigate to anchor within the webpage, and today used for a lot of new techniques including, but not limited to, xss scripting, javascript hooks, etc
It is possible that links are prohibited to load with an onclick event and some javascript does something instead, but it is not possible that you end up on this page http://mydomain.com/en/#home if http://mydomain.com/en/ does not work.
However, to solve your problem you probably have to adjust your apache rewriting rule (or enable mod_rewrite at all?) to also capture links with trailing slashes.
The links http://mydomain.com/en/ and http://mydomain.com/en are something different and could serve a completely different page.
I would strongly recommend not to get a mess here and do a strict permanent redirect from one to the other. Which you choose for primary usage is up to you.
I prefer a trailing slash and can also supply arguments for that, but they can be invalidated easily and replaced by some to suggest the opposite. You should find plenty of discussion on that if you search for trailing slash here.
To solve your problem please try to find the according RewriteRule, copy it and add it one more time with a trailing slash. See whether it works and make a redirect to the url without trailing slash.
You may also edit your answer and post your server config to get help with that.

case sensitivity with users controller on certain hosting

We generally use two different hosting services. On one, everything works ticketyboo, as it does on my local dev servers. On the other server, however, I am having this problem:
I can't access the users controller like this:
http://www.example.com/users/login
But I can like this:
http://www.example.com/Users/login
** note the capitalised 'Users' **
If I displace the application to a sub-folder everything works fine (both upper- and lowercase).
The hosting company have looked at it and can't see a problem at their end and they assure me that users is not a reserved word.
You might say this isn't a problem, just use the version that works. Unfortunately it leads to problems downstream where Cake core starts generating urls itself.
Anybody else seen this problem or know the solution?
[This only occurs on the users controller - all others work as expected]
Without seeing all your code / diving in too deeply, I'm not sure what the cause of this problem is. Do you have some special stuff going on in the routes.php file? If you have a specific route defined for users, that could be it.
However, you could make a quick fix -- in UsersController (or AppController if you want to ensure this behavior doesn't pop up elsewhere) just add a line to the beforeFilter() method to capitalize / decapitalize (whichever is more appropriate) the controller parameter.
[edit] - sorry, didn't finish that first paragraph. It still could be the routes file, even though it works on one server and not the other, because it's possible that the working server uses a case-insensitive apache module that normalizes all urls. This is why it's so nice to have your staging and dev setups being EXACTLY the same as production.
While the hosting support denied that the word 'user' or 'users' or 'Users' was in any way reserved, it seems that it was:
"We have removed the users/ redirect"
Problem solved.
