Apache hashbang URL problems - apache2

I set up an older Rails 2 project on a brand-new Apache 2 / Debian Squeeze server. The project itself is more or less a single-pager, using links to scroll the page up and down. My links look like this:
http://mydomain.com/en/#home
These links work fine as long as JavaScript intercepts the click event and simply scrolls to the intended section. But when the user leaves the single page and opens one where these links (still the same) cannot be handled via JavaScript, I only receive:
Forbidden
You don't have permission to access /en/ on this server.
If I change the link to:
http://mydomain.com/en#home
everything works fine and as expected. But I do not want to change my link structure. It already worked well on an older Debian 5 box.
I suspect this is an Apache2 configuration issue, but I cannot find anything useful on the net.
Looking forward to any kind of enlightenment.
Thx
Felix

I don't know how or where JavaScript is involved in this problem, but let me tell you this.
Everything after the hash (#) is never passed to the server. That is part of the URL standard: the fragment is simply not sent to the server.
It is only intended to navigate to an anchor within the page, and today it is used for a lot of newer techniques including, but not limited to, DOM-based XSS, JavaScript hooks, etc.
It is possible that a link's default action is suppressed by an onclick handler and some JavaScript does something else instead, but it is not possible that you end up on http://mydomain.com/en/#home if http://mydomain.com/en/ does not work.
However, to solve your problem you probably have to adjust your Apache rewrite rule (or enable mod_rewrite in the first place?) so that it also captures URLs with a trailing slash.
The URLs http://mydomain.com/en/ and http://mydomain.com/en are different things and could serve completely different pages.
I would strongly recommend not letting this become a mess: set up a strict permanent redirect from one to the other. Which one you choose as the primary is up to you.
I prefer a trailing slash and could supply arguments for that, but they can easily be countered by arguments for the opposite. You should find plenty of discussion on that if you search for trailing slash here.
To solve your problem, try to find the relevant RewriteRule, copy it, and add it one more time with a trailing slash. See whether that works, then add a redirect to the URL without the trailing slash.
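For illustration only, a minimal sketch of that idea, assuming mod_rewrite is enabled; the rewrite target is a placeholder, since your actual rules are not shown:

RewriteEngine On
# existing rule (illustrative), matching only the URL without a trailing slash
RewriteRule ^/en$ /your-backend/en [L]
# the same rule duplicated so the trailing-slash form is served as well
RewriteRule ^/en/$ /your-backend/en [L]

Once that works, you could instead redirect the trailing-slash form permanently to the canonical URL:

RewriteRule ^/en/$ /en [R=301,L]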
You may also edit your question and post your server config to get help with that.

Related

Use Apache to Rewrite URLs with Database Parameters as Nice URLs

For years our database driven websites have had URLs that look like:
https://www.example.com/product?id=30
But nowadays, and especially for SEO purposes, we want our URLs to be "nice" and look like:
https://www.example.com/30/myproduct
We use Zope 2.13.x running on Debian and using Apache 2.4 as the front-end webserver. I know that not too many people use Zope, but utilizing Apache's mod_rewrite we should be able to proxy the rewrite and have nice URLs that still pass the database arguments necessary in order to properly serve the pages to the end users.
There used to be a Zope Cookbook where I wrote a bunch of really detailed tutorials on Zope functionality, but that no longer seems to exist, so I wanted to share this with the SE community.
The awesome thing is that this is not specific to Zope: it will/should work for rewriting any parameter-based URL into a nice URL, and it's super easy once it's all working.
For complete transparency, I am going to answer my own question so that it's documented for everyone.
Using the rewrite engine in Apache, decide how you want your URLs to look to the end user in their web browser.
For example, if you are querying a database and have a URL that looks like
https://www.example.com/products?id=30&product_name=myproduct
but you want that URL to look like
https://www.example.com/products/30/myproduct
you would use a rewrite rule as follows:
RewriteRule ^/products/(.*)/(.*) /products?id=$1&product_name=$2 [L,P,NE,QSA]
To explain that further:
^/products/(.*)/(.*) is saying that anytime domain.com/products is accessed, look for two variables in the next path segments, i.e. /(.*)/(.*)
If you only wanted one variable you would do ^/products/(.*)
Likewise if you wanted three variables you would do ^/products/(.*)/(.*)/(.*)
From there we need to tell Apache how to interpret that URL in order to rewrite and still allow Zope (or whatever db you may be using) to pass the correct URL parameters. That part is:
/products?id=$1&product_name=$2
Apache will now take the first (.*) and treat that as $1. It will take the second (.*) and treat that as $2 and so on.
The part in the brackets is extremely important:
L = This makes Apache stop processing the rewrite ruleset if the rule matches. This is important because you don't want Apache to get confused and start trying other rewrites.
P = Proxy the request. This makes sure that the browser does not display a different URL than https://www.example.com/products/30/myproduct (i.e. we do not want the end user seeing the rewritten URL https://www.example.com/products?id=30&product_name=myproduct).
NE = No Escaping of URL characters. You need this to ensure that the rewrite does not try to escape special characters like ?, = and &, as these are important to URL parameters.
QSA = Query String Append. This appends the original request's query string to the rewritten URL, so additional variables (or URL parameters) still get passed through.
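For context, a minimal sketch of how that rule might sit in the server configuration; the hostname is a placeholder, and the [P] flag additionally requires mod_proxy to be enabled:

<VirtualHost *:80>
    ServerName www.example.com
    RewriteEngine On
    # map the nice URL onto the parameterised URL the application expects
    RewriteRule ^/products/(.*)/(.*) /products?id=$1&product_name=$2 [L,P,NE,QSA]
</VirtualHost>

In a real Zope setup the substitution usually points at the Zope backend (e.g. a localhost port), rather than a relative path, but the pattern and flag mechanics are the same.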
Please Note: It is very important to consider how you want your URLs to look (the nice URLs) because that is what you want to submit to the search engines. If you change your URL structure, those nice URLs will no longer work and your search engine rankings may decrease.

Hiding the word "joomla" from a script in contact form

Whenever I create a contact form in my Joomla! 3.3.6, a script appears in the page's HTML code that contains the word Joomla several times. I'd like to replace those Joomla occurrences with another word (e.g. Foo) for security reasons. I'd like to know whether or not I'm able to do so, and how.
That script is:
<script>(function(){var strings={"JLIB_FORM_FIELD_INVALID":"\u0641\u06cc\u0644\u062f \u0646\u0627\u0645\u0639\u062a\u0628\u0631:&#160"};if(typeof Joomla=='undefined'){Joomla={};Joomla.JText=strings;}
else{Joomla.JText.load(strings);}})();</script>
I have no idea whether a plugin or an extension creates it or not.
Thank you
Regards
This script seems to be translating some text required for the form to use in its JavaScript, e.g. validation messages. It does this using a JavaScript version of JText, which is part of core Joomla. There is some info on how that works here. Weirdly, there seems to be little information about it in the official Joomla documentation.
The main JText function it is calling appears here: media/system/js/core.js
I'm sure it would be possible to write a plug-in to remove this script before the page is rendered and then to translate any untranslated text with your own scripts. However, I'm not sure I see any security benefit in doing this so it seems a waste of time.
Ultimately, someone sniffing a site for what it is built in is far more likely to see if core files exist by going direct to places like media/system/js/core.js, rather than to scan the code for the word "Joomla" - which would trigger a lot of false-positives (any site which just mentions Joomla) and negatives (any page which doesn't have a form on it). It also does not reveal the version of Joomla, which is the info a hacker would more likely be after.
I think you have to search for the script (e.g. via Notepad++) in the whole directory. It must be a plugin for the contact form that has some inline script in it.
Also, do you use any special third-party plugin or the like? That might be the source of it.
PS: I had a similar experience. I don't know exactly how I got rid of those words, but like you, I wanted to hide the fact that I'm using Joomla, for security.
It's actually Joomla itself that adds this, from the file: Joomlainstall/libraries/joomla/document/html/renderer/head.php
And it is loaded globally from:
Joomlainstall/libraries/cms/html/formbehavior.php
A developer adds that code by using the JText function, for example:
JText::_( 'COM_CONTACT_EMAIL_FORM' )
In my case it was the ContactUs Form plugin that added the JavaScript. If JText is not used, the script is not loaded: when I disabled the plugin, the JavaScript was no longer loaded. If you have that plugin enabled, maybe try another contact form?
From a security standpoint this is bad programming by the Joomla developers, for sure.

Wordpress URL Change on submit

I currently have a website I'm working on that I have taken over from another individual. I dumped his SQL file into my database and everything seems to be OK apart from one thing: whenever I try to log in to the back end, or if I try to go anywhere else, it adds an additional .co.uk to the address in the address bar, like so:
From: www.domain.co.uk to www.domain.co.uk.co.uk
I've had a dig in the database but I really can't find anything, and I've never faced this issue before. Could anyone shed some light on this for me, or just let me know where I could look within the database to identify the problem? Many thanks.
Take a look at the .htaccess file in the root folder, which is hidden and may contain rewrite rules.
Also, I recommend you use this plugin for migrations:
http://wordpress.org/extend/plugins/wp-migrate-db/
I use it whenever I move from localhost to a live site and vice versa. It will also ensure your widgets are preserved, since doing a plain find-and-replace will break the serialised object syntax WordPress uses.
After migrating, you need to visit Settings > Permalinks so that the .htaccess file gets updated with the rewrite rules for the new URL.
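For reference, the standard rewrite block that WordPress writes to .htaccess on a default install looks like this; if what you find in that file differs significantly (for example hard-coded redirects containing the doubled .co.uk), that is worth investigating:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

The doubled domain itself usually comes from the siteurl and home values stored in the database, which is exactly what the migration plugin above rewrites for you.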

What's a good way to protect a link database from automatic scrapers?

I have a large link database, that I would want to protect against others who would want to copy them. Is there anything I can do other than force people to enter a CAPTCHA before each link?
You can output the links using ROT13, and then use JavaScript to put them back to normal.
This way, scrapers must support JavaScript in order to steal your links, which should cut down on the number of eligible scrapers.
Bonus points: replace ROT13 with something harder, and obfuscate your 'decode' JavaScript.
The JavaScript suggestion could work, but you would render your page inaccessible to those using assistive technologies like screen readers, as well as to anyone without JavaScript.
Another possible option would be to generate a cryptographic nonce. This technique is currently used to protect against CSRF attacks, but could also be used to ensure that the scraper would have to request a page from your site before accessing a link. This approach may not be appropriate if you support hotlinking, but if you just want to make sure that someone went to your site first, it could work.
Another somewhat ghetto option would be to use referrers. These can easily be faked, but it might prevent some of the dumber scrapers. This also requires that you know where your users came from before they hit your site.
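As a rough illustration of the referrer idea, assuming the protected links live under a hypothetical /links/ path on www.example.com, an Apache rule could refuse requests that do not carry your own site as the referrer:

RewriteEngine On
# refuse requests for the link pages whose Referer is not our own site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule ^/links/ - [F]

Note that this also blocks visitors whose browsers send no Referer header at all, which is part of why this is only a "somewhat ghetto" option.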
Can you let us know if you are hotlinking or if the user comes to your site before going to the protected link? We might be able to provide better advice that way.

Is it always safe to remove a trailing slash from a URL?

I'm storing URLs in a database, and I want to be able to know if two URLs are identical.
Generally, a trailing slash doesn't change the response you'd get from a server (e.g. http://www.google.com/ is the same as http://www.google.com).
Can I always blindly remove the trailing slash from any URL, without looking at anything?
Is that safe?
What I mean by "without looking at anything" is that I'd remove the slash from:
http://www.google.com/q?xxx=something&yyy=something/
I know the web server could theoretically return completely different things if it wanted, and I know sometimes going to a URL without the slash will redirect to one with the slash. My only intention here is determining if both URLs are the same.
Is this method safe?
No, it is not always safe. A web server can interpret the path part of the URL any way it likes. You cannot tell what it will do (how it resolves the URI) without issuing a GET or HEAD on the URL.
It may be safe in the sense that you'll get the same response with or without a trailing slash (and I can't guarantee that's true), but they can definitely mean different things. Consider a URL that references a directory, or something presented by the site as a directory. Using the URL
http://www.somesite.com/directory/
...makes it clear you're asking for a directory. If you hack off the trailing slash:
http://www.somesite.com/directory
...the site's going to take this as a request for a file called "directory", and get all confused for a moment. It'll likely interpret this as a request for a directory, but the meanings are not the same, and you might not get what you expect.
See this article for more detail.
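Incidentally, Apache's handling of exactly that case is configurable via mod_dir: by default, a request for /directory that matches a real directory gets a 301 redirect to /directory/, and that behaviour can be switched off, after which the two forms can genuinely differ:

# default: redirect /directory to /directory/ when the directory exists
DirectorySlash On
# if disabled, /directory and /directory/ may behave completely differently
# DirectorySlash Off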
No. I've encountered situations where, depending on the settings in a .htaccess file, some directories or "clean URLs" (such as those generated by a CMS) could not be accessed without a trailing slash. It's rare and it might be a mistake on the part of the webmaster, but it can happen.
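As a hypothetical illustration of how that can happen, a CMS rewrite rule in .htaccess like the following only matches the trailing-slash form, so stripping the slash produces a 404 rather than the same page (the path and query parameter are made up):

RewriteEngine On
# matches /articles/some-post/ but NOT /articles/some-post
RewriteRule ^articles/([^/]+)/$ index.php?post=$1 [L,QSA]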
As others have noted, it's not always safe. If it will work for you, my recommendation is to store the URLs with the slashes, and strip them off when you do your comparison. You'll take a performance hit, but I'd think that's better than sending someone to the wrong web page.
