Sanitize::html with accents CakePHP2.0 - cakephp

I was using this to save my data into the Database:
$this->request->data['Post']['body'] = utf8_decode($this->request->data['Post']['body']);
Like that, i could save into my DB some "special" characters like the ones with accents:
áéíóú without any problem.
Then, i tried to protect my application from HTML and SQL injections and i used Sanitization like this:
$this->request->data['Post']['body'] = Sanitize::html($this->request->data['Post']['body']);
So now my text is stored on the database like this:
á = á
é = é
í = í ...etc
And i dont want that. Also, my field on the DB has a maximun of characters and this doesn't help.
I have also tried to use the options param at sanitize with encode = true, encode = false or encode = 'utf8' but nothing seems to change.
What should i do?
Thanks.
UPDATE 1
I have also tried to use htmlentities function on my controller but it inserts this in my database instead of á:
Ã

If you're using Cake to save your data (i.e., using save() not query()) then you are protected against SQL injection. It escapes values automatically.
As stated in the docs, Sanitize::html() will convert characters to HTML entities, such as >, á, etc. You probably shouldn't use it unless you specifically want HTML entities. Cake will take care of storing your accents in the database just fine if you have the proper encoding on your app and tables.

Related

How to make Text::slug() convert german umlauts properly?

I am using CakePHP 3.6, and when I am using words with german umlauts like:
Text::slug('Grundstücke')
I will get:
Grundstucke (where ü = u)
but that's not correct, I should get:
Grundstuecke (where ü = ue)
Is there an option to set so that umlauts are being converted the way I want them to?
Change your transliterator
The Text::slug() uses internally transliterator_transliterate (see php doc).
So you need to change the default transliterator that is being used.
After some research I found one that will work for you.
At the end of your bootstrap.php file add:
\Cake\Utility\Text::setTransliteratorId( 'de-ASCII; Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove ');
Then your text will be converted as you expect.
Notes
Resources I've used to find this answer:
CakekPHP Text::transliterate()
Transliteration Identifiers Documentation
transliterator_list_ids - to get a list of valid identifiers - this is how I found the one that finally worked: de-ASCII
Text Utility API - to set the a new default transliterator id.

Need to create a copy of a Database to a location not under Data

I want to create a copy of a database to a folder not under the data either locally or on a server I have code that looks like this:
var arcName:String = "C:\Archive\MyArchives\SomeName.nsf"
var arcDB:NotesDatabase = appDB.createCopy("", arcName);
When the action finishes (it does not generate any errors) I can't find the database anywhere. if I change the arcName to "Archives\Myarchives\SomeName,nsf" the process works correctly. But I don't want these Archives under Data.
Using the full path does not seem to make it move out from under the Data folder.
This may be a case of string escaping - in SSJS, like most C-lineage languages, \ is the escape character. Give it a shot with \\ in place of each. In my testing, it works as database.createCopy("", "C:\\Archive\\MyArchives\\SomeName.nsf").

How to query atom field with unicode value in Google App Engine production search?

I wrote some text search with use Google App Engine search.
In SDK I tested such query on atom field:
u'tag:"wartości"'
In production I run the same query but it not works on same data.
How can I do unicode query on atom field?
Is it possible to use unicode in Google App Engine search?
We are aware of this issue and plan to fix ASAP. The fix that we're currently planning will require that the atom field value include exactly the same accent characters in order to match. Matches will continue to be case-insensitive. We expect that at least initially, values that use combining diacritical marks will be treated as different values than those using precomposed characters. We may revisit that decision depending on feedback, but it's the most straightforward fix on our end.
For more on the precomposed characters vs. combining diacritical marks, see this Wikipedia article:
http://en.wikipedia.org/wiki/Precomposed_character
Chris
It looks that I need translate AtomField values into new string and I need to translate queries too. This workaround will allow only Polish unicode search. I do not know tonkenization rules so I use 'q', 'x' to expand alphabet since not used in Polish.
# coding=utf-8
translate = {
u'ą': u'aq',
u'Ą': u'Aq',
u'ć': u'cq',
u'Ć': u'Cq',
u'ę': u'eq',
u'Ę': u'Eq',
u'ł': u'lq',
u'Ł': u'Lq',
u'ń': u'nq',
u'Ń': u'Nq',
u'ó': u'oq',
u'Ó': u'Oq',
u'ś': u'sq',
u'Ś': u'Sq',
u'ż': u'zx',
u'Ż': u'Zx',
u'ź': u'zq',
u'Ź': u'Zq',
}
import re
reTranslate = re.compile(u'(%s)' % u'|'.join(translate))
print reTranslate.pattern
test = u"""\
Właściwie prowadzona komunikacja wewnętrzna w firmie,\
zwłaszcza dużej czy posiadającej rozproszoną sieć oddziałów,\
może przynieść oszczędność czasu, a co za tym idzie, również pieniędzy."""
print reTranslate.sub(lambda match: translate[match.group(0)], test)

Different coding sets in database and website

I have a website with very simple news system (posting, editting, deleting etc). All my html pages are saved in UTF-8 formatting, everything displayes correctly.
I specify using UTF in every header:
For saving news to database, I use simple scripts like (all values come from a html form):
$newsTitel = isset($_POST['title']) ? $_POST['title'] : 'Untitled';
$submitDate = $date = date('Y/m/d');
$content = isset($_POST['newstext']) ? $_POST['newstext'] : 'No content';
include 'includes/dbconnect.php';
mysql_query("SET CHARACTER SET utf8");
mysql_query("SET NAMES 'utf8'");
$query = mysql_query("INSERT INTO news SET date='$submitDate',subject='$newsTitel',news='$content'");
The data get saved to database but in a weird format (coding). There are characters like à ¡ Ä etc which makes the content almost unreadable. Other problem is that when loading this content back to html forms (for editting news) it displays in this weird coding. When I looked into the specification of the database I use, it says that it saves data in UTF-8.
I use phpMyAdmin to access the MYSQL database.
So to sum it up:
Pages: saved in UTF8, all have correct header
Database: interaction with the server: utf8_czech_ci, tables in the same format
What I do not understand at all is this strange bevaior:
1) I save the data into the database using the script above
2) I take a look into phpMyAdmin and see broken encoding
3) I load the data back into my website and display them using this:
<?php
include 'includes/dbconnect.php';
$data = mysql_query("SELECT * FROM news ORDER BY id DESC limit 20") or die(mysql_error());
while($info = mysql_fetch_array( $data ))
{
echo '<article><h3> '.$info['subject'].'</h3><div id="date">'.$info['date'].'</div>';
echo '<p>'.$info['news']. '</p></article>';
}
?>
The encoding is correct and no weird characters are displayed.
4) I load the exact same data into a html form (for edition purposes) and see the same broken encoding as in the database.
What happened? I really dont get it. I tried fixing this by re-saving everything in utf8, alterign tables and changing their encodings into different utf8 versions etc...
This is example of a data I pass to the database (it is in czech with html tags):
<p>Vařila myšička kašičku</p>
<img src="someImage.jpg">
<p>Další text</p>
Thanks for any help...
The commands for specifying the character set should be:
set names 'utf8';
If you check the result returned from your queries at the moment, what does it say? If I try it in the monitor I get the following:
mysql> set names 'UTF-8';
ERROR 1115 (42000): Unknown character set: 'UTF-8'
Have you tried using set names 'utf8' before connecting for the SELECT as well? The characters you're saying are output make me think you're getting back the correct bytes for UTF-8, but they're being interpreted as ISO-8859-1.
You are not escaping single quotes or some other html chars.
Use mysql_real_escape_string.
$newsTitel = isset($_POST['title']) ? mysql_real_escape_string($_POST['title']) : 'Untitled';

Google Datastore problem with query on *User* type

On this question I solved the problem of querying Google Datastore to retrieve stuff by user (com.google.appengine.api.users.User) like this:
User user = userService.getCurrentUser();
String select_query = "select from " + Greeting.class.getName();
Query query = pm.newQuery(select_query);
query.setFilter("author == paramAuthor");
query.declareParameters("java.lang.String paramAuthor");
greetings = (List<Greeting>) query.execute(user);
The above works fine - but after a bit of messing around I realized this syntax in not very practical as the need to build more complicated queries arises - so I decided to manually build my filters and now I got for example something like the following (where the filter is usually passed in as a string variable but now is built inline for simplicity):
User user = userService.getCurrentUser();
String select_query = "select from " + Greeting.class.getName();
Query query = pm.newQuery(select_query);
query.setFilter("author == '"+ user.getEmail() +"'");
greetings = (List<Greeting>) query.execute();
Obviously this won't work even if this syntax with field = 'value' is supported by JDOQL and it works fine on other fields (String types and Enums). The other strange thing is that looking at the Data viewer in the app-engine dashboard the 'author' field is stored as type User but the value is 'user#gmail.com', and then again when I set it up as parameter (the case above that works fine) I am declaring the parameter as a String then passing down an instance of User (user) which gets serialized with a simple toString() (I guess).
Anyone any idea?
Using string substitution in query languages is always a bad idea. It's far too easy for a user to break out and mess with your environment, and it introduces a whole collection of encoding issues, etc.
What was wrong with your earlier parameter substitution approach? As far as I'm aware, it supports everything, and it sidesteps any parsing issues. As far as the problem with knowing how many arguments to pass goes, you can use Query.executeWithMap or Query.executeWithArray to execute a query with an unknown number of arguments.

Resources