I have a website with very simple news system (posting, editting, deleting etc). All my html pages are saved in UTF-8 formatting, everything displayes correctly.
I specify using UTF in every header:
For saving news to database, I use simple scripts like (all values come from a html form):
$newsTitel = isset($_POST['title']) ? $_POST['title'] : 'Untitled';
$submitDate = $date = date('Y/m/d');
$content = isset($_POST['newstext']) ? $_POST['newstext'] : 'No content';
include 'includes/dbconnect.php';
mysql_query("SET CHARACTER SET utf8");
mysql_query("SET NAMES 'utf8'");
$query = mysql_query("INSERT INTO news SET date='$submitDate',subject='$newsTitel',news='$content'");
The data get saved to database but in a weird format (coding). There are characters like à ¡ Ä etc which makes the content almost unreadable. Other problem is that when loading this content back to html forms (for editting news) it displays in this weird coding. When I looked into the specification of the database I use, it says that it saves data in UTF-8.
I use phpMyAdmin to access the MYSQL database.
So to sum it up:
Pages: saved in UTF8, all have correct header
Database: interaction with the server: utf8_czech_ci, tables in the same format
What I do not understand at all is this strange bevaior:
1) I save the data into the database using the script above
2) I take a look into phpMyAdmin and see broken encoding
3) I load the data back into my website and display them using this:
<?php
include 'includes/dbconnect.php';
$data = mysql_query("SELECT * FROM news ORDER BY id DESC limit 20") or die(mysql_error());
while($info = mysql_fetch_array( $data ))
{
echo '<article><h3> '.$info['subject'].'</h3><div id="date">'.$info['date'].'</div>';
echo '<p>'.$info['news']. '</p></article>';
}
?>
The encoding is correct and no weird characters are displayed.
4) I load the exact same data into a html form (for edition purposes) and see the same broken encoding as in the database.
What happened? I really dont get it. I tried fixing this by re-saving everything in utf8, alterign tables and changing their encodings into different utf8 versions etc...
This is example of a data I pass to the database (it is in czech with html tags):
<p>Vařila myšička kašičku</p>
<img src="someImage.jpg">
<p>Další text</p>
Thanks for any help...
The commands for specifying the character set should be:
set names 'utf8';
If you check the result returned from your queries at the moment, what does it say? If I try it in the monitor I get the following:
mysql> set names 'UTF-8';
ERROR 1115 (42000): Unknown character set: 'UTF-8'
Have you tried using set names 'utf8' before connecting for the SELECT as well? The characters you're saying are output make me think you're getting back the correct bytes for UTF-8, but they're being interpreted as ISO-8859-1.
You are not escaping single quotes or some other html chars.
Use mysql_real_escape_string.
$newsTitel = isset($_POST['title']) ? mysql_real_escape_string($_POST['title']) : 'Untitled';
Related
I am trying to save category name after convert in slug
So in entity I have used setter for convert my text to slug text
protected function _setName($name)
{
return Text::slug($name);
}
After send post request in input name "আমি তোমায় ভালোবাসি"
Has got in database
ami-tomaya-bhalobasi
After make transliteratorId false
return Text::slug($name,[
'transliteratorId' => false
]);
I got output : আম-ত-ম-য-ভ-ল-ব-স
My expected result is
আমি-তোমায়-ভালোবাসি
How can I get my desire result ?
The whole point of slugs is to obtain a "safe" pure US-ASCII string. If all you seemingly want is to remove white spaces you can use a simple regular expression:
preg_replace('/\s/u', '-', 'আমি তোমায় ভালোবাসি')
However, I recommend you double-check why you think this is necessary in the first place. A properly encoded URL would display spaces as %20 anyway, which is "ugly" in a Latin script text but will get unnoticed in other scripts:
var_dump(rawurlencode('আমি তোমায় ভালোবাসি'));
string(159) "%E0%A6%86%E0%A6%AE%E0%A6%BF%20%E0%A6%A4%E0%A7%8B%E0%A6%AE%E0%A6%BE%E0%A6%AF%E0%A6%BC%20%E0%A6%AD%E0%A6%BE%E0%A6%B2%E0%A7%8B%E0%A6%AC%E0%A6%BE%E0%A6%B8%E0%A6%BF"
When I try to upload a file with apostrophe, I get the error:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
if the file name is test's.pdf, I get the error. But if I change the name to test.pdf, there is no error.
Does anyone know why?
Thanks
I had a similar situation where I was dynamically creating filenames for pages that created excel files from query results. The approach I took was to create a function that replaced all the bad characters with something. Here is part of that function.
<cfargument name="replacementString" required="no" default=" ">
<cfscript>
var inValidFileNameCharacters = "[/\\*'?[\]:><""|]";
return reReplace (arguments.fileNameIn, inValidFileNameCharacters, arguments.replacementString, "all");
</cfscript>
You might want to consider an opposite approach. Instead of declaring invalid characters and replacing them, declare valid ones and replace anything that is not in the list of valid characters.
I suggest making this a function that's available on all appropriate pages. How you do that depends on your situation.
My guess is that the apostrophe is one of those multi-character apostrophes that Microsoft Word often uses. A character like that may not be a valid character for your OS file system.
You may want to re-code the system to use a temporary file on upload and then rename it to a valid file name after the upload is successful.
Here's some basic trouble shooting info.
Wrap your code in a try/catch block and dump the full error to the page output. Examples of using try/catch/dump below. The examples below force an error by dividing by zero.
For tag based cfml:
<cftry>
<cfset offendingCode = 1 / 0>
<cfcatch type="any">
<cfdump var="#cfcatch#" label="cfcatch">
</cfcatch>
</cftry>
For cfscript cfml:
<cfscript>
try {
offendingCode = 1 / 0;
} catch (any e) {
writeDump(var=e, label="Exception");
}
</cfscript>
I was using this to save my data into the Database:
$this->request->data['Post']['body'] = utf8_decode($this->request->data['Post']['body']);
Like that, i could save into my DB some "special" characters like the ones with accents:
áéíóú without any problem.
Then, i tried to protect my application from HTML and SQL injections and i used Sanitization like this:
$this->request->data['Post']['body'] = Sanitize::html($this->request->data['Post']['body']);
So now my text is stored on the database like this:
á = á
é = é
í = í ...etc
And i dont want that. Also, my field on the DB has a maximun of characters and this doesn't help.
I have also tried to use the options param at sanitize with encode = true, encode = false or encode = 'utf8' but nothing seems to change.
What should i do?
Thanks.
UPDATE 1
I have also tried to use htmlentities function on my controller but it inserts this in my database instead of á:
Ã
If you're using Cake to save your data (i.e., using save() not query()) then you are protected against SQL injection. It escapes values automatically.
As stated in the docs, Sanitize::html() will convert characters to HTML entities, such as >, á, etc. You probably shouldn't use it unless you specifically want HTML entities. Cake will take care of storing your accents in the database just fine if you have the proper encoding on your app and tables.
I downloaded a data set which is supposed to be in RDF format http://iw.rpi.edu/wiki/Dataset_1329, using Notepad++ I opened it but can't read it. Any suggestions?
The file, uncompressed, is about 140MB. Notepad++ is probably failing due to the size of the file. The RDF format used in this dataset is Ntriples, one triple per line with three components (subject, predicate, object), very human readable. Sample data from the file:
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/race_other_multi_racial> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/race_black_and_white> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/national_origin_hispanic> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/filed_cases> "1" .
If you want to have a look at the data then try to open it with a tool that streams the file rather than loading it all at once, for instance less or head.
If you want to use the data you might want to look into loading it in a triple store (4store, Virtuoso, Jena TDB, ...) and use SPARQL to query it.
Try Google Refine (possibly with RDF extension: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/ )
I'm working on a data migration task, where I have to export a somewhat large Lotus Notes application into a blogging platform. My first task was to export the articles from Lotus Notes into CSV files.
I created a Agent in LotusScript to export the data into CSV files. I use a modified version of this IBM DeveloperWorks forum post. And it basically does the job. But the contents of the Rich Text field is stripped of any formatting. And this is not what I want, I want the Rich Text field rendered as HTML.
The documentation for the GetItemValue method explicitly states that the text is rendered into plain text. So I began to research for something that would retrieve the HTML. I found the NotesMIMEEntity class and some sample code in the IBM article How To Access HTML in a Rich Text Field Using LotusScript.
But for the technique described in the above article to work, the Rich Text field need to have the property "Store Contents as HTML and MIME". And this is not the case with my Lotus Notes database. I tried to set the property on the fields in question, but it didn't do the trick.
Is it possible to use the NotesMIMEEntity and set the "Store Contents as HTML and MIME" property after the content has been added, to export the field rendered as HTML?
Or what are my options for exporting the Notes database Rich Text fields as HTML?
Bonus information: I'm using IBM Lotus Domino Designer version 8.5
There is this fairly unknown command that does exactly what you want: retrieve the URL using the command OpenField.
Example that converts only the Body-field:
http://SERVER/your%5Fdatabase%5Fpath.nsf/NEW%5FVIEW/docid/Body?OpenField
Here is how I did it, using the OpenField command, see D.Bugger's post above
Function GetHtmlFromField(doc As NotesDocument, fieldname As String) As String
Dim obj
Set obj = CreateObject("Microsoft.XMLHTTP")
obj.open "GET", "http://www.mydomain.dk/database.nsf/0/" + doc.Universalid + "/" + fieldname + "?openfield&charset=utf-8", False, "", ""
obj.send("")
Dim html As String
html = Trim$(obj.responseText)
GetHtmlFromField = html
End Function
I'd suggest looking at Midas' Rich Text LSX (http://www.geniisoft.com/showcase.nsf/MidasLSX)
I haven't used the personally, but I remember them from years ago being the best option for working with Rich Text. I'd bet it saves you a lot of headaches.
As for the NotesMIMEEntity class, I don't believe there is a way to convert RichText to MIME, only MIME to RichText (or retain the MIME within the document for emailing purposes).
If you upgrade to Notes Domino 8.5.1 then you can use the new ConvertToMIME method of the NotesDocument class. See the docs. This should do what you want.
Alternativly the easiest way to get the Domino server to render the RichText will be to actually retrieve it via a url call. Set up a simple form that just has the RichText field and then use your favourite HTTP api to pull in the page. It should then be pretty straight forward to pull out the body.
Keep it simple.
Change the BODY field to Store contents as HTML and MIME
Open the doc in editmode.
Save.
Close.
You can now use the NotesMIMEEntity to get what you need from script.
You can use the NotesDXLExporter class to export the Rich Text and use an XSLT to transform the output to what you need.
I know you mentioned using LotusScript, but if you don't mind writing a small Java agent (in the Notes client), this can be done fairly easily - and there is no need to modify the existing form design.
The basic idea is to have your Java code open a particular document through a localhost http request (which is simple in Java) and to have your code capture that html output and save it back to that document. You basically allow the Domino rendering engine to do the heavy lifting.
You would want do this:
Create a form which contains only the rich-text field you want to convert, and with Content Type of HTML
Create a view with a selection formula for all of the documents you want to convert, and with a form formula which computes to the new form
Create the Java agent which just walks your view, and for each document gets its docid, opens a URL in the form http://SERVER/your_database_path.nsf/NEW_VIEW/docid?openDocument, grabs the http response and saves it.
I put up some sample code in a similar SO post here:
How to convert text and rich text fields in a document to html using lotusscript?
Works in Domino 10 (have not tested with 9)
HTMLStrings$ = NotesRichTextItem .Converttohtml([options] ) As String
See documentation :
https://help.hcltechsw.com/dom_designer/10.0.1/basic/H_CONVERTOHTML_METHOD_NOTESRICHTEXTITEM.html
UPDATE (2022)
HCL no longer support this method since version 11. The documentation does not include any info about the method.
I have made some tests and it still works in v12 but HCL recommended to not use it.
Casper's recommendation above works well, but make sure the ACL is such to allow Anonymous Access otherwise your HTML will be the HTML from your login form
If you do not need to get the Richtext from the items specifically, you can use ?OpenDocument, which is documented (at least) here: https://www.ibm.com/developerworks/lotus/library/ls-Domino_URL_cheat_sheet/
https://www.ibm.com/support/knowledgecenter/SSVRGU_9.0.1/com.ibm.designer.domino.main.doc/H_ABOUT_URL_COMMANDS_FOR_OPENING_DOCUMENTS_BY_KEY.html
OpenDocument also allows you to expand sections (I am unsure if OpenField does)
Syntax is:
http://Host/Database/View/DocumentUniversalID?OpenDocument
But be sure to include the charset parameter as well - Japanese documents were unreadable without specifying utf-8 as the charset.
Here is the method I use that takes a NotesDocument and returns the HTML for the doc as a string.
private string ConvertDocumentToHml(Domino.NotesDocument doc, string sectionList = null)
{
var server = doc.ParentDatabase.Server.Split('/')[0];
var dbPath = doc.ParentDatabase.FilePath;
string viewName = "0";
string documentId = doc.UniversalID.ToUpper();
var ub = new UriBuilder();
ub.Host = server;
ub.Path = dbPath.Replace("\\", "/") + "/" + viewName + "/" + documentId;
if (string.IsNullOrEmpty(sectionList))
{
ub.Query = "OpenDocument&charset=utf-8";
}
else
{
ub.Query = "OpenDocument&charset=utf-8&ExpandSection=" + sectionList;
}
var url = ub.ToString();
var req = HttpWebRequest.CreateHttp(url);
try
{
var resp = req.GetResponse();
string respText = null;
using (var sr = new StreamReader(resp.GetResponseStream()))
{
respText = sr.ReadToEnd();
}
return respText;
}
catch (WebException ex)
{
return "";
}
}