Why always store shortnames for emojis?

On EmojiOne's GitHub page, they say:
When storing user inputted text in your database, [...] you should always make sure you are storing text containing only :shortnames: and not Unicode emoji characters [...].
Why is it always a bad idea? If my server language, my database, and the browser versions supported by my web app can all handle them without difficulty, where is the problem?

After a few days of reading about emoji and looking at emojione, my conclusion is that I would not store shortcodes (what emojione calls shortnames) in your storage mechanism/DB. As suggested by a comment on your question, simply store the character itself (the emoji: 😀) in your database.
For those just starting to learn what emoji are: they are simply characters in the Unicode standard. Letters, numbers, exclamation points, Japanese characters, etc. are all characters that are part of the Unicode standard. There's nothing special about emoji; you can think of them like any other Unicode character. All modern browsers will render them correctly.
The primary reason I would not store shortcodes is simplicity. By storing the actual Unicode character 😀, you don't have to do any kind of conversion when displaying the character to a user. If you were to store the shortcode (in this case :grinning:), you'd have to do some sort of conversion to correctly display a grinning face to the user.
Emojione's library can convert either the Unicode characters themselves or shortcodes to its images. Given that, just store the Unicode and use emojione's library to convert it before displaying to the user. If you want to stop using emojione in the future and rely on the standard browser rendering of emoji, you'll have no extra work to do.
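As a quick illustration of that point, here is a minimal Python sketch (nothing emojione-specific, just the standard library): an emoji is an ordinary Unicode code point and round-trips through UTF-8 like any other text.

s = "I'm happy \U0001F600"           # U+1F600 GRINNING FACE, i.e. 😀
encoded = s.encode("utf-8")           # the bytes a UTF-8 text column would store
assert encoded.decode("utf-8") == s   # nothing is lost on the way back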

Related

How to encode FNC1 starting character to make GS1 Datamatrix?

I have made a string for GS1 DataMatrix:
è010506060985000521sn1234567890ab 1002TRIAL003 17200228
ASCII 232
(01) Product Code (aka GTIN)
(21) Serial Number
ASCII 29 (aka Group Separator)
(10) Lot/Batch
ASCII 29 (aka Group Separator)
(17) Expiry Date
I am passing this string to a DevExpress control – symbology set to DataMatrix and compatibility mode set to ASCII.
This barcode scans correctly as GS1 DataMatrix, but when I sent this string to our printing partner in China, he printed it, and when I scan his barcode I get the error "Unknown encoding".
I think their system is not able to encode ASCII 232 – "è".
Is there any alternative way?
I am just replacing the FNC1 start character ASCII 232 with ASCII 29. Is that the correct way? Is it GS1 DataMatrix?
(When I scan it in one mobile app it comes up as GS1 DataMatrix, but when I scan it in another app it just comes up as DataMatrix.)
I want to achieve GS1 Datamatrix...
Thanks
This issue is totally dependent on the hardware used. The way to indicate the FNC1 character may differ between printer families/types. Do you have info on which one is used in your case?
First, your printing partner should check the label he's creating himself (there is an easy-to-use GS1 smartphone app for that), so he can directly see whether the expected information is present and well encoded.
Then, you should check which printer type he is using and which software is used to create the print mask/job. I know lots of people use NiceLabel, for example, but I remember that issues with the FNC1 character can come up on some recent Zebra printers. This is something the printer's after-sales support can probably help with if it's something similar.
[EDIT]: In case of doubt this can help, but you probably have it already.
Based on what you said, your part is acting like a scanner, so check chapter 2.2.1 => Important: In accordance with ISO/IEC 15424 - Data Carrier Identifiers (including Symbology Identifiers), the Symbology Identifier is the first three characters transmitted by the scanner indicating symbology type. For a GS1 DataMatrix the symbology identifier is ]d2.
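For what it's worth, here is a hedged Python sketch of how the element string from the question breaks down. It assumes the generating software inserts the leading FNC1 itself when told the symbol is GS1 DataMatrix and accepts GS (ASCII 29) as the separator after non-final variable-length fields; how FNC1 itself must be signalled (è / ASCII 232, a tilde sequence, etc.) really does depend on the printer/software, as noted above.

GS = "\x1d"  # ASCII 29, Group Separator

gtin   = "01" + "05060609850005"   # AI (01), fixed length: no separator needed
serial = "21" + "sn1234567890ab"   # AI (21), variable length: terminate with GS
lot    = "10" + "02TRIAL003"       # AI (10), variable length: terminate with GS
expiry = "17" + "200228"           # AI (17), fixed length, last element

data = gtin + serial + GS + lot + GS + expiry

# A scanner that reports the symbology identifier should prefix its output
# with "]d2" for a GS1 DataMatrix (see the quote above).
def looks_like_gs1_datamatrix(scanned):
    return scanned.startswith("]d2")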

Android Studio: Where should I put my Strings?

Where should I put my Strings?
I need many relatively long Strings. Only one String is displayed at a time, selected by a switch/case. Should I save the Strings in the code (in a separate class), in an array, in an SQL DB, or in strings.xml?
I am pretty new to Android, but from what I understand, res -> values -> strings.xml is for text or content descriptions set in your layout.xml.
If it were me, depending on the importance of the data, I would either store it in a class if there is not too much information, or more than likely use SQLite.

PostgreSQL: unable to save special character (regional language) in blob

I am using PostgreSQL 9.0 and am trying to store a bytea file which contains certain special characters (regional-language characters, UTF-8 encoded), but I am not able to store the data as entered by the user.
For example, this is what I get in the request while debugging:
<sp_first_name_gu name="sp_first_name_gu" value="ઍયેઍ"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
This is what is stored in the DB:
<sp_first_name_gu name="sp_first_name_gu" value="\340\252\215\340\252\257\340\253\207\340\252\215"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
Note the difference in the value attribute. Because of this issue I am not able to retrieve the proper text entered by the user.
Please suggest what I need to do.
PS: My DB is UTF8 encoded.
The value is stored correctly, but is escaped into octal escape sequences upon retrieval.
To fix that, change the settings of the DB driver or choose a different encoding/escaping for bytea.
Or just use proper field types for the XML data - like varchar or XML.
Your string \340\252\215\340\252\257\340\253\207\340\252\215 is exactly ઍયેઍ in octal encoding, so Postgres stores your data correctly. PostgreSQL escapes all non-printable characters; for more details see the PostgreSQL documentation, especially section 8.4.2.
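To see this for yourself, here is a minimal Python sketch (purely illustrative): the octal escapes are just the UTF-8 bytes of the Gujarati text, so decoding them recovers the original string.

raw = b"\340\252\215\340\252\257\340\253\207\340\252\215"  # bytes as shown in the DB
print(raw.decode("utf-8"))  # prints 'ઍયેઍ'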

How to escape special characters when retrieving data from database?

I am going to generate an XML file based on data returned from SQL Server, but there are some special characters, like U+001C and U+001F (there may be other characters like these), which make the XML invalid.
Is there any way to escape them?
Thanks!
The control characters U+001C (file separator) and U+001F (unit separator) are not legal to include in an XML 1.0 document, whether verbatim or encoded using a &#...; numeric character reference.
They are allowed in XML 1.1 documents only when included as a character reference. However, XML 1.1 is not nearly as widely accepted as 1.0, and you can't have U+0000 (null) even as a character reference, so it's still not possible to put arbitrary binary data in an XML file — not that it was ever a good idea.
If you want to include data bytes in an XML file you should generally be using an ad hoc encoding of your own that is accepted by all consumers of your particular type of document. It is common to use base64 for the purpose of putting binary data into XML. For formats that do not accommodate any such special encoding scheme, you simply cannot insert these control characters.
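If the control characters carry no meaning and can simply be dropped, or if the binary data should be wrapped in base64 as suggested above, a minimal Python sketch (the function names are mine, not from any particular library) could look like this:

import base64
import re

# XML 1.0 forbids all code points below U+0020 except tab, LF and CR.
_ILLEGAL_XML10 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def strip_illegal_xml_chars(text):
    # Drop characters such as U+001C / U+001F before building the XML.
    return _ILLEGAL_XML10.sub("", text)

def wrap_binary_for_xml(data):
    # Encode arbitrary bytes as base64 so they survive inside an element.
    return base64.b64encode(data).decode("ascii")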
What is the purpose of the control characters?
The exact same way you're escaping any other user-supplied input prior to insertion into a database; probably one of (from worst to best):
Escaping control characters prior to construction of an SQL statement
Use of parameterised queries
Use of a DAO or ORM which abstracts this problem away from you
Use parametrized queries and you won't have to worry about escaping. Can't really give you more help as to how to use them unless you mention which language you're using.
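As a minimal illustration of a parameterised query (using Python's standard-library sqlite3 here; the table and column names are made up), the driver handles quoting for you:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

user_input = "text containing \x1c and \x1f control characters"
# The ? placeholder lets the driver bind the value safely; no manual escaping.
conn.execute("INSERT INTO notes (body) VALUES (?)", (user_input,))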
Well, I just use the pattern matching stuff to replace those special characters manually. Match for '&#.+?;'

Any python/django function to check whether a string only contains characters included in my database collation?

As expected, I get an error when entering some characters not included in my database collation:
(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
Is there any function I could use to make sure a string only contains characters existing in my database collation?
thanks
You can use a regular expression to allow only certain characters. The following allows only letters, numbers, and _ (underscore), but you can change it to include whatever you want:
import re

# Allow only ASCII letters, digits and the underscore
exp = '^[A-Za-z0-9_]+$'
re.match(exp, my_string)  # my_string is the user-supplied text to validate
If a match object is returned, the string is valid; if it returns None, the string is invalid.
I'd look at Python's unicode.translate() and codec.encode() functions. Both of these would allow more elegant handling of non-legal input characters, and IIRC, translate() has been shown to be faster than a regexp for similar use-cases (should be easy to google the findings).
From Python's docs:
"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."
http://docs.python.org/library/stdtypes.html
http://docs.python.org/library/codecs.html
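As a concrete sketch of the codec approach (assuming, purely for illustration, that the database column uses a Latin-1 collation), characters that cannot be represented can either be rejected or dropped:

text = u"héllo ☃"

# Strict mode raises UnicodeEncodeError if any character is outside Latin-1,
# which you can catch to reject the input.
try:
    text.encode("latin-1")
except UnicodeEncodeError:
    print("string contains characters not representable in latin-1")

# Alternatively, silently drop the offending characters.
cleaned = text.encode("latin-1", "ignore").decode("latin-1")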
