How to tell Watson conversation to not recognize strings as numbers - ibm-watson

I'm facing a strange issue with IBM Watson Conversation when capturing numbers in Spanish:
In Spanish, when you write (or say) "please give me an answer" (por favor, dame una respuesta) or "I want to talk with a professional" (quiero hablar con un profesional), Watson recognizes the words "una" and "un" as numbers. Yes, each can be a number (the number 1), but in these phrases they do not carry a numeric meaning; they work as articles.
Do you know how to tell Watson not to recognize such strings as numbers? I have been thinking about patterns, but the numbers can have different lengths.

According to the official documentation, the @sys-number system entity detects numbers that are written using either numerals or words. In either case, a numeric value is returned.
When you enable the @sys-number system entity, it always tries to detect whether the user typed a number. These are the recognized formats:
21
twenty one (in your case, works with un, una, etc)
3.13
You can see this table showing how to use this entity with other examples:
So Watson will recognize these values (un, una) as the number one, and there is currently no exception list or configuration option to stop it from recognizing a specific word typed by the user, as in your example.
If for some reason you want to send the literal 'una' or 'un' back to the user, just add this to your conversation response:
The number is @sys-number.literal
And the bot will reply:
The number is un?
See more about the @sys-number system entity.
See more about System entities.
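Since the service itself offers no exclusion list, one possible workaround is to post-filter the detected entities in your own application code. The following is only a minimal sketch: it assumes the message response carries an `entities` list whose items have `entity`, `value` and a `location` pair of character offsets into the input text (check the response format of your API version). The `ARTICLES` set and the function name are hypothetical.

```python
# Hypothetical post-filter: drop sys-number matches whose literal text is a Spanish article.
ARTICLES = {"un", "una"}


def drop_article_numbers(text, entities):
    """Return the entities without sys-number hits that literally read 'un' or 'una'."""
    kept = []
    for ent in entities:
        start, end = ent["location"]  # assumed: [start, end) offsets into text
        literal = text[start:end].lower()
        if ent["entity"] == "sys-number" and literal in ARTICLES:
            continue  # treat it as an article, not as the number 1
        kept.append(ent)
    return kept


text = "por favor, dame una respuesta"
entities = [{"entity": "sys-number", "location": [16, 19], "value": "1"}]
print(drop_article_numbers(text, entities))  # prints []
```

The trade-off is that a genuine "un" or "una" meaning one is dropped as well, so this only suits dialogs where users are expected to type quantities as digits.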

How to encode FNC1 starting character to make GS1 Datamatrix?

I have built the following string for a GS1 DataMatrix:
è010506060985000521sn1234567890ab 1002TRIAL003 17200228
ASCII 232
(01) Product Code (aka GTIN)
(21) Serial Number
ASCII 29 (aka Group Separator)
(10) Lot/Batch
ASCII 29 (aka Group Separator)
(17) Expiry Date
I am passing this string to a DevExpress control, with the symbology set to DataMatrix and the compatibility mode set to ASCII.
This barcode scans correctly as GS1 DataMatrix, but when I sent the string to our printing contact in China, he printed it and scanning his barcode gives me the error "Unknown encoding".
I think their system is not able to encode ASCII 232 ("è").
Is there an alternative way?
I am simply replacing the FNC1 start character, ASCII 232, with ASCII 29. Is that the correct way? Is the result a GS1 DataMatrix?
(I scanned it with one mobile app, where it comes up as GS1 DataMatrix, but when I scanned it with another app it comes up as plain DataMatrix.)
I want to achieve GS1 DataMatrix...
Thanks
This issue is totally dependent on the hardware used. The way to indicate the FNC1 character may differ between printer families/types. Do you have info on which one is used in your case?
First, your printing partner should check the label he is creating himself (there is a GS1 smartphone app that makes this easy), so he can see directly whether the expected information is present and correctly encoded.
Then, you should check which printer type he is using and which software is used to create the printer mask/job. I know lots of people use NiceLabel, for example, but I remember that some issues with the FNC1 character can show up if you are using certain recent Zebra printers. This is something the printer's after-sales support can probably help with if it's something similar.
[EDIT]: In case of doubt this can help, but you probably have it already.
Based on what you said, your side is acting as the scanner, so check chapter 2.2.1: "Important: In accordance with ISO/IEC 15424 - Data Carrier Identifiers (including Symbology Identifiers), the Symbology Identifier is the first three characters transmitted by the scanner indicating symbology type. For a GS1 DataMatrix the symbology identifier is ]d2"
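To check on the receiving side whether a scan was decoded as GS1 DataMatrix, the symbology identifier quoted above can be tested directly. This is a rough sketch under two assumptions: the scanner is configured to transmit the symbology identifier, and the separators between variable-length fields are sent as ASCII 29. The function names are hypothetical, and how the leading FNC1 is expressed is left to the encoder or printer software.

```python
GS = chr(29)  # ASCII 29, group separator after variable-length fields


def is_gs1_datamatrix(scan):
    """True if the scanner output starts with the GS1 DataMatrix identifier ]d2."""
    return scan.startswith("]d2")


def build_element_string(elements):
    """Join (ai, value, variable_length) tuples into one data string.

    A GS is added after each variable-length field that is not the last one.
    The leading FNC1 is NOT added here; it depends on the encoder/printer.
    """
    parts = []
    for i, (ai, value, variable) in enumerate(elements):
        parts.append(ai + value)
        if variable and i < len(elements) - 1:
            parts.append(GS)
    return "".join(parts)


payload = build_element_string([
    ("01", "05060609850005", False),  # GTIN, fixed length
    ("21", "sn1234567890ab", True),   # serial number, variable length
    ("10", "02TRIAL003", True),       # lot/batch, variable length
    ("17", "200228", False),          # expiry date, fixed length
])
print(repr(payload))
print(is_gs1_datamatrix("]d2" + payload))  # True
```

If the printed label comes back without the ]d2 prefix, the symbol was most likely encoded as a plain DataMatrix, which would match what the second mobile app reports.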

What kind of slot to be used to catch a random piece of text?

I am trying to build an Amazon Alexa skill. One of its intents needs a text string, which can be any random word (including names). I have to run a database search using that word. How do I go about solving this?
I have followed the suggestion given in the accepted answer of this question - Amazon Alexa - How to create Generic Slot . But the skill is not able to read the word (or anything that sounds like it). It just identifies the intent but the slot has confirmationStatus = NONE.
You can use AMAZON.SearchQuery to capture less-predictable input that makes up the search query.
You can find more details at https://developer.amazon.com/docs/custom-skills/slot-type-reference.html#amazonsearchquery
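Once the slot is declared, the captured text arrives in the request JSON under the slot's `value` key. Here is a small sketch of reading it, assuming a slot named `query` and an intent named `SearchIntent` (both hypothetical) and a plain dictionary as the incoming event:

```python
def get_slot_value(event, slot_name):
    """Return the spoken value of a slot from an Alexa IntentRequest, or None."""
    slots = event.get("request", {}).get("intent", {}).get("slots") or {}
    return (slots.get(slot_name) or {}).get("value")


# Hypothetical request body, trimmed to the fields used here.
event = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "SearchIntent",
            "confirmationStatus": "NONE",
            "slots": {
                "query": {
                    "name": "query",
                    "value": "john smith",
                    "confirmationStatus": "NONE",
                }
            },
        },
    }
}
print(get_slot_value(event, "query"))  # john smith
```

As far as I know, `confirmationStatus` only reflects whether a confirmation dialog has taken place, so `NONE` on its own does not mean the slot is empty; a missing `value` key does. Also check the linked reference for the sample-utterance requirements of this slot type.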

Why always store shortnames for emojis?

On EmojiOne's Github page, they say:
When storing user inputted text in your database, [...] you should always make sure you are storing text containing only :shortnames: and not Unicode emoji characters [...].
Why would storing the Unicode characters always be a bad idea? If my server language, my database, and the browser versions supported by my web app can all handle them without difficulty, where is the problem?
After a few days of reading about emoji and looking at emojione, my conclusion is that I would not store shortcodes (what emojione calls shortnames) in your storage mechanism/db. As suggested by a comment on your question, simply store the character itself (the emoji: 😀) in your database.
For those just starting to learn what emoji are, they are simply another character in the unicode standard. Letters, numbers, exclamation points, Japanese characters, etc etc are all characters part of the unicode standard. There's nothing special about emoji, you can simply think of them like any other unicode character. All modern browsers will render them correctly.
The primary reason I would not store shortcodes is simplicity. By storing the actual unicode character 😀, you don't have to do any type of conversion when displaying the character to a user. If you were to store the shortcode, in this case that's :grinning:, you'll have to do some sort of conversion to correctly display a grinning face to the user.
Emojione's library has the ability to convert either the unicode itself or shortcodes to their images. Given that, just store unicode and use emojione's library to convert them before displaying to the user. If you want to stop using emojione in the future and use the standard browser implementation of emoji, you'll have no extra work to do.
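To make the difference concrete, here is a toy sketch of the extra conversion step that stored shortcodes require. The two-entry `SHORTCODES` table and the function name are made up for illustration; a real table would come from the emoji library you use.

```python
# Hypothetical two-entry shortcode table, for illustration only.
SHORTCODES = {
    ":grinning:": "\U0001F600",
    ":heart:": "\u2764\ufe0f",
}


def shortcodes_to_unicode(text):
    """Replace every known :shortcode: in text with its Unicode emoji."""
    for code, char in SHORTCODES.items():
        text = text.replace(code, char)
    return text


# Stored as Unicode: the text survives a UTF-8 round trip unchanged.
stored_unicode = "Hello \U0001F600"
assert stored_unicode.encode("utf-8").decode("utf-8") == stored_unicode

# Stored as shortcode: a lookup is needed before the text can be displayed.
stored_shortcode = "Hello :grinning:"
print(shortcodes_to_unicode(stored_shortcode) == stored_unicode)  # True
```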

What are the values for the 'country' field in Active Directory?

I have a feeling Windows expects 'country' to be an integer, with 0 meaning 'US'. If that's the case, what's the mapping between integers and ISO 2-letter country codes?
There are three different properties that must be set in Active Directory. Each is designated in the ISO 3166 standard. The ISO website has a search tool that you can use to find the official codes. Select Country codes and hit search, then click on Officially assigned... on the left.
c: 2-letter abbreviation (e.g. US)
The country/region in the address of the user. The country/region is represented as a 2-character code based on ISO-3166.
co: Country name (e.g. United States). Microsoft got really detailed on their description for this one.
The country/region in which the user is located.
countryCode: Numeric ID (e.g. 840)
Specifies the country/region code for the user's language of choice. This value is not used by Windows 2000.
Note: If you want to clear the country field, then you need to set this value to 0. You cannot set it to null or String.Empty. It will throw a DirectoryServicesCOMException stating "The server is unwilling to process the request" when you call CommitChanges() if you try to set it to anything other than an int.
DirectoryEntry.Properties["countryCode"].Value = 0;
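The relationship between the three attributes can be sketched as a small lookup. The table below is only an illustrative subset of ISO 3166-1, and the function name is hypothetical; the resulting values would still have to be written to the directory with whatever client you use.

```python
# Illustrative subset of ISO 3166-1: alpha-2 code -> (country name, numeric code).
ISO_3166 = {
    "US": ("United States", 840),
    "GB": ("United Kingdom", 826),
    "DE": ("Germany", 276),
    "ES": ("Spain", 724),
}


def country_attributes(alpha2):
    """Return the values for the c, co and countryCode attributes of one country."""
    code = alpha2.upper()
    name, numeric = ISO_3166[code]
    # countryCode must stay an integer; per the note above, 0 is used to clear it.
    return {"c": code, "co": name, "countryCode": numeric}


print(country_attributes("us"))
# {'c': 'US', 'co': 'United States', 'countryCode': 840}
```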
See this link here:
ISO 3166 Country Codes
Seems to be standard ISO 3166 country codes used in several places.
Same result from this post here: Active Directory and .NET
Point 5 reads:
5. Set user's country
To set the country property for a user was one of the tasks that took me some time to figure out. After some hours of research I realized that you need to know the ISO 3166 Codes for countries and set three properties to define a user's country: c, co, and countryCode.
Best overview that includes the elusive ISO 3166 numeric codes can be found on Wikipedia - of course! (at ISO itself, you can't seem to get those lists for free - you have to pay for the privilege....)
There are two country properties, countryCode and c; both are ISO 3166 values. The former is a number and the latter a string (ISO 3166 A2).
See ISO 3166.
Also, there's the co property, which is the name of the country.

Twitter name length in DB

I'm adding a field to a member table for twitter names for members on a site. From what I can work out the maximum twitter name length is 20 so it seems obvious that I should set the field size to varchar(20) (SQL Server).
Is this a good idea?
What if Twitter starts allowing multi-byte characters in the user names? Should I make this field nvarchar?
What if Twitter decides to increase the size of a username? Should I make it 50 instead and then warn a user if they enter a name longer than 20?
I'm trying to code defensively so that I can reduce the chances of modifying the code around this input field and the DB schema changes that might be needed.
While looking for the same info, I found the following in a sort of weird place in the Twitter help section (why not in the API docs? who knows?):
"Your user name can contain up to 15 characters. Why no more? Because we append your user name to your 140 characters on outgoing SMS updates and IM messages. If your name is longer than 15 characters, your message would be too long to send in a single text message."
http://help.twitter.com/entries/14609-how-to-change-your-username
So perhaps one could even get away with varchar(16).
While new accounts have a limit of 15 characters for the username and 20 characters for the name, for old accounts the limit seems to be undefined. The documentation here states:
Earlybirds: Early users of Twitter may have a username or real name longer than user names we currently allow. This is ok until you need to save changes to your account settings. No changes will save unless your user/real name is the appropriate length; this means you have to change your real name/username to meet our most modern regulations.
So you are probably better off having a long field and saving yourself some time when you hit the edge cases.
Nowadays, space is usually not a concern, so I'd use a mostly generic approach: use nvarchar(200).
When designing DB schemas you must think 2 steps ahead, even more than when programming. Or get yourself a good schema update strategy, then you'll be fine also with varchar(20).
Personally I wouldn't worry. Use something like 200 (or a nice round number like 256) and you won't have this problem. The limit then is on their API, so you might be best to do some verification that it is a real username anyway. That verification implicitly includes the length checking.
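A sketch of what such an application-side check could look like while the column itself stays generous. The 15-character limit is the one quoted from Twitter's help page earlier in this thread; the allowed character set (letters, digits, underscore) is my assumption and is not stated anywhere here, and the function name is hypothetical.

```python
import re

MAX_LEN = 15  # limit quoted from Twitter's help page; early accounts may exceed it
USERNAME_RE = re.compile(r"^[A-Za-z0-9_]+$")  # assumed character set


def is_plausible_username(name, max_len=MAX_LEN):
    """Check length and characters of a Twitter username, ignoring one leading @."""
    if name.startswith("@"):
        name = name[1:]
    return 0 < len(name) <= max_len and bool(USERNAME_RE.match(name))


print(is_plausible_username("@chatrbyte"))  # True
print(is_plausible_username("a" * 16))      # False
```

Keeping the limit in one constant means a change on Twitter's side only touches this check, not the schema.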
Twitter allows for 140 characters to be typed in as the message payload for transmission, and includes "[username]:" at the beginning of the SMS message. With an upper limit of 140 characters for the message combined with the messaging system being based on SMS, I think they would have to decrease the allowable message size to increase the username. I think it is a pretty safe bet that 20 characters would be the max username length. I'd use nvarchar just in case someone uses 16-bit characters, and maybe pad it a little. nvarchar(24) should work; I wouldn't go any higher than nvarchar(32).
If you're going to develop an app for their service, you should probably watch the messages on Twitter's API Announcements mailing list.
[opinion only]
Twitter works over SMS, where a single message is limited to 160 characters, so the name has to be short to avoid eating into the message.
nvarchar would be a good idea for all twitter text
If the real ID of a Twitterer is a cell-phone then the longest phone number is your max - 20 should easily cover it!
Defensive programming is always good :) !
[/opinion only]
There's only so much you can code defensively; I'd suggest looking at the Twitter API documentation and following anything specified there. That said, from a cursory look through it, nowhere does it seem to specify the length of the username, annoyingly :/
One thing to keep in mind here is that a field using nvarchar needs twice as much space, since it needs 2 bytes to store each potential Unicode character. So, a Twitter status would need 280 bytes using nvarchar, PLUS some more for possible retweets, as those aren't included in the 140-character limit. I discovered this just today in fact!
For example:
RT @chatrbyte: here's some great tweet that I'm retweeting.
The RT @chatrbyte: is not included in the 140-character limit.
So, assuming that a Twitter username has a 20-character limit, and wanting to also capture a retweet, a field to hold a full tweet would need 280 + 40 (for the username) + 8 (for the initial "RT @" before a retweet) + 4 (for the colon and space after the retweeted username) = 332 bytes. Note that the n in nvarchar(n) counts characters rather than bytes, so this corresponds to 166 characters.
I would say go for nvarchar(350) to give yourself a little room. That's what I am trying right now. If I'm wrong I'll update here.
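The sum is easy to re-check in a few lines. This sketch assumes the figures used in this answer (140-character status, 20-character username, 2 bytes per character); the function name is hypothetical.

```python
BYTES_PER_CHAR = 2      # nvarchar stores 2-byte code units
TWEET_CHARS = 140       # status length limit assumed in this answer
USERNAME_CHARS = 20     # username limit assumed in this answer


def retweet_field_size():
    """Return (characters, bytes) needed to hold 'RT @username: ' plus a full tweet."""
    prefix = "RT @"     # 4 characters before the username
    separator = ": "    # colon and space after the username
    chars = TWEET_CHARS + USERNAME_CHARS + len(prefix) + len(separator)
    return chars, chars * BYTES_PER_CHAR


print(retweet_field_size())  # (166, 332)
```

Because the declared length of an nvarchar column is given in characters, the first number is the one to compare against the column definition.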
I'm guessing you are validating the Twitter name field somewhere in your application, not just in the database. If you open the field up to 200 characters, a change to the limit only has to be made in one place (the application code), and if you already allow users to enter Twitter names longer than 20 characters, you don't have to worry about a change at all.
