What kind of slot should be used to catch a random piece of text? - alexa

I am trying to build an Amazon Alexa skill. In it, one of the intents needs a text string. It can be any random word (including names). I have to do a database search using that word. How do I go about solving this?
I have followed the suggestion given in the accepted answer of this question: Amazon Alexa - How to create Generic Slot. But the skill is not able to read the word (or anything that sounds like it). It just identifies the intent, but the slot has confirmationStatus = NONE.

You can use AMAZON.SearchQuery to capture less-predictable input that makes up the search query.
You can find more details at https://developer.amazon.com/docs/custom-skills/slot-type-reference.html#amazonsearchquery
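Here is a minimal sketch of what this can look like, in Python. The intent name `SearchIntent`, the slot name `query` and the sample utterances are hypothetical. The interaction-model fragment is written as a Python dict rather than in the console's JSON editor. Each sample pairs the slot with a carrier phrase, which the slot-type reference above asks for with AMAZON.SearchQuery. The helper reads the captured text out of an IntentRequest envelope so the skill can pass it on to the database search.

```python
# Hypothetical interaction-model fragment (names are illustrative): one intent
# with a single AMAZON.SearchQuery slot; every sample utterance starts with a
# carrier phrase before the free-form slot.
SEARCH_INTENT = {
    "name": "SearchIntent",
    "slots": [{"name": "query", "type": "AMAZON.SearchQuery"}],
    "samples": ["search for {query}", "look up {query}"],
}


def get_slot_value(envelope, slot_name):
    """Return the text Alexa captured for slot_name, or None if it is missing."""
    intent = envelope.get("request", {}).get("intent", {})
    slot = intent.get("slots", {}).get(slot_name, {})
    # confirmationStatus ("NONE", "CONFIRMED", "DENIED") tracks explicit
    # confirmation by the user; "NONE" is the normal state and does not say
    # whether a value was captured. The captured text is in "value".
    return slot.get("value")


# Trimmed IntentRequest envelope, for illustration only.
example_request = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "SearchIntent",
            "confirmationStatus": "NONE",
            "slots": {
                "query": {
                    "name": "query",
                    "value": "blue whale",
                    "confirmationStatus": "NONE",
                }
            },
        },
    }
}

print(get_slot_value(example_request, "query"))  # blue whale
```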

Related

How to tell Watson conversation to not recognize strings as numbers

I'm facing a strange issue with IBM Watson Conversation when capturing numbers in Spanish:
In Spanish, when you write (or say) "please give me an answer" (por favor, dame una respuesta) or "I want to talk with a professional" (quiero hablar con un profesional), Watson recognizes the words "una" and "un" as numbers. Yes, they are numbers (the number 1), but in these phrases they do not have the meaning of a number; they work as articles.
Do you know how to tell Watson not to recognize strings as numbers? I have been thinking about patterns, but the numbers can have different lengths.
According to the official documentation, the @sys-number system entity detects numbers that are written using either numerals or words. In either case, a numeric value is returned.
When you enable the @sys-number system entity, it always tries to detect whether the user typed a number. These are the recognized formats:
21
twenty one (in your case, this also covers un, una, etc.)
3.13
The documentation has a table showing how to use this entity, with other examples.
So Watson will recognize these values (un, una) as a number, and there is currently no exception or configuration option to stop it from recognizing a given word typed by the user, as in your example.
If you want to send the user the literal text that was matched (for example 'una' or 'un'), just add this to your conversation response:
The number is @sys-number.literal
And the bot will return:
The number is un
See more about the @sys-number system entity.
See more about system entities.
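There is no setting for this. However, if your application calls the service directly, one hypothetical workaround is to post-process the detected entities in your own code. The sketch below assumes each entity in the response is a dict with `entity`, `value` and `location` fields, where `location` holds [start, end) character offsets into the input text; treat that response shape as an assumption. It recovers the matched text, roughly what `.literal` gives you inside the dialog, and drops sys-number matches whose text is a Spanish article.

```python
# Assumed response shape: each detected entity is a dict with "entity",
# "value" and "location" ([start, end) character offsets into the input).
SPANISH_ARTICLES = {"un", "una", "unos", "unas"}


def numeric_mentions(input_text, entities):
    """Return sys-number values whose matched text is not a Spanish article."""
    values = []
    for ent in entities:
        if ent.get("entity") != "sys-number":
            continue
        start, end = ent["location"]
        # Recover the matched text from the offsets (roughly what .literal shows).
        literal = input_text[start:end].strip().lower()
        if literal in SPANISH_ARTICLES:
            continue  # "dame una respuesta": an article, not a quantity
        values.append(ent["value"])
    return values


text = "por favor, dame una respuesta y 3 opciones"
i, j = text.index("una"), text.index("3")
detected = [
    {"entity": "sys-number", "value": "1", "location": [i, i + 3]},
    {"entity": "sys-number", "value": "3", "location": [j, j + 1]},
]
print(numeric_mentions(text, detected))  # ['3']
```

Keep in mind that "una" can still mean "one" ("quiero una"), so this trades one kind of error for another.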

How do I store a signature block, including formatting, in a Sql server table?

I've been assigned the task of creating a table that stores an email signature for each username. The question is, how should I store the signature block? I could use a regular varchar type, but then how do I store the formatting metadata?
Any ideas or suggestions would be welcome.
Thanks!
Another idea I had was that you could design a specific email signature template, then let people specify fields such as username, quote, avatar, alignment, etc., and have them modify their signature in a "signature editor". This way you could store just the "data" and not the rendering, so you could store something like the following:
<signature>
<username>chama</username>
<avatar href="http://url to my image"/>
<quote>A bird in the hand is not in the nest</quote>
</signature>
And it could look something like:
Chama
A bird in the hand is not in the nest
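Here is a sketch of the rendering side in Python. It assumes the XML layout above, with a placeholder avatar URL. The stored data is turned into HTML only when the signature is displayed, so the presentation lives in one template rather than in every row. The capitalize-and-bold styling is just an illustrative choice.

```python
import html
import xml.etree.ElementTree as ET


def render_signature(xml_text):
    """Render the stored signature data as a small HTML fragment."""
    root = ET.fromstring(xml_text)
    lines = []
    avatar = root.find("avatar")
    if avatar is not None and avatar.get("href"):
        lines.append(f'<img src="{html.escape(avatar.get("href"))}" alt="">')
    username = root.findtext("username", default="")
    if username:
        # The template, not the stored data, decides presentation.
        lines.append(f"<strong>{html.escape(username.capitalize())}</strong>")
    quote = root.findtext("quote", default="")
    if quote:
        lines.append(f"<em>{html.escape(quote)}</em>")
    return "<br>\n".join(lines)


stored = (
    "<signature><username>chama</username>"
    '<avatar href="http://example.com/avatar.png"/>'
    "<quote>A bird in the hand is not in the nest</quote></signature>"
)
print(render_signature(stored))
```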
Use varchar(max), or whatever length limit is appropriate.
Otherwise, the only real concern is that you might want to make sure the HTML is HTML-encoded before you put it in the database (i.e., replace < with &lt;, etc.). Not sure what you're using, but some tools have a setting so you don't have to do it manually.
Other things you can do in addition to HTML-encoding:
1) Restrict the formatting tags to a predefined set (i.e., search/replace tags you don't want before doing the insert). You can manage this in your DB stored procedure or, better yet, in your front end (if you have control over that).
2) Reject attempts to insert data that include certain tags (like '<script>').
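Both options can be handled in the front end with Python's standard `html.parser`. This sketch uses a hypothetical allowlist of tags. Disallowed tags are stripped (option 1) and also reported, so the caller can refuse the insert instead (option 2). It drops all attributes and is meant as an illustration, not a vetted HTML sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; adjust to the formatting your signatures need.
ALLOWED_TAGS = {"b", "i", "u", "em", "strong", "p", "br"}
VOID_TAGS = {"br"}                  # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # disallowed tags whose text is dropped too


class SignatureSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.rejected = set()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
        if tag not in ALLOWED_TAGS:
            self.rejected.add(tag)
        elif not self._skip_depth:
            # Attributes are discarded, so no style or on* handlers survive.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data))  # re-encode <, >, & and quotes


def sanitize_signature(fragment):
    """Return (cleaned_html, rejected_tags); reject the insert if the set is non-empty."""
    parser = SignatureSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out), parser.rejected


print(sanitize_signature("<b>Chama</b><script>alert(1)</script><br/>Hi"))
```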
HTML, RTF, XML: there are several standard choices.
Note: an "email signature" is NOT a "digital signature". The term digital signature has a specific meaning: for email, it is a signature that makes sure the message comes from the real sender and has not been tampered with.
I'd suggest going with your initial thought: varchar(max). This will allow you to store ASCII-based signatures, including plain-text, RTF, or HTML ones.
If users want to embed images (i.e., not a link to an image), then you'd have to determine a way for the caller to convert those images to Base64 (or another text encoding) before storing them, and back again after reading them from your table.
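Here is a minimal Python sketch of that conversion. It assumes the images are embedded as data URIs inside an HTML signature; the helper names are hypothetical.

```python
import base64


def image_to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data URI that can live in a varchar column."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def data_uri_to_image(data_uri):
    """Decode a data URI produced by image_to_data_uri back into raw bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


raw = b"\x89PNG\r\n\x1a\n fake image bytes"
uri = image_to_data_uri(raw)
assert data_uri_to_image(uri) == raw  # round trip is lossless
```

Base64 turns every 3 bytes into 4 characters, so embedded images grow by about a third. Factor that into whatever length limit you choose.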
Based on what I'm finding, you have basically two options:
1) Convert your formatted signature data to Binary and store it as a BLOB.
2) Instead of saving the signature itself in the DB, save them as files somewhere and store a reference to that file location in the DB.

How can I use SQL Server's full text search across multiple rows at once?

I'm trying to improve the search functionality on my web forums. I've got a table of posts, and each post has (among other less interesting things):
PostID, a unique ID for the individual post.
ThreadID, an ID of the thread the post belongs to. There can be any number of posts per thread.
Text, because a forum would be really boring without it.
I want to write an efficient query that will search the threads in the forum for a series of words, and it should return a hit for any ThreadID for which there are posts that include all of the search words. For example, let's say that thread 9 has post 1001 with the word "cat" in it, and also post 1027 with the word "hat" in it. I want a search for cat hat to return a hit for thread 9.
This seems like a straightforward requirement, but I don't know of an efficient way to do it. Using the regular FREETEXT and CONTAINS capabilities for N'cat AND hat' won't return any hits in the above example because the words exist in different posts, even though those posts are in the same thread. (As far as I can tell, when using CREATE FULLTEXT INDEX I have to give it my index on the primary key PostID, and can't tell it to index all posts with the same ThreadID together.)
The solution that I currently have in place works, but sucks: maintain a separate table that contains the entire concatenated post text of every thread, and make a full text index on THAT. I'm looking for a solution that doesn't require me to keep a duplicate copy of the entire text of every thread in my forums. Any ideas? Am I missing something obvious?
As far as I can see, there is no "easy" way of doing this.
I would create a stored procedure that splits up the search words, searches for the first word, and puts the matching ThreadIDs in a table variable. Then you look for the other words (if any) only within the ThreadIDs you just collected (inner join).
If you're interested, I can write a few bits of code, but I'm guessing you won't need it.
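Here is the logic of that stored procedure, modeled in Python with in-memory data. Each per-word lookup stands in for a CONTAINS query restricted to the threads collected so far. The `(post_id, thread_id, text)` tuples and the regex tokenizer are simplifications for illustration, not how SQL Server's full-text engine works.

```python
import re


def words_in(text):
    """Lower-cased word tokens: a crude stand-in for the full-text word breaker."""
    return set(re.findall(r"\w+", text.lower()))


def threads_matching_all(posts, search_words):
    """ThreadIDs whose posts, taken together, contain every search word.

    posts: iterable of (post_id, thread_id, text) tuples.
    """
    posts = list(posts)
    candidates = None
    for word in search_words:
        word = word.lower()
        # Threads with at least one post containing this word (one CONTAINS query in SQL).
        hits = {thread_id for _, thread_id, text in posts if word in words_in(text)}
        # Keep only threads that also matched every earlier word (the inner join step).
        candidates = hits if candidates is None else candidates & hits
        if not candidates:
            break  # no thread can match all the words any more
    return candidates or set()


posts = [
    (1001, 9, "My cat sleeps all day"),
    (1027, 9, "Nice hat!"),
    (2001, 10, "Only a cat here"),
]
print(threads_matching_all(posts, ["cat", "hat"]))  # {9}
```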
What are you searching for?
CAT HAT as a complete phrase, in which case:
CONTAINS(*, '"CAT HAT"')
CAT OR HAT, then:
CONTAINS(*, 'CAT OR HAT')
Searching for "CAT HAT" and expecting just the post with CAT in it doesn't make any sense. If the problem is parsing what the user types, you could just replace spaces with OR (to search for any of the words), or with AND if all of them are required. The OR will give you both posts for thread 9.
SELECT DISTINCT ThreadId
FROM Posts
WHERE CONTAINS(*, 'CAT OR HAT')
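Replacing spaces with OR can be done in application code before the query is sent. In the minimal Python sketch below, the function name is hypothetical. Each term is wrapped in double quotes so that a typed word like "and" is not read as an operator. The result is meant to be passed as a query parameter for the CONTAINS search condition rather than concatenated into the SQL text.

```python
def to_contains_condition(user_input, operator="OR"):
    """Turn 'cat hat' into '"cat" OR "hat"' for use as a CONTAINS search condition."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    # Strip embedded double quotes so a term cannot break out of its quoted form.
    terms = [t.replace('"', "") for t in user_input.split()]
    terms = [t for t in terms if t]
    if not terms:
        raise ValueError("no search terms")
    return f" {operator} ".join(f'"{t}"' for t in terms)


print(to_contains_condition("cat hat"))  # "cat" OR "hat"
```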
Better still, you could use the excellent Irony parser (http://irony.codeplex.com/), which translates (parses) a search string into a full-text query. It might help you.
It requires Google syntax for the original search, which can only be a good thing, as most people are used to typing Google searches.
Plus here is an article on how to use it.
http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/

Twitter name length in DB

I'm adding a field to a member table for twitter names for members on a site. From what I can work out the maximum twitter name length is 20 so it seems obvious that I should set the field size to varchar(20) (SQL Server).
Is this a good idea?
What if Twitter starts allowing multi-byte characters in the user names? Should I make this field nvarchar?
What if Twitter decides to increase the size of a username? Should I make it 50 instead and then warn a user if they enter a name longer than 20?
I'm trying to code defensively so that I can reduce the chances of modifying the code around this input field and the DB schema changes that might be needed.
While looking for the same info, I found the following in a sort of weird place in the Twitter help section (why not in the API docs? who knows?):
"Your user name can contain up to 15 characters. Why no more? Because we append your user name to your 140 characters on outgoing SMS updates and IM messages. If your name is longer than 15 characters, your message would be too long to send in a single text message."
http://help.twitter.com/entries/14609-how-to-change-your-username
So perhaps one could even get away with varchar(16).
While new accounts have a limit of 15 characters in the username and 20 characters in the name, for old accounts this limit seems to be undefined. The documentation here states:
Earlybirds: Early users of Twitter may have a username or real name longer than user names we currently allow. This is ok until you need to save changes to your account settings. No changes will save unless your user/real name is the appropriate length; this means you have to change your real name/username to meet our most modern regulations.
So you are probably better off having a long field and saving yourself some time when you hit the border cases.
Nowadays, space is usually not a concern, so I'd use a mostly generic approach: use nvarchar(200).
When designing DB schemas, you must think two steps ahead, even more than when programming. Alternatively, get yourself a good schema update strategy; then you'll be fine with varchar(20) as well.
Personally, I wouldn't worry. Use something like 200 (or a nice round number like 256) and you won't have this problem. The limit is then on their API, so you might be best off doing some verification that it is a real username anyway. That verification implicitly includes the length check.
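Here is a sketch of that idea in Python. The column stays generous, and the current rule lives in application code, in one place, with a warning rather than a rejection when a name falls outside it. The 15-character limit comes from the help text quoted above. The allowed character set (ASCII letters, digits, underscore) and the 50-character column are assumptions for illustration.

```python
import re

# Assumption: the current rule for new accounts is 1-15 characters drawn from
# ASCII letters, digits and underscore. The column itself is deliberately wider.
CURRENT_RULE = re.compile(r"[A-Za-z0-9_]{1,15}")
COLUMN_LENGTH = 50  # hypothetical nvarchar(50) column


def normalize_twitter_name(raw):
    """Return (name, follows_current_rule); raise only if it cannot be stored."""
    name = raw.strip().lstrip("@")
    if not name or len(name) > COLUMN_LENGTH:
        raise ValueError(f"Twitter name must be 1-{COLUMN_LENGTH} characters")
    follows_current_rule = CURRENT_RULE.fullmatch(name) is not None
    return name, follows_current_rule


print(normalize_twitter_name("@jack"))  # ('jack', True)
```

A caller can store the name either way and only warn the user when `follows_current_rule` is False, which also covers earlybird accounts with longer names.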
Twitter allows for 140 characters to be typed in as the message payload for transmission, and includes "[username]:" at the beginning of the SMS message. With an upper limit of 140 characters for the message combined with the messaging system being based on SMS, I think they would have to decrease the allowable message size to increase the username. I think it is a pretty safe bet that 20 characters would be the max username length. I'd use nvarchar just in case someone uses 16-bit characters, and maybe pad it a little. nvarchar(24) should work; I wouldn't go any higher than nvarchar(32).
If you're going to develop an app for their service, you should probably watch the messages on Twitter's API Announcements mailing list.
[opinion only]
Twitter works over SMS, and the limit there is 160 characters, so the name has to be short to avoid eating into the message.
nvarchar would be a good idea for all Twitter text.
If the real ID of a Twitter user is a cell phone, then the longest phone number is your max; 20 should easily cover it!
Defensive programming is always good :) !
[/opinion only]
There's only so much you can code defensively. I'd suggest looking at the Twitter API documentation and following anything specified there. That said, from a cursory look, nowhere seems to specify the length of the username, annoyingly :/
One thing to keep in mind here is that a field using nvarchar needs twice as much space, since it needs 2 bytes to store each potential Unicode character. So a Twitter status would need a size of 280 using nvarchar, PLUS some more for possible retweets, as those aren't included in the 140-character limit. I discovered this just today, in fact!
For example:
RT @chatrbyte: here's some great tweet
that I'm retweeting.
The "RT @chatrbyte:" prefix is not included in the 140-character limit.
So, assuming that a Twitter username has a 20-character limit, and wanting to also capture a retweet, a field to hold a full tweet would need to be an nvarchar of size 280 + 40 (for the username) + 8 (for the initial "RT @" before a retweet) + 4 (for the colon and space after a retweet username) = 332.
I would say go for nvarchar(350) to give yourself a little room. That's what I am trying right now. If I'm wrong I'll update here.
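The estimate above can be reproduced in a few lines of Python. The component sizes are the ones the answer assumes: 20-character usernames and 2 bytes per character. The sum comes to 166 characters, or 332 bytes.

```python
# Reproduces the retweet-size estimate from the answer above.
# Assumption (from the answer): 2 bytes of storage per character.
BYTES_PER_CHAR = 2

components = {
    "tweet body": 140,
    "retweeted username": 20,
    "'RT @' prefix": 4,
    "': ' after the username": 2,
}

total_chars = sum(components.values())
total_bytes = total_chars * BYTES_PER_CHAR

# UTF-16 uses 2 bytes per character for BMP text such as accented letters.
assert len("é".encode("utf-16-le")) == 2

print(total_chars, total_bytes)  # 166 332
```

One thing worth checking before sizing the column: SQL Server declares nvarchar(n) lengths in byte-pairs. So for BMP text, n counts characters rather than bytes, and a 166-character estimate would correspond to nvarchar(166).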
I'm guessing you are managing data entry for the Twitter name field in your application somewhere other than just in the database. If you open the field to 200 characters, you only have to change the validation code in one place when Twitter changes its limit; and if you already allow users to enter Twitter names with more than 20 characters, you don't have to worry about a change at all.

Make SQL Server index small numbers

We're using SQL Server 2005 in a project. The users of the system have the ability to search some objects by using 'keywords'. The way we implement this is by creating a full-text catalog for the significant columns in each table that may contain these 'keywords' and then using CONTAINS to search for the keywords the user inputs in the search box in that index.
So, for example, let's say you have the Movie object, and you want to let the user search for keywords in the title and plot of the movie; then we'd index both the Title and Plot columns, and then do something like:
SELECT * FROM Movies WHERE CONTAINS(Title, keywords) OR CONTAINS(Plot, keywords)
(It's actually a bit more advanced than that, but nothing terribly complex)
Some users are adding numbers to their search, so for example they want to find 'Terminator 2'. The problem here is that, as far as I know, by default SQL Server won't index short words, thus doing a search like this:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"')
is actually equivalent to doing this:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator"') -- notice the missing '2'
and we are getting a plethora of spurious results.
Is there a way to force SQL Server to index small words? Preferably, I'd rather index only numbers like 1, 2, 21, etc. I don't know where to define the indexing criteria, or even if it's possible to be as specific as that.
Well, I did that, removed the "noise-words" from the list, and now the behaviour is a bit different, but still not what you'd expect.
When I search for "Terminator 2" (I'm just making this up, since my employer might not be happy if I disclosed what we are doing; the terms are a bit different, but the principle is the same), I don't get anything, even though I know there are objects containing both words.
Maybe I'm doing something wrong? I removed all numbers 1 ... 9 from my noise configuration for ENG, ENU and NEU (neutral), regenerated the indexes, and tried the search.
These "small words" are considered "noise words" by the full text index. You can customize the list of noise words. This blog post provides more details. You need to repopulate your full text index when you change the noise words file.
I knew about the noise words file, but I'm not sure why your "Terminator 2" example is still giving you issues. You might want to try asking this on the MSDN Database Engine forum, where people who specialize in this sort of thing hang out.
You can combine CONTAINS (or CONTAINSTABLE) with simple where conditions:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"') and Title like '%Terminator 2%'
While CONTAINS finds every 'Terminator' title, the LIKE condition in the WHERE clause eliminates 'Terminator 1'.
Of course, the engine is smart enough to start with the CONTAINS, not the LIKE condition.
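To see why the combination helps, here is a small Python model of the two stages. It is only an approximation:

- `NOISE_WORDS` is a hypothetical noise list that contains the digits.
- The word splitting is a crude stand-in for SQL Server's word breaker.
- Lower-casing imitates a case-insensitive collation.

```python
import re

# Hypothetical noise list: single digits plus a few common words.
NOISE_WORDS = {str(n) for n in range(10)} | {"a", "the", "of"}


def words(text):
    return re.findall(r"\w+", text.lower())


def fulltext_candidates(titles, phrase):
    """Stand-in for CONTAINS once noise words are dropped from the phrase."""
    significant = [w for w in words(phrase) if w not in NOISE_WORDS]
    return [t for t in titles if all(w in words(t) for w in significant)]


def search(titles, phrase):
    """CONTAINS narrows the candidates, then the LIKE '%phrase%' step filters them."""
    needle = phrase.lower()
    return [t for t in fulltext_candidates(titles, phrase) if needle in t.lower()]


titles = ["Terminator", "Terminator 2: Judgment Day", "Terminator 3", "Rise of the Machines"]
print(fulltext_candidates(titles, "Terminator 2"))  # three Terminator titles
print(search(titles, "Terminator 2"))  # ['Terminator 2: Judgment Day']
```

One caveat of the LIKE step: '%Terminator 2%' also matches a title such as 'Terminator 23'. It narrows the results rather than guaranteeing an exact phrase match.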
