Maximum number of slot values in Alexa skill setup - alexa

What is the maximum number of values that can be added per slot when creating a skill in Alexa?
It looks like the AMAZON.LITERAL slot type was deprecated due to concerns around skills recording raw user conversations as input.
I've got a large number of possible values that could be used as input if users' raw input cannot be recorded.

"A skill can have a total of 50,000 custom slot values, totaled [sic] across all custom slots used in the interaction model."
https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interaction-model-reference#h2_custom_syntax
This limit (and the lack of any ability to change the values on the fly) certainly comes up as a limitation. On the other hand, keep in mind that the value list is just a guide to how Alexa should interpret user input, so the total number of possible inputs is actually much greater than 50K.
For more, you might want to read this article:
Why a Custom Slot is the Literal Solution
https://developer.amazon.com/blogs/post/Tx3IHSFQSUF3RQP/Why-a-Custom-Slot-is-the-Literal-Solution

Related

Firebase - Efficiently query users by geolocation + other filters

I am working on a Flutter app (similar to dating apps with swipeable cards) that shows one user at a time.
I have a Firestore collection called users from which I'd like to query the nearby users first (based on the current user's location). Every user has a GeoPoint (latitude + longitude) and a geohash (8 characters) saved.
What I'd like:
Query the closest users first (20 or 30 users per query call) since I don't want the user to wait a long time.
The area in which users are queried grows bigger each time the web service is called again (since the close-by users have already been shown)
There are a few filters that are applied to the query like: only get users of a certain age group, or certain gender, etc.
I thought about Duncan Campbell's solution of having multiple variants of the geohash per user. For instance, every user would have a geohash of 2 characters, 3 characters, and 4 characters. I'd then search with a where-in clause containing the neighboring geohashes, or the parents of the neighboring geohashes, which should get me the users quickly.
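Roughly, one page of that query would look something like the following (sketched with the Python client purely for illustration; field names like geohash4 and gender are made up):

from google.cloud import firestore

db = firestore.Client()

def nearby_page(neighbor_hashes, gender, page_size=30):
    # neighbor_hashes: the current cell plus its neighbors at the chosen
    # precision (e.g. 4-character geohashes); 'in' accepts at most 10 values.
    query = (db.collection('users')
             .where('geohash4', 'in', neighbor_hashes[:10])
             .where('gender', '==', gender)
             .limit(page_size))
    return list(query.stream())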
However, this won't work for my purpose. Let's say we got all the close-by users by using the neighboring 4-character geohashes. Once that's done, if the user keeps swiping, we have to search with the larger-area 3-character geohashes, which will contain almost all of the users that were already searched. Yes, I can filter them out afterward, but that is not efficient (and it's costly). And if we keep going, we'd reach the 2-character geohashes, but then, in order to get the rest of the users, I'd have to search with a where-not-in clause containing all the neighboring 2-character geohash areas that we already searched. I do understand that this is a huge area to cover, but with the data that I currently have, users will cover it really quickly.
I also think that GeoFire doesn't do the job for me, since it queries all the users in a given area at once, which doesn't satisfy my requirements (as discussed in the linked article).
To me, this solution sounds a bit complicated, so I am seeking suggestions for making it more efficient and effective. Any ideas on how to tackle this issue?

Which is better: sending many small messages or fewer large ones?

I have an app whose messaging granularity could be written two ways: sending many small messages vs. (possibly far) fewer larger ones. Conceptually, what moves around is a set of 'alive' vertex IDs that might get filtered at each superstep based on a processed list (the vertex value) that vertices manage. The ones that survive to the end are the lucky winners. compute() calculates a set of 'new-to-me' incoming IDs that are perfect for the outgoing message, but I could easily send each ID one at a time. My guess is that sending fewer messages is more important, but then each set might contain thousands of IDs. Thank you.
P.S. A side question: The few custom message type examples I've found are relatively simple objects with a few primitive instance variables, rather than collections. Is it nutty to send around a collection of IDs as a message?
I have used lists and even maps both as messages and as stored vertex data, so that isn't a problem. I don't think it matters much to Giraph which one you choose, but I'd rather go with many simple, small messages, as that is the way Giraph is meant to be used; otherwise, in the compute function, you will need to iterate through the list of messages and, for each message, through the list of IDs.
Performance-wise it shouldn't make much difference. What I have found to make a big difference is to compute as much as possible in one superstep, since switching between supersteps and synchronising the messages takes a lot of time. As long as that doesn't change, it should be more or less the same, and probably much easier to read and maintain if you keep the size of messages small.
In order to answer your question, you need to understand the MessageStore interface and its implementations.
In a nutshell, under the hood the following steps take place:
The worker receives the raw bytes of the messages along with the destination vertex IDs.
The worker sorts the messages and puts them into a map of maps: the outer map's key is the partition ID, the inner map's key is the vertex ID. (It is a bit like a post office: the worker is the central hub, which sorts letters by zip code first and then, within each zip code, by address.)
When it is a vertex's turn to compute, an Iterable of that vertex's messages is passed to its compute method, and that is where you receive and use them.
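To picture that grouping, here is a toy sketch in plain Python (this is not Giraph code, just an illustration of the map-of-maps layout):

from collections import defaultdict

# partition ID -> vertex ID -> list of incoming messages
message_store = defaultdict(lambda: defaultdict(list))

def deliver(partition_id, vertex_id, message):
    message_store[partition_id][vertex_id].append(message)

deliver(0, 'v1', [7, 9, 12])   # one bigger message carrying a set of IDs
deliver(0, 'v1', 3)            # vs. many small single-ID messages
deliver(1, 'v2', 5)

# At compute time, a vertex simply iterates over its own bucket:
for message in message_store[0]['v1']:
    pass  # process the message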
So fewer, bigger messages are better, because there is less sorting to do when the total number of bytes is the same in both cases.
Also, you could send many small messages but let Giraph combine them into a bigger one (almost) automatically: you can use combiners.
The documentation on this subject on the Giraph site is terrible, but you may be able to extract an example from the book Practical Graph Analytics with Apache Giraph.
Mainly, this depends on the type of messages that you are sending.

Google App Engine storing as list vs JSON

I have a model called User, and a user has a property relatedUsers, which, in its general format, is an array of integers. Now, there will be times when I want to check if a certain number exists in a User's relatedUsers array. I see two ways of doing this:
Use a standard Python list with indexed values (or maybe not) and just run an IN query and see if that number is in there.
Having the key to that User, get back the value for property relatedUsers, which is an array in JSON string format. Decode the string, and check if the number is in there.
Which one is more efficient? Would option 1 cost more reads than option 2? And would option 1 cost more writes than option 2, since indexing each value costs a write? What if I don't index -- which solution would be better then?
Here are your costs vs. capabilities, option by option:
Putting the values in an indexed list will be far more expensive. You will incur the cost of one write for each value in the list, which can explode depending on how many friends your users have. This cost explosion can be even worse if you have certain kinds of composite indexes. The good side is that you get to run queries on this information: you can query for a list of users who are friends with a particular user, for example.
No extra index or write costs here. The problem is that you lose querying functionality.
If you know that you're only going to be doing checks on the current user's own list of friends, by all means go with option 2. Otherwise you might have to look at your design a little more carefully.
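To make the trade-off concrete, here is a rough sketch of the two options using the Python ndb library (model and property names are just illustrative):

import json
from google.appengine.ext import ndb

class UserOption1(ndb.Model):
    # Option 1: indexed repeated property -> one index write per value,
    # but you can query it (e.g. "which users are related to 42?").
    related_users = ndb.IntegerProperty(repeated=True)

class UserOption2(ndb.Model):
    # Option 2: an unindexed JSON string -> no extra index writes, no queries.
    related_users_json = ndb.TextProperty()

# Option 1: membership check via a query.
is_related = UserOption1.query(UserOption1.related_users == 42).get() is not None

# Option 2: fetch the entity by key and check in memory.
user = UserOption2.get_by_id('some-user-id')
is_related = user is not None and 42 in json.loads(user.related_users_json or '[]')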

Generate and/or validate license keys for App Engine for a closed beta

I will release my GAE application in a few months in a closed beta state, so that just a few users can use it and I can gather some data and learn where and how to improve it. My idea was to use a key system to let them access the application.
What I want to do:
I want to generate a bunch of keys and store them with the Datastore. When a user comes to the application for the first time, he logs in with his Google account and has to enter a key to activate his account.
My question:
My previous software didn't require license keys or anything similar, so this is a new area for me. Do you think this is a good way to run a closed beta? My second idea was to generate a bunch of keys and validate them algorithmically, like other popular software does, but I think that is unnecessary and I want to avoid the possibility that someone could build a key-gen. Just generating the keys, storing them, then checking whether an entered key exists in the Datastore, marking it as used, and activating the account would be my suggestion.
How can I generate a lot of valid keys, and easily add more later without duplicates? I'm thankful for every experience and suggestion.
As a refinement to Ashley's suggestion, if you'd like to generate shorter and/or easier to type IDs, you can generate some random data and encode it using base32:
import base64, os
base64.b32encode(os.urandom(8)).strip('=')
Make it a bit more readable by inserting hyphens:
'-'.join(base64.b32encode(os.urandom(8)).strip('=')[5*x:5*(x+1)] for x in range(3))
This gives you codes like the following:
'C6ZVG-NJ6KA-CWE'
Then just store the result in your datastore and hand them out to users. I'd suggest storing the code without the hyphens, and stripping those characters before checking the database. If you want to get really fancy, base32's alphabet is chosen to avoid characters that look similar; you could substitute those characters before you do the check to account for typos.
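For example, the check-side normalization might look something like this (a rough sketch; the 0/1/8 substitutions assume the standard RFC 4648 base32 alphabet of A-Z and 2-7):

def normalize_invite_code(typed):
    # Upper-case, drop hyphens and whitespace, and map look-alike digits back
    # to the base32 letters they are most commonly mistyped for.
    code = typed.strip().upper().replace('-', '')
    for typo, letter in (('0', 'O'), ('1', 'I'), ('8', 'B')):
        code = code.replace(typo, letter)
    return code

normalize_invite_code('c6zvg-nj6ka-cwe')   # -> 'C6ZVGNJ6KACWE'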
8 bytes of random data gives you 2^64 possible invite codes; if you hand out, say, 2^16 (65,536) of them, an attacker will still have to try 2^48 (about 300 trillion) codes to find a valid one. You can make your codes shorter at the cost of reducing the search space, if you want.
I use UUID for generating random keys:
UUID.randomUUID().toString().replace("-", "");
From the docs: "The UUID is generated using a cryptographically strong pseudo random number generator".
Generate a long list of them in the datastore, and then when a user arrives at something like yourapp.com/betainvite/blahblahkey you can simply check whether the key is in the table and whether its rsvp property is null (if it is already set to the date it was used, you deny the invite).
You could store the key against your User too, so you can find out who used each one and when.
It's also a good idea to maintain an invited date on the keys; then, as you hand each one out, you can mark it as invited so you don't double-invite people.
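Sketched with the Python ndb library (the same flow works with the Java datastore API; model and property names are just illustrative):

import datetime
from google.appengine.ext import ndb

class BetaInvite(ndb.Model):
    code = ndb.StringProperty(required=True)   # the generated key
    invited = ndb.DateTimeProperty()           # when the code was handed out
    rsvp = ndb.DateTimeProperty()              # when the code was redeemed
    used_by = ndb.StringProperty()             # who redeemed it

def redeem(code, user_id):
    invite = BetaInvite.query(BetaInvite.code == code).get()
    if invite is None or invite.rsvp is not None:
        return False   # unknown code, or already used: deny the invite
    invite.rsvp = datetime.datetime.utcnow()
    invite.used_by = user_id
    invite.put()
    return True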

Flagging possible identical users in an account management system

I am working on a possible architecture for an abuse detection mechanism in an account management system. What I want is to detect possible duplicate users based on certain correlating fields within a table. To keep the problem simple, let's say I have a USER table with the following fields:
Name
Nationality
Current Address
Login
Interests
It is quite possible that one user has created multiple records in this table. There might be a certain pattern in which this user has created his/her accounts. What would it take to mine this table and flag records that may be possible duplicates? Another concern is scale. If we have, let's say, a million users, taking one user and matching it against all the remaining users is computationally unrealistic. What if these records are distributed across various machines in various geographic locations?
What are some of the techniques that I can use to solve this problem? I have tried to pose this question in a technology-agnostic manner in the hope that people can provide me with multiple perspectives.
Thanks
The answer really depends upon how you model your users and what constitutes a duplicate.
There could be a user that takes names from all the Harry Potter characters. Good luck finding that pattern :)
If you are looking for records that are approximately similar try this simple approach:
Hash each word in the record and pick the minimum hash value (the min hash over the record's shingles). Do this for k different hash functions, and concatenate the resulting min hashes. That concatenation is your near-duplicate signature.
To be clear, let's say a record has words w1, ..., wn and your hash functions are h1, ..., hk.
Let m_i = min_j h_i(w_j),
and the signature is S = m1.m2.m3. ... .mk.
The cool thing about this signature is that if two records contain 90% of the same words, each min hash matches with roughly 90% probability, so there is a good chance the full signatures are identical. Hence, instead of looking for near duplicates, you look for exact duplicates among the signatures. If you want to increase the number of matches, decrease k; if you are getting too many false positives, increase k.
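A minimal sketch of that signature scheme (salted MD5 stands in for the k different hash functions; purely illustrative):

import hashlib

def signature(words, k=4):
    def h(i, w):
        return hashlib.md5(('%d:%s' % (i, w)).encode('utf-8')).hexdigest()
    # m_i = min over all words of h_i(word); concatenate the k minima.
    return '.'.join(min(h(i, w) for w in words) for i in range(k))

rec1 = ['john', 'doe', 'london', 'chess', 'poker']
rec2 = ['jon', 'doe', 'london', 'chess', 'poker']
# Similar records have a good chance of producing identical signatures,
# so duplicate detection becomes an exact match on the signature column.
print(signature(rec1) == signature(rec2))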
Of course, there is also the approach of using implicit features of users, such as their IP addresses, cookies, etc.
