Snowflake table not populated even though pipe is running

I'm using Snowflake's Kafka connector to get some data into a table. I can see the files in the stage (when I run LIST @STAGE_NAME;), and according to SYSTEM$PIPE_STATUS the pipe is running. Still, I don't see anything in the copy history. However, when I refresh the pipe, the table gets populated shortly afterwards.
Does anyone know what could be causing this?
Here's the connector configuration in case it helps (nothing special):
{
"buffer.count.records": "2",
"buffer.flush.time": "10",
"buffer.size.bytes": "1",
"input.data.format": "JSON",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"name": "mysnowflakesink",
"snowflake.database.name": "MY_DB",
"snowflake.private.key": "abc",
"snowflake.schema.name": "STREAM_DATA",
"snowflake.topic2table.map": "customers:CUSTOMERS_TEST",
"snowflake.url.name": "xyz.snowflakecomputing.com:443",
"snowflake.user.name": "my_user",
"tasks.max": "8",
"topics": "customers",
"value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
}

It turned out to be my mistake. I'm using the connector programmatically, and I wasn't calling the preCommit method. Under the hood, that method is what drives the pipe that ingests the data.

Related

How to create Alexa Catalog for URL Reference for slots

I've found this example that uses the Catalog URL Reference for populating custom slots in an Alexa Skill.
The problem is that I don't know how to populate this catalog.
I was able to create the model catalog using ask cli like this:
ask api create-model-catalog -n catalog_name -d "description"
That gives me the catalogId in the form "catalogId": "amzn1.ask.interactionModel.catalog.blabla", like the one in the GitHub example from the first link.
The problem is that I don't know how to put the values (for example, the ingredients.json from the above example) inside that catalog.
I've tried using
ask api create-model-catalog-version -c catalogId -f ingredients.json
But what I obtain is
Call create-model-catalog-version error.
Error code: 400
{
"message": "Request is not valid.",
"violations": [
{
"message": "'source' field of the request is invalid."
}
]
}
In the documentation, there isn't an example of how to deal with this, so I'm stuck at this point.
Thanks for your help.
In order to create and use a catalog in your Alexa skill, you have to:
Upload the catalog file into a bucket or another public storage endpoint.
After that, create a JSON file with the following content (e.g. catalog.json):
{
"type": "URL",
"url": "[your catalog url]"
}
Use the ask api to create the catalog, as you mentioned, to get your catalog-id:
ask api create-model-catalog --catalog-name "IngredientsCatalog" --catalog-description "Ingredients"
Create a model catalog version by using the file that contains your catalog URL:
ask api create-model-catalog-version --file .\catalog.json --catalog-id [your catalog-id]
This call will return a command you can use to track the status of the version creation. Something like:
ask api get-model-catalog-update-status -c [your catalog-id] -u [request id]
If the version has been created successfully, then you can set it on your skill interaction model:
"types": [
{
"name": "Ingredient",
"valueSupplier": {
"type": "CatalogValueSupplier",
"valueCatalog": {
"catalogId": "[your catalog-id]",
"version": "[the desired version number]"
}
}
}
]
I can add a little more here that may be helpful:
Regarding step 1 in the answer above: from my testing, the S3 bucket must be public. Also, below is the format of the JSON that you will want to use, including synonyms, which are not covered in the official Amazon example. Note that you don't have to include an ID as shown in that example.
{
"values": [
{
"name": {
"value": "hair salon",
"synonyms": [
"hairdresser",
"beauty parlor"
]
}
},
{
"name": {
"value": "hospital",
"synonyms": [
"emergency room",
"clinic"
]
}
}
]
}

Client-side fan-out data structure

I will implement my fan-out model in Google's Firebase, but my question is only theoretical, so the answer doesn't need to be in Firebase terms.
I am creating an app that I think should have a data structure similar to Tinder's. The idea is that only one post shows in your feed at a time; you then accept it or reject it, another one pops up, and so on. My question is how exactly to structure the data so that it remains fast when the app scales up.
What I have right now is one node called "Posts" that contains every post that has ever been made. The app then queries for a post, which is checked against the user's "viewedPosts" node; if the queried post has already been accepted/rejected by the user, another one is queried, until an unseen one is found. This obviously isn't a great solution, because if there are a lot of posts, a query through them will be slow (especially if a lot of them have already been seen and the query has to be repeated multiple times).
I came across this article: The Firebase Blog: Client-side fan-out for data consistency. It gave me the idea of creating a node inside each user called "unseen posts": every time a new post is uploaded by someone, it is put in the unseen folder of every user. This solves the problem on the viewer's side, but to upload, one would have to download the list of all users in the app and then write to every single one of them.
So the question is: is there a middle ground between these two approaches that I can use to do this efficiently?
Thank you.
EDIT:
Someone asked for my data structure:
{
"posts": {
"jkldsahjfkds": {
"title": "Simple Post",
"description": "This is my first post",
"numberOfImages": "2",
"price": "14.99",
"timestamp": "51782345",
"postedBy": "-hjd673bbewi7n",
"name": "Ryan Jacobs"
},
"-nisd7enskwes": {...},
"-asdjfhk7385i": {...},
"-sdfh49506ndk": {...}
},
"users": {
"user1": {
"postsViewed": {
"-nisd7enskwes": 51784645,
"-sdfh49506ndk": 51782329
},
"postsLiked": {
"-sdfh49506ndk": 51782329
},
"userData": {
"name": "Albert Jones",
"bio": "Hi! How is everyone doing!?",
"location": "London"
}
}
}
}
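For reference, here is what the client-side fan-out described above could look like with the Firebase JavaScript SDK. This is a minimal sketch, not a full solution: currentPost stands in for the new post's data, and it deliberately shows the drawback mentioned above, namely that the whole "users" node is read on every upload.
// Fan a new post out to every user's "unseenPosts" node with a single
// atomic multi-path update, per the client-side fan-out blog post.
var db = firebase.database();
var postKey = db.ref('posts').push().key;
db.ref('users').once('value').then(function (snapshot) {
  var fanout = {};
  fanout['posts/' + postKey] = currentPost; // canonical copy of the post
  snapshot.forEach(function (user) {
    // One feed entry per user; this is the part that hurts at scale.
    fanout['users/' + user.key + '/unseenPosts/' + postKey] = true;
  });
  return db.ref().update(fanout); // all paths succeed or fail together
});
A common middle ground is to swap the read of the whole "users" node for a smaller index (for example, followers of the poster), so each upload only fans out to the users who can actually see the post.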

Is including additional information in the output object a good idea?

I'm experimenting with a Conversation where I would like to modify the output in a couple of different ways:
different output for speech or text
different output depending on the tone of the conversation
It looks like I can add extra output details which make it through to the client ok. For example, adding speech alongside text...
{
"output": {
"speech": "Hi. Please see my website for details.",
"link": "http://www.example.com",
"text": "Hi. Please see http://www.example.com for details."
}
}
For the tone, I wondered about making up a custom selection policy; unfortunately, it seems to be treated the same as a random selection policy. For example...
{
"output": {
"text": {
"values": [
"Hello. Please see http://www.example.com for more details.",
"Hi. Please see http://www.example.com for details."
]
},
"append": false,
"selection_policy": "tone"
}
}
I could just add a separate tone-sensitive object to output, though, so that's not a big problem.
Would there be any issues adding things to output in this way?
You can definitely use the output field to specify custom variables you want your client app to see, with the benefit that these variables will not persist across multiple dialog rounds (which they would if you added them to the context field).
Currently there is no "easy" way to define your own custom selection policy (apart from random and sequential, which the runtime supports right now), but you could still return an array of possible answers to the client app, with some attribute telling the client app which selection policy to use, and implement this policy in the client app.
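To illustrate, client-side handling of such a custom policy could look like this sketch. The "tone" attribute on each value and the selection_policy field are made-up names for whatever your dialog node returns, not part of the Conversation API:
// Pick a response on the client, honoring a custom policy returned by
// the dialog node (hypothetical "tone" attribute on each value).
function pickResponse(output, currentTone) {
  var values = output.text.values || [];
  if (values.length === 0) { return ''; }
  if (output.selection_policy === 'tone') {
    var match = values.filter(function (v) {
      return v.tone === currentTone; // e.g. { tone: "formal", text: "..." }
    })[0];
    if (match) { return match.text; }
  }
  // Fall back to mimicking the built-in random policy.
  var pick = values[Math.floor(Math.random() * values.length)];
  return pick.text || pick;
}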

How do I perform specific parentless requests using Restangular in a neat way?

Say I have an API endpoint at /users and another one at /cars. Naturally, a GET request to either of them will get all users or cars available. Now, a GET /users/74/cars should return all cars belonging to user 74.
But my app has many models related to cars, not just users, so more endpoints exist like /shops/34/cars and /mechanics/12/cars. For simplicity, I want all PUT/PATCH requests to be made to the main /cars endpoint.
At the moment of performing the save, Restangular will by default do a PUT request to the endpoint through which the item was loaded. But that endpoint does not exist.
It also provides a nice Restangular.setParentless(['cars']) method that will discard the first part of the URL. However, I don't want to do this globally, but specifically for a particular element.
The neatest option would actually be to do it globally, but restricted to a specific method, like: Restangular.setParentless(['cars'], ['PUT']).
Anything like that around? Or am I overcomplicating it?
So far I tried stuff I don't like:
delete car.parentResource;
I would recommend using self reference links. A self reference link stores the route that should be used for GET/PUT/DELETE etc. on the item, rather than the URL from which it was pulled.
Example, update the mileage on one of user id 74's cars:
First, configure Restangular to look for a self link property called 'self' on each object.
RestangularProvider.setRestangularFields({
selfLink: 'self'
});
Next, make your call to get the cars. I'll assume that you have already modified your API to return a property called 'self' on each object that holds the URL of its proper API endpoint.
GET /users/74/cars
[
{
"id": 12,
"model": "Camaro",
"make": "Chevrolet",
"year": 1969,
"color": "red",
"odometer": 67294,
"license": "ABC12345",
"self": "/cars/12"
},
{
"id": 14,
"model": "Gallardo",
"make": "Lamborghini",
"year": 2015,
"color": "black",
"odometer": 521,
"license": "XYZ34567",
"self": "/cars/14"
}
]
We want to add some miles to one of them, and then save it. The entire Restangular code would look like:
Restangular.one('users', 74).all('cars').getList().then(function(cars){
cars[1].odometer = 613;
cars[1].put();
});
The PUT will go to /cars/14 instead of /users/74/cars/14. This is very useful for applications like yours that relate models as a graph rather than a strict hierarchical tree.
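The same decoupling works no matter which relation the car was loaded through. For instance, assuming a /shops/34/cars endpoint that also returns self links, a sketch like this would still PUT to the canonical URL:
// Load the same car through a different parent; the self link still
// routes the update to the canonical /cars/:id endpoint.
Restangular.one('shops', 34).all('cars').getList().then(function (cars) {
  var car = cars[0];
  car.color = 'blue';
  car.put(); // goes to the car's self link, e.g. /cars/12, not /shops/34/cars/12
});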

How to NOT delete a file in an Azure FTP connector action if the transfer fails in a logic app

I have a working connection set up between an FTP site and Dropbox using an Azure logic app. But while setting it up, it kept downloading the file and then, since I had the next step wrong, deleting it.
In a test environment this is annoying. In production, pretty awful.
Here is the code I am using on the action part:
"operation": "UploadFile",
"parameters": {
"FilePath": "#{triggers().outputs.body.FilePath}",
"content": {
"Content": "#{triggers().outputs.body.Content}",
"ContentTransferEncoding": "None"
},
"overwrite": true
},
Is there anything I can do so that if it fails, it leaves the file on the server?
I'm not 100% sure what you mean, but I will give it a try. Maybe you can reformulate the question if I misinterpret you.
But yes, there are "conditions" in Logic Apps which can be used. If you are new to Logic Apps, I'd suggest using the "designer view"; there you can click "Add condition to be met", which brings up a text box in which you can formulate conditions. For instance:
@equals({your data}, bool('true'))
to check whether some value is true, or something similar to check whether data is null.
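If you are working in code view instead, a condition can be attached directly to the action that does the deleting. This is only a rough sketch under the same 2015-style schema as the snippet above; the DeleteFile and UploadFile action names and the status expression are assumptions, not taken from your app:
"DeleteFile": {
"type": "ApiApp",
"inputs": {...},
"conditions": [
{ "expression": "@equals(actions('UploadFile').status, 'Succeeded')" }
]
}
With a condition like this on the delete step, the file is only removed from the FTP server after the upload has actually succeeded.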
