Snowpipe auto ingestion

I'm new to Snowflake. I have created a Snowpipe and stages, and also configured SQS in AWS.
Data is not getting loaded into the table through Snowpipe when I place files in my S3 bucket.
Data only gets loaded when I execute the statement alter pipe snow_pipename refresh.
Do I need to do any further setup for the auto-ingest data load?

The SQS event notification might not have been set up properly on the S3 bucket. Check the configuration against the setup steps in the link below:
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html#step-4-configure-event-notifications

Ensure you have configured your Snowflake IAM role correctly, with the appropriate policies and trust relationship.
See step 5 of the documentation:
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html#step-4-configure-event-notifications
Also, ensure that these values are reflected in the STAGE_CREDENTIALS of your stage, using DESCRIBE STAGE snowpipe_emp;
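If those settings look correct, it can also help to confirm that the pipe was actually created with auto-ingest enabled and to check its current status. A minimal sketch of those checks, reusing the pipe and stage names from the question (adjust to your own object names):

-- Confirm the pipe exists and note its notification_channel ARN; that ARN is the
-- SQS queue the S3 bucket's event notification must publish to.
SHOW PIPES LIKE 'snow_pipename';

-- Check the pipe's execution state and notification activity.
SELECT SYSTEM$PIPE_STATUS('snow_pipename');

-- Verify the stage URL and credentials/integration the pipe loads from.
DESCRIBE STAGE snowpipe_emp;

If the notification_channel column is empty, the pipe was most likely created without AUTO_INGEST = TRUE and will need to be recreated; if it is populated, make sure the S3 event notification points at exactly that ARN.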

Related

The best GCP architecture for exporting BigQuery data to an external application via API

I use the following GCP products together for a CRM system:
Cloud SQL
App Engine
BigQuery
Once a week an external application exports data from BigQuery in this way:
The external application makes a request to App Engine with a token.
App Engine retrieves the permissions for this token from Cloud SQL and does some additional computation to obtain a list of allowed IDs.
App Engine runs a BigQuery query filtered by these IDs, something like: SELECT * FROM table WHERE id IN (ids)
App Engine responds to the external application with the unmodified query result in JSON.
The problem is that the export is infrequent, but the amount of data can be large, and I don't want to load App Engine with this data. What other GCP products would be useful in this case? Remember that I need to retrieve permissions via App Engine and Cloud SQL.
It is unclear whether the JSON is just the direct BigQuery query result, or whether you do additional processing in the application to render/format it; I'm assuming direct results.
One option that comes to mind is to leverage Cloud Storage. You can use the signed URL feature to provide a time-limited link to your (potentially large) results without exposing public access.
This, coupled with BigQuery's ability to export results to GCS (either via an export job or the newer EXPORT DATA SQL statement), allows you to run a query and deliver the results directly to GCS.
With this, you could simply redirect the user to the signed URL at the end of your current flow. There are additional features that are complementary here, such as GCS object lifecycle rules to age out and remove files automatically, so you don't need to concern yourself with a slow accumulation of results.
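If it helps, here is a rough sketch of what that could look like with the EXPORT DATA statement; the bucket, dataset, table, and parameter names are placeholders, not taken from the question:

-- Export the filtered result directly to GCS as JSON, then return a signed URL
-- for the written object(s) instead of streaming the rows through App Engine.
EXPORT DATA OPTIONS (
  uri = 'gs://example-export-bucket/exports/run-*.json',  -- placeholder bucket/path
  format = 'JSON',
  overwrite = true
) AS
SELECT *
FROM example_dataset.example_table            -- placeholder table
WHERE id IN UNNEST(@allowed_ids);             -- IDs computed from the Cloud SQL permissions

App Engine would then only need to generate and return the signed URL for the exported files, rather than carrying the result set itself.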

Snowflake external table refresh issue / question / confirmation with external stage on Azure platform

I am looking for a little help with the Snowflake scenario below.
Going by the official Snowflake documentation, it is clear that an external table needs to be refreshed at least once (automatically or manually) so that its setup is complete and queries can fetch rows.
What happens when the underlying path and file names do not change, but we replace the file with new data? Will the new data be available when we query the external table from Snowflake
without an Event Grid setup (which performs the refresh) and without performing a manual refresh?
Example: some_external_path/location/file_with_data.csv
You can trigger a refresh automatically when the external files change. The Snowflake documentation covering this process is here: https://docs.snowflake.com/en/user-guide/tables-external-azure.html
Alternatively, if you just want to run the refresh on a schedule, you could set up a Snowflake task that executes an ALTER EXTERNAL TABLE … REFRESH statement.
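A minimal sketch of such a task, with placeholder names for the task, warehouse, and external table:

-- Refresh the external table's metadata on a schedule so that replaced files
-- are picked up without an Event Grid notification setup.
CREATE OR REPLACE TASK refresh_ext_table_hourly
  WAREHOUSE = my_wh                          -- placeholder warehouse
  SCHEDULE = 'USING CRON 0 * * * * UTC'      -- top of every hour
AS
  ALTER EXTERNAL TABLE some_external_table REFRESH;

-- Tasks are created suspended; resume it to start the schedule.
ALTER TASK refresh_ext_table_hourly RESUME;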

Edit Client in IdentityServer4

The sample and seed data show creating a new client in Startup.
This is fine for creating a client.
Are there any existing methods or provisions for updating a client? An update involves tracking the existing records from the collection fields within the clients too.
How are entities mapped from IdentityServer4.Models to IdentityServer4.EntityFramework.Entities during an update, considering the records are already available in the database?
What do you mean when you say client? If you mean a client for IdentityServer, then you can edit/configure or add more clients or other resources in your config class. At startup, IdentityServer loads up all the clients by itself, because of this code:
// Add identity server.
services.AddIdentityServer()
.AddTemporarySigningCredential()
.AddInMemoryIdentityResources(Config.GetInMemoryIdentityResources())
.AddInMemoryApiResources(Config.GetInMemoryApiResources())
.AddInMemoryClients(Config.GetInMemoryClients(Configuration))
.AddAspNetIdentity<ApplicationUser>()
.AddProfileService<SqlProfileService>();
Are there any existing methods or provisions for updating a client? An update involves tracking the existing records from the collection fields within the clients too.
Yes, you can update a client just as you can update any other data. See the IdentityServer4 documentation on using Entity Framework Core with IdentityServer4 for the details.
How are entities mapped from IdentityServer4.Models to IdentityServer4.EntityFramework.Entities during an update, considering the records are already available in the database?
If you check the IdentityServer4 source, you will find that AutoMapper is used to convert the entities (namespace IdentityServer4.EntityFramework.Mappers), and an extension method named ToModel is provided.

Cloudant CDTDatastore to pull only part of the database

We're using Cloudant as the remote database for our app. The database contains documents for each user of the app. When the app launches, we need to query the database for all the documents belonging to a user. What we found is that the CDTDatastore API only allows pulling the entire database, storing it inside the app, and then performing the query on the local copy. The initial pull to the local datastore takes about 10 seconds, and I imagine it will take longer as more users are added.
Is there a way I can save only part of the remote database to the local datastore? Or, are we using the wrong service for our app?
You can use a server-side replication filter function; you'll need to add information about your filter to the pull replicator. However, replication takes a performance hit when using a filter function.
That being said, a common pattern is to use one database per user. However, this has other trade-offs and is something you should read up on; there is some information on the one-database-per-user pattern in the Cloudant documentation.

Recommended architecture for bulk-refreshing Salesforce?

We would like to keep Salesforce synced with data from our organization's back-end. The organizational data gets updated by nightly batch processes, so "real-time" syncing to Salesforce isn't in view. We intend to refresh Salesforce nightly, after our batch processes complete.
We will have somewhere around 1 million records in Salesforce (some are Accounts, some are Contacts, and some belong to custom objects).
We want the refresh to be efficient, so it would be nice to send only updated records to Salesforce. One thought is to use Salesforce's Bulk API to first get all records, compare them with our data, and send only the updated records back to Salesforce. But this might be an expensive GET.
Another thought is to just send all 1 million records through the Bulk API as upserts to Salesforce, as a "full refresh".
What we'd like to avoid is the burden/complexity of keeping track of what's in Salesforce ourselves (i.e. tables that attempt to reflect what's in Salesforce, so that we can determine the changes to send to Salesforce).
