Loading Files from a Named External Stage vs External Location

We have our data files as JSON on GCP Cloud Storage.
Which of the two approaches below is the more efficient way to load them into an existing Snowflake table?
1. Use GCS as a named external stage
2. Use GCS as an external location to load data
If (1), should we go for calling Snowpipe REST endpoints to load the data?

The "efficiency" is pretty much the same for either method, but I'd strongly recommend going the route of Auto Ingest Snowpipe, as outlined in this link:
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-gcs.html
This works really well and allows for a "set it and forget it" type of project.
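A minimal sketch of what the auto-ingest setup in that guide boils down to, written as a small Python helper that renders the two SQL statements. All object names (stage, integrations, pipe, table) are hypothetical placeholders, and the plain COPY INTO assumes the target table has a single VARIANT column; an existing table with typed columns would need a transforming SELECT in the COPY statement instead.

```python
def snowpipe_statements(stage, bucket_url, storage_integration,
                        notification_integration, pipe, table):
    """Render the SQL for a named GCS stage and an auto-ingest pipe over it.

    Every argument is a placeholder supplied by the caller; nothing here
    talks to Snowflake, it only builds the statement text.
    """
    create_stage = (
        f"CREATE STAGE {stage}\n"
        f"  URL = '{bucket_url}'\n"
        f"  STORAGE_INTEGRATION = {storage_integration}\n"
        f"  FILE_FORMAT = (TYPE = JSON);"
    )
    create_pipe = (
        f"CREATE PIPE {pipe}\n"
        f"  AUTO_INGEST = TRUE\n"
        f"  INTEGRATION = '{notification_integration}'\n"
        f"  AS COPY INTO {table} FROM @{stage};"
    )
    return [create_stage, create_pipe]
```

Calling snowpipe_statements("my_stage", "gcs://my-bucket/json/", "my_storage_int", "MY_NOTIFICATION_INT", "my_pipe", "my_table") returns the two statements to run in a worksheet. The pipe reads from the named stage, which is why option (1) pairs naturally with Snowpipe.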

Related

Ways to Send Snowflake Data to a REST API (POST)

I was wondering if anyone has sent data from Snowflake to an API (POST Request).
What is the best way to do that?
I'm thinking of using Snowflake to unload (COPY INTO) to Azure Blob Storage and then creating a script to send that data to an API. Or I could just use the Snowflake API directly, all within a script, and avoid Blob Storage.
Curious about what other people have done.
To send data to an API you will need to have a script running outside of Snowflake.
Then with Snowflake external functions you can trigger that script from within Snowflake - and you can send parameters to it too.
I did something similar here:
https://towardsdatascience.com/forecasts-in-snowflake-facebook-prophet-on-cloud-run-with-sql-71c6f7fdc4e3
The basic steps in that post are:
1. Have a script that runs Facebook Prophet inside a container that runs on Cloud Run.
2. Set up a Snowflake external function that calls the GCP proxy that calls that script with the parameters I need.
In your case I would look for a similar setup, with the script running within Azure.
https://docs.snowflake.com/en/sql-reference/external-functions-creating-azure.html
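A rough sketch of the script side, assuming the external function delivers rows in Snowflake's batch format ({"data": [[row_number, arg1, ...], ...]}) and expects one result per row number in the same shape. The names handle_batch and make_sender, and the target URL passed to make_sender, are hypothetical; the hosting layer (Azure Function, Cloud Run, etc.) is left out.

```python
import json
import urllib.request


def make_sender(url):
    """Return a function that POSTs one row's arguments to `url` as JSON."""
    def send(args):
        request = urllib.request.Request(
            url,
            data=json.dumps(args).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status  # HTTP status code becomes the row's result
    return send


def handle_batch(request_body, send):
    """Process one external-function batch and build the response body.

    `send` receives the row's arguments (without the row number) and returns
    a JSON-serialisable value that is reported back to Snowflake for that row.
    """
    rows = json.loads(request_body)["data"]
    results = [[row_number, send(args)] for row_number, *args in rows]
    return json.dumps({"data": results})
```

Passing the sender in as a parameter keeps the batch handling separate from the network call, so the row bookkeeping can be checked locally with a stub before wiring in make_sender.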

Fetch thousands of files from S3/MinIO with a single-page web app (no server)

I'm developing a single-page app for image annotation. Each .jpg file is stored on S3/MinIO services, coupled with a .xml file (Pascal VOC notation), which describes the coordinates and positions of each annotation associated with the image.
I'd like to fetch all the XML data so that I can filter my image results within the web app (built on ReactJS). But thousands of requests to an S3 server directly from a web app seem a bit odd to me; nevertheless, I would prefer to avoid any "middleware" server (like Python/Flask or Node.js) and rely on the ReactJS app alone.
I've not been able to find any workaround to download the content of all the XML files with a single AJAX call; do you have any ideas for addressing this kind of issue?
The S3 API doesn't provide an operation to fetch multiple objects in a single request. As you suggested in your question, your application will need to handle this logic by first getting a list of the objects and then iterating through that list.
Alternatively, consider storing the XML files as a single archive, if that is an option for you.
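One way to read the "single archive" idea is to merge the annotations into one JSON index offline and upload that next to the images, so the app needs a single request for all the metadata. The sketch below is an offline build step in Python, not part of the web app; it assumes the usual Pascal VOC layout (a filename element, and object elements each with a name and a bndbox), and the function names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET


def voc_to_record(xml_text):
    """Extract the filename and the labelled boxes from one VOC annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")  # assumed present on every object
        objects.append({
            "name": obj.findtext("name"),
            "bbox": [int(float(box.findtext(tag)))
                     for tag in ("xmin", "ymin", "xmax", "ymax")],
        })
    return {"filename": root.findtext("filename"), "objects": objects}


def build_index(xml_documents):
    """Combine many annotation documents into a single JSON string."""
    return json.dumps([voc_to_record(text) for text in xml_documents])
```

The resulting JSON file can be stored as one object in the bucket and fetched once by the app; the trade-off is that the index has to be rebuilt whenever annotations change.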

Download large file on Google App Engine Python

On my appspot website, I use a third party API to query a large amount of data. The user then downloads the data in CSV. I know how to generate a csv and download it. The problem is that because the file is huge, I get the DeadlineExceededError.
I have tried increasing the fetch deadline to 60 seconds (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 secs, not your UrlFetch call.
Deploy the code that generates the CSV file into a different module that you set up with basic or manual scaling. The URL to download your CSV will become http://module.domain.com
Requests can run indefinitely on modules with basic or manual scaling.
Alternatively, consider creating a file dynamically in Google Cloud Storage (GCS) with your CSV content. At that point, the file resides in GCS and you can generate a URL from which the user can download the file directly. There are also other options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
Important note: do not use the Files API (which was a common way of dynamically creating files in Blobstore/GCS), as it has been deprecated. Use the Google Cloud Storage Client API referenced above instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.
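As a rough illustration of the "write the CSV to a file, then hand out a link" approach, the sketch below streams rows into any writable text file object instead of building the whole CSV in memory. It is plain Python 3 and does not call the GCS client library; in a real deployment the `out` argument would be the writable object returned by that library, which is an assumption here, and write_csv is a made-up helper name.

```python
import csv
import io


def write_csv(rows, out, header=None):
    """Write rows one at a time to `out` and return how many were written.

    `rows` can be a generator, so the third-party API results never need to
    be held in memory all at once.
    """
    writer = csv.writer(out)
    if header:
        writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Local stand-in for the storage file object.
buffer = io.StringIO()
written = write_csv(iter([[1, "a"], [2, "b"]]), buffer, header=["id", "name"])
```

Run from a task queue task or a manually scaled module, a loop like this is not bound by the 60-second request deadline, and the user only downloads the finished file.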

Google BigQuery Dataset Export

I'm trying to use Google BigQuery to download a large dataset for the GitHub Data Challenge. I have designed my query and am able to run it in the console for Google BigQuery, but I am not allowed to export the data as CSV because it is too large. The recommended help tells me to save it to a table. As far as I can tell, this requires me to enable billing on my account and make a payment.
Is there a way to save datasets as CSV (or JSON) files for export without payment?
For clarification, I do not need this data on Google's cloud and I only need to be able to download it once. No persistent storage required.
If you can enable the BigQuery API without enabling billing on your application, you can try using the getQueryResults API call. Your best bet is probably to enable billing (you probably won't be charged for the limited usage you need, as you will likely stay within the free tier, and if you do get charged it should only be a few cents) and save your query results as a Google Cloud Storage object. If the result is too large, I don't think you'll be able to use the web UI effectively.
See the documentation on this exact topic:
https://developers.google.com/bigquery/exporting-data-from-bigquery
Summary: Use the extract operation. You can export CSV, JSON, or Avro. Exporting is free, but you need to have Google Cloud Storage activated to put the resulting files there.
Use the bq command-line tool:
$ bq query
Use the --format flag to save the results as CSV.

HTML5 Database Use without Server

Is it possible to use a local database file with HTML5 without using a server? I would like to create a small application that depends on information from a small database. I do not want to host a server just to pull information. Is it possible to create a database file and pull information from the local files?
Depends on the following:
The type of application you want to build:
Normal website with some data being pulled from a local storage;
Special purpose hosted website / application with data generated by the user;
Special purpose local application with a dedicated platform (a particular browser) and with access to the browser's non-web API -- in order to access the browser's own persistent storage methods (file storage, SQLite etc.);
Special purpose local application with a dedicated environment -- in order to deploy the application with a local web server and database;
Available options:
Indexed DB
Web Storage
XML files used for storing data and XSLT stylesheets for translating the data into HTML;
Indexed DB and Web Storage are available in some browsers, but you need to make sure the targeted browsers support them. Their features aren't quite as complete and flexible as those of SQL RDBMSs, but they may fit the bill if your application doesn't need all that flexibility.
XML files can contain the data you want to be shown to the user and they can be updated manually (not by the user) or dynamically (by a server script).
For dynamic updating the content of the XML is kept in JavaScript and manipulated / altered (using the XML DOM) and when the session is over the XML content is sent to the server to entirely replace the previous XML file. This works OK if the individual users have a file each and they never write to each other's files.
Reading local files:
Normal file access is prohibited (for security reasons) to all local (JavaScript) code, which means that "having" a file locally implies either downloading it from a known source (a server) or asking the user to offer access to a local file.
Asking the user to offer access to a local file implies offering the user a "file input", like for uploads but without actually uploading the file.
After a file has been selected, using the File API to read that file should be fairly simple.
This workflow would involve the user "giving" you the database on every page refresh -- but since it's a one page thing it would mean giving you the data on every session as long as your script does not refresh the page.
You can use localStorage, but you can also run a server on your own computer. You can use WAMP or XAMPP, which use Apache and MySQL.
What I'm looking for is a little more robust than a cookie. I am making a web application for a friend that will be one page and have a list of names on it. The person wants to be able to add names to the list but does not want to use a web server. They just want the files locally on a computer: a folder called test-app with index.html, and possibly a database file that can be stored in the web browser, or a way to save information to the web browser for repeated use.
