Is it possible to create external table in Snowflake referring to on premise Oracle database?
No, Snowflake does not presently support query federation to other DBMS software.
External tables in Snowflake exist only to expose a collection of data files (commonly found in data-lake architectures) as a qualified table without requiring a load first.
Querying your Oracle tables will currently require an explicit export of its data onto a cloud storage location to allow Snowflake to access it.
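Once the export lands in cloud storage, exposing it to Snowflake is a short piece of DDL. A minimal sketch, assuming an S3 bucket, stage name, and Parquet export format that are all placeholders:

```sql
-- All names here (oracle_export_stage, the bucket path) are illustrative.
CREATE STAGE oracle_export_stage
  URL = 's3://my-bucket/oracle-export/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

-- With no column list, the external table exposes each row as a
-- single VARIANT column named VALUE that you can query directly.
CREATE EXTERNAL TABLE oracle_orders_ext
  WITH LOCATION = @oracle_export_stage
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;
```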
I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS S3 into Snowflake and then perform some modeling with DataRobot.
We have some tables that contain PII, and we would like to hide those columns from DataRobot. What do you suggest for this problem?
Also, does the schema in AWS need to match the schema in Snowflake for the copying process?
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options for using Snowflake:
Use COPY INTO statements to load the data into the tables
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
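The first option above can be sketched in a few statements. This is illustrative only: the stage name, bucket path, and target table are assumptions, and the file format should match your actual export:

```sql
-- Hypothetical stage pointing at the S3 export location.
CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/exports/'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Bulk-load the files under the customer/ prefix into an existing table.
COPY INTO customer
  FROM @my_s3_stage/customer/
  ON_ERROR = 'ABORT_STATEMENT';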
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish rules that obfuscate the data (or null it out) for the role that DataRobot is using.
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: one would be, say, data_owner (the role that creates the table and loads the data into it) and another, say, data_modelling (the role DataRobot uses).
Create masking policies as the data owner such that the DataRobot role cannot see the column data.
About your question on copying the data, there is no requirement that the AWS S3 folder structure be in sync with Snowflake. You can create the external stage with any name and point it to any S3 folder.
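A minimal sketch of such a policy, assuming hypothetical role names (DATA_OWNER, DATA_MODELLING) and a customer table with an email column:

```sql
-- Only DATA_OWNER sees the real value; every other role (including
-- the one DataRobot uses) sees NULL.
CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'DATA_OWNER' THEN val
    ELSE NULL
  END;

ALTER TABLE customer MODIFY COLUMN email
  SET MASKING POLICY mask_pii;
```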
Snowflake documentation has a good example that helps you get some hands-on experience:
https://docs.snowflake.com/en/user-guide/data-load-s3.html
We know SHOW CREATE TABLE in Hive gives the storage path. I am checking how to find the storage path for a Snowflake table; I don't see SHOW CREATE TABLE or DESC TABLE giving a storage path for a table.
One of the main advantages of Snowflake Data Platform is automatic storage handling:
Key Concepts & Architecture
Database Storage
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.
Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
The two concepts confused me a lot recently.
Snowflake the database refers to the data service, whose website address is below:
https://www.snowflake.com/
This is more like a data platform or data warehouse on the cloud that provides SQL engine functionalities.
On the other hand, a snowflake schema is a data-modeling technique for designing database schemas.
Are they two totally different things that just happen to share the same name coincidentally?
Databases and schemas are used to organize data stored in Snowflake:
A database is a logical grouping of schemas. Each database belongs to a single Snowflake account.
A schema is a logical grouping of database objects (tables, views, etc.). Each schema belongs to a single database.
Together, a database and schema comprise a namespace in Snowflake.
Source: https://docs.snowflake.com/en/sql-reference/ddl-database.html
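The namespace hierarchy above can be illustrated with a few statements; all object names here are hypothetical:

```sql
CREATE DATABASE analytics;
CREATE SCHEMA analytics.marketing;
CREATE TABLE analytics.marketing.campaigns (id INT, name STRING);

-- The fully qualified name is database.schema.object:
SELECT * FROM analytics.marketing.campaigns;
```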
I am working on an assignment that needs to store files uploaded by the user. I thought of storing them in the filesystem and storing the path in the DB. Usually the files will be under 5 MB.
I am wondering whether I can store the file in a NoSQL DB and keep a reference to it (the file) in my Postgres DB. Kindly help; thanks in advance.
Are you looking for FOREIGN DATA WRAPPERS in PostgreSQL?
PostgreSQL allows you to store and access data that resides outside PostgreSQL's own storage via Foreign Data Wrappers. FDWs are like extensions to PostgreSQL: you can piggyback on PostgreSQL's SQL support while the data lives in your wrapper's backing store. Such tables are called FOREIGN TABLES.
From the docs,
A foreign table can be used in queries just like a normal table, but a
foreign table has no storage in the PostgreSQL server.
There are various FDWs available for NoSQL databases. Refer link
More about FDWs
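As a concrete sketch, here is what this looks like with the third-party mongo_fdw extension (server address, database, and collection names are all placeholders, and mongo_fdw must be installed separately):

```sql
CREATE EXTENSION mongo_fdw;

CREATE SERVER mongo_server
  FOREIGN DATA WRAPPER mongo_fdw
  OPTIONS (address '127.0.0.1', port '27017');

CREATE USER MAPPING FOR CURRENT_USER SERVER mongo_server;

-- Maps a MongoDB collection to a foreign table.
CREATE FOREIGN TABLE uploaded_files (
  _id NAME,
  path TEXT,
  size_bytes INT
) SERVER mongo_server
  OPTIONS (database 'files', collection 'uploads');

-- Queried like any local table:
SELECT path FROM uploaded_files WHERE size_bytes < 5 * 1024 * 1024;
```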
I'm developing a web application using PHP and an RDBMS. Some of the data my application needs are stored in a remote database owned by another entity. I have limited read-only access to this other database. Is there an RDBMS capable of executing a query to the remote database and using the result as if it were a local table (i.e. satisfying foreign key relationships, JOINing, etc)? I would prefer FOSS, but it's not a requirement.
MySQL has the FEDERATED storage engine. You can create a table that references a table on a remote MySQL instance, and then use that table as if it were a local table.
This does have some limitations (e.g., no transaction support, as far as I recall), but it should work as you described.
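A minimal sketch, assuming placeholder host, credentials, and table names:

```sql
-- The local table definition must match the remote table's columns.
CREATE TABLE remote_products (
  id INT NOT NULL,
  name VARCHAR(100),
  PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='mysql://readonly_user:password@remote-host:3306/catalog/products';

-- Joins against local tables then work as usual:
SELECT o.id, p.name
FROM orders o
JOIN remote_products p ON p.id = o.product_id;
```

Note that the FEDERATED engine is disabled by default in modern MySQL builds and must be enabled at server startup.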