I am looking for the best way to move 2 TB of data from on-premises to Snowflake. The data is in zipped files of ~150 MB each, and similar files will be generated on an ongoing basis. We don't have a cloud account (only a Snowflake account), so we cannot use cloud-native storage such as S3 or Azure Blob Storage. We also want to use the public internet for connectivity from the on-premises network to the Snowflake database in the cloud (no VPN, Direct Connect, or third-party tool is available or to be used).
How can we best ensure that the data is secure while in transit from on-premises to the Snowflake database in the cloud?
And how can the data be loaded into Snowflake without using S3 or Azure Blob storage?
So you do not have an external cloud storage account to store these files in; one option I can see is to use SnowSQL to upload the files into Snowflake's internal stage locations with the PUT command. Have a look at the PUT command documentation at the following URL:
https://docs.snowflake.com/en/sql-reference/sql/put.html
It can upload files to a named internal stage as well as to the user and table internal stages.
There is an optional PARALLEL parameter that specifies the number of threads to use for uploading files; increasing the number of threads can improve performance when uploading large files. Larger files are automatically split into chunks, staged concurrently, and reassembled in the target stage, and a single thread can upload multiple chunks.
Uploaded files are automatically encrypted with 128-bit or 256-bit keys. The CLIENT_ENCRYPTION_KEY_SIZE account parameter specifies the key size used to encrypt the files.
Given the 2 TB of files to upload, you should experiment first with a number of smaller files.
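For illustration, here is a minimal sketch of that approach using the Snowflake Python connector (the same PUT statement can be issued from SnowSQL). The account, credentials, stage path, and file locations are hypothetical placeholders.

```python
# Minimal sketch: upload the zipped files to the user's internal stage with PUT.
# Assumes snowflake-connector-python; account/credentials/paths are placeholders.
import glob
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",      # hypothetical account identifier
    user="LOADER",
    password="********",
    warehouse="LOAD_WH",
    database="RAW",
    schema="PUBLIC",
)
cur = conn.cursor()

for path in glob.glob("/data/exports/*.gz"):
    # PARALLEL controls the number of upload threads; large files are split into
    # chunks, staged concurrently, and reassembled in the stage.
    # AUTO_COMPRESS=FALSE because the files are already compressed.
    cur.execute(f"PUT file://{path} @~/loads/ PARALLEL=8 AUTO_COMPRESS=FALSE")

cur.close()
conn.close()
```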
You can use any of the Snowflake connectors to move data directly from your on-premises servers to Snowflake. https://docs.snowflake.com/en/user-guide/conns-drivers.html
You can also start simply with the SnowSQL command-line interface, using PUT commands. https://docs.snowflake.com/en/user-guide/snowsql.html
All traffic to/from Snowflake is encrypted in transit with TLS/SSL. https://resources.snowflake.com/snowflake/automatic-encryption-of-data
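Once the files are in the internal stage, a COPY INTO statement (again issued through any driver or SnowSQL) loads them into the target table. The table name and file-format details below are assumptions for illustration only.

```python
# Sketch of the load step that follows the PUT: COPY INTO a table from the user stage.
# Table name and file format are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(account="xy12345", user="LOADER", password="********",
                                    warehouse="LOAD_WH", database="RAW", schema="PUBLIC")
cur = conn.cursor()
cur.execute("""
    COPY INTO raw.public.events
    FROM @~/loads/
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 COMPRESSION = 'GZIP')
    PURGE = TRUE  -- remove successfully loaded files from the stage
""")
cur.close()
conn.close()
```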
We have a SQL Server/Windows (2016) database that stores documents inside BLOB (varbinary) fields. We are looking to migrate these documents out of the database and store them in AWS S3 storage.
I have been able to read the BLOB data out of SQL Server in PowerShell by streaming it to a file on the local file system, and I can then use aws s3 cp ... to get the files out to S3. However, this approach requires an extra step: storing the file locally, on the SQL Server drive.
Is there a way to read binary data out of a SQL Server database and store it directly in S3? I tried Write-S3Object, but it looks like it expects text (System.String) for the contents, which does not work with the images/PDFs and other non-text documents we have.
Any suggestions, Powershell code samples, or references are much appreciated!
Thank you in advance
--Alex
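One way to avoid the local staging step is to stream each BLOB straight from the result set into S3. The sketch below uses Python (pyodbc + boto3) rather than PowerShell, purely for illustration; the table, column, and bucket names are hypothetical.

```python
# Hedged sketch: stream varbinary documents out of SQL Server and upload them to S3
# without writing a local file first. Assumes pyodbc and boto3 are installed;
# all names and connection details are placeholders.
import io
import boto3
import pyodbc

s3 = boto3.client("s3")
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=Docs;Trusted_Connection=yes;"
)
cur = conn.cursor()
cur.execute("SELECT DocId, FileName, Content FROM dbo.Documents")  # Content is varbinary(max)

for doc_id, file_name, content in cur:
    # upload_fileobj accepts any file-like object, so wrap the bytes in BytesIO
    # and push them straight to S3 instead of staging on the SQL Server drive.
    s3.upload_fileobj(io.BytesIO(content), "my-document-bucket", f"documents/{doc_id}/{file_name}")

cur.close()
conn.close()
```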
For staging in Snowflake, we need an AWS S3 layer, Azure storage, or a local machine. Instead of this, can we FTP a file from a source team directly to Snowflake internal storage, so that Snowpipe can pick up the file from there and load it into our Snowflake table?
If yes, please tell me how. If no, please confirm that as well. And if not, isn't it a big drawback of Snowflake to depend on other platforms every time?
You can use just about any driver from Snowflake to move files to an internal stage on Snowflake: ODBC, JDBC, Python, SnowSQL, etc. FTP isn't a very common protocol in the cloud, though. Snowflake has a lot of customers without any presence on AWS, Azure, or GCP that use Snowflake in this manner without issues.
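As a rough sketch of that idea, you could create a named internal stage and a pipe over it, then PUT files into the stage from any driver; for internal stages, Snowpipe is triggered through its REST API (insertFiles) rather than by cloud storage event notifications. The object names below are hypothetical.

```python
# Sketch of the "internal stage instead of FTP" idea: a named internal stage plus a
# pipe over it, with files delivered by PUT. All object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(account="xy12345", user="LOADER", password="********",
                                    warehouse="LOAD_WH", database="RAW", schema="PUBLIC")
cur = conn.cursor()
cur.execute("CREATE STAGE IF NOT EXISTS landing_stage")
cur.execute("""
    CREATE PIPE IF NOT EXISTS landing_pipe AS
    COPY INTO raw.public.events
    FROM @landing_stage
    FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
""")
# The source team pushes files with PUT; the pipe is then triggered via the
# Snowpipe REST API (insertFiles) since AUTO_INGEST only applies to external stages.
cur.execute("PUT file:///data/exports/batch_001.csv.gz @landing_stage AUTO_COMPRESS=FALSE")
cur.close()
conn.close()
```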
Hi guys, I know it is a general question, so let me give you my scenario:
I have a client who sends me a bunch of Excel files, and I use my on-premises SSIS package to export them to a database located on Azure. My SSIS package calls stored procedures on the Azure SQL server to manipulate the data.
I want to move the whole process to the cloud, and I want to know the best way to achieve it. I was thinking I could use Blob Storage: provide a cloud folder (container) on Azure and let my client drop the files there. Then an app (service) such as Data Factory could detect those files and run my SSIS package deployed on Azure, somehow.
Any ideas or sample code would be great.
Thanks!
You can try the manual approach below:
1. Copy all CSV files to ADLS (Azure Data Lake Storage). (For automation you can use the Copy activity together with ForEach and Lookup activities; a sketch of the upload follows this list.)
2. For any data transformation, use U-SQL jobs (ADLA), whose output is also stored in ADLS. You can save the U-SQL scripts in blob storage (for ADF automation).
3. To transfer the data from the ADLS files to the Azure SQL database, use the Copy activity of Azure Data Factory, with SQL as the sink and CSV as the source format.
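A minimal sketch of the upload in step 1, assuming the azure-storage-blob package and a hypothetical container that a Data Factory trigger watches; the connection string and paths are placeholders.

```python
# Drop the client's files into a blob container that Azure Data Factory can watch.
import glob
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("incoming-files")   # hypothetical container name

for path in glob.glob("/exports/*.csv"):
    blob_name = os.path.basename(path)
    with open(path, "rb") as fh:
        # overwrite=True keeps re-runs idempotent; an ADF event or schedule trigger
        # can pick the blob up from here and run the copy/SSIS pipeline.
        container.upload_blob(name=blob_name, data=fh, overwrite=True)
```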
Is there any way I can accomplish the functionality of FileTables in Azure SQL Database?
I know the FileTables functionality is not included in Azure SQL Database, so is there a way to accomplish this functionality?
I intend to save PDF files, which can be larger than 20 MB each, and there will be a lot of them, so I am thinking of a way to solve this...
Any suggestions?
One Azure solution for file storage is Azure Blob Storage.
Azure Blob storage is a service that stores unstructured data in the cloud as objects/blobs. Blob storage can store any type of text or binary data, such as a document, media file, or application installer. Blob storage is also referred to as object storage.
Common uses of Blob storage include:
Serving images or documents directly to a browser
Storing files for distributed access
Streaming video and audio
Storing data for backup and restore, disaster recovery, and archiving
Storing data for analysis by an on-premises or Azure-hosted service
It has a RESTful API as well as SDKs for all popular languages that will help you manage your blob files in Azure.
For more details see these pages:
https://learn.microsoft.com/en-ca/azure/storage/storage-dotnet-how-to-use-blobs
https://azure.microsoft.com/en-ca/services/storage/blobs/
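As a rough illustration of that pattern, the sketch below uploads a PDF to a blob container and records its URL in an Azure SQL table. It assumes the azure-storage-blob and pyodbc packages; all names and connection strings are placeholders.

```python
# Upload a PDF to Blob Storage and store only its URL in an Azure SQL table,
# instead of relying on a FileTable. Names/connection strings are hypothetical.
import os
import pyodbc
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("pdf-documents")

file_path = "invoice_2023_001.pdf"
with open(file_path, "rb") as fh:
    blob_client = container.upload_blob(name=os.path.basename(file_path), data=fh, overwrite=True)

sql = pyodbc.connect(os.environ["AZURE_SQL_CONNECTION_STRING"])
cur = sql.cursor()
# Store only the blob URL; the 20 MB+ binary stays in Blob Storage.
cur.execute("INSERT INTO dbo.Documents (FileName, BlobUrl) VALUES (?, ?)",
            os.path.basename(file_path), blob_client.url)
sql.commit()
cur.close()
sql.close()
```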
I have a standard WinForms application that connects to a SQL Server. The application allows users to upload documents which are currently stored in the database, in a table using an image column.
I need to change this approach so the documents are stored as files and a link to the file is stored in the database table.
With the current approach, when users upload a document they are shielded from how it is stored; since they have a connection to the database, they do not need to know anything about where the files are kept, and no special directory permissions etc. are required. If I set up a network share for the documents, I want to avoid any IT issues such as users needing access to that directory in order to upload or access existing documents.
What are the options available to do this? I thought of having a temporary database that the documents are uploaded to in the same way as the current approach, with a process running on the server to save them to the file store; this database could then be deleted and recreated to reclaim space. Are there any better approaches?
ADDITIONAL INFO: There is no web server element to my application so I do not think a WCF service is possible
Is there a reason why you want to get the files out of the database in the first place?
How about still saving them in SQL Server, but using a FILESTREAM column instead of IMAGE?
Quote from the link:
FILESTREAM enables SQL Server-based applications to store unstructured
data, such as documents and images, on the file system. Applications
can leverage the rich streaming APIs and performance of the file
system and at the same time maintain transactional consistency between
the unstructured data and corresponding structured data.
FILESTREAM integrates the SQL Server Database Engine with an NTFS file
system by storing varbinary(max) binary large object (BLOB) data as
files on the file system. Transact-SQL statements can insert, update,
query, search, and back up FILESTREAM data. Win32 file system
interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps
reduce any effect that FILESTREAM data might have on Database Engine
performance. The SQL Server buffer pool is not used; therefore, this
memory is available for query processing.
So you would get the best out of both worlds:
The files would be stored as files on the hard disk (probably faster compared to storing them in the database), but you don't have to care about file shares, permissions, etc.
Note that you need at least SQL Server 2008 to use FILESTREAM.
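For illustration, here is a minimal sketch of what the FILESTREAM-backed table might look like (assuming FILESTREAM is enabled on the instance and the database already has a FILESTREAM filegroup). It is executed through pyodbc only to stay consistent with the other examples; the CREATE TABLE itself is plain T-SQL, and the names are hypothetical.

```python
# Sketch of a FILESTREAM-backed documents table (SQL Server 2008+ with FILESTREAM
# enabled and a FILESTREAM filegroup added to the database). Names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=Docs;Trusted_Connection=yes;"
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE dbo.Documents (
        DocId    UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        FileName NVARCHAR(255)    NOT NULL,
        Content  VARBINARY(MAX)   FILESTREAM NULL  -- stored as a file on NTFS, queried as a column
    )
""")
conn.commit()
cur.close()
conn.close()
```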
I can tell you how I implemented this task: I wrote a WCF service which is used to send archived files. So, if I were you, I would create such a service, able both to save files and to send them back. This is easy; you just have to make sure that the user in whose context the WCF service runs has permission to read and write the files.
You could just have your application pass the object to a procedure (CLR maybe) in the database, which then writes the data out to a location of your choosing without storing the file contents in the table. That way you still have a layer of abstraction between the file store and the application, but you don't need a process that cleans up after you.
Alternatively a WCF/web service could be created which the application connects to. A web method could be used to accept the file contents and write them to the correct place, it could return the path to the file or some file identifier.
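As a rough sketch of that "service accepts the file contents and returns a path/identifier" idea, here is a small Flask app used instead of WCF purely for illustration; the share path and endpoint are hypothetical.

```python
# Tiny upload service: accept file contents, write them to a network share the
# service account can access, and return an identifier/path for the database link.
import os
import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)
STORE_ROOT = r"\\fileserver\documents"   # hypothetical share writable by the service account

@app.route("/documents", methods=["POST"])
def save_document():
    uploaded = request.files["file"]                      # file contents sent by the client app
    doc_id = str(uuid.uuid4())
    target = os.path.join(STORE_ROOT, doc_id, uploaded.filename)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    uploaded.save(target)
    # Return the identifier/path so the caller can store the link in the database table.
    return jsonify({"id": doc_id, "path": target})

if __name__ == "__main__":
    app.run(port=5000)
```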