Is there any way to accomplish the functionality of FileTables in Azure SQL Database?
I know the FileTables feature is not included in Azure SQL Database, so how can I achieve equivalent functionality?
I intend to save PDF files; each can be larger than 20 MB and there will be a lot of them, so I am looking for a way to handle this.
Any suggestions?
One Azure solution for file storage is Azure Blob Storage.
Azure Blob storage is a service that stores unstructured data in the cloud as objects/blobs. Blob storage can store any type of text or binary data, such as a document, media file, or application installer. Blob storage is also referred to as object storage.
Common uses of Blob storage include:
Serving images or documents directly to a browser
Storing files for distributed access
Streaming video and audio
Storing data for backup and restore, disaster recovery, and archiving
Storing data for analysis by an on-premises or Azure-hosted service
It has a RESTful API as well as SDKs for all popular languages that will help you manage your blob files in Azure.
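For example, here is a minimal sketch using the azure-storage-blob Python SDK; the connection string and container name are placeholders you would replace with your own:

```python
# Minimal sketch: upload a PDF to Azure Blob Storage with the Python SDK.
# The connection string and container name below are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("pdfs")

# Stream the local PDF into a block blob; large files are uploaded in chunks.
with open("invoice.pdf", "rb") as f:
    container.upload_blob(name="invoice.pdf", data=f, overwrite=True)
```

You can then store the blob name or URL in an Azure SQL Database table so the rest of your data model can reference the file.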
For more details see these pages:
https://learn.microsoft.com/en-ca/azure/storage/storage-dotnet-how-to-use-blobs
https://azure.microsoft.com/en-ca/services/storage/blobs/
I need to design a scalable database architecture to store all the data coming from flat files - CSV, HTML, etc. These files come from Elasticsearch, and most of the scripts are written in Python. This architecture should automate most of the daily manual processing currently done in Excel, CSV, and HTML, and all data will be retrieved from this database instead of being maintained in CSV and HTML files.
Database requirements:
The database must perform well when retrieving data on a day-to-day basis, and it will be queried by multiple teams.
An ER model and schema will be developed for the data with logical relationships.
The database can be hosted in cloud storage.
The database must be highly available and able to retrieve data quickly.
This database will be utilized to create multiple dashboards.
The ETL jobs will be responsible for storing data in the database.
There will be many reads from the database and multiple writes each day with lots of data coming from Elastic Search and some of the cloud tools.
I am considering RDS, Azure SQL, DynamoDB, Postgres, or Google Cloud. I would like to know which database engine is the better fit for these requirements. I also want to know how the ETL process should be designed - lambda or kappa architecture.
To store relational data such as CSV and Excel files, you can use a relational database. For flat files like HTML, which do not need to be queried, you can simply use a storage account with any cloud service provider, for example Azure.
Azure SQL Database is a fully managed platform as a service (PaaS) database engine that handles most of the database management functions such as upgrading, patching, backups, and monitoring without user involvement. Azure SQL Database is always running on the latest stable version of the SQL Server database engine and patched OS with 99.99% availability. You can restore the database at any point of time. This should be the best choice to store relational data and perform SQL query.
Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Your HTML files can be stored here.
The ETL jobs can be performed using Azure Data Factory (ADF). It lets you connect almost any data source (including sources outside Azure), transform the stored dataset, and store it in the desired destination. Data flow transformations in ADF can handle all of the ETL-related tasks.
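As a rough sketch of what one such ETL job could look like in Python (all server, database, container, and table names here are placeholder assumptions, and pandas, SQLAlchemy, and azure-storage-blob are assumed as client libraries):

```python
# Sketch of a daily ETL job for the hybrid layout above:
# tabular CSV data goes to Azure SQL, raw HTML files go to Blob Storage.
import pandas as pd
from sqlalchemy import create_engine
from azure.storage.blob import BlobServiceClient

# Placeholder connection details for an Azure SQL database.
engine = create_engine(
    "mssql+pyodbc://etl_user:<password>@<server>.database.windows.net/reporting"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)

# Structured data: read the CSV export and append it to a relational table.
df = pd.read_csv("daily_export.csv")
df.to_sql("daily_metrics", engine, if_exists="append", index=False)

# Unstructured data: push the raw HTML report to Blob Storage as-is.
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("html-reports")
with open("daily_report.html", "rb") as f:
    container.upload_blob(name="reports/daily_report.html", data=f, overwrite=True)
```

In ADF itself the same flow would be a pipeline with a Copy activity (or data flow) per source, but the division of labour - rows into Azure SQL, files into Blob Storage - stays the same.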
I am confused by the explanations given on multiple forums about database storage for Snowflake. When they say that data is stored as columnar, optimised files in cloud storage, does that mean an S3 bucket or Azure Blob Storage? Does Snowflake store the data itself, or does it use the cloud host's storage?
According to the paper The Snowflake Elastic Data Warehouse (2016) - see paragraph 3.1 Data Storage:
Snowflake initially chose Amazon S3 to store table data, query results, and temp data generated by query operators (e.g. massive joins) once local disk space is exhausted, as well as for large query results. Metadata such as catalog objects, which table consists of which S3 files, statistics, locks, transaction logs, etc. is stored in a scalable, transactional key-value store, which is part of the Cloud Services layer.
Since then, and as of today, Snowflake has been made available to run also on Azure and Google Cloud.
Therefore, when setting up a Snowflake account, the user is presented with the option of a cloud storage provider to use: for AWS Snowflake will use Simple Storage Service (S3), for Azure it will use Azure Blob Storage, and for Google Cloud it will use Google Cloud Storage (GCS).
The database storage is files in S3 on AWS, Azure Blob Storage on Azure, and Google Cloud Storage buckets on Google Cloud. Compute and storage are completely separate, unlike server-based RDBMSs such as Redshift, where the servers provide both compute and storage. See the Snowflake documentation for more detail.
I am looking for the best way to move 2 TB of data from on-premises to Snowflake. The data is in zipped files of ~150 MB each, and similar files will be generated on an ongoing basis. As we don't have a cloud account (only a Snowflake account), we cannot use cloud-native storage like S3 or Azure Blob Storage. We also want to use the public internet to connect from the on-premises network to the Snowflake DB in the cloud (no VPN, direct connect, or third-party tool is to be used).
How can we best ensure that the data is secure while in transit from on-premises to the Snowflake DB in the cloud?
And how can the data be loaded into Snowflake without using S3 or Azure Blob Storage?
Since you do not have an external cloud storage account to stage these files in, one option is to use SnowSQL to upload the files into Snowflake's internal storage (an internal stage) with the PUT command. Have a look at the PUT command documentation at the following URL:
https://docs.snowflake.com/en/sql-reference/sql/put.html
It can upload files to a named internal stage as well as to the user and table internal stages.
There is an optional PARALLEL parameter that specifies the number of threads to use for uploading files; increasing the number of threads can improve performance when uploading large files. Larger files are automatically split into chunks, staged concurrently, and reassembled in the target stage, and a single thread can upload multiple chunks.
Uploaded files are automatically encrypted with 128-bit or 256-bit keys. The CLIENT_ENCRYPTION_KEY_SIZE account parameter specifies the size of the key used to encrypt the files.
Given the 2 TB of files to upload, you should experiment with the PARALLEL setting and with splitting the data into multiple smaller files.
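As a sketch, the same PUT/COPY flow can also be driven from Python with the Snowflake connector (snowflake-connector-python); the account, credentials, stage path, and table name below are placeholders:

```python
# Sketch: upload local gzip files to a Snowflake internal stage and load them.
# All connection details and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="LOAD_WH",
    database="RAW",
    schema="LANDING",
)
cur = conn.cursor()

# PUT uploads the files to the user's internal stage over an encrypted
# connection; PARALLEL controls the number of upload threads.
cur.execute("PUT file:///data/exports/*.gz @~/exports/ PARALLEL=8")

# COPY INTO loads the staged files into the target table.
cur.execute(
    "COPY INTO my_table FROM @~/exports/ "
    "FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)"
)

conn.close()
```

The same two statements can of course be run directly in SnowSQL instead of through the connector.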
You can use any of the Snowflake connectors to move data directly from your on-premises servers to Snowflake. https://docs.snowflake.com/en/user-guide/conns-drivers.html
You can also start simply with the command-line interface snowsql using put commands. https://docs.snowflake.com/en/user-guide/snowsql.html
All traffic to/from Snowflake is encrypted in-transit with SSL. https://resources.snowflake.com/snowflake/automatic-encryption-of-data
I'm trying to find a way to store images and 3d assets as I think they shouldn't be stored in the database (Azure Database for MySQL).
I'm thinking that it's better to store them in a cloud storage, so I chose Azure Blob Storage for this reason.
I think I should store the file name and other properties in the database, and use the fileID returned from the database to obtain the location of the item in Azure Blob Storage.
But I am worried about transactional safety: what should I do in the event of failure? E.g., if uploading to Azure Blob Storage fails, should I delete the database record to 'undo' the operation?
Is there a better way to do this? Is it recommended at all?
Edit: On second thought, a better idea might be to first upload to Azure Blob Storage, and then only after a successful upload, to upload a record onto the database. I think it is safer this way.
In a try-catch block, check whether the image's URL already exists; if not, upload the image to storage and get the file URL.
If the upload succeeds, start a transaction and add the new record to the database with the image URL.
If the transaction fails, delete the image.
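A minimal sketch of that upload-first, compensate-on-failure pattern in Python, assuming azure-storage-blob and PyMySQL as client libraries and placeholder names for the container, table, and connection details:

```python
# Sketch: upload the asset to Blob Storage first, then record it in MySQL;
# if the database insert fails, delete the orphaned blob as compensation.
import pymysql
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("assets")

def save_asset(local_path: str, blob_name: str) -> None:
    # Step 1: upload the file and get its URL.
    with open(local_path, "rb") as f:
        blob_client = container.upload_blob(name=blob_name, data=f)
    blob_url = blob_client.url

    # Step 2: record the blob URL in the database inside a transaction.
    conn = pymysql.connect(host="<server>.mysql.database.azure.com",
                           user="<user>", password="<password>", database="app")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO assets (file_name, blob_url) VALUES (%s, %s)",
                (blob_name, blob_url),
            )
        conn.commit()
    except Exception:
        conn.rollback()
        # Compensating action: remove the blob so no orphan is left behind.
        container.delete_blob(blob_name)
        raise
    finally:
        conn.close()
```

A periodic cleanup job for blobs that have no matching database row (or vice versa) is still worth having, since the compensating delete itself can fail.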
What Datastore/Database runs on top of Amazon S3 or S3-compatible storage?
I understand that S3 is object storage and thus not a database, but a database must have something to store its data in. So my question is: is there a database or datastore that saves its data in Amazon S3 or S3-compatible storage instead of a local file system?
Here are some databases and database-like products that use S3 (or can use S3).
Amazon Athena
S3 Select
Apache HBase
Redshift
Also, if you want some theory, here's an MIT paper about Building a Database on S3.
This is by no means exhaustive, but it’s probably a good place to start.
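To make the "query engine on top of S3" idea concrete, here is a small sketch using S3 Select through boto3; the bucket, key, and column names are placeholder assumptions:

```python
# Sketch: run a SQL expression directly against a CSV object in S3
# using S3 Select. Bucket, key, and column names are placeholders.
import boto3

s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="my-bucket",
    Key="events/2024/orders.csv",
    ExpressionType="SQL",
    Expression="SELECT s.user_id, s.amount FROM s3object s "
               "WHERE CAST(s.amount AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result comes back as a stream of events; print the record payloads.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```

Athena works the same way conceptually but lets you define tables over S3 prefixes and query them with standard SQL.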
Update
Here are some more that aren't AWS owned software.
Cassandra
Hadoop—this isn't a database, but S3 already provides you with key-value storage, and Hadoop can provide you with querying.
s3-db
Ultimately, you need to consider what sort of query functionality you need and what sort of consistency you can tolerate.