Reading data from Oracle Cloud Object Storage into a target database

My application, Fusion BICC, dumps data into Oracle Cloud Object Storage as CSV files, and I need to load this data into my target database. So I am loading the data into an external table, comparing the external table against the target table using MINUS, and then inserting records that are new and updating records that already exist. I need a few suggestions:
1) What is the best way to compare records when the data volume is huge?
2) Instead of writing to an external table, is there a better way (SQL*Loader, UTL_FILE, etc.)?
3) If a record is deleted in BICC, it no longer appears in the CSV file, but I have to delete such records from the target when they are not in the file. How do I tackle that?
Other than DBMS_CLOUD, is there any other package for loading the data? I am very new to this, so please advise.
To be clear, BICC is an application that dumps data as CSV files into Oracle Cloud; I am basically interested in reading data from cloud storage into DBaaS.
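For reference, here is a minimal sketch of that pattern using DBMS_CLOUD, a single MERGE in place of the MINUS-plus-insert/update passes, and a delete for rows that no longer appear in the file. All table, column, and credential names and the object storage URL are placeholders, and the delete step is only valid if the CSV is a full extract rather than an incremental one.

    -- Sketch only: external table over the BICC CSV dump (all names are placeholders).
    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name      => 'STG_EMPLOYEES_EXT',
        credential_name => 'MY_CRED',
        file_uri_list   => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/employees*.csv',
        format          => JSON_OBJECT('type' VALUE 'csv', 'skipheaders' VALUE '1'),
        column_list     => 'employee_id NUMBER, name VARCHAR2(200), updated_on DATE'
      );
    END;
    /

    -- One upsert instead of MINUS plus separate INSERT and UPDATE passes.
    MERGE INTO employees t
    USING stg_employees_ext s
       ON (t.employee_id = s.employee_id)
    WHEN MATCHED THEN UPDATE SET t.name = s.name, t.updated_on = s.updated_on
    WHEN NOT MATCHED THEN INSERT (employee_id, name, updated_on)
                          VALUES (s.employee_id, s.name, s.updated_on);

    -- Rows deleted in BICC no longer appear in the file: remove them from the target.
    -- Only valid when the CSV is a full extract of the source table.
    DELETE FROM employees t
     WHERE NOT EXISTS (SELECT 1 FROM stg_employees_ext s
                        WHERE s.employee_id = t.employee_id);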

Related

External table in Snowflake should automatically read and point to the latest files created in Azure Data Lake Storage

Every time my Azure pipeline runs, a new file gets created in Azure Data Lake Storage, and I want the external table already created for this table to point to the latest file created in the data lake.
I have multiple Parquet files for the same table in blob storage, and we want the external table in Snowflake to read the latest Parquet file.
Have you checked out the section in the Snowflake documentation that covers the steps required to configure automatic refresh of external tables using Azure Event Grid? If this is not suitable for your use case, can you provide more detail on your issue and why?
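For illustration only, the statement below shows the shape of such a setup; the stage, notification integration, and table names are invented, and the integration itself has to be created against Azure Event Grid first, as per the documentation.

    -- Sketch with hypothetical names; requires a notification integration wired to Azure Event Grid.
    CREATE OR REPLACE EXTERNAL TABLE my_ext_table
      INTEGRATION   = 'MY_AZURE_NOTIFICATION_INT'
      WITH LOCATION = @my_adls_stage/my_table/
      FILE_FORMAT   = (TYPE = PARQUET)
      AUTO_REFRESH  = TRUE;

    -- Manual fallback if event-based refresh is not configured:
    ALTER EXTERNAL TABLE my_ext_table REFRESH;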

Creating a table in Snowflake from an external stage when there are several files in blob storage

I'm trying to create an external table in Snowflake using an external stage in Azure Blob Storage. The query runs without errors, but since there are several files in the external stage, I can't create a table with data from the specific file I want to load. Even though I have tried specifying the file name with different paths, I still don't get the table with the right data.
Does anyone know how to specify one file among many files in an external stage?
I would also like to update the table in Snowflake every time the file is updated in Azure Blob Storage. How would you do that?
Thank you in advance!
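One way this is commonly handled (the stage, table, and file names below are made up) is to narrow the stage location to a sub-path and/or add a PATTERN so that only the intended file is picked up, or to load a single file directly with COPY INTO:

    -- Sketch only; @my_blob_stage, sales_ext and the file name are hypothetical.
    CREATE OR REPLACE EXTERNAL TABLE sales_ext
      WITH LOCATION = @my_blob_stage/2023/
      PATTERN       = '.*sales_latest[.]csv'
      FILE_FORMAT   = (TYPE = CSV SKIP_HEADER = 1);

    -- Or load just that one file into a regular table:
    COPY INTO sales
      FROM @my_blob_stage/2023/sales_latest.csv
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);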

Load all CSVs from a path on a local drive into Azure SQL DB with auto-created tables

I frequently need to validate CSVs submitted by clients to make sure that the headers and values in the file meet our specifications. Typically I do this by using the Import/Export Wizard and having the wizard create the table based on the CSV (the file name becomes the table name, and the headers become the column names). Then we run a set of stored procedures that check the information_schema for said table(s) and match that up with our specs, etc.
Most of the time this involves loading multiple files at a time for a client, which becomes very time-consuming and laborious when using the Import/Export Wizard. I tried using an xp_cmdshell SQL script to load everything from a path at once to achieve the same result, but xp_cmdshell is not supported in Azure SQL DB.
https://learn.microsoft.com/en-us/azure/azure-sql/load-from-csv-with-bcp
The above says that one can load using bcp, but it also requires the table to exist before the import... I need the table structure to mimic the CSV. Any ideas here?
Thanks
If you want to load the data into your target SQL DB, you can use Azure Data Factory (ADF) to upload your CSV files to Azure Blob Storage and then use a Copy Data activity to load the data from the CSV files into Azure SQL DB tables, without creating those tables upfront.
ADF supports 'auto create' of sink tables. See this, and this

How to move data from S3 to Snowflake

I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS/S3 into Snowflake and then perform some modeling with DataRobot.
We have some tables that contain PII data, and we would like to hide those columns from DataRobot. What suggestions do you have for this?
Does the schema in AWS need to match the schema in Snowflake for the copying process?
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options for using Snowflake:
Use COPY INTO statements to load the data into the tables (a minimal sketch follows this list)
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
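A minimal sketch of the first two options, assuming a stage over the S3 bucket already exists and using made-up stage and table names:

    -- Sketch only; my_s3_stage and my_table are placeholders.
    COPY INTO my_table
      FROM @my_s3_stage/orders/
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
      ON_ERROR = 'ABORT_STATEMENT';

    -- Snowpipe variant for continuous loading (auto-ingest driven by S3 event notifications):
    CREATE OR REPLACE PIPE my_pipe AUTO_INGEST = TRUE AS
      COPY INTO my_table
        FROM @my_s3_stage/orders/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);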
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish rules that obfuscate the data (or null it out) for the role that DataRobot is using.
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: one would be, say, data_owner (the role that creates the table and loads the data into it) and another, say, data_modelling (the role used by DataRobot).
Create masking policies as the data_owner role so that the DataRobot role cannot see the column data.
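To make that concrete, a hedged sketch of such a policy (the role, table, and column names are illustrative only):

    -- Sketch only; DATA_OWNER, customers and email are placeholder names.
    CREATE MASKING POLICY pii_mask AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('DATA_OWNER') THEN val
        ELSE '***MASKED***'
      END;

    -- Applied by the data owner; the role used by DataRobot then sees only masked values.
    ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY pii_mask;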
About your question on copying the data: there is no requirement that the AWS S3 folder be in sync with Snowflake. You can create the external stage with any name and point it to any S3 folder.
The Snowflake documentation has a good example that helps to get some hands-on experience:
https://docs.snowflake.com/en/user-guide/data-load-s3.html
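As an example of that flexibility (the bucket path and integration name are assumptions), a stage can be named arbitrarily and pointed at any folder:

    -- Sketch only; the stage name has no relation to the S3 path it points to.
    CREATE OR REPLACE STAGE any_name_you_like
      URL = 's3://my-bucket/any/folder/'
      STORAGE_INTEGRATION = my_s3_integration;  -- or CREDENTIALS = (...) if no integration is set up

    LIST @any_name_you_like;  -- verify the files are visible before loading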

What is the better and faster way to save and retrieve files: Azure Blob Storage or SQL Server?

[Background]
Now I am creating a WCF service for storing and retrieving articles of our university.
I need to save files and the metadata of these files.
My WCF service will be used by 1,000 people a day.
The storage will contain about 60,000 articles.
I have three different ways to do it:
I can save the metadata (file name, file type) in SQL Server (to create a unique id) and save the files into Azure Blob Storage.
I can save the metadata and the data in SQL Server.
I can save the metadata and the data in Azure Blob Storage.
Which way would you choose, and why?
If you suggest your own solution, that would be wonderful.
P.S. Both of them use Azure.
I would recommend going with option 1: save the metadata in the database but save the files in blob storage. Here are my reasons:
Blob storage is meant for exactly this purpose. As of today an account can hold 500 TB of data and each blob can be up to 200 GB in size, so space is not a limitation.
Compared to SQL Server, it is extremely cheap to store data in blob storage.
The reason I recommend storing the metadata in the database is that blob storage is a simple object store without any querying capabilities. So if you want to search for files, you can query your database to find them and then return the file URLs to your users.
However, please keep in mind that because these (the database server and blob storage) are two distinct data stores, you won't be able to achieve transactional consistency. When creating files, I would recommend uploading the file to blob storage first and then creating the record in the database. Likewise, when deleting files, I would recommend deleting the record from the database first and then removing the blob. If you're concerned about orphaned blobs (i.e. blobs without a matching record in the database), I would recommend running a background task that finds orphaned blobs and deletes them.
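To illustrate the metadata side of option 1 (the table and column names are just an example), the database only needs enough information to find a file and hand back its blob URL:

    -- Sketch of a metadata table for option 1; column names are illustrative.
    CREATE TABLE ArticleFiles (
        FileId     INT IDENTITY(1,1) PRIMARY KEY,   -- unique id generated by SQL Server
        FileName   NVARCHAR(260)  NOT NULL,
        FileType   NVARCHAR(50)   NOT NULL,
        BlobUrl    NVARCHAR(1000) NOT NULL,          -- location of the file in Azure Blob Storage
        UploadedOn DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Searching happens in the database; only the URL goes back to the caller.
    SELECT BlobUrl
    FROM ArticleFiles
    WHERE FileName LIKE N'%thesis%';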