Since KahaDB is used to store persistent data, is there any way to access the unconsumed messages in the database? Please suggest a UI through which the data in KahaDB can be accessed. Is there any way to run queries against the data in KahaDB? Please help me out with a solution.
Is there any query browser that can be used to query KahaDB?
KahaDB is a file-based persistence store used by ActiveMQ to store queued message data. It writes the data to journal files, with a B-tree index maintained over the journal. From what I have found, there are no query browsers available to query KahaDB, but there are options to inspect the journal files in which KahaDB stores the data.
I was unable to find any UI to access it; instead, you can use the amq-kahadb-tool below to see what's inside and get a summary of the KahaDB journal logs:
https://github.com/Hill30/amq-kahadb-tool
I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS S3 into Snowflake and then perform some modeling with DataRobot.
We have some tables that contain PII data, and we would like to hide those columns from DataRobot. What do you suggest for this problem?
Does the schema in AWS need to match the schema in Snowflake for the copy process?
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options in Snowflake (a rough sketch of the first two follows the list below):
Use COPY INTO statements to load the data into the tables
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
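A rough sketch of the first two options, assuming hypothetical object names (my_s3_stage, my_table, my_pipe) and that access to the bucket has already been set up through a storage integration or stage credentials:

    -- External stage over the S3 location (the name and path are arbitrary)
    CREATE OR REPLACE STAGE my_s3_stage
      URL = 's3://my-bucket/path/'
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Option 1: bulk load with COPY INTO
    COPY INTO my_table
    FROM @my_s3_stage
    PATTERN = '.*\.csv';

    -- Option 2: Snowpipe, which keeps auto-loading new files as they arrive
    CREATE PIPE my_pipe AUTO_INGEST = TRUE AS
      COPY INTO my_table FROM @my_s3_stage;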
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish policies that obfuscate the data (or null it out) for the role that DataRobot uses.
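A minimal sketch of such a policy, assuming a hypothetical table (customers), column (email), and role name (DATAROBOT_ROLE) for the role DataRobot connects with:

    -- Null out the value for the DataRobot role; other roles see the real data
    CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() = 'DATAROBOT_ROLE' THEN NULL
        ELSE val
      END;

    -- Attach the policy to the PII column
    ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY mask_pii;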
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: one, say data_owner (the role that creates the table and loads the data into it), and another, say data_modelling (the role DataRobot uses).
Create masking policies as data_owner so that the data_modelling role (and therefore DataRobot) cannot see the column data.
About your question on copying the data: there is no requirement that the AWS S3 folder structure be in sync with Snowflake. You can create the external stage with any name and point it at any S3 folder.
The Snowflake documentation has good examples to help you get some hands-on experience:
https://docs.snowflake.com/en/user-guide/data-load-s3.html
My application, Fusion BICC, is dumping data into Oracle Cloud Object Storage as CSV files. I need to load this data into my target database, so I am loading it into an external table and then comparing the external table with the target table using MINUS; if a record is new I insert it, and if it already exists I update it. I need a few suggestions:
1) What is the best way to compare records when the data volume is huge?
2) Instead of going through an external table, is there a better way? SQL*Loader, UTL_FILE, etc.?
3) If a record is deleted in BICC, it does not appear in the CSV file, but I have to delete such records when they are not in the file. How do I tackle that?
Other than DBMS_CLOUD, is there any other package to load the data? I am very new to this, so please advise.
Consider BICC simply as an application that dumps data as CSV files to Oracle Cloud; I am basically interested in reading data from cloud storage into DBaaS.
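For what it's worth, the compare-and-apply step described above is often written as a single MERGE against the external table instead of a MINUS followed by separate inserts and updates. A rough sketch, assuming hypothetical names (bicc_ext is the external table over the CSV files, target_tab is the target table, record_id is the key, col1/col2 are the payload columns):

    -- Insert new rows and update existing ones in one pass
    MERGE INTO target_tab t
    USING bicc_ext e
       ON (t.record_id = e.record_id)
    WHEN MATCHED THEN
      UPDATE SET t.col1 = e.col1,
                 t.col2 = e.col2
    WHEN NOT MATCHED THEN
      INSERT (record_id, col1, col2)
      VALUES (e.record_id, e.col1, e.col2);

    -- For question 3: remove rows missing from the latest file. This is only
    -- valid if each CSV is a full extract rather than an incremental one.
    DELETE FROM target_tab t
     WHERE NOT EXISTS (SELECT 1 FROM bicc_ext e WHERE e.record_id = t.record_id);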
I tried searching for Snowflake tags on metastack and superuser and couldn't find them, hence asking the question here.
I have two Snowflake accounts and I need to copy data from the production account to the testing account.
How can I do that? I read the Snowflake documentation on loading and unloading data using S3, but is there a quicker way to get the data across?
As Howard said, you could use Data Sharing. Create a share with the data you want to copy, grant the testing account access to the SHARE, and create a new database in the testing account from the share. You can then query that data as if it were in the test account. If you need an actual copy of the data, so you can change it, do a CTAS from the share into an empty table in another database. This will be much faster than unloading and loading all the data.
Here is the doc to get started: https://docs.snowflake.net/manuals/user-guide/data-sharing-intro.html
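A rough sketch of that flow, assuming hypothetical names for the database, table, share, and accounts:

    -- In the production account: create the share and expose the objects
    CREATE SHARE prod_to_test;
    GRANT USAGE ON DATABASE prod_db TO SHARE prod_to_test;
    GRANT USAGE ON SCHEMA prod_db.public TO SHARE prod_to_test;
    GRANT SELECT ON TABLE prod_db.public.my_table TO SHARE prod_to_test;
    ALTER SHARE prod_to_test ADD ACCOUNTS = testing_account;

    -- In the testing account: mount the share, then CTAS if a writable copy is needed
    CREATE DATABASE prod_share FROM SHARE prod_account.prod_to_test;
    CREATE TABLE test_db.public.my_table AS
      SELECT * FROM prod_share.public.my_table;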
Context:
I need to extract data from a DB owned by another team to run some modeling. The frequency of extraction is biweekly, and the data size is around 500k to 1 million rows.
Question:
So far I have either connected directly, by asking the other team to create a DB role for my extraction, or obtained a dump file from them.
What are some other ways we could extract the data? Would web services be a good option?
Thank you in advance.
[Background]
I am currently creating a WCF service for storing and retrieving articles for our university.
I need to save files and the metadata for those files.
My WCF service needs to serve about 1,000 people a day.
The storage will contain about 60,000 articles.
I see three different ways to do it:
1) Save the metadata (file name, file type) in SQL Server (to create a unique ID) and save the files in Azure Blob Storage.
2) Save both the metadata and the files in SQL Server.
3) Save both the metadata and the files in Azure Blob Storage.
Which way would you choose, and why?
If you can suggest your own solution, that would be wonderful.
P.S. Both of them (SQL Server and Blob Storage) use Azure.
I would recommend going with option 1: save the metadata in the database but save the files in blob storage. Here are my reasons:
Blob storage is meant for exactly this purpose. As of today, an account can hold 500 TB of data and each blob can be up to 200 GB in size, so space is not a limitation.
Compared to SQL Server, storing data in blob storage is extremely cheap.
The reason I recommend storing the metadata in a database is that blob storage is a simple object store without any querying capabilities. If you want to search for files, you can query your database to find them and then return the file URLs to your users.
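As an illustration, a minimal sketch of such a metadata table in SQL Server, with hypothetical column names; the file bytes themselves live in blob storage and only the blob URL is stored here:

    CREATE TABLE ArticleFiles (
        FileId      INT IDENTITY(1,1) PRIMARY KEY,    -- the unique id mentioned above
        FileName    NVARCHAR(255)  NOT NULL,
        ContentType NVARCHAR(100)  NOT NULL,           -- file type
        BlobUrl     NVARCHAR(1000) NOT NULL,           -- location of the blob in Azure Blob Storage
        UploadedAt  DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Searching for files is then an ordinary query; return BlobUrl to the user
    SELECT FileId, FileName, BlobUrl
    FROM ArticleFiles
    WHERE FileName LIKE '%thesis%';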
However, please keep in mind that because these (the database server and blob storage) are two distinct data stores, you won't be able to achieve transactional consistency. When creating files, I would recommend uploading the file to blob storage first and then creating the record in the database. Likewise, when deleting files, I would recommend deleting the record from the database first and then removing the blob. If you're concerned about orphaned blobs (i.e. blobs without a matching record in the database), I would recommend running a background task that finds the orphaned blobs and deletes them.