Need inputs on ways to achieve this scenario
In Data Sharing between two accounts, I understand the Consumer can read (SELECT) data provided by the Provider.
Is it possible for the Consumer to insert/update data in the same table provided by the Provider?
Here is the scenario I would like to achieve.
The Provider shares TABLE-A with 3 columns (Value1, Value2, AggValue). The Provider inserts data only into the Value1 and Value2 columns.
The Consumer performs calculations by reading data from the Value1 and Value2 columns and updates the AggValue column in TABLE-A provided by the Provider.
The Provider then reads the data in the AggValue column that the Consumer has updated.
Note: it is a single table, TABLE-A, that both the Provider and the Consumer are acting on.
Is the above scenario possible to implement using Data Sharing? If not, what are the suggested alternatives?
Thanks & Appreciate your response.
So, this can't be done directly with a single data share, but it can be accomplished using two data shares, one in each direction.
In your scenario, Provider A shares a table containing Value1 and Value2 with Consumer B. Consumer B then uses that data to populate a table in another database with Value1, Value2, and AggValue. Consumer B then shares that database back to Provider A.
Shares are single-direction, but a share can be created on each account going to the other account.
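Assuming this is Snowflake data sharing, a rough sketch of the two shares could look like the following SQL. The account identifiers (provider_acct, consumer_acct), database and schema names, and the AggValue calculation are all placeholders, not anything from your actual setup:

-- On the Provider account: share TABLE-A (Value1, Value2) to the Consumer
CREATE SHARE provider_share;
GRANT USAGE ON DATABASE provider_db TO SHARE provider_share;
GRANT USAGE ON SCHEMA provider_db.public TO SHARE provider_share;
GRANT SELECT ON TABLE provider_db.public.table_a TO SHARE provider_share;
ALTER SHARE provider_share ADD ACCOUNTS = consumer_acct;

-- On the Consumer account: mount the share, compute AggValue into a local table,
-- and share that table back in the other direction
CREATE DATABASE provider_data FROM SHARE provider_acct.provider_share;
CREATE TABLE consumer_db.public.table_a_agg AS
    SELECT value1, value2, value1 + value2 AS aggvalue   -- placeholder calculation
    FROM provider_data.public.table_a;
CREATE SHARE consumer_share;
GRANT USAGE ON DATABASE consumer_db TO SHARE consumer_share;
GRANT USAGE ON SCHEMA consumer_db.public TO SHARE consumer_share;
GRANT SELECT ON TABLE consumer_db.public.table_a_agg TO SHARE consumer_share;
ALTER SHARE consumer_share ADD ACCOUNTS = provider_acct;

-- Back on the Provider account: read the aggregated values shared by the Consumer
CREATE DATABASE consumer_data FROM SHARE consumer_acct.consumer_share;
SELECT aggvalue FROM consumer_data.public.table_a_agg;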
I have some services which communicate with a central Core-service. Each time, a service passes a Reference Id to the Core-service, and the Core-service checks for duplicates and stores the Reference Id along with the service request. Currently I check for duplicate data against a DB table, which is too time-consuming. The Reference Id is generated in the services; it can't be generated from a single point in this case.
1. How can I check for duplicate data with less time lost?
2. Do I need a NoSQL DB for the Reference Id check, or anything else?
The following query will return the duplicate Reference Ids in the table.
SELECT referenceId, COUNT(referenceId)
FROM table
GROUP BY referenceId
HAVING COUNT(referenceId) > 1
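If that check is slow because the table is large, an index on referenceId lets the query avoid a full table scan; and if duplicates should never be stored at all, a unique constraint pushes the check into the database itself so no separate query is needed. A rough sketch, using a hypothetical table name service_request:

-- speeds up the duplicate-detection query above
CREATE INDEX ix_service_request_reference_id ON service_request (referenceId);

-- alternatively, reject duplicates at insert time so no separate check is needed
ALTER TABLE service_request ADD CONSTRAINT uq_service_request_reference_id UNIQUE (referenceId);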
Let's say I have a database where I store users' tasks.
To simplify, let's say that a task has:
id
status (New, To do, In progress, Done, Rejected)
description
owner/creator
Now, what I want to achieve is this: when a user finishes a task (marks it Done or Rejected), I want the user to provide some info about that state change. I imagine a popup with two fields: one to choose between the Done and Rejected statuses, and a text area to fill in the reason.
The problem I have here is: how do I store the information about the task completion?
My idea is to create a table like task_completions which will have:
id
task_id
comment (varchar or whatever)
Now the thing is: when I load a task for a user from the database and I see it's completed (again, Done or Rejected) I need to load the information about the task completion. So it would result in two API calls:
get task
get task_completion, where task_completion.task_id == task_id
There's of course another way: I can add a task_completion field to my task table and make it nullable, since it won't be filled with anything until the task is completed. Then I won't have to make two API calls, since when using something like Django serializers I will be able to "expand" the task_completion field from a plain number (the id from task_completion) into full nested JSON with information about the completion (id, task_id, comment), and I just have to call get task. And then I don't even have to provide the task_id field in the task_completion table.
So summing up, I think that I have 3 choices:
task and task_completion tables, where the task_completion table contains a task_id column: two API calls
task and task_completion tables, where the task table contains a task_completion_id column: one API call if I do it right
task and task_completion tables, where the task table contains a task_completion_id column and the task_completion table contains a task_id column: still one API call I guess, but does it make sense to have both of those columns in those tables? (A sketch of option 2 follows this list.)
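For illustration, a rough DDL sketch of option 2 (a nullable task_completion_id on the task table); the column types and sizes here are assumptions:

CREATE TABLE task_completion (
    id      INT PRIMARY KEY,
    comment VARCHAR(1000)
);

CREATE TABLE task (
    id                 INT PRIMARY KEY,
    status             VARCHAR(20) NOT NULL,   -- New, To do, In progress, Done, Rejected
    description        VARCHAR(1000),
    owner_id           INT NOT NULL,
    task_completion_id INT NULL REFERENCES task_completion (id)  -- NULL until Done/Rejected
);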
I need to move data between two databases and wanted to see if SSIS would be a good tool. I've pieced together the following solution, but it is much more complex than I was hoping it would be; any insight on a better approach to tackling this problem would be greatly appreciated!
So here's what makes my situation unique: we have a large volume of data, so to keep the system performant we have split our customers across multiple database servers. These servers have databases with the same schema, but each is populated with unique data. Occasionally we need to move a customer's data from one server to another. Because of this, simply recreating the tables and moving the data in place won't work: a table in the database on server A could have 20 records while the same table in the database on server B has 30, so when moving record 20 from A to B it will need to be assigned ID 31. Getting past this wasn't difficult, but the trouble comes when moving the tables which have a foreign key reference to what is now record 31...
An example: here's a sample schema, with a table to track manufacturers and a table to track products, each of which references a manufacturer, populated with some example data in the source database.
To handle moving this data while maintaining relational integrity, I've taken the approach of gathering the manufacturer records, looping through them, and for each manufacturer moving the associated products. At a high level, the Control Flow in SSDT works as follows.
The first Data Flow grabs the records from the source database and pulls them into a Recordset Destination: the OLE DB Source pulls all columns from the source database's Manufacturer table and places them into a recordset.
Back in the Control Flow, I then loop through the records in the Manufacturer recordset. For each record I execute a SQL task which determines the next available auto-incrementing ID in the destination database, inserts the record, and then returns the result of a SELECT MAX(ManufacturerID) in the Execute SQL Task result set, so that the newly created ManufacturerID can be used when inserting the related products into the destination database.
The above works; however, once you get more than a few layers deep with tables that reference one another, it is no longer very tenable. Is there a better way to do this?
You could always try this:
Populate your manufacturers table.
Get your products data (ensure you have a reference to the manufacturer, such as its name).
Use a lookup to get the ID where the name, or whatever you choose, matches.
Insert into database.
This will keep your FK constraints and not require you to do all that max key selection.
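Expressed in plain SQL rather than SSIS components, the lookup idea looks roughly like this. The database, table, and column names are hypothetical, and it assumes both databases are reachable from one connection (e.g. via linked servers):

-- copy manufacturers, letting the destination assign new identity values
INSERT INTO DestDB.dbo.Manufacturer (ManufacturerName)
SELECT s.ManufacturerName
FROM SourceDB.dbo.Manufacturer AS s;

-- copy products, resolving the new ManufacturerID by looking up the name
-- instead of carrying the old surrogate key across
INSERT INTO DestDB.dbo.Product (ProductName, ManufacturerID)
SELECT p.ProductName, d.ManufacturerID
FROM SourceDB.dbo.Product AS p
JOIN SourceDB.dbo.Manufacturer AS sm ON sm.ManufacturerID = p.ManufacturerID
JOIN DestDB.dbo.Manufacturer AS d ON d.ManufacturerName = sm.ManufacturerName;

In SSIS the same thing becomes a Lookup transformation on the name column between the product source and the destination insert.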
I have a Person database in SQL Server with tables like address, license, relatives, etc., about 20 of them. All the tables have an id column that is unique per person. There are millions of records in these tables. I need to combine these records for each person using their common id, and convert them to a JSON file with some column name changes. This JSON file then gets pushed to Kafka through a producer. An example with a Kafka producer as the item writer would be nice, but the real problem is understanding the strategy and specifics of how to utilize a Spring Batch item reader, processor, and item writer to create the composite JSON. This is my first Spring Batch application, so I am relatively new to this.
I am hoping for suggestions on an implementation strategy: use a composite reader or processor with the person id as the cursor, query each table using that id, convert the resulting records to JSON, and aggregate them into a composite, relational JSON document with root element PersonData that is fed to the Kafka cluster.
Basically I have one data source, the same database, for the reader. I plan to use the Person table to fetch the id and the other records unique to the person, use the id in the WHERE clause for the 19 other tables, convert each result set to JSON, then compose the JSON object at the end and write it to Kafka.
We had such a requirement in a project and solved it with the following approach.
In a split flow that runs in parallel, we had a step for every table that loaded the table's data into a file, sorted by the common id (this is optional, but it is easier for testing if you have the data in files).
Then we implemented our own "MergeReader".
This MergeReader had a FlatFileItemReader for every file/table (let's call them dataReaders). All these FlatFileItemReaders were wrapped in a SingleItemPeekableItemReader.
The logic for the read method of the MergeReader is as follows:
public MyContainerPerId read() throws Exception {
    // you need a container to store the items that belong together
    MyContainerPerId container = new MyContainerPerId();

    // peek through all "dataReaders" to find the lowest current key
    int lowestId = searchLowestKey();

    // MyItem stands in for the common item type exposed by every dataReader
    for (SingleItemPeekableItemReader<MyItem> dataReader : dataReaders) {
        // more than one entry in a table can belong to the same person id,
        // so keep reading while the peeked item still carries the lowest id
        while (dataReader.peek() != null && dataReader.peek().getId() == lowestId) {
            container.add(dataReader.read());
        }
    }

    // the container now holds all entries from all tables
    // that belong to the same person id
    return container;
}
If you need restart capability, you have to implement ItemStream in such a way that it keeps track of the current read position for every dataReader.
I used the Driving Query Based ItemReaders usage pattern described here to solve this issue.
Reader: just a default implementation of JdbcCursorItemReader with SQL to fetch the unique relational id (e.g. select id from person).
Processor: uses this long id as input; a DAO I implemented with Spring's JdbcTemplate fetches data through queries against each of the tables for that specific id (e.g. select * from license where id=), maps the results to a Person POJO, then converts it to a JSON object (using Jackson) and then to a string.
Writer: either write the file out with the JSON string, or publish the JSON string to a topic in the Kafka case.
We went through a similar exercise, migrating 100M+ rows from multiple tables as JSON so that we could post it to a message bus.
The idea is to create a view, de-normalize the data, and read from that view using JdbcPagingItemReader. Reading from one source has less overhead.
When you de-normalize the data, make sure you do not get multiple rows per master table row.
Example (SQL Server):
create or alter view viewName as
select master.col1,
       master.col2,
       (select dep1.col1,
               dep1.col2
        from dependent1 dep1
        where dep1.col3 = master.col3
        for json path) as dep1
from master master;
The above will give you the dependent table data as a JSON string, with one row per master table row. Once you retrieve the data, you can use Gson or Jackson to convert it to a POJO.
We tried to avoid JdbcCursorItemReader, as it pulls all the data into memory and reads it one by one, and it does not support pagination.
We are working on an SLI Search module for our client. I want to ask: is it a good approach to make separate tables, or should I manage all clients' data in a single table?
Keep in mind that a client can ask to update their module, which means the table structure can be different for each client.
Secondly, clients will give me their data, and I will update all clients' data either in their own tables or in a single table using a package.
So, would it be a good approach to make separate tables in the database, or should I make a centralized table for all our clients?
If the table fields vary from client (customer) to client, then it is recommended to have separate tables; otherwise you will create a lot of NULL values.
Secondly, if every client is a separate instance and there is no correlation between them, then why not have a separate schema or database for each of them?
It sounds like an SQLite database would be a good fit, since every client's data structure is unique; keeping it portable means you can amend one SQLite database at a time.
Centralised Tables Approach
This is an approach we have used recently, but you need to consider indexing so that searches on these fields stay fast.
You can create a flat table with many generically named columns, e.g. Field1, Field2, Field3, Field4, ..., Field99, Field100, ..., Field150 (as many as the number of potential customer fields you have).
You then create another table in which you map labels for every client (customer) to these fields. E.g.:
Client ABC has id 10032
It uses Field1 through Field11
Field1 has the label FirstName
Field2's label is Surname
Field3's label is DOB
...
...
Field11's label is UserCountry
Now, every time records are shown, you fetch the logged-in user's client labels and map them to the fields, as sketched below.
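A rough sketch of this layout in SQL; the table names (client_data, client_field_label) and column sizes are assumptions made up for illustration:

CREATE TABLE client_data (
    id        INT PRIMARY KEY,
    client_id INT NOT NULL,
    Field1    VARCHAR(255),
    Field2    VARCHAR(255),
    Field3    VARCHAR(255)
    -- ... continue up to as many generic fields as you expect to need
);

CREATE TABLE client_field_label (
    client_id  INT NOT NULL,
    field_name VARCHAR(20)  NOT NULL,  -- e.g. 'Field1'
    label      VARCHAR(100) NOT NULL,  -- e.g. 'FirstName'
    PRIMARY KEY (client_id, field_name)
);

-- labels for client ABC (id 10032), as in the example above
INSERT INTO client_field_label (client_id, field_name, label) VALUES
    (10032, 'Field1', 'FirstName'),
    (10032, 'Field2', 'Surname'),
    (10032, 'Field3', 'DOB');

-- when showing records, fetch this client's labels and use them as column headings
SELECT field_name, label
FROM client_field_label
WHERE client_id = 10032;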
I hope this answers the question.