Performance-effective JSON data masking in Snowflake

I am trying to perform data masking on JSON data.
I am using a JavaScript UDF to update a list of nested JSON path attributes, similar to what is done here:
https://www.snowflake.com/blog/masking-semi-structured-data-with-snowflake/
I also tried nested OBJECT_INSERT statements to mask a specific attribute, but with multiple attributes to mask I have to build a chain of subqueries, each performing OBJECT_INSERT on the previous subquery's result, which gets complex.
Ex:
FROM (
    SELECT OBJECT_INSERT(VAR_COL, 'LVL1',
               OBJECT_INSERT(VAR_COL:LVL1, 'KEY1',
                   OBJECT_INSERT(VAR_COL:LVL1.KEY1, 'KEY2', 'VALUE', TRUE),
                   TRUE),
               TRUE) AS VAR_COL
    FROM TABLE
)
Another problem with OBJECT_INSERT, which prevents me from using it, is that if the JSON path doesn't exist for a specific row, it will add that path, which I don't want.
I am working with millions of records, and on an XS warehouse it takes 15 minutes to run a simple query using the JavaScript UDF.
Alternatively, I also tried a Snowpark UDF, but it shows only a small improvement.
Any ideas on improving performance further?
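One way around the missing-path problem, assuming only a handful of known paths need masking, is to guard each nested OBJECT_INSERT with IFF so that rows without the path are returned unchanged; a rough sketch (the mask value and path names are illustrative):
-- Rewrite the object only when the target path exists; otherwise keep the row as-is.
SELECT IFF(VAR_COL:LVL1.KEY1.KEY2 IS NOT NULL,
           OBJECT_INSERT(VAR_COL, 'LVL1',
               OBJECT_INSERT(VAR_COL:LVL1, 'KEY1',
                   OBJECT_INSERT(VAR_COL:LVL1.KEY1, 'KEY2', '***MASKED***', TRUE),
                   TRUE),
               TRUE),
           VAR_COL) AS VAR_COL
FROM TABLE;
Staying in plain SQL rather than a JavaScript UDF may also help the runtime, though that is worth benchmarking against the UDF approach.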

Related

Take data from different datasets and insert it into a SQL table from an SSRS report

I have an SSRS report with 20 different datasets, each with some calculated columns.
I want to take a few fields from all the datasets, including some calculated columns, and insert them into a SQL table.
I want to do this each month so that I can see trends over a period. Is there any way to do that without editing the datasets?
Can I reference the fields I need by referring to Textbox4, for example, and insert them into a SQL table? What is the easiest way to do this without touching the datasets?
There is most likely a far better solution than using SSRS to update a SQL database. I am not proposing this as the best solution, but rather as a way to achieve what was asked.
You could create a dataset that runs a stored procedure and pass the data to it as parameters. The stored procedure would do the insert into your chosen table, and you can pass in the parameters from your original dataset however you see fit. You could even set up a second report with the stored-procedure dataset that you call on demand by putting an action event on an item. (I had a subreport embedded in a column of a tablix, configured so it would only update with values from that row, for instance.)
To clarify:
Create a subreport that accepts the data you want to insert as a parameter for each column
Instead of adding a normal Dataset, have it call a stored procedure that inserts as you require
Add the subreport to your main report to be called once it's run, and configure the required parameters to be passed through.
There will be better, more efficient, cleaner ways to do this, but I found the above to work for my purposes since I was limited by time and resources. But I would still recommend you seek other solutions if possible.
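For illustration, a minimal sketch of such a stored procedure, assuming a monthly trend table and a couple of numeric fields (all object and parameter names here are hypothetical):
-- Hypothetical trend table and procedure; adjust the columns to the fields you pass from the report.
CREATE PROCEDURE dbo.usp_InsertReportSnapshot
    @ReportMonth DATE,
    @Field1      DECIMAL(18, 2),
    @Field2      DECIMAL(18, 2)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.ReportTrend (ReportMonth, Field1, Field2)
    VALUES (@ReportMonth, @Field1, @Field2);
    -- Return a row so SSRS treats the call as a normal dataset.
    SELECT 1 AS Inserted;
END;
The subreport's dataset simply calls this procedure and maps its report parameters to @ReportMonth, @Field1 and @Field2.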

Bulk Update with SQL Server based on a list of primary keys

I am processing data from a database with millions of rows. I am pulling a batch of 1000 items from the database and processing them without a problem. I am not loading the whole entity and I am just pulling down a few columns of data for the batch.
What I want to do is mark the 1000 rows as processed with a single SQL command.
Something like:
UPDATE dbo.Clients
SET HasProcessed = 1
WHERE ClientID IN (...)
The ... is just a list of integers.
Context:
Azure SQL Server 2012
Entity Framework
Database.ExecuteSqlCommand
Ideas:
I know I could build the command as a pure string, but this would mean not using any SqlParameters and not benefiting from the query plan optimization.
Also, I found some information about table-valued parameters, but this requires creating a table type and some overhead which I would like to avoid. This is just a list of integers after all.
Question:
Is there an easy (performant) way to do this that I am overlooking either with Entity Framework or ExecuteSqlCommand?
If not, and using table-valued parameters is the best way, could you provide a complete example of how to convert an integer list into the simplest table type and run that with the above query?
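For what it's worth, a minimal sketch of the table-valued-parameter route (the type and procedure names are illustrative); from Entity Framework you would pass the IDs as a SqlParameter with SqlDbType.Structured and TypeName set to the type below:
-- Table type that carries the batch of ClientIDs.
CREATE TYPE dbo.IntList AS TABLE (Id INT NOT NULL PRIMARY KEY);
GO
-- Procedure that marks the whole batch in one statement.
CREATE PROCEDURE dbo.usp_MarkClientsProcessed
    @ClientIds dbo.IntList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE c
    SET    c.HasProcessed = 1
    FROM   dbo.Clients AS c
    JOIN   @ClientIds   AS ids ON ids.Id = c.ClientID;
END;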

How to apply the same data manipulation code to a group of SSIS components' inputs?

I am new to SSIS.
I have a number of MS Access tables to transform and load into SQL Server. Some of these tables have datetime fields that need to go through some rules before landing in their respective SQL tables. I want to use a Script Component that handles these kinds of fields, converting them to the desired values.
Since all of these fields need the same modification rules, I want to apply the same code base to all of them and avoid code duplication. What would be the best option for this scenario?
I know I can't use the same Script Component and direct all of those dataset outputs to it, because unfortunately it doesn't support multiple inputs. So the question is: is it possible to apply a set of generic data manipulation rules to fields from a group of different datasets without repeating the rules? I could use a Script Component for each OLE DB source and apply the same rule in each, but that would not be an efficient way of doing it.
Any help would be highly appreciated.
SQL Server Integration Services has a specific transformation to suit this need, called the Data Conversion transformation. The conversion can be done either at the data source or via the transformation, as noted here.
You can also use the Derived Column transformation to convert data. This transformation is also simple: select an input column, then choose whether to replace that column or create a new output column, and apply an expression for the output column.
So why use one over the other?
The Data Conversion transformation will take an input, convert the type, and provide a new output column. If you use the Derived Column transformation, you get to apply an expression to the data, which allows you to do more complex manipulations.
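If the main goal is a single copy of the rule, another option (not what the answer above describes) is to land the raw datetime values in staging tables and apply the rule once in T-SQL afterwards, for example from an Execute SQL Task; a hypothetical sketch, with the table, column, and cut-off all made up:
-- Hypothetical clean-up run once after all Access sources have loaded into staging,
-- so the same rule lives in one place instead of in a component per data flow path.
UPDATE s
SET    s.OrderDate = NULL
FROM   dbo.Staging_Orders AS s
WHERE  s.OrderDate < '19000101';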

ADO - Can I edit results of a complex query with multiple join statements?

I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility itself will have no knowledge of how data is kept in the destination (table structure), but I would like to allow writing a SQL statement to return data from the destination using a complex SQL query with multiple join statements, as long as the data comes back in a standardized format (field names) that the utility can recognize in an ADO query.
What I would like to do is then modify the live data in this ADO query. However, since there are multiple join statements, I'm not sure if that's possible. I know that BDE, at least (I've never used BDE), was very strict and required you to return all fields (*) and such. ADO, I know, is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complementary table which is joined on the same primary key field; one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in the Small table). So a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing fields which have no influence on the joins is usually no problem.
Appending is trickier ... you can limit the Append to one of the tables with:
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
  inherited;
  // Tell ADO which table in the join is the target for inserts/updates.
  TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without a Requery you won't get much further.
The better way would be to set the values via a procedure, e.g. in BeforePost, then Requery and Abort.
If your view were persistent (a database view), you would be able to use INSTEAD OF triggers.
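For reference, a minimal sketch of that approach, assuming the view is created in the database over the two tables from the question (the view and trigger names are made up):
-- Persistent view over both tables.
CREATE VIEW dbo.vw_small_detail
AS
SELECT ts.record_uid, ts.SomeField, td.SomeOtherField
FROM   dbo.table_small  ts
JOIN   dbo.table_detail td ON td.record_uid = ts.record_uid;
GO
-- INSTEAD OF trigger that splits an inserted row into the two underlying tables.
CREATE TRIGGER dbo.trg_vw_small_detail_insert
ON dbo.vw_small_detail
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.table_small (record_uid, SomeField)
    SELECT record_uid, SomeField FROM inserted;
    INSERT INTO dbo.table_detail (record_uid, SomeOtherField)
    SELECT record_uid, SomeOtherField FROM inserted;
END;
With that in place, the TADOQuery can select from the view, and an Append posts to both tables through the trigger.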
Jerry,
I encountered the same problem on Firebird, and from experience I can tell you that it can be done (with a small amount of added complexity) by using CachedUpdates. A very good resource is this one: http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it became more complex than I could wrap my head around. The scope of the data push project has changed, so this is no longer an issue for me, but it is still an interesting subject to know about.
The new structure of the application consists of attaching multiple "Field Links" to various fields from the original set of data. Each of these links references the original field name and a SQL statement which is to be executed when that field is being imported. Multiple field links can be attached to one field, and can therefore execute multiple statements, placing the value in various tables, etc. The end goal was an app with which I can easily and repeatedly export a common dataset from an original source to any outside source with a different data structure, without having to recompile the app.
However, the concept of cached updates was not appealing to me, simply because of the fact pointed out in the link in RBA's answer: data can be changed in the database in the meantime. So I will instead integrate my own method of customizable data pushes.

SSRS multi-value parameters - appropriate layer for implementation of the filter

When using multi-value parameters in SQL Server Reporting Services, is it more appropriate to implement the list filter using a filter on the dataset itself, on the data region control, or by changing the actual query that drives the dataset?
SSRS will support any of these scenarios, so then I ask: is there a reason, beyond the obvious, why this should be done at one level rather than another?
It makes sense to me that modifying the query itself and asking the RDBMS to handle the filtering would be most efficient, but maybe I am missing something with respect to how the SSRS data processing extension handles this scenario?
You are correct. The way to go is to pass the parameters through to the database engine.
Reporting Services should ideally be used only to render content. The less data that you need to pass back to the client web browser, the faster the report will render.
You may find my answer to a similar post regarding using multi-value parameters to be of use.
Passing multiple values for a single parameter in Reporting Services
Hope this helps but please feel free to pose any further questions you may have.
Cheers,
John
Using a table-valued UDF is a good approach, but there is still one issue: if this function is called in many places in the query, even inside an inner select, there can be a performance problem. You can resolve this issue using a table variable (or a temp table):
DECLARE @Param TABLE (Value INT);
INSERT INTO @Param (Value)
SELECT Param FROM dbo.fn_MVParam(@sParameterString, ',');
...
where someColumn IN (SELECT Value FROM @Param)
so the function will be called only once.
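For completeness, a hypothetical body for a splitter like dbo.fn_MVParam (the function itself comes from the linked answer; the single Param column and this implementation are assumptions):
-- Table-valued UDF that breaks a delimited string into one row per value.
CREATE FUNCTION dbo.fn_MVParam (@RepParam NVARCHAR(4000), @Delim CHAR(1) = ',')
RETURNS @Values TABLE (Param NVARCHAR(4000))
AS
BEGIN
    DECLARE @chrind INT;
    DECLARE @Piece NVARCHAR(4000);
    SELECT @chrind = 1;
    WHILE @chrind > 0
    BEGIN
        SELECT @chrind = CHARINDEX(@Delim, @RepParam);
        IF @chrind > 0
            SELECT @Piece = LEFT(@RepParam, @chrind - 1);
        ELSE
            SELECT @Piece = @RepParam;
        INSERT INTO @Values (Param) VALUES (@Piece);
        SELECT @RepParam = RIGHT(@RepParam, LEN(@RepParam) - @chrind);
        IF LEN(@RepParam) = 0 BREAK;
    END;
    RETURN;
END;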
Another thing: if you don't use a stored procedure but an embedded SQL query instead, you can just put the multi-value parameter directly into the query:
...
where someColumn IN (@Param)
...
Use the RDBMS to do the main filtering.
SSRS provides filtering for the purposes of data-driven display and/or dynamic display, which is especially useful for subreports etc.
