Snowflake in DBeaver - Display Snowflake-only objects (Streams, Tasks, etc.)

Is anyone coding in Snowflake and using DBeaver? Is there any way to show streams, tasks, pipelines, etc.?
I created a stream on a table and the object explorer doesn't show it.
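Not from the original thread, but a common workaround is to list those objects from a SQL editor tab in DBeaver using Snowflake's SHOW commands (the database and schema names below are placeholders):

    -- List Snowflake-only objects that the tree view may not display
    SHOW STREAMS IN SCHEMA MY_DB.MY_SCHEMA;
    SHOW TASKS IN SCHEMA MY_DB.MY_SCHEMA;
    SHOW PIPES IN SCHEMA MY_DB.MY_SCHEMA;

    -- Inspect a specific stream created on a table
    DESCRIBE STREAM MY_DB.MY_SCHEMA.MY_TABLE_STREAM;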

Related

How to easily find direct BigQuery table data source vs custom query data source in Data Studio Data Source list?

Is it possible to easily tell which BigQuery data sources in Google Data Studio use a direct connection to a BigQuery table vs. a custom query? Currently you have to open them one by one to see whether the connection is direct or uses a custom query.
I have to deal with 50+ connections and was wondering if there is a better way to see which ones connect directly to a BigQuery table and which ones use a custom query. The goal is to build custom tables for the ones where we are using custom queries.
(Screenshot: Google Data Studio Data Source tab)
This is not currently supported.

Tableau incremental refresh from Snowflake

I have a question regarding incremental refresh from Snowflake to Tableau. I know incremental refresh/incremental extracts are available in Tableau, but can they be used for incremental loads from Snowflake? And how does it work?
The reason I'm asking is that query folding, which other BI tools on the market use for incremental refreshes, isn't possible with Snowflake.
Thanks!
/P
Tableau incremental refreshes work the same for Snowflake as they do for other databases.
"Query folding" looks like a Microsoft (and specifically Power BI) term. According to this article https://exceleratorbi.com.au/how-query-folding-works/ "query folding" is the process of pushing the workload down to the database, which is what Tableau does when querying Snowflake tables directly.
With Snowflake I would recommend querying the tables directly, as they are already set up in a columnar format, and you can avoid moving the data to a Tableau Server and waiting on refreshes. Snowflake has effectively unlimited storage, whereas you might be limited by your Tableau Server.
If you need the tables in Snowflake to only show data as of a point in time, there are different ways you could accomplish this including:
Preset date filters (or parameters as filters within Tableau) that are pushed down to Snowflake
Using Tasks in Snowflake to run at a specific time (a sketch follows this list) to:
Clone your tables, and use the clones for reporting
Update existing reporting tables
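For the task option, a minimal sketch, assuming a hypothetical PROD.SALES table, a REPORTING schema, and a REPORTING_WH warehouse (the cron schedule is only an example):

    -- Refresh a point-in-time reporting copy every day at 06:00 UTC
    CREATE OR REPLACE TASK REFRESH_SALES_SNAPSHOT
      WAREHOUSE = REPORTING_WH
      SCHEDULE = 'USING CRON 0 6 * * * UTC'
    AS
      CREATE OR REPLACE TABLE REPORTING.SALES_SNAPSHOT CLONE PROD.SALES;

    -- Tasks are created suspended, so start the schedule explicitly
    ALTER TASK REFRESH_SALES_SNAPSHOT RESUME;

Tableau can then connect live to REPORTING.SALES_SNAPSHOT, which only changes when the task runs.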
I agree with Chris' answer except for avoiding extracts on Tableau Server. There can be a lot of performance gains from using Tableau to extract the data. We run extracts out of Snowflake for most of our data sources. We also test both live connections and extracts for each to see which performs best. If timing is an issue, extracts can be set to refresh as frequently as every 15 minutes.
To get extracts loaded and refreshing use the following steps.
Switch your data source to an extract in Tableau Desktop
This will create a local copy of the data to be used to publish next.
Select Server/Publish Workbook
In the Publish settings, choose your refresh schedule and publish to Tableau Server. The workbook and data source will be loaded to Server.
You can also update the refresh schedules directly in Server by navigating to the new data source and going to the Extract Refreshes tab.
If you don't have the correct schedule available, you can create one in the Schedules menu for the site.

What is the process to transfer staging table data to fact tables in Snowflake with custom validations?

Good day.
I need help. I want to transfer data in Snowflake from staging tables to fact tables automatically, whenever data is available in a staging table. While moving data from the staging tables to the fact tables, I have a couple of custom validations on each column and row.
Any idea how to do this in Snowflake?
If anyone knows, could you please suggest an approach?
Thanks in advance!
There are many ways to do this and how you go about it depends on what tools you have available. The simplest way to do this without using tools outside of the Snowflake ecosystem would be:
Set up a stream on each of your staging tables (here is the Snowflake documentation on streams)
Create a task that runs on a schedule (here is the Snowflake doc on tasks) to pull from the streams and write into the fact table (a sketch follows)
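A minimal sketch of those two steps, assuming a hypothetical staging table STG_ORDERS, a fact table FACT_ORDERS, and a LOAD_WH warehouse:

    -- 1) Stream to capture new rows landing in the staging table
    CREATE OR REPLACE STREAM STG_ORDERS_STREAM ON TABLE STG_ORDERS;

    -- 2) Scheduled task that moves newly inserted rows into the fact table
    CREATE OR REPLACE TASK LOAD_FACT_ORDERS
      WAREHOUSE = LOAD_WH
      SCHEDULE = '15 MINUTE'
    AS
      INSERT INTO FACT_ORDERS (ORDER_ID, ORDER_DATE, AMOUNT)
      SELECT ORDER_ID, ORDER_DATE, AMOUNT
      FROM STG_ORDERS_STREAM
      WHERE METADATA$ACTION = 'INSERT';

    -- Tasks are created suspended; resume to start the schedule
    ALTER TASK LOAD_FACT_ORDERS RESUME;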
This is really a general data warehousing question rather than a Snowflake one. Here is some more documentation on building SCD Type 2 dimensions, also written by someone at Snowflake.
Assuming "staging tables" refers to a Snowflake table and not a file in a Snowflake stage, I would recommend using a Stream and Task for this. A stream will identify the delta of data that needs to be loaded, and a Task can execute on a schedule and will only actually run something if there is data in the stream. Create a stored procedure that is executed in the Task to run your validations and Merge the outcome of those into your Fact.

AWS Glue: SQL Server multiple partitioned databases ETL into Redshift

Our team is trying to create an ETL into Redshift to be our data warehouse for some reporting. We are using Microsoft SQL Server and have partitioned our database into 40+ data sources. We are looking for a way to pipe the data from all of these identical data sources into one Redshift database.
Looking at AWS Glue, it doesn't seem possible to achieve this. Since they open up the job script to be edited by developers, I was wondering if anyone else has had experience with looping through multiple databases and transferring the same table into a single data warehouse. We are trying to avoid having to create a job for each database... unless we can programmatically loop through and create multiple jobs for each database.
We've taken a look at DMS as well, which is helpful for getting the schema and current data over to Redshift, but it doesn't seem like it would work for the multiple partitioned data source issue either.
This sounds like an excellent use-case for Matillion ETL for Redshift.
(Full disclosure: I am the product manager for Matillion ETL for Redshift)
Matillion is an ELT tool - it will extract data from your (numerous) SQL Server databases and load it, via an efficient Redshift COPY, into staging tables (which can be stored inside Redshift in the usual way, or can be held on S3 and accessed from Redshift via Spectrum). From there you can add transformation jobs to clean/filter/join (and much more!) into nice queryable star schemas for your reporting users.
If the table schemas on your 40+ databases are very similar (your question doesn't clarify how you are breaking your data down across those servers - horizontally or vertically), you can parameterise the connection details in your jobs and use iteration to run them over each source database, either serially or with a level of parallelism.
Pushing down transformations to Redshift works nicely because all of those transformation queries can utilize the power of a massively parallel, scalable compute architecture. Workload Management configuration can be used to ensure ETL and User queries can happen concurrently.
Also, you may have other sources of data you want to mash-up inside your Redshift cluster, and Matillion supports many more - see https://www.matillion.com/etl-for-redshift/integrations/.
You can use AWS DMS for this.
Steps:
Set up and configure a DMS instance
Set up a target endpoint for Redshift
Set up a source endpoint for each SQL Server instance (see https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html)
Set up a task for each SQL Server source; you can specify the tables to copy/synchronise, and you can use a transformation to specify which schema name(s) on Redshift you want to write to (example mapping below)
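For the last step, a rough example of a DMS table-mapping transformation rule (schema and target names are made up) that includes every table in the source's dbo schema and renames that schema so each SQL Server source lands in its own schema on Redshift:

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "include-all-dbo",
          "object-locator": { "schema-name": "dbo", "table-name": "%" },
          "rule-action": "include"
        },
        {
          "rule-type": "transformation",
          "rule-id": "2",
          "rule-name": "rename-schema",
          "rule-target": "schema",
          "object-locator": { "schema-name": "dbo" },
          "rule-action": "rename",
          "value": "source_db_01"
        }
      ]
    }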
You will then have all of the data on Redshift, in identically structured schemas.
If you want to query all of those together, you can do that by either running some transformation code inside Redshift to combine them into new tables, or you may be able to use views.

How to secure SharePoint Shared SSRS Datasources

I have a large reporting SharePoint site that contains about a dozen different shared data source connections, each one pointing at a different SQL Server that is being utilized by the SSRS reports hosted on the site. Each data source has a cached account that is used to retrieve the data when a report runs, so that report readers do not have to have read access to all of our SQL databases.
When someone with report-building privileges creates a report, they are able to select one of the shared data sources hosted on the website, but then have to pass an authentication popup before they can actually write a query against the database.
The strategy currently in use is that our authors do have read access to the SQL database and use that authentication ("Use the current Windows user") to create the report; then, when they save the report, readers utilize the account stored in the shared data source. We then manage access to the data in the report through SharePoint security by only allowing people who should see that data to have access to the report.
This seems all very standard to me...however
I am able to query any database that any of the shared data sources have access to, regardless of my own permissions, with a bit of RDL definition manipulation, by following these steps:
1) The current account needs access to Report Builder and AD access to at least one SQL data source (to make things easier)
2) Add a shared data source to the report that I have access to
3) Add a dataset with a query that follows this format SELECT '' as Field1 FROM DBNAME
4) Add a table to the report that simply displays Field1 from the query
5) Add one of the shared data sources that I should have no access to (there is no stopping me from adding the shared connection to the report; I simply am unable to use Report Builder to create a dataset using that data source)
6) Save report on the SharePoint site and then download a copy to local computer
7) Open the RDL definition. Replace the data source for the SQL query with the name of the "unauthorized" data source (the original data source can be deleted). Replace the SQL query with one that queries the database for a list of table names (SELECT name as Field1 FROM sys.tables)
8) Upload report definition back to SharePoint and run report
The report now uses the cached account and I've bypassed the nice authorization window that using Report Builder would have provided. By using sys queries, I can find the databases, tables, columns and eventually the data without having to know anything about the database. I could slow this method down by preventing access to the master database so that a list of databases can't be retrieved, but that's minor and not a complete solution.
Options:
- Could enforce security at the database level; however, I don't want report readers to have permissions on any of my source databases. While each report could be fed from a view that is then separately controlled to prevent access to anything more than what the report shows, this would be unmanageable
- Force every report to use an embedded connection and not a shared connection. This would be hard to manage in the future when moving servers or when we need to know which reports are utilizing a specific connection (dependent items are available in the data source drop-down menu)
I feel like I'm missing something obvious here as this seems to totally defeat the purpose of hosted, shared data sources.
The advantages of Shared Data Sources are administrative, in that they reduce the overhead in making changes to data source connection details such as passwords and server names. As you pointed out, using Shared Data Sources also allows you to easily identify dependent reports.
However, Shared Data Sources are not a mechanism for securing data sources such as databases. Security really needs to be addressed at the database level to properly ensure only authorized people have access. If the credentials are stored in the report data source, then anyone able to access that data source or reference it in a report is going to be able to execute queries on that connection.
I think the issue is in this step:
5) Add one of the shared data sources that I should have no access to (there is no stopping me from adding the shared connection to the report, I simply am unable to use report builder to create a dataset using that data source)
There should be some way to prevent report designers from seeing shared data sources that they do not have permissions on. You might need to set individual permissions for each item, or put them in different locations so that they can be secured with the correct permissions. I'm not a SharePoint expert, though, so this is just a suggestion.
