Monitoring the cost of Snowpipe (in Snowflake)? - snowflake-cloud-data-platform

(A Snowflake User asked the following in our legacy Q&A Forum)
Is there a way to put snowpipe under resource monitor? Or is there a similar way to monitor the cost of the snowpipe?

(According to one of our Technical Account Managers on Snowflake's Professional Services Team)
Snowflake does not appear to have that functionality at this time. From the documentation:
https://docs.snowflake.net/manuals/user-guide/resource-monitors.html
"In addition, an account-level resource monitor does not control credit usage by the Snowflake-provided warehouses (used for Snowpipe, automatic reclustering, and materialized views); the monitor only controls the virtual warehouses created in your account."
and
"An account-level resource monitor only controls the virtual warehouses explicitly created in your account; it does not control credit usage by the Snowflake-provided warehouses (for Snowpipe, Automatic Clustering, and materialized views)."
You can, however, monitor the credits consumed through PIPE_USAGE_HISTORY, which is available as an Information Schema table function and as an Account Usage view.
Information Schema
https://docs.snowflake.net/manuals/sql-reference/functions/pipe_usage_history.html
Account Usage
https://docs.snowflake.net/manuals/sql-reference/account-usage/pipe_usage_history.html
Using that information, you could pause the pipe by setting the PIPE_EXECUTION_PAUSED parameter with ALTER PIPE:
https://docs.snowflake.net/manuals/sql-reference/sql/alter-pipe.html
Maybe a scheduled stored procedure to monitor and notify/pause at levels set in the stored procedure? I have not done this, but think it should work in theory. Hope that helps.
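A minimal sketch of the "monitor and pause" idea, in Python rather than a stored procedure. It assumes the usage rows have already been fetched from PIPE_USAGE_HISTORY as (pipe name, credits used) pairs. The function name `pipes_to_pause`, the credit budget, and the sample pipe names are hypothetical. The sketch only builds the ALTER PIPE statements; executing them and scheduling the check are left out.

```python
from collections import defaultdict


def pipes_to_pause(usage_rows, credit_budget):
    """Return ALTER PIPE statements for pipes whose summed credits exceed the budget.

    usage_rows: iterable of (pipe_name, credits_used) pairs, for example the
    PIPE_NAME and CREDITS_USED columns of PIPE_USAGE_HISTORY for one period.
    credit_budget: hypothetical per-pipe threshold chosen by the caller.
    """
    totals = defaultdict(float)
    for pipe_name, credits_used in usage_rows:
        totals[pipe_name] += credits_used
    return [
        f"ALTER PIPE {name} SET PIPE_EXECUTION_PAUSED = TRUE;"
        for name, total in sorted(totals.items())
        if total > credit_budget
    ]


# Made-up sample data: one pipe over a 3-credit budget, one under it.
rows = [
    ("MYDB.PUBLIC.ORDERS_PIPE", 1.5),
    ("MYDB.PUBLIC.ORDERS_PIPE", 2.0),
    ("MYDB.PUBLIC.EVENTS_PIPE", 0.25),
]
statements = pipes_to_pause(rows, credit_budget=3.0)
```

Keeping the decision logic separate from the database call makes the threshold easy to test before anything is actually paused.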
Interested to see if anyone else has any "outside of the box" ideas in addressing the above question... perhaps a method to employ as a work-around?

Related

Is data sharing only supported between accounts in the same Snowflake region?

"Data Sharing is only supported between accounts in the same Snowflake region" as per multiple sources I have come across.
However, with database replication we can replicate a database to other accounts across regions and cloud platforms. These two statements seem to contradict each other. Can someone please clarify?

"Data Sharing is only supported between accounts in the same Snowflake region"
This text appears to be from an earlier documentation revision that does not appear to hold true anymore with recent enhancements to SnowflakeDB.
As surmised, data shares are supported across regions via a replication step per the current documentation.

This is all technically accurate. Data Sharing only works within a single region. However, you can replicate your account to another region and then data share from that replicated account within that account's region. This is also true across cloud platforms.

I was preparing for the Snowflake certification and had a similar problem.
But I finally realised that you can share data across regions ONLY if you are REPLICATING.
If you are NOT REPLICATING, then data can be shared only within the region.
Please see link below for more details :)
https://www.linkedin.com/pulse/snowflake-data-sharing-limitations-tips-minzhen-yang

NServiceBus & ServiceInsight Sql Server Transport & Persistence

The application we have been building is starting to solidify in that the majority of the functionality is now in place. This has given us some breathing room and we are starting to evaluate our persistence model and the management of it. I guess you could say the big elephant in the room is RavenDB. While we functionally have not experienced any issues with it yet, we are not comfortable with managing it. Simple tasks such as executing a query, truncating a collection, etc., are challenging for us as we are new to the platform and to document-based NoSQL solutions in general. Of course we are capable of learning it, but I think it comes down to confidence, time, and leveraging our existing SQL Server skill sets. For example, we pumped millions of events through the system over the course of a few weeks and the successfully processed messages were routed to our Audit queue in MSMQ. We also had ServiceInsight installed and it processed the messages in the Audit queue, which chewed up all the disk space on the server. We did not know how to fix this and literally had to delete the data file that we found for RavenDB. Let me just say, doing that caused all kinds of headaches.
So with that in mind, I have been charged with evaluating the feasibility and benefits of potentially leveraging SQL Server for the Transport and/or Persistence for our Service Endpoints. In addition, I could use some guidance on configuring ServiceControl and ServiceInsight to leverage SQL Server. Any information you might be able to provide regarding configuring these and identifying any drawbacks or architectural issues that we should consider would be greatly appreciated.
Thank you, Jeffrey

Using SQL persistence requires very little configuration (it is an implementation detail). Using the SQL transport, however, is more of an architectural decision than an infrastructure one: you are changing to a broker-style architecture, and that has implications you need to consider before going down that route.
ServiceControl and ServiceInsight persistence:
Although ServiceControl monitors MSMQ as the default transport, you can also use it with other transports such as RabbitMQ and SqlServer. Here you can find the details of how to do that
At the moment ServiceControl relies on RavenDB for its persistence, and it is not possible to change that to SQL because ServiceControl relies on Raven features (AFAIK).
There is an open issue for expiring data in ServiceControl, see this issue in GitHub
HTH

Regarding ServiceControl usage of RavenDB (this is the underlying service that serves the data to ServiceInsight UI):
As Sean Farmar mentioned (above), in the post-beta releases we will be including message expiration, and on-demand audited message deletion commands so that you can have full control of the capacity utilization of SC.
You can also change the drive/path of the ServiceControl database location to allow it to use a larger drive.
Note that ServiceControl (and ServiceInsight / ServicePulse that use it) is intended for analysis, debugging and operational monitoring. It is intended to store a limited amount of audited data (how many messages that is varies significantly with your throughput and capacity needs, but the database storage capacity can be up to 16TB).
If you need a long term storage for audited data, you can hook into ServiceControl HTTP API and transfer the messages' data into various long-term / unlimited-size / low-cost storage solutions (e.g. http://aws.amazon.com/glacier).
Please let us know if this answers your needs and whether you have additional questions
Danny.

Databases with utilization constraints

I have a web-based application that allows users to create their own complicated queries using a simplified scripting language and GUI. Problem is - sometimes my users are, well, not so bright. Often, they'll create a query that does massive joins, or employs pointless comparisons over large datasets, and that quickly consumes most of the available resources on the machine. In effect, a small number of folks are ruining the party for everyone else. Training or banning these "special" users isn't an option.
So here's my question: Are there any databases (NoSQL or SQL, or anything really) that support resource constraints on a per query basis?
Limiting CPU utilization would be bare minimum, but other constraints like execution time, memory usage and rows-returned limits would be nice too. It'd be especially handy if I could programmatically specify limits so I could target my problem users.
EDIT: Extra points for opensource and/or free products.
EDIT2: Found some related questions, that make it clear that Oracle supports some sort of resource-limiting scheme, but are there any other products that do? Just Oracle and SQL Server?
https://serverfault.com/questions/124158/throttle-or-limit-resources-used-by-a-user-in-a-database
Is there a way to throttle or limit resources used by a user in Oracle?

SQL Server 2008 supports a resource governor:
Resource Governor is a new technology in SQL Server 2008 that enables you to manage SQL Server workload and resources by specifying limits on resource consumption by incoming requests. In the Resource Governor context, workload is a set of similarly sized queries or requests that can, and should be, treated as a single entity. This is not a requirement, but the more uniform the resource usage pattern of a workload is, the more benefit you are likely to derive from Resource Governor. Resource limits can be reconfigured in real time with minimal impact on workloads that are executing.
Resource Governor provides:
The ability to classify incoming connections and route their workloads to a specific group.
The ability to monitor resource usage for each workload in a group.
The ability to pool resources and set pool-specific limits on CPU usage and memory allocation. This prevents or minimizes the probability of run-away queries.
The ability to associate grouped workloads with a specific pool of resources.
The ability to identify and set priorities for workloads.
Ref.

Resource constraints may ease your problem, but I think the problem behind the situation is the unpredictable usage of resources.
When a database executes queries, it needs to load data into memory and lock resources to keep its state consistent.
Whatever a constraint system can do, the unpredictable behavior of the database's internal mechanisms is the riskiest thing, and the one you should be most concerned about.
If I were facing this kind of situation, I would try to figure out what the users really need and provide more precise queries (restricted to certain tables or certain conditions on the data) for them.
If nothing else can be done, however, I would try to clone (replicate) the database for the heavy users.
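As an illustration of what a programmatic per-query limit can look like, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is not mentioned in the answers above; it is used here only because it is free and its progress handler can abort a running statement. The helper name `run_limited` and the limit values are hypothetical, and the sketch covers only execution time and rows returned, not CPU or memory.

```python
import sqlite3
import time


def run_limited(conn, sql, max_seconds, max_rows):
    """Run a query, aborting it after max_seconds and returning at most max_rows rows."""
    deadline = time.monotonic() + max_seconds
    # SQLite calls the handler every 1000 virtual-machine instructions;
    # a non-zero return value aborts the statement that is running.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        return conn.execute(sql).fetchmany(max_rows)
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again


conn = sqlite3.connect(":memory:")

# A deliberately endless query standing in for a user's run-away query.
runaway = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) "
    "SELECT count(*) FROM c"
)
try:
    run_limited(conn, runaway, max_seconds=0.1, max_rows=100)
    aborted = False
except sqlite3.OperationalError:
    # sqlite3 reports the aborted statement as an OperationalError.
    aborted = True
```

Because the limits are ordinary function arguments, they could be chosen per user, which is the kind of targeting the question asks about.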

What does 'MGMTCLASS' of a dataset describe?

When allocating a dataset, what does the MGMTCLASS of the dataset describe? To my knowledge, it gives the retention and expiration period for which the dataset is going to reside on disk, and the possible values I have observed are BKUP35, NOBKNLIM, etc. What do these stand for, and what other values are possible for this parameter? I hope I have put my question clearly; please let me know if I missed something...
Addendum: Can I ask another question here? How often is a dataset set to be backed up? I know it's specific to the installation of SMS, but is there something related to MGMTCLASS? Say, the dataset will be backed up when it has stayed for some % of the time specified in MGMTCLASS, or something like that. Am I clear?

On IBM z/OS mainframes, the MGMTCLAS values are defined by your systems management. Each installation may have different values. You will have to ask your site management for the values they have defined for your environment.
MGMTCLAS is given when defining a new dataset on a system where the Storage Management System (SMS) feature is installed. SMS uses the assigned MGMTCLAS to guide various aspects of dataset management. Typical usages are:
Manage migration of inactive datasets to archival storage
Set the backup frequency
Determine how often to compress partitioned datasets
Delete datasets whose retention period has expired
Release unused space in a dataset
Enable cost accounting on datasets

Should application users be database users?

My previous job involved maintenance and programming for a very large database with massive amounts of data. Users viewed this data primarily through an intranet web interface. Instead of having a table of user accounts, each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc., as well as permitting us to control access through the RDBMS itself instead of using our own application logic.
Is this a good setup, assuming you're not on the public internet and dealing with potentially millions of (potentially malicious) users or something? Or is it always better to define your own means of handling user accounts, your own permissions, your own application security logic, and only hand out RDBMS accounts to power users with special needs?

I don't agree that using the database for user access control is as dangerous as others are making it out to be. I come from the Oracle Forms Development realm, where this type of user access control is the norm. Just like any design decision, it has its advantages and disadvantages.
One of the advantages is that I could control select/insert/update/delete privileges for EACH table from a single setting in the database. On one system we had 4 different applications (managed by different teams and in different languages) hitting the same database tables. We were able to declare that only users with the Manager role were able to insert/update/delete data in a specific table. If we didn't manage it through the database, then each application team would have to correctly implement (duplicate) that logic throughout their application. If one application got it wrong, then the other apps would suffer. Plus you would have duplicate code to manage if you ever wanted to change the permissions on a single resource.
Another advantage is that we did not need to worry about storing user passwords in a database table (and all the restrictions that come with it).
I don't agree that "Database user accounts are inherently more dangerous than anything in an account defined by your application". The privileges required to change database-specific privileges are normally MUCH tougher than the privileges required to update/delete a single row in a "PERSONS" table.
And "scaling" was not a problem because we assigned privileges to Oracle roles and then assigned roles to users. With a single Oracle statement we could change the privilege for millions of users (not that we had that many users).
Application authorization is not a trivial problem. Many custom solutions have holes that hackers can easily exploit. The big names like Oracle have put a lot of thought and code into providing a robust application authorization system. I agree that using Oracle security doesn't work for every application. But I wouldn't be so quick to dismiss it in favor of a custom solution.

Edit: I should clarify that despite anything in the OP, what you're doing is logically defining an application even if no code exists. Otherwise it's just a public database with all the dangers that entails by itself.
Maybe I'll get flamed to death for this post, but I think this is an extraordinarily dangerous anti-pattern in security and design terms.
A user object should be defined by the system it's running in. If you're actually defining these in another application (the database) you have a loss of control.
It makes no sense from a design point of view because if you wanted to extend those accounts with any kind of data at all (email address, employee number, MyTheme...) you're not going to be able to extend the DB user and you're going to need to build that users table anyway.
Database user accounts are inherently more dangerous than anything in an account defined by your application because they could be promoted, deleted, accessed or otherwise manipulated by not only the database and any passing DBA, but anything else connected to the database. You've exposed a critical system element as public.
Scaling is out of the question. Imagine an abstraction where you're going to have tens or hundreds of thousands of users. That's just not going to be manageable as DB accounts, but as records in a table it's just data. The age-old argument of "well, there's only ever going to be X users" doesn't hold any water with me, because I've seen very limited internal apps become publicly exposed when the business feels it could add value to the customer, or the company just got bought by a giant partner who now needs access. You must plan for reasonable extensibility.
You're not going to be able to share connection pooling, you're not going to be any more secure than if you just created a handful of e.g. role accounts, and you're not necessarily going to be able to effect mass changes when you need to, or back up effectively.
All in all, there seem to be numerous serious problems to me, and I imagine other more experienced SOers could list more.

I think that, generally, in your traditional database application they shouldn't be, for all the reasons already given. In a traditional database application there is a business layer that handles all the security, and this is because there is such a strong line between people who interact with the application and people who interact with the database.
In this situation it is generally better to manage these users and roles yourself. You can decide what information you need to store about them, and what you log and audit. And most importantly, you define access based on pure business rules rather than database rules. It's got nothing to do with which tables they access and everything to do with whether they can "insert business action here". However, these are not technical issues. These are design issues. If that is what you are required to control, then it makes sense to manage your users yourself.
You have described a system where you allow users to query the database directly. In this case, why not use DB accounts? They will do the job far better than you will if you attempt to analyse the queries that users write and vet them against some rules that you have designed. That, to me, sounds like a nightmare system to write and maintain.
Don't lock things down because you can. Explain to those in charge what the security implications are, but don't attempt to prevent people from doing things because you can. Especially not when they are used to accessing the data directly.
Our job as developers is to enable people to do what they need to do. In the situation you have described, that means connecting to the database and querying it with their own tools. So I think that anything other than database accounts is going to be either insecure or unnecessarily restrictive.

"each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc.,"
not a good idea if the RDBMS contains:
any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
credit card information or other customer credit info (POs, lines of credit etc)
personal information (ssn, dob, etc)
competitive, proprietary, or IP information
because when users can use their own non-managed query tools the company has no way of knowing or auditing what information was queried or where the query results were delivered.
oh and what #annakata said.

I would avoid giving any user database access. Later, when this starts causing problems, taking away their access becomes very difficult.
At the very least, give them access to a read-only replica of the database so they can't kill your whole company with a bad query.

A lot of database query tools are very advanced these days, and it can feel a real shame to reimplement the world just to add restrictions. And as long as the database user permissions are properly locked down, it might be okay. However, in many cases you can't do this, and you should instead be exposing a high-level API to the database that inserts objects across many tables properly, without the user needing specific training that they should "just add an address into that table there, why isn't it working?".
If they only want to use the data to generate reports in Excel, etc, then maybe you could use a reporting front end like BIRT instead.
So basically: if the users are knowledgeable about databases, and resources to implement a proper front-end are low, keep on doing this. However, if the resources do come up, it is probably time to gather people's requirements for creating a simpler, task-oriented front-end for them.

This is, in a way, similar to: is sql server/AD good for anything
I don't think it's a bad idea to throw your security model, at least a basic one, in the database itself. You can add restrictions in the application layer for cosmetics, but whichever account the user is accessing the database with, be it based on the application or the user, it's best if that account is restricted to only the operations the user is allowed.
I don't speak for all apps, but there are a large number I have seen where capturing the password is as simple as opening the code in notepad, using an included dll to decrypt the configuration file, or finding a backup file (e.g. web.config.bak in asp.net) that can be accessed from the browser.

"not a good idea if the RDBMS contains:
any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
credit card information or other customer credit info (POs, lines of credit etc)
personal information (ssn, dob, etc)
competitive, proprietary, or IP information"
Not true, one can perfectly manage which data a database user can see and which data it can modify. A database (at least Oracle) can also audit all activities, including selects. To have thousands of database users is also perfectly normal.
It is more difficult to build good secure applications because you have to program this security, a database offers this security and you can configure it in a declarative way, no code required.

I know I am replying to a very old post, but I recently came across the same situation in my current project. I was also thinking along similar lines: should application users be database users?
This is what I analysed:
It definitely doesn't make sense to create that large a number of application users in the database (if your application is going to be used by many users).
Let's say you created X (a huge number of) users in the database. You are opening a clear gateway to your database.
Let's take a scenario for the solution:
There are two types of application users (Manager and Assistant). Both need access to the database for some transactions.
It's obvious you would create two roles in the database, one for each type (Manager and Assistant). But what about the database user that the application connects as? If you create one account per user, the number of database accounts grows linearly with the number of users.
What I suggest:
1. Create one database account per role (let's say Manager_Role_Account).
2. Let your application have business logic to map an application user to the corresponding role (user Tom with the Manager role maps to Manager_Role_Account).
3. Use the database user (Manager_Role_Account) corresponding to the role identified in step 2 to connect to the database and execute your query.
Hope this makes sense!
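The three steps above can be sketched as a small lookup in Python. This is only an illustration of the mapping idea: `Manager_Role_Account` and the user Tom come from the answer, while `Assistant_Role_Account`, the user Ann, and the function name `database_account_for` are hypothetical, and real code would load the mappings from the application's own user store.

```python
# One shared database account per role (step 1). Names are illustrative.
ROLE_ACCOUNTS = {
    "Manager": "Manager_Role_Account",
    "Assistant": "Assistant_Role_Account",
}

# Application-level mapping from user to role (step 2).
USER_ROLES = {
    "Tom": "Manager",
    "Ann": "Assistant",
}


def database_account_for(app_user):
    """Return the database account the application should connect as (step 3)."""
    role = USER_ROLES.get(app_user)
    if role is None:
        # Unknown users never reach the database at all.
        raise PermissionError(f"unknown application user: {app_user}")
    return ROLE_ACCOUNTS[role]
```

With this layout the number of database accounts grows with the number of roles, not with the number of application users.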
Updated: As I said, I came across a similar situation in my project (with a PostgreSQL database at the back end and a Java web app at the front end), and I found something very useful called Proxy Authentication.
This means that you can log in to the database as one user but limit or extend your privileges based on the proxy user.
I found very good links explaining the same.
For PostgreSQL: "Choice of authentication approach for financial app on PostgreSQL"
For Oracle: "Proxy Authentication"
Hope this helps!

It depends (like most things).
Having multiple database users negates connection pooling, since most libraries handle pooling based on connection strings and user accounts.
On the other hand, it's probably a more secure solution than anything you or I will do from scratch. It leaves security up to the OS and database server, which I trust much more than myself. However, this is only the case if you go to the effort of configuring the database permissions well. If you're using a bunch of OS/DB users with the same permissions, it won't help much. You'll still get an audit trail, but that's about it.
All that said, I don't know that I'd feel comfortable letting normal users connect directly to the database with their own tools.

I think it's worth highlighting what other answers have touched upon:
A database can only define restrictions based on the data, i.e. restrict select/insert/update/delete on particular tables or columns. I'm sure some databases can do somewhat cleverer things, but they'll never be able to implement business-rule-based restrictions like an application can. What if a certain user is allowed to update a column only to certain values (say <1000), or only to increase prices, or to change either of two columns but not both?
I'd say unless you are absolutely sure you'll never need anything but table/column granularity, this is reason enough by itself.
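The three example rules from this answer can be sketched as an application-level check in Python. The column names (`price`, `discount`, `surcharge`), the limit of 1000, and the function name `validate_price_update` are assumptions made for illustration; the point is only that such rules compare the old and new values of a row, which goes beyond granting or denying access to a table or column.

```python
def validate_price_update(old_row, new_row, max_price=1000):
    """Return a list of rule violations for a proposed update (empty if allowed)."""
    errors = []
    # Rule 1: the column may only be set to certain values (here: below a limit).
    if new_row["price"] >= max_price:
        errors.append("price must stay below the limit")
    # Rule 2: prices may only increase.
    if new_row["price"] < old_row["price"]:
        errors.append("price may only increase")
    # Rule 3: either of two columns may change, but not both.
    changed = [col for col in ("discount", "surcharge") if new_row[col] != old_row[col]]
    if len(changed) == 2:
        errors.append("discount and surcharge cannot both change")
    return errors


current = {"price": 100, "discount": 0, "surcharge": 0}
allowed = validate_price_update(current, {"price": 120, "discount": 5, "surcharge": 0})
rejected = validate_price_update(current, {"price": 90, "discount": 5, "surcharge": 2})
```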

This is not a good idea for any application where you store data for multiple users in the same table and you don't want one user to be able to read or modify another user's data. How would you restrict access in this case?
