Snowflake: Accessing database objects - snowflake-cloud-data-platform

Statement: a user/role should have access to both a database and a warehouse in order to access the tables in Snowflake.
Is this statement correct? Am I understanding it correctly?
Suppose person X has access to a database but does not have access to any of the available warehouses. In this case, can person X access the tables in that database?

Databases are Snowflake's storage; a warehouse is its compute. So, in order to query data in Snowflake, you need access to (a) a database, schema, and table, as well as (b) a warehouse with which to execute a query against that data.
To answer your question: person X has no compute with which to reach the data in the database, so, no.
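As a minimal sketch (the role, warehouse, and object names here are illustrative, not from the question), these are the grants a role typically needs before it can both see and query a table:
-- compute: required to execute queries
GRANT USAGE ON WAREHOUSE my_wh TO ROLE analyst;
-- storage: required to see and read the objects
GRANT USAGE ON DATABASE my_db TO ROLE analyst;
GRANT USAGE ON SCHEMA my_db.public TO ROLE analyst;
GRANT SELECT ON TABLE my_db.public.orders TO ROLE analyst;
Without the warehouse grant the role can still see the table's metadata, but any SELECT against it fails for lack of compute.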

Related

How to edit records in a SQL Server stored procedure

I would like to know the secret of how SQL statements in SQL Server go from being read-only to editable. Right-click on any table, and the interface gives the option of "Selecting" or "Editing" records. Is there a property in the SQL statement that designates the recordset as editable or read-only?
I will use the simplest possible example: I have designed a table with two fields: an integer field, designated as an identity with a unique index, and an nvarchar field, designed for manual editing. In a query window, I write a SQL statement against the table, and I am not able to edit the text field. Stored procedures, which I favor because I can invoke them most efficiently, also return an uneditable recordset. The only way I have found to succeed is in SSMS, when choosing the edit feature on a table.
I use Microsoft Access extensively, and all the tables that Access hosts are linked to SQL Server tables. When I use the Microsoft Access JET engine to write queries on these same tables, I can edit the recordsets the queries generate, but not when I use pass-through queries to invoke the same contents through a table function or stored procedure. With no table joins, calculated fields, or anything else that would impose a known reason for the recordsets to be read-only, this inability poses a barrier to producing some of my deliverables.
Thanks, in advance, for your support. Here are quick examples:
SELECT IDField, TextField
FROM SampTable;

CREATE PROCEDURE TestProc
AS
BEGIN
    SELECT IDField, TextField
    FROM SampTable;
END

CREATE FUNCTION [dbo].[TestFunction]()
RETURNS TABLE
AS
RETURN
(
    SELECT IDField, TextField
    FROM SampTable
);
SQL Server is not the same kind of thing as MS Access. MS Access is a front end and a back end combined, which is nice and easy for users, and it does have its place. It's like a souped-up version of Excel with some very limited multi-user functionality. But with SQL Server, the expectation is that you split the responsibilities between front end and back end.
Yes, SSMS does provide the ability to right-click a table (or a view referencing one table) and "edit top 200 rows". Honestly, I wish it didn't. It shouldn't.
If you have an Access "front end" using linked tables in SQL Server as the "back end", that's similar functionality. And yes, there are some limited use cases where that's an appropriate sort of solution, ideally as a temporary thing. But really, if you're putting data into SQL Server, the expectation is that you're building some kind of "real" user interface, which uses DML statements constructed by the application, or stored procedure execution, or some kind of ORM and DbContext, to modify the data (see the sketch below). Even in MS Access, you should switch from direct table editing to forms.
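For example, using the question's SampTable, the front end would issue DML like this instead of editing a grid (a minimal sketch, not tied to any particular framework):
-- the edit the user made in the UI, applied to the row they had selected
UPDATE SampTable
SET TextField = N'corrected value'
WHERE IDField = 42;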
The reason why you can't edit the results of a stored procedure or function is that the output of those objects is just a temporary copy of the data. It's not the "actual data in the tables". And, if you think about it, how could it be? For example, imagine if I wrote a stored procedure like this:
create table t (i int primary key, j int);
create procedure p as begin
select total_j = sum(j) from t;
end
When I run that stored procedure I'm going to get a single value which is the sum of j across all rows. How could I edit this value? If I changed it from, say, 100 to 200, what does that mean in terms of the contents of column j in the table? Do I add 100 to some arbitrary row? Do I add 1 to each of the first 100 rows in order of the primary key? The concept becomes incoherent.
I know what you're thinking: "But what if my stored procedure doesn't aggregate? Surely then the data that comes back can really just be a 'pointer' to the data in the table, not a copy?" And yes, in principle that could be true. But think about the implications. While you're looking at the results, can anyone else change the underlying data in the table? Can you both change it at the same time? Who decides how to resolve that conflict - the SQL engine? Can someone else drop the table while you are editing data? And so on and so forth.
It's the wrong way to think about SQL Server (or any "real" database engine). The data you see as the result of a select is read from the tables, and sent over the network to the client as your own personal copy. It is no longer connected to the tables it came from.
Oh... and in case you're wondering how you can edit the data "directly in the tables" when using linked tables in MS Access: you still can't. Access does some work under the covers for you. To prove this, try linking a SQL Server table to MS Access, pulling up the row in Access, and starting to edit it. Then, before finishing your edit, go into SSMS and update the row you are editing in Access. Then try to save your changes in Access.

Snowflake: How can I find out which internal table stages consume the most storage?

In my Snowflake account I can see that a lot of storage is used by stages. I can see this, for example, using the following query:
select *
from table(information_schema.stage_storage_usage_history(dateadd('days',-10,current_date()),current_date()));
There are no named stages in the databases, so all used storage must be in internal stages.
How can I find out which internal stages consume the most storage?
If I know the name of the table I can list all files in the table stage using something like this:
list @SCHEMANAME.%TABLENAME;
My problem is that there are hundreds of tables in the databases and I have no idea which tables to query.
There is an ACCOUNT_USAGE view called STAGE_STORAGE_USAGE_HISTORY in the shared SNOWFLAKE database that will give you everything, including internal stages. I would use this over the INFORMATION_SCHEMA table function, since that is limited to what your role currently has access to.
https://docs.snowflake.com/en/sql-reference/account-usage/stage_storage_usage_history.html
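For instance, assuming your role has access to the shared SNOWFLAKE database, a query along these lines returns the account-wide history, internal stages included:
select usage_date, average_stage_bytes
from snowflake.account_usage.stage_storage_usage_history
order by usage_date desc;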
You can use the STAGES view in INFORMATION_SCHEMA or ACCOUNT_USAGE to get the stages. Do note that ACCOUNT_USAGE has a longer retention period than INFORMATION_SCHEMA, though its data is refreshed with some latency; see the Snowflake documentation for details.
If I understood correctly, you want to do something with the stages to reduce the overall billing or storage size.
Snowflake stages:
'Internal Named' and 'External' stages:
These are the only stages which can be altered or dropped; they are controlled by the user.
User stages and table stages are the ones which cannot be altered or dropped; they are managed by Snowflake.
So even if you identify those table-specific and user-specific stages, you cannot drop them, though you can remove the files they hold, as shown below.
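For example (illustrative names), clearing a table stage looks like this:
list @MYSCHEMA.%MYTABLE;    -- inspect the staged files first
remove @MYSCHEMA.%MYTABLE;  -- then delete them to free the storage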
Snowflake storage consumption includes the below three components for billing:
1. Database size
2. Stage size
3. Fail-safe size
The size of the storage occupied is visible (only when you have the ACCOUNTADMIN role or the MONITOR privilege) under the following web UI location:
Account tab ---> Usage --> Average Storage Used
Note: on the Account tab, no per-database-object storage billing details are available.
So, to see the storage consumption of the associated tables (including their Fail-safe and Time Travel portions) and stage details:
select * from <DB Name>."INFORMATION_SCHEMA"."TABLE_STORAGE_METRICS"
select * from <DB Name>."INFORMATION_SCHEMA"."STAGES"

In this situation, is it best practice to create an Oracle VIEW or TABLE?

In order to report on Oracle logons, I have a query that finds logons to the system. I want to output the query results to a table or view so that I can then report on that table/view. The underlying tables the query is based on do not keep historical data, hence my need for a new table/view.
The query will run three times a day to gather logon information at those times.
Is it best to create a new table and append the daily information, or would a view be better practice? I'm unsure about how a view stays updated, as the underlying tables need to be present, if that's correct.
A view won't help at all. It is just a stored query and doesn't contain any data; it just reflects what is in the underlying table(s) at the moment you query it.
Therefore, you'll need a "history" table which will permanently hold the data you're interested in, as in the sketch below.
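A minimal sketch of that approach, assuming hypothetical names and that the logon query reads from V$SESSION (substitute whatever your actual logon query uses):
-- one-time setup: an empty history table with a capture timestamp
CREATE TABLE logon_history AS
SELECT SYSDATE AS captured_at, s.username, s.logon_time, s.machine
FROM   v$session s
WHERE  1 = 0;

-- run three times a day (e.g. from a DBMS_SCHEDULER job)
INSERT INTO logon_history
SELECT SYSDATE, s.username, s.logon_time, s.machine
FROM   v$session s
WHERE  s.username IS NOT NULL;
COMMIT;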

SSIS - How to improve a workflow

Previously I asked for a possible solution to a situation I had to face in order to implement a SQL query (which was originally implemented in Access). I have reached a solution (after asking a lot), but I would like to know if anyone has another way to execute this query.
I have two different tables, one in SQL Server and another in Oracle (S and O):
O(A, B, C) => PK = (A, B) and S(D, E, F) => PK = (D, E)
The query looks like this:
SELECT A, B, C, E, F
FROM S INNER JOIN O ON
S.D = O.A (only one attribute of the PK of O)
S has over 10,000 records and O more than 700 million. Given this, it is not sensible to implement a merge join or a lookup, because I would only get the first match between D and A.
So I thought it would be better to assemble the query on the Oracle side. To do this I implemented a scheme like this.
On the SQL Server side I executed this query:
with tmp(A) as (
    select distinct D as A from S
)
select cast((select concat(' or A = ', A)
             from tmp
             for xml path('')) as nvarchar(max)) as ID;
This gives me a string with the values that I am going to search for in Oracle.
Finally, in the data flow, I create an expression like this:
select A, B, C
from O
where A = '' + #ID
I download these values to SQL Server and then I can manipulate them as I wish.
The use of the foreach loop was necessary because I am storing the SQL string inside an Object variable; I found that SSIS has some trouble with nvarchar(max) variables.
Some considerations:
1) The Oracle database is administered by another area of the company, and they only give read permissions on the tables.
2) The DBA of the SQL Server does not allow downloading the O table to a staging area. There is no possibility of negotiating with him; besides, this table is updated every day with more records. He only manages this server and has no authority over Oracle.
3) The solution some members of my team proposed was to create a query in Oracle between different tables that could give me the attributes of O that I need. As a result I could get more than 3 million records, and not all of the values of A are present in S. Even worse, some of the values of D have been manipulated, so they may not be present in O.
With this implementation I am getting more than 150,000 records from Oracle. But I would like to know if another solution can be implemented, or if there are other components that I can use to reach the same results. Believe me when I say that I have read, asked, and searched a lot before implementing this flow.
EDITED:
Option 1 (you say that you cannot use this solution – but it would be the first one – the best)
Use a DB link to let Oracle access the S table (you must use Oracle Database Gateway). Create a view in Oracle joining O and S. Finally, use a linked server to let SQL Server access the Oracle joining view and get the results.
The process is as follows:
You must convince your Oracle DBA to configure Oracle Database Gateway for SQL Server (see
http://docs.oracle.com/cd/B28359_01/gateways.111/b31043/conf_sql.htm#CIHGADGB).
When it is properly configured, you can create a DB link in Oracle pointing to SQL Server. With the DB link, Oracle will have direct access to the S table.
Now create a view V in Oracle simply joining the O and S tables.
As you want the result back in your SQL Server and you cannot use SSIS, you can proceed as described in:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/111df59c-b309-4d59-b56c-9cd5574ee181/how-to-access-oracle-table-from-sql-server-?forum=transactsql
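The final step would look roughly like this (the linked server name ORACLE_LS and the view name V are illustrative):
-- pull the pre-joined result from the Oracle view through the linked server
SELECT *
FROM OPENQUERY(ORACLE_LS, 'SELECT A, B, C, E, F FROM ORACLE_OWNER.V');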
Option 2 (you say that you cannot use this solution – but it would be the second one)
As your Oracle admins seem to be monsters that will kill you if they get their paws on you, you can try the following (if they let you create a table in Oracle):
Create a linked server in SQL Server (to access Oracle from SQL Server), as mentioned in Option 1.
Create a (temporary) table in the Oracle schema with only one column (it will store the D values from SQL Server).
Every time you need to evaluate your query, execute in SQL Server:
INSERT INTO ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE
SELECT DISTINCT D FROM S;

SELECT *
FROM OPENQUERY(ORACLE_LINKED_SERVER,
               'SELECT * FROM ORACLE_OWNER.O WHERE A IN (SELECT D FROM ORACLE_OWNER.TEMP_TABLE)');

And finally, don't forget to empty Oracle's temp table:
DELETE FROM ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE;
Option 3 (if you have an Oracle license and an available host)
You can install your own Oracle server on that host and use Option 2.
Option 4
If your solution is really the only way out, then let's try to improve it a little bit.
As you know, your solution works, but it is a little aggressive (you are transforming a relational algebra semijoin operator into a selection operator with a monster condition). You say that the Oracle table is updated every day with more records, but if the update rate of your tables is lower than your query rate, then you can create a result cache that you can reuse while the tables S and O have not changed.
Proceed as follows:
Create a table in your SQL Server to store the Oracle result of your monster query. Before building and launching your query, execute this:
SELECT last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('YourDatabaseName')
  AND object_id = OBJECT_ID('S');
This returns the most recent time your table S was updated. Store this value in a table (create a new table, or store it in a typical parameter table).
Create your monster query. But before launching it, send this query to Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
It returns the last SCN (System Change Number) that caused a change in the table. Store this value as well (in a new table or in your parameter table).
Launch the big query and store its result into the cache table.
Finally, when you need to repeat the big query, first execute in your SQL Server:
SELECT last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('YourDatabaseName')
  AND object_id = OBJECT_ID('S');
And execute in Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
If one or both values have changed with respect to the ones stored in your parameter table, then update the stored values and launch the big query again. But if neither value has changed, your cache is up to date and you can use it. A sketch of this check follows.
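A minimal sketch of that freshness check, assuming a hypothetical one-row parameter table dbo.CacheParams(last_s_update datetime, last_o_scn bigint) and the linked server from Option 2:

DECLARE @s_update datetime, @o_scn bigint;

-- current last-update time of S (the query shown above)
SELECT @s_update = last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('YourDatabaseName')
  AND object_id = OBJECT_ID('S');

-- current max SCN of O, fetched through the linked server
SELECT @o_scn = max_scn
FROM OPENQUERY(ORACLE_LINKED_SERVER,
               'SELECT MAX(ORA_ROWSCN) AS max_scn FROM ORACLE_OWNER.O');

IF EXISTS (SELECT 1 FROM dbo.CacheParams
           WHERE last_s_update <> @s_update OR last_o_scn <> @o_scn)
BEGIN
    UPDATE dbo.CacheParams
    SET last_s_update = @s_update, last_o_scn = @o_scn;
    -- ...relaunch the big query here and refresh the cache table...
END
-- otherwise the cache is up to date and can be queried directly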
Note that the SCN is not absolutely precise, but it is a good approximation (see http://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns007.htm).
The greater your query rate relative to your update rate, the better this solution works.
If you can tolerate working with stale values, you can improve the cache with an expiration time.

Creating an index on a view with OpenQuery

SQL Server doesn't allow creating a view WITH SCHEMABINDING where the view query uses OPENQUERY, as shown below.
Is there a way or a workaround to create an index on such a view?
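The original snippet was not included, but the view in question would have roughly this shape (the ADSI linked server and LDAP details are assumptions for illustration):
CREATE VIEW dbo.vAdUsers
WITH SCHEMABINDING
AS
SELECT sAMAccountName, displayName
FROM OPENQUERY(ADSI,
     'SELECT sAMAccountName, displayName
      FROM ''LDAP://DC=example,DC=com''
      WHERE objectClass = ''user''');
-- This fails: a schema-bound view cannot reference a rowset function such as
-- OPENQUERY, and the clustered index required for an indexed view in turn
-- requires SCHEMABINDING.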
The best you could do would be to schedule a periodic export of the AD data you are interested in to a table.
The table could of course then have all the indexes you like. If you ran the export every 10 minutes and the possibility of getting data that is 9 minutes and 59 seconds out of date is not a problem, then your queries will be lightning fast.
The only part of concern would be managing locking and concurrency during the export. One strategy might be to export the data into a new table and then swap it into place through renames. Another might be to use SYNONYMs (SQL 2005 and up) to do something similar, where you just point the synonym at two alternating tables, as sketched below.
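A sketch of the synonym variant (all names illustrative): readers always query dbo.AdUsers, while the refresh job loads the inactive table and then repoints the synonym at the fresh copy.
-- load the inactive table while dbo.AdUsers still points at the other one
TRUNCATE TABLE dbo.AdUsers_B;
INSERT INTO dbo.AdUsers_B (sAMAccountName, displayName)
SELECT sAMAccountName, displayName
FROM OPENQUERY(ADSI,
     'SELECT sAMAccountName, displayName
      FROM ''LDAP://DC=example,DC=com''
      WHERE objectClass = ''user''');

-- swap: drop and recreate the synonym so it points at the fresh table
IF OBJECT_ID('dbo.AdUsers', 'SN') IS NOT NULL
    DROP SYNONYM dbo.AdUsers;
CREATE SYNONYM dbo.AdUsers FOR dbo.AdUsers_B;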
The data that supplies the query you're performing comes from a completely different system outside of SQL Server. There's no way that SQL Server can create an indexed view on data it does not own. For starters, how would it be notified when something had changed so it could update its indexes? There would have to be some notification and update mechanism, which is implausible because SQL Server could not reasonably maintain ACID guarantees for such a distributed, slow, non-SQL Server transaction to an outside system.
Thus my suggestion for mimicking such a thing through your own scheduled jobs that refresh the data every X minutes.
--Responding to your comment--
You can't tell whether a new user has been added without querying. If Active Directory supports some API that generates events, I've never heard of it.
But, each time you query, you could store the greatest creation time of all the users in a table, then through dynamic SQL, query only for new users with a creation date after that. This query should theoretically be very fast as it would pull very little data across the wire. You would just have to look into what the exact AD field would be for the creation date of the user and the syntax for conditions on that field.
If managing the dynamic SQL is too tough, a very simple VBScript, VB, or .NET application could also query Active Directory for you on a schedule and update the database.
Here are the basics of indexed views and their requirements. Note that what you are trying to do would fall into the category of a derived table, therefore it is not possible to create an indexed view using OPENQUERY.
This list is from http://www.sqlteam.com/article/indexed-views-in-sql-server-2000
1. The view definition must always return the same results from the same underlying data.
2. Views cannot use non-deterministic functions.
3. The first index on a view must be a clustered, UNIQUE index.
4. If you use GROUP BY, you must include COUNT_BIG(*) in the select list.
5. The view definition cannot contain the following:
a. TOP
b. text, ntext, or image columns
c. DISTINCT
d. MIN, MAX, COUNT, STDEV, VARIANCE, AVG
e. SUM on a nullable expression
f. A derived table
g. A rowset function
h. Another view
i. UNION
j. Subqueries, outer joins, self joins
k. Full-text predicates like CONTAINS or FREETEXT
l. COMPUTE or COMPUTE BY
m. ORDER BY
In this case, there is no way for SQL Server to know of any changes (data, schema, whatever) in the remote data source. For a local table, it can use SCHEMABINDING etc. to ensure the underlying table(s) stay the same, and it can track data changes.
If you need to query the view often, then I'd use a local table that is refreshed periodically. In fact, I'd use a table anyway. AD queries aren't the quickest at the best of times...
