Is query_id unique across all Snowflake accounts?
The query_id looks like a UUID string. So I am wondering: is the Snowflake-generated query_id unique across all Snowflake accounts? Can Snowflake users submit a query_id to Snowflake Support for help? And can Snowflake Support identify the account and the query details from the query_id alone?
Support can infer a lot from the Query ID and can look up additional information on that specific query (all related to metadata), but they may ask you to confirm other details.
For example, they may ask for account confirmation, so that they know you are an authorized user of the account and aren't someone random who copied a Query ID from Stack Overflow. If you are enabled for Snowflake Support and can submit a case (see the link below), then you are already confirmed as an authorized user for the account. https://support.snowflake.net/s/article/How-To-Submit-a-Support-Case-in-Snowflake-Lodge
You're correct that the Query ID appears to be a UUID, but there's no documented guarantee that a Query ID is unique beyond the guarantee that the UUID standard itself provides (read: it's extraordinarily unlikely, but not impossible, that two UUIDs will collide). So for whatever purpose you need it, Query ID is likely to suffice as unique.
There is, however, no documented guarantee that Snowflake will continue to use UUIDs for query IDs indefinitely. If the engineering team decides tomorrow to use some other constructed string, they probably can, though I don't know why they would.
The query ID is based on a UUID, and query IDs are unique across all Snowflake accounts. So even when you submit a case and provide only the query ID and nothing else, Snowflake Support will be able to locate your query, though Support may of course ask for more information in order to help you better.
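As an aside, you can also look up a query by its ID within your own account. A minimal sketch using the QUERY_HISTORY table function (the query ID value below is a made-up placeholder):

SELECT query_id, query_text, user_name, start_time
FROM TABLE(information_schema.query_history())
WHERE query_id = '01a2b3c4-0000-1234-0000-56789abcdef0';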
We have tables in Salesforce which we'd like to make available to other applications using Microsoft Common Data Service. Moreover, we'd like to keep CDS more or less up to date, even including data that was created or updated five minutes ago.
However, some of those tables have hundreds of thousands, or even millions, of records. So, refreshing all the data is inefficient and impractical.
In theory, when CDS queries for data, it should be able to know when its most recent data is from and include that cutoff in the query for new data.
But I'm not clear how to make that part of the query that gets used in the refresh operation.
Is this possible?
What do I need to do?
How are you selecting the data that you are retrieving? If you have the option to use a SOQL query, you can use the fields CreatedDate and LastModifiedDate as part of your queries. For example:
SELECT Id, Name
FROM Account
WHERE CreatedDate > 2020-10-25T23:01:01Z
or
SELECT Id, Name
FROM Account
WHERE LastModifiedDate > 2020-10-25T23:01:01Z
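A common refinement (an assumption about your pipeline, not something every connector supports) is to track a high-water mark per table and filter on SystemModstamp, which also catches system-initiated changes that don't update LastModifiedDate:

SELECT Id, Name, SystemModstamp
FROM Account
WHERE SystemModstamp > 2020-10-25T23:01:01Z
ORDER BY SystemModstamp

After each run, store the largest SystemModstamp you received and use it as the lower bound for the next run.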
There are some options, but I have no idea what (if anything) is implemented in your connector. Read up a bit and maybe you'll decide to do something custom. There's also a semi-decent SF connector in Azure Data Factory v2 if that helps.
Almost every Salesforce table contains CreatedDate, LastModifiedDate, and SystemModstamp columns, but we don't have raw access to the underlying database. There's no ODBC driver (or if there is, it's lying: it pretends SF objects are "linked tables" and hides the API implementation magic in stored procedures). I'm not affiliated and have never used it personally, but I've heard good things about DBAmp. It's been a few years though, so alternatives might have popped up. Go give that ADF connector a go.
You could even rephrase the problem a bit and look around for a backup tool that does incremental backups and works with SQL Server: kill two birds with one stone.
So...
The other answer gives you the query route, which is OK but a bit impractical if it's 100+ tables.
There's the Data Replication API to get the Ids (primary keys) of recently updated/deleted records. The SOAP API has the getUpdated call, and there's something similar in the REST API. You'd still have to call it per object, though. And you'd only know that these records were modified; you'd still need to query all the columns (there's a "retrieve" call similar to SELECT *). A sketch of the REST version is below.
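For example, the REST API's "Get Updated" resource takes a start/end window and returns the Ids changed in that range (the API version below is just an example):

GET /services/data/v50.0/sobjects/Account/updated/?start=2020-10-25T00:00:00Z&end=2020-10-26T00:00:00Z

The response contains the changed ids plus a latestDateCovered timestamp you can use as the start of the next window.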
Perhaps you need to reverse the direction. SF can raise events when data changes, and subscribing apps have between 1 and 3 days to consume them. It uses the CometD protocol, and chances are there's a library for it in the .NET world. There are a few types of events (you can raise custom events, or raise them only when certain conditions are met, from SF config or code; and, the other way around, a subscribing app can specify a query it's interested in and get notified whenever that query's results would change). But if you just want everything, search for "Change Data Capture". It could be a nice near-real-time solution.
Is there a way in Snowflake to describe/catalog a schema so it is easier to search? For example, a column named "s_id" really represents "system identifier". Is there a way to specify that s_id means "system identifier", so that a user can search for "system identifier" and get s_id back as the column name?
Just about every object in Snowflake allows for a comment to be added. You could leverage the comment field to be more descriptive of the column's intent, which can then be searched in information_schema.columns. I've seen many Snowflake customers use the comments as a full-blown data dictionary. My only suggestion is to not use any apostrophes in your comments, as they screw up your get_ddl() results.
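A minimal sketch, assuming a hypothetical table my_table:

COMMENT ON COLUMN my_table.s_id IS 'system identifier';

SELECT table_name, column_name, comment
FROM information_schema.columns
WHERE comment ILIKE '%system identifier%';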
Adding comments to tables and fields can serve the purpose, but if you are in a complex enterprise environment with a lot of groups involved, this approach will be hard to sustain for long unless you have single ownership of a data-modeling tool that promotes such changes to Snowflake. Managing this information via comments is error-prone, and that's why Snowflake has partners that help with different development requirements.
If you go to their partner website (link), you will find a lot of companies with good Snowflake integrations, one of which is Alation (link). You can try it free for 30 days and see whether it serves your data cataloging purpose.
Maybe this has been asked a lot, but I can't find a comprehensive post about it.
Q: What are the options when you don't want to pass database ids to the frontend, because you don't want the user to be able to see how many records are in your database?
What I found/heard so far:
Encrypt and decrypt the Id on backend
Use a GUID instead of a numeric auto-incremented Id as PK
Use a GUID together with an auto-incremented Id as PK
Q: Do you know any other or do you have experience with any of these? What are the performance and technical issues? Please provide documentation and blog posts on this topic if you know any.
Two things:
The sheer existence of an id doesn't tell you anything about how many records are in a database. Even if the id is something like 10, that doesn't mean there are only 10 records; it's just likely the tenth that was created.
Exposing ids has nothing to do with security, one way or another. Ids only have a meaning in the context of the database table they reside in. Therefore, in order to discern anything based on an id, the user would have to have access directly to your database. If that's the case, you've got far more issues than whether or not you exposed an id.
If users shouldn't be able to access certain ids, such as perhaps an edit page, where an id is passed as part of the URL, then you control that via row-level access policies, not by obfuscating or attempting to hide the id. Security by obscurity is not security.
That said, if you're just totally against the idea of sequential ids, then use GUIDs. The performance impact is minimal if you generate them sensibly: a GUID primary key is still a clustered index, just like any other primary key, though purely random GUIDs cause page splits and fragmentation in a clustered index, so prefer sequential GUIDs there. GUIDs take up more space than something like an int, obviously, but we're talking a difference of 12 bytes per id - hardly anything to worry about with today's storage.
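A minimal SQL Server sketch of that approach (table and column names are hypothetical):

-- NEWSEQUENTIALID() generates ever-increasing GUIDs, avoiding the page
-- splits that purely random NEWID() values cause in a clustered index
CREATE TABLE dbo.Customer (
    CustomerId uniqueidentifier NOT NULL
        CONSTRAINT DF_Customer_Id DEFAULT NEWSEQUENTIALID(),
    Name nvarchar(100) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId)
);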
I have a web app that stores objects in a database and then sends emails based on changes to those objects. For debugging and tracking, I am thinking of including the Document Id in the email metadata. Is there a security risk here? I could encrypt it (AES-256).
In general, I realize that security through obscurity isn't good practice, but I am wondering if I should still be careful with Document Ids.
For clarity, I am using CouchDB, but I think this can apply to databases in general.
By default, CouchDB uses UUIDs with a UTC time prefix. The worst you can leak there is the time the document was created, and you will be able to correlate about 1k worth of IDs likely having been produced on the same machine.
You can change this in the CouchDB configuration to use purely random 128-bit UUIDs by setting the algorithm option in the uuids section to random; for more information, see the CouchDB docs. Nothing should be gleanable from those.
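That is, in your CouchDB config file (local.ini is the usual location, though that's install-dependent):

[uuids]
algorithm = random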
Edit: If you choose your own document IDs, of course, you leak whatever you put in there :)
Compare Convenience and Security:
Convenience:
how useful is it for you having the document id in the mail?
can you quickly get useful information / find the document from the ID?
does encrypting/hashing it make it harder to get at the actual database document? (the answer here is yes, unless you have a nice lookup form or something that takes the hash directly; avoid manual steps)
Security:
having a document ID what could I possibly do that's bad?
let's say you have a web application for looking at documents: the same ID appears in a URL, so it can't be considered 'secret'
if I have the ID, can I access the 'document' or some other information I shouldn't be able to access? Hint: you should always properly check rights; if that's done, then you have no problem.
as long as an ID isn't considered 'secret', meaning there are no security checks based purely on the ID, you should have no problems.
do you care if someone finds out the time a document was created? ( from Jan Lehnardt's answer )
I'm certainly no DBA and only a beginner when it comes to software development, so any help is appreciated. What is the most secure structure for storing data from multiple parties in one database? For instance, if three people have access to the same tables, I want to make sure that each person can only see their own data. Is it best to create a unique ID for each person, store that along with the data, and then query based on that ID? Are there other considerations I should take into account as well?
You are on the right track, but mapping the USER ID into the table is probably not what you want, because in practice many users have access to a corporation's data. In those cases you would store "CorpID" as a column, or more generically "ContextID". But yes, to limit access to data, each row should convey who the data is for, either directly (the row actually contains a reference to CorpID, UserID, ContextID, or the like) or by joining to other tables that carry the qualifier.
In practice, these rules are enforced by a middle tier that queries the database, providing the user context in some way so that only the correct records are selected out of the database and ultimately presented to the user.
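A minimal sketch of that pattern (table and column names are hypothetical; the middle tier binds :corp_id from the authenticated session):

-- every row carries the tenant/context it belongs to
CREATE TABLE invoice (
    invoice_id INT PRIMARY KEY,
    corp_id    INT NOT NULL,      -- the "ContextID" qualifier
    amount     DECIMAL(10, 2)
);

-- users never query the table directly; the middle tier supplies :corp_id
SELECT invoice_id, amount
FROM invoice
WHERE corp_id = :corp_id;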
...three people have access to the same tables...
If these persons can query the tables directly through some query tool like Toad, then we have a serious problem. If not, i.e. they access the data through some middle tier/service layer, then #wagregg's solution above holds.
Coming to the case where they have direct access rights, one approach is:
Create a database-level user account for each of the users.
Have another table with row-level grant information. Say your_table has a primary key column MY_PK_COL; then the structure of the GRANTS_TABLE table would be {USER_ID; MY_PK_COL}, with MY_PK_COL a foreign key to your_table.
Remove all privileges on your_table from the users concerned.
Create a view that joins through the grants table: SELECT t.* FROM your_table t JOIN GRANTS_TABLE g ON g.MY_PK_COL = t.MY_PK_COL WHERE g.USER_ID = getCurrentUserID();
Give your users SELECT/INSERT/UPDATE rights on this view.
Most database systems (MySQL, Oracle, SQL Server) provide a way to get the currently logged-in user (the one used in the connection string). They also provide ways to restrict access to certain tables. Now, to your users, the view will behave like a normal table; they will never know the difference. A fuller sketch of the grants table and view is below.
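A minimal sketch in MySQL flavor (names are hypothetical; CURRENT_USER() is MySQL's built-in for the logged-in account, with the host part stripped for the comparison):

-- row-level grant information: which database user may see which row
CREATE TABLE grants_table (
    user_id   VARCHAR(100) NOT NULL,
    my_pk_col INT NOT NULL,
    FOREIGN KEY (my_pk_col) REFERENCES your_table (my_pk_col)
);

-- the view filters your_table down to the rows granted to the current user
CREATE VIEW your_table_v AS
SELECT t.*
FROM your_table t
JOIN grants_table g ON g.my_pk_col = t.my_pk_col
WHERE g.user_id = SUBSTRING_INDEX(CURRENT_USER(), '@', 1);

-- INSERT/UPDATE through a multi-table view is subject to MySQL's
-- updatability rules, so grant only what actually works for your case
GRANT SELECT ON your_table_v TO 'alice'@'%';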
A problem arises when there are too many users: provisioning a database-level user account for every one of them may become difficult. But then a DBMS like MS SQL Server can use Windows authentication, thereby reducing the user-provisioning problem.
In most scenarios, filtering at the middle tier is the best way. But there are times when security is paramount, and a bug in the middle tier (SQL injection, to name one) may allow malicious users to bypass the security. Then you have to do what you have to do.
It sounds like you're talking about a multi-tenant architecture, but I can't tell for sure.
This SO answer has a summary of the issues, and links to an online article containing details about the trade-offs.