Can a database index be on a function of a column, or must it be on precisely what's in the column itself without any change/adjustment/calculation?
Simple example:
If a transactions table contains a column that specifies the datetime of the transaction (e.g. 2020-12-13 12:58:59), and we want to index on just the date (e.g. 2020-12-13) of the transaction, does that require creating another column (with just the date), or can the index be created on a function of the datetime column?
Postgres does in fact support indexes on expressions (often called function-based indexes). For your example, we can define:
CREATE INDEX idx ON transactions (cast(ts_col AS date));
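The index is only considered when a query filters on the same expression; a minimal sketch using the names from the statement above:
-- the planner can match idx because the predicate repeats the indexed expression
SELECT *
FROM transactions
WHERE cast(ts_col AS date) = DATE '2020-12-13';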
Yes, it can. The index can even be on any expression, including function calls, and can use more than one column of the table.
From the documentation of CREATE INDEX:
CREATE ... INDEX ... ON ... table_name ...
( { ... ( expression ) } ... ) ...
...
expression
An expression based on one or more columns of the table. The expression usually must be written with surrounding parentheses, as shown in the syntax. However, the parentheses can be omitted if the expression has the form of a function call.
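A sketch illustrating the parentheses rule, assuming ts_col is a timestamp without time zone (so both expressions below are immutable, which expression indexes require):
-- a bare function call may omit the extra parentheses
CREATE INDEX idx_day ON transactions (date_trunc('day', ts_col));
-- any other expression needs its own surrounding parentheses
CREATE INDEX idx_date ON transactions ((ts_col::date));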
Related
I've created a table and added default values for some columns.
E.g.
Create table Table1 (
    COL1 NUMBER(38,0),
    COL2 STRING,
    MODIFIED_DT STRING DEFAULT CURRENT_DATE(),
    IS_USER_MODIFIED BOOLEAN DEFAULT 'FALSE'
)
Current Behavior:
During data load, I see that when running inserts, my column 'MODIFIED_DT' is getting inserted with default values.
However, if there are any subsequent updates, the default value is not getting updated.
Expected Behavior:
My requirement is that the column value should be automatically taken care of by ANY INSERT/UPDATE operation.
E.g. in SQL Server, if I add a Default, the column value will always be inserted/updated with the default values whenever a DML operation takes place on the record.
Is there a way to make it work, or do default values apply only to inserts?
Is there a way to add logic to the DEFAULT values?
E.g. In the above table's example, for the column IS_USER_MODIFIED, can I do:
Case when CURRENT_USER() = 'Admin_Login' then 'FALSE' Else 'TRUE' end
If not, is there another option in snowflake to implement such functionality?
The following is generic to most (all?) databases and is not specific to Snowflake...
Default values on columns in table definitions only get inserted when there is no explicit reference to that column in an INSERT statement. So if I have a table with 2 columns (column_a and column_b, with a default value on column_b) and I execute this type of INSERT:
INSERT INTO [dbo].[doc_exz]
([column_a])
VALUES
(3),
(2);
column_b will be set to the default value. However, with this INSERT statement:
INSERT INTO [dbo].[doc_exz]
([column_a]
,[column_b])
VALUES
(5,1),
(6,NULL);
column_b will have values of 1 and NULL. Because I have explicitly referenced column_b, the value I use, even if it is NULL, will be written to the record, even though that column definition has a default value.
Default values only work with INSERT statements, not UPDATE statements: an existing record must already have a "value" in the column, even if that is a NULL value, so when you UPDATE it the default doesn't apply. So I don't believe your statement about defaults working with updates on SQL Server is correct; I've just tried it, just to be sure, and it doesn't.
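Note that SQL Server does let you request the default explicitly with the DEFAULT keyword; a sketch against the doc_exz table above (this is standard T-SQL, not an automatic default):
-- reference column_b but still take its default value
INSERT INTO [dbo].[doc_exz] ([column_a], [column_b])
VALUES (7, DEFAULT);

-- an UPDATE never applies the default on its own, but it can be requested explicitly
UPDATE [dbo].[doc_exz]
SET [column_b] = DEFAULT
WHERE [column_a] = 7;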
Snowflake-specific Answer
Given that column defaults only work with INSERT statements, they are not going to be a solution to your problem. The only straightforward solution I can think of is to explicitly include these columns in your INSERT/UPDATE statements.
You could write a stored procedure to do the INSERT/UPDATES, and automatically populate these columns, but that would perform poorly for bulk changes and probably wouldn't be simple to use as you'd need to pass in the table name, the list of columns and the list of values.
Obviously, if you are inserting/updating these records using an external tool you'd put this logic in the tool rather than trying to implement it in Snowflake.
Snowflake has a "derived column" feature. These columns are VIRTUAL/COMPUTED and are not loaded by the ETL process. However, any DML activity will automatically influence the column values.
The nice thing is that we can even write CASE logic in the column definition. This solved my problem.
CREATE OR REPLACE TABLE DB_NAME.DBO.TEST_TABLE
(
FILE_ID NUMBER(38,0),
MANUAL_OVERRIDE_FLG INT as (case when current_user() = 'some_admin_login' then 0 else 1 end),
RECORD_MODIFIED_DT DATE as (CURRENT_DATE()),
RECORD_MODIFIED_BY STRING as (current_user())
);
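A quick sketch of how this behaves (the values are hypothetical): DML only touches the physical column, and the derived columns come along for free:
-- only FILE_ID is supplied; the derived columns cannot be inserted into
INSERT INTO DB_NAME.DBO.TEST_TABLE (FILE_ID) VALUES (1001);

-- the CASE logic, CURRENT_DATE() and CURRENT_USER() are evaluated automatically
SELECT FILE_ID, MANUAL_OVERRIDE_FLG, RECORD_MODIFIED_DT, RECORD_MODIFIED_BY
FROM DB_NAME.DBO.TEST_TABLE;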
I am trying to create an external table with various partition columns.
It works to do the following, for instance:
create or replace external table mytable(
myday date as to_date(substr(metadata$filename, 35, 10), 'YYYY-MM-DD'))
partition by (myday)
location = @mys3stage
file_format = (type = parquet);
However, I would like to use regexp_substr instead of character indexing, as I won't always have consistent character indices for all partitioning columns. I would like to do this:
create or replace external table mytable(
myday date as to_date(regexp_substr(metadata$filename, 'day=[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'), 'day=YYYY-MM-DD'))
partition by (myday)
location = @mys3stage
file_format = (type = parquet);
This gives me an error: "Defining expression for partition column MYDAY is invalid." I can run the regexp_substr clause successfully in a select statement outside of the external table creation, getting the same results as the substr approach.
How can I use regex string matching in my external table partition column definition?
REGEXP_SUBSTR is not on the list of currently supported partition key functions. Please use the following link to see the list of acceptable functions:
https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html#partitioning-parameters
I am not completely sure I understand how your folder structure wouldn't be consistent. Perhaps if you provided an example, this community could offer a more precise response. However, if you are unable to come up with a parsing mechanism that works, perhaps you could leverage a CASE statement to handle each unique folder structure that you may come across in your environment.
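For instance, if the partition value is always introduced by a 'day=' marker (as in the question's pattern), here is a sketch that stays closer to the supported list by combining POSITION with SUBSTR; check the linked documentation to confirm POSITION is on the allowed list:
create or replace external table mytable(
    -- find 'day=' wherever it occurs, then take the 10 characters after it
    myday date as to_date(
        substr(metadata$filename, position('day=', metadata$filename) + 4, 10),
        'YYYY-MM-DD'))
    partition by (myday)
    location = @mys3stage
    file_format = (type = parquet);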
I'm trying to query my SQL database to return all the rows where the ID is contained in a separate table's column. The list of project IDs is kept in the Feedback table in the Project_ID column with datatype varchar. I am trying to return the rows from the Projects table, where the IDs are kept in the Project_ID column.
I am doing this using the query
SELECT * FROM Projects WHERE Project_ID IN (
SELECT Project_ID FROM Feedback WHERE ID = 268 and Project_ID IS NOT NULL
)
When I run this query I am returned with the message:
Conversion failed when converting the varchar value '36;10;59' to data type int
This is yet another example of the importance of normalizing your data.
Keeping multiple data points in a single column is almost never the correct design, and by almost never I mean about 99.9999% of the time.
If you can't normalize your database, you can use a workaround like this:
SELECT *
FROM Projects p
WHERE EXISTS (
    SELECT F.Project_ID
    FROM Feedback F
    WHERE F.ID = 268
      AND F.Project_ID IS NOT NULL
      AND ';' + F.Project_ID + ';' LIKE '%;' + CAST(p.Project_ID AS varchar) + ';%'
)
You can't use the IN operator here, since it compares against each value returned by the subquery as a whole, and here each returned value is itself a semicolon-delimited list rather than an individual ID. Even if the values in Project_ID were delimited by a comma it would still not work.
The reason I've added the ; on each side of the Project_ID in both tables is that this way the LIKE operator will return true for any location where it finds Projects.Project_ID inside Feedback.Project_ID. You must wrap Projects.Project_ID in the delimiters to prevent LIKE from returning true when the number you are looking for is only a partial match to a number in the delimited string. Consider looking for 12 in a string containing 1;112;455: without adding the delimiters to the search value (12 in this example), the LIKE operator would return true.
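As an alternative to the LIKE trick, on SQL Server 2016 or later the built-in STRING_SPLIT function can unpack the delimited column; a sketch using the question's table and column names:
SELECT *
FROM Projects p
WHERE EXISTS (
    SELECT 1
    FROM Feedback F
    -- turn the packed '36;10;59' value into one row per ID
    CROSS APPLY STRING_SPLIT(F.Project_ID, ';') s
    WHERE F.ID = 268
      AND F.Project_ID IS NOT NULL
      AND s.value = CAST(p.Project_ID AS varchar(11))
)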
I am using SQL Server 2012.
My table definition is:
-- orders1 must be created and populated first so the foreign key in products1 resolves
create table orders1(oid int primary key,onm varchar(10),odt date)
create table products1(pid int,pnm varchar(10),oid int foreign key references orders1(oid))
insert into orders1 values(1,'QTY','2013-04-27'),(2,'PROD','2015-04-29')
insert into products1 values(1,'ABC',1),(2,'DEF',2)
create function dbo.udf_getmaxdt(@a date)
returns date
as
begin
select @a=max(o.odt) from orders1 o inner join products1 p on o.oid=p.pid
return @a
end
create function dbo.udf_getmaxdt(@a date)
returns TABLE
as
RETURN
(
select @a=max(o.odt) from orders1 o inner join products1 p on o.oid=p.pid
)
end
The syntax might be wrong, but my task is to determine which of the two types of function is better.
I'm not able to tell which one; could you help me with this?
regards,
Chio
The first one is a scalar function.
The second one is an inline table-valued UDF.
The second one has a pattern which you can recognize:
create FUNCTION [dbo].[fn_geo_calcDistance]
(
...
)
RETURNS table
AS
RETURN SELECT ...
When the second one is inlined into a query, it is not executed as a separate outside query but expanded as an inline query, which will be much faster.
Aside from the fact that your provided definition of the table-valued function has incorrect syntax (you're not returning a resultset there), it is impossible to state that "a scalar function is better than a table-valued one" (or vice versa), because that is a statement like "apples are better than cars".
They are two absolutely different types of functions:
a table-valued function returns the result of some query; this resultset can contain multiple rows.
a scalar function returns only one scalar value.
So it's up to you - what should you use in each particular case.
There are some tricks using inline table-valued functions as scalar ones (by returning a single row of data containing a single value), and this makes sense in some cases (mostly for eliminating bottlenecks in performance optimization), but it should not be considered a "universal recipe" for every use-case.
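A sketch of that trick using the tables from the question (the function name is made up, the syntax is corrected since an inline TVF cannot assign to a variable, and the join is on o.oid = p.oid, which the question's o.oid = p.pid presumably intended):
create function dbo.udf_getmaxdt_tvf()
returns table
as
return
(
    -- one row, one column: effectively a scalar wrapped in a resultset
    select max(o.odt) as maxdt
    from orders1 o
    inner join products1 p on o.oid = p.oid
)
GO

-- usage: the optimizer expands the function into the calling query's plan
select p.pid, p.pnm, mx.maxdt
from products1 p
cross apply dbo.udf_getmaxdt_tvf() mx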
I am trying to create a table where, when a row is inserted, a computed column calculates a commission value. I have wrapped the logic for this calculation into a table-valued function.
What I thought I could do is:
ALTER TABLE [dbo].[CostChange] ADD Commission
AS (SELECT TOP 1 Value
FROM fn_WesleyTest(CostId, BookingId, ChangeDate))
GO
But I get an error:
Subqueries are not allowed in this context. Only scalar expressions are allowed.
Is it possible to overcome this?
Will I have to change the return value of my function?
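For what it's worth, the error message suggests the way out: a computed column accepts a scalar expression, and that can include a call to a scalar UDF. A hedged sketch, where the wrapper name fn_WesleyCommission, the parameter types, and the money return type are all assumptions:
-- hypothetical scalar wrapper around the same commission logic
CREATE FUNCTION dbo.fn_WesleyCommission(@CostId int, @BookingId int, @ChangeDate datetime)
RETURNS money
AS
BEGIN
    RETURN (SELECT TOP 1 Value
            FROM dbo.fn_WesleyTest(@CostId, @BookingId, @ChangeDate));
END
GO

ALTER TABLE [dbo].[CostChange]
ADD Commission AS (dbo.fn_WesleyCommission(CostId, BookingId, ChangeDate));
GO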