What is a SQL "pseudocolumn"? - sql-server

I accidentally coded SELECT $FOO... and got the error: Invalid pseudocolumn "$FOO".
I can't find any documentation for them. Is it something I should know?
Edit: this is a MS SQL Server specific question.

Pseudocolumns are symbolic aliases for actual columns that have special properties. For example, $IDENTITY is an alias for the column that has the IDENTITY property, and $ROWGUID for the column marked ROWGUIDCOL. They are used for internal plumbing scripts.
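A minimal sketch with a hypothetical table (the pseudocolumns resolve to whichever columns carry the IDENTITY and ROWGUIDCOL properties):
-- Hypothetical table; $IDENTITY and $ROWGUID are aliases, not real column names
CREATE TABLE dbo.Demo (
    DemoID  int IDENTITY(1,1) PRIMARY KEY,
    RowGuid uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),
    Name    nvarchar(50) NOT NULL
);
INSERT INTO dbo.Demo (Name) VALUES (N'example');
-- Returns DemoID and RowGuid without naming those columns explicitly
SELECT $IDENTITY, $ROWGUID, Name
FROM dbo.Demo;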

I don't know why most of the answers are Oracle specific; the question is about SQL Server!
In addition to RobsonROX's answer, another example of their use is in the OUTPUT clause of a MERGE statement: $action indicates whether the row was inserted, updated, or deleted.
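A minimal sketch (TargetTable and SourceTable are hypothetical names):
MERGE INTO dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Name) VALUES (s.ID, s.Name)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, inserted.ID, deleted.ID;  -- $action is 'INSERT', 'UPDATE', or 'DELETE' per row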

A pseudocolumn behaves like a table column, but is not actually stored in the table. You can select from pseudocolumns, but you cannot insert, update, or delete their values.

A simple Google search brings up this from Oracle's reference:
A pseudocolumn behaves like a table column, but is not actually stored in the table. You can select from pseudocolumns, but you cannot insert, update, or delete their values.
I think the error you got occurred simply because there is no column named $FOO, so the query parser checked for a pseudocolumn named $FOO as a fallback. Since there is no pseudocolumn named "$FOO" either (and there is no further fallback), you get the error "Invalid pseudocolumn $FOO". This is a guess, though; I'm no expert when it comes to databases.

Pseudocolumns are virtual columns that are available in special cases. In an Oracle database, for example, the ROWNUM pseudocolumn gives you the row number. SQL Server, as far as I know, doesn't actually support pseudocolumns, but there are error messages and stored procedures that refer to pseudocolumns, probably for Oracle migration.
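In Oracle it is used like this (employees is a hypothetical table; this is Oracle syntax, not SQL Server):
SELECT ROWNUM, e.*
FROM employees e
WHERE ROWNUM <= 10;  -- only the first 10 rows returned by the query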

One example of a pseudo-column is ROWID in Informix. It is a 32-bit number that can be used to find the data page more quickly than any other way (subject to caveats, such as the table not being fragmented), because it is basically the page address for the data. If you do SELECT * FROM SomeTable, it won't show up; if you do SELECT ROWID, * FROM SomeTable, it will show up in your data. Because it is not actually stored on disk (you won't see the ROWID on the disk with the data, though the ROWID tells you where to look on the disk for the data), it is a pseudo-column. There can be other pseudo-columns associated with tables; they tend to be similarly esoteric.
They can also be called hidden columns, particularly if they are (contrary to pseudo-columns) actually stored in the database but are not selected by *; you have to request the column explicitly to see it.

Pseudo columns are "false" columns: any table supports them in addition to its own columns, and strictly speaking they behave like functions. In Oracle, for example (a small sequence sketch follows the list):
1. SYSDATE
2. ROWNUM
3. ROWID
4. NEXTVAL
5. CURRVAL
6. LEVEL

Suppose we have a situation where the number of columns is not the same in two tables and we need to apply a UNION; then we generally take the help of pseudo columns just to perform the UNION operation:
Select Col1, Col2, Col3, Col4, Col5 from Table1
UNION
Select Col1, Col2, Col3, Null as Col4, Null as Col5 from Table2

Related

COUNT(*) vs COUNT(column) Performance in Snowflake

Since Snowflake is a columnar database, does it impact performance when you use COUNT(*) vs COUNT(column)? This is assuming that the column you're referencing does NOT have any NULLs.
As a_horse_with_no_name explained, these two functions are different, but you already mentioned that the column has no NULL values, so they should return the same result in your case.
More importantly, Snowflake has a special optimization for the COUNT function. As far as I can see, it does NOT impact performance whether you use COUNT(*) or COUNT(column), even when the column contains NULL values! For both of them, Snowflake uses METADATA statistics, so it does not actually count rows.
You can test it with SNOWFLAKE_SAMPLE_DATA:
select count(*) from snowflake_sample_data.TPCH_SF1000.LINEITEM;
-- 5999989709
select count(L_ORDERKEY) from snowflake_sample_data.TPCH_SF1000.LINEITEM;
-- 5999989709
Both queries return a result immediately, although the table is about 170 GB in size and contains almost 6 billion rows.
I have to add this extra information because of the conversation between Niru and a_horse_with_no_name. a_horse_with_no_name said:
Even if all columns of a row are NULL, count(*) should include that row in the result. If it doesn't this is a clear violation of the SQL standard
I'm not sure about the SQL standard, but when you use COUNT(*), Snowflake doesn't check whether the columns are NULL or not (as you expected). I can see why Niru misunderstood the documents; the docs and the samples should be improved.
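A tiny illustration of the semantic difference, independent of how Snowflake optimizes either form (hypothetical temporary table):
create or replace temporary table t (c1 int, c2 int);
insert into t values (1, 10), (null, 20), (null, null);
select count(*) from t;   -- 3: every row counts, even the all-NULL one
select count(c1) from t;  -- 1: only rows where c1 is not NULL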
If you run my sample queries, you will see that they are completed in milliseconds. We are talking about counting almost 6 billion rows:
select count(*) from snowflake_sample_data.TPCH_SF1000.LINEITEM;
-- completes in milliseconds
select count(L_ORDERKEY) from snowflake_sample_data.TPCH_SF1000.LINEITEM;
-- completes in milliseconds
But if I make a small modification to the query, it takes about 3 minutes on the same warehouse (XSMALL):
select count(t.*) from snowflake_sample_data.TPCH_SF1000.LINEITEM t;
-- completes in 3 minutes!?
Here is the trick:
Alias.*, which indicates that the function should return the number of rows that do not contain any NULLs.
https://docs.snowflake.com/en/sql-reference/functions/count.html#arguments
Only if you use alias.* (like the t.* in my sample) will Snowflake check whether all columns are NULL when producing the count. This is why it is much slower, and this is why there shouldn't be any performance issue when you are running COUNT(XYZ) or COUNT(*) on a table.
Here is the Snowflake doc; hope it helps: https://docs.snowflake.com/en/sql-reference/functions/count.html
Per that document, COUNT(alias.*) checks each column in the row, whereas COUNT(column) does a NULL check only on that column.

How to use a recursive CTE in a check constraint?

I'm trying to create a check constraint on a table so that ParentID is never a descendant of the current record.
For instance, I have a table Categories, with the following fields ID, Name, ParentID
I have the following CTE
WITH Children AS (
    SELECT ID AS AncestorID, ID, ParentID AS NextAncestorID
    FROM Categories
    UNION ALL
    SELECT Categories.ID, Children.ID, Categories.ParentID
    FROM Categories
    JOIN Children ON Categories.ID = Children.NextAncestorID
)
SELECT ID FROM Children WHERE AncestorID = 99
The results here are correct, but when I try to add it as a constraint to the table like this:
ALTER TABLE dbo.Categories ADD CONSTRAINT CK_Categories CHECK (ParentID NOT IN (
    WITH Children AS (
        SELECT ID AS AncestorID, ID, ParentID AS NextAncestorID
        FROM Categories
        UNION ALL
        SELECT Categories.ID, Children.ID, Categories.ParentID
        FROM Categories
        JOIN Children ON Categories.ID = Children.NextAncestorID
    )
    SELECT ID FROM Children WHERE AncestorID = ID
))
I get the following error:
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
Adding a semicolon before the WITH didn't help.
What would be the correct way to do this?
Thanks!
Per the SQL Server documentation on column constraints:
CHECK
Is a constraint that enforces domain integrity by limiting the possible values that can be entered into a column or columns.
logical_expression
Is a logical expression used in a CHECK constraint and returns TRUE or FALSE. logical_expression used with CHECK constraints cannot reference another table but can reference other columns in the same table for the same row. The expression cannot reference an alias data type.
(The above was quoted from the SQL Server 2017 version of the documentation, but the general principle applies to all previous versions as well; you didn't state which version you are working with.)
The important part here is the "cannot reference another table, but can reference other columns in the same table for the same row" (emphasis added). A CTE would count as another table.
As such, you can't have a complex query like a CTE used for a CHECK constraint.
As Saman suggested, if you want to check against existing data in other rows and the check must live in the DB layer, you could do it with a trigger.
However, triggers have their own drawbacks (e.g. issues with discoverability, and behavior that is unexpected by those who are unaware of the trigger's presence).
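For example, a minimal trigger sketch (my own illustration, not from the question; it assumes the existing data is already cycle-free and that THROW is available, i.e. SQL Server 2012+):
CREATE TRIGGER trg_Categories_NoCycles
ON dbo.Categories
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @cycle bit = 0;
    -- Walk down from each inserted/updated row and see whether its new
    -- ParentID shows up among its own descendants.
    WITH Descendants AS (
        SELECT i.ID AS StartID, c.ID AS NodeID
        FROM inserted AS i
        JOIN dbo.Categories AS c ON c.ParentID = i.ID
        UNION ALL
        SELECT d.StartID, c.ID
        FROM Descendants AS d
        JOIN dbo.Categories AS c ON c.ParentID = d.NodeID
        WHERE c.ID <> d.StartID  -- stop once the walk loops back to the start
    )
    SELECT @cycle = 1
    FROM Descendants AS d
    JOIN inserted AS i ON i.ID = d.StartID
    WHERE d.NodeID = i.ParentID;
    IF @cycle = 1
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'ParentID must not be a descendant of the row itself.', 1;
    END
END;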
As Sami suggested in their comment, another option is a UDF, but that isn't without its own issues, potentially with both performance and stability, according to the answers on this question about the approach in SQL Server 2008. It's probably still applicable to later versions as well.
If possible, I would say it's usually best to move that logic into the application layer. I see from your comment that you already have "client-side" validation. If there is an app server between that client and the database server (such as in a web app), I would suggest putting that additional validation there (in the app server) instead of in the database.

Bad int8 external representation "6*725" in Netezza

I am getting an error like "Bad int8 external representation "6*725"" in Netezza while executing a stored procedure. This stored procedure takes data from a table, does some transformations, and loads it into another table.
Can anyone please help me?
Thanks,
Brajendra
FYI: there may be multiple answers to this question, because we do not have the query you ran to get the error.
If you did a direct INSERT command like the one below, the order of the columns in the SELECT clause does not match the order of the columns in the target table. Most database management systems don't care what the order is, but Netezza does. The fact that it threw "Bad int8" just means that the first column it couldn't match in the SELECT clause has that data type, while the corresponding column in the INSERT target has a different data type.
INSERT INTO DB1..TABLE1
SELECT * FROM DB1..TABLE2;
You can fix this in one of two ways: either change the column order by dropping and recreating the table, or use explicit column names in the INSERT INTO/SELECT command.
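For example (the column names here are hypothetical placeholders):
-- Explicit column lists make the mapping unambiguous
INSERT INTO DB1..TABLE1 (CUST_ID, ORDER_AMT, ORDER_DATE)
SELECT CUST_ID, ORDER_AMT, ORDER_DATE
FROM DB1..TABLE2;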

SQL Server Performance With Large Query

Hi everyone. I have a couple of queries for some reports in which each query pulls data from 35+ tables. Each table has almost 100K records. All the queries use UNION ALL, for example:
;With CTE
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table3 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table4 WHERE Some_Condition
.
.
. And so on
)
SELECT col1, col2, col3 FROM CTE
ORDER BY col3 DESC
So far I have only tested this query on the dev server, and I can see it takes its time to return the results. All of these 35+ tables are unrelated to each other, and this is the only way I can think of to get all the desired data into the result set.
Is there a better way to write this kind of query?
If this is the only way to go, how can I improve its performance by making any changes, if possible?
My Opinion
I don't mind having a few dirty reads in this report. I was thinking of using query hints with NOLOCK, or setting the transaction isolation level to READ UNCOMMITTED.
Will any of this help?
Edit
Every table has 5-10 bit columns, with a corresponding date column for each bit column, and my condition for each SELECT statement is something like
WHERE BitColumn = 1 AND DateColumn IS NULL
Suggestion By Peers
Filtered Index
CREATE NONCLUSTERED INDEX IX_Table_Column
ON TableName(BitColumn)
WHERE BitColumn = 1
Filtered Index with Included Column
CREATE NONCLUSTERED INDEX fIX_IX_Table_Column
ON TableName(BitColumn)
INCLUDE (DateColumn)
WHERE DateColumn IS NULL
Is this the best way to go, or are there any other suggestions?
There are lots of things that can be done to make it faster.
If I assume you need to do these UNIONs, then you can speed up the query by:
Caching the results, for example:
Can you create an indexed view from the whole statement? Or are there lots of different WHERE conditions, so there would be lots of indexed views? Be aware that this will slow down modifications (INSERT, etc.) on those tables.
Can you cache it in a different way? Maybe in the mid layer?
Can it be recalculated in advance?
Making a covering index: the leading columns are the columns from the WHERE clause, then all other columns from the query as included columns.
Note that a covering index can also be filtered, but a filtered index isn't used if the WHERE clause in the query has variables/parameters that can potentially take values not covered by the filtered index (i.e., the row isn't covered).
ORDER BY will cause a sort.
If you can cache it, then it's fine; no sort will be needed (it's cached already sorted).
Otherwise, the sort is CPU bound (and I/O bound if it doesn't fit in memory). To speed it up, do you use a fast collation? The performance difference between the slowest and fastest collations can be as much as 3x. For example, SQL_EBCDIC280_CP1_CS_AS, SQL_Latin1_General_CP1251_CS_AS, and SQL_Latin1_General_CP1_CI_AS are among the fastest collations. However, it's hard to make a recommendation without knowing the collation characteristics you need.
Network
The 'network packet size' for the connection that does the SELECT should be the maximum possible value, 32,767 bytes, if the result set (number of rows) will be big. This can be set on the client side, e.g. in the connection string if you use .NET and SqlConnection. This minimizes CPU overhead when sending data from SQL Server and improves performance on both sides, client and server. It can boost performance even by tens of percent if the network was the bottleneck.
Use the shared memory endpoint if the client is on the SQL Server machine; otherwise TCP/IP, for the best performance.
General things
As you said, using isolation level READ UNCOMMITTED will improve the performance.
...
You probably can't make changes beyond rewriting the query, etc., but just in case: adding more memory if it isn't sufficient now, or using the SQL Server 2014 in-memory features, would surely help.
There are way too many things that could be tuned, but it's hard to point out the key ones when the question isn't very specific.
Hope this helps a bit
Well, you haven't given any statistics or sample run times for any execution, so it is not possible to guess what is slow, or whether it is really slow at all. How much data is in the result set? It might be that just retrieving 100K rows into the result takes its time. If a result set of 10,000 rows takes 5 minutes, then yes, definitely something should be looked at. So if you have a sample query, the number of rows in the result, and how much time a couple of executions with different WHERE conditions took, post that; it will help us compare results.
BTW, do not use a CTE; just use a regular inner and outer SELECT. Make sure tempdb is configured properly; do not leave the MDF and LDF at the default 10% growth increment. By some trial and error you will learn how much the log and tempdb grow for a variety of range queries, and based on that you should set the initial size and growth increment of tempdb's MDF and LDF. For the covering filtered index, the INCLUDE columns should be col1, col2, and col3, not the date column, unless the date is also in the select list.
How frequently does the data in the original 35 tables get updated? If it is at most once per day, or if they all get updated at almost the same time, then indexed views can be a possible solution. But if the original tables get updated more than once a day, or at arbitrary times and not on the same schedule, then do not think about indexed views.
If disk space is not an issue, as a last resort try and test performance using a trigger on each of the 35 tables. Create a new table to hold the final results you expect from this SELECT query. Create insert/update/delete triggers on each of the 35 tables that check the conditions inside the trigger and, only if they match, copy the same insert/update/delete to the new table. Yes, you will need a column in the new table that identifies which table the data came from. Because the date is a nullable column, you do not get full advantage of an index on that column, since you are mostly looking for WHERE Date IS NULL.
If the only query you ever run against the new table is WHERE Date IS NULL, then don't even bother to create that column; just create the bit columns and the other columns col1, col2, col3, etc. If you give a real example of your query and explain the actual tables, other details can be worked out later.
The query hints or the isolation level are only going to help you if blocking occurs.
If you don't mind dirty reads and there are locks during the execution, it could be a good idea.
The key question is how much data fits the WHERE clause you need to use (WHERE BitColumn = 1 AND DateColumn IS NULL).
If the subset filtered by that is small compared with the total number of rows, then use an index on both columns, BitColumn and DateColumn, including the columns from the SELECT clause to avoid lookup operations in your query plan:
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(BitColumn, DateColumn)
INCLUDE (col1, col2, col3)
Of course, the space needed for that covered, filtered index depends on the data types of the fields involved and the number of rows that satisfy WHERE BitColumn = 1 AND DateColumn IS NULL.
After that I recommend using a view instead of a CTE:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
.
.
.
)
By doing that, your query plan should look like 35 small index scans, but if most of the data satisfies the WHERE clause of your index, the performance is going to be similar to scanning the 35 source tables and the solution won't be worth it.
But you say "Every table has 5-10 bit columns and a corresponding date column...", so I think it is not a good idea to make an index per bit column.
If you need to filter by different bit columns and different date columns, use a computed column in your table:
ALTER TABLE Table1 ADD ComputedFilterFlag AS
CAST(
CASE WHEN BitColumn1 = 1 AND DateColumn1 IS NULL THEN 1 ELSE 0 END +
CASE WHEN BitColumn2 = 1 AND DateColumn2 IS NULL THEN 2 ELSE 0 END +
CASE WHEN BitColumn3 = 1 AND DateColumn3 IS NULL THEN 4 ELSE 0 END
AS tinyint)
I recommend you use the value 2^(X-1) for condition X (BitColumnX = 1 AND DateColumnX IS NULL). This allows you to filter on any combination of those criteria.
For example, by using the value 3 you can locate all rows that satisfy both the Bit1/Date1 and Bit2/Date2 conditions. Any combination of conditions has its corresponding ComputedFilterFlag value, because ComputedFilterFlag acts as a bitmap of conditions.
If you have fewer than 8 different filters, you should use tinyint to save space in the index and decrease the I/O operations needed.
Then use an Index over ComputedFilterFlag colum:
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(ComputedFilterFlag)
INCLUDE (col1, col2, col3)
And create the view:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
.
.
.
)
By doing that, your index covers all the conditions and your query plan should look like 35 small index seeks.
But this is a tricky solution; maybe a refactoring of your table schema could produce simpler and faster results.
You'll never get real-time results from a UNION ALL query over many tables, but I can tell you how I got a little speed out of a similar situation. Hopefully this will help you out.
You can actually run all of the queries at once with a little bit of coding and ingenuity.
Create a global temporary table instead of a common table expression, and don't put any keys on the global temporary table; they will just slow things down. Then start all the individual queries, each inserting into the global temporary table. I've done this a hundred or so times manually, and it's faster than a UNION query because you get a query running on each CPU core. The tricky part is the mechanism for determining when the individual queries have finished; you're on your own for that piece, hence I do these manually. A rough sketch follows.
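Something like this (a sketch; table names, column names, and types are placeholders, and each INSERT would be started from its own session/connection so they run concurrently):
-- Created once; visible to all sessions until the creating session closes
CREATE TABLE ##Results (col1 int, col2 int, col3 datetime);
-- Session 1:
INSERT INTO ##Results (col1, col2, col3)
SELECT col1, col2, col3 FROM Table1 WHERE BitColumn = 1 AND DateColumn IS NULL;
-- Session 2:
INSERT INTO ##Results (col1, col2, col3)
SELECT col1, col2, col3 FROM Table2 WHERE BitColumn = 1 AND DateColumn IS NULL;
-- ...one INSERT per source table...
-- Once all of the inserts have finished:
SELECT col1, col2, col3
FROM ##Results
ORDER BY col3 DESC;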

how to get name of column(s) causing error "Column name or number of supplied values does not match table definition"

I want to insert data from one table into another. Both tables have about 100 columns.
They don't have the same structure, but "almost": the source table has about 20 fewer columns, and some of the columns missing from the source are NOT NULL in the target. For those columns, I have to define a default, of course.
My first attempt resulted in an error message (surprise, surprise):
Column name or number of supplied values does not match table definition
But in my complex case, this message is not really helpful. Is there a way to get a more precise error message?
I suggest making your query easier to read rather than relying on the error message from the RDBMS. One thought:
Put each column on its own line.
INSERT INTO TargetTable (
Col1,
Col2,
....
)
SELECT
Col1,
Col2,
....
FROM SourceTable
Create a new Excel sheet.
In SQL Server Management Studio, press Alt-F1 on your TargetTable; copy/paste the Column_name values (and perhaps the Nullable values) into the sheet's column A.
Copy/paste the SourceTable columns into column B.
Run a macro or visually inspect/adjust the differences. The nullable columns you mention, you'll have to know offhand.
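Alternatively, a catalog query can do the comparison for you (a sketch; 'TargetTable' and 'SourceTable' are placeholders for your actual table names):
SELECT COALESCE(t.COLUMN_NAME, s.COLUMN_NAME) AS COLUMN_NAME,
       CASE WHEN s.COLUMN_NAME IS NULL THEN 'missing in source' END AS note,
       t.IS_NULLABLE AS target_nullable
FROM (SELECT COLUMN_NAME, IS_NULLABLE
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE TABLE_NAME = 'TargetTable') AS t
FULL OUTER JOIN
     (SELECT COLUMN_NAME
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE TABLE_NAME = 'SourceTable') AS s
    ON s.COLUMN_NAME = t.COLUMN_NAME
ORDER BY 1;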
