I have the query below, where I want to filter records using multiple criteria, but I am getting the following syntax error.
Query
SELECT * FROM mydb.test
where org=123
AND (status = 'over' AND ecode = 196)
OR (status = 'start' AND ecode = 195)
ALLOW FILTERING;
Syntax error
SyntaxException: <Error from server: code=2000 [Syntax error in CQL query]
message="line 1:88 mismatched input 'AND' expecting ')' (... AND (status = 'over' [AND]...)">
How can I fix this syntax error?
OR is not supported by Cassandra...
Alex is correct. Cassandra does not support the OR keyword. It's one of the differences between CQL and SQL. In fact, given Cassandra's storage model, an OR construct is particularly problematic.
How can I achieve this scenario?
I can think of a few ways.
With Cassandra, the general idea of data modeling is to build your tables to suit your queries. So the first approach would be to apply the logic when loading the data, but your logic may be too complex for that.
You could also split this query into two queries (based on your AND conditions) and process the result sets on the application side. Not optimal, but it might be the only way to get the fine-grained control you need.
The other approach, would be to try using IN to get around the absence of OR. Just be careful not to restrict your partition key with IN, and always specify your partition key (with an = operator) when you do. That way you'll limit your query to processing on a single node. In fact, using IN on a clustering key (again, with = on your partition key) is really the only way I would recommend its use in a production system.
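For example, the two-query split could look like this (keeping ALLOW FILTERING, since your original query required it), with the two result sets merged on the application side:
SELECT * FROM mydb.test WHERE org = 123 AND status = 'over' AND ecode = 196 ALLOW FILTERING;
SELECT * FROM mydb.test WHERE org = 123 AND status = 'start' AND ecode = 195 ALLOW FILTERING;
And if status happens to be a clustering column under the org partition key (an assumption; check your table definition), the IN approach could fetch both slices in one query, with the ecode filtering done in your application:
SELECT * FROM mydb.test WHERE org = 123 AND status IN ('over', 'start');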
I am trying to create a Stream on top of a VIEW. The VIEW joins multiple other VIEWs that use QUALIFY to get the latest record for each PK from the base tables.
Snowflake gives me the following error when I try to create this Stream:
Change tracking is not supported on queries with QUALIFY.
What are my options? Thanks.
Note:
Change tracking is not supported on queries with window functions.
Streams on views with the following operations are not yet supported:
- GROUP BY clauses
- QUALIFY clauses
- Subqueries not in the FROM clause
- Correlated subqueries
- LIMIT clauses
You can read more about this here.
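One workaround to consider, sketched here under the assumption of a single base table base_t with key pk and timestamp updated_at (names are placeholders): create the stream on the base table itself, where change tracking is supported, and apply the latest-record-per-PK logic when consuming the stream rather than inside the view:
CREATE STREAM base_t_stream ON TABLE base_t;
SELECT *
FROM base_t_stream
QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY updated_at DESC) = 1;
The restriction quoted above applies to change tracking on the view definition, not to queries you run against the stream, so the deduplication can move downstream. If the view joins several base tables, you would need one stream per base table.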
I noticed that even the simplest 'SELECT MAX(TIMESTAMP) FROM MYVIEW' is somewhat slow (taking minutes) in my environment, and found it's doing a TableScan of 100+ GB across 80K micro-partitions.
My expectation was that this would finish in milliseconds using the MIN/MAX/COUNT metadata in each micro-partition. In fact, I do see Snowflake finishing the job in milliseconds using metadata for an almost identical MIN/MAX value lookup in the following article:
http://cloudsqale.com/2019/05/03/performance-of-min-max-functions-metadata-operations-and-partition-pruning-in-snowflake/
Is there any limitation in how Snowflake decides to use metadata? Could it be because I'm querying through a view, instead of querying a table directly?
=== Added for clarity ===
Thanks for answering! Regarding how the view is defined, it seems to add a WHERE clause for additional filtering on a cluster key. So I believe it should still be possible to fully use the micro-partition metadata. But as posted, a TableScan is being done in the query profile output.
I'm a bit concerned about your comment on SecureView. The view I'm querying is indeed a SECURE VIEW - does that affect how the optimizer handles my query? Could that be a reason why the TableScan is done?
It looks like you're running the query on a view. The metadata you're referring to will be used when you're running a simple MIN/MAX etc. directly on the table; however, if you have some logic inside your view that requires filtering / joining of data, then Snowflake cannot return results based off the metadata alone.
So yes, you are correct when you say the following because your view is probably doing something other than a simple MAX on the table:
...Could it be because I'm querying through a view, instead of querying a table directly?
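If you have access to the underlying table, a quick sanity check is to run the same aggregate directly against it and compare the two query profiles; a pure metadata lookup should complete without a TableScan (table and column names below are placeholders):
SELECT MAX(event_timestamp) FROM base_table;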
Is there a best practice for bypassing fulltext filtering of the result set when no search text is specified? What I do now is:
SELECT * FROM items
WHERE @search = 'all' OR CONTAINS(*, @search)
But I wonder if there is a more elegant way?
Your example is the most elegant way of writing the query. However, you should really look at the execution plan to see whether this is the most performant approach.
You may want to consider writing the query like this, to ensure SQL Server isn't evaluating the CONTAINS operator when it isn't necessary:
IF (@search = 'all')
    SELECT * FROM items;
ELSE
    SELECT * FROM items
    WHERE CONTAINS(*, @search);
I am using Teradata, and I am getting a 'no more spool space in Database' error. My database utilization is 85%.
Is there any relationship between this error and the database utilization factor?
Any pointers on this would be helpful for resolving it.
Please share your ideas on how to avoid this.
Spool space problems occur either when you have an inefficient query or when statistics have not been properly collected on the tables you are using. It can also happen with tables where the primary index was poorly chosen (high skew). Spool is an attribute of the user account you are using to connect to the Teradata environment; it is not really an attribute of the database itself.
The only way to know for certain is to look at the EXPLAIN plan for your query.
If your query is inefficient, rewrite it. If statistics need to be collected or if the index needs to be altered, contact the DBA responsible for the tables you are using.
If there is a particular query that is giving you an "out of spool" error, update this question with the complete text of the query.
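As a starting point, the usual diagnostics look something like this (database, table, and column names are placeholders):
EXPLAIN
SELECT a.order_id, b.customer_name
FROM mydb.orders a
JOIN mydb.customers b ON a.customer_id = b.customer_id;
COLLECT STATISTICS ON mydb.orders COLUMN (customer_id);
The EXPLAIN output shows the optimizer's row estimates and its confidence in them; missing statistics on join columns are a common cause of runaway spool.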
I was not able to resolve my "out of spool" error by the methods above. I resolved it by moving a rank function into its own small table, without any joins or extraneous columns.
Spool space errors can occur when you query tables holding large amounts of data. If you are using multiple tables, check whether you are using alias names instead of repeating the complete table name; the aliases let the joins narrow down the data. Also check whether functions like OREPLACE, which consume a lot of spool, are being used; try using regular expressions instead in that case.
Finally, the spool space allocated to you may simply be too low.
You need to specify a new value for SPOOL in a MODIFY PROFILE or MODIFY USER statement, depending on where the user spool is defined. The syntax is:
MODIFY [PROFILE profile_name | USER user_name ] AS SPOOL = 100000000 bytes ;
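For example, to raise the spool allocation for a specific user (the user name and value here are placeholders):
MODIFY USER etl_user AS SPOOL = 200000000 BYTES;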
I'm not trying to start a debate on which is better in general, I'm asking specifically to this question. :)
I need to write a query to pull back a list of userids (uid) from a database containing 500k+ records. I'm returning just the one field, uid. I can query either our Oracle box or our MSSQL 2000 box. The query looks like this (it has not been simplified):
select uid
from employeeRec
where uid = 'abc123'
Yes, it really is that simple a query. Where I need the tuning help: uid is indexed, and some uids (not many, but some) could be 'ABC123' rather than 'abc123'. MSSQL doesn't care about case sensitivity, whereas Oracle does. So for Oracle, my query would look like this:
select uid
from employeeRec
where lower(uid) = 'abc123'
I've learned that if you use lower() on an indexed field in MSSQL, you render the index useless (there are ways around it, but that is beyond the scope of my question here, since if I choose MSSQL I don't need to use lower() at all). I wanted to know if I choose Oracle, and use the lower() function, will that also hurt performance of the query?
I'm looping over this query about 200 times, in addition to some other queries that are being run, and each iteration of the loop takes 1 second; I've narrowed the slowness down to this particular query. For a web page, 200 seconds seems like an eternity. For you CF readers, the timeout value has been increased so the page doesn't error out; there are no page errors, I'm just trying to speed up this query.
Another item to note: This database is in a different city than the other queries being run so I do expect some lag time there.
As TomTom put it, your index will simply not be used by Oracle. But you can create a function-based index, and this new index will be used when you issue your query.
create index my_new_ix on employeeRec(lower(uid));
Wrapping an indexed column in a function call has the potential to cause performance problems in Oracle: Oracle couldn't use a plain index on UID to process your query. On the other hand, you could create a function-based index on lower(uid) that would be used by the query, i.e.
CREATE INDEX case_insensitive_idx
ON employeeRec( lower( uid ) );
Note that if you want to do case-insensitive queries in general, you may be better served setting NLS parameters to force case-insensitivity. You'd still need function-based indexes on the columns you're searching on, but it can simplify your queries a bit.
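As a sketch of that NLS approach (shown at session level; verify the settings and the matching linguistic index against your Oracle version before relying on them):
ALTER SESSION SET NLS_COMP = LINGUISTIC;
ALTER SESSION SET NLS_SORT = BINARY_CI;
CREATE INDEX employeerec_uid_ci
ON employeeRec( NLSSORT( uid, 'NLS_SORT=BINARY_CI' ) );
With those settings, where uid = 'abc123' compares case-insensitively, and the NLSSORT index lets the optimizer avoid a full scan.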
I wanted to know if I choose Oracle, and use the lower() function, will that also hurt performance of the query?
Yes. The performance reduction is because the index is on the original value and the collation is case-sensitive, so all possible values must be run through the function to find the matching ones.