Cassandra bitwise operations and operators (&, or, not) - database

My use case is the following:
I have a table that has a bigint clustering column X. I also have a value Y that is a bitmask in this use case. I want to do the following query
select * from table where key1 = something1 AND key2 = something2 AND (X & Y) = 1
where the & is the bitwise and operation between X and Y. Is this possible in cassandra?
Also does cassandra have and, or, xor and not operators?

No, Cassandra "operators" are limited to >, >=, <= and >
They are used in range queries on the sorted clustering column after having selected a primary key.
What you want would be a sort of filtering, not a range query.
You can read the post here to know more about the WHERE clause: http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

Related

Why case..when get a table scan ? how to workarround

When I use CASE .. WHEN .. END I get an index scan less efficient than the index seek.
I have complex business rules I need to use the CASE, is there any workaround ?
Query A:
select * from [dbo].[Mobile]
where((
CASE
where ([MobileNumber] = (LTRIM(RTRIM('987654321'))))
END
) = 1)
This query gets an index scan and 199 logical reads.
Query B:
select * from [dbo].[Mobile]
where ([MobileNumber] = (LTRIM(RTRIM('987654321'))))
This query gets an index seek and 122 logical reads.
For the table
CREATE TABLE #T(X CHAR(1) PRIMARY KEY);
And the query
SELECT *
FROM #T
WHERE CASE WHEN X = 'A' THEN 1 ELSE 0 END = 1;
It is apparent without that much thought that the only circumstances in which the CASE expression evaluates to 1 are when X = 'A' and that the query has the same semantics as
SELECT *
FROM #T
WHERE X = 'A';
However the first query will get a scan and the second one a seek.
The SQL Server optimiser will try all sorts of relational transformations on queries but will not even attempt to rearrange expressions such as CASE WHEN X = 'A' THEN 1 ELSE 0 END = 1 to express it as an X = expression so it can perform an index seek on it.
It is up to the query writer to write their queries in such a way that they are sargable.
There is no workaround to get an index seek on column MobileNumber with your existing CASE predicate. You just need to express the condition differently (as in your example B).
Potentially you could create a computed column with the CASE expression and index that - and you could then see an index seek on the new column. However this is unlikely to be useful to you as I assume in reality the mobile number 987654321 is dynamic and not something to be hardcoded into a column used by an index.
After cleaning up and fixing your code, you have a WHERE which is boolean expression based around a CASE.
As mentioned by #MartinSmith, there is simply no way SQL Server will re-arrange this. It does not do the kind of dynamic slicing that would allow it to re-arrange the first query into the second version.
select *
from [dbo].[Mobile]
where
CASE
WHEN [MobileNumber] = LTRIM(RTRIM('987654321'))
THEN 1
END
= 1
You may ask: the second version also has an expression in it, why does this not also get a scan?
select *
from [dbo].[Mobile]
where [MobileNumber] = LTRIM(RTRIM('987654321'))
The reason is that what SQL Server can recognize is that LTRIM(RTRIM('987654321')) is a deterministic constant expression: it does not change depending on runtime settings, nor on the result of in-row calculations.
Therefore, it can optimize by calculating it at compile time. The query therefore becomes this under the hood, which can be used against an index on MobileNumber.
select *
from [dbo].[Mobile]
where [MobileNumber] = '987654321'

Creating formula in SQL server

Maybe some of you could help me with creation of formula in sql. I need to perform calculations of the result of the expression for all given formulas. The notation of the formula is simple: P (X) means that the expression appends the integer X in parentheses. M (Y) means that the expression subtracts the integer Y in parentheses. The “+” symbol combines elements of the formula.
Example. given formula: P (10) + M (5) + M (3) + P (1). It translates to 10-5-3 + 1 = 3.
The result should look like this:
Those easy formulas can be done in Microsoft SQL Server.
Split the formula in the different parts with STRING_SPLIT and + as separator.
Use REPLACE to apply a negative number sign: P(X) --> X and M(X) --> -X.
Use CONVERT to turn the string parts into numbers.
Add everything up with a SUM aggregation and group by clause.
Sample data
create table input
(
formula nvarchar(50)
);
insert into input (formula) values
('P(10)+M(5)+M(3)+P(1)'),
('P(7)+M(3)+M(4)');
Solution
select i.formula,
sum(convert(int, replace(replace(replace(s.value,'P(', ''),'M(','-'),')',''))) as rez
from input i
cross apply string_split(i.formula, '+') s
group by i.formula;
Result
formula rez
-------------------- ---
P(10)+M(5)+M(3)+P(1) 3
P(7)+M(3)+M(4) 0
Fiddle to see everything in action with intermediate steps.

Using a SqlServer sql query, how to tell if integer column is odd or even?

I've got a simple column definition for a primary key and I'm wanting to add to my where clause something that evaluates true when the value is odd. My column definition is simply this:
Id int IDENTITY(1, 1) NOT NULL,
and I'm wanting something like
SELECT Id FROM Attendees Where Id is Odd
Where id % 2 = 1
The percent operator in SQL is a modulus operator, basically returning the remainder of the division.

Compare arrays for equality, ignoring order of elements

I have a table with 4 array columns.. the results are like:
ids signed_ids new_ids new_ids_signed
{1,2,3} | {2,1,3} | {4,5,6} | {6,5,4}
Anyway to compare ids and signed_ids so that they come out equal, by ignoring the order of the elements?
You can use contained by operator:
(array1 <# array2 and array1 #> array2)
The additional module intarray provides operators for arrays of integer, which are typically (much) faster. Install once per database with (in Postgres 9.1 or later):
CREATE EXTENSION intarray;
Then you can:
SELECT uniq(sort(ids)) = uniq(sort(signed_ids));
Or:
SELECT ids #> signed_ids AND ids <# signed_ids;
Bold emphasis on functions and operators from intarray.
In the second example, operator resolution arrives at the specialized intarray operators if left and right argument are type integer[].
Both expressions will ignore order and duplicity of elements. Further reading in the helpful manual here.
intarray operators only work for arrays of integer (int4), not bigint (int8) or smallint (int2) or any other data type.
Unlike the default generic operators, intarray operators do not accept NULL values in arrays. NULL in any involved array raises an exception. If you need to work with NULL values, you can default to the standard, generic operators by schema-qualifying the operator with the OPERATOR construct:
SELECT ARRAY[1,4,null,3]::int[] OPERATOR(pg_catalog.#>) ARRAY[3,1]::int[]
The generic operators can't use indexes with an intarray operator class and vice versa.
Related:
GIN index on smallint[] column not used or error "operator is not unique"
The simplest thing to do is sort them and compare them sorted. See sorting arrays in PostgreSQL.
Given sample data:
CREATE TABLE aa(ids integer[], signed_ids integer[]);
INSERT INTO aa(ids, signed_ids) VALUES (ARRAY[1,2,3], ARRAY[2,1,3]);
the best thing to do is to if the array entries are always integers is to use the intarray extension, as Erwin explains in his answer. It's a lot faster than any pure-SQL formulation.
Otherwise, for a general version that works for any data type, define an array_sort(anyarray):
CREATE OR REPLACE FUNCTION array_sort(anyarray) RETURNS anyarray AS $$
SELECT array_agg(x order by x) FROM unnest($1) x;
$$ LANGUAGE 'SQL';
and use it sort and compare the sorted arrays:
SELECT array_sort(ids) = array_sort(signed_ids) FROM aa;
There's an important caveat:
SELECT array_sort( ARRAY[1,2,2,4,4] ) = array_sort( ARRAY[1,2,4] );
will be false. This may or may not be what you want, depending on your intentions.
Alternately, define a function array_compare_as_set:
CREATE OR REPLACE FUNCTION array_compare_as_set(anyarray,anyarray) RETURNS boolean AS $$
SELECT CASE
WHEN array_dims($1) <> array_dims($2) THEN
'f'
WHEN array_length($1,1) <> array_length($2,1) THEN
'f'
ELSE
NOT EXISTS (
SELECT 1
FROM unnest($1) a
FULL JOIN unnest($2) b ON (a=b)
WHERE a IS NULL or b IS NULL
)
END
$$ LANGUAGE 'SQL' IMMUTABLE;
and then:
SELECT array_compare_as_set(ids, signed_ids) FROM aa;
This is subtly different from comparing two array_sorted values. array_compare_as_set will eliminate duplicates, making array_compare_as_set(ARRAY[1,2,3,3],ARRAY[1,2,3]) true, whereas array_sort(ARRAY[1,2,3,3]) = array_sort(ARRAY[1,2,3]) will be false.
Both of these approaches will have pretty bad performance. Consider ensuring that you always store your arrays sorted in the first place.
If your arrays have no duplicates and are of the same dimension:
use array contains #>
AND array_length where the length must match the size you want on both sides
select (string_agg(a,',' order by a) = string_agg(b,',' order by b)) from
(select unnest(array[1,2,3,2])::text as a,unnest(array[2,2,3,1])::text as b) A

How would you query an array of 1's and 0's chars from a database?

Say you had a long array of chars that are either 1 or 0, kind of like a bitvector, but on a database column. How would you query to know what values are set/no set? Say you need to know if the char 500 and char 1500 are "true" or not.
SELECT
Id
FROM
BitVectorTable
WHERE
SUBSTRING(BitVector, 500, 1) = '1'
AND SUBSTRING(BitVector, 1000, 1) = '1'
No index can be used for this kind of query, though. When you have many rows, this will get slow very quickly.
Edit: On SQL Server at least, all built-in string functions are deterministic. That means you could look into the possibility to make computed columns based on the SUBSTRING() results for the whole combined value, putting an index on each of them. Inserts will be slower, table size will increase, but searches will be really fast.
SELECT
Id
FROM
BitVectorTable
WHERE
BitVector_0500 = '1'
AND BitVector_1000 = '1'
Edit #2: The limits for SQL Server are:
1,024 columns per normal table
30.000 columns per "wide" table
In MySQL, something using substring like
select foo from bar
where substring(col, 500,1)='1' and substring(col, 1500,1)='1';
This will be pretty inefficient though, you might want to rethink your schema. For example, you could store each bit separately to tradeoff space for speed...
create table foo
(
id int not null,
bar varchar(128),
primary key(id)
);
create table foobit
(
int foo_id int not null,
int idx int not null,
value tinyint not null,
primary key(foo_id,idx),
index(idx,value)
);
Which would be queried
select foo.bar from foo
inner join foobit as bit500
on(foo.id=bit500.foo_id and bit500.idx=500)
inner join foobit as bit1500
on(foo.id=bit1500.foo_id and bit1500.idx=1500)
where
bit500.value=1 and bit1500.value=1;
Obviously consumes more storage, but should be faster for those query operations as an index will be used.
I would convert the column to multiple bit-columns and rewrite the relevant code - Bit masks are so much faster than string comparisons. But if you can't do that, you must use db-specific functions. Regular expressions could be an option
-- Flavor: MySql
SELECT * FROM table WHERE column REGEXP "^.{499}1.{999}1"
select substring(your_col, 500,1) as char500,
substring(your_col, 1500,1) as char1500 from your_table;

Resources