PostgreSQL - CREATE INDEX - c

I'm working with PostgreSQL to create some data types written in C.
For example, I have:
typedef struct Point3D
{
char id[50];
double x;
double y;
double z;
} Point3D;
The input and output functions are working properly.
But the problem is the following:
Every id of Point3D must be unique (and can be NULL), so I have decided to create a unique index on the id field, but is that possible?
I'm thinking of something like this:
CREATE UNIQUE INDEX test_point3d_idx ON test_point3d (( getID(columname) ));
where getID returns the field ID of columname.
But I need to implement getID and I am really blocked.
Any advice?

The Postgres manual section "Interfacing Extensions To Indexes" explains indexes on user-defined types like your Point3D. That requires a fair amount of work. I don't know any shortcuts.
Unrelated to your question: are you sure you need this C-language Point3D datatype? Mistakes in such a datatype definition can "confuse or even crash the server". I presume the same applies to C-language operator functions supporting it.
Could you create tables with four columns, one for each Point3D field? Otherwise, could you forego C in favor of a simple CREATE TYPE point3d AS (id char(50), x float8, y float8, z float8)? Perhaps not, but worth a shot...

A unique column allows multiple NULL values, because NULL is an unknown value, so one NULL compared to another is never considered equal. Logically you might expect NULL = NULL to be true, but a unique constraint doesn't work that way.
Simple example to prove it.
CREATE TABLE test2
(
unq_id integer NULL,
CONSTRAINT uq_test2 UNIQUE (unq_id)
);
INSERT INTO test2
SELECT NULL;
INSERT INTO test2
SELECT NULL;
INSERT INTO test2
SELECT NULL;
SELECT *
FROM test2;
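The same behaviour can be checked from a script. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL (SQLite also treats NULLs as distinct in a UNIQUE constraint), and the variable names are made up for the illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test2 (unq_id INTEGER NULL, CONSTRAINT uq_test2 UNIQUE (unq_id))"
)

# Three NULLs are accepted: NULLs do not collide under the unique constraint.
for _ in range(3):
    conn.execute("INSERT INTO test2 VALUES (NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM test2 WHERE unq_id IS NULL"
).fetchone()[0]

# A repeated non-NULL value is still rejected.
conn.execute("INSERT INTO test2 VALUES (1)")
try:
    conn.execute("INSERT INTO test2 VALUES (1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(null_rows, duplicate_rejected)
```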

Related

How to replace string column with number 0 if the values in that column is null

Hope this is the right place to ask a question related to the Snowflake database.
I would like to know how to replace a string column value with 0 if it is null.
So far, I tried the NVL function, but that didn't work.
CREATE TABLE EMP(ENAME VARCHAR(10));
INSERT INTO EMP VALUES('JACK');
INSERT INTO EMP VALUES(null);
SELECT NVL(ENAME,0) FROM EMP;
Error: Numeric value 'JACK' is not recognized
Any help would be appreciated.
Thanks,
Nawaz
SQL is strongly typed. The output type of NVL is being inferred to be a NUMBER, so the actual query looks something like
SELECT NVL(ENAME::NUMBER, 0) FROM EMP;
You should decide what type your output should be. If you want strings, then you will need to pass NVL a string, like
SELECT NVL(ENAME, '0') FROM EMP;
If you want integers, you will need to convert the strings to integers safely. For example, if you want non-integer strings to become NULL and then 0, you can use
SELECT NVL(TRY_TO_INTEGER(ENAME), 0) FROM EMP;
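To make the two choices concrete, here is a rough Python model of what each query does to the sample rows. The helpers nvl and try_to_integer are hypothetical Python functions written for this illustration; they only mimic the behaviour described above and are not Snowflake APIs.

```python
def nvl(value, default):
    """Return default when value is None (models NVL for a single row)."""
    return default if value is None else value


def try_to_integer(text):
    """Return int(text), or None when text is None or not an integer string."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


# Sample column values; "42" is an extra row added for the illustration.
enames = ["JACK", None, "42"]

# Option 1: keep the column as strings and substitute the string '0'.
as_strings = [nvl(e, "0") for e in enames]

# Option 2: convert safely to integers first, then substitute the number 0.
as_integers = [nvl(try_to_integer(e), 0) for e in enames]

print(as_strings, as_integers)
```

Either way, both arguments end up with the same type, which is what the original query was missing.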

Check if value exists in postgres array for partitioning via check constraint

I found this very similar question, but all of the answers started with a 'select' statement. I want to check whether a string is contained in a constant array of about 30 other strings. I could write a long x == a OR x == b OR... statement, but I thought there might be a cleaner way.
So this doesn't work as a check constraint: SELECT language = ANY ('{"en", "pt", "es", "fr"}'::text[])
Just removing the SELECT works:
CHECK(
language = ANY ('{"en", "pt", "es", "fr"}'::text[])
)
But as a_horse_with_no_name pointed out:
Not using an array is even better, as this does not break the partitioning optimization.
CHECK(
not( language in ('en', 'pt', 'es'))
)
Now SELECT * FROM myTable WHERE language='de'; will not even look at this table.

SQLServer choosing primary key type

I have a list of objects, each with its own id, and I need to create a table for them in a database. Using their ids as the primary key seems like a good idea (since they are unique), but there's one problem. All ids are integers except for one object: it has 2 subobjects with ids 142.1 and 142.2, so the id list is 140, 141, 142.1, 142.2, 143...
Now if I choose a double as the primary key type, it will store an unnecessary 6 bytes (since double is 8 bytes and INT is 2) just to support two non-integer numbers, and I can't choose INT. So what type should I use if I cannot change the list of objects?
The math for double is imprecise, so you shouldn't use it for discrete values like money or object ids. Consider using decimal(p,s) instead, where p is the total number of digits and s is the number of digits after the decimal point. For example, a decimal(5,2) could store 123.45, but not 1234 or 12.345.
Another option is a composite primary key for two integers n1, n2:
alter table YourTable add constraint PK_YourTable primary key (n1, n2)
An int is four bytes, not two, so the size difference from a double is not so big.
However, you should definitely not use a floating point number as a key, as a floating point number isn't stored as an exact value, but as an approximation.
You can use a decimal with one fractional digit, like decimal(5,1), to store a value like that. A decimal is a fixed point number, so it's stored as an exact value, not an approximation.
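The difference between an approximate and an exact representation is easy to see outside the database. This sketch uses Python's float (a binary double) and the standard decimal module as stand-ins for the SQL double and decimal types; the id values are the ones from the question.

```python
from decimal import Decimal

# Binary floating point: 0.1 + 0.2 is not exactly 0.3.
float_exact = (0.1 + 0.2 == 0.3)

# Fixed-point decimal arithmetic: the same sum is exact.
decimal_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# The double nearest to 142.1 is not exactly 142.1.
float_id_is_exact = (Decimal(142.1) == Decimal("142.1"))

# The id list from the question, stored exactly as decimals.
ids = [Decimal(s) for s in ("140", "141", "142.1", "142.2", "143")]
all_unique = len(set(ids)) == len(ids)

print(float_exact, decimal_exact, float_id_is_exact, all_unique)
```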
Choose VARCHAR of an appropriate length, with CHECK constraints to ensure the data conforms to your domain rules e.g. based on the small sample data you posted:
CREATE TABLE Ids
(
id VARCHAR(5) NOT NULL UNIQUE
CONSTRAINT id__pattern
CHECK (
id LIKE '[0-9][0-9][0-9]'
OR id LIKE '[0-9][0-9][0-9].[1-9]'
)
);

Database Design for 2D Matrix Algebra

Can anyone advise on a database design/DBMS for storing 2D time series matrix data, to allow for quick back-end algebraic calculations? For example:
Table A,B,C..
Col1: Date- Timestamp
col2: Data- Array? (Matrix Data)
SQL Pseudo Code
INSERT INTO TABLE C
SELECT
Multiply A.Data A by B.Data
Where Matrix A Start Date = Matrix B Start Date
And Matrix A End Date = Matrix B End Date
Essentially set the co-ordinates for the calculation.
The difficulty with matrix algebra is determining what is a domain on the matrix for data modelling purposes. Is it a value? Is it a matrix as a whole? This is not a pre-defined question, so I will give you two solutions and what the tradeoffs are.
Solution 1: Value in a matrix cell is a domain:
CREATE TABLE matrix_info (
x_size int,
y_size int,
id serial not null unique,
ts timestamp not null -- column name "ts" is assumed; the original left it out
);
CREATE TABLE matrix_cell (
matrix_id int references matrix_info(id),
x int,
y int,
value numeric not null,
primary key (matrix_id, x, y)
);
The big concern is that this does not enforce matrix sizes very well. Additionally a missing value could be used to represent 0, or might not be allowed. The idea of using a matrix as a whole as a domain has some attractiveness. In this case:
CREATE TABLE matrix (
id serial not null unique,
ts timestamp not null, -- column name "ts" is assumed; the original left it out
matrix_data numeric[]
);
Note that many db's including PostgreSQL will enforce that an array is actually a matrix. Then you'd need to write your own functions for multiplication etc. I would recommend doing this in an object-relational way and on PostgreSQL since it is quite programmable for this sort of thing. Something like:
CREATE FUNCTION matrix(int) RETURNS matrix LANGUAGE SQL AS
$$ select * from matrix where id = $1 $$;
CREATE FUNCTION multiply(matrix, matrix) RETURNS matrix LANGUAGE plpgsql AS
$$
DECLARE
matrix1 numeric[] := $1.matrix_data;
matrix2 numeric[] := $2.matrix_data;
BEGIN
...
END;
$$;
Then you can call the matrix multiplication as:
SELECT * FROM multiply(matrix(1), matrix(2));
You could even insert into the table the product of two other matrices:
INSERT INTO matrix (matrix_data)
SELECT matrix_data FROM multiply(matrix(1), matrix(2));
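For reference, the arithmetic that such a multiply function has to perform can be sketched in a few lines of Python. This is only an illustration of plain matrix multiplication over nested lists, not the body of the PL/pgSQL function above; the function name and sample values are chosen for the example.

```python
def multiply(a, b):
    """Multiply two rectangular matrices given as lists of rows."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Transpose b so that each column can be paired with a row of a.
    cols_b = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, col)) for col in cols_b]
        for row in a
    ]


product = multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

A database-side implementation would do the same loop over the two-dimensional numeric[] values, with the dimension check replacing the size columns of the first design.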

How would you query an array of 1's and 0's chars from a database?

Say you had a long array of chars that are either 1 or 0, kind of like a bitvector, but in a database column. How would you query to know which values are set/not set? Say you need to know if char 500 and char 1500 are "true" or not.
SELECT
Id
FROM
BitVectorTable
WHERE
SUBSTRING(BitVector, 500, 1) = '1'
AND SUBSTRING(BitVector, 1500, 1) = '1'
No index can be used for this kind of query, though. When you have many rows, this will get slow very quickly.
Edit: On SQL Server at least, all built-in string functions are deterministic. That means you could create computed columns based on the SUBSTRING() results, one per position in the combined value, and put an index on each of them. Inserts will be slower and the table size will increase, but searches will be really fast.
SELECT
Id
FROM
BitVectorTable
WHERE
BitVector_0500 = '1'
AND BitVector_1500 = '1'
Edit #2: The limits for SQL Server are:
1,024 columns per normal table
30,000 columns per "wide" table
In MySQL, you could use substring, something like:
select foo from bar
where substring(col, 500,1)='1' and substring(col, 1500,1)='1';
This will be pretty inefficient, though, so you might want to rethink your schema. For example, you could store each bit separately to trade space for speed:
create table foo
(
id int not null,
bar varchar(128),
primary key(id)
);
create table foobit
(
foo_id int not null,
idx int not null,
value tinyint not null,
primary key(foo_id,idx),
index(idx,value)
);
This would be queried like so:
select foo.bar from foo
inner join foobit as bit500
on(foo.id=bit500.foo_id and bit500.idx=500)
inner join foobit as bit1500
on(foo.id=bit1500.foo_id and bit1500.idx=1500)
where
bit500.value=1 and bit1500.value=1;
Obviously consumes more storage, but should be faster for those query operations as an index will be used.
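The one-row-per-bit design can be tried end to end with a small script. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL, so the index is created with a separate CREATE INDEX statement and the sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database, standing in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER NOT NULL PRIMARY KEY, bar VARCHAR(128));
    CREATE TABLE foobit (
        foo_id INTEGER NOT NULL,
        idx INTEGER NOT NULL,
        value INTEGER NOT NULL,
        PRIMARY KEY (foo_id, idx)
    );
    CREATE INDEX foobit_idx_value ON foobit (idx, value);
""")

# Two sample vectors: row 1 has bits 500 and 1500 set, row 2 only bit 500.
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(1, "both set"), (2, "only 500 set")])
conn.executemany("INSERT INTO foobit VALUES (?, ?, ?)",
                 [(1, 500, 1), (1, 1500, 1), (2, 500, 1), (2, 1500, 0)])

# One self-join of foobit per bit position that must be set.
rows = conn.execute("""
    SELECT foo.bar FROM foo
    INNER JOIN foobit AS bit500
        ON foo.id = bit500.foo_id AND bit500.idx = 500
    INNER JOIN foobit AS bit1500
        ON foo.id = bit1500.foo_id AND bit1500.idx = 1500
    WHERE bit500.value = 1 AND bit1500.value = 1
""").fetchall()
matches = [r[0] for r in rows]
print(matches)
```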
I would convert the column to multiple bit columns and rewrite the relevant code, since bit masks are so much faster than string comparisons. But if you can't do that, you must use db-specific functions. Regular expressions could be an option:
-- Flavor: MySql
SELECT * FROM table WHERE column REGEXP "^.{499}1.{999}1"
select substring(your_col, 500,1) as char500,
substring(your_col, 1500,1) as char1500 from your_table;
