I'm trying to store arrays (or vectors) of data in a SQLite database, but I'm having trouble finding a decent way to do so. I found another post on Stack Overflow (which I can't seem to find anymore) that suggested storing the data in a table like the following:
CREATE TABLE array_of_points
(
id integer NOT NULL,
position integer NOT NULL,
x integer NOT NULL,
y integer NOT NULL,
PRIMARY KEY (id, position)
);
So to store all the data for a single array, you insert each item under the same ID and just increment the position. For example, inserting an array with three values would look like this:
INSERT INTO array_of_points VALUES (0, 0, 1, 1);
INSERT INTO array_of_points VALUES (0, 1, 2, 2);
INSERT INTO array_of_points VALUES (0, 2, 3, 3);
And then to retrieve the values you would select everything with the same ID and order by the position:
SELECT x,y FROM array_of_points WHERE id = 0 ORDER BY position;
This is all great and works wonderfully, but I'm now running into a problem: I don't know how to reference an array from a different table. For example, I want to do something like the following:
CREATE TABLE foo
(
id integer NOT NULL,
array_id integer NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (array_id) REFERENCES array_of_points (id)
);
This will create the table just fine, but as soon as you execute a query against it, the foreign key constraint throws an error: the foreign key must reference both the id and the position of the array_of_points table, since together they form its composite primary key.
The only solution I currently have is to remove the foreign key from the foo table, but that is not a good solution: the column could then hold any value, even one that doesn't actually map to an array in the array_of_points table.
Is there any way to work around this problem? Or maybe there's some other way to store the data so that this is possible?
Just as an FYI, please do not suggest storing the data in some comma/semicolon/whatever-delimited list; that is an even worse option that I am not going to consider, and it isn't even possible for some of the more complex objects that are going to be stored in the database.
There is one special case that this schema cannot handle: it is not possible to store an array of size zero.
This might not be a concern in practice, but it shows that the database is not fully normalized.
A foreign key always references a single parent record.
Therefore, what is missing is a table that has a single record for each array.
Implementing this would result in a schema like this:
CREATE TABLE array
(
id integer PRIMARY KEY
-- no other properties
);
CREATE TABLE array_points
(
array_id integer REFERENCES array(id),
position integer,
x, y, [...],
PRIMARY KEY (array_id, position)
) WITHOUT ROWID; -- see http://www.sqlite.org/withoutrowid.html
CREATE TABLE foo
(
[...],
array_id integer REFERENCES array(id)
);
The additional table requires more effort to manage, but now you have the ability to generate array IDs through autoincrementing.
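For example, a minimal sketch of the insert flow under this schema (the point values are arbitrary; note that last_insert_rowid() is not affected by inserts into WITHOUT ROWID tables such as array_points, so it still holds the generated array ID at the end):
INSERT INTO array DEFAULT VALUES;               -- allocates a new array id
INSERT INTO array_points (array_id, position, x, y)
VALUES (last_insert_rowid(), 0, 1, 1),
       (last_insert_rowid(), 1, 2, 2),
       (last_insert_rowid(), 2, 3, 3);
INSERT INTO foo (array_id) VALUES (last_insert_rowid());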
I have a table of identifiers, IntervalFrom and IntervalTo:
Identifier  IntervalFrom  IntervalTo
1           0             2
1           2             4
2           0             2
2           2             4
I already have a trigger to NOT allow the intervals to overlap.
I am looking for a trigger or constraint that will not allow data gaps.
I have searched, and the information I found relates to finding gaps in existing data rather than preventing them in the first place.
I am unable to find anything in relation to this as a trigger or constraint.
Is this possible using T-SQL?
Thanks in advance.
You can construct a table that automatically is immune from overlaps and gaps:
create table T (
ID int not null,
IntervalFrom int null,
IntervalTo int null,
constraint UQ_T_Previous_XRef UNIQUE (ID, IntervalTo),
constraint UQ_T_Next_XRef UNIQUE (ID, IntervalFrom),
constraint FK_T_Previous FOREIGN KEY (ID, IntervalFrom) references T (ID, IntervalTo),
constraint FK_T_Next FOREIGN KEY (ID, IntervalTo) references T (ID, IntervalFrom)
)
go
create unique index UQ_T_Start on T (ID) where IntervalFrom is null
go
create unique index UQ_T_End on T(ID) where IntervalTo is null
go
Note, this does require a slightly different convention for your first and last intervals: they need to use NULL rather than 0 or the (somewhat arbitrary) 4.
Note also that modifying data in such a table can be a challenge: if you're inserting a new interval, you also need to update other intervals to accommodate the new one. MERGE is your friend here (see the sketch at the end of this answer).
Given the above, we can insert your (modified) sample data:
insert into T (ID, IntervalFrom, IntervalTo) values
(1,null,2),
(1,2,null),
(2,null,2),
(2,2,null)
go
But we cannot insert an overlapping value (this errors):
insert into T(ID, IntervalFrom, IntervalTo) values (1,1,3)
You should also see that the foreign keys prevent gaps from existing in a sequence.
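As for modifications: here is a rough sketch (the split point 5 is arbitrary) of splitting ID 1's trailing interval (2, NULL) into (2, 5) and (5, NULL). Doing it in a single MERGE matters, because the constraints are checked at the end of the statement, after both the update and the insert have taken effect:
merge T as target
using (values
    (1, 2, 5),       -- shrink the old trailing interval
    (1, 5, null)     -- add the new trailing interval
) as source (ID, IntervalFrom, IntervalTo)
    on target.ID = source.ID and target.IntervalFrom = source.IntervalFrom
when matched then
    update set IntervalTo = source.IntervalTo
when not matched then
    insert (ID, IntervalFrom, IntervalTo)
    values (source.ID, source.IntervalFrom, source.IntervalTo);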
What order can I copy data into two different tables to comply with the table constraints I created locally?
I created an example from the documentation, but was hoping to get recommendations on how to optimize the data stored by selecting the right types.
I created two tables: one is the list of names, and the second is a list of names with the date they did something.
create or replace table name_key (
id integer not null,
id_sub integer not null,
constraint pkey_1 primary key (id, id_sub) not enforced,
name varchar
);
create or replace table recipts (
col_a integer not null,
col_b integer not null,
constraint fkey_1 foreign key (col_a, col_b) references name_key (id, id_sub) not enforced,
recipt_date datetime,
did_stuff variant
);
Insert into name_key values (0, 0, 'Geinie'), (1, 1, 'Greg'), (2,2, 'Alex'), (3,3, 'Willow');
Insert into recipts (col_a, col_b, recipt_date) values (0,0, Current_date()), (1,1, Current_date()), (2,2, Current_date()), (3,3, Current_date());
Select * from name_key;
Select * from recipts;
Select * from name_key
join recipts on name_key.id = recipts.col_a
where id = 0 or col_b = 2;
I read: https://docs.snowflake.net/manuals/user-guide/table-considerations.html#storing-semi-structured-data-in-a-variant-column-vs-flattening-the-nested-structure, where it recommends changing timestamps from strings to a variant. I did not include the fourth column; I left it blank for future use. Essentially it captures data in JSON format, so I made it a variant. Would it be better to rethink this table structure and flatten the variant column?
Also, I would like to change the key to AUTO_INCREMENT; is there something like this in Snowflake?
What order can I copy data into two different tables to comply with the table constraints I created locally?
You need to give more context about your constraints, but you can control the order of your COPY statements. For foreign keys, you generally want to load the table that is referenced before the table that does the referencing.
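For example, a minimal sketch (the stage and file names are assumptions):
-- load the referenced table first...
copy into name_key from @my_stage/name_key.csv file_format = (type = csv);
-- ...then the table that references it
copy into recipts from @my_stage/recipts.csv file_format = (type = csv);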
where it recommends to change timestamps from strings to a variant.
I think you misread that documentation. It recommends extracting values from a variant column into their own separate columns (in this case a timestamp column), ESPECIALLY if those columns are dates and times, arrays, and numbers within strings.
Converting a timestamp column to a variant is exactly what it is recommending against.
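In other words, keep recipt_date as a typed DATETIME column and extract any other typed values out of did_stuff, rather than pushing the date into the variant. A rough sketch, with field names assumed for illustration:
select
    did_stuff:event_time::timestamp_ntz as event_time,  -- hypothetical field
    did_stuff:amount::number as amount                   -- hypothetical field
from recipts;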
Would it be better to rethink this table structure to flatten the variant column?
It's definitely good to think carefully about, and do performance tests on, situations where you are using semi-structured data, but without more information on your specific situation and data, it's hard to say.
Also, I would like to change the key to AUTO_INCREMENT; is there something like this in Snowflake?
Yes, Snowflake has an AUTOINCREMENT feature, although I've heard it has some issues when used with COPY INTO statements.
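A minimal sketch of an auto-incrementing key (adapting the name_key table from the question; whether it plays well with your load process is worth testing):
create or replace table name_key (
    id integer autoincrement,  -- generated when omitted from the insert
    id_sub integer not null,
    name varchar
);
insert into name_key (id_sub, name) values (0, 'Geinie');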
I want to start off by saying I am not a database guru, but I'm decent with the basics.
I have a set of IO data that I'm storing in two tables, where points are uniquely identified by 'ioid' and 'machinenum'.
I have two tables: IOConfig, which uniquely identifies points (all the identifying information and a primary key, ConfigID), and a data table that contains samples of those items.
My table layouts below are to test using a primary key + index versus using just an index, so I know there is duplicate data.
Think of IOConfig table as such:
ConfigId(PK) machineNum ioId ioType
Think of IOData table as such:
Timestamp ConfigId machineNum ioId value
If I use the ConfigID primary key, with an index on (timestamp,ConfigId) my query is like this:
select * from AnalogInput
where sampleTimestamp>=1520306916007000000 and sampleTimestamp<=1520351489939000000
and configId in (1112)
"0" "0" "0" "SEARCH TABLE IOData USING INDEX cfgIndexAnalogInput (configId=? AND sampleTimestamp>? AND sampleTimestamp<?)"
If I avoid using ConfigID the query is like this:
select * from AnalogInput
where sampleTimestamp>=1520306916007000000 and sampleTimestamp<=1520351489939000000
and ioId in (1)
and machineid=1111
"0" "0" "0" "SEARCH TABLE IOData USING INDEX tsIndexAnalogInput (sampleTimestamp>? AND sampleTimestamp<?)"
Why don't I get the improvement I see with the first query and its (timestamp, configId) index when the second query uses an index of (timestamp, machineNum, ioId)? I ask because machineNum and ioId are what define a unique point for the configId primary key, so one would expect them to be equivalent.
schema:
CREATE TABLE 'IOData'(
'sampleTimestamp' INTEGER,
'configId' INTEGER,
'machineId' INTEGER,
'ioId' INTEGER,
'value' REAL);
CREATE TABLE 'IOConfig'(
'sampleTimestamp' INTEGER,
'configId' INTEGER PRIMARY KEY,
'machineId' INTEGER,
'ioId' INTEGER,
'ioType' INTEGER);
CREATE INDEX `Something` ON `IOData` (`sampleTimestamp` ASC,`machineId` ASC,`ioId` ASC)
CREATE INDEX cfgIndexAnalogInput ON IOData(configId,sampleTimestamp)
CREATE INDEX tsIndexAnalogInput ON IOData(sampleTimestamp)
Read Query Planning to understand how indexes work, and The SQLite Query Optimizer Overview to see what specific optimization will be applied.
In this case, the filter on sampleTimestamp uses inequality comparisons, so, according to section 1.0, that must be the last column in the index (either in an explicit index, or in a three-column primary key):
CREATE INDEX SomethingBetter ON IOData(machineId, ioId, sampleTimestamp);
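With that index, the second query can use all three filter columns; a quick check (the expected plan is shown as a comment, based on the output format in the question):
EXPLAIN QUERY PLAN
SELECT * FROM IOData
WHERE sampleTimestamp >= 1520306916007000000
  AND sampleTimestamp <= 1520351489939000000
  AND ioId IN (1)
  AND machineId = 1111;
-- expected: SEARCH TABLE IOData USING INDEX SomethingBetter
--           (machineId=? AND ioId=? AND sampleTimestamp>? AND sampleTimestamp<?)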
We have an Oracle database with a table in which we store a lot of data.
This table has a primary key, and usually those primary keys are just created upon insertion of a new row.
But now we need to manually insert data into this table with certain fixed primary keys. There is no way to change those primary keys.
So for example:
Our table has already 20 entries with the primary keys 1 to 20.
Now we need to add data manually with the primary keys 21 to 23.
When someone wants to enter a row using our standard approach, the insert process will fail because of:
Caused by: java.sql.BatchUpdateException: ORA-00001: unique constraint (VDMA.SYS_C0013552) violated
at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10500)
at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:230)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:70)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:268)
I totally understand this: the database routine (sequence) that creates the next primary key fails because the next primary key is already taken.
But: how do I tell my sequence to look at the table again and realize that the next primary key is 24 and not 21?
UPDATE
The reason the IDs need to stay the same is that a Web interface accesses the records using links that contain the ID.
So either we change the implementation to map the old IDs to new IDs, or we keep the IDs in the database.
UPDATE2
Found a solution: since we are using Hibernate, only one sequence populates all the tables. Thus, in the four days I was looking for an answer, the primary keys went so high that I can safely import all the data.
How do I tell my sequence to look at the table again and realize that the next primary key is 24 and not 21?
In Oracle, a sequence doesn't know that you intend to use it for any particular table. All the sequence knows is its current value, its increment, its maxval, and so on. So you can't tell the sequence to look at a table, but you can tell your stored procedure to check the table and then increment the sequence beyond the maximum value of the primary key. In other words, if you really insist on manually updating the primary key with non-sequence values, then your code needs to check for non-sequence values in the PK and bring the sequence up to speed before using it to generate a new PK.
Here is something simple you can use to bring the sequence up to where it needs to be:
select testseq.nextval from dual;
Each time you run it, the sequence increments by 1. Stick it in a loop and run it until testseq.currval is where you need it to be.
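A rough sketch of that loop in PL/SQL (using the testtab and testseq names from the example below; assumes the sequence only needs to move forward):
declare
    v_max  number;
    v_next number;
begin
    -- highest primary key currently in the table (0 if empty)
    select nvl(max(testpk), 0) into v_max from testtab;
    -- pull values from the sequence until it has caught up
    loop
        select testseq.nextval into v_next from dual;
        exit when v_next >= v_max;
    end loop;
end;
/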
Having said that, I agree with @a_horse_with_no_name and @EdStevens. If you have to insert rows manually, at least use sequence_name.nextval in the insert instead of a literal like '21'. Like this:
create table testtab (testpk number primary key, testval number);
create sequence testseq start with 1 increment by 1;
insert into testtab values (testseq.nextval, '12');
insert into testtab values (testseq.nextval, '123');
insert into testtab values (testseq.nextval, '1234');
insert into testtab values (testseq.nextval, '12345');
insert into testtab values (testseq.nextval, '123456');
select * from testtab;
testpk testval
2 12
3 123
4 1234
5 12345
6 123456
I've a natural primary key in table A.
In table B, I want to have an array of foreign key references to A.
Is it possible to specify ON UPDATE CASCADE on the elements of the array, such that when the value of a primary key in table A changes, the arrays in B get modified?
Or should I just normalise the array out into a separate table?
Normalizing this would allow you to use a standard ON UPDATE CASCADE foreign key constraint. That would be much faster, because the system can use plain indexes. It gives you three tables; this needs somewhat more disk space, but it's worth every bit:
table a
table b
table a_b -- to implement n:m relationship
See:
How to implement a many-to-many relationship in PostgreSQL?
Can PostgreSQL array be optimized for join?
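A minimal sketch of that normalized layout (the column names are assumptions, not from the question):
create table a (
    a_id varchar primary key                    -- the natural key
);
create table b (
    b_id integer primary key
);
create table a_b (                              -- the n:m link table
    a_id varchar references a (a_id) on update cascade,
    b_id integer references b (b_id),
    primary key (a_id, b_id)
);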
Else you will have to write a trigger function to find and replace all references in B to values of master A.
Is it possible to specify ON UPDATE CASCADE on the elements of the array, such that when the value of a primary key in table A changes, arrays in B get modified?
Only if
both the referenced column and the referencing column are arrays of the same type, and
the values have the same number of elements.
If you want to insert valid values for array elements in one table, and in another table store an array of those valid values, it won't work.
OTOH, this does work, but only in part.
create table a (
str varchar[2] primary key
);
create table b (
-- Room for two values from table a . . .
str varchar[4] primary key references a (str) on update cascade
);
insert into a values
('{''A'', ''B''}'),
('{''C'', ''D''}'),
('{''E'', ''F''}');
insert into b values
('{''A'', ''B''}');
update a set str = '{''A'',''C''}'
where str = '{''A'',''B''}';
select * from b;
{'A','C'}
That much works. But if you try to store two arrays in table b, you'll get an error.
insert into b values
('{{"C", "D"}, {"E", "F"}}');
ERROR: insert or update on table "b" violates foreign key constraint "b_str_fkey"
DETAIL: Key (str)=({{C,D},{E,F}}) is not present in table "a".
And, when you squint and tilt your head just right, that makes sense. In the relational model, the intersection of every row and column contains just one value. So you shouldn't be able to update half a value by ON UPDATE CASCADE.