sqlite3 - making the combination of two entries unique - arrays

Table:
Car | Year | Colour
---------------
BMW | 2013 | Black
Benz | 2011 | Red
BMW | 2011 | Orange
As you can see, neither the 'Car' nor the 'Year' column is unique on its own. But how can I make the combination of Car and Year unique, so that this table doesn't accept any other (BMW, 2013, whatever_colour) entries?

You can do either:
CREATE UNIQUE INDEX idx_car_year ON TableName(Car, Year);
(SQLite requires a name for the index; idx_car_year is just an example.)
or you can re-create the table with a PRIMARY KEY on (Car, Year). If you have other tables that identify models with the Car, Year combination and you want to ensure that those pairs are checked against the main table, the PRIMARY KEY is the preferred solution (with matching FOREIGN KEYs declared on the other table(s)).
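A minimal sketch using Python's built-in sqlite3 module (the question is about SQLite) that checks both options. TableName comes from the answer; the index name idx_car_year and the Cars/Models/ModelName tables used for the FOREIGN KEY case are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on


def try_insert(table, row):
    """Insert row into table; return True if accepted, False if a constraint rejected it."""
    placeholders = ", ".join("?" * len(row))  # e.g. "?, ?, ?"
    try:
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Option 1: a UNIQUE index over the (Car, Year) pair.
conn.execute("CREATE TABLE TableName (Car TEXT, Year INTEGER, Colour TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_car_year ON TableName(Car, Year)")
for row in [("BMW", 2013, "Black"), ("Benz", 2011, "Red"), ("BMW", 2011, "Orange")]:
    try_insert("TableName", row)

dup_rejected = not try_insert("TableName", ("BMW", 2013, "White"))  # same (Car, Year)
new_accepted = try_insert("TableName", ("Benz", 2013, "White"))     # new pair is fine

# Option 2: a composite PRIMARY KEY that another table can reference.
# "Cars", "Models" and "ModelName" are hypothetical names for illustration.
conn.execute("CREATE TABLE Cars (Car TEXT, Year INTEGER, Colour TEXT, PRIMARY KEY (Car, Year))")
conn.execute(
    "CREATE TABLE Models (Car TEXT, Year INTEGER, ModelName TEXT, "
    "FOREIGN KEY (Car, Year) REFERENCES Cars(Car, Year))"
)
try_insert("Cars", ("BMW", 2013, "Black"))
fk_accepted = try_insert("Models", ("BMW", 2013, "Sport"))
fk_rejected = not try_insert("Models", ("BMW", 1999, "Sport"))  # no such (Car, Year) in Cars

print(dup_rejected, new_accepted, fk_accepted, fk_rejected)  # True True True True
```

Both options reject a second (BMW, 2013) row; the composite PRIMARY KEY additionally gives other tables a target for a matching FOREIGN KEY.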

Related

Does taking advantage of dynamic columns in Cassandra require duplicated data in each row?

I've been trying to understand how one would model time series data in Cassandra, as shown in the image below from a popular System Design Interview video, where view counts are stored hourly.
I would expect the schema for this time series data to look something like the one below, but I don't believe it would lead to the data actually being stored the way the screenshot shows.
CREATE TABLE views_data (
video_id uuid,
channel_name varchar,
video_name varchar,
viewed_at timestamp,
count int,
PRIMARY KEY (video_id, viewed_at)
);
Instead, I'm assuming it would lead to something like this (inspired by DataStax), where technically there is a single row for each video_id, but the other columns, such as channel_name and video_name, would be duplicated within that row for each unique viewed_at.
[cassandra-cli]
list views_data;
RowKey: A
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=2, viewed_at=1370463146717000)
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=3, viewed_at=1370463282090000)
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=8, viewed_at=1370463282093000)
-------------------
RowKey: B
=> (channel_name='Some other channel', video_name='Some video', count=4, viewed_at=1370463282093000)
I assume this is still considered a dynamic wide row, since the row can expand for each unique (video_id, viewed_at) combination. But it seems less than ideal that we need to duplicate extra information such as channel_name and video_name.
Is the screenshot of modeling time series data misleading, or is it actually possible to have dynamic columns where certain columns in the row do not need to be duplicated?
If I were upserting time series data into this row, I wouldn't want to provide the channel_name and video_name on every single upsert; I would just want to provide the count.
No, it is not necessary to duplicate column values across the rows of a partition. You can model your table to accommodate your use case.
In Cassandra, there is a concept of "static columns" -- columns which have the same value for all rows within a partition.
Here's the schema of an example table that contains two static columns, colour and item:
CREATE TABLE statictbl (
pk int,
ck text,
c int,
colour text static,
item text static,
PRIMARY KEY (pk, ck)
)
In this table, all rows in the same partition share the same colour and item. For example, every row in partition pk=1 has colour='red' and item='apple':
pk | ck | colour | item | c
----+----+--------+--------+----
1 | a | red | apple | 12
1 | b | red | apple | 23
1 | c | red | apple | 34
If I insert a new partition pk=2:
INSERT INTO statictbl (pk, ck, colour, item, c) VALUES (2, 'd', 'yellow', 'banana', 45)
we get:
pk | ck | colour | item | c
----+----+--------+--------+----
2 | d | yellow | banana | 45
If I then insert another row withOUT specifying a colour and item:
INSERT INTO statictbl (pk, ck, c) VALUES (2, 'e', 56)
the new row with ck='e' still has the colour and item populated even though I didn't insert a value for them:
pk | ck | colour | item | c
----+----+--------+--------+----
2 | d | yellow | banana | 45
2 | e | yellow | banana | 56
In your case, if you declare channel_name and video_name as static, they will share the same value for all rows in a given partition (video_id), so you only ever need to insert them once. Note that when you update the value of a static column, ALL the rows in that partition will reflect the updated value.
For details, see Sharing a static column in Cassandra. Cheers!
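Cassandra can't be run inline here, so the following is a toy Python model (not the Cassandra driver) of the read-time behaviour described above: static cells are kept once per partition and merged into every row of it. It is applied to the question's views_data shape, assuming channel_name and video_name are declared as channel_name varchar static and video_name varchar static, with video_id as the partition key and viewed_at as the clustering key. The class name ToyPartitionedTable and the "Renamed Channel" value are hypothetical, and real Cassandra details (write timestamps, tombstones, partitions with only static data) are ignored.

```python
from collections import defaultdict


class ToyPartitionedTable:
    """Toy in-memory model of CQL static-column semantics (NOT a Cassandra client).

    Static cells are stored once per partition key; regular cells once per
    (partition key, clustering key). Reads merge the partition's static cells
    into every row, so they never need to be repeated on later upserts.
    """

    def __init__(self, static_cols):
        self.static_cols = set(static_cols)
        self.static = defaultdict(dict)  # partition key -> {static column: value}
        self.rows = defaultdict(dict)    # partition key -> {clustering key: {column: value}}

    def upsert(self, pk, ck=None, **cols):
        for name, value in cols.items():
            if name in self.static_cols:
                self.static[pk][name] = value
        regular = {k: v for k, v in cols.items() if k not in self.static_cols}
        if ck is not None:  # ck=None mimics an update that touches only static columns
            self.rows[pk].setdefault(ck, {}).update(regular)

    def select(self, pk):
        return [
            {**self.static[pk], "clustering_key": ck, **row}
            for ck, row in sorted(self.rows[pk].items())
        ]


# views_data remodelled: video_id = partition key, viewed_at = clustering key,
# channel_name / video_name = static columns, count = regular column.
views = ToyPartitionedTable(static_cols=["channel_name", "video_name"])
views.upsert("A", 1370463146717000, channel_name="System Design Interview",
             video_name="Distributed Cache", count=2)
views.upsert("A", 1370463282090000, count=3)  # later upserts carry only the count
views.upsert("A", 1370463282093000, count=8)

rows = views.select("A")
views.upsert("A", channel_name="Renamed Channel")  # hypothetical rename: all rows see it
renamed = views.select("A")
```

After the three upserts, every row of partition "A" reports the same channel_name and video_name even though only the first upsert supplied them, and the single rename is visible on all rows.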

SQL column which is a list containing other rows

I have a table which looks like this:
Persons
| Id | Name | FavoriteColor |
And I want to add a new column which is "Friends" so that it becomes
Persons
| Id | Name | FavoriteColor | Friends |
I want the Friends column to be (in abstraction) a list containing the Ids of other rows in the Persons table.
What's the best way to do this? I know I can use FKs to link tables, but I'm not just linking a specific row to a specific row in a different table; I'm linking a specific row to many rows of a table (here, the same table).
You don't add a field; you create a second table:
Persons: id, Name, FavoriteColor
Friends: PersonA_id, PersonB_id
where PersonA_id and PersonB_id are both foreign keys to the Persons table.
So you could have data like this in Persons:
id Name Color
1 Luis Blue
2 Pedro Red
3 Ana Yellow
4 Donald Black
Friends:
PersonA_id PersonB_id
1 2
3 1
Luis has 2 friends (Pedro and Ana), Pedro and Ana each have one friend (Luis), and Donald has 0 friends.
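A minimal sketch of this design with Python's sqlite3 module, using the sample data above. It assumes friendship is mutual and each pair is stored only once, which is why the query reads Friends in both directions; the PRIMARY KEY on the pair is an added assumption to block duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Persons (
    id            INTEGER PRIMARY KEY,
    Name          TEXT NOT NULL,
    FavoriteColor TEXT
);
-- One row per friendship; both columns point back into Persons.
CREATE TABLE Friends (
    PersonA_id INTEGER NOT NULL REFERENCES Persons(id),
    PersonB_id INTEGER NOT NULL REFERENCES Persons(id),
    PRIMARY KEY (PersonA_id, PersonB_id)
);
""")
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?)",
                 [(1, "Luis", "Blue"), (2, "Pedro", "Red"),
                  (3, "Ana", "Yellow"), (4, "Donald", "Black")])
conn.executemany("INSERT INTO Friends VALUES (?, ?)", [(1, 2), (3, 1)])

# Each pair is stored once, so read it in both directions to get each person's friends.
friend_counts = dict(conn.execute("""
    SELECT p.Name, COUNT(f.friend_id)
    FROM Persons AS p
    LEFT JOIN (
        SELECT PersonA_id AS person_id, PersonB_id AS friend_id FROM Friends
        UNION
        SELECT PersonB_id, PersonA_id FROM Friends
    ) AS f ON f.person_id = p.id
    GROUP BY p.id
    ORDER BY p.id
"""))

# The foreign keys reject a friendship with a person who does not exist.
try:
    conn.execute("INSERT INTO Friends VALUES (1, 99)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

print(friend_counts)  # {'Luis': 2, 'Pedro': 1, 'Ana': 1, 'Donald': 0}
```

If you would rather store both directions ((1, 2) and (2, 1)), the UNION becomes unnecessary, at the cost of twice as many rows to keep in sync.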

Single table column refers to multiple primary key

I need to store multiple values in a single column.
For example, I am creating a table which holds user preferences, e.g.:
| user_id | cities | countries |
|---------|------------|------------|
| 1 | 10, 11, 23 | 21, 34 |
Because I can't store them as an array (or prefer not to, even where arrays are available, for maintenance and performance reasons and for better RDBMS design), I have to create a mapping table like this:
| user_id | type | reference_id |
|---------|---------|--------------|
| 1 | CITY | 10 |
| 1 | CITY | 11 |
| 1 | CITY | 23 |
| 1 | COUNTRY | 21 |
| 1 | COUNTRY | 34 |
The reference id in this column refers to the master tables like city, country, etc.
The problems I see here are:
I can't have an FK reference to the city or country table, because a single reference_id column may refer to a city or a country depending on the type.
Without an FK, there is no guarantee that we won't end up with dirty data.
Is there any better approach?
Note:
I have given city/country as examples, but I need around 20 columns which can have multiple values like city or country.
In the future I may introduce a boolean preference like "whether you like to travel", so I might want to store TYPE as "TRAVEL" and reference_id as 0 for yes and 1 for no, which definitely will not reference any table.
You could create a Location table {LocationId, locationType (city/country)}.
Then, every time you add a new record to the City or Country table, add it to the Location table first, and then add it to the City (or Country) table as appropriate, using the same cityId (or countryId) as the LocationId in the Location table.
Then create an FK between the preferences table and the Location table, and add a [zero or one]-to-one (0/1 - 1) FK relationship from each of the City and Country tables to the Location table. (Every record in the City and Country tables must be in the Location table, but not the other way around.)
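A sketch of the Location supertype idea using Python's sqlite3 module. The table name UserLocationPreference, the columns name and user_id, the placeholder city/country names, and the upper-case type values are assumptions; the ids come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
-- Supertype: every city or country gets its id here first.
CREATE TABLE Location (
    LocationId   INTEGER PRIMARY KEY,
    locationType TEXT NOT NULL CHECK (locationType IN ('CITY', 'COUNTRY'))
);
-- Subtypes reuse LocationId as their own primary key: the 0/1-to-1 link.
CREATE TABLE City (
    cityId INTEGER PRIMARY KEY REFERENCES Location(LocationId),
    name   TEXT NOT NULL
);
CREATE TABLE Country (
    countryId INTEGER PRIMARY KEY REFERENCES Location(LocationId),
    name      TEXT NOT NULL
);
-- Preferences get a real FK instead of a type-dependent reference_id.
CREATE TABLE UserLocationPreference (
    user_id    INTEGER NOT NULL,
    LocationId INTEGER NOT NULL REFERENCES Location(LocationId),
    PRIMARY KEY (user_id, LocationId)
);
""")


def add_location(table, loc_id, loc_type, name):
    """Insert into Location first, then into the subtype table with the same id."""
    conn.execute("INSERT INTO Location VALUES (?, ?)", (loc_id, loc_type))
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (loc_id, name))


for city_id in (10, 11, 23):
    add_location("City", city_id, "CITY", f"city-{city_id}")  # placeholder names
for country_id in (21, 34):
    add_location("Country", country_id, "COUNTRY", f"country-{country_id}")

conn.executemany("INSERT INTO UserLocationPreference VALUES (1, ?)",
                 [(i,) for i in (10, 11, 23, 21, 34)])

try:
    conn.execute("INSERT INTO UserLocationPreference VALUES (1, 999)")  # unknown id
    dirty_rejected = False
except sqlite3.IntegrityError:
    dirty_rejected = True

prefs = conn.execute("""
    SELECT l.locationType, p.LocationId
    FROM UserLocationPreference AS p
    JOIN Location AS l ON l.LocationId = p.LocationId
    WHERE p.user_id = 1
    ORDER BY l.locationType, p.LocationId
""").fetchall()
```

As written, nothing stops a City row from reusing a LocationId whose locationType is 'COUNTRY'; one common way to close that gap is a composite FK on (LocationId, locationType), which this sketch omits.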
You're saying you want a table for generic data instead of 20 lookup tables enforcing RI? On a large system, the data would be stored in multiple tables instead of using a delimiter to separate the values and then exploding them out in another table, introducing the problem of enforcing RI. If you're storing values that are really generic, like code/description pairs, you just need a codeSetID field to identify which codes belong in which codesets.
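One way to read the codeSetID suggestion, sketched with Python's sqlite3 module: a single Code table keyed by (codeSetID, code), and a composite FOREIGN KEY from the question's (type, reference_id) pair. The table names Code and UserPreference and the description column are assumptions; the codes and the TRAVEL yes/no values come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
-- One generic lookup table; codeSetID says which list each code belongs to.
CREATE TABLE Code (
    codeSetID   TEXT    NOT NULL,
    code        INTEGER NOT NULL,
    description TEXT,
    PRIMARY KEY (codeSetID, code)
);
-- The composite FK checks (type, reference_id) against the matching code set.
CREATE TABLE UserPreference (
    user_id      INTEGER NOT NULL,
    type         TEXT    NOT NULL,
    reference_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, type, reference_id),
    FOREIGN KEY (type, reference_id) REFERENCES Code (codeSetID, code)
);
""")
conn.executemany("INSERT INTO Code VALUES (?, ?, ?)", [
    ("CITY", 10, None), ("CITY", 11, None), ("CITY", 23, None),
    ("COUNTRY", 21, None), ("COUNTRY", 34, None),
    ("TRAVEL", 0, "yes"), ("TRAVEL", 1, "no"),
])


def add_pref(user_id, pref_type, ref_id):
    """Return True if accepted, False if the FK rejected the (type, reference_id) pair."""
    try:
        conn.execute("INSERT INTO UserPreference VALUES (?, ?, ?)",
                     (user_id, pref_type, ref_id))
        return True
    except sqlite3.IntegrityError:
        return False


all_ok = all(add_pref(1, t, r) for t, r in
             [("CITY", 10), ("CITY", 11), ("CITY", 23), ("COUNTRY", 21), ("COUNTRY", 34)])
travel_ok = add_pref(1, "TRAVEL", 0)                 # boolean-style preference
wrong_set_rejected = not add_pref(1, "COUNTRY", 10)  # 10 is a CITY code, not a COUNTRY code
```

The trade-off is that every code set shares one table and one column layout; codes that need extra attributes of their own are usually better served by dedicated lookup tables, as the answer suggests.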

References in a table

I have a table like this, that contains items that are added to the database.
Catalog table example
id | element | catalog
0 | mazda | car
1 | penguin | animal
2 | zebra | animal
etc....
And then I have a table where the user selects items from that table, and I keep a reference of what has been selected like this
User table example
id | name | age | itemsSelected
0 | john | 18 | 2;3;7;9
So what I am trying to say is that I keep a reference to what the user has selected as a string of IDs, but I think this seems a tad troublesome,
because when I query for information about a user, all I get is the string 2;3;7;9, when what I really want is an array of the items corresponding to those IDs.
Right now I get the IDs, split the string, and then run another query to find the elements the IDs correspond to.
Is there an easier way to do this, if my question is understandable?
Yes, there is a way to do this. You create a third table which maps rows of A to rows of B. This is called a many-to-many relationship, implemented with foreign keys.
You have your Catalogue table (int, varchar(MAX), varchar(MAX)) or similar.
You have your User table (int, varchar(MAX), varchar(MAX), varchar(MAX)) or similar, essentially, remove the last column and then create another table:
You create a UserCatalogue table: (int UserId, int CatalogueId) with a composite Primary Key on both columns. Then the UserId column gets a Foreign-Key to User.Id, and the CatalogueId column gets a Foreign-Key to Catalogue.Id. This preserves the relationship and eases queries. It also means that if Catalogue.Id number 22 does not exist, you cannot accidentally insert it as a relation between the two. This is called referential integrity: once you declare "this column must reference this other table", SQL Server will reject any row that breaks that relationship.
After you create this, you add one entry for each selected item, e.g.:
UserId | CatalogueId
0 | 2
0 | 3
0 | 7
0 | 9
This also allows you to JOIN the tables and get the selected items in a single query, instead of splitting a string and running a second query.
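A minimal sketch of the UserCatalogue approach with Python's sqlite3 module (SQLite rather than SQL Server, so the column types differ). The catalogue rows come from the question; john's selection is reduced to ids 0 and 2 for illustration, because the sample catalogue only contains ids 0 to 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Catalogue (id INTEGER PRIMARY KEY, element TEXT NOT NULL, catalog TEXT NOT NULL);
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER);
-- Junction table: one row per (user, selected item) pair.
CREATE TABLE UserCatalogue (
    UserId      INTEGER NOT NULL REFERENCES User(id),
    CatalogueId INTEGER NOT NULL REFERENCES Catalogue(id),
    PRIMARY KEY (UserId, CatalogueId)
);
INSERT INTO Catalogue VALUES (0, 'mazda', 'car'), (1, 'penguin', 'animal'), (2, 'zebra', 'animal');
INSERT INTO User VALUES (0, 'john', 18);
-- Illustrative selection: only ids 0-2 exist in this sample catalogue.
INSERT INTO UserCatalogue VALUES (0, 0), (0, 2);
""")

# One JOIN replaces "split the string, then run a second query".
johns_items = conn.execute("""
    SELECT c.element, c.catalog
    FROM UserCatalogue AS uc
    JOIN Catalogue AS c ON c.id = uc.CatalogueId
    WHERE uc.UserId = ?
    ORDER BY c.id
""", (0,)).fetchall()

try:
    conn.execute("INSERT INTO UserCatalogue VALUES (0, 22)")  # Catalogue.Id 22 does not exist
    missing_rejected = False
except sqlite3.IntegrityError:
    missing_rejected = True

print(johns_items)  # [('mazda', 'car'), ('zebra', 'animal')]
```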
Additionally, and unrelated to the question, you can also optimize your Catalogue table a bit by creating a CatalogueGroup table that holds your last column (catalog: car, animal) and referencing it from Catalogue via a Foreign-Key relationship. This can also save storage space and speed up SQL Server work, since it no longer has to read a string column when you only want the element value.
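A sketch of the CatalogueGroup split, again with Python's sqlite3 module; the groupId column name and the group ids 1 and 2 are assumptions, while the element and group names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
-- Each group name is stored once...
CREATE TABLE CatalogueGroup (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- ...and each Catalogue row points at its group by integer id.
CREATE TABLE Catalogue (
    id      INTEGER PRIMARY KEY,
    element TEXT NOT NULL,
    groupId INTEGER NOT NULL REFERENCES CatalogueGroup(id)
);
INSERT INTO CatalogueGroup (id, name) VALUES (1, 'car'), (2, 'animal');
INSERT INTO Catalogue (id, element, groupId)
VALUES (0, 'mazda', 1), (1, 'penguin', 2), (2, 'zebra', 2);
""")

# Filtering by group name now goes through the small lookup table.
animals = [element for (element,) in conn.execute("""
    SELECT c.element
    FROM Catalogue AS c
    JOIN CatalogueGroup AS g ON g.id = c.groupId
    WHERE g.name = 'animal'
    ORDER BY c.id
""")]
print(animals)  # ['penguin', 'zebra']
```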

Modelling a cube in SSAS

I'm new to designing cubes with SSAS.
In my simple cube, I have one fact table with 3 dimension tables, as below. The fact table (table1) contains a list of client IDs and other columns linking to the 3 dimensions. This all works fine.
table1
client_id | dimension_link_1 | dimension_link_2 | dimension_link_3
AAAAA | xxx |zzz |bbb
BBBBB | yyy |aaa |ccc
I have another table (table2) that contains three columns: Client ID, Classification Type and Classification Name. A client may have 1-n classifications recorded against them (e.g. ethnicity, religion, allergies, etc.), so the Client ID may appear on multiple rows in table2, e.g.:
table2
client_id | classification_type | classification_name
AAAAA | Ethnicity | Japanese
AAAAA | Allergy | Hayfever
AAAAA | Nationality | Russian
BBBBB | Ethnicity | Spanish
BBBBB | Allergy | Aspirin
BBBBB | Nationality | Spanish
BBBBB | Physical Support | Yes
I want to add table2 to my cube so that I can aggregate the client IDs in the existing fact table (table1) by the Classification Type and Classification Name in table2.
However, I'm not sure what the correct approach is. I tried joining table2 to the fact table (table1) as a dimension linked on Client ID, but I think this only joined the two objects together using the first occurrence of the Client ID in table2.
Help! :)
Thanks,
Hologram
Import table2 as both a fact table and a dimension. Then in the cube designer in the dimension usage tab, when specifying the relationship between the measure group formed from table1 and the dimension formed from table2, choose "Many to Many" as the relationship type and ensure the "intermediate measure group" is the one you formed from table2.
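SSAS itself can't be shown here, but the aggregation a many-to-many relationship is meant to produce can be sketched relationally with Python's sqlite3 module, using the question's sample tables: table2 acts as the bridge between the table1 fact rows and the classifications. This is only an analogy for a count-of-clients measure, not how SSAS evaluates the relationship internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (client_id TEXT PRIMARY KEY, dimension_link_1 TEXT,
                     dimension_link_2 TEXT, dimension_link_3 TEXT);
CREATE TABLE table2 (client_id TEXT, classification_type TEXT, classification_name TEXT);
INSERT INTO table1 VALUES ('AAAAA', 'xxx', 'zzz', 'bbb'), ('BBBBB', 'yyy', 'aaa', 'ccc');
INSERT INTO table2 VALUES
    ('AAAAA', 'Ethnicity', 'Japanese'), ('AAAAA', 'Allergy', 'Hayfever'),
    ('AAAAA', 'Nationality', 'Russian'), ('BBBBB', 'Ethnicity', 'Spanish'),
    ('BBBBB', 'Allergy', 'Aspirin'), ('BBBBB', 'Nationality', 'Spanish'),
    ('BBBBB', 'Physical Support', 'Yes');
""")

# table2 is the bridge: each fact row fans out to every classification of its client.
by_type = dict(conn.execute("""
    SELECT t2.classification_type, COUNT(DISTINCT t1.client_id)
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.client_id = t1.client_id
    GROUP BY t2.classification_type
    ORDER BY t2.classification_type
"""))

# Counting the fanned-out rows double-counts clients; a distinct count does not.
joined_rows, distinct_clients = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT t1.client_id)
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.client_id = t1.client_id
""").fetchone()

print(by_type)  # {'Allergy': 2, 'Ethnicity': 2, 'Nationality': 2, 'Physical Support': 1}
print(joined_rows, distinct_clients)  # 7 2
```

The 7 vs 2 contrast is the double counting a plain join produces at the total level; a many-to-many relationship is designed to report each client once per member and once in the total.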
