Let's say I have a table that stores user data. It stores two types of rows: a UserId (partition key) with its attributes (a JSON blob), and a reference to the UserId, keyed by a value taken from the attributes. For example, here are 3 rows of the table:
pk | attributes | userId
5 | { email: example#example.com, tel: 123456789 } | null
email/example#example.com | null | 5
phone/123456789 | null | 5
This is so I am able to query directly off of values to obtain attributes, without needing to do a scan and filter (a very compute intensive operation on large tables).
My question is: Can I, in a single query, do something like getByPartitionKey(email/example#example.com), obtain the userId, and then use that userID to query for the whole attributes document, without doing 2 individual requests? Something akin to a join in SQL.
Your data model is very wrong. Here is how to achieve what you want:
pk | sk | phone | email | other
user123 | user123 | 0293480983 | example#example.com | some map {}
SELECT * FROM mytable WHERE pk = 'user123'
This would allow you to get all of the information for a given userId. If you want the same information but this time by email, you create a GSI on the email attribute:
email | pk | sk | phone | other
example#example.com | user123 | user123 | 0293480983 | some map {}
SELECT * FROM mytable.myindex WHERE email = 'example#example.com'
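To make the idea concrete, here is a minimal in-memory sketch of a base table with an index on email. It is not the DynamoDB API: put_item, query_by_pk and query_index_by_email are hypothetical names, and the sketch assumes the index projects all attributes, so a lookup by email returns the whole item without a second read.

```python
# In-memory stand-in for a table plus a global secondary index (GSI).
# NOT the DynamoDB API: every function name here is hypothetical.

base_table = {}    # pk -> full item
email_index = {}   # email -> list of item copies (index keys need not be unique)


def put_item(item):
    """Write to the base table and keep the index in sync, as the service would."""
    old = base_table.get(item["pk"])
    if old is not None and "email" in old:
        # Drop the stale index entry before re-adding the item.
        email_index[old["email"]] = [
            i for i in email_index[old["email"]] if i["pk"] != old["pk"]
        ]
    base_table[item["pk"]] = item
    if "email" in item:
        # Assumption: the index projects ALL attributes, so it stores a full copy.
        email_index.setdefault(item["email"], []).append(dict(item))


def query_by_pk(pk):
    """Lookup on the base table's partition key."""
    return base_table.get(pk)


def query_index_by_email(email):
    """One lookup returns whole items; no second read of the base table."""
    return email_index.get(email, [])


put_item({
    "pk": "user123",
    "sk": "user123",
    "phone": "0293480983",
    "email": "example#example.com",
    "other": {},
})

matches = query_index_by_email("example#example.com")
print(matches[0]["pk"], matches[0]["phone"])
```

The trade-off is the usual one for an index: every write to the base table also updates the index copy, in exchange for a single read by email.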
I have looked at the Twissandra examples. I asked a similar question regarding this a few days back and received some tips I implemented here. However, by looking at the tables (column families) I see barely any difference between this and a relational database.
My scenario:
A simple address book where a user can create his own contacts and group them (one contact can be placed in many groups, one group can contain many contacts). A contact may have multiple addresses for example.
I want to retrieve all the contacts who live in address x and are placed in group y. Therefore, I did the following:
CREATE TABLE if not exists User (user_id uuid, contact_id uuid, type varchar, email varchar, PRIMARY KEY(user_id));
CREATE TABLE if not exists Contact (contact_id uuid, firstname varchar, lastname varchar, photo blob, imagelength int, note varchar, PRIMARY KEY (contact_id));
CREATE TABLE if not exists Address (address_id uuid, contact_id uuid, street varchar, number int, zipcode varchar, country varchar, PRIMARY KEY(address_id));
CREATE TABLE if not exists Group (group_id uuid, user_id uuid, groupname varchar, PRIMARY KEY(group_id));
CREATE TABLE if not exists Group_Contact (group_id uuid, contact_id uuid, PRIMARY KEY(group_id, contact_id));
However, based on this, it is literally the same as a relational database, except that I believe Cassandra lays this data out on disk differently than an RDBMS does. I don't see how this can be made better in Cassandra, or whether I even modeled it the right way. It just feels like a plain relational database.
I feel that I did something wrong since I have to use application level joins to get the address of the contacts. I really don't know how I can de-normalize this to allow multiple addresses (and maybe even phones, emails).
Any suggestions to improve this scenario would be greatly appreciated!
As jny indicated, data duplication, denormalization and query-based modeling are keys to building good Cassandra data models. If I wanted to take your tables above and build a table to support address/contact queries based on country, I could do it like this:
First, I'll create a user defined type for the contact's address.
aploetz#cqlsh:stackoverflow> CREATE TYPE contactAddress (
... street varchar,
... city varchar,
... zip_code varchar,
... country varchar);
Next, I'll create a table called UserContactsByCountry to store user contact info, as well as any user contact addresses:
aploetz#cqlsh:stackoverflow> CREATE TABLE UserContactsByCountry (
... country varchar,
... user_id uuid,
... type varchar,
... email varchar,
... firstname varchar,
... lastname varchar,
... photo blob,
... imagelength int,
... note varchar,
... addresses map<text, frozen <contactAddress>>,
... PRIMARY KEY ((country),user_id));
A couple of things to note here:
I am using country as a partitioning key for querying, and adding user_id as a clustering key for uniqueness.
Technically, country is being stored multiple times in each row: once as the partition key, and again with each address. Note that the country partition key is the one which allows us to run our query.
I assume that user contacts can have multiple addresses, so I'll store them in a map of type text (varchar), contactAddress (type created above).
Next, I'll insert three user contacts, each with two addresses: two contacts from the USA and one from Great Britain.
aploetz#cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
VALUES ('USA',uuid(),'Tech','brycelynch#network23.com','Bryce','Lynch','Head of R&D at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'8192 N. 42nd St.',city:'New York',zip_code:'10025',country:'USA'}});
aploetz#cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
VALUES ('USA',uuid(),'Reporter','edisoncarter#network23.com','Edison','Carter','Reporter at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'76534 N. 62nd St.',city:'New York',zip_code:'10024',country:'USA'}});
aploetz#cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
VALUES ('GBR',uuid(),'Reporter','theorajones#network23.com','Theora','Jones','Controller at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'821 Wembley St.',city:'London',zip_code:'W11 2BQ',country:'GBR'}});
Now I can query that table for all user contacts in the USA:
aploetz#cqlsh:stackoverflow> SELECT * FROM usercontactsbycountry WHERE country ='USA';
country | user_id | addresses | email | firstname | imagelength | lastname | note | photo | type
---------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+-------------+----------+---------------------------+-------+----------
USA | 2dee94e2-4887-4988-8cf5-9aee5fd0ea1e | {'home': {street: '8192 N. 42nd St.', city: 'New York', zip_code: '10025', country: 'USA'}, 'work': {street: '101 Big Network Drive', city: 'New York', zip_code: '10023', country: 'USA'}} | brycelynch#network23.com | Bryce | null | Lynch | Head of R&D at Network 23 | null | Tech
USA | b92612dd-dbaa-42f2-8ff2-d36b6c601aeb | {'home': {street: '76534 N. 62nd St.', city: 'New York', zip_code: '10024', country: 'USA'}, 'work': {street: '101 Big Network Drive', city: 'New York', zip_code: '10023', country: 'USA'}} | edisoncarter#network23.com | Edison | null | Carter | Reporter at Network 23 | null | Reporter
(2 rows)
There are probably other ways in which this could be modeled, but this is one that I hoped to use to help you understand some of the techniques available.
It is difficult to switch from modeling for relational databases to modeling for Cassandra, because they seem so similar: the query language looks almost the same. But the first rule of Cassandra is to model to your queries, whereas in relational databases we model to the data. That means:
Consider what you query on the most
Learn about partition keys and clustering keys
Don't be afraid of data duplication
There is a good example on data modeling in Cassandra: https://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html
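As a rough illustration of "model to your queries", the sketch below takes normalized contact and address records and writes them into a structure keyed by the value you query on. It is plain Python rather than CQL, and the record shapes and the name build_contacts_by_country are assumptions made for the example, not part of any Cassandra API.

```python
from collections import defaultdict

# Hypothetical normalized records, shaped like the Contact and Address tables.
contacts = {
    "c1": {"firstname": "Bryce", "lastname": "Lynch"},
    "c2": {"firstname": "Theora", "lastname": "Jones"},
}
addresses = [
    {"contact_id": "c1", "label": "home", "city": "New York", "country": "USA"},
    {"contact_id": "c2", "label": "work", "city": "New York", "country": "USA"},
    {"contact_id": "c2", "label": "home", "city": "London", "country": "GBR"},
]


def build_contacts_by_country(contacts, addresses):
    """Write one denormalized row per (country, contact).

    The outer key plays the role of the partition key and the inner key the
    clustering key. Contact fields are deliberately duplicated across countries.
    """
    table = defaultdict(dict)
    for addr in addresses:
        partition = table[addr["country"]]
        row = partition.setdefault(
            addr["contact_id"],
            {**contacts[addr["contact_id"]], "addresses": {}},
        )
        row["addresses"][addr["label"]] = addr["city"]
    return table


contacts_by_country = build_contacts_by_country(contacts, addresses)

# The read is a single partition lookup, with no join at query time.
usa = contacts_by_country["USA"]
print(sorted(usa))
```

Contact c2 appears under both USA and GBR. That duplication is the price paid at write time so that the read stays a single lookup.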
Hi, I am creating a contacts database and I want to create a cities table that I can use for the City field in the people table. How do I do this?
City table:
ID | City
--------------
1 | Wellington
2 | Auckland
3 | Christchurch
People Table Design
Field Name: City
Data Type: Short Text
Display Control: Combobox
Row Source Type: Table/Query
Row Source: City
This is my table design for the City field, but the combobox only shows the ID numbers.
I am really against the concept of lookups in tables, so I would suggest you read "The Evils of Lookup" before you proceed.
The problem is that you have used a table name as the Row Source. You need to modify some of the properties of the field. In the Lookup tab, change the Column Count to 2 and the Column Widths to 0cm;2.04cm, and probably the Row Source to
SELECT ID, City FROM City;
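The two-column row source works because the first column (the ID) is the stored value and is hidden by its zero width, while the second column is what gets displayed. The same mapping can be seen outside Access. The sketch below uses Python's sqlite3 module purely as a stand-in, and the People columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (ID INTEGER PRIMARY KEY, City TEXT NOT NULL);
    INSERT INTO City (ID, City)
        VALUES (1, 'Wellington'), (2, 'Auckland'), (3, 'Christchurch');
    -- Hypothetical People table: it stores the city ID, not the city name.
    CREATE TABLE People (
        ID INTEGER PRIMARY KEY,
        Name TEXT,
        CityID INTEGER REFERENCES City(ID)
    );
    INSERT INTO People (Name, CityID) VALUES ('Alice', 2);
""")

# Same shape as the row source: column 1 is the stored value, column 2 the label.
row_source = conn.execute("SELECT ID, City FROM City ORDER BY ID").fetchall()

# Resolving a stored ID back to its name is an ordinary join.
person = conn.execute(
    "SELECT p.Name, c.City FROM People AS p JOIN City AS c ON c.ID = p.CityID"
).fetchone()
print(row_source, person)
```

Storing the ID and resolving the name in a query or form is also the alternative to a table-level lookup field.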
Say I have a User database table with the regular username, password and email fields. What is a sensible way to add additional boolean fields that enable/disable features for any given user?
e.g.,
user_can_view_page_x
user_can_send_emails
user_can_send_pms
etc
Adding a bunch of boolean columns to the existing user table seems like the wrong way to go.
Yes, I would think that this is the wrong approach.
I would rather create a User_Features table with columns something like
UserID
FeatureName
And check if a given user has the feature in question enabled/entered in the table.
You could even go as far as creating a Users_Groups table, where users are also associated with groups and features can be inherited/disallowed from group settings.
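A minimal sketch of the User_Features idea, using Python's sqlite3 module as a stand-in database. The lower-case table and column names, the sample users and the has_feature helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE user_features (
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        feature_name TEXT NOT NULL,
        PRIMARY KEY (user_id, feature_name)
    );
    INSERT INTO users (user_id, username) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO user_features (user_id, feature_name)
        VALUES (1, 'can_send_emails'), (1, 'can_send_pms');
""")


def has_feature(conn, user_id, feature_name):
    """A feature is enabled when a matching row exists; absence means disabled."""
    row = conn.execute(
        "SELECT 1 FROM user_features WHERE user_id = ? AND feature_name = ?",
        (user_id, feature_name),
    ).fetchone()
    return row is not None


print(has_feature(conn, 1, "can_send_emails"))
print(has_feature(conn, 2, "can_send_emails"))
```

Adding a new feature then needs no schema change, only new rows.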
I would use three tables.
One is your existing user table:
USER table
----
user_id (pk)
name
email
...
Another is a table containing possible user privileges:
PRIVILEGE table
----
privilege_id (pk)
name
Lastly is a join table containing an entry for each privilege setting for each user:
USER_PRIVILEGE table
----
user_id (pk) (fk)
privilege_id (pk) (fk)
allowed
Here is some sample data for two users, one with the send email privilege and the send pms privilege and another with a view page privilege:
USER data
USER_ID NAME EMAIL
------- ----- -------------------
1 USER1 user1#somewhere.com
2 USER2 user2#somewhere.com
PRIVILEGE data
PRIVILEGE_ID NAME
------------ -----------
1 view_page_x
2 send_email
3 send_pms
USER_PRIVILEGE data
USER_ID PRIVILEGE_ID ALLOWED
------- ------------ -------
1 1 'N'
1 2 'Y'
1 3 'Y'
2 1 'Y'
2 2 'N'
2 3 'N'
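The three tables and the sample data above can be tried out with a short sketch. It uses Python's sqlite3 module as a stand-in database. The allowed_privileges helper and the CHECK constraint on allowed are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE privilege (privilege_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE user_privilege (
        user_id INTEGER REFERENCES user(user_id),
        privilege_id INTEGER REFERENCES privilege(privilege_id),
        allowed TEXT CHECK (allowed IN ('Y', 'N')),
        PRIMARY KEY (user_id, privilege_id)
    );
    INSERT INTO user VALUES
        (1, 'USER1', 'user1#somewhere.com'),
        (2, 'USER2', 'user2#somewhere.com');
    INSERT INTO privilege VALUES
        (1, 'view_page_x'), (2, 'send_email'), (3, 'send_pms');
    INSERT INTO user_privilege VALUES
        (1, 1, 'N'), (1, 2, 'Y'), (1, 3, 'Y'),
        (2, 1, 'Y'), (2, 2, 'N'), (2, 3, 'N');
""")


def allowed_privileges(conn, user_id):
    """Return the names of the privileges marked 'Y' for the given user."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM user_privilege AS up
        JOIN privilege AS p ON p.privilege_id = up.privilege_id
        WHERE up.user_id = ? AND up.allowed = 'Y'
        ORDER BY p.privilege_id
        """,
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(allowed_privileges(conn, 1))
print(allowed_privileges(conn, 2))
```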
An application I am developing needs to provide access to data based on a list of cities defined for each client. A client can have:
access to all cities in a country OR
access to all cities in a state / region OR
access to select cities in any state or country.
What would be the best way to define this in the database (if the db has a Country table, State / Region table, City table and a Client table)?
Clarification:
(A simplified view of the tables with only the essential columns pertaining to this question).
Country table -
idCountry | Name
State table -
idState | idCountry | Name
City table -
idCity | idState | Name
Client table -
idClient | Name
You could create a self-referencing Location table (Id, Name, ParentLocation) and an AccessControl table (ClientId, LocationId). When a client is related to a location, you grant access to all locations below it. Some examples:
ID Name Parent
-------------------
1 World NULL -- Need to represent all countries
2 Brazil 1 -- A country
3 São Paulo 2 -- A state
4 São Paulo 3 -- A city
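One way to check access against such a hierarchy is to walk from the requested location up to the root and look for a grant on any ancestor. The sketch below does this with a recursive query, using Python's sqlite3 module as a stand-in database. The client IDs, the grants and the can_access helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id INTEGER REFERENCES location(id)
    );
    CREATE TABLE access_control (
        client_id INTEGER NOT NULL,
        location_id INTEGER NOT NULL REFERENCES location(id),
        PRIMARY KEY (client_id, location_id)
    );
    INSERT INTO location VALUES
        (1, 'World', NULL), (2, 'Brazil', 1),
        (3, 'São Paulo', 2), (4, 'São Paulo', 3);
    -- Hypothetical grants: client 10 gets the whole country, client 20 one city.
    INSERT INTO access_control VALUES (10, 2), (20, 4);
""")


def can_access(conn, client_id, location_id):
    """Walk from the location up to the root; any granted ancestor gives access."""
    row = conn.execute(
        """
        WITH RECURSIVE ancestors(id, parent_id) AS (
            SELECT id, parent_id FROM location WHERE id = ?
            UNION ALL
            SELECT l.id, l.parent_id
            FROM location AS l
            JOIN ancestors AS a ON l.id = a.parent_id
        )
        SELECT 1
        FROM ancestors AS a
        JOIN access_control AS ac ON ac.location_id = a.id
        WHERE ac.client_id = ?
        LIMIT 1
        """,
        (location_id, client_id),
    ).fetchone()
    return row is not None


print(can_access(conn, 10, 4))
print(can_access(conn, 20, 3))
```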
If you want to stick with your current model, maybe use a table like (ClientId, CountryId nullable, StateId nullable, CityId nullable). This way you could define access exactly as in your three cases, but you would need to deal with nullable fields.
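A sketch of the nullable-column variant, again using Python's sqlite3 module as a stand-in database. The ClientAccess table name, the rule that exactly one of the three columns is set per row, the sample data and the accessible_cities helper are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (idCountry INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE State (idState INTEGER PRIMARY KEY, idCountry INTEGER, Name TEXT);
    CREATE TABLE City (idCity INTEGER PRIMARY KEY, idState INTEGER, Name TEXT);
    -- Hypothetical access table: exactly one of the three columns is set per row.
    CREATE TABLE ClientAccess (
        idClient INTEGER NOT NULL,
        idCountry INTEGER,
        idState INTEGER,
        idCity INTEGER,
        CHECK ((idCountry IS NOT NULL) + (idState IS NOT NULL)
               + (idCity IS NOT NULL) = 1)
    );
    INSERT INTO Country VALUES (1, 'Brazil');
    INSERT INTO State VALUES (1, 1, 'São Paulo'), (2, 1, 'Rio de Janeiro');
    INSERT INTO City VALUES
        (1, 1, 'São Paulo'), (2, 1, 'Campinas'), (3, 2, 'Niterói');
    INSERT INTO ClientAccess VALUES
        (10, 1, NULL, NULL),   -- whole country
        (20, NULL, 1, NULL),   -- one state
        (30, NULL, NULL, 3);   -- one city
""")


def accessible_cities(conn, client_id):
    """A city is visible if the client holds its city, state or country."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.idCity, c.Name
        FROM City AS c
        JOIN State AS s ON s.idState = c.idState
        JOIN ClientAccess AS ca
          ON ca.idCity = c.idCity
          OR ca.idState = s.idState
          OR ca.idCountry = s.idCountry
        WHERE ca.idClient = ?
        ORDER BY c.idCity
        """,
        (client_id,),
    ).fetchall()
    return [name for (_city_id, name) in rows]


print(accessible_cities(conn, 20))
```

Comparisons against NULL never match, so each access row only takes effect through the one column that is set.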