Consider the following case regarding the Cassandra database: I must perform a batch statement with some related data, e.g. a user table and a user_by_username table. On user creation I want to insert data into both tables, as a single transaction. The Cassandra documentation says a batch statement should not span multiple partitions. Suppose I model the primary key as a composite key like the following:
CREATE TABLE IF NOT EXISTS user (
    id text,
    tpe text,
    username text,
    PRIMARY KEY ((tpe, id))
);
CREATE TABLE IF NOT EXISTS user_by_username (
    username text,
    tpe text,
    id text,
    PRIMARY KEY ((tpe, username))
);
Example of rows:
user: ('1', 'users', 'lucasrpb')
user_by_username: ('lucasrpb', 'users', '1')
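The batch I want to run would look something like this:
BEGIN BATCH
    INSERT INTO user (tpe, id, username) VALUES ('users', '1', 'lucasrpb');
    INSERT INTO user_by_username (tpe, username, id) VALUES ('users', 'lucasrpb', '1');
APPLY BATCH;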
My question: will the data be in the same partition, so that the batch can be performed?
Partitions are within a table, not across tables. However, the token for data, which is used to identify which replicas will host the data, is based on the partition key (the first column in the primary key, or the column(s) surrounded by parentheses).
In your case, the partition key for user is (tpe, id) and for user_by_username it is (tpe, username). Because of this, the tokens for the data will likely not be the same.
If the primary key for user were PRIMARY KEY (tpe, id) and for user_by_username PRIMARY KEY (tpe, username), with single parentheses, the partition key in each case would be tpe alone. Therefore, provided tpe was the same, the token would be the same, and the data would be stored on the same replicas.
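For illustration only (a sketch, not a recommendation, since putting every user under a single tpe value would create very wide partitions), that variant would look like:
CREATE TABLE IF NOT EXISTS user (
    id text,
    tpe text,
    username text,
    PRIMARY KEY (tpe, id)  -- single parentheses: tpe alone is the partition key
);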
In any case, I would not recommend batching operations to update user_by_username and user together, but it would be better in the case where the partition key is the same, since fewer C* nodes need to be written to in the batch.
Since the only difference between your tables is the primary key, if you are on a 3.0+ version a good candidate would be to look into materialized views, which were introduced in 3.0. With one you could set up user_by_username as a view of the user table like:
CREATE MATERIALIZED VIEW user_by_username AS
    SELECT * FROM user
    WHERE tpe IS NOT NULL AND id IS NOT NULL AND username IS NOT NULL
    PRIMARY KEY ((tpe, username), id);
(Note that the view's primary key must include every primary key column of the base table, hence id, and each of those columns must be restricted with IS NOT NULL.)
This way you only have to make changes to user, which will then be propagated to user_by_username for you.
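For example (a minimal sketch), a single write to the base table is then visible through the view:
INSERT INTO user (tpe, id, username) VALUES ('users', '1', 'lucasrpb');
SELECT id FROM user_by_username WHERE tpe = 'users' AND username = 'lucasrpb';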
Related
I am talking about the normalization of a primary key. Let's say my primary key column is of type nvarchar, which violates the rules of normalization. After removing the primary key constraint and the identity specification from that column, I need to create a new column which will be the new primary key of the table.
My question is, what should happen with the previous primary key?
I've got an answer that sounds like "the column should become a semantic key", but I can't understand this answer.
It's not unusual when designing a database schema to use a SURROGATE primary key. The idea is to give each record a unique and permanent identifier so it can be easily referenced by applications and foreign keys. This key has no meaning. Knowing the surrogate key gives you no information about the content of the record. The user of your application would never see this value.
On the other hand, your record may have a SEMANTIC primary key: a unique value that identifies the data and makes sense to the user.
For example, let's say you have a table of Employees. The employer assigns each employee a unique Employee ID Number. Let's say you store this value as a string. To the user that value serves as the unique identifier that refers to that employee. Meanwhile, your table may have a numeric column that serves as the unique identifier for that record.
create table Employee (
    EmployeeRecordID int identity(1,1) primary key,
    EmployerAssignedID nvarchar(12),
    EmployeeName nvarchar(60),
    Salary money
)
insert into Employee ( EmployerAssignedID, EmployeeName, Salary ) values
( '#ABC100', 'Fred', 25000.12 ),
( '#AZZ314', 'Mary', 37700.00 ),
( '#MAA719', 'Fran', 34444.04 ),
( '#MZA977', 'Mary', 36000.00 )
As each record is added, SQL Server generates a unique EmployeeRecordID for each record, starting with 1. This is the SURROGATE key. Within your database and within your application, you would use this value to reference the record.
But when your application is communicating with the users, you would use the EmployerAssignedID. This is the SEMANTIC primary key. It makes sense to your users to use this value to search for a particular employee.
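To illustrate the split with the table above (a sketch):
-- The user searches by the SEMANTIC key they know...
SELECT EmployeeRecordID, EmployeeName, Salary
FROM Employee
WHERE EmployerAssignedID = '#AZZ314';
-- ...while the application and foreign keys refer to the SURROGATE key.
SELECT EmployeeName FROM Employee WHERE EmployeeRecordID = 2;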
A primary key is no more than a unique index which can't have NULL values in its key. Like any index, it can be clustered or nonclustered.
Deleting a clustered index turns the table into a heap, with changes in structure and behaviour. Deleting a nonclustered index just deallocates its space and affects neither the table nor the other indexes on it.
So after deleting it, you just have a column (or columns) with unique values, and you can consider it a semantic key until some duplicate values are inserted.
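As a concrete sketch of the migration the question describes (table and constraint names are hypothetical):
-- Drop the old primary key on the nvarchar column...
ALTER TABLE Employee DROP CONSTRAINT PK_Employee;
-- ...add the new surrogate key column...
ALTER TABLE Employee ADD EmployeeRecordID int identity(1,1)
    CONSTRAINT PK_Employee_RecordID PRIMARY KEY;
-- ...and keep the old column unique so it still works as the semantic key.
ALTER TABLE Employee ADD CONSTRAINT UQ_Employee_EmployerAssignedID
    UNIQUE (EmployerAssignedID);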
I'm having a really, really strange issue with postgres. I'm trying to generate GUIDs for business objects in my database, and I'm using a new schema for this. I've done this with several business objects already; the code I'm using here has been tested and has worked in other scenarios.
Here's the definition for the new table:
CREATE TABLE guid.public_obj
(
guid uuid NOT NULL DEFAULT uuid_generate_v4(),
id integer NOT NULL,
CONSTRAINT obj_guid_pkey PRIMARY KEY (guid),
CONSTRAINT obj_id_fkey FOREIGN KEY (id)
REFERENCES obj (obj_id)
ON UPDATE CASCADE ON DELETE CASCADE
)
However when I try to backfill this using the following code, I get a SQL state 23503 claiming that I'm violating the foreign key constraint.
INSERT INTO guid.public_obj (guid, id)
SELECT uuid_generate_v4(), o.obj_id
FROM obj o;
ERROR: insert or update on table "public_obj" violates foreign key constraint "obj_id_fkey"
SQL state: 23503
Detail: Key (id)=(-2) is not present in table "obj".
However, if I do a SELECT on the source table, the value is definitely present:
SELECT uuid_generate_v4(), o.obj_id
FROM obj o
WHERE obj_id = -2;
"0f218286-5b55-4836-8d70-54cfb117d836";-2
I'm baffled as to why postgres might think I'm violating the fkey constraint when I'm pulling the value directly out of the corresponding table. The only constraint on obj_id in the source table definition is that it's the primary key. It's defined as a serial; the select returns it as an integer. Please help!
Okay, apparently the reason this is failing is that, unbeknownst to me, the table (which, I stress, does not contain many rows) is partitioned. If I do a SELECT COUNT(*) FROM obj; it returns 348, but if I do a SELECT COUNT(*) FROM ONLY obj; it returns 44. Thus, there are two problems: first, some of the data in the table has not been partitioned correctly (there is unpartitioned data sitting in the parent table), and second, the data I'm interested in is split across multiple child tables, and the fkey constraint on the parent table fails because the data isn't actually in the parent table. (As a note, this is not my architecture; I'm having to work with something that's been around for quite some time.)
The partitioning is by implicit type (there are three partitions, each of which contains rows relating to a specific subtype of obj), and I think the eventual solution is going to be creating GUID tables for each of the subtypes. I'll probably have to handle the rows that are actually in the obj table by selecting them into a temp table, deleting them from obj, then reinserting them so that they can be partitioned properly.
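A sketch of that cleanup (assuming inserts into obj are routed to the right child table by a trigger or rule; otherwise insert into the child tables directly):
-- Stash the rows that live in the parent table itself...
CREATE TEMP TABLE obj_stray AS SELECT * FROM ONLY obj;
-- ...remove them from the parent...
DELETE FROM ONLY obj;
-- ...and reinsert so the partitioning logic can place them correctly.
INSERT INTO obj SELECT * FROM obj_stray;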
In SQL Server I have a table with the following data: first name, last name, birthplace, etc.
The table has an identity ID column (the primary key of the table). In my application's interface, the user can modify this data until they hit a button named "Close Record". When that happens I have to generate a Closed_Record_ID with the syntax year + consecutive_number. For all the records in my table, the order in which they entered the database (given by the identity ID) may not be the order in which they were closed, so I have to generate a new consecutive number.
How should I use the TABLOCKX hint, or what else should I do, to avoid duplicate consecutive numbers in the Closed_Record_ID column?
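For illustration, one common pattern uses UPDLOCK and HOLDLOCK rather than TABLOCKX, so only the current year's rows are serialized (a sketch with hypothetical table and column names):
DECLARE @RecordID int = 42;  -- the record being closed
DECLARE @year char(4) = CAST(YEAR(GETDATE()) AS char(4));
DECLARE @next int;
BEGIN TRANSACTION;
-- The lock hints block concurrent sessions from reading the same MAX value.
SELECT @next = ISNULL(MAX(CAST(RIGHT(Closed_Record_ID, 6) AS int)), 0) + 1
FROM dbo.Records WITH (UPDLOCK, HOLDLOCK)
WHERE Closed_Record_ID LIKE @year + '%';
UPDATE dbo.Records
SET Closed_Record_ID = @year + RIGHT('000000' + CAST(@next AS varchar(6)), 6)
WHERE ID = @RecordID;
COMMIT;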
I have separate asset tables for storing different kinds of physical and logical assets, such as:
Vehicle table( ID, model, EngineSize, Drivername, lastMaintenanceDate)
Server table ( ID, IP, OSName, etc…)
VM (ID, Size, etc…).
VM_IP (VM_ID,IP)
Now the problem I have is:
For the IP column in the Server table and in the VM_IP table, I need the value to be unique across both tables; for example, the database should not allow a server and a VM to have the same IP. In the current design I can only guarantee uniqueness within each table separately.
So can anyone advise on how I can handle this uniqueness requirement at the database level?
Regards
::EDITED::
I currently have the following database structure:
Currently I see these points:
I have introduced a redundant AssetTypeID column in the base Asset table, so I can know the asset type without having to join tables. This might break normalization.
In my above architecture, I cannot control (at the database level) which assets should have an IP, which should not, and which can or cannot have multiple IPs.
So is there a way to improve my architecture to handle these two points?
Thanks in advance for any help.
Create an IP table and use foreign keys
If I were facing this problem at the design level, I would add two more tables:
A Valid_IP table (containing the valid IP range)
A Network_Enabled base table for all entities that may have an IP, such as Server and VM; the primary key of this base table will be the primary key of the child tables.
In the Network_Enabled table, having a foreign key from the Valid_IP table and setting a unique key on that field will be the answer.
Hope this helps.
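A sketch of that design (all names assumed, IPs stored as strings for brevity):
CREATE TABLE Valid_IP (
    IP varchar(45) NOT NULL PRIMARY KEY  -- the pool of permitted addresses
);
CREATE TABLE Network_Enabled (
    ID int NOT NULL PRIMARY KEY,         -- also serves as the PK of each child table
    IP varchar(45) NOT NULL UNIQUE       -- unique across ALL network-enabled assets
        REFERENCES Valid_IP (IP)
);
CREATE TABLE Server (
    ID int NOT NULL PRIMARY KEY
        REFERENCES Network_Enabled (ID)
    -- other server columns...
);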
You can use an indexed view.
CREATE VIEW YourViewName WITH SCHEMABINDING
AS
...
GO
CREATE UNIQUE CLUSTERED INDEX IX_YourIndexName
ON YourViewName (..., ...)
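Indexed views can't contain UNION, so a straight view over both IP columns won't index. One known workaround (a sketch, with table names assumed from the question) materializes only the conflicting IPs, doubled through a two-row helper table so that any conflict violates a unique index:
CREATE TABLE dbo.TwoRows (n int NOT NULL PRIMARY KEY);
INSERT INTO dbo.TwoRows (n) VALUES (1), (2);
GO
CREATE VIEW dbo.DuplicateIPs WITH SCHEMABINDING
AS
SELECT s.IP
FROM dbo.Server s
JOIN dbo.VM_IP v ON v.IP = s.IP
CROSS JOIN dbo.TwoRows t;
GO
CREATE UNIQUE CLUSTERED INDEX IX_DuplicateIPs ON dbo.DuplicateIPs (IP);
Any IP present in both tables would produce two identical rows in the view, so the engine rejects the write that would create the duplicate.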
Based on your edit, you can introduce a superkey on the asset table and use various constraints to enforce most of what it sounds like you're looking for:
create table Asset (
AssetID int not null primary key,
AssetTypeID int not null
--Skip all of the rest, foreign keys, etc, irrelevant to example
,constraint UQ_Asset_TypeCheck
UNIQUE (AssetID,AssetTypeID) --This is the superkey
)
The above means that the AssetTypeID column can now be checked/enforced in other tables, and there's no risk of inconsistency.
create table Servers (
AssetID int not null primary key,
AssetTypeID as 1 persisted,
constraint FK_Servers_Assets FOREIGN KEY (AssetID)
references Asset (AssetID), --Strictly, this will become redundant
constraint FK_Servers_Assets_TypeCheck FOREIGN KEY (AssetID,AssetTypeID)
references Asset (AssetID,AssetTypeID)
)
So, in the above, we enforce that all entries in this table must actually be of the correct asset type, by making it a fixed computed column that is then used in a foreign key back to the superkey.
--So on for other asset types
create table Asset_IP (
AssetID int not null,
IPAddress int not null primary key, --int is the wrong type if you need IPv6
AssetTypeID int not null,
constraint FK_Asset_IP_Assets FOREIGN KEY (AssetID)
references Asset (AssetID), --Again, redundant
constraint CK_Asset_Types CHECK (
AssetTypeID in (1/*, Other types allowed IPs */)),
constraint FK_Asset_IP_Assets_TypeCheck FOREIGN KEY (AssetID,AssetTypeID)
references Asset (AssetID,AssetTypeID)
)
And now, above, we again reference the superkey to ensure that we've got a local (to this table) correct AssetTypeID value, which we can then use in a check constraint to limit which asset types are actually allowed entries in this table.
create unique index UQ_Asset_SingleIPs on Asset_IP (AssetID)
where AssetTypeID in (1/* Type IDs that are only allowed 1 IP address */)
And finally, for certain AssetTypeID values, we ensure that this table only contains one row for that AssetID.
I hope that gives you enough ideas of how to implement your various checks based on types. If you want/need to, you can now construct some views (through which the rest of your code will interact) which hides the extra columns and provides triggers to ease INSERT statements.
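As an example of that last idea (a sketch, with names assumed), a view over Servers plus an INSTEAD OF trigger lets callers insert a server without touching the Asset table themselves:
CREATE VIEW ServersView AS
    SELECT AssetID FROM Servers;
GO
CREATE TRIGGER TR_ServersView_Insert ON ServersView
INSTEAD OF INSERT AS
BEGIN
    -- Create the base Asset row with the fixed 'server' type (1)...
    INSERT INTO Asset (AssetID, AssetTypeID)
        SELECT AssetID, 1 FROM inserted;
    -- ...then the subtype row; its computed AssetTypeID matches by construction.
    INSERT INTO Servers (AssetID)
        SELECT AssetID FROM inserted;
END;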
On a side note, I'd recommend picking a convention and sticking to it when it comes to table naming. My preferred one is to use the plural/collective name, unless the table is only intended to contain one row. So I'd rename Asset as Assets, for example, or Asset_IP as Asset_IPs. At the moment, you have a mixture.
I want to create a DB where each table's PK will be a GUID that is unique across the whole DB.
Example: my DB name is 'LOCATION', and I have 3 tables: 'CITY', 'STATE' and 'COUNTRY'.
I want all 3 tables to have the same PK field, a GUID, whose values are unique across the DB.
How do I do this in SQL Server? I have never used SQL Server before, so a brief explanation would be helpful.
create table CITY (
ID uniqueidentifier not null primary key default newid(),
.
.
.
)
Repeat for the other tables.
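Hypothetical usage (Name stands in for whatever columns the table actually has):
-- The column default supplies the GUID:
INSERT INTO CITY (Name) VALUES ('Oslo');
-- Or generate one explicitly:
INSERT INTO CITY (ID, Name) VALUES (NEWID(), 'Bergen');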
What do you mean exactly?
Just create the tables, add an Id field to each one, set the data type of the Id field to 'uniqueidentifier', and you're good to go.
Next, add a primary key constraint on those columns, and make sure that, when inserting a new record, you assign a new GUID to that column (for instance, by using the newid() function).
I can't think of any good reason to have a unique value shared by 3 tables. Why not just give each table a unique index with a foreign key reference? Indexed fields are queried more quickly than random numbers would be.
I would create a 'Location' table with foreign keys CityId, StateId and CountryId to link them logically.
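A sketch of that linking table (column names assumed, keeping the GUID keys from the question):
CREATE TABLE Location (
    LocationId int identity(1,1) PRIMARY KEY,
    CityId uniqueidentifier NOT NULL REFERENCES CITY (ID),
    StateId uniqueidentifier NOT NULL REFERENCES STATE (ID),
    CountryId uniqueidentifier NOT NULL REFERENCES COUNTRY (ID)
);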
edit:
If you are adding a unique id across the City, State and Country tables then why not just have them as fields in the same table? I would have thought that your reason for splitting them into 3 tables was to reduce duplication in the database.