Cassandra - Datacenter data segregation

I am setting up an Apache Cassandra cluster and I want to restrict certain data to certain datacenters. I know I can limit where the data is stored via the replication settings, but that is not enough.
I have the keyspaces DC1DATA, DC2DATA, and ALL, and I want my DC1 data to be
A) stored only in DC1 (solved via the replication factor)
B) inaccessible from DC2 (you cannot run a select statement there, even as an admin user)
I also want both datacenters to have access to the "ALL" keyspace.
Can I do that somehow?
This is how I am setting up the keyspaces (the example has 1 node per datacenter, 2 nodes in total):
CREATE KEYSPACE dc1data
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'dc1' : 1
} ;
CREATE KEYSPACE dc2data
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'dc2' : 1
} ;
CREATE KEYSPACE all
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'dc1' : 1,
'dc2' : 1
} ;
but I can still connect to any node in DC1 and do
cqlsh> use dc2data;
cqlsh:dc2data> create table if not exists test (
name text,
lastname text,
primary key ((lastname),name)
);
cqlsh:dc2data> insert into test (name, lastname) values ('Homer','Simpson');
cqlsh:dc2data> select * from test;
lastname | name
----------+----------
Simpson | Homer
That is what I want to avoid: being able to see the dc2data keyspace from dc1 at all. Is that possible, even for admin users?

Related

How to determine account_id and region from a query?

I have datasets that are pulling from multiple Amazon RDS servers in multiple accounts, and I'd really like to be able to have the SQL Server instance tell me which account owns it and which region it lives in.
For example, this would be ideal when constructing ARNs on the fly:
SELECT id, 'arn:aws:quicksight:' + rdsadmin.dbo.get_region() +
':' + rdsadmin.dbo.get_account_id() + ':group/default/admin' AS groupArn
FROM my_rules_table
I've looked all over and I don't see a way to infer this information. I could create unique versions of those UDFs on every server with static values, but I'd really rather fetch the actual values dynamically.
EDIT:
Another way to think about my request is that I want to do in Amazon RDS what I can do in all my other EC2 instances:
read -r account_id region <<< $(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r '. | "\(.accountId) \(.region)"')
echo "arn:aws:quicksight:$region:$account_id:group/default/admin"
This is just a workaround because Amazon QuickSight has certain requirements on the supported SQL features used to fetch data.

I was unable to find the information exposed from Amazon RDS for Microsoft® SQL Server®, so I created a table to hold this information in each RDS instance:
CREATE TABLE rds_instance (
id UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()),
account_id VARCHAR(20) NOT NULL,
environment VARCHAR(20) NOT NULL,
region VARCHAR(20) NOT NULL,
active BIT NOT NULL DEFAULT(0),
PRIMARY KEY (id)
);
The values for account_id, environment, and region can be plugged in and used where needed. A copied database can be programmatically modified for its new placement:
UPDATE rds_instance SET active = 0;
INSERT INTO rds_instance (account_id, environment, region, active)
VALUES ('12341234123', 'stage', 'us-southwest-7', 1);
The instance information can be used to produce ARNs in a query like so:
SELECT u.fkid, 'arn:aws:quicksight:' + ri.region + ':' + ri.account_id +
':user/' + ri.environment + '_ns/' + u.username AS userArn
FROM users AS u
JOIN rds_instance AS ri ON (ri.active = 1)
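The same concatenation can be sketched outside SQL, for example to sanity-check the values stored in rds_instance before a dataset relies on them. This is only an illustration: the function name is hypothetical, the region and account id are the placeholder values from the INSERT above, and "jdoe" is a made-up username.

```python
def quicksight_user_arn(region: str, account_id: str, environment: str, username: str) -> str:
    """Build a QuickSight user ARN with the same layout as the SQL query above."""
    return f"arn:aws:quicksight:{region}:{account_id}:user/{environment}_ns/{username}"


# Placeholder values taken from the INSERT statement above; "jdoe" is hypothetical.
example_arn = quicksight_user_arn("us-southwest-7", "12341234123", "stage", "jdoe")
print(example_arn)
```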

Is it safe to drop the local database in mongodb?

Sometimes when we drop a database from mongodb, not all the data is removed from the local database if replication is enabled. I wanted to know if it is safe to drop the local database.

By dropping the local database you "de-initialize" the Replica Set, i.e. afterwards you need to run rs.initiate() to get a running Replica Set.
However, you may drop the local database only when your node is running in Maintenance Mode!

The local database on replica set or sharded cluster members contains metadata for the replication process, but it is not itself replicated. If you check its contents, you will see that the main consumer is the oplog.rs collection, which by default takes 5% of the free disk space on the partition, so on a big partition the capped oplog collection occupies more space. The good news is that, starting with version 3.6, you can resize the oplog manually with this command:
db.adminCommand({replSetResizeOplog: 1, size: 990})
which limits the oplog collection to 990 MB (the minimum allowed size of oplog.rs).
Dropping the local database is generally not recommended.
In your case it looks like you have a 400 GB partition and MongoDB automatically capped oplog.rs at 20 GB.
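The 20 GB figure follows from the default sizing rule. Here is a rough sketch of that arithmetic, assuming the WiredTiger default of 5% of free disk space with a 990 MB lower bound and a 50 GB upper bound. The function name is hypothetical, and the 400 GB input is the partition size guessed above.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def default_oplog_size_bytes(free_disk_bytes: int) -> int:
    """Estimate the default oplog size: 5% of free disk, clamped to [990 MB, 50 GB]."""
    five_percent = free_disk_bytes * 5 // 100
    return max(990 * MB, min(five_percent, 50 * GB))


# Assumed 400 GB of free space, as in the case discussed above.
print(default_oplog_size_bytes(400 * GB) / GB)
```

For very small or very large partitions the bounds take over, which is why the oplog never shrinks below 990 MB by default.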
If you try to drop the database while replica set mode is active, you will get an error:
rs1:PRIMARY> use local
switched to db local
rs1:PRIMARY> db.runCommand( { dropDatabase: 1 } )
{
"operationTime" : Timestamp(1643481374, 1),
"ok" : 0,
"errmsg" : "Cannot drop 'local' database while replication is active",
"code" : 20,
"codeName" : "IllegalOperation",
"$clusterTime" : {
"clusterTime" : Timestamp(1643481374, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs1:PRIMARY>
Dropping only the oplog.rs collection is also not possible in replication mode:
rs1:PRIMARY> db.oplog.rs.drop()
uncaught exception: Error: drop failed: {
"ok" : 0,
"errmsg" : "can't drop live oplog while replicating",
"$clusterTime" : {
"clusterTime" : Timestamp(1643482576, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1643482576, 1)
} :
_getErrorWithCode#src/mongo/shell/utils.js:25:13
DBCollection.prototype.drop#src/mongo/shell/collection.js:713:15
#(shell):1:1
rs1:PRIMARY>
So if you still want to drop it, you will need to restart the member as a standalone (without replication active).
The following is the content of a typical local database (v4.4 in this example):
> use local
switched to db local
> show collections
oplog.rs
replset.election
replset.initialSyncId
replset.minvalid
replset.oplogTruncateAfterPoint
startup_log
system.replset
system.rollback.id
>
And this is how you can drop it once the member is running as a standalone:
> use local
switched to db local
> db.runCommand( { dropDatabase: 1 } )
{ "dropped" : "local", "ok" : 1 }
>
Bear in mind that after dropping the database all local replication info will be lost. If the member was a SECONDARY before being restarted in standalone mode, there will be no issues: after restarting in replication mode the member gets its configuration from the PRIMARY, so the local database is recreated with all its collections.
If the member was the PRIMARY and no other members are available to seed from, the replication info is lost and you will need to rs.initiate() the replica set once again.

Why do I get a 'select active warehouse' error in dbt when trying the table materialization, but not with the view materialization?

I've been working with dbt for a couple of months now, so still fairly new to it. When running a test model, I have no problems when using the view materialization:
{{ config(materialized='view') }}
select 1 as id
Resulting in:
15:30:25 | 1 of 1 START view model dbt.stg_CampaignTableTest.................... [RUN]
15:30:26 | 1 of 1 OK created view model dbt.stg_CampaignTableTest............... [SUCCESS 1 in 1.48s]
However, when I make the switch to a table materialization I get an error message about not having an active warehouse selected in Snowflake:
{{ config(materialized='table') }}
select 1 as id
Resulting in:
15:32:52 | 1 of 1 START table model dbt.stg_CampaignTableTest................... [RUN]
15:32:53 | 1 of 1 ERROR creating table model dbt.stg_CampaignTableTest.......... [ERROR in 1.22s]
Database Error in model stg_CampaignTableTest (models/test/stg_CampaignTableTest.sql)
000606 (57P03): No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command.
Of course, it's not possible to include a "use warehouse" statement within my test model as it is inserted into the compiled SQL at the wrong position:
{{ config(materialized='table') }}
use warehouse "AnalysisTeam_WH";
select 1 as id
Because it leads to:
2021-10-07T15:33:59.366279Z: On model.my_new_project.stg_CampaignTableTest: /* {"app": "dbt", "dbt_version": "0.21.0", "profile_name": "user", "target_name": "default", "node_id": "model.my_new_project.stg_CampaignTableTest"} */
create or replace transient table "AnalysisTeam"."dbt"."stg_CampaignTableTest" as
(
use warehouse "AnalysisTeam_WH";
2021-10-07T15:33:59.366342Z: Opening a new connection, currently in state closed
2021-10-07T15:34:00.163673Z: Snowflake query id: 019f7386-3200-ec67-0000-464100e189fa
2021-10-07T15:34:00.163803Z: Snowflake error: 001003 (42000): SQL compilation error:
syntax error line 4 at position 0 unexpected 'use'.
I appear to have the correct permissions with my Snowflake 'role' to create tables, views, etc., so I was at a loss to understand why changing from view to table would cause the model to fail. I suspect it could be related to Snowflake permissions rather than a dbt issue but I am not sure. Any ideas would be really appreciated!
Edit: I appeared to make a mistake with my screenshots so I have switched to code snippets which is hopefully clearer.

I would suggest checking two possibilities. Creating a view only stores its definition, so Snowflake does not need a running warehouse for it, whereas a table materialization executes the select and does need one.
A. The active profile configuration in "~/.dbt/profiles.yml" (Snowflake profile): look for the 'warehouse:' entry.
my-snowflake-db:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: [account id]
      # User/password auth
      user: [username]
      password: [password]
      role: [user role]
      database: [database name]
      warehouse: [warehouse name]  # <<<<< check this entry
      schema: [dbt schema]
      threads: [1 or more]
B. The default warehouse of the user used for the connection, which can be set with ALTER USER:
SHOW USERS;
ALTER USER user_name SET DEFAULT_WAREHOUSE = '<existing_warehouse_name>';
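Check A can also be sketched as a few lines of code. This is only a rough illustration, not part of dbt: the function name is hypothetical, and the dictionary stands in for one already-parsed entry of profiles.yml (loading the YAML file itself is left out, and the role and target names are made up).

```python
def targets_missing_warehouse(profile: dict) -> list[str]:
    """Return the names of outputs in a dbt profile that do not set 'warehouse'."""
    outputs = profile.get("outputs", {})
    return [name for name, cfg in outputs.items() if not cfg.get("warehouse")]


# Hypothetical profile, shaped like one parsed entry of profiles.yml.
profile = {
    "target": "dev",
    "outputs": {
        "dev": {"type": "snowflake", "role": "analyst", "warehouse": "AnalysisTeam_WH"},
        "ci": {"type": "snowflake", "role": "analyst"},
    },
}

print(targets_missing_warehouse(profile))
```

A target reported here would fall back to the user's default warehouse (check B); if neither is set, the session has no active warehouse.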

Make sure the Snowflake role dbt is using has been granted access to the Snowflake warehouse dbt is using.
show grants on warehouse xxxxxxxx;

Should I create new table for every consistent data set or one systemCodes table?

Should I create a new table for every consistent data set, or one system codes table, when using Microsoft Entity Framework 6?
By "consistent data set" I mean things like:
+Agent (Table)
-Id
-Name
-Status (consistent data = available | busy | unavailable)
-Type (consistent data = reception | delivery | driver)
-Gender (consistent data = male | female)
-AddressId
+Address (Table)
-Id
-Description
-Longitude
-Latitude
-City (consistent data = [ .... any city .... ])
-State (consistent data = [ .... any state.... ])
My question is: should I have a table for each of Status, Type, Gender, etc., and link them to the "Agent" table using foreign keys and navigation properties?
Or should I just make one table like this:
+SystemCodeTable
-CodeId
-CodeParentId
-NameAr
-NameEn
-Description
and save all my consistent data in it, then assign the CodeId to the "Status", "Type", "Gender", etc. columns?
Thank you

I think that for Status and Type it is better to create StatusMaster and TypeMaster tables, respectively.
StatusMaster (StatusID):
0 - Available
1 - Busy
2 - Unavailable
Use StatusID in the Agent table; you can create TypeMaster the same way. For Gender you can store 'M' and 'F' or 'Male' and 'Female' directly in the Agent table.
Hope this helps.
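A minimal sketch of the StatusMaster layout suggested above, using an in-memory SQLite database rather than Entity Framework so that it runs on its own. The table and column names follow the answer; the sample agent rows and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE StatusMaster (StatusID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Agent (
        Id INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        StatusID INTEGER NOT NULL REFERENCES StatusMaster (StatusID)
    );
    INSERT INTO StatusMaster VALUES (0, 'Available'), (1, 'Busy'), (2, 'Unavailable');
    INSERT INTO Agent VALUES (1, 'Sample Agent', 1);
""")

# Resolve the status name through the lookup table.
row = conn.execute(
    "SELECT a.Name, s.Name FROM Agent a JOIN StatusMaster s ON s.StatusID = a.StatusID"
).fetchone()
print(row)


def unknown_status_is_rejected() -> bool:
    """Try to insert an agent with a StatusID that is not in StatusMaster."""
    try:
        conn.execute("INSERT INTO Agent VALUES (2, 'Other Agent', 99)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = unknown_status_is_rejected()
print(rejected)
```

The foreign key is what a dedicated lookup table buys: the database itself refuses a status value that does not exist.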

Entity Framework 6 Code First From Database context performs CREATE TABLE on an existing VIEW

We have a production Oracle database server maintained by our ERP partner.
For some custom development I need to connect to this Oracle database using Entity Framework 6. I have a user that can SELECT any table on the ERP schema and I create views in the schema/user used in my EF context.
The view itself is pretty straightforward: a few joins, all referencing tables in another schema, of course.
For example:
CREATE TABLE ERP.M_GROUP
(
FILE VARCHAR2(3 BYTE)
, MATFAM VARCHAR2(1 BYTE) NOT NULL
, GROUP VARCHAR2(20 BYTE) NOT NULL
, OMS1 VARCHAR2(60 BYTE)
, OMS2 VARCHAR2(60 BYTE)
, RESTW_FACTOR1_I NUMBER
)
CREATE VIEW EF6CTX.GROUPS AS
SELECT
GROUP Id,
MAX(OMS1) Name
FROM
M_GROUP
WHERE
FILE = 'BAT'
AND MATFAM IN ('B','C','I', 'K')
GROUP BY GROUP
When I connect to my database using Visual Studio's Entity Framework 6 Code First from Database, identifying as user EF6CTX, I can select this view and my model is created as it should be.
But when I try to read these groups:
var ctx = new TestContext();
ctx.Database.Log = Console.WriteLine;
foreach (var group in ctx.GROUPS)
{
Console.WriteLine("Group: {0}", group.NAME);
}
I get this result:
Opened connection at 21/11/2014 15:29:05 +01:00
Started transaction at 21/11/2014 15:29:05 +01:00
create table "EF6CTX"."GROUPS"
(
"ID" varchar2(20 CHAR) not null,
"NAME" varchar2(60 CHAR) null,
constraint "PK_GROUPS" primary key ("ID")
)
-- Executing at 21/11/2014 15:29:05 +01:00
-- Failed in 217 ms with error: ORA-01031: insufficient privileges
The user EF6CTX has no permission to create a table, of course. But why is it trying to create a table? It should use the existing view!

Fixed by disabling the database initializer, so that EF no longer tries to create the schema:
System.Data.Entity.Database.SetInitializer<TestContext>(null);
