How to define dynamic column families in Cassandra

Here it is said that no special effort is needed to get a dynamic column family. But I always get an exception when I try to set a value for an undefined column.
I created a column family like this:
CREATE TABLE places (
latitude double,
longitude double,
name text,
tags text,
PRIMARY KEY (latitude, longitude, name)
)
BTW: I had to define the tags column. Can somebody explain to me why? Maybe because all the other columns are part of the index?
Now when inserting data like this:
INSERT INTO places ("latitude","longitude","name","tags") VALUES (49.797888,9.934771,'Test','foo,bar')
it works just fine! But when I try:
INSERT INTO places ("latitude","longitude","name","tags","website") VALUES (49.797888,9.934771,'Test','foo,bar','test.de')
I get the following error:
Bad Request: Unknown identifier website
text could not be lexed at line 1, char 21
Which changes are needed so I can dynamically add columns?
I am using Cassandra 1.1.9 with CQL3, running cqlsh directly on a server.

CQL3 supports dynamic column families, but you have to alter the table schema first:
ALTER TABLE places ADD website varchar;
Check out the 1.2 documentation and CQL in depth slides
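If new columns show up often, the "alter first, then insert" step can be scripted on the client side. Here is a minimal sketch in Python, assuming you keep track of the known column names yourself. The helper name alter_statements_for and the default type text are made up for illustration, and the code only builds CQL strings; it does not talk to Cassandra.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def alter_statements_for(table, known_columns, row, default_type="text"):
    """Return the ALTER TABLE statements needed before `row` can be inserted.

    Hypothetical helper: it only builds strings and assumes every new
    column can use `default_type`.
    """
    statements = []
    for column in row:
        if column in known_columns:
            continue
        if not _IDENTIFIER.match(column):
            raise ValueError(f"not a plain identifier: {column!r}")
        statements.append(f"ALTER TABLE {table} ADD {column} {default_type};")
    return statements


# Columns from the CREATE TABLE in the question, plus the new "website" value.
known = {"latitude", "longitude", "name", "tags"}
row = {"latitude": 49.797888, "longitude": 9.934771, "name": "Test",
       "tags": "foo,bar", "website": "test.de"}
statements = alter_statements_for("places", known, row)
print(statements)
```

You would run the returned statements in cqlsh (or through your client library) before issuing the INSERT.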

CQL3 requires column metadata to exist. CQL3 is actually an abstraction over the underlying storage rows, so the mapping is not one-to-one. If you want to use dynamic column names (and there are lots of excellent use cases for them), use the traditional Thrift interface (via the client library of your choice). This will give you full control over what gets stored.

Related

Multiple column names in PutHBaseRecord Row Identifier Field Name property in NiFi

I have a simple flow that extracts rows from an Oracle table and puts them into HBase via NiFi.
To extract the data from the DB I am using the "QueryDatabaseTable" processor, and to put it into HBase I am using the "PutHBaseRecord" processor.
Usually I use whatever the primary key of my table is as the "Row Identifier Field" in PutHBaseRecord.
My problem arises when there is a composite primary key, because the Row Identifier Field property of the PutHBaseRecord processor does not accept multiple columns.
Any help with this would be much appreciated.
Thanks
Unfortunately this is not currently possible with PutHBaseRecord. It would require a code change to the processor to allow specifying multiple field names for the row id; the processor would then have to read those fields from each record and concatenate them to form the row id value.
It might be better to make the property be a record path expression that creates the row id. This way if you want a single value you just put something like '/field1' and if you wanted a composite value you'd do something like "concat('/field1', '/field2')".
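To make the idea concrete, this is roughly what such a composite row id computation would do per record. It is only an illustration in Python, not NiFi code: the function name, the field names and the "_" delimiter are all assumptions.

```python
def composite_row_id(record, field_names, delimiter="_"):
    """Join the named fields of one record into a single row id string."""
    missing = [name for name in field_names if record.get(name) is None]
    if missing:
        raise KeyError(f"record is missing key fields: {missing}")
    return delimiter.join(str(record[name]) for name in field_names)


# Hypothetical record with a two-column composite key.
record = {"order_id": 1001, "line_no": 3, "sku": "A-17"}
row_id = composite_row_id(record, ["order_id", "line_no"])
print(row_id)
```

Whatever delimiter is chosen should not occur inside the key values themselves, otherwise two different keys could produce the same row id.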

Dynamic table name vs array column in static table

I'm using PostgreSQL to coordinate a large scale simulation which involves initializing various components of my application via arrays of integers. In particular, I have the notion of a "Controller", where each controller requires a variable number of parameters to be initialized.
I have a job table which stores the controller_id and a controller_parameters foreign key for actually linking to the set of parameters we want. My idea to start with was to do the following:
Use the controller_id to dynamically choose a table name from which to select the initialization parameters. Each of these tables would have a controller_parameters column that links the actual initialization data to the source table.
Once we know the table, run a SELECT * FROM #someController_parameters_table p WHERE p.controller_parameters = controller_parameters LIMIT 1;
Put these into a custom type which has an integer array field to be returned to the client.
The main problem is that this has Dynamic SQL, which I hear is not a good thing to do.
My proposed change is to have a new table, let's say controller_parameters which has the columns (controller_id, parameters_id, parameters[]). The third column stores the initialization parameters for an individual controller and parameter set.
Pros of this design are that we move back into static SQL land, which is good. Cons are that, when I generate the actual parameters to insert to the individual parameters table, I usually use a cross join to get all of the permutations of the individual parameters, and insert them accordingly to individual tables. I personally don't know how to take a cross-joined table row and convert it to an int[], so that's a potential roadblock.
Thoughts?
You can use array_agg to take the result of a group and turn it into an array.
select controller_id, parameters_id, array_agg(parameter)
from ...
group by controller_id, parameters_id
Here are the PostgreSQL docs on aggregate functions:
https://www.postgresql.org/docs/9.5/static/functions-aggregate.html
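For the cross-join part of the question, it may help to see the target shape of the rows. The sketch below is plain Python rather than SQL, with made-up candidate values and a made-up controller id; it builds one (controller_id, parameters_id, parameters[]) row per combination, which is what the proposed controller_parameters table would hold.

```python
from itertools import product

# Hypothetical candidate values for three parameters of one controller.
candidates = [[1, 2], [10, 20], [100]]
controller_id = 7  # hypothetical

# One row per combination, i.e. the same result a cross join would give,
# with the combination collected into a single list (the int[] column).
rows = [
    (controller_id, parameters_id, list(combo))
    for parameters_id, combo in enumerate(product(*candidates), start=1)
]
for row in rows:
    print(row)
```

Inside PostgreSQL itself, the array constructor (ARRAY[a.value, b.value, c.value] in the select list of the cross join) should give the same per-row array without grouping.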

When creating a database - Is it advisable to create a column for data I can calculate based on other fields?

I am building an application for a lawyer, in which he can create a client portfolio. In this portfolio, there is the ID of the portfolio, the creation date, the client's name, telephone, etc.
Besides these fields, there is another one: "portfolio name". This field contains some information about the client taken from the other fields, as formatted text.
So, for example, if:
ID = 271
client_name = "John Doe"
creation_date = 18/02/2016
the portfolio_name will be 271/John Doe/18022016.
Now, since the portfolio_name does not really contain new data, but only formatted data from other fields, should it really exist in the database table as a column? Is that data duplication or not?
This is a textbook violation of 1NF and should generally be avoided. It's acceptable in some cases -- for example, where the calculated value is very difficult or time-consuming to obtain. However, since string concatenation is so simple (you can even do it right in your query, without the definition of any pseudo-fields) I wouldn't ever recommend doing this unless the field simply contains an initial default value and the client has the ability to customize it later. Otherwise, it will eventually become inconsistent. E.g., what happens when a client's name changes?
It depends on the size of the table and how you query it.
If the table is large, you can create a column for the calculated field so that it is easy to query.
If the table is small, you can calculate the value in the query.
Most database engines allow you to create a computed column for this exact purpose. Depending on the engine and how you set up the computed column, it may or may not be saved to disk, but it is guaranteed to always be up to date. The nice thing is that you can treat it like a read-only column.
https://technet.microsoft.com/en-us/library/ms191250%28v=sql.105%29.aspx
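Where computed columns are not available, a view gives a similar guarantee. Below is a small sketch using Python's sqlite3 module; the table and view names are made up, and it assumes the creation date is stored as ISO 8601 text so that strftime can reformat it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE portfolio (
        id            INTEGER PRIMARY KEY,
        client_name   TEXT NOT NULL,
        creation_date TEXT NOT NULL  -- ISO 8601, e.g. 2016-02-18
    );
    -- The name is derived on read, so it cannot drift from its sources.
    CREATE VIEW portfolio_with_name AS
    SELECT id, client_name, creation_date,
           id || '/' || client_name || '/' ||
           strftime('%d%m%Y', creation_date) AS portfolio_name
    FROM portfolio;
""")
conn.execute("INSERT INTO portfolio VALUES (271, 'John Doe', '2016-02-18')")


def portfolio_name(portfolio_id):
    """Read the derived name for one portfolio from the view."""
    row = conn.execute(
        "SELECT portfolio_name FROM portfolio_with_name WHERE id = ?",
        (portfolio_id,),
    ).fetchone()
    return row[0]


before = portfolio_name(271)
conn.execute("UPDATE portfolio SET client_name = 'John Q. Doe' WHERE id = 271")
after = portfolio_name(271)
print(before)
print(after)
```

Because the name is never stored, renaming the client changes the portfolio name automatically. If the name is supposed to stay fixed after creation, that is an argument for storing it instead.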

Multiple elements in one database cell

The question is which database design I should apply in this situation:
main table:
ID | name | number_of_parameters | parameters
parameters table:
parameter | parameter | parameter
The number of elements in a parameters table does not change. The number_of_parameters cell defines how many parameters tables should be stored in the next cell.
I have trouble moving from object thinking to database design. In object terms, one row has as many parameters as number_of_parameters says.
I hope the description of the requirements is clear. What is the correct way to design such a database? If someone can provide some SQL statements to achieve it, that would be nice, but the main goal of this question is to understand how to build such an architecture.
I want to use SQLite to create this database.
The relational way is to have two tables. The main table has an ID, name and as many other universally-present parameters as possible. The parameters table contains a mapping from an ID in the main table to a parameter name and a parameter value; the main table ID should be a foreign key, and the combination of ID and name should be unique.
The number of parameters can be found by just counting the number of rows with a particular ID.
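Since the question asks for SQL statements and mentions SQLite, here is a minimal sketch of that two-table layout using Python's sqlite3 module. The table names item and item_parameter, the column names and the sample values are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE item (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE item_parameter (
        item_id INTEGER NOT NULL REFERENCES item(id),
        name    TEXT NOT NULL,
        value   TEXT,
        UNIQUE (item_id, name)  -- one value per parameter name per item
    );
""")

conn.execute("INSERT INTO item (id, name) VALUES (1, 'first')")
conn.executemany(
    "INSERT INTO item_parameter (item_id, name, value) VALUES (?, ?, ?)",
    [(1, "width", "10"), (1, "height", "20"), (1, "depth", "5")],
)

# number_of_parameters is derived by counting rows, not stored.
count = conn.execute(
    "SELECT COUNT(*) FROM item_parameter WHERE item_id = ?", (1,)
).fetchone()[0]
print(count)

# A second "width" for the same item violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO item_parameter VALUES (1, 'width', '11')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

With this layout an item can have any number of parameters, and the count can never disagree with the rows actually stored.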
If you can serialize the data when saving it to the database and deserialize it when you read the record back, that will work too. You can take the total number of objects in the serialized container and save that count in the number_of_parameters field, and the serialized data in the parameters field.
There isn't one perfect correct way, but if you want to use a relational database, you preferably have relational tables.
If you have a key-value database, you place your serialized data as a document attached to your key.
If you want a hybrid solution, both human editable and single table, you can serialize your data to a human-readable format such as yaml, which sees heavy usage in configuration sections of open source projects.

"/" in the Database column name

I have a database with a column named "State/Province". All the queries and data transfers work properly, but in the "SelectedValue" property of the dropdownlist control, the bind expression throws an error.
When I edit the column name by removing the slash, it works well.
So is using a slash in a column name not a proper way of naming?
Basically using anything different than:
Letters
Numbers (not at start of the column name)
Underscore (_)
is not recommended as it is not a good way to name fields and some datasources might throw errors on other characters.
Some good points about Column Naming convention:
Avoid underscores, they look unnatural and slow the reader down.
Never use a column name that requires [ ]. Shame on Microsoft for excessive use of ID which requires the use of a table qualifier.
Use Proper Case, descriptive names and don't abbreviate.
Name primary keys with a suffix that denotes its data type.
TableNameID for integer (the preferred choice for all primary keys).
TableNameCode for varchar.
TableNameKey (other data types).
Do not change the spelling of the primary key from a parent table when it's used in a child table.
Don't use acronyms unless they are well known by programmers or all employees of your company.
I know it's an old thread, but if you're not the designer of the table and fields and just want to use the data, I would suggest you use:
SELECT * FROM <YOUR TABLE NAME>
You will probably notice that SQL Server Management Studio returns a field name for your column like 'State_Province'.
This is the SQL fieldname that you can use in your queries
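Another option when you cannot rename the column is to quote it in the query and give it a plain alias, so the consuming code never sees the slash. The sketch below shows the idea with Python's sqlite3 module and a made-up address table; SQLite uses double quotes for such identifiers, whereas SQL Server would typically use [State/Province].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name

# The awkward column name has to be quoted wherever it is used in SQL.
conn.execute(
    'CREATE TABLE address (id INTEGER PRIMARY KEY, "State/Province" TEXT)'
)
conn.execute("INSERT INTO address VALUES (1, 'Bavaria')")

# The alias gives the result set a name that is safe for data binding.
row = conn.execute(
    'SELECT id, "State/Province" AS StateProvince FROM address'
).fetchone()
print(row.keys())
print(row["StateProvince"])
```

The control can then bind to StateProvince while the table itself stays unchanged.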

Resources