I'm designing an application where it would be really useful for one table to be NoSQL,
while the others stay SQL.
I have a table where I need to store multiple, unknown attributes and then be able to search on them. The rest of the DB tables are just simple relational tables.
Example
item
id
name
One item can have the attributes color and shape; another item can have the attributes height and width, but no others.
So it smells like NoSQL, but I do a lot more development with SQL, and I always want to choose the technology I know best.
I won't be needing a lot of selects by those attributes at the moment, so I will just add
an "attributes" field where I will keep all attributes as a JSON-encoded string.
And if I need to select anything by attributes, I will write a script for that.
But maybe there's a feature of MySQL (which is what I'm using as my RDBMS) that I don't know of? Or any better ideas?
I was also thinking of keeping a parallel MongoDB just for 'items', but I generally detest having
the same data in two places.
Does anyone know of a technology that is a relational DB with a NoSQL extension like this?
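(Worth noting: MySQL 5.7 and later do offer exactly this kind of extension: a native JSON column type, functions to query inside the document, and generated columns to index individual attributes. A minimal sketch, with illustrative table and attribute names:)

CREATE TABLE item (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(255) NOT NULL,
    attributes JSON  -- e.g. '{"color": "red", "shape": "round"}'
);

-- Search inside the JSON document ('->>' extracts and unquotes a value):
SELECT id, name FROM item WHERE attributes->>'$.color' = 'red';

-- A frequently searched attribute can be exposed as an indexed generated column:
ALTER TABLE item
    ADD COLUMN color VARCHAR(50) GENERATED ALWAYS AS (attributes->>'$.color') STORED,
    ADD INDEX idx_item_color (color);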
I ended up using an RDBMS with an attributes table (id, name, value), and it's doing a great job.
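For reference, a minimal sketch of that kind of attributes table and a search against it (anything beyond the question's id/name/value, such as the item_id link and the indexes, is an assumption):

CREATE TABLE item_attribute (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    item_id INT NOT NULL,           -- references item.id
    name    VARCHAR(100) NOT NULL,  -- e.g. 'color'
    value   VARCHAR(255) NOT NULL,  -- e.g. 'red'
    UNIQUE KEY uq_item_attr (item_id, name),
    KEY idx_attr_search (name, value)
);

-- Find items by attribute:
SELECT i.* FROM item i
JOIN item_attribute a ON a.item_id = i.id
WHERE a.name = 'color' AND a.value = 'red';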
I was working with R09 of Temenos T24, which had Oracle as the backend.
The table structure was two columns: RECID plus the data in a BLOB (XML format).
Has anyone got an idea whether the structure has been changed to a proper RDBMS structure in the newer T24 versions, such as R17 or R18?
Thank you for any help in advance!
The Temenos T24 core was built around the so-called "MultiValue database" UniVerse and then moved to jBASE around 2003. See https://en.wikipedia.org/wiki/MultiValue for an explanation of what a MultiValue database is.
Later, to add support for Oracle and other industry-standard "big" databases, Temenos developed a special DB driver for their system, designed to imitate the MultiValue database functionality inside the RDBMS. The solution was to use XML to store the multi-dimensioned fields. And so all T24 tables in Oracle have two columns:
RECID for the ID or Unique Key of the Record
XMLRECORD to store the data.
The XMLRECORD column is created as XMLTYPE by default, but can also be of BLOB or CLOB type. In that case the data is stored the way it used to be stored inside the old MultiValue database, i.e. as a string where fields are separated by field markers, value markers, and sub-value markers.
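Schematically, a T24 table in Oracle therefore looks something like this (the table name and column size here are purely illustrative):

CREATE TABLE FBNK_ACCOUNT (
    RECID     VARCHAR2(255) NOT NULL PRIMARY KEY,  -- unique key of the record
    XMLRECORD XMLTYPE                              -- the whole record as XML (or BLOB/CLOB)
);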
This basically means that T24 will never move to a proper RDBMS structure, as that would mean completely rewriting the whole T24 solution, or at least a significant part of it. Since T24 has been in development for 30-odd years now, you can imagine what it would take to perform such a task.
Working with R15 - still RECID + BLOB.
I'm quite sure that R18 is the same, as we're currently upgrading to R18 and no DB schema change is on the roadmap.
You can also select from the table views directly in the DB, e.g. SELECT * FROM V_FXXX_ACCOUNT. Those views present the record in relational form, so you can select just the fields you need.
Temenos also has a product called Relational Replication, aimed at providing selected tables from T24 in a relational format. All the multi-value / group multi-value elements become child tables, and sub-value elements go into further child tables with foreign keys, which makes them easier to index and query. They also have a data model viewer for T24 in Design Studio which gives you an idea of how these tables will be structured.
I'm trying to find the best data model to adapt a very big MySQL table to Cassandra.
This table is structured like this:
CREATE TABLE big_table (
    -- column types here are illustrative assumptions; the original omitted them
    social_id   text,
    remote_id   text,
    timestamp   timestamp,
    visibility  int,   -- 0, 1 or 2
    type        text,
    title       text,
    description text,
    -- ... other fields elided ...
    PRIMARY KEY ((social_id), remote_id, timestamp)
);
A page (not shown here) can contain many socials, which can contain many remote_ids.
social_id is the partition key; remote_id and timestamp are the clustering key: remote_id gives uniqueness, timestamp is used to order the results. So far so good.
The problem is that users can also search on their page contents, filtering by one or more socials, one or more types, visibility (could be 0,1,2), a range of dates or even nothing at all.
Plus, based on the filters, users should be able to set visibility.
I tried to handle this case, but I really can't find a sustainable solution.
The best I've got is to create another table, which I need to keep up with the original one.
This table will have:
page_id: partition key
timestamp, social_id, type, remote_id: clustering key
Plus, create a Materialized View for each combination of filters, which is madness.
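A sketch of that second table in CQL, following the key layout above (the table name and column types are assumptions):

CREATE TABLE big_table_by_page (
    page_id     text,
    timestamp   timestamp,
    social_id   text,
    type        text,
    remote_id   text,
    visibility  int,
    PRIMARY KEY ((page_id), timestamp, social_id, type, remote_id)
);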
Can I avoid creating the second table? What would be the best Cassandra model in this case? Should I consider switching to other technologies?
I'll start from the last questions.
> What would be the best Cassandra model in this case?
As stated in Cassandra: The Definitive Guide, 2nd edition (which I highly recommend reading before choosing or using Cassandra),
In Cassandra you don’t start with the data model; you start with the query model.
You may want to read the available chapter about data design at Safaribooksonline.com. Basically, Cassandra wants you to think about queries only and not care about normalization.
So the answer on
> Can I avoid creating the second table?
is: you shouldn't avoid it.
> Should I consider switching to other technologies?
That depends on what you need in terms of replication and partitioning. You may end up building master-master synchronization on top of an RDBMS, or something else. In Cassandra, you'll end up with duplicated data between tables, and that's perfectly normal for it: you trade disk space in exchange for read/write speed.
> how to filter and update a big table dynamically?
If, after all of the above, you still want to use a normalized data model in Cassandra, I suggest you look at secondary indexes first and then move on to custom indexes, such as a Lucene index.
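For example, a plain secondary index on one of the filter columns looks like this in CQL (the column choice and values are just an illustration; custom implementations such as Lucene-based indexes are created with CREATE CUSTOM INDEX instead):

CREATE INDEX big_table_type_idx ON big_table (type);

-- With the partition key restricted, the indexed column can then be filtered:
SELECT * FROM big_table WHERE social_id = 'some_social' AND type = 'photo';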
Please read my previous question first: T-SQL finding of exactly same values in referenced table
The main purpose of this question is to find out whether this approach to storing data is effective.
Maybe it would be better to get rid of the PropertyValues table and use an additional PropertyValues nvarchar(max) column in the Entities table instead. For example, instead of this table:
EntityId  PropertyId  PropertyValue
1         4           Val4
1         5           Val5
1         6           Val6
I could store such data in the PropertyValues column as "4:Val4;5:Val5;6:Val6".
As an alternative, I could store XML in the PropertyValues column...
What do you think about the best approach here?
[ADDED]
Please, keep in mind:
Set of properties must be customizable
Objects will have dozens of properties (approximately 20 to 120). The database will contain thousands of objects.
[ADDED]
Data in the PropertyValues table will be changed very often. Actually, I store configured products. For example, an admin configures that clothes have the attributes "type", "size", "color", "buttons type", "label type", "label location", etc. A user will then select values for these attributes in the system. So PropertyValues data cannot be effectively cached.
You will hate yourself later if you implement a solution using multi-value attributes (i.e. "4:Val4;5:Val5;6:Val6").
XML is marginally better because there are XQuery functions to help you pull out and parse the values. But the XML type is implemented as a CLR type in SQL Server and it can get extremely slow to work with.
The best solution to this problem is one like you already have. Use the sql_variant type for the column if it could hold any number of data types. Ideally you'd refactor this into multiple tables/entities so that the data type can be something more concrete.
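A minimal T-SQL sketch of that shape (the table and column names mirror the question; the keys are assumptions):

CREATE TABLE PropertyValues (
    EntityId      int NOT NULL,          -- references Entities
    PropertyId    int NOT NULL,          -- references Properties
    PropertyValue sql_variant NOT NULL,  -- holds int, nvarchar, datetime, ... per row
    CONSTRAINT PK_PropertyValues PRIMARY KEY (EntityId, PropertyId)
);

-- The underlying base type of each value can still be inspected:
SELECT PropertyValue,
       SQL_VARIANT_PROPERTY(PropertyValue, 'BaseType') AS BaseType
FROM PropertyValues
WHERE EntityId = 1;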
I work on a similar project (a web-shop generator). Every product has attributes and every attribute has a set of values; these live in separate tables. On top of this there are translations in several languages (so there are additional tables for attribute and value translations).
Why did we choose such a solution? Because every client needs a database with the same schema, and this schema is very elastic.
So what about this solution? As always, "it depends". :-)
Storage. If your values will be reused often across different products (e.g. clothes, where the attribute "size" and its values repeat often), your attribute/value tables will stay small. Meanwhile, if values are mostly unique rather than repeated (e.g. values of a "page count" attribute for books), you will get quite a big values table, where every value is linked to a single product.
Speed. This scheme is not the weakest part of the project, because this data changes rarely. And remember that you can always denormalize the database schema to prepare a DW-like solution. You can also use caching if the database part turns out to be slow.
Elasticity. This is the strongest part of the solution. You can easily add/remove attributes and values, and even move values from one attribute to another!
So the answer to your question is not simple. If you need an elastic schema with unknown attributes and values, you should use separate tables. And I suggest you avoid storing values in CSV strings; it is better to store them as XML (typed and indexed).
UPDATE
I think that PropertyValues will not change often compared with user orders. But if you have doubts, you can use denormalized tables or indexed views to speed things up. In any case, changing XML/CSV across a large number of rows will perform poorly, so the "separate table" solution looks good.
The SQL Customer Advisory Team (CAT) has a whitepaper written just for you: Best Practices for Semantic Data Modeling for Performance and Scalability. It goes through the common pitfalls of EAV modeling and recommends how to design a scalable EAV solution.
I'm trying to figure out how to implement this relationship in ColdFusion. Also, if anyone knows the name for this kind of relationship, I'd be curious to hear it.
I'm trying to create the brown table.
Recreating the table from the values is not the problem; the problem that I've been stuck on for a couple of days now is how to create an editing environment.
I'm thinking that I should have a table with all the Tenants and TenantValues (the TenantValues that match the TenantID I'm editing), including the empty values as well (the green table).
Any other suggestions?
The name of this relationship is the Entity-Attribute-Value model (EAV). In your case Tenant, TenantVariable, and TenantValues are the entity, attribute, and value tables, respectively. EAV is an attempt to allow for the runtime definition of entities, and in my experience is most often found backing content management systems. It has been referred to as an anti-pattern database model because you lose certain RDBMS advantages while gaining disadvantages, such as having to lock several tables on delete or save. Often a suitable persistence alternative is a NoSQL solution such as Couch.
As for edits, the paradigm I typically see is deleting all the value records for a given ID, inserting inside a loop, and then updating the entity table record. Do this inside a transaction to ensure consistency. The upshot of this approach is that it's much easier to implement than a delta-detection algorithm. Another option is the MERGE statement, if your database supports it.
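In T-SQL, that delete-and-reinsert pattern looks roughly like this (table and column names are assumptions based on the question's Tenant/TenantValues entities, and the LastModified column is hypothetical):

BEGIN TRANSACTION;

-- Remove all existing values for the tenant being edited ...
DELETE FROM TenantValues WHERE TenantID = @TenantID;

-- ... then re-insert the submitted values (in application code this
-- INSERT typically runs inside a loop over the form fields):
INSERT INTO TenantValues (TenantID, VariableID, Value)
VALUES (@TenantID, @VariableID, @Value);

-- Finally, touch the entity record itself (hypothetical column):
UPDATE Tenant SET LastModified = GETDATE() WHERE TenantID = @TenantID;

COMMIT TRANSACTION;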
You may want to consider an RDF triple store for this problem. It's an alternative to relational DBs that's particularly good for sparse categorical data. The data is represented as triples: directed graph edges consisting of a subject, an object, and the predicate that describes the property connecting them:
(subject) (predicate) (object)
Some example triples from your data set would look something like:
<Apple> rdf:type <Red_Fruit>
<Apple> hasWeight "1"^^xsd:integer
RDF triple stores provide the SPARQL query language to retrieve data from your store much like you would use SQL.
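For instance, a query pulling back every red fruit and its weight from triples like the ones above might look like this (the prefix declaration and property IRI are illustrative):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?item ?weight
WHERE {
    ?item rdf:type    <Red_Fruit> .
    ?item <hasWeight> ?weight .
}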
Is there a paradigm in which I can change a data-key name in one place and one place only, and have it properly be dealt with by both the application and database?
I have resorted most recently to using class constants to map to database field names, but
I still have to keep those aligned with the raw database keys.
What I mean is, using PHP as an example, right now I might use
$infoToUpdateUser[ User::FIELD_FIRST_NAME ]
This way, when the field name changes, I change it at the constant and don't have to search through the code for every reference to that field.
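(The constant itself is defined once on the class; the actual column name string here is an assumed example:)

class User
{
    // The single place where the raw column name lives:
    const FIELD_FIRST_NAME = 'first_name';
}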
Another area this crops up in is in referencing fields. Due to some early poor design decisions, I have, for example, these sorts of tables:
( table name : primary_key )
cats : cat_id
dogs : dog_id
parrots : bird_id (remember, poor design, thus the mismatch between parrots / bird_id)
lizards: lizard_id
etc
Then let's say I have a series of form classes that update records.
AnimalForm
DogForm extends AnimalForm
CatForm extends AnimalForm
ParrotForm extends AnimalForm
etc
Now I want to update a record in the SQL database using an update function in the parent class, AnimalForm, so I don't have to replicate code in 20 subclasses.
However I do not know of a way to generalize the update query, so currently each subclass has an idFieldName member variable, and the parent class inserts that into the query, like
"UPDATE " . $this->table . " SET <data> WHERE " . $this->idFieldName
It seems sloppy to do it this way but I can't think of a better solution at this point.
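For completeness, a sketch of that generalized parent-class update using PDO (the method signature is an assumption based on the question; since table and column names are interpolated into the SQL, they must come from trusted code, never user input):

abstract class AnimalForm
{
    protected string $table;        // e.g. 'parrots'
    protected string $idFieldName;  // e.g. 'bird_id'

    public function update(PDO $pdo, int $id, array $data): void
    {
        // Build "col1 = ?, col2 = ?" from the data keys
        $assignments = implode(', ', array_map(
            fn(string $col): string => "$col = ?",
            array_keys($data)
        ));

        $sql = "UPDATE {$this->table} SET $assignments WHERE {$this->idFieldName} = ?";
        $stmt = $pdo->prepare($sql);
        $stmt->execute([...array_values($data), $id]);
    }
}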
Is there a design model or paradigm that links together or abstracts data-key names to be shared as a reference by both a database and an application?
What you are looking for is called an Object-Relational Mapping layer.
An ORM separates the concerns of data access from business logic by mapping a relational database into an object model. Since the ORM does all the translation, if you change the name of a database table or column, you only have to tell the ORM once, and it will properly apply that change to all of your code.
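With Doctrine, a popular PHP ORM, the mapping for the mismatched parrots/bird_id example could hypothetically look like this; the rest of the code only ever calls getId() and never sees the column name:

use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
#[ORM\Table(name: 'parrots')]
class Parrot
{
    // The awkward column name is declared in exactly one place:
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column(name: 'bird_id', type: 'integer')]
    private int $id;

    public function getId(): int
    {
        return $this->id;
    }
}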
Since you indicate that you are using PHP, here is a question that addresses ORM libraries in PHP. Additional information about ORM technologies can be found in Wikipedia.