What is RDBMS and database engine? [closed]

It's kind of a noob question, but what is the difference between a Relational Database Management System and a database engine?
Thanks.

The original idea of an RDBMS differs from what is called an RDBMS these days. SQL DBMSs are commonly called RDBMSs, but it's more correct to say they can be used mostly relationally, if one has the knowledge and discipline. It's also possible to use them in the style of a network data model or even inconsistently, which seems like the more common attitude in the field.
The essence of the relational model is not about tables, but about first-order logic. Tables are simply a general-purpose data structure which can be used to represent relations. For example, a graph can be viewed in a relational way - as a set of ordered pairs - and can be represented as a table (with some rules to ensure the table is interpreted or manipulated correctly). By describing all data using domains, relations, dependencies and constraints, we can develop declarative consistency guarantees and allow any reasonable question to be answered correctly from the data.
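As an illustrative sketch of the graph example (table and column names are my own, using SQLite), a directed graph is just a set of ordered pairs, i.e. a binary relation, and declarative questions can be asked against it:

```python
import sqlite3

# A directed graph stored as a relation of ordered pairs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edge (
        src TEXT NOT NULL,
        dst TEXT NOT NULL,
        PRIMARY KEY (src, dst)   -- a set: no duplicate pairs
    )
""")
conn.executemany("INSERT INTO edge VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("a", "c")])

# A declarative question against the relation: which nodes are
# reachable from 'a' in exactly two hops?
two_hops = conn.execute("""
    SELECT DISTINCT e2.dst
    FROM edge e1 JOIN edge e2 ON e1.dst = e2.src
    WHERE e1.src = 'a'
""").fetchall()
print(two_hops)  # [('c',)]
```

The query is stated in terms of the relation alone; nothing about pointers or traversal order leaks into it.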
A database engine is software that handles the data structure and physical storage and management of data. Different storage engines have different features and performance characteristics, so a single DBMS could use multiple engines. Ideally, they should not affect the logical view of data presented to users of the DBMS.
How easily you can migrate to another DBMS / engine depends on how much they differ. Unfortunately, every DBMS implements somewhat different subsets of the SQL standard, and different engines support different features. Trying to stick to the lowest common denominator tends to produce inefficient solutions. Object-relational mappers reintroduce the network data model and its associated problems which the relational model was meant to address. Other data access middleware generally don't provide a complete or effective data sublanguage.
Whatever approach you choose, changing it is going to be difficult. At least there's some degree of overlap between SQL implementations, and queries are shorter and more declarative than the equivalent imperative code, so I tend to stick to simple queries and result sets rather than using data access libraries or mappers.

A relational database management system (RDBMS) is a database management system (DBMS) based on the relational model, wherein you can create many tables and have relations between them. A database engine, on the other hand, is the underlying software component that a DBMS uses to perform operations on a database.
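As a minimal sketch of "tables with relations between them" (names are mine, using SQLite), a foreign key is the relation, and the DBMS, not the application, enforces it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)  -- the relation between tables
    )
""")
conn.execute("INSERT INTO author VALUES (1, 'C. J. Date')")
conn.execute("INSERT INTO book VALUES (1, 'An Introduction to Database Systems', 1)")

# The DBMS itself enforces referential integrity: this insert is rejected
# because author 99 does not exist.
try:
    conn.execute("INSERT INTO book VALUES (2, 'Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```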


What is the difference between data model and database model? [closed]

I have read many things about data models and database models, and there are different views: some say both are the same, others say the data model is the basis for the database model, but I am still confused.
What is a data model (with an example), what is a database model (with an example), and what are the differences between the two?
I am confused by the following explanation:
The database design/model stores the structure of the data and the links/relationships between data that should be stored to meet the users' requirements. The database design is stored in the database schema, which is in turn stored in the data dictionary. A data model is a set or collection of constructs used for creating a database and producing designs for the databases.
There are a few components of a data model:
1. Structure: the structures that can be used to store the data are those provided by the data model.
2. Manipulation language: to use a certain model, data manipulations are performed using a specific language. This specific language is called the data manipulation language.
3. Integrity constraints: the rules which ensure the correctness of data in the database and keep the database in a usable state, so that correct information is portrayed by the database design.
If someone was to say "Data Model" to me I would assume they are talking about a data structure internal to the program most likely with respect to some Model/View approach (e.g. MVC, MVVM), so more focused on providing data for User Interface and service consumption and responding to changes to that data usually from the User Interface and services.
For Database Model I would assume they are looking at how they store this data within their database. Usually this is divided into a logical design, where the data is organised as per the database paradigm (e.g. relational) and then this leads to a physical design, which takes into account the limitations of the DB tech, as well as optimizations they want to include.
The classical definition of Data Model (at least in the context of database design) is a set of abstraction mechanisms used to represent a part of reality in order to build a database. For instance, in the Entity-Relationship Data Model one can represent reality with Entities (weak and strong) and Relationships among them; in the Object-Oriented Data Model one can represent reality through Objects and the related mechanisms of Aggregation (an object is an aggregate of simple properties and other objects), Class (a class is a set of objects having the same type) and Inheritance; in the Relational Data Model (the model adopted by relational database systems) reality is represented through tables (or, more correctly, relations) with keys, foreign keys and other types of constraints, etc.
On the other hand, the term Database Model usually names the model of a reality built with a specific Data Model; in other words, it corresponds to a particular schema in a certain Database Management System, representing a specific reality (i.e. the result of the design of a certain database). For instance, in a Database Model for a university, you have the entities Students, Courses and Faculty, with several associations among them, each with a certain set of attributes.
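As a toy sketch of such a university Database Model expressed in the Relational Data Model (table and column names are mine, using SQLite), the Students and Courses entities and the many-to-many association between them become three relations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- the many-to-many association between the two entities
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        course_id  INTEGER REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO course  VALUES (10, 'Databases');
    INSERT INTO enrollment VALUES (1, 10), (2, 10);
""")

# Who is enrolled in course 10?
rows = conn.execute("""
    SELECT s.name FROM student s
    JOIN enrollment e ON e.student_id = s.id
    WHERE e.course_id = 10 ORDER BY s.name
""").fetchall()
print(rows)  # [('Ada',), ('Alan',)]
```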

I want to move data from SQL Server DB to HBase/Cassandra etc. How to decide which big data database to use? [closed]

I need to develop a plan to move data from a SQL Server DB to one of the big data databases. Some of the questions that I have thought of are:
How big is the data?
What is the expected growth rate for this data?
What kind of queries will be run frequently? E.g. look-up, range-scan, full-scan, etc.
How frequently will the data be moved from source to destination?
Can anyone help add to this questionnaire?
Firstly, how big the data is doesn't matter! This point can barely be used to decide which NoSQL DB to use, as most NoSQL DBs are built for easy scalability and storage. So what matters is the queries you will fire rather than how much data there is. (Unless, of course, you intend to use it to store and access very small amounts of data, which would be a little expensive in many NoSQL DBs.) Your first question must be: why consider NoSQL at all? Can't an RDBMS handle it?
The expected growth rate is a considerable parameter, but again not decisive, since most NoSQL DBs support storage of large amounts of data without scalability issues.
The most important one in your list is What kind of queries will be run?
This matters most, since an RDBMS stores data as tuples, and it's easier to select tuples and output them with smaller amounts of data. It's faster at executing SELECT * queries (thanks to its row-wise storage). But in the NoSQL world, most DBs are columnar, i.e. column-oriented DBMSs.
Row-oriented systems: as data is inserted into the table, it is assigned an internal ID, the rowid, which is used internally by the system to refer to the data. The records thus have sequential rowids independent of any user-assigned key (such as an empid).
Column-oriented systems: a column-oriented database serializes all of the values of a column together, then the values of the next column, and so on.
Comparisons between row-oriented and column-oriented databases are typically concerned with the efficiency of hard-disk access for a given workload, as seek time is incredibly long compared to the other bottlenecks in computers.
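A toy in-memory illustration of the two physical layouts for the same logical table (the data is invented):

```python
# Row-oriented: each record stored contiguously.
rows = [
    (1, "alice", 30),
    (2, "bob",   25),
    (3, "carol", 35),
]

# Column-oriented: each column's values stored contiguously.
columns = {
    "id":   [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "age":  [30, 25, 35],
}

# A full-record lookup favours the row layout: one contiguous read.
record = rows[1]                                      # (2, 'bob', 25)

# An aggregate over one column favours the column layout: it touches
# only that column's array, not every field of every row.
avg_age = sum(columns["age"]) / len(columns["age"])   # 30.0
```

On disk, the same asymmetry shows up as sequential reads versus seeks, which is exactly the workload trade-off described above.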
How frequently will the data be moved/accessed? is again a good question, as accesses are costly, and a few NoSQL DBs are very slow the first time a query is fired (e.g. Hive).
Other parameters you may consider are :
Are updates of rows (data in the table) required? (Hive has problems with updates; you usually have to delete and insert again.)
Why are you using the database? (Search, derive relationships or analytics, etc) What type of operations would you want to perform on the data?
Will it require relationship searches, as in the case of Facebook's DB (Presto)?
Will it require aggregations?
Will it be used to relate various columns to derive insights (i.e. analytics)?
Last but a very important one: do you want to store the data on HDFS (Hadoop Distributed File System) as files, in your DB's specific storage format, or something else? This is important since your processing depends on how your data is stored — whether it can be accessed directly or needs a query call, which may be time-consuming, etc.
A couple more pointers:
The type of NoSQL DB that suits your requirements, i.e. key-value, document, column-family or graph databases.
The CAP theorem, to decide which is more critical among Consistency, Availability and Partition tolerance.

Are there any design patterns for bitemporal NoSQL databases? [closed]

I'm curious if anyone has implemented or even knows of any bitemporal databases built on NoSQL platforms (e.g., riak).
I don't know of any NoSQL datastore that is specifically designed to handle temporal data.
In order to put the valid and transaction time periods onto data in Riak you would need to either:
Wrap your documents/values with a structure that can hold metadata like:
{
  "meta": {
    "valid": ["2001-11-08", "2001-11-09"],
    "transaction": ["2011-01-29 10:27:00", "2011-01-29 10:28:00"]
  },
  "payload": "This is the actual document/value I want to store!"
}
Or create a "meta-document" for each document and use Riak Links to link them up.
I think this is a little cleaner, but if you need to retrieve these times often, this method may be too slow.
If you want to retrieve documents by time then I don't think Riak (or any other key/value datastores that I know of) will be the right datastore to use. SQL or possibly some BigTable system may be your only good option.
I have written a small bitemporal, open source database layer based on Mongodb:
https://github.com/1123/bitemporaldb
When storing Scala or Java objects, the object is wrapped into a generic bitemporal object with bitemporal meta-information (valid time, transaction time). Subsequently it is serialized to json and stored as BSON in MongoDB.
It handles temporal and non-temporal updates to objects transparently. Search by bitemporal context is possible.
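A rough sketch of the wrapping idea (hypothetical names throughout; this is not the bitemporaldb API, just the general shape of wrapping a payload with bitemporal metadata before serializing it for the store):

```python
import json
from datetime import datetime, timezone

def wrap_bitemporal(payload, valid_from, valid_to):
    """Wrap a plain object with bitemporal meta-information.

    `valid` is the business (valid) time interval; `transaction` is the
    system (transaction) time interval, left open-ended on insert.
    """
    now = datetime.now(timezone.utc).isoformat()
    return {
        "meta": {
            "valid": [valid_from, valid_to],
            "transaction": [now, None],   # None = still current
        },
        "payload": payload,
    }

doc = wrap_bitemporal({"name": "Alice", "dept": "R&D"},
                      "2001-11-08", "2001-11-09")
serialized = json.dumps(doc)  # what a driver would then store as BSON
```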
Document-oriented databases for bitemporal data are beneficial, since document oriented storage reduces the number of joins for data retrieval. Joins in a bitemporal context can be inefficient and hard to code by hand.
Feedback, contribution and feature-requests are very welcome.
To support a bitemporal (or temporal) DB model, you need ACID transactions to perform the proper DML to update and insert records on two time dimensions (valid/effective time and transaction/system time). See for details on temporal modeling.
Popular NoSQL databases like Cassandra, MongoDB and Couchbase, for example, don't have the ACID support needed to perform the record update/insert operations required for bitemporal record manipulation. In temporal and bitemporal databases, records must never overlap, and records must be properly terminated when superseded by succeeding valid/transaction-time records.
The MarkLogic NoSQL database claims support for bitemporal data, but I have never tried it and it is not open source. You can also roll your own solution by using an ACID database that effectively functions as a valid/transaction-time tracking journal and then using NoSQL for the actual data store. See the high-level description of this approach here.
From Wikipedia:
"Bitemporal data is a concept used in a temporal database. It denotes both the valid time and transaction time of the data.
In a database table bitemporal data is often represented by four extra table-columns StartVT and EndVT, StartTT and EndTT. Each time interval is closed at its lower bound, and open at its upper bound."
So you can't just put these four values onto your data?
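You can, as a sketch: the four columns from the Wikipedia description, with closed-open intervals, and an "as of" query over both time dimensions (table and data are invented; SQLite stands in for any relational store):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salary (
        emp     TEXT,
        amount  INTEGER,
        StartVT TEXT, EndVT TEXT,   -- valid time:       [StartVT, EndVT)
        StartTT TEXT, EndTT TEXT    -- transaction time: [StartTT, EndTT)
    );
    -- Recorded on 2020-01-01: the salary is 100 during all of 2020.
    INSERT INTO salary VALUES
      ('alice', 100, '2020-01-01', '2021-01-01', '2020-01-01', '9999-12-31');
""")

# Bitemporal "as of" query: what did the database believe on 2020-06-01
# about the state of the world on 2020-06-01? Both intervals are
# closed at the lower bound and open at the upper bound.
row = conn.execute("""
    SELECT amount FROM salary
    WHERE emp = 'alice'
      AND StartVT <= '2020-06-01' AND '2020-06-01' < EndVT
      AND StartTT <= '2020-06-01' AND '2020-06-01' < EndTT
""").fetchone()
print(row)  # (100,)
```

The hard part, as the answers above note, is not the four columns but atomically terminating and superseding intervals on update, which is where ACID transactions come in.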

When to use a key-value data store vs. a more traditional relational DB? [closed]

When would one choose a key-value data store over a relational DB? What considerations go into deciding one or the other? When is mix of both the best route? Please provide examples if you can.
Key-value, hierarchical, map-reduce and graph database systems are much closer to implementation strategies; they are heavily tied to the physical representation. The primary reason to choose one of these is if there is a compelling performance argument and it fits your data processing strategy very closely. Beware: ad-hoc queries are usually not practical for these systems, and you're better off deciding on your queries ahead of time.
Relational database systems try to separate the logical, business-oriented model from the underlying physical representation and processing strategies. This separation is imperfect, but still quite good. Relational systems are great for handling facts and extracting reliable information from collections of facts. Relational systems are also great at ad-hoc queries, which the other systems are notoriously bad at. That's a great fit in the business world and many other places. That's why relational systems are so prevalent.
If it's a business application, a relational system is almost always the answer. For other systems, it's probably the answer. If you have more of a data processing problem, like some pipeline of things that need to happen and you have massive amounts of data, and you know all of your queries up front, another system may be right for you.
If your data is simply a list of things and you can derive a unique identifier for each item, then a KVS is a good match. They are close implementations of the simple data structures we learned in freshman computer science and do not allow for complex relationships.
A simple test: can you represent your data and all of its relationships as a linked list or hash table? If yes, a KVS may work. If no, you need an RDB.
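A toy version of that test (data and names are invented): a plain hash table models the KVS case completely, and the thing it cannot express is exactly what pushes you toward an RDB:

```python
# A flat list of items with a derivable unique key: a hash table
# models it completely -- the KVS case.
sessions = {}
sessions["user:42"] = {"cart": ["book", "pen"], "ttl": 3600}
item = sessions["user:42"]            # O(1) lookup by key

# The moment you need "all sessions whose cart contains 'pen'",
# you are scanning every entry -- a relationship query the hash
# table has no support for.
with_pen = [k for k, v in sessions.items() if "pen" in v["cart"]]
```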
You still need to find a KVS that will work in your environment. Support for KVSes, even the major ones, is nowhere near what it is for, say, PostgreSQL and MySQL/MariaDB.
IMO, key-value pairs (e.g. NoSQL databases) work best when the underlying data is unstructured, unpredictable, or changing often. If you don't have structured data, a relational database is going to be more trouble than it's worth, because you will need to make lots of schema changes and/or jump through hoops to conform your data to the structure.
KVP / JSON / NoSql is great because changes to the data structure do not require completely refactoring the data model. Adding a field to your data object is simply a matter of adding it to the data. The other side of the coin is there are fewer constraints and validation checks in a KVP / Nosql database than a relational database so your data might get messy.
There are performance and space saving benefits for relational data models. Normalized relational data can make understanding and validating the data easier because there are table key relationships and constraints to help you out.
One of the worst patterns I've seen is trying to have it both ways. Trying to put key-value pairs into a relational database is often a recipe for disaster. I would recommend using the technology that suits your data foremost.
If you want O(1) lookups of values based on keys, then you want a KV store. Meaning, if you have data of the form k1={foo}, k2={bar}, etc, even when the values are larger/ nested structures, and want fast lookups, you want a KV store.
Even with proper indexing, you cannot achieve O(1) lookups in a relational DB for arbitrary keys. Sometimes this is referred to as "random lookups".
Alternatively stated: if you only ever query by one column, a "primary key" if you will, to retrieve the rest of the data, then using that column as a keyspace and the rest of the data as the value in a KV store is the most efficient way to do lookups.
In contrast, if you often query the data by any of several columns, aka you support a richer query API for the data, then you may want a relational database.
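To make the contrast concrete (names are mine; a Python dict and SQLite stand in for a KV store and a relational DB):

```python
import sqlite3

# KV style: constant-time lookups, but only ever by the one key.
kv = {"k1": ("alice", "R&D"), "k2": ("bob", "Sales")}
value = kv["k2"]                      # O(1), by key only

# Relational style: the same data, queryable by any column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (k TEXT PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(k, *v) for k, v in kv.items()])
conn.execute("CREATE INDEX emp_dept ON emp(dept)")  # the richer query API

sales = conn.execute(
    "SELECT name FROM emp WHERE dept = ?", ("Sales",)).fetchall()
print(sales)  # [('bob',)]
```

With the dict, answering "who is in Sales?" means scanning every value; with the table, it's one indexed query.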
A traditional relational database has problems scaling beyond a point. Where that point is depends a bit on what you are trying to do.
All (most?) of the suppliers of cloud computing are providing key-value data stores.
However, if you have a reasonably sized application with a complicated data structure, then the support that you get from using a relational database can reduce your development costs.
In my experience, if you're even asking the question whether to use traditional vs esoteric practices, then go traditional. While esoteric practices are sexy, challenging, and fun, 99.999% of applications call for a traditional approach.
With regards to relational vs KV, the question you should be asking is:
Why would I not want to use a relational model for this scenario: ...
Since you have not described the scenario, it's impossible for anyone to tell you why you shouldn't use it. The "catch-all" reason for KV is scalability, which isn't a problem now. Do you know the rules of optimization?
1. Don't do it.
2. (For experts only) Don't do it now.
KV is a highly optimized solution to scalability that will most likely be completely unnecessary for your application.

Data Model for Workflow/Business Process Application [closed]

What should the data model for a workflow application be? Currently we are using an Entity-Attribute-Value (EAV) based model in SQL Server 2000, with users able to create dynamic forms (on ASP.NET), but as the data grows, performance degrades, reports become hard to generate, and it gets worse when too many users query the EAV data concurrently.
As you have probably realized, the problem with an EAV model is that tables grow very large and queries grow very complex very quickly. For example, EAV-based queries typically require lots of subqueries just to get at the same data that would be trivial to select if you were using more traditionally-structured tables.
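A toy illustration of that subquery tax (schema and data are invented; SQLite stands in for SQL Server): reconstructing one logical row from an EAV table takes a correlated subquery per attribute, where a conventional table would need a single plain SELECT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eav (entity INTEGER, attr TEXT, value TEXT);
    INSERT INTO eav VALUES
      (1, 'name', 'Widget'), (1, 'price', '9.99'),
      (2, 'name', 'Gadget'), (2, 'price', '19.99');
""")

# One subquery per attribute just to rebuild what a normal table
# would hold as a single row.
row = conn.execute("""
    SELECT e.entity,
           (SELECT value FROM eav
             WHERE entity = e.entity AND attr = 'name')  AS name,
           (SELECT value FROM eav
             WHERE entity = e.entity AND attr = 'price') AS price
    FROM (SELECT DISTINCT entity FROM eav) e
    WHERE e.entity = 1
""").fetchone()
print(row)  # (1, 'Widget', '9.99')
```

Every additional attribute adds another subquery (or self-join), which is why EAV queries grow complex so quickly.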
Unfortunately, it is quite difficult to move to a traditionally-structured relational model while simultaneously leaving old forms open to modification.
Thus, my suggestion: consider closing changes on well-established forms and moving their data to standard, normalized tables. For example, if you have a set of shipping forms that are not likely to change (or whose change you could manage by changing the app because it happens so rarely), then you could create a fixed table and then copy the existing data out of your EAV table(s). This would A) improve your ability to do reporting, B) reduce the amount of data in your existing EAV table(s) and C) improve your ability to support concurrent users / improve performance because you could build more appropriate indices into your data.
In short, think of the dynamic EAV-based system as a way to collect users' needs (they tell you by building their forms) and NOT as the permanent storage. As the forms evolve into their final form, you transition to fixed tables in order to gain the benefits discussed above.
One last thing. If all of this isn't possible, have you considered segmenting your EAV table into multiple, category-specific tables? For example, have all of your shipping forms in one table, personnel forms in a second, etc. It won't solve the querying structure problem (needing subqueries) but it will help shrink your tables and improve performance.
I hope this helps - I do sympathize with your plight as I've been in a similar situation myself!
Typically, when your database schema becomes very large and multiple users are trying to access the same information in many different ways, data warehousing is applied to reduce the major load on the database server. Unlike a traditional schema, where you are more than likely using normalization to preserve data integrity, a data warehouse is optimized for speed, and multiple copies of your data are stored.
Try using the relational model of data. It works.
