Best NoSQL to just insert and fetch massive data

My requirement is simple: I need to store massive data (roughly 50k records) in a single call and fetch that data in a single call too. Which NoSQL category would be best? That category should also be well supported on Microsoft Azure.

Related

How can I use Redis to fetch data from a large dataset?

I'm working on a blockchain explorer where two nodes are running: one to insert data from the blockchain into a Postgres database, and another to fetch it via API calls.
The database has now grown past a million records, and a single API call takes too much time to fetch data, especially when I have to sort and return the latest records.
I have found that Redis could be the best option, but I don't know how to place the latest records in Redis from Postgres.
Any idea how I can get the latest records fast?
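One common pattern (a minimal sketch, assuming a Postgres table named blocks with a block_time column, plus the redis and psycopg2 Python packages; all names here are hypothetical) is to mirror only the newest rows into a Redis sorted set scored by timestamp, and serve the "latest N" API from Redis alone:

    import json

    import psycopg2
    import redis

    PG_DSN = "dbname=explorer user=app"  # assumed connection details
    CACHE_KEY = "latest_blocks"
    CACHE_SIZE = 1000                    # how many recent records to keep hot

    r = redis.Redis()

    def refresh_cache():
        """Pull the newest rows from Postgres and mirror them into Redis."""
        with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
            # An index on block_time is what keeps this query cheap:
            #   CREATE INDEX ON blocks (block_time DESC);
            cur.execute(
                "SELECT id, block_time, payload FROM blocks "
                "ORDER BY block_time DESC LIMIT %s",
                (CACHE_SIZE,),
            )
            pipe = r.pipeline()
            pipe.delete(CACHE_KEY)
            for row_id, block_time, payload in cur.fetchall():
                member = json.dumps({"id": row_id, "payload": payload})
                pipe.zadd(CACHE_KEY, {member: block_time.timestamp()})
            pipe.execute()

    def latest(n=50):
        """Serve the API from Redis; the highest scores are the newest records."""
        return [json.loads(m) for m in r.zrevrange(CACHE_KEY, 0, n - 1)]

The inserter node can call refresh_cache() after each batch (or simply ZADD new rows directly and trim with ZREMRANGEBYRANK), so the read API never has to sort a million Postgres rows.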

Analytics along with OLTP Database

I have a primary use case where I want to have a transactional relational database for which I am using Postgres.
I also need to run frequent aggregate queries (count, sum, average) on the data. These statistics cannot be precomputed, as we have to support multiple search filters.
I was initially thinking of using Redshift as secondary storage to serve these queries, but then I would also need to build a system to keep the data in sync between the two stores.
Is there a better way to achieve this?
Take a look at AWS DMS; you can set it up to keep a near-real-time replica of your Postgres data on Redshift.
It is reliable and requires minimal maintenance (e.g. it copes when you add new columns to your source data).
Read both of these carefully, especially limitations and requirements.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html
and
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redshift.html
Unless you need them, I recommend excluding text (and other large object) columns from the sync. This can be done easily by setting a flag, or it can be tailored column by column.
The source Postgres database does not have to be held on AWS.
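For reference, a minimal sketch of that setup through boto3 (the ARNs are placeholders, and the endpoints and replication instance are assumed to already exist; SupportLobs is the flag mentioned above for dropping large-object columns):

    import json

    import boto3

    dms = boto3.client("dms")

    # SupportLobs: False excludes text/LOB columns from the sync wholesale;
    # per-column tailoring would go into the table mappings instead.
    task_settings = json.dumps({"TargetMetadata": {"SupportLobs": False}})

    table_mappings = json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public-schema",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }]
    })

    dms.create_replication_task(
        ReplicationTaskIdentifier="pg-to-redshift",
        SourceEndpointArn="arn:aws:dms:...:endpoint/source-postgres",   # placeholder
        TargetEndpointArn="arn:aws:dms:...:endpoint/target-redshift",   # placeholder
        ReplicationInstanceArn="arn:aws:dms:...:rep/replication-inst",  # placeholder
        MigrationType="full-load-and-cdc",  # initial copy, then ongoing changes
        TableMappings=table_mappings,
        ReplicationTaskSettings=task_settings,
    )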

Using a regular database as a data warehouse

Can anyone tell me what the implications are of attempting to use a regular database as a data warehouse?
I understand a data warehouse is known for storing data in a more structured manner, but what is the implication of using a standard database to achieve the same result? Can we not just create a regular database table with structured data, as it would reside in a data warehouse?
Data structure is not the issue - optimization is.
OLTP databases like SQL Server (SQLS) are optimized to reliably record transactions. They store data as rows and make extensive use of disk I/O.
BI databases like Redshift or Teradata are optimized to query data. They store data as columns and lean heavily on memory and compression to minimize disk I/O.
As a result, traditional databases are better at getting data in, while BI databases are better at getting data out (both platforms are trying to mitigate their weaknesses, so the difference is blurring).
Practically speaking, you can use a regular database like SQLS to build a data warehouse without any problems, unless your needs are special:
- Data size is large (billions of records)
- Refresh rate is high (hourly/minute/real time)
- You intend to use a live connection from BI tools like Tableau or Power BI (as opposed to loading a data extract into them)
- Your queries are highly complex and computationally intensive
You can also combine both platforms: import, process, integrate, and store data in a regular database, then convert it into a star schema (dimensional model) and publish it to a BI database (i.e., keep normalized data in SQLS and publish the star schema to Redshift).
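To make the "publish a star schema" step concrete, here is a minimal sketch of what the dimensional model on the BI side might look like (all table and column names are illustrative; run here with psycopg2 against a Postgres-compatible target):

    import psycopg2

    STAR_SCHEMA_DDL = """
    CREATE TABLE dim_product (
        product_key  INT PRIMARY KEY,  -- surrogate key
        product_name VARCHAR(200),
        product_line VARCHAR(100)      -- reporting-friendly grouping level
    );

    CREATE TABLE dim_date (
        date_key INT PRIMARY KEY,      -- e.g. 20240131
        month    INT,
        year     INT
    );

    CREATE TABLE fact_sales (
        product_key INT REFERENCES dim_product,
        date_key    INT REFERENCES dim_date,
        quantity    INT,
        amount      NUMERIC(12, 2)
    );
    """

    with psycopg2.connect("dbname=warehouse") as conn, conn.cursor() as cur:
        cur.execute(STAR_SCHEMA_DDL)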
If you intend to import data extracts into BI tools like Tableau or Power BI, then you can safely use any traditional database, because those tools rely on their internal engines, and using a BI database won't give you any advantages.
Data warehouses also tend to hold redundant or duplicate data, which is not really what you are looking for in a regular database.

Why doesn't an operational database meet business challenges the way a data warehouse does?

I have a question: why doesn't an operational database meet business challenges the way a data warehouse does?
In an operational database I can create detailed reports about any product or anything else, and I can issue statistical reports with charts and diagrams, so why can't the operational database be used as a data warehouse?
Best Regards
Usually an operational database only keeps track of the current state of each record.
The purpose of a data warehouse is two-fold:
- Keep track of historic events without overwhelming the operational database;
- Isolate OLAP queries so that they don't impact the load on the operational datastore.
If you try to query your operational data store for sales per product line per month for the past year, the number of joins required, as well as the amount of information you need to read from storage, may cause performance degradation on your operational database.
A data warehouse tries to avoid this by 1) keeping things separated and 2) denormalising the data model (Kimball approach) so that query plans are simpler.
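To illustrate that last point, here is a sketch of the same "sales per product line per month" report against both models (all table and column names are hypothetical): on the normalized OLTP side every grouping attribute lives in its own table, while on the Kimball side the fact table joins straight to a couple of wide dimensions.

    # Normalized OLTP model: a chain of joins over operational tables.
    OLTP_QUERY = """
    SELECT pl.name AS product_line,
           date_trunc('month', o.ordered_at) AS month,
           SUM(oi.quantity * oi.unit_price)  AS sales
    FROM order_items oi
    JOIN orders o         ON o.id  = oi.order_id
    JOIN products p       ON p.id  = oi.product_id
    JOIN product_lines pl ON pl.id = p.product_line_id
    WHERE o.ordered_at >= now() - interval '1 year'
    GROUP BY pl.name, date_trunc('month', o.ordered_at);
    """

    # Denormalized star schema: a simple star join plus an aggregate.
    STAR_QUERY = """
    SELECT p.product_line, d.year, d.month,
           SUM(f.amount) AS sales
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key    = f.date_key
    WHERE d.date_key >= 20240101  -- illustrative "past year" cutoff
    GROUP BY p.product_line, d.year, d.month;
    """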
I suggest reading The Data Warehouse Toolkit, by Ralph Kimball. The first chapter deals precisely with this question: why do we need a data warehouse if we already have an operational data store?
i can create reports in details about any product or any thing
and i can issue statistical reports with charts and diagrams
Yes, you can, but a business user cannot, because they don't know SQL. And it is very difficult to put a BI tool (for business users to use) over the top of an operational database, for many reasons:
The data model is not built for an end user to understand. A data warehouse data model is (i.e. there is ONE table for customers that holds everything about a customer, rather than it being split into addresses, accounts, etc.)
The operational data store is probably missing important reporting reference data such as grouping levels and hierarchies
A slowly changing dimension is a method of 'transparently' modelling changes to, for example, customers (see the sketch after this list). An operational data model generally doesn't do this very well: you need to understand all the tables and join them correctly, if this information is even stored
There are many other reasons, but these just serve to address your points
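For the slowly changing dimension point above, a minimal type-2 sketch (a hypothetical customer dimension; each change closes the old row and opens a new one, so history stays queryable):

    SCD2_DDL = """
    CREATE TABLE dim_customer (
        customer_key SERIAL PRIMARY KEY,  -- surrogate key, one per version
        customer_id  INT,                 -- business key from the OLTP system
        address      VARCHAR(300),
        valid_from   DATE,
        valid_to     DATE,                -- NULL means "current version"
        is_current   BOOLEAN
    );
    """

    # When a customer moves, the load job expires the old version and
    # inserts a new one; reports can join on is_current or on a date range.
    SCD2_LOAD = """
    UPDATE dim_customer
    SET valid_to = CURRENT_DATE, is_current = FALSE
    WHERE customer_id = %(customer_id)s AND is_current;

    INSERT INTO dim_customer (customer_id, address, valid_from, is_current)
    VALUES (%(customer_id)s, %(new_address)s, CURRENT_DATE, TRUE);
    """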
When you get to the point that you are too busy to service business users' requests, and you are issuing reports that don't match from one day to the next, you'll start to see the value of a data warehouse.

Need Suggestions: Utilizing columnar database

I am working on a highly performance-sensitive dashboard where the results are mostly aggregated data mixed with non-aggregated data. The first page is loaded by 8 different complex queries fetching mixed data. The dashboard is served by a centralized database (Oracle 11g) which receives data from many systems in real time (using a replication tool). The data shown is produced by very complex queries (multiple joins, counts, group-bys, and many where conditions).
The issue is that, as the data grows, the DB queries are taking more time than defined/agreed. I am thinking of moving the aggregated functionality (all the counts) to a columnar database, say HBase, while the rest of the row-level data is fetched from Oracle. The two result sets would be merged on a key at the app layer. I need expert opinions on whether this is the correct approach.
There are a few things which are not clear to me:
1. Will Sqoop be able to load data based on a query/view, or only tables? On a continuous basis, or one time only?
2. If a record is modified (e.g. its status changes), how will HBase get to know?
My two cents: HBase is a NoSQL database built for fast lookup queries, not for aggregated, ad-hoc queries.
If you are planning to use a Hadoop cluster, you can try Hive with the Parquet storage format. If you need near-real-time queries, you can go with an MPP database; a commercial option is Vertica, or maybe Redshift from Amazon. For an open-source solution, you can use Infobright.
These columnar options are going to give you great aggregate query performance.
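A minimal sketch of the Hive-with-Parquet route (assuming the pyhive package and a reachable HiveServer2; all table names are illustrative): keep a columnar copy of the aggregate-heavy data and point the dashboard's count queries at it, while the row-level data still comes from Oracle and is merged on a key in the application layer, as described above.

    from pyhive import hive  # pip install pyhive

    conn = hive.Connection(host="hive-server", port=10000)  # assumed host
    cur = conn.cursor()

    # Parquet storage is what provides the columnar scan speed for aggregates.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders_columnar
        STORED AS PARQUET
        AS SELECT * FROM orders_staging
    """)

    # The aggregated piece of the dashboard reads from the columnar copy.
    cur.execute("""
        SELECT status, COUNT(*) AS cnt
        FROM orders_columnar
        GROUP BY status
    """)
    print(cur.fetchall())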
