I'm working on a blockchain explorer where two nodes are running: one inserts data from the blockchain into a Postgres database, and the other fetches it via API calls.
The database has now grown past a million records, and a single API call takes too long, especially when I have to sort and fetch the latest records.
I have found that Redis could be a good option, but I don't know how to get the latest records from Postgres into Redis.
Any idea how I can fetch the latest records quickly?
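A minimal sketch of one way to wire this up, assuming the ingest node can write to Redis right after each Postgres insert; the key name, the size cap and the block_height field used as the score are assumptions, not details from your setup:

    import json

    import redis

    r = redis.Redis(host="localhost", port=6379)

    LATEST_KEY = "latest_records"   # sorted set holding only the newest rows
    MAX_CACHED = 10_000             # keep at most this many entries in Redis

    def cache_record(record):
        """Called by the ingest node right after the Postgres INSERT."""
        member = json.dumps(record)
        # Score by block height (or a timestamp) so ZREVRANGE returns newest first.
        r.zadd(LATEST_KEY, {member: record["block_height"]})
        # Trim the set so it never grows past MAX_CACHED members.
        r.zremrangebyrank(LATEST_KEY, 0, -(MAX_CACHED + 1))

    def latest_records(limit=50):
        """Called by the API node: newest records without touching Postgres."""
        return [json.loads(m) for m in r.zrevrange(LATEST_KEY, 0, limit - 1)]

The API node can then serve "latest" requests straight from the sorted set and only fall back to Postgres for older pages or anything not in the cache.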
We have an ETL process which ingests data every 5 minutes from different source systems (AS/400, Oracle, SAP, etc.) into our SQL Server database, and from there we ingest data into an Elasticsearch index every 5 minutes so that both stay in sync.
I want to tighten the timeframe to seconds rather than 5 minutes, and I want to make sure they are in sync at all times.
I am using a control log table to make sure the Elastic ingestion and the SSIS ETL are not running at the same time, since otherwise we might go out of sync. This is a very poor solution and does not allow me to achieve near-real-time data capture.
I am looking for a better solution to sync the SQL Server database and the Elastic index in near real time rather than doing it manually.
Note: I am currently using Python scripts to pump the data from SQL Server to the Elastic index.
One approach would be to have an event stream coming out of your database, or even directly out of the SSIS package run (which might actually be simpler to implement), that feeds directly into your Elasticsearch index. ELK handles streaming log files, so it should handle an event stream pretty well.
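A rough sketch of that idea in Python, since you already pump data with Python scripts; the change table, index name and connection details below are assumptions. The SSIS package (or a trigger) appends each change to a table, and a small tail loop bulk-indexes whatever is new:

    import time

    import pyodbc
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch("http://localhost:9200")
    conn = pyodbc.connect("DSN=warehouse")      # hypothetical DSN
    cursor = conn.cursor()

    last_id = 0  # high-water mark; persist it between runs in practice

    while True:
        # 'etl_changes' is a hypothetical table the SSIS run appends to:
        # (change_id, doc_id, payload)
        cursor.execute(
            "SELECT change_id, doc_id, payload FROM etl_changes "
            "WHERE change_id > ? ORDER BY change_id",
            last_id,
        )
        rows = cursor.fetchall()
        if rows:
            actions = [
                {"_index": "warehouse", "_id": row[1], "_source": {"payload": row[2]}}
                for row in rows
            ]
            bulk(es, actions)        # ship the new rows in one bulk request
            last_id = rows[-1][0]
        time.sleep(1)                # poll every second instead of every 5 minutes

Each pass only ships the rows added since the last high-water mark, so the index trails the database by about a second instead of the 5-minute batch window.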
I'm looking for the best database for my big data project.
We are collecting data from some sensors. Every row has about one hundred columns.
Every day we store several million rows.
The most common query is retrieving data for one sensor over a date range.
At the moment I use a Percona MySQL cluster. When I ask for data over a range of a few days, the response is fast. The problem is when I ask for a month of data.
The database is perfectly optimized, but the response time is not acceptable.
I would like to replace the Percona cluster with a database able to run queries in parallel on all the nodes to improve response time.
With Cassandra I could partition data across nodes (maybe based on the current date), but I have read that Cassandra cannot read data across partitions in parallel, so I would have to create a query for every day (I don't know why).
Is there a database that manages sharded queries automatically, so I can distribute data across all nodes?
With Cassandra, if you split your data across multiple partitions, you can still read data from several partitions in parallel by executing multiple queries asynchronously.
The Cassandra drivers help you handle this; see execute_concurrent in the Python driver.
Moreover, the Cassandra driver is aware of the data partitioning: it knows which node holds which data. So when reading or writing, it chooses an appropriate node to send the query to, according to the driver's load balancing policy (specifically with the TokenAwarePolicy).
Thus, the client acts as a load balancer, and your request is processed in parallel by the available nodes.
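A minimal sketch of this with the Python driver, assuming data is partitioned per sensor per day; the contact points, keyspace, table and column names are placeholders:

    from datetime import date, timedelta

    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    cluster = Cluster(
        ["10.0.0.1", "10.0.0.2"],
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()),
    )
    session = cluster.connect("sensors")            # hypothetical keyspace

    # Assumed partition key: (sensor_id, day) -- one partition per sensor per day.
    query = session.prepare(
        "SELECT * FROM readings WHERE sensor_id = ? AND day = ?"
    )

    sensor_id = 42
    params = [(sensor_id, date(2023, 1, 1) + timedelta(days=i)) for i in range(31)]

    # One query per daily partition, all executed concurrently; the token-aware
    # policy routes each one to a replica that owns that partition.
    results = execute_concurrent_with_args(session, query, params, concurrency=31)

    rows = []
    for success, result in results:
        if success:
            rows.extend(result)

A one-month read thus becomes 31 small partition reads that the cluster serves in parallel, rather than one large query handled by a single coordinator.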
I have an application set up using MySQL as the backend with about 130 tables; the total size is currently 30-40 GB and growing fast.
Our DB is well optimized, but we believe that performance is taking a hit due to the size of the database.
I need to implement a process to archive data. After a little reading I found that I could push all archivable data to Hadoop. What I need to know is: is there any way I can hit Hadoop directly to retrieve that data from my backend (CodeIgniter, CakePHP, Django, etc.)? Thanks
I think you could try Apache Sqoop: http://sqoop.apache.org/
Sqoop 1 was originally designed for moving data from relational databases to Hadoop. Sqoop 2 is more ambitious and aims to move data between any two sources.
I need to merge data from an MS SQL server and a REST service on the fly. I have been asked not to store the data permanently in the MS SQL database as it changes periodically (caching would be OK, I believe, as long as the cache time is adjustable).
At the moment, I query for data, then pull the joined data from a memory cache. If the data is not in the cache, I call the REST service and store the result in the cache.
This can be cumbersome and slow. Are there any patterns, applications or solutions that would help me solve this problem?
My thought is that I should move the cached data to a database table, which would speed up joins, and have the application periodically refresh the data in that table. Any thoughts?
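For reference, a rough sketch of what I have in mind (the endpoint, DSN, table and column names are just placeholders):

    import time

    import pyodbc
    import requests

    conn = pyodbc.connect("DSN=appdb")   # hypothetical DSN
    REFRESH_SECONDS = 300                # adjustable cache lifetime

    def refresh_rest_cache():
        """Pull the REST data into a local table so the joins happen in SQL Server."""
        items = requests.get("https://api.example.com/items", timeout=30).json()
        cursor = conn.cursor()
        cursor.execute("TRUNCATE TABLE rest_cache")   # hypothetical staging table
        cursor.executemany(
            "INSERT INTO rest_cache (item_id, name, price) VALUES (?, ?, ?)",
            [(i["id"], i["name"], i["price"]) for i in items],
        )
        conn.commit()

    while True:
        refresh_rest_cache()
        time.sleep(REFRESH_SECONDS)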
You can try Denodo. It allows connecting multiple data sources and has a built-in caching feature.
http://www.denodo.com/en
My requirement is simple: I need to store a large amount of data (around 50k records) in a single call and fetch that data in a single call too. Which NoSQL category would be best? Also, that category should work well with Microsoft Azure.