Redis database cache in Laravel 5.6

I want to cache my query results. I read about Cache::remember in Laravel, but it takes a time parameter and I don't want to set an expiry time for my Redis cache.
I need something that caches my queries and, whenever the underlying data is updated, refreshes the cached results accordingly.
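For reference, this is roughly what I do today; Cache::remember forces me to pick a TTL (60 minutes below), and the Post model and cache key are just placeholders:

    <?php

    use Illuminate\Support\Facades\Cache;
    use App\Post; // placeholder model

    // Current approach: TTL-based caching (the 60 is minutes in Laravel 5.6).
    $posts = Cache::remember('posts.recent', 60, function () {
        return Post::where('published', true)->latest()->take(20)->get();
    });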
What's your recommendation?

Storing the full collection of Eloquent models in Redis can be slower than expected.
In my case, I had to build a nested select with a lot of WHERE, COUNT, JOIN, GROUP BY and ORDER BY clauses, etc.
It consumed a lot of resources on every request, so I tried to cache the full result. That was not the best solution, because it was about four times slower than I wanted (200+ ms per response).
The solution was to run only the SELECT id FROM ... part of the "huge" query and store the IDs in Redis. After that, every request runs just SELECT * FROM <table> WHERE id IN (...); (re-order the data in the SQL query if necessary).
This way the required data can be fetched quickly from Redis and SQL combined, and the average response time is under 50 ms.
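A minimal Laravel sketch of the idea (the model, cache key and query are placeholders, and Cache::rememberForever assumes you flush the key yourself whenever the underlying data changes):

    <?php

    use Illuminate\Support\Facades\Cache;
    use App\Post; // placeholder model

    // Cache only the IDs produced by the expensive query.
    $ids = Cache::rememberForever('posts.huge-query.ids', function () {
        return Post::query()
            ->join('comments', 'comments.post_id', '=', 'posts.id')
            ->where('posts.published', true)
            ->groupBy('posts.id')
            ->orderByRaw('COUNT(comments.id) DESC')
            ->pluck('posts.id');              // SELECT posts.id FROM ... (the "huge" query)
    });

    // Every request then runs only the cheap lookup and restores the original order.
    // (Guard against an empty $ids collection in real code.)
    $posts = Post::whereIn('id', $ids)
        ->orderByRaw('FIELD(id, ' . $ids->implode(',') . ')') // MySQL-specific re-ordering
        ->get();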
I hope this will help.

There is a very good library for this, but the authors warn that it is only compatible with Laravel 5.8. If you can upgrade, this is the way to go. If upgrading Laravel is not an option, you can at least read the code and follow the same approach they took.
https://github.com/GeneaLabs/laravel-model-caching
This library does exactly what you need: you can cache your models and/or custom queries, and the cache is invalidated whenever a model is updated, created or deleted.
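From its README, wiring it up is essentially just a trait on the model (double-check the exact class name against the version that matches your Laravel release):

    <?php

    namespace App;

    use GeneaLabs\LaravelModelCaching\Traits\Cachable;
    use Illuminate\Database\Eloquent\Model;

    class Post extends Model
    {
        // Queries through this model are cached automatically, and the cache is
        // flushed whenever a Post is created, updated or deleted.
        use Cachable;
    }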

Related

Idiomatic way to do many dynamic filtered views of a Flink table?

I would like to create a per-user view of data tables stored in Flink, which is constantly updated as changes happen to the source data, so that I can have a constantly updating UI based on a toChangelogStream() of the user's view of the data. To do that, I was thinking that I could create an ad-hoc SQL query like SELECT * FROM foo WHERE userid=X and convert it to a changelog stream, which would have a bunch of inserts at the beginning of the stream to give me the initial state, followed by live updates after that point. I would leave that query running as long as the user is using the UI, and then delete the table when the user's session ends. I think this is effectively how the Flink SQL client must work, so it seems like this is possible.
However, I anticipate that there may be some large overheads associated with each ad hoc query if I do it this way. When I write a SQL query, based on the answer in Apache Flink Table 1.4: External SQL execution on Table possible?, it sounds like internally this is going to compile a new JAR file and create new pipeline stages, I assume using more JVM metaspace for each user. I can have tens of thousands of users using the UI at once, so I'm not sure that's really feasible.
What's the idiomatic way to do this? The other ways I'm looking at are:
I could maybe use queryable state since I could group the current rows behind the userid as the key, but as far as I can tell it does not provide a way to get a changelog stream, so I would have to constantly re-query the state on a periodic basis, which is not ideal for my use case (the per-user state can be large sometimes but doesn't change quickly).
Another alternative is to output the table to both a changelog stream sink and an external RDBMS sink, but if I do that, what's the best pattern for how to join those together in the client?

Database cache (redis, memcache) usage, query vs. items

I'm wondering what the preferred way is to cache elements from a database with an in-memory cache, like redis or memcache. The context is that I have a table of items which are being accessed by an API, frequently (millions of times per second), as real-time stats. In general, the API is just looking for items in a given range of time, with a certain secondary id. The same data is likely to be hit many times. It seems like you could do it in a few ways:
Cache the entire query.
Meaning, the entire data string resulting from the real query to the Database would get stored in the cache, with a minimal query as the key. The advantage is that for frequently used queries, there is just a single access to get the entire set of results back. But any slightly different query needs to be redone and cached.
Cache the items in the query.
Meaning, each item returned from the real query gets stored individually in the cache, with a searchable id as the key. The advantage is that for slightly different queries you don't need to run the full query against the DB again, only fetch the elements that are not currently cached. (A rough sketch of this option follows after the question.)
Mirror the entire database
Meaning, each item is put into the cache as soon as it gets created/updated in the DB. The cache is always assumed to be up to date, and so all queries can just run on the cache directly.
It seems like these approaches might be better or worse in certain circumstances, but are there some pitfalls here that make some completely undesirable? Or just clearly better in this use-case?
Thanks for any advice!
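To make option #2 concrete, here is a rough sketch using PHP with Laravel's DB and Redis facades (the stats table, key scheme and column names are invented for illustration):

    <?php

    use Illuminate\Support\Facades\DB;
    use Illuminate\Support\Facades\Redis;

    // Cheap indexed query for the matching IDs only.
    $ids = DB::table('stats')
        ->where('secondary_id', $secondaryId)
        ->whereBetween('created_at', [$from, $to])
        ->pluck('id');

    // Try the cache first, one key per item.
    $keys   = $ids->map(function ($id) { return "stats:item:$id"; })->all();
    $cached = $keys ? Redis::mget($keys) : [];

    $items  = [];
    $misses = [];
    foreach ($ids->values() as $i => $id) {
        if (!empty($cached[$i])) {
            $items[$id] = json_decode($cached[$i], true);
        } else {
            $misses[] = $id;
        }
    }

    // Fetch only the rows that were not cached and backfill the cache.
    if ($misses) {
        foreach (DB::table('stats')->whereIn('id', $misses)->get() as $row) {
            $items[$row->id] = (array) $row;
            Redis::set("stats:item:{$row->id}", json_encode($row));
        }
    }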
Option #3, mirroring the entire database, is not a good idea. Also keep in mind that most in-memory systems like Redis don't have a query language; retrieval is based on keys. So it is not a good idea to replicate the data, especially if the data is relational.
You should use a combination of #1 and #2. Redis is key based, so you will have to design the keys around your query criteria. I would suggest building a small library that works on the concept of an ETag. In Redis, save the ETag and the query response. The library passes the ETag to the backend logic, which re-runs the query only if the ETag doesn't match. If the ETag matches, the backend does not re-run the query and the library returns the cached response from Redis to the client.
Refer to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag for the concept.
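A rough sketch of that ETag idea in PHP with Laravel's Redis facade (the helper, key names and query are hypothetical; the only requirement is that whatever writes to the table also bumps the ETag key):

    <?php

    use Illuminate\Support\Facades\DB;
    use Illuminate\Support\Facades\Redis;

    // Hypothetical helper: cache a query response together with the ETag of the
    // underlying data, and re-run the query only when the ETag has changed.
    function cachedQuery(string $cacheKey, string $etagKey, callable $runQuery)
    {
        $currentEtag = Redis::get($etagKey) ?: '0';    // bumped by writers on every change
        $stored      = Redis::hgetall($cacheKey);      // ['etag' => ..., 'body' => ...]

        if ($stored && $stored['etag'] === $currentEtag) {
            return json_decode($stored['body'], true); // ETag matches: serve cached response
        }

        $result = $runQuery();                         // ETag changed (or nothing cached): re-run
        Redis::hmset($cacheKey, ['etag' => $currentEtag, 'body' => json_encode($result)]);

        return $result;
    }

    // Writers do Redis::incr('stats:etag') after every insert/update/delete.
    $rows = cachedQuery('stats:q:last-hour:42', 'stats:etag', function () {
        return DB::table('stats')
            ->where('secondary_id', 42)
            ->where('created_at', '>=', now()->subHour())
            ->get()->toArray();
    });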

Preventing duplicates with MapReduce to BigQuery pipeline

I was reading the answer by Michael to this post here, which suggests using a pipeline to move data from datastore to cloud storage to big query.
Google App Engine: Using Big Query on datastore?
I want to use this technique to append data to a BigQuery table. That means I have to have some way of knowing whether the entities have been processed, so they don't get submitted to BigQuery repeatedly across MapReduce runs. I don't want to rebuild my table each time.
The way I see it, I have two options. I can put a flag on the entities and update it when each entity is processed, then filter them out on subsequent runs - or - I can save each entity to a new table and delete it from the source table. The second way seems superior, but I wanted to ask for opinions and see if there are any gotchas.
Assuming you have some stream of activity represented as entities, you can use query cursors to start one query where a prior one left off. Query cursors are perfect for the kind of incremental situation you've described, because they avoid the overhead of marking entities as having been processed.
I'd have to poke around a bit to see if App Engine MapReduce supports cursors (I suspect that it doesn't, yet).

Choosing a DB for a caching system

I am working on a financial database that I need to develop caching for. I have a MySQL database with a lot of raw, realtime data. This data is then provided over an HTTP API using Flask (Python).
Before the raw data is returned it is manipulated by my python code. This manipulation can involve a lot of data, therefore a caching system is in order.
The cached data never changes. For example, if someone queries data for the time range 2000-01-01 till now, the data gets manipulated, returned and stored in the cache as the specifically manipulated data from 2000-01-01 till now. If the same manipulated data is queried again later, the cache will provide the values from 2000-01-01 till the last time it was queried, eliminating the need for manipulation over that entire period. Then the new data from that point till now is manipulated and added to the cache too.
The data size shouldn't be enormous (under 5GB I would say at max).
I need to be able to retrieve from the cache using date ranges.
Which DB should I be looking at? MongoDB? Redis? CouchDB?
Thanks!
Using a Big Data solution for such a small data set seems like a waste and might still not yield the required latency.
It seems like what you need is not one of the Big Data solutions like MongoDB or CouchDB, but a distributed cache (or in-memory data grid).
One of the leading solutions, which seems like a perfect match for your needs (and which I'm a contributor to), is XAP Elastic Caching.
For more details see: http://www.gigaspaces.com/datagrid
And you can find a post describing exactly this case, on how you can use the data grid to scale MySQL: "Scaling MySQL" - http://www.gigaspaces.com/mysql

Retrieving information from aggregated weblog data, how to do it?

I would like to know how to retrieve data from aggregated logs? This is what I have:
- about 30GB of uncompressed log data loaded into HDFS daily (and this will soon grow to about 100GB)
This is my idea:
- each night this data is processed with Pig
- logs are read and split, and a custom UDF extracts fields like timestamp, url, user_id (let's say this is all I need) from each log entry and loads them into HBase (log data will be stored indefinitely)
Then if I want to know which users saw a particular page within a given time range, I can quickly query HBase without scanning the whole log data on each query (and I want fast answers - minutes are acceptable). And there will be multiple queries running simultaneously.
What do you think about this workflow? Do you think, that loading this information into HBase would make sense? What are other options and how do they compare to my solution?
I appreciate all comments/questions and answers. Thank you in advance.
With Hadoop you are always doing one of two things (either processing or querying).
For what you are looking to do, I would suggest using Hive http://hadoop.apache.org/hive/. You can take your data and create an M/R job to process and push it into Hive tables however you like. From there you can query out your data results as you like (you can even partition the data, which can help speed by not looking at data that isn't required, as you say). Here is a very good online tutorial: http://www.cloudera.com/videos/hive_tutorial
There are lots of ways to solve this, but it sounds like HBase is a bit of overkill unless you want to set up all the servers required for it to run as an exercise to learn it. HBase would be good if you have thousands of people simultaneously looking to get at the information.
You might also want to look into Flume, which is a new import server from Cloudera. It will get your files from some place straight into HDFS: http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3b2-flume/
