Redis cache and overall process - sql-server

I am having trouble understanding the start-to-finish call process of an application using Redis as a cache. Say an application has a DB such as SQL Server and is using Redis as a cache: how does the timing of that process work? To my understanding, Redis makes a call to the SQL DB to cache data, and the UI calls and consumes that data directly from Redis, therefore limiting the calls to the SQL DB. Typically, when does Redis make the call to the DB to get its data, and how does it keep the data in sync with the SQL DB? I am just trying to understand this at a high level. Thank you!

Redis is just a cache here: it provides a high-speed data store, whereas a database is slower but more durable and has more features. Redis does not talk to your database on its own; your application does all the coordination. The application is responsible for getting data from Redis and, if it doesn't find it there, pulling it from the database and adding it to Redis, with appropriate expiration and invalidation mechanics.

Let's walk through a simplified example: a hypothetical Python application that uses Redis to cache a user object. The user objects are cached in Redis under the key user:{id} as a Hash datatype. The system of record for the user data is a relational database; id is an integer primary key generated in the DB.
To fetch a user we would execute code something like this:
# r = redis.StrictRedis(host=hostname)
def get_user(r, id):
    key = "user:{}".format(id)
    user = r.hgetall(key)
    if not user:  # hgetall returns an empty dict, not None, on a cache miss
        user = fetch_user_from_db(id)
        r.hset(key, mapping=user)  # populate the cache on a miss...
        r.expire(key, 300)         # ...with an expiration (here, 5 minutes)
    return user
To write a user we would execute code similar to:
# r = redis.StrictRedis(host=hostname)
def update_user(r, user):
    key = "user:{}".format(user['id'])
    write_user_to_db(user)  # write to the system of record first...
    r.delete(key)           # ...then invalidate, so the next read re-caches fresh data
This simplified example leaves out many of the details of keeping your cache and DB consistent in a distributed environment, but for a single node this is the basic process: your app handles the details of caching, checking for a cache hit, and invalidating on write.

I myself have never come across a scenario where Redis calls a SQL DB. Since Redis is a key-value store that keeps its data in RAM, applications usually talk to it directly for caching; the SQL DB isn't involved in those lookups at all.
The greatest strength of Redis is that many operations return data in constant time, independent of how many elements are stored. This is why it's great for use cases like caching and locking mechanisms that need very quick response times.
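To illustrate the locking use case, here is a minimal sketch assuming redis-py; the lock: key prefix and the timeout are illustrative:

import redis

r = redis.Redis(host="localhost", port=6379)

def try_acquire_lock(name, timeout_seconds=10):
    # SET with nx=True and ex=... is atomic: the key is created only if it
    # does not already exist, and expires automatically so a crashed holder
    # cannot block other workers forever.
    return r.set("lock:" + name, "1", nx=True, ex=timeout_seconds)

def release_lock(name):
    r.delete("lock:" + name)

if try_acquire_lock("report"):
    try:
        pass  # do the work that must not run concurrently
    finally:
        release_lock("report")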

Related

Best way to return huge dataset where data can be analyzed asynchronously

This is a general question on handling huge sets of data returned for analysis.
I am using Python, but I don't think the programming language or DB server type matters for this.
I currently have a huge set of data whose retrieval takes about 20 minutes. I can start analyzing the data as soon as it arrives by using multiple threads/processes. The problem is waiting for the data.
I believe I could use paging for the data, but that still requires me to wait for the data, which is really the problem.
I could break the query into many separate calls, so I could have 10 calls going at once, each grabbing a different part of the table via a WHERE clause (see the sketch below).
Is there a better way? Again, I need to get the data from the table(s) to the application as fast as possible.
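As a rough sketch of that chunked, parallel fetch idea, assuming the table has an integer primary key so it can be split into id ranges; fetch_chunk is a hypothetical stand-in for the real DB call (e.g., via pyodbc), and each worker should use its own connection, since most drivers are not thread-safe across a shared connection:

from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(lo, hi):
    # Placeholder: in a real app this would run something like
    #   SELECT * FROM big_table WHERE id >= ? AND id < ?
    # on its own connection and return the rows.
    return list(range(lo, hi))

def fetch_all_parallel(min_id, max_id, workers=10):
    # Split [min_id, max_id) into roughly equal id ranges, one per worker.
    step = (max_id - min_id) // workers + 1
    ranges = [(lo, min(lo + step, max_id)) for lo in range(min_id, max_id, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Chunks come back in order, each as soon as it is ready, so
        # analysis can begin before the full set has arrived.
        for rows in pool.map(lambda bounds: fetch_chunk(*bounds), ranges):
            yield rows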
Also: when I query a database for data, what is it that really determines the speed at which the data returns?
Let's say the DB server and my app are on the same machine. Is there a limiting speed due to a single connection/request to the DB server?
If I am on an intranet, will the DB connection use as much bandwidth as possible when sending?
Do you know any intro tutorials on the performance of a database query (no WHERE clause, just returning all rows)? Will the query use the maximum connection speed and bandwidth?
Thanks; yes, I am new to DB considerations.

Real-time chats using Redis - how not to lose messages?

I want to implement real-time chat. My main DB is PostgreSQL, with the backend written in NodeJS. Clients will be mobile devices.
As far as I understand, to achieve real-time performance for messaging, I need to use Redis.
Therefore, my plan was to use Redis for the X most recent messages between 2 or more people (group chat), for example 1000, with everything synced and backed up in my main DB, which is PostgreSQL.
However, since Redis essentially keeps its data in RAM, the chat history can be too "vulnerable", owing to the volatile nature of storing data in RAM.
So if my Redis server has some unexpected and temporary failure, the recent messages in conversations would be lost.
What are the best practices nowadays to implement something like this?
Do I simply need to persist Redis data to disk? But in that case, wouldn't that hurt performance, since it would increase the write time for each message sent?
Or perhaps I should just prepare a recovery method that fetches the recent history from PostgreSQL in case my Redis chat history list is empty?
P.S. While we're at it, could you please suggest a good approach for maintaining the status of users (online/offline)? Is that done with Redis as well?
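For concreteness, here is a minimal sketch of the capped-list plan described above, assuming redis-py; the chat:{chat_id} key naming and the 1000-message cap are illustrative:

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def store_message(chat_id, message, cap=1000):
    key = "chat:{}".format(chat_id)
    # LPUSH + LTRIM keeps only the newest `cap` messages in Redis; the
    # full history still goes to PostgreSQL as the system of record.
    r.lpush(key, json.dumps(message))
    r.ltrim(key, 0, cap - 1)

def recent_messages(chat_id, count=50):
    key = "chat:{}".format(chat_id)
    raw = r.lrange(key, 0, count - 1)
    if not raw:
        # Recovery path: if Redis was restarted and the list is empty,
        # fall back to PostgreSQL and optionally re-warm the cache.
        return []  # e.g. fetch_recent_from_postgres(chat_id, count)
    return [json.loads(m) for m in raw]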
What are the best practices nowadays to implement something like this? Do I simply need to persist Redis data to disk? But in that case, wouldn't that hurt performance, since it will increase the write time for each message sent?
Yes, enabling persistence will impact the performance of Redis.
Your best bet is to run a quick benchmark with the expected IOPS and the types of operations your application performs, to identify the impact of enabling persistence.
RDB vs. AOF:
With RDB persistence enabled, the parent process does not perform the disk I/O itself: based on the configured save points, Redis forks a child process that writes the RDB snapshot.
However, depending on the save point configuration, you may lose the data written after the last save point if the server restarts or crashes before the next snapshot.
If your use case cannot tolerate data loss for that window, look at the AOF persistence method instead. AOF keeps a log of all write operations, which can be replayed to reconstruct the data when the server restarts.
AOF with the fsync policy set to every second is somewhat slower, but with fsync disabled it can be as fast as RDB.
Read about the trade-offs of RDB vs. AOF: https://redis.io/topics/persistence#redis-persistence
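As a rough illustration, the relevant redis.conf directives look something like this (the values are examples, not recommendations):

appendonly yes           # enable AOF persistence
appendfsync everysec     # fsync once per second: at most ~1 second of writes lost
# appendfsync always     # safest but slowest: fsync after every write
# appendfsync no         # fastest: leave fsync timing to the OS

save 900 1               # RDB snapshot if at least 1 change in 900 seconds
save 300 10              # RDB snapshot if at least 10 changes in 300 seconds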
P.S. While we're at it, could you please suggest a good approach for maintaining the status of users (online/offline)? Is that done with Redis as well?
Yes, Redis is a common choice for presence tracking.
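One common approach (a sketch, assuming redis-py; the online:{user_id} key name and TTL are illustrative) is to have each connected client refresh a short-lived key, so a user whose key has expired is considered offline:

import redis

r = redis.Redis(host="localhost", port=6379)

def heartbeat(user_id, ttl_seconds=60):
    # Called by the client's connection handler every ~30 seconds.
    r.set("online:{}".format(user_id), "1", ex=ttl_seconds)

def is_online(user_id):
    # The key silently expires if heartbeats stop, marking the user offline.
    return r.exists("online:{}".format(user_id)) == 1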

When is SQL Server as a distributed caching mechanism worthwhile?

I have 2 web servers, and I'm running into an issue where I need to prematurely expire (remove) a cached item. Since I'm currently using IMemoryCache, a Remove(key) call only removes the cached item from one server. I don't have the ability to leverage Redis, NCache, etc., but the app is already using SQL Server. I can easily set up distributed caching with a cache table, but it seems counter-intuitive, because what I'm caching is user data that I don't want to hit the database for on every call (e.g., I cache 50 items of user data every 5 minutes, which has cut out about 500 trips to the database). Is there something I'm missing which would make using SQL Server as my distributed cache backend actually beneficial?
It sounds like you are having the typical problem of cache invalidation and expiry. You can use a grid cache for distributed caching (e.g., Redis, Hazelcast), but that doesn't solve the invalidation problem. You may want to consider vendors like ScaleArc or Heimdall Data: they provide the caching logic, you choose the storage of choice (in-memory, Redis, etc.), and the product handles query caching and invalidation. There is a SQL Server blog post on it: https://www.itprotoday.com/industry-perspectives/reduce-sql-server-costs-heimdall-data-caching

Data consistency across multiple microservices, which duplicate data

I am currently trying to get into microservices architecture, and I have come across a data consistency issue. I've read that duplicating data between several microservices is considered a good idea, because it makes each service more independent.
However, I can't figure out what to do in the following case to provide consistency:
I have a Customer service which has a RegisterCustomer method.
When I register a customer, I want to send a message via RabbitMQ so other services can pick up this information and store it in their own DBs.
My code looks something like this:
...
_dbContext.Add(customer);
CustomerRegistered e = Mapper.Map<CustomerRegistered>(customer);
await _messagePublisher.PublishMessageAsync(e.MessageType, e, "");
//!! app crashes here, before SaveChanges persists the customer
_dbContext.SaveChanges();
...
So I would like to know: how can I handle the case where the application sends the message but is then unable to save the data itself? Of course, I could swap the DbContext save and the PublishMessage call, but the problem would still be there. Is there something wrong with my data storing approach?
Yes. You are doing dual persistence: persistence in the DB and in a durable queue. If one succeeds and the other fails, you'll always be in trouble. There are a few ways to handle this:
Persist in the DB and then do Change Data Capture (CDC), so that the data from the DB Write-Ahead Log (WAL) is used to create a materialized view in the second service's DB via real-time streaming.
Persist in a durable queue and a cache. Using real-time streaming, persist the data in both services. Read data from the cache if it is available there, otherwise read from the DB. This allows read-after-write. Even if the write to the cache fails, in the worst case the data will reach the DB within seconds through streaming.
NServiceBus supports durable distributed transactions in many scenarios, unlike RMQ. Maybe you can look into using that feature to ensure that both contexts are saved or rolled back together in case of failure, if you can use NServiceBus instead of RMQ.
I think the solution you're looking for is the outbox pattern: an event table lives in the same database as your business data, which allows the business row and the event row to be committed in the same database transaction; a background worker loop then pushes the pending events to the MQ.
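Here is a minimal sketch of the outbox pattern, in Python with sqlite3 for brevity (the idea is the same for SQL Server or PostgreSQL); publish_to_mq is a hypothetical stand-in for the real broker call, e.g. via pika/RabbitMQ:

import json
import sqlite3

def register_customer(conn, customer):
    # The business row and the outbox row are written in one transaction,
    # so either both exist or neither does: no dual-persistence window.
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)",
                     (customer["id"], customer["name"]))
        conn.execute("INSERT INTO outbox (payload, published) VALUES (?, 0)",
                     (json.dumps({"type": "CustomerRegistered", "data": customer}),))

def publish_to_mq(payload):
    print("publishing:", payload)  # placeholder for the real broker call

def outbox_worker(conn):
    # Background loop body: pick up unpublished events, publish, mark done.
    # A crash between publishing and marking means the event is sent again,
    # so consumers must be idempotent (at-least-once delivery).
    rows = conn.execute(
        "SELECT rowid, payload FROM outbox WHERE published = 0").fetchall()
    for rowid, payload in rows:
        publish_to_mq(payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE rowid = ?",
                         (rowid,))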

How to update redis after updating database?

I cache some data in Redis, reading it from Redis if it exists, otherwise reading it from the database and writing it into Redis.
I have found several ways to update Redis after updating the database. For example:
set keys in Redis to expire
update Redis immediately after updating the database
put the data in an MQ and use a consumer to update Redis
I'm a little confused and don't know how to choose.
Could you tell me the advantages and disadvantages of each way? It would also help if you could suggest other ways to update Redis, or recommend some blog posts about this problem.
The actual data store and the cache should be synchronized using the third approach you've already described in your question.
As you add data to your definitive store (i.e., your SQL database), enqueue the change to a service bus or message queue, and let an asynchronous service do the synchronization as a background process.
Without a service bus and an asynchronous service, you run into these problems:
Your requests or processes become slower, because the user has to wait until the data is stored in both your database and the cache.
You risk a failure during the caching step with no retry policy (retries are usually a built-in feature of a service bus or message queue). Such a failure can end up in partial or complete cache corruption, and you won't be able to automatically and easily schedule a task to fix the situation.
About Redis key expiration: it's a good idea. Since Redis can expire keys using its built-in mechanism, you shouldn't implement key expiration yourself in the background process. If a key exists, it's because it's still valid.
That said, you won't always be in this situation (i.e., that an unexpired key should never be overwritten); it may depend on your actual domain.
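A minimal sketch of the queue-based approach (option 3), assuming redis-py and, for brevity, a Redis list standing in for a real message queue such as RabbitMQ; the key names and TTL are illustrative:

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def on_user_updated(user):
    # Called right after the database write commits: enqueue the change
    # instead of touching the cache inline, so the request is not slowed
    # down and failures can be retried by the consumer.
    r.lpush("cache-updates", json.dumps(user))

def cache_update_consumer():
    # Background worker: pop change events and refresh the cache entry.
    while True:
        _, raw = r.brpop("cache-updates")
        user = json.loads(raw)
        key = "user:{}".format(user["id"])
        r.hset(key, mapping=user)  # overwrite the cached hash...
        r.expire(key, 300)         # ...and refresh its TTL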
You could also create an API in front of your Redis server and then use SQL CLR to call that API.
