How to update redis after updating database?

I cache some data in Redis: I read from Redis if the key exists, otherwise I read from the database and write the result back to Redis.
I've found that there are several ways to update Redis after updating the database. For example:
set keys in Redis to expire
update Redis immediately after updating the database
put the data on an MQ and use a consumer to update Redis
I'm a little confused and don't know how to choose.
Could you tell me the advantages and disadvantages of each way? It would also be great if you could tell me other ways to update Redis, or recommend some blog posts about this problem.

The actual data store and the cache should be synchronized using the third approach you've already described in your question.
As you add data to your definitive store (i.e. your SQL database), enqueue this data to some service bus or message queue, and let some asynchronous service do the whole synchronization using some kind of background process (see the sketch after the list below).
You don't want to get into these cases (when not using a service bus and asynchronous service):
Making your requests or processes slower, because the user needs to wait until the data is stored in both your database and your cache.
Risking a failure during the caching step with no retry policy (retries are usually a built-in feature of a service bus or message queue). Such a failure can end up in a partial or complete cache corruption, and you won't be able to automatically and easily schedule some task to fix the situation.
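A minimal sketch of this queue-based synchronization, assuming RabbitMQ via pika and redis-py; the cache-sync queue name and user:{id} key shape are illustrative, not from the question:

# producer side: runs right after the database write
import json
import pika

def publish_cache_update(user):
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="cache-sync", durable=True)  # survives broker restarts
    channel.basic_publish(exchange="", routing_key="cache-sync", body=json.dumps(user))
    conn.close()

# consumer side: a separate background process that owns all cache writes
import redis

def run_consumer():
    r = redis.Redis()
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="cache-sync", durable=True)

    def handle(ch, method, properties, body):
        user = json.loads(body)
        r.hset("user:{}".format(user["id"]), mapping=user)  # refresh the cached copy
        ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after the cache write

    channel.basic_consume(queue="cache-sync", on_message_callback=handle)
    channel.start_consuming()

Because acknowledgement happens only after the cache write, a crashed consumer leaves the message on the queue and the broker redelivers it, which is the retry behaviour described above.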
About using Redis key expiration: it's a good idea. Since Redis can expire keys using its built-in mechanism, you shouldn't need to implement key expiration in the background process yourself; if a key exists, it's because it's still valid.
BTW, you won't always be in this situation (where an unexpired key means it shouldn't be overwritten); it may depend on your actual domain.
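For example, with redis-py a TTL can be attached when the value is written; the key name and the 300-second window here are illustrative:

import redis

r = redis.Redis()
r.setex("user:42:profile", 300, '{"name": "Alice"}')  # Redis deletes the key after 300 s
r.ttl("user:42:profile")  # seconds remaining; -2 once the key has expired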

You can create an API to interact with your Redis server, then use SQL CLR to call that API.

Related

Flink job dynamic input parameters

One parameter of my Flink job is dynamic, and I have an API to fetch the dynamic value. Can I call the API in the source every time to fetch data based on the parameter? Is that the correct way? Will it cause any trouble in the Flink job?
So, if I understand correctly, the idea is that you first get some key from DynamoDB and then use it to query an external service from the source.
I think that should be possible in general, but there are a few things to keep in mind when doing that.
I'm not sure about the performance of such a solution. Are you going to query the database constantly, or somehow receive only the changes? There are several things to consider here to get good performance out of the source.
It may be hard to provide any strong guarantees for such a setup, but that depends on the characteristics of the setup itself. I.e. how are you going to handle failures? How often will the key in the database change? Will the data still be accessible via the URL after the key in the DB changes? You can probably keep the last read key in state, so that when the job fails and the key in the DB changes, you can try to read the data for the previous key (the one the job failed on), but that depends on the answers to the questions above.
Finally, depending on the characteristics of the setup, it may be possible to use existing Flink operators to achieve this. For example, you could stream changes from the database (using one of the existing connectors, depending on the DB) and then use AsyncIO to query the external URL, so that you end up with a stream of data from the URL without writing your own source.

Real-time chats using Redis - how not to lose messages?

I want to implement a real-time chat. My main db is PostgreSQL, with the backend written in NodeJS. Clients will be mobile devices.
As far as I understand, to achieve real-time performance for messaging, I need to use Redis.
Therefore, my plan was to use Redis for the X most recent messages between two or more people (group chat), for example 1000, with everything synced to and backed by my main DB, which is PostgreSQL.
However, since Redis essentially lives in RAM, the chat history can be too "vulnerable", owing to the volatile nature of storing data in memory.
So if my Redis server has some unexpected and temporary failure, the recent messages in conversations would be lost.
What are the best practices nowadays to implement something like this?
Do I simply need to persist Redis data to disk? But in that case, wouldn't that hurt performance, since it would increase the write time for each message sent?
Or perhaps I should just prepare a recovery method that fetches the recent history from PostgreSQL in case my Redis chat-history list is empty?
P.S - while we're at it, could you please suggest a good approach for maintaining the status of users (online/offline) ? Is it done with Redis as well?
What are the best practices nowadays to implement something like this? Do I simply need to persist Redis data to disk? But in that case, wouldn't that hurt performance, since it will increase the write time for each message sent?
Yes, enabling persistence will impact the performance of Redis.
Your best bet is to run a quick benchmark with the expected IOPS and types of operations from your application, to identify the impact on IOPS with persistence enabled.
RDB vs AOF:
With RDB persistence enabled, the parent process does not perform the disk I/O itself; based on the configured save points, Redis forks a child process to write the RDB snapshot.
However, depending on the save-point configuration, you may lose the data written after the last save point if the server restarts or crashes before the next snapshot.
If your use case cannot tolerate data loss for that window, look at the AOF persistence method instead. AOF keeps track of every write operation, which can be replayed to reconstruct the data when the server restarts.
AOF with the fsync policy set to every second can be slower; with fsync disabled, however, it can be about as fast as RDB.
Read the trade-offs of using RDB or AOF: https://redis.io/topics/persistence#redis-persistence
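For reference, these are the relevant redis.conf directives; the save-point values shown are Redis's long-standing example defaults, not a recommendation for your workload:

save 900 1            # RDB: snapshot if at least 1 key changed in 900 s
save 300 10           # RDB: snapshot if at least 10 keys changed in 300 s
appendonly yes        # enable AOF persistence
appendfsync everysec  # fsync the AOF once per second (alternatives: always, no)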
P.S - while we're at it, could you please suggest a good approach for maintaining the status of users (online/offline)? Is it done with Redis as well?
Yes
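The answer doesn't elaborate, but a widely used pattern is a per-user presence key with a short TTL that each connected client refreshes with periodic heartbeats; the key names and the 60-second TTL below are illustrative:

import redis

r = redis.Redis()

def heartbeat(user_id):
    # called every ~30 s while the client is connected;
    # the key silently expires once the heartbeats stop
    r.setex("online:{}".format(user_id), 60, 1)

def is_online(user_id):
    return r.exists("online:{}".format(user_id)) == 1

The appeal of this design is that "going offline" requires no explicit action: a crashed client simply stops heartbeating and Redis expires the key on its own.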

Data consistency across multiple microservices, which duplicate data

I am currently trying to get into microservices architecture, and I came across a data consistency issue. I've read that duplicating data between several microservices is considered a good idea, because it makes each service more independent.
However, I can't figure out what to do in the following case to provide consistency:
I have a Customer service which has a RegisterCustomer method.
When I register a customer, I want to send a message via RabbitMQ so other services can pick up this information and store it in their own DBs.
My code looks something like this:
...
_dbContext.Add(customer);
CustomerRegistered e = Mapper.Map<CustomerRegistered>(customer);
await _messagePublisher.PublishMessageAsync(e.MessageType, e, "");
//!!app crashes
_dbContext.SaveChanges();
...
So I would like to know: how can I handle the case when the application sends the message but is unable to save the data itself? Of course, I could swap the DbContext.SaveChanges and PublishMessage calls, but the problem would still be there. Is there something wrong with my data-storing approach?
Yes. You are doing dual persistence - persistence in the DB and in a durable queue. If one succeeds and the other fails, you'd always be in trouble. There are a few ways to handle this:
Persist in the DB and then do Change Data Capture (CDC), so that the data from the DB write-ahead log (WAL) is streamed in real time to build a materialized view in the second service's DB.
Persist in a durable queue and a cache. Using real-time streaming, persist the data in both services. Read from the cache if the data is available there, otherwise read from the DB. This allows read-after-write. Even if the write to the cache fails, in the worst case the data will be in the DB within seconds via streaming.
NServiceBus, unlike RabbitMQ, supports durable distributed transactions in many scenarios. If you can use NServiceBus instead of RabbitMQ, you could look into that feature to ensure that both contexts are saved or rolled back together in case of failure.
I think the solution you're looking for is the outbox pattern: an event table lives in the same database as your business data, which allows the event and the business change to be committed in the same database transaction; a background worker then loops over the table and pushes pending events to the MQ.
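A minimal sketch of the outbox pattern, using sqlite3 for brevity; publish_to_mq is a hypothetical stand-in for the real RabbitMQ publisher:

import json
import sqlite3

conn = sqlite3.connect("app.db")
with conn:
    conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)")

def publish_to_mq(payload):
    # hypothetical stand-in for the real message-queue publish call
    print("publishing", payload)

def register_customer(name):
    # the business row and the event row commit atomically in one transaction,
    # so there is no window where one exists without the other
    with conn:
        cur = conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        event = {"type": "CustomerRegistered", "id": cur.lastrowid, "name": name}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))

def outbox_worker():
    # background loop body: publish pending events, then mark them as sent
    rows = conn.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        publish_to_mq(payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))

Note the failure mode this chooses: if the worker crashes between publish and the UPDATE, the event is published again on the next pass, so consumers must be idempotent (at-least-once delivery).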

Redis cache and overall process

I am having trouble understanding the start-to-finish call flow of an application using Redis as a cache. Say an application has an SQL DB and is using Redis to cache; how does the timing of that process work? To my understanding, Redis makes a call to the SQL DB to cache data, and the UI calls and consumes that data directly from Redis, thereby limiting the calls to the SQL DB. When does Redis typically make the call to the DB to get its data, and how does it keep it in sync with the SQL DB? I am just trying to understand this at a high level. Thank you!
Redis is just a cache, it provides a high-speed data store, whereas a database is slower but more reliable and with more features. Your application is responsible for getting data from Redis, and if it doesn't find it, pulling it from the database and adding it to Redis, with appropriate expiration and invalidation mechanics.
Let's walk through a simplified example: a hypothetical application in Python that uses Redis to cache a user object. The user objects are cached in Redis under the key user:{id} in a Hash datatype. The system of record for the user data is a relational database, and the id is an integer primary key generated in the DB.
To fetch a user we would execute code something like this:
# r = redis.StrictRedis(hostname)
def get_user(r, id):
    key = "user:{}".format(id)
    user = r.hgetall(key)
    if not user:  # hgetall returns an empty dict on a miss, never None
        user = fetch_user_from_db(id)
        r.hmset(key, user)  # populate the cache so the next read is a hit
    return user
To write a user we would execute code similar to:
# r = redis.StrictRedis(hostname)
def update_user(r, user):
    key = "user:{}".format(user['id'])
    r.delete(key)  # invalidate the cached copy before writing
    write_user_to_db(user)
This simplified example leaves out many of the details of keeping your cache and DB consistent in a distributed environment, but for a single node this is the basic process. Your app has to handle the details of caching: checking for a cache hit and invalidating on write.
I myself have never come across a scenario where Redis calls a SQL DB. Since Redis is a key-value store that keeps its own data in RAM, it's usually used directly for caching by apps. There's no involvement of the SQL DB at all in these cases.
The greatest strength of Redis is that many operations return in constant time, independent of how many elements are stored. This is why it's great for use cases like caching and locking mechanisms that need very quick response times.

How do you best offload a database insert, so a web response is returned quicker?

Setup
I have a web service that takes its inputs through a REST interface. The REST call does not return any meaningful data, so whatever is passed in to the web service is just recorded in the database and that is it. It is an analytics service which my company is using internally to do some special processing on web requests received on their web page. So it is very important that the response returns as quickly as possible.
I have pretty much optimized the code as far as I can to make the response fast. However, the database write still keeps the request open for longer than I want before a response is sent back to the web client.
The code looks basically like this (by the way, it is ASP.NET MVC using Entity Framework, running on IIS 7, if that matters):
public ActionResult Add(/*..bunch of parameters..*/) {
    using (var db = new Entities()) {
        var log = new Log {
            // populate Log from parameters
        };
        db.AddToLogs(log);
        db.SaveChanges();
    }
    return File(pixelImage, "image/gif");
}
Question
Is there a way to offload the database insert into another process, so the response to the client is returned almost instantly?
I was thinking about wrapping everything in the using block in another thread, to make the database insert asynchronous, but I didn't know if that was the best way to free up the response back to the client.
What would you recommend if you were trying to accomplish this goal?
If the request has to be reliable then you need to write it into the database. E.g. if your return means 'I have paid the merchant', then you can't return before you actually commit in the database. If the processing is long, there are database-based asynchronous patterns, like using a table as a queue or built-in queuing like Asynchronous procedure execution. But these apply when heavy and lengthy processing is needed, not for a simple log insert.
When you just want to insert a log record (visitor/URL tracking stuff), the simplest solution is to use the CLR's thread pool and just queue the work, something like:
...
var log = new Log {
    // populate Log from parameters
};
ThreadPool.QueueUserWorkItem(stateInfo => {
    var queuedLog = stateInfo as Log;
    using (var db = new Entities())
    {
        db.AddToLogs(queuedLog);
        db.SaveChanges();
    }
}, log);
...
This is quick and easy and it frees the ASP handler thread to return the response as soon as possible. But it has some drawbacks:
If the incoming rate of requests exceeds the thread pool's processing rate, the in-memory queue will grow until it triggers an app pool 'recycle', thus losing all items 'in progress' (as well as warm caches and other goodies).
The order of requests is not preserved (which may or may not be important).
It consumes a CLR pool thread that does nothing but wait for a response from the DB.
The last concern can be addressed by using a truly asynchronous database call, via SqlCommand.BeginExecuteXXX and setting AsynchronousProcessing to true on the connection. Unfortunately, AFAIK EF doesn't yet have true asynchronous execution, so you would have to drop down to the SqlClient layer (SqlConnection, SqlCommand). But this solution does not address the first concern: when the rate of page hits is so high that this logging (= a write on every page hit) becomes a critical bottleneck.
If the first concern is real, then no threading and/or producer/consumer wizardry can alleviate it. If you truly have an incoming-rate vs. write-rate scalability problem (the 'pending' queue grows in memory), you have to either make the writes faster in the DB layer (faster I/O, dedicated log-flush I/O) and/or aggregate the writes: instead of logging every request, just increment in-memory counters and write them out periodically as aggregates.
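A minimal sketch of that aggregation idea, in Python for brevity; the page-name keys, the write_aggregates_to_db helper, and the 5-second flush interval are all illustrative:

import threading
import time
from collections import Counter

counters = Counter()
lock = threading.Lock()

def record_hit(page):
    # cheap in-memory increment on the request path; no DB I/O here
    with lock:
        counters[page] += 1

def write_aggregates_to_db(snapshot):
    # hypothetical stand-in for the real aggregate insert
    print("flushing", snapshot)

def flush_loop():
    # background thread: periodically swap out the counters and write them
    while True:
        time.sleep(5)
        with lock:
            snapshot = dict(counters)
            counters.clear()
        if snapshot:
            write_aggregates_to_db(snapshot)

threading.Thread(target=flush_loop, daemon=True).start()

The trade-off is explicit: a crash loses at most the last flush interval's counts, in exchange for turning a write per request into one write per interval.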
I've been working on multi-tier solutions mostly for the last year or so that require this sort of functionality, and that's exactly how I've been doing it.
I have a singleton that takes care of running tasks in the background based on an ITask interface. Then I just register a new ITask with my singleton and pass control from my main thread back to the client.
Create a separate thread that monitors a global, in-memory queue. Have your request put its information on the queue and return; the thread then takes items off the queue and posts them to the DB.
Under heavy load, if the thread lags behind the requests, your queue will grow.
Also, if you lose the machine, you will lose any unprocessed queue entries.
Whether these limitations are acceptable is something you'd need to decide.
A more formal mechanism is to use an actual messaging middleware (JMS in Java land; I don't know the .NET equivalent offhand, but there's certainly something).
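A minimal sketch of the in-memory queue approach described above, in Python for brevity; write_to_db is a hypothetical stand-in for the real insert:

import queue
import threading

log_queue = queue.Queue()

def write_to_db(item):
    # hypothetical stand-in for the real DB insert
    print("inserting", item)

def worker():
    # dedicated thread: drain the queue and post each item to the DB
    while True:
        item = log_queue.get()
        write_to_db(item)
        log_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# on the request path: enqueue and return immediately
log_queue.put({"url": "/page", "ua": "Mozilla/5.0"})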
It depends: When you return to the client do you need to be 100% sure that the data is stored in the database?
Take this scenario:
Request comes in
A thread is started to save to the database
Response is sent to the client
Server crashes
Data was not saved to the database
You also need to check how many milliseconds you save by starting a new thread instead of saving to the database.
The added complexity and maintenance cost is probably too high compared with the savings in response time. And the savings in response time are probably so low that they will not be noticed.
Before I spent a lot of time on the optimization I'd be sure of where the time is going. Connections like these have significant latency overhead (check this out). Just for grins, make your service a NOP and see how it performs.
It seems to me that the 'async-ness' needs to be on the client: it should fire off the call to your service and move on, especially since it doesn't care about the result.
I also suspect that if the NOP performance is good-to-tolerable on your LAN it will be a different story in the wild.
