Need for a Redis server in a centralized setup - database

I have gone through the Logstash documentation for the centralized setup and found that it requires a Redis server which acts as a broker.
Here is the link:
http://logstash.net/docs/1.1.12/tutorials/getting-started-centralized
But what is not clear to me is why we use Redis as a broker at all.
We could simply ship the logs directly from Logstash to Elasticsearch, which would remove the need for the Redis broker. So why do we split the setup into a shipper and an indexer?
Need clear explanation.
Thanks.

I believe you can find an answer here:
https://groups.google.com/forum/#!topic/logstash-users/VakCOAzZI8k
Redis basically acts as a temporary key-value store for the raw events from the shipper, which are then parsed by the indexer. The log data is ultimately stored in Elasticsearch, not in Redis.
Apparently the idea is to offload parsing and indexing to a server dedicated to such tasks, since indexing is CPU intensive, and to buffer events whenever the indexer falls behind. Calling Redis a broker seems appropriate, I guess.
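To make the broker role concrete, here is a rough sketch of the idea (not Logstash's actual implementation): the shipper pushes raw events onto a Redis list, and the indexer pops them off, parses them, and writes them to Elasticsearch. It uses redis-py and the Elasticsearch Python client; the host names, list key and index name are all made up.

    import json
    import redis
    from elasticsearch import Elasticsearch

    r = redis.Redis(host="broker-host", port=6379)     # hypothetical Redis broker
    es = Elasticsearch("http://es-host:9200")          # hypothetical Elasticsearch node

    # Shipper side: push raw events onto a Redis list (the queue).
    def ship(raw_event):
        r.lpush("logstash", json.dumps(raw_event))

    # Indexer side: block until an event is available, parse it, index it.
    def index_forever():
        while True:
            _, raw = r.brpop("logstash")               # blocks while the queue is empty
            event = json.loads(raw)
            event["parsed"] = True                     # stand-in for grok/filter work
            es.index(index="logs", document=event)

The point is that the shipper returns as soon as the push completes, so a slow or restarting indexer never blocks log collection; Redis simply absorbs the backlog.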

When using Logstash with Redis, you can configure Redis to keep all the log entries only in memory, so it behaves like an in-memory queue (similar to Memcached).
You may reach the point where logs are sent faster than Logstash can process them, and that can bring your system down on a regular basis (observed in our environment).
If you feel Redis persistence is an overhead for your disk, you can configure it to keep all the logs in memory until they are processed by Logstash.
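As a concrete illustration, persistence can be switched off at runtime so the queued entries live only in RAM (a sketch using redis-py; the same settings can equally go in redis.conf, and the host name is a placeholder):

    import redis

    r = redis.Redis(host="broker-host", port=6379)   # hypothetical broker host

    # Disable RDB snapshots and AOF so nothing touches the disk;
    # queued log entries stay in memory until the indexer drains them.
    r.config_set("save", "")
    r.config_set("appendonly", "no")

    # Optionally cap memory so a slow indexer cannot exhaust the box.
    r.config_set("maxmemory", "512mb")
    r.config_set("maxmemory-policy", "noeviction")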

Why use Redis if it's not being run on your application server?

My understanding is that using Redis requires you to host it on its own server. So why even use it if the data stored in it isn't on the same VM (and thus using the same RAM) as your app server (e.g. Node)?
You're not required to host Redis on a separate server at all. In fact, it's not uncommon for application servers to run an in-memory store like Redis or Memcached on the same server for simple caching tasks.
However, what I think is at the heart of your question is a fundamental misunderstanding of how in-memory storage works. Even if you were to run Redis on the same server as your application, your application would never be able to directly access the RAM blocks that Redis uses to store your data--you would still need to send a request to the Redis instance to retrieve the data. Hosting Redis separately from your application server does introduce network latency, but there's zero difference in terms of accessing or modifying the data in RAM.
The name "Redis" is an acronym for REmote DIctionary Server - the "Remote" part indicates that it is intended to be used over a network. The main concept here is that the data stored in memory in Redis is accessible to multiple application instances, instead of having an in-app store for each.
That said, there is no requirement to run Redis on a separate server or to use it with multiple application instances. It usually makes a lot of sense to do so, though, because that is what it was designed for.
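To make that concrete, here is a minimal redis-py sketch (host names are placeholders). Whether Redis is co-located with the app or on a dedicated box, the client code is identical, because access always goes through a socket rather than through shared RAM.

    import redis

    # Same client, same API -- the only difference is where the socket connects.
    local_cache = redis.Redis(host="127.0.0.1", port=6379)        # co-located with the app
    shared_cache = redis.Redis(host="redis.internal", port=6379)  # hypothetical dedicated host

    shared_cache.set("session:42", "alice")   # visible to every app instance pointing here
    print(shared_cache.get("session:42"))     # b'alice'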

Using NATS Streaming Server as the primary data store for IoT position data?

I have a Mosquitto broker which receives positioning information from remote devices.
I need to store this data somewhere to be processed by other micro-services.
At present, there is a Node.js process which subscribes to the broker, and writes to the Postgres database in batches.
Devices -> Mosquitto -> DB writer -> Postgres (source of truth)
Postgres (source of truth) -> Service A
Postgres (source of truth) -> Service B
But the problem I see is that any other service which needs to process this position data now needs to query the Postgres database.
Constraint: This is for on-premise deployments so ideally we want to maintain as little as possible. One VM with a database, and perhaps a link to a customer-maintained database.
An alternative to the database as the source of truth for the sensor data is a Kafka-like event log / event-sourcing approach. Then there would be one subscriber to the broker, and all microservices could read from it, and pick up where they left off if they go down.
Because it is on-premise I want something more lightweight than Kafka, and have found NATS Streaming Server.
Now, the NATS event log can be persisted by configuring it with a data store. It currently supports a simple file store and a SQL store.
Now, if I used the SQL store, it seems wasteful to store the raw messages in a database, read them back, and then store them again, and bad for performance too. The SQL store interface also has its own batching implemented. I'm not sure how much I trust the file store as the source of truth either.
So, is this a viable approach?
You can consume messages "in batches" in NATS Streaming by creating your subscription with MaxInflight and ManualAckMode. The server will not send more than MaxInflight messages without receiving the corresponding acks from the clients.
If you need to do a transformation before storing, I understand your process. However, if you just don't trust the FileStore or SQLStore from the NATS Streaming server, why would you be using NATS Streaming in the first place? That is, the stores have been implemented by the same people (including me) who wrote the NATS Streaming server ;-)
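A sketch of such a subscription using the asyncio NATS Streaming client for Python (stan.py); the option names (max_inflight, manual_acks) are as I recall them and may differ between client versions, and the cluster, subject and handler names are invented.

    import asyncio
    from nats.aio.client import Client as NATS
    from stan.aio.client import Client as STAN

    async def main():
        nc = NATS()
        await nc.connect("nats://127.0.0.1:4222")
        sc = STAN()
        await sc.connect("test-cluster", "position-consumer", nats=nc)

        async def cb(msg):
            process_position(msg.data)   # hypothetical processing function
            await sc.ack(msg)            # ack only once the work is done

        # Durable subscription: at most 100 unacked messages in flight,
        # and the consumer resumes where it left off after a restart.
        await sc.subscribe(
            "positions",
            durable_name="service-a",
            cb=cb,
            max_inflight=100,
            manual_acks=True,
        )
        await asyncio.Event().wait()     # keep the subscriber running

    asyncio.run(main())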

How should I use Redis as a cache for SQL Server?

I have some tabular data that, due to unrelated issues, is proving too slow to get out of SQL Server in real time. As we get more users this will only get worse, so I am thinking of using Redis as a front-end cache to store users' tabular, pageable data.
This data could become stale after about 10 minutes, at which point I would like to fetch the record set again and put it in Redis.
The app is a .NET MVC app. I was thinking that when the user logs into the app, this data gets pulled out of the database (takes around 10 seconds) and put into Redis, ready to be consumed by the MVC client. I would put an expiry on that data, and when it becomes stale it would get refetched from the SQL Server database.
Does this all sound reasonable? I'm a little bit scared that:
The user could get to the page before the data is in Redis
If Redis goes down or does not respond, I need to ensure that the ViewModel can be filled directly from SQL Server without Redis being there
I would go for the ServiceStack.Redis implementation; it has all the details you need. Redis is particularly good for caching compared to other NoSQL stores. But if you have a heavily read/write application, I would suggest looking at a NoSQL database alongside SQL Server; that will help with scalability.
Let me know if any further details are required. You just need to run the NuGet command and you are almost up and running.
You could use something like MemcacheD to store cached pages in memory.
You can set a validity of 10 minutes on a cached object. After that the cache will automatically remove the object.
Your actual repository would have to do these steps (see the sketch at the end of this answer):
1. Check the cache for the data you want; if it is there, great, use it
2. If the cached data doesn't exist, go to SQL Server to retrieve it
3. Update the cache with the data returned from SQL Server
I've used the Enyim client before. It works great. https://github.com/enyim/EnyimMemcached
I might also use something like Quartz to schedule a background task to prime the cache. http://quartznet.sourceforge.net/
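Since the question is specifically about Redis, here is the same cache-aside flow from the steps above, sketched with Redis (redis-py) and a placeholder SQL Server query via pyodbc. The app itself is .NET, so treat this purely as an illustration of the flow, including the "Redis is down" fallback the asker worries about; key names, DSN and SQL are all invented.

    import json
    import redis
    import pyodbc   # or whichever driver reaches SQL Server

    CACHE_TTL_SECONDS = 10 * 60   # data may be ~10 minutes stale

    r = redis.Redis(host="cache-host", port=6379, socket_timeout=0.5)

    def load_from_sql(user_id):
        # The slow (~10 s) query; connection string and SQL are placeholders.
        conn = pyodbc.connect("DSN=reports")
        rows = conn.cursor().execute(
            "SELECT Col1, Col2 FROM UserReport WHERE UserId = ?", user_id
        ).fetchall()
        return [list(row) for row in rows]

    def get_report(user_id):
        key = "report:%s" % user_id
        try:
            cached = r.get(key)
            if cached is not None:
                return json.loads(cached)              # 1. cache hit
        except redis.RedisError:
            pass                                       # Redis down: fall back to SQL

        data = load_from_sql(user_id)                  # 2. cache miss -> SQL Server
        try:
            r.setex(key, CACHE_TTL_SECONDS, json.dumps(data))   # 3. refill with a TTL
        except redis.RedisError:
            pass                                       # caching is best-effort
        return data

Because the fallback path is just the normal SQL query, a user who reaches the page before the cache is warm (or while Redis is down) still gets data, only slowly; a background job (e.g. Quartz, as suggested above) can prime the cache at login to avoid that first slow hit.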

Is it better to send data to HBase via one stream or via several servers concurrently?

I'm sorry if this question is basic (I'm new to NoSQL). Basically, I have a large mathematical process that I'm splitting up across different servers, which process the data and send the results to an HBase database. Each server computing the data is also an HBase region server and has Thrift on it.
I was thinking of having each server process its data and then update HBase locally (via Thrift). I'm not sure if this is the best approach because I don't fully understand how the master node will handle the upload/splitting.
I'm wondering what the best practice is when uploading large amounts of data (in total I suspect it will be several million rows)? Is it okay to send it to the region servers, or should everything go through the master?
From this blog post,
The general flow is that a new client contacts the Zookeeper quorum (a separate cluster of Zookeeper nodes) first to find a particular row key. It does so by retrieving the server name (i.e. host name) that hosts the -ROOT- region from Zookeeper. With that information it can query that server to get the server that hosts the .META. table. Both of these two details are cached and only looked up once. Lastly it can query the .META. server and retrieve the server that has the row the client is looking for.
Once it has been told where the row resides, i.e. in what region, it caches this information as well and contacts the HRegionServer hosting that region directly. So over time the client has a pretty complete picture of where to get rows from without needing to query the .META. server again.
I am assuming you use the Thrift interface directly. In that case, even if you send a mutation through a particular region server, that region server only acts as a client. It will contact the Zookeeper quorum and the .META. table to find the region that should receive the data, and then proceed in the same way as a write issued from any other machine.
Is it okay to send it to regional servers or should everything go through the master?
Both are the same. There is no such thing as writing "directly" to a region server of your choosing; the region that owns each row key is looked up (via Zookeeper and .META., as described above) to determine where the output is written.
If you are using a Hadoop MapReduce job with the Java API, you can use HFileOutputFormat to write HFiles directly and bulk-load them, bypassing the normal HBase write path. It is roughly ~10x faster than going through the API.
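For illustration, a write through the Thrift interface from one of the workers looks roughly like this, using the happybase Python client (host, table and column family names are invented). Whichever region server's Thrift daemon the worker points at, each row still ends up on the region server that owns its key range, as described above.

    import happybase

    # Each worker can talk to its local Thrift daemon; routing each row
    # to the region that owns its key happens behind the scenes.
    connection = happybase.Connection("localhost", port=9090)
    table = connection.table("results")   # hypothetical table with column family 'cf'

    # Batch mutations so millions of rows are not sent one RPC at a time.
    with table.batch(batch_size=1000) as batch:
        for row_key, value in compute_results():       # hypothetical generator
            batch.put(str(row_key).encode(), {b"cf:value": str(value).encode()})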

Access Redis from relational databases

Is there any way to access Redis data from relational databases, such as Oracle or SQL Server?
One use case I have in mind is ETL to a data warehouse.
I'm trying to understand the question: you have data in a traditional RDBMS and you want to extract information from it and load it into Redis? Or is it the other way around?
Either way, since I'm not competent to talk about the RDBMS side, I would expect to write a program (Java in my case) that extracts the information from Redis and uploads it to Oracle. There are options for interacting with Redis from a Java client library (JDBC-Redis and JRedis are examples).
You may get a better answer from the community if you can elaborate on your question.
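There is no built-in SQL bridge in either direction, so the usual approach is exactly that kind of small ETL program. The answer above suggests Java (JDBC-Redis, JRedis); the same idea is sketched here in Python with redis-py and cx_Oracle purely for illustration, with the key pattern, target table and credentials all invented.

    import redis
    import cx_Oracle

    r = redis.Redis(host="redis-host", port=6379)
    ora = cx_Oracle.connect("etl_user", "secret", "db-host/ORCLPDB1")   # placeholder credentials

    # Stage key/value pairs matching a hypothetical pattern, then bulk-insert them.
    rows = []
    for key in r.scan_iter(match="session:*"):
        rows.append((key.decode(), r.get(key).decode()))

    cur = ora.cursor()
    cur.executemany(
        "INSERT INTO redis_sessions (redis_key, redis_value) VALUES (:1, :2)", rows
    )
    ora.commit()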
Well, if you use server-side Java objects in your Oracle database (and they can make REST calls at the very least, if not raw socket I/O, I don't know), then you can call Redis from your stored procedures in Oracle.
[edit]
I should add that if you can make socket connections, then just include the JRedis jar in your Oracle server's lib so the server-side objects can create clients.
Should that not be possible -- I would seriously question a DB that lets sprocs and triggers open generic TCP connections -- then you are left with consuming web services.
JRedis doesn't support web services, but nothing is stopping you from wrapping JRedis and exposing whatever commands you need as RESTful resources. So here, you would run Redis on server R and a Java web server (Jetty/Jettison would do fine) running JRedis on server R or R`. Since Redis is single threaded, it is perfectly fine to run it on the same multi-core box as a JVM; it's just a matter of resources, and if they are sufficient then you are using the loopback for the connection between Redis and JRedis, which is guaranteed to be faster than traversing network boundaries! But if the loads you require preclude colocation of Redis and JRedis (the proxy), then use a second server.
And of course, you are running your DB on server D. So D <=> R` <=> R. You will pay the second hop's latency costs, of course.
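Purely to illustrate the shape of that R` proxy, here is the same idea as a minimal web wrapper around Redis, written with Flask and redis-py rather than the Jetty/JRedis stack described above; routes and key names are invented. The database on server D would then consume these URLs as web services.

    from flask import Flask, jsonify, request
    import redis

    app = Flask(__name__)
    # Redis co-located with the proxy, so this hop goes over the loopback.
    r = redis.Redis(host="127.0.0.1", port=6379)

    @app.route("/keys/<name>", methods=["GET"])
    def get_key(name):
        value = r.get(name)
        if value is None:
            return jsonify(error="not found"), 404
        return jsonify(key=name, value=value.decode())

    @app.route("/keys/<name>", methods=["PUT"])
    def set_key(name):
        r.set(name, request.get_data())
        return jsonify(key=name), 201

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)   # server D calls this over HTTP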
