Is Amazon SQS a good tool for handling analytics logging data to a database? - sql-server

We have a few Node.js servers where the details and payload of each request need to be logged to SQL Server for reporting and other business analytics.
The volume of requests and the similarity of needs between servers has me wanting to approach this with a centralized logging service. My first instinct is to use something like Amazon SQS and let it act as a buffer, either in front of SQL Server directly or in front of a small logging service that consumes messages from SQS and makes the database calls.
Does this sound like a good use for SQS or am I missing a widely used tool for this task?

The solution will really depend on how much data you're working with, as each service has limitations. To name a few:
SQS
First off, since you're dealing with logs, you don't want duplication. With this in mind you'll want a FIFO (first-in-first-out) queue, which gives you exactly-once processing and built-in message deduplication.
SQS by itself doesn't really invoke anything. What you'll want to do here is set up the queue, then submit messages via the AWS JS SDK. Then, when you get the message back in your callback, take the message ID and pass that data to an invoked Lambda function (you can write those in Node.js as well) which stores the info you need in your database.
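A minimal sketch of the producer side with the AWS SDK for JavaScript (v2); the queue URL, region, and message fields are placeholders:

    // Sketch: push one request-log entry onto an SQS FIFO queue.
    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS({ region: 'us-east-1' });

    const params = {
      QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/request-logs.fifo',
      MessageBody: JSON.stringify({ path: '/checkout', status: 200, tookMs: 42 }),
      MessageGroupId: 'request-logs',     // required for FIFO queues
      MessageDeduplicationId: 'req-0001'  // or enable content-based deduplication
    };

    sqs.sendMessage(params, (err, data) => {
      if (err) console.error('send failed', err);
      else console.log('queued message', data.MessageId);
    });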
That said it's important to know that messages in an SQS queue have a size limit:
The minimum message size is 1 byte (1 character). The maximum is 262,144 bytes (256 KB).
To send messages larger than 256 KB, you can use the Amazon SQS Extended Client Library for Java. This library allows you to send an Amazon SQS message that contains a reference to a message payload in Amazon S3. The maximum payload size is 2 GB.
CloudWatch Logs
(not to be confused with the higher-level CloudWatch service itself, which is more about metrics)
The idea here is that you submit event data to CloudWatch Logs.
It also has a limit here:
Event size: 256 KB (maximum). This limit cannot be changed
Unlike SQS, CloudWatch Logs can automatically pass log data to Lambda via a subscription filter, and the Lambda can then write it to your SQL Server. The AWS docs explain how to set that up.
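A rough sketch of the Lambda side (the incoming payload shape is the standard CloudWatch Logs subscription format, base64-encoded and gzip-compressed; insertRow is a hypothetical helper):

    const zlib = require('zlib');

    // Hypothetical helper: write one row to SQL Server, e.g. via the 'mssql' package.
    async function insertRow(timestamp, message) { /* ... */ }

    exports.handler = async (event) => {
      // CloudWatch Logs subscriptions deliver a base64-encoded, gzipped payload.
      const payload = zlib.gunzipSync(Buffer.from(event.awslogs.data, 'base64'));
      const parsed = JSON.parse(payload.toString('utf8'));
      for (const logEvent of parsed.logEvents) {
        await insertRow(logEvent.timestamp, logEvent.message);
      }
    };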
S3
Simply set up a bucket and have your servers write data out to it. The nice thing here is that since S3 is meant for storing large files, you really don't have to worry about the previously mentioned size limits. S3 buckets also have events which can trigger Lambda functions. Then you can happily go on your way sending out log data.
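For example, a sketch of a server-side flush to S3 (bucket name and key scheme are placeholders):

    // Sketch: write a batch of request logs to S3 from a Node.js server.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    async function flushLogs(batch) {
      await s3.putObject({
        Bucket: 'my-request-logs',        // placeholder bucket
        Key: `logs/${Date.now()}.json`,
        Body: JSON.stringify(batch),      // no 256 KB ceiling to worry about
        ContentType: 'application/json'
      }).promise();
    }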
If your log data gets big enough, you can scale out to something like AWS Batch, which gets you a cluster of containers that can process the log data. You also get a data backup for free: if your DB goes down, you've got the log data stored in S3 and can throw together a script to load everything back up. Finally, you can use Lifecycle Policies to migrate old data to lower-cost storage or remove it altogether, as in the sketch below.
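A sketch of such a lifecycle rule, set via the same JS SDK (bucket name, prefix, and retention periods are placeholders):

    // Sketch: move log objects to Glacier after 30 days, delete after a year.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    s3.putBucketLifecycleConfiguration({
      Bucket: 'my-request-logs',
      LifecycleConfiguration: {
        Rules: [{
          ID: 'archive-then-expire',
          Status: 'Enabled',
          Filter: { Prefix: 'logs/' },
          Transitions: [{ Days: 30, StorageClass: 'GLACIER' }],
          Expiration: { Days: 365 }
        }]
      }
    }, (err) => {
      if (err) console.error('lifecycle update failed', err);
    });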

Related

How to load huge amount of data from spring boot to reactjs?

I have two applications, a Spring Boot backend and a React frontend. I need to load a lot of data (let's say 100,000 objects, each with 3 Integer fields) and present it on a Leaflet map. However, I don't know which protocol I should use. I thought about two approaches:
Do it with REST, 1,000 (more or less) objects per request, with a progress bar on the front end so the user doesn't keep refreshing the page thinking something is wrong.
Do it with a websocket, so it is faster? Same idea with the progress bar, but I am worried that if the user refreshes the page, the backend will keep streaming data even though the original connection is gone, the new connection will start the process over again, and so on.
If it is worth mentioning, I am using Spring Boot 2.3.1 together with Spring Cloud (Eureka, Spring Cloud Gateway). The websocket layer I chose is SockJS, and the data is streamed via SimpMessagingTemplate from org.springframework.messaging.simp.SimpMessagingTemplate.
If you have that amount of data and a lot of read/write operations, I would recommend not returning it in either a websocket or a REST call (Reactor or MVC); sending a big amount of data over TCP has its issues. What I would recommend is quite simple: save the data to storage (AWS S3, for example), return the S3 object URL, and have the client read the data from S3 directly, as in the sketch below.
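A rough sketch of the client side under that approach (the /api/map-data-url endpoint is hypothetical, and the bucket needs a CORS configuration or a pre-signed URL for the browser to read it):

    // Sketch: the React side asks the backend for an S3 URL, then loads
    // the dataset straight from S3 and hands it to the Leaflet layer.
    async function loadMapData() {
      const { url } = await (await fetch('/api/map-data-url')).json();
      const objects = await (await fetch(url)).json(); // ~100k small objects as one JSON file
      return objects;
    }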
Alternatively, you can have a message queue that the client subscribes to (pub/sub): publish the data on the server side and consume it on the client side. But this may be overkill.
If you are set on REST, you can use multipart data; see the Stack Overflow question here:
Multipart example

How to do mobile application monitoring?

I want to be able to monitor issues from a mobile application the same way I do from backend microservices.
I'm not aware of any real-time monitoring tools for mobile applications out there.
I think it could really help to monitor the mobile application and report errors from the application itself, not only from the backend services. Sometimes the application is connected to multiple services and has its own logic, so it seems like the one place to catch all errors and wrong behaviour.
Are there any tools out there?
If, for example, I use mParticle/Segment as a hub to report events, can I somehow connect it to Graphite, which is push-based monitoring? Maybe through SQS / AWS Lambda?
https://www.mparticle.com/integrations
In theory, yes, it's possible to send data to Graphite using a combination of SQS + Lambda. I've tested this by writing some metric data to SQS and using a Node.js Lambda function to read and forward that data to our carbon endpoint at https://hostedgraphite.com via UDP, per our language guide here.
Having said that, there are some further considerations we must take into account to ensure this works, the main one being data format. Graphite/Carbon require data in a specific format, something that mParticle might not support directly. As such, you will need an AWS Lambda that reformats the messages and then forwards them to Graphite (or, optionally, to another SQS queue where another Lambda reads and forwards the data to Graphite).
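Roughly what that Lambda could look like (a sketch only: the carbon host, the metric prefix, and the shape of the incoming SQS message body are all assumptions):

    // Sketch: SQS-triggered Lambda that reformats messages into Graphite's
    // plaintext protocol ("metric.path value timestamp") and ships them via UDP.
    const dgram = require('dgram');

    exports.handler = async (event) => {
      const socket = dgram.createSocket('udp4');
      for (const record of event.Records) {
        const { name, value } = JSON.parse(record.body); // assumed message shape
        // With HostedGraphite the metric prefix is your API key; 'YOUR-API-KEY' is a placeholder.
        const line = `YOUR-API-KEY.${name} ${value} ${Math.floor(Date.now() / 1000)}\n`;
        await new Promise((resolve, reject) =>
          socket.send(line, 2003, 'carbon.hostedgraphite.com', // 2003 is the usual carbon port
            (err) => (err ? reject(err) : resolve())));
      }
      socket.close();
    };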

Channel API overkill?

Hi, I am currently using the Channel API for my project. My client is a signage player which receives data from an App Engine server only when the user changes the media content. App Engine sends data to the client only once or twice a day. Do you think the Channel API is overkill for this? What are some alternatives?
Overall, I'd think not. How many clients will be connected?
Per https://cloud.google.com/appengine/docs/quotas?hl=en#Channel the free quota is 200 channel-hours/day, so if you have no more than 8 clients connected around the clock (8 clients x 24 hours = 192 channel-hours) you'll stay within the free quota -- no "overkill".
Even beyond that, per https://cloud.google.com/appengine/pricing, there's "no additional charge" beyond the computational resources that keeping the channel open entails -- I don't have exact numbers, but I don't think those resources would be "overkill" compared with alternatives such as reasonably frequent polling by the clients.
According to the Channel API documentation (https://cloud.google.com/appengine/features/#channel), "The Channel API creates a persistent connection between an application and its users, allowing the application to send real time messages without the use of polling." IMHO, yours might not be the best use case for it.
You may want to take a look into the TaskQueue API (https://cloud.google.com/appengine/features/#taskqueue) as an alternative of sending data from AppEngine to the client.

Compression when using the Channel API in Google App Engine

In this FAQ question it says that compression is used automatically when the browser supports it and that I don't need to modify my application in any way.
My question is, does that apply to Channel API messages too?
I have an application that needs to send relatively large JSON (text) data through a persistent connection and I'm hoping I can get things through faster if they are compressed.
If not, I can think of a workaround to have the server send just a ping through the channel when a big load comes through and have the browser then make a GET request to fetch it (and that would "automatically" compress it), but that would add the latency of another request.
Data sent over the connection the Channel API uses is gzip compressed.
However, Channel API messages are limited to 32K uncompressed, so for anything bigger than that you'll need to use the ping/GET method anyway.
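For what it's worth, a browser-side sketch of that ping/GET workaround (the /big-payload endpoint and the "big:" ping convention are assumptions; token comes from the server-side call that creates the channel):

    // Sketch: when a payload exceeds the 32K channel limit, the server sends
    // only a small "big:<id>" ping and the client fetches the data over HTTP,
    // where gzip compression applies automatically.
    var channel = new goog.appengine.Channel(token);
    var socket = channel.open();

    function handlePayload(data) { /* render the JSON payload */ }

    socket.onmessage = function (message) {
      if (message.data.indexOf('big:') === 0) {
        var id = message.data.slice(4);
        fetch('/big-payload?id=' + encodeURIComponent(id))
          .then(function (res) { return res.json(); })
          .then(handlePayload);
      } else {
        handlePayload(JSON.parse(message.data)); // small messages arrive inline
      }
    };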

Google App Engine: keep state of an object between HTTP-requests (Java)

A user makes an HTTP request to the server. This request is processed by an object of some class, let's call it "Processor". Then the same user makes another HTTP request two minutes later, and I want it to be processed by the same instance of Processor as the first one. So basically I want to keep the state of some object across several requests.
I know that I can save it to the datastore each time and then load it back, but this approach seems very slow. Is there a way to store objects somewhere in RAM?
How about using memcache?
You can't ensure that consecutive requests to your app will go to the same instance, but memcache can help reduce or eliminate the overhead of accessing the datastore for each request.
It sounds like what you are describing is a session.
I am not sure which language runtime and web framework you are using, but it is sure to include support for sessions. (If you are using Java you will need to enable it.)
The standard session mechanism puts a small ID in a cookie that is stored in the user's browser. On every request, each of which could go to a different application server, this ID is used as a key to read and write persistent information from the data store.
Even if the datastore accesses are too slow for you, I would suggest not using memcache for this session storage, because memcache is by design unreliable: the user's session information could disappear at any time, which would be a bad experience for them.
If the amount of data you want to store is no more than a few kilobytes, then I recommend doing what Play Framework does, which is to encrypt your session data and store it directly in a cookie in the user's browser. This is fast and truly stateless, as in the sketch below.
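The answer's context is Java, but as a language-neutral illustration of the cookie-session technique, here is a minimal Node.js sketch (the secret is a placeholder, and Play also encrypts the data, which this sketch skips in favour of signing only):

    // Sketch: stateless session data signed into a cookie value.
    const crypto = require('crypto');
    const SECRET = 'replace-me'; // placeholder: load from configuration

    function encodeSession(data) {
      const body = Buffer.from(JSON.stringify(data)).toString('base64');
      const sig = crypto.createHmac('sha256', SECRET).update(body).digest('hex');
      return body + '.' + sig; // store as the cookie value
    }

    function decodeSession(cookieValue) {
      const [body, sig] = cookieValue.split('.');
      const expected = crypto.createHmac('sha256', SECRET).update(body).digest('hex');
      // A production version should compare with crypto.timingSafeEqual.
      if (sig !== expected) return null; // tampered or corrupt
      return JSON.parse(Buffer.from(body, 'base64').toString('utf8'));
    }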
If you have more data than can be stored in a cookie, and you don't want to use the data store, you could use JavaScript local storage in the browser and use AJAX calls to communicate with the server. (If you want to support older browsers you may need the jStorage wrapper library.)
If memcache isn't enough, you could use backends to maintain state. Use a resident backend (or a set of them) and route incoming requests from the frontend to the backend machine that has the state.
Docs: Python Java
