How to save streaming data to InfluxDB? - database

I am trying to save data to my InfluxDB database as it arrives, in a streaming fashion (with the least possible delay). Currently I save it in batches.
Current setup - interval based
Currently I have an Airflow instance that reads the data from a REST API every 5 minutes and then saves it to InfluxDB.
Desired setup - continuous
Instead of saving data every 5 minutes, I would like to establish a connection via a WebSocket (I guess) and save the data as it arrives. I have never done this before and I am confused about how it is actually done. Some questions I have are:
Once I write the code for it, do I keep it running as a daemon?
Do I need to use something like Telegraf for this, or is that not really the case (example article)?
Instead of Airflow (since it is for batch processing), do I need to use something like Apache Beam or Spark?
As you can see, I am quite lost on where to start, what to read and how to make sense of all this. Any advice on direction and/or guidance for a setup would be much appreciated.

If I understand correctly, you want to write a Java service that processes the incoming data, so one solution is to implement a WebSocket endpoint with, for example, Jetty.
From there you receive the data (in JSON format, for example) and process it with the influxdb-java client library, which you use to write to the database. influxdb-java lets you create and manage the data.
I don't know Airflow or how you produce the data, so there may be built-in tools (InfluxDB sinks) that can save you some work in your context.
I hope this gives you some guidelines to start digging further.
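InfluxDB's write API accepts points as line protocol text: a measurement, optional tags, one or more fields and a timestamp. Whatever receives the WebSocket messages has to turn each one into such a record. The sketch below does that in plain Java with no dependencies, assuming float-only fields; it is not the influxdb-java API (which, as far as I know, builds the same format for you via its Point class), and the measurement, tag and field names in main are hypothetical.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class LineProtocol {
    // Measurement names must escape commas and spaces.
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    // Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    static String escapeKey(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    // Builds "measurement,tag=value field=1.5 timestampNanos".
    // Assumption: only float fields are supported in this sketch.
    static String toLine(String measurement, Map<String, String> tags,
                         Map<String, Double> fields, long timestampNanos) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("line protocol requires at least one field");
        }
        StringBuilder sb = new StringBuilder(escapeMeasurement(measurement));
        // Tags are emitted sorted by key (TreeMap), which InfluxDB recommends for write performance.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            sb.append(',').append(escapeKey(tag.getKey()))
              .append('=').append(escapeKey(tag.getValue()));
        }
        StringJoiner fieldSet = new StringJoiner(",");
        for (Map.Entry<String, Double> field : new TreeMap<>(fields).entrySet()) {
            fieldSet.add(escapeKey(field.getKey()) + "=" + field.getValue());
        }
        sb.append(' ').append(fieldSet).append(' ').append(timestampNanos);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical reading as it might arrive over the WebSocket.
        String line = toLine("weather", Map.of("city", "New York"),
                Map.of("temp", 21.5), 1_700_000_000_000_000_000L);
        System.out.println(line); // weather,city=New\ York temp=21.5 1700000000000000000
    }
}
```

Each line, or several lines joined by newlines, would then be sent to InfluxDB. Writing small batches rather than one request per point is usually cheaper, at the cost of a little extra latency.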

Related

How does batch processing over multiple loops work in Apache Flink?

I want to clarify my understanding of the following.
Use case
Basically I am running a Flink batch job. My requirements are the following:
I have 10 tables with raw data in PostgreSQL
I want to aggregate that data using a 10-minute tumbling window
I need to store the aggregated data in aggregated PostgreSQL tables
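To make the "10-minute tumbling window" requirement concrete: every row falls into exactly one non-overlapping bucket whose start is the timestamp rounded down to a multiple of the window size. The plain-Java sketch below shows only that bucketing and a per-window sum; it is not Flink code, it assumes epoch-aligned windows with no offset (Flink's default for tumbling windows, as far as I know), and the parallel-array input format is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindow {
    // Window size from the requirement: 10 minutes, in milliseconds.
    static final long WINDOW_MS = 10 * 60 * 1000L;

    // Start of the tumbling window containing an epoch-millis timestamp.
    // floorDiv keeps negative (pre-1970) timestamps in the correct window.
    static long windowStart(long tsMillis) {
        return Math.floorDiv(tsMillis, WINDOW_MS) * WINDOW_MS;
    }

    // Sums values per window; timestamps[i] belongs with values[i].
    static Map<Long, Double> sumPerWindow(long[] timestamps, double[] values) {
        Map<Long, Double> sums = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            sums.merge(windowStart(timestamps[i]), values[i], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long[] ts = {0L, 599_999L, 600_000L};
        double[] vals = {1.0, 2.0, 5.0};
        System.out.println(sumPerWindow(ts, vals)); // {0=3.0, 600000=5.0}
    }
}
```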
My pseudocode looks roughly like this:
initialize StreamExecutionEnvironment, StreamTableEnvironment
load all the configs from file
configs.foreach(
    load data from table
    aggregate
    store data
    delete temporary views created
)
streamExecutionEnvironment.execute()
Everything works fine for now. Still, I have one question. I think that with this approach all the load functions would be executed simultaneously. Would that put load on Flink, since all the data is being loaded at the same time? Or is my understanding wrong, and the data is loaded, processed and stored one config at a time? Please advise.

Saving content as JSON to a database in real time

I have a text editor built on Tiptap that sends its content as JSON to an API, which saves it to a MongoDB database. I have a timeout on it so it only sends a request after the user has stopped typing for 1000 ms.
Currently it sends the entire document in the request body making it very taxing performance-wise so I'm trying to figure out a way to identify the parts that have been changed, and only send the parts that have been changed in my request body.
Because the content is being saved as JSON, I've been trying to find the index of the node so I can update that node, but I haven't found a clear-cut way of doing it. I've also been looking into Steps in the ProseMirror docs, but I can't figure out how to apply steps to the JSON content saved in the DB.
I've tried assigning uuids to each node as an attribute, to later iterate through the JSON saved in the database but it seems highly inefficient and I'd really love some input as to how:
this could be achieved
OR in the case that my approach is entirely out of convention, how content is usually saved to a database
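One way to make the uuid-per-node idea cheaper is to diff before sending instead of searching the stored JSON afterwards: keep the last saved snapshot as a map from each top-level node's id to its serialized JSON, and send only the nodes that were added or changed plus the ids that disappeared. The logic is language-agnostic; below is a minimal Java sketch. It assumes each top-level node carries a hypothetical id attribute, that both snapshots are serialized the same way (string comparison is sensitive to key order), and it ignores node order, which would need an extra position field. The sample JSON in main is simplified and hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NodeDiff {
    // What to send to the API: nodes to insert or replace by id, and ids to delete.
    static final class Patch {
        final Map<String, String> upserts = new LinkedHashMap<>();
        final List<String> deletes = new ArrayList<>();
    }

    // previous/current map each top-level node's (hypothetical) id attribute
    // to that node's serialized JSON.
    static Patch diff(Map<String, String> previous, Map<String, String> current) {
        Patch patch = new Patch();
        for (Map.Entry<String, String> e : current.entrySet()) {
            // New node (no previous entry) or changed content.
            if (!e.getValue().equals(previous.get(e.getKey()))) {
                patch.upserts.put(e.getKey(), e.getValue());
            }
        }
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                patch.deletes.add(id);
            }
        }
        return patch;
    }

    public static void main(String[] args) {
        Map<String, String> saved = Map.of(
                "n1", "{\"type\":\"paragraph\",\"text\":\"Hello\"}",
                "n2", "{\"type\":\"paragraph\",\"text\":\"World\"}");
        Map<String, String> edited = Map.of(
                "n1", "{\"type\":\"paragraph\",\"text\":\"Hello\"}",
                "n2", "{\"type\":\"paragraph\",\"text\":\"World!\"}");
        Patch patch = diff(saved, edited);
        System.out.println(patch.upserts.keySet() + " " + patch.deletes); // [n2] []
    }
}
```

On the server, each upsert then becomes an update of a single node (or a single document per node) rather than a rewrite of the whole document.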
If you want to save your content in real time, you need to do it more efficiently. Here are some solutions:
Create a debouncer that saves your content to the database once the user stops typing, or add a save button.
Use UDP instead of TCP: it's faster, but it's unreliable, so you may lose some data.
Use an in-memory store such as Redis instead of Mongo to save data faster, and then save the data to your main database (MongoDB) after a while.
To get real-time chat and real-time saving, use a WebSocket instead of sending a request and waiting for a handshake each time. With this approach you can show the text data on the client side over the WebSocket, and also send the data to Redis as temporary storage before saving it in MongoDB for permanent storage.
For better performance and safety, it is best to combine all of these approaches.
These solutions come from my personal experience; I have done this before.
I'm new to Tiptap and ProseMirror but :
You might want to look at Tiptap's Collaborative editing.
You could intercept the transactions being sent to the socket server (they should only contain the updated part; I guess it only sends ProseMirror transactions). And since the socket server is able to send the full document to a newly connected user, I guess it is always able to reconstruct the full document. So you could potentially replace your system with Hocuspocus.
PS: it is being developed right now, so you have to pay a little to get access to it. But it could save you time.

Obtaining Raw Data from NagiosXI and/or OPSview

I am currently working on completing my Master's thesis project. To do so, I need to be able to obtain the raw data accumulated in NagiosXI and/or OPSview. Because both are based on the Nagios core, I assume the method of obtaining the raw data may be similar. I need this raw data so that I can later perform specific statistical calculations related to my thesis. I have looked online and so far found some Nagios plugins that obtain raw data and then manipulate it for graphs and visuals, but I need the raw numbers to complete my calculations.
I am also researching whether I could create a PHP script, or something in another language, that extracts the data from Nagios and saves it in a Word or Excel document. However, this would be a bit of extra work, as I am unfamiliar with both PHP and MySQL queries. Because of this, I hope to find a plugin, or something similar, that can get the data for me.
Cyanide,
I can't speak for NagiosXI, but I can for Opsview :)
You could access the data that is stored in the RRD files. You can use rrdtool dump to pull the values out or use a URL like: /rrdfetch?start=1307608993&end=1307695393&hsm=opsview%3A%3ACheck%20Loadavg%3A%3Aload1&hsm=opsview%3A%3ACheck%20Loadavg%3A%3Aload5
And this returns back the JSON data points. This is undocumented, but is used to power the interactive javascript graphing.
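The /rrdfetch URL above has a regular shape: start and end are Unix epoch seconds (the example spans exactly 86400 s, one day), and each hsm parameter is a percent-encoded string that appears to be host::service::metric. That reading is inferred from the example only, since the endpoint is undocumented. The sketch below rebuilds such a URL in Java so it can be generated for other checks or time ranges; the server host and any authentication are not shown and are left to you.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

public class RrdFetchUrl {
    // Percent-encodes a query value. URLEncoder emits '+' for spaces,
    // so switch to %20 to match the URL in the answer.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds the /rrdfetch path for the interval [end - seconds, end].
    // Assumption: each hsm is "host::service::metric" (inferred from the example URL).
    static String build(long endEpochSeconds, long seconds, List<String> hsms) {
        StringJoiner query = new StringJoiner("&", "/rrdfetch?", "");
        query.add("start=" + (endEpochSeconds - seconds));
        query.add("end=" + endEpochSeconds);
        for (String hsm : hsms) {
            query.add("hsm=" + encode(hsm));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Reproduces the example URL: a one-day window for load1 and load5.
        System.out.println(build(1307695393L, 86_400L,
                List.of("opsview::Check Loadavg::load1", "opsview::Check Loadavg::load5")));
    }
}
```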
Alternatively, if you have ODW enabled with full statistics, then the raw data is stored in the ODW database and you can then extract the raw data with SQL commands. See http://docs.opsview.com/doku.php?id=opsview-community:odw for more information.
Ton
You can try using MK Livestatus: http://mathias-kettner.de/checkmk_livestatus.html
or Nagios2JSON: http://exchange.nagios.org/directory/Addons/APIs/JSON/Nagios2JSON/details
All these tools get you status data without needing to go to the database or the status file. Since XI is based on Nagios, they should still work with it.
Please take a look at http://dmytro.github.com/nagira
It's a web services API to access Nagios data. You can get all hosts, service status data, objects configuration in multiple formats JSON, XML or YAML.

BizTalk 2006 - Copy a received file to a new directory

I want to be able to copy the file I have which comes in as XML into a new folder location on the server. Essentially I want to hold a back up of the input files in a new folder.
What I have done so far is try to follow what has been said on this forum post - link text
At first I tried the last method, which didn't do anything (renaming the file while reading). So I tried one of the other options: I altered the orchestration and put a Send shape just after the Receive shape, so the same message that comes in is sent out to the logical port. I exported the MSI and created a Send Port in the Admin console that points to my copy location. It copies the file, but it keeps creating a new copy every second. The Event Viewer also reports warnings saying "The file exists". I have set the Copy Mode of the port to 'Overwrite' and to 'Create New', and neither works.
I have looked on Google but nothing helps. BTW, I support BizTalk, but I have no idea how pipelines and ports work, so any help would be appreciated.
Thanks for the quick responses.
As David has suggested, I want to be able to track the message off the wire before BizTalk does any processing on it.
I have tried the CodePlex link that Ben supplied, and it points to 'Atomic-Scope's BizTalk Message Archiving Pipeline Component', which it looks like my client will have to pay for. I have downloaded the trial and will see if I have any luck.
David - I agree that the orchestration should represent the business flow and making a copy of a file isn't part of the business process. I just assumed when I started tinkering around I could do it myself in the orchestration as suggested on the link I posted.
I'd also rather not rely on the BizTalk tracking within the message box database as I suppose the tracked messages will need to be pruned on a regular basis. Is that correct or am I talking nonsense?
However, is there a way I can do what Atomic-Scope have done that may be cheaper?
**Hi again, I have figured it out from David's original post. As he indicated, I created a Send port that just has a "Filter" expression like BTS.ReceivePortName == ReceivePortName.
Thanks all**
As the post you linked to suggests there are several ways of achieving this sort of result.
The first question is: What do you need to track?
It sounds like there are two possible answers to that question in your case, which I'll address separately.
You need to track the message as received off the wire before BizTalk touches it
This scenario often arises where you need to be able to prove that your BizTalk solution is not the source of any message corruption or degradation being seen in messages.
There are two common approaches to this:
Use a pipeline component such as the one Ben Runchey suggests
There is another example of a pipeline component for archiving here on codebetter.com. It looks good; just be careful, if you use other components and depending on where you place this one, that you still follow proper BizTalk streaming-model practices. BizTalk pipelines are all forward-only streaming, meaning your stream can be read only once, and all the work on it happens in an event-driven manner.
This is a good approach, but with the following caveats:
You need to be careful about the streaming employed within the pipeline component
You are not actually tracking the on-the-wire message: what your pipeline actually sees is the message after it has gone through the BizTalk adapter (e.g. HTTP adapter, File, etc.)
Rely upon BizTalk's out of the box tracking
BizTalk automatically persists all messages to the message box database and if you turn on BizTalk tracking you can make BizTalk keep these messages around.
The main downside here is that enabling this tracking will result in some performance degradation on your server. Depending on the exact scenario, this may not be a huge hit, but it can be significant.
You can track the message after it has gone through the initial receive pipeline
With this approach there are two main options: use a pure messaging send port subscribing to the receive port, or use an orchestration send port.
I personally do not like the idea of using an orchestration send port. Orchestrations are generally best used to model the business flow needed. Unless this archiving is part of the business flow as understood by standard users, it could simply confuse what does what in your solution.
The approach I tend to use is to create a messaging send port in the BizTalk admin console that subscribes to your receive port. The send port will then just use a standard BizTalk file adapter, with a pass through pipeline.
I think you should look at the BizTalk Message Archiving pipeline component. You can find it on CodePlex (http://www.codeplex.com/btsmsgarchcomp).
You will have to create a new pipeline and deploy it to your BizTalk group. Then update your receive pipeline to archive the file to a location that the host running this receive location has access to.

Writing data into a database using a fully REST web service

How would one create a REST web service to write a row into a database table? Use the following scenario:
The table is called Customer; the data to be inserted into the row would be the name, address, telephone number and email.
I think it's impossible to describe the whole thing end to end in Java or C#, and I would never expect that, but here are the questions popping into my head as I prepare for coding:
How the URI would look (eg. if you use this URL - http://www.example.com/)?
What info would go into the HTTP envelope?
Would I use POST when writing to the database in this way?
Do I use a resource to store the posted data from the client? Is this even necessary if the data is being written to a database anyway?
When the data to be written into the DB is received by the server, how do I physically insert it into the database? Do I call some method on the server to actually write the data (in Java)? That doesn't seem to fit with a truly REST architecture, which shuns RPC calls.
Should I even be bothering writing to a DB - should I be storing my data as a resource?
As you can see I need a few issues clearing in my head. Any help much appreciated.
First of all, I'm neither a Java nor a C# expert, and I don't know exactly what support these languages have for REST design, but in general:
http://www.example.com/customers - customers is a collection of resources and you want to add a new resource to this collection
It depends on various things - you should probably set the content-type header (according to the data format in which you are sending the representation) and set some authentication headers if you need it.
Yes, POST to the collection is the usual way to create a new entry when the server assigns its URI (PUT can also create a resource when the client chooses the URI).
I don't fully understand this question, to be honest. What do you mean by "immediately writing data into the database"?
REST is primarily just a style of communication between a server and a client. It doesn't say anything about how you should handle the data you receive. Modern web frameworks (MVC style) usually solve this by routing every REST action to a method of some class (usually a controller instance), where you handle the received parameters (e.g. save them to the database) and generate a response to send back.
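The routing idea above can be sketched with the JDK's built-in com.sun.net.httpserver.HttpServer, chosen here only because it needs no dependencies; a real service would more likely use JAX-RS or an MVC framework. A POST to /customers is routed to a handler method that stores the request body and answers 201 Created with a Location header naming the new resource. The in-memory map is a hypothetical stand-in for the Customer table: a real implementation would run an INSERT (for example through JDBC) at that point, so "calling a method on the server" is exactly what happens, just hidden behind the uniform HTTP interface.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class CustomerService {
    // Hypothetical in-memory stand-in for the Customer table.
    static final Map<Long, String> CUSTOMERS = new ConcurrentHashMap<>();
    static final AtomicLong NEXT_ID = new AtomicLong(1);

    // Stores the posted representation and returns the new resource's URI path.
    // A real service would INSERT name, address, telephone and email here.
    static String createCustomer(String body) {
        long id = NEXT_ID.getAndIncrement();
        CUSTOMERS.put(id, body);
        return "/customers/" + id;
    }

    // Handler for the /customers collection: only POST is supported in this sketch.
    static void handle(HttpExchange ex) throws IOException {
        if (!"POST".equals(ex.getRequestMethod())) {
            ex.getResponseHeaders().add("Allow", "POST");
            ex.sendResponseHeaders(405, -1); // Method Not Allowed, no body
            ex.close();
            return;
        }
        String body = new String(ex.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
        ex.getResponseHeaders().add("Location", createCustomer(body));
        ex.sendResponseHeaders(201, -1); // 201 Created, no body
        ex.close();
    }

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/customers", CustomerService::handle);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("POST customer data to http://localhost:8080/customers");
    }
}
```

A client would send the customer's name, address, telephone number and email as, say, a JSON body with a Content-Type: application/json header, and read the new customer's URI from the Location header of the response.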
For a very brief and very clear introduction to REST have a look at this short video.
RESTful Web Services, published by O'Reilly and Associates, seems to fit the bill you're looking for.
As far as doing it in Java, Sun has a page on it.
