Floating Point Precision using REST (SQL Server)

I'm pulling time-series data from an MS-SQL database using REST. I've found that floating-point precision drops from a value like 0.00166667 to 0.002 when I retrieve data over REST, but the precision is maintained when using the DB designer's own tools.
Is this a limitation of the REST method, or is it something specific to the implementation?
Just to clarify: my workplace uses a proprietary database that uses MS-SQL as its backbone. It's not open source, so I can't poke around and see how requests are being handled.
A SOAP method is offered, which I'm going to try to implement for comparison, but I'm mainly concerned with whether or not this is a REST problem at all.

Representational State Transfer is just a general style of client-server architecture. It doesn't specify anything nearly so detailed as the appropriate handling of floating-point values; the only constraints it imposes are things like requiring communication to be stateless. The concepts of REST exist at a higher level of abstraction, so the issue you are seeing must be something specific to the implementation of the service that is providing the floating-point values.
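To make that concrete, here is a hypothetical illustration in Python of where such loss can creep in. The rounding step is an assumption for demonstration only; the actual service could be truncating, formatting, or casting at any layer:

```python
import json

value = 0.00166667

# JSON itself preserves a Python float across a round-trip,
# so the transport format is not the culprit here.
assert json.loads(json.dumps({"v": value}))["v"] == value

# But if the service rounds or formats values before serializing them,
# the client sees exactly the loss described in the question:
rounded = round(value, 3)
print(rounded)  # 0.002
```

In other words, the precision is being discarded by whatever code prepares the response, not by REST as such.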

Related

How to compare the features and performance of JuliaDB and Queryverse? Which is better?

I've noticed that there is some feature overlap among these packages.
Please guide me in comparing the features and performance of JuliaDB and Queryverse and deciding which is the better choice.
JuliaDB.jl and Queryverse operate on different layers of abstraction.
Queryverse provides tools for the manipulation and visualization of various data sources, but does not provide a data-source layer itself.
JuliaDB.jl, on the other hand, provides a specific data-source implementation, which is particularly valuable when working with very large data sets that do not fit into RAM and are processed in a distributed manner. The closest alternative to JuliaDB.jl is the DataFrames.jl package. A brief comparison of the two is given here, so you can see that each has its uses in different contexts. Queryverse works "on top" of any of these sources.
You might also want to have a look at the Tables.jl package, which defines a low-level API for tabular data. In particular, even a NamedTuple of vectors or a vector of NamedTuples can be treated as tabular data.
One thing you should keep in mind when working with Queryverse is that, for type-inference reasons, it defines its own notion of missingness in the DataValues.jl package, which is not the same as the Missing type defined in Base.

train a classifier directly from mySQL database

I recently started a position working as a data scientist on ML. My question is as follows: is it possible to train an algorithm directly from a MySQL database, and is the process similar to training from a CSV file? Moreover, I would like to know about working with a very unbalanced dataset: when you use, for instance, 20% of the data for testing, does the split divide the negative and positive cases between training and testing in equal proportion? Can anyone suggest a good tutorial or documentation?
Sure, you can train your model directly from the database; this is what happens all the time in production systems. Your software should be designed so that it does not matter whether your data source is SQL, CSV, or something else. Since you don't mention the programming language, it is hard to say exactly how to do it, but in Python you can take a look here: How do I connect to a MySQL Database in Python?
If your data set is unbalanced, as it often is in reality, you can use class weights to make your classifier aware of that; e.g., in Keras/scikit-learn you can just pass the class_weight parameter. Be aware that if your data set is too small, you can run into problems with default measures like accuracy. Better to take a look at the confusion matrix or other metrics like the Matthews correlation coefficient.
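As a minimal sketch of both points, assuming a reachable MySQL instance with a table named samples and a binary label column (the connection string, table, and column names here are illustrative, not prescribed):

```python
import pandas as pd
from sqlalchemy import create_engine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical connection string and table name -- adjust for your setup.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
df = pd.read_sql("SELECT * FROM samples", engine)

X = df.drop(columns=["label"])
y = df["label"]

# stratify=y keeps the positive/negative ratio the same in both splits,
# which answers the question about proportions in train vs. test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# class_weight="balanced" reweights classes inversely to their frequency,
# so the minority class is not drowned out during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(confusion_matrix(y_test, pred))
print(matthews_corrcoef(y_test, pred))
```

Note the stratify=y argument: that is what keeps the class proportions equal across the training and test splits.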
Another good reference:
How does the class_weight parameter in scikit-learn work?

Consensus algorithm for Node.js

I'm trying to implement a collaborative canvas on which many people can draw freehand or with specific shape tools.
The server has been developed in Node.js and the client in AngularJS 1 (and I am pretty new to both).
I must use a consensus algorithm so that the canvas always shows the same content to all users.
I'm seriously struggling with it, since I cannot find a proper tutorial on its use. I have been studying Paxos implementations, but it seems like Raft is much more widely used in practice.
Any suggestions? I would really appreciate it.
Writing a distributed system is not an easy task[1], so I'd recommend using an existing strongly consistent one instead of implementing one from scratch. The usual suspects are ZooKeeper, Consul, etcd, and Atomix/Copycat. Some of them offer Node.js clients:
https://github.com/alexguan/node-zookeeper-client
https://www.npmjs.com/package/consul
https://github.com/stianeikeland/node-etcd
I've personally never used any of them with Node.js, though, so I won't comment on the maturity of the clients.
If you insist on implementing consensus on your own, then Raft should be easier to understand; the paper is surprisingly accessible: https://raft.github.io/raft.pdf. There are also some Node.js implementations, but again, I haven't used them, so it is hard to recommend any particular one. The Gaggle README contains an example, and Skiff has an integration test that documents its usage.
Taking a step back, I'm not sure that distributed consensus is what you need here. It seems like you have multiple clients and a single server, so you can probably use a centralized data store. The problem domain is not really that distributed either: shapes can be overlaid one on top of the other in the order they are received by the server, FIFO (imagine multiple people writing on the same whiteboard: the last one wins). The challenge is with concurrent modifications of existing shapes, but maybe you can fall back to last/first change wins or something like that; a sketch of the last-write-wins approach follows below.
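As a minimal sketch of that last-write-wins idea (the store, shape structure, and method names here are invented for illustration; this is not any particular framework's API):

```python
from dataclasses import dataclass, field

@dataclass
class CanvasStore:
    """Centralized store: the server applies updates strictly in the
    order it receives them, so the last write to a shape id wins."""
    shapes: dict = field(default_factory=dict)

    def apply(self, shape_id: str, shape_data: dict) -> None:
        # No consensus needed: arrival order at the single server
        # is the total order every client will see.
        self.shapes[shape_id] = shape_data

    def snapshot(self) -> dict:
        # Broadcast this to clients so everyone renders the same canvas.
        return dict(self.shapes)

# Two clients modify the same shape; the later arrival wins.
store = CanvasStore()
store.apply("rect-1", {"x": 10, "y": 10, "color": "red"})
store.apply("rect-1", {"x": 10, "y": 10, "color": "blue"})
print(store.snapshot())  # rect-1 is blue
```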
Another interesting avenue to explore here would be Conflict-free Replicated Data Types (CRDTs). Folks at GitHub used them to implement collaborative "pair" programming in Atom. See the Atom Teletype blog post; their implementation may also be useful, as collaborative editing seems to be exactly the problem you are trying to solve.
Hope this helps.
[1] Take a look at the Jepsen series https://jepsen.io/analyses where Kyle Kingsbury tests various failure conditions of distributed data stores.
Try reading Understanding Paxos. It's geared towards software developers rather than an academic audience. For this particular application you may also be interested in the Multi-Paxos Example Application referenced by the article; it's intended to help illustrate the concepts behind the consensus algorithm, and it sounds like it's almost exactly what you need for this application. Raft and most Multi-Paxos designs tend to get bogged down with an overabundance of accumulated history, which generates a new set of problems to deal with beyond simple consistency. An initial prototype could easily send the full state of the drawing on each update and ignore the history issue entirely, which is what the example application does. Later optimizations could reduce the network overhead.

Is there a lightweight database system that supports interpolation and gaps in time-series data?

I need to implement a system for storing timestamped sensor data from several devices on an embedded platform. According to this related question, a relational database is the preferred solution for storing that kind of data, and I have therefore been looking at SQLite.
However, I also need the database to be able to answer questions such as "What was the indoor temperature on Sep 12 at 13:15", even if the sensor data was not recorded precisely at that time. In other words, I need the database to be able to handle interpolation. As far as I could tell, SQLite cannot handle this, nor can the usual suspects (MySQL, PostgreSQL).
In addition, I would also need the database to be able to detect gaps in the data.
This related question seems to deal with mainframe-ish databases and not with embedded ones.
Therefore: Is there a database system suitable for embedded platforms that supports the "usual" operations one might want to perform on time-series data?
You shouldn't want or expect a database to do interpolation for you. Just pull out the nearest values to your desired time and write your own interpolation. Only you know what type of interpolation is appropriate: maybe simple linear interpolation across two points, maybe a higher-order polynomial across more points. It really depends on your system and your data modeling situation.
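As a minimal sketch of that approach, assuming a SQLite table readings(ts REAL, value REAL) with Unix timestamps (the schema is invented for illustration):

```python
import sqlite3

def interpolate(conn: sqlite3.Connection, ts: float) -> float | None:
    """Linear interpolation between the nearest readings on either side of ts."""
    cur = conn.cursor()
    before = cur.execute(
        "SELECT ts, value FROM readings WHERE ts <= ? ORDER BY ts DESC LIMIT 1",
        (ts,),
    ).fetchone()
    after = cur.execute(
        "SELECT ts, value FROM readings WHERE ts >= ? ORDER BY ts ASC LIMIT 1",
        (ts,),
    ).fetchone()
    if before is None or after is None:
        return None  # ts falls outside the recorded range
    (t0, v0), (t1, v1) = before, after
    if t0 == t1:  # exact hit on a recorded sample
        return v0
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts REAL, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(100.0, 20.0), (200.0, 22.0)])
print(interpolate(conn, 150.0))  # 21.0
```

Gap detection can be handled the same way on the client side: scan consecutive timestamps in order and flag any pair whose difference exceeds your expected sampling interval.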
Specialized time-series databases may be what you need. Try:
RRDTool (simple utility, may be sufficient for you)
OpenTSDB
InfluxDB
Given your use case, it may also be relevant to take a totally different approach: use an "IoT"-targeted data store optimized for simple inserts (Xively, Phant.io) and then post-process for time-series analysis.

UML representation for tasks

I am in the process of designing a system with many tasks and a lot of inter-task messages. The system will basically be developed in C.
In my design I am trying to use UML to show the messages that are passed between tasks, but it is becoming difficult to represent things like decision making.
Is there any predefined method for creating a flow chart for task-based systems that use a lot of messages?
It need not be UML; is there any other standard method that can be used for this design?
For documenting message flow, I have found that state machines and sequence diagrams each have their place. State machines are better at describing the decisions that change the state of a system. Sequence diagrams are better at describing the messages that implement a specific element of a protocol.
Since I like to use Doxygen for internal documentation anyway, and it likes to draw call graphs and other figures with the GraphViz tool dot, I started using dot to document my state machines. Since Doxygen has a syntax for including dot language directly in the source code (and even allows hyperlinks from elements in the drawing to other pages of the generated documentation), this has been really convenient. Recently, Doxygen grew explicit support for sequence diagrams expressed with mscgen, allowing both styles of diagram to be used.
Having the figures expressed in a reasonably natural way directly in the source code makes them a lot more likely to be maintained than if they were drawn externally in Visio or some other drawing tool.
Maybe you need state machines or sequence diagrams.
Try a tool called Umbrello if you are representing your design in UML. It gives you a lot of flexibility in representing your design.
Use a sequence diagram or a state machine diagram with MARTE (the UML profile for modeling and analysis of real-time and embedded systems) annotations, since I noticed you are working with a real-time operating system.
