Algotrading - store WebSocket data in database

I'm experimenting with algo-trading. I have a WebSocket that delivers market data, and I'm trying to store it in a database. Other modules read the stored data and make the buy/sell decisions.
1. What would be the best database option here, considering that inserts need to be fast and concurrent reads are required?
2. How do I handle the WebSocket data, which contains a lot of data to store, without losing any of it?

To answer question 1:
You can use a memory-based database such as Redis.
For question 2:
You can use RabbitMQ or Kafka.
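A minimal sketch of the buffering idea behind putting a queue between the WebSocket and the database, assuming only Python's standard library: an in-process queue.Queue stands in for RabbitMQ or Kafka, and sqlite3 stands in for the real database. The function name drain_to_db, the ticks table and its columns are hypothetical and not from the question.

```python
import queue
import sqlite3


def drain_to_db(buffer, conn, batch_size=500):
    """Move everything currently in the buffer into the database, batch by batch."""
    written = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(buffer.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return written
        # One transaction per batch: far fewer commits than one per tick.
        with conn:
            conn.executemany(
                "INSERT INTO ticks (symbol, ts, price) VALUES (?, ?, ?)", batch
            )
        written += len(batch)


# Hypothetical usage: the WebSocket callback would only call buffer.put(...),
# so a slow database never blocks the reader.
buffer = queue.Queue()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, ts REAL, price REAL)")
for i in range(1200):
    buffer.put(("ABC", float(i), 100.0 + i * 0.01))

written = drain_to_db(buffer, conn)
stored = conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
```

An in-process queue is lost if the process crashes, which is one reason an external broker such as RabbitMQ or Kafka is suggested when no data may be dropped.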

To answer question 1:
Several people use RedisTimeSeries to store market data for analysis/display. Inserts and retrieval are fast.
For question 2:
You could run several consumers and use Redis for de-duplication so that nothing gets updated twice, but this will depend heavily on your stack.
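A minimal sketch of the de-duplication idea, assuming each message carries something that identifies it uniquely (here a hypothetical (symbol, seq) pair). A plain Python set stands in for Redis; with several consumer processes the "seen" check would have to live in a shared store, for example via the Redis SET command with the NX option, which only succeeds for the first writer.

```python
class Deduplicator:
    """Remembers message ids; a stand-in for a shared store such as Redis."""

    def __init__(self):
        self._seen = set()

    def first_time(self, message_id):
        """Return True only the first time a given id is offered."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


def process(messages, dedup):
    """Return the messages that were not seen before, in arrival order."""
    handled = []
    for msg in messages:
        if dedup.first_time((msg["symbol"], msg["seq"])):
            handled.append(msg)
    return handled


# Hypothetical feed in which one update arrives twice.
messages = [
    {"symbol": "ABC", "seq": 1, "price": 10.0},
    {"symbol": "ABC", "seq": 2, "price": 10.1},
    {"symbol": "ABC", "seq": 1, "price": 10.0},  # duplicate delivery
    {"symbol": "XYZ", "seq": 1, "price": 5.0},
]
handled = process(messages, Deduplicator())
```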

Related

Database Design for continuous data stream

I am currently developing a tool where one client A can send a continuous data stream (just text) to a server and another client B should be able to watch the data stream in real time, as in fetching the same data from the server again.
Of course the server should not send all the available data to client B since it can be a lot of text, so I am currently thinking about how to design this so that client B fetches only the newest data.
My first approach was to do it similarly to pagination, where client B sends another attribute client_lines = 10 to the server indicating how many lines of data it already possesses, and then we can query our database with where lines > client_lines.
But the database can grow quite large since we would have one database for many users each sending data which can have a lot of text-lines. So querying the complete database with data from different users does not seem like a super efficient solution.
Is there any smarter approach? Maybe using a NoSQL database like MongoDB?
You are looking for a topic: a pub-sub implementation in which multiple subscribers can receive the messages that are published, and each consumer can consume just the incremental bits. You can find good implementations of this in products like ActiveMQ, JMS, Kafka, Amazon SNS, Kinesis, and many more. It is occasionally implemented in a relational database, but it is rarely implemented well there. You are generally far better off using a dedicated solution.
Note, often a database will subscribe to the topic in order to receive updates, and bridge to the relational model.
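A minimal sketch of how "consume just the incremental bits" can work, assuming an append-only log where each subscriber remembers its own offset. The Topic class and its method names are hypothetical and only illustrate the idea; a real broker adds persistence, partitioning and delivery guarantees on top.

```python
class Topic:
    """In-memory append-only log; each reader tracks its own offset."""

    def __init__(self):
        self._log = []

    def publish(self, line):
        """Append a line and return the log length after the append."""
        self._log.append(line)
        return len(self._log)

    def read_from(self, offset):
        """Return the lines published since `offset` and the new offset."""
        new_lines = self._log[offset:]
        return new_lines, offset + len(new_lines)


# Client A publishes, client B polls with the offset it got last time.
topic = Topic()
topic.publish("line 1")
topic.publish("line 2")
first, offset = topic.read_from(0)

topic.publish("line 3")
second, offset = topic.read_from(offset)
```

The client only has to store one integer, and the server never rescans data the client has already received.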

Tools to Listen for Database Updates

I'm extremely new to databases, so I apologize in advance. I have been tasked with updating tables in one database whenever another is updated. From what I've read, this is considered bad practice (duplicating data); however, the two programs that use the two databases are too large to change so that they update the tables in both databases.
As a temporary solution, we are planning on implementing a service that pulls from one database and updates the tables in another. The tables may not have identical names, but they will hold the same attributes. At the moment, I believe one database is Oracle and the other is Postgres.
What frameworks / languages / services would be most appropriate to complete this plan of action? We were thinking about using Flask or another API, but were unsure if this is truly needed. Is there an easier solution than building an API?

Transfer data between NoSQL and SQL databases on different servers

Currently, I'm working on a MERN web application that'll need to communicate with a Microsoft SQL Server database on a different server but on the same network.
Data will only be "transferred" from the Mongo database to the MSSQL one based on a user action. I think I can accomplish this by simply transforming the data to transfer into the appropriate format on my Express server and connecting to the MSSQL via the matching API.
On the flip side, data will be transferred from the MSSQL database to the Mongo one when a certain field is updated in a record. I think I can accomplish this with a Trigger, but I'm not exactly sure how.
Do either of these solutions sound reasonable, or are there better/industry-standard methods that I should be employing? Any and all help is much appreciated!
There are (in general) two ways of doing this.
If the data transfer needs to happen immediately, you may be able to use triggers to accomplish this, although be aware of your error handling.
The other option is to develop some form of worker process in your favourite scripting language and run it on a schedule. (This would be my preferred option, as my personal familiarity with triggers is fairly limited.) If option 1 isn't viable, you could set your schedule to be very frequent, say once per minute or every x seconds, as long as a new task doesn't spawn before the previous one has completed.
The broader question, though, is whether you need to have data duplicated across two different sources. The obvious pitfall with this approach is consistency: should anything fail, you can end up with two data sources wildly out of sync with each other, and your approach will have to account for this.
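A minimal sketch of the scheduled-worker option, assuming two in-memory sqlite3 databases as stand-ins for the real source and target, and a hypothetical records table with an increasing id. The lock makes a run return immediately if the previous one is still in progress; using the highest id already present in the target as a watermark is an assumption of this sketch, not something from the answer.

```python
import sqlite3
import threading

_run_lock = threading.Lock()


def sync_once(source, target):
    """Copy rows the target has not seen yet; do nothing if a run is active."""
    if not _run_lock.acquire(blocking=False):
        return 0  # previous run has not finished, skip this tick
    try:
        last_id = target.execute(
            "SELECT COALESCE(MAX(id), 0) FROM records"
        ).fetchone()[0]
        rows = source.execute(
            "SELECT id, payload FROM records WHERE id > ? ORDER BY id", (last_id,)
        ).fetchall()
        with target:  # all-or-nothing per run
            target.executemany(
                "INSERT INTO records (id, payload) VALUES (?, ?)", rows
            )
        return len(rows)
    finally:
        _run_lock.release()


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
with source:
    source.executemany(
        "INSERT INTO records (id, payload) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )

first_run = sync_once(source, target)
second_run = sync_once(source, target)
with source:
    source.execute("INSERT INTO records (id, payload) VALUES (?, ?)", (4, "d"))
third_run = sync_once(source, target)
```

A scheduler (cron, or a loop with a sleep) would call sync_once at the chosen interval. Note that an id watermark only picks up new rows; changes to existing rows would need something like an updated-at column instead.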

SQL Server switching live database

A client has one of my company's applications which points to a specific database and tables within the database on their server. We need to update the data several times a day. We don't want to update the tables that the users are looking at in live sessions. We want to refresh the data on the side and then flip which database/tables the users are accessing.
What is the accepted way of doing this? Do we have two databases and rename the databases? Do we put the data into separate tables, then rename the tables? Are there other approaches that we can take?
Based on the information you have provided, I believe your best bet would be partition switching. I've included a couple links for you to check out because it's much easier to direct you to a source that already explains it well. There are several approaches with partition switching you can take.
Links: Microsoft and Cathrine Wilhelmsen's blog
Hope this helps!
I think I understand what you're saying: if the user is on a screen you don't want the screen updating with new information while they're viewing it, only update when they pull a new screen after the new data has been loaded? Correct me if I'm wrong. And Mike's question is also a good one, how is this data being fed to the users? Possibly there's a way to pause that or something while the new data is being loaded. There are more elegant ways to load data like possibly partitioning the table, using a staging table, replication, have the users view snapshots, etc. But we need to know what you mean by 'live sessions'.
Edit: with the additional information you've given me, partition switching could be the answer. The process takes virtually no time; it just changes the pointers from the old records to the new ones. The only issue is that you have to partition on something partitionable, like a date or timestamp, to differentiate old and new data. It's also an Enterprise-Edition feature (at least before SQL Server 2016 SP1), and I'm not sure what version you're running.
Possibly a better thing to look at is Read Committed Snapshot Isolation. It will ensure that your users only see the new data after it's committed; it provides a statement-level consistent view of the data (full snapshot isolation gives a transaction-level one) and has minimal concurrency issues, though there is more overhead in TempDB. Here are some resources for more research:
http://www.databasejournal.com/features/mssql/snapshot-isolation-level-in-sql-server-what-why-and-how-part-1.html
https://msdn.microsoft.com/en-us/library/tcbchxcb(v=vs.110).aspx
Hope this helps and good luck!
The question details are a little vague so to clarify:
What is a live session? Is it a session in the application itself (with app code managing its own connections to the database), or is it a low-level connection per user/session? Are users just running reports, or actively reading from and writing to the database during the session? When is a session over, and how do you know?
Some options:
1) Pull all the data into the client for the entire session.
2) Use read committed snapshot isolation or partitions as mentioned in other answers (however, this requires careful setup of your queries and increases the requirements on the database)
3) Use replica database for all queries, pause/resume replication when necessary (updating data should be faster than your process but it still might take a while depending on volume and complexity)
4) Use replica database and automate a backup/restore from the master (this might be the quickest depending on the overall size of your database)
5) Use multiple replica databases with replication or backup/restore and then switch the connection string (this allows you to update the master constantly and then update a replica and switch over at a certain predictable time)
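A minimal sketch of the "load on the side, then flip" idea from the question, in its table-rename form. It assumes sqlite3 as a stand-in for SQL Server and hypothetical table names (prices, prices_staging); it is not partition switching, only an illustration of loading a staging table and swapping names inside one transaction so readers see either the old table or the new one.

```python
import sqlite3


def refresh_and_swap(conn, new_rows):
    """Load new data into a staging table, then swap it in atomically."""
    # Slow part: runs on the side, the live table is untouched.
    conn.execute("DROP TABLE IF EXISTS prices_staging")
    conn.execute("CREATE TABLE prices_staging (symbol TEXT, price REAL)")
    conn.executemany("INSERT INTO prices_staging VALUES (?, ?)", new_rows)

    # Fast part: the renames happen in a single transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE prices RENAME TO prices_old")
        conn.execute("ALTER TABLE prices_staging RENAME TO prices")
        conn.execute("DROP TABLE prices_old")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT above fully controls the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE prices (symbol TEXT, price REAL)")
conn.execute("INSERT INTO prices VALUES ('ABC', 1.0)")

refresh_and_swap(conn, [("ABC", 2.0), ("XYZ", 3.0)])
rows = conn.execute("SELECT symbol, price FROM prices ORDER BY symbol").fetchall()
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
```

How sessions that are in the middle of a read behave during the swap depends on the database engine and isolation level, which is why the answers above ask what a "live session" means.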

Data Processing / Mining Question

I'm starting to work on a financial information website (somewhat like google finance or bloomberg).
My website needs to display live currency, commodity, and stock values. I know how to do this frontend-wise, but I have a question about storing the data on the backend (I already have the data feed APIs):
How would you guys go about this - would you set up your own database and save all the data in the db with some kind of a backend worker, and then plug in your frontend to your db, or would you plug your frontend directly to the API and not mine the data?
Mining the data could be good for later reference (statistics and other things that the API won't allow), but can such a big quantity of ever-growing information be stored in a database? Is this feasible? What other things should I be considering?
Thank you - any comment would be much appreciated!
First, I'd cleanly separate the front end from the code that reads the source APIs. Having done that, I could have the code that reads the source APIs feed the front end directly, feed a database, or both.
I'm a database guy. I'd lean toward feeding data from the APIs into a database, and connecting the front end to the database. But it really depends on the application's requirements.
Feeding a database makes it simple and cheap to change your mind. If you (or whoever) decide later to keep no historical data, just delete old data after storing new data. If you (or whoever) decide later to keep all historical data, just don't delete old data.
Feeding a database also gives you fine-grained control over who gets to see the data, relatively independent of their network operating system permissions. Depending on the application, this may or may not be a good thing.
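A minimal sketch of the "change your mind later" point about historical data, assuming sqlite3 as the database and a hypothetical quotes table. The keep_seconds parameter is an assumption of this sketch: set it to prune old rows right after each insert, or leave it as None to keep the full history.

```python
import sqlite3


def store_and_prune(conn, row, keep_seconds=None):
    """Insert one quote; optionally delete rows older than the retention window."""
    symbol, ts, price = row
    with conn:  # insert and prune commit together
        conn.execute(
            "INSERT INTO quotes (symbol, ts, price) VALUES (?, ?, ?)",
            (symbol, ts, price),
        )
        if keep_seconds is not None:
            conn.execute("DELETE FROM quotes WHERE ts < ?", (ts - keep_seconds,))


def load(keep_seconds):
    """Feed four hypothetical quotes and return the timestamps that remain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (symbol TEXT, ts REAL, price REAL)")
    for ts in (0.0, 10.0, 20.0, 30.0):
        store_and_prune(conn, ("EURUSD", ts, 1.1), keep_seconds)
    return [r[0] for r in conn.execute("SELECT ts FROM quotes ORDER BY ts")]


pruned = load(keep_seconds=15)
full_history = load(keep_seconds=None)
```

Switching between the two policies is a one-argument change in the worker, and the front end does not need to know which one is in effect.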
