I'm new to TimescaleDB and I have a question about how to model my data with it and hypertables.
Just to give you a little bit of context, in the project I'm developing, I have time series data coming from a PLC with several sensors and, besides that, I have to store user information (email, password, etc.).
I discovered TimescaleDB and thought it would fit my needs because it allows me to store the user data and the time series data in the same database.
However, what I don't understand is this: should the user data be modeled as normal PostgreSQL tables, with hypertables used only for the time-series data? Or should both of them be inside a hypertable?
Do you think using TimescaleDB in this case is a good approach?
Thanks for reading my question!
We're building an app that will have a number of games. Kids will learn math as they play these games. All the user profile data, game data and lesson/question data are stored in the app and will sync to a MySQL database on the server side.
There is also a lot of event data that we would like to capture and analyze in order to improve our games. These events could be the start of a lesson, touching a game object, choosing the correct game object but targeting it wrongly, answering correctly but getting timed out, and so on. We expect this to be hundreds of rows for each game that a kid plays. Also, the data stored will depend on the type of event.
The database should allow us to analyze the data and answer questions such as: which games are tough on kids, which lessons are too easy, are kids from some countries finding some of the lessons tough, how long is each of these games able to hold a kid's attention, and so on.
Which database would allow us to store so many different types of events, scale to millions of rows a day and allow for all these kinds of analysis? Given the changing nature of the data model, NoSQL seems to be an obvious choice. But which one would allow us to do all these analyses? Or should we go with Hadoop/Hive?
Thanks in advance.
Although you can do this using Hadoop/Hive, you won't get real-time performance, as Hive is best suited for batch processing. HBase would be a better choice in such a scenario. You could create an OLAP-style data cube whose dimensions are the information you specified, such as session info, info about each kid, etc. Or you could serialize all of this information as JSON objects and then store them in HBase cells. You could also store each of these events in individual cells, but that would consume unnecessary space and would be less efficient when fetching the data back.
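The "serialize as JSON and store in a cell" idea can be sketched roughly as follows. This is only an illustration: a plain Python dict stands in for the HBase table (no HBase client is used), and the row key layout, the helper names and the sample event fields are all assumptions made for the example.

```python
import json

def make_row_key(kid_id: str, session_id: str, seq: int) -> str:
    # Hypothetical key layout: kid, then session, then a zero-padded sequence
    # number, so that the events of one session sort together and in order.
    return f"{kid_id}#{session_id}#{seq:06d}"

def encode_event(event: dict) -> bytes:
    # Each event type can carry different fields; JSON absorbs that variation.
    return json.dumps(event, sort_keys=True).encode("utf-8")

def decode_event(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))

# A plain dict stands in for the table; a real client would replace it.
table = {}

events = [
    {"type": "lesson_start", "lesson": "fractions-1", "ts": 0},
    {"type": "answer", "correct": True, "timed_out": True, "ts": 41},
]
for seq, event in enumerate(events):
    table[make_row_key("kid42", "s1", seq)] = encode_event(event)

# Fetch all events of one session by scanning a key prefix.
prefix = "kid42#s1#"
session_events = [
    decode_event(value)
    for key, value in sorted(table.items())
    if key.startswith(prefix)
]
```

Storing one JSON value per event keeps writes simple; the trade-off is that filtering on a field inside the JSON requires decoding each value.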
HTH
I hope someone can help me out with this topic, even if it's not a specific programming question.
I'm writing a bachelor thesis in which I compare MySQL to MongoDB, and I want to write something about YouTube, as the platform has to handle many requests with a heavy data load.
The only good resource which I found was this video: Seattle Conference on Scalability: YouTube Scalability
As the conference was in 2007, I imagine there have been some changes to the database setup since then.
The last information that I have from this talk is that the thumbnails are stored in a BigTable database and the metadata in MySQL. Are there any changes since then?
Where are the videos stored? Is there an entry in the MySQL table, which refers to the stored video?
Thanks in advance for the answer!
According to this, YouTube still uses MySQL: http://code.google.com/p/vitess/wiki/ProjectGoals
I am not sure how things are at YouTube, but I am in the process of developing a similar application for our client. What we are doing is making use of the best of both worlds, i.e. SQL and NoSQL.
We store the videos on disk and store the paths to these videos in a MySQL table. Then we have a separate table which holds the genre and video mapping, i.e. which video belongs to which genre.
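A minimal sketch of that layout is shown below. It uses Python's built-in SQLite module as a stand-in for MySQL so that it runs anywhere, and the table names, column names and sample rows are assumptions for illustration, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    path  TEXT NOT NULL          -- where the file lives on disk
);
CREATE TABLE genres (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Mapping table: one row per (video, genre) pair.
CREATE TABLE video_genres (
    video_id INTEGER NOT NULL REFERENCES videos(id),
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (video_id, genre_id)
);
""")

conn.execute(
    "INSERT INTO videos (id, title, path) VALUES (?, ?, ?)",
    (1, "Intro", "/data/videos/1.mp4"),
)
conn.executemany(
    "INSERT INTO genres (id, name) VALUES (?, ?)",
    [(1, "education"), (2, "music")],
)
conn.execute("INSERT INTO video_genres (video_id, genre_id) VALUES (1, 1)")

def paths_for_genre(conn, genre_name):
    """Return the on-disk paths of all videos mapped to the given genre."""
    rows = conn.execute(
        """
        SELECT v.path
        FROM videos v
        JOIN video_genres vg ON vg.video_id = v.id
        JOIN genres g ON g.id = vg.genre_id
        WHERE g.name = ?
        ORDER BY v.id
        """,
        (genre_name,),
    )
    return [row[0] for row in rows]
```

The database only holds the path, so the application still has to read the file from disk (or hand the path to a web server) to serve the video itself.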
Today, with a vast pool of user data, we are in a position to leverage that data as never before. Things are very different from 2007: given the popularity of the internet and people's dependence on sites like YouTube, there is a vast amount of unstructured data which, if used properly, can give great results. So in our project we store the site admin and reporting data (user database, video locations, genre mapping, etc.) in MySQL and store the unstructured data about user interaction in a NoSQL database. We then use the NoSQL data to do all the analytics and give appropriate results to the user.
They are using MySQL with Bigdata.
The user information, such as who uploaded the file, and the file information will all be stored in MySQL, and the data itself will be stored in Bigdata.
I think they are using a database that supports FileTable.
I'm starting to work on a financial information website (somewhat like Google Finance or Bloomberg).
My website needs to display live currency, commodity, and stock values. I know how to do this frontend-wise, but I have a question about storing the data on the backend (I already have the data feed APIs):
How would you guys go about this - would you set up your own database and save all the data in the db with some kind of a backend worker, and then plug in your frontend to your db, or would you plug your frontend directly to the API and not mine the data?
Mining the data could be good for later reference (statistics and other things that the API won't allow), but can such a large and ever-growing quantity of information be stored in a database? Is this feasible? What other things should I be considering?
Thank you - any comment would be much appreciated!
First, I'd cleanly separate the front end from the code that reads the source APIs. Having done that, I could have the code that reads the source APIs feed the front end directly, feed a database, or both.
I'm a database guy. I'd lean toward feeding data from the APIs into a database, and connecting the front end to the database. But it really depends on the application's requirements.
Feeding a database makes it simple and cheap to change your mind. If you (or whoever) decide later to keep no historical data, just delete old data after storing new data. If you (or whoever) decide later to keep all historical data, just don't delete old data.
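As a rough sketch of how cheap that switch can be, the worker below writes each quote to a table and optionally prunes older rows. It uses Python's built-in SQLite module for the sake of a runnable example; the `quotes` table, the function names and the `keep_seconds` parameter are hypothetical.

```python
import sqlite3

def open_store():
    """Create an in-memory store for quotes (schema is illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotes ("
        " symbol TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"   # Unix timestamp of the quote
        " price REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX quotes_symbol_ts ON quotes (symbol, ts)")
    return conn

def store_quote(conn, symbol, ts, price, keep_seconds=None):
    """Store a new quote; if keep_seconds is set, drop older rows for that symbol."""
    conn.execute("INSERT INTO quotes VALUES (?, ?, ?)", (symbol, ts, price))
    if keep_seconds is not None:
        # Retention policy: delete old data right after storing new data.
        conn.execute(
            "DELETE FROM quotes WHERE symbol = ? AND ts < ?",
            (symbol, ts - keep_seconds),
        )
    conn.commit()

def latest_price(conn, symbol):
    """What the front end would read: the most recent price for a symbol."""
    row = conn.execute(
        "SELECT price FROM quotes WHERE symbol = ? ORDER BY ts DESC LIMIT 1",
        (symbol,),
    ).fetchone()
    return row[0] if row else None
```

Changing the retention policy is then a matter of passing a different `keep_seconds` value, or `None` to keep everything; the front end's query does not change.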
Feeding a database also gives you fine-grained control over who gets to see the data, relatively independent of their network operating system permissions. Depending on the application, this may or may not be a good thing.
I've never designed a database before, but I've had experience programming in a few languages and assembler throughout college, as well as some web design, so I'm able to at least pick up what I need to know if I can be pointed in the right direction.
One of the tasks of my job is to sort through some data that we've been collecting in the field using a "sonde", which measures temperature, pH, conductivity, and other parameters. The device sits in a stream 24/7 (except when we take it out and swap it with our other sonde every couple of weeks, so that we can put a newly calibrated one in the stream and retrieve the data from the one that was in the field). It collects data every 15 minutes or so, and has done so since 2007. Currently, all of our data is spread across multiple Excel spreadsheets, and we have additional data from a weather station and another instrument that all gets compiled into quarterly documents.
My goal is to design as simple a database as possible with most of the functionality of a database like this: http://hudson.dl.stevens-tech.edu/hrecos/d/index.shtml. Ours would be significantly simpler, as it is not live data (it would instead retrieve data from files that we upload once we've finished handling the formatting and compilation of all our data). I would very much like the graphing ability that the above site has, but at a minimum I need to be able to select a time range, select as many variables as I want within that range, and then download a spreadsheet (or at least a CSV file) with the generated data.
I realize this is a tough task, and as I have not designed a database before, I suspect it will be very much an uphill one. However, if I could learn what is necessary to do this and make it web-accessible, that would be a huge accomplishment and would very much impress my boss. Any advice or tips to head me off in the right direction would be very much appreciated.
Thanks for your help!
There are actually 2 parts to the solution you're looking for:
The database, which will store your data in a single organized place, and
The application, which is the interface used by people to interact with the database.
Basically, a database by itself is just a container. You need some kind of application that accepts criteria from a user, pulls the data meeting those criteria from the database, and displays it to the user in a meaningful fashion: in this case, a graph or a spreadsheet.
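To make the split between the two parts concrete, here is a small sketch of the "application" side: it takes a time range and a list of variables, queries the database, and returns CSV text. It assumes a single hypothetical `readings` table with one column per parameter and uses Python's built-in SQLite module; a real design might well look different.

```python
import csv
import io
import sqlite3

# Columns a user is allowed to request (hypothetical schema).
ALLOWED_COLUMNS = ("temperature", "ph", "conductivity")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " ts TEXT PRIMARY KEY,"      # 'YYYY-MM-DD HH:MM', which sorts chronologically
    " temperature REAL, ph REAL, conductivity REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("2007-06-01 00:00", 12.5, 7.1, 310.0),
        ("2007-06-01 00:15", 12.6, 7.0, 312.0),
        ("2007-06-02 00:00", 13.1, 7.2, 305.0),
    ],
)

def export_csv(conn, start, end, columns):
    """Return CSV text for the chosen columns between start and end (inclusive)."""
    if not columns:
        raise ValueError("select at least one variable")
    unknown = [c for c in columns if c not in ALLOWED_COLUMNS]
    if unknown:
        # Column names cannot be bound as parameters, so check them explicitly.
        raise ValueError(f"unknown variables: {unknown}")
    sql = (
        f"SELECT ts, {', '.join(columns)} FROM readings "
        "WHERE ts BETWEEN ? AND ? ORDER BY ts"
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ts", *columns])
    writer.writerows(conn.execute(sql, (start, end)))
    return out.getvalue()

report = export_csv(conn, "2007-06-01 00:00", "2007-06-01 23:59", ["temperature", "ph"])
```

A web front end would call something like `export_csv` with the values from a form and send the result back as a downloadable file.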
Normally, for web-based apps, the database and application are two separate components. However, for a small app with a fairly small number of users, and especially for someone just starting out, you may want to consider an all-in-one solution like InfoDome, which is sort of like MS Access for the web.
Either way, you're still going to need to learn about database design. There are many good tutorials out there; just do some searching. DatabaseAnswers.org has been useful for me. They have a set of tutorials as well as a large collection of sample database schemas.
I have a simple application to store Contacts. This application uses a simple relational database to store Contact information, like Name, Address and other data fields.
While designing it, a question came to my mind:
When designing programs that use databases, should I retrieve all database records and store them in objects in my program, so that performance is very fast, or should I always fetch data only when required?
Of course, retrieving all data can only be done if there isn't too much of it, but do you use this approach when you are sure that the database will be small (< 300 records, for example)?
I once designed a similar application that fetched data only when needed, but it was slow (it used an Access database).
Thanks for all help.
This depends a lot on the type of data, the state your application works in, transactions, multiple users, etc.
Generally you don't want to pull everything and operate on the data within your application because almost all of the above conditions will cause data to become non-synchronized. Imagine a user updating a contact while someone else is viewing that information from a cached version inside their application.
In your application, you should design the database queries such that they retrieve what is going to be displayed on the current screen. If the user is viewing a list of contacts, then the query would retrieve the entire contact table, or a portion of it if you are doing a paginated view. When they click on a contact, for example, for more information, then a new query would request the full details of that contact.
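A small sketch of that pattern follows, using Python's built-in SQLite module: one query fetches only the columns and rows needed for the current page of the list, and a second query fetches the full record when a contact is opened. The `contacts` schema, the function names and the page size are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " address TEXT,"
    " phone TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (name, address, phone) VALUES (?, ?, ?)",
    [(f"Contact {i:03d}", f"{i} Main St", f"555-{i:04d}") for i in range(1, 26)],
)

def list_page(conn, page, page_size=10):
    """Rows for the list screen: only id and name, one page at a time."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, name FROM contacts ORDER BY name, id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

def contact_details(conn, contact_id):
    """Full record for the detail screen, fetched only when the user asks for it."""
    return conn.execute(
        "SELECT id, name, address, phone FROM contacts WHERE id = ?",
        (contact_id,),
    ).fetchone()

first_page = list_page(conn, 1)
details = contact_details(conn, first_page[0][0])
```

Because every screen reads from the database at the moment it is shown, a change made by another user appears the next time the list or the detail view is loaded.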
For strings and small pieces of data like those a contact list involves, you shouldn't have any speed issues working with a relational database like SQL Server, MySQL or Oracle.
I think it is best to retrieve data when needed; retrieving all the records and storing them in objects can be an overhead. And since you say you have a small database, retrieving the records when needed should not be an issue at all.