When allocating a dataset, what does the MGMTCLASS of a dataset describe? To my knowledge it controls the retention and expiration period, i.e. how long the dataset will reside on disk, and the possible values I have observed are BKUP35, NOBKNLIM, etc. What do these stand for, and what are the other possible values for this parameter? I hope I have stated my question clearly; please let me know if I missed something...
Addendum: May I ask another question here? How often is a dataset backed up? I know it is specific to the SMS installation, but is there something in MGMTCLASS related to it? For example, is a dataset backed up after it has existed for some percentage of the time specified in the MGMTCLASS, something along those lines? Am I clear?
On IBM z/OS mainframes, the MGMTCLAS values are defined by your systems management. Each installation may have different values. You will have to ask your site management for the values they have defined for your environment.
MGMTCLAS is given when defining a new dataset on a system where the Storage Management Subsystem (SMS) feature is installed. SMS uses the assigned MGMTCLAS to guide various aspects of dataset management. Typical uses are to:
Manage migration of inactive datasets to archival storage
Set the backup frequency
Determine how often to compress partitioned datasets
Delete datasets whose retention period has expired
Release unused space in a dataset
Enable cost accounting on datasets
According to the Prometheus webpage, one main difference between Prometheus and InfluxDB is the use case: Prometheus stores only time series, while InfluxDB is better geared towards storing individual events. Since some major work has been done on the InfluxDB storage engine, I wonder if this is still true.
I want to set up a time series database, and apart from the push/pull model (and probably a difference in performance) I can see nothing big that separates the two projects. Can someone explain the difference in use cases?
InfluxDB CEO and developer here. The next version of InfluxDB (0.9.5) will have our new storage engine. With that engine we'll be able to efficiently store either single event data or regularly sampled series, i.e. both irregular and regular time series.
InfluxDB supports int64, float64, bool, and string data types using different compression schemes for each one. Prometheus only supports float64.
For compression, the 0.9.5 version will have compression competitive with Prometheus. In some cases we'll see better results, since we vary the compression on timestamps based on what we see. The best-case scenario is a regular series sampled at exact intervals. In that case, by default, we can compress the timestamps of 1k points into an 8-byte starting time, a delta (zig-zag encoded), and a count (also zig-zag encoded).
Depending on the shape of the data we've seen < 2.5 bytes per point on average after compactions.
YMMV based on your timestamps, the data type, and the shape of the data. Random floats with nanosecond scale timestamps with large variable deltas would be the worst, for instance.
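To make the timestamp compression described above concrete, here is a small Python sketch of the delta + zig-zag idea (zig-zag is the same trick protobuf uses for signed integers). It is only an illustration of the technique, not InfluxDB's actual code, and the function names are made up:

```python
def zigzag_encode(n: int) -> int:
    """Map a signed integer to an unsigned one so small deltas stay small (64-bit range assumed)."""
    return (n << 1) ^ (n >> 63)

def zigzag_decode(z: int) -> int:
    return (z >> 1) ^ -(z & 1)

def compress_regular_timestamps(timestamps):
    """For a perfectly regular series, store only (start, encoded delta, encoded count)."""
    start = timestamps[0]
    delta = timestamps[1] - timestamps[0]
    assert all(b - a == delta for a, b in zip(timestamps, timestamps[1:]))
    return start, zigzag_encode(delta), zigzag_encode(len(timestamps))

# Example: 1k points sampled exactly once per second (nanosecond-scale timestamps)
ts = [1_000_000_000 * i for i in range(1000)]
print(compress_regular_timestamps(ts))  # three integers instead of 1000 timestamps
```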
The variable precision in timestamps is another feature that InfluxDB has. It can represent second, millisecond, microsecond, or nanosecond scale times. Prometheus is fixed at milliseconds.
Another difference is that writes to InfluxDB are durable after a success response is sent to the client. Prometheus buffers writes in memory and by default flushes them every 5 minutes, which opens a window of potential data loss.
Our hope is that once 0.9.5 of InfluxDB is released, it will be a good choice for Prometheus users to use as long term metrics storage (in conjunction with Prometheus). I'm pretty sure that support is already in Prometheus, but until the 0.9.5 release drops it might be a bit rocky. Obviously we'll have to work together and do a bunch of testing, but that's what I'm hoping for.
For single server metrics ingest, I would expect Prometheus to have better performance (although we've done no testing here and have no numbers) because of their more constrained data model and because they don't append writes to disk before writing out the index.
The query languages of the two are very different. I'm not sure what they support that we don't yet, or vice versa, so you'd need to dig into the docs on both to see if there's something one can do that you need. Longer term, our goal is to have InfluxDB's query functionality be a superset of Graphite, RRD, Prometheus, and other time series solutions. I say superset because we want to cover those in addition to more analytic functions later on. It'll obviously take us time to get there.
Finally, a longer term goal for InfluxDB is to support high availability and horizontal scalability through clustering. The current clustering implementation isn't feature complete yet and is only in alpha. However, we're working on it and it's a core design goal for the project. Our clustering design is that data is eventually consistent.
To my knowledge, Prometheus' approach is to use double writes for HA (so there's no eventual consistency guarantee) and to use federation for horizontal scalability. I'm not sure how querying across federated servers would work.
Within an InfluxDB cluster, you can query across the server boundaries without copying all the data over the network. That's because each query is decomposed into a sort of MapReduce job that gets run on the fly.
There's probably more, but that's what I can think of at the moment.
We've got the marketing message from the two companies in the other answers. Now let's ignore it and get back to the sad real world of time series data.
Some History
InfluxDB and Prometheus were made to replace old tools from a past era (RRDtool, Graphite).
InfluxDB is a time series database. Prometheus is a sort of metrics collection and alerting tool, with a storage engine written just for that. (I'm actually not sure you could [or should] reuse the storage engine for something else.)
Limitations
Sadly, writing a database is a very complex undertaking. The only way both these tools manage to ship something is by dropping all the hard features relating to high-availability and clustering.
To put it bluntly, each is a single application running on only a single node.
Prometheus has no goal of supporting clustering and replication whatsoever. The official way to support failover is to "run 2 nodes and send data to both of them". Ouch. (Note that this is seriously the ONLY existing way; it's written countless times in the official documentation.)
InfluxDB has been talking about clustering for years... until it was officially abandoned in March. Clustering ain't on the table anymore for InfluxDB. Just forget it. If it is ever done, it will only be available in the Enterprise Edition.
https://influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/
Within the next few years, we will hopefully have a well-engineered time-series database that handles all the hard problems relating to databases: replication, failover, data safety, scalability, backup...
At the moment, there is no silver bullet.
What to do
Evaluate the volume of data to be expected.
100 metrics × 100 sources × 1 datapoint per second => 10,000 datapoints per second => 864 million datapoints per day.
The nice thing about time series databases is that they use a compact format, they compress well, they aggregate datapoints, and they clean out old data. (Plus they come with features relevant to time series data.)
Supposing that a datapoint is treated as 4 bytes, that's only a few Gigabytes per day. Lucky for us, there are systems with 10 cores and 10 TB drives readily available. That could probably run on a single node.
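Spelling out that back-of-the-envelope estimate in Python (the 4 bytes per point is the assumption from above; real compressed sizes will vary):

```python
metrics = 100
sources = 100
interval_s = 1                        # one datapoint per metric per source per second

points_per_second = metrics * sources // interval_s     # 10,000
points_per_day = points_per_second * 86_400              # 864,000,000

bytes_per_point = 4                   # assumed average size after compression
gb_per_day = points_per_day * bytes_per_point / 1e9      # ~3.5 GB/day
tb_per_year = gb_per_day * 365 / 1000                    # ~1.3 TB/year, before retention/rollups

print(f"{points_per_day:,} points/day ~= {gb_per_day:.1f} GB/day, {tb_per_year:.1f} TB/year")
```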
The alternative is to use a classic NoSQL database (Cassandra, ElasticSearch or Riak) then engineer the missing bits in the application. These databases may not be optimized for that kind of storage (or are they? modern databases are so complex and optimized, can't know for sure unless benchmarked).
You should evaluate the capacity required by your application. Write a proof of concept with each of these databases and measure things.
See if it falls within the limitations of InfluxDB. If so, it's probably the best bet. If not, you'll have to make your own solution on top of something else.
InfluxDB simply cannot hold a production load (metrics) from 1000 servers. It has some real problems with data ingestion and ends up stalled/hung and unusable. We tried to use it for a while, but once the data volume reached a critical level it could not be used anymore. No memory or CPU upgrades helped.
Therefore, our experience is: definitely avoid it; it's not a mature product and it has serious architectural design problems. And I am not even talking about the sudden shift to a commercial model by Influx.
Next we researched Prometheus, and while it required rewriting our queries, it now ingests 4 times more metrics without any problems whatsoever, compared to what we tried to feed Influx. And all that load is handled by a single Prometheus server; it's fast, reliable, and dependable. This is our experience running a huge international internet shop under pretty heavy load.
IIRC the current Prometheus implementation is designed around all the data fitting on a single server. If you have gigantic quantities of data, it may not all fit in Prometheus.
So I just started messing around with Google BigQuery about 10 minutes ago, and I was wondering if anyone is aware of the underlying architecture that they're using to store the data? For example, is this just the next generation of their own BigTable infrastructure?
Also, is it clear what sorts of strategies they're using for indexes, index rebuilds, etc.? I'm just trying to figure out whether this is mature enough at this point that you can be 100% sure of what's going on with your data end-to-end, or whether there is a bit of a black-box area where "things just work".
There are no indexes... every query is a table scan. The query architecture is described here.
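As a concrete illustration of the no-index, scan-everything model, here is a minimal sketch with the google-cloud-bigquery Python client (the credentials/project setup and the public sample table are assumptions on my part, and that dataset may have moved); the thing you watch and optimize is bytes scanned, not index usage:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials and a default project are configured

# There is no CREATE INDEX in BigQuery: the engine scans the columns you reference,
# so selecting only the columns you need is the main way to reduce data scanned.
sql = """
    SELECT name, COUNT(*) AS n
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY n DESC
    LIMIT 10
"""

job = client.query(sql)        # executed as a distributed scan over columnar storage
rows = list(job.result())
print(f"Scanned {job.total_bytes_processed / 1e9:.2f} GB")
for row in rows:
    print(row.name, row.n)
```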
Your data is stored in a proprietary columnar format called ColumnIO on Colossus (a successor to GFS). Colossus replicates the data within a datacenter and your data is also replicated to other geographic regions to make sure it stays available even if a Google datacenter goes offline.
To answer your specific questions
While data may be temporarily stored in Bigtable, all data is stored long-term in Colossus (for now!).
New data added to BigQuery is encrypted at rest (that is, whenever it is written out to permanent storage). It is also encrypted when sent over the network.
As mentioned, no indexes, so there are no strategies for rebuilding the index. Depending on how you add data to your table, your table may be coalesced, which means rewriting the underlying files in a more efficient manner.
Colossus underlies a massive amount of Google data across a wide range of services, and ColumnIO is a standard throughout Google. I would call both of these technologies mature.
However, you should also consider it a black box. All of the details here may change as storage systems at Google mature or architectures change. However, it should always "just work" (within SLA caveats, of course).
If you're interested in more details about how BigQuery works under the covers or how to use it effectively, here is a shameless plug for our book on the subject which is due out in June.
I will keep this short. I am looking to store product plan data; these are the plans that users would pick for their payment options. This data includes how much the plan costs and what the unit details of the plan are, such as what makes up a unit (day/week/month), plus other fairly simple data about the plan. These plans may or may not change once a month or once a year; the company is a start-up, things are always changing at the 11th hour, and they change constantly, so there is no real way to predict when they will change. A co-worker and I are discussing whether these values should be stored in the web.config (where they currently are) or moved to the database.
I have done some googling and have not found any good resource that helps draw a clear line between when something should be in the database and when it should be in the web.config. I wanted to know your thoughts on this and see if someone could clearly define when data should be stored in config versus in the database.
Thanks for the help!
From the brief description you provide, it seems to me that the configuration data, eventually, may be accessed not just by your web server-based application running on one computer, but also by other supporting applications, such as end-of-month batch jobs, that you may want to run on other computers. To support that possibility, it would be a good idea to store the data in some sort of centralized repository that can be accessed remotely from multiple computers.
Storing the configuration data in a database is the obvious way to meet that requirement. But if you don't want to do that, then another approach would be to store the configuration data in a file on a company-internal (rather than public) web/ftp server. Then an application can use a utility such as curl to retrieve the configuration file from the web/ftp server.
Of those two approaches, I think using a database is probably best, because it provides an ergonomic way to not just read the configuration data, but also update it.
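If you go the database route, the schema can stay tiny. Here is a minimal sketch; I'm using SQLite only to keep the example self-contained, and the table and column names are hypothetical; the same idea carries over to SQL Server or whatever your app already uses:

```python
import sqlite3

conn = sqlite3.connect("plans.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS plan (
    plan_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    price_cents INTEGER NOT NULL,
    unit        TEXT NOT NULL CHECK (unit IN ('day', 'week', 'month')),
    active      INTEGER NOT NULL DEFAULT 1,
    updated_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
""")

# Changing a plan is now a data change, not a config-file edit plus redeploy.
conn.execute(
    "INSERT INTO plan (name, price_cents, unit) VALUES (?, ?, ?)",
    ("Starter", 999, "month"),
)
conn.commit()

for row in conn.execute("SELECT name, price_cents, unit FROM plan WHERE active = 1"):
    print(row)
```

With this layout the web.config only needs a connection string, and any application (web server, batch job, admin tool) can read or update the same plan data.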
I want to write an application with Delphi XE that will be able to connect to a server, and users should be able to read from and write to the database.
All records will be strings (Unicode enabled); maybe a small amount can be BLOBs.
My needs are:
Multiple users enabled
More than one user should be able to add new records at one time
Capable of storing a huge amount of data
Users should be able to edit their own records
Unicode enabled
As low-cost a solution as possible
Thanks in advance...
I vote for Firebird. It fits all your needs and it is free.
I would go with postgres - it's also free and is very fast.
Sandeep
Most of your requirements are handled by most modern database engines (although concurrency management is not handled exactly the same way by all databases). But to choose the database(s) that would suit you best, you should give more precise information:
"Multiple users". How many concurrent connections? 10? 100? 1000? 10000? 100000? More?
"More than one user should be able to add new records at one time". How many inserts per hour? Is this an OLTP database, or a DW one?
"Capable of storing a huge amount of data". How many tables? How many rows? How many fields? What's the average row size? Do you need LOB support? How many indexes?
"Users should be able to edit their own records". How often? How many? How long? Some databases have better locking mechanisms than others.
"Unicode enabled". Which flavour? UTF-8? UTF-16?
"All records will be strings". What is the maximum string length you need? Hopefully they are "natural" string fields - storing non-natural string data in string fields usually lowers performance.
I'm sure you'll get others, but ElevateDB fits your needs.
It's the follow-on to DBISAM, which does NOT have Unicode support. But ElevateDB does.
May I suggest taking a look at NexusDB? It also fits all your needs. Bill Todd has just reviewed the product.
"Users should be able to edit their own records" What does that mean to you? A database in which records are not editable, that is, a read-only database, is not very common.
You'll have to think about the general architecture of your software. You don't just select a database like a new car. I'd suggest that you not focus on the database choice alone, but look at the whole picture.
Here is some advice:
Separate your database storage, the user interface, and your software logic. This is called 3-tier, and is definitely a good idea if you're starting a new project in 2010. It will save you a lot of time in the future. We use such an architecture in our http://blog.synopse.info/category/Open-Source-Projects/SQLite3-Framework
Use a database connection layer which is not tied to one database engine. Delphi comes with DBX, and there are free or not-so-expensive alternatives around. See http://edn.embarcadero.com/article/39500 for DBX and http://www.torry.net/pages.php?id=552 for alternatives
Think about the future: try to guess what will be the features of your application after some time, and try to be ready to implement them in your today's architecture choices.
In all cases, you're right to ask for advice and feedback. The time you spend now, before coding, will save you time during future maintenance.
For example, if one of your requirements is that "All records will be strings", with some BLOBs, your database size will never be bigger than a few GB. SQLite3 could be enough for you, and there is no size limit on TEXT fields in this database.
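To illustrate the point about TEXT fields: SQLite puts no declared length on TEXT columns, only a compile-time cap on any single value (about 1 GB by default). A quick sketch using Python's built-in sqlite3 module just for brevity; from Delphi you would do the same through any SQLite binding:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")

# ~15 million Unicode characters, with no column length to declare anywhere
big_text = "some unicode ✓ " * 1_000_000
conn.execute("INSERT INTO note (body) VALUES (?)", (big_text,))

(length,) = conn.execute("SELECT length(body) FROM note").fetchone()
print(length)   # number of characters stored, far beyond a typical VARCHAR(n)
```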
Nobody's mentioned SQL Server Express so I guess I'll do it...
Microsoft SQL Server Express is jolly good and is also free.
Yes, it does have limits but they're pretty big and it's not possible to know if they're sufficient without further info from the OP.
Multiple users enabled - yep
More than one user should be able to add new records at one time - yep
Capable of storing a huge amount of data - depends on your definition of huge. But "probably"
Users should be able to edit their own records - umm, yes
Unicode enabled - yep
As low-cost a solution as possible - it's free. But the data access components will depend on your choice of access method
I have to develop a database for a unique environment. I don't have experience with database design and could use everybody's wisdom.
My group is designing a database for a piece of physics hardware and a data acquisition system. We need a system that will store all of the hardware configuration parameters and track changes to these parameters as they are changed by the user.
The setup:
We have nearly 200 detectors and roughly 40 parameters associated with each detector. Of these 40 parameters, we expect only a few to change during the course of the experiment. Most parameters associated with a single detector are static.
We collect data for this experiment in timed runs. During these runs, the parameters loaded into the hardware must not change, although we should be able to edit the database at any time to prepare for the next run. The current plan:
The database will provide the difference between the current parameters and the parameters used during the last run.
At the start of a new run, the most recent database changes will be loaded into the hardware.
The settings used for the upcoming run must be tagged with a run number and the current date and time. This is essential. I need a run-by-run history of the experimental setup.
There will be several different clients that both read and write to the database. Although changes to the database will be infrequent, I cannot guarantee that the changes won't happen concurrently.
Must be robust and non-corruptible. The configuration of the experimental system depends on the hardware. Any breakdown of the database would prevent data acquisition, and our time is expensive. Database backups?
My current plan is to implement the above requirements using a sqlite database, although I am unsure if it can support all my requirements. Is there any other technology I should look into? Has anybody done something similar? I am willing to learn any technology, as long as it's mature.
Tips and advice are welcome.
Thank you,
Sean
Update 1:
Database access:
There are three lightweight applications that can write to and read from the database, and one application that can only read.
The applications with write access are responsible for setting a non-overlapping subset of the hardware parameters. To be specific, we have one application (of which there may be multiple copies) which sets the high voltage, one application which sets the remainder of the hardware parameters which may change during the experiment, and one GUI which sets the remainder of the parameters which are nearly static and are only essential for the proper reconstruction of the data.
The program with read access only is our data analysis software. It needs access to nearly all of the parameters in the database to properly format the incoming data into something we can analyze properly. The number of connections to the database should be >10.
Backups:
Another setup at our lab dumps an XML file every run. Even though I don't think XML is appropriate, I was planning to back up the system every run, just in case.
Some basic things about the design: make sure that you don't delete data from any tables; keep track of the most recent data (probably best with a most-recently-updated datetime column); when a data value changes, though, don't delete the old data. When a run is initiated, tag every table used with the run ID (in another column); this way, you maintain a full historical record of every setting and can pin down exactly what state was used for a given run.
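A minimal sketch of that append-only, run-tagged layout, using SQLite since that is what the question proposes; the table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect("daq_config.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS parameter_history (
    change_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    detector_id  INTEGER NOT NULL,
    parameter    TEXT    NOT NULL,
    value        TEXT    NOT NULL,
    run_id       INTEGER,                 -- NULL while the value is merely 'prepared'
    changed_at   TEXT    NOT NULL DEFAULT (datetime('now'))
    -- append-only: no UPDATE or DELETE on this table, ever
);
""")

def set_parameter(detector_id, parameter, value):
    """Append a new value; the old rows stay as history."""
    conn.execute(
        "INSERT INTO parameter_history (detector_id, parameter, value) VALUES (?, ?, ?)",
        (detector_id, parameter, value))
    conn.commit()

def current_settings():
    """Latest value for every (detector, parameter) pair."""
    return conn.execute("""
        SELECT detector_id, parameter, value
        FROM parameter_history
        WHERE change_id IN (SELECT MAX(change_id)
                            FROM parameter_history
                            GROUP BY detector_id, parameter)
    """).fetchall()

def tag_run(run_id):
    """At run start, record a snapshot of the settings actually loaded into hardware."""
    for detector_id, parameter, value in current_settings():
        conn.execute(
            "INSERT INTO parameter_history (detector_id, parameter, value, run_id) VALUES (?, ?, ?, ?)",
            (detector_id, parameter, value, run_id))
    conn.commit()
```

The run-by-run history then falls out of a simple SELECT ... WHERE run_id = ?, and the diff against the previous run is just a comparison of two such snapshots.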
Ask around of your colleagues.
You don't say what kind of physics you're doing, or how big the working group is, but in my discipline (particle physics) there is a deep repository of experience in putting up and running just this type of system (we call it "slow controls" and similar). There is a pretty good chance that someone you work with has either done this or knows someone who has. There may be a detailed description of the last time it was done in someone's thesis.
I don't personally have much to do with this, but I do know this: one common feature is a no-delete, no-overwrite design. You can only add data, never remove it. This preserves your chances of figuring out what really happened in case of trouble.
Perhaps I should explain a little more. While this is an important task and has to be done right, it is not really related to physics, so you can't look it up on SPIRES or arXiv.org. No one writes papers on the design and implementation of medium-sized slow controls databases. But they do sometimes put it in their dissertations. The easiest way to find a pointer really is to ask a bunch of people around the lab.
This is not a particularly large database, by the sounds of things. So you might be able to get away with using Oracle's free database, which will give you all kinds of great flexibility with journaling (not sure if that is an actual word) and administration.
Your mention of 'non-corruptible' right after you say "There will be several different clients that both read and write to the database" raises a red flag for me. Are you planning on creating some sort of application that has an interface for this? Or were you planning on direct access to the db via a tool like TOAD?
In order to preserve your data integrity you will need to get really strict on your permissions. I would only allow one person (and a backup) to have admin rights with the ability to do data manipulation outside the GUI (which will make your life easier).
Backups? Yes, absolutely! Not only should you do daily, weekly and monthly backups, you should do both full and incremental ones. Also, test your backup images often to confirm they are in fact working.
As for the data structure, I would need much greater detail about what you are trying to store and how you would access it. But from what you have put here, I would say you need the following tables (to begin with):
Detectors
Parameters
Detector_Parameters
Some additional notes:
Since you will be making so many changes, I recommend using a version control system like SVN to keep track of all your DDL etc. I would also recommend using something like Bugzilla for bug tracking (if needed) and using Google Docs for team document management.
Hope that helps.