Designing a database without knowing the details of the data? [closed] - database

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Are there cases where database designers are not allowed to know the details of the data? I am looking for real-world examples to learn from.

I can't help but tell a story about database nightmares. One of the worst was when Amazon was first growing. Initially they only sold books, then expanded to music, and then to many other things.
For a period of about two years, Amazon would announce a new market every two or three months -- children's clothing, housewares, garden supplies, food, and so on. The database folks were tasked with developing and supporting the systems for the product lines. However, Amazon considered the new product announcements to be highly, highly secret.
In particular, the data warehouse people were kept furthest from the loop. Sometimes they would find out about a new line of business by reading the news -- and then have to support it in the data warehouse.
So, they had to develop a flexible database to meet unannounced business needs.
In any business environment, new needs arise. I would suggest a book such as Ralph Kimball's "The Data Warehouse Toolkit" for more background on how to develop a fairly robust system.

I am currently working at a company that stores very private personal information. I am not allowed access to the production database. For our development and test environments, we replace all names, addresses, and other personal information with randomly generated information.
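For illustration, here is a minimal sketch of that kind of masking pass, assuming a hypothetical users table with name and address columns (all names here are made up for the example), using Python's sqlite3:

    import random
    import sqlite3
    import string

    def random_string(length=8):
        # Random lowercase text to stand in for real personal data.
        return "".join(random.choices(string.ascii_lowercase, k=length))

    def mask_users(db_path):
        # Overwrite personal columns in a hypothetical 'users' table
        # before the copy is handed to dev/test environments.
        conn = sqlite3.connect(db_path)
        for (user_id,) in conn.execute("SELECT id FROM users").fetchall():
            conn.execute(
                "UPDATE users SET name = ?, address = ? WHERE id = ?",
                (random_string(), random_string(16), user_id),
            )
        conn.commit()
        conn.close()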

Yes, I've often seen databases allow custom data to be defined by the user. The basic approach is to design a metadata system for your database, then allow entities to be associated with custom fields. You wouldn't want to do this for all your data, otherwise you'll just end up with a database inside a database, but for dynamically adding a number of custom fields this approach works well.
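A minimal sketch of that approach, with invented table names (a field-definition table acting as the metadata, plus a value table linking fields to entities), using SQLite from Python:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entity (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- The metadata: which custom fields exist and what they hold.
        CREATE TABLE custom_field (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            value_type TEXT NOT NULL   -- e.g. 'text', 'number', 'date'
        );
        -- One row per (entity, field) pairing carries the actual value.
        CREATE TABLE custom_value (
            entity_id INTEGER REFERENCES entity(id),
            field_id  INTEGER REFERENCES custom_field(id),
            value     TEXT,
            PRIMARY KEY (entity_id, field_id)
        );
    """)

Every lookup on a custom field then costs a join through custom_value, which is exactly why you reserve this pattern for the genuinely dynamic fields.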

Related

Cloud based NoSQL database service for sensor data [closed]

We are new to NoSQL and are starting a project that aims to record sensor data from many different sensors, each recording timestamp-value pairs, into a cloud-based database. The number of sensors should scale, so the solution must be able to handle hundreds of millions, or possibly billions, of writes a year.
Each sensor has its own table, keyed by timestamp with the reading as the value, and sensor metadata sits in its own table.
The system should support searches such as the most recent values of certain sensor types (fast data retrieval) and values from a given time frame for sensors in certain areas (drawn from the metadata).
So the question is which cloud database service would be most suited to our needs?
Thanks in advance.
Couchbase is a great option for this type of use case.
Try Apache Cassandra. DataStax provides easy-to-install packages that include some useful extras.
I wholeheartedly agree with @Ben that this isn't an answerable question; nevertheless, I would at least consider the reasons for choosing a simple key/value store over a typical RDBMS. It sounds like this data will likely be aggregated and counted, and an RDBMS will typically answer those questions very quickly with the correct indexing. 1B writes/yr (or even 30B/yr) is really not that high.
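To make the indexing point concrete, here is a hedged sketch (invented schema, SQLite via Python) of the layout the question describes, with the one index that makes "most recent value per sensor" a single seek:

    import sqlite3

    conn = sqlite3.connect("sensors.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS sensor (
            id          INTEGER PRIMARY KEY,
            sensor_type TEXT NOT NULL,
            area        TEXT NOT NULL    -- metadata used for area queries
        );
        CREATE TABLE IF NOT EXISTS reading (
            sensor_id INTEGER REFERENCES sensor(id),
            ts        INTEGER NOT NULL,  -- epoch timestamp
            value     REAL NOT NULL
        );
        -- The index that makes 'most recent value per sensor' one seek.
        CREATE INDEX IF NOT EXISTS idx_reading_sensor_ts
            ON reading (sensor_id, ts DESC);
    """)

    # Most recent reading for one (hypothetical) sensor id, answered
    # straight from the index.
    row = conn.execute(
        "SELECT ts, value FROM reading"
        " WHERE sensor_id = ? ORDER BY ts DESC LIMIT 1",
        (42,),
    ).fetchone()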

Which Document Oriented Database with better Reporting Performance than Sql Server [closed]

I'm looking for a document-oriented database to store millions of invoices with fast reporting speed.
I found some options, such as MongoDB, RavenDB, and CouchDB, but I don't know the performance risk compared to a SQL Server XML-type column.
Fast reporting is something that you want to do in SQL Server; I'm not aware of a good NoSQL solution for this scenario.
RavenDB has the index replication bundle, which lets you replicate an index to a SQL table so that you can run more advanced reports on it.
Reza,
RavenDB seems like a good match here. It all depends on what you are actually calling "reporting".
Doing things like "how many invoices are there for last month" is easy in RavenDB.
As is doing things like "how much money does Northwind owe us?"
We don't recommend RavenDB for reporting in the specific case where you have dynamic reporting needs, such as on-the-fly aggregation.
What is it that you are actually trying to do with regards to reporting?
That aside, invoices are a scenario where RavenDB truly shines, especially given the other parameters of this question and the dynamic nature of the invoices.
"but I don't know the risk of performance failure vs. Sql Server Xml type column."
Epic fail already here. Invoices are relational data in most cases (certainly in every case you need), so address links, line items, quantities, and prices belong in tables, not in an XML data type. This is an "ok, so you planned to work at McDonald's, not on our team?" level of design decision.
What line items and invoices may carry is additional data in XML (timesheets, etc.), but if you run accounting, you don't run it as documents.
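For illustration, a rough sketch of that relational shape (names invented for the example), with any genuinely document-ish extras confined to one optional XML column, in Python's sqlite3:

    import sqlite3

    conn = sqlite3.connect("invoices.db")
    conn.executescript("""
        CREATE TABLE invoice (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            issued_on   TEXT NOT NULL,
            extra_xml   TEXT            -- optional: timesheets etc.
        );
        CREATE TABLE invoice_line (
            invoice_id  INTEGER REFERENCES invoice(id),
            line_no     INTEGER NOT NULL,
            description TEXT NOT NULL,
            quantity    REAL NOT NULL,
            unit_price  REAL NOT NULL,
            PRIMARY KEY (invoice_id, line_no)
        );
    """)

    # Aggregation-style reporting stays a plain SQL query.
    total = conn.execute(
        "SELECT SUM(quantity * unit_price) FROM invoice_line"
        " WHERE invoice_id = ?",
        (1,),
    ).fetchone()[0]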

Classroom management software; storing data? [closed]

So I am working on a mini-project for the summer to keep my coding skills sharp. I will be using Qt4 and C++ to make a classroom management system for college professors. I just came up with the idea ten minutes ago, so I don't have much yet.
One question I have is: what is the best way to store student/class/assignment information so that the software stays portable and can be used by different schools?
My first guess would be a MySQL database. I need a guru's opinion on this one, though.
Since different sites have different database preferences, you might wish to use a layer such as ActiveRecord, PDO, or ODBC to abstract away the specific database your end users want to run. This would allow people to deploy onto PostgreSQL or MySQL or whatever they prefer.
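As a sketch of what such a layer buys you, Python's standard DB-API plays the same role as PDO: keep the driver choice behind one function and the rest of the code stays portable. The wiring below is illustrative, not a complete implementation:

    import sqlite3

    def connect(backend, **kwargs):
        # Deploy-time driver selection; callers only ever see a
        # DB-API connection object, never a specific engine.
        if backend == "sqlite":
            return sqlite3.connect(kwargs["path"])
        if backend == "postgres":
            import psycopg2  # assumed installed where a site chooses it
            return psycopg2.connect(**kwargs)
        raise ValueError("unsupported backend: %s" % backend)

    conn = connect("sqlite", path=":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")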
A good choice for single-process server systems could be SQLite3. It's not suitable for all systems, but if your system is designed to scale to a few dozen users at most, it'll probably work fine. (The amount of work you'd need to put into a server to make SQLite3 scale into the hundreds or thousands might argue for planning for a database server environment instead.)
SQLite (http://www.sqlite.org/) might be a good option. It is embeddable, so you don't need a specific database instance running wherever you deploy it.
SQL Server 2005 Compact Edition (http://www.microsoft.com/sqlserver/2005/en/us/compact.aspx) is also an option.
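As a sketch of the embedded route (schema names invented for the example), the entire database is a single file that ships next to the application, so there is nothing for a school to install:

    import sqlite3

    # One file next to the executable; no database server required.
    conn = sqlite3.connect("classroom.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS student (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS course (
            id INTEGER PRIMARY KEY, title TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS assignment (
            id        INTEGER PRIMARY KEY,
            course_id INTEGER REFERENCES course(id),
            title     TEXT NOT NULL,
            due_date  TEXT
        );
    """)

Qt itself can open the same file through its SQLite driver, so the C++ application and any helper scripts can share one database.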

Best language for scripting large scale file management [closed]

The National Park Service's Natural Sounds Program collects multiple terabytes of data each year measuring soundscapes. In your opinion, what is the best available scripting language for managing massive numbers of files and file types? We would like to easily design and run efficient, user-friendly scripts that search for and retrieve or copy files that may be located in different directories according to a single static hierarchy. The OS will most likely be Windows. Thanks!
Use the one your developers are most familiar with. The productivity gains you'll get from that will almost certainly beat out any advantages that one language may have over another.
Use Python. It's easy to learn. Everyone can easily convert.
The size of the files doesn't much matter when you're searching directories or searching for metadata outside the files. Even so, you rarely need to read an entire sound-sample file just to strip off the metadata.
Also, if you're doing this frequently, you might want to consider the following (a Python sketch follows the list):
- Extract all the metadata to a relational database.
- Use the relational database as a complex "index" into the sound-sample files.
- Route every file add or change through an application that synchronizes file changes with database updates, so that the database index actually matches the filesystem.
The bulk of your searches might then become SQL queries.
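A hedged sketch of that pipeline in Python (the directory root, schema, and thresholds are placeholders): walk the tree once, record metadata in SQLite, then search with SQL:

    import os
    import sqlite3

    def build_index(root, db_path="file_index.db"):
        # Record basic filesystem metadata so searches become SQL.
        conn = sqlite3.connect(db_path)
        conn.execute("""
            CREATE TABLE IF NOT EXISTS file (
                path  TEXT PRIMARY KEY,
                name  TEXT NOT NULL,
                ext   TEXT,
                size  INTEGER,
                mtime REAL
            )
        """)
        for dirpath, _dirnames, filenames in os.walk(root):
            for fname in filenames:
                full = os.path.join(dirpath, fname)
                st = os.stat(full)
                conn.execute(
                    "INSERT OR REPLACE INTO file VALUES (?, ?, ?, ?, ?)",
                    (full, fname, os.path.splitext(fname)[1],
                     st.st_size, st.st_mtime),
                )
        conn.commit()
        return conn

    # Example search: every .wav file over 100 MB, newest first.
    conn = build_index(r"C:\soundscapes")  # placeholder data root
    rows = conn.execute(
        "SELECT path FROM file WHERE ext = '.wav' AND size > ?"
        " ORDER BY mtime DESC",
        (100 * 1024 * 1024,),
    ).fetchall()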
I don't really know what you are going to be looking for in a scripting language, but Eric is right that you should use something all your developers are familiar with. However, if you don't have developers (yet) and are designing the project (and team) from the ground up, consider C++ or .NET (C# or VB).
While C++ offers more powerful programming and better performance, C# and VB.NET offer quicker development. Even with .NET's productivity advantage, I would expect that for massive numbers of files and file types you will be most satisfied overall with C++. In my opinion, the most user-friendly design requires very little user input other than clicking buttons or selecting options from a list.

Is Google Spreadsheets a viable database for applications? [closed]

Is Google Spreadsheets / Docs a viable database option for real-world applications?
Without knowing what this application does, I'd say no, simply on the basis that a spreadsheet != a database.
No, a spreadsheet and a database are distinct concepts. Spreadsheets do not let you query the data the way a database does, and spreadsheets (especially hosted ones like Google's) will not support real-world load.
Depends on your application. As a database, typically not. It can be useful in instances where you need "Excel With Web Features" or as an alternative to a database.
Especially now that Spreadsheets, Presentations and Docs are all basically used under the "Google Docs" moniker, the entire "Google Docs" suite is best used as the sum of its parts.
Two examples:
Web forms. Create a form using the Forms application, and all its submissions are stored in a Google Spreadsheet. Personally, I've found this to be a pretty quick and easy way to collect names and contact information online, and it's all output into a very easy-to-use and portable format.
Using the spreadsheet to interact with other web content. See this example of displaying and grabbing content from Wikipedia for editing: http://ouseful.wordpress.com/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/
You might instead look at MySQL, PostgreSQL, or SQLite for use in applications, for example, to archive your books.
