SQLite3 uses dynamic typing rather than static typing, in contrast to other flavors of SQL. The SQLite website reads:
Most SQL database engines (every SQL database engine other than SQLite, as far as we know) uses static, rigid typing. With static typing, the datatype of a value is determined by its container - the particular column in which the value is stored.
SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container.
It seems to me that this is exactly what you don't want, as it lets you store, for example, strings in integer columns.
The page continues:
...the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases.
I have two questions:
The use case question: What are some examples where SQLite3's dynamic typing is beneficial?
The historical/design question: What was the motivation for implementing SQLite with dynamic typing?
This is called type affinity in SQLite.
According to the SQLite website, they have done this "in order to maximize compatibility between SQLite and other database engines." (see the above link)
SQLite supports the concept of "type affinity" on columns. The type affinity of a column is the recommended type for data stored in that column. The important idea here is that the type is recommended, not required. Any column can still store any type of data. It is just that some columns, given the choice, will prefer to use one storage class over another. The preferred storage class for a column is called its "affinity".
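For illustration, here is a minimal sketch using Python's built-in sqlite3 module (the table name is made up): the declared column type acts only as a preference, so a value that can be converted to the column's affinity is converted, and anything else is stored as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")  # the column has INTEGER affinity

# A numeric-looking string is coerced to an integer by the column's affinity;
# an arbitrary string is stored unchanged, as TEXT, in the very same column.
conn.execute("INSERT INTO t VALUES ('42')")
conn.execute("INSERT INTO t VALUES ('hello')")

for value, storage_class in conn.execute("SELECT n, typeof(n) FROM t"):
    print(value, storage_class)   # prints: 42 integer, then: hello text
```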
My understanding is that SQLite is exactly what it's named for - a very lightweight, minimalistic database engine. The overhead associated with strong typing is probably beyond the scope of the project, and best left to the application that uses SQLite.
But again, according to their website, they've done this to maximize compatibility with other DB engines.
If you look at, say, Firefox's "about:config" page, I believe these settings are actually stored in an SQLite database (I'm not 100% sure, though). The benefit of using SQLite's dynamic typing is that each value in the settings can be strongly typed (e.g. the "alerts.totalOpenTime" setting is an integer, while "app.update.channel" is a string) without having to have a separate column per type.
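A rough sketch of what such a key/value settings table might look like, again with Python's sqlite3 module (the table layout and the third preference name are illustrative, not Firefox's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The value column is declared without a type, so it keeps whatever type it is given.
conn.execute("CREATE TABLE prefs (name TEXT PRIMARY KEY, value)")

conn.executemany("INSERT INTO prefs VALUES (?, ?)", [
    ("alerts.totalOpenTime", 4000),       # stored as an integer
    ("app.update.channel", "release"),    # stored as text
    ("browser.startup.page", 1),          # another integer, same column
])

for name, value, storage in conn.execute("SELECT name, value, typeof(value) FROM prefs"):
    print(name, repr(value), storage)
```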
It's basically the same argument as for programming languages, in the end: why have dynamic typing in a programming language over static typing?
This is a Postgres-specific question. I am in the middle of the classic design situation where I have to decide whether to use stored procedures or dynamic SQL (prepared statements). I have read a lot of blogs on the subject and have come to the conclusion that, with current implementations of advanced database systems, there isn't any specific attribute that would weigh one over the other.
Hence my question is PostgreSQL specific.
What I want to ask is, are there advantages or disadvantages of using Stored Procedures in Postgres?
More about my design: As we are using Postgres-specific functions like width_bucket and relying on various other things like partitioning and inheritance that Postgres provides, it is unlikely that we would switch to any other database provider in the future. Our queries would be complex queries involving building graphs and reports from real-time and non-real-time data.
There would also be some analytics built. Moreover, we are also planning to shard and partition our database.
I want viewpoints on the use of stored procedures with the type of system and environment I have described above, specific to PostgreSQL.
I would also like to understand how query optimization and execution works in Postgres.
OK, so your question is whether to create SQL on the client side and send it to the server, vs. using stored procedures. Note that if you use stored procedures, you usually still have to create the SQL that calls them, so it is not purely an either/or. So this is really about a relational interface vs. stored procedures.
Additionally it is worth noting that a key question is whether this is a database owned by an application or something that many applications may use. In the former, you may not worry about encapsulation, but in the latter you want to think about your database as having a service interface.
So if it is "my application has a database and all material use goes through my application" then go with dynamic SQL against the underlying tables.
If your database serves one or more applications, however, you want to make sure you can change your database structure without breaking the applications that use it. This usually means encapsulating access behind some sort of abstract interface. This can be done with VIEWs or stored procedures.
Views have an advantage that they can be directly manipulated in SQL, and are very flexible. This allows wide-open retrieval (and with some work storage) of data behind them. The application does not need to know how data is physically stored, just how to access it.
Stored procedures have the same benefit of encapsulation but provide a much more limited interface. They also have the problem that people usually use them in ways that require a fixed number of arguments, so adding an argument requires closely coordinated updates to the db and the application (Oracle's edition-based redefinition is a solution to this problem, but PostgreSQL has nothing similar). However, one can discover arguments and handle them appropriately at run-time with a little work.
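To make the contrast concrete, here is a hedged sketch (the connection string, the customers table, and the view/function names are all invented for illustration) of exposing the same data through a view and through a function, so the application never touches the physical table directly:

```python
import psycopg2  # assumes a reachable PostgreSQL instance and an existing "customers" table

conn = psycopg2.connect("dbname=example")  # hypothetical connection string
cur = conn.cursor()

# Encapsulation via a view: callers query the view, not the underlying table.
cur.execute("""
    CREATE OR REPLACE VIEW active_customers AS
    SELECT id, name, email FROM customers WHERE active;
""")

# Encapsulation via a function: a narrower, call-style interface.
cur.execute("""
    CREATE OR REPLACE FUNCTION customer_email(p_id integer)
    RETURNS text LANGUAGE sql STABLE AS
    $$ SELECT email FROM customers WHERE id = p_id $$;
""")

# Dynamic SQL against the view, and a call to the function; either way the
# application does not depend on how "customers" is physically laid out.
cur.execute("SELECT name, email FROM active_customers")
cur.execute("SELECT customer_email(%s)", (42,))
conn.commit()
```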
All in all this is a wide question and the specifics will be more important than generalities.
When building objects that make use of data stored in a RDBMS, it's normally pretty clear what you're getting back, as dictated by the tables and columns being queried. However, when dealing with NoSQL, document-based systems, it's less clear what is being retrieved.
What are common methods of keeping track of structure in which data is stored?
It depends on the driver. With the NORM driver you can "serialize" and "deserialize" an instance of an object into and out of the db. It will throw an error when there is an extra field in the db that isn't present in the class definition. This is the default behaviour of NORM, but they are adding the possibility to make it more flexible.
Read here: http://groups.google.com/group/norm-mongodb/browse_thread/thread/31102ec553a50e19
Not only does this depend on what database you're using, but it also depends on the language/framework you're coding with.
Most opinionated frameworks expect an ODM of some sort where you define a schema that is enforced in your models - like Rails, for example - and other frameworks let you do whatever you want, which puts you at risk of having data in multiple formats and not knowing what to do with it...
For MongoDB I've toyed with the notion of a soft schema, where every collection (table) has a document with a title of "schema" and defines the different elements and their datatypes in an embedded array called "definition." This allows me to generate dynamic scaffolds based on each collection, and can come in very handy when integrating with non-ODM platforms - in my case, Joomla.
Another approach is to store those schema definitions in a separate collection called schemas or schemata or some such.
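A rough sketch of such a soft-schema document with pymongo (the collection name, field list, and extra "required" flag are illustrative; only the "schema" title and "definition" array follow the description above):

```python
from pymongo import MongoClient  # assumes a MongoDB instance on localhost

db = MongoClient()["example_app"]

# One "schema" document per collection, describing the fields its documents should have.
db.articles.insert_one({
    "title": "schema",
    "definition": [
        {"field": "title",     "type": "string",   "required": True},
        {"field": "published", "type": "datetime", "required": False},
        {"field": "views",     "type": "int",      "required": False},
    ],
})

# A scaffold generator (or a non-ODM integration) can read it back to learn
# what documents in this collection are supposed to look like.
schema = db.articles.find_one({"title": "schema"})
for field in schema["definition"]:
    print(field["field"], field["type"])
```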
You most certainly want to lock down some sort of schema in your code to ensure your data is in a predictable format; this is also important to address whenever your schemas change, and they invariably will.
There are also frameworks, like playOrm, where your coding style does not change too much; it allows you to store relational data in a NoSQL store and perform joins. The trick is partitioning of the data and Scalable SQL, so it scales just fine and you can still query your data like you did in the past.
I'm writing a CAD (Computer-Aided Design) application. I'll need to ship a library of 3d objects with this product. These are simple objects made up of nothing more than 3d coordinates and there are going to be no more than about 300 of them.
I'm considering using a relational database for this purpose. But given my simple needs, I don't want anything complicated. So far, I'm leaning towards SQLite. It's small, runs within the client process, and is claimed to be fast. Besides, I'm a poor guy and it's free.
But before I commit myself to SQLite, I just wish to ask your opinion whether it is a good choice given my requirements. Also is there any equivalent alternative that I should try as well before making a decision?
Edit:
I failed to mention earlier that the above-said CAD objects that I'll ship are not going to be immutable. I expect the user to edit them (change dimensions, colors etc.) and save back to the library. I also expect users to add their own newly-created objects. Kindly consider this in your answers.
(Thanks for the answers so far.)
The real thing to consider is what your program does with the data. Relational databases are designed to handle complex relationships between sets of data. However, they're not designed to perform complex calculations.
Also, the amount of data and relative simplicity of it suggests to me that you could simply use a flat file to store the coordinates and read them into memory when needed. This way you can design your data structures to more closely reflect how you're going to be using this data, rather than how you're going to store it.
Many languages provide a mechanism to write data structures to a file and read them back in again called serialization. Python's pickle is one such library, and I'm sure you can find one for whatever language you use. Basically, just design your classes or data structures as dictated by how they're used by your program and use one of these serialization libraries to populate the instances of that class or data structure.
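A minimal sketch of that idea with pickle (the class layout and file name are made up for illustration):

```python
import pickle
from dataclasses import dataclass, field

@dataclass
class Model3D:
    name: str
    vertices: list = field(default_factory=list)  # (x, y, z) tuples, shaped for how the CAD code uses them

# Build the object library in whatever structure suits the application...
library = [Model3D("cube", [(0, 0, 0), (1, 0, 0), (1, 1, 0)])]

# ...dump it to a file when saving, and load it back on start-up.
with open("library.pkl", "wb") as f:
    pickle.dump(library, f)

with open("library.pkl", "rb") as f:
    restored = pickle.load(f)
```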
Edit: The requirement that the structures be mutable doesn't really affect much with regard to my answer - I still think that serialization and deserialization is the best solution to this problem. The fact that users need to be able to modify and save the structures necessitates a bit of planning to ensure that the files are updated completely and correctly, but ultimately I think you'll end up spending less time and effort with this approach than trying to marshal SQLite or another embedded database into doing this job for you.
The only case in which a database would be better is if you have a system where multiple users are interacting with and updating a central data repository, and for a case like that you'd be looking at a database server like MySQL, PostgreSQL, or SQL Server for both speed and concurrency.
You also commented that you're going to be using C# as your language. .NET has support for serialization built in so you should be good to go.
I suggest you consider using H2; it's really lightweight and fast.
When you say you'll have a library of 300 3D objects, I'll assume you mean objects for your code, not models that users will create.
I've read that object databases are well suited to help with CAD problems, because they're perfect for chasing down long reference chains that are characteristic of complex models. Perhaps something like db4o would be useful in your context.
How many objects are you shipping? Can you define each of these objects and their coordinates in an XML file? So basically use a distinct XML file for each object? You can place these XML files in a directory. This can be a simple structure.
I would not use a SQL database. You can easily describe every 3D object with an XML file. Put these files in a directory and pack (zip) them all. If you need easy access to the metadata of the objects, you can generate an index file (with only the name or description) so that not every object must be parsed and loaded into memory (nice if you have something like a library manager).
There are quick and easy SAX parsers available, and you can easily write an XML writer (or find some free code you can use for this).
Many similar applications use XML today. It's easy to parse and write, human readable, and doesn't need much space when zipped.
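A hedged sketch of that approach with Python's standard library (the element and attribute names are just one possible layout):

```python
import xml.etree.ElementTree as ET

# Write one small XML file per object: a name plus its vertex coordinates.
obj = ET.Element("object", name="cube")
for x, y, z in [(0, 0, 0), (1, 0, 0), (1, 1, 0)]:
    ET.SubElement(obj, "vertex", x=str(x), y=str(y), z=str(z))
ET.ElementTree(obj).write("cube.xml")

# Read it back; a separate index file could carry just names/descriptions so
# the whole library never has to be parsed at once.
root = ET.parse("cube.xml").getroot()
vertices = [(float(v.get("x")), float(v.get("y")), float(v.get("z")))
            for v in root.iter("vertex")]
```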
I have used SQLite; it's easy to use and easy to integrate with your own objects. But I would prefer a SQL database like SQLite more for applications where you need good search tools over a huge number of data records.
For the specific requirement, i.e. to provide a library of objects shipped with the application, a database system is probably not the right answer.
The first thing that springs to mind is that you probably want the file to be updatable, i.e. you need to be able to drop an updated file into the application without changing the rest of the application.
The second thing is that the data you're shipping is immutable - for this purpose, therefore, you don't need the capabilities of a relational db, just to be able to access a particular model with adequate efficiency.
For simplicity (sort of) an XML file would do nicely as you've got good structure. Using that as a basis you can then choose to compress it, encrypt it, embed it as a resource in an assembly (if one were playing in .NET) etc, etc.
Obviously SQLite stores its data in a single file per database, so if you have other reasons to need the capabilities of a db in your storage system then yes, but I'd want to think about the utility of the db to the app as a whole first.
SQL Server CE is free, has a small footprint (no service running), and is SQL Server compatible
With really small sets of data, the policy where I work is generally to stick them into text files, but in my experience this can be a development headache. Data generally comes from the database and when it doesn't, the process involved in setting it/storing it is generally hidden in the code. With the database you can generally see all the data available to you and the ways with which it relates to other data.
Sometimes for really small sets of data I just store them in an internal data structure in the code (like a Perl hash), but then when a change is needed, it's in the hands of a developer.
So how do you handle small sets of infrequently changed data? Do you have set criteria for when to use a database table or a text file, or...?
I'm tempted to just use a database table for absolutely everything but I'm not sure if there are any implications to this.
Edit: For context:
I've been asked to put a new contact form on the website for a handful of companies, with more to be added occasionally in the future. Except companies don't have contact email addresses; the users inside these companies do (as they post jobs through their own accounts). Now, though, we want a "speculative application" type of functionality, and the form needs an email address to send these applications to. But we also don't want to put an email address as a property in the form, or else spammers can just use it as an open email gateway. So clearly, we need an ID -> contact_email type relationship with companies.
SO, I can either add a column to a table with millions of rows which will be used, literally, about 20 times OR create a new table that at most is going to hold about 20 rows. Typically how we've handled this in the past is just to create a nasty text file and read it from there. But this creates maintenance nightmares, and these text files are frequently overlooked when the data they depend on changes. Perhaps this is a fault with the process, but I'm just interested in hearing views on this.
Put it in the database. If it changes infrequently, cache it in your middle tier.
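For example, a minimal sketch of that middle-tier cache (the table and column names follow the question's scenario and are otherwise hypothetical):

```python
import sqlite3
import time

_cache = {"value": None, "loaded_at": 0.0}
CACHE_TTL_SECONDS = 300  # re-read from the database at most every five minutes

def get_company_contacts(conn: sqlite3.Connection) -> dict:
    """Return the tiny, rarely changing lookup table, cached in memory."""
    now = time.monotonic()
    if _cache["value"] is None or now - _cache["loaded_at"] > CACHE_TTL_SECONDS:
        rows = conn.execute("SELECT company_id, contact_email FROM company_contacts")
        _cache["value"] = dict(rows)
        _cache["loaded_at"] = now
    return _cache["value"]
```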
The example that springs to mind immediately is what is appropriate to have stored as an enumeration and what is appropriate to have stored in a "lookup" database table.
I tend to "draw the line" with the rule that if it will result in a column in the database containing a "magic number" that maps to an enumeration value, then the enumeration should really exist as a lookup table. If it's unrelated to the data stored in the database (eg. Application configuration data rather than user generated data), then it's an enumeration all the way.
Surely it depends on the user of the software tool you've developed to consume the set of data, regardless of size?
It might just be that they know Excel, so your tool would have to parse a .csv file that they create.
If it's written for the developers, then who cares what you use. I'm not a fan of cluttering databases with minor or transient data however.
We have a standard config file format (key:value) and a class to handle it. We just use that on all projects. Mostly we're just setting persistent properties for our applications (mobile phone development) so that's an appropriate thing to do. YMMV
In cases where the program accesses a database, I'll store everything in there: easier for backup and moving data around.
For small programs without database access I store my data in the .NET settings, which are stored in an XML file - of course this is a feature of C#, so it might not apply to you.
Anyway, I make sure to store all data in one place. Usually a database.
Have you considered SQLite? It's file-based, which addresses your feeling that "just a file might do" (zero configuration), but it's a perfectly good database and scales remarkably well. It supports a number of APIs and there are numerous front ends for administering it.
If these are small config-like data, I use some simple and common format. INI, JSON, and YAML are usually fine. Java and .NET fans also like XML. In short, use something that you can easily read into an in-memory object and forget about it.
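For instance, a tiny sketch of reading such a file straight into an in-memory object (the file name and keys are assumptions):

```python
import json

# config.json holds a handful of rarely changing settings.
with open("config.json") as f:
    config = json.load(f)

smtp_host = config.get("smtp_host", "localhost")
retry_count = config.get("retry_count", 3)
```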
I would add it to the database in the main table:
Backup and recovery (you do want to recover this text file, right?)
Ad hoc querying (since you can do it with a SQL tool and join it to the other database data)
If the database column is empty, the storage requirements for it should be minimal (nothing if it's a NULL column at the end of the table in Oracle)
It will be easier if you want to have multiple application servers as you will not need to keep multiple copies of some extra config file around
Putting it into a little child table only complicates the design without giving any real benefits
You may well already be going to that same row in the database as part of your processing anyway, so performance is not likely to be a problem. If you are not, you could cache it in memory.
We have a set of applications that work with multiple database engines including SQL Server and Access. The schemas for each are maintained separately and are not stored in text form, making source control difficult. We are interested in moving to a system where the schema is stored in some text-based format (such as XML or YAML) with descriptions of field data types, foreign key relationships, etc.
When all is said and done, we want to have a single text file in source control that can be used to generate a clean database that works with at least SQL Server and Access (and preferably is capable of working with Oracle, DB2 and other engines).
I'm certain that there are tools or libraries out there that can get us at least part of the way there. For one, I've found Altova MapForce that looks like it may do the trick but I'm interested in hearing about any alternative tools or libraries or even entirely different solutions for those in the same predicament.
Note: The applications are written in C++ and ORM solutions are both not readily available in C++ and would take far too long to integrate into our aging products.
If you don't use an object-relational mapper that does this (and many other things) for you, the easiest way might be to whip up a few structures to define your tables and attributes in some form of (static) code and write little generators to create actual databases from that description.
That makes it easy for source control, and if you're careful when designing those structures, you can easily re-use them for other DBs if need arises.
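As a hedged sketch of that idea (in Python for brevity; the table description, type map, and dialect names are invented for illustration): describe the schema once as plain data, then render dialect-specific DDL from it.

```python
# Schema described once as plain data, kept under source control.
TABLES = {
    "customer": [
        ("id",   "integer", "PRIMARY KEY"),
        ("name", "text",    "NOT NULL"),
    ],
}

# Per-engine type mappings; extend with Oracle, DB2, etc. as needed.
TYPE_MAP = {
    "sqlserver": {"integer": "INT",  "text": "NVARCHAR(255)"},
    "access":    {"integer": "LONG", "text": "TEXT(255)"},
}

def create_statements(dialect: str):
    """Yield CREATE TABLE statements for the chosen dialect."""
    types = TYPE_MAP[dialect]
    for table, columns in TABLES.items():
        cols = ", ".join(f"{name} {types[col_type]} {extra}".strip()
                         for name, col_type, extra in columns)
        yield f"CREATE TABLE {table} ({cols});"

for statement in create_statements("sqlserver"):
    print(statement)
```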
The consensus when I asked a similar (if rather more naive) question seems to be to use raw SQL, and to manage the RDBMS dependencies with an additional layer. Good luck.
The tool you're looking for is Liquibase. No support for Access, though...