Is there a way to open a Zope DB in .Net? - sql-server

I need to upgrade an old system based on Zope, I need to be able to export the data to something like SQL Server...does anyone know of a way I can open the Zope DB in .NET or directly export it to SQL Server?
Thanks,
Kieron

I am a Plone web developer, and Jason Coombs is correct. The ZODB is an Object Data Base, and contains python objects. These objects can be python code, data, meta-data, etc and are stored in a hierarchy. This is very different from the world of SQL tables and stored procedures. (The growing NoSQL movement, shows that Zope is not the only one doing this.) Additionally, since these are complex python objects you really want to be working on the ZODB with the version of python it was created with, or make sure that you can do a proper migration. I don't think that you will be able to do this with IronPython.
Without knowing what you are trying to get out of the ZODB, it is hard to give specific advice. As Jason suggested, trying WebDAV/FTP access to the ZODB might be all that you need. This allows to pull out the basic content of pages, or image files, but you may loose much of the more complex data (for example an event page may not have all of its datetime data included) and you will loose much of of the meta-data.
Here is how someone migrated from Plone to Word press:
http://www.len.ro/2008/10/plone-to-wordpress-migration/
There are a number of articles on migrating from one Plone version to another. Some of this information maybe useful to you. stackoverflow is not allowing me to post more links but search for:
"when the plone migration fails doing content migration only"
"Plone Product ContentMigration"

The first important thing to note is that the Zope Object Database (ZODB) stores Python objects in its hierarchy. Therefore, getting "data" out of the ZODB generally does not make sense outside of the Python language. So to some extent, it really depends on the type of data you want to get out.
If the data you're seeking is files (such as HTML, documents, etc), you might be able to stand up a Zope server and turn on something like WebDAV or FTP and extract the files that way.
The way you've described it, however, I suspect the data you seek is more fine-grained data elements (like numbers or accounts or some such thing). In that case, you'll almost certainly need Python of some kind to extract the data and transform it into some format suitable to import into SQL Server. You might be able to stay inside the .NET world by using IronPython, but to be honest, I would avoid that unless you can find evidence that IronPython works with the ZODB library.
Instead, I suggest making a copy of your Zope installation and zope instance (so you don't break the running system), and then use the version of Python used by Zope (often installed together) to mount the database, and manipulate it into a suitable format. You might even use something like PyODBC to connect to the SQL Server database to inject the data -- or you may punt, export to some file format, and use tools you're more familiar with to import the data.
It's been a while since I've interacted with a ZODB, but I remember this article was instrumental in helping me interact with the ZODB and understand its structure.
Good luck!

Related

Database recommendation

I'm writing a CAD (Computer-Aided Design) application. I'll need to ship a library of 3d objects with this product. These are simple objects made up of nothing more than 3d coordinates and there are going to be no more than about 300 of them.
I'm considering using a relational database for this purpose. But given my simple needs, I don't want any thing complicated. Till now, I'm leaning towards SQLite. It's small, runs within the client process and is claimed to be fast. Besides I'm a poor guy and it's free.
But before I commit myself to SQLite, I just wish to ask your opinion whether it is a good choice given my requirements. Also is there any equivalent alternative that I should try as well before making a decision?
Edit:
I failed to mention earlier that the above-said CAD objects that I'll ship are not going to be immutable. I expect the user to edit them (change dimensions, colors etc.) and save back to the library. I also expect users to add their own newly-created objects. Kindly consider this in your answers.
(Thanks for the answers so far.)
The real thing to consider is what your program does with the data. Relational databases are designed to handle complex relationships between sets of data. However, they're not designed to perform complex calculations.
Also, the amount of data and relative simplicity of it suggests to me that you could simply use a flat file to store the coordinates and read them into memory when needed. This way you can design your data structures to more closely reflect how you're going to be using this data, rather than how you're going to store it.
Many languages provide a mechanism to write data structures to a file and read them back in again called serialization. Python's pickle is one such library, and I'm sure you can find one for whatever language you use. Basically, just design your classes or data structures as dictated by how they're used by your program and use one of these serialization libraries to populate the instances of that class or data structure.
edit: The requirement that the structures be mutable doesn't really affect much with regard to my answer - I still think that serialization and deserialization is the best solution to this problem. The fact that users need to be able to modify and save the structures necessitates a bit of planning to ensure that the files are updated completely and correctly, but ultimately I think you'll end up spending less time and effort with this approach than trying to marshall SQLite or another embedded database into doing this job for you.
The only case in which a database would be better is if you have a system where multiple users are interacting with and updating a central data repository, and for a case like that you'd be looking at a database server like MySQL, PostgreSQL, or SQL Server for both speed and concurrency.
You also commented that you're going to be using C# as your language. .NET has support for serialization built in so you should be good to go.
I suggest you to consider using H2, it's really lightweight and fast.
When you say you'll have a library of 300 3D objects, I'll assume you mean objects for your code, not models that users will create.
I've read that object databases are well suited to help with CAD problems, because they're perfect for chasing down long reference chains that are characteristic of complex models. Perhaps something like db4o would be useful in your context.
How many objects are you shipping? Can you define each of these Objects and their coordinates in an xml file? So basically use a distinct xml file for each object? You can place these xml files in a directory. This can be a simple structure.
I would not use a SQL database. You can easy describe every 3D object with an XML file. Pack this files in a directory and pack (zip) all. If you need easy access to the meta data of the objects, you can generate an index file (only with name or description) so not all objects must be parsed and loaded to memory (nice if you have something like a library manager)
There are quick and easy SAX parsers available and you can easy write a XML writer (or found some free code you can use for this).
Many similar applications using XML today. Its easy to parse/write, human readable and needs not much space if zipped.
I have used Sqlite, its easy to use and easy to integrate with own objects. But I would prefer a SQL database like Sqlite more for applications where you need some good searching tools for a huge amount of data records.
For the specific requirement i.e. to provide a library of objects shipped with the application a database system is probably not the right answer.
First thing that springs to mind is that you probably want the file to be updatable i.e. you need to be able to drop and updated file into the application without changing the rest of the application.
Second thing is that the data you're shipping is immutable - for this purpose therefore you don't need the capabilities of a relational db, just to be able to access a particular model with adequate efficiency.
For simplicity (sort of) an XML file would do nicely as you've got good structure. Using that as a basis you can then choose to compress it, encrypt it, embed it as a resource in an assembly (if one were playing in .NET) etc, etc.
Obviously if SQLite stores its data in a single file per database and if you have other reasons to need the capabilities of a db in you storage system then yes, but I'd want to think about the utility of the db to the app as a whole first.
SQL Server CE is free, has a small footprint (no service running), and is SQL Server compatible

Database efficiency

I am about to write a program to keep track of my school assignments and I was wondering what database language would be the most efficient and simple to implement to track the meta-data of the assignments? I am thinking about XML, but it would require several documents.
I (currently) have at least ten assignments per week for 45 weeks. The data that has to be stored includes name, issue date, due date, path, and various states of completion. What ever language it's in would have to be able to take a large increase in both the number of assignments and the amount of meta-data without having to make large changes in either the format or the retrieval system.
Quite frankly, if you pick a full-fledged database you run the risk of spending more time on data entry than you do on your homework. If you really need to keep track of this, I would seriously recommend a spreadsheet.
First, I think you are confusing a relational database system with a database language. In all likelihood, you will be using a database that uses SQL. From there, you will need to another programming platform to build an application around. If you wanted, you could use an Microsoft Access database that allows you to build a simple front-end that is stored in the same file as the database. In this case you would be programming with VBA.
Pretty much any more database system would be suitable for your needs, even Access handle orders of magnitude more work than you are describing.
Some possible database systems are, again, Microsoft Access, Microsoft SQL Server Express, VistaDB, SQLite (probably the best choice after access for your needs), and of course there are many others.
You could either build a web front end or a desktop; I assume you are using Windows. You could use Visual Studio C# Express for this if you wanted. Or you could go with VB.NET, VB6, or what have you.
My answer isn't directly related, but as you are designing your database structures you might want to take at some of the the objects in the SIF specification in particular look at the Assignment and GradingAssignment objects.
As for how to store the data, you could use a rdbms (sqlite, mysql) or perhaps key-value database (zodb, link).
Of course, if this is just a small personal project you could just serialize the data to something like xml, json, csv or whatever and storing it as a file. It might be better in the long run to use a database though. A database format will probably scale a lot easier.
I would recommend Oracle Express (With Application Express) It will scale up to 4gb of user data. Beyond that, you would have to start paying. Application Express is very simple and build CRUD applications for, which is what is sounds like yours is.
For a project like that I would use Sqlite or Mysql, it's be fast enough. Plus it's easy to setup.

How would you build a database filesystem (DBFS)?

A database file system is a file system that is a database instead of a hierarchy. Not too complex an idea initially but I thought I'd ask if anyone has thought about how they might do something like this? What are the issues that a simple plan is likely to miss? My first guess at an implementation would be something like a filesystem to for a Linux platform (probably atop an existing file system) but I really don't know much about how that would be started. Its a passing thought that I doubt I'd ever follow through on but I'm hoping to at least satisfy my curiosity.
DBFS is a really nice PoC implementation for KDE. Instead of implementing it as a file system directly, it is based on indexing on a traditional file system, and building a new user interface to make the results accessible to users.
The easiest way would be to build it using fuse, with a database back-end.
A more difficult thing to do is to have it as a kernel module (VFS).
On Windows, you could use IFS.
I'm not really sure what you mean with "A database file system is a file system that is a database instead of a hierarchy".
Probably, using "Filesystem in Userspace" (FUSE), as mentioned by Osama ALASSIRY, is a good idea. The FUSE wiki lists a lot of existing projects about databased-backed filesystems as well as filesystems in which you can search by SQL-like queries.
Maybe this is a good starting point for getting an idea how it could work.
It's a basic overview of the Firebird architecture.
Firebird is an opensource RDBMS, so you can have a real deep insight look, too, if you're interested.
Its been a while since you asked this. I'm surprised no one suggested the obvious. Look at mainframes and minis, especially iSeries-OS (now called IBM-i used to be called iOS or OS/400).
How to do an relational database as a mass data store is relatively easy. Oracle and MySQL both have these. The catch is it must be essentially ubiquitous for end user applications.
So the steps for an app conversion are:
1) Everything in a normal hierarchical filesystem
2) Data in BLOBs with light metadata in the database. File with some catalogue information.
3) Large data in BLOBs with extensive metadata and complex structures in the database. File with substantial metadata associated with it that can be essentially to understanding the structure.
4) Internal structures of the BLOB exposed in an object <--> Relational map with extensive meta-data. While there may be an exportable form, the application naturally works with the database, the notion of the file as the repository is lost.

Storing Database-Agnostic Schema

We have a set of applications that work with multiple database engines including Sql Server and Access. The schemas for each are maintained separately and are not stored in text form making source control difficult. We are interested in moving to a system where the schema is stored in some text-based format (such as XML or YAML) with descriptions of field data types, foreign key relationhsips, etc.
When all is said and done, we want to have a single text file in source control that can be used to generate a clean database that works with both SQL Server, Access at least (and preferably is capable of working with Oracle, DB2 and other engines).
I'm certain that there are tools or libraries out there that can get us at least part of the way there. For one, I've found Altova MapForce that looks like it may do the trick but I'm interested in hearing about any alternative tools or libraries or even entirely different solutions for those in the same predicament.
Note: The applications are written in C++ and ORM solutions are both not readily available in C++ and would take far too long to integrate into our aging products.
If you don't use a object relational mapper that does this (and many other things for you) the easiest way might be to whip up a few structures to define your tables and attributes in some form of (static) code and write little generators to create actual databases from that description.
That makes it easy for source control, and if you're careful when designing those structures, you can easily re-use them for other DBs if need arises.
The consensus when I asked a similar (if rather more naive) question seem to be to use raw SQL, and to manage the RDMS dependencies with an additional layer. Good luck.
Tool you're looking for is liquibase. No support for Access though...

Generating database tables from object definitions

I know that there are a few (automatic) ways to create a data access layer to manipulate an existing database (LINQ to SQL, Hibernate, etc...). But I'm getting kind of tired (and I believe that there should be a better way of doing things) of stuff like:
Creating/altering tables in Visio
Using Visio's "Update Database" to create/alter the database
Importing the tables into a "LINQ to SQL classes" object
Changing the code accordingly
Compiling
What about a way to generate the database schema from the objects/entities definition? I can't seem to find good references for tools like this (and I would expect some kind of built-in support in at least some frameworks).
It would be perfect if I could just:
Change the object definition
Change the code that manipulates the object
Compile (the database changes are done auto-magically)
Check out DataObjects.Net - is is designed to support exactly this case. Code only, and nothing else. Its schema upgrade layer is probably the most featured one you can find, and it really fully abstracts schema upgrade SQL.
Check out product video - you'll notice nothing additional is made to sync the schema. Schema upgrade sample shows the intended usage of this feature.
You may be looking for an Object Database.
I believe this is the problem that the Microsofy Entity Framework is trying to address. Whilst not specifically designed to "Compile (the database changes are done auto-magically)" it does address the issue of handling changes to the domain model without a huge dependance on the underlying data model.
As Jason suggested, object db might be a good choice. Take a look at db4objects.
What you described is GORM. It is part of the Grails framework and is built to work with Hibernate (maybe JPA in the future). When I was first using Grails it seemed backwards. I was more comfortable with a Rails style workflow of making the tables and letting the framework generate scaffolding from the database schema. GORM persists your domain objects for you so you create and change the objects, it manages database create/update. This makes more sense now that I have gotten used to it. Sorry to tease you if you aren't looking for a new framework but it is on the roadmap for release 1.1 to make GORM available standalone.
When we built the first version of our own framework (Inon Datamanager) I had it read pre-existing SQL tables and autogenerate Java objects from them.
When my colleagues who came from a Smalltalkish background built the second version, they started from the objects and then autogenerated the tables.
Actually, they forgot about the SQL part altogether until I came back in and added it. But nowadays we just run a trigger on application startup which iterates over the object model, checks if the tables and all the right columns exist, and creates them if not. Very convenient.
This turned out to be a lot easier than you might expect - if your favourite tool doesn't support a similar process, you could probably write it in a couple of hours - assuming the relational to object mapping is relatively simple.
But the point is, it seems to depend on whether you're culturally an object person or a database person - you can regard either one as the authoritative source.
Some of the really big dogs, such as ERwin Data Modeler, will go object to DB. You need to have the big bucks to afford the product though.
I kept digging around some of the "major" frameworks and it seems that Django does exactly what I was talking about. Or so it seems from this screencast.
Does anyone have any remark to make about this? Does it work well?
Yes, Django works well.
yes, it will generate your SQL tables from your data model definitions (written in python)
It won't always alter existing tables if you update your structure, you might have to run an ALTER table manually
Ruby on Rails has an even more advanced version of these features (Rails migrations), but I don't like the framework as much, I find ruby and rails pretty idiosyncratic
Kind of a late answer, but here it goes:
I faced the exact same problem and ended up writing my own solution for it, working with .NET and SQL Server only however. It basicaly does implement the process you describe:
All DB objects are kept as embedded CREATE scripts as part of the source code
DB Objects are set up automatically (or on request) when using the data access functionality
All non-table changes are also performed automatically (or on request) at the same time
Table changes, which may require special attention to migrate data, are performend via (manually created) change scripts also upon upgrading the database
Even manual changes made to any databse object can be detected, so that schema integrity can be verified and rectified
An optional lightweight ORM can map stored procedures and objects as well as result sets (even multiple)
A command-line application helps keeping the SQL source files in sync with a development database
The library including the database are free under a LGPL license.
http://code.google.com/p/bsn-modulestore/

Resources