How to migrate Drupal data to Django?

I want to migrate part of a Drupal 6 site to a Django application, specifically a Drupal based questions and answers section that I think would work better with OSQA. I've already created another question related to the authentication part of this integration and for the purposes of this question we can assume that all Drupal users will be recreated, at least their usernames, in the Django database. This question is about the data migration from Drupal to Django.
In Drupal, all questions are nodes of a 'question' content type with some CCK fields, and the answers to these questions are standard comments. I need help finding the best way to move this data to OSQA in Django.
At first I thought I could use South but I'm not sure if it would be the best fit for my needs.
For now, I think my best approach would be to write a Django app that connects to the Drupal database, queries all the questions with their corresponding comments and users, and then inserts them directly into Django's database using the correct models and Django methods.
Am I on the right path? Any other suggestions?
Thanks!

"At first I thought I could use South but I'm not sure if it would be the best fit for my needs."
No, South is not for this kind of migration. It is for intra-project migrations, and you will want to have it, but it doesn't really do you any good here.
"Migration" is really not a good term for what you need. What you really want to do is export data from Drupal and import it into Django.
I haven't made an in-depth analysis of the possible solutions for this, but were I asked to do the same thing, I would simply define a JSON- or XML-based interchange format for the transfer, then write one set of code to export the data from Drupal to this format, then another to import data from this format into Django. I strongly recommend against using a binary format for this interchange; the ability to load the data into a text editor to verify your data and fix things is really important.
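As a rough illustration of the export half, here is a minimal Python sketch that reads 'question' nodes and their comments and writes them to a JSON interchange document. It assumes a heavily simplified subset of the Drupal 6 tables (`node`, `node_revisions`, `comments`, `users`), with an in-memory SQLite database standing in for MySQL; the real tables have many more columns, and CCK fields (kept in separate `content_type_*` tables) are left out. The `version` key and all output field names are hypothetical choices, not a standard format.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical demo data in a simplified subset of Drupal 6's columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT,
                           title TEXT, uid INTEGER, created INTEGER);
        CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, body TEXT);
        CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER, uid INTEGER,
                               comment TEXT, timestamp INTEGER);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO node VALUES (10, 100, 'question', 'How do I migrate?', 1, 1300000000),
                                (11, 101, 'page', 'About us', 1, 1300000100);
        INSERT INTO node_revisions VALUES (100, 10, 'Details...'), (101, 11, 'Hi');
        INSERT INTO comments VALUES (500, 10, 2, 'Use JSON.', 1300000200);
        """
    )
    return conn


def export_questions(conn: sqlite3.Connection) -> str:
    """Return all 'question' nodes, with their comments as answers, as JSON text."""
    questions = []
    rows = conn.execute(
        """
        SELECT n.nid, n.title, r.body, u.name, n.created
        FROM node n
        JOIN node_revisions r ON r.vid = n.vid  -- body of the current revision
        JOIN users u ON u.uid = n.uid
        WHERE n.type = 'question'
        ORDER BY n.nid
        """
    ).fetchall()
    for nid, title, body, author, created in rows:
        answers = [
            {"author": name, "body": text, "created": ts}
            for name, text, ts in conn.execute(
                "SELECT u.name, c.comment, c.timestamp FROM comments c "
                "JOIN users u ON u.uid = c.uid WHERE c.nid = ? ORDER BY c.cid",
                (nid,),
            )
        ]
        questions.append({"drupal_nid": nid, "title": title, "body": body,
                          "author": author, "created": created, "answers": answers})
    # Indented, key-sorted output is easy to inspect and fix in a text editor.
    return json.dumps({"version": 1, "questions": questions}, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(export_questions(make_demo_db()))
```

Authors are exported by username rather than by Drupal uid, which fits the assumption in the question that usernames are recreated on the Django side: the import code can then look each user up by name.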
"For now, I think my best approach would be to write a Django app that connects to the Drupal database, queries all the questions with their corresponding comments and users, and then inserts them directly into Django's database using the correct models and Django methods."
If you want to skip the interchange file and do it in one step, then you don't want to write a new Django app just for the import; that's (IMHO) overkill. What you want to write is a Django management command within the app that you will be importing data into, and you probably want to use Django's support for multiple databases, as well as model options such as Meta.db_table and the db_column field option, to work with the existing database schema. This is why I recommend the interchange file method: you wouldn't need to reimplement the Drupal tables as Django models.
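For comparison, here is a rough sketch of the one-step variant. In a real project the loop would live in the `handle()` method of a management command (a `BaseCommand` subclass), the Drupal database would be a second entry in `DATABASES` read through unmanaged models (`managed = False` with `db_table`/`db_column`) and `.using()`, and the writes would go through OSQA's own models. To keep the sketch runnable without Django, two SQLite connections stand in for the two databases, and the target tables `auth_user`, `question` and `answer` are hypothetical simplifications, not OSQA's actual schema.

```python
import sqlite3


def make_demo_dbs():
    """Hypothetical source (Drupal-like) and target (Django-like) databases."""
    src = sqlite3.connect(":memory:")
    src.executescript(
        """
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT, title TEXT, uid INTEGER);
        CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, body TEXT);
        CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER, uid INTEGER, comment TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO node VALUES (10, 100, 'question', 'How do I migrate?', 1);
        INSERT INTO node_revisions VALUES (100, 10, 'Details...');
        INSERT INTO comments VALUES (500, 10, 2, 'Use JSON.');
        """
    )
    dst = sqlite3.connect(":memory:")
    dst.executescript(
        """
        CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
        CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author_id INTEGER);
        CREATE TABLE answer (id INTEGER PRIMARY KEY, question_id INTEGER, body TEXT, author_id INTEGER);
        INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob');  -- users recreated beforehand
        """
    )
    return src, dst


def import_questions(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy question nodes and their comments from src into dst; return the count."""
    # Map usernames to target user ids (usernames are assumed to exist already).
    user_ids = dict(dst.execute("SELECT username, id FROM auth_user"))
    count = 0
    # One transaction: an unknown username raises KeyError and rolls everything back.
    with dst:
        questions = src.execute(
            "SELECT n.nid, n.title, r.body, u.name FROM node n "
            "JOIN node_revisions r ON r.vid = n.vid "
            "JOIN users u ON u.uid = n.uid WHERE n.type = 'question' ORDER BY n.nid"
        ).fetchall()
        for nid, title, body, author in questions:
            cur = dst.execute(
                "INSERT INTO question (title, body, author_id) VALUES (?, ?, ?)",
                (title, body, user_ids[author]),
            )
            for name, text in src.execute(
                "SELECT u.name, c.comment FROM comments c JOIN users u ON u.uid = c.uid "
                "WHERE c.nid = ? ORDER BY c.cid",
                (nid,),
            ):
                dst.execute(
                    "INSERT INTO answer (question_id, body, author_id) VALUES (?, ?, ?)",
                    (cur.lastrowid, text, user_ids[name]),
                )
            count += 1
    return count
```

Running the whole import in one transaction means a half-finished run never leaves orphaned questions behind, so the command can simply be fixed and re-run.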

Mike's answer is a good path to follow. However, in a real-world scenario you may find it useful to mix different techniques: for example, connect to the original Drupal database for files, with the file content read from a local directory (the query for files is a simple join across a few tables), while processing the most structured data (e.g. nodes) through a custom JSON view.
In this case, a JSON view created with the Views Datasource module can help you design and select your data through a simple Drupal view. Then you can write a management command to read and parse the data, as suggested before. You have to page the view so that each request doesn't return too much to process, and you can even issue asynchronous requests with gevent to speed up retrieval.
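A minimal sketch of paging through such a view, assuming it is exposed at a hypothetical URL, that Drupal's pager uses its usual zero-based `page` query parameter, and that the rows come wrapped in a top-level `nodes` key (check what your Views Datasource configuration actually emits). It swaps gevent for the standard library's `ThreadPoolExecutor` to fetch several pages concurrently, and takes the fetch function as a parameter so the paging logic can be exercised without a live site.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List
from urllib.request import urlopen

# Hypothetical endpoint of the JSON view; adjust to your site.
VIEW_URL = "http://drupal.example.com/export/questions.json"


def fetch_page(page: int) -> List[dict]:
    """Fetch one page of the view; the 'nodes' key is an assumption about its output."""
    with urlopen(f"{VIEW_URL}?page={page}", timeout=30) as resp:
        return json.load(resp).get("nodes", [])


def iter_items(fetch: Callable[[int], List[dict]], workers: int = 4) -> Iterator[dict]:
    """Yield items page by page, requesting `workers` pages concurrently.

    Stops at the first empty page, which is taken to mean the view is exhausted.
    """
    page = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # map() yields results in page order even though the requests overlap.
            for items in pool.map(fetch, range(page, page + workers)):
                if not items:
                    return
                yield from items
            page += workers
```

Usage would be along the lines of `for item in iter_items(fetch_page): ...`. A small `workers` value keeps the load on the Drupal server predictable; the last batch may request a few pages past the end, which just come back empty.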
This way I parsed more than 15k content items in less than 10 minutes: not so fast, but acceptable for a one-time import. If you want to store content to process it later, you can save the raw data in a custom model in the database, or in an in-memory Redis data store via the Python Redis client. If you want more detail, I've written a detailed how-to on Drupal-to-Django migration that goes deeper into these techniques.
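A minimal sketch of the "store raw data now, process later" idea, using a plain SQLite table in place of the custom Django model mentioned above (the Redis option is not shown). The table name `raw_node`, its columns, and the assumption that every item carries a `nid` are hypothetical.

```python
import json
import sqlite3
from typing import Iterable, List


def stage_items(conn: sqlite3.Connection, items: Iterable[dict]) -> None:
    """Store each raw item as JSON text, keyed by its (assumed) 'nid' field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_node ("
        " nid INTEGER PRIMARY KEY, payload TEXT NOT NULL,"
        " processed INTEGER NOT NULL DEFAULT 0)"
    )
    with conn:
        # INSERT OR REPLACE makes re-running a fetch safe; a re-staged item is pending again.
        conn.executemany(
            "INSERT OR REPLACE INTO raw_node (nid, payload) VALUES (?, ?)",
            ((item["nid"], json.dumps(item)) for item in items),
        )


def pending_items(conn: sqlite3.Connection) -> List[dict]:
    """Return staged items that have not been processed yet, in nid order."""
    rows = conn.execute("SELECT payload FROM raw_node WHERE processed = 0 ORDER BY nid")
    return [json.loads(payload) for (payload,) in rows]


def mark_processed(conn: sqlite3.Connection, nid: int) -> None:
    """Flag one staged item as done so a later run skips it."""
    with conn:
        conn.execute("UPDATE raw_node SET processed = 1 WHERE nid = ?", (nid,))
```

Separating fetching from processing means a bug in the parsing step only requires re-running the processing, not re-downloading thousands of nodes.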

Related

Managing multiple datasources in CakePHP

I'm planning to develop a web application in CakePHP that shows information in charts and cards. I chose CakePHP because the information that we need to show is very structured, so the model approach makes it easier to manage the data; I also have some experience with MVC from ASP.NET, and I like how simple the routing is to use.
So, my problem is that each of the organizations that could use the app would have its own database, with a schema different from the one we need. I can't just set their connection string in the app.php file because their database won't match my model.
And the organization's datasource couldn't fit my model for a lot of reasons: the tables don't have the same names, the schema is different, the fields of my entity are in separate tables, and maybe they have the info in different databases or even in different DBMSs!
I want to know if there's a way to make an interface that achieves this, in such a way that a CakePHP Model/Entity can use data regardless of the source. Do you have any suggestions on how to do that? Does CakePHP have an option to make this possible? Should I use PHP with some kind of data format like JSON or XML? Maybe MySQL has a utility to transform data from different sources into a view, and I can make CakePHP use the view instead of the table?
If you have an answer, please be as detailed as you can.
These other options are possible if it's impossible to make the interface:
- Use another framework that can handle this more easily and has the features I mentioned above.
- Make the organization change their database so it matches my model (I don't like this one, and they probably won't do it).
- Transfer the data into the application's own database.
Additional information:
The data shown in the charts is about university students. Each university has its own database with its own structure and applications using the DB; that's why it isn't easy to change the structure. I just want to make it as easy as possible for any school to configure its own DB.
EDIT:
The version is CakePHP 3.2.
An important point is that it doesn't need all CRUD operations, only "reading". Hope that makes the solution easier.
I don't think your "question" can be answered properly; it doesn't contain enough information or detail. I guess some things will stay the same for all organizations, but their data and business logic will be different. But I'll try.
"And the organization's datasource couldn't fit my model for a lot of reasons: the tables don't have the same names, the schema is different, the fields of my entity are in separate tables, and maybe they have the info in different databases or even in different DBMSs!"
The model is a whole layer, so if you have completely different table schemas, your business logic, which is part of that layer, will be different as well. Simply changing the database connection won't help you then. The data also needs to be shown in the views, so the views must be different as well.
So what you could try, and what your second image shows, is to implement a layer that contains interfaces and base classes. Then create a Cake plugin for each organization that uses these interfaces and base classes, and write some code that conditionally uses the plugin depending on whatever criterion is checked (I'd guess domain or sub-domain). You will have to define the intermediate interfaces so that you can access any organization the same way at the API level.
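The answer describes PHP interfaces and CakePHP plugins; as a language-neutral illustration of the same shape, here is a Python sketch with one abstract interface, one adapter per organization, and a registry that picks the adapter from the request's sub-domain. Every class name, field, sub-domain and in-memory sample row is a hypothetical stand-in for each university's real tables.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass(frozen=True)
class Student:
    """The one shape the rest of the application (charts, cards) works with."""
    student_id: str
    name: str
    average: float


class StudentSource(ABC):
    """Interface every organization's adapter must implement (read-only)."""

    @abstractmethod
    def students(self) -> List[Student]:
        ...


class UniversityA(StudentSource):
    # Stand-in for rows read from university A's own schema.
    _rows = [{"stud_no": "A1", "full_name": "Ana Ruiz", "gpa": 8.7}]

    def students(self) -> List[Student]:
        return [Student(r["stud_no"], r["full_name"], r["gpa"]) for r in self._rows]


class UniversityB(StudentSource):
    # University B keeps names and grades in separate structures.
    _names = {"B7": "Ben Cole"}
    _grades = {"B7": 9.1}

    def students(self) -> List[Student]:
        return [Student(sid, name, self._grades[sid]) for sid, name in self._names.items()]


ADAPTERS: Dict[str, Type[StudentSource]] = {"uni-a": UniversityA, "uni-b": UniversityB}


def source_for_host(host: str) -> StudentSource:
    """Choose the adapter from the sub-domain, e.g. 'uni-a.example.edu' -> UniversityA."""
    subdomain = host.split(".", 1)[0]
    try:
        return ADAPTERS[subdomain]()
    except KeyError:
        raise LookupError(f"no adapter configured for {host!r}") from None
```

The controllers and views only ever see `Student` objects, so adding a new university means writing one adapter and one registry entry rather than touching the charts.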
One technical note: you can define the connection of a table object in the model layer. Any entity knows about its origin, but you should neither implement business logic inside an entity nor change the connection through an entity.
"EDIT: The version is CakePHP 3.2. An important point is that it doesn't need all CRUD operations, only 'reading'. Hope that makes the solution easier."
If that's true, either use the CRUD plugin (yes, you can use only the R part of it) or write some code, such as a class that describes the organization and is used to create your table objects and views on the fly.
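One way to read "a class that describes the organization" is a small mapping object from the application's field names to each organization's physical table and columns, used to build read-only queries on the fly. The Python sketch below is a hypothetical illustration (class, table and column names are invented) with SQLite standing in for the real DBMS; in a CakePHP app the same kind of description would be used to configure the table objects instead of raw SQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier (names come from trusted configuration, not users)."""
    return '"' + name.replace('"', '""') + '"'


@dataclass(frozen=True)
class OrgMapping:
    table: str
    columns: Dict[str, str]  # application field name -> organization's column name

    def select_sql(self) -> str:
        """Build a SELECT that renames the organization's columns to application fields."""
        cols = ", ".join(
            f"{quote_ident(physical)} AS {quote_ident(field)}"
            for field, physical in self.columns.items()
        )
        return f"SELECT {cols} FROM {quote_ident(self.table)}"


def read_all(conn: sqlite3.Connection, mapping: OrgMapping) -> List[dict]:
    """Run the generated SELECT and return rows keyed by application field names."""
    cur = conn.execute(mapping.select_sql())
    fields = [d[0] for d in cur.description]
    return [dict(zip(fields, row)) for row in cur]
```

Since only SELECT statements are generated, this stays within the read-only requirement; each organization's mapping could be loaded from a configuration file.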
Overall it's a pretty interesting problem, but IMHO too broad for a simple answer or solution that can be given here. I think this would require some discussion and analysis to find the best solution. If you're interested in consulting, you can contact me; see my profile.
I found a way without coding any interface. In fact, it uses some features already included in the DBMS and CakePHP.
If the schema doesn't fit the model, you can create views that match the table names and column names from the model. By definition, a view works like a table, so CakePHP looks for the same table and column names and the DBMS does the work.
I tested views in MySQL and it worked fine. You can also combine data from different tables in a view.
See the documentation on MySQL views and SQL Server views.
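To illustrate, here is a small Python script that uses SQLite as a stand-in for MySQL or SQL Server. Two existing university tables (hypothetical names) stay untouched, and a view named `students` with `id`, `name` and `average` columns presents them the way a CakePHP `StudentsTable` would expect by convention. The `CREATE VIEW ... AS SELECT ... JOIN` statement is standard SQL that MySQL and SQL Server also accept; views over joins come with restrictions on updates in both systems, which doesn't matter for read-only use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The university's existing tables (hypothetical names), left as they are.
    CREATE TABLE alumni_master (stud_no TEXT PRIMARY KEY, full_name TEXT);
    CREATE TABLE grade_summary (stud_no TEXT, avg_grade REAL);
    INSERT INTO alumni_master VALUES ('S1', 'Ana Ruiz'), ('S2', 'Ben Cole');
    INSERT INTO grade_summary VALUES ('S1', 8.7), ('S2', 9.1);

    -- A view with the table and column names the application's model expects.
    CREATE VIEW students AS
    SELECT m.stud_no AS id, m.full_name AS name, g.avg_grade AS average
    FROM alumni_master m
    JOIN grade_summary g ON g.stud_no = m.stud_no;
    """
)

# The application queries the view exactly as if it were a real table.
rows = conn.execute("SELECT id, name, average FROM students ORDER BY id").fetchall()
print(rows)
```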
If the user has another DBMS, you just change the datasource in app.php and create the views if necessary.
If the data is distributed across different DBMSs, CakePHP lets you set a datasource for each table: just add it to app.php and select it in the table class where required (for example via defaultConnectionName()).
Finally, if you only need read access, create a database user that can access only the views and has only SELECT privileges.
Using:
CakePHP 3.2
SQL Server 2016
MySQL 5.7

Database creation and query

So I have to create a recipe website, and HTML/CSS is mainly my forte. I need a database to search through over 100 recipes and sort them, mainly by author, among other sort orders. I don't want to use a CMS like Joomla. How do I get started?
Do I store the entire recipe (with a picture or two) in the database, or only a link to the recipe?
Secondly, the client will be updating the website as well. Is there any way to simplify the process for a client who has absolutely no knowledge of adding data to a database?
You're going to need to do some server-side scripting. If you don't want to use a CMS or framework, you (or someone else) will have to write the code for all of the site.
DB design pointers:
Store the recipe in the database, along with the author, etc.
Don't store the pictures in the DB, even though it's easy enough to do. Instead, store the path of each image on the server in a DB field called 'filename' or something similar.
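A minimal sketch of that design, using Python's built-in SQLite module as a stand-in for whatever database the site ends up on: recipes reference an author, the image column holds only a path, and listing by author is a plain `ORDER BY`. Table and column names are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE recipe (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES author(id),
    instructions TEXT NOT NULL,
    image_path TEXT  -- e.g. 'images/recipes/tortilla.jpg'; the file itself stays on disk
);
CREATE INDEX recipe_author_idx ON recipe (author_id);
"""


def add_recipe(conn: sqlite3.Connection, author: str, title: str,
               instructions: str, image_path: Optional[str] = None) -> int:
    """Insert a recipe, creating the author row on first use; return the recipe id."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO author (name) VALUES (?)", (author,))
        (author_id,) = conn.execute(
            "SELECT id FROM author WHERE name = ?", (author,)
        ).fetchone()
        cur = conn.execute(
            "INSERT INTO recipe (title, author_id, instructions, image_path) VALUES (?, ?, ?, ?)",
            (title, author_id, instructions, image_path),
        )
    return cur.lastrowid


def recipes_by_author(conn: sqlite3.Connection) -> List[Tuple[str, str, Optional[str]]]:
    """List (author, title, image_path), sorted by author and then title."""
    return conn.execute(
        "SELECT a.name, r.title, r.image_path FROM recipe r "
        "JOIN author a ON a.id = r.author_id ORDER BY a.name, r.title"
    ).fetchall()
```

An admin form for the client would then only need to call something like `add_recipe` after saving the uploaded picture to disk.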
For the client - you will need to build a backend/admin page(s) with 'forms' for the client to upload (add), update and delete recipes and pictures.
You don't need to save pictures in the database. See the PrestaShop database model, for example (look only at the image-related parts, because it has many tables).
Regards and good luck!
You can store pictures in databases as well. In that case, you can always reduce the size of the images before inserting them into the database.
For the database access code you can use PHP or JavaScript (e.g. Node.js on the server); both provide easy ways of accessing a database.
JavaScript database libraries even offer transaction commit and rollback.

Is using JSON data better than querying the database when there is no security issue for the data?

For my new project, I'm planning to use JSON data stored as a text file rather than fetching data from the database. My idea is to save a JSON file on the server whenever an admin creates a new entry in the database.
As security is not an issue, will this approach make user access to the data faster, or should I go with the usual database queries?
JSON is typically used as a way to format the data for the purpose of transporting it somewhere. Databases are typically used for storing data.
What you've described may be perfectly sensible, but you really need to say a little bit more about your project before the community can comment on your approach.
What's the pattern of access? Is it always read-only for the user, editable only by site administrator for example?
You shouldn't worry about performance early on. Worry more about ease of development, maintenance and reliability, you can always optimise afterwards.
You may want to look at http://www.mongodb.org/. MongoDB is a document-centric store that uses a JSON-like document model (stored internally as BSON).
JSON in combination with jQuery is a great option for fast, smooth web page updates, but ultimately it still comes down to the same database query.
Just make sure your query is efficient. Use a stored proc.
JSON is just the way the data is sent from the server (a web controller in MVC, or code-behind in standard C#) to the client (jQuery or JavaScript).
Ultimately the database will be queried the same way.
You should stick with the classic method (database), because you'll face many problems with concurrency and with having too many files to handle.
I think you should go with usual database query.
If you use JSON files, you'll have to keep them in sync with the DB (which means extra work) and you'll face I/O problems if your site is very busy.
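If you do keep a JSON file alongside the database, one way to limit the sync and concurrency problems mentioned above is to treat the database as the source of truth and regenerate the whole file after each admin write, replacing it atomically so readers never see a half-written file. A minimal Python sketch, assuming a hypothetical `entry` table and a single admin writer (concurrent writers would also need a lock so an older snapshot cannot overwrite a newer one); `os.replace` is atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import sqlite3
import tempfile


def write_snapshot(conn: sqlite3.Connection, path: str) -> None:
    """Regenerate the JSON file from the database, replacing the old file in one step."""
    rows = conn.execute("SELECT id, title, body FROM entry ORDER BY id").fetchall()
    data = [{"id": i, "title": t, "body": b} for i, t, b in rows]
    # Write to a temporary file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        os.chmod(tmp_path, 0o644)  # mkstemp creates owner-only files; let the web server read it
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def add_entry(conn: sqlite3.Connection, path: str, title: str, body: str) -> None:
    """Admin action: commit to the database first, then refresh the JSON file."""
    with conn:
        conn.execute("INSERT INTO entry (title, body) VALUES (?, ?)", (title, body))
    write_snapshot(conn, path)
```

Because the file is always rebuilt from the database, a lost or corrupted file can be regenerated at any time, which removes most of the "keeping them in sync" work.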

Google AppEngine DB Management best practice?

Google App Engine offers a datastore (a kind of DB wrapper) to hold your data.
It does not provide an editor for this datastore, only a viewer.
When developing a web application with other databases (MSSQL, MySQL, etc.), I change the DB structure many times during development.
In the App Engine datastore, you have to edit its structure and data through code (Java in my case).
Do you App Engine developers have any best practices for managing these DB updates and saving them in some smart way for deployment?
I don't know about "best practice", but I have a Servlet that I use during development which can upload and download all entity data as JSON.
I can then use a regular text editor to make changes, or a hacked version of JSONpad to edit data live in the system.
Since I use JSON throughout my application, this works best for me. One could also do the same thing with XML and use any one of the many XML editors.
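The core of that approach is a stable, human-editable dump of every entity plus a loader that checks the file before anything is written back. The sketch below is in Python rather than Java and models the datastore as a plain dict of kind to list of entities; reading from and writing to the actual App Engine datastore (and the servlet around it) is left out, and the `key` property is an assumed convention.

```python
import json
from typing import Dict, List

Store = Dict[str, List[dict]]  # entity kind -> entities, each a dict with a "key"


def dump_entities(store: Store) -> str:
    """Serialize all entities as indented JSON in a stable order, so diffs stay small."""
    ordered = {kind: sorted(items, key=lambda e: str(e["key"])) for kind, items in store.items()}
    return json.dumps(ordered, indent=2, sort_keys=True, ensure_ascii=False)


def load_entities(text: str) -> Store:
    """Parse a (possibly hand-edited) dump, rejecting entities that lost their key."""
    data = json.loads(text)
    for kind, items in data.items():
        for entity in items:
            if "key" not in entity:
                raise ValueError(f"{kind} entity without a key: {entity!r}")
    return data
```

Keeping the output sorted means the dump can be committed to version control, so structural changes made during development show up as readable diffs that can be replayed at deployment.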
Also, I use the low-level API for all my applications, so my data models tend to be fairly simple.
There are plenty of JSON/XML editors that could be adapted for your purposes with a little bit of work.

What is best practice for working with the DB in WordPress?

I'm developing a plugin for WordPress, and I need to store some information from a form in a table in the DB.
There are probably lots of pages explaining how to do this, this being one of them:
http://codex.wordpress.org/Function_Reference/wpdb_Class
But are there any other pages about best practices for interacting with the WP DB?
UPDATE
Found a few more pages that could be useful:
http://wpengineer.com/wordpress-database-functions
http://blue-anvil.com/archives/wordpress-development-techniques-1-running-custom-queries-using-the-wpdb-class
Unless you need to create your own complex table structure, I'd suggest using the existing tables for your needs. There's the options table, the user meta table, and the post meta table to work with. These all have built-in APIs for quick access.
Options: add_option(), get_option(), update_option(), delete_option()
Usermeta: add_user_meta(), get_user_meta(), update_user_meta(), delete_user_meta()
Postmeta: add_post_meta(), get_post_meta(), update_post_meta(), delete_post_meta()
I haven't found much real need to go outside these tables (yes, there are exceptions; I've made them myself when the data needs were complex), but these meta options all use a text field in the DB to store the data, so there's a lot you can store here if it's simple data.
If your data is simple then consider storing your information in one of these places as individual options or even as serialized arrays.
One of my BIGGEST pet peeves with plugin developers who leverage the WP database is that if/when a given database-driven plugin isn't used anymore, the developer doesn't think to remove the footprint it made in the database.
