Transferring database objects from one DB to another (Django / Postgres)

I have a dev site and a live site for my Django app. A lot of front-end copy and other object details are stored in the DB. At the same time, a lot of client data is also stored in the DB. What's the most efficient way to get the dev site ready and then transfer the new copy and objects to the live server?
I guess the easiest approach is to keep the changeable dev site data in sync with the live site, update the dev site, and then do a data dump over to the live site. But what happens if someone updates the live site whilst this is happening? Their update will be overwritten.
Is there a Django app to allow me to 'check' entries in the admin list view, press 'copy' then somehow paste these into the new site?
Or what other patterns do people use for this common scenario?
It's a bit like using Git: having a master branch, then a branch for each live site iteration, then creating hotfixes on the live site whilst still working on the dev site; the hotfix can then be merged into the dev site. How do we merge the data?

I'm not sure I understand correctly, so please tell me if I'm wrong: basically, you want to make changes to the (same) data both in production and in your dev instance?
IMHO this is generally not a good solution, for many reasons, one of them being the fragile merge step you mentioned. To me, the best approach is to keep the two environments as detached as possible (data-wise).
The part to avoid at all costs, IMHO, is this final dev->production merge. I can better understand the need to transfer data in the other direction (from prod to dev). I can see two common cases here:
You want the same data in dev to try to reproduce a data-related bug. In that case, why not simply dump the production database and load it on the dev machine? IMHO, the dev database should be considered "wipeable" at all times.
You have some "initial" data on the production site that is generic enough to also be needed in dev (something like Male/Female entries in a "Gender" table). In that case, you could use fixtures.
While thinking about your data flows and environments, you should probably also have a look at South; its data migrations are a very useful tool.
Hope this helps!

Is there a Django app to allow me to 'check' entries in the admin list view, press 'copy' then somehow paste these into the new site?
Not that I know of. But I think you could write such an app with a reasonable amount of effort. I would suggest the following:
Use admin actions for the change list view. That way you could publish or transfer multiple objects from your development system to your live system. See the Django documentation on admin actions for further information.
Add another button to the change view of a single object. Maybe "Publish to Live site".
Add yet another button to the change view that saves and publishes in one step, like "Save and publish to Live site".
To add buttons to the change form I would override the corresponding template:
http://code.djangoproject.com/svn/django/trunk/django/contrib/admin/templates/admin/change_form.html
Override the block 'submit_buttons_bottom' like this:
{% block submit_buttons_bottom %}
{{ block.super }}
<!-- your buttons here -->
{% endblock %}
Preferably you would write just one function that does the job of transferring the object from the dev system to the live system, and use this function in all three use cases.
But there is a catch!
It could be fairly hard to copy one object from one Django instance to another, especially if that object has relationships with other objects, which in turn might have relationships with yet more objects. Retaining referential integrity could be a nightmare. Maybe someone has a nice solution for that.
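For the referential-integrity problem, one partial approach is to collect an object together with everything it references and copy them parents-first. The sketch below is a hypothetical, framework-free illustration: rows are plain dicts keyed by (table, id), each with an assumed `_fks` list of the keys it points to. It only computes a safe copy order; it does not handle circular references (graphlib raises CycleError) or primary-key collisions on the live side, which remain the genuinely hard parts.

```python
from graphlib import TopologicalSorter


def copy_order(rows, root):
    """Return the keys reachable from `root` via foreign keys, parents first.

    `rows` maps a (table, id) key to a dict; the hypothetical "_fks" entry
    lists the (table, id) keys that row references.
    """
    graph = {}
    stack = [root]
    while stack:
        key = stack.pop()
        if key in graph:
            continue  # already visited via another relationship
        deps = rows[key].get("_fks", [])
        graph[key] = set(deps)  # key must be copied after its deps
        stack.extend(deps)
    # static_order() yields each node only after all of its predecessors
    return list(TopologicalSorter(graph).static_order())
```

For a post that references an author and a category, `copy_order` places the author and category before the post, so each row's foreign keys already exist on the live side when it is inserted.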

Related

Updating a Site With Active Member Registration

I want to take on a project, but I’m not sure how to handle the updating process.
Normally, when asked to update a site, you back up the database and site files, then make the updates locally or on a development server. When the updates are finished, you push them live.
My problem is that the site I’ll be working on registers new members every day, publishes blog posts every day, and gets new comments on those posts every day. If I were to pull the site on Monday, update it in a testing environment, then push those changes live on Friday, every member registration and blog entry from that week would be overwritten.
So what’s the best way to go about doing this? How do I update/add features to a site without losing the data gained on the live site during development? Surely it must be possible, since high-traffic sites like TechCrunch and Gizmodo make huge sitewide updates all the time without losing data.

It depends on what changes you're making. Is it file/template changes or database changes?
If it's just file changes, just pull the files and database to your local server, make changes to your files and then just push them (files only) to the live server when done. As long as no database changes have happened, that will work.
If there are DB changes, things get a bit trickier. You would basically follow the same process, but make a note of every DB change you make on the local site. When everything is ready to be pushed to the live server, you have no option but to take the site offline for users while you update.
You would then push all updated files to the live server and mirror the DB changes you made on the local server (installing/updating plugins, etc.). When all that is done and tested, you can put the site online again. Downtime should be minimal if you have kept good notes on the DB changes.
This depends on being able to block access for users while still allowing access for yourself, but that's standard in most CMSs.
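If your stack doesn't provide a maintenance mode, the "block users but not yourself" step can be approximated with a small WSGI middleware. This is a sketch assuming a Python/WSGI deployment; `enabled` (a callable returning the maintenance flag) and the IP allowlist are hypothetical parameters. Behind a reverse proxy, REMOTE_ADDR will be the proxy's address, so you would need to check a trusted forwarded header instead.

```python
def maintenance_middleware(app, enabled, allowed_ips):
    """Wrap a WSGI app so only allowlisted clients get through during an update."""
    def wrapper(environ, start_response):
        if enabled() and environ.get("REMOTE_ADDR") not in allowed_ips:
            body = b"Down for maintenance, back soon."
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Retry-After", "600"),  # hint to clients: retry in 10 minutes
            ])
            return [body]
        # Maintenance off, or the request comes from an allowed address
        return app(environ, start_response)
    return wrapper
```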
Also, if you don't already, you should look at integrating Git into your workflow. If the changes you'll be making take a considerable amount of time, you need a system where you can branch your code off into new versions while keeping the original state of the code that's on the live server.
That way, if an urgent fix is needed on the live site while you're in the middle of developing new features locally, you can switch back to your master/original branch and change the code there, without any of the new work from your other branch.

Well, I've only done this for low-traffic WordPress/Drupal sites, but not having a "live" version hasn't been an issue for me. I have my development copy, test the changes I want, and then roll those changes out to the live site on the fly by FTPing them back up.
Are you going to be editing these registrations? Or are you just tweaking static files?
In the case of WordPress, I test a plugin out, and then just install it on the live site.
Typically the changes I'm making involve plugins/modules and some PHP stuff. This is obviously not the most nuanced solution, and I'm interested to see what more knowledgeable people have in mind.

Need ideas on retrieving data from a website

I'm stumped and need some ideas on how to do this or even whether it can be done at all.
I have a client who would like to build a website tailored to English-speaking travelers in a specific country (Thailand, in this case). The different modes of transportation (bus & train) have good web sites for providing their respective information. And both are very static in terms of the data they present (the schedules rarely change). Here's one of the sites I would need to get info from: train schedules The client wants to provide users the ability to search for a beginning and end location and determine, using the external website's information, how they can best get there, being provided a route with schedule times for the different modes of chosen transport.
Now, in my limited experience, I would think the way to do that would be to retrieve the original schedule info from the external site's server (via API or some other means) and retain the info in a database, which can be queried as needed. Our first thought was to contact the respective authorities to determine how/if this can be done, but this has proven to be problematic due to the language barrier, mainly.
My client suggested what is basically "screen scraping", but that sounds like it would be complicated at best: downloading the web page(s) and filtering through the HTML for the relevant data to put into the database. My worry is that these mainly static sites are so static that the data isn't even kept in a database to build the page, and the web page itself is updated (hard-coded) whenever something changes.
I could really use some help and suggestions here. Thanks!

Screen scraping is always problematic IMO as you are at the mercy of the person who wrote the page. If the content is static, then I think it would be easier to copy the data manually to your database. If you wanted to keep up to date with changes, you could then snapshot the page when you transcribe the info and run a job to periodically check whether the page has changed from the snapshot. When it does, it sends an email for you to update it.
The above method could also be used in conjunction with some sort of screen scraper, falling back to a manual process if the page changes too drastically.
Ultimately, it is a question of how much effort (cost) your client is willing to bear for accuracy.
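The snapshot-and-compare check from this answer could be as simple as storing a hash of the page when you transcribe it and comparing against it from a scheduled job. A minimal sketch, assuming you fetch the page yourself (e.g. with urllib.request) and send the alert email separately; the function names and the snapshot file are hypothetical. If the page contains timestamps or other content that changes on every request, hash only the schedule table instead of the whole page.

```python
import hashlib
from pathlib import Path


def take_snapshot(content: bytes, snapshot_file: Path) -> None:
    """Record the page's hash at the moment you transcribe its data."""
    snapshot_file.write_text(hashlib.sha256(content).hexdigest())


def page_changed(content: bytes, snapshot_file: Path) -> bool:
    """True if the page no longer matches the hash saved by take_snapshot.

    The snapshot is deliberately not updated here, so the job keeps
    alerting until someone re-transcribes the data and re-snapshots.
    """
    current = hashlib.sha256(content).hexdigest()
    return current != snapshot_file.read_text().strip()
```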

I have done this for the following site: http://www.buscatchers.com/ so it's definitely more than doable! A key feature of a web scraping solution for travel sites is that it must send you emails if anything went wrong during the scraping process. On the site, I use a two day window so that I have two days to fix the code if the design changes. Only once or twice have I had to change my code, and it's very easy to do.
As for examples, there is some simplified source code here: http://www.buscatchers.com/about/guide. The full source code for the project is here: https://github.com/nicodjimenez/bus_catchers. This should give you some ideas on how to get started.
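The "email on failure, two-day window" idea from this answer can be sketched as a wrapper around the scrape: any exception (for example a missing table after a redesign) produces an alert stating when the cached data would go stale. This is a hypothetical sketch, not code from the linked project; sending the message (e.g. via smtplib) is left to the caller, and the two-day default simply mirrors the window described above.

```python
from datetime import datetime, timedelta


def check_scrape(scrape, last_success: datetime, now: datetime,
                 window: timedelta = timedelta(days=2)):
    """Run one scrape and return (new_last_success, alert_message_or_None).

    `scrape` is any callable that raises when something goes wrong,
    e.g. when an expected element is missing after a site redesign.
    """
    try:
        scrape()
    except Exception as exc:
        # Keep serving the last good data, but alert so the scraper
        # can be fixed before that data falls outside the window.
        deadline = last_success + window
        return last_success, f"Scrape failed ({exc!r}); cached data expires {deadline:%Y-%m-%d %H:%M}"
    return now, None
```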

I can tell that the data is dynamic; it's too well structured to be hand-written. It's not hard for someone who is familiar with XPath to scrape this site.
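To illustrate why well-structured markup is easy to scrape, here is a sketch using the limited XPath subset in Python's xml.etree.ElementTree. The markup and values are hypothetical, and ElementTree only parses well-formed XML; real-world HTML usually needs a tolerant HTML parser (lxml.html, for instance, supports full XPath). Unpacking exactly three cells per row means a layout change raises an error instead of silently producing bad data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified schedule markup
SAMPLE = """
<table id="schedule">
  <tr><th>Train</th><th>From</th><th>Departs</th></tr>
  <tr><td>7</td><td>Station A</td><td>08:30</td></tr>
  <tr><td>9</td><td>Station A</td><td>18:10</td></tr>
</table>
"""


def parse_schedule(markup):
    """Turn each data row of the schedule table into a dict."""
    root = ET.fromstring(markup.strip())
    rows = []
    # "tr[td]" selects direct <tr> children that contain <td> cells,
    # which skips the header row made of <th> cells.
    for tr in root.findall("tr[td]"):
        train, origin, departs = (td.text.strip() for td in tr.findall("td"))
        rows.append({"train": train, "from": origin, "departs": departs})
    return rows
```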

Web app: keeping track of the application version in the database?

We are building a web app which is shipped to several clients as a Debian package. Each client runs their own server, but updates and support are done by us.
We make regular releases of the product, with a clean version number. Most of the users get automatic updates (via Puppet); some others don't.
We want to keep track of the application version (to let users check the version in an "about" section, and to help our support team assist users more accurately).
We plan to store the version of the code and the version of the database schema in our database, and to keep this info up to date automatically.
Is that a good idea?
The other alternative we see is a file.
EDIT: The code and database schema are updated together (if we update to version x.y.z, both code and database go to x.y.z).

Using a table to track every change to a schema as described in this post is a good practice that I'd definitely suggest to follow.
For the application, if it is shipped independently of the database (which is not clear to me), I'd embed a file in the package (and thus not use the database to store the version of the web application).
If not and thus if both the application and the database versions are maintained in sync, then I'd just use the information stored in the database.
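A table that records each applied schema change can be sketched with Python's sqlite3 module. This is a hypothetical illustration (the `schema_version` table, the `MIGRATIONS` list and the `customer` table are made up here): the highest recorded version doubles as the database version you could show in an "about" section. Python's sqlite3 module runs DDL outside its implicit transactions by default, so in this sketch a schema change and its log row are not committed atomically; PostgreSQL's transactional DDL would allow that.

```python
import sqlite3

# Ordered schema changes; version N is MIGRATIONS[N - 1] (hypothetical tables)
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply any migrations not yet recorded and return the schema version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version ("
        " version INTEGER PRIMARY KEY,"
        " applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS[current:], start=current + 1):
        with conn:  # commits the log row once the change has run
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    return conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version").fetchone()[0]
```

Running `migrate` again is a no-op, because only the entries after the recorded version are applied.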

As a general rule, I would have both a DB version and an application version. The question here is how "private" the database is. If the database is "private" to the application, and users never modify the schema, then your initial solution is fine. In my experience, databases which accumulate several years of data stop being private: users add a table or two and access the data using some reporting tool, and from that point on the database is no longer used exclusively by the application.
UPDATE
One more thing to consider is users not being able to connect to the DB through the application and calling for support. For this case it would be better to have the version, etc., stored on the file system.

Assuming there are no compelling reasons to go with one approach or the other, I think I'd go with keeping them in the database.

I'd put them in both places. Then when running your about function you quickly check that they are both the same, and if they aren't you can display extra information about the version mismatch. If they're the same then you will only need to display one of them.
I've generally found users can do "clever" things like revert databases back to old versions by manually copying directories around "because they can" so defensively dealing with it is always a good idea.
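Keeping the version in both places and comparing them might look like the sketch below. The VERSION file shipped in the package and the single-row `app_version` table are assumptions; the file is read first so the about page can still report something when the database is unreachable, as suggested in the earlier answer.

```python
import sqlite3
from pathlib import Path


def about_version(version_file: Path, conn: sqlite3.Connection) -> str:
    """Report the installed version, flagging any code/database mismatch."""
    file_version = version_file.read_text().strip()  # works even if the DB is down
    try:
        row = conn.execute("SELECT version FROM app_version").fetchone()
        db_version = row[0] if row else None
    except sqlite3.Error:
        return f"{file_version} (database unavailable)"
    if db_version == file_version:
        return file_version  # the common case: show a single version
    return f"MISMATCH: code {file_version}, database {db_version}"
```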

Best Practice for seeing live data on the dev server?

Assumption: live/production web app suppresses errors being shown to end-users.
Suppose your tech support team wants to see live data but through the eyes of the development-side of the application (maybe you want to see what errors are occurring, or want to see when you've got an issue fixed using an end-user's data).
Right now we've got one database serving both the dev and live boxes (not my idea - I know it's gross).
Ideas?
Edit: Best/handy tools for implementing your suggestion?

We replicate the data to a different database. Yes, there is a delay, but it keeps people's hands out of the production servers. This also allows us to "hide" information that tech support (and other people, for that matter) aren't supposed to see.

In addition to replicating data down, in production we check who's logged into the application, and if it's a member of the company, we send them to the real error page instead of the apologetic page with a happy kitten playing with a ball of yarn.
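The "real error page for staff, friendly page for everyone else" idea could be sketched as WSGI middleware, assuming a Python/WSGI stack. `is_staff` is a hypothetical hook into your own authentication; the sketch only catches exceptions raised while the app is called, not ones raised later while the response body is being iterated.

```python
import sys
import traceback


def error_page_middleware(app, is_staff):
    """Show tracebacks to company users and a friendly page to everyone else."""
    def wrapper(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            if is_staff(environ):
                body = traceback.format_exc().encode("utf-8")
            else:
                body = b"Sorry, something went wrong."
            # Passing exc_info marks this as an error response; per PEP 3333
            # the server re-raises if headers were already sent.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain; charset=utf-8")],
                           sys.exc_info())
            return [body]
    return wrapper
```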

Back up and restore from live to dev on a regular basis (once or twice a day). It doesn't need to be real-time (you might be entering data from the dev side anyway, which could cause problems).
If you have PCI or HIPAA data, make sure you don't put that in your dev environment -- that might break laws.
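When copying production data down to dev, sensitive columns can be masked on the way. A minimal sketch using sqlite3 connections for both sides; the `customer` table, its columns and the SENSITIVE mapping are hypothetical, the dev schema is assumed to exist already, and table names are interpolated into SQL, so they must come from your own code, never from user input.

```python
import sqlite3

# Hypothetical masking rules: None blanks the column, a string is a
# str.format template filled from the original row's values.
SENSITIVE = {
    "customer": {"email": "user{id}@example.invalid", "card_number": None},
}


def copy_scrubbed(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> None:
    """Copy every row of `table` from src to dst, masking sensitive columns."""
    cols = [info[1] for info in src.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    masks = SENSITIVE.get(table, {})
    with dst:  # one transaction on the dev side
        for row in src.execute(f"SELECT {col_list} FROM {table}"):
            original = dict(zip(cols, row))
            masked = dict(original)
            for col, template in masks.items():
                masked[col] = template.format(**original) if template else None
            dst.execute(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
                        [masked[c] for c in cols])
```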

I generally like to have a 3-tier system for web development:
Development
Testing
Live
Most of the time, testing is an exact copy of the live system, except that errors are turned on. When a new version is about to go live, it is deployed to testing BEFORE live, to detect upgrade issues.
Development is completely separate from live, to allow for major changes to things like the database, or changes to the production environment.

First, I would make sure errors are either emailed to someone, with details of how the user got there, or at minimum logged, so you can watch the error log while you perform similar actions to see if you get the same messages.
And yes, copying the database to the dev server/site is probably your only option. You don't want any changes made by the development team to affect live data, and at some point you'll probably also have changes that won't work with the production database.
I wouldn't recommend doing a nightly copy, as a developer might be in the middle of a new feature where they have added data, and then it's erased that night. I usually copy the production database(s) to dev each time a major version is released. This also allows me to do speed testing with a lot of live data. On some systems I also change everyone's password to a default so I can log in easily as any user.

If your configuration permits it:
a. Add a logging function (if there isn't one already) to write messages of interest to a log file.
b. Run the Unix command
tail -f logfile.txt
which will stream the growing log file to your console.
http://www.monkey.org/cgi-bin/man2html?tail
If you have Windows, you might try this:
http://tailforwin32.sourceforge.net/
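For step (a), if the application is written in Python, the standard logging module can write the log file that tail -f follows. A minimal sketch; the logger name and message format are arbitrary choices.

```python
import logging


def make_file_logger(path, name="app.debug"):
    """Return a logger that appends timestamped messages to `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid duplicate lines if called twice
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

For example, `make_file_logger("logfile.txt").info("checkout failed for order %s", order_id)` appends a line you can watch with `tail -f logfile.txt` while reproducing the problem.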

Altering database tables in Django

I'm considering using Django for a project I'm starting (fyi, a browser-based game) and one of the features I'm liking the most is using syncdb to automatically create the database tables based on the Django models I define (a feature that I can't seem to find in any other framework).
I was already thinking this was too good to be true when I saw this in the documentation:
Syncdb will not alter existing tables
syncdb will only create tables for models which have not yet been installed. It will never issue ALTER TABLE statements to match changes made to a model class after installation. Changes to model classes and database schemas often involve some form of ambiguity and, in those cases, Django would have to guess at the correct changes to make. There is a risk that critical data would be lost in the process.
If you have made changes to a model and wish to alter the database tables to match, use the sql command to display the new SQL structure and compare that to your existing table schema to work out the changes.
It seems that altering existing tables will have to be done "by hand".
What I would like to know is the best way to do this. Two solutions come to mind:
As the documentation suggests, make the changes manually in the DB;
Do a backup of the database, wipe it, create the tables again (with syncdb, since now it's creating the tables from scratch) and import the backed-up data (this might take too long if the database is big)
Any ideas?

Manually doing the SQL changes and dump/reload are both options, but you may also want to check out some of the schema-evolution packages for Django. The most mature options are django-evolution and South.
EDIT: And hey, here comes dmigrations.
UPDATE: Since this answer was originally written, django-evolution and dmigrations have both ceased active development and South has become the de-facto standard for schema migration in Django. Parts of South may even be integrated into Django within the next release or two.
UPDATE: A schema-migrations framework based on South (and authored by Andrew Godwin, author of South) is included in Django 1.7+.

As noted in other answers to the same topic, be sure to watch the DjangoCon 2008 Schema Evolution Panel on YouTube.
Also, two new projects on the map: Simplemigrations and Migratory.

One good way to do this is via fixtures, particularly the initial_data fixtures.
A fixture is a collection of files that contain the serialized contents of the database. So it's like having a backup of the database but as it's something Django is aware of it's easier to use and will have additional benefits when you come to do things like unit testing.
You can create a fixture from the data currently in your DB using django-admin.py dumpdata. By default the data is in JSON format, but other options such as XML are available. A good place to store fixtures is a fixtures sub-directory of your application directories.
You can load a fixture using django-admin.py loaddata, but more significantly, if your fixture has a name like initial_data.json it will be automatically loaded when you do a syncdb, saving you the trouble of importing it yourself.
Another benefit is that when you run manage.py test to run your unit tests, the temporary test database will also have the initial data fixture loaded.
Of course, this works when you're adding attributes to models and columns to the DB. If you drop a column from the database, you'll need to update your fixture to remove the data for that column, which might not be straightforward.
This works best when doing lots of little database changes during development. For updating production DBs a manually generated SQL script can often work best.
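For the case where a column is dropped, the fixture can be cleaned with a small script. dumpdata's JSON output is a list of objects with "model", "pk" and "fields" keys; the sketch below assumes that layout and uses hypothetical model and field names.

```python
import json


def drop_field(fixture_text, model, field):
    """Remove `field` from every object of `model` in a dumpdata-style JSON fixture."""
    objects = json.loads(fixture_text)
    for obj in objects:
        if obj["model"] == model:
            obj["fields"].pop(field, None)  # ignore objects that lack the field
    return json.dumps(objects, indent=2)
```

Only objects of the named model are touched, so another model that happens to have a field with the same name keeps it.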

I've been using django-evolution. Caveats include:
Its automatic suggestions have been uniformly rotten; and
Its fingerprint function returns different values for the same database on different platforms.
That said, I find the custom schema_evolution.py approach handy. To work around the fingerprint problem, I suggest code like:
BEFORE = 'fv1:-436177719'  # first fingerprint
BEFORE64 = 'fv1:-108578349625146375'  # same, but on 64-bit Linux
AFTER = 'fv1:-2132605944'
AFTER64 = 'fv1:-3559032165562222486'
fingerprints = [
    BEFORE, AFTER,
    BEFORE64, AFTER64,
]
CHANGESQL = """
/* put your SQL code to make the changes here */
"""
# Map each (before, after) fingerprint pair to the SQL that gets from one to the other
evolutions = [
    ((BEFORE, AFTER), CHANGESQL),
    ((BEFORE64, AFTER64), CHANGESQL),
]
If I had more fingerprints and changes, I'd refactor it. Until then, making it cleaner would be stealing development time from something else.
EDIT: Given that I'm manually constructing my changes anyway, I'll try dmigrations next time.

django-command-extensions is a Django library that gives some extra commands to manage.py. One of them is sqldiff, which should give you the SQL needed to update to your new model. It is, however, listed as 'very experimental'.

So far in my company we have used the manual approach. What works best for you depends very much on your development style.
We generally don't have many schema changes in production systems, and we have somewhat formalized rollouts from development to production servers. Whenever we roll out (10-20 times a year), we do a full diff of the current and the upcoming production branch, reviewing all the code and noting what has to be changed on the production server. The required changes might be additional dependencies, changes to the settings file and changes to the database.
This works very well for us. Having it all automated is a nice vision but too difficult for us; maybe we could manage migrations, but we would still need to handle additional library, server and other dependencies.

Django 1.7 (currently in development) is adding native support for schema migration with manage.py migrate and manage.py makemigrations (migrate deprecates syncdb).