Only one database created while using a multi-database setup

I am trying to set up Celery in one of my Django projects. I want Celery to use a separate database. Currently, as the project is in the development phase, we are using sqlite3. In order to set up multiple databases I did the following.
Defined databases in the settings.py file.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'devel',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    },
    'celery': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'celery',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    },
}
Created a Router Object in db_routers.py file
class CeleryRouter(object):
    """
    This class will route all celery-related models to a
    separate database.
    """
    # Define the applications to be used in the celery database
    APPS = (
        'django',
        'djcelery',
    )
    # Define the database alias
    DB = 'celery'

    def db_for_read(self, model, **hints):
        """
        Point read operations to the celery database.
        """
        if model._meta.app_label in self.APPS:
            return self.DB
        return None

    def db_for_write(self, model, **hints):
        """
        Point write operations to the celery database.
        """
        if model._meta.app_label in self.APPS:
            return self.DB
        return None

    def allow_relation(self, obj1, obj2, **hints):
        """
        Allow any relation between two objects in the db pool.
        """
        if (obj1._meta.app_label in self.APPS) and \
           (obj2._meta.app_label in self.APPS):
            return True
        return None

    def allow_syncdb(self, db, model):
        """
        Make sure the celery tables appear only in the celery
        database.
        """
        if db == self.DB:
            return model._meta.app_label in self.APPS
        elif model._meta.app_label in self.APPS:
            return False
        return None
Updated the DATABASE_ROUTERS variable in the settings.py file
DATABASE_ROUTERS = [
    'appname.db_routers.CeleryRouter',
]
Now, when I run python manage.py syncdb I see that the tables are created for celery, but only one database, devel, is created. Why are the tables being created in the devel database and not in the celery database?

Quote from Django docs:
The syncdb management command operates on one database at a time. By default, it operates on the default database, but by providing a --database argument, you can tell syncdb to synchronize a different database.
Try running:
./manage.py syncdb --database=celery
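If you want to double-check the routing itself, here is a minimal, hedged sketch (it assumes django-celery is installed and that djcelery.models.TaskMeta is one of the models you expect to be routed) that asks Django's router which alias it would use:
# Run from `python manage.py shell`; a quick routing sanity check.
from django.db import router
from djcelery.models import TaskMeta  # assumption: django-celery's result model

print(router.db_for_write(TaskMeta))  # should print 'celery' if the router matches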

Related

Vapor 3: Using multiple databases

Using Vapor 3, is there an easy way to switch databases, while the server is running?
For example, a user logs in using the 'login' db. I then set the db for that user in their cookie. Any subsequent requests from that user then use the db identified in the cookie (the 'user' in this scenario would really be a company).
All db's would be from the same db family (eg MySQL).
This would keep every company's data in its own db, and limit the size of each db (and hopefully, overall, db operations would be faster).
Also, any need to restore a db would only impact one company, and backups would be simpler.
How to achieve this?
Would this be very inefficient?
Are there other better ways to achieve this?
As far as I understand, you could create some different database identifiers like:
extension DatabaseIdentifier {
    static var db1: DatabaseIdentifier<MySQLDatabase> {
        return .init("db1")
    }
    static var db2: DatabaseIdentifier<MySQLDatabase> {
        return .init("db2")
    }
}
and then register them in configure.swift like this
let db1 = MySQLDatabase(config: MySQLDatabaseConfig(hostname: "localhost", username: "root", database: "db1"))
let db2 = MySQLDatabase(config: MySQLDatabaseConfig(hostname: "localhost", username: "root", database: "db2"))
var databaseConfig = DatabasesConfig()
databaseConfig.add(database: db1, as: .db1)
databaseConfig.add(database: db2, as: .db2)
services.register(databaseConfig)
After that, don't forget to use the .db1 and .db2 identifiers everywhere instead of the default .mysql (for MySQL), e.g. in migrations
migrations.add(model: User.self, database: .db1)
with pooled connections
return req.requestPooledConnection(to: .db1).flatMap { conn in
    defer { try? req.releasePooledConnection(conn, to: .db1) }
    return User.query(on: conn).all()
}
and in transactions
return req.transaction(on: .db1) { conn in
    return User.query(on: conn).all()
}
Sorry if I haven't answered your questions. I understand that it'd be great if Fluent could support passing a database name for each query, but I haven't found that in it (or it's not obvious how to pass a database name on a query).
But, by the way, from my point of view having a separate database for each client may give you a real headache with migrations... maybe it'd be better to store them all in one database but with partitioning? E.g. for PostgreSQL, as described here

Synchronizing data between two Django servers

I have a central Django server containing all of my information in a database. I want to have a second Django server that contains a subset of that information in a second database. I need a bulletproof way to selectively sync data between the two.
The secondary Django will need to pull its subset of data from the primary at certain times. The subset will have to be filtered by certain fields.
The secondary Django will have to occasionally push its data to the primary.
Ideally, the two-way sync would keep the most recently modified objects for each model.
I was thinking something along the lines of using TimeStampedModel (from django-extensions) or adding my own DateTimeField(auto_now=True) so that every object stores its last modified time. Then, maybe a mechanism to dump the data from one DB and load it into the other such that only the more recently modified objects are kept.
Possibilities I am considering are django's dumpdata, django-extensions dumpscript, django-test-utils makefixture or maybe django-fixture magic. There's a lot to think about, so I'm not sure which road to proceed down.
Here is my solution, which fits all of my requirements:
Implement natural keys and unique constraints on all models
Allows for a unique way to refer to each object without using primary key IDs
Subclass each model from TimeStampedModel in django-extensions
Adds automatically updated created and modified fields
Create a Django management command for exporting, which filters a subset of data and serializes it with natural keys
import itertools
from django.core import serializers

# Filter the subset to export (foo=bar is a placeholder filter)
baz = Baz.objects.filter(foo=bar)
yaz = Yaz.objects.filter(foo=bar)
objects = [baz, yaz]
flat_objects = list(itertools.chain.from_iterable(objects))
# Serialize with natural keys so objects can be matched without PKs
data = serializers.serialize("json", flat_objects, indent=3, use_natural_keys=True)
print(data)
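For context, here is a minimal, hedged sketch of how that snippet could sit inside the exportTool management command referenced in the workflow below; Baz, Yaz and the foo=bar filter are the same placeholders as above:
# appname/management/commands/exportTool.py (sketch)
import itertools
from django.core import serializers
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Export a filtered subset of data as JSON using natural keys"

    def handle(self, *args, **options):
        querysets = [Baz.objects.filter(foo=bar), Yaz.objects.filter(foo=bar)]
        flat_objects = list(itertools.chain.from_iterable(querysets))
        self.stdout.write(serializers.serialize(
            "json", flat_objects, indent=3, use_natural_keys=True))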
Create a Django management command for importing, which reads in the serialized file and iterates through the objects as follows:
If the object does not exist in the database (by natural key), create it
If the object exists, check the modified timestamps
If the imported object is newer, update the fields
If the imported object is older, do not update (but print a warning)
Code sample:
from django.core import serializers
from django.core.exceptions import ObjectDoesNotExist

# Open the file
with open(args[0]) as data_file:
    json_str = data_file.read()

# Deserialize and iterate
for obj in serializers.deserialize("json", json_str):
    # Get model info
    model_class = obj.object.__class__
    natural_key = obj.object.natural_key()
    manager = model_class._default_manager
    # Delete PK value
    obj.object.pk = None
    try:
        # Get the existing object
        existing_obj = model_class.objects.get_by_natural_key(*natural_key)
        # Check the timestamps
        date_existing = existing_obj.modified
        date_imported = obj.object.modified
        if date_imported > date_existing:
            # Update fields
            for field in obj.object._meta.fields:
                if field.editable and not field.primary_key:
                    imported_val = getattr(obj.object, field.name)
                    existing_val = getattr(existing_obj, field.name)
                    if existing_val != imported_val:
                        setattr(existing_obj, field.name, imported_val)
            # Persist the updated fields
            existing_obj.save()
    except ObjectDoesNotExist:
        obj.save()
The workflow for this is to first call python manage.py exportTool > data.json, then on another Django instance (or the same one), call python manage.py importTool data.json.
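Similarly, a hedged sketch of the importTool wrapper that the args[0] in the code sample assumes (old-style positional arguments; the command name matches the workflow above):
# appname/management/commands/importTool.py (sketch)
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Import a JSON dump produced by exportTool"

    def handle(self, *args, **options):
        # args[0] is the path to data.json; the deserialize-and-merge
        # loop from the code sample above goes here.
        pass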

GAE Datastore ID

I've created two different Entities, one a User and one a Message they can create. I assign each user an ID and then want to assign this ID to each message which that user creates. How can I go about this? Do I have to do it in a query?
Thanks
Assuming that you are using Python NDB, you can have something like the following:
from google.appengine.ext import ndb

class User(ndb.Model):
    # put your fields here
    pass

class Message(ndb.Model):
    owner = ndb.KeyProperty()
    # other fields
Create and save a User:
user = User(field1=value1, ....)
user.put()
Create and save a Message:
message = Message(owner=user.key, ...)
message.put()
Query a message based on user:
messages = Message.query().filter(Message.owner==user.key).fetch() # returns a list of messages that have this owner
For more information about NDB, take a look at Python NDB API.
Also, you should take a look at Python Datastore in order to get a better understanding of data modeling in App Engine.
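As a small, hedged variation on the model above (not part of the original answer): if you want to restrict owner to User keys, or to read back the numeric ID that Datastore assigned, NDB lets you write:
class Message(ndb.Model):
    owner = ndb.KeyProperty(kind=User)  # only User keys can be stored here
    # other fields

user_id = user.key.id()                # the numeric ID Datastore assigned
same_user = User.get_by_id(user_id)    # fetch the user back by that ID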

How to create tables in multiple databases using Django models.py

I would like to create table "A" in one database (assume SQL Server 2008) and another table "B" in a different database (MySQL) using models.py through Django.
The structures of tables "A" and "B" may differ. I have verified that this can be achieved through router.py.
I want to do it without a "router.py" file.
Could anyone guide me on this, please?
Thanks,
Shiva.
Try using super in the save() method.
When you save, it will also run other commands, such as writing to another database.
For example, you have:
class Chair(models.Model):
    name = models.CharField(max_length=30)
You can implement the save() method like this:
class Chair(models.Model):
    name = models.CharField(max_length=30)

    def save(self, *args, **kwargs):
        # <your cmd here>
        super(Chair, self).save(*args, **kwargs)
It may seem limited like that, but you can use the arguments to do anything you want.
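For instance, a minimal, hedged sketch of writing the same row to a second database from save(); the 'mysql_b' alias is an assumption and would have to exist in DATABASES, and this naive form does not handle conflicts between the two backends:
class Chair(models.Model):
    name = models.CharField(max_length=30)

    def save(self, *args, **kwargs):
        # Normal save to the default database
        super(Chair, self).save(*args, **kwargs)
        # Also write the row to a second, hypothetical database alias
        super(Chair, self).save(using='mysql_b')
Both databases still need the table to exist (created via syncdb --database=... or the equivalent for your Django version).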

Google App Engine: Using Big Query on datastore?

I have a GAE datastore kind with several hundred thousand objects in it. I want to do several involved queries (involving counting queries). BigQuery seems a good fit for doing this.
Is there currently an easy way to query a live App Engine Datastore using BigQuery?
You can't run a BigQuery directly on DataStore entities, but you can write a Mapper Pipeline that reads entities out of DataStore, writes them to CSV in Google Cloud Storage, and then ingests those into BigQuery - you can even automate the process. Here's an example of using the Mapper API classes for just the DataStore to CSV step:
import re
import time
from datetime import datetime
import urllib
import httplib2
import pickle
import logging

from google.appengine.ext import blobstore
from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext.webapp import blobstore_handlers
from google.appengine.ext.webapp import util
from google.appengine.ext.webapp import template

from mapreduce.lib import files
from google.appengine.api import taskqueue
from google.appengine.api import users
from mapreduce import base_handler
from mapreduce import mapreduce_pipeline
from mapreduce import operation as op

from apiclient.discovery import build
from google.appengine.api import memcache
from oauth2client.appengine import AppAssertionCredentials

# Number of shards to use in the Mapper pipeline
SHARDS = 20

# Name of the project's Google Cloud Storage bucket
GS_BUCKET = 'your bucket'

# DataStore model
class YourEntity(db.Expando):
    field1 = db.StringProperty()  # etc, etc

ENTITY_KIND = 'main.YourEntity'


class MapReduceStart(webapp.RequestHandler):
    """Handler that provides a link for the user to start the MapReduce pipeline."""
    def get(self):
        pipeline = IteratorPipeline(ENTITY_KIND)
        pipeline.start()
        path = pipeline.base_path + "/status?root=" + pipeline.pipeline_id
        logging.info('Redirecting to: %s' % path)
        self.redirect(path)


class IteratorPipeline(base_handler.PipelineBase):
    """A pipeline that iterates through the datastore."""
    def run(self, entity_type):
        output = yield mapreduce_pipeline.MapperPipeline(
            "DataStore_to_Google_Storage_Pipeline",
            "main.datastore_map",
            "mapreduce.input_readers.DatastoreInputReader",
            output_writer_spec="mapreduce.output_writers.FileOutputWriter",
            params={
                "input_reader": {
                    "entity_kind": entity_type,
                },
                "output_writer": {
                    "filesystem": "gs",
                    "gs_bucket_name": GS_BUCKET,
                    "output_sharding": "none",
                }
            },
            shards=SHARDS)


def datastore_map(entity_type):
    # Emit one CSV line per entity, quoting each property value
    props = GetPropsFor(entity_type)
    data = db.to_dict(entity_type)
    result = ','.join(['"%s"' % str(data.get(k)) for k in props])
    yield('%s\n' % result)


def GetPropsFor(entity_or_kind):
    if isinstance(entity_or_kind, basestring):
        kind = entity_or_kind
    else:
        kind = entity_or_kind.kind()
    cls = globals().get(kind)
    return cls.properties()


application = webapp.WSGIApplication(
    [('/start', MapReduceStart)],
    debug=True)


def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()
If you append this to the end of your IteratorPipeline class: yield CloudStorageToBigQuery(output), you can pipe the resulting csv filehandle into a BigQuery ingestion pipe... like this:
class CloudStorageToBigQuery(base_handler.PipelineBase):
    """A pipeline that kicks off a BigQuery ingestion job."""
    def run(self, output):
        # BigQuery API settings
        SCOPE = 'https://www.googleapis.com/auth/bigquery'
        PROJECT_ID = 'Some_ProjectXXXX'
        DATASET_ID = 'Some_DATASET'

        # Create a new API service for interacting with BigQuery
        credentials = AppAssertionCredentials(scope=SCOPE)
        http = credentials.authorize(httplib2.Http())
        bigquery_service = build("bigquery", "v2", http=http)

        jobs = bigquery_service.jobs()
        table_name = 'datastore_dump_%s' % datetime.utcnow().strftime(
            '%m%d%Y_%H%M%S')
        files = [str(f.replace('/gs/', 'gs://')) for f in output]
        result = jobs.insert(
            projectId=PROJECT_ID,
            body=build_job_data(table_name, files, PROJECT_ID, DATASET_ID)).execute()
        logging.info(result)


def build_job_data(table_name, files, project_id, dataset_id):
    return {"projectId": project_id,
            "configuration": {
                "load": {
                    "sourceUris": files,
                    "schema": {
                        # put your schema here; fields is a placeholder
                        # for a list of BigQuery field dicts
                        "fields": fields
                    },
                    "destinationTable": {
                        "projectId": project_id,
                        "datasetId": dataset_id,
                        "tableId": table_name,
                    },
                }
            }
           }
With the new (as of September 2013) streaming inserts API, you can import records from your app into BigQuery.
The data is available in BigQuery immediately, so this should satisfy your live requirement.
Whilst this question is now a bit old, this may be an easier solution for anyone stumbling across it.
At the moment, though, getting this to work from the local dev server is patchy at best.
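For illustration, a hedged sketch of such a streaming insert, reusing the bigquery_service object and logging import from the code above; the project, dataset, table and row values are placeholders, and the row keys must match your table's schema:
body = {
    "rows": [
        {
            "insertId": "some-unique-id",   # lets BigQuery de-duplicate retries
            "json": {"field1": "value1"},   # keys must match the table schema
        }
    ]
}
response = bigquery_service.tabledata().insertAll(
    projectId='Some_ProjectXXXX',
    datasetId='Some_DATASET',
    tableId='your_table',
    body=body).execute()
logging.info(response)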
We're doing a Trusted Tester program for moving from Datastore to BigQuery in two simple operations:
Backup the datastore using Datastore Admin's backup functionality
Import backup directly into BigQuery
It automatically takes care of the schema for you.
More info (to apply): https://docs.google.com/a/google.com/spreadsheet/viewform?formkey=dHdpeXlmRlZCNWlYSE9BcE5jc2NYOUE6MQ
For BigQuery you have to export those kinds into a CSV or delimited record structure, load them into BigQuery, and then you can query. There is no facility that I know of which allows querying the live GAE Datastore.
BigQuery is an analytical query engine; that means you can't change a record. No update or delete is allowed, you can only append.
No, BigQuery is a different product that needs the data to be uploaded to it. It cannot work over the datastore. You can use GQL to query the datastore.
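For completeness, a minimal GQL sketch; it reuses the YourEntity and field1 names from the mapper code above purely as placeholders:
import logging
from google.appengine.ext import db

# GQL queries the live datastore directly; it is not BigQuery SQL.
results = db.GqlQuery(
    "SELECT * FROM YourEntity WHERE field1 = :1", "some value").fetch(100)
for entity in results:
    logging.info(entity.field1)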
As of 2016, this is very possible now! You must do the following:
Make a new bucket in Google Cloud Storage
Back up your entities using the Datastore Admin at console.developers.google.com (I have a complete tutorial)
Head to the BigQuery web UI and import the backup files generated in the previous step.
See this post for a complete example of this workflow!
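If you would rather script the final import than use the web UI, here is a hedged sketch of the equivalent load job; the bucket path, project, dataset and table names are placeholders, and it assumes a .backup_info file produced by the Datastore Admin backup plus the same apiclient bigquery_service setup shown earlier:
job_body = {
    "configuration": {
        "load": {
            "sourceFormat": "DATASTORE_BACKUP",
            "sourceUris": [
                "gs://your-bucket/path/YourEntity.backup_info"  # placeholder path
            ],
            "destinationTable": {
                "projectId": "Some_ProjectXXXX",
                "datasetId": "Some_DATASET",
                "tableId": "yourentity_from_backup",
            },
        }
    }
}
result = bigquery_service.jobs().insert(
    projectId="Some_ProjectXXXX", body=job_body).execute()
logging.info(result)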
