To run my app, I need some static entities, so I've decided to upload them with the bulkloader through the Remote API from CSV files.
But some of my entities have relationships between them. For example:
- kind: Category
  properties:
    - name: name
- kind: SubCategory
  ancestor: yes
  properties:
    - name: parent_id
    - name: name
How should I structure the CSV data to make this work?
Or is there a better way to initialize my app's datastore?
If you define the key values (as strings) yourself, then you can create CSV files with those values. One file would contain the Category values: category_key,name. The other file would contain the SubCategory values: subcategory_key,category_key,name. For example:
cat1,Category 1
cat2,Category 2
subcat1,cat1,SubCategory 1
subcat2,cat1,SubCategory 2
subcat3,cat2,SubCategory 3
You can read the files line by line, and create the static entities from the data like this (in Python):
import csv
from google.appengine.ext import ndb

with open('categories.csv') as csvfile:
    categories = csv.reader(csvfile)
    for row in categories:
        Category.get_or_insert(row[0], name=row[1])

with open('subcategories.csv') as csvfile:
    subcategories = csv.reader(csvfile)
    for row in subcategories:
        SubCategory.get_or_insert(row[0], parent_id=ndb.Key(Category, row[1]), name=row[2])
The parent_id value is constructed as an ndb.Key. Both loops use get_or_insert() to prevent duplicates, so you can safely run the script multiple times.
I see that SubCategory has an ancestor, so you could replace the last call with this (and remove the parent_id attribute):
SubCategory.get_or_insert(row[0], parent=ndb.Key(Category, row[1]), name=row[2])
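For completeness, here is a minimal sketch of the model definitions these snippets assume (the property types are my assumption):

from google.appengine.ext import ndb

class Category(ndb.Model):
    name = ndb.StringProperty(required=True)

class SubCategory(ndb.Model):
    # Explicit key reference; drop this field if you use the
    # ancestor (parent=) variant shown above instead.
    parent_id = ndb.KeyProperty(kind=Category)
    name = ndb.StringProperty(required=True)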
I need to create an app that manages soccer match sheets. I currently have a table that stores each match with both teams:
match:
-id
-dt_match
-club_home_id
-club_visitor_id
Each team has a sheet listing its players, so I created a match_sheet table to store both teams' sheets:
match_sheet:
-id
-match_id
To store the players in each sheet, I created the match_sheet_player table:
match_sheet_player:
-id
-match_sheet_id
-player_id
Now I need to display only the matches that have both sheets in my view, and I don't know how to achieve that.
The first query I tried is this:
$matchs_sheets = MatchSheet::all();
$matchs = Match::whereIn('id', $matchs_sheets->pluck('match_id'))->orderByDesc('dt_match')->paginate(5);
But this returns a match even if it has only one sheet rather than both. I really need to show a match only if it has both sheets.
Update:
Here is my data for match_sheet:
There are two records with match_id 1659 (1659 is the id of the match), so I would like to show only match 1659 and not 1649, because match 1649 has only one record.
Assuming your model relationships are set up correctly, you can ask Laravel to get the matches only if the related model has a count of exactly 2, using has(). For instance:
$matches = Match::whereIn('id', $ids)->has('matchSheets', '=', 2)...
Your relationships should be set up like this:
// on Match model
public function matchSheets()
{
    return $this->hasMany(MatchSheet::class);
}

// on MatchSheet model
public function match()
{
    return $this->belongsTo(Match::class);
}
Docs here: https://laravel.com/docs/5.6/eloquent-relationships#one-to-many - I really recommend reading through them; they'll save you huge amounts of time eventually!
I have a central Django server containing all of my information in a database. I want to have a second Django server that contains a subset of that information in a second database. I need a bulletproof way to selectively sync data between the two.
The secondary Django will need to pull its subset of data from the primary at certain times. The subset will have to be filtered by certain fields.
The secondary Django will have to occasionally push its data to the primary.
Ideally, the two-way sync would keep the most recently modified objects for each model.
I was thinking along the lines of using TimeStampedModel (from django-extensions) or adding my own DateTimeField(auto_now=True) so that every object stores its last modified time. Then, maybe a mechanism to dump the data from one DB and load it into the other such that only the more recently modified objects are kept.
Possibilities I am considering are Django's dumpdata, django-extensions' dumpscript, django-test-utils' makefixture, or maybe django-fixture-magic. There's a lot to think about, so I'm not sure which road to proceed down.
Here is my solution, which fits all of my requirements:
Implement natural keys and unique constraints on all models
Allows for a unique way to refer to each object without using primary key IDs
Subclass each model from TimeStampedModel in django-extensions
Adds automatically updated created and modified fields
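As a sketch of how the natural-key and TimeStampedModel pieces might fit together (the slug field and manager are illustrative; Baz matches the export example below):

from django.db import models
from django_extensions.db.models import TimeStampedModel

class BazManager(models.Manager):
    def get_by_natural_key(self, slug):
        return self.get(slug=slug)

class Baz(TimeStampedModel):
    # TimeStampedModel contributes auto-managed `created` and `modified` fields
    slug = models.SlugField(unique=True)  # the natural key (illustrative)
    foo = models.CharField(max_length=100)

    objects = BazManager()

    def natural_key(self):
        return (self.slug,)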
Create a Django management command for exporting, which filters a subset of data and serializes it with natural keys
import itertools
from django.core import serializers

baz = Baz.objects.filter(foo=bar)
yaz = Yaz.objects.filter(foo=bar)
objects = [baz, yaz]
flat_objects = list(itertools.chain.from_iterable(objects))
data = serializers.serialize("json", flat_objects, indent=3, use_natural_keys=True)
print(data)
Create a Django management command for importing, which reads in the serialized file and iterates through the objects as follows:
If the object does not exist in the database (by natural key), create it
If the object exists, check the modified timestamps
If the imported object is newer, update the fields
If the imported object is older, do not update (but print a warning)
Code sample:
from django.core import serializers
from django.core.exceptions import ObjectDoesNotExist

# Open the file
with open(args[0]) as data_file:
    json_str = data_file.read()

# Deserialize and iterate
for obj in serializers.deserialize("json", json_str):
    # Get model info
    model_class = obj.object.__class__
    natural_key = obj.object.natural_key()
    manager = model_class._default_manager

    # Clear the PK value so that saving can never clobber an unrelated row
    obj.object.pk = None

    try:
        # Get the existing object
        existing_obj = manager.get_by_natural_key(*natural_key)
        # Check the timestamps
        date_existing = existing_obj.modified
        date_imported = obj.object.modified
        if date_imported > date_existing:
            # Update fields on the existing object and save it
            for field in obj.object._meta.fields:
                if field.editable and not field.primary_key:
                    imported_val = getattr(obj.object, field.name)
                    existing_val = getattr(existing_obj, field.name)
                    if existing_val != imported_val:
                        setattr(existing_obj, field.name, imported_val)
            existing_obj.save()
        else:
            print("Warning: skipping %s, imported object is older" % (natural_key,))
    except ObjectDoesNotExist:
        # No existing object by natural key: create it
        obj.save()
The workflow for this is to first call python manage.py exportTool > data.json, then on another Django instance (or the same one), call python manage.py importTool data.json.
I have users that have several objects and can upload images for those objects. Each object has several items, and the photos the user uploads can be assigned to those items. The thing is, one object can have one specific item more than once.
To give an example: objects are cars and items are seats, windows, doors, etc. A car may have 5 seats, but all seats are the same item. The image descriptions should, however, still be "seat 1", "seat 2", etc., and the user can upload multiple images for seat 2 as well.
So far I have the following tables:
objects: id, name
items: id, name
assigned_items: id, object_id, item_id, quantity
images: id, object_id, item_id
How would you best solve this issue?
The reason I use quantity is that if the type of the item changes, it most probably changes for all of the items. E.g. 4 seats can become 4 wheels, etc. So if there were a row for each assigned item, let's say seat1, seat2, seat3, etc., then this would be more difficult to change, no?
Take a look at this model:
It allows you to:
Connect multiple items to multiple objects (thanks to the OBJECT_ITEM table).
Connect the same item multiple times to the same object (thanks to the OBJECT_ITEM.POSITION field).
Connect multiple images to an object-item connection (thanks to the OBJECT_ITEM_IMAGE table). So we are connecting to a connection, not directly to an item.
Name the image specifically for the object-item connection (thanks to the OBJECT_ITEM_IMAGE.IMAGE_NAME field), instead of just for the image itself.
Ensure the image name is unique per object-item connection (thanks to the UNIQUE constraint "U1").
NOTE: This model can be simplified if the OBJECT:ITEM relationship is 1:N instead of M:N, but your own attempted model seems to suggest it is M:N.
NOTE: To connect an image directly to an OBJECT (instead of an OBJECT_ITEM), you'd need an additional link table (OBJECT_IMAGE) "between" OBJECT and IMAGE.
Example data:
OBJECT:
Car
ITEM:
Seat
OBJECT_ITEM:
Car-Seat-1
Car-Seat-2
Car-Seat-3
Car-Seat-4
Car-Seat-5
OBJECT_ITEM_IMAGE:
Car-Seat-1-Image1 "Seat1 Image"
Car-Seat-2-Image1 "Seat2 Image"
Car-Seat-2-Image2 "Seat2 Alternate Image"
Car-Seat-3-Image1 "Seat3 Image"
Car-Seat-4-Image1 "Seat4 Image"
Car-Seat-5-Image1 "Seat5 Image"
IMAGE:
Image1
Image2
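To make the model concrete, here is a sketch of the tables as SQLite DDL driven from Python; the column types and exact names are my assumptions based on the description above:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per object-item connection; POSITION distinguishes the
-- same item connected multiple times to the same object.
CREATE TABLE object_item (
    id        INTEGER PRIMARY KEY,
    object_id INTEGER NOT NULL REFERENCES object(id),
    item_id   INTEGER NOT NULL REFERENCES item(id),
    position  INTEGER NOT NULL,
    UNIQUE (object_id, item_id, position)
);

CREATE TABLE image (id INTEGER PRIMARY KEY);

-- Images attach to a connection, not directly to an item; the
-- UNIQUE constraint ("U1") keeps names unique per connection.
CREATE TABLE object_item_image (
    object_item_id INTEGER NOT NULL REFERENCES object_item(id),
    image_id       INTEGER NOT NULL REFERENCES image(id),
    image_name     TEXT NOT NULL,
    PRIMARY KEY (object_item_id, image_id),
    UNIQUE (object_item_id, image_name)
);
""")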
Unless you actually mean that items can belong to multiple objects, using assigned_items is not helpful. If I understand you correctly, your main concern is that you sometimes have images that are for part of an item, so how do you describe the image?
Here is what I suggest:
OBJECT: id, name
ITEM: id, name, quantity, object_id
IMAGE: id, name (null), object_id (null), item_id (null)
If your DBMS supports CHECK constraints, add one on IMAGE to enforce that exactly one of object_id or item_id is set. This lets you define an image as being either for an item or for the object as a whole.
When you query for the name of an image, use the COALESCE function (or your DB's equivalent) to pick up the image's override name if it exists, or the object/item name if it doesn't.
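As a sketch, the lookup could read like this (SQL in a Python string; the joins assume the nullable object_id/item_id columns above):

# COALESCE returns the first non-NULL argument: the image's own
# (override) name if set, otherwise the item name, otherwise the
# object name.
IMAGE_NAME_QUERY = """
SELECT i.id,
       COALESCE(i.name, it.name, o.name) AS display_name
FROM IMAGE i
LEFT JOIN ITEM it ON it.id = i.item_id
LEFT JOIN OBJECT o ON o.id = i.object_id
"""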
I would like to store some information as follows (note, I'm not wedded to this data structure at all, but this shows you the underlying information I want to store):
{ user_id: 12345, page_id: 2, country: 'DE' }
In these records, user_id is a unique field, but the page_id is not.
I would like to translate this into a Redis data structure, and I would like to be able to run efficient searches as follows:
For user_id 12345, find the related country.
For page_id 2, find all related user_ids and their countries.
Is it actually possible to do this in Redis? If so, what data structures should I use, and how should I avoid the possibility of duplicating records when I insert them?
It sounds like you need two key types: a HASH key to store your user's data, and a LIST for each page that contains a list of related users. Below is an example of how this could work.
Load Data:
> RPUSH page:2:users 12345
> HMSET user:12345 country DE key2 value2
Pull Data:
# All users for page 2
> LRANGE page:2:users 0 -1
# All users for page 2 and their countries
> SORT page:2:users BY nosort GET # GET user:*->country GET user:*->key2
Remove User From Page:
> LREM page:2:users 0 12345
Repeat GETs in the SORT to retrieve additional values for the user.
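If you happen to be driving this from Python, here is a minimal sketch of the same commands with the redis-py client (key names follow the commands above; key2/value2 are placeholders from the example):

import redis

r = redis.Redis()

# Load data: append the user to the page's list, store the user's hash
r.rpush("page:2:users", 12345)
r.hset("user:12345", mapping={"country": "DE", "key2": "value2"})  # redis-py >= 3.5

# All users for page 2
user_ids = r.lrange("page:2:users", 0, -1)

# All users for page 2 and their countries (SORT ... BY nosort GET ...)
rows = r.sort("page:2:users", by="nosort", get=["#", "user:*->country", "user:*->key2"])

# Remove a user from the page
r.lrem("page:2:users", 0, 12345)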
I hope this helps. I also recommend reading the command list and documentation on the Redis web site, especially concerning the SORT command.
Since each user_id is unique and maps to a single country, keep them as simple key-value pairs; querying for a user is then O(1). Then keep a Redis set per page, with the page_id in the key and the user_ids as members.
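In redis-py terms (again a sketch, names illustrative), that alternative might look like:

import redis

r = redis.Redis()

# Simple key-value pair: O(1) country lookup per user
r.set("user:12345:country", "DE")

# One set per page; SADD is idempotent, so re-inserting the same
# user cannot create a duplicate (unlike RPUSH on a list)
r.sadd("page:2:users", 12345)

# All users for page 2, with their countries
user_ids = r.smembers("page:2:users")
countries = [r.get("user:%s:country" % uid.decode()) for uid in user_ids]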
I'm developing a web application using Google App Engine and Django, but I think my problem is more general.
Users have the ability to create tables; note that these "tables" are not represented as TABLEs in the database. Here is an example:
First form:
Name of the table: __________
First column name: __________
Second column name: _________
...
The number of columns is not fixed, but there is a maximum (100, for example). The type of every column is the same.
Second form (after choosing a particular table, the user can fill it in):
column_name1: _____________
column_name2: _____________
....
I'm using this solution, but it's wrong:
from google.appengine.ext import db

class Table(db.Model):
    name = db.StringProperty(required=True)

class Column(db.Model):
    name = db.StringProperty(required=True)
    number = db.IntegerProperty()
    table = db.ReferenceProperty(Table, collection_name="columns")

class Value(db.Model):
    time = db.TimeProperty()
    column = db.ReferenceProperty(Column, collection_name="values")
When I want to list a table, I take its columns, and from every column I take its values:
data = []
for column in table.columns:  # table is the Table entity being listed
    column_data = []
    for value in column.values:
        column_data.append(value.time)
    data.append(column_data)
data = zip(*data)
I think the problem is the order of the values, because it is not guaranteed that the order for one column is the same as for the others. I'm expecting this bug (though I've never actually seen it yet):
Table as I want:    As I might get it:
a z c               a e c
d e f               d h f
g h i               g z i
Better solutions? Maybe using ListProperty?
Here's a data model that might do the trick for you:
class Table(db.Model):
    name = db.StringProperty(required=True)
    owner = db.UserProperty()
    column_names = db.StringListProperty()

class Row(db.Model):
    values = db.ListProperty(yourtype)  # yourtype: the single data type your columns use
    table = db.ReferenceProperty(Table, collection_name='rows')
My reasoning:
You don't really need a separate entity to store column names. Since all columns are of the same data type, you only need to store the name, and the fact that they are stored in a list gives you an implicit order number.
By storing the values in a list in the Row entity, you can use an index into the column_names property to find the matching value in the values property (see the sketch after this list).
By storing all of the values for a row together in a single entity, there is no possibility of values appearing out of their correct order.
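For illustration, pairing a row's values with the table's column names is a one-liner (assuming the models above):

def row_as_dict(table, row):
    # List positions in column_names and values line up, so zipping
    # them reconstructs the row with its column labels.
    return dict(zip(table.column_names, row.values))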
Caveat emptor:
This model will not work well if columns can be added to a table after it has been populated with data. To make that possible, every time a column is added, every existing row belonging to that table would have to have a value appended to its values list. If it were possible to efficiently store dictionaries in the datastore, this would not be a problem, but lists can really only be appended to.
Alternatively, you could use Expando...
Another possibility is to define the Row model as an Expando, which allows you to dynamically create properties on an entity. You could set column values only for the columns that have values in them, and you could also add columns to the table after it has data in it without breaking anything:
class Row(db.Expando):
    table = db.ReferenceProperty(Table, collection_name='rows')

    @staticmethod
    def __name_for_column_index(index):
        return "column_%d" % index

    def __getitem__(self, key):
        # Allows one to get at the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   col1 = first_row[1]
        #   col12 = first_row[12]
        # Returns None if the given column is not defined for this Row.
        return getattr(self, Row.__name_for_column_index(key), None)

    def __setitem__(self, key, value):
        # Allows one to set the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   first_row[5] = "New value for column 5"
        setattr(self, Row.__name_for_column_index(key), value)
        # In order to allow efficient multiple-column changes,
        # the put() could go somewhere else.
        self.put()
Why not add an IntegerProperty rowNumber to Value, increment it every time you add a new row of values, and then reconstruct the table by sorting on rowNumber?
You're going to make life very hard for yourself unless your users' 'tables' are actually stored as real tables in a relational database. Find some way of actually creating tables and use the power of an RDBMS, or you're reinventing a very complex and sophisticated wheel.
This is the conceptual idea I would use:
I would create two classes for the datastore:
- table: this would serve as a dictionary, storing the structure of the pseudo-tables your app creates. It would have three fields: table_name, column_name, column_order, where column_order gives the position of the column within the table.
- data: this would store the actual data in the pseudo-tables. It would have four fields: row_id, table_name, column_name, column_data. row_id would be the same for data pertaining to the same row and would be unique across the various pseudo-tables.
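A sketch of reconstructing one pseudo-table from that layout, using plain Python structures in place of datastore entities (names illustrative):

# Illustrative contents of the two classes described above.
table_rows = [                     # (table_name, column_name, column_order)
    ("t1", "colA", 0),
    ("t1", "colB", 1),
]
data_rows = [                      # (row_id, table_name, column_name, column_data)
    (1, "t1", "colA", "a"), (1, "t1", "colB", "z"),
    (2, "t1", "colA", "d"), (2, "t1", "colB", "e"),
]

# Columns for table "t1", in column_order.
columns = [name for _, name in sorted(
    (order, name) for table, name, order in table_rows if table == "t1")]

# Group values by row_id, then lay each row out in column order.
rows = {}
for row_id, table, column, value in data_rows:
    if table == "t1":
        rows.setdefault(row_id, {})[column] = value
grid = [[cells.get(c) for c in columns] for _, cells in sorted(rows.items())]

print(columns)  # ['colA', 'colB']
print(grid)     # [['a', 'z'], ['d', 'e']]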
Put the data in a LongBlob.
The power of a database lies in being able to search and organise data so that you can get only the part you want, for performance and simplicity: you don't want the whole database, you just want a part of it, and you want it fast. But from what I understand, when you retrieve a user's data, you retrieve it all and display it. So you don't need to store the data in a normal "database" way.
What I would suggest is to simply format and store the whole of a single user's data in a single column with a suitable type (LongBlob, for example). The format would be an object holding a list of columns and rows, and you define that object in whatever language you use to communicate with the database.
The columns in your (real) database would be: User int, TableNo int, Table LongBlob.
If user 8 has 3 tables, you will have the following rows:
8, 1, objectcontainingtable1;
8, 2, objectcontainingtable2;
8, 3, objectcontainingtable3;
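As a sketch of the idea (the schema and object format here are assumptions; Python's pickle stands in for whatever serialization you choose):

import pickle
import sqlite3

# A user's pseudo-table serialized as one object.
table1 = {"columns": ["col1", "col2"], "rows": [["a", "b"], ["c", "d"]]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_tables (user INTEGER, table_no INTEGER, data BLOB)")
conn.execute("INSERT INTO user_tables VALUES (?, ?, ?)", (8, 1, pickle.dumps(table1)))

# Retrieving the user's table deserializes the whole thing at once.
blob = conn.execute(
    "SELECT data FROM user_tables WHERE user = 8 AND table_no = 1").fetchone()[0]
print(pickle.loads(blob))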