Merge multiple columns in bulkloader - google-app-engine

I'm using App Engine's bulkloader to import a CSV file into my datastore. I've got a number of columns that I want to merge into one. For example, they're all URLs, but not all of them are supplied and there is an order of precedence, e.g.:
url_main
url_temp
url_test
I want to say: "OK, if url_main exists, use that; otherwise use url_test, and failing that use url_temp."
Is it, therefore, possible to create a custom import transform that references columns and merges them into one based on conditions?

Ok, so after reading https://developers.google.com/appengine/docs/python/tools/uploadingdata#Configuring_the_Bulk_Loader I learnt about import_transform and that this can use custom functions.
With that in mind, this pointed me the right way:
... a two-argument function with the keyword argument bulkload_state, which on return contains useful information about the entity: bulkload_state.current_entity, which is the current entity being processed; bulkload_state.current_dictionary, the current export dictionary ...
So I created a function that takes two arguments: the first is the value of the column named by external_name, and the second is bulkload_state, which lets me fetch the current row, like so:
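def check_url(value, bulkload_state):
    # The whole CSV row, keyed by column name
    row = bulkload_state.current_dictionary
    # Columns in order of precedence
    fields = ['Final URL', 'URL', 'Temporary URL']
    for field in fields:
        # Skip columns that are missing or left empty in this row
        if row.get(field):
            return row[field]
    return None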
All this does is grab the current row (bulkload_state.current_dictionary) and return the first URL field, in order of precedence, that has a value; if none of them do, it returns None.
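To try the precedence logic locally without the SDK, here is a small sketch. FakeBulkloadState is a hypothetical stub that only models current_dictionary, and the sample rows assume the row dict looks like what csv.DictReader yields. The function tests whether each column has a non-empty value rather than whether the key exists, because such rows contain every header column, with '' for empty cells.

```python
import csv
import io


class FakeBulkloadState(object):
    """Hypothetical stand-in for bulkload_state; only current_dictionary is modelled."""

    def __init__(self, row):
        self.current_dictionary = row


def check_url(value, bulkload_state):
    """Return the first non-empty URL column, in order of precedence."""
    row = bulkload_state.current_dictionary
    for field in ['Final URL', 'URL', 'Temporary URL']:
        # Empty CSV cells arrive as '', so test the value, not just the key
        if row.get(field):
            return row[field]
    return None


# Three sample rows: URL set, only Temporary URL set, nothing set
data = (
    "Final URL,URL,Temporary URL\n"
    ",http://example.com/,http://tmp.example.com/\n"
    ",,http://tmp.example.com/\n"
    ",,\n"
)
rows = list(csv.DictReader(io.StringIO(data)))
# check_url ignores its first argument, so any column value will do here
results = [check_url(row['URL'], FakeBulkloadState(row)) for row in rows]
print(results)
```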
In my bulkloader.yaml I call this function simply by setting:
- property: business_url
  external_name: URL
  import_transform: bulkloader_helper.check_url
Note: the external_name doesn't matter as long as that column exists; I'm not actually using its value, since the transform reads multiple columns instead.
Simples!

Related

Query based on multiple filters in Firebase

I am working out the structure for a JSON database for an app like OnlyFans. Basically, someone can create a club, and inside that club there is one section where the creator's posts are shown and another where the club members' posts are shown. There is, however, a filter option where both can be seen.
In order to make option 1 below work, I need to be able to filter on isFromCreator == true and, at the same time, on timestamp. How can I do this?
Here are the two structures I have written down:
Option 1:
ClubContent
  CreatorID
    clubID
      postID: {isFromCreator: Bool}
OR
Option 2:
creatorPosts
  postID: {}
MemeberPosts
  postID: {}
Something like the following is what I want:
ref.child("Content").child("jhTFin5npXeOv2fdwHBrTxTdWIi2").child("1622325513718")
    .queryOrdered(byChild: "timestamp")
    .queryLimited(toLast: 10)
    .queryEqual(toValue: true, childKey: "isFromCreator")
I tried queryEqual, but it did not return any of the values I know exist with the configuration I specified.
Within rules, you can reference data at other locations by navigating the parent/child paths explicitly and comparing the val() of the node you reach.
For example:
".write": "data.parent().child('postID').child('isFromCreator').val()"
Just be aware that Security Rules do not filter or process the data in the request; they only allow or deny the requested operation.
You can read more about this from the relevant documentation:
https://firebase.google.com/docs/database/security/rules-conditions#referencing_data_in_other_paths
https://firebase.google.com/docs/database/security/core-syntax#rules-not-filters
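To make both points concrete, here is a toy Python model (not the Firebase SDK or its rules engine) of how a rule navigates with parent()/child()/val(), and of why rules are not filters: a read is checked only against rules at the requested path or above it, and the outcome is all or nothing. Snapshot, read and the sample data are hypothetical, and the model ignores $wildcards, .validate and write semantics.

```python
class Snapshot(object):
    """Hypothetical toy model of a rules snapshot: a path into a nested dict."""

    def __init__(self, root, path=()):
        self.root = root
        self.path = tuple(path)

    def child(self, key):
        return Snapshot(self.root, self.path + (key,))

    def parent(self):
        return Snapshot(self.root, self.path[:-1])

    def val(self):
        node = self.root
        for key in self.path:
            if not isinstance(node, dict) or key not in node:
                return None  # missing nodes read as null
            node = node[key]
        return node


def read(root, path, rules):
    """Toy read check. rules maps a concrete path tuple to a predicate.

    Only rules at the requested path or its ancestors are consulted, and the
    result is all-or-nothing: (True, data) or (False, None).
    """
    path = tuple(path)
    for depth in range(len(path) + 1):
        rule = rules.get(path[:depth])
        if rule is not None and rule(Snapshot(root, path[:depth])):
            return True, Snapshot(root, path).val()
    return False, None


# The rule from the answer, evaluated at a sibling of postID
tree = {"club1": {"postID": {"isFromCreator": True}, "comments": {}}}
data = Snapshot(tree, ("club1", "comments"))
allowed = bool(data.parent().child("postID").child("isFromCreator").val())


def creator_only(snap):
    return snap.child("isFromCreator").val() is True


# Rules on individual posts never filter a read of their parent
posts = {"posts": {"p1": {"isFromCreator": True}, "p2": {"isFromCreator": False}}}
rules = {("posts", "p1"): creator_only, ("posts", "p2"): creator_only}
print(allowed, read(posts, ("posts",), rules))
```

Reading ("posts", "p1") succeeds, but reading ("posts",) is denied outright instead of returning only the creator's posts, which is why the filtering itself has to happen in the query or on the client.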

Index a dictionary property in azure search

I have a DTO with a property of type Dictionary<string, string>. It's not annotated. When I upload my DTO and call indexClient.Documents.Index(batch), I get this error back from the service:
The request is invalid. Details: parameters : A node of type 'StartObject' was read from the JSON reader when trying to read the contents of the property 'Data'; however, a 'StartArray' node was expected.
The only way I've found to avoid it is by setting it to null. This is how I created my index:
var fields = FieldBuilder.BuildForType<DTO>();
client.Indexes.Create(new Index
{
    Name = indexName,
    Fields = fields
});
How can I index my dictionary?
Azure Cognitive Search doesn't support fields that behave like loosely-typed property bags like dictionaries. All fields in the index must have a well-defined EDM type.
If you don't know the set of possible fields at design time, you have a couple of options, but they come with big caveats:
1. In your application code, add new fields to the index definition as you discover them while indexing documents. Updating the index will add latency to your overall write path, so depending on how frequently new fields are added, this may or may not be practical.
2. Model your "dynamic" fields as a set of name/value collection fields, one for each desired data type. For example, if a new string field "color" is discovered with value "blue", the document you upload might look like this:
{
  "id": "123",
  "someOtherField": 3.5,
  "dynamicStringFields": [
    {
      "name": "color",
      "value": "blue"
    }
  ]
}
Approach #1 risks bumping into the limit on the maximum number of fields per index.
Approach #2 risks bumping into the limit on the maximum number of elements across all complex collections per document. It also complicates the query model, especially for cases where you might want correlated semantics in queries.
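Approach #2 boils down to flattening the dictionary before upload. Here is a minimal Python sketch that produces the document shape shown above. "dynamicStringFields" comes from the example; "dynamicDoubleFields" is a hypothetical second collection for numeric values, and both would have to be defined in the index as complex collections beforehand. The function name is also hypothetical.

```python
def to_dynamic_fields(doc_id, known, extra):
    """Flatten a loosely-typed dict into typed name/value collections.

    known: fields already defined in the index.
    extra: the dictionary property (e.g. the DTO's Data) to flatten.
    """
    doc = {"id": doc_id}
    doc.update(known)
    strings, doubles = [], []
    for name, value in sorted(extra.items()):
        if isinstance(value, str):
            strings.append({"name": name, "value": value})
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Hypothetical numeric collection, stored as doubles
            doubles.append({"name": name, "value": float(value)})
        else:
            raise TypeError("no collection for %r of type %s"
                            % (name, type(value).__name__))
    # Only emit the collections that actually have entries
    if strings:
        doc["dynamicStringFields"] = strings
    if doubles:
        doc["dynamicDoubleFields"] = doubles
    return doc


doc = to_dynamic_fields("123", {"someOtherField": 3.5}, {"color": "blue"})
print(doc)
```

Sorting the names keeps the uploaded document deterministic, which makes diffs and tests easier; it has no effect on how the index treats the collection.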

Cakephp 3 - How to integrate external sources in table?

I'm working on an application that has its own database and gets user information from another service (LDAP in this case, through an API package).
Say I have a table called Articles, with a column user_id. There is no Users table; instead, a user or set of users is retrieved through the external API:
$user = LDAPConnector::getUser($user_id);
$users = LDAPConnector::getUsers([1, 2, 5, 6]);
Of course I want retrieving data from inside a controller to be as simple as possible, ideally still with something like:
$articles = $this->Articles->find()->contain('Users');
foreach ($articles as $article) {
    echo $article->user->getFullname();
}
I'm not sure how to approach this.
Where should I place the code in the table object to allow integration with the external API?
And as a bonus question: How to minimise the number of LDAP queries when filling the Entities?
It seems to be a lot faster to first retrieve the relevant users with a single ->getUsers() call and assign them afterwards, even though iterating over the articles and calling ->getUser() for each one might be simpler.
The most simple solution would be to use a result formatter to fetch and inject the external data.
The more sophisticated solution would be a custom association and a custom association loader, but given how database-centric associations are, you'd probably also have to come up with a table and possibly a query implementation that handles your LDAP datasource. While it would be rather simple to move this into a custom association, containing the association will look up a matching table, cause the schema to be inspected, etc.
So I'll stick with providing an example for the first option. A result formatter would be pretty simple, something like this:
$this->Articles
    ->find()
    ->formatResults(function (\Cake\Collection\CollectionInterface $results) {
        $userIds = array_unique($results->extract('user_id')->toArray());
        $users = LDAPConnector::getUsers($userIds);
        $usersMap = collection($users)->indexBy('id')->toArray();
        return $results
            ->map(function ($article) use ($usersMap) {
                if (isset($usersMap[$article['user_id']])) {
                    $article['user'] = $usersMap[$article['user_id']];
                }
                return $article;
            });
    });
The example makes the assumption that the data returned from LDAPConnector::getUsers() is a collection of associative arrays, with an id key that matches the user id. You'd have to adapt this accordingly, depending on what exactly LDAPConnector::getUsers() returns.
That aside, the example should be fairly self-explanatory: first obtain a unique list of the user IDs found in the queried articles, then fetch the LDAP users for those IDs in a single call, then inject the users into the articles.
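The same batching idea (which also answers the bonus question about minimising LDAP queries) can be sketched in Python so it runs in isolation. get_users stands in for LDAPConnector::getUsers() and is assumed to take a list of IDs and return dicts with an "id" key; attach_users, fake_get_users and the sample data are hypothetical.

```python
def attach_users(articles, get_users):
    """Attach user records to articles using a single batched lookup."""
    # 1. Unique user IDs, keeping first-seen order
    user_ids = list(dict.fromkeys(article["user_id"] for article in articles))
    # 2. One call for all users instead of one call per article
    users_map = {user["id"]: user for user in get_users(user_ids)}
    # 3. Inject; articles whose user was not found are left without "user"
    result = []
    for article in articles:
        enriched = dict(article)
        if enriched["user_id"] in users_map:
            enriched["user"] = users_map[enriched["user_id"]]
        result.append(enriched)
    return result


calls = []


def fake_get_users(ids):
    """Hypothetical directory lookup that records how often it is called."""
    calls.append(list(ids))
    directory = {1: "Ada", 2: "Linus", 5: "Grace"}
    return [{"id": i, "name": directory[i]} for i in ids if i in directory]


articles = [
    {"id": 10, "user_id": 1},
    {"id": 11, "user_id": 2},
    {"id": 12, "user_id": 1},
    {"id": 13, "user_id": 7},
]
enriched = attach_users(articles, fake_get_users)
print(calls)
```

Four articles cost exactly one directory call here; the per-article alternative would make four, or three even with deduplication.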
If you wanted to have entities in your results, then create entities from the user data, for example like this:
$userData = $usersMap[$article['user_id']];
$article['user'] = new \App\Model\Entity\User($userData);
For better reusability, put the formatter in a custom finder. In your ArticlesTable class:
public function findWithUsers(\Cake\ORM\Query $query, array $options)
{
    return $query->formatResults(/* ... */);
}
Then you can just do $this->Articles->find('withUsers'), just as simple as containing.
See also
Cookbook > Database Access & ORM > Query Builder > Adding Calculated Fields
Cookbook > Database Access & ORM > Retrieving Data & Results Sets > Custom Finder Methods

CakePHP3: Check if model exists

I have a search engine that calls a CakePHP action and passes it the model the engine should search in, e.g. "Projects". The variable is called $data_type.
Right now I use this to check if the model exists:
// Check if Table really exists
if (!TableRegistry::get($data_type)) {
    // Send error response to view
    $response = [
        'success' => false,
        'error' => 'Data type does not exist'
    ];
    $this->set('response', $response);
    return;
}
I'm not sure this is the right, or the safest, way to check whether a model exists, because I don't know whether TableRegistry::get() is vulnerable to SQL injection behind the scenes.
I also found that passing an empty string to get() doesn't result in false. Is there a safe solution I can implement that will solve my problem?
TableRegistry::get() is not safe to use with user input
First things first. It's probably rather complicated to inject dangerous SQL via TableRegistry::get(), but not impossible, as the alias passed in the first argument will be used as the database table name in case an auto/generic table instance is created. However, the schema lookup will most likely fail before anything else, and the name will also be subject to inflection, specifically underscore and lowercase inflection, so an injection attempt like
Foo; DELETE * FROM Bar;
would end up as:
foo;d_e_l_e_t_e*f_r_o_m_bar;
This would break things as it's invalid SQL, but it won't cause further harm. The bottom line however is that TableRegistry::get() cannot be regarded as safe to use with user input!
The class of the returned instance indicates a table class' existence
TableRegistry::get() looks up and instantiates a possibly existing table class for the given alias, and if that fails, it will create a so-called auto/generic table, which is an instance of \Cake\ORM\Table rather than of a concrete subclass thereof.
So you could check the return value against \Cake\ORM\Table to figure out whether you've retrieved an instance of an actual existing table class:
$table = TableRegistry::get($data_type);
if (get_class($table) === \Cake\ORM\Table::class) {
    // not an existing table class
    // ...
}
Use a whitelist
That being said, unless you're working on some kind of administration tool that explicitly needs to be able to access all tables, the proper thing to do would be to use some sort of whitelisting, as letting users arbitrarily look up any table they want could be a security risk:
$whitelist = [
    'Projects',
    '...'
];
if (in_array($data_type, $whitelist, true) !== true) {
    // not in the whitelist, access prohibited
    // ...
}
Ideally you'd go even further and apply similar restrictions to the columns that can be looked up.
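The whitelist idea, extended to columns as suggested, can be sketched in Python as follows. ALLOWED, validate_search and the column names are hypothetical. The exact, case-sensitive membership test mirrors in_array(..., true), so an empty string or an injection attempt is rejected before any table lookup happens.

```python
# Hypothetical whitelist: model name -> columns a search may touch
ALLOWED = {
    "Projects": {"title", "description"},
}


def validate_search(data_type, columns):
    """Return an error message, or None if the request is allowed."""
    # Reject non-strings first so unhashable input never reaches the lookup
    if not isinstance(data_type, str) or data_type not in ALLOWED:
        return "Data type does not exist"
    disallowed = sorted(set(columns) - ALLOWED[data_type])
    if disallowed:
        return "Columns not searchable: " + ", ".join(disallowed)
    return None


print(validate_search("Projects", ["title"]))
print(validate_search("Foo; DELETE * FROM Bar;", []))
```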
You may want to check out https://github.com/FriendsOfCake/awesome-cakephp#search for some ready-made search plugins.

AppEngine Datastore get entities that have ALL items in list property

I want to implement some kind of tagging functionality in my app. I want to do something like...
from google.appengine.ext import db

class Item(db.Model):
    name = db.StringProperty()
    tags = db.ListProperty(str)
Suppose I get a search with two or more tags, e.g. "restaurant" and "mexican".
Now, I want to get Items that have ALL, in this case 2, given tags.
How do I do that? Or is there a better way to implement what I want?
I believe you want tags to be stored as 'db.ListProperty(db.Category)' and then query them with something like:
return db.Query(Item)\
.filter('tags = ', expected_tag1)\
.filter('tags = ', expected_tag2)\
.order('name')\
.fetch(256)
(Unfortunately I can't find any good documentation for the db.Category type, so I cannot definitively say this is the right way to go.) Also note that, in order to create a db.Category, you need to use:
new_item.tags.append(db.Category(unicode(new_tag_text)))
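The chained query works because an equality filter on a list property matches an entity when any element of the list equals the value, and each additional .filter('tags = ', ...) has to be satisfied on its own, so together they require every tag. Here is a plain-Python model of that matching rule; it does not touch the datastore, and query_all_tags and the sample items are hypothetical.

```python
def matches_all(item_tags, wanted):
    # One chained equality filter per wanted tag; each is satisfied when
    # any element of the list property equals that tag.
    return all(any(tag == w for tag in item_tags) for w in wanted)


def query_all_tags(items, wanted, limit=256):
    """Mimic .filter('tags = ', t) per tag, then .order('name').fetch(limit)."""
    hits = [item for item in items if matches_all(item["tags"], wanted)]
    return sorted(hits, key=lambda item: item["name"])[:limit]


items = [
    {"name": "Taqueria", "tags": ["restaurant", "mexican", "cheap"]},
    {"name": "Bistro", "tags": ["restaurant", "french"]},
    {"name": "Cantina", "tags": ["mexican", "restaurant"]},
    {"name": "Market", "tags": ["mexican"]},
]
names = [item["name"] for item in query_all_tags(items, ["restaurant", "mexican"])]
print(names)
```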
Use db.ListProperty(db.Key) instead, which stores a list of entity keys.
Models:
from google.appengine.ext import db

class Profile(db.Model):
    data_list = db.ListProperty(db.Key)

class Data(db.Model):
    name = db.StringProperty()
Views:
prof = Profile()
entities = Data.gql("")  # the Data entities you want to reference
for entity in entities:
    # ListProperty(db.Key) only accepts keys, so append each entity's key
    prof.data_list.append(entity.key())
Here data_list stores the keys of the Data entities. Data.get(prof.data_list) will then fetch all the Data entities whose keys are in data_list.
