App Engine no longer updating index.yaml - google-app-engine

The index.yaml file of my GAE app is no longer updated by the development server.
I have recently added a new kind to my app and a handler that queries this kind like so:
from google.appengine.ext import ndb
class MyKind(ndb.Model):
    thing = ndb.TextProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
and in the handler I have a query
query = MyKind.query()
query.order(-MyKind.timestamp)
logging.info(query.iter().index_list())
entities = query.fetch(100)
for entity in entities:
    pass  # do something
AFAIK, the development server should create an index for this query and update index.yaml accordingly. However, it doesn't. It just looks like this:
indexes:
# AUTOGENERATED
The logging.info(query.iter().index_list()) call should output the index used for the query, but it just prints 'None'. Also, the SDK console says 'Datastore contains no indexes.'
Running the query returns the entities unsorted. I have two questions:
1. Is there some syntax error in my code that causes the query results to be unsorted, or is it the missing index?
2. If it's the missing index, is there a way to manually force the dev server to update index.yaml? Other suggestions?
Thank you

Your call to order() returns a new query:
query = MyKind.query()
query = query.order(-MyKind.timestamp)
To clarify: query.order(-MyKind.timestamp) does not change the query; it returns a new one, so you need to use the query returned by that method. As written, the query.order(-MyKind.timestamp) line in your code does nothing.
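Putting it together, a minimal corrected handler snippet (using the model from the question):
query = MyKind.query().order(-MyKind.timestamp)  # keep the query returned by order()
entities = query.fetch(100)  # now sorted newest-first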

NDB Query hangs for one set of criteria

UPDATED:
I do not have an entry in index.yaml, expecting zig-zag to work fine on equalities (which it appears to do in the datastore UI), and here is the actual query I'm making:
UploadBlock.query(
    UploadBlock.time_slices == timeslice,  # timeslice is each value in the repeated property
    UploadBlock.company == block.company,
    UploadBlock.aggregator_serial == block.aggregator_serial,
    UploadBlock.instrument_serial == block.instrument_serial)\
    .fetch(10, keys_only=True)
I've cross-posted this because I didn't read the initial support page first, so this is also on the google forum at https://groups.google.com/forum/#!topic/google-appengine/-7n8Y_tzCgM
I have a datastore query I make as an integral part of my system. It's an equality filter on four properties. When I make this query for one set of criteria, the query just hangs. It does not hang for this set of criteria in the datastore viewer, so I can see that there are only two results to return. I can fetch both of these entities by id without problem, but the query always hangs, both in remote shell and in production. The query is running fine for a multitude of different criteria, it appears to just be this one set. I have attempted replacing both entities so that the criteria still matches, but they have different ids and I get the same result.
For some more specific information, the four properties are defined as follows:
company = ndb.KeyProperty(indexed=True)
aggregator_serial = ndb.StringProperty(indexed=True)
instrument_serial = ndb.StringProperty(indexed=True)
time_slices = ndb.IntegerProperty(indexed=True, repeated=True)
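For what it's worth, if zig-zag merge over the built-in single-property indexes turns out to be the bottleneck, the usual workaround is an explicit composite index for the query above; a sketch of the index.yaml entry (kind and property names from the model; note that indexing the repeated time_slices property this way multiplies index entries per entity):
indexes:
- kind: UploadBlock
  properties:
  - name: company
  - name: aggregator_serial
  - name: instrument_serial
  - name: time_slices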
I'm running these gcloud versions:
Google Cloud SDK 207.0.0
app-engine-java 1.9.64
app-engine-python 1.9.71
app-engine-python-extras 1.9.69
beta 2018.06.22
bq 2.0.34
core 2018.06.22
gcloud
gsutil 4.32
Any help on what I should try for next steps would be greatly appreciated.
Thanks in advance!

SQLAlchemy - cannot reflect a SQL Server DB running on Amazon RDS

My code is simple:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config.from_object('config')
db = SQLAlchemy(app)
db.metadata.reflect()
And it throws no errors. However, when I inspect the metadata after this reflection, it returns an empty immutabledict object.
The parameters in my connection string are 100% correct, and the code works with non-RDS databases.
It seems to happen to others as well but I can't find a solution.
Also, I have tried to limit the reflection to specific tables using the "only" parameter in the metadata.reflect function, and this is the error I get:
sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in mssql+pyodbc://{connection_string}: (users)
I've fixed it. The reflect() method of SQLAlchemy's MetaData has a parameter named 'schema'. Setting this parameter (to "dbo" in my case) solved it.
I am using Flask-SQLAlchemy, which does not expose the said parameter in its reflect() method. You can follow this post to gain access to that parameter and others, such as 'only'.
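A sketch of that workaround is to call the underlying SQLAlchemy MetaData directly instead of the Flask-SQLAlchemy wrapper (the 'users' table name here is taken from the error above):
db.metadata.reflect(bind=db.engine, schema='dbo', only=['users'])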
This error occurs when reflect is called without the schema name provided. For example, this will cause the error to happen:
metadata.reflect(only=[tableName])
It needs to be updated to use the schema of the table you are trying to reflect over like this:
metadata.reflect(schema=schemaName, only=[tableName])
You have to pass schema='dbo' as a parameter to reflect():
db.Model.metadata.reflect(bind=engine, schema='dbo', only=['User'])
and then create model of your table:
class User(db.Model):
    __table__ = db.Model.metadata.tables['dbo.User']
and to access data from that table:
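users = User.query.all()  # standard Flask-SQLAlchemy query against the reflected table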

Performance issue with django exclude

I have a Django 1.8 application, and I am using an MS SQL Server database with pyodbc as the db backend (via the "django-pyodbc-azure" module).
I have the following models:
class Branch(models.Model):
    name = models.CharField(max_length=30)
    startTime = models.DateTimeField()
class Device(models.Model):
    uid = models.CharField(max_length=100, primary_key=True)
    type = models.CharField(max_length=20)
    firstSeen = models.DateTimeField()
    lastSeen = models.DateTimeField()
class Session(models.Model):
    device = models.ForeignKey(Device)
    branch = models.ForeignKey(Branch)
    start = models.DateTimeField()
    end = models.DateTimeField(null=True, blank=True)
I need to query the session model, and I want to exclude some records with specific device values. So I issue the following query:
sessionCount = (Session.objects.filter(branch=branch)
    .exclude(device__in=badDevices)
    .filter(end__gte=F('start') + timedelta(minutes=30))
    .count())
badDevices is a pre-filled list of device ids with around 60 items.
badDevices = ['id-1', 'id-2', ...]
This query takes around 1.5 seconds to complete. If I remove the exclude from the query, it takes around 250 milliseconds.
I printed the generated SQL for this queryset and tried it in my database client. There, both versions executed in around 250 milliseconds.
This is the generated SQL:
SELECT [session].[id], [session].[device_id], [session].[branch_id], [session].[start], [session].[end]
FROM [session]
WHERE ([session].[branch_id] = my-branch-id AND
NOT ([session].[device_id] IN ('id-1', 'id-2', 'id-3',...)) AND
DATEPART(dw, [session].[start]) = 1
AND [session].[end] IS NOT NULL AND
[session].[end] >= ((DATEADD(second, 600, CAST([session].[start] AS datetime)))))
So, using the exclude at the database level doesn't seem to affect the query performance, but in django the query runs 6 times slower when I add the exclude part. What could be causing this?
The general issue seems to be that django is doing some extra work to prepare the exclude clause. After that step and by the time the SQL has been generated and sent to the database, there isn't anything interesting happening on the django side that could cause such a significant delay.
In your case, one thing that might be causing this is some kind of pre-processing of badDevices. If, for instance, badDevices is a QuerySet then django might be executing the badDevices query just to prepare the actual query's SQL. Possibly something similar might be happening in the case where device has a non-default primary key.
The other thing that might delay the SQL preparation is, of course, django-pyodbc-azure. Maybe it's doing something strange while compiling the query and that becomes a bottleneck.
This is all wild speculation though, so if you're still having this issue then post the Device and Branch models as well, the exact content of badDevices and the SQL generated from the queries. Then maybe some scenarios can be at least eliminated.
EDIT: I think it must be the Device.uid field. Possibly django or pyodbc is getting confused by the non-default primary key and is fetching all the devices while generating the query. Try two things:
Replace device__in with device_id__in, device__pk__in and device__uid__in and check each one again; maybe a more explicit query will be easier for django to translate into SQL (see the sketch after these suggestions). You can even try replacing branch with branch_id, just in case.
If the above doesn't work, try replacing the exclude expression with a raw SQL where clause:
# add quotes (because of the hyphens) & join
badDevicesIdString = ", ".join(["'%s'" % id for id in badDevices])
# Replaces .exclude()
... .extra(where=['device_id NOT IN (%s)' % badDevicesIdString])
If neither works, then most likely the problem is with the whole query and not just exclude. There are some more options in that case but try the above first and I will update my answer later if necessary.
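For instance, the first suggestion with the most explicit lookups might read (a sketch reusing the queryset from the question):
sessionCount = (Session.objects.filter(branch_id=branch.pk)
    .exclude(device_id__in=badDevices)
    .filter(end__gte=F('start') + timedelta(minutes=30))
    .count())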
Just want to share a similar problem I had with MySQL and exclude clause performance, and how it was fixed.
When running the exclude clause, the list for the "in" lookup was actually a QuerySet that I had obtained with the values_list method. Inspecting the exclude query executed by MySQL, the "in" operand was not a list of values but another query. This behavior was hurting performance on certain large queries.
To fix it, instead of passing the queryset, I flattened it into a Python list of values. That way each value is passed as an argument inside the in lookup, and performance improved considerably.
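A sketch of that fix (model and field names here are illustrative, not from my actual code):
# slow: the QuerySet is embedded as a subquery in the generated SQL
bad_ids = Device.objects.filter(type='bad').values_list('uid', flat=True)
sessions = Session.objects.exclude(device_id__in=bad_ids)
# fast: force evaluation so literal values are passed to the IN lookup
sessions = Session.objects.exclude(device_id__in=list(bad_ids))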

CFINDEX throwing attribute validation error exception

I am upgrading to ColdFusion 11 from ColdFusion 8, so I need to rebuild my search indices to work with Solr instead of Verity. I cannot find any reliable way to import my old Verity collections, so I'm attempting to build the new indices from scratch. I am using the following code to index some items along with their corresponding documents, which are located on the server:
<cfsetting requesttimeout="3600" />
<cfquery name="qDocuments" datasource="#APPLICATION.DataSource#">
    SELECT DISTINCT
        ID,
        Status,
        'C:\Documents\' CONCAT ID CONCAT '.PDF' AS File
    FROM tblDocuments
</cfquery>
<cfindex
    query="qDocuments"
    collection="solrdocuments"
    action="fullimport"
    type="file"
    key="document_file"
    custom1="ID"
    custom2="Status" />
A very similar setup was used with Verity for years without a problem.
When I run the above code, I get the following exception:
Attribute validation error for CFINDEX.
The value of the FULLIMPORT attribute is invalid.
Valid values are: UPDATE, DELETE, PURGE, REFRESH, FULL-IMPORT,
DELTA-IMPORT,STATUS, ABORT.
This makes absolutely no sense, since there is no "FULLIMPORT" attribute for CFINDEX.
I am running ColdFusion 11 Update 3 with Java 1.8.0_25 on Windows Server 2008R2/IIS7.5.
You should believe the error message. Try this:
<cfindex
    query="qDocuments"
    collection="solrdocuments"
    action="FULL-IMPORT"
    type="file"
    key="document_file"
    custom1="ID"
    custom2="Status" />
It's referring to the value of the attribute action.
This is definitely a bug. In the ColdFusion documentation, fullimport is not an attribute of cfindex.
I know this is an old thread, but in case anyone else has the same question: it's just poorly described in the documentation. The action "fullimport" is only available when using type="dih" (i.e. the Data Import Handler). When using the query attribute, use action="refresh" instead.
Source: CFIndex Documentation:
...
When type="dih", these actions are used:
abort: Aborts an ongoing indexing task.
deltaimport: For partial indexing. For instance, for any updates in the database, instead of a full import, you can perform a delta import to update your collection.
fullimport: To index the full database. For instance, when you index the database for the first time.
status: Provides the status of indexing, such as the total number of documents processed and status such as idle or running.
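Applied to the query-based example from the question, that advice would look something like this (note that with type="file" the key attribute should name the query column holding the file path, which the SELECT above aliases as File):
<cfindex
    query="qDocuments"
    collection="solrdocuments"
    action="refresh"
    type="file"
    key="File"
    custom1="ID"
    custom2="Status" />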

GQL Query Not Returning Results on StringProperty Query Test for Equality

class MyEntity(db.Model):
    timestamp = db.DateTimeProperty()
    title = db.StringProperty()
    number = db.FloatProperty()
db.GqlQuery("SELECT * FROM MyEntity WHERE title = 'mystring' AND timestamp >= date('2012-01-01') AND timestamp <= date('2012-12-31') ORDER BY timestamp DESC").fetch(1000)
This should fetch ~600 entities on App Engine. On my dev server it behaves as expected: it builds index.yaml, I upload it and test on the server, but on App Engine it does not return anything.
Index:
- kind: MyEntity
  properties:
  - name: title
  - name: timestamp
    direction: desc
I tried splitting the query up in the datastore viewer to see where the issue is, and the timestamp constraints work as expected. The query returns nothing on WHERE title = 'mystring' when it should be returning a bunch of entities.
I vaguely remember fussy filtering where you had to call .filter("prop =", propValue) with a space between the property and the operator, but this is a GqlQuery so it's not that (and I tried that format with the GQL too).
Anyone know what my issue is?
One thing I can think of:
I added the list of MyEntity entities into the app via BulkLoader.py prior to the new index being created on my devserver & uploaded. Would that make a difference?
The last line you wrote is probably the problem.
Your entities in the actual real datastore are missing the index required for the query.
As far as I know, when you add a new index, App Engine is supposed to rebuild your indexes for you. This may take some time. You can check your admin page to check the state of your indexes and see if it's still building.
Turns out there's a slight bug in the bulkloader supplied with the App Engine SDK: the autogenerated config transforms strings to db.Text, which is no good if you want those fields indexed. The correct import_transform directive should be:
transform.none_if_empty(str)
This will instruct App Engine to index the uploaded field as a db.StringProperty().
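In context, the relevant property stanza of the autogenerated bulkloader.yaml would change to something like this (a sketch showing just the title property from the model above):
- property: title
  external_name: title
  import_transform: transform.none_if_empty(str)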
