Selecting all related rows with BigQuery (reading logs from GAE)

My Google App Engine logs are being exported to BigQuery via the standard streaming export tool. I'd like to query "show me all log lines for requests in which any log line contains a string".
This query gives me the request ids I'm interested in:
SELECT protoPayload.requestId AS reqId
FROM TABLE_QUERY(logs, 'true')
WHERE protoPayload.line.logMessage contains 'INTERNAL_SERVICE_ERROR'
...and this lets me query for the related lines:
SELECT
metadata.timestamp AS Time,
protoPayload.host AS Host,
protoPayload.status AS Status,
protoPayload.resource AS Path,
protoPayload.line.logMessage
FROM
TABLE_QUERY(logs, 'true')
WHERE
protoPayload.requestId in ("requestid1", "requestid2", "etc")
ORDER BY time
However, I'm having trouble combining the two into a single query. BQ doesn't seem to allow subselects in the WHERE clause and I get confusing error messages when I try to do a traditional self-join with named tables. What's the secret?
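For comparison, the combination asked about here is a semi-join: keep every row whose request id appears in the set returned by the first query. The sketch below shows that pattern with Python's built-in sqlite3 module. It is not BigQuery syntax. It assumes a hypothetical flattened table `log_lines(request_id, ts, message)` with one row per log line in place of the nested `protoPayload.line` structure, and uses SQLite's `instr()` as a case-sensitive substring test.

```python
import sqlite3


def lines_for_matching_requests(conn, needle):
    """Return every log line of any request that has at least one line containing needle."""
    # The inner query yields the request ids of interest; the outer query
    # keeps all lines whose request id is in that set (a semi-join).
    sql = """
        SELECT request_id, ts, message
        FROM log_lines
        WHERE request_id IN (
            SELECT request_id FROM log_lines WHERE instr(message, ?) > 0
        )
        ORDER BY ts
    """
    return conn.execute(sql, (needle,)).fetchall()


# Hypothetical sample data: req1 has an error line, req2 does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_lines (request_id TEXT, ts INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO log_lines VALUES (?, ?, ?)",
    [
        ("req1", 1, "start"),
        ("req1", 2, "INTERNAL_SERVICE_ERROR in backend"),
        ("req2", 3, "start"),
        ("req2", 4, "ok"),
        ("req1", 5, "end"),
    ],
)

for row in lines_for_matching_requests(conn, "INTERNAL_SERVICE_ERROR"):
    print(row)
```

All three lines of req1 come back, including the ones that do not themselves contain the string, while req2 is dropped entirely.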

To select the records in which at least one logMessage contains the given string, you can use the OMIT RECORD IF construct:
SELECT
metadata.timestamp AS Time,
protoPayload.host AS Host,
protoPayload.status AS Status,
protoPayload.resource AS Path,
protoPayload.line.logMessage
FROM
TABLE_QUERY(logs, 'true')
OMIT RECORD IF
EVERY(NOT (protoPayload.line.logMessage contains 'INTERNAL_SERVICE_ERROR'))
ORDER BY time
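In legacy SQL, OMIT RECORD IF drops a whole record (here, one request together with its repeated line field) when the condition is true, and EVERY(NOT ...) is true only when none of the lines match. So a request survives if at least one of its lines contains the string, and all of its lines are returned. A plain-Python sketch of that logic, using hypothetical dict records rather than the real log schema:

```python
def keep_matching_requests(records, needle):
    """Mimic OMIT RECORD IF EVERY(NOT (line.logMessage CONTAINS needle))."""
    kept = []
    for record in records:
        messages = [line["logMessage"] for line in record["line"]]
        # EVERY(NOT ...) holds when no line contains the needle;
        # such records are omitted, all others are kept with every line intact.
        omit = all(needle not in message for message in messages)
        if not omit:
            kept.append(record)
    return kept


records = [
    {"requestId": "req1", "line": [{"logMessage": "start"},
                                   {"logMessage": "INTERNAL_SERVICE_ERROR"}]},
    {"requestId": "req2", "line": [{"logMessage": "start"},
                                   {"logMessage": "ok"}]},
]

print([r["requestId"] for r in keep_matching_requests(records, "INTERNAL_SERVICE_ERROR")])
```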

Related

SQL LIKE + wildcard operator only returns results with first value

What's happening, StackOverflow?
I'm using DB Browser for SQLite to query a database with two tables, people and states. people has the field state_code and states has the fields state_abbrev and state_name. people.state_code and states.state_abbrev both hold postal codes for US states (e.g. AK, AZ, MI, MN, etc.).
I'm trying to search for all records where the state names match a certain text string criteria, using LIKE and wildcards. Here's my code:
SELECT *
FROM people
WHERE state_code = (
SELECT state_abbrev FROM states WHERE state_name LIKE 'mi%');
I want the above code to return all records where the state name begins with "mi", i.e. Michigan, Minnesota, and Missouri. However, it only returns records for the alphabetically first matching state, Michigan. The same happens with LIKE '%ans%': it only returns records from Arkansas, even though there are records from Kansas.
What am I doing wrong, y'all? I've tried using GROUP BY state_name, state_abbrev within the nested SELECT to no avail, and I can't find anyone else running into the same issue.
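You can try using WHERE state_code IN ( ... ) instead of =. With =, the subquery is treated as a single scalar value, so SQLite quietly uses only the first row it returns (many other databases raise an error instead); IN compares state_code against every row returned by the inner query.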
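The `=` versus `IN` difference can be reproduced with Python's sqlite3 module. The schema follows the question's people/states tables, but the name column and all rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (state_abbrev TEXT, state_name TEXT);
    CREATE TABLE people (name TEXT, state_code TEXT);
    INSERT INTO states VALUES
        ('AR', 'Arkansas'), ('KS', 'Kansas'), ('MI', 'Michigan'),
        ('MN', 'Minnesota'), ('MO', 'Missouri');
    INSERT INTO people VALUES
        ('Ann', 'MI'), ('Bob', 'MN'), ('Cal', 'MO'), ('Dee', 'KS'), ('Eve', 'AR');
""")

subquery = "SELECT state_abbrev FROM states WHERE state_name LIKE ?"

# '=' makes the subquery a scalar: only its first row takes part in the comparison.
with_eq = conn.execute(
    f"SELECT name FROM people WHERE state_code = ({subquery})", ("mi%",)
).fetchall()

# 'IN' compares state_code against every row the subquery returns.
with_in = conn.execute(
    f"SELECT name FROM people WHERE state_code IN ({subquery}) ORDER BY name",
    ("mi%",),
).fetchall()

print(with_eq)
print(with_in)
```

The `=` version returns people from a single state, while the `IN` version returns people from all three "mi" states.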

Exported App Engine logs in BigQuery have missing fields

For App Engine (Python, Standard environment), I have created a log-export (v2) in the same project as the app. Destination of the sink is a dataset in Google BigQuery.
I can perform some simple queries in BigQuery:
SELECT
severity,
timestamp AS Time,
protoPayload.host AS Host,
protoPayload.status AS Status,
protoPayload.resource AS Path,
httpRequest.status,
httpRequest.requestMethod,
httpRequest.userAgent,
httpRequest.remoteIp
FROM
[MY_PROJECT:MYLOGS.appengine_googleapis_com_request_log_20170214]
LIMIT
10
httpRequest.status has values in the results (as do the other selected fields outside httpRequest), but the remaining httpRequest fields, e.g. requestMethod, userAgent and remoteIp, are shown as null.
On the Cloud Log web page I can see these log entries, and the values do exist there, but they do not seem to be exported to BigQuery.
When I filter on the request method being GET, e.g.:
SELECT
severity,
timestamp AS Time,
protoPayload.host AS Host,
protoPayload.status AS Status,
protoPayload.resource AS Path,
httpRequest.status,
httpRequest.requestMethod,
httpRequest.userAgent,
httpRequest.remoteIp
FROM
[MY_PROJECT:MYLOGS.appengine_googleapis_com_request_log_20170214]
WHERE
httpRequest.requestMethod = 'GET'
LIMIT
10
This query returns zero records.
Any idea why some fields are not shown in query results and cannot be used in filters in BigQuery?
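Given that requestMethod comes back null, zero rows is consistent with standard SQL semantics: a comparison with NULL yields NULL (unknown), and WHERE keeps only rows for which the condition is true. The check below uses Python's sqlite3 module and a hypothetical two-row table. It only illustrates that SQL behavior; it does not explain why the export leaves these fields empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_log (status INTEGER, requestMethod TEXT)")
# Simulate the symptom: status is populated, requestMethod is NULL.
conn.executemany("INSERT INTO request_log VALUES (?, ?)", [(200, None), (404, None)])

# NULL = 'GET' evaluates to NULL, so WHERE filters out every row.
matches = conn.execute(
    "SELECT COUNT(*) FROM request_log WHERE requestMethod = 'GET'"
).fetchone()[0]

# COUNT(column) skips NULLs, which makes unpopulated columns easy to spot.
total, populated = conn.execute(
    "SELECT COUNT(*), COUNT(requestMethod) FROM request_log"
).fetchone()

print(matches, total, populated)
```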

Solr deletions with custom full import

I'm trying to use the DataImportHandler to keep my index in sync with a SQL database (what I would think is a pretty vanilla thing to do). Since my database will be pretty large, I want to use incremental imports using this method http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport so the calls are of the form http://localhost:8983/solr/Items/dataimport?command=full-import&clean=false. This works perfectly well for adding items.
I have a separate DeletedItems table in my database which contains the primary keys of the items that have been deleted from the Items table, along with when they were deleted. As part of the DataImport call, I had hoped to be able to delete the relevant items from my index based on a query along the lines of
SELECT Id FROM DeletedItems WHERE WasDeletedOn > '${dataimporter.last_index_time}'
but I can't figure out how to do this. The link above alludes to it with the cryptic
In this case it means obviously that in case you also want to use deletedPkQuery then when running the delta-import command is still necessary.
but setting deletedPkQuery to the above SQL query doesn't seem to work. I then read that deletedPkQuery only works with delta-imports, so I am forced to make two requests to my solr server as part of the sync process? This doesn't seem right as the operations are parameterized by the dataimporter.last_index_time property, which changes. Both steps would need to be done in one "atomic" action, surely? Any ideas?
You must use the import handler special commands
https://wiki.apache.org/solr/DataImportHandler#Special_Commands
With these commands you can alter the boost of, or delete, a document coming from the recordset of the full-import query. Be aware that you must use the $skipDoc field to prevent the document from being indexed again, and that you must repeat the id in the $deleteDocById field.
You can use a UNION query:
select
id,
text,
'false' as [$deleteDocById],
'false' as [$skipDoc]
from [rows to update or add]
union select
id,
'' as text,
id as [$deleteDocById],
'true' as [$skipDoc]
from [rows to delete]
or a CASE WHEN expression:
select
id,
text,
CASE
when deleted = 1 then id
else 'false'
END as [$deleteDocById],
CASE
when deleted = 1 then 'true'
else 'false'
END as [$skipDoc]
from [rows to update, add or delete]
where updated > '${dih.last_index_time}'
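To see what the CASE WHEN query hands to the DataImportHandler, here is the same SELECT run in SQLite through Python's sqlite3 module. The items table, its deleted and updated columns, the sample rows and the fixed timestamp standing in for ${dih.last_index_time} are all hypothetical. Each returned row carries $deleteDocById and $skipDoc columns marking it either for deletion or for normal indexing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id TEXT, text TEXT, deleted INTEGER, updated TEXT);
    INSERT INTO items VALUES
        ('1', 'kept and changed', 0, '2024-01-02 10:00:00'),
        ('2', 'removed',          1, '2024-01-02 11:00:00'),
        ('3', 'unchanged',        0, '2023-12-31 09:00:00');
""")

# A fixed value stands in for ${dih.last_index_time} (hypothetical).
last_index_time = "2024-01-01 00:00:00"

rows = conn.execute("""
    SELECT id, text,
           CASE WHEN deleted = 1 THEN id ELSE 'false' END AS [$deleteDocById],
           CASE WHEN deleted = 1 THEN 'true' ELSE 'false' END AS [$skipDoc]
    FROM items
    WHERE updated > ?
    ORDER BY id
""", (last_index_time,)).fetchall()

for row in rows:
    print(row)
```

Row 3 is filtered out because it has not changed since the last import; row 2 comes back with its own id in $deleteDocById and $skipDoc set to 'true'.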
The deletedPkQuery is run as part of the regular delta-import call, so you don't have to run anything twice. (When doing a full-import with the default clean=true, there's no need for deletedPkQuery at all, since the whole collection is cleared before importing anyway.)
The deletedPkQuery should be configured on the same element as your main query. Also make sure the field names match exactly, and that the id produced by your deletedPkQuery matches the one produced by the main query.
There's a minimal example on solr.pl of importing and deleting documents using the same deleted-entries table structure you have here:
<entity
name="album"
query="SELECT * from albums"
deletedPkQuery="SELECT deleted_id as id FROM deletes WHERE deleted_at > '${dataimporter.last_index_time}'"
/>
Also make sure that the format of the deleted_at field is comparable with the value produced by last_index_time. The default format is yyyy-MM-dd HH:mm:ss.
And lastly, remember that the last_index_time property isn't available until the second time the task is run, since there's no "previous index time" the first time an index is populated (but the deletedPkQuery shouldn't run before that anyway).
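If deleted_at is stored as text, the comparison against last_index_time is a plain string comparison. Timestamps in the fixed-width yyyy-MM-dd HH:mm:ss layout still compare correctly as strings, because their fields run from most to least significant; a day-first layout does not. A quick Python illustration with made-up dates:

```python
from datetime import datetime

# strftime equivalent of the yyyy-MM-dd HH:mm:ss pattern.
ISO_LIKE = "%Y-%m-%d %H:%M:%S"
DAY_FIRST = "%d/%m/%Y %H:%M:%S"

earlier = datetime(2023, 12, 31, 23, 59, 0)
later = datetime(2024, 1, 1, 0, 0, 0)

# String comparison agrees with chronological order for the ISO-like layout...
iso_ok = earlier.strftime(ISO_LIKE) < later.strftime(ISO_LIKE)
# ...but not for a day-first layout: "31/12/2023" sorts after "01/01/2024".
day_first_ok = earlier.strftime(DAY_FIRST) < later.strftime(DAY_FIRST)

print(iso_ok, day_first_ok)
```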

Get all the lines from a request that match a criteria

I was trying to optimize a BigQuery query that I use to search through my AppEngine application logs (exported to BigQuery automatically through Google Cloud Logging) but I got an error that I don't understand.
SELECT
protoPayload.requestId,
protoPayload.line.logMessage
FROM (
SELECT
protoPayload.requestId AS matchingRequestId
FROM
TABLE_DATE_RANGE(MyProject_Logs.appengine_googleapis_com_request_log_, DATE_ADD(CURRENT_TIMESTAMP(), -1, 'HOUR'), CURRENT_TIMESTAMP())
WHERE
protoPayload.resource CONTAINS '/url'
AND protoPayload.line.logMessage CONTAINS 'criteria'
LIMIT 50)
WHERE
protoPayload.requestId = matchingRequestId
results in
Query Failed
Error: Field 'protoPayload.requestId' not found.
Job ID: myProject:job_DZpCc0u52LBFh8DFL0nDCsizo8o
This error does not make sense to me, because when I execute just the sub-query, which also uses the protoPayload.requestId field, it works fine.
Just as a side note, this SO answer better addresses what I am trying to achieve, but I am still curious what causes the error in my query.
This error makes sense to me for that particular example in the question:
Outside the subselect, protoPayload.requestId is no longer visible: the subselect only exposes it under the alias matchingRequestId (from protoPayload.requestId AS matchingRequestId).
Please note that after you fix the two outer references to protoPayload.requestId, the next error will be about protoPayload.line.logMessage.
It is not visible to the outer select either, because a) it is not part of the subselect's output and b) the outer select reads from the subselect, not from the table.
It looks like you oversimplified your example: even after the fixes above, it still doesn't make much sense to me, especially because the outer filter would become WHERE matchingRequestId = matchingRequestId.
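The scoping rule is not specific to BigQuery. The sketch below uses Python's sqlite3 module and a hypothetical flattened logs table: the outer query can only see the columns the subselect outputs, under their aliases, so referencing the original column name fails, while joining the matching ids back to the table works. SQLite's error text differs from BigQuery's, and the join is one possible fix rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (requestId TEXT, logMessage TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [("r1", "hit criteria"), ("r1", "other line"), ("r2", "nothing here")],
)

# Fails: the subselect exposes only matchingRequestId, not requestId.
try:
    conn.execute("""
        SELECT requestId FROM (
            SELECT requestId AS matchingRequestId FROM logs
            WHERE logMessage LIKE '%criteria%'
        )
    """).fetchall()
    outer_ref_error = None
except sqlite3.OperationalError as exc:
    outer_ref_error = str(exc)

# Works: join the matching ids back to the table to fetch all their lines.
rows = conn.execute("""
    SELECT l.requestId, l.logMessage
    FROM logs AS l
    JOIN (
        SELECT DISTINCT requestId AS matchingRequestId FROM logs
        WHERE logMessage LIKE '%criteria%'
    ) AS m ON l.requestId = m.matchingRequestId
    ORDER BY l.logMessage
""").fetchall()

print(outer_ref_error)
print(rows)
```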

SOQL Count with multiple Where clauses

I am trying to count all the results that match multiple WHERE conditions in Salesforce. All these WHERE conditions are on fields of the same object that I am selecting from. It seems like it should be a simple query, but my SQL and SOQL experience is limited.
Here's my code right now:
SELECT count() FROM Account
WHERE Success__c='yes'
AND Active__c='true'
AND Days__c>'30'
AND Days__c<'37'
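It'd be useful to see the actual error message, but at a guess, you have quotes around values that shouldn't have them: if Active__c is a checkbox and Days__c is a number field, SOQL expects their literals unquoted (true, 30). E.g. you want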
SELECT count() FROM Account
WHERE Success__c = 'yes'
AND Active__c = true
AND Days__c > 30
AND Days__c < 37
Also there are tools like SoqlX, the Force.com IDE and Workbench that'll let you run queries, so if Geckoboard is hiding the actual error message, you can work through getting a good query in one of these tools first.
