How to solve the Index warning on GAE? - google-app-engine

We have introduced a new model to our Datastore a few days ago. Surprisingly I still get Index warnings
W 2014-02-09 03:38:28.480
suspended generator run_to_queue(query.py:938) raised NeedIndexError(no matching index found.
The suggested index for this query is:
- kind: FeelTrackerRecord
  ancestor: yes
  properties:
  - name: timestamp)
W 2014-02-09 03:38:28.480
suspended generator helper(context.py:814) raised NeedIndexError(no matching index found.
The suggested index for this query is:
- kind: FeelTrackerRecord
  ancestor: yes
  properties:
  - name: timestamp)
even though the index is shown as serving under Datastore Indexes:
indexes:
# AUTOGENERATED
# This index.yaml is automatically updated whenever the dev_appserver
# detects that a new type of query is run. If you want to manage the
# index.yaml file manually, remove the above marker line (the line
# saying "# AUTOGENERATED"). If you want to manage some indexes
# manually, move them above the marker line. The index.yaml file is
# automatically uploaded to the admin console when you next deploy
# your application using appcfg.py.
- kind: FeelTrackerRecord
  ancestor: yes
  properties:
  - name: record_date
  - name: timestamp
What am I missing please?

I finally found the problem.
The best way to solve this is to make sure the local index.yaml is empty (delete all the indices). Then simply run your GAE app on localhost and exercise your app as you normally would.
HTTP access is straightforward from a browser, and if GET/POST over REST is required you can use curl from a terminal:
GET:
curl --user test@gmail.com:test123 http://localhost:8080/api/v1.0/records/1391944029
POST:
curl --user test@gmail.com:test123 http://localhost:8080/api/v1.0/records/1391944029 -d '{"records": [
    {
      "notes": "update",
      "record_date": "2014-02-02",
      "timestamp": 1391944929
    }
  ], "server_sync_timestamp": null}' -X POST -v -H "Accept: application/json" -H "Content-type: application/json"
GAE now updates index.yaml automatically and adds the correct indices to it.
After deploying your app, you need to clean up the old indices.
This is done through a terminal:
appcfg.py vacuum_indexes src
After logging in with your credentials, it will ask you about the old indices that are no longer in your index.yaml and whether they should be deleted. Press y and continue.

I mentioned in a comment that your indexes don't match the required one. The error says
raised NeedIndexError(no matching index found.
The suggested index for this query is:
- kind: FeelTrackerRecord
  ancestor: yes
  properties:
  - name: timestamp)
However, the index you list is different:
- kind: FeelTrackerRecord
  ancestor: yes
  properties:
  - name: record_date
  - name: timestamp
Do you see the difference?
Just add the index as listed and update your indexes.
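For reference, a sketch of what index.yaml would contain with the suggested index added alongside the existing one; the kind and property names are taken straight from the error message above:

indexes:
- kind: FeelTrackerRecord
  ancestor: yes
  properties:
  - name: timestamp
- kind: FeelTrackerRecord
  ancestor: yes
  properties:
  - name: record_date
  - name: timestamp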

Related

Connection to SQL Server database using sql_exporter and Prometheus but can't execute custom metrics on db

So, I've gone through many of the options to get this set up, and none worked with my SQL Server setup except using sql_exporter. There is a successful connection where I can read all the built-in metrics, but when I try my own query on a specific database and its table, there is always something wrong with my query, such as "Invalid Object" when trying to reach the database. I have attempted to use many resources, but what I would mostly like is a custom metric like the one in: https://sysdig.com/blog/monitor-sql-server-prometheus/.
sql_exporter.yml:
# The target to monitor and the collectors to execute on it.
target:
  # Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)
  # the schema gets dropped or replaced to match the driver expected DSN format.
  data_source_name: 'sqlserver://username:password@localhost:1433'

  # Collectors (referenced by name) to execute on the target.
  collectors: [mssql_standard]

# Collector files specifies a list of globs. One collector definition is read from each matching file.
collector_files:
  - "*.collector.yml"
prometheus.yml:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'sql_server'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9966']
When I tried the custom metric in the post I linked, sql_exporter crashes instantly with no errors. My database is found by the standard metrics of https://github.com/free/sql_exporter, but I am unsure of the syntax needed to execute a simple SELECT db_value FROM db_table. I understand there are ways out there and I've tried, so I will need assistance. Thank you in advance!
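For what it's worth, a sketch of a custom collector file, based on the collector file format shown in the sql_exporter README; the collector name, metric name, and the db_table/db_value identifiers are only placeholders taken from the question:

# my_metrics.collector.yml (picked up by the "*.collector.yml" glob above)
collector_name: my_custom_metrics

metrics:
  - metric_name: db_value
    type: gauge
    help: 'Value read from db_table.'
    values: [db_value]
    query: |
      SELECT db_value FROM db_table

The new collector name would also have to be listed under collectors in sql_exporter.yml (e.g. collectors: [mssql_standard, my_custom_metrics]). An "Invalid object name" error from SQL Server usually means the query is running against a different database than the one holding the table, so the database may need to be specified in the DSN or the table referenced by its fully qualified name.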

IOException while reading response

I get this error message when I try to index files inside one directory. The steps to reproduce this error are as follows:
$ bin/solr stop -all
$ bin/solr start -e cloud
How many solr nodes ...? 2
Please enter the port for node1 [8983] 8983
Please enter the port for node2 [7574] 7574
Please provide a name for your new collection [gettingstarted] test
How many shard would you like to split test to [2] 2
How many replicas per shard .... [2] 2
Please choose a configuration for test collection .... basic_configs
Then, if I go to localhost:8983/solr/#/ under the Core Admin tab, I see two shards of the new collection test which we have just created. I then want to index one of my folders and associate this index with the test collection. I do it like this:
bin/post -c test ~/Projects/
As a result, I see the files being indexed, but among all this output I see a lot of warnings like these:
SimplePostTool: WARNING: Response Solr returned an error #400 (Bad request) for url
http://localhost:8983/solr/test/update
....
SimplePostTool: WARNING: IOException while reading response: java.io.IOException;
Server returned HTTP response code: 400 for URL http://localhost/8983/solr/test/update
What am I doing wrong?

How can I debug problems with warehouse creation?

When trying to create a warehouse from the Cloudant dashboard, sometimes the process fails with an error dialog. Other times, the warehouse extraction stays in a "triggered" state even after hours.
How can I debug this? For example is there an API I can call to see what is going on?
Take a look at the document inside the _warehouser database and look for the warehouser_error_message element. For example:
"warehouser_error_message": "Exception occurred while creating table.
[SQL0670N The statement failed because the row size of the
resulting table would have exceeded the row size limit. Row size
limit: \"\". Table space name: \"\". Resulting row size: \"\".
com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-670,
SQLSTATE=54010, SQLERRMC=32677;;34593, DRIVER=4.18.60]"
The warehouser error message usually gives you enough information to debug the problem.
You can view the _warehouser document in the Cloudant dashboard or use the API, e.g.
export cl_username='<your_cloudant_account>'
curl -s -u $cl_username -p \
https://$cl_username.cloudant.com/_warehouser/_all_docs?include_docs=true \
| jq [.warehouse_error_code]

Run filter queries in Appengine datastore

I want to run the query
SELECT * FROM users WHERE uname='foo' AND passwd='bar'
This always returns None. My table has entries matching the query.
I feel it has something to do with indexing. I edited my index.yaml to
indexes:
- kind: users
  ancestor: no
  properties:
  - name: uname
    direction: asc
  - name: passwd
    direction: asc
But when I define uname = db.TextProperty(required=True, indexed=True), it returns a strange error saying
<class 'google.appengine.ext.db.ConfigurationError'>: indexed must be False.
args = ('indexed must be False.',)
message = 'indexed must be False.'
I call it strange because when I google the error, there are zero exact matches.
What am I missing?
Did you try to run the query in the Datastore viewer of the admin console? It suggests which indexes to create.
However, text properties are not indexable. According to the docs:
Unlike StringProperty, a TextProperty value can be more than 500 characters long. However, TextProperty values are not indexed and cannot be used in filters or sort orders.
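A minimal sketch of the fix under that constraint, using the db API from the question (the User model and its fields are illustrative, not from the original code): declare the fields you filter on as StringProperty, which is indexed, and keep TextProperty for long text that is never filtered or sorted.

from google.appengine.ext import db

class User(db.Model):
    uname = db.StringProperty(required=True)   # indexed, so it can appear in filters
    passwd = db.StringProperty(required=True)
    bio = db.TextProperty()                    # may exceed 500 chars, but is never indexed

query = User.all().filter('uname =', 'foo').filter('passwd =', 'bar')
user = query.get()   # returns the first matching entity, or None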

How can I tell how many objects I've stored in an S3 bucket?

Unless I'm missing something, it seems that none of the APIs I've looked at will tell you how many objects are in an <S3 bucket>/<folder>. Is there any way to get a count?
Using AWS CLI
aws s3 ls s3://mybucket/ --recursive | wc -l
or
aws cloudwatch get-metric-statistics \
--namespace AWS/S3 --metric-name NumberOfObjects \
--dimensions Name=BucketName,Value=BUCKETNAME \
Name=StorageType,Value=AllStorageTypes \
--start-time 2016-11-05T00:00 --end-time 2016-11-05T00:10 \
--period 60 --statistic Average
Note: The above cloudwatch command seems to work for some while not for others. Discussed here: https://forums.aws.amazon.com/thread.jspa?threadID=217050
Using AWS Web Console
You can look at CloudWatch's metrics section to get the approximate number of objects stored.
I have approximately 50 million products, and it took more than an hour to count them using aws s3 ls.
There is a --summarize switch that shows bucket summary information (i.e. number of objects, total size).
Here's the correct answer using AWS cli:
aws s3 ls s3://bucketName/path/ --recursive --summarize | grep "Total Objects:"
Total Objects: 194273
See the documentation
Although this is an old question, and feedback was provided in 2015, right now it's much simpler: the S3 Web Console has a "Get Size" option, which reports the total number of objects along with their total size.
There is an easy solution with the S3 API now (available in the AWS cli):
aws s3api list-objects --bucket BUCKETNAME --output json --query "[length(Contents[])]"
or for a specific folder:
aws s3api list-objects --bucket BUCKETNAME --prefix "folder/subfolder/" --output json --query "[length(Contents[])]"
If you use the s3cmd command-line tool, you can get a recursive listing of a particular bucket, outputting it to a text file.
s3cmd ls -r s3://logs.mybucket/subfolder/ > listing.txt
Then on Linux you can run wc -l on the file to count the lines (one line per object).
wc -l listing.txt
There is no way, unless you:
- list them all in batches of 1000 (which can be slow and suck bandwidth; Amazon seems to never compress the XML responses), or
- log into your account on S3, and go to Account - Usage. It seems the billing dept knows exactly how many objects you have stored!
Simply downloading the list of all your objects will actually take some time and cost some money if you have 50 million objects stored.
Also see this thread about StorageObjectCount - which is in the usage data.
An S3 API to get at least the basics, even if it was hours old, would be great.
You can use AWS CloudWatch metrics for S3 to see the exact count for each bucket.
2020/10/22
With AWS Console
Look at the Metrics tab on your bucket
or:
Look at AWS CloudWatch's metrics
With AWS CLI
Number of objects:
aws s3api list-objects --bucket <BUCKET_NAME> --prefix "<FOLDER_NAME>" | wc -l
or:
aws s3 ls s3://<BUCKET_NAME>/<FOLDER_NAME>/ --recursive --summarize --human-readable | grep "Total Objects"
or with s4cmd:
s4cmd ls -r s3://<BUCKET_NAME>/<FOLDER_NAME>/ | wc -l
Objects size:
aws s3api list-objects --bucket <BUCKET_NAME> --output json --query "[sum(Contents[].Size), length(Contents[])]" | awk 'NR!=2 {print $0;next} NR==2 {print $0/1024/1024/1024" GB"}'
or:
aws s3 ls s3://<BUCKET_NAME>/<FOLDER_NAME>/ --recursive --summarize --human-readable | grep "Total Size"
or with s4cmd:
s4cmd du s3://<BUCKET_NAME>
or with CloudWatch metrics:
aws cloudwatch get-metric-statistics --metric-name BucketSizeBytes --namespace AWS/S3 --start-time 2020-10-20T16:00:00Z --end-time 2020-10-22T17:00:00Z --period 3600 --statistics Average --unit Bytes --dimensions Name=BucketName,Value=<BUCKET_NAME> Name=StorageType,Value=StandardStorage --output json | grep "Average"
2021 Answer
This information is now surfaced in the AWS dashboard. Simply navigate to the bucket and click the Metrics tab.
If you are using AWS CLI on Windows, you can use Measure-Object from PowerShell to get the total count of files, just like wc -l on *nix.
PS C:\> aws s3 ls s3://mybucket/ --recursive | Measure-Object
Count : 25
Average :
Sum :
Maximum :
Minimum :
Property :
Hope it helps.
Go to AWS Billing, then reports, then AWS Usage reports.
Select Amazon Simple Storage Service, then Operation StandardStorage.
Then you can download a CSV file that includes a UsageType of StorageObjectCount that lists the item count for each bucket.
From the command line in AWS CLI, use ls plus --summarize. It will give you the list of all of your items and the total number of documents in a particular bucket. I have not tried this with buckets containing sub-buckets:
aws s3 ls "s3://MyBucket" --summarize
It may take a bit long (listing my 16+K documents took about 4 minutes), but it's faster than counting 1K at a time.
You can easily get the total count and the history if you go to the S3 console's "Management" tab and then click on "Metrics".
One of the simplest ways to count the number of objects in S3 is:
Step 1: Select the root folder.
Step 2: Click on Actions -> Delete (obviously, be careful - don't actually delete it).
Step 3: Wait for a few minutes and AWS will show you the number of objects and their total size.
As of November 18, 2020 there is now an easier way to get this information without taxing your API requests:
AWS S3 Storage Lens
The default, built-in, free dashboard allows you to see the count for all buckets, or individual buckets under the "Buckets" tab. There are many drop downs to filter and sort almost any reasonable metric you would look for.
In s3cmd, simply run the following command (on an Ubuntu system):
s3cmd ls -r s3://mybucket | wc -l
None of the APIs will give you a count because there really isn't any Amazon specific API to do that. You have to just run a list-contents and count the number of results that are returned.
You can just execute this CLI command to get the total file count in a bucket or a specific folder.
Scan whole bucket
aws s3api list-objects-v2 --bucket testbucket | grep "Key" | wc -l
aws s3api list-objects-v2 --bucket BUCKET_NAME | grep "Key" | wc -l
You can use this command to get the details:
aws s3api list-objects-v2 --bucket BUCKET_NAME
Scan a specific folder
aws s3api list-objects-v2 --bucket testbucket --prefix testfolder --start-after testfolder/ | grep "Key" | wc -l
aws s3api list-objects-v2 --bucket BUCKET_NAME --prefix FOLDER_NAME --start-after FOLDER_NAME/ | grep "Key" | wc -l
Select the bucket/folder -> Click on Actions -> Click on Calculate Total Size
The API will return the list in increments of 1000. Check the IsTruncated property to see if there are still more. If there are, you need to make another call and pass the last key that you got as the Marker property on the next call. You would then continue to loop like this until IsTruncated is false.
See this Amazon doc for more info: Iterating Through Multi-Page Results
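A minimal boto3 sketch of that loop (the bucket name is a placeholder), paging with Marker/IsTruncated exactly as described above:

import boto3

s3 = boto3.client("s3")
kwargs = {"Bucket": "my-bucket"}            # placeholder bucket name
count = 0
while True:
    resp = s3.list_objects(**kwargs)        # returns at most 1000 keys per call
    contents = resp.get("Contents", [])
    count += len(contents)
    if not resp.get("IsTruncated"):         # no more pages
        break
    kwargs["Marker"] = contents[-1]["Key"]  # continue after the last key we saw
print(count)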
Old thread, but still relevant as I was looking for the answer until I just figured this out. I wanted a file count using a GUI-based tool (i.e. no code). I happen to already use a tool called 3Hub for drag & drop transfers to and from S3. I wanted to know how many files I had in a particular bucket (I don't think billing breaks it down by buckets).
So, using 3Hub,
- list the contents of the bucket (looks basically like a finder or explorer window)
- go to the bottom of the list, click 'show all'
- select all (ctrl+a)
- choose copy URLs from right-click menu
- paste the list into a text file (I use TextWrangler for Mac)
- look at the line count
I had 20521 files in the bucket and did the file count in less than a minute.
I used the python script from scalablelogic.com (adding in the count logging). Worked great.
#!/usr/local/bin/python

import sys
from boto.s3.connection import S3Connection

s3bucket = S3Connection().get_bucket(sys.argv[1])
size = 0
totalCount = 0

for key in s3bucket.list():
    totalCount += 1
    size += key.size

print 'total size:'
print "%.3f GB" % (size*1.0/1024/1024/1024)
print 'total count:'
print totalCount
The issue @Mayank Jaiswal mentioned about using CloudWatch metrics should not actually be an issue. If you aren't getting results, your range just might not be wide enough. It's currently Nov 3, and I wasn't getting results no matter what I tried. I went to the S3 bucket and looked at the counts, and the last record for the "Total number of objects" count was Nov 1.
So here is what the CloudWatch solution looks like using the JavaScript aws-sdk:
import aws from 'aws-sdk';
import { startOfMonth } from 'date-fns';

const region = 'us-east-1';
const profile = 'default';
const credentials = new aws.SharedIniFileCredentials({ profile });
aws.config.update({ region, credentials });

export const main = async () => {
  const cw = new aws.CloudWatch();
  const bucket_name = 'MY_BUCKET_NAME';
  const end = new Date();
  const start = startOfMonth(end);

  const results = await cw
    .getMetricStatistics({
      // @ts-ignore
      Namespace: 'AWS/S3',
      MetricName: 'NumberOfObjects',
      Period: 3600 * 24,
      StartTime: start.toISOString(),
      EndTime: end.toISOString(),
      Statistics: ['Average'],
      Dimensions: [
        { Name: 'BucketName', Value: bucket_name },
        { Name: 'StorageType', Value: 'AllStorageTypes' },
      ],
      Unit: 'Count',
    })
    .promise();

  console.log({ results });
};

main()
  .then(() => console.log('Done.'))
  .catch((err) => console.error(err));
Notice two things:
- The start of the range is set to the beginning of the month.
- The period is set to a day. Any less and you might get an error saying that you have requested too many data points.
aws s3 ls s3://bucket-name/folder-prefix-if-any --recursive | wc -l
Here's the boto3 version of the python script embedded above.
import sys

import boto3

s3 = boto3.resource("s3")
s3bucket = s3.Bucket(sys.argv[1])
size = 0
totalCount = 0

for key in s3bucket.objects.all():
    totalCount += 1
    size += key.size

print("total size:")
print("%.3f GB" % (size * 1.0 / 1024 / 1024 / 1024))
print("total count:")
print(totalCount)
3Hub is discontinued. There's a better solution: you can use Transmit (Mac only); just connect to your bucket and choose Show Item Count from the View menu.
You can download and install S3 Browser from http://s3browser.com/. When you select a bucket, you can see the number of files in it in the center-right corner. However, the size it shows is incorrect in the current version.
You can potentially use Amazon S3 Inventory, which will give you a list of objects in a CSV file.
This can also be done with gsutil du (yes, a Google Cloud tool):
gsutil du s3://mybucket/ | wc -l
If you're looking for specific files, let's say .jpg images, you can do the following:
aws s3 ls s3://your_bucket | grep jpg | wc -l
