On app engine I have a large number of entities of a particular kind.
I want to run a function on each entity (e.g. edit the entity or copy it).
I would do this in a task queue, but a task is limited to 10 minutes of runtime and each function call is prone to many kinds of errors. What is the best way to do this?
Here's my solution, although I'm hoping someone out there has a better one. I also wonder whether it's prone to fork bombs, e.g. if a task runs twice it will set off two chains of iteration! I'm only using it to iterate over a few hundred thousand entities, although the operation on each entity is expensive.
First I create a task queue that runs each individual function call on an entity, one at a time:
queue:
- name: entity-iter
  rate: 100/s
  max_concurrent_requests: 1
  retry_parameters:
    task_retry_limit: 3
    task_age_limit: 30m
    min_backoff_seconds: 200
Then I have an iterate-entity method which, given the kind, calls your delay func on each entity, passing its key:
package sysadmin

import (
    "golang.org/x/net/context"

    "google.golang.org/appengine/datastore"
    "google.golang.org/appengine/delay"
    "google.golang.org/appengine/log"
    "google.golang.org/appengine/taskqueue"
)

// ForEachEntity returns a delay.Function that walks the keys of the given
// kind one at a time: for each key it enqueues f on the entity-iter queue,
// then calls itself again (as another task) with a cursor pointing at the
// next entity.
func ForEachEntity(kind string, f *delay.Function) *delay.Function {
    var callWithNextKey *delay.Function // func(c context.Context, depth int, cursorString string) error
    callWithNextKey = delay.Func("something", func(c context.Context, depth int, cursorString string) error {
        q := datastore.NewQuery(kind).KeysOnly()
        if cursorString != "" {
            if curs, err := datastore.DecodeCursor(cursorString); err != nil {
                log.Errorf(c, "error decoding cursor %v", err)
                return err
            } else {
                q = q.Start(curs)
            }
        }
        it := q.Run(c)
        if key, err := it.Next(nil); err != nil {
            if err == datastore.Done {
                log.Infof(c, "Done %v", err)
                return nil
            }
            log.Errorf(c, "datastore error %v", err)
            return err
        } else {
            curs, _ := it.Cursor()
            // Enqueue the actual per-entity work on the serial queue.
            if t, err := f.Task(key); err != nil {
                return err
            } else if _, err = taskqueue.Add(c, t, "entity-iter"); err != nil {
                log.Errorf(c, "error %v", err)
                return err
            }
            // Continue the chain (via the task queue) from the next cursor position.
            if depth-1 > 0 {
                if err := callWithNextKey.Call(c, depth-1, curs.String()); err != nil {
                    log.Errorf(c, "error2 %v", err)
                    return err
                }
            }
        }
        return nil
    })
    return callWithNextKey
}
example usage:
var DoCopyCourse = delay.Func("something2", CopyCourse)
var DoCopyCourses = ForEachEntity("Course", DoCopyCourse)

func CopyCourses(c context.Context) {
    //sharedmodels.MakeMockCourses(c)
    DoCopyCourses.Call(c, 9999999, "")
}
I have a requirement in my project where I have to perform a DB operation to get the total number of users of particular types. What I am doing is collecting all the filters in a slice and passing that slice to my DB function.
This is the code snippet from where I am calling the DB function:
{
    filters = []bson.D{
        {{Key: "Mykey", Value: myvalue}},
        {{Key: "Mykey", Value: myvalue}},
        {{Key: "Mykey", Value: myvalue}},
        {{Key: "Mykey", Value: myvalue}},
    }
    counts, err := dbmain.NoOfDocumentsInfo(MyDBName, myCollectionName, filters...)
}
Below is my called function
func NoOfDocumentsInfo(DB string, col string, filters ...bson.D) ([]int64, error) {
    if dbInstance == nil {
        if GetDBInstance() == nil {
            logger.Error("Not connecting to DB")
            err := errors.New("DB connection error")
            return nil, err
        }
    }
    logger.Debugf("%s %s", DB, col)
    coll := dbInstance.Database(DB).Collection(col)

    counts := make([]int64, len(filters))
    for i, filter := range filters {
        count, err := coll.CountDocuments(context.TODO(), filter)
        if err != nil {
            logger.Fatal(err)
            return nil, err
        }
        counts[i] = count
    }
    return counts, nil
}
As you can see, I am calling the "coll.CountDocuments" function multiple times. What I want is to write the code without calling "coll.CountDocuments" repeatedly, by aggregating all the filters into a single query.
I have tried to use an aggregation pipeline, but my "cur" and "result" are giving null output. If you run the code you will be able to see it.
func NoOfDocumentsInfo(DB string, col string, filters ...bson.D) ([]int64, error) {
    if dbInstance == nil {
        if GetDBInstance() == nil {
            logger.Error("Not connecting to DB")
            err := errors.New("DB connection error")
            return nil, err
        }
    }
    logger.Debugf("%s %s", DB, col)
    coll := dbInstance.Database(DB).Collection(col)

    pipeline := make([]bson.M, 0, len(filters)+2)
    pipeline = append(pipeline, bson.M{"$match": bson.M{"$or": filters}})
    pipeline = append(pipeline, bson.M{"$group": bson.M{"_id": nil, "count": bson.M{"$sum": 1}}})
    pipeline = append(pipeline, bson.M{"$group": bson.M{"_id": nil, "count": bson.M{"$first": "$count"}}})

    var result struct {
        Count int64 `bson:"count"`
    }
    cur, err := coll.Aggregate(context.TODO(), pipeline)
    if err != nil {
        logger.Fatal(err)
        return nil, err
    }
    logger.Debugf("cur: %+v", cur)
    err = cur.Decode(&result)
    logger.Debugf("result: %+v, err: %v", result, err)
    if err != nil {
        logger.Fatal(err)
        return nil, err
    }
    return []int64{result.Count}, nil
}
You would have to add a field for each filter in $group, and you can use $cond to conditionally increment the given counter. But this may very well end up not using indexes, and thus be even slower than the separate, original count queries. Also note that using $or may result in skipping indexes, and that inside $cond you may have to transform the filters (e.g. add $ to field names).
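For illustration only, here's a rough sketch of what such a $group/$cond pipeline could look like. The status field and its two values are made up for the example and are not taken from the question:
// statusCounts is a sketch, not a recommendation: it counts documents
// matching two hypothetical status values in a single aggregation, using
// $cond inside $group. Field and value names are illustrative only.
func statusCounts(ctx context.Context, coll *mongo.Collection) (active, pending int64, err error) {
    pipeline := mongo.Pipeline{
        {{Key: "$match", Value: bson.M{"$or": bson.A{
            bson.M{"status": "active"},
            bson.M{"status": "pending"},
        }}}},
        {{Key: "$group", Value: bson.M{
            "_id": nil,
            // Inside $cond the filter must be rewritten as an expression
            // on "$status", not as a query document.
            "active":  bson.M{"$sum": bson.M{"$cond": bson.A{bson.M{"$eq": bson.A{"$status", "active"}}, 1, 0}}},
            "pending": bson.M{"$sum": bson.M{"$cond": bson.A{bson.M{"$eq": bson.A{"$status", "pending"}}, 1, 0}}},
        }}},
    }
    cur, err := coll.Aggregate(ctx, pipeline)
    if err != nil {
        return 0, 0, err
    }
    defer cur.Close(ctx)
    var res struct {
        Active  int64 `bson:"active"`
        Pending int64 `bson:"pending"`
    }
    if cur.Next(ctx) {
        if err := cur.Decode(&res); err != nil {
            return 0, 0, err
        }
    }
    return res.Active, res.Pending, cur.Err()
}
Even so, this single aggregation still has to examine every document matched by the $or stage, which is why the concurrent approach below tends to be the better option.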
You'd be better off launching concurrent count queries (using the go statement) for each filter; if the filtered fields are indexed, they will complete fast. This is how it could look:
func docCounts(db string, col string, filters ...bson.D) ([]int64, error) {
    // ... obtain collection
    coll := dbInstance.Database(db).Collection(col)

    counts := make([]int64, len(filters))
    errs := make([]error, len(filters))

    wg := &sync.WaitGroup{}
    wg.Add(len(filters))
    for i := range filters {
        go func(i int) {
            defer wg.Done()
            counts[i], errs[i] = coll.CountDocuments(context.TODO(), filters[i])
        }(i)
    }
    wg.Wait()

    // Produce some kind of error if any of the queries failed.
    var err error
    for _, e := range errs {
        if e != nil {
            err = fmt.Errorf("at least one query failed: %w", e)
            break
        }
    }
    // Note: starting with Go 1.20, you could simply write:
    // err = errors.Join(errs...)

    return counts, err
}
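A hypothetical call site, reusing the placeholder names from the question (MyDBName, myCollectionName, myvalue; anotherValue is made up):
// Each element of counts corresponds to the filter at the same index.
filters := []bson.D{
    {{Key: "Mykey", Value: myvalue}},
    {{Key: "Mykey", Value: anotherValue}},
}
counts, err := docCounts(MyDBName, myCollectionName, filters...)
if err != nil {
    logger.Error(err)
}
logger.Debugf("counts: %v", counts)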
A much simpler approach would be the one that I'm going to share here. Let's start with the code:
package main

import (
    "context"
    "fmt"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

var (
    dbInstance *mongo.Client
    ctx        context.Context
    cancel     context.CancelFunc
)

func NoOfDocumentsInfo(client *mongo.Client, DB string, col string, filters bson.A) (int64, error) {
    coll := client.Database(DB).Collection(col)

    myFilters := bson.D{
        bson.E{
            Key:   "$and",
            Value: filters,
        },
    }

    counts, err := coll.CountDocuments(ctx, myFilters)
    if err != nil {
        panic(err)
    }
    return counts, nil
}

func main() {
    ctx, cancel = context.WithTimeout(context.Background(), 20*time.Second)
    defer cancel()

    // set MongoDB connection
    clientOptions := options.Client().ApplyURI("mongodb://root:root@localhost:27017")
    mongoClient, err := mongo.Connect(ctx, clientOptions)
    if err != nil {
        panic(err)
    }
    defer mongoClient.Disconnect(ctx)

    // query with filters
    numDocs, err := NoOfDocumentsInfo(mongoClient, "demodb", "myCollection", bson.A{
        bson.D{bson.E{Key: "Name", Value: bson.D{bson.E{Key: "$eq", Value: "John Doe"}}}},
        bson.D{bson.E{Key: "Song", Value: bson.D{bson.E{Key: "$eq", Value: "White Roses"}}}},
    })
    if err != nil {
        panic(err)
    }
    fmt.Println("num docs:", numDocs)
}
Let's see the relevant changes applied to the code:
The function expects a parameter called filters of type bson.A, which is the type for an array in the MongoDB environment.
It builds the myFilters variable, which is of type bson.D (a slice), with a single item (bson.E) in this way:
the Key is the logical operator ($and),
the Value is the array passed into the function.
The caller builds the array to pass to the function with all of the needed filters (e.g. two equality conditions: one on the Name key and the other on Song); the resulting filter document is sketched just below.
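To visualize the shape of the query, the filter built this way is equivalent to the following (written with bson.M purely for readability, mirroring the example values used in main):
// Equivalent shape of the filter passed to CountDocuments; bson.M is used
// here only to make the structure easier to read.
equivalent := bson.M{
    "$and": bson.A{
        bson.M{"Name": bson.M{"$eq": "John Doe"}},
        bson.M{"Song": bson.M{"$eq": "White Roses"}},
    },
}
_ = equivalent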
Finally, I also made some improvements to how you open the MongoDB connection and how you release the allocated resources.
Let me know if this solves your issue, thanks!
I have Go code like the below:
package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq"
)

const (
    insertLoginSQL = `insert into Logins(id, name, password) values($1, $2, $3)`
)

func main() {
    db, err := sql.Open("postgres", "user=postgres password=admin dbname=Quality sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := Insert(db); err != nil {
        log.Println("error with double insert", err)
    }
}

func Insert(db *sql.DB) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    stmt, err := tx.Prepare(insertLoginSQL)
    if err != nil {
        return err
    }
    defer stmt.Close()

    if _, err := stmt.Exec(10, "user", "pwd"); err != nil {
        tx.Rollback()
        return err
    }
    return tx.Commit()
}
When I run the above code, records are inserted twice into the database. Can someone let me know why duplicate records are inserted? Is there any issue with this code?
Probably the commit is done twice: the first time by one of the previous operations such as stmt.Exec, and the second time when tx.Commit() is executed.
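If you want to confirm where the second row comes from, a minimal check right after Insert returns could look like this (a sketch that assumes the Logins table and the id value from your snippet):
// Count how many rows exist for the id we just inserted. If this prints 2
// immediately after a single Insert call, the duplication happens inside
// this program; if it prints 1 here but 2 later, something else (another
// process or a retried run) is inserting the second row.
var n int
if err := db.QueryRow(`select count(*) from Logins where id = $1`, 10).Scan(&n); err != nil {
    log.Fatal(err)
}
log.Println("rows with id=10:", n)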
No issues building at the command line:
Darians-MacBook-Pro:gdriveweb darianhickman$ go build helloworld/hello.go
Darians-MacBook-Pro:gdriveweb darianhickman$
Error at localhost:8080/
The Go application could not be built.
(Executed command: /Users/darianhickman/go_appengine/goroot/bin/go-app-builder -app_base /Users/darianhickman/gowork/src/bitbucket.org/darian_hickman/gdriveweb/helloworld -arch 6 -dynamic -goroot /Users/darianhickman/go_appengine/goroot -nobuild_files ^^$ -unsafe -gopath /Users/darianhickman/gowork -binary_name _go_app -extra_imports appengine_internal/init -work_dir /var/folders/fk/wknp5jzn53gbgbml0yn695_m0000gn/T/tmpsHFP6tappengine-go-bin -gcflags -I,/Users/darianhickman/go_appengine/goroot/pkg/darwin_amd64_appengine -ldflags -L,/Users/darianhickman/go_appengine/goroot/pkg/darwin_amd64_appengine hello.go)
/Users/darianhickman/gowork/src/golang.org/x/net/context/ctxhttp/ctxhttp.go:35: req.Cancel undefined (type *http.Request has no field or method Cancel)
2016/05/24 19:39:17 go-app-builder: build timing: 6×6g (469ms total), 0×6l (0 total)
2016/05/24 19:39:17 go-app-builder: failed running 6g: exit status 1
When I research the error
*http.Request has no field or method Cancel
it leads to a bunch of inapplicable posts about updating to something newer than Go 1.5.
Source:
package hello

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "net/url"
    "os"
    "os/user"
    "path/filepath"

    "golang.org/x/net/context"
    "golang.org/x/oauth2"
    "golang.org/x/oauth2/google"
    "google.golang.org/api/drive/v3"
    _ "google.golang.org/appengine/urlfetch"
)

const (
    assetfolder = "0B-zdryEj60U_MXVkajFweXBQWHM"
)

var (
    dir *drive.FileList
)

func init() {
    http.HandleFunc("/", handler)

    ctx := context.Background()

    b, err := ioutil.ReadFile("client_secret.json")
    if err != nil {
        log.Fatalf("Unable to read client secret file: %v", err)
    }

    // If modifying these scopes, delete your previously saved credentials
    // at ~/.credentials/drive-go-quickstart.json
    config, err := google.ConfigFromJSON(b, drive.DriveMetadataReadonlyScope)
    if err != nil {
        log.Fatalf("Unable to parse client secret file to config: %v", err)
    }
    client := getClient(ctx, config)

    srv, err := drive.New(client)
    if err != nil {
        log.Fatalf("Unable to retrieve drive Client %v", err)
    }

    dir, err = srv.Files.List().PageSize(10).
        Fields("nextPageToken, files(id, name)").Do()
    if err != nil {
        log.Fatalf("Unable to retrieve files: %v", err)
    }
}

func handler(w http.ResponseWriter, r *http.Request) {
    //fmt.Fprint(w, r.RequestURI)
    fmt.Fprint(w, "Files:")
    if len(dir.Files) > 0 {
        for _, i := range dir.Files {
            fmt.Fprintf(w, "%s (%s)\n", i.Name, i.Id)
        }
    } else {
        fmt.Fprint(w, "No files found.")
    }
}

// getClient uses a Context and Config to retrieve a Token
// then generate a Client. It returns the generated Client.
func getClient(ctx context.Context, config *oauth2.Config) *http.Client {
    cacheFile, err := tokenCacheFile()
    if err != nil {
        log.Fatalf("Unable to get path to cached credential file. %v", err)
    }
    tok, err := tokenFromFile(cacheFile)
    if err != nil {
        tok = getTokenFromWeb(config)
        saveToken(cacheFile, tok)
    }
    return config.Client(ctx, tok)
}

// getTokenFromWeb uses Config to request a Token.
// It returns the retrieved Token.
func getTokenFromWeb(config *oauth2.Config) *oauth2.Token {
    authURL := config.AuthCodeURL("state-token", oauth2.AccessTypeOffline)
    fmt.Printf("Go to the following link in your browser then type the "+
        "authorization code: \n%v\n", authURL)

    var code string
    if _, err := fmt.Scan(&code); err != nil {
        log.Fatalf("Unable to read authorization code %v", err)
    }

    tok, err := config.Exchange(oauth2.NoContext, code)
    if err != nil {
        log.Fatalf("Unable to retrieve token from web %v", err)
    }
    return tok
}

// tokenCacheFile generates credential file path/filename.
// It returns the generated credential path/filename.
func tokenCacheFile() (string, error) {
    usr, err := user.Current()
    if err != nil {
        return "", err
    }
    tokenCacheDir := filepath.Join(usr.HomeDir, ".credentials")
    os.MkdirAll(tokenCacheDir, 0700)
    return filepath.Join(tokenCacheDir,
        url.QueryEscape("drive-go-quickstart.json")), err
}

// tokenFromFile retrieves a Token from a given file path.
// It returns the retrieved Token and any read error encountered.
func tokenFromFile(file string) (*oauth2.Token, error) {
    f, err := os.Open(file)
    if err != nil {
        return nil, err
    }
    t := &oauth2.Token{}
    err = json.NewDecoder(f).Decode(t)
    defer f.Close()
    return t, err
}

// saveToken uses a file path to create a file and store the
// token in it.
func saveToken(file string, token *oauth2.Token) {
    fmt.Printf("Saving credential file to: %s\n", file)
    f, err := os.Create(file)
    if err != nil {
        log.Fatalf("Unable to cache oauth token: %v", err)
    }
    defer f.Close()
    json.NewEncoder(f).Encode(token)
}

func main() {
    ctx := context.Background()

    b, err := ioutil.ReadFile("client_secret.json")
    if err != nil {
        log.Fatalf("Unable to read client secret file: %v", err)
    }

    // If modifying these scopes, delete your previously saved credentials
    // at ~/.credentials/drive-go-quickstart.json
    config, err := google.ConfigFromJSON(b, drive.DriveMetadataReadonlyScope)
    if err != nil {
        log.Fatalf("Unable to parse client secret file to config: %v", err)
    }
    client := getClient(ctx, config)

    srv, err := drive.New(client)
    if err != nil {
        log.Fatalf("Unable to retrieve drive Client %v", err)
    }

    r, err := srv.Files.List().PageSize(10).
        Fields("nextPageToken, files(id, name)").Do()
    if err != nil {
        log.Fatalf("Unable to retrieve files: %v", err)
    }

    fmt.Println("Files:")
    if len(r.Files) > 0 {
        for _, i := range r.Files {
            fmt.Printf("%s (%s)\n", i.Name, i.Id)
        }
    } else {
        fmt.Print("No files found.")
    }
}
I got past this issue by redownloading and reinstalling the Go App Engine SDK. My best guess as to why that worked is that an old version of Go was somehow being included.
For some reason nothing gets saved when the test code below is run. I have other API methods that do save when run normally (not through tests; this is just the first test).
When I check the database stats via localhost:8000, it can be seen that nothing is being inserted.
Update: After copying and pasting the code below and wrapping it in a GET request handler with some hardcoded data, it does save to the database. So this seems like an issue with the testing aetest.Context that is used. I have added the code for the NewTestHandler helper below.
Method to create the context within the tests
func NewTestHandler(handlerFunc func(appengine.Context, http.ResponseWriter, *http.Request)) http.HandlerFunc {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        c, _ := aetest.NewContext(nil)
        handlerFunc(c, w, r)
    })
}
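For reference, here is a variant of the helper that at least surfaces the error from aetest.NewContext and closes the context when the request finishes. This is only a sketch, assuming the classic appengine/aetest API where the returned context has a Close method:
// Sketch of the same helper, but checking the aetest error and closing the
// dev-server-backed context once the handler returns, so a broken test
// context cannot fail silently.
func NewTestHandler(handlerFunc func(appengine.Context, http.ResponseWriter, *http.Request)) http.HandlerFunc {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        c, err := aetest.NewContext(nil)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer c.Close() // shuts down the test instance backing this context
        handlerFunc(c, w, r)
    })
}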
Error (update: the key that is generated returns 0 when calling .IntID())
// happens in the .Get() error handling
--- err datastore: internal error: server returned the wrong number of entities
Model
package app

import "time"

type League struct {
    Name      string `json:"name"`
    Location  string `json:"location"`
    CreatedAt time.Time
}
Code
func (api *LeagueApi) Create(c appengine.Context, w http.ResponseWriter, r *http.Request) {
    // data
    var league League
    json.NewDecoder(r.Body).Decode(&league)
    defer r.Body.Close()

    // save to db
    key := datastore.NewIncompleteKey(c, "leagues", nil)
    if _, err := datastore.Put(c, key, &league); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    var leagueCheck League
    if err := datastore.Get(c, key, &leagueCheck); err != nil {
        log.Println("--- err", err)
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    // json response
    if err := json.NewEncoder(w).Encode(league); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
}
Test
func Test_LeagueReturnedOnCreate(t *testing.T) {
    league := League{Name: "foobar"}
    data, _ := json.Marshal(league)
    reader := bytes.NewReader(data)

    // setup request and writer
    r, _ := http.NewRequest("POST", "/leagues", reader)
    w := httptest.NewRecorder()

    // make request
    api := LeagueApi{}
    handler := tux.NewTestHandler(api.Create)
    handler.ServeHTTP(w, r)

    // extract api response
    var leagueCheck League
    json.NewDecoder(w.Body).Decode(&leagueCheck)
    if leagueCheck.Name != "foobar" {
        t.Error("should return the league")
    }

    // ensure the league is in the db
}