In the following code from our codebase, a MongoDB client is created:
import (
    "context"
    "time"

    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
    "go.mongodb.org/mongo-driver/mongo/readpref"
)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
In our scenario:
Goroutine 1 uses collection1 for read & write operations:
collection1 := client.Database("testing").Collection("collectionone")
Goroutine 2 uses both collection1 & collection2 for read & write operations:
collection2 := client.Database("testing").Collection("collectiontwo")
Is the client safe for concurrent use by multiple goroutines?
The documentation of mongo.Database explicitly states that:
Database is a handle to a MongoDB database. It is safe for concurrent use by multiple goroutines.
And mongo.Client that:
Client is a handle representing a pool of connections to a MongoDB deployment. It is safe for concurrent use by multiple goroutines.
And mongo.Collection:
Collection is a handle to a MongoDB collection. It is safe for concurrent use by multiple goroutines.
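So yes: a single client, and the database and collection handles derived from it, can be shared by both goroutines. Below is a minimal sketch under that reading, with placeholder inserts and finds standing in for your actual read & write operations:
package main

import (
    "context"
    "log"
    "sync"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // One client (and its connection pool) for the whole process.
    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(context.Background())

    db := client.Database("testing")
    collection1 := db.Collection("collectionone")
    collection2 := db.Collection("collectiontwo")

    var wg sync.WaitGroup
    wg.Add(2)

    // Goroutine 1: reads and writes collection1.
    go func() {
        defer wg.Done()
        if _, err := collection1.InsertOne(context.Background(), bson.M{"from": "goroutine1"}); err != nil {
            log.Println(err)
        }
    }()

    // Goroutine 2: reads collection1 and writes collection2 through the same client.
    go func() {
        defer wg.Done()
        var doc bson.M
        if err := collection1.FindOne(context.Background(), bson.M{}).Decode(&doc); err != nil && err != mongo.ErrNoDocuments {
            log.Println(err)
        }
        if _, err := collection2.InsertOne(context.Background(), bson.M{"from": "goroutine2"}); err != nil {
            log.Println(err)
        }
    }()

    wg.Wait()
}
The client maintains an internal connection pool, so there is no need to create one client per goroutine.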
See related: goroutine create multiple mongodb connection
Related
My Google App Engine application (Python 3, standard environment) serves requests from users: if the wanted record does not exist in the database, it creates it.
Here is the problem with database overwriting:
When a user (via the browser) sends a request, the running GAE instance may temporarily fail to respond, and GAE then spins up a new process to handle the same request. The result is that two instances respond to the same request. Both instances query the database at almost the same time, each finds that the wanted record does not exist, and each creates a new record. This ends up as two duplicate records.
Another scenario is that, for whatever reason, the user's browser sends the request twice within less than 0.01 second; the two requests are processed by two instances on the server side, and duplicate records are created again.
I am wondering how one instance can temporarily lock the database to prevent another instance from overwriting it.
I have considered the following schemes, but I have no idea whether they are efficient:
For Python 2, Google App Engine provides "memcache", which can be used to mark the status of a query for the purpose of database locking. For Python 3, however, it seems one has to set up a Redis server to rapidly exchange database status among different instances. How efficient would database locking via Redis be?
The session module of Flask. The session module can be used to share data (in most cases, the login status of users) among different requests and thus among different instances. I am wondering how fast data can be exchanged between instances this way.
Appended information (1)
I followed the advice to use a transaction, but it did not work.
Below is the code I used to verify the transaction.
The reason for the failure may be that the transaction only works for the CURRENT client. For multiple requests arriving at the same time, the GAE server side creates different processes or instances to respond to them, and each process or instance has its own independent client.
@staticmethod
def get_test(test_key_id, unique_user_id, course_key_id, make_new=False):
    client = ndb.Client()
    with client.context():
        from google.cloud import datastore
        from datetime import datetime
        client2 = datastore.Client()
        print("transaction started at: ", datetime.utcnow())
        with client2.transaction():
            print("query started at: ", datetime.utcnow())
            my_test = MyTest.query(MyTest.test_key_id == test_key_id, MyTest.unique_user_id == unique_user_id).get()
            import time
            time.sleep(5)
            if make_new and not my_test:
                print("data to create started at: ", datetime.utcnow())
                my_test = MyTest(test_key_id=test_key_id, unique_user_id=unique_user_id, course_key_id=course_key_id, status="")
                my_test.put()
                print("data created at: ", datetime.utcnow())
        print("transaction ended at: ", datetime.utcnow())
        return my_test
Appended information (2)
Here is new information about the usage of memcache (Python 3).
I tried the following code to lock the database using memcache, but it still failed to prevent overwriting.
@user_student.route("/run_test/<test_key_id>/<user_key_id>/")
def run_test(test_key_id, user_key_id=0):
    from google.appengine.api import memcache
    import time
    cache_key_id = test_key_id + "_" + user_key_id
    print("cache_key_id", cache_key_id)
    counter = 0
    client = memcache.Client()
    while True:  # Retry loop
        result = client.gets(cache_key_id)
        if result is None or result == "":
            client.cas(cache_key_id, "LOCKED")
            print("memcache added new value: counter = ", counter)
            break
        time.sleep(0.01)
        counter += 1
        if counter > 500:
            print("failed after 500 tries.")
            break
    my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id, current_user.course_key_id, make_new=True)
    client.cas(cache_key_id, "")
    memcache.delete(cache_key_id)
If the problem is duplication rather than overwriting, maybe you should specify the data id when creating new entries instead of letting GAE generate a random one for you. Then the application will write to the same entry twice instead of creating two entries. The data id can be anything unique, such as a session id, a timestamp, etc.
The problem with a transaction is that it prevents you from modifying the same entry in parallel, but it does not stop you from creating two new entries in parallel.
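For illustration, here is a minimal sketch of the named-key idea using the Go appengine/datastore API (the asker's code is Python/ndb, so the MyTest kind, its fields, and the key format here are assumptions, not the actual application code):
package mytest

import (
    "context"
    "fmt"

    "google.golang.org/appengine/datastore"
)

// MyTest is an assumed, simplified version of the asker's model.
type MyTest struct {
    TestKeyID    int64
    UniqueUserID string
    Status       string
}

// putTest stores the record under a deterministic key built from the fields
// that identify it, so two concurrent requests write the same entity instead
// of creating two entities with datastore-assigned IDs.
func putTest(c context.Context, testKeyID int64, uniqueUserID string) error {
    keyName := fmt.Sprintf("%d_%s", testKeyID, uniqueUserID)
    k := datastore.NewKey(c, "MyTest", keyName, 0, nil)
    _, err := datastore.Put(c, k, &MyTest{TestKeyID: testKeyID, UniqueUserID: uniqueUserID})
    return err
}
Because both concurrent requests compute the same key name, the second Put overwrites the first entity instead of adding a duplicate.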
I used memcache in the following way (using get/set) and succeeded in locking the database writes.
It seems that gets/cas does not work well. In one test, I set the value with cas(), but gets() then failed to read the value back later.
Memcache API: https://cloud.google.com/appengine/docs/standard/python3/reference/services/bundled/google/appengine/api/memcache
@user_student.route("/run_test/<test_key_id>/<user_key_id>/")
def run_test(test_key_id, user_key_id=0):
    from google.appengine.api import memcache
    import time
    cache_key_id = test_key_id + "_" + user_key_id
    print("cache_key_id", cache_key_id)
    counter = 0
    client = memcache.Client()
    while True:  # Retry loop
        result = client.get(cache_key_id)
        if result is None or result == "":
            client.set(cache_key_id, "LOCKED")
            print("memcache added new value: counter = ", counter)
            break
        time.sleep(0.01)
        counter += 1
        if counter > 500:
            return "failed after 500 tries of memcache checking."
    my_test = MyTest.get_test(int(test_key_id), current_user.unique_user_id, current_user.course_key_id, make_new=True)
    client.delete(cache_key_id)
...
Transactions:
https://developers.google.com/appengine/docs/python/datastore/transactions
When two or more transactions simultaneously attempt to modify entities in one or more common entity groups, only the first transaction to commit its changes can succeed; all the others will fail on commit.
You should be updating your values inside a transaction. App Engine's transactions will prevent two updates from overwriting each other as long as your read and write are within a single transaction. Be sure to pay attention to the discussion about entity groups.
You have two options:
1. Implement your own logic for transaction failures (how many times to retry, etc.).
2. Instead of writing to the datastore directly, create a task to modify an entity. Run a transaction inside the task. If it fails, App Engine will retry the task until it succeeds.
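As a concrete illustration of the transactional read-then-write described above, here is a minimal sketch using the Go appengine/datastore API (the asker's code is Python/ndb, so the package choice, the MyTest model, and the key format are assumptions):
package mytest

import (
    "context"

    "google.golang.org/appengine/datastore"
)

// MyTest is the same assumed, simplified model as in the earlier sketch.
type MyTest struct {
    TestKeyID    int64
    UniqueUserID string
    Status       string
}

// getOrCreate reads the entity by its deterministic key and creates it only if
// it is missing, all inside a single transaction, so two concurrent requests
// cannot both create it.
func getOrCreate(c context.Context, keyName string) (*MyTest, error) {
    var t MyTest
    err := datastore.RunInTransaction(c, func(tc context.Context) error {
        k := datastore.NewKey(tc, "MyTest", keyName, 0, nil)
        err := datastore.Get(tc, k, &t)
        if err == datastore.ErrNoSuchEntity {
            _, err = datastore.Put(tc, k, &t) // not found: create it in the same transaction
        }
        return err
    }, nil)
    if err != nil {
        return nil, err
    }
    return &t, nil
}
The lookup is a Get by key rather than a query, which is what allows it to run inside the transaction.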
I am using the MongoDB driver package for Go to find a document. When I run multiple readMongo functions in goroutines, the time taken to run each readMongo call increases. Is this an I/O limit of my machine? The documents I'm reading are less than 0.5 MB.
import (
    "context"
    "fmt"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    go readMongo()
    go readMongo()
    go readMongo()
}

func readMongo() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
    if err != nil {
        // handle err
    }
    t1 := time.Now()
    var data bson.M
    collection := client.Database("Data").Collection("myCollection")
    if err := collection.FindOne(ctx, bson.M{"_id": "myKey"}).Decode(&data); err != nil {
        // handle err
    }
    t2 := time.Now()
    fmt.Println(t2.Sub(t1).Milliseconds())
}
First: connect to the DB once, and use the same client for all the goroutines. The MongoDB client manages the connection pooling.
With increased concurrency, the performance of individual reads will decrease, mainly because of network bandwidth: all the parallel reads share the same network path to the database. In your example you are reading through multiple connections in parallel, so the reads still compete for the same bandwidth. In real-life production loads, database traffic is usually random, and the slowdown for each individual goroutine is not so predictable.
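For illustration, here is a sketch of that structure applied to the code in the question: one shared client, the collection handle passed to each goroutine, and main waiting for the goroutines so the timings actually get printed (the database, collection, and key names are copied from the question):
package main

import (
    "context"
    "fmt"
    "log"
    "sync"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Connect once; the client owns a connection pool shared by all goroutines.
    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(context.Background())

    collection := client.Database("Data").Collection("myCollection")

    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            readMongo(collection)
        }()
    }
    wg.Wait() // main must wait, otherwise it exits before the reads finish
}

func readMongo(collection *mongo.Collection) {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    var data bson.M
    t1 := time.Now()
    if err := collection.FindOne(ctx, bson.M{"_id": "myKey"}).Decode(&data); err != nil {
        log.Println(err)
    }
    fmt.Println(time.Since(t1).Milliseconds())
}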
When I connect to a database (using the standard Go sql library) over a VPN and the VPN interface goes down, there is a 75-second timeout when I try to run an SQL query, even if the interface comes back up in the meantime. I'd like to decrease this timeout to something reasonable so my application isn't frozen for 75 seconds in such a case.
db, err := sql.Open(driverName, dataSourceName)
Is it possible to set this timeout somehow via the db variable?
The database/sql package doesn't provide a general way to time out a call to database/sql.Open. However, individual drivers provide this functionality via their DSN (dataSourceName) connection strings.
https://github.com/lib/pq
sql.Open("postgres", "user=user dbname=dbname connect_timeout=5")
https://github.com/go-sql-driver/mysql
sql.Open("mysql", "user:password#/dbname?timeout=5s")
https://github.com/denisenkom/go-mssqldb
sql.Open("sqlserver", "sqlserver://username:password#host/instance?dial+timeout=5")
etc ...
Starting with Go 1.8, the sql.DB abstraction now accepts context.Context, which can be used to time out connections faster.
func (c *Client) DoLookup(ctx context.Context, id int) (string, error) {
    var name string

    // create a child context with a timeout
    newCtx, cancel := context.WithTimeout(ctx, time.Second)
    // release resources used in `newCtx` if
    // the DB operation finishes faster than the timeout
    defer cancel()

    row := c.db.QueryRowContext(newCtx, "SELECT name FROM items WHERE id = ?", id)
    err := row.Scan(&name)
    if err != nil {
        return "", err
    }

    return name, nil
}
If your DoLookup function doesn't yet take a context.Context (and it really should!) you can create a parent one by calling context.TODO().
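For example, a hypothetical call site that has no context of its own yet could look like this:
// lookupName is a hypothetical caller with no context of its own, so it
// passes context.TODO() as the parent context for DoLookup.
func lookupName(c *Client, id int) (string, error) {
    return c.DoLookup(context.TODO(), id)
}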
I get this error every so often when utilizing sqlx with pgx, and I believe it's a configuration error on my end and a db concept I'm not grasping:
error: 'write tcp [redacted-ip]:[redacted-port]->[redacted-ip]:[redacted-port]: write: connection timed out
This occurs when attempting to read from the database. I init sqlx in the startup phase:
package main
import (
    _ "github.com/jackc/pgx/stdlib"
    "github.com/jmoiron/sqlx"
)

// NewDB attempts to connect to the DB.
func NewDB(connectionString string) (*sqlx.DB, error) {
    db, err := sqlx.Connect("pgx", connectionString)
    if err != nil {
        return nil, err
    }
    return db, nil
}
Any structs responsible for interacting with the database have access to this pointer. The majority of them utilize Select or Get, and I understand those return connections to the pool on their own. There are two functions that use Exec, and they only return the result and error at the end of the function.
Other Notes
My Postgres db supports 100 max_connections
Only a few connections were active at the time of this error
I am not using SetMaxIdleConns or SetMaxOpenConns
Refreshing the page and triggering the request again always works
Any tips on what might be happening here?
EDIT: I did not mention that this server is on compose.io, which is in turn hosted on AWS. Is it possible that AWS turns these connections into zombies because they have been open for so long, and that the timeout occurs after unsuccessfully trying them one by one?
With the help of some rough calculations, I initially set the maximum lifetime of these connections to 10 minutes; the code below, inserted into the init function from the question above, limits the number of open and idle connections and caps the life of each connection at 30 seconds.
db.SetConnMaxLifetime(time.Duration(30) * time.Second)
db.SetMaxOpenConns(20)
db.SetMaxIdleConns(20)
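For reference, folded back into the NewDB constructor from the question, the settings look roughly like this (the values are the ones above, not a general recommendation):
package main

import (
    "time"

    _ "github.com/jackc/pgx/stdlib" // registers the "pgx" driver
    "github.com/jmoiron/sqlx"
)

// NewDB connects and configures the pool so that connections are closed and
// replaced before the hosting provider can silently drop them.
func NewDB(connectionString string) (*sqlx.DB, error) {
    db, err := sqlx.Connect("pgx", connectionString)
    if err != nil {
        return nil, err
    }
    db.SetConnMaxLifetime(30 * time.Second)
    db.SetMaxOpenConns(20)
    db.SetMaxIdleConns(20)
    return db, nil
}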
Hopefully this helps someone else.
SELECT * FROM pg_stat_activity; is great for nailing down connections as well.
I am introducing namespacing into my application, but I have run into an issue with one of my existing queries that performs the following operation in order to determine whether or not an entity exists for the given key.
// c is of type context.Context
c, _ = appengine.Namespace(c, "name")
k := datastore.NewKey(c, "Kind", "", id, nil)
q := datastore.NewQuery("Kind").Filter("__key__ =", k).KeysOnly()
keys, err := q.GetAll(c, nil)
When this command is executed with a namespace applied to the context, it gives back the following error:
datastore_v3 API error 1: __key__ filter namespace is but query namespace is db
I could just use a Get query instead, but I don't need to actually retrieve the entity at all. Plus, keys-only queries are free!
Update
It seems that all queries are failing after I have introduced namespacing. The documentation doesn't mention any sort of special treatment for the indices:
https://cloud.google.com/appengine/docs/go/multitenancy/multitenancy
"By default, the datastore uses the current namespace for datastore requests. The API applies this current namespace to datastore.Key objects when they are created. Therefore, you need to be careful if an application stores Key objects in serialized forms, since the namespace is preserved in those serializations."
Using namespaces with the Datastore
https://cloud.google.com/appengine/docs/go/multitenancy/multitenancy#Go_Using_namespaces_with_the_Datastore