When I connect to a database (using the standard Go sql library) over a VPN and the VPN interface goes down, there's a 75-second timeout when I try to do a SQL query, no matter if the interface comes back up in the meantime. I'd like to decrease this timeout to some reasonable time, so my application won't be frozen for 75 seconds in such a case.
db, err := sql.Open(driverName, dataSourceName)
Is it possible to set it somehow via db variable?
The database/sql package doesn't provide a general way to time out a call to database/sql.Open. However, individual drivers provide this functionality via the DSN (dataSourceName) connection strings.
https://github.com/lib/pq
sql.Open("postgres", "user=user dbname=dbname connect_timeout=5")
https://github.com/go-sql-driver/mysql
sql.Open("mysql", "user:password@/dbname?timeout=5s")
https://github.com/denisenkom/go-mssqldb
sql.Open("sqlserver", "sqlserver://username:password@host/instance?dial+timeout=5")
etc ...
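For example, a minimal sketch with lib/pq, assuming placeholder credentials and an sslmode chosen only to keep the example self-contained; connect_timeout is interpreted by the driver, and Ping forces an actual connection so the timeout surfaces immediately:
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// connect_timeout is a driver/libpq parameter, not part of database/sql itself.
	db, err := sql.Open("postgres", "user=user dbname=dbname connect_timeout=5 sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// sql.Open doesn't dial; Ping establishes a connection and returns the
	// driver's timeout error if the server can't be reached within 5 seconds.
	if err := db.Ping(); err != nil {
		log.Fatalf("database unreachable: %v", err)
	}
}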
Starting with Go 1.8, the sql.DB abstraction now accepts context.Context, which can be used to time out connections faster.
func (c *Client) DoLookup(ctx context.Context, id int) (string, error) {
	var name string

	// create a child context with a timeout
	newCtx, cancel := context.WithTimeout(ctx, time.Second)
	// release resources used in `newCtx` if
	// the DB operation finishes faster than the timeout
	defer cancel()

	row := c.db.QueryRowContext(newCtx, "SELECT name FROM items WHERE id = ?", id)

	err := row.Scan(&name)
	if err != nil {
		return "", err
	}

	return name, nil
}
If your DoLookup function doesn't yet take a context.Context (and it really should!) you can create a parent one by calling context.TODO().
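For instance, a caller higher up the stack that hasn't been threaded with a context yet might look like this (a sketch; LookupByID is a hypothetical wrapper around the DoLookup above):
// LookupByID is a hypothetical caller that has no context of its own yet.
func (c *Client) LookupByID(id int) (string, error) {
	// context.TODO() marks the spot where a real context should eventually be plumbed through.
	return c.DoLookup(context.TODO(), id)
}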
Related
In the code below from our codebase, a MongoDB client is created:
import (
	"context"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readpref"
)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
In our scenario:
Goroutine 1 uses collection1 for read & write operations:
collection1 := client.Database("testing").Collection("collectionone")
Goroutine 2 uses both collection1 & collection2 for read & write operations:
collection2 := client.Database("testing").Collection("collectiontwo")
Is the client safe for concurrent use by multiple goroutines?
The documentation of mongo.Database explicitly states that:
Database is a handle to a MongoDB database. It is safe for concurrent use by multiple goroutines.
And mongo.Client that:
Client is a handle representing a pool of connections to a MongoDB deployment. It is safe for concurrent use by multiple goroutines.
And mongo.Collection:
Collection is a handle to a MongoDB collection. It is safe for concurrent use by multiple goroutines.
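So a single client, and the database and collection handles derived from it, can be shared across goroutines. A minimal sketch, assuming the client from the question and that sync and go.mongodb.org/mongo-driver/bson are imported:
collection1 := client.Database("testing").Collection("collectionone")
collection2 := client.Database("testing").Collection("collectiontwo")

var wg sync.WaitGroup
wg.Add(2)

// Goroutine 1: reads & writes collection1.
go func() {
	defer wg.Done()
	_, _ = collection1.InsertOne(context.Background(), bson.M{"from": "goroutine1"})
}()

// Goroutine 2: reads & writes both collections.
go func() {
	defer wg.Done()
	_, _ = collection1.InsertOne(context.Background(), bson.M{"from": "goroutine2"})
	_, _ = collection2.InsertOne(context.Background(), bson.M{"from": "goroutine2"})
}()

wg.Wait()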
See related: goroutine create multiple mongodb connection
I am using the MongoDB package for Go to find a document. When I run multiple readMongo functions in goroutines, the time taken to run the readMongo function increases. Is this an I/O limit of my machine? The documents I'm reading are less than 0.5 MB.
func main() {
	go readMongo()
	go readMongo()
	go readMongo()
}

func readMongo() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		// handle err
	}
	var data bson.M
	t1 := time.Now()
	collection := client.Database("Data").Collection("myCollection")
	if err := collection.FindOne(ctx, bson.M{"_id": "myKey"}).Decode(&data); err != nil {
		// handle err
	}
	t2 := time.Now()
	fmt.Println(t2.Sub(t1).Milliseconds())
}
First: connect to the DB once, and use the same client for all the goroutines. The MongoDB client manages the connection pooling for you.
With increased concurrency, the performance of individual reads will decrease, mainly because of network bandwidth: all parallel reads share the bandwidth available between your machine and the database. In your example you are reading through multiple connections in parallel, so the reads still compete for the same network bandwidth. In real-life production loads the database traffic is usually random, and the slowdown for each individual goroutine is not so predictable.
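A rough sketch of that refactor, with a single shared client and a sync.WaitGroup so main doesn't exit before the reads finish (error handling kept minimal; imports as in the question, plus fmt, log, sync, and bson):
func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Connect once; the client maintains its own connection pool.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(context.Background())

	collection := client.Database("Data").Collection("myCollection")

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			readMongo(collection)
		}()
	}
	wg.Wait()
}

func readMongo(collection *mongo.Collection) {
	var data bson.M
	t1 := time.Now()
	if err := collection.FindOne(context.Background(), bson.M{"_id": "myKey"}).Decode(&data); err != nil {
		log.Println(err)
	}
	fmt.Println(time.Since(t1).Milliseconds())
}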
I would like some information about configuring the publisher in the GCP Pub/Sub environment. I want to enqueue messages that will be consumed via a Google Cloud Function. To achieve this, publication should trigger when a certain number of messages is reached or after a certain amount of time.
I set the topic as follows :
topic.PublishSettings = pubsub.PublishSettings{
	ByteThreshold:  1e6,              // Publish a batch when its size in bytes reaches this value (1e6 = 1 MB).
	CountThreshold: 100,              // Publish a batch when it has this many messages.
	DelayThreshold: 10 * time.Second, // Publish a non-empty batch after this delay has passed.
}
When I call the publish function, I have a 10 second delay on each call. Messages are not added to the queue ...
for _, v := range list {
	ctx := context.Background()
	res := a.Topic.Publish(ctx, &pubsub.Message{Data: v})
	// Block until the result is returned and a server-generated
	// ID is returned for the published message.
	serverID, err = res.Get(ctx)
	if err != nil {
		return "", err
	}
}
Can someone help me?
Cheers
Batching on the publisher side is designed to allow for more cost efficiency when sending messages to Google Cloud Pub/Sub. Given that the minimum billing unit for the service is 1KB, it can be cheaper to send multiple messages in the same Publish request. For example, sending two 0.5KB messages as separate Publish requests would result in being charged for sending 2KB of data (1KB for each). If one were to batch that into a single Publish request, it would be charged as 1KB of data.
The tradeoff with batching is latency: in order to fill up batches, the publisher has to wait to receive more messages to batch together. The three batching properties (ByteThreshold, CountThreshold, and DelayThreshold) allow one to control the level of that tradeoff. The first two properties control how much data or how many messages we put in a single batch. The last property controls how long the publisher should wait to send a batch.
As an example, imagine you have CountThreshold set to 100. If you are publishing few messages, it could take a while to receive 100 messages to send as a batch. This means that the latency for messages in that batch will be higher because they are sitting in the client waiting to be sent. With DelayThreshold set to 10 seconds, a batch would be sent if it had 100 messages in it or if the first message in the batch was received at least 10 seconds ago. Therefore, this puts a limit on the amount of latency introduced in order to have more data in an individual batch.
The code as you have it is going to result in batches with only a single message that each take 10 seconds to publish. The reason is the call to res.Get(ctx), which will block until the message has been successfully sent to the server. With CountThreshold set to 100 and DelayThreshold set to 10 seconds, the sequence that is happening inside your loop is:
A call to Publish puts a message in a batch to publish.
That batch is waiting to receive 99 more messages or for 10 seconds to pass before sending the batch to the server.
The code is waiting for this message to be sent to the server and return with a serverID.
Given the code doesn't call Publish again until res.Get(ctx) returns, it waits 10 seconds to send the batch.
res.Get(ctx) returns with a serverID for the single message.
Go back to 1.
If you actually want to batch messages together, you can't call res.Get(ctx) before the next Publish call. You'll want to either call publish inside a goroutine (so one routine per message) or you'll want to amass the res objects in a list and then call Get on them outside the loop, e.g.:
var res []*pubsub.PublishResult
ctx := context.Background()
for _, v := range list {
	res = append(res, a.Topic.Publish(ctx, &pubsub.Message{Data: v}))
}
for _, r := range res {
	serverID, err = r.Get(ctx)
	if err != nil {
		return "", err
	}
}
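The goroutine-per-message variant mentioned above could look roughly like this (a sketch assuming golang.org/x/sync/errgroup; it discards the server IDs for brevity):
ctx := context.Background()
g, ctx := errgroup.WithContext(ctx)

for _, v := range list {
	res := a.Topic.Publish(ctx, &pubsub.Message{Data: v})
	g.Go(func() error {
		// Get blocks per message, but in its own goroutine, so the loop
		// keeps calling Publish and the batch keeps filling up.
		_, err := res.Get(ctx)
		return err
	})
}

if err := g.Wait(); err != nil {
	return "", err
}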
Something to keep in mind is that batching will optimize cost on the publish side, not on the subscribe side. Cloud Functions is built with push subscriptions. This means that messages must be delivered to the subscriber one at a time (since the response code is what is used to ack or nack each message), which means there is no batching of messages delivered to the subscriber.
Hoping someone can help point out the issue in my code.
I have a query defined outside a transaction, and when it's executed, it correctly matches an existing record in the database.
However, the moment that query is executed inside a transaction, it fails to match the existing records in the database, despite the fact that they exist.
Here's the code, with output below:
// Query for URL to see if any already exist
existingRemoteURLQuery := datastore.NewQuery("RepoStats").
	Filter("RepoURL =", statsToSave.RepoURL).
	KeysOnly().Limit(1)

testKey, _ := existingRemoteURLQuery.GetAll(ctx, new(models.RepoStats))
if len(testKey) > 0 {
	log.Infof(ctx, "TEST Update existing record vice new key")
} else {
	log.Infof(ctx, "TEST No existing key found, use new key")
}

// Check if we already have a record with this remote URL
var key *datastore.Key
err := datastore.RunInTransaction(ctx, func(ctx context.Context) error {
	// This function's argument ctx shadows the variable ctx from the surrounding function.
	// The last parameter is ignored because it's a keys-only query.
	existingKeys, err := existingRemoteURLQuery.GetAll(ctx, new(models.RepoStats))
	if len(existingKeys) > 0 {
		log.Infof(ctx, "Update existing record vice new key")
		// use existing key
		key = existingKeys[0]
	} else {
		log.Infof(ctx, "No existing key found, use new key")
		key = datastore.NewIncompleteKey(ctx, "RepoStats", nil)
	}
	return err
}, nil)
As you can see in the output, the first query outside the transaction correctly matches the existing record. But inside the transaction, it doesn't recognize the existing record:
2018/08/28 11:50:47 INFO: TEST Update existing record vice new key
2018/08/28 11:50:47 INFO: No existing key found, use new key
Thanks for any help in advance
Updated
Dan's comment led to printing out the error message on the query inside the transaction:
if err != nil {
	log.Errorf(ctx, "Issue running in transaction: %v", err)
}
Which prints:
ERROR: Issue running in transaction: API error 1 (datastore_v3: BAD_REQUEST): Only ancestor queries are allowed inside transactions.
Converting a comment into an answer
Turns out this is the Go-specific behaviour when attempting to perform non-ancestor queries inside transactions (FWIW, in Python attempting to do so actually raises an exception).
Ancestor queries are the only queries allowed inside transactions. From What can be done in a transaction (not very explicit, though; IMHO it's implicit, as queries could otherwise return entities that don't meet the transaction restrictions):
All Cloud Datastore operations in a transaction must operate on
entities in the same entity group if the transaction is a single-group
transaction, or on entities in a maximum of twenty-five entity groups
if the transaction is a cross-group transaction. This includes
querying for entities by ancestor, retrieving entities by key,
updating entities, and deleting entities. Notice that each root entity
belongs to a separate entity group, so a single transaction cannot
create or operate on more than one root entity unless it is a
cross-group transaction.
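To make the query legal inside the transaction it has to be an ancestor query, which means the RepoStats entities would need to be created under a common parent key so they live in one entity group. A rough sketch, assuming a hypothetical parent key repoKey used both at creation time and in the query:
err := datastore.RunInTransaction(ctx, func(ctx context.Context) error {
	// Ancestor(...) pins the query to a single entity group, which is
	// what makes it allowed inside the transaction.
	q := datastore.NewQuery("RepoStats").
		Ancestor(repoKey).
		Filter("RepoURL =", statsToSave.RepoURL).
		KeysOnly().Limit(1)

	existingKeys, err := q.GetAll(ctx, nil) // dst is ignored for keys-only queries
	if err != nil {
		return err
	}
	if len(existingKeys) > 0 {
		key = existingKeys[0]
	} else {
		key = datastore.NewIncompleteKey(ctx, "RepoStats", repoKey)
	}
	return nil
}, nil)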
I get this error every so often when utilizing sqlx with pgx, and I believe it's a configuration error on my end and a db concept I'm not grasping:
error: 'write tcp [redacted-ip]:[redacted-port]->[redacted-ip]:[redacted-port]: write: connection timed out
This occurs when attempting to read from the database. I init sqlx in the startup phase:
package main

import (
	_ "github.com/jackc/pgx/stdlib"

	"github.com/jmoiron/sqlx"
)

// NewDB attempts to connect to the DB
func NewDB(connectionString string) (*sqlx.DB, error) {
	db, err := sqlx.Connect("pgx", connectionString)
	if err != nil {
		return nil, err
	}
	return db, nil
}
Any structs responsible for interacting with the database have access to this pointer. The majority of them utilize Select or Get, and I understand those return connections to the pool on their own. There are two functions that use Exec, and they only return the result and error at the end of the function.
Other Notes
My Postgres db supports 100 max_connections
I only saw a few active connections at the time of this error
I am not using SetMaxIdleConnections or SetMaxOpenConnections
Refreshing the page and triggering the request again always works
Any tips on what might be happening here?
EDIT: I did not mention this server is on compose.io, which in turn is hosted on AWS. Is it possible AWS turns these connections into zombies because they've been open for so long and the timeout occurs after unsuccessfully trying them one by one?
With the help of some rough calculations, I decided to cap the lifetime of these connections. I inserted the code below into the init function from the question to limit the number of open connections, the number of idle connections, and the life of each connection to 30 seconds.
db.SetConnMaxLifetime(time.Duration(30) * time.Second)
db.SetMaxOpenConns(20)
db.SetMaxIdleConns(20)
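For reference, a sketch of how those limits could be folded into the NewDB helper from the question (the specific values are the ones above, not general recommendations; sqlx.DB embeds *sql.DB, so these methods are available directly):
// NewDB attempts to connect to the DB and bounds the connection pool.
func NewDB(connectionString string) (*sqlx.DB, error) {
	db, err := sqlx.Connect("pgx", connectionString)
	if err != nil {
		return nil, err
	}
	// Recycle connections before an intermediary (VPN, load balancer,
	// cloud host) can silently drop them, and cap the pool size.
	db.SetConnMaxLifetime(30 * time.Second)
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(20)
	return db, nil
}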
Hopefully this helps someone else.
SELECT * FROM pg_stat_activity; is great for nailing down connections as well.