I have a RabbitMQ queue under high load; it may contain up to several million messages. My consumer reads messages from the queue and writes them to an MS SQL DB. I tried to make the writes non-blocking and concurrent by using a goroutine:
for m := range msgs {
	//......
	se := &sqlEntity{
		body:      string(m.Body),
		cnt:       m.MessageCount,
		timeStamp: fmt.Sprintf("%v", m.Timestamp.Format("2006-01-02 15:04:05")),
		uuid:      u,
	}
	go func(se *sqlEntity) {
		writeSQL(se)
	}(se)
	//.........
}

func writeSQL(se *sqlEntity) {
	result, err := db.Exec(cmd, args...)
	//.......
}
So the write function does not block reading from MQ. But if there are too many messages, the writes exhaust all available connections on the MS SQL server. I therefore tried to set up a pool and limit the number of connections explicitly with DB.SetMaxOpenConns. I was sure the database/sql driver would manage the connections, but it does not behave as I expected: if the connections are exhausted (say SetMaxOpenConns = 256), writeSQL() does not wait for a free connection in the pool; result, err := db.Exec(cmd, args...) inside it simply returns a connection error.
So how can I design my application to call writeSQL() concurrently, but strictly within the pool limits? Right now I simply lose data when the pool is exhausted, or the DB gets overloaded if there is no pool limit.
One thing you can do is use a buffered channel whose capacity equals the maximum number of connections in the pool to limit the concurrency of the writeSQL function.
Before executing the db.Exec statement, writeSQL sends a token into the channel. If the channel is already full, the send blocks until another writeSQL call finishes and receives a token, which indicates a free connection is available in the pool.
This way the number of concurrent writeSQL calls can never exceed the maximum number of connections in the pool, and you won't lose data when the pool is exhausted.
Using the code you've provided, it should look like this:
connPool := make(chan struct{}, maxOpenConns) // buffered channel with capacity equal to the pool's maximum number of connections

for m := range msgs {
	// ...
	se := &sqlEntity{
		body:      string(m.Body),
		cnt:       m.MessageCount,
		timeStamp: fmt.Sprintf("%v", m.Timestamp.Format("2006-01-02 15:04:05")),
		uuid:      u,
	}
	go func(se *sqlEntity) {
		writeSQL(se)
	}(se)
	// ...
}

func writeSQL(se *sqlEntity) {
	connPool <- struct{}{} // wait for a free connection in the pool
	defer func() {
		<-connPool // release the connection after writeSQL is done
	}()
	result, err := db.Exec(cmd, args...)
	// handle error and return
}
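If it helps, here is a compact sketch of how the pieces could fit together, keeping db.SetMaxOpenConns in step with the channel capacity and adding a sync.WaitGroup so the consumer can wait for in-flight writes before it exits. The driver import, the DSN, and the submitWrite helper are illustrative assumptions, not part of the code above:

package main

import (
	"database/sql"
	"sync"

	_ "github.com/denisenkom/go-mssqldb" // assumed MS SQL driver; registers the "sqlserver" driver name
)

const maxOpenConns = 256

var (
	db       *sql.DB
	connPool = make(chan struct{}, maxOpenConns) // semaphore sized like the driver pool
	wg       sync.WaitGroup
)

// submitWrite is a hypothetical helper: it launches the write in a
// goroutine (so the consumer loop never blocks) while the semaphore
// caps how many writes run at once.
func submitWrite(cmd string, args ...interface{}) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		connPool <- struct{}{}        // blocks while all slots are taken
		defer func() { <-connPool }() // release the slot when done
		if _, err := db.Exec(cmd, args...); err != nil {
			// handle, log, or requeue the failed message here
		}
	}()
}

func main() {
	var err error
	db, err = sql.Open("sqlserver", "sqlserver://user:pass@host?database=mydb") // placeholder DSN
	if err != nil {
		panic(err)
	}
	db.SetMaxOpenConns(maxOpenConns) // keep the driver pool in step with the semaphore

	// ... consume msgs here and call submitWrite(cmd, args...) per message ...

	wg.Wait() // wait for in-flight writes before shutting down
}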
My goal is to keep a swift-nio Scheduled instance inside a struct named Receiver, stored in an array, so I can use it later. The code creates the Receiver, stores it in the array, and adds the Scheduled instance, but the Receiver instance in the array does not contain the Scheduled instance.
What could be causing this and how could I fix it?
It does not happen with anything else, and the task associated with the Scheduled instance still executes. The Xcode debugger even shows the instance in the Receiver assigned to a variable, yet not in the one stored in the array.
private var receivers: Array<Receiver>

func connect() {
    var entry = Receiver(
        // ...
    )
    receivers.append(entry)
    connect(&entry)
}

func connect(_ device: inout Receiver) {
    startTimeout(&device)
    // device contains timeout, but receivers[0] does not
}

private func startTimeout(_ entry: inout Receiver) {
    stopTimeout(&entry) // Does nothing initially (timeout not yet set)
    var device = entry
    // The timeout is added once here
    entry.timeout = entry.eventLoop.scheduleTask(in: .seconds(10), {
        print("Receiver timeout occurred")
        self.reconnect(&device)
        return 0
    })
}

func someStuff() {
    // This does not work. timeout is nil, the timeout still occurs
    stopTimeout(&entry) // Cancels the Scheduled timeout
}

struct Receiver {
    var timeout: Scheduled<Any>?
}
Your struct Receiver is a value type. Every time it is passed, it gets copied (a new instance is created), and manipulating it only alters that new instance. You avoided this by using inout parameters.
But with:
receivers.append(entry)
you are appending a new instance to the array. Modifying entry later on will not affect the instance stored in the array.
In this case you should use a class instead:
class Receiver {
    var timeout: Scheduled<Any>?
}
The idea: I have a TCP server and a client. On the client I write info about a file plus the file contents to the TCP connection. On the server I read that data, create the file, and want to return any errors back to the client.
On the client I use fmt.Fprintf(conn, "test.txt\n") for the file info and io.Copy(conn, f), where f is the file contents.
On the server I use ReadString('\n') for the info about the file and:
w := io.MultiWriter(writers...)
_, err = io.Copy(w, r)
if err != nil {
	fmt.Println([]error{err})
}
All goes well at first, but then I ran into a problem on the server side: io.Copy keeps trying to read data from the client even when there is none left. It never sees EOF or anything like it.
I tried using conn.Read and similar in a loop on the server side instead of io.Copy, and got stuck the same way:
w := io.MultiWriter(writers...)
for {
	input := make([]byte, 1024*4)
	n, err := conn.Read(input)
	if n == 0 || err != nil {
		log.Error(err, "Read error:")
		break
	}
	_, err = w.Write(input[0:n])
	if err != nil {
		log.Error(err, "Failed to copy data on server")
		m = m + fmt.Sprintf("failed to copy data on server: %s\n", err)
	}
}
I read some articles and found explanations that this is "normal" behaviour for such reader/writer exchanges between client and server. One suggestion was to use a read timeout on the connection (which won't work for me, because files can be huge and it's hard to figure out how long is long enough).
Another option is to send some STOP word from the client to the server to break the loop.
But I wonder whether there is some other way (maybe I'm missing some instrument in Go) to achieve the desired result without endless blocking reads, timeouts, or stop words.
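For reference, one common way to avoid relying on EOF at all is to send the file size in the header and have the server read exactly that many bytes with io.CopyN, which returns as soon as the advertised amount has arrived. A minimal sketch of that idea (the header format and helper names are just illustrative assumptions):

package filetransfer

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// sendFile writes "name\tsize\n", then exactly size bytes of file data.
func sendFile(conn net.Conn, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintf(conn, "%s\t%d\n", filepath.Base(path), fi.Size()); err != nil {
		return err
	}
	_, err = io.Copy(conn, f)
	return err
}

// recvFile reads the header line, then copies exactly the advertised
// number of bytes; no EOF, timeout, or stop word is needed, and the
// connection stays open so the server can still write an error (or an
// "OK") back to the client afterwards.
func recvFile(conn net.Conn, w io.Writer) error {
	r := bufio.NewReader(conn)
	header, err := r.ReadString('\n')
	if err != nil {
		return err
	}
	parts := strings.Split(strings.TrimSpace(header), "\t")
	if len(parts) != 2 {
		return fmt.Errorf("bad header: %q", header)
	}
	size, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return err
	}
	// Read from r (not conn), because the header read may have buffered data.
	_, err = io.CopyN(w, r, size)
	return err
}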
Given the sign capability from the Go NaCl library (https://github.com/golang/crypto/tree/master/nacl/sign), how do I sign a file, especially a very large file of more than 1 GB? Most internet search results are about signing a slice or a small array of bytes.
I can think of two ways:
Loop through the file and stream it in blocks (e.g. 16 KB at a time), feeding each block into the sign function. The streamed outputs are concatenated into a signature certificate. For verification, the process is done in reverse.
Use SHA(X) to generate the shasum of the file and then sign the shasum output.
For signing very large files (multiple gigabytes and up), the problem with using a standard signing function is often runtime and fragility. For very large files (or just slow disks) it can take hours or more just to serially read the full file from start to end.
In such cases, you want a way to process the file in parallel. One common approach that is suitable for cryptographic signatures is a Merkle tree hash. It lets you split the large file into smaller chunks, hash them in parallel (producing "leaf hashes"), and then hash those hashes further in a tree structure to produce a root hash that represents the full file.
Once you have calculated this Merkle root hash, you can sign it. The signed root hash can then be used to verify all of the file chunks in parallel, as well as their order (based on the positions of the leaf hashes in the tree structure).
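To make this concrete, here is a minimal sketch of computing a Merkle root over fixed-size chunks of a file (done sequentially here for brevity; the per-chunk hashes are the part you could compute in parallel). The chunk size, the use of SHA-256, and the handling of an odd leaf are arbitrary illustrative choices, not a standardized tree format:

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

const chunkSize = 1 << 20 // 1 MiB chunks, an arbitrary choice

// merkleRoot hashes the input chunk by chunk ("leaf hashes") and folds
// the leaves pairwise into a single root hash.
func merkleRoot(r io.Reader) ([]byte, error) {
	var leaves [][]byte
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			h := sha256.Sum256(buf[:n])
			leaves = append(leaves, h[:])
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			return nil, err
		}
	}
	if len(leaves) == 0 {
		return nil, fmt.Errorf("empty input")
	}
	// Pairwise combine leaf hashes until one root remains.
	for len(leaves) > 1 {
		var next [][]byte
		for i := 0; i < len(leaves); i += 2 {
			if i+1 == len(leaves) {
				next = append(next, leaves[i]) // odd leaf is carried up unchanged
				continue
			}
			h := sha256.Sum256(append(leaves[i], leaves[i+1]...))
			next = append(next, h[:])
		}
		leaves = next
	}
	return leaves[0], nil // sign this root instead of the whole file
}

func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()
	root, err := merkleRoot(f)
	if err != nil {
		panic(err)
	}
	fmt.Printf("merkle root: %x\n", root)
}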
The problem with NaCl is that you need to put the whole message into RAM, as per the godoc:
Messages should be small because:
1. The whole message needs to be held in memory to be processed.
2. Using large messages pressures implementations on small machines to process plaintext without verifying the signature. This is very dangerous, and this API discourages it, but a protocol that uses excessive message sizes might present some implementations with no other choice.
3. Performance may be improved by working with messages that fit into data caches.
Thus large amounts of data should be chunked so that each message is small.
However, there are various other methods. Most of them basically do what you described as your first option: you copy the file contents into an io.Writer which takes the contents and calculates the hash sum; this is the most efficient approach.
The code below is quick and dirty, but you should get the picture.
I achieved an average throughput of 315 MB/s with it.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"flag"
	"fmt"
	"io"
	"log"
	"math/big"
	"os"
	"time"
)

var filename = flag.String("file", "", "file to sign")

func main() {
	flag.Parse()
	if *filename == "" {
		log.Fatal("file can not be empty")
	}
	f, err := os.Open(*filename)
	if err != nil {
		log.Fatalf("Error opening '%s': %s", *filename, err)
	}
	defer f.Close()

	start := time.Now()
	sum, n, err := hash(f)
	if err != nil {
		log.Fatalf("Error hashing '%s': %s", *filename, err)
	}
	duration := time.Since(start)

	log.Printf("Hashed %s (%d bytes) in %s to %x", *filename, n, duration, sum)
	log.Printf("Average: %.2f MB/s", (float64(n)/1000000)/duration.Seconds())

	r, s, err := sign(sum)
	if err != nil {
		log.Fatalf("Error creating signature: %s", err)
	}
	log.Printf("Signature: (0x%x,0x%x)\n", r, s)
}

func sign(sum []byte) (*big.Int, *big.Int, error) {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("Error creating private key: %s", err)
	}
	return ecdsa.Sign(rand.Reader, priv, sum)
}

func hash(f *os.File) ([]byte, int64, error) {
	h := sha256.New()
	// This is where the magic happens.
	// We use the efficient io.Copy to feed the contents
	// of the file into the hash function.
	n, err := io.Copy(h, f)
	if err != nil {
		return nil, n, fmt.Errorf("Error creating hash: %s", err)
	}
	return h.Sum(nil), n, nil
}
I have implemented the levigo wrapper in my project so I can use LevelDB. The declaration is fairly boilerplate, like so:
func NewLeveldbStorage(dbPath string) *leveldbStorage {
	opts := levigo.NewOptions()
	opts.SetCache(levigo.NewLRUCache(3 << 30))
	opts.SetCreateIfMissing(true)
	log.Debugf("Entering Open")
	db, err := levigo.Open(dbPath, opts)
	if err != nil {
		log.Fatal("BOOM %v", err)
	}
	log.Debugf("Finished calling open")
	opts.Close()
	return &leveldbStorage{db: db}
}
Here is the struct returned:
type leveldbStorage struct {
	db *levigo.DB
}
I then made a few simple GET and STORE commands on the struct that essentially just use s.db.Get and s.db.Put. This works fine in my tests, but when I run the following benchmark:
func BenchmarkLeviDbGet(b *testing.B) {
	s := storage.NewLeveldbStorage("/path/to/db")
	value := "value"
	uid, _ := s.Store(value)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		s.Get(uid)
	}
}
This benchmark, when run, returns:
2014/10/12 21:17:09 BOOM %vIO error: lock /path/to/db/LOCK: already held by process
Is there an appropriate way to use levigo/leveldb to enable multithreaded reading? What about writing? I would not be surprised if multithreaded writing is not possible, but multithreaded reading seems like it should be. What am I doing wrong here?
You either need to close the database before opening it again, or keep a single global instance of it. You can't open the same database file multiple times, but you can access the same instance from multiple goroutines.
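A minimal sketch of the shared-instance approach (the sync.Once wrapper and the function name are just one way to arrange it, not part of levigo):

package storage

import (
	"sync"

	"github.com/jmhodges/levigo"
)

// leveldbStorage mirrors the struct from the question.
type leveldbStorage struct {
	db *levigo.DB
}

var (
	once      sync.Once
	shared    *leveldbStorage
	sharedErr error
)

// SharedLeveldbStorage opens the database once and returns the same
// instance on every subsequent call; that single instance can then be
// used from multiple goroutines for reads and writes.
func SharedLeveldbStorage(dbPath string) (*leveldbStorage, error) {
	once.Do(func() {
		opts := levigo.NewOptions()
		opts.SetCreateIfMissing(true)
		db, err := levigo.Open(dbPath, opts)
		opts.Close()
		if err != nil {
			sharedErr = err
			return
		}
		shared = &leveldbStorage{db: db}
	})
	return shared, sharedErr
}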
Is there a Go analogue to Python/Java's async datastore APIs? Or can one just use the normal API with the go keyword?
There is no Go equivalent to the Python or Java asynchronous APIs for any AppEngine service. In fact, the Go standard library has nothing in the typical asynchronous style either. The reason is that in Go, you write functions using a blocking style and compose them using a few basic concurrency primitives as needed. While you cannot just tack go onto the front of a datastore.Get call, it is still relatively straightforward. Consider the following, contrived example:
func loadUser(ctx appengine.Context, name string) (*User, error) {
	var u User
	var entries []*Entry

	done := make(chan error)

	go func() {
		// Load the main features of the User
		key := datastore.NewKey(ctx, "user", name, 0, nil)
		done <- datastore.Get(ctx, key, &u)
	}()

	go func() {
		// Load the entries associated with the user
		q := datastore.NewQuery("entries").Filter("user =", name)
		keys, err := q.GetAll(ctx, &entries)
		for i, k := range keys {
			entries[i].key = k
		}
		done <- err
	}()

	var firstErr error
	// Wait for the queries to finish in parallel
	for i := 0; i < 2 /* count the funcs above */; i++ {
		if err := <-done; err != nil {
			ctx.Errorf("loaduser: %s", err)
			if firstErr == nil {
				firstErr = err
			}
		}
	}
	if firstErr != nil {
		return nil, firstErr
	}

	// maybe more stuff here
	return &u, nil
}
This same approach can be used in pretty much any context where you need to run more than one thing that might take a while at the same time, whether it's a datastore call, urlfetch, file load, etc.
There is no explicit API for async in Go; you should use goroutines instead. I haven't seen any source on this, but I suspect the async API isn't there because of how easy it is to use goroutines.