How to configure checkpointing on an XTDB node using AWS S3

I am using XTDB 1.21.0 deployed on AWS ECS (Fargate) with checkpoints of the RocksDB index store configured (frequency 30 minutes) and stored in an S3 bucket. After a couple of successful checkpoints, they seem to be failing constantly with an XTDB warning caused by an exception in the HTTP request to AWS.
This leaves the S3 bucket with incomplete checkpoints (i.e., a folder containing a set of SSTs and other RocksDB files but no associated EDN index file).
The XTDB documentation mentions that an optional S3Configurator can be passed to the node configuration, and after a bit of Googling around I figured that makeClient should be overridden so that connectionAcquisitionTimeout can be set:
NettyNioAsyncHttpClient.builder()
    .maxConcurrency(200)
    .connectionAcquisitionTimeout(Duration.ofMillis(20000))
I am not too familiar with Netty, so I would appreciate it if someone could help with the right incantation.
Also, I am configuring the XTDB node from an EDN file, and I haven't figured out how to write an S3Configurator in an EDN file (or whether it is even possible).
Thanks in advance!

This can happen for large datasets, where the default S3 client creates a new async request for each object (and the number of objects may be very large, particularly if using the RocksDB index). Internally it uses the connectionAcquisitionTimeout as a form of backpressure to ensure that incoming requests don't wait indefinitely for a connection from the connection pool. However, in this case we're the only source of these requests, and we definitely want them to complete before starting the node, so it's reasonable to set the connectionAcquisitionTimeout to something very high (the default is only 10 seconds). A good choice of limit might be the maximum amount of time you want to wait for the node to start before failing.
This appears to be a non-optional parameter of the SDK, for what I can only assume is a sensible default strategy for requests coming from an external source; in our case we essentially want it to behave as if it were a synchronous operation.
Configuring this in Clojure with xtdb would look something like this:
(ns foo.db
  (:require
   [xtdb.api :as xtdb]
   [xtdb.checkpoint]
   [xtdb.rocksdb]
   [xtdb.s3.checkpoint])
  (:import
   (java.time Duration)
   (software.amazon.awssdk.http.nio.netty NettyNioAsyncHttpClient)
   (software.amazon.awssdk.services.s3 S3AsyncClient)
   (xtdb.checkpoint Checkpointer)
   (xtdb.s3 S3Configurator)))

(def s3-configurator
  (reify S3Configurator
    (makeClient [this]
      (.. (S3AsyncClient/builder)
          (httpClientBuilder
           (.. (NettyNioAsyncHttpClient/builder)
               (connectionAcquisitionTimeout
                (Duration/ofSeconds 600)) ;; Set a high limit here
               ;; We can rely on the defaults for maxConcurrency and
               ;; maxPendingConnectionAcquires
               ;; (maxConcurrency (Integer. 200))
               ;; (maxPendingConnectionAcquires (Integer. 10000))
               ))
          (build)))))

(defn start-node!
  []
  (xtdb/start-node
   {:xtdb/index-store
    {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                :db-dir "/var/xtdb/idxs"
                :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                               :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
                                       :configurator (constantly s3-configurator)
                                       :bucket "checkpoints"}
                               :approx-frequency "PT3H"}}}}))
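As for the EDN part of the question: a reify form can't be written literally in an EDN file, but XTDB's module system resolves fully-qualified symbols in EDN config to vars (that is how the :xtdb/module references themselves work), so the usual approach is to point the :configurator option at a function defined in your own code. The namespace and function name below are mine, and I haven't verified symbol resolution for this particular option, so treat this as a sketch; if it doesn't work, building the node from Clojure as above is the safe route.

;; my_app/s3.clj -- a var the EDN config can refer to by fully-qualified symbol
(ns my-app.s3
  (:import (java.time Duration)
           (software.amazon.awssdk.http.nio.netty NettyNioAsyncHttpClient)
           (software.amazon.awssdk.services.s3 S3AsyncClient)
           (xtdb.s3 S3Configurator)))

(defn ->configurator [& _args]
  ;; same reified S3Configurator as in the Clojure example above
  (reify S3Configurator
    (makeClient [_]
      (.. (S3AsyncClient/builder)
          (httpClientBuilder
           (.. (NettyNioAsyncHttpClient/builder)
               (connectionAcquisitionTimeout (Duration/ofSeconds 600))))
          (build)))))

The EDN config file would then reference it by symbol, for example:

{:xtdb/index-store
 {:kv-store {:xtdb/module xtdb.rocksdb/->kv-store
             :db-dir "/var/xtdb/idxs"
             :checkpointer {:xtdb/module xtdb.checkpoint/->checkpointer
                            :store {:xtdb/module xtdb.s3.checkpoint/->cp-store
                                    :configurator my-app.s3/->configurator
                                    :bucket "checkpoints"}
                            :approx-frequency "PT3H"}}}}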

Related

What's the best way to upload an ILP file into QuestDB?

I am migrating from InfluxDB to QuestDB and have exported my data (using influxd inspect) as a large file containing all my ILP points. It looks something like this (just several gigabytes of it):
diagnostics,device_version=v1.0,driver=Albert,fleet=East,model=F-150,name=truck_1027 current_load=2658 1451612300000000000
diagnostics,device_version=v1.0,driver=Albert,fleet=East,model=F-150,name=truck_1027 current_load=3436 1451612310000000000
readings,driver=Trish,fleet=West,model=H-2,name=truck_972 velocity=89 1451831680000000000
Please note I exported a whole bucket, so the ILP file contains entries for several measurements/tables.
I want to load this into QuestDB, but I can see the HTTP endpoint supports loading CSV files only. I know QuestDB supports ingesting ILP, but the official clients don't accept sending an ILP file; it seems that with the client libraries I have to compose an object representing each point and then send it over. I could read the file line by line, parse it, and then use the Python client to send the points, but I am wondering if there is a better way.
QuestDB does support the ILP protocol. You can just send your ILP points over a socket connection using the TCP port (defaults to 9009).
The official client libraries do exactly that, but in a more convenient way, so you don't have to compose the raw message yourself, which could be error-prone.
In your case, since you already have valid ILP points, you can just iterate over your file and send via socket. I am showing a basic example using Python:
import socket
import sys

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

def send_utf8(msg):
    print(msg)
    sock.sendall(msg.encode())

if __name__ == '__main__':
    try:
        sock.connect(('localhost', 9009))
        with open("YOUR_FILE") as infile:
            for line in infile:
                # print(line)
                send_utf8(line)
    except socket.error as e:
        sys.stderr.write(f'Got error: {e}')
    sock.close()

Set default settings to 'no-cache' on Google Cloud Storage

Is there a way to set all public links to have 'no-cache' in Google Cloud Storage?
I've seen solutions to use gsutil to set the "Cache-Control" upon file-upload, but I'm looking for a more permanent solution.
There was a conversation about providing a cache invalidation feature but I didn't quite follow the reasoning. Any explanations would be greatly appreciated!
it would be difficult to provide a cache invalidation feature because, once served with a non-zero cache TTL, any cache on the Internet (not just those under Google's control) is allowed (per the HTTP spec) to cache the data
Thanks!
For a more permanent one-time-effort solution, with the current offerings on GCP, you can do this with Cloud Functions.
Create a new Function and set the Event type to "On (finalizing/creating) file in the selected bucket" (google.storage.object.finalize). Make sure to select the bucket you want this on. In the body of the function, set the cacheControl / Cache-Control attribute for the blob; the attribute name depends on the language. Here's my version in Python, using cache_control:
main.py (match the function name below to the Entry point):
from google.cloud import storage

def set_file_uncached(event, context):
    file = event  # auto-generated
    print(f"Processing file: {file=}")  # logging, if you want it
    storage_client = storage.Client()
    # we expect just one with that name
    blob = storage_client.bucket(file["bucket"]).get_blob(file["name"])
    if not blob:
        # in case the blob is deleted before this executes
        print("blob not found")
        return None
    blob.cache_control = "public, max-age=0"  # or whatever you need
    blob.patch()
requirements.txt
google-cloud-storage
From the logs: Function execution took 1712 ms, finished with status: 'ok'. This could have been faster, but I've set the minimum to 0 instances, so it needs to spin up for each upload. Depending on your usage and cost constraints, you can set it to 1 or something higher.
Other settings:
Retry on failure: No/False
Region: [wherever your bucket is]
Memory allocated: 128 MB (smallest available currently)
Timeout: 5 seconds (smallest available currently, function shouldn't take longer)
Minimum instances: 0
Maximum instances: 1

Failing lein uberjar when reading database from configuration

I am writing a Ring/Compojure app with Clojure that fetches page content from a database. To be able to test how the content is displayed, I created prod and dev environments; when using the dev environment, a mock database is used instead of the production database. I achieve this by reading the database from another file and giving it as a parameter to my routes. Here's a simplified version:
(defn www-routes [db]
  (defroutes www-routes
    (GET "/" [] ...)))

(def config (delay (load-file (.getFile (resource "config.clj")))))

(defn db []
  (if (= "dev" (:database #(force config)))
    'kipsu.db-mock
    'kipsu.database))

(def app (routes (www-routes (db))))
The setup is largely taken from the example here, with the addition of setting the database as a parameter.
This setup works great with running the tests with the mock database and displaying real content on prod environment. Things run fine when I start a lein server locally, run tests or any of the functions in lein repl. My problem comes when I'd like to create an uberjar for deploying the changes on my server.
This is where I get a NullPointerException when compiling, starting from the (db) function call inside the def app. I've tried debugging with little success and am not even 100% sure where the actual error is. All I know is that the db function is never even called. Here's the stack trace:
Compiling kipsu.jdbc.json
Compiling kipsu.database
Compiling kipsu.api_converter
Compiling kipsu.web
java.lang.NullPointerException, compiling:(web.clj:124:29)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3628)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3622)
at clojure.lang.Compiler$BodyExpr.eval(Compiler.java:5879)
at clojure.lang.Compiler$DefExpr.eval(Compiler.java:439)
at clojure.lang.Compiler.compile1(Compiler.java:7323)
at clojure.lang.Compiler.compile(Compiler.java:7390)
at clojure.lang.RT.compile(RT.java:399)
at clojure.lang.RT.load(RT.java:444)
at clojure.lang.RT.load(RT.java:412)
at clojure.core$load$fn__5448.invoke(core.clj:5866)
at clojure.core$load.doInvoke(core.clj:5865)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invoke(core.clj:5671)
at clojure.core$compile$fn__5453.invoke(core.clj:5877)
at clojure.core$compile.invoke(core.clj:5876)
at user$eval9$fn__16.invoke(form-init1768231915654429312.clj:1)
at user$eval9.invoke(form-init1768231915654429312.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:6782)
at clojure.lang.Compiler.eval(Compiler.java:6772)
at clojure.lang.Compiler.load(Compiler.java:7227)
at clojure.lang.Compiler.loadFile(Compiler.java:7165)
at clojure.main$load_script.invoke(main.clj:275)
at clojure.main$init_opt.invoke(main.clj:280)
at clojure.main$initialize.invoke(main.clj:308)
at clojure.main$null_opt.invoke(main.clj:343)
at clojure.main$main.doInvoke(main.clj:421)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at clojure.lang.Var.invoke(Var.java:383)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
at kipsu.web$fn__4568.invoke(web.clj:115)
at clojure.lang.Delay.deref(Delay.java:37)
at clojure.lang.Delay.force(Delay.java:27)
at clojure.core$force.invoke(core.clj:730)
at kipsu.web$db.invoke(web.clj:118)
at clojure.lang.AFn.applyToHelper(AFn.java:152)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3623)
... 30 more
Exception in thread "main" java.lang.NullPointerException, compiling (web.clj:124:29)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3628)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3622)
at clojure.lang.Compiler$BodyExpr.eval(Compiler.java:5879)
at clojure.lang.Compiler$DefExpr.eval(Compiler.java:439)
at clojure.lang.Compiler.compile1(Compiler.java:7323)
at clojure.lang.Compiler.compile(Compiler.java:7390)
at clojure.lang.RT.compile(RT.java:399)
at clojure.lang.RT.load(RT.java:444)
at clojure.lang.RT.load(RT.java:412)
at clojure.core$load$fn__5448.invoke(core.clj:5866)
at clojure.core$load.doInvoke(core.clj:5865)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invoke(core.clj:5671)
at clojure.core$compile$fn__5453.invoke(core.clj:5877)
at clojure.core$compile.invoke(core.clj:5876)
at user$eval9$fn__16.invoke(form-init1768231915654429312.clj:1)
at user$eval9.invoke(form-init1768231915654429312.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:6782)
at clojure.lang.Compiler.eval(Compiler.java:6772)
at clojure.lang.Compiler.load(Compiler.java:7227)
at clojure.lang.Compiler.loadFile(Compiler.java:7165)
at clojure.main$load_script.invoke(main.clj:275)
at clojure.main$init_opt.invoke(main.clj:280)
at clojure.main$initialize.invoke(main.clj:308)
at clojure.main$null_opt.invoke(main.clj:343)
at clojure.main$main.doInvoke(main.clj:421)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at clojure.lang.Var.invoke(Var.java:383)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
at kipsu.web$fn__4568.invoke(web.clj:115)
at clojure.lang.Delay.deref(Delay.java:37)
at clojure.lang.Delay.force(Delay.java:27)
at clojure.core$force.invoke(core.clj:730)
at kipsu.web$db.invoke(web.clj:118)
at clojure.lang.AFn.applyToHelper(AFn.java:152)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3623)
... 30 more
Compilation failed: Subprocess failed
I'm not the most fluent with Clojure and am working with this app to learn more. Any help steering me in the right direction from here is greatly appreciated!
I think your problem is that def executes at compile time. You've done the right thing in (def config) and (defn db), but (def app) is going to cause compile-time errors if it cannot find your file. To understand why, let's look at def.
(def hello (println "hello"))
If you try to compile this code you'll see "hello" printed to your console at compile time and at runtime the var hello will have the value nil.
(def hello (delay (println "hello")))
(def world @hello)
The delay means hello now won't get evaluated at compile time, but by introducing the var world, which derefs it, we get the exact same problem.
Now back to your specific problem. You don't want your configuration to get read at compile time, and you don't want your configuration to have to read a file from disk every single time you need it.
Not reading your configuration at compile time makes me think that maybe it should be a function. If it is a function, you can simply use memoize to ensure it doesn't read from disk every time you call it, as in the sketch below.
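For instance, here is a minimal sketch of that approach (the namespace and names are illustrative, not taken from the original project): wrap the file read in a function and memoize it, so the configuration is read lazily at runtime and at most once.

(ns kipsu.config
  (:require [clojure.java.io :refer [resource]]))

(defn- read-config []
  ;; load-file only runs when this function is first called, i.e. at
  ;; runtime, not while the namespace is being AOT-compiled
  (load-file (.getFile (resource "config.clj"))))

(def config
  ;; memoize so the file is read from disk at most once
  (memoize read-config))

(defn db []
  (if (= "dev" (:database (config)))
    'kipsu.db-mock
    'kipsu.database))

With this in place, app should also be constructed at runtime rather than in a top-level def, for example as (defn app [] (routes (www-routes (db)))), so that (db) is no longer evaluated while the namespace is being compiled.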

Camel ActiveMQ client blocking, temp storage usage immediately hits 100%

I'm seeing 100% utilisation of ActiveMQ's temp storage (configured to be 100 MB), and the ActiveMQ client is blocking. This 100% usage remains permanently, and I have no idea what's going on.
I have a Camel route which consumes from a queue (QUEUE.IN) using the JmsTransactionManager.
public final class RouteUnderTest extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("activemq-transacted:QUEUE.IN")
            .bean(myBean)
            .to("activemq:QUEUE.OUT");
    }
}
While processing the message from this queue I'm invoking a spring-integration client (myBean) which is configured as follows
<int:gateway id="myBean" service-interface="MyBean">
<int:method name="request" request-channel="channel"/>
</int:gateway>
<int:chain input-channel="channel">
<int:transformer ref="transformedToJsonHere"/>
<jms:outbound-gateway request-destination-name="QUEUE.MYBEAN"
receive-timeout="5000"
explicit-qos-enabled="true"
time-to-live="5000"
delivery-persistent="false"/>
<int:transformer ref="transformedToAnObjectHere"/>
</int:chain>
My broker is configured to use LevelDB, and with the following usage limits:
<persistenceAdapter>
    <levelDB directory="${activemq.data}/leveldb"/>
</persistenceAdapter>

<systemUsage>
    <systemUsage>
        <memoryUsage>
            <memoryUsage percentOfJvmHeap="70"/>
        </memoryUsage>
        <storeUsage>
            <storeUsage limit="500 mb"/>
        </storeUsage>
        <tempUsage>
            <tempUsage limit="100 mb"/>
        </tempUsage>
    </systemUsage>
</systemUsage>
When my route consumes the message and then attempts to put a non-persistent message on QUEUE.OUT the client is blocked and my broker shows 100% usage of temp storage.
And I see the following activemq logs
2015-07-28 15:44:59,678 | INFO | Usage(default:temp:queue://QUEUE.MYBEAN:temp) percentUsage=0%, usage=104857600, limit=104857600, percentUsageMinDelta=1%;Parent:Usage(default:temp) percentUsage=100%, usage=104857600, limit=104857600, percentUsageMinDelta=1%: Temp Store is Full (0% of 104857600). Stopping producer (ID:orbit-vm-55561-1438094698190-1:1:3:1) to prevent flooding queue://QUEUE.MYBEAN. See http://activemq.apache.org/producer-flow-control.html for more info (blocking for: 1s) | org.apache.activemq.broker.region.Queue | ActiveMQ NIO Worker 6
The queue stats show that the QUEUE.IN message has not been dequeued (because it's still being processed transactionally) and that no message has gone to QUEUE.MYBEAN.
I can fix this problem with any one of the following approaches:
Use KahaDB instead of LevelDB
Increase temp storage limit (150MB seems to do it but I haven't experimented a great deal)
Configure tempDataStore in activemq.xml (see below)
When configuring the tempDataStore it looks like:
<tempDataStore>
    <bean xmlns="http://www.springframework.org/schema/beans" class="org.apache.activemq.leveldb.LevelDBStore">
        <property name="directory" value="${activemq.data}/tmp" />
    </bean>
</tempDataStore>
I should add, we were using KahaDB previously and this worked fine, but the upgrade to LevelDB has exposed this issue. Reverting to KahaDB is not an option.
I'm hoping someone could explain what we're seeing here, as the results are really difficult to understand. Why does using LevelDB necessitate a higher temp usage limit, and why does configuring the tempDataStore explicitly also fix the problem?
I don't fully understand what's going on here so I'm worried that simply increasing the temp usage limit a little will just hide the problem until a later date.
Versions:
ActiveMQ: 5.11.1
Camel: 2.14.0
Spring: 4.0.8.RELEASE
Spring Integration: 4.0.5.RELEASE
We ran into exactly the same issue with ActiveMQ 5.13.2.
The solution when using LevelDB is to explicitly configure a dedicated tempDataStore as you did.
If you don't, the broker uses the same store (LevelDB) for both persistent messages (store usage) and non-persistent messages (temp usage). You may therefore end up in situations where the broker doesn't accept any non-persistent messages anymore just because the store already holds persistent ones up to the configured tempUsage limit. It will, however, accept persistent ones if your storeUsage limit is set higher...
When using KahaDB, the broker automatically uses another store for the non-persistent messages (created in the tmp directory). So you don't have the problem...
Look at the following code for more in-depth information: https://github.com/apache/activemq/blob/activemq-5.13.2/activemq-broker/src/main/java/org/apache/activemq/broker/BrokerService.java#L1739
When reading that code, remember LevelDBStore implements PListStore, but KahaDBStore doesn't...

java.lang.OutOfMemoryError when processing large pgp file

I want to use Camel 2.12.1 to decrypt some potentially large PGP files. The following flow results in a java.lang.OutOfMemoryError, and the call stack shows that the PGPDataFormat.unmarshal() function is trying to build a byte array, which is destined to fail if the file is large. Is there a way to pass streams around during unmarshalling?
My route:
from("file:///home/cps/camel/sftp-in?"
+ "include=.*&" // find files using this pattern
+ "move=/home/cps/camel/sftp-archive&" // after done adding records to queue, move file to archive
+ "delay=5000&"
+ "readLock=rename&" // readLock parameters prevent picking up file which is currently changing
+ "readLockCheckInterval=5000")
.choice()
.when(header(Exchange.FILE_NAME_ONLY).regex(".*pgp$|.*PGP$|.*gpg$|.*GPG$")).to("direct:decrypt")
.otherwise()
.to("file:///home/cps/camel/input");
from("direct:decrypt").unmarshal().pgp("file:///home/cps/.gnupg/secring.gpg", "developer", "set42now")
.setHeader(Exchange.FILE_NAME).groovy("request.headers.get('CamelFileNameOnly').replace('.gpg', '')")
.to("file:///home/cps/camel/input/")
.to("log:done");
The exception which shows the converter trying to create a ByteArray:
java.lang.OutOfMemoryError: Java heap space
at org.apache.commons.io.output.ByteArrayOutputStream.needNewBuffer(ByteArrayOutputStream.java:128)
at org.apache.commons.io.output.ByteArrayOutputStream.write(ByteArrayOutputStream.java:158)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1026)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:218)
at org.apache.camel.converter.crypto.PGPDataFormat.unmarshal(PGPDataFormat.java:238)
at org.apache.camel.processor.UnmarshalProcessor.process(UnmarshalProcessor.java:65)
Try Camel 2.13 or 2.12-SNAPSHOT, as we have improved the data formats and streaming recently, so this is likely to be better in the next release.
