What's the best way to upload an ILP file into QuestDB?

I am migrating from InfluxDB to QuestDB and I have exported my data (using influxd inspect) as a large file containing all my ILP points. It looks something like this (just several gigabytes of it):
diagnostics,device_version=v1.0,driver=Albert,fleet=East,model=F-150,name=truck_1027 current_load=2658 1451612300000000000
diagnostics,device_version=v1.0,driver=Albert,fleet=East,model=F-150,name=truck_1027 current_load=3436 1451612310000000000
readings,driver=Trish,fleet=West,model=H-2,name=truck_972 velocity=89 1451831680000000000
Please note I exported a whole bucket so the ILP file contains entries for several measurements/tables.
I want to load this into QuestDB, but as far as I can see the HTTP endpoint only supports loading CSV files. I know QuestDB supports ingesting ILP, but the official clients don't accept sending an ILP file; with the client libraries I have to compose an object representing each point and then send it over. I could read the file line by line, parse it, and then use the Python client to send the points, but I am wondering if there is a better way.

QuestDB does support the ILP protocol: you can send your ILP points straight over a socket connection to the TCP port (9009 by default).
The official client libraries do exactly that, but in a more convenient way, so you don't have to compose the raw message yourself, which can be error-prone.
In your case, since you already have valid ILP points, you can simply iterate over your file and send each line over the socket. Here is a basic example using Python:
import socket
import sys

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

def send_utf8(msg):
    print(msg)
    sock.sendall(msg.encode())

if __name__ == '__main__':
    try:
        sock.connect(('localhost', 9009))
        with open("YOUR_FILE") as infile:
            for line in infile:
                # print(line)
                send_utf8(line)
    except socket.error as e:
        sys.stderr.write(f'Got error: {e}')
    sock.close()
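If you prefer to do the same from the JVM, the approach is identical: open a plain TCP socket to port 9009 and stream the file bytes through it unchanged. Below is a minimal sketch (host, port, and file path are placeholders for your setup):

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Paths;

public class IlpFileSender {
    public static void main(String[] args) throws IOException {
        // Stream the exported ILP file, unchanged, to QuestDB's ILP TCP
        // listener (default port 9009); every line is already a valid point.
        try (Socket socket = new Socket("localhost", 9009);
             OutputStream out = socket.getOutputStream()) {
            Files.copy(Paths.get("YOUR_FILE"), out);
            out.flush();
        }
    }
}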

Related

How to configure checkpointing on an XTDB node using AWS S3

I am using XTDB 1.21.0 deployed on AWS/ECS (Fargate) with checkpoints configured (frequency 30 minutes) and stored in an S3 bucket (RocksDB). After a couple of successful checkpoints, they seem to be constantly failing with an XTDB warning caused by an exception in the HTTP request to AWS, as shown below:
This leaves the S3 bucket with incomplete checkpoints (i.e., a folder containing a set of SSTs and other RocksDB files, with no associated EDN index file).
The XTDB documentation mentions that an optional S3Configurator can be passed to the node configuration, and after a bit of Googling around I figured that makeClient should be overridden so that connectionAcquisitionTimeout can be set:
NettyNioAsyncHttpClient.builder()
    .maxConcurrency(200)
    .connectionAcquisitionTimeout(Duration.ofMillis(20000))
I am not too familiar with Netty, so I would appreciate it if someone could help with the right incantation.
Also, I am configuring the XTDB node from an EDN file and haven't figured out how to write an S3 configurator in an EDN file (or whether that is even possible).
Thanks in advance!
This can happen for large datasets, where the default S3 client creates a new async request for each object (and the number of objects may be very large, particularly if you are using the RocksDB index). Internally it uses connectionAcquisitionTimeout as a form of backpressure, to ensure that incoming requests don't wait indefinitely for a connection from the connection pool. In this case, however, we are the only source of these requests and we definitely want them to complete before the node starts, so it is reasonable to set connectionAcquisitionTimeout to something very high (the default is only 10 seconds). A good choice of limit is something like the maximum amount of time you are willing to wait for the node to start before failing.
This appears to be a non-optional parameter of the SDK, presumably as a sensible default strategy for requests coming from an external source; in our case we essentially want it to behave as if it were a synchronous operation.
Configuring this in Clojure with xtdb would look something like this:
(ns foo.db
  (:require
   [xtdb.api :as xtdb]
   [xtdb.checkpoint]
   [xtdb.rocksdb]
   [xtdb.s3.checkpoint])
  (:import
   (java.time Duration)
   (software.amazon.awssdk.http.nio.netty NettyNioAsyncHttpClient)
   (software.amazon.awssdk.services.s3 S3AsyncClient)
   (xtdb.checkpoint Checkpointer)
   (xtdb.s3 S3Configurator)))

(def s3-configurator
  (reify S3Configurator
    (makeClient [this]
      (.. (S3AsyncClient/builder)
          (httpClientBuilder
           (.. (NettyNioAsyncHttpClient/builder)
               (connectionAcquisitionTimeout
                (Duration/ofSeconds 600)) ;; Set a high limit here
               ;; We can rely on the defaults for maxConcurrency and
               ;; maxPendingConnectionAcquires
               ;; (maxConcurrency (Integer. 200))
               ;; (maxPendingConnectionAcquires (Integer. 10000))
               ))
          (build)))))

(defn start-node!
  []
  (xtdb/start-node
   {:xtdb/index-store
    {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                :db-dir "/var/xtdb/idxs"
                :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                               :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
                                       :configurator (constantly s3-configurator)
                                       :bucket "checkpoints"}
                               :approx-frequency "PT3H"}}}}))
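If you would rather wire this up from Java than Clojure, a rough equivalent of the configurator above could look like the following sketch (it uses the same interface and builder classes imported in the Clojure example; the 600-second timeout is just an example value):

import java.time.Duration;

import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import xtdb.s3.S3Configurator;

public class PatientS3Configurator implements S3Configurator {
    @Override
    public S3AsyncClient makeClient() {
        return S3AsyncClient.builder()
                .httpClientBuilder(
                        NettyNioAsyncHttpClient.builder()
                                // Allow checkpoint uploads to wait much longer than
                                // the 10-second default for a pooled connection.
                                .connectionAcquisitionTimeout(Duration.ofSeconds(600)))
                .build();
    }
}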

PutSFTP is taking a wrong path on the SFTP server in NiFi

I have a flow that fetches a file from an SFTP server, renames it, and puts it back on the server in the same location.
My flow:
ListSFTP -> FetchSFTP -> UpdateAttribute -> PutSFTP
My file location is on the D: drive, and I have set that location in the Remote Path property of PutSFTP, but it builds the path like
c:/users/myname/d:/file/location
and of course that gives me an error.
Is there any solution for this?
Thanks in advance.
You can use the SFTP processors only when you are talking to a server with a host, port, etc.
If you just want to pick up files from your local disk (C:/ for example), use the GetFile processor instead.
An example flow could be:
GetSFTP with the property Keep Source File set to false
UpdateAttribute
new property -> filename -> new_file_test.example
PutSFTP
You can mix and match GetSFTP/GetFile with PutSFTP/PutFile as needed.

Does anybody know if OrcTableSource supports the S3 file system?

I'm running into some trouble using OrcTableSource to fetch an ORC file from cloud object storage (IBM COS); the code fragment is provided below:
OrcTableSource soORCTableSource = OrcTableSource.builder()
    // path to ORC
    .path("s3://orders/so.orc") // s3://orders/so.csv
    // schema of ORC files
    .forOrcSchema(OrderHeaderORCSchema)
    .withConfiguration(orcconfig)
    .build();
It seems this path is incorrect; can anyone help out? Much appreciated!
Caused by: java.io.FileNotFoundException: File /so.orc does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
    at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:528)
    at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:370)
    at org.apache.orc.OrcFile.createReader(OrcFile.java:342)
    at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
    at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)
By the way, I've already set up flink-s3-fs-presto-1.6.2 and have the following code running correctly. The question is limited to OrcTableSource only.
DataSet<Tuple5<String, String, String, String, String>> orderinfoSet =
    env.readCsvFile("s3://orders/so.csv")
       .types(String.class, String.class, String.class,
              String.class, String.class);
The problem is that Flink's OrcRowInputFormat uses two different file systems: one for generating the input splits and one for reading the actual input splits. For the former it uses Flink's FileSystem abstraction, and for the latter it uses Hadoop's FileSystem. Therefore, you need to add the following snippet to Hadoop's core-site.xml configuration:
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
See this link for more information about setting up S3 for Hadoop.
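If editing core-site.xml is awkward in your deployment, setting the same keys programmatically on the Hadoop Configuration you pass to withConfiguration may also work (a hedged sketch; the fs.s3a.* entries are the standard Hadoop S3A keys, to be filled in with your IBM COS values):

import org.apache.hadoop.conf.Configuration;

public class OrcS3Config {
    public static Configuration s3Configuration() {
        Configuration conf = new Configuration();
        // Map the s3:// scheme to the S3A implementation for Hadoop's
        // FileSystem, which OrcRowInputFormat uses to open the ORC file.
        conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        conf.set("fs.s3a.access.key", "<access key>");
        conf.set("fs.s3a.secret.key", "<secret key>");
        conf.set("fs.s3a.endpoint", "<IBM COS endpoint>");
        return conf;
    }
}

Pass the resulting Configuration to OrcTableSource.builder().withConfiguration(...) exactly as in the snippet from the question.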
This is a limitation of Flink's OrcRowInputFormat and should be fixed. I've created the corresponding issue.

Mule File Inbound - empty files are not triggered

I have a scenario wherein I need to read files from a particular folder, so I have a File inbound endpoint as below. It reads all non-empty files, but empty files are not read and sit in the same location as is.
<file:inbound-endpoint path="${file.path}" responseTimeout="10000" doc:name="File"
                       moveToDirectory="${audit.location}">
    <file:filename-regex-filter pattern="file.employee(.*).xml,file.client(.*).xml"
                                caseSensitive="true"/>
</file:inbound-endpoint>
I removed the file filter, but it still doesn't read empty files.
Is there a way to make the File inbound endpoint read empty files too?
According to the Mule File Connector documentation:
The File connector as inbound endpoint does not process empty (0 bytes) files.
So this behavior is expected, and there is no documented way to process empty files with the File inbound endpoint.
However, you can still write your own connector to do this, or use a workaround such as filling your "empty" files with a single character (such as a space) to make them non-empty.
If you want to read a file with a size of 0 KB, you can't achieve this with the File connector, but you can read the file by using the MuleRequester module in the flow. I will share a sample snippet soon. Please let me know if you need any help.
The Mule File connector does not process empty (0 bytes) files as an inbound endpoint.
As far as I know, the File inbound connector will not process 0 KB files.
In the File connector, the class org.mule.transport.file.FileMessageReceiver.java has the following in its poll method:
if (file.length() == 0)
{
    if (logger.isDebugEnabled())
    {
        logger.debug("Found empty file '" + file.getName() + "'. Skipping file.");
    }
    continue;
}
which prevents it from processing empty files.
But you can create your own CustomFileMessageReceiver.java. Create your package:
package com.mycompany.mule.transport.file;
and in it a class that extends AbstractPollingMessageReceiver:
public class CustomFileMessageReceiver extends AbstractPollingMessageReceiver
Copy the original FileMessageReceiver.java methods, but comment out the lines shown above and change FileMessageReceiver to CustomFileMessageReceiver where needed.
The call fileConnector.move(file, workFile) uses a protected method from the original package, so comment it out, and be aware that you cannot use the work directory.
In the same package, create a copy of org.mule.transport.file.ReceiverFileInputStream.java.
Configure your connector:
<file:connector name="FILE" readFromDirectory="${incoming.directory}" autoDelete="true"
                streaming="false" recursive="true" validateConnections="true" doc:name="File"
                writeToDirectory="${processed.directory}">
    <service-overrides messageReceiver="com.mycompany.mule.transport.file.CustomFileMessageReceiver" />
</file:connector>
Or you may implement your own file connector, as stated in the above answers.

How to send a file with parameters to SoapUI

I'm testing a web service request with SoapUI and need to feed the request with parameters stored in a config file.
The parameters are:
Key - a static parameter, used for access to the web service
Parameters 1-4 - dynamic, they can change in the config file and be sent to the web service again
I need to send all parameters, both Key and Parameters 1-3, to the SoapUI request from the config file using a Groovy script.
The request to the web service looks like:
<Header/>
<Body>
    <request>
        <accesskeytoservice>
            <key>Key</key>
        </accesskeytoservice>
        <UseService>
            <parameter1>Parameter1</parameter1>
            <parameter2>Parameter2</parameter2>
            <parameter3>Parameter3</parameter3>
        </UseService>
    </request>
</Body>
I have tried storing the data in CSV, TXT, and XML formats and then reading it into the SoapUI request by parsing the file, but none of my attempts worked properly.
What format of config file is preferred in this case?
How are you opening and reading your config file? Using Groovy? If you have successfully done that part, the rest is simpler. For each parameter, read its name and value and call
context.setProperty(name, value)
looping until you reach the end of the file.
