How to insert two docs to solr as one document - solr

I have two sets of documents. One document contains the name of the person, the corresponding rank and the doc id; this document is in CSV format. A screenshot of it is below.
The other set of documents contains paragraphs. Here is a screenshot of that other set; these documents are named by doc id and are in text format.
I need to insert these two as one document in Solr, so that in Solr I have a document of the form:
Person: arthur w cabot
KDE Rank: 5.98+108
Text: Text from the other set of documents
How can I achieve this? Also, I would like to know if there is another approach I can follow.

In your case you can build the Solr document and commit it to Solr, something like below:
SolrInputDocument document = new SolrInputDocument();
document.addField("id", "123456");
document.addField("title", fileName);
document.addField("text", contentBuilder.toString());
solr.add(document);
solr.commit();
In your case the fields are personName, personRank and the document content.
I assume that reading the CSV file is done on your end, that you retrieve the document name from it, and that you already know where the document is located.
As mentioned, when you read the CSV file you get the data for personName and personRank directly.
The third field is the document content. As you only get the document file name from the CSV, you can read the content of that file and pass it to the Solr document as the third field.
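Just as an illustration, here is a minimal sketch of that CSV side. The file path, the column order (docId, personName, personRank) and the naming of the text files are assumptions on my part, so adjust them to your actual data:
// Sketch only: assumes a header-less csv with columns docId,personName,personRank
// and one text file per row, named after the doc id.
try (Stream<String> rows = Files.lines(Paths.get("D:/LogFolder/persons.csv"), StandardCharsets.UTF_8)) {
    rows.map(row -> row.split(","))
        .forEach(cols -> {
            String docId = cols[0];
            String personName = cols[1];
            String personRank = cols[2];
            // read the text file that belongs to docId and build one SolrInputDocument
            // with personName, personRank and the file content as its fields
        });
} catch (IOException e) {
    e.printStackTrace();
}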
I have put together one option for you, something like below:
String urlString = "http://localhost:8983/solr/TestCore";
SolrClient solr = new HttpSolrClient.Builder(urlString).build();

StringBuilder contentBuilder = new StringBuilder();
try (Stream<String> stream = Files.lines(Paths.get("D:/LogFolder/IB4_buildViewSchema.txt"),
        StandardCharsets.UTF_8)) {
    stream.forEach(s -> contentBuilder.append(s).append("\n"));
} catch (IOException e) {
    e.printStackTrace();
}

try {
    File file = new File("D:/LogFolder/IB4_buildViewSchema.txt");
    String fileName = file.getName();
    SolrInputDocument document = new SolrInputDocument();
    document.addField("id", "123456");
    document.addField("title", fileName);
    document.addField("text", contentBuilder.toString());
    solr.add(document);
    solr.commit();
} catch (SolrServerException | IOException e) {
    e.printStackTrace();
}
You would run this iteratively for all the rows of the CSV.
Check whether you can do it in batches (see the sketch below), and look at optimizing the code as well.
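If you do try batches, one simple variant (a sketch, not tested against your data) is to collect several SolrInputDocument instances and send them with a single solr.add(Collection) call, committing once at the end. Here rows and readDocumentContent(...) are hypothetical placeholders for your parsed CSV rows and your file-reading code, and the batch size is arbitrary:
List<SolrInputDocument> batch = new ArrayList<>();
for (String[] cols : rows) {                            // rows: hypothetical parsed csv rows
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", cols[0]);
    doc.addField("personName", cols[1]);
    doc.addField("personRank", cols[2]);
    doc.addField("text", readDocumentContent(cols[0])); // hypothetical helper that loads the text file
    batch.add(doc);
    if (batch.size() == 500) {                           // arbitrary batch size
        solr.add(batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    solr.add(batch);
}
solr.commit();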
This code is not a foolproof solution to your problem.
I verified that the data is indexed in Solr by querying it from the Solr admin page.
Please refer to the image below:
Note: I built a Maven project and wrote the above piece of code in it. If you want, you can use the pom.xml below for reference.
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>solr</groupId>
    <artifactId>TestSolr2</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>TestSolr2</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.compiler.source>1.8</maven.compiler.source>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.solr</groupId>
            <artifactId>solr-solrj</artifactId>
            <version>7.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.solr</groupId>
            <artifactId>solr-cell</artifactId>
            <version>7.6.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

Related

Does org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer implement SinkFunction<T>?

I am trying to implement a simple Flink job that uses org.apache.flink.streaming.connectors, takes a Kafka topic as the input source and outputs to a Kafka sink. I am following this guide https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/ and wrote code such as:
FlinkKafkaConsumer<String> kafkaConsumer = new FlinkKafkaConsumer<>(TOPIC_IN, new SimpleStringSchema(), props); //FlinkKafkaConsumer<String> testKafkaConsumer = new FlinkKafkaConsumer<>(TOPIC_TEST, new SimpleStringSchema(), props);
kafkaConsumer.setStartFromEarliest();
DataStream<String> dataStream = env.addSource(kafkaConsumer);
StringSchema stringSchema = new StringSchema(TOPIC_OUT);
FlinkKafkaProducer<String> kafkaProducer = new FlinkKafkaProducer<>(TOPIC_OUT, stringSchema, props, FlinkKafkaProducer.Semantic.EXACTLY_ONCE);
//addSink((SinkFunction<String>) kafkaProducer);
dataStream.addSink(kafkaProducer);
However, addSink needs a SinkFunction while I provide a FlinkKafkaProducer, which extends TwoPhaseCommitSinkFunction. I am confused why it complains and does not work.
My pom.xml file is as follows
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_2.11</artifactId>
<version>1.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-base</artifactId>
<version>1.13.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.12</artifactId>
<version>1.13.2</version>
<scope>provided</scope>
</dependency>
It seems this class has been deprecated: https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/kafka/package-summary.html.
There is no FlinkKafkaProducer constructor with the method signature you're using. You could use this one:
public FlinkKafkaProducer(
        String topicId,
        SerializationSchema<IN> serializationSchema,
        Properties producerConfig,
        @Nullable FlinkKafkaPartitioner<IN> customPartitioner,
        FlinkKafkaProducer.Semantic semantic,
        int kafkaProducersPoolSize)
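For example (an untested sketch reusing the variables from your snippet), passing null keeps the default partitioning, and DEFAULT_KAFKA_PRODUCERS_POOL_SIZE is the pool-size constant the class exposes:
FlinkKafkaProducer<String> kafkaProducer = new FlinkKafkaProducer<>(
        TOPIC_OUT,
        new SimpleStringSchema(),                        // SerializationSchema<String>
        props,
        null,                                            // no custom partitioner
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE,
        FlinkKafkaProducer.DEFAULT_KAFKA_PRODUCERS_POOL_SIZE);
dataStream.addSink(kafkaProducer);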

Undocumented Constraint? publishing to topic *from* pubsub trigger

I don't know if I'm going crazy, or if this is a limitation that just isn't documented (I've scoured the GCP API docs):
Is it possible to have a cloud function with a pubsub trigger on 'topic A', and inside that cloud function, publish a message to 'topic B'.
I've tried all the other triggers with identical code running (cloud functions as HTTP triggers, Cloud Storage Triggers, Firebase Triggers), and they all successfully publish to topics.
But the moment I (almost literally) copy-paste my code into a pubsub trigger, after consuming the message, when it attempts to publish its own message to the next topic, it just hangs. The function just times out when attempting to publish.
So to recap, is the following possible in GCP?
PubSub Topic A --> Cloud Function --> Pubsub Topic B
Thanks in advance for any clarifications! This is all in Java 11. Here's the code:
...<bunch of imports>
public class SignedURLGenerator implements BackgroundFunction<PubSubMessage> {

    private static final String PROJECT_ID = System.getenv("GOOGLE_CLOUD_PROJECT");
    private static final Logger logger = Logger.getLogger(SignedURLGenerator.class.getName());

    /**
     * Handle the incoming PubsubMessage
     **/
    @Override
    public void accept(PubSubMessage message, Context context) throws IOException, InterruptedException {
        String data = new String(Base64.getDecoder().decode(message.data));
        System.out.println("The input message is: " + data.toString());
        //Do a bunch of other stuff not relevant to the issue at hand...
        publishSignedURL(url.toString());
    }
    //Here's the interesting part
    public static void publishSignedURL(String message) throws IOException, InterruptedException {
        String topicName = "url-ready-notifier";
        String responseMessage;
        Publisher publisher = null;
        try {
            // Create the PubsubMessage object
            ByteString byteStr = ByteString.copyFrom(message, StandardCharsets.UTF_8);
            PubsubMessage pubsubApiMessage = PubsubMessage.newBuilder().setData(byteStr).build();
            System.out.println("Message Constructed:" + message);
            //This part works fine, the message gets constructed

            publisher = Publisher.newBuilder(ProjectTopicName.of(PROJECT_ID, topicName)).build();
            System.out.println("Publisher Created.");
            //This part also works fine, the publisher gets created

            publisher.publish(pubsubApiMessage).get();
            responseMessage = "Message published.";
            //The code NEVER GETS HERE. The message is never published. And eventually the cloud function times out :(
        } catch (InterruptedException | ExecutionException e) {
            System.out.println("Something went wrong with publishing: " + e.getMessage());
        }
        System.out.println("Everything wrapped up.");
    }
}
Edit
As requested, this is my current POM
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cloudfunctions</groupId>
<artifactId>pubsub-function</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.target>11</maven.compiler.target>
<maven.compiler.source>11</maven.compiler.source>
</properties>
<dependencies>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>libraries-bom</artifactId>
<version>20.6.0</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>com.google.cloud.functions</groupId>
<artifactId>functions-framework-api</artifactId>
<version>1.0.1</version>
<type>jar</type>
</dependency>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-storage</artifactId>
<version>1.117.1</version>
</dependency>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-pubsub</artifactId>
<version>1.113.4</version>
</dependency>
<dependency>
<groupId>com.google.api</groupId>
<artifactId>gax</artifactId>
<version>1.66.0</version>
</dependency>
<dependency>
<groupId>com.google.api</groupId>
<artifactId>gax-grpc</artifactId>
<version>1.66.0</version>
</dependency>
<dependency>
<groupId>org.threeten</groupId>
<artifactId>threetenbp</artifactId>
<version>0.7.2</version>
</dependency>
</dependencies>
</project>
Can you try to explicitly set the flow control params in your publisher client? Like this:
publisher = Publisher.newBuilder(ProjectTopicName.of(PROJECT_ID, topicName))
        .setBatchingSettings(BatchingSettings.newBuilder()
                .setDelayThreshold(Duration.of(10, ChronoUnit.SECONDS))
                .setElementCountThreshold(1L)
                .setIsEnabled(true)
                .build())
        .build();
I don't know what happens; maybe it's a default, global configuration of PubSub. If it's not that, I will delete this answer.
EDIT 1
Here is a screen capture of the builder class in the Publisher parent class:
It shows all the default values of the library. However, the behavior that you observe isn't normal. The defaults must stay the defaults even if you are in a PubSub trigger. I will open an issue and forward it to the team directly.

Apache CXF WebClient Doesn't Work As Expected in TomEE 8

I am trying to get the JWK key set from Google for use with the Apache CXF OIDC and JOSE libs. The code works fine when I run it in a standalone main method.
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        final WebClient client = WebClient.create("https://www.googleapis.com/oauth2/v3/certs",
                Arrays.asList(new JsonWebKeysProvider()), true).accept(MediaType.APPLICATION_JSON);
        JsonWebKeys keys = client.get(JsonWebKeys.class);
        keys.getKeys().forEach(key -> {
            System.out.println("****************************************************************************");
            System.out.println("ID........." + key.getKeyId());
            System.out.println("Alg........" + key.getAlgorithm());
            System.out.println("Key Type..." + key.getKeyType());
            System.out.println("Use........" + key.getPublicKeyUse());
        });
    }
}
The ID, algorithm, key type and use are printed properly, meaning that the keys are properly populated.
Sample output:
****************************************************************************
ID.........79c809dd1186cc228c4baf9358599530ce92b4c8
Alg........RS256
Key Type...RSA
Use........sig
****************************************************************************
ID.........17d55ff4e10991d6b0efd392b91a33e54c0e218b
Alg........RS256
Key Type...RSA
Use........sig
pom.xml extract for Main class.
<dependencies>
<dependency>
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-rs-client</artifactId>
<version>3.3.5</version>
</dependency>
<dependency>
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-rs-security-sso-oidc</artifactId>
<version>3.3.5</version>
</dependency>
</dependencies>
The same code, however, doesn't work when deployed in TomEE 8.
@WebServlet(name = "NewServlet", urlPatterns = {"/x"})
public class NewServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        PrintWriter writer = response.getWriter();
        final WebClient client = WebClient.create("https://www.googleapis.com/oauth2/v3/certs",
                Arrays.asList(new JsonWebKeysProvider()), true).accept(MediaType.APPLICATION_JSON);
        JsonWebKeys keys = client.get(JsonWebKeys.class);
        keys.getKeys().forEach(key -> {
            writer.println("****************************************************************************");
            writer.println("ID........." + key.getKeyId());
            writer.println("Alg........" + key.getAlgorithm());
            writer.println("Key Type..." + key.getKeyType());
            writer.println("Use........" + key.getPublicKeyUse());
        });
    }
}
The ID, algorithm, key type and use are null when this code runs in TomEE 8. I have added the CXF OIDC lib, and the JOSE jars are installed in the tomee/lib folder.
Sample output:
****************************************************************************
ID.........null
Alg........null
Key Type...null
Use........null
****************************************************************************
ID.........null
Alg........null
Key Type...null
Use........null
pom.xml extract for the servlet.
<dependencies>
<dependency>
<groupId>org.apache.tomee</groupId>
<artifactId>javaee-api</artifactId>
<version>8.0-3</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-frontend-jaxrs</artifactId>
<version>${cxf.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-rs-security-sso-oidc</artifactId>
<version>${cxf.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-rs-client</artifactId>
<version>${cxf.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
What is causing this issue?
I realized that when the WebClient is created inside TomEE, it picks up bus properties provided by TomEE, which was causing JsonWebKeysProvider not to be invoked.
In my case, the correct way to create the client inside TomEE is shown below.
JAXRSClientFactoryBean sf = new JAXRSClientFactoryBean();
sf.setAddress("https://www.googleapis.com/oauth2/v3/certs");
sf.setProvider(new JsonWebKeysProvider());
sf.setBus(new ExtensionManagerBus());
Calling sf.setBus(new ExtensionManagerBus()); ensures TomEE-provided values/properties aren't picked up.
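For completeness, a small sketch of how the WebClient can then be obtained from the factory bean and used as in the standalone example (createWebClient() comes from JAXRSClientFactoryBean; the rest mirrors the original snippet):
JAXRSClientFactoryBean sf = new JAXRSClientFactoryBean();
sf.setAddress("https://www.googleapis.com/oauth2/v3/certs");
sf.setProvider(new JsonWebKeysProvider());
sf.setBus(new ExtensionManagerBus());
// build the client from the factory bean and fetch the key set as before
WebClient client = sf.createWebClient().accept(MediaType.APPLICATION_JSON);
JsonWebKeys keys = client.get(JsonWebKeys.class);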

How can I query any field with MongoRepository

Assume the domain object (MyDomain) has many fields (f1, f2, f3 ... f100). I define a MyDomainRepository from MongoRepository, and I want to take the field name and value as parameters instead of hard-coding the field name as part of the query method, like below:
List<MyDomain> findByNameAndValue(String name, String value);
If the name and value are "f1" and "foo", the method should find all documents whose field "f1" equals "foo".
I have googled for hours with no luck.
Any help is appreciated, thanks!
You need to use QueryDSL predicates.
First, add the following dependencies to your pom.xml (assuming you're using Maven to build your project):
<dependencies>
    ...
    <dependency>
        <groupId>com.querydsl</groupId>
        <artifactId>querydsl-apt</artifactId>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.querydsl</groupId>
        <artifactId>querydsl-mongodb</artifactId>
    </dependency>
    ...
</dependencies>
Also add this to your build plugins:
<build>
    <plugins>
        ...
        <plugin>
            <groupId>com.mysema.maven</groupId>
            <artifactId>apt-maven-plugin</artifactId>
            <version>1.1.3</version>
            <executions>
                <execution>
                    <goals>
                        <goal>process</goal>
                    </goals>
                    <configuration>
                        <outputDirectory>target/generated-sources/java</outputDirectory>
                        <processor>org.springframework.data.mongodb.repository.support.MongoAnnotationProcessor</processor>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        ...
    </plugins>
</build>
Your repository must extend QueryDslPredicateExecutor:
public interface MyDomainRepository extends MongoRepository<MyDomain, String>,
QueryDslPredicateExecutor<MyDomain> { }
Your repository will then inherit
public Iterable<MyDomain> findAll(Predicate predicate)
and a few other methods.
When you build your project, QueryDSL will generate Q-classes for you, that you can use to programmatically build predicates and query documents matching your predicates:
QMyDomain q = QMyDomain.myDomain;
Predicate p = q.f1.eq(value);
Iterable<MyDomain> i = repository.findAll(p);
To query your resources using a REST controller, you'll need something similar to:
@RestController
@RequestMapping("/mydomain")
public class MyDomainController {

    @Autowired
    private MyDomainRepository repository;

    @GetMapping("/search/query")
    public List<MyDomain> query(@QuerydslPredicate(root = MyDomain.class) Predicate predicate) {
        return repository.findAll(predicate);
    }
}
This last piece of code is quick and dirty; it probably won't work as is (at the very least, findAll returns an Iterable rather than a List), but you get the idea.
pvpkiran is right, there is no such thing out of the box. You need to build your own using an injected MongoTemplate, for instance:
List<MyDomain> findByNameAndValue(String name, String value) {
    Document document = new Document(name, value);
    Query query = new BasicQuery(document.toJson());
    return mongoTemplate.find(query, MyDomain.class);
}
The interesting thing is that you can go a little further and pass several name/value using a Map:
List<MyDomain> findByNamesAndValues(Map<String, String> parameters) {
    Document document = new Document(parameters);
    Query query = new BasicQuery(document.toJson());
    return mongoTemplate.find(query, MyDomain.class);
}
Just in case, that works with a QueryDSL predicate too:
List<MyDomain> findByNamesAndValues(Predicate predicate) {
    AbstractMongodbQuery mongoQuery = new SpringDataMongodbQuery(mongoTemplate, MyDomain.class)
            .where(predicate);
    Query query = new BasicQuery(mongoQuery.toString());
    return mongoTemplate.find(query, MyDomain.class);
}
These methods can be further improved to handle pagination and other cool features such as field inclusion/exclusion.
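For example, pagination can be layered on with Spring Data's Pageable, which Query supports via with(); a minimal sketch, assuming the caller passes the page request:
List<MyDomain> findByNamesAndValues(Map<String, String> parameters, Pageable pageable) {
    Document document = new Document(parameters);
    // with(pageable) applies the page offset, size and sort to the query
    Query query = new BasicQuery(document.toJson()).with(pageable);
    return mongoTemplate.find(query, MyDomain.class);
}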

JDBI: I want to bulk update using something like bulk insert without creating an object

I have to do a batch update using JDBI, similar to a batch insert, without creating an object. If anyone knows the process, please let me know. Remember, I don't want to use an object, i.e. map columns to an object's attributes.
Use argument binding.
Perhaps this is what you're looking for?
PreparedBatch insertBatch = handle.prepareBatch("INSERT INTO foo.bar (baz) VALUES (:bazArgument)");
// assume what you want to insert is stored in a List<String> bazes
for (String st : bazes) {
    insertBatch.bind("bazArgument", st).add();
}
int[] countArray = insertBatch.execute();
You can extend it for more variables etc.
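Since the question is about updates, the same pattern works with an UPDATE statement. A sketch, assuming the new values are keyed by id in a Map<Long, String> called updates:
PreparedBatch updateBatch = handle.prepareBatch("UPDATE foo.bar SET baz = :baz WHERE id = :id");
for (Map.Entry<Long, String> entry : updates.entrySet()) {
    updateBatch.bind("id", entry.getKey())
               .bind("baz", entry.getValue())
               .add();
}
int[] updateCounts = updateBatch.execute();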
Here is a simple example for a batch operation with JDBI and MySQL database. The table is of InnoDB type.
package com.zetcode;

import org.skife.jdbi.v2.Batch;
import org.skife.jdbi.v2.DBI;
import org.skife.jdbi.v2.Handle;

public class JDBIEx6 {

    public static void main(String[] args) {

        DBI dbi = new DBI("jdbc:mysql://localhost:3306/testdb",
                "testuser", "test623");

        Handle handle = dbi.open();
        Batch batch = handle.createBatch();

        batch.add("DROP TABLE IF EXISTS Friends");
        batch.add("CREATE TABLE Friends(Id INT AUTO_INCREMENT PRIMARY KEY, Name TEXT)");
        batch.add("INSERT INTO Friends(Name) VALUES ('Monika')");
        batch.add("INSERT INTO Friends(Name) VALUES ('Tom')");
        batch.add("INSERT INTO Friends(Name) VALUES ('Jane')");
        batch.add("INSERT INTO Friends(Name) VALUES ('Robert')");

        batch.execute();
    }
}
The following is a Maven POM file for the project.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.zetcode</groupId>
    <artifactId>JDBIEx6</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.jdbi</groupId>
            <artifactId>jdbi</artifactId>
            <version>2.73</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.39</version>
        </dependency>
    </dependencies>
</project>
You can learn more about JDBI from my tutorial.
