Quarkus server-side http-cache - resteasy

I tried to find out how to configure a server-side REST client (i.e. microservice A calls another microservice B using REST) to use an HTTP cache.
The background is that the binary entities transferred over the wire can be quite large. Overall performance can benefit from a cache on microservice A's side that honors the HTTP caching headers and ETags provided by microservice B.
I found a solution that seems to work, but I'm not sure whether it is a proper solution that copes with concurrent requests, which can occur on microservice A at any time.
@Inject
/* package private */ ManagedExecutor executor;

//
// Instead of using a declarative REST client we create it ourselves, because we can then supply a server-side cache: see ctor()
//
private ServiceBApi serviceClientB;

@ConfigProperty(name = "serviceB.url")
/* package private */ String serviceBUrl;

@ConfigProperty(name = "cache-entries")
/* package private */ int cacheEntries;

@ConfigProperty(name = "cache-entrysize")
/* package private */ int cacheEntrySize;
@PostConstruct
public void ctor()
{
// Create proxy ourselves, because we can then supply a server-side cache
final CacheConfig cc = CacheConfig.custom()
.setMaxCacheEntries(cacheEntries)
.setMaxObjectSize(cacheEntrySize)
.build();
final CloseableHttpClient httpClient = CachingHttpClientBuilder.create()
.setCacheConfig(cc)
.build();
final ResteasyClient client = new ResteasyClientBuilderImpl()
.httpEngine(new ApacheHttpClient43Engine(httpClient))
.executorService(executor)
.build();
final ResteasyWebTarget target = (ResteasyWebTarget) client.target(serviceBUrl);
this.serviceClientB = target.proxy(ServiceBApi.class);
}
@Override
public byte[] getDoc(final String id)
{
try (final Response response = serviceClientB.getDoc(id)) {
[...]
// Use the response normally; no need to handle conditional GETs, caching headers or other HTTP protocol details here, because the underlying implementation takes care of that.
[...]
}
}
My questions are:
Is my solution OK as a server-side solution, i.e. can it handle concurrent requests?
Is there a declarative (Quarkus) way (@RegisterRestClient etc.) to achieve the same?
--
Edit
To make things clear: I want service B to be able to control the caching based on the HTTP GET request and the specific resource. Additionally, I want to avoid unnecessary transmission of the large documents service B provides.
--
Mik

Assuming that you have worked with the declarative way of using Quarkus' REST Client before, you would just inject the client in your serviceB-consuming class. The method that will invoke service B should be annotated with @CacheResult. This will cache results depending on the incoming id. See also the Quarkus Cache Guide.
Please note: As Quarkus and Vert.x are all about non-blocking operations, you should use the async support of the REST Client.
@Inject
@RestClient
ServiceBApi serviceB;
...
@Override
@CacheResult(cacheName = "service-b-cache")
public Uni<byte[]> getDoc(final String id) {
    return serviceB.getDoc(id).map(...);
}
...
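For completeness, here is a minimal sketch of what the declarative client interface for service B could look like. The interface name comes from the question; the resource path, the config key and the choice of javax vs. jakarta imports are assumptions and depend on your Quarkus version and on service B's actual API:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

import io.smallrye.mutiny.Uni;

// Hypothetical declarative REST client for service B. The base URL would be configured via
// quarkus.rest-client."service-b".url (or serviceB.url mapped onto it) in application.properties.
@Path("/docs")
@RegisterRestClient(configKey = "service-b")
public interface ServiceBApi {

    @GET
    @Path("/{id}")
    Uni<byte[]> getDoc(@PathParam("id") String id);
}

Note that @CacheResult is an application-level cache keyed by the method arguments; it does not by itself issue conditional GETs or honor service B's ETags. If header-driven revalidation is a hard requirement, the programmatic CachingHttpClientBuilder setup from the question remains a reasonable complement.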

Related

Flink integration test(s) with Testcontainers

I have a simple Apache Flink job that looks very much like this:
public final class Application {
public static void main(final String... args) throws Exception {
final var env = StreamExecutionEnvironment.getExecutionEnvironment();
final var executionConfig = env.getConfig();
final var params = ParameterTool.fromArgs(args);
executionConfig.setGlobalJobParameters(params);
executionConfig.setParallelism(params.getInt("application.parallelism"));
final var source = KafkaSource.<CustomKafkaMessage>builder()
.setBootstrapServers(params.get("application.kafka.bootstrap-servers"))
.setGroupId(params.get("application.kafka.consumer.group-id"))
// .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
.setStartingOffsets(OffsetsInitializer.earliest())
.setTopics(params.get("application.kafka.listener.topics"))
.setValueOnlyDeserializer(new MessageDeserializationSchema())
.build();
env.fromSource(source, WatermarkStrategy.noWatermarks(), "custom.kafka-source")
.uid("custom.kafka-source")
.rebalance()
.flatMap(new CustomFlatMapFunction())
.uid("custom.flatmap-function")
.filter(new CustomFilterFunction())
.uid("custom.filter-function")
.addSink(new CustomDiscardSink()) // Will be a Kafka sink in the future
.uid("custom.discard-sink");
env.execute(params.get("application.job-name"));
}
}
The problem is that I would like to provide an integration test for the entire application, sort of an end-to-end (set of) test(s) for the entire job. I'm using Testcontainers, but I'm not really sure how to move forward with this. For instance, this is what the test looks like (for now):
@Testcontainers
final class ApplicationTest {

    private static final DockerImageName DOCKER_IMAGE = DockerImageName.parse("confluentinc/cp-kafka:7.0.1");

    @Container
    private static final KafkaContainer KAFKA_CONTAINER = new KafkaContainer(DOCKER_IMAGE);

    @ClassRule // How come this works in JUnit Jupiter? :/
    public static MiniClusterResource cluster;

    @BeforeAll
    static void init() {
        KAFKA_CONTAINER.start();
        // ...probably need to wait and create the topic(s) as well
        final var config = new MiniClusterResourceConfiguration.Builder().setNumberSlotsPerTaskManager(2)
                .setNumberTaskManagers(1)
                .build();
        cluster = new MiniClusterResource(config);
    }

    @Test
    void main() throws Exception {
        // new Application(); // ...what's next?
    }
}
I'm not sure how to implement what's required to trigger the job as-is from that point on. Basically, I would like to execute what was defined before, without (almost) any modifications — I've seen plenty of examples that practically build the entire job again, so that's not an option.
Can somebody provide any pointers here?
MessageDeserializationSchema is unbounded, so isEndOfStream returns false. Not sure if that's an impediment.
In order to make the pipeline more testable, I suggest you create a method on your Application class that takes a source and a sink as parameters, and creates and executes the pipeline, using those connectors.
In your tests you can call that method with special sources and sinks that you use for testing. In particular, you will want to use a KafkaSource that uses .setBounded(...) in the tests so that it cleanly handles just the range of data intended for the test(s).
The solutions and tests for the Apache Flink training exercises are organized along these lines; for example, see RideCleansingSolution.java and RideCleansingIntegrationTest.java. These examples don't use Kafka or Testcontainers, but hopefully they'll still be helpful.
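A minimal sketch of that refactoring, based on the job from the question. The method name buildAndExecute is illustrative, and the sketch assumes (for the sink's element type) that CustomFlatMapFunction emits CustomKafkaMessage again; adjust the generics to whatever your operators actually produce:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public final class Application {

    public static void main(final String... args) throws Exception {
        final var params = ParameterTool.fromArgs(args);
        // The production entry point wires the real connectors into the shared pipeline definition.
        final var source = KafkaSource.<CustomKafkaMessage>builder()
                .setBootstrapServers(params.get("application.kafka.bootstrap-servers"))
                .setGroupId(params.get("application.kafka.consumer.group-id"))
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setTopics(params.get("application.kafka.listener.topics"))
                .setValueOnlyDeserializer(new MessageDeserializationSchema())
                .build();
        buildAndExecute(params, source, new CustomDiscardSink());
    }

    // The pipeline is defined once; tests call this method with their own source and sink.
    static void buildAndExecute(final ParameterTool params,
                                final Source<CustomKafkaMessage, ?, ?> source,
                                final SinkFunction<CustomKafkaMessage> sink) throws Exception {
        final var env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(params);
        env.setParallelism(params.getInt("application.parallelism"));
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "custom.kafka-source")
                .uid("custom.kafka-source")
                .rebalance()
                .flatMap(new CustomFlatMapFunction())
                .uid("custom.flatmap-function")
                .filter(new CustomFilterFunction())
                .uid("custom.filter-function")
                .addSink(sink)
                .uid("custom.discard-sink");
        env.execute(params.get("application.job-name"));
    }
}

A test can then call buildAndExecute with a KafkaSource pointing at KAFKA_CONTAINER.getBootstrapServers() and built with .setBounded(OffsetsInitializer.latest()), so the job terminates once the test data has been consumed, and with a collecting sink it can assert on.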
I would suggest you instrument your application as an opaque-box test by interacting with it through its public API. This can be done either as an out-of-process test (e.g. by running your application in a container as well, using Testcontainers) or as an in-process test (by creating your Application and calling its main() method).
Now, in your comments you explained that you want to check for the side effects of interacting with your application (Kafka messages being published). To check this, connect to the KafkaContainer with your own KafkaConsumer from within the test and use a library such as Awaitility to wait until the messages have been received.
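A rough sketch of that verification step, assuming the job eventually publishes to some output topic (the topic name, group id and String deserializers below are illustrative):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import static org.awaitility.Awaitility.await;
import static org.junit.jupiter.api.Assertions.assertFalse;

// Inside ApplicationTest: verify the job's side effect by consuming from the same container.
@Test
void jobPublishesEnrichedMessages() {
    final Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers());
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "it-verifier");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(List.of("output-topic")); // hypothetical topic the job writes to
        await().atMost(30, TimeUnit.SECONDS).untilAsserted(() -> {
            final ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            assertFalse(records.isEmpty()); // replace with assertions on the actual payloads
        });
    }
}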

Spring Boot REST: How to dynamically access the appropriate database schema specified in a client request?

I have one database with 3 schemas (OPS, TEST, TRAIN). All of these schemas have a completely identical table structure. Now let's say I have an endpoint /cars that accepts a query param for the schema/environment. When the user makes a GET request to this endpoint, I need the Spring Boot backend to be able to dynamically access either the OPS, TEST, or TRAIN schema based on the query param specified in the client request.
The idea is something like this where the environment is passed as a request param to the endpoint and then is somehow used in the code to set the schema/datasource that the repository will use.
@Autowired
private CarsRepository carsRepository;

@GetMapping("/cars")
public List<Car> getCars(@RequestParam String env) {
    setSchema(env);
    return carsRepository.findAll();
}

private void setSchema(String env) {
    // Do something here to set the schema that the CarsRepository
    // will use when it runs the .findAll() method.
}
So, if a client made a GET request to the /cars endpoint with the env request param set to "OPS" then the response would be a list of all the cars in the OPS schema. If a client made the same request but with the env request param set to "TEST", then the response would be all the cars in the TEST schema.
An example of my datasource configuration is below. This one is for the OPS schema. The other schemas are done in the same fashion, but without the @Primary annotation above the beans.
@Configuration
@EnableTransactionManagement
@EnableJpaRepositories(
    entityManagerFactoryRef = "opsEntityManagerFactory",
    transactionManagerRef = "opsTransactionManager",
    basePackages = { "com.example.repo" }
)
public class OpsDbConfig {

    @Autowired
    private Environment env;

    @Primary
    @Bean(name = "opsDataSource")
    @ConfigurationProperties(prefix = "db-ops.datasource")
    public DataSource dataSource() {
        return DataSourceBuilder
                .create()
                .url(env.getProperty("db-ops.datasource.url"))
                .driverClassName(env.getProperty("db-ops.database.driverClassName"))
                .username(env.getProperty("db-ops.database.username"))
                .password(env.getProperty("db-ops.database.password"))
                .build();
    }

    @Primary
    @Bean(name = "opsEntityManagerFactory")
    public LocalContainerEntityManagerFactoryBean opsEntityManagerFactory(
            EntityManagerFactoryBuilder builder,
            @Qualifier("opsDataSource") DataSource dataSource
    ) {
        return builder
                .dataSource(dataSource)
                .packages("com.example.domain")
                .persistenceUnit("ops")
                .build();
    }

    @Primary
    @Bean(name = "opsTransactionManager")
    public PlatformTransactionManager opsTransactionManager(
            @Qualifier("opsEntityManagerFactory") EntityManagerFactory opsEntityManagerFactory
    ) {
        return new JpaTransactionManager(opsEntityManagerFactory);
    }
}
Personally, I don't feel it's right to pass the environment as a request param and toggle the repository based on the value passed.
Instead, you can deploy multiple instances of the service, each pointing to a different data source, and have a gatekeeper (router) route to the respective service.
This way, clients are exposed to one gateway service, which in turn routes to the respective service based on the input to the gatekeeper.
You typically don't want TEST/ACPT instances running on the very same machines, because it gets harder to keep under control the extent to which load on these environments slows down the PROD environment.
You also don't want the setup you envisage, because it makes it nigh impossible to evolve the app and/or its database structure. (You're not going to switch the DB schema in PROD at the very same time you're doing this in DEV, are you? Not doing that simultaneous switch is wise, but it breaks your presupposition that "all three databases have exactly the same schema".)

Flink 1.6 Async IO - How to increase throughput when enriching a stream, using a REST service call?

I am currently on Flink version 1.6 and am facing an issue with AsyncIO wherein the performance is not up to my expectation.
I am sure I am doing something wrong in my implementation, so any advice/suggestions would be appreciated.
Issue Synopsis -
I am consuming a stream of ids.
For each id, I need to call a REST service.
I've implemented a RichAsyncFunction, which performs the async REST call.
Here are the relevant fields (initialized in the open method) and the asyncInvoke method:
// these are initialized in the open method
ExecutorService executorService = Executors.newFixedThreadPool(n);
CloseableHttpAsyncClient client = ...
Gson gson = ...

public void asyncInvoke(String key, final ResultFuture<Item> resultFuture) throws Exception {
    executorService.submit(() -> {
        client.execute(new HttpGet("http://myservice/" + key), new FutureCallback<HttpResponse>() {
            @Override
            public void completed(final HttpResponse response) {
                System.out.println("completed successfully");
                try {
                    Item item = gson.fromJson(EntityUtils.toString(response.getEntity()), Item.class);
                    resultFuture.complete(Collections.singleton(item));
                } catch (Exception e) {
                    resultFuture.completeExceptionally(e);
                }
            }
            // failed(...) and cancelled() callbacks omitted here
        });
    });
}
With the above implementation, I've tried:
Increasing the parallelism of the enrichment operation
Increasing the number of threads in the executor service
Using the Apache HTTP async client, I've tried tweaking the connection manager settings: setDefaultMaxPerRoute and setMaxTotal.
I am consistently getting a throughput of about 100 requests/sec. The service is able to handle more than 5k per sec.
What am I doing wrong, and how can I improve this ?

How to execute blocking calls within a Spring Webflux / Reactor Netty web application

In my use case, a Spring WebFlux microservice with Reactor Netty, I have the following dependencies:
org.springframework.boot.spring-boot-starter-webflux (2.0.1.RELEASE)
org.springframework.boot.spring-boot-starter-data-mongodb-reactive (2.0.1.RELEASE)
org.projectreactor.reactor-spring (1.0.1.RELEASE)
For a very specific case I need to retrieve some information from my Mongo database and process this into query parameters sent with my reactive WebClient. As neither the WebClient nor the UriComponentsBuilder accepts a Publisher (Mono / Flux), I used a block() call to receive the results.
Since reactor-core version 0.7.6.RELEASE, which is included in the latest spring-boot-dependencies (version 2.0.1.RELEASE), this is no longer possible and fails with: block()/blockFirst()/blockLast() are blocking, which is not supported in thread xxx, see https://github.com/reactor/reactor-netty/issues/312
My code snippet:
public Mono<FooBar> getFooBar(Foo foo) {
MultiValueMap<String, String> parameters = new LinkedMultiValueMap<>();
parameters.add("size", foo.getSize());
parameters.addAll("bars", barReactiveCrudRepository.findAllByIdentifierIn(foo.getBarIdentifiers()) // This obviously returns a Flux
.map(Bar::toString)
.collectList()
.block());
String url = UriComponentsBuilder.fromHttpUrl("https://base-url/")
.port(8081)
.path("/foo-bar")
.queryParams(parameters)
.build()
.toString();
return webClient.get()
.uri(url)
.retrieve()
.bodyToMono(FooBar.class);
}
This worked with spring-boot version 2.0.0.RELEASE, but since the upgrade to version 2.0.1.RELEASE, and hence the upgrade of reactor-core to version 0.7.6.RELEASE, it is not allowed anymore.
The only real solution I see is to include a blocking (non-reactive) repository / Mongo client as well, but I'm not sure if that is encouraged. Any suggestions?
The WebClient does not accept a Publisher type for its request URL, but nothing prevents you from doing the following:
public Mono<FooBar> getFooBar(Foo foo) {
Mono<List<String>> bars = barReactiveCrudRepository
.findAllByIdentifierIn(foo.getBarIdentifiers())
.map(Bar::toString)
.collectList();
Mono<FooBar> foobar = bars.flatMap(b -> {
MultiValueMap<String, String> parameters = new LinkedMultiValueMap<>();
parameters.add("size", foo.getSize());
parameters.addAll("bars", b);
String url = UriComponentsBuilder.fromHttpUrl("https://base-url/")
.port(8081)
.path("/foo-bar")
.queryParams(parameters)
.build()
.toString();
return webClient.get()
.uri(url)
.retrieve()
.bodyToMono(FooBar.class);
});
return foobar;
}
If anything, this new reactor-core inspection saved you from crashing your whole application with this blocking call in the middle of a WebFlux handler.

Storing the Cursor for App Engine Pagination

I'm trying to implement pagination using App Engine's RPC and GWT (it's an app engine connected project).
How can I pass both the query results and the web-safe cursor object to the GWT client from the RPC?
I've seen examples using a servlet, but I want to know how to do it without a servlet.
I've considered caching the cursor on the server using memcache but I'm not sure if that's appropriate or what should be used as the key (session identifier I would assume, but I'm not sure how those are handled on App Engine).
Links to example projects would be fantastic, I've been unable to find any.
OK, so the best way to do this is to store the cursor as a string on the client.
To do this you have to create a transportable wrapper class that holds the results list and the cursor string, so you can pass it back to the client via RequestFactory. To do that you create a normal POJO and then a proxy for it.
here's what the code looks like for the POJO:
public class OrganizationResultsWrapper {
public List<Organization> list;
public String webSafeCursorString;
public List<Organization> getList() {
return list;
}
public void setList(List<Organization> list) {
this.list = list;
}
public String getWebSafeCursorString() {
return this.webSafeCursorString;
}
public void setWebSafeCursorString(String webSafeCursorString) {
this.webSafeCursorString = webSafeCursorString;
}
}
for the proxy:
@ProxyFor(OrganizationResultsWrapper.class)
public interface OrganizationResultsWrapperProxy extends ValueProxy {
    List<OrganizationProxy> getList();
    void setList(List<OrganizationProxy> list);
    String getWebSafeCursorString();
    void setWebSafeCursorString(String webSafeCursorString);
}
set up your service and requestFactory to use the POJO and proxy respectively
// service class method
@ServiceMethod
public OrganizationResultsWrapper getOrganizations(String webSafeCursorString) {
return dao.getOrganizations(webSafeCursorString);
}
// request factory method
Request<OrganizationResultsWrapperProxy> getOrganizations(String webSafeCursorString);
Then make sure to run the RPC wizard so that your validation process runs; otherwise you'll get a request context error on the server.
Here's the implementation in my data access class:
public OrganizationResultsWrapper getOrganizations(String webSafeCursorString) {
List<Organization> list = new ArrayList<Organization>();
OrganizationResultsWrapper resultsWrapper = new OrganizationResultsWrapper();
Query<Organization> query = ofy().load().type(Organization.class).limit(50);
if (webSafeCursorString != null) {
query = query.startAt(Cursor.fromWebSafeString(webSafeCursorString));
}
QueryResultIterator<Organization> iterator = query.iterator();
while (iterator.hasNext()) {
list.add(iterator.next());
}
resultsWrapper.setList(list);
resultsWrapper.setWebSafeCursorString(iterator.getCursor().toWebSafeString());
return resultsWrapper;
}
A second option would be to save the webSafeCursorString in memcache, as you already mentioned.
My idea looks like this:
The client always sends a request like getMyObjects(Object... myParams, int maxResults, String clientPaginationString). The clientPaginationString is created uniquely, as shown below.
The server receives the request and checks memcache for a webSafeCursorString stored under the key clientPaginationString.
If the server finds nothing, it creates the query, saves the webSafeCursorString into memcache with the clientPaginationString as the key, and returns the results.
If the server finds the webSafeCursorString, it restarts the query with it and returns the results.
The problems are how to clean the memcache and how to find a unique clientPaginationString:
A unique clientPaginationString could be the current user ID + the params of the current query + a timestamp. This should work just fine.
I really can't think of an easy way to clean the memcache; however, I think we do not have to clean it at all.
We could store all the webSafeCursorStrings and timestamp+params+userId keys in a WebSafeCursor class that contains a map, store all of this in memcache, and clean this class once in a while (entries whose timestamp is older than ...).
One improvement I can think of is to save the webSafeCursorString in memcache with a key that is created on the server (userSessionId + serviceName + serviceMethodName + params). However, it is important that the client indicates whether it wants a new query (the memcache entry is overwritten) or the next page of results (the webSafeCursorString is fetched from memcache). A reload of the page should work; a second tab in the browser would be a problem, I think.
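A rough sketch of this memcache-backed variant, assuming the data-access method from the first option and App Engine's low-level MemcacheService API (the method name, the newQuery flag and the expiration are illustrative):

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

// Hypothetical service method: look up the stored cursor for this client/query key,
// run the existing query, and remember the new cursor for the next page.
public OrganizationResultsWrapper getOrganizationsPage(String clientPaginationString, boolean newQuery) {
    MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();
    String webSafeCursorString = newQuery ? null : (String) memcache.get(clientPaginationString);
    // Reuse the data-access method shown above; a null cursor starts from the beginning.
    OrganizationResultsWrapper wrapper = dao.getOrganizations(webSafeCursorString);
    // Store the cursor for the next page request and let stale entries expire on their own.
    memcache.put(clientPaginationString, wrapper.getWebSafeCursorString(), Expiration.byDeltaSeconds(3600));
    return wrapper;
}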
What would you say?
