Zeppelin 0.7.3 gets "local class incompatible" error - apache-zeppelin

I installed a Spark 2.2.1 standalone cluster and Zeppelin 0.7.3, then tried some logistic regression code like the snippet below:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Fit a logistic regression model on the training DataFrame
lr = LogisticRegression(maxIter=10000, regParam=0.2)
model1 = lr.fit(training)

# Inspect the training summary: loss history and ROC
trainingSummary = model1.summary
objectiveHistory = trainingSummary.objectiveHistory
for objective in objectiveHistory:
    print(objective)
trainingSummary.roc.show()
print("areaUnderROC(training): " + str(trainingSummary.areaUnderROC))

# Score the training and test sets and evaluate the area under the ROC curve
prediction = model1.transform(test)
prediction_train = model1.transform(training)
evaluator = (BinaryClassificationEvaluator()
             .setLabelCol("label")
             .setRawPredictionCol("probability")
             .setMetricName("areaUnderROC"))
pred_test = prediction.select("label", "probability", "rawPrediction")
pred_train = prediction_train.select("label", "probability", "rawPrediction")
ROC_test = evaluator.evaluate(pred_test)
ROC_train = evaluator.evaluate(pred_train)
print("areaUnderROC(training): " + str(ROC_train))
print("areaUnderROC(testing): " + str(ROC_test))
and got the following error. Googling suggests a similar problem was fixed in 0.7.1 for reading JSON.
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 192.168.0.17, executor 0): java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at ...
Caused by: java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
....
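This java.io.InvalidClassException means the JVM deserializing the object (a Spark executor) has a different commons-lang3 on its classpath than the JVM that serialized it (the Zeppelin/Spark driver): one side's FastDateParser declares serialVersionUID = 2, the other expects 3. As a quick diagnostic, the following sketch (mine, not from the original report; run it from a Scala paragraph on the driver, and inside a task to probe an executor) prints which jar supplies the class and which stream version it declares:
import java.io.ObjectStreamClass

// Which jar provides FastDateParser on this JVM...
val cls = Class.forName("org.apache.commons.lang3.time.FastDateParser")
println(cls.getProtectionDomain.getCodeSource.getLocation)
// ...and which serialVersionUID it declares (2 vs 3 confirms the mismatch)
println(ObjectStreamClass.lookup(cls).getSerialVersionUID)
If the versions differ, a commonly reported remedy for this class of error is to align the commons-lang3 jar bundled with Zeppelin's Spark interpreter with the one shipped by the Spark installation.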

Related

Is it possible to run Flink SQL without a cluster in Java/Kotlin?

I want to run SQL expressions locally over data streams from Kafka, Kinesis, etc.
I've tried running the following code, which creates a data stream source from Kafka and registers it as a table; then I run a select * on it and get the result back as a data stream. CollectSink is just a utility sink I use to debug those messages.
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.table.api.EnvironmentSettings
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment

// Local execution environment plus a streaming table environment on top of it
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironment()
val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env, EnvironmentSettings.inStreamingMode())

// Kafka source that reads plain string values from the earliest offset
val source = KafkaSource.builder<String>()
    .setBootstrapServers(defaultKafkaProperties["bootstrap.servers"] as String)
    .setTopics(topicName)
    .setGroupId("my-group-test" + Math.random())
    .setStartingOffsets(OffsetsInitializer.earliest())
    .setValueOnlyDeserializer(SimpleStringSchema())
    .build()

// Register the stream as a table, query it, and turn the result back into a stream
val stream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
val table = tableEnv.fromDataStream(stream)
tableEnv.createTemporaryView("Source", table)
val resultTable = tableEnv.sqlQuery("select * from Source")
val resultStream = tableEnv.toDataStream(resultTable)
resultStream.addSink(CollectSink())
env.execute()
I always get the following error, and I don't know why. I'm not using Scala in my application code, so I assume I'm missing a dependency somewhere?
java.lang.NoClassDefFoundError: scala/Serializable
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:821)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:719)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:642)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:600)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.flink.table.planner.expressions.PlannerTypeInferenceUtilImpl.<clinit>(PlannerTypeInferenceUtilImpl.java:51)
at org.apache.flink.table.planner.delegation.PlannerBase.<init>(PlannerBase.scala:92)
at org.apache.flink.table.planner.delegation.StreamPlanner.<init>(StreamPlanner.scala:52)
at org.apache.flink.table.planner.delegation.DefaultPlannerFactory.create(DefaultPlannerFactory.java:61)
at org.apache.flink.table.factories.PlannerFactoryUtil.createPlanner(PlannerFactoryUtil.java:50)
at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.create(StreamTableEnvironmentImpl.java:151)
at org.apache.flink.table.api.bridge.java.StreamTableEnvironment.create(StreamTableEnvironment.java:128)
These are the dependencies I'm using
val flinkVersion = "1.14.3"
val flinkScalaVersion = "2.12"
implementation("org.apache.flink:flink-clients_$flinkScalaVersion:$flinkVersion")
implementation("org.apache.flink:flink-table-api-java-bridge_$flinkScalaVersion:$flinkVersion")
implementation("org.apache.flink:flink-table-planner_$flinkScalaVersion:$flinkVersion")
implementation("org.apache.flink:flink-streaming-scala_$flinkScalaVersion:$flinkVersion")
implementation("org.apache.flink:flink-table-common:$flinkVersion")
implementation("org.apache.flink:flink-connector-kafka_$flinkScalaVersion:$flinkVersion")
implementation("org.apache.flink:flink-avro-confluent-registry:$flinkVersion")
implementation("org.apache.flink:flink-avro:$flinkVersion")

Spring Boot @DataJpaTest integration test with external SQL Server DB

I have the repository class below in my Spring Boot project. It has a method that returns Inventory data from SQL Server, and it works in the application itself.
@Repository
public interface InventoryRepository extends JpaRepository<Inventory, Integer> {
    Inventory findByInventoryIdAndCompanyId(Integer inventoryId, Integer companyId);
}
I want to write an integration test for this repository that gets data from the dev and test environment SQL Server DBs.
These dev and test environment DBs already contain data.
Below are the application.yml files in my resources folder (I have changed the URLs and credentials intentionally).
application.yml:
spring:
  profiles.active: development
application-development.yml:
spring:
  profiles: development
spring.datasource.type: com.zaxxer.hikari.HikariDataSource
spring.datasource.jdbc-url: jdbc:sqlserver://22.22.22.22:1533;instanceName=SQLSVR;databaseName=dev
spring.datasource.username: admin
spring.datasource.password: admin
spring.datasource.driver-class-name: com.microsoft.sqlserver.jdbc.SQLServerDriver
application-test.yml:
spring:
  profiles: test
spring.datasource.type: com.zaxxer.hikari.HikariDataSource
spring.datasource.jdbc-url: jdbc:sqlserver://11.11.11.11:1533;instanceName=SQLSVR;databaseName=qa
spring.datasource.url: jdbc:sqlserver://11.11.11.11:1533;instanceName=SQLSVR;databaseName=qa
spring.datasource.username: admin
spring.datasource.password: admin
spring.datasource.driver-class-name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Below is the test class for my repository.
@ExtendWith(SpringExtension.class)
@DataJpaTest
@ContextConfiguration
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
public class InventoryRepositoryTest {

    @Autowired
    InventoryRepository inventoryRepository;

    @Test
    public void getRepositoryByIdTest() {
        Assertions.assertEquals(1, inventoryRepository.findByInventoryIdAndCompanyId(1, 1));
    }
}
Below is the error I am getting when running this test:
2021-10-18 03:35:38.917 INFO 11968 --- [ main] o.s.t.c.transaction.TransactionContext : Began transaction (1) for test context [DefaultTestContext#18d87d80 testClass = InventoryRepositoryTest, testInstance = com.cropin.mwarehouse.common.repository.InventoryRepositoryTest#437da279, testMethod = getRepositoryByIdTest#InventoryRepositoryTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration#618425b5 testClass = InventoryRepositoryTest, locations = '{}', classes = '{class com.cropin.mwarehouse.CropinMWarehouseServiceApplication}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{org.springframework.boot.test.autoconfigure.orm.jpa.DataJpaTestContextBootstrapper=true}', contextCustomizers = set[[ImportsContextCustomizer#58695725 key = [org.springframework.boot.autoconfigure.cache.CacheAutoConfiguration, org.springframework.boot.autoconfigure.data.jpa.JpaRepositoriesAutoConfiguration, org.springframework.boot.autoconfigure.flyway.FlywayAutoConfiguration, org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration, org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration, org.springframework.boot.autoconfigure.jdbc.JdbcTemplateAutoConfiguration, org.springframework.boot.autoconfigure.liquibase.LiquibaseAutoConfiguration, org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration, org.springframework.boot.autoconfigure.transaction.TransactionAutoConfiguration, org.springframework.boot.test.autoconfigure.jdbc.TestDatabaseAutoConfiguration, org.springframework.boot.test.autoconfigure.orm.jpa.TestEntityManagerAutoConfiguration]], org.springframework.boot.test.context.filter.ExcludeFilterContextCustomizer#4b2bac3f, org.springframework.boot.test.json.DuplicateJsonObjectContextCustomizerFactory$DuplicateJsonObjectContextCustomizer#26794848, org.springframework.boot.test.mock.mockito.MockitoContextCustomizer#0, org.springframework.boot.test.autoconfigure.OverrideAutoConfigurationContextCustomizerFactory$DisableAutoConfigurationContextCustomizer#6ad82709, org.springframework.boot.test.autoconfigure.filter.TypeExcludeFiltersContextCustomizer#351584c0, org.springframework.boot.test.autoconfigure.properties.PropertyMappingContextCustomizer#fb6c1252, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverContextCustomizerFactory$Customizer#158a8276], contextLoader = 'org.springframework.boot.test.context.SpringBootContextLoader', parent = [null]], attributes = map[[empty]]]; transaction manager [org.springframework.orm.jpa.JpaTransactionManager#f310675]; rollback [true]
2021-10-18 03:35:38.963 DEBUG 11968 --- [ main] org.hibernate.SQL :
select
inventory0_.inventoryId as inventor1_10_,
inventory0_.createdBy as createdB2_10_,
inventory0_.createdDate as createdD3_10_,
inventory0_.lastModifiedBy as lastModi4_10_,
inventory0_.lastModifiedDate as lastModi5_10_,
inventory0_.balanceWeight as balanceW6_10_,
inventory0_.batchCreationDate as batchCre7_10_,
inventory0_.batchNumber as batchNum8_10_,
inventory0_.clientId as clientId9_10_,
inventory0_.companyId as company10_10_,
inventory0_.currencyUnitId as currenc11_10_,
inventory0_.dateOfEntry as dateOfE12_10_,
inventory0_.harvestReferenceId as harvest13_10_,
inventory0_.inventoryStatus as invento14_10_,
inventory0_.isActive as isActiv15_10_,
inventory0_.itemId as itemId16_10_,
inventory0_.locationId as locatio17_10_,
inventory0_.parentInventoryId as parentI18_10_,
inventory0_.processId as process19_10_,
inventory0_.quantity as quantit20_10_,
inventory0_.quantityBalance as quantit21_10_,
inventory0_.supplierId as supplie22_10_,
inventory0_.unitPrice as unitPri23_10_,
inventory0_.weight as weight24_10_
from
inv.Inventory inventory0_
where
inventory0_.inventoryId=?
and inventory0_.companyId=?
2021-10-18 03:35:39.142 DEBUG 11968 --- [ main] org.hibernate.SQL :
select
companymas0_.companyId as companyI1_1_0_,
companymas0_.createdBy as createdB2_1_0_,
companymas0_.createdDate as createdD3_1_0_,
companymas0_.lastModifiedBy as lastModi4_1_0_,
companymas0_.lastModifiedDate as lastModi5_1_0_,
companymas0_.companyAddress as companyA6_1_0_,
companymas0_.companyCode as companyC7_1_0_,
companymas0_.companyDesc as companyD8_1_0_,
companymas0_.companyLogo as companyL9_1_0_,
companymas0_.companyName as company10_1_0_,
companymas0_.companyPreferredSubDomain as company11_1_0_,
companymas0_.contactEmail as contact12_1_0_,
companymas0_.contactNumber as contact13_1_0_,
companymas0_.defaultRadiusForGeoFencing as default14_1_0_,
companymas0_.fiscalMonth as fiscalM15_1_0_,
companymas0_.isGDPRRequired as isGDPRR16_1_0_,
companymas0_.isActive as isActiv17_1_0_,
companymas0_.isBlueToothRequired as isBlueT18_1_0_,
companymas0_.isGeoFencingRequired as isGeoFe19_1_0_,
companymas0_.isHarvestPaid as isHarve20_1_0_,
companymas0_.isShareImage as isShare21_1_0_,
companymas0_.isVerified as isVerif22_1_0_,
companymas0_.isZohoEnable as isZohoE23_1_0_,
companymas0_.planTypeId as planTyp24_1_0_,
companymas0_.primaryCountry as primary25_1_0_,
companymas0_.sTA as sTA26_1_0_,
companymas0_.webSite as webSite27_1_0_
from
dbo.CompanyMaster companymas0_
where
companymas0_.companyId=?
2021-10-18 03:35:39.292 DEBUG 11968 --- [ main] org.hibernate.SQL :
select
locationma0_.locationId as location1_22_0_,
locationma0_.createdBy as createdB2_22_0_,
locationma0_.createdDate as createdD3_22_0_,
locationma0_.lastModifiedBy as lastModi4_22_0_,
locationma0_.lastModifiedDate as lastModi5_22_0_,
locationma0_.addressLine1 as addressL6_22_0_,
locationma0_.addressLine2 as addressL7_22_0_,
locationma0_.companyId as companyI8_22_0_,
locationma0_.coordinates as coordina9_22_0_,
locationma0_.districtId as distric10_22_0_,
locationma0_.geoId as geoId11_22_0_,
locationma0_.imageName as imageNa12_22_0_,
locationma0_.isActive as isActiv13_22_0_,
locationma0_.latitude as latitud14_22_0_,
locationma0_.locationTypeId as locatio15_22_0_,
locationma0_.longitude as longitu16_22_0_,
locationma0_.name as name17_22_0_,
locationma0_.parentLocationId as parentL18_22_0_,
locationma0_.pincode as pincode19_22_0_,
locationma0_.placeName as placeNa20_22_0_,
locationma0_.stateId as stateId21_22_0_
from
inv.LocationMaster locationma0_
where
locationma0_.locationId=?
2021-10-18 03:35:39.663 INFO 11968 --- [ main] o.s.t.c.transaction.TransactionContext : Rolled back transaction for test: [DefaultTestContext#18d87d80 testClass = InventoryRepositoryTest, testInstance = com.cropin.mwarehouse.common.repository.InventoryRepositoryTest#437da279, testMethod = getRepositoryByIdTest#InventoryRepositoryTest, testException = org.opentest4j.AssertionFailedError: expected: <1> but was: <com.cropin.mwarehouse.common.entity.Inventory#5b58f639>, mergedContextConfiguration = [MergedContextConfiguration#618425b5 testClass = InventoryRepositoryTest, locations = '{}', classes = '{class com.cropin.mwarehouse.CropinMWarehouseServiceApplication}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{org.springframework.boot.test.autoconfigure.orm.jpa.DataJpaTestContextBootstrapper=true}', contextCustomizers = set[[ImportsContextCustomizer#58695725 key = [org.springframework.boot.autoconfigure.cache.CacheAutoConfiguration, org.springframework.boot.autoconfigure.data.jpa.JpaRepositoriesAutoConfiguration, org.springframework.boot.autoconfigure.flyway.FlywayAutoConfiguration, org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration, org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration, org.springframework.boot.autoconfigure.jdbc.JdbcTemplateAutoConfiguration, org.springframework.boot.autoconfigure.liquibase.LiquibaseAutoConfiguration, org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration, org.springframework.boot.autoconfigure.transaction.TransactionAutoConfiguration, org.springframework.boot.test.autoconfigure.jdbc.TestDatabaseAutoConfiguration, org.springframework.boot.test.autoconfigure.orm.jpa.TestEntityManagerAutoConfiguration]], org.springframework.boot.test.context.filter.ExcludeFilterContextCustomizer#4b2bac3f, org.springframework.boot.test.json.DuplicateJsonObjectContextCustomizerFactory$DuplicateJsonObjectContextCustomizer#26794848, org.springframework.boot.test.mock.mockito.MockitoContextCustomizer#0, org.springframework.boot.test.autoconfigure.OverrideAutoConfigurationContextCustomizerFactory$DisableAutoConfigurationContextCustomizer#6ad82709, org.springframework.boot.test.autoconfigure.filter.TypeExcludeFiltersContextCustomizer#351584c0, org.springframework.boot.test.autoconfigure.properties.PropertyMappingContextCustomizer#fb6c1252, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverContextCustomizerFactory$Customizer#158a8276], contextLoader = 'org.springframework.boot.test.context.SpringBootContextLoader', parent = [null]], attributes = map[[empty]]]
org.opentest4j.AssertionFailedError:
Expected :1
Actual :com.cropin.mwarehouse.common.entity.Inventory@5b58f639
<Click to see difference>
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1124)
at com.cropin.mwarehouse.common.repository.InventoryRepositoryTest.getRepositoryByIdTest(InventoryRepositoryTest.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:675)
at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:125)
at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:132)
at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:124)
at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:74)
at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:104)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:62)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:43)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:35)
at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:202)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:198)
at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:135)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125)
at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125)
at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125)
at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80)
at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:229)
at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:197)
at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:211)
at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:191)
at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:71)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
Below are the questions for which I am looking for answers:
1. Does @DataJpaTest only work with an in-memory DB? Here I am trying to connect to an external SQL Server DB; is it really connecting to it?
2. If @DataJpaTest is able to connect to the external SQL Server DB, why does it fail when I already have records matching the test parameters?
3. How can I use profiles here? There is @ActiveProfiles, but I want to use the same test code for both environments, dev and qa; how would profiles work for that?
4. The error log shows the active profiles as empty. What does that mean? Is it not picking up the development profile?
5. How can I achieve an integration test connecting to my dev and qa DBs, which already have data? I don't want to use an in-memory DB.
Please help me out with these questions.
The actual connection seems to work. @DataJpaTest works with any DataSource configuration; it just takes the opinionated approach of defaulting to an in-memory database. You already added the required code to opt out and use your own DataSource with:
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
The actual test failure happens because you try to compare an int with an actual Java object. Either return all objects and check the size:
Assertions.assertEquals(1, inventoryRepository.findAll().size());
... or assert that the result is not null:
Assertions.assertNotNull(inventoryRepository.findByInventoryIdAndCompanyId(1,1));
Take a closer look at the assertion error:
org.opentest4j.AssertionFailedError:
Expected :1
Actual :com.cropin.mwarehouse.common.entity.Inventory@5b58f639
<Click to see difference>
FYI: You should be able to reduce the test setup to:
// @ExtendWith(SpringExtension.class) already comes with @DataJpaTest
@DataJpaTest
// @ContextConfiguration
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
public class InventoryRepositoryTest {
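For the profile questions, one option (my suggestion, not part of the original answer) is to pin the profile per test class with @ActiveProfiles, so the matching application-<profile>.yml supplies the DataSource:
@ActiveProfiles("test") // or "development", depending on the target environment
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
public class InventoryRepositoryTest { ... }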

Slick can't connect to MSSQL database

I would like to use Slick (3.2.3) to connect to an MSSQL database.
Currently, my project looks like the following.
In application.conf, I have
somedbname = {
driver = "slick.jdbc.SQLServerProfile$"
db {
host = "somehost"
port = "someport"
databaseName = "Recupel.Datawarehouse"
url = "jdbc:sqlserver://"${somedbname.db.host}":"${somedbname.db.port}";databaseName="${somedbname.db.databaseName}";"
user = "someuser"
password = "somepassword"
}
}
The "somehost" looks like XX.X.XX.XX where X's are numbers.
My build.sbt contains
name := "test-slick"
version := "0.1"
scalaVersion in ThisBuild := "2.12.7"
libraryDependencies ++= Seq(
  "com.typesafe.slick" %% "slick" % "3.2.3",
  "com.typesafe.slick" %% "slick-hikaricp" % "3.2.3",
  "org.slf4j" % "slf4j-nop" % "1.6.4",
  "com.microsoft.sqlserver" % "mssql-jdbc" % "7.0.0.jre10"
)
The file with the "main" object contains
import slick.basic.DatabaseConfig
import slick.jdbc.JdbcProfile
import slick.jdbc.SQLServerProfile.api._

import scala.concurrent.Await
import scala.concurrent.duration._

object Main {
  val dbConfig: DatabaseConfig[JdbcProfile] = DatabaseConfig.forConfig("somedbname")
  val db: JdbcProfile#Backend#Database = dbConfig.db

  def main(args: Array[String]): Unit = {
    try {
      val future = db.run(sql"SELECT * FROM sometable".as[(Int, String, String, String, String,
        String, String, String, String, String, String, String)])
      println(Await.result(future, 10.seconds))
    } finally {
      db.close()
    }
  }
}
This, according to all the documentation that I could find, should be enough to connect to the database. However, when I run this, I get
[error] (run-main-0) java.sql.SQLTransientConnectionException: somedbname.db - Connection is not available, request timed out after 1004ms.
[error] java.sql.SQLTransientConnectionException: somedbname.db - Connection is not available, request timed out after 1004ms.
[error] at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
[error] at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
[error] at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
[error] at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
[error] at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:14)
[error] at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:453)
[error] at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
[error] at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
[error] at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:249)
[error] at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:248)
[error] at slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:37)
[error] at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:274)
[error] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
[error] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[error] at java.base/java.lang.Thread.run(Thread.java:844)
[error] Nonzero exit code: 1
Perhaps related and also annoying: when I run this code for the second (and subsequent) times, I get the following error instead:
Failed to get driver instance for jdbcUrl=jdbc:sqlserver://[...]
which forces me to kill and reload sbt each time.
What am I doing wrong? Worth noting: I can connect to the database with the same credentials from software like Valentina.
As suggested by @MarkRotteveel, and following this link, I found a solution.
First, I explicitly set the driver by adding the line
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
to the db block, after password = "somepassword".
Second, the default timeout (one second) appears to be too short for my purposes, so I added the line
connectionTimeout = "30 seconds"
after the driver line, still in the db block.
Now it works.
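For reference, the resulting configuration, assembled from the steps above (values are still the placeholders from the question), would look roughly like this:
somedbname = {
  driver = "slick.jdbc.SQLServerProfile$"
  db {
    host = "somehost"
    port = "someport"
    databaseName = "Recupel.Datawarehouse"
    url = "jdbc:sqlserver://"${somedbname.db.host}":"${somedbname.db.port}";databaseName="${somedbname.db.databaseName}";"
    user = "someuser"
    password = "somepassword"
    driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    connectionTimeout = "30 seconds"
  }
}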

Apache Flink: Class not resolvable through given classloader

I'm trying to consume some Kafka topics through Flink streams.
Below is the code for the stream.
object VehicleFuelEventStream {

  def sink[IN](hosts: String, table: String, ds: DataStream[IN]): CassandraSink[IN] = {
    CassandraSink.addSink(ds)
      .setQuery(s"INSERT INTO global.$table (id, vehicle_id, consumption, from, to, version) values (?, ?, ?, ?, ?, ?);")
      .setClusterBuilder(new ClusterBuilder() {
        override def buildCluster(builder: Cluster.Builder): Cluster = {
          builder.addContactPoints(hosts.split(","): _*).build()
        }
      }).build()
  }

  def dataStream[T: TypeInformation](env: StreamExecutionEnvironment,
                                     source: SourceFunction[String],
                                     flinkConfigs: FlinkStreamProcessorConfigs,
                                     windowFunction: WindowFunction[VehicleFuelEvent, (String, Double, SortedMap[Date, Double], Long, Long), String, TimeWindow]): DataStream[(String, Double, SortedMap[Date, Double], Long, Long)] = {
    val fuelEventStream = env.addSource(source)
      .map(e => EventSerialiserRepo.fromJsonStr[VehicleFuelEvent](e))
      .filter(_.isSuccess)
      .map(_.get)
      .assignTimestampsAndWatermarks(VehicleFuelEventTimestampExtractor.withWatermarkDelay(flinkConfigs.windowWatermarkDelay))
      .keyBy(_.vehicleId) // key by vehicleId
      .timeWindow(Time.minutes(flinkConfigs.windowTimeWidth.toMinutes))
      .apply(windowFunction)
    fuelEventStream
  }
}
The stream is triggered when the Play framework creates its dependencies on startup via Google Guice, as below.
@Singleton
class VehicleEventKafkaConsumer @Inject()(conf: Configuration,
                                          lifecycle: ApplicationLifecycle,
                                          repoFactory: StorageFactory,
                                          cache: CacheApi,
                                          cassandra: Cassandra,
                                          fleetConfigs: FleetConfigManager) {

  private val kafkaConfigs = KafkaConsumerConfigs(conf)
  private val flinkConfigs = FlinkStreamProcessorConfigs(conf)
  private val topicsWithClassTags = getClassTagsForTopics
  private val cassandraConfigs = CassandraConfigs(conf)
  private val repoCache = mutable.HashMap.empty[String, CachedSubjectStatePersistor]
  private val props = new Properties()
  // add props

  private val env = StreamExecutionEnvironment.getExecutionEnvironment
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  env.enableCheckpointing(flinkConfigs.checkpointingInterval.toMillis)

  if (flinkConfigs.enabled) {
    topicsWithClassTags.toList
      .map {
        case (topic, tag) if tag.runtimeClass.isAssignableFrom(classOf[VehicleFuelEvent]) =>
          Logger.info(s"starting for - $topic and $tag")
          val source = new FlinkKafkaConsumer09[String](topic, new SimpleStringSchema(), props)
          val fuelEventStream = VehicleFuelEventStream.dataStream[String](env, source, flinkConfigs, new VehicleFuelEventWindowFunction)
          VehicleFuelEventStream.sink(cassandraConfigs.hosts, flinkConfigs.cassandraTable, fuelEventStream)
        case (topic, _) =>
          Logger.info(s"no stream processor found for topic $topic")
      }
    Logger.info("starting flink stream processors")
    env.execute("flink vehicle event processors")
  } else
    Logger.info("Flink stream processor is disabled!")
}
I get the below errors on application startup.
03/13/2018 05:47:23 TriggerWindow(TumblingEventTimeWindows(1800000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@e899e41f}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.scala:582)) -> Sink: Cassandra Sink(4/4) switched to RUNNING
2018-03-13 05:47:23,262 - [info] o.a.f.r.t.Task - TriggerWindow(TumblingEventTimeWindows(1800000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@e899e41f}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.scala:582)) -> Sink: Cassandra Sink (3/4) (d5124be9bcef94bd0e305c4b4546b055) switched from RUNNING to FAILED.
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class: streams.fuelevent.VehicleFuelEventWindowFunction
ClassLoader info: URL ClassLoader:
Class not resolvable through given classloader.
at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:232)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:95)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:231)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:745)
2018-03-13 05:47:23,262 - [info] o.a.f.r.t.Task - TriggerWindow(TumblingEventTimeWindows(1800000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@e899e41f}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.scala:582)) -> Sink: Cassandra Sink (2/4) (6b3de15a4f6
.....
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could not instantiate outputs in order.
at org.apache.flink.streaming.api.graph.StreamConfig.getOutEdgesInOrder(StreamConfig.java:394)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:103)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:231)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: streams.fuelevent.VehicleFuelEventStream$$anonfun$4
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:128)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
2018-03-13 05:47:23,336 - [info] o.a.f.r.t.Task - Source: Custom Source -> Map -> Filter -> Map -> Timestamps/Watermarks (4/4) (97b2313c985592fdec0ac4f7fba8062f) switched from RUNNING to FAILED.
dependencies
// kafka
libraryDependencies += "org.apache.kafka" %% "kafka" % "1.0.0"
//flink
libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % "1.4.2"
libraryDependencies += "org.apache.flink" %% "flink-connector-kafka-0.9" % "1.4.0"
libraryDependencies += "org.apache.flink" %% "flink-connector-cassandra" % "1.4.2"
Any help is appreciated to solve this issue.
It looks like Flink can't find the class streams.fuelevent.VehicleFuelEventStream.
Is that class on Flink's classpath, or only inside your jar file?
The Flink docs could shed some light on this: https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html
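If the class lives only in the application jar, a common approach (an assumption on my part, not something confirmed in this thread) is to build a fat jar so user code such as streams.fuelevent.* travels with the job instead of having to be on Flink's own classpath. A minimal sbt-assembly sketch:
// project/plugins.sbt (hypothetical versions)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt: mark the Flink runtime as provided and bundle everything else,
// including streams.fuelevent.*, into a single jar for submission
libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % "1.4.2" % "provided"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}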

Flink Streaming: From one window, lookup state in another window

I have two streams:
Measurements
WhoMeasured (metadata about who took the measurement)
These are the case classes for them:
case class Measurement(var value: Int, var who_measured_id: Int)
case class WhoMeasured(var who_measured_id: Int, var name: String)
The Measurement stream has a lot of data. The WhoMeasured stream has little. In fact, for each who_measured_id in the WhoMeasured stream, only 1 name is relevant, so old elements can be discarded if one with the same who_measured_id arrives. This is essentially a HashTable that gets filled by the WhoMeasured stream.
In my custom window function
class WFunc extends WindowFunction[Measurement, Long, Int, TimeWindow] {
  override def apply(key: Int, window: TimeWindow, input: Iterable[Measurement], out: Collector[Long]): Unit = {
    // Here I need access to the WhoMeasured stream to get the name of the person who took a measurement.
    // The following two are equivalent since I keyed by who_measured_id:
    val name_who_measured = magic(key)
    val name_who_measured = magic(input.head.who_measured_id)
  }
}
This is my job. As you can see, there is something missing: the combination of the two streams.
val who_measured_stream = who_measured_source
  .keyBy(w => w.who_measured_id)
  .countWindow(1)

val measurement_stream = measurements_source
  .keyBy(m => m.who_measured_id)
  .timeWindow(Time.seconds(60), Time.seconds(5))
  .apply(new WFunc)
So in essence this is sort of a lookup table that gets updated when new elements in the WhoMeasured stream arrive.
So the question is: How to achieve such a lookup from one WindowedStream into another?
Follow-up:
After implementing it the way Fabian suggested, the job always fails with a serialization issue:
[info] Loading project definition from /home/jgroeger/Code/MeasurementJob/project
[info] Set current project to MeasurementJob (in build file:/home/jgroeger/Code/MeasurementJob/)
[info] Compiling 8 Scala sources to /home/jgroeger/Code/MeasurementJob/target/scala-2.11/classes...
[info] Running de.company.project.Main dev MeasurementJob
[error] Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: The implementation of the RichCoFlatMapFunction is not serializable. The object probably contains or references non serializable fields.
[error] at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:100)
[error] at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1478)
[error] at org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:161)
[error] at org.apache.flink.streaming.api.datastream.ConnectedStreams.flatMap(ConnectedStreams.java:230)
[error] at org.apache.flink.streaming.api.scala.ConnectedStreams.flatMap(ConnectedStreams.scala:127)
[error] at de.company.project.jobs.MeasurementJob.run(MeasurementJob.scala:139)
[error] at de.company.project.Main$.main(Main.scala:55)
[error] at de.company.project.Main.main(Main.scala)
[error] Caused by: java.io.NotSerializableException: de.company.project.jobs.MeasurementJob
[error] at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
[error] at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
[error] at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
[error] at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
[error] at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
[error] at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
[error] at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:301)
[error] at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:81)
[error] ... 7 more
java.lang.RuntimeException: Nonzero exit code returned from runner: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last MeasurementJob/compile:run for the full output.
[error] (MeasurementJob/compile:run) Nonzero exit code returned from runner: 1
[error] Total time: 9 s, completed Nov 15, 2016 2:28:46 PM
Process finished with exit code 1
The error message:
The implementation of the RichCoFlatMapFunction is not serializable. The object probably contains or references non serializable fields.
However, the only field my JoiningCoFlatMap has is the suggested ValueState.
The signature looks like this:
class JoiningCoFlatMap extends RichCoFlatMapFunction[Measurement, WhoMeasured, (Measurement, String)] {
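One plausible reading of that stack trace, offered as an assumption rather than a confirmed diagnosis: the Caused by names the job class itself (de.company.project.jobs.MeasurementJob), which is what happens when JoiningCoFlatMap is declared inside MeasurementJob and therefore captures a hidden reference to the enclosing, non-serializable job. Declaring the function at the top level (or in a companion object) removes that capture:
// Hypothetical restructuring: a top-level class carries no $outer reference,
// so Flink only has to serialize the function and its state handle
class JoiningCoFlatMap extends RichCoFlatMapFunction[Measurement, WhoMeasured, (Measurement, String)] {
  // ValueState setup and flatMap1/flatMap2 exactly as in the answer below
}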
I think what you want to do is a window operation followed by a join.
You can implement a join of a high-volume stream and a low-volume update-by-key stream using a stateful CoFlatMapFunction, as in the example below:
val measures: DataStream[Measurement] = ???
val who: DataStream[WhoMeasured] = ???

val agg: DataStream[(Int, Long)] = measures
  .keyBy(_._2) // measured_by_id
  .timeWindow(Time.seconds(60), Time.seconds(5))
  .apply( (id: Int, w: TimeWindow, v: Iterable[(Int, Int, String)], out: Collector[(Int, Long)]) => {
    // do your aggregation
  })

val joined: DataStream[(Int, Long, String)] = agg
  .keyBy(_._1) // measured_by_id
  .connect(who.keyBy(_.who_measured_id))
  .flatMap(new JoiningCoFlatMap)

// CoFlatMapFunction
class JoiningCoFlatMap extends RichCoFlatMapFunction[(Int, Long), WhoMeasured, (Int, Long, String)] {

  var names: ValueState[String] = null

  override def open(conf: Configuration): Unit = {
    val stateDescrptr = new ValueStateDescriptor[String](
      "whoMeasuredName",
      classOf[String],
      "" // default value
    )
    names = getRuntimeContext.getState(stateDescrptr)
  }

  override def flatMap1(a: (Int, Long), out: Collector[(Int, Long, String)]): Unit = {
    // join with state
    out.collect( (a._1, a._2, names.value()) )
  }

  override def flatMap2(w: WhoMeasured, out: Collector[(Int, Long, String)]): Unit = {
    // update state
    names.update(w.name)
  }
}
A note on the implementation: a CoFlatMapFunction cannot decide which input to process; the flatMap1 and flatMap2 functions are called depending on what data arrives at the operator, and this cannot be controlled by the function. That is a problem when initializing the state: in the beginning, the state might not have the correct name for an arriving Measurement object and would return the default value. You can avoid that by buffering the measurements and joining them once the first update for the key from the who stream arrives; you'll need another piece of state for that, as sketched below.
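A hedged sketch of that buffering variant (my own illustration, using the same old-style state API as the answer above): hold measurements in a second piece of state until the first WhoMeasured record for the key arrives, then flush:
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.util.Collector

class BufferingJoiningCoFlatMap
    extends RichCoFlatMapFunction[(Int, Long), WhoMeasured, (Int, Long, String)] {

  private var name: ValueState[String] = _
  private var pending: ValueState[List[(Int, Long)]] = _

  override def open(conf: Configuration): Unit = {
    name = getRuntimeContext.getState(
      new ValueStateDescriptor[String]("whoMeasuredName", classOf[String], null))
    pending = getRuntimeContext.getState(
      new ValueStateDescriptor[List[(Int, Long)]](
        "pendingMeasurements", classOf[List[(Int, Long)]], Nil))
  }

  override def flatMap1(a: (Int, Long), out: Collector[(Int, Long, String)]): Unit = {
    if (name.value() == null) {
      pending.update(a :: pending.value()) // no name for this key yet: buffer
    } else {
      out.collect((a._1, a._2, name.value())) // name known: join immediately
    }
  }

  override def flatMap2(w: WhoMeasured, out: Collector[(Int, Long, String)]): Unit = {
    name.update(w.name)
    pending.value().reverse.foreach(a => out.collect((a._1, a._2, w.name))) // flush buffer
    pending.update(Nil)
  }
}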
