Zeppelin: imported classes not found when used - apache-zeppelin

I get a weird error when using Spark on Zeppelin: the imported classes are not found when I use them. The code sample is:
%spark
import java.io.Serializable
import java.text.{ParseException, SimpleDateFormat}
import java.util.{Calendar, SimpleTimeZone}

class Pos(val pos: String) extends Serializable {
  if (pos.length != 12) {
    throw new IllegalArgumentException(s"[${pos}] seems not a valid pos string")
  }

  private val cstFormat = new SimpleDateFormat("yyyyMMddHHmm")
  private val utcFormat = new SimpleDateFormat("yyyyMMddHHmm")
}
I get the following errors:
import java.io.Serializable
import java.text.{ParseException, SimpleDateFormat}
import java.util.{Calendar, SimpleTimeZone}
<console>:17: error: not found: type SimpleDateFormat
private val cstFormat = new SimpleDateFormat("yyyyMMddHHmm")
^
<console>:18: error: not found: type SimpleDateFormat
private val utcFormat = new SimpleDateFormat("yyyyMMddHHmm")
^
<console>:25: error: not found: type ParseException
case e: ParseException => throw newIllegalArgumentException(s"
^
Is there any way to solve this error?
The Zeppelin version is 0.7.3 and the Spark version is 2.1.
Thanks in advance!

It seems you have to write the imports inside the class definition:
%spark
class Pos(val pos: String) extends Serializable {
  import java.io.Serializable
  import java.text.{ParseException, SimpleDateFormat}
  import java.util.{Calendar, SimpleTimeZone}

  if (pos.length != 12) {
    throw new IllegalArgumentException(s"[${pos}] seems not a valid pos string")
  }

  private val cstFormat = new SimpleDateFormat("yyyyMMddHHmm")
  private val utcFormat = new SimpleDateFormat("yyyyMMddHHmm")
}
If you need the imports for the arguments of your class constructor, you can define your class inside an object and then call YourObject.YourClass(args) in the following paragraphs. See this question for another example.

In Zeppelin, you have to put all the imports on the same line as the class definition, separated with ;, to make it work:
import java.io.Serializable; import java.text.{ParseException, SimpleDateFormat}; import java.util.{Calendar, SimpleTimeZone}; class Pos(val pos: String) extends Serializable {
  if (pos.length != 12) {
    throw new IllegalArgumentException(s"[${pos}] seems not a valid pos string")
  }

  private val cstFormat = new SimpleDateFormat("yyyyMMddHHmm")
  private val utcFormat = new SimpleDateFormat("yyyyMMddHHmm")
}

Related

Why are arguments passed to the constructor of an operator function class null in Flink?

I am studying Flink. I want to build an operator function that extends ProcessWindowFunction, with a new constructor that takes a parameter and stores it as a field of the class. But when the class instance actually runs, the field value is gone, which confuses me. The code is as follows.
import com.aliyun.datahub.client.model.Field;
import com.aliyun.datahub.client.model.FieldType;
import com.aliyun.datahub.client.model.PutRecordsResult;
import io.github.streamingwithflink.chapter8.PoJoElecMeterSource;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class DataHubSinkDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        env.enableCheckpointing(10_000L);
        env.setParallelism(2);

        RecordSchemaSer schema = new RecordSchemaSer();
        schema.addField(new Field("id", FieldType.STRING));

        DataStream<PutRecordsResult> out = env
                .addSource(new PoJoElecMeterSource())
                .keyBy(r -> r.getId())
                .window(TumblingProcessingTimeWindows.of(Time.seconds(3)))
                .process(new PutDatahubFunction<>(schema)); // PutDatahubFunction is my own operator function class

        env.execute();
    }
}
The variable schema is the parameter I want to pass to the constructor; it is an instance of the RecordSchemaSer class:
import com.aliyun.datahub.client.model.RecordSchema;
import java.io.Serializable;

public class RecordSchemaSer
        extends RecordSchema
        implements Serializable {
}
PutDatahubFunction is a class that extends ProcessWindowFunction; the code is as follows:
import com.aliyun.datahub.client.model.*;
import io.github.streamingwithflink.chapter8.PUDAPoJo;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class PutDatahubFunction<IN extends PUDAPoJo, KEY>
        extends ProcessWindowFunction<IN, PutRecordsResult, KEY, TimeWindow> {
    private DataHubBase dataHubHandler;
    private List<RecordEntry> recordEntries;
    private RecordSchema schema;

    public PutDatahubFunction(RecordSchema schema) {
        this.schema = schema;
        System.out.println("field 'id' not exist ? " + this.schema.containsField("id")); // it's true
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        .........
    }

    @Override
    public void process(KEY KEY,
                        Context context,
                        Iterable<IN> elements,
                        Collector<PutRecordsResult> out)
            throws Exception {
        RecordEntry entry = new RecordEntry();
        for (IN e : elements) {
            System.out.println("field 'id' not exist ? " + this.schema.containsField("id")); // it's false
            ......
        }
    }
}
The first System.out.println, in the constructor, shows that this.schema.containsField("id") is true, but the second one, in the process method, shows that it is false! Why? I also printed the class name of the instance in both places, and both are PutDatahubFunction.
Using ValueState does not work either, because the constructor cannot call getRuntimeContext(); it throws Exception in thread "main" java.lang.IllegalStateException: The runtime context has not been initialized. Code as follows:
private ValueState<RecordSchema> schema;

public PutTupleDatahubFunction(RecordSchema schema) throws IOException {
    ValueStateDescriptor schemaDes =
            new ValueStateDescriptor("datahub schema", TypeInformation.of(RecordSchema.class));
    /*
     * error: Exception in thread "main" java.lang.IllegalStateException:
     * The runtime context has not been initialized.
     */
    this.schema = getRuntimeContext().getState(schemaDes);
    this.schema.update(schema);
}
I am very confused. Can anyone tell me the reason? Is there any way to pass arguments to the constructor of this operator function class? Thanks.
I finally figured out why: the reason is serialization and deserialization. My RecordSchemaSer is an empty class, so the schema content inherited from RecordSchema (which presumably is not Serializable itself) is not written when the function is serialized, and the copy that is deserialized on the workers ends up with null/empty content:
public class RecordSchemaSer
        extends RecordSchema
        implements Serializable {
}
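A minimal sketch of one possible workaround, assuming the goal is still to pass the field layout in through the constructor: ship only plainly serializable values (here, field names) with the function object and rebuild the schema on each worker in open(). RecordSchemaSer, Field, FieldType and PUDAPoJo are the classes from the question; the rest is illustrative, not a tested implementation.

import com.aliyun.datahub.client.model.Field;
import com.aliyun.datahub.client.model.FieldType;
import com.aliyun.datahub.client.model.PutRecordsResult;
import com.aliyun.datahub.client.model.RecordSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class PutDatahubFunction<IN extends PUDAPoJo, KEY>
        extends ProcessWindowFunction<IN, PutRecordsResult, KEY, TimeWindow> {

    // Plain strings survive Java serialization of the function without any tricks.
    private final String[] fieldNames;

    // Rebuilt per worker in open(); transient so nothing relies on serializing
    // state that lives in the non-serializable RecordSchema superclass.
    private transient RecordSchema schema;

    public PutDatahubFunction(String... fieldNames) {
        this.fieldNames = fieldNames;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        RecordSchemaSer s = new RecordSchemaSer();
        for (String name : fieldNames) {
            s.addField(new Field(name, FieldType.STRING)); // assumes STRING fields only
        }
        this.schema = s;
    }

    @Override
    public void process(KEY key, Context context, Iterable<IN> elements,
                        Collector<PutRecordsResult> out) throws Exception {
        // schema is now fully populated on the worker
        System.out.println("field 'id' exists? " + schema.containsField("id"));
    }
}

The driver would then call new PutDatahubFunction<>("id") instead of handing over the schema object. As for the ValueState variant, getRuntimeContext() only becomes available once the function is running (for example in open()), which is why calling it in the constructor throws the IllegalStateException shown above.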

Apache Beam 2.1.0: Unable to upload to Datastore after following example

I'm having trouble uploading entities to the Cloud Datastore via the Apache Beam Java SDK (2.1.0). The following is my code:
import com.google.cloud.datastore.DatastoreOptions
import com.google.cloud.datastore.Entity
import com.opencsv.CSVParser
import org.apache.beam.runners.dataflow.DataflowRunner
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.DoFn
import org.apache.beam.sdk.transforms.MapElements
import org.apache.beam.sdk.transforms.ParDo
import org.apache.beam.sdk.transforms.SimpleFunction
import java.io.Serializable

object PipelineClass {
    class FoodGroup(var id: String? = null,
                    var group: String? = null) : Serializable

    class CreateGroupsFn : SimpleFunction<String, FoodGroup>() {
        override fun apply(line: String?): FoodGroup {
            val group = FoodGroup()
            val parser = CSVParser()
            val parts = parser.parseLine(line)
            group.id = parts[0].trim()
            group.group = parts[1].trim()
            return group
        }
    }

    class CreateEntitiesFn : DoFn<FoodGroup, Entity>() {
        @ProcessElement
        fun processElement(c: ProcessContext) {
            val datastore = DatastoreOptions.getDefaultInstance().service
            val keyFactory = datastore.newKeyFactory()
                    .setKind("FoodGroup")
                    .setNamespace("nutrients")
            val key = datastore.allocateId(keyFactory.newKey())
            val entity = Entity.newBuilder(key)
                    .set("id", c.element().id)
                    .set("group", c.element().group)
                    .build()
            c.output(entity)
        }
    }

    @JvmStatic fun main(args: Array<String>) {
        val options = PipelineOptionsFactory.`as`(DataflowPipelineOptions::class.java)
        options.runner = DataflowRunner::class.java
        options.project = "simplesample"
        options.jobName = "fgUpload"

        val pipeline = Pipeline.create(options)
        pipeline.apply(TextIO.read().from("gs://bucket/foodgroup.csv"))
                .apply(MapElements.via(CreateGroupsFn()))
                .apply(ParDo.of(CreateEntitiesFn()))
                //error occurs below...
                .apply(DatastoreIO.v1().write()
                        .withProjectId(options.project))
        pipeline.run()
    }
}
The following is the error I get:
PipelineClass.kt: (75, 24): Type mismatch: inferred type is DatastoreV1.Write! but PTransform<in PCollection<Entity!>!, PDone!>! was expected
I've tried SimpleFunction, DoFn, and PTransform (composite and non-composite) to do the transform from String to Entity with no success.
What am I doing wrong?
EDIT: I've finally managed to get my entities into Datastore. I decided to use Dataflow 1.9.1 and ditched Beam (2.1.0) after seeing this example. Below is my code:
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.io.datastore.DatastoreIO;
import com.google.cloud.dataflow.sdk.options.Default;
import com.google.cloud.dataflow.sdk.options.Description;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.datastore.v1.Entity;
import com.google.datastore.v1.Key;
import com.opencsv.CSVParser;
import javax.annotation.Nullable;
import java.util.UUID;
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

public class PipelineClass {
    static class CreateEntitiesFn extends DoFn<String, Entity> {
        private final String namespace;
        private final String kind;
        private final Key ancestorKey;

        CreateEntitiesFn(String namespace, String kind) {
            this.namespace = namespace;
            this.kind = kind;
            ancestorKey = makeAncestorKey(namespace, kind);
        }

        Entity makeEntity(String id, String group) {
            Entity.Builder entityBuilder = Entity.newBuilder();
            Key.Builder keyBuilder = makeKey(ancestorKey, kind, UUID.randomUUID().toString());
            if (namespace != null) {
                keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
            }
            entityBuilder.setKey(keyBuilder.build());
            entityBuilder.getMutableProperties().put("id", makeValue(id).build());
            entityBuilder.getMutableProperties().put("group", makeValue(group).build());
            return entityBuilder.build();
        }

        @Override
        public void processElement(ProcessContext c) throws Exception {
            CSVParser parser = new CSVParser();
            String[] parts = parser.parseLine(c.element());
            String id = parts[0];
            String group = parts[1];
            c.output(makeEntity(id, group));
        }
    }

    static Key makeAncestorKey(@Nullable String namespace, String kind) {
        Key.Builder keyBuilder = makeKey(kind, "root");
        if (namespace != null) {
            keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        return keyBuilder.build();
    }

    public interface Options extends PipelineOptions {
        @Description("Path of the file to read from and store to Cloud Datastore")
        @Default.String("gs://bucket/foodgroup.csv")
        String getInput();
        void setInput(String value);

        @Description("Dataset ID to read from Cloud Datastore")
        @Default.String("simplesample")
        String getProject();
        void setProject(String value);

        @Description("Cloud Datastore Entity Kind")
        @Default.String("FoodGroup")
        String getKind();
        void setKind(String value);

        @Description("Dataset namespace")
        @Default.String("nutrients")
        String getNamespace();
        void setNamespace(@Nullable String value);

        @Description("Number of output shards")
        @Default.Integer(0)
        int getNumShards();
        void setNumShards(int value);
    }

    public static void main(String args[]) {
        PipelineOptionsFactory.register(Options.class);
        Options options = PipelineOptionsFactory.fromArgs(args).as(Options.class);
        Pipeline p = Pipeline.create(options);

        p.apply(TextIO.Read.named("ReadLines").from(options.getInput()))
                .apply(ParDo.named("CreateEntities").of(new CreateEntitiesFn(options.getNamespace(), options.getKind())))
                .apply(DatastoreIO.v1().write().withProjectId(options.getProject()));

        p.run();
    }
}

Flink CEP No Results Printed

I am trying to print out a string if "Hello" and "World" are found, using the Flink CEP library. My source is Kafka, and I am using the console producer to input the data. That part is working; I can print out what I enter into the topic. However, it will not print out my final message "The world is so nice!". It will not even print out that it entered the lambda. Below is the class:
package kafka;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;
import java.util.Map;
import java.util.Properties;

/**
 * Created by crackerman on 9/16/16.
 */
public class WordCount {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "localhost:9092");
        properties.put("zookeeper.connect", "localhost:2181");
        properties.put("group.id", "test");

        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        see.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<String> src = see.addSource(new FlinkKafkaConsumer08<>("complexString",
                new SimpleStringSchema(),
                properties));
        src.print();

        Pattern<String, String> pattern = Pattern.<String>begin("first")
                .where(evt -> evt.contains("Hello"))
                .followedBy("second")
                .where(evt -> evt.contains("World"));

        PatternStream<String> patternStream = CEP.pattern(src, pattern);

        DataStream<String> alerts = patternStream.flatSelect(
                (Map<String, String> in, Collector<String> out) -> {
                    System.out.println("Made it to the lambda");
                    String first = in.get("first");
                    String second = in.get("second");
                    System.out.println("First: " + first);
                    System.out.println("Second: " + second);
                    if (first.equals("Hello") && second.equals("World")) {
                        out.collect("The world is so nice!");
                    }
                });

        alerts.print();
        see.execute();
    }
}
Any help would be greatly appreciated.
Thanks!
The issue is the following line:
see.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
If that is removed, it works the way I expected it to.
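A plausible explanation, not stated in the answer, so treat it as an assumption: with TimeCharacteristic.EventTime the CEP operator only emits matches when watermarks advance, and this job never assigns timestamps or watermarks to the Kafka stream, so no match is ever released. A minimal sketch of the relevant part of main() under the two options:

Properties properties = new Properties(); // Kafka settings as in the question
properties.put("bootstrap.servers", "localhost:9092");
properties.put("zookeeper.connect", "localhost:2181");
properties.put("group.id", "test");

StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();

// Option 1: simply drop the EventTime line and run with the default processing time.
// see.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// Option 2: keep time-based semantics but let Flink attach timestamps and
// watermarks automatically as records enter the pipeline.
see.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

DataStream<String> src = see.addSource(new FlinkKafkaConsumer08<>("complexString",
        new SimpleStringSchema(),
        properties));

With either option, the rest of the pattern code from the question stays unchanged.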

Camel FTP with pollStrategy fails

I have a standard route with an FTP URI as the consumer endpoint, with a pollStrategy defined and added to the registry. However, I am getting the following error:
Caused by: java.lang.IllegalArgumentException: Could not find a suitable setter for property: pollStrategy as there isn't a setter method with same type: java.lang.String nor type conversion possible: No type converter available to convert from type: java.lang.String to the required type: org.apache.camel.spi.PollingConsumerPollStrategy with value #pollingStrategy
at org.apache.camel.util.IntrospectionSupport.setProperty(IntrospectionSupport.java:588)
at org.apache.camel.util.IntrospectionSupport.setProperty(IntrospectionSupport.java:616)
at org.apache.camel.util.IntrospectionSupport.setProperties(IntrospectionSupport.java:473)
at org.apache.camel.util.IntrospectionSupport.setProperties(IntrospectionSupport.java:483)
at org.apache.camel.util.EndpointHelper.setProperties(EndpointHelper.java:255)
at org.apache.camel.impl.DefaultComponent.setProperties(DefaultComponent.java:257)
at org.apache.camel.component.file.GenericFileComponent.createEndpoint(GenericFileComponent.java:67)
at org.apache.camel.component.file.GenericFileComponent.createEndpoint(GenericFileComponent.java:37)
at org.apache.camel.impl.DefaultComponent.createEndpoint(DefaultComponent.java:114)
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:568)
I have tried different combinations but always end up with this error. Can anyone spot what I am missing? My code seems fairly similar to the Camel unit tests I looked at. The route looks like this:
import org.apache.camel.*;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultPollingConsumerPollStrategy;
import org.apache.camel.spi.PollingConsumerPollStrategy;
import org.apache.camel.util.ServiceHelper;
import org.apache.commons.lang3.StringUtils;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import static org.apache.camel.builder.ProcessorBuilder.setBody;

public class Test extends RouteBuilder {
    final CamelContext camelContext = getContext();
    final org.apache.camel.impl.SimpleRegistry registry = new org.apache.camel.impl.SimpleRegistry();
    final org.apache.camel.impl.CompositeRegistry compositeRegistry = new org.apache.camel.impl.CompositeRegistry();
    private final CountDownLatch latch = new CountDownLatch(1);

    @Override
    public void configure() throws Exception {
        ExceptionBuilder.setup(this);
        compositeRegistry.addRegistry(camelContext.getRegistry());
        compositeRegistry.addRegistry(registry);
        ((org.apache.camel.impl.DefaultCamelContext) camelContext).setRegistry(compositeRegistry);
        registry.put("pollingStrategy", new MyPollStrategy());

        from("ftp://user@localhost/receive/in?password=1234&autoCreate=false&startingDirectoryMustExist=true&pollStrategy=#pollingStrategy&fileName=test.csv&consumer.delay=10m")
                .convertBodyTo(String.class)
                .log(LoggingLevel.INFO, "TEST", "${body} : ${headers}");
    }

    private class MyPollStrategy implements PollingConsumerPollStrategy {
        int maxPolls = 3;

        public boolean begin(Consumer consumer, Endpoint endpoint) {
            return true;
        }

        public void commit(Consumer consumer, Endpoint endpoint, int polledMessages) {
            if (polledMessages > maxPolls) {
                maxPolls = polledMessages;
            }
            latch.countDown();
        }

        public boolean rollback(Consumer consumer, Endpoint endpoint, int retryCounter, Exception cause) throws Exception {
            return false;
        }
    }
}
Note, if I remove the pollStrategy reference in the uri then everything works.
OK, found the solution... must have had one too many beers when working on this; it was a bit too obvious.
final CamelContext camelContext = getContext();
final org.apache.camel.impl.SimpleRegistry registry = new org.apache.camel.impl.SimpleRegistry();
final org.apache.camel.impl.CompositeRegistry compositeRegistry = new org.apache.camel.impl.CompositeRegistry();
That part should be in the configure method, not in the class-level field declarations.
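Concretely, that means those three lines (and the registry wiring that uses them) move into configure(); in the question's route it would look roughly like this. Everything below is the question's own code relocated, shown only as an illustrative sketch:

@Override
public void configure() throws Exception {
    // Resolve the context and build the registry here, once Camel has actually
    // injected a CamelContext into the RouteBuilder, instead of in field initializers.
    final CamelContext camelContext = getContext();
    final org.apache.camel.impl.SimpleRegistry registry = new org.apache.camel.impl.SimpleRegistry();
    final org.apache.camel.impl.CompositeRegistry compositeRegistry = new org.apache.camel.impl.CompositeRegistry();

    ExceptionBuilder.setup(this);
    compositeRegistry.addRegistry(camelContext.getRegistry());
    compositeRegistry.addRegistry(registry);
    ((org.apache.camel.impl.DefaultCamelContext) camelContext).setRegistry(compositeRegistry);
    registry.put("pollingStrategy", new MyPollStrategy());

    from("ftp://user@localhost/receive/in?password=1234&autoCreate=false&startingDirectoryMustExist=true&pollStrategy=#pollingStrategy&fileName=test.csv&consumer.delay=10m")
            .convertBodyTo(String.class)
            .log(LoggingLevel.INFO, "TEST", "${body} : ${headers}");
}

MyPollStrategy and the imports stay exactly as in the question; only the context/registry setup moves.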

how to force jettison to write an array, even if there is only one element in the array?

With the simplified example below:
I get the following, as expected:
{"person":{"name":"john","tags":["tag1","tag2"]}}
However, if I only set one tag, I get this:
{"person":{"name":"john","tags":"tag1"}}
And I was expecting to get this:
{"person":{"name":"john","tags":["tag1"]}}
That is, Jettison has removed the array for tags because there is only one element in it.
I think this is pretty unsafe.
How can I force Jettison to write an array even if there is only one element?
Note: I am aware that there are other alternatives to jettison, such as StAXON.
However, here I am asking how to achieve this using Jettison.
Please do not suggest another alternative to jettison.
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.*;
import java.io.*;
import javax.xml.bind.*;
import javax.xml.stream.XMLStreamWriter;
import org.codehaus.jettison.mapped.*;

public class JettisonTest {
    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Person.class);

        Person person = new Person();
        person.name = "john";
        person.tags.add("tag1");
        person.tags.add("tag2");

        Configuration config = new Configuration();
        MappedNamespaceConvention con = new MappedNamespaceConvention(config);
        Writer writer = new OutputStreamWriter(System.out);
        XMLStreamWriter xmlStreamWriter = new MappedXMLStreamWriter(con, writer);

        Marshaller marshaller = jc.createMarshaller();
        marshaller.marshal(person, xmlStreamWriter);
    }
}

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
class Person {
    String name;
    List<String> tags = new ArrayList<String>();
}
I found this: https://blogs.oracle.com/japod/entry/missing_brackets_at_json_one
It seems that adding a line to your context resolver to explicitly state that tags is an array is the way to do this, i.e.:
props.put(JSONJAXBContext.JSON_ARRAYS, "[\"tags\"]");
NB: I'm not familiar with Jettison, so have no personal experience to back this up; only the info on the above blog post.
// Jersey 1.x context resolver from the blog post; imports added for completeness.
import java.util.HashMap;
import java.util.Map;
import javax.ws.rs.ext.ContextResolver;
import javax.ws.rs.ext.Provider;
import javax.xml.bind.JAXBContext;
import com.sun.jersey.api.json.JSONJAXBContext;

@Provider
public class JAXBContextResolver implements ContextResolver<JAXBContext> {
    private JAXBContext context;
    // ArrayWrapper.class comes from the blog post; for the question above it would
    // be the Person class instead.
    private Class[] types = {ArrayWrapper.class};

    public JAXBContextResolver() throws Exception {
        Map props = new HashMap<String, Object>();
        props.put(JSONJAXBContext.JSON_NOTATION, "MAPPED");
        props.put(JSONJAXBContext.JSON_ROOT_UNWRAPPING, Boolean.TRUE);
        props.put(JSONJAXBContext.JSON_ARRAYS, "[\"tags\"]"); // STATE WHICH ELEMENT IS AN ARRAY
        this.context = new JSONJAXBContext(types, props);
    }

    public JAXBContext getContext(Class<?> objectType) {
        return (types[0].equals(objectType)) ? context : null;
    }
}
