[Apache Flink]: Where is the flink-s3-fs-hadoop plugin?

I would like to read and write some data from S3 with Apache Flink 1.11.2. The documentation recommends using the Presto plugin for checkpoints and the Hadoop plugin for pipeline data.
According to that section you have to copy the plugin JARs from opt/ into the plugins/ directory. I can find flink-s3-fs-presto-1.11.2.jar under opt/, but there is no flink-s3-fs-hadoop-1.11.2.jar. Where can I find the s3-hadoop plugin for setting up my production environment?
And how can I use these plugins in the IDE? By simply adding them to pom.xml as provided dependencies? And how can I then pass the credentials in the IDE?

That is weird; I can see that both are present under opt/ in the official 1.11.1 binaries. However, if you can't find them, you can simply get the JARs from Maven here and copy them to the required place. Another thing that may work is adding the dependency to the project with compile scope.
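For the dependency route, a pom.xml entry along these lines is a reasonable sketch (version 1.11.2 to match the distribution in the question):

<!-- Sketch: pull the S3 Hadoop filesystem onto the project classpath for IDE/local runs.
     Use the version matching your Flink distribution. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-s3-fs-hadoop</artifactId>
    <version>1.11.2</version>
    <scope>compile</scope>
</dependency>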
Running the job locally is described here. There are various ways of configuring the credentials when running the job in the IDE; one option is adding a core-site.xml with the proper configuration to the resources folder.
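One possible core-site.xml along those lines, using the standard Hadoop s3a properties (the values are obviously placeholders):

<!-- Sketch: src/main/resources/core-site.xml with S3 credentials for local runs. -->
<configuration>
    <property>
        <name>fs.s3a.access.key</name>
        <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>YOUR_SECRET_KEY</value>
    </property>
</configuration>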
EDIT:
As for local execution, it is explained here a little.

Related

Flink: Error while running the flink program on CLI

I am trying to run a Flink streaming program that uses the Kafka connector (the latest universal connector).
The job runs without any problem in IntelliJ, but when I submit the JAR built with sbt package I get the error below:
java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase
I also tried the JAR built using the traditional IntelliJ option, but I still get the same error.
Most probably the issue is that you are not including the dependencies in your JAR file. Connector dependencies are not included in the Flink binary distribution.
Generally, the preferred way of tackling this is to use the proper plugin for your build tool, such as the Shade plugin for Maven or sbt-assembly for sbt, to create a so-called fat JAR, i.e. a JAR with the dependencies included.
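For the Maven route, a minimal Shade plugin sketch might look like this, binding the shade goal to the package phase so that mvn package produces the fat JAR; with sbt you would reach for sbt-assembly instead:

<!-- Sketch: bundle the Kafka connector and other non-provided dependencies into one JAR. -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>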

NoClassDefFoundError: Could not initialize OauthRawGcsServiceFactory on production environment

I'm using appengine-sdk 1.9.3.
On the dev server it works normally in Eclipse and with Ant.
When I deploy (update) to appengine (production environment), I get this error:
event.getResults(): [Error for /p7/formPanelServlet
java.lang.NoClassDefFoundError: Could not initialize class
com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsServiceFactory
at com.google.appengine.tools.cloudstorage.GcsServiceFactory.createRawGcsService(GcsServiceFactory.java:42)
at com.google.appengine.tools.cloudstorage.GcsServiceFactory.createGcsService(GcsServiceFactory.java:34)
at com.bitvisio.p7.server.FormPanelServlet.<init>(FormPanelServlet.java:27)
At FormPanelServlet.java:27, the code is:
private final GcsService gcsService = GcsServiceFactory
        .createGcsService(new RetryParams.Builder()
                .initialRetryDelayMillis(10)
                .retryMaxAttempts(10)
                .totalRetryPeriodMillis(15000)
                .build());
I put the lib appengine-gcs-client-0.3.9.jar in war/WEB-INF/lib. I think there is a problem with this lib.
Thanks for the help.
Always use tools like Maven or Ivy to resolve dependencies for you. Copying JARs into the war/WEB-INF/lib/ directory and editing the .classpath file manually is painful and may not always help. If you use Eclipse and the Google App Engine plugin, use Add Google APIs... as mentioned here - Google Plugin for Eclipse. In my case, adding the Cloud Storage API via the Google Plugin for Eclipse resolved this NoClassDefFoundError.
I had the same problem. I use Ivy to resolve dependencies and always take latest.integration (with Maven, use RELEASE) for revisions.
However, I usually ignore transitive libraries. It looks like Google is expanding the API family - at least splitting out discrete functionality.
There are now quite a few transitive dependencies, and it seems they released a new version of the GCS client around the same time as 1.9.3.
Getting all the dependencies and packaging them in my deployment fixed my issue. I did not have the issue in development, which made it more confusing.
You are facing this issue because you are not adding some of the required JARs, such as:
google-api-services-storage-v1-rev78-1.22.0.jar
joda-time-2.9.4.jar
guava-19.0.jar
You can use different versions of these JARs according to your appengine-gcs-client JAR.
Note: add all of these JARs to the project build path and the problem will be solved.
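If the project is Mavenized, declaring the client once and letting Maven pull in the transitive JARs (storage API, joda-time, guava, ...) is usually less error-prone than copying them by hand. A sketch, assuming the com.google.appengine.tools coordinates and the 0.3.9 version mentioned in the question:

<!-- Sketch: let Maven resolve the GCS client and its transitive dependencies
     instead of copying individual JARs into war/WEB-INF/lib. -->
<dependency>
    <groupId>com.google.appengine.tools</groupId>
    <artifactId>appengine-gcs-client</artifactId>
    <version>0.3.9</version>
</dependency>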

DataNucleus libraries and maven-gae-plugin

I'm using maven-gae-plugin to manage a Google AppEngine project but I don't know how to include the libraries required to use JPA.
Google's documentation says:
The classpath must contain the JARs 'datanucleus-core-*.jar', 'datanucleus-jpa-*', 'datanucleus-enhancer-*.jar', 'asm-*.jar', and 'geronimo-jpa-*.jar' (where * is the appropriate version number of each JAR) from the 'appengine-java-sdk/lib/tools/' directory, as well as all of your data classes.
How can I tell the plugin to put all the jars in the classpath?
So far I have just edited the pom.xml file, setting gae.version to 1.7.3 (leaving datanucleus.version at 1.1.5), and run mvn gae:unpack, but I cannot get it to work.
First, I have problems with javax.persistence, which is not found. Do I have to add it manually to pom.xml?
If I do, the development server starts, but I cannot work with the storage; I get the following error:
SEVERE: Found Meta-Data for class com.sharecost.entities.User but this class is not enhanced!! Please enhance the class before running DataNucleus.
I found a solution to the second part of my question. Looking at the pom.xml file I discovered that all entities are supposed to be in a **/model package.
I still don't know if the manual inclusion of the javax.persistence dependency is actually required.
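That **/model restriction usually comes from the enhancer configuration in the generated pom.xml. A rough sketch of what the maven-datanucleus-plugin section tends to look like (plugin version and exact parameter names may differ between plugin generations, so treat this as an illustration only):

<!-- Rough sketch: run the DataNucleus enhancer at build time.
     The include pattern is why entities outside **/model do not get enhanced. -->
<plugin>
    <groupId>org.datanucleus</groupId>
    <artifactId>maven-datanucleus-plugin</artifactId>
    <version>1.1.4</version>
    <configuration>
        <api>JPA</api>
        <mappingIncludes>**/model/*.class</mappingIncludes>
        <verbose>true</verbose>
    </configuration>
    <executions>
        <execution>
            <phase>process-classes</phase>
            <goals>
                <goal>enhance</goal>
            </goals>
        </execution>
    </executions>
</plugin>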

DataNucleus Enhancer doesn't work

I'm writing a web app using Google AppEngine and Spring MVC. I carefully upgraded to v2 of the DataNucleus plugin by following these steps: http://code.google.com/p/datanucleus-appengine/wiki/UpgradingToVersionTwo (I use Eclipse).
When I try to run the Enhancer Tool I get following error:
Exception in thread "main" Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL
"file:/.../eclipse/plugins/com.google.appengine.eclipse.sdkbundle_1.6.4.v201203300216r37/appengine-java-sdk-1.6.4/lib/opt/user/datanucleus/v2/datanucleus-core-3.0.6.jar" is already registered, and you are trying to register an identical plugin located at URL
"file:/.../eclipse/plugins/com.google.appengine.eclipse.sdkbundle_1.6.4.v201203300216r37/appengine-java-sdk-1.6.4/lib/opt/tools/datanucleus/v2/datanucleus-core-3.0.6.jar."
I formatted the message so that you can see the tiny difference: one JAR is loaded from the "user" directory, the other one from the "tools" directory. I don't understand why. In the project build path there is only the one from "user", and to the DataNucleus configuration I added the one from "tools", just as the how-to above suggests.
In other cases I've seen, this message was mostly caused by conflicting versions of the DataNucleus plugin, but that doesn't apply to me. I guess it's just some silly mistake in my case... so what am I doing wrong?
So, after all, I didn't read the instructions as carefully as I thought. The problem was really that the JARs were there twice: once in the project build path and once in the DataNucleus configuration. It shouldn't be in both (in fact it doesn't matter which one you remove it from). I had added it to the build path automatically when I copied the libs to the war directory and assumed it had to be done, but the instructions clearly say that only jdo-api needs to be in the project build path.
One thing I don't understand though. In one step of the instructions I had to uncheck "use project classpath when running tools" in the DataNuclues configuration. So how is it possible that the plugin was still using the libs configured in the project build path?

Is it suggested to use the Cargo Maven plugin in a production environment?

Currently we have a script that does a Maven build plus a Tomcat deploy.
Deploying to Tomcat involves stopping the server (force stop by setting CATALINA_PID), deleting the WAR file and the exploded folder from its webapps directory, copying the new WAR file to webapps/, deleting the Tomcat work directory and finally starting the server.
Can we achieve all or most of this using the cargo-maven2 plugin (including cleaning work/ and webapps/)?
[I have a basic config that can be used to just stop, deploy and start.]
And is it good to use the plugin in a production environment as well? The documentation mentions that it is useful for integration tests.
Can you share your thoughts on this?
Thanks,
Gayathri
As you already mentioned from the docs, Cargo is meant for functional testing. For production you should use other tools like Puppet or Chef. Furthermore, Maven is not a deployment tool; it's a build tool. From a technical point of view it is of course possible to use it in production, but that's not what it is intended for.
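That said, the stop/deploy/start part of the script can be modelled with Cargo against an existing local Tomcat. A minimal sketch under those assumptions (container id, paths and plugin version are placeholders; cleaning webapps/ and the work directory is not covered by it):

<!-- Sketch: drive an already-installed local Tomcat with Cargo.
     For a war-packaged module, the project's own WAR should be picked up as the deployable. -->
<plugin>
    <groupId>org.codehaus.cargo</groupId>
    <artifactId>cargo-maven2-plugin</artifactId>
    <version>1.4.9</version>
    <configuration>
        <container>
            <containerId>tomcat7x</containerId>
            <type>installed</type>
            <home>/opt/tomcat</home>
        </container>
        <configuration>
            <type>existing</type>
            <home>/opt/tomcat</home>
        </configuration>
    </configuration>
</plugin>

With something like this, mvn cargo:stop cargo:deploy cargo:start roughly corresponds to the manual stop/copy/start steps, while the work-directory cleanup would stay in your script.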
