Flink Rest API : /jars/upload returning 404 - apache-flink

Following is the code snippet I use to upload a jar to Flink. I am getting a 404 response for this POST request; the output is shown below. I also tried changing the URL to /v1/jars/upload, but got the same response. All of the jar-related APIs give me the same response. I am running this code inside an AWS Lambda that sits in the same VPC as the EMR cluster running my Flink job. APIs like /config and /jobs work from this Lambda; only APIs like uploading a jar or submitting a job fail with a 404.
<Response [404]> {"errors":["Not found: /jars/upload"]}
I also tried the same thing by logging directly into the JobManager node and running a curl command, but got the same response. I am using Flink 1.14.2 on an EMR cluster.
curl -X POST -H "Expect:" \
  -F "jarfile=@/home/hadoop/test-1.0-global-14-dyn.jar" \
  http://ip-10-0-1-xxx:8081/jars/upload
{"errors":["Not found: /jars/upload"]}
import json
import requests
import boto3
import botocore.exceptions
import os

def lambda_handler(event, context):
    config = dict(
        service_name="s3",
        region_name="us-east-1"
    )
    s3_ = boto3.resource(**config)
    bucket = "dv-stream-processor-na-gamma"
    prefix = ""
    file = "Test-1.0-global-14-dyn.jar"
    path = "/tmp/" + file
    try:
        s3_.Bucket(bucket).download_file(f"{file}", "/tmp/" + file)
    except botocore.exceptions.ClientError as e:
        print(e.response['Error']['Code'])
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
    print(os.path.isfile('/tmp/' + file))
    response = requests.post(
        "http://ip-10-0-1-xx.ec2.internal:8081/jars/upload",
        files={
            "jarfile": (
                os.path.basename(path),
                open(path, "rb"),
                "application/x-java-archive"
            )
        }
    )
    print(response)
    print(response.text)

The reason the jar upload was not working for me was that I was using Flink's "Per-Job" cluster mode, which does not allow submitting jobs via the REST API. I switched the cluster to "Session" mode and it started working.
References for Flink cluster mode information:
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/overview/
Code you can refer to for starting a cluster in session mode: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#starting-a-flink-session-on-yarn

Related

Why are 404 codes returned from Twitter when using aiohttp and Python 3.8?

I've written a Python script to validate URLs in a website. The script has been used for many years and uses aiohttp to check multiple links in parallel.
Recently (like the last few days), checks against our websites have reported 404 errors when checking links to Twitter like https://twitter.com/linaroorg. The same links work with curl and Postman.
I've extracted a minimal amount of the relevant code from the larger link-checker script in order to figure out what is happening. What I've found is that if I change from aiohttp to requests-async, the code works. Since my script and the installed copy of aiohttp haven't changed, I suspect that Twitter might have changed something at their end that requests-async copes with (somehow) but aiohttp doesn't.
import asyncio
import aiohttp

async def async_url_validation(session, url):
    """ Validate the URL. """
    async with session.get(url) as response:
        return response.status

async def async_check_web(session, links):
    """ Check all external links. """
    results = await asyncio.gather(
        *[async_url_validation(session, url) for url in links]
    )
    # That gets us a collection of the responses, matching up to each of
    # the tasks, so loop through the links again and the index counter
    # will point to the corresponding result.
    i = 0
    for link in links:
        print(link, results[i])
        i += 1

async def check_unique_links():
    UNIQUE_LINKS = [
        "https://twitter.com/linaroorg",
        "https://www.linaro.org"
    ]
    async with aiohttp.ClientSession() as session:
        await async_check_web(session, UNIQUE_LINKS)

loop = asyncio.get_event_loop()
cul_result = loop.run_until_complete(check_unique_links())
loop.close()
What I do find intriguing is that if I print the full response within async_url_validation then the response from Twitter includes this:
'Content-Length': '1723'
suggesting that Twitter is replying successfully and that it might be something within aiohttp that is triggering the 404 response code.
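For reference, a minimal variant of async_url_validation that dumps the status and headers for inspection (a debugging sketch, not the exact code used above):
import aiohttp

async def async_url_validation(session, url):
    """ Validate the URL and dump the full response for debugging. """
    async with session.get(url) as response:
        # Print status and headers to compare aiohttp's view with curl/Postman.
        print(url, response.status, dict(response.headers))
        return response.status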
Versions:
aiohttp: 3.8.1
multidict: 6.0.2
yarl: 1.7.2
Python: 3.8.10
Ubuntu: 20.04
Interestingly, if I install Python 3.10.4 onto the same system, the script then works ... but I don't know why and I don't know what has happened to cause just Twitter links to break the code. The reason I picked Python 3.10.4 is because I tried my test script on Ubuntu 22.04 and it worked ... and that seemed to be the principal difference.

Using Solr V2 API to update Solr config in standalone mode throws SolrException "Solr not running in cloud mode"

I am trying to modify the config with the V2 API in Solr running in standalone mode.
The request is built as follows:
V2Request v2Request = new V2Request.Builder(String.format("/collections/%s/config", collectionName))
        .withMethod(SolrRequest.METHOD.POST)
        .withPayload(actionPayLoad)
        .build();
That results in the SolrException: "Solr not running in cloud mode "
It appears that the V2 HTTP request is handled by org.apache.solr.api.V2HttpCall (Maven: org.apache.solr:solr-core:7.0.0), which requires ZooKeeper to be running:
protected DocCollection getDocCollection(String collectionName) {
    if (!cores.isZooKeeperAware()) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Solr not running in cloud mode ");
    }
Is there any equivalent Config API call for Solr running in standalone mode so without Zookeeper?
I just noticed from https://lucene.apache.org/solr/guide/7_0/config-api.html that it should work in a similar fashion:
The Config API enables manipulating various aspects of your solrconfig.xml using REST-like API calls.
This feature is enabled by default and works similarly in both SolrCloud and standalone mode. Many commonly edited properties (such as cache sizes and commit settings) and request handler definitions can be changed with this API.
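For illustration, a minimal sketch of exercising the Config API over plain HTTP in standalone mode with Python's requests library (the host, port, and core name are placeholder assumptions; set-property is the command documented in the guide linked above):
import requests

# Placeholder standalone Solr host and core name.
config_url = "http://localhost:8983/solr/mycore/config"

# set-property works on commonly edited properties such as autoCommit settings.
payload = {"set-property": {"updateHandler.autoCommit.maxTime": 15000}}

response = requests.post(config_url, json=payload)
print(response.status_code, response.text)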
Here is a solution to update the Solr config through the V1 API:
Collection<ContentStream> contentStreams = ClientUtils.toContentStreams(actionPayLoad, "application/json; charset=UTF-8");
GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.POST, String.format("/%s/config", collectionName), null);
request.setContentStreams(contentStreams);
request.setUseV2(false);

How to pass classpath via rest api in apache flink 1.9

We successfully start jobs via the CLI, something like:
./bin/flink run -p 1 -C file://tmp/test-fatjar.jar -c ru.test.TestApps test.jar * some arguments*
Also, we can successfully run this job via the API if we register the fat jar first; the JSON looks like:
{
    "entryClass": "ru.test.TestApps",
    "parallelism": "1",
    "programArgsList": [ *** cut *** ]
}
How to pass classpath (argument -C) via api?
Thank you.
There is no equivalent to the general classpath option of the CLI. The REST API always expects you to use a fat jar. Since your example also uses a fat jar, I'd point out the general flow:
Upload your fat jar with /jars/upload. The response contains the filename (= jar id).
POST to /jars/:jarid/run to start your job. The response contains the job id, which you can use to query the status or cancel the job; see the sketch below.
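A rough sketch of that flow with Python's requests library (the JobManager address, jar path, and run parameters are placeholders taken from the question, not verified values):
import os
import requests

# Placeholder JobManager REST endpoint and fat jar path.
base_url = "http://localhost:8081"
jar_path = "/tmp/test-fatjar.jar"

# 1. Upload the fat jar; the returned "filename" ends with the jar id.
with open(jar_path, "rb") as jar:
    upload = requests.post(
        base_url + "/jars/upload",
        files={"jarfile": (os.path.basename(jar_path), jar, "application/x-java-archive")},
    )
upload.raise_for_status()
jar_id = os.path.basename(upload.json()["filename"])

# 2. Run the job from the uploaded jar; the response contains the job id.
run = requests.post(
    base_url + "/jars/" + jar_id + "/run",
    json={
        "entryClass": "ru.test.TestApps",
        "parallelism": 1,
        "programArgsList": ["--arg1", "value1"],
    },
)
run.raise_for_status()
print(run.json()["jobid"])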

Error 404 for the url http://localhost:8081/jars/:jarid/run

I have a jar file that is already uploaded to the Flink cluster. I'm using Flink 1.6.0.
Here is the result after I uploaded the jar file:
address "http://localhost:8081"
files
0
id "1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar"
name "NicoWordCount.jar"
uploaded 1537174925000
entry
0
name "WordCount"
description null
When I run the following url
"http://localhost:8081/jars/1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar/run"
it returns: Failure: 404 Not Found
When I run
"http://localhost:8081/jars/1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar/plan"
it returns a result.
When I run NicoWordCount.jar from the Flink dashboard, it also runs well and gives the expected result.
What am I doing wrong?
Which HTTP method do you use?
Run should be executed with POST. For more info on Flink's REST API, check this doc.
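For example, a minimal sketch of calling the run endpoint with POST from Python (the jar id is the one from the question; additional run parameters such as the entry class or program arguments are omitted here):
import requests

jar_id = "1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar"

# /jars/:jarid/run must be called with POST, not GET.
response = requests.post("http://localhost:8081/jars/" + jar_id + "/run")
print(response.status_code, response.text)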

Testing Google Cloud PubSub push endpoints locally

Trying to figure out the best way to test PubSub push endpoints locally. We tried with ngrok.io, but you must own the domain in order to whitelist (the tool for doing so is also broken… resulting in an infinite redirect loop). We also tried emulating PubSub locally. I am able to publish and pull, but I cannot get the push subscriptions working. We are using a local Flask webserver like so:
@app.route('/_ah/push-handlers/events', methods=['POST'])
def handle_message():
    print request.json
    return jsonify({'ok': 1}), 200
The following produces no result:
client = pubsub.Client()
topic = client.topic('events')
topic.create()
subscription = topic.subscription('test_push', push_endpoint='http://localhost:5000/_ah/push-handlers/events')
subscription.create()
topic.publish('{"test": 123}')
It does yell at us when we attempt to create a subscription to an HTTP endpoint (whereas live PubSub will if you do not use HTTPS). Perhaps this is by design? Pull works just fine… Any ideas on how to best develop PubSub push endpoints locally?
Based on the latest PubSub library documentation at the time of writing, the following example creates a subscription with a push configuration.
Requirements
I have tested with the following requirements:
Google Cloud SDK 285.0.1 (for PubSub local emulator)
Python 3.8.1
Python packages (requirements.txt):
flask==1.1.1
google-cloud-pubsub==1.3.1
Run PubSub emulator locally
export PUBSUB_PROJECT_ID=fake-project
gcloud beta emulators pubsub start --project=$PUBSUB_PROJECT_ID
By default, PubSub emulator starts on port 8085.
The project argument can be anything; its value does not matter.
Flask server
Consider the following server.py:
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/_ah/push-handlers/events', methods=['POST'])
def handle_message():
    print(request.json)
    return jsonify({'ok': 1}), 200

if __name__ == "__main__":
    app.run(port=5000)
Run the server (starts on port 5000):
python server.py
PubSub example
Consider the following pubsub.py:
import sys
from google.cloud import pubsub_v1

if __name__ == "__main__":
    project_id = sys.argv[1]

    # 1. create topic (events)
    publisher_client = pubsub_v1.PublisherClient()
    topic_path = publisher_client.topic_path(project_id, "events")
    publisher_client.create_topic(topic_path)

    # 2. create subscription (test_push with push_config)
    subscriber_client = pubsub_v1.SubscriberClient()
    subscription_path = subscriber_client.subscription_path(
        project_id, "test_push"
    )
    subscriber_client.create_subscription(
        subscription_path,
        topic_path,
        push_config={
            'push_endpoint': 'http://localhost:5000/_ah/push-handlers/events'
        }
    )

    # 3. publish a test message
    publisher_client.publish(
        topic_path,
        data='{"test": 123}'.encode("utf-8")
    )
Finally, run this script:
PUBSUB_EMULATOR_HOST=localhost:8085 \
PUBSUB_PROJECT_ID=fake-project \
python pubsub.py $PUBSUB_PROJECT_ID
Results
Then, you can see the results in the Flask server's log:
{'subscription': 'projects/fake-project/subscriptions/test_push', 'message': {'data': 'eyJ0ZXN0IjogMTIzfQ==', 'messageId': '1', 'attributes': {}}}
127.0.0.1 - - [22/Mar/2020 12:11:00] "POST /_ah/push-handlers/events HTTP/1.1" 200 -
Note that you can retrieve the message sent, encoded here in base64 (message.data):
$ echo "eyJ0ZXN0IjogMTIzfQ==" | base64 -d
{"test": 123}
Of course, you can also do the decoding in Python.
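For example, a minimal sketch of doing the same decoding in Python with the payload shown above:
import base64
import json

# message.data from the push payload shown above, base64-encoded.
encoded = "eyJ0ZXN0IjogMTIzfQ=="
decoded = base64.b64decode(encoded).decode("utf-8")
print(json.loads(decoded))  # {'test': 123}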
This could be a known bug (fix forthcoming) in the emulator where push endpoints created along with the subscription don't work. The bug only affects the initial push config; modifying the push config for an existing subscription should work. Can you try that?
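A minimal sketch of that workaround with google-cloud-pubsub 1.3.1, reusing the emulator setup, project, subscription, and endpoint from the answer above (run it with PUBSUB_EMULATOR_HOST=localhost:8085 set):
from google.cloud import pubsub_v1

project_id = "fake-project"

subscriber_client = pubsub_v1.SubscriberClient()
subscription_path = subscriber_client.subscription_path(project_id, "test_push")

# Set the push endpoint on the already-existing subscription instead of
# passing push_config at creation time.
push_config = pubsub_v1.types.PushConfig(
    push_endpoint="http://localhost:5000/_ah/push-handlers/events"
)
subscriber_client.modify_push_config(subscription_path, push_config)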
I failed to get the PubSub emulator to work in my local environment (it fails with various Java exceptions), so I never got to try features like push with auth. I ended up using ngrok to expose my local dev server and used the public HTTPS URL from ngrok in the PubSub subscription.
I had no issue with whitelisting or redirects as described in the question.
This might be helpful for someone else.
