Flink 1.6.0 job jar upload size limit - apache-flink

What's the job jar file size limit, and is there a chance I could override it?
With Flink 1.6.0 out, and with a fully RESTified job submission, I tried uploading a jar like this:
$ curl http://localhost:8081/jars/upload -X POST -F "jarfile=@word-count-beam/target/word-count-beam-bundled-0.1.jar" --verbose
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST /jars/upload HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 108716165
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=------------------------ab44aa4cd2db3c75
>
* Done waiting for 100-continue
< HTTP/1.1 413 Request Entity Too Large
< content-length: 0
* HTTP error before end of send, stop sending
<
* Closing connection 0
but I get:
413 Request Entity Too Large
The actual jar file size is:
$ du -h word-count-beam/target/word-count-beam-bundled-0.1.jar
113M word-count-beam/target/word-count-beam-bundled-0.1.jar
I'm running Flink in Docker using the 1.6.0-scala_2.11 image.
UPDATE: it's the same when uploading from the Web UI.
NOTE: the jar upload feature worked with Flink 1.5 (Docker).

@robosoul, I think there is a REST size limit in the config; by default the max size is 104857600 bytes (100 MB), so it looks like you are exceeding the limit.
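If raising the limit is an option, a minimal sketch (assuming your build exposes the rest.server.max-content-length and rest.client.max-content-length options, which govern this limit and default to 104857600) would be to bump both in flink-conf.yaml, e.g. to 200 MB:
rest.server.max-content-length: 209715200
rest.client.max-content-length: 209715200
Since you're on Docker, one way to apply the change is to mount the edited file over the image's default (path per the official image layout):
$ docker run -v $(pwd)/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml flink:1.6.0-scala_2.11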

Related

hawkBit swupdate Suricatta: HTTP/1.1 401 Unauthorized

I want to set up hawkBit (running on a server) and swupdate (running on multiple clients with Linux OS) to perform OS/software updates in Suricatta mode.
1/ Following up on my post in the hawkBit community, I've succeeded in running hawkBit on my server as below:
Exported to external link: http://:
Enabled MariaDB
Enabled Gateway Token Authentication (in hawkBit system configuration)
Created a software module
Uploaded an artifact
Created a distribution set
Assigned the software module to the distribution set
Created a Target (in the Deployment Management UI) with Target ID "dev01"
Created a Rollout
Created a Target Filter
2/ I've succeeded in building/executing swupdate per the SWUpdate guideline:
Enabled Suricatta daemon mode
Run swupdate: /usr/bin/swupdate -v -k /etc/public.pem -u '-t DEFAULT -u http://<domain>:<port> -i dev01'
I'm pretty sure this command isn't correct; the output log is below:
* Trying <ip address>...
* TCP_NODELAY set
* Connected to <domain> (<ip address>) port <port> (#0)
> GET /DEFAULT/controller/v1/10 HTTP/1.1
Host: <domain>:<port>
User-Agent: libcurl-agent/1.0
Content-Type: application/json
Accept: application/json
charsets: utf-8
< HTTP/1.1 401 Unauthorized
< Date: Sun, 16 May 2021 02:43:40 GMT
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< Content-Length: 0
<
* Connection #0 to host <domain> left intact
[TRACE] : SWUPDATE running : [channel_log_effective_url] : Channel's effective URL resolved to http://<domain>:<port>/DEFAULT/controller/v1/dev01
[ERROR] : SWUPDATE failed [0] ERROR corelib/channel_curl.c : channel_get : 1109 : Channel operation returned HTTP error code 401.
[DEBUG] : SWUPDATE running : [suricatta_wait] : Sleeping for 45 seconds.
As per a suggestion from @laverman on Gitter:
You can use Gateway token in the Auth header of the request, e.g. “Authorization : GatewayToken a56cacb7290a8d8a96a2f149ab2f23d1”
but I don't know how the client sends this request (it should be sent by swupdate, right?)
3/ Following the instructions from the Tutorial @ EclipseCon Europe 2019, I was able to send requests to provision multiple clients from the hawkBit Device Simulator. The problem is how to apply this to real devices.
Another point of confusion: when creating a new Software Module or Distribution in the hawkBit UI, I can't find their IDs, but when creating them by sending requests as in the Tutorial, I can see the IDs in the response.
So my questions are:
1/ Are my hawkBit setup steps correct?
2/ How can I configure/run swupdate (on clients) to perform the update: poll for new software, download, update, report status, ...
If my description isn't clear enough, please tell me.
Thanks
Happy to see that you're trying out hawkBit for your solution!
I have a few remarks:
The suricatta parameter for GatewayToken is -g, and -k for TargetToken, respectively.
The -g <GATEWAY_TOKEN> needs to be set inside the quotation marks;
see the SWUpdate documentation.
Example: /usr/bin/swupdate -v -u '-t DEFAULT -u http://<domain>:<port> -i dev01 -g 76430e1830c56f2ea656c9bbc88834a3'
For GatewayToken authentication, you need to provide the token generated in the System Config view; it is a generated hash that looks similar to the one in the example above.
You can also authenticate each device/client separately using their own TargetToken.
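(Hypothetically, mirroring the example above, a per-device setup would pass the target's own token via -k inside the suricatta argument string: /usr/bin/swupdate -v -u '-t DEFAULT -u http://<domain>:<port> -i dev01 -k <TARGET_TOKEN>'.)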
You can find more information in the hawkBit documentation.
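If you want to sanity-check the gateway token outside of swupdate, a quick hypothetical curl poll against the DDI URL from your log (with the example token above) would look like:
$ curl -v http://<domain>:<port>/DEFAULT/controller/v1/dev01 -H 'Authorization: GatewayToken 76430e1830c56f2ea656c9bbc88834a3'
A 200 response with a JSON body, instead of the 401, confirms the token is accepted.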

haproxy returns 404 errors for about 2-3 seconds if one of the backend servers is down

This is the haproxy config:
defaults
option forwardfor
log global
option httplog
log 127.0.0.1 local3
option dontlognull
retries 3
option redispatch
timeout connect 5000ms
timeout client 5000ms
timeout server 5000ms
listen stats
bind *:9000
mode http
..............................................
backend testhosts
mode http
balance roundrobin
option httpchk HEAD /sabrix/scripts/menu-common.js
server host1 11.11.11.11:9080 check inter 2000 rise 1 fall 2
server host2 11.11.11.12:9080 check inter 2000 rise 1 fall 2
If the service on 11.11.11.11 is down, haproxy returns 503 and 404 errors for about 2-3 seconds (it depends on the inter value; if the inter value is very small, the number of 404 errors decreases).
2020-08-25T11:58:14 11.11.11.11:9080 200 POST /tsturl1 HTTP/1.1 2274
2020-08-25T11:58:14 11.11.11.22:9080 200 POST /tsturl1 HTTP/1.1 448
2020-08-25T11:58:14 11.11.11.11:9080 503 POST /tsturl1 HTTP/1.1 0
2020-08-25T11:58:14 11.11.11.11:9080 404 POST /tsturl1 HTTP/1.1 0
2020-08-25T11:58:14 11.11.11.11:9080 200 POST /tsturl1 HTTP/1.1 1503
2020-08-25T11:58:16 11.11.11.22:9080 200 POST /tsturl1 HTTP/1.1 617
2020-08-25T11:58:16 11.11.11.11:9080 404 POST /tsturl1 HTTP/1.1 0
2020-08-25T11:58:16 11.11.11.22:9080 200 POST /tsturl1 HTTP/1.1 618
2020-08-25T11:58:16 11.11.11.11:9080 404 POST /tsturl1 HTTP/1.1 0
2020-08-25T11:58:16 host1 is DOWN, reason: Layer7 wrong status, code: 404, info: "Not Found", check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2020-08-25T11:58:16 11.11.11.22:9080 200 POST /tsturl1 HTTP/1.1 645
2020-08-25T11:58:16 11.11.11.22:9080 200 POST /tsturl1 HTTP/1.1 618
My question is:
Why didn't the retries parameter work? Is it possible for users to always get a 200 rather than a 404/503 error, even when one of the backend servers is down?
I'm using HAProxy 1.5.18.
Thanks a lot
In the version you are using, retries applies at Layer 4 (i.e. connection failures and timeouts). HAProxy 2.0 introduced Layer 7 retries. These two blog posts may be helpful:
HAProxy 2.0
Layer 7 Retries and Chaos Engineering
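If upgrading is an option, here is a minimal sketch of a Layer 7 retry setup in HAProxy 2.0+ using the retry-on directive (verify the exact keyword list against the 2.x docs):
backend testhosts
mode http
balance roundrobin
retries 3
retry-on conn-failure 404 503
option redispatch
option httpchk HEAD /sabrix/scripts/menu-common.js
server host1 11.11.11.11:9080 check inter 2000 rise 1 fall 2
server host2 11.11.11.12:9080 check inter 2000 rise 1 fall 2
With retry-on matching the 404/503 responses, a request that hits the dying server can be redispatched to the healthy one instead of surfacing the error to the client.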

Google App Engine Flex environment: ETag HTTP header removed when resource is gzip'd?

I have a custom Node.js application deployed on the Google App Engine flexible environment that uses dynamically calculated digest hashes to set ETag HTTP response headers for specific resources. This works fine on an AWS EC2 instance, but not on the Google App Engine flexible environment; in some cases Google App Engine appears to remove my application's custom ETag HTTP response header, and this severely degrades the performance of the application. It will also be needlessly expensive.
Specifically, it appears that Google App Engine flex environment strips my application's ETag header when it gzip's eligible resources.
For example, if I use curl to request a utf8::application/json resource AND do not indicate that I will accept the response in compressed format, then everything works as I would expect: the resource is returned along with my custom ETag header, which is a digest hash of the resource's data.
curl https://viewpath5.appspot.com/javascript/client-app-bundle.js --verbose
... we get client-app-bundle.js as an uncompressed UTF-8 resource, along with an ETag HTTP response header whose value is a digest hash of the JavaScript file's data.
However, if I emulate my browser and set the Accept-Encoding HTTP request header to indicate to Google App Engine that my user agent (here curl) will accept a compressed resource, then I never get the ETag HTTP response header.
UNCOMPRESSED (WORKING) CASE:
$ curl --verbose https://xxxxxxxx.appspot.com/javascript/client-app-bundle.js
* Hostname was NOT found in DNS cache
* Trying zzz.yyy.xxx.www...
* Connected to xxxxxxxx.appspot.com (zzz.yyy.xxx.www) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
* subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.appspot.com
* start date: 2019-05-07 11:31:13 GMT
* expire date: 2019-07-30 10:54:00 GMT
* subjectAltName: xxxxxxxx.appspot.com matched
* issuer: C=US; O=Google Trust Services; CN=Google Internet Authority G3
* SSL certificate verify ok.
> GET /javascript/client-app-bundle.js HTTP/1.1
> User-Agent: curl/7.38.0
> Host: xxxxxxxx.appspot.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 23 May 2019 00:24:06 GMT
< Content-Type: application/javascript
< Content-Length: 4153789
< Vary: Accept-Encoding
< ETag: @encapsule/holism::kiA2cG3c9FzkpicHzr8ftQ
< Cache-Control: must-revalidate
* Server @encapsule/holism v0.0.13 is not blacklisted
< Server: @encapsule/holism v0.0.13
< Via: 1.1 google
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
<
/******/ (function(modules) { // webpackBootstrap
/******/ // The module cache
/******/ var installedModules = {};
/******/
... and lots more JavaScript. Importantly, note the ETag in the HTTP response headers.
COMPRESSED (FAILING) CASE:
$ curl --verbose -H "Accept-Encoding: gzip" https://xxxxxxxx.appspot.com/javascript/client-app-bundle.js
* Hostname was NOT found in DNS cache
* Trying zzz.yyy.xxx.www...
* Connected to xxxxxxxx.appspot.com (zzz.yyy.xxx.www) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
* subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.appspot.com
* start date: 2019-05-07 11:31:13 GMT
* expire date: 2019-07-30 10:54:00 GMT
* subjectAltName: xxxxxxxx.appspot.com matched
* issuer: C=US; O=Google Trust Services; CN=Google Internet Authority G3
* SSL certificate verify ok.
> GET /javascript/client-app-bundle.js HTTP/1.1
> User-Agent: curl/7.38.0
> Host: xxxxxxxx.appspot.com
> Accept: */*
> Accept-Encoding: gzip
>
< HTTP/1.1 200 OK
< Date: Thu, 23 May 2019 00:27:15 GMT
< Content-Type: application/javascript
< Vary: Accept-Encoding
< Cache-Control: must-revalidate
* Server @encapsule/holism v0.0.13 is not blacklisted
< Server: @encapsule/holism v0.0.13
< Content-Encoding: gzip
< Via: 1.1 google
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
< Transfer-Encoding: chunked
<
�}{{G���˧�(�.rb�6`����1ƀw���,���4�$�23,���UU_碋-��
No ETag?
To me it seems incorrect that my application's custom ETag HTTP response header is removed; shouldn't the gzip compression on the server and subsequent decompression in the user agent be wholly encapsulated as an implementation detail of the network transport?
This behavior is caused by the NGINX proxy sidecar container that handles requests on GAE Flex.
NGINX removes ETag headers when compressing content, perhaps to comply with the strong identity semantics of the ETag (a strong ETag asserts byte-for-byte identity, which gzip breaks), but I'm not sure of that.
Unfortunately there is no way to configure the NGINX proxy in GAE Flex (other than manually SSHing into the container on each instance, changing nginx.conf, and restarting the NGINX proxy).
The only workaround I know of is to loosen the ETag strictness by making it "weak": prepend "W/" to the value, as specified in https://www.rfc-editor.org/rfc/rfc7232.
This is not documented. There's already an internal feature request to the App Engine documentation team to include this behavior in our public documentation.
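For illustration, assuming the application can emit the weak form, the response header would change from
ETag: @encapsule/holism::kiA2cG3c9FzkpicHzr8ftQ
to
ETag: W/"@encapsule/holism::kiA2cG3c9FzkpicHzr8ftQ"
and a quick (hypothetical) check that it now survives compression, using the URL from the question:
$ curl -sI -H "Accept-Encoding: gzip" https://xxxxxxxx.appspot.com/javascript/client-app-bundle.js | grep -i etag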

Google's resumable video upload status API endpoint for Google Drive is failing with HTTP 400: "Failed to parse Content-Range header."

In order to resume an interrupted upload to Google Drive, we have implemented status requests to Google's API following this guide.
https://developers.google.com/drive/v3/web/resumable-upload#resume-upload
Request:
PUT
https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable&upload_id=UPLOAD_ID HTTP/1.1
Content-Length: 0
Content-Range: bytes */*
It works perfectly in most cases. However, the following error occurs occasionally, and even retries of the same call result in the same erroneous response.
Response:
HTTP 400: "Failed to parse Content-Range header."
We are using the google.appengine.api.urlfetch Python library to make this request in our Python App Engine backend.
Any ideas?
EDIT:
I could replicate this issue using cURL:
curl -X PUT 'https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable&upload_id=UPLOAD_ID' -H 'Content-Length: 0' -vvvv -H 'Content-Range: bytes */*'
Response:
* Trying 172.217.25.138...
* Connected to www.googleapis.com (172.217.25.138) port 443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 697 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
* server certificate verification OK
* server certificate status verification SKIPPED
* common name: *.googleapis.com (matched)
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: RSA
* certificate version: #3
* subject: C=US,ST=California,L=Mountain View,O=Google Inc,CN=*.googleapis.com
* start date: Thu, 16 Mar 2017 08:54:00 GMT
* expire date: Thu, 08 Jun 2017 08:54:00 GMT
* issuer: C=US,O=Google Inc,CN=Google Internet Authority G2
* compression: NULL
* ALPN, server accepted to use http/1.1
> PUT /upload/drive/v3/files?uploadType=resumable&upload_id=UPLOAD_ID HTTP/1.1
> Host: www.googleapis.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 0
> Content-Range: bytes */*
>
< HTTP/1.1 400 Bad Request
< X-GUploader-UploadID: UPLOAD_ID
< Content-Type:
< X-GUploader-UploadID: UPLOAD_ID
< Content-Length: 37
< Date: Thu, 23 Mar 2017 22:45:58 GMT
< Server: UploadServer
< Content-Type: text/html; charset=UTF-8
< Alt-Svc: quic=":443"; ma=2592000; v="37,36,35"
<
* Connection #0 to host www.googleapis.com left intact
Failed to parse Content-Range header.
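For what it's worth, the linked guide also shows the status request with a known total size instead of */*; a hypothetical variant for a 2000000-byte upload would be:
curl -X PUT 'https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable&upload_id=UPLOAD_ID' -H 'Content-Length: 0' -H 'Content-Range: bytes */2000000' -vvvv
Whether that sidesteps the parser error here is untested.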

GCP HTTP Load Balancer returns 502 error if POST data is large

We have an API server and are using the HTTP Load Balancer. We found that the L7 load balancer returns a 502 error if the HTTP request's data is large.
We have confirmed that it works when accessing the API without the Load Balancer (accessing the API Server directly.)
This question might be a similar issue: HTTP Load Balancer cuts out part of a large request body.
Someone said that using an L4 Network Load Balancer is a possible solution, but we don't want to use it for several reasons, e.g. URL-based load balancing and cross-region load balancing.
// Response OK (data size is 1024)
curl -H "Content-Type: application/json" -X POST -d '{"xx": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"}' https://xxxxxxxxxxxxxxx.com/xx/xxxxxxxxxxxx/xxxxxxxxx
// Response NG (data size is 1025)
curl -H "Content-Type: application/json" -X POST -d '{"xx": "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"}' https://xxxxxxxxxxxxxxx.com/xx/xxxxxxxxxxxx/xxxxxxxxx
It seems that the LB has some limitation on the size of POST data. Tests show the limit is around 1024 bytes.
Update 1
@chaintng saved me. Someone on the linked post says that curl adds an "Expect: 100-continue" header if the POST data is over 1024 bytes.
// Response NG (data size is 1025. without "Expect: ")
curl -H "Content-Type: application/json" -X POST -d '{"xx": "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"}' https://xxxxxxxxxxxxxxx.com/xx/xxxxxxxxxxxx/xxxxxxxxx
// Response OK (data size is 1025. with "Expect: ")
curl -H "Expect: " -H "Content-Type: application/json" -X POST -d '{"xx": "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"}' https://xxxxxxxxxxxxxxx.com/xx/xxxxxxxxxxxx/xxxxxxxxx
Reference, from the question Curl to Google Compute load balancer gets error 502:
It's because curl, by default, adds an Expect: 100-continue header when sending a large POST body,
which is not supported by Google L7 Load Balancing (stated in this document: https://cloud.google.com/compute/docs/load-balancing/http/).
All you have to do is suppress this behaviour by setting the header yourself before executing the request.
For example, in PHP:
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Expect:']);
