I am quite new to Airflow and am trying to use the integration of Apache Airflow with Google Pub/Sub, which I believe was added under the "AIRFLOW-300" JIRA ticket. Please correct me if I am reading this incorrectly.
Also, can you please advise whether this has been released, and if not, when it will be? We want to add notifications on Google Cloud Storage so that, upon any file event, a workflow is triggered in Airflow.
I can't seem to find any documentation around how to use it.
Any advice would be highly appreciated.
The Pub/Sub integration has already been introduced in Airflow.
Publish message
from base64 import b64encode as b64e

from airflow.contrib.operators.pubsub_operator import PubSubPublishOperator

# Message data must be base64-encoded
m1 = {'data': b64e(b'Hello, World!'),
      'attributes': {'type': 'greeting'}}
m2 = {'data': b64e(b'Knock, knock')}
m3 = {'attributes': {'foo': ''}}

t1 = PubSubPublishOperator(
    task_id='publish_task',  # task_id is required by every Airflow operator
    project='my_project',    # added to match the sensor example below
    topic='my_topic',
    messages=[m1, m2, m3],
    create_topic=True,
    dag=dag)
Receive message
from airflow.contrib.sensors.pubsub_sensor import PubSubPullSensor

t2 = PubSubPullSensor(
    task_id='pub_sub_wait',
    project='my_project',
    subscription='my-subscription',
    ack_messages=True,
    dag=dag)
The pulled messages are returned by the sensor, so downstream tasks can read them from XCom.
Reference:
https://github.com/apache/incubator-airflow/commit/d231dce37d753ed196a26d9b244ddf376385de38
https://github.com/apache/incubator-airflow/commit/6645218092096e4b10fc737a62bacc2670e1d6dc
Adding to #user1849502's answer, you can also use PubSubHook:
PubSubHook().publish(project, topic, messages)
PubSubHook().pull(project, subscription, max_messages, return_immediately)
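For example, here is a minimal sketch of driving the hook from a PythonOperator (the task and function names are illustrative):
from base64 import b64encode

from airflow.contrib.hooks.gcp_pubsub_hook import PubSubHook
from airflow.operators.python_operator import PythonOperator

def publish_greeting():
    # The hook expects a list of message dicts with base64-encoded data
    PubSubHook().publish(
        'my-project', 'my_topic',
        [{'data': b64encode(b'Hello, World!').decode()}])

t3 = PythonOperator(
    task_id='publish_via_hook',
    python_callable=publish_greeting,
    dag=dag)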
Reference https://airflow.readthedocs.io/en/stable/_modules/airflow/contrib/hooks/gcp_pubsub_hook.html
I have a GCP Cloud Function publishing events into an output Pub/Sub topic. If the output topic is pre-created, all is fine.
However, I'd like the Cloud Function to auto-create the topic if it does not already exist. Is that possible?
I have this code for publishing:
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
data = json.dumps(event).encode("utf-8")
topic_path = publisher.topic_path(project, topic)
publisher.publish(topic_path, data)
I know I could add a call to create the topic first:
topic = publisher.create_topic(request={"name": topic_path})
and catch an exception if it tries to create an existing topic,
but this sounds more like a dirty hack to me, not to mention an unnecessary call to the publisher API and wasted cycles handling the call and exceptions on every CF invocation.
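For concreteness, here is a sketch of the workaround I'd like to avoid (the client raises google.api_core.exceptions.AlreadyExists for an existing topic):
from google.api_core.exceptions import AlreadyExists
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project, topic)

try:
    # Extra API round-trip on every invocation, even when the topic exists
    publisher.create_topic(request={"name": topic_path})
except AlreadyExists:
    pass  # topic already exists; nothing to do

publisher.publish(topic_path, data)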
Thank you!
I got an email saying that the legacy GAE and GCF metadata server endpoints will be turned down on April 30, 2020.
I need to update my metadata server endpoints to v1, but how do I know the current versions of my metadata server endpoints?
I have checked the Google Cloud documentation on migrating to the v1 metadata server. It gives two commands, but I really don't know what they mean or where they have to be run.
I took a look at the documentation and tried these two commands:
curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/legacy-endpoint-access/0.1
curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/legacy-endpoint-access/v1beta1
but ended up with an error saying
curl: (6) Could not resolve host: metadata.google.internal
When I point the request at my localhost instead, I get the following output:
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.4.6 (Ubuntu)</center>
</body>
</html>
I don't know how to proceed further.
Please help me.
Thank you in advance!
Searching around, as per the documentation Storing and retrieving instance metadata, the v0.1 and v1beta1 metadata endpoints are deprecated and it's recommended to move to v1.
I would recommend you read the documentation Migrating to v1 metadata server endpoint, which provides more information on how to migrate to the v1 metadata server.
Note that metadata.google.internal only resolves from inside a Google Cloud instance, which is why your curl commands fail on your local machine and your localhost returns a generic nginx 404.
Let me know if the information helped you!
After a thorough reading of the documentation, I have understood that my metadata server endpoints will be automatically updated to v1 by gcloud.
The only thing we are supposed to do is find the processes, applications, or images that are still using the deprecated metadata server endpoints and update their gcloud-related dependencies to the latest version.
That's it! It is successfully updated to the v1 metadata server.
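If you want to check the v1 endpoint from code rather than with curl, here is a minimal sketch (it only works from inside a Google Cloud instance, since metadata.google.internal does not resolve elsewhere):
import requests

# v1 metadata requests must carry the Metadata-Flavor header;
# metadata.google.internal only resolves inside Google Cloud.
resp = requests.get(
    "http://metadata.google.internal/computeMetadata/v1/instance/id",
    headers={"Metadata-Flavor": "Google"},
)
print(resp.text)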
I'm using google-cloud-pubsub in Django and the Google Cloud Pub/Sub Emulator.
I'm trying to create a topic in this way:
from google.cloud import pubsub

publisher = pubsub.PublisherClient()
topic_path = publisher.topic_path('my-project', 'my-new-topic')
topic = publisher.create_topic(topic_path)
# publish via the client, not the Topic resource
publisher.publish(topic_path, request.data['url'].encode('utf-8'))
but it gives me this exception:
StatusCode.PERMISSION_DENIED, User not authorized to perform this action.
I'm wondering if there is anything else to configure except the PUBSUB_EMULATOR_HOST environment variable.
Do I have to set some permissions, even for the emulator?
The tutorial doesn't explain much more.
Thanks in advance.
4 years late :) but this should solve it:
export PUBSUB_EMULATOR_HOST=localhost:8085
Why
The command in the Google tutorial (gcloud beta emulators pubsub env-init) doesn't set the environment variable by itself; it only prints the export statement. So your Pub/Sub API calls go to the real service with your service account instead of the local emulator, which results in permission errors.
Set it manually and verify it's set using echo $PUBSUB_EMULATOR_HOST.
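If you prefer to set the variable from code, here is a minimal sketch (project and topic names are illustrative; the variable must be set before the client is created):
import os

from google.cloud import pubsub_v1

# With PUBSUB_EMULATOR_HOST set, the client talks to the local emulator
# and needs no credentials.
os.environ["PUBSUB_EMULATOR_HOST"] = "localhost:8085"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-new-topic")
publisher.create_topic(request={"name": topic_path})
publisher.publish(topic_path, b"hello")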
Suddenly my gapi client stopped sending request params to the endpoint.
This is what my code looks like:
Load the gapi JS
https://apis.google.com/js/client.js?onload=initGoogleApis
in initGoogleApis
function initGoogleApis() {
  var ROOT = HOST + "/_ah/api";
  gapi.client.load("userendpoint", "v1", function() {
    userendpoint = gapi.client.userendpoint;
  }, ROOT);
}
Now when I call userendpoint.<some function>, it is not passing the request params to the endpoint.
NOTE: it was working fine until this morning.
Anyone else facing the same issue? (this might be due to some update in the gapi library)
This issue has been resolved as of yesterday, 2014-09-23 08:00 (US Pacific Time).
Details about this issue can be found in the Google App Engine Downtime Notify group.
However, the 'Google APIs Client Library for JavaScript' is still in beta, and breaking changes have been rolled out more than once. Cloud Endpoints themselves are out of beta and can be used in production.
Now, to properly answer this question:
The simple advice here is: don't use beta products for production applications.
To avoid problems with the Google APIs Client Library for JavaScript, just don't use it. You can write your own REST API client that will not be affected by changes to Google's JavaScript library. I have done this for testing purposes a couple of times, and it is not hard, just a lot of work depending on how many endpoints you have and how complex they are.
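For illustration, a minimal hand-rolled client could look like this (the app URL and endpoint path are hypothetical; Cloud Endpoints methods are served as plain REST under /_ah/api/<api>/<version>/...):
import requests

# Hypothetical Cloud Endpoints method: "userendpoint" v1, fetching a user.
ROOT = "https://your-app.appspot.com/_ah/api"

resp = requests.get(ROOT + "/userendpoint/v1/user/123")
resp.raise_for_status()
print(resp.json())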
We have the same problem on two projects.
I think Google has deployed a new version of "https://apis.google.com/js/client.js" and it doesn't work as expected...
We need to open a ticket with Google support. If I have any news I will report back here.
Google reports (https://groups.google.com/forum/#!topic/google-appengine-downtime-notify/t9GElAJwj8U):
We are currently experiencing an issue with Google Cloud Endpoints where the GAPI Javascript client is unable to pass request parameters. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by Tuesday, 2014-09-23 05:00 (all times are in US/Pacific) with current details, and if available an estimated time for resolution.
Update:
We have fixed the issue affecting the Google Cloud Endpoints JavaScript client and are gradually rolling out a fixed version. We estimate full resolution of the issue by 06:30 US/Pacific. We will provide an update by 06:00 AM.
Update:
Now it works for me.
Marco
UPDATE: Please, if anyone can help: Google is waiting for inputs and examples of this problem on their bug tracking tool. If you have reproducible steps for this issue, please share them on: https://code.google.com/p/googleappengine/issues/detail?id=10937
I'm trying to fetch data from the StackExchange API using a Google App Engine backend. As you may know, some of StackExchange's APIs are site-specific, requiring developers to run queries against every site the user is registered in.
So, here's my backend code for fetching timeline data from these sites. The feed_info_site variable holds the StackExchange site name (such as 'security', 'serverfault', etc.).
data = json.loads(urllib.urlopen("%sme/timeline?%s" % (
    self.API_BASE_URL,
    urllib.urlencode({
        "pagesize": 100,
        "fromdate": se_since_timestamp,
        "filter": "!9WWBR(nmw",
        "site": feed_info_site,
        "access_token": decrypt(self.API_ACCESS_TOKEN_SECRET, self.access_token),
        "key": self.API_APP_KEY,
    }))).read())
for item in data['items']:
... # code for parsing timeline items
When running this query on all sites except Stack Overflow, everything works OK. What's weird is that, when the feed_info_site variable is set to 'stackoverflow', I get the following error from Google App Engine:
HTTPException: Invalid and/or missing SSL certificate for URL:
https://api.stackexchange.com/2.2/me/timeline?filter=%219WWBR%28nmw&access_token=<ACCESS_TOKEN_REMOVED>&fromdate=1&pagesize=100&key=<API_KEY_REMOVED>&site=stackoverflow
Of course, if I run the same query in Safari, I get the JSON results I'm expecting from the API. So the problem really lies in Google's URLFetch service. I found several topics here on Stack Overflow about similar HTTPS/SSL exceptions, but no accepted answer solved my problem. I tried removing cacerts.txt files. I also tried making the call with validate_certificate=False, with no success.
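For reference, the validate_certificate attempt looked roughly like this (a sketch using App Engine's urlfetch API):
import json

from google.appengine.api import urlfetch

# Same request via urlfetch with certificate validation disabled;
# this still failed when site=stackoverflow.
result = urlfetch.fetch(
    "https://api.stackexchange.com/2.2/me/timeline?site=stackoverflow",
    validate_certificate=False)
data = json.loads(result.content)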
I think the problem is not strictly related to HTTPS/SSL. If it were, how would you explain that changing a single API parameter makes the request fail?
Wait for the next update to App Engine (one is scheduled soon), then update.
Replace browserid.org/verify with another service (verifier.login.persona.org/verify is a good service hosted by Mozilla that could be used).
Make sure cacerts.txt doesn't exist (it looks like you have that sorted, but just in case :-) ).
Attempt again.
Good luck!
-Brendan
I was facing the same error. Google has updated App Engine now and the error is resolved; please check the updated docs.