Batching events from Google Pub/Sub and publishing the collection as a single event - apache-beam - google-cloud-pubsub

I am trying to batch events using Google Pub/Sub. I see that I can create micro-batches based on fixed windows, batch size, etc. However, I want to send the result as a collection of messages, i.e. (a) batch the events and (b) send the results of the batch as a single event. Do I need a custom transformation, or does Apache Beam have a built-in capability that will allow me to do that?
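For what it's worth, Beam's built-in GroupIntoBatches transform covers this pattern: it buffers elements per key and emits each group as a single Iterable, which a ParDo can then serialize into one Pub/Sub message. A minimal Java sketch, where the subscription, topic, key, batch size, and window length are placeholder assumptions:

    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.GroupIntoBatches;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.WithKeys;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;
    import org.joda.time.Duration;

    public class BatchEventsPipeline {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.apply("ReadEvents", PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/events-sub")) // placeholder
         .apply("Window", Window.into(FixedWindows.of(Duration.standardSeconds(30))))
         // GroupIntoBatches is stateful and needs a keyed input; a constant key
         // funnels everything through one batcher (shard the key for throughput).
         .apply("AddKey", WithKeys.<String, String>of("all")
                .withKeyType(TypeDescriptors.strings()))
         .apply("Batch", GroupIntoBatches.<String, String>ofSize(100))
         .apply("BatchToOneMessage", ParDo.of(
             new DoFn<KV<String, Iterable<String>>, PubsubMessage>() {
               @ProcessElement
               public void process(ProcessContext c) {
                 // Combine the batch into a single payload, e.g. a JSON array.
                 String payload = "[" + String.join(",", c.element().getValue()) + "]";
                 c.output(new PubsubMessage(
                     payload.getBytes(StandardCharsets.UTF_8), new HashMap<>()));
               }
             }))
         .apply("WriteBatch", PubsubIO.writeMessages()
                .to("projects/my-project/topics/batched-events")); // placeholder
        p.run();
      }
    }

GroupIntoBatches emits a group once the batch size (or the end of the window) is reached, so no custom transformation is strictly required; the ParDo only decides how the batch is serialized into one message.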

Related

How to subscribe to Salesforce connected app webhooks?

I want to implement a connected OAuth app in Salesforce that triggers push events when certain entities change, for example when an opportunity is closed.
Zapier implemented something similar
https://zapier.com/apps/salesforce/integrations/webhook
I could not find what I need, which is a simple way to subscribe to entity changes using the OAuth client's token and passing a webhook endpoint. I have read about Apex callouts, the Streaming API, and outbound messages.
Yeah, we solved this exact problem at Fusebit and I can help you understand the process as well.
Typically, here's what you need to do:
Create triggers on the Salesforce objects you want to get updates for
Upload an Apex class that will send an outgoing message to a pre-determined URL
Enable a Remote Site Setting for the domain you want to send the message to
Add secret verification (or another auth method) to prevent spamming of your external URL
If you're using JavaScript, you can use the jsforce SDK and the Salesforce Tooling API to push the code into the Salesforce instance after the auth flow has occurred, on Salesforce instances that have API access enabled (typically Enterprise and above, or Professional with the API add-on).
This will be helpful for you to look through: https://jamesward.com/2014/06/30/create-webhooks-on-salesforce-com/
FYI - Zapier's webhook implementation actually polls every 15 minutes rather than receiving real-time events.
In which programming language?
For consuming outbound messages you just need to be able to accept an XML message and send back an "Ack" message to acknowledge receipt; otherwise SF will keep trying to resend it for 24 hours.
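As a rough illustration of that Ack requirement, here is a minimal Java sketch of a listener endpoint using the JDK's built-in HTTP server (the port, path, and response details are assumptions to adapt to your setup):

    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import com.sun.net.httpserver.HttpServer;

    public class OutboundMessageListener {
      // Minimal SOAP response telling Salesforce the message was received,
      // so it stops retrying (it retries for up to 24 hours otherwise).
      private static final String ACK =
          "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<soapenv:Body>"
        + "<notificationsResponse xmlns=\"http://soap.sforce.com/2005/09/outbound\">"
        + "<Ack>true</Ack>"
        + "</notificationsResponse>"
        + "</soapenv:Body></soapenv:Envelope>";

      public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/salesforce/outbound", exchange -> {
          byte[] notification = exchange.getRequestBody().readAllBytes();
          // ... parse and process the notification XML here ...
          byte[] ack = ACK.getBytes(StandardCharsets.UTF_8);
          exchange.getResponseHeaders().set("Content-Type", "text/xml");
          exchange.sendResponseHeaders(200, ack.length);
          try (OutputStream out = exchange.getResponseBody()) {
            out.write(ack);
          }
        });
        server.start();
      }
    }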
For consuming platform events / the Streaming API / Change Data Capture (CDC) you'll need to raise the event in SF (a platform event you can raise from code, a flow, or Process Builder; CDC happens automatically, you just tell it which objects to track).
Then in the client app you'd need to log in to SF (SOAP or REST API) and subscribe to the channel (any library that supports CometD should be fine). Have you seen "EMP Connector", mentioned for example in https://trailhead.salesforce.com/en/content/learn/modules/change-data-capture/subscribe-to-events?trail_id=architect-solutions-with-the-right-api ?
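If you roll your own subscriber instead of using EMP Connector, a bare-bones CometD Java sketch looks roughly like this (assuming a CometD 3.x/4.x client with Jetty 9; the instance URL, token, API version, and event name are placeholders):

    import java.util.HashMap;
    import org.cometd.client.BayeuxClient;
    import org.cometd.client.transport.LongPollingTransport;
    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.api.Request;

    public class PlatformEventSubscriber {
      public static void main(String[] args) throws Exception {
        String instanceUrl = "https://myorg.my.salesforce.com"; // placeholder
        String accessToken = "00D...";  // session id / OAuth token from the login step

        HttpClient httpClient = new HttpClient();
        httpClient.start();

        // Attach the Salesforce session to every long-poll request.
        LongPollingTransport transport =
            new LongPollingTransport(new HashMap<>(), httpClient) {
              @Override
              protected void customize(Request request) {
                request.header("Authorization", "Bearer " + accessToken);
              }
            };

        BayeuxClient client = new BayeuxClient(instanceUrl + "/cometd/52.0", transport);
        client.handshake();
        client.waitFor(10_000, BayeuxClient.State.CONNECTED);

        // Event name is a placeholder; CDC channels look like /data/ChangeEvents instead.
        client.getChannel("/event/Order_Update__e").subscribe((channel, message) ->
            System.out.println("Received: " + message.getDataAsMap()));
      }
    }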
Picking the right messaging approach is an art; there's a free course that can help: https://trailhead.salesforce.com/en/content/learn/trails/architect-solutions-with-the-right-api
And a pretty awesome PDF if you want to study for the certification: https://resources.docs.salesforce.com/sfdc/pdf/integration_patterns_and_practices.pdf

Salesforce platform event duplicate events with CometD client

I have built a Salesforce platform event client using CometD Java; it's similar to the EMP-Connector example provided by forcedotcom.
I installed this client on the OpenShift cloud, and my app runs with 2 pods. The problem I am facing is that since there are two pods, each running the same Docker image, both receive the same copy of each event. That means duplicate events.
Per my understanding, Salesforce platform events should behave like a Kafka subscriber.
I am unable to find a solution for avoiding duplicate copies of events. Any suggestions here would be a great help.
Note: As of now I have a client-side solution that drops the duplicate copy of an event, which is not optimal.
I have to run my app with at least 2 pods; that's a constraint I have on my cloud.
This is expected / by design. In CometD, when a message is published on a broadcast channel all subscribers listening on this channel will receive a copy of this message. The broadcast channel behaves like a messaging topic where one sender wants to send the same info to multiple recipients. There are other types of channels in CometD with different semantics. The broadcast channel and one-to-many message semantics is what you get with platform events available via CometD in Salesforce.
In your case it sounds like you have multiple subscribers, thus what you're seeing is expected. You can deduplicate the message stream on the client side as you have done or you can change your architecture so that you have a single subscription.
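Since each pod holds its own subscription, in-memory deduplication inside one pod won't help across pods; one hedged option is to let each pod try to claim an event's replay ID against a shared datastore with a unique constraint. A minimal Java sketch (the table name and schema are assumptions):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Assumes a shared table both pods can reach, e.g.:
    //   CREATE TABLE processed_events (replay_id BIGINT PRIMARY KEY);
    public class SharedDeduplicator {
      private final Connection db;

      public SharedDeduplicator(Connection db) {
        this.db = db;
      }

      // Returns true if this pod claimed the event, false if the other pod already did.
      public boolean claim(long replayId) {
        try (PreparedStatement ps = db.prepareStatement(
            "INSERT INTO processed_events (replay_id) VALUES (?)")) {
          ps.setLong(1, replayId);
          ps.executeUpdate();
          return true; // first writer wins
        } catch (SQLException e) {
          // Unique-constraint violation: the event was already claimed.
          // (In production, check the SQLState instead of swallowing all errors.)
          return false;
        }
      }
    }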

Spark job callback

Maybe you can help me with my problem
I start a Spark job on google-dataproc through the API. This job writes its results to Google Cloud Storage.
When it finishes, I want my application to get a callback.
Do you know any way to get one? I don't want to poll the job status through the API each time.
Thanks in advance!
I'll agree that it would be nice if there were a way to either wait for or get a callback when operations such as VM creation, cluster creation, job completion, etc. finish. Out of curiosity, are you using one of the API clients (like google-cloud-java), or are you using the REST API directly?
In the meantime, there are a couple of workarounds that come to mind:
1) Google Cloud Storage (GCS) callbacks
GCS can trigger callbacks (either Cloud Functions or Pub/Sub notifications) when you create files. You can create a file at the end of your Spark job, which will then trigger a notification. Or just add a trigger for when you put an output file on GCS, as in the sketch below.
If you're modifying the job anyway, you could also just have the Spark job call back directly to your application when it's done.
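To illustrate the GCS-notification route, here is a minimal Java sketch that listens on a Pub/Sub subscription fed by a bucket notification and reacts when the job's output marker appears (the project, topic, subscription, and bucket names are assumptions):

    import com.google.cloud.pubsub.v1.AckReplyConsumer;
    import com.google.cloud.pubsub.v1.MessageReceiver;
    import com.google.cloud.pubsub.v1.Subscriber;
    import com.google.pubsub.v1.ProjectSubscriptionName;
    import com.google.pubsub.v1.PubsubMessage;

    public class JobDoneListener {
      public static void main(String[] args) {
        // Subscription fed by a bucket notification, e.g. created with:
        //   gsutil notification create -t job-done -f json gs://my-output-bucket
        String subscription = ProjectSubscriptionName.format("my-project", "job-done-sub");

        MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
          String object = message.getAttributesOrDefault("objectId", "");
          // With the default Hadoop output committer, Spark writes a _SUCCESS
          // marker when the job finishes; react only to that object.
          if (object.endsWith("_SUCCESS")) {
            System.out.println("Job finished, output: " + object);
            // ... notify your application here ...
          }
          consumer.ack();
        };

        Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
        subscriber.startAsync().awaitRunning();
        subscriber.awaitTerminated();
      }
    }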
2) Use the gcloud command line tool (probably not the best choice for web servers)
gcloud already waits for jobs to complete. You can either use gcloud dataproc jobs submit spark ... to submit and wait for a new job to finish, or gcloud dataproc jobs wait <jobid> to wait for an in-progress job to finish.
That being said, if you're purely looking for a callback for choosing whether to run another job, consider using Apache Airflow + Cloud Composer.
In general, the more you tell us about what you're trying to accomplish, the better we can help you :)

Nagios3 configuration for sending a simple text message on some port

I am developing an application and decided on Nagios3 for the monitoring. But I am stuck at two points. I am using the check_http plugin for monitoring the load on my service API. Now I want to perform the tasks below.
I need to set a threshold in check_http so that some task is performed once that threshold is crossed. I tried the command below:
'check_command check_nrpe_1arg!check_service_api'
but it only tells me the load; no threshold is set. The one below doesn't work either:
'check_command check_service_api!100!200'
I also need to send a simple text message to some port (my application).
I am new to Nagios, so please help me figure out a solution other than the email notification approach.
There is a check command that you can download called "notify_sms" that integrates with an API server hosted by a company called Esendex. They charge for their service but it works well.

How do you make Salesforce ping my application?

I have data in Salesforce and run another application that works with the same data. The current workflow is that when data is entered into the custom application, it sends the information to Salesforce via SOAP. I want to establish the reverse link; when a value is changed on the Salesforce side, I want Salesforce to ping my application with the changes. Does Salesforce have a feature to do this? Something equivalent to a trigger maybe?
My current solution is mindless iteration through all Salesforce records. This is slow, hits the API limit often, and keeps data stale too long.
You can do this using the Streaming API.
Introduction:
Use the Streaming API to receive notifications for changes to Salesforce data.
Use it to push relevant data in real time, instead of having to refresh the screen to get new information.
Protocols used for the connection:
The Bayeux protocol and CometD both use long polling.
Bayeux is a protocol for transporting asynchronous messages, primarily over HTTP.
CometD is a scalable HTTP-based event routing bus that uses an AJAX push technology pattern known as Comet. It implements the Bayeux protocol. The Salesforce servers use version 2.0 of CometD.
How it Works:
Create a PushTopic based on a SOQL query. This defines the channel. (PushTopic is a standard object).
Clients subscribe to the channel.
A record is created, updated, deleted, or undeleted (an event occurs). The changes to that record are evaluated.
If the record changes match the criteria of the PushTopic query, a notification is generated by the server and received by the subscribed clients.
Please check this link : http://www.salesforce.com/developer/docs/api_streaming/
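Since PushTopic is a standard object, it can be created straight through the REST API. A minimal Java sketch using the JDK HTTP client (the instance URL, token, API version, topic name, and query are placeholders):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CreatePushTopic {
      public static void main(String[] args) throws Exception {
        String instanceUrl = "https://myorg.my.salesforce.com"; // placeholder
        String accessToken = "00D...";                          // placeholder

        String body = "{\"Name\": \"OpportunityChanges\","
            + " \"Query\": \"SELECT Id, Name, StageName FROM Opportunity\","
            + " \"ApiVersion\": 52.0,"
            + " \"NotifyForOperationUpdate\": true,"
            + " \"NotifyForFields\": \"Referenced\"}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(instanceUrl + "/services/data/v52.0/sobjects/PushTopic"))
            .header("Authorization", "Bearer " + accessToken)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // contains the new PushTopic Id on success
        // Clients then subscribe to the channel /topic/OpportunityChanges over CometD.
      }
    }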
