I am reading article titles that suggest H2O.ai integrates its ML models with Snowflake.
https://www.h2o.ai/resources/solution-brief/integration-of-h2o-driverless-ai-with-snowflake/
If I wanted to export a POJO learner such as a GBM and have it run in Snowflake, is there a clean way to do that? I didn't see any clear directions in the (several) articles I found.
How does that integrate with MLOps?
One way to integrate models built in H2O.ai is through Snowflake External Functions.
This is documented at https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/snowflake-integration.html
H2O.ai also has (or will shortly have) support for deploying models into Snowflake Java UDFs, as described in https://www.h2o.ai/blog/h2o-integrates-with-snowflake-snowpark-java-udfs-how-to-better-leverage-the-snowflake-data-marketplace-and-deploy-in-database/
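To make the Java UDF route more concrete, here is a minimal sketch of what a Snowflake Java UDF handler that scores an exported H2O model via the h2o-genmodel library could look like. The model file name, feature names, and handler wiring are illustrative assumptions, not steps copied from H2O's or Snowflake's documentation; the exported MOJO (or a compiled POJO wrapped the same way) would be uploaded to a Snowflake stage and referenced in the IMPORTS clause of the CREATE FUNCTION statement.

```java
// Sketch of a Snowflake Java UDF handler that scores an exported H2O model
// with the h2o-genmodel library. File name, feature names, and UDF wiring
// are assumptions for illustration only.
import hex.genmodel.MojoModel;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.prediction.BinomialModelPrediction;

public class GbmScoringUdf {

    private final EasyPredictModelWrapper model;

    public GbmScoringUdf() throws Exception {
        // Assumption: the exported MOJO zip is uploaded to a Snowflake stage and
        // listed in the IMPORTS clause of CREATE FUNCTION so it is readable here.
        model = new EasyPredictModelWrapper(MojoModel.load("gbm_model.zip"));
    }

    // Handler method referenced from the HANDLER clause of CREATE FUNCTION.
    public double scoreRow(double age, double income) throws Exception {
        RowData row = new RowData();
        row.put("age", String.valueOf(age));        // hypothetical feature names
        row.put("income", String.valueOf(income));
        BinomialModelPrediction p = model.predictBinomial(row);
        return p.classProbabilities[1];             // probability of the positive class
    }
}
```

Since EasyPredictModelWrapper accepts any GenModel, a compiled POJO class could be wrapped in the same way, though for large GBMs the MOJO format is generally the recommended export.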
Related
I want to create a custom Apache Flink sink to the AWS SageMaker Feature Store, but there is no documentation for how to create custom sinks on Flink's website. There are also multiple base classes that I could potentially extend (e.g. AsyncSinkBase, RichSinkFunction), so I'm not sure which to use.
I am looking for guidelines on how to implement a custom sink (both in general and for my specific use case). For my specific use case: the SageMaker Feature Store has a synchronous client with a putRecord call for sending records to SageMaker FS, so I am ideally looking for a way to create a custom sink that works well with this client. Note: I require at-least-once processing guarantees, as SageMaker FS is backed by DynamoDB (a key-value store) under the hood.
Java Client: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/sagemakerfeaturestoreruntime/AmazonSageMakerFeatureStoreRuntime.html
Example of the putRecord call using the Python client: https://github.com/aws-samples/amazon-sagemaker-feature-store-streaming-aggregation/blob/main/src/lambda/StreamingIngestAggFeatures/lambda_function.py#L31
What I've Found so Far
- Some older articles that say to use org.apache.flink.streaming.api.functions.sink.RichSinkFunction and SinkFunction
- Some connectors using classes in org.apache.flink.connector.base.sink.writer (e.g. AsyncSinkWriter, AsyncSinkBase)
- This section of the Flink docs says to use SourceReaderBase from org.apache.flink.connector.base.source.reader when creating custom sources; SourceReaderBase seems to be the source-side equivalent of the sink classes in the bullet above
Any help/guidance/insights are much appreciated, thanks.
How about extending RichAsyncFunction?
You can find a similar example here: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api
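To make that concrete, here is a rough sketch (not a tested implementation) of a RichAsyncFunction that wraps the synchronous v1 Java client behind its own thread pool. The feature group name, feature names, and the use of Tuple2<String, String> as the record type are assumptions for illustration; retries, batching, and error handling are left out.

```java
// Sketch of an async function that pushes records to the SageMaker Feature
// Store using the synchronous v1 Java client. Names below are hypothetical.
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import com.amazonaws.services.sagemakerfeaturestoreruntime.AmazonSageMakerFeatureStoreRuntime;
import com.amazonaws.services.sagemakerfeaturestoreruntime.AmazonSageMakerFeatureStoreRuntimeClientBuilder;
import com.amazonaws.services.sagemakerfeaturestoreruntime.model.FeatureValue;
import com.amazonaws.services.sagemakerfeaturestoreruntime.model.PutRecordRequest;

// Input records are modelled as Tuple2<customerId, eventTime> purely for illustration.
public class FeatureStoreAsyncWriter extends RichAsyncFunction<Tuple2<String, String>, Void> {

    private transient AmazonSageMakerFeatureStoreRuntime client;
    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        client = AmazonSageMakerFeatureStoreRuntimeClientBuilder.defaultClient();
        executor = Executors.newFixedThreadPool(8); // pool size is an arbitrary assumption
    }

    @Override
    public void asyncInvoke(Tuple2<String, String> row, ResultFuture<Void> resultFuture) {
        // The Feature Store client is synchronous, so run the blocking putRecord call
        // on a separate thread pool and complete Flink's future when it finishes.
        CompletableFuture.runAsync(() -> {
            PutRecordRequest request = new PutRecordRequest()
                    .withFeatureGroupName("my-feature-group") // hypothetical feature group
                    .withRecord(
                            new FeatureValue().withFeatureName("customer_id").withValueAsString(row.f0),
                            new FeatureValue().withFeatureName("event_time").withValueAsString(row.f1));
            client.putRecord(request);
        }, executor).whenComplete((ignored, error) -> {
            if (error != null) {
                resultFuture.completeExceptionally(error);
            } else {
                resultFuture.complete(Collections.emptyList());
            }
        });
    }

    @Override
    public void close() {
        if (executor != null) {
            executor.shutdown();
        }
    }
}
```

You would attach it with something like AsyncDataStream.unorderedWait(stream, new FeatureStoreAsyncWriter(), 30, TimeUnit.SECONDS, 100). Flink's async I/O operator checkpoints in-flight input records and re-issues them on recovery, which should fit the at-least-once requirement as long as re-sending a record is acceptable (putRecord behaves as an upsert per record identifier).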
I made the model for my thesis and would like to share it with my professor over the AnyLogic Cloud. Unfortunately, when I try to export the model it shows:
"Model parameter values overridden by experiment 'Simulation' will not be used; default values will be exported."
and it will not show the simulation screen in the cloud version. Is this because I use the PLE? If not, how can I improve my export?
I already tried the documentation and various Google searches, but I was not able to find anything useful.
Thank you in advance! :)
There is no simulation screen in the cloud. This is independent of your version.
Learn how to create simulation setups in the cloud by checking the "Run configuration" part of your model before uploading to the cloud. Also check the help on that topic and the example models (those uploaded by AnyLogic to the cloud, which you can find matched in the AnyLogic example library).
I have been a long-time user of Google App Engine's Mapreduce library for processing data in the Google Datastore. Google no longer supports it and it doesn't work at all in Python 3. I'm trying to migrate our older Mapreduce jobs to Google's Dataflow / Apache Beam runner, but the official documentation is awful: it just describes Apache Beam and does not explain how to migrate.
In particular, the issues are these:
- In Mapreduce, the jobs run on your existing deployed application. In Beam, however, you have to create and deploy a custom Docker image to build the environment for Dataflow. Is this right?
- To create a new job template in Mapreduce, you just need to edit a YAML file and deploy it. To create one in Apache Beam, you need to write custom runner code, deploy a template file to Google Cloud Storage, and link it up with the Docker image. Is this right?
Is the above accurate? If so, is it generally the case that working with Dataflow is much more difficult than Mapreduce? Are there any libraries or tips for making this easier?
In technical terms that's what is happening, but unless you have some specific advanced use cases, you won't need to set up any custom Docker images manually. Dataflow does some work in the background to package your user-written code and dependencies into a container so that they can be executed on its VMs.
In Dataflow, writing a job template mainly requires writing some pipeline code in your chosen language (Java or Python) and possibly writing some metadata. Once your code is written, creating and staging the template itself isn't much different from running a normal Dataflow job. There's a page documenting the process.
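For a sense of scale, a minimal Beam pipeline in Java is not much more than the sketch below; the bucket paths and the transform are made up for illustration. A classic template would be produced by running the same code with --runner=DataflowRunner and --templateLocation pointing at a Cloud Storage path, and a metadata file describing any runtime parameters can be added alongside it.

```java
// Minimal Beam pipeline sketch: read text files, upper-case each line, write results.
// The gs:// paths are hypothetical placeholders.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class UpperCasePipeline {
    public static void main(String[] args) {
        // Standard options (--runner=DataflowRunner, --project, --region, --tempLocation,
        // and for classic templates --templateLocation) are passed on the command line.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();

        Pipeline p = Pipeline.create(options);
        p.apply("ReadLines", TextIO.read().from("gs://my-bucket/input/*.txt"))
         .apply("UpperCase", MapElements.into(TypeDescriptors.strings())
                                        .via((String line) -> line.toUpperCase()))
         .apply("WriteLines", TextIO.write().to("gs://my-bucket/output/result"));
        p.run();
    }
}
```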
I agree the page on Mapreduce to Beam migration is very sparse and unhelpful, although I think I understand why that is. Migrating from Mapreduce to Beam isn't a straightforward 1:1 migration where only the syntax changes. It's a different pipeline model and most likely will require some level of rewriting your code for the migration. A migration guide that fully covered everything would end up repeating most of the existing documentation.
Since it sounds like most of your questions are around setting up and executing Beam pipelines, I encourage you to begin with the Dataflow quickstart in your chosen language. It won't teach you how to write pipelines, but will teach you how to set up your environment to write and run pipelines. There are links in the quickstarts which direct you to Apache Beam tutorials that teach you the Beam API and how to write your own pipelines, and those will be useful for rewriting your Mapreduce code in Beam.
External database interface - Does Alfresco use point and click integration or is programming required to connect to the DB?
Can we use a 3rd-party library like Google's zxing barcode reader to integrate with Alfresco?
Regards
vish
What do you mean by external database interface? If we're talking about using external databases, then it's just a matter of configuration.
It's generally not a problem to integrate 3rd-party libraries. You just have to decide how to do that, e.g. using a custom Java-backed web script or a custom action.
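As an illustration of the web script route, here is a rough sketch of a Java-backed web script that runs zxing against a node's content. The Spring bean wiring, the web script descriptor, and the nodeRef request parameter are assumptions for the sake of the example, and error handling is kept minimal.

```java
// Sketch of an Alfresco Java-backed web script that decodes a barcode from a
// node's content using zxing. Wiring and parameter names are illustrative.
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;

import javax.imageio.ImageIO;

import org.alfresco.model.ContentModel;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentService;
import org.alfresco.service.cmr.repository.NodeRef;
import org.springframework.extensions.webscripts.Cache;
import org.springframework.extensions.webscripts.DeclarativeWebScript;
import org.springframework.extensions.webscripts.Status;
import org.springframework.extensions.webscripts.WebScriptRequest;

import com.google.zxing.BinaryBitmap;
import com.google.zxing.MultiFormatReader;
import com.google.zxing.client.j2se.BufferedImageLuminanceSource;
import com.google.zxing.common.HybridBinarizer;

public class BarcodeWebScript extends DeclarativeWebScript {

    private ContentService contentService; // injected via the Spring bean definition

    public void setContentService(ContentService contentService) {
        this.contentService = contentService;
    }

    @Override
    protected Map<String, Object> executeImpl(WebScriptRequest req, Status status, Cache cache) {
        Map<String, Object> model = new HashMap<>();
        try {
            // "nodeRef" is a hypothetical request parameter identifying the node to scan.
            NodeRef nodeRef = new NodeRef(req.getParameter("nodeRef"));
            ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
            BufferedImage image = ImageIO.read(reader.getContentInputStream());
            BinaryBitmap bitmap = new BinaryBitmap(
                    new HybridBinarizer(new BufferedImageLuminanceSource(image)));
            model.put("barcode", new MultiFormatReader().decode(bitmap).getText());
        } catch (Exception e) {
            status.setCode(Status.STATUS_INTERNAL_SERVER_ERROR, e.getMessage());
        }
        return model;
    }
}
```

The same decoding logic could just as easily live in a custom action or a scheduled job; the web script is only one of the integration points mentioned above.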
There is no point-and-click integration. Alfresco is built on Java, so you can write a custom Java class. The Java class can be run on a schedule, via a web script, or via a workflow.
Of course, it's open source and fully built on Java, so you can integrate any 3rd-party solution into it. But you need to write everything yourself.
There already exists a component for Alfresco Share which allows feature #1 with "zero code", in an easy and flexible way.
It is available in Alfresco Addons: http://addons.alfresco.com/addons/alfresco-metadbconnector-component.
Developed and maintained by VenziaIT (http://venzia.es).
We hope this helps the community.
Greetings!
I have been developing an app using appengine. We are likely to be storing a lot of records in the datastore but I find the admin functionality you are given to manage this data lacking.
As an example, there are no good ways to bulk delete a bunch of data - you have to write a class of your own to do this.
Before I start down the path of building the admin ui and features I need to manage the datastore entities, I was wondering if anyone knows of a good 3rd party tool that's already been written to do this for me? Something that has basic CRUD functionality plus bulk import and bulk export features.
I am using the Python SDK.
Since you're on the Python SDK this may not apply directly, but if you're using Java App Engine, I suggest using the Objectify framework to interact with the datastore rather than the standard JDO/JPA method. It's much nicer.
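For a flavour of what that looks like, here is a tiny sketch in the Objectify style (Java only, so it won't apply directly to the Python SDK). The entity and field names are made up, and the exact fluent API varies a little between Objectify versions.

```java
// Tiny Objectify sketch: a made-up entity plus save, load-all, and bulk-delete helpers.
import static com.googlecode.objectify.ObjectifyService.ofy;

import java.util.List;

import com.googlecode.objectify.Key;
import com.googlecode.objectify.ObjectifyService;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

@Entity
class Greeting {
    @Id Long id;
    String message;
}

public class GreetingDao {
    static {
        ObjectifyService.register(Greeting.class); // entities must be registered before use
    }

    void save(Greeting g) {
        ofy().save().entity(g).now();
    }

    List<Greeting> loadAll() {
        return ofy().load().type(Greeting.class).list();
    }

    // Bulk delete: fetch only the keys, then delete them in one call.
    void deleteAll() {
        List<Key<Greeting>> keys = ofy().load().type(Greeting.class).keys().list();
        ofy().delete().keys(keys).now();
    }
}
```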