Why does Activity Pub make servers wrap "Note" objects into "Create" activities? - json-ld

What I've gathered is that new posts are published by POSTing a JSON-LD Activity Streams object of type Note to an actor's outbox.
{"#context": "https://www.w3.org/ns/activitystreams",
"type": "Note",
"to": ["https://chatty.example/ben/"],
"attributedTo": "https://social.example/alyssa/",
"content": "Say, did you finish reading that book I lent you?"}
The server will then have wrap it into an activity of type Create.
{"#context": "https://www.w3.org/ns/activitystreams",
"type": "Create",
"id": "https://social.example/alyssa/posts/a29a6843-9feb-4c74-a7f7-081b9c9201d3",
"to": ["https://chatty.example/ben/"],
"actor": "https://social.example/alyssa/",
"object": {"type": "Note",
"id": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
"attributedTo": "https://social.example/alyssa/",
"to": ["https://chatty.example/ben/"],
"content": "Say, did you finish reading that book I lent you?"}}
I fail to see the usefulness of this, as the wrapping activity doesn't seem to add any useful data to the wrapped note. Worse even, it seems like it might introduce a fair bit of redundancy to the responses (in this basic example from the official page, actor and attributedTo, as well as the 2 to fields, have exactly the same purpose). Is this perhaps done just for consistency, as there are a few other other activity types that are applied to notes, and for newly created posts having just a plain object (or a collection of plain objects) as a response would not fit this way of doing things?
Also, why are other activity types (e.g., Like) able to simply reference notes by id, while Create activities enclose that data directly? Is that required or is there a specific reason for it?
{"#context": "https://www.w3.org/ns/activitystreams",
"type": "Like",
"id": "https://social.example/alyssa/posts/5312e10e-5110-42e5-a09b-934882b3ecec",
"to": ["https://chatty.example/ben/"],
"actor": "https://social.example/alyssa/",
"object": "https://chatty.example/ben/p/51086"}

ActivityStreams is a protocole to synchronise datas between different databases and softwares.
Some actions contain the detail (like for Notes) because we can create/update them and an activity stream reader must know the changes to display the good datas to its users.
Some operations like Like have a named cancel operation like Unlike. So we don't need the Create/Update container. and because they operate on existing datas, we also only need the unique ID to the concerned resource.

Related

What syntax should be used for reading the final row in an Array on the Mapping tab on the Copy Data activity in Azure Data Factory / Synapse?

I'm using the copy data activity in Azure Data Factory to copy data from an API to our data lake for alerting & reporting purposes. The API response is comprised of multiple complex nested JSON arrays with key-value pairs. The API is updated on a quarter-hourly basis and data is only held for 2 days before falling off the stack. The API adopts an oldest-to-newest record structure and so the newest addition to the array would be the final item in the array as opposed to the first.
My requirement is to copy only the most recent record from the API as opposed to the collection - so the 192th reading or item 191 of the array (with the array starting at 0.)
Due to the nature of the solution, there are times when the API isn't being updated as the sensors that collect and send over the data to the server may not be reachable.
The current solution is triggered every 15 minutes and tries a copy data activity of item 191, then 190, then 189 and so on. After 6 attempts it fails and so the record is missed.
current pipeline structure
I have used the mapping tab to specify the items in the array as follows (copy attempt 1 example):
$['meta']['params']['sensors'][*]['name']
$['meta']['sensorReadings'][*]['readings'][191]['dateTime']
$['meta']['sensorReadings'][*]['readings'][191]['value']
Instead of explicitly referencing the array number, I was wondering if it is possible to reference the last item of the array in the above code?
I understand we can use 0 for the first record however I don't understand how to reference the final item. I've tried the following using the 'last' function but am unsure of how to place it:
$['meta']['sensorReadings'][*]['readings'][last]['dateTime']
$['meta']['sensorReadings'][*]['readings']['last']['dateTime']
last['meta']['sensorReadings'][*]['readings']['dateTime']
$['meta']['sensorReadings'][*]['readings']last['dateTime']
Any help or advice on a better way to proceed would be greatly appreciated.
Can you call your API with a Web activity? If so, this pulls the API result into the data pipeline and then apply ADF functions like last to it.
A simple example calling the UK Gov Bank Holidays API:
This returns a resultset that looks like this:
{
"england-and-wales": {
"division": "england-and-wales",
"events": [
{
"title": "New Year’s Day",
"date": "2017-01-02",
"notes": "Substitute day",
"bunting": true
},
{
"title": "Good Friday",
"date": "2017-04-14",
"notes": "",
"bunting": false
},
{
"title": "Easter Monday",
"date": "2017-04-17",
"notes": "",
"bunting": true
},
... etc
You can now apply the last function to is, e.g. using a Set Variable activity:
#string(last(activity('Web1').output['england-and-wales'].events))
Which yields the last bank holiday of 2023:
{
"name": "varWorking",
"value": "{\"title\":\"Boxing Day\",\"date\":\"2023-12-26\",\"notes\":\"\",\"bunting\":true}"
}
Or
#string(last(activity('Web1').output['england-and-wales'].events).date)

Best Firestore data model for file tree data

I'm currently building an app that needs to store data with a similar structure as a file tree. It looks something like this:
{
"type": "folder",
"name": "folder A",
"private": false,
"updatedAt": 1231243,
"items": [
{
"type": "folder",
"name": "subfolder A",
"private": false,
"updatedAt": 1231243,
"items": [
{
"type": "file",
"name": "file1"
}
]
},
{
"type": "file",
"name": "file2"
}
]
}
What I'm aware so far, there are 3 ways of implementing this.
Just dump all the data in 1 doc
Create a subcollection for each items array
Create a flat folder structure and save the parent folder id for lookup
I'm looking for a way to get all of this data with as minimum query as possible, the ideal use case would be only having the root folder id and getting all the subfolder & items. But I'm not sure if that's possible.
Also, I'm planning to subscribe to the data in the future, so the file tree will be updated in real-time.
Please give me a suggestion on what should the data model looks like
Update 1
To give more clarification about the query that I will need:
I will only need to get the topmost folder (folder A) in this example and the items under it
I won't need to get the nested items directly
example: getting Subfolder A / File 2 without accessing Folder A
If all you need is to get the entire structure by some unique ID, just put it all in one document, and get() it using the known ID. You have a 1MB limit per document.
It can get more complicated if you need to access items within lists (which is not really possible to do without reading the entire document), so this would actually be a bad idea for that case.
You would also want to reconsider this if you intend to update the elements of this document frequently, as there is a limit of 1 write per second (sustained). You would want to split it up otherwise.

HATEOAS and forms driven by the API

I'm trying to apply HATEOAS to the existing application and I'm having trouble with modeling a form inputs that would be driven by the API response.
The app is allowing to search & book connections between two places. First endpoint allows for searching the connections GET /connections?from={lat,lon}&to={lat,lon}&departure={dateTime} and returns following payload (response body).
[
{
"id": "aaa",
"carrier": "Fast Bus",
"price": 3.20,
"departure": "2019-04-05T12:30"
},
{
"id": "bbb",
"carrier": "Airport Bus",
"price": 4.60,
"departure": "2019-04-05T13:30"
},
{
"id": "ccc",
"carrier": "Slow bus",
"price": 1.60,
"departure": "2019-04-05T11:30"
}
]
In order to make an order for one of connections, the client needs to make a POST /orders request with one of following payloads (request body):
email required
{
"connectionId": "aaa",
"email": "passenger#example.org"
}
email & flight number required (carrier handles only aiprort connections)
{
"connectionId": "bbb",
"email": "passenger#example.org",
"flightNumber": "EA1234"
}
phone number required
{
"connectionId": "ccc",
"phoneNumber": "+44 111 222 333"
}
The payload is different, because different connections may be handled by different carriers and each of them may require some different set of information to provide. I would like to inform the API client, what fields are required when creating an order. The question I have is how do I do this with HATEOAS?
I checked different specs and this is what I could tell from reading the specs:
HAL & HAL-FORMS There are "_templates" but, there is no URI in the template itself. It’s presumed to operate on the self link, which in my case would be /connections... not /orders.
JSON-LD I couldn't find anything about forms or templates support.
JSON-API I couldn't find anything about forms or templates support.
Collection+JSON There is at most one "template" per document, therefore it's presumed that all elements of the collection have the same fields which is not the case in my app.
Siren Looks like the "actions" would fit my use case, but the project seems dead and there are no supporting libraries for many major languages.
CPHL The project seems dead, very little documentation and no libraries.
Ion There is nice support for forms, but I couldn't find any supporting libraries. Looks like it's just a spec for now.
Is such a common problem as having forms driven by the API still unsolved with spec and tooling?
In your example, it appears that Connections are resources. It's not completely clear if Orders are truly resources. I'm guessing probably yes, but to have an Order you need a Client and Connection. So, to create an Order you will need to expose a collection, likely from the Client or Connection, possibly both.
I think the disconnect is from thinking along the lines of "now that we've got a list of available connections, the client can select one and create an Order." That's perfectly valid, but it's remote procedure call (RPC) thinking, not REST. Neither is objectively better than the other, except in the context of a particular set of project requirements, and generally they shouldn't be mixed together.
With an RPC mindset, a create order method is defined (e.g. using OpenAPI) and any clients are expected to use some out-of-band information to determine the correct form required (i.e. by reading the OpenAPI spec).
With a REST/HATEOAS mindset, the correct approach would be to expose a Orders collection from Connection. Each Connection in the collection has a self link and a Orders collection (link or object, as defined by app requirements). Each item of Order has a self link, and that is where the affordances are specified. An Order is a known type (even with REST/HATEOAS the client and service have to at least agree on a shared vocabulary) that the client presumably knows how to define. That vocabulary can be defined using any mechanism that works -- json-ld, XSD, etc.
HATEOAS requires that the result contains everything the client needs to update the state. There can be no out-of-band information (other than the shared vocabulary). So, to solve your issue, you either need to expose a collection of Orders from Connection or you need to allow an Order to be created by posting to Connection. If the latter seems like a bit of a hack, it probably is.
For example, in HAL-Forms, I would do something like:
{
"connections": [{
"id": "aaa",
"carrier": "Fast Bus",
"price": 3.20,
"departure": "2019-04-05T12:30"
"_links": {
"self": { ... }, // link to this connection
"orders": {} // link to collection of orders for this connection
}
},
, ...],
"_links": {
"self": { ... } // link to the collection
},
"_templates": { ... } // post/put/patch/delete connection
}
Clients would follow the links to orders and from there would get the _templates collection that contains the instructions for managing the Order resources. The Order POST would likely require a connection identifier and client information. The HAL-Forms Spec defines a regex property that can be used to specify the type of data to supply for any particular form element. Since you have reached the order by navigating through a specific connection, you would be able to specify in your _templates for that order exactly which fields are required. e.g. /orders?connectionType=aaa would return a different set of required properties than /orders?connectionType=bbb but both use the same self link of /orders?connectionType={type} and you'd validate it on POST/PUT/PATCH.
I should note that the Spring-HATEOAS goes beyond the HAL-Forms spec and allows for multiple _links and _templates. See this GitHub issue.
It may look like HATEOAS/REST requires quite a bit more work than a simple OpenAPI/RPC API and it does. But what you are giving up in simplicity, you are gaining in flexibility and resilience, assuming well-designed clients. Which approach is correct depends on a lot of factors, most of them not technical (team skills, expected consumers, how much control you have over clients, maintenance, etc.).

How to define entities with intent independent in RASA NLU?

I am working in RASA NLU to extract intents and entities in Arabic language, and I have my own entities such as (places, org, and people) and i want to add these entities without any intent.
I just want to add them as an entities and their type.
How can I do this?
Are you using .md or .json as training data file? I cannot think of a solution to pass an entity in markdown format without defining an intent. But in json format you may pass a text without defining an intent by simply not writing a value for the dictionary key "intent". See example below.
The documentation https://rasa.com/docs/nlu/dataformat/ says that intent is an optional field, so it should work.
{
"text": "show me chinese restaurants",
"intent": ,
"entities": [
{
"start": 8,
"end": 15,
"value": "chinese",
"entity": "cuisine"
}
]
}
I would try leaving the value of "intent" completely empty, insert None or Null.

My custom slot type is taking on unexpected values

I noticed something strange when testing my interaction model with the Alexa skills kit.
I defined a custom slot type, like so:
CAR_MAKERS Mercedes | BMW | Volkswagen
And my intent scheme was something like:
{
"intents": [
{
"intent": "CountCarsIntent",
"slots": [
{
"name": "CarMaker",
"type": "CAR_MAKERS"
},
...
with sample utterances such as:
CountCarsIntent Add {Amount} cars to {CarMaker}
Now, when testing in the developer console, I noticed that I can write stuff like:
"Add three cars to Ford"
And it will actually parse this correctly! Even though "Ford" was never mentioned in the interaction model! The lambda request is:
"request": {
"type": "IntentRequest",
...
"intent": {
"name": "CountCarsIntent",
"slots": {
"CarMaker": {
"name": "ExpenseCategory",
"value": "whatever"
},
...
This really surprises me, because the documentation on custom slot types is pretty clear about the fact that the slot can only take the values which are listed in the interaction model.
Now, it seems that values are also parsed dynamically! Is this a new feature, or am I missing something?
Actually that is normal (and good, IMO). Alexa uses the word list that you provide as a guide, not a definitive list.
If it didn't have this flexibility then there would be no way to know if users were using words that you weren't expecting. This way you can learn and improve your list and handling.
Alexa treat the provided slot values as 'Samples'. Hence slot values which are not mentioned in interaction model will also get mapped.
When you create a custom slot type, a key concept to understand is
that this is training data for Alexa’s NLP (natural language
processing). The values you provide are NOT a strict enum or array
that limit what the user can say. This has two implications
1) words and phrases not in your slot values will be passed to you,
2) your code needs to perform any validation you require if what’s
said is unknown.
Since you know the acceptable values for that slot, always perform a slot-value validation on your code. In this way when you get something other than a valid car manufacturer or something which you don't support, you can always politely respond back like
"Sorry I didn't understand, can you repeat"
or
"Sorry we dont have in our list. can you please
select something from [give some samples from your list]"
More info here

Resources