Append sensor data to a document in a Couchbase database

We are a group of students writing a bachelor's project about storing sensor data in a NoSQL database, and we have chosen Couchbase for this.
We want to store a lot of readings in the same document, one document per day per sensor, and we want to append new sensor data, which arrives every minute.
But unfortunately, we are not able to append new data to an existing document without overwriting the existing data.
The structure for the documents is:
DocumentID: Sensor + date, e.g. KitchenTemperature20180227
{
    "topic": "Kitchen/Temp",
    "type": "temperature",
    "unit": "DegC",
    "20180227130400": [
        { "data": "24" }
    ],
    ..............
    "20180227130500": [
        { "data": "25" }
    ]
}
We are all new to Couchbase and NoSQL databases, but eager to learn and understand the best way to implement this.
We've tried the upsert, insert and update commands, but they all overwrite the existing document, or won't execute because the document already exists. As you can see, we have some top-level information, like topic, type and unit. The rest is data coming in every minute that should be appended to the existing document.
Help on how to proceed would be much appreciated.
Best regards, Kenneth

In this case you can use the subdocument API. It allows you to read or modify portions of a document based on a "path", without fetching or rewriting the whole thing.
You can mutate subdocuments as well as read them. Look at the subdocument API documentation for Couchbase. There are also blog posts on the Couchbase blog site that walk through examples in Java and Go.
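For example, with the Couchbase Node.js SDK (a sketch in the 3.x-style API; the connection details, bucket name and the new minute's path and value are placeholders modelled on your document structure), mutateIn upserts a single path without rewriting the rest of the document:

const couchbase = require('couchbase');

async function appendReading() {
    // Placeholder connection details for your cluster
    const cluster = await couchbase.connect('couchbase://localhost', {
        username: 'user',
        password: 'pass',
    });
    const collection = cluster.bucket('sensors').defaultCollection();

    // Upsert only the new minute's path; topic, type, unit and the
    // earlier minutes in the document are left untouched.
    await collection.mutateIn('KitchenTemperature20180227', [
        couchbase.MutateInSpec.upsert('20180227130600', [{ data: '26' }]),
    ]);
}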

Related

What syntax should be used for reading the final row in an Array on the Mapping tab on the Copy Data activity in Azure Data Factory / Synapse?

I'm using the Copy Data activity in Azure Data Factory to copy data from an API to our data lake for alerting & reporting purposes. The API response comprises multiple complex nested JSON arrays with key-value pairs. The API is updated on a quarter-hourly basis, and data is only held for 2 days before falling off the stack. The API adopts an oldest-to-newest record structure, so the newest addition to the array is the final item rather than the first.
My requirement is to copy only the most recent record from the API as opposed to the whole collection, so the 192nd reading, or item 191 of the array (with the array starting at 0).
Due to the nature of the solution, there are times when the API isn't being updated as the sensors that collect and send over the data to the server may not be reachable.
The current solution is triggered every 15 minutes and tries a copy data activity of item 191, then 190, then 189 and so on. After 6 attempts it fails and so the record is missed.
[Screenshot: current pipeline structure]
I have used the mapping tab to specify the items in the array as follows (copy attempt 1 example):
$['meta']['params']['sensors'][*]['name']
$['meta']['sensorReadings'][*]['readings'][191]['dateTime']
$['meta']['sensorReadings'][*]['readings'][191]['value']
Instead of explicitly referencing the array number, I was wondering if it is possible to reference the last item of the array in the above code?
I understand we can use 0 for the first record; however, I don't understand how to reference the final item. I've tried the following using the last function, but am unsure where to place it:
$['meta']['sensorReadings'][*]['readings'][last]['dateTime']
$['meta']['sensorReadings'][*]['readings']['last']['dateTime']
last['meta']['sensorReadings'][*]['readings']['dateTime']
$['meta']['sensorReadings'][*]['readings']last['dateTime']
Any help or advice on a better way to proceed would be greatly appreciated.
Can you call your API with a Web activity? If so, this pulls the API result into the data pipeline, and you can then apply ADF functions like last to it.
A simple example calling the UK Gov Bank Holidays API:
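The Web activity needs little more than a URL and method. In pipeline JSON it would look roughly like this (a sketch; the activity name Web1 is the one the expressions below assume):

{
    "name": "Web1",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://www.gov.uk/bank-holidays.json",
        "method": "GET"
    }
}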
This returns a resultset that looks like this:
{
    "england-and-wales": {
        "division": "england-and-wales",
        "events": [
            {
                "title": "New Year’s Day",
                "date": "2017-01-02",
                "notes": "Substitute day",
                "bunting": true
            },
            {
                "title": "Good Friday",
                "date": "2017-04-14",
                "notes": "",
                "bunting": false
            },
            {
                "title": "Easter Monday",
                "date": "2017-04-17",
                "notes": "",
                "bunting": true
            },
            ... etc
You can now apply the last function to it, e.g. using a Set Variable activity:
@string(last(activity('Web1').output['england-and-wales'].events))
Which yields the last bank holiday of 2023:
{
    "name": "varWorking",
    "value": "{\"title\":\"Boxing Day\",\"date\":\"2023-12-26\",\"notes\":\"\",\"bunting\":true}"
}
Or, for a single property:
@string(last(activity('Web1').output['england-and-wales'].events).date)
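Applied to your readings structure, the same pattern would be something like this (a sketch; it assumes a Web activity named Web1 returning the API response, and a single sensor at index 0):
@string(last(activity('Web1').output.meta.sensorReadings[0].readings))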

Read JSON from rest API as is with Azure Data Factory

I'm trying to get Azure Data Factory to read my REST API and put it in SQL Server. The source is a REST API and the sink is a SQL Server table.
I tried to do something like:
"translator": {
    "type": "TabularTranslator",
    "schemaMapping": {
        "$": "json"
    },
    "collectionReference": "$.tickets"
}
The source looks like:
{ "tickets": [ {... }, {...} ] }
Because of the poor mapping capabilities I'm choosing this path. I'll then split the data with a query. Preferably, I'd like to store each object inside tickets as a row with the JSON of that object.
In short, how can I get the JSON output from the RestSource to a SqlSink single column text/nvarchar(max) column?
I managed to solve the same issue by modifying the mapping manually.
ADF tries to parse the JSON anyway, but in Advanced mode you can edit the JSON paths. For example, this is the original schema parsed automatically by ADF:
https://imgur.com/Y7QhcDI
Once opened in Advanced mode, it will show full paths with the element indexes added, something like $tickets[0][] etc.
Delete all the other columns and keep only the highest-level one, $tickets (in my case it was $value: https://i.stack.imgur.com/WnAzC.jpg). As a result, the entire JSON will be written into the destination column.
If there are pagination rules in place, each page will be written as a single row.
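In pipeline JSON, that edited mapping would look roughly like this (a sketch using ADF's TabularTranslator mapping syntax; the sink column name json is taken from the question):

"translator": {
    "type": "TabularTranslator",
    "mappings": [
        {
            "source": { "path": "$['tickets']" },
            "sink": { "name": "json", "type": "String" }
        }
    ]
}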

How to fetch data from CouchDB using the CouchDB API?

Instead of keys and IDs alone, I want to get all the docs via the CouchDB API. I have tried GET "http://localhost:5984/db-name/_all_docs", but it returned:
{
    "total_rows": 4,
    "offset": 0,
    "rows": [
        {"id": "11", "key": "11", "value": {"rev": "1-a0206631250822b37640085c490a1b9f"}},
        {"id": "18", "key": "18", "value": {"rev": "30-f0798ed72ceb3db86501c69ed4efa39b"}},
        {"id": "3", "key": "3", "value": {"rev": "15-0dcb22bab2b640b4dc0b19e07c945f39"}},
        {"id": "6", "key": "6", "value": {"rev": "4-d76008cc44109bd31dd32d26ba03125d"}}
    ]
}
From the documentation, the request below will return the data as expected, but it requires a set of keys in the request body:
POST /db/_all_docs HTTP/1.1

{
    "keys": [
        "11",
        "18"
    ]
}
Thanks in advance.
The _all_docs endpoint is actually just a system-level view that uses the _id field as the index. Thus, any parameters that you can use for views also apply here.
If you read the documentation further, you'll find that adding the parameter include_docs=true to your request will include the original documents in the results. Each document will be added as the doc field alongside id, key and value.
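For example, combined with the keys request from the question:

POST /db-name/_all_docs?include_docs=true HTTP/1.1
Content-Type: application/json

{
    "keys": ["11", "18"]
}

Each row in the response will then carry a doc field containing the full document.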

Solr document Submission

I am new to the Solr technology. Can you please tell me how a document can be submitted to Solr using the user interface? Is it necessary to create an XML version of the document first? I'm hoping for the simplest way of indexing documents.
Please help.
The default Solr RequestHandler (from 4.0) supports four formats: XML, JSON, CSV and javabin. There's a page under the Admin interface to submit documents to the index (select the core and Documents).
There are examples of each of the formats available in the Solr reference guide. If you're using a client library, the library will usually handle this for you anyways, and use an appropriate format depending on which language it's written in and what built-in libraries are available.
The simplest format for manually adding documents is probably JSON:
[
    {
        "id": "1",
        "title": "Doc 1"
    },
    {
        "id": "2",
        "title": "Doc 2"
    }
]
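If you want to script this rather than use the Admin page, the same JSON can be POSTed to the core's update handler (the core name mycore is a placeholder):

POST /solr/mycore/update?commit=true HTTP/1.1
Content-Type: application/json

[
    { "id": "1", "title": "Doc 1" },
    { "id": "2", "title": "Doc 2" }
]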
You can also use the DataImportHandler to import data locally at the server, such as from an SQL database. In that case you don't submit the actual rows to the server; instead, you tell the handler to fetch the rows and create documents for you.

CouchDB JSon response customization

I'm storing address data in CouchDB, and am looking for a way to get an array of just the values, instead of key: value for every record.
This is the current response:
{"total_rows": 2438, "offset": 0, "rows": [
    {"id": "ec5de6de2cf7bcac9a2a2a76de5738e4", "key": "user_298774", "value": {"city": "Milano", "address": "Corso Como, 42b"}},
    {"id": "a2a2a76de573ae4ec5de6de2cf7bcac9", "key": "user_276341", "value": {"city": "Vicenza", "address": "Via Quinto Sella, 118"}},
    ... (etc.)
]}
I only really need:
[{"city": "Milano", "address":"Corso Como, 42b"},
{"city": "Vicenza", "address":"Via Quinto Sella, 118"},
...]
I really need to minimize the bandwidth that a JSON response consumes. I can't seem to find a way to transform the view output into a simple array. Suggestions?
The response you are getting conforms to CouchDB's REST-based protocol. Two mechanisms are provided for reformatting it: show functions and list functions. The basic idea is the same, but the first is suited to retrieving single documents, while the list function is what you want here.
A list function runs the query inside the server and sends the output, arbitrarily transformed by your JS code. The API you will need is simple:
Fetch each record from the view with the getRow() function.
Serialize a JS object obj to a JSON string with toJSON(obj).
Send the output to the client with send(json).
If the map/reduce view with your data is at /mydb/_design/myapp/_view/mydocs-by-user and the list function is named mylist, you get the reformatted result from the URL /mydb/_design/myapp/_list/mylist/mydocs-by-user.
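Putting those pieces together, a minimal list function might look like this (a sketch; it assumes your view emits the address object as its value):

function (head, req) {
    // Declare the response content type
    start({ headers: { "Content-Type": "application/json" } });
    var rows = [];
    var row;
    // getRow() returns the next view row, or null when exhausted
    while ((row = getRow()) !== null) {
        rows.push(row.value); // keep only the value, drop id and key
    }
    send(toJSON(rows));
}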
Please refer to the list function documentation and the chapter in the Guide for a longer tutorial.
