Building a database of average speed from two cameras using cloudant entries - cloudant

I'm very new to IBM Cloud and, specifically, Cloudant. I have a database of JSON documents recording when cars pass one of two speed cameras (camera 20 or 21), in the following format:
{
  "_id": "006994989f0914a7fb1ca44fae00fe75",
  "_rev": "1-e9b9afcb45f6ff703825d4be6d331f73",
  "payload": {
    "license_plate": "GNX834",
    "camera_id": 20,
    "date_time_string": "2019-05-08T15:20:04.134Z",
    "date_time_UTC_milliseconds": 1557328804134
  },
  "qos": 2,
  "retain": false
}
I wish to create a search index function that builds another database of objects containing the average speed of each car as it travels between the two cameras (they're 3 miles apart). I know I need to sort the readings by number plate, but I'm struggling to understand how to do this in Cloudant.
Any help on this subject would be great!

If you want to extract the documents where the license_plate is equal to a value that you know, then you need to create an index on that field:
POST /db/_index HTTP/1.1
Content-Type: application/json

{
  "index": {
    "fields": ["payload.license_plate"]
  },
  "name": "plate-index",
  "type": "json"
}
Then you can query by license plate:
POST /db/_find HTTP/1.1
Content-Type: application/json

{
  "selector": {
    "payload.license_plate": "NH67992"
  }
}
which returns the matching documents.
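If you're calling Cloudant from Node.js, the same two requests can be made over plain HTTP. Here is a minimal sketch only, assuming Node 18+ (built-in fetch); the account URL, database name and IAM token are placeholders:
// Minimal sketch: create the index and run the Mango query over plain HTTP.
// The account URL, database name and IAM token below are placeholders.
const CLOUDANT_URL = "https://<account>.cloudantnosqldb.appdomain.cloud";
const DB = "camera-readings";
const headers = {
  "Content-Type": "application/json",
  Authorization: "Bearer <IAM_TOKEN>",
};

async function findByPlate(plate) {
  // Create the index on payload.license_plate (same body as the POST above).
  await fetch(`${CLOUDANT_URL}/${DB}/_index`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      index: { fields: ["payload.license_plate"] },
      name: "plate-index",
      type: "json",
    }),
  });

  // Query by license plate with a Mango selector.
  const res = await fetch(`${CLOUDANT_URL}/${DB}/_find`, {
    method: "POST",
    headers,
    body: JSON.stringify({ selector: { "payload.license_plate": plate } }),
  });
  const { docs } = await res.json();
  return docs;
}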
Alternatively, if you want to extract the data en masse, ordered by the license plate field, you could create a MapReduce View with the license plate as the key (map function below):
function(doc) {
  // key: license plate, value: the time the car passed the camera
  emit(doc.payload.license_plate, doc.payload.date_time_string);
}
The index that this map function generates is ordered by license plate and can be used to extract data in license plate order.
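From there, a small script can pair up the two camera readings for each plate and compute the average speed over the 3 miles between the cameras. This is a minimal sketch only, assuming the map function above is saved in a design document named speed as a view named by_plate, and reusing the placeholder constants from the earlier snippet:
// Minimal sketch: compute the average speed (mph) per plate from the view output.
// The design document and view names ("speed", "by_plate") are assumptions.
const DISTANCE_MILES = 3;

async function averageSpeeds() {
  const res = await fetch(`${CLOUDANT_URL}/${DB}/_design/speed/_view/by_plate`, {
    headers,
  });
  const { rows } = await res.json();

  // Group the emitted timestamps by license plate (the view key).
  const byPlate = new Map();
  for (const { key, value } of rows) {
    if (!byPlate.has(key)) byPlate.set(key, []);
    byPlate.get(key).push(Date.parse(value));
  }

  // For plates seen by both cameras: speed = distance / elapsed hours.
  const results = [];
  for (const [plate, times] of byPlate) {
    if (times.length < 2) continue; // only one camera has seen this plate so far
    const hours = (Math.max(...times) - Math.min(...times)) / 3600000;
    results.push({ license_plate: plate, average_speed_mph: DISTANCE_MILES / hours });
  }
  return results;
}
Each result document could then be written into a second database, for example with a single POST to that database's _bulk_docs endpoint.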

Related

Azure Data Factory - converting lookup result array

I'm pretty new to Azure Data Factory (ADF) and have stumbled into something I would have solved with a couple of lines of code.
Background
Main flow:
Lookup Activity fetching an array of IDs to process
ForEach Activity looping over the input array and using a Copy Activity to pull data from a REST API and store it in a database
Step #1 results in an array containing IDs:
{
  "count": 10000,
  "value": [
    {
      "id": "799128160"
    },
    {
      "id": "817379102"
    },
    {
      "id": "859061172"
    },
    ... many more...
  ]
}
Step #2: When the lookup returns a lot of IDs, the individual REST calls take a lot of time. The REST API supports batching IDs using a comma-separated input.
The question
How can I convert the input array into a new array with comma-separated fields? This will reduce the number of Activities and reduce the time to run.
Expecting something like this:
{
  "count": 1000,
  "value": [
    {
      "ids": "799128160,817379102,859061172,...."
    },
    {
      "ids": "n,n,n,n,n,n,n,n,n,n,n,n,...."
    }
    ... many more...
  ]
}
EDIT 1 - 19th Dec 22
Using an Until Activity and keeping track of positions, I managed to do it in plain ADF. It would have been nice to do this with some simple array manipulation in a code snippet.
The short answer may be that we have to do this manipulation with a Data Flow:
My sample input:
First, I took a Data Flow and added a Surrogate Key transformation after the source; say the new key field is 'SrcKey'.
(Data preview of the Surrogate Key transformation)
Add an Aggregate transformation where you group by mod(SrcKey, 3). This puts rows with the same remainder into the same bucket.
Add a collect column in the same aggregator to collect the IDs into an array, with the expression trim(toString(collect(id)),'[]').
(Data preview of the Aggregate transformation)
Store the output in a single file in blob storage.
(Output preview)
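For reference, the array manipulation the Data Flow performs is just chunking and joining. A short JavaScript sketch makes the intent clear; the batch size, the value/id property names taken from the Lookup output above, and the function name are illustrative:
// Minimal sketch of the same transformation in plain JavaScript:
// turn [{ id: "..." }, ...] into [{ ids: "id1,id2,id3" }, ...] batches.
function toBatches(value, batchSize = 3) {
  const batches = [];
  for (let i = 0; i < value.length; i += batchSize) {
    batches.push({
      ids: value.slice(i, i + batchSize).map((item) => item.id).join(","),
    });
  }
  return batches;
}

// Example: toBatches(lookupOutput.value, 100) yields one REST call per 100 IDs.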

Representing JSON data in relational database table

I have a problem where I need to convert a JSON payload into SQL tables while maintaining the relationships established in the payload, so that I can later query the tables and recreate the JSON payload structure.
For example:
{
  "batchId": "batch1",
  "payees": [
    {
      "payeeId": "payee1",
      "payments": [
        {
          "paymentId": "paymentId1",
          "amount": 200,
          "currency": "USD"
        },
        {
          "paymentId": "paymentId2",
          "amount": 200,
          "currency": "YEN"
        },
        {
          "paymentId": "paymentId2",
          "amount": 200,
          "currency": "EURO"
        }
      ]
    }
  ]
}
In the above payload, I have a batch with payments grouped by payee. At its core it all boils down to a batch and its payments, but within that you can have groupings; in the example above it's grouped by payee.
One thing to note is that the payload may not always follow the above structure. Instead of grouping by payee, it could be grouped by something else, such as currency, or there could be no grouping at all, just a root-level batch and an array of payments.
I want to know if there are conventions/rules I can follow for representing such data in relational tables. Thanks.
edit:
I am primarily looking to use Postgres and have looked into the jsonb feature that it provides for storing json data. However, I'm still struggling to figure out how/where (in terms of which table) to best store the grouping info.
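For what it's worth, one common convention is to normalize the stable parts (batch, payment) and record the variable grouping as its own pair of columns rather than as separate tables per grouping type. The sketch below uses node-postgres; the table and column names are purely illustrative assumptions, not an established schema:
// Minimal sketch (illustrative names only): normalize batch/payment, and keep
// the variable grouping as a (group_type, group_key) pair on each payment row.
const { Client } = require("pg");

async function createSchema() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  await client.query(`
    CREATE TABLE batches (
      batch_id   text PRIMARY KEY
    );
    CREATE TABLE payments (
      id         bigserial PRIMARY KEY,
      batch_id   text REFERENCES batches (batch_id),
      payment_id text NOT NULL,
      amount     numeric NOT NULL,
      currency   text NOT NULL,
      group_type text,   -- e.g. 'payee', 'currency', or NULL for no grouping
      group_key  text    -- e.g. 'payee1'
    );
  `);
  await client.end();
}
The original payload can then be rebuilt by grouping on (group_type, group_key) with Postgres's json_agg/jsonb_build_object functions, whatever the grouping happens to be for that batch.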

WKS - Training model to identify entities on tables

Browser type and version: Google Chrome 67.0.3396.99
We are trying to train our model to identify values from multiple types of tables, which contain different numbers of rows and columns. A text row was extracted to begin the training: first we configured our system types, then marked the entities and also the relation "AllInOne". We are able to train 10 relations in a training set, but when the model is tested we only see 8 relations, even after creating other document sets for training and testing the model multiple times. Is there another way to associate the column value with the row values in a single relation, considering there isn't a standard for the types of tables we are analyzing with the Discovery service?
We are expecting the Discovery service response to look like the following:
"relations": [
{
"type": "AllInOne",
"sentence": "…",
"arguments": [
{
"entities": [
{
"“text": "””",
"type": "entity1"
}
]
},
{
"entities": [
{
"“text": "””",
"type": "entity2"
}
]
},
{
"entities": [
{
"“text": "””",
"type": "\"entity..n”,"
}
]
},
{ "..." }
]
}
The machine learning model trained in Watson Knowledge Studio targets unstructured natural language text. It may not be suitable for (semi-)structured formats like tables, especially for relations.

indexing large array in mongoDB

According to the MongoDB documentation, it's not recommended to create a multikey index on large arrays, so what is the alternative?
I want to notify my app users whenever one of their contacts also starts using the app, so I have to upload and manage the contact list of each user.
We are using MongoDB with a replica set of a primary and two secondary machines.
Can Mongo handle multikey indexing for arrays with hundreds of values?
Hundreds of contacts for hundreds of thousands of users can be very hard to manage.
The multikey solution looks like this:
{
  customerId: "id1",
  contacts: ["aaa", "aab", "aac", .... "zzz"]
}
index: createIndex({ contacts: 1 }).
Another solution is to save each contact in its own document and store all the app users related to it:
{
  phone: "aaa",
  contacts: ["id1", "id2", "id3"]
},
{
  phone: "aab",
  contacts: ["id1"]
},
{
  phone: "aac",
  contacts: ["id1"]
},
......
{
  phone: "zzz",
  contacts: ["id1"]
}
index: createIndex( { phone: 1 } )
Both have poor write performance when uploading the contact list: the first because it has to maintain a huge index, and the second because it updates lots of documents concurrently.
Is there a better way to do it?
I'm using a replica set with two secondary machines; could a shard key help?
Thanks
To index a field that holds an array value, MongoDB creates an index key for each element in the array. These multikey indexes support efficient queries against array fields.
So if I were you, my data model would be like this:
{
  customerId: "id1",
  contacts: ["_idx", "_idy", "_idw", .... "_idz"]
}
Then create your index on contacts. MongoDB creates an index on _id by default. You will have to create new documents for the non-app users; just add a field like "app_user": true/false.
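A minimal mongo-shell sketch of that model (the users collection name and the stub documents are assumptions, not part of the original answer):
// Multikey index on the contacts array (one index key per element).
db.users.createIndex({ contacts: 1 })

// A user's document references the _id of each contact's document; contacts
// that are not app users yet get stub documents with app_user: false.
db.users.insertOne({
  _id: "id1",
  app_user: true,
  contacts: ["_idx", "_idy", "_idw"]
})

// When the person behind "_idy" signs up, find everyone who has them as a contact:
db.users.find({ contacts: "_idy", app_user: true })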
For index performance, you can build the index in the background without any issues; for replica sets, this is how it's done.
As for sharding, it won't help you, because you won't even be able to shard anything, since you have one primary node in your cluster. Sharding needs at least 2 sets of primary Mongo instances, so in your case you could add a fourth server, have two replica sets of one primary and one secondary each, then shard them and transform your system into 2 replicated shards.
Once this is achieved, it will balance the load between the 2 shards, even though a hundred documents isn't really much for MongoDB to deal with.
On the other hand, if you do go for sharding, you will need more setup, such as config servers, if you're using MongoDB 3.4 or higher.

Solr request: SQL-like JOIN, GROUP BY, SUM(), WHERE SUM()

I'm new to Solr and I have the following problem:
I have those documents:
category:contract:
{
  "contract_id_s": "contract-ENG-00001",
  "title_s": "contract title",
  "ref_easy_s": "REFAAA",
  "commitment_id_s": "ENG-00001"
}
category:commitment:
{
  "commitment_id_s": "ENG-00001",
  "title_s": "commitment title",
  "status_s": "Validated",
  "date_changed_status_s": "2015-09-30",
  "date_status_initiated_s": "2015-09-27",
  "date_status_confirmed_s": "2015-09-28",
  "date_status_validated_s": "2015-09-30"
}
category:commitment AND sub_category_s:commitment_project:
{
  "id": "ENG-00001_AAA",
  "commitment_id_s": "ENG-00001",
  "project_id_s": "AAA",
  "project_name_s": "project name",
  "project_amount_asked_s": "2000",
  "project_amount_validated_s": "2100"
},
{
  "id": "ENG-00001_AAA2",
  "commitment_id_s": "ENG-00001",
  "project_id_s": "AAA",
  "project_name_s": "project name",
  "project_amount_asked_s": "1000",
  "project_amount_validated_s": "1200"
}
For each commitment, there could be a contract.
For each commitment, there could be some payments.
Here is what I want to do:
- By default, only select commitments that have at least:
  - one sub_category_s:commitment_project with a project_amount_validated_s value,
  - one contract.
- If filtered on amounts, only select from this list the commitments where the SUM of project_amount_validated_s is > amount_min AND < amount_max.
I don't know what the best practice is in terms of performance:
- Requesting the IDs of the commitments, then requesting the details for them?
- Is there a way to JOIN the contract information into this request?
- Or is the best practice to request each document one by one?
The problem is that I don't want to request useless data (performance, bandwidth).
There are some tools available to you in the form of:
- Solr's Block Join Query Parser (which allows for simple parent/child queries).
- Solr Facets (which allow for aggregations, e.g. sum of payments, with recent support for faceting on parent/child fields).
- The Solr Expand Component (which recently allows parent information to be expanded from a child block join query).
However, I'm not certain you can do everything you're hoping for in one query using these pieces. And even if you can, stitching them together doesn't come close to the simplicity of the SELECT...JOIN...GROUP BY...HAVING SQL query you're hoping to replicate. (Unless you want to try out the Solr 6 developer snapshot with parallel SQL support.)
BUT if this is your only use case, AND Solr is not your primary datastore, I'd strongly recommend modeling your Solr data to fit your use case.
E.g. start simple, denormalize, and only include the fields in your data model that are needed for search:
Only one type of record: commitment
Fields
commitment_id_s
title_s
status_s
date_changed_status_s
date_status_initiated_s
date_status_confirmed_s
date_status_validated_s
total_payments_asked (numeric sum of project_amount_asked from DB)
total_payments_validated (numeric sum of project_amount_validated from DB)
project_names (multiValued list of searchable project names)
contract_names (multiValued list of searchable contract names)
Then your query just needs a filter:
total_payments_validated:[<amount_min> TO <amount_max>]
to enforce your default criteria.
Once your search has identified the commitment IDs matching the Solr query, then go back and query the source database for any additional information needed for display (project details, contract details, dates, etc...)
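With that denormalized schema, the default criteria plus the amount filter collapse into a single query. Here is a minimal sketch over HTTP; the core name, the existence-check filters, and the amount bounds are placeholder assumptions:
// Minimal sketch: query the denormalized commitment documents over HTTP.
// The core name ("commitments") and the bound values are placeholders.
(async () => {
  const params = new URLSearchParams({
    q: "*:*",
    // Default criteria: the document has a validated total and at least one contract.
    fq: "total_payments_validated:[* TO *] AND contract_names:[* TO *]",
    fl: "commitment_id_s,title_s,status_s,total_payments_validated",
    wt: "json",
  });
  // Optional amount filter (a second fq is ANDed with the first).
  params.append("fq", "total_payments_validated:[2000 TO 10000]");

  const res = await fetch(`http://localhost:8983/solr/commitments/select?${params}`);
  const { response } = await res.json();
  console.log(response.docs.map((d) => d.commitment_id_s));
})();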
Ok, I've found a solution by using !join.
For instance, in PHP:
[
  'q' => "{!join from=id to=service_id score=none}uri:\\$serviceUri* AND -deleted:true",
  'fq' => "{!cache=false}category:monthly_volume AND type:\"$type\" AND timestamp:[$strDateStart TO $strDateEnd]",
  'alt' => 'json',
  'max-results' => 1000,
  'sort' => 'timestamp ASC',
  'statsFields' => 'stats.field=value&stats.facet=timestamp',
]
Or with URL request:
http://localhost:8983/solr/fluks-admin/select?q={!join+from=id+to=sector_id+score=none}{!join+from=uri+to=service+score=none}uri:/test-en/service-en*+AND+-deleted:true&fq={!cache=false}category:indicator+AND+timestamp:[201608+TO+201610]+AND+type:("-3"+OR+2+OR+3)+AND+-deleted:true&wt=json&indent=true&json.facet={sum_timestamp:{terms:{limit:-1, field:timestamp, facet:{sum_type:{terms:{limit:-1, field:type, facet:{sum_vol_value:"sum(vol_value)"}}}}}}}
