I've got a Gatsby-Sanity project that needs a search component, for which I thought of using gatsby-plugin-lunr. I ran into a problem: my nodes are multilingual. For example, one of my fields is constructed like:
"title": {
"_type": "localeString",
"nl": "Begin ",
"en": "Home "
},
(In short, the parser works like this: if an object has a _type key whose value starts with 'locale', return only the value of the en or nl key; which language to use is passed in as a variable.)
I could make a parser that splits/strips the data. I've got this sort of working (not yet successfully) inside the component that runs the search query against the search index, but that would mean the data is parsed on every search. Is there a way to do this at build time in gatsby-node.js with a lunr plugin? I also need this because I have to add a language prefix to the slug/path of each result.
const SearchProcess = lunr => builder => {
  // how to pre-process data
}
I'm going with a different Gatsby plugin: gatsby-plugin-local-search.
This plugin is able to alter the data before saving it, using its normalizer option, so now I can call a method that conditionally alters the data per language.
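A minimal sketch of what that can look like in gatsby-config.js; the allSanityPage query, the slug field, and the hard-coded language are assumptions to adapt to the actual schema:

// gatsby-config.js (sketch, not a drop-in solution)
const language = 'en' // or 'nl'

// Unwrap { _type: 'localeString', nl: 'Begin', en: 'Home' } objects.
const localize = (value) =>
  value && value._type && value._type.startsWith('locale')
    ? value[language]
    : value

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-local-search',
      options: {
        name: 'pages',
        engine: 'lunr',
        query: `
          {
            allSanityPage {
              nodes {
                id
                title { _type nl en }
                slug { current }
              }
            }
          }
        `,
        ref: 'id',
        index: ['title'],
        store: ['title', 'path'],
        // runs once at build time, so nothing is parsed per search
        normalizer: ({ data }) =>
          data.allSanityPage.nodes.map((node) => ({
            id: node.id,
            title: localize(node.title),
            // language prefix on the result path, as needed above
            path: `/${language}/${node.slug.current}`,
          })),
      },
    },
  ],
}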
I'm bumbling my way through adding a back-end to my site and have decided to get acquainted with GraphQL. I may be structuring things totally the wrong way; however, following some tutorials, I have a React front-end (hosted on Vercel), so I have created an api folder in my app to make use of Vercel's serverless functions. I'm using Apollo Server, and I decided to go with Fauna as my database.
I've successfully been able to return an entire collection via my API. Now I wish to be able to return the collection sorted by my id field.
To do this I created an index which looks like this:
{
  name: "sort_by_id",
  unique: false,
  serialized: true,
  source: "my_first_collection",
  values: [
    { field: ["data", "id"] },
    { field: ["ref"] }
  ]
}
I then was able to call this via my API and get back an array, which simply contained the ID + ref rather than the associated documents. I also could only console.log it; I assume that is because the resolver was expecting an array of objects with the same fields as my typedefs. I understand I need to use the ref to look up the documents, and here is where I'm stuck. An index record looks as follows:
[1, Ref(Collection("my_first_collection"), "352434683448919125")]
In my resolvers.js script, I am attempting to receive the documents of my sorted index list. I've tried this:
async users() {
  const response = await client.query(
    q.Map(
      q.Paginate(
        q.Match(
          q.Index('sort_by_id')
        )
      ),
      q.Lambda((ref) => q.Get(ref))
    )
  )
  const res = response.data.map(item => item.data);
  return [...res]
}
I'm unsure if the problem is with how I've structured my index or with my code; I'd appreciate any advice.
It looks like you also asked this question on the Fauna discourse forums and got an answer there: https://forums.fauna.com/t/unable-to-return-a-list-of-documents-via-an-index/3511/2
Your index returns a tuple (just an array in JavaScript) of the data.id field and the ref. You confirmed that with your example result:
[
  /* data.id */ 1,
  /* ref */ Ref(Collection("my_first_collection"), "352434683448919125")
]
When you map over those results, you need to Get the Ref. Your query uses q.Lambda((ref) => q.Get(ref)), which passes the whole tuple to Get.
Instead, use:
q.Lambda(["id", "ref"], q.Get(q.Var("ref")))
// or with JS arrow function
q.Lambda((id, ref) => q.Get(ref))
Or this will work, too:
q.Lambda("index_entry", q.Get(q.Select(1, q.Var("index_entry"))))
// or with JS arrow function
q.Lambda((index_entry) => q.Get(q.Select(1, index_entry)))
The point is, only pass the Ref to the Get function.
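Putting it together, a sketch of the corrected resolver, assuming the same client and q from the faunadb JS driver as in the question:

async users() {
  const response = await client.query(
    q.Map(
      q.Paginate(q.Match(q.Index('sort_by_id'))),
      // destructure each [id, ref] tuple and Get only the ref
      q.Lambda(['id', 'ref'], q.Get(q.Var('ref')))
    )
  )
  // each entry is now a full document, so return its data payload
  return response.data.map((item) => item.data)
}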
MongoDB has a field for every document called "_id". I see people using it everywhere as a primary key, and using it in queries to find documents by the _id.
This field defaults to using an ObjectId which is auto-generated, an example is:
db.tasks.findOne()
{
  _id: ObjectID("ADF9"),
  description: "Write lesson plan",
  due_date: ISODate("2014-04-01"),
  owner: ObjectID("AAF1") // Reference to another document
}
But in JavaScript, an underscore prefix on an object field is a convention for private members, and since MongoDB uses JSON (specifically, BSON), should I be using these _ids for querying, finding, and describing relationships between documents? It doesn't seem right.
I saw that MongoDB has a way to generate a UUID: https://docs.mongodb.com/manual/reference/method/UUID
Should I forget the _id property and create my own indexed id property with a UUID?
Use UUIDs for user-generated content, e.g. to name image uploads. UUIDs can be exposed to the user in a URL or when the user inspects an image on the client side. For everything that stays on the server and is not exposed to the user, there is no need to generate a UUID; using the auto-generated _id is preferred.
A simple example of using a UUID would be:
const uuid = require('uuid');

exports.nameFile = async (req, res, next) => {
  // derive the extension from the upload's mimetype
  // (assumes a multer-style req.file; adapt to your upload middleware)
  const extension = req.file.mimetype.split('/')[1];
  req.body.photo = `${uuid.v4()}.${extension}`;
  next();
};
How MongoDB names its things should not dictate how you name yours. If data sent by a third party conflicts with the conventions you agreed to follow, you have to transform that data into the format you want as soon as it arrives in your application.
An example based on your case:
function findTaskById(id) {
  // fetch the raw document, then map Mongo's _id to a plain id field
  var result = db.tasks.findOne({ "_id": id });
  var task = {
    id: result._id,
    description: result.description,
    something: result.something
  };
  return task;
}
This way you isolate the use of Mongo's _id into the layer of your application that is responsible to interact with the database. In all other places you need task, you can use task.id.
I'm trying to get Azure Data Factory to read my REST API and put it in SQL Server. The source is a REST API and the sink is a SQL Server table.
I tried to do something like:
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"$": "json"
},
"collectionReference": "$.tickets"
}
The source looks like:
{ "tickets": [ {... }, {...} ] }
Because of the poor mapping capabilities I'm choosing this path; I'll then split the data with a query. Preferably I'd like to store each object inside tickets as a row, holding the JSON of that object.
In short, how can I get the JSON output from the RestSource to a SqlSink single column text/nvarchar(max) column?
I managed to solve the same issue by modifying the mapping manually.
ADF tries to parse the JSON anyway, but from the Advanced mode you can edit the JSON paths. For example, this is the original schema parsed automatically by ADF:
https://imgur.com/Y7QhcDI
Once opened in Advanced mode, it will show the full paths, adding indexes of the elements, something similar to $tickets[0][] etc.
Delete all the other columns and keep only the top-level one, $tickets (in my case it was $value: https://i.stack.imgur.com/WnAzC.jpg). As a result, the entire JSON will be written into the destination column.
If there are pagination rules in place, each page will be written as a single row.
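For reference, the resulting mapping in the copy activity's JSON might look roughly like the sketch below; the destination column name TicketsJson is a made-up placeholder:

"translator": {
  "type": "TabularTranslator",
  "mappings": [
    {
      "source": { "path": "$['tickets']" },
      "sink": { "name": "TicketsJson" }
    }
  ]
}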
I am using Node-RED to communicate with Cloudant, and each time my flow runs I might have a different number of IDs coming in msg.payload. Later I want to use these IDs to display all the relevant objects. Is it possible to search for multiple IDs in some way? Or do you have any other solution? I can't find anything about this online at the moment.
It looks like Node-RED supports querying by _id, by a search index, or for all documents. When you use _id there does not seem to be a way to specify more than one ID. You can, however, use a search index to query for multiple IDs.
Create a search index in Cloudant similar to the following:
{
  "_id": "_design/allDocSearch",
  "views": {},
  "language": "javascript",
  "indexes": {
    "byId": {
      "analyzer": "standard",
      "index": "function (doc) {\n index(\"id\", doc._id);\n}"
    }
  }
}
This corresponds to the following when using the Cloudant dashboard:
design doc = allDocSearch
index name = byId
index function =
function (doc) {
  index("id", doc._id);
}
To search for multiple IDs your query would look something like this:
id:"1" OR id:"2"
In Node-RED, set up your Cloudant node to point to the appropriate database, specify a "Search by" of search index, and configure your design document and index name (in this case it would be allDocSearch/byId).
You can test with a simple inject node whose payload resembles the search query above: id:"1" OR id:"2"
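If the IDs arrive as an array in msg.payload, a function node placed before the Cloudant node can build that query string; a sketch, assuming msg.payload is an array of ID strings like ["1", "2"]:

// Function node: turn an array of IDs into a Cloudant search query,
// e.g. ["1", "2"] becomes 'id:"1" OR id:"2"'
msg.payload = msg.payload.map(id => `id:"${id}"`).join(' OR ');
return msg;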
With Solr 4 came the ability to do atomic (partial) updates on existing documents within the index. I.e. one can match on the document ID and replace the contents of just one field, or add further entries to multivalued fields: http://wiki.apache.org/solr/Atomic_Updates
Can atomic updates be done from DataImportHandler (DIH)?
The answer is "yes", using the ScriptTransformer, as I discovered through trial and error.
The Solr documentation shows how to add an update attribute to a field node with "set", "add", or "inc". If I create a test XML file with the requisite update attribute, it works fine when passed to the regular update handler. But when passed to DIH, even without any transformation, the update attributes are ignored completely.
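For reference, such a test file for the regular update handler looks like this, using the same id and author values as the debug output below:

<add>
  <doc>
    <field name="id">123</field>
    <field name="author" update="add">Smith, J</field>
  </doc>
</add>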
Here's a simplified version of the script transformer I used to reintroduce the update attribute and get atomic updates working. Note the use of the Java HashMap.
var atomicTransformer = function (row) {
  // wrap the field value in a HashMap keyed by the atomic-update verb
  var authorMap = new java.util.HashMap();
  var author = String(row.get('author'));
  authorMap.put('add', author);
  row.put('author', authorMap);
  return row;
};
This produces the following JSON in DIH debug mode:
{
  "id": [
    123
  ],
  "author": [
    {
      "add": "Smith, J"
    }
  ]
}
Multivalued fields are also no problem: pass in an ArrayList to the HashMap instead of a string.
var atomicTransformer = function (row) {
  // for multivalued fields, put an ArrayList into the HashMap
  var fruits = new java.util.ArrayList();
  fruits.add("banana");
  fruits.add("apple");
  fruits.add("pear");
  var fruitMap = new java.util.HashMap();
  fruitMap.put('add', fruits);
  row.put('fruit', fruitMap);
  return row;
};
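For completeness, a sketch of how such a transformer might be wired into a DIH config; the entity name, query, and field list are placeholders, not taken from my actual setup:

<dataConfig>
  <script><![CDATA[
    var atomicTransformer = function (row) {
      var authorMap = new java.util.HashMap();
      authorMap.put('add', String(row.get('author')));
      row.put('author', authorMap);
      return row;
    };
  ]]></script>
  <document>
    <!-- transformer="script:functionName" attaches the function to the entity -->
    <entity name="books"
            query="SELECT id, author FROM books"
            transformer="script:atomicTransformer">
      <field column="id" name="id"/>
      <field column="author" name="author"/>
    </entity>
  </document>
</dataConfig>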