Querying relational data in GraphQL (NoSQL vs RDS)

I'm writing an application that contains an overall data model with some obvious relations. I started writing the application using MongoDB but decided to try transitioning to Postgres, since my data has tons of "foreign keys". For simplicity, let's consider the following models:
class GameBase {
  id: string
  entryIds: string[]
  submissionIds: string[]
}
class EntryBase {
  id: string
  description: string
  gameId: string
  userId: string // id of user who supplied entry
  submissionIds: string[] // submissions where entry is covered
}
class SubmissionBase {
  id: string
  url: string
  gameId: string
  userId: string // id of user who submitted
  entryIds: string[] // entries covered by submission
}
Now I understand that if I use a tool like TypeORM, I could retrieve these relations with something along the lines of:
const games = await gameRepository.find({ relations: ["entryIds", "submissionIds"] });
But I'm not really sure how that relates to GraphQL. What I've been doing up until now is adding @ResolveField methods inside my resolvers and writing something like:
// game.resolver.ts
@ResolveField(() => [SubmissionBase], { nullable: true })
submissions(@Parent() game: GameBase) {
  return this.submissionService.getManySubmissions(game.submissionIds)
}
and in the service
// game.service.ts
async getManySubmissions(submissionIds: string[]): Promise<SubmissionBase[]> {
  if (!submissionIds) return []
  return await this.submissionRepository.find({
    where: {
      id: { $in: submissionIds },
    },
  })
}
So this makes sense to me and has been working great; I'm just curious whether I would see tangible speed/performance improvements if I switched to a relational database. For example, if the same .find method you see in my service was backed by Postgres instead of MongoDB, and the appropriate foreign-key relationship was established, could I reasonably expect speed improvements? I imagine I wouldn't, since it's just a simple get with no joins. Also, although submissionIds is a pseudo foreign key (because of MongoDB), it still acts as one in this setup. I guess I'm failing to see why MongoDB is inherently the wrong choice for relational data if you can use GraphQL and something like @ResolveField to grab whatever you need. What would a successful implementation of GraphQL backed by an RDS look like given this context?

This is a good question, although I think it's going to get opinionated, non-definitive answers. My personal experience and advice, after working on multiple production GraphQL servers talking to both SQL and NoSQL databases, is this:
If you're going to expose GraphQL over a relational DB like Postgres, and you're using NestJS, do not write the GraphQL layer by hand.
It's extremely time-consuming and error-prone, plus you'll run into all kinds of N+1 and performance problems while sacrificing much of the functionality an RDS gives you in the first place (like the joins you're talking about). As someone who has gone down this path before: please don't.
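To make the N+1 point concrete: with the @ResolveField pattern from the question, the submissions resolver runs once per parent game, so the number of database round trips grows with the result size. A hedged illustration:
query {
  games {            # 1 query to fetch the games
    id
    submissions {    # + 1 getManySubmissions() call per game returned
      url
    }
  }
}
Fetching 100 games costs roughly 101 round trips. Batching layers like DataLoader reduce the damage, but a relational engine can answer the whole thing with a single join.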
There is a plethora of extremely powerful technologies that instead let you generate a GraphQL API on top of your RDS. These technologies then parse the GraphQL AST and convert it into a single, optimized SQL query. I highly recommend that you look into Hasura and PostGraphile. Both of these will blow you away with how productive you can be. Manually writing resolvers on top of SQL relations is just a waste of time.
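For a rough sense of what that compilation looks like, the nested query above might become one join-based statement along these lines (illustrative only: the table and column names are assumptions, and neither tool emits exactly this SQL):
SELECT g.id,
       json_agg(json_build_object('url', s.url)) AS submissions
FROM games g
LEFT JOIN submissions s ON s.game_id = g.id
GROUP BY g.id;
One round trip, with the nesting reassembled as JSON by Postgres itself.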
These tools can then be integrated into your NestJS application. If you're interested in Hasura specifically, I maintain open-source packages that can help you integrate it nicely with NestJS.
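As for the schema side of "what would a successful implementation look like": with real foreign keys, the ID arrays disappear and the relations live in the database. A minimal sketch, assuming TypeORM entities over Postgres (the names mirror the question's models; everything else is illustrative), where a join table replaces the two duplicated ID arrays:
import {
  Entity, PrimaryGeneratedColumn, Column,
  ManyToOne, OneToMany, ManyToMany, JoinTable,
} from 'typeorm'

@Entity()
export class Game {
  @PrimaryGeneratedColumn('uuid')
  id: string

  @OneToMany(() => Entry, (entry) => entry.game)
  entries: Entry[]

  @OneToMany(() => Submission, (submission) => submission.game)
  submissions: Submission[]
}

@Entity()
export class Entry {
  @PrimaryGeneratedColumn('uuid')
  id: string

  @Column()
  description: string

  @ManyToOne(() => Game, (game) => game.entries)
  game: Game // backed by a real game_id foreign-key column

  @ManyToMany(() => Submission, (submission) => submission.entries)
  submissions: Submission[]
}

@Entity()
export class Submission {
  @PrimaryGeneratedColumn('uuid')
  id: string

  @Column()
  url: string

  @ManyToOne(() => Game, (game) => game.submissions)
  game: Game

  // Many-to-many: a submission covers many entries and vice versa,
  // stored in a join table instead of two ID arrays kept in sync by hand.
  @ManyToMany(() => Entry, (entry) => entry.submissions)
  @JoinTable()
  entries: Entry[]
}
A tool like Hasura or PostGraphile pointed at this schema infers the relations from the foreign keys and exposes the nested GraphQL fields for free.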

Related

Is it safe to access Elasticsearch from a client without going through an API server?

For example, suppose you embed the following JavaScript code in Vue.js or React.js:
var elasticsearch = require('elasticsearch');
var esclient = new elasticsearch.Client({
  host: 'your Elasticsearch Cloud host URL'
});
esclient.search({
  index: 'your index',
  body: {
    query: {
      match: { message: 'search keyword' }
    },
    aggs: {
      your_states: {
        terms: {
          field: 'your field',
          size: 10
        }
      }
    }
  }
}).then(function (response) {
  var hits = response.hits.hits;
});
When building a search engine for an application like Stack Overflow, if the public is limited to GET-only access via the role settings of Elasticsearch Cloud, I thought the same thing could be achieved with the client-side code above, without preparing an API server at all.
Is this a security problem (for example, is it dangerous for the host name to end up on the client side)?
If there is no problem, the search engine would respond faster and the cost of implementation would be lower, so I wonder why more people don't do it (sample code like this is rarely seen on the net).
Thank you.
It is NOT a good idea.
If any client with a bit of programming knowledge finds out your Elasticsearch IP address, you are screwed: they could basically delete all the data without you even noticing.
I'm not familiar with X-Pack Security, but if you are not using it, you are absolutely forced to hide ES behind an API.
Then you also have to secure your ES domain to allow access only from the API server and block the rest of the world.
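To sketch what "hide ES behind an API" can look like in practice (a minimal sketch, assuming Express and the official @elastic/elasticsearch client with its v8 API; the endpoint path, index name, and field name are illustrative):
import express from 'express'
import { Client } from '@elastic/elasticsearch'

const app = express()
// The cluster URL and any credentials live only on the server.
const es = new Client({ node: process.env.ES_URL })

app.get('/search', async (req, res) => {
  const keyword = String(req.query.q ?? '')
  // Only this fixed query shape is exposed; the browser can never send
  // arbitrary requests (deletes, index admin, etc.) to the cluster.
  const result = await es.search({
    index: 'your-index',
    query: { match: { message: keyword } },
  })
  res.json(result.hits.hits)
})

app.listen(3000)
Combined with a firewall rule that only lets the API server reach the cluster, this closes the hole described above.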

Let publication return modified/"fake"/non-database data, still queryable by client

A standard publication would look something like this:
Meteor.publish('suggestions', function(query){
  return MyDB.find({param: query});
});
and the results, once subscribed to, would then be accessible on the client by simply doing MyDB.find(...);.
However, how would I implement
(a) Some kind of pre-processing, meaning I add or remove certain properties on the queried documents server-side, which should then still be queryable client-side?
(b) returning fake data, i.e. data following the database schema and still being queryable client-side, but not actually being present server side?
Example:
Meteor.publish('suggestions', function(query){
  //Stuff in database: [{prop1: 'first'}, {prop1: '2nd'}]
  if(query == 'something') { //Fake data
    return [{prop1: 'hello', prop2: 42}];
  } else {
    var result = MyDB.find().fetch();
    result.forEach(function(element) {
      element.prop2 = random_number;
    });
    return result;
  }
});
So if I then subscribe to 'suggestions' on the client, I'd like to see the following:
//Subscribed with query 'something':
var arr = MyDB.find().fetch();
//arr equals [{prop1: 'hello', prop2: 42}]
//Subscribed with another query:
var arr = MyDB.find().fetch();
//arr equals [{prop1: 'first', prop2: random_number}, {prop1: '2nd', prop2: random_number}]
Basically, as said above, I want the database data to be modified a bit or completely before being sent to the client, but then the client should be able to query it as if it was coming directly from the database.
How would I go about doing this?
I believe the answer given in Meteor : how to publish custom JSON data? covers this. I had a similar need and that answer helped me.
You should manipulate the "added", "changed", "ready()", etc. handlers of the publication directly.
Or even evaluate whether it would be better to use Meteor.call().
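For reference, a hedged sketch of that low-level publish API (it assumes MyDB was created as new Mongo.Collection('mydb'), and the random number stands in for your real pre-processing):
Meteor.publish('suggestions', function (query) {
  var self = this;
  if (query === 'something') {
    // Fabricated document: it exists only in the client-side cache
    self.added('mydb', 'fake-id', { prop1: 'hello', prop2: 42 });
  } else {
    MyDB.find().forEach(function (doc) {
      var id = doc._id;
      delete doc._id;
      doc.prop2 = Math.floor(Math.random() * 100); // computed server-side
      self.added('mydb', id, doc);
    });
  }
  self.ready(); // tell the client the initial data set is complete
});
On the client, MyDB.find().fetch() then returns exactly these documents, as if they came straight from the database.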
I answered a similar question at How do you publish and subscribe to data that's not Mongo db? and put references and guides in that answer.
Note that the client-side subscriber in that question was implemented in React.js, but I think the way to publish data on the server side is almost the same.

Saving Documents to CouchDB Urls with Multiple Slashes

My first exposure to NoSQL DBs was through Firebase, where I'd typically store JSON data at a URL like category, then store something else later at a URL like category/subcategory.
Trying to do the same in CouchDB I ran into a problem.
For example, I saved a simple object like:
{"_id":"one"}
to
database/category
which works as expected. Then I try saving the following
{"_id":"two"}
to
database/category/subcategory
I get this error message:
{"error":"not_found","reason":"Document is missing attachment"}
Apparently, when you use multiple slashes in a URL, CouchDB interprets the extra path segment as an attachment. If this is so, how does one build databases where data has multiple levels, like Geography/Continents/Africa/Egypt, for example?
CouchDB is not suitable for the usage you described. CouchDB is a flat document store.
You should flatten your structure in order to store it in CouchDB.
{"_id":"country-es",
"type":"geography",
"country":"Spain",
"continent":"Europe"
}
{"_id":"country-fr",
"type":"geography",
"country":"France",
"continent":"Europe"
}
Then use a view in order to have a mechanism to query it hierarchically.
function (doc) {
  if (doc.type == "geography") {
    emit([doc.continent, doc.country], doc._id);
  }
}
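Assuming you save that view in a design document (say _design/geo under the view name by_location; both names are illustrative), the composite [continent, country] key lets you query the hierarchy with key ranges over plain HTTP:
// A single country:
GET /database/_design/geo/_view/by_location?key=["Europe","Spain"]
// Everything in a continent ({} sorts after any string, closing the range):
GET /database/_design/geo/_view/by_location?startkey=["Europe"]&endkey=["Europe",{}]
Deeper hierarchies (continent/country/city/...) work the same way: just add more components to the emitted array key.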

App Engine Instance ID

Is it possible to get info on what instance you're running on? I want to output just a simple identifier for which instance the code is currently running on for logging purposes.
Since there is no language tag, and seeing your profile history, I assume you are using GAE/J?
In that case, the instance ID information is embedded in one of the environment attributes that you can get via the ApiProxy.getCurrentEnvironment() method. You can then extract the instance ID from the resulting map using the key BackendService.INSTANCE_ID_ENV_ATTRIBUTE.
Even though the key is stored in BackendService, this approach will also work for frontend instances. So in summary, the following code would fetch the instance ID for you:
String tInstanceId = ApiProxy.getCurrentEnvironment()
.getAttributes()
.get( BackendService.INSTANCE_ID_ENV_ATTRIBUTE )
.toString();
Please keep in mind that this approach is quite undocumented by Google and might be subject to change without warning in the future. But since your use case is only logging, I think it is sufficient for now.
With the advent of Modules, you can get the current instance id in a more elegant way:
ModulesServiceFactory.getModulesService().getCurrentInstanceId()
Even better, you should wrap the call in a try catch so that it will work correctly locally too.
Import this
import com.google.appengine.api.modules.ModulesException;
import com.google.appengine.api.modules.ModulesServiceFactory;
Then your method can run this
String instanceId = "unknown";
try {
    instanceId = ModulesServiceFactory.getModulesService().getCurrentInstanceId();
} catch (ModulesException e) {
    instanceId = e.getMessage();
}
Without the try catch, you will get some nasty errors when running locally.
I have found this super useful for debugging when using Endpoints mixed with pub/sub and other bits, to work out why some things behave differently and whether it is related to new instances.
Not sure about before, but today in 2021 the system environment variable GAE_INSTANCE appears to contain the instance id:
instanceId = System.getenv("GAE_INSTANCE")

Salesforce Metadata apis

I want to retrieve a list of metadata components, such as ApexClass, using the Salesforce Metadata API.
I'm currently getting a list of all the Apex classes in the org (2,246 in total) with the following code, and it takes too much time to retrieve the file names:
ListMetadataQuery query = new ListMetadataQuery();
query.type = "ApexClass";
double asOfVersion = 23.0;
// Assume that the SOAP binding has already been established.
FileProperties[] lmr = metadataService.listMetadata(
    new ListMetadataQuery[] { query }, asOfVersion);
if (lmr != null)
{
    foreach (FileProperties n in lmr)
    {
        string filename = n.fileName;
    }
}
My requirement is to get the list of metadata components (Apex classes) developed by my organization only, so that I get just the components relevant to me and can save time by not fetching all the classes.
How can I achieve this?
Thanks in advance.
I've not used the Metadata API directly, but I'd suggest either trying to filter on the created-by field, or using a prefixed name on your classes so you can filter on that.
Not sure if filters are possible, though! As for speed, my experience of using the Metadata API via Eclipse is that it's always pretty slow, and there's not much you can do about it!
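For what it's worth, a hedged sketch of that filtering, continuing the snippet above: listMetadata has no server-side filter as far as I know, but each FileProperties result carries createdByName and namespacePrefix fields you can filter on client-side (the developer name below is illustrative):
if (lmr != null)
{
    foreach (FileProperties n in lmr)
    {
        // Skip classes from managed packages, which carry a namespace prefix
        if (!string.IsNullOrEmpty(n.namespacePrefix)) continue;
        // Keep only classes created by your own developers
        if (n.createdByName == "My Developer")
        {
            string filename = n.fileName;
        }
    }
}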
