How to accumulate resolve calls for a single request?

When a GraphQL request hits my server, it fans out into several different resolve functions depending on the query.
But I don't want to hit my database with what can be dozens of small requests (essentially one SELECT per resolve function); that would not scale well.
Is there a way to accumulate all of these resolve calls for a single GraphQL request, so that at the end I can do some magic, build a single SELECT against my database, and then resolve all the promises?
I can see a few ways of building something myself by accumulating the promises I return from the resolve functions, but I don't know when all the resolve functions for a single request have been called.
Is there a solution to this problem? How does Facebook deal with that scenario?
Thank you

Batching all resolve calls into a single database query is an active area of exploration for GraphQL implementations and isn't supported by the reference implementation. Instead, we recommend using dataloader: it helps ensure that you fetch each record from the storage layer only once per request, even if it is queried multiple times.
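For illustration, here is a minimal sketch of what that looks like in Node resolvers; the `users` table, the `db.query` helper, and the schema fields are assumptions, not part of the original question:

```js
// Minimal sketch of per-request batching with dataloader.
// `db.query` is a hypothetical promise-based SQL client; the `users`
// table and the schema fields below are assumptions for illustration.
const DataLoader = require('dataloader');

function createLoaders(db) {
  return {
    // All `load(id)` calls made during one tick of the event loop are
    // collected and handed to this batch function as a single array.
    user: new DataLoader(async (ids) => {
      const rows = await db.query('SELECT * FROM users WHERE id IN (?)', [ids]);
      // dataloader requires results in the same order as the requested keys
      const byId = new Map(rows.map((row) => [row.id, row]));
      return ids.map((id) => byId.get(id) || null);
    }),
  };
}

// In resolvers, just call the loader; dozens of `load` calls collapse
// into one SELECT per request:
const resolvers = {
  Query: {
    user: (_, { id }, context) => context.loaders.user.load(id),
  },
  Post: {
    author: (post, _, context) => context.loaders.user.load(post.authorId),
  },
};
```

Note that loaders are typically created once per incoming request (e.g. in the GraphQL context), so the built-in cache never leaks data between users.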

Related

Making a request that triggers a function that searches a JSON object. The object will be updated every minute; could this lead to issues?

I am making a bus prediction web application for a college project. The application will use GTFS-R data, which is essentially a transit delay API that is updated regularly. In my application, I plan to use a cron job and a Python script to make regular GET requests and write the response to a JSON file, essentially creating a feed of transit updates. I have set up a GET request where the user inputs trip data, which is then searched against the feed to determine whether there are transit delays associated with their specific trip.
My question is: if the user sends a request at the same time as the JSON file is being updated, could this lead to issues?
One solution I was thinking of is an intermediary JSON file which, once fully written, would replace the file used by the search function.
I am not sure if this is a good solution or if it is even needed. I am also not sure of the terminology needed to search for solutions to similar problems, so pointers in the right direction would be useful.
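For what it's worth, the intermediary-file idea described above is the standard fix, usually implemented as an atomic rename. A minimal Node sketch of the pattern (file paths are assumptions; the same idea works in Python via os.replace):

```js
// Sketch of the intermediary-file approach: write the fresh feed to a
// temp file, then atomically rename it over the live file. On POSIX
// filesystems rename() is atomic, so a concurrent reader sees either
// the old complete file or the new complete file, never a partial write.
const fs = require('fs');

function publishFeed(feed) {
  const livePath = 'feed.json';    // path read by the search function (assumption)
  const tmpPath = 'feed.json.tmp'; // intermediary file on the same filesystem

  fs.writeFileSync(tmpPath, JSON.stringify(feed));
  fs.renameSync(tmpPath, livePath);
}
```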

Flink job dynamic input parameters

One parameter of my Flink job is dynamic, and I have an API for fetching its current value. Can I call the API in the source every time, so as to fetch data based on the parameter? Is that the correct way? Will it cause any trouble in the Flink job?
So, if I understand correctly, the idea is that you first get some key from DynamoDB and then use it to query an external service from the source.
I think that should be possible in general, but there are a few things to keep in mind when doing it.
I am not sure about the performance of such a solution. Are you going to query the database constantly, or somehow pick up only changes? There are several things to consider here to get good performance out of the source.
It may be hard to provide strong guarantees for such a setup, but that depends on the characteristics of the setup itself, i.e. how are you going to handle failures? How often will the key in the database change? Will the data still be accessible via the URL after the key in the DB changes? You can probably keep the last read key in state, so that when the job fails and the key in the DB changes you can try to read the data for the previous key (the one the job failed on), but that depends on the questions above.
Finally, depending on the characteristics of the setup, it may be possible to use existing Flink operators to achieve this. For example, you can technically stream changes from the database (using one of the existing connectors, depending on the DB) and then use AsyncIO to query the external URL, so that you end up with a stream of data from the URL without writing your own source.

Tracking unused redux data in React components

I'm looking for a good way to track which props received by a component are not being used and can be safely removed.
In a system I maintain, our client single-page app fetches a large amount of data from private endpoints in our backend services via redux-saga. For most endpoints, all data received is passed directly to our React components with no filtering applied. We are working to improve overall system performance, and part of that involves reducing the amount of data returned by our backend-for-frontend services, since they themselves call a large number of services to compose the returned JSON, which adds to the overall response time.
Ideally, we want to fetch only the data we absolutely need and save the server from doing unnecessary calls and data normalization. So far, we've been trimming the backend services' data by code inspection: we inspect the data returned by each endpoint, inspect the front-end code, and finally remove the data we identified (as a best guess) as unused. That has proven risky and inefficient; we frequently assume some data is unused, then months later find a corner case in which it was actually needed and have to reverse the work. I'm looking for a smart, automated way to identify unused props in my app. Has anyone else had to work on something like this before? Ideas?
There's an existing library called redux-usage-report (https://github.com/aholachek/redux-usage-report), which wraps the Redux state in a proxy to identify which pieces of state are actually being used.
That may be sufficiently similar to what you're doing to be helpful, or at least give you some ideas that you can take inspiration from.
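The core idea is small enough to sketch: wrap the state in a Proxy that records every property path that gets read, then diff that set against all paths present in the state. A rough version, with all names being illustrative assumptions:

```js
// Rough sketch of the proxy technique redux-usage-report is built on:
// record every state path that is actually read during rendering.
const accessedPaths = new Set();

function track(value, path = '') {
  if (value === null || typeof value !== 'object') return value;
  return new Proxy(value, {
    get(target, key, receiver) {
      const childPath = path ? `${path}.${String(key)}` : String(key);
      accessedPaths.add(childPath);
      // Recurse so nested reads like state.user.address.city are recorded too
      return track(Reflect.get(target, key, receiver), childPath);
    },
  });
}

// Usage sketch: hand components a tracked state, exercise the app,
// then compare `accessedPaths` against every path in the state;
// paths that never show up are candidates for removal.
const trackedState = track(store.getState());
```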

Is it okay to have dataloader and REST API caching?

I am building a GraphQL server to wrap multiple RESTful APIs. Some of the APIs I will be integrating are third-party, and some are our own. We use Redis as a caching layer. Is it okay if I implement dataloader caching in GraphQL? Will it have an effect on my existing Redis caching?
Dataloader does not serve only one purpose; in fact, it serves three.
Caching: You mentioned caching. I am assuming that you are building a GraphQL gateway/proxy in front of your APIs. Caching in this case means that when you fetch a specific resource and need it again later, you get back the cached value. This caching happens in the memory of your JavaScript process and usually does not conflict with any other type of caching, e.g. on the network.
Batching: Since queries can be nested quite deeply, you will ultimately reach a point where you request multiple values of the same resource type in different parts of the query execution. Dataloader collects them and resolves resources like a cascade: requests flow into a queue and are kept there until the execution cycle ends, then they are all "released" at once (and can probably be resolved in batches). The delivered promises are also all resolved at once (even if some results come in earlier than others), which allows the next execution level to also happen within one cycle.
Deduplication: Let's say you fetch a list of BlogPost items with a field author of type User, and multiple blog posts in the list were written by the same author. When the same key is requested twice, it is only delivered once to the batch function. Dataloader then takes care of delivering the resources back by resolving the respective promises.
The point is that (1) and (3) can also be achieved with a decent HTTP client that caches requests (and not only responses, i.e. one that does not fire another request when one is already in flight for that resource). That means the interesting question is whether your REST API supports batch requests (e.g. api/user/1,2 in one request instead of api/user/1 and api/user/2). If it does, using dataloader can massively improve the performance of your API.
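As a sketch, a batch function backed by such an endpoint might look like this (the api/user/1,2 route shape is taken from the example above; the base URL and response shape are assumptions):

```js
// Sketch: dataloader batching on top of a batch-capable REST endpoint.
// Assumes the API accepts comma-separated ids, as in api/user/1,2.
const DataLoader = require('dataloader');

const userLoader = new DataLoader(async (ids) => {
  const res = await fetch(`https://api.example.com/api/user/${ids.join(',')}`);
  const users = await res.json();
  const byId = new Map(users.map((u) => [u.id, u]));
  // Return results in the same order as the requested ids
  return ids.map((id) => byId.get(id) || null);
});
```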
Maybe you also want to look into what Apollo is building right now with their RESTDataSource: https://www.apollographql.com/docs/apollo-server/v2/features/data-sources.html#REST-Data-Source

Why do major DB vendors not provide truly asynchronous APIs?

I work with Oracle and MySQL, and I struggle to understand why the APIs are not written so that I can issue a call, go away and do something else, and then come back and pick up the result later (as with NIO); instead I am forced to dedicate a thread to waiting for data. The SQL interfaces seem to be the only place where synchronous IO is still forced, which means tying up a thread waiting for the DB.
Can anybody explain the reasons for this? Is there something fundamental that makes this difficult?
It would be great to be able to use one or two threads to manage DB query submission and result fetching, rather than tying up worker threads to retrieve data.
I note that there are a couple of experimental attempts (e.g. adbcj) at implementing an async API, but none seem ready for production use.
Database servers need to be able to handle thousands of clients. To provide an asynchronous interface, the DB server would have to keep the result set from each query in memory so that you could pick it up at a later stage; it would quickly run out of resources.
A considerable problem with going async is that many libraries use thread-locals for transactions.
For example, in Java much of the JDBC specification relies on synchronous behavior to achieve a single thread per transaction; that is, you write your transaction in procedural order.
To do it right, transactions would have to be handled through callbacks, but they are not. The only environment I know of that does this is node.js, and it's unclear whether it is really async.
Of course, even if you do go async, I'm not sure it will really improve performance, as the database itself is probably executing synchronously.
There are lots of ways to avoid thread over-population in Java; see for example:
Is asynchronous jdbc call possible?
Personally, to get around this issue, I use a message bus like RabbitMQ.
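As a rough illustration of that pattern (here in Node with amqplib; the queue name and the `db.query` helper are assumptions), callers publish queries to the bus and a small pool of workers is the only place that ever blocks on the database:

```js
// Sketch of offloading blocking DB work to a message bus (amqplib).
// The queue name and the `db` client are assumptions for illustration.
const amqp = require('amqplib');

async function startDbWorker(db) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('db.queries');

  // The worker is the only component that waits on the database;
  // callers publish a query and continue doing other work.
  ch.consume('db.queries', async (msg) => {
    const { sql, params } = JSON.parse(msg.content.toString());
    const rows = await db.query(sql, params);
    // Send the result back on the caller's reply queue (RPC pattern)
    ch.sendToQueue(
      msg.properties.replyTo,
      Buffer.from(JSON.stringify(rows)),
      { correlationId: msg.properties.correlationId }
    );
    ch.ack(msg);
  });
}
```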
