I am building a GraphQL server to wrap multiple RESTful APIs, some third party and some that we own. We use Redis as a caching layer. Is it okay if I implement DataLoader caching in the GraphQL layer? Will it have an effect on my existing Redis caching?
DataLoader does not serve only one purpose. In fact, there are three purposes that DataLoader serves.
Caching: You mentioned caching. I am assuming that you are building a GraphQL gateway/proxy in front of your REST APIs. Caching here means that when you fetch a specific resource and need it again later, you can reach back to the cached value. This caching happens in the memory of your JavaScript application and usually does not conflict with any other type of caching, e.g. on the network.
Batching: Since queries can be nested quite deeply, you will ultimately reach a point where you request multiple values of the same resource type in different parts of your query execution. DataLoader collects these requests and resolves resources like a cascade: requests flow into a queue and are kept there until the execution cycle ends, then they are all "released" at once (and can probably be resolved in batches). The delivered promises are also all resolved at once (even if some results come in earlier than others), which allows the next execution level to happen within one cycle as well.
Deduplication: Let's say you fetch a list of BlogPost with a field author of type User, and multiple blog posts in this list were written by the same author. When the same key is requested twice, it is only delivered once to the batch function. DataLoader then takes care of delivering the resources back by resolving the respective promises.
The point is that (1) and (3) can be achieved with a decent HTTP client that caches requests rather than only responses, i.e. one that does not fire another request when one is already in flight for the same resource. This means the interesting question is whether your REST API supports batch requests (e.g. api/user/1,2 in one request instead of api/user/1 and api/user/2). If so, using DataLoader can massively improve the performance of your API.
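To make this concrete, here is a minimal sketch of a DataLoader in front of such a batch endpoint. The base URL and the api/user/1,2 route are assumptions for illustration, not part of your setup:

    const DataLoader = require('dataloader');

    // Hypothetical batch endpoint: GET /api/user/1,2,3 returns an array of users.
    // The batch function must return results in the same order as the keys.
    const userLoader = new DataLoader(async (ids) => {
      const res = await fetch(`https://example.com/api/user/${ids.join(',')}`);
      const users = await res.json();
      const byId = new Map(users.map((u) => [u.id, u]));
      return ids.map((id) => byId.get(id) ?? null);
    });

    // These three calls are collected within one tick, deduplicated
    // (key 1 appears once), and sent as a single batched request:
    // await Promise.all([userLoader.load(1), userLoader.load(2), userLoader.load(1)]);

Note that the loader is typically created once per incoming GraphQL request, so its in-memory cache lives only for a single execution and does not interfere with a Redis layer sitting behind it.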
You may also want to look into what Apollo is building right now with their RESTDataSource: https://www.apollographql.com/docs/apollo-server/v2/features/data-sources.html#REST-Data-Source
Suppose I need to make a component (using React) that displays a device state table showing various device attributes such as IP address, name, etc. Polling each device takes a few seconds, so a loading indicator needs to be displayed for each specific device in the table.
I see several ways to build such a component:
On the server side, I can create an API endpoint (like GET /device_info) that returns data for only one requested device and make multiple parallel requests from the frontend to that endpoint, one for each device.
I can create an API endpoint (like GET /devices_info) that returns data for the entire list of devices at once and make a single request to it from the frontend.
Each method has its pros and cons:
Way one:
Pros:
Easy to build. We make a "table row" component that requests data only for the device whose data it displays. The "device table" component then consists of several "table row" components, each executing its own query in parallel. This is especially convenient if you are using libraries such as React Query or RTK Query, which support this behavior out of the box;
Cons:
Many requests to the endpoint, possibly hundreds (although the number of parallel requests can be limited);
If, for some reason, synchronized access to some shared resource is required on the server side and the server runs several workers, synchronization between workers can be very difficult to achieve, especially if each worker is a separate process;
Way two:
Pros:
One request to the endpoint;
There are no problems with access to a shared resource; everything is controlled within a single request on the server (because the request is guaranteed to be handled by a single worker on the server side);
Cons:
It's hard to build. Because polling different devices takes different amounts of time, one request essentially has several intermediate states, so you need to make periodic requests from the UI to get the updated state, and you also need to support an interface with several states for each device, such as "not polled", "in progress", and "done";
With this in mind, my questions are:
Which is the better way to build the described component?
Does the better way have a name? Maybe it's some kind of pattern?
Maybe you know a great book/article/post that describes a solution to a similar problem?
"that displays a device state"
The component asking for a device state is so... 2010?
If your device knows its state, then have your device send its state to the component.
SSE (Server-Sent Events) and the EventSource API:
https://developer.mozilla.org/.../API/Server-sent_events
https://developer.mozilla.org/.../API/EventSource
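A minimal sketch of the client side, assuming a hypothetical /device-events endpoint that pushes one JSON message per device update (updateTableRow stands in for whatever rendering you use):

    // Subscribe to a hypothetical SSE endpoint.
    const source = new EventSource('/device-events');

    source.onmessage = (event) => {
      const device = JSON.parse(event.data); // e.g. { id, ip, name, status }
      updateTableRow(device); // hypothetical: re-render just that row
    };

    source.onerror = () => {
      // EventSource reconnects automatically; log for visibility.
      console.warn('SSE connection lost, retrying...');
    };

The browser keeps the connection open and the server writes an event whenever a device reports new state, so no polling loop is needed.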
PS. React is your choice; I would go with native JavaScript Web Components, so you have ZERO dependencies on frameworks or libraries for the next 30 JavaScript years.
Many moons ago, I created a dashboard with PHP backend and Web Components front-end WITHOUT SSE: https://github.com/Danny-Engelman/ITpings
(no longer maintained)
Here is a general outline of how this approach might work:
On the server side, create an API endpoint that returns data for all devices in the table.
When the component is initially loaded, make a request to this endpoint to get the initial data for all devices.
Use this data to render the table, but show a loading indicator for each device that has not yet been polled.
Use a client-side timer to periodically poll each device that is still in a loading state.
When the data for a device is returned, update the table with the new information and remove the loading indicator.
This approach minimizes the number of API requests by polling devices only when necessary, while still providing a responsive user interface that updates in real time.
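Here is a minimal React sketch of that outline. The endpoint names (/devices_info, /device_info) and the polled flag are assumptions for illustration:

    import { useEffect, useState } from 'react';

    // Hypothetical API: GET /devices_info returns [{ id, name, ip, polled }];
    // GET /device_info?id=X returns the freshly polled state of one device.
    function DeviceTable() {
      const [devices, setDevices] = useState([]);

      // Initial load: fetch the list once and render it immediately.
      useEffect(() => {
        fetch('/devices_info')
          .then((res) => res.json())
          .then(setDevices);
      }, []);

      // Polling loop: periodically re-poll only the devices still loading.
      useEffect(() => {
        const timer = setInterval(() => {
          devices
            .filter((d) => !d.polled)
            .forEach((d) =>
              fetch(`/device_info?id=${d.id}`)
                .then((res) => res.json())
                .then((fresh) =>
                  setDevices((prev) =>
                    prev.map((p) => (p.id === fresh.id ? fresh : p))
                  )
                )
            );
        }, 2000);
        return () => clearInterval(timer);
      }, [devices]);

      // Render: show a loading indicator until a device has been polled.
      return (
        <table>
          <tbody>
            {devices.map((d) => (
              <tr key={d.id}>
                <td>{d.name}</td>
                <td>{d.polled ? d.ip : 'Loading…'}</td>
              </tr>
            ))}
          </tbody>
        </table>
      );
    }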
As for a name or pattern for this approach, it could be considered a form of progressive enhancement or lazy loading, where the initial data is loaded on the server-side and additional data is loaded on-demand as needed.
Both ways of making the component have their pros and cons, and the choice ultimately depends on the specific requirements and constraints of the project. However, here are some additional points to consider:
Way one:
This approach is essentially component-level (per-row) data fetching, where each part of the UI fetches exactly the data it needs. It's a common pattern in modern web development, especially with the rise of GraphQL and per-component data-fetching libraries.
The main advantage of this approach is its simplicity and modularity. Each "table row" component is responsible for fetching and displaying data for its own device, which makes it easy to reason about and test. It also allows for fine-grained caching and error handling at the component level.
The main disadvantage is the potential for network congestion and server overload, especially if there are a large number of devices to display. This can be mitigated by implementing server-side throttling and client-side caching, but it adds additional complexity.
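For the throttling mentioned above, a minimal client-side sketch of capping the number of parallel requests (fetchDevice is a hypothetical function returning a promise per device id):

    // Run at most `limit` fetches at a time using a small worker pool.
    async function fetchAllLimited(ids, fetchDevice, limit = 6) {
      const results = [];
      let next = 0;
      async function worker() {
        while (next < ids.length) {
          const i = next++; // claim the next index
          results[i] = await fetchDevice(ids[i]);
        }
      }
      await Promise.all(
        Array.from({ length: Math.min(limit, ids.length) }, worker)
      );
      return results; // same order as ids
    }

Each worker picks up the next device as soon as its previous request finishes, so the server never sees more than `limit` requests in flight.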
Way two:
This approach is essentially batch, client-driven data fetching: the client makes one request and then drives the rendering logic based on the available data. It's a more traditional approach that relies on client-side JavaScript to manipulate the DOM and update the UI.
The main advantage of this approach is its efficiency and scalability. With only one request to the server, there's less network overhead and server load. It also allows for more granular control over the UI state and transitions, which can be useful for complex interactions.
The main disadvantage is its complexity and brittleness. Managing the UI state and transitions can be difficult, especially when dealing with asynchronous data fetching and error handling. It also requires more client-side JavaScript and DOM manipulation, which can slow down the UI and increase the risk of bugs and performance issues.
In summary, both approaches have their trade-offs and should be evaluated based on the specific needs of the project. There's no one-size-fits-all solution or pattern, but there are several best practices and libraries that can help simplify the implementation and improve the user experience. Some resources to explore include:
React Query and RTK Query for data fetching and caching in React.
Suspense and Concurrent Mode for asynchronous rendering and data loading in React.
GraphQL and Apollo for server-driven data fetching and caching.
Redux and MobX for state management and data flow in React.
Progressive Web Apps (PWAs) and Service Workers for offline-first and resilient web applications.
Both approaches have their own advantages and disadvantages, and the best one depends on the specific requirements and constraints of your project. However, given that polling each device takes a few seconds and you need to display a loading indicator for each specific device, the first approach (making multiple parallel requests from the frontend to an endpoint that returns data for one requested device) seems more suitable. It lets you display a per-device loading indicator and update each row independently as soon as the data for that device becomes available.
This approach is commonly known as "concurrent data fetching" or "parallel data fetching", and it is supported by many modern front-end libraries and frameworks, such as React Query and RTK Query. These libraries allow you to easily make multiple parallel requests and manage the caching and synchronization of the data.
To implement this approach, you can create a "table row" component that requests data only for the device whose data it displays; the "device table" component then consists of several "table row" components, each executing its own query in parallel. You can also limit the number of parallel requests to avoid overloading the server.
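A minimal sketch of this with React Query (the /device_info?id=X endpoint is a hypothetical stand-in for your per-device API):

    import { useQuery } from '@tanstack/react-query';

    // One query per row; React Query runs them in parallel and caches by key.
    function DeviceRow({ id }) {
      const { data, isLoading } = useQuery({
        queryKey: ['device', id],
        queryFn: () => fetch(`/device_info?id=${id}`).then((res) => res.json()),
      });

      if (isLoading) return <tr><td colSpan={2}>Loading…</td></tr>;
      return (
        <tr>
          <td>{data.name}</td>
          <td>{data.ip}</td>
        </tr>
      );
    }

    function DeviceTable({ ids }) {
      return (
        <table>
          <tbody>
            {ids.map((id) => <DeviceRow key={id} id={id} />)}
          </tbody>
        </table>
      );
    }

Each row manages its own loading state, so rows fill in independently as their requests complete.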
To learn more about concurrent data fetching and its implementation using React Query or RTK Query, you can refer to their official documentation and tutorials. You can also find many articles and blog posts on this topic by searching for "concurrent data fetching" or "parallel data fetching" in Google or other search engines.
I am making a bus prediction web application for a college project. The application will use GTFS-R data, which is essentially a transit delay API that is updated regularly. In my application, I plan to use a cron job and a Python script to make regular GET requests and write the response to a JSON file, essentially creating a feed of transit updates. I have set up a GET request where the user inputs trip data, which is searched against the feed to determine if there are transit delays associated with their specific trip.
My question is - if the user sends a request at the same time as the JSON file is being updated, could this lead to issues?
One solution I was thinking of is having an intermediary JSON file which, once fully written, replaces the file used by the search function.
I am not sure if this is a good solution or if it is even needed. I am also not sure of the terminology needed to search for solutions to similar problems, so pointers in the right direction would be useful.
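For reference, the intermediary-file idea described above is the standard write-then-rename pattern: a rename within the same filesystem is atomic, so readers always see either the old file or the new one, never a half-written one. A minimal sketch in Node.js (Python's os.replace gives the same guarantee):

    const fs = require('fs');

    // Paths are illustrative. Write the fresh feed to a temporary file
    // first, then atomically swap it into place.
    function updateFeed(feed) {
      fs.writeFileSync('feed.json.tmp', JSON.stringify(feed));
      fs.renameSync('feed.json.tmp', 'feed.json'); // atomic on the same filesystem
    }

Useful search terms for this class of problem are "atomic file replacement" and "read-write race condition".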
When a graphql request hits my server it jumps into several different resolve functions depending on the query.
But I don't want to hit my database with sometimes dozens of small requests (basically a SELECT in each resolve function). That would not scale very well.
Is there a way to accumulate all these resolve calls for a single GraphQL request, so that at the end I can do some magic and build only a single SELECT for my database and then resolve all the promises?
I can see a few ways of building something myself by accumulating all the promises I return in the resolve functions, but I don't know when all resolve functions for a single request have been called.
Is there a solution to this problem? How does Facebook deal with that scenario?
Thank you
Batching all resolve calls into a single database query is an active area of exploration for GraphQL implementations and isn't supported in the reference implementation. Instead, we recommend using DataLoader. Using DataLoader can help ensure that you fetch each record from the storage layer once, even if it is queried multiple times.
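A sketch of how this looks in practice. The db.query helper and the Postgres-style ANY($1) placeholder are assumptions; the point is that all load() calls from one execution level collapse into a single SELECT:

    const DataLoader = require('dataloader');

    // All userLoader.load(id) calls made in the same tick arrive here as
    // one array of ids, producing one SELECT ... WHERE id IN (...) query.
    const userLoader = new DataLoader(async (ids) => {
      const rows = await db.query('SELECT * FROM users WHERE id = ANY($1)', [ids]);
      const byId = new Map(rows.map((r) => [r.id, r]));
      return ids.map((id) => byId.get(id) ?? null); // preserve key order
    });

    const resolvers = {
      Post: {
        author: (post) => userLoader.load(post.authorId),
      },
    };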
This is something that's been bothering me.
In a typical web application, a request goes to the server, and the server does its 'stuff' and returns the template along with the data, ready to be displayed in the browser.
In Angular, however, there is one request to get the template and, in most cases, another one to get the data.
Doesn't this add pressure on the bandwidth – two responses sent over the wire as compared to one from a typical web app? Doesn't it also imply a (possibly) higher page-load time?
Thanks,
Arun
The short answer is that it depends. :)
For example, in terms of bandwidth, it depends on how big the server-side view render payload is. If the payload is very big, then separating out the calls might reduce bandwidth, because the JSON payload might be lighter (fewer angle brackets), so the overall bandwidth could go down. Also, many apps that use Ajax are already making more than one call (for partial views etc.).
As for performance, you should test it in your application. There is an advantage to calling twice if you are composing services and one service takes longer than another. For example, if you need to lay out a product page and the service which provides 'number in the warehouse' takes longer than the one providing the product details, the application could see a perceived performance gain. In the server-side view render model, you would have to wait for all the service calls to finish before returning a rendered view.
Further, client-side view rendering will reduce the CPU burden on the server, because view rendering is distributed across each client's machine. This can make a very big difference in the number of concurrent users an application can handle.
Hope this helps.
There is appengine-mapreduce, which seems to be the official way to do things on App Engine. But there seems to be no documentation besides some hacked-together wiki pages and lengthy videos. There are statements that the library only supports the map step, but the source indicates that there are also implementations for shuffle.
A version of this appengine-mapreduce library also seems to be included in the SDK, but it is not blessed for public use. So you are basically expected to load the library twice into your runtime.
Then there is appengine-pipeline: "A primary use-case of the API is connecting together various App Engine MapReduces into a computational pipeline." But there also seems to be pipeline-related code in the appengine-mapreduce library.
So where do I start to find out how this all fits together? Which is the library to call from my project? Is there any decent documentation on appengine-mapreduce besides parsing changelogs?
"Which is the library to call from my project?"
They serve different purposes, and you've provided no details about what you're attempting to do.
The most fundamental layer here is the task queue, which lets you schedule background work that can be highly parallelized. This is fan-out. Let's say you had a list of 1000 websites, and you wanted to check the response time for each one and send an email for any site that takes more than 5 seconds to load. By running these as concurrent tasks, you can complete the work much faster than if you checked all 1000 sites in sequence.
Now let's say you don't want to send an email for every slow site; you just want to check all 1000 sites and send one summary email that says how many took more than 5 seconds and how many didn't. This is fan-in. It's trickier with the task queue, because you need to know when all the tasks have completed, and you need to collect and summarize their results.
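Stripped of App Engine specifics, the fan-out/fan-in shape looks like this in plain JavaScript; Promise.all plays the "know when all tasks are done" role that the Pipeline API provides on top of the task queue:

    // Fan-out: check all sites concurrently.
    // Fan-in: collect every result and produce one summary.
    async function summarize(sites) {
      const timings = await Promise.all(
        sites.map(async (url) => {
          const start = Date.now();
          await fetch(url);
          return { url, ms: Date.now() - start };
        })
      );
      const slow = timings.filter((t) => t.ms > 5000);
      return `${slow.length} of ${sites.length} sites took more than 5 seconds`;
    }

The hard part the Pipeline API solves is doing this durably across distributed tasks rather than inside one process.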
Enter the Pipeline API. The Pipeline API abstracts the task queue to make fan-in easier. You write what looks like synchronous, procedural code, but it uses Python futures and is executed (as much as possible) in parallel. The Pipeline API keeps track of task dependencies and collects results to facilitate building distributed workflows.
The MapReduce API wraps the Pipeline API to facilitate a specific type of distributed workflow: mapping the results of a piece of work into a set of key/value pairs, and reducing multiple sets of results to one by combining their values.
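As a toy illustration of that model (word count, in plain JavaScript, nothing App Engine specific):

    // map: emit a (word, 1) pair per occurrence.
    // shuffle: group values by key.
    // reduce: combine each group's values into a single count.
    function mapReduce(docs) {
      const pairs = docs.flatMap((doc) =>
        doc.split(/\s+/).filter(Boolean).map((word) => [word, 1])
      );
      const groups = new Map();
      for (const [key, value] of pairs) {
        groups.set(key, [...(groups.get(key) ?? []), value]);
      }
      return new Map(
        [...groups].map(([key, values]) => [key, values.reduce((a, b) => a + b, 0)])
      );
    }

The App Engine library distributes these same stages across task queue workers instead of running them in one process.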
So they provide increasing layers of abstraction and convenience around a common system of distributed task execution. The right solution depends on what you're trying to accomplish.
There is official documentation here: https://developers.google.com/appengine/docs/java/dataprocessing/