# Defer internal subgraph requests on non-required fields #2653
I agree that the underlying need is desirable, and this is something that has been mentioned a few times, though in a slightly different form. Why would it ever make sense to "defer" the …?

So fundamentally, I think this is about allowing the query planner to know about the cost of various fields. And it is true that the query planner currently has to make assumptions when it tries to find "the best" plan, and one of them is that all fields cost the same thing and that doing a fetch is overall a lot more costly than resolving a field (with the result that the planner optimises first and foremost for the number of fetches). But that's obviously not true, and if the planner had access to some cost information, it could do a better job. Here, in a way, …

Anyway, all this to say that I'd rather introduce this as a … Given the query:

```graphql
query getAllProductsNoReview {
  products {
    id
    inStock
  }
}
```

then it makes no sense to "defer" …
---

Well put! That is exactly the reason for wanting to defer, so I agree a better approach is to mark the costly fields and still let the query planner find the best/most-efficient path.
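For the sake of discussion, marking costly fields might look something like this hypothetical `@cost` directive (not an existing federation directive; the field names and weights are illustrative only):

```graphql
# Hypothetical cost annotation the query planner could read.
directive @cost(weight: Int!) on FIELD_DEFINITION

type Product @key(fields: "id") {
  id: ID! # key fields stay cheap
  inStock: Boolean! @cost(weight: 3) # slow inventory lookup
  reviews: [Review!]! @cost(weight: 5) # expensive cross-service aggregation
}

type Review {
  rating: Int!
}
```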
---

@pcmanus to provide more context on the motivation / my discussion with Shane - this came up for us at Yelp, which I've distilled down here: https://gist.github.com/magicmark/cbda3eedf1255334caee357fde7680de

It sounds like … Having a strong guarantee for subgraph authors that "this big scary chunk of work will be parallelized" would be awesome - we tend to think in big blocks of network waterfalls, and try to make sure everything is squished together as much as possible.
---

I don’t think exposing the costs of fields to the planner is the right approach here. In general, this speaks toward what I think is the biggest issue with Federation as I have experienced it.

To begin with, let me set up an example based off of @smyrick's. Assume I have a monolithic GraphQL server with the following schema, with a response time annotated after each field:

```graphql
type Query {
  product(id: ID!): Product # 1s
}

type Product {
  id: ID! # 0s (key fields are usually synchronous)
  manufacturer: Company! # 1s
  countryOfOrigin: Country! # 2s
  inStock: Boolean! # 3s
}

type Company {
  id: ID! # 0s
  name: String! # 2s
  owner: Person! # 1s
}

type Person {
  id: ID! # 0s
  name: String! # 1s
}

type Country {
  id: ID! # 0s
  name: String! # 2s
}
```

And now I execute this query against that monolithic server:

```graphql
query GetProductDetails($id: ID!) {
  product(id: $id) {
    inStock
    manufacturer {
      name
      owner {
        name
      }
    }
    countryOfOrigin {
      name
    }
  }
}
```

The performance of this query is simply given by taking the maximum time it takes to resolve any given leaf field:

- `product.inStock`: 1s + 3s = 4s
- `product.manufacturer.name`: 1s + 1s + 2s = 4s
- `product.manufacturer.owner.name`: 1s + 1s + 1s + 1s = 4s
- `product.countryOfOrigin.name`: 1s + 2s + 2s = 5s

So, in this case, the performance of the entire query is 5s. This makes for a very clear visual example as a Gantt chart.

Now, split the same schema across four subgraphs:

```graphql
# Product Graph
type Query {
  product(id: ID!): Product # 1s
}

type Product {
  id: ID! # 0s (key fields are usually synchronous)
  manufacturer: Company! # 1s
  countryOfOrigin: Country! # 2s
  inStock: Boolean! # 3s
}

type Company @key(fields: "id") {
  id: ID! # 0s
}

type Country @key(fields: "id") {
  id: ID! # 0s
}

# Company Graph
type Company @key(fields: "id") {
  id: ID! # 0s
  name: String! # 2s
  owner: Person! # 1s
}

type Person @key(fields: "id") {
  id: ID! # 0s
}

# Person Graph
type Person @key(fields: "id") {
  id: ID! # 0s
  name: String! # 1s
}

# Country Graph
type Country @key(fields: "id") {
  id: ID! # 0s
  name: String! # 2s
}
```

We will assume there is no need for …

Now, if the same client is to execute the same operation, computing the total runtime is a lot harder. First you need to consider the individual subgraph queries:

#### Product Graph

```graphql
query GetProductDetails($id: ID!) {
  product(id: $id) {
    inStock
    manufacturer {
      id
    }
    countryOfOrigin {
      id
    }
  }
}
```

Where the performance is given by:

- `product.inStock`: 1s + 3s = 4s
- `product.manufacturer.id`: 1s + 1s = 2s
- `product.countryOfOrigin.id`: 1s + 2s = 3s
For a total time in the subgraph of 4s.

#### Company Graph

```graphql
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Company {
      name
      owner {
        id
      }
    }
  }
}
```

Where the performance is given by:

- `name`: 2s
- `owner.id`: 1s
For a total of 2s.

#### Person Graph

```graphql
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Person {
      name
    }
  }
}
```

Where the performance is given by:

- `name`: 1s
For a total of 1s.

#### Country Graph

```graphql
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Country {
      name
    }
  }
}
```

Where the performance is given by:

- `name`: 2s
For a total of 2s.

#### Overall Performance

The Country graph path and the Company → Person graph path race each other (ignoring router and network overhead):

- Product → Country: 4s + 2s = 6s
- Product → Company → Person: 4s + 2s + 1s = 7s
Our performance has got significantly worse (5s → 7s). Notice we have added additional synchronisation points all along the graph that are not necessary. In fact, this leaks internal details about the subgraph structure of the services, even though we should be looking at an opaque single GraphQL API. From the perspective of the GraphQL API’s owner, it becomes hard to work out how to optimize this query too - the critical path is now given by the critical path of the subgraphs, and within each subgraph its own critical path. In my experience, the vast majority of queries against federated graphs have a different critical path than they would otherwise, even if the overhead doesn’t look that bad at first glance. Tools like distributed tracing can give you a real view into the graph performance that looks very similar to the Gantt chart above. To such a developer this actually suggests that putting more fields into the Product graph would be beneficial, because that would maximise parallelisation, but I think this is unwise.

#### Addressing the cost suggestion

If we mark up all the fields in the schema with a cost equivalent to their execution durations, then the planner has sufficient knowledge to split the queries to optimise the operation. In particular, it would probably make multiple requests to the Product graph:

```graphql
# query1
query GetProductDetails1($id: ID!) {
  product(id: $id) {
    inStock
  }
}

# query2
query GetProductDetails2($id: ID!) {
  product(id: $id) {
    manufacturer {
      id
    }
  }
}

# query3
query GetProductDetails3($id: ID!) {
  product(id: $id) {
    countryOfOrigin {
      id
    }
  }
}
```

The planner may decide that query 2 & 3 could be combined, but given the different costs I suspect they won’t be. I do not think this is a good idea, because it results in the common path of the fields executing multiple times (the shared 1s `product(id: $id)` lookup runs once per query). There is also the problem of what happens if the result for the common path is different in one of the query results than the others (say query3’s …).

If the server is using cost (either the same or different metrics) to estimate the expense of a query for purposes of rate limiting or execution-size limiting, then it becomes non-trivial for the server to calculate the execution cost. It is even harder for the client to reason about it, because they should not know about the internals of the implementation. Using …
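Coming back to the combination point above: if the planner did merge query 2 and query 3, the shared path would execute only once. A sketch of that merged fetch:

```graphql
# Hypothetical merged fetch: one product lookup serves both id selections.
query GetProductDetails2and3($id: ID!) {
  product(id: $id) {
    manufacturer {
      id
    }
    countryOfOrigin {
      id
    }
  }
}
```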
---

@meiamsome Thank you for the detailed explanation! That is exactly correct on the use case and the problems you described; this will be really helpful for anyone else who wants to catch up.

On the solutioning side: what was proposed by @pcmanus was not about how the Router requests slow fields, but when. My initial comment proposed the idea of this being controlled by subgraph developers, with them explicitly marking certain fields with a new directive like …

Instead, what if we considered a configurable option that could take cost estimates into account and defer subgraph requests when they went over a certain cost threshold: if subgraph request cost > 100, use `@defer` on the most expensive non-key fields. In your case, doing `@defer` on every single field could roughly be implemented by setting that cost threshold to 1. Maybe what we would additionally need, though, is not just a threshold for when to use defer at all, but also a max cost limit for any single subgraph request. So rather than splitting requests until every request was under 100, a max single-request cost of 1 would then basically defer everything.
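To make that concrete, the Router's internal subgraph fetch might be rewritten along these lines (a sketch only; the threshold mechanics and fragment placement are assumptions, reusing @meiamsome's example):

```graphql
# If the estimated cost of the Product fetch exceeds the threshold,
# keep the cheap key lookups up front and defer the expensive field.
query GetProductDetails($id: ID!) {
  product(id: $id) {
    manufacturer {
      id
    }
    countryOfOrigin {
      id
    }
    ... @defer {
      inStock # cost 3: fetched in a second, internal pass
    }
  }
}
```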
---

Related: #3141
---

Let's say we have this schema across two subgraphs, **Products** and **Reviews**.
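A minimal pair of subgraphs along these lines (the exact field names here are assumptions for illustration):

```graphql
# Subgraph Products
type Query {
  products: [Product!]!
}

type Product @key(fields: "id") {
  id: ID!
  inStock: Boolean!
}

# Subgraph Reviews
type Product @key(fields: "id") {
  id: ID!
  reviews: [Review!]! # slow: aggregates review data
}

type Review {
  rating: Int!
  text: String
}
```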
I can write a query using client-side `@defer` and this all works as expected. The query planner is smart enough to split the products query into two separate queries and make an optimized call to the reviews subgraph, because it only needs `Product.id` to connect the two.

However, we have a user requirement that we don't defer the loading of the UI state into chunks: we want to return everything in one response. Also, using this optimization requires clients to know to use `@defer`. Instead, if there were some schema directive we could use to indicate to the query planner that it should not wait for the entire response and should do the `@defer` optimization, but still only return one response, we could better control this logic server side and give everyone the optimization even if they don't use client-side `@defer`.

Maybe something like `@subgraphDefer` or `@entityDefer`.
**Query we want to make:**
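For example, assuming the Products/Reviews fields sketched above:

```graphql
# One request, one response - no client-side @defer required.
query GetProductsWithReviews {
  products {
    id
    inStock
    reviews {
      rating
      text
    }
  }
}
```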
And in the schema we would need something like this:
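A sketch with the hypothetical `@subgraphDefer` directive (name and placement assumed):

```graphql
# Subgraph Reviews
type Product @key(fields: "id") {
  id: ID!
  # Hypothetical: apply the same split the planner performs for
  # client @defer (fetch reviews with just Product.id), but buffer
  # the chunks and still return a single response.
  reviews: [Review!]! @subgraphDefer
}
```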
**Keyword search:** internal defer, entity defer, Router defer, schema defer