Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Degradation on Simple Query with ArcadeDB Embedded Version 24.5.1 #1730

Open
ismaeladra opened this issue Sep 24, 2024 · 10 comments
Assignees
Labels
performance question Further information is requested

Comments

@ismaeladra
Copy link

ArcadeDB Version: 24.5.1 (Embedded)
Operating System: Windows
JDK Version: 11
Nodes: 8357
Edges: 13557

Problem Description:
I am experiencing significant performance issues when running a simple query on ArcadeDB embedded version 24.5.1. The query in question is as follows:
MATCH (n:V_6)-[e:E_36]-(p:V_8) WHERE n.id = 647 RETURN n, p LIMIT 1
This query takes approximately 4 seconds to retrieve a result, which seems excessive given the dataset size (8357 nodes and 13557 edges). I would expect much faster execution times for a query of this simplicity.
I have searched through the documentation but could not find any clear information regarding performance tuning or optimization guidelines for embedded use cases. This issue occurs consistently with the same query structure and dataset size.

Any suggestions for improving query performance, configuration changes, or indexing strategies would be greatly appreciated.

@lvca lvca self-assigned this Sep 24, 2024
@lvca lvca added question Further information is requested performance labels Sep 24, 2024
@lvca
Copy link
Contributor

lvca commented Sep 24, 2024

Do you have an index on V_6.id property?

@ismaeladra
Copy link
Author

Thank you for the response.

To clarify, V_6 is a subtype of a parent type called OBJECT_INSTANCE, and the index is applied on the id field of the parent type (OBJECT_INSTANCE), not directly on V_6. The database does not allow us to create an index directly on V_6.id if an index already exists on the parent type (OBJECT_INSTANCE).

Would this structure affect the query performance? Should the index be applied differently, or is there another recommended way to optimize in this scenario?
index

@lvca
Copy link
Contributor

lvca commented Sep 25, 2024

If you profile the query, do you see it's using that index?

Enable profiling from here:

image

@ismaeladra
Copy link
Author

this is the profiling result
image

thank you very much

@lvca
Copy link
Contributor

lvca commented Sep 29, 2024

The index is used, but 02% of the time is spent on edge traversing. How many edges do you have between vertices?

Is E_36 an outgoing vertex from V_6 or are you interested in both incoming and outgoing?

Also, can you run this SQL query?

select expand( both("E_36") ) from V_6 where id = 647

@ismaeladra
Copy link
Author

you can check in this photo time for execution is 2.3 seconds for a simple query
image

and this the result for the above query
image
and this is the explain
`+ FETCH FROM INDEX OBJECT_INSTANCE[id] (5,278μs)
id = 803

  • EXTRACT VALUE FROM INDEX ENTRY (480μs)
    filtering buckets [228,225,222,219,216,213,210,207,204,201]
  • FILTER ITEMS BY TYPE (2,381μs)
    V_6
  • CALCULATE PROJECTIONS (17,051μs)
    both("E_36")
  • EXPAND (20μs)
  • LIMIT ( LIMIT 25)`

@lvca
Copy link
Contributor

lvca commented Sep 30, 2024

The math on the explanation doesn't add up for the 1,174ms of the query. Is there any way we can look into the database? Is it possible that you previously created and deleted many edges from that vertex?

@ismaeladra
Copy link
Author

This is exactly the issue we're trying to track down but haven't been able to identify the root cause yet.

Regarding the created and deleted edges, our system is built on a Spring Boot application that uses an embedded ArcadeDB. Whenever users create or delete relations in our system, these changes are saved directly to ArcadeDB. However, we haven't noticed any specific patterns or issues related to previously deleted edges causing problems.

@lvca
Copy link
Contributor

lvca commented Sep 30, 2024

If you can't send me your database zipped to support at arcadedata.com, is there a way we can access to studio remotely and run some queries?

@ismaeladra
Copy link
Author

I've just sent an email to your support team. As this is a customer environment, we can't provide direct access to the database, but we're happy to arrange a session to investigate further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants