Spark version: spark-3.5.0-bin-hadoop3
GraphFrames version: graphframes-0.8.3-spark3.5-s_2.12
If I run the shortestPaths() algorithm on a symmetric graph, it fails to return a distance for a vertex that is verifiably only three edges away from the landmark.
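"Symmetric" here means that every directed edge (u, v) also appears as (v, u). A minimal pure-Python sketch of that check is below, assuming the edges have been collected into a list of `(src, dst)` tuples (`missing_reverse_edges` is a hypothetical helper, not part of GraphFrames). On the full DataFrame the same idea could be expressed as a `left_anti` join of the edges against their reversed copy.

```python
def missing_reverse_edges(edges):
    """Return the directed edges (u, v) whose reverse (v, u) is absent, sorted."""
    edge_set = set(edges)
    return sorted((src, dst) for src, dst in edge_set if (dst, src) not in edge_set)


# A symmetric pair plus one edge whose reverse is missing.
sample = [
    ("AAADAKAK", "TGEYLK"),
    ("TGEYLK", "AAADAKAK"),
    ("TGEYLK", "QPGAGGGGGSGSGGSGAKGGPESR"),
]
print(missing_reverse_edges(sample))
```

An empty result means the sample is symmetric; here the unmatched `("TGEYLK", "QPGAGGGGGSGSGGSGAKGGPESR")` edge is reported.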
Code:
```python
from pyspark.sql.functions import col
from graphframes import GraphFrame

# xiSR is the input DataFrame of peptide pairs; spark is the active SparkSession
peptide_edges = (
    xiSR.withColumn("src", col("sequence_p1"))
    .withColumn("dst", col("sequence_p2"))
    .select('src', 'dst')
    .distinct()
)
print(f"Edges: {peptide_edges.count()}")

# Do reverse edges: add (dst, src) for every (src, dst) so the graph is symmetric
peptide_edges = peptide_edges.union(
    peptide_edges.select(col('dst').alias('src'), col('src').alias('dst'))
).distinct()
print(f"Double-Edges: {peptide_edges.count()}")

peptide_vertices = peptide_edges.select(col("src").alias("id")).distinct()

# Materialize
peptide_edges.write.mode('overwrite').parquet('./spark-checkpoints/cache/peptide_edges.parquet')
peptide_vertices.write.mode('overwrite').parquet('./spark-checkpoints/cache/peptide_vertices.parquet')
peptide_edges = spark.read.parquet('./spark-checkpoints/cache/peptide_edges.parquet')
peptide_vertices = spark.read.parquet('./spark-checkpoints/cache/peptide_vertices.parquet')

# Create graph
peptideG = GraphFrame(peptide_vertices, peptide_edges)

# Get shortest path
peptideG.shortestPaths(["oxMoxMTNR"]).where(col('id') == 'AAADAKAK').show()

# Display existing path
peptide_edges.where((col('src') == 'AAADAKAK') & (col('dst') == 'TGEYLK')).show()
peptide_edges.where((col('src') == 'TGEYLK') & (col('dst') == 'QPGAGGGGGSGSGGSGAKGGPESR')).show()
peptide_edges.where((col('src') == 'QPGAGGGGGSGSGGSGAKGGPESR') & (col('dst') == 'oxMoxMTNR')).show()
```
Output:
```
Edges: 3847839
Double-Edges: 7669377
+--------+---------+
|      id|distances|
+--------+---------+
|AAADAKAK|       {}|
+--------+---------+

+--------+------+
|     src|   dst|
+--------+------+
|AAADAKAK|TGEYLK|
+--------+------+

+------+--------------------+
|   src|                 dst|
+------+--------------------+
|TGEYLK|QPGAGGGGGSGSGGSGA...|
+------+--------------------+

+--------------------+---------+
|                 src|      dst|
+--------------------+---------+
|QPGAGGGGGSGSGGSGA...|oxMoxMTNR|
+--------------------+---------+
```
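As an independent check of what `shortestPaths()` should report, here is a minimal pure-Python sketch that computes hop counts to a single landmark with breadth-first search, i.e. the number of edges on a shortest path in an unweighted graph. It uses only the three edges displayed above (full vertex names taken from the code); `hop_distances` is a hypothetical helper, and it treats edges as undirected, which is valid here only because the graph was symmetrized.

```python
from collections import defaultdict, deque


def hop_distances(edges, landmark):
    """Return {vertex: edges on a shortest path to `landmark`} for reachable vertices.

    Edges are treated as undirected, matching a graph in which every
    (u, v) is also present as (v, u).
    """
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    # Standard BFS outward from the landmark; the first visit is the shortest.
    distances = {landmark: 0}
    queue = deque([landmark])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency[vertex]:
            if neighbour not in distances:
                distances[neighbour] = distances[vertex] + 1
                queue.append(neighbour)
    return distances


path_edges = [
    ("AAADAKAK", "TGEYLK"),
    ("TGEYLK", "QPGAGGGGGSGSGGSGAKGGPESR"),
    ("QPGAGGGGGSGSGGSGAKGGPESR", "oxMoxMTNR"),
]

print(hop_distances(path_edges, "oxMoxMTNR")["AAADAKAK"])  # 3
```

On these three edges alone the expected entry for `AAADAKAK` is 3, whereas the GraphFrames result above is an empty map. Adding more edges can only keep that distance the same or shorten it, never remove it.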