H-Index

1. Algorithm Introduction

H-Index computes the h-index of every vertex in an undirected graph. Here the h-index of a vertex is the largest integer h such that the vertex has at least h neighbors, each of which has a degree (number of neighbors) of at least h. The h-index is commonly used as a measure of a vertex's importance.
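To make the definition concrete, here is a minimal single-machine sketch. It is not the Angel implementation. It reads a tab-separated edge list on standard input, treats every edge as undirected, and prints each vertex with its h-index. The function name `hindex_local` is hypothetical, and skipping self-loops and duplicate edges is an assumption of this sketch. The loop is quadratic in vertex degree, so it is only suitable for checking small graphs by hand.

```shell
#!/usr/bin/env bash
# hindex_local (hypothetical helper): single-machine sketch of the h-index
# definition above. Reads "srcId<TAB>dstId" lines on stdin and prints
# "nodeId<TAB>hindex" sorted by numeric nodeId. Self-loops and duplicate
# edges are skipped (an assumption of this sketch).
hindex_local() {
  awk -F '\t' '
    NF >= 2 && $1 != $2 {
      key = $1 SUBSEP $2
      if (key in seen) next            # edge already recorded
      seen[$1, $2] = 1; seen[$2, $1] = 1
      deg[$1]++; deg[$2]++             # degree = number of distinct neighbors
      nbrs[$1] = nbrs[$1] " " $2       # space-separated neighbor lists
      nbrs[$2] = nbrs[$2] " " $1
    }
    END {
      for (v in deg) {
        n = split(nbrs[v], u, " ")
        # h = largest k such that at least k neighbors have degree >= k
        h = 0
        for (k = n; k >= 1; k--) {
          c = 0
          for (i = 1; i <= n; i++) if (deg[u[i]] >= k) c++
          if (c >= k) { h = k; break }
        }
        print v "\t" h
      }
    }' | sort -k1,1n
}
```

For example, `printf '1\t2\n2\t3\n1\t3\n3\t4\n' | hindex_local` gives h-index 2 for vertices 1, 2 and 3 and h-index 1 for vertex 4. Vertex 3 has degree 3, but none of its neighbors has degree 3, and only two of them (1 and 2) have degree at least 2, so its h-index is 2 rather than 3.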

2. Parameters

IO Params

input: HDFS path of an undirected, unweighted graph; each row is one edge in the form srcId | dstId, where the two IDs are separated by sep
output: HDFS path for the result; each row holds a vertex and its h-index, g-index and w-index values in the form nodeId | hindex | gindex | windex
sep: the separator between srcId and dstId in the input file; one of tab, space or comma
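Before uploading an edge file, a quick local check can catch rows that do not match the srcId/dstId format described above. The helper below is hypothetical and not part of Spark on Angel. Mapping the sep names tab, space and comma to a single tab, a single space and a comma is an assumption based on the parameter description.

```shell
#!/usr/bin/env bash
# check_edges SEP FILE (hypothetical helper): prints every line of FILE that
# does not consist of exactly two non-empty fields separated by SEP, where
# SEP is one of tab, space, comma (assumed mapping to a literal character).
# Returns 0 if all lines are valid, 1 if any are not, 2 for an unknown SEP.
check_edges() {
  local fs
  case "$1" in
    tab)   fs='\t' ;;    # awk turns \t into a tab character
    space) fs='[ ]' ;;   # exactly one space, not any run of whitespace
    comma) fs=',' ;;
    *)     echo "unknown sep: $1" >&2; return 2 ;;
  esac
  awk -F "$fs" '
    BEGIN { bad = 0 }
    NF != 2 || $1 == "" || $2 == "" { print "line " NR ": " $0; bad = 1 }
    END { exit bad }' "$2"
}
```

For instance, `check_edges tab edges.txt` run on a file whose second line is just `3` prints `line 2: 3` and returns 1. Note that blank lines are also reported, since they do not contain two fields.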

Algo Params

partitionNum: number of RDD partitions
psPartitionNum: number of data partitions on the parameter servers (PS)
useBalancePartition: whether to use the balanced partitioning strategy, true / false; true is recommended when the vertex distribution of the graph is unbalanced
storageLevel: RDD persistence level, one of DISK_ONLY / MEMORY_ONLY / MEMORY_AND_DISK

3. Running

input=hdfs://my-hdfs/data
output=hdfs://my-hdfs/output

source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
  --master yarn-cluster \
  --conf spark.ps.instances=1 \
  --conf spark.ps.cores=1 \
  --conf spark.ps.jars=$SONA_ANGEL_JARS \
  --conf spark.ps.memory=10g \
  --name "hindex angel" \
  --jars $SONA_SPARK_JARS  \
  --driver-memory 5g \
  --num-executors 1 \
  --executor-cores 4 \
  --executor-memory 10g \
  --class org.apache.spark.angel.examples.graph.HIndexExample \
  ../lib/spark-on-angel-examples-3.2.0.jar \
  input:$input output:$output sep:tab storageLevel:MEMORY_ONLY useBalancePartition:true \
  partitionNum:4 psPartitionNum:1
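An unset jar variable or a wrong input path may only surface after the job has been submitted to YARN, so a small pre-flight check can save a round trip. The sketch below is hypothetical and not part of Spark on Angel. The variable names are taken from the command above, `SONA_ANGEL_JARS` and `SONA_SPARK_JARS` are assumed to be set by `spark-on-angel-env.sh`, and `hdfs dfs -test -e` is the standard HDFS shell test for path existence.

```shell
#!/usr/bin/env bash
# preflight (hypothetical helper): run after sourcing spark-on-angel-env.sh
# and setting input/output, before spark-submit. Reports every variable the
# command above expands that is unset or empty and, when an hdfs command is
# available, checks that the input path exists. Returns non-zero on any problem.
preflight() {
  local var status=0
  for var in SPARK_HOME SONA_ANGEL_JARS SONA_SPARK_JARS input output; do
    if [ -z "${!var:-}" ]; then          # bash indirect expansion
      echo "missing or empty: $var" >&2
      status=1
    fi
  done
  if command -v hdfs >/dev/null 2>&1 && [ -n "${input:-}" ]; then
    if ! hdfs dfs -test -e "$input"; then
      echo "input path not found: $input" >&2
      status=1
    fi
  fi
  return "$status"
}
```

A typical use is `source ./spark-on-angel-env.sh && preflight && $SPARK_HOME/bin/spark-submit ...`, so that spark-submit only runs when every check passes.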
