From 7102174edabc2ec2a89942cf4d5070b34a60479f Mon Sep 17 00:00:00 2001
From: onebox-li
Date: Tue, 3 Dec 2024 16:57:53 +0800
Subject: [PATCH] [CELEBORN-1759] Fix reserve slots might lost partition location between 0.4 client and 0.5 server

### What changes were proposed in this pull request?
Fix how the worker parses `ReserveSlots` so that it stays compatible with 0.4 clients.

### Why are the changes needed?
When upgrading to 0.5, a 0.4 client reserves slots on a 0.5 worker. If the request carries only replica locations, the worker parses it incorrectly. The reservation silently fails, but the worker still returns success to the client. The worker log shows "Reserved 0 primary location and 0 replica location".

The worker chooses between the packed `partitionLocationsPair` field and the per-role location lists by checking only whether the primary list is empty. A 0.4 request with an empty primary list but a non-empty replica list is therefore decoded from the packed field, and its replica locations are dropped. With this patch, the worker uses the packed field only when both the primary and the replica lists are empty.

### Does this PR introduce _any_ user-facing change?
It fixes a potential slot-reservation failure when upgrading from 0.4 to 0.5, in the case where a request contains only replica locations.

### How was this patch tested?
Manual test.

Closes #2968 from onebox-li/fix-reserve-compatibility.

Authored-by: onebox-li
Signed-off-by: mingji
---
 .../celeborn/common/protocol/message/ControlMessages.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/src/main/scala/org/apache/celeborn/common/protocol/message/ControlMessages.scala b/common/src/main/scala/org/apache/celeborn/common/protocol/message/ControlMessages.scala
index 7b2c3b13280..bd1a6a11e84 100644
--- a/common/src/main/scala/org/apache/celeborn/common/protocol/message/ControlMessages.scala
+++ b/common/src/main/scala/org/apache/celeborn/common/protocol/message/ControlMessages.scala
@@ -1271,7 +1271,7 @@ object ControlMessages extends Logging {
         val pbReserveSlots = PbReserveSlots.parseFrom(message.getPayload)
         val userIdentifier = PbSerDeUtils.fromPbUserIdentifier(pbReserveSlots.getUserIdentifier)
         val (primaryLocations, replicateLocations) =
-          if (pbReserveSlots.getPrimaryLocationsList.isEmpty) {
+          if (pbReserveSlots.getPrimaryLocationsList.isEmpty && pbReserveSlots.getReplicaLocationsList.isEmpty) {
             PbSerDeUtils.fromPbPackedPartitionLocationsPair(
               pbReserveSlots.getPartitionLocationsPair)
           } else {
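The sketch below contrasts the parsing decision before and after this patch. It is a minimal illustration, not Celeborn code. `ReserveSlotsRequest`, `ReserveSlotsParsing` and their fields are hypothetical stand-ins for `PbReserveSlots`, and locations are plain strings. The packed pair is modeled as already-decoded primary and replica sequences. The `else` branch, which the hunk does not show, is assumed to read the per-role lists.

```scala
// Hypothetical, simplified stand-in for PbReserveSlots: only the fields that
// matter for this fix are modeled, and locations are plain strings.
final case class ReserveSlotsRequest(
    primaryLocations: Seq[String], // per-role list (the encoding a 0.4 client fills in)
    replicaLocations: Seq[String], // per-role list (the encoding a 0.4 client fills in)
    packedPrimary: Seq[String],    // assumed decoded view of the packed pair (0.5 encoding)
    packedReplica: Seq[String])    // assumed decoded view of the packed pair (0.5 encoding)

object ReserveSlotsParsing {
  // (primary locations, replica locations) as seen by the worker.
  type Slots = (Seq[String], Seq[String])

  // Before the fix: an empty primary list alone selects the packed encoding,
  // so a replica-only request from a 0.4 client is read from the empty packed pair.
  def parseBeforeFix(req: ReserveSlotsRequest): Slots =
    if (req.primaryLocations.isEmpty) (req.packedPrimary, req.packedReplica)
    else (req.primaryLocations, req.replicaLocations)

  // After the fix: the packed encoding is used only when both per-role lists are empty.
  def parseAfterFix(req: ReserveSlotsRequest): Slots =
    if (req.primaryLocations.isEmpty && req.replicaLocations.isEmpty)
      (req.packedPrimary, req.packedReplica)
    else (req.primaryLocations, req.replicaLocations)
}

object ReserveSlotsParsingDemo {
  import ReserveSlotsParsing._

  def main(args: Array[String]): Unit = {
    // A 0.4-style request that carries only a replica location.
    val replicaOnly = ReserveSlotsRequest(Nil, Seq("replica-0"), Nil, Nil)
    // A 0.5-style request that carries everything in the packed pair.
    val packed = ReserveSlotsRequest(Nil, Nil, Seq("primary-0"), Seq("replica-0"))

    println(s"before fix, replica-only: ${parseBeforeFix(replicaOnly)}") // replica location lost
    println(s"after fix, replica-only:  ${parseAfterFix(replicaOnly)}")  // replica location kept
    println(s"after fix, packed:        ${parseAfterFix(packed)}")
  }
}
```

Only one combination is parsed differently by the two versions: an empty primary list together with a non-empty replica list. Before the fix, that request yields two empty sequences, which matches the "Reserved 0 primary location and 0 replica location" log. Requests that fill the primary list, or that use only the packed pair, take the same branch before and after the fix.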