Create test for project reset #440

Closed
hahn-kev opened this issue Nov 30, 2023 · 10 comments · Fixed by #550

@hahn-kev
Collaborator

Since this is rarely used, we might break it without realizing it, so we should have a test for it.

hahn-kev added this to the v1 milestone Nov 30, 2023
rmunn self-assigned this Feb 5, 2024
@rmunn
Contributor

rmunn commented Feb 5, 2024

This should also let us verify that #515 is fixed.

@hahn-kev
Collaborator Author

hahn-kev commented Feb 7, 2024

Bumping up the priority of this, as we had an issue where it was broken and an admin couldn't recover from a user issue because of it. While this is not frequently used, it is an escape hatch, so we want to be sure it's working properly.

@rmunn
Contributor

rmunn commented Feb 7, 2024

@hahn-kev - I currently have one working test in #550 (which I'd welcome feedback on if you have any), but I'm still figuring out how I'm going to integrate the Playwright Node tests with a dotnet Send/Receive test in order to test both sides of the equation (i.e., do a Send/Receive after a project reset, all as part of the test). I think I'll end up writing a small command-line tool in C# that can trigger a Send/Receive of the Sena-3 project, and call that command-line tool from Node. Then the Playwright tests are, conceptually, the E2E integration tests, and the C# code acts as a service.

@rmunn
Contributor

rmunn commented Feb 7, 2024

Rethinking that approach. It might be better to create a new .NET test in SendReceiveServiceTests, which drives the API via ApiTestBase (similar to how the SendNewProject test works). The main thing we're concerned about here is that the reset project ends up in a good state (file permissions, etc.) after resetting, and driving it via the UI in a Playwright test adds no benefit over calling the API directly.

So I'll leave #550 alone for now and start a new PR to create a .NET test or two in SendReceiveServiceTests.

@hahn-kev
Collaborator Author

hahn-kev commented Feb 7, 2024

Yeah, I think that would be a better approach than creating a tool. It might be difficult to drive the test because the upload uses tus, but that could be covered in another test.

I'm thinking we have a Playwright test that makes sure we can do a project reset, and maybe queries /hg/project-code/browse?style=json-lex to confirm that a file from the uploaded zip made it where we expect.

Then an S&R-based reset test that ensures we can S&R to projects after a reset. This doesn't need to go through the UI; we could even make a dedicated endpoint for this test to simplify the reset process (as long as it's consistent with the real reset process).
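
A minimal sketch of the browse check suggested above, written in TypeScript because the Playwright tests run on Node. The URL pattern is the one from the comment. The response shape (a `files` array whose entries carry a `basename`) is an assumption about what the json-lex style returns and would need checking against real output. The names `browseUrl` and `manifestHasFile` are hypothetical.

```typescript
// Assumed (unverified) shape of the JSON returned by the browse endpoint.
interface BrowseEntry { basename: string; }
interface BrowseManifest { files?: BrowseEntry[]; }

// Build the browse URL for a project code, following the pattern quoted above.
function browseUrl(baseUrl: string, projectCode: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return `${trimmed}/hg/${encodeURIComponent(projectCode)}/browse?style=json-lex`;
}

// True if the manifest lists a top-level file with the given name.
function manifestHasFile(manifest: BrowseManifest, fileName: string): boolean {
  return (manifest.files ?? []).some((f) => f.basename === fileName);
}
```

A test would fetch `browseUrl(...)`, parse the JSON body, and assert on `manifestHasFile(...)` before and after the reset.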

@rmunn
Contributor

rmunn commented Feb 7, 2024

A quick record of my findings so far:

If the /var/vcs/public/sena-3 repo is empty in hgresumable when I try to do a Send/Receive to it, it goes into a nasty looping failure mode where NGINX returns a 499 (client closed connection early), then Chorus Resumable keeps retrying the connection because it doesn't know how to handle a 499. (Since the LexSyncReverseProxy is involved, the "client" that closed the connection might be the other end, i.e. hgweb; I haven't managed to track this down yet.) You then end up with hgresumable consuming all available CPU until I forcefully kill it.

Logging into hgresumable and doing ps auxw to see what's going on reveals a bunch of processes, one per worker thread, trying to run sh -c hg tip --template "{rev}:{branches}\n" or sh -c hg log --template "{node|short}:{branches}\n" and each using 100% CPU. That looks like something that the resumable PHP code is trying to run, but I haven't yet looked into the PHP code to see what code path tries to run those commands, so it might be something else.

I don't know yet why NGINX is seeing a closed connection on one end, while hgresumable is seeing hg processes (which should complete very quickly) stuck at 100% CPU. I'll continue investigating, but by the nature of the problem this is going to take a little time as every time I trigger this behavior my computer slows down quite a lot.

Here's a copy of the lexbox-api logs for one such cycle:
[lexbox-api] info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       Request starting HTTP/1.1 GET http://resumable.localhost/api/v03/pullBundleChunk?offset=0&chunkSize=5000&transId=f375d994-03db-4661-abee-b8c1081101bb&baseHashes[]=fb853b6ed66f&baseHashes[]=1af18777e23e&repoId=sena-3 - - -
[lexbox-api] info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[1]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       Request:
[lexbox-api]       Protocol: HTTP/1.1
[lexbox-api]       Method: GET
[lexbox-api]       Scheme: http
[lexbox-api]       PathBase: 
[lexbox-api]       Path: /api/v03/pullBundleChunk
[lexbox-api]       Host: resumable.localhost
[lexbox-api]       User-Agent: HgResume v03
[lexbox-api]       X-Request-ID: [Redacted]
[lexbox-api]       X-Real-IP: [Redacted]
[lexbox-api]       X-Original-Proto: [Redacted]
[lexbox-api]       X-Forwarded-Host: resumable.localhost
[lexbox-api]       X-Forwarded-Port: [Redacted]
[lexbox-api]       X-Forwarded-Scheme: [Redacted]
[lexbox-api]       X-Scheme: [Redacted]
[lexbox-api]       X-Original-For: [Redacted]
[lexbox-api] info: LexSyncReverseProxy.Auth.BasicAuthHandler[7]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       HgAuthScheme was not authenticated. Failure message: No authorization header
[lexbox-api] info: Microsoft.AspNetCore.Authorization.DefaultAuthorizationService[2]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       Authorization failed. These requirements were not met:
[lexbox-api]       DenyAnonymousAuthorizationRequirement: Requires an authenticated user.
[lexbox-api]       LexSyncReverseProxy.Auth.UserHasAccessToProjectRequirement
[lexbox-api] info: LexSyncReverseProxy.Auth.BasicAuthHandler[12]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       AuthenticationScheme: HgAuthScheme was challenged.
[lexbox-api] info: Microsoft.AspNetCore.Authentication.JwtBearer.JwtBearerHandler[12]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       AuthenticationScheme: Bearer was challenged.
[lexbox-api] info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[2]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       Response:
[lexbox-api]       StatusCode: 401
[lexbox-api]       WWW-Authenticate: Basic realm="SyncProxy",Bearer
[lexbox-api]       lexbox-version: dockerDev
[lexbox-api] info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
[lexbox-api]       => TraceId:abb237ebcb66279da54f3fdbed543e9f => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000001
[lexbox-api]       Request finished HTTP/1.1 GET http://resumable.localhost/api/v03/pullBundleChunk?offset=0&chunkSize=5000&transId=f375d994-03db-4661-abee-b8c1081101bb&baseHashes[]=fb853b6ed66f&baseHashes[]=1af18777e23e&repoId=sena-3 - 401 - application/problem+json 1.1041ms
[lexbox-api] info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Request starting HTTP/1.1 GET http://resumable.localhost/api/v03/pullBundleChunk?offset=0&chunkSize=5000&transId=f375d994-03db-4661-abee-b8c1081101bb&baseHashes[]=fb853b6ed66f&baseHashes[]=1af18777e23e&repoId=sena-3 - - -
[lexbox-api] info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[1]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Request:
[lexbox-api]       Protocol: HTTP/1.1
[lexbox-api]       Method: GET
[lexbox-api]       Scheme: http
[lexbox-api]       PathBase: 
[lexbox-api]       Path: /api/v03/pullBundleChunk
[lexbox-api]       Host: resumable.localhost
[lexbox-api]       User-Agent: HgResume v03
[lexbox-api]       Authorization: [Redacted]
[lexbox-api]       X-Request-ID: [Redacted]
[lexbox-api]       X-Real-IP: [Redacted]
[lexbox-api]       X-Original-Proto: [Redacted]
[lexbox-api]       X-Forwarded-Host: resumable.localhost
[lexbox-api]       X-Forwarded-Port: [Redacted]
[lexbox-api]       X-Forwarded-Scheme: [Redacted]
[lexbox-api]       X-Scheme: [Redacted]
[lexbox-api]       X-Original-For: [Redacted]
[lexbox-api] info: Microsoft.EntityFrameworkCore.Database.Command[20101]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Executed DbCommand (15ms) [Parameters=[@__email_0='?'], CommandType='Text', CommandTimeout='30']
[lexbox-api]       SELECT t."Id", t."CanCreateProjects", t."CreatedDate", t."Email", t."EmailVerified", t."IsAdmin", t."LastActive", t."LocalizationCode", t."Locked", t."Name", t."PasswordHash", t."Salt", t."UpdatedDate", t."Username", t0."Id", t0."CreatedDate", t0."ProjectId", t0."Role", t0."UpdatedDate", t0."UserId", t0."Id0", t0."Code", t0."CreatedDate0", t0."DeletedDate", t0."Description", t0."LastCommit", t0."MigratedDate", t0."MigrationStatus", t0."Name", t0."ParentId", t0."ProjectOrigin", t0."ResetStatus", t0."RetentionPolicy", t0."Type", t0."UpdatedDate0"
[lexbox-api]       FROM (
[lexbox-api]           SELECT u."Id", u."CanCreateProjects", u."CreatedDate", u."Email", u."EmailVerified", u."IsAdmin", u."LastActive", u."LocalizationCode", u."Locked", u."Name", u."PasswordHash", u."Salt", u."UpdatedDate", u."Username"
[lexbox-api]           FROM "Users" AS u
[lexbox-api]           WHERE u."Email" = @__email_0 OR u."Username" = @__email_0
[lexbox-api]           LIMIT 1
[lexbox-api]       ) AS t
[lexbox-api]       LEFT JOIN (
[lexbox-api]           SELECT p."Id", p."CreatedDate", p."ProjectId", p."Role", p."UpdatedDate", p."UserId", t1."Id" AS "Id0", t1."Code", t1."CreatedDate" AS "CreatedDate0", t1."DeletedDate", t1."Description", t1."LastCommit", t1."MigratedDate", t1."MigrationStatus", t1."Name", t1."ParentId", t1."ProjectOrigin", t1."ResetStatus", t1."RetentionPolicy", t1."Type", t1."UpdatedDate" AS "UpdatedDate0"
[lexbox-api]           FROM "ProjectUsers" AS p
[lexbox-api]           INNER JOIN (
[lexbox-api]               SELECT p0."Id", p0."Code", p0."CreatedDate", p0."DeletedDate", p0."Description", p0."LastCommit", p0."MigratedDate", p0."MigrationStatus", p0."Name", p0."ParentId", p0."ProjectOrigin", p0."ResetStatus", p0."RetentionPolicy", p0."Type", p0."UpdatedDate"
[lexbox-api]               FROM "Projects" AS p0
[lexbox-api]               WHERE p0."DeletedDate" IS NULL
[lexbox-api]           ) AS t1 ON p."ProjectId" = t1."Id"
[lexbox-api]           WHERE t1."DeletedDate" IS NULL
[lexbox-api]       ) AS t0 ON t."Id" = t0."UserId"
[lexbox-api]       ORDER BY t."Id", t0."Id"
[lexbox-api] info: Microsoft.EntityFrameworkCore.Database.Command[20101]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Executed DbCommand (11ms) [Parameters=[@__id_0='?' (DbType = Guid)], CommandType='Text', CommandTimeout='30']
[lexbox-api]       UPDATE "Users" AS u
[lexbox-api]       SET "LastActive" = now()
[lexbox-api]       WHERE u."Id" = @__id_0
[lexbox-api] info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Executing endpoint '/api/v03/{**catch-all}'
[lexbox-api] info: Yarp.ReverseProxy.Forwarder.HttpForwarder[9]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Proxying to http://hg/api/v03/pullBundleChunk?offset=0&chunkSize=5000&transId=f375d994-03db-4661-abee-b8c1081101bb&baseHashes[]=fb853b6ed66f&baseHashes[]=1af18777e23e&repoId=sena-3 HTTP/2 RequestVersionOrLower 
[otel-collector] 2024-02-07T07:07:02.184Z	error	exporterhelper/retry_sender.go:145	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = Unauthenticated desc = attempted to use disabled API key", "dropped_items": 5}
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/retry_sender.go:145
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:177
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper.(*queueSender).start.func1
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:126
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).Start.func1
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/internal/bounded_memory_queue.go:52
[otel-collector] 2024-02-07T07:07:06.178Z	error	exporterhelper/retry_sender.go:145	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/metrics", "error": "Permanent error: rpc error: code = Unauthenticated desc = attempted to use disabled API key", "dropped_items": 53}
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/retry_sender.go:145
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:176
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper.(*queueSender).start.func1
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:126
[otel-collector] go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).Start.func1
[otel-collector] 	go.opentelemetry.io/collector/[email protected]/exporterhelper/internal/bounded_memory_queue.go:52
[lexbox-api] warn: Yarp.ReverseProxy.Forwarder.HttpForwarder[48]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       RequestCanceled: The request was canceled before receiving a response.
[lexbox-api]       System.Threading.Tasks.TaskCanceledException: The operation was canceled.
[lexbox-api]        ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
[lexbox-api]        ---> System.Net.Sockets.SocketException (125): Operation canceled
[lexbox-api]          --- End of inner exception stack trace ---
[lexbox-api]          at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
[lexbox-api]          at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
[lexbox-api]          at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async)
[lexbox-api]          at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
[lexbox-api]          --- End of inner exception stack trace ---
[lexbox-api]          at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
[lexbox-api]          at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
[lexbox-api]          at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
[lexbox-api]          at System.Net.Http.Metrics.MetricsHandler.SendAsyncWithMetrics(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
[lexbox-api]          at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
[lexbox-api]          at Yarp.ReverseProxy.Forwarder.HttpForwarder.SendAsync(HttpContext context, String destinationPrefix, HttpMessageInvoker httpClient, ForwarderRequestConfig requestConfig, HttpTransformer transformer, CancellationToken cancellationToken)
[lexbox-api] fail: LexSyncReverseProxy.ForwarderTelemetryConsumer[0]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Forwarder Failed: RequestCanceled
[lexbox-api] info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Executed endpoint '/api/v03/{**catch-all}'
[lexbox-api] info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[2]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Response:
[lexbox-api]       StatusCode: 400
[lexbox-api]       lexbox-version: dockerDev
[lexbox-api] info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
[lexbox-api]       => TraceId:1f2d1affbb391aaf54836fbffe8e837b => ConnectionId:0HN17LBEE0752 => RequestPath:/api/v03/pullBundleChunk RequestId:0HN17LBEE0752:00000002
[lexbox-api]       Request finished HTTP/1.1 GET http://resumable.localhost/api/v03/pullBundleChunk?offset=0&chunkSize=5000&transId=f375d994-03db-4661-abee-b8c1081101bb&baseHashes[]=fb853b6ed66f&baseHashes[]=1af18777e23e&repoId=sena-3 - 499 - application/problem+json 29973.6265ms

@rmunn
Contributor

rmunn commented Feb 7, 2024

The root cause (well, ONE of the root causes) appears to be that Mercurial's behavior on an empty project is not what the PHP code expects to receive. When a project is empty, hg log --template "{node|short}:{branches}\n" returns an empty string, while hg tip --template "{rev}:{branches}\n" returns the string -1:\n, i.e. there's nothing after the colon. This triggers a bug in the isValidBase PHP code, which is fixed by the following patch to hgresumable:

diff --git a/api/src/HgRunner.php b/api/src/HgRunner.php
index 4b186d7..ac9d6bb 100644
--- a/api/src/HgRunner.php
+++ b/api/src/HgRunner.php
@@ -204,7 +204,7 @@ class HgRunner {
                $i = 0;
                while($foundHash < count($hashes)) {
                        $revisions = $this->getRevisions($i, $q);
-                       if (count($revisions) == 0) {
+                       if (count($revisions) == 0 || count($revisions) == 1 && $revisions[0] == "0:") {
                                return false;
                        }
                        foreach($revisions as $hashandbranch) {
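
To make the empty-repo condition concrete, here is a small sketch in TypeScript (not the PHP used by hgresumable, and not the patch itself). It relies only on the two outputs described above: `hg log` printing nothing and `hg tip` printing `-1:`. The function name `looksLikeEmptyRepo` is hypothetical.

```typescript
// Outputs described above for an empty repository:
//   hg log --template "{node|short}:{branches}\n"  -> empty string
//   hg tip --template "{rev}:{branches}\n"         -> "-1:\n"
function looksLikeEmptyRepo(logOutput: string, tipOutput: string): boolean {
  // Ignore blank lines so a trailing newline does not count as a revision.
  const logLines = logOutput.split('\n').filter((line) => line.trim() !== '');
  const tip = tipOutput.trim();
  return logLines.length === 0 && tip === '-1:';
}
```

Both conditions must hold here; a caller that only has one of the two outputs could check just that one.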

That's not enough to get us out of the woods, though. It solves the issue of hgresumable getting stuck in a loop, but Chorus is still stuck in one: once that patch is applied to the PHP code, Chorus goes into a loop of receiving a 400 error and retrying, even though the 400 error is a message from Mercurial saying "Hey, your request is invalid". Specifically, inside hgresume's pullBundleChunkInternal function, there's the following snippet:

			// TODO This might be bogus, the given baseHash may well be a baseHash that exists in a future push, and we don't have it right now. CP 2012-08
			if (!$hg->isValidBase($baseHashes)) {
				return new HgResumeResponse(HgResumeResponse::FAIL, array('Error' => 'invalid baseHash'));
			}

And Chorus is supposed to see the "invalid baseHash" message in the hgresume error and give up. However, it doesn't.
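
As an illustration of the behavior expected here, the following TypeScript sketch shows a retry decision that gives up on the "invalid baseHash" error and caps the number of attempts. This is not Chorus code; the types, the attempt limit, and the function name `decideRetry` are all assumptions made for the example.

```typescript
type RetryDecision = 'retry' | 'give-up';

// Hypothetical view of a server reply: HTTP status plus the optional
// 'Error' value from the hgresume response.
interface ServerReply { status: number; error?: string; }

function decideRetry(reply: ServerReply, attempt: number, maxAttempts: number): RetryDecision {
  // The server says the request itself is invalid; retrying cannot help.
  if (reply.error === 'invalid baseHash') return 'give-up';
  // Bound the loop so a persistent server-side failure ends eventually.
  if (attempt >= maxAttempts) return 'give-up';
  return 'retry';
}
```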

We can't patch Chorus. But we can patch our deployed version of hgresume. I suspect that if you have an empty repo, ALL base hashes are valid, because they are a baseHash that exists in a future push, as Cambell's comment from 2012 mentions. I'll try that and see what happens.

@rmunn
Contributor

rmunn commented Feb 7, 2024

I added a check for an empty repo and had pullBundleChunkInternal return HgResumeResponse::NOCHANGE if the repo is empty. That got Chorus to proceed with pushing the bundle. However, that then made Chorus think that the resumable server had revision 2, so instead of pushing a bundle with all the changes, it pushed only the changes since revision 2. This resulted in Mercurial erroring out with:

abort: 00changelog@6f010b976aaaab1561a29893bf9820c68af47794: unknown parent
Command exited with non-zero status 255

And that, in turn, produced a nasty retry cycle between Resumable and Chorus where Resumable was repeatedly throwing a PHP exception from AsyncRunner: "PHP Fatal error: Uncaught AsyncRunnerException: Lock file '/var/cache/hgresume/8ac501c2-225b-4d6c-b365-5df157d30502.bundle.async_run' not found, process is not running". But Chorus just kept retrying that, even though it was never going to get better on the server end.

I have one more thing to try. So far I've been using our Send/Receive tests (the ModifyProjectData test, specifically) to do this testing. That uses the Send/Receive process in LfMergeBridge. But there's one more Chorus code path: the "Send this project for the first time" option in FLEx. I will have to try out that code path on a project that's been reset.

@rmunn
Contributor

rmunn commented Feb 8, 2024

Manual testing on FLEx shows that the "Send this project for the first time" option only appears when there is no .hg folder in the project storage, which is not what we want.

However, I did find a way to work around the issue. If you're using resumable to send/receive the project in FLEx, the project folder (default location: C:\ProgramData\SIL\FieldWorks\Projects\(project name)) will contain a "Chorus" folder. Delete the "Chorus" folder and then everything will work. I'll do one more experiment where the only thing I delete from the Chorus folder is the "revisioncache.db" file (named "revisioncache.json" on Linux), and see if that works.

@rmunn
Contributor

rmunn commented Feb 8, 2024

That gives us a working procedure for a .NET test of sending projects via resumable after a project reset. First create a new project (with a tiny .fwdata file, see the SendNewProject test). Then do the following steps in the test:

  • Clone project with resumable
  • Reset project via API calls (and leave it empty)
  • Delete the "Chorus" folder in the project clone
  • Send/Receive project with resumable
  • Clone project again into new folder
  • Verify that old folder and new folder have same hg tip SHA
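
The steps above can be sketched as a single ordered routine. The real test would be a .NET test. TypeScript is used here only to keep the illustration short, and every operation name in `ResetTestOps` is hypothetical rather than taken from the codebase. The operations are synchronous for brevity, whereas real ones would be asynchronous.

```typescript
// Hypothetical operations the real test would supply (illustrative names only).
interface ResetTestOps {
  cloneWithResumable(projectCode: string, dir: string): void;
  resetProjectToEmpty(projectCode: string): void;
  deleteChorusFolder(dir: string): void;
  sendReceiveWithResumable(projectCode: string, dir: string): void;
  hgTipSha(dir: string): string;
}

// Runs the listed steps in order and reports whether both clones share a tip.
function sendReceiveAfterReset(
  ops: ResetTestOps,
  projectCode: string,
  oldDir: string,
  newDir: string,
): boolean {
  ops.cloneWithResumable(projectCode, oldDir);       // clone project with resumable
  ops.resetProjectToEmpty(projectCode);              // reset via API, leave it empty
  ops.deleteChorusFolder(oldDir);                    // drop the stale Chorus folder
  ops.sendReceiveWithResumable(projectCode, oldDir); // Send/Receive with resumable
  ops.cloneWithResumable(projectCode, newDir);       // clone again into a new folder
  return ops.hgTipSha(oldDir) === ops.hgTipSha(newDir);
}
```

Passing the operations in as an object keeps the ordering logic separate from how each step is carried out, so the sequence can be exercised with fakes.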

For the Playwright reset-project test, create a new project (with a random project code so the test can run in parallel with itself) and commit a single file into it. Verify that we can see that file via /hg/browse in hgweb. Reset project to empty. Verify the file is gone. Reset project again, but this time upload the .zip file. Verify the file is visible again in hgweb.
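
A minimal sketch of the random project code and the three checkpoints described above. The code format (a prefix, a hyphen, then lowercase letters and digits) is an assumption, since the rules for valid project codes are not stated here. `randomProjectCode` and `expectedVisibility` are hypothetical names.

```typescript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789';

// Generate a project code with a random suffix so parallel runs do not collide.
function randomProjectCode(prefix: string, length: number = 8): string {
  let suffix = '';
  for (let i = 0; i < length; i++) {
    suffix += ALPHABET.charAt(Math.floor(Math.random() * ALPHABET.length));
  }
  return `${prefix}-${suffix}`;
}

// Each checkpoint: the action taken, then whether the committed file
// should be visible in hgweb afterwards.
const expectedVisibility: Array<[string, boolean]> = [
  ['commit a single file', true],
  ['reset project to empty', false],
  ['reset project and upload the .zip', true],
];
```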
