-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Periodically create partitions for alert executions #398
Conversation
Admin user at runtime = 5 Admin user in tests = 5 Migration connection = 1 Total possible connections is 11 so let's do 15
Have we setup the Akka pools for these actors? You're updating connection pool sizes, but if the Akka actor's thread pools aren't big enough (they usually assume non-blocking), then you're gonna block before the connection open. |
Is the concern because I block on actor start whereas we ordinarily would let that happen asynchronously? I'm not sure I entirely understand the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional context for reviewers.
@@ -1165,7 +1165,6 @@ | |||
<groupId>org.flywaydb</groupId> | |||
<artifactId>flyway-play_${scala.package.version}</artifactId> | |||
<version>${flyway.play.version}</version> | |||
<scope>runtime</scope> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scope changed because I need to block on the flyway-play PlayInitializer
from within MainModule
to guarantee that migrations have run.
@@ -27,7 +27,7 @@ ALTER ROLE metrics_app WITH NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLIC | |||
|
|||
CREATE ROLE metrics_dba LOGIN; | |||
ALTER ROLE metrics_dba WITH PASSWORD 'metrics_dba_password'; | |||
ALTER ROLE metrics_dba WITH NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION CONNECTION LIMIT 6; | |||
ALTER ROLE metrics_dba WITH NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION CONNECTION LIMIT 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 15:
I bumped the connection pool to be the same size as what we have for metrics_dml
, and set the size to be the same in tests. That is 5 + 5 = 10
connections. There's also an additional connection flyway creates with this role. That's a total of 11. I then just bumped it to 15.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Flyway creates (for some reason) 2 connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any change required here then? that would be 12 which is still below the 15 limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, just trying to spread the knowledge.
-- Table - Text - The name of the parent table. | ||
-- Start - Date - The beginning date of the time range, inclusive. | ||
-- End - Date - The end date of the time range, exclusive. | ||
CREATE OR REPLACE FUNCTION create_daily_partition(schema TEXT, tablename TEXT, start_date DATE, end_date DATE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Content is exactly the same as the previous function, just formatted in a nicer way and with an added schema parameter
The concern regarding thread pools is because you seem to need to adjust the connection pools. If we're now seeing enough database traffic to warrant tuning the connection pools, we'll also need to tweak the thread pools (requests are served on Akka actors). It hadn't been a problem up until now so we've just kinda ignored it. But it seemed like you were hitting some limitation. And I just wanted to call out that just bumping the connection pool size would likely be insufficient. |
I believe the issue here is that the code is now using the DDL connection pool to manage the partition tables. Previously, this connection pool was only used by Flyway and tuned to prevent accidental usage (by limiting its size). Now that we are intentionally using this pool for dynamically extending the partitions, the default pool size needs to be increased. |
Yes, this was my reasoning for increasing the connection pool size. It is likely overkill to have it the same size as the DML pool since the only additional traffic will be from the single partition-creating actor. The request pool was unchanged because the additional traffic is from an internal actor rather than generated via external requests. |
Some additional context that I provided in slack:
|
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Show resolved
Hide resolved
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
.setParameter(3, startDate) | ||
.setParameter(4, endDate); | ||
|
||
sql.findOneOrEmpty().orElseThrow(() -> new IllegalStateException("Expected a single empty result.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a top level message handler. Throwing the exception here will just result in an uncaught message handler exception in Akka. Seems like we could do better handling, logging and instrumenting this failure. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the top-level handler - the actual handler is the execute
overload with no parameters. This has the actual DB logic but the other code adds logging/metrics.
It is the case though that the wrapper code will not handle this illegal state exception which would still cause the problem you're mentioning albeit for different reasons so I could probably handle that.
app/com/arpnetworking/metrics/portal/alerts/impl/DatabaseAlertExecutionRepository.java
Outdated
Show resolved
Hide resolved
@@ -27,7 +27,7 @@ ALTER ROLE metrics_app WITH NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLIC | |||
|
|||
CREATE ROLE metrics_dba LOGIN; | |||
ALTER ROLE metrics_dba WITH PASSWORD 'metrics_dba_password'; | |||
ALTER ROLE metrics_dba WITH NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION CONNECTION LIMIT 6; | |||
ALTER ROLE metrics_dba WITH NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION CONNECTION LIMIT 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Flyway creates (for some reason) 2 connections.
@@ -441,6 +473,23 @@ public EbeanServer get() { | |||
} | |||
} | |||
|
|||
private static final class MetricsPortalDDLEbeanServerProvider implements Provider<EbeanServer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my edification: you wrote this Provider<...>
that you bind()
up above, but you also wrote a @Provides
private method. Why? (If it was a purely stylistic difference, I'd have expected you to use only one.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's purely stylistic in this case, the behavior would be identical. I wrote a SIC for this to keep it consistent with the other EBeanServer provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally though I do prefer @Provides
methods because there's less boilerplate.
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
.toCompletableFuture() | ||
.get(); | ||
public static void stop(final ActorRef ref, final Duration timeout) throws ExecutionException, InterruptedException { | ||
Patterns.gracefulStop(ref, timeout).toCompletableFuture().get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth the method? Especially, since it actually has no logic specific to this actor. Moreover, you could use PatternCS
(assuming we're on a new enough version of Akka in MP) which should simplify the call to stop (I believe).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can inline this; I had a method to be consistent with the start
call but since that was removed there's no need for the consistency.
As far as the APIs used, PatternsCS and Patterns were merged in Akka 2.5.19 (we're running 2.5.20).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked and MAD is running Akka 2.5.16 so it makes sense that using Patterns works here but not over there.
* interrupted for other reasons. | ||
*/ | ||
public static void stop(final ActorRef ref, final Duration timeout) throws ExecutionException, InterruptedException { | ||
Patterns.gracefulStop(ref, timeout).toCompletableFuture().get(); | ||
public static void ensurePartitionExistsForDate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely without merit, and not as concerned, but at least may be simpler with PatternsCS
. I believe we also doubled up on the timeouts with this pattern in MAD and CAGG; e.g.
ArpNetworking/metrics-aggregator-daemon@158884e#diff-fed66f6be12dfd97b6fb62212dd0d6ebR52
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above PatternsCS and Patterns are essentially the same API for this version of Akka - but I'll add the timeouts. I'm pretty sure I'm the one that asked you to double up on that code to begin with 😄
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Show resolved
Hide resolved
.matchEquals(EXECUTE, msg -> execute()) | ||
.match(CreateForRange.class, msg -> { | ||
final Status.Status resp = execute(msg.getStart(), msg.getEnd()); | ||
if (!getSender().equals(getSelf())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. Is this what you were looking for?
https://doc.akka.io/docs/akka/current/typed/interaction-patterns.html
See: "Ignoring replies"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what I was looking for, but that's Akka 2.6.XX
using typed actors so it's not available here 😢
app/com/arpnetworking/metrics/portal/alerts/impl/DailyPartitionCreator.java
Outdated
Show resolved
Hide resolved
private void tick() { | ||
recordCounter("tick", 1); | ||
|
||
final Instant now = _clock.instant(); | ||
if (_schedule.nextRun(_lastRun).map(run -> run.isBefore(now)).orElse(true)) { | ||
getSelf().tell(EXECUTE, getSelf()); | ||
final LocalDate startDate = ZonedDateTime.ofInstant(now, _clock.getZone()).toLocalDate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a local timezone here? What happens if two boxes have different timezones? Would this work equally well just fixed to UTC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a local time zone, it is UTC regardless of host. I initialized the clock to Clock.systemUTC()
. I can be explicit here and just specify UTC a second time.
|
||
private void updateCache(final LocalDate start, final LocalDate end) { | ||
LocalDate date = start; | ||
while (!date.equals(end)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As these are local dates, I believe they won't be equal if they have different locales (zones) even if adjusted it's the same point in time. Is this what we want for comparison? (related: see comment about zones above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the concern, it's not applicable here because LocalDate instances do not have timezone information - it's essentially just day/month/year
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm treating all time in here as UTC which is why the zone offset information was dropped)
Introduce a DailyPartition manager actor that runs within the
DatabaseAlertExecutionRepository
, periodically attempting to create partitions for the alert executions table that was added in #380. Notably I've introduced anotherEbeanServer
which uses the DDL role (metrics_dba
), since that Role is necessary to create tables.I've included unit tests for the actor while the DB testing is covered by the existing integration tests.