feat(limit-count-redis): add sync_rate to reduce syncing frequency of counter data with redis #11374
Conversation
… counter data with redis
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@@ -42,6 +42,9 @@ local policy_to_additional_properties = {
        redis_ssl_verify = {
            type = "boolean", default = false,
        },
        sync_interval = {
Shouldn't `sync_interval` belong directly to the plugin schema rather than to the Redis schema?
only redis based rate limiting plugins will support this feature.
For further reuse, I suggest separating the Redis schema definition from the concrete plugin policy. Maybe we could have a structure like this:
apisix/utils/rate-limiting-policy.lua
- redis[_cluster]_cli_properties -> ref. redis-schema
- sync_interval
this would be over-optimisation.
@@ -42,6 +42,9 @@ local policy_to_additional_properties = {
        redis_ssl_verify = {
            type = "boolean", default = false,
        },
        sync_interval = {
            type = "integer", default = -1
Should it default to 0, with a restriction of >= 0?
I don't follow. The default value denotes the normal mode of operation, i.e. sync with Redis immediately.
Because a delay <= 0 argument to the `ngx.timer.at` function means the timeout callback executes as soon as the current handler yields. A negative `interval` has no meaning, so restrict `sync_interval >= 0`. And `sync_interval = 0` (the default) would mean no delay, i.e. sync with Redis immediately.
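For illustration, the suggested restriction might look roughly like this in the schema (a sketch only; the PR as posted uses `default = -1`):

sync_interval = {
    type = "integer", minimum = 0, default = 0,
},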
`sync_interval` is in seconds; 0 seconds would give the misleading idea of "sync immediately".
    end
    core.log.info("num reqs passed since sync to redis: ", incr)

    local ttl = 0
Should `ttl` be updated before returning?
Supporting TTL was complicating the code, so I plan to support it in another PR.
The TTL has already been read from Redis and written to the SHDICT by the syncer:
local res, err = red:eval(script, 1, key, conf.count, conf.time_window, num_reqs)
if err then
    core.log.error("failed to sync shdict data to redis: ", err)
    return
end
local remaining = res[1]
local ttl = res[2]
core.log.info("syncing shdict num_req counter to redis. remaining: ", remaining,
              " ttl: ", ttl, " reqs: ", num_reqs)
counter:set(key .. consts.SHDICT_REDIS_REMAINING, tonumber(remaining), tonumber(ttl))
So just reading it back from the SHDICT would be OK?
local remaining, err = counter:incr(key .. consts.SHDICT_REDIS_REMAINING,
                                    0 - cost, limit, window)
local ttl, err = counter:ttl(key .. consts.SHDICT_REDIS_REMAINING)
    to_be_synced[key] = true -- add to table for syncing
    redis_confs[key] = conf

    local incr, ierr = counter:incr(key .. consts.REDIS_COUNTER, cost, 1)
Why is `init_val` 1 rather than 0?
Because the first request should also be counted.
But for the first request, the counter value after `counter:incr` is equal to 2:
-- run: resty --shdict 'counter 10m' incr.lua
local counter = ngx.shared["counter"]
local incr, _ = counter:incr("key", 1, 1)
print("incr: ", incr)
-- out: incr: 2
nice catch thank you.
Should we get `remaining` and check `remaining < 0` before incrementing `key .. consts.REDIS_COUNTER`? Otherwise, a request that is rejected will still be counted.
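A sketch of that ordering, reusing the names from the PR's diff (the early return is illustrative, not the PR's actual code):

local remaining, rerr = counter:incr(key .. consts.SHDICT_REDIS_REMAINING,
                                     0 - cost, limit, window)
if remaining and remaining < 0 then
    -- rejected requests never reach the redis sync counter
    return remaining
end
local incr, ierr = counter:incr(key .. consts.REDIS_COUNTER, cost, 0)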
function _M.rate_limit_with_delayed_sync(conf, counter, to_be_synced, redis_confs, key, cost,
                                         limit, window, script)
    local syncer_started = counter:get(consts.REDIS_SYNCER)
The get-and-test is not atomic, so multiple timers could be started. Consider using `DICT:add`; if it succeeds, then start the timer.
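A rough sketch of that add-then-start-timer pattern, reusing this PR's names (not its actual code):

-- shdict add() is atomic and only succeeds for the first caller,
-- so at most one syncer timer gets created
local ok, err = counter:add(consts.REDIS_SYNCER, true)
if ok then
    local tok, terr = ngx_timer.at(conf.sync_interval, sync_counter_data,
                                   counter, to_be_synced, redis_confs, script)
    if not tok then
        -- timer creation failed: release the flag so a later request can retry
        counter:delete(consts.REDIS_SYNCER)
        core.log.error("failed to create redis syncer timer: ", terr)
    end
elseif err ~= "exists" then
    core.log.error("failed to set the syncer flag: ", err)
end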
Regarding "get and test not atomic": the documentation doesn't say anything like that. Could you share the source of this info? 🤔
So your idea is one syncer per worker process? If so, why use the SHDICT key `consts.REDIS_SYNCER`, which is global to all worker processes, to check `syncer_started`? That made me mistakenly think your idea was one syncer per nginx server instance.
local syncer_started = counter:get(consts.REDIS_SYNCER)
if not syncer_started then
    local ok, err = ngx_timer.at(conf.sync_interval, sync_counter_data, counter, to_be_synced,
                                 redis_confs, script)
    if ok then
        counter:set(consts.REDIS_SYNCER, true)
    else
        core.log.error("failed to create main redis syncer timer: ", err, ". Will retry next time.")
    end
end
Regarding "the documentation doesn't say anything like that": if you wanted to ensure one syncer per nginx server instance (though that is apparently not the goal here), the following get-and-test logic would need to be atomic:
local syncer_started = counter:get(consts.REDIS_SYNCER)
if not syncer_started then
                                         limit, window, script)
    local syncer_started = counter:get(consts.REDIS_SYNCER)
    if not syncer_started then
        local ok, err = ngx_timer.at(conf.sync_interval, sync_counter_data, counter, to_be_synced,
The syncer is global to an NGX server instance, but `conf.sync_interval` is a plugin instance's configuration.
I don't understand you.
Even with one syncer per worker process, which plugin instance's `conf.sync_interval` does the syncer apply?
To correct my wording above: "which plugin instance's `conf.sync_interval` does the syncer apply" should read "which plugin `conf` instance's `sync_interval` does the syncer apply".
All plugins will have their own syncer, as `key` is unique to each plugin instance:
apisix/apisix/plugins/limit-count/init.lua, line 243 in 4dbecfd:
key = gen_limit_key(conf, ctx, key)
        end
    end

    to_be_synced[key] = true -- add to table for syncing
`to_be_synced` & `redis_confs` are module-scoped variables; worker processes that are not running the syncer would not be able to tell the syncer to sync the keys they added this way.
Each worker process will have its own copy of the syncer, so all worker processes will sync the keys they possess. What's the problem here? Or do you have a better alternative?
For the one-syncer-per-worker-process scenario, it's OK.
BTW:
- Is it really necessary to pass `to_be_synced` & `redis_confs` to `ngx.timer.at` as extra args?
- Use a Redis client instance instead of the Redis config? It could look like this: `to_be_synced[key] = { redis_cli = red_cli }`?
Regarding using a Redis client instance instead of the Redis config: cosockets cannot be shared with `ngx.timer` callback functions.
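That matches the lua-nginx-module rule that a cosocket is bound to the context that created it, so a timer callback has to open its own connection. A minimal sketch (the connection handling here is illustrative, not this PR's code):

local redis = require("resty.redis")

local function sync_in_timer(premature, conf)
    if premature then
        return
    end
    -- the connection must be created here, inside the timer context,
    -- not passed in from the request handler that scheduled the timer
    local red = redis:new()
    red:set_timeout(1000)  -- 1s, illustrative
    local ok, err = red:connect(conf.redis_host, conf.redis_port)
    if not ok then
        ngx.log(ngx.ERR, "failed to connect to redis: ", err)
        return
    end
    -- ... run the sync script here ...
    red:set_keepalive(10000, 100)  -- return the connection to the pool
end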
Regarding "to_be_synced & redis_confs are module scoped variables": I get it now. We need to make `to_be_synced` and `redis_confs` visible to the worker process which has the syncer. Thanks for your review again.
:D Yes. But these module variables are already visible to the per-worker-process syncer:
local ngx = ngx
local ngx_timer = ngx.timer
local ngx_sleep = ngx.sleep
local ngx_worker_pid = ngx.worker.pid
local table = table
local table_insert = table.insert
local table_remove = table.remove
local str_fmt = string.format
local chan = {}
local timer_cb = function(premature, ...)
    if premature then
        return
    end
    local data = table_remove(chan, 1)
    local msg = str_fmt("--> timer is sched." ..
                        " on worker process(%d)," ..
                        " \n--> and got data from worker process(%s)" ..
                        " through the `chan`",
                        ngx_worker_pid(), tostring(data))
    print(msg)
end
assert(ngx_timer.at(0, timer_cb))
local pid = ngx_worker_pid()
table_insert(chan, pid)
print(str_fmt("-> worker(%d)", pid), " will yield for running timer")
ngx_sleep(0)
print(str_fmt("<- worker(%d)", pid), " did yield")
-- out
-- -> worker(23037) will yield for running timer
-- --> timer is sched. on worker process(23037),
-- --> and got data from worker process(23037) through the `chan`
-- <- worker(23037) did yield
Regarding "cosockets cannot be shared to ngx.timer functions": maybe we could consider another implementation instead of `ngx.timer`, e.g. a request-driven syncer: when a count or time threshold is reached, sync immediately, in place, in the request handler context.
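A rough sketch of that request-driven idea (the `sync_batch` threshold and the `.pending`/`.last_sync` shdict keys are hypothetical, not part of this PR):

local function maybe_sync(conf, counter, key, red, script)
    local pending = counter:get(key .. ".pending") or 0
    local last_sync = counter:get(key .. ".last_sync") or 0
    local now = ngx.now()

    -- sync in place only when enough requests piled up or enough time passed
    if pending < conf.sync_batch and (now - last_sync) < conf.sync_interval then
        return
    end

    -- we are still in the request handler context, so the cosocket-based
    -- redis client `red` can be used directly
    local res, err = red:eval(script, 1, key, conf.count, conf.time_window, pending)
    if not res then
        core.log.error("request-driven sync to redis failed: ", err)
        return
    end

    counter:set(key .. ".pending", 0)
    counter:set(key .. ".last_sync", now)
end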
core.log.info("syncing shdict num_req counter to redis. remaining: ", remaining, | ||
" ttl: ", ttl, " reqs: ", num_reqs) | ||
counter:set(key .. consts.SHDICT_REDIS_REMAINING, tonumber(remaining), tonumber(ttl)) | ||
counter:set(key .. consts.REDIS_COUNTER, 0) |
Suggested change:
-    counter:set(key .. consts.REDIS_COUNTER, 0)
+    counter:incr(key .. consts.REDIS_COUNTER, -num_reqs)
The counter denotes the number of requests passed since the last sync to Redis, so after syncing to Redis the count should be zero.
I get it now: setting the counter to 0 would drop any requests counted after `num_reqs` was read, while `incr(-num_reqs)` keeps them. You are right, great catch!!
local function sync_counter_data(premature, counter, to_be_synced, redis_confs, script)
    if premature then
Should the syncer flag be deleted/cleared here? And what if the worker process running the syncer crashes?
The syncer flag? What do you mean?
For the one-syncer-per-worker-process scenario, ignore it. :D
    if premature then
        return
    end
    for key, _ in pairs(to_be_synced) do
Use a sub-function for syncing the keys?
I didn't get you.
Encapsulate a function for synchronizing a single key. :D
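A sketch of that refactor; the `redis_cli(conf)` helper stands in for however the plugin obtains its Redis connection, and the other names follow this PR's diff (illustrative, not the PR's actual code):

local function sync_single_key(counter, key, conf, script)
    local num_reqs = counter:get(key .. consts.REDIS_COUNTER) or 0
    if num_reqs == 0 then
        return
    end

    local red, err = redis_cli(conf)   -- hypothetical connection helper
    if not red then
        core.log.error("failed to connect to redis: ", err)
        return
    end

    local res, rerr = red:eval(script, 1, key, conf.count, conf.time_window, num_reqs)
    if not res then
        core.log.error("failed to sync shdict data to redis: ", rerr)
        return
    end

    counter:set(key .. consts.SHDICT_REDIS_REMAINING, tonumber(res[1]), tonumber(res[2]))
    counter:incr(key .. consts.REDIS_COUNTER, -num_reqs)
end

-- the timer callback then just iterates the pending keys:
local function sync_counter_data(premature, counter, to_be_synced, redis_confs, script)
    if premature then
        return
    end
    for key, _ in pairs(to_be_synced) do
        sync_single_key(counter, key, redis_confs[key], script)
    end
end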
Format the indentation from 2 spaces to 4, and make it pass `make lint`:
ERROR: apisix/plugins/limit-count/delayed_syncer.lua: line 33: getting the Lua global "pairs"
ERROR: apisix/plugins/limit-count/delayed_syncer.lua: line 67: getting the Lua global "tonumber"
ERROR: apisix/plugins/limit-count/delayed_syncer.lua: line 67: getting the Lua global "tonumber"
What do you think?
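Those `make lint` errors about Lua globals are normally fixed by localizing them at the top of the module, e.g.:

local pairs    = pairs
local tonumber = tonumber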
Description
Add the `sync_rate` parameter to control the frequency of synchronizing counters with Redis in the Redis-based limit-count plugins. Real-time synchronization puts a lot of pressure on Redis; refer to https://docs.konghq.com/hub/kong-inc/rate-limiting/configuration/#config-sync_rate.
Checklist