
feat(limit-count-redis): add sync_rate to reduce syncing frequency of counter data with redis #11374

Conversation

shreemaan-abhishek (Contributor) commented Jun 26, 2024

Description

Add the sync_rate parameter to control how frequently counters are synchronized with Redis in the Redis-based limit-count plugins.

Real-time synchronization puts a lot of pressure on Redis; see https://docs.konghq.com/hub/kong-inc/rate-limiting/configuration/#config-sync_rate.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@@ -42,6 +42,9 @@ local policy_to_additional_properties = {
    redis_ssl_verify = {
        type = "boolean", default = false,
    },
    sync_interval = {
Contributor:

Shouldn't sync_interval belong directly to the plugin schema rather than the Redis schema?

Contributor Author:

Only Redis-based rate-limiting plugins will support this feature.

Contributor:

For reuse, I suggest separating the Redis schema definition from the concrete plugin policy.

Maybe we could have a structure like this:

apisix/utils/rate-limiting-policy.lua

- redis[_cluster]_cli_properties -> ref. redis-schema
- sync_interval

Contributor Author:

This would be over-optimization.

@@ -42,6 +42,9 @@ local policy_to_additional_properties = {
    redis_ssl_verify = {
        type = "boolean", default = false,
    },
    sync_interval = {
        type = "integer", default = -1
Contributor:

Default to 0, with the restriction >= 0?

Contributor Author:

I don't follow. The default value denotes the normal mode of operation, i.e. sync with Redis immediately.

Contributor:

Because a delay argument <= 0 passed to ngx.timer.at means the timeout callback runs as soon as the current handler yields.

A negative interval has no meaning, so restrict sync_interval >= 0,
and let sync_interval = 0 (the default) mean no delay (sync with Redis immediately).
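
A minimal sketch of what that suggested restriction could look like in the schema (the placement alongside the other Redis properties is assumed; minimum is standard JSON Schema as used by APISIX):

    sync_interval = {
        type = "integer", minimum = 0, default = 0,  -- 0: no delay, sync with redis immediately
    },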

Contributor Author:

sync_interval is in seconds; 0 seconds would give a misleading idea of "sync immediately".

end
core.log.info("num reqs passed since sync to redis: ", incr)

local ttl = 0
Contributor:

Should the ttl be updated before returning?

Contributor Author:

Supporting ttl was complicating the code, so I plan to support it in another PR.

zhoujiexiong (Contributor), Jul 4, 2024:

The TTL has already been read from Redis and written to the SHDICT by the syncer:

    local res, err = red:eval(script, 1, key, conf.count, conf.time_window, num_reqs)
    if err then
      core.log.error("failed to sync shdict data to redis: ", err)
      return
    end

    local remaining = res[1]
    local ttl = res[2]
    core.log.info("syncing shdict num_req counter to redis. remaining: ", remaining,
                  " ttl: ", ttl, " reqs: ", num_reqs)
    counter:set(key .. consts.SHDICT_REDIS_REMAINING, tonumber(remaining), tonumber(ttl))

So just reading it from the SHDICT should be OK:

  local remaining, err = counter:incr(key .. consts.SHDICT_REDIS_REMAINING,
                                      0 - cost, limit, window)
  local ttl, terr = counter:ttl(key .. consts.SHDICT_REDIS_REMAINING)

to_be_synced[key] = true -- add to table for syncing
redis_confs[key] = conf

local incr, ierr = counter:incr(key .. consts.REDIS_COUNTER, cost, 1)
Contributor:

Why is init_val 1 rather than 0?

Contributor Author:

Because the first request should also be counted.

Contributor:

But then for the first request, after counter:incr the counter value equals 2:

-- run: resty --shdict 'counter 10m' incr.lua
local counter = ngx.shared["counter"]
local incr, _ = counter:incr("key", 1, 1)
print("incr: ", incr)
-- out: incr: 2
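
For comparison, a sketch of the same call with init_val = 0, where the first request yields a count of 1 as intended:

-- run: resty --shdict 'counter 10m' incr.lua
local counter = ngx.shared["counter"]
local incr, _ = counter:incr("key", 1, 0)  -- absent key is initialized to 0, then incremented
print("incr: ", incr)
-- out: incr: 1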

Contributor Author:

Nice catch, thank you.

Commenter:

Should we get remaining and check remaining < 0 before incrementing key .. consts.REDIS_COUNTER? Otherwise, requests that are rejected will also be counted.


function _M.rate_limit_with_delayed_sync(conf, counter, to_be_synced, redis_confs, key, cost,
                                         limit, window, script)
    local syncer_started = counter:get(consts.REDIS_SYNCER)
zhoujiexiong (Contributor), Jul 3, 2024:

get-and-test is not atomic; multiple timers could be started.

Consider using SHDICT:add; if it succeeds, then start the timer.
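
A minimal sketch of that add-then-start pattern, reusing names from the surrounding diff (an illustration, not the merged code). A shared-dict add is atomic across workers, so only one caller can set the flag:

    local ok, err = counter:add(consts.REDIS_SYNCER, true)
    if ok then
        -- this caller won the race: start the single syncer timer
        local ok2, err2 = ngx_timer.at(conf.sync_interval, sync_counter_data, counter,
                                       to_be_synced, redis_confs, script)
        if not ok2 then
            counter:delete(consts.REDIS_SYNCER)  -- roll back so a later request can retry
            core.log.error("failed to create redis syncer timer: ", err2)
        end
    elseif err ~= "exists" then
        core.log.error("failed to set syncer flag: ", err)
    end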

Contributor Author:

get and test not atomic

The documentation doesn't say anything like that. Could you share the source of this info? 🤔

Contributor:

So your idea is one syncer per worker process?

If so, why use the SHDICT (consts.REDIS_SYNCER), which is global to all worker processes, to check syncer_started?

That made me mistakenly think your idea was one syncer per nginx server instance.

local syncer_started = counter:get(consts.REDIS_SYNCER)
if not syncer_started then
    local ok, err = ngx_timer.at(conf.sync_interval, sync_counter_data, counter, to_be_synced,
                                 redis_confs, script)
    if ok then
        counter:set(consts.REDIS_SYNCER, true)
    else
        core.log.error("failed to create main redis syncer timer: ", err, ". Will retry next time.")
    end
end

Contributor:

get and test not atomic

the document doesn't say anything like that. Could you share the source of this info? 🤔

If you want to ensure one syncer per nginx server instance (which is not the case here), the following get-and-test logic would need to be atomic:

local syncer_started = counter:get(consts.REDIS_SYNCER)
if not syncer_started then

                                         limit, window, script)
    local syncer_started = counter:get(consts.REDIS_SYNCER)
    if not syncer_started then
        local ok, err = ngx_timer.at(conf.sync_interval, sync_counter_data, counter, to_be_synced,
Contributor:

The syncer is global to an nginx server instance, but conf.sync_interval is a plugin instance's configuration.

Contributor Author:

I don't understand what you mean.

Contributor:

Even with one syncer per worker process, which plugin instance's conf.sync_interval would the syncer apply?

Contributor:

Even with one syncer per worker process, which plugin instance's conf.sync_interval would the syncer apply?

To rephrase: which plugin conf instance's sync_interval would the syncer apply?

Contributor Author:

All plugins will have their own syncer, as the key is unique to each plugin instance:

key = gen_limit_key(conf, ctx, key)

Contributor:

all plugins will have their own syncer as key is unique to each plugin instance.

key = gen_limit_key(conf, ctx, key)

But per the current creation logic, the syncer is global? [screenshot]

end
end

to_be_synced[key] = true -- add to table for syncing
zhoujiexiong (Contributor), Jul 3, 2024:

to_be_synced and redis_confs are module-scoped variables; worker processes that are not running the syncer would have no way to tell the syncer to sync the keys they added.

Contributor Author:

Each worker process will have its own copy of the syncer, so all worker processes will sync the keys they own. What's the problem here? Or do you have a better alternative?

Contributor:

For the one-syncer-per-worker-process scenario, it's OK.

BTW:

  1. Isn't it unnecessary to pass to_be_synced and redis_confs to ngx.timer.at as extra args?
  2. What about using a Redis client instance instead of the Redis config?
     It could look like this: to_be_synced[key] = { redis_cli = red_cli }?

Contributor Author:

using redis client instance instead of redis config?

Cosockets cannot be shared with ngx.timer callbacks.
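
That is, a cosocket object is tied to the request context that created it, so the timer callback has to open its own connection from the stored config. A minimal sketch using lua-resty-redis directly (field names like redis_host and redis_timeout follow the plugin's existing schema; this is an illustration, not the merged code):

    local redis = require("resty.redis")

    local function connect_from_conf(conf)
        -- a fresh cosocket created inside the timer's own context
        local red = redis:new()
        red:set_timeout(conf.redis_timeout)
        local ok, err = red:connect(conf.redis_host, conf.redis_port)
        if not ok then
            return nil, err
        end
        return red
    end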

Contributor Author:

to_be_synced & redis_confs are module scoped variables

I get it now. We need to make to_be_synced and redis_confs visible to the worker process that runs the syncer.

Thanks for your review again.

zhoujiexiong (Contributor), Jul 5, 2024:

to_be_synced & redis_confs are module scoped variables

I get it now. We need to make to_be_synced and redis_confs visible to the worker process that runs the syncer.

Thanks for your review again.

:D

Yes, but these module variables are already visible to the per-worker-process syncer:

local ngx = ngx
local ngx_timer = ngx.timer
local ngx_sleep = ngx.sleep
local ngx_worker_pid = ngx.worker.pid
local table = table
local table_insert = table.insert
local table_remove = table.remove
local str_fmt = string.format

local chan = {}

local timer_cb = function(premature, ...)
    if premature then
        return
    end
    local data = table_remove(chan, 1)
    local msg = str_fmt("--> timer is sched." ..
        " on worker process(%d)," ..
        " \n--> and got data from worker process(%s)" ..
        " through the `chan`",
        ngx_worker_pid(), tostring(data))
    print(msg)
end

assert(ngx_timer.at(0, timer_cb))
local pid = ngx_worker_pid()
table_insert(chan, pid)
print(str_fmt("-> worker(%d)", pid), " will yield for running timer")
ngx_sleep(0)
print(str_fmt("<- worker(%d)", pid), " did yield")

-- out
-- -> worker(23037) will yield for running timer
-- --> timer is sched. on worker process(23037), 
-- --> and got data from worker process(23037) through the `chan`
-- <- worker(23037) did yield

Contributor:

using redis client instance instead of redis config?

Cosockets cannot be shared with ngx.timer callbacks.

Maybe we could consider an implementation other than ngx.timer, e.g. a request-driven syncer:
if a count or time threshold is reached, sync immediately, in place, in the request handler context.
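
A rough sketch of that request-driven alternative (the threshold constants, the consts.LAST_SYNC_TIME key, and the sync_key_to_redis helper are hypothetical names for illustration):

    local SYNC_THRESHOLD = 100  -- hypothetical: sync after this many buffered requests
    local SYNC_PERIOD    = 1    -- hypothetical: or after this many seconds

    local function on_request(conf, counter, key, cost, script)
        local buffered = counter:incr(key .. consts.REDIS_COUNTER, cost, 0)
        local last = counter:get(key .. consts.LAST_SYNC_TIME) or 0
        if buffered >= SYNC_THRESHOLD or ngx.now() - last >= SYNC_PERIOD then
            -- sync in place, in the request handler context, where cosockets are usable
            sync_key_to_redis(conf, counter, key, buffered, script)
            counter:set(key .. consts.LAST_SYNC_TIME, ngx.now())
        end
    end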

core.log.info("syncing shdict num_req counter to redis. remaining: ", remaining,
" ttl: ", ttl, " reqs: ", num_reqs)
counter:set(key .. consts.SHDICT_REDIS_REMAINING, tonumber(remaining), tonumber(ttl))
counter:set(key .. consts.REDIS_COUNTER, 0)
Contributor:

Suggested change:

-    counter:set(key .. consts.REDIS_COUNTER, 0)
+    counter:incr(key .. consts.REDIS_COUNTER, -num_reqs)

Contributor Author:

The counter denotes the number of requests passed since the last sync to Redis, so after syncing to Redis the count should be zero.

Contributor:

[screenshot: requests counted between reading num_reqs and counter:set(..., 0) would be lost; incr(-num_reqs) subtracts only the synced amount]

Contributor Author:

I get it now, you are right. Great catch!!



local function sync_counter_data(premature, counter, to_be_synced, redis_confs, script)
    if premature then
Contributor:

Should the syncer flag be deleted/cleared here?

And what if the syncer's worker process crashes?

Contributor Author:

Syncer flag? What do you mean?

zhoujiexiong (Contributor), Jul 4, 2024:

For the one-syncer-per-worker-process scenario, ignore it. :D

    if premature then
        return
    end
    for key, _ in pairs(to_be_synced) do
Contributor:

Use a sub-function per key?

Contributor Author:

I didn't get what you mean.

Contributor:

Encapsulate a function for synchronizing a single key. :D
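
A sketch of that refactor, pulling the per-key body out of the loop (sync_single_key is a hypothetical name):

    local function sync_single_key(counter, key, conf, script)
        local num_reqs = counter:get(key .. consts.REDIS_COUNTER) or 0
        if num_reqs == 0 then
            return
        end
        -- connect to redis and run the eval script for this one key ...
    end

    local function sync_counter_data(premature, counter, to_be_synced, redis_confs, script)
        if premature then
            return
        end
        for key, _ in pairs(to_be_synced) do
            sync_single_key(counter, key, redis_confs[key], script)
        end
    end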

@apache apache deleted a comment from zhoujiexiong Jul 3, 2024
@apache apache deleted a comment from zhoujiexiong Jul 3, 2024
Contributor:

Format the indentation from 2 spaces to 4?

And make it pass make lint:

ERROR: apisix/plugins/limit-count/delayed_syncer.lua: line 33: getting the Lua global "pairs"
ERROR: apisix/plugins/limit-count/delayed_syncer.lua: line 67: getting the Lua global "tonumber"
ERROR: apisix/plugins/limit-count/delayed_syncer.lua: line 67: getting the Lua global "tonumber"
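
Those lint errors come from the rule against reading Lua globals in hot paths; the usual fix is to localize them once at the top of the module:

    local pairs    = pairs
    local tonumber = tonumber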

zhoujiexiong (Contributor) commented Jul 4, 2024:

@shreemaan-abhishek

one syncer per worker process vs. one syncer per nginx server instance

What do you think?
