A gem for archiving (deleting) old records you no longer need. Send them straight to tartarus!
Add this line to your application's Gemfile:
gem 'tartarus-rb'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install tartarus-rb
This gem is based on sidekiq-cron, which means you can manage (e.g. disable/enable) jobs from the sidekiq-cron UI.
Here are some examples of how to use it.
Put it in an initializer, e.g. config/initializers/sidekiq.rb, right after loading the schedule for sidekiq-cron:
Sidekiq.configure_server do |config|
  config.on(:startup) do
    schedule_file = "config/schedule.yml"

    if File.exist?(schedule_file) && Sidekiq.server?
      Sidekiq::Cron::Job.load_from_hash YAML.load_file(schedule_file)

      tartarus = Tartarus.new

      tartarus.register do |item|
        item.model = ModelThatYouWantToArchive
        item.cron = "5 4 * * *"
        item.queue = "default"
        item.tenants_range = -> { Account.active }
        item.tenant_value_source = :uuid
        item.tenant_id_field = :account_uuid
        item.archive_items_older_than = -> { 30.days.ago }
        item.timestamp_field = :created_at
        item.archive_with = :destroy_all
      end

      tartarus.register do |item|
        item.model = OtherModelThatYouWantToArchive
        item.cron = "5 5 * * *"
        item.queue = "default"
        item.tenants_range = -> { ["Account", "User"] }
        item.tenant_id_field = :model_type
        item.archive_items_older_than = -> { 30.days.ago }
        item.timestamp_field = :created_at
      end

      glacier_configuration = Tartarus::RemoteStorage::Glacier::Configuration.build(
        aws_key: ENV.fetch("AWS_KEY"),
        aws_secret: ENV.fetch("AWS_SECRET"),
        aws_region: ENV.fetch("AWS_REGION"),
        vault_name: ENV.fetch("GLACIER_VAULT_NAME"),
        root_path: Rails.root.to_s,
        archive_registry_factory: ArchiveRegistry,
      )

      # don't forget about installing `aws-sdk-glacier` gem
      tartarus.register do |item|
        item.model = YetAnotherModel
        item.cron = "5 6 * * *"
        item.queue = "default"
        item.timestamp_field = :created_at
        item.archive_items_older_than = -> { 1.week.ago }
        item.remote_storage = Tartarus::RemoteStorage::Glacier.new(glacier_configuration)
      end

      tartarus.schedule # this method must be called to create jobs for sidekiq-cron!
    end
  end
end
You can use the following config params:
- model - the ActiveRecord model you want to archive, required
- name - name of your strategy, optional. It falls back to model.to_s. It's important to set it when you have several strategies for the same model:
tartarus.register do |item|
  item.model = InternalEvent
  item.name = "archive_account_and_user_internal_events"
  item.cron = "5 5 * * *"
  item.queue = "default"
  item.tenants_range = -> { ["Account", "User"] }
  item.tenant_id_field = :model_type
  item.archive_items_older_than = -> { 30.days.ago }
  item.timestamp_field = :created_at
end

tartarus.register do |item|
  item.model = InternalEvent
  item.name = "archive_post_and_comment_internal_events"
  item.cron = "5 15 * * *"
  item.queue = "default"
  item.tenants_range = -> { ["Post", "Comment"] }
  item.tenant_id_field = :model_type
  item.archive_items_older_than = -> { 10.days.ago }
  item.timestamp_field = :created_at
end
- cron - cron syntax, required
- queue - name of the sidekiq queue you want to use for execution of the jobs, required
- tenants_range - optional, use it if you want to scope items by a tenant (or any field that can be used for partitioning). It doesn't have to be an ActiveRecord collection, it could be just an array. Must be a proc/lambda/object responding to the call method. For an ActiveRecord collection, a find_each loop will be used for optimization.
- tenant_value_source - optional, but required if you want scoping by a tenant/partitioning field. Specifying :uuid here means that the ModelThatYouWantToArchive collection will be scheduled for archiving by the uuid of each Account. It defaults to id.
- tenant_id_field - required when using tenants_range/tenant_value_source. It's the DB column that will be used for scoping records by a tenant. For example, here it would be: ModelThatYouWantToArchive.where(account_uuid: value_of_uuid_from_some_active_account)
- archive_items_older_than - required, defines the retention policy
- timestamp_field - required, the column queried against the value from archive_items_older_than
- archive_with - optional (defaults to delete_all). Could be delete_all, destroy_all, delete_all_without_batches, destroy_all_without_batches, delete_all_using_limit_in_batches
- batch_size - optional (defaults to 10_000, used with the delete_all_using_limit_in_batches strategy)
- remote_storage - optional (defaults to Tartarus::RemoteStorage::Null, which does nothing). Use this option if you want to store the data somewhere before deleting it.
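To make the interplay of tenants_range, tenant_value_source, tenant_id_field, archive_items_older_than and timestamp_field more concrete, here is a minimal plain-Ruby sketch of the selection logic described in the list above. It is not the gem's implementation: the Account and Record structs, the sample data and the select_archivable helper are hypothetical stand-ins for ActiveRecord models and queries.

```ruby
# Hypothetical stand-ins for ActiveRecord models (not part of tartarus-rb).
Account = Struct.new(:id, :uuid)
Record  = Struct.new(:account_uuid, :created_at)

DAY = 24 * 60 * 60
now = Time.now

accounts = [Account.new(1, "uuid-a"), Account.new(2, "uuid-b")]
records = [
  Record.new("uuid-a", now - 40 * DAY), # past retention, tenant a
  Record.new("uuid-a", now - 5 * DAY),  # still fresh, tenant a
  Record.new("uuid-b", now - 90 * DAY), # past retention, tenant b
]

# Settings mirroring the registration options described above.
tenants_range            = -> { accounts }
tenant_value_source      = :uuid
tenant_id_field          = :account_uuid
archive_items_older_than = -> { now - 30 * DAY }
timestamp_field          = :created_at

# Hypothetical helper: picks the records of one tenant that are past retention.
def select_archivable(records, tenant_id_field, tenant_value, timestamp_field, cutoff)
  records.select do |record|
    record.public_send(tenant_id_field) == tenant_value &&
      record.public_send(timestamp_field) < cutoff
  end
end

cutoff = archive_items_older_than.call

# One entry per tenant value, each holding that tenant's archivable records.
archivable_by_tenant = tenants_range.call.to_h do |tenant|
  value = tenant.public_send(tenant_value_source)
  [value, select_archivable(records, tenant_id_field, value, timestamp_field, cutoff)]
end

archivable_by_tenant.each do |tenant_value, items|
  puts "#{tenant_value}: #{items.size} record(s) to archive"
end
```

Each tenant is handled separately here, which mirrors the idea of scheduling one archiving run per tenant value rather than one run over the whole table.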
Currently, only Glacier (for AWS Glacier) is supported. Also, it works only with a Postgres database and requires postgres-copy.
To take advantage of this feature you will need a couple of things:
- Apply acts_as_copy_target (from the postgres-copy gem) to the archivable model.
- Create a model that will be used as a registry for all uploads that happened.
- Install the aws-sdk-glacier gem.
If you want to make the Version model archivable and use ArchiveRegistry as the registry, you will need the following models and tables:
database.create_table(:archive_registries) do |t|
  t.string :glacier_location, null: false
  t.string :glacier_checksum, null: false
  t.string :glacier_archive_id, null: false
  t.string :archivable_model, null: false
  t.string :tenant_id_field
  t.string :tenant_id
  t.datetime :completed_at, null: false
end

database.create_table(:versions) do |t|
end

class Version < ApplicationRecord
  acts_as_copy_target
end

class ArchiveRegistry < ApplicationRecord
end
You can use the above schema for the registry model as it contains all needed fields.
To initialize the service:
glacier_configuration = Tartarus::RemoteStorage::Glacier::Configuration.build(
  aws_key: ENV.fetch("AWS_KEY"),
  aws_secret: ENV.fetch("AWS_SECRET"),
  aws_region: ENV.fetch("AWS_REGION"),
  vault_name: ENV.fetch("GLACIER_VAULT_NAME"),
  root_path: Rails.root.to_s,
  archive_registry_factory: ArchiveRegistry,
)
Tartarus::RemoteStorage::Glacier.new(glacier_configuration)
You can also pass account_id (by default, the "-" string will be used):
glacier_configuration = Tartarus::RemoteStorage::Glacier::Configuration.build(
  aws_key: ENV.fetch("AWS_KEY"),
  aws_secret: ENV.fetch("AWS_SECRET"),
  aws_region: ENV.fetch("AWS_REGION"),
  vault_name: ENV.fetch("GLACIER_VAULT_NAME"),
  root_path: Rails.root.to_s,
  archive_registry_factory: ArchiveRegistry,
  account_id: "some_account_id"
)
Tartarus::RemoteStorage::Glacier.new(glacier_configuration)
Important - do not use Glacier Storage for large batches (> 4 GB) as multipart uploads are not supported yet.
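If you are unsure whether a batch stays under that limit, a simple size check on the exported file can help. The sketch below is only an illustration: fits_single_upload? is a hypothetical helper, not part of the gem, it assumes you have a path to the exported data, and it treats "4 GB" as 4 GiB.

```ruby
require "tempfile"

# Assumption: "4 GB" is read as 4 GiB; adjust the bound if you prefer 4 * 10**9.
MAX_SINGLE_UPLOAD_BYTES = 4 * 1024**3

# Hypothetical helper (not part of tartarus-rb): true when the file at `path`
# is small enough to be sent as a single, non-multipart upload.
def fits_single_upload?(path, limit: MAX_SINGLE_UPLOAD_BYTES)
  File.size(path) <= limit
end

Tempfile.create("archive") do |file|
  file.write("id,created_at\n1,2020-01-01\n") # 27 bytes of sample data
  file.flush

  puts fits_single_upload?(file.path)            # prints "true"
  puts fits_single_upload?(file.path, limit: 10) # prints "false"
end
```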
If you know what you are doing, you can add your own storage, as long as it complies with the following interface:
class Glacier
  attr_reader :configuration
  private :configuration

  def initialize(configuration)
    @configuration = configuration
  end

  def store(collection, archivable_model, tenant_id: nil, tenant_id_field: nil)
  end
end
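As an illustration of that interface, here is a minimal sketch of a custom storage that writes each batch to a local JSON file. Everything in it is an assumption made for the example: the LocalFileStorage name, the Hash configuration with a :directory key, the file naming, and the idea that each element of the collection responds to to_h (with ActiveRecord records you would more likely use attributes). What the gem actually passes as collection is not described here, so treat this as a shape to adapt rather than a drop-in class.

```ruby
require "json"
require "tmpdir"

# Hypothetical storage (not part of tartarus-rb) following the interface above.
class LocalFileStorage
  attr_reader :configuration
  private :configuration

  # `configuration` is assumed to be a Hash with a :directory key.
  def initialize(configuration)
    @configuration = configuration
  end

  # Writes the collection to <directory>/<model>_<field>_<tenant>.json
  # and returns the path of the written file.
  def store(collection, archivable_model, tenant_id: nil, tenant_id_field: nil)
    name = [archivable_model, tenant_id_field, tenant_id].compact.join("_")
    path = File.join(configuration.fetch(:directory), "#{name}.json")
    File.write(path, JSON.generate(collection.map(&:to_h)))
    path
  end
end

Dir.mktmpdir do |dir|
  storage = LocalFileStorage.new({ directory: dir })
  rows = [{ id: 1, event: "signup" }, { id: 2, event: "login" }]

  path = storage.store(rows, "InternalEvent", tenant_id: "User", tenant_id_field: :model_type)

  puts File.basename(path)               # prints "InternalEvent_model_type_User.json"
  puts JSON.parse(File.read(path)).size  # prints "2"
end
```

An instance of such a class would then be assigned to item.remote_storage in place of the Glacier storage.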
You might want to verify that the gem works the way you expect. For that, you will mostly be interested in two use cases:
- scheduling/enqueueing: use Tartarus::ScheduleArchivingModel#schedule, for example Tartarus::ScheduleArchivingModel.new.schedule("PaperTrailVersion"). It is going to enqueue either Tartarus::Sidekiq::ArchiveModelWithTenantJob or Tartarus::Sidekiq::ArchiveModelWithoutTenantJob, depending on the config.
- execution of the archiving logic: use Tartarus::ArchiveModelWithTenant#archive (for example, Tartarus::ArchiveModelWithTenant.new.archive("PaperTrailVersion", "User")) or Tartarus::ArchiveModelWithoutTenant#archive (for example, Tartarus::ArchiveModelWithoutTenant.new.archive("PaperTrailVersion"))
You might also want to check spec/integration to get an idea of how the integration tests were written.
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/tartarus-rb.
The gem is available as open source under the terms of the MIT License.