business contact validation + tests #2691
Closed
This rake task performs the following actions:
Therefore, the attributes look like this:
open_data_file_path
- specifies where the data is saved and retrieved from. Default value: lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv
missing_companies_output_path
- specifies the path where companies not found in the business registry will be saved. Default value: lib/tasks/data/missing_companies_in_business_registry.csv
deleted_companies_output_path
- specifies the path where companies that have been removed from the registry will be saved. Default value: deleted_companies_from_business_registry.csv
download_path
- specifies where the data will be downloaded from. Default value: https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip
soft_delete
- indicates whether to run soft deletion for companies that have been removed, gone bankrupt, or are missing from the business registry. Default value: False

Since this command already includes default values, it is not necessary to enter any parameters; they were simply added for greater flexibility. Therefore, you can run the following command:
bundle exec rake company_status:check_all
and the data will be available in the directory tmp/
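As a rough illustration of how the defaults above combine with optional overrides, here is a minimal sketch in plain Ruby. The constant `DEFAULT_OPTIONS` and the method `resolve_options` are hypothetical names used only for this example; how the actual rake task parses its parameters is not shown in this description.

```ruby
# Default values copied from the parameter list in this description.
# DEFAULT_OPTIONS and resolve_options are hypothetical names, not the task's real code.
DEFAULT_OPTIONS = {
  open_data_file_path: "lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv",
  missing_companies_output_path: "lib/tasks/data/missing_companies_in_business_registry.csv",
  deleted_companies_output_path: "deleted_companies_from_business_registry.csv",
  download_path: "https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip",
  soft_delete: false
}.freeze

# Merge caller-supplied overrides over the defaults and reject unknown keys,
# so that a typo in a parameter name fails loudly instead of being ignored.
def resolve_options(overrides = {})
  unknown = overrides.keys - DEFAULT_OPTIONS.keys
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  DEFAULT_OPTIONS.merge(overrides)
end

options = resolve_options(soft_delete: true)
puts options[:soft_delete]
puts options[:open_data_file_path]
```

Calling `resolve_options` with no arguments returns the defaults unchanged, which mirrors running the command without parameters.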
The job:
This job accepts the following parameters:
days_interval
- selects domains that were last checked more than {days_interval} days ago.
spam_time_delay
- the time delay between queries to the business registry.
batch_size
- the size of the batch for processing. This is needed for optimization.
download_open_data_file_url
- the URL from which to download the business registry data.

All of these values have defaults, so they only need to be changed when necessary.
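The way `days_interval`, `batch_size` and `spam_time_delay` interact can be sketched in plain Ruby as follows. This is only an illustration under assumptions: the `Record` struct, its `last_checked_on` field and both method names are hypothetical, and the real job presumably works on ActiveRecord models rather than in-memory arrays.

```ruby
require "date"

# Hypothetical stand-in for whatever record the job checks; not the real model.
Record = Struct.new(:ident, :last_checked_on)

# Select records that were never checked, or whose last check is
# more than days_interval days before the given date.
def due_for_check(records, days_interval:, today: Date.today)
  cutoff = today - days_interval
  records.select { |r| r.last_checked_on.nil? || r.last_checked_on < cutoff }
end

# Walk the due records in slices of batch_size, pausing spam_time_delay
# seconds after each registry query so the registry is not flooded.
def process_in_batches(records, batch_size:, spam_time_delay:)
  records.each_slice(batch_size) do |batch|
    batch.each do |record|
      yield record
      sleep(spam_time_delay) if spam_time_delay.positive?
    end
  end
end

records = [
  Record.new("checked long ago", Date.new(2024, 6, 1)),
  Record.new("checked recently", Date.new(2024, 6, 29)),
  Record.new("never checked", nil)
]

due = due_for_check(records, days_interval: 14, today: Date.new(2024, 6, 30))
process_in_batches(due, batch_size: 2, spam_time_delay: 0) do |record|
  puts "would query the business registry for: #{record.ident}"
end
```

The block passed to `process_in_batches` is where the business registry query would go; here it only prints the record name.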
What the job does:
We also use a whitelist to skip some organizations. The whitelist is set in the application.yml file and has this structure:
whitelist_companies:
POTENTIAL PROBLEM: If we check a large set of data in a single day and the next check is due, say, a year later, then all of those companies become due again at the same time, and the job may have to process a large list of companies exactly one year later. This should be kept in mind.
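The clustering effect can be shown with a small calculation. This is a sketch with made-up numbers, and `due_counts` is a hypothetical helper, not part of the job: it assumes a record becomes due on the first day on which its last check is more than `days_interval` days old.

```ruby
require "date"

# For each last-checked date, compute the first day on which the record is
# again "more than days_interval days old", then count records per due day.
def due_counts(last_checked_dates, days_interval)
  last_checked_dates.each_with_object(Hash.new(0)) do |checked_on, counts|
    counts[checked_on + days_interval + 1] += 1
  end
end

# Illustrative numbers only: 1000 records all checked on the same day.
same_day = Array.new(1000) { Date.new(2024, 1, 1) }

due_counts(same_day, 365).each do |due_on, count|
  puts "#{due_on}: #{count} records become due"
end
```

Because every record shares the same last-checked date, they all land on a single due date, which is the large one-day workload described above.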
This PR is related to internetee/company_register#6.
Related tickets: internetee/company_register#4 internetee/company_register#5