business contact validation + tests #2691

Closed


@OlegPhenomenon OlegPhenomenon commented Sep 24, 2024

bundle exec rake company_status:check_all -- --open_data_file_path=lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv --missing_companies_output_path=lib/tasks/data/missing_companies_in_business_registry.csv --deleted_companies_output_path=lib/tasks/data/deleted_companies_from_business_registry.csv --download_path=https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip --soft_delete_enable=false --registrants_only=false

This rake task performs the following actions:

  • downloads an archive
  • unzips it
  • checks all companies from our registry to see if they are in the business registry based on the downloaded data
  • if not present, a query is made to the business registry
  • if a company has been deleted, it is written to the file given by deleted_companies_output_path; if information about the company is missing, it is written to the file given by missing_companies_output_path
  • the company status and validation date are set on the Contact model
  • a flag controls whether soft deletion is performed (needed for the first run).
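
The lookup order described above (open-data file first, registry query only as a fallback) can be sketched roughly as follows. This is an illustration, not the code from this PR: `CheckResult`, `classify_companies`, and the `lookup` callable with its `:registered` / `:deleted` / `nil` return values are all hypothetical names chosen for the example.

```ruby
require 'set'

# Hypothetical result buckets; the names are illustrative, not from the PR.
CheckResult = Struct.new(:present, :deleted, :missing)

# our_codes   - registry codes of the companies in our own registry
# known_codes - Set of registry codes loaded from the open-data CSV
# lookup      - callable that queries the business registry for one code and
#               returns :registered, :deleted, or nil when nothing is known
def classify_companies(our_codes, known_codes, lookup)
  result = CheckResult.new([], [], [])
  our_codes.each do |code|
    # Cheap path: the company is listed in the downloaded file.
    if known_codes.include?(code)
      result.present << code
      next
    end

    # Fallback: ask the business registry directly.
    case lookup.call(code)
    when :deleted then result.deleted << code
    when nil then result.missing << code
    else result.present << code
    end
  end
  result
end
```

In this sketch the `deleted` and `missing` lists correspond to what would be written to deleted_companies_output_path and missing_companies_output_path.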

The task accepts the following parameters:

  • open_data_file_path - specifies where the data is saved and retrieved from. Default value lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv
  • missing_companies_output_path - specifies the path where companies not found in the business registry will be saved. Default value lib/tasks/data/missing_companies_in_business_registry.csv
  • deleted_companies_output_path - specifies the path where companies that have been removed from the registry will be saved. Default value deleted_companies_from_business_registry.csv
  • download_path - specifies where the data will be downloaded from. Default value https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip
  • soft_delete_enable - indicates whether to run soft deletion for companies that have been removed, gone bankrupt, or are missing from the business registry. Default value false
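
As a rough illustration of how such `--name=value` parameters with defaults could be handled, here is a minimal sketch using Ruby's standard `OptionParser`. It is not the parsing code from this PR: `DEFAULTS`, `BOOLEAN_KEYS`, and `parse_task_options` are hypothetical names, the flag is spelled `soft_delete_enable` as in the command line shown earlier, and the `registrants_only` flag from that command is left out because its default is not documented here.

```ruby
require 'optparse'

# Defaults copied from the parameter list above.
DEFAULTS = {
  open_data_file_path: 'lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv',
  missing_companies_output_path: 'lib/tasks/data/missing_companies_in_business_registry.csv',
  deleted_companies_output_path: 'deleted_companies_from_business_registry.csv',
  download_path: 'https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip',
  soft_delete_enable: false
}.freeze

# Keys whose values are parsed as booleans (assumption for this sketch).
BOOLEAN_KEYS = [:soft_delete_enable].freeze

# argv is expected to contain only the arguments that follow the bare `--`
# on the rake command line.
def parse_task_options(argv)
  options = DEFAULTS.dup
  parser = OptionParser.new do |opts|
    DEFAULTS.each_key do |key|
      opts.on("--#{key}=VALUE") do |value|
        options[key] = BOOLEAN_KEYS.include?(key) ? value == 'true' : value
      end
    end
  end
  parser.parse(argv)
  options
end
```

Calling `parse_task_options([])` returns the defaults unchanged, which matches the statement that no parameter is mandatory.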

Since this command already includes default values, it is not necessary to enter any parameters; they were simply added for greater flexibility. Therefore, you can run the following command:
bundle exec rake company_status:check_all

and the data will be available in the directory tmp/

The job:

CompanyRegisterStatusJob.perform_later(days_interval = 14, spam_time_delay = 0.2, batch_size = 100, download_open_data_file_url='https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip')

This job accepts the following parameters:

  • days_interval - selects domains that were last checked more than {days_interval} days ago.
  • spam_time_delay - the delay between consecutive requests to the business registry, so that it is not flooded with queries.
  • batch_size - the size of the batch for processing. This is needed for optimization.
  • download_open_data_file_url - the URL from which to download the business registry data.

All of these parameters have the default values shown above and can be overridden if necessary.
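
To make the interplay of days_interval, batch_size, and spam_time_delay concrete, here is a minimal plain-Ruby sketch. It is not the job's implementation: `CompanyContact`, `due_for_check?`, and `each_due_batch` are hypothetical names, and the real job presumably works on database records rather than an in-memory array.

```ruby
require 'date'

# Hypothetical stand-in for a contact record with its last validation date.
CompanyContact = Struct.new(:ident, :checked_on)

# A contact is due when it was never checked or was last checked
# more than days_interval days ago.
def due_for_check?(contact, days_interval, today)
  contact.checked_on.nil? || contact.checked_on < today - days_interval
end

# Yields each due contact, working through the list in slices of batch_size
# and pausing spam_time_delay between registry queries.
def each_due_batch(contacts, days_interval:, batch_size:, spam_time_delay:, today: Date.today)
  due = contacts.select { |c| due_for_check?(c, days_interval, today) }
  due.each_slice(batch_size) do |batch|
    batch.each do |contact|
      yield contact
      sleep(spam_time_delay)
    end
  end
end
```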

What the job does:

  • It selects Estonian companies that were last checked more than N days ago, companies that are in liquidation, bankrupt, or removed from the registry, and companies with no validation information at all (NULL value).
  • For each of these, a request is made to the registry to determine the status.
  • If the status is K/N or there is no information, we set ForceDelete if it is not already set, or SoftDelete if kandeliik (the entry type) is Kustutamiskanne dokumentide hoidjata (deletion entry without a document keeper).
  • If the previous status was R, and the status in the business registry is R, we simply update the date of the check.
  • If a domain has ForceDelete because of the company's status and the stored status is K/N, but the business registry now shows status R, we cancel ForceDelete.
  • When we set ForceDelete because of bankruptcy, removal of the company from the registry, or its absence from the registry, we write Company no: {ident_number} to the domain's status_notes.
  • If the domain status is L, we send them an email.
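
The rules above can be summarised as a small decision function. The sketch below is one possible reading of them, not the code from this PR: `action_for` and the returned symbols are hypothetical, it returns only the primary action, and it assumes the SoftDelete rule takes precedence over ForceDelete when kandeliik matches.

```ruby
# Entry type (kandeliik) that triggers SoftDelete according to the rules above.
SOFT_DELETE_ENTRY_TYPE = 'Kustutamiskanne dokumentide hoidjata'

# registry_status  - status reported by the business registry ('R', 'K', 'N', 'L') or nil
# previous_status  - status stored on our side from the last check
# force_delete_set - whether ForceDelete is already set because of the company status
# entry_type       - kandeliik reported by the registry, if any
def action_for(registry_status:, previous_status:, force_delete_set:, entry_type: nil)
  case registry_status
  when 'K', 'N', nil
    # Assumption: the SoftDelete rule wins when the entry type matches.
    return :soft_delete if entry_type == SOFT_DELETE_ENTRY_TYPE

    force_delete_set ? :none : :force_delete
  when 'R'
    if force_delete_set && %w[K N].include?(previous_status)
      :cancel_force_delete
    else
      :update_check_date
    end
  when 'L'
    :send_email
  else
    :none
  end
end
```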

We also use a whitelist to skip certain organizations. The whitelist is defined in the application.yml file and has this structure:
whitelist_companies:
  - '12345678'
  - '87654321'
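
A whitelist check against such a structure might look like the sketch below. How the application actually reads application.yml is not shown in this PR, so the inline YAML string, the `WHITELIST` constant, and the `whitelisted?` helper are assumptions made purely for illustration.

```ruby
require 'yaml'

# Inline copy of the structure above; the real application would read it
# from its own configuration instead.
config = YAML.safe_load(<<~CONFIG)
  whitelist_companies:
    - '12345678'
    - '87654321'
CONFIG

WHITELIST = config.fetch('whitelist_companies', []).map(&:to_s).freeze

# Returns true when the company's registry code is on the whitelist
# and the company should therefore be skipped by the check.
def whitelisted?(ident, whitelist = WHITELIST)
  whitelist.include?(ident.to_s)
end
```

Quoting the codes in the YAML keeps them as strings, so the comparison stays string-to-string.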

POTENTIAL PROBLEM: If a large set of companies is checked on a single day and the next check only happens much later (say, a year on), all of those companies become due at the same time, and the job will have to process the whole list in one run. This should be kept in mind.

This PR is related to internetee/company_register#6

related tickets: internetee/company_register#4 internetee/company_register#5

@OlegPhenomenon OlegPhenomenon deleted the business-registry-check-for-company-existinп-2 branch September 26, 2024 09:16