From e414a60e483a9c1350accce6197db916d7b8499e Mon Sep 17 00:00:00 2001 From: Eric Lipe Date: Mon, 26 Aug 2024 09:32:10 -0400 Subject: [PATCH 1/5] - initial commit with templates - halfway completed tm for new multi select filter updates --- .../multi-select-filters.md | 26 ++++++++ .../tech-memos/reparse.md | 48 ++++++++++++++ .../tech-memos/sequential-reparse.md | 62 +++++++++++++++++++ .../tech-memos/tm-template.md | 26 ++++++++ 4 files changed, 162 insertions(+) create mode 100644 docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md create mode 100644 docs/Technical-Documentation/tech-memos/reparse.md create mode 100644 docs/Technical-Documentation/tech-memos/sequential-reparse.md create mode 100644 docs/Technical-Documentation/tech-memos/tm-template.md diff --git a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md new file mode 100644 index 0000000000..fb149f2858 --- /dev/null +++ b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md @@ -0,0 +1,26 @@ +# Multi-Select Filters + +**Audience**: TDP Software Engineers
+**Subject**: Multi-Select Filter Integration
+**Date**: August 8, 2024
+ +## Summary +This is a template to use to create new technical memorandums. + +## Background +TDP has been expanding it's Django Admin Console (DAC) filtering capabilities by introducing custom filters, specifically multi-select filters. This has introduced a myriad of issues because TDP does not use the default DAC. Instead, to assist with accessability compliance TDP wraps the default DAC with [Django 508](https://github.com/raft-tech/django-admin-508) (henceforth referred to as 508) which makes various updates to the styling and functionality of the default DAC. A key change is that 508 introduces to the DAC is an `Apply Filters` button that intercepts query string parameters from default DAC filters and only applies them after clicking the button. The default DAC applies the filters as they are selected as opposed to all at once. The issue with 508's approach is that it assumes all filters are builtin Django filters (i.e. single select filters). This presents a discrepancy because Django allows developers to write custom templates and filters to add further filtering functionality (e.g. multi-select filters). + +## Out of Scope +Call out what is out of scope for this technical memorandum and should be considered in a different technical memorandum. + +## Method/Design +This section should contain sub sections that provide general implementation details surrounding key components required to implement the feature. + +### Sub header (piece of the design, can be many of these) +sub header content describing component. + +## Affected Systems +provide a list of systems this feature will depend on/change. + +## Use and Test cases to consider +provide a list of use cases and test cases to be considered when the feature is being implemented. diff --git a/docs/Technical-Documentation/tech-memos/reparse.md b/docs/Technical-Documentation/tech-memos/reparse.md new file mode 100644 index 0000000000..5c1f5991e8 --- /dev/null +++ b/docs/Technical-Documentation/tech-memos/reparse.md @@ -0,0 +1,48 @@ +# Reparsing + +**Audience**: TDP Software Engineers
+**Subject**: Reparsing
+**Date**: August 9, 2024
+ +## Summary +Re-parsing improves the flexibility of TDP's workflow for ingesting data files. These enhancement requests came out of pragmatic needs by the administrator user of the tool from 3041 and theoretical concerns from the development team in addressing current system limitations with this new feature. + +## Background +https://github.com/raft-tech/TANF-app/issues/2870 +https://github.com/raft-tech/TANF-app/issues/2820 +https://github.com/raft-tech/TANF-app/releases/tag/v3.2.0-Sprint-90 +https://github.com/raft-tech/TANF-app/pull/2772 +https://github.com/raft-tech/TANF-app/issues/1858 +https://github.com/raft-tech/TANF-app/issues/1350 + +[Driving force of reparsing](https://github.com/raft-tech/TANF-app/issues/2870) +- Reparsing files that are stuck in pending to some other state because validators have changed, or the parser has better exception handling + +## Out of Scope +- Parsing and/or validator logic changes +- Data Model or search_indices changes +- Systemic/Infrastructure changes to accommodate large data sets +- End-user facing changes to our frontend +- Pipeline or Orchestration changes + +## Method/Design +The reparsing enhancements focus on a maturization of the clean_and_reparse.py django commando which needed CLI invocation by system administrator(s). To mature and polish this feature to meet our new deliverables, we plan to shift major functionality and visibility into the Administrator Console to leverage our existing tools within. + +#3004 introduced an initial pass at reparsing. From this key components were identified that would improve both reparsing and it's usability for the system administrators. The following items were identified to enhance the reparsing feature: introduce a Django model that tracks meta data surrounding the reparsing event, managing data synchronization and parallel execution of reparsing events, and moving away from the current CLI interface in favor of a DAC specific way to execute reparsing. + +### Meta Model +This enhancement will seek to improve our visibility into what has happened during execution of a re-parsing command. We believe creating a database model to store relevant fields about the run will improve usability. Fields will include (start time, end time, number of files processed, which files were targetted, number of records repopulated, etc.) + +### Data Synchronization +... + +### DAC Reparse Action +To mature and polish this feature it should no longer be executed from the CLI. The DAC provides all/most of the necessary filtering required to specify what datafiles to reparse. Adding a new `reparse` action to the `DataFiles` page in the DAC provides a seamless experience for the admins while also providing the reparse event with the appropriate datafiles. + #### Confirmation dialog asking "are you sure you want to reparse?" + +## Affected Systems +- Elastic +- Postgres (records, dfs, datafiles, parser errors) + +## Use and Test cases to consider +provide a list of use cases and test cases to be considered when the feature is being implemented. diff --git a/docs/Technical-Documentation/tech-memos/sequential-reparse.md b/docs/Technical-Documentation/tech-memos/sequential-reparse.md new file mode 100644 index 0000000000..077d21f367 --- /dev/null +++ b/docs/Technical-Documentation/tech-memos/sequential-reparse.md @@ -0,0 +1,62 @@ +# Guarantee Sequential Reparse Events + +**Audience**: TDP Software Engineers
+**Subject**: Sequential Reparsing
+**Date**: August 8, 2024
+ + +## Summary +This technical memorandum aims to provide a software engineer with initial research, design patterns, and ideas necessary +to implement sequential reparsing in the TDP application. This document covers distributed/parallel data safety, how +the data synchronization allows sequential execution guarantees, and a last ditch timeout calculation necessary to +guarantee sequential reparse events at the application level. This memorandum does not take into account network partition tolerance or parsing idempotence. + +## Background +When a reparse event is executed by an admin user a set of size N files can be selected where N is on the range +[0, # of datafiles in DB]. For each reparsing event, a ReparseMeta Django model is created to track meta data about the +event such as: the number of files to be reparsed, the number of records deleted before reparsing, the number of records +created during reparsing, a backup location, etc... The meta model also contains the fields: `files_completed`, and +`files_failed`. These two fields were added to the model for it to be able to track when all files in it's set of files +had finished the parsing process, regardless of whether they passed or failed parsing. + +## Distributed/Parallel Data Safety +In the [Background](#background) section the meta model and some of it's fields were introduced along with the idea that +a reparse event generates N parsing tasks. Because (theoretically) all the tasks can execute in parallel, and there is +only one meta model per event, the meta model inherently becomes a shared object and therefore must be synchronized +across the set of N parsing tasks. There are many ways to synchronize data in a distributed system, both custom and not. +However, because the meta model is a database object, this technical memorandum suggests using the already tested and +vetted concurrency control and synchronization mechanisms inherent to TDPs Postgres database. That is for the fields in +the meta model that need to be updated in parallel (`files_completed`, `files_failed`, `num_records_created`), the +implementing engineer should ensure to leverage Django queries that convert to minimumly scoped locking database +transactions. This memorandum suggests leveraging the [select_for_update()](https://docs.djangoproject.com/en/5.0/ref/models/querysets/#select-for-update) query provides row based locking for transactions in a Postgres environment. Using this +query ensures that whichever task executes it first will be the only task that can update the fields. All other tasks trying to query the model for updates will be blocked until the original task releases the lock. Thus, each parser task can query the appropriate meta model, update the appropriate fields, and continue on as normal. The one caveat to this approach is that whenever an update needs to be made, the task must explicitely re-query the meta model to avoid any race conditions and stale +data. An piece of example code is given below to demostrate how the implementer might update the `files_completed` field. Note the function was implemented as a static member of the ReparseMeta class. + +```python +@staticmethod +def increment_files_completed(reparse_meta_models): + """ + Increment the count of files that have completed parsing for the datafile's current/latest reparse model. + + Because this function can be called in parallel we use `select_for_update` because multiple parse tasks can + referrence the same ReparseMeta object that is being queried below. `select_for_update` provides a DB lock on + the object and forces other transactions on the object to wait until this one completes. + """ + if reparse_meta_models.exists(): + with transaction.atomic(): + try: + meta_model = reparse_meta_models.select_for_update().latest("pk") + meta_model.files_completed += 1 + if ReparseMeta.assert_all_files_done(meta_model): + ReparseMeta.set_reparse_finished(meta_model) + meta_model.save() + except DatabaseError: + logger.exception("Encountered exception while trying to update the `files_reparsed` field on the " + f"ReparseMeta object with ID: {meta_model.pk}.") +``` + +## Sequential Execution +... + +## Last Ditch Timeout +... diff --git a/docs/Technical-Documentation/tech-memos/tm-template.md b/docs/Technical-Documentation/tech-memos/tm-template.md new file mode 100644 index 0000000000..0921d38883 --- /dev/null +++ b/docs/Technical-Documentation/tech-memos/tm-template.md @@ -0,0 +1,26 @@ +# TITLE + +**Audience**: TDP Software Engineers
+**Subject**: SUBJECT/TITLE
+**Date**: August 8, 2024
+ +## Summary +This is a template to use to create new technical memorandums. + +## Background (Optional) +Background for the feature if necessary. + +## Out of Scope +Call out what is out of scope for this technical memorandum and should be considered in a different technical memorandum. + +## Method/Design +This section should contain sub sections that provide general implementation details surrounding key components required to implement the feature. + +### Sub header (piece of the design, can be many of these) +sub header content describing component. + +## Affected Systems +provide a list of systems this feature will depend on/change. + +## Use and Test cases to consider +provide a list of use cases and test cases to be considered when the feature is being implemented. From 447ff840ac0ef3f2e6479c9c14fe843659bc22ce Mon Sep 17 00:00:00 2001 From: Eric Lipe Date: Tue, 27 Aug 2024 09:55:58 -0400 Subject: [PATCH 2/5] - Mostly complete tech memo - assets --- .../multi-select-filters.md | 161 +++++++++++++++++- .../multiselectdropdownfilter.html | 18 ++ 2 files changed, 171 insertions(+), 8 deletions(-) create mode 100644 docs/Technical-Documentation/tech-memos/multi-select-fiters/multiselectdropdownfilter.html diff --git a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md index fb149f2858..2902cb2d32 100644 --- a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md +++ b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md @@ -2,25 +2,170 @@ **Audience**: TDP Software Engineers
**Subject**: Multi-Select Filter Integration
-**Date**: August 8, 2024
+**Date**: August 27, 2024
## Summary This is a template to use to create new technical memorandums. ## Background -TDP has been expanding it's Django Admin Console (DAC) filtering capabilities by introducing custom filters, specifically multi-select filters. This has introduced a myriad of issues because TDP does not use the default DAC. Instead, to assist with accessability compliance TDP wraps the default DAC with [Django 508](https://github.com/raft-tech/django-admin-508) (henceforth referred to as 508) which makes various updates to the styling and functionality of the default DAC. A key change is that 508 introduces to the DAC is an `Apply Filters` button that intercepts query string parameters from default DAC filters and only applies them after clicking the button. The default DAC applies the filters as they are selected as opposed to all at once. The issue with 508's approach is that it assumes all filters are builtin Django filters (i.e. single select filters). This presents a discrepancy because Django allows developers to write custom templates and filters to add further filtering functionality (e.g. multi-select filters). +TDP has been expanding it's Django Admin Console (DAC) filtering capabilities by introducing custom filters, specifically multi-select filters. This has introduced a myriad of issues because TDP does not use the default DAC. Instead, to assist with accessibility compliance TDP wraps the default DAC with [Django 508](https://github.com/raft-tech/django-admin-508) (henceforth referred to as 508) which makes various updates to the styling and functionality of the default DAC. A key change is that 508 introduces to the DAC is an `Apply Filters` button that intercepts query string parameters from default DAC filters and only applies them after clicking the button. The default DAC applies the filters as they are selected as opposed to all at once. The issue with 508's approach is that it assumes all filters are built-in Django filters (i.e. single select filters). This presents a discrepancy because Django allows developers to write custom templates and filters to add further filtering functionality (e.g. multi-select filters). ## Out of Scope -Call out what is out of scope for this technical memorandum and should be considered in a different technical memorandum. +General filter template specification and general property based multi-select filtering mentioned in the [TDP Updates](#tdp-updates) section of this memorandum are out of scope for this memorandum. ## Method/Design -This section should contain sub sections that provide general implementation details surrounding key components required to implement the feature. +To support multi-select/custom filtering in the DAC, both the TDP repository and the 508 repository will require updates. + +### Django 508 Updates +508 builds the query string for all filters on the currently selected DAC page with the [dropdown-filter.js](https://github.com/raft-tech/django-admin-508/blob/main/admin_interface/static/admin_interface/508/dropdown-filter.js) JavaScript file. This file defines a JQuery function that operates on the `changelist-filter` element in the DOM. The function adds `onchange` event handlers to each filter in the `changelist-filter` element which extract the filter's query string template value that gets selected when it changes to construct a new query string. However, when custom templates and custom filters are introduced the JQuery function breaks down and cannot handle it. The implementation of the single-select and multi-select query building cannot be unified. This is because Django built-in single-select filters define a single prop on the `option` elements for the filter. The `value` prop that is defined on all the `option` elements is that `option`'s query parameter with the rest of the current query string appended to it. This same implementation cannot be achieved on multi-select filters because the query string cannot (and should not) contain multiple queries of the same type. This implies single-select and multi-select filters have to be handled in 508 differently. The update to `dropdown-filter.js` provided below serves as a guide towards a final solution for integrating multi-select filters, single-select filters, and the `Apply Filters` button. The implementation below relies on two key facts. One, all multi-select filters define `ariaMultiSelectable` and two, that all multi-select filters define two custom props: `key` and `value`. These key value pairs (e.g. `key=name__in`, `value=Bob`) are used to build the query string along with whatever the remaining single-select filters have chosen. When a user clicks the `Apply Filters` button, the code below executes and builds the query string for single and multi-select filters. + +```javascript +if (typeof (django) !== 'undefined' && typeof (django.jQuery) !== 'undefined') { + (function ($) { + 'use strict'; + $(document).ready(function () { + const filters = document.querySelectorAll('#changelist-filter .list-filter-dropdown select') + let query = '?' + + const applyFiltersButton = document.querySelector('#submit-filters'); + if (applyFiltersButton) { + applyFiltersButton.onclick = function () { + for (const filter of filters) { + let conjunction = query === '?' ? '' : '&' + if (!filter.ariaMultiSelectable) { + if (filter.selectedIndex !== 0) { + // Built in Django filters append the query string to the `value` field on the element. However, when we + // have a mult-selectable filter, the select element can't have the `value` field as a query string + // because multiple options can be selected and there is no way to track that. Therefore, we strip the + // single select filters query param from the existing query string and build and entirely new query + // string from that. + let opt = filter.options[filter.selectedIndex] + let query_str = opt.value + let filter_query = '' + for (let i = 1; i < query_str.length; i++) { + if (query_str[i] === '&') { + break + } + filter_query += query_str[i] + } + query = query.concat(conjunction, filter_query) + } + } + else { + // All multi select filters are required to set the `key` and `value` fields on the option element to the + // individual options to be able to build the correct query string. + let selected = '' + for (const option of filter.options) { + if (option.selected) { + selected = selected.concat(option.value, '%2C') + } + } + selected = selected.substring(0, selected.lastIndexOf('%2C')) + if (selected !== '') { + query = query.concat(conjunction, filter.options[0].getAttribute('key'), '=', selected) + } + } + } + window.location = query + }; + } + }); + })(django.jQuery); +} +``` + +### TDP Updates +Currently, TDP implements a custom multi-select filter, with a custom template. This filter is complex and relies on a custom "filter" button to apply it's selection which is a very incohesive with the `Apply Filters` button that 508 introduces. To remedy this, the current and future multi-select/custom filters implemented in TDP need to give 508 control of constructing the appropriate query string by way of supplying 508 with the appropriate key-value pairs needed. In doing so, we can also simplify and generalize the current multi-select filter available in TDP. + +TDP currently utilizes three classes to implement field based multi-select filtering. This can be simplified to the single class below when we let 508 manage query string building. There are three main features to note in this class. The first is the custom template used to create a multi-select drop down filter: [multiselectdropdownfilter.html](multiselectdropdownfilter.html). The second is the unique query string parameters that are defined in the `choices` method of the `FieldListMultiSelectFilter` class: `key` and `value`. These are the parameters that Django will populate into the aforementioned template and that 508 will parse to build the appropriate query string. The third and final feature to note is how the `FieldListMultiSelectFilter` overrides and adds new parameters in the constructor before calling `super()`. These additions and overrides help convert the parent class `AllValuesFieldListFilter` from a single-select to a multi-select filter. These three features allow TDP to support multi-select filtering based on Django model fields. However, this MVP implementation also provides a path forward for Django model property based multi-select filtering (e.g. the `fiscal_period` property on the `DataFile` model). Leveraging the aforementioned template and building a class which sub-classes the Django `SimpleListFiter` class we will have the ability to provide general multi-select filtering for Django model properties by implementing the correct `queryset`, `choices`, and `lookups` methods. + +```python +class FieldListMultiSelectFilter(admin.AllValuesFieldListFilter): + """Multi select dropdown filter for all kind of fields.""" + + template = 'multiselectdropdownfilter.html' + + def __init__(self, field, request, params, model, model_admin, field_path): + self.lookup_kwarg = '%s__in' % field_path + self.lookup_kwarg_isnull = '%s__isnull' % field_path + lookup_vals = request.GET.get(self.lookup_kwarg) + self.lookup_vals = lookup_vals.split(',') if lookup_vals else list() + self.lookup_val_isnull = request.GET.get(self.lookup_kwarg_isnull) + self.empty_value_display = model_admin.get_empty_value_display() + parent_model, reverse_path = reverse_field_path(model, field_path) + # Obey parent ModelAdmin queryset when deciding which options to show + if model == parent_model: + queryset = model_admin.get_queryset(request) + else: + queryset = parent_model._default_manager.all() + self.lookup_choices = (queryset + .distinct() + .order_by(field.name) + .values_list(field.name, flat=True)) + super(admin.AllValuesFieldListFilter, self).__init__(field, request, params, model, model_admin, field_path) + + def queryset(self, request, queryset): + """Build queryset based on choices.""" + params = Q() + for lookup_arg, value in self.used_parameters.items(): + params |= Q(**{lookup_arg: value}) + try: + return queryset.filter(params) + except (ValueError, ValidationError) as e: + # Fields may raise a ValueError or ValidationError when converting + # the parameters to the correct type. + raise IncorrectLookupParameters(e) + + def prepare_querystring_value(self, value): + """Mask commas.""" + return str(value).replace(',', '%~') + + def choices(self, changelist): + """Generate choices.""" + add_facets = getattr(changelist, "add_facets", False) + facet_counts = self.get_facet_queryset(changelist) if add_facets else None + query_string = changelist.get_query_string({}, [self.lookup_kwarg, self.lookup_kwarg_isnull]) + yield { + 'selected': not self.lookup_vals and self.lookup_val_isnull is None, + 'query_string': query_string, + 'display': _('All'), + } + include_none = False + count = None + empty_title = self.empty_value_display + for i, val in enumerate(self.lookup_choices): + if add_facets: + count = facet_counts[f"{i}__c"] + if val is None: + include_none = True + empty_title = f"{empty_title} ({count})" if add_facets else empty_title + continue + + val = str(val) + qval = self.prepare_querystring_value(val) + yield { + 'selected': qval in self.lookup_vals, + 'query_string': query_string, + "display": f"{val} ({count})" if add_facets else val, + 'value': urllib.parse.quote_plus(val), + 'key': self.lookup_kwarg, + } + if include_none: + yield { + 'selected': bool(self.lookup_val_isnull), + 'query_string': query_string, + "display": empty_title, + 'value': 'True', + 'key': self.lookup_kwarg_isnull, + } +``` -### Sub header (piece of the design, can be many of these) -sub header content describing component. ## Affected Systems -provide a list of systems this feature will depend on/change. +- Django 508 +- TANF-App ## Use and Test cases to consider -provide a list of use cases and test cases to be considered when the feature is being implemented. +- Consider adding 508 integration tests for all/most Django fields for the suggested `FieldListMultiSelectFilter` +- Test having multiple Django built-in and `FieldListMultiSelectFilter`'s on the same page and verify the query string + diff --git a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multiselectdropdownfilter.html b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multiselectdropdownfilter.html new file mode 100644 index 0000000000..4cec46c779 --- /dev/null +++ b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multiselectdropdownfilter.html @@ -0,0 +1,18 @@ +{% load i18n admin_urls %} +
+

{% blocktrans with filter_title=title %} By {{ filter_title }} {% endblocktrans %}

+ +
+ +
+
\ No newline at end of file From 669ba45540cf76a7f173a8741f9141c533be8131 Mon Sep 17 00:00:00 2001 From: Eric Lipe Date: Wed, 28 Aug 2024 12:38:20 -0400 Subject: [PATCH 3/5] - add summary --- .../tech-memos/multi-select-fiters/multi-select-filters.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md index 2902cb2d32..5a6ae1ade1 100644 --- a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md +++ b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md @@ -5,7 +5,7 @@ **Date**: August 27, 2024
## Summary -This is a template to use to create new technical memorandums. +This technical memorandum provides the suggested guidelines for a future engineer to integrate TDP's need for multi-select filtering with Django 508. The memorandum provides some necessary background on both the TDP multi-select filters as well as Django 508 and it's purpose and effects. The [Method](#method) section provides the guidelines and updates required to integrate TDP's custom filtering needs with Django 508. Specifically, the [Django 508 Updates](#django-508-updates) section introduces the engineer to the area where the filtering and query string building occurs within Django 508 and the suggested changes. The [TDP Updates](#tdp-updates) section introduces the recommended changes to current TDP custom filtering with respect to how it can be simplified and how it could be unified with Django 508 to provide a seamless filtering experience. ## Background TDP has been expanding it's Django Admin Console (DAC) filtering capabilities by introducing custom filters, specifically multi-select filters. This has introduced a myriad of issues because TDP does not use the default DAC. Instead, to assist with accessibility compliance TDP wraps the default DAC with [Django 508](https://github.com/raft-tech/django-admin-508) (henceforth referred to as 508) which makes various updates to the styling and functionality of the default DAC. A key change is that 508 introduces to the DAC is an `Apply Filters` button that intercepts query string parameters from default DAC filters and only applies them after clicking the button. The default DAC applies the filters as they are selected as opposed to all at once. The issue with 508's approach is that it assumes all filters are built-in Django filters (i.e. single select filters). This presents a discrepancy because Django allows developers to write custom templates and filters to add further filtering functionality (e.g. multi-select filters). @@ -13,7 +13,7 @@ TDP has been expanding it's Django Admin Console (DAC) filtering capabilities by ## Out of Scope General filter template specification and general property based multi-select filtering mentioned in the [TDP Updates](#tdp-updates) section of this memorandum are out of scope for this memorandum. -## Method/Design +## Method To support multi-select/custom filtering in the DAC, both the TDP repository and the 508 repository will require updates. ### Django 508 Updates From 415cc7f51e940e545f76d648d668d1a60c6985e3 Mon Sep 17 00:00:00 2001 From: Eric Lipe Date: Fri, 6 Sep 2024 10:21:24 -0400 Subject: [PATCH 4/5] - Change language in the tdp section --- .../tech-memos/multi-select-fiters/multi-select-filters.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md index 5a6ae1ade1..93d668bd6a 100644 --- a/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md +++ b/docs/Technical-Documentation/tech-memos/multi-select-fiters/multi-select-filters.md @@ -75,9 +75,9 @@ if (typeof (django) !== 'undefined' && typeof (django.jQuery) !== 'undefined') { ``` ### TDP Updates -Currently, TDP implements a custom multi-select filter, with a custom template. This filter is complex and relies on a custom "filter" button to apply it's selection which is a very incohesive with the `Apply Filters` button that 508 introduces. To remedy this, the current and future multi-select/custom filters implemented in TDP need to give 508 control of constructing the appropriate query string by way of supplying 508 with the appropriate key-value pairs needed. In doing so, we can also simplify and generalize the current multi-select filter available in TDP. +Currently, TDP implements a custom multi-select filter, with a custom template. This filter is complex and relies on a custom "filter" button to apply it's selection which is very incohesive with the `Apply Filters` button that 508 introduces. To remedy this, the current and future multi-select/custom filters implemented in TDP need to give 508 control of constructing the appropriate query string by way of supplying 508 with the appropriate key-value pairs needed in their templates. In doing so, we can also simplify and generalize the current multi-select filter available in TDP. -TDP currently utilizes three classes to implement field based multi-select filtering. This can be simplified to the single class below when we let 508 manage query string building. There are three main features to note in this class. The first is the custom template used to create a multi-select drop down filter: [multiselectdropdownfilter.html](multiselectdropdownfilter.html). The second is the unique query string parameters that are defined in the `choices` method of the `FieldListMultiSelectFilter` class: `key` and `value`. These are the parameters that Django will populate into the aforementioned template and that 508 will parse to build the appropriate query string. The third and final feature to note is how the `FieldListMultiSelectFilter` overrides and adds new parameters in the constructor before calling `super()`. These additions and overrides help convert the parent class `AllValuesFieldListFilter` from a single-select to a multi-select filter. These three features allow TDP to support multi-select filtering based on Django model fields. However, this MVP implementation also provides a path forward for Django model property based multi-select filtering (e.g. the `fiscal_period` property on the `DataFile` model). Leveraging the aforementioned template and building a class which sub-classes the Django `SimpleListFiter` class we will have the ability to provide general multi-select filtering for Django model properties by implementing the correct `queryset`, `choices`, and `lookups` methods. +TDP currently utilizes three classes to implement field based multi-select filtering. This should be able to be simplified to a single class when we let 508 manage query string building. There are a few main features to note that this type of class would need. The first is the custom template used to create a multi-select drop down filter. An example template is the [multiselectdropdownfilter.html](multiselectdropdownfilter.html). The second are the unique query string parameters that would need to be defined in the `choices` method of the class. E.g the `key` and `value` parameters. These are the parameters that Django would populate into the aforementioned template and that 508 will need to parse to build the appropriate query string. Below is an example class that allows for field based multi-select filtering. The example class introduces some extra caveats to handle multi-select functionality. The `FieldListMultiSelectFilter` overrides and adds new parameters in the constructor before calling `super()`. These additions and overrides help convert the parent class `AllValuesFieldListFilter` from a single-select to a multi-select filter. Looking forward towards the future of TDP filtering, the example class implementation provides a path forward for Django model property based multi-select filtering (e.g. the `fiscal_period` property on the `DataFile` model). Leveraging the aforementioned template and building a class which sub-classes the Django `SimpleListFiter` class we have the ability to provide general multi-select filtering for Django model properties by implementing the correct `queryset`, `choices`, and `lookups` methods. ```python class FieldListMultiSelectFilter(admin.AllValuesFieldListFilter): From a354eed503c00e0c77eec211f91f768431e542bf Mon Sep 17 00:00:00 2001 From: Eric Lipe Date: Tue, 10 Sep 2024 11:30:34 -0400 Subject: [PATCH 5/5] - remove old TMs --- .../tech-memos/reparse.md | 48 -------------- .../tech-memos/sequential-reparse.md | 62 ------------------- 2 files changed, 110 deletions(-) delete mode 100644 docs/Technical-Documentation/tech-memos/reparse.md delete mode 100644 docs/Technical-Documentation/tech-memos/sequential-reparse.md diff --git a/docs/Technical-Documentation/tech-memos/reparse.md b/docs/Technical-Documentation/tech-memos/reparse.md deleted file mode 100644 index 5c1f5991e8..0000000000 --- a/docs/Technical-Documentation/tech-memos/reparse.md +++ /dev/null @@ -1,48 +0,0 @@ -# Reparsing - -**Audience**: TDP Software Engineers
-**Subject**: Reparsing
-**Date**: August 9, 2024
- -## Summary -Re-parsing improves the flexibility of TDP's workflow for ingesting data files. These enhancement requests came out of pragmatic needs by the administrator user of the tool from 3041 and theoretical concerns from the development team in addressing current system limitations with this new feature. - -## Background -https://github.com/raft-tech/TANF-app/issues/2870 -https://github.com/raft-tech/TANF-app/issues/2820 -https://github.com/raft-tech/TANF-app/releases/tag/v3.2.0-Sprint-90 -https://github.com/raft-tech/TANF-app/pull/2772 -https://github.com/raft-tech/TANF-app/issues/1858 -https://github.com/raft-tech/TANF-app/issues/1350 - -[Driving force of reparsing](https://github.com/raft-tech/TANF-app/issues/2870) -- Reparsing files that are stuck in pending to some other state because validators have changed, or the parser has better exception handling - -## Out of Scope -- Parsing and/or validator logic changes -- Data Model or search_indices changes -- Systemic/Infrastructure changes to accommodate large data sets -- End-user facing changes to our frontend -- Pipeline or Orchestration changes - -## Method/Design -The reparsing enhancements focus on a maturization of the clean_and_reparse.py django commando which needed CLI invocation by system administrator(s). To mature and polish this feature to meet our new deliverables, we plan to shift major functionality and visibility into the Administrator Console to leverage our existing tools within. - -#3004 introduced an initial pass at reparsing. From this key components were identified that would improve both reparsing and it's usability for the system administrators. The following items were identified to enhance the reparsing feature: introduce a Django model that tracks meta data surrounding the reparsing event, managing data synchronization and parallel execution of reparsing events, and moving away from the current CLI interface in favor of a DAC specific way to execute reparsing. - -### Meta Model -This enhancement will seek to improve our visibility into what has happened during execution of a re-parsing command. We believe creating a database model to store relevant fields about the run will improve usability. Fields will include (start time, end time, number of files processed, which files were targetted, number of records repopulated, etc.) - -### Data Synchronization -... - -### DAC Reparse Action -To mature and polish this feature it should no longer be executed from the CLI. The DAC provides all/most of the necessary filtering required to specify what datafiles to reparse. Adding a new `reparse` action to the `DataFiles` page in the DAC provides a seamless experience for the admins while also providing the reparse event with the appropriate datafiles. - #### Confirmation dialog asking "are you sure you want to reparse?" - -## Affected Systems -- Elastic -- Postgres (records, dfs, datafiles, parser errors) - -## Use and Test cases to consider -provide a list of use cases and test cases to be considered when the feature is being implemented. diff --git a/docs/Technical-Documentation/tech-memos/sequential-reparse.md b/docs/Technical-Documentation/tech-memos/sequential-reparse.md deleted file mode 100644 index 077d21f367..0000000000 --- a/docs/Technical-Documentation/tech-memos/sequential-reparse.md +++ /dev/null @@ -1,62 +0,0 @@ -# Guarantee Sequential Reparse Events - -**Audience**: TDP Software Engineers
-**Subject**: Sequential Reparsing
-**Date**: August 8, 2024
- - -## Summary -This technical memorandum aims to provide a software engineer with initial research, design patterns, and ideas necessary -to implement sequential reparsing in the TDP application. This document covers distributed/parallel data safety, how -the data synchronization allows sequential execution guarantees, and a last ditch timeout calculation necessary to -guarantee sequential reparse events at the application level. This memorandum does not take into account network partition tolerance or parsing idempotence. - -## Background -When a reparse event is executed by an admin user a set of size N files can be selected where N is on the range -[0, # of datafiles in DB]. For each reparsing event, a ReparseMeta Django model is created to track meta data about the -event such as: the number of files to be reparsed, the number of records deleted before reparsing, the number of records -created during reparsing, a backup location, etc... The meta model also contains the fields: `files_completed`, and -`files_failed`. These two fields were added to the model for it to be able to track when all files in it's set of files -had finished the parsing process, regardless of whether they passed or failed parsing. - -## Distributed/Parallel Data Safety -In the [Background](#background) section the meta model and some of it's fields were introduced along with the idea that -a reparse event generates N parsing tasks. Because (theoretically) all the tasks can execute in parallel, and there is -only one meta model per event, the meta model inherently becomes a shared object and therefore must be synchronized -across the set of N parsing tasks. There are many ways to synchronize data in a distributed system, both custom and not. -However, because the meta model is a database object, this technical memorandum suggests using the already tested and -vetted concurrency control and synchronization mechanisms inherent to TDPs Postgres database. That is for the fields in -the meta model that need to be updated in parallel (`files_completed`, `files_failed`, `num_records_created`), the -implementing engineer should ensure to leverage Django queries that convert to minimumly scoped locking database -transactions. This memorandum suggests leveraging the [select_for_update()](https://docs.djangoproject.com/en/5.0/ref/models/querysets/#select-for-update) query provides row based locking for transactions in a Postgres environment. Using this -query ensures that whichever task executes it first will be the only task that can update the fields. All other tasks trying to query the model for updates will be blocked until the original task releases the lock. Thus, each parser task can query the appropriate meta model, update the appropriate fields, and continue on as normal. The one caveat to this approach is that whenever an update needs to be made, the task must explicitely re-query the meta model to avoid any race conditions and stale -data. An piece of example code is given below to demostrate how the implementer might update the `files_completed` field. Note the function was implemented as a static member of the ReparseMeta class. - -```python -@staticmethod -def increment_files_completed(reparse_meta_models): - """ - Increment the count of files that have completed parsing for the datafile's current/latest reparse model. - - Because this function can be called in parallel we use `select_for_update` because multiple parse tasks can - referrence the same ReparseMeta object that is being queried below. `select_for_update` provides a DB lock on - the object and forces other transactions on the object to wait until this one completes. - """ - if reparse_meta_models.exists(): - with transaction.atomic(): - try: - meta_model = reparse_meta_models.select_for_update().latest("pk") - meta_model.files_completed += 1 - if ReparseMeta.assert_all_files_done(meta_model): - ReparseMeta.set_reparse_finished(meta_model) - meta_model.save() - except DatabaseError: - logger.exception("Encountered exception while trying to update the `files_reparsed` field on the " - f"ReparseMeta object with ID: {meta_model.pk}.") -``` - -## Sequential Execution -... - -## Last Ditch Timeout -...