Modularisation and Construction Pattern Sequencing #169

Open
mike1813 opened this issue Aug 2, 2024 · 15 comments

@mike1813
Member

mike1813 commented Aug 2, 2024

Construction patterns are rules for adding inferred assets and/or relationships to a system model. They fill in details that users may be unable to provide, or may forget to provide, where they are deducible (given some assumptions).

System modeller applies construction patterns one at a time. Patterns flagged as iterative are repeated until no further changes occur, before moving on to the next pattern in the sequence. The sequence is specified via an integer property core#hasPriority.

As may be familiar to older folk, this priority number acts like a BASIC program line number. As in BASIC, best practice is to leave gaps between line numbers, so one starts at line 10 or 1000, and the next line is 20 or 1100 or 2000. The idea is to leave some line numbers available between the lines in case new lines need to be inserted. If that happens, the new line can be given a number without changing the old line numbers.

Of course, eventually some of the gaps get filled in and one has to renumber from the top. That creates problems for a diff-based version tracking system like Git, because every line in the program has to change.

It also makes modularisation more difficult to achieve. Although the source tables are not split into modules, we do use a core#Package property as a sort index in most tables, so lines pertaining to a given module are kept together. While we don't have separate sources for each module, the sources we do have are equivalent to what one would get if separate source files per module were concatenated.

The exception is table ConstructionPattern.csv, which for reasons of readability is sorted on the priority index core#hasPriority. Modules are interleaved with each other, and adding a new module is doubly difficult because one must figure out where in the sequence each of its construction patterns should go, and then adjust the core#hasPriority properties of patterns either side.

The proposal is that we remove these 'line numbers' from ConstructionPattern.csv. Instead, we should introduce two new tables called something like 'Predecessor.csv' and 'Successor.csv', containing core#hasPredecessor and core#hasSuccessor properties of construction patterns.

The reason for using two tables rather than one is so the relationships between construction patterns respect package dependencies. If package B depends on package A, a construction pattern from package A would not refer to package B patterns at all. Patterns from B would refer to patterns in A, so each pattern in B could specify its position in the sequence by referring to patterns in A on either side of the position it should have.

Of course, with this approach it would be possible to specify a partial ordering only, leaving system modeller to decide which pattern to process first in some cases.

Ideally, system modeller would use the core#hasPredecessor and core#hasSuccessor properties to work out the order for itself. This is not essential, though, because it should be possible to insert code into csv2nq that can figure out the sequence and insert the calculated values of core#hasPriority into the RDF for deployment to system modeller. The sequence is encoded by precedence relationships in the source code, but in the 'compiled' version this is converted to a numerical order.
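As a rough illustration of the 'compile' step, here is a minimal Python sketch that turns predecessor and successor statements into numeric priorities. It assumes each pattern is ranked by the longest chain of patterns that must precede it; that ranking rule, the function name `compute_priorities` and the (pattern, other) tuple format are assumptions made for illustration, not a description of what csv2nq actually does.

```python
from collections import defaultdict


def compute_priorities(patterns, predecessors, successors):
    """Rank each pattern by the longest chain of patterns that must precede it.

    predecessors: iterable of (pattern, pred), meaning pattern hasPredecessor pred
    successors:   iterable of (pattern, succ), meaning pattern hasSuccessor succ
    """
    # before[p] is the set of patterns that must be applied before p.
    before = defaultdict(set)
    for pattern, pred in predecessors:
        before[pattern].add(pred)
    for pattern, succ in successors:
        before[succ].add(pattern)

    priority = {}
    visiting = set()

    def rank(p):
        if p in priority:
            return priority[p]
        if p in visiting:
            raise ValueError(f"cyclic dependency involving {p}")
        visiting.add(p)
        # Patterns with nothing before them get rank 0.
        priority[p] = 1 + max((rank(q) for q in before[p]), default=-1)
        visiting.discard(p)
        return priority[p]

    for p in patterns:
        rank(p)
    return priority
```

Because the input is only a partial order, unrelated patterns can end up with the same value, in which case the order between them is left to system modeller.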

Proposals for these changes in csv2nq are covered in csv2nq#7.

mike1813 added a commit that referenced this issue Sep 12, 2024
mike1813 added a commit that referenced this issue Sep 12, 2024
…ables.

Initially these are populated with dependencies inferred between patterns one of which creates a link/asset and the other matches the created link/asset either before or after the first pattern.
mike1813 added a commit that referenced this issue Sep 12, 2024
@mike1813
Member Author

mike1813 commented Sep 13, 2024

First step was to merge 'inference only' packages containing construction patterns with the packages for the sub-model to which the construction pattern contributed.

The original idea of these packages was to separate asset and relationship types introduced for the purpose of construction only. That way, the inference packages could become side-shoots in the package dependency tree and if threats used those assets and relationships the Access DB editor would detect a packaging error. The problem is that once CP dependencies are added, we get a lot of dependencies between inference packages, so they cannot remain as side-shoots and the benefit of separating them out is lost. They just make it more difficult to encode CP dependencies without breaking the package dependency hierarchy.

Step two was to add two new tables ConstructionPredecessor.csv and ConstructionSuccessor.csv to hold the dependencies. At first these were filled with dependencies extracted from CP contents by queries added to the Access DB editor. The first time this was done, the queries used contained bugs, so several passes were needed.

The criteria for creating an autogenerated dependency between two patterns CP1 and CP2 are as follows:

  • an asset creation dependency exists if CP1 creates an asset which would match a node in the matching pattern of CP2 even if it had no relationships
  • a link creation dependency exists if CP1 creates a link between two assets (either of which may also be created by CP1), and CP2 has a matching pattern in which the created link would be matched.

To get an asset creation dependency, the matching pattern of CP2 must match the asset to a mandatory node with the same or a parent asset class of the one created by CP1, and whose only relationships are prohibited relationships, or relationships to nodes that are prohibited or optional. If this node has other relationships, then the asset created by CP1 would not match independently of relationships created by CP1 (or later patterns), so the dependency should be represented as a link creation dependency.

To get a link creation dependency, the matching pattern of CP2 must contain a link of the same type or a parent type of the link created by CP1. The source and target nodes for this link must have compatible asset classes in CP2 and CP1. If CP1 creates the source or target asset, then the corresponding node in CP2 must specify the same asset class or a parent class. If CP1 creates the link from/to an existing asset, then the source/target nodes in CP1 and CP2 must have at least one common asset subclass.
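The link creation test can be sketched in Python as follows. This is only an illustration of the rule as stated above, not the Access DB query itself. The dictionary layout, the function names and the class hierarchy used in the example are hypothetical, and a class is assumed to count as a subclass of itself.

```python
def superclasses(cls, parents):
    """Return cls together with all of its direct and indirect parent classes."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def share_subclass(class_a, class_b, parents):
    """True if some class (possibly class_a or class_b itself) is a subclass of both."""
    known = set(parents) | {class_a, class_b}
    return any({class_a, class_b} <= superclasses(c, parents) for c in known)


def node_compatible(cp1_class, cp2_class, created_by_cp1, parents):
    if created_by_cp1:
        # A newly created asset only matches a node of its own class or a parent class.
        return cp2_class in superclasses(cp1_class, parents)
    # An existing asset may be of any subclass, so the two node classes need only overlap.
    return share_subclass(cp1_class, cp2_class, parents)


def link_creation_dependency(created, matched, parents, link_parents):
    """created: the link added by CP1; matched: a link in the matching pattern of CP2."""
    return (
        matched["type"] in superclasses(created["type"], link_parents)
        and node_compatible(created["source"], matched["source"],
                            created["creates_source"], parents)
        and node_compatible(created["target"], matched["target"],
                            created["creates_target"], parents)
    )


# Hypothetical asset class hierarchy, for illustration only.
PARENTS = {"PhysicalHost": ["Host"], "VirtualHost": ["Host"], "Thing": ["PhysicalHost"]}
```

With this hierarchy, a 'stores' link created from an existing Thing would be a candidate match for a Host-stores-Data link in another pattern, but not for a VirtualHost-stores-Data link.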

Where a dependency does exist between CP1 and CP2, one must then decide if it is a create-then-match dependency. This is found from the relative positions of CP1 and CP2 in the original construction sequence. If CP2 comes after CP1, then the dependency is a create-then-match dependency: CP1 creates the asset/link first so it can be matched by CP2. If not, the dependency is a match-then-create dependency: CP1 creates the asset/link only after CP2 has been applied, so it is not matched by CP2.

Finally, the dependency is encoded as a successor dependency if the two patterns are in different packages, and the package of the first pattern executed depends (directly or indirectly) on the package of the second pattern executed. If this is not the case then the dependency is encoded as a predecessor dependency, including when CP1 and CP2 are in the same package.

With this approach, dependencies are not autogenerated if neither package depends on the other, i.e., if the two patterns are in distinct branches. A query was added to the Access DB domain model editor to find and display such 'unfulfilled' dependencies.
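The direction and encoding rules above can be summarised in a short Python sketch. The function names, the tuple it returns and the dictionaries it takes (`old_priority`, `package_of`, `package_deps`) are hypothetical; the real logic lives in Access DB queries that are not shown in this issue.

```python
def depends_on(package, other, package_deps):
    """True if `package` depends directly or indirectly on `other`."""
    seen, stack = set(), [package]
    while stack:
        for dep in package_deps.get(stack.pop(), ()):
            if dep == other:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


def encode_dependency(cp1, cp2, old_priority, package_of, package_deps):
    """cp1 creates an asset/link that cp2 could match.

    Returns (table, subject, object, kind), where the row would be
    'subject hasSuccessor object' or 'subject hasPredecessor object'.
    """
    if old_priority[cp2] > old_priority[cp1]:
        kind, first, second = "create-then-match", cp1, cp2
    else:
        kind, first, second = "match-then-create", cp2, cp1

    p_first, p_second = package_of[first], package_of[second]
    if p_first != p_second and depends_on(p_first, p_second, package_deps):
        # The first pattern's package may refer to the second pattern's package.
        return ("successor", first, second, kind)
    if p_first == p_second or depends_on(p_second, p_first, package_deps):
        return ("predecessor", second, first, kind)
    # Neither package depends on the other: cannot be encoded directly.
    return ("unfulfilled", first, second, kind)
```

In every encoded case the row is attached to the pattern whose package is allowed to refer to the other, so the package dependency hierarchy is respected.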

To address an unfulfilled dependency without making changes to the involved construction patterns, there are three options:

  • add a package dependency so one of the packages does depend (directly or indirectly) on the other, or
  • add a new pattern CP3 that creates no assets or links, in a package on which both these packages depend, and create manual dependencies CP1-CP3 and CP2-CP3
  • ignore the CP1-CP2 dependency: this only makes sense if the dependency is 'fake', i.e., other elements of patterns CP1 and CP2 mean that the assets/links created by CP1 could not produce a match in CP2
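For the second option, the candidate homes for a bridging pattern CP3 are the packages that both packages depend on. A minimal Python sketch, assuming the package hierarchy is available as a dictionary of direct dependencies (the function names and the example hierarchy, including the 'Core' package, are hypothetical):

```python
def all_dependencies(package, package_deps):
    """Return every package that `package` depends on, directly or indirectly."""
    found, stack = set(), [package]
    while stack:
        for dep in package_deps.get(stack.pop(), ()):
            if dep not in found:
                found.add(dep)
                stack.append(dep)
    return found


def bridge_packages(package_1, package_2, package_deps):
    """Packages in which a bridging pattern could live without breaking the hierarchy."""
    return all_dependencies(package_1, package_deps) & all_dependencies(package_2, package_deps)


# Hypothetical fragment of a package hierarchy, for illustration only.
PACKAGE_DEPS = {
    "Privacy": ["DataLifecycle"],
    "DataLifecycle": ["Network"],
    "Virtualisation": ["Network"],
    "Network": ["Core"],
}
```

With this example hierarchy, `bridge_packages("Privacy", "Virtualisation", PACKAGE_DEPS)` gives the Network and Core packages as possible locations.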

Making package#5G depend on package#Virtualisation was the first step towards 'enabling' dependencies that would otherwise violate the package dependency hierarchy.

@mike1813
Member Author

The test case 'Context and Clouds' includes a mobile user device (a smart phone) moving between the user's home and town (i.e., between private and public locations), a 5G network (with inferred infrastructure) serving both spaces, Bluetooth pair connections between the phone and both a PC fixed in a private location and an IoT sensor carried by the phone user to both locations, a USB pair connection between the PC and an external thumb drive used to store sensor data, a cloud data centre (with inferred server cluster running an inferred K8S cloud framework) hosting containerised services, and routers in the user's home and in the data centre providing connections to the Internet.

This test case therefore exercises many, though not all, of the domain model features (e.g., it does not include OAuth-style single sign on authenticators, nor IoT controller devices).

Running this test case using only autogenerated CP dependencies after making package#5G depend on package#Virtualisation, it turns out that 15 inferred Host-subjectTo-Jurisdiction relationships are not created by pattern JuPHS+s. This happens because this pattern should be executed after any pattern that creates a Host or a Host-hasPhysicalHost-PhysicalHost relationship. The problem is that some of those patterns are in package#Virtualisation, package#5G or package#CloudManagement, while JuPHS+s is part of package#Legal, which does not depend on these packages, nor do they depend on package#Legal.

As a result, it is not possible to add autogenerated dependency relationships between JuPHS+s and some predecessor patterns as those dependency relationships would violate the declared package dependency hierarchy.

It would not be appropriate for package#Privacy to depend on package#Virtualisation, package#5G or package#CloudManagement because the concepts needed by package#Privacy are in package#DataLifecycle or packages it depends on including package#Users and package#Network, etc. The need for privacy in a system should not depend on whether the system is deployed on virtual hosts or cloud data centres, nor whether it uses 5G for communications.

Actually, we already have one inappropriate dependency for package#Privacy (on package#IoT), but the plan is to find some way to remove this dependency, not add more inappropriate dependencies.

At this stage we want to avoid making changes in the construction patterns, so (as described above) the only way to ensure these host-related construction patterns are executed before JuPHS+s is to insert a new pattern in (say) package#Network, and make it depend on each of the host-related patterns while making JuPHS+s depend on it. These dependencies can be added manually or by including features in the new pattern such that they can be deduced, e.g., by including the Host-hasPhysicalHost-PhysicalHost relationship in the matching pattern and making the pattern create the same relationship (creating a duplicate which will therefore have no effect on the model).

mike1813 added a commit that referenced this issue Sep 13, 2024
…etwork to provide a bridge for a construction pattern dependency of JuPHS+s in package#Privacy on patterns in package#Virtualisation, package#5G or package#CloudManagement which can't be encoded directly without violating the asserted package dependency hierarchy.
@mike1813
Member Author

mike1813 commented Sep 13, 2024

The new 'dummy' pattern is PHH+hPH in package#Network, inserted just before the package#Network patterns to determine in which spaces a subnet is accessible based on the locations of connected non-mobile Hosts.

The Access DB editor CP dependency generation queries find this depends on the host creation and physical host identification patterns in package#Virtualisation, package#5G and package#CloudManagement (among others), and JuPHS+s depends on it.

With this change (and using a modified csv2nq program), the 'Context and Clouds' test case produces the same results as with the original numbered construction sequence. A direct comparison between the NQ files after validation revealed that the only differences were either (a) timestamps or references to the domain model version, (b) differences in the threatened asset for threats that in principle threaten multiple assets (so system modeller must make an arbitrary choice which one to use), or (c) differences in the construction pattern responsible for creating some DataStep assets. The last of these arises because the sequence created by the modified csv2nq program is a partial sequence, so it is possible for multiple patterns to have the same hasPriority value, leaving system modeller to decide in which order to apply them. The DataStep creation patterns don't depend on each other, but they have some overlaps (more than one pattern could create the same asset). Which pattern is responsible depends on the order in which system modeller applies them, but the outcome doesn't depend on this.

@mike1813
Member Author

mike1813 commented Sep 14, 2024

There are still 43 irreducible dependencies between CP (i.e., with no dependencies of either pattern on an intermediary) that are not encoded in the CP predecessor/successor properties. These are either:

  • fake dependencies in which the assets/links created in one pattern cannot match the other pattern due to other elements of the two patterns,
  • dependencies that are real, but where the sequence calculation in csv2nq happens to rank the patterns in the right order due to the number of predecessors each has (i.e., the successor has a higher calculated hasPriority value, but by accident),
  • dependencies that are real, and the successor pattern does not have a higher calculated hasPriority value than its predecessor but the patterns are not both matched by the same assets in this particular test case.

The last category must be addressed by further patterns similar to PHH+hPH whose purpose is to act as intermediaries, as without this discrepancies could arise in other test cases. Ideally the last two categories should be so addressed, as otherwise there is a risk that discrepancies may appear if the domain model is changed in any way.
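A simple check can separate the second category from the third, given the calculated hasPriority values. The sketch below is in Python, and the function name and data layout are hypothetical. Fake dependencies still have to be identified by inspecting the patterns themselves.

```python
def check_unencoded(dependencies, priority):
    """Split (predecessor, successor) pairs by whether the calculated priorities respect them.

    Pairs in the first list are in the right order, possibly only by accident.
    Pairs in the second list could produce discrepancies in some system models.
    """
    respected, at_risk = [], []
    for predecessor, successor in dependencies:
        if priority[successor] > priority[predecessor]:
            respected.append((predecessor, successor))
        else:
            at_risk.append((predecessor, successor))
    return respected, at_risk
```

A pair with equal priority values is treated as at risk, because the order in which system modeller applies the two patterns is then not determined.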

@mike1813
Member Author

mike1813 commented Sep 14, 2024

A dependency exists between pattern CtHhPH+hPH (package#CloudManagement) and patterns MPCNS+aF and FCNS+aF (package#5G). This arises because the 5G patterns determine accessibility of a cellular network based on the locations of fixed physical hosts that are connected to it, or that run virtual hosts connected to it.

The reason this hasPhysicalHost dependency is not addressed by PHH+hPH is because that intermediate pattern was inserted after MPCNS+aF and FCNS+aF. This is necessary because the accessibility relationships they create are later used to infer the existence of more physical hosts comprising the 5G infrastructure (if not already present). PHH+hPH must come after those patterns, so it can't be inserted before MPCNS+aF and FCNS+aF.

This does not affect the 'Context and Clouds' test case because although it includes cloud-hosted VMs and a 5G network, none of the former are connected to the latter. This could happen, although the test case would be even more complex so no such test was created. Instead, possible fixes were tested by checking that these dependencies are no longer found by the 'unfulfilled dependencies' query in the Access DB editor.

The simple fix is to add a copy of PHH+hPH just before MPCNS+aF and FCNS+aF. The original was renamed PHH+hPH-2, and the new copy named PHH+hPH-1, so the names reflect their relative positions in the construction sequence.

A better solution would be to move CtHhPH+hPH further up the construction sequence, so it is followed by a pattern that can be linked directly with MPCNS+aF and FCNS+aF without violating the package dependency hierarchy. This was not attempted as the first solution because it would be difficult to ensure that changes to the original sequence would not cause new discrepancies and make it impossible to run tests that use back-to-back comparisons.

@mike1813
Member Author

Next dependency is between package#5G patterns CNRS+aF and CNRANBSS+aF, both of which deduce that the backbone and radio access networks are accessible from the locations of the providing gateway routers (base stations in the case of the RANs). Subsequent package#LocalDeviceConnectivity patterns L1cGcL3+NSg and L3cGcL1+CSg create routes from a device to a L2/L3 subnet through a gateway paired with the device via Bluetooth/USB (the pairing connection being modelled as an L1 only subnet). This dependency violates the package hierarchy because package#LocalDeviceConnectivity does not depend (directly or indirectly) on package#5G or vice versa.

To solve this one, a dummy package#Network pattern LSaS+a can be added between CNRANBSS+aF and L1cGcL3+NSg, in which the link LogicalSubnet-accessibleFrom-Space is matched and duplicated. The best position for this new pattern is at the start of the network connectivity (gateway routing) asset inference sequence, because that way it follows other patterns that deduce network accessibility such as FpLSS+aF and FcLSS+aF, and provides a boundary between accessibility and routing that can be used by any future extensions, e.g., for new types of networks.

This was tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returned dependencies between package#5G patterns CNRS+aF & CNRANBSS+aF and package#LocalDeviceConnectivity patterns L1cGcL3+NSg & L3cGcL1+CSg.

mike1813 added a commit that referenced this issue Sep 16, 2024
…SS+aF and L1cGcL3+NSg, so the dependency between those patterns can be resolved without violating the asserted package hierarchy.
@mike1813
Member Author

Next up is a dependency between package#IoT pattern DcTh+s and package#LocalDeviceConnectivity pattern USBD-S+S. The former adds a Host-stores-Data relationship between a Thing and its control input (which should normally be stored because it affects the behaviour of the Thing between updates). The latter adds an onboard DataService to a USB device if it stores any Data, so it depends on asserted or created Host-stores-Data relationships.

To solve this, a dummy package#Network pattern HD+s is added before USBD-S+S. This should match a Host-stores-Data pattern and duplicate the 'stores' relationship, so there is a create-then-match dependency between DcTh+s and HD+s, and a match-after-create dependency between USBD-S+S and HD+s, neither of which violates the package dependency hierarchy.

Tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returns dependencies between DcTh+s and USBD-S+S.

mike1813 added a commit that referenced this issue Sep 17, 2024
…lowing the dependency between DcTh+s and USBD-S+S (which violates the package hierarchy) to be represented in terms of dependencies on HD+s.
@mike1813
Member Author

Next is a set of dependencies between various package#IoT patterns and package#ProcessComms patterns. The IoT patterns create process-uses-process relationships between onboard IoT processes and external processes that communicate with the IoT device. The direction of the uses relationship indicates whether communication is initiated by the IoT device or the external process.

The problem here is that the IoT package depends on the Network package (which covers process-process relationships) but not the ProcessComms package (which inserts inferred assets representing process-process relationships, and handles aspects such as authentication, authorisation and the use of communication proxies). A direct IoT-ProcessComms dependency (in either direction) therefore violates the package dependency hierarchy.

This can be addressed by adding package#Network pattern CS+u which detects a Process-uses-Process relationship and makes a copy of it. Inserting this before the ProcessComms inference sequence (i.e., before SPuS+U) allows the dependencies between IoT and ProcessComms patterns to go via CS+u, without violating the package dependency hierarchy.

Tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returns direct dependencies between the IoT and ProcessComms patterns.

mike1813 added a commit that referenced this issue Sep 17, 2024
…+U in package#ProcessComms, allowing dependencies between package#IoT patterns and package#ProcessComms patterns to go via CS+u without violating the package dependency hierarchy.
@mike1813
Member Author

Next come dependencies between various package#IoT patterns and package#LocalDeviceConnectivity patterns involving USB storage devices. (Arguably, these patterns should also apply to storage devices paired via Bluetooth, but that is a separate issue.)

The problem here is that the USB storage patterns check for processes that handle data (i.e., access data for any reason, whether for the purpose of processing or merely to move data between other processes). This means they depend on IoT patterns which create relationships between onboard IoT processes and data, and between external processes that send control inputs to an IoT device or use sensor output from an IoT device. An IoT device may have a USB connection to some host gateway device, so this represents a real dependency that must be reflected in the construction order.

This can be addressed by adding a package#Application pattern PD+h that detects a Process-handles-Data relationship and adds a duplicate relationship. Inserting this between the IoT and LocalDeviceConnectivity patterns allows their dependencies to go via PD+h, without violating the package dependency hierarchy.

Tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returns direct dependencies between the IoT and LocalDeviceConnectivity patterns.

mike1813 added a commit that referenced this issue Sep 17, 2024
…age inference patterns, allowing dependencies between these and IoT patterns to go via PD+h without violating the package dependency hierarchy.
@mike1813
Member Author

mike1813 commented Sep 17, 2024

The remaining 'unfulfilled' construction pattern dependencies are all 'fake' or 'unrealistic':

  • fake: one pattern creates/matches an asset or relationship that is matched/created by another pattern, but the two patterns can't possibly involve the same asset or relationship because other elements in the two patterns prevent this
  • unrealistic: one pattern creates/matches an asset or relationship that is matched/created by another pattern, but they can involve the same asset or relationship only if the system model is highly non-typical.

The fake dependencies in the current domain model (on branch 169) are as follows:

  1. Patterns KPxCazSI+az and KPxCuScP+aa from package#CloudManagement match Host-connectedTo-LogicalSubnet links that are created by package#LocalDeviceConnectivity patterns like HpBRH+B and HpURH+U. However, they only create connectedTo relationships to LogicalSubnets representing pairing connections between physical hosts, while the LogicalSubnet in KPxCazSI+az and KPxCuScP+aa have connections from virtual hosts.
  2. Pattern HuiCo+UI from package#IoT creates a Host-hosts-Process relationship which could be matched by KPxMLnSA+uu from package#CloudManagement. However, the Process in the latter case is a remote access client, while in the former it is not.
  3. Patterns Pp-uSe+Rel and PuSe+Rel from package#IoT create Process-uses-Process relationships that would later be matched in pattern KPxCuuSI+uu from package#CloudManagement. However, in the latter case the service (used process) runs on a virtual host, while in the IoT patterns the host is physical.
  4. Pattern PrSe+Rel from package#IoT creates a Process-uses-Process relationship that would later be matched in KPxMLnSA+uu from package#CloudManagement. However, in the latter pattern the client (using) process is a remote access client, while in the IoT pattern it is not. There is a modelling error if a remote access client handles data, which would be triggered were this not the case.
  5. Pattern USBD-S+S from package#LocalDeviceConnectivity creates a Host-hosts-Process link after this would have been matched by pattern HuiSe+Rel from package#IoT. However, the Process in USBD-S+S is an inferred DataService, while in HuiSe+Rel it is an inferred SensorProcess, so the same process cannot be involved in both patterns.
  6. Pattern DcTh+s from package#IoT creates a Host-stores-Data relationship which could be matched (in two places) by pattern VHHsD-HsS-S+S from package#Virtualisation. However, in the latter pattern one of the matches involves a virtual host while the host in DcTh+s is a physical IoT Thing, and in the other the host is provisioning a virtual host which (in the current domain model) is not permitted to run on an IoT Thing.
  7. Pattern HumThP-m+m from package#IoT matches a Host-hosts-Process link, which is created by pattern USBD-S+S from package#LocalDeviceConnectivity, but does not match HumThP-m+m because it precedes USBD-S+S. The Host in the latter pattern is a USB device, and we cannot exclude the possibility that this is a Thing. However, the requirement for HumThP-m+m to precede USBD-S+S only exists if the process in HumThP-m+m is a DataService. That would have been created by the earlier pattern DcTh+s, and would match a prohibited node in USBD-S+S, so the same process cannot match in both patterns.

In some of these cases, it is possible to remove the apparent dependency by specifying that one of the assets in one or other of the related patterns is of a more restricted class than currently specified. For example, the LogicalSubnet in K8S-related patterns can't be an L1 subnet (Bluetooth or USB pairing connection), so we could specify that the subnet is a L23Subnet in those patterns. Such changes have not been made at this stage - for the purposes of this issue, it is enough to know that these apparent dependencies can be ignored.

@mike1813
Member Author

mike1813 commented Sep 17, 2024

Separate issues added (#172, #173 and #174) describing construction pattern changes needed to remove apparent dependencies that are not valid. These changes should eliminate fake dependencies 1, 2, 4, 5 and one of the two dependencies in 6.

Addressing these issues would not change the real CP dependencies, of course. It just removes apparent dependencies that may otherwise be rediscovered and cause confusion in future.

We probably also need a way to record dependencies that are not valid, and add comments explaining why they can be ignored, or a way to exclude dependencies that are definitely fake from the existing 'unfulfilled dependency' query in the Access DB editor, and from other tools we might in future develop to assist with dependency management and composability.

@mike1813
Member Author

Finally, there are some dependencies that are not definitively 'fake' but are nevertheless unrealistic:

  1. Pattern DCMW-LS+VX from package#CloudManagement creates a VXLAN virtual network to which cloud Master and Worker VMs (in an inferred K8S-like cloud platform) are connected. The connectedTo relationships would match CNS+RAN+BS-S+l from package#5G, but are missed because CNS+RAN+BS-S+l would have to come after DCMW-LS+VX to match them. In practice, the VXLAN in a K8S-like cloud platform would not serve as a cellular backbone network, so this dependency is illusory.
  2. Pattern PCNRWI+ct from package#5G creates a Host-connectedTo-Internet connection between the backbone router in a public cellular network and the Internet. This would match several Host-connectedTo-LogicalSubnet relationships in patterns KPxCazSI+az and KPxCuScP+aa from package#CloudManagement, but are missed because they follow PCNRWI+ct. However, the LogicalSubnet in the cloud management patterns would not in practice be the Internet - at worst it would be a VXLAN running on the Internet.
  3. Pattern CNHS+RAN+cT from package#5G creates a Host-connectedTo-RAN connection between the physical host of any Host connected to a cellular network and the cellular radio access network in its location. This would match several Host-connectedTo-LogicalSubnet relationships in patterns KPxCazSI+az and KPxCuScP+aa from package#CloudManagement, but are missed because they follow CNHS+RAN+cT. However, the LogicalSubnet in the cloud management patterns would not in practice be a cellular RAN.
  4. Patterns DcTh+DS, SesD+SP and HuiCo+UI in package#IoT create processes onboard IoT devices. The Thing-hosts-Process relationships would be matched in KPxCaaSI+aa and KPxCazSI+az from package#CloudManagement, but are missed because these patterns follow the IoT patterns. In practice they wouldn't be matched anyway because the IoT onboard processes are inferred and wouldn't be controlled by or use keys from a cloud authentication/authorization service.
  5. Pattern UHDrP-S+S in package#LocalDeviceConnectivity creates a DataService on a USB storage device. The Host-hosts-Process relationship would then be matched in subsequent patterns KPxCaaSI+aa and KPxCazSI+az from package#CloudManagement. In practice, the inferred DataService would not match these patterns because it wouldn't be controlled by or use keys from a cloud authentication/authorization service.

These can be ignored for now, but ideally we would have a way to acknowledge them and document the reason for ignoring them.

@mike1813
Member Author

At this point, following commit fe32abd, the new tables ConstructionPredecessor.csv and ConstructionSuccessor.csv in branch 169 contain the following fields:

  • URI: of the construction pattern whose predecessor or successor is defined by a row in the table
  • package: the package this construction pattern is in, used in a canonical sort order (package, then URI) so rows of the table related to a specific submodel are kept together, allowing git to merge changes made to different submodels automatically
  • hasPredecessor or hasSuccessor: the construction pattern that should be applied before or after this one
  • note: a short text comment explaining why the second pattern should precede or follow the first
  • asserted: a flag indicating whether the dependency was asserted by a user or inferred from an old numbered sequence
  • fake: a flag indicating whether the dependency should be ignored

The first three are self-evidently necessary, and are used by csv2nq to determine a partial ordering and insert it into the hasPriority field in ConstructionPattern.csv (overriding the existing value, if any).

The 'note' field was added so users (or other tools) can insert the reason why the second pattern should precede or follow the first. At present, all rows in these tables were derived from create-then-match or create-after-match dependencies in which an asset or a link is created in one pattern that would match another pattern, so the extraction query inserts this information in the 'note' field. It is possible that several distinct assets/links may be created by one pattern and matched by another, so in some cases this produces duplicate rows that differ only in the 'note' field which refers to a different asset/link in each case. That isn't a problem for csv2nq as it creates a single, temporary table containing the successors of each URI, combining the predecessor and successor tables and also combining rows that refer to the same pair of construction patterns.
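The merge described above might look something like the following Python sketch. The column headers are taken from the field list in this comment, but the exact headers in the CSV files, the function names and the sample rows are assumptions, and this is not the csv2nq source.

```python
import csv
import io
from collections import defaultdict


def read_rows(csv_text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def successor_map(predecessor_rows, successor_rows):
    """Combine both tables into one map from each URI to the set of its successors."""
    successors = defaultdict(set)
    for row in predecessor_rows:
        # 'URI hasPredecessor X' means URI is a successor of X.
        successors[row["hasPredecessor"]].add(row["URI"])
    for row in successor_rows:
        successors[row["URI"]].add(row["hasSuccessor"])
    # Using sets collapses rows that differ only in the 'note' field.
    return dict(successors)


# Hypothetical sample data, for illustration only.
PREDECESSORS = """URI,package,hasPredecessor,note
B1,B,A1,link x created then matched
B1,B,A1,asset y created then matched
"""
SUCCESSORS = """URI,package,hasSuccessor,note
B1,B,A2,link z matched then created
"""
```

Here the two B1 rows that refer to A1 collapse into a single entry, so the result records only that A1 precedes B1 and B1 precedes A2.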

The 'asserted' flag is a temporary measure designed to support the process of extracting dependencies from an existing sequence. Some features were added to the Access DB editor to support the process (as described above):

  • extract dependencies from construction patterns, using the existing sequence to determine their direction (create-then-match or create-after-match),
  • work out which are 'irreducible' (not equivalent to two or more dependencies forming a chain via intermediate patterns),
  • convert those into predecessor or successor statements while respecting the package dependency hierarchy,
  • check if any irreducible dependencies remain that could not be converted without violating the package hierarchy,
  • if any such 'unfulfilled' dependencies exist, add new patterns (like PHH+hPH-1 and PHH+hPH-2) and start again.
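
The notion of an 'irreducible' dependency can be illustrated with a small Python sketch. It is not the Access DB implementation, which uses queries; the function name and the pattern URIs are hypothetical, and the sketch assumes the dependencies contain no cycles.

```python
def irreducible_dependencies(edges):
    """Return the edges of an acyclic dependency set that no chain can replace.

    edges: iterable of (earlier, later) pattern pairs.
    """
    edges = set(edges)
    successors = {}
    for earlier, later in edges:
        successors.setdefault(earlier, set()).add(later)
        successors.setdefault(later, set())

    def has_chain(src, dst):
        # Search from src's other successors; reaching dst means the direct
        # edge src -> dst is equivalent to a chain of two or more edges.
        stack = [n for n in successors[src] if n != dst]
        seen = set(stack)
        while stack:
            node = stack.pop()
            for nxt in successors[node]:
                if nxt == dst:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(a, b) for a, b in edges if not has_chain(a, b)}


# Hypothetical example: A -> C is implied by the chain A -> B -> C.
deps = [("cp#A", "cp#B"), ("cp#B", "cp#C"), ("cp#A", "cp#C")]
kept = irreducible_dependencies(deps)
```

Here `kept` holds only the two chain links, since the direct dependency from `cp#A` to `cp#C` is reducible.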

Once the new patterns are added, new irreducible dependencies will be found involving them. Some old dependencies that were previously irreducible will now be equivalent to chains involving these new dependencies. Starting again doesn't just add more irreducible predecessor and successor relationships - it also removes some of the old ones (as well as eliminating some unfulfilled dependencies). Because of this, the Access DB clears the old predecessor and successor tables before creating new ones. However, this should not be done for entries that were added manually because it may not be possible to recalculate them from construction pattern content. The 'asserted' flag is used to exclude these dependencies from being deleted.

For example, the user may know that one subsequence can follow another, but each pattern in one depends only on some patterns in the other sequence. A partial sequence generated from those dependencies would include patterns from both sequences mixed together. That may make it harder for different people to maintain each sequence. A user might fix this by adding a pattern in each sequence and some asserted dependencies:

  • all patterns in the first sequence are predecessors of the new pattern added to that sequence
  • all patterns in the second sequence are successors of the new pattern added to that sequence
  • the pattern added to the second sequence is a successor of the pattern added to the first sequence

The two sequences would now remain separate. All dependencies between patterns in different sequences would be 'reducible' to chains going via the two new patterns. The maintainer of one sequence would not need to refer to patterns in the other. The new patterns may need to be very simple so they don't have other dependencies, which means it may not be possible to deduce those dependencies they do have from their contents.
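
As a rough illustration of this arrangement, the following Python sketch builds the asserted dependencies for two hypothetical sequences and checks that every cross-sequence dependency is then reducible to a chain via the two new patterns. All names (`bridge_sequences`, `reachable`, the pattern URIs) are invented for the example.

```python
def bridge_sequences(first, second, end_marker, start_marker):
    """Asserted (earlier, later) pairs that keep two subsequences apart."""
    asserted = {(p, end_marker) for p in first}
    asserted |= {(start_marker, p) for p in second}
    asserted.add((end_marker, start_marker))
    return asserted


def reachable(edges, src, dst):
    """True if dst can be reached from src by following edges."""
    successors = {}
    for earlier, later in edges:
        successors.setdefault(earlier, set()).add(later)
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


first_seq = ["cp#A1", "cp#A2"]
second_seq = ["cp#B1", "cp#B2"]
asserted = bridge_sequences(first_seq, second_seq, "cp#A-end", "cp#B-start")

# Content-derived dependencies that cross between the two sequences.
cross = [("cp#A1", "cp#B2"), ("cp#A2", "cp#B1")]
all_reducible = all(reachable(asserted, a, b) for a, b in cross)
```

With the asserted pairs in place, each cross dependency is covered by a chain through `cp#A-end` and `cp#B-start`, so it no longer needs to be stated directly.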

The 'fake' flag is used by the Access DB editor (but not yet csv2nq) to filter dependencies before using them, e.g. to compute the partial construction sequence. The idea was to allow a user to insert a dependency that may violate the package dependency tree but mark it as 'fake' and explain in the 'note' field why the dependency should be ignored.

In practice, neither the 'asserted' nor 'fake' fields have yet been used because everything now at the head of branch 169 is based on extracted dependencies derived from construction pattern contents. It may be possible and/or sensible to remove these fields later, once all relevant domain models have been converted from the old sequence to the new dependency approach.

mike1813 added a commit that referenced this issue Sep 20, 2024
…D+s, and creates parallel/opposed link flags for patterns CS+u, HD+s and PD+h.
mike1813 added a commit that referenced this issue Sep 21, 2024
…d ConstructionSuccessor.csv, so git diff and git merge can work more effectively.
@mike1813
Member Author

One more issue. I used 'note' as the explanation field name in the ConstructionPredecessor.csv and ConstructionSuccessor.csv tables.

It turns out that 'note' is a reserved keyword in MS Access SQL, though not in the SQL standard ISO/IEC 9075:2023, and this causes problems when reimporting the table in the MS Access DB editor.

Changed the field name 'note' to 'notes' to fix this. This involved finding and altering references to the field in several places in the MS Access DB editor.

mike1813 added a commit that referenced this issue Sep 24, 2024
…ructionPredecessor.csv and ConstructionSuccessor.csv tables. Needed because 'note' is a reserved keyword in the Microsoft Access variant of SQL.
@mike1813
Member Author

An empty version of the MS Access DB editor is now on Sharepoint as 'Domain Modeller - v10-3-1 - Sequencer.accdb'. This is a version designed to support the task of:

  • importing a domain model in which the sequence is encoded via the ConstructionPattern.hasPriority values
  • inferring the dependencies between patterns and checking that enough of them can be expressed as ConstructionPredecessor and ConstructionSuccessor entries without violating the package dependency hierarchy
  • exporting the domain model with the inferred ConstructionPredecessor and ConstructionSuccessor tables included

After export, assuming no other edits have been made, the only changes should be the addition of tables ConstructionPredecessor.csv and ConstructionSuccessor.csv containing the inferred dependencies, and a change in the order in ConstructionPattern.csv. Until now, this was sorted on the hasPriority field, but this 'sequencer' version of the database sorts on package membership then URI (same as most other tables), so that rows belonging to one package stay together.
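
The effect of the new sort order can be sketched in Python as below. This only illustrates the ordering, not the export code in the Access DB; the column names `package` and `hasPriority` and the sample rows are assumptions made for the example.

```python
import csv
import io


def canonical_sort(csv_text):
    """Return CSV text with data rows sorted by package, then URI."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = sorted(reader, key=lambda r: (r["package"], r["URI"]))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical table that was previously sorted on hasPriority.
sample = (
    "URI,package,hasPriority\n"
    "cp#N1,package#Network,100\n"
    "cp#C1,package#Core,200\n"
    "cp#N2,package#Network,300\n"
)
sorted_text = canonical_sort(sample)
```

After sorting, the two `package#Network` rows are adjacent, so an edit confined to one package touches one contiguous block of lines.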

In this version of the MS Access DB editor, the inferred ConstructionPredecessor and ConstructionSuccessor entries are stored as tables CP_ConstructionPredecessor and CP_ConstructionSuccessor. When exporting the database, these are exported as ConstructionPredecessor.csv and ConstructionSuccessor.csv. When importing again, the data is stored in tables ConstructionPredecessor and ConstructionSuccessor, but entries not marked as 'asserted' are deleted. One must regenerate CP dependencies to repopulate CP_ConstructionPredecessor and CP_ConstructionSuccessor before one can export again.

The ConstructionPatternEntry form now allows 'asserted' predecessor and successor references to be added or deleted. These are stored in tables ConstructionPredecessor and ConstructionSuccessor. One must then regenerate CP dependencies to repopulate CP_ConstructionPredecessor and CP_ConstructionSuccessor, during which any 'asserted' dependencies are copied from ConstructionPredecessor and ConstructionSuccessor into CP_ConstructionPredecessor and CP_ConstructionSuccessor, with the asserted flag set to true. On export, these flags are retained, so they will be loaded again into ConstructionPredecessor and ConstructionSuccessor and not deleted.
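
The import and regeneration round trip can be summarised by the following Python sketch. It is a simplified model of the behaviour described above, not the Access DB code: the function names are hypothetical, and the 'asserted' flag is assumed to have been parsed into a boolean already.

```python
def import_dependencies(csv_rows):
    """On import, keep only the rows marked as asserted."""
    return [row for row in csv_rows if row["asserted"]]


def regenerate(inferred_rows, asserted_rows):
    """Rebuild a CP_ table from freshly inferred rows plus asserted rows."""
    table = [dict(row, asserted=False) for row in inferred_rows]
    table += [dict(row, asserted=True) for row in asserted_rows]
    return table


# Hypothetical exported ConstructionPredecessor.csv content.
exported = [
    {"URI": "cp#B", "hasPredecessor": "cp#A", "asserted": False},
    {"URI": "cp#D", "hasPredecessor": "cp#C", "asserted": True},
]
imported = import_dependencies(exported)

# Inferred rows are recomputed from pattern contents, then merged.
inferred = [{"URI": "cp#B", "hasPredecessor": "cp#A"}]
cp_table = regenerate(inferred, imported)
```

The asserted row survives the round trip with its flag intact, while the inferred row is discarded on import and recreated by regeneration.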

This version of the database includes a script to calculate a sequence number or rank for each construction pattern in the partial ordering specified by tables CP_ConstructionPredecessor and CP_ConstructionSuccessor. The results are stored in ConstructionPatternRank, which gives the old 'hasPriority' value and the new sequence number per construction pattern. This is used to check that all patterns got a sequence number, and to find whether two patterns with an unfulfilled dependency would run in the wrong order if the dependency is not added.
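
One common way to derive such a rank from a partial ordering is to give each pattern the length of the longest dependency chain leading to it. The Python sketch below shows that idea; it is an assumption about how a rank could be computed, not a description of the Access script, and the names `compute_ranks`, `patterns` and `successors` are hypothetical.

```python
from collections import deque


def compute_ranks(patterns, successors):
    """Rank each pattern by the longest chain of predecessors before it.

    patterns: all pattern URIs, including every URI used in successors.
    successors: map of pattern URI -> set of successor URIs.
    Returns (rank, unranked); unranked lists patterns caught in a cycle.
    """
    indegree = {p: 0 for p in patterns}
    for dsts in successors.values():
        for dst in dsts:
            indegree[dst] += 1

    tentative = {p: 0 for p in patterns}
    rank = {}
    queue = deque(sorted(p for p in patterns if indegree[p] == 0))
    while queue:
        node = queue.popleft()
        rank[node] = tentative[node]  # all predecessors have been ranked
        for nxt in sorted(successors.get(node, ())):
            tentative[nxt] = max(tentative[nxt], rank[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    unranked = sorted(set(patterns) - set(rank))
    return rank, unranked


# Hypothetical partial ordering: A before B and C, both before D.
ranks, missing = compute_ranks(
    ["cp#A", "cp#B", "cp#C", "cp#D"],
    {"cp#A": {"cp#B", "cp#C"}, "cp#B": {"cp#D"}, "cp#C": {"cp#D"}},
)
```

Patterns with equal rank have no dependency between them, so their relative order is free. A non-empty `missing` list corresponds to the check that all patterns got a sequence number.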

There is a query 'CP_UpdateConstructionPatternPriorities' which replaces the old 'hasPriority' value with the computed sequence number. This query is not run automatically by any of the CP dependencies generation functions available from the 'Construction Dependencies' form. The idea is that the old sequence should be preserved so any changes can be made with that as a baseline, and new CP dependencies extracted by regenerating them.

However, this query is provided so it can be launched manually to insert the new sequence numbering in the ConstructionPattern table. If this is done, and the model then exported to CSV files, the results can be processed by the version of csv2nq from the 'main' branch, which is the version used in the domain model CI pipeline. This provides a way to create a release that can be processed by the current CI pipeline, if needed.

The other option would be to export the model 'as is' (with the old ConstructionPattern hasPriority values), and use the new csv2nq (branch 7) to compute the sequence during the translation to NQ. We can't use that version of csv2nq in the CI pipeline until all future releases have the new ConstructionPredecessor.csv and ConstructionSuccessor.csv tables.

We need a new version of the database to fully support the new approach. With such a version, all construction pattern dependencies would be imported from ConstructionPredecessor.csv and ConstructionSuccessor.csv, edited in place, and exported again. There would be no ConstructionPattern hasPriority field, and conversion to NQ would be done using the new version of csv2nq.

mike1813 added a commit that referenced this issue Oct 11, 2024
…extracted CP dependencies after fixing bugs in the extraction queries.

Issue #177: Changed package interdependencies so package#Users (and hence all other packages) now depends on package#Core.