Modularisation and Construction Pattern Sequencing #169
… separate packages for some sub-models.
…ables. Initially these are populated with dependencies inferred between patterns, one of which creates a link/asset and the other of which matches the created link/asset, either before or after the first pattern.
The first step was to merge the 'inference only' packages containing construction patterns with the packages for the sub-models to which those construction patterns contribute. The original idea of these packages was to separate out asset and relationship types introduced purely for the purpose of construction. That way, the inference packages could become side-shoots in the package dependency tree, and if threats used those assets and relationships, the Access DB editor would detect a packaging error. The problem is that once CP dependencies are added, we get a lot of dependencies between inference packages, so they cannot remain as side-shoots and the benefit of separating them out is lost. They just make it more difficult to encode CP dependencies without breaking the package dependency hierarchy.
Step two was to add two new tables, ConstructionPredecessor.csv and ConstructionSuccessor.csv, to hold the dependencies. At first these were filled with dependencies extracted from CP contents by queries added to the Access DB editor. The first time this was done, the queries contained bugs, so several passes were needed. The criteria for creating an autogenerated dependency between two construction patterns CP1 and CP2, where CP1 creates an asset or link that CP2 might match, are as follows:
To get an asset creation dependency, the matching pattern of CP2 must match the asset to a mandatory node whose asset class is the same as, or a parent of, the class of the asset created by CP1, and whose only relationships are prohibited relationships or relationships to nodes that are themselves prohibited or optional. If this node has other relationships, then the asset created by CP1 would not match independently of relationships created by CP1 (or by later patterns), so the dependency should be represented as a link creation dependency. To get a link creation dependency, the matching pattern of CP2 must contain a link of the same type as, or a parent type of, the link created by CP1. The source and target nodes of this link must have compatible asset classes in CP1 and CP2: if CP1 creates the source or target asset, the corresponding node in CP2 must specify the same asset class or a parent class; if CP1 creates the link from/to existing assets, the source/target nodes in CP1 and CP2 must have at least one common asset subclass.
Where a dependency does exist between CP1 and CP2, one must then decide whether it is a create-then-match dependency. This follows from the relative positions of CP1 and CP2 in the original construction sequence. If CP2 comes after CP1, the dependency is a create-then-match dependency: CP1 creates the asset/link first, so it can be matched by CP2. If not, it is a match-then-create dependency: CP1 creates the asset/link, but only after CP2 has run, so it is not matched by CP2.
Finally, the dependency is encoded as a successor dependency if the two patterns are in different packages and the package of the pattern executed first depends (directly or indirectly) on the package of the pattern executed second. Otherwise it is encoded as a predecessor dependency, including when CP1 and CP2 are in the same package.
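The classification steps above can be sketched in Python. This is a minimal illustration, not the Access DB editor's actual queries: it assumes the package dependency graph is a dict mapping each package to the packages it depends on directly, that `position` holds the old hasPriority values, and the function names `ancestors` and `classify` are hypothetical. It also returns 'unfulfilled' for the case where neither package depends on the other.

```python
from typing import Dict, Set, Tuple


def ancestors(pkg: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """All packages that `pkg` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(deps.get(pkg, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(deps.get(p, ()))
    return seen


def classify(cp1: str, cp2: str,
             position: Dict[str, int],
             package: Dict[str, str],
             deps: Dict[str, Set[str]]) -> Tuple[str, str]:
    """Classify a dependency where cp1 creates an asset/link that cp2 would match.

    Returns (kind, table): table is 'successor', 'predecessor', or
    'unfulfilled' when neither package depends on the other.
    """
    # The original numbered sequence decides which pattern runs first.
    if position[cp1] < position[cp2]:
        kind, first, second = "create-then-match", cp1, cp2
    else:
        kind, first, second = "match-then-create", cp2, cp1

    pkg_first, pkg_second = package[first], package[second]
    if pkg_first != pkg_second and pkg_second in ancestors(pkg_first, deps):
        # Earlier pattern's package depends on the later one's.
        return kind, "successor"
    if pkg_first == pkg_second or pkg_first in ancestors(pkg_second, deps):
        # Same package, or later pattern's package depends on the earlier one's.
        return kind, "predecessor"
    return kind, "unfulfilled"  # patterns in distinct branches


# Hypothetical example data (not the real domain model).
DEPS = {
    "package#Network": set(),
    "package#Privacy": {"package#Network"},
    "package#Virtualisation": {"package#Network"},
}
PACKAGE = {"N1": "package#Network", "N2": "package#Network", "N3": "package#Network",
           "P1": "package#Privacy", "V1": "package#Virtualisation"}
POSITION = {"V1": 5, "N1": 10, "N2": 20, "P1": 30, "N3": 40}
```

For instance, `classify("V1", "P1", POSITION, PACKAGE, DEPS)` reports an unfulfilled create-then-match dependency, because the two packages sit in separate branches of the assumed package graph.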
With this approach, dependencies are not autogenerated if neither package depends on the other, i.e., if the two patterns are in distinct branches. A query was added to the Access DB domain model editor to find and display such 'unfulfilled' dependencies. To address an unfulfilled dependency without making changes to the involved construction patterns, there are three options:
Making package#5G depend on package#Virtualisation was the first step towards 'enabling' dependencies that would otherwise violate the package dependency hierarchy.
The test case 'Context and Clouds' includes a mobile user device (a smartphone) moving between the user's home and town (i.e., between private and public locations), a 5G network (with inferred infrastructure) serving both spaces, Bluetooth pair connections between the phone and both a PC fixed in a private location and an IoT sensor carried by the phone user to both locations, a USB pair connection between the PC and an external thumb drive used to store sensor data, a cloud data centre (with an inferred server cluster running an inferred K8S cloud framework) hosting containerised services, and routers in the user's home and in the data centre providing connections to the Internet. This test case therefore exercises many, though not all, of the domain model features (e.g., it does not include OAuth-style single sign-on authenticators, nor IoT controller devices). Running this test case using only autogenerated CP dependencies, after making package#5G depend on package#Virtualisation, it turns out that 15 inferred Host-subjectTo-Jurisdiction relationships are not created by pattern JuPHS+s. This happens because this pattern should be executed after any pattern that creates a Host or a Host-hasPhysicalHost-PhysicalHost relationship. The problem is that some of those patterns are in package#Virtualisation, package#5G or package#CloudManagement, while JuPHS+s is part of package#Legal, which does not depend on these packages, nor do they depend on package#Legal. As a result, it is not possible to add autogenerated dependency relationships between JuPHS+s and some of its predecessor patterns, as those dependency relationships would violate the declared package dependency hierarchy. It would not be appropriate for package#Privacy to depend on package#Virtualisation, package#5G or package#CloudManagement, because the concepts needed by package#Privacy are in package#DataLifecycle or the packages it depends on, including package#Users, package#Network, etc.
The need for privacy in a system should not depend on whether the system is deployed on virtual hosts or cloud data centres, nor on whether it uses 5G for communications. Actually, we already have one inappropriate dependency for package#Privacy (on package#IoT), but the plan is to find some way to remove this dependency, not add more inappropriate dependencies. At this stage we want to avoid making changes in the construction patterns, so (as described above) the only way to ensure these host-related construction patterns are executed before JuPHS+s is to insert a new pattern in (say) package#Network, and make it depend on each of the host-related patterns while making JuPHS+s depend on it. These dependencies can be added manually, or by including features in the new pattern from which they can be deduced, e.g., by including the Host-hasPhysicalHost-PhysicalHost relationship in the matching pattern and making the pattern create the same relationship (creating a duplicate, which therefore has no effect on the model).
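A small sketch of why the dummy pattern helps, assuming a simplified package graph in which package#Privacy and package#Virtualisation both depend on package#Network and neither depends on the other (an assumption for illustration, not a statement about the real domain model). JuPHS+s is placed in package#Privacy as in the commit message for this change, and `HostPattern` is a hypothetical stand-in for one of the host-related patterns. A direct dependency fails the check, but both hops via the intermediary pass.

```python
from typing import Dict, List, Set


def depends_on(pkg: str, other: str, deps: Dict[str, Set[str]]) -> bool:
    """True if `pkg` depends on `other`, directly or indirectly."""
    seen: Set[str] = set()
    stack = [pkg]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep == other:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


def encodable(pkg_a: str, pkg_b: str, deps: Dict[str, Set[str]]) -> bool:
    """A CP dependency can be stored if the packages coincide or one depends on the other."""
    return pkg_a == pkg_b or depends_on(pkg_a, pkg_b, deps) or depends_on(pkg_b, pkg_a, deps)


def chain_encodable(chain: List[str], package: Dict[str, str],
                    deps: Dict[str, Set[str]]) -> bool:
    """Check every consecutive pair of patterns in a dependency chain."""
    return all(encodable(package[a], package[b], deps) for a, b in zip(chain, chain[1:]))


# Assumed, simplified package graph (not the real domain model's).
DEPS = {
    "package#Network": set(),
    "package#Privacy": {"package#Network"},
    "package#Virtualisation": {"package#Network"},
}
# "HostPattern" is a hypothetical stand-in for a host-related pattern.
PACKAGE = {
    "HostPattern": "package#Virtualisation",
    "PHH+hPH": "package#Network",
    "JuPHS+s": "package#Privacy",
}
```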
…etwork to provide a bridge for a construction pattern dependency of JuPHS+s in package#Privacy on patterns in package#Virtualisation, package#5G or package#CloudManagement which can't be encoded directly without violating the asserted package dependency hierarchy.
The new 'dummy' pattern is PHH+hPH in package#Network, inserted just before the package#Network patterns that determine in which spaces a subnet is accessible, based on the locations of connected non-mobile Hosts. The Access DB editor CP dependency generation queries find that this depends on the host creation and physical host identification patterns in package#Virtualisation, package#5G and package#CloudManagement (among others), and that JuPHS+s depends on it. With this change, and using a modified csv2nq program, the 'Context and Clouds' test case produces the same results as with the original numbered construction sequence. A direct comparison between the NQ files after validation revealed that the only differences were (a) timestamps or references to the domain model version, (b) differences in the threatened asset for threats that in principle threaten multiple assets (so system modeller must make an arbitrary choice of which one to use), or (c) differences in the construction pattern responsible for creating some DataStep assets. The last of these arises because the sequence created by the modified csv2nq program is a partial sequence, so multiple patterns can have the same hasPriority value, leaving system modeller to decide in which order to apply them. The DataStep creation patterns don't depend on each other, but they overlap (more than one pattern could create the same asset). Which pattern is responsible depends on the order in which system modeller applies them, but the outcome does not.
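Because the ordering is only partial, several patterns can end up with the same sequence number. One way to compute such numbers is to give each pattern its longest-path depth in the dependency graph using Kahn's topological sort. This is a sketch, not necessarily what the modified csv2nq does; `rank_patterns` and the pattern names in the example are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def rank_patterns(patterns: Iterable[str],
                  edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Longest-path rank for each pattern; edges are (before, after) pairs.

    Patterns with no mutual ordering may share a rank. Every pattern named in
    `edges` must also appear in `patterns`. Raises ValueError on a cycle.
    """
    names: List[str] = list(patterns)
    succ: Dict[str, Set[str]] = defaultdict(set)
    indegree = {p: 0 for p in names}
    for before, after in set(edges):  # duplicate rows count once
        succ[before].add(after)
        indegree[after] += 1

    rank = {p: 0 for p in names}
    ready = deque(p for p in names if indegree[p] == 0)
    processed = 0
    while ready:  # Kahn's algorithm
        p = ready.popleft()
        processed += 1
        for q in succ[p]:
            rank[q] = max(rank[q], rank[p] + 1)
            indegree[q] -= 1
            if indegree[q] == 0:
                ready.append(q)

    if processed != len(names):
        raise ValueError("cycle in construction pattern dependencies")
    return rank
```

For example, with `Create` preceding two independent patterns `DS1` and `DS2`, both of which precede `Use`, the function gives `DS1` and `DS2` the same rank (1), which is exactly the situation where system modeller chooses the order.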
There are still 43 irreducible dependencies between CPs (i.e., dependencies that cannot be expressed as a chain of dependencies via an intermediary pattern) that are not encoded in the CP predecessor/successor properties. These are either:
The last category must be addressed by further patterns similar to PHH+hPH whose purpose is to act as intermediaries, as without this discrepancies could arise in other test cases. Ideally the last two categories should be so addressed, as otherwise there is a risk that discrepancies may appear if the domain model is changed in any way.
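Whether a dependency is irreducible can be checked mechanically: an edge from CP1 to CP2 is reducible if CP2 is still reachable from CP1 after ignoring that edge, i.e., some chain via intermediaries already implies it (a transitive reduction). A minimal Python sketch, assuming an acyclic graph given as (before, after) pairs; the function name and the pattern names `X`, `I`, `Y` are hypothetical. Adding an intermediary such as `I` is what turns the direct `X`-to-`Y` edge into a reducible one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def irreducible(edges: Iterable[Edge]) -> Set[Edge]:
    """Keep only edges not implied by a longer chain (transitive reduction of a DAG)."""
    edge_set = set(edges)
    succ: Dict[str, Set[str]] = defaultdict(set)
    for before, after in edge_set:
        succ[before].add(after)

    def reachable_indirectly(src: str, dst: str) -> bool:
        # Depth-first search from src that ignores the direct edge src -> dst.
        stack = [n for n in succ[src] if n != dst]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(a, b) for a, b in edge_set if not reachable_indirectly(a, b)}
```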
Dependency between pattern CtHhPH+hPH (package#CloudManagement) and patterns MPCNS+aF and FCNS+aF (package#5G): this arises because the 5G patterns determine the accessibility of a cellular network based on the locations of fixed physical hosts that are connected to it, or that run virtual hosts connected to it. The reason this hasPhysicalHost dependency is not addressed by PHH+hPH is that the intermediate pattern was inserted after MPCNS+aF and FCNS+aF. This is necessary because the accessibility relationships they create are later used to infer the existence of more physical hosts comprising the 5G infrastructure (if not already present). PHH+hPH must come after those patterns, so it can't be inserted before MPCNS+aF and FCNS+aF. This does not affect the 'Context and Clouds' test case because, although it includes cloud-hosted VMs and a 5G network, none of the former are connected to the latter. Such a connection could occur, but a test case including it would be even more complex, so no such test was created. Instead, possible fixes were tested by checking that these dependencies are no longer found by the 'unfulfilled dependencies' query in the Access DB editor. The simple fix is to add a copy of PHH+hPH just before MPCNS+aF and FCNS+aF. The original was renamed PHH+hPH-2, and the new copy named PHH+hPH-1, so the names reflect their relative positions in the construction sequence. A better solution would be to move CtHhPH+hPH further up the construction sequence, so it is followed by a pattern that can be linked directly with MPCNS+aF and FCNS+aF without violating the package dependency hierarchy. This was not attempted as the first solution because it would be difficult to ensure that changes to the original sequence would not cause new discrepancies and so make it impossible to run tests that use back-to-back comparisons.
The next dependency is between package#5G patterns CNRS+aF and CNRANBSS+aF, both of which deduce that the backbone and radio access networks are accessible from the locations of the providing gateway routers (base stations in the case of the RANs), and the subsequent package#LocalDeviceConnectivity patterns L1cGcL3+NSg and L3cGcL1+CSg, which create routes from a device to an L2/L3 subnet through a gateway paired with the device via Bluetooth/USB (the pairing connection being modelled as an L1-only subnet). This dependency violates the package hierarchy because package#LocalDeviceConnectivity does not depend (directly or indirectly) on package#5G, nor vice versa. To solve this one, a dummy package#Network pattern LSaS+a can be added between CNRANBSS+aF and L1cGcL3+NSg, in which the link LogicalSubnet-accessibleFrom-Space is matched and duplicated. The best position for this new pattern is at the start of the network connectivity (gateway routing) asset inference sequence, because that way it follows other patterns that deduce network accessibility, such as FpLSS+aF and FcLSS+aF, and provides a boundary between accessibility and routing that can be used by any future extensions, e.g., for new types of networks. This was tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returned dependencies between package#5G patterns CNRS+aF & CNRANBSS+aF and package#LocalDeviceConnectivity patterns L1cGcL3+NSg & L3cGcL1+CSg.
…SS+aF and L1cGcL3+NSg, so the dependency between those patterns can be resolved without violating the asserted package hierarchy.
Next up is a dependency between package#IoT pattern DcTh+s and package#LocalDeviceConnectivity pattern USBD-S+S. The former adds a Host-stores-Data relationship between a Thing and its control input (which should normally be stored because it affects the behaviour of the Thing between updates). The latter adds an onboard DataService to a USB device if it stores any Data, so it depends on asserted or created Host-stores-Data relationships. To solve this, a dummy package#Network pattern HD+s was added before USBD-S+S. This matches a Host-stores-Data relationship and duplicates the 'stores' relationship, so there is a create-then-match dependency between DcTh+s and HD+s, and a match-after-create dependency between USBD-S+S and HD+s, neither of which violates the package dependency hierarchy. This was tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returns dependencies between DcTh+s and USBD-S+S.
…lowing the dependency between DcTh+s and USBD-S+S (which violates the package hierarchy) to be represented in terms of dependencies on HD+s.
Next is a set of dependencies between various package#IoT patterns and package#ProcessComms patterns. The IoT patterns create Process-uses-Process relationships between onboard IoT processes and external processes that communicate with the IoT device. The direction of the uses relationship indicates whether communication is initiated by the IoT device or by the external process. The problem here is that the IoT package depends on the Network package (which covers process-process relationships) but not on the ProcessComms package (which inserts inferred assets representing process-process relationships, and handles aspects such as authentication, authorisation and the use of communication proxies). A direct IoT-ProcessComms dependency (in either direction) therefore violates the package dependency hierarchy. This can be addressed by adding a package#Network pattern CS+u which detects a Process-uses-Process relationship and makes a copy of it. Inserting this before the ProcessComms inference sequence (i.e., before SPuS+U) allows the dependencies between IoT and ProcessComms patterns to go via CS+u, without violating the package dependency hierarchy. This was tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returns direct dependencies between the IoT and ProcessComms patterns.
…+U in package#ProcessComms, allowing dependencies between package#IoT patterns and package#ProcessComms patterns to go via CS+u without violating the package dependency hierarchy.
Next come dependencies between various package#IoT patterns and package#LocalDeviceConnectivity patterns involving USB storage devices. (Arguably, these patterns should also apply to storage devices paired via Bluetooth, but that is a separate issue.) The problem here is that the USB storage patterns check for processes that handle data (i.e., access data for any reason, whether for the purpose of processing or merely to move data between other processes). This means they depend on IoT patterns which create relationships between onboard IoT processes and data, and between data and external processes that send control inputs to, or use sensor output from, an IoT device. An IoT device may have a USB connection to some host gateway device, so this represents a real dependency that must be reflected in the construction order. This can be addressed by adding a package#Application pattern PD+h that detects a Process-handles-Data relationship and adds a duplicate relationship. Inserting this between the IoT and LocalDeviceConnectivity patterns allows their dependencies to go via PD+h, without violating the package dependency hierarchy. This was tested using the 'unfulfilled dependency' query in the Access DB editor, which no longer returns direct dependencies between the IoT and LocalDeviceConnectivity patterns.
…age inference patterns, allowing dependencies between these and IoT patterns to go via PD+h without violating the package dependency hierarchy.
The remaining 'unfulfilled' construction pattern dependencies are all 'fake' or 'unrealistic':
The fake dependencies in the current domain model (on branch 169) are as follows:
In some of these cases, it is possible to remove the apparent dependency by specifying that one of the assets in one or other of the related patterns is of a more restricted class than currently specified. For example, the LogicalSubnet in K8S-related patterns can't be an L1 subnet (Bluetooth or USB pairing connection), so we could specify that the subnet is an L23Subnet in those patterns. Such changes have not been made at this stage; for the purposes of this issue, it is enough to know that these apparent dependencies can be ignored.
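The 'at least one common asset subclass' test from the link creation criterion is what makes such a narrowing effective. A sketch, assuming the class hierarchy is given as a map from each class to its direct subclasses and treating a class as one of its own subclasses; `L1Subnet` is a hypothetical name for the L1 subnet class, and the function names are illustrative only.

```python
from typing import Dict, Set


def subclasses(cls: str, children: Dict[str, Set[str]]) -> Set[str]:
    """`cls` together with all of its descendants (reflexive, transitive)."""
    out: Set[str] = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        if c not in out:
            out.add(c)
            stack.extend(children.get(c, ()))
    return out


def compatible(cls_1: str, cls_2: str, children: Dict[str, Set[str]]) -> bool:
    """Two nodes are compatible if their classes share at least one subclass."""
    return bool(subclasses(cls_1, children) & subclasses(cls_2, children))


# Hypothetical fragment of the asset class hierarchy ("L1Subnet" is an assumed name).
CHILDREN = {"LogicalSubnet": {"L1Subnet", "L23Subnet"}}
```

With this hierarchy, a node typed LogicalSubnet is compatible with an L1Subnet node, so a dependency would be inferred, whereas narrowing it to L23Subnet makes the classes disjoint and the apparent dependency disappears.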
Separate issues have been added (#172, #173 and #174) describing construction pattern changes needed to remove apparent dependencies that are not valid. These changes should eliminate fake dependencies 1, 2, 4, 5 and one of the two dependencies in 6. Addressing these issues would not change the real CP dependencies, of course. It just removes apparent dependencies that may otherwise be rediscovered and cause confusion in future. We probably also need a way to record dependencies that are not valid and add comments explaining why they can be ignored, or a way to exclude dependencies that are definitely fake from the existing 'unfulfilled dependency' query in the Access DB editor, and from other tools we might develop in future to assist with dependency management and composability.
Finally, there are some dependencies that are not definitively 'fake' but are nevertheless unrealistic:
These can be ignored for now, but ideally we would have a way to acknowledge them and document the reason for ignoring them.
At this point, following commit fe32abd, the new tables ConstructionPredecessor.csv and ConstructionSuccessor.csv in branch 169 contain the following fields:
The first three are self-evidently necessary, and are used by csv2nq to determine a partial ordering and insert it into the hasPriority field in ConstructionPattern.csv (overriding the existing value, if any). The 'note' field was added so users (or other tools) can insert the reason why the second pattern should precede or follow the first. At present, all rows in these tables were derived from create-then-match or create-after-match dependencies in which an asset or a link is created in one pattern that would match another pattern, so the extraction query inserts this information in the 'note' field. It is possible that several distinct assets/links may be created by one pattern and matched by another, so in some cases this produces duplicate rows that differ only in the 'note' field which refers to a different asset/link in each case. That isn't a problem for csv2nq as it creates a single, temporary table containing the successors of each URI, combining the predecessor and successor tables and also combining rows that refer to the same pair of construction patterns. The 'asserted' flag is a temporary measure designed to support the process of extracting dependencies from an existing sequence. Some features were added to the Access DB editor to support the process (as described above):
Once the new patterns are added, new irreducible dependencies will be found involving them, and some old dependencies that were previously irreducible will now be equivalent to chains involving these new dependencies. Starting again therefore doesn't just add more irreducible predecessor and successor relationships; it also removes some of the old ones (as well as eliminating some unfulfilled dependencies). Because of this, the Access DB editor clears the old predecessor and successor tables before creating new ones. However, this should not be done for entries that were added manually, because it may not be possible to recalculate them from construction pattern content. The 'asserted' flag is used to exclude these dependencies from being deleted. For example, the user may know that one subsequence can follow another, but each pattern in one depends only on some patterns in the other sequence. A partial sequence generated from those dependencies would include patterns from both sequences mixed together, which may make it harder for different people to maintain each sequence. A user might fix this by adding a pattern in each sequence and some asserted dependencies:
The two sequences would now remain separate. All dependencies between patterns in different sequences would be 'reducible' to chains going via the two new patterns. The maintainer of one sequence would not need to refer to patterns in the other. The new patterns may need to be very simple so they don't have other dependencies, which means it may not be possible to deduce those dependencies they do have from their contents. The 'fake' flag is used by the Access DB editor (but not yet csv2nq) to filter dependencies before using them, e.g. to compute the partial construction sequence. The idea was to allow a user to insert a dependency that may violate the package dependency tree but mark it as 'fake' and explain in the 'note' field why the dependency should be ignored. In practice, neither the 'asserted' nor 'fake' fields have yet been used because everything now at the head of branch 169 is based on extracted dependencies derived from construction pattern contents. It may be possible and/or sensible to remove these fields later, once all relevant domain models have been converted from the old sequence to the new dependency approach.
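The interplay of the 'asserted' and 'fake' flags with regeneration can be sketched as follows. This is an illustration under stated assumptions, not the Access DB editor's or csv2nq's code: rows from both tables are assumed to be normalised into (before, after) pairs, and the `Dependency` field names and the function names are hypothetical. Asserted rows survive regeneration, fake rows are dropped before sequencing (as the editor does, though csv2nq does not yet), and rows that differ only in their notes merge into one edge, much as csv2nq combines rows for the same pair of patterns in its temporary successor table.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Dependency:
    """One row, normalised so `before` must run before `after` (hypothetical fields)."""
    before: str
    after: str
    notes: str = ""
    asserted: bool = False
    fake: bool = False


def regenerate(existing: Iterable[Dependency],
               extracted: Iterable[Dependency]) -> List[Dependency]:
    """Discard previously extracted rows, keep asserted ones, add a fresh extraction."""
    return [d for d in existing if d.asserted] + list(extracted)


def successor_map(deps: Iterable[Dependency]) -> Dict[str, Set[str]]:
    """Drop fake rows, then merge rows for the same pair of patterns.

    Rows that differ only in `notes` collapse into one edge because
    successors are held in sets.
    """
    succ: Dict[str, Set[str]] = defaultdict(set)
    for d in deps:
        if not d.fake:
            succ[d.before].add(d.after)
    return dict(succ)


# Hypothetical example rows.
old = [
    Dependency("PatternA", "PatternB", notes="asserted bridge", asserted=True),
    Dependency("PatternA", "PatternC", notes="creates Host"),
]
new = [
    Dependency("PatternA", "PatternD", notes="creates Host"),
    Dependency("PatternA", "PatternD", notes="creates Host-hasPhysicalHost-PhysicalHost"),
    Dependency("PatternX", "PatternY", notes="ignore: cannot occur", fake=True),
]
merged = regenerate(old, new)
SUCCESSORS = successor_map(merged)
```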
…D+s, and creates parallel/opposed link flags for patterns CS+u, HD+s and PD+h.
…d ConstructionSuccessor.csv, so git diff and git merge can work more effectively.
One more issue. I used 'note' as the explanation field name in the ConstructionPredecessor.csv and ConstructionSuccessor.csv tables. It turns out that 'note' is a reserved keyword in MS Access SQL (though not in the SQL standard ISO/IEC 9075:2023), and this causes problems when reimporting the tables in the MS Access DB editor. I changed the field name 'note' to 'notes' to fix this, which involved finding and changing references to the field in several places in the MS Access DB editor.
…ructionPredecessor.csv and ConstructionSuccessor.csv tables. Needed because 'note' is a reserved keyword in the Microsoft Access variant of SQL.
An empty version of the MS Access DB editor is now on Sharepoint as 'Domain Modeller - v10-3-1 - Sequencer.accdb'. This is a version designed to support the task of:
After export, assuming no other edits have been made, the only changes should be the addition of tables ConstructionPredecessor.csv and ConstructionSuccessor.csv containing the inferred dependencies, and a change in the order in ConstructionPattern.csv. Until now, this was sorted on the hasPriority field, but this 'sequencer' version of the database sorts on package membership then URI (same as most other tables), so that rows belonging to one package stay together. In this version of the MS Access DB editor, the inferred ConstructionPredecessor and ConstructionSuccessor entries are stored as tables CP_ConstructionPredecessor and CP_ConstructionSuccessor. When exporting the database, these are exported as ConstructionPredecessor.csv and ConstructionSuccessor.csv. When importing again, the data is stored in tables ConstructionPredecessor and ConstructionSuccessor, but entries not marked as 'asserted' are deleted. One must regenerate CP dependencies to repopulate CP_ConstructionPredecessor and CP_ConstructionSuccessor before one can export again. The ConstructionPatternEntry form now allows 'asserted' predecessor and successor references to be added or deleted. These are stored in tables ConstructionPredecessor and ConstructionSuccessor. One must then regenerate CP dependencies to repopulate CP_ConstructionPredecessor and CP_ConstructionSuccessor, during which any 'asserted' dependencies are copied from ConstructionPredecessor and ConstructionSuccessor into CP_ConstructionPredecessor and CP_ConstructionSuccessor, with the asserted flag set to true. On export, these flags are retained, so they will be loaded again into ConstructionPredecessor and ConstructionSuccessor and not deleted. This version of the database includes a script to calculate a sequence number or rank for each construction pattern in the partial ordering specified by tables CP_ConstructionPredecessor and CP_ConstructionSuccessor. 
The results are stored in ConstructionPatternRank, which gives the old 'hasPriority' value and the new sequence number for each construction pattern. This is used to check that all patterns got a sequence number, and to find whether two patterns with an unfulfilled dependency would run in the wrong order if the dependency is not added. There is a query 'CP_UpdateConstructionPatternPriorities' which replaces the old 'hasPriority' value with the computed sequence number. This query is not run automatically by any of the CP dependency generation functions available from the 'Construction Dependencies' form. The idea is that the old sequence should be preserved so any changes can be made with that as a baseline, and new CP dependencies extracted by regenerating them. However, the query is provided so it can be launched manually to insert the new sequence numbering in the ConstructionPattern table. If this is done, and the model is then exported to CSV files, the results can be processed by the version of csv2nq from the 'main' branch, which is the version used in the domain model CI pipeline. This provides a way to create a release that can be processed by the current CI pipeline, if needed. The other option would be to export the model 'as is' (with the old ConstructionPattern hasPriority values), and use the new csv2nq (branch 7) to compute the sequence during the translation to NQ. We can't use that version of csv2nq in the CI pipeline until all future releases have the new ConstructionPredecessor.csv and ConstructionSuccessor.csv tables. We need a new version of the database to fully support the new approach. With such a version, all construction pattern dependencies would be imported from ConstructionPredecessor.csv and ConstructionSuccessor.csv, edited in place, and exported again. There would be no ConstructionPattern hasPriority field, and conversion to NQ would be done using the new version of csv2nq.
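The two checks made with ConstructionPatternRank can be expressed as a small function. A sketch with hypothetical names, assuming `rank` maps each pattern to its computed sequence number and unfulfilled dependencies are given as (before, after) pairs; pairs with equal ranks are reported separately because system modeller may apply them in either order.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def check_ranks(patterns: Iterable[str], rank: Dict[str, int],
                unfulfilled: Iterable[Pair]) -> Tuple[List[str], List[Pair], List[Pair]]:
    """Report unranked patterns and unfulfilled (before, after) pairs at risk.

    'wrong' pairs would run in the wrong order; 'tied' pairs share a rank,
    so system modeller could apply them in either order.
    """
    missing = [p for p in patterns if p not in rank]
    ranked = [(a, b) for a, b in unfulfilled if a in rank and b in rank]
    wrong = [(a, b) for a, b in ranked if rank[a] > rank[b]]
    tied = [(a, b) for a, b in ranked if rank[a] == rank[b]]
    return missing, wrong, tied
```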
…extracted CP dependencies after fixing bugs in the extraction queries. Issue #177: Changed package interdependencies so package#Users (and hence all other packages) now depends on package#Core.
Construction patterns are rules for adding inferred assets and/or relationships to a system model. They fill in details that users may be unable to provide, or may forget to provide, where they are deducible (given some assumptions).
System modeller applies construction patterns one at a time. Patterns flagged as iterative are repeated until no further changes occur, before moving on to the next pattern in the sequence. The sequence is specified via an integer property core#hasPriority.
As may be familiar to older folk, this priority number acts like a BASIC program line number. As in BASIC, best practice is to leave gaps between line numbers, so one starts at line 10 or 1000, and the next line is 20 or 1100 or 2000. The idea is to leave some line numbers available between the lines in case new lines need to be inserted. If that happens, the new line can be given a number without changing the old line numbers.
Of course, eventually some of the gaps get filled in and one has to renumber from the top. That creates problems for a diff-based version tracking system like Git, because every line in the program has to change.
It also makes modularisation more difficult to achieve. Although the source tables are not split into modules, we do use a core#Package property as a sort index in most tables, so lines pertaining to a given module are kept together. While we don't have separate sources for each module, the sources we do have are equivalent to what one would get if separate source files per module were concatenated.
The exception is table ConstructionPattern.csv, which for reasons of readability is sorted on the priority index core#hasPriority. Modules are interleaved with each other, and adding a new module is doubly difficult because one must figure out where in the sequence each of its construction patterns should go, and then adjust the core#hasPriority properties of patterns on either side.
The proposal is that we remove these 'line numbers' from ConstructionPattern.csv. Instead, we should introduce two new tables called something like 'Predecessor.csv' and 'Successor.csv', containing core#hasPredecessor and core#hasSuccessor properties of construction patterns.
The reason for using two tables rather than one is so the relationships between construction patterns respect package dependencies. If package B depends on package A, a construction pattern from package A would not refer to package B patterns at all. Patterns from B would refer to patterns in A, so each pattern in B could specify its position in the sequence by referring to patterns in A on either side of the position it should have.
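The rule that patterns only refer to patterns in their own package, or in packages their package depends on, can be checked mechanically. A minimal sketch with hypothetical package and pattern names, assuming references are (pattern, referenced pattern) pairs drawn from both hasPredecessor and hasSuccessor, and that the package graph maps each package to its direct dependencies:

```python
from typing import Dict, Iterable, List, Set, Tuple


def ancestors(pkg: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """All packages that `pkg` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(deps.get(pkg, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(deps.get(p, ()))
    return seen


def reference_violations(refs: Iterable[Tuple[str, str]],
                         package: Dict[str, str],
                         deps: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return references that point 'downwards' in the package hierarchy.

    A reference is allowed only to a pattern in the same package or in a
    package that the referring pattern's package depends on.
    """
    bad = []
    for pattern, target in refs:
        own, other = package[pattern], package[target]
        if own != other and other not in ancestors(own, deps):
            bad.append((pattern, target))
    return bad


# Hypothetical example: package#B depends on package#A.
DEPS = {"package#A": set(), "package#B": {"package#A"}}
PACKAGE = {"a1": "package#A", "a2": "package#A", "b1": "package#B"}
```

Here pattern b1 may place itself between a1 and a2 by naming one as its predecessor and the other as its successor, but a1 may not refer to b1.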
Of course, with this approach it would be possible to specify a partial ordering only, leaving system modeller to decide which pattern to process first in some cases.
Ideally, system modeller would use the core#hasPredecessor and core#hasSuccessor properties to work out the order for itself. This is not essential, though, because it should be possible to insert code into csv2nq that can figure out the sequence and insert the calculated values of core#hasPriority into the RDF for deployment to system modeller. The sequence is encoded by precedence relationships in the source code, but in the 'compiled' version this is converted to a numerical order.
Proposals for these changes in csv2nq are covered in csv2nq#7.