Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for setting DFDL external variables #16

Merged

Conversation

stevedlawrence
Copy link
Member

The name/value of NiFi dynamic properties are now treated as DFDL external variables. Property values are allowed to be NiFi expressions, which are evaluated and set as the variable values. If a property value evaluates to the empty string, it is ignored--this helps when a NiFi expression determines that a variable does not apply for a schema, since DFDL requires that all variables passed in externally must be valid for the schema. This does mean it is not possible to have a variable with the value of an empty string, but this should be rare and worked around.

This also refactored exceptions, so that functions throw correct exceptions instead of turning everything into an IOException, which worked but is not technically correct.

Also added a new "External Variables" section to additional details pages documening this capability, and cleaned up the additional details pages.

Copy link
Collaborator

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
I have suggestions ti improve phrasing in help text and doc. That's all though.

// ignored and is not added as a variable. This supports expressions that determine that
// a variable does not apply to a schema and to ignore it, since all varibles passed to
// withExternalVariables must be valid for that schema.
final HashMap<String, String> variableMap = new HashMap<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I harp on this I know. Please use "LinkedHashMap" for deterministic behavior.

@@ -51,6 +53,12 @@
@Tags({"xml", "json", "daffodil", "dfdl", "schema", "xsd"})
@CapabilityDescription("Use Daffodil and a user-specified DFDL schema to transform data to an infoset, represented by either XML or JSON.")
@WritesAttribute(attribute = "mime.type", description = "Sets the mime type to application/json or application/xml based on the infoset type.")
@DynamicProperty(
name = "Name of external variable defined in a DFDL schema",
value = "Value of the external variable. Variable is ignored if value evaluates to the empty string",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe say,

value = "Value to set for the DFDL external variable. May be an expression. The DFDL variable is not set if the value expression evaluates to an empty string. ",

Presumably it is clear what the expression language is from the context?

@@ -48,6 +50,12 @@
@Tags({"xml", "json", "daffodil", "dfdl", "schema", "xsd"})
@CapabilityDescription("Use Daffodil and a user-specified DFDL schema to transform an XML or JSON representation of data back to the original data format.")
@WritesAttribute(attribute = "mime.type", description = "If the FlowFile is successfully unparsed, this attriute is removed, as the MIME Type is no longer known.")
@DynamicProperty(
name = "Name of external variable defined in a DFDL schema",
value = "Value of the external variable. Variable is ignored if value evaluates to the empty string",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same rephrase as for parser.

<p>
Some DFDL schemas may require the use of plugins, such as layers, user defined functions, or custom
character encodings. To make these available to NiFi Daffodil processors, these plugins can be compiled to
jars, and paths to the jars or parent directory may be added to the <tt>Plugins and Schemas</tt> property. For
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe say "paths to individual jars or to a parent directory containing one or more jars"

DFDL Schema File: /com/example/foo.dfdl.xsd
Plugins and Schemas: /usr/share/nifi-daffodil/schemas/
</pre></code>
<p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Draw people's attention to that leading slash like
"In the above notice how the paths begin with a '/' which denotes the file is to be found at that location inside a jar file. "

</dd>
<dt>Cache TTL after last access</dt>
<dd>
<p>
Defines the cache time-to-live, or how long to keep an unused compiled DFDL schemas in the cache. To avoid
compilation, it is recommended that this should be larger than the amount of time it is expected
for a DFDL schema to be unused.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase? I didn't understand how to set this.
Maybe describe by example: if a schema is used occasionally (once a day perhaps) then set this to 24 hours to avoid recompiling it each time it is used.

You also need to say what are the units of measure for this time limit. Seconds, minutes, hours?

You also need to say what happens if it is set to zero, i.e., what is the acceptable range of values.

Does this have a useful default, such that people are unlikely to need to change this? If so you should say that this defaults to N which works for most situations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

It turns out the default is 30 minutes, which feels like it might be too short? In fact, I wonder if we should default to zero seconds, which disables cache expiration? I imagine it could be very confusing to see random hiccups if we have to recompile a schema.

@@ -110,16 +122,20 @@ <h2>Compiled DFDL Schema Cache</h2>
<dl>
<dt>Cache Size</dt>
<dd>
<p>
Defines the maximum number of DFDL schemas that can be compiled and saved for rapid reuse from the
cache. To avoid compilation, it is recommended that this value be larger than the expected number
of possible DFDL schemas. Setting this value to 0 disables the cache, though this is not
recommended.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have a useful default, such that people probably don't need to change this? Then you should say that to discourage people from frobbing this without reason.

@@ -111,16 +123,20 @@ <h2>Compiled DFDL Schema Cache</h2>
<dl>
<dt>Cache Size</dt>
<dd>
<p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw this same thing above. Does it really need to be repeated textually? Gaaak.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, NiFi doesn't have a way to reduce this duplication. And we can't do a symlink or something because the first paragraph and title are specific to parse vs unparse.

The name/value of NiFi dynamic properties are now treated as DFDL
external variables. Property values are allowed to be NiFi expressions,
which are evaluated and set as the variable values. If a property value
evaluates to the empty string, it is ignored--this helps when a NiFi
expression determines that a variable does not apply for a schema, since
DFDL requires that all variables passed in externally must be valid for
the schema. This does mean it is not possible to have a variable with
the value of an empty string, but this should be rare and worked around.

This also refactored exceptions, so that functions throw correct
exceptions instead of turning everything into an IOException, which
worked but is not technically correct.

Also added a new "External Variables" section to additional details
pages documening this capability, and cleaned up the additional details
pages.
@stevedlawrence stevedlawrence merged commit 67ca394 into TresysTechnology:master Jan 29, 2024
6 checks passed
@stevedlawrence stevedlawrence deleted the variable-support branch January 29, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants