Chapter 2. Configure MorganaXProc-III
1. Setting the configuration for MorganaXProc-III
Using the command line interface you can also set specific configurations for internal features of MorganaXProc-III. These controls can appear anywhere on the command line after the specification of the pipeline to run.
1.1. Setting the catalog resolver
MorganaXProc-III uses XMLResolver (developed by Norm Tovey-Walsh) as its catalog system to resolve resources used in your pipeline. Use -catalogs=uri-or-path to set the initial XML catalogs as a semicolon-separated list of URIs or paths. If a path or a URI is relative, it is resolved against the current working directory.
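The splitting and resolution rule can be sketched as follows. This is a minimal Python model of the behaviour described above, not MorganaXProc-III's own code. The function name resolve_catalogs, the example paths, and the decision to skip empty list entries are assumptions made for illustration. POSIX-style paths are used so that the result does not depend on the platform.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse


def resolve_catalogs(value: str, cwd: PurePosixPath) -> List[str]:
    """Split a -catalogs value on ';' and resolve relative entries against cwd."""
    resolved = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # assumption: empty segments (e.g. a trailing ';') are skipped
        if urlparse(entry).scheme:
            resolved.append(entry)  # already an absolute URI, kept unchanged
        else:
            # relative (or absolute) file path: anchor it at the working directory
            resolved.append((cwd / entry).as_uri())
    return resolved


print(resolve_catalogs("catalogs/main.xml;http://example.org/cat.xml",
                       PurePosixPath("/work")))
```

On a shell command line the whole value usually has to be quoted, because most shells treat an unquoted semicolon as a command separator.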
1.2. Selecting the XSLTConnector
MorganaXProc-III lets you choose which XSLT processor is used in p:xslt. For each supported XSLT processor MorganaXProc-III provides a connector, which needs to be registered before it can be used. Only one connector can be registered for a specific pipeline run. To register a connector use -xslt-connector=java-classname-or-shortcut.
Currently the following XSLT processors are supported:
processor | java classname | shortcut |
---|---|---|
Saxon 9.9 | com.xml_project.morganaxproc3.saxon99connector.Saxon99XSLTConnector (Recommended to be used with Saxon 9.9.1.8) | saxon99 |
Saxon 10 | com.xml_project.morganaxproc3.saxon10connector.Saxon10XSLTConnector (Recommended to be used with Saxon 10.8) | saxon10 |
Saxon 11 (please see note below) | com.xml_project.morganaxproc3.saxon11connector.Saxon11XSLTConnector (Recommended to be used with Saxon 11.4) | saxon11 |
Saxon 12, 12.1 and 12.2 (please see note below) | com.xml_project.morganaxproc3.saxon12connector.Saxon12XSLTConnector (Recommended to be used with Saxon 12.2) | saxon12 |
Saxon 12.3 or later (please see note below) | com.xml_project.morganaxproc3.saxon12_3connector.Saxon12_3XSLTConnector (Recommended to be used with Saxon 12.3 or later) | saxon12-3 |
Any JAXP compliant processor | Any processor extending [...]. Of course you have to make sure that the relevant Java classes are on Java's classpath. | jaxp |
Currently Saxon 10 is used as the default setting, so if you do not set an XSLTConnector, this version will be used. Please keep in mind that you have to put the requested XSLT processor on the Java classpath too. The easiest way is to just drop the JAR file into the folder “MorganaXProc-IIIse_lib” or “MorganaXProc-IIIee_lib” respectively. As the classes in different versions of an XSLT processor typically overlap, it is in general risky to have different versions of the same processor on the classpath. MorganaXProc-III will use the first Saxon version found on the classpath.
Another restriction when configuring the XSLTConnector is that MorganaXProc-III will always use the same instance of a Saxon processor if Saxon is used for both p:xslt and p:xquery. Therefore it is not possible to use Saxon 9.9 for XSLT processing and Saxon 10 for XQuery processing or the other way around.
Saxon 11 and later in p:xslt and p:xquery:
The W3C specification requires that if document-uri(D) = U, then doc(U) is D. A consequence of this rule is that two different documents cannot have the same document URI. This rule is strictly enforced by Saxon 11.
This has consequences for p:xslt and p:xquery steps in an XProc 3.0 pipeline using an underlying Saxon 11 processor. XProc 3.0 does allow two different documents to have the same URI. Moreover, it effectively forces different documents to have the same URI, as all inline-created documents (in general) get their base URI from the base URI of the surrounding pipeline. Using these documents as initial match selection, global context item, and/or default collection forces Saxon to raise an error.
Norm Tovey-Walsh invented a mechanism to work around this problem by making document URIs unique: Each document supplied to Saxon 11 (either via p:xslt or via p:xquery) gets a unique id by adding a special query parameter to the document's base URI. MorganaXProc-III follows this strategy: The document URI is made unique (where necessary) by adding a query parameter named “xproc_unique” associated with an increasing integer value.
This seems to be a good solution for now. The query parameter may be added in places where it is not necessary, but time will tell. One place where the query parameter is necessary due to Saxon 11's internal mechanism is providing the same document twice as part of the default collection. Saxon 11 raises an error here, so a query parameter has to be added. This is probably not the best solution, but it works for now, occurs only in a special situation, and does not interfere much with normal operation.
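To make the idea concrete, here is a small Python sketch of appending an “xproc_unique” query parameter with an increasing integer to a document URI. It only models the strategy described above and is not the code used by MorganaXProc-III or by Norm Tovey-Walsh's implementation. The class name UniqueUris, the starting value 1, and the example URI are assumptions.

```python
from itertools import count
from urllib.parse import urlsplit, urlunsplit


class UniqueUris:
    """Hands out URIs that differ only in an xproc_unique=<n> query parameter."""

    def __init__(self) -> None:
        self._next = count(1)  # assumption: the counter starts at 1

    def make_unique(self, document_uri: str) -> str:
        parts = urlsplit(document_uri)
        extra = f"xproc_unique={next(self._next)}"
        # keep an existing query string and append the new parameter to it
        query = f"{parts.query}&{extra}" if parts.query else extra
        return urlunsplit(parts._replace(query=query))


uris = UniqueUris()
first = uris.make_unique("http://example.org/doc.xml")
second = uris.make_unique("http://example.org/doc.xml")
print(first)
print(second)
```

Two documents that started with the same base URI end up with different document URIs, which is what the rule document-uri(D) = U implies doc(U) is D requires.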
1.3. Selecting the XQueryConnector
With -xquery-connector=java-classname-or-shortcut you can select the XQuery processor to be used in p:xquery in the same way an XSLTConnector is selected for p:xslt. Currently the following connectors are supported:
processor | java classname | shortcut |
---|---|---|
Saxon 9.9 | com.xml_project.morganaxproc3.saxon99connector.Saxon99XQueryConnector (Recommended to be used with Saxon 9.9.1.8) | saxon99 |
Saxon 10 | com.xml_project.morganaxproc3.saxon10connector.Saxon10XQueryConnector (Recommended to be used with Saxon 10.8) | saxon10 |
Saxon 11 | com.xml_project.morganaxproc3.saxon11connector.Saxon11XQueryConnector (Tested with Saxon 11.4) Please keep in mind that Saxon 11 only supports XQuery 3.1. Therefore using [...] | saxon11 |
Saxon 12, 12.1 and 12.2 | com.xml_project.morganaxproc3.saxon12connector.Saxon12XQueryConnector (Tested with Saxon 12.2) Please keep in mind that Saxon 11 and later only support XQuery 3.1. Therefore using [...] | saxon12 |
Saxon 12.3 or later | com.xml_project.morganaxproc3.saxon12_3connector.Saxon12_3XQueryConnector (Tested with Saxon 12.3 and 12.4) Please keep in mind that Saxon 11 and later only support XQuery 3.1. Therefore using [...] | saxon12-3 |
Currently Saxon 10 is used as the default. Please see the hints on Saxon as XSLTConnector for further configuration details.
1.4. Loading configuration for XSLT- and XQuery processors
Some XSLT and XQuery processors accept configuration files to control their settings. With -xslt-config=uri-or-path and -xquery-config=uri-or-path you can provide such files. MorganaXProc-III does not do anything with these files but just passes them through to the XSLT or XQuery processor when it is instantiated. The nature of a file and the semantics of its content are therefore completely determined by the receiving product. Please see its documentation.
As stated above, MorganaXProc-III will use only one processor instance if you choose to do both p:xslt and p:xquery with Saxon. If you do so, please make sure to give the same configuration file for -xslt-config and -xquery-config. As only one Saxon instance is created, only one of the two files will be used, and it is not predictable whether the instance will be created by a p:xslt or a p:xquery step.
1.5. Selecting the Schematron processor
Using -schematron-connector=java-classname-or-shortcut you can select the Schematron processor used in p:validate-with-schematron. Currently the following Schematron processors are supported:
processor | java classname | shortcut |
---|---|---|
SchXSLT | com.xml_project.morganaxproc3.validation.support.SchXSLTAdapterForSchematron | schxslt |
SchXSLT2 | com.xml_project.morganaxproc3.validation.support.SchXSLT2AdapterForSchematron | schxslt2 |
Skeleton XSLT implementation | com.xml_project.morganaxproc3.validation.support.ISOSkeletonAdapterForSchematron | skeleton |
Currently the SchXSLT connector is the default. Please note that MorganaXProc-III only provides connectors to these implementations, but not the implementations themselves. If you want to use p:validate-with-schematron in your pipeline, you have to download an implementation yourself and make its path known to MorganaXProc-III. For details please see the instructions on configuring Schematron implementations.
1.6. Selecting the XML Schema validator
MorganaXProc-III comes with out-of-the-box support for XML Schema 1.0 validation using the Xerces implementation supplied with Java. However, this validator does not know anything about XML Schema 1.1, so if attribute version on p:validate-with-xml-schema is set to 1.1 you will get an error message saying the connector is not capable of this type of validation.
MorganaXProc-III offers support for validation with XML Schema 1.1 using either Xerces (xerces-2_12_1-xml-schema-1.1) or Saxon-EE. If you have already installed Saxon-EE for XSLT transformation or p:xquery, MorganaXProc-III will automatically select it as soon as validation with XML Schema 1.1 is invoked. If you want to use Xerces, download the package from the Xerces website and make it available on MorganaXProc-III's classpath. The JAR files needed are xercesImpl.jar, org.eclipse.wst.xml.xpath2.processor_1.2.0.jar, and cupv10k-runtime.jar.
Caveat:
If you use both Xerces and Saxon-EE with MorganaXProc-III, the first implementation to appear on the classpath will be used. To explicitly control which validation processor is used, you can supply a command line switch or a configuration file element “schemafactory-impl”. The supplied value must be the fully qualified name of a factory class which provides an implementation of javax.xml.validation.SchemaFactory. For Xerces this is org.apache.xerces.jaxp.validation.XMLSchemaFactory (for Schema 1.0) or org.apache.xerces.jaxp.validation.XMLSchema11Factory (for Schema 1.1); for Saxon-EE the class is named com.saxonica.ee.jaxp.SchemaFactoryImpl. Alternatively you can supply “Xerces” or “Saxon” as shortcuts. Of course you can supply the fully qualified class name of any other class implementing the named interface. Please make sure that the relevant JAR files can be found on the classpath.
Due to the internal mechanism of Saxon-EE, the validator will try to resolve additional schemas even if you set p:validate-with-xml-schema's options use-location-hints and/or try-namespaces to false.
Saxon-EE has some features to control aspects of Schema validation. For instance you can use http://saxon.sf.net/feature/strip-whitespace to control whitespace stripping of the document to be validated. MorganaXProc-III supports these features via p:validate-with-xml-schema's option parameters. The key has to be in Saxon's feature namespace and the local name is the feature's name, e.g. Q{http://saxon.sf.net/feature}strip-whitespace. Those key-value pairs are ignored if Schema validation is performed by another validator.
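The relation between a feature URI and the corresponding parameter key can be illustrated with a short Python sketch. It generalizes from the single strip-whitespace example given above, so treat the rule "everything before the last slash is the namespace" as an assumption; the function name feature_key is hypothetical.

```python
def feature_key(feature_uri: str) -> str:
    """Turn a Saxon feature URI into the Q{namespace}local-name notation."""
    # split at the last '/': left part is the namespace, right part the local name
    namespace, _, local_name = feature_uri.rpartition("/")
    return f"Q{{{namespace}}}{local_name}"


key = feature_key("http://saxon.sf.net/feature/strip-whitespace")
print(key)
```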
1.7. Selecting processor for Invisible XML
Starting with release 1.3, MorganaXProc-III supports Invisible XML processing with XProc's p:ixml. The step is implemented as specified using either the NineML tools developed by Norm Tovey-Walsh or Markup Blitz developed by Gunther Rademacher. There is no default selection, so you have to explicitly set your preferred processor for iXML.
To use NineML as your iXML processor use -ixml-connector=com.xml_project.morganaxproc3.ninemlConnector.NineMLConnector on the command line or the respective <ixml-connector> in your configuration file. Additionally you need CoffeeGrinder and CoffeeFilter on the classpath. The connector for NineML supports Java 8 and later.
If you use Java 11 or later, you can use Markup Blitz by setting -ixml-connector=com.xml_project.morganaxproc3.markupblitzConnector.MarkupBlitzConnector. Additionally make sure that markup-blitz-xxx.jar is on your classpath.
1.8. (Re-)directing messages from steps and stylesheets
Using the common attribute [p:]message, pipeline authors can define messages to be printed on some output channel. The same is true for xsl:message within an XSLT stylesheet. Using -messages on the CLI (or a message element in the configuration document), users can choose whether and where these messages are printed. If the value of messages is off, no messages will be printed. The values std:out or std:err tell MorganaXProc-III to print the messages to the standard output stream or the standard error stream. Any other value will be interpreted as a file path or a URI of a resource to which the messages should be written. If the path/URI is relative, it will be resolved against the current working directory (for the CLI option) or the base URI of the configuration document. Any existing resource with the given name is overwritten without warning.
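The interpretation of the value can be summarized in a few lines of Python. This is only a model of the rules stated above, not MorganaXProc-III's code. The function name message_target, the tuple it returns, and the example base URIs are assumptions made for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin


def message_target(value: str, base_uri: str) -> Tuple[str, Optional[str]]:
    """Classify a messages value as 'off', a standard stream, or a resource URI."""
    if value == "off":
        return ("off", None)            # no messages are printed
    if value == "std:out":
        return ("stream", "stdout")
    if value == "std:err":
        return ("stream", "stderr")
    # anything else is a path or URI; relative values are resolved against
    # the working directory (CLI) or the configuration document's base URI
    return ("resource", urljoin(base_uri, value))


print(message_target("logs/messages.txt", "file:///work/"))
print(message_target("logs/messages.txt", "file:///etc/morgana/config.xml"))
```

The two calls show why the same relative value can end up in different places depending on whether it comes from the command line or from a configuration document.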
2. Setting the configuration file on the command line
In addition to the explicit settings on the command line, MorganaXProc-III also allows you to bundle specific configuration settings in a file. This is done by using -config=uri-or-path as the first element on the command line. If the given URI or path is relative, it is resolved against the current working directory. You can find an example of a configuration file in your distribution folder. All switches and configuration controls can be used in the configuration document. The local names of the elements are the same as the switch names on the command line without the leading “-”.
As the configuration file has to be the first element in the command line settings, later settings on the command line for the same switch or control will override the settings in the configuration document.
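The precedence rule amounts to "configuration document first, command line afterwards". The following Python sketch models it with a plain dictionary standing in for the parsed configuration document. The dictionary representation, the function name effective_settings, and the sample values are assumptions; the real configuration is an XML document.

```python
from typing import Dict, List


def effective_settings(config_file: Dict[str, str], cli_args: List[str]) -> Dict[str, str]:
    """Apply command line switches on top of settings read from a config file."""
    settings = dict(config_file)  # start from the configuration document
    for arg in cli_args:
        # "-xslt-connector=saxon12-3" -> name "xslt-connector", value "saxon12-3"
        switch = arg[1:] if arg.startswith("-") else arg
        name, _, value = switch.partition("=")
        settings[name] = value    # a later setting overrides an earlier one
    return settings


from_file = {"xslt-connector": "saxon10", "messages": "off"}
result = effective_settings(from_file, ["-xslt-connector=saxon12-3"])
print(result)
```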
3. Configuration of XSLT based schematron implementations
MorganaXProc-III supports XSLT based Schematron implementations like SchXSLT and the Skeleton XSLT implementation. These implementations are based on a set of XSLT stylesheets needed to perform the actual validation. You can place the respective files anywhere on your file system, but please make sure you do not change the file names. In order for MorganaXProc-III to find the relevant files on your file system, you have to register the relevant folders with the processor. To do so, the configuration file loaded by MorganaXProc-III has to contain one or more of the following elements:
mox:path_to_SchXSLT_1: Path to the folder containing the SchXSLT files for XSLT 1.0, e.g. schxslt-1.4.5-sources/xslt/1.0.
mox:path_to_SchXSLT_2: Path to the folder containing the SchXSLT files for XSLT 2.0, e.g. schxslt-1.4.5-sources/xslt/2.0.
mox:path_to_iso_skeleton_schematron_1: Path to the folder with the Skeleton files for XSLT 1.0, e.g. ISO_SKELETON_SCHEMATRON_1.
mox:path_to_iso_skeleton_schematron_2: Path to the folder with the Skeleton files for XSLT 2.0, e.g. ISO_SKELETON_SCHEMATRON_2.
If a provided path is relative, it is made absolute against the URI of the configuration file the respective element is contained in.
4. Configuration of Schematron to XSLT transpiler implementations
MorganaXProc-III also supports SchXSLT2, a Schematron to XSLT 3.0 transpiler developed by David Maus. This implementation uses a single XSLT stylesheet ("transpiler.xsl") to translate the Schematron schema to an XSLT stylesheet. To register the location of the transpiler stylesheet, use path_to_SchXSLT2_transpiler. SchXSLT2 uses static parameters to control the stylesheet transformation. Currently MorganaXProc-III supports the following static parameters in p:validate-with-schematron's option parameters: schxslt:debug, schxslt:streamable, schxslt:location-function and schxslt:fail-early (schxslt="http://dmaus.name/ns/2023/schxslt").
5. Adding media type mappings
When loading documents, MorganaXProc-III needs to determine whether the file contains an XML document, an HTML document, a JSON document, a text document, or a binary document. MorganaXProc-III defines a number of mappings from file extensions to media types by default. In your XProc 3.0 pipeline you can use attribute content-type on p:load and p:document to explicitly define the media type of the related document.
MorganaXProc-III defines an additional way to identify the media type via file extensions using the configuration file. If the document contains an element mediatype-mapping in MorganaXProc-III's namespace, all map elements (in the same namespace) will be considered. They need to have non-empty attributes file-extension and media-type. Attribute media-type must contain a valid media type. If all those criteria are met and the file extension is not already bound to another media type, files with that extension loaded subsequently will be recognized as having the given media type. See file config.xml in your MorganaXProc-III distribution for an example.
Please keep in mind that running MorganaXProc-III with a different configuration file might change a pipeline's behaviour dramatically, because a file might then be recognized with a different media type.
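The acceptance criteria for a map element can be sketched in Python as follows. This is a simplified model, not MorganaXProc-III's implementation. The default bindings shown, the function name add_mappings, and the regular expression used as a stand-in for "valid media type" (type/subtype without parameters) are assumptions.

```python
import re
from typing import Dict, List, Tuple

# simplified check: "type/subtype" without parameters (an assumption)
_MEDIA_TYPE = re.compile(r"^[A-Za-z0-9][\w!#$&^.+-]*/[A-Za-z0-9][\w!#$&^.+-]*$")


def add_mappings(defaults: Dict[str, str],
                 maps: List[Tuple[str, str]]) -> Dict[str, str]:
    """Add (file-extension, media-type) pairs unless they are invalid or already bound."""
    result = dict(defaults)
    for extension, media_type in maps:
        if not extension or not media_type:
            continue  # both attributes must be non-empty
        if not _MEDIA_TYPE.match(media_type):
            continue  # media-type must look like a valid media type
        if extension in result:
            continue  # extension already bound: the existing binding is kept
        result[extension] = media_type
    return result


defaults = {"xml": "application/xml", "json": "application/json"}  # hypothetical defaults
mapping = add_mappings(defaults, [
    ("dita", "application/dita+xml"),  # accepted
    ("xml", "text/plain"),             # ignored, "xml" is already bound
    ("foo", "not a media type"),       # ignored, invalid media type
    ("", "text/plain"),                # ignored, empty extension
])
print(mapping)
```

Because an existing binding is never replaced, a different configuration file changes behaviour only for extensions that are not bound yet.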