Sunday, September 2, 2007

The Business Value of XML

What exactly is the business value of XML and its related technologies such as XML Schema, ISO Schematron, XForms, XSLT, and XQuery? Let's review three use cases where XML adds real value.

The first use case is software configuration and metadata. Java EE frameworks such as Hibernate, Spring, Struts, and JSF use XML for configuration and metadata. Unfortunately, not all Java EE developers understand the value of configuring their applications with XML as opposed to Java annotations. There is currently a backlash against XML, with some developers complaining about what they call "XML hell". These developers prefer to keep configuration and metadata closer to the code itself by using Java annotations. Newer Java EE technologies such as JPA (Java Persistence API) and Struts 2 provide annotation capabilities that are very popular with developers. I personally believe that this is a trend in the wrong direction. Annotations are certainly convenient for developers, but not necessarily for the end users of your application. With XML configuration, end users who are not programmers can achieve a certain level of customization on their own by editing an XML configuration file in a text editor. They do not have to import your SDK into an IDE and compile Java code, or hire a Java EE developer to do it for them. As a software buyer deciding between two competing products, I would choose (everything else being equal) the one that allows me to do some customization with simple XML files. Developers can reduce the verbosity of their XML configuration files by using techniques such as inheritance, overridable default values, and a preference for attributes over child elements. A compromise could be to make the annotations overridable with XML configuration.
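To make the idea of overridable defaults and attribute-based configuration concrete, here is a minimal sketch in Python (chosen only for brevity; the same idea applies to a Java EE application). The setting names, the `settings` element, and the `load_settings` function are all hypothetical and do not come from any of the frameworks mentioned above.

```python
import xml.etree.ElementTree as ET

# Defaults the developer ships with the product (hypothetical setting names).
DEFAULTS = {"page-size": "25", "theme": "classic", "session-timeout": "30"}


def load_settings(xml_text, defaults=DEFAULTS):
    """Return the defaults, overridden by attributes on the root <settings> element."""
    settings = dict(defaults)
    root = ET.fromstring(xml_text)
    # Attributes keep the file terse: only the overridden values need to appear.
    for name, value in root.attrib.items():
        if name not in defaults:
            raise ValueError(f"unknown setting: {name}")
        settings[name] = value
    return settings


# What an end user might type in a text editor: one attribute, no recompilation.
USER_CONFIG = '<settings theme="high-contrast"/>'

if __name__ == "__main__":
    print(load_settings(USER_CONFIG))
```

The end user's file contains only the one value that differs from the defaults, which is how overridable defaults and attributes keep the configuration short.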

The next use case is data exchange across organizations. Two example applications that are currently delivering real value are UBL and NIEM. XML vocabularies such as UBL and NIEM define common semantics and data structures through data dictionaries and XML schemas respectively. In addition, they can specify certain business rules that can be enforced with an assertion-based schema language such as ISO Schematron.
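The kind of business rule an assertion-based language targets is one that relates several values to each other, for example "the invoice total must equal the sum of its lines". The sketch below shows the idea in Python rather than in Schematron itself. The invoice structure and the `check_invoice` function are hypothetical, and the element names are illustrative, not real UBL.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical invoice; the element names are illustrative, not real UBL.
INVOICE = """
<invoice currency="EUR">
  <line><amount>40.00</amount></line>
  <line><amount>60.00</amount></line>
  <total>100.00</total>
</invoice>
"""


def check_invoice(xml_text):
    """Return a list of failed-assertion messages (an empty list means the rules hold)."""
    root = ET.fromstring(xml_text)
    failures = []
    lines = root.findall("line")
    # Rule 1: an invoice must contain at least one line.
    if not lines:
        failures.append("invoice must contain at least one line")
    # Rule 2: the total must equal the sum of the line amounts.
    line_sum = sum(
        (Decimal(line.findtext("amount", default="0")) for line in lines),
        Decimal("0"),
    )
    total = Decimal(root.findtext("total", default="0"))
    if total != line_sum:
        failures.append(f"total {total} does not equal sum of lines {line_sum}")
    return failures


if __name__ == "__main__":
    print(check_invoice(INVOICE))
```

Each rule is an assertion with a human-readable message, which is the reporting style an assertion-based schema language offers on top of structural validation.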

Developed by OASIS, UBL (Universal Business Language) is an XML vocabulary for the exchange of business documents such as invoices, purchase orders, and receipts. In Denmark, the government has mandated the use of UBL invoices for all public-sector billing. The result is over 100 million euros in savings every year. The Swedish government estimates that it can save 440 million euros with the adoption of UBL for electronic commerce. Please note that these initiatives involve not only big government and Fortune 500 companies, but hundreds of thousands of SMEs (small and medium-sized enterprises) as well.

NIEM (National Information Exchange Model), developed by the U.S. Department of Justice and the Department of Homeland Security, is an XML vocabulary for the exchange of information between government agencies. For example, it allows law enforcement agencies to quickly exchange information. Law enforcement agencies use heterogeneous applications called RMS (Records Management Systems), and XML data is the bridge between them because it is vendor neutral, cross-platform, and supports structured data of arbitrary complexity. XSLT 2.0, as a generic XML transformation language, can play an important role here as well. As an example, an RMS can export raw XML data which can then be mapped to a NIEM-compliant XML Schema by performing an XSLT transformation. If a legacy RMS can only export CSV (comma-separated values) text files, XSLT 2.0 can up-convert the CSV into a NIEM-compliant XML document. It is possible to process XML with a traditional programming language such as C# or Java. However, the problem is the "impedance mismatch" between the type system of these programming languages and a type system based on XML (such as the XML Schema type system). Some developers will find XQuery easier to use than XSLT (probably because of its SQL-like syntax) for processing XML data. In addition, XSLT 2.0 and XQuery are declarative processing languages (they describe the "what" as opposed to the "how") and are therefore accessible to many non-programmers.
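For readers who have not seen an up-conversion, the sketch below shows what the CSV-to-XML step amounts to. It is written in Python rather than XSLT 2.0, purely as an illustration of the input and output. The column names, the `incidents` and `incident` elements, and the `csv_to_xml` function are hypothetical and are not taken from NIEM.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export from a legacy system; the column names are illustrative.
CSV_EXPORT = """incident_id,date,category
1001,2007-08-30,Burglary
1002,2007-08-31,Vandalism
"""


def csv_to_xml(csv_text):
    """Up-convert CSV rows to an <incidents> document, one <incident> per row."""
    root = ET.Element("incidents")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The row identifier becomes an attribute; the other columns become child elements.
        incident = ET.SubElement(root, "incident", id=row["incident_id"])
        ET.SubElement(incident, "date").text = row["date"]
        ET.SubElement(incident, "category").text = row["category"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(csv_to_xml(CSV_EXPORT))
```

A real exchange would still need a second step that maps this raw XML onto the target vocabulary and validates it against the published schema.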

The last use case is knowledge management in general and content management in particular. In the new global knowledge economy, the most important asset of an organization is its intellectual capital, which is acquired and developed by its knowledge workers. That intellectual capital is often captured in documents such as blogs, wikis, emails, PowerPoint presentations, podcasts, engineering drawings, architecture diagrams, ISO 9000 quality manuals, installation and troubleshooting procedures, Microsoft Word documents containing requirements and design specifications, various corporate forms, etc. These mission-critical documents are often dumped into shared network drives. They are not managed with the same rigor as the data contained in your CRM and ERP systems, and they cannot be queried in the same way. The main reason is that these documents represent unstructured data as opposed to the well-structured relational data stored by the RDBMS on which your ERP and CRM systems sit. In some industries, the need to bring content under control is driven by regulatory compliance. In any case, organizations shouldn't wait until their most valuable employees leave before they start thinking about managing their knowledge assets. That's where an enterprise content management system (CMS) and an enterprise portal come into play.

XML goes beyond tags, taxonomy, and content categorization to provide fine-grained content discovery, query, and processing capabilities. With XML, the document becomes the database. First, using XML Schema, you can constrain and validate the structure and data types of the content of your business documents just like you do with a relational database schema. Using XForms, you can provide a user-friendly interface for your end users to contribute XML content by presenting them with a regular HTML form. Once the content is captured as XML, it can be stored in a native XML database. With XQuery, the native XML database allows you to perform structured queries on the content (as opposed to just full-text or metadata search). XQuery also allows you to assemble content dynamically (for example, build two distinct training manuals for two different configurations of your product from a single source). You can use XQuery to query both relational (from your ERP and CRM systems) and XML data sources and aggregate the results. With XSLT, you enable content adaptation for cross-media publishing (print, web, and wireless) from a single source.
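The single-source assembly idea can be sketched in a few lines. The example below uses Python instead of XQuery and a small in-memory document instead of a native XML database, so it only illustrates the principle. The `manual` and `topic` elements, the `config` attribute, and the `assemble` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source manual; the "config" attribute marks which
# product configuration each topic applies to.
MANUAL = """
<manual>
  <topic config="all"><title>Safety</title></topic>
  <topic config="basic"><title>Manual Setup</title></topic>
  <topic config="pro"><title>Automated Setup</title></topic>
  <topic config="all"><title>Troubleshooting</title></topic>
</manual>
"""


def assemble(xml_text, config):
    """Return the titles of the topics that apply to one configuration, in document order."""
    root = ET.fromstring(xml_text)
    return [
        topic.findtext("title")
        for topic in root.findall("topic")
        if topic.get("config") in ("all", config)
    ]


if __name__ == "__main__":
    print(assemble(MANUAL, "basic"))
    print(assemble(MANUAL, "pro"))
```

Both manuals share the Safety and Troubleshooting topics and differ only in the setup topic, so a correction to a shared topic is made once and appears in both outputs.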

If you decide to manage your product technical documentation with XML, there are standards that can help. The DITA (Darwin Information Typing Architecture) specification is very popular for computer software and hardware documentation. The S1000D standard is designed to support mission-critical maintenance and operation documentation in industries such as aerospace, defense, automotive, oil and gas, heavy equipment and machinery, and power generation.

Organizations that have a strategy for managing their knowledge assets using XML and related technologies will have a definite competitive advantage in today’s economy.
