XML: What It Is and Why We Need It In Publishing

Will Skinner, a full-time 2013/14 student, works at the British Standards Institution and is an all-around XML guru. Here he gives us an insight into what XML is and why it is fundamental to the publishing process.

If you intend to work in Production, or in publishing in general, at some point you are going to come face-to-face with Extensible Markup Language (XML). That first encounter can be scary, and XML can seem impenetrable, so here’s a brief insight into what it is, why we need it and how it is used in the working environment.

You may never need to actually edit XML (though this is becoming more common in the industry), as most companies tend to outsource this function. However, some knowledge of how XML works, and the ability to recognise when and why it doesn’t, may play a crucial role in helping you make informed decisions as a publisher.

What Is It?

XML stands for eXtensible Markup Language. As the name implies, it is extensible: it can be extended to describe whatever kind of content you are working with.

Its use is recommended by the World Wide Web Consortium (W3C) as a way of capturing content and metadata (information about content) so that they can be manipulated with ease to suit a huge variety of purposes. The W3C is an international community, led by web inventor Tim Berners-Lee, in which member organisations, the Consortium’s staff and the public work together to develop web standards.

It is the quality of the XML at the heart of any project that ensures its integrity, regardless of the format in which it is published. XML is a markup language, meaning it can be used to attach labels, or ‘tags’, to content. These tags describe the function of that content, not its format.

The tags used in XML are not predefined, so you can create your own tags to describe your data in whatever way you need. XML is also a source format, in that it must be transformed for use.
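To make this concrete, here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. The tag names (standard, title, clause, heading, para) are invented for illustration and are not taken from any real schema. It shows tags that describe what each piece of content is rather than how it looks, and how a parser rejects XML that is not well formed, which is often the first clue to why a file ‘doesn’t work’.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names, made up for this example.
GOOD_XML = """<standard>
  <title>Guide to Widgets</title>
  <clause number="1">
    <heading>Scope</heading>
    <para>This standard covers widgets.</para>
  </clause>
</standard>"""

# Same idea, but the <title> tag is never closed.
BAD_XML = "<standard><title>Guide to Widgets</standard>"


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(GOOD_XML)
# The tags say what the content is (a title, a clause), not its font or size.
print(root.find("title").text)
print(root.find("clause").get("number"))
print(is_well_formed(GOOD_XML), is_well_formed(BAD_XML))
```

Being well formed is only the most basic check. A real workflow would normally also validate files against an agreed set of tags and rules.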

So, what gets coded in content? Everything, according to Jodie Hodgson at Cambridge University Press!

Why We Need XML

The uses of XML in publishing are myriad. Markup languages were created so that the structure and meaning of content can be separated from its styling. This makes it possible to use content in ways that would not be possible if content were left in its raw digital form.

As Adrian Bullock states in ‘Book Production’, ‘content is no longer just the words as they appear on the page; content has now become data which can be read and understood by a computer and can be:

• linked
• referenced
• sorted
• searched for
• amalgamated
• counted
• indexed
• listed
• re-used, re-versioned and re-purposed.’
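A few of the operations in this list can be sketched in a handful of lines of Python, again with the standard library's xml.etree.ElementTree module. The catalogue, its tag names and the book titles below are all invented for illustration; the point is only that once content is tagged, a computer can count, sort and search it.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; the tag names and titles are invented.
CATALOGUE = """<catalogue>
  <book><title>Typography Basics</title><year>2011</year></book>
  <book><title>Book Design</title><year>2009</year></book>
  <book><title>Digital Workflows</title><year>2013</year></book>
</catalogue>"""

books = ET.fromstring(CATALOGUE).findall("book")

# Counted: how many books are in the catalogue?
book_count = len(books)

# Sorted: titles ordered by year of publication.
books_by_year = sorted(books, key=lambda b: int(b.findtext("year")))
titles_by_year = [b.findtext("title") for b in books_by_year]

# Searched for: titles containing a given word.
matches = [b.findtext("title") for b in books if "Design" in b.findtext("title")]

print(book_count)
print(titles_by_year)
print(matches)
```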

How XML Works

Case Study 1: Content Conversion Process

At the British Standards Institution (BSI), the Online Production team is responsible for the conversion of print documents (standards, technical handbooks etc.) into formats that can be used in digital products.

To do this, content is converted to XML, and then to Hypertext Markup Language (HTML), the language used to code the display of web pages.

The conversion is usually outsourced to external suppliers, and BSI has developed a web-based solution, Sentinel, to help this process run smoothly across the production cycle.

It provides the Online Production team with the ability to upload work packages for conversion, review and accept completed packages, and set validation rules and stylesheets. It also allows suppliers to download the packages that they have been assigned, receive feedback and submit completed ones back to BSI.

The Online Production team can then view the submitted packages and add or edit comments. A package can then either be returned to the supplier if it is below standard, or the comments can be used to make corrections.
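The XML-to-HTML step can be illustrated with a small Python sketch. This is not BSI’s actual conversion; the source tags and the TAG_MAP lookup are assumptions made up for the example, and in production this kind of transformation is more commonly written as an XSLT stylesheet than as hand-written code. The sketch simply swaps each descriptive XML tag for an HTML element that a browser knows how to display.

```python
import xml.etree.ElementTree as ET

# Hypothetical source XML; tag names are invented for this example.
SOURCE = """<clause>
  <heading>Scope</heading>
  <para>This standard covers widgets.</para>
  <para>It does not cover gadgets.</para>
</clause>"""

# A tiny stand-in for a stylesheet: which HTML element each XML tag becomes.
TAG_MAP = {"heading": "h2", "para": "p"}


def to_html(xml_text):
    """Convert the children of the root element into a simple HTML fragment."""
    source_root = ET.fromstring(xml_text)
    html_root = ET.Element("div")
    for child in source_root:
        html_child = ET.SubElement(html_root, TAG_MAP[child.tag])
        html_child.text = child.text
    return ET.tostring(html_root, encoding="unicode")


html_fragment = to_html(SOURCE)
print(html_fragment)
```

Because the mapping lives in one place, the same XML could be sent through a different mapping to produce a different output format without touching the content itself.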

The Online Production team at BSI have also worked closely with developers to create an Information Publishing Platform (IPP) that makes the XML content viewable and available for customers to purchase.

In the future, XML files will allow BSI to transform its content for a number of end uses: PDFs, smartphones, PDAs, accessible media, etc.

Case Study 2: The Production Workflow

For the Production Assistant, XML functions in the background in a digital workflow. When a digital edition has been proofread and is deemed ready for publication, the content, in the form of a PDF file, is placed in the ‘Approved to Publish’ folder by the Editorial team.

The assistant will usually verify that the content is OK to publish and then move the PDF through the production cycle using an application like SAP.

SAP then triggers a process that moves the file to the ‘Production Only’ ‘hot’ folder. A hot folder is one dedicated to ‘batch processing’: files placed in it are automatically processed and then placed into a specified output folder.

Whilst the file is in the ‘Production Only’ folder, the server automatically creates a further copy of it, which is placed in the ‘Product Data’ hot folder. This folder converts any PDFs into a raw XML file and a ‘cleaned-up’ XML file. These files are then automatically sent to the printer!
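The hot folder idea can be sketched in Python as follows. This is an illustration only, not how SAP or any particular production system implements it: the folder names, file names and the process_hot_folder function are all hypothetical, and the ‘processing’ here is nothing more than moving each PDF into the output folder. A real hot folder would also be watched continuously rather than checked once.

```python
import shutil
import tempfile
from pathlib import Path


def process_hot_folder(hot_folder, output_folder):
    """Move every PDF found in hot_folder into output_folder.

    Returns the sorted names of the files that were moved.
    """
    output_folder.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(hot_folder.glob("*.pdf")):
        shutil.move(str(pdf), str(output_folder / pdf.name))
        moved.append(pdf.name)
    return moved


# Demonstration in a throwaway directory; all names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    hot = Path(tmp) / "production_only"
    out = Path(tmp) / "output"
    hot.mkdir()
    (hot / "chapter1.pdf").write_bytes(b"%PDF-")
    (hot / "notes.txt").write_text("not a PDF, so it is left alone")

    moved_files = process_hot_folder(hot, out)
    remaining = sorted(p.name for p in hot.iterdir())
    print(moved_files, remaining)
```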

There are a number of resources where you can find out more about XML:

W3Schools: XML Tutorial
World Wide Web Consortium: Introduction to XML


Will Skinner is a full-time (2013/14) student on the MA Publishing. An experienced Operations Manager, Will is now Production Assistant in the Content Solutions Department of the British Standards Institution. He works with colleagues in the ePublishing and IT teams on projects using XML technology in the online delivery of Standards publications.