
CREATING WORD DOCUMENT IN OFFICE OPEN XML FORMAT USING JAVA.

AUTHORS: SANJAY KUMAR MADHVA / KULKARNI D.V. / SRINIDHI H.S. / PUJARI Y.

SONATA SOFTWARE LIMITED, RV ROAD, BANGALORE, INDIA.

INTRODUCTION TO OFFICE OPEN XML
SCOPE OF THE ARTICLE
    WORDPROCESSINGML
    PACKAGE
    PARTS
    ITEM
    CONTENT TYPE
    CONTENT-TYPE ITEM
    RELATIONSHIP
    PACKAGE RELATIONSHIP
    PACKAGE-RELATIONSHIP ITEM
    RELATIONSHIP MARKUP

JAVA WORDPROCESSINGML IMPLEMENTATION
    JAVA WORDPROCESSINGML FOLDER CREATION
    JAVA WORDPROCESSINGML FILE CREATION
        Create [Content_Types].xml
        Create or copy image1.jpg
        Create .rels
        Create document.xml.rels
        Create document.xml
    JAVA PACKAGING CLASS IMPLEMENTATION
        Importing Classes
        Create the OpenXMLZipFile Class
        Create a CreateZipFile Method
        Create a UnZipFile Method
        Creating the WordprocessingML package

Introduction to Office Open XML


The introduction of the Open XML file format standard from Ecma gives developers the option of creating and editing Open XML documents with any development tool on any platform, as long as they conform to the standardized file format. Open document formats such as WordprocessingML improve interoperability by enabling standards-based XML 1.0 tools to create, read, and write files that conform to the format. The Office Open XML formats can be used by a wide set of tools and platforms, fostering interoperability across office productivity applications and with line-of-business systems. This article is based on the Office Open XML standard being developed by the Ecma TC45 technical committee: a family of XML schemas collectively called Open XML. The standard defines the XML vocabularies consumed and produced by applications such as the Office 2007 versions of Microsoft Word, Microsoft Excel, and Microsoft PowerPoint. It also describes the packaging of documents that conform to these schemas.

Scope of the article


This article describes the packaging mechanism and the minimum set of files required to create an Office Open XML Word document (referred to as WordprocessingML) using Java. The document, although created with no Microsoft APIs or software, can be consumed or viewed by Word 2007. (It may also be consumed by Word 2000, Word XP, or Word 2003, using the free add-in for Open XML support that will be released by Microsoft when Office 2007 is released.) Assumption: all the required files, such as the XML files and images, are created manually under the directory used for packaging.

Understanding Office Open XML


To create a WordprocessingML document, it helps to understand how the document is structured according to the Open XML packaging specification. The following sections cover the main concepts.

WordprocessingML
A WordprocessingML document (an Office Open XML document) is represented as a series of related parts that are stored in a container called a package. Information about the relationships between a package and its parts is stored in the package's package-relationship item. Information about the relationships between two parts is stored in the part-relationship item for the source part. A package is an ordinary Zip archive whose items correspond directly to those related parts.

Package
A package is a Zip archive that contains all the relationship items and parts of an Office Open XML document, such that those parts are reachable via a set of relationships defined in the relationship items. The package acts as a container for a collection of components, which are composed, processed, and persisted according to a set of rules. There are two kinds of components: parts and relationship items. A package is implemented as a ZIP archive, with each component in the package corresponding to an item in the archive. A Zip archive is a ZIP file as defined in the ZIP file format specification, excluding all elements of that specification related to encryption or decryption.

A package provides a convenient way to distribute a document with all of its component pieces, such as images, fonts, and data. The purpose of a package is to combine all of the pieces of a document into a single file. A package holding a WordprocessingML document with a picture might contain a number of parts: an XML markup part representing the document, a part containing page header information, a part containing footnotes, and a part representing the picture in JPEG form.

Note: The XML must be valid according to the Office Open XML schemas.
Note: All XML content of the components defined in this Standard must be encoded using either UTF-8 or UTF-16.
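Because a package is an ordinary Zip archive, its items can be inspected with the standard java.util.zip classes. The following is a minimal sketch of that idea; the class name PackageItemLister and the method listItems are hypothetical names chosen for this illustration and are not part of any Open XML API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the Zip items (parts and relationship items) of a package.
class PackageItemLister {

    // Returns the names of all non-directory items in the given package file.
    static List<String> listItems(String packageFileName) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(packageFileName)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PackageItemLister <package file>");
            return;
        }
        for (String name : listItems(args[0])) {
            System.out.println(name);
        }
    }
}
```

Run against a package, this prints one line per item, which is a quick way to see the parts and relationship items described in the following sections.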

Parts
A part is a package component that has associated common properties. A part corresponds to an item in a package. A WordprocessingML document contains a part for the body of the text; it might also contain a part for an image referenced by that text, and parts that define document characteristics, styles, and fonts. Parts can have relationships to each other, as well as to the package itself. These relationships are defined using XML in one or more relationship items. Each part has a content type and is unambiguously addressed using well-defined naming guidelines. Content-type information is recorded in the content-type item.

Each part has a part name. Part names refer to parts within a package, typically as part of a URI reference. Like file names in a file system and URIs, part names are hierarchical. A part name consists of segments, each representing a level in the hierarchy. For example, the part name /hello/world/document.xml contains three segments: hello, world, and document.xml. Segments form a tree structure. This is similar to a file system, where the non-leaf nodes in the tree are folders and the leaf nodes are files, which contain the actual content. The folders (that is, the non-leaf nodes) in the tree serve a similar function: they organize the parts of the package. For example:
<Override PartName="/hello/world/document.xml" ContentType="application/vnd.ms-word.main+xml" />
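The segment structure of a part name can be illustrated with a few lines of Java. This is only a sketch: the class PartNameSegments is a hypothetical name, and the checks it performs (a leading slash, no empty segments) are assumptions based on the example above rather than a full implementation of the naming guidelines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a part name such as /hello/world/document.xml into segments.
class PartNameSegments {

    static List<String> segments(String partName) {
        // Assumption: a part name starts with a forward slash, as in the article's example.
        if (partName == null || !partName.startsWith("/")) {
            throw new IllegalArgumentException("A part name must start with '/': " + partName);
        }
        List<String> result = new ArrayList<>();
        // The limit -1 keeps trailing empty strings, so "/a/" is rejected below.
        for (String segment : partName.substring(1).split("/", -1)) {
            if (segment.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in part name: " + partName);
            }
            result.add(segment);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [hello, world, document.xml]
        System.out.println(segments("/hello/world/document.xml"));
    }
}
```

The last segment plays the role of the file, and the segments before it play the role of folders.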

Item
In the context of a package, item is a synonym for Zip item.

Content type
A content type is the description of the type of content stored in a part. A content type defines a media type, a subtype, and an optional set of parameters. The content types of a package are recorded in a mandatory file named [Content_Types].xml.

Content-type item
A content-type item is an XML representation of the mappings from part names to content types, stored as an item in a package. A content-type item is not itself a part. It is mandatory and looks as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Types xmlns="http://schemas.microsoft.com/package/2005/06/content-types">
  <Default Extension="xml" ContentType="application/xml" />
  <Default Extension="rels" ContentType="application/vnd.ms-package.relationships+xml" />
  <Default Extension="jpg" ContentType="image/jpeg" />
  <Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
</Types>
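To make the mapping concrete, the sketch below models how a consumer might resolve the content type of a part from such an item: an Override entry for the exact part name is checked first, and otherwise the Default entry for the file extension is used. The class ContentTypeLookup and its methods are hypothetical names for this illustration, and the exact-match comparison of part names is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the Default and Override entries of a content-type item.
class ContentTypeLookup {
    private final Map<String, String> defaults = new HashMap<>();  // extension -> content type
    private final Map<String, String> overrides = new HashMap<>(); // part name -> content type

    void addDefault(String extension, String contentType) {
        defaults.put(extension.toLowerCase(Locale.ROOT), contentType);
    }

    void addOverride(String partName, String contentType) {
        overrides.put(partName, contentType);
    }

    // Returns the content type for the part name, or null if no entry applies.
    String contentTypeOf(String partName) {
        String override = overrides.get(partName);
        if (override != null) {
            return override;
        }
        int dot = partName.lastIndexOf('.');
        if (dot < 0 || dot < partName.lastIndexOf('/')) {
            return null; // the last segment has no extension
        }
        return defaults.get(partName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ContentTypeLookup types = new ContentTypeLookup();
        types.addDefault("xml", "application/xml");
        types.addDefault("rels", "application/vnd.ms-package.relationships+xml");
        types.addDefault("jpg", "image/jpeg");
        types.addOverride("/document.xml", "application/vnd.ms-word.main+xml");

        System.out.println(types.contentTypeOf("/document.xml")); // application/vnd.ms-word.main+xml
        System.out.println(types.contentTypeOf("/image1.jpg"));   // image/jpeg
        System.out.println(types.contentTypeOf("/_rels/.rels"));  // application/vnd.ms-package.relationships+xml
    }
}
```

This is why /document.xml needs its own Override entry: without it, the xml Default would classify it as plain application/xml.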

Relationship
Parts often contain references to other parts in a package and to resources outside of the package. In general, however, these references are represented inside the referring part in ways that are specific to the content type of the part; that is, in arbitrary markup or an application-specific encoding. This effectively hides the internal and external linkages between parts from consumers that do not understand the content type of the parts containing such references. The package uses relationships as a higher-level mechanism to describe references from parts to other internal or external resources. A relationship represents the kind of connection between a source and a target resource. If the source is a part, the relationship is referred to as a part relationship. If the source is the package itself, the relationship is referred to as a package relationship. Relationships make the connections directly discoverable without looking at the content in the parts, so they are independent of content-specific schemas and make it faster to resolve the location of other parts.
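The file-creation steps later in this article store the package relationships in _rels/.rels and the relationships of document.xml in _rels/document.xml.rels. The sketch below captures that naming convention: the relationship item for a part lives in a _rels folder next to the part and carries the part's file name plus the .rels extension. The class RelationshipItemNames is a hypothetical name used only for this illustration.

```java
// Hypothetical helper: derives the name of the relationship item for a given source.
class RelationshipItemNames {

    // The relationship item whose source is the package itself.
    static final String PACKAGE_RELATIONSHIPS = "/_rels/.rels";

    // Returns the name of the part-relationship item for the given source part name.
    static String forPart(String partName) {
        int slash = partName.lastIndexOf('/');
        if (slash < 0 || slash == partName.length() - 1) {
            throw new IllegalArgumentException("Not a valid part name: " + partName);
        }
        String folder = partName.substring(0, slash);    // "" when the part is at the root
        String fileName = partName.substring(slash + 1);
        return folder + "/_rels/" + fileName + ".rels";
    }

    public static void main(String[] args) {
        System.out.println(PACKAGE_RELATIONSHIPS);                     // /_rels/.rels
        System.out.println(forPart("/document.xml"));                  // /_rels/document.xml.rels
        System.out.println(forPart("/hello/world/document.xml"));      // /hello/world/_rels/document.xml.rels
    }
}
```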

Package relationship
A relationship whose target is a part and whose source is the package as a whole.

Package-relationship item
An XML representation of one or more package relationships, stored as an item in a package. A package-relationship item is not itself a part.

Relationship markup
Relationships are represented using one or more Relationship elements nested in a single Relationships element. These elements are defined in the relationships namespace. Every Relationship element must have an Id attribute, the value of which must be unique within the relationship item. The Id type is xsd:ID and must conform to the naming restrictions for that type.
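The uniqueness rule for Id values can be enforced at the point where the markup is generated. The following sketch builds a relationship item as a string and rejects a repeated Id. RelationshipsBuilder is a hypothetical class for this illustration, the namespace string is the one used in this article's listings, and the sketch does not check the xsd:ID naming restrictions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for the markup of a relationship item.
class RelationshipsBuilder {
    // Namespace exactly as it appears in this article's listings.
    static final String NAMESPACE = "http://schemas.microsoft.com/package/2005/06/relationships";

    // Id -> {Type, Target}, kept in insertion order.
    private final Map<String, String[]> relationships = new LinkedHashMap<>();

    RelationshipsBuilder add(String id, String type, String target) {
        if (relationships.containsKey(id)) {
            throw new IllegalArgumentException("Duplicate relationship Id: " + id);
        }
        relationships.put(id, new String[] { type, target });
        return this;
    }

    String toXml() {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?>\n");
        xml.append("<Relationships xmlns=\"").append(NAMESPACE).append("\">\n");
        for (Map.Entry<String, String[]> entry : relationships.entrySet()) {
            xml.append("  <Relationship Id=\"").append(escape(entry.getKey()))
               .append("\" Type=\"").append(escape(entry.getValue()[0]))
               .append("\" Target=\"").append(escape(entry.getValue()[1]))
               .append("\" />\n");
        }
        xml.append("</Relationships>\n");
        return xml.toString();
    }

    // Minimal escaping for attribute values written in double quotes.
    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(new RelationshipsBuilder()
            .add("rId1", "http://schemas.microsoft.com/office/2006/relationships/image", "image1.jpg")
            .toXml());
    }
}
```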

This concludes the terms commonly used in creating an Office Open XML document. What follows is how to create an Office Open XML Word document, referred to as WordprocessingML, using Java.

JAVA WordprocessingML implementation


In this article we create a WordprocessingML document that contains the body text "This document was created using JAVA." and an embedded image. This is achieved by creating a hierarchy of folders and files, which are then packaged together as described in the steps below.

JAVA WordprocessingML Folder creation


Create a directory, for example c:\WordprocessingML\, which will contain all the files required for packaging, such as [Content_Types].xml, image1.jpg, and document.xml. Under c:\WordprocessingML, create a folder named _rels.
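This article assumes the folders are created by hand, but the same layout can be produced from Java. The sketch below is one way to do it with java.nio.file; the class PackageFolders is a hypothetical name, and the default folder name used in main is only an example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: creates the packaging directory and its _rels subfolder.
class PackageFolders {

    // Creates root and root/_rels if they do not exist yet; returns the _rels path.
    static Path create(Path root) throws IOException {
        return Files.createDirectories(root.resolve("_rels"));
    }

    public static void main(String[] args) throws IOException {
        // Pass the packaging directory as the first argument, e.g. c:\WordprocessingML.
        Path root = Paths.get(args.length > 0 ? args[0] : "WordprocessingML");
        System.out.println("Created " + create(root).toAbsolutePath());
    }
}
```

Files.createDirectories also creates any missing parent folders and does nothing if the folders already exist, so the call is safe to repeat.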

JAVA WordprocessingML File creation


Create [Content_Types].xml
In the directory c:\WordprocessingML\, create an XML file named [Content_Types].xml, which will contain the WordprocessingML content types. The file content is shown below.

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Types xmlns="http://schemas.microsoft.com/package/2005/06/content-types">
  <Default Extension="xml" ContentType="application/xml" />
  <Default Extension="rels" ContentType="application/vnd.ms-package.relationships+xml" />
  <Default Extension="jpg" ContentType="image/jpeg" />
  <Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
</Types>

In the Override tag, the PartName attribute names the part that holds the XML representation of the Word document, and the ContentType attribute indicates that it is the main document in XML format.
<Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />

Images used in the document are declared as shown below, where the Extension attribute gives the file extension and the ContentType attribute gives the matching media type in the form image/<type> (for jpg files, image/jpeg).
<Default Extension="jpg" ContentType="image/jpeg" />

Create or copy image1.jpg


Copy or create a JPEG image named image1.jpg under c:\WordprocessingML; this is the image that will be embedded in the document.

Create .rels
Create an XML file named .rels under c:\WordprocessingML\_rels\ with the content below.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/officeDocument" Target="document.xml" />
</Relationships>

Create document.xml.rels
Create an XML file named document.xml.rels under c:\WordprocessingML\_rels\ with the content below.
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="image1.jpg" />
</Relationships>

Create document.xml
Create an XML file named document.xml under c:\WordprocessingML\ with the content below. This XML contains the text of the document as well as its structure and formatting, such as paragraphs and runs. In the example below, the <w:p> tag represents a paragraph of text. The file also contains the <w:pict> tag for the image to be embedded in the document; the r:id value rId1 refers to the relationship defined in document.xml.rels.

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2005/10/wordml"
                xmlns:v="urn:schemas-microsoft-com:vml"
                xmlns:r="http://schemas.microsoft.com/office/2005/11/relationships">
  <w:body>
    <w:p>
      <w:r>
        <w:t>This document was created using JAVA.</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:pict>
          <v:shape>
            <v:imagedata r:id="rId1" />
          </v:shape>
        </w:pict>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
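When the body text does not come from a hand-written file, it has to be escaped before it is placed inside <w:t>, or characters such as & and < will make the XML invalid. The sketch below shows one way to produce the paragraph markup used above from an arbitrary string; the class ParagraphMarkup and its methods are hypothetical names for this illustration.

```java
// Hypothetical helper: wraps plain text in the paragraph/run/text markup shown above.
class ParagraphMarkup {

    // Escapes the characters that are not allowed as-is in XML element content.
    static String escapeText(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Returns one paragraph containing a single run with the given text.
    static String paragraph(String text) {
        return "<w:p><w:r><w:t>" + escapeText(text) + "</w:t></w:r></w:p>";
    }

    public static void main(String[] args) {
        System.out.println(paragraph("This document was created using JAVA."));
        System.out.println(paragraph("R&D costs < budget"));
    }
}
```

The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.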

JAVA Packaging Class Implementation


Importing Classes
The following classes need to be imported to create the packaging class. They are all built-in Java classes.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

Create the OpenXMLZipFile Class


The OpenXMLZipFile class will contain all the required methods that help in packaging.
public class OpenXMLZipFile {
    // CreateZipFile method, which takes zipFileName and ToCompressFiles as arguments,
    // goes through the array of ToCompressFiles, and packs each file into zipFileName.
    // The sections below explain the method implementations.
}

Create a CreateZipFile Method


- Create a ZipOutputStream.
- Set the level to Deflater.BEST_COMPRESSION.
- Loop through the list of files to be zipped and, for each file, do the following:
    o Get the file name and add it to the ZipEntry.
    o Set the ZipEntry on the ZipOutputStream.
    o Write the contents of the file to the ZipOutputStream.

public static void CreateZipFile(String zipFileName, String[] ToCompressFiles) {
    try {
        String[] fileNames = ToCompressFiles;
        //fileNames[0] = "C:\\noname.xml";
        //fileNames[1] = "C:\\sql_reference.pdf";
        FileInputStream inStream;
        // "C:\\ZipExample1.zip"
        FileOutputStream outStream = new FileOutputStream(zipFileName);
        ZipOutputStream zipOStream = new ZipOutputStream(outStream);
        zipOStream.setLevel(Deflater.BEST_COMPRESSION);
        for (int loop = 0; loop < fileNames.length; loop++) {
            inStream = new FileInputStream(fileNames[loop]);
            // The entry name is the path exactly as it was passed in, so for a package
            // it must be the item name relative to the package root, with forward slashes.
            zipOStream.putNextEntry(new ZipEntry(fileNames[loop]));
            // Copy the file into the current entry one byte at a time.
            int i = 0;
            while ((i = inStream.read()) != -1) {
                zipOStream.write(i);
            }
            zipOStream.closeEntry();
            inStream.close();
        }
        zipOStream.flush();
        zipOStream.close();
    } catch (IllegalArgumentException iae) {
        iae.printStackTrace();
    } catch (FileNotFoundException fe) {
        System.out.println("File not found====" + fe);
    } catch (IOException ioe) {
        System.out.println("IOException====" + ioe);
        ioe.printStackTrace();
    }
}
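The CreateZipFile method above uses each path both to open the file and as the name of the Zip entry. For a package, the entry names must be the item names relative to the package root (for example _rels/.rels), so a variant that separates the two can be convenient. The following sketch is one possible variant, not the method used in this article: the class PackageZipper and the method zip are hypothetical names, and the base directory parameter is an addition made for this illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical variant of CreateZipFile that takes a base directory.
class PackageZipper {

    // itemNames are relative to baseDir and use forward slashes, e.g. "_rels/.rels".
    static void zip(Path baseDir, String[] itemNames, Path zipFile) throws IOException {
        // try-with-resources closes both streams even if an exception is thrown.
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zipOut = new ZipOutputStream(out)) {
            zipOut.setLevel(Deflater.BEST_COMPRESSION);
            for (String itemName : itemNames) {
                zipOut.putNextEntry(new ZipEntry(itemName));
                // Files.copy streams the file with an internal buffer.
                Files.copy(baseDir.resolve(itemName), zipOut);
                zipOut.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PackageZipper <base directory> <output file>");
            return;
        }
        String[] items = {
            "[Content_Types].xml", "_rels/.rels", "_rels/document.xml.rels",
            "document.xml", "image1.jpg"
        };
        zip(Paths.get(args[0]), items, Paths.get(args[1]));
    }
}
```

With this shape the program can be run from any working directory, because the entry names no longer depend on how the files are located on disk.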

Create a UnZipFile Method


- Read the zip file.
- Loop through the entries in the zip file and, for each entry, do the following:
    o Create a File object. The file name is derived from the ZipEntry.
    o Create an OutputStream using the File object.
    o Read the contents of the ZipEntry into an InputStream.
    o Write the contents of the InputStream into the OutputStream.

public static void UnZipFile(String zipFileName, String ToExtractFile) {
    String inputFileName = zipFileName;   // "C:\\PPT.zip";
    String desFileName = ToExtractFile;   // "C:\\TEST\\";
    try {
        File sourceZipFile = new File(inputFileName);
        File destDirectory = new File(desFileName);
        // Open the ZIP file for reading
        ZipFile zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
        // Get the entries ("enum" is a reserved word in current Java, so the variable is named entries)
        Enumeration entries = zipFile.entries();
        while (entries.hasMoreElements()) {
            ZipEntry zipEntry = (ZipEntry) entries.nextElement();
            String currName = zipEntry.getName();
            File destFile = new File(destDirectory, currName);
            // grab file's parent directory structure
            File destinationParent = destFile.getParentFile();
            // create the parent directory structure if needed
            destinationParent.mkdirs();
            if (!zipEntry.isDirectory()) {
                BufferedInputStream is = new BufferedInputStream(zipFile.getInputStream(zipEntry));
                int currentByte;
                // write the current file to disk
                FileOutputStream fos = new FileOutputStream(destFile);
                BufferedOutputStream dest = new BufferedOutputStream(fos);
                // read and write until last byte is encountered
                while ((currentByte = is.read()) != -1) {
                    dest.write(currentByte);
                }
                dest.flush();
                dest.close();
                is.close();
            }
        }
        zipFile.close();
    } catch (IOException ioe) {
        System.out.println("IOException occured=====" + ioe);
        ioe.printStackTrace();
    }
}
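The UnZipFile method above builds each output path directly from the entry name. If the archive comes from an untrusted source, an entry name such as ../outside.txt would be written outside the destination directory. The sketch below shows one way to guard against that; it is a variant written for this illustration (the class SafeUnzipper and the method unzip are hypothetical names), not the method used in this article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical variant of UnZipFile that refuses entries pointing outside the destination.
class SafeUnzipper {

    static void unzip(Path zipPath, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Resolve the entry name against the destination and remove any ".." steps.
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }
}
```

A call such as SafeUnzipper.unzip(Paths.get("myFirstDocumentUsingJava.docx"), Paths.get("extracted")) would unpack a package into the extracted folder for inspection.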

The complete OpenXMLZipFile class looks as follows after implementation.


import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class OpenXMLZipFile {

    // CreateZipFile method, which takes zipFileName and ToCompressFiles as arguments,
    // goes through the array of ToCompressFiles, and packs each file into zipFileName.
    public static void CreateZipFile(String zipFileName, String[] ToCompressFiles) {
        try {
            String[] fileNames = ToCompressFiles;
            //fileNames[0] = "C:\\noname.xml";
            //fileNames[1] = "C:\\sql_reference.pdf";
            FileInputStream inStream;
            // "C:\\ZipExample1.zip"
            FileOutputStream outStream = new FileOutputStream(zipFileName);
            ZipOutputStream zipOStream = new ZipOutputStream(outStream);
            zipOStream.setLevel(Deflater.BEST_COMPRESSION);
            for (int loop = 0; loop < fileNames.length; loop++) {
                inStream = new FileInputStream(fileNames[loop]);
                // The entry name is the path exactly as it was passed in, so for a package
                // it must be the item name relative to the package root, with forward slashes.
                zipOStream.putNextEntry(new ZipEntry(fileNames[loop]));
                int i = 0;
                while ((i = inStream.read()) != -1) {
                    zipOStream.write(i);
                }
                zipOStream.closeEntry();
                inStream.close();
            }
            zipOStream.flush();
            zipOStream.close();
        } catch (IllegalArgumentException iae) {
            iae.printStackTrace();
        } catch (FileNotFoundException fe) {
            System.out.println("File not found====" + fe);
        } catch (IOException ioe) {
            System.out.println("IOException====" + ioe);
            ioe.printStackTrace();
        }
    }

    // UnZipFile method, which extracts every entry of zipFileName into the ToExtractFile directory.
    public static void UnZipFile(String zipFileName, String ToExtractFile) {
        String inputFileName = zipFileName;   // "C:\\PPT.zip";
        String desFileName = ToExtractFile;   // "C:\\TEST\\";
        try {
            File sourceZipFile = new File(inputFileName);
            File destDirectory = new File(desFileName);
            // Open the ZIP file for reading
            ZipFile zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
            // Get the entries ("enum" is a reserved word in current Java, so the variable is named entries)
            Enumeration entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry zipEntry = (ZipEntry) entries.nextElement();
                String currName = zipEntry.getName();
                File destFile = new File(destDirectory, currName);
                // grab file's parent directory structure
                File destinationParent = destFile.getParentFile();
                // create the parent directory structure if needed
                destinationParent.mkdirs();
                if (!zipEntry.isDirectory()) {
                    BufferedInputStream is = new BufferedInputStream(zipFile.getInputStream(zipEntry));
                    int currentByte;
                    // write the current file to disk
                    FileOutputStream fos = new FileOutputStream(destFile);
                    BufferedOutputStream dest = new BufferedOutputStream(fos);
                    // read and write until last byte is encountered
                    while ((currentByte = is.read()) != -1) {
                        dest.write(currentByte);
                    }
                    dest.flush();
                    dest.close();
                    is.close();
                }
            }
            zipFile.close();
        } catch (IOException ioe) {
            System.out.println("IOException occured=====" + ioe);
            ioe.printStackTrace();
        }
    }
}

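Creating the WordprocessingML package


To create a WordprocessingML document, do the following steps.

1. Create an instance of the class OpenXMLZipFile.
OpenXMLZipFile myWordprocessingML = new OpenXMLZipFile();

2. Create the variables that hold the package file name and the files to compress. CreateZipFile uses each path both to open the file and as the name of the Zip item, so the paths are given relative to c:\WordprocessingML, with forward slashes, and the program is run with c:\WordprocessingML as its working directory. The list includes all five files created above, including the package-relationship item _rels/.rels.
String zipFileName = "c:\\myFirstDocumentUsingJava.docx";
String[] ToCompressFiles = new String[5];
ToCompressFiles[0] = "[Content_Types].xml";
ToCompressFiles[1] = "_rels/.rels";
ToCompressFiles[2] = "_rels/document.xml.rels";
ToCompressFiles[3] = "document.xml";
ToCompressFiles[4] = "image1.jpg";

3. Call the method CreateZipFile. Because the method is static, it can be called on the class itself; the instance from step 1 is not strictly required.
OpenXMLZipFile.CreateZipFile(zipFileName, ToCompressFiles);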

The output of the above method is a file named myFirstDocumentUsingJava.docx. This document fully conforms to the Open XML standard, and can be opened using Office 2007 (or the 2000/XP/2003 versions of Office with the free Open XML add-in installed).
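Before opening the result in Word, it can be useful to confirm that the package contains every item created in the steps above under the expected item name. The sketch below does only that; it does not validate the XML inside the items. The class PackageCheck and the method missingItems are hypothetical names, and the list of required items is simply the five files from this article's example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

// Hypothetical helper: checks that the example package contains the five expected items.
class PackageCheck {

    // Item names of the files created in this article, relative to the package root.
    static final String[] REQUIRED_ITEMS = {
        "[Content_Types].xml",
        "_rels/.rels",
        "_rels/document.xml.rels",
        "document.xml",
        "image1.jpg"
    };

    // Returns the required item names that are not present in the package.
    static List<String> missingItems(String packageFileName) throws IOException {
        List<String> missing = new ArrayList<>();
        try (ZipFile zip = new ZipFile(packageFileName)) {
            for (String name : REQUIRED_ITEMS) {
                if (zip.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PackageCheck <package file>");
            return;
        }
        List<String> missing = missingItems(args[0]);
        System.out.println(missing.isEmpty() ? "All required items present" : "Missing items: " + missing);
    }
}
```

An item reported as missing usually means that the file was left out of the list passed to the packaging method, or that it was stored under a name that includes a drive letter or backslashes.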
