Thursday, June 16, 2011

XSLT - Using XSLT to generate HTML from XML

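Extensible Stylesheet Language Transformations (XSLT) is something I ran into recently. It is used to transform XML documents; this time I used it to convert XML to HTML. An XSLT stylesheet is itself an XML file, so it has to follow the strict XML rules as well. The XSLT Wiki has some explanation and a few links. I suggest reading an introduction to XSLT and some XPath examples first; the two links below are a good starting point:
XPATH syntax
XSLT basics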

This post focuses on XML-to-HTML processing, and on calling Java methods and using a JavaBean from inside XSLT to achieve certain effects.

The original XML file looks roughly like this. A mail XML file implemented according to EDRM ends up looking... very ugly.
<?xml version='1.0' encoding='UTF-8' ?>
  <Root caseId="Case1" description="Test Case" locale="US" majorVersion="1" minorVersion="2" rootFilePath="/temp/">
    <batch>
      <Document MimeType="multipart/mixed" DocType="Header" DocID="1">
        <FieldValues>
          <contentType>multipart/mixed;
 boundary="------------010709010402030905080005"</contentType>
          <mailAuthor>N2&lt;tamino2@tsai.james&gt;</mailAuthor>
          <mailCC>tamino0@tsai.james</mailCC>
          <mailSubject>2011/06/14-test01</mailSubject>
          <mailTo>tamino1@tsai.james,tamino0@tsai.james</mailTo>
          <messageID>&lt;4DF6BDD9.7090905@tsai.james&gt;</messageID>
        </FieldValues >
      </Document>
      <Document MimeType="multipart/alternative" DocType="MIME Entity" DocID="2">
        <FieldValues>
          <contentType>multipart/alternative;
 boundary="------------080407070307020309080702"</contentType>
        </FieldValues>
      </Document>
      <Document MimeType="text/plain" DocType="MIME Entity" DocID="3">
        <FieldValues>
          <content>aabbccd
*eeffggy*
myImg1


myImg2

</content>
          <contentType>text/plain; charset=UTF-8; format=flowed</contentType>
        </FieldValues>
      </Document>
      <Document MimeType="multipart/related" DocType="MIME Entity" DocID="4">
        <FieldValues>
          <contentType>multipart/related;
 boundary="------------000300090903000901010201"</contentType>
        </FieldValues>
      </Document>
      <Document MimeType="text/html" DocType="MIME Entity" DocID="5">
        <FieldValues>
          <content>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"&gt;
&lt;html&gt;
  &lt;head&gt;

    &lt;meta http-equiv="content-type" content="text/html; charset=UTF-8"&gt;
  &lt;/head&gt;
  &lt;body text="#000000" bgcolor="#ffffff"&gt;
    aabbccd&lt;br&gt;
    &lt;b&gt;&lt;big&gt;&lt;big&gt;&lt;big&gt;eeffggy&lt;/big&gt;&lt;/big&gt;&lt;/big&gt;&lt;/b&gt;&lt;br&gt;
    myImg1&lt;br&gt;
    &lt;img src="cid:part1.03050007.07010308@tsai.james" alt=""&gt;&lt;br&gt;
    &lt;br&gt;
    myImg2&lt;br&gt;
    &lt;img src="cid:part2.01030500.03060103@tsai.james" alt=""&gt;&lt;br&gt;
  &lt;/body&gt;
&lt;/html&gt;
</content>
          <contentType>text/html; charset=UTF-8</contentType>
        </FieldValues>
      </Document>
      <Document MimeType="image/jpeg" DocType="MIME Entity" DocID="6">
        <FieldValues>
          <content>/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0a
HBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIy
MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAaAVADASIA
</content>
          <contentID>&lt;part1.03050007.07010308@tsai.james&gt;</contentID>
          <contentTransferEncoding>base64</contentTransferEncoding>
          <contentType>image/jpeg;
 name="new_pagingtoolbar.JPG"</contentType>
        </FieldValues>
      </Document>
      <Document MimeType="image/jpeg" DocType="MIME Entity" DocID="7">
        <FieldValues>
          <content>/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0a
HBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIy
</content>
          <contentID>&lt;part2.01030500.03060103@tsai.james&gt;</contentID>
          <contentTransferEncoding>base64</contentTransferEncoding>
          <contentType>image/jpeg;
 name="SaveAs_SubMenu.JPG"</contentType>
        </FieldValues>
      </Document>
      <Document MimeType="application/vnd.openxmlformats-officedocument.wordprocessingml.document" DocType="Attach File" DocID="8">
        <FieldValues>
          <contentTransferEncoding>7bit</contentTransferEncoding>
          <contentType>application/vnd.openxmlformats-officedocument.wordprocessingml.document;
 name="temp.docx"</contentType>
        </FieldValues>
      </Document>
      <Document MimeType="application/octet-stream" DocType="Attach File" DocID="9">
        <FieldValues>
          <contentTransferEncoding>base64</contentTransferEncoding>
          <contentType>application/octet-stream;
 name="test.rar"</contentType>
        </FieldValues>
      </Document>
    </batch>
    <fields/>
  </Root>
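For readability I cut out part of the base64-encoded file content, so the XML above will not work as-is. When transforming XML with XSLT, a simple XML structure makes the XSL file much easier to design; with XML as complex as the sample above, the processing gets more troublesome. First, let's look at what the output looks like. Whether it is written to an HTML file or returned as an HTML string, it looks roughly like this:

[Image: the rendered HTML mail view]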

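In short, the original e-mail content is converted into EDRM XML, which yields a pile of XML files, and those are turned into an HTML view through XSLT. The XSL therefore has to pull out the relevant attributes and turn them into HTML. Below is the XSL file I ended up with: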
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns:cu="com.mypackage.util.XsltCommonUtils"
 xmlns:obj="com.mypackage.xsltObject.XsltDataObject"
 exclude-result-prefixes="cu obj" >
 
 <xsl:output method="html" encoding="UTF-8" />
 
 
 <xsl:variable name="Documents">
  <xsl:value-of select="/Documents" />
 </xsl:variable>
 <xsl:variable name="From"> 
  <xsl:value-of select="cu:processEscapingString(//FieldValues/mailAuthor)" disable-output-escaping="yes"/>
 </xsl:variable>
 <xsl:variable name="To">
  <xsl:value-of select="//FieldValues/mailTo" />
 </xsl:variable>
 <xsl:variable name="Cc">
  <xsl:value-of select="//FieldValues/mailCC" />
 </xsl:variable>
 <xsl:variable name="Bcc">
  <xsl:value-of select="//FieldValues/mailBCC" />
 </xsl:variable>
 <xsl:variable name="SendDate">
  <xsl:value-of select="//FieldValues/mailSentDate" />
 </xsl:variable>
 <xsl:variable name="Subject">
  <xsl:value-of select="//FieldValues/mailSubject" />
 </xsl:variable>
 <xsl:variable name="uid">
  <xsl:value-of select="//FieldValues/messageID" />
 </xsl:variable>
 
 <xsl:variable name="bodyObj" select="obj:new()"/>  
 
 <xsl:param name="htmlBody">
  <xsl:for-each select="//Document">
   <xsl:choose>
    <xsl:when test="contains(@MimeType,'text/html')">
        <xsl:value-of select="cu:setXlstDataObject($bodyObj,FieldValues/content)" disable-output-escaping="yes"/>     
        <xsl:value-of select="cu:setXlsDataObjectIsHtml($bodyObj)"/>
    </xsl:when>
   </xsl:choose>
  </xsl:for-each>       
 </xsl:param>

 <xsl:variable name="textBody">
  <xsl:if test="not(cu:getXlsDataObjectIsHtml($bodyObj))">
   <xsl:for-each select="//Document">
    <xsl:choose>
    <xsl:when test="contains(@MimeType,'text/plain')">
        <xsl:if test="contains(@DocType,'MIME Entity')">
         <xsl:value-of select="FieldValues/content" />
        </xsl:if> 
    </xsl:when>
   </xsl:choose>
   </xsl:for-each>    
  </xsl:if> 
 </xsl:variable>
 
 <xsl:template match="/">
  <html>
   <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
     BODY, TD {
     font-family: Arial;
     color: black;
     font-size: 9pt;
     font-weight: normal;
     text-decoration: none; }
     * {margin:0; padding:0;}
     .header{
     background-image: url(img/bg_line.jpg);
     background-repeat: repeat-x;
     background-position: left top;
     height: 94px;
     width: 100%;
     }
     .headerbg{
     background-image: url(img/bg.jpg);
     background-repeat: no-repeat;
     background-position: left top;
     height: 94px;
     width: 100%;
     }
     .main{
     margin-right: 5px;
     padding-top: 15px;
     width: 200px;
     }
     .right{
     float: right;
     margin-right: 5px;
     margin-left: 3px;
     padding-top: 4px;
     text-align: right;
     }
     .left{
     float: left;
     margin-right: 5px;
     margin-left: 3px;
     padding-top: 4px;
     text-align: left;
     }
     .logoimg{
     width: 50px;
     padding-right: 5px;
     padding-left: 10px;
     padding-top: 15px;
     }

     .input_a{
     border: 1px solid #333;
     font-family: Arial;
     font-size: 9pt;
     font-weight: normal;
     color: #000;
     height: 18px;
     width: 312px;
     background-image: url(img/usericon.jpg);
     background-repeat: no-repeat;
     background-color: #FFF;
     padding-left: 17px;
     background-position: 2px;
     }
     .input_b{
     border: 1px solid #333;
     font-family: Arial;
     font-size: 9pt;
     font-weight: normal;
     color: #000;
     height: 18px;
     width: 500px;
     background-color: #FFF;
     padding-left: 0px;
     background-position: 2px;
     }
     TABLE.tablestyle1 TD, TABLE.tablestyle1
     TD { border: 1 solid black;
     }
     TABLE.tablestyle1 { border-collapse:
     collapse; border-spacing: 0px
     0px; empty-cells: show }
    </style>
   </head>
   <body>
   
    
    <table width="100%" border="0" cellspacing="0" cellpadding="0">
     <tr>
      <td align="left" valign="top" class="headerbg">
       <table width="100%" border="0" cellspacing="0" cellpadding="0">
        <tr>
         <td align="left" valign="top" class="logoimg">
          <img src="img/logo.gif" width="60" height="60" alt="" />
         </td>
         <td align="left" valign="top" class="main">
          <b>
           <xsl:value-of select="$From"
            disable-output-escaping="yes" />
          </b>
          <br />
          <xsl:value-of select="$SendDate" />
         </td>
         <td align="left" valign="top" class="left">
          <table width="50px" border="0" cellspacing="1"
           cellpadding="0">
           <tr>
            <td align="right" valign="top">To：</td>
            <td width="20%" align="right" valign="top">
             <xsl:text disable-output-escaping="yes"><![CDATA[<input type="text" size="50" class="input_b" readonly="true" value=']]></xsl:text>
             <xsl:value-of select="$To"
              disable-output-escaping="yes" />
             <xsl:text disable-output-escaping="yes"><![CDATA['/>]]></xsl:text>
            </td>
           </tr>
           <tr>
            <td align="right" valign="top">cc：</td>
            <td align="right" valign="top">
             <xsl:text disable-output-escaping="yes"><![CDATA[<input type="text" size="50" class="input_b" readonly="true" value=']]></xsl:text>
             <xsl:value-of select="$Cc"
              disable-output-escaping="yes" />
             <xsl:text disable-output-escaping="yes"><![CDATA['/>]]></xsl:text>
            </td>
           </tr>
           <tr>
            <td align="right" valign="top">bcc：</td>
            <td align="right" valign="top">
             <xsl:text disable-output-escaping="yes"><![CDATA[<input type="text" size="50" class="input_b" readonly="true" value=']]></xsl:text>
             <xsl:value-of select="$Bcc"
              disable-output-escaping="yes" />
             <xsl:text disable-output-escaping="yes"><![CDATA['/>]]></xsl:text>
            </td>
           </tr>
           <tr>
            <td align="right" valign="top">Subject：</td>
            <td align="right" valign="top">
             <xsl:text disable-output-escaping="yes"><![CDATA[<input type="text" size="50" class="input_b" readonly="true" value=']]></xsl:text>
             <xsl:value-of select="$Subject"
              disable-output-escaping="yes" />
             <xsl:text disable-output-escaping="yes"><![CDATA['/>]]></xsl:text>
            </td>
           </tr>
          </table>
         </td>
        </tr>
       </table>
      </td>
     </tr>
    </table>
    <br />
    <table width="100%">
     <tr>
      <td>
       
       <!-- 
       <xsl:for-each select="//item[@name='Body']/itemdata[@type='19']">
        <pd4ml-include encoding="qp" debug="false">
         <xsl:copy-of select="text()" />
        </pd4ml-include>
       </xsl:for-each>

       <xsl:for-each select="//item/rawitemdata[@type='19']">
        <table border="0" width="100%">
         <tr>
          <td style="word-wrap: break-word">
           <pd4ml-include xencoding="qp" debug="false">
            <xsl:copy-of select="text()" />
           </pd4ml-include>
          </td>
         </tr>
        </table>
       </xsl:for-each>
        -->
       <!-- 
       <xsl:for-each
        select="//item[@name='$FILE']/object/file[@name='mime.htm']/filedata">
        <pd4ml-include debug="false">
         <xsl:attribute name="encoding"><xsl:value-of
          select="../@encoding" /></xsl:attribute>
         <xsl:copy-of select="text()" />
        </pd4ml-include>
       </xsl:for-each>
       
       <xsl:apply-templates select="item" />
        -->
       <!-- commented out because file[contains(@name,'mime.gif')] causes 
        java.lang.ArrayIndexOutOfBoundsException in Xalan under some conditions xsl:for-each 
        select="//item[@name='$FILE']/object/file[contains(@name,'mime.gif')]/filedata"> 
        <img> <xsl:attribute name="src">data:image/gif;base64,<xsl:value-of select="normalize-space(.)" 
        disable-output-escaping="yes"/></xsl:attribute> <xsl:attribute name="border">0</xsl:attribute> 
        </img> </xsl:for-each -->
      </td>
     </tr>
     </table>
     <!-- 
    <xsl:for-each select="//Document">
     <xsl:choose>
      <xsl:when test="contains(@DocType,'MIME Entity')">
       <xsl:value-of select="Document/FieldValues/ContentID" disable-output-escaping="yes"/>  
       <xsl:value-of select="cu:testNodeSet(Document/FieldValues)" />
      </xsl:when>
   
     </xsl:choose>
    </xsl:for-each>
     -->
     
     
     <!--  to get body string with inline images. -->
     <xsl:apply-templates select="//Document" ></xsl:apply-templates>
     
     <xsl:value-of select="$textBody" disable-output-escaping="yes"/>
     <xsl:value-of select="cu:getXlstDataObject($bodyObj)" disable-output-escaping="yes"/>
     
   </body>
  </html>
 </xsl:template>
 
 
 <xsl:template match="Document">    
   <xsl:choose>
     <xsl:when test="contains(@MimeType,'image')">
       <xsl:variable name="contentId">
        <xsl:value-of select="FieldValues/contentID" disable-output-escaping="yes"/> 
       </xsl:variable>
       <xsl:variable name="content">       
        <xsl:value-of select="FieldValues/content" disable-output-escaping="yes"/>
       </xsl:variable>
       <xsl:variable name="fileType">       
        <xsl:value-of select="Files/File/@FileType" disable-output-escaping="yes"/>
       </xsl:variable>

       <xsl:variable name="filePath">
        <xsl:value-of select="Files/File/ExternalFile/@FilePath" disable-output-escaping="yes"/>
       </xsl:variable>
       <xsl:variable name="fileName">       
        <xsl:value-of select="Files/File/ExternalFile/@FileName" disable-output-escaping="yes"/>
       </xsl:variable>
       <xsl:variable name="hashMd5">
        <xsl:value-of select="Files/File/ExternalFile/@Hash" disable-output-escaping="yes"/>
       </xsl:variable>
      <xsl:value-of select="cu:setDataObjectBodyByInlineImageUrl($bodyObj,$contentId,$fileType,$content,$fileName,$hashMd5) "  disable-output-escaping="yes"/>               
    </xsl:when>
    <xsl:when test="contains(@DocType,'Attach File')">
     <!-- do something -->
    </xsl:when>
   </xsl:choose>   
 </xsl:template>
 
</xsl:stylesheet> 
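
The global variables at the top of the stylesheet pull the mail header fields with XPath expressions such as //FieldValues/mailSubject. If you want to check what such an expression returns before wiring it into the XSL, a small sketch with the JDK's javax.xml.xpath API could look like the following. The class name XPathFieldDemo, the helper evaluate, and the cut-down sample XML are made up for illustration and are not part of the original project.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathFieldDemo {

    // Cut-down, hypothetical version of the EDRM sample: one header Document only.
    static final String SAMPLE_XML =
        "<Root><batch>"
      + "<Document MimeType=\"multipart/mixed\" DocType=\"Header\" DocID=\"1\">"
      + "<FieldValues>"
      + "<mailAuthor>N2&lt;tamino2@tsai.james&gt;</mailAuthor>"
      + "<mailSubject>2011/06/14-test01</mailSubject>"
      + "</FieldValues>"
      + "</Document>"
      + "</batch></Root>";

    // Evaluates an XPath expression against the XML and returns its string value.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(SAMPLE_XML, "//FieldValues/mailSubject"));
        System.out.println(evaluate(SAMPLE_XML, "//FieldValues/mailAuthor"));
        // A field that is absent (mailBCC here) yields an empty string, not an error.
        System.out.println("[" + evaluate(SAMPLE_XML, "//FieldValues/mailBCC") + "]");
    }
}
```

This is also why the Bcc and SendDate variables in the stylesheet simply come out empty for the sample mail: the XML has no mailBCC or mailSentDate element.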


And the related Java classes:
XsltDataObject.java
package com.mypackage.xsltObject;

public class XsltDataObject {
 private  String bodyString;
 private boolean isHtml = false;
 
 
 
 
 public boolean isHtml() {
  return isHtml;
 }

 public void setHtml(boolean isHtml) {
  this.isHtml = isHtml;
 }

 public  String getBodyString() {
  return bodyString;
 }

 public  void setBodyString(String bodyString) {
  this.bodyString = bodyString;
 }
}

And XsltCommonUtils.java:
package com.mypackage.util;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import com.mypackage.xsltObject.XsltDataObject;

public class XsltCommonUtils {
 
 
 
 public static void setXlstDataObject(XsltDataObject object,String bodyString){
  String bodyStr = bodyString;
  bodyStr = bodyStr.replaceAll("cid:", "");
  object.setBodyString(bodyStr);
 }
 
 public static String getXlstDataObject(XsltDataObject object){
  return object.getBodyString();
 }
 
 public static void setXlsDataObjectIsHtml(XsltDataObject object){
  System.out.println("setXlsDataObjectIsHtml to true ");
  object.setHtml(true);
 }
 
 public static boolean getXlsDataObjectIsHtml(XsltDataObject object){
  System.out.println("getXlsDataObjectIsHtml  = " + object.isHtml());
  return object.isHtml();
 }
 
 
 public static void setDataObjectBodyByInlineImageUrl(XsltDataObject inObject,
   String contentId, String fileType, String content,
   String fileName, String hashMd5) {
  
  
  //for debug
  System.out.println("contentId = " + contentId);
  System.out.println("fileType = " + fileType);

  System.out.println("fileName = " + fileName);
  
  
  String contentIdStr = replaceContentId(contentId.trim());
  System.out.println("contentIdStr =" + contentIdStr);
  StringBuffer resultString = new StringBuffer();
  String bodyString = inObject.getBodyString();
  
  resultString.append("data:");
  resultString.append(fileType);
  resultString.append(";base64,");
  
  
  //will be change to get file name rule.
  /*
  String fullFileName = filePath + "/" + contentIdStr;
  System.out.println("fullFileName = " + fullFileName);
  fullFileName =  fullFileName.replaceAll("\\\\", "/");
  
  File f = new File(fullFileName);
  BufferedReader reader = null;
  StringBuffer contents = new StringBuffer();
  
  if(f.exists()){
   try {
    reader = new BufferedReader(new FileReader(f));
    String text = null;
    while ((text = reader.readLine()) != null) {
     contents.append(text);
    }
    reader.close();
   } catch (FileNotFoundException e) {    
    e.printStackTrace();
   } catch (IOException e) {
    e.printStackTrace();
   }
  }
  */
//  System.out.println("contents = " + contents.toString());
  resultString.append(content);
  
  // Literal replace: the content ID contains '.', which replaceAll would treat as a regex wildcard.
  bodyString = bodyString.replace(contentIdStr, resultString.toString());
  
//  System.out.println("Result bodyString = " + bodyString);
  inObject.setBodyString(bodyString);
 }
 
 
 private static String replaceContentId(String contentId){
  String result = contentId;
  result = result.replaceAll("&lt;", "");
  result = result.replaceAll("&gt;", "");
  result = result.replaceAll("<", "");
  result = result.replaceAll(">", "");
  
  return result;
 }
 
 public static String processEscapingString(String inputString){
  System.out.println("processEscapingString = " + inputString);
  String result = inputString;
  result = result.replaceAll("&lt;", "&lt; ");
  result = result.replaceAll("&gt;", " &gt;");
  result = result.replaceAll("<", "< ");
  result = result.replaceAll(">", " >");
  System.out.println("processEscapingString result= " + result);
  return result;
  
 }

}

Don't worry about what the Java classes do for now. In the XSL file, note this line:
<xsl:variable name="bodyObj" select="obj:new()"/> 
This declares a variable named bodyObj. An XSLT variable cannot be changed once it is set, but we can still operate on the values held by this Java object.
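
To make the "fixed variable, mutable object" point concrete, here is a rough Java analogy. It is not code from the project: BodyHolder is a made-up stand-in for XsltDataObject, and markAsHtml is a hypothetical helper in the spirit of the cu: functions. A final local behaves much like an xsl:variable in that the binding cannot be reassigned, while the object behind it can still change state.

```java
class ImmutableBindingDemo {

    // Hypothetical, trimmed-down stand-in for XsltDataObject: a plain mutable bean.
    static class BodyHolder {
        private String bodyString = "";
        private boolean html;

        String getBodyString() { return bodyString; }
        void setBodyString(String bodyString) { this.bodyString = bodyString; }
        boolean isHtml() { return html; }
        void setHtml(boolean html) { this.html = html; }
    }

    // Plays the role of the static helpers called from the stylesheet:
    // it receives the object and changes its state in place.
    static void markAsHtml(BodyHolder holder, String body) {
        holder.setBodyString(body);
        holder.setHtml(true);
    }

    public static void main(String[] args) {
        // Comparable to <xsl:variable name="bodyObj" select="obj:new()"/>: bound once.
        final BodyHolder bodyObj = new BodyHolder();
        // bodyObj = new BodyHolder();   // would not compile: the binding is fixed

        markAsHtml(bodyObj, "<b>hi</b>");
        System.out.println(bodyObj.isHtml() + " " + bodyObj.getBodyString());
    }
}
```

One caveat with this trick: it relies on the processor evaluating the extension calls in the order you expect, so treat it as a workaround rather than something XSLT guarantees.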

Now look at the param definition below:
<xsl:param name="htmlBody">
  <xsl:for-each select="//Document">
   <xsl:choose>
    <xsl:when test="contains(@MimeType,'text/html')">
        <xsl:value-of select="cu:setXlstDataObject($bodyObj,FieldValues/content)" disable-output-escaping="yes"/>     
        <xsl:value-of select="cu:setXlsDataObjectIsHtml($bodyObj)"/>
    </xsl:when>
   </xsl:choose>
  </xsl:for-each>       
 </xsl:param>

In cu:setXlstDataObject, cu is the namespace prefix we declared (xmlns:cu), and it points to the Java class com.mypackage.util.XsltCommonUtils. So when the XSLT calls
cu:setXlstDataObject($bodyObj,FieldValues/content)
the setXlstDataObject method of XsltCommonUtils is invoked:
public static void setXlstDataObject(XsltDataObject object,String bodyString){
  String bodyStr = bodyString;
  bodyStr = bodyStr.replaceAll("cid:", "");
  object.setBodyString(bodyStr);
}
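Note that $bodyObj in the XSL is automatically mapped to the XsltDataObject parameter of the Java method setXlstDataObject. XsltCommonUtils can then hold whatever JavaBean-handling code is needed to operate on the $bodyObj from the XSL.
For how other object types are converted between XSL and Java, see the documentation.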
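
The part of XsltCommonUtils that matters most is the inline-image handling: each cid: reference in the HTML body is swapped for a data: URI built from the image's base64 content. Below is a condensed sketch of that idea, not the original code. It replaces the whole "cid:<id>" token in one step (the original strips "cid:" first and then replaces the bare ID), and it also removes the line breaks inside the base64 text, which is my own assumption about what the data needs. InlineImageDemo, stripBrackets and inlineImage are hypothetical names.

```java
class InlineImageDemo {

    // Strips the angle brackets (literal or escaped) that wrap a Content-ID.
    static String stripBrackets(String contentId) {
        return contentId.trim()
                .replace("&lt;", "").replace("&gt;", "")
                .replace("<", "").replace(">", "");
    }

    // Replaces the "cid:<id>" reference in the HTML body with a data: URI.
    static String inlineImage(String htmlBody, String contentId,
                              String mimeType, String base64) {
        // Assumption: base64 text taken from the XML may span several lines.
        String dataUri = "data:" + mimeType + ";base64," + base64.replaceAll("\\s+", "");
        // String.replace is literal, so the dots in the ID are not regex wildcards.
        return htmlBody.replace("cid:" + stripBrackets(contentId), dataUri);
    }

    public static void main(String[] args) {
        String body = "<img src=\"cid:part1.03050007.07010308@tsai.james\" alt=\"\">";
        String result = inlineImage(body, "<part1.03050007.07010308@tsai.james>",
                "image/jpeg", "/9j/4AAQ\nSkZJ");
        System.out.println(result);
    }
}
```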

Once the XSL file and the XML are both ready, the conversion to HTML goes through TransformerFactory:

XmlToXLSTtest.java
package com;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToXLSTtest {
 
 /**       
  * @param args 
  */
 public static void main(String[] args) {
  TransformerFactory tFactory = TransformerFactory.newInstance();
  Transformer transformer; 
  try {  
   transformer = tFactory.newTransformer(new javax.xml.transform.stream.StreamSource("test.xsl"));
   
//   StreamSource xslt = new StreamSource(dxlReader);
//   StreamResult result = new StreamResult(outWriter);
   
   transformer.transform(new javax.xml.transform.stream.StreamSource("EDRM_Sample-2.xml"), 
                   new javax.xml.transform.stream.StreamResult(new FileOutputStream("EDRM.html")));
   
   System.out.println("transformer finished.");
  } catch (TransformerConfigurationException e) {
   e.printStackTrace();
  } catch (FileNotFoundException e) {
   e.printStackTrace();
  } catch (TransformerException e) {
   e.printStackTrace();
  }
 }

}
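
Earlier I mentioned that the result can also be returned as an HTML string instead of being written to a file. A minimal sketch of that variant follows, using a StringWriter as the result target. To keep it self-contained it uses a tiny made-up stylesheet (version 1.0, no Java extension functions) and a one-field XML sample; XsltToStringDemo and transform are hypothetical names, not classes from the project.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToStringDemo {

    // Hypothetical minimal stylesheet: prints the mail subject as a heading.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"/\">"
      + "<html><body><h1>"
      + "<xsl:value-of select=\"//FieldValues/mailSubject\"/>"
      + "</h1></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Cut-down sample in the shape of the EDRM XML above.
    static final String XML =
        "<Root><batch><Document DocType=\"Header\"><FieldValues>"
      + "<mailSubject>2011/06/14-test01</mailSubject>"
      + "</FieldValues></Document></batch></Root>";

    // Runs the transformation in memory and returns the HTML as a String.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // TransformerConfigurationException is a subclass, so this covers both.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Because nothing is written to disk, there is no output stream to close; the file-based version above would be better off wrapping its FileOutputStream in try-with-resources.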



Reference:
Java and XSLT
XSLT Wiki
Processing XML with Java
XSL Transformations (XSLT) Version 2.0 (W3C)

XPATH syntax
XSLT basics
Simple Xalan extension functions
