Remove encoded HTML and add line break

I have been trying to solve this for hours with no luck. The XML looks like this:

<description> Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor labore et dolore magna aliquyam erat &lt;p&gt;&lt;b&gt;Section B: China&lt;/b&gt;&lt;/p&gt; &lt;p&gt;Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat&lt;/p&gt; &lt;p&gt;&lt;b&gt;Section C: Himalayan Studies&lt;/b&gt;&lt;/p&gt; &lt;p&gt;Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore a aliquyam erat&lt;/p&gt; </description>

I want the output to be clean, without the encoded <p> or <b> tags, but with a line break inserted before each section by replacing &lt;p&gt;&lt;b&gt; with <br/>. The output should look like this:

<description> Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor labore et dolore magna aliquyam erat <br/>Section B: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam no eirmod tempor invidunt ut labore et dolore magna aliquyam erat <br/>Section C: Himalayan Studies Lorem ipsum dolor sit amet, consetetur sadipscing sed diam nonumy eirmod tempor invidunt ut labore et dolore m ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore a aliquyam erat </description>

I tried the replace() function but could not get it to add line breaks. I also tried translate(), with no luck:

<xsl:value-of select="translate(., translate(., 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ', ''), '')"/>

Any help on how to approach this problem will be appreciated.

Best answer

An XSLT 3.0 solution that uses the parse-xml() function:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

  <!--standard identity template-->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="description">
    <xsl:copy>
      <!--Concatenate encoded <p> element to ensure that it is well-formed XML
          with a document element when parsed.
          Use parse-xml() to parse the encoded markup as a parsed document.
          Apply-templates to the parsed document-->
      <xsl:apply-templates select="parse-xml(concat('&lt;p&gt;', ., '&lt;/p&gt;'))"/>
    </xsl:copy>
  </xsl:template>

  <!-- remove <p> and <b> elements -->
  <xsl:template match="p | b">
    <xsl:apply-templates/>
  </xsl:template>

  <!--for every <p> element that has a <b> element, generate a <br/> -->
  <xsl:template match="p[b]">
    <br/>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
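
If parse-xml() is not available (for example with an XSLT 2.0 processor), or the encoded markup is not guaranteed to be well-formed, a regex-based variant may be enough. The following is only a sketch, not part of the original answer. It assumes the encoded tags are always the bare forms <p>, <b>, </p> and </b> with no attributes, and that a section heading always starts with <p> immediately followed by <b>, optionally separated by whitespace:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="description">
    <xsl:copy>
      <!-- Work on the string value of description, in which the
           encoded markup appears as literal <p> and <b> text.
           Assumption: tags carry no attributes. -->
      <xsl:analyze-string select="." regex="&lt;p&gt;\s*&lt;b&gt;">
        <!-- each <p><b> pair marks a section start: emit a real <br/> -->
        <xsl:matching-substring>
          <br/>
        </xsl:matching-substring>
        <!-- everywhere else, strip the remaining <p>, </p>, <b>, </b> tags -->
        <xsl:non-matching-substring>
          <xsl:value-of select="replace(., '&lt;/?(p|b)&gt;', '')"/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The parse-xml() approach is more robust when the embedded markup is well-formed, because it works on real elements rather than on text patterns. The regex variant trades that robustness for tolerance of fragments that would not parse.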
