Did You Know Regular Expressions Can Filter Out HTML Elements?

Repost | Other | Editor: 郝浩 | 2008-11-18 11:22:29 | 1284 views

Summary: A Java method that uses a regular expression to strip HTML elements from a string, followed by its test cases.


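  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  /**
   * Filters all HTML elements out of a string.
   * For example, {@code <a href="www.sohu.com/test">hello!</a>}
   * is filtered to {@code hello!}.
   * Note: this method removes the text between "<" and ">", including the brackets.
   *
   * @param element the HTML fragment to filter; may be null
   * @return the text without HTML elements, or the input unchanged if it is null or blank
   */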

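  public static String getTxtWithoutHTMLElement(String element)
  {
    // Earlier one-line version, kept for reference. It also deleted
    // whitespace-only brackets such as "< >":
    // String reg = "<[^<>]+>";
    // return element.replaceAll(reg, "");

    if (null == element || "".equals(element.trim()))
    {
      return element;
    }

    // Match "<", any run of characters other than "<" and ">", then ">".
    Pattern pattern = Pattern.compile("<[^<>]*>");
    Matcher matcher = pattern.matcher(element);
    StringBuffer txt = new StringBuffer();
    while (matcher.find())
    {
      String group = matcher.group();
      if (group.matches("<[\\s]*>"))
      {
        // Empty or whitespace-only brackets are not tags: keep them as they are.
        matcher.appendReplacement(txt, group);
      }
      else
      {
        // Everything else is treated as a tag and dropped.
        matcher.appendReplacement(txt, "");
      }
    }
    matcher.appendTail(txt);

    // Decode common entities. "&amp;" is handled last so that "&amp;lt;"
    // becomes "&lt;" instead of being decoded twice into "<".
    repaceEntities(txt, "&lt;", "<");
    repaceEntities(txt, "&gt;", ">");
    repaceEntities(txt, "&quot;", "\"");
    repaceEntities(txt, "&nbsp;", "");
    repaceEntities(txt, "&amp;", "&");
    return txt.toString();
  }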
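The method calls a helper named repaceEntities that the article does not show. The following is only a sketch of what such a helper might look like, assuming it replaces every occurrence of one string in the buffer in place. The class name EntitySketch and the method body are assumptions made for illustration; only the helper's name (with its original spelling) and its three-argument call shape come from the article.

```java
// Hypothetical helper: the article calls repaceEntities but never defines it.
final class EntitySketch {

    /** Replaces every occurrence of entity in txt with replacement, in place. */
    static void repaceEntities(StringBuffer txt, String entity, String replacement) {
        int from = 0;
        int at;
        while ((at = txt.indexOf(entity, from)) >= 0) {
            txt.replace(at, at + entity.length(), replacement);
            // Resume after the inserted text so the replacement is never rescanned.
            from = at + replacement.length();
        }
    }

    public static void main(String[] args) {
        StringBuffer txt = new StringBuffer("a&nbsp;&lt;b&gt;&amp;lt;");
        repaceEntities(txt, "&lt;", "<");
        repaceEntities(txt, "&gt;", ">");
        repaceEntities(txt, "&nbsp;", "");
        repaceEntities(txt, "&amp;", "&"); // decoded last to avoid double decoding
        System.out.println(txt); // prints a<b>&lt;
    }
}
```

Resuming the search after the inserted text keeps each call to a single pass, so a replacement that happens to contain the search string cannot cause an endless loop.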
 
  The test cases are as follows:

  public void testGetTxtWithoutHTMLElement()
  {
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test</a>"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<input type='text'>test</input>"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<p>test"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>test</td></tr></table>"));
    assertEquals("te<st", ExcelHssfView.getTxtWithoutHTMLElement("<p>te<st"));
    assertEquals("te>st", ExcelHssfView.getTxtWithoutHTMLElement("<p>te>st"));
    assertEquals("tst", ExcelHssfView.getTxtWithoutHTMLElement("<p>t<e>st"));
    assertEquals("t<st", ExcelHssfView.getTxtWithoutHTMLElement("<p>t<<e>st"));
    assertEquals("<>test", ExcelHssfView.getTxtWithoutHTMLElement("<p><>test"));
    assertEquals("<<>test", ExcelHssfView.getTxtWithoutHTMLElement("<p><<>test"));
    assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>&nbsp;test</td></tr></table>"));
  }
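The loop's special case for empty brackets could also be folded into the pattern itself with a negative lookahead, which reduces tag removal to a single replaceAll call. The sketch below is an alternative, not the article's method: the class name LookaheadSketch and the method name stripTags are made up for illustration, and the sketch only removes tags (it does not decode entities).

```java
import java.util.regex.Pattern;

// Hypothetical alternative: tag removal in one expression, without entity decoding.
final class LookaheadSketch {

    // "<", not followed by optional whitespace and ">", then any characters
    // other than "<" and ">", then ">". The lookahead leaves "<>" and "< >" alone.
    private static final Pattern TAG = Pattern.compile("<(?!\\s*>)[^<>]*>");

    static String stripTags(String html) {
        if (html == null) {
            return null;
        }
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<a href='a/test'>test</a>")); // prints test
        System.out.println(stripTags("<p>t<<e>st"));                // prints t<st
        System.out.println(stripTags("<p><<>test"));                // prints <<>test
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call, which matters if the method runs once per cell or per row.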



Reposted from: IT专家网论坛
