How to extract the content of only certain tags using css selector #76

@rulo4

This is the web page I want to extract text from: http://www.jornada.unam.mx/2018/02/21/politica/005n1pol

If I use the complex-config.xml file from the examples, I get all the content of the web page. Now I want to extract only the content inside a specific div. That div has the following CSS selector: #article-cont.

To achieve this, I'm trying to use the following importer configuration, but I still get all page content.

<importer>
    <preParseHandlers>
        <filter class="com.norconex.importer.handler.filter.impl.DOMContentFilter"
                selector="#article-cont" onMatch="include">
        </filter>
    </preParseHandlers>
</importer>

What am I doing wrong? What do I have to do?
