HtmlDocumentHeaderIndex Class

Article Id: kb251
Published: 12 Feb 2016 at 14:32
Last Updated: 12 Feb 2016 at 17:11

Summary

This article explains the HtmlDocumentHeaderIndex class included in the CalzadaMedia.Web common library. For reference, examples based on working code are also provided.

Overview

The HtmlDocumentHeaderIndex class generates an index of the header tags (H1, H2, H3, etc.) within an HTML document. A common application for this class is generating a table of contents (ToC) for a document. An example can be seen in this article: the Contents section at the top was generated using HtmlDocumentHeaderIndex.

How It Works

HtmlDocumentHeaderIndex scans a supplied HTML string for all instances of header tags. From these it builds an index of the headers, including each header's position within the document structure.
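The scanning step can be pictured with a short VB.NET sketch. It is not the CalzadaMedia.Web implementation: it assumes well-formed, non-nested header elements that a regular expression can match, and the names HeaderEntry, HeaderScanSketch and ScanHeaders are hypothetical. For each header it records the level, the plain text (with inner tags stripped) and the character offset of the opening tag.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical sketch of header scanning; not the CalzadaMedia.Web implementation.
Public Class HeaderEntry
    Public Property Level As Integer      ' 2 for <h2>, 3 for <h3>, and so on
    Public Property Text As String        ' header text with any inner tags stripped
    Public Property Position As Integer   ' character offset of the opening tag
End Class

Public Module HeaderScanSketch
    ' Matches <h1> to <h6> elements; assumes well-formed, non-nested headers.
    Private ReadOnly HeaderPattern As New Regex(
        "<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Matches any tag inside a header, such as <b> or </em>.
    Private ReadOnly InnerTagPattern As New Regex("<[^>]+>")

    ' Returns one entry per header, in document order.
    Public Function ScanHeaders(html As String) As List(Of HeaderEntry)
        Dim entries As New List(Of HeaderEntry)()
        For Each m As Match In HeaderPattern.Matches(html)
            entries.Add(New HeaderEntry With {
                .Level = Integer.Parse(m.Groups(1).Value),
                .Text = InnerTagPattern.Replace(m.Groups(2).Value, "").Trim(),
                .Position = m.Index
            })
        Next
        Return entries
    End Function
End Module
```

For example, scanning "<h2>Intro</h2>" would return a single entry with Level 2 and Text "Intro". A regular expression keeps the sketch short, but it will not cope with malformed or nested markup.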

In addition to building the index, HtmlDocumentHeaderIndex can generate an HTML list of the index, with each entry linking to an anchor in the document.
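One way to turn such an index into a nested HTML list is sketched below in VB.NET, under stated assumptions: each entry is a hypothetical (level, text, anchor id) tuple, deeper headers are nested in an inner <ol> inside their parent's <li>, and the names TocListSketch and BuildOrderedList are illustrative only. The markup the library actually produces may differ.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text

' Hypothetical sketch: builds nested <ol> markup from (level, text, anchorId) entries.
Public Module TocListSketch
    Public Function BuildOrderedList(items As List(Of Tuple(Of Integer, String, String))) As String
        Dim sb As New StringBuilder()
        Dim openLevels As New Stack(Of Integer)()   ' header levels of the lists currently open
        For Each item As Tuple(Of Integer, String, String) In items
            Dim level As Integer = item.Item1
            ' Close any lists that belong to deeper headers than this one.
            While openLevels.Count > 0 AndAlso level < openLevels.Peek()
                sb.Append("</li></ol>")
                openLevels.Pop()
            End While
            If openLevels.Count = 0 OrElse level > openLevels.Peek() Then
                ' First or deeper header: open a nested list inside the current item.
                sb.Append("<ol>")
                openLevels.Push(level)
            Else
                ' Same level as the previous header: close that sibling item.
                sb.Append("</li>")
            End If
            ' Each entry links to its anchor; text and id are HTML-encoded.
            sb.Append("<li><a href=""#").Append(WebUtility.HtmlEncode(item.Item3)).Append(""">")
            sb.Append(WebUtility.HtmlEncode(item.Item2)).Append("</a>")
        Next
        ' Close every list (and its last item) that is still open.
        While openLevels.Count > 0
            sb.Append("</li></ol>")
            openLevels.Pop()
        End While
        Return sb.ToString()
    End Function
End Module
```

A stack of open levels lets a single header close several nested lists at once, for example when an H4 is followed directly by an H2.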

Default Operation

By default, HtmlDocumentHeaderIndex only looks for H2 to H6 tags. H1 tags are deliberately excluded because an HTML page should contain only one H1 tag, and that should be used for the document title. You can change which tags are scanned for via the HeaderTags property.
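The effect of a configurable tag list can be illustrated as follows. This sketch assumes the tag list is an array of lower-case tag names; the real HeaderTags property may use a different type or format, and HeaderTagFilterSketch, DefaultHeaderTags, BuildHeaderPattern and CountHeaders are hypothetical names. The default list mirrors the documented behaviour of scanning H2 to H6 with H1 excluded.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

' Hypothetical sketch of a configurable header tag list; not the library's HeaderTags API.
Public Module HeaderTagFilterSketch
    ' Mirrors the documented default: H2 to H6, with H1 deliberately excluded.
    Public ReadOnly DefaultHeaderTags As String() = {"h2", "h3", "h4", "h5", "h6"}

    ' Builds a pattern that matches only the listed tags, e.g. <(h2|h3)\b[^>]*>(.*?)</\1\s*>
    Public Function BuildHeaderPattern(headerTags As String()) As Regex
        Dim alternatives As String = String.Join("|", headerTags.Select(Function(t) Regex.Escape(t.ToLowerInvariant())).ToArray())
        Return New Regex("<(" & alternatives & ")\b[^>]*>(.*?)</\1\s*>",
                         RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    End Function

    ' Counts the headers in html that use one of the listed tags.
    Public Function CountHeaders(html As String, headerTags As String()) As Integer
        Return BuildHeaderPattern(headerTags).Matches(html).Count
    End Function
End Module
```

With the default list, a document containing one H1, one H2 and one H3 yields two headers; adding "h1" to the list yields three.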

Code Examples

Example 1: Basic Usage

In this example, the variable myHtmlText contains the HTML document to scan. After GenerateIndex() has been called, the generated index is retrieved as an HTML list through the GetHtmlOrderedList() function.

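' Create the indexer and supply the HTML to scan
Dim headerIndex As New CalzadaMedia.Web.UI.HtmlDocumentHeaderIndex
headerIndex.DocumentText = myHtmlText
' Build the header index, then retrieve it as an HTML ordered list
headerIndex.GenerateIndex()
Dim tableOfContents = headerIndex.GetHtmlOrderedList()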

Example 2: With Table of Contents Numbering and Anchor Tags

In this example, HtmlDocumentHeaderIndex inserts table of contents (ToC) numbering as well as anchor tags.

The HTML list returned by GetHtmlOrderedList differs from the Basic Usage example above in two ways. First, each header's text is prefixed with a sequential number based on the header's position within the document structure. Second, the header text itself is a hyperlink pointing to an anchor placed next to the actual header tag.
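The hierarchical numbering described above can be sketched with one counter per header level: a header increments its own level's counter and resets every deeper one, and its number joins the counters from the shallowest level present down to its own. This is an assumed scheme (producing numbers such as 1, 1.1, 1.2, 2), not necessarily the one the library uses, and TocNumberingSketch and NumberHeaders are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sketch of hierarchical ToC numbering; not the library's code.
Public Module TocNumberingSketch
    ' levels: header levels in document order, e.g. {2, 3, 3, 2}.
    ' Returns one number per header, e.g. {"1", "1.1", "1.2", "2"}.
    Public Function NumberHeaders(levels As IList(Of Integer)) As List(Of String)
        Dim result As New List(Of String)()
        Dim counters(6) As Integer   ' running count per header level (index = level 1 to 6)
        ' The shallowest level present becomes the first number component.
        Dim topLevel As Integer = 7
        For Each lvl As Integer In levels
            topLevel = Math.Min(topLevel, lvl)
        Next
        For Each lvl As Integer In levels
            counters(lvl) += 1
            ' A new header restarts the numbering of all deeper levels.
            For deeper As Integer = lvl + 1 To 6
                counters(deeper) = 0
            Next
            Dim parts As New List(Of String)()
            For n As Integer = topLevel To lvl
                parts.Add(counters(n).ToString())
            Next
            result.Add(String.Join(".", parts))
        Next
        Return result
    End Function
End Module
```

If a level is skipped, for example an H4 directly under an H2, this scheme produces a zero component such as 1.0.1.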

After GenerateIndex() is called, the DocumentText property contains the contents of myHtmlText modified to include the relevant anchor tags.
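Inserting the anchors can be sketched as a regular-expression replacement that places an empty anchor element immediately before each H2 to H6 opening tag. The anchor id format (toc-1, toc-2, and so on) and the names AnchorInsertionSketch and InsertAnchors are assumptions for illustration. The sketch only adds markup; existing tags, including any anchors already present, are left in place.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical sketch: the anchor id format ("toc-1", "toc-2", ...) is an assumption.
Public Module AnchorInsertionSketch
    ' Matches an H2 to H6 opening tag (closing tags start with "</" and are not matched).
    Private ReadOnly OpeningHeaderTag As New Regex("<h[2-6]\b[^>]*>", RegexOptions.IgnoreCase)

    ' Returns a copy of html with an empty anchor placed just before each H2 to H6 opening tag.
    Public Function InsertAnchors(html As String) As String
        Dim counter As Integer = 0
        Return OpeningHeaderTag.Replace(html,
            Function(m As Match) As String
                counter += 1
                Return "<a id=""toc-" & counter.ToString() & """></a>" & m.Value
            End Function)
    End Function
End Module
```

The same ids would then be used as the link targets in the generated list.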

A live example of the following code can be seen in the Contents section at the top of this article.

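' Create the indexer and supply the HTML to scan
Dim headerIndex As New CalzadaMedia.Web.UI.HtmlDocumentHeaderIndex
headerIndex.DocumentText = myHtmlText
' Enable numbered ToC entries and anchor tags next to each header
headerIndex.InsertTocNumbering = True
headerIndex.InsertAnchorTags = True
headerIndex.GenerateIndex()
' DocumentText now holds the original HTML plus the inserted anchor tags
Dim htmlText As String = headerIndex.DocumentText
Dim tableOfContents = headerIndex.GetHtmlOrderedList()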

Additional Notes & Considerations

The DocumentText Property

HtmlDocumentHeaderIndex does not remove any tags or content from the HTML supplied to DocumentText. If an anchor has already been defined for a header, it is neither removed nor replaced.

Where to Use the HtmlDocumentHeaderIndex

HtmlDocumentHeaderIndex is likely to be used alongside other text parsers, such as HtmlDocumentLinkInjector. To keep the generated index as accurate as possible, it is recommended that HtmlDocumentHeaderIndex be the last parser applied, so that the index reflects the document in its final form.

Copyright © 2011 - 2024 Calzada Media Limited. All Rights Reserved