In this article, we will learn how to create a regular expression pattern that matches open tags in HTML while excluding self-contained (self-closing) XHTML tags.
A regular expression (RegEx) can match open tags while skipping XHTML self-contained tags such as <br/> and <img />. The idea is to match an opening angle bracket followed by a tag name, but to exclude tags that close themselves with "/>". In XHTML these self-contained tags don't need a separate closing tag. The exact pattern can be tailored to your requirements and to the structure of your HTML.
Here are some common approaches to achieve this:
- Using Negative Lookahead
- Using a Whitelist of HTML Tags
- Using DOMParser
Approach 1: Using Negative Lookahead
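A negative lookahead, written (?!...), asserts that a given pattern does not match immediately after the current position in the string. It consumes no characters, so it only decides whether the overall match may continue.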
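To see the lookahead working on its own, here is a minimal sketch that drops the final > from the pattern and tests a few single tags. The variable name notSelfClosing is only illustrative.

```javascript
// "<" plus a tag name, then (?!...) fails the match if "/>" appears
// before the next ">". The lookahead itself consumes no characters.
const notSelfClosing = /<[a-zA-Z]+(?![^>]*\/>)/;

console.log(notSelfClosing.test('<p>'));           // true
console.log(notSelfClosing.test('<p class="x">')); // true
console.log(notSelfClosing.test('<br/>'));         // false
console.log(notSelfClosing.test('<img />'));       // false
```

Because [^>]* always scans up to the next >, a shorter tag name reached through backtracking (such as "b" from "br") still sees the "/>" and is rejected as well.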
Syntax:
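Regular Expression Pattern: <([a-zA-Z]+)(?![^>]*\/>)>
Here <([a-zA-Z]+) captures the tag name, (?![^>]*\/>) rejects the tag if "/>" appears before the next ">", and the final > closes the tag. Because > must follow the tag name directly, this pattern only matches tags without attributes, such as <div> but not <div class="box">. For the same reason, the final > already rules out <br/>. The lookahead only starts doing real work once attributes are allowed between the tag name and >.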
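Example: The following code applies the pattern to a sample string. Because the regex has no g flag, match() returns only the first open tag along with its captured tag name.
JavaScript
// No g flag: match() returns only the first open tag and its capture group
const regex = /<([a-zA-Z]+)(?![^>]*\/>)>/;
const inputString = '<div><br/><p>Hello</p><span>World</span></div>';
const matches = inputString.match(regex);
console.log(matches);
Output:
[ '<div>', 'div', index: 0, input: '<div><br/><p>Hello</p><span>World</span></div>', groups: undefined ]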
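The pattern above stops at the first match and skips tags that have attributes. Below is a minimal sketch of a looser variant, assuming you want every open tag, including ones with attributes. It allows digits in tag names (for h1 and similar), adds [^>]* before the closing >, and uses the g flag with String.prototype.matchAll(). The name findOpenTags is a hypothetical helper, not part of the original example.

```javascript
// Assumed variant of the article's pattern:
//   <([a-zA-Z][a-zA-Z0-9]*)  "<" plus a tag name, captured; closing tags like </p> fail here
//   (?![^>]*\/>)            reject the tag if "/>" appears before the next ">"
//   [^>]*>                  consume any attributes up to the closing ">"
const openTag = /<([a-zA-Z][a-zA-Z0-9]*)(?![^>]*\/>)[^>]*>/g;

// Hypothetical helper: returns the name of every open tag in the string.
// matchAll() requires the g flag; m[1] is the captured tag name.
function findOpenTags(html) {
  return Array.from(html.matchAll(openTag), (m) => m[1]);
}

const input = '<div class="box"><br/><p>Hello</p><img src="a.png" /><h1>Hi</h1></div>';
console.log(findOpenTags(input)); // [ 'div', 'p', 'h1' ]
```

Like any regex applied to HTML, this can still be fooled by edge cases, for example a > inside a quoted attribute value.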
Approach 2: Using a Whitelist of HTML Tags
Another approach is to whitelist the HTML tags you treat as valid open tags and match only those names. Self-contained tags such as br and img are excluded simply by leaving them off the list.
Syntax:
Regular Expression Pattern: <(div|p|span|...)>
Example: Here the whitelist contains only div, p and span, so <br/> can never match. As before, match() without the g flag returns only the first match.
JavaScript
// Only tag names listed in the alternation can match; br is not listed
const regex = /<(div|p|span)>/;
const inputString = '<div><br/><p>Hello</p><span>World</span></div>';
const matches = inputString.match(regex);
console.log(matches);
Output:
[ '<div>', 'div', index: 0, input: '<div><br/><p>Hello</p><span>World</span></div>', groups: undefined ]
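Hard-coding the alternation gets unwieldy as the list grows. The sketch below builds the regex from an array instead and also allows attributes. The helper name whitelistRegex is hypothetical, and it assumes plain alphanumeric tag names, so no regex escaping is done.

```javascript
// Hypothetical helper: builds /<(div|p|span)\b[^>]*>/gi from a list of tag names.
function whitelistRegex(tags) {
  // \b stops "p" from also matching the start of <pre>;
  // [^>]* lets the tag carry attributes; g returns every match, i ignores case.
  return new RegExp(`<(${tags.join('|')})\\b[^>]*>`, 'gi');
}

const allowed = whitelistRegex(['div', 'p', 'span']);
const html = '<div id="main"><br/><p>Hello</p><pre>x</pre><span>World</span></div>';

// <br/> never matches because br is not in the list.
const names = Array.from(html.matchAll(allowed), (m) => m[1]);
console.log(names); // [ 'div', 'p', 'span' ]
```

A whitelist is explicit and easy to audit, but it has to be maintained by hand. Any tag missing from the list, such as h1, is silently ignored.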
Approach 3: Using DOMParser
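DOMParser is a built-in browser API that parses an HTML or XML string into a Document. The result is the same structured document object model (DOM) representation a browser builds for a page, which makes the document's contents easy to navigate and manipulate. Because a real parser decides where each element begins and ends, this approach avoids the guesswork a regular expression has to do. DOMParser is a Web API: it is available in browsers, but Node.js does not provide it as a built-in.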
Syntax:
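const doc = new DOMParser().parseFromString(string, mimeType);
Here mimeType is 'text/html' for HTML input (XML input uses a type such as 'application/xml').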
Example: This code parses an HTML string in the browser and collects the elements it contains.
JavaScript
// Example HTML input
const data = '<div class="container"><p>Hello, <span>world!</span></p></div>';
// Create a DOM parser (a browser API; Node.js has no built-in DOMParser)
const parser = new DOMParser();
// Parse the HTML string into a Document
const doc = parser.parseFromString(data, 'text/html');
// Get all elements, including the html, head and body elements the parser adds
const elements = doc.getElementsByTagName('*');
// Filter open tags. Note: outerHTML is re-serialized, so <br/> comes back as <br>
// and in practice this test does not remove void elements.
const matches = Array.from(elements).filter((element) =>
  /<([A-Za-z][A-Za-z0-9]*)\b(?![^>]*\/>)/.test(element.outerHTML)
);
// Output the matched open tags
console.log(matches);
Output (browser console; the array holds 6 elements):
0: html
1: head
2: body
3: div.container
4: p
5: span
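The regex test on outerHTML cannot see the "/>" of a self-contained tag, because the parser re-serializes <br/> as <br>. One alternative is to filter by element name against the HTML void elements, which are the elements that never take a closing tag. This is a sketch under that assumption. The names VOID_ELEMENTS and openTagElements are hypothetical. The function only reads each element's tagName, so it can be exercised with plain objects outside the browser.

```javascript
// Void elements per the HTML Living Standard: they never take a closing tag,
// whether written as <br> or XHTML-style <br/>.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper: keep only elements that need a closing tag.
// Accepts any iterable or array-like of objects with a tagName property,
// e.g. doc.getElementsByTagName('*') from the DOMParser example.
function openTagElements(elements) {
  return Array.from(elements).filter(
    (el) => !VOID_ELEMENTS.has(el.tagName.toLowerCase())
  );
}

// Plain objects stand in for DOM elements so this also runs outside a browser.
const sample = [{ tagName: 'DIV' }, { tagName: 'BR' }, { tagName: 'P' }];
console.log(openTagElements(sample).map((el) => el.tagName)); // [ 'DIV', 'P' ]
```

In a browser, pass it the elements from the DOMParser example, for instance openTagElements(doc.getElementsByTagName('*')). Lowercasing tagName covers both HTML documents (uppercase names) and XML documents (names as written).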