|
| 1 | +--- |
| 2 | +id: tag-validator |
| 3 | +title: Tag Validator |
| 4 | +sidebar_label: 0591 - Tag Validator |
| 5 | +tags: |
| 6 | + - Iterator |
| 7 | + - Stack |
| 8 | + - String |
| 9 | +description: "This is a solution to the Tag Validator problem on LeetCode." |
| 10 | +--- |
| 11 | + |
| 12 | +## Problem Description |
| 13 | + |
| 14 | +Given a string representing a code snippet, implement a tag validator to parse the code and return whether it is valid. |
| 15 | + |
| 16 | +A code snippet is valid if all the following rules hold: |
| 17 | + |
| 18 | +1. The code must be wrapped in a **valid closed tag**. Otherwise, the code is invalid. |
| 19 | +2. A **closed tag** (not necessarily valid) has exactly the following format: `<TAG_NAME>TAG_CONTENT</TAG_NAME>`. Among them, `<TAG_NAME>` is the start tag, and `</TAG_NAME>` is the end tag. The TAG_NAME in the start and end tags should be the same. A closed tag is **valid** if and only if the TAG_NAME and TAG_CONTENT are valid. |
| 20 | +3. A **valid** `TAG_NAME` contains only **upper-case letters** and has a length in the range [1,9]. Otherwise, the `TAG_NAME` is **invalid**. |
| 21 | +4. A **valid** `TAG_CONTENT` may contain other **valid closed tags, cdata** and any characters (see note1) **EXCEPT** unmatched `<`, unmatched start and end tag, and unmatched or closed tags with invalid TAG_NAME. Otherwise, the `TAG_CONTENT` is **invalid**. |
| 22 | +5. A start tag is unmatched if no end tag exists with the same TAG_NAME, and vice versa. However, you also need to consider the issue of imbalance when tags are nested. |
| 23 | +6. A `<` is unmatched if you cannot find a subsequent `>`. And when you find a `<` or `</`, all the subsequent characters until the next `>` should be parsed as TAG_NAME (not necessarily valid). |
| 24 | +7. The cdata has the following format: `<![CDATA[CDATA_CONTENT]]>`. The range of `CDATA_CONTENT` is defined as the characters between `<![CDATA[` and the first subsequent `]]>`. |
| 25 | +8. `CDATA_CONTENT` may contain **any characters**. The function of cdata is to forbid the validator to parse `CDATA_CONTENT`, so even if it has some characters that can be parsed as a tag (no matter valid or invalid), you should treat them as **regular characters**. |
| 26 | + |
| 27 | +### Examples |
| 28 | +**Example 1:** |
| 29 | + |
| 30 | +``` |
| 31 | +Input: code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>" |
| 32 | +Output: true |
| 33 | +Explanation: |
| 34 | +The code is wrapped in a closed tag : <DIV> and </DIV>. |
| 35 | +The TAG_NAME is valid, the TAG_CONTENT consists of some characters and cdata. |
| 36 | +Although CDATA_CONTENT has an unmatched start tag with invalid TAG_NAME, it should be considered as plain text, not parsed as a tag. |
| 37 | +So TAG_CONTENT is valid, and then the code is valid. Thus return true. |
| 38 | +``` |
| 39 | + |
| 40 | +**Example 2:** |
| 41 | + |
| 42 | +``` |
| 43 | +Input: code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>" |
| 44 | +Output: true |
| 45 | +Explanation: |
| 46 | +We first separate the code into : start_tag|tag_content|end_tag. |
| 47 | +start_tag -> "<DIV>" |
| 48 | +end_tag -> "</DIV>" |
| 49 | +tag_content could also be separated into : text1|cdata|text2. |
| 50 | +text1 -> ">> ![cdata[]] " |
| 51 | +cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>" |
| 52 | +text2 -> "]]>>]" |
| 53 | +The reason why start_tag is NOT "<DIV>>>" is because of the rule 6. |
| 54 | +The reason why cdata is NOT "<![CDATA[<div>]>]]>]]>" is because of the rule 7. |
| 55 | +``` |
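
The split described in Example 2 can be reproduced with plain string searches. The snippet below is only an illustrative sketch: the helper name `split_example` is hypothetical, and it assumes the input has exactly the shape of Example 2 (one wrapping tag, one cdata section, no nested tags).

```python
def split_example(code):
    """Split a snippet shaped like Example 2 into its five parts.

    Hypothetical helper: assumes one wrapping tag and exactly one cdata section.
    """
    # Rule 6: the start tag ends at the FIRST '>' after the opening '<'.
    start_close = code.find(">")
    start_tag = code[: start_close + 1]

    # The end tag is the last "</...>" in the snippet.
    end_open = code.rfind("</")
    end_tag = code[end_open:]

    content = code[start_close + 1 : end_open]

    # Rule 7: cdata runs from "<![CDATA[" to the FIRST subsequent "]]>".
    cdata_open = content.find("<![CDATA[")
    cdata_close = content.find("]]>", cdata_open) + len("]]>")

    text1 = content[:cdata_open]
    cdata = content[cdata_open:cdata_close]
    text2 = content[cdata_close:]
    return start_tag, text1, cdata, text2, end_tag


EXAMPLE_2 = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
for part in split_example(EXAMPLE_2):
    print(repr(part))
```

Because both searches stop at the first match, the extra `>>` after `<DIV>` and the second `]]>` end up in the text parts rather than in the tag or the cdata.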
| 56 | + |
| 57 | +### Constraints |
| 58 | + |
| 59 | +- `1 <= code.length <= 500` |
| 60 | +- `code` consists of English letters, digits, `'<'`, `'>'`, `'/'`, `'!'`, `'['`, `']'`, `'.'`, and `' '`. |
| 61 | + |
| 62 | +## Solution for Tag Validator |
| 63 | + |
| 64 | +### Approach: Stack |
| 65 | +Summarizing the given problem, we can say that we need to determine whether a tag is valid or not, by checking the following properties. |
| 66 | + |
| 67 | +1. The code should be wrapped in a valid closed tag. |
| 68 | + |
| 69 | +2. The `TAG_NAME` should be valid. |
| 70 | + |
| 71 | +3. The `TAG_CONTENT` should be valid. |
| 72 | + |
| 73 | +4. The cdata should be valid. |
| 74 | + |
| 75 | +5. All the tags should be closed, i.e., each start-tag should have a corresponding end-tag and vice versa, and the tags should be correctly ordered as well. |
| 76 | + |
| 77 | +In order to check the validity of all these, firstly, we need to identify which parts of the given code string act as which part from the above-mentioned categories. To understand how it's done, we'll go through the implementation and the reasoning behind it step by step. |
| 78 | + |
| 79 | +We iterate over the given code string. Whenever a `<` is encountered (unless we are currently inside `<![CDATA[...]]>`), it indicates the beginning of either a tag (start tag or end tag) or cdata, as per the conditions given in the problem statement. |
| 80 | + |
| 81 | +If the character immediately following this `<` is an `!`, the characters following this `<` can't be part of a valid `TAG_NAME`, since a tag may contain only upper-case letters (in a start tag) or a `/` followed by upper-case letters (in an end tag). The only remaining possibility is **cdata**, so we check whether the characters starting at `<!` constitute a valid **cdata** section. To do this, we first find the first `]]>` following the current `<!`, which marks the end of the **cdata**. If no such `]]>` exists, the code string is invalid. In addition, the `<!` must be immediately followed by `[CDATA[` for the **cdata** to be valid. The characters between `<![CDATA[` and `]]>` have no constraints on them. |
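
As a rough illustration of the cdata check described above, the following sketch skips over one cdata section. The function name `skip_cdata` and its "return -1 on failure" convention are assumptions made for this example; they are not part of the solutions further down.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def skip_cdata(code, i):
    """Return the index just past the cdata section starting at code[i].

    Returns -1 if code[i:] does not start with "<![CDATA[" or if no
    closing "]]>" follows. (Hypothetical helper for illustration.)
    """
    # The "<!" must be followed immediately by "[CDATA[" (case-sensitive).
    if not code.startswith(CDATA_OPEN, i):
        return -1
    # The section ends at the FIRST "]]>" after the opener.
    close = code.find(CDATA_CLOSE, i + len(CDATA_OPEN))
    if close < 0:
        return -1
    return close + len(CDATA_CLOSE)


snippet = "<A><![CDATA[<div>]>]]>]]></A>"
end = skip_cdata(snippet, 3)
print(snippet[3:end])
```

Everything between the opener and the first `]]>` is skipped without being parsed, which is why the `<div>` inside the cdata section never reaches the tag checks.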
| 82 | + |
| 83 | +If the character immediately following the `<` isn't an `!`, this `<` can only mark the beginning of a tag. Since a valid start tag can't contain anything except upper-case letters, a `/` found right after the `<` means that the `</` pair begins an end tag. In either case (start tag or end tag), we find the first `>` following the `<` to obtain the substring (say s) that constitutes the `TAG_NAME`. This s must satisfy all the criteria for a valid `TAG_NAME`: we check that it contains only upper-case letters and that its length is between 1 and 9. If either criterion isn't fulfilled, s isn't a valid `TAG_NAME`, and hence the code string is invalid as well. |
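
The tag-name extraction described above might look like the sketch below. The helper `read_tag` and its return shape are hypothetical and chosen only for illustration; the stack handling is deliberately left out.

```python
def read_tag(code, i):
    """Parse the tag that starts at code[i] == '<'.

    Returns (name, is_end_tag, index_of_closing_gt), or None when there is
    no closing '>' or the name is not 1 to 9 upper-case letters.
    (Hypothetical helper for illustration.)
    """
    is_end = code.startswith("</", i)
    name_start = i + (2 if is_end else 1)

    # Everything up to the first '>' is treated as the TAG_NAME (rule 6).
    close = code.find(">", name_start)
    if close < 0:
        return None

    name = code[name_start:close]
    if not 1 <= len(name) <= 9:
        return None
    if not all("A" <= c <= "Z" for c in name):
        return None
    return name, is_end, close


print(read_tag("<DIV>x</DIV>", 0))
print(read_tag("<DIV>x</DIV>", 6))
print(read_tag("<div>", 0))
```

Comparing against the range `"A"` to `"Z"` keeps the check limited to the ASCII upper-case letters that the constraints allow.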
| 84 | + |
| 85 | +Apart from checking the validity of the `TAG_NAME`, we also need to ensure that tags always exist in pairs, i.e., for every start-tag, a corresponding end-tag must exist. Further, when tags are nested, a tag whose start-tag comes later must have its end-tag appear before the end-tags of the tags opened earlier, i.e., the tag that starts later should end first. |
| 86 | + |
| 87 | +From this, we get the intuition that we can make use of a stack to check the existence of matching start and end-tags. Thus, whenever we find out a valid start-tag, as mentioned above, we push its `TAG_NAME` string onto a stack. Now, whenever an end-tag is found, we compare its `TAG_NAME` with the `TAG_NAME` at the top of the stack and remove this element from the stack. If the two don't match, this implies that either the current end-tag has no corresponding start-tag or there is a problem with the ordering of the tags. The two need to match for the tag-pair to be valid since there can't exist an end-tag without a corresponding start-tag and vice-versa. Thus, if a match isn't found, we can conclude that the given code string is invalid. |
| 88 | + |
| 89 | +Now, after the complete code string has been traversed, the stack should be empty if all the start-tags have their corresponding end-tags as well. If the stack isn't empty, this implies that some start-tag doesn't have the corresponding end-tag, violating the closed-tag's validity condition. |
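
The stack-based matching can be isolated into a small sketch. It assumes the tags have already been extracted into `(name, is_end_tag)` pairs, which is a simplification for illustration; the function name `tags_balanced` is hypothetical, and the "everything must be wrapped" rule is not checked here.

```python
def tags_balanced(tags):
    """Check that start and end tags pair up in last-opened, first-closed order.

    `tags` is an iterable of (name, is_end_tag) pairs in document order.
    (Hypothetical helper; it only covers the matching step.)
    """
    stack = []
    for name, is_end in tags:
        if not is_end:
            stack.append(name)          # start tag: remember it
        elif not stack or stack.pop() != name:
            return False                # end tag without a matching start tag
    return not stack                    # leftover start tags are unmatched


print(tags_balanced([("A", False), ("B", False), ("B", True), ("A", True)]))
print(tags_balanced([("A", False), ("B", False), ("A", True), ("B", True)]))
```

The first call models `<A><B></B></A>`, and the second models the crossed order `<A><B></A></B>`, where the end tag `A` does not match `B` at the top of the stack.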
| 90 | + |
| 91 | +Further, we also need to ensure that the given code is completely enclosed within closed tags. For this, we need to ensure that the first **cdata** found is also inside the closed tags. Thus, when we find a possibility of the presence of **cdata**, we proceed further only if we've already found a start tag, indicated by a non-empty stack. Further, to ensure that no data lies after the last end-tag, we need to ensure that the stack doesn't become empty before we reach the end of the given code string since an empty stack indicates that the last end-tag has been encountered. |
| 92 | +## Code in Different Languages |
| 93 | + |
| 94 | +<Tabs> |
| 95 | +<TabItem value="cpp" label="C++"> |
| 96 | + <SolutionAuthor name="@Shreyash3087"/> |
| 97 | + |
| 98 | +```cpp |
| 99 | +#include <stack> |
| 100 | +#include <string> |
| 101 | + |
| 102 | +class Solution { |
| 103 | +private: |
| 104 | +    std::stack<std::string> stack;  // names of the currently open tags |
| 105 | +    bool contains_tag = false;      // true once any start tag has been seen |
| 106 | + |
| 107 | +    bool isValidTagName(std::string s, bool ending) { |
| 108 | +        if (s.length() < 1 || s.length() > 9) |
| 109 | +            return false; |
| 110 | +        for (char c : s) { |
| 111 | +            if (c < 'A' || c > 'Z')  // only upper-case letters are allowed |
| 112 | +                return false; |
| 113 | +        } |
| 114 | +        if (ending) { |
| 115 | +            if (!stack.empty() && stack.top() == s) |
| 116 | +                stack.pop(); |
| 117 | +            else |
| 118 | +                return false; |
| 119 | +        } else { |
| 120 | +            contains_tag = true; |
| 121 | +            stack.push(s); |
| 122 | +        } |
| 123 | +        return true; |
| 124 | +    } |
| 125 | + |
| 126 | +    bool isValidCdata(std::string s) { |
| 127 | +        return s.find("[CDATA[") == 0; |
| 128 | +    } |
| 129 | + |
| 130 | +public: |
| 131 | +    bool isValid(std::string code) { |
| 132 | +        if (code[0] != '<' || code[code.length() - 1] != '>') |
| 133 | +            return false; |
| 134 | +        for (int i = 0; i < (int)code.length(); i++) { |
| 135 | +            bool ending = false; |
| 136 | +            int closeindex; |
| 137 | +            if (stack.empty() && contains_tag)  // outermost tag closed before the end |
| 138 | +                return false; |
| 139 | +            if (code[i] == '<') { |
| 140 | +                if (!stack.empty() && code[i + 1] == '!') { |
| 141 | +                    closeindex = code.find("]]>", i + 1);  // a failed search is negative as an int on mainstream compilers |
| 142 | +                    if (closeindex < 0 || !isValidCdata(code.substr(i + 2, closeindex - i - 2))) |
| 143 | +                        return false; |
| 144 | +                } else { |
| 145 | +                    if (code[i + 1] == '/') { |
| 146 | +                        i++; |
| 147 | +                        ending = true; |
| 148 | +                    } |
| 149 | +                    closeindex = code.find('>', i + 1); |
| 150 | +                    if (closeindex < 0 || !isValidTagName(code.substr(i + 1, closeindex - i - 1), ending)) |
| 151 | +                        return false; |
| 152 | +                } |
| 153 | +                i = closeindex;  // continue right after the tag or cdata section |
| 154 | +            } |
| 155 | +        } |
| 156 | +        return stack.empty() && contains_tag; |
| 157 | +    } |
| 158 | +}; |
| 159 | + |
| 160 | + |
| 161 | +``` |
| 162 | +</TabItem> |
| 163 | +<TabItem value="java" label="Java"> |
| 164 | + <SolutionAuthor name="@Shreyash3087"/> |
| 165 | + |
| 166 | +```java |
| 167 | +public class Solution { |
| 168 | +    java.util.Stack<String> stack = new java.util.Stack<>(); // names of the currently open tags |
| 169 | +    boolean contains_tag = false; // true once any start tag has been seen |
| 170 | +    public boolean isValidTagName(String s, boolean ending) { |
| 171 | +        if (s.length() < 1 || s.length() > 9) |
| 172 | +            return false; |
| 173 | +        for (int i = 0; i < s.length(); i++) { |
| 174 | +            if (!Character.isUpperCase(s.charAt(i))) |
| 175 | +                return false; |
| 176 | +        } |
| 177 | +        if (ending) { |
| 178 | +            if (!stack.isEmpty() && stack.peek().equals(s)) |
| 179 | +                stack.pop(); |
| 180 | +            else |
| 181 | +                return false; |
| 182 | +        } else { |
| 183 | +            contains_tag = true; |
| 184 | +            stack.push(s); |
| 185 | +        } |
| 186 | +        return true; |
| 187 | +    } |
| 188 | +    public boolean isValidCdata(String s) { |
| 189 | +        return s.indexOf("[CDATA[") == 0; |
| 190 | +    } |
| 191 | +    public boolean isValid(String code) { |
| 192 | +        if (code.charAt(0) != '<' || code.charAt(code.length() - 1) != '>') |
| 193 | +            return false; |
| 194 | +        for (int i = 0; i < code.length(); i++) { |
| 195 | +            boolean ending = false; |
| 196 | +            int closeindex; |
| 197 | +            if (stack.isEmpty() && contains_tag) // outermost tag closed before the end |
| 198 | +                return false; |
| 199 | +            if (code.charAt(i) == '<') { |
| 200 | +                if (!stack.isEmpty() && code.charAt(i + 1) == '!') { |
| 201 | +                    closeindex = code.indexOf("]]>", i + 1); |
| 202 | +                    if (closeindex < 0 || !isValidCdata(code.substring(i + 2, closeindex))) |
| 203 | +                        return false; |
| 204 | +                } else { |
| 205 | +                    if (code.charAt(i + 1) == '/') { |
| 206 | +                        i++; |
| 207 | +                        ending = true; |
| 208 | +                    } |
| 209 | +                    closeindex = code.indexOf('>', i + 1); |
| 210 | +                    if (closeindex < 0 || !isValidTagName(code.substring(i + 1, closeindex), ending)) |
| 211 | +                        return false; |
| 212 | +                } |
| 213 | +                i = closeindex; // continue right after the tag or cdata section |
| 214 | +            } |
| 215 | +        } |
| 216 | +        return stack.isEmpty() && contains_tag; |
| 217 | +    } |
| 218 | +} |
| 219 | +``` |
| 220 | + |
| 221 | +</TabItem> |
| 222 | +<TabItem value="python" label="Python"> |
| 223 | + <SolutionAuthor name="@Shreyash3087"/> |
| 224 | + |
| 225 | +```python |
| 226 | +class Solution: |
| 227 | +    def __init__(self): |
| 228 | +        self.stack = [] |
| 229 | +        self.contains_tag = False |
| 230 | + |
| 231 | +    def isValidTagName(self, s, ending): |
| 232 | +        if len(s) < 1 or len(s) > 9: |
| 233 | +            return False |
| 234 | +        if not all(c.isupper() for c in s): |
| 235 | +            return False |
| 236 | +        if ending: |
| 237 | +            if self.stack and self.stack[-1] == s: |
| 238 | +                self.stack.pop() |
| 239 | +            else: |
| 240 | +                return False |
| 241 | +        else: |
| 242 | +            self.contains_tag = True |
| 243 | +            self.stack.append(s) |
| 244 | +        return True |
| 245 | + |
| 246 | +    def isValidCdata(self, s): |
| 247 | +        return s.startswith("[CDATA[") |
| 248 | + |
| 249 | +    def isValid(self, code): |
| 250 | +        if code[0] != '<' or code[-1] != '>': |
| 251 | +            return False |
| 252 | +        i = 0 |
| 253 | +        while i < len(code): |
| 254 | +            ending = False |
| 255 | +            if not self.stack and self.contains_tag: |
| 256 | +                return False |
| 257 | +            if code[i] == '<': |
| 258 | +                if self.stack and code[i + 1] == '!': |
| 259 | +                    closeindex = code.find("]]>", i + 1) |
| 260 | +                    if closeindex < 0 or not self.isValidCdata(code[i + 2:closeindex]): |
| 261 | +                        return False |
| 262 | +                else: |
| 263 | +                    if code[i + 1] == '/': |
| 264 | +                        i += 1 |
| 265 | +                        ending = True |
| 266 | +                    closeindex = code.find('>', i + 1) |
| 267 | +                    if closeindex < 0 or not self.isValidTagName(code[i + 1:closeindex], ending): |
| 268 | +                        return False |
| 269 | +                i = closeindex |
| 270 | +            i += 1 |
| 271 | +        return not self.stack and self.contains_tag |
| 272 | + |
| 273 | + |
| 274 | +``` |
| 275 | +</TabItem> |
| 276 | +</Tabs> |
| 277 | + |
| 278 | +## Complexity Analysis |
| 279 | + |
| 280 | +### Time Complexity: $O(N)$ |
| 281 | + |
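| 281 | +> **Reason**: We make a single pass over the given code string of length N. After each tag or cdata section, the index jumps to its closing `>`, so every character is examined a constant number of times. |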
| 283 | +
|
| 284 | +### Space Complexity: $O(N)$ |
| 285 | + |
| 286 | +> **Reason**: The stack can grow up to a size of N/3 in the worst case, e.g., for `<A><B><C><D>`, N = 12 and the number of tags is 12/3 = 4. |
| 287 | +
|
| 288 | +## References |
| 289 | + |
| 290 | +- **LeetCode Problem**: [Tag Validator](https://leetcode.com/problems/tag-validator/description/) |
| 291 | + |
| 292 | +- **Solution Link**: [Tag Validator](https://leetcode.com/problems/tag-validator/solutions/) |