The text does technically give the reason on the first page:
It is not a regular language and hence cannot be parsed by regular expressions.
Here, “regular language” is a technical term, and the statement is correct.
The text goes on to discuss Perl regexes, which I think are able to parse at least all languages in LL(*)
. I’m fairly sure that is sufficient to recognize XML, but am not quite certain about HTML5. The WHATWG standard doesn’t define HTML5 syntax with a grammar, but with a stateful parsing procedure which defies normal placement in the Chomsky hierarchy.
This, of course, is the real reason: even if such a regex is technically possible with some regex engines, creating it is extremely exhausting and each time you look into the spec to understand an edge case you suffer 1D6 SAN damage.
There is no downside to nested encryption, except of course the performance overhead. But this only really makes sense if each layer has an independent key and each layer uses an algorithm from a different family. Improper key reuse weakens the scheme.
For symmetric cryptography like AES the benefit is dubious. It is far more likely that the content is decrypted because the key was acquired independently than that AES would be broken.
However, there absolutely is a benefit for asymmetric crypto and key agreement schemes. This is how current Post-Quantum Cryptography schemes work, because:
Nesting one algorithm from each family gives us the best of both worlds, at a performance overhead: conventional asymmetric cryptography give us temporary security in the near future, and the second PQC layer gives us a chance at long-term security.