In my opinion, the reason people hate XML is because of what M signifies: it is a markup language and most of the time we don’t need a markup language. Markup languages are great for rich text documents. They are just not a good fit for representing data. The markup-nature of XML introduces unnecessary choice in whether to use an attribute or a child element to represent data; for HTML such ambiguity doesn’t actually exist but for data it does. Consider this piece of XML from the Python docs:
Why is the country name an attribute but not the rank? Why is all the information about neighbors in attributes rather than child elements?
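For reference, here is a minimal sketch of what that ambiguity looks like in code. The XML is abbreviated and reconstructed in the style of the country-data sample from the xml.etree.ElementTree tutorial, so treat the exact contents as an assumption rather than a verbatim copy.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the style of the ElementTree tutorial (assumed, not verbatim).
COUNTRY_XML = """
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
</data>
"""

root = ET.fromstring(COUNTRY_XML)
country = root.find("country")

# The name lives in an attribute, so it is read with .get() ...
name = country.get("name")
# ... while the rank lives in a child element, so it is read via .find().text
rank = int(country.find("rank").text)
# Neighbor data is attribute-only, carried on repeated child elements.
neighbors = [(n.get("name"), n.get("direction")) for n in country.findall("neighbor")]

print(name, rank, neighbors)
```

Three pieces of data on the same record need three different access patterns, which is the choice being questioned here.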
Furthermore parsing JSON or YAML gives you an AST that consists of the basic data types like lists and dictionaries. Parsing XML gives you an AST that requires a lot more effort to turn into data in your domain. Even on the web, very few people like to use the verbose XML DOM API like childNodes, nodeType, getElementsByTagName et al; it is basically unheard of for anyone to use it outside the web such as in Python, even though the DOM API has been in the Python standard library since forever (see https://github.com/python/cpython/blob/3.14/Lib/xml/dom/mini... for example).
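A small sketch of the difference, assuming two made-up documents that carry the same record. The JSON side comes back as a dict straight away; the DOM side has to be walked and type-converted by hand.

```python
import json
from xml.dom import minidom

# Two hypothetical documents holding the same record.
json_doc = '{"name": "Liechtenstein", "rank": 1}'
xml_doc = "<country><name>Liechtenstein</name><rank>1</rank></country>"

# JSON: the parse result is already a dict containing a str and an int.
from_json = json.loads(json_doc)

# XML via the DOM API: walk child nodes, filter by nodeType, convert types manually.
dom = minidom.parseString(xml_doc)
from_xml = {}
for node in dom.documentElement.childNodes:
    if node.nodeType == node.ELEMENT_NODE:
        from_xml[node.tagName] = node.firstChild.data
from_xml["rank"] = int(from_xml["rank"])  # XML text has no number type

print(from_json == from_xml)
```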
Attributes are intended to hold metadata, not data. It's not the fault of the format if someone chose to use it in a poor way.
You also need to distinguish the format itself from the various libraries that may be available to parse/process it in a given language. It doesn't always make sense, even when it's an option, to let the tail wag the dog and choose a language just because it has a nice library for something.
> Furthermore parsing JSON or YAML gives you the basic data types like lists and dictionaries
Well, maybe some library for some language does that, and if that is the language you are using, and that is all you need, then I suppose you are in luck. More generally you may want to use a format like XML or JSON to hold user-defined types, which rather levels the playing field since there are few good libraries for this in any language, and you may need to roll your own (been there, done that).
> Furthermore parsing JSON or YAML gives you the basic data types like lists and dictionaries. Parsing XML gives you an AST that requires a lot more effort to turn into data in your domain.
More precisely: in XML, elements (nodes) are named/labeled. ("node-labeled graph")
In JSON, keys (edges) are named. ("edge-labeled graph")
In programming, we need names for the fields in our structures (edges between objects), so JSON is a much better match than XML (which needs contortions to handle this use case -- e.g. by having nesting levels alternate between element=node and element=edge).
Only in some object-oriented cases (which derived class should the deserializer construct?) do you care about node labels -- but usually that's in addition to edge labels, so a "_type" key in JSON is still easier than XML.
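To make the "_type" idea concrete, here is a rough sketch using the `object_hook` parameter of `json.loads`. The `Circle`/`Square` classes and the `TYPES` registry are hypothetical names invented for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Circle:
    radius: float


@dataclass
class Square:
    side: float


# Hypothetical registry mapping "_type" labels (node labels) to classes.
TYPES = {"circle": Circle, "square": Square}


def decode(obj: dict):
    """object_hook: turn dicts that carry a "_type" key into class instances."""
    type_name = obj.pop("_type", None)
    if type_name is None:
        return obj  # plain dict: only edge labels (keys) matter
    return TYPES[type_name](**obj)


doc = '{"logo": {"_type": "circle", "radius": 2.0}, "frame": {"_type": "square", "side": 5.0}}'
shapes = json.loads(doc, object_hook=decode)
print(shapes)
```

The keys "logo" and "frame" are the edge labels; "_type" adds the node label only where the deserializer needs it.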
Well, "easier" may well be "one less dimension to encode data" in this case.
Sure, this gives quite a few variations on how to serialize some data. But it's not like json's simpler approach would make data serialization universal, there are many different ways to encode the same thing.
Cardinality is the easy way to resolve this. If the data has a cardinality of 1, it should be an attribute. If cardinality > 1, it should be a child element/node.
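One way to read that rule as code, as a rough sketch only: scalar values become attributes and list values become repeated child elements. The helper `to_element` and the sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET


def to_element(tag: str, record: dict) -> ET.Element:
    """Serialize a flat dict using the cardinality rule described above."""
    elem = ET.Element(tag)
    for key, value in record.items():
        if isinstance(value, list):
            # Cardinality > 1: one child element per item.
            for item in value:
                child = ET.SubElement(elem, key)
                child.text = str(item)
        else:
            # Cardinality of 1: store as an attribute.
            elem.set(key, str(value))
    return elem


country = to_element(
    "country",
    {"name": "Panama", "rank": 68, "neighbor": ["Costa Rica", "Colombia"]},
)
print(ET.tostring(country, encoding="unicode"))
```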
Interesting point of view. JSON is also not the right thing to use in many scenarios, but it is the de-facto standard now. Maybe something like protobuf is the way to go.
Because SAX parsing is a thing, and the visitor pattern makes it easy to elide searches in sub-trees if an attribute does not match.
So if name == "foobar" then read; else ignore. For a 500 GiB XML file that makes a difference.
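A minimal sketch of that pattern with Python's `xml.sax`, assuming a hypothetical document of `<item name="...">` elements. The handler only collects text inside the item whose attribute matches, and nothing is ever built in memory for the rest.

```python
import xml.sax


class SelectiveHandler(xml.sax.ContentHandler):
    """Collect text only inside <item> elements whose name attribute matches."""

    def __init__(self, wanted: str):
        super().__init__()
        self.wanted = wanted
        self.depth = 0             # current element nesting depth
        self.capture_depth = None  # depth of the matching <item> while inside it
        self.chunks = []

    def startElement(self, name, attrs):
        self.depth += 1
        if (self.capture_depth is None and name == "item"
                and attrs.get("name") == self.wanted):
            self.capture_depth = self.depth

    def endElement(self, name):
        if self.capture_depth == self.depth:
            self.capture_depth = None  # leaving the matching subtree
        self.depth -= 1

    def characters(self, content):
        if self.capture_depth is not None:
            self.chunks.append(content)


doc = (b'<items><item name="other"><v>skip me</v></item>'
       b'<item name="foobar"><v>keep me</v></item></items>')
handler = SelectiveHandler("foobar")
xml.sax.parseString(doc, handler)
print("".join(handler.chunks))
```

The parser still has to scan the bytes of the skipped subtrees, but the application does no work and allocates nothing for them.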
As for your other point about an "AST" (it's actually just a DOM.) That's the benefit? And you're in for a surprise when you learn that reaching into a deeply-nested JSON structure deserialised into whatever memory format is most appropriate for your pet language is also an abstract data type that you act on with getters/accessors/what-have-yous that is in all but name a DOM.
And we do have tools to deal with it: XSLT for transformation. For querying? XPath.
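As a small illustration of the querying side: Python's ElementTree supports only a limited subset of XPath, but that is enough for a sketch. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
doc = """
<data>
    <country name="Liechtenstein"><rank>1</rank><neighbor name="Austria"/></country>
    <country name="Singapore"><rank>4</rank><neighbor name="Malaysia"/></country>
</data>
"""
root = ET.fromstring(doc)

# Attribute predicate: neighbors of the country named Singapore.
names = [n.get("name")
         for n in root.findall(".//country[@name='Singapore']/neighbor")]

# Child-text predicate: countries whose <rank> child has the text "1".
top = [c.get("name") for c in root.findall("./country[rank='1']")]

print(names, top)
```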
I hate the X and L parts too. Just because you put a URL inside doesn't make the other side understand your structure. The features that try to make it extensible actually make it less so.
I can't think of any cases XML has helped, and plenty where it's massively gotten in the way. XMPP should've been json for instance. React used something like XML in structure for JSX but didn't actually use XML, so thank goodness we didn't have to put xmlns= all over it.
Not really. In C# I use a parsing library for which I just write a class and then the library automatically deserializes the JSON into an instance of that class.
I can do the same thing with XML. Of course it doesn't necessarily go that smoothly with all xml, but as long as the xml is fairly simple like a JSON document would be it's totally fine. It's only when you start to use all the features of xml that don't fit neatly into a class model that it starts to get annoying. But if JSON serves your needs then simple xml does as well. I wouldn't use it because JSON works just fine but it's not as bad as people make it seem, unless people make it really bad.
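The same idea sketched in Python rather than C#, under the assumption that the XML is "simple" in the sense above (one child element per field, no attributes). `User`, `from_json` and `from_xml` are hypothetical names, not any particular library's API.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class User:
    name: str
    age: int


def from_json(cls, text: str):
    """Build an instance from a flat JSON object."""
    return cls(**json.loads(text))


def from_xml(cls, text: str):
    """Build an instance from flat XML: one child element per field."""
    root = ET.fromstring(text)
    # XML text is untyped, so convert using each field's declared type.
    return cls(**{f.name: f.type(root.findtext(f.name)) for f in fields(cls)})


a = from_json(User, '{"name": "Ada", "age": 36}')
b = from_xml(User, "<User><name>Ada</name><age>36</age></User>")
print(a == b)
```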
I would even go as far as to say that XML may very well be better in some cases: here you have a schema most of the time, so you can often catch e.g. schema evolution failures at compile time.
This is much less common/less standardized with json.
JSON Schema exists. If you restrict yourself to a sensible subset of XML features in your XML schema, you can have a 1:1 correspondence to JSON and JSON Schema. We do that at work. Due to historical reasons we have an XSD but provide the complementary JSON Schema to those who wish to send us JSON.
The JSON is converted on the fly to XML based on the XSD so it can be ingested by our existing XML integration. Similar with return answers, response XMLs are converted on the fly based on the XSD to JSON.
JSON endpoints validate against the JSON Schema, also generated from the XSD at runtime, XML against the XSD of course.
We had a diverse set of XSDs but didn't have to tweak them to support JSON. We used restrictions and extensions, both simple and complex, we used min/max, enums, descriptions and examples and more, so not entirely boring XSDs.
We did establish some conventions: attributes turn the child element into an object and the attributes become properties, just simple stuff like that.
This way customers can hand us what they prefer generating and ingesting, and we don't have to worry about keeping two different schemas for the same endpoint in sync.
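A rough sketch of what such a convention could look like. This is a guess for illustration, not the actual conversion described above: the `"value"` key for an element's own text is invented, and repeated elements (arrays) are left out because mapping them correctly needs the schema.

```python
import json
import xml.etree.ElementTree as ET


def element_to_value(elem: ET.Element):
    """Convert an element to JSON-compatible data under a simple convention."""
    children = list(elem)
    if not children and not elem.attrib:
        return elem.text  # plain leaf element: just a string
    obj = dict(elem.attrib)  # attributes become properties of an object
    if children:
        for child in children:
            obj[child.tag] = element_to_value(child)
    elif elem.text and elem.text.strip():
        obj["value"] = elem.text  # hypothetical key for the element's own text
    return obj


order = ET.fromstring(
    '<order id="7"><price currency="EUR">9.50</price><note>rush</note></order>'
)
converted = element_to_value(order)
print(json.dumps(converted))
```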
That is actually a good approach that I have also used a lot: let the parsing library handle everything including the serialization and deserialization. But if you do that, why do you care that behind the scenes it is using JSON or XML or protobuf or something else?
The problem, IMHO, was the rampant "xml-abuse" in the naughts. ws-* standards and over-engineered garbage like SOAP ("complex object access protocol") made people loathe XML.
I did like JAXB in Java, XSLT, schemas, XPath. Never got into XSL, but it seemed like a good thing too. It worked best when your tooling manipulated it for you or at least helped you in an intelligent way. Much of the hate for XML came from situations where you had to deal with someone's over-the-top, one-size-fits-all schema without the benefit of tooling to at least hint you in the right direction.
It still survives in WPF and c# *.proj files. If it were just me, I would still use it for object serialization. But json is king now even though it's inferior.
It's non-trivial to implement an XML parser in a secure way; many stdlib ones are insecure by default. That should just not be a thing. XML has a bunch of vulnerabilities very specific to it: XXE is the most well-known one, but you also have a bunch of DoS attacks due to entity expansion, XPath injection, etc.
An object serialization format should not have a bunch of footguns and vulnerability categories specific to it.
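To illustrate the kind of extra care this takes, here is a sketch of one mitigation using Python's expat bindings: refuse any document that carries a DOCTYPE, since entity-expansion and external-entity tricks are declared there. `parse_without_dtd` and `DTDForbidden` are hypothetical names, and this is not a complete hardening.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document tries to declare a DOCTYPE."""


def parse_without_dtd(data: bytes) -> None:
    """Check well-formedness, rejecting any DOCTYPE declaration."""
    parser = expat.ParserCreate()

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    # The handler fires before any entity in the DTD can be expanded.
    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.Parse(data, True)


parse_without_dtd(b"<a>fine</a>")  # passes silently

try:
    parse_without_dtd(b'<!DOCTYPE a [<!ENTITY x "boom">]><a>&x;</a>')
    rejected = False
except DTDForbidden:
    rejected = True
print(rejected)
```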
The funny thing is that JSON parsing is usually kinda unsafe in its main target language JavaScript, and usually safe in other languages, because of the `__proto__` prototype pollution.
At least XML is hated for the wrong reasons (e.g. verbosity, esthetics) most of the time. There was for sure an era where it was overused (see Apache Cocoon from 2006 https://en.wikipedia.org/wiki/Apache_Cocoon). But XML is still a pretty good format to exchange (and store) data and make sure the data conforms to a certain schema. JSON Schema in comparison is not nearly as powerful.
1. What, in your view, are the right reasons to hate XML?
2. To me, verbosity and aesthetics seem like perfectly valid reasons to hate XML. Once you learn S expressions, XML looks disgusting. They implemented half of Common Lisp in a markup language.
> They implemented half of Common Lisp in a markup language
Come on, S expressions are just trees, they are not God's gift to humankind.. and just because a language has an AST (surprise, a tree again!) doesn't make it a lisp. I can write a C program's AST as sexprs or a Haskell program's, yet neither will be a lisp.
XML is unfairly maligned. Yes, people bought into it too much 26 years ago, but then you would too if you had to maintain someone else's massive packed struct dumped into a file and documented in a poorly-maintained Word document --- or worse, a brace of dumb IETF RFCs that contradict each other.
I am glad that younger generations are looking at it with fresh eyes. XML is a useful format; it has its place in your toolbox. Ignore the haters.
I’ve hated XML since 2004. The worst part about it is the tags vs attributes fights. They both do the same thing and the only difference is preference. Having two ways of doing the same thing invites and incites religious positions and causes unnecessary fighting. There should be one, opinionated way of doing things so you avoid confusion.
> The worst part about it is the tags vs attributes fights. They both do the same thing and the only difference is preference.
They're not the same thing. If you look at it as the extensible markup language for documents that it is, "tags" (i.e. inner content) would be visible and "attributes" would not. If your XML document was processed by an application to convert to another type of document (PDF, etc.), and it didn't recognize a particular tag, it would be sensible for attributes to disappear, but inner content ("tags") to remain.
It only seems like a preference thing if you look at XML as a structured data format like JSON is.
In data structure terms, attributes do allow nodes to be decorated with additional information without forcing any change on existing parsers. In JSON, this would require swapping, e.g. "str" -> {"value": "str", "attrib1": "..."}.
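A short sketch of that point, with made-up documents and reader functions. The existing XML reader keeps working after an attribute is added; the existing JSON reader starts getting an object where it expected a string.

```python
import json
import xml.etree.ElementTree as ET


def read_title_xml(text: str) -> str:
    """Existing consumer: reads the text of <title>."""
    return ET.fromstring(text).findtext("title")


def read_title_json(text: str):
    """Existing consumer: expects "title" to be a plain string."""
    return json.loads(text)["title"]


# Version 1 of each (hypothetical) format, then version 2 with a language tag added.
xml_v1 = "<book><title>Dune</title></book>"
xml_v2 = '<book><title lang="en">Dune</title></book>'
json_v1 = '{"title": "Dune"}'
json_v2 = '{"title": {"value": "Dune", "lang": "en"}}'

print(read_title_xml(xml_v1), read_title_xml(xml_v2))    # same string both times
print(read_title_json(json_v1), read_title_json(json_v2))  # str, then dict
```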
Last year we chose XML as the basis for our document language.
It's been a good choice for designing a new language, but we've been really surprised by the poor quality of the available parsers. We figured it would be a solved problem, but we'll be writing our own at some point.
Honestly I miss it. As overengineered as it was, at least we had proper tooling for it, and while there were dialects in the associated tech (e.g. XML Schema vs RELAX NG vs Schematron) it was minor compared to the wild west that JSON is to this day.
I dislike it because it failed in such a fundamental way as a way to represent a document; you cannot, in general, reliably determine what characters the bytes in an XML file represent - the best a general XML processor can do is guess.
> developers must become domain experts [my emphasis] in a rich and complex space that is essentially unrelated to the application itself.
XML is a markup language, but most people that used it just needed a standard structured data format. In comes JSON which is more easily compatible with the object systems of various languages and in particular is compatible with Javascript syntax, and XML loses most of the people that used it.
As a markup language though, it seems pretty good. It's just that the number of people that actually need an extensible markup language is much smaller.
I do hate the strictness of it. The header
<?xml version="1.0" encoding="UTF-8"?>
should be unnecessary. For a markup language, an already-made plain-text document should already count as XML. The tags should be something you can just sprinkle as you'd like to add contextual metadata.
Aside: I love how "about me" is just another tag, and clicking it lists 3 blog posts.
On XML: I don't hate it. I hate wrapping my head around XSLT but that is more about my head. AI may make XSLTing more bearable as it happens? I did work with someone passionate about XSLT. Aaaand now I am doxxed.
I also think in practice schemaless i.e. JSON or "the schema is look at the code or some logs lol" won because fuck let's face it that is more fun.
i dont hate it. the declaration kind of annoys me from time to time, digging into attributes can be annoying, and its obviously not the best form of structured data.
json is just easier for my brain at this point if it needs to go over http, but ive seen some pretty... poorly designed json structures.
csv is always a good time. love when i can just plop important data into a table and query away
Every time XML comes up, I feel obligated to share my opinion (I too wrote XML at the turn of the millennium, have seen it become excommunicated, and still witness that on occasion).
XML is verbose and therefore uglier than it ought to be. I think most of the haters hate it for that alone -- there's not much else to hate because you don't have to deal with the rest, it's not really imposed on you unless you really have to deal with someone else's XML application.
What do I mean? Well, the brackets thing and the necessity to repeat name of every element twice, in correct (LIFO, last in first out) order, isn't great, admittedly.
What XML has that the dev-bro alternatives that have flooded the void XML left haven't gotten, and which we thus see being reinvented, are: namespaces, attributes and interop using the former two. Sure you can write JSON and YAML (the latter deservingly being incredibly hard to parse correctly -- they tried to design a better XML but failed IMO) -- but these suck as meta-languages because there's not much "meta" there. JSON, for example, allows you to create properties and has a few types (kind of more than XML, really) but it leaves semantics up to you and namespaces are up to you to re-invent, poorly. If you think I am stretching the argument, see if you can represent an HTML document (no, not Markdown) with a JSON file.
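For what it's worth, here is a sketch of what taking up that challenge tends to look like. Mixed content (text interleaved with elements) forces a JSON encoding into positional arrays; the `[tag, attributes, ...children]` layout below is just one made-up convention, which is rather the point.

```python
import json
import xml.etree.ElementTree as ET


def to_nested(elem: ET.Element) -> list:
    """Encode an element as [tag, attributes, *children], keeping text in order."""
    node = [elem.tag, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)          # text before the first child
    for child in elem:
        node.append(to_nested(child))
        if child.tail:
            node.append(child.tail)     # text following this child
    return node


para = ET.fromstring('<p class="x">Hello <b>big</b> world</p>')
encoded = to_nested(para)
print(json.dumps(encoded))
```

It round-trips the structure, but every consumer has to know the convention; nothing in JSON itself says which array slot is the tag and which is text.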
YAML is a similar story, albeit with a few cool things like aliases. I think it's a better attempt to give the world a better XML, but the jury is still out on that one.
The killer thing with XML, for better and for worse, was the plethora of tools to work with it. I wrote a fair share of XSLT documents to transform data, back when there was momentum in XHTML, for example. XSLT barely supports JSON and it's not pretty. XPath cannot natively understand YAML -- unless you convert it to XML which I guess re-animates XML as some sort of Frankenstein's monster. And even if it were a [pretty] monster, dealing with an intermediate representation for that kind of purpose is a can of worms all of its own.
Ironically, nobody seems to hate HTML 5. Or React basically turned it into a greasy cogwheel nobody needs to look at. Because if you look at it, it's in my opinion an abomination even compared to XML (unpopular opinion) -- the parser is quirky and behaviour is defined by the standard per element type (i.e. some elements need a closing tag and some do not, and what happens if you forget a closing tag is element-specific; care to remember the set of rules to ensure your document renders to your liking?). It has no namespaces but it has "custom elements" which require a dash in the name as a poor man's namespace and you can't omit one, and now we have a Web of `x-spinner` and `x-carousel` because it turns out everyone rightfully wanted a default namespace but didn't get one. Anyway, it's all plumbing, right -- the idea of _writing_ HTML has largely come and gone. And I am digressing.
The one good feature of HTML 5 was the introduction of boolean attributes. It's a feature XML could and should adopt.
The whole handling of custom elements was fumbled beyond belief. The HTML spec is a disaster, particularly its parsing rules, the complexity of which is used as an excuse by HTML spec authors not to improve the language.
XHTML was a better path.
I think the reason we don't see too many people complaining about HTML 5 is because not many web programmers use it directly, most are using JSX.
What do you think of CSTML? It's my attempt to heal the rift between XML and HTML5, as well as solving all the problems that made XML feel onerous to use... https://docs.bablr.org/guides/cstml
It's simple to parse like JSON, it has namespaces like XML (but better), and it doesn't require you to repeat the name of every element twice.
> Well, the brackets thing and the necessity to repeat name of every element twice,
As a document format, it's supposed to be hand-written by humans. If you have paragraphs between the opening tag and closing tag, it makes sense to let the reader know what they're seeing the closing of.
After deciding you do want to repeat the element name, the angle brackets make more sense. Otherwise, you can have a syntax like LaTeX's.
XHTML was dropped because it wasn’t backwards compatible, and it was too strict in its syntax. Minor syntax errors that could be automatically corrected by the browser turned into full page errors.
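The difference in strictness is easy to reproduce with Python's standard library, as a rough analogy only: `html.parser` is a lenient tokenizer rather than a browser's full HTML5 tree builder, but it shows the same split between "carry on" and "refuse the whole document".

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A hypothetical fragment with a missing </b>.
broken = "<p>unclosed <b>bold</p>"


class TagCollector(HTMLParser):
    """Record start tags; the lenient parser does not complain about mismatches."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(broken)
collector.close()

# The XML parser rejects the same input outright.
try:
    ET.fromstring(broken)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

print(collector.tags, xml_ok)
```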
XML is ok, the problem IMO is that the way some people use(d) it is utterly unhinged.
So a return value that could have easily been two lines of text is now a nested demon of XML. The complexity isn't what that application does, the complexity is in how it returns the values.
Another good one is to use XML when something else would have made things simpler, e.g. for a config file where you could easily have used a .ini or a .toml file.
Or you have an application that tries to be so generic that the definition for a simple use case is a whopping 5MB XML file. Cool. When I feel writing a config for your application is harder than programming the whole thing yourself you have really made it. /s
Perhaps because it's an example of what is possible in XML and how to parse it, and not, in fact, a particularly good or canonical example of XML?
- element vs attribute ambiguity
- model of the document does not fit nicely to programming model of structs, dicts and arrays
- too many complexities (entities, cdata, parser directives)
- cardinality unknown without schema (is that a single value, or an array that just happens to have one element)
- order of elements may or may not be significant depending on schema
- not really extensible if the original schema does not explicitly allow for extensibility
- some types of valid XML documents are not representable by a schema (e.g. any number of different elements in any order)
- verbosity
- namespace identifiers being URIs that may or may not be resolvable
What I want for general data exchange is JSON with comments and sane namespaces.
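The cardinality point from the list above is easy to demonstrate. This is a sketch of a naive schema-less converter (a made-up helper, not any specific library): the same field comes back as a string or as a list depending on how many elements happened to be present.

```python
import xml.etree.ElementTree as ET


def naive_to_dict(elem: ET.Element) -> dict:
    """Schema-less conversion: repeated tags become lists, single tags stay scalars."""
    result = {}
    for child in elem:
        value = child.text
        if child.tag in result:
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]  # promote to a list on the second hit
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


one = naive_to_dict(ET.fromstring("<order><item>pen</item></order>"))
two = naive_to_dict(ET.fromstring("<order><item>pen</item><item>ink</item></order>"))
print(type(one["item"]).__name__, type(two["item"]).__name__)
```

Without a schema the converter cannot know that `item` is "an array that just happens to have one element", so downstream code has to handle both shapes.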
I write a lot of html already; I don't need xml in my life too.