JSON Schema bundling formalised (json-schema.org)
140 points by relequestual on Sept 10, 2021 | 183 comments


I'll be one of the few positive voices here, I guess.

JSON Schema is pretty good. It's a relatively simple, extensible, pragmatic specification that supports validating the kinds of data that JSON can express.

XML was killed by complexity, same as lots of technologies that preceded it, such as CORBA and SOAP. Developers don't like complexity. W3C tried to build an enormously complicated ecosystem of tools on top of XML. The specifications for XML Schema, WSDL, XSLT, etc. were gigantic. XML Schema and WSDL are both so big they are split into multiple parts. XSLT 3.0 is maybe 500 pages when printed. Many of these originated in the wrong sort of place, designed by committee and from inside big enterprises like IBM, rather than being adopted from evolving practices.

With XML out of the way, we still need a way to represent structured data, and it turns out JSON is pretty good for that. JSON has problems, but its simplicity is what led to it becoming so prevalent in the first place. Now, we also need a way to define the structure and format of that data, and JSON Schema is pretty good for that. I think JSON Schema could have been a bit simpler, but it's still nowhere near to making the same mistakes as XML Schema.


> I think JSON Schema could have been a bit simpler, but it's still nowhere near to making the same mistakes as XML Schema.

Agree with every single one of your points, and this in particular. Thankfully, there are some heroes out there developing excellent tooling to make working with JSON Schema easier. I've been making good use of Ajv[1], and highly recommend it to anyone having to deal with JSON payloads and schemas in a JavaScript environment. (I'm not affiliated with Ajv in any way, just happy with it.)

Working without a schema is much faster when getting started, obviously, but once things solidify, in my experience a schema helps more than it costs, and excellent tooling goes a long way in easing the transition from schema-less POCs to stable, high-quality implementations.

[1]: https://ajv.js.org
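For anyone who hasn't seen it, a minimal sketch of what using Ajv looks like (the schema and data here are made up for illustration):

    import Ajv from "ajv";

    const ajv = new Ajv();

    // hypothetical schema: an object that must carry a string "name" and nothing else
    const validate = ajv.compile({
      type: "object",
      required: ["name"],
      properties: { name: { type: "string" } },
      additionalProperties: false
    });

    if (!validate({ name: 42 })) {
      console.log(validate.errors); // machine-readable list of what failed, and where
    }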


I used to resent the changes between JSON Schema drafts, but on reflection, almost all of them have been, IMO, extremely well chosen.


> Developers don't like complexity.

Judging by the code that is in front of me right now, I'm pretty sure they really do like it... or at least make lots of it.


They like to produce complexity, else they get bored while writing the code. But they don't like OTHER developers' complexity.


There’s a crude expression about smells and feces that applies here.

I used to struggle to understand why other people didn’t like using all the clever abstractions I came up with. I felt unappreciated and thought I needed to learn how to communicate better. It turns out I was communicating just fine. What I needed to learn was how to accept feedback.


"If I had more time I would have written a shorter letter" (Pascal, apparently).


Yeah they (including younger me) loved abstractions and complexity I wrote or I read about.

But it’s complex and difficult to grasp, so it’s not fun when you have to get into someone else’s mind, or even to get back into your own thinking after a period away from that project/architecture.


Most new software devs go through that phase. A few decades ago, it was about the Gang of 4 design patterns. We were all reading that book and then looking for every possible place we could implement those patterns, even when they made absolutely no sense.

I'm so sorry for the teams I worked with in those days. The worst is that I felt the folks around me were worse devs for not doing the same. It's so embarrassing in hindsight.


The rule is "K.I.S.S (and add more RAM !!!)"


Strongly agree - I've worked with ONC RPC, CORBA, SOAP, XML & XSD, XSLT etc. and I much prefer the relatively straightforward nature of JSON and JSON Schema. Yes it's not perfect - but for me it is more than good enough.

Edit: Of course, there is no direct equivalent of XSLT in the JSON world - which pretty much counts as a feature to me.


One thing I am wondering is whether the protobuf folks have a better development environment overall, or a worse one. Is the schema-to-self-validating-serialization-code approach going to win just as JSON won over XML?


This is what I was thinking, but no, I'm skeptical. Protobufs are far more efficient, but they drop human readability for the data transfer, and that's where a lot of the logging and debugging happens.

As to code generation, if you use someone’s package for confirm_my_schema(thisjson) isn’t that basically code generation just moved somewhere else?


Not all JSON Schema validators are codegen/macro-based, so some of them are relatively unperformant and come with a lot of code baggage.


Nor XPath.


I think XPath is actually one of the nicer bits of the XML universe - there is a JSON equivalent in JSONPath that I have used a bit.


This. Back in the XML days, I had to spend way too long reading XSLT and WSDL specs just to do basic things.

JSON Schema takes 15 minutes to learn and it's a heck of a lot better than writing my own validator.


I remember I was enjoying XSLT; I used it to create templates and, in the end, valid XHTML pages. I thought that was the future.


At some point I made a news-articles site and took that approach rather than a database: the owner would write a pretty simple XML file for each article, e.g. with tags like `title`, `image` (left, right, hero), some asides and some basic formatting for the body of the article, and XSLT, templates and CSS would make it a proper page. All the magic happened in the browser. I still think that's pretty cool.


That's pretty much what it was designed to do. You could push it pretty far too. Relational databases at the time all implemented XML types, so you could have XML all the way down, and XSLT is how you'd go from point A to point B.

But in the end, simpler alternatives prevailed.


It is a functional language, and the cool kids are really into those.


So what makes JSON Schema better is that nobody will "build an enormous ecosystem of tools on top" of it?


It depends on the (complexity of the) JSON schema to manipulate.


You don't need to?


"[humans] don't like complexity"


> Developers don't like complexity.

This may be true, but they also can't resist its temptation.

As long as we build tools that are composable, developers will continue to build complexity on top of them. I fear that JSON's fate is sealed.


We've come full circle back to XML SOAP.

We should consider restricting the registration of .org domains to actual non-profit organizations, and restricting the use of words like "schema" and "standard" to things that have been fully certified as such by internationally accredited engineering bodies.

This doesn't add anything on top of the tools we had 40 years ago. We're stuck with this everywhere now, all thanks to pure marketing.


JSON syntax is more familiar for people who have not been exposed to XML for long.

Parsing and walking a JSON structure is much easier and less ambiguous. It's often very clear how to represent/deserialize/serialize JSON (as-is).

The benefits of using schema are clear, you can add validation and semantics to parse (JSON) into a richer and more consistent data structure.

JSON Schema is in large part a well-designed and clear language that is meant to be read and written by humans. Compare that to DTD or XML Schema.

I think in terms of raw functionality, there are very strong parallels. And even though I'm not an XML hater, I find JSON simply more pleasant to work with.

The only fundamental advantage that I would give XML over JSON is the fact that you can encode meta data in XML in a more idiomatic, standard way via attributes. But we all know that is not how XML is being used. If the ecosystem around XML wasn't abusing attributes to encode core/primary data, then there would be a stronger argument for the language IMO.


> The only fundamental advantage that I would give XML over JSON is the fact that you can encode meta data in XML in a more idiomatic, standard way via attributes. But we all know that is not how XML is being used. If the ecosystem around XML wasn't abusing attributes to encode core/primary data, then there would be a stronger argument for the language IMO.

Attributes are absolutely one of the reasons for XML's failure. They seem so innocuous, until you encounter a schema where the developer created:

  <Entity ID="12345" Type="Person">
    <Property Name="FirstName" Value="Harry" Type="String" />
    <Property Name="LastName" Value="Potter" Type="String" />
    <Property Name="DateOfBirth" Value="1/1/1980" Type="DateTime" />
  </Entity>
And you just want to hunt those people down and slap them.

Never mind if you've ever encountered XML with mixed namespaces. Then again, reading the article, the bit at the end is exactly the same sort of nastiness.


That's a very illustrative example. It completely and utterly destroys the benefit of having attributes and makes the structure less clear in context of the XML ecosystem.

I imagine if there were discipline around attributes being only used for meta-data and maybe explicit (de-normalized) references (which is arguably meta-data, happy to discuss), then XML would be semantically richer by default. As it is, however, we are left guessing the semantics whenever we encounter some XML.

Somewhat in contrast, HTML is generally, but not consistently, better at making this distinction, which enables clearer semantics around accessibility, HATEOAS and so on. Ironically, we throw that value proposition away when driving web application rendering with JSON (like popular SPA frameworks do), so we have to re-invent meta-data and communicate it with JSON Schema.


XSD is not as bad as everyone claims. JSON Schema vs. XSD complexity is IMHO pretty much the same. I've been working with both for years.

DTD however, let us not talk about that. Let us just forget it.

I think the drama in XML usage was basically that a simple RPC (or resource query) had at least a dozen namespaces, standards and very long element names. But that is not the fault of XML but of SOAP/WS-*. XML itself can be very beautiful and simple.

Oh, and horrible DOM APIs ;). When working in semi-modern .NET, parsing XML/JSON is super easy. Doing the same in JavaScript (out of the box) is a drama.


Namespaces killed XML. Such a bad design. It might have worked out better if every schema were required to explicitly flatten in whatever it needs from sub-grammars.

I see this JSON Schema thing also seems to use long, intricate URLs that you can only copy-paste to identify things... At least in this day and age, if one does IDs with unwieldy blobs, it'd be good to use some hash and get the benefits of content addressing.


Maybe someone can chime in but parsing XML seems very much more difficult than parsing JSON.


The raw parsing is a solved problem from a user's perspective; the issue arises when you ask yourself what you parse it "into", or rather what the semantics of the tree are.

In some cases it might not even matter whether you parse it programmatically or not. Typically, when looking at a given XML document (or another textual data representation) you can pattern-match the semantics of what you see, but that is not always true, because everyone uses it in a different way, relying almost completely on schema and documentation to convey semantics. That in turn makes the format strictly inferior to JSON, which also doesn't have rich semantics but is simpler.


Parsing XML is just a grammar, like JSON is ... it is not more or less difficult. JavaScript just has a built-in JSON parser (which translates JSON into JavaScript objects).

As a consequence, JSON in JavaScript is super easy. Using XML is medium-hard.

In .NET using XML or JSON is easy (not super easy).

In C++ using XML or JSON is medium-hard.

Do not know the state of the art for Go, Rust, Java, ...

As a consequence, JSON is - from a JavaScript developer perspective - the holy grail. From a .NET/Java/C++ developer perspective it is just another format.


I can't agree more.

In Java, there's the JAXB "parser" which translates an XML schema into (mostly) straightforward classes. Correct initial setup is the only hard part.


The key difference is that there are no options when you parse json. You just parse it and get some lists and maps back.

In XML you have to decide if you want DTD or schema validation enabled (which could provide default values, and could be a security risk), and where and how you resolve the DTDs from, etc. Do you resolve references? And many other options.

So JSON benefits not just from a simpler structure, but also from less optionality.


You do not pull schemas dynamically. schemaLocation beyond editors is a pure theory thing. It is not the default, and it cannot be fast. Any serious developer has the schemas as files in their code when checking the data for compliance.

Basically, in a statically typed language you decide whether to do schema checking or not - just like for JSON - and whether to deserialize into structures or access through an API - just like for JSON.

It is just that JavaScript maps JSON directly onto its dynamic object, array and native types. For XML that is a mismatch. For static languages both are a mismatch.


> Parsing XML is just grammar like JSON is ... that is not more or less difficult.

My understanding is that that's not the case. XML parsers are Pull or SAX based and it's weird as fuck.

You can abstract that out, but parser complexity isn't something to sneeze at. For one thing, parsing XML is notoriously dangerous - tons of vulns both at the design and the memory-management level.


> I think the drama in XML usage ...

What about:

- verbosity

- complexity of writing a parser

- complexity of dumping data

- DTDs

- data model which doesn't map directly to a scalar/array/associative array

... and so many other XML features that are either useless or painful just for RPC.


Verbosity is not a thing. We read HTML all the time.

Parser writing and dumping of data is equally complex for both JSON and XML. The only difference is that in JavaScript, JSON is so much simpler due to the dynamic nature of the language and the core library. Out-of-the-box XML parsing in JS is just stone-age. Compare that to .NET System.Xml.XDocument and Newtonsoft.Json, and you can see how simple an API could look (in a static type system ;))

No one needs DTDs. No modern contract is ever spec'ed in DTDs. XSD superseded that 20 years ago. The only issue with it is that some entry-level tutorials are still mentioning it.

The data model mapping depends a lot on the target language. In JavaScript, XML is not a match but JSON is. Working with JSON in C++ is just as stupid as working with XML.

The only stupid feature in XML is namespaces. Without that, it is simple.


> Parser writing and dumping of data is equally complex for both JSON and XML.

Strongly disagree. Sure, you can make XML which is easy to parse, but parsing arbitrary XML is hard. Namespaces (and their short prefixes) and custom entities are things that fly in my face any time I use XML but simply do not exist in JSON.

Dumping is usually OK but again, I've seen XML libraries replace short namespace prefixes with full URLs when dumping, making simple documents unreadable.


It's easy in JS too, just use a DOMParser
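For reference, a minimal browser-side sketch (the markup here is made up):

    const doc = new DOMParser().parseFromString(
      "<entity id='1'><name>Harry</name></entity>",
      "application/xml"
    );

    // DOMParser doesn't throw on malformed XML; it returns a document
    // containing a <parsererror> element instead
    if (doc.querySelector("parsererror")) {
      throw new Error("malformed XML");
    }

    const name = doc.querySelector("name")?.textContent; // "Harry"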


That is not a JavaScript API, that's a DOM-only API, meaning it's not part of the language spec. This is important because it won't work in Node, for example, without a 3rd-party library or someone implementing it in Node core.

JSON however is supported in the language spec


And the DOM API is an example of bad API design. Compare that to a modern XML API (like .NET System.Xml.XDocument) and you never want to see the DOM again.


Do you mean XSD?

XSD = JSONSchema

WSDL = OpenAPI

SOAP = there really isn't a direct analog maybe ""REST""

---

In any case, try creating a SOAP service and WSDL. It's literally so complex that most tutorials have to rely on an IDE to do it. Humans are simply not equal to the task.

Yes, the purpose is similar, but only those who have never used them would think they're the same.


I had to use SOAP once... for one API endpoint.

I'd say it was the worst dev experience of my whole career, and I fought with objective-c and app stores.


I've been using SOAP APIs daily for over 15 years of .NET Framework development. Simple and easy to write, and then easy to consume through Visual Studio generated code.

Tried some ObjC development early on the iPhone. Gave up quickly.

I agree that SOAP looks like a mess. But given the right tools, it honestly just works. Never had a single issue with these APIs apart from being stuck in .NET Framework land.


Been using it daily myself for just as long with VS. Being basically exclusive to .NET is a huge setback. I would never make a new SOAP service these days. There are other solutions that just work with far less complexity, waste, and limitations.


And that's the issue.

SOAP isn't some nice standardized language-agnostic API layer.

SOAP is .NET/Visual Studio RPC. (Not technically, but in practice, due to its complexity.)

And hey, great for you. But a lot of people want APIs to be broadly compatible.


> SOAP isn't some nice standardized language-agnostic API layer.

What? Of course it is (well, it's a messaging protocol). Where on earth did you get the idea that it's some Microsoft-only proprietary technology?


You missed the word "nice" in the GP's message. Also "(Not technically, but in practice, due to its complexity.)"

No one says it is proprietary technology and yes, it is nominally standardized. It is just so complex that it is impossible to use without heavy tooling support, and this tooling is only really usable in Microsoft's technology.

So it seems that people who use the MS ecosystem like SOAP ... while almost everyone else who tried it calls it over-complicated, incomprehensible and "worst dev experience of my whole career"


> It is just so complex that it is impossible to use without heavy tooling support, and this tooling is only really usable in Microsoft's technology.

Again, gotta call BS here. My early career was 100% SOAP and related WS-* technologies. And it was all in Java, Apache Axis and friends. Wildly popular at the time. Give it your WSDL, out pops a nice Java client. People didn't like it, not because it was difficult to use outside of the Microsoft ecosystem, but because the concepts were difficult to understand and there were many complicated technologies piggybacking off of it. It was hard to debug issues, Visual Studio or not. If you had to do something that a generated client couldn't do, it was difficult, Visual Studio or not.


Try calling a soap service with python. Oh boy enjoy the pain!


"it honestly just works"

Which I would agree is most of the time - but then it doesn't and you have to try and work out what is going wrong from reams of impenetrable generated WSDL. Which is fun.


Yes, I had to use it in a Node.js project.

I'd imagine that with the right tooling and TypeScript it wouldn't have been that bad.


Looks like a case of Stockholm syndrome.


Over the past year I’ve built an ESB for a client’s deployment of a brand spanking new distribution management system … which you could integrate with only and solely using SOAP. The tooling in Java makes it bearable, but I am still shocked that this system deployed in 2021 has this.


I wrote an XSD validator back in 2003, and indeed this blog post gives me familiar vibes.

I do believe there has to be some sort of common schema format, but I have no idea how this problem could be solved cleanly in practice, even after 20 years.


And the tools have trouble too. For instance: Java axis2 generating non-deterministic class names (Product1 in one compilation, Product0 in another compilation after).


Well, let's turn the statement around then: can you create an OpenAPI YAML by hand? At least with XML and XSD you have proper autocomplete.

I really don't see the need to use a schema for JSON. The only things it has going for it compared to SOAP/XML are faster parsing and the fact that it has no schema.


I can certainly create a JSONSchema (the XSD equivalent) by hand from memory (except for the $schema URI):

    {
      "type": "object",
      "properties": {
        "foo": { "type": "string" }
      }
    }
And at least in VSCode, I get autocomplete too.

OpenAPI... I personally don't use this as much, but yes, there are a hundred tutorials on creating definitions by hand.


Yes, you can. Since OpenAPI itself has a JSON schema, you can get autocomplete from any editor that supports JSON schema.

At my company, we found that the auto-OAS generation tools were too brittle and didn't communicate enough information, so we actually do write our own OAS and run automation to make sure it's accurate.

That's also how our architects communicate with developers about an endpoint - create an OAS for it.


> At my company, we found that the auto-OAS generation tools were too brittle and didn't communicate enough information, so we actually do write our own OAS and run automation to make sure it's accurate.

I found all OAS spec authoring tooling to be useless toys when building non-trivial real world specs.

The depths of depravity I’m forced to are as follows:

1. Data model in SparxEA because it’s the tool we have in the depths of govt. Generate XSD from tool

2. XSLT to turn the XSD into matching JSON Schema because I’m old and can still remember how.

3. Hand write the OAS with $ref into JSON schema as appropriate

4. Curate sample payloads to be educationally useful

5. Generate CURL and various code samples

6. Bundle it all into approx 80k lines of machine-written wondrousness

7. ReDoc and Mkdocs to build static website with more documentation

All the work is in data modeling, OAS sample curation and mkdocs documentation.

Things I wish JSON and Schema had:

1. An equivalent of xs:decimal that JS would use. Number and double are just plain useless in real-world commerce with dollars and cents and bulk item pricing with 3-4 decimal places.

2. Taken the time to fix RFC 3339 and give dates a timezone suffix. Too many languages marshal a date into a datetime. How many bugs have I seen where it's broken in the morning and works in the afternoon because the damn server is running in AU with a UTC timezone and we're in NZ.
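For point 1, the closest workaround I know of in JSON Schema itself is carrying money as a string and constraining it with a pattern - a sketch, with the property name and precision made up:

    "unitPrice": {
      "type": "string",
      "pattern": "^-?\\d+(\\.\\d{1,4})?$",
      "description": "decimal carried as a string to sidestep binary floating point"
    }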


> can you create a OpenAPI YAML by hand then?

Of course.

> At least with XML and XSD you have proper auto complete.

Tooling for OpenAPI exist. In VS Code for example.

> I really don't see the need to use a schema for JSON

Don't you see the need for a precise, machine-readable description of the JSON input and output of a REST API, for validation? This is OpenAPI.


You can get proper autocomplete out of OpenAPI, powered by typescript.


> We should consider restricting the registration of .org domains to actual non-profit organizations

Too late for that

> We should consider […] restricting the use of words like "schema" and "standard" to things that have been fully certified as such by internationally accredited engineering bodies.

Too late for that and unenforceable


back to CORBA


The World's Second Fully Modular Software Disaster!

It's just riddled with features.


what was the first fully modular software disaster?


Glad you asked! ;)

https://donhopkins.medium.com/the-x-windows-disaster-128d398...

[...] X-Windows is the Iran-Contra of graphical user interfaces: a tragedy of political compromises, entangled alliances, marketing hype, and just plain greed. X-Windows is to memory as Ronald Reagan was to money. Years of “Voodoo Ergonomics” have resulted in an unprecedented memory deficit of gargantuan proportions. Divisive dependencies, distributed deadlocks, and partisan protocols have tightened gridlocks, aggravated race conditions, and promulgated double standards. [...]

X: The First Fully Modular Software Disaster

>X-Windows started out as one man’s project in an office on the fifth floor of MIT’s Laboratory for Computer Science. A wizardly hacker, who was familiar with W, a window system written at Stanford University as part of the V project, decided to write a distributed graphical display server. The idea was to allow a program, called a client, to run on one computer and allow it to display on another computer that was running a special program called a window server. The two computers might be VAXes or Suns, or one of each, as long as the computers were networked together and each implemented the X protocol.

[...]

X-Windows: …A mistake carried out to perfection. X-Windows: …Dissatisfaction guaranteed. X-Windows: …Don’t get frustrated without it. X-Windows: …Even your dog won’t like it. X-Windows: …Flaky and built to stay that way. X-Windows: …Complex non-solutions to simple non-problems. X-Windows: …Flawed beyond belief. X-Windows: …Form follows malfunction. X-Windows: …Garbage at your fingertips. X-Windows: …Ignorance is our most important resource. X-Windows: …It could be worse, but it’ll take time. X-Windows: …It could happen to you. X-Windows: …Japan’s secret weapon. X-Windows: …Let it get in your way. X-Windows: …Live the nightmare. X-Windows: …More than enough rope. X-Windows: …Never had it, never will. X-Windows: …No hardware is safe. X-Windows: …Power tools for power fools. X-Windows: …Putting new limits on productivity. X-Windows: …Simplicity made complex. X-Windows: …The cutting edge of obsolescence. X-Windows: …The art of incompetence. X-Windows: …The defacto substandard. X-Windows: …The first fully modular software disaster. X-Windows: …The joke that kills. X-Windows: …The problem for your problem. X-Windows: …There’s got to be a better way. X-Windows: …Warn your friends about it. X-Windows: …You’d better sit down. X-Windows: …You’ll envy the dead.


> CORBA

"Well, it's high noon somewhere in the world."


> CORBA

Get off my lawn!


I generally prefer using JSON in my interactions, where possible.

That's because it is lightweight, and 99.9% of the data I'm transferring is scalar. JSON is basically "implied" for scalar types.

The good thing (if you want to call it "good") about XML, is Schema.

Schema is a "rock hard" contract. It is definite, empirical, unambiguous. When I am looking at an API, and it has a Schema, then I know that I can figure out exactly what shape the data will take/emit.

If possible, I still use JSON for the exchange. I just use the XML to figure out the specifics in the exchange.

I've been working in XML forever, it seems. I've done ONVIF stuff, which is SOAP/WSDL-based, and even XSLT.

I still like JSON.

Mostly because I don't have to deal with Schema.

Over the years, I've gotten fairly good at developing XML Schemas, but I have never gotten "used" to it, and still have to look everything up.

If I develop APIs, I will often do an XML variant alongside the JSON, because that forces me to write a Schema. Doing this helps me to "code review" my schema, and also gives me a very convenient automated test hook.

But, boy, I still hate XML Schema...


One thing I'm guilty of is designing a document format as JSON, thus the need to formally verify it.

Passing data works well as JSON, but documents work better as XML, where you can use XML Schema.


This is where I would also develop an XML variant, even if it is never to be used, simply so that I can have a Schema to go along with it. I'll start with a data structure/class, in the code, then use a built-in transformer, to turn it into JSON or XML. That way, I know that the "kernel" of the data is the same, between them.

Even though JSON is a "natural" for scalar types, it can get weird, there. For example, let's say we have a float:

    { "kids": 1.0 }
It needs to be parsed as a float, because we could have:

    { "kids": 2.3 }
But I often see these sent as:

    { "kids": 1 }
or

    { "kids": "1.0" }
or even

    { "kids": "1" }
And I have differing results, based on the parser.
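This is one of the places where even a one-line schema helps: declaring the expected JSON type rules out the string forms up front (property name taken from the example above):

    { "properties": { "kids": { "type": "number" } } }

With that in place, 1 and 2.3 validate, while "1.0" and "1" are rejected before any parser-specific behaviour kicks in. (JSON itself doesn't distinguish 1 from 1.0, so a schema can't either.)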


Even with integers (comes up often for large numeric IDs):

   JSON.stringify(JSON.parse('{"kids":10000000000000001}')) -> '{"kids":10000000000000000}'
How do JSON schemas handle "must be an integer", and having to pass a string to get the necessary value?


That’s not an issue with JSON, but with JSON libraries. For better or (IMO) for worse

  {"kids": 12345678901234567890123456789012345678901234567890}

is perfectly legal JSON.


If every JSON library actively chooses not to implement the JSON spec, that sounds like an issue with JSON to me.


More precisely, an issue with different literal types instead of one like in XML.

In XML, every attribute or text fragment is a string, to be POSSIBLY parsed as a string, integer, date etc. according to what applications choose to do (usually according to an explicit schema).

In JSON, YAML, etc. there are at least integers, strings and floating point numbers, introducing arbitrary representation choices (1, 1.0, "1", "1.0") and implementation artifacts (e.g. numerical limits).


Separate types are fine, but JSON doesn't specify a minimum range or precision for the number type that an implementation must support, so the number type is unreliable (and strings have to be used instead). To handle any number, a JSON library needs an arbitrary-precision number type, which no common one uses because of the huge performance hit.

So, no better than XML really :D


Look at MongoDB's "Extended JSON". They serialize this as { "$numberLong": "10000000000000001" }.

With a custom extension to the JSON Schema adding the property bsonType: long instead of type: int, along with a few other data types like datetimes, UUIDs and regexes. It can be integrated with custom hooks in many schema validators.


I think you can use a string type with a regexp pattern.


Perhaps create your own custom "floatable" type that can be either a float, an integer, or a string with a regexp pattern.
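In JSON Schema terms that would look roughly like the following (a sketch; the pattern is illustrative):

    {
      "anyOf": [
        { "type": "number" },
        { "type": "string", "pattern": "^-?\\d+(\\.\\d+)?$" }
      ]
    }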


I'd like to offer a contrasting opinion to all of the other currently negative comments: I've used JSON schema in the past to validate outside input and it was a pleasant and straightforward experience.

There are unfortunately no standard type declarations that I'm aware of, which is a pain, but tools exist to translate schema.org schemas[1] (I have not used this myself).

[1]: https://github.com/charlestati/schema-org-json-schemas


I'm using json schema all the time. The big step forwards for me is I can validate data everywhere down the application pipeline, no matter the programming language, using a single document (single source of truth), and it's human readable.


I agree with you; I didn’t expect the amount of negative comments. I used it in the past and it was a great tool to “avoid representing invalid states” and also as documentation to share between teams.


> I didn’t expect the amount of negative comments

I did!

It's been made clear time and time again on HN that "everyone" HATES XML and anything to do with it... especially XSD, WSDL, etc!

I really don't see any difference between XML and JSON except the semantics (I lie, I see a lot of reasons why XML is more powerful). But I do get why people prefer JSON over XML for visual reasons alone.

Since they're very similar logically/structurally, it only makes sense that the same things that are possible with XML are possible with JSON. And here we are, discussing JSON schemas!

So is it really that they hate XML and love JSON for any other reason besides the visual ones?

If not, what is it that they hate?

I think it has to do with the responsibility and effort that goes into creating a contract, the "schema". All the anxiety and headbanging that goes into setting up all the meetings and discussions. All the convincing that happens over and over again just to move forward. All of the back-and-forth over whether it's this type or that, because the requirements are ambiguous to begin with and now it's the developer's responsibility. "I could just throw this API together in a day if it wasn't for all these useless meetings. Why do I have to wait for Bob to update the schema when I can just cast this field to a boolean in my code?" "I don't want to have to deal with validation exceptions at runtime, just fix your damn code and read the wiki I put together." Etc.

(I do understand that not all APIs, data exchange files, tightly coupled project, etc... need schemas)

In my experience, while these schema-less approaches lead to projects being leaner and completed sooner, the price is paid continuously thereafter. Something changes on the server-side, some edge case appears on the UI several weeks after release to prod. Enhance the structure, and ETLs break. And so on.

A perpetual state of trying to keep everything in tranquility emerges.


> Something changes on the server-side, some edge case appears on the UI several weeks after release to prod.

Schemas don’t fix this problem. The reason the thing probably broke is the change wasn’t backwards compatible with the clients calling the service. No schema will fix that.

If the client hasn’t been updated to consume the breaking change it is gonna break, period. Schemas don’t get you out of handling breaking changes to your API.

That being said they might expose an unintentional breaking change in automated testing of the server. Which, I suppose, is a good thing. I mean using a standard tool for schema validation in testing… yeah I can get on board with that.

But no schema validation will ever be able to “fix” an intentional breaking change to an API.

In conclusion, writing this comment now has me kinda on board with schema validation for automated testing. It probably won’t catch all forms of unintentional breaking changes but it could catch some.


Just to clarify, I wasn't suggesting that schema validation will ever be able to "fix" a breaking change to an API, intentional or not.

And I'm not suggesting that you release new versions of server-side APIs without doing the same on the client-side. (although I understand there are a whole lot of deployments where that's the norm)

What I'm suggesting is that changes made on the server-side that introduce breaking changes to the schema should be easily picked up at build time (test phase, maybe even compilation).

Of course there are going to be exceptions to this, when perhaps using a language's truthiness, or code that converts everything in the payload to a string... But I'm pretty sure that for 99% of cases, especially if you use the schema and tools based on the schema (code gen), you'll end up in a better place.


Amen brother (or sister)


When I last used JSON Schema a few years ago, the different implementations of it in PHP and Node.js were either incorrect or feature-incomplete. Seems like it is hard to get right.


If only they had used the official test suite... which is provided, free of charge, in JSON.


The main problem is that JSON Schema is still an evolving design. All tools may not yet implement the latest draft.


I agree.

I use JSONSchema a lot. It's as pleasant as one would reasonably expect.


I use jsonschema to validate my projects' output, and also generate the documentation from it. The python jsonschema support makes this possible: our testsuite is in Python already.

But it's an awkward fit, and I despair of having anyone else in my team write schemas: the default of allowing additional unspecified fields must be continually overridden, otherwise your schema has no teeth, and things like "if this field is this value, these additional fields exist" must then always have an "else" indicating that no additional fields exist.

In summary, it's better than nothing, but it's not easy. I'm not sure that JSON is a great language to specify schemas in, sorry.
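To make the "no teeth by default" point concrete, the boilerplate ends up looking roughly like this (a sketch with hypothetical mode/port fields, draft-07 style):

    {
      "type": "object",
      "required": ["mode"],
      "additionalProperties": false,
      "properties": {
        "mode": { "enum": ["tcp", "unix"] },
        "port": { "type": "integer" }
      },
      "if": { "properties": { "mode": { "const": "tcp" } }, "required": ["mode"] },
      "then": { "required": ["port"] },
      "else": { "not": { "required": ["port"] } }
    }

Note that "port" has to be declared in the top-level "properties" even though it only applies to one branch, because "additionalProperties": false only looks at its sibling "properties"; newer drafts add "unevaluatedProperties" to ease exactly this.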


> otherwise your schema has no teeth

I don't disagree, but FWIW, this is a common deliberate choice.

E.g. Protobufs work exactly the same way. Except unlike JSONSchema, there's no way to disable it.

The reasoning is to permit future extensibility.


For JSON itself, ignoring extra fields makes sense. For validating my own output, it does not.

And validating someone else's output seems counterproductive?


Yes, the unspecified fields is a nuisance.

It is hard to manually look at JSON data and compare it visually to a JSON schema because the schema has a depth to every property and lots of other cruft, which makes it somewhat cumbersome to retrofit a JSON schema to existing JSON data.

I once developed an alternative JSON schema internally for a company, where the structure was the same as the JSON data (and strict by default, of course). We wrote implementations for every language we used. It sort of worked, not as feature-complete as the official JSON Schema of course, but my conclusion is that JSON doesn't fit well for this.


I haven't stumbled upon a better schema for JSON than TypeScript definition files. If only I could use them in non-TypeScript contexts.

https://www.typescriptlang.org/docs/handbook/declaration-fil...


This is why I originally started a project to convert ts to json schema; https://github.com/YousefED/typescript-json-schema (and also check out a good alternative https://github.com/vega/ts-json-schema-generator)


You can write validation schemas using zod library [1]. They end up looking pretty similar to typescript definitions (same vocabulary, same methods for combining/intersecting/filtering). And if you migrate to typescript you'll get type definitions for free out of those schemas.

[1] https://github.com/colinhacks/zod


If you like zod, you should check out myzod: https://github.com/davidmdm/myzod


you might like https://jsontypedef.com/ if you haven't seen it


I started using jsonschema at work and can't go back. We have an extremely large configuration file to configure our system and there are lots of 'optional' blocks that can be configured. Jsonschema really cleanly allows you to express that logic without having to write a line of code. It's been great.

Protip: if you are using jsonschema in Python, it takes a dict, not a file, so you can use YAML (or something else) as your configuration file and still validate it with jsonschema.


I might be exceptionally dense, but it's hard for me to see practical applications for something like this. In the end, if you implement this in your application, you will have a mechanism to say "this input document is invalid". AND THEN WHAT? Your only option is to discard it.

I'd rather live by the old maxim "be liberal in what you accept, and strict in what you produce". But perhaps I'm overlooking an important use case here, in which I'd happily stand corrected.


There are things you cannot accept. What do you do if an order has a negative quantity, if you get a string instead of a number, if a value does not fit in your DB column, etc ?

Having this as part of your interface definition - along with generated representations of the possible data in the language of your choice[0] and automatic (so you can't forget it) validation that rejects invalid input with HTTP 400 Bad Request instead of a 500 Internal Server Error when the DB constraint fails (or even worse, you persist invalid data and something breaks later) - is definitely useful.

That said, OpenAPI fails at this somewhat, as the people writing the spec don't consider the (de)serialization and validation libraries. Thus OpenAPI keeps adding features/constraints that are not supported by the libraries and thus cannot be automatically validated, or that even produce nonsense output on generation (e.g. in Java you might end up with raw Map or even Object fields).

[0] https://openapi.tools/#sdk
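A sketch of that 400-versus-500 point, assuming an Express-style handler with Ajv (the route, field names and limits are made up):

    import express from "express";
    import Ajv from "ajv";

    const ajv = new Ajv();
    const validateOrder = ajv.compile({
      type: "object",
      required: ["sku", "quantity"],
      properties: {
        sku: { type: "string", maxLength: 64 },
        quantity: { type: "integer", minimum: 1 }
      },
      additionalProperties: false
    });

    const app = express();
    app.use(express.json());

    app.post("/orders", (req, res) => {
      if (!validateOrder(req.body)) {
        // reject up front with 400 and a pointer to what's wrong,
        // instead of letting a DB constraint blow up into a 500 later
        return res.status(400).json({ errors: validateOrder.errors });
      }
      // ...persist the order...
      res.status(201).end();
    });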


I feel exactly the same.

I also want to add that with an appropriate framework you can automatically generate a response body explaining where in the payload errors were encountered and what those errors are. This allows API consumers to understand what is going wrong far more easily than a 500 being returned.

ASP.NET for example can do exactly this with validation on incoming payloads.

If the payload is invalid, then either way there will be an error, but it's nice to be able to give more helpful errors.


Input validation errors should be reported using HTTP code 400.


What do you do if an order has a negative quantity, if you get a string instead of a number, if a value does not fit in your DB column, etc ?

Isn't that all just basic data validation and sanitation that we all do anyway? And wouldn't it still have to be done, even if the JSON came with all the schemabloat?

I don't think I'm suddenly going to accept that the data presented to my function is valid and fit for purposes just because it goes through someone's JSON schema library. That's just passing the responsibility on to a third party, which feels lazy and dangerous to me.

Maybe it'll be OK for systems that don't get unusual data. But mine are always ingesting strange things, and the only person who knows what's truly valid and what's not is me, not a third party. That's my responsibility as the programmer of the system.

I looked through the web page, and it's probably great for someone. Somewhere. But I'd like to see that SKU example that was presented fully fleshed out with real-world type data. It seems OK if your data only consists of "productID," "productName," and "productTags." But in real life, SKUs are often incredibly complex, and this just seems to invite disaster by making complex systems even more complex. So much so that they require even more complex tools to manage them.


  I don't think I'm suddenly going to accept that the data presented to my function is valid and fit for purposes just because it goes through someone's JSON schema library. That's just passing the responsibility on to a third party, which feels lazy and dangerous to me.
Do you also distrust that third parties can implement HTTP, generate SQL, enforce the constraints in your DDL, do arithmetic operations, etc. correctly ?

Of course there is still a lot of semantic validation you have to do manually (e.g. for an order: does the referenced product exist, is the seller allowed to sell it, at this time, for this price, for this currency, taxes, do the line prices add up to the order price, contingents, etc., etc.).

Expressing these constraints in your interface definition is just another declarative validation layer like your implementation language's type system, database schema/constraints, generic validation libraries[0].

Also, as the server author you are usually also the author of the interface definition containing all these constraints.

[0] E.g. https://beanvalidation.org . The OpenAPI to Java generator will express constraints using BV annotations. BV also integrates with the Hibernate ORM to validate data before you send it to the DB, or you can manually annotate your DTOs/domain model and programmatically trigger the validation at any time, so you only have to know one set of annotations and rely on a single library for validation in any layer of your application.


"That's just passing the responsibility on to a third party, which feels lazy and dangerous to me."

You trust the memory manager, the SSL library and countless others to do their jobs; data validation is no different.

Whether a particular library is up to scratch is, of course, debatable.

Any library worth its salt, like ASP.NET, allows you to define a custom data type, like SKU, with whatever validation your heart desires.


> I don't think I'm suddenly going to accept that the data presented to my function is valid and fit for purposes just because it goes through someone's JSON schema library. That's just passing the responsibility on to a third party, which feels lazy and dangerous to me.

The purpose of a schema isn't to ensure that every business rule is satisfied. It's just the first step of data validation. It's step 1 of input triage. It's saying, "I'm not even going to look at your input until you can demonstrate that the request meets the minimum requirements of a theoretically valid request."

If I'm asking for a Student, and I say they have to have a StudentId, a FirstName, a LastName, and a DateOfBirth, then I can put that into a schema and immediately reject any submission that tries to submit Surname or GivenName. I can go further and say that StudentId is up to 25 characters and can't be blank. DateOfBirth has to be present, must be a date in yyyy-MM-dd format, and it must be in the past, etc.

You can reject the data as incomplete or invalid at a glance. You're setting up a basic set of rules that allows you to instantly discard a whole range of invalid input.
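A sketch of that Student schema (the "must be in the past" rule is the kind of thing that still needs code or a custom format on top):

    {
      "type": "object",
      "required": ["StudentId", "FirstName", "LastName", "DateOfBirth"],
      "additionalProperties": false,
      "properties": {
        "StudentId": { "type": "string", "minLength": 1, "maxLength": 25 },
        "FirstName": { "type": "string" },
        "LastName": { "type": "string" },
        "DateOfBirth": { "type": "string", "format": "date" }
      }
    }

With "additionalProperties": false, a payload that sends Surname or GivenName instead is rejected immediately.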


>> What do you do if an order has a negative quantity, if you get a string instead of a number, if a value does not fit in your DB column, etc ?

> Isn't that all just basic data validation and sanitation that we all do anyway? And wouldn't it still have to be done, even if the JSON came with all the schemabloat?

The point of JSON Schema (and schemas for serialization format in general) is to have a machine readable description of the data constraints that both a writer of the data and a reader of that data can use to automate both validation and documentation rendering.

> I don't think I'm suddenly going to accept that the data presented to my function is valid and fit for purposes just because it goes through someone's JSON schema library. That's just passing the responsibility on to a third party, which feels lazy and dangerous to me.

Do you trust the compiler/runtime of your programming language?

JSON Schema allows a schema definition to be translated into validation code. That may not fit all crazy validation rules, but at least it lets you automate one validation layer from a specification.


Why do you need to be liberal in what you accept when it comes to an API? It just invites subtle bugs. It makes sense for user-generated input, like phone numbers or addresses. But for software? You're encouraging buggy, incomplete software that'll bite you or your client one way or another with that liberal approach.

API must be absolutely strict and unambiguous. It helps everyone in the end. Or you'll end up with parseInt abomination.

And for a strict API you need strict validation. Probably better than a schema can provide, but a schema is a good start, with a declarative description and proven, battle-tested libraries. After a request has been validated against the schema, you can be sure of some facts, like which properties are not null, which values are numbers and so on. This already helps to eliminate lots of unnecessary code.


> This already helps to eliminate lots of unnecessary code.

Well, I think you just moved the validation code to another place. Instead of putting it in your app, where (in my opinion) it belongs, you are putting it in a hard-to-read and difficult to understand external JSON file. I'm not sure how that's an improvement.


What is hard to read and difficult to understand about this:

    schemaValidate(input, { 
        "required": ["price"]
        "properties": { 
            "price": {
                "type": "number",
                "minimum": 0,
                "maximum": 1000
            }
        }})
vs

    if (!input.hasOwnProperty("price"))
        throw Error("Missing price property")
    let price = parseInt(input["price"], 10)
    if (isNaN(price))
        throw Error("Property 'price' is not a number")
    if (price < 0)
        throw Error("Property 'price' is negative")
    if (price > 1000)
        throw Error("Property 'price' is too high")
Not claiming the latter is particularly difficult - in fact it's very easy to understand - but it is slightly more verbose and tedious, and for bigger validations it can become even worse. Here the schema turns every 2-3 lines of custom validation code into 1, provides automatic and better error messages about the cause, and acts as a contract which consumers of the API can read or even use to auto-generate code, GUIs, or autocompletion. For the latter use cases you would of course put the schema in a separate public file. For more complex validation you can of course add checks after calling schemaValidate.


It’s an improvement because it’s a public contract that can be enforced both client(s) and server(s), and a relatively standardized interface format that is parseable by all manner of tooling.

Otherwise, it’s just in your code, where it’s potentially not just “hard-to-read”, but impossible to read for anybody but the authors.

That said, I’m partial to other solutions like protobuf, but what can you do.


It's really not that hard to read or write. I have instructed dozens of young developers on it with almost no supervision and no issues. It's really pretty straightforward.

Also, I can replicate my validations at many levels down the stack: from the client to API-GW to the database model. All with a single definition.


Let's say you have some code that takes some JSON as input, and then iterates over the elements of some array in that JSON. Since JSON is schemaless, you need to check whether the array you want to iterate over is actually an array, and for each item you have to check if it has the correct type. Each time you access any data in the JSON you need to check that it has the correct type, or you will end up with runtime errors.

All that validation takes a lot of effort and makes your code hard to read.

If you verify that the JSON input matches the expected schema at the very beginning, you can skip all the validation and error handling later, since you already know that it is valid. This makes it a lot safer to handle external input.

Then the next step would be if language & tooling also supported JSON schema, so that the way you access JSON documents in code also doesn't violate the schema. This would help prevent bugs from programmer errors.


On the contrary, the error message immediately states exactly how the data is malformed.

Instead of just failing somewhere deep inside the app (even perhaps deferred to later) with "cannot read property of undefined... stack trace"


I'm relatively often manually editing `package.json` files in npm projects - it's useful if my editor can highlight when I make a typo.


Ah yeah, that was one of the nice things about XML documents that had a well managed XSD backing it up. Visual Studio code completion worked on your XML files!

XAML, for example, would suck if it didn't have a bunch of XSD mumbo jumbo backing it up.


Liberal or not, you can express your expectations in a well-defined manner. As to "then what", it's up to you; I'd go with some 4XX (or other technology-specific error) that implies that no processing has occurred.

All arguments for and against schemas in databases translate well to this case.


When using tooling that accepts YAML (or JSON) configuration, I always want something to say "this is what you can write that will have an effect".

As a specific example, when I'm writing a snapcraft.yaml, I want to be able to view a schema to see what all I can put in there. What's important is that the schema is actually used by snapcraft itself for validation, otherwise it's no better than the rest of the snapcraft documentation, which is pretty meh. Schemas are also easy to digest for other tooling, so e.g. my editor can automatically highlight when I make a mistake, without having to specifically write a plugin for each tool that accepts JSON/YAML
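For the editor side of this, VS Code (for example) lets you point JSON files at a schema via the json.schemas setting, and the common YAML extension has an analogous yaml.schemas setting - the file names here are made up:

    {
      "json.schemas": [
        {
          "fileMatch": ["myapp.config.json"],
          "url": "./schemas/myapp.schema.json"
        }
      ]
    }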


I mean, I think there are use-cases where a schema would bring real improvement:

- Error messages: A validator doesn't just have to tell you that a document is invalid, it can also tell you why it is invalid and what you have to do to fix it.

- Better editors or viewers: If e.g. an IDE knows that your JSON is "really" a record or a graph or a table, it could show some specialized UI and make it easier to view or edit those documents.

- Smarter auto-formatting: A schema-aware pretty-printer could format JSON in a way that's more readable than just putting every field on one line.

- Documentation generation: e.g. Swagger does this well: You pass it a schema file and it generates a polished HTML documentation, complete with examples and interactive web client.

- Code generation: You could auto-generate parsers, serialisers or data structures that are specialized for a particular schema. This would probably be most useful for low-level languages or other languages where working with generic JSON is inconvenient.

- More efficient storage: A data store that "knows" that all documents conform to a particular schema could optimize by storing/querying only the schema's data structures and not a generic JSON DOM. (Though to be fair, it's not clear how large the performance improvements from this would be, especially compared to a DB that learns similar information from the usage patterns and data that is actually stored.)

That's a lot of "could"s though. So far, I haven't seen a lot of those use-cases realized. What also frequently baffles me is that use-cases almost seem like an afterthought in schema design. Most specs and blog posts (like this one) read as if a schema doesn't really need any justification and actual uses are just a nice bonus.

Instead of an overengineered spec, I'd prefer an actual software suite which realizes some of the above use-cases - and then brings a well-designed schema language with it.

I believe Swagger/OpenAPI started this way, which might be how they gained traction at all. But judging from the OP, they seem to be taken over by the astronauts again...


I made an argument like yours and kinda talked myself out of it.

Validation while in production makes no sense, I agree. But automated testing… I bet you can catch some unintentional changes to your JSON output while testing. Not all of them, but some. Maybe even enough to justify keeping the schema current.


I never had to learn the intimidating XML stack in depth, but it seems clear that the (superficially definitely more digestible) JSON way of notating data must slowly and painfully reinvent the wheel. Reaching the same level of logical complexity (if it solves the same set of problems) seems unavoidable, no?

So if the main advantage of JSON is human readability (not a machine-oriented attribute btw :-) it might be possible to JSON-ify the XML stack, essentially focusing on appearances and preserving the substance (a bit like the JSON-LD approach).

At the end of the day these are frameworks for public data/metadata exchange, and removing these frictions would be enormously beneficial to everybody...


What makes JSON so nice to use is that it is basically how you already write complex data structures in most loosely typed languages anyway. JSON documents look a hell of a lot like how you'd write the same structure in Perl, Python, JavaScript, and more.

It provides a very natural mapping into the types of native data structures these languages offer. Dumping native data structures in these languages out to JSON is trivial. Dumping these same data structures into XML was always a pain in the ass because you'd have to manually map every field into elements and attributes.


I think so too. JSON Schema enables very useful tooling such as validation, Swagger-style UIs for interacting with services, or declarative web forms that people would have to build again and again.


I’m curious - what’s an argument against this that’s logically consistent with a position that supports databases that have schemas?


Would be good if there was a clear statement about who the authors are and what makes their content authoritative.


> what makes their content authoritative.

Nothing, it’s just relatively widely known and used (e.g. for VS Code and Windows Terminal config files, which I expect a lot of people to have experience with), with relatively good library support in a wide selection of languages. Although I should mention that most of the libraries are stuck on old versions, and the versions they’re stuck on are all over the place: https://json-schema.org/implementations.html


It's a horrible mess, made worse by the several years in which OpenAPI defined their own "dialect" of json-schema for their data schemas which was incompatible with the non-OpenAPI variant of json-schema, so you get libraries that support both, or support only a certain version of each, and because different versions tend to be completely incompatible with each other, you might run into lots of problems if you have more than one language consuming schemas, or even the same language but using different libraries due to transitive dependencies... it's just terrible, just use XML if you need a schema.


The language is a godforsaken monstrosity, but if you only have to write it once for some non-critical validation task it’s acceptable. For instance, I use JSON schema to (optionally) validate some generated JSON data which is fed into a web app as a JSON module. I also generated the TS spec for it from the JSON schema (although writing the TS spec by hand would have been 10–100x easier than writing the JSON schema). XML is obviously not suitable here — TypeScript doesn’t have XML modules.


json-schema.org is the place where the JSON Schema working group at IETF publishes its work.

They also publish on IETF infrastructure. Example: https://datatracker.ietf.org/doc/html/draft-bhutton-json-sch...

Ben Hutton, who is the author of the blog post, is also the editor of the latest published IETF drafts. https://datatracker.ietf.org/person/ben@jsonschema.dev

Do you need some more authoritative source?


> https://datatracker.ietf.org/doc/html/draft-bhutton-json-sch...

That's an expired draft on the informational track. Informational track isn't supposed to be authoritative; drafts even less, let alone expired drafts. Having submitted an Internet-Draft to IETF is an indicator of seriousness, not authority, unless you get it approved on the standards track.


Sounds like a candidate for an "About us" section of the website.


Do you mean the article or the specification?


For those looking to sanitize input from Typescript, I highly recommend the runtypes library: https://github.com/pelotom/runtypes

It lets you specify type definitions via a DSL in TypeScript, using syntax very similar to TypeScript's type definitions. Once you define your types in the DSL, you get TypeScript types and parsing / verification for free. Not as general purpose as JSON Schema, but 1000x cleaner and easier to use.
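
Roughly what usage looks like, going from memory of their README (the shape here is made up):

    import { Record, String, Number, Static } from "runtypes";

    const User = Record({ name: String, age: Number });
    type User = Static<typeof User>;   // a plain TypeScript type, derived for free

    const input = '{"name":"Ada","age":36}';
    const user: User = User.check(JSON.parse(input)); // throws with details if the shape is wrong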


The point of JSON Schema is to have a common format independent of the programming language to ease interoperability.


I've never seen runtypes, but I've seen io-ts [1]. Are you aware of the pros/cons between them?

[1] https://gcanti.github.io/io-ts/


They look roughly the same


Not at all to be critical of JSON Schema, but my experience has been that for most use cases it is overkill. As simple as it may be, it is still relatively complex. As someone mentioned in a prior post, if validation fails, then what? Even though a JSON Schema document is human readable, just by looking at it, it's not intuitive to visualize where a particular path resides in the actual document. And one would need tools written on top of JSON Schema to do operations like merging one JSON document into another while validating at the same time.

To make things simple, I created JDocs. It lets you write the schema in exactly the same structure as the JSON data document, just that the value field contains the validation specifications. Of course it does not have all the advanced features of JSON Schema (like some properties and cross-referencing abilities), but, as I said, it meets our requirements in a really straightforward and simple way and opens up multiple possibilities for manipulating JSON data. You can read about it here. I would welcome feedback, and apologies if you feel it is off track. Thanks.

https://github.com/americanexpress/unify-jdocs


> ...if validation fails, then what? Even though a JSON Schema document is human readable, just by looking at it, it's not intuitive to visualize where a particular path resides in the actual document.

The JSON Schema specification actually does say that errors should indicate both the location within the data where the error occurred and the path in the schema itself. So if you are having difficulty deciphering validation errors, that's a failure of your particular implementation to follow the spec, not of the spec itself.
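
From memory, the "basic" output format in recent drafts reports error units along these lines (the values here are invented), so you can see both where in the schema and where in the data things went wrong:

    {
      "valid": false,
      "keywordLocation": "/properties/age/type",
      "instanceLocation": "/age",
      "error": "Expected integer, found string"
    }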


The way $ref resolution works is hideously complicated, to the point where many JSON Schema implementations just don't even bother supporting external $refs. It really should be a totally separate concern from validation, especially since how you store and compose schemas may end up being specific to a given use case.


Much as I value JSON Schema, building a composite schema, to use and reuse parts, can easily hit a brick wall. I often end up building each schema programmatically and exporting to JSON. Could do better.


A possibly silly question about JSON Schema:

What’s with the emphasis on full URLs to describe where to find related schema? Are developers really assembling a bunch of related schema over the Internet instead of just coalescing them in one place for local use?

Maybe I just don’t understand the use case. Why would I ever want to make a client do a bunch of calls to URLs I don’t own rather than serving them up reliably and consistently at one time?

It feels to me like the idea is some utopia where independent resources everywhere provide schemas and you can start stitching them together. But it just feels… unrealistic and not what I’d actually want.

Is this actively realized today? Or does everyone just reference local schemas by relative file path like I do?


They don't have to be real URLs, just URIs. That is -- the files don't have to be accessible on the network at those URIs -- you can just use these strings as identifiers, and load the files up manually as you need them.

In fact, the JSON Schema specification even says that identifiers are just URIs and that implementations are not expected to load the documents from the network (ref. https://json-schema.org/draft/2020-12/json-schema-core.html#... and https://json-schema.org/draft/2020-12/json-schema-core.html#...)
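
For instance, with Ajv you can pre-register every referenced schema under its $id and nothing ever touches the network (the URIs and file names here are made up):

    // sketch: external $refs resolve against locally registered schemas
    import Ajv from "ajv";
    import addressSchema from "./schemas/address.json"; // contains "$id": "https://example.com/schemas/address"
    import orderSchema from "./schemas/order.json";     // $refs the address schema by that URI

    const ajv = new Ajv();
    ajv.addSchema(addressSchema);              // registered under its $id, nothing is fetched
    const validate = ajv.compile(orderSchema); // the external $ref resolves from the registry
    console.log(validate({ address: { city: "Oslo" } }), validate.errors);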


To be clear, I don’t have a lot of experience with JSON schema, but recently I found myself needing to validate Azure ARM templates using json schema, and my god is the Microsoft provided schema insane. Go look for yourself: https://schema.management.azure.com/schemas/2019-04-01/deplo...

That’s the root schema definition, but there are many, many references within, and they go deep. I tried using the bundling tool mentioned in the article to bundle all the referenced schemas and it came out to 25MB, minified. But if you don’t bundle, then you are right, your tool has to crawl the document and make many more HTTP calls to deref everything. It’s so frustrating to work with.


That thing is ridiculous. Surely they build it in a different language and render the result as JSON.


Great, now we can finally add the next step in "The Ascent of Ward".

EDIT (for the lucky 10.000 of today): http://harmful.cat-v.org/software/xml/


Bundling has long been a need for OpenAPI specification authors: it reduces duplication and allows splitting a big specification into multiple files while still publishing a single one.

A few years ago I wrote a tool to fill that niche: https://github.com/dolmen-go/openapi-preprocessor

I now have to tweak it (well, it will be a major rewrite) to handle $ref relative to $id instead of the file location.


The JSON format is simple and easily readable. It’s now so common, and probably one of the most used languages on earth. That’s great! Now anyone can create a decent API with the basic programming tools available. And everyone will be able to use it.

Why do some people want to over-engineer it?

> { "$id": "http://xn--rvg" }

Oh no. Please don’t do that.


Article:

> "Developers of platforms and libraries that use OpenAPI haven't had such a shake up before, and my feeling is it may take more than a few releases to correctly implement all the new shiny features full JSON Schema has to offer."

Dear OpenAPI, please avoid the shiny features.


Like which? Or are you just knee-jerking at the term "shiny" and presuming it means "over-complicated and unnecessary"?


> "There are several libraries which offer bundling solutions, however they all have caveats, and I haven't seen any to date which are fully JSON Schema aware."

But what's wrong with just stuffing the schemas in a flat array or object exactly?


Expectation of a single schema by existing tooling.


That's pretty vague.



I use json schema. I used to understand what it is and what it does. Now I don't.


Still missing: namespaces, attributes and comments (only half-ironic comment).


Comments are supported. There is a "$comment" keyword, and you are free to invent any other keyword you like, as they are generally ignored.
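
For example:

    { "type": "string", "$comment": "free-form note for humans; validators ignore it" }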


Reinventing JSON-LD.


There certainly are some similarities, but I think JSON-LD and JSON Schema solve different problems. You might find this article helpful (I ended up writing this after being confused by the various standards in this space): https://dashjoin.medium.com/json-schema-schema-org-json-ld-w...


[flagged]


You’d laugh at someone for merely suggesting a technology? Seems like a toxic team.

If you think it’s bad, surely an explanation would win them over and stop their efforts?


Would a library that validates conformance to the schema so that you don't have to handcraft error handling sweeten the proposition?


To be honest, I used JSON Schema for auto-generating layouts.

Eg. https://github.com/rjsf-team/react-jsonschema-form


That depends on whether you're getting JSON data from someone you don't quite trust. For internal uses, sure. If a process sends you bad JSON, you go talk to the team responsible.

But if someone outside your organization is sending you data (product orders, say) you want something like this to ensure it's well-formed before you start evaluating business logic & validation against it.


Do you prohibit the use of VSCode in your workplace? Anyone using that is using JSON Schema. It's built into the application for its settings and configuration files. Plugins and extensions use it to manage their own config _and_ as validation for the editor's API.


Your editor has nothing to do with your product. People use PyCharm to write Python yet would "laugh out" anyone who suggests they start writing Java.


So the VS Code product team should laugh at themselves?

Do you get how ridiculous GGP is now?


So you don't sanitize JSON input to your system? That's what this tool is for. The alternative is writing custom code to validate it, which is just as messy and, worse, bespoke.


History has not been kind to efforts like these.


Software "engineering" repeatedly goes through the same loop: - something simple, easy to learn, easy to use - doesn't cover edge case X - lacks rigour - lets add a bunch of features - and committees and processes to manage that - this is really complicated and hard to learn and use, we need something simpler ...


History has been repeating efforts like this.


Worked well for relational databases.


Ah yes. Because things like "integer" totally need to be abstracted away into another schema. Couldn't have given this DSL built-ins for the things that are... you know... built into JSON already.

Also bonus points for the apparent lack of shorthands, turning this language into a verbose word salad. I hope there's at least a line of reasoning that explains why some keys are prefixed with '$' and others aren't.

Great way to turn something as beautifully simple as JSON into something abhorrent.

Hey. You know what is better than JSON at being XML? XML.


AFAIK there is no integer type in JSON, only a numeric type, and most languages do have different numeric types.


Neither is 'non-negative', but that should also be a built-in, considering how verbose the language becomes otherwise.

Now you're adding ~5 lines of boilerplate every time you want to have an integer, another 5 for positive integers, and so on.


To support non-negative integers, all you need is the following

{"type": "integer", "minimum": 0}

I think the format used in the example was used to demonstrate bundling, but if what you really want is a non-negative integer, the above is the simplest way to do it.


This is correct.

If I had used a more complex example, it would have been much harder to follow. This was the simplest.

We factor this out for our meta-schema construction due to reuse.


https://json-schema.org/draft/2020-12/json-schema-core.html#...

This document defines a core vocabulary that MUST be supported by any implementation, and cannot be disabled. Its keywords are each prefixed with a "$" character to emphasize their required nature. This vocabulary is essential to the functioning of the "application/schema+json" media type, and is used to bootstrap the loading of other vocabularies.



