JSON for Classic C++

(github.com)

101 points | by davikr 1 day ago

14 comments

  • conradev 1 day ago
    I remember searching for a JSON library with minimal dependencies a while ago, and came across this:

    https://rawgit.com/miloyip/nativejson-benchmark/master/sampl...

    The variance in feature set, design and performance is huge across all of them. I ultimately landed on libjson, written in C: https://github.com/vincenthz/libjson

    It does a lot for you, but it notably does not build a tree for you and does not try to interpret numbers, which I found perfect for adding to languages with C FFI that have their own collection and number types. It’s also great for partial parsing if you need to do any sort of streaming.

    It looks like this one can’t currently do partial parsing, but it looks great if C++ maps/vectors are your target.

    • msarnoff 1 day ago
      If you want to go extremely lightweight, there’s jsmn: https://github.com/zserge/jsmn

      It does no dynamic memory allocation, which is a plus in constrained IoT/embedded applications. But it’s really only a tokenizer. For example, if you want to parse fields out of a map, you have to write your own wrappers to iterate over key/value pairs. Since no data is copied out of the original buffer, all the “tokens” are given as byte offsets and lengths, not null-terminated strings, so you can’t just do printf("%s").

      If you can’t (or don’t want to) malloc, it gets the job done. Not sure I’d recommend it for other applications though.

      • conradev 21 hours ago
        I actually evaluated and used jsmn and almost mentioned it in my comment. It was really quite cool, but I believe I couldn’t use it due to the lack of UTF-8 validation. Because UTF-8 validation is in the state machine for libjson, I can actually ignore incomplete UTF-8 escape sequences in incomplete JSON strings when streaming.
      • jdougan 1 day ago
        Is there a reason you can't do printf("%.*s", strlen, strptr); ?
        • deathanatos 20 hours ago
          A non-allocating library would be forced to return to you the unparsed string literal, since returning a parsed string can require an allocation. It might tell you that the literal is valid.

          E.g., take the JSON:

            "\"\uD83D\uDCA9\""
          
          There's no (pointer, length) into that that you can then printf(…, ptr, len); you'd get the escapes, raw.

          Ofc., there might be situations like debugging where that's fine.
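
          A minimal sketch of why the decoded text cannot alias the input, using a made-up helper name (combine_surrogates is not from jsmn or any library mentioned here). The two escapes above take 12 bytes of raw input but decode to one code point, which UTF-8 stores as 4 different bytes, so the result has to be written to some output buffer:

          ```cpp
          #include <cstdint>

          // Hypothetical helper: combines a UTF-16 surrogate pair, as written in
          // JSON \uXXXX escapes, into a single Unicode code point.
          // Returns 0 if the inputs are not a valid high/low pair.
          std::uint32_t combine_surrogates(std::uint32_t high, std::uint32_t low) {
              if (high < 0xD800 || high > 0xDBFF) return 0;  // not a high surrogate
              if (low < 0xDC00 || low > 0xDFFF) return 0;    // not a low surrogate
              return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
          }
          ```

          For the pair above, combine_surrogates(0xD83D, 0xDCA9) gives 0x1F4A9.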

        • jeffreygoesto 1 day ago
          The terminating character is still the closing double quote and not a null, since the library neither copies out nor alters the input. For example, tiny_json replaces the closing quotes with nulls to create C strings, but that needs the full file to be in a mutable buffer, which can be prohibitive for small controllers reading some config from flash only.
          • jdougan 1 day ago
            With the "%.*s" format you need no null at the end. It just counts out the characters:

                #include <stdio.h>
            
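                int main(void)
                {
                  /* Exactly 10 chars, deliberately no terminating null. */
                  char buff[10] = {'R', 'o', 'b', 'o', 't', 't', 'y', 'p', 'e', 's'};
                  /* "%.*s" takes the length as an int argument, then the pointer. */
                  printf("=>%.*s<=", 4, buff + 3);
                  return 0;
                }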
            
            prints

                =>otty<=
            • jeffreygoesto 1 day ago
              Ah. Ok. Scanning the length before printing is mandatory then.
              • jdougan 19 hours ago
                @msarnoff had already stipulated that the json lib is returning lengths:

                    all the “tokens” are given as byte offsets and lengths
              • ska 22 hours ago
                You have to do (a variant of) one or the other, no?
        • msarnoff 22 hours ago
          Yes, that’s exactly what I’ve done.
          • jdougan 13 hours ago
            Any other issues with it? Sibling comments mentioned potential unicode issues.
            • msarnoff 12 hours ago
              I've only used it in an application where we ensure the data is ASCII-only. The only issue is that I've had to write a bunch of wrapper code around it (for looking up object properties by key, iterating over key/value pairs or array elements, etc.)
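
              A rough sketch of what such wrappers can look like. The Token struct and function names are hypothetical (they only mimic start/end offset pairs, this is not jsmn's API), and it assumes a flat object whose tokens alternate key, value, key, value:

              ```cpp
              #include <cstring>
              #include <string>

              // Hypothetical token: byte offsets into the original JSON buffer.
              struct Token {
                  int start;  // offset of the first byte
                  int end;    // offset one past the last byte
              };

              // Compares a token against a null-terminated key without copying it.
              bool token_equals(const char* json, Token t, const char* key) {
                  const std::size_t len = static_cast<std::size_t>(t.end - t.start);
                  return std::strlen(key) == len &&
                         std::memcmp(json + t.start, key, len) == 0;
              }

              // Returns the index of the value token for `key`, or -1 if absent.
              // Assumes tokens alternate key, value, key, value (flat object only).
              int find_value(const char* json, const Token* tokens, int count,
                             const char* key) {
                  for (int i = 0; i + 1 < count; i += 2) {
                      if (token_equals(json, tokens[i], key)) return i + 1;
                  }
                  return -1;
              }

              // Copies a token out into an owned string (this one does allocate).
              std::string token_text(const char* json, Token t) {
                  return std::string(json + t.start,
                                     static_cast<std::size_t>(t.end - t.start));
              }
              ```

              Nested objects and arrays need more bookkeeping than this, which is where most of the wrapper code ends up going.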
      • rurban 1 day ago
        I settled on this one too.

        Far better than nlohmann's monster build times.

    • nwpierce 1 day ago
      Somehow I didn't run across that one in my searching - I'll check it out. I've been working on a json C library myself:

      https://github.com/nwpierce/jsb

      My goal was to convert a stream of JSON to/from a binary stream that is easier to traverse and manipulate.

    • alex_suzuki 1 day ago
      Did you try cJSON? Works well for me. https://github.com/DaveGamble/cJSON
  • darknavi 1 day ago
    Compile time is largely a "developer problem", but so is the usability of a library. nlohmann/json's main selling point is that its interface is usable. Whether a developer values usability at typing time over compile time is an interesting thing to ponder, for sure.
    • jart 1 day ago
      Compile time is a collective problem and usability is an individual problem. I work with llama.cpp. The files in that codebase that were made using nlohmann json take about a minute to compile using g++ -O3 -g, all because one guy who originally wrote it wanted to type fewer keystrokes on his keyboard by using a more magical library, and the rest of us have to suffer for it every time we experiment with a one-line change to those files.
      • chipdart 1 day ago
        > (...) and the rest of us have to suffer for it every time we experiment with a 1 line of code change to those files.

        If you feel this is an issue then why don't you move it to an independent submodule that can be compiled independently? That means you can build it in parallel along with the whole project, and in the end you just link the resulting binaries.

        • fsloth 1 day ago
          > If you feel this is an issue then why don't you move it to an independent submodule that can be compiled independently?

          If it’s a header, you necessarily can’t. The header gets included every time you compile code that depends on it.

          Compilers may offer precompilation etc., but if the code you want to change has a direct dependency on a large header, you need to recompile all of the dependencies.

          This is one of the pain points of C++.

          • pjmlp 1 day ago
            It is a pain point of build management regardless of the language. Even in a language with proper modules, one can get a cascading rebuild if the public interface or module ABI is impacted.

            C++ modules are here; unfortunately, outside the latest VC++ and clang, plus MSBuild or CMake/ninja, they are not an option.

            • papichulo2023 1 day ago
              Are they? According to some people (GitHub issue to support C++ modules in VSCode), the standard is a mess and is likely to go away. VSCode doesn't support modules atm.
              • int_19h 21 hours ago
                Assuming that you're referring to https://github.com/microsoft/vscode-cpptools/issues/6302, I see two comments along these lines, neither of which is from actual implementers. That is evidence neither that the standard is a mess nor that it's likely to go away.

                The reason why this is taking such a long time is because the entire approach is a rather drastic change to how C++ compilers usually work, and C++ compilers (or even frontends, such as the stuff used by IDEs) are complicated things that aren't trivial to make major changes to.

                • jart 8 hours ago
                  I don't understand why they don't just double down on these #include <__fwd/vector.h> etc. headers. They fix everything. The only downside to them is I can't use them on a header that defines a class with std::vector member variables. But if they could make a small tweak to the language so that I could, then I'd take that over the promised modules revolution any day.
                • papichulo2023 18 hours ago
                  Yeah, but the people implementing it are probably more interested in you moving to VS, so there's a slight conflict of interest. I appreciate their work, but I'm also a bit sceptical and think this is the main reason behind it. Six years is a long time for such an important feature. They don't need to support it on all compilers/frontends in order to release it.
              • pjmlp 1 day ago
                Visual Studio is what matters.

                VSCode is never going to be as good; you are better off with CLion then.

                • fsloth 1 day ago
                  This!

                  As a Win/Mac user, Visual Studio is my preferred tool, but on macOS CLion (with VSCode for a few random workflow things not supported in CLion) is an adequate replacement (but Visual Studio remains king).

                  VSCode can be used as an industrial editor if one likes to, but if it does not feel right, it’s not a skill issue.

        • jart 1 day ago
          I just wrote a new server instead. There's nothing I won't do, no lengths I'm not willing to go, when it comes to cutting back on build latency.
          • gary_0 1 day ago
            I follow the same philosophy, to the point where I barely use the STL anymore; most of that template-heavy junk has been replaced in most of my projects. For instance, most of what I typically used <iostream> for was replaced with a 150-line .h (plus a 50-line .cpp that uses explicit template instantiation and a <charconv> include). {fmt} was too heavy for me. And I'm locked into C++17 because C++20 seems to double down on the 20k-line header madness.

            When I was stuck with C++ codebases that forced me to take a mandatory coffee break every time I needed to run a bit of new code, it made me a little bit insane! Never again.
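
            For anyone unfamiliar with the trick, a minimal sketch of the idea, shown as one file with the .h/.cpp split marked in comments. The name to_text and the choice of types are made up for illustration and are not taken from my actual code. Callers see only a declaration, and the body plus <charconv> get compiled once:

            ```cpp
            #include <charconv>
            #include <string>
            #include <system_error>

            // ---- text.h: what callers include. Declaration only, so no template
            // ---- body (and no <charconv>) is parsed in every translation unit.
            template <class T>
            std::string to_text(T value);

            // ---- text.cpp: compiled once ----
            template <class T>
            std::string to_text(T value) {
                char buf[32];  // large enough for any 64-bit integer in base 10
                const std::to_chars_result r = std::to_chars(buf, buf + sizeof buf, value);
                if (r.ec != std::errc()) return std::string();
                return std::string(buf, r.ptr);
            }

            // Explicit instantiations: the only types callers can link against.
            template std::string to_text<int>(int);
            template std::string to_text<long>(long);
            template std::string to_text<unsigned long long>(unsigned long long);
            ```

            The cost is that any type missing from that list becomes a link error rather than something the compiler instantiates for you.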

          • chipdart 1 day ago
            > I just wrote a new server instead.

            I'm sorry, this makes no sense at all. Why would anyone write a new server just because a small component was taking a minute to build?

            • wiseowise 1 day ago
              It makes no sense at all that a person concerned with slow build times rewrote a slow component to compile faster?
      • hnlmorg 1 day ago
        As a prolific contributor to open source yourself, I’d have expected you to be a little more sympathetic to other open source developers giving up their time freely.

        Some contributors have a day job, a family and other personal commitments, so writing open source code is a luxury they don’t have a lot of time for. I know this because I fall exactly into that camp myself.

        • jart 1 day ago
          I'm defending open source developers. We can't freely modify open source code if it has glacial build times. It's specifically because people are volunteering that we should aim to be as conscientious as possible when it comes to build latency. Someone who volunteers to contribute code that compiles slowly is not being respectful of the time of all the other volunteers, which is like pumping the brakes on the open source movement. So I will make my views clear that development practices need to improve.
          • EasyMark 7 hours ago
            Those people can create their own projects then. If the library/FOSS project doesn’t make it because of a lack of popularity, then natural selection of the code worked; if people choose to accept the latency, then it succeeds because its utility is worth more than the time saved during a compile.
            • jart 6 hours ago
              That's the same argument companies like Google make when they're in the exploit phase of their lifecycle: we can afford to let search results get worse, just so long as they're not so much worse that people turn to our competitors. However they're doing it for money, so at least they have a good reason. Who can say why someone would make their software unpleasant with template metaprogramming?
        • wiseowise 1 day ago
          Just because they give up their time freely it makes their decisions immune to criticism?
          • hnlmorg 1 day ago
            Constructive feedback is fine. jart's comment wasn’t that.
            • wiseowise 1 day ago
              We will never know if jart's comment was constructive or not until we know the original developer's decision process.

              If the original decision process was indeed “fewer keystrokes”, then how is that not constructive criticism?

              • hnlmorg 1 day ago
                The developer's motives don’t change the snarky way jart wrote their comment.

                And if you felt their comment was acceptable then I question how much you’ve contributed to open source yourself. Snarky comments like jart's are all too common and really demotivate people from maintaining popular projects.

                But don’t just take my word on it, there’s a plethora of other contributors who’ve talked about this topic as well.

                • TeMPOraL 1 day ago
                  Where's the snark though? jart's comment reads true literally.

                  Compile times are a big deal, and 'jart is right about individual vs. collective problems. And unlike most other critics on the Internet, 'jart actually provided a solution along with the criticism. If that kind of behavior "demotivates [some] people from maintaining popular projects", I still feel it's a net win.

                • wiseowise 16 hours ago
                  What difference does it even make if it’s an open source project or not? Compile times are a big deal.
                  • hnlmorg 6 hours ago
                    I’m not talking about compile times. I’m talking about the way people should communicate respectfully.
                    • wiseowise 5 hours ago
                      Believe me, his message is as respectful as it gets.

                      I have far, far, FAR stronger words for “people” who don’t respect other people’s time by not caring about compile times. Words that would make Linus blush.

                    • jart 6 hours ago
                      Well maybe you should be, since focusing on the technology is what allows our disagreements to be impersonal.
              • gr4vityWall 22 hours ago
                I don't think a supposedly bad decision has to be answered with being snarky. A pull request, or a fork focused on reducing build times are actual net gains. From that poster's original name, seems like they went on and did just that, which is great I believe.

                At the very least, giving the original developer the benefit of doubt, or assuming their decision made sense under the circumstances they were in at the time, is IMO a better start than just public criticism.

      • occz 1 day ago
        While the pursuit of faster build times is definitely a worthy cause, I feel like there's something I'm not quite seeing here. Does the JSON-code change frequently enough to incur build cache misses and the full minute penalty? Is there something inherent about the structure of the library that makes it unable to have its compilation be cached? Is the code structured in such a way that editing other code requires also invalidating the cache for the JSON-related code? I guess one way would be to break out the JSON parsing code to its own module and have it produce language-specific structs to be interacted with by the rest of the program.
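
        A rough sketch of that last idea, shown as one file with the header/source split marked in comments. The names (ServerConfig, parse_server_config) are hypothetical, and a crude string search stands in for the real JSON library so that it compiles on its own; it is not a real parser:

        ```cpp
        #include <cstddef>
        #include <string>

        // ---- server_config.h: the only thing the rest of the codebase includes ----
        struct ServerConfig {
            std::string host;
            int port;
        };
        ServerConfig parse_server_config(const std::string& json_text);

        // ---- server_config.cpp: the one translation unit that would include the
        // ---- heavy JSON header. The helper below is a placeholder for that library.
        namespace {
        std::string raw_value(const std::string& text, const std::string& key) {
            const std::string needle = "\"" + key + "\"";
            std::size_t pos = text.find(needle);
            if (pos == std::string::npos) return "";
            pos = text.find(':', pos + needle.size());
            if (pos == std::string::npos) return "";
            const std::size_t begin = text.find_first_not_of(" \t\r\n", pos + 1);
            if (begin == std::string::npos) return "";
            std::size_t end = text.find_first_of(",}", begin);
            if (end == std::string::npos) end = text.size();
            std::string value = text.substr(begin, end - begin);
            value.erase(value.find_last_not_of(" \t\r\n") + 1);  // trim trailing space
            return value;
        }
        }  // namespace

        ServerConfig parse_server_config(const std::string& json_text) {
            ServerConfig config;
            std::string host = raw_value(json_text, "host");
            if (host.size() >= 2 && host[0] == '"' && host[host.size() - 1] == '"') {
                host = host.substr(1, host.size() - 2);  // drop surrounding quotes
            }
            config.host = host;
            const std::string port = raw_value(json_text, "port");
            config.port = port.empty() ? 0 : std::stoi(port);
            return config;
        }
        ```

        With that split, editing code that only touches ServerConfig would not re-parse the JSON library's header, though it does mean copying values out of the parsed document.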
        • jart 1 day ago
          Programming is the process of manipulating data structures, so if you're building a JSON server, then every piece of code in your server is going to be dealing with and operating on JSON data structures. It can't be neatly tucked away in a corner. Because it would be foolish to design a server that makes needless copies of all its inputs and outputs. This truth would be the same if you were using something like protobuf instead. Therefore it's important that your fundamental data structures be something that (a) you can control, and (b) doesn't make everything it touches take forever to build. Do you feel in control of someone else's 24000 line header full of template magic? If that thing is sitting between me and my data structures, then I will wipe it out of existence.
        • mort96 1 day ago
          It seems like nlohmann/json is a header-only library, meaning the entire library has to be compiled once for every source file which uses it, any time that source file or its includes have been updated.

          So I guess in a JSON-heavy code base or a code base where nlohmann/json has leaked into common headers, you may end up recompiling the library a few dozen times per build where a few dozen of your C++ source files must be recompiled (e.g due to common header changes)...

          (But don't worry, the linker will then spend a bunch of time throwing away almost all of that work so you only get one copy of the library in your binary)

          • occz 1 day ago
            I missed that part. That is a pretty significant downside in that case.
            • jart 7 hours ago
              It places a hard scaling limit on how big an open source project can become. Projects like the Linux Kernel spend enormous amounts of political capital restraining decadent programming practices, since it's the only way a codebase like that can maintain the support of its developers and grow. For example, Linux had a rule until 2018 that everything had to be able to compile with GCC 3.2 from 2003. They're much more laid back these days, since it's difficult to imagine Linux growing bigger than it already has. But I think a newer project like llama.cpp would be well advised to follow the example of what projects like Linux did in their growth phase, rather than following their leadership today. It requires a lot of discipline, toil, and restraint to be a leader in open source, because you're essentially offering the world a pot of gold, and that only works if you keep very little for yourself.
        • wiseowise 1 day ago
          > Does the JSON-code change frequently enough to incur build cache misses and the full minute penalty?

          The moment you switch branches - it changes.

          If you develop for Android, it generates a build dir with a hashed name derived from some CMake/Gradle variables; the moment one of those changes (like the AGP version), you get a new build dir and essentially have to compile from scratch.

          • occz 1 day ago
            If you're on something reasonably smart like Bazel it will be able to determine whether the module itself has been changed and requires recompilation instead of running from cache.
            • wiseowise 23 hours ago
              Nice.

              We, and majority of Android projects, aren’t on Bazel, though.

              • occz 22 hours ago
                This is true, and it's kind of a bummer to be honest. There's some serious time being wasted on recompilation that could be avoided with a really sharp build system.

                Bazel comes with its own bag of sharp edges though so it's unfortunately not like you can just adopt it and be on your merry way.

    • chipdart 1 day ago
      > Compile time is largely a "developer problem", but so is the usability of a library.

      Compile time is way more than a "developer problem". It's an operational problem that ends up permeating software architecture and development practices, and ultimately affects how the whole project is delivered and deployed.

    • marmakoide 1 day ago
      Significantly faster compilation means less friction to iterate on ideas and try things, which in the end leads to more polished results.

      A nice interface is agreeable, but maybe there are diminishing returns when you pay for it with long compile times. I remember pondering that when working with the Eigen math library, which is very nice but such a resource hog when you compile a project using it.

  • henshao 1 day ago
    Really interesting that nlohmann isn't fully compliant. What cases are these?

    It seems to me though that if you're encountering the edges of json where nlohmann or simple parsing doesn't work properly, a binary format might be better. And if you're trying to serialize so much data that speed actually becomes an issue, then again, binary format might be what you really want.

    The killer feature of nlohmann is the NLOHMANN_DEFINE_TYPE_INTRUSIVE and NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE macros that handle all of the ??? -> json -> ??? steps for you. That alone makes it my default go-to unless the above reasons force me to go in another direction.

  • leni536 1 day ago
    On the other end of the spectrum there is [1]. It's both performance and usability oriented, although compile times are probably higher.

    Nlohmann is the slowest out of the popular libraries, AFAIK, and not particularly more usable than rapidjson, in my experience. So "better than nlohmann" is not very novel.

    [1] https://github.com/beached/daw_json_link

  • psyclobe 1 day ago
    The moment nlohmann's library came out, I switched to it and I never looked back.

    I loved the interface and it's exactly how I would've designed a json library with modern c++.

    Just maybe turn off the implicit conversion option, that can get a bit messy ;)

  • zeroq 1 day ago
    "This project is a reaction agains..." is such a punk move I can't do anything but appreciate.
  • cod1r 1 day ago
    jart is such a good programmer. a lot of people already know this but i just have to give props where it's due.
  • makz 1 day ago
    What does “Classic C++” mean?
    • jll29 1 day ago
      This library is nicely concise, and the code is mostly readable (although there are some non-obvious tricks that could be better documented).

      The Makefile could need some work:

        json_test.cpp:360:23: warning: missing terminating '"' character [-Winvalid-pp-token]
        { Json::success, R"({
                              ^
        fatal error: too many errors emitted, stopping now [-ferror-limit=]
        9 warnings and 20 errors generated.
        make: *** [json_test.o] Error 1
        % c++ --version                   
        Apple clang version 15.0.0 (clang-1500.1.0.2.5)
        Target: arm64-apple-darwin22.6.0
        Thread model: posix
        InstalledDir: /Library/Developer/CommandLineTools/usr/bin
      
      Compiling directly with

        c++ --std=c++11 -c json.cpp
      
      works fine, though.
    • jandrewrogers 1 day ago
      There are approximately three major dialects of C++. They are distinguished by major changes in what idiomatic code looks like, enabled by the addition of core features to the language that made it more efficient and type-safe to express many things.

      The era of so-called “modern” C++ started with C++11, which was a radical reworking of the language. All prior versions of C++ are “legacy” or “classic”. Idiomatic code in the “modern” and “classic” dialects looks almost like two different languages.

      C++20 arguably marks a new dialect break but it doesn’t have a colloquial label to distinguish it from “legacy” and “modern” AFAIK. Idiomatic C++20 looks pretty foreign from a C++11 perspective (but is unambiguously an improvement).

      • jart 1 day ago
        This library supports building with C++11. I haven't tried compiling it with an older standard, but I imagine it might work. One thing I like about the C++11 compilers like GCC 4.9 is they build code magnificently faster than recent editions. See https://x.com/JustineTunney/status/1795427808631758936
        • aninteger 1 day ago
          > This library supports building with C++11. I haven't tried compiling it with an older standard, but I imagine it might work.

          I believe it does require C++11, due to std::nullptr_t and r-value references (&&), but that might be it. It's not a show stopper though since everyone should have a c++11 compiler now (even Ubuntu 14.04 LTS, which still has paid support I believe).

          > One thing I like about the C++11 compilers like GCC 4.9 is they build code magnificently faster than recent editions

          Kind of reminds me of gcc 2.95 which people kept around for the compiler speed. They would use gcc 3.x for the warning support and then compile with gcc 2.95 after fixing the warnings :).

          • jart 1 day ago
            Yes they'd be very trivial to remove locally. It might also be nice to have #ifdef statements around them like we're already doing for std::string_view. If we consider that many big name C projects like curl are still on C89 then there's surely got to be people still out there using 2000's era C++.
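
            A minimal sketch of what those guards could look like, on a made-up Value class rather than the actual Json class from the library. It keys off the standard __cplusplus macro, so the C++11-only constructors vanish under an older standard:

            ```cpp
            #include <cstddef>
            #include <string>
            #include <utility>

            // Hypothetical stand-in for a JSON value type; not the json.cpp class.
            class Value {
            public:
                Value() : is_null_(true) {}
                Value(const std::string& s) : is_null_(false), text_(s) {}
            #if __cplusplus >= 201103L
                // Only visible to C++11 and later compilers.
                Value(std::nullptr_t) : is_null_(true) {}
                Value(std::string&& s) : is_null_(false), text_(std::move(s)) {}
            #endif
                bool is_null() const { return is_null_; }
                const std::string& text() const { return text_; }

            private:
                bool is_null_;
                std::string text_;
            };
            ```

            Older compilers would just fall back to the copying constructor.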
          • chipdart 1 day ago
            > It's not a show stopper though since everyone should have a c++11 compiler now (...)

            I think the point of pointing out it's C++11 is that it's not "classic C++" as it's using "modern C++" features. Thus it's a mystery why it would be referred to as classic C++.

            • jart 1 day ago
              Just because I included an rvalue constructor doesn't make it C++11. This library was originally written in C. It hasn't changed a whole lot since Gautham and I originally wrote it: https://github.com/jart/cosmopolitan/blob/master/tool/net/lj... I feel perfectly comfortable calling C++11 "classic" or even "baroque" compared to what people are doing with C++ in 2024. However if you disagree with me, and feel that classic means C++03, then I've made certain that your preferences are supported by this library too. Just remove the rvalue and nullptr_t constructors. I'll probably add #ifdefs soon to automate that too.
              • chipdart 18 hours ago
                > Just because I included an rvalue constructor doesn't make it C++11.

                Actually, it does. I mean, does it compile when you pass -std=c++98?

                > This library was originally written in C.

                Doesn't matter. If it uses C++11 features, it's C++11.

                > I feel perfectly comfortable calling C++11 "classic" or even "baroque" compared to what people are doing with C++ in 2024.

                Irrelevant. You can go the Humpty Dumpty way as far as you want to go and call anything any way. It doesn't matter. If you use C++11 features, it's C++11. If it's C++11 then you're discussing modern C++. You don't need to use all bells and whistles to qualify.

      • int_19h 20 hours ago
        I don't think it's entirely accurate. "Modern idiomatic C++" was a thing already before C++11 - that would be the kind of code that heavily used the standard library and especially STL containers, iterators etc (but also stuff like auto_ptr etc; and yes, for all its flaws, it was actually used).

        And don't forget that C++03 TR1 also added a bunch of very useful stuff - most notably, std::shared_ptr and std::function. And, of course, Boost has been a thing long before C++11, filling many gaps for "modern C++" projects of the time.

        "classic C++" from that perspective is C++ written more or less Java-style.

    • pbrowne011 1 day ago
      "Classic C++" and "Modern C++" refer to the language before and after C++11, respectively.

      Some of the key differences are use of standard library and its containers, smart pointers, and other language features that look less like C. In this specific library, this refers to some of the techniques like bit manipulation, manual memory management and string parsing, and using things like enums to improve speed and reduce complexity.

      An example of a more robust (but still "classic") library would be something like https://github.com/Tencent/rapidjson.

  • 0x1ceb00da 1 day ago
    https://github.com/jart/json.cpp/blob/4f0a02dab1af7d81888cf5...

    The response doesn't tell you the location of the problem in the input.

    • jart 1 day ago
      That might actually be the explanation for why json.cpp benchmarks 39x faster than nlohmann's library if I include the failure test cases.
  • ur-whale 1 day ago
    Code in jart's version is refreshingly clean and easy to read compared to nlohmann's version.

    As an aside, I wonder: what are the ThomPike* set of macros actually doing in jart's implementation?

    Also, a speed comparison of this vs the other one would be very welcome: conformance and simplicity are certainly important criteria when picking a JSON parser, but speed is rather crucial.

    • jart 1 day ago
      Thompson Pike encoding. It predates the UTF-8 standard and was invented on a napkin in a New Jersey diner. It allows the full spectrum of 32-bit numbers to be encoded, rather than restricting characters to only those also present in UTF-16. The json.cpp library enforces UTF-8 restrictions on parsing, because we have no choice. But you're allowed to serialize anything you want, thanks to the ThomPike macros.
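
      A rough sketch of the original 1992 scheme for the curious. This is not the ThomPike macros from json.cpp, and the function name is made up. It uses the same bit layout as UTF-8 but keeps going past four bytes, covering values up to 2^31 - 1 in at most six bytes, and it does not reject surrogates; anything larger is outside what this sketch handles:

      ```cpp
      #include <cstdint>
      #include <string>

      // Hypothetical encoder for the classic Thompson-Pike byte layout.
      // Returns an empty string for values that need more than 31 bits.
      std::string thompson_pike_encode(std::uint32_t value) {
          if (value < 0x80) return std::string(1, static_cast<char>(value));
          int extra;           // number of continuation bytes
          std::uint32_t lead;  // marker bits of the first byte
          if (value < 0x800) { extra = 1; lead = 0xC0; }
          else if (value < 0x10000) { extra = 2; lead = 0xE0; }
          else if (value < 0x200000) { extra = 3; lead = 0xF0; }
          else if (value < 0x4000000) { extra = 4; lead = 0xF8; }
          else if (value < 0x80000000u) { extra = 5; lead = 0xFC; }
          else return std::string();
          std::string out;
          out.push_back(static_cast<char>(lead | (value >> (6 * extra))));
          for (int i = extra - 1; i >= 0; --i) {
              // Each continuation byte carries 6 payload bits under a 10xxxxxx marker.
              out.push_back(static_cast<char>(0x80 | ((value >> (6 * i)) & 0x3F)));
          }
          return out;
      }
      ```

      For U+1F4A9 this produces the bytes F0 9F 92 A9, the same as UTF-8; a lone surrogate like 0xD800 still comes out as three bytes, which a strict UTF-8 parser would refuse.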
  • tsurnyc 1 day ago
    What are the performance numbers? nlohmann/json is no speed demon.
  • UncleOxidant 1 day ago
    Sounds like there's a backlash to modern C++.
  • nurettin 1 day ago
    This is a fine library, but I use nlohmann extensively and haven't experienced any considerable compilation slowdown once I added it to the project.

    Overloading from_json to modularize parsing is really useful, I think that should be a part of every templated C++ json parser library.

    That said, I have seen these ThomPike* macros in cosmopolitan.h before, I wonder what the origin is.

  • madduci 1 day ago
    Interesting approach, but not providing a Conan/vcpkg package in (the end of) 2024 only creates friction.

    We are not living in the 90s anymore.

    • epcoa 1 day ago
      Dunking on nlohmann for performance is pretty easy. I’m interested in what the value proposition is over one of rapidjson, glaze, or simdjson (all of which have some amount of SIMD or SWAR optimization, and more importantly SAX and the use of something other than std::map)