An in-depth look at the PHP JSON parser, from the perspective of its author, Jakub Zelenka.
The json extension was added to the PHP core in version 5.2. Its parser used an LL(1) grammar defined in a parsing table. The scanner was part of the parser, so there was no visible distinction between the two. The advantage was quite good performance. The disadvantages were its complexity and limited extensibility, which meant that some parts weren't as optimized as they could have been. The main problem, however, was licensing: the parser was released under the JSON license, which contains a problematic clause: “The Software shall be used for Good, not Evil”. That was a big issue for the main Linux distributions (Red Hat, Debian) because the license wasn’t considered a free software license. As a result, the core json extension was unbundled and a slower jsonc extension was used by default in those distros instead.
The issue was fixed in PHP 7.0 by replacing the json extension with code based on the PECL jsond extension. In fact, it was mainly the decoding that changed, with only some structural changes in the encoder. The parser was completely rewritten and now uses the same tools as the PHP language parser: re2c for the scanner and Bison for the parser.
The parser generated by Bison is LALR, so parsing is done bottom-up rather than top-down. It is a more powerful parsing method; however, JSON is a simple deterministic context-free language, so that extra power is not strictly necessary. The main advantage is instead a simpler parser code base, because Bison allows the grammar to be specified in a clean way and provides some useful helpers.
The main purpose of the parser is to check whether a string forms correct JSON in terms of token ordering. The parser also handles initializing arrays and objects and appending elements to them. The low-level creation of tokens is handled by the scanner, so the two kinds of logic are kept separate.
Until recently, the parser was not available to other extensions and none of its logic was exported. That is going to change soon with the upcoming addition of a parser method: a structure filled with callbacks that the parser calls during parsing. At the moment there are 8 callbacks.
The parser method allows the existing functions to be overridden so that custom logic can be used for parsing. It has been done in a way that doesn’t have a negative impact on performance.
The feature is mainly important for other extensions, which can use the parser method to replace parts of the parsing with their own logic. The initial version of the patch was actually sent by the mysqlnd maintainer, so we can expect it to be used there.
In addition, this has some important implications, such as making it possible to use different parser methods for different purposes. That might be especially useful for implementing JSON Schema validation in the JSON parser without affecting the performance of parsing that needs no validation, and without requiring the data to be deserialized first.
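The idea of a parser method can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct layout, the callback names (`array_start`, `array_append`, `array_end`) and the toy `parse_flat_array` driver are hypothetical and are not the signatures used in php-src. The driver accepts only flat arrays of short unsigned integers without whitespace, and calls the callbacks it is given instead of building a value itself. The second half shows a custom set of callbacks that validates elements while parsing, in the spirit of the schema validation mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback table; names and signatures are illustrative only. */
typedef struct {
    bool (*array_start)(void *ctx);
    bool (*array_append)(void *ctx, long element);
    bool (*array_end)(void *ctx);
} parser_methods;

/* Toy driver for inputs such as "[1,2,3]" (no whitespace, at most 9 digits
 * per element). All value handling is delegated to the callbacks in m. */
bool parse_flat_array(const char *p, const parser_methods *m, void *ctx)
{
    assert(p != NULL && m != NULL);
    if (*p++ != '[' || !m->array_start(ctx)) return false;
    if (*p == ']') return p[1] == '\0' && m->array_end(ctx);
    for (;;) {
        if (!isdigit((unsigned char)*p)) return false;
        long value = 0;
        int digits = 0;
        while (isdigit((unsigned char)*p)) {
            if (++digits > 9) return false; /* keep the value within long */
            value = value * 10 + (*p++ - '0');
        }
        if (!m->array_append(ctx, value)) return false; /* callback may veto */
        if (*p == ',') { p++; continue; }
        if (*p == ']') return p[1] == '\0' && m->array_end(ctx);
        return false;
    }
}

/* Custom logic: count elements and reject any element above a limit,
 * so validation happens during parsing rather than afterwards. */
typedef struct {
    long   limit; /* largest element accepted */
    size_t count; /* elements accepted so far */
} bounded_ctx;

static bool bounded_start(void *ctx)
{
    ((bounded_ctx *)ctx)->count = 0;
    return true;
}

static bool bounded_append(void *ctx, long element)
{
    bounded_ctx *b = ctx;
    if (element > b->limit) return false;
    b->count++;
    return true;
}

static bool bounded_end(void *ctx)
{
    (void)ctx; /* nothing to finalize in this sketch */
    return true;
}

const parser_methods bounded_methods = {
    bounded_start, bounded_append, bounded_end
};
```

With a `bounded_ctx` whose `limit` is 10, parsing `"[1,2,3]"` through `bounded_methods` succeeds and leaves `count` at 3, while `"[1,50]"` fails as soon as the second element is appended. Swapping in a different `parser_methods` table changes what the parse does without touching the driver.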
The development of the parser will of course continue, and a couple of features are planned. One of them is error tracking that reports the exact location of an error in the JSON string. That could be exposed to userland through a function like json_last_error_info(), which would return the line and column of the error. Another feature is an experiment with a push parser that would allow iterative decoding, which could then be used for stream parsing. A new function called json_decode_stream could be introduced for that purpose. That could speed up parsing of input considerably. The last change, already mentioned above, would be support for JSON Schema validation and class deserialization from JSON.
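To make the error-location idea concrete, here is a minimal sketch of how a line and column could be derived from the byte offset at which the scanner stopped. It is an assumption about one possible approach, not the planned implementation; `error_location` and `locate_error` are hypothetical names, and columns are counted in bytes rather than in UTF-8 characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: 1-based line and column of an error. */
typedef struct {
    size_t line;
    size_t column;
} error_location;

/* Walks the input up to the failing byte offset, counting newlines.
 * Columns are byte-based, so multi-byte UTF-8 characters count more than once. */
error_location locate_error(const char *json, size_t length, size_t offset)
{
    assert(json != NULL && offset <= length);
    error_location loc = { 1, 1 };
    for (size_t i = 0; i < offset; i++) {
        if (json[i] == '\n') {
            loc.line++;
            loc.column = 1;
        } else {
            loc.column++;
        }
    }
    return loc;
}
```

For the 14-byte input `"{\n  \"a\": tru\n}"`, the invalid literal starts at offset 9, which this function maps to line 2, column 8. A real implementation would more likely track the position incrementally in the scanner instead of rescanning the input after a failure.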
The PHP JSON parser is still being actively developed, and the array of proposed features presented in this article will keep development going for quite some time.
Jakub is an experienced software developer working mostly with PHP and C/C++, currently contracting for the Maple Syrup Media group as a software engineer. As the author of the JSON extension and a contributor to others, Jakub is widely known as a contributor to PHP.