@thekid thekid commented Feb 26, 2023

This is useful when interacting with large amounts of streamed JSON data, e.g. from an OData endpoint, see http://docs.oasis-open.org/odata/odata-json-format/v4.01/odata-json-format-v4.01.html#sec_CollectionofEntities. In contrast to other JSON streaming formats such as newline-delimited JSON, the payload is one huge object, with the records nested inside its `value` key:

{
  "@context": "...",
  "@count": 37,
  "value": [
    { ... },
    { ... },
    { ... }
  ],
  "@nextLink": "...?$skiptoken=342r89"
}
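
For contrast, newline-delimited JSON puts one complete JSON document per line, so it can be consumed record by record with nothing but `json_decode()`. A minimal plain-PHP sketch (not using this library, the sample records are made up):

```php
<?php
// Each line is a self-contained JSON document; there is no surrounding
// object or array to parse first.
$ndjson = <<<'JSON'
{"id": 1, "login": "petroav"}
{"id": 2, "login": "thekid"}
JSON;

// Decode one record at a time without ever holding the whole payload:
foreach (explode("\n", $ndjson) as $line) {
  $record = json_decode($line, true);
  echo $record['login'], "\n";
}
// Prints: petroav, then thekid
```

With the single-huge-object payload shown above, this line-based approach does not work, which is why a streaming parser that can address elements inside the document is needed.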

Example

use util\address\{JsonStreaming, StructureOf};
use peer\http\HttpConnection;
use util\cmd\Console;

// Returns about 25 MB of JSON starting like this:
// [
//    {
//       "actor" : {
//          "avatar_url" : "https://avatars.githubusercontent.com/u/665991?",
//          "gravatar_id" : "",
//          "id" : 665991,
//          "login" : "petroav",
//          "url" : "https://api.github.com/users/petroav"
//       },
//       "created_at" : "2015-01-01T15:00:00Z",
//       "id" : "2489651045",
//       "payload" : {
//          "description" : "...",
//          "master_branch" : "master",
//          "pusher_type" : "user",
//          "ref" : "master",
//          "ref_type" : "branch"
//       },
//       "public" : true,
//       "repo" : {
//          "id" : 28688495,
//          "name" : "petroav/6.828",
//          "url" : "https://api.github.com/repos/petroav/6.828"
//       },
//       "type" : "CreateEvent"
//    },
//    ...
// ]
$http= new HttpConnection('https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json');

$definition= new StructureOf();
$stream= new JsonStreaming($http->get()->in());
foreach ($stream->pointers('//[]') as $pointer) {
  Console::writeLine($pointer->value($definition));
}

Performance

The above script runs for 14.2 seconds (limited by my old DSL line's download speed) and consumes 4197.281 kB of memory at its peak. Its "pure PHP" equivalent using json_decode(file_get_contents(...)) runs for roughly the same time, but its memory consumption peaks at 133902.258 kB (32x as much!). Also, due to its streaming nature, the above script starts yielding values immediately after the HTTP request/response dialog, while the "pure PHP" equivalent blocks for 99% of the time and then yields all values at once.
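
For reference, the "pure PHP" equivalent measured above looks roughly like this sketch. It buffers the entire ~25 MB response and materializes the complete array before the loop can start, which explains both the memory peak and the long initial blocking:

```php
<?php
// Fetches and buffers the whole document in one string, then decodes it
// into one large array - peak memory is dominated by that array.
$events = json_decode(file_get_contents(
  'https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json'
), true);

// Only now, after everything has been downloaded and decoded, do we see
// the first value; the streaming version starts here almost immediately.
foreach ($events as $event) {
  echo $event['type'], "\n";
}
```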
