Ncdu JSON Export File Format
This document describes the file format that ncdu 1.9 and later use for the export/import feature (the -o and -f options). Check the ncdu manual for a description of how to use that feature.
Top-level object
Ncdu uses JSON notation as its data format. The top-level object is an array:
[
<majorver>,
<minorver>,
<metadata>,
<directory>
]
Versioning
The <majorver> and <minorver> elements indicate the version of the file format. These are numbers with accepted values in the range 0 <= version <= 10000. The major version must be 1. The minor version is 0 for ncdu 1.9 through 1.12, 1 for ncdu 1.13 through 1.15.2 (which added the extended mode), and 2 since ncdu 1.16 (which added the nlink field). The major version should increase if backwards-incompatible changes are made (preferably never); the minor version can be increased to indicate additions to the existing format.
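The version rules above can be checked before reading the rest of a dump. The following is a minimal sketch, not ncdu's own logic: the function name `check_header` is hypothetical, and rejecting anything other than exactly four top-level elements or non-integer version numbers is a strict reading chosen for this example.

```python
def check_header(dump):
    """Validate the top-level array and return (major, minor).

    `dump` is the already-parsed top-level JSON value.
    """
    # Assumption: exactly the four elements shown in the format description.
    if not isinstance(dump, list) or len(dump) != 4:
        raise ValueError("top-level value must be a 4-element array")
    major, minor = dump[0], dump[1]
    for label, value in (("major", major), ("minor", minor)):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{label} version must be an integer")
        if not 0 <= value <= 10000:
            raise ValueError(f"{label} version out of range: {value}")
    if major != 1:
        raise ValueError(f"unsupported major version: {major}")
    return major, minor
```

A reader would typically accept any minor version, since minor bumps only indicate additions to the format.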
Metadata
The <metadata> element is a JSON object holding whatever (short) metadata you’d want. This block is currently (1.9-1.16) ignored by ncdu when importing, but ncdu writes out the following keys when exporting:
- progname: String, name of the program that generated the file, i.e. "ncdu".
- progver: String, version of the program that generated the file, e.g. "1.10".
- timestamp: Number, UNIX timestamp as returned by the POSIX time() function at the time the file was generated. Note that this is not necessarily equivalent to the time at which the directory was scanned.
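A third-party exporter could write the same three keys. This is a small sketch in which the function name `make_metadata` and the program name used in the usage line are made up for illustration:

```python
import time

def make_metadata(progname, progver):
    """Build a metadata object with the three keys that ncdu writes."""
    return {
        "progname": progname,
        "progver": progver,
        # Whole seconds since the epoch, taken when the file is generated.
        "timestamp": int(time.time()),
    }

meta = make_metadata("example-exporter", "0.1")  # hypothetical program name
```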
Directory Info
A <directory> is represented with a JSON array:
[
<infoblock>,
<directory>, <directory>, <infoblock>, ...
]
That is, the first element of the array must be an <infoblock>. If the directory is empty, that will be its only element. If it isn’t, its subdirectories and files are listed in the remaining elements. Each subdirectory is represented as a <directory> array again, and each file is represented as just an <infoblock> object.
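Because arrays are directories and objects are files, a reader can tell them apart by type alone. The sketch below walks an already-parsed <directory> and yields a path for every entry. The function name `walk` and the way paths are joined with "/" are choices made for this example, and the recursion would need to be replaced with an explicit stack for very deep trees.

```python
def walk(directory, parent=None):
    """Yield (path, infoblock, is_dir) for a parsed <directory> array."""
    info = directory[0]  # the first element is always the infoblock
    if parent is None:
        path = info["name"]  # top-level name is the absolute path
    else:
        path = parent.rstrip("/") + "/" + info["name"]
    yield path, info, True
    for item in directory[1:]:
        if isinstance(item, list):
            # A nested array is a subdirectory.
            yield from walk(item, path)
        else:
            # A bare object is a file (or other non-directory item).
            yield path.rstrip("/") + "/" + item["name"], item, False

tree = [
    {"name": "/media/harddrive"},
    {"name": "SomeFile"},
    [{"name": "EmptyDir"}],
]
paths = [path for path, _info, _is_dir in walk(tree)]
```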
The Info Object
An <infoblock> is a JSON object holding information about a file or directory. The following fields are supported:
- name: String (required). Name of the file/dir. For the top-level directory (that is, the <directory> item in the top-level JSON array), this should be the full absolute filesystem path, e.g. "/media/harddrive". For any items below the top-level directory, the name should be just the name of the item. The name will be in the same encoding as reported by the filesystem (i.e. readdir()). The name may not exceed 32768 bytes.
- asize: Number. The apparent file size, as reported by lstat().st_size. If absent, 0 is assumed. Accepted values are in the range 0 <= asize < 2^63.
- dsize: Number. Size of the file, as consumed on the disk. This is obtained through lstat().st_blocks*S_BLKSIZE. If absent, 0 is assumed. Accepted values are in the range 0 <= dsize < 2^63.
- dev: Number. The device ID. Has to be a unique ID within the context of the exported dump, but need not have any meaning outside of that. I.e. this can be a serialization of lstat().st_dev, but also a randomly generated number only used within this file, as long as it uniquely identifies the device/filesystem on which this file is stored. This field may be absent, in which case it is equivalent to that of the parent directory. If this field is absent for the parent directory, a value of 0 is assumed. Accepted values are in the range 0 <= dev < 2^64.
- ino: Number. Inode number as reported by lstat().st_ino. Together with the device ID this uniquely identifies a file in this dump. In the case of hard links, two objects may appear with the same (dev, ino) combination. As of ncdu 1.16, this field is only exported if st_nlink > 1. A value of 0 is assumed if this field is absent, which is fine as long as the hlnkc field is false and nlink is 1; otherwise everything with the same dev and empty ino values will be considered as a single hardlinked file. Accepted values are in the range 0 <= ino < 2^64.
- hlnkc: Boolean. true if this is a file with lstat().st_nlink > 1. This field is redundant if the nlink field is also set, but is still included in new dumps for backwards compatibility with ncdu versions prior to 1.16. If both this and the nlink fields are absent, false is assumed.
- read_error: Boolean. true if something went wrong while reading this entry, i.e. the information in this entry may not be complete. For files, this indicates that the lstat() call failed. For directories, this means that an error occurred while obtaining the file listing, and some items may be missing. Note that if lstat() failed, ncdu has no way of knowing whether an item is a file or a directory, so a file with read_error set might as well be a directory. If absent, false is assumed.
- excluded: String. Set if this file or directory is to be excluded from calculation for some reason. The following values are recognized:
  - "pattern": If the path matched an exclude pattern.
  - "otherfs" or "othfs": If the item is on a different device/filesystem. Every version of ncdu recognizes "otherfs" when importing, but versions before 1.21 or 2.5 wrote "othfs" when exporting. Later versions recognize both strings and output "otherfs".
  - "kernfs": If the item has been excluded with --exclude-kernfs (since ncdu 1.15).
  - "frmlink": If the item is a firmlink and hasn’t been followed with --follow-firmlinks (since ncdu 1.15).
  Excluded items may still be included in the export, but only by name. size, asize and other information may be absent. If this item was excluded by a pattern, ncdu will not do an lstat() on it, and may thus report this item as a file even if it is a directory. Values other than those mentioned above are accepted by ncdu, but are currently interpreted to be equivalent to "pattern". This field should be absent if the item has not been excluded from the calculation.
- nlink: (since ncdu 1.16) Number, the value of lstat().st_nlink. If this field is present and has a value larger than 1, this file is considered for hardlink counting. Accepted values are in the range 1 <= nlink < 2^32. If absent, 1 is assumed.
- notreg: Boolean. This is true if neither S_ISREG() nor S_ISDIR() evaluates to true, i.e. this is a symlink, character device, block device, FIFO, socket, or whatever else your system may support. If absent, false is assumed.
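The defaults listed above (0 for sizes and ino, dev inherited from the parent, 1 for nlink, false for the booleans) can be applied in one place when reading. The sketch below does that and then sums dsize over a tree, counting each hardlinked (dev, ino) pair once and skipping excluded file entries. The function names are hypothetical, and the counting policy is one simple choice for illustration, not a description of how ncdu itself accounts for hard links.

```python
def normalize(info, parent_dev=0):
    """Return a copy of an infoblock with the documented defaults filled in."""
    nlink = info.get("nlink", 1)
    return {
        "name": info["name"],                # the only required field
        "asize": info.get("asize", 0),
        "dsize": info.get("dsize", 0),
        "dev": info.get("dev", parent_dev),  # inherited when absent
        "ino": info.get("ino", 0),
        "nlink": nlink,
        # Either the old hlnkc flag or nlink > 1 marks a hard link.
        "hardlink": nlink > 1 or info.get("hlnkc", False),
        "read_error": info.get("read_error", False),
        "excluded": info.get("excluded"),    # None when not excluded
        "notreg": info.get("notreg", False),
    }

def total_dsize(directory, parent_dev=0, seen=None):
    """Sum dsize over a <directory>, counting each hardlinked inode once."""
    seen = set() if seen is None else seen
    head = normalize(directory[0], parent_dev)
    total = head["dsize"]
    for item in directory[1:]:
        if isinstance(item, list):
            total += total_dsize(item, head["dev"], seen)
            continue
        entry = normalize(item, head["dev"])
        if entry["excluded"] is not None:
            continue  # policy of this sketch: skip excluded entries
        if entry["hardlink"]:
            key = (entry["dev"], entry["ino"])
            if key in seen:
                continue
            seen.add(key)
        total += entry["dsize"]
    return total
```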
Extended information
In addition, the following fields are exported when extended information mode is enabled (available since ncdu 1.13). See the -e flag in ncdu(1) for details.
- uid: Number, ID of the user who owns the file. Accepted values are in the range 0 <= uid < 2^31.
- gid: Number, ID of the group that owns the file. Accepted values are in the range 0 <= gid < 2^31.
- mode: Number, the raw file mode as returned by lstat(3). For Linux systems, see inode(7) for the interpretation of this field. Accepted range: 0 <= mode < 2^16.
- mtime: Number, last modification time as a UNIX timestamp. Accepted range: 0 <= mtime < 2^64. As of ncdu 1.16, this number may also include an (infinite precision) decimal part for fractional seconds, though the decimal part is (currently) discarded during import.
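As an illustration of how a reader might interpret these fields, the sketch below uses Python's standard stat module to render the mode and truncates mtime to whole seconds, as the text says ncdu does on import. It assumes the mode bits follow the usual POSIX layout of the system running the code, and the function name `describe_extended` is hypothetical. Very large mtime values within the accepted range may not be representable as a datetime.

```python
import stat
from datetime import datetime, timezone

def describe_extended(info):
    """Summarize the optional extended fields of an infoblock dict."""
    out = {"uid": info.get("uid"), "gid": info.get("gid")}
    mode = info.get("mode")
    if mode is not None:
        out["perms"] = stat.filemode(mode)       # ls-style permission string
        out["is_symlink"] = stat.S_ISLNK(mode)
    mtime = info.get("mtime")
    if mtime is not None:
        # Drop any fractional seconds before converting.
        out["mtime"] = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return out

summary = describe_extended(
    {"name": "f", "uid": 1000, "gid": 1000, "mode": 0o100644, "mtime": 1354477149.75}
)
```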
Miscellaneous notes
As mentioned above, file/directory names are not converted to any specific encoding when exporting. If you want the exported info dump to be valid JSON (and thus valid UTF-8), you’ll have to either ensure that all filenames in your directory tree are valid UTF-8 or process the dump through a conversion utility such as iconv. When browsing an imported file with ncdu 1.x, you’ll usually want to ensure that the filenames are in the same encoding as what your terminal is expecting. The browsing interface may look garbled or otherwise ugly if that’s not the case.
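A program that reads dumps may also choose to tolerate names that are not valid UTF-8 instead of converting the file first. One way to do this in Python, shown here as a sketch with hypothetical function names, is to decode the raw bytes with the surrogateescape error handler, which maps each undecodable byte to a placeholder code point that can later be turned back into the original byte.

```python
import json

def load_dump_bytes(raw):
    """Parse a dump given as bytes, tolerating names that are not valid UTF-8."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return json.loads(text)

def name_to_bytes(name):
    """Recover the original filename bytes from a name loaded as above."""
    return name.encode("utf-8", errors="surrogateescape")

# The second name contains a Latin-1 encoded character (byte 0xE9).
raw = b'[1,0,{},[{"name":"/tmp"},{"name":"caf\xe9"}]]'
dump = load_dump_bytes(raw)
original = name_to_bytes(dump[3][1]["name"])
```

Names loaded this way cannot be printed or re-encoded as strict UTF-8 without the same error handler, so this only suits tools that treat names as opaque bytes.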
Another important thing to keep in mind is that an export can be fairly large. If you write a program that reads a file in this format and you care about handling directories with several million files, make sure to optimize for that. For example, prefer the use of a stream-based JSON parser over a JSON library that reads the entire file in a single generic data structure, and only keep the minimum amount of data that you care about in memory.
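Python's standard library has no stream-based JSON parser, so the sketch below illustrates only the second half of that advice: keeping less data per entry. It uses the object_hook parameter of json.loads to replace each infoblock with a small (name, dsize) tuple while parsing. The whole tree is still built in memory, so for very large dumps a real streaming parser remains the better choice. The function names are hypothetical.

```python
import json

def slim(obj):
    """object_hook: shrink infoblocks to (name, dsize); leave other objects alone."""
    if "name" not in obj:
        return obj  # e.g. the metadata object
    return (obj["name"], obj.get("dsize", 0))

def load_slim(text):
    """Parse a dump, keeping only the name and dsize of every entry."""
    return json.loads(text, object_hook=slim)

text = '[1,0,{"progname":"ncdu"},[{"name":"/d","dsize":4096},{"name":"f","asize":3},[{"name":"e"}]]]'
slim_dump = load_slim(text)
```

Directories remain lists and files become tuples, so the two can still be told apart by type.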
BUG#245: Ncdu versions before 1.21 or 2.7 did not correctly read escaped UTF-16 surrogate pairs. For portability with these versions, Unicode code points above U+FFFF should be written as literal UTF-8 rather than as escaped surrogate pairs. Ncdu itself never outputs UTF-16 surrogate pairs.
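This matters for exporters built on general-purpose JSON libraries, which may escape such characters by default. As an example, Python's json.dumps emits escaped surrogate pairs unless ensure_ascii is disabled. The helper name below is hypothetical.

```python
import json

def encode_name(name):
    """Serialize a name as a JSON string using literal UTF-8, not \\u escapes."""
    return json.dumps(name, ensure_ascii=False)

emoji = "\U0001F600"                 # a code point above U+FFFF
escaped = json.dumps(emoji)          # default: escaped as a surrogate pair
literal = encode_name(emoji)         # the character itself, inside quotes
utf8_bytes = literal.encode("utf-8")
```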
Example Export
Here’s a simple example export that displays the basic structure of the format.
[
  1,
  0,
  {
    "progname" : "ncdu",
    "progver" : "1.9",
    "timestamp" : 1354477149
  },
  [
    { "name" : "/media/harddrive",
      "dsize" : 4096,
      "asize" : 422,
      "dev" : 39123423,
      "ino" : 29342345
    },
    { "name" : "SomeFile",
      "dsize" : 32768,
      "asize" : 32414,
      "ino" : 91245479284
    },
    [
      { "name" : "EmptyDir",
        "dsize" : 4096,
        "asize" : 10,
        "ino" : 3924
      }
    ]
  ]
]
The directory described above has the following structure:
/media/harddrive
├── SomeFile
└── EmptyDir
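To tie the pieces together, here is a minimal sketch of an exporter that produces a structure of the same shape for a real directory. It is not how ncdu is implemented. It assumes a POSIX system where st_blocks is counted in 512-byte units, omits dev, excluded and read_error handling entirely, and uses a made-up program name and version in the metadata. It writes minor version 2 because it emits the nlink field.

```python
import json
import os
import stat
import time

def infoblock(path, name):
    """Return (infoblock dict, is_dir) for one filesystem entry."""
    st = os.lstat(path)
    info = {
        "name": name,
        "asize": st.st_size,
        "dsize": st.st_blocks * 512,  # assumption: 512-byte block units
        "ino": st.st_ino,
    }
    is_dir = stat.S_ISDIR(st.st_mode)
    if st.st_nlink > 1 and not is_dir:
        info["hlnkc"] = True          # kept for readers older than 1.16
        info["nlink"] = st.st_nlink
    if not (stat.S_ISREG(st.st_mode) or is_dir):
        info["notreg"] = True
    return info, is_dir

def scan(path, info):
    """Build the <directory> array for `path`, whose infoblock is `info`."""
    entries = [info]
    for entry in sorted(os.listdir(path)):
        child = os.path.join(path, entry)
        child_info, is_dir = infoblock(child, entry)
        entries.append(scan(child, child_info) if is_dir else child_info)
    return entries

def export(root):
    """Return the top-level array for the directory tree at `root`."""
    root = os.path.abspath(root)  # top-level name is the absolute path
    meta = {
        "progname": "example-exporter",  # hypothetical program name
        "progver": "0.1",
        "timestamp": int(time.time()),
    }
    info, _ = infoblock(root, root)
    return [1, 2, meta, scan(root, info)]
```

The result can be serialized with `json.dumps(export(path), ensure_ascii=False)`, provided all names in the tree are valid UTF-8.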