~ Ncdu ~

Yoran Heling
projects@yorhel.nl

Ncdu Export File Format

This document describes the file format that ncdu 1.9 and later use for the export/import feature (the -o and -f options). Check the ncdu manual for a description of how to use that feature.

Top-level object

Ncdu uses JSON notation as its data format. The top-level object is an array:

  [
    <majorver>,
    <minorver>,
    <metadata>,
    <directory>
  ]

Versioning

The <majorver> and <minorver> elements indicate the version of the file format. These are numbers with accepted values in the range of 0 <= version <= 10000. The major version must be 1 and the minor version is currently 0. The major version should increase if backwards-incompatible changes are made (preferably never); the minor version can be increased to indicate additions to the existing format.
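A reader would typically check the outer array and the version numbers before touching the rest of the file. The following is a minimal Python sketch of such a check, not ncdu's own import code. The name parse_header is hypothetical, and requiring exactly four elements is a strictness choice made here rather than something this document spells out.

```python
import json

SUPPORTED_MAJOR = 1  # the only major version this document defines


def parse_header(dump):
    """Split a decoded export into (minorver, metadata, root directory).

    `dump` is the already-decoded top-level JSON value. Raises ValueError
    when it does not look like a version 1.x export.
    """
    if not isinstance(dump, list) or len(dump) != 4:
        raise ValueError("top-level value must be a 4-element array")
    majorver, minorver, metadata, directory = dump
    for label, value in (("major", majorver), ("minor", minorver)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{label} version must be a number")
        if not 0 <= value <= 10000:
            raise ValueError(f"{label} version out of range: {value}")
    if majorver != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported major version: {majorver}")
    # A higher minor version only signals additions, so it is accepted.
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be a JSON object")
    if not isinstance(directory, list) or not directory:
        raise ValueError("directory must be a non-empty array")
    return minorver, metadata, directory


minorver, metadata, root = parse_header(json.loads('[1, 0, {}, [{"name": "/"}]]'))
```

Accepting any minor version within range follows the rule that minor bumps only add to the format; a stricter reader could warn on values it has not seen.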

Metadata

The <metadata> element is a JSON object holding whatever (short) metadata you'd want. This block is currently (1.9-1.12) ignored by ncdu when importing, but ncdu writes out the following keys when exporting:

progname

String, name of the program that generated the file, i.e. "ncdu".

progver

String, version of the program that generated the file, e.g. "1.10".

timestamp

Number, UNIX timestamp as returned by the POSIX time() function at the time the file was generated. Note that this may not necessarily be equivalent to when the directory has been scanned.
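A third-party exporter could emit the same three keys. This is a small Python sketch under that assumption; make_metadata and the "my-exporter" name are hypothetical, and since ncdu ignores the block on import, none of the keys are strictly needed.

```python
import json
import time


def make_metadata(progname, progver):
    """Return a metadata object with the three keys ncdu writes on export."""
    return {
        "progname": progname,
        "progver": progver,
        # time.time() returns a float; truncate to whole seconds to match
        # what the POSIX time() function reports.
        "timestamp": int(time.time()),
    }


print(json.dumps(make_metadata("my-exporter", "0.1")))
```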

Directory Info

A <directory> is represented with a JSON array:

  [
    <infoblock>,
    <directory>, <directory>, <infoblock>, ...
  ]

That is, the first element of the array must be an <infoblock>. If the directory is empty, that will be its only element. If it isn't, its subdirectories and files are listed in the remaining elements. Each subdirectory is represented as a <directory> array again, and each file is represented as just an <infoblock> object.
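The array-versus-object distinction is all a reader needs to tell directories from files. The recursive walk below is a Python sketch of that idea; iter_items is a hypothetical name, and joining names with "/" assumes POSIX-style paths and names that decode to str.

```python
def iter_items(directory, parent_path=None):
    """Yield (path, infoblock, is_dir) for a <directory> array and all below it."""
    info = directory[0]  # the first element is always the directory's own info
    if parent_path is None:
        path = info["name"]  # the top-level name is already a full path
    else:
        path = parent_path.rstrip("/") + "/" + info["name"]
    yield path, info, True
    for child in directory[1:]:
        if isinstance(child, list):
            # A nested array is a subdirectory.
            yield from iter_items(child, path)
        else:
            # A bare object is a file (or other non-directory item).
            yield path.rstrip("/") + "/" + child["name"], child, False


tree = [{"name": "/media/harddrive"}, {"name": "SomeFile"}, [{"name": "EmptyDir"}]]
for path, info, is_dir in iter_items(tree):
    print("dir " if is_dir else "file", path)
```

Recursion keeps the sketch short, but Python's default recursion limit means extremely deep trees would need an explicit stack instead.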

The Info Object

An <infoblock> is a JSON object holding information about a file or directory. The following fields are supported:

name

String (required). Name of the file/dir. For the top-level directory (that is, the <directory> item in the top-level JSON array), this should be the full absolute filesystem path, e.g. "/media/harddrive". For any items below the top-level directory, the name should be just the name of the item.

The name will be in the same encoding as reported by the filesystem (i.e. readdir()). The name may not exceed 32768 bytes.

asize

Number. The apparent file size, as reported by lstat().st_size. If absent, 0 is assumed. Accepted values are in the range of 0 <= asize < 2^63.

dsize

Number. Size of the file, as consumed on the disk. This is obtained through lstat().st_blocks*S_BLKSIZE. If absent, 0 is assumed. Accepted values are in the range of 0 <= dsize < 2^63.

dev

Number. The device ID. Has to be a unique ID within the context of the exported dump, but need not have any meaning outside of it. I.e. this can be a serialization of lstat().st_dev, but also a randomly generated number only used within this file, as long as it uniquely identifies the device/filesystem on which this file is stored. This field may be absent, in which case it is equivalent to that of the parent directory. If this field is absent for the parent directory, a value of 0 is assumed. Accepted values are in the range of 0 <= dev < 2^64.

ino

Number. Inode number as reported by lstat().st_ino. Together with the Device ID this uniquely identifies a file in this dump. In the case of hard links, two objects may appear with the same (dev,ino) combination. A value of 0 is assumed if this field is absent. This is currently (ncdu 1.9-1.12) not a problem as long as the hlnkc field is false; otherwise ncdu will consider all items with the same dev and an absent ino to be a single hardlinked file. Accepted values are in the range of 0 <= ino < 2^64.

hlnkc

Boolean. true if this is a file with lstat().st_nlink > 1. If absent, false is assumed.

read_error

Boolean. true if something went wrong while reading this entry, i.e. the information in this entry may not be complete. For files, this indicates that the lstat() call failed. For directories, this means that an error occurred while obtaining the file listing, and some items may be missing. Note that if lstat() failed, ncdu has no way of knowing whether an item is a file or a directory, so a file with read_error set may just as well be a directory. If absent, false is assumed.

excluded

String. Set if this file or directory is to be excluded from calculation for some reason. The following values are recognized:

"pattern"

If the path matched an exclude pattern.

"otherfs"

If the item is on a different device/filesystem.

Excluded items may still be included in the export, but only by name. dsize, asize and other information may be absent. If this item was excluded by a pattern, ncdu will not do an lstat() on it, and may thus report this item as a file even if it is a directory.

Other values than mentioned above are accepted by ncdu, but are currently interpreted to be equivalent to "pattern". This field should be absent if the item has not been excluded from the calculation.

notreg

Boolean. This is true if neither S_ISREG() nor S_ISDIR() evaluates to true. I.e. this is a symlink, character device, block device, FIFO, socket, or whatever else your system may support. If absent, false is assumed.
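The defaults and the dev inheritance rule described for the fields above can be applied in one place when reading. The Python sketch below does that and then sums dsize while counting each hard-linked file once. The names with_defaults and total_dsize are hypothetical, and the accounting shown is one plausible reading of the field descriptions, not necessarily how ncdu itself computes totals.

```python
def with_defaults(info, parent_dev=0):
    """Return a copy of an <infoblock> with every optional field filled in."""
    excluded = info.get("excluded")
    if excluded is not None and excluded != "otherfs":
        excluded = "pattern"  # unrecognized values are treated like "pattern"
    return {
        "name": info["name"],  # the only required field
        "asize": info.get("asize", 0),
        "dsize": info.get("dsize", 0),
        "dev": info.get("dev", parent_dev),  # inherited from the parent if absent
        "ino": info.get("ino", 0),
        "hlnkc": info.get("hlnkc", False),
        "read_error": info.get("read_error", False),
        "excluded": excluded,  # None means the item was not excluded
        "notreg": info.get("notreg", False),
    }


def total_dsize(directory, parent_dev=0, seen=None):
    """Sum dsize for a <directory>, counting each hard-linked file once."""
    if seen is None:
        seen = set()  # (dev, ino) pairs of hard-linked files already counted
    own = with_defaults(directory[0], parent_dev)
    total = own["dsize"]
    for child in directory[1:]:
        if isinstance(child, list):
            total += total_dsize(child, own["dev"], seen)
            continue
        item = with_defaults(child, own["dev"])
        if item["hlnkc"]:
            key = (item["dev"], item["ino"])
            if key in seen:
                continue  # another link to a file that was already counted
            seen.add(key)
        total += item["dsize"]
    return total
```

Only items with hlnkc set are deduplicated, so entries that merely share a default ino of 0 are still counted separately.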

Miscellaneous notes

As mentioned above, file/directory names are not converted to any specific encoding when exporting. If you want the exported info dump to be valid JSON (and thus valid UTF-8), you'll have to either ensure that there are no non-UTF-8 filenames in your filesystem, or process the dump through a conversion utility such as iconv. When browsing an imported file with ncdu, you'll usually want to ensure that the filenames are in the same encoding as what your terminal is expecting. The browsing interface may look garbled or otherwise ugly if that's not the case.
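For a reader written in Python, one alternative to converting the dump beforehand is to decode it with the surrogateescape error handler, which keeps undecodable bytes recoverable. This is a sketch of a Python-specific workaround, not something ncdu does; load_dump_bytes and name_bytes are hypothetical helper names.

```python
import json


def load_dump_bytes(raw):
    """Decode an export that may contain filenames that are not valid UTF-8."""
    # Invalid bytes become lone surrogates instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="surrogateescape")
    return json.loads(text)


def name_bytes(info):
    """Recover the original filename bytes from a decoded <infoblock>."""
    return info["name"].encode("utf-8", errors="surrogateescape")


raw = b'[1, 0, {}, [{"name": "/tmp/caf\xe9"}]]'  # a Latin-1 byte, invalid as UTF-8
dump = load_dump_bytes(raw)
print(name_bytes(dump[3][0]))
```

Names decoded this way cannot be printed to a UTF-8 terminal without choosing an error handler again, so they are best converted back to bytes before display or filesystem use.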

Another important thing to keep in mind is that an export can be fairly large. If you write a program that reads a file in this format and you care about handling directories with several million files, make sure to optimize for that. For example, prefer the use of a stream-based JSON parser over a JSON library that reads the entire file into a single generic data structure, and only keep the minimum amount of data that you care about in memory.

Example Export

Here's a simple example export that displays the basic structure of the format.

  [
    1,
    0,
    {
      "progname" : "ncdu",
      "progver" : "1.9",
      "timestamp" : 1354477149
    },
    [
      { "name" : "/media/harddrive",
        "dsize" : 4096,
        "asize" : 422,
        "dev" : 39123423,
        "ino" : 29342345
      },
      { "name" : "SomeFile",
        "dsize" : 32768,
        "asize" : 32414,
        "ino" : 91245479284
      },
      [
        { "name" : "EmptyDir",
          "dsize" : 4096,
          "asize" : 10,
          "ino" : 3924
        }
      ]
    ]
  ]

The directory described above has the following structure:

  /media/harddrive
  ├── SomeFile
  └── EmptyDir
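
A drawing like the one above can be derived mechanically from the decoded <directory> array. The following Python sketch shows one way to do it; render_tree and its helper are hypothetical names, and the sample data repeats only the name fields of the example export.

```python
def render_tree(directory):
    """Return the lines of a tree drawing for a decoded <directory> array."""
    lines = [directory[0]["name"]]
    _render_children(directory[1:], "", lines)
    return lines


def _render_children(children, indent, lines):
    for index, child in enumerate(children):
        last = index == len(children) - 1
        is_dir = isinstance(child, list)  # arrays are directories, objects are files
        info = child[0] if is_dir else child
        lines.append(indent + ("└── " if last else "├── ") + info["name"])
        if is_dir:
            # Keep the vertical bar only while more siblings follow.
            _render_children(child[1:], indent + ("    " if last else "│   "), lines)


example = [
    {"name": "/media/harddrive"},
    {"name": "SomeFile"},
    [{"name": "EmptyDir"}],
]
print("\n".join(render_tree(example)))
```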