Support counting the contents of symlinks #14

Closed
opened 2019-03-23 18:03:23 +01:00 by Multi · 0 comments

(Automatically imported from issue 16 of the old bug tracker)

Description

Created: 2012-08-01 18:33:21 GMT by Anonymous

I am using git-annex, which makes heavy use of symlinks. I want to clear up some files, but ncdu doesn't count symlinks, and instead reports all the disk space used in the .git/annex directory.

Is there any way to add an option to count a symlink A to a file B as if it were the same size as B? I know this has problems with double counting files, but it would be a great help in this case.

Reply 1

Added: 2014-01-24 00:44:32 GMT by Florian

I need this feature as well. du offers this via du -L.

It would be awesome to have this in ncdu as well.

Reply 2

Added: 2014-01-24 10:24:58 GMT by Yorhel

Here's a trivial patch to add the -L flag and have it follow symlinks: http://p.blicky.net/j3py5

There's one major problem with it: it's terribly insecure. Create a 'ln -s .. parent' link and ncdu will eat all your memory and/or fill all your disk space when exporting to a file. It looks like du avoids this by checking whether it has already scanned the directory that is linked to, but I'm not sure how it does this check (path name comparison after some form of normalization? dev/inode comparison?). Implementing either solution is not going to be trivial in the current code base, unfortunately.

One simple solution I can think of is to limit -L to only follow symlinks to a file, and not recurse into directory symlinks. That would be inconsistent with du, and I'm not sure if that serves all use cases. Another "solution" is to keep the vulnerability in there and add a huge warning in the manual (though I'm sure someone will ignore that and run ncdu -L on an untrusted directory anyway...).
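The "files only" idea can be sketched in a few lines. This is a rough Python illustration, not ncdu's actual C code: the function name `size_following_file_links` is made up for the example, and it sums apparent sizes (`st_size`) rather than disk blocks to keep things short. Symlinks that resolve to a regular file are counted at the target's size; symlinks to directories, broken links, and anything else are counted as the link itself, so a 'ln -s .. parent' link cannot cause unbounded recursion.

```python
import os
import stat


def size_following_file_links(path):
    """Sum apparent sizes under `path`, following symlinks to regular files only.

    Hypothetical helper for illustration; directory symlinks are never entered.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                try:
                    target = os.stat(entry.path)  # follows the link
                except OSError:
                    target = None  # broken link or unresolvable loop
                if target is not None and stat.S_ISREG(target.st_mode):
                    # Symlink to a file: count it as if it were the file.
                    total += target.st_size
                else:
                    # Directory symlink (or broken link): count the link only.
                    total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                # Real directory: recurse as usual.
                total += size_following_file_links(entry.path)
            else:
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Note that a file reachable both directly and through a symlink inside the scanned tree is counted twice here, which is the double-counting trade-off mentioned in the original report.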

I'm open to suggestions.

Reply 3

Added: 2014-01-24 11:32:46 GMT by Florian

First: I'm no C programmer, so I can only guess ^^

But it looks like the coreutils du uses a dict with inode and device to track duplicate files. What about that?

Reply 4

Added: 2014-01-24 11:50:37 GMT by Yorhel

That's what I meant with dev/inode comparison. I'm not a huge fan of that solution because it requires a lookup table containing all directory entries, but only when the -L option is present. That's, in my opinion, a moderately large code+complexity increase just for handling a very uncommon (but scary) situation.

There's another annoyance that I just thought of. What to do when the symlink to a directory is scanned before the directory itself is scanned? With the dev/inode comparison solution, the symlink will appear to be the actual directory and the actual directory will appear to be the symlink. That's rather unintuitive and may result in wrong directory sizes. Correctly handling that situation will increase code complexity even more. :-(

(Sometimes it feels that all I ever do is complain...)
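For readers who want to see the trade-off concretely, here is a minimal Python sketch of the dev/inode approach discussed above. It is an illustration under assumptions, not how du or ncdu implement it: the name `size_following_all_links` is hypothetical, and apparent size (`st_size`) stands in for disk usage. Every entry that is followed gets recorded in a `(st_dev, st_ino)` set, which is the lookup table of all entries that the comment above objects to.

```python
import os
import stat


def size_following_all_links(path, seen=None):
    """Sum apparent sizes under `path`, following all symlinks.

    Hypothetical helper: a set of (device, inode) pairs stops loops and
    prevents the same file or directory from being counted twice.
    """
    if seen is None:
        seen = set()
    try:
        st = os.stat(path)  # follows symlinks
    except OSError:
        # Broken link or unresolvable loop: count the link itself.
        return os.lstat(path).st_size
    key = (st.st_dev, st.st_ino)
    if key in seen:
        return 0  # already counted via some other path
    seen.add(key)
    if not stat.S_ISDIR(st.st_mode):
        return st.st_size
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            total += size_following_all_links(entry.path, seen)
    return total
```

With a 'ln -s .. parent' link this terminates, because the scanned directory is already in the set when it is reached again through the link. It also shows the ordering annoyance: a directory's size is attributed to whichever path reaches it first, so if the symlink is scanned before the real directory, the real directory later reports 0.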

Reply 5

Added: 2014-11-01 21:29:52 GMT by anarcat

The patch provided works well.

Reply 6

Added: 2017-03-11 16:17:23 GMT by Rolf

Another happy and thankful ncdu and git-annex user here.

I'm not a programmer and I'm not sure of the current algorithms used in ncdu, but maybe you can use 'find' to generate the list of files to consider? The find program seems to have loop detection when using the -L parameter.

Reply 7

Added: 2017-03-11 16:19:50 GMT by Rolf

Using find would make it trivial to deal with bug 33 as well.

Reply 8

Added: 2017-03-11 16:23:30 GMT by Rolf

and bug 56

Reply 9

Added: 2018-07-23 09:57:05 GMT by Anonymous

> One simple solution I can think of is to limit -L to only follow symlinks to a file, and not recurse into directory symlinks. That would be inconsistent with du, and I'm not sure if that serves all use cases

Even a partial solution is better than no solution in my opinion. And in this case it would at least cover this use case (git-annex) since it only makes use of symlinks to files.

Reply 10

Added: 2019-01-24 08:51:23 GMT by Yorhel; Status: new to fixed; Closed: no to yes

The git version now has a simple -L, --follow-symlinks option, which should be sufficient for a git-annex directory.

Reply 11

Added: 2019-01-24 09:48:51 GMT by Rolf

Awesome! Thank you very much indeed. I will check it out.

Reply 12

Added: 2019-01-24 10:25:01 GMT by Rolf

Are you going to tag 1.14 soon?

Reply 13

Added: 2019-01-24 10:29:23 GMT by Yorhel

Unless something strange happens, I expect to release a 1.14 in a week or two.

Multi added the imported label 2019-03-23 18:03:23 +01:00
Multi closed this issue 2019-03-23 18:03:23 +01:00