ntfsclone rescue extremely slow for little benefit 

Joined: Tue Jan 01, 2013 15:46
Posts: 6
Post ntfsclone rescue extremely slow for little benefit
Modern hard drives have 4k sectors, but ntfsclone has no option to read and handle errors in larger chunks. The current copy_cluster() implementation leads to 9 waits for every bad 4k cluster instead of 1: one for the 4k read, and another 8 for the component 512B reads from rescue_sector().
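
To make the "9 waits" arithmetic concrete, here is a small Python model of the fallback described above. It is not the ntfsclone source: the function names are made up for illustration, and it assumes that any read touching a bad 4k physical sector fails.

```python
PHYSICAL_SECTOR = 4096  # assumed physical sector size of the drive


def read_fails(offset, length, bad_physical_sectors):
    """Model assumption: a read fails if it touches any bad physical sector."""
    first = offset // PHYSICAL_SECTOR
    last = (offset + length - 1) // PHYSICAL_SECTOR
    return any(s in bad_physical_sectors for s in range(first, last + 1))


def failed_reads_for_cluster(cluster_offset, bad_physical_sectors,
                             cluster_size=4096, rescue_unit=512):
    """Count failing reads (each one a drive timeout) while copying one cluster."""
    if not read_fails(cluster_offset, cluster_size, bad_physical_sectors):
        return 0
    failures = 1  # the cluster-sized read itself
    # Fallback: re-read the cluster in rescue_unit-sized pieces.
    for off in range(cluster_offset, cluster_offset + cluster_size, rescue_unit):
        if read_fails(off, rescue_unit, bad_physical_sectors):
            failures += 1
    return failures


bad = {10}  # physical sector number 10 is unreadable
print(failed_reads_for_cluster(10 * 4096, bad))                    # 9
print(failed_reads_for_cluster(10 * 4096, bad, rescue_unit=4096))  # 2
```

With a 4096-byte rescue unit the model still retries once, so it counts 2 timeouts rather than 1. Skipping the retry entirely would bring it down to 1.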

I thought it would save time to copy my windows install off a drive that was just starting to fail in a couple places, rather than do a fresh install and re-config all my settings, then restore my data from backups. It's been copying for about 24 hours now.

By comparison, rsync gives up on a file as soon as it hits an error in the file, and made a copy of the filesystem tree from a readonly mount in maybe 2 or 3 hours (110GB).

With the size of modern hard drives, and how long they take to time out when reading a failing sector, it's not worth the nearly 10x extra time trying to read it unless the sector contains valuable metadata. Esp. as 4k sector drives become more common, and partition tools start defaulting to aligning partitions to boundaries equal to or greater than 4k. In the worst case for this, where only a single 512B sector is actually damaged, losing the extra 3.5k is unlikely to make things much worse. A multimedia file might still play, a compressed file is usually completely useless regardless of how little is missing, and a text file is probably junk.

If you think it's worth the complexity, you could have an option to control the error behaviour, e.g. --slow-rescue for the current behaviour, and --rescue to just move on without any retries if the 4k read of the cluster failed. If you wanted to be fancy, --fast-rescue could skip the rest of a file's data as soon as you hit an error anywhere in the file, i.e. only bother with files that can be recovered fully. That would actually be useful to have; knowing what's undamaged and fully usable is one of the more time-consuming parts of sorting out what you managed to save from a dying drive. (I only back up my important data, but then I always find myself spending a lot of time seeing what I did manage to copy from a dying drive, in case some of it was newer than the backups, or if there was anything I kinda still want that wasn't part of what I backed up, etc. etc.)
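
A rough sketch of how those three behaviours would differ, as a Python model rather than anything in ntfsclone. The mode names follow the options proposed above; `RescueMode` and `copy_file` are hypothetical, and the model assumes a bad cluster is entirely unreadable, so every sector retry inside it fails too.

```python
from enum import Enum


class RescueMode(Enum):
    SLOW = "slow-rescue"    # after a failed cluster read, retry each 512B sector
    NORMAL = "rescue"       # after a failed cluster read, move on to the next cluster
    FAST = "fast-rescue"    # after a failed cluster read, skip the rest of the file


def copy_file(file_clusters, bad_clusters, mode, sectors_per_cluster=8):
    """Return (reads_issued, clusters_recovered, file_complete) for one file."""
    reads = 0
    recovered = []
    complete = True
    for cluster in file_clusters:
        reads += 1  # the cluster-sized read
        if cluster not in bad_clusters:
            recovered.append(cluster)
            continue
        complete = False
        if mode is RescueMode.SLOW:
            reads += sectors_per_cluster  # one extra read per sector
        elif mode is RescueMode.FAST:
            break  # the file cannot be recovered fully, stop here
    return reads, recovered, complete


clusters = list(range(100, 110))
bad = {103, 107}
for mode in RescueMode:
    reads, recovered, complete = copy_file(clusters, bad, mode)
    print(mode.value, reads, len(recovered), complete)
```

In this model the 10-cluster file with 2 bad clusters costs 26 reads in the slow mode, 10 in the plain rescue mode, and 4 in the fast mode. The `complete` flag is the part that would answer "which files are fully usable".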

I wish I'd straced or looked at the source sooner, I would have re-started the copy with a modified version that does what I suggested for --rescue.


Tue Jan 01, 2013 16:23
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: ntfsclone rescue extremely slow for little benefit
Hi,

Quote:
modern hard drives have 4k sectors, but ntfsclone has no option to read and handle errors in larger chunks. The current copy_cluster() implementation leads to 9 waits for every bad 4k, instead of 1. (one for the 4k read, and another 8 for the component 512B reads from rescue_sector()).

Hmm. Yes, this code is indeed sub-optimal for 4k sectors.

Thank you for the report. I am putting the need for improvement on my todo list.

Jean-Pierre


Tue Jan 01, 2013 23:37

Joined: Tue Jan 01, 2013 15:46
Posts: 6
Post Re: ntfsclone rescue extremely slow for little benefit
First, sorry if I sounded a bit annoyed in my first post. Getting Windows to boot is pretty annoying, even worse than the worst parts of old LILO and GRUB combined, it seems. Before using ntfsclone, I had an rsync-recovered filesystem tree which I gave up on because I couldn't get it to boot. If Windows wasn't an uncooperative piece of crap, I would have had my system up and running days ago.

Anyway, after posting the above, I looked at the strace output more carefully. I noticed that sometimes after failing on a 4k read, all 8 512B reads would succeed. (This is on a WD20EARS-00MVWB0, WD 2TB Green Power, 4k sectors.) The 4k read timeouts usually took 120 secs, but the failing 512B read timeouts would range from 115 secs down to as short as 16 secs. Usually the longer timeouts came on the first 512B reads in a 4k sector, with shorter timeouts later in the sector (since it's essentially retrying in the same 4k sector on the drive). Possibly it's worth retrying the 4k read once in a 4k-sector slow recovery mode.
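
Rough arithmetic from the timeouts quoted above, as a Python snippet. It assumes the worst case where the cluster read and all 8 sector reads fail (not something measured), and the constant names are just for illustration.

```python
CLUSTER_TIMEOUT_S = 120      # observed timeout of a failing 4k read
SECTOR_TIMEOUT_MIN_S = 16    # shortest observed failing 512B read
SECTOR_TIMEOUT_MAX_S = 115   # longest observed failing 512B read
SECTORS_PER_CLUSTER = 8
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_bad_cluster(sector_timeout_s, retry_sectors=True):
    """Time spent on one bad cluster, assuming every sector retry also fails."""
    total = CLUSTER_TIMEOUT_S
    if retry_sectors:
        total += SECTORS_PER_CLUSTER * sector_timeout_s
    return total


best = seconds_per_bad_cluster(SECTOR_TIMEOUT_MIN_S)        # 248 s
worst = seconds_per_bad_cluster(SECTOR_TIMEOUT_MAX_S)       # 1040 s
single = seconds_per_bad_cluster(0, retry_sectors=False)    # 120 s

print(best, worst, single)
# Bad clusters that can be processed in 24 hours under each assumption:
print(SECONDS_PER_DAY // worst, SECONDS_PER_DAY // best, SECONDS_PER_DAY // single)
```

So under these assumptions a day of copying gets through somewhere between 83 and 348 bad clusters with sector retries, versus 720 with a single attempt per cluster.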


Wed Jan 02, 2013 02:27

Joined: Tue Jan 01, 2013 15:46
Posts: 6
Post Re: ntfsclone rescue extremely slow for little benefit
Thanks for taking a look at my suggestions :)

jpa wrote:
Hi,
Hmm. Yes, this code is indeed sub-optimal for 4k sectors.

Thank you for the report. I am putting the need for improvement on my todo list.

Jean-Pierre


Even for older hard drives with 512B sectors, I think most people would prefer a faster recovery that doesn't take so long trying to get every last recoverable sector, esp. outside of metadata.

It literally took ntfsclone about 36 hours to dump a 400GB partition with 115GB in use, with most of the data recoverable. As I said in my last msg, my Linux 3.2.0 (Ubuntu Precise livecd) reading a Western Digital 2TB Green Power drive does about 2 minute timeouts on failed read(2) system calls, down to 30 or 15 seconds for later 512B chunks within an already failed 4k block. I really don't know whether this is typical of failing 512B sector drives, or if this WD takes more time than most before giving up. I didn't take the time to plug in a failing older drive to check.

WD RAID-edition drives have much shorter fail timeouts, since you're supposed to use them in redundant setups where a failure falls back to another drive.


If you wanted to get really fancy about it, you could measure how long the read timeouts were taking, and decide to minimize retries if they were really slow. This is probably more complexity than is appropriate for a program that's designed to be a filesystem dumper, not primarily a recovery tool. On the other hand, rescue dd can't avoid trying to read the whole device, not just the used portions.
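
A minimal sketch of that "measure the timeouts, then cut back on retries" idea, in Python. `AdaptiveRetryPolicy`, `timed_read`, and the 30-second threshold are all assumptions made up for illustration; only `os.pread` and `time.monotonic` are real library calls.

```python
import os
import time


class AdaptiveRetryPolicy:
    """Allow sector retries only while failed reads are observed to be fast."""

    def __init__(self, slow_threshold_s=30.0, max_retries=8):
        self.slow_threshold_s = slow_threshold_s
        self.max_retries = max_retries
        self.failure_times = []

    def record_failure(self, elapsed_s):
        self.failure_times.append(elapsed_s)

    def retries_allowed(self):
        if not self.failure_times:
            return self.max_retries
        average = sum(self.failure_times) / len(self.failure_times)
        # Once failures cost more than the threshold on average, stop retrying.
        return 0 if average >= self.slow_threshold_s else self.max_retries


def timed_read(fd, offset, length, policy, clock=time.monotonic):
    """Read with os.pread; on an I/O error, record how long the failure took."""
    start = clock()
    try:
        return os.pread(fd, length, offset)
    except OSError:
        policy.record_failure(clock() - start)
        return None
```

A caller would check `policy.retries_allowed()` after each failed `timed_read()` before deciding whether to re-read the cluster in smaller pieces. Averaging over all failures is the simplest choice; a real tool might prefer the most recent few.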

It would be neat if there was a library for reading from possibly-bad block devices or files, with its own parameter handling, so we could get as fancy and configurable as we wanted to without bloating ntfsclone.c.


For my own recovery situation, I extracted my 400GB ntfs image to a sparse file on a spare 320GB drive I had lying around (and it finally finished), and resized it down to 175GiB, so now I can ntfsclone it onto the 200GB hard drive I wanted to use in my windows machine. I didn't have any unused hard drives lying around that were as big as the system partition on my failing windows machine :/

Oh, it would be nice to have an option to skip pagefile.sys and hiberfil.sys. That's 14GB of useless data for my case.


Wed Jan 02, 2013 02:55
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: ntfsclone rescue extremely slow for little benefit
Hi,

Quote:
Anyway, after posting the above, I looked at the strace output more carefully. I noticed that sometimes after failing on a 4k read, all 8 512B reads would succeed.

I do not have any real 4k-sectored device, so I cannot check myself, but I would think that only a full sector can be read, and that reading a part of it implies reading the full sector and either returning a read error or returning the requested part. If another part of the same sector is then requested, it is just read from cache (or tried to read again if there were a read error).
This means that the 512B reads from a sector should either all be successful or all fail (except for the situation where a sick sector can be read at a second or later attempt), and the significant unit for rescuing is the physical sector.
Quote:
For my own recovery situation, I extracted my 400GB ntfs image to a sparse file on a spare 320GB drive I had lying around (and it finally finished), and resized it down to 175GiB, so now I can ntfsclone it onto the 200GB hard drive I wanted to use in my windows machine.

This is a nice achievement, despite the time spent.

Regards

Jean-Pierre


Wed Jan 02, 2013 10:21

Joined: Tue Jan 01, 2013 15:46
Posts: 6
Post Re: ntfsclone rescue extremely slow for little benefit
jpa wrote:
Hi,
I do not have any real 4k-sectored device, so I cannot check myself, but I would think that only a full sector can be read, and that reading a part of it implies reading the full sector and either returning a read error or returning the requested part. If another part of the same sector is then requested, it is just read from cache (or tried to read again if there were a read error).
This means that the 512B reads from a sector should either all be successful or all fail (except for the situation where a sick sector can be read at a second or later attempt), and the significant unit for rescuing is the physical sector.


Yes, that's my understanding. The only thing that surprised me was that a sector could read ok on the 2nd try sometimes. So in the unlikely case you did want to retry 8 times, it should be from the start of the sector every time, so you get the whole sector. This is assuming the filesystem is on a 4kiB-aligned partition. (Older linux fdisk defaulted to making partitions that weren't aligned on 4k boundaries, but I think Windows has aligned them for a while. So I expect there are at least some ntfs partitions in the wild that aren't aligned, and some of those have probably been whole-disk-DDed to new 4k-sector disks.)
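
To illustrate the alignment point, a small Python sketch that works out which physical sectors a cluster overlaps, given where the partition starts on the disk. The function name is hypothetical, and the two partition starts are just the classic examples (1MiB-aligned, and the old 63-sector offset).

```python
def physical_sectors_for_cluster(partition_start, cluster_offset,
                                 cluster_size=4096, physical_sector=4096):
    """Return the disk byte offsets of the physical sectors a cluster overlaps.

    partition_start: byte offset of the partition on the disk.
    cluster_offset:  byte offset of the cluster within the partition.
    """
    disk_start = partition_start + cluster_offset
    disk_end = disk_start + cluster_size  # exclusive
    # Round down to the start of the physical sector containing disk_start.
    first = (disk_start // physical_sector) * physical_sector
    return list(range(first, disk_end, physical_sector))


# Partition starting at sector 2048 (1MiB): each cluster maps to one physical sector.
print(physical_sectors_for_cluster(2048 * 512, 4096))  # [1052672]

# Partition starting at sector 63: each cluster straddles two physical sectors.
print(physical_sectors_for_cluster(63 * 512, 0))       # [28672, 32768]
```

On the unaligned partition a single bad physical sector damages parts of two clusters, and a retry "from the start of the sector" no longer lines up with a cluster boundary.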

I'm not sure whether addresses sent in SATA commands actually need to know what size sectors the HD actually uses, or if it still addresses in 512B sectors, or what. Obviously user-space can read whatever, and at some point the kernel or the hard drive, whichever it is, grows the request so it reads one or more full sectors.

Quote:
This is a nice achievement, despite the time spent.


Thanks, although it wasn't all smooth sailing even after that, thanks to Microsoft being horrible and bad at useful error messages, and not coming with good tools. Although I have to admit GRUB can be pretty confusing too, but at least it's documented exactly how it's supposed to work. And it only ever got complicated with multiple disks installed, which isn't the case for me... anyway.

It took me another day and a half of fighting with Windows and the slow-booting recovery/install DVD to make it bootable. After about the 10th variation on re-creating the BCD succeeded, I could boot. I don't think in the end that I had to use bcdedit, just bcdboot. Anyway, hope someone finds these links helpful, because I had to dig through about twice that many pages of non-technical answers for people that aren't even trying to understand what's going on.

But then Windows thought it should label the only partition on the only disk in the system as drive D:. But a lot of things obviously still reference C:, since explorer.exe didn't even start when I logged in (no desktop icons or taskbar). I started cmd.exe from the task manager. Eventually I ended up booting into safe mode and running regedt32 to rename D: to C:.

Crap like this is why I'll never consider Windows more than a toy for running games.

ntfsclone did exactly what it said it would, though, unlike what I had to deal with once I started having to use Microsoft's crap. So thanks again for providing reliable open-source tools for dealing with Windowsy things. It's much appreciated.

  • http://superuser.com/a/270476 (an answer to a question about copying files then making bootable)
  • Bootrec: sevenforums.com/tutorials/104341-bootmgr-missing-fix.html
  • BCDedit: support.microsoft.com kb/927391
  • rename a drive: answers.microsoft.com en-us/windows/forum/windows_7-system/file-location-of-windows-operating-system-files/08c0ec57-ca26-4a6c-b709-032de93c676e

apparently hitting the URL limit of 1, but I was going to put most of the useful URLs I found into one message, for future reference. Already had this typed, seems a shame to delete it.


Thu Jan 03, 2013 14:38

Joined: Tue Jan 01, 2013 15:46
Posts: 6
Post Re: ntfsclone rescue extremely slow for little benefit
while closing all the tabs I had open, I found one that might have actually been why I had Windows booting up and calling the boot volume drive D:. Maybe something about disk signatures. http://blogs.technet.com/b/markrussinov ... 63572.aspx.

IDK, not going to go back and check now.

Ok, I should probably stop going off-topic with this windows headache, and get back to how ntfsclone should handle read errors.


Thu Jan 03, 2013 14:57

Joined: Sat Jan 26, 2013 18:30
Posts: 1
Post Re: ntfsclone rescue extremely slow for little benefit
Got the same problem as OP. Changed NTFS_SECTOR_SIZE to 4096 - now it copies from the failing device a lot faster.
I would be glad to see the sector size as a command line option defaulting to the device logical block size. A retry count option might help in the cases OP mentioned, when the device succeeds after multiple tries, but in my case I prefer only 1 try because I don't have a few days to wait for ntfsclone to finish copying.
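
For reference, a minimal Python sketch of how a default could be picked up from the device on Linux. The sysfs attributes `queue/logical_block_size` and `queue/physical_block_size` are real; the helper names and the `override` parameter (standing in for a command line option) are assumptions for illustration, not ntfsclone code.

```python
from pathlib import Path


def block_sizes(device_name, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes in bytes for e.g. 'sda'."""
    queue = Path(sysfs_root) / device_name / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def rescue_unit(device_name, sysfs_root="/sys/block", override=None):
    """Pick the retry unit: an explicit override, else the larger reported size."""
    if override is not None:
        return override
    logical, physical = block_sizes(device_name, sysfs_root)
    return max(logical, physical)
```

Taking the larger of the two is a design choice in this sketch: drives that emulate 512B sectors report a logical size of 512 and a physical size of 4096, so the logical size alone would leave the retry unit at 512.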


Sat Jan 26, 2013 18:45
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: ntfsclone rescue extremely slow for little benefit
Hi,

Quote:
Changed NTFS_SECTOR_SIZE to 4096

This may be useful in your case, but it is wrong, because this is the constant which defines how the integrity of metadata is checked, even for 4K-sector devices. This is required for compatibility with Windows.
In the fix I have prepared, the retry unit is the actual sector size. This will probably lead to the same behavior as your change.

Regards

Jean-Pierre


Mon Feb 04, 2013 16:18