Applying WIM file to NTFS 

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Applying WIM file to NTFS
Hi,

I am working on a library (http://sourceforge.net/projects/wimlib) to create, modify, mount, and apply WIM files from a Linux system. I am now working on implementing support for applying a WIM directly to an NTFS volume. I see there was a thread here several years ago about just this topic, but I'm not aware of that person's code being available anywhere, and the library I'm working on will have more extensive support for WIM files.

I am implementing the WIM apply to NTFS by calls into the NTFS-3g library, such as ntfs_create() and ntfs_attr_pwrite(). However I have run into a few issues.

First of all I'd like to set file attributes and security descriptors on NTFS files, as these are specified in the WIM file. However, I would like to set these on NTFS inodes since I have the inode pointers already and want to avoid unnecessary path lookups. To do this I've had to duplicate the security.c file from libntfs-3g and make functions

int _ntfs_set_file_attributes(ntfs_inode *ni, s32 attrib);

int _ntfs_set_file_security(ntfs_volume *vol, ntfs_inode *ni,
u32 selection, const char *attr);

So what I'm wondering is, would it make any sense to have functions like this be officially part of libntfs-3g? Perhaps ntfs_set_file_attributes() and ntfs_set_file_security() could just be wrappers around the inode versions.

Besides that, there's another issue with applying the security descriptors. Basically, in the WIM file format there is a table of security descriptors in the SECURITY_DESCRIPTOR_RELATIVE format, and an integer for each directory entry indexes the table of security descriptors. But the problem is that even in the official "install.wim" file provided by Microsoft for Windows 7, not all of the security descriptors pass the call to ntfs_valid_descr(). More specifically, there is a security descriptor whose last member is a SACL with no ACEs, which causes the check

Code:
(!offsacl || ((offsacl >= sizeof(SECURITY_DESCRIPTOR_RELATIVE))
                            && (offsacl+sizeof(ACL) < attrsz)))

in ntfs_valid_descr() to return false. Shouldn't this be changed to

Code:
(!offsacl || ((offsacl >= sizeof(SECURITY_DESCRIPTOR_RELATIVE))
                            && (offsacl+sizeof(ACL) <= attrsz)))


to allow for an empty SACL at the end of the security descriptor, or am I missing something?
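
For illustration, here is a minimal standalone sketch of the two comparisons. It is not the libntfs-3g code: the helper names are hypothetical, and the two size constants are assumptions standing in for sizeof(SECURITY_DESCRIPTOR_RELATIVE) (20 bytes) and sizeof(ACL) (8 bytes, the ACL header without any ACE).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed on-disk sizes, in bytes. */
#define SD_RELATIVE_HEADER_SIZE 20u /* SECURITY_DESCRIPTOR_RELATIVE      */
#define ACL_HEADER_SIZE          8u /* ACL header, before any ACE follows */

/* Hypothetical helper: the SACL offset test with the strict "<". */
bool sacl_offset_ok_strict(uint32_t offsacl, size_t attrsz)
{
    return !offsacl || (offsacl >= SD_RELATIVE_HEADER_SIZE
                        && (size_t)offsacl + ACL_HEADER_SIZE < attrsz);
}

/* Hypothetical helper: the same test with the proposed "<=". */
bool sacl_offset_ok_relaxed(uint32_t offsacl, size_t attrsz)
{
    return !offsacl || (offsacl >= SD_RELATIVE_HEADER_SIZE
                        && (size_t)offsacl + ACL_HEADER_SIZE <= attrsz);
}
```

With made-up numbers, a 100-byte descriptor whose empty SACL header starts at offset 92 ends exactly at byte 100: the strict form rejects it, while the relaxed form accepts it and still rejects a header that would run past the end.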

Thanks for any help or comments anyone might be able to provide! I expect to run into more problems as I test the code because this is challenging to implement correctly.


Fri Aug 24, 2012 21:50
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
I am working on a library (http://sourceforge.net/projects/wimlib) to create, modify, mount, and apply WIM files from a Linux system. I am now working on implementing support for applying a WIM directly to an NTFS volume.

Great !
Quote:
I see there was a thread here several years ago about just this topic, but I'm not aware of that person's code being available anywhere

I am not aware of any public code on this matter.
Quote:
First of all I'd like to set file attributes and security descriptors on NTFS files, as these are specified in the WIM file. However, I would like to set these on NTFS inodes since I have the inode pointers already and want to avoid unnecessary path lookups.
[...]
So what I'm wondering is, would it make any sense to have functions like this be officially part of libntfs-3g? Perhaps ntfs_set_file_attributes() and ntfs_set_file_security() could just be wrappers around the inode versions.

If they are really useful (and simple), I would not object to inserting them in a future version. Please attach a proposal (note : this forum only accepts compressed attachments).
Quote:
But the problem is that even in the official "install.wim" file provided by Microsoft for Windows 7, not all of the security descriptors pass the call to ntfs_valid_descr(). More specifically, there is a security descriptor whose last member is a SACL with no ACEs, which causes the check
[...]

Allowing an SACL with no ACEs is reasonable. I have never seen this situation, but I can check that chkdsk does not object.

Please post the desired patches when you consider you have identified all your needs.

Regards

Jean-Pierre


Sat Aug 25, 2012 09:02

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Okay, thank you. I will prepare some patches to NTFS-3g when I'm ready.

At this point, I am able to successfully apply the "Windows 7 Starter" image from Windows 7's "install.wim" file to an NTFS volume and boot Windows from it, after setting up an MBR and running bcdboot.exe to set up the Boot Configuration Data. However, hard links are not yet working, and I had to disable them for this test. I am making them with ntfs_link(). I will try to track down the issue; it may or may not be with NTFS-3g. Coincidentally, there is a weird issue in some WIM files where some files are marked as being in the same hard link group despite not actually being the same, although I'm working around this problem by splitting the hard link groups. I don't suppose there's some weird NTFS feature that explains this, such as inode numbers that get re-used?

As long as I am implementing NTFS apply, I may try to implement NTFS capture as well. I expect to run into some problems with this as well.

Thank you!


Sat Aug 25, 2012 16:09
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
However, hard links are not yet working, and I had to disable them for this test. I am making them with ntfs_link(). I will try to track down the issue; it may or may not be with NTFS-3g.

An important thing to keep in mind is that an inode should never be opened twice, even for read-only : there would be two different copies and the changes in one of them would be lost. There are two hooks in the code to insert checks : debug_double_inode() and debug_cached_inode(). I can provide such debug code if needed.
Quote:
Coincidentally, there is a weird issue in some WIM files where some files are marked as being in the same hard link group despite not actually being the same, although I'm working around this problem by splitting the hard link groups.

Possible cause : several file attributes are replicated in the parent directories so that directory contents can be displayed without opening the actual files. When a hard linked file is updated, Microsoft only updates the parent directory which was used to update the file, whereas ntfs-3g updates all the parent directories.
A file can only have a single short name (with an associated long name) per directory. When there are several names for the same file to be inserted into the same directory, put the long name associated to a short name first, then the short name (special procedure), then the other names. This is to ensure that the short/long name pairs get correctly associated.

Hope this helps

Regards

Jean-Pierre


Sat Aug 25, 2012 18:27

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Hi,

I believe I've fixed the hard link problem. The inode for the directory containing the link being created needs to be closed before the inode for the link target, as is done in ntfs_fuse_link().

I have implemented setting the timestamps by doing a depth-first traversal of the directory tree after applying all the files and calling ntfs_inode_set_times(). That seems to work all right.

I still need to set the short (DOS) filenames, which I had forgotten about earlier. I'm using the ntfs_set_ntfs_dos_name() function. I'll keep in mind what you mentioned about the short names. So based on what you said, my understanding is that when extracting a WIM dentry that is part of a hard link set, I should search for other dentries in the same directory that are part of the same hard link set, return an error if more than one has a short name, and extract the one with the short name before the others; is this correct?

Thanks!


Sat Aug 25, 2012 21:43
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
The inode for the directory containing the link being created needs to be closed before the inode for the link target, as is done in ntfs_fuse_link().

And if the parent directories are the same, be sure not to open it twice.
Quote:
when extracting a WIM dentry that is part of a hard link set, I should search for other dentries in the same directory that are part of the same hard link set, return an error if more than one has a short name,

Yes. At this stage also identify the associated long name and return an error if there are several (or none if there were no short name). This may come in two forms :
- a single name (both Win32 and Dos)
- two different names (a Win32 one and a Dos one).
Quote:
extract the one with the short name before the others; is this correct?

- first the long name (either Win32 or Win32+Dos)
- then the short name (either Dos or Win32+Dos, the latter being repeated)
- then all the Posix names.
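
As a rough illustration of this ordering, the names of one file within one directory could be sorted before they are created. This is only a sketch: the enum, the struct and the function names are hypothetical, and the Win32+DOS case (one name serving as both long and short name) is simplified to a single "long" entry.

```c
#include <stdlib.h>

/* Hypothetical classification; the values encode the creation order. */
enum name_kind {
    NAME_WIN32_LONG = 0, /* long name paired with a DOS name */
    NAME_DOS_SHORT  = 1, /* the short (DOS) name             */
    NAME_POSIX      = 2  /* any additional hard link name    */
};

/* Hypothetical record for one name of a file in a given directory. */
struct link_name {
    const char *name;
    enum name_kind kind;
};

/* qsort() comparator: order by kind only. */
static int compare_by_kind(const void *a, const void *b)
{
    const struct link_name *x = a;
    const struct link_name *y = b;

    return (int)x->kind - (int)y->kind;
}

/* Put the long name first, then the short name, then the Posix names. */
void sort_names_for_creation(struct link_name *names, size_t n)
{
    qsort(names, n, sizeof(names[0]), compare_by_kind);
}
```

qsort() is not stable, so the Posix names may come out in any order among themselves, which does not matter for the pairing of the long and short names.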

Regards

Jean-Pierre


Sun Aug 26, 2012 07:05

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Quote:
And if the parent directories are the same, be sure not to open it twice.


I'm a bit confused about what counts as opening an inode. If I call ntfs_pathname_to_inode() to open the target of a hard link, does that implicitly open the parent directory as well, which would be a mistake if that directory was already open? Doing this doesn't seem to be a problem in ntfs_fuse_link(), though.

Quote:
- first the long name (either Win32 or Win32+Dos)
- then the short name (either Dos or Win32+Dos, the latter being repeated)
- then all the Posix names.


Thanks, I think that part is somewhat clear now. Unfortunately, nowhere in the WIM does it say whether a given filename is in the Win32 or POSIX namespace. Each WIM directory entry just has a regular (long) filename and a DOS (short) filename, which may be empty. So if there is a short name, the corresponding long name in the directory entry must be Win32, but if there is no short name among all the hardlinked dentries in a directory, one could arbitrarily be chosen to be in the Win32 namespace and the rest would be in the POSIX namespace (well technically, I would have to make sure the Win32 name is a valid Win32 name), or alternatively they could all be placed in the POSIX namespace. Does that sound right?

And I see that ntfs_create() always creates a name in the POSIX namespace, but the POSIX name will be changed to Win32 (or Win32+DOS) if ntfs_set_ntfs_dos_name() is called to set the DOS name, provided that the extra POSIX names haven't been added already; is that correct?

I wonder if Microsoft's imagex.exe can really deal with all these weird cases... I bet some bugs would show up if it were to be fed the right WIM files. The tricky thing about WIMs is that it's single-instance storage, so a "stream" may be shared among any number of dentries and their un-named and named data streams and reparse point data buffers in any number of WIM images, and this is regardless of which dentries are actually hard-linked to each other. Luckily I've gotten most of this figured out though.

Thanks for all the help! I've also started working on the code to capture a WIM from NTFS and will be testing it soon.


Sun Aug 26, 2012 16:59

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Okay, Microsoft's "imagex.exe" program doesn't even work correctly. If you have a file with alternate data streams and make a hard link going to it, the alternate data streams will not be captured correctly (they will all be copies of the unnamed data stream). So of course, when it applies the WIM, the alternate data streams are incorrect. There isn't some NTFS issue with this combination, is there? It seems to work correctly before the WIM is captured.

Also, when "imagex.exe" captures a hard-linked file, all the dentries except for one will have no references to data streams. This makes sense because the data streams are shared among the dentries in the hard link set anyway. However, this is inconsistent with the WIMs distributed by Microsoft (Windows 7/8 "boot.wim", "install.wim") which have the data stream information duplicated, and in the Windows 7 "install.wim" there are actually inconsistencies among the data streams in hard link sets. So I don't even know what Microsoft is doing here; maybe they use their own version of the software, or maybe they use a hidden switch on it to change its behavior.

The alternate data streams really are a design disaster; they don't make any sense and Microsoft's own software doesn't even support them correctly...


Sun Aug 26, 2012 18:09
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
I'm a bit confused about what counts as opening an inode. If I call ntfs_pathname_to_inode() to open the target of a hard link, does that implicitly open the parent directory as well, which would be a mistake if that directory was already open?

Yes, ntfs_pathname_to_inode() may have to temporarily open the parent directory, so you have to open the file first, then the parent directory.
Quote:
alternatively they could all be placed in the POSIX namespace. Does that sound right?

The Win32 and DOS names always go by pairs, and unlinking one of them implies deleting both. If you make them Posix names, only one of them would be deleted, and the file will not be deleted until the second name is unlinked. So this is not the same. A better approximation would be to ignore DOS-only names, and create Win32 and Win32+DOS as Posix.... But there must be some hidden way of telling the kind of each name (the order in which they appear may be meaningful...)
Quote:
And I see that ntfs_create() always creates a name in the POSIX namespace, but the POSIX name will be changed to Win32 (or Win32+DOS) if ntfs_set_ntfs_dos_name() is called to set the DOS name,

It has to be so, because after the first step the file cannot be left with a Win32 name and no DOS name, which would be an invalid configuration. A buggy user program could create a file with a Win32 only name.
Quote:
...provided that the extra POSIX names haven't been added already; is that correct?

That is a real problem. The kernel to fuse interface is based on inode numbers, and fuse has to rebuild the path to files from a designated inode number. So when there are several paths to a file, fuse (or lowntfs-3g, which does not rely on paths) cannot reliably tell which one was actually used by an application. This is why I recommend putting the DOS name at the stage where there is no ambiguity.
Quote:
Okay, Microsoft's "imagex.exe" program doesn't even work correctly. If you have a file with alternate data streams and make a hard link going to it, the alternate data streams will not be captured correctly (they will all be copies of the unnamed data stream).

I do not understand what you mean. Hard links are several names to a file, and a file is an inode number designating a list of attributes (such as time stamps, ACL, a main data stream and optional named data streams, etc.). When you create a link you create a new name, but you cannot associate it to copies of alternate data streams. The new name may however force a reorganization of the base inode so as to fit into the space allocated (generally 1K), and some attribute may have to be moved elsewhere, but when this happens, all the links remain associated to the moved attribute.
Quote:
However, this is inconsistent with the WIMs distributed by Microsoft (Windows 7/8 "boot.wim", "install.wim") which have the data stream information duplicated, and in the Windows 7 "install.wim" there are actually inconsistencies among the data streams in hard link sets.

I do not have Windows 7, but I have the Windows 8 Release Preview. In the 64 bit version, I see the file "brcoinst.dll" with eight names in different directories. Most of them are recorded with creation time 19 May 2012 5:00:26 UTC, but one of them has creation time 12 Jul 2012 1:36:04 UTC. Is this what you mean, which is explained in remarks on http://msdn.microsoft.com/en-us/library ... 60(v=vs.85).aspx ? You will not get this with ntfs-3g, the time stamps will always be consistent.
Quote:
So I don't even know what Microsoft is doing here; maybe they use their own version of the software, or maybe they use a hidden switch on it to change its behavior.
The alternate data streams really are a design disaster; they don't make any sense and Microsoft's own software doesn't even support them correctly...

The problem with Microsoft is lack of specifications and partial information (on the msdn page quoted above, the fact that new links are always created as Posix is only mentioned as a possible bug in a user comment, with no official reply). You have to live with it, and guess what your data sample really means.

Regards

Jean-Pierre


Sun Aug 26, 2012 23:22

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Quote:
Yes, ntfs_pathname_to_inode() may have to temporarily open the parent directory, so you have to open the file first, then the parent directory.


Thanks, that's helpful. I think I'm not opening the files in the right order when making a link to a file in the same directory.

Quote:
The Win32 and DOS names always go by pairs, and unlinking one of them implies deleting both.


Okay, that's helpful too. I hadn't realized that a Win32 name cannot exist without a corresponding DOS name. So if the WIM does not provide a DOS name for a file in a directory, all the "long" names for that file in that directory must be in the POSIX namespace; therefore, there should be no ambiguity about which namespace the names provided in the WIM dentries are in.
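
The rule as I understand it can be sketched like this. It is a minimal sketch under my own assumptions: the struct, the enum and the function name are hypothetical, and the input is assumed to be all the WIM dentries of one hard link set that live in one directory.

```c
#include <stddef.h>

/* Hypothetical namespace tag; NS_WIN32 is the long half of a Win32/DOS pair. */
enum name_ns { NS_POSIX, NS_WIN32 };

/* Hypothetical view of a WIM dentry: a long name and an optional short name. */
struct wim_name {
    const char *long_name;
    const char *short_name; /* NULL or "" when the dentry has no DOS name */
    enum name_ns ns;        /* filled in by assign_namespaces()           */
};

/*
 * Tag the long name as Win32 when the dentry carries a DOS name, and as
 * POSIX otherwise. Returns -1 when more than one dentry has a DOS name,
 * since a file can only have one short name per directory, else 0.
 */
int assign_namespaces(struct wim_name *names, size_t n)
{
    size_t with_short = 0;

    for (size_t i = 0; i < n; i++) {
        int has_short = names[i].short_name != NULL
                        && names[i].short_name[0] != '\0';

        names[i].ns = has_short ? NS_WIN32 : NS_POSIX;
        with_short += (size_t)has_short;
    }
    return with_short > 1 ? -1 : 0;
}
```

This does not check that the name tagged as Win32 is actually a valid Win32 name, which would still need to be done separately.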

Quote:
I do not understand what you mean. Hard links are several names to a file, and a file is an inode number designating a list of attributes...


If an NTFS file has named data streams, they should be captured in the WIM file. The references to the named data streams, as well as the un-named data stream and other information such as security ID and timestamps, go in a WIM "directory entry", even though this information is in reality associated with an NTFS inode, of which there is no direct analogue in a WIM. Instead, there is an integer (analogous to an inode number, I guess) associated with each WIM directory entry that tells you which hard link group it's in.

The problem I noticed is that if you create an NTFS file (inode) with a named data stream, then create a hard link to the file (I did so in the same directory, but that may not matter), the named data streams will not be captured correctly. For example, I did the following on a clean C: drive:

Code:
echo 1 > file
echo 2 > file:ads
mklink /h link file
imagex /capture c: j:test.wim "test"


The problem is that if you look at or apply the resulting "test.wim", the named data stream "ads" of "file" will contain "1" instead of the expected "2" that I wrote into it. However, if the link "link" was not made, then the named stream contained the correct contents. I do not know why making the hard link should make a difference, and I think this is just a bug in Microsoft's "imagex" program.

Quote:
I do not have Windows 7, but I have the Windows 8 Release Preview. In the 64 bit version, I see the file "brcoinst.dll" with eight names in different directories. Most of them are recorded with creation time 19 May 2012 5:00:26 UTC, but one of them has creation time 12 Jul 2012 1:36:04 UTC. Is this what you mean, which is explained in remarks on http://msdn.microsoft.com/en-us/library ... 60(v=vs.85).aspx ? You will not get this with ntfs-3g, the time stamps will always be consistent.


I am actually not checking the timestamps of hardlinked WIM dentries for consistency at this point; I am only checking the file attributes (meaning the flags like FILE_ATTRIBUTE_SYSTEM), the security descriptor, and SHA1 message digests of the un-named and named data streams or reparse point data buffers. There were no inconsistencies in "install.wim" from Windows 8 Release Preview based on these fields, although it's possible there were still some inconsistent timestamps which I'm not too worried about.

However, in "install.wim" from Windows 7, there are inconsistencies in the contents of the streams themselves. Just to pick an example, "/Windows/System32/wshqos.dll" and "/Windows/System32/el-GR/cdosys.dll.mu" both have a hard link group ID of 0x1000000007718, but one has 13824 bytes of data, and the other has 51712 bytes of data (both in the un-named data stream). These definitely should be different files (inodes), so I believe I cannot assume that all dentries that share the hard link group ID are actually part of the same hard link group, although I can still assume that all dentries in the same hard link group will share the same hard link group ID.


Mon Aug 27, 2012 00:31

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
The image for the 64-bit Windows 8 Release Preview has been applied with my code (and, of course, your code in libntfs-3g) and is running.


Mon Aug 27, 2012 02:28
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
Instead, there is an integer (analogous to an inode number, I guess) associated with each WIM directory entry that tells you which hard link group it's in.

You showed 0x1000000007718 as an example of group ID. This is very likely the inode number in the original volume (number 0x7718 reused once).
Quote:
The problem is that if you look at or apply the resulting "test.wim", the named data stream "ads" of "file" will contain "1" instead of the expected "2" that I wrote into it.

Most likely this is a consequence of how hardlinks work in Windows : consistency of attributes is not enforced across links (more below). To get a clear idea of the resulting state, identify the inode numbers of link and file (they should be the same) from the original volume, and post the ntfsinfo output :
Code:
ls -li file link
# as root
ntfsinfo -fvi inode-number volume

*edit* check the parent directory as well :
Code:
ls -ldi ..
ntfsinfo -fvi directory-inode-number volume

*end of edit*
Also try updating link, file and file:ads on Windows after the link is created and see what happens on the original volume.
Quote:
"/Windows/System32/wshqos.dll" and "/Windows/System32/el-GR/cdosys.dll.mu" both have a hard link group ID of 0x1000000007718, but one has 13824 bytes of data, and the other has 51712 bytes of data (both in the un-named data stream)

This is just another example of what I explained for time stamps. When a file is updated, Windows only updates the link which is used to designate the file. You cannot create such an inconsistent configuration with ntfs-3g. Either you compare the time stamps to identify the file size defined in the latest update, or you collect the size from the data stream itself. The same goes for all the attributes which are replicated in the parent directories : the time stamps, the allocated and data sizes, the file attributes, and the reparse point tag (I have a doubt about the inode numbers, maybe an update counts as a reuse, in which case the upper 16 bits will be incremented).
Quote:
The image for the 64-bit Windows 8 Release Preview has been applied with my code (and, of course, your code in libntfs-3g) and is running.

Great !

Regards

Jean-Pierre


Mon Aug 27, 2012 09:09

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Quote:
You showed 0x1000000007718 as an example of group ID. This is very likely the inode number in the original volume (number 0x7718 reused once).


I've attached the dump of WIM dentries for the "file" and "link" example, and the hard link group ID that was used in the WIM was 0x11000000000023. The inode number was indeed 0x23, but I don't know what the high bits mean. It's possible that they are not actually part of the hard link group ID. I already know that the fields in the WIM dentry are not the same as they are documented. Maybe the hard link group ID is really only 32 bits? In "install.wim" for Windows 7, the high 16 bits vary between 0x1 and 0x6.

The root dentry has a hard link group ID of 0, but this is expected because this is the case for all dentries that are singletons in a hard link group (including, hopefully, all directories).

Quote:
Most likely this is a consequence of how hardlinks work in Windows : consistency of attributes is not enforced across links (more below). To get a clear idea of the resulting state, identify the inode numbers of link and file (they should be the same) from the original volume, and post the ntfsinfo output


The outputs of the ntfsinfo commands are attached. The un-named data stream should contain "1 \r\n" (4 bytes) while the named data stream should contain "22 \r\n" (5 bytes). I've also attached a dump of the WIM dentries of the WIM captured by imagex.exe. Note that there are two alternate stream entries listed on the "file" dentry, one of which has no name and the other of which is named "ads", but they both point to a 4-byte file resource with the same SHA1 message digest (this is the data "1 \r\n"). (Also, I don't know why they put the un-named data stream in an alternate stream entry instead of in the dentry itself.)

Anyway, I've also set up the same filesystem using NTFS-3g and captured it in a WIM on Windows, and I encountered the same problem. So I think this has to be a bug with Microsoft's program.

Quote:
This is just another example of what I explained for time stamps.


Unfortunately there are quite a lot of dentries affected by these inconsistencies in the Windows 7 install.wim (around 3900) and the inconsistencies occur between very different files. For example, a DLL might be marked as being in the same hard link group as a font file, which definitely is incorrect. So it can't be the case that all the hard link groups are correct. But if we assume that due to the NTFS issues, the dentries in a hard link group need not even have consistent data, then it may be impossible to actually determine all hard link groups correctly. I think I'll have to do some tests with imagex.exe to see how it handles some of these cases.


Attachments:
File comment: List of dentries in WIM captured with imagex.exe
wim_dentries.info.gz [634 Bytes]
File comment: ntfsinfo output for root directory inode
root_inode.info.gz [1.83 KiB]
File comment: ntfsinfo output for inode shared by the names "file" and "link"
file_inode.info.gz [825 Bytes]
Mon Aug 27, 2012 21:29

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
The high 16 bits of the hard link ID have to be some sort of inode re-use count. Each time I delete the files, re-make them, and capture a new WIM, the high 16 bits of each hard link ID go up by 1.

I did a test where I used imagex.exe to apply a WIM with two hard link sets falsely sharing their hard link group ID. They were applied as different hard link sets when the file contents were different (note: no named data streams were used here), but as the same hard link set when the file contents were the same. So the file contents (and perhaps other fields) must be used to disambiguate the hard link groups, which is what I'm trying to do in my program. Please note that this may be a past issue that's been fixed, as I don't notice this problem in the Windows 8 WIMs. Also, this is separate from the issue where the named data streams get captured incorrectly, which is an issue I doubt Microsoft has even noticed, since they use no named data streams in any of their Windows or Windows PE images.
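
A minimal sketch of that disambiguation, under my own assumptions: the struct and function names are hypothetical (this is not the code in my library), only the SHA1 of the un-named data stream is compared, and a hard link group ID of 0 is treated as "not hard linked", as observed for singleton dentries.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical subset of the fields of a WIM dentry. */
struct dentry_info {
    uint64_t hard_link_group_id;  /* 0 for dentries that are not hard linked */
    unsigned char sha1[SHA1_LEN]; /* digest of the un-named data stream      */
};

/*
 * Two dentries are treated as names of the same file only when they share
 * a nonzero hard link group ID and their contents have the same digest.
 */
bool same_hard_link_set(const struct dentry_info *a,
                        const struct dentry_info *b)
{
    if (a->hard_link_group_id == 0 || b->hard_link_group_id == 0)
        return false;
    if (a->hard_link_group_id != b->hard_link_group_id)
        return false;
    return memcmp(a->sha1, b->sha1, SHA1_LEN) == 0;
}
```

Other fields, such as the file attributes and the security descriptor, could be added to the comparison in the same way.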

Anyway, the WIM file format really is a huge mess, and I don't think you need to get too concerned about some of these details. NTFS is confusing enough already!


Mon Aug 27, 2012 22:07
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
I've attached the dump of WIM dentries for the "file" and "link" example, and the hard link group ID that was used in the WIM was 0x11000000000023

Your example does not show what I wanted, I have modified it below to show you.
Quote:
The inode number was indeed 0x23, but I don't know what the high bits mean. It's possible that they are not actually part of the hard link group ID. I already know that the fields in the WIM dentry are not the same as they are documented. Maybe the hard link group ID is really only 32 bits? In "install.wim" for Windows 7, the high 16 bits vary between 0x1 and 0x6.

The inode number is 48 bits, and the high 16 bits is a reuse count. When a file is deleted and a new file is created, the latter may reuse the inode number, in which case the reuse count is incremented. You should check the 64 bits, a different reuse count means a different file.
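
As a small illustration of that layout, the two parts can be separated with a mask and a shift. The function names below are hypothetical and are not taken from libntfs-3g.

```c
#include <stdint.h>

/* Low 48 bits: the inode (MFT record) number. */
uint64_t ref_inode_number(uint64_t ref)
{
    return ref & 0xFFFFFFFFFFFFULL;
}

/* High 16 bits: the reuse count of that inode number. */
uint16_t ref_reuse_count(uint64_t ref)
{
    return (uint16_t)(ref >> 48);
}
```

Applied to the two IDs quoted in this thread, 0x1000000007718 splits into inode 0x7718 with reuse count 1, and 0x11000000000023 splits into inode 0x23 with reuse count 0x11.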
Quote:
But if we assume that due to the NTFS issues, the dentries in a hard link group need not even have consistent data, then it may be impossible to actually determine all hard link groups correctly. I think I'll have to do some tests with imagex.exe to see how it handles some of these cases.

The inconsistencies are mostly related to copies in parent directories, but each attribute has a reference value in the inode :
- the sizes should be taken from the $DATA record in the inode,
- the stamps and attributes should be taken from the $STANDARD_INFORMATION
- and the optional reparse tag should be taken from $REPARSE_DATA
This is a detailed example :
Code:
# This was first done in a Windows terminal
# (do not open explorer on the same directory)
# unneeded output removed, and comments inserted
F:\testlink\links>echo 1 > file
F:\testlink\links>echo 2 > file:ads
F:\testlink\links>mklink /h link file
F:\testlink\links>dir
28/08/2012  09:55                 4 file
28/08/2012  09:55                 4 link
# file and link are shown consistent
F:\testlink\links>echo 3333 >> file
F:\testlink\links>dir
28/08/2012  09:57                11 file
28/08/2012  09:55                 4 link
# they are no longer consistent
F:\testlink\links>echo 4 >> link
28/08/2012  09:57                11 file
28/08/2012  09:58                15 link
# consistency was restored while opening link before appending
# but the entry file has not been updated
F:\testlink\links>type file
1
3333
4
# this is the correct contents
F:\testlink\links>type link
1
3333
4
# this is also the correct contents
F:\testlink\links>dir
28/08/2012  09:58                15 file
28/08/2012  09:58                15 link
# both names were opened, they are now consistent
F:\testlink\links>echo 5 >> file:ads
F:\testlink\links>echo 6 >> link:ads
F:\testlink\links>type file:ads
# (error : wrong syntax....)
F:\testlink\links>echo 7 >> file
F:\testlink\links>echo 8 >> link
28/08/2012  10:01                19 file
28/08/2012  10:01                23 link
# this was to restore inconsistency before switching to Linux

And this is the result seen from Linux :
Code:
[linux@pavilion2 symlinks]$ cd links
[linux@pavilion2 links]$ /bin/ls -l
total 1
-rwxrwxrwx 2 root root 23 Aug 28 10:01 file
-rwxrwxrwx 2 root root 23 Aug 28 10:01 link
[linux@pavilion2 links]$ cat file
1
3333
4
7
8
[linux@pavilion2 links]$ cat link
1
3333
4
7
8
# (both names lead to the correct contents)
[linux@pavilion2 links]$ getfattr -e text -n user.ads file
# file: file
user.ads="2 \015\0125 \015\0126 \015\012"
[linux@pavilion2 links]$ getfattr -e hex -n user.ads link
# file: link
user.ads=0x32200d0a35200d0a36200d0a
# (both names lead to the correct contents)
ntfsinfo -fvi 8490 /dev/sdb1
# (partial output)
Dumping attribute $STANDARD_INFORMATION (0x10) from mft record 8490 (0x212a)
   File Creation Time:    Tue Aug 28 07:55:15 2012 UTC
   File Altered Time:    Tue Aug 28 08:01:14 2012 UTC
   MFT Changed Time:    Tue Aug 28 08:01:14 2012 UTC
   Last Accessed Time:    Tue Aug 28 08:14:37 2012 UTC
   File attributes:    ARCHIVE (0x00000000)
# above are the reference time stamps and attributes for this file
Dumping attribute $FILE_NAME (0x30) from mft record 8490 (0x212a)
   File Creation Time:    Tue Aug 28 07:55:15 2012 UTC
   File Altered Time:    Tue Aug 28 07:55:15 2012 UTC
   MFT Changed Time:    Tue Aug 28 07:55:15 2012 UTC
   Last Accessed Time:    Tue Aug 28 07:55:15 2012 UTC
# the time stamps above should not be used
   Allocated Size:       24 (0x18)
   Data Size:       0 (0x0)
# the above sizes should not be used
   File attributes:    ARCHIVE (0x00000000)
# the above attributes should not be used
   Namespace:       Win32 & DOS
   Filename:       'file'
Dumping attribute $FILE_NAME (0x30) from mft record 8490 (0x212a)
   File Creation Time:    Tue Aug 28 07:55:15 2012 UTC
   File Altered Time:    Tue Aug 28 07:55:35 2012 UTC
   MFT Changed Time:    Tue Aug 28 07:55:35 2012 UTC
   Last Accessed Time:    Tue Aug 28 07:55:15 2012 UTC
# the time stamps above should not be used
   Allocated Size:       24 (0x18)
   Data Size:       4 (0x4)
# the above sizes should not be used
   File attributes:    ARCHIVE (0x00000000)
# the above attributes should not be used
   Namespace:       POSIX
   Filename:       'link'
Dumping attribute $DATA (0x80) from mft record 8490 (0x212a)
   Name length:       0 (0x0)
   Data size:       23 (0x17)
   Data offset:       24 (0x18)
# above is the reference data size for the stream and for the file
Dumping attribute $DATA (0x80) from mft record 8490 (0x212a)
   Attribute name:       'ads'
   Data size:       12 (0xc)
# above is the reference data size for the ads
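
The two getfattr calls above print the same named stream in two encodings (text with octal escapes, and hex). As a minimal sketch for checking that they really agree, the hex form can be decoded back to bytes and compared. The function name decode_hex_xattr() is made up for this illustration and is not part of getfattr or libntfs-3g.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert one hexadecimal digit to its value, or -1 if it is not a digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode a value printed by "getfattr -e hex" (for example "0x3220...")
 * into raw bytes. Hypothetical helper, for illustration only.
 * Returns the number of bytes written, or -1 on malformed input or when
 * the output buffer is too small.
 */
long decode_hex_xattr(const char *text, unsigned char *out, size_t outsize)
{
    size_t len, i;

    assert(text != NULL && out != NULL);
    if (strncmp(text, "0x", 2) != 0)
        return -1;
    text += 2;
    len = strlen(text);
    if (len % 2 != 0 || len / 2 > outsize)
        return -1;
    for (i = 0; i < len / 2; i++) {
        int hi = hex_digit(text[2 * i]);
        int lo = hex_digit(text[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)(hi * 16 + lo);
    }
    return (long)(len / 2);
}
```

Decoding 0x32200d0a35200d0a36200d0a yields 12 bytes, which matches the data size ntfsinfo reports for the 'ads' stream, and the bytes are the three lines "2 ", "5 " and "6 " with CR LF line endings.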

Quote:
So the file contents (and perhaps other fields) must be used to disambiguate the hard link groups, which is what I'm trying to do in my program.

Probably yes. Try to replay the example above.
Quote:
Also, this is separate from the issue where the named data streams get captured incorrectly, which is an issue I doubt Microsoft has even noticed, since they use no named data streams in any of their Windows or Windows PE images.

This may be left in the dark because Microsoft does not need it yet, but it may emerge some day in an unexpected way.

Regards

Jean-Pierre


Tue Aug 28, 2012 10:53

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Hi,

Thanks for the example! I see how the inconsistencies in the timestamps and stream sizes in NTFS can be created now.

Unfortunately (or rather, fortunately) this does not actually seem to be the problem for WIM files. I tested making a filesystem with the parent directory information out-of-sync, as you did above, then I captured it into a WIM with imagex.exe. All timestamps and file streams were captured as the most recent version, and there were no inconsistencies. Furthermore, I examined the Windows 7 and Windows 8 install.wims and boot.wims, a WIM for the Windows Recovery Environment, and a split WIM for a backup of Windows 7. All three timestamps, all data streams, the security ID, and the file attributes were always consistent among members of a hard link group, with one exception: in the Windows 7 Service Pack 1 "install.wim", some hard link groups contained dentries with inconsistent file streams, security IDs, file attributes, or timestamps. Even there, the security ID, file attributes, or timestamps were never different without the main file stream being different as well.

Note that WIM dentries do not actually contain the stream size. They contain a SHA1 message digest of the stream contents that is used as a hash key to look up an entry in the WIM's "lookup table" that tells you the length of the stream and where to find it in the WIM file. So it isn't possible for a WIM dentry to have an incorrect stream length but still be associated with the correct stream. However, each WIM dentry does have its own timestamps, security ID, and file attributes.
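
To make the point above concrete, here is a minimal sketch of a dentry that stores only a SHA1 digest and gets its stream size through a lookup table. The struct layouts and the names lookup_entry, dentry, lookup_stream() and dentry_stream_size() are assumptions made for this illustration; they are not the on-disk WIM format and not wimlib's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical lookup table entry: where a stream lives and how long it is. */
struct lookup_entry {
    uint8_t  hash[SHA1_LEN];   /* SHA1 digest of the stream contents */
    uint64_t offset;           /* position of the stream in the WIM file */
    uint64_t size;             /* length of the stream in bytes */
};

/* Hypothetical dentry: it stores a digest, not a size. */
struct dentry {
    const char *name;
    uint8_t     stream_hash[SHA1_LEN];
};

/* Find the entry for a digest. Linear search keeps the sketch short. */
const struct lookup_entry *lookup_stream(const struct lookup_entry *table,
                                         size_t n,
                                         const uint8_t hash[SHA1_LEN])
{
    size_t i;

    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (memcmp(table[i].hash, hash, SHA1_LEN) == 0)
            return &table[i];
    return NULL;
}

/* Resolve a dentry's stream size through the table; 0 on success, -1 if absent. */
int dentry_stream_size(const struct dentry *d,
                       const struct lookup_entry *table, size_t n,
                       uint64_t *size_ret)
{
    const struct lookup_entry *e = lookup_stream(table, n, d->stream_hash);

    if (e == NULL)
        return -1;
    *size_ret = e->size;
    return 0;
}
```

Because the size is reached only through the digest, two dentries that refer to the same stream necessarily report the same size, which is why the stale-size situation seen in NTFS $FILE_NAME records cannot arise here.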

Taking into consideration the fact that we have files like DLLs and fonts marked as being in the same hard link group, it has to be the case that these are truly different files (inodes) despite the fact they are marked as having the same hard link group ID (inode number) for some reason. And this interpretation seems consistent with how "imagex.exe" applies the WIM file.

So I'm planning to do the following:

- For each group of WIM dentries sharing the same hard link group ID, split it into the minimal number of "true" hard link groups that share all file attributes, security IDs, and file streams (however, merely looking at file streams would be sufficient based on the Windows 7 install.wim).
- For each "true" hard link group with inconsistent timestamps, change each timestamp to the most recent corresponding timestamp in the group (however, based on what I've seen, the timestamps shouldn't even be inconsistent).
- As a special case, dentries in a nominal hard link group may have no file stream information. In this case, I will have to require that there is only one "true" hard link group in the nominal hard link group, since otherwise it may be ambiguous which "true" hard link group the dentry is associated with.
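
The first two steps of the plan above can be sketched as follows. This is only an illustration under stated assumptions: the struct fields and the function split_link_groups() are hypothetical and are not the fix_link_groups() code in wimlib, the search is quadratic for clarity, and the special case of dentries without stream information is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical, simplified dentry; field names are illustrative only. */
struct dentry {
    uint64_t nominal_group;          /* hard link group ID stored in the WIM */
    uint32_t attributes;
    int32_t  security_id;
    uint8_t  stream_hash[SHA1_LEN];
    uint64_t creation_time;
    uint64_t last_write_time;
    uint64_t last_access_time;
    uint64_t true_group;             /* output of split_link_groups() */
};

/* Two dentries can name the same inode only if all of these agree. */
static int same_inode(const struct dentry *a, const struct dentry *b)
{
    return a->nominal_group == b->nominal_group &&
           a->attributes == b->attributes &&
           a->security_id == b->security_id &&
           memcmp(a->stream_hash, b->stream_hash, SHA1_LEN) == 0;
}

static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/*
 * Assign a "true" group number (1, 2, ...) to every dentry, then make the
 * timestamps inside each true group agree on the most recent value.
 * Returns the number of true groups.
 */
size_t split_link_groups(struct dentry *d, size_t n)
{
    size_t i, j, ngroups = 0;

    assert(d != NULL || n == 0);
    for (i = 0; i < n; i++) {
        d[i].true_group = 0;
        for (j = 0; j < i; j++) {
            if (same_inode(&d[i], &d[j])) {
                d[i].true_group = d[j].true_group;
                break;
            }
        }
        if (d[i].true_group == 0)
            d[i].true_group = ++ngroups;   /* first member of a new group */
    }
    /* Propagate the newest timestamps to all members of each true group. */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            if (d[j].true_group != d[i].true_group)
                continue;
            d[i].creation_time = max_u64(d[i].creation_time, d[j].creation_time);
            d[i].last_write_time = max_u64(d[i].last_write_time, d[j].last_write_time);
            d[i].last_access_time = max_u64(d[i].last_access_time, d[j].last_access_time);
        }
    }
    return ngroups;
}
```

Comparing only the stream digest would give the same split on the Windows 7 install.wim according to the observation above; the sketch compares attributes and security ID as well because the plan asks for it.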

Thanks for all the help!


Tue Aug 28, 2012 16:37
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
I see how the inconsistencies in the timestamps and stream sizes in NTFS can be created now.

Unfortunately (or rather, fortunately) this does not actually seem to be the problem for WIM files. I tested making a filesystem with the parent directory information out-of-sync, as you did above, then I captured it into a WIM with imagex.exe. All timestamps and file streams were captured as the most recent version, and there were no inconsistencies.

Fine, good to know.
Quote:
Taking into consideration the fact that we have files like DLLs and fonts marked as being in the same hard link group, it has to be the case that these are truly different files (inodes) despite the fact they are marked as having the same hard link group ID (inode number) for some reason.

Do you have an example (from Windows 8) ? Can you reproduce this ?
Quote:
So I'm planning to do the following:

Looks good, I will do some tests when some code is available.

Regards

Jean-Pierre


Tue Aug 28, 2012 21:21

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Quote:
Do you have an example (from Windows 8) ? Can you reproduce this ?


The problem with the inconsistent hard link groups does not appear in the install.wim for either the 64-bit Windows 8 Customer Preview or the 64-bit Windows 8 Release Preview, the latter of which contains two WIM images. The only place I've found it is in the Windows 7 Service Pack 1 install.wim, where it appears in all 5 images in the WIM.

I do not know if this issue appears in Windows Vista or the "original" Windows 7 because I don't have copies of the install media for either of them.

I have so far been unable to reproduce the issue. I used "imagex.exe" to capture a Windows 7 installation, and the issue did not show up. I then replaced the "wimgapi.dll" file (Microsoft's WIM library; imagex.exe is really just a front-end for it) with the oldest version I had available and tried again, but the issue still did not show up. Still, I think this is most likely a problem with Microsoft's WIM library, and it's possible that the actual version affected by the problem is not publicly available.

Quote:
Looks good, I will do some tests when some code is available.


If you'd like, you can check out the latest wimlib from the git repository and do some testing (configuring with --enable-debug may be helpful). The entry point for the hard link disambiguation code is the fix_link_groups() function in hardlink.c. However, I'm not sure it would be that useful especially since I'm not able to reproduce the problem in a WIM created using "imagex.exe". The only reason I'm really concerned about fixing up these hard link groups correctly (rather than just considering it to be an error) is that applying Windows 7 is probably the most important use case of the NTFS apply feature. Eventually I will run out of patience dealing with Microsoft's bugs though!


Wed Aug 29, 2012 02:38
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
I do not know if this issue appears in Windows Vista

I have an original Vista install DVD. It contains install.wim and boot.wim. How should I check for different files shown within the same group id ?

Regards

Jean-Pierre


Wed Aug 29, 2012 10:27
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

I cloned the repository, and tried to compile, but I got unexpected errors :

Code:
/bin/sh: build-aux/config.rpath: No such file or directory

Missing file ?
Code:
configure: error: Cannot find libntfs-3g.
checking for ntfs_set_file_security in -lntfs-3g... no

Search in a wrong directory ? I have it available at the standard location :
Code:
[linux@dimension wimlib]$ ls -l /lib64/libntf*
lrwxrwxrwx 1 root root     21 Jul  6 11:02 /lib64/libntfs-3g.so.836 -> libntfs-3g.so.836.0.0
-rwxr-xr-x 1 root root 290568 Jul  6 10:52 /lib64/libntfs-3g.so.836.0.0
[linux@dimension wimlib]$ disasm -sf /lib64/libntfs-3g.so.836.0.0 | grep ntfs_set_file_security
;   62 0x0034d2c31180  ntfs_set_file_security
          public  ntfs_set_file_security
ntfs_set_file_security :

And, when configuring with --without-ntfs-3g (obviously not what I want...)
Code:
config.status: error: cannot find input file: `Makefile.in'

Maybe these are consequences of the missing config.rpath file ?

Regards

Jean-Pierre


Wed Aug 29, 2012 12:16

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Quote:
I have an original Vista install DVD. It contains install.wim and boot.wim. How should I check for different files shown within the same group id ?


Configure the library with "--enable-debug", then on the install.wim file run

Code:
imagex info --metadata install.wim 1 | less


Search for a line that says

Quote:
[src/resource.c 1200] read_metadata_resource(): Fixing inconsistencies in the link groups


Directly below that, each split link group will be printed, if you have the latest version from git (and if there were actually any link groups split). For example, I have:

Code:
[src/resource.c 1186] read_metadata_resource(): Calculating dentry full paths
[src/resource.c 1192] read_metadata_resource(): Building link group table
[src/resource.c 1200] read_metadata_resource(): Fixing inconsistencies in the link groups
Split nominal link group 0x1000000007619 into 2 link groups:
------------------------------------------------------------------------------
[Split link group 1]
`/Windows/winsxs/x86_microsoft-windows-security-digest_31bf3856ad364e35_6.1.7600.16385_none_3aa3a13ade08a93a/wdigest.dll'
`/Windows/System32/wdigest.dll'

[Split link group 2]
`/Windows/System32/DriverStore/FileRepository/tsprint.inf_x86_neutral_c48d421ad2c1e3e3/tsprint-PipelineConfig.xml'
`/Windows/winsxs/x86_tsprint.inf_31bf3856ad364e35_6.1.7601.17514_none_6dfd51f9a39171c2/tsprint-PipelineConfig.xml'
------------------------------------------------------------------------------

... followed by a lot more nominal link groups that were split. If there are more images in the WIM, you can check them as well (e.g. imagex info --metadata install.wim 2 | less).

Quote:
I cloned the repository, and tried to compile, but I got unexpected errors :


Seems to be a problem with libtool. Make sure you have a recent version of libtool, and try creating the file "build-aux/config.rpath" manually (it just needs to be an empty file). I also committed this file to the git repo. libntfs-3g is searched for using AC_CHECK_LIB([ntfs-3g], ...) so this should be correct. Make sure all the headers are present as well. If you still can't get the code to compile I can send you a distribution tarball.

Just last night I was testing the build on a two-year-old version of Ubuntu and fixed a couple of issues, although the older libntfs-3g available there wasn't compatible with the code that I cloned from security.c (which isn't really unexpected!). I am using the libntfs-3g dated 2012-1-15.


Wed Aug 29, 2012 16:21
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,

Quote:
Configure the library with "--enable-debug", then on the install.wim file run
imagex info --metadata install.wim 1 | less

Ok, I got it to compile and run on the Vista install files, resulting in an output file with 1,253,304 lines.
Quote:
Search for a line that says
[src/resource.c 1200] read_metadata_resource(): Fixing inconsistencies in the link groups

This line is present, but there is no line like :
Quote:
Split nominal link group 0x1000000007619 into 2 link groups

The word "Split" is not to be found anywhere (no "split" either).
There are 17347 non-null link groups, their ids do not look like inode ids, they are numbered from 0x746b617800000003 to 0x746b617800008b43 (with holes in the numbering).

Hope this helps.

Regards

Jean-Pierre


Wed Aug 29, 2012 20:01

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Quote:
Ok, I got it to compile and run on the Vista install files, resulting in an output file with 1,253,304 lines.


Great! I was actually worried that for Vista they might be using the "old" WIM format, which I don't support and don't have any examples of to work with. (There is, of course, no documentation regarding the difference between the formats.) I guess the older format was only used in the development releases of Vista.

Quote:
The word "Split" is not to be found anywhere (no "split" either).
There are 17347 non-null link groups, their ids do not look like inode ids, they are numbered from 0x746b617800000003 to 0x746b617800008b43 (with holes in the numbering).


Thanks, that's somewhat helpful. So the incorrect link groups are still only known to be an issue in the Windows 7 install.wim. I will keep the code though, since the same issue could show up in other WIMs.

I don't know where those numbers like 0x746b617800000003 came from, but it doesn't really matter. In my library, I just re-number the groups in order starting at 0x1, and that seems to work fine.
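
The renumbering mentioned above could look roughly like this. It is a sketch, not wimlib's code: the function name renumber_link_groups() is invented here, and treating an ID of 0 as "not in any link group" is an assumption based on the "non-null link groups" wording earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Map arbitrary 64-bit link group IDs to 1, 2, 3, ... in order of first
 * appearance. An old ID of 0 is assumed to mean "no link group" and maps
 * to 0. Results go to a separate array so that new small IDs cannot be
 * confused with old ones. Returns the number of distinct non-zero groups.
 * Quadratic search, for clarity only.
 */
uint64_t renumber_link_groups(const uint64_t *old_ids, uint64_t *new_ids,
                              size_t n)
{
    uint64_t next = 0;
    size_t i, j;

    assert((old_ids != NULL && new_ids != NULL) || n == 0);
    for (i = 0; i < n; i++) {
        new_ids[i] = 0;
        if (old_ids[i] == 0)
            continue;                      /* not part of any link group */
        for (j = 0; j < i; j++) {
            if (old_ids[j] == old_ids[i]) {
                new_ids[i] = new_ids[j];   /* group already numbered */
                break;
            }
        }
        if (new_ids[i] == 0)
            new_ids[i] = ++next;           /* first time this group is seen */
    }
    return next;
}
```

With this scheme the original values, such as 0x746b617800000003, never need to be interpreted; only equality between them matters.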

Thanks for all the help! Once I've done some more work on the NTFS capture code and the documentation I will submit some patches to NTFS-3g. Basically I need to have the four functions ntfs_inode_{set,get}_{attributes,security}().


Wed Aug 29, 2012 20:36
NTFS-3G Lead Developer

Joined: Tue Sep 04, 2007 17:22
Posts: 1286
Post Re: Applying WIM file to NTFS
Hi,
Quote:
Basically I need to have the four functions ntfs_inode_{set,get}_{attributes,security}().

Does the attached patch set suit your needs ?

Regards

Jean-Pierre


Attachments:
wim.patches.gz [2.09 KiB]
Downloaded 958 times
Thu Aug 30, 2012 13:31

Joined: Fri Aug 24, 2012 21:18
Posts: 30
Post Re: Applying WIM file to NTFS
Hi,

Overall the patch set looks pretty good! I've noticed a couple things though:

- For ntfs_get_inode_security(), having the psize argument as in ntfs_get_file_security() would be useful, so it's possible to find out how big the buffer for the security descriptor has to be.
- In ntfs_valid_descr(), should we allow the DACL (in addition to the SACL) to have no ACEs as well? I haven't noticed this issue, but this may be more consistent (unless a DACL must always contain at least one ACE).
- The inode should not be closed in ntfs_get_inode_attributes().
- In ntfs_get_inode_attributes(), attrib should be initialized to -1 in case EINVAL is returned before setting attrib.
- In ntfs_get_inode_security(), shouldn't the return value be the security ID, not the buffer size?
- In ntfs_get_inode_security(), errno is set to EINVAL if getsecurityattr() returns NULL, but this can happen because of an out of memory error, so errno should be preserved.
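
Two of the points above, the psize argument and preserving errno, can be illustrated with a stand-in getter. Everything in this sketch is hypothetical: get_descr() and get_descr_alloc() are not libntfs-3g functions, the "descriptor" is a dummy string, and the return conventions are assumptions chosen for the example rather than those of the patch set.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in data: NOT a real security descriptor. */
static const char fake_descr[] = "fake-security-descriptor";

/*
 * Hypothetical getter: the required size is always stored in *psize, and
 * the data is copied only when the caller's buffer is large enough.
 * Returns 0 on success, -1 with errno set on failure.
 */
int get_descr(char *buf, size_t buflen, size_t *psize)
{
    if (psize == NULL) {
        errno = EINVAL;
        return -1;
    }
    *psize = sizeof(fake_descr);
    if (buf != NULL && buflen >= sizeof(fake_descr))
        memcpy(buf, fake_descr, sizeof(fake_descr));
    return 0;
}

/*
 * Caller side: ask for the size, allocate, then fetch. On failure the
 * errno value of the call that failed is kept instead of being replaced
 * by a generic EINVAL.
 */
char *get_descr_alloc(size_t *size_ret)
{
    size_t size = 0;
    char *buf;

    assert(size_ret != NULL);
    if (get_descr(NULL, 0, &size) != 0)
        return NULL;              /* errno was set by get_descr() */
    buf = malloc(size);
    if (buf == NULL)
        return NULL;              /* errno was set by malloc() */
    if (get_descr(buf, size, &size) != 0) {
        int saved = errno;        /* free() may modify errno */
        free(buf);
        errno = saved;
        return NULL;
    }
    *size_ret = size;
    return buf;
}
```

The first call with a NULL buffer is what a psize-style argument makes possible: the caller learns the required size before allocating anything.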

That's all I noticed just from looking at the code. I'm going to make it possible to build wimlib using either NTFS-3g from 2012-1-15 or the new version with the patches; also I'm going to have to write some test cases for NTFS capture and apply so it's easier to make sure everything is working correctly.

Thanks!


Thu Aug 30, 2012 16:46