Wednesday, October 03, 2007

A painful Vista

My Dell Latitude D830 came with Windows Vista which, for the most part, has seemed like a prettied-up XP without much useful added functionality or a substantial increase in stability. At the time I upgraded, I wrote that I was so close to buying a MacBook Pro. I am sorry to say that I regret not doing so even more today.

What got me to that stage? Well, it's a long painful path and to be honest, I'm not at the stage (yet) that I'm ready to just give in and replace my fairly new laptop.

The problems all started about 2 months after receiving the laptop. On July 14th, the day before I left for a trip to Shanghai, my email program (Thunderbird) locked up. When I restarted it, it still had problems and wouldn't pull mail from the server, so I shut down and restarted the computer.

During the reboot, the OS decided a chkdsk of the NTFS filesystem was necessary and it found and fixed many problems. When I got back to the running OS, all of the files that were actively open at the time I cleanly shut down the system were gone. Totally gone. Not even in the found.* directories (the NTFS equivalent of the UNIX lost+found directory).

Luckily I had the data backed up earlier that day as well as an offline backup on an external drive from a week before that. Since I was taking off for Shanghai, I copied both backups onto my system so I could pick the files I needed (I wasn't sure whether the backup from earlier that day was corrupt as well).

I was able to get up and running again on my way to Shanghai without a problem and things were working fine. I assumed it was just some freak accident.

About a month later (mid-August) the same thing happened. This time I dug into it further and found that there had been a series of events in my event log (both then and back in July). It seems the problem starts with an NTFS event (which is flagged as an "Error" rather than a "Critical" event) with the event code of 137. The message from the event was extremely helpful... NOT!:

The default transaction resource manager on volume D: encountered a non-retryable error and could not start. The data contains the error code.
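For anyone who wants to look for the same event on their own machine, here is a minimal sketch in Python that builds a wevtutil query for it. The source name "Ntfs" and event ID 137 come from the event above; the "System" log name and the helper names are assumptions for illustration. The script only builds the command, which would then need to be run on Vista or later.

```python
def ntfs_event_query(event_id=137, provider="Ntfs"):
    """Build an XPath filter matching one event ID from one event source."""
    return f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"


def wevtutil_command(log="System", count=20):
    """Argument list for wevtutil: newest matching events first, as plain text.

    The "System" log is an assumption; adjust it if the events live elsewhere.
    """
    return [
        "wevtutil", "qe", log,
        f"/q:{ntfs_event_query()}",
        "/f:text",        # human-readable output
        f"/c:{count}",    # at most this many events
        "/rd:true",       # reverse direction: newest first
    ]


# Show the arguments; on Windows they could be passed to subprocess.run().
print(wevtutil_command())
```

Looking at the timestamps of the matching events against the time of the crash should show how much warning the event gives.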

Microsoft's online help for the event was no help:

Results for: Microsoft product: Windows Operating System; Version: 6.0.6000.16386; ID: 137; Event Source: Ntfs;

No results were found for your query. Please see Search Help for suggestions.

Googling "Default transaction resource manager" turned up few results as well, but there was at least a possible link to someone else's problem. Apparently some people had discovered that Acronis True Image led to similar problems. I had installed Acronis Disk Director to reorganize my disk partitions, so I uninstalled it to see if that would alleviate the problem. And, of course, I went through the same restoration process to get back all the lost files.

I did find one interesting discussion on resource managers in Vista, but that didn't provide any information that would help solve my problem.

Given that the error message just showed up in the event log (and in both cases, was close to 24 hours before the system crashed -- allowing me to open/use many files that disappeared), I added an event alert task which would send a message to the console should this error occur again. This is really important so that you can catch the problem as it starts, minimizing the potential damage.
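One way to set up a similar alert from the command line (not necessarily how the task above was created) is an event-triggered scheduled task. The sketch below, in Python, only builds the schtasks command. The task name, the "System" log name, and the use of msg to display the warning are assumptions for illustration.

```python
def alert_task_command(task_name="Ntfs137Alert", event_id=137,
                       provider="Ntfs", log="System",
                       action="msg * NTFS event 137 logged, back up open files now"):
    """Argument list for schtasks that creates a task fired by a matching event.

    task_name, log and action are hypothetical defaults, not values from the post.
    """
    query = f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # name of the new task
        "/SC", "ONEVENT",   # trigger on an event rather than on a schedule
        "/EC", log,         # event log (channel) to watch
        "/MO", query,       # XPath filter selecting the event
        "/TR", action,      # command to run when the event is logged
    ]


# Show the arguments; on Windows they could be passed to subprocess.run()
# from an elevated prompt.
print(alert_task_command())
```

The action could just as well be a script that kicks off a backup, since the point is to get some warning before files start disappearing.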

Things went well for about another month and then it happened again in mid-September, so it clearly wasn't the Acronis product. I was busy getting some heavy work done, so I didn't have the time to explore the problem other than to restore the files again.

About a week later, it happened again. This time it turned into an almost daily problem, sometimes recurring just after I had run chkdsk and fixed things.

The pain had passed the threshold and I decided to do a total reinstall of the system. Prior to doing that, I did a complete backup. I copied my data files to a portable drive. I ran the extensive system diagnostics including the full suite of hard disk diagnostics to see if there was some form of a hardware problem. All diagnostics passed.

So, this past weekend, I reinstalled Vista. I've been installing each of my former tools (there are many of them) and so far, so good. Given that the problem didn't turn up the first time until I had had the computer for about 60 days, I guess I won't know for sure if I've gotten around it till early Dec.

And, of course, I went and added the event task to generate a message should this occur again.

Wish me luck!


6 comments:

Anonymous said...

Oh!!! But Vista uses that TPM that Intel so loves! So in a sense y'all deserve each other... Sorry

Conor P. Cahill said...

As far as I can tell, Vista only uses the TPM as its root of trust & security store for BitLocker -- a somewhat limited disk encryption methodology that will only protect your system (boot) drive. Since I keep all my data on my data drive, it doesn't offer me much value.

I don't understand the implied issue with the TPM. Seems to be a useful capability, though I'm just exploring how/where I can make use of it.

Zarko said...

Hi Conor,

I am trying to add Alert Task on Vista that will trigger batch file to run. The problem is my batch file won't run.
Now, I am sure that the triggering event does occur, but somehow I misconfigured the Alert Task so it doesn't run the batch file.
What I wrote at text box for "Run this task when alert is triggered" is C:\Users\zacimovic\Desktop\sutdown.bat
Other text boxes I left blank. Also I added Alert Action to trigger monitoring certain perf counters. When my event does occur, monitoring of counters does begin, but my shutdown.bat file is not run. Do you have any idea what should I write in those other fields at Alert Task dialog? These fields are: "Run this task when alert is triggered", "Task Arguments", "Task argument user text", "Example task argument". My batch script does work on WinXP and when run manually.

Anonymous said...

See this thread from the Acronis True Image forum:

http://www.wilderssecurity.com/showthread.php?t=175737&page=2&highlight=transaction+resource+manager

QUOTE:

October 30th, 2007, 05:35 PM
CGAllred_MSFT

Re: Acronis On Vista Blocks Task Creation

--------------------------------------------------------------------------------

Hopefully somebody is still monitoring this thread; I have some information that you might like to hear.

My name is Christian Allred and I'm a developer at Microsoft on Transactional NTFS (TxF). TxF is the component responsible for the transactional resource managers mentioned in the NTFS 137 events. I just wanted to let you guys know that we've fixed this issue for Vista service pack 1 and Windows Server 2008.

Acronis provided us a copy of TrueImage, and we were able to determine that the issue is a bug in an NTFS routine that handles the FSCTL_LOCK_VOLUME control code. The bug was in the mechanism that we use to get TxF resource managers out of the way to allow a LOCK_VOLUME request to work. The result of the bug was that if somebody used an asynchronous handle to issue FSCTL_LOCK_VOLUME, we'd erroneously keep a flag set that would prevent us from starting TxF resource managers again on the volume until you dismounted/remounted the volume. It turns out that some backup and defrag products, including TrueImage, issue FSCTL_LOCK_VOLUME on async handles, triggering the bug.

The reason that putting a page file on a volume prevents the bug (and causes the "Analyzing partitions" part to move faster, as mwang noted) is that TrueImage doesn't bother to try locking the volume, since it knows already that the lock will fail in the presence of a page file.

The info you guys provided in this thread was quite useful to us in narrowing down the cause and figuring out how to fix this. Thanks.
__________________
Christian Allred
Software Development Engineer
Microsoft Corp.

UNQUOTE

Conor P. Cahill said...

Thanks for the link. That is an interesting article and an especially clear description from Microsoft.

It's also interesting that none of the people in that thread complained about the corrupt filesystem problems I experienced. Perhaps they did not let their systems run long enough to experience that problem.

In any case, my re-install so far seems to have done the job as I'm now at the 3 month point and so far no similar problems with corrupted file systems. I do have to admit that I haven't installed all of the tools I had prior to the reinstall, so it's possible I could run into it again (and hence why I have been in no rush to install them).

Conor

NW Professional Tax said...

I bought my mother a computer for her birthday last year and I was slightly worried about it because it had the Vista operating system and I had heard so many bad things about it, such as blue screen errors and random crashing. I actually considered reformatting her computer with XP Pro if she started having troubles, but to date she hasn't had any problems.