#define private public Crashing with style on Vista, part II (25 Jun 2007)

Crashing with style on Vista, part II (25 Jun 2007)

In the first part of this mini-series, I demonstrated the ReportFault API and why it didn't fit my needs on Vista. Last time around, I discussed my first attempt to use the new Windows Error Reporting (WER) APIs instead, which failed to produce any crash reports on Microsoft's Winqual site.

When the curtain fell last time, I had a WER test application which, on the surface, appeared to work, but didn't manage to get any crash reports through to Winqual. Also, entries for crash reports produced by this application looked a little funny in Vista's Problem History window:

In particular, the Bucket ID value stands out. What are bucket IDs? Essentially, the Winqual site combines various attributes of the crash report (application, signatures, crash address etc.) and creates a unique integer value from them, which then becomes an identifier for this particular type of crash.

All my WER-induced crash reports submitted from Vista clients always had a bucket ID of 8, regardless of which test application I used and how exactly I provoked the crash. Also, I knew from earlier, successful attempts to talk to the Winqual servers how real bucket IDs usually look like (much larger integers). Something fishy was going on here.

The application I tested was properly registered, signed and mapped at the Winqual site, and crash reports submitted from XP systems made it to the Winqual servers just fine. Hence, registration issues could be ruled out. I posted to the Windows Error Reporting forum and asked for help and clarification. Saar Picker responded: "We filter out unknown event types. Since your report is not of a recognized event type, it is being rejected. The Bucket ID 8 event is reporting the rejection to us."

So my crash reports were not of a recognized event type. What's a poor crash report supposed to do to be recognized?

The first parameter for WerReportCreate is an event type. The documentations says: "wzEventType - A pointer to a Unicode string that specifies the name of the event." Hmmm, so maybe this is the event type that Saar mentioned. If so, what kind of event are we talking about? Win32 events? Events like the ones captured in the Windows event log?

None of those, as it turns out. Instead, error reporting servers can define types of error events that they want to capture. Microsoft's Winqual servers, for example, are configured to accept event types which represent application or operating system crashes.

So what is the magic event type which represents an application crash?

Hint 1: The werapi.h header file defines an undocumented macro constant called APPCRASH_EVENT.

  #define APPCRASH_EVENT L"APPCRASH"

Hint 2: When a crash report is submitted using WerReportSubmit, this API tries to contact the error reporting server. In Vista, the protocol is based on XML snippets which the client sends to the server via HTTP. One of the attributes in the initial XML that is transmitted is called eventtype, and for applications which do not try to handle fatal crashes themselves, the value of that attribute is indeed "APPCRASH".

So I modified my WER code to use "APPCRASH" instead of some arbitrary string. And indeed, this made a difference, although not the one I had hoped for: With the new event type, WerReportSubmit() now returned an error (E_FAIL), where it previously succeeded...

To debug the problem, I intercepted the XML exchange between the client and the server, and looked at the differences between a non-WER client and my own test code. (If you're interested in the interception details, drop me a line.) The non-WER client transmitted additional data (so-called "signature parameters"), and it also specified a "report type" of 2 instead of 1. My strategy was eliminate the differences one by one by working the WER APIs.

The extra parameters sent by the non-WER client were things like the application's name, version and timestamp; the faulting module's name, version and typestamp; and the exception code and address offset. And now, finally, I understood the purpose of the underdocumented WerReportSetParameter API - depending on the server's setup, it expects certain extra parameters to safely identify an event, and those can be set using WerReportSetParameter:

static void wer_report_set_parameters(HREPORT hReportHandle,
                                      EXCEPTION_POINTERS *exc_ptr)
{
  TCHAR moduleName[1024];
  get_module_name(NULL, moduleName, _countof(moduleName));
  pWerReportSetParameter(hReportHandle, 0, L"Application Name", moduleName);

  TCHAR buffer[1024];
  get_module_file_version(moduleName, buffer, _countof(buffer));
  pWerReportSetParameter(hReportHandle, 1, L"Application Version", buffer);

  HMODULE hModule = GetModuleHandle(0);
  DWORD timeStamp = GetTimestampForLoadedLibrary(hModule);
  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,
    __T("%x"), timeStamp);
  pWerReportSetParameter(hReportHandle, 2, L"Application Timestamp", buffer);

  // determine module name from crash address
  moduleName[0] = 0;
  void *exceptionAddress = exc_ptr->ExceptionRecord->ExceptionAddress;
  if (GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
    GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
    (LPCTSTR)exceptionAddress, &hModule)) {
    get_module_name(hModule, moduleName, _countof(moduleName));
  }
  pWerReportSetParameter(hReportHandle, 3, L"Fault Module Name", moduleName);

  get_module_file_version(moduleName, buffer, _countof(buffer));
  pWerReportSetParameter(hReportHandle, 4, L"Fault Module Version", buffer);

  timeStamp = GetTimestampForLoadedLibrary(hModule);
  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,__T("%x"), timeStamp);
  pWerReportSetParameter(hReportHandle, 5, L"Fault Module Timestamp", buffer);

  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,
    __T("%08x"), exc_ptr->ExceptionRecord->ExceptionCode);
  pWerReportSetParameter(hReportHandle, 6, L"Exception Code", buffer);

  INT_PTR offset = (char *)exceptionAddress - (char *)hModule;
  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE, __T("%p"), offset);
  pWerReportSetParameter(hReportHandle, 7, L"Exception Offset", buffer);
}

The other significant change was to use the undocumented WerReportApplicationCrash constant as the "report type" parameter for WerReportCreate. After these changes, the Winqual servers finally started talking to me: I received bucket IDs, sometimes also requests to transmit minidump data - and after a few days, the crash reports appeared on the Winqual site! Whoopee!

The full demo code is attached. To build, open a Visual Studio command prompt and run the compiler:

  cl werapitest.cpp

My special thanks to Saar Picker and Jason Hardester at Microsoft for their help!

Now that I've achieved my original goal (reporting crashes using the WER APIs under Vista), let me spoil the fun by warning you to ever use this approach. Why? Because this is clearly not the way Microsoft recommends to handle application crashes. Now, while I'm not sure whether Microsoft as a whole has an official recommendation, the documentation or the postings in newsgroups in blogs clearly suggest that an application shouldn't actually even try to handle a crash explicitly - instead, it should just crash and let the OS do the reporting. The basic rationale behind this is that an application is probably already deeply confused when a crash occurs, and some of its data may already have been damaged. This makes crash recovery a difficult and unreliable endeavor.

There are circumstances where an application needs to keep control of the reporting process, but Microsoft expects such cases to be very rare. Which explains a lot of the initial communication disconnects that I experienced while discussing my case with Saar and Jason.

There's a reason why it's called "WER" (Windows Error Reporting) and not "WCR" (Windows Crash Reporting). Apparently, Microsoft doesn't expect us to use those APIs for crash reporting, but rather for more generic "error" or "event" reporting. For example, this U.S. patent claim discusses how the WER APIs can be used to report failures in handwriting recognition. (By the way, there's also a patent for WER itself, see http://www.freepatentsonline.com/20060271591.html.)

Attachment	Action	Size	Date	Who	Comment
werapitest2.zip	manage	6.5 K	25 Jun 2007 - 10:41	ClausBrod	werapitest demo code, revised version

Revision: r1.7 - 27 Jun 2007 - 06:09 - ClausBrod

Blog > DefinePrivatePublic20070625