In the first part of this mini-series, I
demonstrated the ReportFault API
and why it didn't fit my needs on Vista. Last time around, I discussed
my
first attempt to use the new
Windows Error Reporting (WER) APIs instead, which failed to produce
any crash reports on Microsoft's Winqual site.
When the curtain fell last time, I had a WER test application which,
on the surface,
appeared to work, but didn't manage to get any crash
reports through to Winqual. Also, entries for crash reports produced
by this application looked a little funny in Vista's Problem History window:
In particular, the
Bucket ID value stands out. What are
bucket IDs? Essentially,
the Winqual site combines various attributes of the crash report (application,
signatures, crash address etc.) and creates a unique integer value from them,
which then becomes an identifier for this particular type of crash.
Every WER-induced crash report I submitted from Vista clients had a
bucket ID of 8, regardless of which test application I used and how exactly I provoked the
crash. Also, I knew from earlier, successful attempts to talk to the Winqual
servers what real bucket IDs usually look like (much larger integers).
Something fishy was going on here.
The application I tested was properly registered, signed and mapped
at the Winqual site, and crash reports submitted from XP systems made it to
the Winqual servers just fine. Hence, registration issues could be ruled
out. I posted to the
Windows Error Reporting forum
and asked for help and clarification. Saar Picker responded: "We filter out
unknown event types. Since your report is not of a recognized event type, it
is being rejected. The Bucket ID 8 event is reporting the rejection to us."
So my crash reports were
not of a recognized event type. What's a poor
crash report supposed to do to be recognized?
The first parameter for WerReportCreate is an event type. The documentation says:
"wzEventType - A pointer to a
Unicode string that specifies the name of the event." Hmmm, so maybe this is
the event type that Saar mentioned. If so, what kind of event are we talking
about? Win32 events? Events like the ones captured in the Windows event log?
None of those, as it turns out. Instead, error reporting servers can define
types of error events that they want to capture.
Microsoft's Winqual servers, for example, are configured to accept event types
which represent application or operating system crashes.
So what is the magic event type which represents an application crash?
Hint 1: The werapi.h header file defines an undocumented macro constant called APPCRASH_EVENT:
#define APPCRASH_EVENT L"APPCRASH"
Hint 2: When a crash report is submitted using WerReportSubmit, this API tries to
contact the error reporting server. In Vista, the protocol is based
on XML snippets which the client sends to the server via HTTP. One of
the attributes in the initial XML that is transmitted is called eventtype,
and for applications which do not try to handle fatal crashes themselves,
the value of that attribute is indeed "APPCRASH".
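Putting Saar's explanation and the two hints together, the server-side filtering can be pictured roughly as follows. This is only a mental model, not Winqual's actual implementation: the function names are hypothetical, and "APPCRASH" is the only accepted event type known from this article (a real server presumably accepts more).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the server-side filter; NOT Winqual's real code.
// Bucket 8 is the ID that every rejected report ended up with.
constexpr long kRejectionBucketId = 8;

// Event types the server is configured to accept. Only "APPCRASH" is
// known from the article; the rest of the list is an open assumption.
inline const std::set<std::wstring>& accepted_event_types() {
    static const std::set<std::wstring> types = { L"APPCRASH" };
    return types;
}

// Returns the bucket a report lands in: the bucket computed from the
// report's attributes if the event type is recognized, otherwise the
// catch-all rejection bucket.
long assign_bucket(const std::wstring& event_type, long computed_bucket) {
    assert(!event_type.empty());  // an empty event type is a caller bug
    if (accepted_event_types().count(event_type) == 0) {
        return kRejectionBucketId;
    }
    return computed_bucket;
}
```

Under this model, a report created with an arbitrary event type string always comes back with bucket ID 8, no matter what crashed or where, which matches what showed up in the Problem History window.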
So I modified my WER code to use "APPCRASH" instead of some arbitrary
string. And indeed, this made a difference, although not the one I had hoped for:
With the new event type, WerReportSubmit() now returned an error (E_FAIL), where it previously succeeded...
To debug the problem, I intercepted the XML exchange between the client
and the server, and looked at the differences between a non-WER client
and my own test code. (If you're interested in the interception details,
drop me a line.) The non-WER client transmitted
additional data (so-called "signature parameters"), and it also
specified a "report type" of 2 instead of 1. My strategy was to
eliminate the differences one by one by working the WER APIs.
The extra parameters sent by the non-WER client were things like the
application's name, version and timestamp; the faulting module's name,
version and timestamp; and the exception code and address offset.
And now, finally, I understood the purpose of the underdocumented WerReportSetParameter API:
depending on its setup, the server expects certain extra
parameters to reliably identify an event, and those can be set using WerReportSetParameter:
// Note: WerReportSetParameter takes wide strings, so this code assumes a
// UNICODE build in which TCHAR is wchar_t.
static void wer_report_set_parameters(HREPORT hReportHandle,
                                      EXCEPTION_POINTERS *exc_ptr)
{
    // Parameters 0-2: name, version and timestamp of the application
    TCHAR moduleName[1024];
    get_module_name(NULL, moduleName, _countof(moduleName));
    pWerReportSetParameter(hReportHandle, 0, L"Application Name", moduleName);

    TCHAR buffer[1024];
    get_module_file_version(moduleName, buffer, _countof(buffer));
    pWerReportSetParameter(hReportHandle, 1, L"Application Version", buffer);

    HMODULE hModule = GetModuleHandle(0);
    DWORD timeStamp = GetTimestampForLoadedLibrary(hModule);
    _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,
                 __T("%x"), timeStamp);
    pWerReportSetParameter(hReportHandle, 2, L"Application Timestamp", buffer);

    // Parameters 3-5: name, version and timestamp of the faulting module.
    // Determine the module from the crash address.
    moduleName[0] = 0;
    void *exceptionAddress = exc_ptr->ExceptionRecord->ExceptionAddress;
    if (GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                          GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                          (LPCTSTR)exceptionAddress, &hModule)) {
        get_module_name(hModule, moduleName, _countof(moduleName));
    }
    pWerReportSetParameter(hReportHandle, 3, L"Fault Module Name", moduleName);

    get_module_file_version(moduleName, buffer, _countof(buffer));
    pWerReportSetParameter(hReportHandle, 4, L"Fault Module Version", buffer);

    timeStamp = GetTimestampForLoadedLibrary(hModule);
    _sntprintf_s(buffer, _countof(buffer), _TRUNCATE, __T("%x"), timeStamp);
    pWerReportSetParameter(hReportHandle, 5, L"Fault Module Timestamp", buffer);

    // Parameter 6: the exception code, e.g. c0000005 for an access violation
    _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,
                 __T("%08x"), exc_ptr->ExceptionRecord->ExceptionCode);
    pWerReportSetParameter(hReportHandle, 6, L"Exception Code", buffer);

    // Parameter 7: offset of the crash address from the module's base address
    INT_PTR offset = (char *)exceptionAddress - (char *)hModule;
    _sntprintf_s(buffer, _countof(buffer), _TRUNCATE, __T("%p"), (void *)offset);
    pWerReportSetParameter(hReportHandle, 7, L"Exception Offset", buffer);
}
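To make the eight parameter slots easier to see at a glance, here is a minimal sketch that separates the formatting from the Win32 calls, so it can be compiled and tested anywhere. The CrashInfo struct, its field names, and the helpers to_hex and build_signature are hypothetical and not part of the demo code. One deliberate difference: the offset is formatted as plain hex rather than with %p, so the exact string differs from what the function above produces.

```cpp
#include <cassert>
#include <cstdint>
#include <cwchar>
#include <string>
#include <vector>

// Plain-data description of a crash. All names here are illustrative.
struct CrashInfo {
    std::wstring   app_name, app_version;
    std::uint32_t  app_timestamp;
    std::wstring   module_name, module_version;
    std::uint32_t  module_timestamp;
    std::uint32_t  exception_code;
    std::uintptr_t module_base;        // load address of the faulting module
    std::uintptr_t exception_address;  // address at which the crash occurred
};

struct SignatureParam {
    std::wstring name;
    std::wstring value;
};

// Formats v as lower-case hex, zero-padded to at least min_digits digits.
std::wstring to_hex(unsigned long long v, int min_digits = 1) {
    wchar_t buf[32];
    std::swprintf(buf, sizeof buf / sizeof buf[0], L"%0*llx", min_digits, v);
    return buf;
}

// Builds the signature; the vector index is the WER parameter ID (0..7).
std::vector<SignatureParam> build_signature(const CrashInfo& c) {
    assert(c.exception_address >= c.module_base);
    return {
        { L"Application Name",       c.app_name },
        { L"Application Version",    c.app_version },
        { L"Application Timestamp",  to_hex(c.app_timestamp) },
        { L"Fault Module Name",      c.module_name },
        { L"Fault Module Version",   c.module_version },
        { L"Fault Module Timestamp", to_hex(c.module_timestamp) },
        { L"Exception Code",         to_hex(c.exception_code, 8) },
        { L"Exception Offset",       to_hex(c.exception_address - c.module_base) },
    };
}
```

On Windows, each entry of the returned vector would be passed to WerReportSetParameter with its index as the parameter ID.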
The other significant change was to use the undocumented WerReportApplicationCrash constant
as the "report type" parameter for WerReportCreate. After these changes,
the Winqual servers finally started talking to me: I received bucket IDs, sometimes also
requests to transmit minidump data - and after a few days, the crash reports appeared
on the Winqual site! Whoopee!
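The three differences that had to be eliminated can be summed up as a small checklist. The sketch below is only a summary of the findings in this post, not a WER API: ReportSetup and differences_from_os_report are hypothetical names, the enum mirrors the first values of WER_REPORT_TYPE from werapi.h (which is where the 1 and the 2 in the intercepted XML come from), and the expected count of eight parameters is simply what wer_report_set_parameters sets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the first values of WER_REPORT_TYPE in werapi.h.
enum class ReportType { NonCritical = 0, Critical = 1, ApplicationCrash = 2 };

// What a client passes to WerReportCreate / WerReportSetParameter.
struct ReportSetup {
    std::wstring event_type;        // first argument of WerReportCreate
    ReportType   report_type;       // second argument of WerReportCreate
    std::size_t  signature_params;  // number of WerReportSetParameter calls
};

// Lists how a setup differs from the report the OS itself sends for a crash.
std::vector<std::string> differences_from_os_report(const ReportSetup& s) {
    // WER defines parameter slots WER_P0 through WER_P9, i.e. at most ten.
    assert(s.signature_params <= 10);

    std::vector<std::string> diffs;
    if (s.event_type != L"APPCRASH") {
        diffs.push_back("event type is not APPCRASH");
    }
    if (s.report_type != ReportType::ApplicationCrash) {
        diffs.push_back("report type is not 2 (WerReportApplicationCrash)");
    }
    if (s.signature_params != 8) {
        diffs.push_back("signature does not have the 8 expected parameters");
    }
    return diffs;
}
```

A setup like the first attempt (arbitrary event type, report type 1, assuming no signature parameters were set) reports all three differences; the final version reports none.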
The full demo code is
attached. To build, open
a Visual Studio command prompt and run the compiler:
cl werapitest.cpp
My special thanks to Saar Picker and Jason Hardester at Microsoft for their help!
Now that I've achieved my original goal (reporting crashes using the WER APIs
under Vista), let me spoil the fun by warning you never to use this approach. Why?
Because this is clearly not the way Microsoft recommends handling application
crashes. Now, while I'm not sure whether Microsoft as a whole has an official
recommendation, the documentation and the postings in newsgroups and blogs clearly
suggest that an application shouldn't even try to handle a crash
explicitly - instead, it should just crash and let the OS do the reporting.
The basic rationale behind this is that an application is probably already
deeply confused when a crash occurs, and some of its data may already have been
damaged. This makes crash recovery a difficult and unreliable endeavor.
There are circumstances where an application needs to keep control of the reporting
process, but Microsoft expects such cases to be very rare. Which explains
a lot of the initial communication disconnects that I experienced while discussing
my case with Saar and Jason.
There's a reason why it's called "WER" (Windows Error Reporting)
and not "WCR" (Windows Crash Reporting). Apparently, Microsoft doesn't
expect us to use those APIs for crash reporting, but rather for more generic
"error" or "event" reporting. For example,
this U.S. patent claim
discusses how the WER APIs can be used to report failures in handwriting recognition.
(By the way, there's also a patent for WER itself, see
http://www.freepatentsonline.com/20060271591.html.)