Think globally, dump locally (27 Jun 2007)

localdump.png

These days, I spend quite some time in Microsoft's Windows Error Reporting forum, which is where David Ching, who is a Microsoft MVP, posed an interesting problem this week.

On Vista, Windows Error Reporting creates and transmits minidump files only if the WER servers request them. At least this seems to be the default behavior which both David and I have observed on Vista systems. As far as I can tell, the servers never request minidumps for applications which have not been explicitly registered with and mapped at Winqual. David, however, wanted to make sure that whenever an application crashes, a minidump file is generated which the user or tester can then send directly to the developers of the application for analysis, even if Microsoft's WER servers never actually ask for it.

My first idea was to force the system into queuing mode. When crash reports are queued, minidumps are always generated and stored locally, so that they can be transmitted to the error reporting server later on. Queuing is enabled by setting HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\ForceQueue (DWORD) to 1. (See WER Settings for documentation on this and other WER-related registry keys.) Crash report data will be stored in directories such as c:\Users\someusername\AppData\Local\temp and C:\ProgramData\Microsoft\Windows\WER\ReportQueue.

That works, but it also suppresses the WER UI, which isn't ideal either. Isn't there some way to have the cake and eat it, too?

Let's see: A variation of the above approach is to disable the Internet connection before the crash occurs. You'll get the dialogs, but WER won't be able to connect to the Microsoft servers, and so it should then also queue the crash information. Alternatively, and this is something that I have tried myself a few times, you could set HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\Windows Error Reporting\CorporateWERServer (string) to the name of some non-existing system. When a crash occurs, WER will try to contact that server, find that it's not responding, and then store all crash data locally so that it can be re-sent when the connection is later established.

Or you could go all the way and actually install such a Corporate Error Reporting server on one of your systems. Probably one of the best solutions, since this gives you direct access to minidump files within your organization.

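But this blog isn't about IT, it's about hacking and coding ;-) Here's an idea how David's goals could be accomplished without implementing a full-blown crash handler:

  • Install a top-level exception filter using SetUnhandledExceptionFilter().
  • In the filter, write a minidump file using MiniDumpWriteDump().
  • Return EXCEPTION_CONTINUE_SEARCH from the filter so that the default handling of the crash still takes place.

And here's the demo code which demonstrates this technique: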

// Demo program using SetUnhandledExceptionFilter() and
// MiniDumpWriteDump().
//
// Claus Brod, http://www.clausbrod.de/Blog

#include <windows.h>
#include <DbgHelp.h>
#pragma comment(lib, "DbgHelp.lib")
#include <stdio.h>

static LONG WINAPI myfilter(_EXCEPTION_POINTERS *exc_ptr)
{
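  static const char *minidumpFilename = "myminidump.mdmp";
  // The file name is a narrow string, so call the ANSI variant explicitly;
  // this also compiles when UNICODE is defined.
  HANDLE hDumpFile = CreateFileA(minidumpFilename, GENERIC_WRITE, 0, NULL,
    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);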

  if (hDumpFile != INVALID_HANDLE_VALUE) {
    __try {
      MINIDUMP_EXCEPTION_INFORMATION exceptionInfo;
      exceptionInfo.ThreadId = GetCurrentThreadId();
      exceptionInfo.ExceptionPointers = exc_ptr;
      exceptionInfo.ClientPointers = FALSE;

      BOOL ret = MiniDumpWriteDump(GetCurrentProcess(),
        GetCurrentProcessId(), hDumpFile, MiniDumpNormal, &exceptionInfo, NULL, NULL);
      if (ret) {
        printf("Minidump information has been written to %s.\n", minidumpFilename);
      }
    } __except(EXCEPTION_EXECUTE_HANDLER) { }
    CloseHandle(hDumpFile);
  }

  return EXCEPTION_CONTINUE_SEARCH;
}

static int wedding_crasher(int *pp)
{
  *pp = 42;
  return 42;
}

int main(void)
{
  SetUnhandledExceptionFilter(myfilter);

  wedding_crasher(0);
  return 0;
}
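The demo always writes to the same file, myminidump.mdmp, so a second crash overwrites the dump from the first one. As a minimal sketch of one way around this, the helper below builds a file name from an application name, a process ID and a timestamp. make_dump_filename() and its naming scheme are assumptions made for illustration; they are not part of the demo program above.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper (not part of the demo program): builds a name such as
// "<app>_<pid>_<YYYYMMDD>_<HHMMSS>.mdmp".
std::string make_dump_filename(const std::string &appName,
                               unsigned long processId,
                               const std::tm &when)
{
  assert(!appName.empty());

  // Sortable timestamp without characters that are illegal in file names.
  char stamp[32];
  if (std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", &when) == 0) {
    stamp[0] = '\0';
  }

  char pid[32];
  std::snprintf(pid, sizeof(pid), "%lu", processId);

  return appName + "_" + pid + "_" + stamp + ".mdmp";
}
```

Since the heap may already be damaged when the exception filter runs, it is probably safer to compute the name once at program startup and only hand the finished string to the filter. The process ID and the startup time should be enough to keep file names apart.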

And finally, here's a really weird idea from Dmitry Vostokov: Resurrecting Dr. Watson on Vista ;-) If you're into exception handling and crash analysis, Dmitry's http://www.dumpanalysis.org/ web site is a fantastic resource. This guy lives in an exception filter :-D


Crashing with style on Vista, part II (25 Jun 2007)

In the first part of this mini-series, I demonstrated the ReportFault API and why it didn't fit my needs on Vista. Last time around, I discussed my first attempt to use the new Windows Error Reporting (WER) APIs instead, which failed to produce any crash reports on Microsoft's Winqual site.

When the curtain fell last time, I had a WER test application which, on the surface, appeared to work, but didn't manage to get any crash reports through to Winqual. Also, entries for crash reports produced by this application looked a little funny in Vista's Problem History window:


In particular, the Bucket ID value stands out. What are bucket IDs? Essentially, the Winqual site combines various attributes of the crash report (application, signatures, crash address etc.) and creates a unique integer value from them, which then becomes an identifier for this particular type of crash.

All my WER-induced crash reports submitted from Vista clients had a bucket ID of 8, regardless of which test application I used and how exactly I provoked the crash. Also, I knew from earlier, successful attempts to talk to the Winqual servers what real bucket IDs usually look like (much larger integers). Something fishy was going on here.

The application I tested was properly registered, signed and mapped at the Winqual site, and crash reports submitted from XP systems made it to the Winqual servers just fine. Hence, registration issues could be ruled out. I posted to the Windows Error Reporting forum and asked for help and clarification. Saar Picker responded: "We filter out unknown event types. Since your report is not of a recognized event type, it is being rejected. The Bucket ID 8 event is reporting the rejection to us."

So my crash reports were not of a recognized event type. What's a poor crash report supposed to do to be recognized?

The first parameter for WerReportCreate is an event type. The documentation says: "wzEventType - A pointer to a Unicode string that specifies the name of the event." Hmmm, so maybe this is the event type that Saar mentioned. If so, what kind of event are we talking about? Win32 events? Events like the ones captured in the Windows event log?

None of those, as it turns out. Instead, error reporting servers can define types of error events that they want to capture. Microsoft's Winqual servers, for example, are configured to accept event types which represent application or operating system crashes.

So what is the magic event type which represents an application crash?

Hint 1: The werapi.h header file defines an undocumented macro constant called APPCRASH_EVENT.

  #define APPCRASH_EVENT L"APPCRASH"

Hint 2: When a crash report is submitted using WerReportSubmit, this API tries to contact the error reporting server. In Vista, the protocol is based on XML snippets which the client sends to the server via HTTP. One of the attributes in the initial XML that is transmitted is called eventtype, and for applications which do not try to handle fatal crashes themselves, the value of that attribute is indeed "APPCRASH".

So I modified my WER code to use "APPCRASH" instead of some arbitrary string. And indeed, this made a difference, although not the one I had hoped for: With the new event type, WerReportSubmit() now returned an error (E_FAIL), where it previously succeeded...

To debug the problem, I intercepted the XML exchange between the client and the server, and looked at the differences between a non-WER client and my own test code. (If you're interested in the interception details, drop me a line.) The non-WER client transmitted additional data (so-called "signature parameters"), and it also specified a "report type" of 2 instead of 1. So my strategy was to eliminate the differences one by one by working the WER APIs.

The extra parameters sent by the non-WER client were things like the application's name, version and timestamp; the faulting module's name, version and timestamp; and the exception code and address offset. And now, finally, I understood the purpose of the underdocumented WerReportSetParameter API - depending on the server's setup, it expects certain extra parameters to safely identify an event, and those can be set using WerReportSetParameter:

static void wer_report_set_parameters(HREPORT hReportHandle,
                                      EXCEPTION_POINTERS *exc_ptr)
{
  TCHAR moduleName[1024];
  get_module_name(NULL, moduleName, _countof(moduleName));
  pWerReportSetParameter(hReportHandle, 0, L"Application Name", moduleName);

  TCHAR buffer[1024];
  get_module_file_version(moduleName, buffer, _countof(buffer));
  pWerReportSetParameter(hReportHandle, 1, L"Application Version", buffer);

  HMODULE hModule = GetModuleHandle(0);
  DWORD timeStamp = GetTimestampForLoadedLibrary(hModule);
  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,
    __T("%x"), timeStamp);
  pWerReportSetParameter(hReportHandle, 2, L"Application Timestamp", buffer);

  // determine module name from crash address
  moduleName[0] = 0;
  void *exceptionAddress = exc_ptr->ExceptionRecord->ExceptionAddress;
  if (GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
    GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
    (LPCTSTR)exceptionAddress, &hModule)) {
    get_module_name(hModule, moduleName, _countof(moduleName));
  }
  pWerReportSetParameter(hReportHandle, 3, L"Fault Module Name", moduleName);

  get_module_file_version(moduleName, buffer, _countof(buffer));
  pWerReportSetParameter(hReportHandle, 4, L"Fault Module Version", buffer);

  timeStamp = GetTimestampForLoadedLibrary(hModule);
  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,__T("%x"), timeStamp);
  pWerReportSetParameter(hReportHandle, 5, L"Fault Module Timestamp", buffer);

  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE,
    __T("%08x"), exc_ptr->ExceptionRecord->ExceptionCode);
  pWerReportSetParameter(hReportHandle, 6, L"Exception Code", buffer);

  INT_PTR offset = (char *)exceptionAddress - (char *)hModule;
  // %p expects a pointer argument, so cast the offset accordingly
  _sntprintf_s(buffer, _countof(buffer), _TRUNCATE, __T("%p"), (void *)offset);
  pWerReportSetParameter(hReportHandle, 7, L"Exception Offset", buffer);
}
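The numeric parameters in the listing above boil down to hex formatting plus one subtraction: since a module handle is the address at which the module is loaded, the exception offset is simply the faulting address minus the module handle. The following portable sketch shows only that arithmetic. The function names format_hex32() and format_fault_offset() are made up for illustration, and the exact formatting which the Winqual servers expect is an assumption (the listing above uses %x, %08x and %p).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 32-bit value (a timestamp or an exception
// code) as zero-padded lowercase hex.
std::string format_hex32(std::uint32_t value)
{
  char buffer[16];
  std::snprintf(buffer, sizeof(buffer), "%08x", static_cast<unsigned>(value));
  return buffer;
}

// Hypothetical helper: offset of the faulting address relative to the load
// address of the module which contains it, formatted as hex.
std::string format_fault_offset(std::uintptr_t faultAddress,
                                std::uintptr_t moduleBase)
{
  // The faulting address must lie inside the module, hence above its base.
  assert(faultAddress >= moduleBase);

  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%llx",
                static_cast<unsigned long long>(faultAddress - moduleBase));
  return buffer;
}
```

For an access violation (exception code 0xC0000005) at address 0x00401234 in a module loaded at 0x00400000, these helpers would produce "c0000005" and "1234".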

The other significant change was to use the undocumented WerReportApplicationCrash constant as the "report type" parameter for WerReportCreate. After these changes, the Winqual servers finally started talking to me: I received bucket IDs, sometimes also requests to transmit minidump data - and after a few days, the crash reports appeared on the Winqual site! Whoopee!

The full demo code is attached. To build, open a Visual Studio command prompt and run the compiler:

  cl werapitest.cpp

My special thanks to Saar Picker and Jason Hardester at Microsoft for their help!

Now that I've achieved my original goal (reporting crashes using the WER APIs under Vista), let me spoil the fun by warning you never to use this approach. Why? Because this is clearly not the way Microsoft recommends to handle application crashes. Now, while I'm not sure whether Microsoft as a whole has an official recommendation, the documentation and the postings in newsgroups and blogs clearly suggest that an application shouldn't even try to handle a crash explicitly - instead, it should just crash and let the OS do the reporting. The basic rationale behind this is that an application is probably already deeply confused when a crash occurs, and some of its data may already have been damaged. This makes crash recovery a difficult and unreliable endeavor.

There are circumstances where an application needs to keep control of the reporting process, but Microsoft expects such cases to be very rare. Which explains a lot of the initial communication disconnects that I experienced while discussing my case with Saar and Jason.

There's a reason why it's called "WER" (Windows Error Reporting) and not "WCR" (Windows Crash Reporting). Apparently, Microsoft doesn't expect us to use those APIs for crash reporting, but rather for more generic "error" or "event" reporting. For example, this U.S. patent claim discusses how the WER APIs can be used to report failures in handwriting recognition. (By the way, there's also a patent for WER itself, see http://www.freepatentsonline.com/20060271591.html.)


Crashing with style on Vista (18 Jun 2007)

A few days ago, I wrote about the peculiarities of the ReportFault API, particularly on Windows Vista, and how those peculiarities drove me to give in to Microsoft's sound advice and use the new and shiny Windows Error Reporting (WER) APIs on Vista.

ReportFault() is a great one-stop shopping API: A one-liner will display all required dialogs, ask the user if he wants to contact Microsoft, create report data (including minidumps) if required, and send the whole report off to Microsoft.

The new WER APIs in Vista are slightly more complex, but also provide more control over the details of error reporting. Well, if you know how to handle the APIs, that is. Apparently, I do not know how to handle them since I still haven't solved all the problems around them.

More on this in a moment. Let's first take a look at the core of a test application I wrote:

static bool report_crash(_EXCEPTION_POINTERS *inExceptionPointer)
{
  // Set up parameters for WerReportCreate()
  WER_REPORT_INFORMATION werReportInfo;
  memset(&werReportInfo, 0, sizeof(werReportInfo));
  werReportInfo.dwSize = sizeof(werReportInfo);
  wcscpy_s(werReportInfo.wzFriendlyEventName,
    _countof(werReportInfo.wzFriendlyEventName),
        L"werapitest (friendly event name)");
  wcscpy_s(werReportInfo.wzApplicationName,
    _countof(werReportInfo.wzApplicationName), L"");
  wcscpy_s(werReportInfo.wzDescription,
    _countof(werReportInfo.wzDescription), L"Critical runtime problem");

  PCWSTR eventType = L"werapitest (eventType)"; // APPCRASH
  HREPORT hReportHandle;
  if (FAILED(pWerReportCreate(eventType, WerReportCritical,
    &werReportInfo, &hReportHandle)) || !hReportHandle) {
      return false;
  }

  bool ret = false;

  WER_EXCEPTION_INFORMATION werExceptionInformation;
  werExceptionInformation.bClientPointers = FALSE;
  werExceptionInformation.pExceptionPointers = inExceptionPointer;
  bool dumpAdded = SUCCEEDED(pWerReportAddDump(hReportHandle, ::GetCurrentProcess(),
    ::GetCurrentThread(), WerDumpTypeMiniDump, &werExceptionInformation, NULL, 0));
  if (!dumpAdded) {
    FATAL_ERROR("Minidump generation failed.\n");
  }

  DWORD submitOptions = WER_SUBMIT_OUTOFPROCESS | WER_SUBMIT_NO_CLOSE_UI;
  WER_SUBMIT_RESULT submitResult;
  if (SUCCEEDED(pWerReportSubmit(hReportHandle, WerConsentNotAsked,
    submitOptions, &submitResult))) {
      switch(submitResult)
      {
        // ... decode result ...

      }
  }
  pWerReportCloseHandle(hReportHandle);

  return ret;
}

static int filter_exception(EXCEPTION_POINTERS *exc_ptr)
{
  report_crash(exc_ptr);
  return EXCEPTION_EXECUTE_HANDLER;
}

static void wedding_crasher(void)
{
  __try {
    int *foo = (int *)0;
    *foo = 42;
  } __except(filter_exception(GetExceptionInformation())) {
    printf("Now in exception handler, process is still alive!\n");
  }
  Sleep(5000);
}

int main()
{
  HMODULE hWer = LoadLibrary("Wer.dll");
  if (hWer) {
    pWerReportCreate =
      (pfn_WERREPORTCREATE)GetProcAddress(hWer, "WerReportCreate");
    pWerReportSubmit =
      (pfn_WERREPORTSUBMIT)GetProcAddress(hWer, "WerReportSubmit");
    pWerReportCloseHandle =
      (pfn_WERREPORTCLOSEHANDLE)GetProcAddress(hWer, "WerReportCloseHandle");
    pWerReportAddDump =
      (pfn_WERREPORTADDDUMP)GetProcAddress(hWer, "WerReportAddDump");
  }

  if (!pWerReportCreate || !pWerReportSubmit ||
    !pWerReportCloseHandle || !pWerReportAddDump) {
      printf("Cannot initialize WER API.\n");
      return 1;
  }

  wedding_crasher();
  return 0;
}

The fundamental approach is still the same as for the ReportFault test program presented recently:

  • A structured exception block is established using __try and __except.
  • Code provokes an access violation.
  • The exception filter filter_exception is consulted by the exception handling infrastructure to find out how to proceed with the exception.
  • The filter calls the WER APIs to display the crash dialog(s), and to give the user options to debug the problem, ignore it, or report it to Microsoft.
  • The exception filter returns EXCEPTION_EXECUTE_HANDLER to indicate that its associated exception handler should be called.

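The following WER APIs are used to create and send a crash report:

  • WerReportCreate creates the report and returns a report handle.
  • WerReportAddDump adds a minidump of the crashed process to the report.
  • WerReportSubmit submits the report, subject to the consent and submit options passed in.
  • WerReportCloseHandle closes the report handle.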

The WER APIs do indeed solve a problem that I found with ReportFault on Vista: They don't force the calling process to be terminated, and allow me to proceed as I see fit. That's really good news.

The problem I haven't resolved yet is this: Even though I call WerReportAddDump, I have no idea whether minidump data are actually generated and sent. In fact, from the feedback provided by the system, it seems likely that those data are not generated.

To illustrate my uncertainties, I wrote a test program called werapitest. The code is attached as a ZIP file; unpack it into a directory, open a Visual Studio command prompt window, and build the code as follows:

  cl werapitest.cpp

Run the resulting executable, then open up the "Problem Reports and Solutions" control panel and click on "View problem history". On my system, I get something like this:

werapitest_history.jpg

Double-clicking on the report entry leads to this:

werapitest_entry.png

The problem history entry does not mention any attached files, such as minidump data!

When a crash occurs, the system also writes entries into the event log; those log entries claim there are additional data in paths such as C:\Users\clausb\AppData\Local\Microsoft\Windows\WER\ReportArchive\Report0f8918ad, and indeed, such directories exist and each contain a file called Report.wer, which holds data such as:

Version=1
EventType=werapitest (eventType)
EventTime=128266502225896608
ReportType=1
Consent=1
UploadTime=128266502257542112
Response.BucketId=8
Response.BucketTable=5
Response.type=4
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=6.0.6000.2.0.0.256.16
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=1033
UI[3]=werapitest.exe has stopped working
UI[4]=Windows can check online for a solution to the problem.
UI[5]=Check online for a solution and close the program
UI[6]=Check online for a solution later and close the program
UI[7]=Close the program
State[0].Key=Transport.DoneStage1
State[0].Value=1
State[1].Key=DataRequest
State[1].Value=Bucket=8/nBucketTable=5/nResponse=1/n
FriendlyEventName=werapitest (friendly event name)
ConsentKey=werapitest (eventType)
AppName=werapitest.exe
AppPath=C:\tmp\werapitest.exe
ReportDescription=Critical runtime problem

So again, the minidump is not mentioned anywhere.
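Checking such reports by eye gets tedious after a few test runs. Since the content shown above is a list of key=value lines, a small parser can answer questions like "which bucket ID did the server return?" or "does any field mention a dump file?". The sketch below is only that, a sketch: it assumes that the file content has already been read and converted into a narrow string (the file on disk may well use a different encoding), and looking for ".dmp" or ".mdmp" in the values is a guess at how an attached minidump would show up, not documented behavior. All function names are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using ReportFields = std::map<std::string, std::string>;

// Splits the report text into key=value pairs. Everything after the first
// '=' belongs to the value, so values may themselves contain '='.
ReportFields parse_report(const std::string &text)
{
  ReportFields fields;
  std::istringstream input(text);
  std::string line;
  while (std::getline(input, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();  // tolerate Windows line endings
    }
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos || eq == 0) {
      continue;  // skip lines without a key
    }
    fields[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return fields;
}

// Looks up a field, returning `fallback` if the key is absent.
std::string field_or(const ReportFields &fields, const std::string &key,
                     const std::string &fallback)
{
  assert(!key.empty());
  const ReportFields::const_iterator it = fields.find(key);
  return it == fields.end() ? fallback : it->second;
}

// Heuristic (an assumption): does any value look like a dump file name?
bool mentions_dump_file(const ReportFields &fields)
{
  for (const auto &entry : fields) {
    const std::string &value = entry.second;
    if (value.find(".dmp") != std::string::npos ||
        value.find(".mdmp") != std::string::npos) {
      return true;
    }
  }
  return false;
}
```

Fed with the report shown above, field_or(fields, "Response.BucketId", "") would yield "8", and mentions_dump_file(fields) would return false, which matches what the problem history window suggests.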

Now let's try some minimal code which uses neither ReportFault nor the new WER API:

  int main(void)
  {
    int *p = (int *)0;
    *p = 42;
    return 0;
  }

After running this code and letting it crash and report to Microsoft, I get the following problem history entry:

crashme.jpg

This problem report contains a lot more data than the one for werapitest, and it even refers to a minidump file which was apparently generated by the system and probably also sent to Microsoft.

So the lazy code which doesn't do anything about crashes gets full and proper service from the OS, while the application which tries to deal with a crash in an orderly manner and elaborately goes through all the trouble of using the proper APIs doesn't get its message across to Microsoft. I call this unfair ;-)

Oh, and in case you're wondering: Yes, we've registered with Microsoft's Winqual site where the crash reports are supposed to be sent to, and we established "product mappings" there, and the whole process seems to work for XP clients just fine.

I'm pretty sure that I'm just missing a couple of details with the new APIs, or maybe I'm misinterpreting the feedback from the system. I ran numerous experiments and umpteen variations, I've searched the web high and low, read the docs, consulted newsgroups here and there - and now I'm running out of ideas. Any hints most welcome...

PS: I did indeed receive some hints. For updated WER code, along with an explanation on why the above failed, see Crashing with style on Vista, part II.


The end is nigh (for my process) (16 Jun 2007)

How can you tell that you're the control freak type of Windows programmer? Easy: You feel that irresistible urge to install top-level exception handlers which report application crashes to the end user and provide useful options on how to proceed, such as to report the issue to the software vendor, save the currently loaded data, inspect the issue in more detail, or call the police.

reportfault_xp.jpg

In fact, this is pretty much what Windows Error Reporting is all about, only that the crash reports are sent to Microsoft first (to their Winqual site, that is), from where ISVs can then download them for further analysis. Oh, and the other difference is that Microsoft dropped the "call the police" feature in order to get Vista done in time.

One of the applications that I'm working on already had its own top-level crash handler which performed some of the services also provided by Windows Error Reporting. It was about time to investigate Microsoft's offerings in this area and see how they can replace or augment the existing crash handler code.

The first option I looked at was the ReportFault API. Microsoft's documentation says that the function is obsolete, and we should rather use a different set of APIs collectively called the "WER functions". However, understanding them requires a lot more brain calories than the trivial ReportFault call which you can simply drop into an exception filter, and you're done.

The required code is pretty trivial and looks roughly like this:

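#include <windows.h>
#include <errorrep.h>   // ReportFault()
#include <tchar.h>
#include <stdio.h>
#pragma comment(lib, "faultrep.lib")

static int filter_exception(EXCEPTION_POINTERS *exc_ptr)
{
  // Display the error reporting dialog and let the user decide what to do.
  EFaultRepRetVal repret = ReportFault(exc_ptr, 0);
  switch (repret)
  {
    // decode return value...
    default:
      break;
  }
  return EXCEPTION_EXECUTE_HANDLER;
}

int main(void)
{
  __try {
    int *foo = (int *)0;
    *foo = 42;  // provoke an access violation
  } __except(filter_exception(GetExceptionInformation())) {
    _tprintf(__T("Nothing to see here, move on, process is still alive!\n"));
  }
  Sleep(5000);
  return 0;
}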

Sequence of events:

  • A structured exception block is established using __try and __except.
  • Code provokes an access violation.
  • The exception filter filter_exception is consulted by the exception handling infrastructure to find out how to proceed with the exception.
  • The filter calls ReportFault to display the crash dialog as shown above, and to give the user options to debug the problem, ignore it, or report it to Microsoft.
  • After performing its menial reporting duties, the exception filter returns EXCEPTION_EXECUTE_HANDLER to indicate that its associated exception handler should be called.

That exception handler is, in fact, essentially the _tprintf statement which spreads the good news about the process still being alive.

reportfault_vista.jpg

On XP, that is. On Vista, the _tprintf statement may actually never execute. You'll still get a nice reporting dialog, such as the one in the screenshot to the right, but when you click the "Close program" button, the calling process will be terminated immediately, i.e. ReportFault never really returns to the caller!

I debugged into ReportFault on my Vista machine and found that ReportFault spawns off a process called wermgr.exe which performs the actual work. My current hypothesis is that it is wermgr.exe which terminates the calling process if the user chooses "Close program".

If you want to try it yourself, download the demo code. To compile, simply run it through cl.exe:

  cl.exe reportfault.cpp

Now, can we complain about this, really? After all, you can't call it surprising if a program closes after hitting the "Close program" button. Still, the behavior differs from the old XP dialog - and it is inconsistent even on Vista. What I just described is the behavior that I found with the default error reporting settings in Vista. By default, Vista "checks for solutions automatically" and doesn't ask the user what to do when a crash occurs. This can be configured in the "Problem Reports and Solutions" control panel:

vista_settings.jpg

After changing the report settings as shown above ("Ask me") and then running the test application again, the error reporting dialog looks like this:

reportfault_vista_ask.jpg

When I click on "Close program" now, guess what happens - the process does not terminate, and the _tprintf statement in my exception handler is executed, just like on XP! So that "Close program" button can mean two different things on Vista...

It's not just this inconsistency which bugged me. I also don't like the idea of letting the error reporting dialog pull the rug from under my feet. Sure, I'd like to use the dialog's services, but when it returns, I want to make my own decisions about how to proceed. For example, I could try and save the currently loaded data in my application, or I could add my own special reporting. Or call the cops.

ReportFault won't let me do that on Vista. And so I set out to burn those extra brain calories anyway and learn about the new WER APIs which were introduced with Windows Vista.

And burn calories I did, oh yes. More on this hopefully soon.



Revision: r1.7 - 28 Jun 2007 - 06:02 - ClausBrod
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.