Regular recursion (25 Mar 2006)

So I tried to come up with some simple VBScript code which recursively searches a directory for file names matching an arbitrary pattern. This is what I got working:

Sub recursiveSearch(dir, regex)
  for each file in dir.files
    if regex.Test(file.Name) Then
      WScript.Echo("File matches: " & file.Path)
    End if
  next

  for each folder in dir.SubFolders
    recursiveSearch folder, regex
  next
End Sub

startFolder="c:\temp"
set folder=CreateObject("Scripting.FileSystemObject").GetFolder(startFolder)

Set regex=new RegExp
regex.Pattern = "^Foo\d{3}[0-9a-zA-Z]\.txt$"
' File name starts with 'Foo', followed by three digits, then either
' a digit or letter, and has a .txt extension.
recursiveSearch folder, regex

Somehow I've got a hunch that there may be an easier way to do this. Blogosphere, any ideas?




Don't quote me on this (18 Mar 2006)

Let us assume that I'm a little backward and have a peculiar fondness for the DOS command shell. Let us further assume that I also like blank characters in pathnames. Let us conclude that therefore I'm hosed.

But maybe others out there are hosed, too. Blank characters in pathnames are not exactly my exclusive fetish; others have joined in as well (C:\Program Files, C:\Documents and Settings). And when using software, you might be running cmd.exe without even knowing it. Many applications can run external helper programs upon user request, be it through the UI or through the application's macro language.

The test environment is a directory c:\temp\foo bar which contains write.exe (copied from the Windows system directory) and two text files, one of them with a blank in its filename.

Now we open a DOS shell:

C:\>dir c:\temp\foo bar
 Volume in drive C is IBM_PRELOAD
 Volume Serial Number is C081-0CE2

 Directory of c:\temp

File Not Found

 Directory of C:\

File Not Found

C:\>dir "c:\temp\foo bar"
 Volume in drive C is IBM_PRELOAD
 Volume Serial Number is C081-0CE2

 Directory of c:\temp\foo bar

03/18/2006  03:08 PM    <DIR>          .
03/18/2006  03:08 PM    <DIR>          ..
01/24/2006  11:19 PM             1,516 foo bar.txt
01/24/2006  11:19 PM             1,516 foo.txt
03/17/2006  09:44 AM             5,632 write.exe
               3 File(s)          8,664 bytes
               2 Dir(s)  17,448,394,752 bytes free

Note that we had to quote the pathname to make the DIR command work. Nothing unusual here; quoting is a fact of life for anyone out there who ever used a DOS or UNIX shell.

Trying to start write.exe by entering c:\temp\foo bar\write.exe in the DOS shell fails; again, we need to quote:

C:\>"c:\temp\foo bar\write.exe"

And if we want to load foo bar.txt into the editor, we need to quote the filename as well:

C:\>"c:\temp\foo bar\write.exe" "c:\temp\foo bar\foo bar.txt"

Still no surprises here.

But let's suppose we want to run an arbitrary command from our application rather than from the command prompt. The C runtime library provides the system() function for this purpose. It is well known that under the hood, system() actually runs cmd.exe to do its job.

#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <process.h>

int main(void)
{
  const char *exe = "c:\\temp\\foo bar\\write.exe";
  const char *path = "c:\\temp\\foo bar\\foo bar.txt";
  char cmdbuf[1024];
  int ret;

  /* Quote the executable and its argument, just as at the command prompt. */
  _snprintf(cmdbuf, sizeof(cmdbuf), "\"%s\" \"%s\"", exe, path);

  ret = system(cmdbuf);
  printf("system(\"%s\") returns %d\n", cmdbuf, ret);
  return 0;
}

When running this code, it reports that system() returned 0, and write.exe never starts, even though we quoted both the name of the executable and the text file name.

What's going on here? system() internally runs cmd.exe like this:

  cmd.exe /c "c:\temp\foo bar\write.exe" "c:\temp\foo bar\foo bar.txt"

Try entering the above in the command prompt: No editor to be seen anywhere! So when we run cmd.exe programmatically, apparently it parses its input differently than when we use it in an interactive fashion.

I remember this problem drove me up the freakin' wall when I first encountered it roughly two years ago. With a lot of experimentation, I found the right magic incantation:

  _snprintf(cmdbuf, sizeof(cmdbuf), "\"\"%s\" \"%s\"\"", exe, path);
  // originally: _snprintf(cmdbuf, sizeof(cmdbuf), "\"%s\" \"%s\"", exe, path);

Note that I quoted the whole command string another time! Now the executable actually starts. Let's verify this in the command prompt window: Yes, something like cmd.exe /c ""c:\temp\foo bar\write.exe" "c:\temp\foo bar\foo bar.txt"" does what we want.

I was reminded of this weird behavior when John Scheffel, long-time user of our flagship product OneSpace Designer Modeling and maintainer of the international CoCreate user forum, reported funny quoting problems when trying to run executables from our app's built-in Lisp interpreter. John also found the solution and documented it in a Lisp version.

Our Lisp implementation provides a function called sd-sys-exec, and you need to invoke it thusly:

(setf exe "c:/temp/foo bar/write.exe")
(setf path "c:/temp/foo bar/foo bar.txt")
(oli:sd-sys-exec (format nil "\"\"~A\" \"~A\"\"" exe path))

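Kudos to John for figuring out the Lisp solution. Let's try to decipher all those quotes and backslashes in the format statement: inside a Lisp string, \" stands for a literal double quote, and ~A inserts its argument as-is, without adding any quotes. So the format call produces ""c:/temp/foo bar/write.exe" "c:/temp/foo bar/foo bar.txt"", which is exactly the doubly quoted command line from the C example.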

Originally, I modified his solution slightly by using ~S instead of ~A in the format call and thereby saving one level of explicit quoting in the code:

  (format nil "\"~S ~S\"" exe path)

This is much easier on the eyes, yet I overlooked that the ~S format specifier not only produces enclosing quotes, but also escapes any backslash characters in the argument that it processes. So if path contains a backslash (not exactly unlikely on a Windows machine), the backslash will be doubled. This works surprisingly well for some time, until you hit a UNC path, which already starts with two backslashes. As an example, \\backslash\lashes\back turns into \\\\backslash\\lashes\\back, which no DOS shell will be able to grok anymore.

John spotted this issue as well. Maybe he should be writing these blog entries, don't you think? :-)

From those Lisp subtleties back to the original problem: I never quite understood why the extra level of quoting is necessary for cmd.exe, but apparently, others have been in the same mess before. For example, check out this XEmacs code to see how complex correct quoting can be. See also an online version of the help pages for CMD.EXE for more information on the involved quoting heuristics applied by the shell.

PS: A very similar situation occurs in OneSpace Designer Drafting as well (which is our 2D CAD application). To start an executable write.exe in a directory c:\temp\foo bar and have it open the text file c:\temp\foo bar\foo bar.txt, you'll need macro code like this:

LET Cmd '"C:\temp\foo bar\write.exe"'
LET File '"C:\temp\foo bar\foo bar.txt"'
LET Fullcmd (Cmd + " " + File)
LET Fullcmd ('"' + Fullcmd + '"')  { This is the important line }
RUN Fullcmd

Same procedure as above: If both the executable's path and the path of the data file contain blank characters, the whole command string which is passed down to cmd.exe needs to be enclosed in an additional pair of quotes...

PPS: See also http://blogs.msdn.com/b/twistylittlepassagesallalike/archive/2011/04/23/everyone-quotes-arguments-the-wrong-way.aspx and http://daviddeley.com/autohotkey/parameters/parameters.htm


http://xkcd.com/1638/

-- ClausBrod - 27 Mar 2016


Two minute warning (16 Mar 2006)

Software is a freaky thing. I still don't know how to explain what it is to my mum and dad. I could tell them about ones and zeroes and how layer upon layer of hardware and software build on each other until they finally form what they see on a computer screen or in a digital camera. But I'm not overly confident I'd be able to get across how this stuff really works, partly because I hardly think about the inner workings of computer systems anymore. The von Neumann architecture is so deeply engraved in Joe Developer's reasoning and mindset that it becomes an almost subconscious foundation of our daily work.

But many of those who use computers every day never really understand how this gadget works. They usually get by because over time the software industry has developed UI metaphors which shield users from the internal complexity. Until something unexpected happens - such as an application crash.

Once upon a time in a reality not too far away, a Japanese user started seeing application crashes. The bug report through which we learned about his problems did not really complain about individual bugs or about the situations in which the crashes occurred. Instead, he asked us to add a feature that would alert the user before a crash occurred, giving him the chance to save his data and exit before the crash actually happened.

Now this was not a request to add a top-level exception handler which kicks in when a crash occurs, reports the issue to the user and makes a last-ditch effort to save data in memory. We already had that in the application. No, what the customer really wanted our application to do was to predict that a crash was looming in the near future.

My brain starts to hurt whenever I think about this request. After all, a crash is usually caused by a hitherto undetected bug in the code, i.e. by an issue which neither we as the programmers nor the software itself know about. Being able to predict a crash which is due to a bug in our code is more or less equivalent to knowing that the bug exists, where it is located, what the user's next action will be, and whether that course of action would lead him into the "danger zone". I'll ignore the bit about predicting the user's action for a moment; but if either we or the software already knew about the bug, why not simply fix it in the first place rather than ceremonially announcing it to the user? (Did I miss something? Do any of the more recent CPUs have a clairvoyance opcode that we could use? :-D )

It took me only a short while to explain this to our support folks, but then, they are sufficiently versed in software that they kind of "got it" naturally, even though most of them do not develop any software. I don't think, however, that we ever succeeded in communicating this properly to the customer.

Maybe I even understand the customer. He was probably thinking he was kind of generous to us; after all, he was willing to accept that any kind of software inevitably has some bugs, some of which even cause crashes, and that there is no practical way of dealing with this other than using the software and fixing the issues one by one.

But at the very minimum, he wanted to be warned. I mean, how hard can this be, after all! Even cars alert their drivers if there is a problem which should be taken care of in a garage as soon as possible. Most of these problems, however, are not immediately fatal. The car continues to work for some time - you don't have to stop it right away and have it towed to the garage, but can drive it there yourself, which is certainly more convenient.

What seems like a fairly simple idea to a customer is a nerve-wracking prospect for a developer. There is no general way for a program to predict its own future behavior; this is what the halting problem teaches us. However, what seems obvious to a developer sounds like a lame excuse to someone who is not that computer-savvy.

But then, maybe there are ways to monitor the health of software and the data which it processes, and maybe, based on a lot of heuristics, we could even translate those observations into warnings for users without causing too many false alarms...




Rapid proto-typing (03 Mar 2006)

Much to my dismay, I found myself in a situation where the following hack is useful. I shudder at the thought of actually using it because of its inherent instability, but sometimes it's better than a poke in the eye with C#.

If you're automating an application which, while executing a command, may pop up error or warning messages and wait for user input, you may need to explicitly send a keystroke to that application. Fortunately, this is reasonably simple using cscript.exe, the WSH Shell object and VBScript:

Set WshShell = WScript.CreateObject("WScript.Shell")
WshShell.AppActivate ("Appname as it appears in the main window title")
WshShell.SendKeys "{ENTER}"

While testing this, I learnt that the application name parameter to AppActivate does not have to be the complete window title. For instance, if you run Word, its main window title is usually something like "gazonk.doc - Microsoft Word". AppActivate first looks for an exact title match, then for a title that begins with the given string, and finally for one that ends with it, so the following will still work as expected:

Set WshShell = WScript.CreateObject("WScript.Shell")
WshShell.AppActivate ("Microsoft Word")
WshShell.SendKeys "foo"

The SendKeys method turns out to be pretty convenient since it lets you describe non-printable keys with a special notation, such as {BREAK} for the Break key, {PGUP} and {PGDN} for moving pagewise, {DEL}, {HOME}, all the function keys et cetera.




Comment dit-on "knapsack" en français? (01 Mar 2006)

This week, a customer of our software asked a seemingly innocent question: given a set of tools of various lengths, he wanted to find subsets of those tools which, when combined, can be used to manufacture a screw of a given length.

From the description, I deduced that we were talking about a variation of the subset sum problem, which is a special case of the knapsack problem. Faint memories of my time at university arose; I couldn't resist the weird intellectual tickle. Or maybe it was just the beginning of my pollen allergy for this year :-D Anyway, I searched high and low on my quest to reacquire long-lost knowledge.

One of the weirder search results was a TV show called Des chiffres et des lettres which has been running for ages now on French TV. In that show, they play a game called "Le compte est bon" which is actually a variation of the subset sum problem! The candidates are supposed to solve this puzzle in about a minute or so during the show. Wow - these French guys must be math geniuses! ;-)

Anyway, I couldn't help but try a subset sum algorithm in Lisp. I ran it both using CLISP and the implementation of Lisp provided in CoCreate OneSpace Modeling. I started to collect some benchmark results for CLISP, comparing interpreted and compiled code to get a better feeling for the kind of improvements I can expect from the CLISP compiler. In the case of CLISP, the compiler improves runtime by roughly an order of magnitude. See the discussion of the algorithm for detailed results.


https://xkcd.com/287/

-- ClausBrod - 01 Sep 2017



Revision: r1.8 - 09 Apr 2006 - 14:25 - ClausBrod
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.