n
items of lengths l1, l2...
ln and a total length of t
, find all subsets of items
which add up to t
.
Even though the description sounds fairly simple, the problem is
surprisingly tough. In fact, it is NP-complete,
which is computer science gobbledigook for "d*mn tough, man!".
In practice, it means that for small values of n
, it's easy to find
an algorithm which tests all permutations and lists all subsets in
a reasonable amount of time. However, as the size of the array of item lengths
grows, the computation time may grow exponentially, and chances are you'll
never see the day when it ends
(cf. Deep Thought).
The following simple backtracking
algorithm is of that nature: It performs reasonably well for small arrays, but will
quickly wear your patience for larger arrays. To see how the algorithm behaves,
I created a version of the code which only counts the number of found solutions;
then I ran it in CLISP, compiled and interpreted:
Array size | Run time (compiled) | Run time (interpreted) |
---|---|---|
27 | 80s | 780s |
20 | 0.7s | 6s |
25 | 18s | 210s |
15 | 0.05s | 0.35s |
10 | 0.002s | 0.02s |
29 | 270s | |
30 | 570s |
(let ((solutions 0) flags numbers) (defun found-solution() "Called whenever the algorithm has found a solution" (let ((total 0)) (format t " ") (dotimes (i (length numbers)) (when (aref flags i) (incf total (aref numbers i)) (format t "~A " (aref numbers i))) ) (format t " => ~A~%" total) (incf solutions))) (defun find-solutions(k target-sum callback) "Core backtracking algorithm" (when (zerop target-sum) (funcall callback) (return-from find-solutions)) (unless (= k (length numbers)) (let ((nk (aref numbers k))) (when (>= target-sum nk) ;; try subtracting numbers[k] from target-sum (setf (aref flags k) t) (find-solutions (+ 1 k) (- target-sum nk) callback) (setf (aref flags k) nil))) ;; recurse without subtracting first (find-solutions (+ 1 k) target-sum callback))) (defun find-subset-sum(target-sum) "Set up and run backtracking algorithm based on 'numbers' array" (setf flags (make-array (list (length numbers)))) (setf solutions 0) (find-solutions 0 target-sum #'found-solution) (format t "Found ~A different solutions.~%" solutions)) (defun subset-sum-test(size) "Test subset sum algorithm using random numbers" (let* ((total 0) target-sum) ;; init numbers array with random values up to 1000 (setf numbers (make-array (list size))) (dotimes (i size) (setf (aref numbers i) (random 1000)) (incf total (aref numbers i))) (setf target-sum (floor (/ total 2))) ;; random target sum (format t "Now listing all subsets which sum up to ~A:~%" target-sum) (find-subset-sum target-sum))) )The core backtracking algorithm is in
find-solutions
. It will recursively
exhaust all subsets. When it finds a subset which adds up to target-sum
,
it will call the callback
function - this function can either simply
increase a solution counter, report the current solution to the user, or
store it somewhere for later retrieval.
In the test example above, the callback function is print-solution
which
increments a solution counter and prints the current solution.
To test the code, run subset-sum-test
, providing an array size. This function
will create an array of numbers
of that size and initialize it with random
values; it will also pick a random target-sum
. In a real application,
you would replace subset-sum-test
with a function which gets the array
data from somewhere (for example, from a tools database as in the customer's
case), and lets the user pick a target-sum
.
-- ClausBrod - 01 Mar 2006
The aforementioned customer would actually have preferred a solution in CoCreate
Drafting's macro language. However, this macro language isn't really
a full-blown programming language (even though it is perfectly adequate
for almost all customization purposes). For instance, its macros cannot return
values, the language doesn't have an array type, and the macro expansion stack
(i.e. the depth of the macro call tree) has a fixed limit - which pretty much
rules out non-trivial amounts of recursion.
While I was considering my options, I also fooled around with VBA,
which resulted in the code presented below. I'm not at all
proficient in VBA, so I'm sure the implementation is lacking, but
anyway - maybe someone out there finds it useful nonetheless
Dim solutions As Long Dim flags() As Boolean Dim numbers() As Long Sub findSolutions(k As Long, targetSum As Long) If targetSum = 0 Then ' we found a solution solutions = solutions + 1 Exit Sub End If If k <= UBound(numbers) Then If (targetSum >= numbers(k)) Then flags(k) = True ' try first by subtracting numbers[k] from targetSum Call findSolutions(k + 1, targetSum - numbers(k)) flags(k) = False End If ' now try without subtracting Call findSolutions(k + 1, targetSum) End If End Sub Sub subsetsum() Dim targetSum As Long Dim i As Long Dim arraySize As Long arraySize = 25 ReDim numbers(0 To arraySize - 1) ReDim flags(0 To arraySize - 1) ' initialize numbers array with random entries Randomize For i = 0 To arraySize - 1 numbers(i) = Int(1000 * Rnd + 1) flags(i) = False targetSum = targetSum + numbers(i) Next targetSum = Int(targetSum / 2) solutions = 0 Call findSolutions(0, targetSum) MsgBox "Found " + Str(solutions) + " solutions." End SubLet's see - we recurse one level for each entry in the array, so with a maximum array size of 30 (the customer said he was considering a table of 20-30 values), the recursion depth should never exceed 30. That's not a lot, in fact, so would this still exceed the macro stack thresholds in CoCreate Drafting? Oh, and by the way, the recursive function doesn't even try to return values, so the lack of return values in the macro language isn't a real obstacle in this case! I couldn't resist and just had to try to translate the algorithm into CoCreate Drafting macro language:
{ Description: Subset sum algorithm } { Author: Claus Brod } { Language: CoCreate Drafting macros } { (C) Copyright 2006 Claus Brod, all rights reserved } DEFINE Found_solution LOCAL I LOCAL Solution LOCAL Total LOCAL Nk { display current solution } LET Subset_sum_solutions (Subset_sum_solutions+1) LET I 1 LET Solution '' LET Total 0 WHILE (I <= Subset_sum_arraysize) IF (READ_LTAB 'Flags' I 1) LET Nk (READ_LTAB 'Numbers' I 1) LET Total (Total + Nk) LET Solution (Solution + ' ' + STR(Nk)) END_IF LET I (I+1) END_WHILE DISPLAY_NO_WAIT (Solution + ' sum up to ' + STR(Total)) END_DEFINE DEFINE Find_solutions PARAMETER K PARAMETER Target_sum LOCAL Nk IF (Target_sum = 0) { we found a solution, display it } Found_solution ELSE_IF (K <= Subset_sum_arraysize) LET Nk (READ_LTAB 'Numbers' K 1) { The following optimization only works if we can assume a sorted array } IF ((Nk * (Subset_sum_arraysize-K+1)) >= Target_sum) IF (Target_sum >= Nk) { try first by subtracting Numbers[k] from Target } WRITE_LTAB 'Flags' K 1 1 Find_solutions (K+1) (Target_sum-Nk) WRITE_LTAB 'Flags' K 1 0 END_IF { now try without subtracting } Find_solutions (K+1) Target_sum END_IF END_IF END_DEFINE DEFINE Subset_sum PARAMETER Subset_sum_arraysize LOCAL Target LOCAL Random LOCAL I LOCAL Subset_sum_solutions LOCAL Start_time { Allocate Numbers and Flags arrays } CREATE_LTAB Subset_sum_arraysize 1 'Numbers' CREATE_LTAB Subset_sum_arraysize 1 'Flags' LET Target 0 LET I 1 WHILE (I <= Subset_sum_arraysize) LET Random (INT(1000 * RND + 1)) LET Target (Target + Random) WRITE_LTAB 'Numbers' I 1 Random WRITE_LTAB 'Flags' I 1 0 LET I (I+1) END_WHILE LET Target (INT (Target/2)) DISPLAY ('Array size is ' + STR(Subset_sum_arraysize) + ', target sum is ' + STR(Target)) { Sorting in reverse order speeds up the recursion } SORT_LTAB 'Numbers' REVERSE_SORT 1 CONFIRM LET Start_time (TIME) LET Subset_sum_solutions 0 Find_solutions 1 Target DISPLAY ('Found ' + STR(Subset_sum_solutions) + ' solutions in ' + STR(TIME-Start_time) + ' seconds.') END_DEFINEBecause CoCreate Drafting's macro language doesn't have arrays, they have to be emulated using logical tables, or ltabs. The solutions which are found are displayed in CoCreate Drafting's prompt line, which is certainly not the most ideal place, but it's sufficient to verify the algorithm is actually doing anything .-) What you see above, is a tuned version of the macro code I came up with initially. The "literal" translation from the original Lisp version took ages to complete even for small arrays; for instance, searching for solutions in an array of 20 numbers took 95 seconds (using OSDD 2005). However, there are two simple optimizations which can be applied to the algorithm. First, the input array can be sorted in reverse order, i.e. largest numbers first. This makes it more likely that we can prune the recursion tree early. This optimization itself improved runtimes by only 5% or so, but more importantly, it paved the way for another optimization. Since we know that the numbers in the array are monotonically decreasing, we can now predict in many cases that there is no chance of possibly reaching the target sum anyway, and therefore abort the recursion early. Example for an array of size 20:
Target_sum
has already been
reduced to 500 earlier in the recursion, i.e. the remaining
entries in the array somehow have to sum up to 500 to meet
the required subset sum.
Numbers
array will in fact contain floating-point
values rather than integers. The algorithm doesn't change, but
whenever you work with floating-point values, it's good to follow
a few basic guidelines like the ones outlined here.
In the case of the above macro code, instead of a comparison like
IF (Target = 0), you'd probably want to write something
like IF (ABS(Target) < Epsilon)
where Epsilon
is a small value
chosen to meet a user-defined tolerance (for example 0.001).
-- ClausBrod - 16 Mar 2006
This is taking me to places I didn't anticipate.
Over at codecomments.com,
they are discussing solutions to the subset sum problem in Haskell,
Prolog and Caml, if anyone is interested. (I sure was, and even read
a tutorial on Haskell to help me understand what these guys are
talking about )
-- ClausBrod - 01 Apr 2006
I started to learn some Ruby, so here's a naïve implementation of
the algorithm in yet another language (See also this blog entry.)
$solutions = 0 $numbers = [] $flags = [] def find_solutions(k, target_sum) if target_sum == 0 # found a solution! (0..$numbers.length).each { |i| if ($flags[i]) then print $numbers[i], " "; end } print "\n" $solutions = $solutions + 1 else if k < $numbers.length if target_sum >= $numbers[k] $flags[k] = true find_solutions k+1, target_sum-$numbers[k] $flags[k] = false end find_solutions k+1, target_sum end end end def find_subset_sum(target_sum) print "\nNow listing all subsets which sum up to ", target_sum, ":\n" $solutions = 0 (0..$numbers.length()).each { |i| $flags[i] = false } find_solutions 0, target_sum print "Found ", $solutions, " different solutions.\n" end def subset_sum_test(size) total = 0 target_sum = 0 (0..size).each { |i| $numbers[i] = rand(1000); total += $numbers[i]; print $numbers[i], " " } target_sum = total/2 find_subset_sum target_sum end subset_sum_test 25-- ClausBrod - 17 Apr 2006 The other day, I experimented with Python and thought I'd start with a quasi-verbatim translation of the subset sum code. Here is the result. Apologies for the non-idiomatic and naïve implementation.
import random import array import sys numbers = array.array('i') flags = array.array('c') solutions = 0 def find_solutions(k, target_sum): global solutions if target_sum == 0: print " Solution:", for i in range(0, len(numbers)): if flags[i] != 0: print numbers[i], print solutions = solutions + 1 else: if k < len(numbers): if (numbers[k] * (len(numbers)-k+1)) >= target_sum: if target_sum >= numbers[k]: flags[k] = 1 find_solutions(k+1, target_sum - numbers[k]) flags[k] = 0 find_solutions(k+1, target_sum) def find_subset_sum(target_sum): global solutions global flags print "Subsets which sum up to %s:" % target_sum flags = [0] * len(numbers) find_solutions(0, target_sum) print "Found", solutions, "different solutions" def subset_sum_test(size): global numbers total = 0 print "Random values:\n ", for i in range(0, size): numbers.append(random.randint(0, 1000)) total = total + numbers[i] print numbers[i], print numbers = sorted(numbers, reverse = True) target_sum = total/2 find_subset_sum(target_sum) subset_sum_test(15 if len(sys.argv) < 2 else int(sys.argv[1]))See also A Subset Of Python. -- ClausBrod - 30 Jun 2013 And here is an implementation in C#:
using System; namespace SubsetSum { class SubsetSum { private int[] numbers; private bool[] flags; private int findSolutions(int k, int targetSum, int solutions=0) { if (targetSum == 0) { Console.Write(" Solution: "); for (int i=0; i<numbers.Length; i++) { if (flags[i]) { Console.Write("{0} ", numbers[i]); } } Console.WriteLine(); solutions++; } else { if (k < numbers.Length) { if ((numbers[k] * (numbers.Length - k + 1)) >= targetSum) { if (targetSum >= numbers[k]) { flags[k] = true; solutions = findSolutions(k + 1, targetSum - numbers[k], solutions); flags[k] = false; } solutions = findSolutions(k + 1, targetSum, solutions); } } } return solutions; } public void solve() { Array.Sort(numbers, (x, y) => y - x); // sort in reverse order Array.Clear(flags, 0, flags.Length); int total = 0; Array.ForEach(numbers, (int n) => total += n); int solutions = findSolutions(0, total / 2); Console.WriteLine("Found {0} different solutions.", solutions); } SubsetSum(int size) { numbers = new int[size]; Random r = new Random(); for (int i = 0; i < size; i++) { numbers[i] = r.Next(1000); Console.Write("{0} ", numbers[i]); } Console.WriteLine(); flags = new bool[size]; } public static void Main(string[] args) { int size = args.Length > 1 ? int.Parse(args[1]) : 15; new SubsetSum(size).solve(); } } }A naïve implementation in Delphi:
program subsetsum; {$APPTYPE CONSOLE} {$R *.res} uses System.Generics.Collections, System.Generics.Defaults, System.SysUtils; type TSubsetSum = class private FNumbers: TArray<Integer>; FFlags: TArray<Boolean>; function FindSolutions( aK: Integer; aTargetSum: Integer; aSolutions: Integer = 0 ): Integer; public procedure Solve( ); constructor Create( aSize: Integer ); end; var vSize: Integer; vSubsetSum: TSubsetSum; constructor TSubsetSum.Create( aSize: Integer ); var i: Integer; begin SetLength( FNumbers, aSize ); SetLength( FFlags, aSize ); Randomize; for i := 0 to aSize - 1 do begin FNumbers[i] := Random( 1000 ); Write( FNumbers[i].ToString + ' ' ); end; writeln; end; function TSubsetSum.!FindSolutions( aK, aTargetSum, aSolutions: Integer ): Integer; begin if ( aTargetSum = 0 ) then begin write( ' Solution: ' ); for var i := 0 to Length( FNumbers ) - 1 do if FFlags[i] then write( FNumbers[i].ToString + ' ' ); writeln; inc( aSolutions ); end else begin if ( aK < Length( FNumbers ) ) then if ( ( FNumbers[aK] * ( Length( FNumbers ) - aK + 1 ) ) >= aTargetSum ) then begin if ( aTargetSum >= FNumbers[aK] ) then begin FFlags[aK] := True; aSolutions := FindSolutions( aK + 1, aTargetSum - FNumbers[aK], aSolutions ); FFlags[aK] := False; end; aSolutions := FindSolutions( aK + 1, aTargetSum, aSolutions ); end; end; Result := aSolutions; end; procedure TSubsetSum.Solve; var vTotal: Integer; vSolutions: Integer; begin TArray.Sort<Integer>( FNumbers, TComparer<Integer>.Construct( function( const Left, Right: Integer ): Integer begin Result := Right - Left; end ) ); vTotal := 0; for var vNumber: Integer in FNumbers do vTotal := vTotal + vNumber; vSolutions := FindSolutions( 0, vTotal div 2 ); writeln( Format( 'Found %d different solutions.', [vSolutions] ) ); end; begin vSize := 15; if ( ParamCount > 1 ) then vSize := ParamStr( 0 ).ToInteger; vSubsetSum := TSubsetSum.Create( vSize ); vSubsetSum.Solve( ); vSubsetSum.Free; end.-- ClausBrod - 19 Jan 2021