CoCreate Modeling code examples: Solving the subset sum problem

CoCreate Modeling: Solving the subset sum problem

In a recent discussion in the German CoCreate user forum, a customer was looking for ways to solve a variation of the subset sum problem (which is a special case of the knapsack problem). This was needed to find the right combination of tool parts to manufacture a screw. Those tool parts are available in a wide variety of diameters, and the task at hand is to find a set of these tools which, when combined, can be used to manufacture a screw of a given size.

Abstracting from the manufacturing problem, the problem can be stated like this: Given n items of lengths l₁, l₂... l_n and a total length of t, find all subsets of items which add up to t.

Even though the description sounds fairly simple, the problem is surprisingly tough. In fact, it is NP-complete, which is computer science gobbledigook for "d*mn tough, man!". In practice, it means that for small values of n, it's easy to find an algorithm which tests all permutations and lists all subsets in a reasonable amount of time. However, as the size of the array of item lengths grows, the computation time may grow exponentially, and chances are you'll never see the day when it ends (cf. Deep Thought).

The following simple backtracking algorithm is of that nature: It performs reasonably well for small arrays, but will quickly wear your patience for larger arrays. To see how the algorithm behaves, I created a version of the code which only counts the number of found solutions; then I ran it in CLISP, compiled and interpreted:

Array size	Run time (compiled)	Run time (interpreted)
27	80s	780s
20	0.7s	6s
25	18s	210s
15	0.05s	0.35s
10	0.002s	0.02s
29	270s
30	570s

Take those results with arbitrary amounts of salt; the code (see below) initializes the test array with random numbers and will therefore always vary a little. But the results are reliable enough to show that you don't really want to use this algorithm for large arrays...

(let ((solutions 0)
      flags
      numbers)

  (defun found-solution()
    "Called whenever the algorithm has found a solution"

    (let ((total 0))
      (format t "  ")
      (dotimes (i (length numbers))
        (when (aref flags i)
          (incf total (aref numbers i))
          (format t "~A " (aref numbers i)))
        )
      (format t " => ~A~%" total)
      (incf solutions)))

  (defun find-solutions(k target-sum callback)
    "Core backtracking algorithm"

    (when (zerop target-sum)
      (funcall callback)
      (return-from find-solutions))

    (unless (= k (length numbers))
      (let ((nk (aref numbers k)))
        (when (>= target-sum nk)
          ;; try subtracting numbers[k] from target-sum
          (setf (aref flags k) t)
          (find-solutions (+ 1 k) (- target-sum nk) callback)
          (setf (aref flags k) nil)))

      ;; recurse without subtracting first
      (find-solutions (+ 1 k) target-sum callback)))

  (defun find-subset-sum(target-sum)
    "Set up and run backtracking algorithm based on 'numbers' array"

    (setf flags (make-array (list (length numbers))))
    (setf solutions 0)
    (find-solutions 0 target-sum #'found-solution)
    (format t "Found ~A different solutions.~%" solutions))


  (defun subset-sum-test(size)
    "Test subset sum algorithm using random numbers"

    (let* ((total 0) target-sum)
      ;; init numbers array with random values up to 1000
      (setf numbers (make-array (list size)))
      (dotimes (i size)
        (setf (aref numbers i) (random 1000))
        (incf total (aref numbers i)))

      (setf target-sum (floor (/ total 2))) ;; random target sum
      (format t "Now listing all subsets which sum up to ~A:~%" target-sum)

      (find-subset-sum target-sum)))

  )

The core backtracking algorithm is in find-solutions. It will recursively exhaust all subsets. When it finds a subset which adds up to target-sum, it will call the callback function - this function can either simply increase a solution counter, report the current solution to the user, or store it somewhere for later retrieval.

In the test example above, the callback function is print-solution which increments a solution counter and prints the current solution.

To test the code, run subset-sum-test, providing an array size. This function will create an array of numbers of that size and initialize it with random values; it will also pick a random target-sum. In a real application, you would replace subset-sum-test with a function which gets the array data from somewhere (for example, from a tools database as in the customer's case), and lets the user pick a target-sum.

-- ClausBrod - 01 Mar 2006

The aforementioned customer would actually have preferred a solution in CoCreate Drafting's macro language. However, this macro language isn't really a full-blown programming language (even though it is perfectly adequate for almost all customization purposes). For instance, its macros cannot return values, the language doesn't have an array type, and the macro expansion stack (i.e. the depth of the macro call tree) has a fixed limit - which pretty much rules out non-trivial amounts of recursion.

While I was considering my options, I also fooled around with VBA, which resulted in the code presented below. I'm not at all proficient in VBA, so I'm sure the implementation is lacking, but anyway - maybe someone out there finds it useful nonetheless wink

Dim solutions As Long
Dim flags() As Boolean
Dim numbers() As Long

Sub findSolutions(k As Long, targetSum As Long)
  If targetSum = 0 Then
    ' we found a solution
    solutions = solutions + 1
    Exit Sub
  End If
  
  If k <= UBound(numbers) Then
    If (targetSum >= numbers(k)) Then
      flags(k) = True
      ' try first by subtracting numbers[k] from targetSum
      Call findSolutions(k + 1, targetSum - numbers(k))
      flags(k) = False
    End If
    
    ' now try without subtracting
    Call findSolutions(k + 1, targetSum)
  End If
End Sub

Sub subsetsum()
  Dim targetSum As Long
  Dim i As Long
  
  Dim arraySize As Long
  arraySize = 25
  ReDim numbers(0 To arraySize - 1)
  ReDim flags(0 To arraySize - 1)
  
  ' initialize numbers array with random entries
  Randomize
  For i = 0 To arraySize - 1
    numbers(i) = Int(1000 * Rnd + 1)
    flags(i) = False
    targetSum = targetSum + numbers(i)
  Next
  targetSum = Int(targetSum / 2)
  
  solutions = 0
  Call findSolutions(0, targetSum)
  MsgBox "Found " + Str(solutions) + " solutions."
End Sub

Let's see - we recurse one level for each entry in the array, so with a maximum array size of 30 (the customer said he was considering a table of 20-30 values), the recursion depth should never exceed 30. That's not a lot, in fact, so would this still exceed the macro stack thresholds in CoCreate Drafting?

Oh, and by the way, the recursive function doesn't even try to return values, so the lack of return values in the macro language isn't a real obstacle in this case! I couldn't resist and just had to try to translate the algorithm into CoCreate Drafting macro language:

{ Description:  Subset sum algorithm }
{ Author:       Claus Brod  }
{ Language:     CoCreate Drafting macros }
{ (C) Copyright 2006 Claus Brod, all rights reserved }

DEFINE Found_solution
  LOCAL I
  LOCAL Solution
  LOCAL Total
  LOCAL Nk

  { display current solution }
  LET Subset_sum_solutions (Subset_sum_solutions+1)
  LET I 1
  LET Solution ''
  LET Total 0
  WHILE (I <= Subset_sum_arraysize)
    IF (READ_LTAB 'Flags' I 1)
      LET Nk (READ_LTAB 'Numbers' I 1)
      LET Total (Total + Nk)
      LET Solution (Solution + ' ' + STR(Nk))
    END_IF
    LET I (I+1)
  END_WHILE
  DISPLAY_NO_WAIT (Solution + ' sum up to ' + STR(Total))
END_DEFINE

DEFINE Find_solutions
  PARAMETER K
  PARAMETER Target_sum

  LOCAL Nk

  IF (Target_sum = 0)
    { we found a solution, display it }
    Found_solution
  ELSE_IF (K <= Subset_sum_arraysize)
    LET Nk (READ_LTAB 'Numbers' K 1)
    { The following optimization only works if we can assume a sorted array }
    IF ((Nk * (Subset_sum_arraysize-K+1)) >= Target_sum)
      IF (Target_sum >= Nk)
        { try first by subtracting Numbers[k] from Target }
        WRITE_LTAB 'Flags' K 1 1
        Find_solutions (K+1) (Target_sum-Nk)
        WRITE_LTAB 'Flags' K 1 0
      END_IF

      { now try without subtracting }
      Find_solutions (K+1) Target_sum
    END_IF
  END_IF
END_DEFINE

DEFINE Subset_sum
  PARAMETER Subset_sum_arraysize

  LOCAL Target
  LOCAL Random
  LOCAL I
  LOCAL Subset_sum_solutions
  LOCAL Start_time

  { Allocate Numbers and Flags arrays }
  CREATE_LTAB Subset_sum_arraysize 1 'Numbers'
  CREATE_LTAB Subset_sum_arraysize 1 'Flags'

  LET Target 0
  LET I 1
  WHILE (I <= Subset_sum_arraysize)
    LET Random (INT(1000 * RND + 1))
    LET Target (Target + Random)
    WRITE_LTAB 'Numbers' I 1 Random
    WRITE_LTAB 'Flags' I 1 0
    LET I (I+1)
  END_WHILE
  LET Target (INT (Target/2))

  DISPLAY ('Array size is ' + STR(Subset_sum_arraysize) + ', target sum is ' + STR(Target))

  { Sorting in reverse order speeds up the recursion }
  SORT_LTAB 'Numbers' REVERSE_SORT 1 CONFIRM

  LET Start_time (TIME)
  LET Subset_sum_solutions 0
  Find_solutions 1 Target
  DISPLAY ('Found ' + STR(Subset_sum_solutions) + ' solutions in ' + STR(TIME-Start_time) + ' seconds.')
END_DEFINE

Because CoCreate Drafting's macro language doesn't have arrays, they have to be emulated using logical tables, or ltabs. The solutions which are found are displayed in CoCreate Drafting's prompt line, which is certainly not the most ideal place, but it's sufficient to verify the algorithm is actually doing anything .-)

What you see above, is a tuned version of the macro code I came up with initially. The "literal" translation from the original Lisp version took ages to complete even for small arrays; for instance, searching for solutions in an array of 20 numbers took 95 seconds (using OSDD 2005). However, there are two simple optimizations which can be applied to the algorithm.

First, the input array can be sorted in reverse order, i.e. largest numbers first. This makes it more likely that we can prune the recursion tree early. This optimization itself improved runtimes by only 5% or so, but more importantly, it paved the way for another optimization.

Since we know that the numbers in the array are monotonically decreasing, we can now predict in many cases that there is no chance of possibly reaching the target sum anyway, and therefore abort the recursion early. Example for an array of size 20:

We are in the midst of the recursion, say, at recursion level 15. The array entry at index 15 contains the value 42.
Let us further assume that Target_sum has already been reduced to 500 earlier in the recursion, i.e. the remaining entries in the array somehow have to sum up to 500 to meet the required subset sum.
Because the values in the array are monotonically decreasing, we know that the entries 15 through 20 have a value of 42 at most. Assuming that all remaining entries are 42, we get a maximum remaining sum of 42*6=252.
This means we can prune the recursion tree at this point because there's no way that this recursion line will ever find a solution.

This second optimization improved runtimes tremendously; with an array size of 20, we're now down to 15 seconds (originally 95 seconds); however, it still takes more than 3000 seconds to find all solutions in an array of size 30.

By the way, all performance measurements mentioned here were taken on the same system, a laptop with a 1.7 GHz Celeron CPU.

In real life, the Numbers array will in fact contain floating-point values rather than integers. The algorithm doesn't change, but whenever you work with floating-point values, it's good to follow a few basic guidelines like the ones outlined here.

In the case of the above macro code, instead of a comparison like IF (Target = 0), you'd probably want to write something like IF (ABS(Target) < Epsilon) where Epsilon is a small value chosen to meet a user-defined tolerance (for example 0.001).

-- ClausBrod - 16 Mar 2006

This is taking me to places I didn't anticipate. Over at codecomments.com, they are discussing solutions to the subset sum problem in Haskell, Prolog and Caml, if anyone is interested. (I sure was, and even read a tutorial on Haskell to help me understand what these guys are talking about smile )

-- ClausBrod - 01 Apr 2006

I started to learn some Ruby, so here's a naïve implementation of the algorithm in yet another language wink (See also this blog entry.)

$solutions = 0
$numbers = []
$flags = []

def find_solutions(k, target_sum)
  if target_sum == 0
    # found a solution!
    (0..$numbers.length).each { |i| if ($flags[i]) then print $numbers[i], " "; end }
    print "\n"
    $solutions = $solutions + 1
  else
    if k < $numbers.length
      if target_sum >= $numbers[k]
        $flags[k] = true
        find_solutions k+1, target_sum-$numbers[k]
        $flags[k] = false
      end
      find_solutions k+1, target_sum
    end
  end
end

def find_subset_sum(target_sum)
  print "\nNow listing all subsets which sum up to ", target_sum, ":\n"
  $solutions = 0
  (0..$numbers.length()).each { |i| $flags[i] = false }
  find_solutions 0, target_sum
  print "Found ", $solutions, " different solutions.\n"
end

def subset_sum_test(size)
  total = 0
  target_sum = 0
  (0..size).each { |i| $numbers[i] = rand(1000); total += $numbers[i]; print $numbers[i], " " }
  target_sum = total/2
  find_subset_sum target_sum
end

subset_sum_test 25

-- ClausBrod - 17 Apr 2006

The other day, I experimented with Python and thought I'd start with a quasi-verbatim translation of the subset sum code. Here is the result. Apologies for the non-idiomatic and naïve implementation.


import random
import array
import sys

numbers = array.array('i')
flags = array.array('c')
solutions = 0

def find_solutions(k, target_sum):
    global solutions

    if target_sum == 0:
        print "  Solution:",
        for i in range(0, len(numbers)):
            if flags[i] != 0:
                print numbers[i],
        print
        solutions = solutions + 1
    else:
        if k < len(numbers):
            if (numbers[k] * (len(numbers)-k+1)) >= target_sum:
                if target_sum >= numbers[k]:
                    flags[k] = 1
                    find_solutions(k+1, target_sum - numbers[k])
                    flags[k] = 0
                find_solutions(k+1, target_sum)

def find_subset_sum(target_sum):
    global solutions
    global flags
    print "Subsets which sum up to %s:" % target_sum
    flags = [0] * len(numbers)
    find_solutions(0, target_sum)
    print "Found", solutions, "different solutions"

def subset_sum_test(size):
    global numbers
    total = 0

    print "Random values:\n  ",
    for i in range(0, size):
        numbers.append(random.randint(0, 1000))
        total = total + numbers[i]
        print numbers[i],
    print

    numbers = sorted(numbers, reverse = True)
    target_sum = total/2
    find_subset_sum(target_sum)

subset_sum_test(15 if len(sys.argv) < 2 else int(sys.argv[1]))

See also A Subset Of Python.

-- ClausBrod - 30 Jun 2013

And here is an implementation in C#:


using System;

namespace SubsetSum
{
   class SubsetSum
   {
      private int[] numbers;
      private bool[] flags;

      private int findSolutions(int k, int targetSum, int solutions=0)
      {
         if (targetSum == 0) {
            Console.Write("  Solution: ");
            for (int i=0; i<numbers.Length; i++) {
               if (flags[i]) {
                  Console.Write("{0} ", numbers[i]);
               }
            }
            Console.WriteLine();
            solutions++;
         } else {
            if (k < numbers.Length) {
               if ((numbers[k] * (numbers.Length - k + 1)) >= targetSum) {
                  if (targetSum >= numbers[k]) {
                     flags[k] = true;
                     solutions = findSolutions(k + 1, targetSum - numbers[k], solutions);
                     flags[k] = false;
                  }
                  solutions = findSolutions(k + 1, targetSum, solutions);
               }
            }
         }
         return solutions;
      }

      public void solve() {
         Array.Sort(numbers, (x, y) => y - x); // sort in reverse order
         Array.Clear(flags, 0, flags.Length);

         int total = 0;
         Array.ForEach(numbers, (int n) => total += n);
         int solutions = findSolutions(0, total / 2);
         Console.WriteLine("Found {0} different solutions.", solutions);
      }

      SubsetSum(int size) {
         numbers = new int[size];
         Random r = new Random();
         for (int i = 0; i < size; i++) {
            numbers[i] = r.Next(1000);
            Console.Write("{0} ", numbers[i]);
         }
         Console.WriteLine();

         flags = new bool[size];
      }

      public static void Main(string[] args)
      {
         int size = args.Length > 1 ? int.Parse(args[1]) : 15;
         new SubsetSum(size).solve();
      }
   }
}

A naïve implementation in Delphi:

program subsetsum;

{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.Generics.Collections,
  System.Generics.Defaults,
  System.SysUtils;

type
 TSubsetSum = class
  private
    FNumbers: TArray<Integer>;
    FFlags: TArray<Boolean>;

    function FindSolutions( aK: Integer; aTargetSum: Integer; aSolutions: Integer = 0 ): Integer;
  public
    procedure Solve( );
    constructor Create( aSize: Integer );
  end;

var
  vSize: Integer;
  vSubsetSum: TSubsetSum;

constructor TSubsetSum.Create( aSize: Integer );
var
  i: Integer;
begin
 SetLength( FNumbers, aSize );
 SetLength( FFlags, aSize );
  Randomize;
  for i := 0 to aSize - 1 do
  begin
    FNumbers[i] := Random( 1000 );
    Write( FNumbers[i].ToString + ' ' );
  end;
  writeln;
end;

function TSubsetSum.!FindSolutions( aK, aTargetSum, aSolutions: Integer ): Integer;
begin
  if ( aTargetSum = 0 ) then
  begin
    write( '  Solution: ' );
    for var i := 0 to Length( FNumbers ) - 1 do
      if FFlags[i] then
        write( FNumbers[i].ToString + ' ' );
    writeln;
    inc( aSolutions );
  end
  else
  begin
    if ( aK < Length( FNumbers ) ) then
      if ( ( FNumbers[aK] * ( Length( FNumbers ) - aK + 1 ) ) >= aTargetSum ) then
      begin
        if ( aTargetSum >= FNumbers[aK] ) then
        begin
          FFlags[aK] := True;
          aSolutions := FindSolutions( aK + 1, aTargetSum - FNumbers[aK], aSolutions );
          FFlags[aK] := False;
        end;
        aSolutions := FindSolutions( aK + 1, aTargetSum, aSolutions );
      end;
  end;
  Result := aSolutions;
end;

procedure TSubsetSum.Solve;
var
  vTotal: Integer;
  vSolutions: Integer;
begin
  TArray.Sort<Integer>( FNumbers, TComparer<Integer>.Construct(
    function( const Left, Right: Integer ): Integer
    begin
      Result := Right - Left;
    end ) );

  vTotal := 0;
  for var vNumber: Integer in FNumbers do
    vTotal := vTotal + vNumber;

  vSolutions := FindSolutions( 0, vTotal div 2 );
  writeln( Format( 'Found %d different solutions.', [vSolutions] ) );
end;

begin
  vSize := 15;
  if ( ParamCount > 1 ) then
    vSize := ParamStr( 0 ).ToInteger;

  vSubsetSum := TSubsetSum.Create( vSize );
  vSubsetSum.Solve( );
  vSubsetSum.Free;
end.

-- ClausBrod - 19 Jan 2021

-- ClausBrod - 16 Mar 2016
to top

End of topic
Skip to action links | Back to top

You are here: CoCreateModeling > OsdmMacros > MacroSubsetSum

r1.24 - 19 Jan 2021 - 19:20 - ClausBrod to top

CoCreateModeling

CoCreate Modeling: Solving the subset sum problem

Welcome