The European Patent Office considered several method steps for generating a parallel computation graph to be non-technical. Here are the practical takeaways from decision T 1125/17 (Parallelizing computation graphs/AB INITIO) of Technical Board of Appeal 3.5.06, dated April 2, 2019:
This patent application relates to computation graphs in which the vertices and the links define, respectively, “data processing elements” and the data flow between them. More specifically, the application relates to the transformation of “serial computation graphs” into “parallel computation graphs”. The application explains that it may be desirable to implement a computation graph using multiple instances of individual components. For example, each instance of a component may be hosted on a different processor, thereby achieving a coarse-grain parallelism that provides an overall increase in computation capacity.
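As a rough illustration (not taken from the application itself; all names are hypothetical), such a serial computation graph can be modelled as vertices for the data processing elements and directed links for the data flows between them:

```python
# Minimal sketch of a serial computation graph: vertices are data
# processing elements, directed links carry the data flow between them.
# All identifiers are illustrative assumptions, not the applicant's code.

class ProcessingElement:
    def __init__(self, name, degree_of_parallelism=1):
        self.name = name
        # number of parallel instances when the graph is parallelized
        self.degree = degree_of_parallelism

class Link:
    def __init__(self, upstream, downstream):
        self.upstream = upstream      # element producing the data flow
        self.downstream = downstream  # element consuming the data flow

# A two-element serial graph: read -> transform
read = ProcessingElement("read")
transform = ProcessingElement("transform")
graph = [Link(read, transform)]
```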
Claim 1 (auxiliary request 5)
A computer-implemented method for automated generation, from a user specified computation graph, of a parallel computation graph that specifies the distribution of the computations of the user specified computation graph for parallel execution on a number of separate processors, the parallel computation graph including:
at least a first parallel data processing element representing a plurality of instances of a first data processing element arranged in parallel;
at least a second parallel data processing element representing a plurality of instances of a second data processing element arranged in parallel; and
at least a parallelized inter-component link joining the first parallel data processing element and the second parallel data processing element,
whereby, during execution of the parallel computation graph, each instance of the data processing elements may be hosted on a different processor;
the method including: accepting, by a computer system, a specification of the user specified computation graph in which data processing elements are joined by linking elements, each of the linking elements being associated with a data flow from an associated upstream one of the data processing elements to an associated downstream one of the data processing elements; for each of one or more of the linking elements (205) of the computation graph, that joins an upstream data processing element (210) to a downstream data processing element (240) and for which parallel characteristics are undetermined:
determining, by the computer system, data processing characteristics of a parallelized inter-component link (205) representing the linking element and configured to process data records with a degree of parallelism, the parallelized inter-component link comprising a parallel partition element (221), a parallel gather element (231) and an interconnection network (225), the determining being based on:
the degree of parallelism of the upstream data processing element (m) and the downstream data processing element (n) associated with the linking element;
the characteristics of the output flow from the upstream data processing element (210) associated with the linking element;
the requirements of the input flow of the downstream data processing element associated with the linking elements;
the determining being based on:
metadata associated with the processing elements, the metadata comprising data indicators that indicate whether: if partitioned, the input flow of the processing element must be partitioned according to a particular key or keys; each instance of the component must receive copies of all work elements on its input; and the input flow must be sorted, and the key or keys that define the sort order; and
processing element specific mapping functions that accept the characteristics of each of the input data flows of the components, and produces characteristics for each of the output flows; wherein:
if m=n, and the input to the downstream data processing element does not need to be partitioned or sorted according to any particular key, and the input flow does not need a copy of each work element, corresponding instances of the upstream and downstream components are connected directly by serial links;
if m is not equal to n and the input flow to the downstream component does not need to be partitioned according to any particular key, and the input flow does not need a copy of each work element, then the partition element of the inter-component link is defined to perform a round-robin distribution;
if the input flow to the downstream component requires the work elements to be partitioned according to a set of keys that is different than the partitioning of the output flow of the upstream component, the partitioning element performs a hash partition according to the required key values; and
if the input flow requires a copy of each work element, then the partition element of the inter-component link is defined to perform a broadcast function; and
inserting the parallelized inter-component link on a data path joining m instances of the upstream data processing element and n instances of the downstream data processing element associated with the linking element, wherein the parallel partition element (221) is implemented by m instances of a partition element (220), the parallel gather element (231) is implemented by n instances of a gather element (230), and the interconnection network (225) is implemented as a cross-connection of serial links in which every instance of partition element (220) is connected to every instance of gather element (230),
the method further comprising executing, on a computer system comprising at least two separate processors, the parallel computation graph by distributing the parallel computation graph and performing parallel execution thereof on a number of separate processors.
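Read together, the conditional steps of the claim amount to a case analysis per linking element, plus an m × n cross-connection for the interconnection network. The sketch below is an interpretation of the claim wording, not the applicant's implementation; the function names, the ordering of the checks, and the fall-back for cases the claim does not cover are assumptions:

```python
# Sketch of the claim's case analysis for one linking element, where m
# is the degree of parallelism of the upstream element and n that of
# the downstream element. All identifiers are illustrative assumptions.

def choose_partition_strategy(m, n, needs_key_partition, required_keys,
                              upstream_partition_keys,
                              needs_copy_of_each_element):
    # Broadcast: every instance must receive a copy of every work element.
    if needs_copy_of_each_element:
        return "broadcast"
    # Repartition: downstream requires keys that differ from the
    # partitioning of the upstream output flow.
    if needs_key_partition and required_keys != upstream_partition_keys:
        return "hash_partition"
    # Same degree, no key constraint: corresponding instances of the
    # upstream and downstream components are connected by serial links.
    if m == n and not needs_key_partition:
        return "direct_serial_links"
    # Different degrees, no key constraint: round-robin distribution.
    if m != n and not needs_key_partition:
        return "round_robin"
    # Not covered by the claim wording; hash partition as an assumed
    # fall-back when keys already match but the degrees differ.
    return "hash_partition"

def build_interconnection_network(m, n):
    # Cross-connection of serial links: every one of the m partition
    # element instances is connected to every one of the n gather
    # element instances, i.e. m * n serial links in total.
    return [(p, g) for p in range(m) for g in range(n)]
```

For instance, with m = n = 4 and no key or copy requirements the function selects direct serial links, with m = 2 and n = 3 it selects round-robin, and `build_interconnection_network(2, 3)` yields the six serial links of the cross-connection.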
Is it patentable?
According to the applicant, the parallel computation graph generated by the claimed method would be easier to parallelize than the original user-specified graph:
6. During the oral proceedings, in the context of auxiliary request 4, it was discussed whether the transformation of one computation graph into another, without an explicit mention of the deployment of parallel hardware or the execution on parallel hardware, would have to be accepted as a technical effect on the assumption that the generated computation graph was established as lending itself more easily to parallelization than the given one. Although this question turned out not to be decisive for the case at hand, the appellant encouraged the board to comment on this question in its decision.
In response, the Board explained that, in its view, the generated parallel computation graph is actually not easier to parallelize than the given user-specified graph. Moreover, the Board argued that an accelerated execution of a parallel program depends not only on the program itself but also on the computer platform on which it is executed. In other words, when a program is executed on a single-core architecture, no speed-up occurs, whereas an acceleration may be achieved when the same program is executed on a multi-core platform:
6.3 Even a program written in a programming language with parallelization instructions or with some express potential for parallel execution such as array processing may be executed on parallel hardware or not. Such a program can also be executed on a single-core processor. In this case, a speed-up by parallelization is not achieved. In many cases, a parallel program could be expected to execute more slowly on the single core than an equivalent serial program since any parallelization overhead is not compensated by the speed-up of parallel computation. This is to say that the speed-up of parallelization is not achieved by the form of the parallel program alone and not before the program is actually deployed and executed on parallel hardware.
Since the effect of an accelerated execution is achieved only on a specific execution platform, the speed-up cannot be attributed to the program alone. Hence, to rely on the technical effect of an accelerated execution, the execution platform would have to be included in the claim, which is not the case here:
6.5 This board takes the view that T 1173/97 meant to make this statement only if the mentioned further technical effect was produced whenever the program was run, i.e. on any suitable hardware or runtime environment. For if that effect was produced on a particular execution platform but not on another, the effect could not be attributed to the program itself – unless maybe there was an argument to the effect that the required execution platform was implicit in the program claim. In all other situations, it would seem that the execution platform required to achieve the effect would have to be claimed as an essential feature.
6.6 This would appear to mean that if, as in the present case, an inventive-step argument is to rely upon a speed-up by parallelization, a parallel execution platform must be claimed. Consequently, the mere potential for a speed-up by parallelization would not seem to be sufficient as a “further” technical effect because this effect is not achieved irrespective of how the program is executed.
As a result, the Board dismissed the appeal.
You can read the whole decision here: T 1125/17 (Parallelizing computation graphs/AB INITIO) of April 2, 2019.