Database for sequences of time-stamped records: technical

The European Patent Office acknowledged a number of technical aspects of a method of organizing a database for sequences of time-stamped records occurring in continuous streams, but found the invention to be obvious. Here are the practical takeaways of the decision T 0818/16 (Time series search engine/SPLUNK) of 10.9.2019 of the Technical Board of Appeal 3.5.07:

Key takeaways

The provision of events as data that can be analysed is a non-technical requirement that reflects the information needed by a data analyst.

The use of indexes for querying was well-known in relational database management systems and thus indexing cannot be the basis for acknowledging inventive step.

The invention

This European patent application relates to time series data organisation, search and retrieval. Time series data are sequences of time-stamped records occurring in one or more usually continuous streams, representing some type of activity made up of discrete events such as information processing logs, market transactions and sensor data from real-time monitors (supply chains, military operation networks or security systems). The ability to index, search and present relevant search results is important for understanding and working with systems emitting large quantities of time series data.

According to the application, existing large-scale search engines (e.g. Google and Yahoo web search) are designed to address the needs of less time-sensitive types of data and are built on the assumption that data only needs to be stored in one state in the index repository, for example URLs in a web search index, records in a customer database, or documents as part of a file system. Searches for information are generally based on keywords.

The invention proposes a time series search engine (TSSE) for the indexing, searching and retrieval of time series data. One aspect of such TSSEs is the use of time as a primary mechanism for indexing, searching and/or presenting search results.

Here is how the invention is defined in claim 1 (sole request):

Claim 1 (sole request)

A computer-implemented method for searching data in a time series search engine, comprising:

receiving streams of different types of machine data generated by different types of machines, wherein different types of machine data are in different formats;

executing a time stamping process on a stream of machine data by:

classifying a collection of machine data from the stream into a domain, the domain being indicative of a source of machine data;

applying aggregation rules corresponding to the domain for the classified machine data to organize the classified machine data into a plurality of events by detecting the beginning and ending boundaries for each event;

determining, for each event of the plurality of events, a time stamp based on the domain by iterating over potential time stamp format patterns from an ordered list; and

time stamping each event of the plurality of events with its determined time stamp to create a plurality of time stamped events, wherein each time stamped event of the plurality of time stamped events includes a respective portion of the machine data, and each time stamp is normalized to a common offset;

executing an indexing process to create time bucketed indices based on the time stamps, wherein each time bucket is defined to correspond to a certain time period according to a bucketing policy, and indexing the time stamped events includes assigning each time stamped event to a time bucket from amongst a plurality of time buckets instantiated in random access computer memory, wherein the assignment is based on the time stamp for the time stamped event, and wherein the bucketing policy enforces that buckets (i) do not overlap and (ii) cover all possible incoming time stamps; and

upon receiving a time series search request that requires search results to be sorted in reverse chronological order, generating sub-searches targeted at individual time buckets, querying time buckets until a number of results specified in the search request are retrieved, and merging the results of the sub-searches into a result set organized by time according to the reverse chronological sort order for the result set, wherein the sub-search for the most recent time bucket is issued first.
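The search step at the end of claim 1 can be illustrated with a short Python sketch. It assumes fixed-width buckets keyed by floor division of the time stamp (one possible bucketing policy that meets the two claimed conditions of no overlap and full coverage), time stamps already normalised to Unix-epoch seconds, and a simple substring match as the query. The class and method names are hypothetical and are not taken from the application or from any Splunk product.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: int  # seconds since the Unix epoch (normalised common offset)
    raw: str        # the portion of machine data that forms this event


@dataclass
class TimeBucketIndex:
    """Hypothetical in-memory index with fixed-width time buckets."""

    bucket_seconds: int = 3600
    buckets: dict[int, list[Event]] = field(default_factory=dict)

    def bucket_key(self, timestamp: int) -> int:
        # Floor division maps every integer time stamp to exactly one bucket,
        # so buckets never overlap and together cover all possible time stamps.
        return timestamp // self.bucket_seconds

    def add(self, event: Event) -> None:
        self.buckets.setdefault(self.bucket_key(event.timestamp), []).append(event)

    def search_reverse_chronological(self, term: str, limit: int) -> list[Event]:
        results: list[Event] = []
        # One sub-search per bucket, issued for the most recent bucket first.
        for key in sorted(self.buckets, reverse=True):
            hits = [e for e in self.buckets[key] if term in e.raw]
            # Buckets do not overlap, so appending each bucket's hits in
            # descending time order keeps the merged list reverse chronological.
            results.extend(sorted(hits, key=lambda e: e.timestamp, reverse=True))
            if len(results) >= limit:
                break  # enough results: older buckets are never queried
        return results[:limit]


index = TimeBucketIndex(bucket_seconds=60)
for ts, line in [(0, "login ok"), (30, "error disk"), (70, "error net"),
                 (130, "login ok"), (150, "error cpu")]:
    index.add(Event(ts, line))
print([e.timestamp for e in index.search_reverse_chronological("error", limit=2)])
# prints [150, 70]
```

Because the buckets do not overlap, concatenating each bucket's hits in descending order, newest bucket first, already yields a result set in reverse chronological order, and the oldest bucket in this example is never searched.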

Is it technical?

Since claim 1 is directed to a computer-implemented method and therefore requires the use of a computer, patent-eligibility was not an issue and was not even questioned by the board.

Turning to inventive step, the board identified two groups of features that distinguished the invention from the closest prior art. Since the board saw no synergistic effect between the two groups, it assessed the obviousness of each group separately.

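The board found that the first feature group, steps B2 to B4 in the board's numbering (organising the classified machine data into events, determining each event's time stamp and time stamping the events with normalised time stamps), aimed to provide event data with normalised time stamps. Here, the board noted: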

The provision of events as data that can be analysed is a non-technical requirement that reflects the information needed by a data analyst. According to the established case law of the boards of appeal, when assessing inventive step in accordance with the problem/solution approach, an aim to be achieved in a non-technical field may legitimately appear in the formulation of the problem as part of the framework of the technical problem to be solved as a constraint that has to be met (see decisions T 641/00, OJ EPO 2003, 352; T 154/04, OJ EPO 2008, 46). Hence, steps B2 to B4 solve the problem of how to implement the conversion of the classified machine data into event data that can be analysed with respect to time.

As to step B2, D11 discloses parsing machine data in paragraph [0050]. According to the description of the present application (paragraph [0046]), an example of an aggregation rule for detecting beginning and ending boundaries of events consists of detecting line breaks. The Board is aware that the wording of step B2 is rather broad and that the event boundaries may be defined based on non-technical considerations or at least not based on further technical considerations (see opinion G 3/08, OJ EPO 2011, 10, reasons 13.5.1). However, in any case, on the relevant date the skilled person would have extended the parser of D11 with rules to detect event boundaries such as line breaks in the machine data without exercising inventive skill.

As to steps B3 and B4, D11 (see paragraphs [0028] and [0037]) discloses that the time stamps are received in different formats and that the data is stored in a sequentially ordered table. Moreover, D11 (paragraph [0046]) discloses that the system performance can be improved if the data is sorted (e.g. in chronological order) prior to insertion into the database. In view of this, it was obvious for a skilled person to store the database table in the sort order of the data to be inserted, i.e. in chronologically sorted order. Moreover, the skilled person would consider providing some kind of normalisation of the time stamps, such as normalisation to a common offset, as they are received in different formats. The application itself mentions the well-known Unix epoch as a common offset (description, paragraph [0049]). Hence, the skilled person would have considered using such a well-known common offset for normalisation.

It follows that the skilled person could and would arrive at steps B2 to B4 of claim 1 without exercising inventive skill.
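For readers who want to picture steps B2 to B4, here is a minimal Python sketch of the kind of processing the board considered obvious: splitting machine data into events at line breaks, trying time stamp formats from an ordered list, and normalising each time stamp to the Unix epoch as a common offset. The format list, the " | " separator between time stamp and message, and the rule that time stamps without an offset are treated as UTC are assumptions made for this illustration; none of them comes from the application or from D11.

```python
from datetime import datetime, timezone

# Hypothetical ordered list of time stamp formats for one domain. The first
# pattern that parses wins, so the most specific formats are listed first.
TIMESTAMP_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with UTC offset
    "%Y-%m-%d %H:%M:%S",     # no offset given; assumed to be UTC
    "%d/%b/%Y:%H:%M:%S %z",  # web-server access-log style
]


def split_events(data: str) -> list[str]:
    """Step B2 (sketch): the line-break rule, one event per non-empty line."""
    return [line for line in data.splitlines() if line.strip()]


def normalise_timestamp(text: str) -> int:
    """Steps B3/B4 (sketch): parse a time stamp, return Unix-epoch seconds."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next pattern in the ordered list
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return int(parsed.timestamp())
    raise ValueError(f"no known time stamp format matches {text!r}")


def time_stamp_events(data: str) -> list[tuple[int, str]]:
    """Pair each event with its normalised time stamp.

    Assumes (hypothetically) that each event starts with its time stamp,
    followed by a ' | ' separator and the message.
    """
    stamped: list[tuple[int, str]] = []
    for event in split_events(data):
        ts_text, _, _message = event.partition(" | ")
        stamped.append((normalise_timestamp(ts_text), event))
    return stamped
```

Because every time stamp ends up as seconds since the Unix epoch, events from sources that write time in different formats can be compared and sorted directly.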

The second feature group, steps C and D (the indexing and search steps), specified that the time stamps are used to assign the events to time buckets instantiated in random access computer memory. Here, the board took the following view:

However, the use of indexes for querying was well-known in relational database management systems and thus indexing cannot be the basis for acknowledging inventive step. Document D11 does not explain how the sorted table is actually stored, but it was usual to store such a table not in a single storage area, but in several storage areas (in the main memory or secondary storage). As the data table is sorted in chronological order, different parts of this table, which are stored in different storage areas, correspond to non-overlapping time buckets as claimed. The Board is aware that generally a further difference could be that with a sorted table the events stored within a particular part of the table are stored in sorted order, whereas the events assigned to an individual time bucket may be stored unordered. However, as steps C and D of claim 1 do not specify whether or not the data within an individual time bucket is sorted, there is no further difference which the Board needs to take into account.

In view of the above, the Board considers that, on the relevant date, the skilled person would arrive at steps C and D in an obvious manner.
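The board's argument that the parts of a chronologically sorted table held in different storage areas already behave like non-overlapping time buckets can be checked with a toy model. This Python sketch works under stated assumptions: each storage area holds a fixed, hypothetical number of rows, rows are reduced to their time stamps, and the time stamps are distinct (with duplicates, two neighbouring areas could share a single boundary value). The class and function names are invented for this example and are not taken from D11.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageArea:
    first: int              # earliest time stamp stored in this part of the table
    last: int               # latest time stamp stored in this part of the table
    rows: tuple[int, ...]   # the rows, reduced to their time stamps


def split_sorted_table(timestamps: list[int], rows_per_area: int) -> list[StorageArea]:
    """Store a chronologically sorted table in consecutive areas of fixed size."""
    ordered = sorted(timestamps)
    areas: list[StorageArea] = []
    for start in range(0, len(ordered), rows_per_area):
        chunk = tuple(ordered[start:start + rows_per_area])
        areas.append(StorageArea(first=chunk[0], last=chunk[-1], rows=chunk))
    return areas


def areas_are_non_overlapping_buckets(areas: list[StorageArea]) -> bool:
    # With distinct time stamps, each area ends strictly before the next begins,
    # so the areas behave like non-overlapping time buckets.
    return all(a.last < b.first for a, b in zip(areas, areas[1:]))


areas = split_sorted_table([50, 10, 40, 20, 30, 60, 70], rows_per_area=3)
print([(a.first, a.last) for a in areas])        # prints [(10, 30), (40, 60), (70, 70)]
print(areas_are_non_overlapping_buckets(areas))  # prints True
```

As the board pointed out, this model keeps the rows sorted within each storage area, whereas claim 1 leaves open whether the events inside a bucket are sorted.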

Therefore, the board ultimately decided that claim 1 lacks inventive step and dismissed the appeal.

More information

You can read the whole decision here: T 0818/16 (Time series search engine/SPLUNK) of 10.9.2019
