Parsing an execution scenario

In this section, we assume an execution scenario has already been created and executed. For reference on how to do that, check the Execution Scenario section.

optilog.running.parse_scenario(path, parsing_info, trim_path='common', add_logs=False, simplify_index=False, time_scale='s')
Parameters
  • path (str) – Path to execution scenario

  • parsing_info (Union[ParsingInfo, Dict[str, ParsingInfo]]) – ParsingInfo object used to parse the logs. For a multi-solver scenario, a dictionary is accepted with solver names as keys and their corresponding ParsingInfo objects as values

  • trim_path (Union[bool, str]) – How to trim the path to the instance. Can be False (don't trim), “common” (trim the common path of the files) or “name” (keep only the name of the files)

  • add_logs (Union[bool, str]) – Add a logs_path column with the path to the output of the tasks. Can be False (don't add the column), “trimmed” (add the trimmed path to the logs) or “full” (add the full path to the logs)

  • simplify_index (bool) – Whether to simplify the row index when possible. If True and the tasks were executed with a single seed, the seed is added as a column instead of as an index level.

  • time_scale (str) – Scale to be used for the time values. Can be “s” (seconds) or “ms” (milliseconds)

Return type

DataFrame

Returns

DataFrame with the parsed information

To parse a scenario we use the parse_scenario method, which returns a DataFrame with the parsed data. parse_scenario parses the logs following the directives of a ParsingInfo object.

class optilog.running.ParsingInfo

Creates a blank ParsingInfo with no parsing filters.

static from_template(output_format=None, model_format=None)

Generates a ParsingInfo instance with the most common parsing filters for SAT/MaxSAT experiments.

The output_format parameter parses the Solution Status line (‘s SATISFIABLE’ etc.) into a filter called sat. Additionally, in maxsat mode the Solution Cost Line (‘o 13’) is also parsed into a filter called cost. The model_format parameter parses the Solution Values into a filter called model. The ‘standard’ format expects decimal digits (v 1 -2 3 -4), while the ‘binary’ format expects a binary output (v 1010).

All the filters are added with a timestamp but no history. The cost filter is cast to integer and the model filter is parsed to a list of decimal integers.

Parameters
  • output_format (Optional[str]) – Accepts {‘sat’, ‘maxsat’, None} as options. Specifies the Solution Status and Solution Cost Line filters.

  • model_format (Optional[str]) – Accepts {‘standard’, ‘binary’, None} as options. Specifies the format of the model to be parsed. If it is None, models won’t be parsed, which may improve performance.

Return type

ParsingInfo

Returns

A ParsingInfo instance with configured filters
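As an illustration of the solver output lines that from_template is designed to recognize, the snippet below matches them with simplified regular expressions using Python's re module. These patterns are illustrative only; the exact expressions used internally by the library are not shown here.

```python
import re

# Illustrative (not the library's internal) patterns for the standard
# SAT/MaxSAT output lines that from_template targets.
status_re = re.compile(r"^s\s+(SATISFIABLE|UNSATISFIABLE|UNKNOWN)$")
cost_re = re.compile(r"^o\s+(\d+)$")             # Solution Cost Line
model_re = re.compile(r"^v\s+((?:-?\d+\s*)+)$")  # 'standard' model format

log = [
    "c this is a comment line",
    "o 13",
    "s SATISFIABLE",
    "v 1 -2 3 -4 0",
]

for line in log:
    if m := status_re.match(line):
        print("sat:", m.group(1))
    elif m := cost_re.match(line):
        print("cost:", int(m.group(1)))
    elif m := model_re.match(line):
        # The matched group is split into a list of decimal integers,
        # mirroring how the model filter is parsed.
        model = [int(x) for x in m.group(1).split()]
        print("model:", model)
```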

add_filter(name, expression, timestamp=False, save_history=False, cast_to=None)

Adds a filter to parse the logs. The name of the filter will be added as a column of the pandas DataFrame. The expression must have a matching group (surrounded by parentheses). If the timestamp parameter is enabled, a “wall_time_{name}” column (and, when requested, a “cpu_time_{name}” column) will be added to the DataFrame with the timestamp of the match, expressed in the scale selected by the time_scale parameter of parse_scenario. By default, only the last match of each filter is logged; if you want access to all the matches, enable save_history, which saves the history of all the parsed values as a list. If the cast_to parameter is assigned a numeric type, the matched value of the filter will be cast while parsing the logs.

Parameters
  • name (str) – Name of the filter tag

  • expression (str) – Matching regex with at least one group

  • timestamp (Union[bool, str]) – Whether to include the timestamp of the match. The possible values are False for no timestamp, True for all the timestamps, “cpu” for only the cpu timestamps, and “wall” for only the wall timestamps.

  • save_history (bool) – Whether or not to save the history of all the matches.

  • cast_to (Optional[Type[Union[int, float]]]) – Accepts {None, int, float}. Specifies whether the match should be cast
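Conceptually, a filter scans the whole log with its regex and keeps either the last match or, with save_history, every match. A minimal sketch of that behaviour with Python's re module (this is not the library's actual implementation):

```python
import re

log = """\
Restart 1
Num conflicts: 100
Restart 2
Num conflicts: 250
"""

# One matching group, as add_filter requires; cast_to=int corresponds
# to the int() conversion below.
pattern = re.compile(r"Num conflicts:\s*(\d+)")
matches = [int(m.group(1)) for m in pattern.finditer(log)]

# Default behaviour: only the last match is kept.
last = matches[-1]
# With save_history=True, all matches are kept as a list.
history = matches
print(last, history)
```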

Here we have a very basic example with a single SAT solver (glucose):

from optilog.running import ParsingInfo, parse_scenario

parsing_info = ParsingInfo.from_template('sat', 'standard')
# (Optional) Add extra parsing regex
parsing_info.add_filter('num_vars', r'Number of variables:\s*(\d+)', cast_to=int)
parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)

df = parse_scenario(
    './example_glucose',
    parsing_info=parsing_info
)

And then analyze the resulting dataframe:

# 1. Print column from the parsing tag:
print(df['glucose41']['num_clauses'])

# 2. Print column from the parsing tag defined in the scenario itself:
print(df['glucose41']['num_vars'])


# 3. Available information:
print(df.columns)


# 4. Manual example:

# 4.0 Select SAT solver:
df = df['glucose41']

# 4.1 Select SAT instances:
print(df[df['sat'] == 'SATISFIABLE'])

# 4.2 Select UNSAT instances solved in less than 10s:
# NOTE: Careful with the parentheses (this is a pandas issue)
print(df[(df['sat'] == 'UNSATISFIABLE') & (df['wall_time_sat'] < 10)])

# 4.3 Select instances with more than 1M clauses and 250k variables

filtered = df[(df['num_vars'] > 250_000) & (df['num_clauses'] > 1_000_000)]
print(filtered)

# 4.4 Check if all the instances of the previous example are unsolved

print('Filtered are unsolved?', filtered['sat'].isnull().all())
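Step 4.4 above relies on unsolved instances appearing as missing values in the sat column. Assuming standard pandas semantics, the possible outcomes can be tallied on a mock column like this (the values are made up):

```python
import pandas as pd

# Mock 'sat' column: None marks an instance the solver did not answer.
sat = pd.Series(
    ["SATISFIABLE", "UNSATISFIABLE", None, "SATISFIABLE", None],
    name="sat",
)

# value_counts(dropna=False) also counts the unsolved (NaN) instances.
print(sat.value_counts(dropna=False))
print("solved:", sat.notnull().sum(), "unsolved:", sat.isnull().sum())
```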

ParsingInfo offers a lot of parsing flexibility. The following examples show some of what is possible with it:

# Parse the string "Number of clauses: 34" and cast the result to int
parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)

# Parse the string "Suboptimum: 34.72" and cast the result to float.
# Keeps a history of all the parsed values and stores them in a list.
parsing_info.add_filter('subopt', r'Suboptimum:\s*(\d+\.?\d*)', cast_to=float, save_history=True)
# If both timestamp and save_history are True, the history will be a list of tuples (timestamp, value)
# If timestamp is both cpu and wall, the history will be a list of tuples ((cpu, wall), value)

# Parse the string "o 34" and cast the result to int.
# On parsing, will create two/three columns. A "cost" column with the last match,
# and a "wall_time_cost" (and optionally "cpu_time_cost") column with the timestamp
# in seconds when the string was printed.
parsing_info.add_filter("cost", r"^o\s+(\d+)", timestamp=True, cast_to=int)

# This is the same as the previous example, but with timestamps in milliseconds.
# The time scale is selected when parsing the scenario, not on the filter itself:
parsing_info.add_filter("cost", r"^o\s+(\d+)", timestamp=True, cast_to=int)
df = parse_scenario('./scenario', parsing_info=parsing_info, time_scale='ms')
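As a sketch of consuming such a history after parsing: assuming a filter was added with timestamp=True and save_history=True, its column would hold a list of (timestamp, value) tuples as described above. The numbers below are made up for illustration:

```python
# Hypothetical parsed history for one instance: (wall timestamp, cost).
cost_history = [(0.5, 120), (2.1, 87), (9.8, 34)]

# Best (lowest) cost found and the time at which it was reached.
best_time, best_cost = min(cost_history, key=lambda tv: tv[1])
print(f"best cost {best_cost} found at {best_time}s")

# Cost trajectory, e.g. for plotting an anytime curve.
times = [t for t, _ in cost_history]
costs = [c for _, c in cost_history]
```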

The following example shows how to parse the number of conflicts from an experiment executed on several instances with the Glucose solver. Notice that this scenario has been created with a single seed.

from optilog.running import ParsingInfo, parse_scenario

parsing_info = ParsingInfo()
parsing_info.add_filter('conflicts', r'Num conflicts: (\d+)')

df = parse_scenario(
    './scenario',
    parsing_info=parsing_info
)

We would get an output like this one:

>>> print(df)
                                                   glucose41
                                                   conflicts
instance                                       seed
manthey/traffic/traffic_pcb_unknown.cnf.gz     1       739612
manthey/traffic/traffic_b_unsat.cnf.gz         1       895011
manthey/traffic/traffic_3b_unknown.cnf.gz      1       876072
manthey/traffic/traffic_fb_unknown.cnf.gz      1       937239
manthey/traffic/traffic_kkb_unknown.cnf.gz     1       950170
(...)

Notice that both the rows and the columns use a MultiIndex. If we parsed with simplify_index=True we would get a single index on the rows:

>>> print(df)
                                           glucose41
                                                   seed conflicts
manthey/traffic/traffic_pcb_unknown.cnf.gz            1    739612
manthey/traffic/traffic_b_unsat.cnf.gz                1    895011
manthey/traffic/traffic_3b_unknown.cnf.gz             1    876072
manthey/traffic/traffic_fb_unknown.cnf.gz             1    937239
manthey/traffic/traffic_kkb_unknown.cnf.gz            1    950170

Now there are two columns, one for the seed and one for the number of conflicts.

A MultiIndex on the rows is useful when the experiment is run with multiple seeds. A MultiIndex on the columns is useful when the experiment is run with multiple solvers.

Here is an example of an experiment with multiple seeds and multiple solvers:

>>> print(df)
                                                         glucose   cadical
                                                       conflicts conflicts
instance                                           seed
manthey/traffic/traffic_pcb_unknown.cnf.gz         2        9932     13823
                                                   1        4743      8738
manthey/traffic/traffic_b_unsat.cnf.gz             2       21886     14970
                                                   1       10158     14999
manthey/traffic/traffic_3b_unknown.cnf.gz          1       10778     21547
                                                   2       22622     20365
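Row and column MultiIndexes like the ones above can be sliced with standard pandas operations. A self-contained sketch with made-up numbers (mimicking, not produced by, parse_scenario):

```python
import pandas as pd

# Mock DataFrame shaped like a multi-seed, multi-solver scenario.
rows = pd.MultiIndex.from_tuples(
    [("a.cnf.gz", 1), ("a.cnf.gz", 2), ("b.cnf.gz", 1), ("b.cnf.gz", 2)],
    names=["instance", "seed"],
)
cols = pd.MultiIndex.from_product([["glucose", "cadical"], ["conflicts"]])
df = pd.DataFrame(
    [[4743, 8738], [9932, 13823], [10158, 14999], [21886, 14970]],
    index=rows, columns=cols,
)

# Select one solver (drops the first column level).
glucose = df["glucose"]

# Select all rows for seed 1 with a cross-section.
seed1 = df.xs(1, level="seed")

# Average conflicts per instance across seeds.
mean_conflicts = df.groupby(level="instance").mean()
print(mean_conflicts)
```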

If we needed to parse the same attributes with different regular expressions (or we needed to parse different attributes), we could supply multiple ParsingInfo objects to the parse_scenario function:

df = parse_scenario(
    './scenario',
    parsing_info={
        'cadical': parsing_info_cadical,
        'glucose': parsing_info_glucose,
    }
)
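Per-solver ParsingInfo objects are typically needed because solvers report the same attribute in different formats. The log lines and regexes below are hypothetical, but they show why a single expression may not fit both solvers; each regex would be passed to add_filter on the corresponding ParsingInfo:

```python
import re

# Hypothetical log formats: each solver reports conflicts differently.
glucose_line = "c conflicts : 9932"
cadical_line = "c statistics: conflicts 13823"

# One regex per solver, e.g. for
# parsing_info_glucose.add_filter(...) and parsing_info_cadical.add_filter(...)
glucose_re = re.compile(r"conflicts\s*:\s*(\d+)")
cadical_re = re.compile(r"conflicts\s+(\d+)")

print(int(glucose_re.search(glucose_line).group(1)))  # 9932
print(int(cadical_re.search(cadical_line).group(1)))  # 13823

# Neither regex matches the other solver's format:
assert glucose_re.search(cadical_line) is None
assert cadical_re.search(glucose_line) is None
```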