Parsing an execution scenario

In this section, we assume an execution scenario has already been created and executed. For reference on how to do that, check the ref:Execution Scenario <running_scenario-generation> section.

optilog.running.parse_scenario(path, parsing_info, trim_path='common', add_logs=False, simplify_index=True)
Parameters
  • path (str) – Path to execution scenario

  • parsing_info (Union[ParsingInfo, Dict[str, ParsingInfo]]) – Parsing Information object to be used to parse the logs. In case of a multi solver scenario a dictionary is accepted with keys as solver names and values as their corresponding ParsingInfo

  • trim_path – Trim path to the instance. Can be False (don’t trim), “common” (trim the common path of the files), “name” (keep only the name of the files)

  • add_logs (bool) – Add a logs_path column with the path to the output of the tasks. Can be False (don’t add column), “trimmed” (add the trimmed path to the logs) or “full” (add the full path to the logs)

  • simplify_index (bool) – Whether or not to simplify the row and column indexing when possible. If true, when the tasks are only executed with one seed, the seed will be added as a column but not as an index. If true, if the tasks are run with only one solver, it will be added as a column but not as a column-index.

Return type

DataFrame

Returns

DataFrame with the parsed information

In order to parse a scenario we will use the method parse_scenario, which will return a DataFrame with parsed data. parse_scenario parses the logs following the directives of a ParsingInfo object.

class optilog.running.ParsingInfo

Creates a blank Parsing Info with no parsing tags.

static from_template(output_format=None, model_format=None)

Generates a ParsingInfo instance with the most common parsing filters for SAT/MaxSAT experiments.

The parameter output_format parses the Solution Status status line (‘s SATISFIABLE’ etc.) on a filter called sat. Additionally, in maxsat mode we also parse the Solution Cost Line (‘o 13’) on a filter called cost. The parameter model_format parses the Solution Values on a filter called model. The ‘standard’ format expects decimal digits (v 1 -2 3 -4), while the ‘binary’ format expects a binary output (v 1010).

All the filters are added with a timestamp but no history. The cost filter is cast to integer and the model filter is parsed to a list of decimal integers.

Parameters
  • output_format (Optional[str]) – Accepts {‘sat’, ‘maxsat’, None} as options. Specifies the Solution Status and Solution Cost Line filters.

  • model_format (Optional[str]) – Accepts {‘standard’, ‘binary’, None} as options. Specifies the format of the model to be parsed. If it is None, models won’t be parsed, which may improve performance.

Return type

ParsingInfo

Returns

A ParsingInfo instance with configured filters

add_filter(name, expression, timestamp=False, save_history=False, cast_to=None)

Adds a filter to parse on the logs. The name of the filter will be added as a column on the pandas DataFrame. The expression must have a matching group (surrounded by parentheses). If the timestamp parameter is set to true, a new column called “time_{name}” will be added to the dataframe with the timestamp in miliseconds. By the default, only the last match of each filter is logged. But if you want to have access to all the matches, enable save_history, which will save the history of all the parsed values as a list. If the cast_to parameter is assigned a numeric type, the matched value of the filter will be casted while parsing the logs.

Parameters
  • name (str) – Name of the filter tag

  • expression (str) – Matching regex

  • timestamp (bool) – Whether or not to include the timestamp

  • save_history (bool) – Whether or not to save the history of all the matches.

  • cast_to (Optional[Type[Union[int, float]]]) – Accepts {None, int, float}. Specifies whether the match should be casted

Here we have a very basic example with a single SAT solver (glucose):

>>> from optilog.formulas.loaders import ParsingInfo
>>> from optilog.running.processor import parse_scenario
>>>
>>> parsing_info = ParsingInfo.from_template('sat', 'standard')
>>> # (Optional) Add extra parsing regex
>>> parsing_info.add_filter('num_vars', 'Number of variables:\s*(\d+)', cast_to=int)
>>> parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)
>>>
>>> df = parse_scenario(
>>>     './example_glucose',
>>>     parsing_info=parsing_info
>>> )

And then analyze the resulting dataframe:

>>>
>>> # 1. Print column from the parsing tag:
>>> print(df['num_clauses'])
>>>
>>> # 2. Print column from the parsing tag defined in the scenario itself:
>>> print(df['num_vars'])
>>>
>>>
>>> # 3. Available information:
>>> print(df.columns)
>>>
>>>
>>> # 4. Manual example:
>>>
>>> # 4.1 Select SAT instances:
>>> print(df[df['sat'] == 'SATISFIABLE'])
>>>
>>> # 4.2 Select UNSAT instances solved in less than 10s:
>>> # NOTE: Careful with the parentheses (this is a pandas issue)
>>> print(df[(df['sat'] == 'UNSATISFIABLE') & (df['time_sat'] < 10_000)])
>>>
>>> # 4.3 Select instances with more than 1M clauses and 300k variables
>>>
>>> filtered = df[(df['num_vars'] > 250_000) & (df['num_clauses'] > 1_000_000)]
>>> print(filtered)
>>>
>>> # 4.4 Check if all the instances of the previous example are unsolved
>>>
>>> print('Filtered are unsolved?', filtered['sat'].isnull().all())
>>>

ParsingInfo offers a lot of flexibility in terms of parsing. We will now see some examples of what is possible with it:

>>> # Parse the string "Number of clauses: 34" and cast the result to int
>>> parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)
>>> # Parse the string "Suboptimum: 34.72" and cast the result to float.
>>> # Keeps a history of all the parsed values and stores them in a list.
>>> parsing_info.add_filter('subopt', r'Suboptimum:\s*(\d+)', cast_to=float, save_history=True)
>>> # Parse the string "o Cost: 34" and cast the result to int.
>>> # On parsing, will create two columns. A "cost" column with the last match,
>>> # and a "time_cost" column with the timestamp in miliseconds when the string was printed.
>>> parsing_info.add_filter("cost", r"^o\s+(\d+)", timestamp=True, cast_to=int)

The following example shows how to parse an experiment with the number of conflicts on a number of instances executed with the cadical solver. Notice that this scenario has been created with a single seed.

>>> from optilog.formulas.loaders import ParsingInfo
>>> from optilog.running.processor import parse_scenario
>>>
>>> parsing_info = ParsingInfo()
>>> parsing_info.add_filter('conflicts', 'Num conflicts: (\d+)')
>>>
>>> df = parse_scenario(
>>>     './scenario',
>>>     parsing_info=parsing_info
>>> )
>>>
>>> print(df)

We would get an output like this one:

>>>                                                     seed conflicts   solver
>>> manthey/traffic/traffic_pcb_unknown.cnf.gz             1      1180  cadical
>>> manthey/traffic/traffic_b_unsat.cnf.gz                 1      9591  cadical
>>> manthey/traffic/traffic_3b_unknown.cnf.gz              1      4867  cadical
>>> jarvisalo/AAAI2010-SATPlanning/aaai10-planning-...     1     29728  cadical

If we ran the same experiment with multiple seeds, we would get a multindex on the rows:

>>>                                                         conflicts   solver
>>> instance                                           seed
>>> manthey/traffic/traffic_pcb_unknown.cnf.gz         2        22526  cadical
>>>                                                    1        23201  cadical
>>> manthey/traffic/traffic_b_unsat.cnf.gz             2         7704  cadical
>>>                                                    1        23537  cadical
>>> manthey/traffic/traffic_3b_unknown.cnf.gz          1        20797  cadical
>>>                                                    2         1165  cadical
>>> manthey/traffic/traffic_fb_unknown.cnf.gz          1         8257  cadical
>>>                                                    2         2671  cadical

Similarly, if we ran the experiment with multiple solvers we would get multindex in the columns:

>>>                                                           glucose   cadical
>>>                                                         conflicts conflicts
>>> instance                                           seed
>>> manthey/traffic/traffic_pcb_unknown.cnf.gz         2         9932     13823
>>>                                                    1         4743      8738
>>> manthey/traffic/traffic_b_unsat.cnf.gz             2        21886     14970
>>>                                                    1        10158     14999
>>> manthey/traffic/traffic_3b_unknown.cnf.gz          1        10778     21547
>>>                                                    2        22622     20365

If we needed to parse the same attributes with different regular expressions (or we needed to parse different attributes), we could supply multiple ParsingInfo objects to the parse_scenario function:

>>> df = parse_scenario(
>>>     './scenario',
>>>     parsing_info={
>>>         'cadical': parsing_info_cadical,
>>>         'glucose': parsing_info_glucose,
>>>     }
>>> )