.. _running_parsing-scenario:

Parsing an execution scenario
----------------------------------------

In this section, we assume an execution scenario has already been created and executed. For reference on how to do that, check the :ref:`Execution Scenario ` section.

.. py:module:: optilog.running
    :noindex:

.. autofunction:: parse_scenario

To parse a scenario we use the function `parse_scenario`, which returns a DataFrame with the parsed data. `parse_scenario` parses the logs following the directives of a `ParsingInfo` object.

.. autoclass:: ParsingInfo

Here is a very basic example with a single SAT solver (glucose):

.. code:: python
    :number-lines:

    from optilog.running import ParsingInfo, parse_scenario

    parsing_info = ParsingInfo.from_template('sat', 'standard')

    # (Optional) Add extra parsing regex
    parsing_info.add_filter('num_vars', r'Number of variables:\s*(\d+)', cast_to=int)
    parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)

    df = parse_scenario(
        './example_glucose',
        parsing_info=parsing_info
    )

We can then analyze the resulting dataframe:

.. code:: python
    :number-lines:

    # 1. Print column from the parsing tag:
    print(df['glucose41']['num_clauses'])

    # 2. Print column from the parsing tag defined in the scenario itself:
    print(df['glucose41']['num_vars'])

    # 3. Available information:
    print(df.columns)

    # 4. Manual example:
    # 4.0 Select SAT solver:
    df = df['glucose41']

    # 4.1 Select SAT instances:
    print(df[df['sat'] == 'SATISFIABLE'])

    # 4.2 Select UNSAT instances solved in less than 10s:
    # NOTE: Each comparison needs its own parentheses, because in pandas
    # the & operator binds tighter than == and <
    print(df[(df['sat'] == 'UNSATISFIABLE') & (df['wall_time_sat'] < 10)])

    # 4.3 Select instances with more than 1M clauses and 250k variables
    filtered = df[(df['num_vars'] > 250_000) & (df['num_clauses'] > 1_000_000)]
    print(filtered)

    # 4.4 Check if all the instances of the previous example are unsolved
    print('Filtered are unsolved?', filtered['sat'].isnull().all())

ParsingInfo offers a lot of flexibility in terms of parsing. The following examples show some of what is possible with it:

.. code:: python
    :number-lines:

    # Parse the string "Number of clauses: 34" and cast the result to int
    parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)

.. code:: python
    :number-lines:

    # Parse the string "Suboptimum: 34.72" and cast the result to float.
    # The group accepts an optional decimal part so that "34.72" is captured whole.
    # Keeps a history of all the parsed values and stores them in a list.
    parsing_info.add_filter('subopt', r'Suboptimum:\s*(\d+(?:\.\d+)?)', cast_to=float, save_history=True)
    # If both timestamp and save_history are True, the history will be a list of tuples (timestamp, value)
    # If timestamp is both cpu and wall, the history will be a list of tuples ((cpu, wall), value)

.. code:: python
    :number-lines:

    # Parse the string "o 34" and cast the result to int.
    # On parsing, will create two/three columns. A "cost" column with the last match,
    # and a "wall_time_cost" (and optionally "cpu_time_cost") column with the timestamp
    # in seconds when the string was printed.
    parsing_info.add_filter("cost", r"^o\s+(\d+)", timestamp=True, cast_to=int)

.. code:: python
    :number-lines:

    # This is the same as the previous example, but the timestamp is in milliseconds.
    parsing_info.add_filter("cost", r"^o\s+(\d+)", timestamp=True, cast_to=int, time_scale='ms')

The following example parses the number of conflicts reported for a set of instances executed with the Glucose solver. Notice that this scenario has been created with a single seed.

.. code:: python
    :number-lines:

    from optilog.running import ParsingInfo, parse_scenario

    parsing_info = ParsingInfo()
    # Raw string, so that \d reaches the regex engine unescaped
    parsing_info.add_filter('conflicts', r'Num conflicts: (\d+)')

    df = parse_scenario(
        './scenario',
        parsing_info=parsing_info
    )

We would get an output like this one:

.. code:: python
    :number-lines:

    >>> print(df)
                                                    glucose41
                                                    conflicts
    instance                                   seed
    manthey/traffic/traffic_pcb_unknown.cnf.gz 1       739612
    manthey/traffic/traffic_b_unsat.cnf.gz     1       895011
    manthey/traffic/traffic_3b_unknown.cnf.gz  1       876072
    manthey/traffic/traffic_fb_unknown.cnf.gz  1       937239
    manthey/traffic/traffic_kkb_unknown.cnf.gz 1       950170
    (...)

Notice that both the rows and the columns are a multiindex. If we parsed with simplify_index=True we would get a single index on the rows:

.. code:: python
    :number-lines:

    >>> print(df)
                                               glucose41
                                                    seed conflicts
    manthey/traffic/traffic_pcb_unknown.cnf.gz         1    739612
    manthey/traffic/traffic_b_unsat.cnf.gz             1    895011
    manthey/traffic/traffic_3b_unknown.cnf.gz          1    876072
    manthey/traffic/traffic_fb_unknown.cnf.gz          1    937239
    manthey/traffic/traffic_kkb_unknown.cnf.gz         1    950170

Now there are two columns, one for the seed and one for the number of conflicts.

Multiindex on the rows is useful when we run the experiment with multiple seeds. Multiindex on the columns is useful when we run the experiment with multiple solvers.

Here is an example of an experiment with multiple seeds and multiple solvers:

.. code:: python
    :number-lines:

    >>> print(df)
                                                      glucose   cadical
                                                    conflicts conflicts
    instance                                   seed
    manthey/traffic/traffic_pcb_unknown.cnf.gz 2         9932     13823
                                               1         4743      8738
    manthey/traffic/traffic_b_unsat.cnf.gz     2        21886     14970
                                               1        10158     14999
    manthey/traffic/traffic_3b_unknown.cnf.gz  1        10778     21547
                                               2        22622     20365

If we need to parse the same attributes with different regular expressions (or to parse different attributes) for each solver, we can supply multiple ParsingInfo objects to the `parse_scenario` function, keyed by solver name:

.. code:: python
    :number-lines:

    df = parse_scenario(
        './scenario',
        parsing_info={
            'cadical': parsing_info_cadical,
            'glucose': parsing_info_glucose,
        }
    )

..
    >>>
    >>> # WARNING! The API of the following functions will
    >>> # probably change and it is best to not try them yet
    >>> # from optilog.running import virtual_best_solver, get_scores, get_scores_solvers, par_score
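To build intuition for what a filter's `cast_to` and `save_history` options do, the following standalone sketch mimics that behaviour using only the standard `re` module. It is not OptiLog's implementation: the helper `apply_filter` and the sample log lines are hypothetical, and returning `None` when nothing matches is an assumption made for illustration.

```python
import re


def apply_filter(log_lines, pattern, cast_to=str, save_history=False):
    """Hypothetical helper (not part of OptiLog) that imitates one filter.

    Returns the last casted match, or the list of all casted matches when
    save_history is True. Returns None if nothing matched and no history
    was requested (an assumption of this sketch).
    """
    regex = re.compile(pattern)
    history = []
    for line in log_lines:
        match = regex.search(line)
        if match:
            # The first capture group holds the value to be cast
            history.append(cast_to(match.group(1)))
    if save_history:
        return history
    return history[-1] if history else None


# Made-up solver output, used only for illustration
log = [
    "c Number of clauses: 34",
    "Suboptimum: 40.5",
    "Suboptimum: 34.72",
]

num_clauses = apply_filter(log, r"Number of clauses:\s*(\d+)", cast_to=int)
subopt_history = apply_filter(
    log, r"Suboptimum:\s*(\d+(?:\.\d+)?)", cast_to=float, save_history=True
)
missing = apply_filter(log, r"Num conflicts: (\d+)", cast_to=int)

print(num_clauses, subopt_history, missing)
```

Testing a regular expression this way against a few real log lines, before passing it to `add_filter`, is a cheap way to catch patterns that silently capture only part of a value.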