.. _running_parsing-scenario:


Parsing an execution scenario
----------------------------------------

In this section, we assume an execution scenario has already been created and executed.
For reference on how to do that, check the :ref:`Execution Scenario <running_scenario-generation>` section.
    
.. py:module:: optilog.running
    :noindex:

.. autofunction:: parse_scenario

To parse a scenario, use the `parse_scenario` function, which returns a pandas DataFrame with the parsed data. `parse_scenario` parses the logs following the directives of a `ParsingInfo` object.

.. autoclass:: ParsingInfo


Here is a basic example with a single SAT solver (Glucose):

.. code:: python
    :number-lines:

    from optilog.running import ParsingInfo, parse_scenario
    
    parsing_info = ParsingInfo.from_template('sat', 'standard')
    # (Optional) Add extra parsing regex
    parsing_info.add_filter('num_vars', r'Number of variables:\s*(\d+)', cast_to=int)
    parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)
    
    df = parse_scenario(
        './example_glucose',
        parsing_info=parsing_info
    )

We can then analyze the resulting DataFrame:

.. code:: python
    :number-lines:
    
    # 1. Print column from the parsing tag:
    print(df['glucose41']['num_clauses'])
    
    # 2. Print column from the parsing tag defined in the scenario itself:
    print(df['glucose41']['num_vars'])
    
    
    # 3. Available information:
    print(df.columns)
    
    
    # 4. Manual example:
    
    # 4.0 Select SAT solver:
    df = df['glucose41']

    # 4.1 Select SAT instances:
    print(df[df['sat'] == 'SATISFIABLE'])
    
    # 4.2 Select UNSAT instances solved in less than 10s:
    # NOTE: Careful with the parentheses (this is a pandas issue)
    print(df[(df['sat'] == 'UNSATISFIABLE') & (df['wall_time_sat'] < 10)])
    
    # 4.3 Select instances with more than 250k variables and more than 1M clauses
    
    filtered = df[(df['num_vars'] > 250_000) & (df['num_clauses'] > 1_000_000)]
    print(filtered)
    
    # 4.4 Check if all the instances of the previous example are unsolved
    
    print('Filtered are unsolved?', filtered['sat'].isnull().all())
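
The note about parentheses comes from Python's operator precedence: `&` binds more tightly than `==` and `<`, so each comparison must be wrapped before the masks are combined. Here is a minimal sketch on a hand-built frame. The rows are invented, the column names simply mirror the ones used above, and `parse_scenario` is not called.

.. code:: python
    :number-lines:

    import pandas as pd

    # Invented stand-in for the per-solver frame; None marks an unsolved instance.
    df = pd.DataFrame(
        {
            'sat': ['SATISFIABLE', 'UNSATISFIABLE', 'UNSATISFIABLE', None],
            'wall_time_sat': [3.2, 8.5, 42.0, None],
        },
        index=['a.cnf', 'b.cnf', 'c.cnf', 'd.cnf'],
    )

    # Each comparison yields a boolean Series; & combines them element-wise.
    fast_unsat = df[(df['sat'] == 'UNSATISFIABLE') & (df['wall_time_sat'] < 10)]
    print(list(fast_unsat.index))  # ['b.cnf']

    # Rows where nothing was parsed for 'sat' show up as null values.
    unsolved = df[df['sat'].isnull()]
    print(list(unsolved.index))  # ['d.cnf']

Without the parentheses, Python would try to evaluate `'UNSATISFIABLE' & df['wall_time_sat']` first, and the expression would raise an error instead of filtering.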
    

`ParsingInfo` filters can be configured in several ways.
The following examples show some of the available options:

.. code:: python
    :number-lines:

    # Parse the string "Number of clauses: 34" and cast the result to int
    parsing_info.add_filter('num_clauses', r'Number of clauses:\s*(\d+)', cast_to=int)
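
A pattern can be tried on a sample line with Python's `re` module before it is registered as a filter. The sketch below is independent of OptiLog. The log lines are invented, and it only assumes, as the examples in this section suggest, that the first capturing group holds the value and that `cast_to` is applied to the captured text.

.. code:: python
    :number-lines:

    import re

    # Invented log lines, for illustration only.
    sample_log = "c Number of variables: 120\nc Number of clauses:   34\n"

    pattern = re.compile(r'Number of clauses:\s*(\d+)')
    match = pattern.search(sample_log)

    # group(1) is the text inside the parentheses; int() plays the role of cast_to.
    num_clauses = int(match.group(1)) if match else None
    print(num_clauses)  # 34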

.. code:: python
    :number-lines:

    # Parse the string "Suboptimum: 34.72" and cast the result to float.
    # The pattern accepts an optional fractional part so that decimals are captured whole.
    # Keeps a history of all the parsed values and stores them in a list.
    parsing_info.add_filter('subopt', r'Suboptimum:\s*(\d+(?:\.\d+)?)', cast_to=float, save_history=True)
    # If both timestamp and save_history are True, the history will be a list of tuples (timestamp, value)
    # If timestamp is both cpu and wall, the history will be a list of tuples ((cpu, wall), value)
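
The history shapes described in these comments can be consumed with ordinary tuple unpacking. The values below are invented, and the sketch only assumes the `((cpu, wall), value)` layout stated above; it does not call OptiLog.

.. code:: python
    :number-lines:

    # Invented history for one run, shaped as ((cpu_time, wall_time), value).
    history = [
        ((0.8, 1.0), 40.5),
        ((2.1, 2.6), 36.0),
        ((5.9, 7.2), 34.72),
    ]

    # Find the first moment at which the best (lowest) value was reported.
    best_value = min(value for _, value in history)
    (cpu, wall), _ = next(entry for entry in history if entry[1] == best_value)

    print(best_value, cpu, wall)  # 34.72 5.9 7.2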

.. code:: python
    :number-lines:

    # Parse a line such as "o 34" and cast the result to int.
    # Parsing creates two or three columns: a "cost" column with the last match,
    # and a "wall_time_cost" (and optionally "cpu_time_cost") column with the timestamp
    # in seconds when the line was printed.
    parsing_info.add_filter("cost", r"^o\s+(\d+)", timestamp=True, cast_to=int)

.. code:: python
    :number-lines:

    # This is the same as the previous example, but the timestamp is in milliseconds.
    parsing_info.add_filter("cost", r"^o\s+(\d+)", timestamp=True, cast_to=int, time_scale='ms')

The following example shows how to parse the number of conflicts reported
by the Glucose solver on a set of instances.
Note that this scenario was created with a single seed.

.. code:: python
    :number-lines:

    from optilog.running import ParsingInfo, parse_scenario
    
    parsing_info = ParsingInfo()
    parsing_info.add_filter('conflicts', r'Num conflicts: (\d+)')
    
    df = parse_scenario(
        './scenario',
        parsing_info=parsing_info
    )

We would get an output like this one:

.. code:: python
    :number-lines:

    >>> print(df)
                                                        glucose41
                                                        conflicts
    instance                                       seed          
    manthey/traffic/traffic_pcb_unknown.cnf.gz     1       739612
    manthey/traffic/traffic_b_unsat.cnf.gz         1       895011
    manthey/traffic/traffic_3b_unknown.cnf.gz      1       876072
    manthey/traffic/traffic_fb_unknown.cnf.gz      1       937239
    manthey/traffic/traffic_kkb_unknown.cnf.gz     1       950170
    (...)


Notice that both the rows and the columns use a MultiIndex. If we parsed with `simplify_index=True`, we would get a single index on the rows:


.. code:: python
    :number-lines:

    >>> print(df)
                                                glucose41          
                                                        seed conflicts
    manthey/traffic/traffic_pcb_unknown.cnf.gz             1    739612
    manthey/traffic/traffic_b_unsat.cnf.gz                 1    895011
    manthey/traffic/traffic_3b_unknown.cnf.gz              1    876072
    manthey/traffic/traffic_fb_unknown.cnf.gz              1    937239
    manthey/traffic/traffic_kkb_unknown.cnf.gz             1    950170

Now there are two columns, one for the seed and one for the number of conflicts.

A MultiIndex on the rows is useful when we run the experiment with multiple seeds.
A MultiIndex on the columns is useful when we run the experiment with multiple solvers.

Here is an example of an experiment with multiple seeds and multiple solvers:

.. code:: python
    :number-lines:

    >>> print(df)
                                                              glucose   cadical
                                                            conflicts conflicts
    instance                                           seed                    
    manthey/traffic/traffic_pcb_unknown.cnf.gz         2         9932     13823
                                                       1         4743      8738
    manthey/traffic/traffic_b_unsat.cnf.gz             2        21886     14970
                                                       1        10158     14999
    manthey/traffic/traffic_3b_unknown.cnf.gz          1        10778     21547
                                                       2        22622     20365
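
A frame with this layout can be queried with standard pandas MultiIndex operations. The sketch below builds a small frame by hand, with placeholder instance names and illustrative values, rather than calling `parse_scenario`. It therefore assumes nothing about OptiLog beyond the layout printed above.

.. code:: python
    :number-lines:

    import pandas as pd

    rows = pd.MultiIndex.from_tuples(
        [('a.cnf', 1), ('a.cnf', 2), ('b.cnf', 1), ('b.cnf', 2)],
        names=['instance', 'seed'],
    )
    cols = pd.MultiIndex.from_tuples(
        [('glucose', 'conflicts'), ('cadical', 'conflicts')]
    )
    df = pd.DataFrame(
        [[4743, 8738], [9932, 13823], [10158, 14999], [21886, 14970]],
        index=rows,
        columns=cols,
    )

    # One solver, one attribute: a Series indexed by (instance, seed).
    glucose_conflicts = df['glucose']['conflicts']
    print(glucose_conflicts[('a.cnf', 2)])  # 9932

    # Average over seeds for every instance and solver.
    mean_per_instance = df.groupby(level='instance').mean()
    print(mean_per_instance.loc['a.cnf', ('cadical', 'conflicts')])  # 11280.5

    # Only the runs executed with seed 1; the seed level is dropped.
    seed_1 = df.xs(1, level='seed')
    print(list(seed_1.index))  # ['a.cnf', 'b.cnf']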

If each solver needed different regular expressions for the same attributes (or different attributes altogether),
we could pass `parse_scenario` a dictionary that maps each solver name to its own `ParsingInfo` object:

.. code:: python
    :number-lines:

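    # parsing_info_cadical and parsing_info_glucose are ParsingInfo objects,
    # each built with add_filter as in the examples above
    df = parse_scenario(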
        './scenario',
        parsing_info={
            'cadical': parsing_info_cadical,
            'glucose': parsing_info_glucose,
        }
    )

..
    >>> 
    >>> # WARNING! The API of the following functions will
    >>> # probably change and it is best to not try them yet
    >>> # from optilog.running import virtual_best_solver, get_scores, get_scores_solvers, par_score