pm4py package#

Subpackages#

Submodules#

pm4py.conformance module#

The pm4py.conformance module contains the conformance checking algorithms implemented in pm4py

pm4py.conformance.conformance_diagnostics_token_based_replay(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False, opt_parameters: Dict[Any, Any] | None = None) List[Dict[str, Any]][source]#

Apply token-based replay for conformance checking analysis. The method returns the full token-based replay diagnostics.

Token-based replay matches a trace against a Petri net model, starting from the initial place, to discover which transitions are executed and in which places there are remaining or missing tokens for the given process instance. Token-based replay is useful for conformance checking: indeed, a trace is fitting according to the model if, during its execution, the transitions can be fired without the need to insert any missing token. If reaching the final marking is imposed, then a trace is fitting if it reaches the final marking without any missing or remaining tokens.

In PM4Py there is an implementation of a token replayer that is able to go across hidden transitions (calculating shortest paths between places) and can be used with any Petri net model with unique visible transitions and hidden transitions. When a visible transition needs to be fired and not all the places in its preset hold the correct number of tokens, it is checked whether, starting from the current marking, there is for some place a sequence of hidden transitions that can be fired to enable the visible transition. The hidden transitions are then fired, reaching a marking that enables the visible transition. The approach is described in: Berti, Alessandro, and Wil MP van der Aalst. “Reviving Token-based Replay: Increasing Speed While Improving Diagnostics.” ATAED@Petri Nets/ACSD. 2019.

The output of the token-based replay contains, for each trace of the log:

  • trace_is_fit: boolean value (True/False) that is true when the trace conforms to the model.

  • activated_transitions: list of transitions activated in the model by the token-based replay.

  • reached_marking: marking reached at the end of the replay.

  • missing_tokens: number of missing tokens.

  • consumed_tokens: number of consumed tokens.

  • remaining_tokens: number of remaining tokens.

  • produced_tokens: number of produced tokens.

Parameters:
  • log – event log

  • petri_net (PetriNet) – petri net

  • initial_marking (Marking) – initial marking

  • final_marking (Marking) – final marking

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • return_diagnostics_dataframe (bool) – if possible, returns a dataframe with the diagnostics (instead of the usual output)

  • opt_parameters – optional parameters of the token-based replay, including:
    • reach_mark_through_hidden: boolean value that decides if we shall try to reach the final marking through hidden transitions
    • stop_immediately_unfit: boolean value that decides if we shall stop immediately when a non-conformance is detected
    • walk_through_hidden_trans: boolean value that decides if we shall walk through hidden transitions in order to enable visible transitions
    • places_shortest_path_by_hidden: shortest paths between places by hidden transitions
    • is_reduction: expresses if the token-based replay is called in a reduction attempt
    • thread_maximum_ex_time: alignment threads maximum allowed execution time
    • cleaning_token_flood: decides if a cleaning of the token flood shall be operated
    • disable_variants: disable variants grouping
    • return_object_names: decides whether names instead of object pointers shall be returned

Return type:

List[Dict[str, Any]]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
tbr_diagnostics = pm4py.conformance_diagnostics_token_based_replay(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.conformance_diagnostics_alignments(log: EventLog | DataFrame, *args, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', variant_str: str | None = None, return_diagnostics_dataframe: bool = False, **kwargs) List[Dict[str, Any]][source]#

Apply the alignments algorithm between a log and a process model. The method returns the full alignment diagnostics.

Alignment-based replay aims to find one of the best alignments between the trace and the model. For each trace, the output of an alignment is a list of couples where the first element is an event (of the trace) or >> and the second element is a transition (of the model) or >>. For each couple, the following classification can be provided:

  • Sync move: the classification of the event corresponds to the transition label; in this case, both the trace and the model advance in the same way during the replay.

  • Move on log: for couples where the second element is >>, it corresponds to a replay move in the trace that is not mimicked in the model. This kind of move is unfit and signals a deviation between the trace and the model.

  • Move on model: for couples where the first element is >>, it corresponds to a replay move in the model that is not mimicked in the trace. For moves on model, we can have the following distinction:
    • Moves on model involving hidden transitions: in this case, even if it is not a sync move, the move is fit.

    • Moves on model not involving hidden transitions: in this case, the move is unfit and signals a deviation between the trace and the model.

Each trace is associated with a dictionary containing, among others, the following information:

  • alignment: contains the alignment (sync moves, moves on log, moves on model)

  • cost: contains the cost of the alignment according to the provided cost function

  • fitness: is equal to 1 if the trace is perfectly fitting

Parameters:
  • log – event log

  • args – specification of the process model

  • multi_processing (bool) – boolean value that enables the multiprocessing

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • variant_str – variant specification (for Petri net alignments)

  • return_diagnostics_dataframe (bool) – if possible, returns a dataframe with the diagnostics (instead of the usual output)

Return type:

List[Dict[str, Any]]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
alignments_diagnostics = pm4py.conformance_diagnostics_alignments(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.fitness_token_based_replay(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, float][source]#

Calculates the fitness using token-based replay. The fitness is calculated at the log level. The output dictionary contains the following keys:

  • perc_fit_traces: the percentage of fit traces (from 0.0 to 100.0)

  • average_trace_fitness: between 0.0 and 1.0; computed as the average of the trace fitness values

  • log_fitness: between 0.0 and 1.0

  • percentage_of_fitting_traces: the percentage of fit traces (from 0.0 to 100.0)

Token-based replay matches a trace against a Petri net model, starting from the initial place, to discover which transitions are executed and in which places there are remaining or missing tokens for the given process instance. Token-based replay is useful for conformance checking: indeed, a trace is fitting according to the model if, during its execution, the transitions can be fired without the need to insert any missing token. If reaching the final marking is imposed, then a trace is fitting if it reaches the final marking without any missing or remaining tokens.

In PM4Py there is an implementation of a token replayer that is able to go across hidden transitions (calculating shortest paths between places) and can be used with any Petri net model with unique visible transitions and hidden transitions. When a visible transition needs to be fired and not all the places in its preset hold the correct number of tokens, it is checked whether, starting from the current marking, there is for some place a sequence of hidden transitions that can be fired to enable the visible transition. The hidden transitions are then fired, reaching a marking that enables the visible transition. The approach is described in: Berti, Alessandro, and Wil MP van der Aalst. “Reviving Token-based Replay: Increasing Speed While Improving Diagnostics.” ATAED@Petri Nets/ACSD. 2019.

The calculation of replay fitness aims to quantify how much of the behavior in the log is admitted by the process model. Two methods to calculate replay fitness are offered, based on token-based replay and alignments respectively.

For token-based replay, the percentage of traces that are completely fit is returned, along with a fitness value that is calculated as indicated in the scientific contribution cited above.

Parameters:
  • log – event log

  • petri_net (PetriNet) – petri net

  • initial_marking (Marking) – initial marking

  • final_marking (Marking) – final marking

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Dict[str, float]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
fitness_tbr = pm4py.fitness_token_based_replay(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.fitness_alignments(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', variant_str: str | None = None) Dict[str, float][source]#

Calculates the fitness using alignments. The output dictionary contains the following keys:

  • average_trace_fitness: between 0.0 and 1.0; computed as the average of the trace fitness values

  • log_fitness: between 0.0 and 1.0

  • percentage_of_fitting_traces: the percentage of fit traces (from 0.0 to 100.0)

Alignment-based replay aims to find one of the best alignments between the trace and the model. For each trace, the output of an alignment is a list of couples where the first element is an event (of the trace) or >> and the second element is a transition (of the model) or >>. For each couple, the following classification can be provided:

  • Sync move: the classification of the event corresponds to the transition label; in this case, both the trace and the model advance in the same way during the replay.

  • Move on log: for couples where the second element is >>, it corresponds to a replay move in the trace that is not mimicked in the model. This kind of move is unfit and signals a deviation between the trace and the model.

  • Move on model: for couples where the first element is >>, it corresponds to a replay move in the model that is not mimicked in the trace. For moves on model, we can have the following distinction:
    • Moves on model involving hidden transitions: in this case, even if it is not a sync move, the move is fit.

    • Moves on model not involving hidden transitions: in this case, the move is unfit and signals a deviation between the trace and the model.

The calculation of replay fitness aims to quantify how much of the behavior in the log is admitted by the process model. Two methods to calculate replay fitness are offered, based on token-based replay and alignments respectively.

For alignments, the percentage of traces that are completely fit is returned, along with a fitness value that is calculated as the average of the fitness values of the single traces.

Parameters:
  • log – event log

  • petri_net (PetriNet) – petri net

  • initial_marking (Marking) – initial marking

  • final_marking (Marking) – final marking

  • multi_processing (bool) – boolean value that enables the multiprocessing

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • variant_str – variant specification

Return type:

Dict[str, float]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
fitness_alignments = pm4py.fitness_alignments(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.precision_token_based_replay(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') float[source]#

Calculates the precision using token-based replay

Token-based replay matches a trace against a Petri net model, starting from the initial place, to discover which transitions are executed and in which places there are remaining or missing tokens for the given process instance. Token-based replay is useful for conformance checking: indeed, a trace is fitting according to the model if, during its execution, the transitions can be fired without the need to insert any missing token. If reaching the final marking is imposed, then a trace is fitting if it reaches the final marking without any missing or remaining tokens.

In PM4Py there is an implementation of a token replayer that is able to go across hidden transitions (calculating shortest paths between places) and can be used with any Petri net model with unique visible transitions and hidden transitions. When a visible transition needs to be fired and not all the places in its preset hold the correct number of tokens, it is checked whether, starting from the current marking, there is for some place a sequence of hidden transitions that can be fired to enable the visible transition. The hidden transitions are then fired, reaching a marking that enables the visible transition. The approach is described in: Berti, Alessandro, and Wil MP van der Aalst. “Reviving Token-based Replay: Increasing Speed While Improving Diagnostics.” ATAED@Petri Nets/ACSD. 2019.

The reference paper for the TBR-based precision (ETConformance) is: Muñoz-Gama, Jorge, and Josep Carmona. “A fresh look at precision in process conformance.” International Conference on Business Process Management. Springer, Berlin, Heidelberg, 2010.

In this approach, the different prefixes of the log are replayed (where possible) on the model. At the reached marking, the set of transitions that are enabled in the process model is compared with the set of activities that follow the prefix. The more the sets differ, the lower the precision value; the more the sets are similar, the higher the precision value.

Parameters:
  • log – event log

  • petri_net (PetriNet) – petri net

  • initial_marking (Marking) – initial marking

  • final_marking (Marking) – final marking

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

float

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
precision_tbr = pm4py.precision_token_based_replay(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.precision_alignments(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') float[source]#

Calculates the precision of the model w.r.t. the event log using alignments

Alignment-based replay aims to find one of the best alignments between the trace and the model. For each trace, the output of an alignment is a list of couples where the first element is an event (of the trace) or >> and the second element is a transition (of the model) or >>. For each couple, the following classification can be provided:

  • Sync move: the classification of the event corresponds to the transition label; in this case, both the trace and the model advance in the same way during the replay.

  • Move on log: for couples where the second element is >>, it corresponds to a replay move in the trace that is not mimicked in the model. This kind of move is unfit and signals a deviation between the trace and the model.

  • Move on model: for couples where the first element is >>, it corresponds to a replay move in the model that is not mimicked in the trace. For moves on model, we can have the following distinction:
    • Moves on model involving hidden transitions: in this case, even if it is not a sync move, the move is fit.

    • Moves on model not involving hidden transitions: in this case, the move is unfit and signals a deviation between the trace and the model.

The reference paper for the alignments-based precision (Align-ETConformance) is: Adriansyah, Arya, et al. “Measuring precision of modeled behavior.” Information systems and e-Business Management 13.1 (2015): 37-67

In this approach, the different prefixes of the log are replayed (where possible) on the model. At the reached marking, the set of transitions that are enabled in the process model is compared with the set of activities that follow the prefix. The more the sets differ, the lower the precision value; the more the sets are similar, the higher the precision value.

Parameters:
  • log – event log

  • petri_net (PetriNet) – petri net

  • initial_marking (Marking) – initial marking

  • final_marking (Marking) – final marking

  • multi_processing (bool) – boolean value that enables the multiprocessing

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

float

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
precision_alignments = pm4py.precision_alignments(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.generalization_tbr(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') float[source]#

Computes the generalization of the model (against the event log). The approach is described in the paper:

Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst. “Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity.” International Journal of Cooperative Information Systems 23.01 (2014): 1440001.

Parameters:
  • log – event log

  • petri_net (PetriNet) – petri net

  • initial_marking (Marking) – initial marking

  • final_marking (Marking) – final marking

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

float

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
generalization_tbr = pm4py.generalization_tbr(dataframe, net, im, fm)
pm4py.conformance.replay_prefix_tbr(prefix: List[str], net: PetriNet, im: Marking, fm: Marking, activity_key: str = 'concept:name') Marking[source]#

Replays a prefix (list of activities) on a given accepting Petri net, using Token-Based Replay.

Parameters:
  • prefix – list of activities

  • net (PetriNet) – Petri net

  • im (Marking) – initial marking

  • fm (Marking) – final marking

  • activity_key (str) – attribute to be used as activity

Return type:

Marking

import pm4py

net, im, fm = pm4py.read_pnml('tests/input_data/running-example.pnml')
marking = pm4py.replay_prefix_tbr(['register request', 'check ticket'], net, im, fm)
pm4py.conformance.conformance_diagnostics_footprints(*args) List[Dict[str, Any]] | Dict[str, Any][source]#

Provides conformance checking diagnostics using footprints

Parameters:

args – provided arguments (the first argument is expected to be an event log, or the footprints discovered from the event log; the other arguments are expected to be the process model, or the footprints discovered from the process model)

Return type:

Union[List[Dict[str, Any]], Dict[str, Any]]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
footprints_diagnostics = pm4py.conformance_diagnostics_footprints(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')

Deprecated since version 2.3.0: This will be removed in 3.0.0. Conformance checking using footprints will not be exposed in a future release.

pm4py.conformance.fitness_footprints(*args) Dict[str, float][source]#

Calculates fitness using footprints. The output is a dictionary containing two keys:

  • perc_fit_traces: percentage of fit traces (over the log)

  • log_fitness: the fitness value over the log

Parameters:

args – provided arguments (the first argument is expected to be an event log, or the footprints discovered from the event log; the other arguments are expected to be the process model, or the footprints discovered from the process model)

Return type:

Dict[str, float]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
fitness_fp = pm4py.fitness_footprints(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')

Deprecated since version 2.3.0: This will be removed in 3.0.0. Conformance checking using footprints will not be exposed in a future release.

pm4py.conformance.precision_footprints(*args) float[source]#

Calculates precision using footprints

Parameters:

args – provided arguments (the first argument is expected to be an event log, or the footprints discovered from the event log; the other arguments are expected to be the process model, or the footprints discovered from the process model)

Return type:

float

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
precision_fp = pm4py.precision_footprints(dataframe, net, im, fm, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')

Deprecated since version 2.3.0: This will be removed in 3.0.0. Conformance checking using footprints will not be exposed in a future release.

pm4py.conformance.check_is_fitting(*args, activity_key='concept:name') bool[source]#

Checks whether a trace object fits a process model

Parameters:

args – arguments (trace object; process model (process tree, petri net, BPMN))

Return type:

bool
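
A minimal sketch, assuming the trace object can be given as a list of activity labels (the accepted trace representations may vary, and the method is deprecated):

import pm4py

net, im, fm = pm4py.read_pnml("tests/input_data/running-example.pnml")
# hypothetical trace, expressed as a list of activity labels
is_fit = pm4py.check_is_fitting(['register request', 'examine casually', 'check ticket', 'decide', 'reject request'], net, im, fm)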

Deprecated since version 2.3.0: This will be removed in 3.0.0. This method will be removed in a future release.

pm4py.conformance.conformance_temporal_profile(log: EventLog | DataFrame, temporal_profile: Dict[Tuple[str, str], Tuple[float, float]], zeta: float = 1.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False) List[List[Tuple[float, float, float, float]]][source]#

Performs conformance checking on the provided log with the provided temporal profile. The result is a list of time-based deviations for every case. For example, suppose the log on which the conformance is applied contains a single case: A (timestamp: 2000-01) B (timestamp: 2002-01). The difference between the timestamps of A and B is two years. If the temporal profile {(‘A’, ‘B’): (1.5 months, 0.5 months), (‘A’, ‘C’): (5 months, 0), (‘A’, ‘D’): (2 months, 0)} is specified and zeta is set to 1, then the aforementioned case would be deviating (considering the couple of activities (‘A’, ‘B’)), because 2 years > 1.5 months + 0.5 months.

Parameters:
  • log – log object

  • temporal_profile – temporal profile. E.g., if the log has two cases: A (timestamp: 1980-01) B (timestamp: 1980-03) C (timestamp: 1980-06); A (timestamp: 1990-01) B (timestamp: 1990-02) D (timestamp: 1990-03), the temporal profile will contain: {(‘A’, ‘B’): (1.5 months, 0.5 months), (‘A’, ‘C’): (5 months, 0), (‘A’, ‘D’): (2 months, 0)}

  • zeta (float) – number of standard deviations allowed from the average. E.g. zeta=1 allows every timestamp between AVERAGE-STDEV and AVERAGE+STDEV.

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • return_diagnostics_dataframe (bool) – if possible, returns a dataframe with the diagnostics (instead of the usual output)

Return type:

List[List[Tuple[float, float, float, float]]]

import pm4py

temporal_profile = pm4py.discover_temporal_profile(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
conformance_temporal_profile = pm4py.conformance_temporal_profile(dataframe, temporal_profile, zeta=1, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.conformance_declare(log: EventLog | DataFrame, declare_model: Dict[str, Dict[Any, Dict[str, int]]], activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False) List[Dict[str, Any]][source]#

Applies conformance checking against a DECLARE model.

Reference paper: F. M. Maggi, A. J. Mooij and W. M. P. van der Aalst, “User-guided discovery of declarative process models,” 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 2011, pp. 192-199, doi: 10.1109/CIDM.2011.5949297.

Parameters:
  • log – event log

  • declare_model – DECLARE model

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • return_diagnostics_dataframe (bool) – if possible, returns a dataframe with the diagnostics (instead of the usual output)

Return type:

List[Dict[str, Any]]

import pm4py

log = pm4py.read_xes("C:/receipt.xes")
declare_model = pm4py.discover_declare(log)
conf_result = pm4py.conformance_declare(log, declare_model)
pm4py.conformance.conformance_log_skeleton(log: EventLog | DataFrame, log_skeleton: Dict[str, Any], activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False) List[Set[Any]][source]#

Performs conformance checking using the log skeleton

Reference paper: Verbeek, H. M. W., and R. Medeiros de Carvalho. “Log skeletons: A classification approach to process discovery.” arXiv preprint arXiv:1806.08247 (2018).

A log skeleton is a declarative model which consists of six different constraints:

  • “directly_follows”: specifies, for some activities, strict bounds on the directly-following activities. For example, ‘A should be directly followed by B’ and ‘B should be directly followed by C’.

  • “always_before”: specifies that some activities may be executed only if some other activities have been executed earlier in the history of the case. For example, ‘C should always be preceded by A’.

  • “always_after”: specifies that some activities should always trigger the execution of some other activities in the future history of the case. For example, ‘A should always be followed by C’.

  • “equivalence”: specifies that a given couple of activities should happen with the same number of occurrences inside a case. For example, ‘B and C should always happen the same number of times’.

  • “never_together”: specifies that a given couple of activities should never happen together in the history of the case. For example, ‘there should be no case containing both C and D’.

  • “activ_occurrences”: specifies the allowed number of occurrences per activity. E.g., A is allowed to be executed 1 or 2 times; B is allowed to be executed 1, 2, 3 or 4 times.

Parameters:
  • log – log object

  • log_skeleton – log skeleton object, expressed as dictionaries of the six constraints (never_together, always_before …) along with the discovered rules.

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • return_diagnostics_dataframe (bool) – if possible, returns a dataframe with the diagnostics (instead of the usual output)

Return type:

List[Set[Any]]

import pm4py

log_skeleton = pm4py.discover_log_skeleton(dataframe, noise_threshold=0.1, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
conformance_lsk = pm4py.conformance_log_skeleton(dataframe, log_skeleton, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.conformance.conformance_dcr(log: EventLog | DataFrame, dcr_graph: DcrGraph, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', group_key: str = 'org:group', resource_key: str = 'org:resource', return_diagnostics_dataframe: bool = False) DataFrame | List[Tuple[str, Dict[str, Any]]][source]#

Applies rule-based conformance checking against a DCR model. Reference: Josep Carmona et al., “Conformance Checking Software”, Springer International Publishing, 65-74, 2018, https://doi.org/10.1007/978-3-319-99414-7.

Parameters:
  • log – event log

  • dcr_graph (DcrGraph) – DCR graph

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • group_key (str) – attribute to be used as role identifier

  • resource_key (str) – attribute to be used as resource identifier

  • return_diagnostics_dataframe (bool) – if possible, returns a dataframe with the diagnostics (instead of the usual output)

Return type:

DataFrame | List[Tuple[str, Dict[str, Any]]]
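
A minimal usage sketch; the discovery entrypoint pm4py.discover_dcr is an assumption (the DCR discovery naming may differ across pm4py versions), while conformance_dcr follows the signature above:

import pm4py

log = pm4py.read_xes("tests/input_data/receipt.xes")
# assumed DCR discovery entrypoint; some versions may return a (graph, abstraction) tuple
dcr_graph = pm4py.discover_dcr(log)
conf_res = pm4py.conformance_dcr(log, dcr_graph, return_diagnostics_dataframe=True)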

pm4py.conformance.optimal_alignment_dcr(log: EventLog | DataFrame | Trace, dcr_graph: DcrGraph, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False) DataFrame | Any[source]#

Applies optimal alignments against a DCR model.

Reference paper: Axel Kjeld Fjelrad Christfort and Tijs Slaats. “Efficient Optimal Alignment Between Dynamic Condition Response Graphs and Traces.” https://doi.org/10.1007/978-3-031-41620-0_1

Parameters:
  • log – event log to be used for alignment (also supports a single Trace)

  • dcr_graph (DcrGraph) – the DCR graph against which the log is aligned

  • activity_key (str) – the key to identify activity names in the log

  • timestamp_key (str) – the key to identify timestamps in the log

  • case_id_key (str) – the key to identify case identifiers in the log

  • return_diagnostics_dataframe (bool) – if True, returns a diagnostics dataframe instead of the usual list output (default: False)

Return type:

Union[pd.DataFrame, List[Tuple[str, Dict[str, Any]]]]

Raises an Exception if the provided log is not an instance of EventLog or pandas DataFrame.

Examples#
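
A minimal sketch, assuming the same pm4py.discover_dcr entrypoint as above:

import pm4py

log = pm4py.read_xes("tests/input_data/receipt.xes")
dcr_graph = pm4py.discover_dcr(log)  # assumed discovery entrypoint
aligned_traces = pm4py.optimal_alignment_dcr(log, dcr_graph, return_diagnostics_dataframe=True)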

pm4py.convert module#

The pm4py.convert module contains the cross-conversions implemented in pm4py

pm4py.convert.convert_to_event_log(obj: DataFrame | EventStream, case_id_key: str = 'case:concept:name', **kwargs) EventLog[source]#

Converts a DataFrame/EventStream object to an event log object

Parameters:
  • obj – DataFrame or EventStream object

  • case_id_key (str) – attribute to be used as case identifier

Return type:

EventLog

import pandas as pd
import pm4py

dataframe = pd.read_csv("tests/input_data/running-example.csv")
dataframe = pm4py.format_dataframe(dataframe, case_id_column='case:concept:name', activity_column='concept:name', timestamp_column='time:timestamp')
log = pm4py.convert_to_event_log(dataframe)
pm4py.convert.convert_to_event_stream(obj: EventLog | DataFrame, case_id_key: str = 'case:concept:name', **kwargs) EventStream[source]#

Converts a log object to an event stream

Parameters:
  • obj – log object

  • case_id_key (str) – attribute to be used as case identifier

Return type:

EventStream

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")
event_stream = pm4py.convert_to_event_stream(log)
pm4py.convert.convert_to_dataframe(obj: EventStream | EventLog, **kwargs) DataFrame[source]#

Converts a log object to a dataframe

Parameters:

obj – log object

Return type:

pd.DataFrame

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")
dataframe = pm4py.convert_to_dataframe(log)
pm4py.convert.convert_to_bpmn(*args: Tuple[PetriNet, Marking, Marking] | ProcessTree) BPMN[source]#

Converts an object to a BPMN diagram. As an input, either a Petri net (with corresponding initial and final marking) or a process tree can be provided. A process tree can always be converted into a BPMN model, thus the quality of the resulting object is guaranteed. For Petri nets, the quality of the conversion largely depends on the net provided (e.g., sound WF-nets are likely to produce reasonable BPMN models).

Parameters:

args – petri net (with initial and final marking) or process tree

Return type:

BPMN

import pm4py

# import a Petri net from a file
net, im, fm = pm4py.read_pnml("tests/input_data/running-example.pnml")
bpmn_graph = pm4py.convert_to_bpmn(net, im, fm)
pm4py.convert.convert_to_petri_net(obj: BPMN | ProcessTree | HeuristicsNet | DcrGraph | POWL | dict, *args, **kwargs) Tuple[PetriNet, Marking, Marking][source]#

Converts an input model to an (accepting) Petri net. The input object can be a process tree, a BPMN model, a Heuristics net, a POWL model or a DCR graph. The output is a triple containing the Petri net and the initial and final markings. The markings are only returned if they can be reasonably derived from the input model.

Parameters:

args – process tree, Heuristics net, BPMN, POWL model or Dcr Graph

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

# imports a process tree from a PTML file
process_tree = pm4py.read_ptml("tests/input_data/running-example.ptml")
net, im, fm = pm4py.convert_to_petri_net(process_tree)
pm4py.convert.convert_to_process_tree(*args: Tuple[PetriNet, Marking, Marking] | BPMN) ProcessTree[source]#

Converts an input model to a process tree. The input models can either be Petri nets (marked) or BPMN models. For both input types, the conversion is not guaranteed to work; hence, invoking the method may raise an Exception.

Parameters:

args – petri net (along with initial and final marking) or BPMN

Return type:

ProcessTree

import pm4py

# imports a BPMN file
bpmn_graph = pm4py.read_bpmn("tests/input_data/running-example.bpmn")
# converts the BPMN to a process tree (through intermediate conversion to a Petri net)
process_tree = pm4py.convert_to_process_tree(bpmn_graph)
pm4py.convert.convert_to_reachability_graph(*args: Tuple[PetriNet, Marking, Marking] | BPMN | ProcessTree) TransitionSystem[source]#

Converts an input model to a reachability graph (transition system). The input models can either be Petri nets (with markings), BPMN models or process trees. The output is the state space of the model (i.e., the reachability graph), encoded as a TransitionSystem object.

Parameters:

args – petri net (along with initial and final marking), process tree or BPMN

Return type:

TransitionSystem

import pm4py

# reads a Petri net from a file
net, im, fm = pm4py.read_pnml("tests/input_data/running-example.pnml")
# converts it to reachability graph
reach_graph = pm4py.convert_to_reachability_graph(net, im, fm)
pm4py.convert.convert_log_to_ocel(log: EventLog | EventStream | DataFrame, activity_column: str = 'concept:name', timestamp_column: str = 'time:timestamp', object_types: Collection[str] | None = None, obj_separator: str = ' AND ', additional_event_attributes: Collection[str] | None = None, additional_object_attributes: Dict[str, Collection[str]] | None = None) OCEL[source]#

Converts an event log to an object-centric event log with one or more object types.

Parameters:
  • log – log object

  • activity_column (str) – activity column

  • timestamp_column (str) – timestamp column

  • object_types – list of columns to consider as object types

  • obj_separator (str) – separator between different objects in the same column

  • additional_event_attributes – additional attributes to be considered as event attributes in the OCEL

  • additional_object_attributes – additional attributes per object type to be considered as object attributes in the OCEL (dictionary in which object types are associated to their attributes, i.e., {“order”: [“quantity”, “cost”], “invoice”: [“date”, “due date”]})

Return type:

OCEL
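
A minimal sketch; reusing the case identifier column as the single object type is an illustrative choice, not a requirement:

import pm4py

dataframe = pm4py.read_xes("tests/input_data/receipt.xes")
# treat the case identifier column as the only object type (illustrative)
ocel = pm4py.convert_log_to_ocel(dataframe, activity_column='concept:name', timestamp_column='time:timestamp', object_types=['case:concept:name'])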

pm4py.convert.convert_ocel_to_networkx(ocel: OCEL, variant: str = 'ocel_to_nx') DiGraph[source]#

Converts an OCEL to a NetworkX DiGraph object.

Parameters:
  • ocel (OCEL) – object-centric event log

  • variant (str) – variant of the conversion to use: “ocel_to_nx” -> graph containing event and object IDs and two types of relations (REL=related objects, DF=directly-follows); “ocel_features_to_nx” -> graph containing different types of interconnection at the object level

Return type:

nx.DiGraph
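
A minimal sketch (the OCEL file path is illustrative):

import pm4py

ocel = pm4py.read_ocel("tests/input_data/ocel/example_log.jsonocel")  # illustrative path
nx_digraph = pm4py.convert_ocel_to_networkx(ocel, variant='ocel_to_nx')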

pm4py.convert.convert_log_to_networkx(log: EventLog | EventStream | DataFrame, include_df: bool = True, case_id_key: str = 'concept:name', other_case_attributes_as_nodes: Collection[str] | None = None, event_attributes_as_nodes: Collection[str] | None = None) DiGraph[source]#

Converts an event log object to a NetworkX DiGraph object. The nodes of the graph are the events, the cases (and possibly the attributes of the log). The edges are:

  • connecting each event to the corresponding case (BELONGS_TO type)

  • connecting every event to the directly-following one (DF type, if enabled)

  • connecting every case/event to the given attribute values (ATTRIBUTE_EDGE type)

Parameters:
  • log – log object (EventLog, EventStream, Pandas dataframe)

  • include_df (bool) – include the directly-follows graph relation in the graph (bool)

  • case_id_key (str) – specify which attribute at the case level should be considered the case ID

  • other_case_attributes_as_nodes – specify which attributes at the case level should be inserted in the graph as nodes (other than the caseID) (list, default empty)

  • event_attributes_as_nodes – specify which attributes at the event level should be inserted in the graph as nodes (list, default empty)

Return type:

nx.DiGraph
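
A minimal sketch, assuming ‘org:resource’ is present as an event attribute in the log:

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")
# include the directly-follows relation and add resource values as nodes (attribute assumed present)
nx_digraph = pm4py.convert_log_to_networkx(log, include_df=True, event_attributes_as_nodes=['org:resource'])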

pm4py.convert.convert_log_to_time_intervals(log: EventLog | DataFrame, filter_activity_couple: Tuple[str, str] | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', start_timestamp_key: str = 'time:timestamp') List[List[Any]][source]#

Gets a list of intervals from an event log. Each interval contains two temporally consecutive events and measures the time between the two events (complete timestamp of the first against start timestamp of the second).

Parameters:
  • log – log object

  • filter_activity_couple – (optional) filters the intervals to only consider a given couple of activities of the log

  • activity_key (str) – the attribute to be used as activity

  • timestamp_key (str) – the attribute to be used as timestamp

  • case_id_key (str) – the attribute to be used as case identifier

  • start_timestamp_key (str) – the attribute to be used as start timestamp

Return type:

List[List[Any]]

import pm4py

log = pm4py.read_xes('tests/input_data/receipt.xes')
time_intervals = pm4py.convert_log_to_time_intervals(log)
print(len(time_intervals))
time_intervals = pm4py.convert_log_to_time_intervals(log, ('Confirmation of receipt', 'T02 Check confirmation of receipt'))
print(len(time_intervals))
pm4py.convert.convert_petri_net_to_networkx(net: PetriNet, im: Marking, fm: Marking) DiGraph[source]#

Converts a Petri net to a NetworkX DiGraph. Each place and transition corresponds to a node in the graph.

Parameters:
  • net (PetriNet) – Petri net

  • im (Marking) – initial marking

  • fm (Marking) – final marking

Return type:

nx.DiGraph
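
A minimal sketch, following the reading pattern used elsewhere on this page:

import pm4py

net, im, fm = pm4py.read_pnml("tests/input_data/running-example.pnml")
nx_digraph = pm4py.convert_petri_net_to_networkx(net, im, fm)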

pm4py.convert.convert_petri_net_type(net: PetriNet, im: Marking, fm: Marking, type: str = 'classic') Tuple[PetriNet, Marking, Marking][source]#

Changes the Petri net (internal) type

Parameters:
  • net (PetriNet) – petri net

  • im (Marking) – initial marking

  • fm (Marking) – final marking

  • type (str) – internal type (classic, reset, inhibitor, reset_inhibitor)

Return type:

Tuple[PetriNet, Marking, Marking]
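
A minimal sketch that switches a classic net to the reset_inhibitor internal type:

import pm4py

net, im, fm = pm4py.read_pnml("tests/input_data/running-example.pnml")
net, im, fm = pm4py.convert_petri_net_type(net, im, fm, type='reset_inhibitor')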

pm4py.discovery module#

The pm4py.discovery module contains the process discovery algorithms implemented in pm4py

pm4py.discovery.discover_dfg(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[dict, dict, dict][source]#

Discovers a Directly-Follows Graph (DFG) from a log.

This method returns a dictionary with the couples of directly-following activities (in the log) as keys and the frequency of the relation as value.

Parameters:
  • log – event log / Pandas dataframe

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Tuple[dict, dict, dict]

import pm4py

dfg, start_activities, end_activities = pm4py.discover_dfg(dataframe, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_directly_follows_graph(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[dict, dict, dict][source]#
pm4py.discovery.discover_dfg_typed(log: DataFrame, case_id_key: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp') DirectlyFollowsGraph[source]#

Discovers a Directly-Follows Graph (DFG) from a log.

This method returns a typed DFG object, i.e., as specified in pm4py.objects.dfg.obj.py (DirectlyFollowsGraph class). The DFG object describes a graph, start activities and end activities. The graph is a collection of triples of the form (a, b, f) representing an arc a->b with frequency f. The start activities are a collection of tuples of the form (a, f) representing that activity a starts f cases. The end activities are a collection of tuples of the form (a, f) representing that activity a ends f cases.

This method replaces pm4py.discover_dfg and pm4py.discover_directly_follows_graph. In a future release, these functions will adopt the same behavior as this function.

Parameters:
  • log (DataFrame) – pandas.DataFrame

  • case_id_key (str) – attribute to be used as case identifier

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

Return type:

DFG

import pm4py

dfg = pm4py.discover_dfg_typed(log, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_performance_dfg(log: EventLog | DataFrame, business_hours: bool = False, business_hour_slots=[(25200, 61200), (111600, 147600), (198000, 234000), (284400, 320400), (370800, 406800)], workcalendar=None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[dict, dict, dict][source]#

Discovers a performance directly-follows graph from an event log.

This method returns a dictionary with the couples of directly-following activities (in the log) as keys and the performance of the relation as value.

Parameters:
  • log – event log / Pandas dataframe

  • business_hours (bool) – enables/disables the computation based on the business hours (default: False)

  • business_hour_slots – work schedule of the company, provided as a list of tuples where each tuple represents one time slot of business hours. One slot i.e. one tuple consists of one start and one end time given in seconds since week start, e.g. [(7 * 60 * 60, 17 * 60 * 60), ((24 + 7) * 60 * 60, (24 + 12) * 60 * 60), ((24 + 13) * 60 * 60, (24 + 17) * 60 * 60),] meaning that business hours are Mondays 07:00 - 17:00 and Tuesdays 07:00 - 12:00 and 13:00 - 17:00

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Tuple[dict, dict, dict]

import pm4py

performance_dfg, start_activities, end_activities = pm4py.discover_performance_dfg(dataframe, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_petri_net_alpha(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the Alpha Miner.

Parameters:
  • log – event log / Pandas dataframe

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_alpha(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_petri_net_ilp(log: EventLog | DataFrame, alpha: float = 1.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the ILP Miner.

Parameters:
  • log – event log / Pandas dataframe

  • alpha (float) – noise threshold for the sequence encoding graph (1.0=no filtering, 0.0=greatest filtering)

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_ilp(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_petri_net_alpha_plus(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the Alpha+ algorithm

Parameters:
  • log – event log / Pandas dataframe

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_alpha_plus(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')

Deprecated since version 2.3.0: This will be removed in 3.0.0. This method will be removed in a future release.

pm4py.discovery.discover_petri_net_inductive(log: EventLog | DataFrame | DirectlyFollowsGraph, multi_processing: bool = False, noise_threshold: float = 0.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', disable_fallthroughs: bool = False) Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the inductive miner algorithm.

The basic idea of Inductive Miner is to detect a ‘cut’ in the log (e.g., sequential cut, parallel cut, concurrent cut, loop cut) and then recur on the sublogs obtained by applying the cut, until a base case is found. The Directly-Follows variant avoids the recursion on the sublogs and uses the Directly-Follows graph instead.

Inductive Miner models usually make extensive use of hidden transitions, especially for skipping/looping over a portion of the model. Furthermore, each visible transition has a unique label (there are no transitions in the model that share the same label).

Parameters:
  • log – event log / Pandas dataframe / typed DFG

  • noise_threshold (float) – noise threshold (default: 0.0)

  • multi_processing (bool) – boolean that enables/disables multiprocessing in inductive miner

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • disable_fallthroughs (bool) – disable the Inductive Miner fall-throughs

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_petri_net_heuristics(log: EventLog | DataFrame, dependency_threshold: float = 0.5, and_threshold: float = 0.65, loop_two_threshold: float = 0.5, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the Heuristics Miner

Heuristics Miner is an algorithm that acts on the Directly-Follows Graph, providing a way to handle noise and to find common constructs (dependency between two activities, AND). The output of the Heuristics Miner is a Heuristics Net, an object that contains the activities and the relationships between them. The Heuristics Net can then be converted into a Petri net.

Parameters:
  • log – event log / Pandas dataframe

  • dependency_threshold (float) – dependency threshold (default: 0.5)

  • and_threshold (float) – AND threshold (default: 0.65)

  • loop_two_threshold (float) – loop two threshold (default: 0.5)

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_heuristics(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_process_tree_inductive(log: EventLog | DataFrame | DirectlyFollowsGraph, noise_threshold: float = 0.0, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', disable_fallthroughs: bool = False) ProcessTree[source]#

Discovers a process tree using the inductive miner algorithm

The basic idea of Inductive Miner is to detect a ‘cut’ in the log (e.g., sequential cut, parallel cut, concurrent cut, loop cut) and then recur on the sublogs obtained by applying the cut, until a base case is found. The Directly-Follows variant avoids the recursion on the sublogs and uses the Directly-Follows graph instead.

Inductive Miner models usually make extensive use of hidden transitions, especially for skipping/looping over a portion of the model. Furthermore, each visible transition has a unique label (there are no transitions in the model that share the same label).

Parameters:
  • log – event log / Pandas dataframe / typed DFG

  • noise_threshold (float) – noise threshold (default: 0.0)

  • activity_key (str) – attribute to be used for the activity

  • multi_processing (bool) – boolean that enables/disables multiprocessing in inductive miner

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • disable_fallthroughs (bool) – disable the Inductive Miner fall-throughs

Return type:

ProcessTree

import pm4py

process_tree = pm4py.discover_process_tree_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_heuristics_net(log: EventLog | DataFrame, dependency_threshold: float = 0.5, and_threshold: float = 0.65, loop_two_threshold: float = 0.5, min_act_count: int = 1, min_dfg_occurrences: int = 1, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', decoration: str = 'frequency') HeuristicsNet[source]#

Discovers a heuristics net

Heuristics Miner is an algorithm that acts on the Directly-Follows Graph, providing a way to handle noise and to find common constructs (dependency between two activities, AND). The output of the Heuristics Miner is a Heuristics Net, an object that contains the activities and the relationships between them. The Heuristics Net can then be converted into a Petri net.

Parameters:
  • log – event log / Pandas dataframe

  • dependency_threshold (float) – dependency threshold (default: 0.5)

  • and_threshold (float) – AND threshold (default: 0.65)

  • loop_two_threshold (float) – loop two threshold (default: 0.5)

  • min_act_count (int) – minimum number of occurrences per activity in order to be included in the discovery

  • min_dfg_occurrences (int) – minimum number of occurrences per arc in the DFG in order to be included in the discovery

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • decoration (str) – the decoration that should be used (frequency, performance)

Return type:

HeuristicsNet

import pm4py

heu_net = pm4py.discover_heuristics_net(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.derive_minimum_self_distance(log: DataFrame | EventLog | EventStream, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, int][source]#

This algorithm computes the minimum self-distance for each activity observed in an event log. The self-distance of a in <a> is infinity, of a in <a,a> is 0, in <a,b,a> is 1, etc. By default, the activity key ‘concept:name’ is used.

Parameters:
  • log – event log / Pandas dataframe

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Dict[str, int]

import pm4py

msd = pm4py.derive_minimum_self_distance(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_footprints(*args: EventLog | Tuple[PetriNet, Marking, Marking] | ProcessTree) List[Dict[str, Any]] | Dict[str, Any][source]#

Discovers the footprints out of the provided event log / process model

Parameters:

args – event log / process model

Return type:

Union[List[Dict[str, Any]], Dict[str, Any]]

import pm4py

footprints = pm4py.discover_footprints(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
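Footprints can also be computed from a process model, e.g., a Petri net with its markings passed as positional arguments (a sketch, reusing a net discovered with the inductive miner):

import pm4py

# sketch: compute the footprints of a discovered model instead of the log
net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
footprints_model = pm4py.discover_footprints(net, im, fm)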
pm4py.discovery.discover_eventually_follows_graph(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[Tuple[str, str], int][source]#

Gets the eventually follows graph from a log object.

The eventually-follows graph is a dictionary that associates, to every pair of activities where the first eventually precedes the second in the same case, the number of occurrences of this relation.

Parameters:
  • log – event log / Pandas dataframe

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Dict[Tuple[str, str], int]

import pm4py

efg = pm4py.discover_eventually_follows_graph(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
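Since the result is a plain dictionary, it can be inspected directly (a minimal sketch continuing the example above):

# each key is a pair of activities; each value is the number of occurrences of the relation
for (act_a, act_b), count in efg.items():
    print(act_a, '->', act_b, count)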
pm4py.discovery.discover_bpmn_inductive(log: EventLog | DataFrame | DirectlyFollowsGraph, noise_threshold: float = 0.0, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', disable_fallthroughs: bool = False) BPMN[source]#

Discovers a BPMN using the Inductive Miner algorithm

The basic idea of the Inductive Miner is to detect a ‘cut’ in the log (e.g., a sequence, exclusive-choice, parallel, or loop cut) and then recur on the sublogs obtained by applying the cut, until a base case is reached. The Directly-Follows variant avoids the recursion on the sublogs and works on the Directly-Follows graph instead.

Inductive Miner models usually make extensive use of hidden transitions, especially for skipping/looping over a portion of the model. Furthermore, each visible transition has a unique label (no two transitions in the model share the same label).

Parameters:
  • log – event log / Pandas dataframe / typed DFG

  • noise_threshold (float) – noise threshold (default: 0.0)

  • multi_processing (bool) – boolean that enables/disables multiprocessing in inductive miner

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • disable_fallthroughs (bool) – disable the Inductive Miner fall-throughs

Return type:

BPMN

import pm4py

bpmn_graph = pm4py.discover_bpmn_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_transition_system(log: EventLog | DataFrame, direction: str = 'forward', window: int = 2, view: str = 'sequence', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') TransitionSystem[source]#

Discovers a transition system as described in the process mining book “Process Mining: Data Science in Action”

Parameters:
  • log – event log / Pandas dataframe

  • direction (str) – direction in which the transition system is built (forward, backward)

  • window (int) – window (2, 3, …)

  • view (str) – view to use in the construction of the states (sequence, set, multiset)

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

TransitionSystem

import pm4py

transition_system = pm4py.discover_transition_system(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
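The direction, window size, and state view can be customized; a sketch, where the values are illustrative only:

import pm4py

# sketch: parameter values are illustrative, not recommendations
transition_system = pm4py.discover_transition_system(dataframe, direction='backward', window=3, view='set', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')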
pm4py.discovery.discover_prefix_tree(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Trie[source]#

Discovers a prefix tree from the provided log object.

Parameters:
  • log – event log / Pandas dataframe

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Trie

import pm4py

prefix_tree = pm4py.discover_prefix_tree(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.discovery.discover_temporal_profile(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[Tuple[str, str], Tuple[float, float]][source]#

Discovers a temporal profile from a log object.

Implements the approach described in: Stertz, Florian, Jürgen Mangler, and Stefanie Rinderle-Ma. “Temporal Conformance Checking at Runtime based on Time-infused Process Models.” arXiv preprint arXiv:2008.07262 (2020).

The output is a dictionary containing, for every pair of activities that eventually follow each other in at least one case of the log, the average and the standard deviation of the difference between their timestamps.

E.g., if the log has two cases:

Case 1: A (timestamp: 1980-01), B (timestamp: 1980-03), C (timestamp: 1980-06)
Case 2: A (timestamp: 1990-01), B (timestamp: 1990-02), D (timestamp: 1990-03)

The returned dictionary will contain: {('A', 'B'): (1.5 months, 0.5 months), ('A', 'C'): (5 months, 0), ('A', 'D'): (2 months, 0)}

Parameters:
  • log – event log / Pandas dataframe

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Dict[Tuple[str, str], Tuple[float, float]]

import pm4py

temporal_profile = pm4py.discover_temporal_profile(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
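Each entry maps a pair of activities to the mean and standard deviation of their timestamp differences, so the profile can be inspected directly (a minimal sketch continuing the example above):

# each value is a (mean, standard deviation) pair of timestamp differences
for (act_a, act_b), (mean_diff, std_diff) in temporal_profile.items():
    print(act_a, '->', act_b, mean_diff, std_diff)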
pm4py.discovery.discover_log_skeleton(log: EventLog | DataFrame, noise_threshold: float = 0.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, Any][source]#

Discovers a log skeleton from an event log.

A log skeleton is a declarative model which consists of six different constraints:

  • “directly_follows”: specifies, for some activities, strict bounds on the activities directly following them. For example, ‘A should be directly followed by B’ and ‘B should be directly followed by C’.

  • “always_before”: specifies that some activities may be executed only if some other activities are executed at some point earlier in the history of the case. For example, ‘C should always be preceded by A’.

  • “always_after”: specifies that some activities should always trigger the execution of some other activities in the future history of the case. For example, ‘A should always be followed by C’.

  • “equivalence”: specifies that a given pair of activities should happen with the same number of occurrences inside a case. For example, ‘B and C should always happen the same number of times’.

  • “never_together”: specifies that a given pair of activities should never happen together in the history of the case. For example, ‘there should be no case containing both C and D’.

  • “activ_occurrences”: specifies the allowed number of occurrences per activity. E.g., A is allowed to be executed 1 or 2 times; B is allowed to be executed 1, 2, 3 or 4 times.

Reference paper: Verbeek, H. M. W., and R. Medeiros de Carvalho. “Log skeletons: A classification approach to process discovery.” arXiv preprint arXiv:1806.08247 (2018).

Parameters:
  • log – event log / Pandas dataframe

  • noise_threshold (float) – noise threshold, acting as described in the paper.

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Dict[str, Any]

import pm4py

log_skeleton = pm4py.discover_log_skeleton(dataframe, noise_threshold=0.1, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
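The result is a dictionary keyed by the six constraint names listed above; a minimal sketch of inspecting it, continuing the example:

# the six constraints described above are available under their respective keys
for constraint in ['directly_follows', 'always_before', 'always_after', 'equivalence', 'never_together', 'activ_occurrences']:
    print(constraint, log_skeleton[constraint])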
pm4py.discovery.discover_declare(log: EventLog | DataFrame, allowed_templates: Set[str] | None = None, considered_activities: Set[str] | None = None, min_support_ratio: float | None = None, min_confidence_ratio: float | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, Dict[Any, Dict[str, int]]][source]#

Discovers a DECLARE model from an event log.

Reference paper: F. M. Maggi, A. J. Mooij and W. M. P. van der Aalst, “User-guided discovery of declarative process models,” 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 2011, pp. 192-199, doi: 10.1109/CIDM.2011.5949297.

Parameters:
  • log – event log / Pandas dataframe

  • allowed_templates – (optional) collection of templates to consider for the discovery

  • considered_activities – (optional) collection of activities to consider for the discovery

  • min_support_ratio – (optional, decided automatically otherwise) minimum percentage of cases (over the entire set of cases of the log) for which the discovered rules apply

  • min_confidence_ratio – (optional, decided automatically otherwise) minimum percentage of cases (over the rule’s support) for which the discovered rules are valid

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

Dict[str, Dict[Any, Dict[str, int]]]

import pm4py

declare_model = pm4py.discover_declare(log)
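The discovery can be restricted via the optional parameters; a sketch, where the template names 'response' and 'precedence' and the ratio values are illustrative assumptions rather than a confirmed list of supported identifiers:

import pm4py

# sketch: template names and ratio values are illustrative assumptions
declare_model = pm4py.discover_declare(log, allowed_templates={'response', 'precedence'}, min_support_ratio=0.3, min_confidence_ratio=0.75)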
pm4py.discovery.discover_powl(log: EventLog | DataFrame, variant=POWLDiscoveryVariant.MAXIMAL, filtering_weight_factor: float = 0.0, order_graph_filtering_threshold: float = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') POWL[source]#

Discovers a POWL model from an event log.

Reference paper: Kourani, Humam, and Sebastiaan J. van Zelst. “POWL: partially ordered workflow language.” International Conference on Business Process Management. Cham: Springer Nature Switzerland, 2023.

Parameters:
  • log – event log / Pandas dataframe

  • variant – variant of the algorithm

  • filtering_weight_factor (float) – accepts values 0 <= x < 1

  • order_graph_filtering_threshold (float) – accepts values 0.5 < x <= 1

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

Return type:

POWL

import pm4py

log = pm4py.read_xes('tests/input_data/receipt.xes')
powl_model = pm4py.discover_powl(log, activity_key='concept:name')
print(powl_model)
pm4py.discovery.discover_batches(log: EventLog | DataFrame, merge_distance: int = 900, min_batch_size: int = 2, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource') List[Tuple[Tuple[str, str], int, Dict[str, Any]]][source]#

Discover batches from the provided log object

We say that an activity is executed in batches by a given resource when the resource executes the same activity several times in a short period of time.

Identifying such activities may reveal points of the process that can be automated, since the activity of the person may be repetitive.

The following categories of batches are detected:

- Simultaneous (all the events in the batch have identical start and end timestamps)
- Batching at start (all the events in the batch have identical start timestamps)
- Batching at end (all the events in the batch have identical end timestamps)
- Sequential batching (for all the consecutive events, the end of the first is equal to the start of the second)
- Concurrent batching (for all the consecutive events that are not sequentially matched)

The approach has been described in the following paper: Martin, N., Swennen, M., Depaire, B., Jans, M., Caris, A., & Vanhoof, K. (2015, December). Batch Processing: Definition and Event Log Identification. In SIMPDA (pp. 137-140).

The output is a (sorted) list containing tuples. Each tuple contains:
  • Index 0: the activity-resource combination for which at least one batch has been detected

  • Index 1: the number of batches for the given activity-resource combination

  • Index 2: a list containing all the batches. Each batch is described by:

    1. the start timestamp of the batch
    2. the complete timestamp of the batch
    3. the list of events that are executed in the batch

Parameters:
  • log – event log / Pandas dataframe

  • merge_distance (int) – the maximum time distance between non-overlapping intervals in order for them to be considered as belonging to the same batch (default: 900 seconds, i.e., 15 minutes)

  • min_batch_size (int) – the minimum number of events for a batch to be considered (default: 2)

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • resource_key (str) – attribute to be used as resource

Return type:

List[Tuple[Tuple[str, str], int, Dict[str, Any]]]

import pm4py

batches = pm4py.discover_batches(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp', resource_key='org:resource')
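Given the tuple structure described above, the result can be unpacked as follows (a minimal sketch continuing the example):

# each entry: the (activity, resource) pair, the number of batches, and the batch descriptions
for (activity, resource), num_batches, batch_details in batches:
    print(activity, resource, num_batches)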
pm4py.discovery.discover_dcr(log: EventLog | DataFrame, post_process: Set[str] = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource', group_key: str = 'org:group', finaAdditionalConditions: bool = True, **kwargs) Tuple[Any, Dict[str, Any]][source]#

Discovers a DCR graph from an event log, based on the DisCoveR algorithm. This method implements the DCR discovery algorithm described in: C. O. Back, T. Slaats, T. T. Hildebrandt, M. Marquard, “DisCoveR: accurate and efficient discovery of declarative process models”.

Parameters:
  • log – event log / Pandas dataframe

  • post_process – (optional) the type of post-processing for the event log; currently supports ROLES, PENDING, TIMED and NESTINGS

  • activity_key (str) – attribute to be used for the activity (defaults to “concept:name”)

  • timestamp_key (str) – attribute to be used for the timestamp (defaults to “time:timestamp”)

  • case_id_key (str) – attribute to be used as case identifier (defaults to “case:concept:name”)

  • resource_key (str) – attribute to be used as resource identifier (defaults to “org:resource”)

  • group_key (str) – attribute to be used as role identifier (defaults to “org:group”)

  • finaAdditionalConditions (bool) – whether additional conditions should be found (defaults to True)

Return type:

Tuple[Any, Dict[str, Any]] (the discovered DCR graph and a dictionary with additional information)
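A minimal sketch, assuming a dataframe with the standard XES column names:

import pm4py

# sketch: unpack the discovered DCR graph and the additional-information dictionary
dcr_graph, additional_info = pm4py.discover_dcr(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')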

pm4py.read module#

The pm4py.read module contains all functionality related to reading files/objects from disk.

pm4py.read.read_xes(file_path: str, variant: str | None = None, return_legacy_log_object: bool = False, encoding: str = 'utf-8', **kwargs) DataFrame | EventLog[source]#

Reads an event log stored in XES format (see xes-standard). Returns a table (pandas.DataFrame) view of the event log.

Parameters:
  • file_path (str) – file path of the event log (.xes file) on disk

  • variant – the variant of the importer to use. “iterparse” => traditional XML parser; “line_by_line” => text-based line-by-line importer ; “chunk_regex” => chunk-of-bytes importer (default); “iterparse20” => XES 2.0 importer

  • return_legacy_log_object (bool) – boolean value enabling returning a log object (default: False)

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

DataFrame

import pm4py

log = pm4py.read_xes("<path_to_xes_file>")
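The importer variant and the legacy log object can be selected explicitly; a sketch using the options documented above:

import pm4py

# sketch: use the traditional XML parser and return a legacy EventLog object
log = pm4py.read_xes("<path_to_xes_file>", variant="iterparse", return_legacy_log_object=True)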
pm4py.read.read_pnml(file_path: str, auto_guess_final_marking: bool = False, encoding: str = 'utf-8') Tuple[PetriNet, Marking, Marking][source]#

Reads a Petri net object from a .pnml file. The returned object is a triple containing the following elements:

  1. Petri net object, encoded as a PetriNet class

  2. Initial Marking

  3. Final Marking

Return type:

Tuple[PetriNet, Marking, Marking]

Parameters:
  • file_path (str) – file path of the Petri net model (.pnml file) on disk

  • auto_guess_final_marking (bool) – whether to automatically guess the final marking when it is not explicitly provided in the file (default: False)

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pn = pm4py.read_pnml("<path_to_pnml_file>")
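Since a triple is returned, unpacking it directly is the common pattern; a minimal sketch:

import pm4py

# sketch: unpack the Petri net together with its initial and final markings
net, initial_marking, final_marking = pm4py.read_pnml("<path_to_pnml_file>")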
pm4py.read.read_ptml(file_path: str, encoding: str = 'utf-8') ProcessTree[source]#

Reads a process tree object from a .ptml file

Parameters:
  • file_path (str) – file path of the process tree object on disk

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

ProcessTree

import pm4py

process_tree = pm4py.read_ptml("<path_to_ptml_file>")
pm4py.read.read_dfg(file_path: str, encoding: str = 'utf-8') Tuple[Dict[Tuple[str, str], int], Dict[str, int], Dict[str, int]][source]#

Reads a DFG object from a .dfg file. The returned object is a triple containing the following elements:

  1. DFG Object, encoded as a Dict[Tuple[str,str],int], s.t. DFG[('a','b')]=k implies that activity 'a' is directly followed by activity 'b' a total of k times in the log

  2. Start activity dictionary, encoded as a Dict[str,int], s.t., S['a']=k implies that activity 'a' is starting k traces in the event log

  3. End activity dictionary, encoded as a Dict[str,int], s.t., E['z']=k implies that activity 'z' is ending k traces in the event log.

Return type:

Tuple[Dict[Tuple[str,str],int], Dict[str,int], Dict[str,int]]

Parameters:
  • file_path (str) – file path of the dfg model on disk

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

dfg = pm4py.read_dfg("<path_to_dfg_file>")
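Since a triple is returned, unpacking it directly is the common pattern; a minimal sketch:

import pm4py

# sketch: unpack the DFG together with the start- and end-activity dictionaries
dfg, start_activities, end_activities = pm4py.read_dfg("<path_to_dfg_file>")
# e.g., dfg[('a', 'b')] is the number of times 'a' is directly followed by 'b'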
pm4py.read.read_bpmn(file_path: str, encoding: str = 'utf-8') BPMN[source]#

Reads a BPMN model from a .bpmn file

Parameters:
  • file_path (str) – file path of the bpmn model

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

BPMN

import pm4py

bpmn = pm4py.read_bpmn('<path_to_bpmn_file>')
pm4py.read.read_ocel(file_path: str, objects_path: str | None = None, encoding: str = 'utf-8') OCEL[source]#

Reads an object-centric event log from a file (see: http://www.ocel-standard.org/). The OCEL object is returned by this method

Parameters:
  • file_path (str) – file path of the object-centric event log

  • objects_path – [Optional] file path from which the objects dataframe should be read

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel("<path_to_ocel_file>")
pm4py.read.read_ocel_csv(file_path: str, objects_path: str | None = None, encoding: str = 'utf-8') OCEL[source]#

Reads an object-centric event log from a CSV file (see: http://www.ocel-standard.org/). The OCEL object is returned by this method

Parameters:
  • file_path (str) – file path of the object-centric event log (.csv)

  • objects_path – [Optional] file path from which the objects dataframe should be read

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel_csv("<path_to_ocel_file.csv>")
pm4py.read.read_ocel_json(file_path: str, encoding: str = 'utf-8') OCEL[source]#

Reads an object-centric event log from a JSON-OCEL file (see: http://www.ocel-standard.org/). The OCEL object is returned by this method

Parameters:
  • file_path (str) – file path of the object-centric event log (.jsonocel)

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel_json("<path_to_ocel_file.jsonocel>")
pm4py.read.read_ocel_xml(file_path: str, encoding: str = 'utf-8') OCEL[source]#

Reads an object-centric event log from a XML-OCEL file (see: http://www.ocel-standard.org/). The OCEL object is returned by this method

Parameters:
  • file_path (str) – file path of the object-centric event log (.xmlocel)

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel_xml("<path_to_ocel_file.xmlocel>")
pm4py.read.read_ocel_sqlite(file_path: str, encoding: str = 'utf-8') OCEL[source]#

Reads an object-centric event log from a SQLite database (see: http://www.ocel-standard.org/). The OCEL object is returned by this method

Parameters:
  • file_path (str) – file path of the SQLite database (.sqlite)

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel_sqlite("<path_to_ocel_file.sqlite>")
pm4py.read.read_ocel2(file_path: str, variant_str: str | None = None, encoding: str = 'utf-8') OCEL[source]#

Reads an OCEL2.0 event log

Parameters:
  • file_path (str) – path to the OCEL2.0 event log

  • variant_str – (optional) specification of the importer variant to be used

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel2("<path_to_ocel_file>")
pm4py.read.read_ocel2_json(file_path: str, variant_str: str | None = None, encoding: str = 'utf-8') OCEL[source]#

Reads an OCEL2.0 event log from a JSON-OCEL(2) file

Parameters:
  • file_path (str) – path to the JSON file

  • variant_str – (optional) specification of the importer variant to be used

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel2_json("<path_to_ocel_file.jsonocel>")
pm4py.read.read_ocel2_sqlite(file_path: str, variant_str: str | None = None, encoding: str = 'utf-8') OCEL[source]#

Reads an OCEL2.0 event log from a SQLite database

Parameters:
  • file_path (str) – path to the OCEL2.0 database

  • variant_str – (optional) specification of the importer variant to be used

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel2_sqlite("<path_to_ocel_file.sqlite>")
pm4py.read.read_ocel2_xml(file_path: str, variant_str: str | None = None, encoding: str = 'utf-8') OCEL[source]#

Reads an OCEL2.0 event log from an XML file

Parameters:
  • file_path (str) – path to the OCEL2.0 event log

  • variant_str – (optional) specification of the importer variant to be used

  • encoding (str) – the encoding to be used (default: utf-8)

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel2_xml("<path_to_ocel_file.xmlocel>")
pm4py.read.read_dcr_xml(file_path, **parameters)[source]#

Reads a DCR graph from an XML file

Parameters:
  • file_path – path to the DCR graph

  • parameters – parameters of the importer

Return type:

DCR

import pm4py

dcr = pm4py.read_dcr_xml("<path_to_dcr_file>")

pm4py.vis module#

The pm4py.vis module contains the visualizations offered in pm4py

pm4py.vis.view_petri_net(petri_net: PetriNet, initial_marking: Marking | None = None, final_marking: Marking | None = None, format: str = 'png', bgcolor: str = 'white', decorations: Dict[Any, Any] = None, debug: bool = False, rankdir: str = 'LR')[source]#

Views a (composite) Petri net

Parameters:
  • petri_net (PetriNet) – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

  • format (str) – Format of the output picture (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

  • decorations – Decorations (color, label) associated to the elements of the Petri net

  • debug (bool) – Boolean enabling/disabling the debug mode (show place and transition’s names)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.view_petri_net(net, im, fm, format='svg')
pm4py.vis.save_vis_petri_net(petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, file_path: str, bgcolor: str = 'white', decorations: Dict[Any, Any] = None, debug: bool = False, rankdir: str = 'LR', **kwargs)[source]#

Saves a Petri net visualization to a file

Parameters:
  • petri_net (PetriNet) – Petri net

  • initial_marking (Marking) – Initial marking

  • final_marking (Marking) – Final marking

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

  • decorations – Decorations (color, label) associated to the elements of the Petri net

  • debug (bool) – Boolean enabling/disabling the debug mode (show place and transition’s names)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_petri_net(net, im, fm, 'petri_net.png')
pm4py.vis.view_performance_dfg(dfg: dict, start_activities: dict, end_activities: dict, format: str = 'png', aggregation_measure='mean', bgcolor: str = 'white', rankdir: str = 'LR', serv_time: Dict[str, float] | None = None)[source]#

Views a performance DFG

Parameters:
  • dfg (dict) – DFG object

  • start_activities (dict) – Start activities

  • end_activities (dict) – End activities

  • format (str) – Format of the output picture (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • aggregation_measure (str) – Aggregation measure (default: mean): mean, median, min, max, sum, stdev

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

  • serv_time – (optional) provides the activities’ service times, used to decorate the graph

import pm4py

performance_dfg, start_activities, end_activities = pm4py.discover_performance_dfg(dataframe, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
pm4py.view_performance_dfg(performance_dfg, start_activities, end_activities, format='svg')
pm4py.vis.save_vis_performance_dfg(dfg: dict, start_activities: dict, end_activities: dict, file_path: str, aggregation_measure='mean', bgcolor: str = 'white', rankdir: str = 'LR', serv_time: Dict[str, float] | None = None, **kwargs)[source]#

Saves the visualization of a performance DFG

Parameters:
  • dfg (dict) – DFG object

  • start_activities (dict) – Start activities

  • end_activities (dict) – End activities

  • file_path (str) – Destination path

  • aggregation_measure (str) – Aggregation measure (default: mean): mean, median, min, max, sum, stdev

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

  • serv_time – (optional) provides the activities’ service times, used to decorate the graph

import pm4py

performance_dfg, start_activities, end_activities = pm4py.discover_performance_dfg(dataframe, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_performance_dfg(performance_dfg, start_activities, end_activities, 'perf_dfg.png')
pm4py.vis.view_dfg(dfg: dict, start_activities: dict, end_activities: dict, format: str = 'png', bgcolor: str = 'white', max_num_edges: int = 9223372036854775807, rankdir: str = 'LR')[source]#

Views a (composite) DFG

Parameters:
  • dfg (dict) – DFG object

  • start_activities (dict) – Start activities

  • end_activities (dict) – End activities

  • format (str) – Format of the output picture (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

  • max_num_edges (int) – maximum number of edges to represent in the graph

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

dfg, start_activities, end_activities = pm4py.discover_dfg(dataframe, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
pm4py.view_dfg(dfg, start_activities, end_activities, format='svg')
pm4py.vis.save_vis_dfg(dfg: dict, start_activities: dict, end_activities: dict, file_path: str, bgcolor: str = 'white', max_num_edges: int = 9223372036854775807, rankdir: str = 'LR', **kwargs)[source]#

Saves a DFG visualization to a file

Parameters:
  • dfg (dict) – DFG object

  • start_activities (dict) – Start activities

  • end_activities (dict) – End activities

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

  • max_num_edges (int) – maximum number of edges to represent in the graph

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

dfg, start_activities, end_activities = pm4py.discover_dfg(dataframe, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_dfg(dfg, start_activities, end_activities, 'dfg.png')
pm4py.vis.view_process_tree(tree: ProcessTree, format: str = 'png', bgcolor: str = 'white', rankdir: str = 'LR')[source]#

Views a process tree

Parameters:
  • tree (ProcessTree) – Process tree

  • format (str) – Format of the visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

process_tree = pm4py.discover_process_tree_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.view_process_tree(process_tree, format='svg')
pm4py.vis.save_vis_process_tree(tree: ProcessTree, file_path: str, bgcolor: str = 'white', rankdir: str = 'LR', **kwargs)[source]#

Saves the visualization of a process tree

Parameters:
  • tree (ProcessTree) – Process tree

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

process_tree = pm4py.discover_process_tree_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_process_tree(process_tree, 'process_tree.png')
pm4py.vis.save_vis_bpmn(bpmn_graph: BPMN, file_path: str, bgcolor: str = 'white', rankdir: str = 'LR', variant_str: str = 'classic', **kwargs)[source]#

Saves the visualization of a BPMN graph

Parameters:
  • bpmn_graph (BPMN) – BPMN graph

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

  • variant_str (str) – variant of the visualization to be used (“classic” or “dagrejs”)

import pm4py

bpmn_graph = pm4py.discover_bpmn_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_bpmn(bpmn_graph, 'trial.bpmn')
pm4py.vis.view_bpmn(bpmn_graph: BPMN, format: str = 'png', bgcolor: str = 'white', rankdir: str = 'LR', variant_str: str = 'classic')[source]#

Views a BPMN graph

Parameters:
  • bpmn_graph (BPMN) – BPMN graph

  • format (str) – Format of the visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

  • variant_str (str) – variant of the visualization to be used (“classic” or “dagrejs”)

import pm4py

bpmn_graph = pm4py.discover_bpmn_inductive(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.view_bpmn(bpmn_graph)
pm4py.vis.view_heuristics_net(heu_net: HeuristicsNet, format: str = 'png', bgcolor: str = 'white')[source]#

Views a heuristics net

Parameters:
  • heu_net (HeuristicsNet) – Heuristics net

  • format (str) – Format of the visualization

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

heu_net = pm4py.discover_heuristics_net(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.view_heuristics_net(heu_net, format='svg')
pm4py.vis.save_vis_heuristics_net(heu_net: HeuristicsNet, file_path: str, bgcolor: str = 'white', **kwargs)[source]#

Saves the visualization of a heuristics net

Parameters:
  • heu_net (HeuristicsNet) – Heuristics net

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

heu_net = pm4py.discover_heuristics_net(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_heuristics_net(heu_net, 'heu.png')
pm4py.vis.view_dotted_chart(log: EventLog | DataFrame, format: str = 'png', attributes=None, bgcolor: str = 'white', show_legend: bool = True)[source]#

Displays the dotted chart

The dotted chart is a classic visualization of the events inside an event log across different dimensions. Each event of the event log corresponds to a point. The dimensions are projected on a graph having:

- X-axis: the values of the first dimension are represented there.
- Y-axis: the values of the second dimension are represented there.
- Color: the values of the third dimension are represented as different colors for the points of the dotted chart.

The values can be either string, numeric or date values, and are managed accordingly by the dotted chart. The dotted chart can be built on different attributes. A convenient choice is to visualize the distribution of cases and events over time, with the following choices:

- X-axis: the timestamp of the event.
- Y-axis: the index of the case inside the event log.
- Color: the activity of the event.

This choice makes it possible to visually identify patterns such as:

- Batches.
- Variations in the case arrival rate.
- Variations in the case finishing rate.

Parameters:
  • log – Event log

  • format (str) – Image format

  • attributes – Attributes that should be used to construct the dotted chart. If None, the default dotted chart will be shown: x-axis: time y-axis: cases (in order of occurrence in the event log) color: activity. For custom attributes, use a list of attributes of the form [x-axis attribute, y-axis attribute, color attribute], e.g., [“concept:name”, “org:resource”, “concept:name”])

  • bgcolor (str) – background color to be used in the dotted chart

  • show_legend (bool) – boolean (enables/disables showing the legend)

import pm4py

pm4py.view_dotted_chart(dataframe, format='svg')
pm4py.view_dotted_chart(dataframe, attributes=['time:timestamp', 'concept:name', 'org:resource'])
pm4py.vis.save_vis_dotted_chart(log: EventLog | DataFrame, file_path: str, attributes=None, bgcolor: str = 'white', show_legend: bool = True, **kwargs)[source]#

Saves the visualization of the dotted chart

The dotted chart is a classic visualization of the events inside an event log across different dimensions. Each event of the event log corresponds to a point. The dimensions are projected on a graph having:

- X-axis: the values of the first dimension are represented there.
- Y-axis: the values of the second dimension are represented there.
- Color: the values of the third dimension are represented as different colors for the points of the dotted chart.

The values can be either string, numeric or date values, and are managed accordingly by the dotted chart. The dotted chart can be built on different attributes. A convenient choice is to visualize the distribution of cases and events over time, with the following choices:

- X-axis: the timestamp of the event.
- Y-axis: the index of the case inside the event log.
- Color: the activity of the event.

This choice makes it possible to visually identify patterns such as:

- Batches.
- Variations in the case arrival rate.
- Variations in the case finishing rate.

Parameters:
  • log – Event log

  • file_path (str) – Destination path

  • attributes – Attributes that should be used to construct the dotted chart (for example, [“concept:name”, “org:resource”])

  • bgcolor (str) – background color to be used in the dotted chart

  • show_legend (bool) – boolean (enables/disables showing the legend)

import pm4py

pm4py.save_vis_dotted_chart(dataframe, 'dotted.png', attributes=['time:timestamp', 'concept:name', 'org:resource'])
pm4py.vis.view_sna(sna_metric: SNA, variant_str: str | None = None)[source]#

Represents an SNA metric (.html)

Parameters:
  • sna_metric (SNA) – Values of the metric

  • variant_str – variant to be used (default: pyvis)

import pm4py

metric = pm4py.discover_subcontracting_network(dataframe, resource_key='org:resource', timestamp_key='time:timestamp', case_id_key='case:concept:name')
pm4py.view_sna(metric)
pm4py.vis.save_vis_sna(sna_metric: SNA, file_path: str, variant_str: str | None = None, **kwargs)[source]#

Saves the visualization of an SNA metric in a .html file

Parameters:
  • sna_metric (SNA) – Values of the metric

  • file_path (str) – Destination path

  • variant_str – variant to be used (default: pyvis)

import pm4py

metric = pm4py.discover_subcontracting_network(dataframe, resource_key='org:resource', timestamp_key='time:timestamp', case_id_key='case:concept:name')
pm4py.save_vis_sna(metric, 'sna.html')
pm4py.vis.view_case_duration_graph(log: EventLog | DataFrame, format: str = 'png', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name')[source]#

Visualizes the case duration graph

Parameters:
  • log – Log object

  • format (str) – Format of the visualization (png, svg, …)

  • activity_key (str) – attribute to be used as activity

  • case_id_key (str) – attribute to be used as case identifier

  • timestamp_key (str) – attribute to be used as timestamp

import pm4py

pm4py.view_case_duration_graph(dataframe, format='svg', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.save_vis_case_duration_graph(log: EventLog | DataFrame, file_path: str, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', **kwargs)[source]#

Saves the case duration graph in the specified path

Parameters:
  • log – Log object

  • file_path (str) – Destination path

  • activity_key (str) – attribute to be used as activity

  • case_id_key (str) – attribute to be used as case identifier

  • timestamp_key (str) – attribute to be used as timestamp

import pm4py

pm4py.save_vis_case_duration_graph(dataframe, 'duration.png', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.view_events_per_time_graph(log: EventLog | DataFrame, format: str = 'png', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name')[source]#

Visualizes the events per time graph

Parameters:
  • log – Log object

  • format (str) – Format of the visualization (png, svg, …)

  • activity_key (str) – attribute to be used as activity

  • case_id_key (str) – attribute to be used as case identifier

  • timestamp_key (str) – attribute to be used as timestamp

import pm4py

pm4py.view_events_per_time_graph(dataframe, format='svg', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.save_vis_events_per_time_graph(log: EventLog | DataFrame, file_path: str, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', **kwargs)[source]#

Saves the events per time graph in the specified path

Parameters:
  • log – Log object

  • file_path (str) – Destination path

  • activity_key (str) – attribute to be used as activity

  • case_id_key (str) – attribute to be used as case identifier

  • timestamp_key (str) – attribute to be used as timestamp

import pm4py

pm4py.save_vis_events_per_time_graph(dataframe, 'ev_time.png', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.view_performance_spectrum(log: EventLog | DataFrame, activities: List[str], format: str = 'png', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', bgcolor: str = 'white')[source]#

Displays the performance spectrum

The performance spectrum is a novel visualization of the performance of the process, based on the time elapsed between different activities in the process executions. The performance spectrum was initially described in:

Denisov, Vadim, et al. “The Performance Spectrum Miner: Visual Analytics for Fine-Grained Performance Analysis of Processes.” BPM (Dissertation/Demos/Industry). 2018.

Parameters:
  • log – event log

  • activities – list of activities (in order) that is used to build the performance spectrum

  • format (str) – format of the visualization (png, svg, …)

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

pm4py.view_performance_spectrum(dataframe, ['Act. A', 'Act. C', 'Act. D'], format='svg', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.save_vis_performance_spectrum(log: EventLog | DataFrame, activities: List[str], file_path: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', bgcolor: str = 'white', **kwargs)[source]#

Saves the visualization of the performance spectrum to a file

The performance spectrum is a novel visualization of the performance of the process, based on the time elapsed between different activities in the process executions. The performance spectrum was initially described in:

Denisov, Vadim, et al. “The Performance Spectrum Miner: Visual Analytics for Fine-Grained Performance Analysis of Processes.” BPM (Dissertation/Demos/Industry). 2018.

Parameters:
  • log – Event log

  • activities – List of activities (in order) that is used to build the performance spectrum

  • file_path (str) – Destination path (including the extension)

  • activity_key (str) – attribute to be used for the activity

  • timestamp_key (str) – attribute to be used for the timestamp

  • case_id_key (str) – attribute to be used as case identifier

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

pm4py.save_vis_performance_spectrum(dataframe, ['Act. A', 'Act. C', 'Act. D'], 'perf_spec.png', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.view_events_distribution_graph(log: EventLog | DataFrame, distr_type: str = 'days_week', format='png', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name')[source]#

Shows the distribution of the events in the specified dimension

Observing the distribution of events over time makes it possible to infer useful information about the work shifts, the working days, and the periods of the year that are more or less busy.

Parameters:
  • log – Event log

  • distr_type (str) – Type of distribution (default: days_week):

    - days_month => Gets the distribution of the events among the days of a month (from 1 to 31)
    - months => Gets the distribution of the events among the months (from 1 to 12)
    - years => Gets the distribution of the events among the years of the event log
    - hours => Gets the distribution of the events among the hours of a day (from 0 to 23)
    - days_week => Gets the distribution of the events among the days of a week (from Monday to Sunday)
    - weeks => Gets the distribution of the events among the weeks of a year (from 0 to 52)

  • format (str) – Format of the visualization (default: png)

  • activity_key (str) – attribute to be used as activity

  • case_id_key (str) – attribute to be used as case identifier

  • timestamp_key (str) – attribute to be used as timestamp

import pm4py

pm4py.view_events_distribution_graph(dataframe, format='svg', distr_type='days_week', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.save_vis_events_distribution_graph(log: EventLog | DataFrame, file_path: str, distr_type: str = 'days_week', activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name', **kwargs)[source]#

Saves the distribution of the events in a picture file

Observing the distribution of events over time makes it possible to infer useful information about the work shifts, the working days, and the periods of the year that are more or less busy.

Parameters:
  • log – Event log

  • file_path (str) – Destination path (including the extension)

  • distr_type (str) – Type of distribution (default: days_week):

    - days_month => Gets the distribution of the events among the days of a month (from 1 to 31)
    - months => Gets the distribution of the events among the months (from 1 to 12)
    - years => Gets the distribution of the events among the years of the event log
    - hours => Gets the distribution of the events among the hours of a day (from 0 to 23)
    - days_week => Gets the distribution of the events among the days of a week (from Monday to Sunday)

  • activity_key (str) – attribute to be used as activity

  • case_id_key (str) – attribute to be used as case identifier

  • timestamp_key (str) – attribute to be used as timestamp

import pm4py

pm4py.save_vis_events_distribution_graph(dataframe, 'ev_distr_graph.png', distr_type='days_week', activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.vis.view_ocdfg(ocdfg: Dict[str, Any], annotation: str = 'frequency', act_metric: str = 'events', edge_metric='event_couples', act_threshold: int = 0, edge_threshold: int = 0, performance_aggregation: str = 'mean', format: str = 'png', bgcolor: str = 'white', rankdir: str = 'LR')[source]#

Views an OC-DFG (object-centric directly-follows graph) with the provided configuration.

Object-centric directly-follows multigraphs are a composition of the directly-follows graphs of the single object types, which can be annotated with different metrics considering the entities of an object-centric event log (i.e., events, unique objects, total objects).

Parameters:
  • ocdfg – Object-centric directly-follows graph

  • annotation (str) – The annotation to use for the visualization. Values: - “frequency”: frequency annotation - “performance”: performance annotation

  • act_metric (str) – The metric to use for the activities. Available values: - “events” => number of events (default) - “unique_objects” => number of unique objects - “total_objects” => number of total objects

  • edge_metric (str) – The metric to use for the edges. Available values: - “event_couples” => number of event couples (default) - “unique_objects” => number of unique objects - “total_objects” => number of total objects

  • act_threshold (int) – The threshold to apply on the activities frequency (default: 0). Only activities having a frequency >= this threshold are kept in the graph.

  • edge_threshold (int) – The threshold to apply on the edges frequency (default: 0). Only edges having a frequency >= this threshold are kept in the graph.

  • performance_aggregation (str) – The aggregation measure to use for the performance: mean, median, min, max, sum

  • format (str) – The format of the output visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

ocdfg = pm4py.discover_ocdfg(ocel)
pm4py.view_ocdfg(ocdfg, annotation='frequency', format='svg')
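Annotation, metrics and thresholds can be combined; a sketch, where the values are illustrative only:

import pm4py

ocdfg = pm4py.discover_ocdfg(ocel)
# sketch: performance annotation on unique objects, with illustrative thresholds
pm4py.view_ocdfg(ocdfg, annotation='performance', act_metric='unique_objects', edge_metric='unique_objects', act_threshold=2, edge_threshold=2, performance_aggregation='median', format='svg')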
pm4py.vis.save_vis_ocdfg(ocdfg: Dict[str, Any], file_path: str, annotation: str = 'frequency', act_metric: str = 'events', edge_metric='event_couples', act_threshold: int = 0, edge_threshold: int = 0, performance_aggregation: str = 'mean', bgcolor: str = 'white', rankdir: str = 'LR', **kwargs)[source]#

Saves the visualization of an OC-DFG (object-centric directly-follows graph) with the provided configuration.

Object-centric directly-follows multigraphs are a composition of the directly-follows graphs of the single object types, which can be annotated with different metrics considering the entities of an object-centric event log (i.e., events, unique objects, total objects).

Parameters:
  • ocdfg – Object-centric directly-follows graph

  • file_path (str) – Destination path (including the extension)

  • annotation (str) – The annotation to use for the visualization. Values: - “frequency”: frequency annotation - “performance”: performance annotation

  • act_metric (str) – The metric to use for the activities. Available values: - “events” => number of events (default) - “unique_objects” => number of unique objects - “total_objects” => number of total objects

  • edge_metric (str) – The metric to use for the edges. Available values: - “event_couples” => number of event couples (default) - “unique_objects” => number of unique objects - “total_objects” => number of total objects

  • act_threshold (int) – The threshold to apply on the activities frequency (default: 0). Only activities having a frequency >= this threshold are kept in the graph.

  • edge_threshold (int) – The threshold to apply on the edges frequency (default: 0). Only edges having a frequency >= this threshold are kept in the graph.

  • performance_aggregation (str) – The aggregation measure to use for the performance: mean, median, min, max, sum

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

ocdfg = pm4py.discover_ocdfg(ocel)
pm4py.save_vis_ocdfg(ocdfg, 'ocdfg.png', annotation='frequency')
pm4py.vis.view_ocpn(ocpn: Dict[str, Any], format: str = 'png', bgcolor: str = 'white', rankdir: str = 'LR')[source]#

Visualizes the object-centric Petri net on the screen

Parameters:
  • ocpn – Object-centric Petri net

  • format (str) – Format of the visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

ocpn = pm4py.discover_oc_petri_net(ocel)
pm4py.view_ocpn(ocpn, format='svg')
pm4py.vis.save_vis_ocpn(ocpn: Dict[str, Any], file_path: str, bgcolor: str = 'white', rankdir: str = 'LR', **kwargs)[source]#

Saves the visualization of the object-centric Petri net into a file

Parameters:
  • ocpn – Object-centric Petri net

  • file_path (str) – Target path of the visualization

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

ocpn = pm4py.discover_oc_petri_net(ocel)
pm4py.save_vis_ocpn(ocpn, 'ocpn.png')
pm4py.vis.view_network_analysis(network_analysis: Dict[Tuple[str, str], Dict[str, Any]], variant: str = 'frequency', format: str = 'png', activity_threshold: int = 1, edge_threshold: int = 1, bgcolor: str = 'white')[source]#

Visualizes the network analysis

Parameters:
  • network_analysis – Network analysis

  • variant (str) – Variant of the visualization: - frequency (if the discovered network analysis contains the frequency of the interactions) - performance (if the discovered network analysis contains the performance of the interactions)

  • format (str) – Format of the visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • activity_threshold (int) – The minimum number of occurrences for an activity to be included (default: 1)

  • edge_threshold (int) – The minimum number of occurrences for an edge to be included (default: 1)

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

net_ana = pm4py.discover_network_analysis(dataframe, out_column='case:concept:name', in_column='case:concept:name', node_column_source='org:resource', node_column_target='org:resource', edge_column='concept:name')
pm4py.view_network_analysis(net_ana, format='svg')
pm4py.vis.save_vis_network_analysis(network_analysis: Dict[Tuple[str, str], Dict[str, Any]], file_path: str, variant: str = 'frequency', activity_threshold: int = 1, edge_threshold: int = 1, bgcolor: str = 'white', **kwargs)[source]#

Saves the visualization of the network analysis

Parameters:
  • network_analysis – Network analysis

  • file_path (str) – Target path of the visualization

  • variant (str) – Variant of the visualization: - frequency (if the discovered network analysis contains the frequency of the interactions) - performance (if the discovered network analysis contains the performance of the interactions)

  • activity_threshold (int) – The minimum number of occurrences for an activity to be included (default: 1)

  • edge_threshold (int) – The minimum number of occurrences for an edge to be included (default: 1)

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

net_ana = pm4py.discover_network_analysis(dataframe, out_column='case:concept:name', in_column='case:concept:name', node_column_source='org:resource', node_column_target='org:resource', edge_column='concept:name')
pm4py.save_vis_network_analysis(net_ana, 'net_ana.png')
pm4py.vis.view_transition_system(transition_system: TransitionSystem, format: str = 'png', bgcolor: str = 'white')[source]#

Views a transition system

Parameters:
  • transition_system (TransitionSystem) – Transition system

  • format (str) – Format of the visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

transition_system = pm4py.discover_transition_system(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.view_transition_system(transition_system, format='svg')
pm4py.vis.save_vis_transition_system(transition_system: TransitionSystem, file_path: str, bgcolor: str = 'white', **kwargs)[source]#

Persists the visualization of a transition system

Parameters:
  • transition_system (TransitionSystem) – Transition system

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

transition_system = pm4py.discover_transition_system(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_transition_system(transition_system, 'trans_system.png')
pm4py.vis.view_prefix_tree(trie: Trie, format: str = 'png', bgcolor: str = 'white')[source]#

Views a prefix tree

Parameters:
  • trie (Trie) – Prefix tree

  • format (str) – Format of the visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

prefix_tree = pm4py.discover_prefix_tree(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.view_prefix_tree(prefix_tree, format='svg')
pm4py.vis.save_vis_prefix_tree(trie: Trie, file_path: str, bgcolor: str = 'white', **kwargs)[source]#

Persists the visualization of a prefix tree

Parameters:
  • trie (Trie) – Prefix tree

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

import pm4py

prefix_tree = pm4py.discover_prefix_tree(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
pm4py.save_vis_prefix_tree(prefix_tree, 'trie.png')
pm4py.vis.view_alignments(log: EventLog | DataFrame, aligned_traces: List[Dict[str, Any]], format: str = 'png')[source]#

Views the alignment table as a figure

Parameters:
  • log – event log

  • aligned_traces – results of an alignment

  • format (str) – format of the visualization (default: png)

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
net, im, fm = pm4py.discover_petri_net_inductive(log)
aligned_traces = pm4py.conformance_diagnostics_alignments(log, net, im, fm)
pm4py.view_alignments(log, aligned_traces, format='svg')
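Besides viewing the table, the alignment results can also be inspected programmatically. A minimal sketch, assuming each entry of aligned_traces carries 'alignment' and 'fitness' keys (the exact keys depend on the alignment variant):

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
net, im, fm = pm4py.discover_petri_net_inductive(log)
aligned_traces = pm4py.conformance_diagnostics_alignments(log, net, im, fm)
# 'alignment' (list of moves) and 'fitness' are assumed keys of each result
print(aligned_traces[0]['alignment'])
print(aligned_traces[0]['fitness'])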
pm4py.vis.save_vis_alignments(log: EventLog | DataFrame, aligned_traces: List[Dict[str, Any]], file_path: str, **kwargs)[source]#

Saves the alignment table as a figure to disk

Parameters:
  • log – event log

  • aligned_traces – results of an alignment

  • file_path (str) – target path in the disk

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
net, im, fm = pm4py.discover_petri_net_inductive(log)
aligned_traces = pm4py.conformance_diagnostics_alignments(log, net, im, fm)
pm4py.save_vis_alignments(log, aligned_traces, 'output.svg')
pm4py.vis.view_footprints(footprints: Tuple[Dict[str, Any], Dict[str, Any]] | Dict[str, Any], format: str = 'png')[source]#

Views the footprints as a figure

Parameters:
  • footprints – footprints

  • format (str) – format of the visualization (default: png)

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
fp_log = pm4py.discover_footprints(log)
pm4py.view_footprints(fp_log, format='svg')

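Since the type hint above also admits a pair of footprints, the footprints of a log can be compared against those of a model in a single figure. A sketch, assuming discover_footprints also accepts a Petri net with its markings:

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
net, im, fm = pm4py.discover_petri_net_inductive(log)
fp_log = pm4py.discover_footprints(log)
fp_net = pm4py.discover_footprints(net, im, fm)
# passing both footprints renders a comparison between log and model
pm4py.view_footprints((fp_log, fp_net), format='svg')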
pm4py.vis.save_vis_footprints(footprints: Tuple[Dict[str, Any], Dict[str, Any]] | Dict[str, Any], file_path: str, **kwargs)[source]#

Saves the footprints’ visualization on disk

Parameters:
  • footprints – footprints

  • file_path (str) – target path of the visualization

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
fp_log = pm4py.discover_footprints(log)
pm4py.save_vis_footprints(fp_log, 'output.svg')

pm4py.vis.view_powl(powl: POWL, format: str = 'png', bgcolor: str = 'white', variant_str: str = 'basic')[source]#

Perform a visualization of a POWL model.

Reference paper: Kourani, Humam, and Sebastiaan J. van Zelst. “POWL: partially ordered workflow language.” International Conference on Business Process Management. Cham: Springer Nature Switzerland, 2023.

Parameters:
  • powl (POWL) – POWL model

  • format (str) – format of the visualization (default: png)

  • bgcolor (str) – background color of the visualization (default: white)

  • variant_str (str) – variant of the visualization to be used (values: “basic”, “net”)

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
powl_model = pm4py.discover_powl(log)
pm4py.view_powl(powl_model, format='svg', variant_str='basic')
pm4py.view_powl(powl_model, format='svg', variant_str='net')

pm4py.vis.save_vis_powl(powl: POWL, file_path: str, bgcolor: str = 'white', rankdir: str = 'TB', **kwargs)[source]#

Saves the visualization of a POWL model.

Reference paper: Kourani, Humam, and Sebastiaan J. van Zelst. “POWL: partially ordered workflow language.” International Conference on Business Process Management. Cham: Springer Nature Switzerland, 2023.

Parameters:
  • powl (POWL) – POWL model

  • file_path (str) – target path of the visualization

  • bgcolor (str) – background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
powl_model = pm4py.discover_powl(log)
pm4py.save_vis_powl(powl_model, 'powl.png')

pm4py.vis.view_object_graph(ocel: OCEL, graph: Set[Tuple[str, str]], format: str = 'png', bgcolor: str = 'white', rankdir: str = 'LR')[source]#

Visualizes an object graph on the screen

Parameters:
  • ocel (OCEL) – object-centric event log

  • graph – object graph

  • format (str) – format of the visualization (if html is provided, GraphvizJS is used to render the visualization in an HTML page)

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
obj_graph = pm4py.ocel_discover_objects_graph(ocel, graph_type='object_interaction')
pm4py.view_object_graph(ocel, obj_graph, format='svg')
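Other object graph types can be discovered and rendered in the same way. A sketch, assuming 'object_descendants' is among the graph types supported by ocel_discover_objects_graph (an assumption; 'object_interaction' is the only value shown above):

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
# 'object_descendants' is an assumed graph_type value
desc_graph = pm4py.ocel_discover_objects_graph(ocel, graph_type='object_descendants')
pm4py.view_object_graph(ocel, desc_graph, format='svg')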
pm4py.vis.save_vis_object_graph(ocel: OCEL, graph: Set[Tuple[str, str]], file_path: str, bgcolor: str = 'white', rankdir: str = 'LR', **kwargs)[source]#

Saves the visualization of an object graph

Parameters:
  • ocel (OCEL) – object-centric event log

  • graph – object graph

  • file_path (str) – Destination path

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
obj_graph = pm4py.ocel_discover_objects_graph(ocel, graph_type='object_interaction')
pm4py.save_vis_object_graph(ocel, obj_graph, 'trial.pdf')
pm4py.vis.view_dcr(dcr: DcrGraph, format: str = 'png', bgcolor: str = 'white', rankdir: str = 'LR')[source]#

Views a DCR graph

Parameters:
  • dcr (DcrGraph) – DCR graph

  • format (str) – format of the visualization (default: png)

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

dcr = pm4py.discover_dcr(dataframe)
pm4py.view_dcr(dcr, format='svg')
pm4py.vis.save_vis_dcr(dcr: DcrGraph, file_path: str, bgcolor: str = 'white', rankdir: str = 'LR', **kwargs)[source]#

Saves the visualization of a DCR graph

Parameters:
  • dcr (DcrGraph) – DCR graph

  • file_path (str) – output path where the visualization of the DCR graph should be saved

  • bgcolor (str) – Background color of the visualization (default: white)

  • rankdir (str) – sets the direction of the graph (“LR” for left-to-right; “TB” for top-to-bottom)

import pm4py

dcr = pm4py.discover_dcr(dataframe)
pm4py.save_vis_dcr(dcr, 'dcr.png')

pm4py.write module#

The pm4py.write module contains all functionality related to writing files/objects to disk.

pm4py.write.write_xes(log: EventLog | DataFrame, file_path: str, case_id_key: str = 'case:concept:name', extensions: Collection[XESExtension] | None = None, encoding: str = 'utf-8', **kwargs) None[source]#

Writes an event log to disk in the XES format (see xes-standard)

Parameters:
  • log – log object (EventLog or pandas.DataFrame) that needs to be written to disk

  • file_path (str) – target file path of the event log (.xes file) on disk

  • case_id_key (str) – column key that identifies the case identifier

  • extensions – extensions defined for the event log

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_xes(log, '<path_to_export_to>', case_id_key='case:concept:name')
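As a quick sanity check, an exported log can be read back with pm4py.read_xes. A minimal sketch (in recent pm4py versions read_xes returns a pandas DataFrame by default):

import pm4py

pm4py.write_xes(log, 'exported.xes', case_id_key='case:concept:name')
# read the file back; the number of rows equals the number of exported events
df = pm4py.read_xes('exported.xes')
print(len(df))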
pm4py.write.write_pnml(petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, file_path: str, encoding: str = 'utf-8') None[source]#

Writes a Petri net object to disk in the .pnml format (see pnml-standard)

Parameters:
  • petri_net (PetriNet) – Petri net object that needs to be written to disk

  • initial_marking (Marking) – initial marking of the Petri net

  • final_marking (Marking) – final marking of the Petri net

  • file_path (str) – target file path on disk of the .pnml file

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_pnml(pn, im, fm, '<path_to_export_to>')
pm4py.write.write_ptml(tree: ProcessTree, file_path: str, encoding: str = 'utf-8') None[source]#

Writes a process tree object to disk in the .ptml format.

Parameters:
  • tree (ProcessTree) – ProcessTree object that needs to be written to disk

  • file_path (str) – target file path on disk of the .ptml file

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ptml(tree, '<path_to_export_to>')
pm4py.write.write_dfg(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end_activities: Dict[str, int], file_path: str, encoding: str = 'utf-8')[source]#

Writes a directly follows graph (DFG) object to disk in the .dfg format.

Parameters:
  • dfg – directly follows relation (multiset of activity-activity pairs)

  • start_activities – multiset tracking the number of occurrences of start activities

  • end_activities – multiset tracking the number of occurrences of end activities

  • file_path (str) – target file path on disk to write the dfg object to

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_dfg(dfg, sa, ea, '<path_to_export_to>')
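The three objects are typically obtained from DFG discovery. A sketch, assuming pm4py.discover_dfg returns the graph together with the start- and end-activity multisets:

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
dfg, sa, ea = pm4py.discover_dfg(log)
pm4py.write_dfg(dfg, sa, ea, '<path_to_export_to>')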
pm4py.write.write_bpmn(model: BPMN, file_path: str, auto_layout: bool = True, encoding: str = 'utf-8')[source]#

Writes a BPMN model object to disk in the .bpmn format.

Parameters:
  • model (BPMN) – BPMN model to export

  • file_path (str) – target file path on disk to write the BPMN object to

  • auto_layout (bool) – boolean indicating whether an automatic layout should be computed for the model before writing it to disk

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_bpmn(model, '<path_to_export_to>')
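If the BPMN model already carries layout information, the automatic layout can be skipped. A minimal sketch using the auto_layout parameter from the signature above:

import pm4py

# keep the existing layout instead of computing a new one
pm4py.write_bpmn(model, '<path_to_export_to>', auto_layout=False)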
pm4py.write.write_ocel(ocel: OCEL, file_path: str, objects_path: str = None, encoding: str = 'utf-8')[source]#

Writes an OCEL object to disk. Different formats are supported, including CSV (flat table), JSON-OCEL, XML-OCEL and SQLite (described at http://www.ocel-standard.org/).

Parameters:
  • ocel (OCEL) – OCEL object to write to disk

  • file_path (str) – target file path on disk to write the OCEL object to

  • objects_path (str) – location of the objects table (only applicable in case of .csv exporting)

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel(ocel, '<path_to_export_to>')
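The concrete format is typically selected through the file extension. A sketch, assuming the usual extension conventions for the formats listed above:

import pm4py

pm4py.write_ocel(ocel, 'log.jsonocel')  # JSON-OCEL (extension assumed)
pm4py.write_ocel(ocel, 'log.xmlocel')   # XML-OCEL (extension assumed)
pm4py.write_ocel(ocel, 'log.sqlite')    # SQLite (extension assumed)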
pm4py.write.write_ocel_csv(ocel: OCEL, file_path: str, objects_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL object to disk in the .csv file format. The OCEL object is exported into two separate files, i.e., one event table and one objects table. Both file paths should be specified

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path on disk to write the event table to

  • objects_path (str) – target file path on disk to write the objects table to

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel_csv(ocel, '<path_to_export_events_to>', '<path_to_export_objects_to>')
pm4py.write.write_ocel_json(ocel: OCEL, file_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL object to disk in the .json file format (exported as .jsonocel file).

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path on disk to write the OCEL object to

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel_json(ocel, '<path_to_export_to>')
pm4py.write.write_ocel_xml(ocel: OCEL, file_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL object to disk in the .xml file format (exported as .ocelxml file).

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path on disk to write the OCEL object to

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel_xml(ocel, '<path_to_export_to>')
pm4py.write.write_ocel_sqlite(ocel: OCEL, file_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL object to disk as a SQLite database (exported as .sqlite file).

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path to the SQLite database

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel_sqlite(ocel, '<path_to_export_to>')
pm4py.write.write_ocel2(ocel: OCEL, file_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL2.0 object to disk

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path on disk (the concrete OCEL2.0 format is selected based on the file extension)

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel2(ocel, '<path_to_export_to>')
pm4py.write.write_ocel2_json(ocel: OCEL, file_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL2.0 object to disk as a JSON file (exported as .jsonocel file).

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path to the JSON file

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel2_json(ocel, '<path_to_export_to>')
pm4py.write.write_ocel2_sqlite(ocel: OCEL, file_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL2.0 object to disk as a SQLite database (exported as .sqlite file).

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path to the SQLite database

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel2_sqlite(ocel, '<path_to_export_to>')
pm4py.write.write_ocel2_xml(ocel: OCEL, file_path: str, encoding: str = 'utf-8')[source]#

Writes an OCEL2.0 object to disk as an XML file (exported as .xmlocel file).

Parameters:
  • ocel (OCEL) – OCEL object

  • file_path (str) – target file path to the XML file

  • encoding (str) – the encoding to be used (default: utf-8)

import pm4py

pm4py.write_ocel2_xml(ocel, '<path_to_export_to>')
pm4py.write.write_dcr_xml(dcr_graph, path, variant, dcr_title, replace_whitespace=None)[source]#

Writes a DCR graph object to disk in the .xml file format (exported as .xml file).

Parameters:
  • dcr_graph – DCR graph object

  • path – target file path to the XML file

  • variant – variant of the DCR graph

  • dcr_title – title of the DCR graph

  • replace_whitespace – optional replacement for the whitespace character

import pm4py

pm4py.write_dcr_xml(dcr, '<path_to_export_to>', variant, '<dcr_title>')

Module contents#

Process mining for Python