Skip to content

MESQUAL Pandas Util filter_by_model_query

filter_by_model_query

filter_by_model_query(df: Series | DataFrame, model_df: DataFrame, query: str = None, match_on_level: int | str = None) -> DataFrame | Series

Filter DataFrame or Series based on a query applied to a model DataFrame.

This function filters data by applying a pandas query to a model DataFrame and using the resulting index to filter the target DataFrame or Series. It handles both simple and MultiIndex cases automatically.

Parameters:

Name Type Description Default
df Series | DataFrame

The DataFrame or Series to filter.

required
model_df DataFrame

The model DataFrame containing metadata used for filtering. Must have an index that can be matched against df's axis.

required
query str

A pandas query string to apply to model_df. If None or empty, returns df unchanged. Uses pandas query syntax.

None
match_on_level int | str

For MultiIndex cases, specifies which level to match on. Can be an integer (level position) or string (level name).

None

Returns:

Type Description
DataFrame | Series

Filtered DataFrame or Series with the same type as input df.

Example:

>>> # You have a generation time-series df
>>> print(gen_df)  # Original DataFrame
    generator            GenA  GenB  GenC  SolarA  WindA
    2024-01-01 00:00:00   100   200   150      50     80
    2024-01-01 01:00:00   120   180   170      60     90
    2024-01-01 02:00:00   110   190   160      55     85

>>> # You have a generator model df
>>> print(model_df)
              zone technology  is_res
    generator
    GenA        DE    nuclear   False
    GenB        DE       coal   False
    GenC        FR        gas   False
    SolarA      DE      solar    True
    WindA       NL       wind    True

>>> only_de_conv = filter_by_model_query(gen_df, model_df, '(not is_res) and (zone == "DE")')
>>> print(only_de_conv)  # DataFrame with only non-res generators in DE
    generator            GenA GenB
    2024-01-01 00:00:00   100  200
    2024-01-01 01:00:00   120  180
    2024-01-01 02:00:00   110  190
Source code in submodules/mesqual/mesqual/utils/pandas_utils/filter.py
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def filter_by_model_query(
        df: pd.Series | pd.DataFrame,
        model_df: pd.DataFrame,
        query: str = None,
        match_on_level: int | str = None,
) -> pd.DataFrame | pd.Series:
    """Filter DataFrame or Series based on a query applied to a model DataFrame.

    This function filters data by applying a pandas query to a model DataFrame and
    using the resulting index to filter the target DataFrame or Series. It handles
    both simple and MultiIndex cases automatically.

    Args:
        df: The DataFrame or Series to filter.
        model_df: The model DataFrame containing metadata used for filtering.
            Must have an index that can be matched against df's axis.
        query: A pandas query string to apply to model_df. If None or empty,
            returns df unchanged. Uses pandas query syntax.
        match_on_level: For MultiIndex cases, specifies which level to match on.
            Can be an integer (level position) or string (level name).

    Returns:
        Filtered DataFrame or Series with the same type as input df.

    Example:

        >>> # You have a generation time-series df
        >>> print(gen_df)  # Original DataFrame
            generator            GenA  GenB  GenC  SolarA  WindA
            2024-01-01 00:00:00   100   200   150      50     80
            2024-01-01 01:00:00   120   180   170      60     90
            2024-01-01 02:00:00   110   190   160      55     85

        >>> # You have a generator model df
        >>> print(model_df)
                      zone technology  is_res
            generator
            GenA        DE    nuclear   False
            GenB        DE       coal   False
            GenC        FR        gas   False
            SolarA      DE      solar    True
            WindA       NL       wind    True

        >>> only_de_conv = filter_by_model_query(gen_df, model_df, '(not is_res) and (zone == "DE")')
        >>> print(only_de_conv)  # DataFrame with only non-res generators in DE
            generator            GenA GenB
            2024-01-01 00:00:00   100  200
            2024-01-01 01:00:00   120  180
            2024-01-01 02:00:00   110  190
    """
    if query is None or query == '':
        return df

    axis, idx_selection_level = get_matching_axis_and_level(df, model_df.index, match_on_level)
    idx = df.axes[axis]

    selection = model_df.query(query, engine='python').copy(deep=True).index
    selection = list(set(selection).intersection(idx.get_level_values(idx_selection_level)))

    if isinstance(idx, pd.MultiIndex):
        selection = [i for i in idx if i[idx_selection_level] in selection]

    if isinstance(df, pd.Series):
        return df[selection] if axis == 0 else df.loc[selection]
    else:
        return df.loc[selection, :] if axis == 0 else df.loc[:, selection]