Skip to content

MESQUAL Aggregate Columns with Part in Common

AggregatedColumnAppender

Adds aggregated columns to pandas DataFrames by summing columns that share a common identifier.

This utility is particularly useful in energy systems modeling where multiple variables of the same type (e.g., different wind generation sources, various demand components, multiple storage units) need to be aggregated into totals for analysis or visualization.

The class handles both single-level and multi-level DataFrame columns, making it suitable for MESQUAL's energy variable structures that often include hierarchical indexing (time, regions, technologies, etc.).

Energy Domain Context
  • Aggregates extensive quantities (volumes, energy, capacities) by summation
  • Preserves NaN when all quantities in an aggregation are NaNs
  • Suitable for technology groupings, regional aggregations, and fuel type summations

Parameters:

Name Type Description Default
in_common_part str

The common substring to search for in column names. All columns containing this substring will be summed together.

required
agg_col_name_prefix str

Prefix to add to the aggregated column name. Defaults to empty string.

None
agg_col_name_suffix str

Suffix to add to the aggregated column name. Defaults to empty string.

None

Examples:

>>> # Basic energy variable aggregation
>>> data = pd.DataFrame({
...     'wind_onshore_gen': [100, 200, 300],
...     'wind_offshore_gen': [50, 70, 100], 
...     'solar_pv_gen': [30, 40, 50]
... })
>>> appender = AggregatedColumnAppender('wind', agg_col_name_suffix='_total')
>>> result = appender.add_aggregated_column(data)
>>> # Creates 'wind_total' column with sum of wind_onshore_gen and wind_offshore_gen

>>> # Multi-level columns for scenario analysis
>>> columns = pd.MultiIndex.from_tuples([
...     ('wind_onshore', 'scenario_1'), ('wind_onshore', 'scenario_2'),
...     ('wind_offshore', 'scenario_1'), ('wind_offshore', 'scenario_2')
... ])
>>> data = pd.DataFrame(np.random.rand(24, 4), columns=columns)
>>> appender = AggregatedColumnAppender('wind', agg_col_name_prefix='total_')
>>> result = appender.add_aggregated_column(data)
>>> # Creates 'total_wind' with aggregated values for each scenario
Source code in submodules/mesqual/mesqual/energy_data_handling/variable_utils/aggregate_cols_with_part_in_common.py
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
class AggregatedColumnAppender:
    """
    Adds aggregated columns to pandas DataFrames by summing columns that share a common identifier.

    This utility is particularly useful in energy systems modeling where multiple variables of the same
    type (e.g., different wind generation sources, various demand components, multiple storage units)
    need to be aggregated into totals for analysis or visualization.

    The class handles both single-level and multi-level DataFrame columns, making it suitable for
    MESQUAL's energy variable structures that often include hierarchical indexing (time,
    regions, technologies, etc.).

    Energy Domain Context:
        - Aggregates extensive quantities (volumes, energy, capacities) by summation
        - Preserves NaN when all quantities in an aggregation are NaNs
        - Suitable for technology groupings, regional aggregations, and fuel type summations

    Args:
        in_common_part (str): The common substring to search for in column names. All columns
            containing this substring will be summed together.
        agg_col_name_prefix (str, optional): Prefix to add to the aggregated column name.
            Defaults to empty string.
        agg_col_name_suffix (str, optional): Suffix to add to the aggregated column name.
            Defaults to empty string.

    Examples:

        >>> # Basic energy variable aggregation
        >>> data = pd.DataFrame({
        ...     'wind_onshore_gen': [100, 200, 300],
        ...     'wind_offshore_gen': [50, 70, 100], 
        ...     'solar_pv_gen': [30, 40, 50]
        ... })
        >>> appender = AggregatedColumnAppender('wind', agg_col_name_suffix='_total')
        >>> result = appender.add_aggregated_column(data)
        >>> # Creates 'wind_total' column with sum of wind_onshore_gen and wind_offshore_gen

        >>> # Multi-level columns for scenario analysis
        >>> columns = pd.MultiIndex.from_tuples([
        ...     ('wind_onshore', 'scenario_1'), ('wind_onshore', 'scenario_2'),
        ...     ('wind_offshore', 'scenario_1'), ('wind_offshore', 'scenario_2')
        ... ])
        >>> data = pd.DataFrame(np.random.rand(24, 4), columns=columns)
        >>> appender = AggregatedColumnAppender('wind', agg_col_name_prefix='total_')
        >>> result = appender.add_aggregated_column(data)
        >>> # Creates 'total_wind' with aggregated values for each scenario
    """

    def __init__(
            self,
            in_common_part: str,
            agg_col_name_prefix: str = None,
            agg_col_name_suffix: str = None,
    ):
        self._in_common_part = in_common_part
        self._agg_col_name_prefix = agg_col_name_prefix or ''
        self._agg_col_name_suffix = agg_col_name_suffix or ''

    def add_aggregated_column(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        Add an aggregated column to the DataFrame by summing matching columns.

        This method identifies all columns containing the specified common part and sums them
        into a new aggregated column. The aggregation preserves the DataFrame's structure
        and handles both single-level and multi-level column indices.

        For energy data, this is typically used to:
        - Aggregate different technology types (e.g., all wind sources)
        - Sum regional contributions (e.g., all demand in a country)
        - Combine fuel types (e.g., all fossil fuel generators)

        Args:
            df (pd.DataFrame): Input DataFrame containing energy variables. Can have single-level
                or multi-level column structure. For multi-level columns, aggregation is performed
                at the appropriate level while preserving the hierarchy.

        Returns:
            pd.DataFrame: Original DataFrame with added aggregated column. The new column name
                follows the pattern: {prefix}{in_common_part}{suffix}

        Raises:
            ValueError: If no columns contain the specified common part.
            TypeError: If input is not a pandas DataFrame.

        Examples:
            Single-level columns - technology aggregation:
            >>> gen_data = pd.DataFrame({
            ...     'wind_onshore_MW': [120, 150, 180],
            ...     'wind_offshore_MW': [80, 90, 100],
            ...     'solar_pv_MW': [200, 250, 300],
            ...     'demand_MW': [400, 490, 580]
            ... })
            >>> appender = AggregatedColumnAppender('wind', agg_col_name_suffix='_total_MW')
            >>> result = appender.add_aggregated_column(gen_data)
            >>> result['wind_total_MW']  # [200, 240, 280]

            Multi-level columns - scenario and regional analysis
            >>> idx = pd.date_range('2024-01-01', periods=3, freq='h')
            >>> cols = pd.MultiIndex.from_tuples([
            ...     ('storage_battery', 'DE', 'scenario_1'), 
            ...     ('storage_battery', 'FR', 'scenario_1'),
            ...     ('storage_pumped', 'DE', 'scenario_1'),
            ...     ('storage_pumped', 'FR', 'scenario_1')
            ... ], names=['technology', 'region', 'scenario'])
            >>> data = pd.DataFrame(np.random.rand(3, 4), index=idx, columns=cols)
            >>> appender = AggregatedColumnAppender('storage', agg_col_name_prefix='total_')
            >>> result = appender.add_aggregated_column(data)
            >>> # Creates 'total_storage' with sums grouped by (region, scenario)

            Handling NaN values in energy time series
            >>> price_data = pd.DataFrame({
            ...     'price_day_ahead': [50.0, np.nan, 45.0],
            ...     'price_intraday': [np.nan, np.nan, 47.0],  
            ...     'demand_forecast': [1000, 1100, 1200]
            ... })
            >>> appender = AggregatedColumnAppender('price', agg_col_name_suffix='_avg')
            >>> result = appender.add_aggregated_column(price_data)
            >>> # 'price_avg': [50.0, NaN, 92.0] - preserves NaN when all inputs are NaN

        Note:
            For multi-level DataFrames, the aggregation respects the hierarchical structure.
            If columns have levels beyond the first (technology level), the aggregation
            groups by those additional levels, creating separate totals for each combination.

            NaN handling follows pandas summation rules: NaN values are ignored unless all
            values in a row are NaN, in which case the result is NaN.
        """
        cols = df.columns.get_level_values(0).unique()
        cols_with_common_part = [x for x in cols if self._in_common_part in x]

        df_in_common = df[cols_with_common_part]
        if df.columns.nlevels == 1:
            dff = df_in_common.sum(axis=1)
            dff.loc[df_in_common.isna().all(axis=1)] = np.nan
        else:
            _groupby = list(range(1, df.columns.nlevels))
            dff = df_in_common.T.groupby(level=_groupby).sum().T
            _all_na = df_in_common.isna().T.groupby(level=_groupby).all().T
            if _all_na.any().any():
                for c in _all_na.columns:
                    dff.loc[_all_na[c], c] = np.nan

        new_col_name = f'{self._agg_col_name_prefix}{self._in_common_part}{self._agg_col_name_suffix}'
        df = set_column(df, new_col_name, dff)
        return df

add_aggregated_column

add_aggregated_column(df: DataFrame) -> DataFrame

Add an aggregated column to the DataFrame by summing matching columns.

This method identifies all columns containing the specified common part and sums them into a new aggregated column. The aggregation preserves the DataFrame's structure and handles both single-level and multi-level column indices.

For energy data, this is typically used to: - Aggregate different technology types (e.g., all wind sources) - Sum regional contributions (e.g., all demand in a country) - Combine fuel types (e.g., all fossil fuel generators)

Parameters:

Name Type Description Default
df DataFrame

Input DataFrame containing energy variables. Can have single-level or multi-level column structure. For multi-level columns, aggregation is performed at the appropriate level while preserving the hierarchy.

required

Returns:

Type Description
DataFrame

pd.DataFrame: Original DataFrame with added aggregated column. The new column name follows the pattern: {prefix}{in_common_part}{suffix}

Raises:

Type Description
ValueError

If no columns contain the specified common part.

TypeError

If input is not a pandas DataFrame.

Examples:

Single-level columns - technology aggregation:

>>> gen_data = pd.DataFrame({
...     'wind_onshore_MW': [120, 150, 180],
...     'wind_offshore_MW': [80, 90, 100],
...     'solar_pv_MW': [200, 250, 300],
...     'demand_MW': [400, 490, 580]
... })
>>> appender = AggregatedColumnAppender('wind', agg_col_name_suffix='_total_MW')
>>> result = appender.add_aggregated_column(gen_data)
>>> result['wind_total_MW']  # [200, 240, 280]

Multi-level columns - scenario and regional analysis

>>> idx = pd.date_range('2024-01-01', periods=3, freq='h')
>>> cols = pd.MultiIndex.from_tuples([
...     ('storage_battery', 'DE', 'scenario_1'), 
...     ('storage_battery', 'FR', 'scenario_1'),
...     ('storage_pumped', 'DE', 'scenario_1'),
...     ('storage_pumped', 'FR', 'scenario_1')
... ], names=['technology', 'region', 'scenario'])
>>> data = pd.DataFrame(np.random.rand(3, 4), index=idx, columns=cols)
>>> appender = AggregatedColumnAppender('storage', agg_col_name_prefix='total_')
>>> result = appender.add_aggregated_column(data)
>>> # Creates 'total_storage' with sums grouped by (region, scenario)

Handling NaN values in energy time series

>>> price_data = pd.DataFrame({
...     'price_day_ahead': [50.0, np.nan, 45.0],
...     'price_intraday': [np.nan, np.nan, 47.0],  
...     'demand_forecast': [1000, 1100, 1200]
... })
>>> appender = AggregatedColumnAppender('price', agg_col_name_suffix='_avg')
>>> result = appender.add_aggregated_column(price_data)
>>> # 'price_avg': [50.0, NaN, 92.0] - preserves NaN when all inputs are NaN
Note

For multi-level DataFrames, the aggregation respects the hierarchical structure. If columns have levels beyond the first (technology level), the aggregation groups by those additional levels, creating separate totals for each combination.

NaN handling follows pandas summation rules: NaN values are ignored unless all values in a row are NaN, in which case the result is NaN.

Source code in submodules/mesqual/mesqual/energy_data_handling/variable_utils/aggregate_cols_with_part_in_common.py
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
def add_aggregated_column(self, df: pd.DataFrame) -> pd.DataFrame:
    """
    Add an aggregated column to the DataFrame by summing matching columns.

    This method identifies all columns containing the specified common part and sums them
    into a new aggregated column. The aggregation preserves the DataFrame's structure
    and handles both single-level and multi-level column indices.

    For energy data, this is typically used to:
    - Aggregate different technology types (e.g., all wind sources)
    - Sum regional contributions (e.g., all demand in a country)
    - Combine fuel types (e.g., all fossil fuel generators)

    Args:
        df (pd.DataFrame): Input DataFrame containing energy variables. Can have single-level
            or multi-level column structure. For multi-level columns, aggregation is performed
            at the appropriate level while preserving the hierarchy.

    Returns:
        pd.DataFrame: Original DataFrame with added aggregated column. The new column name
            follows the pattern: {prefix}{in_common_part}{suffix}

    Raises:
        ValueError: If no columns contain the specified common part.
        TypeError: If input is not a pandas DataFrame.

    Examples:
        Single-level columns - technology aggregation:
        >>> gen_data = pd.DataFrame({
        ...     'wind_onshore_MW': [120, 150, 180],
        ...     'wind_offshore_MW': [80, 90, 100],
        ...     'solar_pv_MW': [200, 250, 300],
        ...     'demand_MW': [400, 490, 580]
        ... })
        >>> appender = AggregatedColumnAppender('wind', agg_col_name_suffix='_total_MW')
        >>> result = appender.add_aggregated_column(gen_data)
        >>> result['wind_total_MW']  # [200, 240, 280]

        Multi-level columns - scenario and regional analysis
        >>> idx = pd.date_range('2024-01-01', periods=3, freq='h')
        >>> cols = pd.MultiIndex.from_tuples([
        ...     ('storage_battery', 'DE', 'scenario_1'), 
        ...     ('storage_battery', 'FR', 'scenario_1'),
        ...     ('storage_pumped', 'DE', 'scenario_1'),
        ...     ('storage_pumped', 'FR', 'scenario_1')
        ... ], names=['technology', 'region', 'scenario'])
        >>> data = pd.DataFrame(np.random.rand(3, 4), index=idx, columns=cols)
        >>> appender = AggregatedColumnAppender('storage', agg_col_name_prefix='total_')
        >>> result = appender.add_aggregated_column(data)
        >>> # Creates 'total_storage' with sums grouped by (region, scenario)

        Handling NaN values in energy time series
        >>> price_data = pd.DataFrame({
        ...     'price_day_ahead': [50.0, np.nan, 45.0],
        ...     'price_intraday': [np.nan, np.nan, 47.0],  
        ...     'demand_forecast': [1000, 1100, 1200]
        ... })
        >>> appender = AggregatedColumnAppender('price', agg_col_name_suffix='_avg')
        >>> result = appender.add_aggregated_column(price_data)
        >>> # 'price_avg': [50.0, NaN, 92.0] - preserves NaN when all inputs are NaN

    Note:
        For multi-level DataFrames, the aggregation respects the hierarchical structure.
        If columns have levels beyond the first (technology level), the aggregation
        groups by those additional levels, creating separate totals for each combination.

        NaN handling follows pandas summation rules: NaN values are ignored unless all
        values in a row are NaN, in which case the result is NaN.
    """
    cols = df.columns.get_level_values(0).unique()
    cols_with_common_part = [x for x in cols if self._in_common_part in x]

    df_in_common = df[cols_with_common_part]
    if df.columns.nlevels == 1:
        dff = df_in_common.sum(axis=1)
        dff.loc[df_in_common.isna().all(axis=1)] = np.nan
    else:
        _groupby = list(range(1, df.columns.nlevels))
        dff = df_in_common.T.groupby(level=_groupby).sum().T
        _all_na = df_in_common.isna().T.groupby(level=_groupby).all().T
        if _all_na.any().any():
            for c in _all_na.columns:
                dff.loc[_all_na[c], c] = np.nan

    new_col_name = f'{self._agg_col_name_prefix}{self._in_common_part}{self._agg_col_name_suffix}'
    df = set_column(df, new_col_name, dff)
    return df