MESQUAL Aggregate Columns with Part in Common¶
AggregatedColumnAppender
¶
Adds aggregated columns to pandas DataFrames by summing columns that share a common identifier.
This utility is particularly useful in energy systems modeling where multiple variables of the same type (e.g., different wind generation sources, various demand components, multiple storage units) need to be aggregated into totals for analysis or visualization.
The class handles both single-level and multi-level DataFrame columns, making it suitable for MESQUAL's energy variable structures that often include hierarchical indexing (time, regions, technologies, etc.).
Energy Domain Context
- Aggregates extensive quantities (volumes, energy, capacities) by summation
- Preserves NaN when all quantities in an aggregation are NaNs
- Suitable for technology groupings, regional aggregations, and fuel type summations
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
in_common_part
|
str
|
The common substring to search for in column names. All columns containing this substring will be summed together. |
required |
agg_col_name_prefix
|
str
|
Prefix to add to the aggregated column name. Defaults to empty string. |
None
|
agg_col_name_suffix
|
str
|
Suffix to add to the aggregated column name. Defaults to empty string. |
None
|
Examples:
>>> # Basic energy variable aggregation
>>> data = pd.DataFrame({
... 'wind_onshore_gen': [100, 200, 300],
... 'wind_offshore_gen': [50, 70, 100],
... 'solar_pv_gen': [30, 40, 50]
... })
>>> appender = AggregatedColumnAppender('wind', agg_col_name_suffix='_total')
>>> result = appender.add_aggregated_column(data)
>>> # Creates 'wind_total' column with sum of wind_onshore_gen and wind_offshore_gen
>>> # Multi-level columns for scenario analysis
>>> columns = pd.MultiIndex.from_tuples([
... ('wind_onshore', 'scenario_1'), ('wind_onshore', 'scenario_2'),
... ('wind_offshore', 'scenario_1'), ('wind_offshore', 'scenario_2')
... ])
>>> data = pd.DataFrame(np.random.rand(24, 4), columns=columns)
>>> appender = AggregatedColumnAppender('wind', agg_col_name_prefix='total_')
>>> result = appender.add_aggregated_column(data)
>>> # Creates 'total_wind' with aggregated values for each scenario
Source code in submodules/mesqual/mesqual/energy_data_handling/variable_utils/aggregate_cols_with_part_in_common.py
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 | |
add_aggregated_column
¶
add_aggregated_column(df: DataFrame) -> DataFrame
Add an aggregated column to the DataFrame by summing matching columns.
This method identifies all columns containing the specified common part and sums them into a new aggregated column. The aggregation preserves the DataFrame's structure and handles both single-level and multi-level column indices.
For energy data, this is typically used to: - Aggregate different technology types (e.g., all wind sources) - Sum regional contributions (e.g., all demand in a country) - Combine fuel types (e.g., all fossil fuel generators)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df
|
DataFrame
|
Input DataFrame containing energy variables. Can have single-level or multi-level column structure. For multi-level columns, aggregation is performed at the appropriate level while preserving the hierarchy. |
required |
Returns:
| Type | Description |
|---|---|
DataFrame
|
pd.DataFrame: Original DataFrame with added aggregated column. The new column name follows the pattern: {prefix}{in_common_part}{suffix} |
Raises:
| Type | Description |
|---|---|
ValueError
|
If no columns contain the specified common part. |
TypeError
|
If input is not a pandas DataFrame. |
Examples:
Single-level columns - technology aggregation:
>>> gen_data = pd.DataFrame({
... 'wind_onshore_MW': [120, 150, 180],
... 'wind_offshore_MW': [80, 90, 100],
... 'solar_pv_MW': [200, 250, 300],
... 'demand_MW': [400, 490, 580]
... })
>>> appender = AggregatedColumnAppender('wind', agg_col_name_suffix='_total_MW')
>>> result = appender.add_aggregated_column(gen_data)
>>> result['wind_total_MW'] # [200, 240, 280]
Multi-level columns - scenario and regional analysis
>>> idx = pd.date_range('2024-01-01', periods=3, freq='h')
>>> cols = pd.MultiIndex.from_tuples([
... ('storage_battery', 'DE', 'scenario_1'),
... ('storage_battery', 'FR', 'scenario_1'),
... ('storage_pumped', 'DE', 'scenario_1'),
... ('storage_pumped', 'FR', 'scenario_1')
... ], names=['technology', 'region', 'scenario'])
>>> data = pd.DataFrame(np.random.rand(3, 4), index=idx, columns=cols)
>>> appender = AggregatedColumnAppender('storage', agg_col_name_prefix='total_')
>>> result = appender.add_aggregated_column(data)
>>> # Creates 'total_storage' with sums grouped by (region, scenario)
Handling NaN values in energy time series
>>> price_data = pd.DataFrame({
... 'price_day_ahead': [50.0, np.nan, 45.0],
... 'price_intraday': [np.nan, np.nan, 47.0],
... 'demand_forecast': [1000, 1100, 1200]
... })
>>> appender = AggregatedColumnAppender('price', agg_col_name_suffix='_avg')
>>> result = appender.add_aggregated_column(price_data)
>>> # 'price_avg': [50.0, NaN, 92.0] - preserves NaN when all inputs are NaN
Note
For multi-level DataFrames, the aggregation respects the hierarchical structure. If columns have levels beyond the first (technology level), the aggregation groups by those additional levels, creating separate totals for each combination.
NaN handling follows pandas summation rules: NaN values are ignored unless all values in a row are NaN, in which case the result is NaN.
Source code in submodules/mesqual/mesqual/energy_data_handling/variable_utils/aggregate_cols_with_part_in_common.py
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 | |