Dataset References

dataset

Dataset

Bases: Generic[DatasetConfigType, FlagType, FlagIndexType], ABC

Abstract base class for all datasets in the MESQUAL framework.

The Dataset class provides the fundamental interface for data access and manipulation in MESQUAL. It implements the core principle "Everything is a Dataset" where individual scenarios, collections of scenarios, and scenario comparisons all share the same unified interface.

Key Features
  • Unified .fetch(flag) interface for data access
  • Attribute management for scenario metadata
  • KPI calculation integration
  • Database caching support
  • Dot notation fetching via dotfetch property
  • Type-safe generic implementation

Class Type Parameters:
  • DatasetConfigType: Configuration class for dataset behavior (required)
  • FlagType: Type used for data flag identification, typically str (required)
  • FlagIndexType: Flag index implementation for flag mapping (required)

Attributes:
  • name (str): Human-readable identifier for the dataset
  • kpi_collection (KPICollection): Collection of KPIs associated with this dataset
  • dotfetch (_DotNotationFetcher): Enables dot notation data access

Example:

>>> # Basic usage pattern
>>> data = dataset.fetch('buses_t.marginal_price')
>>> flags = dataset.accepted_flags
>>> if dataset.flag_is_accepted('generators_t.p'):
...     gen_data = dataset.fetch('generators_t.p')
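>>>
>>> # Attribute management (see set_attributes / get_attributes_series in the source below)
>>> # follows the same pattern. A minimal sketch with illustrative attribute names:
>>> dataset.set_attributes(scenario='base', year=2030)
>>> dataset.attributes['year']
2030
>>> dataset.get_attributes_series()  # pd.Series named after the dataset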
Source code in submodules/mesqual/mesqual/datasets/dataset.py
class Dataset(Generic[DatasetConfigType, FlagType, FlagIndexType], ABC):
    """
    Abstract base class for all datasets in the MESQUAL framework.

    The Dataset class provides the fundamental interface for data access and manipulation
    in MESQUAL. It implements the core principle "Everything is a Dataset" where individual
    scenarios, collections of scenarios, and scenario comparisons all share the same
    unified interface.

    Key Features:
        - Unified `.fetch(flag)` interface for data access
        - Attribute management for scenario metadata
        - KPI calculation integration
        - Database caching support
        - Dot notation fetching via `dotfetch` property
        - Type-safe generic implementation

    Type Parameters:
        DatasetConfigType: Configuration class for dataset behavior
        FlagType: Type used for data flag identification (typically str)
        FlagIndexType: Flag index implementation for flag mapping

    Attributes:
        name (str): Human-readable identifier for the dataset
        kpi_collection (KPICollection): Collection of KPIs associated with this dataset
        dotfetch (_DotNotationFetcher): Enables dot notation data access

    Example:

        >>> # Basic usage pattern
        >>> data = dataset.fetch('buses_t.marginal_price')
        >>> flags = dataset.accepted_flags
        >>> if dataset.flag_is_accepted('generators_t.p'):
        ...     gen_data = dataset.fetch('generators_t.p')
    """

    def __init__(
            self,
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndexType = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None
    ):
        """
        Initialize a new Dataset instance.

        Args:
            name: Human-readable identifier. If None, auto-generates from class name
            parent_dataset: Optional parent dataset for hierarchical relationships
            flag_index: Index for mapping and validating data flags
            attributes: Dictionary of metadata attributes for the dataset
            database: Optional database for caching expensive computations
            config: Configuration object controlling dataset behavior
        """
        self.name = name or f'{self.__class__.__name__}_{str(id(self))}'
        self._flag_index = flag_index or EmptyFlagIndex()
        self._parent_dataset = parent_dataset
        self._attributes: dict = attributes or dict()
        self._database = database
        self._config = config
        self.dotfetch = _DotNotationFetcher(self)

        from mesqual.kpis.kpi_collection import KPICollection
        self.kpi_collection: KPICollection = KPICollection()

    @property
    def flag_index(self) -> FlagIndexType:
        if isinstance(self._flag_index, EmptyFlagIndex):
            logger.info(
                f"Dataset {self.name}: "
                "You're trying to use functionality of the FlagIndex but didn't define one. "
                "The current FlagIndex in use is empty. "
                "Make sure to set a flag_index in case you want to use full functionality of the flag_index."
            )
        return self._flag_index

    @property
    def database(self) -> Database | None:
        return self._database

    def add_kpis(self, kpis: Iterable[KPI | KPIFactory | Type[KPI]]):
        """
        Add multiple KPIs to this dataset's KPI collection.

        Args:
            kpis: Iterable of KPI instances, factories, or classes to add
        """
        for kpi in kpis:
            self.add_kpi(kpi)

    def add_kpi(self, kpi: KPI | KPIFactory | Type[KPI]):
        """
        Add a single KPI to this dataset's KPI collection.

        Automatically handles different KPI input types by converting factories
        and classes to KPI instances.

        Args:
            kpi: KPI instance, factory, or class to add
        """
        from mesqual.kpis.kpi_base import KPI
        from mesqual.kpis.kpis_from_aggregations import KPIFactory
        if isinstance(kpi, KPIFactory):
            kpi = kpi.get_kpi(self)
        elif isinstance(kpi, type) and issubclass(kpi, KPI):
            kpi = kpi.from_factory(self)
        self.kpi_collection.add_kpi(kpi)

    def clear_kpi_collection(self):
        from mesqual.kpis import KPICollection
        self.kpi_collection = KPICollection()

    @property
    def attributes(self) -> dict:
        return self._attributes

    def get_attributes_series(self) -> pd.Series:
        att_series = pd.Series(self.attributes, name=self.name)
        return att_series

    def set_attributes(self, **kwargs):
        for key, value in kwargs.items():
            if not isinstance(key, str):
                raise TypeError(f'Attribute keys must be of type str. Your key {key} is of type {type(key)}.')
            if not isinstance(value, (bool, int, float, str)):
                raise TypeError(
                    f'Attribute values must be of type (bool, int, float, str). '
                    f'Your value for {key} ({value}) is of type {type(value)}.'
                )
            self._attributes[key] = value

    @property
    def parent_dataset(self) -> 'DatasetLinkCollection':
        if self._parent_dataset is None:
            raise RuntimeError(f"Parent dataset called without / before assignment.")
        return self._parent_dataset

    @parent_dataset.setter
    def parent_dataset(self, parent_dataset: 'DatasetLinkCollection'):
        from mesqual.datasets.dataset_collection import DatasetLinkCollection
        if not isinstance(parent_dataset, DatasetLinkCollection):
            raise TypeError(f"Parent parent_dataset must be of type {DatasetLinkCollection.__name__}")
        self._parent_dataset = parent_dataset

    @property
    @abstractmethod
    def accepted_flags(self) -> set[FlagType]:
        """
        Set of all flags accepted by this dataset.

        This abstract property must be implemented by all concrete dataset classes
        to define which data flags can be fetched from the dataset.

        Returns:
            Set of flags that can be used with the fetch() method

        Example:

            >>> print(dataset.accepted_flags)
                {'buses', 'buses_t.marginal_price', 'generators', 'generators_t.p', ...}
        """
        return set()

    def get_accepted_flags_containing_x(self, x: str, match_case: bool = False) -> set[FlagType]:
        """
        Find all accepted flags containing a specific substring.

        Useful for discovering related data flags or filtering flags by category.

        Args:
            x: Substring to search for in flag names
            match_case: If True, performs case-sensitive search. Default is False.

        Returns:
            Set of accepted flags containing the substring

        Example:

            >>> ds = PyPSADataset()
            >>> ds.get_accepted_flags_containing_x('generators')
                {'generators', 'generators_t.p', 'generators_t.efficiency', ...}
            >>> ds.get_accepted_flags_containing_x('BUSES', match_case=True)
                set()  # Empty because case doesn't match
        """
        if match_case:
            return {f for f in self.accepted_flags if x in str(f)}
        x_lower = x.lower()
        return {f for f in self.accepted_flags if x_lower in str(f).lower()}

    def flag_is_accepted(self, flag: FlagType) -> bool:
        """
        Boolean check whether a flag is accepted by the Dataset.

        This method can be overridden in any child class if you want to apply
        custom logic instead of relying on the explicit set of accepted_flags.
        """
        return flag in self.accepted_flags

    @flag_must_be_accepted
    def required_flags_for_flag(self, flag: FlagType) -> set[FlagType]:
        return self._required_flags_for_flag(flag)

    @abstractmethod
    def _required_flags_for_flag(self, flag: FlagType) -> set[FlagType]:
        return set()

    @flag_must_be_accepted
    def fetch(self, flag: FlagType, config: dict | DatasetConfigType = None, **kwargs) -> pd.Series | pd.DataFrame:
        """
        Fetch data associated with a specific flag.

        This is the primary method for data access in MESQUAL datasets. It provides
        a unified interface for retrieving data regardless of the underlying source
        or dataset type. The method includes automatic caching, post-processing,
        and configuration management.

        Args:
            flag: Data identifier flag (must be in accepted_flags)
            config: Optional configuration to override dataset defaults.
                   Can be a dict or DatasetConfig instance.
            **kwargs: Additional keyword arguments passed to the underlying
                     data fetching implementation

        Returns:
            DataFrame or Series containing the requested data

        Raises:
            ValueError: If the flag is not accepted by this dataset

        Examples:

            >>> # Basic usage
            >>> prices = dataset.fetch('buses_t.marginal_price')
            >>>
            >>> # With custom configuration
            >>> prices = dataset.fetch('buses_t.marginal_price', config={'use_database': False})
        """
        effective_config = self._prepare_config(config)
        use_database = self._database is not None and effective_config.use_database

        if use_database:
            if self._database.key_is_up_to_date(self, flag, config=effective_config, **kwargs):
                return self._database.get(self, flag, config=effective_config, **kwargs)

        raw_data = self._fetch(flag, effective_config, **kwargs)
        processed_data = self._post_process_data(raw_data, flag, effective_config)

        if use_database:
            self._database.set(self, flag, config=effective_config, value=processed_data, **kwargs)

        return processed_data.copy()

    def _post_process_data(
            self,
            data: pd.Series | pd.DataFrame,
            flag: FlagType,
            config: DatasetConfigType
    ) -> pd.Series | pd.DataFrame:
        if config.remove_duplicate_indices and any(data.index.duplicated()):
            logger.info(
                f'For some reason your data-set {self.name} returns an object with duplicate indices for flag {flag}.\n'
                f'We manually remove duplicate indices. Please make sure your data importer / converter is set up '
                f'appropriately and that your raw data does not contain duplicate indices. \n'
                f'We will keep the first element of every duplicated index.'
            )
            data = data.loc[~data.index.duplicated()]
        if config.auto_sort_datetime_index and isinstance(data.index, pd.DatetimeIndex):
            data = data.sort_index()
        return data

    def _prepare_config(self, config: dict | DatasetConfigType = None) -> DatasetConfigType:
        if config is None:
            return self.instance_config

        if isinstance(config, dict):
            temp_config = self.get_config_type()()
            temp_config.__dict__.update(config)
            return self.instance_config.merge(temp_config)

        from mesqual.datasets.dataset_config import DatasetConfig
        if isinstance(config, DatasetConfig):
            return self.instance_config.merge(config)

        raise TypeError(f"Config must be dict or {DatasetConfig.__name__}, got {type(config)}")

    @abstractmethod
    def _fetch(self, flag: FlagType, effective_config: DatasetConfigType, **kwargs) -> pd.Series | pd.DataFrame:
        return pd.DataFrame()

    def fetch_multiple_flags_and_concat(
            self,
            flags: Iterable[FlagType],
            concat_axis: int = 1,
            concat_level_name: str = 'variable',
            concat_level_at_top: bool = True,
            config: dict | DatasetConfigType = None,
            **kwargs
    ) -> Union[pd.Series, pd.DataFrame]:
        dfs = {
            str(flag): self.fetch(flag, config, **kwargs)
            for flag in flags
        }
        df = pd.concat(
            dfs,
            axis=concat_axis,
            names=[concat_level_name],
        )
        if not concat_level_at_top:
            ax = df.axes[concat_axis]
            ax = ax.reorder_levels(list(range(1, ax.nlevels)) + [0])
            df.axes[concat_axis] = ax
        return df

    def fetch_filter_groupby_agg(
            self,
            flag: FlagType,
            model_filter_query: str = None,
            prop_groupby: str | list[str] = None,
            prop_groupby_agg: str = None,
            config: dict | DatasetConfigType = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        model_flag = self.flag_index.get_linked_model_flag(flag)
        if not model_flag:
            raise RuntimeError(f'FlagIndex could not successfully map flag {flag} to a model flag.')

        from mesqual.utils import pandas_utils

        data = self.fetch(flag, config, **kwargs)
        model_df = self.fetch(model_flag, config, **kwargs)

        if model_filter_query:
            data = pandas_utils.filter_by_model_query(data, model_df, query=model_filter_query)

        if prop_groupby:
            if isinstance(prop_groupby, str):
                prop_groupby = [prop_groupby]
            data = pandas_utils.prepend_model_prop_levels(data, model_df, *prop_groupby)
            data = data.groupby(prop_groupby)
            if prop_groupby_agg:
                data = data.agg(prop_groupby_agg)
        elif prop_groupby_agg:
            logger.warning(
                f"You provided a prop_groupby_agg operation, but didn't provide prop_groupby. "
                f"No aggregation performed."
            )
        return data

    @classmethod
    def get_flag_type(cls) -> Type[FlagType]:
        from mesqual.flag.flag import FlagTypeProtocol
        return FlagTypeProtocol

    @classmethod
    def get_flag_index_type(cls) -> Type[FlagIndexType]:
        from mesqual.flag.flag_index import FlagIndex
        return FlagIndex

    @classmethod
    def get_config_type(cls) -> Type[DatasetConfigType]:
        from mesqual.datasets.dataset_config import DatasetConfig
        return DatasetConfig

    @property
    def instance_config(self) -> DatasetConfigType:
        from mesqual.datasets.dataset_config import DatasetConfigManager
        return DatasetConfigManager.get_effective_config(self.__class__, self._config)

    def set_instance_config(self, config: DatasetConfigType) -> None:
        self._config = config

    def set_instance_config_kwargs(self, **kwargs) -> None:
        for key, value in kwargs.items():
            setattr(self._config, key, value)

    @classmethod
    def set_class_config(cls, config: DatasetConfigType) -> None:
        from mesqual.datasets.dataset_config import DatasetConfigManager
        DatasetConfigManager.set_class_config(cls, config)

    @classmethod
    def _get_class_name_lower_snake(cls) -> str:
        return to_lower_snake(cls.__name__)

    def __str__(self) -> str:
        return self.name

    def __hash__(self):
        return hash((self.name, self._config))
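
The abstract members above (accepted_flags, _required_flags_for_flag, _fetch) are all a concrete dataset needs to supply. A minimal, hypothetical subclass backed by a plain dict of DataFrames might look like this; it is a sketch, not part of MESQUAL, and the import path is assumed from the source reference above:

import pandas as pd

from mesqual.datasets.dataset import Dataset


class DictDataset(Dataset):
    """Toy dataset serving pre-built DataFrames keyed by flag (illustrative only)."""

    def __init__(self, frames: dict[str, pd.DataFrame], **kwargs):
        super().__init__(**kwargs)
        self._frames = frames

    @property
    def accepted_flags(self) -> set[str]:
        # One flag per stored DataFrame
        return set(self._frames)

    def _required_flags_for_flag(self, flag: str) -> set[str]:
        # No inter-flag dependencies in this toy example
        return set()

    def _fetch(self, flag: str, effective_config, **kwargs) -> pd.DataFrame:
        return self._frames[flag]

Such a subclass is then used like any other dataset, e.g. DictDataset({'buses': pd.DataFrame(...)}, name='toy').fetch('buses'), assuming the default DatasetConfig applies when no config is set.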

accepted_flags abstractmethod property

accepted_flags: set[FlagType]

Set of all flags accepted by this dataset.

This abstract property must be implemented by all concrete dataset classes to define which data flags can be fetched from the dataset.

Returns:
  • set[FlagType]: Set of flags that can be used with the fetch() method

Example:

>>> print(dataset.accepted_flags)
    {'buses', 'buses_t.marginal_price', 'generators', 'generators_t.p', ...}

__init__

__init__(name: str = None, parent_dataset: Dataset = None, flag_index: FlagIndexType = None, attributes: dict = None, database: Database = None, config: DatasetConfigType = None)

Initialize a new Dataset instance.

Parameters:
  • name (str): Human-readable identifier. If None, auto-generates from class name. Default: None
  • parent_dataset (Dataset): Optional parent dataset for hierarchical relationships. Default: None
  • flag_index (FlagIndexType): Index for mapping and validating data flags. Default: None
  • attributes (dict): Dictionary of metadata attributes for the dataset. Default: None
  • database (Database): Optional database for caching expensive computations. Default: None
  • config (DatasetConfigType): Configuration object controlling dataset behavior. Default: None
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def __init__(
        self,
        name: str = None,
        parent_dataset: Dataset = None,
        flag_index: FlagIndexType = None,
        attributes: dict = None,
        database: Database = None,
        config: DatasetConfigType = None
):
    """
    Initialize a new Dataset instance.

    Args:
        name: Human-readable identifier. If None, auto-generates from class name
        parent_dataset: Optional parent dataset for hierarchical relationships
        flag_index: Index for mapping and validating data flags
        attributes: Dictionary of metadata attributes for the dataset
        database: Optional database for caching expensive computations
        config: Configuration object controlling dataset behavior
    """
    self.name = name or f'{self.__class__.__name__}_{str(id(self))}'
    self._flag_index = flag_index or EmptyFlagIndex()
    self._parent_dataset = parent_dataset
    self._attributes: dict = attributes or dict()
    self._database = database
    self._config = config
    self.dotfetch = _DotNotationFetcher(self)

    from mesqual.kpis.kpi_collection import KPICollection
    self.kpi_collection: KPICollection = KPICollection()

add_kpis

add_kpis(kpis: Iterable[KPI | KPIFactory | Type[KPI]])

Add multiple KPIs to this dataset's KPI collection.

Parameters:
  • kpis (Iterable[KPI | KPIFactory | Type[KPI]]): Iterable of KPI instances, factories, or classes to add (required)
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def add_kpis(self, kpis: Iterable[KPI | KPIFactory | Type[KPI]]):
    """
    Add multiple KPIs to this dataset's KPI collection.

    Args:
        kpis: Iterable of KPI instances, factories, or classes to add
    """
    for kpi in kpis:
        self.add_kpi(kpi)

add_kpi

add_kpi(kpi: KPI | KPIFactory | Type[KPI])

Add a single KPI to this dataset's KPI collection.

Automatically handles different KPI input types by converting factories and classes to KPI instances.

Parameters:

Name Type Description Default
kpi KPI | KPIFactory | Type[KPI]

KPI instance, factory, or class to add

required
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def add_kpi(self, kpi: KPI | KPIFactory | Type[KPI]):
    """
    Add a single KPI to this dataset's KPI collection.

    Automatically handles different KPI input types by converting factories
    and classes to KPI instances.

    Args:
        kpi: KPI instance, factory, or class to add
    """
    from mesqual.kpis.kpi_base import KPI
    from mesqual.kpis.kpis_from_aggregations import KPIFactory
    if isinstance(kpi, KPIFactory):
        kpi = kpi.get_kpi(self)
    elif isinstance(kpi, type) and issubclass(kpi, KPI):
        kpi = kpi.from_factory(self)
    self.kpi_collection.add_kpi(kpi)
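
A hedged usage sketch; EnergyBalanceKPI and price_kpi_factory are hypothetical stand-ins for a KPI subclass and a KPIFactory instance:

>>> dataset.add_kpi(EnergyBalanceKPI)      # KPI class: instantiated via EnergyBalanceKPI.from_factory(dataset)
>>> dataset.add_kpi(price_kpi_factory)     # KPIFactory: instantiated via price_kpi_factory.get_kpi(dataset)
>>> dataset.add_kpis([EnergyBalanceKPI, price_kpi_factory])  # or add several at once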

get_accepted_flags_containing_x

get_accepted_flags_containing_x(x: str, match_case: bool = False) -> set[FlagType]

Find all accepted flags containing a specific substring.

Useful for discovering related data flags or filtering flags by category.

Parameters:
  • x (str): Substring to search for in flag names (required)
  • match_case (bool): If True, performs a case-sensitive search. Default: False

Returns:
  • set[FlagType]: Set of accepted flags containing the substring

Example:

>>> ds = PyPSADataset()
>>> ds.get_accepted_flags_containing_x('generators')
    {'generators', 'generators_t.p', 'generators_t.efficiency', ...}
>>> ds.get_accepted_flags_containing_x('BUSES', match_case=True)
    set()  # Empty because case doesn't match
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def get_accepted_flags_containing_x(self, x: str, match_case: bool = False) -> set[FlagType]:
    """
    Find all accepted flags containing a specific substring.

    Useful for discovering related data flags or filtering flags by category.

    Args:
        x: Substring to search for in flag names
        match_case: If True, performs case-sensitive search. Default is False.

    Returns:
        Set of accepted flags containing the substring

    Example:

        >>> ds = PyPSADataset()
        >>> ds.get_accepted_flags_containing_x('generators')
            {'generators', 'generators_t.p', 'generators_t.efficiency', ...}
        >>> ds.get_accepted_flags_containing_x('BUSES', match_case=True)
            set()  # Empty because case doesn't match
    """
    if match_case:
        return {f for f in self.accepted_flags if x in str(f)}
    x_lower = x.lower()
    return {f for f in self.accepted_flags if x_lower in str(f).lower()}

flag_is_accepted

flag_is_accepted(flag: FlagType) -> bool

Boolean check whether a flag is accepted by the Dataset.

This method can be overridden in any child class if you want to apply custom logic instead of relying on the explicit set of accepted_flags.

Source code in submodules/mesqual/mesqual/datasets/dataset.py
def flag_is_accepted(self, flag: FlagType) -> bool:
    """
    Boolean check whether a flag is accepted by the Dataset.

    This method can be overridden in any child class if you want to apply
    custom logic instead of relying on the explicit set of accepted_flags.
    """
    return flag in self.accepted_flags

fetch

fetch(flag: FlagType, config: dict | DatasetConfigType = None, **kwargs) -> Series | DataFrame

Fetch data associated with a specific flag.

This is the primary method for data access in MESQUAL datasets. It provides a unified interface for retrieving data regardless of the underlying source or dataset type. The method includes automatic caching, post-processing, and configuration management.

Parameters:
  • flag (FlagType): Data identifier flag; must be in accepted_flags (required)
  • config (dict | DatasetConfigType): Optional configuration to override dataset defaults. Can be a dict or DatasetConfig instance. Default: None
  • **kwargs: Additional keyword arguments passed to the underlying data fetching implementation. Default: {}

Returns:
  • Series | DataFrame: DataFrame or Series containing the requested data

Raises:
  • ValueError: If the flag is not accepted by this dataset

Examples:

>>> # Basic usage
>>> prices = dataset.fetch('buses_t.marginal_price')
>>>
>>> # With custom configuration
>>> prices = dataset.fetch('buses_t.marginal_price', config={'use_database': False})
Source code in submodules/mesqual/mesqual/datasets/dataset.py
@flag_must_be_accepted
def fetch(self, flag: FlagType, config: dict | DatasetConfigType = None, **kwargs) -> pd.Series | pd.DataFrame:
    """
    Fetch data associated with a specific flag.

    This is the primary method for data access in MESQUAL datasets. It provides
    a unified interface for retrieving data regardless of the underlying source
    or dataset type. The method includes automatic caching, post-processing,
    and configuration management.

    Args:
        flag: Data identifier flag (must be in accepted_flags)
        config: Optional configuration to override dataset defaults.
               Can be a dict or DatasetConfig instance.
        **kwargs: Additional keyword arguments passed to the underlying
                 data fetching implementation

    Returns:
        DataFrame or Series containing the requested data

    Raises:
        ValueError: If the flag is not accepted by this dataset

    Examples:

        >>> # Basic usage
        >>> prices = dataset.fetch('buses_t.marginal_price')
        >>>
        >>> # With custom configuration
        >>> prices = dataset.fetch('buses_t.marginal_price', config={'use_database': False})
    """
    effective_config = self._prepare_config(config)
    use_database = self._database is not None and effective_config.use_database

    if use_database:
        if self._database.key_is_up_to_date(self, flag, config=effective_config, **kwargs):
            return self._database.get(self, flag, config=effective_config, **kwargs)

    raw_data = self._fetch(flag, effective_config, **kwargs)
    processed_data = self._post_process_data(raw_data, flag, effective_config)

    if use_database:
        self._database.set(self, flag, config=effective_config, value=processed_data, **kwargs)

    return processed_data.copy()
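
The related helpers fetch_multiple_flags_and_concat and fetch_filter_groupby_agg (shown in the class source further above) build on fetch. A hedged sketch of how they might be called; the flag names and column values are illustrative, and the query string assumes pandas-style DataFrame.query syntax:

>>> df = dataset.fetch_multiple_flags_and_concat(
...     ['generators_t.p', 'loads_t.p_set'],
...     concat_level_name='variable',             # extra column level naming the source flag
... )
>>> per_carrier = dataset.fetch_filter_groupby_agg(
...     'generators_t.p',
...     model_filter_query='carrier != "load"',   # filter rows via the linked model flag
...     prop_groupby='carrier',
...     prop_groupby_agg='sum',
... )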

flag_must_be_accepted

flag_must_be_accepted(method)

Decorator that validates flag acceptance before method execution.

Ensures that only accepted flags are processed by dataset methods, providing clear error messages for invalid flag usage.

Parameters:
  • method: The method to decorate (required)

Returns:
  • Decorated method that validates flag acceptance

Raises:
  • ValueError: If the flag is not accepted by the dataset

Source code in submodules/mesqual/mesqual/datasets/dataset.py
def flag_must_be_accepted(method):
    """
    Decorator that validates flag acceptance before method execution.

    Ensures that only accepted flags are processed by dataset methods,
    providing clear error messages for invalid flag usage.

    Args:
        method: The method to decorate

    Returns:
        Decorated method that validates flag acceptance

    Raises:
        ValueError: If the flag is not accepted by the dataset
    """
    def raise_if_flag_not_accepted(self: Dataset, flag: FlagType, config: DatasetConfigType = None, **kwargs):
        if not self.flag_is_accepted(flag):
            raise ValueError(f'Flag {flag} not accepted by Dataset "{self.name}" of type {type(self)}.')
        return method(self, flag, config, **kwargs)
    return raise_if_flag_not_accepted
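
Because fetch is wrapped with this decorator, an unknown flag fails fast with the error message built in the source above. A sketch of the resulting behavior (dataset name and type are placeholders):

>>> dataset.fetch('not_a_flag')
Traceback (most recent call last):
    ...
ValueError: Flag not_a_flag not accepted by Dataset "..." of type <class '...'>.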

dataset_collection

DatasetCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], Dataset[DatasetConfigType, FlagType, FlagIndexType], ABC

Abstract base class for collections of datasets.

DatasetCollection extends the Dataset interface to handle multiple child datasets while maintaining the same unified API. This enables complex hierarchical structures where collections themselves can be treated as datasets.

Key Features
  • Inherits all Dataset functionality
  • Manages collections of child datasets
  • Provides iteration and access methods
  • Aggregates accepted flags from all children
  • Supports KPI operations across all sub-datasets

Class Type Parameters:
  • DatasetType: Type of datasets that can be collected (required)
  • DatasetConfigType: Configuration class for dataset behavior (required)
  • FlagType: Type used for data flag identification (required)
  • FlagIndexType: Flag index implementation for flag mapping (required)

Attributes:
  • datasets (list[DatasetType]): List of child datasets in this collection

Note

This class follows the "Everything is a Dataset" principle, allowing collections to be used anywhere a Dataset is expected.

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    Dataset[DatasetConfigType, FlagType, FlagIndexType],
    ABC
):
    """
    Abstract base class for collections of datasets.

    DatasetCollection extends the Dataset interface to handle multiple child datasets
    while maintaining the same unified API. This enables complex hierarchical structures
    where collections themselves can be treated as datasets.

    Key Features:
        - Inherits all Dataset functionality
        - Manages collections of child datasets
        - Provides iteration and access methods
        - Aggregates accepted flags from all children
        - Supports KPI operations across all sub-datasets

    Type Parameters:
        DatasetType: Type of datasets that can be collected
        DatasetConfigType: Configuration class for dataset behavior
        FlagType: Type used for data flag identification
        FlagIndexType: Flag index implementation for flag mapping

    Attributes:
        datasets (list[DatasetType]): List of child datasets in this collection

    Note:
        This class follows the "Everything is a Dataset" principle, allowing
        collections to be used anywhere a Dataset is expected.
    """

    def __init__(
            self,
            datasets: list[DatasetType] = None,
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None
    ):
        super().__init__(
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self.datasets: list[DatasetType] = datasets if datasets else []

    @property
    def dataset_iterator(self) -> Iterator[DatasetType]:
        for ds in self.datasets:
            yield ds

    @property
    def flag_index(self) -> FlagIndex:
        from mesqual.flag.flag_index import EmptyFlagIndex
        if (self._flag_index is None) or isinstance(self._flag_index, EmptyFlagIndex):
            from mesqual.utils.check_all_same import all_same_object
            if all_same_object(ds.flag_index for ds in self.datasets) and len(self.datasets):
                return self.get_dataset().flag_index
        return self._flag_index

    @property
    def attributes(self) -> dict:
        child_dataset_atts = [ds.attributes for ds in self.datasets]
        attributes_that_all_childs_have_in_common = get_intersection_of_dicts(child_dataset_atts)
        return {**attributes_that_all_childs_have_in_common, **self._attributes.copy()}

    def get_merged_kpi_collection(self, deep: bool = True) -> 'KPICollection':
        from mesqual.kpis.kpi_collection import KPICollection
        all_kpis = set()
        for ds in self.datasets:
            for kpi in ds.kpi_collection:
                all_kpis.add(kpi)
            if deep and isinstance(ds, DatasetCollection):
                for kpi in ds.get_merged_kpi_collection(deep=deep):
                    all_kpis.add(kpi)

        return KPICollection(all_kpis)

    def add_kpis_to_all_sub_datasets(self, kpis: Iterable[KPIFactory]):
        for kpi in kpis:
            self.add_kpi_to_all_sub_datasets(kpi)

    def add_kpi_to_all_sub_datasets(self, kpi: KPIFactory):
        for ds in self.datasets:
            ds.add_kpi(kpi)

    def clear_kpi_collection_for_all_sub_datasets(self, deep: bool = True):
        for ds in self.datasets:
            ds.clear_kpi_collection()
            if deep and isinstance(ds, DatasetCollection):
                ds.clear_kpi_collection_for_all_sub_datasets(deep=deep)

    @abstractmethod
    def _fetch(
            self,
            flag: FlagType,
            effective_config: DatasetConfigType,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        pass

    def flag_is_accepted(self, flag: FlagType) -> bool:
        return any(ds.flag_is_accepted(flag) for ds in self.datasets)

    @property
    def accepted_flags(self) -> set[FlagType]:
        return nested_union([ds.accepted_flags for ds in self.datasets])

    def _required_flags_for_flag(self, flag: FlagType) -> set[FlagType]:
        return nested_union([ds.accepted_flags for ds in self.datasets])

    def get_dataset(self, key: str = None) -> DatasetType:
        if key is None:
            if not self.datasets:
                raise ValueError("No datasets available")
            return self.datasets[0]

        for ds in self.datasets:
            if ds.name == key:
                return ds

        raise KeyError(f"Dataset with name '{key}' not found")

    def add_datasets(self, datasets: Iterable[DatasetType]):
        for ds in datasets:
            self.add_dataset(ds)

    def add_dataset(self, dataset: DatasetType):
        if not isinstance(dataset, self.get_child_dataset_type()):
            raise TypeError(f"Can only add data sets of type {self.get_child_dataset_type().__name__}.")

        for i, existing in enumerate(self.datasets):
            if existing.name == dataset.name:
                logger.warning(
                    f"Dataset {self.name}: "
                    f"dataset {dataset.name} already in this collection. Replacing it."
                )
                self.datasets[i] = dataset
                return

        self.datasets.append(dataset)

    @classmethod
    def get_child_dataset_type(cls) -> type[DatasetType]:
        return Dataset

    def fetch_merged(
            self,
            flag: FlagType,
            config: dict | DatasetConfigType = None,
            keep_first: bool = True,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        """Fetch method that merges dataframes from all child datasets, similar to DatasetMergeCollection."""
        temp_merge_collection = self.get_merged_dataset_collection(keep_first)
        return temp_merge_collection.fetch(flag, config, **kwargs)

    def get_merged_dataset_collection(self, keep_first: bool = True) -> 'DatasetMergeCollection':
        return DatasetMergeCollection(
            datasets=self.datasets,
            name=f"{self.name} merged",
            keep_first=keep_first
        )
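
Child datasets can be retrieved by name via get_dataset. A hedged sketch; DatasetConcatCollection is used here because DatasetCollection itself is abstract, and the dataset names are illustrative:

>>> collection = DatasetConcatCollection([base_ds, variation_ds])  # base_ds.name == 'base', variation_ds.name == 'variation'
>>> collection.get_dataset('variation') is variation_ds
True
>>> collection.get_dataset() is base_ds        # no key: returns the first child dataset
True
>>> collection.get_dataset('missing')
Traceback (most recent call last):
    ...
KeyError: "Dataset with name 'missing' not found"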

fetch_merged

fetch_merged(flag: FlagType, config: dict | DatasetConfigType = None, keep_first: bool = True, **kwargs) -> Series | DataFrame

Fetch method that merges dataframes from all child datasets, similar to DatasetMergeCollection.

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
def fetch_merged(
        self,
        flag: FlagType,
        config: dict | DatasetConfigType = None,
        keep_first: bool = True,
        **kwargs
) -> pd.Series | pd.DataFrame:
    """Fetch method that merges dataframes from all child datasets, similar to DatasetMergeCollection."""
    temp_merge_collection = self.get_merged_dataset_collection(keep_first)
    return temp_merge_collection.fetch(flag, config, **kwargs)

DatasetLinkCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

Links multiple datasets to provide unified data access with automatic routing.

DatasetLinkCollection acts as a unified interface to multiple child datasets, automatically routing data requests to the appropriate child dataset that accepts the requested flag. This is the foundation for platform datasets that combine multiple data interpreters.

Key Features
  • Automatic flag routing to appropriate child dataset
  • Bidirectional parent-child relationships
  • First-match-wins routing strategy
  • Overlap detection and warnings
  • Maintains all Dataset interface compatibility
Routing Logic

When fetch() is called, iterates through child datasets in order and returns data from the first dataset that accepts the flag.

Example:

>>> # Platform dataset with multiple interpreters
>>> link_collection = DatasetLinkCollection([
...     ModelInterpreter(network),
...     TimeSeriesInterpreter(network),
...     ObjectiveInterpreter(network)
... ])
>>> # Automatically routes to appropriate interpreter
>>> buses = link_collection.fetch('buses')  # -> ModelInterpreter
>>> prices = link_collection.fetch('buses_t.marginal_price')  # -> TimeSeriesInterpreter
Warning

If multiple child datasets accept the same flag, only the first one will be used. The constructor logs warnings for such overlaps.

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetLinkCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Links multiple datasets to provide unified data access with automatic routing.

    DatasetLinkCollection acts as a unified interface to multiple child datasets,
    automatically routing data requests to the appropriate child dataset that 
    accepts the requested flag. This is the foundation for platform datasets
    that combine multiple data interpreters.

    Key Features:
        - Automatic flag routing to appropriate child dataset
        - Bidirectional parent-child relationships
        - First-match-wins routing strategy
        - Overlap detection and warnings
        - Maintains all Dataset interface compatibility

    Routing Logic:
        When fetch() is called, iterates through child datasets in order and
        returns data from the first dataset that accepts the flag.

    Example:

        >>> # Platform dataset with multiple interpreters
        >>> link_collection = DatasetLinkCollection([
        ...     ModelInterpreter(network),
        ...     TimeSeriesInterpreter(network),
        ...     ObjectiveInterpreter(network)
        ... ])
        >>> # Automatically routes to appropriate interpreter
        >>> buses = link_collection.fetch('buses')  # -> ModelInterpreter
        >>> prices = link_collection.fetch('buses_t.marginal_price')  # -> TimeSeriesInterpreter

    Warning:
        If multiple child datasets accept the same flag, only the first one
        will be used. The constructor logs warnings for such overlaps.
    """

    def __init__(
            self,
            datasets: list[DatasetType],
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
    ):
        super().__init__(
            datasets=datasets,
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self._warn_if_flags_overlap()

    def _fetch(self, flag: FlagType, effective_config: DatasetConfigType, **kwargs) -> pd.Series | pd.DataFrame:
        for ds in self.datasets:
            if ds.flag_is_accepted(flag):
                return ds.fetch(flag, effective_config, **kwargs)
        raise KeyError(f"Key '{flag}' not recognized by any of the linked Datasets.")

    def _warn_if_flags_overlap(self):
        from collections import Counter

        accepted_flags = list()
        for ds in self.datasets:
            accepted_flags += list(ds.accepted_flags)

        counts = Counter(accepted_flags)
        duplicates = {k: v for k, v in counts.items() if v > 1}
        if any(duplicates.values()):
            logger.warning(
                f"Dataset {self.name}: "
                f"The following keys have multiple Dataset sources: {duplicates.keys()}. \n"
                f"Only the first one will be used! This might lead to unexpected behavior. \n"
                f"A potential reason could be the use of an inappropriate DatasetCollection Type."
            )

    def get_dataset_by_type(self, ds_type: type[Dataset]) -> DatasetType:
        """Returns instance of child dataset that matches the ds_type."""
        for ds in self.datasets:
            if isinstance(ds, ds_type):
                return ds
        raise KeyError(f'No Dataset of type {ds_type.__name__} found in {self.name}.')

get_dataset_by_type

get_dataset_by_type(ds_type: type[Dataset]) -> DatasetType

Returns instance of child dataset that matches the ds_type.

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
def get_dataset_by_type(self, ds_type: type[Dataset]) -> DatasetType:
    """Returns instance of child dataset that matches the ds_type."""
    for ds in self.datasets:
        if isinstance(ds, ds_type):
            return ds
    raise KeyError(f'No Dataset of type {ds_type.__name__} found in {self.name}.')
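
Reusing the interpreters from the class example above, a hedged sketch:

>>> model_interpreter = link_collection.get_dataset_by_type(ModelInterpreter)
>>> type(model_interpreter).__name__
'ModelInterpreter'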

DatasetMergeCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

The fetch method merges fragmented Datasets for the same flag, e.g.:
  • fragmented simulation runs, e.g. CW1, CW2, CW3, CWn
  • fragmented data sources, e.g. a mapping from an Excel file with the model from a simulation platform

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetMergeCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Fetch method will merge fragmented Datasets for same flag, e.g.:
        - fragmented simulation runs, e.g. CW1, CW2, CW3, CWn.
        - fragmented data sources, e.g. mapping from Excel file with model from simulation platform.
    """
    def __init__(
            self,
            datasets: list[DatasetType],
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
            keep_first: bool = True,
    ):
        super().__init__(
            datasets=datasets,
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self.keep_first = keep_first

    def _fetch(self, flag: FlagType, effective_config: DatasetConfigType, **kwargs) -> pd.Series | pd.DataFrame:
        data_frames = []
        for ds in self.datasets:
            if ds.flag_is_accepted(flag):
                data_frames.append(ds.fetch(flag, effective_config, **kwargs))

        if not data_frames:
            raise KeyError(f"Flag '{flag}' not recognized by any of the datasets.")

        from mesqual.utils.pandas_utils.combine_df import combine_dfs
        df = combine_dfs(data_frames, keep_first=self.keep_first)
        return df
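
A hedged usage sketch for merging fragmented runs; the dataset variables and the flag name are illustrative, and keep_first is assumed to keep values from the earlier dataset where indices overlap:

>>> weekly_runs = DatasetMergeCollection(
...     [run_cw1, run_cw2, run_cw3],     # fragmented simulation runs, as described above
...     name='all_calendar_weeks',
...     keep_first=True,
... )
>>> prices = weekly_runs.fetch('buses_t.marginal_price')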

DatasetConcatCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

Concatenates data from multiple datasets with MultiIndex structure.

DatasetConcatCollection is fundamental to MESQUAL's multi-scenario analysis capabilities. It fetches the same flag from multiple child datasets and concatenates the results into a single DataFrame/Series with an additional index level identifying the source dataset.

Key Features
  • Automatic MultiIndex creation with dataset names
  • Configurable concatenation axis and level positioning
  • Preserves all dimensional relationships
  • Supports scenario and comparison collections
  • Enables unified analysis across multiple datasets
MultiIndex Structure

The resulting data structure includes an additional index level (typically named 'dataset') that identifies the source dataset for each data point.

Example:

>>> # Collection of scenario datasets
>>> scenarios = DatasetConcatCollection([
...     PyPSADataset(base_network, name='base'),
...     PyPSADataset(high_res_network, name='high_res'),
...     PyPSADataset(low_gas_network, name='low_gas')
... ])
>>> 
>>> # Fetch creates MultiIndex DataFrame
>>> prices = scenarios.fetch('buses_t.marginal_price')
>>> print(prices.columns.names)
    ['dataset', 'Bus']  # Original Bus index + dataset level
>>> 
>>> # Access specific scenario data
>>> base_prices = prices['base']
>>> 
>>> # Analyze across scenarios
>>> mean_prices = prices.mean()  # Mean across all scenarios
Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetConcatCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Concatenates data from multiple datasets with MultiIndex structure.

    DatasetConcatCollection is fundamental to MESQUAL's multi-scenario analysis
    capabilities. It fetches the same flag from multiple child datasets and
    concatenates the results into a single DataFrame/Series with an additional
    index level identifying the source dataset.

    Key Features:
        - Automatic MultiIndex creation with dataset names
        - Configurable concatenation axis and level positioning  
        - Preserves all dimensional relationships
        - Supports scenario and comparison collections
        - Enables unified analysis across multiple datasets

    MultiIndex Structure:
        The resulting data structure includes an additional index level
        (typically named 'dataset') that identifies the source dataset
        for each data point.

    Example:

        >>> # Collection of scenario datasets
        >>> scenarios = DatasetConcatCollection([
        ...     PyPSADataset(base_network, name='base'),
        ...     PyPSADataset(high_res_network, name='high_res'),
        ...     PyPSADataset(low_gas_network, name='low_gas')
        ... ])
        >>> 
        >>> # Fetch creates MultiIndex DataFrame
        >>> prices = scenarios.fetch('buses_t.marginal_price')
        >>> print(prices.columns.names)
            ['dataset', 'Bus']  # Original Bus index + dataset level
        >>> 
        >>> # Access specific scenario data
        >>> base_prices = prices['base']
        >>> 
        >>> # Analyze across scenarios
        >>> mean_prices = prices.mean()  # Mean across all scenarios
    """
    DEFAULT_CONCAT_LEVEL_NAME = 'dataset'
    DEFAULT_ATT_LEVEL_NAME = 'attribute'

    def __init__(
            self,
            datasets: list[DatasetType],
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
            default_concat_axis: int = 1,
            concat_top: bool = True,
            concat_level_name: str = None,
    ):
        super().__init__(
            datasets=datasets,
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self.default_concat_axis = default_concat_axis
        self.concat_top = concat_top
        self.concat_level_name = concat_level_name or self.DEFAULT_CONCAT_LEVEL_NAME

    def get_attributes_concat_df(self) -> pd.DataFrame:
        if all(isinstance(ds, DatasetConcatCollection) for ds in self.datasets):
            use_att_df_instead_of_series = True
        else:
            use_att_df_instead_of_series = False

        atts_per_dataset = dict()
        for ds in self.datasets:
            atts = ds.get_attributes_concat_df().T if use_att_df_instead_of_series else ds.get_attributes_series()
            atts_per_dataset[ds.name] = atts

        return pd.concat(
            atts_per_dataset,
            axis=1,
            names=[self.concat_level_name]
        ).rename_axis(self.DEFAULT_ATT_LEVEL_NAME).T

    def _fetch(
            self,
            flag: FlagType,
            effective_config: DatasetConfigType,
            concat_axis: int = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        if concat_axis is None:
            concat_axis = self.default_concat_axis

        dfs = {}
        for ds in self.datasets:
            if ds.flag_is_accepted(flag):
                dfs[ds.name] = ds.fetch(flag, effective_config, **kwargs)

        if not dfs:
            raise KeyError(f"Flag '{flag}' not recognized by any of the datasets in {type(self)} {self.name}.")

        df0 = list(dfs.values())[0]
        if not all(len(df.axes) == len(df0.axes) for df in dfs.values()):
            raise NotImplementedError(f'Axes lengths do not match between dfs.')

        for ax in range(len(df0.axes)):
            if not all(set(df.axes[ax].names) == set(df0.axes[ax].names) for df in dfs.values()):
                raise NotImplementedError(f'Axes names do not match between dfs.')

        df = pd.concat(dfs, join='outer', axis=concat_axis, names=[self.concat_level_name])

        if not self.concat_top:
            ax = df.axes[concat_axis]
            df.axes[concat_axis] = ax.reorder_levels([ax.nlevels - 1] + list(range(ax.nlevels - 1)))

        return df

dataset_comparison

DatasetComparison

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

Computes and provides access to differences between two datasets.

DatasetComparison is a core component of MESQUAL's scenario comparison capabilities. It automatically calculates deltas, ratios, or side-by-side comparisons between a variation dataset and a reference dataset, enabling systematic analysis of scenario differences.

Key Features
  • Automatic delta computation between datasets
  • Multiple comparison types (DELTA, VARIATION, BOTH)
  • Handles numeric and non-numeric data appropriately
  • Preserves data structure and index relationships
  • Configurable unchanged value handling
  • Inherits full Dataset interface
Comparison Types
  • DELTA: Variation - Reference (default)
  • VARIATION: Returns variation data with optional NaN for unchanged values
  • BOTH: Side-by-side variation and reference data

Attributes:
  • variation_dataset: The dataset representing the scenario being compared
  • reference_dataset: The dataset representing the baseline for comparison

Example:

>>> # Compare high renewable scenario to base case
>>> comparison = DatasetComparison(
...     variation_dataset=high_res_dataset,
...     reference_dataset=base_dataset
... )
>>> 
>>> # Get price differences
>>> price_deltas = comparison.fetch('buses_t.marginal_price')
>>> 
>>> # Get both datasets side-by-side (often used to show model changes)
>>> price_both = comparison.fetch('buses', comparison_type=ComparisonTypeEnum.BOTH)
>>> 
>>> # Highlight only changes (often used to show model changes)
>>> price_changes = comparison.fetch('buses', replace_unchanged_values_by_nan=True)
Source code in submodules/mesqual/mesqual/datasets/dataset_comparison.py
class DatasetComparison(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Computes and provides access to differences between two datasets.

    DatasetComparison is a core component of MESQUAL's scenario comparison capabilities.
    It automatically calculates deltas, ratios, or side-by-side comparisons between
    a variation dataset and a reference dataset, enabling systematic analysis of
    scenario differences.

    Key Features:
        - Automatic delta computation between datasets
        - Multiple comparison types (DELTA, VARIATION, BOTH)
        - Handles numeric and non-numeric data appropriately
        - Preserves data structure and index relationships
        - Configurable unchanged value handling
        - Inherits full Dataset interface

    Comparison Types:
        - DELTA: Variation - Reference (default)
        - VARIATION: Returns variation data with optional NaN for unchanged values
        - BOTH: Side-by-side variation and reference data

    Attributes:
        variation_dataset: The dataset representing the scenario being compared
        reference_dataset: The dataset representing the baseline for comparison

    Example:

        >>> # Compare high renewable scenario to base case
        >>> comparison = DatasetComparison(
        ...     variation_dataset=high_res_dataset,
        ...     reference_dataset=base_dataset
        ... )
        >>> 
        >>> # Get price differences
        >>> price_deltas = comparison.fetch('buses_t.marginal_price')
        >>> 
        >>> # Get both datasets side-by-side (often used to show model changes)
        >>> price_both = comparison.fetch('buses', comparison_type=ComparisonTypeEnum.BOTH)
        >>> 
        >>> # Highlight only changes (often used to show model changes)
        >>> price_changes = comparison.fetch('buses', replace_unchanged_values_by_nan=True)
    """
    COMPARISON_ATTRIBUTES_SOURCE = ComparisonAttributesSourceEnum.USE_VARIATION_ATTS
    COMPARISON_NAME_JOIN = ' vs '
    VARIATION_DS_ATT_KEY = 'variation_dataset'
    REFERENCE_DS_ATT_KEY = 'reference_dataset'

    def __init__(
            self,
            variation_dataset: Dataset,
            reference_dataset: Dataset,
            name: str = None,
            attributes: dict = None,
            config: DatasetConfigType = None,
    ):
        name = name or self._get_auto_generated_name(variation_dataset, reference_dataset)

        super().__init__(
            [reference_dataset, variation_dataset],
            name=name,
            attributes=attributes,
            config=config
        )

        self.variation_dataset = variation_dataset
        self.reference_dataset = reference_dataset

    def _get_auto_generated_name(self, variation_dataset: Dataset, reference_dataset: Dataset) -> str:
        return variation_dataset.name + self.COMPARISON_NAME_JOIN + reference_dataset.name

    @property
    def attributes(self) -> dict:
        match self.COMPARISON_ATTRIBUTES_SOURCE:
            case ComparisonAttributesSourceEnum.USE_VARIATION_ATTS:
                atts = self.variation_dataset.attributes.copy()
            case ComparisonAttributesSourceEnum.USE_REFERENCE_ATTS:
                atts = self.reference_dataset.attributes.copy()
            case _:
                atts = super().attributes
        atts[self.VARIATION_DS_ATT_KEY] = self.variation_dataset.name
        atts[self.REFERENCE_DS_ATT_KEY] = self.reference_dataset.name
        return atts

    def fetch(
            self,
            flag: FlagType,
            config: dict | DatasetConfigType = None,
            comparison_type: ComparisonTypeEnum = ComparisonTypeEnum.DELTA,
            replace_unchanged_values_by_nan: bool = False,
            fill_value: float | int | None = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        """
        Fetch comparison data between variation and reference datasets.

        Extends the base Dataset.fetch() method with comparison-specific parameters
        for controlling how the comparison is computed and formatted.

        Args:
            flag: Data identifier flag to fetch from both datasets
            config: Optional configuration overrides
            comparison_type: How to compare the datasets:
                - DELTA: variation - reference (default)
                - VARIATION: variation data only, optionally with NaN for unchanged
                - BOTH: concatenated variation and reference data
            replace_unchanged_values_by_nan: If True, replaces values that are
                identical between datasets with NaN (useful for highlighting changes)
            fill_value: Value to use for missing data in subtraction operations
            **kwargs: Additional arguments passed to child dataset fetch methods

        Returns:
            DataFrame or Series with comparison results

        Example:

            >>> # Basic delta comparison
            >>> deltas = comparison.fetch('buses_t.marginal_price')
            >>> 
            >>> # Highlight only changed values
            >>> changes_only = comparison.fetch(
            ...     'buses_t.marginal_price',
            ...     replace_unchanged_values_by_nan=True
            ... )
            >>> 
            >>> # Side-by-side comparison
            >>> both = comparison.fetch(
            ...     'buses_t.marginal_price',
            ...     comparison_type=ComparisonTypeEnum.BOTH
            ... )
        """
        return super().fetch(
            flag=flag,
            config=config,
            comparison_type=comparison_type,
            replace_unchanged_values_by_nan=replace_unchanged_values_by_nan,
            fill_value=fill_value,
            **kwargs
        )

    def _fetch(
            self,
            flag: FlagType,
            effective_config: DatasetConfigType,
            comparison_type: ComparisonTypeEnum = ComparisonTypeEnum.DELTA,
            replace_unchanged_values_by_nan: bool = False,
            fill_value: float | int | None = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        df_var = self.variation_dataset.fetch(flag, effective_config, **kwargs)
        df_ref = self.reference_dataset.fetch(flag, effective_config, **kwargs)

        match comparison_type:
            case ComparisonTypeEnum.VARIATION:
                return self._get_variation_comparison(df_var, df_ref, replace_unchanged_values_by_nan)
            case ComparisonTypeEnum.BOTH:
                return self._get_both_comparison(df_var, df_ref, replace_unchanged_values_by_nan)
            case ComparisonTypeEnum.DELTA:
                return self._get_delta_comparison(df_var, df_ref, replace_unchanged_values_by_nan, fill_value)
        raise ValueError(f"Unsupported comparison_type: {comparison_type}")

    def _values_are_equal(self, val1, val2) -> bool:
        if pd.isna(val1) and pd.isna(val2):
            return True
        try:
            return val1 == val2
        except Exception:
            pass
        try:
            # Fall back to comparing string representations
            if str(val1) == str(val2):
                return True
        except Exception:
            pass
        return False

    def _get_variation_comparison(
            self,
            df_var: pd.DataFrame,
            df_ref: pd.DataFrame,
            replace_unchanged_values_by_nan: bool
    ) -> pd.DataFrame:
        result = df_var.copy()

        if replace_unchanged_values_by_nan:
            common_indices = df_var.index.intersection(df_ref.index)
            common_columns = df_var.columns.intersection(df_ref.columns)

            for idx in common_indices:
                for col in common_columns:
                    if self._values_are_equal(df_var.loc[idx, col], df_ref.loc[idx, col]):
                        result.loc[idx, col] = float('nan')

        return result

    def _get_both_comparison(
            self,
            df_var: pd.DataFrame,
            df_ref: pd.DataFrame,
            replace_unchanged_values_by_nan: bool
    ) -> pd.DataFrame:
        var_name = self.variation_dataset.name
        ref_name = self.reference_dataset.name

        result = pd.concat([df_var, df_ref], keys=[var_name, ref_name])
        result = result.sort_index(level=1)

        if replace_unchanged_values_by_nan:
            common_indices = df_var.index.intersection(df_ref.index)
            common_columns = df_var.columns.intersection(df_ref.columns)

            for idx in common_indices:
                for col in common_columns:
                    if self._values_are_equal(df_var.loc[idx, col], df_ref.loc[idx, col]):
                        result.loc[(var_name, idx), col] = float('nan')
                        result.loc[(ref_name, idx), col] = float('nan')

        return result

    def _get_delta_comparison(
            self,
            df_var: pd.DataFrame,
            df_ref: pd.DataFrame,
            replace_unchanged_values_by_nan: bool,
            fill_value: float | int | None
    ) -> pd.DataFrame:
        if pd_is_numeric(df_var) and pd_is_numeric(df_ref):
            result = df_var.subtract(df_ref, fill_value=fill_value)

            if replace_unchanged_values_by_nan:
                result = result.replace(0, float('nan'))

            return result

        all_columns = df_var.columns.union(df_ref.columns)
        all_indices = df_var.index.union(df_ref.index)

        result = pd.DataFrame(index=all_indices, columns=all_columns)

        for col in all_columns:
            if col in df_var.columns and col in df_ref.columns:
                var_col = df_var[col]
                ref_col = df_ref[col]

                # Special handling for boolean columns
                if pd.api.types.is_bool_dtype(var_col) and pd.api.types.is_bool_dtype(ref_col):
                    # For booleans, we can mark where they differ
                    common_indices = var_col.index.intersection(ref_col.index)
                    delta = pd.Series(index=all_indices)

                    for idx in common_indices:
                        if var_col.loc[idx] != ref_col.loc[idx]:
                            delta.loc[idx] = f"{var_col.loc[idx]} (was {ref_col.loc[idx]})"
                        elif not replace_unchanged_values_by_nan:
                            delta.loc[idx] = var_col.loc[idx]

                    # Handle indices only in variation
                    for idx in var_col.index.difference(ref_col.index):
                        delta.loc[idx] = f"{var_col.loc[idx]} (new)"

                    # Handle indices only in reference
                    for idx in ref_col.index.difference(var_col.index):
                        delta.loc[idx] = f"DELETED: {ref_col.loc[idx]}"

                    result[col] = delta

                elif pd.api.types.is_numeric_dtype(var_col) and pd.api.types.is_numeric_dtype(ref_col):
                    delta = var_col.subtract(ref_col, fill_value=fill_value)
                    result[col] = delta

                    if replace_unchanged_values_by_nan:
                        result.loc[delta == 0, col] = float('nan')
                else:
                    common_indices = var_col.index.intersection(ref_col.index)
                    var_only_indices = var_col.index.difference(ref_col.index)
                    ref_only_indices = ref_col.index.difference(var_col.index)

                    for idx in common_indices:
                        if not self._values_are_equal(var_col.loc[idx], ref_col.loc[idx]):
                            result.loc[idx, col] = f"{var_col.loc[idx]} (was {ref_col.loc[idx]})"
                        elif not replace_unchanged_values_by_nan:
                            result.loc[idx, col] = var_col.loc[idx]

                    for idx in var_only_indices:
                        result.loc[idx, col] = f"{var_col.loc[idx]} (new)"

                    for idx in ref_only_indices:
                        val = ref_col.loc[idx]
                        if not pd.isna(val):
                            result.loc[idx, col] = f"DELETED: {val}"

            elif col in df_var.columns:
                for idx in df_var.index:
                    result.loc[idx, col] = f"{df_var.loc[idx, col]} (new column)"

            else:  # Column only in reference
                for idx in df_ref.index:
                    val = df_ref.loc[idx, col]
                    if not pd.isna(val):
                        result.loc[idx, col] = f"REMOVED: {val}"

        return result
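
For non-numeric columns, the delta path above produces string markers instead of arithmetic differences. The following simplified, standalone re-implementation of that marker logic (hypothetical generator data, not the actual method) shows the values to expect:

import pandas as pd

# Hypothetical text column of a static table, e.g. a 'generators' carrier column
var_col = pd.Series({'gen1': 'wind', 'gen2': 'gas', 'gen3': 'solar'})
ref_col = pd.Series({'gen1': 'wind', 'gen2': 'coal', 'gen4': 'oil'})

delta = pd.Series(index=var_col.index.union(ref_col.index), dtype=object)
for idx in var_col.index.intersection(ref_col.index):
    if var_col[idx] != ref_col[idx]:
        delta[idx] = f"{var_col[idx]} (was {ref_col[idx]})"   # changed -> 'gas (was coal)'
    else:
        delta[idx] = var_col[idx]                             # unchanged -> 'wind'
for idx in var_col.index.difference(ref_col.index):
    delta[idx] = f"{var_col[idx]} (new)"                      # only in variation -> 'solar (new)'
for idx in ref_col.index.difference(var_col.index):
    delta[idx] = f"DELETED: {ref_col[idx]}"                   # only in reference -> 'DELETED: oil'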

fetch

fetch(flag: FlagType, config: dict | DatasetConfigType = None, comparison_type: ComparisonTypeEnum = DELTA, replace_unchanged_values_by_nan: bool = False, fill_value: float | int | None = None, **kwargs) -> Series | DataFrame

Fetch comparison data between variation and reference datasets.

Extends the base Dataset.fetch() method with comparison-specific parameters for controlling how the comparison is computed and formatted.

Parameters:

  flag (FlagType): Data identifier flag to fetch from both datasets. Required.
  config (dict | DatasetConfigType): Optional configuration overrides. Default: None.
  comparison_type (ComparisonTypeEnum): How to compare the datasets. DELTA: variation - reference (default); VARIATION: variation data only, optionally with NaN for unchanged; BOTH: concatenated variation and reference data. Default: DELTA.
  replace_unchanged_values_by_nan (bool): If True, replaces values that are identical between datasets with NaN (useful for highlighting changes). Default: False.
  fill_value (float | int | None): Value to use for missing data in subtraction operations. Default: None.
  **kwargs: Additional arguments passed to child dataset fetch methods. Default: {}.

Returns:

  Series | DataFrame: DataFrame or Series with comparison results

Example:

>>> # Basic delta comparison
>>> deltas = comparison.fetch('buses_t.marginal_price')
>>> 
>>> # Highlight only changed values
>>> changes_only = comparison.fetch(
...     'buses_t.marginal_price',
...     replace_unchanged_values_by_nan=True
... )
>>> 
>>> # Side-by-side comparison
>>> both = comparison.fetch(
...     'buses_t.marginal_price',
...     comparison_type=ComparisonTypeEnum.BOTH
... )
Source code in submodules/mesqual/mesqual/datasets/dataset_comparison.py
def fetch(
        self,
        flag: FlagType,
        config: dict | DatasetConfigType = None,
        comparison_type: ComparisonTypeEnum = ComparisonTypeEnum.DELTA,
        replace_unchanged_values_by_nan: bool = False,
        fill_value: float | int | None = None,
        **kwargs
) -> pd.Series | pd.DataFrame:
    """
    Fetch comparison data between variation and reference datasets.

    Extends the base Dataset.fetch() method with comparison-specific parameters
    for controlling how the comparison is computed and formatted.

    Args:
        flag: Data identifier flag to fetch from both datasets
        config: Optional configuration overrides
        comparison_type: How to compare the datasets:
            - DELTA: variation - reference (default)
            - VARIATION: variation data only, optionally with NaN for unchanged
            - BOTH: concatenated variation and reference data
        replace_unchanged_values_by_nan: If True, replaces values that are
            identical between datasets with NaN (useful for highlighting changes)
        fill_value: Value to use for missing data in subtraction operations
        **kwargs: Additional arguments passed to child dataset fetch methods

    Returns:
        DataFrame or Series with comparison results

    Example:

        >>> # Basic delta comparison
        >>> deltas = comparison.fetch('buses_t.marginal_price')
        >>> 
        >>> # Highlight only changed values
        >>> changes_only = comparison.fetch(
        ...     'buses_t.marginal_price',
        ...     replace_unchanged_values_by_nan=True
        ... )
        >>> 
        >>> # Side-by-side comparison
        >>> both = comparison.fetch(
        ...     'buses_t.marginal_price',
        ...     comparison_type=ComparisonTypeEnum.BOTH
        ... )
    """
    return super().fetch(
        flag=flag,
        config=config,
        comparison_type=comparison_type,
        replace_unchanged_values_by_nan=replace_unchanged_values_by_nan,
        fill_value=fill_value,
        **kwargs
    )

platform_dataset

PlatformDataset

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetLinkCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType], ABC

Base class for platform-specific datasets with automatic interpreter management.

PlatformDataset provides the foundation for integrating MESQUAL with specific energy modeling platforms (PyPSA, PLEXOS, etc.). It manages a registry of data interpreters and automatically instantiates them to handle different types of platform data.

Key Features
  • Automatic interpreter registration and instantiation
  • Type-safe interpreter management through generics
  • Flexible argument passing to interpreter constructors
  • Support for study-specific interpreter extensions
  • Unified data access through DatasetLinkCollection routing
Architecture
  • Uses DatasetLinkCollection for automatic flag routing
  • Manages interpreter registry at class level
  • Auto-instantiates all registered interpreters on construction
  • Supports inheritance and interpreter registration on subclasses

Class Type Parameters:

  DatasetType: Base type for all interpreters (must be Dataset subclass). Required.
  DatasetConfigType: Configuration class for dataset behavior. Required.
  FlagType: Type used for data flag identification. Required.
  FlagIndexType: Flag index implementation for flag mapping. Required.

Class Attributes

_interpreter_registry: List of registered interpreter classes

Usage Pattern
  1. Create platform dataset class inheriting from PlatformDataset
  2. Define get_child_dataset_type() to specify interpreter base class
  3. Create interpreter classes inheriting from the base interpreter
  4. Register interpreters using @PlatformDataset.register_interpreter
  5. Instantiate platform dataset - interpreters are auto-created

Example:

>>> # Define platform dataset
>>> class PyPSADataset(PlatformDataset[PyPSAInterpreter, ...]):
...     @classmethod
...     def get_child_dataset_type(cls):
...         return PyPSAInterpreter
...
>>> # Register core interpreters
>>> @PyPSADataset.register_interpreter
... class PyPSAModelInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'buses', 'generators', 'lines'}
...
>>> @PyPSADataset.register_interpreter  
... class PyPSATimeSeriesInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'buses_t.marginal_price', 'generators_t.p'}
...
>>> # Register study-specific interpreter
>>> @PyPSADataset.register_interpreter
... class CustomVariableInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'custom_metric'}
...
>>> # Use platform dataset
>>> dataset = PyPSADataset(network=my_network)
>>> buses = dataset.fetch('buses')  # Routes to ModelInterpreter
>>> prices = dataset.fetch('buses_t.marginal_price')  # Routes to TimeSeriesInterpreter
>>> custom = dataset.fetch('custom_metric')  # Routes to CustomVariableInterpreter
Notes
  • Interpreters are registered at class level and shared across instances
  • Registration order affects routing (last registered = first checked)
  • All registered interpreters are instantiated for each platform dataset
  • Constructor arguments are automatically extracted and passed to interpreters
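
The last note above (automatic extraction of constructor arguments) can be illustrated with a small framework-free sketch based on inspect; the class and argument names are hypothetical and this is not the MESQUAL implementation:

import inspect

class BaseInterpreter:
    """Hypothetical interpreter base class whose constructor signature is matched."""
    def __init__(self, network=None, solver_name='glpk'):
        self.network = network
        self.solver_name = solver_name

def prepare_interpreter_args(kwargs: dict) -> dict:
    # Keep only the keyword arguments that appear in the interpreter constructor,
    # falling back to the declared defaults for anything not provided
    sig = inspect.signature(BaseInterpreter.__init__)
    return {
        name: kwargs.get(name, param.default)
        for name, param in sig.parameters.items()
        if name != 'self'
    }

print(prepare_interpreter_args({'network': 'my_network', 'unused': 42}))
# {'network': 'my_network', 'solver_name': 'glpk'}
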
Source code in submodules/mesqual/mesqual/datasets/platform_dataset.py
class PlatformDataset(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetLinkCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    ABC
):
    """
    Base class for platform-specific datasets with automatic interpreter management.

    PlatformDataset provides the foundation for integrating MESQUAL with specific
    energy modeling platforms (PyPSA, PLEXOS, etc.). It manages a registry of
    data interpreters and automatically instantiates them to handle different
    types of platform data.

    Key Features:
        - Automatic interpreter registration and instantiation
        - Type-safe interpreter management through generics
        - Flexible argument passing to interpreter constructors
        - Support for study-specific interpreter extensions
        - Unified data access through DatasetLinkCollection routing

    Architecture:
        - Uses DatasetLinkCollection for automatic flag routing
        - Manages interpreter registry at class level
        - Auto-instantiates all registered interpreters on construction
        - Supports inheritance and interpreter registration on subclasses

    Type Parameters:
        DatasetType: Base type for all interpreters (must be Dataset subclass)
        DatasetConfigType: Configuration class for dataset behavior
        FlagType: Type used for data flag identification
        FlagIndexType: Flag index implementation for flag mapping

    Class Attributes:
        _interpreter_registry: List of registered interpreter classes

    Usage Pattern:
        1. Create platform dataset class inheriting from PlatformDataset
        2. Define get_child_dataset_type() to specify interpreter base class
        3. Create interpreter classes inheriting from the base interpreter
        4. Register interpreters using @PlatformDataset.register_interpreter
        5. Instantiate platform dataset - interpreters are auto-created

    Example:

        >>> # Define platform dataset
        >>> class PyPSADataset(PlatformDataset[PyPSAInterpreter, ...]):
        ...     @classmethod
        ...     def get_child_dataset_type(cls):
        ...         return PyPSAInterpreter
        ...
        >>> # Register core interpreters
        >>> @PyPSADataset.register_interpreter
        ... class PyPSAModelInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'buses', 'generators', 'lines'}
        ...
        >>> @PyPSADataset.register_interpreter  
        ... class PyPSATimeSeriesInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'buses_t.marginal_price', 'generators_t.p'}
        ...
        >>> # Register study-specific interpreter
        >>> @PyPSADataset.register_interpreter
        ... class CustomVariableInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'custom_metric'}
        ...
        >>> # Use platform dataset
        >>> dataset = PyPSADataset(network=my_network)
        >>> buses = dataset.fetch('buses')  # Routes to ModelInterpreter
        >>> prices = dataset.fetch('buses_t.marginal_price')  # Routes to TimeSeriesInterpreter
        >>> custom = dataset.fetch('custom_metric')  # Routes to CustomVariableInterpreter

    Notes:
        - Interpreters are registered at class level and shared across instances
        - Registration order affects routing (last registered = first checked)
        - All registered interpreters are instantiated for each platform dataset
        - Constructor arguments are automatically extracted and passed to interpreters
    """

    _interpreter_registry: list[Type[DatasetType]] = []

    def __init__(
            self,
            name: str = None,
            flag_index: FlagIndexType = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
            **kwargs
    ):
        super().__init__(
            datasets=[],
            name=name,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        interpreter_args = self._prepare_interpreter_initialization_args(kwargs)
        datasets = self._initialize_registered_interpreters(interpreter_args)
        self.add_datasets(datasets)

    @classmethod
    def register_interpreter(cls, interpreter: Type[DatasetType]) -> Type['DatasetType']:
        """
        Register a data interpreter class with this platform dataset.

        This method is typically used as a decorator to register interpreter classes
        that handle specific types of platform data. Registered interpreters are
        automatically instantiated when the platform dataset is created.

        Args:
            interpreter: Interpreter class that must inherit from get_child_dataset_type()

        Returns:
            The interpreter class (unchanged) to support decorator usage

        Raises:
            TypeError: If interpreter doesn't inherit from the required base class

        Example:

            >>> @PyPSADataset.register_interpreter
            ... class CustomInterpreter(PyPSAInterpreter):
            ...     @property
            ...     def accepted_flags(self):
            ...         return {'custom_flag'}
            ...     
            ...     def _fetch(self, flag, config, **kwargs):
            ...         return compute_custom_data()
        """
        cls._validate_interpreter_type(interpreter)
        if interpreter not in cls._interpreter_registry:
            cls._add_interpreter_to_registry(interpreter)
        return interpreter

    @classmethod
    def get_registered_interpreters(cls) -> list[Type[DatasetType]]:
        return cls._interpreter_registry.copy()

    def get_interpreter_instance(self, interpreter_type: Type[DatasetType]) -> DatasetType:
        interpreter = self._find_interpreter_instance(interpreter_type)
        if interpreter is None:
            raise ValueError(
                f'No interpreter instance found for type {interpreter_type.__name__}'
            )
        return interpreter

    def get_flags_by_interpreter(self) -> dict[Type[DatasetType], set[FlagType]]:
        return {
            type(interpreter): interpreter.accepted_flags
            for interpreter in self.datasets.values()
        }

    def _prepare_interpreter_initialization_args(self, kwargs: dict) -> dict:
        interpreter_signature = InterpreterSignature.from_interpreter(self.get_child_dataset_type())
        return {
            arg: kwargs.get(arg, default)
            for arg, default in zip(interpreter_signature.args, interpreter_signature.defaults)
        }

    def _initialize_registered_interpreters(self, interpreter_args: dict) -> list[DatasetType]:
        return [
            interpreter(**interpreter_args, parent_dataset=self)
            for interpreter in self._interpreter_registry
        ]

    @classmethod
    def _validate_interpreter_type(cls, interpreter: Type[DatasetType]) -> None:
        if not issubclass(interpreter, cls.get_child_dataset_type()):
            raise TypeError(
                f'Interpreter must be subclass of {cls.get_child_dataset_type().__name__}'
            )

    @classmethod
    def _validate_interpreter_not_registered(cls, interpreter: Type[DatasetType]) -> None:
        if interpreter in cls._interpreter_registry:
            raise ValueError(f'Interpreter {interpreter.__name__} already registered')

    @classmethod
    def _add_interpreter_to_registry(cls, interpreter: Type[DatasetType]) -> None:
        cls._interpreter_registry.insert(0, interpreter)

    def _find_interpreter_instance(self, interpreter_type: Type[DatasetType]) -> DatasetType | None:
        for interpreter in self.datasets.values():
            if isinstance(interpreter, interpreter_type):
                return interpreter
        return None
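
Neither get_interpreter_instance nor get_flags_by_interpreter carries an example above; a brief usage sketch in the spirit of the hypothetical PyPSA example (PyPSADataset, PyPSATimeSeriesInterpreter and my_network are assumed from that example):

>>> dataset = PyPSADataset(network=my_network)
>>> # Map each registered interpreter type to the flags it accepts
>>> flags_by_interpreter = dataset.get_flags_by_interpreter()
>>> # Retrieve a specific interpreter instance for platform-specific work
>>> ts_interpreter = dataset.get_interpreter_instance(PyPSATimeSeriesInterpreter)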

register_interpreter classmethod

register_interpreter(interpreter: Type[DatasetType]) -> Type['DatasetType']

Register a data interpreter class with this platform dataset.

This method is typically used as a decorator to register interpreter classes that handle specific types of platform data. Registered interpreters are automatically instantiated when the platform dataset is created.

Parameters:

  interpreter (Type[DatasetType]): Interpreter class that must inherit from get_child_dataset_type(). Required.

Returns:

  Type['DatasetType']: The interpreter class (unchanged) to support decorator usage

Raises:

  TypeError: If interpreter doesn't inherit from the required base class

Example:

>>> @PyPSADataset.register_interpreter
... class CustomInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'custom_flag'}
...     
...     def _fetch(self, flag, config, **kwargs):
...         return compute_custom_data()
Source code in submodules/mesqual/mesqual/datasets/platform_dataset.py
@classmethod
def register_interpreter(cls, interpreter: Type[DatasetType]) -> Type['DatasetType']:
    """
    Register a data interpreter class with this platform dataset.

    This method is typically used as a decorator to register interpreter classes
    that handle specific types of platform data. Registered interpreters are
    automatically instantiated when the platform dataset is created.

    Args:
        interpreter: Interpreter class that must inherit from get_child_dataset_type()

    Returns:
        The interpreter class (unchanged) to support decorator usage

    Raises:
        TypeError: If interpreter doesn't inherit from the required base class

    Example:

        >>> @PyPSADataset.register_interpreter
        ... class CustomInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'custom_flag'}
        ...     
        ...     def _fetch(self, flag, config, **kwargs):
        ...         return compute_custom_data()
    """
    cls._validate_interpreter_type(interpreter)
    if interpreter not in cls._interpreter_registry:
        cls._add_interpreter_to_registry(interpreter)
    return interpreter

dataset_config