Dataset References

datasets

Dataset module providing the core data access layer for MESQUAL.

This module implements the foundational "Everything is a Dataset" principle, where all data sources—individual scenarios, merged scenarios, collections, and comparisons—share a unified interface through the .fetch(flag) pattern.

Core Classes

Dataset: Abstract base class defining the universal data access interface. All dataset types inherit from this class and implement the fetch() method.

DatasetCollection: Base class for grouping related datasets together. Provides iteration and batch operations across multiple datasets.

DatasetLinkCollection: Collection maintaining parent-child relationships. Used when datasets need to reference back to their container.

DatasetMergeCollection: Combines multiple datasets by merging their data. Useful for aggregating results from different simulation runs.

DatasetSumCollection: Aggregates datasets by summing numeric values. Commonly used for capacity or production totals across scenarios.

DatasetConcatCollection: Concatenates datasets along a specified axis. Creates MultiIndex structures preserving scenario identities.

DatasetComparison: Computes differences between scenario pairs. Enables delta analysis and comparative studies.

DatasetConcatCollectionOfComparisons: Specialized collection for comparisons. Facilitates systematic comparison across multiple scenario pairs.

PlatformDataset: Dataset subclass for platform-specific implementations. Extended by platform interfaces (e.g., PyPSA, PLEXOS) to provide platform-aware data access.

DatasetConfig: Configuration class controlling dataset behavior. Manages caching, post-processing, and platform-specific options.

Example

Basic usage pattern::

from mesqual.datasets import Dataset, DatasetConfig

# Fetch data from a dataset
prices = dataset.fetch('buses_t.marginal_price')

# Configure dataset behavior
config = DatasetConfig(use_database=True)
dataset.set_instance_config(config)

# Work with collections
for scenario in collection:
    data = scenario.fetch('generators_t.p')
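Collections and comparisons expose the same `.fetch(flag)` interface as individual scenarios. The following standalone sketch illustrates how that unified interface composes; the class names here are illustrative stand-ins, not the actual MESQUAL implementations:

```python
import pandas as pd

# Illustrative stand-ins for the MESQUAL dataset classes described above;
# the real implementations live in mesqual.datasets.
class ToyScenario:
    """A single scenario exposing data through .fetch(flag)."""
    def __init__(self, name, data):
        self.name = name
        self._data = data  # flag -> DataFrame

    def fetch(self, flag):
        return self._data[flag].copy()


class ToyComparison:
    """Delta between two scenarios, itself fetchable like any dataset."""
    def __init__(self, variation, reference):
        self.variation = variation
        self.reference = reference

    def fetch(self, flag):
        # Same interface: the comparison is "just another dataset"
        return self.variation.fetch(flag) - self.reference.fetch(flag)


base = ToyScenario('base', {'generators_t.p': pd.DataFrame({'g1': [1.0, 2.0]})})
high = ToyScenario('high', {'generators_t.p': pd.DataFrame({'g1': [1.5, 2.5]})})

delta = ToyComparison(high, base).fetch('generators_t.p')
print(delta['g1'].tolist())  # [0.5, 0.5]
```

Because every layer answers the same `fetch(flag)` call, downstream code (KPIs, plots, exports) does not need to know whether it is looking at a scenario, a merged run, or a delta.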

See Also
  • mesqual.flag: Flag types and flag index implementations
  • mesqual.kpis: KPI calculation framework
  • mesqual.databases: Caching backends for datasets

Dataset

Bases: Generic[DatasetConfigType, FlagType, FlagIndexType], ABC

Abstract base class for all datasets in the MESQUAL framework.

The Dataset class provides the fundamental interface for data access and manipulation in MESQUAL. It implements the core principle "Everything is a Dataset" where individual scenarios, scenarios merged from multiple simulation runs or data sources, collections of scenarios, and scenario comparisons all share the same unified interface.

Key Features
  • Unified .fetch(flag) interface for data access
  • Attribute management for scenario metadata
  • KPI calculation integrations
  • Database caching support
  • Dot notation fetching via dotfetch()
  • Type-safe generic implementation
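The configuration support listed above is layered: class defaults, instance config, and fetch-time overrides are merged, with only non-None override values replacing earlier settings (as described in the `fetch()` documentation below). A simplified illustration of that merge rule; the real logic lives in `DatasetConfig` and the dataset's internal config preparation:

```python
# Simplified illustration of MESQUAL's config layering:
# later layers win, but only where they provide a non-None value.
def merge_configs(*layers):
    effective = {}
    for layer in layers:
        for key, value in layer.items():
            if value is not None:
                effective[key] = value
    return effective


effective = merge_configs(
    {'use_database': True, 'auto_sort_datetime_index': True},  # class defaults
    {'use_database': None},                                    # instance config: no override
    {'use_database': False},                                   # fetch-time override
)
print(effective)  # {'use_database': True -> False, sorting unchanged}
```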

Class Type Parameters:

Name               Description                                             Default
DatasetConfigType  Configuration class for dataset behavior                required
FlagType           Type used for data flag identification (typically str)  required
FlagIndexType      Flag index implementation for flag mapping              required

Attributes:

Name            Type           Description
name            str            Human-readable identifier for the dataset
kpi_collection  KPICollection  Collection of KPIs associated with this dataset

Example:

>>> # Basic usage pattern
>>> data = dataset.fetch('buses_t.marginal_price')
>>> flags = dataset.accepted_flags
>>> if dataset.flag_is_accepted('generators_t.p'):
...     gen_data = dataset.fetch('generators_t.p')
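Concrete datasets implement the abstract contract by providing `accepted_flags` and a data-fetching method. The standalone sketch below mimics that contract without importing MESQUAL (the simplified base and names are illustrative, not the real `Dataset` class):

```python
import pandas as pd

# Simplified stand-in mirroring the abstract contract described above
# (accepted_flags + guarded fetch); the real base is mesqual.datasets.Dataset.
class MiniDataset:
    def __init__(self, frames):
        self._frames = frames  # flag -> DataFrame

    @property
    def accepted_flags(self):
        # Concrete subclasses must declare which flags they can serve
        return set(self._frames)

    def flag_is_accepted(self, flag):
        return flag in self.accepted_flags

    def fetch(self, flag):
        # Mirrors the flag-validation guard on Dataset.fetch()
        if not self.flag_is_accepted(flag):
            raise ValueError(f'Flag {flag!r} not accepted')
        return self._frames[flag].copy()


ds = MiniDataset({'buses_t.marginal_price': pd.DataFrame({'b1': [42.0]})})
assert ds.flag_is_accepted('buses_t.marginal_price')
prices = ds.fetch('buses_t.marginal_price')
```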
Source code in submodules/mesqual/mesqual/datasets/dataset.py
class Dataset(Generic[DatasetConfigType, FlagType, FlagIndexType], ABC):
    """
    Abstract base class for all datasets in the MESQUAL framework.

    The Dataset class provides the fundamental interface for data access and manipulation
    in MESQUAL. It implements the core principle "Everything is a Dataset" where individual
    scenarios, scenarios merged from multiple simulation runs or data sources,
    collections of scenarios, and scenario comparisons all share the same unified interface.

    Key Features:
        - Unified `.fetch(flag)` interface for data access
        - Attribute management for scenario metadata
        - KPI calculation integrations
        - Database caching support
        - Dot notation fetching via `dotfetch()`
        - Type-safe generic implementation

    Type Parameters:
        DatasetConfigType: Configuration class for dataset behavior
        FlagType: Type used for data flag identification (typically str)
        FlagIndexType: Flag index implementation for flag mapping

    Attributes:
        name (str): Human-readable identifier for the dataset
        kpi_collection (KPICollection): Collection of KPIs associated with this dataset

    Example:

        >>> # Basic usage pattern
        >>> data = dataset.fetch('buses_t.marginal_price')
        >>> flags = dataset.accepted_flags
        >>> if dataset.flag_is_accepted('generators_t.p'):
        ...     gen_data = dataset.fetch('generators_t.p')
    """

    def __init__(
            self,
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndexType = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None
    ):
        """
        Initialize a new Dataset instance.

        Args:
            name: Human-readable identifier. If None, auto-generates from class name
            parent_dataset: Optional parent dataset for hierarchical relationships
            flag_index: Index for mapping and validating data flags
            attributes: Dictionary of metadata attributes for the dataset
            database: Optional database for caching expensive computations
            config: Configuration object controlling dataset behavior
        """
        self.name = name or f'{self.__class__.__name__}_{str(id(self))}'
        self._flag_index = flag_index or EmptyFlagIndex()
        self._parent_dataset = parent_dataset
        self._attributes: dict = attributes or dict()
        self._database = database
        self._config = config

        from mesqual.kpis.collection import KPICollection
        self.kpi_collection: KPICollection = KPICollection()

    @flag_must_be_accepted
    def fetch(self, flag: FlagType, config: dict | DatasetConfigType = None, **kwargs) -> pd.Series | pd.DataFrame:
        """
        Fetch data associated with a specific flag.

        This is the primary method for data access in MESQUAL datasets. It provides
        a unified interface for retrieving data regardless of the underlying source
        or dataset type. The method includes automatic caching, post-processing,
        and configuration management.

        Configuration Override Behavior:

            The ``config`` parameter allows fetch-time overrides of dataset behavior.
            These overrides are merged with the dataset's effective configuration
            (which combines class-level and instance-level settings). Only non-None
            values in the override will replace the existing settings.

        The configuration resolution hierarchy (later overrides earlier):

            1. Base config defaults
            2. Class config (via DatasetConfigManager)
            3. Instance config (passed to Dataset.__init__)
            4. **Fetch-time config (this parameter)**

        Args:
            flag: Data identifier flag (must be in accepted_flags)
            config: Optional configuration to override dataset defaults.
                Can be either:

                - **dict**: Quick way to override specific settings. Keys must
                  match config attribute names (e.g., ``use_database``,
                  ``auto_sort_datetime_index``). Platform-specific options
                  are also supported if the dataset uses an extended config.

                - **DatasetConfig instance**: Full config object for type safety.
                  Must be compatible with the dataset's config type.

            **kwargs: Additional keyword arguments passed to the underlying
                data fetching implementation

        Returns:
            DataFrame or Series containing the requested data

        Raises:
            ValueError: If the flag is not accepted by this dataset

        Examples:
            Basic usage::

                >>> prices = dataset.fetch('buses_t.marginal_price')

            Override base config options with a dict::

                >>> # Skip database cache for this fetch
                >>> prices = dataset.fetch(
                ...     'buses_t.marginal_price',
                ...     config=dict(use_database=False)
                ... )
                >>>
                >>> # Disable datetime sorting
                >>> prices = dataset.fetch(
                ...     'generators_t.p',
                ...     config=dict(auto_sort_datetime_index=False)
                ... )

            Override platform-specific options::

                >>> # Platform configs may have additional options
                >>> # e.g., a config with timestamp conversion setting
                >>> data = dataset.fetch(
                ...     'some_flag',
                ...     config=dict(convert_period_enum_to_datetime_index=False)
                ... )

            Override study-specific options::

                >>> # Study-specific configs can add custom behavior
                >>> # e.g., toggle custom data corrections
                >>> data = dataset.fetch(
                ...     'some_flag',
                ...     config=dict(apply_custom_correction=False)
                ... )

            Using a config object::

                >>> from mesqual.datasets import DatasetConfig
                >>> custom_config = DatasetConfig(
                ...     use_database=False,
                ...     auto_sort_datetime_index=False
                ... )
                >>> prices = dataset.fetch('buses_t.marginal_price', config=custom_config)
        """
        effective_config = self._prepare_config(config)
        use_database = self._database is not None and effective_config.use_database

        if use_database:
            if self._database.key_is_up_to_date(self, flag, config=effective_config, **kwargs):
                return self._database.get(self, flag, config=effective_config, **kwargs)

        raw_data = self._fetch(flag, effective_config, **kwargs)
        processed_data = self._post_process_data(raw_data, flag, effective_config)

        if use_database:
            self._database.set(self, flag, config=effective_config, value=processed_data, **kwargs)

        return processed_data.copy()

    @property
    @abstractmethod
    def accepted_flags(self) -> set[FlagType]:
        """
        Set of all flags accepted by this dataset.

        This abstract property must be implemented by all concrete dataset classes
        to define which data flags can be fetched from the dataset.

        Returns:
            Set of flags that can be used with the fetch() method

        Example:

            >>> print(dataset.accepted_flags)
                {'buses', 'buses_t.marginal_price', 'generators', 'generators_t.p', ...}
        """
        return set()

    def get_accepted_flags_containing_x(self, x: str, match_case: bool = False) -> set[FlagType]:
        """
        Find all accepted flags containing a specific substring.

        Useful for discovering related data flags or filtering flags by category.

        Args:
            x: Substring to search for in flag names
            match_case: If True, performs case-sensitive search. Default is False.

        Returns:
            Set of accepted flags containing the substring

        Example:

            >>> ds = PyPSADataset()
            >>> ds.get_accepted_flags_containing_x('generators')
                {'generators', 'generators_t.p', 'generators_t.efficiency', ...}
            >>> ds.get_accepted_flags_containing_x('BUSES', match_case=True)
                set()  # Empty because case doesn't match
        """
        if match_case:
            return {f for f in self.accepted_flags if x in str(f)}
        x_lower = x.lower()
        return {f for f in self.accepted_flags if x_lower in str(f).lower()}

    def flag_is_accepted(self, flag: FlagType) -> bool:
        """
        Boolean check whether a flag is accepted by this dataset.

        Subclasses may override this method to determine flag acceptance
        programmatically instead of relying on the explicit accepted_flags set.
        """
        return flag in self.accepted_flags

    def dotfetch(self) -> _DotNotationFetcher:
        """
        Create a dot notation fetcher for intuitive flag access.

        Returns a helper object that allows accessing nested data flags using
        Python attribute syntax instead of string-based flags. The fetcher
        accumulates attribute accesses and converts them to the appropriate
        flag when called.

        Returns:
            _DotNotationFetcher: Helper object enabling chained attribute access

        Example:
            Using dot notation instead of string flags::

                >>> # Traditional string-based fetch
                >>> prices = dataset.fetch('buses_t.marginal_price')

                >>> # Equivalent dot notation fetch
                >>> prices = dataset.dotfetch().buses_t.marginal_price()

                >>> # Multi-level flag access
                >>> gen_power = dataset.dotfetch().generators_t.p()
        """
        return _DotNotationFetcher(self)

    @property
    def flag_index(self) -> FlagIndexType:
        """
        Access the flag index for this dataset.

        The flag index provides flag mapping, validation, and metadata lookup
        capabilities. It enables features like dot notation fetching, flag-to-model
        mapping, and flag categorization.

        If no flag index was configured, returns an EmptyFlagIndex and logs
        an informational message when accessed.

        Returns:
            FlagIndexType: The configured flag index or EmptyFlagIndex if none set

        Note:
            For full flag index functionality (model mapping, flag categorization),
            ensure a proper flag index is set during dataset initialization.

        See Also:
            - [Flag System](../flag.md) - Flag index implementations and usage
        """
        if isinstance(self._flag_index, EmptyFlagIndex):
            logger.info(
                f"Dataset {self.name}: "
                "You're trying to use functionality of the FlagIndex but didn't define one. "
                "The current FlagIndex in use is empty. "
                "Make sure to set a flag_index in case you want to use full functionality of the flag_index."
            )
        return self._flag_index

    @property
    def database(self) -> Database | None:
        """
        Access the caching database for this dataset.

        The database provides persistent caching for expensive fetch operations.
        When configured, the fetch() method automatically checks the database
        before computing data and stores results for future access.

        Returns:
            Database | None: The configured database instance, or None if caching
                is not enabled for this dataset

        See Also:
            - Database configuration and caching behavior
            - Uses database for automatic caching when available (see `fetch()` method)
        """
        return self._database

    def add_kpis_from_definitions(self, kpi_definitions: KPIDefinition | list[KPIDefinition]):
        """
        Generate and add KPIs from one or more KPI definitions.

        KPI definitions are templates that generate concrete KPI instances
        based on the dataset's structure. This method processes definitions
        and adds the resulting KPIs to the dataset's KPI collection.

        Args:
            kpi_definitions: Single KPIDefinition or list of definitions.
                Each definition's generate_kpis() method is called with
                this dataset to produce KPI instances.

        Example:
            Adding KPIs from definitions::

                >>> from mesqual.kpis.definitions import TotalGenerationKPIDefinition
                >>> dataset.add_kpis_from_definitions(TotalGenerationKPIDefinition())

                >>> # Add multiple definitions at once
                >>> definitions = [
                ...     TotalGenerationKPIDefinition(),
                ...     MarginalPriceKPIDefinition(),
                ... ]
                >>> dataset.add_kpis_from_definitions(definitions)

        See Also:
            - `add_kpi()` - Add a single KPI directly
            - `add_kpis()` - Add multiple KPI instances
            - [KPI Definitions](../kpis/definitions/base.md) - Base KPI definition class
        """
        from mesqual.kpis.definitions.base import KPIDefinition
        if isinstance(kpi_definitions, KPIDefinition):
            kpis = kpi_definitions.generate_kpis(self)
            self.add_kpis(kpis)
        else:
            for kpi_def in kpi_definitions:
                kpis = kpi_def.generate_kpis(self)
                self.add_kpis(kpis)

    def add_kpis(self, kpis: Iterable[KPI]):
        """
        Add multiple KPIs to this dataset's KPI collection.

        Args:
            kpis: Iterable of KPI instances, factories, or classes to add
        """
        duplicates = []
        for kpi in kpis:
            if kpi in self.kpi_collection:
                duplicates.append(kpi)
            else:
                self.add_kpi(kpi)
        if duplicates:
            _num_duplicates = len(duplicates)
            logger.warning(f'{_num_duplicates} duplicate KPIs skipped (neither re-added nor overwritten) in {self.name}: {duplicates[:3]}...')

    def add_kpi(self, kpi: KPI):
        """
        Add a single KPI to this dataset's KPI collection.

        Args:
            kpi: KPI instance, factory, or class to add
        """
        self.kpi_collection.add(kpi)

    def clear_kpi_collection(self):
        """Clear the KPI collection."""
        from mesqual.kpis.collection import KPICollection
        self.kpi_collection = KPICollection()

    @property
    def attributes(self) -> dict:
        """
        Access the metadata attributes dictionary for this dataset.

        Attributes store scenario-level metadata such as configuration parameters,
        simulation settings, or descriptive labels. These are useful for filtering,
        grouping, and annotating datasets in collections.

        Returns:
            dict: Dictionary of attribute key-value pairs. Keys are strings,
                values are primitive types (bool, int, float, str).

        Example:
            Accessing and using attributes::

                >>> dataset.attributes
                {'year': 2030, 'scenario_type': 'high_renewable', 'carbon_price': 50.0}

                >>> # Filter datasets in a collection by attribute
                >>> high_re_scenarios = [d for d in collection if d.attributes.get('scenario_type') == 'high_renewable']

        See Also:
            - `set_attributes()` - Set attribute values
            - `get_attributes_series()` - Convert attributes to pandas Series
        """
        return self._attributes

    def get_attributes_series(self) -> pd.Series:
        """
        Convert dataset attributes to a pandas Series.

        Creates a Series with attribute names as the index and attribute
        values as data. The Series name is set to the dataset name, making
        it suitable for concatenation with other datasets' attribute series.

        Returns:
            pd.Series: Series containing attribute values, indexed by attribute
                names, with the dataset name as the Series name

        Example:
            Converting attributes and combining across datasets::

                >>> dataset.set_attributes(year=2030, carbon_price=50.0)
                >>> series = dataset.get_attributes_series()
                >>> series
                year            2030
                carbon_price    50.0
                Name: Scenario_A, dtype: object

                >>> # Combine attributes from multiple datasets
                >>> attr_df = pd.concat([d.get_attributes_series() for d in collection], axis=1).T

        See Also:
            - `attributes` - Access raw attributes dictionary
            - `set_attributes()` - Set attribute values
        """
        att_series = pd.Series(self.attributes, name=self.name)
        return att_series

    def set_attributes(self, **kwargs):
        """
        Set one or more metadata attributes on this dataset.

        Attributes are key-value pairs that store scenario metadata. They must
        use string keys and primitive values (bool, int, float, str) to ensure
        serializability and consistent comparison behavior.

        Args:
            **kwargs: Attribute key-value pairs to set. Keys must be strings,
                values must be bool, int, float, or str.

        Raises:
            TypeError: If any key is not a string
            TypeError: If any value is not bool, int, float, or str

        Example:
            Setting scenario metadata::

                >>> dataset.set_attributes(
                ...     year=2030,
                ...     scenario_type='high_renewable',
                ...     carbon_price=50.0,
                ...     includes_nuclear=True
                ... )

                >>> # Access the attributes
                >>> dataset.attributes['year']
                2030

        See Also:
            - `attributes` - Access attributes dictionary
            - `get_attributes_series()` - Convert to pandas Series
        """
        for key, value in kwargs.items():
            if not isinstance(key, str):
                raise TypeError(f'Attribute keys must be of type str. Your key {key} is of type {type(key)}.')
            if not isinstance(value, (bool, int, float, str)):
                raise TypeError(
                    f'Attribute values must be of type (bool, int, float, str). '
                    f'Your value for {key} ({value}) is of type {type(value)}.'
                )
            self._attributes[key] = value

    @property
    def parent_dataset(self) -> 'DatasetLinkCollection':
        """
        Access the parent collection linking this interpreter to sibling interpreters.

        The parent_dataset provides the bridge between specialized flag interpreters
        within a single platform dataset or study. It is NOT used to link scenarios
        together, but rather to enable modular interpreter architectures where each
        interpreter handles a specific subset of flags and can access flags from
        sibling interpreters through the shared parent.

        Architecture Pattern:
            A typical platform dataset (e.g., PyPSADataset, PlexosDataset) is
            implemented as a DatasetLinkCollection containing multiple specialized
            interpreters:

            - **ModelInterpreter**: Provides static model data (e.g., 'generators', 'buses')
            - **TimeSeriesInterpreter**: Provides time-series data (e.g., 'generators_t.p')
            - **ObjectiveInterpreter**: Provides objective function values
            - **Custom Interpreters**: Study-specific derived or corrected variables

            Each interpreter is a child dataset within the parent DatasetLinkCollection.
            Through parent_dataset, any interpreter can fetch flags from siblings without
            needing direct references or circular dependencies.

        Why This Pattern:
            - **Separation of Concerns**: Each interpreter focuses on one data type
            - **Modularity**: Add/remove/replace interpreters independently
            - **Dependency Resolution**: Interpreters can depend on each other's flags
            - **Study Customization**: Override or extend specific interpreters per study
            - **Maintainability**: Changes to one interpreter don't affect others

        Returns:
            DatasetLinkCollection: The parent collection that orchestrates flag
                routing between this interpreter and its siblings

        Raises:
            RuntimeError: If accessed before the parent has been assigned (typically
                happens if an interpreter is used standalone instead of within a
                DatasetLinkCollection)

        Example:
            Custom interpreter combining flags from sibling interpreters:

                >>> # Study-specific interpreter for renewable generation per bidding zone
                >>> class RESGenerationPerBZInterpreter(PlatformBaseInterpreterDataset):
                ...     @property
                ...     def accepted_flags(self):
                ...         return {'generators_t.res_generation_per_bz'}
                ...
                ...     def _fetch(self, flag, config, **kwargs):
                ...         # Fetch time series from TimeSeriesInterpreter sibling
                ...         generation = self.parent_dataset.fetch('generators_t.p')
                ...
                ...         # Fetch model data from ModelInterpreter sibling
                ...         gen_model = self.parent_dataset.fetch('generators.model')
                ...
                ...         # Filter to RES generators and aggregate by bidding zone
                ...         res_gens = gen_model[gen_model['carrier'].isin(['solar', 'wind'])]
                ...         res_generation = generation[res_gens.index]
                ...         return res_generation.groupby(gen_model['bidding_zone'], axis=1).sum()

            Accessing specific sibling interpreter by type:

                >>> class SomeCustomPTDFMatrixFormat(PlexosImporterBase):
                ...     def _fetch(self, flag, config, **kwargs):
                ...         # Get specific sibling interpreter
                ...         ptdf_ds = self.parent_dataset.get_dataset_by_type(
                ...             PlexosPTDFInterpreter
                ...         )
                ...
                ...         # Or fetch through parent (automatically routes to correct sibling)
                ...         headers = self.parent_dataset.fetch('PTDF.Headers')
                ...         factors = self.parent_dataset.fetch('PTDF.Factors')
                ...
                ...         # Process and return derived flag
                ...         return self._custom_ptdf_process(headers, factors)

            Study-specific correction of platform variables:

                >>> class LineFlows(MyStudyVariables):
                ...     '''Replaces specific flows with external data.'''
                ...
                ...     def _fetch(self, flag, config, **kwargs):
                ...         # Get reference dataset (sibling interpreter) for this flag
                ...         reference_ds = self._get_reference_dataset_for_flag(flag)
                ...
                ...         # Fetch original data from sibling
                ...         df = reference_ds.fetch(flag, config, **kwargs)
                ...
                ...         # Apply study-specific corrections
                ...         if self.parent_dataset.attributes['manual_line_flow_correction']:
                ...             df = self._apply_historical_corrections(df)
                ...
                ...         return df

        See Also:
            - `DatasetLinkCollection` - Parent collection class that orchestrates routing
            - `get_dataset_by_type()` - Method to access specific sibling by type
        """
        if self._parent_dataset is None:
            raise RuntimeError(
                "parent_dataset accessed before assignment; assign a "
                "DatasetLinkCollection parent before using this dataset."
            )
        return self._parent_dataset

    @parent_dataset.setter
    def parent_dataset(self, parent_dataset: 'DatasetLinkCollection'):
        """
        Set the parent collection for this dataset.

        Args:
            parent_dataset: The DatasetLinkCollection that will contain this dataset

        Raises:
            TypeError: If parent_dataset is not a DatasetLinkCollection instance
        """
        from mesqual.datasets.dataset_collection import DatasetLinkCollection
        if not isinstance(parent_dataset, DatasetLinkCollection):
            raise TypeError(f"parent_dataset must be of type {DatasetLinkCollection.__name__}, got {type(parent_dataset).__name__}")
        self._parent_dataset = parent_dataset

    @flag_must_be_accepted
    def required_flags_for_flag(self, flag: FlagType) -> set[FlagType]:
        """
        Get the set of flags required to compute a given flag.

        For derived or computed flags, this method returns the set of source
        flags that must be available to produce the requested data. This is
        useful for understanding data dependencies and ensuring prerequisite
        data exists.

        Args:
            flag: The flag to check requirements for. Must be in accepted_flags.

        Returns:
            set[FlagType]: Set of flags that are required to compute the given flag.
                Returns an empty set if the flag has no dependencies.

        Raises:
            ValueError: If the flag is not accepted by this dataset

        Example:
            Checking data dependencies::

                >>> # A derived flag might depend on multiple source flags
                >>> deps = dataset.required_flags_for_flag('total_generation')
                >>> deps
                {'generators_t.p', 'generators'}

        See Also:
            - `_required_flags_for_flag()` - Abstract method to implement
            - `flag_is_accepted()` - Check if a flag is valid
        """
        return self._required_flags_for_flag(flag)

    @abstractmethod
    def _required_flags_for_flag(self, flag: FlagType) -> set[FlagType]:
        """
        Abstract method to define flag dependencies.

        Subclasses must implement this method to specify which flags are
        required to compute a given flag. This enables dependency tracking
        and validation of data availability.

        Args:
            flag: The flag to get requirements for

        Returns:
            set[FlagType]: Set of prerequisite flags. Return empty set for
                flags with no dependencies.

        Note:
            This is a protected method called by required_flags_for_flag().
            The public method handles flag validation before calling this.
        """
        return set()

    def _post_process_data(
            self,
            data: pd.Series | pd.DataFrame,
            flag: FlagType,
            config: DatasetConfigType
    ) -> pd.Series | pd.DataFrame:
        """
        Apply standard post-processing to fetched data.

        Performs configuration-driven data cleaning and normalization after
        the raw data is fetched. This includes removing duplicate indices
        and sorting datetime indices.

        Args:
            data: Raw data from _fetch()
            flag: The flag that was fetched (for logging)
            config: Effective configuration controlling post-processing behavior

        Returns:
            Post-processed data with duplicates removed and/or sorted as configured

        Note:
            This method is called automatically by fetch(). Subclasses can
            override to add custom post-processing while calling super().
        """
        if config.remove_duplicate_indices and any(data.index.duplicated()):
            logger.info(
                f'Dataset {self.name} returned data with duplicate indices for flag {flag}.\n'
                f'Duplicate indices are removed, keeping the first element of each duplicated index.\n'
                f'Please make sure your data importer / converter is set up appropriately and that '
                f'your raw data does not contain duplicate indices.'
            )
            data = data.loc[~data.index.duplicated()]
        if config.auto_sort_datetime_index and isinstance(data.index, pd.DatetimeIndex):
            data = data.sort_index()
        return data

    def _prepare_config(self, config: dict | DatasetConfigType = None) -> DatasetConfigType:
        """
        Prepare the effective configuration for a fetch operation.

        Resolves the final configuration by merging the provided config override
        with the dataset's instance config (which already includes class-level
        and base defaults via DatasetConfigManager).

        Args:
            config: Optional override configuration. Can be:
                - None: Use instance config as-is
                - dict: Create temp config from dict and merge
                - DatasetConfig: Merge directly with instance config

        Returns:
            The fully resolved configuration for the fetch operation.

        Raises:
            TypeError: If config is neither None, dict, nor DatasetConfig.
        """
        if config is None:
            return self.instance_config

        if isinstance(config, dict):
            temp_config = self.get_config_type()()
            temp_config.__dict__.update(config)
            return self.instance_config.merge(temp_config)

        from mesqual.datasets.dataset_config import DatasetConfig
        if isinstance(config, DatasetConfig):
            return self.instance_config.merge(config)

        raise TypeError(f"Config must be dict or {DatasetConfig.__name__}, got {type(config)}")

    @abstractmethod
    def _fetch(self, flag: FlagType, effective_config: DatasetConfigType, **kwargs) -> pd.Series | pd.DataFrame:
        """
        Abstract method implementing the actual data retrieval logic.

        Subclasses must implement this method to define how data is retrieved
        for each flag. This is the core data access method that fetch() calls
        after configuration resolution and before post-processing.

        Args:
            flag: The validated flag to fetch data for
            effective_config: The fully resolved configuration for this operation
            **kwargs: Additional implementation-specific arguments

        Returns:
            DataFrame or Series containing the requested data. The returned
            data will be post-processed by _post_process_data() before being
            returned to the caller.

        Note:
            - This method should not perform flag validation (handled by fetch())
            - This method should not apply post-processing (handled separately)
            - This method should not handle caching (handled by fetch())

        Example:
            Implementing in a subclass::

                def _fetch(self, flag, effective_config, **kwargs):
                    if flag == 'generators':
                        return self.network.generators
                    elif flag == 'generators_t.p':
                        return self.network.generators_t.p
                    # ... handle other flags
        """
        return pd.DataFrame()

    def fetch_multiple_flags_and_concat(
            self,
            flags: Iterable[FlagType],
            concat_axis: int = 1,
            concat_level_name: str = 'variable',
            concat_level_at_top: bool = True,
            config: dict | DatasetConfigType = None,
            **kwargs
    ) -> Union[pd.Series, pd.DataFrame]:
        """
        Fetch multiple flags and concatenate results into a single DataFrame.

        Convenience method for retrieving data from multiple flags and combining
        them into a single DataFrame with a MultiIndex. Useful for comparative
        analysis of multiple variables or creating wide-format data structures.

        Args:
            flags: Iterable of flags to fetch and concatenate
            concat_axis: Axis along which to concatenate (0=rows, 1=columns).
                Default is 1 (columns).
            concat_level_name: Name for the new MultiIndex level identifying
                the source flag. Default is 'variable'.
            concat_level_at_top: If True, the flag level is the outermost level
                in the MultiIndex. If False, it's moved to the innermost level.
                Default is True.
            config: Optional configuration override (see fetch() for details)
            **kwargs: Additional arguments passed to each fetch() call

        Returns:
            DataFrame with concatenated data and a MultiIndex identifying the
            source flag for each section

        Example:
            Fetching and comparing multiple variables::

                >>> # Fetch power output and efficiency for generators
                >>> combined = dataset.fetch_multiple_flags_and_concat(
                ...     flags=['generators_t.p', 'generators_t.efficiency'],
                ...     concat_level_name='metric'
                ... )
                >>> # Result has MultiIndex columns: (metric, generator_name)

                >>> # Row-wise concatenation
                >>> stacked = dataset.fetch_multiple_flags_and_concat(
                ...     flags=['bus_A_prices', 'bus_B_prices'],
                ...     concat_axis=0,
                ...     concat_level_name='bus'
                ... )

        See Also:
            - `fetch()` - Single flag data retrieval
            - `fetch_filter_groupby_agg()` - Fetch with filtering and aggregation
        """
        dfs = {
            str(flag): self.fetch(flag, config, **kwargs)
            for flag in flags
        }
        df = pd.concat(
            dfs,
            axis=concat_axis,
            names=[concat_level_name],
        )
        if not concat_level_at_top:
            ax = df.axes[concat_axis]
            ax = ax.reorder_levels(list(range(1, ax.nlevels)) + [0])
            # df.axes returns a new list, so assigning into it is a no-op;
            # set the index / columns attribute explicitly instead.
            if concat_axis == 0:
                df.index = ax
            else:
                df.columns = ax
        return df

    def fetch_filter_groupby_agg(
            self,
            flag: FlagType,
            model_filter_query: str = None,
            prop_groupby: str | list[str] = None,
            prop_groupby_agg: str = None,
            config: dict | DatasetConfigType = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        """
        Fetch data with model-based filtering, grouping, and aggregation.

        Provides a powerful one-line method for common data analysis patterns:
        filter time series by model properties, group by categories, and
        aggregate results. Requires a flag index with model mappings.

        Args:
            flag: Data flag to fetch (must have a linked model flag)
            model_filter_query: Pandas query string to filter based on model
                properties. Applied to the linked model DataFrame.
                Example: "carrier == 'solar'" or "p_nom > 100"
            prop_groupby: Model property or list of properties to group by.
                Adds these as MultiIndex levels and groups the data.
                Example: 'carrier' or ['carrier', 'bus']
            prop_groupby_agg: Aggregation function to apply after grouping.
                Standard pandas aggregation strings like 'sum', 'mean', 'max'.
                Only used if prop_groupby is specified.
            config: Optional configuration override (see fetch() for details)
            **kwargs: Additional arguments passed to fetch()

        Returns:
            Filtered and/or aggregated data. If prop_groupby is specified without
            prop_groupby_agg, returns a DataFrameGroupBy object.

        Raises:
            RuntimeError: If the flag has no linked model flag in the flag index

        Example:
            Common analysis patterns::

                >>> # Filter generators to only solar, sum by carrier
                >>> solar_gen = dataset.fetch_filter_groupby_agg(
                ...     'generators_t.p',
                ...     model_filter_query="carrier == 'solar'",
                ...     prop_groupby='carrier',
                ...     prop_groupby_agg='sum'
                ... )

                >>> # Group all generation by carrier and bus
                >>> by_carrier_bus = dataset.fetch_filter_groupby_agg(
                ...     'generators_t.p',
                ...     prop_groupby=['carrier', 'bus'],
                ...     prop_groupby_agg='sum'
                ... )

                >>> # Filter to large generators only
                >>> large_gens = dataset.fetch_filter_groupby_agg(
                ...     'generators_t.p',
                ...     model_filter_query="p_nom >= 500"
                ... )

        See Also:
            - `fetch()` - Basic data retrieval
            - [Pandas Utils](../utils/pandas_utils/index.md) - Underlying filter/group utilities
        """
        model_flag = self.flag_index.get_linked_model_flag(flag)
        if not model_flag:
            raise RuntimeError(f'FlagIndex could not successfully map flag {flag} to a model flag.')

        from mesqual.utils import pandas_utils

        data = self.fetch(flag, config, **kwargs)
        from mesqual.datasets import DatasetCollection
        if isinstance(self, DatasetCollection):
            # TODO: implement MultiIndex capabilities into filter_by_model_query / prepend_model_prop_levels, then fetch_merged is not needed
            model_df = self.fetch_merged(model_flag, config, **kwargs)
        else:
            model_df = self.fetch(model_flag, config, **kwargs)

        if model_filter_query:
            data = pandas_utils.filter_by_model_query(data, model_df, query=model_filter_query)

        if prop_groupby:
            if isinstance(prop_groupby, str):
                prop_groupby = [prop_groupby]
            data = pandas_utils.prepend_model_prop_levels(data, model_df, *prop_groupby)
            data = data.groupby(prop_groupby)
            if prop_groupby_agg:
                data = data.agg(prop_groupby_agg)
        elif prop_groupby_agg:
            logger.warning(
                f"You provided a prop_groupby_agg operation, but didn't provide prop_groupby. "
                f"No aggregation performed."
            )
        return data

    @classmethod
    def get_flag_type(cls) -> Type[FlagType]:
        """
        Get the flag type class for this dataset type.

        Returns the type used for data flags in this dataset class. Subclasses
        can override to specify a custom flag type for type checking and
        validation.

        Returns:
            Type[FlagType]: The flag type class (default: FlagTypeProtocol)

        Note:
            Override in subclasses that use custom flag types.
        """
        from mesqual.flag.flag import FlagTypeProtocol
        return FlagTypeProtocol

    @classmethod
    def get_flag_index_type(cls) -> Type[FlagIndexType]:
        """
        Get the flag index type class for this dataset type.

        Returns the type used for the flag index in this dataset class.
        Subclasses can override to specify a custom flag index implementation.

        Returns:
            Type[FlagIndexType]: The flag index type class (default: FlagIndex)

        Note:
            Override in subclasses that use platform-specific flag indices.
        """
        from mesqual.flag.flag_index import FlagIndex
        return FlagIndex

    @classmethod
    def get_config_type(cls) -> Type[DatasetConfigType]:
        """
        Get the configuration type class for this dataset type.

        Returns the DatasetConfig subclass used by this dataset. Platform
        interfaces typically override this to return their extended config
        class with platform-specific options.

        Returns:
            Type[DatasetConfigType]: The config type class (default: DatasetConfig)

        Example:
            Creating a config instance for this dataset type::

                >>> ConfigClass = MyDataset.get_config_type()
                >>> config = ConfigClass(use_database=True)

        Note:
            Override in platform dataset subclasses to return platform-specific
            config types with additional options.
        """
        from mesqual.datasets.dataset_config import DatasetConfig
        return DatasetConfig

    @property
    def instance_config(self) -> DatasetConfigType:
        """
        Get the effective configuration for this dataset instance.

        Computes the merged configuration by combining:
        1. Base config defaults
        2. Class-level config (set via set_class_config)
        3. Instance-level config (passed to __init__ or set via set_instance_config)

        Later settings override earlier ones. This is the configuration used
        by fetch() unless overridden by a fetch-time config parameter.

        Returns:
            DatasetConfigType: The fully resolved configuration for this instance

        Example:
            Inspecting current configuration::

                >>> config = dataset.instance_config
                >>> print(config.use_database)
                True
                >>> print(config.auto_sort_datetime_index)
                True

        See Also:
            - `set_instance_config()` - Replace instance configuration
            - `set_class_config()` - Set class-level defaults
            - DatasetConfigManager - Configuration management system
        """
        from mesqual.datasets.dataset_config import DatasetConfigManager
        return DatasetConfigManager.get_effective_config(self.__class__, self._config)

    def set_instance_config(self, config: DatasetConfigType) -> None:
        """
        Replace the instance-level configuration for this dataset.

        Sets the configuration that will be merged with class-level defaults
        to produce the effective configuration used by fetch().

        Args:
            config: New configuration object to use for this instance

        Example:
            Setting a custom configuration::

                >>> from mesqual.datasets import DatasetConfig
                >>> config = DatasetConfig(use_database=False, auto_sort_datetime_index=False)
                >>> dataset.set_instance_config(config)

        See Also:
            - `instance_config` - Get the effective configuration
            - `set_instance_config_kwargs()` - Update individual settings
            - `set_class_config()` - Set class-level defaults
        """
        self._config = config

    def set_instance_config_kwargs(self, **kwargs) -> None:
        """
        Update individual configuration settings on this instance.

        Modifies specific attributes of the existing instance configuration
        without replacing the entire config object. Useful for tweaking
        individual settings.

        Args:
            **kwargs: Configuration attribute names and values to set

        Example:
            Adjusting specific settings::

                >>> dataset.set_instance_config_kwargs(
                ...     use_database=True,
                ...     auto_sort_datetime_index=False
                ... )

        Warning:
            Raises AttributeError if the config attribute doesn't exist.

        See Also:
            - `set_instance_config()` - Replace entire configuration
            - `instance_config` - Get the effective configuration
        """
        for key, value in kwargs.items():
            setattr(self._config, key, value)

    @classmethod
    def set_class_config(cls, config: DatasetConfigType) -> None:
        """
        Set the class-level configuration for all instances of this dataset type.

        Class-level configuration serves as the default for all instances of
        this class. Instance-level configuration (set via set_instance_config)
        can override these defaults.

        Args:
            config: Configuration object to use as class-level defaults

        Example:
            Setting defaults for all instances::

                >>> from mesqual.datasets import DatasetConfig
                >>> config = DatasetConfig(use_database=True)
                >>> MyDataset.set_class_config(config)
                >>>
                >>> # All new instances will use database by default
                >>> ds1 = MyDataset()  # uses database
                >>> ds2 = MyDataset()  # uses database

        Note:
            This affects all instances of the class, including existing ones
            that haven't overridden the setting at instance level.

        See Also:
            - `set_instance_config()` - Override for specific instances
            - DatasetConfigManager - Configuration management system
        """
        from mesqual.datasets.dataset_config import DatasetConfigManager
        DatasetConfigManager.set_class_config(cls, config)

    def __str__(self) -> str:
        return self.name

    def __hash__(self):
        return hash((self.name, self._config))

accepted_flags abstractmethod property

accepted_flags: set[FlagType]

Set of all flags accepted by this dataset.

This abstract property must be implemented by all concrete dataset classes to define which data flags can be fetched from the dataset.

Returns:

set[FlagType]: Set of flags that can be used with the fetch() method

Example:

>>> print(dataset.accepted_flags)
    {'buses', 'buses_t.marginal_price', 'generators', 'generators_t.p', ...}
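
A concrete subclass typically hard-codes or derives this set. The following minimal sketch shows the pattern; `DatasetSketch`, `PriceDataset`, and `flag_is_accepted` here are illustrative stand-ins, not the real MESQUAL classes:

```python
from abc import ABC, abstractmethod


class DatasetSketch(ABC):
    """Stand-in for the Dataset base class (illustration only)."""

    @property
    @abstractmethod
    def accepted_flags(self) -> set:
        ...

    def flag_is_accepted(self, flag) -> bool:
        # Membership in accepted_flags decides whether fetch() would proceed
        return flag in self.accepted_flags


class PriceDataset(DatasetSketch):
    @property
    def accepted_flags(self) -> set:
        # A real platform dataset might derive these from its flag index
        return {'buses', 'buses_t.marginal_price'}


ds = PriceDataset()
print(ds.flag_is_accepted('buses'))           # True
print(ds.flag_is_accepted('generators_t.p'))  # False
```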

flag_index property

flag_index: FlagIndexType

Access the flag index for this dataset.

The flag index provides flag mapping, validation, and metadata lookup capabilities. It enables features like dot notation fetching, flag-to-model mapping, and flag categorization.

If no flag index was configured, returns an EmptyFlagIndex and logs an informational message when accessed.

Returns:

FlagIndexType: The configured flag index or EmptyFlagIndex if none set

Note

For full flag index functionality (model mapping, flag categorization), ensure a proper flag index is set during dataset initialization.

database property

database: Database | None

Access the caching database for this dataset.

The database provides persistent caching for expensive fetch operations. When configured, the fetch() method automatically checks the database before computing data and stores results for future access.

Returns:

Database | None: The configured database instance, or None if caching is not enabled for this dataset

See Also
  • Database configuration and caching behavior
  • Uses database for automatic caching when available (see fetch() method)
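
The caching behaviour described above reduces to a check-compute-store cycle inside fetch(). The sketch below illustrates that cycle with a toy dict-backed cache; `DictDatabase` and `fetch_with_cache` are illustrative names, not part of the MESQUAL API:

```python
class DictDatabase:
    """Toy key-value cache standing in for the Database class (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def fetch_with_cache(db, flag, compute):
    # Consult the cache first; only compute on a miss, then store the result
    cached = db.get(flag)
    if cached is not None:
        return cached
    result = compute(flag)
    db.set(flag, result)
    return result


db = DictDatabase()
calls = []

def expensive(flag):
    calls.append(flag)
    return f'data for {flag}'

fetch_with_cache(db, 'generators_t.p', expensive)
fetch_with_cache(db, 'generators_t.p', expensive)
print(len(calls))  # the expensive computation ran only once
```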

attributes property

attributes: dict

Access the metadata attributes dictionary for this dataset.

Attributes store scenario-level metadata such as configuration parameters, simulation settings, or descriptive labels. These are useful for filtering, grouping, and annotating datasets in collections.

Returns:

dict: Dictionary of attribute key-value pairs. Keys are strings, values are primitive types (bool, int, float, str).

Example

Accessing and using attributes::

>>> dataset.attributes
{'year': 2030, 'scenario_type': 'high_renewable', 'carbon_price': 50.0}

>>> # Filter datasets in a collection by attribute
>>> high_re_scenarios = [d for d in collection if d.attributes.get('scenario_type') == 'high_renewable']
See Also
  • set_attributes() - Set attribute values
  • get_attributes_series() - Convert attributes to pandas Series

parent_dataset property writable

parent_dataset: 'DatasetLinkCollection'

Access the parent collection linking this interpreter to sibling interpreters.

The parent_dataset provides the bridge between specialized flag interpreters within a single platform dataset or study. It is NOT used to link scenarios together, but rather to enable modular interpreter architectures where each interpreter handles a specific subset of flags and can access flags from sibling interpreters through the shared parent.

Architecture Pattern

A typical platform dataset (e.g., PyPSADataset, PlexosDataset) is implemented as a DatasetLinkCollection containing multiple specialized interpreters:

  • ModelInterpreter: Provides static model data (e.g., 'generators', 'buses')
  • TimeSeriesInterpreter: Provides time-series data (e.g., 'generators_t.p')
  • ObjectiveInterpreter: Provides objective function values
  • Custom Interpreters: Study-specific derived or corrected variables

Each interpreter is a child dataset within the parent DatasetLinkCollection. Through parent_dataset, any interpreter can fetch flags from siblings without needing direct references or circular dependencies.

Why This Pattern
  • Separation of Concerns: Each interpreter focuses on one data type
  • Modularity: Add/remove/replace interpreters independently
  • Dependency Resolution: Interpreters can depend on each other's flags
  • Study Customization: Override or extend specific interpreters per study
  • Maintainability: Changes to one interpreter don't affect others

Returns:

DatasetLinkCollection: The parent collection that orchestrates flag routing between this interpreter and its siblings

Raises:

RuntimeError: If accessed before the parent has been assigned (typically happens if an interpreter is used standalone instead of within a DatasetLinkCollection)

Example

Custom interpreter combining flags from sibling interpreters:

>>> # Study-specific interpreter for renewable generation per bidding zone
>>> class RESGenerationPerBZInterpreter(PlatformBaseInterpreterDataset):
...     @property
...     def accepted_flags(self):
...         return {'generators_t.res_generation_per_bz'}
...
...     def _fetch(self, flag, config, **kwargs):
...         # Fetch time series from TimeSeriesInterpreter sibling
...         generation = self.parent_dataset.fetch('generators_t.p')
...
...         # Fetch model data from ModelInterpreter sibling
...         gen_model = self.parent_dataset.fetch('generators.model')
...
...         # Filter to RES generators and aggregate by bidding zone
...         res_gens = gen_model[gen_model['carrier'].isin(['solar', 'wind'])]
...         res_generation = generation[res_gens.index]
...         return res_generation.groupby(gen_model['bidding_zone'], axis=1).sum()

Accessing specific sibling interpreter by type:

>>> class SomeCustomPTDFMatrixFormat(PlexosImporterBase):
...     def _fetch(self, flag, config, **kwargs):
...         # Get specific sibling interpreter
...         ptdf_ds = self.parent_dataset.get_dataset_by_type(
...             PlexosPTDFInterpreter
...         )
...
...         # Or fetch through parent (automatically routes to correct sibling)
...         headers = self.parent_dataset.fetch('PTDF.Headers')
...         factors = self.parent_dataset.fetch('PTDF.Factors')
...
...         # Process and return derived flag
...         return self._custom_ptdf_process(headers, factors)

Study-specific correction of platform variables:

>>> class LineFlows(MyStudyVariables):
...     '''Replaces specific flows with external data.'''
...
...     def _fetch(self, flag, config, **kwargs):
...         # Get reference dataset (sibling interpreter) for this flag
...         reference_ds = self._get_reference_dataset_for_flag(flag)
...
...         # Fetch original data from sibling
...         df = reference_ds.fetch(flag, config, **kwargs)
...
...         # Apply study-specific corrections
...         if self.parent_dataset.attributes['manual_line_flow_correction']:
...             df = self._apply_historical_corrections(df)
...
...         return df
See Also
  • DatasetLinkCollection - Parent collection class that orchestrates routing
  • get_dataset_by_type() - Method to access specific sibling by type

instance_config property

instance_config: DatasetConfigType

Get the effective configuration for this dataset instance.

Computes the merged configuration by combining:

1. Base config defaults
2. Class-level config (set via set_class_config)
3. Instance-level config (passed to __init__ or set via set_instance_config)

Later settings override earlier ones. This is the configuration used by fetch() unless overridden by a fetch-time config parameter.

Returns:

DatasetConfigType: The fully resolved configuration for this instance

Example

Inspecting current configuration::

>>> config = dataset.instance_config
>>> print(config.use_database)
True
>>> print(config.auto_sort_datetime_index)
True
See Also
  • set_instance_config() - Replace instance configuration
  • set_class_config() - Set class-level defaults
  • DatasetConfigManager - Configuration management system

__init__

__init__(name: str = None, parent_dataset: Dataset = None, flag_index: FlagIndexType = None, attributes: dict = None, database: Database = None, config: DatasetConfigType = None)

Initialize a new Dataset instance.

Parameters:

name (str, default None): Human-readable identifier. If None, auto-generates from class name

parent_dataset (Dataset, default None): Optional parent dataset for hierarchical relationships

flag_index (FlagIndexType, default None): Index for mapping and validating data flags

attributes (dict, default None): Dictionary of metadata attributes for the dataset

database (Database, default None): Optional database for caching expensive computations

config (DatasetConfigType, default None): Configuration object controlling dataset behavior
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def __init__(
        self,
        name: str = None,
        parent_dataset: Dataset = None,
        flag_index: FlagIndexType = None,
        attributes: dict = None,
        database: Database = None,
        config: DatasetConfigType = None
):
    """
    Initialize a new Dataset instance.

    Args:
        name: Human-readable identifier. If None, auto-generates from class name
        parent_dataset: Optional parent dataset for hierarchical relationships
        flag_index: Index for mapping and validating data flags
        attributes: Dictionary of metadata attributes for the dataset
        database: Optional database for caching expensive computations
        config: Configuration object controlling dataset behavior
    """
    self.name = name or f'{self.__class__.__name__}_{str(id(self))}'
    self._flag_index = flag_index or EmptyFlagIndex()
    self._parent_dataset = parent_dataset
    self._attributes: dict = attributes or dict()
    self._database = database
    self._config = config

    from mesqual.kpis.collection import KPICollection
    self.kpi_collection: KPICollection = KPICollection()

fetch

fetch(flag: FlagType, config: dict | DatasetConfigType = None, **kwargs) -> Series | DataFrame

Fetch data associated with a specific flag.

This is the primary method for data access in MESQUAL datasets. It provides a unified interface for retrieving data regardless of the underlying source or dataset type. The method includes automatic caching, post-processing, and configuration management.

Configuration Override Behavior:

The ``config`` parameter allows fetch-time overrides of dataset behavior.
These overrides are merged with the dataset's effective configuration
(which combines class-level and instance-level settings). Only non-None
values in the override will replace the existing settings.

The configuration resolution hierarchy (later overrides earlier):

1. Base config defaults
2. Class config (via DatasetConfigManager)
3. Instance config (passed to Dataset.__init__)
4. **Fetch-time config (this parameter)**

Parameters:

  flag (FlagType, required): Data identifier flag (must be in accepted_flags).
  config (dict | DatasetConfigType, default None): Optional configuration to override dataset defaults. Can be either:

    • dict: Quick way to override specific settings. Keys must match config attribute names (e.g., use_database, auto_sort_datetime_index). Platform-specific options are also supported if the dataset uses an extended config.
    • DatasetConfig instance: Full config object for type safety. Must be compatible with the dataset's config type.

  **kwargs: Additional keyword arguments passed to the underlying data fetching implementation.

Returns:

  Series | DataFrame: DataFrame or Series containing the requested data

Raises:

  ValueError: If the flag is not accepted by this dataset

Examples:

Basic usage::

>>> prices = dataset.fetch('buses_t.marginal_price')

Override base config options with a dict::

>>> # Skip database cache for this fetch
>>> prices = dataset.fetch(
...     'buses_t.marginal_price',
...     config=dict(use_database=False)
... )
>>>
>>> # Disable datetime sorting
>>> prices = dataset.fetch(
...     'generators_t.p',
...     config=dict(auto_sort_datetime_index=False)
... )

Override platform-specific options::

>>> # Platform configs may have additional options
>>> # e.g., a config with timestamp conversion setting
>>> data = dataset.fetch(
...     'some_flag',
...     config=dict(convert_period_enum_to_datetime_index=False)
... )

Override study-specific options::

>>> # Study-specific configs can add custom behavior
>>> # e.g., toggle custom data corrections
>>> data = dataset.fetch(
...     'some_flag',
...     config=dict(apply_custom_correction=False)
... )

Using a config object::

>>> from mesqual.datasets import DatasetConfig
>>> custom_config = DatasetConfig(
...     use_database=False,
...     auto_sort_datetime_index=False
... )
>>> prices = dataset.fetch('buses_t.marginal_price', config=custom_config)
Source code in submodules/mesqual/mesqual/datasets/dataset.py
@flag_must_be_accepted
def fetch(self, flag: FlagType, config: dict | DatasetConfigType = None, **kwargs) -> pd.Series | pd.DataFrame:
    """
    Fetch data associated with a specific flag.

    This is the primary method for data access in MESQUAL datasets. It provides
    a unified interface for retrieving data regardless of the underlying source
    or dataset type. The method includes automatic caching, post-processing,
    and configuration management.

    Configuration Override Behavior:

        The ``config`` parameter allows fetch-time overrides of dataset behavior.
        These overrides are merged with the dataset's effective configuration
        (which combines class-level and instance-level settings). Only non-None
        values in the override will replace the existing settings.

    The configuration resolution hierarchy (later overrides earlier):

        1. Base config defaults
        2. Class config (via DatasetConfigManager)
        3. Instance config (passed to Dataset.__init__)
        4. **Fetch-time config (this parameter)**

    Args:
        flag: Data identifier flag (must be in accepted_flags)
        config: Optional configuration to override dataset defaults.
            Can be either:

            - **dict**: Quick way to override specific settings. Keys must
              match config attribute names (e.g., ``use_database``,
              ``auto_sort_datetime_index``). Platform-specific options
              are also supported if the dataset uses an extended config.

            - **DatasetConfig instance**: Full config object for type safety.
              Must be compatible with the dataset's config type.

        **kwargs: Additional keyword arguments passed to the underlying
            data fetching implementation

    Returns:
        DataFrame or Series containing the requested data

    Raises:
        ValueError: If the flag is not accepted by this dataset

    Examples:
        Basic usage::

            >>> prices = dataset.fetch('buses_t.marginal_price')

        Override base config options with a dict::

            >>> # Skip database cache for this fetch
            >>> prices = dataset.fetch(
            ...     'buses_t.marginal_price',
            ...     config=dict(use_database=False)
            ... )
            >>>
            >>> # Disable datetime sorting
            >>> prices = dataset.fetch(
            ...     'generators_t.p',
            ...     config=dict(auto_sort_datetime_index=False)
            ... )

        Override platform-specific options::

            >>> # Platform configs may have additional options
            >>> # e.g., a config with timestamp conversion setting
            >>> data = dataset.fetch(
            ...     'some_flag',
            ...     config=dict(convert_period_enum_to_datetime_index=False)
            ... )

        Override study-specific options::

            >>> # Study-specific configs can add custom behavior
            >>> # e.g., toggle custom data corrections
            >>> data = dataset.fetch(
            ...     'some_flag',
            ...     config=dict(apply_custom_correction=False)
            ... )

        Using a config object::

            >>> from mesqual.datasets import DatasetConfig
            >>> custom_config = DatasetConfig(
            ...     use_database=False,
            ...     auto_sort_datetime_index=False
            ... )
            >>> prices = dataset.fetch('buses_t.marginal_price', config=custom_config)
    """
    effective_config = self._prepare_config(config)
    use_database = self._database is not None and effective_config.use_database

    if use_database:
        if self._database.key_is_up_to_date(self, flag, config=effective_config, **kwargs):
            return self._database.get(self, flag, config=effective_config, **kwargs)

    raw_data = self._fetch(flag, effective_config, **kwargs)
    processed_data = self._post_process_data(raw_data, flag, effective_config)

    if use_database:
        self._database.set(self, flag, config=effective_config, value=processed_data, **kwargs)

    return processed_data.copy()
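The cache-aware control flow in fetch() — check the database, compute and post-process on a miss, store the result, and return a defensive copy — can be sketched independently of MESQUAL. SketchCache and fetch_with_cache below are hypothetical stand-ins for the Database and fetch() machinery:

```python
import pandas as pd


class SketchCache:
    """Dict-backed stand-in for the Database cache (illustrative only)."""
    def __init__(self):
        self._store = {}

    def is_up_to_date(self, key) -> bool:
        return key in self._store

    def get(self, key):
        return self._store[key]

    def set(self, key, value):
        self._store[key] = value


def fetch_with_cache(flag, compute, post_process, cache):
    """Mirror of the fetch() flow: cache hit -> return; miss -> compute, process, store."""
    if cache.is_up_to_date(flag):
        return cache.get(flag)
    processed = post_process(compute(flag))
    cache.set(flag, processed)
    return processed.copy()  # defensive copy, as in Dataset.fetch()


cache = SketchCache()
raw = lambda flag: pd.Series([3.0, 1.0, 2.0], index=[2, 0, 1], name=flag)
sort_index = lambda s: s.sort_index()  # stand-in for _post_process_data

first = fetch_with_cache('prices', raw, sort_index, cache)
second = fetch_with_cache('prices', raw, sort_index, cache)  # served from cache
print(first.tolist())  # [1.0, 2.0, 3.0]
```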

get_accepted_flags_containing_x

get_accepted_flags_containing_x(x: str, match_case: bool = False) -> set[FlagType]

Find all accepted flags containing a specific substring.

Useful for discovering related data flags or filtering flags by category.

Parameters:

  x (str, required): Substring to search for in flag names.
  match_case (bool, default False): If True, performs a case-sensitive search.

Returns:

  set[FlagType]: Set of accepted flags containing the substring

Example:

>>> ds = PyPSADataset()
>>> ds.get_accepted_flags_containing_x('generators')
    {'generators', 'generators_t.p', 'generators_t.efficiency', ...}
>>> ds.get_accepted_flags_containing_x('BUSES', match_case=True)
    set()  # Empty because case doesn't match
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def get_accepted_flags_containing_x(self, x: str, match_case: bool = False) -> set[FlagType]:
    """
    Find all accepted flags containing a specific substring.

    Useful for discovering related data flags or filtering flags by category.

    Args:
        x: Substring to search for in flag names
        match_case: If True, performs case-sensitive search. Default is False.

    Returns:
        Set of accepted flags containing the substring

    Example:

        >>> ds = PyPSADataset()
        >>> ds.get_accepted_flags_containing_x('generators')
            {'generators', 'generators_t.p', 'generators_t.efficiency', ...}
        >>> ds.get_accepted_flags_containing_x('BUSES', match_case=True)
            set()  # Empty because case doesn't match
    """
    if match_case:
        return {f for f in self.accepted_flags if x in str(f)}
    x_lower = x.lower()
    return {f for f in self.accepted_flags if x_lower in str(f).lower()}

flag_is_accepted

flag_is_accepted(flag: FlagType) -> bool

Boolean check whether a flag is accepted by the Dataset.

This method can be overridden in a subclass to accept flags based on custom logic rather than the explicit accepted_flags set.

Source code in submodules/mesqual/mesqual/datasets/dataset.py
def flag_is_accepted(self, flag: FlagType) -> bool:
    """
    Boolean check whether a flag is accepted by the Dataset.

    This method can be overridden in a subclass to accept flags
    based on custom logic rather than the explicit accepted_flags set.
    """
    return flag in self.accepted_flags
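An override replacing the set-membership check with rule-based logic might look like the following standalone sketch. SketchDataset and PrefixDataset are minimal hypothetical stand-ins, not the real Dataset base class:

```python
class SketchDataset:
    """Minimal stand-in with the default set-membership check."""
    accepted_flags = {'generators', 'generators_t.p'}

    def flag_is_accepted(self, flag) -> bool:
        return flag in self.accepted_flags


class PrefixDataset(SketchDataset):
    """Accepts any flag starting with a known prefix, instead of an explicit set."""
    prefixes = ('generators', 'buses')

    def flag_is_accepted(self, flag) -> bool:
        # str.startswith accepts a tuple of candidate prefixes
        return str(flag).startswith(self.prefixes)


ds = PrefixDataset()
print(ds.flag_is_accepted('buses_t.marginal_price'))  # True
print(ds.flag_is_accepted('links_t.p0'))              # False
```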

dotfetch

dotfetch() -> _DotNotationFetcher

Create a dot notation fetcher for intuitive flag access.

Returns a helper object that allows accessing nested data flags using Python attribute syntax instead of string-based flags. The fetcher accumulates attribute accesses and converts them to the appropriate flag when called.

Returns:

  _DotNotationFetcher: Helper object enabling chained attribute access

Example

Using dot notation instead of string flags::

>>> # Traditional string-based fetch
>>> prices = dataset.fetch('buses_t.marginal_price')

>>> # Equivalent dot notation fetch
>>> prices = dataset.dotfetch().buses_t.marginal_price()

>>> # Multi-level flag access
>>> gen_power = dataset.dotfetch().generators_t.p()
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def dotfetch(self) -> _DotNotationFetcher:
    """
    Create a dot notation fetcher for intuitive flag access.

    Returns a helper object that allows accessing nested data flags using
    Python attribute syntax instead of string-based flags. The fetcher
    accumulates attribute accesses and converts them to the appropriate
    flag when called.

    Returns:
        _DotNotationFetcher: Helper object enabling chained attribute access

    Example:
        Using dot notation instead of string flags::

            >>> # Traditional string-based fetch
            >>> prices = dataset.fetch('buses_t.marginal_price')

            >>> # Equivalent dot notation fetch
            >>> prices = dataset.dotfetch().buses_t.marginal_price()

            >>> # Multi-level flag access
            >>> gen_power = dataset.dotfetch().generators_t.p()
    """
    return _DotNotationFetcher(self)
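The described mechanism — accumulating attribute accesses and joining them into a dotted flag on call — can be sketched as follows. This is an assumption about how such a fetcher works internally, not the actual _DotNotationFetcher source:

```python
class DotFetcher:
    """Accumulates attribute names; __call__ joins them into a dotted flag."""

    def __init__(self, fetch_fn, parts=()):
        self._fetch_fn = fetch_fn
        self._parts = parts

    def __getattr__(self, name):
        # Each attribute access returns a new fetcher with one more path part
        return DotFetcher(self._fetch_fn, self._parts + (name,))

    def __call__(self, **kwargs):
        # Calling the chain converts the parts into a string flag and fetches
        return self._fetch_fn('.'.join(self._parts), **kwargs)


fetched = []
fetcher = DotFetcher(lambda flag: fetched.append(flag) or flag)
result = fetcher.buses_t.marginal_price()
print(result)  # 'buses_t.marginal_price'
```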

add_kpis_from_definitions

add_kpis_from_definitions(kpi_definitions: KPIDefinition | list[KPIDefinition])

Generate and add KPIs from one or more KPI definitions.

KPI definitions are templates that generate concrete KPI instances based on the dataset's structure. This method processes definitions and adds the resulting KPIs to the dataset's KPI collection.

Parameters:

  kpi_definitions (KPIDefinition | list[KPIDefinition], required): Single KPIDefinition or list of definitions. Each definition's generate_kpis() method is called with this dataset to produce KPI instances.
Example

Adding KPIs from definitions::

>>> from mesqual.kpis.definitions import TotalGenerationKPIDefinition
>>> dataset.add_kpis_from_definitions(TotalGenerationKPIDefinition())

>>> # Add multiple definitions at once
>>> definitions = [
...     TotalGenerationKPIDefinition(),
...     MarginalPriceKPIDefinition(),
... ]
>>> dataset.add_kpis_from_definitions(definitions)
See Also
  • add_kpi() - Add a single KPI directly
  • add_kpis() - Add multiple KPI instances
  • KPI Definitions - Base KPI definition class
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def add_kpis_from_definitions(self, kpi_definitions: KPIDefinition | list[KPIDefinition]):
    """
    Generate and add KPIs from one or more KPI definitions.

    KPI definitions are templates that generate concrete KPI instances
    based on the dataset's structure. This method processes definitions
    and adds the resulting KPIs to the dataset's KPI collection.

    Args:
        kpi_definitions: Single KPIDefinition or list of definitions.
            Each definition's generate_kpis() method is called with
            this dataset to produce KPI instances.

    Example:
        Adding KPIs from definitions::

            >>> from mesqual.kpis.definitions import TotalGenerationKPIDefinition
            >>> dataset.add_kpis_from_definitions(TotalGenerationKPIDefinition())

            >>> # Add multiple definitions at once
            >>> definitions = [
            ...     TotalGenerationKPIDefinition(),
            ...     MarginalPriceKPIDefinition(),
            ... ]
            >>> dataset.add_kpis_from_definitions(definitions)

    See Also:
        - `add_kpi()` - Add a single KPI directly
        - `add_kpis()` - Add multiple KPI instances
        - [KPI Definitions](../kpis/definitions/base.md) - Base KPI definition class
    """
    from mesqual.kpis.definitions.base import KPIDefinition
    if isinstance(kpi_definitions, KPIDefinition):
        kpis = kpi_definitions.generate_kpis(self)
        self.add_kpis(kpis)
    else:
        for kpi_def in kpi_definitions:
            kpis = kpi_def.generate_kpis(self)
            self.add_kpis(kpis)

add_kpis

add_kpis(kpis: Iterable[KPI])

Add multiple KPIs to this dataset's KPI collection.

Parameters:

  kpis (Iterable[KPI], required): Iterable of KPI instances, factories, or classes to add.
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def add_kpis(self, kpis: Iterable[KPI]):
    """
    Add multiple KPIs to this dataset's KPI collection.

    Args:
        kpis: Iterable of KPI instances, factories, or classes to add
    """
    duplicates = []
    for kpi in kpis:
        if kpi in self.kpi_collection:
            duplicates.append(kpi)
        else:
            self.add_kpi(kpi)
    if duplicates:
        _num_duplicates = len(duplicates)
        logger.warning(f'{_num_duplicates} duplicates found and not added again or overwritten in {self.name}. ({duplicates[:3]}...)')

add_kpi

add_kpi(kpi: KPI)

Add a single KPI to this dataset's KPI collection.

Parameters:

  kpi (KPI, required): KPI instance, factory, or class to add.
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def add_kpi(self, kpi: KPI):
    """
    Add a single KPI to this dataset's KPI collection.

    Args:
        kpi: KPI instance, factory, or class to add
    """
    self.kpi_collection.add(kpi)

clear_kpi_collection

clear_kpi_collection()

Clear the KPI collection.

Source code in submodules/mesqual/mesqual/datasets/dataset.py
def clear_kpi_collection(self):
    """Clear the KPI collection."""
    from mesqual.kpis.collection import KPICollection
    self.kpi_collection = KPICollection()

get_attributes_series

get_attributes_series() -> Series

Convert dataset attributes to a pandas Series.

Creates a Series with attribute names as the index and attribute values as data. The Series name is set to the dataset name, making it suitable for concatenation with other datasets' attribute series.

Returns:

  pd.Series: Series containing attribute values, indexed by attribute names, with the dataset name as the Series name

Example

Converting attributes and combining across datasets::

>>> dataset.set_attributes(year=2030, carbon_price=50.0)
>>> series = dataset.get_attributes_series()
>>> series
year            2030
carbon_price    50.0
Name: Scenario_A, dtype: object

>>> # Combine attributes from multiple datasets
>>> attr_df = pd.concat([d.get_attributes_series() for d in collection], axis=1).T
See Also
  • attributes - Access raw attributes dictionary
  • set_attributes() - Set attribute values
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def get_attributes_series(self) -> pd.Series:
    """
    Convert dataset attributes to a pandas Series.

    Creates a Series with attribute names as the index and attribute
    values as data. The Series name is set to the dataset name, making
    it suitable for concatenation with other datasets' attribute series.

    Returns:
        pd.Series: Series containing attribute values, indexed by attribute
            names, with the dataset name as the Series name

    Example:
        Converting attributes and combining across datasets::

            >>> dataset.set_attributes(year=2030, carbon_price=50.0)
            >>> series = dataset.get_attributes_series()
            >>> series
            year            2030
            carbon_price    50.0
            Name: Scenario_A, dtype: object

            >>> # Combine attributes from multiple datasets
            >>> attr_df = pd.concat([d.get_attributes_series() for d in collection], axis=1).T

    See Also:
        - `attributes` - Access raw attributes dictionary
        - `set_attributes()` - Set attribute values
    """
    att_series = pd.Series(self.attributes, name=self.name)
    return att_series

set_attributes

set_attributes(**kwargs)

Set one or more metadata attributes on this dataset.

Attributes are key-value pairs that store scenario metadata. They must use string keys and primitive values (bool, int, float, str) to ensure serializability and consistent comparison behavior.

Parameters:

  **kwargs: Attribute key-value pairs to set. Keys must be strings; values must be bool, int, float, or str.

Raises:

  TypeError: If any key is not a string
  TypeError: If any value is not bool, int, float, or str

Example

Setting scenario metadata::

>>> dataset.set_attributes(
...     year=2030,
...     scenario_type='high_renewable',
...     carbon_price=50.0,
...     includes_nuclear=True
... )

>>> # Access the attributes
>>> dataset.attributes['year']
2030
See Also
  • attributes - Access attributes dictionary
  • get_attributes_series() - Convert to pandas Series
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def set_attributes(self, **kwargs):
    """
    Set one or more metadata attributes on this dataset.

    Attributes are key-value pairs that store scenario metadata. They must
    use string keys and primitive values (bool, int, float, str) to ensure
    serializability and consistent comparison behavior.

    Args:
        **kwargs: Attribute key-value pairs to set. Keys must be strings,
            values must be bool, int, float, or str.

    Raises:
        TypeError: If any key is not a string
        TypeError: If any value is not bool, int, float, or str

    Example:
        Setting scenario metadata::

            >>> dataset.set_attributes(
            ...     year=2030,
            ...     scenario_type='high_renewable',
            ...     carbon_price=50.0,
            ...     includes_nuclear=True
            ... )

            >>> # Access the attributes
            >>> dataset.attributes['year']
            2030

    See Also:
        - `attributes` - Access attributes dictionary
        - `get_attributes_series()` - Convert to pandas Series
    """
    for key, value in kwargs.items():
        if not isinstance(key, str):
            raise TypeError(f'Attribute keys must be of type str. Your key {key} is of type {type(key)}.')
        if not isinstance(value, (bool, int, float, str)):
            raise TypeError(
            f'Attribute values must be of type (bool, int, float, str). '
                f'Your value for {key} ({value}) is of type {type(value)}.'
            )
        self._attributes[key] = value

required_flags_for_flag

required_flags_for_flag(flag: FlagType) -> set[FlagType]

Get the set of flags required to compute a given flag.

For derived or computed flags, this method returns the set of source flags that must be available to produce the requested data. This is useful for understanding data dependencies and ensuring prerequisite data exists.

Parameters:

  flag (FlagType, required): The flag to check requirements for. Must be in accepted_flags.

Returns:

  set[FlagType]: Set of flags that are required to compute the given flag. Returns an empty set if the flag has no dependencies.

Raises:

  ValueError: If the flag is not accepted by this dataset

Example

Checking data dependencies::

>>> # A derived flag might depend on multiple source flags
>>> deps = dataset.required_flags_for_flag('total_generation')
>>> deps
{'generators_t.p', 'generators'}
See Also
  • _required_flags_for_flag() - Abstract method to implement
  • flag_is_accepted() - Check if a flag is valid
Source code in submodules/mesqual/mesqual/datasets/dataset.py
@flag_must_be_accepted
def required_flags_for_flag(self, flag: FlagType) -> set[FlagType]:
    """
    Get the set of flags required to compute a given flag.

    For derived or computed flags, this method returns the set of source
    flags that must be available to produce the requested data. This is
    useful for understanding data dependencies and ensuring prerequisite
    data exists.

    Args:
        flag: The flag to check requirements for. Must be in accepted_flags.

    Returns:
        set[FlagType]: Set of flags that are required to compute the given flag.
            Returns an empty set if the flag has no dependencies.

    Raises:
        ValueError: If the flag is not accepted by this dataset

    Example:
        Checking data dependencies::

            >>> # A derived flag might depend on multiple source flags
            >>> deps = dataset.required_flags_for_flag('total_generation')
            >>> deps
            {'generators_t.p', 'generators'}

    See Also:
        - `_required_flags_for_flag()` - Abstract method to implement
        - `flag_is_accepted()` - Check if a flag is valid
    """
    return self._required_flags_for_flag(flag)

fetch_multiple_flags_and_concat

fetch_multiple_flags_and_concat(flags: Iterable[FlagType], concat_axis: int = 1, concat_level_name: str = 'variable', concat_level_at_top: bool = True, config: dict | DatasetConfigType = None, **kwargs) -> Union[Series, DataFrame]

Fetch multiple flags and concatenate results into a single DataFrame.

Convenience method for retrieving data from multiple flags and combining them into a single DataFrame with a MultiIndex. Useful for comparative analysis of multiple variables or creating wide-format data structures.

Parameters:

  flags (Iterable[FlagType], required): Iterable of flags to fetch and concatenate.
  concat_axis (int, default 1): Axis along which to concatenate (0=rows, 1=columns).
  concat_level_name (str, default 'variable'): Name for the new MultiIndex level identifying the source flag.
  concat_level_at_top (bool, default True): If True, the flag level is the outermost level in the MultiIndex; if False, it is moved to the innermost level.
  config (dict | DatasetConfigType, default None): Optional configuration override (see fetch() for details).
  **kwargs: Additional arguments passed to each fetch() call.

Returns:

  Series | DataFrame: Concatenated data with a MultiIndex identifying the source flag for each section

Example

Fetching and comparing multiple variables::

>>> # Fetch power output and efficiency for generators
>>> combined = dataset.fetch_multiple_flags_and_concat(
...     flags=['generators_t.p', 'generators_t.efficiency'],
...     concat_level_name='metric'
... )
>>> # Result has MultiIndex columns: (metric, generator_name)

>>> # Row-wise concatenation
>>> stacked = dataset.fetch_multiple_flags_and_concat(
...     flags=['bus_A_prices', 'bus_B_prices'],
...     concat_axis=0,
...     concat_level_name='bus'
... )
See Also
  • fetch() - Single flag data retrieval
  • fetch_filter_groupby_agg() - Fetch with filtering and aggregation
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def fetch_multiple_flags_and_concat(
        self,
        flags: Iterable[FlagType],
        concat_axis: int = 1,
        concat_level_name: str = 'variable',
        concat_level_at_top: bool = True,
        config: dict | DatasetConfigType = None,
        **kwargs
) -> Union[pd.Series, pd.DataFrame]:
    """
    Fetch multiple flags and concatenate results into a single DataFrame.

    Convenience method for retrieving data from multiple flags and combining
    them into a single DataFrame with a MultiIndex. Useful for comparative
    analysis of multiple variables or creating wide-format data structures.

    Args:
        flags: Iterable of flags to fetch and concatenate
        concat_axis: Axis along which to concatenate (0=rows, 1=columns).
            Default is 1 (columns).
        concat_level_name: Name for the new MultiIndex level identifying
            the source flag. Default is 'variable'.
        concat_level_at_top: If True, the flag level is the outermost level
            in the MultiIndex. If False, it's moved to the innermost level.
            Default is True.
        config: Optional configuration override (see fetch() for details)
        **kwargs: Additional arguments passed to each fetch() call

    Returns:
        DataFrame with concatenated data and a MultiIndex identifying the
        source flag for each section

    Example:
        Fetching and comparing multiple variables::

            >>> # Fetch power output and efficiency for generators
            >>> combined = dataset.fetch_multiple_flags_and_concat(
            ...     flags=['generators_t.p', 'generators_t.efficiency'],
            ...     concat_level_name='metric'
            ... )
            >>> # Result has MultiIndex columns: (metric, generator_name)

            >>> # Row-wise concatenation
            >>> stacked = dataset.fetch_multiple_flags_and_concat(
            ...     flags=['bus_A_prices', 'bus_B_prices'],
            ...     concat_axis=0,
            ...     concat_level_name='bus'
            ... )

    See Also:
        - `fetch()` - Single flag data retrieval
        - `fetch_filter_groupby_agg()` - Fetch with filtering and aggregation
    """
    dfs = {
        str(flag): self.fetch(flag, config, **kwargs)
        for flag in flags
    }
    df = pd.concat(
        dfs,
        axis=concat_axis,
        names=[concat_level_name],
    )
    if not concat_level_at_top:
        ax = df.axes[concat_axis]
        ax = ax.reorder_levels(list(range(1, ax.nlevels)) + [0])
        # df.axes returns a fresh list, so item assignment into it is a no-op;
        # assign to .index / .columns directly instead
        if concat_axis == 0:
            df.index = ax
        else:
            df.columns = ax
    return df
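The core pd.concat pattern used here — a dict of frames keyed by flag name, with names= labeling the new level — can be reproduced standalone. The frames and column names below are hypothetical illustrations:

```python
import pandas as pd

# Two hypothetical per-flag frames sharing a datetime index
idx = pd.date_range('2030-01-01', periods=3, freq='h')
p = pd.DataFrame({'gen_a': [1.0, 2.0, 3.0]}, index=idx)
eff = pd.DataFrame({'gen_a': [0.4, 0.5, 0.6]}, index=idx)

combined = pd.concat(
    {'generators_t.p': p, 'generators_t.efficiency': eff},
    axis=1,
    names=['variable'],  # name of the new outermost column level
)
print(combined.columns.names)  # ['variable', None]
print(combined[('generators_t.p', 'gen_a')].tolist())  # [1.0, 2.0, 3.0]
```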

fetch_filter_groupby_agg

fetch_filter_groupby_agg(flag: FlagType, model_filter_query: str = None, prop_groupby: str | list[str] = None, prop_groupby_agg: str = None, config: dict | DatasetConfigType = None, **kwargs) -> Series | DataFrame

Fetch data with model-based filtering, grouping, and aggregation.

Provides a powerful one-line method for common data analysis patterns: filter time series by model properties, group by categories, and aggregate results. Requires a flag index with model mappings.

Parameters:

  flag (FlagType, required): Data flag to fetch (must have a linked model flag).
  model_filter_query (str, default None): Pandas query string to filter based on model properties. Applied to the linked model DataFrame. Example: "carrier == 'solar'" or "p_nom > 100".
  prop_groupby (str | list[str], default None): Model property or list of properties to group by. Adds these as MultiIndex levels and groups the data. Example: 'carrier' or ['carrier', 'bus'].
  prop_groupby_agg (str, default None): Aggregation function to apply after grouping. Standard pandas aggregation strings like 'sum', 'mean', 'max'. Only used if prop_groupby is specified.
  config (dict | DatasetConfigType, default None): Optional configuration override (see fetch() for details).
  **kwargs: Additional arguments passed to fetch().

Returns:

  Series | DataFrame: Filtered and/or aggregated data. If prop_groupby is specified without prop_groupby_agg, returns a DataFrameGroupBy object.

Raises:

  RuntimeError: If the flag has no linked model flag in the flag index

Example

Common analysis patterns::

>>> # Filter generators to only solar, sum by carrier
>>> solar_gen = dataset.fetch_filter_groupby_agg(
...     'generators_t.p',
...     model_filter_query="carrier == 'solar'",
...     prop_groupby='carrier',
...     prop_groupby_agg='sum'
... )

>>> # Group all generation by carrier and bus
>>> by_carrier_bus = dataset.fetch_filter_groupby_agg(
...     'generators_t.p',
...     prop_groupby=['carrier', 'bus'],
...     prop_groupby_agg='sum'
... )

>>> # Filter to large generators only
>>> large_gens = dataset.fetch_filter_groupby_agg(
...     'generators_t.p',
...     model_filter_query="p_nom >= 500"
... )
See Also
  • fetch() - Basic data retrieval
  • Pandas Utils - Underlying filter/group utilities
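The filter-then-group-then-aggregate pattern maps onto plain pandas operations. The standalone sketch below uses hypothetical model and time-series frames (not MESQUAL objects) to mirror what model_filter_query, prop_groupby, and prop_groupby_agg do:

```python
import pandas as pd

# Hypothetical generator model table and per-generator time series
model = pd.DataFrame(
    {'carrier': ['solar', 'wind', 'solar'], 'p_nom': [100, 600, 50]},
    index=['g1', 'g2', 'g3'],
)
ts = pd.DataFrame(
    {'g1': [1.0, 2.0], 'g2': [5.0, 6.0], 'g3': [0.5, 0.5]},
    index=pd.date_range('2030-01-01', periods=2, freq='h'),
)

# 1) model_filter_query: keep only solar generators
solar_ids = model.query("carrier == 'solar'").index
filtered = ts[solar_ids]

# 2) prop_groupby + prop_groupby_agg: sum columns by carrier
by_carrier = filtered.T.groupby(model['carrier']).sum().T
print(by_carrier['solar'].tolist())  # [1.5, 2.5]
```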
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def fetch_filter_groupby_agg(
        self,
        flag: FlagType,
        model_filter_query: str = None,
        prop_groupby: str | list[str] = None,
        prop_groupby_agg: str = None,
        config: dict | DatasetConfigType = None,
        **kwargs
) -> pd.Series | pd.DataFrame:
    """
    Fetch data with model-based filtering, grouping, and aggregation.

    Provides a powerful one-line method for common data analysis patterns:
    filter time series by model properties, group by categories, and
    aggregate results. Requires a flag index with model mappings.

    Args:
        flag: Data flag to fetch (must have a linked model flag)
        model_filter_query: Pandas query string to filter based on model
            properties. Applied to the linked model DataFrame.
            Example: "carrier == 'solar'" or "p_nom > 100"
        prop_groupby: Model property or list of properties to group by.
            Adds these as MultiIndex levels and groups the data.
            Example: 'carrier' or ['carrier', 'bus']
        prop_groupby_agg: Aggregation function to apply after grouping.
            Standard pandas aggregation strings like 'sum', 'mean', 'max'.
            Only used if prop_groupby is specified.
        config: Optional configuration override (see fetch() for details)
        **kwargs: Additional arguments passed to fetch()

    Returns:
        Filtered and/or aggregated data. If prop_groupby is specified without
        prop_groupby_agg, returns a DataFrameGroupBy object.

    Raises:
        RuntimeError: If the flag has no linked model flag in the flag index

    Example:
        Common analysis patterns::

            >>> # Filter generators to only solar, sum by carrier
            >>> solar_gen = dataset.fetch_filter_groupby_agg(
            ...     'generators_t.p',
            ...     model_filter_query="carrier == 'solar'",
            ...     prop_groupby='carrier',
            ...     prop_groupby_agg='sum'
            ... )

            >>> # Group all generation by carrier and bus
            >>> by_carrier_bus = dataset.fetch_filter_groupby_agg(
            ...     'generators_t.p',
            ...     prop_groupby=['carrier', 'bus'],
            ...     prop_groupby_agg='sum'
            ... )

            >>> # Filter to large generators only
            >>> large_gens = dataset.fetch_filter_groupby_agg(
            ...     'generators_t.p',
            ...     model_filter_query="p_nom >= 500"
            ... )

    See Also:
        - `fetch()` - Basic data retrieval
        - [Pandas Utils](../utils/pandas_utils/index.md) - Underlying filter/group utilities
    """
    model_flag = self.flag_index.get_linked_model_flag(flag)
    if not model_flag:
        raise RuntimeError(f'FlagIndex could not successfully map flag {flag} to a model flag.')

    from mesqual.utils import pandas_utils

    data = self.fetch(flag, config, **kwargs)
    from mesqual.datasets import DatasetCollection
    if isinstance(self, DatasetCollection):
        # TODO: implement MultiIndex capabilities into filter_by_model_query / prepend_model_prop_levels, then fetch_merged is not needed
        model_df = self.fetch_merged(model_flag, config, **kwargs)
    else:
        model_df = self.fetch(model_flag, config, **kwargs)

    if model_filter_query:
        data = pandas_utils.filter_by_model_query(data, model_df, query=model_filter_query)

    if prop_groupby:
        if isinstance(prop_groupby, str):
            prop_groupby = [prop_groupby]
        data = pandas_utils.prepend_model_prop_levels(data, model_df, *prop_groupby)
        data = data.groupby(prop_groupby)
        if prop_groupby_agg:
            data = data.agg(prop_groupby_agg)
    elif prop_groupby_agg:
        logger.warning(
            f"You provided a prop_groupby_agg operation, but didn't provide prop_groupby. "
            f"No aggregation performed."
        )
    return data

get_flag_type classmethod

get_flag_type() -> Type[FlagType]

Get the flag type class for this dataset type.

Returns the type used for data flags in this dataset class. Subclasses can override to specify a custom flag type for type checking and validation.

Returns:

Type Description
Type[FlagType]

Type[FlagType]: The flag type class (default: FlagTypeProtocol)

Note

Override in subclasses that use custom flag types.
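A minimal sketch of that override pattern; `MyStudyFlag` and `MyStudyDataset` are hypothetical names, and a real subclass would inherit from `Dataset` rather than stand alone::

```python
from typing import Type


class MyStudyFlag(str):
    """Hypothetical flag type: a string subclass with a small helper."""

    def has_namespace(self) -> bool:
        # e.g. 'generators_t.p' has a namespace, 'objective' does not
        return '.' in self


class MyStudyDataset:  # in practice: inherits from mesqual.datasets.Dataset
    @classmethod
    def get_flag_type(cls) -> Type[MyStudyFlag]:
        return MyStudyFlag


flag_cls = MyStudyDataset.get_flag_type()
flag = flag_cls('generators_t.p')
```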

Source code in submodules/mesqual/mesqual/datasets/dataset.py
@classmethod
def get_flag_type(cls) -> Type[FlagType]:
    """
    Get the flag type class for this dataset type.

    Returns the type used for data flags in this dataset class. Subclasses
    can override to specify a custom flag type for type checking and
    validation.

    Returns:
        Type[FlagType]: The flag type class (default: FlagTypeProtocol)

    Note:
        Override in subclasses that use custom flag types.
    """
    from mesqual.flag.flag import FlagTypeProtocol
    return FlagTypeProtocol

get_flag_index_type classmethod

get_flag_index_type() -> Type[FlagIndexType]

Get the flag index type class for this dataset type.

Returns the type used for the flag index in this dataset class. Subclasses can override to specify a custom flag index implementation.

Returns:

Type Description
Type[FlagIndexType]

Type[FlagIndexType]: The flag index type class (default: FlagIndex)

Note

Override in subclasses that use platform-specific flag indices.

Source code in submodules/mesqual/mesqual/datasets/dataset.py
@classmethod
def get_flag_index_type(cls) -> Type[FlagIndexType]:
    """
    Get the flag index type class for this dataset type.

    Returns the type used for the flag index in this dataset class.
    Subclasses can override to specify a custom flag index implementation.

    Returns:
        Type[FlagIndexType]: The flag index type class (default: FlagIndex)

    Note:
        Override in subclasses that use platform-specific flag indices.
    """
    from mesqual.flag.flag_index import FlagIndex
    return FlagIndex

get_config_type classmethod

get_config_type() -> Type[DatasetConfigType]

Get the configuration type class for this dataset type.

Returns the DatasetConfig subclass used by this dataset. Platform interfaces typically override this to return their extended config class with platform-specific options.

Returns:

Type Description
Type[DatasetConfigType]

Type[DatasetConfigType]: The config type class (default: DatasetConfig)

Example

Creating a config instance for this dataset type::

>>> ConfigClass = MyDataset.get_config_type()
>>> config = ConfigClass(use_database=True)
Note

Override in platform dataset subclasses to return platform-specific config types with additional options.

Source code in submodules/mesqual/mesqual/datasets/dataset.py
@classmethod
def get_config_type(cls) -> Type[DatasetConfigType]:
    """
    Get the configuration type class for this dataset type.

    Returns the DatasetConfig subclass used by this dataset. Platform
    interfaces typically override this to return their extended config
    class with platform-specific options.

    Returns:
        Type[DatasetConfigType]: The config type class (default: DatasetConfig)

    Example:
        Creating a config instance for this dataset type::

            >>> ConfigClass = MyDataset.get_config_type()
            >>> config = ConfigClass(use_database=True)

    Note:
        Override in platform dataset subclasses to return platform-specific
        config types with additional options.
    """
    from mesqual.datasets.dataset_config import DatasetConfig
    return DatasetConfig

set_instance_config

set_instance_config(config: DatasetConfigType) -> None

Replace the instance-level configuration for this dataset.

Sets the configuration that will be merged with class-level defaults to produce the effective configuration used by fetch().

Parameters:

Name Type Description Default
config DatasetConfigType

New configuration object to use for this instance

required
Example

Setting a custom configuration::

>>> from mesqual.datasets import DatasetConfig
>>> config = DatasetConfig(use_database=False, auto_sort_datetime_index=False)
>>> dataset.set_instance_config(config)
See Also
  • instance_config - Get the effective configuration
  • set_instance_config_kwargs() - Update individual settings
  • set_class_config() - Set class-level defaults
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def set_instance_config(self, config: DatasetConfigType) -> None:
    """
    Replace the instance-level configuration for this dataset.

    Sets the configuration that will be merged with class-level defaults
    to produce the effective configuration used by fetch().

    Args:
        config: New configuration object to use for this instance

    Example:
        Setting a custom configuration::

            >>> from mesqual.datasets import DatasetConfig
            >>> config = DatasetConfig(use_database=False, auto_sort_datetime_index=False)
            >>> dataset.set_instance_config(config)

    See Also:
        - `instance_config` - Get the effective configuration
        - `set_instance_config_kwargs()` - Update individual settings
        - `set_class_config()` - Set class-level defaults
    """
    self._config = config

set_instance_config_kwargs

set_instance_config_kwargs(**kwargs) -> None

Update individual configuration settings on this instance.

Modifies specific attributes of the existing instance configuration without replacing the entire config object. Useful for tweaking individual settings.

Parameters:

Name Type Description Default
**kwargs

Configuration attribute names and values to set

{}
Example

Adjusting specific settings::

>>> dataset.set_instance_config_kwargs(
...     use_database=True,
...     auto_sort_datetime_index=False
... )
Warning

Raises AttributeError if the config attribute doesn't exist.

See Also
  • set_instance_config() - Replace entire configuration
  • instance_config - Get the effective configuration
Source code in submodules/mesqual/mesqual/datasets/dataset.py
def set_instance_config_kwargs(self, **kwargs) -> None:
    """
    Update individual configuration settings on this instance.

    Modifies specific attributes of the existing instance configuration
    without replacing the entire config object. Useful for tweaking
    individual settings.

    Args:
        **kwargs: Configuration attribute names and values to set

    Example:
        Adjusting specific settings::

            >>> dataset.set_instance_config_kwargs(
            ...     use_database=True,
            ...     auto_sort_datetime_index=False
            ... )

    Warning:
        Raises AttributeError if the config attribute doesn't exist.

    See Also:
        - `set_instance_config()` - Replace entire configuration
        - `instance_config` - Get the effective configuration
    """
    for key, value in kwargs.items():
        setattr(self._config, key, value)

set_class_config classmethod

set_class_config(config: DatasetConfigType) -> None

Set the class-level configuration for all instances of this dataset type.

Class-level configuration serves as the default for all instances of this class. Instance-level configuration (set via set_instance_config) can override these defaults.

Parameters:

Name Type Description Default
config DatasetConfigType

Configuration object to use as class-level defaults

required
Example

Setting defaults for all instances::

>>> from mesqual.datasets import DatasetConfig
>>> config = DatasetConfig(use_database=True)
>>> MyDataset.set_class_config(config)
>>>
>>> # All new instances will use database by default
>>> ds1 = MyDataset()  # uses database
>>> ds2 = MyDataset()  # uses database
Note

This affects all instances of the class, including existing ones that haven't overridden the setting at instance level.

See Also
  • set_instance_config() - Override for specific instances
  • DatasetConfigManager - Configuration management system
Source code in submodules/mesqual/mesqual/datasets/dataset.py
@classmethod
def set_class_config(cls, config: DatasetConfigType) -> None:
    """
    Set the class-level configuration for all instances of this dataset type.

    Class-level configuration serves as the default for all instances of
    this class. Instance-level configuration (set via set_instance_config)
    can override these defaults.

    Args:
        config: Configuration object to use as class-level defaults

    Example:
        Setting defaults for all instances::

            >>> from mesqual.datasets import DatasetConfig
            >>> config = DatasetConfig(use_database=True)
            >>> MyDataset.set_class_config(config)
            >>>
            >>> # All new instances will use database by default
            >>> ds1 = MyDataset()  # uses database
            >>> ds2 = MyDataset()  # uses database

    Note:
        This affects all instances of the class, including existing ones
        that haven't overridden the setting at instance level.

    See Also:
        - `set_instance_config()` - Override for specific instances
        - DatasetConfigManager - Configuration management system
    """
    from mesqual.datasets.dataset_config import DatasetConfigManager
    DatasetConfigManager.set_class_config(cls, config)

DatasetCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], Dataset[DatasetConfigType, FlagType, FlagIndexType], ABC

Abstract base class for collections of datasets.

DatasetCollection extends the Dataset interface to handle multiple child datasets while maintaining the same unified API. This enables complex hierarchical structures where collections themselves can be treated as datasets.

Key Features
  • Inherits all Dataset functionality
  • Manages collections of child datasets
  • Provides iteration and access methods
  • Aggregates accepted flags from all children
  • Supports KPI operations across all sub-datasets

Class Type Parameters:

Name Bound or Constraints Description Default
DatasetType

Type of datasets that can be collected

required
DatasetConfigType

Configuration class for dataset behavior

required
FlagType

Type used for data flag identification

required
FlagIndexType

Flag index implementation for flag mapping

required

Attributes:

Name Type Description
datasets list[DatasetType]

List of child datasets in this collection

Note

This class follows the "Everything is a Dataset" principle, allowing collections to be used anywhere a Dataset is expected.
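The composite behavior can be pictured with a small standalone sketch: a collection exposes the same fetch() interface as its children and unions their accepted flags. The `ToyDataset`/`ToyCollection` classes below are illustrative stand-ins, not MESQUAL classes::

```python
import pandas as pd


class ToyDataset:
    """Stand-in for a leaf Dataset: serves a fixed flag -> DataFrame mapping."""

    def __init__(self, name: str, tables: dict):
        self.name = name
        self._tables = tables

    @property
    def accepted_flags(self) -> set:
        return set(self._tables)

    def fetch(self, flag: str) -> pd.DataFrame:
        return self._tables[flag]


class ToyCollection(ToyDataset):
    """Stand-in for a DatasetCollection: same interface, holds child datasets."""

    def __init__(self, name: str, datasets: list):
        self.name = name
        self.datasets = datasets

    @property
    def accepted_flags(self) -> set:
        # Union of all child flags, mirroring DatasetCollection.accepted_flags
        return set().union(*(ds.accepted_flags for ds in self.datasets))

    def fetch(self, flag: str) -> pd.DataFrame:
        # Delegate to the first child that accepts the flag
        for ds in self.datasets:
            if flag in ds.accepted_flags:
                return ds.fetch(flag)
        raise KeyError(flag)


loads = ToyDataset('loads', {'loads_t.p': pd.DataFrame({'load1': [1.0]})})
gens = ToyDataset('gens', {'generators_t.p': pd.DataFrame({'gen1': [2.0]})})
collection = ToyCollection('scenarios', [loads, gens])
```

Because `ToyCollection` implements the same interface as `ToyDataset`, it can itself be nested inside another collection, which is the hierarchical structure the real classes enable.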

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    Dataset[DatasetConfigType, FlagType, FlagIndexType],
    ABC
):
    """
    Abstract base class for collections of datasets.

    DatasetCollection extends the Dataset interface to handle multiple child datasets
    while maintaining the same unified API. This enables complex hierarchical structures
    where collections themselves can be treated as datasets.

    Key Features:
        - Inherits all Dataset functionality
        - Manages collections of child datasets
        - Provides iteration and access methods
        - Aggregates accepted flags from all children
        - Supports KPI operations across all sub-datasets

    Type Parameters:
        DatasetType: Type of datasets that can be collected
        DatasetConfigType: Configuration class for dataset behavior
        FlagType: Type used for data flag identification
        FlagIndexType: Flag index implementation for flag mapping

    Attributes:
        datasets (list[DatasetType]): List of child datasets in this collection

    Note:
        This class follows the "Everything is a Dataset" principle, allowing
        collections to be used anywhere a Dataset is expected.
    """

    def __init__(
            self,
            datasets: list[DatasetType] = None,
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None
    ):
        super().__init__(
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self.datasets: list[DatasetType] = datasets if datasets else []

    @property
    def dataset_iterator(self) -> Iterator[DatasetType]:
        for ds in self.datasets:
            yield ds

    @property
    def flag_index(self) -> FlagIndex:
        from mesqual.flag.flag_index import EmptyFlagIndex
        if (self._flag_index is None) or isinstance(self._flag_index, EmptyFlagIndex):
            from mesqual.utils.check_all_same import all_same_object
            if all_same_object(ds.flag_index for ds in self.datasets) and len(self.datasets):
                return self.get_dataset().flag_index
        return self._flag_index

    @property
    def attributes(self) -> dict:
        child_dataset_atts = [ds.attributes for ds in self.datasets]
        attributes_that_all_childs_have_in_common = get_intersection_of_dicts(child_dataset_atts)
        return {**attributes_that_all_childs_have_in_common, **self._attributes.copy()}

    def get_merged_kpi_collection(self, deep: bool = True) -> KPICollection:
        """
        Merge KPI collections from all child datasets.

        This method collects KPIs from all child datasets' kpi_collection
        properties and returns a unified collection. Optionally recurses into
        nested DatasetCollections.

        Args:
            deep: If True, recursively merge from nested DatasetCollections

        Returns:
            KPICollection containing all KPIs from all child datasets

        Example:

            >>> # Create KPIs for all scenarios
            >>> study.scen: DatasetConcatCollection
            >>> study.scen.add_kpis_from_definitions_to_all_child_datasets(kpi_defs)
            >>>
            >>> # Get merged collection across all scenarios
            >>> all_kpis = study.scen.get_merged_kpi_collection()
            >>>
            >>> # Filter and export
            >>> mean_prices = all_kpis.filter_by(aggregation=Aggregations.Mean)
            >>> df = mean_prices.to_dataframe(unit_handling='auto_convert')
        """
        from mesqual.kpis.collection import KPICollection
        merged = KPICollection()

        for ds in self.datasets:
            # Add KPIs from this dataset
            merged.extend(ds.kpi_collection._kpis)

            # Recursively add from nested collections
            if deep and isinstance(ds, DatasetCollection):
                nested_merged = ds.get_merged_kpi_collection(deep=deep)
                merged.extend(nested_merged._kpis)

        return merged

    def add_kpis_from_definitions_to_all_child_datasets(self, kpi_definitions: KPIDefinition | list[KPIDefinition]):
        for ds in self.dataset_iterator:
            ds.add_kpis_from_definitions(kpi_definitions)

    def clear_kpi_collection_for_all_child_datasets(self, deep: bool = True):
        for ds in self.datasets:
            ds.clear_kpi_collection()
            if deep and isinstance(ds, DatasetCollection):
                ds.clear_kpi_collection_for_all_child_datasets(deep=deep)

    @abstractmethod
    def _fetch(
            self,
            flag: FlagType,
            effective_config: DatasetConfigType,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        pass

    def flag_is_accepted(self, flag: FlagType) -> bool:
        return any(ds.flag_is_accepted(flag) for ds in self.datasets)

    @property
    def accepted_flags(self) -> set[FlagType]:
        return nested_union([ds.accepted_flags for ds in self.datasets])

    def _required_flags_for_flag(self, flag: FlagType) -> set[FlagType]:
        return nested_union([ds.accepted_flags for ds in self.datasets])

    def get_dataset(self, key: str = None) -> DatasetType:
        if key is None:
            if not self.datasets:
                raise ValueError("No datasets available")
            return self.datasets[0]

        for ds in self.datasets:
            if ds.name == key:
                return ds

        raise KeyError(f"Dataset with name '{key}' not found")

    def add_datasets(self, datasets: Iterable[DatasetType]):
        for ds in datasets:
            self.add_dataset(ds)

    def add_dataset(self, dataset: DatasetType):
        if not isinstance(dataset, self.get_child_dataset_type()):
            raise TypeError(f"Can only add data sets of type {self.get_child_dataset_type().__name__}.")

        for i, existing in enumerate(self.datasets):
            if existing.name == dataset.name:
                logger.warning(
                    f"Dataset {self.name}: "
                    f"dataset {dataset.name} already in this collection. Replacing it."
                )
                self.datasets[i] = dataset
                return

        self.datasets.append(dataset)

    @classmethod
    def get_child_dataset_type(cls) -> type[DatasetType]:
        return Dataset

    def fetch_merged(
            self,
            flag: FlagType,
            config: dict | DatasetConfigType = None,
            keep_first: bool = True,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        """Fetch method that merges dataframes from all child datasets, similar to DatasetMergeCollection."""
        temp_merge_collection = self.get_merged_dataset_collection(keep_first)
        return temp_merge_collection.fetch(flag, config, **kwargs)

    def get_merged_dataset_collection(self, keep_first: bool = True) -> 'DatasetMergeCollection':
        return DatasetMergeCollection(
            datasets=self.datasets,
            name=f"{self.name} merged",
            keep_first=keep_first
        )

get_merged_kpi_collection

get_merged_kpi_collection(deep: bool = True) -> KPICollection

Merge KPI collections from all child datasets.

This method collects KPIs from all child datasets' kpi_collection properties and returns a unified collection. Optionally recurses into nested DatasetCollections.

Parameters:

Name Type Description Default
deep bool

If True, recursively merge from nested DatasetCollections

True

Returns:

Type Description
KPICollection

KPICollection containing all KPIs from all child datasets

Example:

>>> # Create KPIs for all scenarios
>>> study.scen: DatasetConcatCollection
>>> study.scen.add_kpis_from_definitions_to_all_child_datasets(kpi_defs)
>>>
>>> # Get merged collection across all scenarios
>>> all_kpis = study.scen.get_merged_kpi_collection()
>>>
>>> # Filter and export
>>> mean_prices = all_kpis.filter_by(aggregation=Aggregations.Mean)
>>> df = mean_prices.to_dataframe(unit_handling='auto_convert')
Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
def get_merged_kpi_collection(self, deep: bool = True) -> KPICollection:
    """
    Merge KPI collections from all child datasets.

    This method collects KPIs from all child datasets' kpi_collection
    properties and returns a unified collection. Optionally recurses into
    nested DatasetCollections.

    Args:
        deep: If True, recursively merge from nested DatasetCollections

    Returns:
        KPICollection containing all KPIs from all child datasets

    Example:

        >>> # Create KPIs for all scenarios
        >>> study.scen: DatasetConcatCollection
        >>> study.scen.add_kpis_from_definitions_to_all_child_datasets(kpi_defs)
        >>>
        >>> # Get merged collection across all scenarios
        >>> all_kpis = study.scen.get_merged_kpi_collection()
        >>>
        >>> # Filter and export
        >>> mean_prices = all_kpis.filter_by(aggregation=Aggregations.Mean)
        >>> df = mean_prices.to_dataframe(unit_handling='auto_convert')
    """
    from mesqual.kpis.collection import KPICollection
    merged = KPICollection()

    for ds in self.datasets:
        # Add KPIs from this dataset
        merged.extend(ds.kpi_collection._kpis)

        # Recursively add from nested collections
        if deep and isinstance(ds, DatasetCollection):
            nested_merged = ds.get_merged_kpi_collection(deep=deep)
            merged.extend(nested_merged._kpis)

    return merged

fetch_merged

fetch_merged(flag: FlagType, config: dict | DatasetConfigType = None, keep_first: bool = True, **kwargs) -> Series | DataFrame

Fetch method that merges dataframes from all child datasets, similar to DatasetMergeCollection.
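In plain pandas terms, the keep_first=True behavior can be sketched as taking overlapping columns from the earlier child and appending only the new columns from later children. This is a conceptual sketch under that assumption (the actual merge lives in DatasetMergeCollection; column names are made up)::

```python
import pandas as pd

# Results from two hypothetical child datasets for the same flag
first = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
second = pd.DataFrame({'b': [30, 40], 'c': [5, 6]})

# keep_first semantics: 'b' is taken from `first`; only 'c' is new in `second`
new_cols = second.columns.difference(first.columns)
merged = pd.concat([first, second[new_cols]], axis=1)
```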

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
def fetch_merged(
        self,
        flag: FlagType,
        config: dict | DatasetConfigType = None,
        keep_first: bool = True,
        **kwargs
) -> pd.Series | pd.DataFrame:
    """Fetch method that merges dataframes from all child datasets, similar to DatasetMergeCollection."""
    temp_merge_collection = self.get_merged_dataset_collection(keep_first)
    return temp_merge_collection.fetch(flag, config, **kwargs)

DatasetLinkCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

Links specialized flag interpreters into a unified platform dataset interface.

DatasetLinkCollection is the foundation for modular platform dataset architectures. It orchestrates multiple specialized interpreter datasets, each handling a specific subset of flags, and automatically routes fetch requests to the appropriate interpreter. This is NOT used for linking scenarios (use DatasetConcatCollection for that), but for linking interpreters within a single scenario/platform.

Architecture Pattern

Platform datasets (PyPSADataset, PlexosDataset, etc.) are typically implemented as DatasetLinkCollections containing specialized interpreters:

  • Core Platform Interpreters: Handle standard platform data

    • ModelInterpreter: Static model data (generators, buses, lines, etc.)
    • TimeSeriesInterpreter: Time-varying data (generators_t.p, buses_t.marginal_price)
    • ObjectiveInterpreter: Optimization objective values
    • ConstraintInterpreters: Shadow prices, binding constraints
  • Study-Specific Interpreters: Extend or override platform behavior

    • Custom variable interpreters: Derived metrics specific to the study
    • Correction interpreters: Override platform data with corrections
    • Integration interpreters: Combine external data sources
Key Features
  • Automatic Flag Routing: Fetches are routed to the interpreter that accepts the flag
  • Bidirectional Relationships: Each interpreter can access siblings via parent_dataset
  • Separation of Concerns: Each interpreter specializes in one aspect of the data
  • Study Extensibility: Add custom interpreters without modifying platform code
  • First-Match Routing: First interpreter accepting a flag handles it
  • Overlap Detection: Warns if multiple interpreters accept the same flag
Routing Logic
  1. User calls platform_dataset.fetch('some_flag')
  2. DatasetLinkCollection iterates through child interpreters in order
  3. Returns data from first interpreter where interpreter.flag_is_accepted('some_flag')
  4. If no interpreter accepts the flag, raises KeyError
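The steps above can be sketched in isolation; the `Interpreter` class and `route_fetch` helper are illustrative stand-ins for the real child datasets and routing logic::

```python
class Interpreter:
    """Stand-in for a specialized interpreter dataset."""

    def __init__(self, flags: set, payload: dict):
        self._flags = flags
        self._payload = payload

    def flag_is_accepted(self, flag: str) -> bool:
        return flag in self._flags

    def fetch(self, flag: str):
        return self._payload[flag]


def route_fetch(interpreters: list, flag: str):
    # First interpreter accepting the flag wins; order in the list matters
    for interp in interpreters:
        if interp.flag_is_accepted(flag):
            return interp.fetch(flag)
    raise KeyError(f'No interpreter accepts flag {flag!r}')


platform = Interpreter({'Line.flow_net'}, {'Line.flow_net': 'platform flows'})
override = Interpreter({'Line.flow_net'}, {'Line.flow_net': 'corrected flows'})
```

Placing the overriding interpreter earlier in the list is what makes the first-match rule usable as an override mechanism.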
Interpreter Communication

Interpreters access sibling data through the parent_dataset property:

  • self.parent_dataset.fetch('other_flag') - Fetch from any sibling
  • self.parent_dataset.get_dataset_by_type(InterpreterClass) - Access specific sibling
  • self.parent_dataset.attributes - Access shared dataset attributes
Example

Building a platform dataset with modular interpreters:

>>> # Standard platform dataset structure
>>> class PyPSADataset(DatasetLinkCollection):
...     def __init__(self, network, name=None):
...         interpreters = [
...             PyPSAModelInterpreter(network),      # Handles: 'generators', 'buses', 'lines'
...             PyPSATimeSeriesInterpreter(network),  # Handles: 'generators_t.p', 'buses_t.marginal_price'
...             PyPSAObjectiveInterpreter(network),   # Handles: 'objective', 'total_cost'
...         ]
...         super().__init__(datasets=interpreters, name=name)
...
...         # Set bidirectional parent-child links
...         for interpreter in interpreters:
...             interpreter.parent_dataset = self
>>>
>>> # Usage: transparent routing to correct interpreter
>>> dataset = PyPSADataset(network, name='base_case')
>>> buses = dataset.fetch('buses')                    # -> PyPSAModelInterpreter
>>> gen_p = dataset.fetch('generators_t.p')           # -> PyPSATimeSeriesInterpreter
>>> cost = dataset.fetch('objective')                 # -> PyPSAObjectiveInterpreter

Study-specific extension with custom interpreter:

>>> # Study extends platform dataset with custom variables
>>> class StudyDataset(PyPSADataset):
...     def __init__(self, network, name=None):
...         super().__init__(network, name)
...
...         # Add study-specific interpreter for derived metrics
...         custom_interpreter = RESGenerationInterpreter()
...         custom_interpreter.parent_dataset = self
...         self.add_dataset(custom_interpreter)
...
>>> # Custom interpreter accesses platform interpreters via parent
>>> class RESGenerationInterpreter(Dataset):
...     @property
...     def accepted_flags(self):
...         return {'generators_t.res_generation_total'}
...
...     def _fetch(self, flag, config, **kwargs):
...         # Access sibling interpreters through parent
...         gen_p = self.parent_dataset.fetch('generators_t.p')      # From TimeSeriesInterpreter
...         gen_model = self.parent_dataset.fetch('generators')       # From ModelInterpreter
...
...         # Calculate derived metric
...         res_gens = gen_model[gen_model['carrier'].isin(['solar', 'wind'])]
...         return gen_p[res_gens.index].sum(axis=1)

Study-specific override of platform variable:

>>> # Study corrects platform data for specific scenarios
>>> class CorrectedLineFlowsInterpreter(Dataset):
...     '''Override platform line flows with corrected external data.'''
...
...     @property
...     def accepted_flags(self):
...         return {'Line.flow_net'}  # Same flag as platform interpreter
...
...     def _fetch(self, flag, config, **kwargs):
...         # This interpreter is inserted BEFORE the platform interpreter,
...         # so it gets priority due to first-match routing
...
...         # Get original platform data from sibling
...         platform_interpreter = self.parent_dataset.get_dataset_by_type(
...             PlatformLineFlowInterpreter
...         )
...         flows = platform_interpreter.fetch(flag, config, **kwargs)
...
...         # Apply corrections for historical scenarios
...         if self.parent_dataset.attributes.get('replace_line_flow_with_custom_data'):
...             flows = self._replace_line_flow_with_custom_data(flows)
...
...         return flows
...
>>> # Add correction interpreter FIRST to override platform behavior
>>> study_dataset = StudyDataset(network)
>>> study_dataset.datasets.insert(0, CorrectedLineFlowsInterpreter())
Warning

If multiple child interpreters accept the same flag, only the FIRST one in the datasets list will handle it. The constructor logs warnings for such overlaps. This can be intentional (override pattern) or accidental.

To override a flag, add the overriding interpreter BEFORE the original interpreter in the datasets list.
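The first-match routing and the override pattern can be sketched with minimal stand-in classes (`Interpreter` and `LinkCollection` below are simplified illustrations, not the real MESQUAL API):

```python
class Interpreter:
    """Stand-in for a specialized flag interpreter."""
    def __init__(self, accepted, label):
        self.accepted_flags = set(accepted)
        self.label = label

    def flag_is_accepted(self, flag):
        return flag in self.accepted_flags

    def fetch(self, flag):
        return f'{self.label}:{flag}'


class LinkCollection:
    """Stand-in for DatasetLinkCollection's routing behavior."""
    def __init__(self, datasets):
        self.datasets = datasets

    def fetch(self, flag):
        # First-match routing: the first child accepting the flag wins
        for ds in self.datasets:
            if ds.flag_is_accepted(flag):
                return ds.fetch(flag)
        raise KeyError(f"Key '{flag}' not recognized by any of the linked Datasets.")


platform = Interpreter({'Line.flow_net'}, 'platform')
override = Interpreter({'Line.flow_net'}, 'corrected')

collection = LinkCollection([platform])
collection.datasets.insert(0, override)  # inserted FIRST -> takes priority
print(collection.fetch('Line.flow_net'))  # corrected:Line.flow_net
```

Without the `insert(0, ...)` step, the platform interpreter would handle the flag, since routing stops at the first accepting child.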

See Also
  • Dataset.parent_dataset - Property that child interpreters use to access parent
  • DatasetConcatCollection - For linking multiple scenarios (different use case)
  • get_dataset_by_type() - Method to access specific child interpreter by type
Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetLinkCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Links specialized flag interpreters into a unified platform dataset interface.

    DatasetLinkCollection is the foundation for modular platform dataset architectures.
    It orchestrates multiple specialized interpreter datasets, each handling a specific
    subset of flags, and automatically routes fetch requests to the appropriate
    interpreter. This is NOT used for linking scenarios (use DatasetConcatCollection
    for that), but for linking interpreters within a single scenario/platform.

    Architecture Pattern:
        Platform datasets (PyPSADataset, PlexosDataset, etc.) are typically
        implemented as DatasetLinkCollections containing specialized interpreters:

        - **Core Platform Interpreters**: Handle standard platform data
            - ModelInterpreter: Static model data (generators, buses, lines, etc.)
            - TimeSeriesInterpreter: Time-varying data (generators_t.p, buses_t.marginal_price)
            - ObjectiveInterpreter: Optimization objective values
            - ConstraintInterpreters: Shadow prices, binding constraints

        - **Study-Specific Interpreters**: Extend or override platform behavior
            - Custom variable interpreters: Derived metrics specific to the study
            - Correction interpreters: Override platform data with corrections
            - Integration interpreters: Combine external data sources

    Key Features:
        - **Automatic Flag Routing**: Fetches are routed to the interpreter that accepts the flag
        - **Bidirectional Relationships**: Each interpreter can access siblings via parent_dataset
        - **Separation of Concerns**: Each interpreter specializes in one aspect of the data
        - **Study Extensibility**: Add custom interpreters without modifying platform code
        - **First-Match Routing**: First interpreter accepting a flag handles it
        - **Overlap Detection**: Warns if multiple interpreters accept the same flag

    Routing Logic:
        1. User calls `platform_dataset.fetch('some_flag')`
        2. DatasetLinkCollection iterates through child interpreters in order
        3. Returns data from first interpreter where `interpreter.flag_is_accepted('some_flag')`
        4. If no interpreter accepts the flag, raises KeyError

    Interpreter Communication:
        Interpreters access sibling data through the parent_dataset property:

        - `self.parent_dataset.fetch('other_flag')` - Fetch from any sibling
        - `self.parent_dataset.get_dataset_by_type(InterpreterClass)` - Access specific sibling
        - `self.parent_dataset.attributes` - Access shared dataset attributes

    Example:
        Building a platform dataset with modular interpreters:

            >>> # Standard platform dataset structure
            >>> class PyPSADataset(DatasetLinkCollection):
            ...     def __init__(self, network, name=None):
            ...         interpreters = [
            ...             PyPSAModelInterpreter(network),      # Handles: 'generators', 'buses', 'lines'
            ...             PyPSATimeSeriesInterpreter(network),  # Handles: 'generators_t.p', 'buses_t.marginal_price'
            ...             PyPSAObjectiveInterpreter(network),   # Handles: 'objective', 'total_cost'
            ...         ]
            ...         super().__init__(datasets=interpreters, name=name)
            ...
            ...         # Set bidirectional parent-child links
            ...         for interpreter in interpreters:
            ...             interpreter.parent_dataset = self
            >>>
            >>> # Usage: transparent routing to correct interpreter
            >>> dataset = PyPSADataset(network, name='base_case')
            >>> buses = dataset.fetch('buses')                    # -> PyPSAModelInterpreter
            >>> gen_p = dataset.fetch('generators_t.p')           # -> PyPSATimeSeriesInterpreter
            >>> cost = dataset.fetch('objective')                 # -> PyPSAObjectiveInterpreter

        Study-specific extension with custom interpreter:

            >>> # Study extends platform dataset with custom variables
            >>> class StudyDataset(PyPSADataset):
            ...     def __init__(self, network, name=None):
            ...         super().__init__(network, name)
            ...
            ...         # Add study-specific interpreter for derived metrics
            ...         custom_interpreter = RESGenerationInterpreter()
            ...         custom_interpreter.parent_dataset = self
            ...         self.add_dataset(custom_interpreter)
            ...
            >>> # Custom interpreter accesses platform interpreters via parent
            >>> class RESGenerationInterpreter(Dataset):
            ...     @property
            ...     def accepted_flags(self):
            ...         return {'generators_t.res_generation_total'}
            ...
            ...     def _fetch(self, flag, config, **kwargs):
            ...         # Access sibling interpreters through parent
            ...         gen_p = self.parent_dataset.fetch('generators_t.p')      # From TimeSeriesInterpreter
            ...         gen_model = self.parent_dataset.fetch('generators')       # From ModelInterpreter
            ...
            ...         # Calculate derived metric
            ...         res_gens = gen_model[gen_model['carrier'].isin(['solar', 'wind'])]
            ...         return gen_p[res_gens.index].sum(axis=1)

        Study-specific override of platform variable:

            >>> # Study corrects platform data for specific scenarios
            >>> class CorrectedLineFlowsInterpreter(Dataset):
            ...     '''Override platform line flows with corrected external data.'''
            ...
            ...     @property
            ...     def accepted_flags(self):
            ...         return {'Line.flow_net'}  # Same flag as platform interpreter
            ...
            ...     def _fetch(self, flag, config, **kwargs):
            ...         # This interpreter is inserted BEFORE the platform interpreter,
            ...         # so it gets priority due to first-match routing
            ...
            ...         # Get original platform data from sibling
            ...         platform_interpreter = self.parent_dataset.get_dataset_by_type(
            ...             PlatformLineFlowInterpreter
            ...         )
            ...         flows = platform_interpreter.fetch(flag, config, **kwargs)
            ...
            ...         # Apply corrections for historical scenarios
            ...         if self.parent_dataset.attributes.get('replace_line_flow_with_custom_data'):
            ...             flows = self._replace_line_flow_with_custom_data(flows)
            ...
            ...         return flows
            ...
            >>> # Add correction interpreter FIRST to override platform behavior
            >>> study_dataset = StudyDataset(network)
            >>> study_dataset.datasets.insert(0, CorrectedLineFlowsInterpreter())

    Warning:
        If multiple child interpreters accept the same flag, only the FIRST one
        in the datasets list will handle it. The constructor logs warnings for
        such overlaps. This can be intentional (override pattern) or accidental.

        To override a flag, add the overriding interpreter BEFORE the original
        interpreter in the datasets list.

    See Also:
        - `Dataset.parent_dataset` - Property that child interpreters use to access parent
        - `DatasetConcatCollection` - For linking multiple scenarios (different use case)
        - `get_dataset_by_type()` - Method to access specific child interpreter by type
    """

    def __init__(
            self,
            datasets: list[DatasetType],
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
    ):
        super().__init__(
            datasets=datasets,
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self._warn_if_flags_overlap()

    def _fetch(self, flag: FlagType, effective_config: DatasetConfigType, **kwargs) -> pd.Series | pd.DataFrame:
        for ds in self.datasets:
            if ds.flag_is_accepted(flag):
                return ds.fetch(flag, effective_config, **kwargs)
        raise KeyError(f"Key '{flag}' not recognized by any of the linked Datasets.")

    def _warn_if_flags_overlap(self):
        from collections import Counter

        accepted_flags = list()
        for ds in self.datasets:
            accepted_flags += list(ds.accepted_flags)

        counts = Counter(accepted_flags)
        duplicates = {k: v for k, v in counts.items() if v > 1}
        if duplicates:
            logger.warning(
                f"Dataset {self.name}: "
                f"The following flags have multiple Dataset sources: {list(duplicates)}. \n"
                f"Only the first one will be used! This might lead to unexpected behavior. \n"
                f"A potential reason could be the use of an inappropriate DatasetCollection type."
            )

    def get_dataset_by_type(self, ds_type: type[Dataset]) -> DatasetType:
        """Returns instance of child dataset that matches the ds_type."""
        for ds in self.datasets:
            if isinstance(ds, ds_type):
                return ds
        raise KeyError(f'No Dataset of type {ds_type.__name__} found in {self.name}.')

get_dataset_by_type

get_dataset_by_type(ds_type: type[Dataset]) -> DatasetType

Returns instance of child dataset that matches the ds_type.

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
def get_dataset_by_type(self, ds_type: type[Dataset]) -> DatasetType:
    """Returns instance of child dataset that matches the ds_type."""
    for ds in self.datasets:
        if isinstance(ds, ds_type):
            return ds
    raise KeyError(f'No Dataset of type {ds_type.__name__} found in {self.name}.')

DatasetMergeCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

Fetch method will merge fragmented Datasets for the same flag, e.g.:
  • fragmented simulation runs, e.g. CW1, CW2, CW3, CWn.
  • fragmented data sources, e.g. a mapping from an Excel file combined with the model from a simulation platform.
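The keep_first merge semantics can be approximated with plain pandas. `combine_dfs` is MESQUAL-internal, so `DataFrame.combine_first` stands in here, and the frames are illustrative:

```python
import pandas as pd

# Two fragmented runs covering overlapping index ranges
cw1 = pd.DataFrame({'price': [10.0, 11.0]}, index=[0, 1])
cw2 = pd.DataFrame({'price': [99.0, 12.0, 13.0]}, index=[1, 2, 3])

# keep_first=True behavior: earlier datasets win on overlapping cells,
# later datasets only fill the gaps
merged = cw1.combine_first(cw2)
print(merged['price'].tolist())  # [10.0, 11.0, 12.0, 13.0]
```

Note that the overlapping cell at index 1 keeps `cw1`'s value (11.0); `cw2`'s 99.0 is discarded.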

Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetMergeCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Fetch method will merge fragmented Datasets for the same flag, e.g.:
        - fragmented simulation runs, e.g. CW1, CW2, CW3, CWn.
        - fragmented data sources, e.g. a mapping from an Excel file combined with the model from a simulation platform.
    """
    def __init__(
            self,
            datasets: list[DatasetType],
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
            keep_first: bool = True,
    ):
        super().__init__(
            datasets=datasets,
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self.keep_first = keep_first

    def _fetch(self, flag: FlagType, effective_config: DatasetConfigType, **kwargs) -> pd.Series | pd.DataFrame:
        data_frames = []
        for ds in self.datasets:
            if ds.flag_is_accepted(flag):
                data_frames.append(ds.fetch(flag, effective_config, **kwargs))

        if not data_frames:
            raise KeyError(f"Flag '{flag}' not recognized by any of the datasets.")

        from mesqual.utils.pandas_utils.combine_df import combine_dfs
        df = combine_dfs(data_frames, keep_first=self.keep_first)
        return df

DatasetConcatCollection

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

Concatenates data from multiple datasets with MultiIndex structure.

DatasetConcatCollection is fundamental to MESQUAL's multi-scenario analysis capabilities. It fetches the same flag from multiple child datasets and concatenates the results into a single DataFrame/Series with an additional index level identifying the source dataset.

Key Features
  • Automatic MultiIndex creation with dataset names
  • Configurable concatenation axis and level positioning
  • Preserves all dimensional relationships
  • Supports scenario and comparison collections
  • Enables unified analysis across multiple datasets
MultiIndex Structure

The resulting data structure includes an additional index level (typically named 'dataset') that identifies the source dataset for each data point.
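In plain pandas terms, the concatenation amounts to the following (scenario names and values here are illustrative):

```python
import pandas as pd

base = pd.DataFrame({'DE': [50.0], 'FR': [45.0]}).rename_axis('Bus', axis=1)
high_res = pd.DataFrame({'DE': [30.0], 'FR': [35.0]}).rename_axis('Bus', axis=1)

# Concatenate along columns, adding a 'dataset' level on top
prices = pd.concat({'base': base, 'high_res': high_res}, axis=1, names=['dataset'])
print(prices.columns.names)                   # ['dataset', 'Bus']
print(float(prices[('base', 'DE')].iloc[0]))  # 50.0
```

Selecting `prices['base']` then drops the `dataset` level and recovers the original single-scenario frame.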

Example:

>>> # Collection of scenario datasets
>>> scenarios = DatasetConcatCollection([
...     PyPSADataset(base_network, name='base'),
...     PyPSADataset(high_res_network, name='high_res'),
...     PyPSADataset(low_gas_network, name='low_gas')
... ])
>>> 
>>> # Fetch creates MultiIndex DataFrame
>>> prices = scenarios.fetch('buses_t.marginal_price')
>>> print(prices.columns.names)
    ['dataset', 'Bus']  # Original Bus index + dataset level
>>> 
>>> # Access specific scenario data
>>> base_prices = prices['base']
>>> 
>>> # Analyze across scenarios
>>> mean_prices = prices.mean()  # Mean across all scenarios
Source code in submodules/mesqual/mesqual/datasets/dataset_collection.py
class DatasetConcatCollection(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Concatenates data from multiple datasets with MultiIndex structure.

    DatasetConcatCollection is fundamental to MESQUAL's multi-scenario analysis
    capabilities. It fetches the same flag from multiple child datasets and
    concatenates the results into a single DataFrame/Series with an additional
    index level identifying the source dataset.

    Key Features:
        - Automatic MultiIndex creation with dataset names
        - Configurable concatenation axis and level positioning  
        - Preserves all dimensional relationships
        - Supports scenario and comparison collections
        - Enables unified analysis across multiple datasets

    MultiIndex Structure:
        The resulting data structure includes an additional index level
        (typically named 'dataset') that identifies the source dataset
        for each data point.

    Example:

        >>> # Collection of scenario datasets
        >>> scenarios = DatasetConcatCollection([
        ...     PyPSADataset(base_network, name='base'),
        ...     PyPSADataset(high_res_network, name='high_res'),
        ...     PyPSADataset(low_gas_network, name='low_gas')
        ... ])
        >>> 
        >>> # Fetch creates MultiIndex DataFrame
        >>> prices = scenarios.fetch('buses_t.marginal_price')
        >>> print(prices.columns.names)
            ['dataset', 'Bus']  # Original Bus index + dataset level
        >>> 
        >>> # Access specific scenario data
        >>> base_prices = prices['base']
        >>> 
        >>> # Analyze across scenarios
        >>> mean_prices = prices.mean()  # Mean across all scenarios
    """
    DEFAULT_CONCAT_LEVEL_NAME = 'dataset'
    DEFAULT_ATT_LEVEL_NAME = 'attribute'

    def __init__(
            self,
            datasets: list[DatasetType],
            name: str = None,
            parent_dataset: Dataset = None,
            flag_index: FlagIndex = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
            default_concat_axis: int = 1,
            concat_top: bool = True,
            concat_level_name: str = None,
    ):
        super().__init__(
            datasets=datasets,
            name=name,
            parent_dataset=parent_dataset,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        self.default_concat_axis = default_concat_axis
        self.concat_top = concat_top
        self.concat_level_name = concat_level_name or self.DEFAULT_CONCAT_LEVEL_NAME

    def get_attributes_concat_df(self) -> pd.DataFrame:
        if all(isinstance(ds, DatasetConcatCollection) for ds in self.datasets):
            use_att_df_instead_of_series = True
        else:
            use_att_df_instead_of_series = False

        atts_per_dataset = dict()
        for ds in self.datasets:
            atts = ds.get_attributes_concat_df().T if use_att_df_instead_of_series else ds.get_attributes_series()
            atts_per_dataset[ds.name] = atts

        return pd.concat(
            atts_per_dataset,
            axis=1,
            names=[self.concat_level_name]
        ).rename_axis(self.DEFAULT_ATT_LEVEL_NAME).T

    def _fetch(
            self,
            flag: FlagType,
            effective_config: DatasetConfigType,
            concat_axis: int = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        if concat_axis is None:
            concat_axis = self.default_concat_axis

        dfs = {}
        for ds in self.datasets:
            if ds.flag_is_accepted(flag):
                dfs[ds.name] = ds.fetch(flag, effective_config, **kwargs)

        if not dfs:
            raise KeyError(f"Flag '{flag}' not recognized by any of the datasets in {type(self)} {self.name}.")

        df0 = list(dfs.values())[0]
        if not all(len(df.axes) == len(df0.axes) for df in dfs.values()):
            raise NotImplementedError('Axes lengths do not match between dfs.')

        for ax in range(len(df0.axes)):
            if not all(set(df.axes[ax].names) == set(df0.axes[ax].names) for df in dfs.values()):
                raise NotImplementedError('Axes names do not match between dfs.')

        df = pd.concat(dfs, join='outer', axis=concat_axis, names=[self.concat_level_name])

        if not self.concat_top:
            # Note: df.axes returns a fresh list, so item assignment would be a
            # silent no-op; use reorder_levels to actually move the level.
            ax = df.axes[concat_axis]
            new_order = [ax.nlevels - 1] + list(range(ax.nlevels - 1))
            if isinstance(df, pd.DataFrame):
                df = df.reorder_levels(new_order, axis=concat_axis)
            else:
                df = df.reorder_levels(new_order)

        return df

DatasetComparison

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]

Computes and provides access to differences between two datasets.

DatasetComparison is a core component of MESQUAL's scenario comparison capabilities. It automatically calculates deltas, ratios, or side-by-side comparisons between a variation dataset and a reference dataset, enabling systematic analysis of scenario differences.

Key Features
  • Automatic delta computation between datasets
  • Multiple comparison types (DELTA, VARIATION, BOTH)
  • Handles numeric and non-numeric data appropriately
  • Preserves data structure and index relationships
  • Configurable unchanged value handling
  • Inherits full Dataset interface
Comparison Types
  • DELTA: Variation - Reference (default)
  • VARIATION: Returns variation data with optional NaN for unchanged values
  • BOTH: Side-by-side variation and reference data
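The DELTA computation and the unchanged-to-NaN option can be sketched with plain pandas (frames and values are illustrative; the real `fetch()` retrieves both frames from the child datasets first):

```python
import pandas as pd

reference = pd.DataFrame({'price': [50.0, 45.0]}, index=['DE', 'FR'])
variation = pd.DataFrame({'price': [30.0, 45.0]}, index=['DE', 'FR'])

# DELTA: variation - reference
delta = variation - reference
print(delta['price'].tolist())  # [-20.0, 0.0]

# replace_unchanged_values_by_nan: keep only cells that actually changed
changes = variation.where(variation != reference)
print(changes['price'].isna().tolist())  # [False, True]
```

Here FR is unchanged between the scenarios, so its delta is 0.0 and it is masked to NaN when only changes should be highlighted.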

Attributes:
  • variation_dataset: The dataset representing the scenario being compared
  • reference_dataset: The dataset representing the baseline for comparison

Example:

>>> # Compare high renewable scenario to base case
>>> comparison = DatasetComparison(
...     variation_dataset=high_res_dataset,
...     reference_dataset=base_dataset
... )
>>> 
>>> # Get price differences
>>> price_deltas = comparison.fetch('buses_t.marginal_price')
>>> 
>>> # Get both datasets side-by-side (often used to show model changes)
>>> price_both = comparison.fetch('buses', comparison_type=ComparisonTypeEnum.BOTH)
>>> 
>>> # Highlight only changes (often used to show model changes)
>>> price_changes = comparison.fetch('buses', replace_unchanged_values_by_nan=True)
Source code in submodules/mesqual/mesqual/datasets/dataset_comparison.py
class DatasetComparison(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType]
):
    """
    Computes and provides access to differences between two datasets.

    DatasetComparison is a core component of MESQUAL's scenario comparison capabilities.
    It automatically calculates deltas, ratios, or side-by-side comparisons between
    a variation dataset and a reference dataset, enabling systematic analysis of
    scenario differences.

    Key Features:
        - Automatic delta computation between datasets
        - Multiple comparison types (DELTA, VARIATION, BOTH)
        - Handles numeric and non-numeric data appropriately
        - Preserves data structure and index relationships
        - Configurable unchanged value handling
        - Inherits full Dataset interface

    Comparison Types:
        - DELTA: Variation - Reference (default)
        - VARIATION: Returns variation data with optional NaN for unchanged values
        - BOTH: Side-by-side variation and reference data

    Attributes:
        variation_dataset: The dataset representing the scenario being compared
        reference_dataset: The dataset representing the baseline for comparison

    Example:

        >>> # Compare high renewable scenario to base case
        >>> comparison = DatasetComparison(
        ...     variation_dataset=high_res_dataset,
        ...     reference_dataset=base_dataset
        ... )
        >>> 
        >>> # Get price differences
        >>> price_deltas = comparison.fetch('buses_t.marginal_price')
        >>> 
        >>> # Get both datasets side-by-side (often used to show model changes)
        >>> price_both = comparison.fetch('buses', comparison_type=ComparisonTypeEnum.BOTH)
        >>> 
        >>> # Highlight only changes (often used to show model changes)
        >>> price_changes = comparison.fetch('buses', replace_unchanged_values_by_nan=True)
    """
    COMPARISON_ATTRIBUTES_SOURCE = ComparisonAttributesSourceEnum.USE_VARIATION_ATTS
    COMPARISON_NAME_JOIN = ' vs '
    VARIATION_DS_ATT_KEY = 'variation_dataset'
    REFERENCE_DS_ATT_KEY = 'reference_dataset'

    def __init__(
            self,
            variation_dataset: Dataset,
            reference_dataset: Dataset,
            name: str = None,
            attributes: dict = None,
            config: DatasetConfigType = None,
    ):
        name = name or self._get_auto_generated_name(variation_dataset, reference_dataset)

        super().__init__(
            [reference_dataset, variation_dataset],
            name=name,
            attributes=attributes,
            config=config
        )

        self.variation_dataset = variation_dataset
        self.reference_dataset = reference_dataset

    def _get_auto_generated_name(self, variation_dataset: Dataset, reference_dataset: Dataset) -> str:
        return variation_dataset.name + self.COMPARISON_NAME_JOIN + reference_dataset.name

    @property
    def attributes(self) -> dict:
        match self.COMPARISON_ATTRIBUTES_SOURCE:
            case ComparisonAttributesSourceEnum.USE_VARIATION_ATTS:
                atts = self.variation_dataset.attributes.copy()
            case ComparisonAttributesSourceEnum.USE_REFERENCE_ATTS:
                atts = self.reference_dataset.attributes.copy()
            case _:
                atts = super().attributes
        atts[self.VARIATION_DS_ATT_KEY] = self.variation_dataset.name
        atts[self.REFERENCE_DS_ATT_KEY] = self.reference_dataset.name
        return atts

    def fetch(
            self,
            flag: FlagType,
            config: dict | DatasetConfigType = None,
            comparison_type: ComparisonTypeEnum = ComparisonTypeEnum.DELTA,
            replace_unchanged_values_by_nan: bool = False,
            fill_value: float | int | None = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        """
        Fetch comparison data between variation and reference datasets.

        Extends the base Dataset.fetch() method with comparison-specific parameters
        for controlling how the comparison is computed and formatted.

        Args:
            flag: Data identifier flag to fetch from both datasets
            config: Optional configuration overrides
            comparison_type: How to compare the datasets:
                - DELTA: variation - reference (default)
                - VARIATION: variation data only, optionally with NaN for unchanged
                - BOTH: concatenated variation and reference data
            replace_unchanged_values_by_nan: If True, replaces values that are
                identical between datasets with NaN (useful for highlighting changes)
            fill_value: Value to use for missing data in subtraction operations
            **kwargs: Additional arguments passed to child dataset fetch methods

        Returns:
            DataFrame or Series with comparison results

        Example:

            >>> # Basic delta comparison
            >>> deltas = comparison.fetch('buses_t.marginal_price')
            >>> 
            >>> # Highlight only changed values
            >>> changes_only = comparison.fetch(
            ...     'buses_t.marginal_price',
            ...     replace_unchanged_values_by_nan=True
            ... )
            >>> 
            >>> # Side-by-side comparison
            >>> both = comparison.fetch(
            ...     'buses_t.marginal_price',
            ...     comparison_type=ComparisonTypeEnum.BOTH
            ... )
        """
        return super().fetch(
            flag=flag,
            config=config,
            comparison_type=comparison_type,
            replace_unchanged_values_by_nan=replace_unchanged_values_by_nan,
            fill_value=fill_value,
            **kwargs
        )

    def _fetch(
            self,
            flag: FlagType,
            effective_config: DatasetConfigType,
            comparison_type: ComparisonTypeEnum = ComparisonTypeEnum.DELTA,
            replace_unchanged_values_by_nan: bool = False,
            fill_value: float | int | None = None,
            **kwargs
    ) -> pd.Series | pd.DataFrame:
        df_var = self.variation_dataset.fetch(flag, effective_config, **kwargs)
        df_ref = self.reference_dataset.fetch(flag, effective_config, **kwargs)

        match comparison_type:
            case ComparisonTypeEnum.VARIATION:
                return self._get_variation_comparison(df_var, df_ref, replace_unchanged_values_by_nan)
            case ComparisonTypeEnum.BOTH:
                return self._get_both_comparison(df_var, df_ref, replace_unchanged_values_by_nan)
            case ComparisonTypeEnum.DELTA:
                return self._get_delta_comparison(df_var, df_ref, replace_unchanged_values_by_nan, fill_value)
        raise ValueError(f"Unsupported comparison_type: {comparison_type}")

    def _values_are_equal(self, val1, val2) -> bool:
        if pd.isna(val1) and pd.isna(val2):
            return True
        try:
            return bool(val1 == val2)
        except Exception:
            pass
        try:
            return str(val1) == str(val2)
        except Exception:
            pass
        return False

    def _get_variation_comparison(
            self,
            df_var: pd.DataFrame,
            df_ref: pd.DataFrame,
            replace_unchanged_values_by_nan: bool
    ) -> pd.DataFrame:
        result = df_var.copy()

        if replace_unchanged_values_by_nan:
            common_indices = df_var.index.intersection(df_ref.index)
            common_columns = df_var.columns.intersection(df_ref.columns)

            for idx in common_indices:
                for col in common_columns:
                    if self._values_are_equal(df_var.loc[idx, col], df_ref.loc[idx, col]):
                        result.loc[idx, col] = float('nan')

        return result

    def _get_both_comparison(
            self,
            df_var: pd.DataFrame,
            df_ref: pd.DataFrame,
            replace_unchanged_values_by_nan: bool
    ) -> pd.DataFrame:
        var_name = self.variation_dataset.name
        ref_name = self.reference_dataset.name

        result = pd.concat([df_var, df_ref], keys=[var_name, ref_name])
        result = result.sort_index(level=1)

        if replace_unchanged_values_by_nan:
            common_indices = df_var.index.intersection(df_ref.index)
            common_columns = df_var.columns.intersection(df_ref.columns)

            for idx in common_indices:
                for col in common_columns:
                    if self._values_are_equal(df_var.loc[idx, col], df_ref.loc[idx, col]):
                        result.loc[(var_name, idx), col] = float('nan')
                        result.loc[(ref_name, idx), col] = float('nan')

        return result

    def _get_delta_comparison(
            self,
            df_var: pd.DataFrame,
            df_ref: pd.DataFrame,
            replace_unchanged_values_by_nan: bool,
            fill_value: float | int | None
    ) -> pd.DataFrame:
        if pd_is_numeric(df_var) and pd_is_numeric(df_ref):
            result = df_var.subtract(df_ref, fill_value=fill_value)

            if replace_unchanged_values_by_nan:
                result = result.replace(0, float('nan'))

            return result

        all_columns = df_var.columns.union(df_ref.columns)
        all_indices = df_var.index.union(df_ref.index)

        result = pd.DataFrame(index=all_indices, columns=all_columns)

        for col in all_columns:
            if col in df_var.columns and col in df_ref.columns:
                var_col = df_var[col]
                ref_col = df_ref[col]

                # Special handling for boolean columns
                if pd.api.types.is_bool_dtype(var_col) and pd.api.types.is_bool_dtype(ref_col):
                    # For booleans, we can mark where they differ
                    common_indices = var_col.index.intersection(ref_col.index)
                    # object dtype: string markers are mixed with raw boolean values
                    delta = pd.Series(index=all_indices, dtype=object)

                    for idx in common_indices:
                        if var_col.loc[idx] != ref_col.loc[idx]:
                            delta.loc[idx] = f"{var_col.loc[idx]} (was {ref_col.loc[idx]})"
                        elif not replace_unchanged_values_by_nan:
                            delta.loc[idx] = var_col.loc[idx]

                    # Handle indices only in variation
                    for idx in var_col.index.difference(ref_col.index):
                        delta.loc[idx] = f"{var_col.loc[idx]} (new)"

                    # Handle indices only in reference
                    for idx in ref_col.index.difference(var_col.index):
                        delta.loc[idx] = f"DELETED: {ref_col.loc[idx]}"

                    result[col] = delta

                elif pd.api.types.is_numeric_dtype(var_col) and pd.api.types.is_numeric_dtype(ref_col):
                    delta = var_col.subtract(ref_col, fill_value=fill_value)
                    result[col] = delta

                    if replace_unchanged_values_by_nan:
                        result.loc[delta == 0, col] = float('nan')
                else:
                    common_indices = var_col.index.intersection(ref_col.index)
                    var_only_indices = var_col.index.difference(ref_col.index)
                    ref_only_indices = ref_col.index.difference(var_col.index)

                    for idx in common_indices:
                        if not self._values_are_equal(var_col.loc[idx], ref_col.loc[idx]):
                            result.loc[idx, col] = f"{var_col.loc[idx]} (was {ref_col.loc[idx]})"
                        elif not replace_unchanged_values_by_nan:
                            result.loc[idx, col] = var_col.loc[idx]

                    for idx in var_only_indices:
                        result.loc[idx, col] = f"{var_col.loc[idx]} (new)"

                    for idx in ref_only_indices:
                        val = ref_col.loc[idx]
                        if not pd.isna(val):
                            result.loc[idx, col] = f"DELETED: {val}"

            elif col in df_var.columns:
                for idx in df_var.index:
                    result.loc[idx, col] = f"{df_var.loc[idx, col]} (new column)"

            else:  # Column only in reference
                for idx in df_ref.index:
                    val = df_ref.loc[idx, col]
                    if not pd.isna(val):
                        result.loc[idx, col] = f"REMOVED: {val}"

        return result
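The string markers used for non-numeric deltas above ("X (was Y)", "(new)", "DELETED:") can be reproduced in isolation with plain pandas. This is a standalone sketch mirroring the conventions of `_get_delta_comparison`, not the MESQUAL code itself:

```python
import pandas as pd

ref = pd.Series({'g1': 'gas', 'g2': 'coal'}, dtype=object)
var = pd.Series({'g1': 'gas', 'g3': 'wind'}, dtype=object)

# object dtype is required because string markers are mixed with raw values
delta = pd.Series(index=ref.index.union(var.index), dtype=object)
for idx in var.index.intersection(ref.index):
    delta[idx] = var[idx] if var[idx] == ref[idx] else f"{var[idx]} (was {ref[idx]})"
for idx in var.index.difference(ref.index):
    delta[idx] = f"{var[idx]} (new)"
for idx in ref.index.difference(var.index):
    delta[idx] = f"DELETED: {ref[idx]}"
```

Here `g1` keeps its unchanged value, `g3` is marked as new, and `g2` is marked as deleted.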

fetch

fetch(flag: FlagType, config: dict | DatasetConfigType = None, comparison_type: ComparisonTypeEnum = DELTA, replace_unchanged_values_by_nan: bool = False, fill_value: float | int | None = None, **kwargs) -> Series | DataFrame

Fetch comparison data between variation and reference datasets.

Extends the base Dataset.fetch() method with comparison-specific parameters for controlling how the comparison is computed and formatted.

Parameters:

    flag (FlagType, required):
        Data identifier flag to fetch from both datasets
    config (dict | DatasetConfigType, default None):
        Optional configuration overrides
    comparison_type (ComparisonTypeEnum, default DELTA):
        How to compare the datasets:
        - DELTA: variation - reference (default)
        - VARIATION: variation data only, optionally with NaN for unchanged
        - BOTH: concatenated variation and reference data
    replace_unchanged_values_by_nan (bool, default False):
        If True, replaces values that are identical between datasets with NaN
        (useful for highlighting changes)
    fill_value (float | int | None, default None):
        Value to use for missing data in subtraction operations
    **kwargs (default {}):
        Additional arguments passed to child dataset fetch methods

Returns:

    Series | DataFrame:
        DataFrame or Series with comparison results

Example:

>>> # Basic delta comparison
>>> deltas = comparison.fetch('buses_t.marginal_price')
>>> 
>>> # Highlight only changed values
>>> changes_only = comparison.fetch(
...     'buses_t.marginal_price',
...     replace_unchanged_values_by_nan=True
... )
>>> 
>>> # Side-by-side comparison
>>> both = comparison.fetch(
...     'buses_t.marginal_price',
...     comparison_type=ComparisonTypeEnum.BOTH
... )
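The three comparison modes can be illustrated with plain pandas on two tiny frames. This is a self-contained sketch of the semantics, not a call into MESQUAL:

```python
import pandas as pd

ref = pd.DataFrame({'price': [10.0, 20.0]}, index=['bus1', 'bus2'])
var = pd.DataFrame({'price': [10.0, 25.0]}, index=['bus1', 'bus2'])

# DELTA: variation - reference, aligned on index and columns
delta = var.subtract(ref)

# replace_unchanged_values_by_nan on a numeric delta blanks out zeros
delta_changed = delta.replace(0, float('nan'))

# BOTH: concatenation with dataset names as the outer MultiIndex level
both = pd.concat([var, ref], keys=['variation', 'reference']).sort_index(level=1)
```

`bus1` is unchanged (delta 0, NaN after blanking) while `bus2` shows a delta of 5.0; `both` interleaves the two scenarios per bus.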
Source code in submodules/mesqual/mesqual/datasets/dataset_comparison.py
def fetch(
        self,
        flag: FlagType,
        config: dict | DatasetConfigType = None,
        comparison_type: ComparisonTypeEnum = ComparisonTypeEnum.DELTA,
        replace_unchanged_values_by_nan: bool = False,
        fill_value: float | int | None = None,
        **kwargs
) -> pd.Series | pd.DataFrame:
    """
    Fetch comparison data between variation and reference datasets.

    Extends the base Dataset.fetch() method with comparison-specific parameters
    for controlling how the comparison is computed and formatted.

    Args:
        flag: Data identifier flag to fetch from both datasets
        config: Optional configuration overrides
        comparison_type: How to compare the datasets:
            - DELTA: variation - reference (default)
            - VARIATION: variation data only, optionally with NaN for unchanged
            - BOTH: concatenated variation and reference data
        replace_unchanged_values_by_nan: If True, replaces values that are
            identical between datasets with NaN (useful for highlighting changes)
        fill_value: Value to use for missing data in subtraction operations
        **kwargs: Additional arguments passed to child dataset fetch methods

    Returns:
        DataFrame or Series with comparison results

    Example:

        >>> # Basic delta comparison
        >>> deltas = comparison.fetch('buses_t.marginal_price')
        >>> 
        >>> # Highlight only changed values
        >>> changes_only = comparison.fetch(
        ...     'buses_t.marginal_price',
        ...     replace_unchanged_values_by_nan=True
        ... )
        >>> 
        >>> # Side-by-side comparison
        >>> both = comparison.fetch(
        ...     'buses_t.marginal_price',
        ...     comparison_type=ComparisonTypeEnum.BOTH
        ... )
    """
    return super().fetch(
        flag=flag,
        config=config,
        comparison_type=comparison_type,
        replace_unchanged_values_by_nan=replace_unchanged_values_by_nan,
        fill_value=fill_value,
        **kwargs
    )

PlatformDataset

Bases: Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType], DatasetLinkCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType], ABC

Base class for platform-specific datasets with automatic interpreter management.

PlatformDataset provides the foundation for integrating MESQUAL with specific energy modeling platforms (PyPSA, PLEXOS, etc.). It manages a registry of data interpreters and automatically instantiates them to handle different types of platform data.

Key Features
  • Automatic interpreter registration and instantiation
  • Type-safe interpreter management through generics
  • Flexible argument passing to interpreter constructors
  • Support for study-specific interpreter extensions
  • Unified data access through DatasetLinkCollection routing
Architecture
  • Uses DatasetLinkCollection for automatic flag routing
  • Manages interpreter registry at class level
  • Auto-instantiates all registered interpreters on construction
  • Supports inheritance and interpreter registration on subclasses

Class Type Parameters:

    DatasetType (required):
        Base type for all interpreters (must be Dataset subclass)
    DatasetConfigType (required):
        Configuration class for dataset behavior
    FlagType (required):
        Type used for data flag identification
    FlagIndexType (required):
        Flag index implementation for flag mapping
Class Attributes

_interpreter_registry: List of registered interpreter classes

Usage Pattern
  1. Create platform dataset class inheriting from PlatformDataset
  2. Define get_child_dataset_type() to specify interpreter base class
  3. Create interpreter classes inheriting from the base interpreter
  4. Register interpreters using @PlatformDataset.register_interpreter
  5. Instantiate platform dataset - interpreters are auto-created

Example:

>>> # Define platform dataset
>>> class PyPSADataset(PlatformDataset[PyPSAInterpreter, ...]):
...     @classmethod
...     def get_child_dataset_type(cls):
...         return PyPSAInterpreter
...
>>> # Register core interpreters
>>> @PyPSADataset.register_interpreter
... class PyPSAModelInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'buses', 'generators', 'lines'}
...
>>> @PyPSADataset.register_interpreter  
... class PyPSATimeSeriesInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'buses_t.marginal_price', 'generators_t.p'}
...
>>> # Register study-specific interpreter
>>> @PyPSADataset.register_interpreter
... class CustomVariableInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'custom_metric'}
...
>>> # Use platform dataset
>>> dataset = PyPSADataset(network=my_network)
>>> buses = dataset.fetch('buses')  # Routes to ModelInterpreter
>>> prices = dataset.fetch('buses_t.marginal_price')  # Routes to TimeSeriesInterpreter
>>> custom = dataset.fetch('custom_metric')  # Routes to CustomVariableInterpreter
Notes
  • Interpreters are registered at class level and shared across instances
  • Registration order affects routing (last registered = first checked)
  • All registered interpreters are instantiated for each platform dataset
  • Constructor arguments are automatically extracted and passed to interpreters
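The registry mechanism boils down to a class-level list populated by a decorator, with routing over each interpreter's accepted flags. A simplified, self-contained sketch (class and flag names are illustrative, not the actual MESQUAL implementation):

```python
class MiniPlatformDataset:
    _interpreter_registry = []

    @classmethod
    def register_interpreter(cls, interpreter):
        if interpreter not in cls._interpreter_registry:
            # insert at the front: last registered = first checked
            cls._interpreter_registry.insert(0, interpreter)
        return interpreter

    def __init__(self):
        # all registered interpreters are instantiated per dataset
        self.interpreters = [i() for i in self._interpreter_registry]

    def fetch(self, flag):
        for interpreter in self.interpreters:
            if flag in interpreter.accepted_flags:
                return interpreter.fetch(flag)
        raise KeyError(f"No interpreter accepts flag {flag!r}")


@MiniPlatformDataset.register_interpreter
class ModelInterpreter:
    accepted_flags = {'buses'}
    def fetch(self, flag):
        return f"model data for {flag}"


@MiniPlatformDataset.register_interpreter
class TimeSeriesInterpreter:
    accepted_flags = {'buses_t.marginal_price'}
    def fetch(self, flag):
        return f"time series for {flag}"


ds = MiniPlatformDataset()
```

Because registration inserts at index 0, `TimeSeriesInterpreter` (registered last) is checked first during routing.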
Source code in submodules/mesqual/mesqual/datasets/platform_dataset.py
class PlatformDataset(
    Generic[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    DatasetLinkCollection[DatasetType, DatasetConfigType, FlagType, FlagIndexType],
    ABC
):
    """
    Base class for platform-specific datasets with automatic interpreter management.

    PlatformDataset provides the foundation for integrating MESQUAL with specific
    energy modeling platforms (PyPSA, PLEXOS, etc.). It manages a registry of
    data interpreters and automatically instantiates them to handle different
    types of platform data.

    Key Features:
        - Automatic interpreter registration and instantiation
        - Type-safe interpreter management through generics
        - Flexible argument passing to interpreter constructors
        - Support for study-specific interpreter extensions
        - Unified data access through DatasetLinkCollection routing

    Architecture:
        - Uses DatasetLinkCollection for automatic flag routing
        - Manages interpreter registry at class level
        - Auto-instantiates all registered interpreters on construction
        - Supports inheritance and interpreter registration on subclasses

    Type Parameters:
        DatasetType: Base type for all interpreters (must be Dataset subclass)
        DatasetConfigType: Configuration class for dataset behavior
        FlagType: Type used for data flag identification
        FlagIndexType: Flag index implementation for flag mapping

    Class Attributes:
        _interpreter_registry: List of registered interpreter classes

    Usage Pattern:
        1. Create platform dataset class inheriting from PlatformDataset
        2. Define get_child_dataset_type() to specify interpreter base class
        3. Create interpreter classes inheriting from the base interpreter
        4. Register interpreters using @PlatformDataset.register_interpreter
        5. Instantiate platform dataset - interpreters are auto-created

    Example:

        >>> # Define platform dataset
        >>> class PyPSADataset(PlatformDataset[PyPSAInterpreter, ...]):
        ...     @classmethod
        ...     def get_child_dataset_type(cls):
        ...         return PyPSAInterpreter
        ...
        >>> # Register core interpreters
        >>> @PyPSADataset.register_interpreter
        ... class PyPSAModelInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'buses', 'generators', 'lines'}
        ...
        >>> @PyPSADataset.register_interpreter  
        ... class PyPSATimeSeriesInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'buses_t.marginal_price', 'generators_t.p'}
        ...
        >>> # Register study-specific interpreter
        >>> @PyPSADataset.register_interpreter
        ... class CustomVariableInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'custom_metric'}
        ...
        >>> # Use platform dataset
        >>> dataset = PyPSADataset(network=my_network)
        >>> buses = dataset.fetch('buses')  # Routes to ModelInterpreter
        >>> prices = dataset.fetch('buses_t.marginal_price')  # Routes to TimeSeriesInterpreter
        >>> custom = dataset.fetch('custom_metric')  # Routes to CustomVariableInterpreter

    Notes:
        - Interpreters are registered at class level and shared across instances
        - Registration order affects routing (last registered = first checked)
        - All registered interpreters are instantiated for each platform dataset
        - Constructor arguments are automatically extracted and passed to interpreters
    """

    _interpreter_registry: list[Type[DatasetType]] = []

    def __init__(
            self,
            name: str = None,
            flag_index: FlagIndexType = None,
            attributes: dict = None,
            database: Database = None,
            config: DatasetConfigType = None,
            **kwargs
    ):
        super().__init__(
            datasets=[],
            name=name,
            flag_index=flag_index,
            attributes=attributes,
            database=database,
            config=config,
        )
        interpreter_args = self._prepare_interpreter_initialization_args(kwargs)
        datasets = self._initialize_registered_interpreters(interpreter_args)
        self.add_datasets(datasets)

    @classmethod
    def register_interpreter(cls, interpreter: Type[DatasetType]) -> Type['DatasetType']:
        """
        Register a data interpreter class with this platform dataset.

        This method is typically used as a decorator to register interpreter classes
        that handle specific types of platform data. Registered interpreters are
        automatically instantiated when the platform dataset is created.

        Args:
            interpreter: Interpreter class that must inherit from get_child_dataset_type()

        Returns:
            The interpreter class (unchanged) to support decorator usage

        Raises:
            TypeError: If interpreter doesn't inherit from the required base class

        Example:

            >>> @PyPSADataset.register_interpreter
            ... class CustomInterpreter(PyPSAInterpreter):
            ...     @property
            ...     def accepted_flags(self):
            ...         return {'custom_flag'}
            ...     
            ...     def _fetch(self, flag, config, **kwargs):
            ...         return compute_custom_data()
        """
        cls._validate_interpreter_type(interpreter)
        if interpreter not in cls._interpreter_registry:
            cls._add_interpreter_to_registry(interpreter)
        return interpreter

    @classmethod
    def get_registered_interpreters(cls) -> list[Type[DatasetType]]:
        return cls._interpreter_registry.copy()

    def get_interpreter_instance(self, interpreter_type: Type[DatasetType]) -> DatasetType:
        interpreter = self._find_interpreter_instance(interpreter_type)
        if interpreter is None:
            raise ValueError(
                f'No interpreter instance found for type {interpreter_type.__name__}'
            )
        return interpreter

    def get_flags_by_interpreter(self) -> dict[Type[DatasetType], set[FlagType]]:
        return {
            type(interpreter): interpreter.accepted_flags
            for interpreter in self.datasets.values()
        }

    def _prepare_interpreter_initialization_args(self, kwargs: dict) -> dict:
        interpreter_signature = InterpreterSignature.from_interpreter(self.get_child_dataset_type())
        return {
            arg: kwargs.get(arg, default)
            for arg, default in zip(interpreter_signature.args, interpreter_signature.defaults)
        }

    def _initialize_registered_interpreters(self, interpreter_args: dict) -> list[DatasetType]:
        return [
            interpreter(**interpreter_args, parent_dataset=self)
            for interpreter in self._interpreter_registry
        ]

    @classmethod
    def _validate_interpreter_type(cls, interpreter: Type[DatasetType]) -> None:
        if not issubclass(interpreter, cls.get_child_dataset_type()):
            raise TypeError(
                f'Interpreter must be subclass of {cls.get_child_dataset_type().__name__}'
            )

    @classmethod
    def _validate_interpreter_not_registered(cls, interpreter: Type[DatasetType]) -> None:
        if interpreter in cls._interpreter_registry:
            raise ValueError(f'Interpreter {interpreter.__name__} already registered')

    @classmethod
    def _add_interpreter_to_registry(cls, interpreter: Type[DatasetType]) -> None:
        cls._interpreter_registry.insert(0, interpreter)

    def _find_interpreter_instance(self, interpreter_type: Type[DatasetType]) -> DatasetType | None:
        for interpreter in self.datasets.values():
            if isinstance(interpreter, interpreter_type):
                return interpreter
        return None

register_interpreter classmethod

register_interpreter(interpreter: Type[DatasetType]) -> Type['DatasetType']

Register a data interpreter class with this platform dataset.

This method is typically used as a decorator to register interpreter classes that handle specific types of platform data. Registered interpreters are automatically instantiated when the platform dataset is created.

Parameters:

    interpreter (Type[DatasetType], required):
        Interpreter class that must inherit from get_child_dataset_type()

Returns:

    Type[DatasetType]:
        The interpreter class (unchanged) to support decorator usage

Raises:

    TypeError:
        If interpreter doesn't inherit from the required base class
Example:

>>> @PyPSADataset.register_interpreter
... class CustomInterpreter(PyPSAInterpreter):
...     @property
...     def accepted_flags(self):
...         return {'custom_flag'}
...     
...     def _fetch(self, flag, config, **kwargs):
...         return compute_custom_data()
Source code in submodules/mesqual/mesqual/datasets/platform_dataset.py
@classmethod
def register_interpreter(cls, interpreter: Type[DatasetType]) -> Type['DatasetType']:
    """
    Register a data interpreter class with this platform dataset.

    This method is typically used as a decorator to register interpreter classes
    that handle specific types of platform data. Registered interpreters are
    automatically instantiated when the platform dataset is created.

    Args:
        interpreter: Interpreter class that must inherit from get_child_dataset_type()

    Returns:
        The interpreter class (unchanged) to support decorator usage

    Raises:
        TypeError: If interpreter doesn't inherit from the required base class

    Example:

        >>> @PyPSADataset.register_interpreter
        ... class CustomInterpreter(PyPSAInterpreter):
        ...     @property
        ...     def accepted_flags(self):
        ...         return {'custom_flag'}
        ...     
        ...     def _fetch(self, flag, config, **kwargs):
        ...         return compute_custom_data()
    """
    cls._validate_interpreter_type(interpreter)
    if interpreter not in cls._interpreter_registry:
        cls._add_interpreter_to_registry(interpreter)
    return interpreter

DatasetConfig dataclass

Base configuration class for controlling Dataset behavior.

DatasetConfig provides common configuration options that apply to all datasets in the MESQUAL framework. Platform-specific and study-specific configurations should extend this class to add additional options.

The configuration system uses a merge-based hierarchy where each level can override settings from the previous level. The :meth:merge method combines configurations, with later values taking precedence over earlier ones.

Attributes:

    use_database (bool):
        If True, enables database caching for expensive fetch operations.
        When a database is configured on the dataset, fetched data will be
        cached and retrieved from cache on subsequent calls. Set to False
        to bypass caching. Default: True.

    auto_sort_datetime_index (bool):
        If True, automatically sorts the returned DataFrame/Series by its
        DatetimeIndex after fetching. This ensures time-series data is
        always in chronological order regardless of the source data
        ordering. Default: True.

    remove_duplicate_indices (bool):
        If True, automatically removes duplicate index entries from fetched
        data, keeping the first occurrence. A warning is logged when
        duplicates are found. This protects against data quality issues in
        source data. Default: True.

Example

Creating a custom configuration::

>>> config = DatasetConfig(use_database=False)
>>> dataset = MyDataset(config=config)

Extending for platform-specific options::

>>> @dataclass
... class MyPlatformConfig(DatasetConfig):
...     custom_option: bool = True
...     date_filter: list = None
Source code in submodules/mesqual/mesqual/datasets/dataset_config.py
@dataclass
class DatasetConfig:
    """
    Base configuration class for controlling Dataset behavior.

    DatasetConfig provides common configuration options that apply to all datasets
    in the MESQUAL framework. Platform-specific and study-specific configurations
    should extend this class to add additional options.

    The configuration system uses a merge-based hierarchy where each level can
    override settings from the previous level. The :meth:`merge` method combines
    configurations, with later values taking precedence over earlier ones.

    Attributes:
        use_database: If True, enables database caching for expensive fetch
            operations. When a database is configured on the dataset, fetched
            data will be cached and retrieved from cache on subsequent calls.
            Set to False to bypass caching. Default: True.

        auto_sort_datetime_index: If True, automatically sorts the returned
            DataFrame/Series by its DatetimeIndex after fetching. This ensures
            time-series data is always in chronological order regardless of
            the source data ordering. Default: True.

        remove_duplicate_indices: If True, automatically removes duplicate
            index entries from fetched data, keeping the first occurrence.
            A warning is logged when duplicates are found. This protects
            against data quality issues in source data. Default: True.

    Example:
        Creating a custom configuration::

            >>> config = DatasetConfig(use_database=False)
            >>> dataset = MyDataset(config=config)

        Extending for platform-specific options::

            >>> @dataclass
            ... class MyPlatformConfig(DatasetConfig):
            ...     custom_option: bool = True
            ...     date_filter: list = None
    """
    use_database: bool = True
    auto_sort_datetime_index: bool = True
    remove_duplicate_indices: bool = True

    def merge(self, other: Optional[DatasetConfigType | dict]) -> DatasetConfigType:
        """
        Merge this configuration with another, returning a new combined config.

        Creates a new configuration instance that combines settings from both
        configurations. Values from ``other`` override values from ``self``,
        but only for non-None values. This allows partial overrides where you
        only specify the settings you want to change.

        The merge creates a new instance of the same type as ``self``, ensuring
        that subclass-specific attributes are preserved.

        Args:
            other: Configuration to merge with. Can be:
                - None: Returns self unchanged
                - dict: Keys map to config attribute names
                - DatasetConfig: Another config instance (same or subclass)

        Returns:
            A new configuration instance combining both configs. The return
            type matches the type of ``self``.

        Example:

            >>> base = DatasetConfig(use_database=True, auto_sort_datetime_index=True)
            >>> override = DatasetConfig(use_database=False)
            >>> merged = base.merge(override)
            >>> merged.use_database
                False
            >>> merged.auto_sort_datetime_index  # Preserved from base
                True

            Using a dict for quick overrides:

                >>> merged = base.merge({'use_database': False})
        """
        if other is None:
            return self

        merged_config = self.__class__()

        for attr_name in dir(self):
            # Skip private attributes and methods; copy only config values
            if not attr_name.startswith('_') and not callable(getattr(self, attr_name)):
                setattr(merged_config, attr_name, getattr(self, attr_name))

        if isinstance(other, dict):
            for key, value in other.items():
                if value is not None:
                    setattr(merged_config, key, value)
            return merged_config

        for attr_name in dir(other):
            if not attr_name.startswith('_') and not callable(getattr(other, attr_name)):
                other_value = getattr(other, attr_name)
                if other_value is not None:
                    setattr(merged_config, attr_name, other_value)

        return merged_config

    def __repr__(self) -> str:
        """Return a string representation showing all config attributes."""
        attrs = {
            name: getattr(self, name)
            for name in dir(self)
            if not name.startswith('_') and not callable(getattr(self, name))
        }
        return f"{self.__class__.__name__}({attrs})"

merge

merge(other: Optional[DatasetConfigType | dict]) -> DatasetConfigType

Merge this configuration with another, returning a new combined config.

Creates a new configuration instance that combines settings from both configurations. Values from other override values from self, but only for non-None values. This allows partial overrides where you only specify the settings you want to change.

The merge creates a new instance of the same type as self, ensuring that subclass-specific attributes are preserved.

Parameters:

    other (Optional[DatasetConfigType | dict], required):
        Configuration to merge with. Can be:

        - None: Returns self unchanged
        - dict: Keys map to config attribute names
        - DatasetConfig: Another config instance (same or subclass)

Returns:

    DatasetConfigType: A new configuration instance combining both configs.
    The return type matches the type of self.

Example:

>>> base = DatasetConfig(use_database=True, auto_sort_datetime_index=True)
>>> override = DatasetConfig(use_database=False)
>>> merged = base.merge(override)
>>> merged.use_database
False
>>> merged.auto_sort_datetime_index  # Preserved from base
True

Using a dict for quick overrides:

>>> merged = base.merge({'use_database': False})
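The merge semantics above can be sketched with a minimal stand-in class (the real `DatasetConfig` lives in `mesqual.datasets`; `MiniConfig` and its two attributes, taken from the examples, are illustrative only):

```python
from typing import Optional, Union


class MiniConfig:
    """Hypothetical stand-in mirroring the documented merge() semantics."""

    def __init__(self, use_database=None, auto_sort_datetime_index=None):
        self.use_database = use_database
        self.auto_sort_datetime_index = auto_sort_datetime_index

    def merge(self, other: Optional[Union["MiniConfig", dict]]) -> "MiniConfig":
        # None means "no overrides": return self unchanged
        if other is None:
            return self
        merged = self.__class__()
        # Start from self's settings...
        merged.__dict__.update(self.__dict__)
        # ...then overlay only the non-None values from other
        overrides = other if isinstance(other, dict) else other.__dict__
        for key, value in overrides.items():
            if value is not None:
                setattr(merged, key, value)
        return merged


base = MiniConfig(use_database=True, auto_sort_datetime_index=True)
merged = base.merge({'use_database': False})
print(merged.use_database)              # False (overridden by the dict)
print(merged.auto_sort_datetime_index)  # True (preserved from base)
```

Because only non-None values override, a partial config (or dict) never accidentally clears settings that were already established on the base.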
Source code in submodules/mesqual/mesqual/datasets/dataset_config.py
def merge(self, other: Optional[DatasetConfigType | dict]) -> DatasetConfigType:
    """
    Merge this configuration with another, returning a new combined config.

    Creates a new configuration instance that combines settings from both
    configurations. Values from ``other`` override values from ``self``,
    but only for non-None values. This allows partial overrides where you
    only specify the settings you want to change.

    The merge creates a new instance of the same type as ``self``, ensuring
    that subclass-specific attributes are preserved.

    Args:
        other: Configuration to merge with. Can be:
            - None: Returns self unchanged
            - dict: Keys map to config attribute names
            - DatasetConfig: Another config instance (same or subclass)

    Returns:
        A new configuration instance combining both configs. The return
        type matches the type of ``self``.

    Example:

        >>> base = DatasetConfig(use_database=True, auto_sort_datetime_index=True)
        >>> override = DatasetConfig(use_database=False)
        >>> merged = base.merge(override)
        >>> merged.use_database
        False
        >>> merged.auto_sort_datetime_index  # Preserved from base
        True

        Using a dict for quick overrides:

        >>> merged = base.merge({'use_database': False})
    """
    if other is None:
        return self

    merged_config = self.__class__()

    for attr_name in dir(self):
        if not attr_name.startswith('_'):  # Skip private attributes
            setattr(merged_config, attr_name, getattr(self, attr_name))

    if isinstance(other, dict):
        for key, value in other.items():
            if value is not None:
                setattr(merged_config, key, value)
        return merged_config

    for attr_name in dir(other):
        if not attr_name.startswith('_'):
            other_value = getattr(other, attr_name)
            if other_value is not None:
                setattr(merged_config, attr_name, other_value)

    return merged_config

__repr__

__repr__() -> str

Return a string representation showing all config attributes.

Source code in submodules/mesqual/mesqual/datasets/dataset_config.py
def __repr__(self) -> str:
    """Return a string representation showing all config attributes."""
    attrs = {
        name: getattr(self, name)
        for name in dir(self)
        if not name.startswith('_') and not callable(getattr(self, name))
    }
    return f"{self.__class__.__name__}({attrs})"
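The repr pattern above can be demonstrated with a minimal sketch (`MiniConfig` and its two attributes, drawn from the merge examples, are hypothetical; `dir()` returns names alphabetically, which determines the dict order):

```python
class MiniConfig:
    """Hypothetical stand-in with two public attributes."""

    def __init__(self):
        self.use_database = True
        self.auto_sort_datetime_index = None

    def __repr__(self) -> str:
        # Collect public, non-callable attributes, as in the source above
        attrs = {
            name: getattr(self, name)
            for name in dir(self)
            if not name.startswith('_') and not callable(getattr(self, name))
        }
        return f"{self.__class__.__name__}({attrs})"


print(repr(MiniConfig()))
# → MiniConfig({'auto_sort_datetime_index': None, 'use_database': True})
```

Dunder methods are filtered out by the leading-underscore check, so only the configuration values themselves appear in the output.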