MESQUAL Pandas Util set_new_column

set_column

set_column(df: DataFrame, new_column_name: Hashable, new_column_values: Series | DataFrame) -> DataFrame

Set or replace a column in a DataFrame with new values.

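Adds a new column to a copy of the DataFrame, or replaces an existing one; the input df is not modified. A Series is assigned directly as a single column. A DataFrame is accepted only when df has MultiIndex columns: it must have exactly one column level fewer than df, and its columns are nested under new_column_name as the top column level.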

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame to modify. | required |
| new_column_name | Hashable | Name/key for the new column. | required |
| new_column_values | Series \| DataFrame | Values for the new column. Can be a Series for simple columns or a DataFrame for MultiIndex column structures. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A copy of the DataFrame with the new column added or existing column replaced. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the lengths of df and new_column_values differ, or if a new_column_values DataFrame does not have exactly one column level fewer than df. |
| TypeError | If new_column_values is neither a Series nor a DataFrame. |
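
The error conditions above can be illustrated with a short sketch. To keep it runnable without MESQUAL installed, it uses a condensed local copy of the source listing on this page rather than an import; the helper error_name and the dictionary errors are hypothetical names introduced only for this illustration.

```python
import pandas as pd


def set_column(df, new_column_name, new_column_values):
    # Condensed local copy of the listing on this page (assumption: same checks,
    # same order), so the snippet runs without MESQUAL installed.
    dff = df.copy()
    if len(dff) != len(new_column_values):
        raise ValueError('Length of df and new_column_values must be equal.')
    if isinstance(new_column_values, pd.Series):
        dff[new_column_name] = new_column_values
        return dff
    if isinstance(new_column_values, pd.DataFrame):
        if new_column_values.columns.nlevels != dff.columns.nlevels - 1:
            raise ValueError('new_column_values must have n-1 column levels.')
        if new_column_name in dff.columns:
            dff = dff.drop(columns=[new_column_name])
        wrapped = pd.concat(
            {new_column_name: new_column_values}, axis=1, names=[dff.columns.names[0]]
        )
        return pd.concat([dff, wrapped], axis=1)
    raise TypeError('Used new_column_values type not accepted.')


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})


def error_name(values):
    # Hypothetical helper: returns the name of the exception raised, if any.
    try:
        set_column(df, 'C', values)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None


errors = {
    'short_series': error_name(pd.Series([7, 8])),            # 2 rows vs. 3 rows
    'flat_frame': error_name(pd.DataFrame({'x': [7, 8, 9]})),  # 1 level, but 0 expected
    'plain_list': error_name([7, 8, 9]),                       # right length, wrong type
}
print(errors)
# {'short_series': 'ValueError', 'flat_frame': 'ValueError', 'plain_list': 'TypeError'}
```

Note that a DataFrame passed for a df with single-level columns always fails the level check, because the input would need zero column levels.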

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>>
>>> # Add Series as new column
>>> new_series = pd.Series([7, 8, 9])
>>> result = set_column(df, 'C', new_series)
>>> print(result.columns.tolist())
['A', 'B', 'C']
>>>
>>> # Replace existing column
>>> replacement = pd.Series([10, 11, 12])
>>> result = set_column(df, 'A', replacement)
>>> print(result['A'].tolist())
[10, 11, 12]
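
The examples above cover only Series input. The following sketch shows the core step the DataFrame branch of the source listing performs for MultiIndex columns, written with plain pandas calls so it runs without MESQUAL. The frame layout and the names variable, node, price, volume, and margin are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical two-level frame: top level "variable", second level "node".
columns = pd.MultiIndex.from_product(
    [['price', 'volume'], ['n1', 'n2']], names=['variable', 'node']
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# The new values carry one column level fewer than df (only the "node" level).
new_values = pd.DataFrame({'n1': [0.1, 0.2], 'n2': [0.3, 0.4]})

# Passing a dict to pd.concat turns its key into a new top column level,
# which is how the listing nests the input under new_column_name.
wrapped = pd.concat({'margin': new_values}, axis=1, names=[df.columns.names[0]])
result = pd.concat([df, wrapped], axis=1)

print(result.columns.tolist())
# [('price', 'n1'), ('price', 'n2'), ('volume', 'n1'), ('volume', 'n2'),
#  ('margin', 'n1'), ('margin', 'n2')]
```

Calling set_column(df, 'margin', new_values) should produce the same column layout, since the function applies these two concat calls after its validation checks.
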
Source code in submodules/mesqual/mesqual/utils/pandas_utils/set_new_column.py
from typing import Hashable

import pandas as pd


def set_column(
        df: pd.DataFrame,
        new_column_name: Hashable,
        new_column_values: pd.Series | pd.DataFrame
) -> pd.DataFrame:
    """Set or replace a column in a DataFrame with new values.

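    Adds a new column to a copy of the DataFrame, or replaces an existing one;
    the input df is not modified. A Series is assigned directly as a single
    column. A DataFrame is accepted only when df has MultiIndex columns: it must
    have exactly one column level fewer than df, and its columns are nested
    under new_column_name as the top column level.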

    Args:
        df: The DataFrame to modify.
        new_column_name: Name/key for the new column.
        new_column_values: Values for the new column. Can be a Series for simple
            columns or a DataFrame for MultiIndex column structures.

    Returns:
        A copy of the DataFrame with the new column added or existing column replaced.

    Raises:
        ValueError: If the lengths of df and new_column_values differ, or if a
            new_column_values DataFrame does not have exactly one column level
            fewer than df.
        TypeError: If new_column_values is neither Series nor DataFrame.

    Examples:

        >>> import pandas as pd
        >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
        >>>
        >>> # Add Series as new column
        >>> new_series = pd.Series([7, 8, 9])
        >>> result = set_column(df, 'C', new_series)
        >>> print(result.columns.tolist())
        ['A', 'B', 'C']
        >>>
        >>> # Replace existing column
        >>> replacement = pd.Series([10, 11, 12])
        >>> result = set_column(df, 'A', replacement)
        >>> print(result['A'].tolist())
        [10, 11, 12]
    """

    # Work on a copy so the caller's DataFrame is left untouched.
    dff = df.copy()

    if len(dff) != len(new_column_values):
        raise ValueError('Length of df and new_column_values must be equal.')

    # TODO optional: check index

    if isinstance(new_column_values, pd.Series):
        # Plain assignment adds the column or overwrites an existing one.
        dff[new_column_name] = new_column_values
        return dff

    if isinstance(new_column_values, pd.DataFrame):
        if new_column_values.columns.nlevels != (dff.columns.nlevels - 1):
            raise ValueError(
                'new_column_values must have n-1 column levels, where n is the number of column levels in df.'
            )

        # Drop an existing top-level group so it is replaced rather than duplicated.
        if new_column_name in dff.columns:
            dff = dff.drop(columns=[new_column_name])

        # Nest the new columns under new_column_name as the top column level.
        new_column_values = pd.concat({new_column_name: new_column_values}, axis=1, names=[dff.columns.names[0]])
        dff = pd.concat([dff, new_column_values], axis=1)
        return dff

    raise TypeError('Used new_column_values type not accepted.')
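
The listing leaves the index check as a TODO, so it may help to see what happens when a Series has the right length but a different index. The sketch below reproduces the Series branch's assignment with plain pandas; the variable names and the to_numpy() workaround are assumptions for illustration, not part of MESQUAL.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})                  # default index 0, 1, 2
shifted = pd.Series([7, 8, 9], index=[10, 11, 12])   # same length, disjoint index

# Same assignment the Series branch performs: pandas aligns on index labels,
# so no label matches and every value in the new column is NaN.
aligned = df.copy()
aligned['C'] = shifted
print(aligned['C'].isna().all())  # True

# One possible workaround (an assumption, not the library's behavior):
# strip the index so values are assigned by position instead.
positional = df.copy()
positional['C'] = shifted.to_numpy()
print(positional['C'].tolist())  # [7, 8, 9]
```

The length check alone therefore does not guarantee that the values land in the intended rows; callers may want to confirm that the indexes match before passing a Series.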