# 141. pandas.Series.agg (aggregate) method
pandas.Series.agg(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.
Parameters:
func : function, str, list or dict
    Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply.
    Accepted combinations are:
    - function
    - string function name
    - list of functions and/or function names, e.g. [np.sum, 'mean']
    - dict of axis labels -> functions, function names or list of such.
axis : {0 or 'index'}
    Unused. Parameter needed for compatibility with DataFrame.
*args
    Positional arguments to pass to func.
**kwargs
    Keyword arguments to pass to func.
Returns:
scalar, Series or DataFrame
    The return can be:
    - scalar : when Series.agg is called with single function
    - Series : when DataFrame.agg is called with a single function
    - DataFrame : when DataFrame.agg is called with several functions
# 141. pandas.Series.agg (aggregate) method
# 141-1. Using a single function
import pandas as pd
# Create an example Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
sum_result = s.agg('sum')
print(f"Sum: {sum_result}", end='\n\n')

# 141-2. Using multiple aggregation functions
import pandas as pd
# Create an example Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
agg_result = s.agg(['sum', 'mean', 'min', 'max'])
print(agg_result, end='\n\n')

# 141-3. Using a custom aggregation function
import pandas as pd
# Create an example Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
def range_func(x):
    # Spread between the largest and smallest value
    return x.max() - x.min()
range_result = s.agg(range_func)
print(f"Range: {range_result}", end='\n\n')

# 141-4. Using a dict to assign functions to result labels
import pandas as pd
# Create an example Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
agg_dict_result = s.agg({'sum': 'sum', 'average': 'mean'})
print(agg_dict_result)
# 141. pandas.Series.agg (aggregate) method
# 141-1. Using a single function
# Sum: 67
# 141-2. Using multiple aggregation functions
# sum     67.000000
# mean     9.571429
# min      3.000000
# max     24.000000
# dtype: float64
# 141-3. Using a custom aggregation function
# Range: 21
# 141-4. Using a dict to assign functions to result labels
# sum        67.000000
# average     9.571429
# dtype: float64
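The *args and **kwargs parameters listed above are forwarded to func. The following is a minimal sketch of that behavior, not taken from the original examples: quantile_spread is a hypothetical helper (it is not part of pandas) that only works on a whole Series, so agg ends up calling it on the Series itself. The sketch also checks the return type for a single function versus a list of functions.

```python
import pandas as pd

s = pd.Series([3, 5, 6, 8, 10, 11, 24])

def quantile_spread(x, lower=0.25, upper=0.75):
    """Hypothetical helper: distance between two quantiles of a Series."""
    return x.quantile(upper) - x.quantile(lower)

# Extra keyword arguments given to agg are passed through to the function
full_range = s.agg(quantile_spread, lower=0.0, upper=1.0)

# A single function name yields a scalar; a list of names yields a labelled Series
single = s.agg('max')
several = s.agg(['min', 'max'])

print(full_range)
print(type(several).__name__)
```

With lower=0.0 and upper=1.0 the helper reduces to the max-minus-min range used in example 141-3.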
# 142. pandas.Series.transform method
pandas.Series.transform(func, axis=0, *args, **kwargs)
Call func on self producing a Series with the same axis shape as self.
Parameters:
func : function, str, list-like or dict-like
    Function to use for transforming the data. If a function, must either work when passed a Series or when passed to Series.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.
    Accepted combinations are:
    - function
    - string function name
    - list-like of functions and/or function names, e.g. [np.exp, 'sqrt']
    - dict-like of axis labels -> functions, function names or list-like of such.
axis : {0 or 'index'}
    Unused. Parameter needed for compatibility with DataFrame.
*args
    Positional arguments to pass to func.
**kwargs
    Keyword arguments to pass to func.
Returns:
Series
    A Series that must have the same length as self.
Raises:
ValueError
    If the returned Series has a different length than self.
# 142. pandas.Series.transform method
# 142-1. Applying a single function
import pandas as pd
# Example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Function to apply
def square(x):
    return x ** 2
# Apply transform
transformed_data = data.transform(square)
print(transformed_data, end='\n\n')

# 142-2. Applying multiple functions
import pandas as pd
# Example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# List of functions to apply
funcs = [lambda x: x + 1, lambda x: x * 2]
# Create an empty DataFrame to hold the results
transformed_df = pd.DataFrame()
# Apply each function in turn, one result column per function
for i, func in enumerate(funcs):
    transformed_df[f'func_{i + 1}'] = data.transform(func)
# Print the result
print(transformed_df)
# 142. pandas.Series.transform method
# 142-1. Applying a single function
# 0      9
# 1     25
# 2     36
# 3     64
# 4    100
# 5    121
# 6    576
# dtype: int64
# 142-2. Applying multiple functions
#    func_1  func_2
# 0       4       6
# 1       6      10
# 2       7      12
# 3       9      16
# 4      11      20
# 5      12      22
# 6      25      48
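The parameter description above says that func may also be a list-like, and that a ValueError is raised when the result does not keep the length of the input. A short sketch of both points, using NumPy functions as in the documentation's own [np.exp, 'sqrt'] example; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.Series([3, 5, 6, 8, 10, 11, 24])

# A list of functions produces a DataFrame with one column per function,
# each column named after its function
transformed = data.transform([np.sqrt, np.exp])
print(transformed.columns.tolist())

# A reducing function does not keep the original length, so transform rejects it
try:
    data.transform('sum')
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Passing the list directly avoids the explicit loop of example 142-2, at the cost of requiring functions with distinct names.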
# 143. pandas.Series.map method
pandas.Series.map(arg, na_action=None)
Map values of Series according to an input mapping or function.
Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.
Parameters:
arg : function, collections.abc.Mapping subclass or Series
    Mapping correspondence.
na_action : {None, 'ignore'}, default None
    If 'ignore', propagate NaN values, without passing them to the mapping correspondence.
Returns:
Series
    Same index as caller.
map applies a given function or mapping (such as a dict or a Series) to each element of a Series and returns a new Series. It is mainly used for element-wise conversion or replacement.
# 143. pandas.Series.map method
# 143-1. Mapping with a function
import pandas as pd
# Example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Mapping function: square each element
transformed_data = data.map(lambda x: x ** 2)
print(transformed_data, end='\n\n')

# 143-2. Mapping with a dict
import pandas as pd
# Example Series
data = pd.Series(['Myelsa', 'Jimmy', 'bryce'])
# Mapping dict
mapping = {'Myelsa': '爸爸', 'Jimmy': '儿子', 'bryce': '女儿'}
# Apply map with the dict
mapped_data = data.map(mapping)
print(mapped_data, end='\n\n')

# 143-3. Handling missing values
import pandas as pd
# Example Series containing a missing value
data = pd.Series([3, 5, None, 6, 8])
# na_action='ignore' leaves NaN untouched instead of passing it to the function
transformed_data = data.map(lambda x: x ** 2, na_action='ignore')
print(transformed_data)
# 143. pandas.Series.map method
# 143-1. Mapping with a function
# 0      9
# 1     25
# 2     36
# 3     64
# 4    100
# 5    121
# 6    576
# dtype: int64
# 143-2. Mapping with a dict
# 0    爸爸
# 1    儿子
# 2    女儿
# dtype: object
# 143-3. Handling missing values
# 0     9.0
# 1    25.0
# 2     NaN
# 3    36.0
# 4    64.0
# dtype: float64
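The dict example above maps every value. As a complement, here is a small sketch of what happens when the mapping lacks a key, and of using a Series as the mapping (the third form of arg named in the parameter list). The animal data is made up for illustration.

```python
import pandas as pd

animals = pd.Series(['cat', 'dog', 'rabbit'])
young = {'cat': 'kitten', 'dog': 'puppy'}   # 'rabbit' is deliberately missing

# Values without a matching key become NaN
mapped = animals.map(young)

# One way to fall back to the original value where no key matched
kept = mapped.fillna(animals)

# A Series can serve as the mapping too: its index holds the lookup keys
lookup = pd.Series({'cat': 4, 'dog': 4, 'rabbit': 4})
legs = animals.map(lookup)

print(mapped)
print(kept)
print(legs)
```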
# 144. pandas.Series.groupby method
pandas.Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=_NoDefault.no_default, dropna=True)
Group Series using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
Parameters:
by : mapping, function, label, pd.Grouper or list of such
    Used to determine the groups for the groupby. If by is a function, it's called on each value of the object's index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series' values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis : {0 or 'index', 1 or 'columns'}, default 0
    Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.
    Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
level : int, level name, or sequence of such, default None
    If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
as_index : bool, default True
    Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively "SQL-style" grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).
sort : bool, default True
    Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).
    Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
group_keys : bool, default True
    When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result's index (and column) labels match the inputs, and are included otherwise.
    Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.
    Changed in version 2.0.0: group_keys now defaults to True.
observed : bool, default False
    This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
    Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
dropna : bool, default True
    If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
Returns:
pandas.api.typing.SeriesGroupBy
    Returns a groupby object that contains information about the groups.
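The by and dropna descriptions above can be illustrated with a short sketch. The grouping dict, the lambda and the small values/keys Series are made-up examples, not part of the original post.

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60],
                 index=['A', 'A', 'B', 'B', 'C', 'C'])

# by as a dict: index labels are looked up and the dict VALUES form the groups
region = {'A': 'first', 'B': 'first', 'C': 'second'}
by_dict = data.groupby(region).sum()

# by as a function: it is called on each index label
by_func = data.groupby(lambda label: 'A' if label == 'A' else 'other').sum()

# dropna: rows whose group key is missing are dropped by default,
# and kept as a separate group with dropna=False
values = pd.Series([1, 2, 3, 4])
keys = pd.Series(['x', None, 'x', None])
without_na = values.groupby(keys).sum()
with_na = values.groupby(keys, dropna=False).sum()

print(by_dict)
print(by_func)
print(without_na)
print(with_na)
```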
# 144. pandas.Series.groupby method
# 144-1. Data aggregation
import pandas as pd
# Create an example Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Group by index label and sum
grouped_sum = data.groupby(data.index).sum()
print(grouped_sum, end='\n\n')

# 144-2. Data transformation
import pandas as pd
# Create an example Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Standardize (z-score) the data within each group
grouped_normalized = data.groupby(data.index).apply(lambda x: (x - x.mean()) / x.std())
print(grouped_normalized, end='\n\n')

# 144-3. Data filtering
import pandas as pd
# Create an example Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Find the maximum of each group
grouped_max = data.groupby(data.index).max()
print(grouped_max, end='\n\n')

# 144-4. Filling missing data
import pandas as pd
# Create an example Series with missing values
data_with_nan = pd.Series([10, 20, None, 40, 50, None], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Fill missing values with the mean of their group
grouped_filled = data_with_nan.groupby(data_with_nan.index).transform(lambda x: x.fillna(x.mean()))
print(grouped_filled, end='\n\n')

# 144-5. Group statistics
import pandas as pd
# Create an example Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Compute the standard deviation of each group
grouped_std = data.groupby(data.index).std()
print(grouped_std, end='\n\n')

# 144-6. Time series analysis
import pandas as pd
# Create an example time series ('ME', month end, requires pandas 2.2 or later; older versions use 'M')
date_range = pd.date_range(start='2024-01-01', periods=6, freq='ME')
time_series_data = pd.Series([100, 200, 300, 400, 500, 600], index=date_range)
# Group by year and sum
grouped_by_year = time_series_data.groupby(time_series_data.index.year).sum()
print(grouped_by_year, end='\n\n')

# 144-7. Categorical data analysis
import pandas as pd
# Create an example Series whose index labels act as categories
category_data = pd.Series([1, 2, 2, 3, 3, 3], index=['Myelsa', 'Alex', 'Lucy', 'Myelsa', 'Alex', 'Myelsa'])
# Group by category and count
grouped_count = category_data.groupby(category_data.index).count()
print(grouped_count)
# 144. pandas.Series.groupby method
# 144-1. Data aggregation
# A     30
# B     70
# C    110
# dtype: int64
# 144-2. Data transformation
# A  A   -0.707107
#    A    0.707107
# B  B   -0.707107
#    B    0.707107
# C  C   -0.707107
#    C    0.707107
# dtype: float64
# 144-3. Data filtering
# A    20
# B    40
# C    60
# dtype: int64
# 144-4. Filling missing data
# A    10.0
# A    20.0
# B    40.0
# B    40.0
# C    50.0
# C    50.0
# dtype: float64
# 144-5. Group statistics
# A    7.071068
# B    7.071068
# C    7.071068
# dtype: float64
# 144-6. Time series analysis
# 2024    2100
# dtype: int64
# 144-7. Categorical data analysis
# Alex      2
# Lucy      1
# Myelsa    3
# dtype: int64
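In the output of 144-2, apply adds the group key as an extra index level. If the original index should be kept, one possible alternative (a sketch, not the original author's approach) is to compute the per-group statistics with transform, which returns a result indexed like the input.

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60],
                 index=['A', 'A', 'B', 'B', 'C', 'C'])

groups = data.groupby(level=0)  # same grouping as groupby(data.index)

# transform broadcasts each group's statistic back onto the original rows,
# so the result keeps the original single-level index
z_scores = (data - groups.transform('mean')) / groups.transform('std')
print(z_scores.round(6))
```

The values match those of 144-2; only the index differs.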
# 145. pandas.Series.rolling method
pandas.Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=_NoDefault.no_default, closed=None, step=None, method='single')
Provide rolling window calculations.
Parameters:
window : int, timedelta, str, offset, or BaseIndexer subclass
    Size of the moving window.
    If an integer, the fixed number of observations used for each window.
    If a timedelta, str, or offset, the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetimelike indexes. To learn more about the offsets and frequency strings, see the pandas time series user guide.
    If a BaseIndexer subclass, the window boundaries are based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, closed and step, will be passed to get_window_bounds.
min_periods : int, default None
    Minimum number of observations in window required to have a value; otherwise, result is np.nan.
    For a window that is specified by an offset, min_periods will default to 1.
    For a window that is specified by an integer, min_periods will default to the size of the window.
center : bool, default False
    If False, set the window labels as the right edge of the window index.
    If True, set the window labels as the center of the window index.
win_type : str, default None
    If None, all points are evenly weighted.
    If a string, it must be a valid scipy.signal window function.
    Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature.
on : str, optional
    For a DataFrame, a column label or Index level on which to calculate the rolling window, rather than the DataFrame's index.
    Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.
axis : int or str, default 0
    If 0 or 'index', roll across the rows.
    If 1 or 'columns', roll across the columns.
    For Series this parameter is unused and defaults to 0.
    Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1, transpose the DataFrame first instead.
closed : str, default None
    If 'right', the first point in the window is excluded from calculations.
    If 'left', the last point in the window is excluded from calculations.
    If 'both', no points in the window are excluded from calculations.
    If 'neither', the first and last points in the window are excluded from calculations.
    Default None ('right').
step : int, default None
    New in version 1.5.0.
    Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.
method : str {'single', 'table'}, default 'single'
    New in version 1.3.0.
    Execute the rolling operation per single column or row ('single') or over the entire object ('table').
    This argument is only implemented when specifying engine='numba' in the method call.
Returns:
pandas.api.typing.Window or pandas.api.typing.Rolling
    An instance of Window is returned if win_type is passed. Otherwise, an instance of Rolling is returned.
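145-2-4. win_type (optional, default None): the window type (e.g. 'boxcar', 'triang', 'blackman'). If provided, the observations in each window are weighted according to that type.
145-2-7. closed (optional, default None): defines which window endpoints are included; one of 'right', 'left', 'both', 'neither'.
145-2-8. step (optional, default None): the stride between evaluated windows. None behaves like a step of 1, i.e. every window is evaluated.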
- .mean()
- .sum()
- .std()
- .var()
- .min()
- .max()
- .median()
- .apply(func)
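Of the methods listed above, .apply(func) is the most general one. A brief sketch of a custom window statistic, together with the min_periods parameter described earlier; the lambda and variable names are illustrative assumptions.

```python
import pandas as pd

data = pd.Series([3, 5, 6, 8, 10, 11, 24])

# Custom statistic: spread (max - min) of each 3-element window.
# raw=True hands each window to the function as a NumPy array.
rolling_spread = data.rolling(window=3).apply(lambda w: w.max() - w.min(), raw=True)

# min_periods=1 yields a value as soon as one observation is available,
# so the first two positions are no longer NaN
rolling_mean_partial = data.rolling(window=3, min_periods=1).mean()

print(rolling_spread)
print(rolling_mean_partial)
```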
# 145. pandas.Series.rolling method
# 145-1. Rolling mean
import pandas as pd
# Create an example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Compute the rolling mean with a window size of 3
rolling_mean = data.rolling(window=3).mean()
print(rolling_mean, end='\n\n')

# 145-2. Rolling sum
import pandas as pd
# Create an example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Compute the rolling sum with a window size of 3
rolling_sum = data.rolling(window=3).sum()
print(rolling_sum, end='\n\n')

# 145-3. Rolling standard deviation
import pandas as pd
# Create an example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Compute the rolling standard deviation with a window size of 3
rolling_std = data.rolling(window=3).std()
print(rolling_std, end='\n\n')

# 145-4. Weighted rolling mean
import pandas as pd
# Create an example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Compute the weighted rolling mean, window size 3, window type 'triang' (requires SciPy)
rolling_weighted_mean = data.rolling(window=3, win_type='triang').mean()
print(rolling_weighted_mean, end='\n\n')

# 145-5. Rolling mean with a custom step
import pandas as pd
# Create an example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Compute the rolling mean with a window size of 3 and a step of 2
rolling_mean_step = data.rolling(window=3, step=2).mean()
print(rolling_mean_step, end='\n\n')

# 145-6. Centered rolling mean
import pandas as pd
# Create an example Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Compute the centered rolling mean with a window size of 3
rolling_mean_centered = data.rolling(window=3, center=True).mean()
print(rolling_mean_centered)
# 145. pandas.Series.rolling method
# 145-1. Rolling mean
# 0          NaN
# 1          NaN
# 2     4.666667
# 3     6.333333
# 4     8.000000
# 5     9.666667
# 6    15.000000
# dtype: float64
# 145-2. Rolling sum
# 0     NaN
# 1     NaN
# 2    14.0
# 3    19.0
# 4    24.0
# 5    29.0
# 6    45.0
# dtype: float64
# 145-3. Rolling standard deviation
# 0         NaN
# 1         NaN
# 2    1.527525
# 3    1.527525
# 4    2.000000
# 5    1.527525
# 6    7.810250
# dtype: float64
# 145-4. Weighted rolling mean
# 0      NaN
# 1      NaN
# 2     4.75
# 3     6.25
# 4     8.00
# 5     9.75
# 6    14.00
# dtype: float64
# 145-5. Rolling mean with a custom step
# 0          NaN
# 2     4.666667
# 4     8.000000
# 6    15.000000
# dtype: float64
# 145-6. Centered rolling mean
# 0          NaN
# 1     4.666667
# 2     6.333333
# 3     8.000000
# 4     9.666667
# 5    15.000000
# 6          NaN
# dtype: float64
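The examples above all use integer windows. The window parameter also accepts an offset string on a datetime-like index, in which case min_periods defaults to 1 and closed controls which endpoints count. A minimal sketch with made-up daily data:

```python
import pandas as pd

index = pd.date_range(start='2024-01-01', periods=5, freq='D')
ts = pd.Series([1, 2, 3, 4, 5], index=index)

# '2D' window, default closed='right': covers (t - 2 days, t],
# i.e. the current day and the day before
two_day_sum = ts.rolling('2D').sum()

# closed='both' also includes the left endpoint t - 2 days
two_day_sum_both = ts.rolling('2D', closed='both').sum()

print(two_day_sum)
print(two_day_sum_both)
```

Because min_periods defaults to 1 for offset windows, the first rows hold partial sums rather than NaN.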