Contents
141. pandas.Series.agg (aggregate) method
I. Usage in Detail
141. pandas.Series.agg (aggregate) method
141-1. Syntax
# 141. pandas.Series.agg (aggregate) method
pandas.Series.agg(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.

Parameters:
func : function, str, list or dict
    Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply.
    Accepted combinations are:
    - function
    - string function name
    - list of functions and/or function names, e.g. [np.sum, 'mean']
    - dict of axis labels -> functions, function names or list of such.
axis : {0 or 'index'}
    Unused. Parameter needed for compatibility with DataFrame.
*args
    Positional arguments to pass to func.
**kwargs
    Keyword arguments to pass to func.

Returns:
scalar, Series or DataFrame
    The return can be:
    - scalar : when Series.agg is called with single function
    - Series : when DataFrame.agg is called with a single function
    - DataFrame : when DataFrame.agg is called with several functions
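141-2. Parameters
141-2-1. func (optional, default None): a single function, a function name (string), a list of functions and/or function names, or a dict; it specifies the aggregation(s) to apply.
141-2-2. axis (optional, default 0): unused for a Series, which has only one dimension; the parameter exists only for compatibility with DataFrame.agg.
141-2-3. *args (optional): extra positional arguments passed to func.
141-2-4. **kwargs (optional): keyword arguments passed to func.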
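The forwarding of *args and **kwargs can be sketched as follows. The helper power_sum is a hypothetical function written for this illustration, not part of pandas. Note that the second positional slot of agg is axis, so extra positional arguments only reach func after an explicit axis value.

```python
import pandas as pd

s = pd.Series([3, 5, 6, 8, 10, 11, 24])

def power_sum(x, power):
    """Hypothetical helper: sum of the values raised to `power`."""
    return (x ** power).sum()

# Keyword form: power=2 is forwarded to power_sum through **kwargs
by_kwargs = s.agg(power_sum, power=2)

# Positional form: axis (0) must be given first, then 2 is forwarded through *args
by_args = s.agg(power_sum, 0, 2)

print(by_kwargs, by_args)
```

The keyword form is usually the easier one to read, because it avoids the axis placeholder.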
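141-3. Purpose
Applies one or more aggregation functions to the data in a Series.
141-4. Return value
The aggregated result. A single function or function name returns a scalar; a list or dict of functions returns a Series labelled by function name or by dict key.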
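A small sketch of how the return type follows the form of func. The labels 'largest' and 'smallest' are arbitrary names chosen for this example.

```python
import pandas as pd

s = pd.Series([3, 5, 6, 8, 10, 11, 24])

# A single function name gives a scalar
single = s.agg('max')

# A list, even with one entry, gives a Series indexed by function name
as_list = s.agg(['max'])

# A dict gives a Series indexed by the dict keys
as_dict = s.agg({'largest': 'max', 'smallest': 'min'})

print(type(single).__name__)
print(list(as_list.index))
print(list(as_dict.index))
```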
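141-5. Notes
Use cases:
141-5-1. Statistical analysis: computing the sum, mean, minimum, maximum, and so on.
141-5-2. Data summarization: producing a short descriptive summary of the data.
141-5-3. Custom statistics: supplying your own aggregation function for special statistical needs.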
141-6. Usage
141-6-1. Data preparation
None
141-6-2. Code example
# 141. pandas.Series.agg (aggregate) method
# 141-1. Using a single function
import pandas as pd
# Create a sample Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
sum_result = s.agg('sum')
print(f"Sum: {sum_result}", end='\n\n')

# 141-2. Using multiple aggregation functions
import pandas as pd
# Create a sample Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
agg_result = s.agg(['sum', 'mean', 'min', 'max'])
print(agg_result, end='\n\n')

# 141-3. Using a custom aggregation function
import pandas as pd
# Create a sample Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
def range_func(x):
    # Spread between the largest and smallest value
    return x.max() - x.min()
range_result = s.agg(range_func)
print(f"Range: {range_result}", end='\n\n')

# 141-4. Using a dict to name the results of different functions
import pandas as pd
# Create a sample Series
s = pd.Series([3, 5, 6, 8, 10, 11, 24])
agg_dict_result = s.agg({'sum': 'sum', 'average': 'mean'})
print(agg_dict_result)
141-6-3. Output
# 141. pandas.Series.agg (aggregate) method
# 141-1. Using a single function
# Sum: 67

# 141-2. Using multiple aggregation functions
# sum     67.000000
# mean     9.571429
# min      3.000000
# max     24.000000
# dtype: float64

# 141-3. Using a custom aggregation function
# Range: 21

# 141-4. Using a dict to name the results of different functions
# sum        67.000000
# average     9.571429
# dtype: float64
142. pandas.Series.transform method
142-1. Syntax
# 142. pandas.Series.transform method
pandas.Series.transform(func, axis=0, *args, **kwargs)
Call func on self producing a Series with the same axis shape as self.

Parameters:
func : function, str, list-like or dict-like
    Function to use for transforming the data. If a function, must either work when passed a Series or when passed to Series.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.
    Accepted combinations are:
    - function
    - string function name
    - list-like of functions and/or function names, e.g. [np.exp, 'sqrt']
    - dict-like of axis labels -> functions, function names or list-like of such.
axis : {0 or 'index'}
    Unused. Parameter needed for compatibility with DataFrame.
*args
    Positional arguments to pass to func.
**kwargs
    Keyword arguments to pass to func.

Returns:
Series
    A Series that must have the same length as self.

Raises:
ValueError
    If the returned Series has a different length than self.
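142-2. Parameters
142-2-1. func (required): the transformation to apply; a function, a function name (string), or a list-like or dict-like of these. A function must work either when passed the whole Series or when passed to Series.apply.
142-2-2. axis (optional, default 0): unused for a Series, which is one-dimensional; kept for compatibility with DataFrame.
142-2-3. *args (optional): positional arguments passed to func.
142-2-4. **kwargs (optional): keyword arguments passed to func.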
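As a sketch of the accepted forms of func: a string name and a list of functions can be passed directly, without looping. The sample values are chosen only so that the square roots come out as whole numbers.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 4, 9, 16])

# A string names the function to apply
roots = data.transform('sqrt')

# A list of functions gives a DataFrame with one column per function
both = data.transform([np.sqrt, np.square])

print(roots)
print(both)
```

With a list, the column names are taken from the function names, so each function in the list needs a distinct name.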
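142-3. Purpose
Applies a function to a Series and returns a Series with the same shape as the input, in which every element has been processed by the function.
142-4. Return value
A transformed Series with the same length as the input.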
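The same-length requirement can be checked with a function that reduces the Series to one value. This is a minimal sketch; the exact error message may differ between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

data = pd.Series([3, 5, 6, 8, 10, 11, 24])

try:
    # 'sum' collapses the Series to a single value, which is not a transformation
    data.transform('sum')
    outcome = 'no error'
except ValueError as exc:
    outcome = f'ValueError: {exc}'

print(outcome)
```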
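142-5. Notes
142-5-1. transform is similar to apply, but it guarantees that the output has the same shape as the input; a function that changes the length raises a ValueError.
142-5-2. The method is especially useful when a transformation must preserve the original structure, for example in grouped operations.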
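For the grouped case, a short sketch contrasting aggregation with transformation. It uses SeriesGroupBy.transform, the grouped counterpart of Series.transform, and the data is made up for illustration.

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])

# Aggregation: one value per group
group_means = data.groupby(level=0).mean()

# Transformation: each group's mean is broadcast back to every original row
broadcast_means = data.groupby(level=0).transform('mean')

# Deviation of each value from its own group's mean
centered = data - broadcast_means

print(group_means)
print(centered)
```

Because the transformed result keeps the original index, it can be combined with the input directly, as in the last step.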
142-6. Usage
142-6-1. Data preparation
None
142-6-2. Code example
# 142. pandas.Series.transform method
# 142-1. Applying a single function
import pandas as pd
# Sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Function to apply
def square(x):
    return x ** 2
# Apply transform
transformed_data = data.transform(square)
print(transformed_data, end='\n\n')

# 142-2. Applying multiple functions
import pandas as pd
# Sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# List of functions to apply
funcs = [lambda x: x + 1, lambda x: x * 2]
# Create an empty DataFrame to hold the results
transformed_df = pd.DataFrame()
# Apply each function in a loop, one result column per function
for i, func in enumerate(funcs):
    transformed_df[f'func_{i + 1}'] = data.transform(func)
# Print the result
print(transformed_df)
142-6-3. Output
# 142. pandas.Series.transform method
# 142-1. Applying a single function
# 0      9
# 1     25
# 2     36
# 3     64
# 4    100
# 5    121
# 6    576
# dtype: int64

# 142-2. Applying multiple functions
#    func_1  func_2
# 0       4       6
# 1       6      10
# 2       7      12
# 3       9      16
# 4      11      20
# 5      12      22
# 6      25      48
143. pandas.Series.map method
143-1. Syntax
# 143. pandas.Series.map method
pandas.Series.map(arg, na_action=None)
Map values of Series according to an input mapping or function.
Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.

Parameters:
arg : function, collections.abc.Mapping subclass or Series
    Mapping correspondence.
na_action : {None, 'ignore'}, default None
    If 'ignore', propagate NaN values, without passing them to the mapping correspondence.

Returns:
Series
    Same index as caller.
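143-2. Parameters
143-2-1. arg (required): a function, dict (or other mapping), or Series used to map each element.
143-2-2. na_action (optional, default None): how missing values are handled; with 'ignore', NaN values are propagated to the result without being passed to the mapping.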
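143-3. Purpose
Applies a given function or mapping (a dict, a Series, and so on) to every element of a Series and returns a new Series; it is mainly used for element-wise conversion or substitution.
143-4. Return value
A new Series in which every element has been mapped or converted according to arg; it has the same index as the caller.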
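143-5. Notes
pandas.Series.map applies a function or mapping to each element of a Series. It can skip missing values via na_action and accepts functions, dicts, and Series as the mapping, which makes it well suited to element-wise conversion and substitution.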
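A brief sketch of the three kinds of arg side by side, including what happens to a value that has no entry in a dict. The animal data is invented for illustration.

```python
import pandas as pd

data = pd.Series(['cat', 'dog', 'bird'])

# Dict lookup: 'bird' has no key, so it maps to NaN
sounds = data.map({'cat': 'meow', 'dog': 'woof'})

# A Series acts like a dict keyed by its index
lookup = pd.Series({'cat': 4, 'dog': 4, 'bird': 2})
legs = data.map(lookup)

# A function is called once per element
lengths = data.map(len)

print(sounds)
print(legs)
print(lengths)
```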
143-6. Usage
143-6-1. Data preparation
None
143-6-2. Code example
# 143. pandas.Series.map method
# 143-1. Mapping with a function
import pandas as pd
# Sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Mapping function: square each element
transformed_data = data.map(lambda x: x ** 2)
print(transformed_data, end='\n\n')

# 143-2. Mapping with a dict
import pandas as pd
# Sample Series
data = pd.Series(['Myelsa', 'Jimmy', 'bryce'])
# Mapping dict (the values are Chinese for father, son, and daughter)
mapping = {'Myelsa': '爸爸', 'Jimmy': '儿子', 'bryce': '女儿'}
# Apply map
mapped_data = data.map(mapping)
print(mapped_data, end='\n\n')

# 143-3. Handling missing values
import pandas as pd
# Sample Series containing a missing value
data = pd.Series([3, 5, None, 6, 8])
# Mapping function; na_action='ignore' leaves NaN untouched
transformed_data = data.map(lambda x: x ** 2, na_action='ignore')
print(transformed_data)
143-6-3. Output
# 143. pandas.Series.map method
# 143-1. Mapping with a function
# 0      9
# 1     25
# 2     36
# 3     64
# 4    100
# 5    121
# 6    576
# dtype: int64

# 143-2. Mapping with a dict
# 0    爸爸
# 1    儿子
# 2    女儿
# dtype: object

# 143-3. Handling missing values
# 0     9.0
# 1    25.0
# 2     NaN
# 3    36.0
# 4    64.0
# dtype: float64
144. pandas.Series.groupby method
144-1. Syntax
# 144. pandas.Series.groupby method
pandas.Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=_NoDefault.no_default, dropna=True)
Group Series using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
by : mapping, function, label, pd.Grouper or list of such
    Used to determine the groups for the groupby. If by is a function, it's called on each value of the object's index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series' values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis : {0 or 'index', 1 or 'columns'}, default 0
    Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.
    Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
level : int, level name, or sequence of such, default None
    If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
as_index : bool, default True
    Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively "SQL-style" grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).
sort : bool, default True
    Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).
    Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
group_keys : bool, default True
    When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result's index (and column) labels match the inputs, and are included otherwise.
    Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.
    Changed in version 2.0.0: group_keys now defaults to True.
observed : bool, default False
    This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
    Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
dropna : bool, default True
    If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:
pandas.api.typing.SeriesGroupBy
    Returns a groupby object that contains information about the groups.
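144-2. Parameters
144-2-1. by (optional, default None): the grouping criterion; a mapping (dict or Series), a function (called on each index label), a label, a pd.Grouper, or a list of these.
144-2-2. axis (optional, default 0): the axis to split along; unused for a Series and deprecated since pandas 2.1.0.
144-2-3. level (optional, default None): the level or levels to group by when the index is a MultiIndex; do not specify both by and level.
144-2-4. as_index (optional, default True): if True, the group labels become the index of the result; only relevant for DataFrame input.
144-2-5. sort (optional, default True): sort the group keys; turning this off can improve performance and does not change the order of rows within each group.
144-2-6. group_keys (optional, default True): when calling apply, if True, the group keys are added to the index of the result to identify the pieces.
144-2-7. observed (optional): applies only to categorical groupers; if True, only observed category values are shown, otherwise all category values are shown.
144-2-8. dropna (optional, default True): if True, rows whose group key is NA are dropped; if False, NA is treated as a group key of its own.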
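The by, sort, and dropna parameters can be sketched on a small Series. The fruit labels, the kinds mapping, and the keys array are all made up for this example.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5], index=['pear', 'fig', 'plum', 'kiwi', 'fig'])

# by as a function: called on each index label (here, the label's length)
by_length = data.groupby(len).sum()

# by as a dict: index labels are mapped to group names
kinds = {'pear': 'pome', 'fig': 'other', 'plum': 'stone', 'kiwi': 'other'}
by_kind = data.groupby(kinds).sum()

# sort=False keeps the groups in order of first appearance
unsorted = data.groupby(level=0, sort=False).sum()

# by as an array of keys, one per row; two of the keys are missing
keys = np.array(['x', np.nan, 'x', np.nan, 'y'], dtype=object)
dropped = data.groupby(keys).sum()               # rows with a missing key are dropped
kept = data.groupby(keys, dropna=False).sum()    # the missing key forms its own group

print(by_length, by_kind, unsorted, dropped, kept, sep='\n\n')
```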
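144-3. Purpose
Groups the data in a Series so that aggregation, transformation, or other operations can then be applied to each group. This is very useful in data analysis, especially when data has to be grouped and processed according to some condition.
144-4. Return value
A SeriesGroupBy object (documented as pandas.api.typing.SeriesGroupBy), to which various aggregation and transformation operations can be applied.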
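The returned object does no computation by itself; a short sketch of inspecting it and then aggregating. The data matches the style of the examples in this section.

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
grouped = data.groupby(level=0)

print(type(grouped).__name__)   # class name of the returned object
print(grouped.ngroups)          # number of distinct groups

# Iterating yields (key, sub-Series) pairs
for key, group in grouped:
    print(key, group.tolist())

# Several aggregations at once give one column per function
summary = grouped.agg(['sum', 'mean'])
print(summary)
```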
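144-5. Notes
pandas.Series.groupby groups data by a range of criteria and runs operations on each group. Whether grouping by index labels, by a function, or by levels of a MultiIndex, it offers flexible grouping and combines with other pandas methods for more complex analysis and processing.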
144-6. Usage
144-6-1. Data preparation
None
144-6-2. Code example
# 144. pandas.Series.groupby method
# 144-1. Data aggregation
import pandas as pd
# Create a sample Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Group by index label and sum
grouped_sum = data.groupby(data.index).sum()
print(grouped_sum, end='\n\n')

# 144-2. Data transformation
import pandas as pd
# Create a sample Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Standardize the data within each group (z-score)
grouped_normalized = data.groupby(data.index).apply(lambda x: (x - x.mean()) / x.std())
print(grouped_normalized, end='\n\n')

# 144-3. Data filtering
import pandas as pd
# Create a sample Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Find the maximum of each group
grouped_max = data.groupby(data.index).max()
print(grouped_max, end='\n\n')

# 144-4. Filling missing data
import pandas as pd
# Create a sample Series with missing values
data_with_nan = pd.Series([10, 20, None, 40, 50, None], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Fill missing values with the mean of each group
grouped_filled = data_with_nan.groupby(data_with_nan.index).transform(lambda x: x.fillna(x.mean()))
print(grouped_filled, end='\n\n')

# 144-5. Group statistics
import pandas as pd
# Create a sample Series
data = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'A', 'B', 'B', 'C', 'C'])
# Compute the standard deviation of each group
grouped_std = data.groupby(data.index).std()
print(grouped_std, end='\n\n')

# 144-6. Time series analysis
import pandas as pd
# Create a sample time series (the 'ME' month-end alias requires pandas 2.2 or later)
date_range = pd.date_range(start='2024-01-01', periods=6, freq='ME')
time_series_data = pd.Series([100, 200, 300, 400, 500, 600], index=date_range)
# Group by year and sum
grouped_by_year = time_series_data.groupby(time_series_data.index.year).sum()
print(grouped_by_year, end='\n\n')

# 144-7. Categorical data analysis
import pandas as pd
# Create a sample Series whose index holds the categories
category_data = pd.Series([1, 2, 2, 3, 3, 3], index=['Myelsa', 'Alex', 'Lucy', 'Myelsa', 'Alex', 'Myelsa'])
# Group by category and count
grouped_count = category_data.groupby(category_data.index).count()
print(grouped_count)
144-6-3. Output
# 144. pandas.Series.groupby method
# 144-1. Data aggregation
# A     30
# B     70
# C    110
# dtype: int64

# 144-2. Data transformation
# A  A   -0.707107
#    A    0.707107
# B  B   -0.707107
#    B    0.707107
# C  C   -0.707107
#    C    0.707107
# dtype: float64

# 144-3. Data filtering
# A    20
# B    40
# C    60
# dtype: int64

# 144-4. Filling missing data
# A    10.0
# A    20.0
# B    40.0
# B    40.0
# C    50.0
# C    50.0
# dtype: float64

# 144-5. Group statistics
# A    7.071068
# B    7.071068
# C    7.071068
# dtype: float64

# 144-6. Time series analysis
# 2024    2100
# dtype: int64

# 144-7. Categorical data analysis
# Alex      2
# Lucy      1
# Myelsa    3
# dtype: int64
145. pandas.Series.rolling method
145-1. Syntax
# 145. pandas.Series.rolling method
pandas.Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=_NoDefault.no_default, closed=None, step=None, method='single')
Provide rolling window calculations.

Parameters:
window : int, timedelta, str, offset, or BaseIndexer subclass
    Size of the moving window.
    If an integer, the fixed number of observations used for each window.
    If a timedelta, str, or offset, the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes. To learn more about offsets and frequency strings, see the offset aliases section of the pandas time series user guide.
    If a BaseIndexer subclass, the window boundaries based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, closed and step will be passed to get_window_bounds.
min_periods : int, default None
    Minimum number of observations in window required to have a value; otherwise, result is np.nan.
    For a window that is specified by an offset, min_periods will default to 1.
    For a window that is specified by an integer, min_periods will default to the size of the window.
center : bool, default False
    If False, set the window labels as the right edge of the window index.
    If True, set the window labels as the center of the window index.
win_type : str, default None
    If None, all points are evenly weighted.
    If a string, it must be a valid scipy.signal window function.
    Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature.
on : str, optional
    For a DataFrame, a column label or Index level on which to calculate the rolling window, rather than the DataFrame's index.
    Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.
axis : int or str, default 0
    If 0 or 'index', roll across the rows.
    If 1 or 'columns', roll across the columns.
    For Series this parameter is unused and defaults to 0.
    Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1, transpose the DataFrame first instead.
closed : str, default None
    If 'right', the first point in the window is excluded from calculations.
    If 'left', the last point in the window is excluded from calculations.
    If 'both', no points in the window are excluded from calculations.
    If 'neither', the first and last points in the window are excluded from calculations.
    Default None ('right').
step : int, default None
    New in version 1.5.0.
    Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.
method : str {'single', 'table'}, default 'single'
    New in version 1.3.0.
    Execute the rolling operation per single column or row ('single') or over the entire object ('table').
    This argument is only implemented when specifying engine='numba' in the method call.

Returns:
pandas.api.typing.Window or pandas.api.typing.Rolling
    An instance of Window is returned if win_type is passed. Otherwise, an instance of Rolling is returned.
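145-2. Parameters
145-2-1. window (required): the size of the moving window. An integer is a fixed number of observations per window; a timedelta, string, or offset is a time period (valid only for datetime-like indexes); a BaseIndexer subclass defines custom window boundaries.
145-2-2. min_periods (optional, default None): the minimum number of observations a window must contain to produce a value; otherwise the result is NaN. It defaults to the window size for an integer window and to 1 for an offset window.
145-2-3. center (optional, default False): if True, the window label is set at the center of the window; if False, at its right edge.
145-2-4. win_type (optional, default None): a window type such as 'boxcar', 'triang', or 'blackman'; it must be a valid scipy.signal window function, and the points in the window are weighted accordingly. If None, all points are weighted equally.
145-2-5. on (optional, default None): for a DataFrame, the column label or index level on which to compute the rolling window instead of the index; it does not apply to a Series.
145-2-6. axis (optional): the axis to roll along, 0 or 'index' for rows and 1 or 'columns' for columns; unused for a Series and deprecated since pandas 2.1.0.
145-2-7. closed (optional, default None): which window endpoints are included: 'right', 'left', 'both', or 'neither'; None behaves like 'right'.
145-2-8. step (optional, default None): evaluate the window only at every step-th position, equivalent to slicing the result with [::step]; None behaves like a step of 1.
145-2-9. method (optional, default 'single'): 'single' runs the rolling operation per column or row, 'table' over the entire object; it is only implemented when engine='numba' is specified in the method call.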
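A sketch of an offset-based window and the closed parameter, using a hand-built datetime index with a gap on 2024-01-03. The dates and values are invented for illustration.

```python
import pandas as pd

idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04', '2024-01-05'])
ts = pd.Series([1, 2, 3, 4], index=idx)

# Offset window: each window covers the 2 days ending at the row's timestamp,
# excluding the left endpoint (the default, closed='right')
two_day = ts.rolling('2D').sum()

# closed='both' also includes the left endpoint of the window
two_day_both = ts.rolling('2D', closed='both').sum()

print(two_day)
print(two_day_both)
```

The two results differ only at 2024-01-04, where the observation exactly two days earlier falls on the window's left edge.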
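145-3. Purpose
145-3-1. Moving average: the mean within a window of the given size, used to smooth data.
145-3-2. Moving sum: the sum within a window of the given size, suited to analyzing cumulative data.
145-3-3. Moving standard deviation: the standard deviation within a window of the given size, used to measure volatility.
145-3-4. Weighted moving average: a weighted mean computed with a window type such as a triangular or Hann window.
145-3-5. Moving maximum/minimum: the largest or smallest value within the window.
145-3-6. Moving median: the median within the window, used to describe the central tendency of the data.
145-3-7. Custom statistics: the apply method runs a user-defined function on the data in each window to compute a specific statistic.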
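The custom-statistic case can be sketched with Rolling.apply. The helper window_range is a hypothetical function written for this example.

```python
import pandas as pd

data = pd.Series([3, 5, 6, 8, 10, 11, 24])

def window_range(values):
    """Hypothetical helper: spread between the largest and smallest value."""
    return values.max() - values.min()

# raw=True passes each window to the function as a NumPy array rather than a Series
rolling_range = data.rolling(window=3).apply(window_range, raw=True)
print(rolling_range)
```

Passing raw=True is a design choice here: the helper only needs array methods, and skipping Series construction for every window is generally cheaper.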
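145-4. Return value
pandas.Series.rolling returns a Rolling object (or a Window object if win_type is passed), on which various statistical methods can then be called to compute statistics over each rolling window. These methods include, but are not limited to: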
- .mean()
- .sum()
- .std()
- .var()
- .min()
- .max()
- .median()
- .apply(func)
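Since the Rolling object is separate from any one statistic, it can be created once and reused. The sketch below also uses its agg method to compute several statistics in one call.

```python
import pandas as pd

data = pd.Series([3, 5, 6, 8, 10, 11, 24])
roller = data.rolling(window=3)

print(type(roller).__name__)   # class name of the returned object

# Several statistics at once: one column per function name
stats = roller.agg(['min', 'max', 'median'])
print(stats)
```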
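145-5. Notes
145-5-1. Window size: window is the key parameter; it determines how many elements each calculation includes.
145-5-2. Missing values: min_periods sets the minimum number of non-missing values a window needs; if a window has fewer, the result is NaN.
145-5-3. Alignment: center controls whether the window is centered; if True, the center of the window is aligned with the current element, otherwise the right edge is.
145-5-4. Weighted windows: win_type selects a weighting scheme so that the elements in a window carry different weights.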
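The effect of min_periods on missing values can be sketched with a Series containing one NaN. The values are chosen so that every three-element window touches the gap.

```python
import numpy as np
import pandas as pd

data = pd.Series([3, 5, np.nan, 8, 10])

# Default: min_periods equals the window size, so a window that is
# incomplete or contains a NaN produces NaN
strict = data.rolling(window=3).mean()

# min_periods=1: a result is produced as soon as one non-missing value is present
relaxed = data.rolling(window=3, min_periods=1).mean()

print(strict)
print(relaxed)
```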
145-6. Usage
145-6-1. Data preparation
None
145-6-2. Code example
# 145. pandas.Series.rolling method
# 145-1. Rolling mean
import pandas as pd
# Create a sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Rolling mean with a window size of 3
rolling_mean = data.rolling(window=3).mean()
print(rolling_mean, end='\n\n')

# 145-2. Rolling sum
import pandas as pd
# Create a sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Rolling sum with a window size of 3
rolling_sum = data.rolling(window=3).sum()
print(rolling_sum, end='\n\n')

# 145-3. Rolling standard deviation
import pandas as pd
# Create a sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Rolling standard deviation with a window size of 3
rolling_std = data.rolling(window=3).std()
print(rolling_std, end='\n\n')

# 145-4. Weighted rolling mean
import pandas as pd
# Create a sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Weighted rolling mean, window size 3, window type 'triang' (requires SciPy)
rolling_weighted_mean = data.rolling(window=3, win_type='triang').mean()
print(rolling_weighted_mean, end='\n\n')

# 145-5. Rolling mean with a custom step
import pandas as pd
# Create a sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Rolling mean with a window size of 3 and a step of 2
rolling_mean_step = data.rolling(window=3, step=2).mean()
print(rolling_mean_step, end='\n\n')

# 145-6. Centered rolling mean
import pandas as pd
# Create a sample Series
data = pd.Series([3, 5, 6, 8, 10, 11, 24])
# Centered rolling mean with a window size of 3
rolling_mean_centered = data.rolling(window=3, center=True).mean()
print(rolling_mean_centered)
145-6-3. Output
# 145. pandas.Series.rolling method
# 145-1. Rolling mean
# 0          NaN
# 1          NaN
# 2     4.666667
# 3     6.333333
# 4     8.000000
# 5     9.666667
# 6    15.000000
# dtype: float64

# 145-2. Rolling sum
# 0     NaN
# 1     NaN
# 2    14.0
# 3    19.0
# 4    24.0
# 5    29.0
# 6    45.0
# dtype: float64

# 145-3. Rolling standard deviation
# 0         NaN
# 1         NaN
# 2    1.527525
# 3    1.527525
# 4    2.000000
# 5    1.527525
# 6    7.810250
# dtype: float64

# 145-4. Weighted rolling mean
# 0      NaN
# 1      NaN
# 2     4.75
# 3     6.25
# 4     8.00
# 5     9.75
# 6    14.00
# dtype: float64

# 145-5. Rolling mean with a custom step
# 0          NaN
# 2     4.666667
# 4     8.000000
# 6    15.000000
# dtype: float64

# 145-6. Centered rolling mean
# 0          NaN
# 1     4.666667
# 2     6.333333
# 3     8.000000
# 4     9.666667
# 5    15.000000
# 6          NaN
# dtype: float64