Python Cool Libraries Tour: Third-Party Library Pandas (045)

Published: 2024-08-07 11:07

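I. Usage in Detail

156. The pandas.Series.count method

156-1. Syntax

156-2. Parameters

156-3. Functionality

156-4. Return value

156-5. Notes

156-6. Usage

156-6-1. Data preparation

156-6-2. Code example

156-6-3. Output

157. The pandas.Series.cov method

157-1. Syntax

157-2. Parameters

157-3. Functionality

157-4. Return value

157-5. Notes

157-6. Usage

157-6-1. Data preparation

157-6-2. Code example

157-6-3. Output

158. The pandas.Series.cummax method

158-1. Syntax

158-2. Parameters

158-3. Functionality

158-4. Return value

158-5. Notes

158-6. Usage

158-6-1. Data preparation

158-6-2. Code example

158-6-3. Output

159. The pandas.Series.cummin method

159-1. Syntax

159-2. Parameters

159-3. Functionality

159-4. Return value

159-5. Notes

159-6. Usage

159-6-1. Data preparation

159-6-2. Code example

159-6-3. Output

160. The pandas.Series.cumprod method

160-1. Syntax

160-2. Parameters

160-3. Functionality

160-4. Return value

160-5. Notes

160-6. Usage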

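I. Usage in Detail

156. The pandas.Series.count method

156-1. Syntax

    # 156. The pandas.Series.count method
    pandas.Series.count()

    Return number of non-NA/null observations in the Series.

    Returns:
        int
            Number of non-null values in the Series.

156-2. Parameters

None.

156-3. Functionality

Counts the non-missing values in a Series. Missing values such as NaN, None, and NaT are ignored; only valid entries are counted.

156-4. Return value

Returns an integer: the number of non-missing values in the Series. If the Series is empty or every value is missing, the result is 0.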
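The difference between count and the plain length of a Series is easiest to see side by side. The following is a minimal sketch, not part of the original article; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, None])

n_total = len(s)            # every element, missing or not
n_valid = s.count()         # only the non-missing elements
n_missing = s.isna().sum()  # only the missing elements

# Edge cases described above: an empty Series and an all-missing Series
empty_count = pd.Series([], dtype="float64").count()
all_missing_count = pd.Series([np.nan, None], dtype="float64").count()

print(n_total, n_valid, n_missing)
print(empty_count, all_missing_count)
```

Here n_valid plus n_missing adds up to n_total, and both edge cases yield 0.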

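156-5. Notes

None.

156-6. Usage

156-6-1. Data preparation

None.

156-6-2. Code example

    # 156. The pandas.Series.count method
    import pandas as pd
    import numpy as np

    # Create a Series containing two missing values (np.nan and None)
    s = pd.Series([1, 2, np.nan, 4, None])

    # Count the non-missing values
    count = s.count()
    print(count)

156-6-3. Output

    # 156. The pandas.Series.count method
    # 3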

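157. The pandas.Series.cov method

157-1. Syntax

    # 157. The pandas.Series.cov method
    pandas.Series.cov(other, min_periods=None, ddof=1)

    Compute covariance with Series, excluding missing values.

    The two Series objects are not required to be the same length and will be
    aligned internally before the covariance is calculated.

    Parameters:
        other : Series
            Series with which to compute the covariance.
        min_periods : int, optional
            Minimum number of observations needed to have a valid result.
        ddof : int, default 1
            Delta degrees of freedom. The divisor used in calculations is
            N - ddof, where N represents the number of elements.

    Returns:
        float
            Covariance between Series and other normalized by N-1
            (unbiased estimator).

157-2. Parameters

157-2-1. other (required): the other Series with which the covariance is computed. It does not need to be the same length as the calling Series; the two are aligned on their index first, and only pairs in which both values are present enter the calculation.

157-2-2. min_periods (optional, default None): the minimum number of valid paired observations required to produce a result. If fewer valid pairs are available, the result is NaN. When left as None, no additional minimum is imposed and the covariance is computed from all valid pairs.

157-2-3. ddof (optional, default 1): the delta degrees of freedom. The divisor is N - ddof, so this parameter controls which covariance is computed:

157-2-3-1. If ddof=1, the sample covariance is computed (divisor N - 1). This is the default and the usual choice when the data are a sample drawn from a larger population.

157-2-3-2. If ddof=0, the population covariance is computed (divisor N), which treats the data as the entire population.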
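The three parameters interact with index alignment, which a small example makes concrete. This is a sketch under illustrative assumptions: the two Series, their indexes, and the variable names are made up for the demonstration.

```python
import numpy as np
import pandas as pd

# Two Series of different lengths; only labels 2, 3, 4 appear in both
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=[0, 1, 2, 3, 4])
b = pd.Series([2.0, 4.0, 6.0], index=[2, 3, 4])

# Alignment keeps the pairs (3, 2), (4, 4), (5, 6); divisor is 3 - 1
cov_aligned = a.cov(b)

# Only 3 valid pairs exist, so requiring 4 yields NaN
cov_too_few = a.cov(b, min_periods=4)

# Same pairs, but divisor is 3 - 0
cov_population = a.cov(b, ddof=0)

print(cov_aligned, cov_too_few, cov_population)
```

The deviations of the aligned pairs are (-1, 0, 1) and (-2, 0, 2), so the sum of products is 4: dividing by 2 gives 2.0 for the sample covariance and dividing by 3 gives about 1.333 for the population covariance.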

157-3. Functionality

Computes the covariance between two Series objects.
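To make the calculation itself visible, the covariance can be reproduced by hand from the textbook formula: the sum of the products of deviations from the mean, divided by N - ddof. The helper manual_cov below is hypothetical (it is not a pandas function) and assumes two Series that share an index and contain no missing values.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([5, 4, 3, 2, 1])


def manual_cov(x: pd.Series, y: pd.Series, ddof: int = 1) -> float:
    """Covariance from the definition; assumes aligned, complete data."""
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / (len(x) - ddof))


manual_sample = manual_cov(s1, s2)              # divisor N - 1
manual_population = manual_cov(s1, s2, ddof=0)  # divisor N

print(manual_sample, s1.cov(s2))
print(manual_population, s1.cov(s2, ddof=0))
```

For this data the sum of products is -10, so the two results are -2.5 and -2.0, matching what Series.cov returns.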

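157-4. Return value

157-4-1. Returns a float: the covariance between the two Series.

157-4-2. If the covariance cannot be computed (for example, because there are too few valid data points), NaN is returned.

157-5. Notes

Use cases:

157-5-1. Financial analysis

Risk management: in portfolio management, covariance measures how the price movements of two assets (such as stocks and bonds) vary together, which helps in assessing portfolio risk.
Asset allocation: by computing the covariances among the returns of the assets in a portfolio, investors can adjust the allocation to minimize risk or maximize return.

157-5-2. Data analysis and feature selection

Feature correlation analysis: when building a machine learning model, the covariance between features shows which ones move together strongly. This can guide feature selection (for example, by spotting redundant features) and improve model performance.
Data preprocessing: for high-dimensional data sets, the covariance matrix underlies dimensionality reduction techniques such as principal component analysis (PCA), which extract the main features.

157-5-3. Statistical research

Regression analysis: covariance is one of the basic statistics in regression analysis and is used to understand the linear relationship between independent and dependent variables.
Exploring variable relationships: when exploring how the variables in a data set relate, covariance can serve as a first-pass tool for identifying potential connections.

157-5-4. Quality control

Process control: in manufacturing or services, covariance can be used to monitor the relationship between two process variables (such as production rate and product quality) in order to optimize the process and the product.

157-5-5. Social science research

Behavioral research: in psychology or sociology, covariance can help analyze the relationship between variables (such as psychological test scores and behavioral indicators) and reveal underlying behavior patterns.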

157-6. Usage

157-6-1. Data preparation

None.

157-6-2. Code example

    # 157. The pandas.Series.cov method
    import pandas as pd

    s1 = pd.Series([1, 2, 3, 4, 5])
    s2 = pd.Series([5, 4, 3, 2, 1])

    # Covariance with the default min_periods and ddof
    covariance_default = s1.cov(s2)
    print("Default covariance:", covariance_default)

    # Require at least 4 valid paired observations
    covariance_min_periods = s1.cov(s2, min_periods=4)
    print("Covariance with min_periods=4:", covariance_min_periods)

    # Use ddof=0 (population covariance, divisor N)
    covariance_ddof = s1.cov(s2, ddof=0)
    print("Covariance with ddof=0:", covariance_ddof)

157-6-3. Output

    # 157. The pandas.Series.cov method
    # Default covariance: -2.5
    # Covariance with min_periods=4: -2.5
    # Covariance with ddof=0: -2.0

158. The pandas.Series.cummax method

158-1. Syntax

    # 158. The pandas.Series.cummax method
    pandas.Series.cummax(axis=None, skipna=True, *args, **kwargs)

    Return cumulative maximum over a DataFrame or Series axis.

    Returns a DataFrame or Series of the same size containing the cumulative
    maximum.

    Parameters:
        axis : {0 or 'index', 1 or 'columns'}, default 0
            The index or the name of the axis. 0 is equivalent to None or
            'index'. For Series this parameter is unused and defaults to 0.
        skipna : bool, default True
            Exclude NA/null values. If an entire row/column is NA, the result
            will be NA.
        *args, **kwargs
            Additional keywords have no effect but might be accepted for
            compatibility with NumPy.

    Returns:
        scalar or Series
            Return cumulative maximum of scalar or Series.

158-2. Parameters

158-2-1. axis (optional, default None): has no practical effect on a Series, because a Series has only one axis.

158-2-2. skipna (optional, default True): if True, NaN values are skipped during the calculation; the NaN positions stay NaN in the result, but the running maximum carries on past them. If False, the result is NaN from the first NaN onward.

158-2-3. *args (optional): additional positional arguments. They have no effect and are accepted only for compatibility with NumPy.

158-2-4. **kwargs (optional): additional keyword arguments. They have no effect and are accepted only for compatibility with NumPy.
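The effect of skipna is easiest to see on a Series that contains a NaN in the middle. This is a small illustrative sketch; the data and variable names are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 5.0, 1.0])

# skipna=True (default): the NaN position stays NaN,
# but the running maximum continues after it
skipped = s.cummax()

# skipna=False: every position from the first NaN onward is NaN
propagated = s.cummax(skipna=False)

print(skipped)
print(propagated)
```

With the default, the result is 2, NaN, 5, 5; with skipna=False it is 2, NaN, NaN, NaN.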

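158-3. Functionality

Computes the cumulative maximum of a Series. The calculation runs along the given axis, which for a Series is always axis 0, that is, the order of the data.

158-4. Return value

Returns a new Series in which each value is the maximum of all values from the start of the data up to and including the current position. The result has the same index and length as the original Series.

158-5. Notes

Use cases:

158-5-1. Time series analysis: computing the cumulative maximum of time series data helps identify how the data fluctuate relative to earlier peaks.

158-5-2. Investment analysis: tracking the cumulative maximum of an asset (its highest value so far) helps evaluate its performance.

158-5-3. Data preprocessing: in feature engineering, the cumulative maximum can serve as a derived feature, especially for time series data.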
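One common way to use the running peak in investment analysis is a drawdown calculation: how far the current price sits below the highest price seen so far. The article does not prescribe this; it is offered as one possible illustration, and the price data below is invented.

```python
import pandas as pd

# Hypothetical closing prices of an asset
prices = pd.Series([100.0, 110.0, 99.0, 121.0, 108.9])

# Highest price observed up to and including each point
running_peak = prices.cummax()

# Fractional drop from the running peak (0 means a new high)
drawdown = prices / running_peak - 1.0

# The worst drop over the whole period
max_drawdown = drawdown.min()

print(running_peak.tolist())
print(round(max_drawdown, 4))
```

The running peak here is 100, 110, 110, 121, 121, and the worst drawdown is about -0.1, that is, a 10% drop from the preceding high.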

158-6. Usage

158-6-1. Data preparation

None.

158-6-2. Code example

    # 158. The pandas.Series.cummax method
    import pandas as pd

    # Create a Series object
    s = pd.Series([3, 1, 4, 1, 5, 9, 2, 6])

    # Compute the cumulative maximum
    cummax_series = s.cummax()
    print(cummax_series)

158-6-3. Output

    # 158. The pandas.Series.cummax method
    # 0    3
    # 1    3
    # 2    4
    # 3    4
    # 4    5
    # 5    9
    # 6    9
    # 7    9
    # dtype: int64

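159. The pandas.Series.cummin method

159-1. Syntax

    # 159. The pandas.Series.cummin method
    pandas.Series.cummin(axis=None, skipna=True, *args, **kwargs)

    Return cumulative minimum over a DataFrame or Series axis.

    Returns a DataFrame or Series of the same size containing the cumulative
    minimum.

    Parameters:
        axis : {0 or 'index', 1 or 'columns'}, default 0
            The index or the name of the axis. 0 is equivalent to None or
            'index'. For Series this parameter is unused and defaults to 0.
        skipna : bool, default True
            Exclude NA/null values. If an entire row/column is NA, the result
            will be NA.
        *args, **kwargs
            Additional keywords have no effect but might be accepted for
            compatibility with NumPy.

    Returns:
        scalar or Series
            Return cumulative minimum of scalar or Series.

159-2. Parameters

159-2-1. axis (optional, default None): has no practical effect on a Series, because a Series has only one axis.

159-2-2. skipna (optional, default True): if True, NaN values are skipped during the calculation; the NaN positions stay NaN in the result, but the running minimum carries on past them. If False, the result is NaN from the first NaN onward.

159-2-3. *args (optional): additional positional arguments. They have no effect and are accepted only for compatibility with NumPy.

159-2-4. **kwargs (optional): additional keyword arguments. They have no effect and are accepted only for compatibility with NumPy.

159-3. Functionality

Computes the cumulative minimum of a data sequence. It returns a sequence of the same length as the original, in which each value is the minimum of all elements up to and including that position.

159-4. Return value

Returns a Series of the same length as the original Series, with the same index, in which each value is the cumulative minimum from the start of the sequence to the current position.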
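The definition above can be checked against a plain loop that carries the smallest value seen so far. This is only a sketch for intuition (the loop is not how pandas implements cummin), and the letter index is an arbitrary choice to show that the index is preserved.

```python
import pandas as pd

s = pd.Series([10, 12, 8, 11, 9, 15, 7], index=list("abcdefg"))

# Manual running minimum: keep the smallest value seen so far
running = []
current = None
for value in s:
    current = value if current is None else min(current, value)
    running.append(current)
manual = pd.Series(running, index=s.index)

# Built-in equivalent
builtin = s.cummin()

print(builtin.tolist())
print(manual.tolist() == builtin.tolist())
```

Both approaches give 10, 10, 8, 8, 8, 8, 7 on the original index labels.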

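159-5. Notes

Use cases:

159-5-1. Stock or financial time series analysis: for example, computing the lowest price seen so far at each point of a stock price series, which is useful for identifying lows, deriving potential stop-loss levels, or analyzing trend reversals.

159-5-2. Monitoring equipment or sensor data: when equipment or sensor data are monitored continuously, the historical minimum at each point in time can help detect anomalies or assess the likelihood of a fault.

159-5-3. Quality control and production line monitoring: during production, the cumulative minimum of certain parameters (such as temperature or pressure) may be monitored to make sure the process stays within its expected range.

159-5-4. Ranking analysis in competitions or games: tracking the numerically lowest rank a participant has reached so far, that is, the best placing when rank 1 is the top.

159-6. Usage

159-6-1. Data preparation

None.

159-6-2. Code example

    # 159. The pandas.Series.cummin method
    import pandas as pd

    # Scenario 159-5-1: stock or financial time series analysis
    prices = pd.Series([10, 12, 8, 11, 9, 15, 7])
    cummin_prices = prices.cummin()
    print(cummin_prices, end='\n\n')

    # Scenario 159-5-2: monitoring equipment or sensor data
    temperatures = pd.Series([22, 21, 19, 20, 18, 17, 16])
    cummin_temperatures = temperatures.cummin()
    print(cummin_temperatures, end='\n\n')

    # Scenario 159-5-3: quality control and production line monitoring
    pressures = pd.Series([30, 28, 29, 27, 26, 25, 24])
    cummin_pressures = pressures.cummin()
    print(cummin_pressures, end='\n\n')

    # Scenario 159-5-4: ranking analysis in competitions or games
    ranks = pd.Series([5, 3, 4, 2, 1, 4, 3])
    cummin_ranks = ranks.cummin()
    print(cummin_ranks)

159-6-3. Output

    # 159. The pandas.Series.cummin method
    # Scenario 159-5-1: stock or financial time series analysis
    # 0    10
    # 1    10
    # 2     8
    # 3     8
    # 4     8
    # 5     8
    # 6     7
    # dtype: int64

    # Scenario 159-5-2: monitoring equipment or sensor data
    # 0    22
    # 1    21
    # 2    19
    # 3    19
    # 4    18
    # 5    17
    # 6    16
    # dtype: int64

    # Scenario 159-5-3: quality control and production line monitoring
    # 0    30
    # 1    28
    # 2    28
    # 3    27
    # 4    26
    # 5    25
    # 6    24
    # dtype: int64

    # Scenario 159-5-4: ranking analysis in competitions or games
    # 0    5
    # 1    3
    # 2    3
    # 3    2
    # 4    1
    # 5    1
    # 6    1
    # dtype: int64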

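160. The pandas.Series.cumprod method

160-1. Syntax

    # 160. The pandas.Series.cumprod method
    pandas.Series.cumprod(axis=None, skipna=True, *args, **kwargs)

    Return cumulative product over a DataFrame or Series axis.

    Returns a DataFrame or Series of the same size containing the cumulative
    product.

    Parameters:
        axis : {0 or 'index', 1 or 'columns'}, default 0
            The index or the name of the axis. 0 is equivalent to None or
            'index'. For Series this parameter is unused and defaults to 0.
        skipna : bool, default True
            Exclude NA/null values. If an entire row/column is NA, the result
            will be NA.
        *args, **kwargs
            Additional keywords have no effect but might be accepted for
            compatibility with NumPy.

    Returns:
        scalar or Series
            Return cumulative product of scalar or Series.

160-2. Parameters

160-2-1. axis (optional, default None): has no practical effect on a Series, because a Series has only one axis.

160-2-2. skipna (optional, default True): if True, NaN values are skipped during the calculation; the NaN positions stay NaN in the result, but the running product carries on past them. If False, the result is NaN from the first NaN onward.

160-2-3. *args (optional): additional positional arguments. They have no effect and are accepted only for compatibility with NumPy.

160-2-4. **kwargs (optional): additional keyword arguments. They have no effect and are accepted only for compatibility with NumPy.

160-3. Functionality

Computes the cumulative product of a data sequence.

160-4. Return value

Returns a sequence of the same length as the original, in which each value is the product of all elements up to and including that position.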
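Two consequences of this definition are worth checking on a small example: the last element of the cumulative product equals the overall product, and a NaN is handled according to skipna. The data below is illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 3.0, np.nan, 4.0])

# Default skipna=True: the NaN position stays NaN, the product continues
running = s.cumprod()

# The overall product (also skipping NaN by default)
total = s.prod()

# skipna=False: every position from the first NaN onward is NaN
strict = s.cumprod(skipna=False)

print(running.tolist())
print(total)
print(strict.tolist())
```

The running product is 2, 6, NaN, 24, and its last element matches s.prod(); with skipna=False the result is 2, 6, NaN, NaN.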

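160-5. Notes

Use cases:

160-5-1. Financial time series analysis: for example, computing a portfolio's cumulative growth factor at each point in time from its per-period gross returns (1 + return); subtracting 1 gives the cumulative return.

160-5-2. Cumulative efficiency in a production process: the cumulative product gives the overall efficiency across multiple steps of a process.

160-5-3. Cumulative probability: in some probability problems, the joint probability of a sequence of independent events is needed, which is the running product of the individual probabilities.

160-5-4. Cumulative growth factors: for example, computing a company's cumulative growth factor from its yearly growth rates.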
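When returns or growth rates are given as fractions (for example 0.02 for 2%) rather than as factors, they have to be converted to factors before taking the cumulative product. The sketch below assumes simple per-period returns that compound; the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical simple returns per period, as fractions
period_returns = pd.Series([0.02, 0.03, -0.03, 0.05, -0.02])

# Convert to gross factors (1 + r) and compound them
growth = (1 + period_returns).cumprod()

# Cumulative return relative to the starting value
cumulative_return = growth - 1

print(growth.round(6).tolist())
print(round(cumulative_return.iloc[-1], 6))
```

The final growth factor is about 1.048635, that is, a cumulative return of roughly 4.86% over the five periods.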

160-6. Usage

160-6-1. Data preparation

None.

160-6-2. Code example

    # 160. The pandas.Series.cumprod method
    import pandas as pd

    # Scenario 160-5-1: financial time series analysis
    returns = pd.Series([1.02, 1.03, 0.97, 1.05, 0.98])
    cumprod_returns = returns.cumprod()
    print(cumprod_returns, end='\n\n')

    # Scenario 160-5-2: cumulative efficiency in a production process
    efficiencies = pd.Series([0.95, 0.98, 0.99, 0.97])
    cumprod_efficiencies = efficiencies.cumprod()
    print(cumprod_efficiencies, end='\n\n')

    # Scenario 160-5-3: cumulative probability
    probabilities = pd.Series([0.9, 0.85, 0.8, 0.95])
    cumprod_probabilities = probabilities.cumprod()
    print(cumprod_probabilities, end='\n\n')

    # Scenario 160-5-4: cumulative growth factors
    growth_factors = pd.Series([1.1, 1.05, 1.2, 1.15])
    cumprod_growth_factors = growth_factors.cumprod()
    print(cumprod_growth_factors)

160-6-3. Output

    # 160. The pandas.Series.cumprod method
    # Scenario 160-5-1: financial time series analysis
    # 0    1.020000
    # 1    1.050600
    # 2    1.019082
    # 3    1.070036
    # 4    1.048635
    # dtype: float64

    # Scenario 160-5-2: cumulative efficiency in a production process
    # 0    0.950000
    # 1    0.931000
    # 2    0.921690
    # 3    0.894039
    # dtype: float64

    # Scenario 160-5-3: cumulative probability
    # 0    0.9000
    # 1    0.7650
    # 2    0.6120
    # 3    0.5814
    # dtype: float64

    # Scenario 160-5-4: cumulative growth factors
    # 0    1.1000
    # 1    1.1550
    # 2    1.3860
    # 3    1.5939
    # dtype: float64


II. Recommended Reading

1. Python Foundations Tour

2. Python Functions Tour

3. Python Algorithms Tour

4. Python Magic Tour

5. Blog home page
