Probability and Statistics 2

August 2, 2019 1 minute read

Data Transformation

Why transform data?
Common data transformations
Log Transformation
Box-Cox Transformation
Standardization
Which data transformation should you use?

### Why transform data?

- to make the data more closely match the assumptions of a statistical inference procedure,
- to make the data easier to visualize (appearance of graphs),
- to improve interpretability,
- to make descriptors that have been measured in different units comparable,
- to make the relationships among variables linear,
- to modify the weights of the variables or objects (e.g. give the same length (or norm) to all object vectors),
- to code categorical variables into dummy binary variables.

Log Transformation

All data values must be positive.

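   import numpy as np
   import matplotlib.pyplot as plt

   # Try it yourself: log is only defined for positive values, so start at 1
   numbers = np.arange(1, 51)
   plt.scatter(numbers, np.log(numbers))
   plt.show()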
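
To make the effect of the log transformation on a right-skewed sample concrete, here is a small sketch. The log-normal test data, the seed, and the helper `skewness` are assumptions chosen for illustration; they are not part of the original notes.

```python
import numpy as np

def skewness(values):
    """Sample skewness: the mean cubed z-score (close to 0 for symmetric data)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

# Hypothetical right-skewed, strictly positive sample
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

logged = np.log(raw)  # valid because every value is > 0

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after: {skew_logged:.2f}")
```

On data like this, the skewness after the transformation should be much closer to zero than before it.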

Log Transformation: How to Handle Negative Data Values?

Solution 1 : Translate, then Transform

Shift the data so that the smallest value becomes 1, then take the log: log(x + 1 - min(x))

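 import numpy as np
 import matplotlib.pyplot as plt

 #logx <- function(x){
 #  log(x +1 - min(x))
 #}
 def logx(x):
     # Shift so the smallest value maps to 1, then take the log
     return np.log(x + 1 - np.min(x))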
      
 #x <- runif(80, min = -5 , max = 5)
 x = np.random.uniform(-5, 5, 80)
      
 #x <- c(x, rnorm(20, mean = 20, sd = 10))
 x = np.concatenate((x, np.random.normal(loc = 20.0, scale = 10.0, size = 20)))
      
 #par(mfrow = c(1,3))
 #hist(x, main = "x~runif")
 plt.hist(x)
 plt.show()
      
 #plot(x, logx(x), main = "x vs logx")
 plt.scatter(x, logx(x))
 plt.show()
      
 #hist(logx(x), main = "logx")
 plt.hist(logx(x))
 plt.show()

Solution 2 : Missing Values
- One criticism of the previous method is that some practicing statisticians don't like adding an arbitrary constant to the data.
- They argue that a better way to handle negative values is to use missing values for the logarithm of a nonpositive number.
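
The missing-value approach can be sketched with NumPy as follows. The function name `log_or_nan` and the sample values are hypothetical; the idea is simply to take the log where it is defined and store NaN everywhere else.

```python
import numpy as np

def log_or_nan(x):
    """Return log(x) where x > 0 and NaN elsewhere (no shifting constant)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    positive = x > 0
    out[positive] = np.log(x[positive])  # only evaluate log on valid entries
    return out

sample = np.array([-2.0, 0.0, 1.0, np.e])
result = log_or_nan(sample)
print(result)
```

NaN-aware functions such as `np.nanmean` skip the missing entries. The trade-off is that the nonpositive observations no longer contribute to the analysis.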

Box-Cox Transformations
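
For reference, the usual one-parameter Box-Cox transform of a positive value x is (x^λ - 1) / λ when λ ≠ 0 and log(x) when λ = 0. A minimal sketch, assuming strictly positive data; the function name `box_cox` and the λ values are illustrative only.

```python
import numpy as np

def box_cox(x, lam):
    """One-parameter Box-Cox transform for strictly positive data."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

values = np.array([1.0, 2.0, 4.0, 8.0])
as_log = box_cox(values, 0)       # lambda = 0: plain log transformation
as_sqrt = box_cox(values, 0.5)    # lambda = 0.5: square-root-like
unchanged = box_cox(values, 1.0)  # lambda = 1: only shifts the data by -1
```

In practice λ is usually estimated from the data. `scipy.stats.boxcox` does this by maximum likelihood when no λ is supplied.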


Standardization
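
Standardization (the z-score) subtracts the mean and divides by the standard deviation, so the result has mean 0 and standard deviation 1 along the chosen axis. A minimal NumPy sketch; the function name `standardize` and the toy matrix are hypothetical, and the population standard deviation (`ddof=0`) is assumed.

```python
import numpy as np

def standardize(data, axis=0):
    """Z-score along `axis`: subtract the mean, divide by the standard deviation."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=axis, keepdims=True)
    std = data.std(axis=axis, keepdims=True)  # population sd (ddof=0)
    return (data - mean) / std

toy = np.array([[1.0, 2.0, 3.0],
                [10.0, 20.0, 60.0]])

by_column = standardize(toy, axis=0)  # each column: mean 0, sd 1
by_row = standardize(toy, axis=1)     # each row: mean 0, sd 1
```

The choice of axis matters: `axis=0` makes variables (columns) comparable, while `axis=1` rescales each observation (row) on its own. This sketch does not guard against a zero standard deviation, which would cause a division by zero.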


Standardization in Python with sklearn

API
sklearn.preprocessing.scale

 import pandas as pd
 from sklearn.preprocessing import scale
 
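 # Tab-separated file; the first row is the header and the first column is the index
 cellraw = pd.read_csv('./Data/trad_alpha103.txt', header = 0, index_col = 0, sep = '\t')
 # axis = 1 standardizes each row (not each column) across the selected columns
 cellxdata = scale(cellraw.iloc[:, 1:19], axis = 1)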

阿葛廷
