脸谱图
出于一种个体的审美原因,我一直都很欣赏Chernoff创造的脸谱图。它让我感觉到了统计学家的浪漫,让我摆脱了对统计学家严谨、保守的刻板印象。
脸谱图工作原理极其简单,它用人类的脸部特征来刻画多维变量,与雷达图、平行坐标图等在本质上并无太大不同。只不过,它带着人类的五官出现,是一种显得可爱又好玩的针对多元数据的可视化方法,可能十分适合用来激发人们对多元数据可视化的兴趣。
一般的Chernoff脸谱图是这样的,以R中自带的鸢尾花数据集(iris)为例:
library(aplpack)
faces(iris[1:20,1:4],face.type=0)

## effect of variables:
## modified item Var
## "height of face " "Sepal.Length"
## "width of face " "Sepal.Width"
## "structure of face" "Petal.Length"
## "height of mouth " "Petal.Width"
## "width of mouth " "Sepal.Length"
## "smiling " "Sepal.Width"
## "height of eyes " "Petal.Length"
## "width of eyes " "Petal.Width"
## "height of hair " "Sepal.Length"
## "width of hair " "Sepal.Width"
## "style of hair " "Petal.Length"
## "height of nose " "Petal.Width"
## "width of nose " "Sepal.Length"
## "width of ear " "Sepal.Width"
## "height of ear " "Petal.Length"
上图可能看起来比较呆板。我们当然可以画一个更好看的:
faces(iris[1:20,1:4],face.type=1)

## effect of variables:
## modified item Var
## "height of face " "Sepal.Length"
## "width of face " "Sepal.Width"
## "structure of face" "Petal.Length"
## "height of mouth " "Petal.Width"
## "width of mouth " "Sepal.Length"
## "smiling " "Sepal.Width"
## "height of eyes " "Petal.Length"
## "width of eyes " "Petal.Width"
## "height of hair " "Sepal.Length"
## "width of hair " "Sepal.Width"
## "style of hair " "Petal.Length"
## "height of nose " "Petal.Width"
## "width of nose " "Sepal.Length"
## "width of ear " "Sepal.Width"
## "height of ear " "Petal.Length"
我比较喜欢圣诞老人这个版本:
faces(iris[1:20,1:4],face.type=2)

## effect of variables:
## modified item Var
## "height of face " "Sepal.Length"
## "width of face " "Sepal.Width"
## "structure of face" "Petal.Length"
## "height of mouth " "Petal.Width"
## "width of mouth " "Sepal.Length"
## "smiling " "Sepal.Width"
## "height of eyes " "Petal.Length"
## "width of eyes " "Petal.Width"
## "height of hair " "Sepal.Length"
## "width of hair " "Sepal.Width"
## "style of hair " "Petal.Length"
## "height of nose " "Petal.Width"
## "width of nose " "Sepal.Length"
## "width of ear " "Sepal.Width"
## "height of ear " "Petal.Length"
主流多元数据可视化方法
虽然脸谱图看起来好玩儿有趣,但真正在业界落地数据可视化时,它并不太实用。业界有很多更加实用的多元数据可视化方法值得学习。这些方法大致可以分为以下几类:
基于几何的交互式方法
这类方法主要包括常用的散点图矩阵、平行坐标图和雷达图。
散点图矩阵
散点图以网格形式展示多个二维散点图,可以直观反映变量间的两两关系。适合探索变量之间的相关性,且可以通过颜色/符号增强信息表达的力度。
看一个例子:
# 加载包
library(GGally)
## 载入需要的程序包:ggplot2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
# 绘制散点图矩阵(含密度图和相关系数)
ggpairs(iris, columns = 1:4,
mapping = aes(color = Species), # 按类别着色
upper = list(continuous = "cor"),
lower = list(continuous = "smooth"))

散点图矩阵的缺点也很突出,当变量很多时,图形阅读难度会大幅提升。
平行坐标图
通过平行排列的坐标轴进行多维数据可视化,图上以折线连接各变量值。它的优点是清晰呈现高维数据中的聚类、趋势和异常值。缺点是高维数据易导致线条重叠,需结合交互操作优。
上一个例子:
library(GGally)
# 绘制平行坐标图(标准化处理)
ggparcoord(iris, columns = 1:4, groupColumn = 5,
scale = "uniminmax", # 标准化到[0,1]
alphaLines = 0.5) +
theme_minimal()

雷达图
雷达图在工业界和商业界应用广泛,它是以多边形闭合图形展示多维数据,用顶点代表变量值。雷达图适合多指标对比(如产品性能评估),缺点是数据差异较大时图形易失真。
# 安装包(如未安装)
# devtools::install_github("ricardo-bion/ggradar")
library(ggradar)
library(dplyr)
##
## 载入程序包:'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# 数据预处理(计算各物种均值)
iris_radar <- iris %>%
group_by(Species) %>%
summarise(across(1:4, mean))
# 绘制雷达图
ggradar(iris_radar, grid.min = 0, grid.max = 8)

基于像素的密度表现方法
主要包括热力图和像素图。
热力图
热力图以色块颜色深浅表示数值大小或分布密度。常用于展示数据矩阵的关联模式(如用户行为聚类)等。
library(ggplot2)
library(reshape2)
# 数据长格式转换
iris_melt <- melt(iris[,1:4])
## No id variables; using all as measure variables
# 绘制热力图(数值分布)
ggplot(iris_melt, aes(x = variable, y = value)) +
geom_bin2d(bins = 20) + # 二维密度统计
scale_fill_viridis_c() # 颜色渐变

像素图
像素图将数据映射为像素的颜色或亮度属性。适合大规模数据的高效呈现。
降维和空间映射方法
主要包括主成分分析方法、t-SNE与UMAP方法及气泡图。
主成分分析(PCA)
PCA的思想是将高维数据投影到低维空间,保留主要方差信息,再在低维空间对数据进行可视化。
# PCA分析
pca <- prcomp(iris[,1:4], scale = TRUE)
pca_scores <- data.frame(pca$x, Species = iris$Species)
# 绘制PCA二维投影
ggplot(pca_scores, aes(x = PC1, y = PC2, color = Species)) +
geom_point(size = 3) +
stat_ellipse(level = 0.95) + # 添加置信椭圆
labs(x = "PC1 (73%)", y = "PC2 (22%)") # 方差解释率

t-SNE与UMAP方法
这两种方法都是非线性降维方法,更擅长保留局部结构和聚类特征。
# t-SNE
library(Rtsne)
set.seed(123)
iris <- unique(iris)
tsne <- Rtsne(unique(iris[,1:4]), perplexity = 30)
tsne_df <- data.frame(TSNE1 = tsne$Y[,1], TSNE2 = tsne$Y[,2], Species = iris$Species)
# UMAP
library(umap)
umap_res <- umap(iris[,1:4])
umap_df <- data.frame(UMAP1 = umap_res$layout[,1], UMAP2 = umap_res$layout[,2], Species = iris$Species)
# 可视化对比
library(patchwork)
p1 <- ggplot(tsne_df, aes(TSNE1, TSNE2, color = Species)) + geom_point()
p2 <- ggplot(umap_df, aes(UMAP1, UMAP2, color = Species)) + geom_point()
p1 + p2 # 并排显示

气泡图
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, size = hp, color = factor(cyl))) +
geom_point(alpha = 0.7) +
scale_size(range = c(3, 15)) + # 调整气泡尺寸范围
labs(x = "车重", y = "油耗", color = "气缸数", size = "马力")

气泡图可以和热图相结合。
library(reshape2)
data_melt <- melt(cor(mtcars)) # 数据长格式转换
p <- ggplot(data_melt, aes(Var1, Var2, size = abs(value), color = value)) +
geom_point() +
scale_color_gradient2(low = "blue", mid = "white", high = "red") + # 渐变色
theme_minimal()
p

气泡图还可以变成交互式的。
library(plotly)
##
## 载入程序包:'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
ggplotly(p) # 将静态ggplot转换为交互式图表
其他方法
主要包括桑基图、三维散点图和表格透镜。
桑基图
这是一个垂直领域的多元数据可视化的方法,它可以展示流量或资源在多节点间的流转路径(如用户转化漏斗)。
# 安装包(如未安装)
# devtools::install_github("https://github.com/christophergandrud/networkD3")
library(networkD3)
# 构造示例数据(模拟物种间转化)
nodes <- data.frame(name = c("Setosa", "Versicolor", "Virginica"))
links <- data.frame(source = c(0,1,2), target = c(1,2,0), value = c(10,20,15))
# 绘制桑基图
sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "name")
三维散点图
通过三维空间扩展,直观呈现三个变量间的关系。
library(plotly)
# 交互式三维散点图
plot_ly(iris, x = ~Sepal.Length, y = ~Sepal.Width, z = ~Petal.Length,
color = ~Species, type = "scatter3d", mode = "markers")
表格透镜
以交互式表格结合横条/点状图,快速比较大量数据与属性。
library(gt)
library(gtExtras)
# 创建交互式表格(含条形图)
iris %>%
gt() %>%
gt_plt_bar(column = Sepal.Length, width = 50) %>% # 添加条形图
gt_theme_nytimes() # 主题风格
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
3.5 | 1.4 | 0.2 | setosa | |
3.0 | 1.4 | 0.2 | setosa | |
3.2 | 1.3 | 0.2 | setosa | |
3.1 | 1.5 | 0.2 | setosa | |
3.6 | 1.4 | 0.2 | setosa | |
3.9 | 1.7 | 0.4 | setosa | |
3.4 | 1.4 | 0.3 | setosa | |
3.4 | 1.5 | 0.2 | setosa | |
2.9 | 1.4 | 0.2 | setosa | |
3.1 | 1.5 | 0.1 | setosa | |
3.7 | 1.5 | 0.2 | setosa | |
3.4 | 1.6 | 0.2 | setosa | |
3.0 | 1.4 | 0.1 | setosa | |
3.0 | 1.1 | 0.1 | setosa | |
4.0 | 1.2 | 0.2 | setosa | |
4.4 | 1.5 | 0.4 | setosa | |
3.9 | 1.3 | 0.4 | setosa | |
3.5 | 1.4 | 0.3 | setosa | |
3.8 | 1.7 | 0.3 | setosa | |
3.8 | 1.5 | 0.3 | setosa | |
3.4 | 1.7 | 0.2 | setosa | |
3.7 | 1.5 | 0.4 | setosa | |
3.6 | 1.0 | 0.2 | setosa | |
3.3 | 1.7 | 0.5 | setosa | |
3.4 | 1.9 | 0.2 | setosa | |
3.0 | 1.6 | 0.2 | setosa | |
3.4 | 1.6 | 0.4 | setosa | |
3.5 | 1.5 | 0.2 | setosa | |
3.4 | 1.4 | 0.2 | setosa | |
3.2 | 1.6 | 0.2 | setosa | |
3.1 | 1.6 | 0.2 | setosa | |
3.4 | 1.5 | 0.4 | setosa | |
4.1 | 1.5 | 0.1 | setosa | |
4.2 | 1.4 | 0.2 | setosa | |
3.1 | 1.5 | 0.2 | setosa | |
3.2 | 1.2 | 0.2 | setosa | |
3.5 | 1.3 | 0.2 | setosa | |
3.6 | 1.4 | 0.1 | setosa | |
3.0 | 1.3 | 0.2 | setosa | |
3.4 | 1.5 | 0.2 | setosa | |
3.5 | 1.3 | 0.3 | setosa | |
2.3 | 1.3 | 0.3 | setosa | |
3.2 | 1.3 | 0.2 | setosa | |
3.5 | 1.6 | 0.6 | setosa | |
3.8 | 1.9 | 0.4 | setosa | |
3.0 | 1.4 | 0.3 | setosa | |
3.8 | 1.6 | 0.2 | setosa | |
3.2 | 1.4 | 0.2 | setosa | |
3.7 | 1.5 | 0.2 | setosa | |
3.3 | 1.4 | 0.2 | setosa | |
3.2 | 4.7 | 1.4 | versicolor | |
3.2 | 4.5 | 1.5 | versicolor | |
3.1 | 4.9 | 1.5 | versicolor | |
2.3 | 4.0 | 1.3 | versicolor | |
2.8 | 4.6 | 1.5 | versicolor | |
2.8 | 4.5 | 1.3 | versicolor | |
3.3 | 4.7 | 1.6 | versicolor | |
2.4 | 3.3 | 1.0 | versicolor | |
2.9 | 4.6 | 1.3 | versicolor | |
2.7 | 3.9 | 1.4 | versicolor | |
2.0 | 3.5 | 1.0 | versicolor | |
3.0 | 4.2 | 1.5 | versicolor | |
2.2 | 4.0 | 1.0 | versicolor | |
2.9 | 4.7 | 1.4 | versicolor | |
2.9 | 3.6 | 1.3 | versicolor | |
3.1 | 4.4 | 1.4 | versicolor | |
3.0 | 4.5 | 1.5 | versicolor | |
2.7 | 4.1 | 1.0 | versicolor | |
2.2 | 4.5 | 1.5 | versicolor | |
2.5 | 3.9 | 1.1 | versicolor | |
3.2 | 4.8 | 1.8 | versicolor | |
2.8 | 4.0 | 1.3 | versicolor | |
2.5 | 4.9 | 1.5 | versicolor | |
2.8 | 4.7 | 1.2 | versicolor | |
2.9 | 4.3 | 1.3 | versicolor | |
3.0 | 4.4 | 1.4 | versicolor | |
2.8 | 4.8 | 1.4 | versicolor | |
3.0 | 5.0 | 1.7 | versicolor | |
2.9 | 4.5 | 1.5 | versicolor | |
2.6 | 3.5 | 1.0 | versicolor | |
2.4 | 3.8 | 1.1 | versicolor | |
2.4 | 3.7 | 1.0 | versicolor | |
2.7 | 3.9 | 1.2 | versicolor | |
2.7 | 5.1 | 1.6 | versicolor | |
3.0 | 4.5 | 1.5 | versicolor | |
3.4 | 4.5 | 1.6 | versicolor | |
3.1 | 4.7 | 1.5 | versicolor | |
2.3 | 4.4 | 1.3 | versicolor | |
3.0 | 4.1 | 1.3 | versicolor | |
2.5 | 4.0 | 1.3 | versicolor | |
2.6 | 4.4 | 1.2 | versicolor | |
3.0 | 4.6 | 1.4 | versicolor | |
2.6 | 4.0 | 1.2 | versicolor | |
2.3 | 3.3 | 1.0 | versicolor | |
2.7 | 4.2 | 1.3 | versicolor | |
3.0 | 4.2 | 1.2 | versicolor | |
2.9 | 4.2 | 1.3 | versicolor | |
2.9 | 4.3 | 1.3 | versicolor | |
2.5 | 3.0 | 1.1 | versicolor | |
2.8 | 4.1 | 1.3 | versicolor | |
3.3 | 6.0 | 2.5 | virginica | |
2.7 | 5.1 | 1.9 | virginica | |
3.0 | 5.9 | 2.1 | virginica | |
2.9 | 5.6 | 1.8 | virginica | |
3.0 | 5.8 | 2.2 | virginica | |
3.0 | 6.6 | 2.1 | virginica | |
2.5 | 4.5 | 1.7 | virginica | |
2.9 | 6.3 | 1.8 | virginica | |
2.5 | 5.8 | 1.8 | virginica | |
3.6 | 6.1 | 2.5 | virginica | |
3.2 | 5.1 | 2.0 | virginica | |
2.7 | 5.3 | 1.9 | virginica | |
3.0 | 5.5 | 2.1 | virginica | |
2.5 | 5.0 | 2.0 | virginica | |
2.8 | 5.1 | 2.4 | virginica | |
3.2 | 5.3 | 2.3 | virginica | |
3.0 | 5.5 | 1.8 | virginica | |
3.8 | 6.7 | 2.2 | virginica | |
2.6 | 6.9 | 2.3 | virginica | |
2.2 | 5.0 | 1.5 | virginica | |
3.2 | 5.7 | 2.3 | virginica | |
2.8 | 4.9 | 2.0 | virginica | |
2.8 | 6.7 | 2.0 | virginica | |
2.7 | 4.9 | 1.8 | virginica | |
3.3 | 5.7 | 2.1 | virginica | |
3.2 | 6.0 | 1.8 | virginica | |
2.8 | 4.8 | 1.8 | virginica | |
3.0 | 4.9 | 1.8 | virginica | |
2.8 | 5.6 | 2.1 | virginica | |
3.0 | 5.8 | 1.6 | virginica | |
2.8 | 6.1 | 1.9 | virginica | |
3.8 | 6.4 | 2.0 | virginica | |
2.8 | 5.6 | 2.2 | virginica | |
2.8 | 5.1 | 1.5 | virginica | |
2.6 | 5.6 | 1.4 | virginica | |
3.0 | 6.1 | 2.3 | virginica | |
3.4 | 5.6 | 2.4 | virginica | |
3.1 | 5.5 | 1.8 | virginica | |
3.0 | 4.8 | 1.8 | virginica | |
3.1 | 5.4 | 2.1 | virginica | |
3.1 | 5.6 | 2.4 | virginica | |
3.1 | 5.1 | 2.3 | virginica | |
3.2 | 5.9 | 2.3 | virginica | |
3.3 | 5.7 | 2.5 | virginica | |
3.0 | 5.2 | 2.3 | virginica | |
2.5 | 5.0 | 1.9 | virginica | |
3.0 | 5.2 | 2.0 | virginica | |
3.4 | 5.4 | 2.3 | virginica | |
3.0 | 5.1 | 1.8 | virginica |
以上所有这些方法也同样可以用于经济数据、财务数据和股票数据的可视化。