Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors with possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of largest variance, that is, the directions along which the data is most spread out.
Given the training set \(X = \{ x_1, \ldots, x_n \}\) of \(p\)-dimensional feature vectors and the number of principal components \(r\), the problem is to compute \(r\) principal directions (\(p\)-dimensional eigenvectors) for the training set. The eigenvectors can be grouped into the \(r \times p\) matrix \(T\) that contains one eigenvector in each row.
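For orientation, the standard textbook formulation behind this problem statement can be written as follows. The symbols \(\mu\), \(\Sigma\), \(\lambda_k\), and \(v_k\) are introduced here for illustration only, and the \(1/(n-1)\) normalization is one common convention rather than something this specification states.

\[
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top}, \qquad
\Sigma v_k = \lambda_k v_k, \quad \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p \geq 0.
\]

Under these assumptions, the \(k\)-th row of \(T\) is \(v_k^{\top}\) for \(k = 1, \ldots, r\), and \(\lambda_k\) is the variance of the data along \(v_k\).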
oneDAL specifies two methods for PCA computation:
Covariance-based method
[TBD]
SVD-based method
[TBD]
Sign-flip technique
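Eigenvectors computed by some eigenvalue solvers are not uniquely defined because of sign ambiguity: if \(v\) is an eigenvector, so is \(-v\). To get a deterministic result, a sign-flip technique should be applied. One of the sign-flip techniques proposed in [Bro07] requires the following modification of matrix \(T\):
\[
\hat{T}_i = T_i \cdot \mathrm{sgn}(T_{i j^*}), \quad j^* = \arg\max_{1 \leq j \leq p} |T_{ij}|,
\]
where \(T_i\) is the \(i\)-th row, \(T_{ij}\) is the element in the \(i\)-th row and \(j\)-th column, and \(\mathrm{sgn}(\cdot)\) is the signum function:
\[
\mathrm{sgn}(x) =
\begin{cases}
-1, & x < 0, \\
0, & x = 0, \\
1, & x > 0.
\end{cases}
\]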
Note
The sign-flip technique described above is an example. The oneDAL specification does not require an implementation of this particular technique; an implementer can choose any technique that modifies the eigenvectors’ signs.
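As an illustration of a sign-flip step, the sketch below flips each eigenvector (a row of \(T\)) so that its largest-magnitude element is non-negative. This is one possible convention, assumed here for illustration; the `matrix` alias and the `sign_flip` function are hypothetical names, and plain nested vectors stand in for oneDAL tables.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major matrix of eigenvectors: one eigenvector per row (r rows, p columns).
// Hypothetical stand-in for a oneDAL table.
using matrix = std::vector<std::vector<double>>;

// Flips the sign of each row so that its largest-magnitude element is
// non-negative. Assumption: this is one possible deterministic convention,
// not a technique required by the specification.
void sign_flip(matrix& t) {
    for (auto& row : t) {
        // Find the first element with the largest absolute value in this row.
        std::size_t j_max = 0;
        for (std::size_t j = 1; j < row.size(); ++j) {
            if (std::fabs(row[j]) > std::fabs(row[j_max])) {
                j_max = j;
            }
        }
        // Negate the whole row if that element is negative.
        if (!row.empty() && row[j_max] < 0.0) {
            for (auto& value : row) {
                value = -value;
            }
        }
    }
}
```

Applied to the rows `{0.6, -0.8}` and `{-0.1, 0.05}`, this yields `{-0.6, 0.8}` and `{0.1, -0.05}`. Two solvers that return the same eigenvectors up to sign therefore produce identical rows after the flip.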
Usage example
// Assumes print_table(name, table) is a user-provided helper that prints a table.
onedal::pca::model run_training(const onedal::table& data) {
    // Request 5 principal components with deterministic eigenvector signs.
    const auto pca_desc = onedal::pca::desc<float>{}
        .set_component_count(5)
        .set_deterministic(true);

    const auto result = onedal::train(pca_desc, data);

    print_table("means", result.get_means());
    print_table("variances", result.get_variances());
    print_table("eigenvalues", result.get_eigenvalues());
    print_table("eigenvectors", result.get_model().get_eigenvectors());

    return result.get_model();
}

onedal::table run_inference(const onedal::pca::model& model,
                            const onedal::table& new_data) {
    const auto pca_desc = onedal::pca::desc<float>{}
        .set_component_count(model.get_component_count());

    const auto result = onedal::infer(pca_desc, model, new_data);

    print_table("transformed data", result.get_transformed_data());

    // The function is declared to return a table, so return the transformed data.
    return result.get_transformed_data();
}
API
Methods
namespace method {
struct cov {};
struct svd {};
using by_default = cov;
} // namespace method
struct cov
    Tag-type that denotes the Covariance-based method.

struct svd
    Tag-type that denotes the SVD-based method.

using by_default = cov
    Alias tag-type for the Covariance-based method.
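To show how empty tag-types like these can select an implementation at compile time, here is a minimal self-contained sketch. Only the `method` namespace mirrors the declarations above; `method_name` and `pca_sketch` are hypothetical names made up for this illustration and are not part of the oneDAL API.

```cpp
#include <string>
#include <type_traits>

namespace method {
struct cov {};
struct svd {};
using by_default = cov;
} // namespace method

// Hypothetical helper: overload resolution on the tag picks the implementation.
inline std::string method_name(method::cov) { return "covariance"; }
inline std::string method_name(method::svd) { return "svd"; }

// Hypothetical stand-in for an algorithm parameterized by a method tag.
template <typename Method = method::by_default>
struct pca_sketch {
    // Reject any tag other than the two specified methods at compile time.
    static_assert(std::is_same<Method, method::cov>::value ||
                      std::is_same<Method, method::svd>::value,
                  "Method must be method::cov or method::svd");

    // Dispatches on a default-constructed tag object.
    static std::string selected() { return method_name(Method{}); }
};
```

With these definitions, `pca_sketch<>::selected()` returns `"covariance"` because `by_default` aliases `cov`, while `pca_sketch<method::svd>::selected()` returns `"svd"`. The tags carry no data, so the choice is resolved entirely at compile time.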
Descriptor
template <typename Float = float,
typename Method = method::by_default>
class desc {
public:
desc();
int64_t get_component_count() const;
bool get_deterministic() const;
desc& set_component_count(int64_t);
desc& set_deterministic(bool);
};
template<typename Float = float, typename Method = method::by_default> class desc
    Template Parameters
        Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
        Method – Tag-type that specifies an implementation of the PCA algorithm. Can be method::cov, method::svd, or method::by_default.

desc()
    Creates a new instance of the descriptor with the default attribute values.

std::int64_t component_count = 0
    The number of principal components \(r\). If it is zero, the algorithm computes the eigenvectors for all features, \(r = p\).
    Getter & Setter
        std::int64_t get_component_count() const
        desc& set_component_count(std::int64_t)
    Invariants
        component_count >= 0

bool deterministic = true
    Specifies whether the algorithm applies the Sign-flip technique or uses a deterministic eigenvalue solver. If it is true, the directions of the eigenvectors must be deterministic.
    Getter & Setter
        bool get_deterministic() const
        desc& set_deterministic(bool)
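The setters return a reference to the descriptor, which is what allows the chained calls in the usage example. The sketch below is a hypothetical stand-in (`desc_sketch`, not the real `pca::desc`) that shows this pattern together with one way the `component_count >= 0` invariant might be checked. Reporting a violation by throwing `std::invalid_argument` is an assumption of this sketch; the text above does not say how violations are reported.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for pca::desc, reduced to its two attributes.
class desc_sketch {
public:
    std::int64_t get_component_count() const { return component_count_; }
    bool get_deterministic() const { return deterministic_; }

    // Returns *this so that calls can be chained.
    desc_sketch& set_component_count(std::int64_t value) {
        // Assumption: an invariant violation is reported with an exception.
        if (value < 0) {
            throw std::invalid_argument("component_count must be >= 0");
        }
        component_count_ = value;
        return *this;
    }

    desc_sketch& set_deterministic(bool value) {
        deterministic_ = value;
        return *this;
    }

private:
    std::int64_t component_count_ = 0; // 0 means "all features", r = p
    bool deterministic_ = true;        // default from the attribute list above
};
```

For example, `desc_sketch{}.set_component_count(5).set_deterministic(false)` configures both attributes in a single expression, and `set_component_count(-1)` throws in this sketch.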
Model
class model {
public:
model();
const table& get_eigenvectors() const;
int64_t get_component_count() const;
};
class model

model()
    Creates a model with the default attribute values.

table eigenvectors = table()
    \(r \times p\) table with the eigenvectors. Each row contains one eigenvector.
    Getter
        const table& get_eigenvectors() const

std::int64_t component_count = 0
    The number of components \(r\) in the trained model.
    Getter
        std::int64_t get_component_count() const
    Invariants
        component_count == eigenvectors.row_count
Training onedal::train(...)
Input
class train_input {
public:
train_input();
train_input(const table& data);
const table& get_data() const;
train_input& set_data(const table&);
};
Result
class train_result {
public:
train_result();
const model& get_model() const;
const table& get_means() const;
const table& get_variances() const;
const table& get_eigenvalues() const;
};
class train_result

train_result()
    Creates a result of the training operation with the default attribute values.

table means = table()
    \(1 \times r\) table that contains the mean values for the first \(r\) features.
    Getter
        const table& get_means() const
Operation semantics
template<typename Descriptor> pca::train_result train(const Descriptor &desc, const pca::train_input &input)
    Template Parameters
        Descriptor – PCA algorithm descriptor pca::desc.
    Preconditions
    Postconditions
        result.means.row_count == 1
        result.means.column_count == desc.component_count
        result.variances.row_count == 1
        result.variances.column_count == desc.component_count
        result.variances >= 0.0
        result.eigenvalues.row_count == 1
        result.eigenvalues.column_count == desc.component_count
        result.model.eigenvectors.row_count == desc.component_count
        result.model.eigenvectors.column_count == input.data.column_count
Inference onedal::infer(...)
Input
class infer_input {
public:
infer_input();
infer_input(const model& m);
infer_input(const model& m, const table& data);
const model& get_model() const;
const table& get_data() const;
infer_input& set_model(const model&);
infer_input& set_data(const table&);
};
class infer_input

infer_input()
    Creates an input for the inference operation with the default attribute values.

infer_input(const pca::model &model)
    Creates an input for the inference operation with the given model; the other attributes get default values.

infer_input(const pca::model &model, const table &data)
    Creates an input for the inference operation with the given model and data.
Result
class infer_result {
public:
infer_result();
const table& get_transformed_data() const;
};
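The text does not spell out how the transformed data is computed. As a rough sketch of the usual PCA projection, the function below multiplies each observation by the \(r \times p\) eigenvector matrix to obtain \(r\) coordinates. The names `matrix` and `project` are hypothetical, plain nested vectors stand in for oneDAL tables, and whether an implementation centers or scales the data first is not stated here; this sketch applies neither.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a oneDAL table: row-major nested vectors.
using matrix = std::vector<std::vector<double>>;

// Projects each row of `data` (n x p) onto the rows of `eigenvectors` (r x p),
// producing an n x r matrix. Assumptions: every row has p elements, and no
// centering or scaling is applied before the projection.
matrix project(const matrix& eigenvectors, const matrix& data) {
    matrix transformed(data.size(),
                       std::vector<double>(eigenvectors.size(), 0.0));
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t k = 0; k < eigenvectors.size(); ++k) {
            // Dot product of observation i with eigenvector k.
            for (std::size_t j = 0; j < data[i].size(); ++j) {
                transformed[i][k] += eigenvectors[k][j] * data[i][j];
            }
        }
    }
    return transformed;
}
```

With eigenvectors `{1, 0, 0}` and `{0, 0, 1}`, the observation `{2, 3, 4}` is mapped to `{2, 4}`: each output column is the coordinate of the observation along one principal direction.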
Operation semantics
template<typename Descriptor> pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)
    Template Parameters
        Descriptor – PCA algorithm descriptor pca::desc.
    Preconditions
    Postconditions