
Vectors, Dimensions, and Feature Spaces — The Geometry Behind Machine Learning

DEV Community
Samuel Akopyan

If you strip machine learning of all the complex terminology and buzzwords, you almost always end up with the same core idea: we represent real-world objects as numbers and work with those numbers mathematically. This is where vectors, dimensions, and feature spaces come in. As PHP developers, it’s especially important to understand this intuitively rather than formally, because in code you’ll deal not with abstract linear algebra, but with arrays of numbers, matrices, and operations on them.

A vector as a way to describe an object

In machine learning, a vector is simply an ordered set of numbers. Each number represents some aspect of an object. If the object is simple, the vector is short. If the object is complex, the vector can be very long.

Imagine a user of an online store. We can describe them using: age, number of purchases per year, and average order value. Then one user is a vector of three numbers:

(age, purchases, average order value)

In PHP, this looks very straightforward:

```php
$userVector = [35, 12, 78.5];
```

It’s important to understand: a vector is not just an array. The order of elements matters. If you swap age and average order value, the model won’t "figure out" what you meant. To it, these are completely different data.

Vector dimensionality

The dimensionality of a vector is the number of values it contains. In the example above, the dimensionality is 3. If you add another feature, say "days since last purchase", the dimensionality becomes 4.

Dimensionality is directly tied to how detailed your object description is. Low dimensionality means a rough description; high dimensionality means a more detailed one. But higher dimensionality is not always better: every additional feature is both a new degree of freedom for the model and a potential source of noise.

It helps to think of dimensionality as a fixed contract. If a model expects a vector of length 10, you must always pass exactly 10 numbers, in the same order.
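The "order matters" point can be made concrete with a small sketch. The `toVector()` helper and the field names here are purely illustrative, not part of any library: the explicit key list acts as the contract, so the model always receives positions in the same order no matter how the associative record happens to be arranged.

```php
<?php

// Hypothetical helper: turn a user record into an ordered feature vector.
// The explicit $order list IS the contract — the model only ever sees
// positions, never names.
function toVector(array $user): array
{
    $order = ['age', 'purchases_per_year', 'avg_order_value'];

    $vector = [];
    foreach ($order as $key) {
        $vector[] = (float) $user[$key];
    }
    return $vector;
}

// Note the keys are deliberately out of order here.
$user = [
    'avg_order_value'    => 78.5,
    'age'                => 35,
    'purchases_per_year' => 12,
];

print_r(toVector($user)); // [35.0, 12.0, 78.5], regardless of key order
```

Centralizing the key order in one place means that adding a feature (a new dimension) is a single, visible change to the contract rather than a silent shift in meaning.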
```php
function predict(array $features): float
{
    if (count($features) !== 10) {
        throw new InvalidArgumentException("Expected a vector of dimensionality 10");
    }
    // further computations
}
```

Example:

```php
$features = [0.12, 0.85, 0.33, 0.67, 0.91, 0.44, 0.58, 0.76, 0.29, 0.50];

try {
    $result = predict($features);
    echo "Model score: " . round($result, 3) . "\n";

    // Interpret the result
    if ($result > 0.7) {
        echo "High probability of a positive outcome";
    } elseif ($result > 0.4) {
        echo "Medium probability";
    } else {
        echo "Low probability";
    }
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}

// Model score: 0.75
// High probability of a positive outcome
```

Feature space

Formally, a feature space is usually denoted as Rⁿ. This means each object is represented by a vector x = (x₁, x₂, …, xₙ) of n real numbers, and the set of all such vectors forms a single abstract space. Most machine learning models operate within this n-dimensional space: vector addition and scalar multiplication are defined here, which allows us to use linear algebra tools. While formal axioms are rarely the focus in practice, it’s critical to understand that any model always operates within a fixed space defined by the structure of its input features.

If the dimensionality is 2, the feature space is a plane. If it’s 3, it’s the familiar 3D space. If it’s higher, we can no longer visualize it, but mathematically it’s still the same Rⁿ space. Each point in this space is a real-world object translated into numbers. A user, a product, a document, an image – after preprocessing, all of them become points in feature space.

Feature space is the environment where the machine learning algorithm operates. Everything that used to be strings, dates, categories, and JSON becomes pure math – a set of numbers – after feature engineering.

Features as coordinate axes

Each feature is a separate coordinate axis in the space.
Mathematically, this means the value xᵢ is the coordinate of the point along the i-th axis. For example, the vector (35, 12, 78.5) is a point in 3D space where the first axis is age, the second is number of purchases, and the third is average order value.

This immediately explains a few important things. First, features should be comparable in scale (another reason we convert everything to numbers). If one axis is measured in tens and another in hundreds of thousands, then distances and angles computed by the chosen metric stop reflecting true similarity between objects. Second, adding a new feature means adding a new axis. The space becomes higher-dimensional (one more dimension), and each point gets an additional coordinate. The model now has to account for another direction when making decisions.

Normalization and scaling

In practice, you almost always need to bring features to comparable scales. This can be done in different ways: normalization (mapping values to a fixed range, for example from 0 to 1) or standardization (transforming the distribution to have zero mean and unit standard deviation). Both approaches solve the same engineering problem – making features comparable for the algorithm. Depending on the data, logarithmic or other nonlinear transformations may also be used, but those are beyond the scope of this book.

It’s important to understand that normalization and scaling are not mathematical niceties, but practical necessities. Most machine learning algorithms are sensitive to the scale of input data: features with large numeric values start to dominate others, distorting the contribution of truly informative features. This reduces both the quality and stability of training.
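To see how a large-scale feature distorts distances, here is a minimal sketch. The `euclideanDistance()` helper and the sample users are ours, purely for illustration: with unscaled features, a 40-year age difference barely registers next to an income difference measured in tens of thousands.

```php
<?php

// Minimal Euclidean distance helper (assumes equal-length vectors).
function euclideanDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

// Features: [age, yearly income]. Income values are thousands of times
// larger than age values, so income dominates the distance entirely.
$alice = [25.0, 50000.0];
$bob   = [65.0, 50100.0]; // very different age, similar income
$carol = [26.0, 90000.0]; // almost the same age, very different income

echo euclideanDistance($alice, $bob) . "\n";   // ≈ 107.7 — "close"
echo euclideanDistance($alice, $carol) . "\n"; // ≈ 40000.0 — "far"
```

By this raw metric, Alice is "closer" to a 65-year-old with similar income than to a 26-year-old with different income, which is exactly the kind of distortion that scaling is meant to remove.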
Normalization

The simplest example is normalization to the range [0, 1]:

```php
function normalize(float $value, float $min, float $max): float
{
    $range = $max - $min;
    if ($range === 0.0) {
        return 0.0;
    }
    return ($value - $min) / $range;
}
```

Example:

```php
$result = normalize(value: 75, min: 50, max: 100);
echo $result;

// Result: 0.5
// Explanation: (75 - 50) / (100 - 50) = 0.5
```

After this transformation, age, number of purchases, and average order value start to "weigh" roughly the same in feature space.

Standardization

Another widely used approach is feature standardization. Unlike normalization, it does not constrain values to a fixed range, but transforms the feature distribution to have zero mean and unit standard deviation. This is especially important for models that are sensitive to feature scale (linear and logistic regression, SVM, neural networks), as well as optimization methods based on gradient descent. Standardization makes features comparable while preserving information about outliers and relative deviations.

The basic standardization formula looks like this:

```php
function standardize(float $value, float $mean, float $std): float
{
    if ($std == 0.0) {
        return 0.0;
    }
    return ($value - $mean) / $std;
}
```

Example:

```php
// Suppose this is the feature "user response time" (in seconds)
$value = 8.5;

// Statistics from the training dataset
$mean = 5.0; // mean
$std = 2.0;  // standard deviation

$zScore = standardize($value, $mean, $std);
echo "Z-score: " . round($zScore, 2) . "\n";

// Interpretation
if ($zScore > 2) {
    echo "Significantly above average (anomaly)";
} elseif ($zScore > 1) {
    echo "Above average";
} elseif ($zScore > 0) {
    echo "Slightly above average";
} else {
    echo "At or below average";
}

// Z-score: 1.75
// Above average
```

For many models, a prediction ultimately comes down to which side of a decision boundary a point lies on: f(x) > 0 or f(x) < 0. In your PHP code, all of that geometry hides behind a single call:

```php
$result = $model->predict($features);
```

If you clearly understand what each element of this array represents and in which space it lives, you’ve already solved half of the problems in machine learning. Vectors are the language data uses to talk to models. Dimensionality is the complexity of that language.
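One practical detail worth making explicit: the min/max (or mean/std) statistics are computed once, on the training data, and then reused for every new value, so new points land in the same feature space the model was trained in. A minimal sketch, reusing the `normalize()` function above with a hypothetical training column:

```php
<?php

// normalize() as defined above, repeated here so the sketch is self-contained.
function normalize(float $value, float $min, float $max): float
{
    $range = $max - $min;
    if ($range === 0.0) {
        return 0.0;
    }
    return ($value - $min) / $range;
}

// Hypothetical training data for one feature ("average order value").
$trainingColumn = [20.0, 50.0, 78.5, 100.0, 60.0];

// Statistics are computed ONCE, on the training data...
$min = min($trainingColumn);
$max = max($trainingColumn);

// ...and then applied to every new value at prediction time.
$newValue = 78.5;
echo normalize($newValue, $min, $max); // 0.73125, i.e. (78.5 - 20) / (100 - 20)
```

Recomputing the statistics on incoming data instead would silently shift the axes of the feature space under the model, so in a real pipeline $min and $max are stored alongside the model itself.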
Feature space is the stage where all machine learning happens. As a developer, it’s important to stop seeing this as abstract math and start seeing well-structured arrays of numbers that represent real properties of real objects. All further machine learning math builds on this geometric interpretation: data are points, the model is a surface, and training is the process of finding the shape that best separates or approximates those points.

Originally adapted from the book "AI for PHP Developers":
https://apphp.gitbook.io/ai-for-php-developers
https://aiwithphp.org/books/ai-for-php-developers/examples/part-1/what-is-a-model