[{"content":"Intro These are my study notes for Stanford CS224N: Natural Language Processing with Deep Learning, Spring 2024. The course covers word vectors, neural networks, dependency parsing, RNNs, LSTMs, Seq2Seq models, machine translation, attention, Transformers, and related NLP topics.\nWhat is this course about? Natural language processing is one of the most important technologies in the information age. Search, advertising, email, customer service, translation, virtual agents, medical reports, and many other systems all depend on language understanding. CS224N introduces both the foundations of deep learning for NLP and newer research around large language models, with assignments and projects implemented in PyTorch.\nWord Vectors $ vector(\"King\") - vector(\"Man\") + vector(\"Woman\") $\nThis operation produces a vector close to the representation of Queen.\nHow do we have usable meaning in a computer? The slides introduce several ways to represent meaning:\nWordNet, one-hot vectors, and word vectors. The first two are useful, but they also have obvious limitations.\nWordNet relies on synonym sets and hypernym sets to define relationships between words. It is manually constructed, complex to maintain, and slow to absorb new words. One-hot vectors assign a discrete symbol to every word. Even though the representation is numeric, semantically similar words are mathematically unrelated: any two distinct one-hot vectors are orthogonal, so their dot product is 0. This motivates the famous distributional idea: You shall know a word by the company it keeps.\nWord vectors map words to dense, continuous vectors. Although a word may have multiple senses, its learned vector is often close to an average of its contextual usages. Dot products can then be used to measure relatedness between word vectors.\nWord2vec Original word2vec paper\nWord2vec captures the word-vector idea well: it compares a center word with nearby context words and learns a probability distribution from their similarity.\nWe have a large corpus of text: a long list of words. 
Every word in a fixed vocabulary is represented by a vector. We go through each position $t$ in the text, where there is a center word $c$ and context, or outside, words $o$. We use the similarity between the vectors of $c$ and $o$ to calculate the probability of $o$ given $c$, or vice versa. We keep adjusting the word vectors to maximize this probability. Objective Function For each position $t=1,\\ldots,T$, the model predicts context words within a fixed window of size $m$, given the center word $w_t$. The data likelihood is:\n$$ L(\\theta) = \\prod_{t=1}^{T} \\prod_{\\substack{-m \\le j \\le m \\\\ j \\neq 0}} P(w_{t+j} \\mid w_t; \\theta) $$For easier optimization and computation, this is converted into the average negative log likelihood:\n$$ J(\\theta) = -\\frac{1}{T} \\log L(\\theta) = -\\frac{1}{T} \\sum_{t=1}^{T} \\sum_{\\substack{-m \\le j \\le m \\\\ j \\neq 0}} \\log P(w_{t+j} \\mid w_t; \\theta) $$Minimizing $J(\\theta)$ is equivalent to maximizing prediction accuracy.\nPrediction Function $$ P(o \\mid c) = \\frac{\\exp(u_o^T v_c)}{\\sum_{w \\in V} \\exp(u_w^T v_c)} $$ $u_o^T v_c$: the larger the dot product, the closer the two words are in vector space, so the higher their semantic relatedness. Dot product compares the similarity of $o$ and $c$: $u^T v = u \\cdot v = \\sum_{i=1}^{n} u_i v_i$. A larger dot product means a larger probability. $\\exp()$: maps any real number to a positive number. Because the exponential function grows quickly, it amplifies larger dot products and gives highly related words larger weights. $\\sum_{w \\in V} \\exp(u_w^T v_c)$: the denominator sums over all possible words in vocabulary $V$. This ensures the probabilities over all possible outputs sum to 1. This is an application of the softmax function:\n$$ \\text{softmax}(x_i) = \\frac{\\exp(x_i)}{\\sum_{j=1}^{n} \\exp(x_j)} = p_i $$ Softmax maps arbitrary values $x_i$ into a probability distribution $p_i$. 
\u0026ldquo;Max\u0026rdquo;: it amplifies the probability corresponding to the largest $x_i$. \u0026ldquo;Soft\u0026rdquo;: it still assigns some probability to smaller $x_i$ values. To Train the Model To train a model, we gradually adjust parameters to minimize a loss.\n$$ \\theta = \\begin{bmatrix} v_{aardvark} \\\\ v_a \\\\ \\vdots \\\\ v_{zebra} \\\\ u_{aardvark} \\\\ u_a \\\\ \\vdots \\\\ u_{zebra} \\end{bmatrix} \\in \\mathbb{R}^{2dV} $$ If the word-vector dimension is $d$ and the vocabulary size is $V$, the total number of parameters is $2dV$. Each word has two vectors. $\\theta$ contains both representations for every word in the vocabulary: the center-word vector $v$ and the outside-word vector $u$. The model computes gradients for all parameters and updates them. The gradient formula comes from differentiating the softmax loss. The math process is:\nInitial loss function\nFor a center word $c$ and an outside word $o$, the negative log likelihood is:\n$$ \\text{Loss} = -\\log P(o \\mid c) $$ Expand using the softmax definition\nSubstitute the formula for $P(o|c)$ and use log properties:\n$$ \\text{Loss} = -\\log \\left( \\frac{\\exp(u_o^T v_c)}{\\sum_{w \\in V} \\exp(u_w^T v_c)} \\right) = -u_o^T v_c + \\log \\sum_{w \\in V} \\exp(u_w^T v_c) $$ Derivative of the first part\nDifferentiate the dot-product term with respect to the center-word vector $v_c$:\n$$ \\frac{\\partial}{\\partial v_c} (u_o^T v_c) = u_o $$ Derivative of the second part\nApply the chain rule to the $\\log \\sum \\exp(\\dots)$ term:\n$$ \\frac{\\partial}{\\partial v_c} \\log \\sum_{w \\in V} \\exp(u_w^T v_c) = \\frac{1}{\\sum_{w \\in V} \\exp(u_w^T v_c)} \\cdot \\sum_{x \\in V} \\left[ \\exp(u_x^T v_c) \\cdot u_x \\right] $$ Rewrite it as an expectation under the probability distribution\nExtract the original probability term $P(x|c)$:\n$$ \\sum_{x \\in V} \\left[ \\frac{\\exp(u_x^T v_c)}{\\sum_{w \\in V} \\exp(u_w^T v_c)} \\right] u_x = \\sum_{x \\in V} P(x \\mid c) u_x $$ Final 
gradient\nCombining both parts gives the gradient used to update $v_c$:\n$$ \\frac{\\partial \\text{Loss}}{\\partial v_c} = -u_o + \\sum_{x \\in V} P(x \\mid c) u_x $$ In words, the gradient is the outside vector the model expects under $P(x \\mid c)$ minus the outside vector actually observed. Gradient Descent Gradient descent update rule in matrix form:\n$$ \\theta^{new} = \\theta^{old} - \\alpha \\nabla_{\\theta} J(\\theta) $$For a single parameter:\n$$ \\theta_j^{new} = \\theta_j^{old} - \\alpha \\frac{\\partial}{\\partial \\theta_j^{old}} J(\\theta) $$ $\\alpha$: step size or learning rate. In practice, however, we usually use Stochastic Gradient Descent (SGD) instead.\nThe objective function $J(\\theta)$ is defined over all windows in the corpus. If every update required calculating the gradient $\\nabla_{\\theta} J(\\theta)$ over the whole corpus, the computation would be extremely expensive. SGD does not compute the gradient over the whole corpus each time. It repeatedly samples windows and updates parameters after each single window, or each small batch. Skip-gram Model with Negative Sampling Negative sampling paper\n$$ P(o\\mid c)=\\frac{\\exp(u_o^T v_c)}{\\sum_{w \\in V} \\exp(u_w^T v_c)} $$If we calculate probabilities with the traditional softmax, the denominator sums over all words, which is too expensive.\nSkip-gram with Negative Sampling avoids summing over the whole vocabulary. Instead, it trains binary logistic regression classifiers that prefer real context pairs over random context pairs. In practice, it samples $K$ negative examples, reducing the cost per training pair from $O(|V|)$ to $O(K)$:\n$$ J_{neg-sample}(u_o, v_c, U) = -\\log \\sigma(u_o^T v_c) - \\sum_{k \\in \\{K \\text{ sampled indices}\\}} \\log \\sigma(-u_k^T v_c) $$Here, $\\sigma(x)=\\frac{1}{1+e^{-x}}$ is the sigmoid function. Minimizing this loss pushes positive pairs toward probability 1 and negative pairs toward probability 0.\nHowever, if negative examples are drawn directly from the unigram distribution $U(W)$, low-frequency words such as \u0026ldquo;zebra\u0026rdquo; are sampled too rarely, while words like \u0026ldquo;the\u0026rdquo; are sampled too often. 
Therefore, the sampling distribution is adjusted with the $3/4$ power:\n$$ P(W)=U(W)^{3/4}/Z $$This increases the relative probability of low-frequency words.\nGloVe Original GloVe paper: Global Vectors for Word Representation\nCo-occurrence Matrix Building a co-occurrence matrix is straightforward: first set a window size, then count the frequency of words that co-occur within that window. The figure above shows a simple example with window size 1, counting only neighboring words. However:\nThe dimension of word vectors grows greatly as the vocabulary grows. This increases storage cost, makes the matrix very sparse, and makes models based on it less robust. Function words appear extremely often but provide little information. It does not reflect the relationship between word distance and word relatedness. How can we reduce dimensionality? A classic method is SVD matrix factorization. I still do not fully understand the theory after asking AI, but in Assignment 1 a single sklearn function solves it.\n$$ X = U \\Sigma V^T $$ $X$ (co-occurrence matrix): size $|V| \\times |V|$. Each element $X_{ij}$ represents how many times word $i$ and word $j$ co-occur in the corpus. $U$ and $V$ (orthogonal matrices): their column vectors are orthonormal. In NLP, each row of $U$ is often treated as the original embedding of a word. $\\Sigma$ (diagonal matrix): the diagonal values $\\sigma_1, \\sigma_2, \\dots$ are called singular values. They are sorted from large to small and represent the importance, variance, or information carried by each dimension. 
Dimensionality reduction means compressing a word vector of length $V$ from the co-occurrence matrix into a vector of length $K$.\nThe key insight is that semantic meaning is not encoded by co-occurrence probabilities themselves, but by ratios of co-occurrence probabilities.\nRatios of co-occurrence probabilities can encode semantic components, and we want to capture these as linear semantic components in the word-vector space.\n$x = \\text{solid}$ $x = \\text{gas}$ $x = \\text{water}$ $x = \\text{fashion}$ $P(x\\mid\\text{ice})$ $1.9 \\times 10^{-4}$ $6.6 \\times 10^{-5}$ $3.0 \\times 10^{-3}$ $1.7 \\times 10^{-5}$ $P(x\\mid\\text{steam})$ $2.2 \\times 10^{-5}$ $7.8 \\times 10^{-4}$ $2.2 \\times 10^{-3}$ $1.8 \\times 10^{-5}$ $\\dfrac{P(x\\mid\\text{ice})}{P(x\\mid\\text{steam})}$ $8.9$ $8.5 \\times 10^{-2}$ $1.36$ $0.96$ Analogies Word vectors are mathematically powerful, but their analogy behavior has many practical problems.\nIn the following example, the question is $woman + grandfather - man = ?$. The obvious and most likely result is grandmother. 
But why do other words such as granddaughter and mother also receive nearly as high scores?\n# Run this cell to answer the analogy -- man : grandfather :: woman : x\npprint.pprint(wv_from_bin.most_similar(positive=[\u0026#39;woman\u0026#39;, \u0026#39;grandfather\u0026#39;], negative=[\u0026#39;man\u0026#39;]))\n[(\u0026#39;grandmother\u0026#39;, 0.7608445286750793),\n (\u0026#39;granddaughter\u0026#39;, 0.7200808525085449),\n (\u0026#39;daughter\u0026#39;, 0.7168302536010742),\n (\u0026#39;mother\u0026#39;, 0.7151536345481873),\n (\u0026#39;niece\u0026#39;, 0.7005682587623596),\n (\u0026#39;father\u0026#39;, 0.6659888029098511),\n (\u0026#39;aunt\u0026#39;, 0.6623409390449524),\n (\u0026#39;grandson\u0026#39;, 0.6618767976760864),\n (\u0026#39;grandparents\u0026#39;, 0.644661009311676),\n (\u0026#39;wife\u0026#39;, 0.6445354223251343)]\nAlthough the assignment does not give a standard answer, this can be understood through semantic clustering.\nSemantic neighborhood effect: In vector space, semantically similar words often cluster together. When we calculate $\\vec{w} + \\vec{g} - \\vec{m}$, we are actually locating a coordinate point in the space, and the query returns the words nearest to that point. granddaughter also has features such as \u0026ldquo;female\u0026rdquo; and \u0026ldquo;relative\u0026rdquo;, and often appears in contexts similar to grandfather or grandmother. Along the \u0026ldquo;family relation\u0026rdquo; dimension, these words are very close.\nOverlapping dimensions: Word vectors usually have hundreds of dimensions. Although we subtract man and add grandfather, this does not completely erase similarity along other dimensions. daughter, mother, and grandmother share many features such as [+female], [+human], and [+relative].\nThe next example is an incorrect analogy. The expected answer should be something like sock, but why does the model ignore glove and hand and output many square-related terms? 
Clearly this is not about foot as a body part, but about foot as a unit of length.\npprint.pprint(wv_from_bin.most_similar(positive=[\u0026#39;foot\u0026#39;, \u0026#39;glove\u0026#39;], negative=[\u0026#39;hand\u0026#39;]))\n[(\u0026#39;45,000-square\u0026#39;, 0.4922032654285431),\n (\u0026#39;15,000-square\u0026#39;, 0.4649604558944702),\n (\u0026#39;10,000-square\u0026#39;, 0.4544755816459656),\n (\u0026#39;6,000-square\u0026#39;, 0.44975775480270386),\n (\u0026#39;3,500-square\u0026#39;, 0.444133460521698),\n (\u0026#39;700-square\u0026#39;, 0.44257497787475586),\n (\u0026#39;50,000-square\u0026#39;, 0.4356396794319153),\n (\u0026#39;3,000-square\u0026#39;, 0.43486514687538147),\n (\u0026#39;30,000-square\u0026#39;, 0.4330596923828125),\n (\u0026#39;footed\u0026#39;, 0.43236875534057617)]\nInterference from polysemy: As mentioned above, foot is also a unit of length, and it often combines with square. Training corpus bias: Since foot has multiple meanings but the output is almost entirely about the unit sense, the training corpus may contain many ...square foot contexts. Word choice: Even though nearly all outputs are ...square terms, their scores are only about 0.43 to 0.49. This suggests the model did not find a strongly related word and probably did not capture the relationship among glove, hand, and foot. Neural Network A neural network = running several logistic regressions at the same time.\nCS231n Deep Learning on Network Architectures\nCS231n Deep Learning for Computer Vision on Backprop\nStructure Non-linearities Why do neural networks need non-linearities?\nCore idea: neural networks perform function approximation, such as regression or classification. Without non-linearity: a deep neural network can only perform linear transformations. More layers do not help: extra linear layers collapse into a single linear transformation: $W_1 W_2 x = Wx$. With non-linearity: a multi-layer structure with non-linear functions can approximate more complex functions. 
Bottom-left figures: the left figure shows linear classification, which can only draw a straight line and cannot separate complex red/green point distributions. The right figure shows non-linear classification, which can draw curves and separate the data. Three wave figures on the right: as function complexity increases, only non-linear models can fit the oscillating observed data. The common non-linear activation functions were already covered in my Intelligent Computing Systems course, so I will not expand on them here.\nGradients derivatives.pdf\nAt a simple level, a gradient is a derivative with respect to a variable. For example:\n$$ f(x)=x^3 $$Its derivative is:\n$$ \\frac{df}{dx}=3x^2 $$Of course, this is only a very simple example. In practice, neural networks involve large-scale chain rule calculations and gradients of matrices, or Jacobian matrices.\nChain Rule In single-variable calculus, if $y = f(u)$ and $u = g(x)$, then:\n$$ \\frac{dy}{dx} = \\frac{dy}{du} \\cdot \\frac{du}{dx} $$In neural networks, each layer is usually a vector, such as $\\mathbf{h}, \\mathbf{z} \\in \\mathbb{R}^n$. When this logic is extended to vectors, multiplication becomes matrix multiplication.\nFor multiple variables, we multiply Jacobian matrices:\nSuppose $\\mathbf h= f(z)$ and $\\mathbf z=Wx+b$. 
The partial derivatives below form Jacobian matrices:\n$$ \\frac{\\partial \\mathbf{h}}{\\partial \\mathbf{x}} = \\frac{\\partial \\mathbf{h}}{\\partial \\mathbf{z}} \\frac{\\partial \\mathbf{z}}{\\partial \\mathbf{x}} $$Matrix Calculus From the following expression, the Jacobian has non-zero values only on the diagonal:\n$$ \\begin{aligned} \\left( \\frac{\\partial \\mathbf{h}}{\\partial \\mathbf{z}} \\right)_{ij} \u0026= \\frac{\\partial h_i}{\\partial z_j} = \\frac{\\partial}{\\partial z_j} f(z_i) \\quad \u0026\u0026 \\text{definition of Jacobian} \\\\ \u0026= \\begin{cases} f'(z_i) \u0026 \\text{if } i = j \\\\ 0 \u0026 \\text{if otherwise} \\end{cases} \\quad \u0026\u0026 \\text{regular 1-variable derivative} \\end{aligned} $$$$ \\frac{\\partial \\mathbf h}{\\partial \\mathbf z} = \\begin{pmatrix} f'(z_1) \u0026 0 \u0026 \\cdots \u0026 0 \\\\ 0 \u0026 f'(z_2) \u0026 \\cdots \u0026 0 \\\\ \\vdots \u0026 \\vdots \u0026 \\ddots \u0026 \\vdots \\\\ 0 \u0026 0 \u0026 \\cdots \u0026 f'(z_n) \\end{pmatrix} = \\operatorname{diag}(f'(\\mathbf z)) $$Another common Jacobian is:\n$$ \\frac{\\partial}{\\partial \\mathbf{u}}(\\mathbf{u}^T \\mathbf{h})=\\mathbf h^T $$Suppose $\\mathbf{u}$ and $\\mathbf{h}$ are both $n$-dimensional column vectors:\n$$ \\mathbf{u} = \\begin{bmatrix} u_1 \\\\ u_2 \\\\ \\vdots \\\\ u_n \\end{bmatrix}, \\quad \\mathbf{h} = \\begin{bmatrix} h_1 \\\\ h_2 \\\\ \\vdots \\\\ h_n \\end{bmatrix} $$Their inner product is a scalar:\n$$ f = \\mathbf{u}^T \\mathbf{h} = u_1 h_1 + u_2 h_2 + \\dots + u_n h_n = \\sum_{i=1}^n u_i h_i $$We want to differentiate with respect to vector $\\mathbf{u}$. 
According to the definition of a Jacobian, we differentiate with respect to each element $u_k$:\n$$ \\frac{\\partial f}{\\partial u_k} = \\frac{\\partial}{\\partial u_k} (u_1 h_1 + \\dots + u_k h_k + \\dots + u_n h_n) $$All terms except $u_k h_k$ do not contain $u_k$, so their derivatives are 0:\n$$ \\frac{\\partial f}{\\partial u_k} = h_k $$By the usual Jacobian convention, the derivative of a scalar with respect to a column vector is a row vector:\n$$ \\frac{\\partial f}{\\partial \\mathbf{u}} = \\begin{bmatrix} \\frac{\\partial f}{\\partial u_1} \u0026 \\frac{\\partial f}{\\partial u_2} \u0026 \\dots \u0026 \\frac{\\partial f}{\\partial u_n} \\end{bmatrix} = \\begin{bmatrix} h_1 \u0026 h_2 \u0026 \\dots \u0026 h_n \\end{bmatrix} = \\mathbf{h}^T $$Write out the Jacobians $$ \\begin{aligned} \\frac{\\partial s}{\\partial \\mathbf{b}} \u0026= \\frac{\\partial s}{\\partial \\mathbf{h}} \\frac{\\partial \\mathbf{h}}{\\partial \\mathbf{z}} \\frac{\\partial \\mathbf{z}}{\\partial \\mathbf{b}} \\\\ \u0026= \\mathbf{u}^T \\text{diag}(f'(\\mathbf{z})) \\mathbf{I} \\\\ \u0026= \\mathbf{u}^T \\odot f'(\\mathbf{z}) \\end{aligned} $$$\\odot$ = Hadamard product = element-wise multiplication of two vectors to produce a vector.\nVariable Meaning in neural networks Note $s$ Loss/score The final scalar output, such as cross-entropy loss. We want to know how it changes with parameters. $\\mathbf{b}$ Bias vector A learnable parameter in the current layer. $\\mathbf{z}$ Logits/pre-activation The result of the linear combination: $\\mathbf{z} = \\mathbf{W}\\mathbf{x} + \\mathbf{b}$. $\\mathbf{h}$ Activation/hidden state The output after applying a non-linear activation: $\\mathbf{h} = f(\\mathbf{z})$. $\\mathbf{u}^T$ Upstream gradient $\\frac{\\partial s}{\\partial \\mathbf{h}}$ The signal propagated backward from higher layers. $f'(\\mathbf{z})$ Derivative of the activation function For example, the derivative of ReLU or sigmoid. It determines which neurons are active. 
$\\mathbf{I}$ Identity matrix Since $\\mathbf{z} = \\dots + \\mathbf{b}$, the derivative of $\\mathbf{z}$ with respect to $\\mathbf{b}$ is 1, represented as the identity matrix. Re-using Computation The upstream error signal $\\boldsymbol{\\delta}$ is:\n$$ \\boldsymbol{\\delta} = \\frac{\\partial s}{\\partial \\mathbf{h}} \\frac{\\partial \\mathbf{h}}{\\partial \\mathbf{z}} = \\mathbf{u}^T \\circ f'(\\mathbf{z}) $$After computing $\\boldsymbol{\\delta}$ first, later calculations become simpler:\nGradient of the weight matrix $W$:\n$$ \\frac{\\partial s}{\\partial \\mathbf{W}} = \\boldsymbol{\\delta} \\frac{\\partial \\mathbf{z}}{\\partial \\mathbf{W}} $$ Gradient of the bias vector $\\mathbf{b}$:\n$$ \\frac{\\partial s}{\\partial \\mathbf{b}} = \\boldsymbol{\\delta} \\frac{\\partial \\mathbf{z}}{\\partial \\mathbf{b}} = \\boldsymbol{\\delta} $$ Shape Convention Suppose the weight matrix is $\\mathbf{W} \\in \\mathbb{R}^{n \\times m}$ and the output is a scalar $s$, such as a loss. By the pure mathematical definition, $\\frac{\\partial s}{\\partial \\mathbf{W}}$ should be a $1 \\times nm$ row vector, a Jacobian. But if we use this form directly, the gradient update rule $\\theta^{new} = \\theta^{old} - \\alpha \\nabla_{\\theta} J(\\theta)$ cannot subtract tensors because the shapes do not match.\nFor convenience in computation, we use the convention that the gradient shape should match the parameter shape. 
Therefore, $\\frac{\\partial s}{\\partial \\mathbf{W}}$ is also an $n \\times m$ matrix:\n$$ \\frac{\\partial s}{\\partial \\mathbf{W}} = \\begin{bmatrix} \\frac{\\partial s}{\\partial W_{11}} \u0026 \\dots \u0026 \\frac{\\partial s}{\\partial W_{1m}} \\\\ \\vdots \u0026 \\ddots \u0026 \\vdots \\\\ \\frac{\\partial s}{\\partial W_{n1}} \u0026 \\dots \u0026 \\frac{\\partial s}{\\partial W_{nm}} \\end{bmatrix} $$$$ \\frac{\\partial s}{\\partial \\mathbf{W}} = \\boldsymbol{\\delta}^T \\mathbf{x}^T $$Here $\\boldsymbol{\\delta}$ is a $1 \\times n$ row vector and $\\mathbf{x}$ is an $m \\times 1$ column vector, so $\\boldsymbol{\\delta}^T \\mathbf{x}^T$ is an $n \\times m$ outer product.\nSo what shape should a derivative result take?\nThe practical answer is to follow the shape convention:\nMethod: do not get stuck on the strict Jacobian definition. Always watch the variable dimensions. Core trick: use dimensional analysis to decide when to transpose a term or adjust multiplication order, so each layer\u0026rsquo;s gradient has exactly the same shape as the corresponding parameter. Important conclusion about $\\boldsymbol{\\delta}$: the error signal propagated to a hidden layer should have the same dimension as the number of neurons in that hidden layer, or the dimension of its activation vector. Backpropagation Computing each function step by step from input to output is forward propagation. Backpropagation traverses the same graph in reverse, passing gradients from the output back to the inputs.\nFor a single node in backpropagation:\n$$ downstream\\ gradient = upstream\\ gradient \\times local\\ gradient $$\nFor a node with multiple inputs, the upstream gradient remains the same, but each input has a different local gradient. The formula is unchanged.\nHere is a concrete example with multiple inputs:\nBased on this example, suppose the input value $y$ changes from 2 to 2.1. Then $a=x+y=3.1$, $b=\\max(y,z)=y=2.1$, and $a\\times b=6.51$.\nSo a change of 0.1 in $y$ changes the result from 6 to 6.51, a difference of 0.51. The finite-difference estimate of the gradient is:\n$$ \\frac{\\Delta f}{\\Delta y}=5.1 $$This is close to the exact derivative at $y=2$: because $y$ feeds both the sum and the max, $\\frac{\\partial f}{\\partial y}=b+a=2+3=5$.\nImplementations In theory, once the symbolic computation of forward propagation is known, a computer can automatically derive the result of backpropagation. 
But in modern frameworks, users or framework authors still define local derivative rules. This is more efficient and stable than a fully automatic symbolic approach.\nclass MultiplyGate(object):\n    def forward(self, x, y):\n        z = x * y\n        self.x = x # must keep these around!\n        self.y = y\n        return z\n\n    def backward(self, dz):\n        dx = self.y * dz # [dz/dx * dL/dz]\n        dy = self.x * dz # [dz/dy * dL/dz]\n        return [dx, dy]\nNumeric Gradient Checking When manually deriving and implementing backpropagation, numeric gradient checking is the standard way to verify that the math and code are correct:\n$$ f'(x) \\approx \\frac{f(x + h) - f(x - h)}{2h} $$ It only needs the forward function $f(x)$, so it does not require complex mathematical derivation and is less likely to be wrong. It must run two forward passes for each parameter, one with $+h$ and one with $-h$, so it is inefficient. It is suitable for local tests, not for validating a large whole network. Use it for a specific layer or a small parameter tensor, such as a $3 \\times 3$ matrix. Dependency Parsing Syntactic Structure Phrase structure organizes words into nested constituents. We can define grammar rules for phrases ourselves. For example, a noun phrase can be \u0026ldquo;determiner + adjective + noun\u0026rdquo; or \u0026ldquo;determiner + noun + prepositional phrase\u0026rdquo;; a prepositional phrase can be \u0026ldquo;preposition + noun\u0026rdquo;, and so on.\nDependency structure shows which words depend on, modify, attach to, or act as arguments of other words. Ambiguity is common in language, and prepositional phrases create even more ambiguity in English. 
For example:\nScientists count whales from space\nThis can be understood as Scientists [count] [whales from space], or Scientists [count whales] [from space].\nDependency Grammar and Treebanks Dependency syntax assumes that syntactic structure consists of relations between lexical items, usually binary asymmetric relations called dependencies.\nThe figure below is an older example of a dependency structure.\nAn arrow connects a head, also called governor, superior, or regent, with a dependent, also called modifier, inferior, or subordinate.\nUsually, dependencies form a tree: a connected, acyclic, single-root graph.\nAnnotated Data At first, building a treebank may look slower than manually writing grammar rules, and perhaps less useful. Manual annotation is indeed troublesome, but it has several major advantages:\nReusability: one annotated dataset can be used to train many parsers and POS taggers. Broad coverage: hand-written rules often cover only a few intuitive examples, while annotated real corpora cover the complexity of language in actual use. Frequencies and distributional information: a treebank tells the model which structures are more common, helping probabilistic models make better decisions. A way to evaluate NLP systems: without this kind of gold standard, we cannot measure parser accuracy through metrics such as LAS and UAS. The dependency labels in the example figure can be roughly understood as:\nLabel Meaning Simple understanding nsubj Nominal subject The doer of the action, as in I think. nsubjpass Passive subject The subject in passive voice, as in city called. ccomp Clausal complement A clause after a verb, as in think \u0026hellip;. advmod Adverbial modifier Modifies degree, question words, or verbs, as in Why. amod Adjectival modifier An adjective modifying a noun, as in famous goat. compound Compound modifier A noun modifying another noun, as in goat trainer. det Determiner Points to words like a, the, any. 
case Case marker Points to prepositions such as in, at. conj Conjunction Words connected by or, and, as in trainer or something. Dependency Conditioning Preferences During parsing, the model uses dependency conditioning preferences to judge whether two words are likely to have a dependency relation:\nBilexical affinities: whether a dependency such as [discussion -\u0026gt; issues] is reasonable. Dependency distance: most, but not all, dependencies occur between nearby words. Intervening material: dependencies rarely cross intervening verbs or punctuation. Valency of heads: for a head word, how many dependents does it usually have on each side? Projectivity If the words of a sentence are arranged in linear order and all dependency arcs are drawn above the words, a parse is projective when no two arcs cross. If arcs cross, the parse is non-projective, which usually indicates long-distance movement or overlapping structure.\nNon-projective structures do occur in real language, such as:\nWho did Bill buy the coffee from yesterday\nTransition-Based Dependency Parser A transition-based dependency parser has a stack, a buffer, a set of dependency arcs, and three operations.\nStart: $\\sigma = [ROOT], \\beta = w_1, ..., w_n, A = \\emptyset$\nShift: $\\sigma, w_i | \\beta, A \\Rightarrow \\sigma | w_i, \\beta, A $\nLeft-$Arc_r$: $\\sigma | w_i | w_j, \\beta, A \\Rightarrow \\sigma | w_j, \\beta, A \\cup \\{r(w_j, w_i)\\} $\nRight-$Arc_r$: $\\sigma | w_i | w_j, \\beta, A \\Rightarrow \\sigma | w_i, \\beta, A \\cup \\{r(w_i, w_j)\\}$\nFinish: $\\sigma = [w], \\beta = \\emptyset$\n$\\sigma$ represents the stack, storing words currently being processed or waiting for dependency relations. $\\beta$ represents the buffer, storing the input words that have not yet been processed. $A$ represents the set of dependency arcs, storing dependency relations already created. Left-$Arc_r$ and Right-$Arc_r$ are the two reduction operations: Left-$Arc_r$ makes the stack top $w_j$ the head of $w_i$ and removes $w_i$ from the stack, while Right-$Arc_r$ makes $w_i$ the head of $w_j$ and removes $w_j$. 
Now consider the example: analysis of I ate fish.\nThe Left Arc operation creates an arc from the stack top toward the second element, establishing that ate is the head and I depends on ate. Then I is removed from the stack. The Shift operation moves fish from the buffer into the stack. The Right Arc operation creates an arc from the second element to the stack top, establishing that ate is the head and fish depends on ate. Then fish is removed from the stack. The final Right Arc operation makes [root] point to ate. After ate is popped, only the root node remains and parsing is complete. Evaluation of Dependency Parsing Dependency parsing is evaluated with UAS (Unlabeled Attachment Score) and LAS (Labeled Attachment Score). The following example uses [ROOT] She saw the video lecture.; Gold is the standard answer and Parsed is the parser output.\nUAS checks whether the Head is correct. In this example, the third word the has a different head from the gold parse. LAS checks whether both the Head and the relation label are correct. In this example, only the relation between She and saw matches the gold parse. Neural dependency parsing More than 95% of parsing time is consumed by feature computation.\nTherefore, neural networks can be used to accelerate feature extraction. The method is still based on the transition-based dependency parser above, but it uses vectorization and non-linear neural network modeling. 
This led to the first neural-network-based dependency parser in 2014.\nRecurrent Neural Networks Language Modeling In simple terms, a language model takes text, or tokens, as input and outputs probabilities.\n$$ \\begin{aligned} P(\\boldsymbol{x}^{(1)}, \\dots, \\boldsymbol{x}^{(T)}) \u0026= P(\\boldsymbol{x}^{(1)}) \\times P(\\boldsymbol{x}^{(2)} | \\boldsymbol{x}^{(1)}) \\times \\dots \\times P(\\boldsymbol{x}^{(T)} | \\boldsymbol{x}^{(T-1)}, \\dots, \\boldsymbol{x}^{(1)}) \\\\ \u0026= \\prod_{t=1}^{T} \\underbrace{P(\\boldsymbol{x}^{(t)} | \\boldsymbol{x}^{(t-1)}, \\dots, \\boldsymbol{x}^{(1)})}_{\\text{This is what our LM provides}} \\end{aligned} $$$P(\\boldsymbol{x}^{(1)}, \\dots, \\boldsymbol{x}^{(T)})$ is the probability of an entire sequence, such as a sentence. By decomposing the joint probability into a product of conditional probabilities using the chain rule, we can calculate the probability of the sequence. The core task of a language model is to use the previous context $\\boldsymbol{x}^{(t-1)}, \\dots, \\boldsymbol{x}^{(1)}$ to predict the probability of the next token $\\boldsymbol{x}^{(t)}$.\nn-gram Language Models An n-gram is a chunk of $n$ consecutive words. Here, $n$ means how many words form one unit. 
To build an n-gram language model:\nFirst, make a Markov assumption: the word $x^{(t+1)}$ depends only on the previous $n-1$ words.\n$$ P(x^{(t+1)} | x^{(t)}, \\dots, x^{(1)}) = P(x^{(t+1)} | \\underbrace{x^{(t)}, \\dots, x^{(t-n+2)}}_{n-1 \\text{ words}}) \\quad \\text{(assumption)} $$ Using the definition of conditional probability, the above formula can be written as the ratio between an n-gram probability and an $(n-1)$-gram probability:\n$$ = \\frac{P(x^{(t+1)}, x^{(t)}, \\dots, x^{(t-n+2)}) \\leftarrow \\text{prob of a n-gram}}{P(x^{(t)}, \\dots, x^{(t-n+2)}) \\leftarrow \\text{prob of a (n-1)-gram}} \\quad \\text{(definition of conditional prob)} $$ We approximate these probabilities by counting n-gram frequencies in a large text corpus:\n$$ \\approx \\frac{\\text{count}(x^{(t+1)}, x^{(t)}, \\dots, x^{(t-n+2)})}{\\text{count}(x^{(t)}, \\dots, x^{(t-n+2)})} \\quad \\text{(statistical approximation)} $$ For example, suppose we have a 4-gram language model and want to predict the last blank:\nas the proctor started the clock, the students opened their ......\nWe only use the last three words, students opened their:\n$$ P(w\\mid students\\ opened\\ their)=\\frac{count(students\\ opened\\ their\\ w)}{count(students\\ opened\\ their)} $$According to the corpus, students opened their books may appear most often, while the more contextually appropriate students opened their exams may appear less often.\nProblems with n-gram Language Models When using counting to estimate probabilities, we face sparsity problems:\nIf the phrase students opened their $w$ never appears in the training data, then the probability for any such $w$ becomes 0.\nThis can be handled by adding a small value $\\delta$ to the count of each word $w \\in V$, which is smoothing.\nIf the prefix students opened their never appears in the training data, then we cannot calculate the probability of any $w$ because the denominator is 0.\nIn this case, we back off to a shorter context, such as opened 
their.\nThere is also a storage problem:\nWe need to store counts for all observed n-grams in the corpus. If $n$ increases, the required corpus size and storage grow greatly. A Fixed-window Neural Language Model Input layer (words / one-hot vectors): the inputs are one-hot vectors of words $\\boldsymbol{x}^{(1)}, \\boldsymbol{x}^{(2)}, \\boldsymbol{x}^{(3)}, \\boldsymbol{x}^{(4)}$.\nEmbedding layer (concatenated word embeddings): words are converted into dense embeddings and concatenated:\n$$ \\boldsymbol{e} = [\\boldsymbol{e}^{(1)}; \\boldsymbol{e}^{(2)}; \\boldsymbol{e}^{(3)}; \\boldsymbol{e}^{(4)}] $$ Hidden layer: apply a linear transformation with weight matrix $W$ and bias $b_1$, then pass through an activation function $f$, usually tanh or ReLU:\n$$ \\boldsymbol{h} = f(W\\boldsymbol{e} + \\boldsymbol{b}_1) $$ Output distribution: apply weight matrix $U$ and bias $b_2$, then use softmax to produce a probability distribution over vocabulary $V$:\n$$ \\hat{\\boldsymbol{y}} = \\text{softmax}(U\\boldsymbol{h} + \\boldsymbol{b}_2) \\in \\mathbb{R}^{|V|} $$ Compared with n-gram methods, this improves:\nSparsity problem: it no longer relies on exact counts, and can generalize unseen word groups through vector-space similarity. Storage: it does not need to store frequencies for all observed n-grams, only model parameters. But some problems remain:\nFixed-window limitation The fixed context window is usually too small. Increasing the window size linearly increases the number of parameters in weight matrix $W$. No matter how large the window is, it cannot capture long-range dependencies outside the window. Lack of symmetry Inputs $\\boldsymbol{x}^{(1)}$ and $\\boldsymbol{x}^{(2)}$ are multiplied by completely different parts of $W$, so the model does not process each input position consistently. RNN Language Model The Unreasonable Effectiveness of Recurrent Neural Networks\nAdvantages of RNNs:\nThey can process input of any length. 
In theory, computation at step $t$ can use information from many steps earlier. Fixed model size: increasing input length does not increase the number of model parameters. Symmetry: the same weights are applied at every step, so input positions are processed consistently. Disadvantages of RNNs:\nSlow computation: because computation is recurrent, it cannot be fully parallelized. Practical difficulty: in practice, it is hard to use information from many steps earlier, because of vanishing or exploding gradients. Train an RNN Language Model Obtain a large text corpus consisting of a word sequence $\\boldsymbol{x}^{(1)}, \\dots, \\boldsymbol{x}^{(T)}$.\nFeed the sequence into the RNN-LM and compute the output distribution $\\hat{\\boldsymbol{y}}^{(t)}$ for every time step $t$. This means the model predicts the probability distribution over possible next words at each position, given the words seen so far.\nThe model produces a loss at every time step. At step $t$, the loss is the cross entropy between the predicted distribution $\\hat{\\boldsymbol{y}}^{(t)}$ and the true next word $\\boldsymbol{y}^{(t)}$, which is the one-hot vector of $\\boldsymbol{x}^{(t+1)}$:\n$$ J^{(t)}(\\theta) = CE(\\boldsymbol{y}^{(t)}, \\hat{\\boldsymbol{y}}^{(t)}) = - \\sum_{w \\in V} \\boldsymbol{y}^{(t)}_w \\log \\hat{\\boldsymbol{y}}^{(t)}_w = - \\log \\hat{\\boldsymbol{y}}^{(t)}_{\\boldsymbol{x}_{t+1}} $$ To get the loss over the whole training sequence, average the loss over all steps:\n$$ J(\\theta) = \\frac{1}{T} \\sum_{t=1}^{T} J^{(t)}(\\theta) = \\frac{1}{T} \\sum_{t=1}^{T} - \\log \\hat{\\boldsymbol{y}}^{(t)}_{\\boldsymbol{x}_{t+1}} $$This uses the idea of teacher forcing: when calculating loss, the model does not feed its own previous prediction into the next step. It directly uses the correct word from the corpus.\nComputing the loss and gradients over the entire corpus $\\boldsymbol{x}^{(1)}, \\dots, \\boldsymbol{x}^{(T)}$ at once is extremely expensive in memory. 
In practice, we split the corpus into sentences or documents, use SGD to compute loss and gradients over a small chunk of data, and update parameters immediately.\nBackpropagation for RNN RNN parameters are trained with backpropagation through time. The backward pass runs along time steps $i=t,\\dots,0$ and accumulates gradients.\nBecause $\\boldsymbol{W}_h$ is shared at every time step, the total gradient is the sum of gradients produced at each step:\n$$ \\frac{\\partial J^{(t)}}{\\partial \\boldsymbol{W}_h} = \\sum_{i=1}^{t} \\left. \\frac{\\partial J^{(t)}}{\\partial \\boldsymbol{W}_h} \\right|_{(i)} \\frac{\\partial \\boldsymbol{W}_h|_{(i)}}{\\partial \\boldsymbol{W}_h} = \\sum_{i=1}^{t} \\left. \\frac{\\partial J^{(t)}}{\\partial \\boldsymbol{W}_h} \\right|_{(i)} $$As the sequence grows longer, full backpropagation becomes very expensive and is prone to vanishing or exploding gradients. In practice, backpropagation is often truncated after about 20 time steps.\nExploding Gradient Exploding gradients occur when:\nThe largest eigenvalue of $W_h$, roughly the magnitude of the weights, is greater than 1. As the number of time steps $T$ increases, gradients grow exponentially. Model weights are updated too aggressively, making the network unstable. Parameters may overflow into NaN and training collapses. The usual remedy is gradient clipping: if the norm of the gradient exceeds a preset threshold before updating model parameters, we scale it down proportionally. That is, if $\\|\\hat{\\boldsymbol{g}}\\| \\ge threshold$, we set:\n$$ \\hat{\\boldsymbol{g}} \\leftarrow \\frac{threshold}{\\|\\hat{\\boldsymbol{g}}\\|} \\hat{\\boldsymbol{g}} $$Gradient clipping keeps the update in the same direction, but takes a smaller step.\nVanishing Gradient Vanishing gradients occur when:\nThe largest eigenvalue of $W_h$ is less than 1, or the derivatives of activation functions such as sigmoid or tanh are less than 1. Gradients shrink exponentially as the number of backward steps increases. 
This corresponds to the RNN limitation mentioned earlier: in practice, it is hard to access information from many steps earlier. When gradients become extremely small, far-away weights are barely updated, and the model \u0026ldquo;forgets\u0026rdquo; long-term context. For a vanilla RNN, learning to preserve information across many time steps is difficult because the hidden state $\\boldsymbol{h}^{(t)}$ is constantly rewritten:\n$$ \\boldsymbol{h}^{(t)} = \\sigma(\\boldsymbol{W}_h \\boldsymbol{h}^{(t-1)} + \\boldsymbol{W}_x \\boldsymbol{x}^{(t)} + \\boldsymbol{b}) $$Therefore, we introduce independent memory, such as LSTMs, or build more direct connections, such as attention mechanisms.\nLong Short-Term Memory Understanding LSTM Networks \u0026ndash; colah\u0026rsquo;s blog\nForget gate: controls what to keep and what to forget from the previous cell state.\n$$ \\boldsymbol{f}^{(t)} = \\sigma (\\boldsymbol{W}_f \\boldsymbol{h}^{(t-1)} + \\boldsymbol{U}_f \\boldsymbol{x}^{(t)} + \\boldsymbol{b}_f) $$Input gate: controls which parts of the new cell content are written into the cell.\n$$ \\boldsymbol{i}^{(t)} = \\sigma (\\boldsymbol{W}_i \\boldsymbol{h}^{(t-1)} + \\boldsymbol{U}_i \\boldsymbol{x}^{(t)} + \\boldsymbol{b}_i) $$Output gate: controls which parts of the cell are output to the hidden state.\n$$ \\boldsymbol{o}^{(t)} = \\sigma (\\boldsymbol{W}_o \\boldsymbol{h}^{(t-1)} + \\boldsymbol{U}_o \\boldsymbol{x}^{(t)} + \\boldsymbol{b}_o) $$New cell content: the new content to be written into the cell, also known as candidate content.\n$$ \\tilde{\\boldsymbol{c}}^{(t)} = \\tanh (\\boldsymbol{W}_c \\boldsymbol{h}^{(t-1)} + \\boldsymbol{U}_c \\boldsymbol{x}^{(t)} + \\boldsymbol{b}_c) $$Cell state: erase, or forget, parts of the previous cell state and write in new cell content.\n$$ \\boldsymbol{c}^{(t)} = \\boldsymbol{f}^{(t)} \\odot \\boldsymbol{c}^{(t-1)} + \\boldsymbol{i}^{(t)} \\odot \\tilde{\\boldsymbol{c}}^{(t)} $$Hidden state: read, or output, some content from 
the cell.\n$$ \\boldsymbol{h}^{(t)} = \\boldsymbol{o}^{(t)} \\odot \\tanh \\boldsymbol{c}^{(t)} $$Step-by-Step LSTM Walk Through In the figure above, each line carries a complete vector from one node\u0026rsquo;s output to other nodes\u0026rsquo; inputs. Pink circles represent pointwise operations such as vector addition, and yellow boxes represent learned neural network layers. Merged lines represent concatenation, and forked lines mean the content is copied and sent to different places. The key to LSTM is the cell state, the horizontal line running through the top of the diagram.\nThe cell state is like a conveyor belt. It runs straight through the chain with only minor linear interactions. Information can flow along it relatively unchanged.\nLSTMs can add or remove information from the cell state. This is carefully controlled by structures called gates.\nA gate is a way to selectively allow information through. It consists of a sigmoid neural network layer and a pointwise multiplication operation.\nThe first step of an LSTM is to decide what information to discard from the cell state. This decision is made by the forget gate layer, a sigmoid layer. It receives $h_{t-1}$ and $x_t$, and outputs a value between 0 and 1 for each number in the previous cell state $C_{t-1}$. A value of 1 means \u0026ldquo;keep completely\u0026rdquo;; 0 means \u0026ldquo;discard completely\u0026rdquo;. Returning to the language-model example, the cell state may contain the gender of the current subject, so the model can use the correct pronoun. When a new subject appears, we want to forget the gender of the old subject. The next step is to decide what new information to store in the cell state. This has two parts. First, an input gate layer decides which values to update. Then a $\\tanh$ layer creates a vector of new candidate values $\\tilde{C}_t$ that can be added to the state. In the next step, these two parts are combined to update the state. 
In the language-model example, we want to add the gender of the new subject into the cell state, replacing the old gender information we are forgetting. Now it is time to update the old cell state $C_{t-1}$ into the new cell state $C_t$. The previous steps already decided what to do; now we execute it.\nWe multiply the old state by $f_t$ to forget the information we decided to forget. Then we add $i_t * \\tilde{C}_t$. These are the new candidate values, scaled by how much we decided to update each state value.\nIn the language-model example, this is where we actually remove the old subject-gender information and add the new information.\nFinally, we need to decide what to output. This output is based on the cell state, but it is a filtered version.\nFirst, we run a sigmoid layer to decide which parts of the cell state to output. Then we pass the cell state through $\\tanh$, pushing values into the range -1 to 1, and multiply it by the sigmoid gate output. In this way, we only output the parts we decided to output.\nIn a language-model example, after processing a subject, the model may want to output information related to the upcoming verb, such as whether the subject is singular or plural.\nHow does LSTM solve vanishing gradients The LSTM architecture makes it easier for an RNN to preserve information over multiple time steps. For example, if the forget gate of a cell dimension is set to 1 and the input gate is set to 0, that information can be kept indefinitely.\nIn contrast, a vanilla RNN must learn a recurrent weight matrix $W_h$ that preserves information in the hidden state, which is much harder.\nAlthough vanishing and exploding gradients cannot be completely avoided, models can create more direct and more linear paths for long-distance dependencies. 
ResNet and DenseNet are examples of architectures that create direct connections between modules or layers.\nBidirectional RNNs Traditional one-way RNNs or LSTMs have an obvious limitation: when processing a sequence, they can only \u0026ldquo;look left\u0026rdquo;, meaning they only use past context. However, in many NLP tasks such as sentiment classification, named entity recognition, or sentence-level understanding, the meaning of the current word may also depend on the \u0026ldquo;right side\u0026rdquo;, or future context. To solve this, researchers introduced bidirectional architectures, often implemented with LSTMs: Forward RNN: processes the input sequence from left to right and computes hidden states $\\overrightarrow{h}_t$. Backward RNN: processes the same input sequence from right to left and computes hidden states $\\overleftarrow{h}_t$. Concatenated state: at each time step $t$, concatenate the forward and backward hidden states to form the final representation at that position: $h_t = [\\overrightarrow{h}_t; \\overleftarrow{h}_t]$. Each word representation therefore contains both left and right context. Bidirectional LSTMs are powerful feature extractors, but they are only suitable for tasks where the complete input sequence is available at once, such as text classification or encoding the source sentence in translation. They cannot be used for traditional language modeling, because language modeling predicts the next word. If the model can see future words on the right, it violates the autoregressive prediction setup. Neural Machine Translation Neural machine translation was one of the first major successes of deep learning in NLP. NMT is mainly based on the Sequence-to-Sequence (Seq2Seq) architecture, whose core consists of two RNNs, usually LSTMs: an encoder and a decoder. The encoder reads the source-language sentence. While reading, it does not produce the translation directly; it continuously updates its hidden state. 
After the encoder processes the final word, its final hidden state is treated as a compressed semantic representation of the whole sentence. This acts as an \u0026ldquo;information bottleneck\u0026rdquo;, because all complex meanings of the source sentence must be compressed into one fixed-dimensional vector. The decoder-side LSTM is essentially a conditional language model. Its initial hidden state is not random or all zero; it is set to the bottleneck vector output by the encoder. This means every generation step of the decoder is conditioned on the semantic vector of the source sentence. At each time step, it outputs the word with the highest probability according to the current hidden state, then feeds the last generated word into the next step until it produces the end-of-sentence token \u0026lt;EOS\u0026gt;. ","date":"2026-06-27T00:00:00+08:00","permalink":"/en/p/cs224n/","title":"CS224N"},{"content":"References Practical BM25 - Part 1: How Shards Affect Relevance Scoring in Elasticsearch | Elastic Blog\nPractical BM25 - Part 2: The BM25 Algorithm and its Variables | Elastic Blog\nPractical BM25 - Part 3: Considerations for Picking b and k1 in Elasticsearch | Elastic Blog\ntf-idf - Wikipedia\nBackground In Elasticsearch 5.0, the default similarity algorithm was changed to Okapi BM25, which is used to score the relevance between search results and a query. This post focuses on the practical side of BM25, including its available parameters and the factors that affect scoring.\nUnderstanding How Shards Affect Scoring Before learning BM25, it is necessary to understand that an Elasticsearch index can be split into multiple shards, which are physical partitions of the index. This matters because BM25 relevance scores are not naturally calculated from global statistics across the entire index. By default, they may be calculated separately inside each shard. 
The more shards there are, and the less data each shard contains, the easier it is for scoring bias to appear.\nBelow, we follow the example from the reference article. The goal is to create an Elasticsearch index named people, insert a few test documents, and repeatedly search for the same query term \u0026quot;Shane\u0026quot; to observe how BM25 relevance scores change with document count and shard distribution.\nThe author creates an index named people, sets it to have 5 primary shards, and uses BM25 as the default similarity algorithm:\nPUT people { \u0026#34;settings\u0026#34;: { \u0026#34;number_of_shards\u0026#34;: 5, \u0026#34;index\u0026#34; : { \u0026#34;similarity\u0026#34; : { \u0026#34;default\u0026#34; : { \u0026#34;type\u0026#34; : \u0026#34;BM25\u0026#34; } } } } }\nThe author uses his own name as the example, indexing one document and then searching for it:\nPUT /people/_doc/1 { \u0026#34;title\u0026#34;: \u0026#34;Shane\u0026#34; } GET /people/_doc/_search { \u0026#34;query\u0026#34;: { \u0026#34;match\u0026#34;: { \u0026#34;title\u0026#34;: \u0026#34;Shane\u0026#34; } } }\nThe search looks for documents whose title field matches \u0026quot;Shane\u0026quot;, so it naturally matches /people/_doc/1. Three more documents are then added:\nPUT /people/_doc/2 { \u0026#34;title\u0026#34;: \u0026#34;Shane C\u0026#34; } PUT /people/_doc/3 { \u0026#34;title\u0026#34;: \u0026#34;Shane Connelly\u0026#34; } PUT /people/_doc/4 { \u0026#34;title\u0026#34;: \u0026#34;Shane P Connelly\u0026#34; }\nThen the same search is run again:\nGET /people/_doc/_search { \u0026#34;query\u0026#34;: { \u0026#34;match\u0026#34;: { \u0026#34;title\u0026#34;: \u0026#34;Shane\u0026#34; } } }\nAt this point there are 4 documents, with the titles Shane, Shane C, Shane Connelly, and Shane P Connelly.\nAll four documents match the query, because every title contains \u0026quot;Shane\u0026quot;, yet their BM25 scores are not the same. 
The result is that doc1 and doc3 both score 0.2876821, while doc2 scores 0.19856805 and doc4 scores 0.16853254.\nAlthough doc2 and doc3 look similar, their scores differ noticeably. This is not mainly caused by the difference between \u0026quot;C\u0026quot; and \u0026quot;Connelly\u0026quot;, but by how documents are distributed across shards. So how can the scores become more consistent?\nThe larger the dataset, the smaller the statistical difference between shards.\nReducing the number of shards can reduce scoring bias.\nIf you want BM25 scores under multiple shards to be closer to \u0026ldquo;global statistics\u0026rdquo;, you can add ?search_type=dfs_query_then_fetch when querying. It collects term-frequency statistics from all shards first, then calculates scores in a unified way, so the result will be close to, or even the same as, the result when number_of_shards=1.\nThis comes at the cost of one extra communication round, so it is only worth using when the dataset is small, there are many shards, the data distribution is uneven, and relevance scores matter a lot.\nAlgorithm and its variables BM25 model:\n$$ \\sum_{i}^{n} IDF(q_i) \\frac{f(q_i, D) * (k_1 + 1)}{f(q_i, D) + k_1 * (1 - b + b * \\frac{fieldLen}{avgFieldLen})} $$ $q_i$: the $i$-th keyword in the query. $IDF(q_i)$: the inverse document frequency of keyword $q_i$. $f(q_i, D)$: the term frequency of keyword $q_i$ in document $D$. $fieldLen$: the length of the current document field, measured in terms. $avgFieldLen$: the average field length across all documents in the index (or, by default, in the shard). $k_1$ and $b$: tunable parameters. Usually $k_1 \\in [1.2, 2.0]$, and $b = 0.75$. In simple terms, BM25 is a TF-IDF model that introduces nonlinearity and handles the frequency saturation problem. 
The TF-IDF model is:\n$$ \\text{Score} = f(q_i, D) \\times \\log\\left(\\frac{N}{n(q_i)}\\right) $$$q_i$ For example, if I search for \u0026ldquo;shane\u0026rdquo;, there is only one query term, so $q_0$ is \u0026ldquo;shane\u0026rdquo;. If I search for \u0026ldquo;shane connelly\u0026rdquo; in English, Elasticsearch recognizes the space and tokenizes the query into two terms: $q_0$ is \u0026ldquo;shane\u0026rdquo;, and $q_1$ is \u0026ldquo;connelly\u0026rdquo;. These query terms are substituted into the other parts of the formula, and the final results are summed.\n$IDF(q_i)$ The IDF (Inverse Document Frequency) part of the formula measures how frequently a term appears across all documents. It \u0026ldquo;penalizes\u0026rdquo; common terms by lowering their weight. In the Lucene/BM25 algorithm, the actual formula is:\n$$ \\ln \\left( 1 + \\frac{(docCount - f(q_i) + 0.5)}{f(q_i) + 0.5} \\right) $$Here, $docCount$ is the total number of documents in this shard that contain a value for this field. If the search_type=dfs_query_then_fetch parameter is used, it is the count across all shards. $f(q_i)$ is the number of documents containing the $i$-th query term. In the example, the term \u0026ldquo;shane\u0026rdquo; appears in all 4 documents, so the inverse document frequency $IDF(\\text{\"shane\"})$ is:\n$$ \\ln\\left(1 + \\frac{(4 - 4 + 0.5)}{4 + 0.5}\\right) = \\ln\\left(1 + \\frac{0.5}{4.5}\\right) = 0.105360515657826 $$$IDF(\\text{\"connelly\"})$ is:\n$$ \\ln\\left(1 + \\frac{(4 - 2 + 0.5)}{2 + 0.5}\\right) = \\ln\\left(1 + \\frac{2.5}{2.5}\\right) = 0.693147180559945 $$We can see that queries containing rarer terms have a higher multiplier. In this 4-document corpus, \u0026ldquo;connelly\u0026rdquo; is rarer than \u0026ldquo;shane\u0026rdquo;, so it contributes more to the final score. 
This matches intuition: the word \u0026ldquo;the\u0026rdquo; may appear in almost every English document, so when a user searches for something like \u0026ldquo;the elephant\u0026rdquo;, \u0026ldquo;elephant\u0026rdquo; is clearly more important than \u0026ldquo;the\u0026rdquo;, and we also expect it to contribute more to the search score.\n$fieldLen/avgFieldLen$ This ratio measures how long the field is relative to the average field length. The more terms a document contains, at least terms that do not match the query, the lower its score tends to be. This also matches intuition: if a 300-page document mentions my name only once, it is probably less relevant than a short tweet that also mentions my name once.\n$b$ The larger the value of $b$, the more the document length ratio affects the score. To understand this, imagine setting $b$ to 0: the length term $1 - b + b * \\frac{fieldLen}{avgFieldLen}$ becomes 1, so document length has no effect at all and the score depends only on term frequency and IDF. If $b$ is set to 1, the length term equals the length ratio itself, so length normalization is applied in full. Term frequency still contributes to the score in that case.\n$f(q_i, D)$ This value corresponds to TF, or Term Frequency.\n$f(q_i, D)$ means: how many times does the $i$-th query term appear in document $D$? In all of the example documents, $f(\\text{\"shane\"}, D)$ is 1, but $f(\\text{\"connelly\"}, D)$ differs: it is 1 in documents 3 and 4, and 0 in documents 1 and 2. If there were a 5th document whose text was \u0026ldquo;shane shane\u0026rdquo;, then $f(\\text{\"shane\"}, D)$ would be 2. We can see that $f(q_i, D)$ appears in both the numerator and denominator, together with a special factor called \u0026ldquo;$k_1$\u0026rdquo;, which is discussed below. The basic intuition is that the more often a query term appears in a document, the higher the score becomes. A document that mentions our name multiple times is more likely to be relevant than one that mentions it only once.\n$k_1$ In BM25, $k_1$ is the core parameter controlling term frequency saturation. 
It sets an asymptotic upper bound for the contribution of $f(q_i, D)$ to the relevance score, making the marginal gain decrease nonlinearly as term frequency increases. Compared with the almost linear weight growth in traditional TF-IDF, this mechanism effectively suppresses excessive ranking influence from high-frequency terms, such as keyword stuffing. The value of $k_1$ directly determines how quickly the score approaches saturation: a smaller $k_1$ makes the term frequency contribution saturate quickly, while a larger $k_1$ allows term frequency to maintain meaningful weight gains over a wider range.\nIf $k_1$ is set to 0, the term-frequency factor becomes $\\frac{f(q_i, D)}{f(q_i, D)} = 1$ for every matching term, so the score no longer depends on how often the term appears and reduces to the sum of the IDF values. If $k_1$ is set to a very large value, such as 10000, and the length term is 1, the factor approximately degenerates into $\\frac{TF \\times k_1}{k_1} = TF$, growing almost linearly with term frequency.\nPicking $b$ and $k_1$ Regarding the values of $b$ and $k_1$, the Elasticsearch article also points out that the current defaults are empirical values that work for most cases, but there is no globally optimal $b$ and $k_1$. They must be evaluated together with the corpus and queries.\nAlso, when retrieval performance is not good enough, the following should be optimized before tuning $b$ and $k_1$:\nBoost exact phrase matches. Use synonyms to expand expressions that users may care about. Use analysis components such as fuzziness, typeahead, phonetic matching, and stemming to handle spelling mistakes, language differences, and word-form variations. Use function score to adjust document scores based on publish time, geographical distance, or business features. 
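To make the roles of $k_1$ and $b$ concrete, the BM25 formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Lucene or Elasticsearch actually runs: the function names bm25_idf, bm25_tf, and bm25_score are hypothetical, the defaults k1=1.2 and b=0.75 are assumed from the range quoted earlier, and documents are passed in as already-tokenized lists of terms.

```python
import math

def bm25_idf(doc_count, doc_freq):
    # IDF as written above: ln(1 + (docCount - f(q_i) + 0.5) / (f(q_i) + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf(freq, field_len, avg_field_len, k1=1.2, b=0.75):
    # Term-frequency factor with length normalization.
    if freq == 0:
        return 0.0
    length_term = 1 - b + b * field_len / avg_field_len
    return freq * (k1 + 1) / (freq + k1 * length_term)

def bm25_score(query_terms, doc_terms, doc_count, doc_freqs, avg_field_len,
               k1=1.2, b=0.75):
    # Sum the per-term contributions IDF(q_i) * TF factor.
    score = 0.0
    for term in query_terms:
        freq = doc_terms.count(term)
        idf = bm25_idf(doc_count, doc_freqs.get(term, 0))
        score += idf * bm25_tf(freq, len(doc_terms), avg_field_len, k1, b)
    return score

# The 4-document example, treated as a single shard (an assumption).
doc_freqs = {'shane': 4, 'connelly': 2}
avg_len = (1 + 2 + 2 + 3) / 4
print(bm25_idf(4, 4), bm25_idf(4, 2))
print(bm25_score(['shane', 'connelly'], ['shane', 'connelly'], 4, doc_freqs, avg_len))
```

With these helpers, bm25_idf(4, 4) and bm25_idf(4, 2) reproduce the two IDF values computed earlier for shane and connelly. Passing k1=0 makes bm25_tf return 1 for any matching term, and passing b=0 removes the effect of field length, which matches the limiting cases discussed above.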
As for the Explain API in the later part of the Elasticsearch article, I will not expand on it here.\n","date":"2026-06-27T00:00:00+08:00","permalink":"/en/p/practical-bm25/","title":"Practical BM25"},{"content":"References Naver SPLADE official repository naver/splade_v2_max model page Sentence Transformers Sparse Encoder documentation Sentence Transformers Sparse Encoder training overview Sentence Transformers Sparse Encoder inference efficiency Sparse Vectors and the SPLADE Model In RAG systems, dense vectors have become the most common retrieval method. They map text into a continuous vector space and are good at capturing semantically similar expressions, such as \u0026ldquo;employee resignation process\u0026rdquo; and \u0026ldquo;personnel exit procedure\u0026rdquo;. However, dense vectors also have clear weaknesses: they are not always good at exact matching for entities, IDs, terminology, error codes, product models, table field names, and code snippets.\nThis is where sparse vectors become valuable. They are more like a neural-network-enhanced inverted index: text is still represented as sparse weights over term dimensions, but these weights are not calculated by pure statistical methods like BM25. Instead, they are predicted by a model.\nIn short:\nDense vectors handle semantic similarity. BM25 handles exact lexical matching. SPLADE sparse vectors handle weighted matching after neural term expansion. Hybrid Search merges dense and sparse retrieval results. Paper model SPLADE maps a piece of text into vocabulary space based on the logits of a Masked Language Model. Suppose the vocabulary contains 30522 WordPiece tokens. Each text can eventually be represented as:\ntoken_id -\u0026gt; weight\nThis is a sparse vector. 
Most token weights are 0, and only a small number of tokens that the model considers important have non-zero weights.\nThe biggest difference from ordinary embeddings is that each dimension in a dense embedding is usually not interpretable, while each dimension in a sparse vector is a vocabulary token. A token activated by the model can be understood as \u0026ldquo;this text is related to this term\u0026rdquo;.\nFor example, a document may not explicitly contain the word \u0026ldquo;reimbursement\u0026rdquo;, but it may contain \u0026ldquo;travel expense\u0026rdquo;, \u0026ldquo;invoice\u0026rdquo;, and \u0026ldquo;approval form\u0026rdquo;. SPLADE may activate tokens related to \u0026ldquo;reimbursement\u0026rdquo;. Then, when the query is \u0026ldquo;reimbursement process\u0026rdquo;, the document may still be retrieved even if it does not exactly match the original term.\nMore specifically, SPLADE uses the logits from the Masked Language Model layer to predict the importance of each term in the BERT WordPiece vocabulary. Suppose the tokenized input text is:\n$$ t=(t_{1},t_{2},...,t_{N}) $$and the corresponding contextual representations are:\n$$ (h_{1},h_{2},...,h_{N}) $$For the $i$-th token in the input, the model calculates its importance for the $j$-th token in the vocabulary:\n$$ w_{ij}=transform(h_{i})^{T}E_{j}+b_{j}, \\quad j\\in\\{1,...,|V|\\} $$Here, $E_j$ is the BERT input embedding of vocabulary token $j$, and $b_j$ is the token-level bias. $transform(\\cdot)$ is usually a linear transformation with GeLU and LayerNorm. Intuitively, this step asks: for this position in the input, how related is it to each term in the vocabulary?\nHowever, retrieval does not need \u0026ldquo;the score of a term at one position\u0026rdquo;. It needs \u0026ldquo;the score of a term for the whole text\u0026rdquo;. 
Therefore, SPLADE aggregates activations from different positions into a sparse representation for the whole text:\n$$ w_{j}=\\sum_{i\\in t}\\log(1+ReLU(w_{ij})) $$The formula has three parts:\nReLU sets negative scores to zero and keeps only positively related terms. $\\log(1+x)$ performs logarithmic saturation, damping large activations so that frequent or repeated words do not dominate the representation. $\\sum$ accumulates activations from different positions for the same vocabulary token, producing the term weight for the whole text. Finally, the text becomes a high-dimensional but sparse vector:\ntoken_id -\u0026gt; weight\nAfter both the query and document are mapped into the same vocabulary space, the retrieval score is the dot product of sparse vectors:\n$$ s(q,d)=\\sum_j w_j^q w_j^d $$This is also why SPLADE can be connected to inverted indexes or sparse vector indexes.\nRanking loss During training, SPLADE needs to make relevant documents score higher and irrelevant documents score lower. Given a query $q_i$, a positive document $d_i^+$, a hard negative document $d_i^-$, and a group of in-batch negative documents $\\{d_{i,j}^{-}\\}$, a contrastive ranking loss similar to the following can be used:\n$$ \\mathcal{L}_{rank-IBN} = -\\log \\frac{e^{s(q_i,d_i^+)}} {e^{s(q_i,d_i^+)} + e^{s(q_i,d_i^-)} + \\sum_{j} e^{s(q_i,d_{i,j}^{-})}} $$Its goal is direct: make the probability of the positive document as large as possible within the candidate set. From an engineering perspective, the model keeps learning which term expansions help it rank the correct document higher.\nFLOPS sparsity regularization If only ranking quality is optimized, the model may activate too many tokens. This may improve recall, but the inverted index becomes larger, and queries need to access more posting lists.\nTherefore, SPLADE introduces FLOPS regularization to control sparsity. 
For a batch of documents, first estimate the average activation of vocabulary token $j$ in this batch:\n$$ \\overline{a}_{j}=\\frac{1}{N}\\sum_{i=1}^{N}w_{j}^{(d_i)} $$Then square and sum the average activations:\n$$ l_{FLOPS}=\\sum_{j\\in V}\\overline{a}_{j}^{2} =\\sum_{j\\in V}\\left(\\frac{1}{N}\\sum_{i=1}^{N}w_{j}^{(d_i)}\\right)^{2} $$This regularization term is not simply controlling \u0026ldquo;vector dimensionality\u0026rdquo;. It controls the number and distribution of non-zero tokens. It tries to prevent the model from binding many documents to a few high-frequency words, and also prevents every document from activating too many terms.\nTherefore, the sparsity weight can be understood as a knob between recall quality and retrieval cost:\nLarger weight: shorter sparse vectors, smaller index, faster retrieval, but possibly lower recall. Smaller weight: longer sparse vectors and richer expansion, but higher index and retrieval cost. Overall loss Finally, SPLADE trains ranking loss and sparsity regularization together:\n$$ \\mathcal{L}=\\mathcal{L}_{rank-IBN} +\\lambda_q\\mathcal{L}_{reg}^{q} +\\lambda_d\\mathcal{L}_{reg}^{d} $$Here, $\\lambda_q$ controls query-side sparsity, and $\\lambda_d$ controls document-side sparsity. Query-side sparsity is usually very important because queries are more sensitive to latency. Document-side vectors can be computed offline, so slightly higher compute cost is often acceptable, but index size still needs to be controlled.\nFrom sum pooling to max pooling The original SPLADE aggregates term predictions from every input position:\n$$ w_{j}=\\sum_{i\\in t}\\log(1+ReLU(w_{ij})) $$The more common later SPLADE-max uses max pooling:\n$$ w_{j}=\\max_{i\\in t}\\log(1+ReLU(w_{ij})) $$This does not mean the whole text only keeps one token. Instead, it takes the maximum activation separately for each vocabulary dimension. 
This can reduce amplification from long text or repeated words, making the representation focus more on whether a semantic term is strongly activated, rather than simply depending on occurrence count.\nSPLADE-doc and distillation training Standard SPLADE encodes both query and document. In other words, both query-side and document-side representations may produce neural expansion terms. Retrieval calculates:\n$$ s(q,d)=\\sum_j w_j^q w_j^d $$SPLADE-doc is more focused on engineering efficiency. It only applies SPLADE encoding on the document side, while the query side usually uses only the original query tokens. The document score can be written as:\n$$ s(q,d)=\\sum_{j\\in q}w_j^d $$This means document-side expansion can be precomputed offline, and the query side does not need to run a SPLADE encoder, reducing latency. The tradeoff is that the query side has no neural expansion ability and can only use \u0026ldquo;document-side expansion\u0026rdquo;.\nIn addition, many strong SPLADE models use knowledge distillation and hard negatives. A common approach is to first train a first-stage retriever and a cross-encoder reranker, then continue training with harder negatives and reranker scores. In engineering practice, we do not have to reproduce this whole training pipeline to use public models. But understanding it helps explain why words like distil, ensemble, and cocondenser appear in model names.\nWhy sparsity matters If the model activates many tokens, recall may improve, but the index becomes larger and retrieval becomes slower. SPLADE uses FLOPS regularization to control the number and distribution of non-zero tokens.\nFrom an engineering perspective, sparse vectors are not better just because they are longer.\nToo few non-zero tokens: the index is small and retrieval is fast, but recall may be insufficient. Too many non-zero tokens: recall may be better, but the index expands and retrieval cost increases. 
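The sum/max pooling formulas, the dot-product score, and the FLOPS regularizer described above can be checked by hand with a small pure-Python sketch. This is only an illustration under stated assumptions: the function names (splade_weights, flops_loss, sparse_dot) and the toy logits are hypothetical and not taken from the SPLADE code base, and a real model would compute the same quantities on tensors.

```python
import math

# Hypothetical helper: aggregate per-position vocabulary logits w_ij
# into one term weight w_j per vocabulary entry.
def splade_weights(logits, pooling='max'):
    vocab_size = len(logits[0])
    weights = []
    for j in range(vocab_size):
        # ReLU removes negative scores, log(1 + x) saturates large ones.
        acts = [math.log1p(max(row[j], 0.0)) for row in logits]
        weights.append(sum(acts) if pooling == 'sum' else max(acts))
    return weights

# Hypothetical helper: FLOPS regularizer, the sum over the vocabulary
# of the squared batch-average activation.
def flops_loss(batch_weights):
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    total = 0.0
    for j in range(vocab_size):
        mean_act = sum(w[j] for w in batch_weights) / n
        total += mean_act ** 2
    return total

# Retrieval score: dot product of two vectors over the same vocabulary.
def sparse_dot(q, d):
    return sum(wq * wd for wq, wd in zip(q, d))

# Toy input: 2 token positions, vocabulary of 3 entries.
toy_logits = [[1.0, -2.0, 0.0],
              [3.0, -1.0, 0.0]]
w_max = splade_weights(toy_logits, pooling='max')
w_sum = splade_weights(toy_logits, pooling='sum')
nonzero = sum(1 for w in w_max if w > 0)  # number of active terms
```

In this toy input only the first vocabulary entry stays active, and sum pooling gives it a larger weight than max pooling because both positions contribute. That is the repetition effect that max pooling is meant to dampen.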
In practice, secondary pruning is often applied, such as:\nKeeping only the top_k tokens. Filtering tokens whose weight is below a threshold. Limiting the maximum number of sparse dimensions for a single chunk. These parameters often affect online cost more than the model itself.\nModel selection SPLADE is more like a family of sparse neural retrieval methods than a single model. The official Naver repository also notes that different regularization strengths produce models ranging from \u0026ldquo;very sparse\u0026rdquo; to \u0026ldquo;strong query/doc expansion\u0026rdquo;. Their effectiveness, index size, and latency all differ.\nIf the goal is only to quickly validate engineering feasibility, naver/splade-cocondenser-ensembledistil is a good starting point. It is a common strong model in the official SPLADE++ series. The Naver repository reports its MS MARCO dev MRR@10 as 38.3, higher than splade_v2_max at 34.0 and splade_v2_distil at 36.8. It is suitable for first checking whether sparse retrieval can fill the keyword, entity, and terminology recall gaps of dense retrieval.\nIf inference cost matters more, consider naver/splade_v2_max or the efficient SPLADE series. splade_v2_max is structurally simple. Its Hugging Face model page marks it as DistilBERT base, with a 512-token maximum length, 30522-dimensional output, and dot-product similarity. The efficient SPLADE series further separates document encoder and query encoder, aiming to reduce query-side latency.\nA practical selection order is:\nFirst choose a strong public model for offline evaluation, such as naver/splade-cocondenser-ensembledistil. If offline evaluation is effective, then measure average non-zero token count, index size, document-side encoding throughput, and query-side P95 latency. If query-side latency is too high, first try query caching, ONNX/OpenVINO, quantization, or efficient SPLADE. 
If the index is too large, first reduce top-k, increase the minimum weight threshold, or choose a model with stronger regularization and higher sparsity. If business data differs greatly from public English retrieval datasets, consider fine-tuning with domain data instead of directly trusting public leaderboards. Do not choose a model only by MRR. SPLADE model selection should consider at least five things at the same time: retrieval quality, average non-zero dimensions, index size, query latency, and deployment complexity.\nSentence Transformers now provides SparseEncoder, which can directly load SPLADE models:\n1 2 3 4 from sentence_transformers import SparseEncoder model = SparseEncoder(\u0026#34;naver/splade-cocondenser-ensembledistil\u0026#34;) embeddings = model.encode([\u0026#34;example query\u0026#34;]) It also provides encode_query(), encode_document(), sparsity statistics, Qdrant/Elasticsearch/OpenSearch integration, and deployment capabilities related to ONNX/OpenVINO/quantization. For engineering prototypes, this route can be used first, and then the implementation can be moved to a custom inference service depending on performance bottlenecks.\nDifferences between SPLADE and BM25 BM25 and SPLADE can both use inverted indexes for retrieval, but their weights come from different sources.\nBM25 weights come from statistics, such as TF, IDF, and document length normalization. It mainly depends on exact matching between query terms and document terms.\nSPLADE weights come from neural model predictions. It can not only preserve tokens that appear in the original text, but may also activate semantically related tokens that do not appear in the original text.\nSo it can be roughly understood as:\n1 2 BM25 = statistical matching of original terms SPLADE = weighted matching of neural expansion terms In enterprise knowledge bases, technical documentation, customer-service FAQs, code documentation, policies, and regulations, both BM25 and SPLADE are valuable. 
BM25 is lighter, while SPLADE is stronger but more expensive.\n","date":"2026-06-27T00:00:00+08:00","permalink":"/en/p/sparse-vectors-and-the-splade-model/","title":"Sparse Vectors and the SPLADE Model"},{"content":"Recently, when using v2rayA on my Synology NAS, I encountered an issue where incorrect time synchronization caused it to stop working properly, which in turn prevented Emby from scraping metadata. Although I don\u0026rsquo;t know the exact reason, it\u0026rsquo;s an old problem. The version was very old, and I was using a Docker image from a third-party application store; currently, Docker can\u0026rsquo;t pull images via direct connection either. Here is my experience with solving this:\nReinstall v2rayA Configure the Docker version of Emby. Reference Websites:\nLinux Backup Installation Method - v2rayA\nXTLS/Xray-core: Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens. An open platform for various uses.\nImplementing Transparent Proxy on Synology - v2rayA\nsjtuross/syno-iptables: Some missing iptables modules for Synology\nSetting up Proxy for Container Manager (Docker) on Synology DSM 7.2 - CSDN Blog\nv2rayA According to the official v2rayA documentation, there is no specially adapted version for Synology, so we use the generic binary file for installation. Since we cannot pull files from GitHub over a direct connection, we download it on our computer and transfer it to the NAS. 
In addition, the installation steps below are actually slightly different from the official documentation.\nCopy v2rayA and xray\n1 2 3 4 cp v2raya_linux_x64 /usr/local/bin/v2raya cp xray /usr/local/bin/ chmod +x /usr/local/bin/v2raya chmod +x /usr/local/bin/xray Copy the GFWList proxy rule files included with xray: geoip.dat and geosite.dat\n1 2 3 cd /usr/local/share mkdir xray cp .../g* xray Modify the service configuration file\n1 sudo vi /etc/systemd/system/v2raya.service The content was generated by ChatGPT combined with the official documentation:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 [Unit] Description=v2rayA Service After=network.target [Service] Type=simple User=root Environment=\u0026#34;V2RAYA_CONFIG=/usr/local/etc/v2raya\u0026#34; Environment=\u0026#34;V2RAYA_LOG_FILE=/tmp/v2raya.log\u0026#34; Environment=\u0026#34;XRAY_LOCATION_ASSET=/usr/local/share/xray\u0026#34; ExecStart=/usr/local/bin/v2raya --passcheckroot Restart=on-failure LimitNOFILE=1000000 [Install] WantedBy=multi-user.target Activate v2rayA. According to the official documentation, run it as a service\n1 2 sudo systemctl start v2raya sudo systemctl status v2raya Configure it to start up automatically on boot. Theoretically, the configuration is complete after this step. 
Along the way, I also solved the issue of the transparent proxy not working on Synology.\nsudo systemctl enable v2raya\nsudo systemctl is-enabled v2raya\nActivate Transparent Proxy\narch | kernel | iptables version | system model | platform version\napollolake | 4.4.180+ | v1.8.3 | DS918+ | 7.0.1-42218\napollolake | 4.4.59+ | v1.6.0 | DS918+ | 6.2.3-25426\nbroadwell | 3.10.105 | v1.6.0 | DS3617xs | 6.2.3-25426\nbromolow | 3.10.105 | v1.6.0 | DS3615xs | 6.2.3-25426\ngeminilake | 4.4.180+ | v1.8.3 | DS920+ | 7.1-42661\ngeminilake | 4.4.302+ | v1.8.3 | DS220+ | 7.2-64570\nBecause of how the Synology system works, first determine the architecture from your model, then pick the matching files from the modules downloaded from the GitHub repository (for example, my DS224 is geminilake).\nUpload the corresponding ko modules to /lib/modules/ and the corresponding so modules to /usr/lib/iptables/.\nRun sudo -i and then run the following insmod commands to attempt loading the ko kernel modules. 
Because the modules depend on each other, they must be loaded in a specific order.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 insmod /lib/modules/nfnetlink.ko insmod /lib/modules/ip_set.ko insmod /lib/modules/ip_set_hash_ip.ko insmod /lib/modules/xt_set.ko insmod /lib/modules/ip_set_hash_net.ko insmod /lib/modules/xt_mark.ko insmod /lib/modules/xt_connmark.ko insmod /lib/modules/xt_comment.ko insmod /lib/modules/xt_TPROXY.ko insmod /lib/modules/xt_socket.ko insmod /lib/modules/iptable_mangle.ko insmod /lib/modules/textsearch.ko insmod /lib/modules/ts_bm.ko insmod /lib/modules/xt_string.ko insmod /lib/modules/nf_nat_ipv6.ko insmod /lib/modules/nf_nat_masquerade_ipv6.ko insmod /lib/modules/ip6t_MASQUERADE.ko insmod /lib/modules/ip6table_nat.ko insmod /lib/modules/ip6table_raw.ko insmod /lib/modules/ip6table_mangle.ko However, after a system reboot, the modules need to be reloaded, so generate a script with the above content at /usr/local/bin/load_v2raya_mods.sh.\nThen restart the v2rayA service:\n1 sudo systemctl restart v2raya Emby After configuring v2rayA, even though I used the method of modifying the configuration file, Emby\u0026rsquo;s traffic still couldn\u0026rsquo;t successfully pass through v2rayA. After various fruitless attempts to modify the IP configuration, it naturally occurred to me that this was caused by Docker running as a host service.\nAfter modifying the configuration on v2rayA multiple times, I finally found that enabling the transparent proxy solved the problem, and it also fixed the issue of being unable to connect to the Docker repository. But a bigger problem arose: enabling the transparent proxy caused all external network access to fail (fortunately, local network connections still worked). 
I think \u0026ldquo;transparent proxy\u0026rdquo; can be understood as a \u0026ldquo;global proxy\u0026rdquo;.\nFor the issue of setting up a proxy specifically for Docker (Container Manager), please refer to the links in the preface.\nSelect Emby in \u0026ldquo;Images\u0026rdquo; and run it. Add the path to the media library in \u0026ldquo;Volume Settings\u0026rdquo;, add three environment variables in \u0026ldquo;Environment\u0026rdquo;, and select host mode in \u0026ldquo;Network\u0026rdquo;:\n1 2 3 HTTP_PROXY = http://127.0.0.1:20171 HTTPS_PROXY = http://127.0.0.1:20171 NO_PROXY = localhost,127.0.0.1 Complete the basic setup, install the previous anime-related plugins, and the scraping was successful.\n","date":"2026-02-16T11:12:15+08:00","permalink":"/en/p/synology-nas-proxy-and-emby-configuration/","title":"Synology NAS Proxy and Emby Configuration"},{"content":"Since I\u0026rsquo;ve been frequently downloading various models with Python recently, I finally decided to solve the long-standing issue of WSL2 not being able to use the Windows host\u0026rsquo;s proxy. After wasting half an afternoon, I finally got it working. With the help of my capable assistant Gemini, the main steps are as follows:\n.wslconfig settings in Windows Core settings of v2rayN Firewall settings in Windows Cleaning up old settings in WSL2 ~/.bashrc settings in WSL2 curl -v testing in WSL2 References:\nAdvanced settings configuration in WSL | Microsoft Learn\nWSL2 使用 V2RayN 局域网 proxychains 代理方案 · Issue #2653 · 2dust/v2rayN\n记一次用wsl2中共享宿主机的代理-v2rayN - 沉迷于学习，无法自拔^_^\n.wslconfig Access your personal account folder in Windows. Press Win + R, enter %UserProfile%, and hit Enter.\nCheck if there is a .wslconfig file. 
If not, create a new text file and name it .wslconfig.\nPaste the following content into .wslconfig:\n1 2 3 4 5 6 7 [wsl2] # Enable mirrored networking mode networkingMode=mirrored # Allow WSL2 to access Windows localhost localhostForwarding=true # Automatically synchronize proxy settings (optional: true/false) autoProxy=true Note that the autoProxy parameter determines how WSL2 handles the proxy. Setting it to true means you don\u0026rsquo;t need to configure ~/.bashrc anymore, but the problems are:\nWhen we use the env | grep -i proxy command, we will see many strange network-related variables, even though the proxy does successfully work. The proxy cannot be toggled on or off within WSL2, causing a lot of traffic to go through the proxy unnecessarily. Later, we will set it to false so that we can easily control the proxy switch inside WSL2, ensuring a clear and transparent WSL2 system.\nIn the Windows terminal, enter wsl --shutdown to shut down WSL2.\nv2rayN In the basic settings of v2rayN (version V7.15.7 at the time of writing), enable \u0026ldquo;Allow connections from the LAN\u0026rdquo; and \u0026ldquo;Open a new port for the LAN\u0026rdquo; (optional). At the bottom left of the v2rayN client\u0026rsquo;s main interface, you can see the port open for the internet, which is 10810 in my case. The system proxy here is \u0026ldquo;Set system proxy automatically\u0026rdquo;, and the routing mode is \u0026ldquo;Bypass (Whitelist)\u0026rdquo;. Then select a node and keep v2rayN running. FireWall Type \u0026ldquo;firewall\u0026rdquo; in the Windows search box and select Windows Defender Firewall. Click \u0026ldquo;Allow an app or feature through Windows Defender Firewall\u0026rdquo;. Find or add v2rayN.exe and its core programs (such as v2ray.exe or xray.exe), ensuring that both Private and Public are checked. The usual paths are under v2rayN\\ and v2rayN\\bin\\xray, respectively. 
In the advanced settings of the firewall, create a new inbound rule to allow TCP traffic on port 10810 (corresponding to v2rayN). wsl2 Cleaning Up Old Settings 1 2 3 4 5 6 7 8 # 1. Clear old, messy proxy environment variables unset http_proxy unset https_proxy unset no_proxy # 2. Re-set correct proxy (pointing to localhost port in mirrored mode) export http_proxy=\u0026#34;http://127.0.0.1:10810\u0026#34; export https_proxy=\u0026#34;http://127.0.0.1:10810\u0026#34; Ensure that you have enabled mirrored networking mode in the first step and successfully restarted WSL2. Of course, this step is not strictly necessary either, because ~/.bashrc will automatically handle these old variables later.\n~/.bashrc 1 2 # Same applies to other editors sudo vim ~/.bashrc After multiple experiments with Gemini, here is the content to append to the bottom of ~/.bashrc:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 function proxy_on() { # Thoroughly clean up any residual variables (prevent conflicts from mixed case) unset http_proxy https_proxy ALL_PROXY NO_PROXY HTTP_PROXY HTTPS_PROXY all_proxy no_proxy # Set the port you verified works (since 10810 was tested successfully, use 10810) export hostip=\u0026#34;127.0.0.1\u0026#34; export port=\u0026#34;10810\u0026#34; export http_proxy=\u0026#34;http://$hostip:$port\u0026#34; export https_proxy=\u0026#34;http://$hostip:$port\u0026#34; export all_proxy=\u0026#34;socks5://$hostip:$port\u0026#34; # no_proxy here only keeps localhost export no_proxy=\u0026#34;localhost,127.0.0.1\u0026#34; echo \u0026#34;WSL Proxy: ON (127.0.0.1:10810)\u0026#34; } function proxy_off() { unset http_proxy https_proxy ALL_PROXY NO_PROXY HTTP_PROXY HTTPS_PROXY all_proxy no_proxy echo \u0026#34;WSL Proxy: OFF\u0026#34; } After saving the above content, enter:\n1 source ~/.bashrc And then:\n1 curl -I https://www.google.com The following output should appear, indicating a successful connection:\nI also tested it with ping, which didn\u0026rsquo;t 
work but doesn\u0026rsquo;t affect its usability.\nHTTP/1.1 200 Connection established\nI tried downloading the model again, and it worked:\nimport gensim.downloader as api\nwv_from_bin = api.load(\u0026#34;glove-wiki-gigaword-200\u0026#34;)\nSolution If It Still Fails The most effective method is, after completing the above steps, to enter the following in WSL2:\ncurl -v https://www.google.com\nThen feed the output to an AI, and it will tell you what to do.\n","date":"2026-01-30T16:31:20+08:00","permalink":"/en/p/how-to-use-host-proxy-on-wsl2/","title":"How to Use Host Proxy on WSL2"},{"content":"This post covers basic PyTorch usage, primarily based on Bilibili Xiaotudui\u0026rsquo;s https://www.bilibili.com/video/BV1hE411t7RN/ and with the help of Gemini. In addition to practical applications, it also includes the principles of some deep learning models.\nDataset An abstract class representing a Dataset.\nAll datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__, which is expected to return the size of the dataset by many ~torch.utils.data.Sampler implementations and the default options of ~torch.utils.data.DataLoader. Subclasses could also optionally implement __getitems__, for speedup batched samples loading. This method accepts list of indices of samples of batch and returns list of samples.\nNote\n~torch.utils.data.DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.\n
Custom Dataset In actual usage, suppose there is a dataset (let\u0026rsquo;s use the previously used fer2013 as an example):\nFirst, define a data class inheriting from Dataset, and define __init__()\nimport os\nfrom PIL import Image\nfrom torch.utils.data import Dataset\n\nclass MyData(Dataset):\n    def __init__(self, root_dir, label_dir):\n        self.root_dir = root_dir  # E.g., fer2013/test\n        self.label_dir = label_dir  # E.g., angry\n        # os.path.join builds the path portably across operating systems\n        self.path = os.path.join(self.root_dir, self.label_dir)\n        # E.g., the list of all image file names under the angry folder\n        self.img_path = os.listdir(self.path)\nOverride __getitem__()\n    def __getitem__(self, index):\n        img_name = self.img_path[index]\n        # Build the full path of one specific image\n        img_item_path = os.path.join(self.root_dir, self.label_dir, img_name)\n        img = Image.open(img_item_path)\n        label = self.label_dir\n        return img, label  # Return one sample (image) and its label\nOverride __len__()\n    def __len__(self):\n        return len(self.img_path)\nDefinition Example\nroot_dir = \u0026#34;fer2013/train\u0026#34;\nangry_label_dir = \u0026#34;angry\u0026#34;\nangry_dataset = MyData(root_dir, angry_label_dir)\nMerging Datasets\ndisgust_label_dir = \u0026#34;disgust\u0026#34;\ndisgust_dataset = MyData(root_dir, disgust_label_dir)\ntrain_dataset = angry_dataset + 
disgust_dataset  # \u0026#39;+\u0026#39; concatenates the two datasets into a ConcatDataset\nTensorBoard SummaryWriter\nCreate a log directory\nInitialize the Writer object\nGenerate event files: this file is the real database (events.out\u0026hellip;\u0026hellip;). When writer.add_scalar is subsequently called, the data is not drawn directly on the screen, but appended to this file. Once you finish executing the corresponding code and run tensorboard --logdir=logs, TensorBoard\u0026rsquo;s backend server reads the contents of this file and renders it into charts in the browser.\nfrom torch.utils.tensorboard import SummaryWriter\n\nwriter = SummaryWriter(\u0026#34;logs\u0026#34;)  # Creates a folder named \u0026#34;logs\u0026#34; in the current working directory\nNote that you can use a different folder to record data for each experiment. For instance, writer = SummaryWriter(\u0026quot;logs/lr0.01_batch32\u0026quot;), and after modifying the learning rate, writer = SummaryWriter(\u0026quot;logs/lr0.001_batch32\u0026quot;).\nadd_scalar() Drawing function add_scalar()\n(method) def add_scalar(\n    tag: Any,\n    scalar_value: Any,  # Corresponds to the y-axis of the chart\n    global_step: Any | None = None,  # Corresponds to the x-axis of the chart\n    walltime: Any | None = None,\n    new_style: bool = False,\n    double_precision: bool = False\n) -\u0026gt; None\nExample\nfor i in range(100):\n    writer.add_scalar(\u0026#34;y=2x\u0026#34;, 2*i, i)\nwriter.close()\nadd_image() Image viewing function add_image()\n(method) def add_image(\n    tag: Any,\n    img_tensor: Any,\n    global_step: Any | None = None,\n    walltime: Any | None = None,\n    dataformats: str = \u0026#34;CHW\u0026#34;\n) -\u0026gt; None\nPay attention to the type requirements for the parameters: img_tensor (torch.Tensor, numpy.ndarray, or string/blobname): Image data, so images of PIL type need to be converted, for instance using numpy. 
Also mind the data shape requirements: Tensor with :math:(1, H, W), :math:(H, W), :math:(H, W, 3) is also suitable as long as corresponding dataformats argument is passed, e.g. CHW, HWC, HW. Meaning three image data formats where the order of number of channels, height, and width differ.\n1 2 3 4 5 img = Image.open(image_path) img_array = np.array(img) print(f\u0026#34;Image shape: {img_array.shape}\u0026#34;) writer.add_image(\u0026#34;test\u0026#34;,img_array,1,dataformats=\u0026#39;HW\u0026#39;) # Obvious from the .shape writer.close() add_graph() Common Transforms The following are mostly image processing methods.\nToTensor Convert a PIL Image or ndarray to tensor and scale the values accordingly.\nThis transform does not support torchscript.\nConverts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8\nIn the other cases, tensors are returned without scaling.\nnote\nBecause the input image is scaled to [0.0, 1.0], this transformation should not be used when transforming target image masks. 
See the references for implementing the transforms for image masks.\nToTensor converts a PIL image to a torch.Tensor, for example one of shape torch.Size([1, 48, 48]):\nfrom torchvision import transforms\n\ntrans_totensor = transforms.ToTensor()\nimg_tensor = trans_totensor(img)\nprint(img_tensor.shape)\nwriter.add_image(\u0026#34;ToTensor\u0026#34;, img_tensor)\nwriter.close()\nNormalize Normalize a tensor image with mean and standard deviation.\nThis transform does not support PIL Image.\nGiven mean: (mean[1],...,mean[n]) and std: (std[1],..,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]\nNote:\nThis transform acts out of place, i.e., it does not mutate the input tensor.\nArgs:\nmean (sequence): Sequence of means for each channel.\nstd (sequence): Sequence of standard deviations for each channel.\ninplace (bool, optional): Bool to make this operation in-place.\nThe mathematical principle of Normalize is:\n$$ output = \\frac{input-mean}{std} $$\nThe parameter mean affects the \u0026ldquo;center position\u0026rdquo;. After ToTensor, pixels range over $[0, 1]$, centered around $0.5$. If mean = 0.5, subtracting $0.5$ shifts the data center to $0$, so the original range of $[0, 1]$ becomes $[-0.5, 0.5]$.\nThe parameter std affects the \u0026ldquo;scaling magnitude\u0026rdquo;. For example, std = 0.5 divides the values by $0.5$, stretching $[-0.5, 0.5]$ to $[-1, 1]$.\nResize Resize the input image to the given size.\nIf the image is torch Tensor, it is expected to have [\u0026hellip;, H, W] shape, where \u0026hellip; means a maximum of two leading dimensions\nArgs:\nsize (sequence or int): Desired output size. If size is a sequence like (h, w), output size will be matched to this. 
If size is an int,\n​ smaller edge of the image will be matched to this number.\n​ i.e, if height \u0026gt; width, then image will be rescaled to\n​ (size * height / width, size).\nresize() supports both PIL and Tensor image formats. If it\u0026rsquo;s a Tensor, the expected shape is [..., H, W]. The ... here indicates it can handle [C, H, W] (single image) or [B, C, H, W] (a Batch of images). The parameter size needs to be written as a sequence, like resize((512, 512)). If only one parameter is input, like resize(512), the image\u0026rsquo;s shorter edge becomes 512, while the longer edge scales proportionally. 1 2 3 4 print(img.size) trans_resize = transforms.Resize((224,224)) img_resize = trans_resize(img) print(img_resize) Compose Composes several transforms together. This transform does not support torchscript.\n​ Please, see the note below.\nArgs:\n​ transforms (list of Transform objects): list of transforms to compose.\nThe Compose() operation is a pipeline class for various transforms operations. In deep learning, images usually go through a fixed series of steps (e.g., resizing -\u0026gt; converting to Tensor -\u0026gt; normalizing). If Compose isn\u0026rsquo;t used, you have to manually call multiple functions for every image, making the code highly redundant. The argument type for Compose() is a list, and the operations fire sequentially, so the data type output by the previous operation must be acceptable as input for the next. 
from torchvision import transforms\n\n# Define preprocessing for the training set\ntrain_transform = transforms.Compose([\n    transforms.Resize((224, 224)),  # VGG16 standard input is 224x224\n    transforms.RandomHorizontalFlip(),  # Data augmentation: random horizontal flip\n    transforms.ToTensor(),  # Scale to [0.0, 1.0]\n    transforms.Normalize([0.5], [0.5])  # Standardize to [-1.0, 1.0]\n])\nimg_tensor = train_transform(img)\nPyTorch Dataset Usage For instance, to import a dataset for computer vision learning, we can download it directly from within the program.\nimport torchvision\n\ntrain_set = torchvision.datasets.CIFAR10(root=\u0026#34;./dataset...\u0026#34;, train=True, download=True)\ntest_set = torchvision.datasets.CIFAR10(root=\u0026#34;./dataset...\u0026#34;, train=False, download=True)\nThe root parameter indicates where the dataset is stored, train specifies whether to load the training split (True) or the test split (False), and download indicates whether to download the dataset into root when it is not already there (the run shows a download link). Specific parameter configurations may differ for each dataset\u0026hellip; If the dataset has already been downloaded locally, it can be copied into the project\u0026rsquo;s dataset directory, saving download time when the program runs.\nprint(test_set.classes)  # You can see all the categories in the test dataset\nimg, target = test_set[0]\nprint(img)\nprint(target)\nprint(test_set.classes[target])  # Output the category corresponding to the first element in the test set\nDataLoader Data loader combines a dataset and a sampler, and provides an iterable over the given dataset.\nThe :class:~torch.utils.data.DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.\nWhen training a model, a massive volume of data from the dataset cannot be crammed into memory all at once. 
DataLoader achieves:\nBatching: Packages images into groups (batches).\nShuffling: Shuffles data randomly at the start of every training epoch, so the model cannot simply memorize the order of the data.\nParallel Computing: Uses multiple CPU cores to prepare the upcoming batches in advance, so the GPU does not sit idle waiting for data.\nParameter | Common Values | Description\ndataset | Custom Dataset | Required. Tells DataLoader which \u0026ldquo;warehouse\u0026rdquo; to fetch data from.\nbatch_size | 16, 32, 64\u0026hellip; | Number of samples loaded per batch. The larger it is, the faster the training, but it consumes more VRAM. In FER emotion recognition, 32 or 64 are common values.\nshuffle | True / False | Whether to shuffle the order. Training sets are usually set to True (adding randomness); test sets are usually set to False.\nnum_workers | 0, 2, 4, 8\u0026hellip; | Multi-process loading. 0 means only the main process is used (slow). Increasing the value speeds up read times. Recommendation: set to half of your CPU cores.\ndrop_last | True / False | Drop the last incomplete batch. E.g., with 100 images and batch_size=32, the remaining 4 images are not enough for a full batch. Setting this to True discards these 4, so every batch has the same size.\npin_memory | True | Page-locked memory. If training on a GPU, setting this to True speeds up data transfer from RAM to VRAM.\nFor example, we use DataLoader to process data from CIFAR10.\nimport torchvision\nfrom torch.utils.data import DataLoader\ntest_data = torchvision.datasets.CIFAR10(\u0026#34;./dataset\u0026#34;, train=False, transform=torchvision.transforms.ToTensor(), download=True)\ntest_loader = DataLoader(dataset=test_data, batch_size=4, shuffle=True, num_workers=0, drop_last=False)\nCombining loops with TensorBoard, we can log the images used in every step of each epoch. 
In the code below, step + epoch * len(test_loader) is used as a global step, so the batches of both epochs land on one continuous timeline. An alternative is to skip this and log each epoch as its own group of steps; both setups behave similarly.\nwriter = SummaryWriter(\u0026#34;dataloader\u0026#34;)\nfor epoch in range(2):\n    step = 0\n    for data in test_loader:\n        imgs, targets = data\n        writer.add_images(\u0026#34;test_data_batch\u0026#34;, imgs, step + epoch * len(test_loader))\n        step = step + 1\nwriter.close()\nnn.Module Base class for all neural network modules.\nYour models should also subclass this class.\nModules can also contain other Modules, allowing to nest them in a tree structure.\nIn PyTorch, whether it\u0026rsquo;s a simple linear layer or a complex VGG16 or Transformer, they are all essentially an nn.Module. It is the base class for all neural network modules.\nnn.Module supports nesting. When you call model.to(\u0026quot;cuda\u0026quot;) on a large model, PyTorch automatically traverses this \u0026ldquo;tree\u0026rdquo; and moves all its sub-layers to the GPU.\nAs long as you assign a layer to self.xxx in __init__, PyTorch will automatically register its weights and bias and include them in the parameters to be optimized.\nWhen writing a subclass of nn.Module, you must override __init__() and forward().\n__init__(self)\nDefine network layers here (convolution, pooling, fully connected, etc.).\nYou must call super().__init__() first. This line initializes the parent class\u0026rsquo;s internal state; without it, assigning a layer to self raises an error because PyTorch cannot register the layer.\nforward(self, x)\nDefines the flow of data. Specifies which layers an image passes through sequentially.\nYou do not need to manually call forward. Simply run model(input), and PyTorch will automatically trigger forward. 
import torch from torch import nn class myModule(nn.Module): def __init__(self): super().__init__() def forward(self, input): output = input + 1 # Simply add 1 to the input and output it return output Convolution Conv $$ \\text{out}(N_i, C_{\\text{out}_j}) = \\text{bias}(C_{\\text{out}_j}) + \\sum_{k = 0}^{C_{\\text{in}} - 1} \\text{weight}(C_{\\text{out}_j}, k) \\star \\text{input}(N_i, k) $$ Convolution animation page: conv_arithmetic/README.md at master · vdumoulin/conv_arithmetic Parameter Meaning Function in_channels Input Channels Usually 3 (RGB) for color images, and 1 for grayscale images. out_channels Output Channels Number of convolution kernels. The number of kernels determines the number of channels in the output feature map. kernel_size Kernel Size The size of the \u0026ldquo;window\u0026rdquo; used to extract features. Commonly 3 or 5 (VGG uses 3). stride Stride The span at which the window slides. Defaults to 1. The larger the stride, the smaller the output image. padding Padding Pads with 0s around the image. 'same' keeps the size unchanged (only supported with stride 1), while 'valid' applies no padding. dilation Dilated Convolution The spacing between points in the convolution kernel. Used to increase the receptive field (without increasing the number of parameters). bias Bias Whether to add a constant offset to the result. Enabled by default. The padding parameter of convolution is very important. Without zero padding, every convolution with a kernel larger than 1 shrinks the feature map, so the image becomes smaller and smaller as layers are stacked. 
Shape calculation formula: $$ H_{out} = \\left\\lfloor\\frac{H_{in} + 2 \\times \\text{padding}[0] - \\text{dilation}[0] \\times (\\text{kernel_size}[0] - 1) - 1}{\\text{stride}[0]} + 1\\right\\rfloor $$$W_{out}$ calculations follow a similar approach.\n1 2 3 4 5 6 7 8 9 10 11 12 13 dataset = torchvision.datasets.CIFAR10(\u0026#34;./dataset\u0026#34;,train = False, transform=torchvision.transforms.ToTensor(),download=True) dataloader = DataLoader(dataset, batch_size=64) class myModule(nn.Module): def __init__(self): super().__init__() self.conv1 = Conv2d(in_channels=3,out_channels=6,kernel_size=3,stride=1,padding=0) def forward(self,x): x = self.conv1(x) return x mymodule = myModule() # Model instantiation 1 2 3 4 5 6 7 8 9 10 step = 0 for data in dataloader: imgs, targets = data output = mymodule(imgs) print(imgs.shape) # torch.Size([64, 3, 32, 32]) print(output.shape) # torch.Size([64, 6, 30, 30]) channel == 6 After convolution, the number of channels changes, so the image cannot be directly outputted step = step + 1 (Max) Pooling MaxPool The logic of max pooling is extremely simple: within a window range (Kernel), only the largest value is kept, and the rest are discarded. It preserves input features while simultaneously reducing data volume, speeding up training. Parameter Unique Features kernel_size Window size. Typically 2 (meaning merging a $2 \\times 2$ region). stride Default value equals kernel_size! This differs from convolution. If kernel_size=2, the stride defaults to 2, so the windows do not overlap. ceil_mode Very important. The default is False (floor). If set to True (ceiling), when the window exceeds boundaries, as long as there is data in the window, the result will be retained instead of discarded. padding Padding. Note that pooling pads with negative infinity ($-\\infty$), ensuring padded spots are not selected as the maximum value. 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 input = torch.tensor([[1,2,0,3,1], [0,1,2,3,1], [1,2,1,0,0], [5,2,3,1,1], [2,1,0,1,1]], dtype=float) input = torch.reshape(input,(-1,1,5,5)) print(input.shape) class myMoudle(nn.Module): def __init__(self, *args, **kwargs): super(myMoudle, self).__init__() self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True) \u0026#39;\u0026#39;\u0026#39; ceil_mode = False means it only takes the pooling result when the pooling kernel encounters the maximum expected size (e.g., 3x3), otherwise it discards it \u0026#39;\u0026#39;\u0026#39; def forward(self, input): output = self.maxpool1(input) return output mymoudle = myMoudle() output = mymoudle(input) print(output) 1 2 3 torch.Size([1, 1, 5, 5]) tensor([[[[2., 3.], [5., 1.]]]], dtype=torch.float64) Loss Functions and Backpropagation The loss function is used to calculate the gap between the actual output and the target, providing a basis for backpropagation and parameter updates. In classification tasks, the cross-entropy function is commonly used to calculate the error. 1 loss_func = nn.CrossEntropyLoss() Optimizer torch.optim — PyTorch 2.10 documentation\nExample code: 1 2 3 4 5 6 for input, target in dataset: optimizer.zero_grad() # Clear gradients output = model(input) loss = loss_fn(output, target) # Call loss function loss.backward() # Backpropagation optimizer.step() PyTorch in Practice: CIFAR10 A practical example of a simple classification model targeting the CIFAR10 image dataset. First, let\u0026rsquo;s understand Sequential. nn.Sequential is a special subclass of nn.Module whose purpose is to automatically execute the forward logic. Note: each parameter inside is a class for a certain layer, meaning they must be comma-separated. Sequential simplifies both the model definition and the forward() operation. 
class CIFAR10_Simple(nn.Module): def __init__(self, *args, **kwargs): super(CIFAR10_Simple, self).__init__(*args, **kwargs) self.conv1 = Conv2d(in_channels=3, out_channels=32, kernel_size=5, padding=2) \u0026#39;\u0026#39;\u0026#39; The value of the padding parameter can be derived from visualizing the image: for a 5x5 convolution kernel, when centered at the image\u0026#39;s (0,0), the kernel extends outwards by 2 units. This is a simplified estimation method; in reality, you should substitute values into the dimension formula for calculation (refer to the \u0026#34;Convolution Conv\u0026#34; section). \u0026#39;\u0026#39;\u0026#39; self.model_s = Sequential( Conv2d(in_channels=3, out_channels=32, kernel_size=5, padding=2), nn.ReLU(), MaxPool2d(kernel_size=2), Conv2d(in_channels=32, out_channels=32, kernel_size=5, padding=2), nn.ReLU(), MaxPool2d(2), Conv2d(32,64,5,padding=2), nn.ReLU(), MaxPool2d(2), Flatten(), Linear(1024, 64), nn.ReLU(), Linear(64, 10) ) def forward(self, x): \u0026#39;\u0026#39;\u0026#39; x = self.conv1(x) x = self.maxpool1(x) x = self.conv2(x) x = self.maxpool2(x) x = self.conv3(x) x = self.maxpool3(x) x = self.flatten(x) x = self.linear1(x) x = self.linear2(x) \u0026#39;\u0026#39;\u0026#39; x = self.model_s(x) return x Before training the model, set up DataLoaders to handle the dataset.\nPrepare train_data and test_data separately. 
dataset_transform = tf.Compose([tf.ToTensor(),tf.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))]) train_data = torchvision.datasets.CIFAR10(\u0026#34;./dataset\u0026#34;, transform=dataset_transform, download=True) test_data = torchvision.datasets.CIFAR10(\u0026#34;./dataset\u0026#34;, train=False, transform=dataset_transform, download=True) # -------- train_loader = DataLoader(dataset=train_data, batch_size=64, shuffle=True, drop_last=True) test_loader = DataLoader(dataset=test_data, batch_size=64, shuffle=False, drop_last=False) Note that the test loader uses shuffle=False and drop_last=False, so every test sample is evaluated exactly once and dividing by len(test_data) gives the true accuracy. After building the model, run a quick sanity check to confirm that the output shape matches the expected (64, 10). cifar = CIFAR10_Simple() print(cifar) input = torch.ones((64, 3, 32, 32)) # Test with identical dataset picture sizes output = cifar(input) print(output.shape) Basic settings before training Define TensorBoard writer Set up the device so the graphics card is used for accelerated training when available Instantiate the training model Define the loss function Set up the optimizer (it must receive the parameters of the model that is actually trained) writer = SummaryWriter(\u0026#34;./logs\u0026#34;) writer.add_graph(cifar, input) device = torch.device(\u0026#34;cuda\u0026#34; if torch.cuda.is_available() else \u0026#34;cpu\u0026#34;) model = CIFAR10_Simple().to(device) loss_func = nn.CrossEntropyLoss() # Cross-entropy loss function optim = torch.optim.SGD(model.parameters(), lr=0.01) # Learning rate Training section Boilerplate code for the optimizer Record training loss total_step = 0 for epoch in range(10): # --- Training section --- model.train() for data in train_loader: imgs, targets = data outputs = model(imgs.to(device)) loss = loss_func(outputs, targets.to(device)) optim.zero_grad() loss.backward() optim.step() # Record training loss writer.add_scalar(\u0026#34;Train_Loss\u0026#34;, loss.item(), total_step) total_step += 1 Evaluation section Execute once per epoch with 
torch.no_grad(): Turn off gradient recording Compile performance metrics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 model.eval() total_test_loss = 0 total_accuracy = 0 with torch.no_grad(): # No need to calculate gradients during testing, saves performance for data in test_loader: imgs, targets = data imgs, targets = imgs.to(device), targets.to(device) outputs = model(imgs) # Calculate total loss loss = loss_func(outputs, targets) total_test_loss += loss.item() # Calculate accuracy: argmax(1) finds the category index with the highest probability accuracy = (outputs.argmax(1) == targets).sum() total_accuracy += accuracy Visualization 1 2 3 4 5 6 7 # Output to TensorBoard writer.add_scalar(\u0026#34;Test_Loss\u0026#34;, total_test_loss / len(test_loader), epoch) writer.add_scalar(\u0026#34;Test_Accuracy\u0026#34;, total_accuracy / len(test_data), epoch) print(f\u0026#34;Epoch {epoch+1} finished, Accuracy: {total_accuracy / len(test_data)}\u0026#34;) writer.close() ","date":"2026-01-22T10:49:28+08:00","permalink":"/en/p/pytorch-basics/","title":"PyTorch Basics"},{"content":"Data Mining Review Content comes from PPTs and the key information highlighted at the end of the class.\nReview Question PPT What relationships does association rule mining in data mining primarily aim to discover between data items?\nAssociation rule mining is mainly used to discover frequent co-occurrences or hidden associative relationships between data items.\nIn cluster analysis, what does the K value in the K-means algorithm represent?\nIn the K-means clustering algorithm, the K value represents the number of clusters the user expects the dataset to be partitioned into.\nWhat algorithms exist for decision trees? 
What criteria do they mainly base their feature selection for partitioning on, and analyze the shortcomings of each criterion.\nAlgorithm Criterion Shortcoming ID3 Information Gain Strongly prefers multi-valued features, prone to overfitting C4.5 Information Gain Ratio Computation is more complex, might prefer features with fewer values CART Gini Index Has a slight preference for multi-valued features, tends towards unbalanced splits Given a simple text classification training set used to determine whether an email is \u0026ldquo;Spam\u0026rdquo;. The dictionary contains the following 5 words: [\u0026quot;deal\u0026quot;, \u0026quot;money\u0026quot;, \u0026quot;urgent\u0026quot;, \u0026quot;meeting\u0026quot;, \u0026quot;free\u0026quot;]\nTraining data:\nSpam (Spam, S): \u0026ldquo;deal free money\u0026rdquo; \u0026ldquo;urgent free deal\u0026rdquo; \u0026ldquo;money urgent free\u0026rdquo; Non-spam (Ham, H): \u0026ldquo;meeting deal\u0026rdquo; \u0026ldquo;urgent meeting\u0026rdquo; The contents of the new email to be classified are: \u0026ldquo;free urgent meeting\u0026rdquo;\nTask requirements: Use the multinomial Naive Bayes model (applying Laplace smoothing, where the smoothing parameter $\\lambda=1$) to classify the email. 
Please complete the following calculations step-by-step:\nCalculate the prior probabilities $P(Spam)$ and $P(Ham)$.\nTotal number of documents: $N_{\\text{doc}} = 3 + 2 = 5$\n$P(S) = \\frac{3}{5} = 0.6$\n$P(H) = \\frac{2}{5} = 0.4$\nPrior probability results:\n$P(S) = 0.6, \\quad P(H) = 0.4$\nCalculate the class-conditional probability of each word under the Spam and Ham categories.\nDictionary size $|V| = 5$\nSpam:\ndeal: 2, money: 2, urgent: 2, meeting: 0, free: 3\nTotal number of words: $2+2+2+0+3 = 9$ Denominator: $9 + 5 = 14$ $P(\\text{deal}|S) = \\frac{2+1}{14} = \\frac{3}{14}$\n$P(\\text{money}|S) = \\frac{2+1}{14} = \\frac{3}{14}$\n$P(\\text{urgent}|S) = \\frac{2+1}{14} = \\frac{3}{14}$\n$P(\\text{meeting}|S) = \\frac{0+1}{14} = \\frac{1}{14}$\n$P(\\text{free}|S) = \\frac{3+1}{14} = \\frac{4}{14}$\nHam:\ndeal: 1, money: 0, urgent: 1, meeting: 2, free: 0\nTotal number of words: $1+0+1+2+0 = 4$\nDenominator: $4 + 5 = 9$\n$P(\\text{deal}|H) = \\frac{1+1}{9} = \\frac{2}{9}$\n$P(\\text{money}|H) = \\frac{0+1}{9} = \\frac{1}{9}$\n$P(\\text{urgent}|H) = \\frac{1+1}{9} = \\frac{2}{9}$\n$P(\\text{meeting}|H) = \\frac{2+1}{9} = \\frac{3}{9}$\n$P(\\text{free}|H) = \\frac{0+1}{9} = \\frac{1}{9}$\nExpress the new email \u0026ldquo;free urgent meeting\u0026rdquo; as a term frequency vector.\nTerm frequency vector for \u0026ldquo;free urgent meeting\u0026rdquo; (in dictionary order: deal, money, urgent, meeting, free):\n$[0, 0, 1, 1, 1]$\nCalculate the posterior probabilities of the email belonging to Spam and Ham.\nSpam:\n$P(S|d) \\propto P(S) \\times P(\\text{urgent}|S) \\times P(\\text{meeting}|S) \\times P(\\text{free}|S)$\n$= 0.6 \\times \\frac{3}{14} \\times \\frac{1}{14} \\times \\frac{4}{14}$\n$= 0.6 \\times \\frac{12}{2744} \\approx 0.002624$\nHam:\n$P(H|d) \\propto P(H) \\times P(\\text{urgent}|H) \\times P(\\text{meeting}|H) \\times P(\\text{free}|H)$\n$= 0.4 \\times \\frac{2}{9} \\times \\frac{3}{9} \\times \\frac{1}{9}$\n$= 0.4 \\times \\frac{6}{729} \\approx 
0.003292$\nNormalization:\n$P(S|d) = \\frac{0.002624}{0.002624 + 0.003292} \\approx 0.444$\n$P(H|d) = \\frac{0.003292}{0.002624 + 0.003292} \\approx 0.556$\nDetermine its final category based on the posterior probabilities.\nSince $P(H|d) \u003e P(S|d)$, the new email is classified as:\n$\\boxed{Ham}$\n2.1.4 Types of Attributes Nominal, Ordinal, Interval, Ratio\nAttribute Type Description Examples Operations Categorical (Qualitative) Nominal The values of a nominal attribute are just different names, meaning nominal values only provide enough information to distinguish objects (=, ≠) Zip code, employee ID number, gender Mode, entropy, contingency correlation, chi-square test Ordinal The values of an ordinal attribute provide enough information to determine the order of objects (\u0026lt;, \u0026gt;) Ore hardness {good, average}, grade levels, street numbers Median, percentiles, rank correlation, run test, sign test Numeric (Quantitative) Interval For interval attributes, the differences between values are meaningful, meaning a unit of measurement exists (+, -) Calendar dates, Celsius or Fahrenheit temperatures Mean, standard deviation, Pearson correlation coefficient, t and F tests Ratio For ratio attributes, both differences and ratios are meaningful (+, -, *, /) Absolute temperature, monetary amounts, counts, age, mass, length, electrical current Geometric mean, harmonic mean, percent variation 2.2.1 Measures of Central Tendency Mean, median, and mode\n$Mean-mode = 3 * (mean - median)$\n2.2.2 Interquartile Range The first quartile, i.e., the data at the 25th percentile is $(n-1)/4$ (This is only for the first quartile; if it is the third quartile, it needs to be multiplied by 3).\nIf n is even, we need $+0.25\\times (d_{n+1}-d_n)$ (This is only for the first quartile; if it is the third quartile, it\u0026rsquo;s 0.75).\n2.4.4 Proximity Measures for Numeric Attributes The Minkowski distance between two $p$-dimensional variables $x_1 = \\{x_{11}, x_{12}, \\ldots, 
x_{1p}\\}$ and $x_2 = \\{x_{21}, x_{22}, \\ldots, x_{2p}\\}$ is defined as follows, where $i$ and $j$ index the two objects (here $i=1$, $j=2$):\n$$ d(i,j) = \\sqrt[q]{|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \\cdots + |x_{ip} - x_{jp}|^q} $$ When $q=1$, it represents the Manhattan distance:\n$$ d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \\cdots + |x_{ip} - x_{jp}| $$ When $q=2$, it represents the Euclidean distance:\n$$ d(i,j) = \\sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \\cdots + |x_{ip} - x_{jp}|^2} $$ When $q \\to \\infty$, it represents the Chebyshev distance:\n$$ d(i,j) = \\lim_{q \\to \\infty} \\left( \\sum_{k=1}^p |x_{ik} - x_{jk}|^q \\right)^{\\frac{1}{q}} = \\max_{1 \\le k \\le p} |x_{ik} - x_{jk}| $$ Euclidean and Manhattan distances satisfy the following mathematical properties\nPositive definiteness: the distance is a non-negative number, d(i,j)\u0026gt;0 if i≠j, and d(i,i)=0\nSymmetry: d(i,j)=d(j,i)\nTriangle inequality: d(i,j)≤d(i,k)+d(k,j)\n2.4.8 Cosine Similarity Cosine similarity can be used to compare the similarity of documents\n$$ s(x, y) = \\frac{x^T y}{\\|x\\|_2 \\|y\\|_2} \\quad x = [1, 1, 0, 0] \\quad y = [0, 1, 1, 0] $$$$ s(x, y) = \\frac{0 + 1 + 0 + 0}{\\sqrt{2} \\sqrt{2}} = 0.5 $$3.2 Data Preprocessing Data cleaning\nFill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies Data integration\nIntegrate multiple databases, data cubes, or files Data reduction\nDimensionality reduction, data compression, Numerosity Reduction Data transformation and data discretization\nNormalization Concept hierarchy generation 3.2.1 How to Handle Missing Data Ignore the tuple When the class label is missing, in supervised learning When the proportion of missing values for a specific attribute is large Manually fill in the missing value, which is tedious and labor-intensive Automatically fill in Use a global constant Use the attribute mean to fill in the missing value Global attribute mean Mean of the attribute for data objects belonging to the same class The most probable value: inference based on 
Bayesian formula or decision tree, regression, nearest neighbor strategy 3.2.4 Correlation Analysis 3.2.7 Data Reduction Strategies Why perform data reduction?\nBecause a data warehouse can store terabytes of data, complex data analysis on a complete dataset may take a very long time.\nGenerally, data reduction is required during data preprocessing.\nData reduction replaces the original data with a smaller dataset.\nCommon data reduction strategies\nDimensionality reduction\nNumerosity reduction\nData compression\n3.2.8 Dimensionality Reduction PCA (Principal Component Analysis) method Non-negative Matrix Factorization (NMF) Linear Discriminant Analysis (LDA) Feature Selection\nSelect a representative subset of features from the original feature set\nSingle-feature importance evaluation\nModel-based feature importance evaluation\n3.2.11 Normalization Min-Max Normalization\nNormalizes an attribute $A$ from interval $[\\min_A, \\max_A]$ to $[new_{\\min_A}, new_{\\max_A}]$ $$ v' = \\frac{v - \\min_A}{\\max_A - \\min_A} (new_{\\max_A} - new_{\\min_A}) + new_{\\min_A} $$ Example: Normalize income from the interval $[12000, 98000]$ to $[0,1]$; what is the normalized value of $73600$? Answer: $\\frac{73600 - 12000}{98000 - 12000} (1 - 0) + 0 \\approx 0.716$. Z-score Normalization $$ v' = \\frac{v - \\mu_A}{\\sigma_A} $$ Example: The mean of attribute $A$ is $\\mu_A = 54000$, the standard deviation is $\\sigma_A = 16000$; what is the normalized value of $73600$? Answer: $\\frac{73600 - 54000}{16000} = 1.225$. Decimal Scaling Normalization\nMove the position of the decimal point; the number of places to move depends on the maximum absolute value of attribute $A$, defined by the formula $$ v' = \\frac{v}{10^j} $$ $j$ is the smallest integer such that $\\max(|v'|) \u003c 1$ For example: if the minimum value of a dataset is $12000$ and the maximum value is $98000$, then the value of $j$ is $5$ $$ [12000, 98000] \\rightarrow [0.12, 0.98] $$ 4.3.2 Decision Tree Construction Decision Tree: A tree-like structured model learned from training data. 
It is a predictive analysis model expressed in the form of a tree structure (including binary trees and multi-way trees). A decision tree is a supervised learning algorithm and belongs to discriminative models. A decision tree is also known as a classification capability tree and is an important classification and regression method in data mining technology. There are two types of decision trees: classification trees and regression trees. Decision tree learning typically consists of 3 steps: feature selection, decision tree generation, and decision tree pruning. Commonly used methods: ID3, C4.5, CART 4.5 KNN Algorithm The k-Nearest Neighbor (kNN) method is a relatively mature and also the simplest machine learning algorithm, which can be used for basic classification and regression methods. The main idea of the algorithm: If a sample is most similar (i.e., its nearest neighbors in the feature space) to $k$ instances in the feature space, then whichever category the majority of these $k$ instances belong to, the sample will also belong to that category. Three fundamental elements of the $k$-nearest neighbor algorithm: the choice of $k$ value, distance metric, and classification decision rule. Differences Between KNN and K-Means K-NN is a classification algorithm in supervised learning where Categories are known. It trains and learns from classified data to find the features of these different classes, and then classifies unclassified data.\nK-Means is a clustering algorithm in unsupervised learning. It is unknown beforehand how the data will be classified. Through cluster analysis, data is grouped into several clusters. Clustering doesn\u0026rsquo;t require training and learning over data.\nSupervised Learning and Unsupervised Learning Supervised learning is a machine learning method which utilizes labeled data for training. 
Every input data has a corresponding output label, and the model predicts by learning the relationship between these inputs and outputs.\nCharacteristics:\nRequires extensive labeled data.\nThe training process has a definite objective, and the model can continuously adjust through feedback.\nCommon algorithms include linear regression, logistic regression, support vector machines (SVM), decision trees, etc.\nApplication Scenarios:\nSuitable for classification and regression problems, such as image recognition, speech recognition, and financial forecasting. Unsupervised learning is a machine learning method in which training is performed using unlabeled data. The model automatically discovers patterns and structures from the input data without relying on any labels.\nCharacteristics: Does not require labeled data, making it suitable for processing large amounts of unlabeled data. The training process lacks an explicit goal; the model learns through the intrinsic structure of the data. Common algorithms include clustering (e.g., K-Means), association rule learning (e.g., Apriori algorithm), etc. Application Scenarios: Suitable for data clustering, market segmentation, anomaly detection, etc. 
5.3 Density-Based Clustering Methods DBSCAN Algorithm Description\nInput: A database containing $n$ objects, radius $\\varepsilon$ (Eps), and minimum number of points MinPts\nOutput: All generated clusters that meet density requirements\nRepeat\nExtract an unprocessed point from the data If the extracted point is a core point Then find all objects density-reachable from this point to form a cluster Else the extracted point is a non-core object (a border point or noise); exit the current iteration and seek the next point EndIf Until all points are processed\nCore object: If the $\\varepsilon$-neighborhood of an object contains at least a minimum number of objects, MinPts, the object is called a core object.\nA border point\u0026rsquo;s \u0026ldquo;Eps\u0026rdquo; ($\\varepsilon$) neighborhood contains fewer than MinPts objects, but a core object exists within its neighborhood. A point that is neither a core point nor a border point is noise.\n6.1 Confusion Matrix Confusion Matrix\nActual Class \\ Predicted Class Class=Yes Class=No Class=Yes a (TP) b (FN) Class=No c (FP) d (TN) $a+d$ represents the number of correctly classified samples among all samples $b+c$ represents the number of incorrectly classified samples among all samples $a+b+c+d$ represents the total number of samples Accuracy $$ Accuracy = \\frac{a+d}{a+b+c+d} = \\frac{TP+TN}{TP+TN+FP+FN} $$ Recall\n$$ recall = \\frac{TP}{TP+FN} $$ Represents the proportion of actual positive samples that are correctly predicted, that is, how many of the positive samples are identified. Precision\n$$ precision = \\frac{TP}{TP+FP} $$ Represents the proportion of truly positive samples among those predicted as positive, that is, how many of the positive predictions are actually correct. 
6.5 Overfitting and Underfitting Causes of Overfitting:\nNoise: The training set contains a massive volume of noisy data.\nLack of representative samples: The size of the training set is comparatively small, resulting in overly complex training models.\n7.1 Advantages of Ensemble Learning Can effectively reduce prediction error\nSuppose an ensemble classifier consists of 3 individual classifiers, where the error rate of each classifier is 40%. Let C denote a correct prediction, I denote an incorrect prediction, and Probability denote the probability of the final prediction result. The total number of combinations is $2^3=8$.\nThe model\u0026rsquo;s error rate is: 0.096+0.096+0.096+0.064=35.2% \u0026lt; 40%\nLet the number of models be $m$, and the error rate of each model be $r$.\nThe general formula for calculating error is: $$ p(error) = \\sum_{i=(m+1)/2}^{m} C_{m}^{i} r^{i}(1-r)^{m-i} $$ When over half of the $m$ models misclassify -\u0026gt; the final result is wrong, $i$ ranges from $(m+1)/2$ to $m$. Randomly selecting $i$ out of $m$, the remaining $m-i$ models classify correctly. The figure below depicts the relations between error rates and model scales when $r=0.4$. ","date":"2025-11-23T21:42:00+08:00","permalink":"/en/p/data-mining/","title":"Data Mining"},{"content":"Reference Websites Maven Fundamentals - Java Tutorial - Liao Xuefeng\u0026rsquo;s Official Website\nMaven Fundamentals Introduction to Maven Before understanding Maven, let\u0026rsquo;s look at what a Java project needs. First, we need to determine which dependency packages to introduce. For example, if we need to use commons logging, we must put the commons logging jar package into the classpath. If we also need log4j, we need to put all log4j-related jar packages into the classpath. This is dependency management.\nSecondly, we must determine the directory structure of the project. 
For example, the src directory stores Java source code, the resources directory stores configuration files, and the bin directory stores the compiled .class files.\nFurthermore, we also need to configure the environment, such as the JDK version, the compilation and packaging process, and the version number of the current code.\nFinally, in addition to using IDEs like Eclipse for compilation, we must also be able to compile via command-line tools so that the project can be compiled, tested, and deployed on an independent server.\nThese tasks are not difficult, but they are very tedious and time-consuming. If every project had its own set of configurations, it would certainly be a mess. What we need is a standardized Java project management and build tool.\nMaven is a management and build tool specifically created for Java projects. Its main features include:\nProviding a standardized project structure; Providing a standardized build process (compilation, testing, packaging, publishing\u0026hellip;); Providing a dependency management mechanism. Maven Project Structure A typical Java project managed using Maven has the following default directory structure:\n1 2 3 4 5 6 7 8 9 10 a-maven-project ├── pom.xml ├── src │ ├── main │ │ ├── java │ │ └── resources │ └── test │ ├── java │ └── resources └── target The root directory of the project a-maven-project is the project name. It has a project description file pom.xml. The directory storing Java source code is src/main/java, the directory storing resource files is src/main/resources, the directory storing test source code is src/test/java, and the directory storing test resources is src/test/resources. Finally, all files generated by compilation and packaging are placed in the target directory. These constitute the standard directory structure of a Maven project.\nAll directory structures are agreed-upon standard structures, and we must never modify the directory structure arbitrarily. 
Using the standard structure requires no configuration, and Maven can be utilized normally.\nLet\u0026rsquo;s look at the most critical project description file, pom.xml. Its content looks like the following:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 \u0026lt;project ...\u0026gt; \u0026lt;modelVersion\u0026gt;4.0.0\u0026lt;/modelVersion\u0026gt; \u0026lt;groupId\u0026gt;com.itranswarp.learnjava\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hello\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.0\u0026lt;/version\u0026gt; \u0026lt;packaging\u0026gt;jar\u0026lt;/packaging\u0026gt; \u0026lt;properties\u0026gt; \u0026lt;project.build.sourceEncoding\u0026gt;UTF-8\u0026lt;/project.build.sourceEncoding\u0026gt; \u0026lt;maven.compiler.release\u0026gt;17\u0026lt;/maven.compiler.release\u0026gt; \u0026lt;/properties\u0026gt; \u0026lt;dependencies\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.slf4j\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;slf4j-simple\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.0.16\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;/dependencies\u0026gt; \u0026lt;/project\u0026gt; Here, groupId is similar to Java\u0026rsquo;s package name, typically the name of a company or organization. artifactId is similar to a Java class name, typically the project name. Together with version, a Maven project is uniquely identified by groupId, artifactId, and version.\nWhen we reference other third-party libraries, it is also determined through these 3 variables. 
For example, depending on org.slf4j:slf4j-simple:2.0.16:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.slf4j\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;slf4j-simple\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.0.16\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; After declaring a dependency using \u0026lt;dependency\u0026gt;, Maven will automatically download this dependency package and put it into the classpath.\nAdditionally, note that \u0026lt;properties\u0026gt; defines some properties. Commonly used properties are:\nproject.build.sourceEncoding: Indicates the character encoding of the project source code, typically set to UTF-8; maven.compiler.release: Indicates the Java release to compile for, for example, 21; maven.compiler.source: Indicates the source code version read by the Java compiler; maven.compiler.target: Indicates the Class version compiled by the Java compiler. Starting from Java 9, it is recommended to use the maven.compiler.release property to ensure that the input source code and the compiled output version are consistent during compilation. If the source code and output versions are different, maven.compiler.source and maven.compiler.target should be set separately.\nBy defining properties via \u0026lt;properties\u0026gt;, the Java version targeted by the build can be fixed, preventing different developers of the same project from compiling for different Java versions.\nSummary Maven is a management and build tool for Java projects:\nMaven uses pom.xml to define project content and utilizes a preset directory structure; Declaring a dependency in Maven automatically downloads it and adds it to the classpath; Maven uses groupId, artifactId, and version to uniquely locate a dependency. 
Dependency Management If our project depends on third-party jar packages, such as commons logging, the question arises: where do we download the published jar package for commons logging?\nIf we also want to depend on log4j, what jar packages are needed to use log4j?\nSimilar dependencies include: JUnit, JavaMail, MySQL driver, etc. A feasible method is to search for the project\u0026rsquo;s official website via a search engine, manually download the zip package, extract it, and put it into the classpath. However, this process is very tedious.\nMaven solves the dependency management problem. For example, our project depends on the jar package abc, and abc depends on the jar package xyz:\n1 2 3 4 5 6 7 8 9 10 11 12 13 ┌──────────────┐ │Sample Project│ └──────────────┘ │ ▼ ┌──────────────┐ │ abc │ └──────────────┘ │ ▼ ┌──────────────┐ │ xyz │ └──────────────┘ When we declare the dependency for abc, Maven automatically adds both abc and xyz to our project dependencies. We don\u0026rsquo;t need to manually investigate whether abc requires xyz.\nTherefore, the first role of Maven is to resolve dependency management. We declare that our project needs abc, Maven will automatically import the jar package of abc, then determine that abc needs xyz, and will automatically import the jar package of xyz. 
Thus, ultimately, our project will depend on both the abc and xyz jar packages.\nLet\u0026rsquo;s look at a complex dependency example:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.springframework.boot\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;spring-boot-starter-web\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.4.2.RELEASE\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; When we declare a spring-boot-starter-web dependency, Maven will automatically parse it and determine that approximately twenty or thirty other dependencies are ultimately required:\nspring-boot-starter-web spring-boot-starter spring-boot spring-boot-autoconfigure spring-boot-starter-logging logback-classic logback-core slf4j-api jcl-over-slf4j slf4j-api jul-to-slf4j slf4j-api log4j-over-slf4j slf4j-api spring-core snakeyaml spring-boot-starter-tomcat tomcat-embed-core tomcat-embed-el tomcat-embed-websocket tomcat-embed-core jackson-databind ... Managing these dependencies by hand would be extremely time-consuming and laborious, and the probability of errors would be very high.\nDependency Relationships Maven defines several dependency scopes, namely compile, test, runtime, and provided:\nscope Description Example compile This jar package is needed during compilation (default) commons-logging test This jar package is needed when compiling and running tests junit runtime Not needed during compilation, but required at runtime mysql provided Needed during compilation, but provided by JDK or a server at runtime servlet-api Among them, the default compile is the most commonly used, and Maven will place dependencies of this type directly into the classpath.\nA test dependency is used only during testing and is not needed during normal execution. 
The most common test dependency is JUnit:\n1 2 3 4 5 6 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.junit.jupiter\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;junit-jupiter-api\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;5.3.2\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;test\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; runtime dependencies indicate they are not needed during compilation but are required at runtime. The most typical runtime dependencies are JDBC drivers, such as the MySQL driver:\n1 2 3 4 5 6 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;mysql\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;mysql-connector-java\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;5.1.48\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;runtime\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; provided dependencies indicate they are needed during compilation but not at runtime. The most typical provided dependency is the Servlet API, which is needed during compilation; however, at runtime, the Servlet server has built-in related jars, so they are not needed during the execution phase:\n1 2 3 4 5 6 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;jakarta.servlet\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jakarta.servlet-api\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;4.0.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; The last question is, how does Maven know where to download the required dependencies? That is, the related jar packages? The answer is that Maven maintains a central repository (repo1.maven.org), where all third-party libraries upload their own jars and related information. Maven can download the required dependencies from the central repository to the local machine.\nMaven does not download jar packages from the central repository every time. 
Once a jar package has been downloaded, it is automatically cached by Maven in a local directory (the .m2 directory in the user\u0026rsquo;s home directory). Therefore, apart from the first compilation being relatively slow due to the time needed for downloading, subsequent processes will not repeatedly download the same jar packages because of the local cache.\nUnique ID For any given dependency, Maven only needs 3 variables to uniquely identify a jar package:\ngroupId: The name of the organization it belongs to, similar to a Java package name; artifactId: The name of the jar package itself, similar to a Java class name; version: The version of the jar package. Through the above 3 variables, a certain jar package can be uniquely determined. Maven ensures that any jar package cannot be modified once published by performing PGP signing on the jar packages. The only way to modify a published jar package is to publish a new version.\nTherefore, once a jar package has been downloaded by Maven, it can be permanently and safely cached locally.\nNote: Only version numbers ending with -SNAPSHOT are regarded by Maven as development versions. Development versions are repeatedly downloaded every time. Such SNAPSHOT versions can only be used in internal private Maven repos, and publicly published versions are not allowed to be SNAPSHOTs.\nHenceforth, when we represent Maven dependencies, we use the abbreviated form groupId:artifactId:version, for example: org.slf4j:slf4j-api:2.0.4.\nMaven Mirrors Besides downloading from Maven\u0026rsquo;s central repository, you can also download from Maven mirror repositories. If accessing Maven\u0026rsquo;s central repository is very slow, we can choose a faster Maven mirror repository. 
Maven mirror repositories synchronize periodically from the central repository:\n1 2 3 4 5 6 7 8 9 slow ┌───────────────────┐ ┌─────────────▶│Maven Central Repo.│ │ └───────────────────┘ │ │ │ │sync │ ▼ ┌───────┐ fast ┌───────────────────┐ │ User │─────────▶│Maven Mirror Repo. │ └───────┘ └───────────────────┘ Users in the China region can use the Maven mirror repository provided by Alibaba Cloud. Using a Maven mirror repository requires configuration. In the user\u0026rsquo;s home directory, enter the .m2 directory and create a settings.xml configuration file with the following content:\n1 2 3 4 5 6 7 8 9 10 11 \u0026lt;settings\u0026gt; \u0026lt;mirrors\u0026gt; \u0026lt;mirror\u0026gt; \u0026lt;id\u0026gt;aliyun\u0026lt;/id\u0026gt; \u0026lt;name\u0026gt;aliyun\u0026lt;/name\u0026gt; \u0026lt;mirrorOf\u0026gt;central\u0026lt;/mirrorOf\u0026gt; \u0026lt;!-- Alibaba Cloud\u0026#39;s Maven mirror is recommended in China --\u0026gt; \u0026lt;url\u0026gt;https://maven.aliyun.com/repository/central\u0026lt;/url\u0026gt; \u0026lt;/mirror\u0026gt; \u0026lt;/mirrors\u0026gt; \u0026lt;/settings\u0026gt; After configuring the mirror repository, Maven\u0026rsquo;s downloading speed will be very fast.\nSearching for Third-Party Components The final question: if we want to reference a third-party component, such as okhttp, how do we exactly acquire its groupId, artifactId, and version? The method is to search for keywords via search.maven.org. After finding the corresponding component, directly copy it.\nCommand Line Compilation In the command line, navigate to the directory where pom.xml is located, and enter the following command:\n1 $ mvn clean package If everything goes smoothly, you can obtain the automatically packaged jar after compilation in the target directory.\nUsing Maven in an IDE Almost all IDEs have built-in support for Maven. In Eclipse, you can directly create or import a Maven project. 
If the imported Maven project has errors, you can try selecting the project, right-clicking, and choosing Maven - Update Project\u0026hellip; to update it.\nSummary Maven determines the jar packages required by the project by resolving dependency relationships. The 4 commonly used scopes are: compile (default), test, runtime, and provided;\nMaven downloads the required jar packages from the central repository and caches them locally;\nDownloading can be accelerated through mirror repositories.\nBuild Process Maven not only has a standardized project structure, but it also has a standardized build process that automates compiling, packaging, publishing, and more.\nLifecycle and Phase When using Maven, we first need to understand what Maven\u0026rsquo;s lifecycle is.\nMaven\u0026rsquo;s lifecycle consists of a series of phases. Taking the built-in lifecycle default as an example, it includes the following phases:\nvalidate initialize generate-sources process-sources generate-resources process-resources compile process-classes generate-test-sources process-test-sources generate-test-resources process-test-resources test-compile process-test-classes test prepare-package package pre-integration-test integration-test post-integration-test verify install deploy If we run mvn package, Maven will execute the default lifecycle, running every phase in order from the beginning through the package phase:\nvalidate initialize \u0026hellip; prepare-package package If we run mvn compile, Maven will also execute the default lifecycle, but this time it will only run up to compile, namely the following phases:\nvalidate initialize \u0026hellip; process-resources compile Another commonly used lifecycle in Maven is clean, which executes 3 phases:\npre-clean clean (note that this clean is a phase, not a lifecycle) post-clean Therefore, when we use the mvn command, the argument following it is a phase, and Maven automatically runs up to the specified phase according 
to the lifecycle.\nA more complex example is specifying multiple phases. For example, running mvn clean package, Maven first executes the clean lifecycle and runs up to the clean phase, then it executes the default lifecycle and runs up to the package phase. The actually executed phases are as follows:\npre-clean clean (note that this clean is a phase) validate (starts executing the first phase of the default lifecycle) initialize \u0026hellip; prepare-package package During the actual development process, frequently used commands include:\nmvn clean: Cleans up all generated classes and jars;\nmvn clean compile: Cleans first, then executes up to compile;\nmvn clean test: Cleans first, then executes up to test. Because compile must be executed before executing test, there is no need to specify compile here;\nmvn clean package: Cleans first, then executes up to package.\nDuring the execution process of most phases, because we usually do not configure related settings in pom.xml, these phases effectively do nothing.\nThe phases that are frequently used are actually only a few:\nclean: clean up compile: compile test: run tests package: package Goal Executing a phase subsequently triggers one or multiple goals:\nExecuted Phase Corresponding Executed Goal compile compiler:compile test compiler:testCompile surefire:test The naming of a goal always takes the format of abc:xyz.\nActually, if we draw an analogy, it becomes clear:\nlifecycle is equivalent to a Java package; it contains one or multiple phases; phase is equivalent to a Java class; it contains one or multiple goals; goal is equivalent to a class method; it is actually the one doing the real work. In most cases, we simply specify the phase, and it defaults to executing the goals bound by default to these phases. 
Only in a few instances do we directly specify running a goal, for example, starting a Tomcat server:\n$ mvn tomcat:run Summary Maven provides a standard build process through lifecycles, phases, and goals.\nThe most commonly used build commands specify a phase, and Maven executes the lifecycle up to that phase:\nmvn clean mvn clean compile mvn clean test mvn clean package Normally, we simply run the goals that are bound to each phase by default, so it is unnecessary to specify a goal.\n","date":"2025-04-01T14:49:00+08:00","permalink":"/en/p/maven-fundamentals/","title":"Maven Fundamentals"},{"content":"Reference Websites Multithreading - Java Tutorial - Liao Xuefeng\u0026rsquo;s Official Website\nThese notes mostly follow Mr. Liao\u0026rsquo;s blog, a very good tutorial.\nDetailed Illustration of Java Multithreading - Personal Article - SegmentFault\nRelatively less detailed; covers synchronization locks and thread pools. Concise and clear.\nAlso supplemented some knowledge, such as thread status, synchronized locks, the producer-consumer model, etc.\nJava Multithreading Process/Thread The relationship between processes and threads: A process can contain one or multiple threads, but there is always at least one thread.\nThe smallest execution unit scheduled by the operating system is actually a thread, not a process. Commonly used operating systems like Windows and Linux utilize preemptive multitasking. How a thread is scheduled is entirely determined by the OS; the program itself cannot decide when or for how long a thread executes.\nMultitasking can be achieved by multi-processing, multi-threading within a single process, or a mix of multi-processing + multi-threading.\nCompared to multi-threading, the disadvantages of multi-processing are:\nCreating a process incurs more overhead than creating a thread, especially on Windows systems. 
Inter-process communication is slower than inter-thread communication, as inter-thread communication merely involves reading and writing the same variables, which is extremely fast. The advantages of multi-processing are:\nMulti-processing has higher stability than multi-threading. In a multi-process scenario, the crash of one process does not affect others. In a multi-threading scenario, the crash of any single thread directly causes the crash of the entire process. Multithreading The Java language has built-in support for multithreading: a Java program is actually a JVM process. The JVM process uses a main thread to execute the main() method, and within the main() method, we can start multiple threads. Furthermore, the JVM has other worker threads responsible for garbage collection, etc.\nCompared to single-threaded programming, the characteristic of multithreaded programming is: multithreading often requires reading and writing shared data, which requires synchronization.\nFor instance, when playing a movie, one thread must play the video and another the audio. The two threads must coordinate their execution, otherwise, the imagery and audio will be out of sync. 
Therefore, multithreaded programming is highly complex and more difficult to debug.\nCreating Multithreading Creating a New Thread - Java Tutorial - Liao Xuefeng\u0026rsquo;s Official Website\nCreating a new thread is very easy; we need to instantiate a Thread object, and then call its start() method:\npublic class Main { public static void main(String[] args) { Thread t = new Thread(); t.start(); } } To have the new thread execute specified code, there are several methods:\nMethod 1: Derive a custom class from Thread and then override the run() method:\npublic class Main { public static void main(String[] args) { Thread t = new MyThread(); // Instantiate the subclass so that its run() is used t.start(); } } class MyThread extends Thread { @Override public void run() { System.out.println(\u0026#34;start new thread!\u0026#34;); } } When executing the above code, notice that start() causes the new thread to invoke the instance\u0026rsquo;s run() method automatically.\nMethod 2: When creating a Thread instance, pass in a Runnable instance.\npublic class Main { public static void main(String[] args) { Thread t = new Thread(new MyRunnable()); t.start(); // Start new thread } } class MyRunnable implements Runnable { @Override public void run() { System.out.println(\u0026#34;start new thread!\u0026#34;); } } Or further simplify it using the lambda syntax introduced in Java 8:\npublic class Main { public static void main(String[] args) { Thread t = new Thread(() -\u0026gt; { System.out.println(\u0026#34;start new thread!\u0026#34;); }); t.start(); // Start new thread } } However, directly calling the run() method does not start a new thread: the call simply executes run() on the current thread like any other method.\nYou must call the start() method of the Thread instance to start a new thread. If we look at the source code of the Thread class, we see that the start() method internally invokes a private native void start0() method. 
The native modifier indicates that this method is implemented by native (C/C++) code inside the JVM, not by Java code.\nThe difference between using a thread and executing directly in the main() method:\npublic class Main { public static void main(String[] args) { System.out.println(\u0026#34;main start...\u0026#34;); Thread t = new Thread() { public void run() { System.out.println(\u0026#34;thread run...\u0026#34;); System.out.println(\u0026#34;thread end.\u0026#34;); } }; t.start(); System.out.println(\u0026#34;main end...\u0026#34;); } } Execution order in main:\nPrint main start... Create the Thread object Call start() to launch the new thread When the start() method is called, the JVM creates a new thread. The variable t refers to this new thread, which then begins executing.\nPrint main end... However, after thread t starts running, main and t run concurrently. At this point, the program itself cannot determine the scheduling order of the threads.\nTo simulate the effect of concurrent execution, we can call Thread.sleep() in the thread. The parameter unit is milliseconds. sleep() forces the current thread to pause for a while:\npublic class Main { public static void main(String[] args) { System.out.println(\u0026#34;main start...\u0026#34;); Thread t = new Thread() { public void run() { System.out.println(\u0026#34;thread run...\u0026#34;); try { Thread.sleep(10); } catch (InterruptedException e) {} System.out.println(\u0026#34;thread end.\u0026#34;); } }; t.start(); try { Thread.sleep(20); } catch (InterruptedException e) {} System.out.println(\u0026#34;main end...\u0026#34;); } } Thread Priority t.setPriority(int n) // Instance method; the default priority is 5 The JVM automatically maps priorities from 1 (lowest) to 10 (highest) to the actual priorities of the OS (different operating systems have different numbers of priority levels). 
Threads with higher priority are more likely to be scheduled by the OS. The OS might schedule high-priority threads more frequently, but we must never rely on setting priority to guarantee that a high-priority thread executes first. When the CPU is busy, threads with higher priorities acquire more time slices; when the CPU is idle, setting priorities is essentially useless.\nThe yield() method makes the running thread switch to the ready state, re-contending for the CPU\u0026rsquo;s time slice. Whether it gets the time slice when contending depends on the CPU\u0026rsquo;s allocation.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 public static native void yield(); Runnable r1 = () -\u0026gt; { int count = 0; for (;;){ log.info(\u0026#34;---- 1\u0026gt;\u0026#34; + count++); } }; Runnable r2 = () -\u0026gt; { int count = 0; for (;;){ Thread.yield(); log.info(\u0026#34; ---- 2\u0026gt;\u0026#34; + count++); } }; Thread t1 = new Thread(r1,\u0026#34;t1\u0026#34;); Thread t2 = new Thread(r2,\u0026#34;t2\u0026#34;); t1.start(); t2.start(); 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 // Execution results 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129504 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129505 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129506 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129507 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129508 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129509 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129510 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129511 11:49:15.796 [t1] INFO thread.TestYield - ---- 1\u0026gt;129512 11:49:15.798 [t2] INFO thread.TestYield - ---- 2\u0026gt;293 11:49:15.798 [t1] INFO thread.TestYield - ---- 1\u0026gt;129513 11:49:15.798 [t1] INFO thread.TestYield - ---- 1\u0026gt;129514 11:49:15.798 [t1] INFO thread.TestYield - ---- 1\u0026gt;129515 11:49:15.798 [t1] INFO thread.TestYield - ---- 
1\u0026gt;129516 11:49:15.798 [t1] INFO thread.TestYield - ---- 1\u0026gt;129517 11:49:15.798 [t1] INFO thread.TestYield - ---- 1\u0026gt;129518 As the results above show, since thread t2 called yield() on every iteration, thread t1 got noticeably more chances to execute than thread t2.\nSummary Java uses a Thread object to represent a thread and starts a new thread by calling start(). The start() method can only be called once on a given thread object. The execution code of a thread is written in the run() method. Thread scheduling is determined by the OS; the program itself cannot dictate the scheduling sequence. Thread.sleep() can pause the current thread for a duration. Thread Blocking The ways a thread can be placed into a blocking state are as follows:\nBIO blocking, i.e., using blocking IO streams. sleep(long time) forces the thread to sleep, entering the blocked state. a.join(): the calling thread blocks until thread a finishes executing, then resumes. synchronized or ReentrantLock causes a thread to enter the blocking state if it cannot acquire the lock. Calling the wait() method after acquiring a lock also forces the thread into a blocking state. LockSupport.park() places the thread in a blocking state. Thread.sleep() Makes a thread sleep, putting a running thread into a blocked state. 
When the sleep duration ends, the thread re-contends for the CPU\u0026rsquo;s time slice to resume execution.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 // Method declaration, a native method public static native void sleep(long millis) throws InterruptedException; try { // Sleep for 2 seconds // This method throws an InterruptedException, meaning it can be interrupted during sleep, throwing an exception once interrupted Thread.sleep(2000); } catch (InterruptedException e) { } try { // APIs utilizing TimeUnit serve as a replacement for Thread.sleep TimeUnit.SECONDS.sleep(1); } catch (InterruptedException e) { } Thread.join() A thread can also wait for another thread until it concludes its execution. For example, after initiating thread t, the main thread can utilize t.join() to await thread t concluding before continuing to run:\n1 2 3 4 5 6 7 8 9 10 11 public class Main { public static void main(String[] args) throws InterruptedException { Thread t = new Thread(() -\u0026gt; { System.out.println(\u0026#34;hello\u0026#34;); });\t// Java 8 lambda method System.out.println(\u0026#34;start\u0026#34;); t.start(); // Start thread t t.join(); // The main thread waits here for t to finish System.out.println(\u0026#34;end\u0026#34;); } } When the main thread calls join() on thread object t, the main thread will wait until thread t finishes execution, and only then continue running its own subsequent code. Therefore, the print order of the code above is guaranteed: the main thread prints start first, thread t then prints hello, and finally the main thread prints end.\nIf thread t has already finished, calling join() on instance t returns immediately. Additionally, the overloaded method join(long) allows you to specify a maximum wait time, after which the thread stops waiting.\nSummary Common ways to block a thread: BIO blocking, sleep(), join(), failing to acquire a lock (synchronized/ReentrantLock), wait(), LockSupport.park(). 
sleep(): Makes the thread sleep for a specified duration. It can be interrupted during sleep. Using TimeUnit is recommended for better readability. join(): Makes the current thread wait until the target thread finishes execution. Commonly used to control the order of thread execution. Blocking and resumption: Once a thread enters a blocked state, it must wait for a specific condition to be met (e.g., sleep time elapsed, lock released, target thread completed) before it can resume execution. Interrupting a Thread Interrupting a Thread - Java Tutorial - Liao Xuefeng\u0026rsquo;s Official Website\nIf a thread needs to execute a long-running task, it may become necessary to interrupt it. Interrupting a thread means that another thread sends it a signal. Upon receiving this signal, the target thread exits its run() method, allowing it to terminate immediately.\nFor example, when downloading a 100M file from the network, if the connection is slow and the user clicks \u0026lsquo;Cancel\u0026rsquo;, the program must interrupt the downloading thread.\nThread.interrupt Interrupting a thread is very simple; you just need another thread to call the interrupt() method on the target thread. The target thread needs to repeatedly check whether its state is interrupted, and if so, it must terminate its execution immediately.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 public class Main { public static void main(String[] args) throws InterruptedException { Thread t = new MyThread(); t.start(); Thread.sleep(1); // Pause for 1 millisecond t.interrupt(); // Interrupt thread t t.join(); // Wait for thread t to end System.out.println(\u0026#34;end\u0026#34;); } } class MyThread extends Thread { public void run() { int n = 0; while (! isInterrupted()) { n++; System.out.println(n + \u0026#34; hello!\u0026#34;); } } } In the code above, the main thread interrupts thread t by calling t.interrupt(). 
However, note that the interrupt() method only sends an \u0026lsquo;interrupt request\u0026rsquo; to thread t. As for whether thread t can respond immediately, it depends on its code. Since the while loop of thread t detects isInterrupted(), the code above correctly responds to the interrupt() request, allowing the run() method to conclude.\nIf the thread is in a waiting state—for example, t.join() places the main thread in a waiting state—then if interrupt() is called on the main thread, the join() method will immediately throw an InterruptedException. Therefore, as long as the target thread catches the InterruptedException thrown by the join() method, it implies that another thread has called the interrupt() method on it, and typically the thread should immediately exit.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 public class Main { public static void main(String[] args) throws InterruptedException { Thread t = new MyThread(); t.start(); Thread.sleep(1000); t.interrupt(); // Interrupt thread t t.join(); // Wait for thread t to end System.out.println(\u0026#34;end\u0026#34;); } } class MyThread extends Thread { public void run() { Thread hello = new HelloThread(); hello.start(); // Start the hello thread try { hello.join(); // Wait for the hello thread to end } catch (InterruptedException e) { System.out.println(\u0026#34;interrupted!\u0026#34;); } hello.interrupt(); } } class HelloThread extends Thread { public void run() { int n = 0; while (!isInterrupted()) { n++; System.out.println(n + \u0026#34; hello!\u0026#34;); try { Thread.sleep(100); } catch (InterruptedException e) { break; } } } } The main thread notifies thread t to interrupt by calling t.interrupt(). At this point, thread t is waiting inside hello.join(); this method immediately stops waiting and throws an InterruptedException. Inside thread t, the InterruptedException is caught, preparing the thread to terminate. 
Before thread t terminates, it also calls interrupt() on the hello thread to notify it to interrupt. The running Flag Another common method to interrupt a thread is setting a flag. We normally use a running flag to indicate whether the thread should continue executing. By setting HelloThread.running to false from an external thread, we can make the thread terminate:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 public class Main { public static void main(String[] args) throws InterruptedException { HelloThread t = new HelloThread(); t.start(); Thread.sleep(1); t.running = false; // Set flag to false } } class HelloThread extends Thread { public volatile boolean running = true; public void run() { int n = 0; while (running) { n ++; System.out.println(n + \u0026#34; hello!\u0026#34;); } System.out.println(\u0026#34;end!\u0026#34;); } } Notice that the flag boolean running of HelloThread is a variable shared between threads. Shared variables between threads need to be marked with the volatile keyword to ensure that every thread can read the updated value of the variable.\nThe Purpose of volatile Why declare variables shared across threads with the keyword volatile? This relates to Java\u0026rsquo;s memory model. Inside the Java virtual machine, variable values are stored in main memory. However, when a thread accesses a variable, it first obtains a copy and saves it in its own working memory. If a thread modifies the value of a variable, the virtual machine will write the modified value back to main memory at some point, but this timing is uncertain!\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 // This diagram is really well drawn! 
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ Main Memory │ │ ┌───────┐┌───────┐┌───────┐ │ │ var A ││ var B ││ var C │ │ └───────┘└───────┘└───────┘ │ │ ▲ │ ▲ │ ─ ─ ─│─│─ ─ ─ ─ ─ ─ ─ ─│─│─ ─ ─ │ │ │ │ ┌ ─ ─ ┼ ┼ ─ ─ ┐ ┌ ─ ─ ┼ ┼ ─ ─ ┐ ▼ │ ▼ │ │ ┌───────┐ │ │ ┌───────┐ │ │ var A │ │ var C │ │ └───────┘ │ │ └───────┘ │ Thread 1 Thread 2 └ ─ ─ ─ ─ ─ ─ ┘ └ ─ ─ ─ ─ ─ ─ ┘ This causes a situation where if one thread updates a certain variable, the value read by another thread might still be the one before the update.\nFor example, if the variable in main memory is a = true, when Thread 1 executes a = false, it merely changes its copy of variable a to false at that moment; the variable a in main memory is still true. Before the JVM writes the modified a back to main memory, the value of a read by other threads remains true, which leads to inconsistency in shared variables among multiple threads.\nTherefore, the purpose of the volatile keyword is to tell the virtual machine:\nEvery time you access a variable, always acquire the latest value from main memory; Every time you modify a variable, instantly write it back to main memory. The volatile keyword solves the visibility problem: when one thread modifies the value of a shared variable, other threads can immediately see the modified value.\nIf we remove the volatile keyword and run the program above, we find the effect is similar to having volatile. This is because, under the x86 architecture, the JVM writes back to main memory extremely fast, but switching to an ARM architecture would incur significant delays.\nSummary Calling interrupt() on a target thread sends an interruption request. The target thread checks its status via isInterrupted(). If the target thread is in a waiting state, it will catch an InterruptedException. A target thread should terminate immediately when isInterrupted() returns true or when it catches an InterruptedException. 
When using flag-based approaches to control threads, the volatile keyword must be applied correctly. The volatile keyword solves the visibility problem of shared variables across threads. Thread State Detailed Illustration of Java Multithreading - Personal Article - SegmentFault\nThread State - Java Tutorial - Liao Xuefeng\u0026rsquo;s Official Website\nThe System - Five States Thread states can be divided into five states at the operating system level:\nInitial state: The state when the thread object is created. Runnable state (Ready state): After calling the start() method, it enters the ready state, which implies it\u0026rsquo;s prepared to be scheduled and executed by the CPU. Running state: The thread obtains the CPU\u0026rsquo;s time slice and executes the logic of the run() method. Blocked state: The thread is blocked, relinquishing the CPU\u0026rsquo;s time slice, and waits for the block to be lifted to return to the ready state and contend for a time slice again. Terminated state: The state after the thread has finished execution or thrown an exception. Java - Six States In a Java program, the start() method can only be called once on a thread object to initiate a new thread, which then executes the run() method. Once the run() method finishes executing, the thread concludes. Hence, the states of a Java thread are as follows:\nNEW: The thread object has been created, but start() has not yet been called. RUNNABLE: The thread enters this state after calling the start() method. This state encompasses three scenarios: Ready state: Waiting for the CPU to allocate a time slice. Running state: Executing the run() method. Blocked state: Blocked in a blocking IO (BIO) call; Java still reports this as RUNNABLE. BLOCKED: The blocked state when failing to acquire a lock (will be detailed in the synchronization lock section). WAITING: The state after calling methods like wait() or join(). TIMED_WAITING: The state after calling methods like sleep(time), wait(time), or join(time). 
TERMINATED: The state after the thread has finished executing or thrown an exception. After a thread starts, it can switch among the RUNNABLE, BLOCKED, WAITING, and TIMED_WAITING states until it finally transitions to the TERMINATED state, at which point the thread ends.\nThe reasons for a thread to terminate include:\nNormal termination: The run() method runs to completion or returns at a return statement; Unexpected termination: The run() method terminates due to an uncaught exception; Forceful termination: Calling the stop() method on a specific Thread instance (strongly discouraged). Core Methods in the Thread Class Method Name Is Static Description start() No Starts the thread, which enters the ready state and waits for the CPU to allocate a time slice. run() No The method that implements the Runnable interface; it contains the logic executed when the thread receives a CPU time slice. yield() Yes Thread concession. Hints to the scheduler that the current thread is willing to give up its time slice and return to the ready state to compete again. sleep(time) Yes The current thread sleeps for a fixed period (TIMED_WAITING). Once the sleep duration completes, it competes for a time slice again. Sleeping can be interrupted. join()/join(time) No Calling join on a thread object puts the calling thread into a waiting state. It waits until that thread finishes executing or the time limit is reached, then resumes and competes for a time slice again. isInterrupted() No Returns the thread\u0026rsquo;s interruption flag: true for interrupted, false for not interrupted. Calling this does not modify the interruption flag. interrupt() No Interrupts the thread. Any blocking method that throws InterruptedException can be interrupted; in that case the flag is cleared when the exception is thrown, so isInterrupted() returns false afterwards. If a normally running thread is interrupted, the interruption flag is set to true. interrupted() Yes Returns the current thread\u0026rsquo;s interruption flag. Calling this clears the interruption flag. 
stop() No Stops thread execution (deprecated, not recommended). suspend() No Suspends the thread (deprecated, not recommended). resume() No Resumes thread execution (deprecated, not recommended). currentThread() Yes Returns the current thread. Thread-related methods in Object\nMethod Name Description wait()/wait(long timeout) Makes the thread that holds the lock release it and enter a waiting state. notify() Wakes up one arbitrary thread that is waiting on the object. notifyAll() Wakes up all threads that are waiting on the object so they can compete for the lock again. Daemon Threads Daemon Threads - Java Tutorial - Liao Xuefeng\u0026rsquo;s Official Website\nA Java program starts when the JVM launches the main thread, which can in turn launch other threads. When all threads have finished executing, the JVM exits, and the process ends.\nIf any thread has not exited, the JVM process will not exit. Therefore, every thread must be able to finish in a timely manner.\nHowever, some threads are meant to loop forever. For example, a thread that triggers a task on a timer:\nimport java.time.LocalTime; class TimerThread extends Thread { @Override public void run() { while (true) { System.out.println(LocalTime.now()); try { Thread.sleep(1000); } catch (InterruptedException e) { break; } } } } If this thread does not finish, the JVM process cannot end. The question is, who is responsible for stopping this thread?\nOften, no one is in charge of stopping such threads. Yet once the other threads have finished, the JVM process must end. What can be done?\nThe answer is to use a daemon thread.\nA daemon thread is a thread that serves other threads. In the JVM, once all non-daemon threads have completed execution, the virtual machine exits automatically, whether or not any daemon threads are still running.\nHence, when the JVM exits, it does not need to care whether daemon threads have finished.\nHow does one create a daemon thread? 
The method is identical to an ordinary thread; only, before calling the start() method, you call setDaemon(true) to mark the thread as a daemon thread:\n1 2 3 Thread t = new MyThread(); t.setDaemon(true); t.start(); Inside a daemon thread, caution must be exercised when writing code: Daemon threads cannot hold any resources that require closing, such as opened files. This is because when the virtual machine exits, the daemon thread is afforded no opportunity to close the files, which will result in data loss.\nSummary Daemon threads are threads that serve other threads.\nAfter all non-daemon threads have completed execution, the virtual machine exits, and daemon threads are consequently terminated.\nDaemon threads cannot hold resources that require closure (e.g., opened files).\nThread Synchronization Thread Synchronization - Java Tutorial - Liao Xuefeng\u0026rsquo;s Official Website\nWhen multiple threads execute concurrently, the scheduling of threads is dictated by the operating system, and the program itself cannot control it. 
Consequently, there is a possibility for any thread to be paused by the OS at any instruction and then resume execution after a certain timeframe.\nAt this point, a problem emerges that doesn\u0026rsquo;t exist under single-threaded models: if multiple threads concurrently read and write to a shared variable, data inconsistency issues will arise.\nLet\u0026rsquo;s look at an example:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 // Multiple threads public class Main { public static void main(String[] args) throws Exception { var add = new AddThread(); var dec = new DecThread(); add.start(); dec.start(); add.join(); dec.join(); System.out.println(Counter.count); } } class Counter { public static int count = 0; } class AddThread extends Thread { public void run() { for (int i=0; i\u0026lt;10000; i++) { Counter.count += 1; } } } class DecThread extends Thread { public void run() { for (int i=0; i\u0026lt;10000; i++) { Counter.count -= 1; } } } The code above is very simple. Two threads simultaneously perform operations on an int variable; one adds 1 ten thousand times, and the other subtracts 1 ten thousand times. Ultimately, the result should be 0. However, every time it runs, the actual result varies.\nThis is because when reading and writing a variable, to get the correct result, it must be guaranteed to be an atomic operation. Atomic operations are single operations or a sequence of operations that cannot be interrupted.\nFor example, regarding the statement:\n1 n = n + 1; It appears to be a single statement, but in reality, it maps to 3 instructions:\n1 2 3 ILOAD IADD ISTORE Suppose the value of n is 100. If two threads concurrently execute n = n + 1, the obtained result is highly likely not 102, but rather 101. 
The reason being:\n1 2 3 4 5 6 7 8 9 10 11 ┌───────┐ ┌───────┐ │Thread1│ │Thread2│ └───┬───┘ └───┬───┘ │ │ │ILOAD (100) │ │ │ILOAD (100) │ │IADD │ │ISTORE (101) │IADD │ │ISTORE (101) │ ▼ ▼ If Thread 1 is interrupted by the OS after executing ILOAD, and if Thread 2 is scheduled to run at that exact moment, the value it retrieves after executing ILOAD is still 100. Ultimately, after the ISTORE writes of both threads, the result becomes 101 instead of the anticipated 102.\nThis demonstrates that beneath the multithreaded model, to ensure logic exactness, when reading and writing shared variables, you must ensure a group of instructions are executed atomically: meaning when an individual thread is executing, other threads must wait:\nsynchronized Synchronization Lock 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ┌───────┐ ┌───────┐ │Thread1│ │Thread2│ └───┬───┘ └───┬───┘ │ │ │-- lock -- │ │ILOAD (100) │ │IADD │ │ISTORE (101) │ │-- unlock -- │ │ │-- lock -- │ │ILOAD (101) │ │IADD │ │ISTORE (102) │ │-- unlock -- ▼ ▼ Through lock and unlock operations, we ensure that the 3 instructions always execute within a single thread\u0026rsquo;s execution period, preventing other threads from entering this instruction region. Even if the executing thread is interrupted by the OS, other threads still cannot enter this region because they cannot acquire the lock. Only after the executing thread releases the lock can other threads acquire it and proceed. The code block between locking and unlocking is called a Critical Section. At any given time, at most one thread can execute within the critical section.\nEvidently, ensuring the atomicity of a segment of code is achieved by acquiring and releasing a lock. A Java program uses the synchronized keyword to lock an object:\n1 2 3 synchronized(lock) { n = n + 1; } synchronized guarantees that the code block can be executed by at most one thread at an arbitrary moment. 
We can rewrite the code above utilizing synchronized as follows:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 // Multiple threads public class Main { public static void main(String[] args) throws Exception { var add = new AddThread(); var dec = new DecThread(); add.start(); dec.start(); add.join(); dec.join(); System.out.println(Counter.count); } } class Counter { public static final Object lock = new Object(); public static int count = 0; } class AddThread extends Thread { public void run() { for (int i=0; i\u0026lt;10000; i++) { synchronized(Counter.lock) { Counter.count += 1; } } } } class DecThread extends Thread { public void run() { for (int i=0; i\u0026lt;10000; i++) { synchronized(Counter.lock) { Counter.count -= 1; } } } } Observe the code:\n1 2 3 synchronized(Counter.lock) { // Acquire lock ... } // Release lock It indicates using the Counter.lock instance as a lock. When the two threads execute their respective synchronized(Counter.lock) { ... } code blocks, they must first acquire the lock before entering the code block. After execution concludes, the lock is automatically released at the end of the synchronized statement block. In this way, reading and writing the Counter.count variable simultaneously is impossible. No matter how many times the code above is run, the final result is always 0.\nUsing synchronized solves the problem of correct concurrent access to shared variables by multiple threads. However, its disadvantage is a performance drop, because synchronized code blocks cannot execute concurrently. Additionally, acquiring and releasing locks requires a certain amount of time, meaning synchronized reduces the program\u0026rsquo;s execution efficiency.\nLet\u0026rsquo;s outline how to use synchronized:\nIdentify the thread code blocks that modify shared variables; Choose a shared instance as a lock; Use synchronized(lockObject) { ... }. 
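The three steps above can be sketched on an instance-level counter rather than a static one. This is only an illustrative sketch: the class name SharedTally and the methods change, total, and run are assumptions made for this example and do not appear in the original tutorial.

```java
// Hypothetical example class; SharedTally, change, total, and run are illustrative names.
class SharedTally {
    // Step 2: a single shared instance that every thread agrees to lock on.
    private final Object lock = new Object();
    private int total = 0;

    void change(int delta) {
        // Steps 1 and 3: the read-modify-write of the shared field sits inside synchronized.
        synchronized (lock) {
            total += delta;
        }
    }

    int total() {
        // Reads take the same lock so they observe the latest written value.
        synchronized (lock) {
            return total;
        }
    }

    // Starts two adding threads and two subtracting threads, waits for all of them,
    // and returns the final value of the tally.
    static int run(int rounds) throws InterruptedException {
        SharedTally tally = new SharedTally();
        Thread[] workers = new Thread[4];
        for (int w = 0; w < workers.length; w++) {
            int delta = (w % 2 == 0) ? 1 : -1;
            workers[w] = new Thread(() -> {
                for (int i = 0; i < rounds; i++) {
                    tally.change(delta);
                }
            });
            workers[w].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return tally.total();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10000)); // prints 0
    }
}
```

Because every change goes through the same lock instance, the additions and subtractions cancel out exactly. If each thread locked on a different object, the result would again vary from run to run.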
When using synchronized, you do not need to worry about exceptions being thrown. Because regardless of whether there is an exception or not, the lock will be released correctly at the end of synchronized:\n1 2 3 4 5 6 7 8 public void add(int m) { synchronized (obj) { if (m \u0026lt; 0) { throw new RuntimeException(); } this.value += m; } // The lock is released here regardless of exceptions } Moreover, multiple threads can concurrently obtain their respective locks simultaneously: because JVM only ensures that the same lock can only be acquired by one thread at any arbitrary moment, but two different locks can be acquired separately by two threads at the same time.\nTherefore, when using synchronized, which lock is acquired is extremely important. If the lock object is incorrect, the code logic will be wrong.\nBelow is an example of employing two different locks to improve efficiency:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 public class Main { public static void main(String[] args) throws Exception { var ts = new Thread[] { new AddStudentThread(), new DecStudentThread(), new AddTeacherThread(), new DecTeacherThread() }; for (var t : ts) { t.start(); } for (var t : ts) { t.join(); } System.out.println(Counter.studentCount); System.out.println(Counter.teacherCount); } } class Counter { public static final Object lockStudent = new Object(); public static final Object lockTeacher = new Object(); public static int studentCount = 0; public static int teacherCount = 0; } class AddStudentThread extends Thread { public void run() { for (int i=0; i\u0026lt;10000; i++) { synchronized(Counter.lockStudent) { Counter.studentCount += 1; } } } } class DecStudentThread extends Thread { public void run() { for (int i=0; i\u0026lt;10000; i++) { synchronized(Counter.lockStudent) { Counter.studentCount -= 1; } } } } class AddTeacherThread extends Thread { 
public void run() { for (int i=0; i\u0026lt;10000; i++) { synchronized(Counter.lockTeacher) { Counter.teacherCount += 1; } } } } class DecTeacherThread extends Thread { public void run() { for (int i=0; i\u0026lt;10000; i++) { synchronized(Counter.lockTeacher) { Counter.teacherCount -= 1; } } } } Operations That Do Not Require synchronized The JVM specification defines several atomic operations:\nAssignment of basic types (excluding long and double), e.g., int n = m; Reference type assignment, e.g., List\u0026lt;String\u0026gt; list = anotherList. long and double are 64-bit data. The JVM does not strictly specify whether 64-bit assignments are atomic, but on x64 platform JVMs, the assignments of long and double are implemented as atomic operations.\nStatements with a single atomic operation do not require synchronization. For example:\npublic void set(int m) { synchronized(lock) { this.value = m; } } The synchronized block here is unnecessary.\nIt\u0026rsquo;s similar for references. For example:\npublic void set(String s) { this.value = s; } The assignment statement above does not require synchronization. Note that atomicity alone does not guarantee visibility: if other threads must see the new value promptly, the field should still be declared volatile.\nHowever, if a method assigns several variables that belong together, the assignments must be synchronized. For example:\nclass Point { int x; int y; public void set(int x, int y) { synchronized(this) { this.x = x; this.y = y; } } public int[] get() { synchronized(this) { return new int[]{x,y}; } } } Both the write and the read above (set() and get()) need synchronization. If the read is unsynchronized, it can cause logical errors in the program:\npublic int[] get() { int[] copy = new int[2]; copy[0] = x; copy[1] = y; return copy; } Suppose the current coordinates are (100, 200). 
Then, when setting the new coordinates to (110, 220), the values read multithreadedly by the aforementioned unsynchronized code might be:\n(100, 200): before updating x and y; (110, 200): after updating x, before updating y; (110, 220): after updating x and y. If it reads (110, 200), i.e., having read the updated x but the pre-update y, there\u0026rsquo;s no guarantee the states of multiple read variables stay consistent.\nSometimes, through some clever transformations, non-atomic operations can be turned into atomic operations. For example, if the code above is rewritten as:\n1 2 3 4 5 6 7 class Point { int[] ps; public void set(int x, int y) { int[] ps = new int[] { x, y }; this.ps = ps; } } Synchronization is no longer required because this.ps = ps is an atomic operation for reference assignment. Meanwhile, the statement:\n1 int[] ps = new int[] { x, y }; Here, ps is a local variable defined inside the method. Every thread will have its own individual local variables, unaffecting each other and remaining mutually invisible, hence demanding no synchronization.\nNote, however, that the reading method still requires synchronization during the process of copying the int[] array.\nImmutable Objects Do Not Require Synchronization Immutable objects denote objects whose state cannot be altered after creation. In Java, typical immutable objects include:\nString Immutable collections created by List.of() (Java 9+) Wrapper classes for basic types (e.g., Integer, Long, etc.) If multiple threads read from or write to an immutable object, synchronization isn\u0026rsquo;t necessary because the object\u0026rsquo;s state won\u0026rsquo;t be modified:\n1 2 3 4 5 6 7 8 9 class Data { List\u0026lt;String\u0026gt; names; void set(String[] names) { this.names = List.of(names); } List\u0026lt;String\u0026gt; get() { return this.names; } } Notice that the set() method internally created an immutable List. 
The objects incorporated in this List are also immutable String objects; therefore, the whole List\u0026lt;String\u0026gt; instance comprises immutability, making both reading and writing synchrony-free.\nWhen analyzing whether a variable can be accessed concurrently by multiple threads, one must first clarify the concepts. What multiple threads execute simultaneously are methods. For the example below:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 class Status { List\u0026lt;String\u0026gt; names; int x; int y; void set(String[] names, int n) { List\u0026lt;String\u0026gt; ns = List.of(names); this.names = ns; int step = n * 10; this.x += step; this.y += step; } StatusRecord get() { return new StatusRecord(this.names, this.x, this.y); } } If threads A and B exist, \u0026ldquo;executing simultaneously\u0026rdquo; signifies:\nset() might execute simultaneously; get() might execute simultaneously; A might execute set() concurrently while B executes get(). The class member variables names, x, y clearly can be simultaneously read and written by multiple threads, but local variables (including method parameters) if not \u0026ldquo;escaped\u0026rdquo;, remain solely visible to the current thread. The local variable step is only used inside the set() method, therefore when every thread executes set(), it contains an independent storage of step in the thread\u0026rsquo;s stack, without mutual influence.\nThe local variable ns is also held separately by each thread, but the subsequent assignment this.names = ns turns it visible to other threads. 
If the set() method is synchronized, and you wish to minimize the synchronized code block, you can rewrite it as:\n1 2 3 4 5 6 7 8 9 10 void set(String[] names, int n) { // Local variables are invisible to other threads: List\u0026lt;String\u0026gt; ns = List.of(names); int step = n * 10; synchronized(this) { this.names = ns; this.x += step; this.y += step; } } Therefore, deeply understanding multithreading requires comprehending variable storage in the stack, as primitive types and reference types are stored differently.\nScenario Requires Sync Reason Immutable object (e.g., List.of()) No Object immutable, multi-thread read-only, no race conditions. Local variable (e.g., step) No Thread private, confined to stack. Member variable assignment (e.g., this.names) Yes Reference could be modified simultaneously; needs sync or volatile. Compound ops (e.g., x += step) Yes Non-atomic operations (read-modify-write); needs sync. Summary When multiple threads concurrently read and write shared variables, logical errors may occur; therefore, synchronization via synchronized is required.\nThe essence of synchronization is locking a specified object; only after acquiring the lock can the subsequent code execute.\nNote that the lock object must be the same instance.\nSingle atomic operations defined by the JVM do not require synchronization.\nThread Synchronization Methods Thread Safety If a class is designed to permit multiple threads to access it correctly, we say this class is \u0026ldquo;thread-safe.\u0026rdquo; The java.lang.StringBuffer in the Java standard library is also thread-safe.\nThere are also some immutable classes, such as String, Integer, LocalDate, whose member variables are all final. Multiple threads can only read and cannot write when accessing them simultaneously. 
These immutable classes are also thread-safe.\nLastly, classes like Math that only provide static methods and have no member variables are also thread-safe.\nApart from the few exceptions mentioned above, most classes, such as ArrayList, are non-thread-safe classes. We cannot modify them in a multithreaded environment. However, if all threads only read and do not write, then ArrayList can be safely shared across threads.\nWithout specific elaboration, a class is non-thread-safe by default.\nTake the Counter class below for example:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 public class Counter { private int count = 0; public void add(int n) { synchronized(this) { count += n; } } public void dec(int n) { synchronized(this) { count -= n; } } public int get() { return count; } } This way, when a thread calls the add() and dec() methods, it doesn\u0026rsquo;t need to care about synchronization logic because the synchronized code block is inside the add() and dec() methods. Moreover, we notice that the object locked by synchronized is this, meaning the current instance, which again ensures that when multiple Counter instances are created, they do not influence each other and can execute concurrently.\nThe synchronized Modifier Let\u0026rsquo;s observe the Counter code again:\n1 2 3 4 5 6 7 8 public class Counter { public void add(int n) { synchronized(this) { count += n; } } ... } When what we lock is the this instance, we can actually use synchronized to modify the method. 
The following two approaches are equivalent:\npublic void add(int n) { synchronized(this) { // Lock \u0026#39;this\u0026#39; count += n; } // Unlock } Approach two:\npublic synchronized void add(int n) { // Lock \u0026#39;this\u0026#39; count += n; } // Unlock Therefore, a method modified with synchronized is a synchronized method: the entire method body runs while holding the lock on the this instance.\nFor static methods, there is no this instance because static methods belong to the class rather than an instance. However, every class has a Class instance automatically created by the JVM. Hence, adding synchronized to a static method locks the Class instance of that class. A synchronized static method is therefore equivalent to:\npublic class Counter { public static void test(int n) { synchronized(Counter.class) { ... } } } Summary Using synchronized to modify a method turns the entire method into a synchronized code block. The lock object for a synchronized instance method is this; for a synchronized static method it is the Class instance.\nThrough reasonable design and data encapsulation, a class can become \u0026ldquo;thread-safe\u0026rdquo;.\nUnless otherwise stated, a class is not thread-safe by default.\nWhether multiple threads can safely access a certain non-thread-safe instance requires analyzing the specific situation.\nDeadlock Reentrant Locks Java\u0026rsquo;s thread locks are reentrant locks.\nWhat is a reentrant lock? Let\u0026rsquo;s look at an example:\npublic class Counter { private int count = 0; public synchronized void add(int n) { if (n \u0026lt; 0) { dec(-n); } else { count += n; } } public synchronized void dec(int n) { count -= n; } } Execution flow:\nCalling add(-1): Acquires the this lock: the lock\u0026rsquo;s counter = 1, holding thread = current thread. Calls dec(1) after entering the add method: Acquires the this lock again: discovers it is already held by the current thread, counter increases to 2. 
Exits the dec method: Counter decreases to 1. Exits the add method: Counter decreases to 0, lock truly released. Observe the add() method modified by synchronized. Once a thread executes inside the add() method, it implies that it has already obtained the this lock of the current instance. If the passed n \u0026lt; 0, the dec() method will be called inside the add() method. Because the dec() method also needs to acquire the this lock, a question arises:\nFor the same thread, is it possible to continue acquiring the same lock after having already acquired it?\nThe answer is affirmative. The JVM permits the same thread to repeatedly acquire the same lock. A lock that can be repeatedly acquired by the same thread is called a reentrant lock.\nSince Java\u0026rsquo;s thread locks are reentrant locks, when acquiring a lock, it not only checks whether it is being acquired for the first time but also records the number of acquisitions. Each time the lock is acquired, the record is incremented by 1, and each time a synchronized block is exited, the record is decremented by 1. Only when the record decreases to 0 is the lock genuinely released.\nDeadlock A thread can acquire one lock and then proceed to acquire another lock. For example:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 public void add(int m) { synchronized(lockA) { // Acquire the lock for lockA this.value += m; synchronized(lockB) { // Acquire the lock for lockB this.another += m; } // Release the lock for lockB } // Release the lock for lockA } public void dec(int m) { synchronized(lockB) { // Acquire the lock for lockB this.another -= m; synchronized(lockA) { // Acquire the lock for lockA this.value -= m; } // Release the lock for lockA } // Release the lock for lockB } When acquiring multiple locks, different threads acquiring locks of multiple distinct objects may induce a deadlock. 
For the code above, if thread 1 and thread 2 simultaneously execute the add() and dec() methods respectively:\nThread 1: Enters add(), obtains lockA; Thread 2: Enters dec(), obtains lockB. Subsequently:\nThread 1: Prepares to obtain lockB, fails, waiting; Thread 2: Prepares to obtain lockA, fails, waiting. At this point, the two threads each hold different locks and then attempt to acquire the lock held by the other, resulting in an infinite mutual wait. This is a deadlock.\nAfter a deadlock occurs, there\u0026rsquo;s no mechanism to clear it; the JVM process can merely be forcefully terminated.\nHence, when writing multi-threaded applications, particular attention should be paid to guard against deadlock. Because once a deadlock forms, one can only forcefully terminate the process.\nSo how should we avoid deadlocks? The answer is: the order in which threads acquire locks must be consistent. Specifically, strictly follow the order of acquiring lockA first, then lockB. The rewritten dec() method is as follows:\n1 2 3 4 5 6 7 8 public void dec(int m) { synchronized(lockA) { // Acquire the lock for lockA this.value -= m; synchronized(lockB) { // Acquire the lock for lockB this.another -= m; } // Release the lock for lockB } // Release the lock for lockA } Summary Java\u0026rsquo;s synchronized locks are reentrant locks.\nDeadlock preconditions imply multiple threads each hold different locks and mutually attempt to retrieve the locks already held by the other, causing infinite waiting.\nAvoiding deadlock relies on multiple threads acquiring locks in an identical order.\nThread Communication In Java programs, synchronized resolves the problem of multithread competition. 
For instance, for a task manager, when multiple threads concurrently add tasks to a queue, synchronized can be used to apply locks:\n1 2 3 4 5 6 7 class TaskQueue { Queue\u0026lt;String\u0026gt; queue = new LinkedList\u0026lt;\u0026gt;(); public synchronized void addTask(String s) { this.queue.add(s); } } However, synchronized does not solve the coordination problem of multiple threads.\nStill using the TaskQueue above as an example, let\u0026rsquo;s write another getTask() method to extract the first task from the queue:\n1 2 3 4 5 6 7 8 9 10 11 12 13 class TaskQueue { Queue\u0026lt;String\u0026gt; queue = new LinkedList\u0026lt;\u0026gt;(); public synchronized void addTask(String s) { this.queue.add(s); } public synchronized String getTask() { while (queue.isEmpty()) { } return queue.remove(); } } The code above seems faultless: getTask() initially checks whether the queue is empty internally. If it is empty, it waits in a loop until another thread inserts a task into the queue. The while() loop exits, and it can return the element from the queue.\nHowever, the while() loop will never actually exit. Because when the thread executes the while() loop, it has already acquired the this lock at the entrance of getTask(). Other threads can\u0026rsquo;t possibly call addTask(), as executing addTask() also requires acquiring the this lock.\nTherefore, executing the code above will cause the thread to 100% consume CPU resources inside getTask() due to an infinite loop.\nIf we think deeper, the execution effect we desire is:\nThread 1 can call addTask() to constantly add tasks to the queue; Thread 2 can call getTask() to fetch tasks from the queue. If the queue is empty, getTask() should wait until there is at least one task in the queue before returning. 
Thus, the principle of multiple threads coordinating their execution is: when a condition is not met, the thread enters a waiting state; when the condition is met, the thread is awakened to continue executing tasks.\nwait() For the TaskQueue above, let\u0026rsquo;s first change the getTask() method so that the thread enters a waiting state when the condition isn\u0026rsquo;t met:\npublic synchronized String getTask() throws InterruptedException { while (queue.isEmpty()) { this.wait(); } return queue.remove(); } When a thread reaches the while loop inside the getTask() method, it must have already acquired the this lock. At this point, the thread evaluates the while condition. If the condition is true (the queue is empty), the thread executes this.wait() and enters a waiting state.\nThe key here is: the wait() method must be invoked on the lock object currently held. The lock held here is this, hence the call to this.wait().\nAfter a thread invokes wait(), it enters a waiting state. The wait() method does not return until some later moment when another thread wakes the waiting thread. After wait() returns, the thread continues with the next statement.\nA careful reader might ask: if a thread is waiting inside getTask() while holding the this lock, other threads cannot acquire the this lock and therefore cannot execute addTask(), so what do we do?\nThe answer lies in how wait() works, which is fairly involved. First, it\u0026rsquo;s not a regular Java method, but a native method defined in the Object class, meaning it is implemented in the JVM\u0026rsquo;s native code. Secondly, the wait() method can only be invoked within a synchronized block, because calling wait() releases the lock held by the thread. Before wait() returns, the thread must reacquire the lock.\nTherefore, the wait() method can only be invoked on the lock object. 
Because we acquired the this lock in getTask(), the wait() method can only be called on the this object.\npublic synchronized String getTask() throws InterruptedException { while (queue.isEmpty()) { // Release the \u0026#39;this\u0026#39; lock: this.wait(); // Reacquire the \u0026#39;this\u0026#39; lock } return queue.remove(); } While a thread is waiting at this.wait(), it has released the this lock, so other threads can acquire the this lock inside the addTask() method.\nnotify() Now we face a second problem: how do we wake the waiting thread so that it returns from wait()? The answer is to call notify() on the same lock object. Let\u0026rsquo;s change addTask() as follows:\npublic synchronized void addTask(String s) { this.queue.add(s); this.notify(); // Wake up one thread waiting on the \u0026#39;this\u0026#39; lock } Notice that immediately after adding a task to the queue, the thread calls notify() on the this lock object. This method wakes up one thread that is currently waiting on the this lock (that is, a thread suspended at this.wait() inside getTask()), allowing the suspended thread to return from this.wait().\nLet\u0026rsquo;s look at a full example (which is also a producer-consumer model):\nimport java.util.*; public class Main { public static void main(String[] args) throws InterruptedException { var q = new TaskQueue(); var ts = new ArrayList\u0026lt;Thread\u0026gt;(); for (int i=0; i\u0026lt;5; i++) { var t = new Thread() { public void run() { // Execute task: while (true) { try { String s = q.getTask(); System.out.println(\u0026#34;execute task: \u0026#34; + s); } catch (InterruptedException e) { return; } } } }; t.start(); ts.add(t); } var add = new Thread(() -\u0026gt; { for (int i=0; i\u0026lt;10; i++) { // Insert task: String s = 
\u0026#34;t-\u0026#34; + Math.random(); System.out.println(\u0026#34;add task: \u0026#34; + s); q.addTask(s); try { Thread.sleep(100); } catch(InterruptedException e) {} } }); add.start(); add.join(); Thread.sleep(100); for (var t : ts) { t.interrupt(); } } } class TaskQueue { Queue\u0026lt;String\u0026gt; queue = new LinkedList\u0026lt;\u0026gt;(); public synchronized void addTask(String s) { this.queue.add(s); this.notifyAll(); } public synchronized String getTask() throws InterruptedException { while (queue.isEmpty()) { this.wait(); } return queue.remove(); } } In this example, note that addTask() calls this.notifyAll() rather than this.notify(). notifyAll() wakes up all threads currently waiting on the this lock, while notify() wakes up only one of them (which one is arbitrary and depends on the JVM implementation). Because multiple threads may be waiting in the wait() call inside getTask(), notifyAll() wakes them all at once. In general, notifyAll() is the safer choice: if the coordination logic is incomplete, notify() may wake only one thread while the others wait forever.\nKeep in mind, however, that wait() must reacquire the this lock before it returns. Suppose 3 threads are woken up; they must first wait for the thread executing addTask() to finish the method and release the this lock. Then only one of the 3 threads acquires the this lock at a time; the other two keep waiting for it.\nAlso note that we call wait() inside a while loop rather than an if statement. The following version is wrong:\npublic synchronized String getTask() throws InterruptedException { if (queue.isEmpty()) { this.wait(); } return queue.remove(); } This version is incorrect because a thread must reacquire the this lock after it is woken up. 
When several threads are woken up, only one of them acquires the this lock. That thread\u0026rsquo;s queue.remove() succeeds, but by the time the remaining threads acquire the this lock and execute queue.remove(), the queue may already be empty, and remove() throws NoSuchElementException. Therefore wait() should always be called inside a while loop, so that the condition is rechecked every time the thread reacquires the lock.\nwhile (queue.isEmpty()) { this.wait(); } Summary wait and notify are used for multithread coordination:\nInside synchronized, calling wait() puts a thread into a waiting state; wait() must be called on the lock object the thread already holds; Inside synchronized, notify() or notifyAll() can be called to wake up other waiting threads; notify() or notifyAll() must be called on the lock object the thread already holds; Woken threads must reacquire the lock before they continue executing. Producer-Consumer Model Java Producer Consumer Model Implementation and Analysis_bilibili\nBelow is a simple producer-consumer model example from Bilibili. 
Although it\u0026rsquo;s not as complex as the preceding thread communication example or the messaging queue example below, mastering these three examples should be sufficient to grasp this model.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 public class Demo1 { /** * Alternates execution of two threads * One outputs \u0026#34;1,2,3,...\u0026#34; * The other outputs \u0026#34;a,b,c,...\u0026#34; */ public static void main(String[] args) { Factory factory = new Factory(); final Thread t1 = new Thread(new Runnable() { @Override public void run() { for(int i = 1;i \u0026lt;= 26;i++){ factory.product(i); } } }); final Thread t2 = new Thread(new Runnable() { @Override public void run() { for(int i = \u0026#39;a\u0026#39;;i \u0026lt;= \u0026#39;z\u0026#39;;i++){ factory.consume((char) i); } } }); t1.start(); t2.start(); } } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 public class Factory { /** * 0: Producer is producing, consumer is waiting. After producing, producer notifies consumer to consume. * 1: Consumer is consuming, producer is waiting. After consuming, consumer notifies producer to produce. */ private int sign = 0;\t// State value public synchronized void product(int n){ if(sign == 1){ try { this.wait(); } catch (InterruptedException e) { e.printStackTrace(); } } System.out.print(n); this.notify(); this.sign = 1; } public synchronized void consume(char c){ if(sign == 0){ try { this.wait(); } catch (InterruptedException e) { e.printStackTrace(); } } System.out.print(c); this.notify(); this.sign = 0; } } The execution of threads carries a degree of randomness that users cannot completely control. However, the producer-consumer model can achieve the \u0026ldquo;alternating\u0026rdquo; execution of two threads.\nThe comments explain most of the logic. Let\u0026rsquo;s analyze it further:\nAssume thread t1 is called first. Since sign = 0, it prints character 1 and changes sign to 1. 
From here, there are two possibilities: thread t1 or t2 is called next.\nCalling t1:\nsign = 1, enters the try/catch block. Calls this.wait() on the synchronization lock object, entering the \u0026ldquo;waiting\u0026rdquo; state. wait() will release the lock. Thread t2 executes, running consume(). notify() awakens thread t1 currently waiting on this. sign is assigned 0, and the cycle repeats. Calling t2:\nThread t2 executes, running consume(). notify() does not awaken any thread (because no thread is in a waiting state). sign is assigned 0, and the cycle repeats. Example Analysis Below is a more complex (and realistic) example. The concept is quite similar to the simple example above.\nLet\u0026rsquo;s clarify the inline lambda expression used to create a thread in the example below:\nnew Thread() - Creates a new thread () -\u0026gt; {...} - Lambda expression defining the thread task \u0026quot;Producer\u0026quot; + i - Thread naming .start() - Starts the thread Here, threads are created using a loop, so the loop variable is used to name them.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 public static void main(String[] args) throws InterruptedException { MessageQueue queue = new MessageQueue(2); // Three producers put values into the queue for (int i = 0; i \u0026lt; 3; i++) { int id = i; new Thread(() -\u0026gt; { queue.put(new Message(id, \u0026#34;Value \u0026#34; + id)); }, \u0026#34;Producer \u0026#34; + i).start(); } Thread.sleep(1000); // One consumer continuously takes values from the queue new Thread(() -\u0026gt; { while (true) { queue.take(); } }, \u0026#34;Consumer\u0026#34;).start(); } } // Message queue is shared by producers and consumers class MessageQueue { private LinkedList\u0026lt;Message\u0026gt; list = new LinkedList\u0026lt;\u0026gt;(); // Capacity private int capacity; 
public MessageQueue(int capacity) { this.capacity = capacity; } // Producer public void put(Message message) { synchronized (list) { while (list.size() == capacity) { log.info(\u0026#34;Queue is full, producer is waiting\u0026#34;); try { list.wait(); } catch (InterruptedException e) { e.printStackTrace(); } } list.addLast(message); log.info(\u0026#34;Produced message: {}\u0026#34;, message); // Notify consumers after producing list.notifyAll(); } } // Consumer public Message take() { synchronized (list) { while (list.isEmpty()) { log.info(\u0026#34;Queue is empty, consumer is waiting\u0026#34;); try { list.wait(); } catch (InterruptedException e) { e.printStackTrace(); } } Message message = list.removeFirst(); // Retrieve message from the head of the queue log.info(\u0026#34;Consumed message: {}\u0026#34;, message); // Notify producers after consuming list.notifyAll(); return message; } } } // Message class Message { private int id; private Object value; } Main Function:\nCreates a MessageQueue with a capacity of 2. Starts 3 producer threads. Each producer puts one message into the queue. The main thread sleeps for 1 second, giving producers enough time to start working. Starts a consumer thread that continuously extracts messages from the queue. Producer:\nUses a synchronized block to acquire the lock for the list object. Checks if the queue is full (using a while loop to prevent spurious wakeups). If the queue is full, calls wait() to release the lock and wait. When there is free space in the queue, adds a message to the end of the queue. Calls notifyAll() to wake up any consumer threads that might be waiting. Consumer:\nUses a synchronized block to acquire the lock for the list object. Checks if the queue is empty (using a while loop to prevent spurious wakeups). If the queue is empty, calls wait() to release the lock and wait. When there are messages in the queue, retrieves a message from the head of the queue. 
Calls notifyAll() to wake up any producer threads that might be waiting. Returns the retrieved message. Summary Synchronization Mechanism: Uses synchronized to ensure atomic operations on the queue. Wait/Notify Mechanism: Uses wait() and notifyAll() for inter-thread communication. Loop Condition Check: Uses while instead of if to recheck conditions, guarding against spurious wakeups. Capacity Limits: Bounds the queue size to prevent memory exhaustion. ReentrantLock Java 5 introduced the java.util.concurrent package for handling concurrency. It provides many robust concurrency utilities that greatly simplify multithreaded programming.\nWe know that Java provides the synchronized keyword for locking at the language level. However, this lock is relatively heavyweight, and a thread trying to acquire it must wait indefinitely: there is no way to attempt the lock and give up if it is unavailable.\nThe ReentrantLock class in the java.util.concurrent.locks package can be used in place of synchronized. Let\u0026rsquo;s look at conventional synchronized code:\npublic class Counter { private int count; public void add(int n) { synchronized(this) { count += n; } } } Rewritten with ReentrantLock, the code becomes:\nimport java.util.concurrent.locks.Lock; import java.util.concurrent.locks.ReentrantLock; public class Counter { private final Lock lock = new ReentrantLock(); private int count; public void add(int n) { lock.lock(); try { count += n; } finally { lock.unlock(); } } } Because synchronized is built into the Java language, the JVM releases the lock automatically, even when an exception is thrown. ReentrantLock, by contrast, is a lock implemented in Java code, so we must acquire it explicitly and release it reliably in a finally block.\nAs the name implies, ReentrantLock is a reentrant lock. 
Like synchronized, a thread can acquire the same lock multiple times.\nUnlike synchronized, ReentrantLock allows one to attempt acquiring a lock:\n1 2 3 4 5 6 7 if (lock.tryLock(1, TimeUnit.SECONDS)) { try { ... } finally { lock.unlock(); } } In the code above, when trying to capture the lock, it waits up to 1 second. If the lock is still not obtained after 1 second, tryLock() returns false. This allows the program to handle it elegantly rather than waiting infinitely.\nTherefore, using ReentrantLock is safer than raw synchronized; if a thread fails during tryLock(), it won\u0026rsquo;t lead to a deadlock.\nBelow, we introduce its various methods, along with a more complex example.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 // Default non-fair lock, passing \u0026#39;true\u0026#39; creates a fair lock ReentrantLock lock = new ReentrantLock(false); // Try to acquire the lock lock() // Release the lock. Should be placed in a finally block to ensure it is executed unlock() try { // Can be interrupted while acquiring the lock; blocked threads can be interrupted LOCK.lockInterruptibly(); } catch (InterruptedException e) { return; } // Try to acquire the lock. Returns false if unobtainable LOCK.tryLock() // Supports timeout. Returns false if the lock is not acquired within the specified duration tryLock(long timeout, TimeUnit unit) // Specify condition variable (waiting room). One lock can create multiple waiting rooms Condition waitSet = ROOM.newCondition(); // Release the lock and enter waitSet to wait. Other threads can contend for the lock after release yanWaitSet.await() // Wake up threads in a specific waiting room. 
After waking up, they re-compete for the lock yanWaitSet.signal() 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 public static void main(String[] args) { AwaitSignal awaitSignal = new AwaitSignal(5); // Build three condition variables Condition a = awaitSignal.newCondition(); Condition b = awaitSignal.newCondition(); Condition c = awaitSignal.newCondition(); // Start three threads new Thread(() -\u0026gt; { awaitSignal.print(\u0026#34;a\u0026#34;, a, b); }).start(); new Thread(() -\u0026gt; { awaitSignal.print(\u0026#34;b\u0026#34;, b, c); }).start(); new Thread(() -\u0026gt; { awaitSignal.print(\u0026#34;c\u0026#34;, c, a); }).start(); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } awaitSignal.lock(); try { // Wake up \u0026#39;a\u0026#39; first a.signal(); } finally { awaitSignal.unlock(); } } } class AwaitSignal extends ReentrantLock { // Number of loops private int loopNumber; public AwaitSignal(int loopNumber) { this.loopNumber = loopNumber; } /** * @param print Character to print * @param current Current condition variable * @param next Next condition variable */ public void print(String print, Condition current, Condition next) { for (int i = 0; i \u0026lt; loopNumber; i++) { lock(); try { try { // Wait after acquiring the lock current.await(); System.out.print(print); } catch (InterruptedException e) { } next.signal(); } finally { unlock(); } } } Process Analysis:\nInitialization: The main thread creates an AwaitSignal object, setting the number of loops to 5. Three Condition objects are created: a, b, c, corresponding respectively to three threads. The three threads start, respectively calling print(\u0026quot;a\u0026quot;, a, b), print(\u0026quot;b\u0026quot;, b, c), and print(\u0026quot;c\u0026quot;, c, a). 
After sleeping for 1 second, the main thread acquires the lock and wakes up thread A via a.signal(). After thread startup: Each thread enters the print method and executes lock() to acquire the lock. Because ReentrantLock is a mutual exclusion lock, only one thread can hold the lock at any given moment. Assuming thread A acquires the lock first, it calls a.await(), releasing the lock and entering a waiting state (waiting for the signal of Condition a). The other threads (B and C) attempt lock(), but the lock is occupied, so they block on lock(). Main thread wakes up thread A: After try { Thread.sleep(1000); }, the main thread executes awaitSignal.lock(), acquiring the lock. It calls a.signal(), awakening thread A which is waiting on Condition a. The main thread executes unlock(), releasing the lock. After thread A is awakened: Thread A returns from a.await(), but it needs to reacquire the lock to continue execution. Because the main thread has already released the lock (unlock()), thread A successfully reacquires the lock. Thread A prints \u0026ldquo;a\u0026rdquo;, then calls b.signal() to wake up thread B. Thread A executes unlock(), releasing the lock. After thread B is awakened: Thread B has been waiting on b.await(), and is awakened after receiving b.signal(). Thread B attempts to reacquire the lock. Since thread A has released the lock, thread B succeeds in acquiring the lock. Thread B prints \u0026ldquo;b\u0026rdquo;, calls c.signal() to awaken thread C, and then releases the lock. 
Summary ReentrantLock can substitute for synchronized to perform synchronization operations.\nAcquiring a lock with ReentrantLock is safer.\nOne must first acquire the lock before entering a try {...} code block, and finally use a finally block to guarantee the lock\u0026rsquo;s release.\nYou can use tryLock() to attempt acquiring a lock.\nThread Pool (The explanations typically found regarding thread pools are rather vague.)\nAlthough Java natively provides multithreading support and starting a new thread is very convenient, creating a thread inherently demands operating system resources (such as thread resources, stack space, etc.). The frequent creation and destruction of massive amounts of threads consume a tremendous amount of time.\nWhat if we could reuse a set of threads:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 ┌─────┐ execute ┌──────────────────┐ │Task1│─────────▶│ThreadPool │ ├─────┤ │┌───────┐┌───────┐│ │Task2│ ││Thread1││Thread2││ ├─────┤ │└───────┘└───────┘│ │Task3│ │┌───────┐┌───────┐│ ├─────┤ ││Thread3││Thread4││ │Task4│ │└───────┘└───────┘│ ├─────┤ └──────────────────┘ │Task5│ ├─────┤ │Task6│ └─────┘ ... Then we can have a group of threads execute many small tasks, instead of creating a new thread for each task. This mechanism that accepts large numbers of small tasks and distributes them for processing is called a thread pool.\nSimply put, a thread pool internally maintains a set of threads. When there are no tasks, these threads are in a waiting state. When a new task arrives, an idle thread is assigned to execute it. 
If all threads are busy, the new task is either placed in a queue to wait, or a new thread is created to handle it.\nThe Java standard library provides the ExecutorService interface representing thread pools, whose typical usage goes as follows:\n1 2 3 4 5 6 7 8 // Create a fixed-size thread pool: ExecutorService executor = Executors.newFixedThreadPool(3); // Submit tasks: executor.submit(task1); executor.submit(task2); executor.submit(task3); executor.submit(task4); executor.submit(task5); Since ExecutorService is just an interface, the Java standard library provides several common implementations:\nFixedThreadPool: A thread pool with a fixed number of threads; CachedThreadPool: A thread pool that dynamically adjusts its thread count based on the number of tasks; SingleThreadExecutor: A thread pool that uses only a single thread for execution. The methods to create these thread pools are all encapsulated in the Executors class. Let\u0026rsquo;s use FixedThreadPool as an example to see how a thread pool executes:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 // thread-pool import java.util.concurrent.*; public class Main { public static void main(String[] args) { // Create a fixed-size thread pool: ExecutorService es = Executors.newFixedThreadPool(4); for (int i = 0; i \u0026lt; 6; i++) { es.submit(new Task(\u0026#34;\u0026#34; + i)); } // Shut down the thread pool: es.shutdown(); } } class Task implements Runnable { private final String name; public Task(String name) { this.name = name; } @Override public void run() { System.out.println(\u0026#34;start task \u0026#34; + name); try { Thread.sleep(1000); } catch (InterruptedException e) { } System.out.println(\u0026#34;end task \u0026#34; + name); } } Looking at the execution results, when 6 tasks are submitted at once, only the first 4 tasks execute simultaneously because the thread pool has a fixed size of 4 threads. 
The remaining two tasks execute only after some threads become idle.\nA thread pool must be shut down when the program ends. shutdown() stops accepting new tasks and lets the tasks already submitted finish; the call itself does not block. shutdownNow() attempts to stop the running tasks immediately by interrupting them and returns the tasks that were still waiting in the queue. awaitTermination() blocks for up to a specified time until the pool has terminated.\nIf we switch to a CachedThreadPool, all 6 tasks can execute simultaneously, because this implementation adjusts its size dynamically based on the number of tasks.\nWhat if we want the pool size to adjust dynamically between 4 and 10 threads? Let\u0026rsquo;s look at the source code of Executors.newCachedThreadPool():\npublic static ExecutorService newCachedThreadPool() { return new ThreadPoolExecutor( 0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue\u0026lt;Runnable\u0026gt;()); } Thus, to create a thread pool whose size varies within a specified range, we can write:\nint min = 4; int max = 10; ExecutorService es = new ThreadPoolExecutor( min, max, 60L, TimeUnit.SECONDS, new SynchronousQueue\u0026lt;Runnable\u0026gt;()); ScheduledThreadPool Some tasks need to be executed periodically, for example, refreshing stock prices every second. Such fixed, repeating tasks can use a ScheduledThreadPool. 
Tasks placed in a ScheduledThreadPool can be executed on a recurring schedule.\nA ScheduledThreadPool is also created through the Executors class:\nScheduledExecutorService ses = Executors.newScheduledThreadPool(4); We can submit a one-time task that runs once after a specified delay:\n// Execute a one-time task after 1 second: ses.schedule(new Task(\u0026#34;one-time\u0026#34;), 1, TimeUnit.SECONDS); If a task should be triggered at a fixed rate of once every 3 seconds, we write:\n// Start after 2 seconds, then trigger the task every 3 seconds: ses.scheduleAtFixedRate(new Task(\u0026#34;fixed-rate\u0026#34;), 2, 3, TimeUnit.SECONDS); If a task should run with a fixed 3-second delay between the end of one execution and the start of the next, we write:\n// Start after 2 seconds, then wait 3 seconds after each execution finishes: ses.scheduleWithFixedDelay(new Task(\u0026#34;fixed-delay\u0026#34;), 2, 3, TimeUnit.SECONDS); Note the difference between FixedRate and FixedDelay:\nFixedRate means that the task is always triggered at a fixed time interval, regardless of how long the task takes to execute:\n│░░░░ │░░░░░░ │░░░ │░░░░░ │░░░ ├───────┼───────┼───────┼───────┼────▶ │◀─────▶│◀─────▶│◀─────▶│◀─────▶│ FixedDelay, on the other hand, means that after the previous task finishes executing, it waits for a fixed time interval before executing the next task:\n│░░░│ │░░░░░│ │░░│ │░ └───┼───────┼─────┼───────┼──┼───────┼──▶ │◀─────▶│ │◀─────▶│ │◀─────▶│ Therefore, when using a ScheduledThreadPool, we must choose whether to execute a task once, at a fixed rate (FixedRate), or with a fixed delay (FixedDelay), depending on our requirements.\nYou can also consider the following questions:\nIn FixedRate mode, assuming a task is triggered every second, if a particular execution takes longer than 1 second, will the subsequent tasks execute concurrently? 
If a task throws an exception, will the subsequent tasks continue to execute? The Java Standard Library also provides the java.util.Timer class, which can execute tasks periodically. However, a single Timer is backed by a single thread, so all of its tasks run sequentially on that thread: a long-running task delays the others, and an uncaught exception in one task terminates the whole Timer. In contrast, a ScheduledThreadPool can run multiple periodic tasks on several threads. Therefore, ScheduledThreadPool can fully replace the legacy Timer class.\n","date":"2025-03-21T14:50:07+08:00","permalink":"/en/p/java-multithreading/","title":"Java Multithreading"},{"content":"I recently set up a physical server and wanted to use it to host a Minecraft server. After staying up late researching, I succeeded, and I am sharing this guide in the hope that it helps others, especially Linux users without a public IP (like me).\nReference Websites Installing and using Red Hat build of OpenJDK 21 on RHEL | Red Hat Product Documentation\nSakuraFrp Launcher Installation / Usage Guide | SakuraFrp Documentation\nCentOS | Docker Docs\nLinux Terminal Server Hosting Tutorial ★ No Panel ★ Minecraft_bilibili\nFrom the Bilibili uploader 翱翔大使, which is the main source of ideas for this whole article. Java Configuration Running Minecraft requires the matching version of Java; here I installed OpenJDK 21.\nsudo yum install java-21-openjdk java -version # verify the installation If the server has multiple Java versions, you can use alternatives to switch between them.\nalternatives --config java At the prompt, enter the number of the required version (2 in my case) and press Enter.\nGame Deployment First, download the Minecraft server software from the following URL. Here I downloaded Banner (1.20.1), which supports Fabric.\n[MohistMC](MohistMC - Home)\nOnce downloaded, you will get a file like banner-1.20.1-800-server.jar. 
Next, open your SSH client to operate on the server:\ncd /home/username # switch to your home directory or desired installation location mkdir Minecraft # create a folder for the game cd Minecraft Using the SFTP feature in your SSH client (or any other file transfer method), copy the downloaded game file banner-1.20.1-800-server.jar into the newly created /home/username/Minecraft folder.\nNext, let\u0026rsquo;s write a startup script for the server.\nnano start.sh Fill in the following content, noting the purpose of each parameter:\n-Xmx is the maximum heap memory and -Xms is the initial heap memory. I have 32GB of memory and allocated 6GB to the game (feel free to allocate more). banner-1.20.1-800-server.jar is the name of the game file you just downloaded. java -Xmx6144M -Xms6144M -jar banner-1.20.1-800-server.jar stty echo Press Ctrl + O to write, Enter to confirm, and Ctrl + X to exit.\nNext, make start.sh executable to avoid permission denied errors.\nchmod +x start.sh Then install screen. Simply put, screen is a tool that creates independent terminal sessions that can be resumed at any time.\nsudo yum install screen screen has the following common commands:\nscreen -S [name] # create a new session named \u0026#34;name\u0026#34; screen -ls # list the PIDs and names of all sessions screen -r [pid or name] # reattach to the specified session Next, create a new screen session to run the script.\nscreen -S Minecraft In the new session, run start.sh.\n./start.sh Everything should then go smoothly; I didn\u0026rsquo;t encounter any errors here. Finally, it comes to ...EULA... asking us to agree to the EULA. Enter true and press Enter. 
After a short wait, the game server will be running on port 25565.\nTo detach from this Minecraft screen session, press Ctrl+A, then D.\nTo change game rules (such as \u0026lsquo;whether cracked players are allowed to join\u0026rsquo;), edit server.properties.\nRegarding connection: if it\u0026rsquo;s a cloud server, open the port in the administration panel, then connect to domain:port in your Minecraft client.\nHowever, for a physical server like mine, a Linux device, or a personal PC without a public IP, we need intranet penetration (NAT traversal).\nIntranet Penetration Here I am using SakuraFrp, which is well known and reliable in the Minecraft community. Other tools work in much the same way.\nDocker SakuraFrp on Linux runs on Docker, so let\u0026rsquo;s deploy Docker first. The steps follow the official documentation.\nsudo dnf -y install dnf-plugins-core sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo I encountered very slow downloads and download failures here. Running the command again solved the problem.\nsudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin After the installation is complete:\nsudo systemctl enable --now docker sudo docker run hello-world The hello-world test above is very likely to fail. Let\u0026rsquo;s solve this issue by referring to the following two articles:\n[Complete Solution] Failed to run hello-world image after Docker installation: Unable to find image \u0026lsquo;hello-world:latest\u0026rsquo; locally - CSDN Blog\nFailed or Timeout Running hello-world Image in Docker - Paul7777 - cnblogs\nCombining the two solves the problem. First, let\u0026rsquo;s configure the daemon file.\nsudo nano /etc/docker/daemon.json Copy the following content into it:\nAt the time of my testing (2025/3/19), the following mirrors were still working. 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 { \u0026#34;registry-mirrors\u0026#34;: [ \u0026#34;https://h59pkpv6.mirror.aliyuncs.com\u0026#34;, \u0026#34;https://registry.docker-cn.com\u0026#34;, \u0026#34;https://docker.mirrors.ustc.edu.cn\u0026#34;, \u0026#34;https://hub-mirror.c.163.com\u0026#34;, \u0026#34;https://mirror.baidubce.com\u0026#34;, \u0026#34;https://do.nark.eu.org\u0026#34;, \u0026#34;https://dc.j8.work\u0026#34;, \u0026#34;https://docker.m.daocloud.io\u0026#34;, \u0026#34;https://dockerproxy.com\u0026#34;, \u0026#34;https://docker.nju.edu.cn\u0026#34; ] } Save + Exit. Next, restart docker, and execute the test once more.\n1 2 3 sudo systemctl daemon-reload sudo systemctl restart docker docker run hello-world I successfully installed docker up to this point. If the test still fails, please check the content of daemon.json to see if there are missing or extra commas and brackets.\nSakuraFrp For deploying SakuraFrp on Linux, the official documentation provides a detailed solution.\nFirst, run the following command as an administrator in the terminal:\n1 sudo bash -c \u0026#34;. \u0026lt;(curl -sSL https://doc.natfrp.com/launcher.sh)\u0026#34; After installation, it should automatically output logs and prompt you to fill in the access token. This (or the subsequent operations) can be found in the management panel on the SakuraFrp official website.\nAfter logging in, you will be able to see its log files. Below are the common operations to start it and view logs:\n1 2 docker start natfrp-service docker logs natfrp-service As shown below, we now need to physically operate the server.\nOpen a browser (Linux usually comes with Firefox) and access the URL after \u0026ldquo;Usage\u0026rdquo; to open the WebUI.\nThen you will see there is nothing under \u0026ldquo;Tunnels\u0026rdquo;, only a plus sign. 
At this time, we enter the SakuraFrp management panel, find Tunnel List under Services, and create two new tunnels as shown below:\nThe first one with port 7102 is the WebUI for SakuraFrp on the server, intended for remote management.\nThe second one with port 25565 is for the Minecraft server.\nReturn to the WebUI interface and refresh, you will see the two tunnels just created. Double-click them respectively, and then go back to the terminal log interface.\nThe links in red characters as shown in the image are the remote access links for the WebUI and Minecraft. Just copy the Minecraft one into the game, and you can connect to it.\nConclusion We are done! (A screenshot of my server\u0026rsquo;s spawn point \u0026gt;w\u0026lt;)\n","date":"2025-03-19T18:48:59+08:00","permalink":"/en/p/deploying-a-minecraft-server-on-linux-centos/","title":"Deploying a Minecraft Server on Linux (CentOS)"},{"content":"Redis (Remote Dictionary Server) is a high-performance, open-source, in-memory data structure store, used as a database, cache, and message broker.\nReferences 知乎 超强、超详细Redis入门教程\nCSDN【Redis二三事】一套超详细的Redis学习教程（步骤图片+实操）\u0026mdash;第一集\nDetailed, with real business scenario examples.\nData Structures Redis includes five major data types: string (string), list (list), hash (hash), set (set), and sorted set (zset).\nstring The most basic data type in Redis. Each key corresponds to a value, which can be text, numbers, or binary data, with a maximum storage of 512MB. It supports operations like string concatenation, substring, increment, and decrement, and is suitable for scenarios such as caching data, counters (e.g., page view statistics), and distributed locks.\nBasic Operations 1 2 3 set key value get key del key Add/Modify Multiple Data 1 mset key1 value1 key2 value2... Get Multiple Data 1 mget key1 key2... 
Get Character Count of Data 1 2 3 4 5 strlen key // For example set name1 nosql strlen name1 // Output: 5 Append Information 1 2 3 4 5 append key value // For example append name1 name get name1 /* Output: nosqlname*/ Multi-data vs Single-data Operations Executing $n$ commands individually requires 1 send + 1 process + 1 return $n$ times Executing $n$ commands via a multi-data command requires 1 send + $n$ processes + 1 return When the data volume is large, the time consumed by multi-data commands is much less than individual commands Advanced Operations Increment/Decrement Numeric Data by a Specified Range 1 2 incrby key increment decrby key increment Numeric Operations on String Types 1 2 3 4 5 6 7 8 127.0.0.1:6379\u0026gt; set mynum \u0026#34;2\u0026#34; OK 127.0.0.1:6379\u0026gt; get mynum \u0026#34;2\u0026#34; 127.0.0.1:6379\u0026gt; incr mynum (integer) 3 127.0.0.1:6379\u0026gt; get mynum \u0026#34;3\u0026#34; When encountering numeric operations, Redis will automatically convert the string type into a number.\nConsiderations for String Type Numeric Operations Differences between execution status feedback and normal data manipulation feedback Indicates whether the operation results were successful (integer) 0 -\u0026gt; false Failed (integer) 1 -\u0026gt; true Succeeded Indicates the resulting value (integer) 3 -\u0026gt; 3 3 items (integer) 1 -\u0026gt; 1 1 item Data not found (nil) is equivalent to null Maximum data storage capacity 512MB Maximum range for numeric calculations (maximum value of long in java) 9223372036854775807 hash Similar to a small key-value store, suitable for storing structured data such as user information (ID, name, email, etc.). Compared to the String type, it saves more memory because multiple fields share the same key. 
You can operate on fields individually to avoid reading and modifying the entire object, making it suitable for storing objects, session information, etc.\nBasic Operations hset key field value // Add/Modify data hget key field // Get data hgetall key hdel key field1 [field2] Add/Modify Multiple Data hmset key field1 value1 field2 value2 Get Multiple Data hmget key field1 field2... Get the Number of Fields in a Hash Table hlen key Check if a Specified Field Exists in a Hash Table hexists key field Advanced Operations Get All Field Names or Field Values in a Hash Table hkeys key hvals key Increment the Numeric Value of a Specified Field by a Specified Amount hincrby key field increment Comprehensive Example // Create hash and assign values 127.0.0.1:6379\u0026gt; HMSET user:001 username antirez password P1pp0 age 34 OK // List the contents of the hash 127.0.0.1:6379\u0026gt; HGETALL user:001 1) \u0026#34;username\u0026#34; 2) \u0026#34;antirez\u0026#34; 3) \u0026#34;password\u0026#34; 4) \u0026#34;P1pp0\u0026#34; 5) \u0026#34;age\u0026#34; 6) \u0026#34;34\u0026#34; // Change a specific value in the hash 127.0.0.1:6379\u0026gt; HSET user:001 password 12345 (integer) 0 // List the contents of the hash again 127.0.0.1:6379\u0026gt; HGETALL user:001 1) \u0026#34;username\u0026#34; 2) \u0026#34;antirez\u0026#34; 3) \u0026#34;password\u0026#34; 4) \u0026#34;12345\u0026#34; 5) \u0026#34;age\u0026#34; 6) \u0026#34;34\u0026#34; Considerations for Hash Type Data Operations The value under the hash type can only store strings, and no other data types are allowed, so there is no nesting. If no data is found, the corresponding value is (nil). Each hash can store 2^32-1 key-value pairs. The hash type is very close to the data storage format of objects, and object properties can be flexibly added. However, its initial design was not meant for storing large numbers of objects. 
Remember not to abuse it, and never use a hash as an object list. The hgetall operation can retrieve all properties. If there are too many fields inside, traversing the whole data will be very inefficient and may become a bottleneck for data access. list Based on a doubly linked list, it allows fast insertion and deletion of elements from the head (left) or tail (right), and supports reading elements by specifying an index range. It is suitable for implementing message queues, timelines (such as Weibo feeds), task scheduling, and other applications, especially for scenarios that require processing data according to insertion order.\nBasic Operations Add/Modify Data lpush key value1 [value2]... // Push from the left rpush key value1 [value2]... /* Push from the right */ Get Data About lrange:\nlrange is used to get elements within a specified range -1 represents the last element List element indexing starts from position 0 lrange key start stop lindex key index llen key Get and Remove Data lpop key rpop key Remove Specified Data About lrem:\nParameter count count \u0026gt; 0 -\u0026gt; Remove up to count matching values starting from the head (left) count \u0026lt; 0 -\u0026gt; Remove up to |count| matching values starting from the tail (right) count = 0 -\u0026gt; Remove all matching values (equivalent to removing all elements with this value from the list) If key does not exist Returns 0 Commonly used to remove duplicate elements from a list or clean up data lrem key count value Advanced Operations Get and Remove Data Within a Specified Time About blpop:\nParameter key1 [key2...] Multiple list keys can be provided, and Redis will check these lists in the order given Parameter timeout timeout \u0026gt; 0 -\u0026gt; If the lists are empty, wait up to timeout seconds timeout = 0 -\u0026gt; Block indefinitely until data is available Applicable to scenarios like task queues, producer-consumer models, etc. 
1 2 blpop key1 [key2] timeout brpop key1 [key2] timeout Example\n1 2 3 4 5 RPUSH list1 \u0026#34;a\u0026#34; \u0026#34;b\u0026#34; \u0026#34;c\u0026#34; // List content: [\u0026#34;a\u0026#34;, \u0026#34;b\u0026#34;, \u0026#34;c\u0026#34;] BLPOP list1 10 // Pop \u0026#34;a\u0026#34;, returns [\u0026#34;list1\u0026#34;, \u0026#34;a\u0026#34;] BLPOP list1 10 // Pop \u0026#34;b\u0026#34;, returns [\u0026#34;list1\u0026#34;, \u0026#34;b\u0026#34;] BLPOP list1 10 // Pop \u0026#34;c\u0026#34;, returns [\u0026#34;list1\u0026#34;, \u0026#34;c\u0026#34;] BLPOP list1 10 /* List is empty, blocks for up to 10 seconds; if no new elements, returns nil */ Comprehensive Example 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 // Create a new list called mylist and insert element \u0026#34;1\u0026#34; at the head 127.0.0.1:6379\u0026gt; lpush mylist \u0026#34;1\u0026#34; // Returns the current number of elements in mylist (integer) 1 // Insert element \u0026#34;2\u0026#34; on the right side of mylist 127.0.0.1:6379\u0026gt; rpush mylist \u0026#34;2\u0026#34; (integer) 2 // Insert element \u0026#34;0\u0026#34; on the left side of mylist 127.0.0.1:6379\u0026gt; lpush mylist \u0026#34;0\u0026#34; (integer) 3 // List elements in mylist from index 0 to 1 127.0.0.1:6379\u0026gt; lrange mylist 0 1 1) \u0026#34;0\u0026#34; 2) \u0026#34;1\u0026#34; // List elements in mylist from index 0 to the last element 127.0.0.1:6379\u0026gt; lrange mylist 0 -1 1) \u0026#34;0\u0026#34; 2) \u0026#34;1\u0026#34; 3) \u0026#34;2\u0026#34; Considerations for List Type Data Operations The data stored in a list is of the string type, and the total data capacity is limited to a maximum of 2^32-1 elements. A list has the concept of \u0026ldquo;indexes,\u0026rdquo; but manipulating data is usually done in the form of a \u0026ldquo;queue\u0026rdquo; (enqueue/dequeue) or a \u0026ldquo;stack\u0026rdquo; (push/pop). When retrieving all data, the end index is set to -1. 
A list allows pagination of data; typically, page 1 information comes from the list, while page 2 and more are loaded in a \u0026ldquo;database\u0026rdquo; fashion. set It consists of unique, unordered elements, supporting addition, deletion, and search operations with O(1) time complexity, and provides set operations like intersection, union, and difference. Suitable for applications like deduplication, mutual follows in recommendation systems, and tag management. Because duplicate elements are not allowed, it efficiently stores collections of distinct data.\nBasic Operations Add Data 1 sadd key member1 [member2] Get All Data 1 smembers key Delete Data 1 srem key member1 [member2] Get Total Amount of Set Data 1 scard key Check if Specified Data is in the Set 1 sismember key member Advanced Operations Randomly Get a Specified Number of Elements from the Set 1 srandmember key [count] Randomly Get an Element from the Set and Remove It 1 spop key Calculate the Intersection, Union, and Difference of Two Sets 1 2 3 sinter key1 [key2] sunion key1 [key2] sdiff key1 [key2] Calculate the Intersection, Union, and Difference of Two Sets and Store in a Specified Set 1 2 3 sinterstore destination key1 [key2] sunionstore destination key1 [key2] sdiffstore destination key1 [key2] Move Specified Data from the Source Set to the Destination Set 1 smove source destination member Comprehensive Example 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 // Add a new element \u0026#34;one\u0026#34; to the set myset 127.0.0.1:6379\u0026gt; sadd myset \u0026#34;one\u0026#34; (integer) 1 127.0.0.1:6379\u0026gt; sadd myset \u0026#34;two\u0026#34; (integer) 1 // List all elements in the set myset 127.0.0.1:6379\u0026gt; smembers myset 1) \u0026#34;one\u0026#34; 2) \u0026#34;two\u0026#34; // Check if element \u0026#34;one\u0026#34; is in the set myset, returning 1 indicates existence 127.0.0.1:6379\u0026gt; sismember myset \u0026#34;one\u0026#34; (integer) 1 // Check 
if element \u0026#34;three\u0026#34; is in the set myset, returning 0 indicates non-existence 127.0.0.1:6379\u0026gt; sismember myset \u0026#34;three\u0026#34; (integer) 0 // Create a new set yourset 127.0.0.1:6379\u0026gt; sadd yourset \u0026#34;1\u0026#34; (integer) 1 127.0.0.1:6379\u0026gt; sadd yourset \u0026#34;2\u0026#34; (integer) 1 127.0.0.1:6379\u0026gt; smembers yourset 1) \u0026#34;1\u0026#34; 2) \u0026#34;2\u0026#34; // Take the union of the two sets 127.0.0.1:6379\u0026gt; sunion myset yourset 1) \u0026#34;1\u0026#34; 2) \u0026#34;one\u0026#34; 3) \u0026#34;2\u0026#34; 4) \u0026#34;two\u0026#34; Considerations for Set Type Data Operations The set type does not allow duplicate data. If the added data already exists in the set, only one copy will be kept. Although set has the same storage structure as hash, it cannot utilize the space for storing values in the hash. zset Builds upon Set by adding a score to each element and ordering them by this score, supporting ranged queries, score-based ranking, and other operations. Suitable for leaderboards (like game scores), priority queues (like scheduled tasks), and time-sorted data storage (like article reading rankings), where sorting by weight is needed.\nBasic Operations Add Data 1 zadd key score1 member1 [score2 member2] Get All Data 1 2 zrange key start stop [WITHSCORES] // Display in ascending order zrevrange key start stop [WITHSCORES] // Display in descending order Delete Data 1 zrem key member [member...] 
Get Data by Condition zrangebyscore key min max [WITHSCORES] [LIMIT offset count] zrevrangebyscore key max min [WITHSCORES] Delete Data by Condition zremrangebyrank key start stop // Delete by index (rank) range zremrangebyscore key min max // Delete by score range Comprehensive Example // Add a new sorted set myzset, insert the element baidu.com, and assign it score 1: 127.0.0.1:6379\u0026gt; zadd myzset 1 baidu.com (integer) 1 // Add the element 360.com to myzset, assigning it score 3 127.0.0.1:6379\u0026gt; zadd myzset 3 360.com (integer) 1 // Add the element google.com to myzset, assigning it score 2 127.0.0.1:6379\u0026gt; zadd myzset 2 google.com (integer) 1 // List all elements of myzset along with their scores, showing that myzset is already ordered. 127.0.0.1:6379\u0026gt; zrange myzset 0 -1 withscores 1) \u0026#34;baidu.com\u0026#34; 2) \u0026#34;1\u0026#34; 3) \u0026#34;google.com\u0026#34; 4) \u0026#34;2\u0026#34; 5) \u0026#34;360.com\u0026#34; 6) \u0026#34;3\u0026#34; // List only the elements of myzset 127.0.0.1:6379\u0026gt; zrange myzset 0 -1 1) \u0026#34;baidu.com\u0026#34; 2) \u0026#34;google.com\u0026#34; 3) \u0026#34;360.com\u0026#34; Advanced Operations Get Total Amount of Set Data zcard key zcount key min max Set Intersection and Union Operations zinterstore destination numkeys key [key …] zunionstore destination numkeys key [key …] Considerations for sorted_set Type Data Operations The score is stored in 64 bits; for an integer, the range is -9007199254740992 to 9007199254740992. The score can also be a double-precision floating-point number. Because of the characteristics of double-precision floating-point numbers, precision might be lost, so it should be used with caution. The underlying storage of sorted_set is still based on the set structure. Therefore, data cannot be duplicated. 
If identical data is added repeatedly, the score is simply overwritten each time, keeping the result of the last modification.\n","date":"2025-03-18T15:41:00+08:00","permalink":"/en/p/redis/","title":"Redis"},{"content":"References 自学SQL网(教程 视频 练习全套)\nEach lesson is followed immediately by exercises.\nMySQL 教程 | 菜鸟教程\nMySQL总结_sq连表-CSDN博客\n主键 - SQL教程 - 廖雪峰的官方网站\nThe tutorial with the best reading experience.\nRelational Model Quoted from 关系模型 - SQL教程 - 廖雪峰的官方网站\nPrimary Key In a relational database, each row of data in a table is called a record. A record is composed of multiple fields. For example, two records in the students table:\nid class_id name gender score 1 1 Xiaoming M 90 2 1 Xiaohong F 95 For a relational table, there is a very important constraint: no two records may be duplicates. This does not merely mean that two records are not entirely identical; it means that different records can be uniquely distinguished by a certain field, and this field is called the primary key.\nFor example, assuming we use the name field as the primary key, then we can uniquely identify a record through the name Xiaoming or Xiaohong. However, with this setup, we couldn\u0026rsquo;t store students with the same name, because inserting two records with the same primary key is not allowed.\nThe most crucial requirement for a primary key is: once a record is inserted into the table, it\u0026rsquo;s best not to modify the primary key anymore, because the primary key is used to uniquely locate a record. Modifying the primary key will cause a series of cascading effects.\nBecause the role of the primary key is so important, how to select a primary key will have a significant impact on business development. If we use a student\u0026rsquo;s ID number as the primary key, it seems to uniquely locate a record. However, an ID number is itself a business field. 
If the ID number\u0026rsquo;s length increases or needs to be changed, and as a primary key, it has to be modified, it will have a severe impact on the business.\nTherefore, a basic principle for selecting a primary key is: do not use any business-related fields as the primary key.\nThus, fields that seem unique like ID numbers, mobile phone numbers, and email addresses, should not be used as primary keys.\nThe best primary key is a field completely unrelated to the business, and we generally name this field id. Common types suitable for the id field are:\nAuto-incrementing integer type: The database will automatically assign an auto-incrementing integer to each record upon insertion, so we don\u0026rsquo;t have to worry at all about primary key duplication, nor do we need to pre-generate the primary key ourselves; Globally Unique Identifier (GUID) type: Also known as UUID, using a globally unique string as a primary key, such as 8f55d96b-8acc-4636-8cb8-76bf8abc2f57. The GUID algorithm ensures that the strings generated by any computer at any time are different through network card MAC addresses, timestamps, and random numbers. Most programming languages have built-in GUID algorithms, allowing you to pre-calculate the primary key. For most applications, an auto-incrementing primary key is usually sufficient. The primary key we defined in the students table is also of BIGINT NOT NULL AUTO_INCREMENT type.\nIf an INT auto-increment type is used, an error will occur when the number of records in a table exceeds 2,147,483,647 (about 2.1 billion) as it reaches the upper limit. Using the BIGINT auto-increment type allows for a maximum of about 9.22 quintillion records.\nSummary The primary key is the unique identifier for a record in a relational table. Selecting a primary key is very important: a primary key should not have any business meaning, and should instead use a BIGINT auto-increment or GUID type. 
A primary key should also not allow NULL.\nMultiple columns can be used as a composite primary key, but composite primary keys are not commonly used.\nForeign Key When we uniquely identify records using a primary key, we can determine any student\u0026rsquo;s record in the students table:\nid name other columns\u0026hellip; 1 Xiaoming \u0026hellip; 2 Xiaohong \u0026hellip; We can also determine any class record in the classes table:\nid name other columns\u0026hellip; 1 Class 1 \u0026hellip; 2 Class 2 \u0026hellip; But how do we determine which class a record in the students table belongs to, for example, Xiaoming with id=1?\nSince one class can have multiple students, in the relational model, the relationship between these two tables can be called \u0026ldquo;one-to-many\u0026rdquo;, meaning one record in the classes table can correspond to multiple records in the students table.\nTo express this one-to-many relationship, we need to add a class_id column to the students table, letting its value correspond to a specific record in the classes table:\nid class_id name other columns\u0026hellip; 1 1 Xiaoming \u0026hellip; 2 1 Xiaohong \u0026hellip; 5 2 Xiaobai \u0026hellip; This way, we can directly locate which record in the classes table a students table record should correspond to based on the class_id column.\nXiaoming\u0026rsquo;s class_id is 1, therefore, the corresponding record in the classes table is Class 1 with id=1; Xiaohong\u0026rsquo;s class_id is 1, therefore, the corresponding record in the classes table is Class 1 with id=1; Xiaobai\u0026rsquo;s class_id is 2, therefore, the corresponding record in the classes table is Class 2 with id=2. 
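As a minimal sketch of the class_id lookup described above, the following Python snippet uses the standard-library sqlite3 module with an in-memory database. SQLite stands in for MySQL here purely so the example is self-contained (an assumption, not what the original tutorial uses). The table and column names follow the text, while build_school_db and find_class_name are hypothetical helpers introduced only for illustration.

```python
import sqlite3


def build_school_db():
    # In-memory SQLite database: an assumption so the sketch runs anywhere.
    conn = sqlite3.connect(':memory:')
    conn.executescript('''
        CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE students (
            id INTEGER PRIMARY KEY,
            class_id INTEGER NOT NULL,
            name TEXT NOT NULL
        );
        INSERT INTO classes (id, name) VALUES (1, 'Class 1'), (2, 'Class 2');
        INSERT INTO students (id, class_id, name) VALUES
            (1, 1, 'Xiaoming'), (2, 1, 'Xiaohong'), (5, 2, 'Xiaobai');
    ''')
    return conn


def find_class_name(conn, student_id):
    # Hypothetical helper: follow students.class_id to the matching classes row.
    row = conn.execute(
        'SELECT c.name FROM students AS s '
        'JOIN classes AS c ON c.id = s.class_id '
        'WHERE s.id = ?',
        (student_id,),
    ).fetchone()
    # None means the student (or the class it points at) was not found.
    return row[0] if row else None


conn = build_school_db()
for student_id in (1, 2, 5):
    print(student_id, find_class_name(conn, student_id))
```

In this sketch class_id is only an ordinary column: nothing stops a students row from pointing at a class id that does not exist, in which case the lookup simply returns None.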
In the students table, through the class_id field, data can be associated with another table, and such a column is called a foreign key.\nA foreign key is not implemented through the column name, but by defining a foreign key constraint:\n1 2 3 4 ALTER TABLE students ADD CONSTRAINT fk_class_id FOREIGN KEY (class_id) REFERENCES classes (id); Here, the foreign key constraint name fk_class_id can be anything, FOREIGN KEY (class_id) specifies class_id as the foreign key, and REFERENCES classes (id) specifies that this foreign key will be linked to the id column of the classes table (i.e., the primary key of the classes table).\nBy defining a foreign key constraint, a relational database can guarantee that invalid data cannot be inserted. That is, if a record with id=99 doesn\u0026rsquo;t exist in the classes table, the students table cannot insert a record with class_id=99.\nSince foreign key constraints reduce database performance, most Internet applications, in pursuit of speed, do not set foreign key constraints and solely rely on the application itself to ensure logical correctness. In this case, class_id is just an ordinary column, it simply acts as a foreign key.\nTo delete a foreign key constraint, it is also achieved via ALTER TABLE:\n1 2 ALTER TABLE students DROP FOREIGN KEY fk_class_id; Note: Deleting a foreign key constraint does not delete the foreign key column itself. Deleting a column is achieved via DROP COLUMN ....\nMany-to-Many By associating a foreign key in one table with another table, we can define a one-to-many relationship. Sometimes, we also need to define a \u0026ldquo;many-to-many\u0026rdquo; relationship. For example, one teacher can correspond to multiple classes, and one class can also correspond to multiple teachers. Thus, there is a many-to-many relationship between the class table and the teacher table.\nA many-to-many relationship is actually implemented through two one-to-many relationships. 
That is, by using an intermediate table to connect two one-to-many relationships, a many-to-many relationship is formed:\nteachers table:\nid name 1 Mr. Zhang 2 Mr. Wang 3 Mr. Li 4 Mr. Zhao classes table:\nid name 1 Class 1 2 Class 2 Intermediate table teacher_class connecting two one-to-many relationships:\nid teacher_id class_id 1 1 1 2 1 2 3 2 1 4 2 2 5 3 1 6 4 2 Through the intermediate table teacher_class, we can know the relationship from teachers to classes:\nMr. Zhang with id=1 corresponds to Class 1 and Class 2 with id=1,2; Mr. Wang with id=2 corresponds to Class 1 and Class 2 with id=1,2; Mr. Li with id=3 corresponds to Class 1 with id=1; Mr. Zhao with id=4 corresponds to Class 2 with id=2. Similarly, we can know the relationship from classes to teachers:\nClass 1 with id=1 corresponds to Mr. Zhang, Mr. Wang, and Mr. Li with id=1,2,3; Class 2 with id=2 corresponds to Mr. Zhang, Mr. Wang, and Mr. Zhao with id=1,2,4; Therefore, through the intermediate table, we\u0026rsquo;ve defined a \u0026ldquo;many-to-many\u0026rdquo; relationship.\nOne-to-One A one-to-one relationship means that a record in one table corresponds to a single, unique record in another table.\nFor instance, every student in the students table can have their own contact information. If we store the contact details in another table contacts, we can obtain a \u0026ldquo;one-to-one\u0026rdquo; relationship:\nid student_id mobile 1 1 135xxxx6300 2 2 138xxxx2209 3 5 139xxxx8086 Some attentive readers might ask, since it\u0026rsquo;s a one-to-one relationship, why not just add a mobile column to the students table so they can be merged into one?\nIf the business logic allows, combining the two tables into one is entirely possible. However, sometimes if a student doesn\u0026rsquo;t have a mobile number, a corresponding record wouldn\u0026rsquo;t exist in the contacts table. 
In fact, a one-to-one relationship, strictly speaking, is the contacts table having a one-to-one correspondence with the students table.\nFurthermore, some applications split a large table into two one-to-one tables to separate frequently read fields from infrequently read ones for better performance. For example, splitting a large user table into a basic user information table user_info and a detailed user information table user_profiles. Most of the time, only the user_info table needs to be queried without querying user_profiles, which improves the query speed.\nSummary Relational databases can implement one-to-many, many-to-many, and one-to-one relationships using foreign keys. Foreign keys can either be constrained by the database or set without constraints, relying solely on the application\u0026rsquo;s logic to guarantee integrity.\nIndex In a relational database, if there are tens of thousands or even hundreds of millions of records, you need to use indexes to achieve very fast query speeds.\nAn index is a strictly pre-sorted data structure in a relational database for the values of one or multiple columns. By utilizing indexes, the database system doesn\u0026rsquo;t have to scan the entire table, but directly pinpoints the records that meet the criteria, greatly speeding up queries.\nFor example, for the students table:\nid class_id name gender score 1 1 Xiaoming M 90 2 1 Xiaohong F 95 3 1 Xiaojun M 88 If you frequently query based on the score column, you can create an index on the score column:\n1 2 ALTER TABLE students ADD INDEX idx_score (score); Using ADD INDEX idx_score (score) creates an index named idx_score that utilizes the score column. 
The index name is arbitrary, and if the index comprises multiple columns, they can be written sequentially in parentheses, for example:\n1 2 ALTER TABLE students ADD INDEX idx_name_score (name, score); The efficiency of an index depends on whether the values of the indexed column are dispersed—that is, the more distinct the column\u0026rsquo;s values are, the higher the index efficiency. Conversely, if a column\u0026rsquo;s records contain a large number of identical values, like the gender column where roughly half the values are M and the other half are F, creating an index for that column makes no sense.\nYou can create multiple indexes for a single table. The advantage of indexes is improved query efficiency, while the disadvantage is that when inserting, updating, and deleting records, the index needs to be modified simultaneously. Therefore, the more indexes there are, the slower the operations to insert, update, and delete records become.\nFor primary keys, relational databases will automatically create a primary key index for them. Using a primary key index is the most efficient because the primary key guarantees absolute uniqueness.\nUnique Index When designing relational data tables, columns that appear unique, such as ID numbers and email addresses, should not be modeled as primary keys due to having business significance.\nHowever, based on business requirements, these columns still have a uniqueness constraint: meaning two records cannot store the exact same ID number. At this point, we can add a unique index to this column. 
For example, assuming the name in the students table cannot be duplicated:\n1 2 ALTER TABLE students ADD UNIQUE INDEX uni_name (name); Through the UNIQUE keyword, we have added a unique index.\nYou can also add a unique constraint to a certain column without creating a unique index:\n1 2 ALTER TABLE students ADD CONSTRAINT uni_name UNIQUE (name); In this scenario, the name column doesn\u0026rsquo;t have an index, but still maintains a uniqueness guarantee.\nRegardless of whether an index is created or not, using a relational database will make no difference to the user and the application. This implies that when we query the database, if a matching index is available, the database system will automatically utilize the index to boost query efficiency. If no index exists, the query will still execute normally, but at a slower speed. Hence, indexes can be progressively optimized during database usage.\nSummary Creating indexes for database tables can accelerate query speeds;\nCreating unique indexes acts to guarantee the uniqueness of the values in a specific column;\nDatabase indexes are transparent to both users and applications.\nSELECT Query 1 2 3 4 5 6 7 8 SELECT column, another_column, … FROM mytable WHERE condition AND/OR another_condition AND/OR …; SELECT * FROM movies WHERE year\u0026gt;=2010 AND length_minutes\u0026lt;120; Filtering Numeric Attribute Columns Keyword Example =, !=, \u0026lt; \u0026lt;=, \u0026gt;, ≥ col_name != 4 BETWEEN … AND … Between two numbers col_name BETWEEN 1.5 AND 10.5 NOT BETWEEN … AND … col_name NOT BETWEEN 1 AND 10 IN (…) In a list col_name IN (2, 4, 6) NOT IN (…) col_name NOT IN (1, 3, 5) Filtering String Attribute Columns = Exactly equals != or \u0026lt;\u0026gt; Does not equal LIKE Equivalent to = without wildcards NOT LIKE Equivalent to != without wildcards % Wildcard col_name LIKE \u0026ldquo;%AT%\u0026rdquo; _(Underscore) col_name LIKE \u0026ldquo;AN_\u0026rdquo; 1 2 3 4 5 6 7 /* Wildcards */ col_name LIKE 
\u0026#34;%AT%\u0026#34;; /* \u0026#34;AT\u0026#34;, \u0026#34;AT*...\u0026#34;, \u0026#34;...*AT\u0026#34;, \u0026#34;...*AT*...\u0026#34; all satisfy the condition There can be arbitrary characters before and after \u0026#34;AT\u0026#34; */ col_name LIKE \u0026#34;AN_\u0026#34;; /* \u0026#34;AND\u0026#34; is okay, \u0026#34;AN\u0026#34;, \u0026#34;ANDD\u0026#34; are not Similar to \u0026#39;%\u0026#39;, but only represents a single character */ Filtering / Sorting 1 2 3 4 /* Use the DISTINCT keyword to specify that a certain attribute column or columns return uniquely */ SELECT DISTINCT column, another_column, … FROM mytable WHERE condition(s); 1 2 3 4 5 6 7 8 9 10 11 /* Sort the results based on one or more attribute columns */ SELECT column, another_column, … FROM mytable WHERE condition(s) /* ASC for ascending or DESC for descending */ ORDER BY column ASC/DESC /* LIMIT specifies how many rows of results to return OFFSET specifies from which row to start returning */ LIMIT num_limit OFFSET num_offset; /* Regarding OFFSET, if you want to output the Nth row (and onwards) the parameter for OFFSET must be N-1 */ Example Problem SELECT Review Problems Using Expressions in Queries Actually, AS is not only used for assigning aliases to expressions; standard attribute columns and even tables can be assigned an alias, making the SQL easier to grasp.\n1 2 3 4 5 -- Examples of aliasing attribute columns and tables SELECT column AS better_column_name, … FROM a_long_widgets_table_name AS mywidgets INNER JOIN widget_sales ON mywidgets.id = widget_sales.widget_id; 1 2 3 4 5 -- Example containing an expression SELECT particle_speed / 2.0 AS half_particle_speed -- Divided the results by 2 FROM physics_data WHERE ABS(particle_position) * 10.0 \u0026gt;500 -- (The condition requires the absolute value of this attribute multiplied by 10 to be greater than 500); Performing Statistics in Queries 1 2 3 SELECT AGG_FUNC(column_or_expression) AS aggregate_description, … FROM mytable 
WHERE constraint_expression; Common statistical functions:\nFunction Description COUNT(*) COUNT(column) Counting! COUNT(*) counts the number of data rows, COUNT(column) counts the number of non-NULL rows in the column MIN(column) Finds the row with the smallest column value MAX(column) Finds the row with the largest column value AVG(column) Takes the average of the column for all rows SUM(column) Sums the column for all rows Grouped Statistics The GROUP BY data grouping syntax can group data by a specific col_name. For example, GROUP BY Year means grouping the data by year, placing data from the same year into the same group. If a statistical function is combined with GROUP BY, then the statistical result is constrained to the data within each group. The number of data rows resulting from a GROUP BY grouping is exactly the number of groups. For instance, with GROUP BY Year, however many years exist in the overall data, that same number of data rows will be returned, regardless of whether a statistical function is applied.\n1 2 3 4 5 -- Statistics using groupings SELECT AGG_FUNC(column_or_expression) AS aggregate_description, … FROM mytable WHERE constraint_expression GROUP BY column; In the GROUP BY grouping syntax, we know that the database first filters the data with WHERE, and then groups the results. What if we want to filter out a few more rows from the already grouped data? A less commonly utilized syntax, the HAVING syntax, will be employed to resolve this problem; it allows further SELECT filtering on the post-grouping data.\nThe HAVING syntax is identical to WHERE, except the result set it operates on is different. In the case of the small datasets in our examples, HAVING may not seem very useful, but when your data volume hits the thousands or millions with numerous attributes, it can be of immense help.\nJOIN Connections Database Normal Forms Database normal forms are standardizations for data table design. 
Under these normal form guidelines, the duplicate data stored by each schema is reduced to a minimum (assisting in maintaining data coherence). Meanwhile, under database normalization, tables no longer possess rigid data decoupling, permitting independent scaling (i.e., for example, the growth of car engines and cars is completely decoupled).\n1 2 3 4 5 6 7 8 SELECT column, another_table_column, … FROM mytable -- The primary table INNER JOIN another_table -- The table to be joined ON mytable.id = another_table.id -- Imagine the primary key join mentioned earlier, two identical ones are merged into 1 row WHERE condition(s) ORDER BY column, … ASC/DESC LIMIT num_limit OFFSET num_offset; The relation association described by the ON condition in this example:\nINNER JOIN (Inner) Connections First, combine the data from two tables together, any data from either table that fails to find a counterpart via ID will be discarded. At this stage, you may envision the post-join data as an aggregation of two tables, where the remaining SQL clauses will continue their execution upon this combination (imagine it\u0026rsquo;s exactly the same as the previous single-table operations). Another method for comprehending an INNER JOIN is to think of an INNER JOIN as the intersection of two sets.\nOUTER JOIN Outer Connections 1 2 3 4 5 6 7 8 -- Perform multi-table queries using LEFT/RIGHT/FULL JOINs SELECT column, another_column, … FROM mytable INNER/LEFT/RIGHT/FULL JOIN another_table ON mytable.id = another_table.matching_id WHERE condition(s) ORDER BY column, … ASC/DESC LIMIT num_limit OFFSET num_offset; When executing a connection on Table A to Table B, a LEFT JOIN retains all elements of A, unaffected by if they match successfully with B, conversely, a RIGHT JOIN retains all elements present inside B. 
Finally, a FULL JOIN keeps all rows from both A and B, whether or not they match.\nThese joins pair up the rows of the two tables while keeping the original rows of A or B; if a row has no counterpart in the other table, the missing columns of the result are filled with NULL.\nQuery Execution Order -- This is the complete SELECT query SELECT DISTINCT column, AGG_FUNC(column_or_expression), … FROM mytable JOIN another_table ON mytable.column = another_table.column WHERE constraint_expression GROUP BY column HAVING constraint_expression ORDER BY column ASC/DESC LIMIT count OFFSET offset; 1. FROM and JOIN FROM and JOIN execute first and determine the overall range of data. If several tables need to be joined, an intermediate temporary table may be generated for the later steps. In short, this step determines the source table of the data (including any temporary tables).\n2. WHERE Once the data source is determined, WHERE filters it according to the specified conditions and discards the rows that do not satisfy them. Every column used in the conditions must come from the tables named in FROM. AS aliases cannot be used in this phase, because an alias may refer to an expression that has not been evaluated yet.\n3. GROUP BY If GROUP BY is used, it groups the rows that remain after WHERE and computes the statistics, reducing the result to one row per group. This means that data other than the grouping columns and the aggregates is discarded after grouping.\n4. 
HAVING If GROUP BY is used, HAVING filters the grouped result further. In standard SQL, AS aliases cannot be used in this phase either (MySQL accepts them in HAVING as an extension).\n5. SELECT Once the result set is determined, SELECT is applied to its columns to pick them out or compute expressions, which fixes the content of the output.\n6. DISTINCT If the rows still contain duplicates, DISTINCT removes them.\n7. ORDER BY With the result set fixed, ORDER BY sorts it. Because the expressions in SELECT have already been evaluated, AS aliases can be used here.\n8. LIMIT / OFFSET As the last step, LIMIT and OFFSET cut the requested slice out of the sorted result.\nConclusion Not every SQL statement uses all of these clauses, but combining them flexibly and understanding how SQL executes makes it possible to solve many data problems at the SQL level rather than pushing every problem into application code.\nNULL If no value is supplied for a column, that column will most likely hold NULL. A common alternative is to give columns default values, such as 0 for numeric columns and \u0026quot;\u0026quot; for text columns. 
However, where NULL genuinely carries meaning, think carefully before replacing it with a default value. (For example, when computing an average over rows, a default of 0 is included in the calculation and skews the result, whereas NULL is skipped.)\nAnother situation in which NULL is hard to avoid is the outer join of multiple tables described earlier: when the data in tables A and B is asymmetric, NULL has to fill the missing values. In such cases, IS NULL and IS NOT NULL test whether a column is NULL.\nSELECT column, another_column, … FROM mytable WHERE column IS/IS NOT NULL AND/OR another_condition AND/OR …; Modifying Data Partially quoted from 修改数据 - SQL教程 - 廖雪峰的官方网站\nThe basic operations of a relational database are CRUD: Create, Retrieve, Update, Delete. For retrieval, we have already covered the SELECT statement in detail.\nFor inserting, deleting, and updating, the corresponding SQL statements are:\nINSERT: insert new records; UPDATE: update existing records; DELETE: delete existing records. 
We will go through the usage of these three statements in turn.\nINSERT Insertion Referenced in MySQL 插入数据 | 菜鸟教程\nFor example, to insert a new record into the users table, first list the names of the columns to fill, then supply the corresponding values in the same order in the VALUES clause:\nINSERT INTO table_name (column1, column2, column3, ...) VALUES (value1, value2, value3, ...); -- INSERT INTO users (username, email, birthdate, is_active) VALUES (\u0026#39;test\u0026#39;, \u0026#39;test@runoob.com\u0026#39;, \u0026#39;1990-01-01\u0026#39;, true); If you insert values for every column (that is, a complete row), the column list can be omitted:\nINSERT INTO users VALUES (NULL,\u0026#39;test\u0026#39;, \u0026#39;test@runoob.com\u0026#39;, \u0026#39;1990-01-01\u0026#39;, true); You can also insert several rows at once by listing multiple sets of values in the VALUES clause, each set enclosed in parentheses (...) 
and separated by commas:\nINSERT INTO users (username, email, birthdate, is_active) VALUES (\u0026#39;test1\u0026#39;, \u0026#39;test1@runoob.com\u0026#39;, \u0026#39;1985-07-10\u0026#39;, true), (\u0026#39;test2\u0026#39;, \u0026#39;test2@runoob.com\u0026#39;, \u0026#39;1988-11-25\u0026#39;, false), (\u0026#39;test3\u0026#39;, \u0026#39;test3@runoob.com\u0026#39;, \u0026#39;1993-05-03\u0026#39;, true); UPDATE Updating MySQL UPDATE 更新 | 菜鸟教程\nUPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition; Parameter Details:\ntable_name is the name of the table whose data you want to update. column1, column2, \u0026hellip; are the names of the columns to update. value1, value2, \u0026hellip; are the new values that replace the old data. WHERE condition is an optional clause that specifies which rows are updated. If the WHERE clause is omitted, every row in the table is updated. Additional notes:\nYou can update one column or several columns at the same time. You can specify any condition in the WHERE clause. You can update many rows of a single table in one statement. 
The WHERE clause is especially useful when you need to update only specific rows of a table.\n-- Update the value of a single column UPDATE employees SET salary = 60000 WHERE employee_id = 101; -- Update the values of multiple columns UPDATE orders SET status = \u0026#39;Shipped\u0026#39;, ship_date = \u0026#39;2023-03-01\u0026#39; WHERE order_id = 1001; -- Use an expression to update a value UPDATE products SET price = price * 1.1 WHERE category = \u0026#39;Electronics\u0026#39;; -- Update using a value from a subquery UPDATE customers SET total_purchases = ( SELECT SUM(amount) FROM orders WHERE orders.customer_id = customers.customer_id ) WHERE customer_type = \u0026#39;Premium\u0026#39;; When using a real relational database such as MySQL, the UPDATE statement reports the number of rows matched by the WHERE condition and the number of rows actually changed.\nFor example, updating the record with id=1:\nmysql\u0026gt; UPDATE students SET name=\u0026#39;大宝\u0026#39; WHERE id=1; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 MySQL reports 1, as shown by Rows matched: 1 Changed: 1 in the output.\nUpdating the record with id=999 instead:\nmysql\u0026gt; UPDATE students SET name=\u0026#39;大宝\u0026#39; WHERE id=999; Query OK, 0 rows affected 
(0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0 MySQL reports 0, as shown by Rows matched: 0 Changed: 0 in the output.\nDELETE Removal The basic syntax of the DELETE statement is:\nDELETE FROM \u0026lt;Table Name\u0026gt; WHERE ...; For example, to delete the record with id=1 from the students table:\n-- Delete the record with id=1: DELETE FROM students WHERE id=1; -- Query and observe the results: SELECT * FROM students; Note that the WHERE condition of DELETE selects the rows to delete in exactly the same way as the WHERE condition of UPDATE selects the rows to update. 
Likewise, a single DELETE statement can delete multiple records at once:\n-- Delete records with id=5,6,7: DELETE FROM students WHERE id\u0026gt;=5 AND id\u0026lt;=7; -- Query and observe the results: SELECT * FROM students; If the WHERE condition matches no records, the DELETE statement does not raise an error, and no records are deleted. For example:\n-- Delete the record with id=999: DELETE FROM students WHERE id=999; -- Query and observe the results: SELECT * FROM students; Finally, be extremely careful: as with UPDATE, a DELETE statement without a WHERE condition deletes the data of the entire table:\nDELETE FROM students; In this case, every record in the table is deleted. You must therefore be very careful when executing a DELETE statement. 
It is best to first run a SELECT statement to check that the WHERE condition selects the expected set of records, and only then use DELETE to delete them.\nWhen using a real relational database such as MySQL, the DELETE statement also reports the number of deleted rows.\nFor example, deleting the records with id=1 and id=999 in turn:\nmysql\u0026gt; DELETE FROM students WHERE id=1; Query OK, 1 row affected (0.01 sec) mysql\u0026gt; DELETE FROM students WHERE id=999; Query OK, 0 rows affected (0.01 sec) CREATE Creation MySQL 创建数据表 | 菜鸟教程\n-- Example users table CREATE TABLE users ( id INT AUTO_INCREMENT PRIMARY KEY, username VARCHAR(50) NOT NULL, email VARCHAR(100) NOT NULL, birthdate DATE, is_active BOOLEAN DEFAULT TRUE ); Example breakdown:\nid: user id, integer type, auto-incrementing, serving as the primary key. username: username, variable-length string, NULL not allowed. email: user email, variable-length string, NULL not allowed. birthdate: user\u0026rsquo;s date of birth, date type. is_active: whether the user is active, boolean type, default value TRUE. This is only a simple example using a few common data types: INT, VARCHAR, DATE, and BOOLEAN. Choose other data types according to your actual needs.\nThe AUTO_INCREMENT keyword creates an auto-incrementing column, and PRIMARY KEY defines the primary key.\nTo specify the character set and collation when creating the table, use the CHARACTER SET and COLLATE clauses (the storage engine is set separately with the ENGINE option):\nCREATE TABLE mytable ( id INT PRIMARY KEY, name VARCHAR(50) ) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; ","date":"2025-03-17T20:06:34+08:00","permalink":"/en/p/mysql/","title":"MySQL"}]