I have a basic question about Hamming distances; something confuses me in the book I'm reading about it.

Let's assume we have a codeword $y_0$ that received an error $e$. Thus we had the following event:

$$ y_0 \rightarrow y' = y_0 + e $$

In Nielsen & Chuang, page 449, the authors say:

> Provided the probability of a bit flip is less than 1/2, the most likely codeword to have been encoded is the codeword $y$ which minimizes the number of bit flips needed to get from $y$ to $y'$, that is, which minimizes $\mathrm{wt}(e) = d(y, y')$.

Where $d(a,b)$ is the Hamming distance between $a$ and $b$, and $\mathrm{wt}(e)$ is the Hamming weight of $e$, which is $d(e,0)$.
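To make the definitions concrete, here is a minimal sketch (the bit strings are just illustrative examples) showing that $d(y, y') = \mathrm{wt}(e)$ holds by definition whenever $y' = y + e$ over $\mathbb{F}_2$:

```python
def d(a, b):
    """Hamming distance: number of positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def wt(e):
    """Hamming weight: number of nonzero entries, i.e. d(e, 0)."""
    return d(e, [0] * len(e))

y  = [0, 0, 0, 1, 1]                 # the encoded codeword
e  = [0, 1, 0, 0, 1]                 # the error pattern
yp = [a ^ b for a, b in zip(y, e)]   # y' = y + e (mod 2)

# y' differs from y exactly where e is nonzero:
assert d(y, yp) == wt(e)
```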

My question is the following:

I agree that if no bit flip is more probable than one bit flip, then the most likely codeword is the one closest to $y'$, i.e. a codeword $y$ minimizing $d(y, y')$.

However, by writing $d(y, y') = \mathrm{wt}(e)$, he seems to assume that the closest codeword to $y'$ is necessarily the one we encoded. That is not true in general: the error could have brought us closer to a different codeword than the one we encoded, right?
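To illustrate the scenario I have in mind, here is a small sketch (the code and error are just an example I chose) using the 3-bit repetition code, where a weight-2 error moves the received word closer to the *other* codeword:

```python
def d(a, b):
    """Hamming distance between two bit strings of equal length."""
    return sum(x != y for x, y in zip(a, b))

codewords = ["000", "111"]   # 3-bit repetition code
y  = "000"                   # what we actually encoded
# A weight-2 error e = 110 gives y' = y + e:
yp = "110"

# Nearest-codeword decoding now prefers 111 over 000:
dists = {c: d(c, yp) for c in codewords}   # {'000': 2, '111': 1}
decoded = min(dists, key=dists.get)        # '111', not the encoded '000'
```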

Where is my misunderstanding here?