What is the difference between a lock convoy and lock/thread contention?

From Wikipedia on lock convoy:

A lock convoy occurs when multiple threads of equal priority contend repeatedly for the same lock. Unlike deadlock and livelock situations, the threads in a lock convoy do progress; however, each time a thread attempts to acquire the lock and fails, it relinquishes the remainder of its scheduling quantum and forces a context switch. The overhead of repeated context switches and underutilization of scheduling quanta degrade overall performance.

From Wikipedia on lock/thread contention:

lock contention: this occurs whenever one process or thread attempts to acquire a lock held by another process or thread. The more fine-grained the available locks, the less likely one process/thread will request a lock held by the other. (For example, locking a row rather than the entire table, or locking a cell rather than the entire row.);

Could somebody please elaborate a bit further on both of these concepts? To me they seem essentially the same, or, if they are not, then surely lock contention causes a lock convoy. Is that the case, or are they separate and independent concepts? Also, I don’t understand the sentence "it relinquishes the remainder of its scheduling quantum and forces a context switch".

Design choice to avoid lock contention

I have a network application F. It receives requests from one or more client network functions and uses an epoll loop to handle requests from multiple clients. F maintains a state machine for each client (user), as well as some context for each user. Whenever a message from a user is received from a client network function, F processes the message and, if required, fetches that user's context and updates it.

Currently, the contexts of different users are maintained in a C++ STL map.

std::map<clientId, context> userMap;

where clientId is an integer and context is a struct containing user-specific data. Whenever I need to access userMap, I first take a lock, access the data, and then unlock.

For example, suppose a client sends a request message X. In the server F, epoll_wait() reports the event (i.e. the incoming message). The message is then read from the socket and processed further: the server invokes the handleX() method (if the user's state and context are consistent and allow handleX to be executed for this user). Each handler like handleX needs the current user context for some computation, so it locks userMap, gets or sets the data, and then unlocks it.

In the single-threaded version of F, one thread waits for events and processes them one by one.

I tried using a thread pool to test the multicore scalability of F. In this version, a single thread reads messages from the socket and puts them in a queue, and a pool of worker threads waits on the queue and picks up whatever messages are pushed onto it. However, the throughput is no better than that of the single-threaded version of F.

I think the locking is inherently serializing F's code. Is there another model for storing and retrieving user contexts that would minimize lock contention?