Recent advances in reinforcement-learning (RL) based routing protocols for vehicular ad-hoc networks (VANETs) enable vehicles and roadside units (RSUs) to dynamically learn optimal forwarding strategies in highly mobile, unpredictable traffic environments. These protocols formulate routing as a Markov decision process (MDP): the state may include vehicle position, speed, direction, neighbour connectivity and road-segment conditions; an action selects the next-hop vehicle or RSU; and the reward function is designed to optimise metrics such as packet delivery ratio, end-to-end delay, link stability, and energy or overhead cost. Deep reinforcement learning (DRL) techniques such as deep Q-networks (DQN), along with multi-agent Q-learning, are increasingly used to handle large state-action spaces and continuously changing mobility, adapting to varying traffic densities and topology breaks without relying on fixed heuristics. For example, hierarchical Q-learning with grouped RSUs divides the urban network into segments, which enables distributed learning and faster convergence than classic protocols. The trend is toward more agile, scalable RL-driven routing frameworks that learn from real-time vehicular behaviour, making VANET routing more autonomous and context-aware.
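
To make the MDP formulation concrete, the following is a minimal tabular Q-learning sketch of next-hop selection. It is an illustration under strong simplifying assumptions, not the method of any particular protocol: the state is reduced to the node currently holding the packet (real protocols would add position, speed, direction and link quality), the topology is a hypothetical static snapshot, and the reward is an assumed combination of negative per-hop delay and a delivery bonus. All names and values (`LINK_DELAY_MS`, `DELIVERY_BONUS`, `train`, `greedy_route`) are invented for this example.

```python
import random
from collections import defaultdict

# Hypothetical static snapshot of the topology: (node, next_hop) -> link delay in ms.
# A real VANET topology changes continuously; this is an assumption for illustration.
LINK_DELAY_MS = {
    ("A", "B"): 2.0, ("A", "C"): 8.0,
    ("B", "A"): 2.0, ("B", "D"): 2.0,
    ("C", "A"): 8.0, ("C", "D"): 2.0,
    ("D", "B"): 2.0, ("D", "C"): 2.0, ("D", "RSU"): 2.0,
}
DESTINATION = "RSU"
DELIVERY_BONUS = 20.0  # assumed reward for handing the packet to the destination


def neighbours(node):
    """Actions available in a state: the candidate next hops of `node`."""
    return [nxt for (src, nxt) in LINK_DELAY_MS if src == node]


def reward(node, next_hop):
    """Assumed reward: penalise link delay, reward successful delivery."""
    r = -LINK_DELAY_MS[(node, next_hop)]
    if next_hop == DESTINATION:
        r += DELIVERY_BONUS
    return r


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, max_hops=20, seed=0):
    """Learn Q(node, next_hop) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0
    sources = sorted({src for src, _ in LINK_DELAY_MS})
    for _ in range(episodes):
        node = rng.choice(sources)  # each episode routes one packet
        for _ in range(max_hops):
            actions = neighbours(node)
            if rng.random() < epsilon:
                nxt = rng.choice(actions)  # explore
            else:
                nxt = max(actions, key=lambda a: q[(node, a)])  # exploit
            # Value of the best action at the next hop (0 once delivered).
            if nxt == DESTINATION:
                future = 0.0
            else:
                future = max(q[(nxt, a)] for a in neighbours(nxt))
            target = reward(node, nxt) + gamma * future
            q[(node, nxt)] += alpha * (target - q[(node, nxt)])
            if nxt == DESTINATION:
                break
            node = nxt
    return q


def greedy_route(q, source, max_hops=10):
    """Follow the highest-valued next hop from `source` until delivery."""
    route = [source]
    while route[-1] != DESTINATION and len(route) <= max_hops:
        node = route[-1]
        route.append(max(neighbours(node), key=lambda a: q[(node, a)]))
    return route


if __name__ == "__main__":
    q_table = train()
    print(greedy_route(q_table, "A"))
```

On this toy graph the learned policy should prefer the low-delay path through B over the slower link to C. A DQN-style variant would replace the table `q` with a neural network over a richer state vector, and a hierarchical or multi-agent variant would keep one such learner per RSU group or vehicle rather than a single global table.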