Given enough time to "think," small language models can outperform larger language models on math and coding tasks by generating multiple candidate answers and verifying them.
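The generate-and-verify idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `make_noisy_solver` is a hypothetical stand-in for a small model that answers correctly only 30% of the time, and `verify` is an exact checker, whereas real verifiers (unit tests, reward models) are usually imperfect. It is not the method of any particular system, only a toy showing why extra samples help when answers can be checked.

```python
import random
from typing import Callable, Optional


def best_of_n(
    generate: Callable[[], int],
    verify: Callable[[int], bool],
    n: int,
) -> Optional[int]:
    """Sample up to n candidates; return the first one the verifier accepts."""
    for _ in range(n):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # no candidate passed verification


def make_noisy_solver(
    a: int, b: int, accuracy: float, rng: random.Random
) -> Callable[[], int]:
    """Hypothetical stand-in for a small model: returns a * b with probability `accuracy`."""
    def solve() -> int:
        if rng.random() < accuracy:
            return a * b
        return a * b + rng.randint(1, 9)  # deliberately wrong answer
    return solve


def success_rate(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which at least one of n samples passes verification."""
    rng = random.Random(seed)
    a, b = 17, 23
    solver = make_noisy_solver(a, b, accuracy=0.3, rng=rng)

    def verify(answer: int) -> bool:
        # Assumed exact checker; real verifiers are noisier.
        return answer == a * b

    hits = sum(best_of_n(solver, verify, n) is not None for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(f"n={n:2d}  success rate = {success_rate(n):.3f}")
```

With independent samples and a perfect checker, the chance that at least one of n samples is correct is 1 - (1 - p)^n, so a solver with p = 0.3 succeeds about 94% of the time at n = 8. The gain shrinks when the verifier itself makes mistakes, which this sketch does not model.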