
Enhancing Language Models with Thinking Tokens for Improved Calculations

Language models are enhanced with the concept of "thinking tokens" to allow more time for computation, improving accuracy in complex calculations. By inserting special tokens at strategic points in sentences, the model can allocate additional processing time, similar to human thinking pauses. This innovation shows promise in reducing errors and improving performance on tasks such as mathematics. Training and analyzing language models with thinking tokens reveals potential benefits in accuracy and efficiency, particularly in contexts involving numerical values. The goal is to develop a self-adjusting model that determines the optimal use of thinking tokens for producing the best possible answers.


Presentation Transcript


  1. Thinking Tokens for Language Modeling. David Herel, Tomas Mikolov (CTU FEE, CIIRC CTU). AITP 2023

  2. How much is 56 times 37?

  3. Llama 2 70B

  4. How much is 56 times 37? Correct answer: 2072

  5. Language models for calculations
     - LMs often make mistakes in difficult calculations
     - They are unable to perform complex reasoning
     - Large training sets and great memorization capability
     - Humans also cannot perform this calculation immediately
     - They require a considerable amount of time to construct the solution

  6. Thinking Tokens
     - As a parallel to human behavior, we propose to give the model more time to "think" by using special "thinking tokens"
     - Each thinking token buys more time to run additional computations (see the sketch below)
     - Great potential in RNNs due to their architecture
     - Example: "How much is 56 times 37? <T> <T> <T> ... <T> 2072"
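
To make the mechanism concrete, here is a minimal sketch (not the authors' code) of an RNN language model in which <T> is simply another vocabulary item: every <T> the model consumes costs one more recurrent update of the hidden state before the answer token must be produced, which is the "extra computation" the slide describes. The class name and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ThinkingRNNLM(nn.Module):
    """Illustrative RNN LM: <T> is just another token id, so each <T>
    read at inference time buys one extra recurrent update of the
    hidden state before the answer must be produced."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        # token_ids: (batch, seq_len); <T> ids flow through like any word,
        # each occupying one timestep, i.e. one extra step of computation.
        out, hidden = self.rnn(self.embed(token_ids), hidden)
        return self.head(out), hidden
```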

  7. Proof of Concept
     - We have added N thinking tokens ("<T>") after each observed word in a dataset (a preprocessing sketch follows below)
     - Our vision is that this basic concept can be extended to a self-adjusting model, which will decide by itself how many thinking tokens to use
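
A minimal sketch of that preprocessing step, assuming whitespace tokenization and the literal token string "<T>"; the slides leave N unspecified, so n=3 here is an arbitrary choice:

```python
THINK = "<T>"

def add_thinking_tokens(words, n=3):
    """Insert n thinking tokens after every observed word.
    (Illustrative sketch; n is an assumption, not from the slides.)"""
    out = []
    for word in words:
        out.append(word)
        out.extend([THINK] * n)
    return out

# "How much is ..." -> "How <T> <T> <T> much <T> <T> <T> is <T> <T> <T> ..."
print(" ".join(add_thinking_tokens("How much is 56 times 37 ?".split(), n=3)))
```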

  8. Proof of Concept
     - RNN LM as the baseline model vs. an RNN LM with "thinking tokens"
     - Trained on standard LM tasks, mathematics datasets, and economics datasets
     - We identified the sentences with the largest measured difference in perplexity to see where the usage of "thinking tokens" could be beneficial

  9. Results
     - The loss generated by "thinking tokens" is omitted from the calculation of perplexity (see the sketch below)
     - <T> helps in sentences that include specific numbers (e.g. "three studio albums") or representative symbols of numerical values, such as N
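
A sketch of that evaluation rule, assuming PyTorch and per-position cross-entropy; the function name and tensor shapes are illustrative, not the paper's code:

```python
import math
import torch.nn.functional as F

def perplexity_without_thinking(logits, targets, think_id):
    """Perplexity over real-word positions only: positions whose target
    is the <T> token are masked out of the average, mirroring the rule
    that thinking-token loss is omitted."""
    # logits: (seq_len, vocab_size), targets: (seq_len,)
    nll = F.cross_entropy(logits, targets, reduction="none")
    keep = targets != think_id          # drop <T> positions from the average
    return math.exp(nll[keep].mean().item())
```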

  10. Results

  11. Future work
     - Create a model that can decide by itself how much extra time is needed to produce the best possible answer
     - That means it could choose how many thinking tokens, if any, should be generated (a decoding sketch follows below)
     - End-goal example: "How much is 56 times 37? <T> <T> <T> ... <T> 2072"
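
One way this end goal could look at decoding time, as a hypothetical sketch built on the ThinkingRNNLM above: the model keeps emitting <T> for as long as it chooses, and we only cap runaway thinking. Greedy decoding, max_think, and the eos_id convention are all assumptions, not the authors' design.

```python
import torch

def generate_with_thinking(model, prompt_ids, think_id, eos_id,
                           max_think=16, max_new=32):
    """The model itself decides how many <T> tokens (if any) to emit
    before committing to an answer; every emitted <T> is fed back in,
    giving the RNN one more hidden-state update."""
    ids = list(prompt_ids)
    logits, hidden = model(torch.tensor([ids]))   # consume the prompt
    thinks = 0
    for _ in range(max_new):
        next_id = int(logits[0, -1].argmax())
        if next_id == think_id:
            thinks += 1
            if thinks > max_think:                # safety cap on "thinking"
                break
        ids.append(next_id)
        if next_id == eos_id:
            break
        logits, hidden = model(torch.tensor([[next_id]]), hidden)
    return ids, thinks
```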

  12. Thank you!

  13. Bonus material

  14. Bonus material

  15. Bonus material
