The article discusses how to use prompt caching to reduce input tokens when working with Anthropic's Claude model. Here’s a summary of key points:
Key Concepts:
- Prompt Caching: A technique that lets you cache (store) a stable prompt prefix, such as the system prompt, once and reuse it across multiple requests; cached tokens are read back at a much lower cost instead of being reprocessed as fresh input tokens on every call.
- Input Tokens vs. Output Tokens:
- Input tokens: The tokens (not characters; roughly word fragments) in the prompts and other inputs you send to the model.
- Output tokens: The tokens Claude generates in its response.
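Since tokens, not characters, are what the model counts and bills, a quick comparison helps. The ~4-characters-per-token figure below is only a common rule of thumb for English text, not Claude's actual tokenizer:

```python
prompt = "Prompt caching stores a stable prompt prefix for reuse."

# Character count -- NOT what you are billed for.
char_count = len(prompt)

# Very rough token estimate, assuming ~4 characters per token for English.
# This heuristic is an illustration, not the real tokenizer.
approx_tokens = char_count / 4

print(char_count, approx_tokens)
```

Exact counts require the model's tokenizer, but the gap between the two numbers shows why "tokens" and "characters" must not be conflated.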
Steps to Implement Prompt Caching:
- Load Dataset: Load the dataset containing the TED Talk transcripts.

```python
import pandas as pd

# Keep the first 10 talks and only the id and transcript columns.
df_ted = pd.read_csv('TED_Talk.csv').head(10)[['talk__id', 'transcript']]
```
- Set Cache Control: Mark your system prompt for caching using `cache_control`.

```python
self._system = [
    {
        "type": "text",
        "text": get_prompt(),
        "cache_control": {"type": "ephemeral"},  # cache breakpoint
    }
]
```
- Invoke Model: Ensure the same cached system prompt (with its `cache_control` breakpoint) is sent on every request, so subsequent calls read the prefix from the cache instead of reprocessing it.
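The invocation can be sketched as follows, assuming the `anthropic` Python SDK; the model name and the `build_request` helper are illustrative assumptions, not part of the article's code:

```python
def build_request(system_blocks, transcript):
    """Assemble a Messages API payload that reuses the cached system prompt."""
    return {
        "model": "claude-3-5-sonnet-20241022",  # assumed model name
        "max_tokens": 1024,
        "system": system_blocks,  # identical cached blocks on every call
        "messages": [{"role": "user", "content": transcript}],
    }

system = [
    {
        "type": "text",
        "text": "Summarize the following TED Talk transcript.",
        "cache_control": {"type": "ephemeral"},  # cache breakpoint
    }
]

request = build_request(system, "Transcript text goes here...")

# With an API key configured, the call would look like:
# import anthropic
# client = anthropic.Anthropic()
# response = client.messages.create(**request)
# response.usage.cache_creation_input_tokens  # > 0 on the first call
# response.usage.cache_read_input_tokens      # > 0 on subsequent calls
```

The `usage` fields on the response let you verify the cache is working: the first request pays the cache-write cost, and later requests show cached tokens being read instead.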
Read the full article at Towards AI - Medium