Prompt caching mean caching the KV cache in transformers. KV cahe is the caching of Key-Value pairs in attention computation
How is ollama different from hugging face, does ollama download the models locally on our system like hugging face?
ollama gives you infrastructure which hugging face does not.By infrastructure I mean, if I want to use qwen for helping me in coding task then I have to set up the state, maintain a list of ai-human message , in case of hugging face. But, if I am using ollama, then it provides the chat interface and it maintains the state on its own. ollama reduces the engineering work.
still not clear!!!
Prompt caching helps when requests per minute is huge