Pass only the first couple of PDF pages to "local" AI

MarvinMarvelouis · October 17, 2025, 8:29pm

I have some lengthy documents that I want to pass to a local AI (Ollama/MLX), though my use case, file renaming, does not require the full content. In a couple of cases it took quite a long time to get a result from the model (Phi4), so I am looking for ways to speed things up.

What would you suggest? Does it make sense to pass only the first 2-3 pages to the local AI? Is there a simple way to achieve that in DT/AppleScript? Anything else?

I use AppleScript to call local AI services:

set theResponse to get chat response for message thePrompt engine theEngine URL theUrl temperature 0 as theResultFormat

Here’s the prompt (model: Phi4).

property globalPrompt : "¬
Generate a concise, descriptive filename for the following text.¬
# Rules/Expectations for the filename:¬
- 3-6 words maximum, no more than 50 letters¬
- Use only lowercase letters, numbers, and underscores¬
- Capture the main topic or purpose of the given text¬
- Be specific enough to identify the content¬
  - For invoices: if a company name is given always use it¬
  - For invoices: use the main product¬
  - Ignore city names in invoices¬
  - For bookings: use the name of the place or the destination¬
- Ignore prices for filename¬
- Ignore first names or surnames of persons¬
- Avoid generic terms at all costs like \"document\", \"file\", or \"text\"¬
- Avoid any names of persons at all costs¬
# Steps¬
2. Capture the main topic or purpose based on the text¬
3. Extract dates from the text - ignore ones before the date saved in the CURRENT_DATE-variable¬
4. Order dates from newest to oldest¬
5. Use the oldest date in YYYY-MM-DD notation¬
6. Generate the name in German for the file - follow the rules mentioned earlier¬
7. Make sure the filename follows the rules mentioned earlier - fix it, if it does not stick to the rules¬
¬
You are an excellent JSON generator.¬
Extract date and filename from the input.¬
Ignore dates before the CURRENT_DATE-variable
¬
Filename format: [topic]_[subtopic]_[descriptor] (if applicable)¬
# expected json output fields with data type¬
```json¬
{¬
 \"new_filename\": \"string\",¬
 \"new_date\": \"date\"¬
}¬
```¬
¬
# example for expected output¬
```json¬
{¬
 \"new_filename\": \"this_is_the_file_name\",¬
 \"new_date\": \"1970-10-01\"¬
}¬
```¬
¬
# Variables¬

"

Thanks a lot.

cgrunenberg · October 18, 2025, 7:45am

For renaming or metadata retrieval it’s probably sufficient to use the first 1000 words of the plain text. Dates, authors, copyright notices and titles are usually all found at the beginning of a document.
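
A minimal sketch of the truncation in plain AppleScript, in case it helps. The handler name firstWords and its parameters are made up for illustration and are not part of DEVONthink's scripting dictionary. It relies only on AppleScript's built-in text ranges, so the punctuation and line breaks between the words are kept.

```applescript
-- Hypothetical helper: return roughly the first maxWords words of theText.
on firstWords(theText, maxWords)
	-- Short texts are passed through unchanged.
	if (count words of theText) ≤ maxWords then return theText
	-- A text range bounded by words keeps the punctuation and whitespace in between.
	return text from word 1 to word maxWords of theText
end firstWords

-- Stand-alone check with a sample string.
set sampleText to "The quick brown fox jumps over the lazy dog."
firstWords(sampleText, 4)
--> "The quick brown fox"
```

Inside the existing script this would presumably be called as `my firstWords(plain text of theRecord, 1000)` within the `tell` block, where theRecord stands for whatever record the script is processing, and the result would be appended to the prompt in place of the full text. How the prompt and the document text are combined is not shown above, so that part is an assumption.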

P.S.: Any reason you’re using Phi 4? In my limited testing this model didn’t excel in any way, and by 2025 standards it has a relatively small context window compared to alternatives like GPT-OSS, Gemma 3 or Mistral Small 3.2.