Experimenting with llama2 LLM for local file classification (renaming, summarizing, analysing)

Follow-up from OpenAI ChatGPT for automatic generation of matching filenames - #3 by syntagm

ChatGPT works extremely well to get some logic into OCRed documents and PDFs, but it would be nice to do this locally with llama2. I did a lot of playing around with it but wasn’t able to turn it into something useful (yet).

First of all, here’s my script:

# function to generate a random string
on randomString(length)
  set theCharacters to "abcdefghijklmnopqrstuvwxyz0123456789"
  set theResult to ""
  repeat length times
    set theResult to theResult & character (random number from 1 to length of theCharacters) of theCharacters
  end repeat
  return theResult
end randomString

# store filecontent into a temporary txt file and return the path to it
on storeFileContent(filecontent)
--  set uniqueIdentifier to current application's NSUUID's UUID()'s UUIDString as text
  set uniqueIdentifier to my randomString(20)
  set posixtmpfile to POSIX path of (path to temporary items folder) & uniqueIdentifier & ".txt"

  try
    set fhandle to open for access posixtmpfile with write permission
    write filecontent to fhandle as «class utf8»
    close access fhandle

    return posixtmpfile
  on error
    try
      close access fhandle
    end try
  end try
end storeFileContent

on processRecord(theRecord)
  tell application id "DNtp"
    if type of theRecord as text is "group" or (word count of theRecord) is 0 then return
    set c to plain text of theRecord

    # truncate c to at most 8000 characters so the prompt stays within the model's context window
    if length of c > 8000 then
      set c to text 1 thru 8000 of c
    end if

    set posixtmpfile to my storeFileContent(c)

    log "temporary filepath: " & posixtmpfile

    # current date as "yyyy-mm" (currently unused, but handy for date-based filenames)
    set currentDate to do shell script "date +%Y-%m"

    set theCommand to "/opt/homebrew/bin/ollama run llama2:7b-chat-q5_K_M \"You are a filename generation AI. Given the following text, output exactly 1 descriptive filename option that could match this content and could be its filename on disk. Output the possible file name in quotes. Use spaces instead of underscores to separate words. Do not output a file extension, only the name. Include date if applicable. Only output the filename and nothing else, do not chat, no preamble, get to the point. Your output format should be: `Filename: <your suggested filename>`\" \"$(cat " & quoted form of posixtmpfile & ")\""

    log "executing: " & theCommand

    try
      set theResult to do shell script theCommand

      log "command result: " & theResult
      display dialog theResult

--      set name of theRecord to theResult
    on error errorMessage number errorNumber
      log errorMessage
      display dialog "Error: " & errorMessage & " (" & errorNumber & ")"
    end try
  end tell
end processRecord

on performSmartRule(theRecords)
  tell application id "DNtp"
    repeat with theRecord in theRecords
      my processRecord(theRecord)
    end repeat
  end tell
end performSmartRule

-- this is for testing so we can just execute with osascript xxx.applescript and don't need to put it into a smartrule first
tell application id "DNtp"
  set theRecords to selected records
  my performSmartRule(theRecords)
end tell

How to set up llama2

The easiest method is ollama and I would recommend it: download it from https://ollama.ai and follow the instructions.

Run which ollama on the command line to find where your ollama installation is. If you used brew install ollama it will be in /opt/homebrew/bin/ollama, but adjust the path in the script to whatever fits your system.
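For example (assuming Homebrew; the exact path depends on your machine):

```shell
# install ollama via Homebrew (or download the app from ollama.ai instead)
brew install ollama

# find the binary path to put into the AppleScript
which ollama
# typically /opt/homebrew/bin/ollama on Apple Silicon, /usr/local/bin/ollama on Intel
```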

Picking a model

ollama can pull a bunch of models out of the box (see its library), and you can pull them with ollama pull <model>:

  • 3b models are the smallest and the dumbest
  • 7b models are bigger, but require at least 16gb of memory
  • 13b models need at least 32gb of memory and are a good bit slower
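For instance, to grab a couple of the 7b variants discussed below and check what you have locally (model tags are from the ollama library; each download is several GB):

```shell
# pull candidate models (several GB each, so this takes a while)
ollama pull llama2:7b-chat-q4_K_M
ollama pull llama2:7b-chat-q5_K_M

# list what's installed locally, with sizes
ollama list
```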

Quantizations:

  • the higher the q number in the model name, the higher the bit quantization. tl;dr: more bits = more memory use and better output quality, but slower generation. q4 is the standard one, q5 is a tad better. Going above q5 is probably not gonna give noticeably better results for 3-13b models, so no need to try them (I think?)
  • K_M models are said to be the best mix of the bunch

Chat vs text vs instruct

  • the -chat models are finetuned to work like a chat, so like chatgpt you say “hello model” and it responds to you like a chatbot
  • the -text models are default LLM models, so a very smart autocompletion engine. You write “Hello world, my name is llama, I am” and it generates text that should come after this. This is going to give you the best results, but needs more thinking how to tweak it
  • then there is stuff like codellama:7b-instruct which are for code completion, but the instruct models are said to be better at handling specific instructions (eg: “do X”)

Anyway I would recommend the following models (pull them with ollama pull <model>):

  • llama2 - this is the default model and the same as 7b-chat-q4_0
  • llama2:7b-chat-q4_K_M (using K_M instead of the default quantization)
  • llama2:7b-chat-q5_K_M (same but with q5)
  • llama2:7b-text-q5_K_M or llama2:7b-text-q4_K_M - same as above, but non-chat models

All of the above also come in 13b variants if your Mac has enough memory and power.

Testing with DEVONthink

Plug the model and prompt into the script I posted above and run osascript xxx.applescript while a document is selected in DEVONthink.

The prompt I tinkered with is

You are a filename generation AI. Given the following text, output exactly 1 descriptive filename option that could match this content and could be its filename on disk. Output the possible file name in quotes. Use spaces instead of underscores to separate words. Do not output a file extension, only the name. Include date if applicable. Only output the filename and nothing else, do not chat, no preamble, get to the point. Your output format should be: Filename: <your suggested filename>

My problem is that it flat out ignores my instructions. I tell it not to output a file extension, and it outputs a file extension. I tell it to only output a filename, and it responds with “Sure thing, based on the content you gave me, a filename could be: xxx.txt”

I got around this by doing some extra work, like extracting only the text inside quotes from the result, or searching for Filename: xxx and extracting that, but sometimes it doesn’t even output that.
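That cleanup can be done in shell before the script uses the result. A minimal sketch (the sample reply is made up; the regexes are just one way to do it):

```shell
# sample chatty model output we want to clean up
reply='Sure thing! Based on the content, a filename could be: "Booking Confirmation 2023-09".txt'

# 1) prefer whatever sits inside the first pair of double quotes
name=$(printf '%s\n' "$reply" | sed -n 's/[^"]*"\([^"]*\)".*/\1/p')

# 2) fall back to anything after "Filename:" if no quotes were found
if [ -z "$name" ]; then
  name=$(printf '%s\n' "$reply" | sed -n 's/.*Filename:[[:space:]]*//p')
fi

# 3) strip a trailing extension the model added anyway
name=$(printf '%s\n' "$name" | sed 's/\.[A-Za-z0-9]\{1,4\}$//')

printf '%s\n' "$name"
# → Booking Confirmation 2023-09
```

The same logic could be bolted onto theCommand in the AppleScript as an extra pipe stage, but it stays a heuristic: if the model outputs neither quotes nor "Filename:", there is nothing to extract.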

Then when the file content is long, it often ignores my instructions completely and just starts telling me things about the document lol

I think using the -chat models is the wrong way to go at this, and we have to use the non-chat models, which is how LLMs are designed to work; ChatGPT just spoiled us into the chat method. So a possible prompt could be:

This is the output of a program that outputs filenames based on content

Content:
---
Booking confirmation, flight XXX to YYY
Date: 2023-09-15
Passenger: XXX
---
Filename: Booking Confirmation

Content:
---
blah blah blah
---
Filename: Something Else

Content:
---
{actual file content here}
---
Filename:

The model will then complete the prompt, so it will hopefully output something more coherent, but this needs more work on extracting the result and telling the model when to stop.

It’s also possible to tweak the models even more with Modelfile to change temperature and system prompt, see: GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

If anyone has ideas or came up with a good way to get this to work, please share :slight_smile:


I think this is brilliant, thanks for sharing.

I’ve been testing this sort of stuff for some time, but local LLMs severely underperform compared to GPT-3.5. I guess we are not there yet. The option for now is to just pay OpenAI and use their API.

My use-case wishlist includes:

  • Automatic Tagging (Might need training on past criteria)
  • Summarize document
  • “Smarter” find related documents, and grouping

Llama3 is out

Llama3 still doesn’t perform as well as the online models.