Whitespace in RegEX

Since the document is standardised, I could grab the value according to the line number. :smiley:

Could you point me to a good place on the internet where I could look up how to script that?

It must be easy enough that I can maybe teach myself?


I anonymised the respective fields.

Line 7 connects to line 18, and so on.

This proves the underlying text layer has little relationship to what you’re looking at in the file.

Are you just trying to pull the NORSEDE71XXX value from the file?

Try running this on your original file:

tell application id "DNtp"
	set theRecords to selected records
	repeat with theRecord in theRecords
		set documentText to plain text of theRecord
		set lns to paragraphs of documentText
		set theContent to item 18 of lns # use whatever line number you need
		display dialog theContent
	end repeat
end tell
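
If the line number ever shifts between documents, an alternative would be to match the value by its shape instead of its position. Below is a minimal Python sketch of that idea, not something from the script above. The pattern (four capital letters followed by four to eight capitals or digits) is my assumption based on the anonymised NORSEDE71XXX value, and `find_code` and the sample text are made up for illustration. Any other all-caps word of that length would also match, so it would need tightening against real data.

```python
import re

# Assumed shape of the value: 4 capital letters, then 4 to 8 capitals/digits.
# This is a guess derived from the anonymised sample, not a bank standard.
CODE_PATTERN = re.compile(r"\b[A-Z]{4}[A-Z0-9]{4,8}\b")

def find_code(text):
    """Return the first token matching CODE_PATTERN, or None if there is none."""
    match = CODE_PATTERN.search(text)
    return match.group(0) if match else None

# Made-up stand-in for the OCR text layer; note the stray whitespace.
sample = "Account holder\n  BIC:   NORSEDE71XXX  \nPage 1 of 2"
print(find_code(sample))
```

Because the pattern is bounded by `\b` on both sides, the whitespace around the value does not end up in the result.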

I’m playing with this myself, so I can’t actually point you to a source (it being a combination of scripts I have written, searching my brain and the 'net). I was inspired by this post on Stack Overflow.

As I said, it is more of an educational project to learn something.
I chose this entry point because I have a lot of these documents, and if something goes wrong it doesn’t matter.

I wanted to learn how to automate renaming files.
So, as a first exercise: build the filename from the date and the recipient via regex.
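
To make that exercise concrete, here is a rough Python sketch of pulling a date and a recipient out of the text and building a filename from them. Everything about the input is assumed for illustration: the `DD.MM.YYYY` date format, the `Recipient:` label, the sample text, and the `build_filename` helper are hypothetical and would have to be adapted to the real documents.

```python
import re

# Assumed formats (not from the actual documents): a DD.MM.YYYY date somewhere
# in the text, and a line such as "Recipient:   Jane Doe".
DATE_RE = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")
RECIPIENT_RE = re.compile(r"^\s*Recipient\s*:\s*(.+?)\s*$", re.MULTILINE)

def build_filename(text):
    """Return 'YYYY-MM-DD Recipient.pdf', or None if either part is missing."""
    date = DATE_RE.search(text)
    recipient = RECIPIENT_RE.search(text)
    if not (date and recipient):
        return None
    day, month, year = date.groups()
    # Collapse runs of whitespace left over from OCR into single spaces.
    name = re.sub(r"\s+", " ", recipient.group(1))
    return f"{year}-{month}-{day} {name}.pdf"

# Made-up sample with the kind of irregular spacing OCR tends to produce.
sample = "Statement date 03.05.2021\n  Recipient :   Jane    Doe  \n"
print(build_filename(sample))
```

The `\s*` pieces are what absorb the irregular whitespace around the label and the value; the date is reordered to year-month-day so the files sort chronologically.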

Regex isn’t really a “beginner’s topic” but if you’ve got the time and energy, it’s an interesting and very powerful technology.

Here’s a famous quote from a well-known hacker back in the day…

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.”
Now they have two problems.
~ Jamie Zawinski

The results stay the same (whether printing a PDF or a JPG as PDF).
A couple of years ago I had Adobe Acrobat Pro, and there you could create text fields and then define their order.

Meaning you could guide the user through the document as they fill it out.

So you could set where the cursor jumps to next when pressing Tab. At first I thought that was the reason why the fields are stacked like this.

But since OCR apparently reads it this way anyway, there must be a different logic behind it.

Anyway, problem solved, even though @Blanc helped far too much… :wink: I think I can cross this item off my bucket list anyway, since I came up with the idea to grab the line. :relaxed:

(Thank you all very much)

This is unrelated. You’re not using input fields here.

I suppose that the OCR engine treats your document as if it had a columnar layout, which would be OK and desired for normal text (e.g. a page in a magazine or a newspaper article). However, for a bank statement, it is certainly not what one wants. Unfortunately, you can’t influence the OCR engine’s behaviour, so you probably have no chance to correct it.

Out of interest: is this by any chance happening with account statements from the German ING bank? I see similar behaviour here with mine.