Sheets - How to change column headings to first row or vice versa

gelbin · May 7, 2020, 5:51pm

Hi, I have imported a new sheet and the behavior in DTP3 seems to make the first row of the sheet the column headings for the sheet. I want to bump the rows down and make the first row the values for the column heading.

Alternatively, I need to know the reference for the column heading I guess for AppleScript.

BLUEFROG · May 7, 2020, 7:12pm

DTP3 seems to make the first row of the sheet the column headings for the sheet.

That’s pretty standard for tabular data.
Where did the sheet come from?

tell application id "DNtp"
	set sel to content record
	columns of sel
end tell

Note: There is a small change coming for referring to cells in a selected sheet.

gelbin · May 7, 2020, 7:39pm

a numbers csv file. it seems it used to import as no headers, but now the first row becomes a header row.

BLUEFROG · May 7, 2020, 7:48pm

Nothing has changed with the import of CSV. The first row in a CSV file is assumed to be the column headers.

Also, the export options in Numbers make a difference.

Exporting without table names will definitely treat the first row as column headers.
Exporting with the table names will have A, B, C, etc. as column headers.

gelbin · May 7, 2020, 10:25pm

thanks bluefrog. I wrote my script to read row one as the headers now I am debating whether I should have it read the column headers as the actual headers. I am thinking the best option would be to use the headers as proper headers and start my data at row one, however, I am not sure I completely understand your script above. what I need to do is be able to say for all items in columns, test if each matches the relevant headers I want to gather data for. I am under the impression that columns is an array of the header values, is that right? what I need to is to know what column number represent the fields I want.

for example if my header row looks like:

Author Filename Case Filepath Recipient

and I want the Author and Recipient, I run through columns(1)-columns(5) test if it matches “Author” or “Recipient” and then log index 1 and index 5 as the author and recipient indices. does that make any sense at all?

BLUEFROG · May 8, 2020, 1:14am

Can you post your code and a screen capture showing the top section of your sheet?

gelbin · May 8, 2020, 1:40am

you asked for it! it ain’t pretty but…

tell application id "DNtp"
	tell think window 1
		set theCustodianArray to {}
		set columnArray to {}
		set theMetaDataArray to {}
		set currRow to 2
		set currCol to 1
		set currFile to 1
		set currCounter to 1
		set customMetaD to ""
		set DatAuthor to "Author"
		set DatCustodian to "Custodian"
		set DatAllCustodians to "All Custodians"
		set DatConfidential to "Confidential"
		set DatFilePath to "File Path"
		set DatFileName to "File Name"
		set DatMD5Hash to "MD5 Hash"
		set DatTitle to "Title"
		set DatTimeCreated to "Date Time Created"
		set DatSubject to "Subject"
		set DatFrom to "From"
		set DatTo to "To"
		set DatCC to "CC"
		set DatBCC to "BCC"
		set DatDateTimeSent to "Date Time Sent"
		set BUTTON_Returned to "More"
		repeat until BUTTON_Returned is "Last"
			set valueReturned to (display dialog ("Enter Column with Metadata" & columnArray) with title ("Column Entry") buttons {"Last", "More"} default button "More" default answer "")
			set BUTTON_Returned to button returned of valueReturned
			set new_column to text returned of valueReturned as number
			copy new_column to the end of columnArray
		end repeat
		-- set columnArray to {new_column}
		set rowCnt to number of rows
		set colCnt to number of columns
		if rowCnt ≠ -1 then -- A sheet is not selected in the frontmost window
			-- repeat with theItem in columns
			repeat while currCol ≤ colCnt
				set datHeader to get cell at row 1 column currCol
				-- set datHeader to theItem of columns
				if datHeader = DatAuthor then
					set authorColumn to currCol
				end if
				if datHeader = DatCustodian then
					set custodianColumn to currCol
				end if
				if datHeader = DatAllCustodians then
					set allCustodiansColumn to currCol
				end if
				if datHeader = DatConfidential then
					set confidentialColumn to currCol
				end if
				if datHeader = DatFilePath then
					set filePathColumn to currCol
				end if
				if datHeader = DatFileName then
					set fileNameColumn to currCol
				end if
				if datHeader = DatMD5Hash then
					set md5HashColumn to currCol
				end if
				if datHeader = DatTitle then
					set titleColumn to currCol
				end if
				if datHeader = DatTimeCreated then
					set timeCreatedColumn to currCol
				end if
				if datHeader = DatSubject then
					set subjectColumn to currCol
				end if
				if datHeader = DatFrom then
					set fromColumn to currCol
				end if
				if datHeader = DatTo then
					set toColumn to currCol
				end if
				if datHeader = DatCC then
					set ccColumn to currCol
				end if
				if datHeader = DatBCC then
					set bccColumn to currCol
				end if
				if datHeader = DatDateTimeSent then
					set timeSentColumn to currCol
				end if
				set currCol to currCol + 1
			end repeat
			repeat while currRow ≤ rowCnt
				set file_name to get cell at row currRow column 1
				set fullfileName to file_name & ".pdf"
				repeat with theItem in columnArray
					-- set newMetaName to get cell at row 1 column theItem
					set newAuthorData to get cell at row currRow column authorColumn
					--set newTagComplete to (newTagName & ": " & newTag)
					set newCustodianData to get cell at row currRow column custodianColumn
					set newAllCustodiansData to get cell at row currRow column allCustodiansColumn
					set newConfidentialData to get cell at row currRow column confidentialColumn
					set newFilePathData to get cell at row currRow column filePathColumn
					set newMd5HashData to get cell at row currRow column md5HashColumn
					set newTitleData to get cell at row currRow column titleColumn
					set newTimeCreatedData to get cell at row currRow column timeCreatedColumn
					set newSubjectData to get cell at row currRow column subjectColumn
					set newFromData to get cell at row currRow column fromColumn
					set newtoData to get cell at row currRow column toColumn
					set newCCData to get cell at row currRow column ccColumn
					set newBccData to get cell at row currRow column bccColumn
					set newTimeSentData to get cell at row currRow column timeSentColumn
					my transfer_data(fullfileName, newAuthorData, newCustodianData, newAllCustodiansData, newConfidentialData, newFilePathData, newMd5HashData, newTitleData, newTimeCreatedData, newSubjectData, newFromData, newtoData, newCCData, newBccData, newTimeSentData)
				end repeat
				set currRow to currRow + 1
			end repeat
		end if
	end tell
end tell
on transfer_data(sub_file_name, sub_author_data, sub_custodian_data, sub_allcustodians_data, sub_confidential_data, sub_filepath_data, sub_md5hash_data, sub_title_data, sub_timecreated_data, sub_subject_data, sub_from_data, sub_to_data, sub_cc_data, sub_bcc_data, sub_timesent_data)
	tell application "DEVONthink 3"
		set recoList to lookup records with file sub_file_name
		set recoListLength to length of recoList
		if recoListLength = 1 then
			set theRecord to item 1 of recoList
			tell theRecord
				--set sub_full_data to the tags of theRecord
				set customMetaD to the custom meta data of theRecord
				add custom meta data sub_author_data for "author" to it
				add custom meta data sub_custodian_data for "custodian" to it
				add custom meta data sub_allcustodians_data for "allcustodians" to it
				add custom meta data sub_confidential_data for "confidential" to it
				add custom meta data sub_filepath_data for "filepath" to it
				add custom meta data sub_md5hash_data for "md5hash" to it
				add custom meta data sub_title_data for "title" to it
				add custom meta data sub_timecreated_data for "datetimecreated" to it
				add custom meta data sub_subject_data for "subject" to it
				add custom meta data sub_from_data for "from" to it
				add custom meta data sub_to_data for "to" to it
				add custom meta data sub_cc_data for "cc" to it
				add custom meta data sub_bcc_data for "bcc" to it
				add custom meta data sub_timesent_data for "datetimesent" to it
				-- set the Author of theRecord to newTag
			end tell
		else
			if recoListLength = 0 then
				error "Zero records were found!"
			else if recoListLength > 1 then
				error "More than 1 record was found!"
			end if
		end if
	end tell
end transfer_data

Edited: Use three backticks before and after the code to created fenced code clocks. Thanks.

BLUEFROG · May 8, 2020, 2:06am

So you’re trying to apply the cell content in each row as custom metadata to existing files in the database, looking them up by the name in column 1?

gelbin · May 8, 2020, 2:36am

So, here is what i am trying to do:

I receive a folder of pdf files along with a spreadsheet (a load file) that has a column with the file name and then a bunch of additional descriptive data for each file. I want to import the folder of files, and then map the metadata from the spreadsheet into the custom metadata now available in DTP3.

To be more precise, some of the header fields (metadata types) I don’t need. So the app is looking for specific headings and identifying those columns associated with those sought after fields and then maps data from those fields back onto the files after it finds the pdfs in the database.

i understand my process if rather brute force and not foolproof, but generally it works if i can clean up the issue with row 1 now being the header row…

BLUEFROG · May 8, 2020, 3:31am

Like this…

to this…

And…

How many files are you processing?
Can the filename include the extension?
Do you ever process the same file again?

gelbin · May 8, 2020, 3:38am

exactly like that, yeah. processing thousands to tens of thousands. the filename ideally would include the extension, but generally the way documents are produced is as .pdf or .tiff files and thus the actual file name (for example report.doc) is not the name of the file included in the database unless its produced as a native file (which would generally include .xls and .doc files). if the worst case was having all the metadata assigned to the pdfs that would be a good start.

i have talked to eric about this in the past, and it seems like potentially my conversations about metadata may have taken some hold in the advanced metadata features found in DTP3. still requires code like this to turn this into a real powerhouse for these giant file productions…but i am cheap and prefer technical solutions even if i have to work for them!

BLUEFROG · May 8, 2020, 3:51am

So you have potentially tens of thousands of rows in one document or multiple documents?
That would make a difference as timeouts might need to be applied.

And…

Do you ever process the same file again?
Are you using label colors for anything?

gelbin · May 8, 2020, 2:03pm

well, good question. there are multiple spreadsheets each providing metadata for thousands of documents. making a total of hundreds of thousands of documents. currently my goal was only to be able to process one “sheet” at a time for the thousand or so documents.

these files should only need to be processed once.

i use labels to tag my documents as hot, very hot, irrelevant, etc. in my databases.
s

BLUEFROG · May 8, 2020, 6:31pm

Stay tuned and come back to this as we have some things going on in here that will likely help streamline this.

gelbin · May 8, 2020, 7:46pm

why don’t we talk offline about the realities (I can share actual exemplary files)?

BLUEFROG · May 8, 2020, 7:52pm

Hold the Option key and choose Help > Report bug to start a support ticket and we can discuss more off to the side.

gelbin · May 8, 2020, 10:39pm

done