Create a CSV from other CSVs

Thanks Bluefrog,

Could you please post the complete script you propose that includes this code? I’m confused as to how it will work.

I will run the script on my desktop instead, as you suggest.

Thanks for your time,

I’m working with the multiple .csv files imported into Devonthink
Select the files to be processed and run the script
For each file,
the contents are retrieved, by reading the raw .csv file in the database
and the first line is added to the new .csv file text
After all the files are processed, the new .csv file is created

set theNewcsvFileText to ""

tell application id "DNtp"
	
	set theSelectedcsvFiles to get selection
	
	repeat with theSelectedcsvFile in theSelectedcsvFiles
		set theSelectedcsvName to "\"" & name of the theSelectedcsvFile & "\""
		set theSelectedText to paragraphs of (read (path of theSelectedcsvFile as string) as «class utf8»)
		set theNewcsvFileText to theNewcsvFileText & theSelectedcsvName & "," & item 1 of theSelectedText & linefeed
	end repeat
	
end tell

set targetFile to (path to desktop as text) & "theNewFile.csv"
set openFile to open for access file targetFile with write permission
write theNewcsvFileText to openFile starting at eof as text
close access file targetFile
1 Like

On import DEVONthink tries to automatically detect whether there’s a header or not as not all CSV/TSV files have one. However, this might fail and therefore example files would be appreciated, thanks!

The sample file was contained verbatim in my post, it contains only two rows.

Without any explicit explanation even I wouldn’t know whether this CSV file is supposed to have a header or not :slight_smile:

To itch my curiosity, I wrote the small Python script to do as described by the OP @rctill. The basis is that the 30 files are in an indexed folder, run the python script in a macOS Terminal window with the current directory that indexed folder. Output is displayed to the terminal window, and an output.csv file created (but with no CSV headers per the OP)–using headers would make this little script even easier and more standard as a CSV file. For simplicity for me, uses Python pandas which may need to be installed for anyone using this.

Python available standard on all Macs. However, I use Python in a virtual environment which to explain is well beyond scope of this post. See the “interweb” for that.

#!/usr/bin/env python
# coding: utf-8
import glob
import pandas as pd
output_csv="output.csv"
file_list = glob.glob("*.csv")  # get list of all the csv files in the current directory
file_list.sort() # sort in ascending order
with open(output_csv,'w') as f:
    for fn in file_list:
        if fn != output_csv: # skip the possibly pre-existing output csv file
            df = pd.read_csv(fn) # create a dataframe with the file contents
            list_of_column_names = list(df.columns) # creating a list of column names by calling the .columns 
            combine = '"Year'+fn+'",'+','.join(list_of_column_names)
            print(combine)
            f.write(combine+'\n')
f.close()

Attached is the test folder with 30 dummy input files.
Archive.zip (21.2 KB)

Note: archive.zip updated.

Ok. I was simply assuming that the first row would become the header automagically, since I’m not aware of any special “header markup” for CSV.

What’s the heuristics in DT to figure that out?

Cool. Alternatively, on the command line:

head -1 *.csv | sed -e 's/^==>.*$//'  -e '/^$/D' > newCSV.csv
1 Like

Basically looking for certain keywords or headers created by DEVONthink or whether the first row uses only uppercase as there’s no official marker.

I see. That explains a lot, since the OP also (as I did) has all kinds of string in their header - numbers, text in mixed case etc.

  • Would it be possible to modify the header from inside DT? I haven’t found a way, but possibly overlooked something.
  • Would it be possible to have a (hidden) preference for “always interpret first line as header”?

I guess modifiable headers would be preferable, though.

Yes, cool. Brain freeze here, though.

Isn’t that identical to editing the columns? Or do you have a different “modification” in mind?

What I see is this:

When changing to “form view”, I get three rows with editable text fields, labelled “A”, “B”, “C”. Clicking into the column labels in the table view sorts by the corresponding column, and in the form view, I can’t click on the labels.

So, I don’t see how I could change, for example, the header “A” to “1979”. Not seeing the forest for the trees?

See Edit Columns… in Tools > Document > Sheets or in contextual menu.

1 Like

Well, the forest was called context menu. Thanks.

It’s not overly convenient, but possible. Which makes me think that maybe a (hidden) preference or a “use first row as headers in the next CSV import” might be helpful. Though the latter would, of course, break the usual import process that doesn’t require any interaction.

Thank you for the 30 dummy input files
Sadly Devonthink says none of the files have a header row :frowning_face:
There’s no columns function in AppleScript, so I just retrieved the first line of the .csv file

sort in ascending order

Nice addition

Sadly, IMHO DEVONthink is wrong :wink: if that is what it’s saying to us. I didn’t think to actually import into DEVONthink as it never occurred to me to have to test that. I just imported now, and they are displayed as tables, which is what I would have expected. Beyond that I did not pursue to learn how DEVONthink tells me what the columns are as described in first row.

There is nothing special about a header row. It’s simply declared. By convention, the first line of a file delimited by something (comma, tab, … whatever) is the header row. It doesn’t have to be the first line for when using methods of Pandas, R, and native Python where one is able to declare which row is the header. I’m sure other tools have same capabilities, but I only use these and in my work, I’m reading CSV files and munging into structures to facilitate analysis.

Sorting in descending order also easily to accomplish but I chose not to do that.

1 Like

Are these mixed quotes actually intented? There are single, double & curly quotes. Sometimes even mixed: 'RID01"

Typed by hand. Then copy/paste. Did not notice. My input routine wasn’t bothered, so I did not notice. Nor was I reading past the header. I did give some thought to write an algorithm for the computer to inform me of problems like the OP was looking for, but I stopped at that point and went on to other things.

I’ll fix these and edit my upload.

1 Like

Thanks! Unfortunately the header still uses curly quotes, currently only single/double quotes are supported.