Use semicolon as separator when viewing CSV files

Hi,
I want to use a semicolon as the separator for imported CSV files but cannot find how to do this. It’s so basic that I must be missing something, but I’ve looked in the documentation, on Google and here in the forum without finding it.
Best,
Lasse

Did you try importing a csv with semicolons and that didn’t work? I doubt that there’s a setting for this.

Since you say you are importing CSV files, the separator will be whatever is used in the imported file.

Perhaps you can edit the CSV text file you import with an editor and do a “replace all” from what it is (say, ‘,’) to what you want (‘;’).
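A plain “replace all” would also change separators that sit inside quoted text, so a small script may be safer than an editor. Below is a minimal sketch using Python’s standard `csv` module; the function name `convert_delimiter` and the sample data are made up for illustration and are not part of DEVONthink.

```python
import csv
import io


def convert_delimiter(text: str, old: str = ",", new: str = ";") -> str:
    """Rewrite delimited text with a different separator.

    The csv module parses quoted fields properly and re-quotes any field
    that happens to contain the new separator.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=old)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=new, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample: the comma inside the quoted field must survive.
sample = 'id,text\n1,"Hello, world"\n2,plain\n'
print(convert_delimiter(sample))
```

The comma inside `Hello, world` is left alone because it belongs to the field content, not to the structure of the file.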

I’m not sure I understand. My file already uses semicolons, and I want DT to separate on them and not on commas (it’s a transcription containing a lot of other commas).

Yes, DEVONthink separates on commas for .csv files and on tabs for .tsv files. If there is no setting for this, I hope there will be one; it’s very basic and possible in most other programs.

With that clarification, it seems @chrillek is on the right track. Or view the file in an app that behaves as you prefer. To do this, use the “open with …” command.

The csv separator is one of the locale settings. It’s a comma in locales that use the decimal dot, and a semicolon in locales that use the decimal comma.
But I’m confused. My Dutch system should use the semicolon, and it does, but DT is running in (US) English, so it should use the comma. I expected DT to not split a csv file correctly, but it did. So DT is using the Dutch locale behind the scenes.
I should note that the system identifies these files as “csv file”. They have that extension.

The separation doesn’t depend on the locale settings at all, DEVONthink tries to determine the separator (tab, comma or semicolon, quoted or not) on its own.
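For readers wondering what “determine the separator on its own” can look like, here is a rough sketch of one possible heuristic. It is not DEVONthink’s actual algorithm, which isn’t described in this thread; the function name `guess_delimiter` and the sample text are assumptions for illustration. The idea is to try each candidate and keep the one that splits every row into the same number of fields.

```python
import csv
import io

CANDIDATES = (",", ";", "\t")


def guess_delimiter(text, candidates=CANDIDATES):
    """Return the candidate that gives every row the same field count (>1).

    Returns None when no candidate produces a consistent table.
    """
    best, best_width = None, 1
    for delim in candidates:
        reader = csv.reader(io.StringIO(text, newline=""), delimiter=delim)
        rows = [row for row in reader if row]  # skip empty lines
        widths = {len(row) for row in rows}
        if len(widths) == 1:
            width = widths.pop()
            if width > best_width:
                best, best_width = delim, width
    return best


# Hypothetical transcription-like sample with commas inside the text column.
sample = (
    "start;end;text\n"
    "0.0;2.5;Hello, and welcome\n"
    "2.5;5.0;Yes, well, thanks\n"
)
print(repr(guess_delimiter(sample)))  # ';'
```

A heuristic like this depends on the quoting being consistent, which is why a single malformed row can throw off the guess for the whole file.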

Thanks for the clarification! Is there a way to override DT’s guess?

There’s no such way currently and so far the automatic recognition didn’t cause any issues (AFAIK). Therefore an example file that’s causing the troubles would be great, thanks!

example.csv.zip (808 Bytes)

I think the problem appears when there’s a quote within a string (see the 3rd row), maybe that confuses DT?

The attached file shows like this in DT:

Given that Numbers also can’t read that file correctly, maybe you should check its coding. (That of the file, not of Numbers).

[Screenshot 2023-01-30 at 11.10.50]

In Numbers there is an alternative to split on custom separators, like semicolon. That’s the functionality I’m looking for in DT.

According to

your last line is not conforming to CSV rules. This document may not be accurate, but I have no time to investigate further.

When I use a leading double quote there, I get this in DT:

Which looks a lot better than the previous version.


Thanks for taking your time! I can format the CSV file to work better in DT but it would definitely be more convenient for all of us if DT could handle the file as it is, like Numbers or other applications can.

Well, to do that, the file has to conform to some standard. And CSV is anything but – as you can already deduce from the acronym, fields were originally meant to be separated by commas, not semicolons. Which is, btw, reflected by the only document even coming close to a standard, namely RFC 4180.

Since its inception, many people have implemented different ideas and still called the result “CSV”. One way to cope with that would be to follow your suggestion: add a GUI that lets the user set all kinds of parameters. Another way is what DT tries to do, namely figure out the separator and everything else on its own. That requires a file that is at least borderline accurate. In your case, you use a single quote to start a string, then a double quote to escape the single quote within it, and then another double quote to terminate the string. That cannot work: if you consider a double quote to escape a quote, then your string is not terminated.
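To make the quoting rule concrete: under RFC 4180, a field that contains the separator or a double quote is wrapped in double quotes, and each double quote inside it is written twice. The sketch below parses one such line with Python’s standard `csv` module; the line itself is an invented example, not the content of the attached file.

```python
import csv
import io

# One well-formed, semicolon-separated line. The second field is wrapped
# in double quotes, and the quotes inside it are doubled.
line = '3;"He said ""hello""; then left"\n'

fields = next(csv.reader(io.StringIO(line, newline=""), delimiter=";"))
print(fields)  # ['3', 'He said "hello"; then left']
```

The semicolon inside the quoted field is treated as text, and the doubled quotes collapse back into single ones.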

And there’s definitely more to it than just the field delimiter – character encoding, handling of first row, text-delimiter etc. So, DT (and perhaps DTTG?) would need a special dialog for just this single, quite stupid and under-specified format.

Instead, I suggest modifying your CSV writing so that DT can work with the result. Or simply ignore what DT does and use your file in Numbers (or Excel or whatever). DT’s support for tabular data is fairly limited anyway.

BTW: Why are you enclosing your strings with quotes if they do not contain a semicolon?


Thanks for the many insights into the world of CSV! I’ll try to modify the script generating the CSV files; it’s written in C++, so I hope I can figure out what to do. The text and the quotation marks within it are generated by OpenAI’s Whisper; maybe the outer quotation marks come from the C++ wrapper and can be changed more easily.
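For the writing side, the logic is small enough to port to C++ by hand. Here is a sketch in Python of the usual RFC 4180 style rule: quote a field only when it needs it, and double any quotes inside. The names `quote_field` and `format_row` are hypothetical and have nothing to do with the actual Whisper wrapper.

```python
def quote_field(value: str, delimiter: str = ";") -> str:
    """Quote a field only if it contains the delimiter, a quote or a newline."""
    needs_quotes = any(ch in value for ch in (delimiter, '"', "\n", "\r"))
    if not needs_quotes:
        return value
    # Double every embedded quote, then wrap the whole field in quotes.
    return '"' + value.replace('"', '""') + '"'


def format_row(fields, delimiter=";"):
    """Join already-stringified fields into one delimited line (no newline)."""
    return delimiter.join(quote_field(f, delimiter) for f in fields)


print(format_row(["1", "Hello, world", 'say "hi"']))
# 1;Hello, world;"say ""hi"""
```

Plain fields stay unquoted, which also covers the question above about quoting strings that contain no semicolon.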

To me it seems like there could be a combination of solutions: an automated function that tries to guess the correct separator and, if that fails, a way to choose the separator manually. It doesn’t have to be more complicated than that, and it seems to work fine in Pages etc.
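The “guess first, let the user override” idea could look roughly like this outside of DT. This is only a sketch using Python’s standard `csv.Sniffer`, which is itself a heuristic and can guess wrong; the function name `read_rows` and its parameters are assumptions for illustration.

```python
import csv
import io


def read_rows(text, delimiter=None, candidates=";,\t"):
    """Parse delimited text, guessing the separator unless one is given."""
    if delimiter is None:
        try:
            # Heuristic guess, restricted to the candidate separators.
            delimiter = csv.Sniffer().sniff(text, delimiters=candidates).delimiter
        except csv.Error:
            raise ValueError(
                "could not guess the separator; pass delimiter= explicitly"
            ) from None
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Hypothetical sample data.
sample = (
    "start;end;text\n"
    "0.0;2.5;Hello, and welcome\n"
    "2.5;5.0;Yes, well, thanks\n"
)
rows_guessed = read_rows(sample)                # let the heuristic decide
rows_forced = read_rows(sample, delimiter=";")  # manual override
print(rows_forced[1])
```

The override path never touches the heuristic, so a file that confuses the guesser can still be read with the separator the user picked.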
