Split a Excel File

Hi everybody,

I have a question about hot to split a Excel file.

Let’s try to explain. I have a lot of Excel files, and each one, has a line with some information about a patent (name, number, abstract, year, etc).

** What I need to do is: each line should be a plan text in DEVONthink.

I show below some pictures to illustrate that. Maybe some of you should help me with a tip about how to create a macro or another way to convert Excel lines in external text files (one text file for each line).

My problem is: I have about 20.000 lines to input at DEVONthink.

Help me, please? @BLUEFROG @Bernardo_V @cgrunenberg


What I have? Excel files (a lot of them). I wish that each line will be convert in a text plain at DEVONthink

What I expect to have as an output? Plain Texts with the information of each line.


Excel Files to convert in Notes

You would either have to script Excel or convert the files to CSV, afterwards they could be imported into DEVONthink and scripted via DEVONthink.

Hi @cgrunenberg

I really tried a lot of things before ask you here.
I test your idea with CSV. Look the result. In this Excel file I have 15.685 lines.

I don’t know how to script Excel. Should be great to have suggestions about how to procedure.

You might look into TextSoap, a utility with a lot of tools for processing text files. There is a free trial, I believe. https://www.unmarked.com/textsoap/

It is very interesting @GordonMeyer but I can’t solve my problem yet.

@cgrunenberg I tried today. I created a script as well as I can do it.
But I’m not able to create a increase value for it. Is it possible to help me?

I just copy and paste this block many times:

Windows(“patent families on 3d printing 00001-00100.csv”).Activate
Rows(“102:102”).Select
Selection.Copy
Workbooks.Add
ActiveSheet.Paste
Application.CutCopyMode = False
ActiveWorkbook.SaveAs Filename:= _
“D:\Users\Dropbox\Cris\00_mac\transfer\teste\0102.txt”, FileFormat:=xlTextMac _
, CreateBackup:=False

I like it.
I have a lot of texts that really need a special “treat” like that.
Tks

@Ulbrich,

It’s called “data munging” (Google the term). I’ve done this sort of thing hundreds (or more) times. While it is certainly possible to do it all in Excel to create the files that you want to put into DEVONthink, that probably would be the hard (and time-consuming) way. It’s certainly not something for DEVONthink to do for you except for the final step of importing which is your goal.

The process, which can be described as an algorithm, would be something like the
following (which is “psuedo-code”):

  1. Step 1: Decide which columns of the Excel file are going to be exported
  2. Step 2: Decide if each line in each Excel file is an individual file (I think that the case, from what you say). If not, then decide. In addition, decide if there is any transformational stuff you can do in the code to the info in the files, e.g. concatenate columns, do some standard re-wording, etc. that you can automate. Develop output file format. Decide output file naming convention. Use *.txt suffix.
  3. Step 3: Develop some code which does something like the psuedo-code below. Test, adjust, correct … recycle till it does what you want.
  4. Step 4: Re-run it all on the thousands of files and yet more thousands of records in Excel to create the text files.
  5. Step 5: Import the new text Files into DEVONthink and do your DEVONthink stuff.

Psuedo-code (‘=’ characters to make it look right … best way I could figure for this) that you would run inside a Mac OSX “terminal”. It would be something like:

read list of Excel file names (all in same folder to make easy)
for each file in list:
====put full contents of the Excel file into a string variable
====for each line (Excel row) in contents variable:
========for each Excel column in the contents:
============copy the relevant columns into individual variables
========build the output as string variable by concatenating/transforming columns See step 2).
========create a unique new file name (see Step 2).
========open a new output file using the new name
========write the output string created above
========close the output file

The tools used depends on who does the work. First step would be to find someone in
your firm or “diaspora” who knows data munging and has some experience and tools. Use the tools that person knows how to use. Or do it all yourself. I like and would use Python supplemented by Pandas. Probably a few hours or so (maybe more, maybe less) to figure out what to do then do it. My gut says 15 minutes. My experience says “be careful of what gut says, please.” Then test, then … let it rip on your data. Put some “print” statements in code to watch progress. Probably good to put some exception testing in to make sure it’s all ok. Inspect the results.

If your colleague is an Excel person and wants to use, I’m sure it would work also. The work process wouldn’t be much different than that expressed in the psuedo-code above.

Recommended Resources (let you and your data munging colleague explore)

  • Google “read excel into python” and “data munging with python”
  • Python (already on your Mac, but recommend using Anaconda to install a fresh Python and any other tools you may want to use)
  • Pandas for Python (use the Excel reading functions, et. al. Arrives with Anaconda)
  • see examples via Google in Stack Overflow

None of these resources will cost you anything. Just your time to investigate.

2 Likes

I don’t know anything about scripting Excel, but it seems like a two-step process:

(1) Extract each row in an Excel file into individual text files into a folder in macOS by AppleScript . The 2nd post in the link below shows the basic script (I haven’t tested it but that’s the idea) but someone who knows Excel scripting better may be able to automate the range selection.
https://discussions.apple.com/thread/7238131
(2) If the script in the above link works, then you can import/index the text files into DT3.

For step (1), perhaps you don’t even need AppleScript, the VBA of Excel is very powerful. See the last post of this link https://stackoverflow.com/questions/24252470/create-text-files-from-data-rows-in-excel. (I haven’t tested it)

Just my 5 cents.

Haven’t touched VBA for 20 years but… The VBA works!

The excel file

The VBA with a tiny change in file path in the above mentioned link:

Sub toFile()

    Dim FilePath As String, CellData As String, LastCol As Long, LastRow As Long
    Dim Filenum As Integer

    LastCol = ActiveSheet.UsedRange.SpecialCells(xlCellTypeLastCell).Column
    LastRow = ActiveSheet.UsedRange.SpecialCells(xlCellTypeLastCell).Row

    For i = 1 To LastRow
        FilePath = "/Users/user name/Desktop/TestFolder" & "/" & Trim(ActiveSheet.Cells(i, 1).Value) & ".txt"
        Filenum = FreeFile

        Open FilePath For Output As Filenum
        CellData = ""

        For j = 2 To LastCol
        CellData = Trim(ActiveSheet.Cells(i, j).Value)
        Write #Filenum, CellData

        Next j

        Close #Filenum

    Next i
    MsgBox ("Done")
End Sub

The output:

The file:

My another 5 cents: perhaps it means to consider going to Excel/VBA forum for help to automate the process of exporting the rows of a butch of Excel files…

@ngan I must to say many thanks.
In these days, that everything could be hard and difficult, we find nice people that do you well to help us.

@rmschne I will keep in mind your suggestion. I understood that I need time to know well about “data munging” but should be great to learn it in a soon future.

@GordonMeyer I used textsoap for another topic and it works very well.

So many thanks for all of you.

I hope to contribute with my 5 cents one day too.

One more (max out my VBA now …)

The spreadsheet has a description row and a title row (like yours):

The filename with the workbook name and title row as prefix:

The contents in the file don’t have the quotation marks now:

The VBA (sorry that I used to be able to do way better than this 20 years ago :sweat_smile: :sob:)

Sub toFile()

    Dim FilePath As String, CellData As String, LastCol As Long, LastRow As Long
    Dim Filenum As Integer
    Dim theFileName As String
    
    LastCol = ActiveSheet.UsedRange.SpecialCells(xlCellTypeLastCell).Column
    LastRow = ActiveSheet.UsedRange.SpecialCells(xlCellTypeLastCell).Row

    For i = 3 To LastRow
        theFileName = ThisWorkbook.Name & " - " & ActiveSheet.Cells(1, 1).Value & " - " & Trim(ActiveSheet.Cells(i, 1).Value)

        FilePath = "/Users/username/Desktop/TestFolder2" & "/" & theFileName & ".txt"
        Filenum = FreeFile

        Open FilePath For Output As Filenum
        CellData = ""


        For j = 2 To LastCol
        CellData = ActiveSheet.Cells(2, j).Value & ":"
        Print #Filenum, CellData
   
        CellData = Trim(ActiveSheet.Cells(i, j).Value) & vbCrLf
        Print #Filenum, CellData

        Next j

        Close #Filenum

    Next i
    MsgBox ("Done")
End Sub

@ngan I have one more difficult task to do.

I have a lot of files. What I need to do is create a new directory with the same name of the Excel File.

Is it possible to help with it?

tela_02

I’ll leave that to you…
The 2nd post in this https://stackoverflow.com/questions/32846009/save-excel-workbook-in-a-new-created-folder-with-the-same-names has the relevant codes for your needs.

Happy[pain but joyful] VBA.

P.S. google for “create a folder with name of workbook by VBA” and you should find what you need.

Thanks very much @ngan
I’m really very happy with your attention for my questions.
You are right. The last post solve all my problems.

The final result @ngan

15.685 files today. A complete study about 3D Printing Patents Family.

1 Like

Cool!

Hi,

I have a question about rename files.

I want to rename all files in a database.
The second line is the title that I wish as a new filename.

But:

  • some titles are so big
  • some titles have special characters as + or /

Is it possible to create a way to rename it, using an automatic command or script that:

  • select the second line of a markdown file as its name
  • limit this filename into 30 characters or 8 words (for e.g.)

original name: Method and apparatus of three-dimensional printing and electronic apparatus
text to use to rename: Method and apparatus of three-dimensional printing

  • do not use symbols as + : / in its name

Backup the database and use a test group.

Use smart rule, insert the script in edit script.

There might be a better/more elegant way to script, and I haven’t tested the scripts extensively.

For words.

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			set theText to plain text of theRecord
			set theName to paragraph 2 of theText
			set theWordCount to count (every word of theName)
			if theWordCount > 8 then
				tell application "TextEdit" to set AppleScript's text item delimiters to " "
				set name of theRecord to (words 1 thru 8 of theName) as string
				tell application "TextEdit" to set AppleScript's text item delimiters to ""
			else
				set name of theRecord to theName
			end if
		end repeat
	end tell
end performSmartRule

For characters

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			set theText to plain text of theRecord
			set theName to paragraph 2 of theText
			set theCount to count theName
			if theCount > 30 then
				set name of theRecord to texts 1 thru 30 of theName
			else
				set name of theRecord to texts 1 thru -1 of theName
			end if
		end repeat
	end tell
end performSmartRule

I don’t know how to deal with special character, not familiar with regex.

Tks @ngan it works very well. I test with a few data. Now I will try to lead with special character, like [ + : / ? . Any other help?