How to isolate and quantify keywords in a pdf file?

lemon_twist · February 12, 2019, 3:49am

I’m currently working on a project that involves analyzing PDF transcripts of interviews. My goal is to use DTPO to count the occurrences of very specific words related to the subject of the research.

I’m new to the program, but feel it would be an excellent resource for this type of work. What would be the best approach? The resulting counts would then be exported to an Excel spreadsheet.

Thanks.

Gerry

cgrunenberg · February 12, 2019, 9:20am

Did you have a look at the Concordance drawer for documents or the Concordance panel (see menu Tools) for databases?

lemon_twist · February 12, 2019, 11:40pm

Yes, and it seems to give part of what I’m looking for in the transcript. Someone also suggested that I use a dedicated qualitative/quantitative analysis program for this purpose, NVivo. It seems promising, but perhaps a bit overkill for my project.

R_2_is_misleading · February 25, 2019, 7:13pm

The Concordance function in DT is a good start, but it counts EVERY word, including lots of junk. You can start with Concordance, but if you are going to be working on this project for long, I recommend investing in some (probably free) program. The key requirement is that you can input a list of specific terms to be counted.

Once you see some initial results, you will probably want to do additional and more elaborate analysis. For this, raw DT will rapidly run out of power. But yes, NVivo is way more than you need. There are a jillion open source packages (especially if you use R or Python), and I’m sure there are some “pre-packaged” alternatives, as well.
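
To give a rough idea of the "list of specific terms" approach in Python, here is a minimal sketch using only the standard library. It assumes the transcript has already been converted to plain text (the PDF extraction step is not shown), and the names `count_terms`, `to_csv`, and the sample sentence are made up for illustration. It only handles single words, not multi-word phrases.

```python
import csv
import io
import re
from collections import Counter


def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term in text."""
    wanted = {t.lower() for t in terms}
    # Split the lowercased text into words (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in wanted)
    # Report every requested term, including those that never occur.
    return {t: counts[t.lower()] for t in terms}


def to_csv(counts):
    """Render the counts as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["term", "count"])
    for term, n in counts.items():
        writer.writerow([term, n])
    return buf.getvalue()


# Hypothetical sample standing in for an interview transcript.
sample = "Trust matters. Without trust, the community felt no support. Trust!"
result = count_terms(sample, ["trust", "support", "funding"])
print(to_csv(result))
```

Writing the output of `to_csv` to a `.csv` file gives you something Excel will open as a two-column sheet. Because only the listed terms are counted, the "junk" words never show up in the results.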

cgrunenberg · February 26, 2019, 7:38am

An upcoming release will include the possibility to exclude words from the concordance.