Hi All,
I’m a member of a Labor Union which is governed by its constitution through its members. Our history goes back decades, and all of our agreements, rules, and procedures are the result of democratic processes. This means that, as a member, if you encounter a rule or contract clause, there was at least one meeting at which it was discussed and approved. As is human nature, we sometimes disagree among ourselves and with our employers over the source, wording, or meaning of a particular rule or procedure. When that happens, we can spend hours in the back office flipping through drawers of meeting minutes, agreements, and arbitrations to find the documents that support a resolution. With the exception of the rigorous democratic process, this scenario is probably not too different from what one may experience in any organization with a little history.
When this project started, we were convinced we needed to scan and OCR all of our meeting minutes and agreements to speed up our searches and feel a bit more confident that the search was exhaustive. That’s when I thought to dump our documents into DT, using it at first as a more powerful search engine to sift through them. This helped immensely, but we still had some problems. Number one was OCR errors, and the possibility that something could be easily missed in a search. The second was our own evolving terminology. Over the decades, the lexicon of our organization has changed, and it became very easy to overlook a topic if you didn’t know what to search for. So I started thinking about classification.
Although our meeting minutes were the authoritative source of information, the subject matter of a single meeting varied so much that it was difficult to assemble a comprehensive thread on the history of an issue by just searching the PDFs. Since the individual motions approved at the meetings were where the magic happened, it was becoming clear that I should take the time to isolate and classify them. When researching different procedures I could use to do this, I discovered this wonderful script in the forums that could do almost exactly what I wanted. It has taken a hundred or so hours of my free time over the last couple of years to go through 60 years of minutes, but I now have a beautiful collection of “note cards,” each containing exactly one motion with a link to its original source and fairly extensive tags. I was able to fix the OCR errors as I went along, and the tags helped eliminate the issue of lexical inconsistency. It’s a great resource for me, but now it’s time to share this work with the rest of our organization.
Unfortunately, not everyone in our Union uses a Mac or has the time or tech savvy to use DT the way I do. So I am attempting to move my work into an online WIKI format where it can live on and be utilized for searches and further categorization by others. I’m in the early stages now, prototyping a proof-of-concept before I create the real thing.
I thought I’d share my experience here as an example of how DT has helped me do my work and what hurdles I may need to clear along the way. If it helps someone else conceptualize a similar solution to their own knowledge problems or inspires someone to take it in a surprising direction, I’ll feel like it was worth sharing. Please feel free to question my choices or offer suggestions along the way if you are interested. I have a tendency to mull things over too much and can easily get paralyzed in indecision. If I’ve learned one thing while learning to use DT, it is to just jump in and not worry too much about wasted effort once you get going. There’s always a way to pivot on to another track if you need to. Nothing is perfect, and no matter how good it is, there’s always something that could be improved. On to the project:
Platform Selection
When choosing a platform to make my work available to others, I considered many options: DT Server, a shared storage volume on the office network, a website on the office network, and a website on the internet. Each had drawbacks. I thought DT Server would be too fussy for my non-tech colleagues to use, and the potential for catastrophic screw-ups made me nervous, even with backups. The shared storage volume on the office network seemed a little safer but ultimately lacked features for collaboration and organization. A website on the office network sounded like it could be customized a little more, but one on the internet would allow people to research and work from home or out in the field, where they might need something to help them make an argument in a dispute on the job. So then, what kind of website?
My first instinct was to use a blog. Most good blogs support tags, search, and comments, but after spinning up an instance of Ghost on a DigitalOcean droplet, I soon realized it would require too many custom views and too much time to customize once I migrated the information into the system. Finally I settled on a WIKI. I wouldn’t have to customize it so much, and I would get the benefit of page history to go along with tags, search, and comments. Most people, from the basic consumer of information to the contributors, know what a WIKI is. I chose Wiki.js since DigitalOcean has a one-click droplet to get you started and the style of the site is modern and straightforward. The jury is still out as to whether or not this is the best choice, but in the interest of moving forward, I’m going for it. I may end up using MediaWiki for its more familiar layout and mature code base, but I have my own preferences to indulge for the time being.
Wiki.js
OK, I have a properly configured server with SSL and user permissions to keep away prying eyes. How do I get my stuff in it? I have a few goals: upload our individual motions as markdown pages with their tags, upload the original PDFs as a reference source, and upload the OCR’d text from the PDFs as markdown pages to be searched (and ultimately formatted and corrected when folks can get around to it). The PDFs and text sets of complete minutes will have to be uploaded first, as they are more static and will be referenced by the motion pages. Thanks to cgrunenberg’s script, I can see that I should be able to automate the tagging and upload of my individual motion files to the WIKI without too much trouble. I’m still working out how to make the connection to the server to make requests via the GraphQL interface, but I’m hoping I can just use the script editor on my Mac. I’ve been remiss in learning this tool, and now may be the time.
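For anyone curious what a page upload over GraphQL might look like, here is a minimal sketch in Python (not the Script Editor approach mentioned above, just a language that is easy to test). Everything specific in it is an assumption to verify against your own server: the mutation and its field names follow the `pages.create` shape as I understand it from the Wiki.js 2.x schema, and the endpoint URL, API token, page path, and tags in the usage note are hypothetical placeholders.

```python
import json
import urllib.request

# ASSUMPTION: this mutation mirrors the pages.create shape of the Wiki.js 2.x
# GraphQL schema. Check it against your server's schema before relying on it.
CREATE_PAGE_MUTATION = """
mutation ($content: String!, $description: String!, $editor: String!,
          $isPublished: Boolean!, $isPrivate: Boolean!, $locale: String!,
          $path: String!, $tags: [String]!, $title: String!) {
  pages {
    create(content: $content, description: $description, editor: $editor,
           isPublished: $isPublished, isPrivate: $isPrivate, locale: $locale,
           path: $path, tags: $tags, title: $title) {
      responseResult { succeeded errorCode message }
      page { id path }
    }
  }
}
"""


def build_create_page_payload(title, path, markdown, tags, description=""):
    """Build the JSON body for one motion page. No network access happens here."""
    return {
        "query": CREATE_PAGE_MUTATION,
        "variables": {
            "title": title,
            "path": path,
            "content": markdown,
            "description": description,
            "tags": list(tags),
            "editor": "markdown",  # the page body is markdown text
            "isPublished": True,
            "isPrivate": False,
            "locale": "en",  # assumed site locale
        },
    }


def post_graphql(endpoint, api_token, payload):
    """POST a GraphQL payload with a bearer token and return the decoded JSON reply."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be something like `post_graphql("https://wiki.example.org/graphql", token, build_create_page_payload("Motion 12", "motions/motion-12", text, ["dues"]))`, where every value shown is made up. Keeping the payload builder separate from the network call means the tagging logic can be checked against a handful of motions before anything is sent to the server.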
I like when a plan comes together, and I can already foresee adding to our repository of organizational knowledge after I get the meeting minutes up there. We are buried in documents and reference them constantly to guide our daily work life. It’s a real shot in the arm to know how you fit in an organization and how to change things when necessary, and I hope this resource will help others alleviate the feeling that it’s all this faceless bureaucracy that bears down on us, imposing its inhuman will.
That’s where I’m at for now. I’ll add to the post or the thread as interesting things happen. I should have some useful scripts to share by the time I’m done. Thanks for reading this far.