Saving as Markdown (any kind) or clutter-free webarchive destroys the formatting of code, and brackets are wrongly interpreted

I am trying to save this page as a clutter-free web archive or as any kind of Markdown:
https://blog.esciencecenter.nl/testing-shell-commands-from-python-2a2ec87ebf71

In the saved file, all spaces and tabs have been removed from the website's code, which makes the code unusable (Python uses indentation as part of its syntax):

import pytest 
import os 
import shdef test_install(cookies): 
# generate a temporary project using the cookiecutter 
# cookies fixture 
project = cookies.bake()                                                    # remember the directory where tests should be run from 
cwd = os.getcwd() 
# change directories to the generated project directory  
# (the installation command must be run from here) 
os.chdir(str(project.project))  try: 
# run the shell command 
sh.python(['setup.py', 'install']) 
except sh.ErrorReturnCode as e: 
# print the error, so we know what went wrong 
print(e) 
# make sure the test fails 
pytest.fail(e) 
finally: 
# always change directories to the test directory 
os.chdir(cwd)

What I expect is this:

import pytest
import os
import sh
def test_install(cookies):
  # generate a temporary project using the cookiecutter
  # cookies fixture
  project = cookies.bake()                                                  
  # remember the directory where tests should be run from
  cwd = os.getcwd()
  # change directories to the generated project directory 
  # (the installation command must be run from here)
  os.chdir(str(project.project))
  try:
    # run the shell command
    sh.python(['setup.py', 'install'])
  except sh.ErrorReturnCode as e:
    # print the error, so we know what went wrong
    print(e)
    # make sure the test fails
    pytest.fail(e)
  finally:
    # always change directories to the test directory
    os.chdir(cwd)
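
A quick way to confirm that the flattened version really is unusable, sketched with only the standard library. The two strings are shortened stand-ins for the snippets above (not the full article code), and `is_valid_python` is a hypothetical helper name:

```python
# Shortened stand-ins for the saved (flattened) and expected (indented) code.
flattened = (
    "def test_install(cookies):\n"
    "project = cookies.bake()\n"
    "cwd = os.getcwd()\n"
)
indented = (
    "def test_install(cookies):\n"
    "  project = cookies.bake()\n"
    "  cwd = os.getcwd()\n"
)


def is_valid_python(source):
    """Return True if the source compiles; nothing is executed."""
    try:
        compile(source, "<saved-snippet>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


print(is_valid_python(flattened))  # False
print(is_valid_python(indented))   # True
```

Because `compile` only parses the code, this check can be run on a saved snippet without installing pytest or sh.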

Besides flattening the indentation structure, saving as Markdown introduces random x-devonthink-item links into the output files:

import pytest 
import os 
import shdef test_install(cookies): 
# generate a temporary [project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B) using the cookiecutter 
# cookies fixture 
[project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B) = cookies.bake()                                                    # remember the directory where tests should be run from 
cwd = os.getcwd() 
# change directories to the generated [project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B) directory  
# (the installation command must be run from here) 
os.chdir(str([project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B).[project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B)))  try: 
# run the shell command 
sh.[python](x-devonthink-item://F717E71B-E4A9-4E50-B48A-EF2AAA7FB638)(['setup.py', 'install']) 
except sh.ErrorReturnCode as e: 
# print the error, so we know what went wrong 
print(e) 
# make sure the test fails 
pytest.fail(e) 
finally: 
# always change directories to the test directory 
os.chdir(cwd)

Are you just trying to capture the code block? If so, try DEVONthink’s Preferences > Sorter > Copy Selection. I selected the code block and it yielded this (though I added the fenced code block and language so Prism shows it more nicely)

It’s unclear what process you’re referring to here. Please clarify.

To reproduce the problem:

  1. Use “Capture content from …” on any page using the Medium CMS that contains Python code
  2. When choosing the format, choose any of these options:
    a) Archive, clutter-free
    b) Markdown (clutter-free unticked)
    c) Markdown (clutter-free)
  3. The saved file has lost the formatting of the Python code.
    The Archive, clutter-free result:
import pytest 
import os 
import shdef test_install(cookies): 
# generate a temporary project using the cookiecutter 
# cookies fixture 
project = cookies.bake()                                                    # remember the directory where tests should be run from 
cwd = os.getcwd() 
# change directories to the generated project directory  
# (the installation command must be run from here) 
os.chdir(str(project.project))  try: 
# run the shell command 
sh.python(['setup.py', 'install']) 
except sh.ErrorReturnCode as e: 
# print the error, so we know what went wrong 
print(e) 
# make sure the test fails 
pytest.fail(e) 
finally: 
# always change directories to the test directory 
os.chdir(cwd)
  4. If any of the Markdown options is chosen, the resulting file is not only flattened but also has x-devonthink-item links inserted in places where brackets were used:
import pytest 
import os 
import shdef test_install(cookies): 
# generate a temporary [project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B) using the cookiecutter 
# cookies fixture 
[project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B) = cookies.bake()                                                    # remember the directory where tests should be run from 
cwd = os.getcwd() 
# change directories to the generated [project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B) directory  
# (the installation command must be run from here) 
os.chdir(str([project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B).[project](x-devonthink-item://90E83D99-D0C5-4ED3-B6B8-183D2CB3CA4B)))  try: 
# run the shell command 
sh.[python](x-devonthink-item://F717E71B-E4A9-4E50-B48A-EF2AAA7FB638)(['setup.py', 'install']) 
except sh.ErrorReturnCode as e: 
# print the error, so we know what went wrong 
print(e) 
# make sure the test fails 
pytest.fail(e) 
finally: 
# always change directories to the test directory 
os.chdir(cwd)
  5. Expected formatting of the code in the article (the code isn’t a standalone note; it is part of the saved article):
import pytest
import os
import sh
def test_install(cookies):
  # generate a temporary project using the cookiecutter
  # cookies fixture
  project = cookies.bake()                                                  
  # remember the directory where tests should be run from
  cwd = os.getcwd()
  # change directories to the generated project directory 
  # (the installation command must be run from here)
  os.chdir(str(project.project))
  try:
    # run the shell command
    sh.python(['setup.py', 'install'])
  except sh.ErrorReturnCode as e:
    # print the error, so we know what went wrong
    print(e)
    # make sure the test fails
    pytest.fail(e)
  finally:
    # always change directories to the test directory
    os.chdir(cwd)

The problem is not limited to Medium sites, but some sites, such as Real Python, render properly: https://realpython.com/python-debugging-pdb/

For example, HTML that DT cannot parse properly, so that the spaces are removed, looks like this (taken from the link I posted):

test that tests the installation</a> is:</p><pre class\="gk gl gm gn go kd ke bz"><span id\="d782" class\="ed iv iw dg kc b kf kg kh s ki">import pytest<br/>import os<br/>import sh</span><span id\="f88f" class\="ed iv iw dg kc b kf kj kk kl km kn kh s ki">def test\_install(cookies):<br/> # generate a temporary project using the cookiecutter<br/> # cookies fixture<br/> project = cookies.bake() </span><span id\="9594" class\="ed iv iw dg kc b kf kj kk kl km kn kh s ki"> 

By contrast, this plugin, which saves sites to Markdown, works correctly, and the spaces aren’t removed in the resulting file: GitHub - deathau/markdownload: A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file.
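
To illustrate that this kind of markup can be converted without losing indentation, here is a minimal sketch using Python's standard `html.parser`. It is not how DEVONthink or the plugin work internally. It assumes Medium-style markup where each outermost `<span>` inside a `<pre>` is a group of lines and `<br/>` marks a line break. The names `PreTextExtractor` and `extract_code_blocks` and the sample HTML are made up for the illustration:

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text of <pre> elements without touching whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # finished code blocks
        self._parts = None     # pieces of the <pre> being read, else None
        self._span_depth = 0   # nesting level of <span> inside the <pre>

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._parts = []
            self._span_depth = 0
        elif self._parts is not None:
            if tag == "br":
                self._parts.append("\n")   # <br> or <br/> becomes a newline
            elif tag == "span":
                self._span_depth += 1

    def handle_endtag(self, tag):
        if self._parts is None:
            return
        if tag == "span":
            self._span_depth -= 1
            if self._span_depth == 0:
                # Assumption: an outermost span ends a line group.
                self._parts.append("\n")
        elif tag == "pre":
            self.blocks.append("".join(self._parts).rstrip("\n"))
            self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)       # keep leading spaces as they are


def extract_code_blocks(html):
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Simplified, hypothetical sample in the style of the fragment above.
sample = (
    '<pre><span id="a">import os<br/>import sh</span>'
    '<span id="b">def test_install(cookies):<br/>  cwd = os.getcwd()</span></pre>'
)
print(extract_code_blocks(sample)[0])
```

Treating the closing span as a line break is what keeps `import sh` and `def test_install(cookies):` from being fused into `import shdef test_install(cookies):`.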

Why are there backslashes in front of the equals signs?

I didn’t write this code; I just looked into the source of this page (the fragment starts around the text “The code for the test that tests the installation is”):
https://blog.esciencecenter.nl/testing-shell-commands-from-python-2a2ec87ebf71

I tested other Medium pages that include code blocks, and DEVONthink also has problems extracting the code blocks properly.

I cannot reproduce this here.
Anyone else?

I think you need to have a #shell or #python tag in your database to reproduce this exact result (I have those tags).

In general, to reproduce this, create a tag in the DT database from any word that appears inside the code block before downloading the page from the given link, then save it as Markdown; an x-devonthink-item link to that tag will be inserted as a link on that word.

(So inserting x-devonthink links is basically a different issue from removing the indentation from the code. I am on DT 3.7 now, so maybe the latest app-wide changes to how Markdown is handled cause this.)
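
Until this is sorted out, the inserted links can be removed from an already saved Markdown file with a small script. This is only a sketch: the pattern is based on the link format seen in the output above, `strip_devonthink_links` is a made-up name, and it would also unwrap any x-devonthink-item links you added on purpose.

```python
import re

# Matches [word](x-devonthink-item://UUID) and captures the link text.
DT_LINK = re.compile(r"\[([^\]\[]+)\]\(x-devonthink-item://[0-9A-Fa-f-]+\)")


def strip_devonthink_links(markdown):
    """Replace every x-devonthink-item link with its plain link text."""
    return DT_LINK.sub(r"\1", markdown)


line = (
    "sh.[python](x-devonthink-item://F717E71B-E4A9-4E50-B48A-EF2AAA7FB638)"
    "(['setup.py', 'install'])"
)
print(strip_devonthink_links(line))  # sh.python(['setup.py', 'install'])
```

This only undoes the link insertion; it cannot restore the indentation that was lost.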

I had a look at what the Medium page you posted first uses in its code section. It is a mind-boggling mess. That does not mean it is impossible to render in MD, just that it might be overly complicated (e.g. they pepper the code with span and br elements that are completely pointless in this context).

Anyway, if the plug-ins you mentioned do what you want, it might be worthwhile for the DT people (@cgrunenberg @aedwards ?) to have a look at Mozilla’s readability.js – that’s at the core of the plug-ins. And who, if not Mozilla, should know how to parse even the weirdest HTML :wink:

My workaround is that I actually use that plugin and save the Markdown to a folder indexed by DT :slight_smile: