This #recipe shows how to convert documents into Markdown so you can store them in your notes.
We will use Pandoc, a popular universal document converter. It can convert Microsoft Word, HTML, LaTeX, and many other formats into Markdown and a wide range of other output formats.
This recipe walks through converting Microsoft Word documents to Markdown. For detailed instructions on using Pandoc, refer to the Pandoc documentation.
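To check that Pandoc is installed, open a terminal and run the following command. If Pandoc is installed, it prints its version number.

```shell
pandoc --version
```

To convert all Microsoft Word documents in a folder, open a terminal in that folder, paste the command for your operating system, and press Enter to run it.

On macOS or Linux (this command also converts documents in subfolders and writes each Markdown file next to its original):

```shell
# Find every .docx file under the current folder and convert each one with Pandoc
find . -name "*.docx" -type f -exec sh -c '
for f; do
  # ${f%.*} strips the .docx extension, so report.docx becomes report.md
  pandoc --extract-media=./ -f docx -t markdown -o "${f%.*}.md" "$f"
done
' find-sh {} +
```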
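Before converting, you may want to preview which files the command will pick up and what each output file will be named. The sketch below is not part of the original recipe: it wraps the same `find` loop in a function (the name `preview_conversion` is hypothetical) and replaces the `pandoc` call with `echo`, so it only prints the commands it would run and converts nothing.

```shell
# Dry run: print the pandoc command that would run for each .docx file
# under a folder (default: the current folder). Nothing is converted.
# preview_conversion is a hypothetical helper name, not a Pandoc feature.
preview_conversion() {
  find "${1:-.}" -name "*.docx" -type f -exec sh -c '
  for f; do
    # Same output name as the real command: extension swapped for .md
    echo pandoc --extract-media=./ -f docx -t markdown -o "${f%.*}.md" "$f"
  done
  ' find-sh {} +
}
```

Run `preview_conversion .` from the folder that holds your documents. Note that `find` does not sort its results, so the lines may appear in any order.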
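On Windows (PowerShell), this command converts the documents in the current folder only:

```powershell
# Convert each .docx file in the current folder to a .md file with the same base name
Get-ChildItem . -Filter *.docx |
ForEach-Object {
    # BaseName drops the extension regardless of its case (.docx or .DOCX)
    pandoc --extract-media=./ --from docx --to markdown $_.FullName -o ($_.BaseName + '.md')
}
```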
Pandoc accepts a range of command-line arguments to control the conversion process. Here are the ones used in the commands above:

- `--extract-media=./` extracts the images from the Microsoft Word documents and stores them in a subfolder named `media` in the folder where you run the command.
- `-f docx` (`--from docx`) reads the input as a Microsoft Word document.
- `-t markdown` (`--to markdown`) converts the documents to Pandoc’s Markdown. You can also use `-t gfm` to convert to GitHub Flavored Markdown.

Review the converted Markdown files to make sure the conversion was successful before you delete the original Microsoft Word documents.
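Once you have reviewed the output, a cautious approach is to list only the Word documents that already have a non-empty Markdown file next to them and delete those by hand. A minimal sketch, assuming the `.md` files were written next to the originals with the same base name (as both commands above do); the function name `list_converted` is hypothetical:

```shell
# List .docx files under a folder (default: the current folder) that have a
# non-empty .md file with the same base name next to them.
# Nothing is deleted. list_converted is a hypothetical helper name.
list_converted() {
  find "${1:-.}" -name "*.docx" -type f -exec sh -c '
  for f; do
    md="${f%.*}.md"
    # -s is true when the file exists and is larger than zero bytes
    if [ -s "$md" ]; then
      printf "%s\n" "$f"
    fi
  done
  ' find-sh {} +
}
```

Running `list_converted .` prints the candidates; a missing or empty `.md` file keeps its original off the list, which guards against deleting a document whose conversion failed.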