Word Docs to HTML

Convert Word Docs to HTML
Tagged: Websites
Tuesday, September 26, 2017

Install Pandoc

Visit pandoc.org to download and install the tool. I am on Windows and used the installer package. Installation was very simple, and the default options were all I needed.

This command-line tool will help get that Word document into HTML. It's not perfect, and most of the headings will need to be corrected, though how much depends on how the author formatted things when writing the Word doc. Still, it gets close and strips out all that nasty ms-style code every web developer hates.

Convert the Files

Open a command prompt and navigate to the folder that contains your Word document. In this example the document is called content.docx. Run the following command to convert it into an HTML file.

pandoc -f docx -t html -o content.html content.docx --extract-media=images

The same directory should now contain a content.html file. You will typically still have to correct the headings and clean up some markup, but you're halfway there and have saved a bunch of time!
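
Part of the heading cleanup can sometimes be scripted. The sketch below is only an illustration and assumes one specific problem: every heading in content.html is one level too high (for example, the Word doc used Heading 1 where the web page needs h2). The function name shift_headings and the regular expression are my own; they are not part of Pandoc, and real documents may need more careful, manual fixes.

```python
import re

# Matches the start of an opening or closing heading tag: <h1 ... or </h1
# The \b keeps tags such as <hr> or <html> from matching.
HEADING_TAG = re.compile(r"(</?h)([1-6])\b")


def shift_headings(html, offset=1):
    """Return html with every h1-h6 tag moved by `offset` levels.

    Levels are clamped to the valid range 1..6.
    """
    def bump(match):
        level = int(match.group(2)) + offset
        level = min(6, max(1, level))
        return f"{match.group(1)}{level}"

    return HEADING_TAG.sub(bump, html)
```

For example, shift_headings('<h1>Intro</h1>') returns '<h2>Intro</h2>'. Recent Pandoc releases also have a --shift-heading-level-by option that may cover the same need at conversion time, so check your version before reaching for a script.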

extract-media flag

The --extract-media= flag extracts any images from the document into a folder. I wanted them placed in an images folder, so the flag becomes --extract-media=images.
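
If you have a whole folder of Word documents, the same command can be wrapped in a short script. This is a rough sketch rather than anything Pandoc ships: the function names build_pandoc_command and convert_all are made up for illustration, and it assumes pandoc is on your PATH. It gives each document its own media folder under images (an assumption on my part, so that image files from different documents are less likely to overwrite each other).

```python
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, media_dir="images"):
    """Build the same pandoc call used above, as an argument list."""
    docx_path = Path(docx_path)
    html_path = docx_path.with_suffix(".html")  # content.docx -> content.html
    return [
        "pandoc",
        "-f", "docx",
        "-t", "html",
        "-o", str(html_path),
        str(docx_path),
        f"--extract-media={media_dir}",
    ]


def convert_all(folder=".", media_root="images"):
    """Convert every .docx file in `folder` (requires pandoc on the PATH)."""
    for docx_path in sorted(Path(folder).glob("*.docx")):
        # Hypothetical layout: one media folder per document, e.g. images/content
        media_dir = (Path(media_root) / docx_path.stem).as_posix()
        subprocess.run(build_pandoc_command(docx_path, media_dir), check=True)
```

Calling convert_all() from the folder that holds the documents runs one conversion per .docx file and stops on the first failure, because check=True raises an error when pandoc exits with a non-zero status.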