Searching Baidu for "PDF to markdown" mostly returns results for the reverse conversion, that is, converting markdown text to PDF.
Solutions for converting PDF to markdown are rare.
Because I happened to need this at work, I implemented a solution myself.
The following figure shows a PDF file opened in PDF-XChange Editor. I want to export its contents in markdown format.
(1) First, export the PDF to Word format, that is, a file with the .docx suffix.
(2) Use Typora to obtain the markdown source code of the Word document:
At this point the task is only half done. When Typora converts the Word document to markdown, any pictures in the document become tags that point to local image files. If I publish markdown containing these local image tags directly to communities that support markdown, such as Jianshu, CSDN, Open Source China, Tencent Cloud and Alibaba Cloud, the local pictures will not be displayed.
Therefore, we need an efficient way to first upload the local images contained in the Word document to the web, and then replace each local image tag with a markdown tag that points to the image's online URL.
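To see which pictures would break after publishing, one can scan the markdown for image tags whose target is not a web URL. The following is a minimal sketch, assuming inline image tags of the form `![alt](target)` without titles or spaces in the target; the function name `find_local_images` and the sample path are hypothetical.

```python
import re
from typing import List

# Matches inline markdown image tags: ![alt text](target)
# Assumption: targets contain no spaces or ")" and carry no "title" part.
IMAGE_TAG = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def find_local_images(markdown: str) -> List[str]:
    """Return the targets of image tags that are not http(s) URLs."""
    local = []
    for match in IMAGE_TAG.finditer(markdown):
        target = match.group(2)
        # Anything without an http:// or https:// scheme is treated as a local file
        if not target.lower().startswith(("http://", "https://")):
            local.append(target)
    return local
```

For example, `find_local_images("![](./media/image1.png)\n![logo](https://example.com/logo.png)")` returns only `['./media/image1.png']`, because the second tag already points to the web.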
(3) Change the suffix of the Word file from .docx to .zip. After unzipping it, all the local images can be found in the media subfolder of the word folder.
Upload all of these local files to a website, which generates the following URLs:
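Since a .docx file is a ZIP archive, the images can also be pulled out of `word/media/` without renaming the file. This is a minimal sketch using Python's standard `zipfile` module; the function name `extract_docx_media` and the output folder are hypothetical choices, not part of the original workflow.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_docx_media(docx_path: str, out_dir: str) -> List[Path]:
    """Copy every file under word/media/ in a .docx archive into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    # A .docx file is already a ZIP archive, so zipfile can open it directly
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            if name.startswith("word/media/") and not name.endswith("/"):
                # Keep only the base name so nothing is written outside out_dir
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted
```

A call such as `extract_docx_media("converted.docx", "images")` (hypothetical file names) leaves the pictures in the `images` folder, ready to be uploaded.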
I wrote a tool that merges two inputs: the markdown source code containing only local image tags, and the source code containing the online image URL tags above. After the merge, each local image tag is replaced by the corresponding online image tag:
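The merge step can be sketched as follows. This is not the author's tool; it is a minimal illustration assuming that the online tags are listed in the same order as the local images appear in the document, so the n-th local tag is swapped for the n-th online tag. The function name `replace_local_images` and the example URLs are hypothetical.

```python
import re

# Matches a whole inline markdown image tag: ![alt](target)
IMAGE_TAG = re.compile(r'!\[[^\]]*\]\([^)\s]+\)')


def replace_local_images(markdown: str, online_tags: str) -> str:
    """Replace the image tags in markdown, in order, with those in online_tags.

    Assumption: online_tags holds exactly one online tag per local image,
    in the same order as the images appear in the document.
    """
    online = IMAGE_TAG.findall(online_tags)
    local_count = len(IMAGE_TAG.findall(markdown))
    if local_count != len(online):
        raise ValueError(f"{local_count} local tags but {len(online)} online tags")
    replacements = iter(online)
    # re.sub calls the function once per match, left to right,
    # and inserts its return value literally
    return IMAGE_TAG.sub(lambda _match: next(replacements), markdown)
```

Pairing by position keeps the sketch simple; pairing by file name would be more robust if the uploaded URLs come back in a different order.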
This tool can be obtained from my GitHub:
The following figure shows my original PDF after it was converted to markdown and published in a community; its appearance is identical to that of the original PDF:
For more of Jerry's original articles, please follow the WeChat official account "Wang Zixi":