Common errors you might see in PDF to Word conversion

By Team DocFly

Converting a PDF file to a Word document opens a ton of doors -- it lets you modify and rework content that was previously uneditable. Especially when you use a top-tier PDF to Word conversion tool, changing the file type will save you time and aggravation regardless of what modifications you need to make.

Image: screen showing errors. Source: Unsplash/David Pupăză

Scenarios for converting PDF to Word

If you are handed a large paper file that needs to be reviewed and edited, scanning the file and converting it from PDF to Word is the most practical way to create an editable file.

If your team is about to email a PDF proposal to a client and you catch a last-minute typo, you can quickly convert it to Word, make the necessary changes, review it, convert it back from Word to PDF and send it off to the client.

If you’re a book publisher who receives manuscripts in all sorts of formats, you need a standardised process to convert them to Word so you can format the book and get it dialed in for the galleys.

These are just a few scenarios when a PDF to Word conversion tool will come in handy. Needless to say, there are countless other situations when file converting will make your life easier.

Now that we’ve convinced you that PDF to Word conversion tools might be the greatest thing since sliced bread, we must acknowledge that not every conversion turns out perfectly. The results are usually very close, but one con of converting PDF to Word is that even the very best tools occasionally produce some minor errors.

Let’s say you’re publishing a 100,000 word book and using a very reliable PDF to Word conversion tool that guarantees 99.9% accuracy. That means approximately one out of every 1,000 words may have a mistake, which adds up to roughly 100 errors across the whole book. Publishing your manuscript with 100 minor typos will inevitably result in a major problem.
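The arithmetic above is easy to rerun for your own document. Here is a minimal Python sketch; the function name `expected_errors` is our own invention, and it assumes the advertised accuracy applies per word, which is a simplification.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Estimate how many words are likely to contain a conversion error.

    accuracy is a fraction between 0 and 1 (0.999 means 99.9%).
    Assumption: the accuracy figure applies to each word independently.
    """
    return round(word_count * (1 - accuracy))


# The example from the text: a 100,000 word book at 99.9% accuracy.
print(expected_errors(100_000, 0.999))  # 100
```

Plug in your own word count to get a feel for how many errors a final proofread should be hunting for.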

Conversion errors can take a variety of forms -- ranging from subtle font inconsistencies that are hardly noticeable to critical misspellings or distorted graphics. While conversion errors are usually rare and small in the grand scheme of things, it’s important to be aware of a few of the most common types of conversion errors and stomp them out. If you know to keep an eye out for these, catching and correcting them should be straightforward.


Here are 8 common conversion errors to be on the lookout for when reviewing a converted file:


Conversion errors when converting PDF to Word

Font Problems

PDF to Word conversion tools often use optical character recognition (OCR) software to identify how words and figures fit together. They are designed to read and convert a wide variety of fonts. With that said, new fonts are constantly being created and modified.

The accuracy of OCR software has come a long way over the past few years. However, even with the considerable advancements, OCR software is unfortunately still far from perfect.

If you use Times New Roman 12 point font the entire time, it’s pretty likely you won’t have any issues. But exclusively using bland fonts isn’t any fun. If your text includes a combination of Lobster, Pacifico and Anton, it will certainly be more lively and engaging, but you may be more susceptible to font problems while converting from PDF to Word.


Disjointed Letters and Numbers

In addition to font errors, OCR software can cause some other types of small issues. Particularly when interpreting lower quality scanned papers, letters or numbers are sometimes misread.

For example, the capital letter "O" could get mistaken for the number "0." Depending on the fonts, lowercase "l," capital "I" and number "1" can all resemble each other. A lowercase "b" and number "6" can get mixed up.

Making matters more complex, OCR software can sometimes combine two letters into one or break one letter into two when reading a scanned PDF. For example, the software could incorrectly divide a "w" into "vv." Word’s spell check feature will usually catch egregiously misspelled words by displaying the infamous squiggly red line under the allegedly misspelled word. Yet, relying on Word spell check exclusively is a dangerous game that should be avoided.
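If you can copy the converted text out of Word, a few lines of Python can flag the kinds of mix-ups described above. This is only a rough sketch: the function name `suspicious_tokens` and the two rules it applies (letters and digits mixed in one token, or a "vv" pair) are our own assumptions, not how any particular conversion tool works.

```python
import re


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that look like possible OCR mix-ups.

    Heuristics (assumptions, expect false positives):
    - a token mixing letters and digits, e.g. "1O0" or "b6"
    - a token containing "vv", a common misreading of "w"
    """
    flagged = []
    for token in re.findall(r"\w+", text):
        has_letter = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if (has_letter and has_digit) or "vv" in token.lower():
            flagged.append(token)
    return flagged
```

Legitimate tokens such as "A4" or "savvy" will be flagged too, so treat the output as a list of places to look, not a list of confirmed errors.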


Wrong Words

If disjointed letters cause a word to be misspelled in some nonsensical way, Word’s spell check feature should display its infamous red squiggly line below the misspelled word -- which, in this case, is actually good news.

For instance, let’s use our previous example in which a PDF conversion software incorrectly read a "w" as "vv." The original document used the word "lower," but the converted document displays "lovver." A quick reread of the sentence will demonstrate that you didn't intend to say "lovver" or "lover" -- you meant to say "lower." Manually correct and you are good to go. Word spell check should shine a spotlight on the vast majority of text-based spelling errors.

Alas, spell check is not foolproof and should not be relied upon instead of thorough proofreading. It flags "lovver" because that isn’t a word, but if a conversion error produces "lover," which is spelled correctly, no red line appears. You and your publisher would feel rather foolish if "lovver" or "lover" is accidentally displayed instead of "lower" anywhere in your manuscript.
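Since spell check cannot catch a wrong word that happens to be a real word, one option is to compare the converted text against the original word by word. The sketch below uses Python's standard `difflib` module. It assumes you already have both versions as plain text, and the function name `changed_words` is hypothetical.

```python
import difflib


def changed_words(original: str, converted: str) -> list[tuple[str, str]]:
    """Return (original, converted) pairs for every stretch of words that differs."""
    a, b = original.split(), converted.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Inserted or deleted words show up with an empty string on one side.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

For the example above, comparing "move the lower shelf" with "move the lover shelf" returns a single pair, ("lower", "lover"), which spell check alone would have missed.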


Bold, Underline and Italics Errors

Bold, underline and italics are effective ways to emphasize titles, names, key points and more. Writers don’t use them arbitrarily -- they serve a particular purpose.

If the text that you emphasized in a certain way doesn’t convert correctly, it’s a problem. Sometimes OCR conversions might interpret bold, underline and italics text as a different font or even entirely different characters.

Modified typography can be an area of weakness for OCR software. It’s wise to double and triple check your fancy typography and make sure your fonts and styles display exactly as you intended.


Hyphenation Confusion

Most standard manuscripts use "justified alignment" -- meaning that every line of text stretches from the left margin all the way to the right margin. This is slightly different from "left alignment" -- which means that each line simply ends wherever its last word fits, and a word that doesn’t fit automatically drops down to the next line. For reference, this blog post is aligned left.

Justified alignment has aesthetic advantages -- namely the text is neatly displayed in a box-like shape instead of having the right column unevenly distributed based on wherever the last word on the line ends.

One of the tricks that justified alignment uses to pull this off is hyphenating words that don’t completely fit at the end of a line. Words are only hyphenated between syllables, but that’s definitely enough to cause confusion in file conversion. If Word page settings (such as gutter width or line spacing) aren’t identical to the original PDF document, the line breaks shift and leftover hyphens can pop up in the middle of a line in the converted file.

Sometimes, these are inconsequential errors that stand out and are easy to catch. For example, if the word "encourage" is broken up as "en-courage," it’s easy to catch and delete the hyphen.

Sometimes, they can be catastrophic miscues where meaning is completely changed. Suppose you intended to use the word "manslaughter" in your text. How disastrous would it be if the justified alignment settings resulted in converting your text to "mans-laughter"?

The CTRL+F (command+F on Macs) feature allows you to locate all hyphens and delete those that are unnecessary. You don’t have to stress too much about hyphen errors because they’re easy to catch, but definitely be aware of the potential awkwardness that could arise.
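If the manuscript is long, checking every hyphen by hand gets tedious. As a rough alternative, the Python sketch below lists hyphenated words whose two halves join into a word you recognise. The function name `stray_hyphen_candidates` and the `known_words` set are our own assumptions; in practice you would load a real dictionary word list.

```python
import re


def stray_hyphen_candidates(text: str, known_words: set[str]) -> list[str]:
    """Return hyphenated words that form a known word once the hyphen is removed."""
    candidates = []
    for match in re.finditer(r"\b([A-Za-z]+)-([A-Za-z]+)\b", text):
        joined = (match.group(1) + match.group(2)).lower()
        if joined in known_words:
            candidates.append(match.group(0))
    return candidates
```

Given the sentence "We en-courage a well-known author to avoid mans-laughter." and a word list containing "encourage" and "manslaughter," this returns "en-courage" and "mans-laughter" while leaving the legitimate "well-known" alone.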


Disappearing Links

Links are a fundamental part of just about every webpage. They enable you to navigate a particular website and the Internet in general. All online content should include links that enhance reader experience.

Alas, links are one of the more common pieces of information lost when a PDF document is converted into Word. The OCR software accurately pulls the text, but sometimes misses hyperlinks.

The likelihood of hyperlinks being excluded increases when using natural anchor text instead of the actual URL in the body of the text. Since a raw URL disrupts the flow of written prose, relevant anchor text is the most common and natural way to include links.

For example, including "Edit PDF" in the natural flow of your document fits much better than spelling out the full URL.

Unfortunately, there isn’t a great shortcut to ensure that all your links were transferred correctly. Proofread thoroughly and, when in doubt, compare the new document to the original.
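That said, if you keep a list of the URLs your original document was supposed to contain, part of the comparison can be scripted. A .docx file is a zip archive, and the external links used in the main body are recorded in `word/_rels/document.xml.rels`. The Python sketch below reads that file with the standard library only. The function names are hypothetical, and links in headers, footers or footnotes are stored elsewhere and are not covered here.

```python
import zipfile
import xml.etree.ElementTree as ET

RELS_PATH = "word/_rels/document.xml.rels"
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def docx_hyperlinks(docx_file) -> set[str]:
    """Return the hyperlink targets recorded for the main body of a .docx file."""
    with zipfile.ZipFile(docx_file) as archive:
        if RELS_PATH not in archive.namelist():
            return set()
        root = ET.fromstring(archive.read(RELS_PATH))
    return {
        rel.get("Target")
        for rel in root.iter(RELS_NS + "Relationship")
        if rel.get("Type", "").endswith("/hyperlink")
    }


def missing_links(expected: set[str], docx_file) -> set[str]:
    """Return the expected URLs that do not appear in the converted file."""
    return expected - docx_hyperlinks(docx_file)
```

Calling `missing_links` with your list of expected URLs and the path to the converted file gives you the links to restore by hand. It does not check that each link is attached to the right anchor text, so a read-through is still worthwhile.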


Column and Line Break Inconsistencies

Columns and line breaks don’t always align perfectly between PDF and Word. Column widths and line spacing in Word are very easy to customise -- for the purpose of PDF to Word file conversion, perhaps too easy.

Even the slightest page setting discrepancies can cause a ripple effect through your converted file. Suppose the margins in Word are set so that the text area is half a cm wider on each side than it was in the original PDF document. That 0.5 cm of additional space on each side is enough to fit a couple of extra letters per line -- which can often have the cumulative effect of moving words, sentences and even paragraphs further up the page.

Additionally, it seems like every document uses slightly different breaks between lines and paragraphs. Some document settings automatically include a 6pt space between paragraphs. Some are double spaced. Others are exclusively single spaced and use the tab button to signal a new paragraph.

Fortunately for people who want to design their documents in particular ways, there are seemingly endless formatting options to design your document so it looks exactly as you want it.

That freedom is great, but it does come at a small cost when it comes to file conversion. Pay close attention to the margins and spacing of your converted file and make sure they do indeed meet your exact specifications.


Distorted Graphics

The final type of PDF to Word conversion issue falls in a separate category from text-based errors. If your PDF file includes pictures, tables, graphs or other visuals (most usually do!), there’s a chance that your graphics will not transfer perfectly when converted from PDF to Word.

Suppose you include an 8 x 8 cm bar graph in your PDF document with two paragraphs of text neatly wrapped around the left of the graph. It’s likely that the text was formatted to fit around the graph in a visually attractive way. If margins are slightly different or the graph converts slightly larger or smaller than its original size, it could throw off your original text-wrapped layout.

Tiny graphic distortions usually aren’t a big deal, but it never hurts to compare converted graphics to originals and view them with a little extra scrutiny.


Bottom Line

Now that you’ve read a detailed list of conversion errors, we want to remind you that PDF to Word conversion software has improved immensely over the last decade. The best conversion tools are remarkably adept at interpreting characters, fonts, spacing and graphics. They operate well above 99% accuracy (DocFly is one such tool). It’s highly unlikely any of the above issues will ruin your document or cause you major issues.

However, after investing large amounts of time, energy and money into creating a high quality document, it would be a shame to speed through the final edit and miss a couple of silly conversion errors that easily could’ve been avoided. Keep these possible errors in mind while converting PDF to Word so the world can see the best possible version of your work.