This guide explains why it is important to use PDF files with selectable text and highlights Doc Drop, a tool that helps optimize these files before you create a Hypothesis activity.
If you find yourself working with a PDF in which none of the text is selectable, this is the page for you!
What is OCR?
OCR, or Optical Character Recognition, is a process where software converts images of text into a machine-readable format. Web browsers and apps like Hypothesis need this machine-readable format in order to recognize and select text within the document.
OCR-optimized documents are beneficial to blind and visually impaired readers, as OCR allows screen readers and other assistive technology to interact with the text. Working with OCR-optimized documents is a best practice whether or not you are annotating with Hypothesis.
How do I know whether my PDF is OCRed?
If you can easily select a line of text and then copy and paste it elsewhere, and the pasted text is properly formatted, your PDF is OCR-optimized and you can start annotating.
You will need to apply OCR technology to your PDF if any of the following is true:
- You are unable to select any text
- You can select text, but it is difficult to get only the text you want
- You can select text, but it is “garbled” or poorly formatted once you copy and paste it elsewhere
- Someone who uses screen reader technology has indicated the PDF is difficult to read
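For a folder of PDFs, the manual copy-and-paste check above can be roughly approximated in code. The sketch below is an illustration rather than part of Hypothesis or Doc Drop: the function names, the thresholds, and the English-only "word-like" test are all assumptions, and the optional file helper relies on the third-party pypdf library. It cannot tell you whether a screen reader user finds the document readable, so treat the result as a hint only.

```python
import re

# Assumption: a token counts as "word-like" if it is letters (plus ' or -)
# with at most one trailing punctuation mark. This is English-centric.
WORDLIKE = re.compile(r"[A-Za-z][A-Za-z'\-]*[.,;:!?]?")


def classify_text_layer(page_texts, min_chars_per_page=20, min_word_ratio=0.6):
    """Return 'missing', 'garbled', or 'ok' for a list of per-page strings.

    Both thresholds are illustrative guesses, not values from Hypothesis.
    """
    joined = " ".join(text or "" for text in page_texts)
    tokens = joined.split()
    visible_chars = sum(len(token) for token in tokens)

    # Little or no extractable text: the PDF is probably a scanned image.
    if not tokens or visible_chars < min_chars_per_page * max(len(page_texts), 1):
        return "missing"

    # Text exists, but most tokens do not look like words: likely a bad text layer.
    wordlike = sum(1 for token in tokens if WORDLIKE.fullmatch(token))
    if wordlike / len(tokens) < min_word_ratio:
        return "garbled"

    return "ok"


def extract_page_texts(path):
    """Read per-page text from a PDF file (requires: pip install pypdf)."""
    from pypdf import PdfReader  # imported here so the classifier works without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

A call such as classify_text_layer(extract_page_texts("reading.pdf")), where reading.pdf is a placeholder file name, would suggest running OCR when it returns 'missing' and trying a forced re-OCR when it returns 'garbled'.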
Below are directions on how to use Doc Drop.
What is Doc Drop?
Doc Drop is a free, simple-to-use tool created by Hypothesis that optimizes your PDF files using the best underlying technology. It allows you to drag and drop almost any document from your computer onto its site.
To OCR a PDF
- Open the Doc Drop webpage.
- Drag a file onto the Doc Drop page, or click the page and select the file from your computer.
- Click “Run OCR”.
- If your PDF already has selectable text but it is garbled, incomplete, or otherwise broken, try the “Force OCR” button instead to create a new text layer in the document.
- Download the resulting PDF and use it in Hypothesis.
This content is adapted from a resource created by Hypothesis and is shared under a CC-BY-NC license.