Friday 9 January 2009

SBS 2008 - PDF Search in Companyweb SharePoint

Here are some excellent instructions for enabling the SharePoint spiders to crawl and index PDF files and their text content: SharePoint Server 2007 and SharePoint Services V3 PDF search and indexing.

The post links to the correct icon for us to download and install, though the file name given in the post differs from the actual file name. We also need a newer version of the iFilter for 64-bit systems: Adobe - Acrobat : For Windows : Adobe PDF iFilter 9 for 64-bit ...


  • The correct file name for the XML edit: icpdf_3.gif

In a nutshell:
  1. Download the above iFilter v9 from Adobe's site.
  2. Extract the installer from PDFiFilter64installer.zip.
  3. Click Start, right-click Command Prompt, and choose Run as administrator.
  4. Click Continue at the UAC prompt.
  5. net stop iisadmin [Enter]
  6. In Windows Explorer, double-click the PDFFilter64installer file.
  7. Copy the PDF icon to C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images
  8. Edit the DOCICON.XML file in the C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Xml\ directory.
  9. Paste the following into the <ByExtension> section, which is sorted alphabetically by extension (ours went under the "onetoc2" extension):
  10. <Mapping Key="pdf" Value="icpdf_3.gif"/> (match the capitalization of the neighbouring entries in the file)
  11. Save the modified XML file to the server's desktop, then copy it back into the Xml folder, clicking through two UAC prompts.
  12. Back at the command prompt: net start iisadmin [Enter]
  13. Click Start, type Windows SBS Native Tools Management in Search, and press [Enter].
  14. Click on the IIS Manager.
  15. Under the server node, SS-SBS (MySBSDomain) in our case, click on Application Pools.
  16. Right-click each of the following and choose Recycle:
    • DefaultAppPool
    • SBS Sharepoint AppPool
    • SBS Web Applications application pool
    • SBS Web Workplace AppPool
    • SharePoint Central Administration v3
  17. Or, instead of recycling the application pools, append the System Path with: C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\ and reboot!
  18. Post the PDFs.
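For anyone who would rather script the DOCICON.XML edit in steps 8 through 10 than do it by hand, here is a minimal Python sketch. It assumes the file has a ByExtension element holding Mapping entries with Key and Value attributes, sorted by Key; the function name add_pdf_mapping and the sample icon names in the usage note are made up for illustration. ElementTree drops comments and the XML declaration when it rewrites a document, so back up the real file before trying anything like this on a server.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(docicon_xml: str, icon: str = "icpdf_3.gif") -> str:
    """Return DOCICON.XML text with a pdf Mapping in the ByExtension section.

    Assumption: entries look like <Mapping Key="ext" Value="icon.gif"/>
    and are sorted alphabetically by Key.
    """
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no ByExtension section found")

    mappings = list(by_extension)

    # If a pdf entry already exists, just point it at the right icon.
    for mapping in mappings:
        if mapping.get("Key", "").lower() == "pdf":
            mapping.set("Value", icon)
            return ET.tostring(root, encoding="unicode")

    # Otherwise insert before the first extension that sorts after "pdf".
    index = len(mappings)
    for position, mapping in enumerate(mappings):
        if mapping.get("Key", "").lower() > "pdf":
            index = position
            break

    by_extension.insert(index, ET.Element("Mapping", {"Key": "pdf", "Value": icon}))
    return ET.tostring(root, encoding="unicode")
```

Feeding it a cut-down sample such as `<DocIcons><ByExtension><Mapping Key="onetoc2" Value="a.gif"/><Mapping Key="png" Value="b.gif"/></ByExtension></DocIcons>` places the pdf entry between onetoc2 and png, which matches where ours ended up.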

The update to the System path is suggested in the Adobe installation instructions.
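As a rough illustration of what "append the System Path" means, the Python sketch below works out the new value from the current one and skips the append if the folder is already listed. The helper name append_to_path is hypothetical, and the folder is the one from the step above with the trailing backslash dropped. The sketch only computes a string; actually changing the variable on the server is still done through System Properties, followed by the reboot.

```python
# Folder from the Adobe instructions (trailing backslash omitted here).
IFILTER_BIN = r"C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin"


def append_to_path(path_value: str, new_dir: str = IFILTER_BIN) -> str:
    """Return path_value with new_dir appended, unless it is already present."""

    def normalise(entry: str) -> str:
        # Windows paths are case-insensitive; ignore a trailing backslash too.
        return entry.strip().rstrip("\\").lower()

    entries = [entry for entry in path_value.split(";") if entry.strip()]
    if normalise(new_dir) in {normalise(entry) for entry in entries}:
        return path_value  # already listed, nothing to do
    return ";".join(entries + [new_dir])
```

Comparing entries case-insensitively avoids adding the same folder twice when it was typed with different capitalization or a trailing backslash.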

Once the SharePoint application pools have been recycled, we will see:



[Screenshot: PDF document icons]

One very, very important thing to remember: Upload the PDF files AFTER the Adobe PDF iFilter is installed so that the files are indexed. Otherwise, they will not be indexed until they are touched!

This is very important if there are thousands of PDF files to place into a document library or libraries.

UPDATE: Added the System Path variable to be appended in the last step, and corrected some grammar errors.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists


14 comments:

Anonymous said...

Phil,

As the blog article you cite mentions, if you already have PDFs in SharePoint when you add the iFilter, you can simply force a recrawl of all the search data via:

stsadm.exe -o spsearch -action fullcrawlstop
then: stsadm.exe -o spsearch -action fullcrawlstart

Thanks
Robert Crane
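For anyone wanting to script the two commands above, here is a small Python sketch that runs them in order. The stsadm.exe location is an assumption based on the usual "12" folder used elsewhere in this post, and the function names are made up for illustration. It needs to run from an elevated prompt on the server itself.

```python
import subprocess
from typing import List

# Assumed default location of stsadm.exe for WSS 3.0; adjust if yours differs.
STSADM = (
    r"C:\Program Files\Common Files\Microsoft Shared"
    r"\Web Server Extensions\12\BIN\stsadm.exe"
)


def full_crawl_commands(stsadm: str = STSADM) -> List[List[str]]:
    """Build the stop and start commands for a full search crawl."""
    return [
        [stsadm, "-o", "spsearch", "-action", "fullcrawlstop"],
        [stsadm, "-o", "spsearch", "-action", "fullcrawlstart"],
    ]


def run_full_crawl(stsadm: str = STSADM) -> None:
    """Run both commands in order, stopping at the first failure."""
    for command in full_crawl_commands(stsadm):
        subprocess.run(command, check=True)
```

Calling run_full_crawl() issues the stop first and the start second, and check=True raises an error if either command returns a non-zero exit code.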

Philip Elder Cluster MVP said...

Robert,

Thanks for catching me on that.

Been a very looonnnngggg day today with another coming tomorrow. :)

Philip

Anonymous said...

Phil,

Not trying to catch you out, but it took me some time to locate that solution while I was preparing my SharePoint Guide.

I thought that this time of the year was for holidays, not loooonnng days!

Thanks
Robert Crane

Anonymous said...

I followed this guide but I can't search the contents of the PDF files, and their icon is still blank.

The only two things that were different for me were:

During the install, iisadmin was already started when I issued the net start iisadmin command. It must have restarted itself sometime during the process?

Also, I'm not sure if this was in the wrong place, but I don't have a Post the PDFs application pool.

I tried to reboot the server and added the path as suggested.

Any ideas on where to start?

Philip Elder Cluster MVP said...

Steve,

Make sure your filename is correct in the XML file since the icon file name in the download is not the same as the ones contained in the instructions.

You will need to follow the steps to initiate a spider crawl of the site manually to get your PDFs to be searchable.

Also, the PDF files themselves need to have "active" text. That is, you can copy and paste the text within a PDF file to Word or Notepad.

Thanks for the comment,

Philip

Unknown said...

Dear Philip,
I have an SBS 2k8 scenario, a very clean environment...

I've been working for a while on getting the PDF files searchable, but it's not working for me for some reason.
Search only finds the PDF documents by their properties and title, not by their contents.

Icons are working properly.

I did install the PDF iFilter and followed the steps up there, but no good results!

Any recommendations?

Philip Elder Cluster MVP said...

hgaop,

Open one of the PDF documents. Can you highlight, copy, and paste the text into a word processor? If not, then the PDF contains a picture snapshot of the page with the text.

This happens if the device that creates the PDF during the scan is not set to OCR the document as it is scanned.

Search Server Express has the ability to OCR a PDF document that contains images of documents IIRC, but you need another box to run it as it cannot be installed on SBS.

Philip
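With thousands of PDFs, opening each one to test copy and paste is not practical. The Python sketch below is a rough heuristic, not a real PDF parser: it assumes that a PDF with selectable text references at least one font (a /Font entry), while an image-only scan usually does not. It also looks inside zlib-compressed streams, since newer PDFs can tuck those entries away there. The function name is made up for illustration, and a watermark or stamp font can produce a false positive, so treat the result as a hint only.

```python
import re
import zlib

# Everything between a "stream" keyword and the next "endstream".
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_it_has_text(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the PDF appears to reference any font."""
    if b"/Font" in pdf_bytes:
        return True

    # Fonts may be declared inside compressed streams; try to inflate each one.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data (for example a JPEG image), skip it
        if b"/Font" in inflated:
            return True
    return False
```

A file that comes back False is a good candidate for the scanned-image case described above; one that comes back True should still be spot-checked by hand.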

Unknown said...

thanks for your quick feedback...

Yep, definitely, I am able to copy and paste from the PDFs, which were uploaded AFTER doing all the required steps...

(actually, I am simply copying from the uploaded PDF and pasting the bits of text into the search bar of the WSS site)

Concerning Search Server Express, I do have another box which I could use, and since SBS Premium comes with a second Server license, that could be an alternative.

However, my main concern is to get the PDFs working from WSS...

Any advice?

Unknown said...

Oh, forgot to mention that I first installed iFilter 6... which obviously didn't work due to the 32-bit/64-bit issue.

However, I uninstalled iFilter 6 before deploying version 9.

Philip Elder Cluster MVP said...

Did you restart the crawl after setting up the iFilter?

Philip

Unknown said...

yep,
reset the applications in IIS as indicated, did a stop/start of the services, and ran a full recrawl using stsadm.exe

Unknown said...

do you mean by:

17.Or, instead of recycling the services, append the System Path with: C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\ and reboot!

(this is the only thing i didn't do actually, but i recycled the services)

Philip Elder Cluster MVP said...

hgaop,

The path change requires a reboot of the server. Once the folder has been added to the System Path and the server has rebooted, initiate a crawl as per Robert's previous instructions.

Thanks,

Philip

Macker said...

I've spent the last few hours trying to get this to work. I'm going to have to give up for a while or I'll end up throwing the server out the window (from the 9th floor). I also tried the instructions here... http://www.youtube.com/watch?v=SE9BnEdVKbg&feature=channel_page

...to no avail. I'm wondering if perhaps this doesn't work with the latest updates to SharePoint (as of 2/19/2010). I've even rebooted several times....which as you all know takes AGES on an SBS 2008 machine.