Posts Tagged ‘sharepoint’

Search Meetup Cambridge – Challenges of Unstructured Data

Another Cambridge Search Meetup this week, with two speakers on unstructured data, plus the usual networking, beer and snacks. We started with Dean Yearsley of Pingar, who talked about their API and bravely attempted a live demo of it. Among other things, the API offers entity extraction in multiple languages including English, Chinese and Japanese. The Pingar system is written in .NET and thus unsurprisingly plays well with SharePoint: Dean demonstrated it automatically generating extra metadata for SharePoint items. This is especially useful when a new column has been added to a SharePoint store, as it would be tedious for operators to fill in that column manually for each item.

Jordan Hrycaj of 7Safe, recently acquired by PA Consulting, was up next to talk about what he described as ‘ad-hoc’ search, for use in digital forensics or digital discovery applications. The application he described can search the hard disks of suspect PCs or servers extremely quickly for information such as credit card numbers. It works at a low level to avoid leaving any trace on the data (for example, no file timestamps are altered) and usually runs on live systems. The system is command-line based, tiny in size and portable across operating systems, and is an impressive way to narrow down the likely candidates for a data security breach. It was fascinating to hear about a way to search that doesn’t depend on indexing, and about the compromises made for performance reasons (for example, regular expressions can be used, but without wildcards).
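To give a flavour of index-free searching with a restricted pattern, here is a minimal Python sketch. It is not the 7Safe tool, whose internals weren't shown in detail: the function names, the 16-digit pattern and the Luhn checksum filter are all our own assumptions, chosen only to illustrate scanning raw bytes with a fixed-length expression rather than an open-ended wildcard.

```python
import re

# Fixed-length pattern: exactly 16 ASCII digits, not part of a longer digit run.
# There are no unbounded wildcards such as .* or \d+ (an assumption about what
# "without wildcards" might look like in practice).
CARD_PATTERN = re.compile(rb"(?<!\d)\d{16}(?!\d)")


def luhn_valid(digits: bytes) -> bool:
    """Return True if the ASCII digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = ch - 48  # ASCII '0' is 48
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_chunk(chunk: bytes, base_offset: int = 0):
    """Scan a raw byte buffer and return (offset, number) for each candidate.

    base_offset is the position of this chunk within the disk or file being
    read, so hits can be reported as absolute offsets.
    """
    hits = []
    for match in CARD_PATTERN.finditer(chunk):
        if luhn_valid(match.group()):
            hits.append((base_offset + match.start(), match.group().decode("ascii")))
    return hits
```

A real forensic tool would feed this kind of scanner with blocks read directly from the device, and would need to handle numbers that straddle a block boundary; neither is attempted here.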

Thanks to both speakers and to all who came to hear them. We already have some more talks lined up so we expect the next Meetup to be sooner rather than later!

Building bridges in the Cloud with open source search

We’ve just published a case study on our work for C Spencer Ltd., a UK-based civil engineering company who take a proactive approach to document management: instead of taking the default SharePoint route or buying another product off the shelf, they decided to create their own in-house system based on open source components, hosted in the Amazon Web Services (AWS) cloud. We’ve helped them integrate Apache Solr to provide full text search across the millions of items held in the document management system, with sub-second response times. Their staff can now find letters, contracts, emails and designs quickly via a web interface.
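For readers who haven’t used Solr, a full text query is just an HTTP request to its select handler. The Python sketch below only builds such a URL; the host, the core name `documents` and the helper function are hypothetical and not taken from the C Spencer system, while `q`, `rows` and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url: str, core: str, text: str, rows: int = 10) -> str:
    """Build a URL for a Solr full text query.

    base_url and core are placeholders for wherever a Solr instance lives;
    the response format is requested as JSON via the wt parameter.
    """
    params = {"q": text, "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical local instance and core name, for illustration only.
url = build_solr_query_url("http://localhost:8983/solr", "documents", "bridge contract", rows=5)
print(url)
```

A web interface like the one described would typically send a request of this kind on behalf of the user and render the returned documents.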

C Spencer are known for their innovative and modern approach – they’re even building their own green power station on a brownfield site in Hull. It’s thus not surprising that they chose cutting-edge open source technology for search: tracking and managing documents correctly is extremely important to their business.

Not so FAST…

Microsoft have announced a roadmap for their enterprise search products: none of this is very surprising. How successful they’ll be at integrating the FAST technology (which comes from a Linux background) with SharePoint, .NET etc. remains to be seen. More coverage here.

They’ve also released an ‘Express’ (i.e., free but feature-limited) version of Microsoft Search Server. We’re going to take a deeper look at this soon.

February 12th, 2009
