Search Meetup Cambridge – Challenges of Unstructured Data

This week saw another Cambridge Search Meetup, with two speakers on unstructured data plus the usual networking, beer and snacks. We started with Dean Yearsley of Pingar, who talked about their API and bravely attempted a live demo. Amongst other things, the API offers entity extraction in multiple languages including English, Chinese and Japanese. The Pingar system is written in .NET and thus, unsurprisingly, plays well with SharePoint: Dean demonstrated it automatically providing extra metadata for SharePoint items. This is especially useful when a new column has been added to a SharePoint store, as it would be tedious for operators to add data for this column to each item manually.

Jordan Hrycaj of 7Safe, recently acquired by PA Consulting, was up next to talk about what he described as ‘ad-hoc’ search, for use in digital forensics or digital discovery applications. The application he described can search the hard disks of suspect PCs or servers extremely quickly for information such as credit card numbers. It works at a low level to avoid leaving any impression on the data (for example, no file timestamps are altered) and usually runs on live systems. The system is command-line based, tiny and portable across operating systems, and is an impressive way to narrow down the likely candidates for a data security breach. It was fascinating to hear about a way to search that doesn’t depend on indexing, and about the compromises made for performance reasons (for example, regular expressions can be used, but without wildcards).
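To give a flavour of what index-free searching over raw bytes can look like, here is a minimal Python sketch. It is not 7Safe's tool or method: the fixed-length 16-digit pattern, the Luhn checksum filter and the names `CARD_PATTERN`, `luhn_valid` and `find_card_candidates` are all assumptions made for illustration, and a real forensic tool would stream from the raw device rather than scan an in-memory buffer.

```python
import re
from typing import Iterator, Tuple

# Assumed pattern: exactly 16 ASCII digits that are not part of a longer
# digit run. It is fixed-length, with no open-ended wildcards such as * or +.
CARD_PATTERN = re.compile(rb"(?<!\d)\d{16}(?!\d)")


def luhn_valid(digits: bytes) -> bool:
    """Return True if the ASCII digit string passes the Luhn checksum."""
    total = 0
    for position, byte in enumerate(reversed(digits)):
        value = byte - ord("0")
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_candidates(raw: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, number) for each Luhn-valid 16-digit run in raw."""
    for match in CARD_PATTERN.finditer(raw):
        if luhn_valid(match.group()):
            yield match.start(), match.group().decode("ascii")


# Hypothetical sample buffer: binary junk, a well-known test card number,
# a number that fails the checksum, and a digit run that is too long.
sample = b"\x00\xffjunk4111111111111111\x00 4111111111111112 12345678901234567890"
hits = list(find_card_candidates(sample))
print(hits)
```

The checksum step is there to cut down false positives, since a bare digit pattern would match any 16-digit run found on a disk.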

Thanks to both speakers and to all who came to hear them. We already have some more talks lined up so we expect the next Meetup to be sooner rather than later!
