From a search engine developer’s point of view there are several things to consider: how quickly new content must become searchable, how to balance that freshness against performance demands, and how to rank the results.
Many search engine architectures are built on the assumption that indexes won’t be updated very often, sacrificing index freshness for search speed, so constantly adding new content is expensive in terms of performance. One approach is to maintain several indexes: a small, fresh one and a set of older, static ones, with the fresh index periodically merged into the static set. Searches must of course span all these indexes, with care taken to keep term statistics – and thus relevancy ranking – accurate.
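The fresh/static split above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class and method names (`Index`, `TieredIndex`, `merge`, `search`) are assumptions for the sake of the example, and the ‘index’ here is just an in-memory inverted index with no ranking.

```python
from collections import defaultdict


class Index:
    """A minimal inverted index mapping terms to sets of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)


class TieredIndex:
    """Sketch of a fresh/static split: new documents land in a small fresh
    index (cheap to update); merge() periodically folds it into the larger
    static index."""

    def __init__(self):
        self.static = Index()
        self.fresh = Index()

    def add(self, doc_id, text):
        # Only the small fresh index changes, keeping updates cheap.
        self.fresh.add(doc_id, text)

    def merge(self):
        # Fold the fresh postings into the static index, then start a
        # new, empty fresh index.
        for doc_id, text in self.fresh.docs.items():
            self.static.add(doc_id, text)
        self.fresh = Index()

    def search(self, term):
        # Queries must consult BOTH tiers, or newly added content
        # would be invisible until the next merge.
        term = term.lower()
        return self.static.postings[term] | self.fresh.postings[term]
```

A search hits the union of both tiers, so a document is findable immediately after `add`, and `merge` can run in the background on whatever schedule the performance budget allows.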
The question of ranking is also an interesting one: in a ‘real-time’ situation, how should we present the results – does ‘more recent’ always trump ‘more relevant’? As always, a combination of both is probably the best default approach, with an option available to the user to choose one or the other.
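One common way to realise that default is to blend a relevance score with an exponential recency decay; exposing the blend weight then gives the user the choice between the two extremes. The function below is a sketch of that idea – the name `blended_score` and the half-life and weight values are assumptions, not a standard formula.

```python
def blended_score(relevance, age_seconds, half_life=3600.0, weight=0.5):
    """Blend a relevance score (assumed normalised to 0..1) with an
    exponential recency decay.

    half_life -- seconds for the recency component to halve (assumed value)
    weight    -- 1.0 = pure relevance, 0.0 = pure recency
    """
    # Recency starts at 1.0 for brand-new content and decays towards 0.
    recency = 0.5 ** (max(0.0, age_seconds) / half_life)
    return weight * relevance + (1.0 - weight) * recency
```

Setting `weight=1.0` ranks purely by relevance and `weight=0.0` purely by recency, so the ‘choose one or the other’ option is just a parameter.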
In any case there will always be some delay between content being published and being searchable – the trick is to keep this to the minimum, so it appears as ‘real-time’ as possible.