One thing I wanted to do was support as much of a natural conversational style as possible, as is the trend with bots. Here is an example dialog between me and my bot:
I spent a little bit of time investigating how I might do "true" natural language processing (NLP) to answer queries like "Show me traffic at Sunset", perhaps using something like Stanford CoreNLP. The question then arose as to how I would train an appropriate model. With traffic cameras, the camera names sometimes look sorta like addresses, which is clearly trainable. But many times, they don't, and in fact I finally decided they didn't really fall into any trainable pattern whatsoever.
Instead, I decided to apply search techniques. I set up a Lucene index. Each document in the index represents one traffic camera. I added text to it with different combinations of possible abbreviations. For example, a camera named "NE 85th St" might be added to the index with a document like:
ne 85th st
northeast 85th st
ne 85th street
northeast 85th street
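As a rough sketch of how such variants could be generated (this is not the code from my bot, and the abbreviation table below is a made-up example), each token in the camera name can be replaced by every known spelling and the combinations joined back together:

```python
from itertools import product

# Hypothetical abbreviation table; the real list of abbreviations
# is not shown in this post.
ABBREVIATIONS = {
    "ne": ["ne", "northeast"],
    "nw": ["nw", "northwest"],
    "st": ["st", "street"],
    "ave": ["ave", "avenue"],
}

def expand_variants(camera_name):
    """Return every spelling combination for a camera name, lowercased."""
    tokens = camera_name.lower().split()
    # Each token contributes its known spellings, or just itself.
    options = [ABBREVIATIONS.get(token, [token]) for token in tokens]
    return [" ".join(combo) for combo in product(*options)]

variants = expand_variants("NE 85th St")
```

For "NE 85th St" this produces the same four lines as the document above, though not necessarily in the same order.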
When it comes time to process a query, we first look for exact matches (conjunctive, meaning every query term must be present) and fuzzy matches. This will fail for natural-language-style text. So at that point (and kind of as a last resort), the whole query string gets passed directly to the index with no preprocessing other than lowercase normalization, and the results are scored. All of the documents passing a certain threshold are returned.
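The cascade can be sketched in a few lines. This is a toy stand-in, not the Lucene code: the index is a plain dictionary, the camera names are invented, fuzzy matching uses Python's difflib, and the "score" is just the fraction of query terms found in a document, whereas Lucene computes a real relevance score. The threshold value is also an assumption.

```python
import difflib

# Toy stand-in for the index: camera name -> set of indexed terms.
# Camera names here are invented for illustration.
INDEX = {
    "NE 85th St": {"ne", "northeast", "85th", "st", "street"},
    "Sunset Blvd": {"sunset", "blvd", "boulevard"},
    "Sunset Hwy": {"sunset", "hwy", "highway"},
}

def exact_match(terms, index):
    """Conjunctive match: every query term must appear in the document."""
    return [cam for cam, doc in index.items() if all(t in doc for t in terms)]

def fuzzy_match(terms, index, cutoff=0.8):
    """Every query term must closely resemble some term in the document."""
    return [
        cam for cam, doc in index.items()
        if all(difflib.get_close_matches(t, sorted(doc), n=1, cutoff=cutoff)
               for t in terms)
    ]

def scored_match(terms, index, threshold):
    """Last resort: score each document and keep those above the threshold."""
    hits = []
    for cam, doc in index.items():
        score = sum(t in doc for t in terms) / len(terms)  # crude stand-in score
        if score >= threshold:
            hits.append(cam)
    return hits

def find_cameras(query, index=INDEX, threshold=0.15):
    # Lowercasing is the only preprocessing applied to the query.
    terms = query.lower().split()
    if not terms:
        return []
    # Each stage runs only if the previous one found nothing.
    return (exact_match(terms, index)
            or fuzzy_match(terms, index)
            or scored_match(terms, index, threshold))
```

With this toy data, "northeast 85th street" is caught by the exact stage, a typo like "sunst blvd" by the fuzzy stage, and "show me traffic at sunset" falls through to the scored stage, which returns both Sunset cameras.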
If there is only one matching document, we have achieved "magic" and present the camera directly to the user. Otherwise (as in the example dialog above), we present a choice menu.
What I found in practice is that this works for a wide variety of queries and camera names. Typically, the desired camera document(s) will have a score around 0.3, and there is an order-of-magnitude drop-off in the scores of other "matching" documents (which perhaps just match a generic term like "avenue").
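To make the thresholding concrete, here is a minimal sketch of how scored results might be turned into either a direct answer or a choice menu. The cutoff of 0.1 is my assumption for illustration (anything between the two score clusters would do), and the scores in the example are invented to mirror the pattern described above, not measurements.

```python
# Hypothetical cutoff sitting between the "around 0.3" scores and the
# order-of-magnitude-lower scores of generic matches.
SCORE_THRESHOLD = 0.1

def select_hits(scored_docs, threshold=SCORE_THRESHOLD):
    """Keep (camera, score) pairs at or above the threshold, best first."""
    hits = [(cam, score) for cam, score in scored_docs if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def respond(scored_docs):
    """Return ("camera", [name]) for a single hit, else a menu of names."""
    hits = select_hits(scored_docs)
    if not hits:
        return ("none", [])
    if len(hits) == 1:
        return ("camera", [hits[0][0]])
    return ("menu", [cam for cam, _ in hits])

# Invented scores: two strong matches, two that only share "avenue".
example = [
    ("Sunset Blvd", 0.31),
    ("5th Avenue", 0.03),
    ("Sunset Hwy", 0.29),
    ("Pine Avenue", 0.02),
]
reply = respond(example)
```

Here the two generic "avenue" matches are discarded and the remaining two cameras are offered as a menu; if only one document had cleared the cutoff, it would be shown directly.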
So at the end of the day, with no true NLP algorithms in play at all, it seems the bot can do a fairly decent job of handling natural-language style queries in this limited domain.