Tuesday, July 19, 2016

Registering All Implementations of an Interface with Unity

Currently, the extensibility story for the TrafficCamBot requires anyone wanting to add a new set of cameras to implement a specific .NET interface (directly inside the bot assembly, although this may change in the future). All implementations of said interface need to be registered with the .NET Unity dependency injection container, for this is how the master "camera data manager" finds them.

Originally, I was using an overload of UnityContainer.RegisterType that took generic type parameters. This is all well and good, but because generic type arguments must be known at compile time, this approach would have required each new implementation to add its own line of code calling RegisterType:


            container.RegisterType<ICameraDataService, SeattleCameraDataService>(
                typeof(SeattleCameraDataService).Name, new ContainerControlledLifetimeManager());
            container.RegisterType<ICameraDataService, BayAreaCameraDataService>(
                typeof(BayAreaCameraDataService).Name, new ContainerControlledLifetimeManager());
            // etc.

Ideally, this would not be necessary, as it is an additional step for the camera data contributor and is prone to error. Fortunately, we can reflect over all of the types in the assembly looking for appropriate implementations and use a non-generic RegisterType overload as follows:


            var cameraDataServiceTypes = from t in Assembly.GetExecutingAssembly().GetTypes()
                                         where t.IsClass && !t.IsAbstract && t.GetInterface(typeof(ICameraDataService).Name) != null
                                         select t;
            foreach (var cameraDataServiceType in cameraDataServiceTypes)
            {
                container.RegisterType(typeof(ICameraDataService), cameraDataServiceType,
                    cameraDataServiceType.Name, new ContainerControlledLifetimeManager(),
                    new InjectionMember[] { });
            }
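
The scan itself doesn't depend on Unity, so it can be tried in isolation. Here is a minimal sketch, assuming hypothetical stand-in types (the interface members, the abstract base class, and the CameraServiceScanner helper are made up for illustration and are not the bot's real code). It checks the type relationship with Type.IsAssignableFrom rather than looking the interface up by its simple name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the bot's real types.
public interface ICameraDataService { string Name { get; } }

public abstract class CameraDataServiceBase : ICameraDataService
{
    public abstract string Name { get; }
}

public class SeattleCameraDataService : CameraDataServiceBase
{
    public override string Name { get { return "Seattle"; } }
}

public class BayAreaCameraDataService : ICameraDataService
{
    public string Name { get { return "Bay Area"; } }
}

public static class CameraServiceScanner
{
    // Returns every concrete class in the assembly that implements the interface.
    // Abstract classes and the interface itself are filtered out.
    public static IList<Type> FindImplementations(Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract
                        && typeof(ICameraDataService).IsAssignableFrom(t))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var types = CameraServiceScanner.FindImplementations(Assembly.GetExecutingAssembly());
        foreach (var type in types)
        {
            Console.WriteLine(type.Name);
        }
    }
}
```

Each type found this way could then be handed to the non-generic RegisterType call shown above.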

Later, since Unity is happy with multiple registered implementations per interface, these can be snarfed out of the container using UnityContainer.ResolveAll:


            // ResolveAll returns only named registrations, which is why each
            // implementation was registered under its type name.
            var managers = UnityConfig.GetConfiguredContainer().ResolveAll<ICameraDataService>();
            foreach (var service in managers)
            {
                services[service.Name] = service;
            }

Monday, July 18, 2016

Faking NLP With Lucene

Since I'm going back to work at Microsoft soon, I decided I ought to start up a C# / .NET project to stretch my skills a bit. I went looking for good ideas and came up with a Microsoft Bot Framework bot to answer queries for public traffic camera data. I started a GitHub project here, so check it out. I will blog a little about it as opportunities arise.

One thing I wanted to do was support as much of a natural conversational style as possible, as is the trend with bots. Here is an example dialog between me and my bot:

I spent a little bit of time investigating how I might do "true" natural language processing (NLP) to answer queries like "Show me traffic at Sunset", perhaps using something like Stanford CoreNLP. The question then arose as to how I would train an appropriate model. With traffic cameras, the camera names sometimes look sorta like addresses, which is clearly trainable. But many times, they don't, and in fact I finally decided they didn't really fall into any trainable pattern whatsoever.

Instead, I decided to apply search techniques. I set up a Lucene index. Each document in the index represents one traffic camera. I added text to it with different combinations of possible abbreviations. For example, a camera named "NE 85th St" might be added to the index with a document like:

ne 85th st
northeast 85th st
ne 85th street
northeast 85th street

        private Document CreateDocument(string title, IEnumerable<string> altNames)
        {
            var doc = new Document();
            doc.Add(new Field(CAMERA_NAME_FIELD, title, Field.Store.YES, Field.Index.NOT_ANALYZED));
            var sb = new StringBuilder();
            sb.Append(title);
            sb.Append('\n');
            foreach (var altName in altNames)
            {
                sb.Append(altName);
                sb.Append('\n');
            }
            var content = sb.ToString();
            doc.Add(new Field(CONTENT_FIELD, content, Field.Store.YES, Field.Index.ANALYZED));
            return doc;
        }
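
The altNames passed to CreateDocument have to come from somewhere. One way to produce them is to expand each known abbreviation and emit every combination. This is only a sketch: the AltNameGenerator class and its two-entry abbreviation table are assumptions made for illustration, not the bot's actual list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AltNameGenerator
{
    // Hypothetical abbreviation table; the bot's real list is not shown in this post.
    private static readonly Dictionary<string, string> Expansions =
        new Dictionary<string, string>
        {
            { "ne", "northeast" },
            { "st", "street" },
        };

    // Produces every combination of abbreviated and expanded words, lowercased.
    public static IList<string> Generate(string title)
    {
        var words = title.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Start with a single empty prefix and extend it one word at a time.
        var variants = new List<string> { "" };
        foreach (var word in words)
        {
            var options = new List<string> { word };
            string expanded;
            if (Expansions.TryGetValue(word, out expanded))
            {
                options.Add(expanded);
            }
            variants = variants
                .SelectMany(prefix => options.Select(
                    o => prefix.Length == 0 ? o : prefix + " " + o))
                .ToList();
        }
        return variants;
    }
}

public static class Program
{
    public static void Main()
    {
        // Prints the four variants of "NE 85th St", one per line:
        // ne 85th st, ne 85th street, northeast 85th st, northeast 85th street
        foreach (var name in AltNameGenerator.Generate("NE 85th St"))
        {
            Console.WriteLine(name);
        }
    }
}
```

The number of variants doubles with each abbreviated word, which is fine for short camera names but would need a cap for longer text.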

When it comes time to process a query, we first look for exact matches (all terms required) and fuzzy matches. These fail for natural-language-style text, so at that point (as a kind of last resort) the whole query string is passed directly to the index with no preprocessing other than lowercase normalization, and the results are scored. All documents whose score passes a certain threshold are returned.

        private IList<string> RunQuery(Query query)
        {
            var collector = TopScoreDocCollector.Create(MAX_SEARCH_RESULTS, false);
            searcher.Search(query, collector);
            var scoreDocs = collector.TopDocs().ScoreDocs;
            logger.Debug("Searching for " + query +
                ", top score is " + (scoreDocs.Any() ? scoreDocs[0].Score.ToString() : "non-existent"));
            var results = from hit in scoreDocs
                          where hit.Score >= HitScore
                          select searcher.Doc(hit.Doc).Get(CAMERA_NAME_FIELD);
            return results.ToList();
        }

If there is only one matching document, we have achieved "magic" and present the camera directly to the user. Otherwise (as in the example dialog above), we present a choice menu.
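
The overall flow can be sketched without Lucene at all. The version below assumes the stages are tried one after another and that the first stage to return any hits wins. The QueryCascade class, the stand-in stage delegates, and the "Sunset Blvd" camera name are all hypothetical; in the bot, each stage would build a Lucene query and call RunQuery.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryCascade
{
    // Runs each stage in order and returns the hits from the first stage that finds any.
    public static IList<string> FirstNonEmpty(
        string userText, params Func<string, IList<string>>[] stages)
    {
        // Lowercase normalization is the only preprocessing applied.
        var normalized = userText.Trim().ToLowerInvariant();
        foreach (var stage in stages)
        {
            var hits = stage(normalized);
            if (hits.Count > 0)
            {
                return hits;
            }
        }
        return new List<string>();
    }

    // One hit is shown directly; several become a choice menu.
    public static string Describe(IList<string> hits)
    {
        if (hits.Count == 0) return "No matching cameras";
        if (hits.Count == 1) return "Show camera: " + hits[0];
        return "Choose one: " + string.Join(", ", hits);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical stages standing in for the exact, fuzzy, and whole-string queries.
        Func<string, IList<string>> exact = q => new List<string>();
        Func<string, IList<string>> fuzzy = q => new List<string>();
        Func<string, IList<string>> wholeString = q =>
            q.Contains("sunset") ? new List<string> { "Sunset Blvd" } : new List<string>();

        var hits = QueryCascade.FirstNonEmpty(
            "Show me traffic at Sunset", exact, fuzzy, wholeString);
        Console.WriteLine(QueryCascade.Describe(hits));  // prints "Show camera: Sunset Blvd"
    }
}
```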

What I found in practice is that this works for a wide variety of queries and camera names. Typically, the desired camera document(s) will have a score around 0.3, and there is an order-of-magnitude drop-off in the scores of other "matching" documents (which perhaps just match a generic term like "avenue").

So at the end of the day, with no true NLP algorithms in play at all, it seems the bot can do a fairly decent job of handling natural-language style queries in this limited domain.