Sunday, October 16, 2016

C# List-Sampling Extension Method

While messing around with some for-fun code in C# today, I found myself looking for something along the lines of Ruby's Array#sample method, but for .NET's IList. Strangely (or perhaps not-so-strangely, because when you search for something like "C# IList sample extension", you get tons of noise), I couldn't find anything. So I wrote a simple one to share here:


using System;
using System.Collections.Generic;
using System.Linq;

public static class ListExtensions
{
 readonly static Random myRand = new Random();
 readonly static object lockObj = new object();

 public static T Sample<T>(this IList<T> list)
 {
  lock (lockObj)
  {
   return Sample(list, myRand);
  }
 }

 public static T Sample<T>(this IList<T> list, Random rand)
 {
  if (list == null || list.Count == 0)
  {
   return default(T);
  }
  return list[rand.Next(list.Count)];
 }

 public static IList<T> Sample<T>(this IList<T> list, int size)
 {
  lock (lockObj)
  {
   return Sample(list, size, myRand);
  }
 }

 public static IList<T> Sample<T>(this IList<T> list, int size, Random rand)
 {
  if (list == null)
  {
   return new List<T>();
  }
  return list.Select(x => new { Entropy = rand.Next(), Item = x })
   .OrderBy(x => x.Entropy)
   .Select(x => x.Item)
   .Take(size)
   .ToList();

 }
}

Notes:

  • I put this on the IList interface instead of, say, IEnumerable, because the random access helps greatly with the efficiency of the size-1 overloads. (And that just happens to be the specific case I needed today.)
  • You can use the overloads that require you to pass your own Random instance, or not, but if you choose not to, note that you'll take a locking penalty since System.Random isn't thread-safe.
  • Samples of size > 1 are implemented via a LINQ-style shuffle.

Tuesday, July 19, 2016

Registering All Implementations of an Interface with Unity

Currently, the extensibility story for the TrafficCamBot requires anyone wanting to add a new set of cameras to implement a specific .NET interface (directly inside the bot assembly, although this may change in the future). All implementations of said interface need to be registered with the .NET Unity dependency injection container, for this is how the master "camera data manager" finds them.

Originally, I was using an overload of UnityContainer.RegisterType that took generic type parameters. This is all well and good, but because generic type parameters are really only helpful at compile time, this approach would have required each new implementation to add an individual line of additional code to call RegisterType.


            container.RegisterType<ICameraDataService, SeattleCameraDataService>(
                typeof(SeattleCameraDataService).Name, new ContainerControlledLifetimeManager());
            container.RegisterType<ICameraDataService, BayAreaCameraDataService>(
                typeof(BayAreaCameraDataService).Name, new ContainerControlledLifetimeManager());
            // etc.

Ideally, this would not be necessary, as it is an additional step for the camera data contributor and is prone to error. Fortunately, we can reflect over all of the types in the assembly looking for appropriate implementations and use a non-generic RegisterType overload as follows:


            var cameraDataServiceTypes = from t in Assembly.GetExecutingAssembly().GetTypes()
                                         where t.IsClass && !t.IsAbstract && t.GetInterface(typeof(ICameraDataService).Name) != null
                                         select t;
            foreach (var cameraDataServiceType in cameraDataServiceTypes)
            {
                container.RegisterType(typeof(ICameraDataService), cameraDataServiceType,
                    cameraDataServiceType.Name, new ContainerControlledLifetimeManager(),
                    new InjectionMember[] { });
            }

Later, since Unity is happy with multiple registered implementations per interface, these can be snarfed out of the container using UnityContainer.ResolveAll:


            var managers = UnityConfig.GetConfiguredContainer().ResolveAll(typeof(ICameraDataService));
            foreach (ICameraDataService service in managers)
            {
                services[service.Name] = service;
            }

Monday, July 18, 2016

Faking NLP With Lucene

Since I'm going back to work at Microsoft soon, I decided I ought to start up a C# / .NET project to stretch my skills a bit. I went looking for good ideas and came up with a Microsoft Bot Framework bot to answer queries for public traffic camera data. I started a Github project here, so check it out. I will blog a little about it as opportunities arise.

One thing I wanted to do was support as much of a natural conversational style as possible, as is the trend with bots. Here is an example dialog between me and my bot:

I spent a little bit of time investigating how I might do "true" natural language processing (NLP) to answer queries like "Show me traffic at Sunset", perhaps using something like Stanford CoreNLP. The question then arose as to how I would train an appropriate model. With traffic cameras, the camera names sometimes look sorta like addresses, which is clearly trainable. But many times, they don't, and in fact I finally decided they didn't really fall into any trainable pattern whatsoever.

Instead, I decided to apply search techniques. I set up a Lucene index. Each document in the index represents one traffic camera. I added text to it with different combinations of possible abbreviations. For example, a camera named "NE 85th St" might be added to the index with a document like:

ne 85th st
northeast 85th st
ne 85th street
northeast 85th street

        private Document CreateDocument(string title, IEnumerable<string> altNames)
        {
            var doc = new Document();
            doc.Add(new Field(CAMERA_NAME_FIELD, title, Field.Store.YES, Field.Index.NOT_ANALYZED));
            var sb = new StringBuilder();
            sb.Append(title);
            sb.Append('\n');
            foreach (var altName in altNames)
            {
                sb.Append(altName);
                sb.Append('\n');
            }
            var content = sb.ToString();
            doc.Add(new Field(CONTENT_FIELD, content, Field.Store.YES, Field.Index.ANALYZED));
            return doc;
        }

When it comes time to process a query, we first look for exact (conjunctive terms) and fuzzy matches. This will fail for the natural-language style text. So at that point (and kind of as a last resort), the whole query string gets passed directly to the index with no preprocessing other than lowercase normalization, and the results scored. All of the documents passing a certain threshold are returned.

        private IList<string> RunQuery(Query query)
        {
            var collector = TopScoreDocCollector.Create(MAX_SEARCH_RESULTS, false);
            searcher.Search(query, collector);
            var scoreDocs = collector.TopDocs().ScoreDocs;
            logger.Debug("Searching for " + query +
                ", top score is " + (scoreDocs.Any() ? scoreDocs[0].Score.ToString() : "non-existent"));
            var results = from hit in scoreDocs
                          where hit.Score >= HitScore
                          select searcher.Doc(hit.Doc).Get(CAMERA_NAME_FIELD);
            return results.ToList();
        }

If there is only one matching document, we have achieved "magic" and present the camera directly to the user. Otherwise (as in the example dialog above), we present a choice menu.

What I found in practice is that this works for a wide variety of query and camera names. Typically, the desired camera document(s) will have a score around 0.3, and there is an order-of-magnitude drop-off in the scores of other "matching" documents (which perhaps just match a generic term like "avenue").

So at the end of the day, with no true NLP algorithms in play at all, it seems the bot can do a fairly decent job of handling natural-language style queries in this limited domain.

Saturday, May 28, 2016

Constructor Injection >> Field Injection

The Guice documentation for injection starts with "Constructor Injection" above the fold. When I was working at Google, I went for a really long time before I even realized that field injection was supported in Guice, as it was so rare to see in actual code.

But when I started doing Spring again (having previously used it in the pre-annotation-based days), I was somewhat horrified to see that field injection via @Autowired seems to be the normal practice in modern Spring code. (For the record, the Spring documentation for annotation-based injection actually kicks off with setter injection, but most code you'll see around the web uses fields.)

In general, I think field injection is poor practice. Let me list the reasons why.

  1. Injected fields cannot be made final. Well, that's not 100% true as Guice will try a setAccessible(true) trick, but in general you don't want to go there. What you do want to do is have immutable classes where possible. (Basic Effective Java stuff.) And the things you @Inject (services and the like), you almost certainly want to be final. Field injection confounds this.
  2. Injected fields are harder to debug. With constructor injection, you can set a breakpoint in the constructor and see precisely what got bound and what the DI framework is handing your object, at the precise time it's happening. Field injection is less predictable and harder to observe in this regard.
  3. Injected fields make unit test setup much more complicated. For me, this is the big one. Let's say you want to unit test with mocks/fakes/stubs. (Certainly you do, as one of the main reasons you are using DI in the first place is to unlock better testability.) In that case, if you are using injected fields, you have two choices for unit test setup, and they are both bad:
    1. Make your fields settable after construction -- either by expanding their scope to package or public, or providing package/public setters into which your tests can push test instances. (Read: break encapsulation.) But at least the fields aren't final thanks to #1, so you've got that going for you, which is nice.
    2. Fire up the DI container in your unit test and have the container wire up the mocks/fakes/stubs for you. This is problematic for two reasons. First, DI containers do a ton of reflection, and this can slow your unit test start by a couple of orders of magnitude in a non-trivial application. (30-second unit-test startup, anyone?) Second, the container will probably find and instantiate not just your desired class-under-test and its dependencies, but a bunch of other things that you probably don't want to initialize, or alternatively have to worry about re-binding to fast, harmless or no-op implementations. From here, there is a slippery slope down to an insanely-complicated parallel DI configuration just for the unit tests, and you don't want to maintain that.
The main argument I've heard in favor of field injection is that it reduces boilerplate code by a couple of lines. Doesn't seem worth it to me. Use constructor injection instead.
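To make the contrast concrete, here is a minimal constructor-injected class. (GreetingSource and Greeter are invented names for illustration, not from any real codebase.) Note the final field, and that a unit test needs no container at all:

```java
import java.util.Objects;

// A hypothetical service with its dependency supplied through the constructor.
interface GreetingSource {
  String greeting();
}

class Greeter {
  private final GreetingSource source; // final: bound once, never reassigned

  // In Spring this constructor could carry @Autowired; in Guice, @Inject.
  Greeter(GreetingSource source) {
    this.source = Objects.requireNonNull(source);
  }

  String greet(String name) {
    return source.greeting() + ", " + name + "!";
  }
}
```

A test just passes a stub -- new Greeter(() -> "Hi") -- with no container, no reflection, and no setters breaking encapsulation.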

Saturday, March 12, 2016

Hello Arduino

Last week I started playing with a new toy -- the Arduino. I have a couple of IoT-style projects kicking around the back of my head, but at this point, I'm just learning.

Here is a photo of my first "original" project -- a binary counter built with 8 LEDs:



And the corresponding source code:

int leds[] = {2,3,4,5,6,7,8,9};

void setup() {
  int index;
  for (index = 0; index <= 7; index++) {
    pinMode(leds[index], OUTPUT);
  }
}

void loop() {
  int num;
  for (num = 0; num < 256; num++) {
    int bitNum = 0;
    for (bitNum = 0; bitNum < 8; bitNum++) {
      if (num & (1 << bitNum)) {
        digitalWrite(leds[bitNum], HIGH);
      } else {
        digitalWrite(leds[bitNum], LOW);
      }
    }
    delay(1000);
  }
}

Building things in the Real World is lots of fun for software people like me who spend so much time in the land of abstractions. Assuming I do anything interesting, I'll try to post the details here.

Sunday, March 6, 2016

Telescoping and Microscoping Constructor Patterns in Java

Java does not allow default values on constructor parameters, which greatly limits your options for defining constructor overloads. This is undoubtedly not news to anybody who managed to find their way here. But I wanted to take a post and dissect what you do instead.

The best way to organize multiple constructors is by applying what is sometimes called the "telescoping constructor" pattern. (See Effective Java, Item #2.) In this pattern, the constructors with the fewest parameters progressively call the constructors with more parameters, until finally the constructor that takes the most parameters performs the initialization of member variables. (In Scala, we would call this the primary constructor.) It might look like this:

 // Fields (final) go up here...
 
 public Contact(String firstName, String lastName, String phoneNumber, String description,
      String contactType) {
    this.firstName = firstName;
    this.lastName = lastName;
    this.phoneNumber = phoneNumber;
    this.description = description;
    this.contactType = contactType;
  }

  public Contact(String firstName, String lastName, String phoneNumber, String description) {
    this(firstName, lastName, phoneNumber, description, "uncategorized");
  }
  
  public Contact(String firstName, String lastName, String phoneNumber) {
    this(firstName, lastName, phoneNumber, "(no description)");
  }
  
  public Contact(String firstName, String lastName) {
    this(firstName, lastName, "");
  }

Despite what some have argued, I would not classify this as an anti-pattern or something you should never, ever use. A Factory or Builder pattern implementation adds a lot of boilerplate code (which modern Java has enough of already), and not every class is worthy of such a thing. (Although I should point out that my contrived example here probably does warrant a Builder, but I'll save that for later.)

What I really want to rail on here is what I call the "microscoping constructor" pattern, where the "wider" constructors progressively call the narrow ones, culminating with a call to "this()" where all of the fields get set to their default values:

 // Fields (can't be final) go up here...
 
  public Contact() {
    this.firstName = null;
    this.lastName = null;
    this.phoneNumber = null;
    this.description = "(no description)";
    this.contactType = "uncategorized";
  }

  public Contact(String firstName, String lastName, String phoneNumber, String description,
      String contactType) {
    this(firstName, lastName, phoneNumber, description);
    this.contactType = contactType;
  }

  public Contact(String firstName, String lastName, String phoneNumber, String description) {
    this(firstName, lastName, phoneNumber);
    this.description = description;
  }
  
  public Contact(String firstName, String lastName, String phoneNumber) {
    this(firstName, lastName);
    this.phoneNumber = phoneNumber;
  }
  
  public Contact(String firstName, String lastName) {
    this();
    this.firstName = firstName;
    this.lastName = lastName;
  }  

(Please don't ever call "this()" in Java. A kitten dies every time you do that.)

Why are microscoping constructors bad? For one, the code is repetitive inasmuch as the same fields tend to get initialized, then reinitialized at least once, and more than once if the code is particularly sloppy. Second, it's not possible to make a class that's doing this into an immutable class. (Hopefully we're all on board with immutable data classes now, right?) If you don't believe me, try combining final fields and microscoping constructors, and the compiler will explain the details.

To sum up:
1. Go ahead and use telescoping constructors.
2. Never use microscoping constructors, and replace them wherever you're unfortunate enough to find them.
3. Consider the Builder pattern as the class becomes complex and/or the constructor parameters' ordering can be ambiguous in terms of their data types.
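As a quick taste of #3, a minimal Builder sketch for the Contact example might look like this (trimmed to three fields for brevity; defaults match the telescoping version). The fields stay final, and the defaults live in exactly one place:

```java
// A minimal Builder sketch for the Contact example above.
class Contact {
  private final String firstName;
  private final String lastName;
  private final String description;

  private Contact(Builder builder) {
    this.firstName = builder.firstName;
    this.lastName = builder.lastName;
    this.description = builder.description;
  }

  String description() { return description; }

  static class Builder {
    private String firstName;
    private String lastName;
    private String description = "(no description)"; // default lives here only

    Builder firstName(String firstName) { this.firstName = firstName; return this; }
    Builder lastName(String lastName) { this.lastName = lastName; return this; }
    Builder description(String description) { this.description = description; return this; }

    Contact build() { return new Contact(this); }
  }
}
```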

I'll talk more about #3 in a future installment.

Monday, February 22, 2016

Self-Testing Java Classes

Way back in the COM days when everyone was talking about components and component marketplaces, I first heard of the idea of building tests directly into the production code. That potentially brings together a bunch of different software engineering disciplines that are near and dear to my heart -- testing, reusability, and design.

Last weekend, I finally got time to start playing with an idea I've been kicking around for a long time -- self-testing Java classes. My approach -- still in what I consider the "prototype" stage -- is based on a series of annotations on methods. With Java 8, we can finally stick multiple annotations of the same type on annotation targets. This is fortunate, because you'll see that this is kind of a prerequisite of my implementation. To explain, it's probably best to start from the outside in.

Here are a few simple tests for an absolute-value function:

  @Comparison(inputs = {"50"}, output="50")
  @Comparison(inputs = {"-40"}, output="40", description="Abs of a negative")
  @Comparison(inputs = {"-40"}, output="0", type=Type.GREATER_THAN_OR_EQUALS)
  public int abs(int input) {
    return input >= 0 ? input : -input;
  }

Each @Comparison annotation translates to a unit test case where the given inputs are passed to the annotated method and the specified output is expected. There is an optional comparison type, which defaults to EQUALS.
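For reference, here is one way the annotation itself might be declared. The @Repeatable meta-annotation (new in Java 8) is the piece that allows several @Comparison annotations on a single method; the @Comparisons container name and the Type enum shape are my guesses, not necessarily the actual code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

enum Type { EQUALS, GREATER_THAN_OR_EQUALS }

// Container annotation required by @Repeatable (name is an assumption).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Comparisons {
  Comparison[] value();
}

@Repeatable(Comparisons.class)
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Comparison {
  String[] inputs();
  String output();
  String description() default "";
  Type type() default Type.EQUALS;
}

// The abs example from above, annotated as before.
class MathExample {
  @Comparison(inputs = {"50"}, output = "50")
  @Comparison(inputs = {"-40"}, output = "40", description = "Abs of a negative")
  @Comparison(inputs = {"-40"}, output = "0", type = Type.GREATER_THAN_OR_EQUALS)
  public int abs(int input) {
    return input >= 0 ? input : -input;
  }
}
```

At runtime, the repeated annotations come back together via AnnotatedElement.getAnnotationsByType(Comparison.class).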

The next question deals with how these tests are executed. I wrote a custom JUnit Runner that:
  1. Looks for methods in the test class marked with @SelfTestMethod.
  2. Invokes self-testing on the object(s) those methods return.
  3. Reports the results in typical JUnit style.
Thus, an entire JUnit test suite can look like this:

@RunWith(SelfTestRunner.class)
public class MathOperationsTest {
  @SelfTestMethod(name = "Default math operations")
  public MathOperations selfTestMathOperations() {
    return new MathOperations();
  }
}

After a run in Eclipse:

 

Let's talk about where this potentially works well, and where it doesn't. If you have side-effect-free methods (in the style advocated by functional-programming gurus) that don't require a lot of test setup, you might be able to effectively test them this way.

On the other hand, if you need to set up a lot of state before executing the test, or your class requires one or more collaborators that need to be mocked/stubbed/faked, this probably won't be the way to go. And currently there is no support for method inputs that are not Java Strings or primitives (although this is high on my list to address).

That said, I plan to continue extending this work and seeing how many different scenarios I can push it into. Naturally, ideas are welcome, so feel free to reach out here or on Github.

Sunday, February 7, 2016

Implicit Class for List Element Extraction in Scala

Last weekend at the library, I stumbled upon the book "Realm of Racket", and it occurred to me that I should see if I can get my daughter to learn a bit about Lisp-like languages before she dives into Java in high school next year. Of course, I also had to start working through the Racket book myself. Unsurprisingly, it's fun.

One little throwaway thing I learned was that Racket has named list extract methods for first, second, third, up to tenth. Perhaps that's not super-useful, but I wanted it in my Scala toolbox nonetheless.

So I implemented them in a Scala implicit class and put them on Github as well as here for your enjoyment:

/**
 * Implicit methods that add "first" through "tenth"
 * extraction methods to the Scala List.
 */
object ListMethods {
  implicit class Nth[T](lst: List[T]) {
    def first = lst(0)
    def second = lst(1)
    def third = lst(2)
    def fourth = lst(3)
    def fifth = lst(4)
    def sixth = lst(5)
    def seventh = lst(6)
    def eighth = lst(7)
    def ninth = lst(8)
    def tenth = lst(9)
  }
}

It's just about the simplest possible implicit class implementation I can think of -- ten repetitive one-liners. Nonetheless, if you're having trouble structuring an implicit class, perhaps this helps you.

Also, I took the opportunity to try something new -- FunSuite from ScalaTest -- for unit testing. (Keep in mind that ScalaTest actually supports a few different unit testing styles.) Here is a snippet of the test code:


class ListMethodsTest extends FunSuite {
  val lst = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

  test("first method works") {
    assert(1 == lst.first)
  }

  // etc.
}
Saturday, January 30, 2016

Optional get Method on a Java Map

Lately I've been doing both Java 8 (for work) and Scala (for fun and personal development). And although I like Java 8's functional features, they're naturally lacking compared to Scala.

One thing specifically that I wanted this week was a java.util.Optional-returning get() method on a Map. (I was dealing with nested maps at the time.) This method is something that clearly ought to exist but doesn't, probably because it would be too hard to engineer into the Map interface in any kind of backward-compatible way.

So I started a playground in my Github blog code repository for Functional Java and took a crack at building something to address this. What I came up with was a Map wrapper class that I called "FuncMap". Its banner (and currently, only) feature is a new method called "getOption" that takes a key and returns an Optional-wrapped value:

  public Optional<V> getOption(K key) {
    return Optional.ofNullable(get(key));
  }
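For context, the wrapper around that method might be sketched like so -- just enough to support getOption. (The actual FuncMap in the repo delegates the full Map interface; this trimmed version is mine.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A minimal sketch of a Map wrapper exposing an Optional-returning get.
class FuncMap<K, V> {
  private final Map<K, V> delegate;

  FuncMap(Map<K, V> delegate) {
    this.delegate = delegate;
  }

  V get(Object key) {
    return delegate.get(key);
  }

  V put(K key, V value) {
    return delegate.put(key, value);
  }

  // The banner feature: wrap a possibly-null lookup in an Optional.
  public Optional<V> getOption(K key) {
    return Optional.ofNullable(get(key));
  }
}
```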

That's all it takes to enable a null-free programming style on the Map interface. The following three test cases demonstrate how this works with flatMap.

  @Test
  public void testNestedMapsFirstGetIsNull() {
    FuncMap<String, FuncMap<String, String>> nestedMap =
        new FuncMap<>(new HashMap<String, FuncMap<String, String>>());
    
    String result = nestedMap.getOption("user")
        .flatMap(x -> x.getOption("scott"))
        .orElse("UNKNOWN");
    assertEquals("UNKNOWN", result);
  }

  @Test
  public void testNestedMapsSecondGetIsNull() {
    FuncMap<String, FuncMap<String, String>> nestedMap =
        new FuncMap<>(new HashMap<String, FuncMap<String, String>>());
    nestedMap.put("user", new FuncMap<>(new HashMap<>()));
    
    String result = nestedMap.getOption("user")
        .flatMap(x -> x.getOption("scott"))
        .orElse("UNKNOWN");
    assertEquals("UNKNOWN", result);
  }

  @Test
  public void testNestedMapsWholeExpressionNotNull() {
    FuncMap<String, FuncMap<String, String>> nestedMap =
        new FuncMap<>(new HashMap<String, FuncMap<String, String>>());
    nestedMap.put("user", new FuncMap<>(new HashMap<>()));
    nestedMap.get("user").put("scott", "admin");
    
    String result = nestedMap.getOption("user")
        .flatMap(x -> x.getOption("scott"))
        .orElse("UNKNOWN");
    assertEquals("admin", result);
  }

Yeah, still a little clunky compared to Scala. But it's progress, right?

Tuesday, January 19, 2016

Simulating the "Can't Stop" Dice Game Using Scala

The board game "Can't Stop" has captivated my family since I was a kid. Over the past year, we've been playing a lot. But I often find myself on the losing end to my fiancee, who is a very tough board gamer.

At its core, "Can't Stop" is a probability game. As such, I'm sure a precise probabilistic analysis of the best playing approach is possible. But coming up with a basic strategy is a problem best solved with simulation.

So I wrote some Scala code to do some Monte-Carlo-style simulations to answer the following question: For each triple of numbers on the "Can't Stop" board, what is the average number of times I can roll the dice without losing my turn?

The answer for the ever-popular 6-7-8 combination over 10,000 trials comes out to around 11.5 rolls. On the other hand, the most difficult possible combinations of 2-3-12 and 2-11-12 only allow you to roll about 0.75 times. (In other words, if you find yourself on those numbers, you should most likely quit rolling immediately.)

Although I think this could be a step in the direction of writing a decent AI for the game, as it stands, it's not very usable. When I tried playing some real human games rolling the "recommended" number of times, I lost consistently and badly. That's because in a real game, going out on half your rolls is far too frequent. Perhaps if I re-ran the simulation looking for, say, an 80% success rate instead of 50% (at least for the "easier" combinations), I could come up with a playable table of roll counts.

The implementation is quite straightforward, largely thanks to Scala and List's combinations() method. Iterating over all of the triples from 2 to 12 can be set up in a for loop like this:

val numbers = for (i <- 2 to 12) yield i
for (markers <- numbers.combinations(3)) {
  // ...
}

Clean and simple.
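The bust check at the heart of each trial is worth sketching too. Here is an illustrative version (written in Java rather than Scala, with invented names): a roll of four dice is safe if any pair of them sums to one of your three active numbers, since you can always choose the split containing that pair.

```java
import java.util.Random;
import java.util.Set;

// A sketch of the Monte Carlo experiment described above: estimate the
// average number of safe rolls before a turn-ending bust for a given
// triple of active numbers.
class CantStopSim {
  static double averageRolls(Set<Integer> markers, int trials, Random rand) {
    long totalSafeRolls = 0;
    for (int t = 0; t < trials; t++) {
      while (true) {
        int[] dice = new int[4];
        for (int i = 0; i < 4; i++) {
          dice[i] = rand.nextInt(6) + 1;
        }
        // Safe if any pair of the four dice sums to an active number.
        boolean safe = false;
        for (int i = 0; i < 4 && !safe; i++) {
          for (int j = i + 1; j < 4 && !safe; j++) {
            safe = markers.contains(dice[i] + dice[j]);
          }
        }
        if (!safe) {
          break; // busted -- the turn is over
        }
        totalSafeRolls++;
      }
    }
    return (double) totalSafeRolls / trials;
  }
}
```

With 10,000 trials this lands near the numbers reported above -- around 11.5 rolls for {6, 7, 8}, and under one roll for {2, 3, 12}.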

Sunday, January 10, 2016

Processing CSV Files in Java -- A Java 8 Perspective

Well, after a couple of years' hiatus, I'm going to try and start blogging here again. I'm going to do some things differently going forward. First, I'm going to moderate comments, since Blogger allows too much spam to get through. Second, instead of jumping through hoops trying to squeeze a bunch of properly formatted code into Blogger, I'm going to keep the bulk of the code in a Github repository I set up expressly for this purpose. You can find it here -- https://github.com/scottmcmaster/blogcode.

For today's post, I'm going to tackle the common task of processing CSV files. My fiancee recently spent a lot of time writing one-off CSV parsers for her work, where she had to read in a file, make a few conditional changes to some values, and write the modifications back out. The canonical way of processing a CSV file without any higher-level assistance goes something like this:
  1. Read a line of the input file.
  2. Split it.
  3. Change one value in the line.
  4. Change another value in the line, then another, until complete.
  5. Write the line back out.
  6. Repeat.
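Those steps can be sketched in plain Java (with a naive comma split and no quoting support -- which is precisely the robustness a CSV library adds):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// A sketch of the canonical row-at-a-time loop described above.
class RowProcessor {
  static void process(Reader in, Writer out) {
    BufferedReader reader = new BufferedReader(in);
    try {
      String line;
      while ((line = reader.readLine()) != null) { // 1. read a line
        String[] fields = line.split(",");         // 2. split it
        fields[0] = fields[0].trim();              // 3-4. change some values
        out.write(String.join(",", fields));       // 5. write it back out
        out.write('\n');
      }                                            // 6. repeat
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```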
With tools like Apache Commons CSV and openCSV, one can easily do a bit better in terms of conciseness, readability, and robustness.

But after watching my fiancee's specific issues, what I really wanted to be able to do was process columns of values instead of rows. Something like this:
  1. Read the entire input file to an in-memory data structure.
  2. Change all of the values in one column.
  3. Change the values in another column, etc. until complete.
  4. Write the file back out.
(Granted, this requires reading the entire file into memory, but for many applications, this is not an issue.)

My intuition was that column-oriented processing would make the business logic clearer and less error-prone in the code. To further this goal, I wanted to be able to do this with Java 8 syntax, closures and the like.

My proof-of-concept code is available here. It uses Apache Commons CSV to read input into a data structure, the "CSVMaster". It contains a list of rows whose values are mapped by the column headers for easy access. Many CSV frameworks support that. What's slightly new is how the row list's iterator and stream are exposed. So you can write code like this:

// Change each zip value to a default zip+4.
master.forEach(row -> row.set("zip", row.get("zip") + "-0000"));

// Get all of the rows from a specific zip code.
List specialRows = master.stream().filter(row -> row.get("zip").startsWith("95610"))
    .collect(Collectors.toList());

Provided this turns out to be useful and usable, I will continue to enhance and extend it. Additional thoughts welcome.