Looking for Square Pegs: Finding a Solution and Turning it Into Code

Print Friendly, PDF & Email

(Previously published on AndrewComeau.com.)

Introduction

Programming and software development is a type of engineering. More than just sitting down and typing code, it’s about designing solutions to everyday problems using technical knowledge gained over years of training and experience. Larger projects can include a series of challenges to solve, some of which can’t be foreseen until they’re staring you right in the face and laughing at your project deadlines.

Being a software developer is more about being able to solve problems than it is about what languages or tools you know just like being a carpenter is about more than knowing how to use a hammer or a saw.  Programming technologies come and go and a skilled programmer can pick up a new one in a short time.  Being able to evaluate the requirements of a project, break them down into a series of tasks and come up with a robust solution is a much more valuable skill.  Without that, mastery of a dozen languages would be next to useless.

For coding challenges, this series of tasks is often called an algorithm.  Algorithms are logical methods used to accomplish a specific objective.  They can be written out in plain English (or any other human language) as a list of steps and decisions that must be followed in order to achieve the desired result.

A recent experience with this was a small problem that I had to solve as part of a project I was working on and the problem was this:

How do you program a comparison of three or more text values and make sure they’re all the same? 

For two values, it’s easy; you use an IF … THEN statement and an equality operator (= or ==, depending on the language) or, in the .NET languages, you could use the String.Compare function which compares one string to another.  Three or more values starts getting more complex, especially if the number of items to be compared changes to four or more. It’s also best if the code returns more than a TRUE / FALSE value.  If the values don’t match, there needs to be some indicator of what went wrong.

I could just throw a bunch of code on the page showing how I solved this problem and the average programmer could analyze the algorithm from it pretty quickly and then even more quickly tell me how I could have done it better. (Programmers like to do that to each other; it’s almost a compulsion.) As this article is meant for people who are relatively new to programming, I’d rather take you through the process and see how I came to the solution.

How Big of a Hammer Do We Need?

One of the first things to determine is the scope of the problem and the size of the needed solution. Often this means asking questions like:

  • How many users will be working with the program or how many machines it will be running on.
  • How many different systems will be affected?
  • How much data will be moved and stored?
  • Is this a repetitive process that needs to be especially efficient because of how often it will run?

This can make the difference between a small program that can be worked up in a day or two and an enterprise solution designed by a team of programmers that takes a couple of months to plan, build and test.

On a lower level like this one, it’s just a question of how much code is needed and how it should be organized within the program.  In this case, a single function should do it. That function will, of course, be part of a larger program in the end but for now we’re focusing on this one task.

Since it is a self-contained function, my next thought is about code re-use.  If I’m going to take the time to write a specialized function, I’m going to design it so that it can be re-used or adapted elsewhere when needed, if possible by just copying it from one program into another and calling it.  In a sense, functions should behave as much as possible like interchangeable parts in any system. If I was talking about the parts of a stereo system, the first thing I’d consider after the function of the component itself is it’s connections. What kind of inputs does it have for external devices and how will it output its signal to the speakers or other systems?  Again, the same questions are asked when writing code.

With functions, the inputs take the form of the values that can be passed to it as arguments or parameters.  The function will use these values during its calculations. In this case, the function will need to accept a list of text values to compare to each other.  That list could be passed in a number of ways; as an array of strings or a collection object specific to the .NET framework.  Sometimes, there could be reasons to use a type of input that’s specific to a given programming technology but right now, I want the code to be adaptable. I’m going to use the simple choice of a delimited string; text that includes more than one value separated by a specific character.  Comma Separated Values (CSV) in which each value is separated by a comma is the most common. Spaces and semi-colons could also be used. The input might look something like this:

1234,1234,1234,1235,1234

A one-dimensional string array would be my next choice. With an array, the values are pre-formatted so that the function can examine all the values in sequence.  Since I’m designing with C#, however, there’s a simple function that I can use to create such an array from the CSV shown above so I’ll let my function do that step as well and make the input a little simpler.

The output of the function is a bigger question. The requirement states that PASS / FAIL is not enough; if not all the values match, there has to be some indication of why. I also like outputs that can be used in more ways than one so I decided to return a list of the distinct values within the delimited string.  The code that calls this function can then use that list in whatever way it wants or simply count the number of list items. If there’s more than one, it means there was no match.

The next question is how will the function format that list? The function’s output is more easily changed when and if it’s re-used so I’m going to choose an output based on what works for the program that the function is part of right now.  At first, I was going to return it as a string array that could be easily enumerated in code.  The problem is that string arrays need to be declared with a specific number of items in C# and are not easily resized which is a problem since I don’t know how many values I’ll find. After searching some more, I settled on the Generic List class where the individual values are easily referenced and counted by the calling code.

Devising the Plan

Once we know what the function is accepting and returning, we need to figure out how it will arrive at the result. This is the step that gets easier with experience but there can still be some trial and error involved.

One of the often-repeated truths of programming is that there is more than one way to accomplish most tasks, often several ways.  Programming languages are getting more powerful every year and often reduce the amount of custom code that it takes to accomplish any given task.  When it comes down to it, though, a programmer is going to use the methods that are the easiest to remember and understand given the time available to think through a problem and maybe his or her caffeination level.  There will always be room for improvements but the first priority is often getting the job done.

With this function, I know that I have a delimited list and I know that I can split it into an array to make it easier to work with the values so that’s my first step in the algorithm.  I need to go from “1234,1234,1234,1235,1234″ to:

[0] = 1234
[1] = 1234
[2] = 1234
[3] = 1235
[4] = 1234

Remember that arrays in C# are zero-based so the numbering of the elements starts with 0.

1. Split the input string into an array.

I also know that I need to define the Generic.List that the function will build and return.

2. Declare a return list.

So now I have an array of values to check and an empty return list. Now the question is how I’m going to check the array for duplicates. Sometimes the best way is to start with how a human would do it and be glad a computer can use the same method hundreds of times faster.

If you were manually checking a list to verify everything matched, your method would depend on the length of a list and what you were checking. A list of five items or less could be verified at one sweeping glance. If you had more than five items, you might look for one item that you knew was correct and compare everything to that.  This function checks a list of indefinite length and doesn’t have a reference value to work with so the next method is to compile the list of distinct values … one at a time.

3. Add the first value in the input array to the return list.

After all, the first value in the input array is unique so far so that makes a good start.  Now I need to start moving through the input array and looking for discrepancies.  To do that, I use the return list as the reference value(s) to check against.

4.  Check the next item in the input array. Does it match a value in the return list?

a. Yes?  Do nothing.

b. No?  We have a new distinct value.  Add it to the return list.

5.    Repeat #4 until we reach the end of the input array.

After we’ve checked the entire input array, there’s nothing left to do but …

6.    Return the List.

Writing the Code

Method 1: The Scenic Route

So, after all that consideration, we can start writing the function:

private List<String> DistinctList(string ValueList)
{

}

The C# function declaration above defines both the inputs and outputs of the fuction. The List<String> return type means that the function will return a Generic List object that can hold only text strings.  The ValueList argument at the end is the delimited string that will be input to the function.  This first line is also referred to as the function’s signature. A call to this function would look like this:

List<String> resultList = DistinctList("1234,1234,1234,1235,1234")

Now we need to take care of the first three steps in the algorithm.

string[] inputList = ValueList.Split(new char[] {','});

List<String> returnList = new List<String>();

returnList.Add(inputList[0]);

I won’t go into detail about every keyword used in this code – you can find more information in the help files and elsewhere online – but the first line splits the ValueList argument that was passed into the code into a string array called ‘inputList’ using the commas as a delimiter.  The second line creates an empty List object called ‘returnList’, setting the required type as a
String.  The third line copies the first value from inputList to returnList.

arrayIndex = 0

foreach (string inputValue in inputList)
{
  match = false;
  if (arrayIndex > 0)
  {
    foreach (String compareValue in returnList)
    {
      if (!match && (compareValue == inputValue))
        {
          match = true;
        }
    }
    if (!match)
    {
      returnList.Add(inputValue);
    }
 }
  arrayIndex++
}

The above code is the main part of the function.  It uses two nested foreach loops, the first iterating through the input array and matching each value against each of the return list values with the second ForEach loop. (The IF statement on the third line skips over the first item in the input array since it’s already been handled.)  A boolean variable called ‘match’ is used to record a match between the input array and return list. If there is no match, the value is added to the end of the return list.

Despite all the comparisons, this code works extremely fast, instantaneously even when I used it with a space instead of a comma as a delimiter and tested it with a few long paragraphs of text.

Once the code is finished,  all that’s left is to return the returnList object as the output of the function.

Method 2: Getting on the Freeway

Now, the code above works very well and it’s perfectly acceptable for the job.  It’s reliable, it’s fast and it’s easily understood by any programmer with even a basic level of experience.  That last point is important because as a professional programmer, you’ll spend a lot of time analyzing other people’s code whether to support it or to learn from it. There’s also a good chance that someone else will one day be reading your code.  The clearer it is, the better for everyone involved.

Still, as I was writing this article, I kept thinking that with all the power behind C#, there had to be a more advanced way to do this than having the program mimic a human method of manual comparison.  Also, as I said earlier, I knew that if I didn’t find that better method, it was only a matter of time before someone else wrote to me telling me about it and I’d be kicking myself for not finding it.  So I started looking through the documentation again.

The second version of the method starts out the same; a delimited string is passed to the function which will be returning a List<String> object. The Split function is once again used to split the string into an array.

string[] inputList = ValueList.Split(new char[] {','});

The next lines change everthing …

List<string> returnList = inputList.Distinct().ToList();
return returnList;

Done.

That’s right.  With some extra research, we’ve gone from over 15 lines of code (including the variable declarations I didn’t show) down to three, including the return line.  The Distinct() function performs a query on the array elements much like the following SQL query:

SELECT DISTINCT ItemValue
 FROM Collection

The ToList() function then converts the resulting collection back to a List object that can be returned from the function.

Conclusion

So which method is better?  Well, that depends …

As I said early on, being a programmer goes beyond knowing a specific language. If you’re working in a .NET language like C# or VB.NET and have access to functions like these, that’s great.  On the other hand, what if you’re working in one of the earlier flavors of Visual Basic like VBA or VB 6.0?  (Yes, there are companies still using both.)  There are also other languages that are just as current but are specialized and have more limited vocabularies. You might also be on a deadline without the time to find that one function that will do everything for you.  In those cases, it’s necessary to know how to think through an algorithm that will get the job done with the basic functions available from a foundation-level knowledge of whatever language you’re working with.  The fancy stuff can come later. Even when you do have access to powerful functions like those above, you will eventually find that your programs take on greater levels of complexity and those functions become mere pieces in larger puzzles that you have to solve.

Besides, I had more fun taking the scenic route.

Subscribe to the newsletter to be notified of new posts.

You can stay up-to-date on new articles and resources from Comeau Software Solutions by subscribing to our newsletter. You can opt-out at any time.