Tuesday, April 30, 2013

Scala workshop - day 0

I love Scala. It combines goodies from the JVM and Java infrastructure, Haskell and dynamic languages like Ruby - and it inherits almost none of the Java programming language's diseases.
Anyway, I always wanted to create some end-to-end project using Scala technologies - but never did, as I work with a completely different technology set. So I decided to come up with a fake business need and implement it. I'm going to use interesting libraries like Akka and Scalaz along the way, God help me! I'll describe things step by step, so people with no experience in Scala can pick it up and follow along.
Here are things I want to try in this project:

  • Scala parser combinators
  • Explicit error handling with Scalaz
  • MongoDB with Casbah
  • Functional dependency injection via implicit parameters
  • Properties checking with ScalaCheck
  • Concurrency with Akka actors
  • RIA with Play2 or Lift or JS+Scalatra - yet to decide
I'm also going to focus more on the functional side of Scala as I see it.

Goal

I've come up with the following fake 'business idea': we'll be building a service which provides workflow capabilities. You should be able to submit workflow definitions in a textual format and then create objects, assign them to workflows and move them from step to step. I foresee a lot of users working with the service, so I want to make it as scalable as I can. I will call this thing WAAS - Workflow-as-a-Service - just because the W-buzzword isn't taken yet.
You see what I'm doing here - I just came up with a task which requires Web, parsing, scalability and persistence. Hopefully it will help you and me put the different pieces of the Scala infrastructure together.

What you should know

You should be familiar with Scala syntax - or at least not scared of it. I'm not going to explain what the 'trait', 'val' or 'def' keywords mean. If you want to read some introductory stuff, I can recommend Scala for Java refugees or Another tour of Scala.

Starting

If you are familiar with Scala infrastructure, you may skip the rest of this post - I will talk a bit about SBT and Specs2 here. 
So, the de-facto standard for building Scala projects is the Simple Build Tool. Go ahead and install it following the instructions on its site. I also recommend installing JDK 7 - either OpenJDK or the Oracle one. Once you're done, you should be able to launch sbt:


$ sbt
Detected sbt version 0.12.2
Starting sbt: invoke with -help for other options
/home/ytaras/projects/scala/waas_blog doesn't appear to be an sbt project.
If you want to start sbt anyway, run:
  /home/ytaras/bin/sbt -sbt-create

Nice! It says we don't have a project definition here, so we should go ahead and create one:

$ mkdir project
$ cat > project.sbt
scalaVersion := "2.10.1"

resolvers ++= Seq("snapshots" at "http://oss.sonatype.org/content/repositories/snapshots",
                  "releases"  at "http://oss.sonatype.org/content/repositories/releases")
^C

Now you should be able to run `sbt console` and get a Scala console running. It can take some time, as it downloads the Scala 2.10.1 binaries.
So far we haven't done anything interesting - we just specified the Scala version to use and added a few OSS repositories so that we can access most Scala libs in the future. Now let's try a TDD cycle.
After trying the Scala test frameworks, my personal favorite is Specs2 mutable specifications. Even though it is said that the default BDD specs should be preferred over mutable ones, the mutable style is much closer to my JUnit and RSpec experience and also has nice integration with some other libs. If you think I'm wrong and should use something else, please use the GitHub link I will give at the end of this post and send me a pull request with specs written in your favorite framework - or at least leave your comments here.
So, let's add the Specs2 dependency to our project file:

libraryDependencies ++= Seq(
    "org.specs2" %% "specs2" % "1.14" % "test"
)
If you are familiar with Maven or Apache Ivy, you can tell what's going on here - you're specifying a group, an artifact and a version, and putting that into the 'test' configuration. Notice that the double % is not a typo - it's an SBT hack to overcome Scala's binary incompatibility between versions. Long story short, every Scala lib tends to ship compiled versions for every recent Scala version, and the %% operator picks one based on the Scala version you are running on.
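To make the %% trick concrete, here's what it expands to on our Scala 2.10 setup - a sketch, assuming the specs2 dependency above:

```scala
// With %%, SBT appends the Scala binary version suffix for you:
libraryDependencies += "org.specs2" %% "specs2" % "1.14" % "test"

// On Scala 2.10 that resolves to the same artifact as spelling
// the suffix out by hand with a plain %:
libraryDependencies += "org.specs2" % "specs2_2.10" % "1.14" % "test"
```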
If you have an SBT console open (the one opened by the 'sbt' command, not the Scala console issued by 'sbt console'), you can hit 'reload' there to pick up the new project definition. If not - just open a new SBT shell and it will be done for you. Now try typing 'test' into the SBT console - it will download binaries if needed, try to compile the non-existing product and test files, and gracefully shut down with a message that no tests were found. Nice! This is your first green build. It is said that the ideal code is no code, so you now have an ideal project done :)
Now let's start continuous testing - run '~ test' in the SBT shell. It will monitor your source files and relaunch the test suite whenever you save your code. Let's add the first test file, src/test/scala/CalculatorSpec.scala:


import org.specs2.mutable._

class CalculatorSpec extends Specification {
}
Nothing special is going on here - we add the necessary imports and extend a specific class. Let's add a first specification. Scala's features make fluent spec declarations, similar to RSpec, possible:

  "Native scala operations" should {
    "add" in { 2 + 2 must_== 4 }
    "subtract" in { 5 - 4 must_== 1 }
    "multiply" in { 3 * 4 must_== 12 }
    "divide" in { 10 / 2 must_== 5 }
  }
When you save the file you should notice pretty-printed specs with a success message. As a side note, what we've done here is a useful TDD technique which I call 'wrapping a 3rd-party object' (if you know a widely adopted name for it, please let me know). The idea is that we write unit tests for 3rd-party components, which we neither write nor control, to ensure our expectations match their real behavior. I don't say you should test the whole standard lib or its operators as I do here, but it may help if you're unsure what some specific method in a 3rd-party library does - write down a test, make it pass and submit it to your SCM. If a version upgrade breaks your expectations (and possibly your code), you will be notified immediately.
But enough theory. Let's create a calculator - it's a common showcase for TDD.

  "Calculator" should {
    "add" in { Calculator.add(3, 3) must_== 6 }
    "subtract" in { Calculator.subtract(10, 3) must_== 7 }
    "multiply" in { Calculator.multiply(5, 5) must_== 25 }
    "divide"   in { Calculator.divide(30, 3) must_== 10 }
  }
That ugly API will be our calculator. If we glance at our SBT shell, we'll notice compilation fails. So let's add the calculator definition in src/main/scala/Calculator.scala:

object Calculator {
  def add(x: Int, y: Int): Int = ???
  def subtract(x: Int, y: Int): Int = ???
  def multiply(x: Int, y: Int): Int = ???
  def divide(x: Int, y: Int): Int = ???
}
Yes, I really put all those ??? in my code - it's a genuine Scala construct. Long story short, it's a placeholder for your implementation which typechecks as whatever type you specify but throws an exception at runtime.
Scala has a very powerful type system - comparable to Haskell's - so usually you make use of it and let the compiler catch a lot of errors. Some techniques exist for that and I will mention them in future posts. Anyway, with TDD we want to take one step at a time, and the ??? placeholder allows breaking the Red-Green-Refactor cycle into Red-Compiles-Green-Refactor. Using it to add 2 integers doesn't make much sense, but when we're playing with more complex types (especially monadic ones) this technique becomes useful.
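Under the hood there's no compiler magic - here's a simplified sketch of how such a placeholder can be defined (the real ??? lives in scala.Predef; notYet is a made-up name for illustration):

```scala
// Nothing is a subtype of every type, so a method returning Nothing
// typechecks wherever any return type is expected - and simply
// throws at runtime. This hypothetical notYet mimics Predef's ???:
def notYet: Nothing = throw new NotImplementedError

// The signature compiles before any real body exists:
def divide(x: Int, y: Int): Int = notYet
```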
If you have a look at the SBT console, you'll notice we've moved to the Compiles stage - tests are executed, but exceptions are thrown. Nice. Let's use 'obvious implementation' and add, well, the obvious implementation for all 4 methods:

object Calculator {
  def add(x: Int, y: Int) = x + y
  def subtract(x: Int, y: Int) = x - y
  def multiply(x: Int, y: Int) = x * y
  def divide(x: Int, y: Int) = x / y
}
I no longer need explicit return types, as they are easily inferred by the compiler. And... Ta-dam! We're Green - and it looks like there's no need for refactoring.
Anyway, currently we are testing only against specific values. Is there a way to improve test coverage and add more sample values? That's what the ScalaCheck framework is for. Let's try to learn it by example.
First, add it to your dependencies:


libraryDependencies ++= Seq(
    "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
    "org.specs2" %% "specs2" % "1.14" % "test"
)
Specs2 provides nice integration with ScalaCheck - we just have to import it and mix it into our spec to be able to use its helpers:

import org.specs2.ScalaCheck

class CalculatorSpec extends Specification with ScalaCheck {
Good. Let's write our first property. What about the following - Calculator.add should return the sum of the two numbers it receives as arguments:

  "Calculator properties" should {
    "add" in prop { (x: Int, y: Int) =>
      Calculator.add(x, y) must_== x + y
    }
  }
If we save the file now, we'll notice that the resulting run has 108 expectations - this is because ScalaCheck generated 100 pairs of ints and verified that each satisfies the specified property. Thanks to implicits' black magic, we can use either boolean comparisons (x == y) or Specs2 matchers (x must_== y), whichever suits better. Notice - we still haven't discovered integer overflow here, just because we didn't write the right assertions, for example that the sum of two positives is positive; but the framework silently generated input values which could have helped us discover it. But as we're just learning, let's move on to something more obvious - for example, division:
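The overflow those generated values could expose is easy to show by hand - a quick sketch, no ScalaCheck needed:

```scala
// Int arithmetic silently wraps around on overflow, so the
// 'sum of two positives is positive' property does not hold:
val x = Int.MaxValue // 2147483647, positive
val y = 1            // positive
val sum = x + y      // wraps around to Int.MinValue, a negative number
println(sum)         // -2147483648
```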

    "divide" in prop { (x: Int, y: Int) =>
      Calculator.divide(x, y) must_== x / y
    }
If we try to run this, it will fail because of division by zero. There are different ways of handling that. Let's imagine we say 'we never pass 0 as the second argument to division, so we don't care what happens there'. That is easily expressed with ScalaCheck:

    "divide" in prop { (x: Int, y: Int) => (y != 0) ==> (
      Calculator.divide(x, y) must_== x / y
    )}
If you are scared by the pile of ASCII symbols and extra parentheses - I encourage you to read the ScalaCheck user guide, which is a really nice description of how to write properties. If not - just believe me that this property reads as 'for any pair of integers such that the second integer is not zero, the following property holds'.
That was a rather big post. Hopefully, you now have some insight into the Scala infrastructure and build tooling. You can try it on your own, or have a look at the GitHub repository. Next time let's build our WAAS!

Heart of Functional Programming

Intro

Every functional programmer should write an article about what FP actually is, and I'm not an exception. The next step should be another monad tutorial - and it's still on its way.

Introduction to functional programming

So, what is functional programming at all? Actually, there's a lot of mess around that - functional programming, functional languages and functional features are different things, even if these terms sometimes get confused with one another. For example, I've even heard that JavaScript is a functional language because it allows passing functions as parameters:

function ArrayForEach(array, func) {
  for (var i = 0; i < array.length; i++) {
    if (i in array) {
      func(array[i]);
    }
  }
}

function log(msg) {
  console.log(msg);
}

ArrayForEach([1,2,3,4,5], log);
Example from Wikipedia
So, let's put down a couple of definitions:
  • Functional language is a programming language which enables, encourages or even enforces functional programming.
  • Functional features are features of a programming language or a library which make functional programming easier
  • Functional programming is... well, let's talk about that below
A functional language, as a tool, uses functional features to make functional programming, as a paradigm, easier; and we can use these definitions to determine whether some programming language is a functional language or not. Whew, enough of the F-word :)
But we have one more definition to fill in - functional programming, or the functional paradigm (FP). This is usually done by contrast, and I will follow that path, but with one small deviation. Usually FP is compared to imperative programming, whatever that means. It is also said that FP and OOP are orthogonal and don't contradict each other. I'm going to argue with that. So, before going to FP land, let's see what we have in OOP.

OOP reprise

Here's a definition given by Alan Kay:
  1. Everything is an object.
  2. Objects communicate by sending and receiving messages (in terms of objects).
  3. Objects have their own memory (in terms of objects).
  4. Every object is an instance of a class (which must be an object).
  5. The class holds the shared behavior for its instances (in the form of objects in a program list)
What's interesting here are #2 and #3. Objects communicate by sending and receiving messages, for example:

  result = person.send :full_name
That's valid Ruby code, which sends the message 'full_name' to the object 'person' and writes the response to the 'result' variable. Of course, there's a shorter version, which is in fact syntactic sugar:

  result = person.full_name
It doesn't matter whether we're talking about static or dynamic languages, Java, C#, Ruby, Smalltalk or Python - every method call can be presented as sending a message. We can see that the simplest building blocks of OOP are objects and messages, aka methods.
Looks like it's easy with #2, but what about #3? It says every object has its own memory (which can be revealed only by sending and receiving messages, by the way). In the example above, the object's memory probably contains the first and last names of a person. But what's not said here - can an incoming message mutate the internal state of an object? Is the following code valid?

class User
  def initialize(first_name, last_name)
    @first_name = first_name
    @last_name = last_name
  end

  def update_last_name(name)
    @last_name = name # Updating internal state here
    # For Java programmers:
    # @ just means an instance variable - a field in Java terms
  end

  def full_name
    @first_name + " " + @last_name
  end
  # Other stuff...
end
u = User.new("John", "Smith")
puts u.send :full_name # John Smith
u.send :update_last_name, "Snow"
puts u.send :full_name # John Snow
Most of the time, programming language designers answer 'Yes' to that question. Python, JavaScript, C#, Java, Ruby, Smalltalk - all of them accept object mutability, despite the differences in their implementations. In fact, it follows implicitly from the definition - Alan Kay used the word 'memory', which implies that an object can 'remember' what happened to it before - and that is mutability.

Another look

FP doesn't give another answer to the 'internal state mutability' question, because FP doesn't speak in terms of objects and messages - but if it did, it would say 'no, objects are not allowed to change their state once they are created'. If you haven't heard this before and can take only one item away from this article - let it be this one:
Functional programming is a programming using immutable data structures and immutable references
This is not an accurate or complete definition of FP - but I consider it the most important part of the paradigm. FP is the art of restricting yourself to immutable things - and all the other stuff follows from that. Let's see if we can still write meaningful things without mutability.
The smallest building block in FP is a pure function. There are a lot of smart words around it - referential transparency or morphism - but in simple words, a pure function is a plain old programming function which has 2 properties:

  1. Its result depends only on its input parameters
  2. Its only job is to calculate its result; it does nothing else - in other words, it has no side effects.
I'm sure you already understand what a pure function is, but let's look at some examples:

Random random = new Random();
random.nextInt();
// not a pure function, as its result is determined not by its
// parameters but by the internal state of the random generator
int c = System.in.read();
// not a pure function either, as its result
// is determined by user input
System.out.println("Hello, World");
// a miss. It's not a pure function - it's not a function at all,
// as it doesn't return any result. And even if it did
// return something, it still has a side effect - output
// to the console - so it doesn't satisfy the pure function definition.
String.valueOf(1);
// finally, we have a pure function here - valueOf depends
// only on its input param and changes nothing on the outside.
int l = "Some string".length();
// this is a pure function as well, even though we're writing
// it using object notation.
Let's say a few extra words about the last case. As I said before, we can view it as sending the message 'length' to the object "Some string", but on the other hand, there's nothing wrong with looking at it as a function 'length' with one parameter, "Some string". The fact that strings are immutable in Java helps a lot with that paradigm shift.

int l = FunctionalStringUtils.length("Some string");
// There's always the same result for the same string!
// And it doesn't change anything in other parts of the program
Let's add this to our definitions:
Functional programming is a programming using only immutable references/data structures and pure functions

So what?

Ok, we've forbidden ourselves to use mutations, we've forgotten about objects and messages, and we're using functions instead - for what? Why on Earth would I use only a subset of the possible ways to express the logic of a program? Why should I restrict myself to writing only pure functions?
Think about the following:
The only thing that interests us about a pure function is its result. In fact, we can replace a call to a pure function with its result value - and it won't change what our program does. By the way, this property is called referential transparency.
And there are a lot of options for how to get that result value. You can calculate the result during compilation, if you know the function's arguments at compile time. Or in another thread. Or in another process. Or on another machine. Or you can avoid calculating the value until it's really needed. Or you can cache the value and return the result from the cache for the same function arguments.
Oh, by the way, did I mention multithreading? Usually it's almost physical pain to debug a multithreaded program built in the conventional way. FP changes that - it doesn't matter which thread executed a specific function, or whether it was executed at all. In fact, there are prototypes of automatic program parallelization. Think about that - a program which knows nothing about multithreading at all can be run in a multithreaded environment.
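The caching option is worth a tiny sketch - a hypothetical memoize helper (the name is made up) that is only safe because the wrapped function is pure:

```scala
// Wrap a pure function with a cache: since the result depends only
// on the argument, serving a cached value can never be observed
// as different behavior by the rest of the program.
def memoize[A, B](f: A => B): A => B = {
  val cache = scala.collection.mutable.Map.empty[A, B]
  a => cache.getOrElseUpdate(a, f(a))
}

val square: Int => Int = x => x * x
val memoSquare = memoize(square)
println(memoSquare(4)) // computed once...
println(memoSquare(4)) // ...then served from the cache
```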

Doing mutations

Ok, but what if we need mutations? What do we do to change the last name of an existing user?
Well, we don't. Instead of destructive updates, we copy to a new instance every time we need to change something:

u = {first_name: "John", last_name: "Smith"}
puts full_name(u) # John Smith
u1 = update_last_name(u, "Snow")
# Instead of changing user object, we return an updated copy of it
puts full_name(u1) # John Snow
puts full_name(u) # John Smith
That looks like a huge waste of resources - when we change things a lot, we're going to have a lot of copies in memory. However, the fact that all our data structures are immutable opens the door for optimizations by the compiler or runtime - for example, structures or their parts can be reused in different places. Because things are immutable, we need not fear one part of our program impacting another by implicitly changing the state of a shared structure.
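In Scala this non-destructive update style reads naturally with case classes - a sketch using a hypothetical User, mirroring the Ruby example above:

```scala
// Case class fields are immutable vals; copy builds a new instance
// and shares the unchanged fields with the original.
case class User(firstName: String, lastName: String) {
  def fullName: String = firstName + " " + lastName
}

val u  = User("John", "Smith")
val u1 = u.copy(lastName = "Snow") // u itself stays untouched
println(u1.fullName) // John Snow
println(u.fullName)  // John Smith
```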
With that, a whole class of concurrency problems just goes away, so using immutable structures is a good idea not only in FP but in OOP as well - Joshua Bloch recommends "Minimize mutability" in his book Effective Java.

FP distilled

So, we have immutable structures and pure functions - and those are the building blocks of FP. If you restrict yourself to these things, you're doing FP; if you relax the restriction and do other things, you're outside of FP land.
Also notice that we've lost the notion of objects and messages - we don't need them anymore. As I said before, it's just a matter of syntax - whether we put the first argument before or after the function name.
In FP we don't mix data and behavior inside objects; instead we keep them in different places - our data structures are skinny and contain only data, and our functions work with every argument they are given - if it conforms to some preconditions, of course. Note that we haven't talked about types yet - we'll do that in the next parts of this series.
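Here is what that split can look like in Scala - a sketch with a skinny data structure and the behavior living in plain functions (Point and Geometry are made-up names for illustration):

```scala
// The data structure carries only data...
case class Point(x: Double, y: Double)

// ...while the behavior lives apart, as plain functions.
object Geometry {
  def distance(a: Point, b: Point): Double =
    math.hypot(b.x - a.x, b.y - a.y)
}

println(Geometry.distance(Point(0, 0), Point(3, 4))) // 5.0
```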