My Scala Utilities Library

  1. Overview
  2. Collection extension methods
  3. File utilities
  4. Graphing

Overview

I have created a library for all of the things that I find missing or broken in Scala.

To include this library, add the following to your build.sbt:

resolvers ++= Seq(
  "dhg releases repo" at "http://www.cs.utexas.edu/~dhg/maven-repository/releases",
  "dhg snapshot repo" at "http://www.cs.utexas.edu/~dhg/maven-repository/snapshots"
)

libraryDependencies += "dhg" % "scala-util_2.10" % "1.0.0-SNAPSHOT" changing()

NOTE: This is already included in the nlpclass-fall2013 dependency, so you do not need to include it again.

The API is available online.

Some highlights of the library are below. There is much more beyond what is described here.

Collection extension methods

To use:

import dhg.util.CollectionUtil._

These methods are defined on as wide a type as makes sense. Most work on Iterator as well.

Basic math stuff

Vector(1,2,3,4).avg      // 2.5

Vector(1,2,1).normalize  // Vector(0.25, 0.5, 0.25)

Vector(('a,1), ('b,2), ('c,1)).normalizeValues
// Vector(('a,0.25), ('b,0.5), ('c,0.25))
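
For readers without the library on hand, the normalization above can be approximated with the standard library alone. This is a minimal sketch, not the library's implementation: the object name NormalizeSketch and the plain-function signatures are assumptions made for illustration.

```scala
object NormalizeSketch {
  // Divide each element by the total so that the results sum to 1.0.
  def normalize(xs: Seq[Double]): Seq[Double] = {
    val total = xs.sum
    xs.map(_ / total)
  }

  // Same idea for key-value pairs: only the values are rescaled.
  def normalizeValues[K](pairs: Seq[(K, Double)]): Seq[(K, Double)] = {
    val total = pairs.map(_._2).sum
    pairs.map { case (k, v) => (k, v / total) }
  }
}
```

For example, NormalizeSketch.normalize(Vector(1.0, 2.0, 1.0)) yields Vector(0.25, 0.5, 0.25). An all-zero input would divide by zero, which this sketch does not guard against.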

Counting

Vector('a, 'b, 'a, 'c, 'a, 'b).counts 
// Map('b -> 2, 'a -> 3, 'c -> 1)
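
A rough standard-library equivalent of counting is groupBy followed by taking the size of each group. The sketch below is an assumption about the behavior shown above, not the library's code, and CountsSketch is a hypothetical name.

```scala
object CountsSketch {
  // Group equal elements together, then record how many fell into each group.
  def counts[A](xs: Seq[A]): Map[A, Int] =
    xs.groupBy(identity).map { case (elem, group) => (elem, group.size) }
}
```

This builds an intermediate collection per distinct element, so it is fine for small inputs but does more allocation than a single counting pass would.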

Grouping / Ungrouping

Vector(('a, 1), ('b, 2), ('b, 3), ('a, 4)).groupByKey
// Map('b -> Vector(2, 3), 'a -> Vector(1, 4))

val a = Vector(('a, Vector(1,2)), ('b, Vector(2,3))).ungroup  
// Iterator[(Symbol, Int)]
a.toVector  // Vector(('a,1), ('a,2), ('b,2), ('b,3))
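
To make the relationship between the two operations concrete, here is a minimal sketch of both using only the standard library. The object name GroupingSketch and the signatures are assumptions for illustration; the library's own methods are extension methods and may differ in their exact types.

```scala
object GroupingSketch {
  // Collect the values of all pairs that share a key, keeping their original order.
  def groupByKey[K, V](pairs: Seq[(K, V)]): Map[K, Seq[V]] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // The inverse: emit one (key, value) pair per value, lazily, as an Iterator.
  def ungroup[K, V](grouped: Seq[(K, Seq[V])]): Iterator[(K, V)] =
    for ((k, vs) <- grouped.iterator; v <- vs.iterator) yield (k, v)
}
```

Because groupByKey returns a Map, the order of the keys is not preserved, so ungrouping a grouped collection gives back the same pairs but not necessarily in the same order.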

mapVals

This corrects the surprising behavior of mapValues, which creates a view of a Map, so values are recomputed on each access. My method mapVals always creates a new collection, not a view, so values are computed only once.

Vector(('a, 1), ('b, 2), ('b, 3)).mapVals(_ + 1)
// Vector(('a,2), ('b,3), ('b,4))
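
The recomputation is easy to observe with a counter. The sketch below uses only the standard library: it compares the view returned by mapValues with a strict map built eagerly, which is the behavior mapVals is described as providing. MapValuesDemo is a hypothetical name, and on Scala 2.13 and later the plain mapValues call compiles with a deprecation warning (the replacement there is .view.mapValues).

```scala
object MapValuesDemo {
  // Count how often the mapping function runs across three lookups on a view.
  def viewCalls(): Int = {
    var calls = 0
    val view = Map("a" -> 1).mapValues { v =>
      calls += 1
      v + 1
    }
    view("a"); view("a"); view("a")
    calls // 3: the function runs again on every lookup
  }

  // The same three lookups against a strictly built map.
  def strictCalls(): Int = {
    var calls = 0
    val strict = Map("a" -> 1).map { case (k, v) =>
      calls += 1
      (k, v + 1)
    }
    strict("a"); strict("a"); strict("a")
    calls // 1: the function ran once, when the map was built
  }
}
```

With an expensive function or one that has side effects, the difference between the two shows up as repeated work or repeated effects rather than just a larger counter.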

maxByN / minByN

Get the N max or min results, sorted. Prevents you from having to sort the whole collection or traverse it more than once.

Vector("be", "what", "a", "the").maxByN(_.size, 2) // Vector(what, the)
Vector("be", "what", "a", "the").minByN(_.size, 2) // Vector(a, be)
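
One way to get the top N in a single pass without sorting everything is to carry a small, ordered buffer of at most N elements through a fold. The following is a sketch of that idea only; it is not taken from the library, and TopNSketch and its curried signature are assumptions.

```scala
object TopNSketch {
  // Single pass over xs, keeping at most n elements ordered from largest key to smallest.
  def maxByN[A, B](xs: Iterator[A], n: Int)(f: A => B)(implicit ord: Ordering[B]): Vector[A] =
    xs.foldLeft(Vector.empty[A]) { (best, x) =>
      // Place x after every kept element whose key is at least as large, then trim to n.
      val (higher, lower) = best.span(b => ord.gteq(f(b), f(x)))
      ((higher :+ x) ++ lower).take(n)
    }
}
```

Usage: TopNSketch.maxByN(Iterator("be", "what", "a", "the"), 2)(_.size) gives Vector(what, the), and passing Ordering[Int].reverse as the last argument gives the two smallest instead. Each step costs time proportional to N, so this pays off when N is much smaller than the collection.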

split / splitWhere

Works on all Seq. Allows you to keep the delimiter at the front or back. Produces an Iterator.

import KeepDelimiter._
val a = Vector("A", "B", ".", "C", ".", "D").split(".")
// Iterator[Vector[String]]
a.toVector  // Vector(Vector(A, B), Vector(C), Vector(D))

val b = Iterator("A", "B", ".", "C", ".", "D")
          .splitWhere((_: String) == ".", KeepDelimiterAsFirst)
b.toVector  // Vector(Vector(A, B), Vector(., C), Vector(., D))
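
The delimiter-dropping case can be sketched with a fold that closes the current chunk whenever it meets the delimiter. This is a simplified, strict version written for illustration: it returns a Vector rather than an Iterator, does not support keeping the delimiter, and SplitSketch is a hypothetical name.

```scala
object SplitSketch {
  // Split xs on every occurrence of delim, dropping the delimiter itself.
  def split[A](xs: Seq[A], delim: A): Vector[Vector[A]] = {
    val (done, last) = xs.foldLeft((Vector.empty[Vector[A]], Vector.empty[A])) {
      case ((chunks, current), x) =>
        if (x == delim) (chunks :+ current, Vector.empty[A]) // close the current chunk
        else (chunks, current :+ x)                           // extend the current chunk
    }
    done :+ last // the final chunk has no trailing delimiter to close it
  }
}
```

Note that in this sketch two adjacent delimiters, or a delimiter at either end, produce an empty chunk; whether the library does the same is not stated here.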

splitAt on Iterator

Splits an Iterator in two. The front part must be traversed before the back part.

val (a,b) = Iterator(1,2,3,4,5).splitAt(3)
a.toVector   // Vector(1, 2, 3)
b.toVector   // Vector(4, 5)

zipSafe

Throws an exception if the two collections are different sizes.

Vector(1,2,3) zipSafe Vector(1,2)   // ERROR!
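
For contrast, the standard zip silently drops the extra elements of the longer collection. A size-checked zip can be sketched as below; the name ZipSafeSketch and the use of require (which throws IllegalArgumentException) are assumptions, and the exception type the library actually throws is not stated here.

```scala
object ZipSafeSketch {
  // Zip two sequences, refusing to proceed if their sizes differ.
  def zipSafe[A, B](as: Seq[A], bs: Seq[B]): Seq[(A, B)] = {
    require(as.size == bs.size, s"cannot zip: sizes differ (${as.size} vs ${bs.size})")
    as.zip(bs)
  }
}
```

Checking size up front only works for strict collections; an Iterator-based version would have to detect the mismatch when one side runs out first.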

File utilities

To use:

import dhg.util.FileUtil._

Self-closing file reader

File(filename).readLines.foreach(println)
// file now closed

Self-closing file writing (and a writeLine method)

writeUsing(File(filename)) { 
  f => f.writeLine("something")
}
// file now closed
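
The self-closing behavior is an instance of the loan pattern: open the resource, lend it to the caller's function, and close it in a finally block. The sketch below shows that pattern with java.io and scala.io.Source. It is not the library's code: FileSketch is a hypothetical name, the writer type is an assumption, and this readLines loads the whole file eagerly rather than streaming it.

```scala
import java.io.{BufferedWriter, File, FileWriter}
import scala.io.Source

object FileSketch {
  // Lend a writer to body, and close it even if body throws.
  def writeUsing(file: File)(body: BufferedWriter => Unit): Unit = {
    val writer = new BufferedWriter(new FileWriter(file))
    try body(writer) finally writer.close()
  }

  // Read every line into memory, then close the file before returning.
  def readLines(file: File): Vector[String] = {
    val source = Source.fromFile(file)
    try source.getLines().toVector finally source.close()
  }
}
```

Usage mirrors the snippets above: FileSketch.writeUsing(new File(filename)) { w => w.write("something"); w.newLine() } followed by FileSketch.readLines(new File(filename)).foreach(println).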

Graphing

To use:

import dhg.util.viz._

Create a Chart and draw() it.

import java.awt.Color.{red, lightGray}
Chart(
    Histogram(Vector(0, 4, 5, 1, 1.5, 3), 3, lightGray), 
    ScatterGraph(Vector((0.5, 0.6), (1,2), (4,3), (2.5,2.7))), 
    LineGraph(Vector((1,1), (3,3), (5,4)), red))
  .draw(exitOnClose = false)