
NLP in Scala with Breeze and Epic




Presentation Transcript


  1. NLP in Scala with Breeze and Epic David Hall UC Berkeley

  2. ScalaNLP Ecosystem • Breeze ≈ NumPy/SciPy: linear algebra, scientific computing, optimization • Epic ≈ PyStruct/NLTK: natural language processing, structured prediction • Puck: super-fast GPU parser for English

  3. Natural Language Processing [figure: constituency parse tree] Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap. It certainly won’t get there on looks.

  4. Epic for NLP [figure: constituency parse tree] Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.

  5. Named Entity Recognition • Person • Organization • Location • Misc

  6. NER with Epic • import epic.models.NerSelector • val nerModel = NerSelector.loadNer("en").get • val tokens = epic.preprocess.tokenize("Almost 20 years ago, Bill Watterson walked away from \"Calvin & Hobbes.\"") • println(nerModel.bestSequence(tokens).render("O")) // Almost 20 years ago , [PER:Bill Watterson] walked away from `` [LOC:Calvin & Hobbes] . '' ← Not a location!

  7. Annotate a bunch of data?

  8. Building an NER system • val data: IndexedSeq[Segmentation[Label, String]] = ??? • val system = SemiCRF.buildSimple(data, startLabel, outsideLabel) • println(system.bestSequence(tokens).render("O")) // Almost 20 years ago , [PER:Bill Watterson] walked away from `` [MISC:Calvin & Hobbes] . ''

  9. Gazetteers http://en.wikipedia.org/wiki/List_of_newspaper_comic_strips_A%E2%80%93F

  10. Gazetteers

  11. Using your own gazetteer • val data: IndexedSeq[Segmentation[Label, String]] = ??? • val myGazetteer = ??? • val system = SemiCRF.buildSimple(data, startLabel, outsideLabel, gaz = myGazetteer)

  12. Gazetteer • Careful with gazetteers! • If one is built from the training data, the system will rely on it and only it to make predictions. • So only known forms will be detected. • Still, they can be very useful…

  13. Semi-CR-What? • Semi-Markov Conditional Random Field • Don’t worry about the name.

  14. Semi-CRFs

  15. Semi-CRFs score(Chez Panisse) + score(Berkeley, CA) + score(- A bowl of ) + score(Churchill-Brenneis Orchards) + score(Page mandarins and medjool dates)
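The additive structure above is the whole trick: a segmentation's score is just the sum of its segments' scores. That decomposition can be sketched in a few lines of plain Scala (no Breeze or Epic needed; `scoreSegment` here is a made-up toy scorer, not Epic's API):

```scala
// A semi-CRF scores a segmentation as the sum of independent segment scores.
object SemiCrfScore {
  // Stand-in for a learned per-segment scoring function.
  def scoreSegment(segment: Seq[String]): Double =
    segment.length.toDouble // toy score: each token contributes 1.0

  // The total score decomposes additively over segments.
  def scoreSegmentation(segments: Seq[Seq[String]]): Double =
    segments.map(scoreSegment).sum

  def main(args: Array[String]): Unit = {
    val segs = Seq(
      Seq("Chez", "Panisse"),
      Seq("Berkeley", ",", "CA"))
    println(scoreSegmentation(segs)) // 2.0 + 3.0 = 5.0
  }
}
```

Because the total is a plain sum over segments, changing one segment's boundaries only changes that segment's term, which is what makes the dynamic programs on the later slides possible.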

  16. Features score(Chez Panisse) = w(starts-with-Chez) + w(starts-with-C…) + w(ends-with-P…) + w(starts-sentence) + w(shape:Xxx Xxx) + w(two-words) + w(in-gazetteer)

  17. Building your own features val dsl = new WordFeaturizer.DSL[L](counts) with SurfaceFeaturizer.DSL import dsl._ word(begin) // word at the beginning of the span + word(end - 1) // end of the span + word(begin - 1) // before (gets things like Mr.) + word(end) // one past the end + prefixes(begin) // prefixes up to some length + suffixes(begin) + length(begin, end) // names tend to be 1-3 words + gazetteer(begin, end)

  18. Using your own featurizer • val data: IndexedSeq[Segmentation[Label, String]] = ??? • val myFeaturizer = ??? • val system = SemiCRF.buildSimple(data, startLabel, outsideLabel, featurizer = myFeaturizer)

  19. Features • So far, we’ve been able to do everything with (nearly) no math. • To understand more, need to do some math.

  20. Machine Learning Primer • Training example (x, y) • x: sentence of some sort • y: labeled version • Goal: want score(x, y) > score(x, y’), for all y’ ≠ y.

  21. Machine Learning Primer score(x, y) = wᵀ f(x, y)

  22. Machine Learning Primer score(x, y) = w.t * f(x, y)

  23. Machine Learning Primer score(x, y) = w dot f(x, y)

  24. Machine Learning Primer score(x, y) >= score(x, y’)

  25. Machine Learning Primer w dot f(x, y) >= w dot f(x, y’)

  26. Machine Learning Primer w dot f(x, y) >= w dot f(x, y’)

  27. Machine Learning Primer [figure: weight update in feature space, showing w, f(x,y), f(x,y’), and w + f(x,y)] (w + f(x,y)) dot f(x,y) >= w dot f(x,y)
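The inequality on this slide follows from plain arithmetic: (w + f) dot f = w dot f + f dot f, and f dot f = ||f||² is never negative, so adding f(x,y) to w can only raise the score of y. A small stand-alone Scala check with made-up numbers:

```scala
// Perceptron update intuition: adding the correct answer's feature vector
// to the weights raises the correct answer's score by ||f||^2 >= 0.
object PerceptronUpdate {
  def dot(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def main(args: Array[String]): Unit = {
    val w = Seq(0.5, -1.0, 2.0)  // current weights (made-up numbers)
    val f = Seq(1.0, 0.0, -1.0)  // features of the correct answer
    val updated = w.zip(f).map { case (x, y) => x + y } // w + f
    println(dot(w, f))        // -1.5
    println(dot(updated, f))  // 0.5, i.e. up by ||f||^2 = 2.0
  }
}
```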

  28. The Perceptron • val featureIndex: Index[Feature] = ??? • val labelIndex: Index[Label] = ??? • val weights = DenseVector.rand[Double](featureIndex.size) • for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val labelScores = DV.tabulate(labelIndex.size) { yy => val features = featuresFor(x, yy); weights.t * new FeatureVector(features) // or weights dot new FeatureVector(features) } … }

  29. The Perceptron (cont’d) • val featureIndex: Index[Feature] = ??? • val labelIndex: Index[Label] = ??? • val weights = DenseVector.rand[Double](featureIndex.size) • for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val labelScores = ...; val y_best = argmax(labelScores); if (y != y_best) { weights += new FeatureVector(featuresFor(x, y)); weights -= new FeatureVector(featuresFor(x, y_best)) } }

  30. Structured Perceptron • Can’t enumerate all segmentations! (L · 2ⁿ of them) • But dynamic programs exist to max or sum • … if the feature function has a nice form.
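The dynamic program the slide alludes to can be sketched in a few lines, under the semi-CRF assumption that the score decomposes over segments. This is a toy Viterbi-style max, not Epic's implementation; `segScore(j, i, label)` is a hypothetical function scoring the segment [j, i) with a label:

```scala
// Semi-Markov Viterbi sketch: best(i) = max over split point j < i and label l
// of best(j) + segScore(j, i, l). O(n^2 * L) work instead of enumerating
// all L * 2^n labeled segmentations.
object SemiMarkovViterbi {
  def bestScore(n: Int, labels: Seq[String],
                segScore: (Int, Int, String) => Double): Double = {
    val best = Array.fill(n + 1)(Double.NegativeInfinity)
    best(0) = 0.0 // empty prefix scores zero
    for (i <- 1 to n; j <- 0 until i; l <- labels)
      best(i) = math.max(best(i), best(j) + segScore(j, i, l))
    best(n)
  }

  def main(args: Array[String]): Unit = {
    // Toy scorer: reward length-2 segments labeled "X"
    val score = (j: Int, i: Int, l: String) =>
      if (i - j == 2 && l == "X") 1.0 else 0.0
    println(bestScore(4, Seq("X", "O"), score)) // 2.0: two length-2 "X" segments
  }
}
```

The "nice form" requirement is exactly that `segScore` depends only on one segment at a time, so each prefix's best score can be reused.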

  31. Structured Perceptron • val featureIndex: Index[Feature] = ??? • val labelIndex: Index[Label] = ??? • val weights = DenseVector.rand[Double](featureIndex.size) • for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val y_best = bestStructure(weights, x); if (y != y_best) { weights += featuresFor(x, y); weights -= featuresFor(x, y_best) } }

  32. Constituency Parsing [figure: constituency parse tree] Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.

  33. Multilingual Parser “Berkeley” parser [Petrov & Klein, 2007]; Epic [Hall, Durrett, and Klein, 2014]

  34. Epic Pre-built Models • Parsing: English, Basque, French, German, Swedish, Polish, Korean (working on Arabic, Chinese, Spanish) • Part-of-Speech Tagging: English, Basque, French, German, Swedish, Polish • Named Entity Recognition: English • Sentence segmentation: English (ok support for others) • Tokenization: all of the above languages

  35. What Is Breeze? Dense Vectors, Matrices, Sparse Vectors, Counters, Matrix Decompositions

  36. What Is Breeze? Nonlinear Optimization, Probability Distributions

  37. Getting started libraryDependencies ++= Seq( "org.scalanlp" %% "breeze" % "0.8.1", // optional: native linear algebra libraries // bulky, but faster "org.scalanlp" %% "breeze-natives" % "0.8.1" ) scalaVersion := "2.11.1"

  38. Linear Algebra • import breeze.linalg._ • val x = DenseVector.zeros[Int](5)// DenseVector(0, 0, 0, 0, 0) • val m = DenseMatrix.zeros[Int](5,5) • val r = DenseMatrix.rand(5,5) • m.t // transpose • x + x // addition • m * x // multiplication by vector • m * 3 // by scalar • m * m // by matrix • m :* m // element wise mult, Matlab .*

  39. Return Type Selection • import breeze.linalg.{DenseVector => DV} • val dv = DV(1.0, 2.0) • val sv = SparseVector.zeros[Double](2) • dv + sv // res: DenseVector[Double] = DenseVector(1.0, 2.0) • sv + sv // res: SparseVector[Double] = SparseVector()

  40. Return Type Selection • import breeze.linalg.{DenseVector => DV} • val dv = DV(1.0, 2.0) • val sv = SparseVector.zeros[Double](2) • (dv: Vector[Double]) + (sv: Vector[Double]) // static type: Vector, dynamic: Dense // res: Vector[Double] = DenseVector(1.0, 2.0) • (sv: Vector[Double]) + (sv: Vector[Double]) // static type: Vector, dynamic: Sparse // res: Vector[Double] = SparseVector()

  41. Linear Algebra: Slices • val m = DenseMatrix.zeros[Int](5, 5) • m(::, 1) // slice a column // DV(0, 0, 0, 0, 0) • m(4, ::) // slice a row • m(4, ::) := DV(1, 2, 3, 4, 5).t • m.toString // 0 0 0 0 0 / 0 0 0 0 0 / 0 0 0 0 0 / 0 0 0 0 0 / 1 2 3 4 5

  42. Linear Algebra: Slices • m(0 to 1, 3 to 4).toString // 0 0 / 2 3 • m(IndexedSeq(3, 1, 4, 2), IndexedSeq(4, 4, 3, 1)) // 0 0 0 0 / 0 0 0 0 / 5 5 4 2 / 0 0 0 0

  43. Universal Functions • import breeze.numerics._ • log(DenseVector(1.0, 2.0, 3.0, 4.0))// DenseVector(0.0, 0.6931471805599453, // 1.0986122886681098, 1.3862943611198906) • exp(DenseMatrix( (1.0, 2.0), (3.0, 4.0))) • sin(Array(2.0, 3.0, 4.0, 42.0)) • // also sin, cos, sqrt, asin, floor, round, digamma, trigamma, ...

  44. UFuncs: In-Place • import breeze.numerics._ • val v = DV(1.0, 2.0, 3.0, 4.0) • log.inPlace(v)// v == DenseVector(0.0, 0.6931471805599453, // 1.0986122886681098, 1.3862943611198906)

  45. UFuncs: Reductions • sum(DenseVector(1.0, 2.0, 3.0)) // 6.0 • sum(DenseVector(1, 2, 3)) // 6 • mean(DenseVector(1.0, 2.0, 3.0)) // 2.0 • mean(DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))) // 3.5

  46. UFuncs: Reductions • val m = DenseMatrix((1.0, 3.0), (4.0, 4.0)) • sum(m) == 12.0 • mean(m) == 3.0 • sum(m(*, ::)) == DenseVector(4.0, 8.0) • sum(m(::, *)) == DenseMatrix((5.0, 7.0)) • mean(m(*, ::)) == DenseVector(2.0, 4.0) • mean(m(::, *)) == DenseMatrix((2.5, 3.5))

  47. Broadcasting • val dm = DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)) • sum(dm(::, *)) == DenseMatrix((5.0, 7.0, 9.0)) • dm(::, *) + DenseVector(3.0, 4.0) // DenseMatrix((4.0, 5.0, 6.0), (8.0, 9.0, 10.0)) • dm(::, *) := DenseVector(3.0, 4.0) // DenseMatrix((3.0, 3.0, 3.0), (4.0, 4.0, 4.0))

  48. UFuncs: Implementation • object log extends UFunc • implicit object logDouble extends log.Impl[Double, Double] { def apply(x: Double) = scala.math.log(x) } • log(1.0) // == 0.0 • // elsewhere: magic to automatically make this work: • log(DenseVector(1.0)) // == DenseVector(0.0) • log(DenseMatrix(1.0)) // == DenseMatrix(0.0) • log(SparseVector(1.0)) // == SparseVector()
