Recent Posts

: Programming : Life : Economics :

How to get punched in the face by Go’s Standard Library (and arguably my own stupidity)

Posted on June 23, 2014

Ever have one of those days? A “I forgot to add a semi colon to terminate a line” type of day? Yeah, me too. And today was one of them.

Many developers, when allowing user uploaded data, tend to trust the web browser’s headers a little too much. I prefer to use MIME Type sniffing on the actual binary stream. This gives me a greater level of assurance that I’m not allowing a user to upload a different type of file than the one I desire. Say an executable binary with which to do my application damage.

I’ve been using using the net/http/sniff/DetectContentType function in the Go standard library to do this type of detection on images.

func ValidateImageType(b []byte) (string) {
  m := http.DetectContentType(b)
  switch m { 
  case "image/jpeg":
    return "jpg"
  case "image/png":
    return "png"
  case "image/gif":
    return "gif"
  }
  return ""
}

It works wonderfully.

Today I figured I’d implement mp4 detection so as to detect videos. I snagged a video file off of my Note 2 and created a quick test.

func ValidateVideoType(b []byte) (string) {
  m := http.DetectContentType(b)
  switch m { 
  case "video/mp4":
    return "mp4"
  }
  return ""
}

This didn’t work and I headed over to the documentation. After reading the function’s signature, I was pretty sure I was using it correctly. Suspecting that my mp4 was corrupt I went and opened up the file in a hex editor.

mp4_hexadecimal

Comparing it to the living specification, it appeared correct. But just in case the file was in fact corrupt, I went out and found a test mp4 file from Apple.com. Unfortunately this file also failed.

Knowing that I was using the function correctly, and that the file was intact, I headed over to review the sniff.go source file.

package main
 
import (
  "encoding/binary"
  "bytes"
  "log"
  "os"
  "io"
  "net/http"
)
 
func main() {
  var total int
  var ext string
  buf := make([]byte, 512)
 
  fi, err := os.Open("./sample_mpeg4.mp4")
  if err != nil {
    panic(err)
  }
  defer fi.Close()
 
  for {
    n, err := fi.Read(buf)
    total = total + n
    if n == 0 {
      break
    }
    if err == io.EOF {
      break
    }
    if err != nil {
      break
    }
 
    if n == 512 {
 
      data := buf[:n]
      if len(data) < 8 {
        break
      }
 
      boxSize := int(binary.BigEndian.Uint32(data[:4]))
      if boxSize%4 != 0 {
        break
      }
      if len(data) < boxSize {
        break
      }
      if !bytes.Equal(data[4:8], []byte("ftyp")) {
        break
      }
 
      for st := 8; st < boxSize; st+=4 {
        if st == 12 {
          continue
        }
        seg := string(data[st : st+3])
        switch seg {
          case"mp4", "iso", "M4V", "M4P", "M4B":
            ext = "mp4"
        }
        if len(ext) > 1 {
          break
        }
      }
 
      m := http.DetectContentType(buf)
      log.Println("LOCAL: ", ext)
      log.Println("SOURCE: ", m)
    }
  }
}

Bizarrely, the local version found the desired file type. Wuuuuuuuu?!?

I wondered if it was my OSX environment so I spun up a linux box, downloaded the Go source, and dove into the source code.

Making sure I was at least starting off in the right direction, I ran the bash script go/src/all.bash and all the tests completed successfully. I went to the sniff_test.go file and found this test dataset commented out.

        //{"MP4 video", []byte("\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00mp42isom&lt;\x06t\xbfmdat"), "video/mp4"},

Hmmmmm, that looked interesting. So I uncommented the code and ran the tests again. THEY FAILED! It wasn’t what I wanted, but at this point I’ll take anything.

Knowing I was close I did a recursive grep for the function signature in question to see what was using it.

backwardselvis@lts-linux:~/go/src$ grep -nr "mp4Sig" ./
./pkg/net/http/sniff.go:102:	//mp4Sig(0),
./pkg/net/http/sniff.go:164:type mp4Sig int
./pkg/net/http/sniff.go:166:func (mp4Sig) match(data []byte, firstNonWS int) string {
backwardselvis@lts-linux:~/go/src$

And immediately, my face struck my palm and my head hit my keyboard.

I opened up sniff.go. The EXACT SAME FILE that I’d been staring at all day and uncommented line 102. Saved, quit, and upon running the tests again they all passed successfully.

And despite this residual lame duck feeling, I’d argue that it is better to chase down the truth then let something unknown slip by.

:peace

2013-10-29 07.27.37

How To: Shove data into Postgres using Goroutines(Gophers) and GoLang

After watching Rob Pike’s wonderful golang talk, ‘Concurrency Is Not Parallelism’, in which he uses the analogy of having many Gophers running around and getting work done. I realized that I wanted to program a test so as better to solidify the concept.

Rob Pike – ‘Concurrency Is Not Parallelism’

Recently a pain point for me while using Play! and Scala was finding a fast way to insert data into a Postgres Database. My solution was fine but I felt it could be simpler, nay I would even say that ideally it SHOULD be simpler. I decided to see how golang would handle this type of problem.

Please install golang 1.2 and postgres 9.3.1 on a 10.9 OSX machine.

We will then create a database with one table.

$ initdb ~/Programming/postgres-E utf8
$ pg_ctl -D ~/Programming/postgres -l ~/logfile start
$ createdb testdb
$ psql testdb
testdb=# create table test ( gopher_id int, created timestamp );

Next, create a file called test_one.go with my provided code.. Our goal right now is to simply get data to insert into the database. I should mention that I planned to have many goroutines (or as Rob says, gophers) inserting data into my testdb so the column gopher_id was in preparation for this.

package main
 
import (
  //"os"
  "log"
  "time"
  "fmt"
  "database/sql"
  _ "github.com/lib/pq"
)
 
func main() {
  // create the statement string
  var sStmt string = "insert into test (gopher_id, created) values ($1, $2)"
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  stmt, err := db.Prepare(sStmt)
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("StartTime: %v\n", time.Now())
 
  res, err := stmt.Exec(1, time.Now())
  if err != nil || res == nil {
    log.Fatal(err)
  }
 
  // close statement
  stmt.Close()
 
  // close db
  db.Close()
 
  fmt.Printf("StopTime: %v\n", time.Now())
}

Run it and take a look at the results.

$ go build test_one.go 
$ ./test_one 
StartTime: 2013-10-28 19:37:11.014548615 -0700 PDT
StopTime: 2013-10-28 19:37:11.01744029 -0700 PDT
$

Yay! We have inserted exactly one row into the database.

Lets modify our program to insert a 100,000 rows using a simple for loop.

package main
 
import (
  "os"
  "log"
  "time"
  "fmt"
  "database/sql"
  _ "github.com/lib/pq"
)
 
func main() {
  // create the statement string
  var sStmt string = "insert into test (gopher_id, created) values ($1, $2)"
  var entries int = 100000
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("StartTime: %v\n", time.Now())
  for i := 0; i &lt; entries; i++ {
 
    stmt, err := db.Prepare(sStmt)
    if err != nil {
      log.Fatal(err)
    }
 
    res, err := stmt.Exec(1, time.Now())
    if err != nil || res == nil {
      log.Fatal(err)
    }
 
    // close statement
    stmt.Close()
  }
 
  // close db
  db.Close()
 
  fmt.Printf("StopTime: %v\n", time.Now())
  os.Exit(0)
}

And the results after running it.

$ ./test_one 
StartTime: 2013-10-28 19:53:55.397246856 -0700 PDT
StopTime: 2013-10-28 19:54:30.471278797 -0700 PDT
$

Ok. It took 35 seconds to insert a simple timestamp 100,000 times.

Now here is where we want to mix things up a bit. Rob mentions this thing called a goroutine in his talk. What would happen if we used 10 goroutines to insert 10,000 records each. It should be faster right?

Here is my test_many.go application, copy it.

package main
 
// schema we can use along with some select statements
// create table test ( gopher_id int, created timestamp );
// select * from test order by created asc limit 1;
// select * from test order by created desc limit 1;
// select count(created) from test;
 
import (
  "log"
  "time"
  "fmt"
  "database/sql"
  _ "github.com/lib/pq"
)
 
const (
  gophers = 10
  entries = 10000
)
 
func main() {
 
  // create string to pass
  var sStmt string = "insert into test (gopher_id, created) values ($1, $2)"
 
  // run the insert function using 10 go routines
  for i := 0; i &lt; gophers; i++ {
    // spin up a gopher
    go gopher(i, sStmt)
  }
 
  // this is a simple way to keep a program open
  // the go program will close when a key is pressed
  var input string
  fmt.Scanln(&amp;input)
}
 
func gopher(gopher_id int, sStmt string) {
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("Gopher Id: %v || StartTime: %v\n",gopher_id, time.Now())
 
  for i := 0; i &lt; entries; i++ {
 
    stmt, err := db.Prepare(sStmt)
    if err != nil {
      log.Fatal(err)
    }
 
    res, err := stmt.Exec(gopher_id, time.Now())
    if err != nil || res == nil {
      log.Fatal(err)
    }
 
    stmt.Close()
 
  }
  // close db
  db.Close()
 
  fmt.Printf("Gopher Id: %v || StopTime: %v\n",gopher_id, time.Now())
 
}

Compile and run it. And here are the results.

$ go build test_many.go
$ ./test_many 
Gopher Id: 0 || StartTime: 2013-10-28 20:33:22.756366002 -0700 PDT
Gopher Id: 1 || StartTime: 2013-10-28 20:33:22.756647643 -0700 PDT
Gopher Id: 2 || StartTime: 2013-10-28 20:33:22.756794146 -0700 PDT
Gopher Id: 3 || StartTime: 2013-10-28 20:33:22.756918631 -0700 PDT
Gopher Id: 4 || StartTime: 2013-10-28 20:33:22.757068214 -0700 PDT
Gopher Id: 5 || StartTime: 2013-10-28 20:33:22.757215735 -0700 PDT
Gopher Id: 6 || StartTime: 2013-10-28 20:33:22.757371258 -0700 PDT
Gopher Id: 7 || StartTime: 2013-10-28 20:33:22.757492378 -0700 PDT
Gopher Id: 8 || StartTime: 2013-10-28 20:33:22.757640434 -0700 PDT
Gopher Id: 9 || StartTime: 2013-10-28 20:33:22.758553529 -0700 PDT
Gopher Id: 5 || StopTime: 2013-10-28 20:33:31.466897188 -0700 PDT
Gopher Id: 7 || StopTime: 2013-10-28 20:33:31.473265074 -0700 PDT
Gopher Id: 0 || StopTime: 2013-10-28 20:33:31.475096903 -0700 PDT
Gopher Id: 4 || StopTime: 2013-10-28 20:33:31.485587389 -0700 PDT
Gopher Id: 6 || StopTime: 2013-10-28 20:33:31.500614304 -0700 PDT
Gopher Id: 3 || StopTime: 2013-10-28 20:33:31.5006651 -0700 PDT
Gopher Id: 8 || StopTime: 2013-10-28 20:33:31.501664917 -0700 PDT
Gopher Id: 1 || StopTime: 2013-10-28 20:33:31.510199788 -0700 PDT
Gopher Id: 2 || StopTime: 2013-10-28 20:33:31.522350833 -0700 PDT
Gopher Id: 9 || StopTime: 2013-10-28 20:33:31.523710669 -0700 PDT
 
$

Nice! Roughly 9 seconds for 100,000 records.

Lets change our test_many.go application to insert 1,000,000 records. Simply as a way to keep the application running longer.

...snip...
 
const (
  gophers = 10
  entries = 100000
)
 
...snip...

Now go ahead and build the application and run it.

Then in another terminal window, open up top and search for the test_many command.

PID    COMMAND      %CPU TIME     #TH   #WQ  #PORT #MREGS MEM    RPRVT  PURG   CMPRS  VPRVT  VSIZE  PGRP  PPID  STATE    UID       FAULTS    COW     MSGSENT    MSGRECV
89478  test_many    0.0  00:12.39 15    0    84    1448   140M   139M   0B     0B     358M   139G   89478 88263 sleeping 672251097 36630     95      110        51

You will see that under the #TH column that our application, test_many, has 15 threads open. 10 of these, are the threads we created using our goroutine. 

UPDATE: That statement was not true. I’ve learned after working with go that while using goroutines, threads are dynamically assigned. It is not a 1 to 1 correlation.

However, Rob does have a talk where he mentions some more details. At minute 8:56 he writes

“But if you think of it as a very cheap thread, you won’t be far off”

 

Anyway, this is the section of the test_many.go file where the magic happens.

...snip...
 
  for i := 0; i &lt; gophers; i++ {
    // spin up a gopher
    go gopher(i, sStmt)
  }
 
...snip...

You see, simply by placing the “go” call in front of our gopher function, we allow golang to do a bunch of behind the scenes awesomeness. A part of this gain, is an inexpensive and versatile thread and memory manager. This is pretty sweet.

As the last test, here is 20 gophers inserting a million rows into our database.

Change the needed constants.

...snip...
 
const (
  gophers = 20
  entries = 50000
)
 
...snip...

Then recompile and run our application.

$ ./test_many 
Gopher Id: 0 || StartTime: 2013-10-30 14:40:51.619452739 -0700 PDT
Gopher Id: 1 || StartTime: 2013-10-30 14:40:51.619757831 -0700 PDT
Gopher Id: 2 || StartTime: 2013-10-30 14:40:51.619896348 -0700 PDT
Gopher Id: 3 || StartTime: 2013-10-30 14:40:51.620059076 -0700 PDT
Gopher Id: 4 || StartTime: 2013-10-30 14:40:51.620283647 -0700 PDT
Gopher Id: 5 || StartTime: 2013-10-30 14:40:51.620410941 -0700 PDT
Gopher Id: 6 || StartTime: 2013-10-30 14:40:51.620550967 -0700 PDT
Gopher Id: 7 || StartTime: 2013-10-30 14:40:51.620693553 -0700 PDT
Gopher Id: 8 || StartTime: 2013-10-30 14:40:51.62085492 -0700 PDT
Gopher Id: 9 || StartTime: 2013-10-30 14:40:51.62177397 -0700 PDT
Gopher Id: 10 || StartTime: 2013-10-30 14:40:51.621900208 -0700 PDT
Gopher Id: 11 || StartTime: 2013-10-30 14:40:51.622068806 -0700 PDT
Gopher Id: 12 || StartTime: 2013-10-30 14:40:51.622233856 -0700 PDT
Gopher Id: 13 || StartTime: 2013-10-30 14:40:51.622374732 -0700 PDT
Gopher Id: 14 || StartTime: 2013-10-30 14:40:51.622520823 -0700 PDT
Gopher Id: 15 || StartTime: 2013-10-30 14:40:51.622663899 -0700 PDT
Gopher Id: 16 || StartTime: 2013-10-30 14:40:51.6228063 -0700 PDT
Gopher Id: 17 || StartTime: 2013-10-30 14:40:51.62294893 -0700 PDT
Gopher Id: 18 || StartTime: 2013-10-30 14:40:51.623085432 -0700 PDT
Gopher Id: 19 || StartTime: 2013-10-30 14:40:51.623231574 -0700 PDT
Gopher Id: 3 || StopTime: 2013-10-30 14:42:35.551828109 -0700 PDT
Gopher Id: 6 || StopTime: 2013-10-30 14:42:35.619763694 -0700 PDT
Gopher Id: 17 || StopTime: 2013-10-30 14:42:35.642630072 -0700 PDT
Gopher Id: 4 || StopTime: 2013-10-30 14:42:35.660139502 -0700 PDT
Gopher Id: 10 || StopTime: 2013-10-30 14:42:35.67527849 -0700 PDT
Gopher Id: 13 || StopTime: 2013-10-30 14:42:35.685175774 -0700 PDT
Gopher Id: 9 || StopTime: 2013-10-30 14:42:35.69144017 -0700 PDT
Gopher Id: 0 || StopTime: 2013-10-30 14:42:35.729137844 -0700 PDT
Gopher Id: 1 || StopTime: 2013-10-30 14:42:35.749108884 -0700 PDT
Gopher Id: 2 || StopTime: 2013-10-30 14:42:35.770752277 -0700 PDT
Gopher Id: 7 || StopTime: 2013-10-30 14:42:35.777471603 -0700 PDT
Gopher Id: 8 || StopTime: 2013-10-30 14:42:35.780713859 -0700 PDT
Gopher Id: 19 || StopTime: 2013-10-30 14:42:35.783338736 -0700 PDT
Gopher Id: 11 || StopTime: 2013-10-30 14:42:35.794922546 -0700 PDT
Gopher Id: 14 || StopTime: 2013-10-30 14:42:35.799124142 -0700 PDT
Gopher Id: 5 || StopTime: 2013-10-30 14:42:35.800538715 -0700 PDT
Gopher Id: 12 || StopTime: 2013-10-30 14:42:35.802006822 -0700 PDT
Gopher Id: 18 || StopTime: 2013-10-30 14:42:35.806211023 -0700 PDT
Gopher Id: 16 || StopTime: 2013-10-30 14:42:35.817042445 -0700 PDT
Gopher Id: 15 || StopTime: 2013-10-30 14:42:35.824304999 -0700 PDT

With a tiny golang program, we now have 1 million writes taking 1 minute and 44 seconds. Friggin sweet.

UPDATE2: Thanks to corvinusz for pointing this out. By simply moving our prepare request outside of our for loop. We can now insert 1M records in 44 seconds. Thanks dude!

 
...snip...
 
func gopher(gopher_id int, sStmt string) {
 
  // lazily open db (doesn't truly open until first request)
  db, err := sql.Open("postgres","host=localhost dbname=testdb sslmode=disable")
  if err != nil {
    log.Fatal(err)
  }
 
  stmt, err := db.Prepare(sStmt)
  if err != nil {
    log.Fatal(err)
  }
 
  fmt.Printf("Gopher Id: %v \t|| StartTime: %v\n",gopher_id, time.Now())
 
 
  for i := 0; i < entries; i++ {
 
 
    res, err := stmt.Exec(gopher_id, time.Now())
    if err != nil || res == nil {
      log.Fatal(err)
    }
 
 
  }
 
  stmt.Close()
  // close db
  db.Close()
 
  fmt.Printf("Gopher Id: %v \t|| StopTime: %v\n",gopher_id, time.Now())
 
}
 
...snip...
$ ./test_many_pool 
Gopher Id: 8 	|| StartTime: 2013-10-31 08:52:45.56084829 -0700 PDT
Gopher Id: 6 	|| StartTime: 2013-10-31 08:52:45.561137992 -0700 PDT
Gopher Id: 12 	|| StartTime: 2013-10-31 08:52:45.561215818 -0700 PDT
Gopher Id: 4 	|| StartTime: 2013-10-31 08:52:45.561288387 -0700 PDT
Gopher Id: 7 	|| StartTime: 2013-10-31 08:52:45.56135501 -0700 PDT
Gopher Id: 15 	|| StartTime: 2013-10-31 08:52:45.56148435 -0700 PDT
Gopher Id: 2 	|| StartTime: 2013-10-31 08:52:45.562015828 -0700 PDT
Gopher Id: 17 	|| StartTime: 2013-10-31 08:52:45.562434107 -0700 PDT
Gopher Id: 13 	|| StartTime: 2013-10-31 08:52:45.562872507 -0700 PDT
Gopher Id: 19 	|| StartTime: 2013-10-31 08:52:45.564599961 -0700 PDT
Gopher Id: 0 	|| StartTime: 2013-10-31 08:52:45.567561345 -0700 PDT
Gopher Id: 1 	|| StartTime: 2013-10-31 08:52:45.568002729 -0700 PDT
Gopher Id: 3 	|| StartTime: 2013-10-31 08:52:45.568259325 -0700 PDT
Gopher Id: 10 	|| StartTime: 2013-10-31 08:52:45.569216209 -0700 PDT
Gopher Id: 9 	|| StartTime: 2013-10-31 08:52:45.569317516 -0700 PDT
Gopher Id: 11 	|| StartTime: 2013-10-31 08:52:45.569604124 -0700 PDT
Gopher Id: 14 	|| StartTime: 2013-10-31 08:52:45.572600335 -0700 PDT
Gopher Id: 16 	|| StartTime: 2013-10-31 08:52:45.574639425 -0700 PDT
Gopher Id: 18 	|| StartTime: 2013-10-31 08:52:45.576122518 -0700 PDT
Gopher Id: 5 	|| StartTime: 2013-10-31 08:52:45.577174648 -0700 PDT
Gopher Id: 5 	|| StopTime: 2013-10-31 08:53:28.156243751 -0700 PDT
Gopher Id: 14 	|| StopTime: 2013-10-31 08:53:28.240175919 -0700 PDT
Gopher Id: 9 	|| StopTime: 2013-10-31 08:53:28.247910193 -0700 PDT
Gopher Id: 1 	|| StopTime: 2013-10-31 08:53:28.313724365 -0700 PDT
Gopher Id: 10 	|| StopTime: 2013-10-31 08:53:28.372717602 -0700 PDT
Gopher Id: 16 	|| StopTime: 2013-10-31 08:53:28.389852733 -0700 PDT
Gopher Id: 15 	|| StopTime: 2013-10-31 08:53:28.434203629 -0700 PDT
Gopher Id: 19 	|| StopTime: 2013-10-31 08:53:28.470070067 -0700 PDT
Gopher Id: 18 	|| StopTime: 2013-10-31 08:53:28.486355611 -0700 PDT
Gopher Id: 4 	|| StopTime: 2013-10-31 08:53:28.505530435 -0700 PDT
Gopher Id: 8 	|| StopTime: 2013-10-31 08:53:28.507830911 -0700 PDT
Gopher Id: 3 	|| StopTime: 2013-10-31 08:53:28.520788665 -0700 PDT
Gopher Id: 11 	|| StopTime: 2013-10-31 08:53:28.545292303 -0700 PDT
Gopher Id: 13 	|| StopTime: 2013-10-31 08:53:28.585376452 -0700 PDT
Gopher Id: 0 	|| StopTime: 2013-10-31 08:53:28.587530913 -0700 PDT
Gopher Id: 7 	|| StopTime: 2013-10-31 08:53:28.598791481 -0700 PDT
Gopher Id: 2 	|| StopTime: 2013-10-31 08:53:28.605832957 -0700 PDT
Gopher Id: 12 	|| StopTime: 2013-10-31 08:53:28.61655187 -0700 PDT
Gopher Id: 17 	|| StopTime: 2013-10-31 08:53:28.616684023 -0700 PDT
Gopher Id: 6 	|| StopTime: 2013-10-31 08:53:28.618116313 -0700 PDT

Something to call out, is that we are shoving our data, basically, into a black hole. We have no idea if the inserts completed successfully. Yes there is some error logic, but not enough. We are using Go to have our application function in a parallel state, or as Rob might say.

“The concurrent composition of 10 gophers inserting data into a database”

 

Rob in his talk, goes on to mention channels. Which is the logical next step for message passing here. This would bring our application into a concurrent state and if people are interested, hit me up @jaredfolkins and let me know I should post more concerning this.

play_postgres

Play Framework 2.2 and Postgres 9.3.1 java.sql.SQLException: No suitable driver found

Environment

OSX: 10.9
Postgres: 9.3.1
Scala: 2.10.3
PlayFramework: 2.2
Java: 1.7.0_25

build.sbt

...
 
libraryDependencies ++= Seq(
  jdbc,
  anorm,
  cache,
  "org.postgresql" % "postgresql" % "9.2-1003-jdbc4"
)
 
...

application.conf

...
 
db.default.driver=org.postgresql.Driver
db.default.url="jdbc:postgresql://localhost/YOURDATABASENAMEHERE"
 
...

 

Stop/Start your play application and you should be good to go.

Older Posts