
autoconf and Biological Annotation Tool (BAT)

Bob Zimmermann, 6 September 2006


Presentation Transcript


  1. autoconf and Biological Annotation Tool (BAT)
     Bob Zimmermann
     6 September 2006

  2. First, a Bit of a Digression (Look Familiar?)
     checking for a BSD-compatible install... /usr/bin/install -c
     checking whether build environment is sane... yes
     checking for gawk... no
     checking for mawk... no
     checking for nawk... no
     checking for awk... awk
     checking whether make sets $(MAKE)... yes
     checking build system type... powerpc-apple-darwin8.7.0
     checking host system type... powerpc-apple-darwin8.7.0
     checking for style of include used by make... GNU
     checking for gcc... gcc
     checking for C compiler default output file name... a.out
     . . .

  3. So
     • How does everyone have nearly identical 50,000-line configure scripts?
       • Serendipity.
       • NO! autoconf
     • Why have such a hacky shell script?
       • Assume the worst when building on other OSs: Solaris, *BSD, OS X, VMS (ugh.)
       • Many people are working on it

  4. How does it work?
     • Write configure.in (or run autoscan)
     • aclocal; autoheader; autoconf

  5. An Example
     AC_INIT(iscan, 3.5.0, brent@cse.wustl.edu)
     …
     AC_CHECK_LIB([m], [log])
     PKG_CHECK_MODULES([GLIB], [ glib-2.0 >= 2.0.0 ],
                       AC_MSG_RESULT([yes]), AC_MSG_RESULT([no]))
     AC_SUBST(GLIB_CFLAGS)
     AC_SUBST(GLIB_LDFLAGS)
     AC_CHECK_HEADERS([libgen.h fcntl.h float.h limits.h stdlib.h string.h unistd.h])
     AC_DEFINE_UNQUOTED([BUILD],["`date +'%Y.%m.%d.%R'``whoami`"],
                        [Id of the build for versioning purposes])
     AC_CHECK_FUNCS([floor memset pow sqrt sprintf strerror strstr strtol])

  6. An Example
     /* Id of the build for versioning purposes */
     #define BUILD "2006.08.30.04:05rpz"

     /* Define to 1 if you have the `pow' function. */
     #define HAVE_POW 1

     /* Define to `unsigned' if <sys/types.h> does not define. */
     /* #undef size_t */
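A generated header like the one on slide 6 is consumed with ordinary preprocessor conditionals. The following is a minimal sketch, not code from iscan or BAT. It assumes that, because `strstr` appears in the `AC_CHECK_FUNCS` list, configure defines `HAVE_STRSTR` in the same way it defines `HAVE_POW`. The two `#define` lines stand in for a real `config.h`, and the helper names `find_substring` and `format_banner` are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for config.h (assumption: a real project would write
   #include "config.h" here and let configure supply these values). */
#define BUILD "2006.08.30.04:05rpz"
#define HAVE_STRSTR 1

/* Hypothetical wrapper: locate needle inside haystack, using the libc
   strstr() when configure found it and a simple scan otherwise. */
const char *find_substring(const char *haystack, const char *needle)
{
#ifdef HAVE_STRSTR
    return strstr(haystack, needle);
#else
    size_t n = strlen(needle);
    for (; *haystack != '\0'; haystack++)
        if (strncmp(haystack, needle, n) == 0)
            return haystack;
    /* Only an empty needle matches at the end of the string. */
    return n == 0 ? haystack : NULL;
#endif
}

/* Hypothetical helper: write "<program> <version> (build <BUILD>)" into
   out and return snprintf's result (the length it wanted to write). */
int format_banner(char *out, size_t size,
                  const char *program, const char *version)
{
    return snprintf(out, size, "%s %s (build %s)", program, version, BUILD);
}
```

Removing the `HAVE_STRSTR` line exercises the fallback path, which is what would happen on a system where configure reported `checking for strstr... no`.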

  7. I Want Makefiles Too!
     • OK: automake
     • Input a short Makefile.am and get a Makefile
     • Has targets clean, configure, dist, all
     • Can be built in any directory
     • Will adjust compiler flags based on results of configure
     • Can replace missing system calls
     • Can conditionally use libraries

  8. Freaking Confusing, Bob
     • I’ll post a little crude guide on nijibabulu.org at some point.

  9. BAT: Why?
     • I am doing experiments with large annotation files
     • Eval is slow and uses a lot of memory
     • Eval has a lot of features we like
     • A framework for parsing and analyzing annotations quickly and robustly is good
     • Acronyms.

  10. The General Idea
      • We keep only one data structure hard coded: the BAT_Annotation
      • Parsing and writing are handled in plugins
      • Validation, evaluation, and analysis are decoupled from parsing and writing
      • We keep a low profile for heavy computational tasks
      • Yes, there are some awful algorithms that go into annotation analysis

  11. The Model
      [Diagram. Recoverable labels: BAT_Annotation; BAT_Parser (GTF, UCSC, PSL); BAT_Writer (GTF, UCSC, PSL); BAT_Validator and BAT_Evaluator (starts, gene comp, gene_id, …, frame); BAT_Actor (cluster, …)]
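The design on slides 10 and 11 (one fixed annotation structure, with format handling pushed into plugins and validation kept separate) can be sketched with function pointers. This is an illustration only and not BAT's actual API: the `Annotation` struct is a heavily simplified stand-in for BAT_Annotation, and `FormatPlugin`, `parse_gtf`, `write_gtf`, `find_plugin`, and `validate_coords` are hypothetical names. The parser reads only the first five whitespace-separated GTF columns.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the one hard-coded structure. */
typedef struct {
    char seqname[32];
    char source[32];
    char feature[32];
    long start;
    long end;
} Annotation;

/* A format plugin pairs a parser and a writer under a format name.
   Both callbacks return 1 on success and 0 on failure. */
typedef struct {
    const char *format;
    int (*parse)(const char *line, Annotation *out);
    int (*write)(const Annotation *in, char *buf, size_t size);
} FormatPlugin;

/* Simplified GTF reader: seqname, source, feature, start, end only. */
static int parse_gtf(const char *line, Annotation *out)
{
    return sscanf(line, "%31s %31s %31s %ld %ld", out->seqname,
                  out->source, out->feature, &out->start, &out->end) == 5;
}

/* Simplified GTF writer: emits the same five columns, tab-separated. */
static int write_gtf(const Annotation *in, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s\t%s\t%s\t%ld\t%ld", in->seqname,
                     in->source, in->feature, in->start, in->end);
    return n >= 0 && (size_t)n < size;
}

/* Registry of known formats; adding a format means adding a row. */
static const FormatPlugin plugins[] = {
    { "gtf", parse_gtf, write_gtf },
};

/* Look up a plugin by format name; returns NULL if none is registered. */
const FormatPlugin *find_plugin(const char *format)
{
    for (size_t i = 0; i < sizeof plugins / sizeof plugins[0]; i++)
        if (strcmp(plugins[i].format, format) == 0)
            return &plugins[i];
    return NULL;
}

/* A validator sees only the Annotation, never the file format.
   GTF coordinates are 1-based with start <= end. */
int validate_coords(const Annotation *a)
{
    return a->start >= 1 && a->start <= a->end;
}
```

Because `validate_coords` takes only an `Annotation`, the same check would apply unchanged to records read by any other parser added to the registry.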

  12. Example (Fake)
      use chr01.fa
      parse chr01.ucsc chr01
      parse chr1.extra.gtf chr01
      validate check_starts --delete
      validate gene_ids --cds-only
      …
      output chr1.eval.gtf
      ---
      act cluster_ests
      act make_estseq --output=chr01.estseq.fa
      etc.
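Each line of the fake script on slide 12 starts with a verb (use, parse, validate, output, act) followed by arguments. One way such lines could be split before dispatching to a plugin is sketched below. The slide does not describe how BAT reads its scripts, so everything here is an assumption: `CommandKind`, `Command`, and `read_command` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Verbs taken from the slide; CMD_UNKNOWN covers everything else. */
typedef enum {
    CMD_USE, CMD_PARSE, CMD_VALIDATE, CMD_OUTPUT, CMD_ACT, CMD_UNKNOWN
} CommandKind;

typedef struct {
    CommandKind kind;
    const char *args; /* points into the original line, after the verb */
} Command;

/* Split one script line into its verb and the remaining argument text. */
Command read_command(const char *line)
{
    static const struct { const char *verb; CommandKind kind; } verbs[] = {
        { "use", CMD_USE }, { "parse", CMD_PARSE },
        { "validate", CMD_VALIDATE }, { "output", CMD_OUTPUT },
        { "act", CMD_ACT },
    };
    Command cmd = { CMD_UNKNOWN, line };
    size_t verb_len = strcspn(line, " \t"); /* length of the first word */

    for (size_t i = 0; i < sizeof verbs / sizeof verbs[0]; i++) {
        if (strlen(verbs[i].verb) == verb_len &&
            strncmp(line, verbs[i].verb, verb_len) == 0) {
            cmd.kind = verbs[i].kind;
            /* Skip the blanks that separate the verb from its arguments. */
            cmd.args = line + verb_len + strspn(line + verb_len, " \t");
            break;
        }
    }
    return cmd;
}
```

A caller would switch on `kind` and hand `args` to the matching parser, validator, writer, or actor. Lines such as `---` come back as `CMD_UNKNOWN` with `args` pointing at the whole line.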

  13. What's There Yet?
      • Implemented pluggable GTF parser, GTF writer
      • Implemented gene_id validator.
      • Benchmarks
        • Parse, validate and write chr2R.eval.gtf: Eval: 120MB, BAT: 30MB (down)
        • chr2R.preds.gtf: Eval: 24+ hrs, BAT: <2min
      • Compiles on OS X, Solaris, Linux, OpenBSD (autoconf!)

  14. What Else Can It Be Good For?
      • Perl and Python bindings
      • Modest (constant) loss of efficiency
      • Parameter estimation
      • Zoe output of multiple formats
      • Target selection
      • Name a project involving annotations of any format.
