The lifecycle of transcripts

The Lifecycle of “Transcripts”

Day 1:

Interview taped; Harry enters the interview in the database (every field except the URL).

Day 12 to 24:

Still photos for header image produced: URL created

DVCAM sent to UCTV

URL created (usually before the transcript) -> “dummy page”

Audio tape sent to transcriber off campus.

~ 14 days after audio sent:

Transcript done.

~ 14 days after transcript done:

Light edit, then posted.

~ 30 to 60 days after interview taped:

UCTV broadcast.
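The timeline above is mostly a set of fixed offsets, so it can be expressed as date arithmetic. This is a minimal sketch, assuming the caller supplies the date the audio was sent (the slide only places that step in the Day 12 to 24 block) and reading the broadcast line as 30 to 60 days after taping. The function name `milestones` and the constants are hypothetical, not part of the original workflow.

```python
from datetime import date, timedelta

# Approximate offsets read off the slide (hypothetical constant names).
TRANSCRIBE_DAYS = 14        # audio sent -> transcript done
EDIT_DAYS = 14              # transcript done -> light edit and posting
BROADCAST_DAYS = (30, 60)   # taping -> UCTV broadcast window


def milestones(taped: date, audio_sent: date) -> dict[str, tuple[date, date]]:
    """Rough (earliest, latest) date windows for one interview."""
    def after(start: date, days: int) -> date:
        return start + timedelta(days=days)

    # "Day 1" is the taping day, so "Day 12 to 24" is 11 to 23 days later.
    url_window = (after(taped, 11), after(taped, 23))
    done = after(audio_sent, TRANSCRIBE_DAYS)
    posted = after(done, EDIT_DAYS)
    broadcast = (after(taped, BROADCAST_DAYS[0]), after(taped, BROADCAST_DAYS[1]))
    return {
        "header_photos_and_url": url_window,
        "transcript_done": (done, done),
        "posted": (posted, posted),
        "uctv_broadcast": broadcast,
    }


if __name__ == "__main__":
    for step, (start, end) in milestones(date(2000, 12, 7), date(2000, 12, 20)).items():
        print(f"{step:22} {start} .. {end}")
```

Because every offset on the slide is approximate ("~"), the returned dates are planning estimates, not deadlines.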

THE SECRET LIFE OF THE TRANSCRIPT

Contractor transcribes audio -> raw Word file

Harry annotates (adds outline and chunks the text)

Letitia edits the HTML version (Dreamweaver) and posts it

Additional changes made by interviewee (up to 4 weeks after initial post)



A Typical Transcript: Dec. 7, 2000, James Fallows (Outline + Linked Pages)



What’s in a Transcript?

STATES:

  • Transcripts in a state of being: “dummy pages” (transcripts in “gestation”)

    • Transcript exists but not posted (e.g. when the interviewee hasn’t okayed it yet)

    • Transcript not produced yet (e.g. audio not sent to transcriber – due to time and/or funding)

  • Transcripts posted

  • Transcripts edited after being posted

  • Also:

    • Transcripts in limbo (not scheduled for transcription – postponed)

      • With dummy page (e.g. 2005: Sen – see next page)

      • Without dummy page (e.g. 1987: Luttwak)
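The states listed above can be modelled as an enumeration plus a classifier. This is a minimal sketch, assuming each transcript carries a few yes/no facts; the flag names, the `TranscriptRecord` type and the precedence order (posted, then postponed, then exists) are hypothetical, since the slide does not say how state is recorded.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DUMMY_NOT_POSTED = "dummy page: transcript exists but not posted"
    DUMMY_NOT_PRODUCED = "dummy page: transcript not produced yet"
    POSTED = "posted"
    EDITED_AFTER_POSTING = "edited after being posted"
    LIMBO_WITH_DUMMY = "in limbo, with dummy page"
    LIMBO_WITHOUT_DUMMY = "in limbo, without dummy page"


@dataclass
class TranscriptRecord:
    # Hypothetical flags: the slide does not say how these facts are stored.
    transcript_exists: bool
    posted: bool
    edited_after_post: bool = False
    postponed: bool = False        # "not scheduled for transcription"
    has_dummy_page: bool = True


def classify(r: TranscriptRecord) -> State:
    """Map one record to a state; the precedence order is an assumption."""
    if r.posted:
        return State.EDITED_AFTER_POSTING if r.edited_after_post else State.POSTED
    if r.postponed:
        return State.LIMBO_WITH_DUMMY if r.has_dummy_page else State.LIMBO_WITHOUT_DUMMY
    if r.transcript_exists:
        return State.DUMMY_NOT_POSTED
    return State.DUMMY_NOT_PRODUCED
```

For example, the 1987 Luttwak entry would be `TranscriptRecord(False, False, postponed=True, has_dummy_page=False)`, which classifies as `LIMBO_WITHOUT_DUMMY`.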

FORMS:

  • Full-text form (1 piece – no outline or chunking):

    2003: Terkel http://globetrotter.berkeley.edu/people3/Terkel/terkel-con0.html

    1983: Thompson http://globetrotter.berkeley.edu/conversations/Thompson/thompson-con0.html

  • Regular form (chunked into pieces according to an index)

    EXAMPLES:

  • 2004: Engelhardt: page 1 has 2 embedded pics...

    http://globetrotter.berkeley.edu/people4/Engelhardt/engelhardt-con0.html

  • 2002: Lustick: see photos (whole page!); also Norman Myers; also Wendy Ewall

    http://globetrotter.berkeley.edu/people2/Lustick/lustick-con0.html

  • 1999: Shinoda & Iwashita: use of different colors in the interview

    http://globetrotter.berkeley.edu/conversations/Shinoda_Iwashita/shinoda_iwashita0.html



More on Transcripts “in limbo”

("T"=transcript posted, "V"=video link present).

T V

__ ___

n y 2005 - Sen --> transcript exists but hasn’t been posted yet

n y 2004 - Joffe --> no time for producing transcript yet

n y 1990 - Howard --> historical - no time for producing transcript yet

n y 1989 - Zumwalt --> ditto

n y 1988 - Lewis --> ditto

n y 1986 - Atherton --> ditto

n y 1985 - Fraser --> ditto

n y 1984 - Warnke --> ditto

n y 1984 - Carrington --> ditto

Andriessen through Campo were put up as is because of a grant from the EU and the particulars are:

T V

__ ___

n n 2004 - Wales --> not done

n n 2003 - Zhou --> not done

n n 1999 - *Krenzler --> not done

n n 1998 - *Donnelly --> not done

n n 1992 - Andriessen --> not done

n n 1990 - Lord Colesshill --> not done

n n 1990 - Hanrieder --> worth doing, not yet done

n n 1988 - Tornudd --> worth doing, not yet done

n n 1988 - a Campo --> worth doing, not yet done
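Both limbo listings share one row shape (T flag, V flag, year, name, status), so they can be read into records for reporting. This is a minimal sketch, assuming the rows are kept as plain text lines such as `n y 2005 - Sen --> ...`. The regex, the `LimboRow` type and the rule that "ditto" copies the previous row's status are illustrative assumptions; the regex also tolerates an en dash after the year and a bare `--` separator.

```python
import re
from dataclasses import dataclass

# "<T> <V> <year> - [*]<name> --> <status>"; also accepts an en dash and a bare "--".
ROW = re.compile(
    r"^(?P<t>[ny])\s+(?P<v>[ny])\s+(?P<year>\d{4})\s*[-–]\s*"
    r"(?P<star>\*?)(?P<name>.+?)\s*-{2,}>?\s*(?P<status>.+)$"
)


@dataclass
class LimboRow:
    transcript_posted: bool   # "T" column
    video_link: bool          # "V" column
    year: int
    starred: bool             # some names carry an unexplained "*"
    name: str
    status: str


def parse_rows(lines: list[str]) -> list[LimboRow]:
    rows: list[LimboRow] = []
    previous = ""
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            continue  # header, underline, prose or blank line
        status = m["status"].strip()
        if status == "ditto":
            status = previous  # "ditto" repeats the status of the row above
        previous = status
        rows.append(LimboRow(m["t"] == "y", m["v"] == "y", int(m["year"]),
                             m["star"] == "*", m["name"].strip(), status))
    return rows
```

Filtering the result for `video_link and not transcript_posted` then yields the interviews whose video is online but whose transcript is still missing.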



Transcript Typology

TYPES of transcripts

REGULAR Transcripts:

  • 1 transcript - 1 video

    (e.g. 2000: Leon Panetta)

IRREGULAR Transcripts:

  • 1 transcript - 2 videos

    (e.g. 2003: Josef Joffe)

  • 1 transcript - 1 video - 1 translated transcript

    (e.g. 1999: Alice Karekezi, French; 2002: Massimo D’Alema, Italian)

  • 2 transcripts - 1 video

    (e.g. 1999: Mark Danner)

  • 0 transcripts - 1 video

    (e.g. 1987: Edward Luttwak)

  • 1 transcript - 0 videos

    (e.g. 2002: Henri Peretz)
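The six types reduce to a lookup keyed on how many transcripts, videos and translated transcripts an interview has. This is a minimal sketch, assuming those three counts are known per interview; `TYPES`, `EXAMPLES` and `transcript_type` are hypothetical names, while the counts and example interviewees come from the typology.

```python
# (transcripts, videos, translated transcripts) -> label, one entry per listed type.
TYPES = {
    (1, 1, 0): "regular: 1 transcript - 1 video",
    (1, 2, 0): "irregular: 1 transcript - 2 videos",
    (1, 1, 1): "irregular: 1 transcript - 1 video - 1 translated transcript",
    (2, 1, 0): "irregular: 2 transcripts - 1 video",
    (0, 1, 0): "irregular: 0 transcripts - 1 video",
    (1, 0, 0): "irregular: 1 transcript - 0 videos",
}

# The example interviews named in the typology.
EXAMPLES = {
    "2000: Leon Panetta": (1, 1, 0),
    "2003: Josef Joffe": (1, 2, 0),
    "1999: Alice Karekezi": (1, 1, 1),
    "2002: Massimo D'Alema": (1, 1, 1),
    "1999: Mark Danner": (2, 1, 0),
    "1987: Edward Luttwak": (0, 1, 0),
    "2002: Henri Peretz": (1, 0, 0),
}


def transcript_type(transcripts: int, videos: int, translations: int = 0) -> str:
    """Return the typology label, or raise ValueError for an unlisted combination."""
    key = (transcripts, videos, translations)
    if key not in TYPES:
        raise ValueError(f"combination {key} is not one of the 6 listed types")
    return TYPES[key]


if __name__ == "__main__":
    for who, counts in EXAMPLES.items():
        print(f"{who:24} {transcript_type(*counts)}")
```

Raising on an unknown combination, rather than defaulting to "irregular", makes a new kind of interview visible instead of letting it slip through a harvesting run.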



“Smart Harvesting of Transcripts” Algorithm

  • For each chronological listing page (chron & chron2):

    • Extract all semantically valid showID:transcriptURL pairs (video-link prefix name + transcript URL), based on the “transcript type” and the “transcript and video-link dates” (see the 6 cases on the previous slide)

    • For each pair, crawl the main transcript URL, using the transcript outline to inline the linked pages (tables and images are filtered out)

    • Save each “blended” .html page to a separate file
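The crawl-and-blend steps can be sketched with the standard library's HTML parser. This is a minimal sketch, not the original harvester: it assumes that every link on the outline page pointing into the same directory is a chunk of the transcript, that `fetch` is any callable returning a page's HTML, and that output files are named after the showID. `blend`, `harvest` and `strip_tables_and_images` are hypothetical names.

```python
import html
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (assumed to be the transcript outline)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)


class TableImageFilter(HTMLParser):
    """Re-emits markup, dropping <img> tags and whole <table> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self.table_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag != "img" and self.table_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in ("img", "table") and self.table_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "table":
            self.table_depth = max(0, self.table_depth - 1)
        elif self.table_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.table_depth == 0:
            self.out.append(html.escape(data, quote=False))


def strip_tables_and_images(markup: str) -> str:
    f = TableImageFilter()
    f.feed(markup)
    f.close()
    return "".join(f.out)


def blend(outline_url: str, fetch: Callable[[str], str]) -> str:
    """Outline page followed by every linked page in the same directory."""
    outline = fetch(outline_url)
    collector = LinkCollector()
    collector.feed(outline)
    collector.close()
    base_dir = outline_url.rsplit("/", 1)[0] + "/"
    parts, seen = [strip_tables_and_images(outline)], {outline_url}
    for href in collector.links:
        url = urldefrag(urljoin(outline_url, href)).url
        # Assumption: chunk pages sit next to the *-con0.html outline page.
        if url.startswith(base_dir) and url not in seen:
            seen.add(url)
            parts.append(strip_tables_and_images(fetch(url)))
    return "\n<hr>\n".join(parts)


def harvest(pairs: dict[str, str], fetch: Callable[[str], str], out_dir: Path) -> list[Path]:
    """Write one blended file per showID; pairs with an empty URL are skipped."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for show_id, url in pairs.items():
        if url:
            path = out_dir / f"{show_id}.html"
            path.write_text(blend(url, fetch), encoding="utf-8")
            written.append(path)
    return written
```

Passing `fetch` in keeps the sketch testable offline; for live pages it could be something like `lambda u: urllib.request.urlopen(u, timeout=30).read().decode("latin-1")`, where the encoding is a guess. Comments and doctypes are dropped by the filter, and each page is concatenated whole, so a fuller version would probably keep only each page's body.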



Example Using “chron2.html”



1. Extract “showID:transcriptURL” Pairs

Retrieve webpage: http://globetrotter.berkeley.edu/conversations/chron2.html ...

Retrieve all the URLs and parse them...

11291:http://globetrotter.berkeley.edu/people/Fallows/fallows-con0.html

6223:http://globetrotter.berkeley.edu/people/Haas/haas-con0.html

6796:http://globetrotter.berkeley.edu/people/Hoffman/hoffman-con0.html

9159:http://globetrotter.berkeley.edu/conversations/Haglund/haglund-con0.html

6233:http://globetrotter.berkeley.edu/people/Herman/herman-con0.html

7790:http://globetrotter.berkeley.edu/conversations/Stark/stark-con0.html

7984:http://globetrotter.berkeley.edu/people/Heyman/heyman-con0.html

7133:http://globetrotter.berkeley.edu/people/Panetta/panetta-con0.html

7128:http://globetrotter.berkeley.edu/people/Joffe/joffe-con0.html

7129:http://globetrotter.berkeley.edu/people/Joffe/joffe-con0.html

9143:http://globetrotter.berkeley.edu/people/Kreisler/kreisler-con0.html

6013:http://globetrotter.berkeley.edu/people/Jacobs/jacobs-con0.html

9178:http://globetrotter.berkeley.edu/people/Tarnoff/tarnoff-con0.html

7782:http://globetrotter.berkeley.edu/people/Karekezi/karekezi-con.e0.html

4946:http://globetrotter.berkeley.edu/conversations/Patten/patten99-con0.html

4944:http://globetrotter.berkeley.edu/conversations/Podhoretz/podhoretz-con0.html

8042:http://globetrotter.berkeley.edu/people/Danner/danner-con1.00.html

9169:http://globetrotter.berkeley.edu/conversations/BeilinHusseini/

8038:http://globetrotter.berkeley.edu/people/Berdahl/berdahl-con0.html

7062:http://globetrotter.berkeley.edu/people/Ellsberg/ellsberg98-0.html

7134:http://globetrotter.berkeley.edu/Peress/peress-con0.html

7900:http://globetrotter.berkeley.edu/conversations/Patten/patten0.html

9146:http://globetrotter.berkeley.edu/conversations/Zumwalt/zumwalt-con0.html

11290:

9148:http://globetrotter.berkeley.edu/conversations/Atherton/atherton-con0.html

9164:http://globetrotter.berkeley.edu/conversations/Fraser/fraser-con0.html

9162:http://globetrotter.berkeley.edu/conversations/Carrington/carrington-con0.html

9150:http://globetrotter.berkeley.edu/conversations/Warnke/warnke-con0.html

9165:http://globetrotter.berkeley.edu/conversations/Pauling/pauling-con0.html

9144:http://globetrotter.berkeley.edu/conversations/Habib/habib0.html
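The extracted pairs can be checked before crawling: one transcript URL under several showIDs matches the "1 transcript - 2 videos" case, and a showID with no URL has nothing to crawl. This is a minimal sketch, assuming the pairs arrive as text lines in the `showID:URL` form printed above; `parse_pairs` and `summarize` are hypothetical helpers.

```python
from collections import defaultdict


def parse_pairs(lines: list[str]) -> list[tuple[int, str]]:
    """Keep only 'showID:URL' lines; the URL part may be empty."""
    pairs = []
    for line in lines:
        show_id, sep, url = line.strip().partition(":")
        if sep and show_id.isdigit():  # skips log lines such as "Retrieve webpage: ..."
            pairs.append((int(show_id), url.strip()))
    return pairs


def summarize(pairs: list[tuple[int, str]]) -> tuple[dict[str, list[int]], list[int]]:
    """Return (URLs shared by several showIDs, showIDs with no URL)."""
    by_url: dict[str, list[int]] = defaultdict(list)
    missing: list[int] = []
    for show_id, url in pairs:
        if url:
            by_url[url].append(show_id)
        else:
            missing.append(show_id)
    shared = {url: ids for url, ids in by_url.items() if len(ids) > 1}
    return shared, missing
```

On the chron2.html output, this groups 7128 and 7129 under the Joffe URL and reports 11290 as having no transcript URL.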



2. Resulting Blended & Filtered Transcript

e.g. James Fallows (showID=11291)



3. List of Archival “Blended” .html transcripts

