
Comparing Audio Signals


imaran

Presentation Transcript


  1. Comparing Audio Signals
     A general purpose method to compare audio signals will contribute to ACORNS in a variety of ways. These are:
     • Question and answer. An indigenous speaker asks a question and records a list of possible answers. The language learner speaks the answer and gets feedback.
     • Pronunciation program. The student repeats what they hear and ACORNS indicates whether they repeated the phrase or sentence correctly.
     • Hear and respond lesson. The student can either speak or type the blanks in the lesson.
     • Interface to computer games through speech rather than point and click.

  2. Experiments to understand PCM audio
     These projects will provide the basis for more sophisticated speech recognition applications. Experiment examples include:
     • Implement general purpose solutions to n linear equations in n unknowns based on the LPC autocorrelation, covariance, and periodic audio assumptions. What are the practical advantages or disadvantages of each?
     • Play back the audio without the residue. Is it understandable? What happens if we iterate the LPC algorithm on the residue?
     • Is it possible to uncover the zeroes in the residue signal? Perhaps we can use neural net or HMM algorithms. Are there other possibilities?
     • Are there other experiments that we can use to learn more from the audio signal?
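As a starting point for the first two experiments, the autocorrelation method can be sketched as follows. This is only an illustration, not ACORNS code: it is written in Python for brevity (ACORNS itself is Java), all function names are hypothetical, and it assumes the Levinson-Durbin recursion as the solver, which is one common way to solve the n equations that the autocorrelation assumption produces. The covariance and periodic variants would need a different setup.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of signal[n] * signal[n + k]."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation values.

    Returns (coeffs, error), where coeffs[j - 1] is a_j in the predictor
    s_hat[n] = a_1 * s[n - 1] + ... + a_p * s[n - p]
    and error is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0.0:             # signal is already perfectly predicted
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error


def lpc_residue(signal, coeffs):
    """Return signal[n] minus its linear prediction from earlier samples."""
    out = []
    for n, sample in enumerate(signal):
        predicted = 0.0
        for j, c in enumerate(coeffs, start=1):
            if n - j >= 0:
                predicted += c * signal[n - j]
        out.append(sample - predicted)
    return out


# Example: a decaying exponential is predicted almost perfectly by order 1.
demo_signal = [0.9 ** n for n in range(200)]
demo_coeffs, demo_error = levinson_durbin(autocorrelation(demo_signal, 1), 1)
demo_residue = lpc_residue(demo_signal, demo_coeffs)
```

Iterating the LPC algorithm on the residue, as the second experiment asks, amounts to calling the same three functions again with the residue as the new input signal.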

  3. Sound Editor Enhancements
     The following enhancements are useful contributions:
     • Speech enhancement
       • Filter background noise
       • Remove clicks
       • Normalize loudness
         • Currently, the program will not allow any distortion at all. Relaxing this requirement can make this feature more useful.
     • Implement wide and narrow band spectrographs
     • Implement band pass filters other than the windowed sinc (Chebyshev, elliptic, etc.) and analyze their characteristics
     • Implement additional algorithms
       • Find exact pitch points
       • Implement PSOLA to increase speed and pitch
       • Implement LPC and visualize it with the original signal
     • Any other features that we may decide to be worthwhile
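For comparison with the other filter families, a windowed-sinc filter can be sketched in a few lines. This is a hedged illustration rather than the Sound Editor's implementation: it is written in Python (ACORNS is Java), the function names are hypothetical, and it assumes a low-pass design with a Hamming window. A band-pass version can be built by combining two such kernels.

```python
import math


def windowed_sinc_lowpass(cutoff, num_taps):
    """Build a low-pass FIR kernel.

    cutoff is a fraction of the sampling rate (0 < cutoff < 0.5).
    num_taps must be odd and at least 3 so the kernel has a center tap.
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    if not 0.0 < cutoff < 0.5:
        raise ValueError("cutoff must be between 0 and 0.5")
    mid = (num_taps - 1) // 2
    kernel = []
    for i in range(num_taps):
        x = i - mid
        if x == 0:
            h = 2.0 * cutoff                      # limit of the sinc at 0
        else:
            h = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window tapers the truncated sinc to reduce ripple.
        w = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (num_taps - 1))
        kernel.append(h * w)
    total = sum(kernel)
    return [k / total for k in kernel]            # unity gain at 0 Hz


def apply_fir(signal, kernel):
    """Convolve signal with kernel; output has the same length (zero padded)."""
    mid = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = n + mid - j
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out


# Example: a 51-tap filter that passes roughly the lowest fifth of the band.
demo_kernel = windowed_sinc_lowpass(0.1, 51)
```

More taps give a sharper transition at the cost of more computation per sample, which is one of the characteristics worth comparing against Chebyshev and elliptic designs.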

  4. Mobile Technology Research
     A new standard for Internet apps: HTML5, CSS3, and JavaScript.
     • Advantages
       • HTML5 has extensive drawing capabilities.
       • CSS3 has extended styling capabilities for presentation.
       • HTML5, CSS3, and JavaScript have the potential to be a platform-independent combination, usable on desktops and mobile devices.
     • Disadvantages
       • HTML5 learning curve
       • ACORNS lessons do not make use of this new technology
       • HTML5 is not yet supported by most browsers
     • Possible Projects
       • Research the capabilities of HTML5, CSS3, and JavaScript
       • Convert one or more ACORNS lessons to use this technology

  5. Visualizing the vocal tract
     This application will have ramifications beyond the ACORNS project. For example, it can help the hearing impaired learn to speak without an accent.
     • The most platform-independent approach would be to use HTML5 with CSS3 and JavaScript. This opens the possibility that this application can become a mobile-device app.
     • Determine the audio features needed to accurately visualize the vocal tract
     • Study how to extract the needed features from the audio signal
     • Implement an initial visualization program using features described in step 2

  6. Additional ACORNS Lesson Types
     There are three possible lesson types that I’ve considered:
     • An indigenous speaker asks questions and records possible answers. The student answers the question. The program determines if the answer is correct. We can use the magnet game setup and modify it as a new lesson category.
     • Adding captions to a video clip and playing the video for the student while displaying the captions
     • A Phraselator app that can run on mobile devices
     Additional lesson types are possible. I have a manual created by inter-tribal language teachers that contains lots of information. This could help spawn ideas.

  7. ACORNS on the Web
     • Web-based audio recording is considered to be a security breach
     • Problems: iPhones and iPads don’t support Java; Blackberries support the antiquated Java ME; Droids support their own flavor of Java
     • Solutions
       • Sign the Web-page applets. Unfortunately, typical users of ACORNS don’t have the expertise to do this.
       • Programmatically sign the applet when the applet is created. This doesn’t help on platforms that don’t support Java applets.
       • Provide short platform-specific callable modules that do nothing but record audio and return the PCM data. In the short term, the applets can call these modules; in the long term, convert ACORNS Web-based lessons to HTML5, CSS3, and JavaScript.

  8. Computer Games
     Is it possible to create an interface where language teachers with minimal technical experience can create platform-independent language-based computer games?
     • This project needs lots of research before even considering actual implementation
     • There is an open source program (JMonkey, written in Java) that can offer ideas. Are there other open source projects?
     • Has anyone else addressed this type of application? What is the current state of the research?
     • Requirements: language independent; able to execute on the Web and on mobile devices

  9. WOLF and ELK Projects
     WOLF: Dictionary creation program for creating linguistic dictionaries. The following are possible enhancements:
     • Import/Export facilities from/to other dictionary formats. Examples are csv (Excel), mdf (Toolbox), and LIFT (WeSay)
     • Print capabilities that are template driven
     ELK: A program that can create indigenous ttf fonts and keyboard mappings (.keylayout files):
     • Intercept key strokes on Linux systems and convert them based on XML-based .keylayout files

  10. Goals in Lab Today
      • Install the ACORNS SoundEditor
      • Create an icon button
      • Create a listener program to respond to the button
      • Install the plugin software for creating ACORNS lessons
      • Review plugin documentation

  11. Sound Editor Software
      • SoundEditor.jar (We will not need to modify this jar file)
        • A wrapper that includes a main method and links to classes in Tools.jar
        • This is because the Sound Editor is fully integrated into ACORNS
      • AcornsTritonus.jar (We will not need to modify this jar file)
        • Tritonus and other open source projects implement Java Sound System extensions to support additional audio formats
        • AcornsTritonus.jar is a stripped down version of those packages
        • The Sound Editor should be able to run without this jar file
      • Tools.jar
        • Contains most of the Sound Editor logic
        • More details are provided in the next slide
      • ElkKeyboards.jar (Modifications needed only for providing a Linux interface)
        • A program to handle indigenous keyboard layouts

  12. Tools.jar – Brief Overview
      • org.acorns.data
        • SoundData: audio data with interfaces to record, play back, etc.
        • MovieData: holds movie clips and interfaces to supporting classes
        • SoundUndoRedo: undo and redo audio operations
      • org.acorns.audio
        • SoundIO: convert audio formats
        • TimeDomain: convert PCM to/from an array of values
        • SoundDefaults: audio parameters
        • Recorder, PlayBack, AudioRead: classes to record, play back, etc.
      • org.acorns.editor (guts of audio manipulation)
        • SoundPanel: place to add new icon buttons
        • SoundListener or WaveListener: listener code that responds to buttons
        • SignalAnalysis: contains audio algorithms
        • SoundEditor: time domain audio manipulations
        • SoundDisplayPanel: visualize the audio signals
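To make the PCM-to-array conversion concrete, here is a small sketch of what such a step involves. It is not the TimeDomain class and does not use any ACORNS API: it is a Python illustration with hypothetical function names, and it assumes 16-bit signed, little-endian, mono PCM, which may differ from the formats ACORNS actually handles.

```python
import struct


def pcm16_to_floats(pcm_bytes):
    """Decode 16-bit signed little-endian PCM into floats in [-1.0, 1.0)."""
    if len(pcm_bytes) % 2 != 0:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    count = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % count, pcm_bytes)
    return [s / 32768.0 for s in samples]


def floats_to_pcm16(values):
    """Encode floats back to 16-bit PCM, clipping anything out of range."""
    clipped = [max(-32768, min(32767, int(round(v * 32768.0))))
               for v in values]
    return struct.pack("<%dh" % len(clipped), *clipped)


# Example: three samples (zero, largest positive, most negative).
demo_values = pcm16_to_floats(b"\x00\x00\xff\x7f\x00\x80")
```

Working on an array of values in a fixed range keeps the audio algorithms independent of the sample size and byte order of the recorded data.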
