
FLAVIUS Meeting – WP4

FLAVIUS Meeting – WP4. June 8, 2010. Giurgiu Bogdan, Wong William. Agenda: LW contributions; keys to successful integration; complete integration picture; Translation REST API; Trustscore™ and Reporting; REST API Version 2; customization through dictionaries; customization through training.


Presentation Transcript


  1. FLAVIUS Meeting – WP4 June 8, 2010 Giurgiu Bogdan Wong William

  2. Agenda • LW contributions • Keys to successful integration • Complete integration picture • Translation REST API • Trustscore™ and Reporting • REST API Version 2 • Customization through dictionaries • Customization through training • FLAVIUS Language Weaver Roadmap • Questions & Answers

  3. Language Weaver’s Contribution

  4. Keys to a Successful Partner Integration • Ability to integrate with Language Weaver Machine Translation for development and testing • Ability to customize baseline engines with dictionaries • Ability to customize baseline engines by training domain- or customer-specific vertical systems

  5. Complete Picture

  6. Sample UI for the Translation Engine

  7. Translation REST API • Simple HTTP-based communication protocol • Leverages standard HTTP calls – POST, GET, DELETE • Web 2.0 style, as used by Amazon, Twitter, etc. • Supported text formats: TXT, HTML, TMX, XLIFF • Data is encrypted using SSL (via HTTPS) • Authentication uses a custom HTTP scheme • Two additional headers are added to every request: • LW_Date – contains a date/time string based on the request time • Authorization – contains three strings separated by colons: “LWA:<userid>:<signature>” • The unique signature is generated using a keyed HMAC (Hash-based Message Authentication Code) with a SHA-1 (Secure Hash Algorithm 1) digest

  8. Translation REST API • User: /v1/user + HTTP POST • Blocking translations: /v1/translation/src.tgt/lpid=<id> + HTTP POST • Non-blocking translations: /v1/translation/src.tgt/lpid=<id> + HTTP POST, then /v1/translation/src.tgt/lpid=<id>/<jobid> + HTTP GET/DELETE • Language pair: /v1/lpinfo + HTTP GET

  9. Translation REST API • Blocking Translation Request • HTTP POST to https://lwaccess.languageweaver.com/v1/translation/[src].[tgt]/lpid=[lpid]/[optional-params]/ • Appropriate for small chunks of data (less than 640 bytes) • Mandatory input parameters: • [src] – three-letter code for the source language (e.g. “eng” for English) • [tgt] – three-letter code for the target language • [lpid] – integer denoting the specific language pair system to be used • “source_text=” – [string] – URL-escaped version of the input source (POST data) • Optional input parameters: • input_format=[value] – string declaring the input format; choose from “html”, “plain”, “xliff” • input_encoding=[value] – string declaring the input encoding; only “utf8” is supported • Sample calls: • Create Blocking Translation Job for Text, Get Language Pair details
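The request rules above can be sketched in C#, the language of the deck's own samples. This is a minimal sketch under stated assumptions. BuildBlockingRequest is a hypothetical helper, not part of any Language Weaver SDK. Optional parameters are appended as name=value/ path segments, as the slide 11 sample does for lpid and input_format. The 640-byte guideline is checked on the raw UTF-8 text before escaping, which the slide does not specify.

```csharp
using System;
using System.Text;

// Hypothetical helper: builds the path and form body for a blocking
// translation request following the URL pattern on the slide.
(string Path, string Body) BuildBlockingRequest(
    string src, string tgt, int lpid, string sourceText,
    params (string Name, string Value)[] optional)
{
    // Assumption: the "less than 640 bytes" guideline applies to the raw
    // UTF-8 source text, before URL escaping.
    if (Encoding.UTF8.GetByteCount(sourceText) >= 640)
        throw new ArgumentException("Text too large for a blocking request; use the non-blocking call.");

    // Mandatory parts: [src].[tgt] and lpid; optional ones appended as name=value/.
    var path = new StringBuilder($"/v1/translation/{src}.{tgt}/lpid={lpid}/");
    foreach (var (name, value) in optional)
        path.Append(name).Append('=').Append(value).Append('/');

    // The slide requires source_text to be URL escaped in the POST data.
    string body = "source_text=" + Uri.EscapeDataString(sourceText);
    return (path.ToString(), body);
}

var (p, b) = BuildBlockingRequest("eng", "fra", 74, "Hello World", ("input_format", "plain"));
Console.WriteLine(p); // /v1/translation/eng.fra/lpid=74/input_format=plain/
Console.WriteLine(b); // source_text=Hello%20World
```

Escaping the text matters because characters such as & or = would otherwise be read as form-field separators in the application/x-www-form-urlencoded body.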

  10. Translation REST API • Non-Blocking Translation Request • HTTP POST to https://lwaccess.languageweaver.com/v1/translation-async/[src].[tgt]/lpid=[lpid]/[optional-params]/ • Appropriate for large files • Mandatory/optional input parameters are similar to those of the blocking translation • Sample calls: • Create Non-Blocking Translation Job for Text/URL/File • Get Language Pair details, Get User Info • Followed by HTTP GETs to https://api.languageweaver.com/v1/translation-async/[src].[tgt]/[jobID]/lpid=[lpid]/[optional-params]/ • [jobID] – integer denoting the specific translation submitted with the POST • Sample calls: • GET Non-Blocking Translation Job for Text/URL/File
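The two-step non-blocking flow (POST to submit, then GET with the job ID) can be sketched as two URL builders. The helper names are hypothetical. Both host names are copied from the slide as written, including the fact that the POST and GET use different hosts. jobId is taken as a string because the sample response on slide 15 shows a retrieval_url whose job segment is the job ID followed by a dot and the translation signature.

```csharp
using System;

// Hosts copied from the slide: POST goes to lwaccess, the follow-up GET to api.
const string PostHost = "https://lwaccess.languageweaver.com";
const string GetHost = "https://api.languageweaver.com";

// Step 1 (hypothetical helper): URL for submitting the job with HTTP POST.
string AsyncSubmitUrl(string src, string tgt, int lpid) =>
    $"{PostHost}/v1/translation-async/{src}.{tgt}/lpid={lpid}/";

// Step 2 (hypothetical helper): URL for polling the result with HTTP GET.
string AsyncRetrieveUrl(string src, string tgt, int lpid, string jobId) =>
    $"{GetHost}/v1/translation-async/{src}.{tgt}/{jobId}/lpid={lpid}/";

Console.WriteLine(AsyncSubmitUrl("eng", "fra", 74));
Console.WriteLine(AsyncRetrieveUrl("eng", "fra", 74, "90079"));
```

In practice, polling the retrieval_url returned by the POST may be safer than rebuilding the GET URL by hand, since the sample response's URL differs in shape from the pattern on this slide.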

  11. Translation REST API • Sample code – C# Example

    // Step 1: Construct the path. Check to see if the LPID and/or input_format is submitted
    string szPath = "/v1/translation/" + szSrcLang + "." + szTgtLang + "/";
    if (0 != szLPID.Length)
        szPath = szPath + "lpid=" + szLPID + "/";
    if (0 != szInputFormat.Length)
        szPath = szPath + "input_format=" + szInputFormat + "/";

    // Step 2: Construct the URL
    string szURI = m_szHostName + szPath;
    System.Console.WriteLine(szURI);

    // Step 3: Prepare the POST request
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(szURI);
    PrepareHttpRequestHeader("POST", szPath, ref request);

  12. Translation REST API

    // Step 4: Attach the POST data (source_text must be URL escaped)
    string szPostData = "source_text=" + Uri.EscapeDataString(szSourceText);
    byte[] postDataBytes = Encoding.UTF8.GetBytes(szPostData);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = postDataBytes.Length;
    Stream requestStream = request.GetRequestStream();
    requestStream.Write(postDataBytes, 0, postDataBytes.Length);
    requestStream.Close();

    // Step 5: Read the response
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader responseReader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    string szResponse = responseReader.ReadToEnd();
    responseReader.Close();
    response.Close();

    // Step 6: Parse the XML document for the translated text
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(szResponse);
    System.Console.WriteLine(szResponse);
    XmlNodeList nodeList = xmlDoc.GetElementsByTagName("translated_text");
    szTargetText = nodeList[0].InnerText.Trim();

  13. Translation REST API – Header Generation • Sample code – C# Example • Generate Header

    // Step 1: Get the current HTTP date
    string szHttpDate = GetHttpDate();

    // Step 2: Generate the signature
    szRequestType = szRequestType.ToUpper();
    string szSignature = GenerateSignature(szRequestType, szHttpDate, szURI);

    // Step 3: Add the two new headers to the request object
    request.Headers.Add("LW_Date", szHttpDate);
    request.Headers.Add("Authorization", "LWA:" + m_szUserID + ":" + szSignature);
    System.Console.WriteLine(szSignature);

  14. Translation REST API – Header Generation • Generate Signature

    Encoding u8Encoding = new UTF8Encoding();
    HMACSHA1 hmacsha1 = new HMACSHA1(u8Encoding.GetBytes(m_szAPIKey));
    string szMessage = szRequestType.Trim() + "\n" + szHttpDate.Trim() + "\n" + szURI.Trim();
    string szSignature = Convert.ToBase64String(hmacsha1.ComputeHash(u8Encoding.GetBytes(szMessage.ToCharArray())));
    return szSignature;
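A self-contained version of the header and signature steps from slides 13 and 14 might look like the following. GetHttpDate is not shown on the slides, so formatting the UTC time as RFC 1123 (the "r" format specifier) is an assumption. The API key and user ID are placeholders. The helpers take the key and path as parameters instead of reading instance fields such as m_szAPIKey.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Assumption: RFC 1123 formatting of the UTC time as the LW_Date value.
string GetHttpDate(DateTime utc) => utc.ToString("r", CultureInfo.InvariantCulture);

// Message layout from slide 14: METHOD \n DATE \n PATH, each trimmed
// (method upper-cased as on slide 13), signed with HMAC-SHA1 keyed by the
// API key, then Base64-encoded.
string GenerateSignature(string apiKey, string requestType, string httpDate, string uri)
{
    string message = requestType.Trim().ToUpperInvariant() + "\n" + httpDate.Trim() + "\n" + uri.Trim();
    using var hmac = new HMACSHA1(Encoding.UTF8.GetBytes(apiKey));
    return Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(message)));
}

// Authorization value: "LWA:<userid>:<signature>".
string AuthorizationHeader(string userId, string signature) => $"LWA:{userId}:{signature}";

// Placeholder credentials for illustration only.
string date = GetHttpDate(DateTime.UtcNow);
string sig = GenerateSignature("my-api-key", "post", date, "/v1/translation/eng.fra/lpid=74/");
Console.WriteLine("LW_Date: " + date);
Console.WriteLine("Authorization: " + AuthorizationHeader("my-user-id", sig));
```

Since the server presumably recomputes the same HMAC over the same three parts, the date sent in LW_Date must be exactly the string that was signed.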

  15. Translation REST API • Sample request – response for Create Non-Blocking Translation Job for Text, e.g. HTTP POST request to https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/lpid=74/:

    <?xml version='1.0' encoding='UTF-8'?>
    <lwresponse>
      <service_version>v1</service_version>
      <requested_url>/v1/translation-async/eng.fra/lpid=74/</requested_url>
      <request_type>POST</request_type>
      <request_time>Wed Mar 3 14:55:51 2010</request_time>
      <source_language>eng</source_language>
      <target_language>fra</target_language>
      <response_data type='translation-async_post'>
        <retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>
        <job_id>90079</job_id>
        <translation_signature>3bccc5e58d50ce7dcaf950f562ec2303</translation_signature>
        <src>eng</src>
        <tgt>fra</tgt>
        <lpid>74</lpid>
        <input_format>text/plain</input_format>
        <input_encoding></input_encoding>
        <dictionary></dictionary>
        <customizer></customizer>
        <source_text><![CDATA[Hello World]]></source_text>
        <server>
          <version>5.1.2 release ENGFRAU20_5.1.x.0</version>
        </server>
      </response_data>
    </lwresponse>
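A client needs only a few fields from this response to poll for the finished translation. The sketch below pulls them out with XmlDocument, in the same style as Step 6 of the slide 12 sample. ParseAsyncResponse is a hypothetical helper, and the element names are taken from the sample response on this slide.

```csharp
using System;
using System.Xml;

// Hypothetical helper: extracts polling fields from a translation-async POST response.
(string JobId, string Signature, string RetrievalUrl) ParseAsyncResponse(string xml)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    // Trimmed text of the first element with this tag, or "" if the tag is absent.
    string Text(string tag) => doc.GetElementsByTagName(tag).Item(0)?.InnerText.Trim() ?? "";
    return (Text("job_id"), Text("translation_signature"), Text("retrieval_url"));
}

// Abridged copy of the sample response from slide 15.
string sample =
    "<?xml version='1.0' encoding='UTF-8'?><lwresponse><response_data type='translation-async_post'>" +
    "<retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>" +
    "<job_id>90079</job_id><translation_signature>3bccc5e58d50ce7dcaf950f562ec2303</translation_signature>" +
    "</response_data></lwresponse>";

var job = ParseAsyncResponse(sample);
Console.WriteLine(job.JobId); // 90079
Console.WriteLine(job.RetrievalUrl);
```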

  16. Trustscore™ and Reporting • Internal LW milestone • Migration to version 2 of the REST API • Reporting: • Words per minute • Number of documents translated • Average document length • Details about the TrustScore™ • Other metrics to be defined • Trustscore™: • Scored from 1 to 5 • Document-level scoring • Segment-level scoring not supported

  17. REST API Version 2 • New format • Sample of Create Non-Blocking Translation Job for Text: • https://api.languageweaver.com/v2/language-pair/[lpid]/translation-async/[optional-params]/ • Mandatory and optional parameters are the same as in v1 • Additional calls/functionality related to: • Trustscore™ • Reporting • Dictionary
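In the v2 pattern the language pair ID moves into the path and the [src].[tgt] segment disappears. The sketch below builds that URL. V2AsyncUrl is a hypothetical helper. Because the slide says parameters are the same as in v1, optional parameters are assumed to keep the v1 name=value/ form.

```csharp
using System;

// Hypothetical helper for the v2 non-blocking URL shown on the slide.
string V2AsyncUrl(int lpid, params (string Name, string Value)[] optional)
{
    string url = $"https://api.languageweaver.com/v2/language-pair/{lpid}/translation-async/";
    // Assumption: optional parameters are appended exactly as in v1.
    foreach (var (name, value) in optional)
        url += $"{name}={value}/";
    return url;
}

Console.WriteLine(V2AsyncUrl(74));
Console.WriteLine(V2AsyncUrl(74, ("input_format", "html")));
```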

  18. Customization through Dictionaries • Structure: • One entry per term, one translation per entry • Search-and-replace mechanism that applies unconditionally • Size: • Up to 300,000 entries • Best practice to build one: • Using CSV files • Limitations: • No limitations on the content • Recommended use of dictionaries is phrase replacement rather than word replacement • Gender is not automatically generated • UTF-8 • Impact on performance: • No significant impact
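The dictionary behaviour described above can be illustrated with the sketch below. It covers one term and one translation per entry, unconditional search and replace, and loading from CSV. It is not Language Weaver's implementation. The plain "term,translation" CSV layout, the longest-match-first rule, and the helper names LoadCsv and Apply are assumptions for illustration. Because replacement is unconditional, a one-word entry such as "cloud" would also fire inside "cloudy", which is one plausible reason for the slide's advice to prefer phrase entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Assumption: each line is "term,translation" with no quoted fields.
// Longer terms come first so phrase entries win over single words.
List<(string Term, string Translation)> LoadCsv(IEnumerable<string> lines) =>
    lines.Where(line => !string.IsNullOrWhiteSpace(line))
         .Select(line => line.Split(new[] { ',' }, 2))
         .Where(parts => parts.Length == 2)
         .Select(parts => (Term: parts[0].Trim(), Translation: parts[1].Trim()))
         .OrderByDescending(entry => entry.Term.Length)
         .ToList();

// Unconditional search and replace in one left-to-right scan, so text that
// has already been replaced is never matched again.
string Apply(string text, List<(string Term, string Translation)> dict)
{
    var sb = new StringBuilder();
    int i = 0;
    while (i < text.Length)
    {
        int hit = dict.FindIndex(e => e.Term.Length > 0
            && i + e.Term.Length <= text.Length
            && string.CompareOrdinal(text, i, e.Term, 0, e.Term.Length) == 0);
        if (hit >= 0)
        {
            sb.Append(dict[hit].Translation);
            i += dict[hit].Term.Length;
        }
        else
        {
            sb.Append(text[i]);
            i++;
        }
    }
    return sb.ToString();
}

var dictionary = LoadCsv(new[] { "cloud,nuage", "cloud computing,informatique en nuage" });
Console.WriteLine(Apply("cloud computing and cloud storage", dictionary));
// informatique en nuage and nuage storage
```

The output also shows the slide's gender caveat: each entry has exactly one fixed translation, so nothing adapts surrounding words to the replacement.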

  19. Customization through Training • Diagram: Parallel Aligned Text (plus Optional: Regression Text and Optional: Test Text) → LW Training on Compute Cloud → Evaluation → Product Delivery via TOD • Data: • Fix noisy text • More text • Text alignment • Text segmentation

  20. Customization through Training • Structure: • Train on any language pair specified in the FLAVIUS agreement • Inputs: TMX parallel segments, optional regression text files, optional test sets for evaluation • Outputs: • Trained engine • BLEU score results on the test set • Translated output of the regression text files • Metrics from the input training corpus • Evaluate the customized engine via TOD deployment
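Training input arrives as TMX parallel segments. In TMX 1.4 each <tu> unit holds one <tuv> per language, with the language in xml:lang and the text in <seg>. The sketch below shows how such pairs could be collected before submission. ReadTmxPairs is a hypothetical helper. It reads only xml:lang, not the older lang attribute, and matches languages by prefix. Any further TMX conventions the LW training pipeline expects are not stated on the slide.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Namespace bound to the reserved xml: prefix (as in xml:lang).
const string XmlNs = "http://www.w3.org/XML/1998/namespace";

// Collects (source, target) pairs from <tu> units, matching each <tuv> by the
// prefix of its xml:lang value (e.g. "en" matches "en-US"). Units missing
// either language, or with an empty segment, are skipped.
List<(string Source, string Target)> ReadTmxPairs(string tmx, string srcLang, string tgtLang)
{
    var doc = new XmlDocument();
    doc.LoadXml(tmx);
    var pairs = new List<(string Source, string Target)>();
    foreach (XmlElement tu in doc.GetElementsByTagName("tu"))
    {
        string src = "", tgt = "";
        foreach (XmlElement tuv in tu.GetElementsByTagName("tuv"))
        {
            string lang = tuv.GetAttribute("lang", XmlNs);
            string seg = tuv.GetElementsByTagName("seg").Item(0)?.InnerText ?? "";
            if (lang.StartsWith(srcLang, StringComparison.OrdinalIgnoreCase)) src = seg;
            else if (lang.StartsWith(tgtLang, StringComparison.OrdinalIgnoreCase)) tgt = seg;
        }
        if (src.Length > 0 && tgt.Length > 0) pairs.Add((src, tgt));
    }
    return pairs;
}

string sampleTmx =
    "<tmx version='1.4'><body>" +
    "<tu><tuv xml:lang='en-US'><seg>Hello World</seg></tuv>" +
    "<tuv xml:lang='fr-FR'><seg>Bonjour le monde</seg></tuv></tu>" +
    "<tu><tuv xml:lang='en-US'><seg>Orphan segment</seg></tuv></tu>" +
    "</body></tmx>";

foreach (var (source, target) in ReadTmxPairs(sampleTmx, "en", "fr"))
    Console.WriteLine($"{source} => {target}");
```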

  21. FLAVIUS Language Weaver Roadmap

  22. Questions & Answers

  23. Thank you! Accelerating the way the world communicates
