Implementing DFS Search Services

Implementing DFS Search Services. Pierre-Yves “Pitch” Chevalier on behalf of Marc Brette. DFS 6.5 Search and Classification Services. DFS: Service-oriented and platform-agnostic Search service in DFS since 6.0: Federated Search on Documentum repositories and external repositories



Presentation Transcript


  1. Implementing DFS Search Services Pierre-Yves “Pitch” Chevalier on behalf of Marc Brette

  2. DFS 6.5 Search and Classification Services • DFS: Service-oriented and platform-agnostic • Search service in DFS since 6.0: • Federated Search on Documentum repositories and external repositories • 6.5: New search and content intelligence features: • Nonblocking search • Clustering of search results • Saved searches • Classification service • A platform to build a wide range of search applications, from mobile search to advanced discovery interfaces • This presentation puts the services into practice by progressively building an example application.

  3. Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting

  4. DFS 6.5 Search and Classification Services Search service • Simple search • Federated search • Nonblocking search • Advanced queries

  5. Search Service • SearchService: • execute • Executes a query and returns results • getRepositoryList • Returns the list of available sources (managed and unmanaged repositories) • A query can be structured or passthrough (straight DQL) • Results contain the query status and a DataPackage (list of DataObject) • Stateless: relies on a caching mechanism

  6. Services Architecture (diagram) • Components shown: Consumers; WSDL-based Proxies; DFS Runtime (JAX-WS / JAXB); Search Service; Query Store Service; Analytics Service; DFC; Content Server; ECI Server; CI Server • Arrows show control flow

  7. Demo: A Simple Search Application

  8. Example: A Simple Search Application • A simple example that performs a search on one repository and displays results • Architecture of the example: • User interface in AJAX • Java servlets call DFS and format results in JSON for the UI • Remote call to DFS but could also be local calls

  9. Example: Execute Query • Set up context: RepositoryIdentity identity = new RepositoryIdentity("MSSQL60ECI4", "userdev1", "userdev1", ""); ContextFactory contextFactory = ContextFactory.getInstance(); IServiceContext context = contextFactory.newContext(); context.addIdentity(identity); ISearchService searchService = ServiceFactory.getInstance().getRemoteService(ISearchService.class, context, "search", "http://127.0.0.1:8080/services"); • Build and execute query: StructuredQuery q = new StructuredQuery(); q.addRepository("MSSQL60ECI4"); q.setObjectType("dm_document"); ExpressionSet expressionSet = new ExpressionSet(); expressionSet.addExpression(new FullTextExpression(searchQuery)); q.setRootExpressionSet(expressionSet); QueryExecution queryExec = new QueryExecution(0, 100, 100); QueryResult queryResult = searchService.execute(q, queryExec, null);

  10. Example: Wrap the Query in a Servlet Get parameter public class SearchServlet extends HttpServlet { protected void doPost(HttpServletRequest httpServletRequest, HttpServletResponse httpServletResponse) throws ServletException, IOException { String searchQuery = httpServletRequest.getParameter("queryTerms"); //…

  11. Example: Format Response as JSON • JSON: a JavaScript-friendly structure • Easy to represent lists and name/value pairs

  12. Example: Format Response as JSON public void writeJSON(PrintWriter writer, QueryResult response) { writer.append("["); for (Iterator it = response.getDataObjects().iterator(); it.hasNext();) { DataObject dataObject = (DataObject) it.next(); writer.append("{"); PropertySet set = dataObject.getProperties(); Iterator<Property> iterator = set.iterator(); while (iterator.hasNext()) { Property prop = iterator.next(); String strName = prop.getName(); String value = prop.getValueAsString(); writer.append("\"").append(strName).append("\":\"").append(value).append("\""); if (iterator.hasNext()) writer.append(","); } writer.append("}\n"); if (it.hasNext()) writer.append(","); } writer.append("]"); }
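
The writeJSON method above appends property names and values verbatim, so a value containing a double quote, a backslash, or a newline would yield invalid JSON. The following is a minimal sketch of an escaping step, assuming plain Java with no JSON library and non-null values. `JsonEscapeSketch`, `escape`, and `toJsonObject` are hypothetical names introduced for illustration; they are not part of DFS.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of DFS): escapes strings before embedding them in JSON. */
final class JsonEscapeSketch {

    /** Escapes the characters that JSON does not allow unescaped inside a string literal. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Renders name/value pairs as one JSON object, in iteration order. */
    static String toJsonObject(Map<String, String> properties) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Stand-in for the name/value pairs of one DataObject.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("object_name", "Report \"Q1\"");
        System.out.println(toJsonObject(props));
    }
}
```

In the slide's loop, the same idea would amount to passing `strName` and `value` through such an escaping function before appending them.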

  13. Example: HTML Form function updatepage(str){ var rsp = eval("("+str+")"); /* use eval to parse JSON response */ var html = "<table>"; for (var i = 0; i < rsp.length; i++) { var result = rsp[i]; html += "\n<tr><td>" + result.object_name + "</td></tr>"; } html += "</table>"; document.getElementById("result").innerHTML = html; } <!-- … --> <form name="searchForm" onsubmit='xmlhttpPost("/EMCWorldDemo/search",updatepage, getQueryParams()); return false;'> <p>query: <input name="queryTerms" type="text"> <input value="Go" type="submit"></p> <div id="result"></div> </form>

  14. Federated Search • DFS Search Service supports federated search across multiple Documentum repositories and external repositories • Requires ECI option for external repositories • ECI supports a large catalog of adapters to external sources: • CMS (FileNet, SharePoint, IBMCM…) • Websites (Google, Yahoo …) • Databases • Indexers (Verity, Fast, IndexServer…) • Specialized sources (legal, science, regulation, patents, health…) • EMC products (eRoom, EX, AX…) • Support for authentication using the same service as Docbase repositories

  15. Federated Search: Configure ECI To search external repositories: • Install ECIS • Edit dfc.properties in DFS ear: • dfc.search.ecis.enable=true • dfc.search.ecis.host=ecishost

  16. Example: Querying Several Sources • Querying multiple sources: String[] sources = httpServletRequest.getParameterValues("sources"); ContextFactory contextFactory = ContextFactory.getInstance(); IServiceContext context = contextFactory.newContext(); for (String source: sources) { RepositoryIdentity identity = new RepositoryIdentity( source, "userdev1", "userdev1", ""); context.addIdentity(identity); } StructuredQuery q = new StructuredQuery(); for (String source: sources) q.addRepository(source); • Listing available sources: List<Repository> repositories = searchService.getRepositoryList(null); for (Repository repository: repositories) { String sourceName = repository.getName(); String userLogin = repository.getProperties().getUserLoginCapability(); }

  17. Demo: Nonblocking Search

  18. Nonblocking Search • DFS is based on DFC, which supports asynchronous search execution • Allows dynamic display of results • DFS supports it through a nonblocking query call: • Allows multiple successive calls to get new results and the query status • Sequence between DFS Client and DFS Service: execute(query,0,100) returns no results; wait 1 second; execute(query,0,100) returns 10 results; wait 1 second; execute(query,10,100) returns 90 results
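
The call sequence on this slide can be sketched as a client-side polling loop. This is a simplified model, not DFS code: `SearchCall` and `Page` are hypothetical stand-ins for a nonblocking `searchService.execute` call and its result, and the `finished` flag stands in for whatever completion information the query status carries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins (not DFS types) used to sketch the client-side polling loop. */
final class PollingSketch {

    /** One response: the results from the requested index on, plus a completion flag. */
    static final class Page {
        final List<String> results;
        final boolean finished;

        Page(List<String> results, boolean finished) {
            this.results = results;
            this.finished = finished;
        }
    }

    /** Stands in for a nonblocking execute(query, start, max) call. */
    interface SearchCall {
        Page execute(int start, int max);
    }

    /** Polls until the call reports completion or maxPolls is reached. */
    static List<String> pollAll(SearchCall call, int pageSize, int maxPolls, long waitMillis) {
        List<String> collected = new ArrayList<>();
        for (int poll = 0; poll < maxPolls; poll++) {
            // Ask only for results that have not been received yet.
            Page page = call.execute(collected.size(), pageSize);
            collected.addAll(page.results);
            if (page.finished) {
                break;
            }
            try {
                Thread.sleep(waitMillis); // the slide waits 1 second between calls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        // Simulated service: 0 results available, then 10, then all 100.
        int[] available = {0, 10, 100};
        int[] calls = {0};
        SearchCall fake = (start, max) -> {
            int total = available[Math.min(calls[0]++, available.length - 1)];
            List<String> out = new ArrayList<>();
            for (int i = start; i < Math.min(total, start + max); i++) {
                out.add("doc" + i);
            }
            return new Page(out, total == 100);
        };
        System.out.println(pollAll(fake, 100, 10, 1000L).size());
    }
}
```

Starting each call at `collected.size()` mirrors the slide, where the third call starts at index 10 after 10 results have arrived.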

  19. Nonblocking Search: Cache • DFS queries are cached • Each query has a definition and a query ID used as key in the cache • Cache policy is size-based and time-based • Each Search Service call contains the initial query (definition) so that the query may be re-executed in case of cache miss. • Configurable in dfs-runtime.properties: • dfs.query_cache_house_keeper.period = 5
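
To illustrate what a size-based and time-based policy with re-execution on a cache miss can look like, here is a minimal sketch. It is not the DFS implementation: `QueryCacheSketch` is a hypothetical class, it checks expiry lazily on lookup rather than with a periodic housekeeper, and the `reExecute` function stands in for re-running the query definition sent with each call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch (not the DFS implementation) of a size- and time-bounded query cache. */
final class QueryCacheSketch<V> {

    /** A cached value together with the time it was stored. */
    private static final class Cached<V> {
        final V value;
        final long storedAtMillis;

        Cached(V value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final int maxEntries;
    private final long maxAgeMillis;
    private final LinkedHashMap<String, Cached<V>> entries;

    QueryCacheSketch(int maxEntries, long maxAgeMillis) {
        this.maxEntries = maxEntries;
        this.maxAgeMillis = maxAgeMillis;
        // accessOrder=true keeps the least recently used entry first, so it is evicted first.
        this.entries = new LinkedHashMap<String, Cached<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Cached<V>> eldest) {
                return size() > QueryCacheSketch.this.maxEntries;
            }
        };
    }

    /** Returns the cached value, or re-runs the query on a miss or after expiry. */
    synchronized V get(String queryId, long nowMillis, Function<String, V> reExecute) {
        Cached<V> hit = entries.get(queryId);
        if (hit != null && nowMillis - hit.storedAtMillis <= maxAgeMillis) {
            return hit.value;
        }
        V fresh = reExecute.apply(queryId);
        entries.put(queryId, new Cached<>(fresh, nowMillis));
        return fresh;
    }

    synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        QueryCacheSketch<String> cache = new QueryCacheSketch<>(100, 60_000L);
        String result = cache.get("query-1", System.currentTimeMillis(), id -> "results for " + id);
        System.out.println(result);
    }
}
```

Passing the current time as a parameter is a choice made here only to keep the sketch easy to test.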

  20. Nonblocking Search: QueryStatus • QueryStatus contains status of the query for each repository • Example: Two sources, one successful, one failed with network error

  21. Example: Nonblocking Query Execution Set asynchronous call QueryExecution queryExec = new QueryExecution(start, len, 350); queryExec.setQueryId(queryId); SearchProfile profile = new SearchProfile(); profile.setAsyncCall(true); OperationOptions options = new OperationOptions(); options.setSearchProfile(profile); QueryResult queryResult = searchService.execute(q, queryExec, options);

  22. Advanced Queries StructuredQuery: an abstract query • Allows the query to be refined. • Allows the query to be bound to UI controls. • Independent of the full-text indexer and Content Server version. Independent of the presence of an indexer.

  23. Advanced Queries • FullTextExpression • Supports a Boolean ‘mini-language’: phrases, AND, OR, NOT, and parentheses • Example: EMC contract AND (“end of life” OR termination) NOT ECIS • ExpressionSet • Boolean expression between FullTextExpression and PropertyExpression • PropertyExpression • Constraints on document attributes • Operators: EQUAL, NOT_EQUAL, GREATER_THAN, LESS_THAN, GREATER_EQUAL, LESS_EQUAL, BEGINS_WITH, CONTAINS, DOES_NOT_CONTAIN, ENDS_WITH, IN, NOT_IN, BETWEEN, IS_NULL, IS_NOT_NULL • Values: SimpleValue, ValueList, ValueRange, RelativeDateValue

  24. Advanced Queries: Example Example of structured query: • Object_name contains “test”, modified date in the last month and owner_name is “marc” or “ghislain” Advanced query example ExpressionSet expr = new ExpressionSet(); expr.addExpression(new PropertyExpression("object_name", Condition.CONTAINS,"test")); expr.addExpression(new PropertyExpression("r_modify_date", Condition.GREATER_EQUAL, new RelativeDateValue(-1, TimeUnit.MONTH))); ExpressionSet orExpr = new ExpressionSet(ExpressionSetOperator.OR); orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL,"marc")); orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL,"ghislain")); expr.addExpression(orExpr);
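
The nesting in the example above may be easier to see as a tree. The sketch below models it with plain Java classes and prints the combined Boolean expression. `Prop` and `Group` are hypothetical stand-ins for PropertyExpression and ExpressionSet, not DFS types, and the sketch assumes (as the slide's description implies) that the outer set combines its members with AND. The date value is written as a placeholder string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not DFS classes) of how nested expression sets compose. */
final class ExpressionTreeSketch {

    interface Expr {
        String render();
    }

    /** A single constraint on one attribute, such as owner_name EQUAL 'marc'. */
    static final class Prop implements Expr {
        final String name;
        final String condition;
        final String value;

        Prop(String name, String condition, String value) {
            this.name = name;
            this.condition = condition;
            this.value = value;
        }

        public String render() {
            return name + " " + condition + " '" + value + "'";
        }
    }

    /** A group of expressions joined by one Boolean operator (AND or OR). */
    static final class Group implements Expr {
        final String operator;
        final List<Expr> children = new ArrayList<>();

        Group(String operator) {
            this.operator = operator;
        }

        Group add(Expr e) {
            children.add(e);
            return this;
        }

        public String render() {
            List<String> parts = new ArrayList<>();
            for (Expr child : children) {
                parts.add(child.render());
            }
            return "(" + String.join(" " + operator + " ", parts) + ")";
        }
    }

    public static void main(String[] args) {
        Group owners = new Group("OR")
                .add(new Prop("owner_name", "EQUAL", "marc"))
                .add(new Prop("owner_name", "EQUAL", "ghislain"));
        Group root = new Group("AND")
                .add(new Prop("object_name", "CONTAINS", "test"))
                .add(new Prop("r_modify_date", "GREATER_EQUAL", "one month ago"))
                .add(owners);
        System.out.println(root.render());
    }
}
```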

  25. Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting

  26. DFS 6.5 Search and Classification Services Clustering • Simple clustering of search results • Multiple facets and strategies • Getting results • Go beyond search

  27. Clustering • Dynamic grouping of results into ‘clusters’ • Based on results properties (not content) • Uses linguistic rules • Option of Search Service • Requires an SBO to be installed • An installer is provided (Webtop Extended Search) • Supports hierarchical clustering

  28. Clustering • SearchService: • getClusters • Returns the clusters for a query • getSubClusters • Returns the clusters for a subset of a query • getResultsProperties • Returns the properties for a subset of a query • The services are stateless • They reuse the query cached by SearchService.execute and re-execute it if needed. • All the methods take query and query execution parameters in case of a cache miss

  29. Demo: Enhance the Search Application with Clustering

  30. Example: Computing Clusters Get clusters for a query QueryExecution queryExec = new QueryExecution(0, 100, 350); queryExec.setQueryId(queryId); ClusteringProfile profile = new ClusteringProfile(); profile.addClusteringStrategy(new ClusteringStrategy("Topics", Arrays.asList("object_name", "title", "subject", "summary"))); OperationOptions options = new OperationOptions(); options.setClusteringProfile(profile); QueryCluster queryClusters = searchService.getClusters(query, queryExec, options);

  31. Example: Clustering Response Objects • getClusters() response (class diagram) • Classes shown: QueryCluster, ClusterTree, ClusteringStrategy, Cluster, ObjectIdentitySet • Attributes shown: isRefreshable: Boolean; strategyName: String; clusterSize: int; clusterValues: List<String>; isSubClusterTreeAvailable: Boolean • Multiplicities shown: 0..*, 1, 0..*, 0..1

  32. Multiple Facets and Strategies • Several ways to group results together • Defined by a strategy: • Topic • Person names • Dates • Document sizes

  33. Example: Multiple Strategies Set cluster strategy for ‘Topic’ and ‘Date’ ClusteringProfile profile = new ClusteringProfile(); profile.addClusteringStrategy(new ClusteringStrategy("Topics", Arrays.asList("object_name", "title", "subject", "summary"))); ClusteringStrategy dateClusteringStrategy = new ClusteringStrategy("Date", Arrays.asList("r_modify_date")); PropertySet tokenizerPropSet = new PropertySet(new StringProperty("r_modify_date", "quarterdate")); dateClusteringStrategy.setTokenizers(tokenizerPropSet); profile.addClusteringStrategy(dateClusteringStrategy);
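
The slide does not say how the "quarterdate" tokenizer groups values. As an illustration only, and assuming it buckets dates by calendar quarter, the grouping could look like the sketch below. `QuarterBucketSketch` and the "2008-Q2" key format are hypothetical and not taken from DFS.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration (not DFS code) of grouping modification dates by calendar quarter. */
final class QuarterBucketSketch {

    /** Maps a date to a key such as "2008-Q2"; the key format is an assumption. */
    static String quarterKey(LocalDate date) {
        int quarter = (date.getMonthValue() - 1) / 3 + 1;
        return date.getYear() + "-Q" + quarter;
    }

    /** Counts how many dates fall into each quarter, sorted by key. */
    static Map<String, Integer> countByQuarter(List<LocalDate> dates) {
        Map<String, Integer> counts = new TreeMap<>();
        for (LocalDate d : dates) {
            counts.merge(quarterKey(d), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LocalDate> modifyDates = Arrays.asList(
                LocalDate.of(2008, 1, 15),
                LocalDate.of(2008, 5, 20),
                LocalDate.of(2008, 6, 2));
        System.out.println(countByQuarter(modifyDates));
    }
}
```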

  34. Go Beyond Search • Clustering can be used for nonsearch applications • Example: most active subjects in a repository (automatic tag clouds)
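
One way to turn cluster sizes into a tag cloud is to scale each cluster's size linearly to a font size. The sketch below assumes the cluster names and sizes have already been read from a clustering response. `TagCloudSketch` and the point-size range are hypothetical choices, not part of DFS.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not DFS code): maps cluster sizes to font sizes for a tag cloud. */
final class TagCloudSketch {

    /** Scales each cluster size linearly into the range [minPt, maxPt]. */
    static Map<String, Integer> fontSizes(Map<String, Integer> clusterSizes, int minPt, int maxPt) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        if (clusterSizes.isEmpty()) {
            return sizes;
        }
        int lo = Collections.min(clusterSizes.values());
        int hi = Collections.max(clusterSizes.values());
        for (Map.Entry<String, Integer> e : clusterSizes.entrySet()) {
            // When all clusters have the same size, every tag gets the minimum font size.
            int pt = (hi == lo)
                    ? minPt
                    : minPt + (e.getValue() - lo) * (maxPt - minPt) / (hi - lo);
            sizes.put(e.getKey(), pt);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Stand-in data: subject name mapped to the number of documents in its cluster.
        Map<String, Integer> clusters = new LinkedHashMap<>();
        clusters.put("contracts", 1);
        clusters.put("search", 5);
        clusters.put("taxonomy", 9);
        System.out.println(fontSizes(clusters, 10, 30));
    }
}
```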

  35. Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting

  36. Saved Queries • QueryStoreService: • listSavedQueries • loadSavedQuery • saveQuery • Allows manipulation of dm_smart_list objects (exposed in Webtop since 5.3) • Allows control over which results are saved

  37. Saved Queries List the saved queries for the current user. IQueryStoreService service = ServiceFactory.getInstance().getRemoteService(IQueryStoreService.class, context, "core", "http://127.0.0.1:8080/services"); QueryExecution queryExec = new QueryExecution(0, 100, 100); SavedQueryFilter filter = new SavedQueryFilter(SavedQueryAccessibility.OWNED); DataPackage queryResult = service.listSavedQueries("MSSQL60ECI4", queryExec, filter, null);

  38. Saved Queries Load a saved query ObjectIdentity queryId = new ObjectIdentity(new ObjectId("0821f7588000132e"), "MSSQL60ECI3"); QueryExecution queryExec = new QueryExecution(0, 100, 100); SavedQuery queryResult = queryStoreService.loadSavedQuery(queryId, queryExec, null); • Class diagram: SavedQuery, RichQuery, Query, QueryResult • Attributes shown: displayedAttributes: List<String>; propertySet: PropertySet • Multiplicities shown: 1, 1, 0..1

  39. Saved Queries Save a query Query query = //… ObjectIdentity queryId = new ObjectIdentity("MSSQL60ECI3"); DataObject metadata = new DataObject(queryId) ; metadata.getProperties().set("object_name", "My Saved Query"); RichQuery richQuery = new RichQuery(); richQuery.setQuery(query); QueryExecution queryExec = new QueryExecution(0, 100, 100); ObjectIdentity queryResult = queryStoreService.saveQuery(metadata, richQuery, queryExec, null, null);

  40. Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting

  41. Classification • Introduce a service to compute ‘tags’ for documents • Based on CIS classification engine and managed taxonomy • AnalyticsService: • analyze • Takes a list of object IDs and computes the list of categories for each document

  42. Classification Configuration • Install CIS Server • The installer deploys an EAR with an embedded app server (JBoss) • Install a taxonomy • Available Taxonomies • Energy / Energy Industry • Energy / Oil Trading • General Finance • General Knowledge • Information Science and Technology • Law / Federal Legislation Terms • Life Sciences • Manufacturing / Chemical Hazards • Military / DTIC • Science and Engineering • …

  43. Classification: Compute Categories Analyze an object ObjectIdentitySet documentsSet = new ObjectIdentitySet(new ObjectIdentity(new ObjectId("0821f7588000132e"), MY_DOCBASE)); OperationOptions operationOptions = new OperationOptions(); PropertyProfile propProfile = new PropertyProfile(); propProfile.setIncludeProperties(Arrays.asList("CATEGORIES")); operationOptions.setPropertyProfile(propProfile); IAnalyticsService analyticsService = ServiceFactory.getInstance().getRemoteService(IAnalyticsService.class, context, "analytics", "http://127.0.0.1:7001/services"); List<AnalyticsResult> analyticsResults = analyticsService.analyze(documentsSet, operationOptions);

  44. Classification: ‘Analyze’ Response Display the categories for each object for (AnalyticsResult classResult : analyticsResults) { System.out.println("Document ID: " + classResult.getObjIdentity()); List<CategoryAssign> catAssigns = classResult.getCategoryAssignList(); for (CategoryAssign catAssign : catAssigns) { System.out.println("\t " + catAssign.getCategory().getName()); } }

  45. Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting

  46. Troubleshooting • Diagnose query issues: print the QueryStatus object • Diagnose ECIS communication problems: log4j traces • Diagnose query issues after execute: QueryResult queryResult = searchService.execute(query, exec, options); System.out.println(queryResult.getStatus());

  47. Troubleshooting • Trace DFS request/response on a Sun JVM: System.setProperty("com.sun.xml.ws.transport.http.client.HttpTransportPipe.dump", "true");
