Data Science Using Scikit-Learn

Several Python libraries offer solid implementations of a range of machine learning algorithms. One of the best known is Scikit-Learn, a package that provides implementations of a large number of standard algorithms. Scikit-Learn is characterized by a clean, uniform, and streamlined API, as well as by useful and complete online documentation.

Ducat1

Presentation Transcript


  1. Welcome to Ducat India Language | Industrial Training | Digital Marketing | Web Technology | Testing+ | Database | Networking | Mobile Application | ERP | Graphic | Big Data | Cloud Computing Apply Now Training and Certification Call us: 70-70-90-50-90 www.ducatindia.com

  2. Data Science Using Scikit-Learn

     Several Python libraries offer solid implementations of a range of machine learning algorithms. One of the best known is Scikit-Learn, a package that provides implementations of a large number of standard algorithms. Scikit-Learn is characterized by a clean, uniform, and streamlined API, as well as by useful and complete online documentation.

     Data Representation in Scikit-Learn

     Machine learning is about creating models from data; for that reason, we will start by discussing how data can be represented so that the computer can learn from it. The best way to think about data within Scikit-Learn is in terms of tables of data.

     Data as table

     A basic table is a two-dimensional grid of data, in which the rows represent individual elements of the dataset, and the columns represent quantities associated with each of these elements. For example, consider the Iris dataset, famously analyzed by Ronald Fisher in 1936. We can download this dataset in the form of a Pandas DataFrame using the Seaborn library:

     In[1]: import seaborn as sns
            iris = sns.load_dataset('iris')
            iris.head()

  3. Out[1]:
        sepal_length  sepal_width  petal_length  petal_width  species
     0           5.1          3.5           1.4          0.2   setosa
     1           4.9          3.0           1.4          0.2   setosa
     2           4.7          3.2           1.3          0.2   setosa
     3           4.6          3.1           1.5          0.2   setosa
     4           5.0          3.6           1.4          0.2   setosa

     Here, each row of the data refers to a single observed flower, and the number of rows is the total number of flowers in the dataset. In general, we will refer to the rows of the matrix as samples, and the number of rows as n_samples.

     Likewise, each column of the data refers to a particular quantitative piece of information that describes each sample. In general, we will refer to the columns of the matrix as features, and the number of columns as n_features.
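To make n_samples and n_features concrete, here is a minimal sketch that rebuilds the five rows shown in Out[1] by hand, so it runs without downloading anything. The variable names (`iris_head`, `measurements`) are illustrative and not part of the original slides.

```python
import pandas as pd

# The five rows displayed in Out[1], typed in by hand for illustration
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "setosa"),
    (5.0, 3.6, 1.4, 0.2, "setosa"),
]
iris_head = pd.DataFrame(rows, columns=columns)

# Keep only the quantitative columns; the species label is not a measurement
measurements = iris_head.drop("species", axis=1)

# shape is (number of rows, number of columns) = (n_samples, n_features)
n_samples, n_features = measurements.shape
print(n_samples, n_features)
```

For the full dataset the same unpacking gives 150 samples, as the shapes reported later in the slides show.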

  4. Features matrix

     This table layout makes clear that the information can be thought of as a two-dimensional numerical array or matrix, which we will call the features matrix. By convention, this features matrix is often stored in a variable named X. The features matrix is assumed to be two-dimensional, with shape [n_samples, n_features], and is most often contained in a NumPy array or a Pandas DataFrame, though some Scikit-Learn models also accept SciPy sparse matrices.

     The samples (i.e., rows) always refer to the individual objects described by the dataset. For example, a sample can be a flower, a person, a document, an image, a sound file, a video, an astronomical object, or anything else we can describe with a set of quantitative measurements.

     The features (i.e., columns) always refer to the distinct observations that describe each sample in a quantitative manner. Features are generally real-valued, but can be Boolean or discrete-valued in some cases.

     Target array

     In addition to the features matrix X, we also generally work with a label or target array, which by convention we will usually call y. The target array is usually one-dimensional, with length n_samples, and is generally contained in a NumPy array or Pandas Series. The target array can have continuous numerical values or discrete classes/labels. While some Scikit-Learn estimators do handle multiple target values in the form of a two-dimensional [n_samples, n_targets] target array, we will generally be working with the typical case of a one-dimensional target array.
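The shape conventions above can be checked directly. The following is a small sketch, assuming NumPy and SciPy are installed; the values are the first three rows of the table shown earlier (sepal length and width only), and the variable names are illustrative.

```python
import numpy as np
from scipy import sparse

# Features matrix: two-dimensional, shape (n_samples, n_features)
X = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [4.7, 3.2],
])

# Target array: one-dimensional, length n_samples (discrete labels here)
y = np.array(["setosa", "setosa", "setosa"])

print(X.shape)  # (n_samples, n_features)
print(y.shape)  # (n_samples,)

# The number of rows in X must match the length of y
same_length = X.shape[0] == y.shape[0]

# The same features stored as a SciPy sparse matrix keep the same shape
X_sparse = sparse.csr_matrix(X)
print(X_sparse.shape)
```

A sparse representation mainly pays off when most entries are zero, which is not the case for these measurements; it is shown here only to illustrate that the shape convention is unchanged.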

  5. For example, in the preceding data, we may wish to construct a model that can predict the species of flower based on the other measurements; in this case, the species column would be considered the target array.

     In[3]: X_iris = iris.drop('species', axis=1)
            X_iris.shape
     Out[3]: (150, 4)

     In[4]: y_iris = iris['species']
            y_iris.shape
     Out[4]: (150,)
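As an alternative that does not need a network download, the same split into features and target can be sketched with the copy of the Iris data bundled with Scikit-Learn. This is not the method used in the slides: it assumes a Scikit-Learn version that supports `as_frame=True`, its column names differ from the Seaborn version (for example `sepal length (cm)`), and the target holds integer class codes rather than species strings.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data as Pandas objects
iris_bunch = load_iris(as_frame=True)

X_iris = iris_bunch.data      # DataFrame holding the four measurements
y_iris = iris_bunch.target    # Series of integer class codes (0, 1, 2)

print(X_iris.shape)
print(y_iris.shape)

# Names corresponding to the integer codes in y_iris
species_names = list(iris_bunch.target_names)
print(species_names)
```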

  6. Basics of the API

     Most commonly, the steps in using the Scikit-Learn estimator API are as follows:

     1. Choose a class of model by importing the appropriate estimator class from Scikit-Learn.
     2. Choose model hyperparameters by instantiating this class with the desired values.
     3. Arrange the data into a features matrix and target vector, following the discussion above.
     4. Fit the model to the data by calling the fit() method of the model instance.
     5. Apply the model to new data. For supervised learning, we often predict labels for unknown data using the predict() method. For unsupervised learning, we often transform or infer properties of the data using the transform() or predict() method.

     Read More: https://tutorials.ducatindia.com/data-science/data-science-using-scikit/
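The five steps can be sketched end to end as follows. The choice of `GaussianNB` for the supervised case and `PCA` for the unsupervised case is an assumption made for illustration, as are the 25% test split and `random_state=0`; the slides do not name a specific model.

```python
# Step 1: choose a class of model by importing the estimator class
from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Step 2: choose hyperparameters by instantiating the class
# (GaussianNB is used here with its default settings)
model = GaussianNB()

# Step 3: arrange the data into a features matrix X and target vector y,
# holding some samples back to play the role of "new" data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 4: fit the model to the training data
model.fit(X_train, y_train)

# Step 5 (supervised): predict labels for the held-out data
y_pred = model.predict(X_test)
accuracy = (y_pred == y_test).mean()
print(f"fraction of held-out labels predicted correctly: {accuracy:.2f}")

# Step 5 (unsupervised): same pattern, but no target vector is passed
# and the fitted model transforms the data instead of predicting labels
pca = PCA(n_components=2)
pca.fit(X)
X_2d = pca.transform(X)
print(X_2d.shape)
```

The point of the sketch is the shared pattern: whichever estimator is chosen, the calls are instantiate, fit(), then predict() or transform().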

