How to use datasets.fetch_mldata() in sklearn – Python?

26 July 2024

mldata.org does not enforce a convention for storing data or naming the columns in a dataset. The default behavior of fetch_mldata() works well with the most common cases:

The data values are stored in the column ‘data’, and the target values are stored in the column ‘label’.
Alternatively, the first column stores the target values, and the second column stores the data.
The data array is stored as (n_features, n_samples) and needs to be transposed to match the sklearn standard of (n_samples, n_features).
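
The third case can be illustrated with a small NumPy sketch. The array below is made-up toy data rather than anything downloaded from mldata.org; it only shows what the transpose does to the shape.

```python
import numpy as np

# Toy array laid out the mldata.org way:
# one row per feature, one column per sample.
raw = np.arange(12).reshape(3, 4)  # 3 features, 4 samples

# scikit-learn expects one row per sample, so transpose.
X = raw.T

print(raw.shape)  # (3, 4)
print(X.shape)    # (4, 3)
```

After the transpose, X[i] holds all feature values of sample i, which is the layout sklearn estimators expect.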

fetch_mldata() fetches a machine learning dataset. If the file is not already in the local cache, it is downloaded automatically from mldata.org.
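
The download-only-if-missing behavior can be sketched with the standard library alone. The helper fetch_cached, the stand-in downloader fake_download and the file name iris.mat are all hypothetical names chosen for illustration; this is not scikit-learn's own code and it never touches the network.

```python
import os
import tempfile


def fetch_cached(filename, download, cache_dir):
    """Hypothetical helper: return the cached path, downloading only if missing."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        # First call: fetch the bytes and store them in the cache folder.
        with open(path, "wb") as f:
            f.write(download(filename))
    return path


# Stand-in downloader that records how often it is called.
calls = []


def fake_download(name):
    calls.append(name)
    return b"fake dataset bytes"


cache_dir = tempfile.mkdtemp()
first = fetch_cached("iris.mat", fake_download, cache_dir)
second = fetch_cached("iris.mat", fake_download, cache_dir)

print(len(calls))  # 1: the second call reuses the cached file
```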

The sklearn.datasets package loads these datasets with the function sklearn.datasets.fetch_mldata().

Syntax: sklearn.datasets.fetch_mldata(dataname, target_name='label', data_name='data', transpose_data=True, data_home=None)

Parameters:

dataname: (str) The name of the dataset on mldata.org, e.g. ‘iris’, ‘MNIST original’, ‘leukemia’.

target_name: (optional, default: ‘label’) The name or index of the column containing the target values.

data_name: (optional, default: ‘data’) The name or index of the column containing the data.

transpose_data: (optional, default: True) If True, the loaded data array is transposed.

data_home: (optional, default: None) The download and cache folder for the datasets. By default, all sklearn data is stored in ‘~/scikit_learn_data’ subfolders.
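
How target_name and data_name can select columns by name or by index is sketched below. pick_columns is a hypothetical helper written for this article, and it is a simplified illustration of the idea, not scikit-learn's actual implementation. The column names and values are made up.

```python
def pick_columns(columns, target_name="label", data_name="data"):
    """Hypothetical helper: pick (data, target) from an ordered dict of columns.

    `columns` maps column name -> values, in file order, and is assumed
    to hold at least two columns. Names may also be integer positions.
    """
    names = list(columns)

    def resolve(name):
        # An integer is treated as a position in the column order.
        return names[name] if isinstance(name, int) else name

    target_key, data_key = resolve(target_name), resolve(data_name)

    if target_key in columns and data_key in columns:
        # The requested columns exist: use them directly.
        return columns[data_key], columns[target_key]
    # Fallback: the first column is the target, the second is the data.
    return columns[names[1]], columns[names[0]]


named = {"label": [0, 1, 2], "data": [[5.1], [4.9], [6.3]]}
unnamed = {"class": [0, 1, 2], "measurements": [[5.1], [4.9], [6.3]]}

data, target = pick_columns(named)
print(target)  # [0, 1, 2]

data, target = pick_columns(unnamed, target_name=0, data_name=1)
print(data)  # [[5.1], [4.9], [6.3]]
```

Passing indices is useful when a dataset does not use the default column names ‘label’ and ‘data’.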

Returns: data (Bunch), a dictionary-like object. The interesting attributes are: ‘data’, the data to learn; ‘target’, the classification labels; ‘DESCR’, a description of the dataset; and ‘COL_NAMES’, the original names of the dataset columns.
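
To show how these attributes are read, the sketch below builds a Bunch by hand with sklearn.utils.Bunch. The values are placeholders invented for illustration; a real call to fetch_mldata() would fill them with the downloaded dataset.

```python
import numpy as np
from sklearn.utils import Bunch

# Toy stand-in for the returned object (all values are made up).
dataset = Bunch(
    data=np.zeros((150, 4)),
    target=np.zeros(150),
    DESCR="toy description",
    COL_NAMES=["label", "data"],
)

# A Bunch supports both attribute access and dictionary-style access.
print(dataset.data.shape)       # (150, 4)
print(dataset["target"].shape)  # (150,)
print(dataset.COL_NAMES)        # ['label', 'data']
```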

Let’s see the examples:

Example 1: Load the ‘iris’ dataset from mldata with transpose_data=False, which keeps the original (n_features, n_samples) layout.

Python3

# import the fetch_mldata function
from sklearn.datasets.mldata import fetch_mldata

# load the data without transposing it
iris = fetch_mldata('iris',
                    transpose_data=False)

# print only the shape of the data
# instead of the whole dataset
print(iris.data.shape)

Output:

(4, 150)

Example 2: Load the MNIST digit recognition dataset from mldata.

Python3

# import fetch_mldata function
from sklearn.datasets.mldata import fetch_mldata
 
# load the data (transposed by default)
mnist = fetch_mldata('MNIST original')

# the MNIST data is very large,
# so print only the shape of the data
print(mnist.data.shape)

Output:

(70000, 784)

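Note: This post applies to Scikit-learn version 0.19. fetch_mldata() was deprecated in version 0.20 and removed in version 0.22; on newer versions, sklearn.datasets.fetch_openml() is the suggested replacement.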
