Saturday, February 27, 2016

[Python] Google Deep Learning TensorFlow Course (2) -- Assignment 1 (Part 1)


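The course run by the Google gurus also publishes its sample code and assignments on GitHub, so as I study I'm posting my assignment notes here to pad my post count.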

  • The first sample code downloads the compressed image archives
    import os
    from urllib.request import urlretrieve

    url = 'http://yaroslavvb.com/upload/notMNIST/'

    def maybe_download(filename, expected_bytes):
        """Download a file if not present, and make sure it's the right size."""
        if not os.path.exists(filename):
            filename, _ = urlretrieve(url + filename, filename)
        statinfo = os.stat(filename)
        if statinfo.st_size == expected_bytes:
            print('Found and verified', filename)
        else:
            raise Exception(
                'Failed to verify ' + filename + '. Can you get to it with a browser?')
        return filename

    train_filename = maybe_download('notMNIST_large.tar.gz', 247336696)
    test_filename = maybe_download('notMNIST_small.tar.gz', 8458043)
The files are huge...
  • The second sample code extracts the archives; the extracted files come to roughly 4 GB
    import os
    import sys
    import tarfile

    num_classes = 10

    # Entries next to the class folders that must not be counted as classes.
    excluded = {'.DS_Store'} | {letter + '.pickle' for letter in 'ABCDEFGHIJ'}

    def extract(filename):
        root = os.path.splitext(os.path.splitext(filename)[0])[0]  # remove .tar.gz
        print('Extracting data for %s. This may take a while. Please wait.' % root)
        sys.stdout.flush()
        # Extraction is commented out because the archives are already unpacked.
        #tar = tarfile.open(filename)
        #tar.extractall()
        #tar.close()
        data_folders = [
            os.path.join(root, d) for d in sorted(os.listdir(root))
            if d not in excluded]
        if len(data_folders) != num_classes:
            raise Exception(
                'Expected %d folders, one per class. Found %d instead.' % (
                    num_classes, len(data_folders)))
        print(data_folders)
        return data_folders

    train_folders = extract(train_filename)
    test_folders = extract(test_filename)
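The code doesn't check whether the data is already in place, so I commented out the extraction part. And because it checks the number of entries in the folder, the files that shouldn't be counted have to be added to the exclusion condition.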
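One way to avoid commenting code in and out by hand would be to skip extraction when the target folder already exists, and to count only sub-directories as classes. The following is a minimal sketch under those assumptions; `extract_if_needed` and its `force` parameter are hypothetical names, not part of the course code, and it assumes the archive unpacks into a folder named after the archive.

```python
import os
import tarfile


def extract_if_needed(filename, num_classes=10, force=False):
    """Extract a .tar.gz archive unless its root folder already exists.

    Hypothetical helper: assumes notMNIST_small.tar.gz unpacks into a
    folder called notMNIST_small next to the archive.
    """
    root = os.path.splitext(os.path.splitext(filename)[0])[0]  # strip .tar.gz
    if os.path.isdir(root) and not force:
        print('%s already present, skipping extraction.' % root)
    else:
        print('Extracting %s. This may take a while.' % filename)
        with tarfile.open(filename) as tar:
            tar.extractall(os.path.dirname(root) or '.')
    # Keep directories only, so .DS_Store and *.pickle files are ignored.
    data_folders = [
        os.path.join(root, d) for d in sorted(os.listdir(root))
        if os.path.isdir(os.path.join(root, d))]
    if len(data_folders) != num_classes:
        raise Exception('Expected %d folders, one per class. Found %d instead.'
                        % (num_classes, len(data_folders)))
    return data_folders
```

Filtering on `os.path.isdir` means the exclusion list does not need to grow each time a new stray file shows up in the data folder.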
  • Next comes the first problem: using IPython's facilities to display the image files
Let's take a peek at some of the data to make sure it looks sensible. Each exemplar should be an image of a character A through J rendered in a different font. Display a sample of the images that we just downloaded. Hint: you can use the package IPython.display.
This exercise really isn't hard, but I wasn't familiar with IPython.display at the time, so it took me ages XD
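For reference, a minimal sketch of one way to do it, assuming each class folder contains .png files. The helper names `sample_image_paths` and `show_samples` are my own and not from the course; `Image` and `display` are the IPython.display pieces the hint refers to.

```python
import os
import random


def sample_image_paths(folder, count=3, seed=0):
    """Return up to `count` randomly chosen .png paths from `folder`."""
    names = sorted(n for n in os.listdir(folder) if n.lower().endswith('.png'))
    rng = random.Random(seed)
    chosen = rng.sample(names, min(count, len(names)))
    return [os.path.join(folder, n) for n in chosen]


def show_samples(folders, count=3):
    """Display a few images per class folder inside an IPython notebook."""
    # Imported here because it is only needed when running in a notebook.
    from IPython.display import Image, display
    for folder in folders:
        print(folder)
        for path in sample_image_paths(folder, count):
            display(Image(filename=path))
```

In a notebook cell, calling `show_samples(train_folders)` should then print each folder name followed by a few of its images.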
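  • I also tried a slightly more advanced approach: reading the image with ndimage.imread
Read the image into a 28x28 matrix, then use plt.imshow() to turn the matrix back into a picture. Remember to add the magic function line %matplotlib inline, otherwise the image won't show up inside the notebook.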
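The matrix-to-picture step can be sketched without any image file. This is only an illustration: `make_demo_matrix` is a hypothetical stand-in for the 28x28 array that reading a notMNIST image would give, and `show_matrix` is my own wrapper around plt.imshow().

```python
import numpy as np


def make_demo_matrix(size=28):
    """Build a size x size grayscale matrix with a bright diagonal stripe.

    Stand-in for the array obtained by reading one notMNIST image.
    """
    matrix = np.zeros((size, size), dtype=np.float32)
    idx = np.arange(size)
    matrix[idx, idx] = 255.0
    return matrix


def show_matrix(matrix):
    """Render a 2-D array as a grayscale image.

    In a notebook, run `%matplotlib inline` in a cell first.
    """
    import matplotlib.pyplot as plt  # imported lazily; only needed for display
    plt.imshow(matrix, cmap='gray')
    plt.axis('off')
    plt.show()
```

Calling `show_matrix(make_demo_matrix())` in a notebook should draw a white diagonal on a black background; passing a real image matrix works the same way.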
  • Next, convert the image files into pickle files
    import os
    import pickle

    import numpy as np
    from scipy import ndimage  # ndimage.imread was removed in SciPy 1.2; needs an older SciPy

    image_size = 28  # Pixel width and height.
    pixel_depth = 255.0  # Number of levels per pixel.

    def load_letter(folder, min_num_images):
        """Load all images of one letter into a normalized float32 array."""
        image_files = os.listdir(folder)
        dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                             dtype=np.float32)
        image_index = 0
        print(folder)
        for image in image_files:
            image_file = os.path.join(folder, image)
            try:
                # Scale pixel values from 0..255 to roughly -0.5..0.5.
                image_data = (ndimage.imread(image_file).astype(float) -
                              pixel_depth / 2) / pixel_depth
                if image_data.shape != (image_size, image_size):
                    raise Exception('Unexpected image shape: %s' % str(image_data.shape))
                dataset[image_index, :, :] = image_data
                image_index += 1
            except IOError as e:
                print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')

        # Drop the unused rows left by unreadable files.
        num_images = image_index
        dataset = dataset[0:num_images, :, :]
        if num_images < min_num_images:
            raise Exception('Many fewer images than expected: %d < %d' %
                            (num_images, min_num_images))

        print('Full dataset tensor:', dataset.shape)
        print('Mean:', np.mean(dataset))
        print('Standard deviation:', np.std(dataset))
        return dataset

    def load(data_folders, min_num_images_per_class):
        """Pickle one dataset per class folder and return the pickle file names."""
        dataset_names = []
        for folder in data_folders:
            dataset = load_letter(folder, min_num_images_per_class)
            set_filename = folder + '.pickle'
            try:
                with open(set_filename, 'wb') as f:
                    pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
                dataset_names.append(set_filename)
            except Exception as e:
                print('Unable to save data to', set_filename, ':', e)
        return dataset_names

    train_datasets = load(train_folders, 45000)
    test_datasets = load(test_folders, 1800)
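Pickle serializes Python objects, which saves space and speeds up loading, since the arrays no longer have to be rebuilt from the individual image files.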
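To see what actually ends up in a pickle file, here is a small sketch on a toy array. It applies the same normalization formula as load_letter, dumps the result, and loads it back. The helper names `normalize` and `roundtrip` and the 2x2 input are made up for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

pixel_depth = 255.0  # Same constant as in load_letter.


def normalize(raw):
    """Map pixel values 0..255 to roughly -0.5..0.5."""
    return (raw.astype(np.float32) - pixel_depth / 2) / pixel_depth


def roundtrip(dataset):
    """Dump an array to a temporary .pickle file and load it back."""
    path = os.path.join(tempfile.mkdtemp(), 'toy.pickle')
    with open(path, 'wb') as f:
        pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
    with open(path, 'rb') as f:
        return pickle.load(f)


raw = np.array([[0, 255], [51, 204]], dtype=np.uint8)  # toy "image"
dataset = normalize(raw)
restored = roundtrip(dataset)
print(dataset.min(), dataset.max())        # extremes after normalization
print(np.array_equal(dataset, restored))   # whether the array survived intact
```

The reloaded array keeps its dtype and shape, so later steps can work directly on the pickled data instead of the image folders.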
  • Busy with ancestor worship at the moment; to be continued


