The course, taught by the folks at Google, keeps its sample code and assignments on GitHub. As I work through it, I'll post my assignment notes here to pad my article count.
- The first sample code downloads the compressed image archives.
```python
import os
from six.moves.urllib.request import urlretrieve

url = 'http://yaroslavvb.com/upload/notMNIST/'

def maybe_download(filename, expected_bytes):
    """Download a file if not present, and make sure it's the right size."""
    if not os.path.exists(filename):
        filename, _ = urlretrieve(url + filename, filename)
    statinfo = os.stat(filename)
    if statinfo.st_size == expected_bytes:
        print('Found and verified', filename)
    else:
        raise Exception(
            'Failed to verify ' + filename + '. Can you get to it with a browser?')
    return filename

train_filename = maybe_download('notMNIST_large.tar.gz', 247336696)
test_filename = maybe_download('notMNIST_small.tar.gz', 8458043)
```
The files are huge...
- The second sample code extracts the archives; the extracted files come to about 4 GB.
```python
import os
import sys
import tarfile

num_classes = 10

def extract(filename):
    tar = tarfile.open(filename)
    root = os.path.splitext(os.path.splitext(filename)[0])[0]  # remove .tar.gz
    print('Extracting data for %s. This may take a while. Please wait.' % root)
    sys.stdout.flush()
    # Already extracted once, so skip re-extracting:
    # tar.extractall()
    # tar.close()
    # Ignore files that are not class folders: .DS_Store and the
    # A.pickle ... J.pickle files generated by a later step.
    excluded = {'.DS_Store'} | {c + '.pickle' for c in 'ABCDEFGHIJ'}
    data_folders = [
        os.path.join(root, d) for d in sorted(os.listdir(root))
        if d not in excluded]
    if len(data_folders) != num_classes:
        raise Exception(
            'Expected %d folders, one per class. Found %d instead.' % (
                num_classes, len(data_folders)))
    print(data_folders)
    return data_folders

train_folders = extract(train_filename)
test_folders = extract(test_filename)
```
Since the script doesn't check whether the archive has already been extracted, I commented out the extraction lines. And because it verifies the folder count afterwards, I had to add the files that shouldn't be counted to the exclusion condition. A guard like the one sketched below would avoid re-extracting in the first place.
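A minimal sketch of the kind of guard I mean, assuming the archive unpacks into a folder named after itself, as notMNIST_large.tar.gz does (`maybe_extract` is my own hypothetical helper, not part of the course code):

```python
import os
import tarfile

def maybe_extract(filename):
    # Strip .tar.gz to get the name of the folder the archive unpacks into.
    root = os.path.splitext(os.path.splitext(filename)[0])[0]
    if not os.path.isdir(root):
        # Only extract when the folder is not already there.
        with tarfile.open(filename) as tar:
            tar.extractall()
    return root
```

With a check like this, re-running the notebook skips the slow extraction step instead of redoing it.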
- Then I hit the first problem: displaying the image files using IPython's features.
Let's take a peek at some of the data to make sure it looks sensible. Each exemplar should be an image of a character A through J rendered in a different font. Display a sample of the images that we just downloaded. Hint: you can use the package IPython.display.
This task really isn't hard, but I wasn't familiar with IPython.display back then and fumbled with it for quite a while XD. Something like the snippet below does the trick.
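A minimal sketch, assuming the extracted notMNIST folders sit in the working directory (the folder path is just an example):

```python
import os
from IPython.display import Image, display

# Show the first few PNGs from one class folder
# (hypothetical path; adjust to wherever the archive was extracted).
folder = 'notMNIST_small/A'
for name in os.listdir(folder)[:3]:
    display(Image(filename=os.path.join(folder, name)))
```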
- I also tried a slightly more advanced approach: reading the images with ndimage.imread.
Read the image into a 28*28 matrix, then use plt.imshow() to render the matrix back into an image. Remember to add the magic function %matplotlib inline, or the image won't show up inside the notebook.
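A minimal sketch of that flow (the path is a placeholder; scipy.ndimage.imread was current when this course came out, but has since been deprecated in newer SciPy releases):

```python
%matplotlib inline
import os
import matplotlib.pyplot as plt
from scipy import ndimage

# Hypothetical sample: grab the first image in one class folder.
folder = 'notMNIST_small/A'
image_file = os.path.join(folder, os.listdir(folder)[0])

image_data = ndimage.imread(image_file)  # 28x28 ndarray of pixel values
print(image_data.shape)                  # (28, 28)
plt.imshow(image_data, cmap='gray')      # render the matrix as an image
```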
- 接著把圖片檔轉成pickle檔
```python
import os
import numpy as np
from scipy import ndimage
from six.moves import cPickle as pickle

image_size = 28      # Pixel width and height.
pixel_depth = 255.0  # Number of levels per pixel.

def load_letter(folder, min_num_images):
    image_files = os.listdir(folder)
    dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                         dtype=np.float32)
    image_index = 0
    print(folder)
    for image in os.listdir(folder):
        image_file = os.path.join(folder, image)
        try:
            # Normalize pixel values to roughly [-0.5, 0.5].
            image_data = (ndimage.imread(image_file).astype(float) -
                          pixel_depth / 2) / pixel_depth
            if image_data.shape != (image_size, image_size):
                raise Exception('Unexpected image shape: %s' % str(image_data.shape))
            dataset[image_index, :, :] = image_data
            image_index += 1
        except IOError as e:
            print('Could not read:', image_file, ':', e, "- it's ok, skipping.")
    num_images = image_index
    dataset = dataset[0:num_images, :, :]
    if num_images < min_num_images:
        raise Exception('Many fewer images than expected: %d < %d' %
                        (num_images, min_num_images))
    print('Full dataset tensor:', dataset.shape)
    print('Mean:', np.mean(dataset))
    print('Standard deviation:', np.std(dataset))
    return dataset

def load(data_folders, min_num_images_per_class):
    dataset_names = []
    for folder in data_folders:
        dataset = load_letter(folder, min_num_images_per_class)
        set_filename = folder + '.pickle'
        try:
            with open(set_filename, 'wb') as f:
                pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
            dataset_names.append(set_filename)
        except Exception as e:
            print('Unable to save data to', set_filename, ':', e)
    return dataset_names

train_datasets = load(train_folders, 45000)
test_datasets = load(test_folders, 1800)
```
Pickle serializes Python objects to disk, which saves space and makes later reads much faster than decoding thousands of PNGs all over again.
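To check that a dataset round-trips correctly, a quick sketch (assuming the A.pickle file written by the step above exists):

```python
from six.moves import cPickle as pickle

# Load one class back from disk and peek at its shape.
with open('notMNIST_small/A.pickle', 'rb') as f:
    letter_set = pickle.load(f)
print(letter_set.shape)  # (num_images, 28, 28)
```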
- Off to pay respects to the ancestors; to be continued.