Bryce's LIS 2600 Blog: September 2011

Monday, September 26, 2011

Muddiest Point for Week 4

Two Points:

1) If the DBMS is only the graphical representation of what is going on in my database, how do I see the raw data? Or is the raw data just my tables?

2) Do object-oriented, object-relational, or multidimensional databases look any different from relational databases, or do they just function differently?

Unit 5 Reading Comments

“Setting the Stage,” Anne J. Gilliland

· Creators (as opposed to info professionals) are generating metadata, but not necessarily assigning that name to it.

· Varieties of metadata: admin, descriptive, preservation, technical (how systems function), and use (use of collections and info resources).

· Three components to metadata: content (what it contains or is about), context (who, what, why, how, where of object), and structure (associations/relations to other objects). Structure is taking an increasingly prominent role due to computer processing powers.

· Functions of metadata: creation, reuse, recontexting, and multiversioning – does reuse mean at say the institutional level, or archival level, or both?

o Organizing and describing of objects

o Validating – forming trustworthiness and authentication of objects

o Search/retrieve – metadata can help you re-find the object

o Use and preservation

o Disposition – deciding whether to archive or destroy it

· Metadata’s utility: increases accessibility, retains context, can expand use, help in legal issues (maybe), assist in preserving potentially.

· Library metadata=indexes, abstracts, and bibliographic records conforming to various standards

· Archival/museum metadata focuses more on context, both providing and preserving it

“An Overview of DCMI,” Eric Miller

· DCMI (model or metadata apparently) created DCES (Dublin Core Element Set) to support cross-discipline resource discovery

o Apparently most of DCMI’s effort has been in clarifying what exactly it does. That and supporting “richer” descriptive requirements.

· So, if there’s something to describe, which can be anything, Dublin Core would like to be able to describe it by ascribing it properties, classes, and literals.

o Properties: a type of resource (and a resource is anything that can be unique)

o Classes: specific types of resources

o Literals: simple text strings. In XML this looks like <rdfs:title or whatever>The Title</rdfs:title or whatever>: open an element, open a new element inside it, close the new element, close the outer one, etc.

§ Interestingly it can use a “namespace” to tie a specific word(s) to an element.

· Resource Description Framework (RDF)

· Basically it’s a flexible way of describing information to a standard. Similar to say EAD, but even more flexible. Worth looking into in further depth.
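A minimal sketch of the namespace idea from the notes above, using Python's standard library. It ties the prefix "dc" to the Dublin Core element set URI so that "title" and "creator" become unambiguous elements holding literal strings. The wrapper element name "record", the function name, and the sample values are assumptions made for illustration; they are not taken from Miller's article.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize the URI with the "dc" prefix


def describe(title, creator):
    """Build a small record whose children are Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core
    # Each element name is qualified by the namespace; the text is the literal.
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}creator").text = creator
    return record


xml_text = ET.tostring(describe("The Title", "A. Author"), encoding="unicode")
print(xml_text)
```

The namespace is what lets two communities both use a word like "title" without collision: the full name of the element is the URI plus the word, not the word alone.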

EndNote X5: Introduction

· I wish I had watched some of these videos last week. They would have saved me some of the aggravation of getting EndNote to do what I want.

Wednesday, September 21, 2011

Flickr Photos for Assignment 1

My Flickr photos (both the screen display and thumbnail copies) are now up and available at http://www.flickr.com/photos/66255398@N07/

Sunday, September 18, 2011

Unit 4 Reading Comments

Wikipedia – “Database”

If not every collection of data is a database then what are the other collections called?

If “stored data in a database is not generally portable across different DBMS,” should it be? As data becomes more important in scholarship, it would seem that some type of standardization that makes data interoperable to a high degree across DBMSs will be increasingly important. Later, in Database Migration between DBMSs, they say “A database built with one DBMS is not portable to another DBMS,” but go on to talk about the ways in which it is possible to “migrate” and “transform” from one to another. I don’t get how it can be both possible (but tricky) and not possible (but doable).
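One way to picture "not portable, but migratable" is an export to plain SQL text followed by a re-import. The sketch below does this with Python's built-in sqlite3 module. Both ends are SQLite here, which is an assumption made so the example runs on its own; a migration between two different DBMS products would usually also need the exported SQL translated between dialects, which is where the "tricky" part comes in. The table and sample titles are hypothetical.

```python
import sqlite3

# Source database with one small, hypothetical table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
source.executemany("INSERT INTO book (title) VALUES (?)", [("Dune",), ("Emma",)])
source.commit()

# Export: the stored data becomes a portable script of SQL statements.
dump_sql = "\n".join(source.iterdump())

# Import: replay that script into a separate, empty database.
target = sqlite3.connect(":memory:")
target.executescript(dump_sql)

titles = [row[0] for row in target.execute("SELECT title FROM book ORDER BY id")]
print(titles)
```

The database file itself never moves; only a textual description of its structure and contents does.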

This article makes it sound as if SQL, while it is the (a?) ANSI and ISO standard, is not as useful as perhaps other languages. Is this an example of an early infrastructure adoption that is maybe not perfect, but is so ubiquitous that we cannot free ourselves of it?

Wikipedia – “Entity-relationship model”

See, Peter Chen, "The Entity Relationship Model: Toward a Unified View of Data"

Entity(-type)= physical object, event, or concept. Entity is only one instance of these things, but entity-type is the actual category that includes many instances.

Entities are the nouns, and the relationship is the verb. Ex.: Artist—Perform—Song. This is one instance, the entity-relationship is composed of a set of instances.
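The Artist, Perform, Song example can be sketched as tables: one table per entity-type (the nouns) and one table for the relationship (the verb), where each row of the relationship table is a single instance. This is a minimal sketch using Python's sqlite3; the table names, column names, and sample rows are assumptions for illustration, not something taken from the Wikipedia article.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE song   (song_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- The relationship: each row is one instance of "artist performs song".
CREATE TABLE performs (
    artist_id INTEGER REFERENCES artist(artist_id),
    song_id   INTEGER REFERENCES song(song_id),
    PRIMARY KEY (artist_id, song_id)
);
""")
db.execute("INSERT INTO artist VALUES (1, 'Artist A')")
db.executemany("INSERT INTO song VALUES (?, ?)", [(1, "Song X"), (2, "Song Y")])
db.executemany("INSERT INTO performs VALUES (?, ?)", [(1, 1), (1, 2)])

# Read the relationship back as noun-verb-noun pairs.
rows = db.execute("""
    SELECT artist.name, song.title
    FROM performs
    JOIN artist ON artist.artist_id = performs.artist_id
    JOIN song   ON song.song_id     = performs.song_id
    ORDER BY song.song_id
""").fetchall()
print(rows)
```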

http://www.phlonx.com/resources/nf3/

First Normal Form (1NF) needs “atomicity”: the indivisibility of an attribute into similar parts.

Does figure A-1 break atomicity? It has columns that contain repeating data, but its rows seem okay. I don’t see how fig. A-1 is really any different from B other than that they filled in the repeating data. It seems like both have elements that make their rows unique. The answer is that A-1 does not satisfy atomicity, and hence it is broken up into several different tables until it does.
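A small sketch of what removing repeating data can look like. The spreadsheet-style rows below are hypothetical and are not the tutorial's figures: each order crams its items into repeating columns (item1, item2, item3), and the function turns every item into its own row identified by order number plus item number.

```python
# Hypothetical spreadsheet rows with a repeating group of item columns.
flat_orders = [
    {"order_id": 125, "customer": "Acme", "item1": "hammer", "item2": "nails", "item3": None},
    {"order_id": 126, "customer": "Bolt Co", "item1": "saw", "item2": None, "item3": None},
]


def split_repeating_items(rows):
    """Return one row per (order, item) so that every field holds a single value."""
    order_items = []
    for row in rows:
        item_no = 0
        for column in ("item1", "item2", "item3"):
            if row[column] is not None:
                item_no += 1
                order_items.append({
                    "order_id": row["order_id"],
                    "item_no": item_no,  # order_id + item_no makes the row unique
                    "customer": row["customer"],
                    "item": row[column],
                })
    return order_items


order_items = split_repeating_items(flat_orders)
for line in order_items:
    print(line)
```

The customer name still repeats on every item row in this sketch; removing that kind of repetition is what the later normal forms deal with.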

This tutorial would be easier to follow if it included a database that I was manipulating. Also, if it began by showing the final product (fig. K) displayed next to A-1, I think it would help. Not that it would need to explain what exactly was different in the beginning, only show that we need to go from spreadsheet to database and that they each look like this. Then they could go on to explain how that is accomplished and why each step is necessary. As it stands, I had to go back to the beginning and re-read in order to really start to grasp what happened.

Muddiest Point for Week 3

I thought the lecture was clear. I have no outstanding questions this week.

Sunday, September 11, 2011

Muddiest Point for Week 2

We talked about suggesting a controlled vocabulary for tags on delicious.com in order to maintain consistency between different users. This seems like a good idea, and is also talked about a bit in the Galloway article. What I would like to know is whether there is a way to create new tags consistently. That is, can you suggest a way in which new tags would be able to become standardized within the user community? Alternatively, should a site like delicious follow the Historic Pittsburgh website and adopt an existing subject thesaurus like the Library of Congress Subject Headings?

Unit 3 Reading Comments

Wikipedia – “Data Compression.” Lossless: Compressed data that is error free. Lossy: Some fidelity or quality is lost in the compression. What I don’t get or is not explained in this article is why you would ever want lossy over lossless. Is lossless a lot bigger?
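A minimal sketch of the lossless guarantee, using Python's built-in zlib module. The sample data (many 'e' bytes, few 'z' bytes) is an assumption chosen to be very repetitive, which is the kind of input lossless methods shrink well; the point is only that decompression returns exactly the original bytes.

```python
import zlib

# Hypothetical, highly repetitive input: 900 'e' bytes followed by 100 'z' bytes.
original = b"e" * 900 + b"z" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)  # lossless: nothing was thrown away
```

How much a lossless method saves depends on how repetitive the data is. Lossy methods are generally able to go smaller on material like photographs and audio, precisely because they are allowed to discard detail.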

“Data Compression Basics.”

The examples really help clarify Wikipedia’s poor ‘e’ versus ‘z’ example.

If tiff files, which are huge, use LZ algorithms then they are somewhat compressed. How large would an image be with no compression at all? And would creating such an image possess any advantages if size was not an issue?
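The size of an image with no compression at all can be estimated with simple arithmetic: width times height times the bytes stored per pixel. The sketch below assumes a hypothetical 3000 x 2000 pixel scan in 8-bit RGB and ignores file headers and any embedded metadata.

```python
def uncompressed_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Raw pixel data size in bytes, ignoring headers and metadata."""
    return width_px * height_px * channels * bits_per_channel // 8


size = uncompressed_bytes(3000, 2000)  # hypothetical scan dimensions
print(size)              # 18000000
print(size / 1_000_000)  # 18.0 (megabytes, counting 1 MB as 1,000,000 bytes)
```

Doubling the bit depth to 16 bits per channel doubles the figure, so the raw size is driven entirely by resolution and depth, not by what the picture shows.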

The smooth and jiggled grayscale comparisons are a great example of ‘rounding’ in lossy compression.

Lossy is not loss of quality, but loss of data.
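The 'rounding' idea can be sketched in a few lines. The gray values below are hypothetical, not taken from the article: a slightly "jiggled" patch is rounded down to multiples of 8, which makes it perfectly smooth (and therefore easy to compress) but discards the small differences for good.

```python
def quantize(pixels, step):
    """Round each 0-255 gray value down to the nearest multiple of step."""
    return [(p // step) * step for p in pixels]


jiggled = [100, 103, 98, 101, 99, 102, 100, 97]  # hypothetical noisy gray patch
smoothed = quantize(jiggled, 8)

print(smoothed)            # [96, 96, 96, 96, 96, 96, 96, 96]
print(len(set(jiggled)))   # 7 distinct values before
print(len(set(smoothed)))  # 1 distinct value after
```

Nothing in the smoothed list says which pixel was 103 and which was 97, so the original cannot be rebuilt. That is the loss of data, whether or not the eye notices any loss of quality.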

Ed Galloway, “Imaging Pittsburgh”

Several helpful pieces of advice on working with multiple partners with different agendas on one project.

Coming to terms with consistent subject headings; figuring out a way to cope with an array of metadata styles and still look consistent; and then there is the copyright. The solution to copyright seemed a little weak.

Did image reproduction make any money? I have heard that it rarely does, but perhaps they have an insight into it that I do not?
I like the idea that by offering various collections from different types of institutions that you begin to create context through the images themselves.

Paula Webb, “YouTube and Libraries.”

Wow, someone sure is enthusiastic. Are there no disadvantages to using YouTube? Do you sacrifice any rights to the materials posted as with Facebook, or other social media sites?

Saturday, September 3, 2011

Unit 2: Computer Basics, Digitization Reading Notes and Comments

Wikipedia. "Hardware"

There seem to have been very few changes to the overall structure of the computer. Sure, things have improved in terms of size and speed, but the physical components appear to be almost unchanged since inception. So, there's the calculator brain (CPU), a holding tank for current operations (RAM), its central nervous system (the buses), and a reptile back brain (BIOS) running crucial but largely uncontrolled operations like power (the heart) or firmware. If anything has changed, it is the computer's tool-wielding devices (optical drives), though how much change has happened beyond memory size and speed, I'm not sure. In these terms the computer almost appears lifelike, at least until we consider that, unlike anything in life, it must have some type of user in order to do anything. In order to interface with the hardware, the user makes use of special organs called input devices, which have changed or evolved slightly over time - e.g. the addition of a mouse and then a few specialized keyboard keys not present on the typewriter from which it was adapted.

Wikipedia. "Software"

The mental functions, or the stuff that makes the machine think; these include its etiquette and protocols. When you anthropomorphize the computer it is no wonder that several philosophers at the dawn of the PC age (maybe now too?) have attempted to show how very machine-like people are, and whereas they have a valid point, have we not created the machine in our own image?

Software is where things get tricky. The hardware is designed to operate without error (ideally), but once the thought process is introduced in the form of software things can start to go off track and we see bugs, or wrong orders, being issued.

A couple of issues with this Wikipedia entry. How is "the history of computer software [...] most often traced back to the first software bug in 1946"? What does that even mean? Also confusing is the statement that "it is hard to imagine today that people once felt that software was worthless without a machine." I think the author is driving at the fact that people once thought computers had to be a bundled hardware/software package and that a piece of software on its own would be worthless. However, the way it currently reads begs the question, "isn't it?"

Stuart D. Lee. "Digitization: Is It Worth It?"

"Not to mention the money." I almost wish he would mention the funding that has cropped up since digitization was deemed a valid method of preservation. It would make the article longer of course, but the issue of ‘if there is money out there to do it people will flock towards it’ seems like a large piece of the story here.
Lee suggests that digital imaging is "the most common form of digitization we all encounter," and perhaps it was generally true in 2001 (maybe it’s still true in libraries?), but I would argue that Lee has overlooked private efforts at digitization. Bootleggers and the commercial industry have both been heavily engaged in digitization from the start, and they have focused not on imaging, but on video and audio. If libraries fell behind on this front, there were always private individuals to pick up the slack.
Is the only real benefit to digitization the access it creates?
One thing I feel Lee overlooks in his cost effectiveness/access discussion is that if an item is photocopied 19+ times then its longevity may have become compromised. Therefore, add on to the copy cost the cost of conservation to this high use item.
On the issue of collection development versus digitization, it is true that having access to journal databases is a wonderful thing. In the long-term however, what do libraries gain from these subscriptions? A digitized rare manuscript's costs are finite, but the yearly cost of database subscription is never ending and if it should ever become beyond an institution's means to finance it they have nothing in the form of collection to show for it. All those journals suddenly vanish while that scanned rare book can be used well over a year and potentially indefinitely.

Doreen Carvajal. "European libraries face problems in digitalizing." New York Times

"The basic problem is that there isn't enough money to digitize everything we want to." I hope this statement was not meant to be a revelation to anyone because if it was, then welcome to the real world.
I guess I fail to see the confusion or problem in the article. Were European efforts in digitization attempting to be completely independent of private finances while trying to compete with the massive resources of a company like Google? Are they surprised that limited government funding just is not enough? Or are European libraries in 2007 just coming to the realization that if a private for profit group gives you money that they are expecting to receive something in return? The concern, as I understand it, is that there exists a danger that Google will crush world culture by digitizing only U.S.-centric materials. Does Google then have a cultural agenda? It does not seem like it since they appear to be reaching out to Europe and their digitization efforts.

Charles Smith. "A Few Thoughts on the Google Books Library Project." EDUCAUSE Quarterly. 2008.

Is the issue really that books will become obsolete and not that a private for-profit company is digitizing them?
"Remember being taken to the library at the beginning of the school year to learn how to access its resources." Most libraries still run these seminars and they continue to be as relevant now as they ever were, or more so. It's true that in order to fine-tune your research abilities you will need to a) research and b) get individualized help, but those seminars exposing the user to a library’s resources are still crucial regardless of experience level.
I'm not sure I agree with Smith's conclusions. It has been my experience that if a researcher actually wants/needs something that they will go to great lengths to get it. If something is digitized then use it, but does it not being digitized preclude use? Perhaps, but only if there is another book or article nearly identical that has been digitized. Certainly, people are lazy and gravitate towards the easiest possible way, but I still have not seen this generation unable to do or think anything that is not Internet accessible. Is this a real concern? If something is not online does it no longer exist? I hope not, because for all the bits available there is still only a fraction of the useful information on the web in the overall scheme of things.

Week 1 - Muddiest Point

I felt the first lecture was clear, and do not have any questions or concerns there. I do have a question in terms of weekly readings however. Are the readings for discussion the readings that I am supposed to be posting about in this blog, or are the required readings what I am to be posting? For now I will post on the required readings and be ready to discuss the discussion readings in class, or online, or wherever said readings are to take place.

Thursday, September 1, 2011

The Initial Post

This blog will track my progress with readings and issues, comments, or concerns for the LIS 2600 class at University of Pittsburgh.