Bryce's LIS 2600 Blog: 2011

Tuesday, November 29, 2011

Monday, November 7, 2011

Can XML or its stylesheets be used to specify display order? So, say I always want my data to display in this order:
<unitid>LIS 2600</unitid>
<title>Introduction to IT</title>
<date>November 7, 2011</date>
<subject>Muddiest Point for XML</subject>

Is there any code (or whatever) I can add to force it to display as such even though it was entered in a different order? Say for instance:
<title>Introduction to IT</title>
<unitid>LIS 2600</unitid>
<subject>Muddiest Point for XML</subject>
<date>November 7, 2011</date>

Or will it just not parse correctly? If that's the case, is there a way to automatically rearrange my tags and their text?
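A minimal sketch of one possible answer, assuming the four elements sit inside a single wrapper element (the <record> wrapper here is hypothetical; XML requires one root). A parser reads the elements in whatever order they were entered, and unless a DTD or schema demands a particular sequence the document still parses. A stylesheet or a short script can then emit them in the preferred order. This uses Python's standard library rather than a stylesheet:

```python
import xml.etree.ElementTree as ET

# Preferred display order, taken from the first listing above.
DISPLAY_ORDER = ["unitid", "title", "date", "subject"]


def reorder(record_xml, order=DISPLAY_ORDER):
    """Return the record with its child elements sorted into `order`."""
    root = ET.fromstring(record_xml)
    rank = {tag: i for i, tag in enumerate(order)}
    # Tags missing from the list sort to the end, keeping their relative order.
    children = sorted(root, key=lambda el: rank.get(el.tag, len(order)))
    root[:] = children
    return ET.tostring(root, encoding="unicode")


# <record> is a hypothetical wrapper element added for this example.
entered = (
    "<record>"
    "<title>Introduction to IT</title>"
    "<unitid>LIS 2600</unitid>"
    "<subject>Muddiest Point for XML</subject>"
    "<date>November 7, 2011</date>"
    "</record>"
)

print(reorder(entered))
```

The same idea carries over to a stylesheet language such as XSLT, where the template simply selects the elements in the order it wants to output them.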

Digital Libraries Readings

Andreas Paepcke, et al., “Dewey Meets Turing”

Are Google’s original algorithms really derived from DLI? Where’s the reference?

There isn’t enough background to understand what’s going on in this article. It presupposes an advanced understanding of DLI. For instance, what do publishers have to do with anything? I thought it was an NSF project.

William H. Mischo, “Digital Libraries: Challenges and Influential Work”

Oops. I guess I needed to read this before “Dewey Meets Turing.” Now everything makes slightly more sense, except the software companies. If you already have a fleet of computer scientists used to building things from the ground up (as “Dewey Meets Turing” mentions), what value did restricting the final product add? Especially if you were just going to have to use government funds again to buy it back.

Clifford A. Lynch, “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age”

Are institutional repositories not succeeding because of policy, management failure, or technical problems? It doesn’t seem like it, at least not in the way Lynch means it. Instead, the problem seems to be that the old system works for faculty, so why fix it? They have no incentive to use an IR. Publishing occurs for a variety of reasons, and only one of them is accessibility. The others, like prestige and job promotion, are equally important issues to address. Lynch has all the technological bases covered, but misses the human element of getting people to use IRs.

Saturday, November 5, 2011

Assignment 5

Here is a link to my Koha list: http://jade.exp.sis.pitt.edu:8080/cgi-bin/koha/virtualshelves/shelves.pl?viewshelf=19

My username is brh68

My List is Titled: "Bryce Henry Assignment5 LIS2600 - An Abundance of Cox"

Monday, October 31, 2011

Muddiest Point for CSS lecture

Nothing too confusing this week. It was nice to go over everything twice in class and lab.

XML Readings

“An Introduction to the Extensible Markup Language (XML)”

by Martin Bryan of The SGML Centre

You never hear about SGML anymore. Does it still exist, and is it still relevant?

So, beyond the fact that XML allows you to compound documents, it lets you identify where digital objects appear and how they are controlled, as well as add metadata to a file. It isn’t, however, a set of tags like HTML, but a way of using tags predefined by some type of governing body.

I get how you define attributes and tags, but where do definitions live? That must be the processing instructions and hyperlink to a document type declaration.

“Extending Your Markup,” Andre Bergholz

SGML is what lets you define the structure.

Is what was described in the previous article DTD or schema?

The stylesheet sounds pretty interesting – follow up on it.

“A survey of XML standards: Part 1,” Uche Ogbuji

Catalogs: instructions for how an XML processor resolves entity identifiers into documents.

Namespaces: a mechanism for universal element and attribute naming. Can be identical to another language if you define it differently here.

Base: associates elements with URIs

Inclusions (XInclude): a system to merge XML documents.

Infoset: a way of describing objects with special properties.

Pointer: defines a language that can be used to refer to fragments of an XML document.

XLink: framework for expressing links in XML

RELAX NG vs. W3C XML Schema vs. Schematron: competing schema languages. Who won?

W3C Schema tutorial

A tutorial on schemas from W3C, ’nuff said.

Thursday, October 27, 2011

Assignment 4: Personal Bibliographic Management Systems

Here's the link to Assignment 4: http://www.citeulike.org/user/brh68

Monday, October 24, 2011

Muddiest Point for last week, whichever one it was

What has changed between HTML 1 and HTML 4? And how is XHTML different from these?

Unit 9 Readings: CSS

Starting with HTML + CSS

A pretty straightforward tutorial. Everything worked great for me up until it came to separating the stylesheet. It took moving it to an actual directory rather than just having it all on the desktop for some reason. Anyway, it worked!

Håkon Wium Lie and Bert Bos, Chapter 2 of the book Cascading Style Sheets, designing for the Web.

Does CSS use curly braces ({ }) and colons (:) for its properties and values to distinguish it from a metadata language? The brevity features, like comma- and semicolon-separated lists (e.g. H1, H2, H3), are really nice.

Presumably all browsers are CSS enhanced at this point. Especially if they’re talking about Netscape.

W3 CSS

W3 never ceases to amaze me. So useful, so easy to follow, and when I do I always learn like 10 new things. This is more useful and in-depth than the Starting with HTML+CSS link, though that was also a helpful example.

Sunday, October 16, 2011

Muddiest Point for Week 8: Submarine Internet Cables

So, there are actually cables connecting the Internet across the ocean and across continents? Do they ever need repair? For instance, do earthquakes and the like damage them, and if so what happens? Are they managed by private companies, international government cooperatives, or what?

Unit 8 Readings: HTML and Web Authoring

HTML Tutorial

W3 never ceases to impress. This tutorial helped me understand what HTML actually does, and assisted me in separating it from my own preconceived notions. I had previously thought HTML was simply just tagging how digital objects appear on a web page (e.g. color, size, placement, font, etc.). However, the sections on “headings” and “meta” made me rethink this somewhat, as they begin to give HTML some contextual meaning – that is, it can apparently also describe what something is rather than only how it looks.

HTML Cheatsheet

I know I’m going to be reusing this; nice to have it all in one place. One question: W3 describes line breaks as really needing only an opening break tag without ever closing it, and XHTML as needing the <br />. Do you just use the one with the slash for convenience’s sake? Will it always work regardless of whether it’s absolutely necessary?

Doug Goans, Guy Leach, and Teri Vogel, “Beyond HTML”

Wow, just from the opening paragraphs you can tell their FrontPage project was not going to go well. Right off the bat you see all the examples of why projects fail: lack of standards, communication, and training. I love that everyone was using their own fonts, colors, and layouts; it must have felt slightly psychotic going from section to section or even page to page. Still, it’s unfortunate that it took the web development librarian three years to even begin to implement standards.

So basically, librarians were getting hung-up on presentation when what they needed to be focusing on was content. The CMS allowed them to plug-in content, which could then be run through some type of stylesheet to give it a uniform appearance.

I’m impressed they had a blogging system in place in 2001, and had the foresight to start including social media in their site from the moment they began to standardize it. Guess it just shows how collaboration between web/computer specialists and librarians leads to the best systems for libraries. Also, this is yet another example of the superiority of largely homemade websites as opposed to commercial or open-source packages.

Monday, October 10, 2011

Muddiest Point for Week 7

I'm a little confused about what exactly an Intranet is. Are LANs also Intranets?

Unit 7 Readings

Andrew K. Pace, “Dismantling Integrated Library Systems”

Integration may have been lost as library technology made the jump onto the web, but that doesn’t mean it won’t be recovered. As vendor software comes and goes it will likely grow to meet demands and standards as they are established. What will increasingly distinguish products is, as the author says, “new products and alliances” (34). Successful vendors are those who will take into account the library community’s resistance to rapid change by building innovations on top of existing products as much as possible, so that the changes are gradual and older versions/systems are not left quite as far behind.

Jeff Tyson, “How Internet Infrastructure Works”

POP=Point of Presence, or the place for local users to access a company’s network through phone or dedicated line. Managed (coordinated?) by high-level networks connected through NAPs or Network Access Points.

“What is incredible about this process is that a message can leave one computer and travel halfway across the world through several different networks and arrive at another computer in a fraction of a second!” Indeed, but how exactly is it possible? In addition, how are the various continents physically connected? Are there just big underwater cables stretching from here to there?

The backbones are clearly very fast. Are home connections slower because the limbs to the backbone are slower? If I were the only person on the Internet one day, would I be able to take advantage of OC-48 speeds, or would my home modem, router, cables, etc. still limit me?

The IP address examples must be ghost addresses because there’s nothing there. Spooky!

I feel like the majority of this article was covered pretty well in class, but the DNS and URL discussions helped to clear a few things up.
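To make the URL and DNS pieces concrete, here is a small sketch that splits a URL into its parts using Python's standard library. The URL is the Koha link posted elsewhere on this blog; nothing is looked up over the network, the string is only taken apart.

```python
from urllib.parse import urlsplit

# The Koha list URL from the Assignment 5 post on this blog.
url = (
    "http://jade.exp.sis.pitt.edu:8080"
    "/cgi-bin/koha/virtualshelves/shelves.pl?viewshelf=19"
)

parts = urlsplit(url)
print("scheme:", parts.scheme)    # protocol used to fetch the page
print("host:  ", parts.hostname)  # the name that DNS translates to an IP address
print("port:  ", parts.port)
print("path:  ", parts.path)
print("query: ", parts.query)

# Domain names are hierarchical and read right to left:
# top-level domain first, then each more specific label.
labels = list(reversed(parts.hostname.split(".")))
print(labels)
```

Reading the labels from right to left (edu, then pitt, and so on) mirrors the order in which the name is resolved, from the most general domain to the specific machine.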

Brin and Page video

A 20-minute presentation on why Google’s founders think their company’s great. Nothing is really addressed in any depth, though the model of who is using Google and where was interesting. I wondered how their Montessori education influenced Google; they mention it, but not really what it meant to the company. Also, beyond the mention of the future and showing a slide of HAL 9000, what did they actually say about the future? Nothing really; Page just showed a tiny screenshot of a blog that Google’s algorithm was making fun of.

Wednesday, October 5, 2011

Assignment 2 part 2 - Jing video on screencast.com

Here's the link to my Jing video:
http://www.screencast.com/users/byrce/folders/Default/media/737d125d-fe8b-413e-a649-d34d24dbe488

Assignment 2 part 1 - Jing slides on Flickr

Follow the URLs to see my Jing screen captures:
Slide 1: http://www.flickr.com/photos/66255398@N07/6215805712/in/photostream
Slide 2: http://www.flickr.com/photos/66255398@N07/6215291153/in/photostream
Slide 3: http://www.flickr.com/photos/66255398@N07/6215291205/in/photostream
Slide 4: http://www.flickr.com/photos/66255398@N07/6215805948/in/photostream

Slide 5: http://www.flickr.com/photos/66255398@N07/6215291251/in/photostream
Slide 6: http://www.flickr.com/photos/66255398@N07/6215805866/in/photostream
Slide 7: http://www.flickr.com/photos/66255398@N07/6215805904/in/photostream

Monday, October 3, 2011

Week 5 Muddiest Point

All clear this week. No muddy points.

Unit 6 Readings

“Computer network” Wikipedia

Better be careful of the HAM guys if you’re networking.

What are the tradeoffs for wire choices? Why doesn’t everyone just use optical fiber? Cost, or are there other factors involved? Also, when comparing wireless technologies, are they all equally fast? Or just different reaches?

TCP/IP: defines the addressing, identification, and routing specification.

Bluetooth=PAN technology (Personal Area Network)

LAN (local) links several computers in one location. Is there a real difference between LAN and PAN beyond the 10 meter range? Perhaps it’s the reliance on FireWire/USB rather than Ethernet more than range. Can LAN also be PAN then?

Intranets are apparently not internets, but web browsers and P2P. Internet on the other hand is actually the backbone of what I think the Internet is.

In firewalls, what makes one source safe and another unsafe? How is this determined?

“LAN” Wikipedia

Mostly use Ethernet and Wi-Fi. Ethernet is from 1973!

If at “higher layers” TCP/IP is the standard, why is TCP/IP based networking’s market share “much reduced”? Especially if they continue to be both standard and influential?

"Management of RFID in Libraries," Karen Coyle

An interesting premise, and it makes sense for books. However, it seems to become problematic as you move away from books to other media types. The fact that thin items can potentially cause the tags to interfere with other RFID tags seems particularly awkward. However, so long as you don’t care about where something is on the shelf, but use it primarily to check out media and prevent it being stolen, then it seems like it would work fine. I wonder how the field has changed in five years. Is RFID standard in library books? Was it deemed too time-consuming or too costly?

Monday, September 26, 2011

Muddiest Point for Week 4

Two Points:

1) If the DBMS is only the graphical representation of what is going on in my database, how do I see the raw data? Or is the raw data just my tables?

2) Do object-oriented, object-relational, or multidimensional databases look any different from relational databases, or do they just function differently?
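On the first point, a small sketch with SQLite (which ships with Python) suggests one way to look at it: querying a table directly returns the stored rows, with no graphical layer involved. The table name and the second row are made up for illustration.

```python
import sqlite3

# In-memory database; the table and its rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (unitid TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("LIS 2600", "Introduction to IT"), ("LIS 2000", "Hypothetical Course")],
)

# The "raw data" is simply the rows stored in the table.
rows = conn.execute("SELECT unitid, title FROM course ORDER BY unitid").fetchall()
for row in rows:
    print(row)

conn.close()
```

Any forms, reports, or grids a database program shows are views built on top of rows like these.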

Unit 5 Reading Comments

“Setting the Stage,” Anne J. Gilliland

· Creators (as opposed to info professionals) are generating metadata, but not necessarily assigning that name to it.

· Varieties of metadata: admin, descriptive, preservation, technical (how systems function), and use (use of collections and info resources).

· Three components to metadata: content (what it contains or is about), context (who, what, why, how, where of object), and structure (associations/relations to other objects). Structure is taking an increasingly prominent role due to computer processing powers.

· Functions of metadata: creation, reuse, recontexting, and multiversioning – does reuse mean at say the institutional level, or archival level, or both?

o Organizing and describing of objects

o Validating – forming trustworthiness and authentication of objects

o Search/retrieve – metadata can help you re-find the object

o Use and preservation

o Disposition – deciding whether to archive or destroy it

· Metadata’s utility: increases accessibility, retains context, can expand use, help in legal issues (maybe), assist in preserving potentially.

· Library metadata=indexes, abstracts, and bibliographic records conforming to various standards

· Archival/museum metadata focuses more on context, both providing and preserving it

“An Overview of DCMI,” Eric Miller

· DCMI (the Dublin Core Metadata Initiative) created DCES (Dublin Core Element Set) to support cross-discipline resource discovery

o Apparently most of DCMI’s effort has been in clarifying what exactly it does. That and supporting “richer” descriptive requirements.

· So, if there’s something to describe, which can be anything, Dublin Core would like to be able to describe it by ascribing it properties, classes, and literals.

o Properties: a type of resource (and a resource is anything that can be unique)

o Classes: specific types of resources

o Literals: simple text strings (XML) <rdfs:title or whatever>The Title</rdfs:title or whatever> open, new element, close new element, close, etc.

§ Interestingly it can use a “namespace” to tie a specific word(s) to an element.

· Resource Description Framework (RDF)

· Basically it’s a flexible way of describing information to a standard. Similar to say EAD, but even more flexible. Worth looking into in further depth.
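To see how a namespace ties a word like "title" to a specific element set, here is a minimal sketch that builds one Dublin Core title element in XML. The namespace URI is the one published for the Dublin Core Element Set 1.1; the <record> wrapper is a hypothetical container added only so the example has a root element.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# <record> is a hypothetical wrapper element, not part of Dublin Core.
record = ET.Element("record")
title = ET.SubElement(record, f"{{{DC}}}title")
title.text = "The Title"  # the literal: a simple text string

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The "dc" prefix is only shorthand. What actually identifies the element is the combination of the namespace URI and the local name "title", so another vocabulary can define its own "title" without a clash.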

EndNote X5: Introduction

· I wish I had watched some of these videos last week. They would have saved me (some) of the aggravation of getting EndNote to do what I want.

Wednesday, September 21, 2011

Flickr Photos for Assignment 1

My Flickr photos (both the screen display and thumbnail copies) are now up and available at http://www.flickr.com/photos/66255398@N07/

Sunday, September 18, 2011

Unit 4 Reading Comments

Wikipedia – “Database”

If not every collection of data is a database then what are the other collections called?

If “stored data in a database is not generally portable across different DBMS,” should they be? As data becomes more important in scholarship, it would seem that some type of standardization allowing a high degree of interoperability across DBMSs will be increasingly important. Later in Database Migration between DBMSs, they say “A database built with one DBMS is not portable to another DBMS,” but go on to talk about the ways in which it is possible to “migrate” and “transform” from one to another. I don’t get how it can be both possible (but tricky) and not possible (but doable).

This article makes it sound as if, while SQL is the (a?) ANSI and ISO standard, it is not as useful as perhaps other languages. Is this an example of an early infrastructure adoption that is maybe not perfect, but is so ubiquitous that we cannot free ourselves of it?

Wikipedia – “Entity-relationship model”

See, Peter Chen, "The Entity Relationship Model: Toward a Unified View of Data"

Entity(-type)= physical object, event, or concept. Entity is only one instance of these things, but entity-type is the actual category that includes many instances.

Entities are the nouns, and the relationship is the verb. Ex.: Artist—Perform—Song. This is one instance, the entity-relationship is composed of a set of instances.
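A rough sketch of the Artist—Perform—Song example in code may help. The entity types become classes, each object is one entity instance, and the relationship is a set of (artist, song) pairings. All names and data here are placeholders of my own, not from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artist:  # entity type; each Artist object is one entity
    name: str


@dataclass(frozen=True)
class Song:  # entity type
    title: str


@dataclass(frozen=True)
class Performs:  # one instance of the relationship (the "verb")
    artist: Artist
    song: Song


artists = [Artist("Artist A"), Artist("Artist B")]
songs = [Song("Song X"), Song("Song Y")]

# The relationship as a whole is the set of all its instances.
performs = {
    Performs(artists[0], songs[0]),
    Performs(artists[0], songs[1]),
    Performs(artists[1], songs[1]),
}


def songs_by(artist):
    """Titles of every song linked to `artist` through Performs."""
    return sorted(p.song.title for p in performs if p.artist == artist)


print(songs_by(artists[0]))
```

In a relational database the same shape would typically appear as three tables: one per entity type and one linking them.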

http://www.phlonx.com/resources/nf3/

First Normal Form (1NF) needs “atomicity”: the indivisibility of an attribute into similar parts.

Does figure A-1 break atomicity? It has columns that contain repeating data, but its rows seem okay. I don’t see how fig. A-1 is really any different from B other than that they filled in the repeating data. It seems like both have elements that make their rows unique. The answer is that it doesn’t and hence is broken up into several different tables until it does.

This tutorial would be easier to follow if it included a database that I was manipulating. Also, if it began with showing the final product (fig. K) displayed next to A-1 I think it would help. Not that it would need to explain what exactly was different in the beginning, only show that we need to go from spreadsheet to database and they each look like this. Then they could go on to explain how that is accomplished and why each step is necessary. As it stands, I had to go back to the beginning and re-read in order to really start to grasp what happened.
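In that spirit, here is a tiny before-and-after sketch of the first step, going from a spreadsheet-style row with a repeating group to two tables. The data is invented and does not reproduce the tutorial's figures; it only assumes the general idea that repeating item columns move into their own table keyed by order and line number.

```python
# Hypothetical spreadsheet-style rows: each order carries a repeating
# group of (item, quantity) pairs, so the "items" cell is not atomic.
flat_rows = [
    {"order_id": 1, "customer": "Acme", "items": [("widget", 2), ("gear", 5)]},
    {"order_id": 2, "customer": "Zenith", "items": [("widget", 1)]},
]

orders = []       # one row per order
order_items = []  # one row per (order, line) pair

for row in flat_rows:
    orders.append({"order_id": row["order_id"], "customer": row["customer"]})
    for line_no, (item, qty) in enumerate(row["items"], start=1):
        order_items.append(
            {"order_id": row["order_id"], "line_no": line_no,
             "item": item, "qty": qty}
        )

print(orders)
print(order_items)
```

Every cell now holds a single value, and each row in order_items is identified by the combination of order_id and line_no.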

Muddiest Point for Week 3

I thought the lecture was clear. I have no outstanding questions this week.

Sunday, September 11, 2011

Muddiest Point for Week 2

We talked about suggesting a controlled vocabulary for tags on delicious.com in order to maintain consistency between different users. This seems like a good idea, and is also talked about a bit in the Galloway article. What I would like to know is whether there is a way to create new tags consistently. That is, can you suggest a way in which new tags would be able to become standardized within the user community? Alternately, should a site like delicious follow the Historic Pittsburgh website and adopt an existing subject thesaurus like the Library of Congress Subject Headings?

Unit 3 Reading Comments

Wikipedia – “Data Compression.” Lossless: Compressed data that is error free. Lossy: Some fidelity or quality is lost in the compression. What I don’t get or is not explained in this article is why you would ever want lossy over lossless. Is lossless a lot bigger?

“Data Compression Basics.”

The examples really help clarify Wikipedia’s poor ‘e’ versus ‘z’ example.

If TIFF files, which are huge, use LZ algorithms then they are somewhat compressed. How large would an image be with no compression at all? And would creating such an image possess any advantages if size was not an issue?
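As a back-of-the-envelope answer to the size question: with no compression at all, every pixel is stored in full, so the size is just width × height × channels × bits per channel. The image dimensions below are hypothetical, and file headers are ignored.

```python
def raw_image_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Bytes needed to store every pixel with no compression at all."""
    return width_px * height_px * channels * bits_per_channel // 8


# A hypothetical 3000 x 2000 pixel scan in 24-bit color (3 channels x 8 bits).
size = raw_image_bytes(3000, 2000)
print(f"{size:,} bytes (about {size / 1_000_000:.0f} MB)")
```

That works out to 18,000,000 bytes for this example, before any LZ-style compression is applied.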

The smooth and jiggled grayscale comparisons are a great example of ‘rounding’ in lossy compression.

Lossy is not loss of quality, but loss of data.
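A small sketch of the difference, using Python's built-in zlib for the lossless part and a simple rounding step as a stand-in for lossy compression. The pixel values are invented, and real lossy formats such as JPEG are far more sophisticated than this rounding.

```python
import zlib

# A made-up strip of 8-bit grayscale values with small "jiggles" around 100.
pixels = bytes([100, 101, 99, 100, 102, 98, 100, 101] * 64)

# Lossless: decompressing returns exactly the bytes that went in.
packed = zlib.compress(pixels)
assert zlib.decompress(packed) == pixels


def quantize(value, step=8):
    """Round a value down to a multiple of `step`, discarding fine detail."""
    return (value // step) * step


# Lossy: throw away the small variations first, then compress.
rounded = bytes(quantize(v) for v in pixels)
packed_rounded = zlib.compress(rounded)
restored = zlib.decompress(packed_rounded)

print("original bytes:        ", len(pixels))
print("lossless compressed:   ", len(packed))
print("rounded + compressed:  ", len(packed_rounded))
print("jiggles recoverable?   ", restored == pixels)
```

After rounding, every value in this strip becomes 96, so the small variations are gone for good. That is the data loss, and the smoother data is also what usually lets the compressor produce a smaller file.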

Ed Galloway, “Imaging Pittsburgh”

Several helpful pieces of advice on working with multiple partners with different agendas on one project.

Coming to terms with consistent subject headings; figuring out a way to cope with an array of metadata styles and still look consistent; and then there is the copyright. The solution to copyright seemed a little weak.

Did image reproduction make any money? I have heard that it rarely does, but perhaps they have an insight into it that I do not?
I like the idea that by offering various collections from different types of institutions that you begin to create context through the images themselves.

Paula Webb, “YouTube and Libraries.”

Wow, someone sure is enthusiastic. Are there no disadvantages to using YouTube? Do you sacrifice any rights to the materials posted as with Facebook, or other social media sites?

Saturday, September 3, 2011

Unit 2: Computer Basics, Digitization Reading Notes and Comments

Wikipedia. "Hardware"

There seem to have been very few changes to the overall structure of the computer. Sure, things have improved in terms of smaller and faster, but the physical components appear to be unchanged almost since inception. So, there's the calculator brain (CPU), a holding tank for current operations (RAM), its central nervous system (the buses), and a reptile back brain (BIOS) running crucial but largely uncontrolled operations like power (the heart) or firmware. If anything has changed, it is the computer's tool-wielding devices (optical drives), though how much change has happened beyond memory size and speed, I'm not sure. Therefore, in these terms the computer almost appears lifelike, at least until we consider that unlike anything in life it must have some type of user in order to do anything. In order to interface with the hardware the user makes use of special organs called input devices, which have changed or evolved slightly over time - e.g. the addition of a mouse and then a few specialized keyboard keys not present on the typewriter from which it was adapted.

Wikipedia. "Software"

The mental functions, or the stuff that makes the machine think; these include its etiquette and protocols. When you anthropomorphize the computer it is no wonder that several philosophers at the dawn of the PC age (maybe now too?) have attempted to show how very machine-like people are, and whereas they have a valid point, have we not created the machine in our own image?

Software is where things get tricky. The hardware is designed to operate without error (ideally), but once the thought process is introduced in the form of software things can start to go off track and we see bugs, or wrong orders, being issued.

A couple issues with this Wikipedia entry. How is "the history of computer software [...] most often traced back to the first software bug in 1946"? What does that even mean? Also confusing is the statement that "it is hard to imagine today that people once felt that software was worthless without a machine." I think I get that the author is driving at the fact that people once thought that computers had to be a bundled hardware/software package and that pieces of software on their own would be worthless. However, the way it currently reads begs the question, "isn't it?"

Stuart D. Lee. "Digitization: Is It Worth It?"

"Not to mention the money." I almost wish he would mention the funding that has cropped up since digitization was deemed a valid method of preservation. It would make the article longer of course, but the issue of ‘if there is money out there to do it people will flock towards it’ seems like a large piece of the story here.
Lee suggests that digital imaging is "the most common form of digitization we all encounter," and perhaps it was generally true in 2001 (maybe it’s still true in libraries?), but I would argue that Lee has overlooked private efforts at digitization. Bootleggers and the commercial industry have both been heavily engaged in digitization from the start, and they have focused not on imaging, but on video and audio. If libraries fell behind on this front, there were always private individuals to pick up the slack.
Is the only real benefit to digitization the access it creates?
One thing I feel Lee overlooks in his cost effectiveness/access discussion is that if an item is photocopied 19+ times then its longevity may have become compromised. Therefore, add on to the copy cost the cost of conservation to this high use item.
On the issue of collection development versus digitization, it is true that having access to journal databases is a wonderful thing. In the long-term however, what do libraries gain from these subscriptions? A digitized rare manuscript's costs are finite, but the yearly cost of database subscription is never ending and if it should ever become beyond an institution's means to finance it they have nothing in the form of collection to show for it. All those journals suddenly vanish while that scanned rare book can be used well over a year and potentially indefinitely.

Doreen Carvajal. "European libraries face problems in digitalizing." New York Times

"The basic problem is that there isn't enough money to digitize everything we want to." I hope this statement was not meant to be a revelation to anyone because if it was, then welcome to the real world.
I guess I fail to see the confusion or problem in the article. Were European efforts in digitization attempting to be completely independent of private finances while trying to compete with the massive resources of a company like Google? Are they surprised that limited government funding just is not enough? Or are European libraries in 2007 just coming to the realization that if a private for profit group gives you money that they are expecting to receive something in return? The concern, as I understand it, is that there exists a danger that Google will crush world culture by digitizing only U.S.-centric materials. Does Google then have a cultural agenda? It does not seem like it since they appear to be reaching out to Europe and their digitization efforts.

Charles Smith. "A Few Thoughts on the Google Books Library Project." EDUCAUSE Quarterly. 2008.

Is the issue really that books will become obsolete and not that a private for-profit company is digitizing them?
"Remember being taken to the library at the beginning of the school year to learn how to access its resources." Most libraries still run these seminars and they continue to be as, or more, relevant now than they ever were. It's true that in order to fine-tune your research abilities you will need to a) research and b) get individualized help, but those seminars exposing the user to a library’s resources are still crucial regardless of experience level.
I'm not sure I agree with Smith's conclusions. It has been my experience that if a researcher actually wants/needs something that they will go to great lengths to get it. If something is digitized then use it, but does it not being digitized preclude use? Perhaps, but only if there is another book or article nearly identical that has been digitized. Certainly, people are lazy and gravitate towards the easiest possible way, but I still have not seen this generation unable to do or think anything that is not Internet accessible. Is this a real concern? If something is not online does it no longer exist? I hope not, because for all the bits available there is still only a fraction of the useful information on the web in the overall scheme of things.