You will develop a short 1300 word document.
Task: Describe how OSINT can be used to supplement your organizational collection plan, and identify 10 sites that can be used to research sites/domains for:
- legitimacy
- sender verification
- list of domains (country codes/domains/extensions, organization domains, other IOC (indicators of compromise) that might need to be researched)
https://github.com/dthroner/Hacking/blob/master/Google%20Hacking%20For%20Penetration%20Testers-Syngress%20(2015).pdf
https://www.youtube.com/watch?v=6dRT5VEvIo8&feature=youtu.be
https://www.youtube.com/watch?v=WQo03MJG0m4
Understanding Metadata
What Is Metadata?
What Does Metadata Do?
Structuring Metadata
Metadata Schemes and Element Sets
Dublin Core
TEI and METS
MODS
EAD and LOM
<indecs>, ONIX, CDWA, and VRA
MPEG
FGDC and DDI
Creating Metadata
Interoperability and Exchange of Metadata
Future Directions
More Information on Metadata
Glossary
Acknowledgements
Understanding Metadata is a revision and expansion of Metadata Made Simpler: A guide for libraries, published by NISO Press in 2001. NISO Press extends its thanks and appreciation to Rebecca Guenther and Jacqueline Radebaugh, staff members in the Library of Congress Network Development and MARC Standards Office, for sharing their expertise and contributing to this publication.
About NISO
NISO, a non-profit association accredited by the American National Standards Institute (ANSI), identifies, develops, maintains, and publishes technical standards to manage information in our changing and ever-more digital environment. NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, re-purposing, storage, metadata, and preservation. NISO standards and information about NISO’s activities and membership are featured on the NISO website <http://www.niso.org>.
This booklet is available for free on the NISO website (www.niso.org) and in hardcopy from NISO Press.
Published by:
NISO Press
National Information Standards Organization
4733 Bethesda Avenue, Suite 300
Bethesda, MD 20814 USA
Email: nisohq@niso.org
Tel: 301-654-2512
Fax: 301-654-1721
URL: www.niso.org
Copyright © 2004 National Information Standards Organization
ISBN: 1-880124-62-9
What Is Metadata?
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.
The term metadata is used differently in different communities. Some use it to refer to machine-understandable information, while others use it only for records that describe electronic resources. In the library environment, metadata is commonly used for any formal scheme of resource description, applying to any type of object, digital or non-digital. Traditional library cataloging is a form of metadata; MARC 21 and the rule sets used with it, such as AACR2, are metadata standards. Other metadata schemes have been developed to describe various types of textual and non-textual objects, including published books, electronic documents, archival finding aids, art objects, educational and training materials, and scientific datasets.
There are three main types of metadata:
• Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
• Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
• Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are:
  − Rights management metadata, which deals with intellectual property rights, and
  − Preservation metadata, which contains information needed to archive and preserve a resource.
Metadata can describe resources at any level of aggregation. It can describe a collection, a single resource, or a component part of a larger resource (for example, a photograph in an article). Just as catalogers make decisions about whether a catalog record should be created for a whole set of volumes or for each particular volume in the set, so the metadata creator makes similar decisions. Metadata can also be used for description at any level of the information model laid out in the IFLA (International Federation of Library Associations and Institutions) Functional Requirements for Bibliographic Records: work, expression, manifestation, or item. For example, a metadata record could describe a report, a particular edition of the report, or a specific copy of that edition of the report.
Metadata can be embedded in a digital object or it can be stored separately. Metadata is often embedded in HTML documents and in the headers of image files. Storing metadata with the object it describes ensures the metadata will not be lost, obviates problems of linking between data and metadata, and helps ensure that the metadata and object will be updated together. However, it is impossible to embed metadata in some types of objects (for example, artifacts). Also, storing metadata separately can simplify the management of the metadata itself and facilitate search and retrieval. Therefore, metadata is commonly stored in a database system and linked to the objects described.
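Metadata embedded in an HTML page usually sits in meta elements in the document head. The following is a minimal Python sketch, assuming the common "DC.element" naming convention for embedded Dublin Core (a convention this booklet does not itself specify), that uses the standard library's html.parser to collect those name/value pairs so they could also be stored separately in a database. The sample page is hypothetical.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}  # element name -> list of values (elements may repeat)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata.setdefault(name, []).append(content)


def extract_embedded_metadata(html_text):
    parser = MetaTagExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Hypothetical page; the DC.* names are an assumed convention, not a requirement.
page = """<html><head>
<title>Metadata Demystified</title>
<meta name="DC.title" content="Metadata Demystified">
<meta name="DC.creator" content="Brand, Amy">
<meta name="DC.creator" content="Daly, Frank">
<meta name="DC.date" content="2003-07">
</head><body>...</body></html>"""

print(extract_embedded_metadata(page))
```

Repeated names are kept as lists, because a resource can have more than one creator.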
What Does Metadata Do?
An important reason for creating descriptive metadata is to facilitate discovery of relevant information. In addition to resource discovery, metadata can help organize electronic resources, facilitate interoperability and legacy resource integration, provide digital identification, and support archiving and preservation.
Resource Discovery
Metadata serves the same functions in resource discovery as good cataloging does by:
• allowing resources to be found by relevant criteria;
• identifying resources;
• bringing similar resources together;
• distinguishing dissimilar resources; and
• giving location information.
Organizing Electronic Resources
As the number of Web-based resources grows exponentially, aggregate sites or portals are increasingly useful in organizing links to resources based on audience or topic. Such lists can be built as static webpages, with the names and locations of the resources “hardcoded” in the HTML. However, it is more efficient and increasingly more common to build these pages dynamically from metadata stored in databases. Various software tools can be used to automatically extract and reformat the information for Web applications.
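Building a portal page dynamically from stored metadata can be sketched in a few lines of Python. The table layout (title, url, topic) and the sample rows are hypothetical; an in-memory SQLite database stands in for whatever database system actually holds the records.

```python
import sqlite3
from html import escape

# Hypothetical metadata table; in practice the records would already exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (title TEXT, url TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO resource VALUES (?, ?, ?)",
    [
        ("Dublin Core Metadata Initiative", "https://www.dublincore.org/", "Standards"),
        ("METS", "https://www.loc.gov/standards/mets/", "Standards"),
        ("Oral history collection", "https://example.org/oral-history", "Collections"),
    ],
)


def build_portal_page(connection):
    """Render a topic-grouped link list from stored metadata instead of hardcoded HTML."""
    rows = connection.execute(
        "SELECT topic, title, url FROM resource ORDER BY topic, title"
    ).fetchall()
    html_parts = []
    current_topic = None
    for topic, title, url in rows:
        if topic != current_topic:
            if current_topic is not None:
                html_parts.append("</ul>")  # close the previous topic's list
            html_parts.append(f"<h2>{escape(topic)}</h2>\n<ul>")
            current_topic = topic
        html_parts.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
    if current_topic is not None:
        html_parts.append("</ul>")
    return "\n".join(html_parts)


print(build_portal_page(conn))
```

Adding a row to the table changes the page on the next render; nothing in the HTML needs editing by hand.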
Interoperability
Describing a resource with metadata allows it to be understood by both humans and machines in ways that promote interoperability. Interoperability is the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality. When systems use defined metadata schemes, shared transfer protocols, and crosswalks between schemes, resources across the network can be searched more seamlessly.
Two approaches to interoperability are cross-system search and metadata harvesting. The Z39.50 protocol is commonly used for cross-system search. Z39.50 implementers do not share metadata but map their own search capabilities to a common set of search attributes. A contrasting approach, taken by the Open Archives Initiative, is for all data providers to translate their native metadata to a common core set of elements and expose this for harvesting. A search service provider then gathers the metadata into a consistent central index to allow cross-repository searching regardless of the metadata formats used by participating repositories.
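The harvesting approach can be illustrated without the network layer. In this Python sketch, two hypothetical repositories with different native field names are translated through crosswalks into a small common core (a few simple Dublin Core elements), and a service provider merges the results into one searchable index. The actual Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) exchanges XML records over HTTP and uses unqualified Dublin Core as its common format; none of that protocol machinery is shown here.

```python
# Two hypothetical repositories with different native field names.
REPOSITORY_A = [{"ttl": "Oral history interview 12", "auth": "Lee, Ann", "yr": "1998"}]
REPOSITORY_B = [{"name": "Metadata demystified", "people": ["Brand, Amy"], "issued": "2003"}]

# Crosswalks: native field -> common core element.
CROSSWALK_A = {"ttl": "title", "auth": "creator", "yr": "date"}
CROSSWALK_B = {"name": "title", "people": "creator", "issued": "date"}


def to_common_core(record, crosswalk):
    """Translate one native record into the shared element set; unmapped fields are dropped."""
    core = {}
    for native_field, value in record.items():
        element = crosswalk.get(native_field)
        if element is None:
            continue
        values = value if isinstance(value, list) else [value]
        core.setdefault(element, []).extend(values)
    return core


def harvest(providers):
    """Service-provider side: gather translated records into one central index."""
    index = []
    for repository_id, records, crosswalk in providers:
        for record in records:
            entry = to_common_core(record, crosswalk)
            entry["source"] = [repository_id]
            index.append(entry)
    return index


def search(index, element, term):
    """Case-insensitive substring search on one common element, across all repositories."""
    term = term.lower()
    return [e for e in index if any(term in v.lower() for v in e.get(element, []))]


central_index = harvest(
    [("repoA", REPOSITORY_A, CROSSWALK_A), ("repoB", REPOSITORY_B, CROSSWALK_B)]
)
print(search(central_index, "title", "metadata"))
```

The search function never sees the native field names; that is the point of translating to a common core before indexing.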
Digital Identification
Most metadata schemes include elements such as standard numbers to uniquely identify the work or object to which the metadata refers. The location of a digital object may also be given using a file name, URL (Uniform Resource Locator), or some more persistent identifier such as a PURL (Persistent URL) or DOI (Digital Object Identifier). Persistent identifiers are preferred because object locations often change, making the standard URL (and therefore the metadata record) invalid. In addition to the actual elements that point to the object, the metadata can be combined to act as a set of identifying data, differentiating one object from another for validation purposes.
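The preference for persistent identifiers, and the idea of combining elements into identifying data, can be sketched as follows. The preference order (DOI, then PURL, then plain URL) and the fingerprint built from title, creator, and date are assumptions for illustration; https://doi.org/ is the public DOI resolver, and the DOI and URLs used below are example values only.

```python
import hashlib
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def resolvable_link(identifiers):
    """Pick the most persistent identifier available and return a URL for it.

    identifiers maps an identifier type ("doi", "purl", "url") to a value.
    The preference order DOI > PURL > URL is an assumption of this sketch.
    """
    if "doi" in identifiers:
        # A DOI is resolved through the doi.org proxy; percent-encode unsafe characters.
        return DOI_RESOLVER + quote(identifiers["doi"], safe="/")
    for kind in ("purl", "url"):
        if kind in identifiers:
            return identifiers[kind]
    return None


def identity_fingerprint(record, elements=("title", "creator", "date")):
    """Combine several descriptive elements into one comparison key (a hypothetical scheme).

    Case and runs of whitespace are normalized so trivial differences do not matter.
    """
    normalized = "|".join(
        " ".join(str(record.get(element, "")).lower().split()) for element in elements
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


print(resolvable_link({"doi": "10.1000/182", "url": "http://example.org/old-location"}))
```

If the object moves, only the resolver's record needs updating; the DOI stored in the metadata stays valid.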
Archiving and Preservation
Most current metadata efforts center around the discovery of recently created resources. However, there is a growing concern that digital resources will not survive in usable form into the future. Digital information is fragile; it can be corrupted or altered, intentionally or unintentionally. It may become unusable as storage media and hardware and software technologies change. Format migration and perhaps emulation of current hardware and software behavior in future hardware and software platforms are strategies for overcoming these challenges.
Metadata is key to ensuring that resources will survive and continue to be accessible into the future. Archiving and preservation require special elements to track the lineage of a digital object (where it came from and how it has changed over time), to detail its physical characteristics, and to document its behavior in order to emulate it on future technologies.
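Tracking lineage usually goes together with fixity checking: each recorded event stores a checksum of the object as it stood at that point, so later corruption or unrecorded alteration can be detected. A minimal Python sketch follows; the event names and record fields are illustrative and are not taken from PREMIS or any other scheme named in this section.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_event(history, event_type, data, note=""):
    """Append a lineage event carrying a fixity value for the object's current bytes."""
    history.append({
        "event": event_type,
        "when": datetime.now(timezone.utc).isoformat(),
        "sha256": sha256_of(data),
        "note": note,
    })


def fixity_ok(history, data):
    """True if the bytes still match the checksum recorded by the most recent event."""
    return bool(history) and history[-1]["sha256"] == sha256_of(data)


history = []
original = b"RIFF....WAVEfmt "  # stand-in bytes for a digitized audio file
record_event(history, "capture", original, "digitized from tape 1")
migrated = original + b"\x00"  # hypothetical output of a format migration
record_event(history, "migration", migrated, "migrated to new format")

print(fixity_ok(history, migrated))  # True
print(fixity_ok(history, original))  # False: bytes differ from the latest checksum
```

The history list is the lineage: it says where the object came from and records each change, and the checksums make the record verifiable.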
Many organizations internationally have worked on defining metadata schemes for digital preservation, including the National Library of Australia, the British Cedars Project (CURL Exemplars in Digital Archives), and a joint Working Group of OCLC and the Research Libraries Group (RLG). The latter group developed a framework outlining types of preservation metadata. A follow-up group, PREMIS (PREservation Metadata: Implementation Strategies), also sponsored by OCLC and RLG, is developing a set of core elements and strategies for the encoding, storage, and management of preservation metadata within a digital preservation system. Many of these initiatives are based on or compatible with the ISO Reference Model for an Open Archival Information System (OAIS).
Structuring Metadata
Metadata schemes (also called schemas) are sets of metadata elements designed for a specific purpose, such as describing a particular type of information resource. The definition or meaning of the elements themselves is known as the semantics of the scheme. The values given to metadata elements are the content.
Metadata schemes generally specify names of elements and their semantics. Optionally, they may specify content rules for how content must be formulated (for example, how to identify the main title), representation rules for content (for example, capitalization rules), and allowable content values (for example, terms must be used from a specified controlled vocabulary).
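The parts of a scheme described above (element names, semantics, and optional rules on allowable values) can be made concrete with a small validator. The scheme below is hypothetical: it makes title required, which Dublin Core itself does not, and draws its type values from a few terms of the DCMI Type Vocabulary.

```python
# A hypothetical scheme: element names, whether required, and optional controlled vocabularies.
SCHEME = {
    "title": {"required": True},
    "creator": {"required": False},
    "type": {"required": False, "vocabulary": {"Text", "Image", "Sound", "Dataset"}},
}


def validate(record, scheme):
    """Return a list of problems; an empty list means the record conforms to the scheme.

    record maps element names to lists of values (elements may repeat).
    """
    problems = []
    for element, rules in scheme.items():
        if rules.get("required") and not record.get(element):
            problems.append(f"missing required element: {element}")
    for element, values in record.items():
        if element not in scheme:
            problems.append(f"unknown element: {element}")
            continue
        vocabulary = scheme[element].get("vocabulary")
        if vocabulary:
            for value in values:
                if value not in vocabulary:
                    problems.append(f"{element}: '{value}' not in controlled vocabulary")
    return problems


print(validate({"title": ["Metadata Demystified"], "type": ["Text"]}, SCHEME))  # []
print(validate({"type": ["Book"], "colour": ["red"]}, SCHEME))
```

Content rules and representation rules (how to choose the main title, how to capitalize it) are harder to check mechanically and are left out.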
There may also be syntax rules for how the elements and their content should be encoded. A metadata scheme with no prescribed syntax rules is called syntax independent. Metadata can be encoded in any definable syntax. Many current metadata schemes use SGML (Standard Generalized Mark-up Language) or XML (Extensible Mark-up Language). XML, developed by the World Wide Web Consortium (W3C), is a simplified subset of SGML that, unlike HTML, allows for locally defined tag sets and the easy exchange of structured information. SGML, of which XML is a subset and HTML (through version 4) an application, allows for the richest mark-up of a document. Useful XML tools are becoming widely available as XML plays an increasingly crucial role in the exchange of a variety of data on the Web.
Dublin Core Example
Title="Metadata Demystified"
Creator="Brand, Amy"
Creator="Daly, Frank"
Creator="Meyers, Barbara"
Subject="metadata"
Description="Presents an overview of metadata conventions in publishing."
Publisher="NISO Press"
Publisher="The Sheridan Press"
Date="2003-07"
Type="Text"
Format="application/pdf"
Identifier="http://www.niso.org/standards/resources/Metadata_Demystified.pdf"
Language="en"
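The Dublin Core example above uses an informal name="value" notation. As a sketch, the same record can be serialized as XML with Python's xml.etree.ElementTree, using the Dublin Core element namespace http://purl.org/dc/elements/1.1/. The enclosing metadata element is a hypothetical wrapper, since Dublin Core does not define a record container; repeated elements such as Creator simply become repeated XML elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The Dublin Core example as (element, value) pairs; order is free and elements repeat.
record = [
    ("title", "Metadata Demystified"),
    ("creator", "Brand, Amy"),
    ("creator", "Daly, Frank"),
    ("creator", "Meyers, Barbara"),
    ("subject", "metadata"),
    ("description", "Presents an overview of metadata conventions in publishing."),
    ("publisher", "NISO Press"),
    ("publisher", "The Sheridan Press"),
    ("date", "2003-07"),
    ("type", "Text"),
    ("format", "application/pdf"),
    ("identifier", "http://www.niso.org/standards/resources/Metadata_Demystified.pdf"),
    ("language", "en"),
]

root = ET.Element("metadata")  # hypothetical wrapper element; not defined by Dublin Core
for element, value in record:
    ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The same content could be written in another syntax entirely, which is what makes Dublin Core syntax independent.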
Metadata Schemes and Element Sets
Many different metadata schemes are being developed in a variety of user environments and disciplines. Some of the most common ones are discussed in this section.
Dublin Core
The Dublin Core Metadata Element Set arose from discussions at a 1995 workshop sponsored by OCLC and the National Center for Supercomputing Applications (NCSA). As the workshop was held in Dublin, Ohio, the element set was named the Dublin Core. The continuing development of the Dublin Core and related specifications is managed by the Dublin Core Metadata Initiative (DCMI).
The original objective of the Dublin Core was to define a set of elements that could be used by authors to describe their own Web resources. Faced with a proliferation of electronic resources and the inability of the library profession to catalog all these resources, the goal was to define a few elements and some simple rules that could be applied by noncatalogers. The original 13 core elements were later increased to 15: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.
The Dublin Core was developed to be simple and concise, and to describe Web-based documents. However, Dublin Core has been used with other types of materials and in applications demanding some complexity. There has historically been some tension between supporters of a minimalist view, who emphasize the need to keep the elements to a minimum and the semantics and syntax simple, and supporters of a structuralist view, who argue for finer semantic distinctions and more extensibility for particular communities.
These discussions have led to a distinction between qualified and unqualified (or simple) Dublin Core. Qualifiers can be used to refine (narrow the scope of) an element, or to identify the encoding scheme used in representing an element value. The element Date, for example, can be used with the refinement qualifier created to narrow the meaning of the element to the date the object was created. Date can also be used with an encoding scheme qualifier to identify the format in which the date is recorded, for example, following the ISO 8601 standard for representing date and time.
All Dublin Core elements are optional and all are repeatable. The elements may be presented in any order. While the Dublin Core description recommends the use of controlled values for fields where they are appropriate (for example, controlled vocabularies for the Subject field), this is not required. However, working groups have been established to discuss authoritative lists for certain elements such as Resource Type.
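Qualification and its reverse can be sketched as follows. Each statement carries an element, an optional refinement, and an optional encoding scheme. The check covers only the date forms YYYY, YYYY-MM, and YYYY-MM-DD from W3CDTF, a profile of ISO 8601 that DCMI lists as an encoding scheme; times are omitted and day numbers are not checked against month length. Dropping the qualifiers yields simple Dublin Core, which works because a refinement only narrows the meaning of its element.

```python
import re

# W3CDTF date forms only (YYYY, YYYY-MM, YYYY-MM-DD); times are left out of this sketch.
W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")

# Each statement: element, optional refinement qualifier, optional encoding scheme, value.
qualified = [
    {"element": "date", "refinement": "created", "scheme": "W3CDTF", "value": "2003-07"},
    {"element": "title", "refinement": None, "scheme": None, "value": "Metadata Demystified"},
]


def check_encoding(statement):
    """Verify values that claim an encoding scheme; other values pass unchecked."""
    if statement["scheme"] == "W3CDTF":
        return W3CDTF_DATE.fullmatch(statement["value"]) is not None
    return True


def dumb_down(statements):
    """Produce simple (unqualified) Dublin Core by dropping refinements and schemes.

    A Date.created value is still a valid Date, so nothing is lost but precision.
    """
    simple = {}
    for s in statements:
        simple.setdefault(s["element"], []).append(s["value"])
    return simple


print(all(check_encoding(s) for s in qualified))  # True
print(dumb_down(qualified))
```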
While Dublin Core leaves content rules to the particular implementation, the DCMI encourages the adoption of application profiles (domain-specific rules) for particular domains such as education and government. An application profile for libraries is being developed by the Libraries Working Group.
Because of its simplicity, the Dublin Core element set is now used by many outside the library community: researchers, museum curators, and music collectors, to name only a few. There are hundreds of projects worldwide that use the Dublin Core either for cataloging or to collect data from the Internet; more than 50 of these have links on the DCMI website. The subjects range from cultural heritage and art to math and physics. Meanwhile, the Dublin Core Metadata Initiative has expanded beyond simply maintaining the Dublin Core Metadata Element Set into an organization that describes itself as “dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for discovery systems.”
The Text Encoding Initiative (TEI)
The Text Encoding Initiative is an international project to develop guidelines for marking up electronic texts such as novels, plays, and poetry, primarily to support research in the humanities. In addition to specifying how to encode the text of a work, the TEI Guidelines for Electronic Text Encoding and Interchange also specify a header portion, embedded in the resource, that consists of metadata about the work. The TEI header, like the rest of the TEI, is defined as an SGML DTD (Document Type Definition), a set of tags and rules defined in SGML syntax that describe the structure and elements of a document. This SGML mark-up becomes part of the electronic resource itself. Since the TEI DTD is rather large and complicated in order to apply to a vast range of texts and uses, a simpler subset of the DTD, known as TEI Lite, is commonly used in libraries.
It is assumed that TEI-encoded texts are electronic versions of printed texts. Therefore the TEI Header can be used to record bibliographic information about both the electronic version of the text and the non-electronic source version. The basic bibliographic information is similar to that recorded in library cataloging and can be mapped to and from MARC. However, there are also elements defined to record details about how the text was transcribed and edited, how mark-up was performed, what revisions were made, and other non-bibliographic facts. Libraries tend to use TEI headers when they have collections of SGML-encoded full text. Some libraries use TEI headers to derive MARC records for their catalogs, while others use MARC records as the basis for creating TEI header descriptions for the source texts.
Metadata Encoding and Transmission Standard (METS)
The Metadata Encoding and Transmission Standard (METS) was developed to fill the need for a standard data structure for describing complex digital library objects. METS is an XML Schema for creating XML document instances that express the structure of digital library objects, the associated descriptive and administrative metadata, and the names and locations of the files that comprise the digital object.
The metadata necessary for successful management and use of digital objects is both more extensive than and different from the metadata used for managing collections of printed works and other physical materials. Structural metadata is needed to ensure that separately digitized files (for example, different pages of a digitized book) are structured appropriately. Technical metadata is needed for information about the digitization process so that scholars may determine how accurate a reflection of the original the digital version provides. Other technical metadata is required for internal purposes in order to periodically refresh and migrate the data, ensuring the durability of valuable resources.
METS was originally an outgrowth of the Making of America II project, a digitization project of major research libraries that attempted to address these metadata issues, in part by providing an encoding format for metadata for textual and image-based works. The Digital Library Federation (DLF) built on that earlier work to create METS, a standard schema for expressing and packaging together descriptive, administrative, and structural metadata for objects within a digital library. Expressed using the XML schema language, METS provides a document format for encoding the metadata necessary for management of digital library objects within a repository and for exchange between repositories.
Metadata in Action
An oral historian makes tape-recordings of interviews with members of a particular ethnic group. Interviewees sign a paper release form giving intellectual property rights to the historian. Most interviewees grant permission to disseminate the interviews in print and electronically, but several restrict publication and dissemination until 25 years after death.
Information about each interview is kept in a database: Interviewer, Interviewee, Date, Place, etc. Each interview follows a questionnaire format. The questionnaire exists as a text file. The tapes, release forms, database, and text file are donated to a library that has a special collection focusing on the particular ethnic group.
The tapes are digitized. Since each interview runs over several tapes, technicians record structural metadata to keep component parts of each interview together. Technicians record administrative metadata such as file names, location of each interview in the files, equipment used, the methods of digitizing and assuring quality and completeness, file formats, etc. Different segments of this metadata allow the audio files to be automatically tracked, accessed, stored, refreshed, and migrated.
An archivist expands the database to include the persistent identifier of each interview, thereby linking the audio file to the descriptive metadata. The names of the data elements are revised to match Dublin Core terminology, including qualifiers used specifically for audio
(continued below)
A METS document contains seven major sections:
• METS Header – Contains metadata describing the METS document itself, including such information as creator, editor, etc.
• Descriptive Metadata – Points to descriptive metadata external to the METS document (for example, a MARC record in an OPAC or an Encoded Archival Description finding aid maintained on a webserver), or to internally embedded descriptive metadata, or both.
• Administrative Metadata – Provides information regarding how the files are created and stored, intellectual property rights, the original source object from which the digital library object derives, and the provenance of the files comprising the digital library object.
• File Section – Lists all files containing content that comprise the electronic versions of the digital object.
• Structural Map – Outlines a hierarchical structure for the digital library object and links the elements of that structure to content files and metadata that pertain to each element.
• Structural Links – Allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map.
• Behavior – Associates executable behaviors with content in the METS object.
The METS header, file section, structural map, structural links, and behavior sections are defined within the METS schema. METS is less prescriptive about descriptive and administrative metadata, relying on extension schemas (externally developed metadata schemes) to provide specific elements. The METS Editorial Board has endorsed three descriptive metadata schemes: simple Dublin Core, MARCXML, and MODS (discussed below). For technical metadata, the METS website makes available schemas for text and digital still images. The still-image schema is called MIX, Metadata for Images in XML Schema, and is based on a proposed NISO standard, Z39.87, Data Dictionary: Technical Metadata for Digital Still Images. Further work is in progress on extension schemas for audio, video, and websites. Another current area of concentration for the METS development community is the creation of METS application profiles to give guidance regarding the creation of METS documents for particular object types.
Use of the METS schema is widespread. A list of implementation registries using METS, a tutorial, and other important information can be found on the METS website.
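A skeletal METS document for a digitized book might be generated as below: a file section lists one image per page, and a structural map puts the pages in order by pointing at those files. This is an unvalidated Python sketch. The header, descriptive, and administrative sections are omitted, and the file names, IDs, USE value, and MIME type are hypothetical; the namespace URIs are those of METS and XLink.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def mets_for_pages(page_files):
    """Build a file section plus a structural map ordering separately digitized pages."""
    root = ET.Element(m("mets"))
    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), USE="master")
    struct_map = ET.SubElement(root, m("structMap"), TYPE="physical")
    book = ET.SubElement(struct_map, m("div"), TYPE="book")
    for order, href in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(group, m("file"), ID=file_id, MIMETYPE="image/tiff")
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
        # Each page in the structural map points at its content file by ID.
        page = ET.SubElement(book, m("div"), TYPE="page", ORDER=str(order))
        ET.SubElement(page, m("fptr"), FILEID=file_id)
    return root


doc = mets_for_pages(["page1.tif", "page2.tif"])
print(ET.tostring(doc, encoding="unicode"))
```

The file section and structural map are kept apart on purpose: the same files can be organized by more than one structural map (for example, physical pages and logical chapters).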
Metadata Object Description Schema (MODS)
The Metadata Object Description Schema (MODS) is a descriptive metadata schema that is a derivative of MARC 21 and intended to either carry selected data from existing MARC 21 records or enable the creation of original resource description records. It includes a subset of MARC fields and uses language-based tags rather than the numeric ones used in MARC 21 records. In some cases, it regroups elements from the MARC 21 bibliographic format. Like METS, MODS is expressed using the XML schema language.
Although the MODS standard can stand on its own, it may also complement other metadata formats. Because of its flexibility and use of XML, MODS may potentially be used as a Z39.50 Next Generation specified format, an extension schema to METS, a metadata set for harvesting, and a format for creating original resource metadata records in an XML syntax.
Rich description of electronic resources is a particular focus of MODS, which provides some advantages over other metadata schemes. MODS elements are richer than the Dublin Core; its elements are more compatible with library data than the ONIX or Dublin Core standards; and it is simpler to apply than the full MARC 21 bibliographic format. With its use of XML Schema language, MODS offers enhancements over MARC 21, such as the use of an optional ID attribute to facilitate linking at the element level; the ability to specify language, script, and transliteration scheme at the element level; and the ability to embed a rich description of components in the relatedItem element.
The ability in MODS to give granular descriptions of constituent parts of an object works particularly well with the METS structural map for complex digital library objects.
Metadata in Action
(continued from above)
materials. Information on rights and permissions is entered.
An archivist creates an EAD finding aid for the audio collection using the database as the core. Portions of the questionnaire text file are incorporated as a rich source of subject keywords. A MARC record is derived from the EAD finding aid and added to OCLC and RLIN.
A webpage is created where researchers can access the finding aid, search the database, and listen to the audio files. Interviews coded as restricted are invisible to the search program until the date when they become open to the public. Administrative, structural, and descriptive metadata is created for the webpage to hold all the pieces together, allow them to be managed, and allow them to be accessed.
The library participates in a metadata harvesting protocol to provide extracts of local metadata in a common format to a service provider so that information about the collection is automatically included in a number of relevant tools such as catalogs and portals.
The webpage is linked to the library’s website dedicated to resources about the ethnic group, where it is available to researchers in context with archival and visual materials, digitized secondary sources, etc. Administrative, structural, and descriptive metadata at the website level has also been created.
A MODS Record Example
<mods>
  <titleInfo>
    <title>Metadata demystified</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2003</dateIssued>
    <place>
      <placeTerm type="text">Bethesda, MD</placeTerm>
    </place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
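Because MODS derives from MARC 21 and is richer than Dublin Core, mapping MODS to simple Dublin Core means discarding detail. The Python sketch below applies a deliberately partial crosswalk to the MODS Record Example. It assumes the record carries no XML namespace, as in the example (production MODS records use the namespace http://www.loc.gov/mods/v3), and its treatment of names (authors become creators, every other role a contributor) is a simplification, not the Library of Congress's published MODS-to-Dublin Core mapping.

```python
import xml.etree.ElementTree as ET

MODS_RECORD = """<mods>
  <titleInfo><title>Metadata demystified</title></titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2003</dateIssued>
    <place><placeTerm type="text">Bethesda, MD</placeTerm></place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>"""


def mods_to_dc(xml_text):
    """Map a few MODS elements onto simple Dublin Core; anything unmapped is dropped."""
    mods = ET.fromstring(xml_text)
    dc = {}

    def add(element, value):
        if value and value.strip():
            dc.setdefault(element, []).append(value.strip())

    for title in mods.findall("titleInfo/title"):
        add("title", title.text)
    for name in mods.findall("name"):
        family = name.findtext("namePart[@type='family']", default="")
        given = name.findtext("namePart[@type='given']", default="")
        role = name.findtext("role/roleTerm", default="").strip().lower()
        # Simplification: authors become creators, every other role a contributor.
        add("creator" if role == "author" else "contributor",
            ", ".join(part for part in (family, given) if part))
    add("type", mods.findtext("typeOfResource"))
    add("date", mods.findtext("originInfo/dateIssued"))
    add("publisher", mods.findtext("originInfo/publisher"))
    for identifier in mods.findall("identifier"):
        add("identifier", identifier.text)
    return dc


print(mods_to_dc(MODS_RECORD))
```

The place of publication and the identifier type (ISBN) have no slot in this mapping and are lost, which illustrates why MODS is described as richer than Dublin Core.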
The Encoded Archival Description (EAD)
The Encoded Archival Description (EAD) was developed as a way of marking up the data contained in finding aids so that they can be searched and displayed online.
In archives and special collections, the finding aid is an important tool for resource description. Finding aids differ from catalog records by being much longer, more narrative and explanatory, and highly structured in a hierarchical fashion. They generally start with a description of the collection as a whole, indicating what types of materials it contains and why they are important. If the collection consists of the personal papers of an individual, there can be a lengthy biography of that person. The finding aid describes the series into which the collection is organized (such as correspondence, business records, personal papers, and campaign speeches) and ends with an itemization of the contents of the physical boxes and folders comprising the collection.
Like the TEI Header, the EAD is defined as an SGML DTD. It begins with a header section that describes the finding aid itself (for example, who wrote it) and then goes on to the description of the collection as a whole and successively more detailed information about the records or series within the collection. If individual items being described exist in digital form, the EAD can include pointers to the digital objects. The 2002 version of the EAD DTD provides support for both SGML and XML through the use of defined “switches” for turning off features used only in SGML and turning on features used only in XML. The EAD standard is maintained jointly by the Library of Congress and the Society of American Archivists.
The EAD is particularly popular in academic libraries, historical societies, and museums with large special collections. Many of these collections contain unique materials unavailable elsewhere, and often the materials in the collections are not individually cataloged like traditional library materials. By creating searchable EAD finding aids, libraries and archives can increase awareness of their unique collections among the Internet community.
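The hierarchy a finding aid encodes, from the collection down through series to individual files with their boxes and folders, can be walked recursively. The small finding aid below is hypothetical and uses EAD 2002 element names (archdesc, did, unittitle, container, dao, dsc, and the numbered components c01 to c12) without a namespace, as in the DTD version; the href attribute on dao is assumed to hold the link to the digital object. This Python sketch reads only those elements.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical finding aid using EAD 2002 element names (no namespace, as in the DTD version).
FINDING_AID = """<ead>
  <eadheader>
    <filedesc><titlestmt><titleproper>Guide to the Oral History Collection</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Oral History Collection</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Interviews</unittitle></did>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <container type="folder">3</container>
            <unittitle>Interview 12, audio</unittitle>
            <dao href="https://example.org/audio/interview12"/>
          </did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Release forms</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Unnumbered <c> or numbered <c01> through <c12> component elements.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")


def outline(element, depth=0, lines=None):
    """Describe a unit, then recurse into its components, most general to most detailed."""
    if lines is None:
        lines = []
    did = element.find("did")
    if did is not None:
        label = f"[{element.get('level')}] {did.findtext('unittitle', default='')}"
        containers = ", ".join(f"{c.get('type')} {c.text}" for c in did.findall("container"))
        if containers:
            label += f" ({containers})"
        dao = did.find("dao")
        if dao is not None:
            label += f" -> {dao.get('href')}"  # pointer to the digitized item
        lines.append("  " * depth + label)
    # Top-level components sit inside <dsc>; nested components sit directly in their parent.
    parent = element.find("dsc") if element.tag == "archdesc" else element
    if parent is not None:
        for child in parent:
            if COMPONENT.fullmatch(child.tag):
                outline(child, depth + 1, lines)
    return lines


ead = ET.fromstring(FINDING_AID)
print("\n".join(outline(ead.find("archdesc"))))
```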
Learning Object Metadata
The IEEE Learning Technology Standards Committee (LTSC) developed the Learning Object Metadata (LOM) standard (IEEE 1484.12.1-2002) to enable the use and re-use of technology-supported learning resources such as computer-based training and distance learning. The LOM defines the minimal set of attributes to manage, locate, and evaluate learning objects. The attributes are grouped into eight categories:
• General, containing information about the object as a whole;
• Lifecycle, containing metadata about the object’s evolution;
• Technical, with descriptions of the technical characteristics and requirements;
• Educational, containing the educational / pedagogical attributes;
• Rights, describing the intellectual property rights and use conditions;
• Relation, identifying related objects;
• Annotation, containing comments and the date and author of the comments; and
• Classification, which describes where the object falls within a particular classification system.
Within each category is a hierarchy of data elements to which the metadata values are assigned. Examples of learning-related metadata elements found in the Educational category are Typical Age Range (of the intended user), Difficulty, Typical Learning Time, and Interactivity Level.
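The category-then-element hierarchy can be pictured as dotted paths. In this Python sketch the category names follow the eight LOM categories listed above (lower-cased here); the element names and values inside them are illustrative, modeled on the Educational examples just mentioned, and are not a complete or validated LOM binding.

```python
# The eight LOM categories listed above, lower-cased for this sketch.
LOM_CATEGORIES = (
    "general", "lifecycle", "technical", "educational",
    "rights", "relation", "annotation", "classification",
)

# Illustrative record: category -> element -> value.
record = {
    "general": {"title": "Introduction to Metadata", "language": "en"},
    "educational": {
        "typicalAgeRange": "18-",
        "difficulty": "easy",
        "typicalLearningTime": "PT1H30M",  # ISO 8601 duration: 1 hour 30 minutes
        "interactivityLevel": "low",
    },
    "rights": {"copyrightAndOtherRestrictions": "yes"},
}


def flatten(lom_record):
    """Turn the category -> element hierarchy into dotted paths, rejecting unknown categories."""
    paths = {}
    for category, elements in lom_record.items():
        if category not in LOM_CATEGORIES:
            raise ValueError(f"not a LOM category: {category}")
        for element, value in elements.items():
            paths[f"{category}.{element}"] = value
    return paths


for path, value in sorted(flatten(record).items()):
    print(path, "=", value)
```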
The IMS Global Learning Consortium has developed a suite of specifications to enable interoperability in a learning environment. Its Meta-Data Information Model specification is based on the IEEE LOM scheme with only minor modifications.
E-Commerce – <indecs> and ONIX
Metadata schemas are increasingly being developed to support electronic commerce applications. The <indecs> Framework (Interoperability of Data in E-Commerce Systems) was an international collaborative effort supported by the European Commission’s Info 2000 Programme. The collaborators were major rights owners, such as publishers and members of the recording industry, who wanted to develop a framework for metadata standards to support network commerce in intellectual property.
The foundation of the <indecs> work is a data model for intellectual property and its transfer. Rather than developing a new metadata scheme, <indecs> sought to develop a common framework to allow various schemes for transactions related to different genres such as music, journal articles, and books to be able to interchange information, particularly that related to intellectual property rights. In order to support this common framework, <indecs> has developed a minimal kernel of required metadata.
Several organizations have built
on the <indecs> Framework to
develop specific metadata schemas.
Among them is the ONIX (Online
Information Exchange) International
standard. ONIX is an XML-based
metadata scheme developed by
publishers under the auspices of a
number of book industry trade
groups in the United States and
Europe. The original ONIX
specification was a direct response
to the enormous growth in online
book sales and the realization that
books described with images, cover
blurbs, reviews, and similar
information significantly outsold
books without this information.
Therefore ONIX for Books has
elements to record a wide range of
evaluative and promotional
information as well as basic bibliographic
and trade data. ONIX for Serials is
in development to define serials
product metadata at the title, item,
and subscription package levels.
While ONIX information was
designed for use in the commerce
cycle of a publication, it may also
provide a source for enrichment of
library-created catalog records; the
Bibliographic Enrichment Advisory
Team (BEAT) project at the Library
of Congress is experimenting with
this use. ONIX metadata may also
be used by libraries in the future for
the creation of a beginning
bibliographic record. Mappings
between ONIX for Books and both
MARC 21 and UNIMARC have
already been created.
Visual Objects – CDWA and VRA
Metadata used to describe visual
objects such as a painting or
sculpture has its own special
requirements. The Art Information
Task Force (AITF) developed a
conceptual framework for
describing and accessing information about
objects and images called
Categories for the Description of
Works of Art (CDWA). Some 30
categories were defined, most with
multiple subcategories. Some
examples of the specialized
descriptive elements relevant to
artworks included are: Orientation,
Dimensions, Condition,
Inscriptions, Conservation Treatment, and
Exhibition / Loan History.
Typically, visual resources
collections used in teaching art
history and similar subjects do not
contain original art works but rather
slides or photographs of the original
art. Metadata for these materials
therefore has to accommodate the
description of multiple levels of
related resources, such as an
original painting, a slide of the
painting, and a digitized image of
the slide. The VRA Core Categories
build on and expand the CDWA
work to define a single metadata
element set that can be used to
describe the work (the actual
painting, photograph, sculpture,
building, etc.) as well as the images
(visual representations) of them.
Version 3.0 of the VRA Core
Categories consists of 17 metadata
elements which can be used as
applicable to describe each of these
versions and relate them to each
other: Record Type, Type, Title,
Measurements, Material,
Technique, Creator, Date, Location, ID
Number, Style/Period, Culture,
Subject, Relation, Description,
Source, and Rights. Like the Dublin
Core, the VRA Core scheme does
not specify any particular syntax or
rules for representing content.
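The Python sketch below shows how a single element set can describe both a work and its images. The 17 element names are the VRA Core 3.0 elements listed above. The `make_record` and `trace_to_work` helpers, the identifiers, and the convention of writing "image of <ID>" in Relation are illustrative assumptions. VRA itself deliberately prescribes no syntax.

```python
# Hypothetical sketch: VRA Core 3.0-style records for a work and its images.
VRA_CORE_3 = [
    "Record Type", "Type", "Title", "Measurements", "Material", "Technique",
    "Creator", "Date", "Location", "ID Number", "Style/Period", "Culture",
    "Subject", "Relation", "Description", "Source", "Rights",
]


def make_record(values: dict[str, str]) -> dict[str, str]:
    """Build a record, rejecting any element outside the VRA Core 3.0 set."""
    unknown = [name for name in values if name not in VRA_CORE_3]
    if unknown:
        raise ValueError(f"not VRA Core 3.0 elements: {unknown}")
    return dict(values)


painting = make_record({"Record Type": "work", "Type": "painting",
                        "Title": "Example Landscape", "ID Number": "W-001"})
slide = make_record({"Record Type": "image", "Type": "slide",
                     "ID Number": "S-042", "Relation": "image of W-001"})
digital = make_record({"Record Type": "image", "Type": "digital image",
                       "ID Number": "D-108", "Relation": "image of S-042"})


def trace_to_work(record: dict[str, str], by_id: dict[str, dict[str, str]]) -> list[str]:
    """Follow Relation links from an image back to the original work."""
    chain = [record["ID Number"]]
    while "Relation" in record:
        record = by_id[record["Relation"].split()[-1]]  # last word is the target ID
        chain.append(record["ID Number"])
    return chain


by_id = {r["ID Number"]: r for r in (painting, slide, digital)}
print(trace_to_work(digital, by_id))  # ['D-108', 'S-042', 'W-001']
```

The same element set describes every level: painting, slide, and digitized image of the slide. Only the Record Type and Relation values distinguish them.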
Both CDWA and VRA
emphasize the use of controlled
vocabularies for specified elements.
A number of existing vocabularies
are suggested and communities are
encouraged to develop additional
vocabularies as needed.
MPEG Multimedia Metadata
The ISO/IEC Moving Picture
Experts Group (MPEG) has
developed a suite of standards for
coded representation of digital
audio and video. Two of the
standards address metadata:
MPEG-7, Multimedia Content
Description Interface (ISO/IEC
15938), and MPEG-21, Multimedia
Framework (ISO/IEC 21000).
MPEG-7 defines the metadata
elements, structure, and
relationships that are used to describe
audiovisual objects including still
pictures, graphics, 3D models,
music, audio, speech, video, or
multimedia collections. It is a
multi-part standard that addresses:
• Description Tools including
Descriptors that define the
syntax and the semantics of
each metadata element and
Description Schemes that
specify the structure and
semantics of the relationships
between the elements.
• A Description Definition
Language to define the syntax of the
Description Tools, allow the
creation of new Description
Schemes, and allow the
extension and modification of
existing Description Schemes.
• System tools, to support storage
and transmission,
synchronization of descriptions with
content, and management and
protection of intellectual property.
Descriptors for visual and audio
are defined separately using a
hierarchy of elements and
subelements. For visual objects there
are descriptors for Basic Structure,
Color, Texture, Shape, Motion,
Localization, and Face Recognition.
Audio descriptors are divided into
two categories: low-level
descriptors that are common to
audio objects across most
applications, and high-level
descriptors that are specific to
particular applications of audio. The
cross-application low-level
descriptors cover Structures and Features
(temporal and spectral). The
domain-specific high-level
descriptors include such elements as
Musical Instrument Timbre, Melody
Description, and Spoken Content
Description.
The Description Schemes are
based on XML, and can be
expressed in textual form suitable
for editing, searching, filtering, and
human readability; or in a binary
form for storage, transmission, and
streaming delivery. Since the full
description of a multimedia object
can be quite complex, the standard
provides for a Summary Description
Scheme geared to browsing and
navigation.
The standard envisions that
search engines could use MPEG-7
metadata descriptions to identify
audiovisual objects in entirely new
ways, such as digitizing a musical
phrase played on a keyboard and
then retrieving a list of musical
pieces that contain the sequence of
notes; drawing some lines on an
electronic drawing tablet and
retrieving images with similar
graphics; or using a voice excerpt
to retrieve related speech files,
photographs, video clips, and
biographical information of the
speaker. These retrieval
mechanisms are outside the scope of
MPEG-7, but the standards
developers wanted to
accommodate these futuristic
capabilities and have included
many interoperability requirements
beyond the typical metadata
elements.
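The Python sketch below illustrates the musical-phrase scenario. As the text notes, this kind of retrieval lies outside MPEG-7 itself. Here each piece is reduced to a hypothetical list of MIDI note numbers. Matching on successive pitch intervals, so that a phrase played in another key still matches, is a technique assumed for the example. MPEG-7 does not define it.

```python
# Hypothetical sketch: find pieces containing a played phrase, ignoring key.
def intervals(notes: list[int]) -> list[int]:
    """Successive pitch differences; unchanged when the phrase is transposed."""
    return [b - a for a, b in zip(notes, notes[1:])]


def contains(seq: list[int], sub: list[int]) -> bool:
    """True if sub occurs as a contiguous run inside seq."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def find_pieces(query: list[int], library: dict[str, list[int]]) -> list[str]:
    q = intervals(query)
    return [title for title, notes in library.items() if contains(intervals(notes), q)]


library = {
    "Piece A": [60, 62, 64, 65, 67, 65, 64],  # C D E F G F E
    "Piece B": [67, 67, 69, 67, 72, 71],
}
# D E F# G is C D E F transposed up a tone, so it matches Piece A.
print(find_pieces([62, 64, 66, 67], library))  # ['Piece A']
```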
MPEG-21 was developed to
address the need for an overarching
framework to ensure interoperability
of digital multimedia objects. The
multi-part standard is not yet
complete but is intended to include
the following:
• Part 1: Vision, Technologies and
Strategy provides the overview
of the complete vision and plan
for the framework. It was issued
as an ISO technical report (ISO/IEC
TR 21000-1:2001) and is
available as a free download
from ISO’s publicly available
standards website. A second
edition of the vision document is
underway to address comments
and suggestions received from
other organizations following the
initial publication.
• Part 2: Digital Item Declaration,
issued in 2003, describes a
model for defining Digital Items.
It includes a description of the
syntax and semantics of each of
the Digital Item Declaration
elements and a corresponding
XML schema.
• Part 3: Digital Item Identification,
also issued in 2003, describes
how to uniquely identify Digital
Items and how to link Digital
Items with related information
such as descriptive metadata.
• Part 4: Intellectual Property
Management and Protection is
still in development. It is intended
to define the framework for
ensuring interoperability of
intellectual property
management tools, including
authentication, and to accommodate the
Rights information defined in the
following two parts.
• Part 5: Rights Expression
Language, issued in 2004, is a
machine-readable language that
can declare rights and
permissions.
• Part 6: Rights Data Dictionary is
still in development. It will define
a standard set of terms to be
used with the Rights Expression
Language. It is also expected to
include specifications for
mapping and transforming rights
metadata terminology. The
Rights Data Dictionary and
Expression Language are being
viewed as models for the
handling of intellectual property
metadata for applications
beyond audiovisual.
• Part 7: Digital Item Adaptation,
also in development, is intended
to standardize networking and
interoperability description tools.
Included in this part will be User
Characteristic description tools
that specify user preferences.
There are some seven additional
parts identified and in various
stages of development that deal
with technical interoperability issues
of less specific relevance to
metadata. All of the published parts
are available from ISO as ISO/IEC
21000-[part#].
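Part 5 describes machine-readable rights statements. The Python sketch below models that idea as a set of grants, each naming a principal, a right, and a resource, plus a check that answers whether a requested action is permitted. This is a deliberately simplified model built on our own assumptions. It does not reproduce the XML syntax of the MPEG-21 Rights Expression Language or the terms of the Rights Data Dictionary, and the principals, rights, and URNs are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """One permission: a principal may exercise a right over a resource."""
    principal: str
    right: str                       # illustrative verbs such as "play" or "print"
    resource: str                    # hypothetical URN-style identifiers
    expires: Optional[date] = None   # None means the grant does not expire


def is_permitted(grants: list[Grant], principal: str, right: str,
                 resource: str, on: date) -> bool:
    """True if any grant matches the request and is still valid on the given day."""
    return any(
        g.principal == principal and g.right == right and g.resource == resource
        and (g.expires is None or on <= g.expires)
        for g in grants
    )


grants = [
    Grant("library-patron", "play", "urn:example:video:42"),
    Grant("library-patron", "print", "urn:example:ebook:7", expires=date(2005, 12, 31)),
]
print(is_permitted(grants, "library-patron", "play", "urn:example:video:42", date(2005, 1, 1)))  # True
print(is_permitted(grants, "library-patron", "print", "urn:example:ebook:7", date(2006, 1, 1)))  # False
```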
Metadata for Datasets
Metadata schemes for datasets
are enabling original data in the
science and social science fields to
be shared in a way that was never
possible before the Internet. One of
the most well-developed element
sets is the Federal Geographic Data
Committee (FGDC) Content
Standard for Digital Geospatial
Metadata (CSDGM), officially
known as FGDC-STD-001-1998.
Geospatial datasets include
topographic and demographic data,
GIS (geographic information
systems), and computer-aided
cartography base files. They are
used in a wide variety of areas,
including soil and land use studies,
biodiversity counts, climatology and
global change tracking, remote
sensing, and satellite imagery. The
FGDC Content Standard is required
for use with resources created and
funded by the U.S. Government and
is also being used by many state
governments.
An international standard, ISO
19115, Geographic Information—
metadata was issued in 2003. A
technical amendment that will allow
datasets to be both ISO and FGDC
compliant is underway along with an
implementation model that can be
used in conjunction with an XML
schema.
A metadata scheme becoming
well established in the social and
behavioral sciences is the Data
Documentation Initiative (DDI)
standard for describing social
science datasets. The DDI is
defined as an XML DTD, and allows
for top down hierarchical description
of a social science study, the data
files resulting from that
study, and the variables
used in the data files.
There is also a header
area that uses Dublin Core
elements for a high-level
description of the DDI
document itself.
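The DDI's top-down hierarchy can be mirrored with Python dataclasses. The levels are study, then data files, then variables, with a Dublin Core header describing the document itself. The class names, field names, and sample survey below are assumptions for illustration. The DDI itself defines this structure in an XML DTD with its own element names.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    label: str


@dataclass
class DataFile:
    filename: str
    variables: list[Variable] = field(default_factory=list)


@dataclass
class Study:
    title: str
    files: list[DataFile] = field(default_factory=list)


@dataclass
class DDIDocument:
    header: dict[str, str]  # Dublin Core elements describing the document itself
    study: Study

    def all_variables(self) -> list[str]:
        """Flatten the hierarchy to 'file:variable' strings."""
        return [f"{f.filename}:{v.name}" for f in self.study.files for v in f.variables]


doc = DDIDocument(
    header={"title": "Codebook for Household Survey", "creator": "Example Institute"},
    study=Study(
        title="Household Survey",
        files=[DataFile("households.dat", [Variable("HHSIZE", "Household size"),
                                            Variable("INCOME", "Annual income")])],
    ),
)
print(doc.all_variables())  # ['households.dat:HHSIZE', 'households.dat:INCOME']
```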
Extensions and Profiles
Despite the recent
development of many of
these metadata schemes,
most have already been
subject to the changes
brought about by
implementing them in real-world
situations. These
modifications are of two types:
extensions and profiles.
An extension is the
addition of elements to an
already developed
scheme to support the
description of an
information resource of a
particular type or subject
or to meet the needs of a
particular interest group.
Extensions increase the
number of elements.
Profiles are subsets of a scheme
that are implemented by a particular
interest group. Profiles can
constrain the number of elements
that will be used, refine element
definitions to describe the specific
types of resources more accurately,
and specify values that an element
can take.
In practice, many applications
use both extensions and profiles of
base metadata schemes. For
example, the National Biological
Information Infrastructure (NBII)
has developed a Biological Data
Profile of the FGDC Content
Standard for use with biological
information resources. The profile
defines an extended set of data for
describing biological data, such as
the taxonomic name of the
organism and its classification in the
taxonomic hierarchy.
The U.S. Department of
Education’s Gateway to
Educational Materials (GEM) project
has based its own metadata
scheme on the Dublin Core. The
GEM profile limits the Dublin Core
elements that can be used (for
example, Contributor is not allowed)
and makes some elements
mandatory. GEM also defines
additional elements such as Audience,
Grade, Quality, and Standards,
extending the base Dublin Core set
for educational use.
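Both kinds of modification can be expressed as operations on an element set. In the Python sketch below, an extension adds elements. A profile restricts which elements are allowed and marks some as mandatory, and records are then validated against it. The base set is the 15 Dublin Core elements. The GEM-like profile excludes Contributor and adds Audience, Grade, Quality, and Standards, as described above. The text does not say which elements GEM makes mandatory, so Title and Subject are hypothetical placeholders.

```python
DUBLIN_CORE = {
    "Title", "Creator", "Subject", "Description", "Publisher", "Contributor",
    "Date", "Type", "Format", "Identifier", "Source", "Language", "Relation",
    "Coverage", "Rights",
}


def extend(base: set[str], additions: set[str]) -> set[str]:
    """An extension: the base element set plus new elements."""
    return base | additions


def make_profile(elements: set[str], excluded: set[str], mandatory: set[str]) -> dict:
    """A profile: a subset of a scheme, with some elements made mandatory."""
    allowed = elements - excluded
    if not mandatory <= allowed:
        raise ValueError("every mandatory element must also be allowed")
    return {"allowed": allowed, "mandatory": mandatory}


def validate(record: dict[str, str], profile: dict) -> list[str]:
    """Return readable problems; an empty list means the record conforms."""
    problems = [f"not allowed: {n}" for n in sorted(record) if n not in profile["allowed"]]
    problems += [f"missing: {n}" for n in sorted(profile["mandatory"] - set(record))]
    return problems


gem_elements = extend(DUBLIN_CORE, {"Audience", "Grade", "Quality", "Standards"})
gem_profile = make_profile(gem_elements, excluded={"Contributor"},
                           mandatory={"Title", "Subject"})  # mandatory set is hypothetical

record = {"Title": "Exploring Wetlands", "Grade": "6", "Contributor": "J. Smith"}
print(validate(record, gem_profile))  # ['not allowed: Contributor', 'missing: Subject']
```

The same two functions could express the NBII Biological Data Profile. In that case FGDC elements would be the base set and taxonomic elements the extension.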
Metadata in Action
A county land planner is studying the
impact of new zoning laws on a particular
bird species. The study team is composed
of an ecologist, hydrologist, civil engineer,
and environmental protection specialist.
Remote sensing data for the last 20
years provides a trend analysis of the
decrease in wetlands, the bird’s habitat.
These datasets have FGDC metadata. The
biologists on the study team need to
document the results of a field inventory.
Using a biological profile to extend the
FGDC element set, the biologists add the
genus-species name and taxonomic
hierarchy. The ecologists are concerned
with collection methods and modeling
tools. The data related to the changes in
human population are documented using
a metadata set developed by the Census
Bureau.
This study results in a technical report
which is assigned Dublin Core metadata
by the author. When the technical report
is cataloged into the organization’s
repository, the Dublin Core elements are
used as the basis for automatic generation
of a MARC cataloging record. This record
is enhanced by the cataloger and included
in the library’s online public access
catalog.
Creating Metadata
Who creates metadata? The
answer to this varies by discipline,
the resource being described, the
tools available, and the expected
outcome, but it is almost always a
cooperative effort.
Much basic structural and
administrative metadata is supplied
by the technical staff who initially
digitize or otherwise create the
digital object, or is generated
through an automated process. For
descriptive metadata, it is best in
some situations if the originator of
the resource provides the
information. This is particularly true
in the documentation of scientific
datasets where the originator has
significant understanding of the
rationale for the dataset and the
uses to which it could be put, and
for which there is little if any textual
information from which an indexer
could work.
However, many projects have
found that it is more efficient to have
indexers or other information
professionals create the descriptive
metadata, because the authors or
creators of the data do not have the
time or the skills. In other cases, a
combination of researcher and
information professional is used.
The researcher may create a
skeleton, completing the elements
that can be supplied most readily.
The results may then be supplemented
or reviewed by the information
specialist for consistency and
compliance with the schema syntax
and local guidelines.
Creation Tools
Many metadata project
initiatives have developed tools and
made them available to others,
sometimes for free. A growing
number of commercial software
tools are also becoming available.
Creation tools fall into several
categories:
• Templates allow a user to enter
the metadata values into pre-set
fields that match the element set
being used. The template will
then generate a formatted set of
the element attributes and their
corresponding values.
• Mark-up tools will structure the
metadata attributes and values
into the specified schema
language. Most of these tools
generate XML or SGML
Document Type Definitions
(DTD). Some templates include
such a mark-up as part of their
final translation of the metadata.
• Extraction tools will
automatically create metadata
from an analysis of the digital
resource. These tools are
generally limited to textual
resources. The quality of the
metadata extracted can vary
significantly based on the tool’s
algorithms as well as the content
and structure of the source text.
These tools should be
considered as an aid to creating
metadata. The resulting
metadata should always be
manually reviewed and edited.
• Conversion tools will translate
one metadata format to another.
The similarity of elements in the
source and target formats will
affect how much additional
editing and manual input of
metadata may be required.
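A template and a mark-up tool can be approximated in a few lines of Python using the standard library's `xml.etree.ElementTree`. The template is a fixed list of fields. The mark-up step serializes the filled-in values as XML elements in the Dublin Core element namespace. The wrapper element name `metadata` and the choice of template fields are assumptions for the example, not the output format of any particular tool.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The "template": the fields a user is prompted to fill in, in display order.
TEMPLATE = ["title", "creator", "subject", "date", "format"]


def fill_template(values: dict[str, str]) -> list[tuple[str, str]]:
    """Keep only template fields, in template order, skipping blanks."""
    return [(name, values[name]) for name in TEMPLATE if values.get(name)]


def mark_up(pairs: list[tuple[str, str]]) -> str:
    """The mark-up step: serialize the filled fields as namespaced XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = mark_up(fill_template({"title": "Metadata Demystified",
                                  "date": "2003-07",
                                  "color": "blue"}))  # 'color' is not in the template
print(xml_text)
```

Fields outside the template are silently dropped. A real creation tool might instead warn the user.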
Metadata tools are generally
developed to support specific
metadata schemas or element sets.
The websites for the particular
schema will frequently have links to
relevant toolsets.
Metadata Quality Control
The creation of metadata
automatically or by information
originators who are not familiar with
cataloging, indexing, or vocabulary
control can create quality problems.
Mandatory elements may be
missing or used incorrectly. Schema
syntax may have errors that prevent
the metadata from being processed
correctly. Metadata content
terminology may be inconsistent,
making it difficult to locate relevant
information.
The Framework of Guidance for
Building Good Digital Collections,
available on the NISO website,
articulates six principles applying to
good metadata:
• Good metadata should be
appropriate to the materials in
the collection, users of the
collection, and intended, current
and likely use of the digital
object.
• Good metadata supports
interoperability.
• Good metadata uses standard
controlled vocabularies to reflect
the what, where, when and who
of the content.
• Good metadata includes a clear
statement on the conditions and
terms of use for the digital object.
• Good metadata records are
objects themselves and
therefore should have the
qualities of archivability,
persistence, unique
identification, etc. Good metadata
should be authoritative and
verifiable.
• Good metadata supports the
long-term management of
objects in collections.
There are a number of ongoing
efforts for dealing with the metadata
quality challenge:
• Metadata creation tools are
being improved with such
features as templates, pick lists
that limit the selection in a
particular field, and improved
validation rules.
• Software interoperability
programs that can automate the
“crosswalk” between different
schemas are continuously being
developed and refined.
• Content originators are being
formally trained in understanding
metadata and controlled
vocabulary concepts and in the
use of metadata-related software
tools.
• Existing controlled vocabularies
that may have initially been
designed for a specific use or a
narrow audience are getting
broader use and awareness. For
example, the Content Types and
Subtypes originally defined for
MIME email exchange are
commonly used as the controlled
list for the Dublin Core Format
element.
• Communities of users are
developing and refining
audience-specific metadata
schemas, application profiles,
controlled vocabularies, and
user guidelines. The MODS User
Guidelines are a good example
of the latter.
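The Python sketch below implements two of the checks mentioned above: a pick list (controlled vocabulary) for one field and a syntax rule for another. The small set of MIME types stands in for the controlled list used with the Dublin Core Format element. The YYYY, YYYY-MM, or YYYY-MM-DD date pattern is an assumed local rule. A real validator would load the full registered vocabulary and the project's own guidelines.

```python
import re

# A tiny stand-in for the controlled list of MIME content types.
FORMAT_VOCABULARY = {"application/pdf", "text/html", "text/plain", "image/jpeg", "image/tiff"}
# Assumed local rule: YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def quality_report(record: dict[str, str]) -> list[str]:
    """List problems found in a simple record; empty means it passed."""
    problems = []
    fmt = record.get("format")
    # MIME types are case-insensitive, so compare in lower case.
    if fmt is not None and fmt.lower() not in FORMAT_VOCABULARY:
        problems.append(f"format not in controlled list: {fmt!r}")
    date = record.get("date")
    if date is not None and not DATE_PATTERN.fullmatch(date):
        problems.append(f"date does not follow YYYY[-MM[-DD]]: {date!r}")
    return problems


print(quality_report({"format": "application/pdf", "date": "2003-07"}))  # []
print(quality_report({"format": "PDF file", "date": "July 2003"}))
```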
Interoperability and Exchange of Metadata
Some people ask: Do we need
so many metadata standards? With
all the metadata standards,
initiatives, extensions, and profiles,
how can interoperability be
ensured?
It is important to remember that
different schemes serve distinct
needs and audiences.
Complementary schemes can be used to
describe the same resource for
multiple purposes and to serve a
number of user groups. For
example, a technical report could have
a MARC metadata set in a library’s
online catalog, an FGDC
description as part of the National
Spatial Data Infrastructure
Clearinghouse
Mechanism, and an
embedded set of
Dublin Core elements.
The Resource
Description
Framework (RDF),
developed by the
World Wide Web
Consortium (W3C),
is a data model for
the description of
resources on the
Web that provides a
mechanism for
integrating multiple
metadata schemes.
In RDF a name-
space is defined by
a URL pointing to a
Web resource that
describes the
metadata scheme
that is used in the
description. Multiple
namespaces can
be defined, allowing
elements from
different schemes
to be combined in a
single resource
description. Multiple
descriptions, created at different
times for different purposes, can
also be linked to each other. RDF is
generally expressed in XML.
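The namespace mechanism is what lets one description mix elements from several schemes. The Python sketch below parses a small RDF/XML description with the standard library and groups each property value by the namespace URI it comes from. The `http://example.org/local-terms#` namespace and its `audience` property are hypothetical stand-ins for a second scheme. Plain XML parsing is used here only for brevity; a production system would use a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:local="http://example.org/local-terms#">
  <rdf:Description rdf:about="http://www.niso.org/standards/resources/Metadata_Demystified.pdf">
    <dc:title>Metadata Demystified</dc:title>
    <dc:format>application/pdf</dc:format>
    <local:audience>publishers</local:audience>
  </rdf:Description>
</rdf:RDF>"""


def properties_by_namespace(xml_text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (property, value) pairs of every rdf:Description by namespace URI."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for prop in desc:
            namespace, local_name = prop.tag[1:].split("}", 1)  # tag is "{uri}name"
            grouped[namespace].append((local_name, prop.text))
    return dict(grouped)


for ns, props in properties_by_namespace(RDF_XML).items():
    print(ns, props)
```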
Metadata Crosswalks
The interoperability and
exchange of metadata is further
facilitated by metadata crosswalks.
A crosswalk is a mapping of the
elements, semantics, and syntax
from one metadata scheme to those
of another.
A crosswalk allows metadata
created by one community to be
used by another group that employs
a different metadata standard. The
degree to which these crosswalks
are successful at the individual
record level depends on the
similarity of the two schemes, the
granularity of the elements in the
target scheme compared to that of
the source, and the compatibility of
the content rules used to fill the
elements of each scheme.
Crosswalks are important for
virtual collections where resources
are drawn from a variety of sources
and are expected to act as a whole,
perhaps with a single search engine
applied. While these crosswalks are
key, they are also labor intensive to
develop and maintain. The mapping
of schemes with fewer elements
(less granularity) to those with more
elements (more granularity) is
problematic.
Table 1 on page 12 shows a
crosswalk between Dublin Core,
EAD, and MARC 21 for
selected elements. In this case,
there is no attempt to map at the
content level.
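An element-level crosswalk can be sketched as a lookup table. The Python example below encodes the Dublin Core to MARC 21 mappings from Table 1 and applies them to a simple record. It also exposes the granularity problem. Dublin Core Creator does not say whether a name is a controlled personal name (MARC 700) or an uncontrolled one (MARC 720), so the sketch has to assume one, here 720. Modelling a MARC field as a (tag, indicators, subfields) tuple is a simplification, not MARC's actual record structure.

```python
# Hypothetical sketch of the Table 1 crosswalk, Dublin Core -> MARC 21.
# A field is modelled as (tag, indicators, [(subfield_code, value), ...]).
Field = tuple[str, str, list[tuple[str, str]]]

CROSSWALK = {
    "Title": ("245", "00", "a"),
    "Creator": ("720", "##", "a"),  # assumed uncontrolled; 700 1# would need an authorized form
    "Date.Created": ("260", "##", "c"),
}


def dc_to_marc(record: dict[str, list[str]]) -> list[Field]:
    fields: list[Field] = []
    for element, values in record.items():
        if element not in CROSSWALK:
            continue  # no mapping defined: the value is lost in the crosswalk
        tag, indicators, code = CROSSWALK[element]
        for value in values:
            subfields = [(code, value)]
            if element == "Creator":
                subfields.append(("e", "author"))  # relator term, as in Table 1
            fields.append((tag, indicators, subfields))
    return sorted(fields, key=lambda f: f[0])  # stable sort keeps creator order


dc_record = {
    "Title": ["Metadata Demystified"],
    "Creator": ["Brand, Amy", "Daly, Frank"],
    "Date.Created": ["2003"],
    "Subject": ["metadata"],  # not in Table 1, so it is not carried over
}
for tag, ind, subfields in dc_to_marc(dc_record):
    print(tag, ind, "".join(f"${c}{v}" for c, v in subfields))
```

Mapping in the opposite direction, from MARC 21 to Dublin Core, would be simpler: the 700 and 720 fields both collapse into Creator. This asymmetry is why mapping from fewer elements to more elements is the problematic direction.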
Metadata Registries
Registries are an important tool
for managing metadata. Metadata
registries can provide information
on the definition, origin, source, and
location of data. Registration can
apply at many levels, including
schemes, usage profiles, metadata
elements, and code lists for element
values. The metadata registry
provides an integrating resource for
legacy data, acts as a lookup tool
for designers of new databases,
and documents each data element.
A Dublin Core description represented in RDF
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://www.niso.org/standards/resources/Metadata_Demystified.pdf">
    <dc:title>Metadata Demystified</dc:title>
    <dc:creator>Brand, Amy</dc:creator>
    <dc:creator>Daly, Frank</dc:creator>
    <dc:creator>Meyers, Barbara</dc:creator>
    <dc:subject>metadata</dc:subject>
    <dc:description>Presents an overview of metadata conventions in publishing.</dc:description>
    <dc:publisher>NISO Press</dc:publisher>
    <dc:publisher>The Sheridan Press</dc:publisher>
    <dc:date>2003-07</dc:date>
    <dc:format>application/pdf</dc:format>
  </rdf:Description>
</rdf:RDF>
Registries can also document
multiple schemes or element sets,
particularly within a specific field of
interest. A good example is the U.S.
Environmental Protection Agency’s
Environmental Data Registry that
provides information about
thousands of data elements used
in current and legacy EPA
databases.
Standards relevant to metadata
registries include ISO/IEC 11179,
Specification and Standardization of
Data Elements, and ANSI X3.285,
Metamodel for the Management of
Shareable Data.
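At its simplest, a registry is a lookup service keyed by scheme and element name. It returns the documented definition, the maintaining source, and any permitted values. The Python sketch below is loosely inspired by the kinds of attributes ISO/IEC 11179 records. The field names, the paraphrased definitions, the short code list, and the `lookup` and `find_by_term` helpers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryEntry:
    scheme: str
    element: str
    definition: str                               # paraphrased, illustrative wording
    source: str                                   # who maintains the definition
    code_list: Optional[tuple[str, ...]] = None   # permitted values, if controlled


REGISTRY = {
    (entry.scheme, entry.element): entry
    for entry in [
        RegistryEntry("Dublin Core", "Title", "A name given to the resource.", "DCMI"),
        RegistryEntry("Dublin Core", "Format", "The file format or medium of the resource.",
                      "DCMI", ("application/pdf", "text/html", "image/jpeg")),
        RegistryEntry("VRA Core 3.0", "Title", "The title or name of the work or image.", "VRA"),
    ]
}


def lookup(scheme: str, element: str) -> Optional[RegistryEntry]:
    """Return the registered entry for one element of one scheme, if any."""
    return REGISTRY.get((scheme, element))


def find_by_term(term: str) -> list[str]:
    """Locate every registered element with this name across schemes."""
    return sorted(f"{s}:{e}" for (s, e) in REGISTRY if e.lower() == term.lower())


print(find_by_term("title"))  # ['Dublin Core:Title', 'VRA Core 3.0:Title']
```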
Future Directions
Most early metadata standards
have focused on the descriptive
elements needed for discovery,
identification, and retrieval. As
metadata initiatives developed,
administrative metadata, especially
in the rights and preservation areas,
was further emphasized. Technical
metadata is one area that still does
not get much attention in metadata
schemas. The effective exchange
and use of the digital objects
described by the metadata often
requires knowledge of specific
technical aspects of the objects
beyond their filenames and types.
Newer standards are beginning to
address this need. The NISO/AIIM
standard, Z39.87, Data Dictionary—
Technical Metadata for Digital Still
Images, focuses solely on the
technical data needed to facilitate
interoperability between systems of
digital image files. The metadata
elements defined in the standard
cover basic image parameters such
as compression and color profile,
information about the equipment
and settings used to create the
image, and performance
assessment data such as sampling
frequency and color maps.
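Much basic technical metadata can be read directly from the file. The Python sketch below is an illustration only, not an implementation of Z39.87, whose element names differ. It reads the dimensions, bit depth, color type, compression method, and interlace flag from the IHDR header that begins every PNG image. The output dictionary keys are our own.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor (RGB)",
    3: "indexed-color",
    4: "grayscale with alpha",
    6: "truecolor with alpha (RGBA)",
}


def png_technical_metadata(data: bytes) -> dict:
    """Read basic image parameters from the IHDR chunk at the start of a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, 13 data bytes.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height, bit_depth, color_type, compression, _filter, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    return {
        "width_px": width,
        "height_px": height,
        "bits_per_sample": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, f"unknown ({color_type})"),
        "compression": "deflate" if compression == 0 else f"unknown ({compression})",
        "interlaced": interlace == 1,
    }


# Demo input: only the signature and IHDR chunk of a 640 x 480, 8-bit RGB image.
ihdr = struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
sample = (PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr
          + struct.pack(">I", zlib.crc32(b"IHDR" + ihdr)))
print(png_technical_metadata(sample))
```

For a real file, reading the first 33 bytes is enough, because the signature and the IHDR chunk (including its CRC) occupy exactly that many bytes.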
Metadata work is ongoing
across a number of standards
development organizations. In the
International Organization for
Standardization (ISO), a
subcommittee of Technical Committee (TC)
46 (Information and
documentation) is addressing metadata
development for bibliographic
applications. ISO TC 211
(Geographic information / Geomatics) is
developing metadata standards for
applications in geographic
information systems. The Data
management and interchange
subcommittee of ISO/IEC JTC 1
(Information technology) is
developing standards for the
specification and management of
metadata and has recently issued
a technical report on Procedures for
achieving metadata registry content
consistency (ISO/IEC 20943).
Many organizations that
developed metadata specifications
outside the formal standards
community are seeking to have their
specifications turned into
international standards. The Dublin
Core is an example of this
approach. It was originally
developed in 1995 at a workshop
sponsored by OCLC and the
National Center for
Supercomputing Applications. In 2001, it
became an official ANSI/NISO
standard (Z39.85) and in 2003
Dublin Core was issued as an
international standard (ISO 15836).
The World Wide Web
Consortium’s (W3C) metadata
activity has been incorporated into
the Semantic Web, their initiative to
“provide a common framework that
allows data to be shared and reused
across application, enterprise, and
community boundaries.” The RDF
framework is one of the key
enabling standards. The Semantic
Web efforts are directed to
standards that increase the
interoperability of metadata, rather
than specific metadata schemas.
The World Wide Web has
created a revolution in the
accessibility of information. The
development and application of
metadata represents a major
improvement in the way information
can be discovered and used. New
technologies, standards, and best
practices are continually advancing
the applications for metadata. The
resources in the following section
will give you a head start in tracking
developments and contain links to
more information on the projects
discussed throughout this
document.
                       Dublin Core    EAD            MARC 21
Title Element          Title          <titleproper>  245 00$a (Title Statement/Title proper)
Author Element         Creator        <author>       700 1#$a (Added Entry–Personal Name) (with $e=author)
                                                     720$a (Added Entry–Uncontrolled Name/Name) (with $e=author)
Date Created Element   Date.Created   <unitdate>     260 ##$c (Date of publication, distribution, etc.)
Table 1. Example of Metadata Crosswalk Mapping
More Information on Metadata
General Resources
Digital Libraries: Metadata Resources (IFLA)
http://www.ifla.org/II/metadata.htm
A Framework of Guidance for Building Good Digital Collections
http://www.niso.org/framework/forumframework.html
Introduction to Metadata: Pathways to Digital Information
by Murtha Baca
http://www.getty.edu/research/conducting_research/standards/intrometadata/index.html
Metadata: Cataloging by Any Other Name
by Jessica Milstead and Susan Feldman
ONLINE, January 1999
http://www.onlinemag.net/OL1999/milstead1.html
Metadata and Its Application
by Brad Eden
Library Technology Reports (September-October 2002)
Metadata Demystified: A Guide for Publishers
by Amy Brand, Frank Daly, and Barbara Meyers
NISO Press & The Sheridan Press, 2003, ISBN 1-880125-49-9
http://www.niso.org/standards/resources/Metadata_Demystified.pdf
Metadata Fundamentals for All Librarians
by Priscilla Caplan
ALA, 2003, ISBN 0-8389-0847-0
Metadata Information Clearinghouse Interactive (MICI)
http://www.metadatainformation.org
Metadata Portals and Multi-standard Projects
by Candy Schwartz
http://web.simmons.edu/~schwartz/meta.html
Metadata Primer – A "How To" Guide on Metadata Implementation [for digital spatial data]
by David Hart and Hugh Phillips
http://www.lic.wisc.edu/metadata/metaprim.htm
Metadata Principles and Practicalities
by Erik Duval, Wayne Hodgins, Stuart Sutton, and Stuart L. Weibel
D-Lib Magazine 8(4) (April 2002)
http://www.dlib.org/dlib/april02/weibel/04weibel.html
Metadata Resources (UKOLN)
http://www.ukoln.ac.uk/metadata/resources
Metadata Standards
http://www.chin.gc.ca/English/Standards/metadata_intro.html
Metadata Standards, Crosswalks, and Standards Organizations
http://staff.library.mun.ca/staff/toolbox/standards.htm
Metadata.net – Projects, Tools & Services, and Schema Registry (Australia)
http://metadata.net/
Preservation Metadata for Digital Objects: A Review of the State of the Art
A White Paper by the OCLC/RLG Working Group on Preservation Metadata, January 31, 2001
http://www.oclc.org/research/projects/pmwg/presmeta_wp.pdf
Schemes, Initiatives, and Related Sites
Application profiles: mixing and matching metadata schemas
by Rachel Heery and Manjula Patel, Ariadne, Issue 25, September 2000
http://www.ariadne.ac.uk/issue25/app-profiles/intro.html
The Cedars Project (CURL exemplars in digital archives)
http://www.leeds.ac.uk/cedars/metadata.html
CDWA (Categories for the Description of Works of Art)
http://www.getty.edu/research/conducting_research/standards/cdwa/
DDI (Data Documentation Initiative)
http://www.icpsr.umich.edu/DDI/
DOI (Digital Object Identifier)
http://www.doi.org/
Dublin Core Metadata Initiative (DCMI)
http://dublincore.org
EAD (Encoded Archival Description)
http://www.loc.gov/ead/
Environmental Data Registry (EPA)
http://www.epa.gov/edr/
FGDC Content Standard for Digital Geospatial Metadata (CSDGM)
http://www.fgdc.gov/metadata/
Gateway to Educational Materials (GEM)
http://www.geminfo.org/
IFLA Functional Requirements for Bibliographic Records
http://www.ifla.org/VII/s13/frbr/frbr.htm
IMS Global Learning Consortium
http://www.imsglobal.org
<indecs> (Interoperability of Data in E-Commerce Systems)
http://www.indecs.org/
LOM (Learning Object Metadata)
http://ltsc.ieee.org/wg12/
MARC 21 (Machine-Readable Cataloging)
http://www.loc.gov/marc
MetaWeb Project
http://www.dstc.edu.au/Research/Projects/metaweb/
METS (Metadata Encoding and Transmission Standard)
http://www.loc.gov/standards/mets/
MIX (Metadata for Images in XML Schema)
http://www.loc.gov/standards/mix/
MODS (Metadata Object Description Schema)
http://www.loc.gov/standards/mods/
MPEG (Moving Picture Experts Group)
http://www.chiariglione.org/mpeg/
NBII (National Biological Information Infrastructure)
http://www.nbii.gov/
Nordic Metadata Projects
http://www.lib.helsinki.fi/meta/
NSDI (National Spatial Data Infrastructure)
http://www.fgdc.gov/nsdi/
OAI (Open Archives Initiative)
http://www.openarchives.org/
OAIS (Open Archival Information System)
http://www.ccsds.org/documents/650x0b1.pdf
ONIX (Online Information Exchange)
http://www.editeur.org/onix.html
Open GIS Consortium
http://www.opengis.org/
PADI (Preserving Access to Digital Information)
http://www.nla.gov.au/padi/topics/32.html
PREMIS (PREservation Metadata: Implementation Strategies)
http://www.oclc.org/research/projects/pmwg
PURL (Persistent Uniform Resource Locator)
http://purl.org
RDF (Resource Description Framework)
http://www.w3.org/RDF/
SCHEMAS: Forum for Metadata Schema Implementors (UKOLN)
http://www.ukoln.ac.uk/metadata/schemas/
TEI (Text Encoding Initiative)
VRA (Visual Resources Association) Core Categories
http://www.vraweb.org/vracore3.htm
XML (Extensible Markup Language)
http://www.w3.org/XML/
Z39.50
http://www.loc.gov/z3950/agency/
ZING (Z39.50 Next Generation)
http://www.loc.gov/z3950/agency/zing/zing-home.html
Crosswalks and Lists of Crosswalks
All about Crosswalks
http://www.oclc.org/research/projects/mswitch/1_crosswalks.htm
Dublin Core / MARC / GILS Crosswalk
http://www.loc.gov/marc/dccross.html
FGDC to MARC
http://www.alexandria.ucsb.edu/public-documents/metadata/fgdc2marc.html
Issues in Crosswalking Content Metadata Standards, by Margaret St. Pierre and William P. LaPlant, Jr.
http://www.niso.org/press/whitepapers/crsswalk.html
MARC 21 to Dublin Core
http://www.loc.gov/marc/marc2dc.html
Metadata: Mapping between Metadata Formats (UKOLN)
http://www.ukoln.ac.uk/metadata/interoperability/
Metadata Mappings (Crosswalks)
http://libraries.mit.edu/guides/subjects/metadata/mappings.html
Metadata Standards Crosswalk (Getty)
http://www.getty.edu/research/conducting_research/standards/intrometadata/3_crosswalks/crosswalk1.html
Metadata Standards Crosswalks (Canadian Heritage Information Network)
http://www.chin.gc.ca/English/Standards/metadata_crosswalks.html
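At its core, a crosswalk is a lookup table from one scheme's elements to another's. The Python sketch below illustrates the idea. It assumes a small set of simple Dublin Core to MARC 21 pairings modeled on the Library of Congress Dublin Core / MARC / GILS crosswalk listed above. The tag and subfield choices are illustrative only, and the table and the function name `crosswalk_dc_to_marc` are hypothetical.

```python
"""Sketch of a Dublin Core -> MARC 21 crosswalk (illustrative subset only)."""

# Assumed pairings, modeled on the LoC Dublin Core/MARC crosswalk:
# DC element -> (MARC tag, subfield). Verify against the published crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "language": ("546", "a"),
    "rights": ("540", "a"),
}


def crosswalk_dc_to_marc(dc_record):
    """Map {dc_element: [values]} to (tag, subfield, value) triples.

    Returns (fields, unmapped). Fields are sorted by tag; the sort is stable,
    so values keep their input order within a tag. Elements with no entry in
    the table are reported in `unmapped` instead of being silently dropped.
    """
    fields = []
    unmapped = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            unmapped.append(element)
            continue
        tag, subfield = target
        for value in values:
            fields.append((tag, subfield, value))
    fields.sort(key=lambda field: field[0])
    return fields, unmapped


if __name__ == "__main__":
    record = {
        "title": ["Understanding Metadata"],
        "publisher": ["NISO Press"],
        "format": ["application/pdf"],
    }
    fields, unmapped = crosswalk_dc_to_marc(record)
    for tag, subfield, value in fields:
        print(f"{tag} ${subfield} {value}")
    print("unmapped:", unmapped)
```

Run as a script, the example prints `245 $a Understanding Metadata` and `260 $b NISO Press`, then reports `format` as unmapped. Real crosswalks also handle indicators and qualifiers, and a single element may map to different fields depending on those. This sketch ignores both.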
Metadata Registries & Clearinghouses
DCMI Registry Working Group
http://dublincore.org/groups/registry/
DESIRE Metadata Registry
http://desire.ukoln.ac.uk/registry/
Environmental Data Registry
http://www.epa.gov/edr/
FGDC Clearinghouse Registry
http://registry.gsdi.org/
MICI (Metadata Information Clearinghouse Interactive)
http://www.metadatainformation.org/
NBII Metadata Clearinghouse
http://metadata.nbii.gov/
The SCHEMAS Registry
http://www.schemas-forum.org/registry/
Tools for Metadata Creation
DDI Tools
http://www.icpsr.umich.edu/DDI/users/tools.html#a01
Dublin Core tools
http://dublincore.org/tools/
FGDC Metadata Tools
http://www.nbii.gov/datainfo/metadata/tools/
Metadata Software Tools
http://ukoln.bath.ac.uk/metadata/software-tools/
OAI-Specific Tools
http://www.openarchives.org/tools/tools.html
RDF Editors and Tools
http://www.ilrt.bris.ac.uk/discovery/rdf/resources/#sec-tools
TEI Software
http://www.tei-c.org/Software/index.html
Glossary
AACR2 (Anglo-American Cataloging Rules) – a standard set of rules for cataloging library materials. The “2” refers to the second edition.
administrative metadata – metadata related to the use, management, and encoding processes of digital objects over a period of time. Includes the subsets of technical metadata, rights management metadata, and preservation metadata.
ANSI (American National Standards Institute) – administers and coordinates the U.S. voluntary standardization and conformity assessment system.
CDWA (Categories for the Description of Works of Art) – a metadata element set for describing artworks.
crosswalk – a mapping of the elements, semantics, and syntax from one metadata scheme to another.
CSDGM (Content Standard for Digital Geospatial Metadata) – a metadata standard developed by the FGDC. Officially known as FGDC-STD-001.
dataset – a collection of computer-readable data records.
DC (Dublin Core) – a general metadata element set for describing all types of resources.
DDI (Data Documentation Initiative) – a specification for describing social science datasets.
descriptive metadata – metadata that describes a work for purposes of discovery and identification, such as creator, title, and subject.
DLF (Digital Library Federation) – a membership organization dedicated to making digital information widely accessible.
DOI (Digital Object Identifier) – a unique identifier assigned to electronic objects of intellectual property that can be resolved to the object’s location on the Internet.
DTD (Document Type Definition) – a formal description in SGML or XML syntax of the structure (elements, attributes, and entities) to be used for describing the specified document type.
EAD (Encoded Archival Description) – a metadata scheme for collection finding aids.
element set – information segments of the metadata record, often called semantics or content.
encoding rules – the syntax or prescribed order for the elements contained in the metadata description.
extension – an element that is not officially part of a metadata scheme but is defined for use with that scheme in a particular application.
FGDC (Federal Geographic Data Committee) – a U.S. Federal government interagency committee responsible for developing the National Spatial Data Infrastructure.
GEM (Gateway to Educational Materials) – a U.S. Department of Education initiative that has defined an extension to the Dublin Core element set to accommodate educational resources.
GIS (Geographic Information System) – a computer system for capturing, managing, and displaying data related to positions on the Earth’s surface.
HTML (Hypertext Markup Language) – a set of tags and rules derived from SGML used to create hypertext documents for the World Wide Web. Officially, a W3C Recommendation.
<indecs> (Interoperability of Data in E-Commerce Systems) – a framework for metadata to support commerce in intellectual property.
interoperability – the ability of multiple systems, using different hardware and software platforms, data structures, and interfaces, to exchange and share data.
ISO (International Organization for Standardization) – the primary international standards development organization.
IEC (International Electrotechnical Commission) – an international standards development organization for all electrical, electronic, and related technologies. Co-sponsors with ISO the Joint Technical Committee 1 on Information Technology.
LOM (Learning Object Metadata) – a metadata scheme for technology-supported learning resources.
MARC 21 (MAchine-Readable Cataloging) – a formatting, record structure, and encoding standard for electronic bibliographic cataloging records developed by the Library of Congress. The “21” refers to the version of MARC issued in 1998 that integrated the U.S. and Canadian versions of MARC.
MARCXML – a metadata scheme for working with MARC data in an XML environment.
metadata – structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource.
metadata harvesting – a technique for extracting metadata from individual repositories and collecting it in a central catalog.
METS (Metadata Encoding and Transmission Standard) – a metadata scheme for complex digital library objects.
MODS (Metadata Object Description Schema) – a metadata scheme for rich description of electronic resources.
MPEG (Moving Picture Experts Group) – Subcommittee 29, Working Group 11 of ISO/IEC JTC 1, which develops standards for digital audio and video. Also refers to the suite of standards developed by the group.
namespace – in RDF, a way to tie a specific use of a metadata element to the scheme where the intended definition is to be found.
NISO (National Information Standards Organization) – a standards development organization, accredited by the American National Standards Institute, that develops library and information-related standards.
ONIX (Online Information Exchange) – a metadata scheme for book bibliographic, trade, and promotional data.
preservation metadata – a form of administrative metadata dealing with the provenance of a resource and its archival management.
profile – a subset of a scheme defined and used by a particular interest group to customize the scheme for its purposes.
PURL (Persistent URL) – a naming and resolution system developed by OCLC utilizing an intermediate redirection service to locate a resource’s URL.
qualifier – an optional sub-element to a Dublin Core element that is used to further refine the element or support a specific encoding scheme.
RDF (Resource Description Framework) – a language for representing metadata about Web resources so it can be exchanged between applications without loss of meaning. Officially, a suite of W3C specifications.
registry – a formal system for the documentation of the element sets, descriptions, semantics, and syntax of one or more metadata schemes.
rights management metadata – a form of administrative metadata dealing with the intellectual property rights of a resource.
scheme (schema) – a metadata element set and rules for using it.
semantics – the names and meanings of metadata elements.
SGML (Standard Generalized Markup Language) – a language used to mark up electronic documents with tags that define the relationship between the content and the structure. Officially, international standard ISO 8879, Information processing – Text and office systems – Standard Generalized Markup Language (SGML).
structural metadata – metadata that indicates how compound objects are structured, provided to support use of the objects.
syntax – rules for how metadata elements and their content are encoded.
technical metadata – a form of administrative metadata dealing with the creation or storage encoding processes or formats of the resource.
TEI (Text Encoding Initiative) – a metadata scheme for electronic text.
URL (Uniform Resource Locator) – a unique address for identifying and locating a resource on the Internet.
VRA (Visual Resources Association) Core – a metadata scheme for describing a visual work and its representations.
W3C (World Wide Web Consortium) – an international consortium that develops consensus protocols and specifications to ensure the interoperability of the World Wide Web.
XML (Extensible Markup Language) – an application profile of SGML designed for use in Web applications. Officially, a W3C Recommendation.
Z39.50 – a NISO and ISO standard protocol for cross-system search and retrieval. Officially, international standard ISO 23950, Information Retrieval (Z39.50): Application Service Definition and Protocol Specification, and ANSI/NISO standard Z39.50.
Support the leaders in our community who support NISO as Voting Members:
3M
American Association of Law Libraries
American Chemical Society
American Library Association
American Society for Information Science and Technology
American Society of Indexers
American Theological Library Association
ARMA International
Armed Forces Medical Library
Art Libraries Society of North America
AIIM International
Association of Information and Dissemination Centers
Association of Jewish Libraries
Association of Research Libraries
Auto-Graphics, Inc.
Barnes & Noble, Inc.
Book Industry Communication
California Digital Library
Cambridge Information Group
Checkpoint Systems, Inc.
College Center for Library Automation
Colorado State Library
CrossRef
Davandy, L.L.C.
Docutek Information Systems
Dynix Corporation
EBSCO Information Services
Elsevier Science Inc.
Endeavor Information Systems, Inc.
Entopia, Inc.
ExLibris USA
Fretwell-Downing Informatics
Gale Group
Geac Library Solutions
GIS Information Systems, Inc.
H.W. Wilson Company
Helsinki University Library
Index Data
Infotrieve
Innovative Interfaces, Inc.
Institute for Scientific Information
The International DOI Foundation
Ithaka/JSTOR/ARTstor
John Wiley & Sons, Inc.
KINS, Inc.
Library Binding Institute
Library of Congress
The Library Corporation
Los Alamos National Laboratory
Lucent Technologies
Medical Library Association
MINITEX
Modern Language Association
Motion Picture Association of America
MuseGlobal, Inc.
Music Library Association
National Agricultural Library
National Archives and Records Administration
National Federation of Abstracting and Information Services
National Library of Medicine
National Security Agency
Nylink
OCLC, Inc.
Openly Informatics, Inc.
ProQuest Information and Learning
Random House, Inc.
Recording Industry Association of America
The Research Libraries Group
SAGE Publications
Serials Solutions, Inc.
SIRSI Corporation
Society for Technical Communication
Society of American Archivists
Special Libraries Association
Synapse Corporation
TAGSYS, Inc.
Talis Information Ltd.
Triangle Research Libraries Network
U.S. Department of Commerce, NIST, Office of Information Services
U.S. Department of Defense, DTIC (Defense Technical Information Center)
U.S. Department of Energy, Office of Scientific & Technical Information
U.S. Government Printing Office
U.S. National Commission on Libraries and Information Science
VTLS, Inc.
WebFeat
ISBN 1-880124-62-9
- What is Metadata?
- What Does Metadata Do?
- Resource Discovery
- Organizing Electronic Resources
- Interoperability
- Digital Identification
- Archiving and Preservation
- Structuring Metadata
- Metadata Schemes and Element Sets
- Dublin Core
- Text Encoding Initiative (TEI)
- Metadata Encoding and Transmission Standard (METS)
- Metadata Object Description Schema (MODS)
- Encoded Archival Description (EAD)
- Learning Object Metadata (LOM)
- E-Commerce
- <indecs>
- ONIX
- Visual Objects
- Categories for the Description of Works of Art (CDWA)
- VRA Core Categories
- MPEG Multimedia Metadata
- Metadata for Datasets
- Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
- Data Documentation Initiative (DDI)
- Extensions and Profiles
- NBII Biological Data Profile
- Gateway to Educational Materials (GEM)
- Creating Metadata
- Creation Tools
- Metadata Quality Control
- Interoperability and Exchange of Metadata
- Resource Description Framework (RDF)
- Metadata Crosswalks
- Metadata Registries
- Future Directions
- More Information on Metadata
- General Resources
- Schemes, Initiatives, and Related Sites
- Crosswalks and Lists of Crosswalks
- Metadata Registries and Clearinghouses
- Glossary
- Sidebars and Tables
- Dublin Core Example
- Metadata in Action (1)
- MODS Record Example
- Metadata in Action (2)
- Dublin Core description represented in RDF
- Example of Metadata Crosswalk Mapping
OSINT Cheat-Sheet
Investigative Resources – Summer 2019
Methodology | Preparation | Execution | Documentation
Pre-Operational Considerations
- Time and Resource Constraints
- Adversary Sophistication
- Deliverables and Scope
- Exposure/Risk Factors
- Control Expectations
- Communication and Sit-reps
- Ethical and Legal Assessment
Workspace & Tools
- Clean/Secure Workstation
- Fresh Research Accounts
- Collection Tools
- Clean/Secure Connectivity
- Clean Browser w/Extensions
- Storage/Archiving Solution
- Documentation System
Investigative Steps
- Define The Question
- Know Your Tools
- Document Your “Knowns”
- Set Up Collection
- Query, Sweep, and Pivot
- Consolidate Findings
- Complete Reporting and Archive
OSINT Resources
- OSINTFramework.com
- Netbootcamp.org
- Investigativedashboard.org
- OSINTBrowser.com
- Workinukraine.space
- Start.me/p/b56xX8/osint
- INTELTECHNIQUES.com
Tab Management
https://www.one-tab.com/ (Local Storage Only)
Simple Tab Management/Export For Chrome and Firefox
https://chrome.google.com/webstore/detail/graphitabs/dcfclemgmkccmnpgnldhldjmflphkimp?hl=en
GraphiTabs – Tree View of Tabs
http://tabsoutliner.com/
Tab Management – Outline Format, Export, Sync (Paid version)
http://www.gettoby.com/ (Account-Based w/Sync)
Thumbnailed Tab Management For Chrome and Firefox
https://clusterwm.com/
Simple Tab Manager w/Export (Sync Premium Offered)
Useful Browser Extensions
https://www.onenote.com/clipper
Screen Capture and Tag (OneNote Users Only)
https://github.com/ssborbis/ContextSearch-web-ext
Context Menu Search Menu
https://github.com/az0/linkgopher/
Simple Link Extraction
https://getfireshot.com/
Screen Capture and Annotation (as image or pdf)
http://www.osintbrowser.com/
OSINT Bookmarks
https://github.com/mozilla/multi-account-containers#readme
Firefox – Multi-Account Containers (Compartmentalization)
https://github.com/marklieberman/downloadstar
Firefox – Download all items in a webpage that match a pattern
Link Analysis/Visualization
https://www.paterva.com/buy/maltego-clients.php
Maltego CE and CaseFile
https://gephi.org/
http://www.automatingosint.com/blog/category/gephi/
https://medium.com/@raebaker/using-lampyre-for-basic-email-and-phone-number-osint-e0e36c710880 (Lampyre)
https://vis.occrp.org/
Create Link Charts – Organized Crime and Corruption Reporting Project
https://www.xmind.net/
Mind Mapping – Free and Paid Versions
My Workstation Setup
Workstation – Win 10, PIA/ProtonVPN, Chrome/Firefox, VirtualBox, Buscador/Kali, Nox/Genymotion, Hunch.ly, UC Cable/MiFi, KeePass, Malwarebytes, GlassWire
Email/Payments – ProtonMail, GMX, Fastmail, Blur, 33mail, Privacy.com, Vanilla Visa
Alt-Hardware – MacBook Air, Atom Text Editor, VMware Fusion, Chrome/Firefox, Little Snitch
Mobile – iPhone, MySudo, Signal, Wire; Android burner, unlocked, on a Mint SIM kit
Office Software – LibreOffice, OneNote, Notepad++, CherryTree, Standard Notes, paper notebook, Teams/Slack/Mattermost/Rocket.Chat
Hypervisors/VMs – VirtualBox, Buscador Linux, Kali Linux, Genymotion, Nox
http://www.visualsitemapper.com/
Domain Mapping
https://www.draw.io/
https://github.com/michenriksen/drawio-threatmodeling
https://github.com/woj-ciech/Danger-zone
Link IPs, Domains, and Email Addresses
https://www.mindmup.com/
Mind Mapping – Free and Paid Tiers
https://www.nodexlgraphgallery.org/Pages/Registration.aspx
Powerful Graphing Client – Free and Paid Tiers
https://webrobots.io/
Scrape YP, Yelp, Ebay, Amazon, etc. Save as Excel or CSV
https://www.gettabli.com/
Simple, Private (offline-storage only) Tab Management
Google Operators
Remember we can string multiple operators together.
site: Limit results to those from a specific domain. Example: site:apple.com
“ ” Quotes search for the exact term. Example: “red rider BB gun”
AND Only show results containing both terms. Example: apple AND orange
OR Search for term A, term B, or both; a pipe symbol is the same as OR. Example: gun OR rifle is the same as gun | rifle
* Wildcard for words you don’t know in a phrase. Example: wish * a star
( ) Group a set of words/operators separately. Example: (gun | pistol) ammo
- (minus) Exclude results including this word. Example: chicago baseball -cubs
$ Search for a certain price. Example: “apple watch” $299
cache: Most recent cached version of a domain. Example: cache:boston.gov
filetype: Only search for a specific file type; ext: works the same. Example: filetype:pdf “confidential” or ext:pdf “confidential”
related: Search for sites related to a domain. Example: related:sony.com
intitle: Find pages with a term in the page title. Example: intitle:sabotage
inurl: Find pages with a term in the URL. Example: inurl:private
AROUND(x) Find pages where the terms appear within x words of each other. Example: microsoft AROUND(7) surface
info: Sometimes shows related pages, cache date, etc. Example: info:chicago.gov
Adv. Search https://www.google.com/advanced_search
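Because operators can be strung together, a collection plan can record its dorks consistently by assembling them in code. The Python sketch below builds such a query. It assumes only the operator syntax listed above and the standard `https://www.google.com/search?q=` URL. The helper names `build_query` and `google_url` and their parameters are hypothetical. Plain spaces are left as Google's implicit AND.

```python
from urllib.parse import urlencode


def build_query(terms=(), exact=None, site=None, filetypes=(), intitle=None,
                inurl=None, exclude=()):
    """Combine search operators into one query string (hypothetical helper)."""
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')                 # exact phrase
    if site:
        parts.append(f"site:{site}")               # restrict to one domain
    if filetypes:
        # Group alternatives so OR applies only to the file types.
        alts = " OR ".join(f"filetype:{ft}" for ft in filetypes)
        parts.append(f"({alts})" if len(filetypes) > 1 else alts)
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    parts.extend(f"-{word}" for word in exclude)   # exclusions
    return " ".join(parts)


def google_url(query):
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query(exact="confidential", site="example.com",
                    filetypes=("pdf", "docx"), exclude=("template",))
    print(q)
    print(google_url(q))
```

Saving the generated strings alongside the results keeps each sweep reproducible when you pivot or rerun it later.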
Bing Operators
Most of the Google operators work in Bing.
( ) Just like Google, terms or operators grouped in parentheses are processed together and separately from other conditions.
OR All Bing searches are treated as AND searches unless you specify OR between terms. Example: goat OR pig OR cow
NOT Exclude results with a specific term(s); the - symbol also works. Example: boat NOT (raft OR ship)
loc: Return pages from a specific region(s). Example: dogs (loc:GB OR loc:FR)
prefer: Weight results in favor of a term. Example: prefer:tomato plum apple
near:x Find words within x words of each other. Example: red near:4 blue
ip: Find sites hosted on an IP address. Example: ip:208.43.115.82
site:/domain: Filter for a specific domain type. Example: site:.gov confidential
feed: Find RSS feeds based on search terms. Example: feed:osint
Bing Adv. Search Microsoft retired Bing’s advanced search page. Info: https://www.lifewire.com/bing-advanced-search-3482817
More Operators: https://ahrefs.com/blog/google-advanced-search-operators/
DuckDuckGo
DuckDuckGo handles some operators a little differently.
cats dogs – Results about cats or dogs
“cats and dogs” – Results for the exact term “cats and dogs”. If no results are found, DuckDuckGo tries to show related results.
cats +dogs – More dogs in results
cats filetype:pdf – PDFs about cats. Supported file types: pdf, doc(x), xls(x), ppt(x), html
dogs site:example.com – Pages about dogs from example.com
cats -site:example.com – Pages about cats, excluding example.com
intitle:dogs – Page title includes the word “dogs”
inurl:cats – Page URL includes the word “cats”
Startpage
Startpage makes Google requests on your behalf (privacy)
Operators Most standard Google operators work
Adv. Search https://www.startpage.com/en/advanced-search.html
Search Tips https://support.startpage.com/index.php?/Knowledgebase/List/Index/1
Yandex
Most standard Boolean operators (Google operators) work, such as site: and “quotes”.
Adv. Search Click the icon in the search bar
lang: Language filter. Example: ccn lang:fr
mime: Similar to filetype. Example: mime:docx gdpr
date: Page modified date. Example: bombing date:20180416
url: Similar to site:, but adding a * to the end of the URL pulls up any docs sharing that URL. Example: Alice url:en.wikiquote.org/wiki/*
Special operators: https://yandex.com/support/direct/keywords/symbols-and-operators.html
Baidu
Most standard Google Operators work on Baidu
Adv. Search https://www.baidu.com/gaoji/advanced.html
In English http://www.baiduinenglish.com/
Search Tips https://www.seomandarin.com/baidu-search-tips.html
Other International
Consider using a proxy or VPN to appear in the target region
Adv. Search https://www.alexa.com/topsites/countries
Colossus http://www.searchenginecolossus.com/
Occrp https://data.occrp.org/
Int. OSINT https://start.me/p/W2kwBd/sources-cnty
UK https://investigativedashboard.org/databases/
http://www.rba.co.uk/search/TopSearchTips.html
Twitter
Don’t forget Google – “site:twitter.com keyword”
Advanced Search https://twitter.com/search-advanced
Toolset http://tweetbeaver.com/
User Report https://tinfoleak.com/
Analytics https://socialbearing.com/
Analytics https://analytics.mentionmapp.com/
Analytics https://foller.me
Analytics http://twiangulate.com/search/
Older Posts http://staringispolite.github.io/twayback-machine/
Search https://snapbird.org/
Followers https://doesfollow.com
Video https://twdown.net/
Visualization https://treeverse.app/
Profile Changes https://spoonbill.io/
Mapping https://onemilliontweetmap.com
Inteltechniques https://inteltechniques.com/menu/pages/twitter.tool.html
Legal Requests https://help.twitter.com/en/rules-and-policies/twitter-law-enforcement-support#19
Facebook
Warning: Many of these tools may not function correctly as Facebook continues to kill graph search capability. https://www.vice.com/en_ca/article/zmpgmx/facebook-stops-graph-search
FB Expand http://com.hemiola.com/bookmarklet/
Messenger https://www.messenger.com/
Mobile View https://m.facebook.com/
FB Videos https://www.facebook.com/watch
Video Download https://www.fbdown.net/index.php
Video Download https://www.tubeninja.net/how-to-download/facebook
NetBootcamp http://netbootcamp.org/facebook.html
(Warning: Netbootcamp.com does run tracking scripts)
Research Tools http://www.researchclinic.net/facebook/
User -> ID https://lookup-id.com/
(lookup-id.com runs some tracking scripts)
Graph Search https://inteltechniques.com/menu/pages/facebook.tool.html (Reminder: FB Graph Is Broken as of 8/2019)
Graph Search http://socmint.tools/graph.htm
Graph Search https://peoplefindthor.dk/
Graph Search https://pitoolbox.com.au/facebook-tool/
Graph Search https://searchisback.com/
Graph Search https://whopostedwhat.com/
Graph Search https://www.uk-osint.net/facebook.html
Graph Search https://github.com/sowdust/searchbook
Graph Discussion https://inteltechniques.com/blog/2019/08/02/the-privacy-security-osint-show-episode-133/
Legal & Privacy https://www.facebook.com/safety/groups/law/guidelines
Reddit
Don’t Forget Google – site:reddit.com keyword
Topic Search https://www.reddit.com/search?q=keyword
User Search https://www.reddit.com/user/username
Analytics https://pushshift.io/api-parameters/
Archives https://web.archive.org/web/*/https://www.reddit.com/user/username
Inteltechniques https://inteltechniques.com/menu/pages/communities.tool.html
TikTok
https://www.tiktok.com
Search https://tiktokapi.ga/
Search https://www.osintcombine.com/tiktok-quick-search
How To iOS https://www.pageflows.com/post/ios/general-browsing/tiktok
How To Android https://www.wikihow.tech/Find-Friends-on-Tik-Tok-on-Android
Downloader https://en.savefrom.net/download-from-tiktok
Video Capture https://airmore.com/watch-tik-tok-pc.html
Legal Requests https://www.tiktok.com/en/law-enforcement
Instagram
User/Tag Search https://www.yooying.com/search
User/Tag Search https://www.social-searcher.com/
Hashtag Search https://tagboard.com/
Analyze Followers https://hypeauditor.com/
Location Search https://www.osintcombine.com/instagram-explorer
Search https://mulpix.com/
Media Capture https://downloadgram.com/
Media Capture https://instasave.xyz/
Downloader https://www.4kdownload.com/products/product-stogram
Profile Pic https://instadp.net/
Profile Pic http://izuum.com/
Stories https://storiesig.com/
Image Search https://imgwonders.com/
User/Hashtag http://picdeer.com/
User/Hashtag https://www.pictame.com/
Inteltechniques https://inteltechniques.com/menu/pages/instagram.tool.html
Snapchat
User Search https://somesnapcode.com/
User Search https://www.snapdex.com/
Loc Search https://map.snapchat.com
Loc Search https://sovip.io
https://storage.googleapis.com/snap-inc/privacy/lawenforcement.pdf
Site Archives
Searching pre-existing archives or requesting a capture
Wayback Machine http://archive.org/web/
Archive Today http://archive.fo/
How To – Bellingcat https://www.bellingcat.com/resources/how-tos/2018/02/22/archive-open-source-materials/
How To – Tech.co https://tech.co/news/tools-to-help-you-search-the-archived-internet-2018-06
Mass Archive Script https://github.com/motherboardgithub/mass_archive
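Archive lookups follow predictable URL patterns, such as the `https://web.archive.org/web/*/` prefix in the Reddit Archives entry above. The sketch below builds these URLs but makes no network calls. It assumes three Wayback Machine patterns: the calendar listing of captures, the Save Page Now prefix for requesting a capture, and the public availability endpoint `https://archive.org/wayback/available`, which reports the closest capture as JSON. The function names are hypothetical.

```python
from urllib.parse import urlencode


def wayback_listing_url(target):
    """URL of the Wayback Machine calendar listing captures of target."""
    return f"https://web.archive.org/web/*/{target}"


def wayback_save_url(target):
    """URL that asks the Wayback Machine to capture target now (Save Page Now)."""
    return f"https://web.archive.org/save/{target}"


def wayback_availability_url(target, timestamp=None):
    """URL of the availability API, which reports the capture closest to
    an optional timestamp (digits in YYYYMMDDhhmmss order, e.g. 20190801)."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


if __name__ == "__main__":
    profile = "https://www.reddit.com/user/username"
    for build in (wayback_listing_url, wayback_save_url):
        print(build(profile))
    print(wayback_availability_url("example.com", "20190801"))
```

Keeping URL construction separate from fetching lets the same helpers feed a browser, a capture tool such as Hunch.ly, or a scripted requester.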
OSINT Resource Lists
Collections curated by my favorite OSINT experts:
OSINT.Team https://osint.team/home (OSINT rocket chat group)
Ph055a https://github.com/Ph055a/OSINT-Collection#ph055as-osint-collection
Bellingcat Toolkit https://docs.google.com/document/d/1BfLPJpRtyq4RFtHJoNpvWQjmGnyVkfE2HYoICKOGguA/edit
Sprp77 https://drive.google.com/drive/folders/1CBcemFdorkAqJ-Sthsh67OVHgH4FQF05
Baywolf88 https://www.learnallthethings.net/osint-resources
Sector035 https://medium.com/@sector035
Justin Nordine https://osintframework.com/
Start.me’s:
Technisette
Bruno Mortier
Emmanuelle Welch
Travis Birch
https://start.me/p/7kxL6K/search-engines
https://start.me/p/b56xX8/osint
https://start.me/p/gyXexK/dating-apps-and-sites
https://start.me/p/kx72n5/databases
https://start.me/p/rxeRqr/aml-toolbox
https://start.me/p/ZME8nR/osint
Reuser http://arnoreuser.com/osint-repertorium/
Phonexicum https://phonexicum.github.io/infosec/osint.html#tools
i-intelligence https://www.i-intelligence.eu/wp-content/uploads/2018/06/OSINT_Handbook_June-2018_Final.pdf
PI Links https://diligentiagroup.com/due-diligence/101-investigative-links-for-digging-up-information-on-people/
Photo/Image Search
Reminder: we do not upload sensitive photos to the internet
Search/Reverse https://images.google.com/
Search/Reverse https://tineye.com
Search/Reverse https://www.bing.com/images/
Reverse Russia https://www.yandex.com/images/
Reverse Asia http://images.baidu.com/
Search http://www.picsearch.com/
Twitter Search http://twipho.net/
Flickr https://www.flickr.com/map
Exif http://exif.regex.info/exif.cgi
Edit Detection http://www.errorlevelanalysis.com/
Basic Forensics https://fotoforensics.com/
Text Recog. https://www.newocr.com/
Stolen Check www.stolencamerafinder.com/
Document Search
Google “keyword (ext:pdf OR ext:docx OR ext:txt OR ext:xlsx)”
https://psbdmp.ws
http://www.findpdfdoc.com/
http://cryptome.org
https://www.base-search.net/
http://megasearch.co
Video
Extension https://www.downloadhelper.net/
Youtube-DL https://github.com/ytdl-org/youtube-dl
Extension https://addons.mozilla.org/en-US/firefox/addon/
video-downloader-profession/
Screen Capture https://www.techsmith.com/screen-capture.html
Video Archives https://archiving.witness.org/archive-guide/acquire/acquiring-raw-video-and-metadata/
Capture/Collection Tools
Although not open source, Hunch.ly remains my go-to safety net and collection tool.
Hunch.ly
https://hunch.ly/try-it-now
https://hunch.ly/guides
Screen Capture
Extension https://getfireshot.com/
Snip & Sketch https://www.microsoft.com/en-us/p/snip-sketch/9mz95kl8mr0l#activetab=pivot:overviewtab
Annotation https://www.diigo.com/
OneNote Clip https://www.onenote.com/clipper
Spiderfoot https://www.spiderfoot.net/
Documentation Tools
Hunch.ly’s Report Builder Is Great To Build Off Of
OneNote https://www.onenote.com
Win Text Editor https://notepad-plus-plus.org/
Mac Text Editor https://atom.io/
Backnote https://chrome.google.com/webstore/detail/backnote/gcikdkpooobdlgkkimomdgochmclliek?hl=en-US
Paliscope https://www.paliscope.com (Free Standard Ed for LE)
Zotero https://www.zotero.org/
Private Notes https://app.standardnotes.org/
Office Alternative https://www.libreoffice.org/
Maps/Locations
https://www.google.com/maps
https://www.osintcombine.com/social-geo-lens
https://www.mapillary.com/
https://openstreetcam.org
https://ctrlq.org/maps/address/
https://livingatlas.arcgis.com/wayback/
https://www.gpsies.com/trackList.do
https://www.zillow.com/
Classifieds
Ebay https://www.ebay.com/
Fatfingers http://fatfingers.com/default.aspx
Flippity http://www.flippity.com/
Kijiji https://www.kijiji.ca/
SearchAllJunk http://www.searchalljunk.com/
SearchTempest https://www.searchtempest.com/
NotiCraig https://noticraig.com/
Oodle https://www.oodle.com/local/burien-wa/
Offerup https://offerup.com/
Craigslist https://craigslist.org
Inteltechniques https://inteltechniques.com/menu/pages/communities.
links.html
User Names
Knowem https://knowem.com/checksocialnames.php?u=
NameChk https://namechk.com/
NameCheckr https://www.namecheckr.com/
NameVine https://namevine.com/
UserSearch https://usersearch.org/
UserSherlock http://usersherlock.com/
Profilr https://www.profilr.social/search/
Tinder https://www.gotinder.com/@user
Amazon https://www.google.com/search?q=site%3Aamazon.com+%22name%22
SocialCatfish https://socialcatfish.com/reverse-username-search/
WhatsMyName https://github.com/webbreacher/whatsmyname
Sherlock https://github.com/sherlock-project/sherlock
Inteltechniques https://inteltechniques.com/menu/index.html
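Username checkers such as WhatsMyName and Sherlock take a username and substitute it into per-site URL templates, then test each resulting address. The sketch below covers only the substitution step and makes no requests. It assumes three templates taken from links on this sheet: Reddit, Tinder, and Knowem. The template table and the function name are illustrative.

```python
from urllib.parse import quote

# Illustrative templates built from links on this sheet; {} marks the username slot.
TEMPLATES = {
    "Reddit": "https://www.reddit.com/user/{}",
    "Tinder": "https://www.gotinder.com/@{}",
    "Knowem": "https://knowem.com/checksocialnames.php?u={}",
}


def candidate_profile_urls(username, templates=TEMPLATES):
    """Return {site: url} with the username percent-encoded into each template."""
    safe_name = quote(username.strip(), safe="")
    return {site: tpl.format(safe_name) for site, tpl in templates.items()}


if __name__ == "__main__":
    for site, url in candidate_profile_urls("jdoe_osint").items():
        print(f"{site:8} {url}")
```

Each candidate URL still has to be opened and the account confirmed as belonging to the subject before you pivot on it.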
Real Name
“People” search engines
TruePeopleSch https://www.truepeoplesearch.com/
Spokeo https://www.spokeo.com/
Thatsthem https://thatsthem.com/
Adv Background https://www.advancedbackgroundchecks.com/
Nuwber https://nuwber.com/
FamTreeNow https://www.familytreenow.com/
PeopleByNm http://www.peoplebyname.com/
UFind http://ufind.name/…
PublicRcrds https://publicrecords.directory/
GoLookup https://golookup.com/
PMR http://publicemailrecords.com/name_listings
Radaris https://radaris.com/
Cubib https://cubib.com/
ComLullar http://com.lullar.com/
Yasni http://www.yasni.com/
ZabaSearch https://www.zabasearch.com/
Spytox https://www.spytox.com/
Intelius https://www.intelius.com/
ZoomInfo https://www.zoominfo.com/
Whoodle https://www.whoodle.com/
PeekYou https://peekyou.com/
WebMii http://webmii.com/
CvGadget https://cvgadget.com/
Classmates https://www.classmates.com/
192 (UK) https://www.192.com/
Inteltechniques https://inteltechniques.com/menu/pages/person.tool.html
Email
Don’t Forget A Basic Google Search of the full email address in quotes
Hunter.io https://hunter.io/ (make a free account)
HIBP https://haveibeenpwned.com/ (may be premium soon)
Verify https://tools.verifyemailaddress.io/
Verifalia https://verifalia.com/validate-email
Mailtester http://www.mailtester.com/testmail.php
FindThatEmail http://findthat.email/
AnyMailFinder https://anymailfinder.com/
EmailMatcher https://emailmatcher.com/
ProspectLinked https://prospectlinked.com/#/home
MetricSparrow http://metricsparrow.com/toolkit/email-permutator/
ThatsThem https://thatsthem.com/reverse-email-lookup
Spokeo https://www.spokeo.com/email-search
PsbDmp https://psbdmp.ws/
HackedEmails https://hacked-emails.com/
OCCRP https://data.occrp.org/search?q=gmail.com
Dehashed https://dehashed.com/
Hashes.org https://hashes.org/leaks.php
Gravatar https://en.gravatar.com/site/check/ followed by the email address
ReverseGenie http://www.reversegenie.com/searching=email
ManyContacts https://www.manycontacts.com/en/mail-check
ComLullar http://com.lullar.com/
Inteltechniques https://inteltechniques.com/osint/menu.email.html
Basic Guide https://www.blurbiz.io/blog/the-most-complete-guide-to-finding-anyones-email
OSINT Flow Charts: https://www.dfir.training/osint
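Permutators like MetricSparrow's combine a name with a domain in the formats organizations commonly use, then hand the candidates to a verifier (Hunter.io, Verifalia, Mailtester). Below is a minimal Python sketch of the permutation step only. The pattern list is an assumption based on common corporate formats, not the list any particular tool uses, and every candidate still has to be verified.

```python
"""Generate candidate e-mail addresses for a person at a known domain."""


def email_permutations(first: str, last: str, domain: str) -> list[str]:
    """Return de-duplicated candidates, most common corporate formats first."""
    fn, ln, d = first.strip().lower(), last.strip().lower(), domain.strip().lower()
    # Assumed pattern list for illustration; extend it with formats seen at the target org.
    local_parts = [
        f"{fn}.{ln}",     # jane.doe
        f"{fn}{ln}",      # janedoe
        f"{fn[0]}{ln}",   # jdoe
        f"{fn}{ln[0]}",   # janed
        f"{fn[0]}.{ln}",  # j.doe
        f"{fn}_{ln}",     # jane_doe
        f"{ln}.{fn}",     # doe.jane
        fn,               # jane
        ln,               # doe
    ]
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return [f"{lp}@{d}" for lp in dict.fromkeys(local_parts)]


if __name__ == "__main__":
    for candidate in email_permutations("Jane", "Doe", "example.com"):
        print(candidate)
```

If Hunter.io or a known address from the organization already shows the house format, trim the list to that pattern instead of testing every candidate.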
Domains/IPs
Censys https://censys.io
IntelX https://intelx.io
Domaintools https://www.domaintools.com/
CentralOps https://centralops.net/co/
Whoxy https://www.whoxy.com/
IPLocation https://www.iplocation.net/
DNSLytics https://dnslytics.com/reverse-ip
Randhome https://www.randhome.io/blog/2018/02/23/harpoon-an-osint-/-threat-intelligence-tool/
CrimeFlare http://crimeflare.org:82/
Spyonweb http://spyonweb.com/
Pub-DB http://pub-db.com/
Whoisology https://whoisology.com/
Visualping https://visualping.io/
WatchThatPage http://watchthatpage.com/
PentestTools https://pentest-tools.com/information-gathering/find-subdomains-of-domain#
SharedCount https://www.sharedcount.com/
SmallSEO https://smallseotools.com/backlink-checker/
SimilarWeb https://www.similarweb.com/
Alexa https://www.alexa.com/siteinfo/inteltechniques.com
Hunter.io https://hunter.io/
ViewDNS https://viewdns.info/
Robtex https://www.robtex.com/?=
Majestic https://majestic.com/
D-Me http://d-me.info/
Netcraft https://www.netcraft.com/
DomainBigData https://domainbigdata.com/
Inteltechniques https://inteltechniques.com/osint/domain.search.html
Inteltechniques https://inteltechniques.com/blog/2018/04/24/searching-subdomains-with-findsubdomains-com/
IP6Locator http://ipv6locator.net/
Maxmind https://www.maxmind.com/en/home
IP2Location https://www.ip2location.com/demo/
IPFingerprints https://www.ipfingerprints.com/
ThatsThem https://thatsthem.com/reverse-ip-lookup
Netbootcamp https://netbootcamp.org/websitetool.html
Shodan https://www.shodan.io/
Inteltechniques https://inteltechniques.com/menu/pages/ip.tool.html#
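Before pasting IPs into Shodan, MaxMind, IP2Location and the like, it is worth weeding out addresses those sites can tell you nothing about: private RFC 1918 ranges, loopback, link-local and other reserved space. Here is a minimal Python sketch using only the standard library `ipaddress` module. The bucket names are arbitrary choices for this example.

```python
"""Triage IP selectors before pasting them into lookup sites."""
import ipaddress


def triage_ips(raw: list[str]) -> dict[str, list[str]]:
    """Split raw strings into public, non-public, and invalid buckets."""
    result: dict[str, list[str]] = {"public": [], "non_public": [], "invalid": []}
    for item in raw:
        text = item.strip()
        try:
            ip = ipaddress.ip_address(text)  # accepts IPv4 and IPv6
        except ValueError:
            result["invalid"].append(text)
            continue
        # is_global is False for private, loopback, link-local and other reserved ranges.
        bucket = "public" if ip.is_global else "non_public"
        result[bucket].append(str(ip))
    return result


if __name__ == "__main__":
    print(triage_ips(["8.8.8.8", "192.168.1.10", "not-an-ip"]))
```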
Phone Numbers
For phone #s consider gov/paid options (OSINT is limited)
Zaba https://www.zabasearch.com/reverse-phone-lookup/
USPhoneBook https://www.usphonebook.com/
TruePeopleSearch https://www.truepeoplesearch.com/#
Whitepages+ https://whitepages.plus/
ThatsThem https://thatsthem.com/
TrueCaller https://www.truecaller.com/
Whitepages https://www.whitepages.com/reverse-phone
411 https://www.411.com/reverse-phone
CellRevealer https://www.cellrevealer.com/
FoneFinder http://www.fonefinder.net/
WhoCalld https://whocalld.com/
SpyDialer https://www.spydialer.com/
Searchbug https://www.searchbug.com/tools/
NumberGuru https://www.numberguru.com/phone/
ReverseGenie http://www.reversegenie.com/
YellowPages https://people.yellowpages.com/whitepages/?re=SP
people search
Spokeo https://www.spokeo.com/reverse-phone-lookup
PhoneValidator https://www.phonevalidator.com/index.aspx
CallerIDTest https://www.calleridtest.com/
IMEI https://www.imei.info/
IMEI24 https://imei24.com/phone base/
Sync https://sync.me/
Infobel https://www.infobel.com/
DialingCode http://www.dialingcode.com/
OpenCnam https://www.opencnam.com/
TeleFoonGids https://telefoongids.2link.be/
ServiceObjects https://www.serviceobjects.com/developers/lookups/geophone-plus
WTNG http://www.wtng.info/index.html
SeanLawson https://www.seanlawson.net/2019/02/use-chrome-developer-tools-view-masked-phone-numbers-for-free-people-search/
NANPA https://www.nationalnanpa.com/enas/coCodeReportUnsecured.do?reportType=7
Inteltechniques https://inteltechniques.com/osint/menu.phone.html
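The NANPA central office code report is keyed on the area code (NPA) and exchange (NXX), not the full number. Here is a minimal Python sketch that normalizes a pasted US/Canadian number into those parts. It only checks the basic NANP shape (10 digits, with the NPA and NXX each starting 2-9). It says nothing about whether the number is in service or has been ported away from the original carrier.

```python
"""Split a North American (NANP) phone number into NPA / NXX / line number."""
import re
from typing import Optional


def parse_nanp(raw: str) -> Optional[tuple[str, str, str]]:
    """Return (NPA, NXX, line) or None if the input is not a plausible NANP number."""
    digits = re.sub(r"\D", "", raw)          # strip spaces, dashes, parentheses, '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop the country code 1
    if len(digits) != 10:
        return None
    npa, nxx, line = digits[:3], digits[3:6], digits[6:]
    # Area codes (NPA) and central-office codes (NXX) never start with 0 or 1.
    if npa[0] in "01" or nxx[0] in "01":
        return None
    return npa, nxx, line


if __name__ == "__main__":
    print(parse_nanp("+1 (202) 555-0143"))
```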
Vehicles
CarOwners https://carsowners.net
NICB https://www.nicb.org/vincheck
OReilly https://www.oreillyauto.com/
Carvana https://www.carvana.com/
CheckThatVIN https://checkthatvin.com/ctv#/home
CarFax https://www.carfax.com/processQuickVin.cfx
VehicleHistory https://www.vehiclehistory.com/license-plate-search
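VINs entered into NICB VINCheck, CarFax and similar sites are easy to mistype. For North American VINs, position 9 is a check digit computed with a weighted mod-11 sum, so a quick local check catches most typos before you run a query. This is a minimal Python sketch that assumes a 17-character North American VIN. Not every country requires the check digit, so a failure on a VIN from another region is not proof of anything on its own.

```python
"""Validate the check digit (position 9) of a 17-character North American VIN."""

# Letter-to-number transliteration; I, O and Q are not allowed in VINs.
_TRANSLIT = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Weights for positions 1..17; position 9 (the check digit itself) has weight 0.
_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def vin_check_digit_ok(vin: str) -> bool:
    vin = vin.strip().upper()
    if len(vin) != 17 or any(ch not in _TRANSLIT for ch in vin):
        return False
    total = sum(_TRANSLIT[ch] * w for ch, w in zip(vin, _WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


if __name__ == "__main__":
    print(vin_check_digit_ok("1M8GDM9AXKP042788"))
```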
Misc. Tools & Tricks
Efficiency and Organizational Tools That I Use
Better Windows File Search https://www.voidtools.com/
Synced Notes https://www.onenote.com
Encrypted Coms https://signal.org/
Encrypted Coms https://wire.com/en/
Encrypted Email https://protonmail.com/ (use the free tier for burner/seed accounts)
Hotkey Panel https://www.elgato.com/en/gaming/stream-deck
NAS/Local Cloud https://www.synology.com/en-us
Screen Capture https://www.techsmith.com/store/snagit
Screen Capture https://getfireshot.com/buy.php (pro supports multi-page PDF)
Paper Notebooks https://www.costco.com/Moleskine-Cahier-6-Pack-Extra-Large-Notebooks.product.100300742.html
Veracrypt https://www.youtube.com/watch?v=cxo8xosH TI
VeraCrypt containers are ideal for archiving cases or placing them on flash media for delivery to clients.
Tech Issues https://stackoverflow.com/ Aside from Googling your tech issues, Stack Overflow has discussions on just about any desktop or software issue.
Virtual Machines
Follow written steps verbatim when installing VMs
Buscador https://inteltechniques.com/buscador/
Virtualbox https://www.virtualbox.org/wiki/Downloads
VBox Extensions https://download.virtualbox.org/virtualbox/6.0.10/Oracle_VM_VirtualBox_Extension_Pack-6.0.10.vbox-extpack
Kali Linux https://www.kali.org/downloads/
Tails https://tails.boum.org/
Update Linux sudo apt-get update && sudo apt-get upgrade
Update Youtube-DL sudo -H pip install --upgrade youtube-dl
Common Error Make sure virtualization is enabled in BIOS settings
Host Key Win – Right Control Key Mac – Left Command Key
Vbox Scale Issues
Host + F: switch to full-screen mode (if not already in it)
Host + C: switch into or out of scaled mode
Host + F: switch back to normal size, if needed
3rd Party Overview https://www.youtube.com/watch?v=7Y fKC5EN10
LinkedIn
site:linkedin.com inurl:pub -inurl:dir "at Microsoft" "Current"
site:linkedin.com "Real Name"
User Query https://gitlab.com/initstring/linkedin2username
Email Query https://github.com/pry0cc/GoogLinked
Breach Data https://archive.org/details/LIUsers.7z
Inteltechniques https://inteltechniques.com/menu/pages/linkedin.tool.html
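The LinkedIn dorks above are easy to template when you have a list of employers or names to run. Here is a minimal Python sketch that reproduces the two patterns. The `inurl:pub` operator reflects LinkedIn's older public-profile URLs, and current profiles live under `/in/`. Treat the exact operators as something to re-test, not a fixed recipe.

```python
"""Template the LinkedIn Google dorks above for a list of employers or names."""


def linkedin_employee_dork(company: str, current_only: bool = True) -> str:
    # Same shape as: site:linkedin.com inurl:pub -inurl:dir "at Microsoft" "Current"
    parts = ["site:linkedin.com", "inurl:pub", "-inurl:dir", f'"at {company.strip()}"']
    if current_only:
        parts.append('"Current"')
    return " ".join(parts)


def linkedin_name_dork(real_name: str) -> str:
    # Same shape as: site:linkedin.com "Real Name"
    return f'site:linkedin.com "{real_name.strip()}"'


if __name__ == "__main__":
    for company in ["Microsoft", "Contoso"]:
        print(linkedin_employee_dork(company))
```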
Speed Tricks
Saving a few seconds here and there adds up over time
Context Search https://github.com/ssborbis/ContextSearch-web-ext
Add As Search Engine https://www.wired.com/2014/07/tip-week-chrome-site-search/
Default to Last Year https://thepracticalsysadmin.com/defaulting-google-search-results-to-the-past-year/
Keyboard Shortcuts https://www.quinnssmtbrand.com/windows-keyboard-shortcut/
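Both the "Add As Search Engine" and "Default to Last Year" tricks come down to a URL template. Chrome's custom search engines substitute your query for `%s`, and Google's `tbs=qdr:y` parameter limits results to the past year. Here is a minimal Python sketch of that substitution, handy if you want to generate the same links in bulk for a notes file. The template constant is an example, not a setting taken from the linked articles.

```python
"""Fill a browser-style search-engine template, defaulting Google to the past year."""
from urllib.parse import quote_plus

# Chrome-style custom search engine template: %s is replaced by the query.
# tbs=qdr:y asks Google to restrict results to the past year.
GOOGLE_PAST_YEAR = "https://www.google.com/search?q=%s&tbs=qdr:y"


def fill_template(template: str, query: str) -> str:
    # Percent-encode the query (quotes become %22, spaces become +).
    return template.replace("%s", quote_plus(query))


if __name__ == "__main__":
    print(fill_template(GOOGLE_PAST_YEAR, '"James McIntire" Denver'))
```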
Gaming
Legal requests: https://www.search.org/resources/isp-list/
Discord Search https://www.discordportal.com/
Discord Search https://discordservers.com/
Discord Search https://discord.center/
Discord Search https://disboard.org/
Discord Search https://discord.me/
Discord Search https://support.discordapp.com/hc/en-us/articles/115000468588-Using-Search
Discord Capture https://dht.chylex.com/ (Discord History Tracker)
Twitch https://www.twitchtools.com/
Fortnite https://fortnitetracker.com/profile/search?q=
PSN https://psnprofiles.com/search/
Mixer https://www.lifewire.com/what-is-mixer-4156866
Steam https://steamrep.com/ or https://steamid.uk/
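SteamRep and steamid.uk accept several Steam ID formats, and profiles, forum posts and ban lists often show different ones. The 64-bit ID and the legacy STEAM_X:Y:Z form are related by simple arithmetic, sketched below in Python. This assumes an individual account in the public universe. The leading X in the legacy form differs between games (0 in older titles, 1 in some newer ones), so it is a parameter here.

```python
"""Convert between SteamID64 and the legacy STEAM_X:Y:Z format."""

# SteamID64 of account number 0 for an individual account in the public universe.
STEAMID64_BASE = 76561197960265728


def steamid64_to_legacy(steamid64: int, universe: int = 0) -> str:
    account_id = steamid64 - STEAMID64_BASE
    if account_id < 0:
        raise ValueError("not an individual-account SteamID64")
    # Legacy format encodes account_id = 2 * Z + Y.
    return f"STEAM_{universe}:{account_id % 2}:{account_id // 2}"


def legacy_to_steamid64(legacy: str) -> int:
    _universe, y, z = legacy.strip().removeprefix("STEAM_").split(":")
    return STEAMID64_BASE + 2 * int(z) + int(y)


if __name__ == "__main__":
    print(steamid64_to_legacy(76561197960287930))
```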
Business & Organizations
Google: resume AND "real name"
OpenCorp https://opencorporates.com/
Rocketreach https://rocketreach.co/
OCCRP https://data.occrp.org/
CorpWiki https://www.corporationwiki.com/
Recruitin https://recruitin.net/
Indeed https://www.indeed.com/
MarketVisual http://marketvisual.com/
AihitData https://www.aihitdata.com/
Glassdoor https://www.glassdoor.com/Reviews/index.htm
LittleSis https://littlesis.org/
OpenSanctions https://www.opensanctions.org/
CEOEmail https://ceoemail.com/
Enigma https://public.enigma.com/browse/collection/corp-watch-company-subsidiaries/
Angel https://angel.co/
RipoffReport https://www.ripoffreport.com/
Sector035’s Guide https://medium.com/@sector035/gathering-company-intel-the-agile-way-6db12ca031c9
Operational Security – Browsers
Browser, Session, and Site Tests
Device Fingerprint https://panopticlick.eff.org/
Browser Fingerprint https://amiunique.org/fp
Browser Fingerprint https://www.deviceinfo.me/
Browser Fingerprint https://browseraudit.com
Browser Fingerprint https://browserleaks.com/
Browser Fingerprint https://pixelprivacy.com/resources/browser-fingerprinting/
Browser Fingerprint https://detectmybrowser.com/
IP Leaks https://ipleak.net
DNS Leaks https://www.dnsleaktest.com/
Email Leaks https://www.emailprivacytester.com
Site Privacy Test https://webbkoll.dataskydd.net/en/
Privacy Resources https://inteltechniques.com/links.html
Operational Security – Windows
Recommended Tools For Windows Security
Create Non-Privileged User https://support.microsoft.com/en-us/help/4026923/windows-10-create-a-local-user-or-administrator-account
Anti-Virus https://www.microsoft.com/en-us/windows/comprehensive-security
Anti-Malware https://www.malwarebytes.com/mwb-download/
Anti-Spyware https://www.safer-networking.org/
Windows Privacy https://ssd.eff.org/en/module/how-delete-your-data-securely-windows
Win10 Privacy https://www.thewindowsclub.com/privatewin10-advanced-windows-10-privacy-tool
Win10 Privacy https://fdossena.com/?p=w10debotnet/index 1903.frag
Check Your Microsoft Data https://account.microsoft.com/account/privacy
Network Activity https://www.glasswire.com/
Password Manager https://keepassxc.org/
Cleaner https://www.bleachbit.org/download/windows
Cleaning Manually https://www.makeuseof.com/tag/best-way-clean-windows-10-step-step-guide/
Common Missteps
Methodology is more important than tools or techniques because those things change. Invest in defining a strong process.
Are you signed into a live session for the platform you are querying? i.e., make sure you are signed into FB in another tab.
Do you have script blockers that might be preventing data from loading on a page? (e.g., Privacy Badger, uBlock, Ghostery)
Failure to use non-OSINT approaches and strategies, e.g., social engineering (consider a friendly phone call).
Including a space at the end when pasting an account ID or other keyword into a query form field.
Start looking at page source to see what is going on behind the scenes. If you only look at the GUI, you are missing a lot.
Location. Your search results are being skewed by your perceived location; consider using a VPN to “relocate”.
Tenacity wins the day. Most answers are not going to fall into your lap. Patience and persistence above all else.
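The trailing-space misstep has cousins: non-breaking spaces and zero-width characters that ride along when you copy an ID out of a web page. Here is a minimal Python sketch of a cleanup step to run on selectors before pasting them into query forms. The NFKC folding is an assumption that suits most usernames and IDs. Skip it if a platform treats those characters as distinct.

```python
"""Clean a pasted selector (username, account ID, email) before querying."""
import unicodedata

# Zero-width and byte-order-mark characters that often ride along on copy/paste.
_INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def clean_selector(raw: str) -> str:
    text = unicodedata.normalize("NFKC", raw)  # fold full-width and compatibility characters
    text = text.translate(_INVISIBLE)          # delete zero-width characters
    return text.strip()                        # drop leading/trailing whitespace


if __name__ == "__main__":
    print(repr(clean_selector("\u200bjdoe123\u00a0")))
```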
More OSINT Resources
https://docs.google.com/document/d/1BfLPJpRtyq4RFtHJoNpvWQjm-
GnyVkfE2HYoICKOGguA/ (Bellingcat Toolkit)
https://www.i-intelligence.eu/wp-content/uploads/2018/06/OSINT
Handbook June-2018 Final.pdf (I-Intelligence Collection)
https://medium.com/@sector035 (@sector035)
https://github.com/Ph055a/OSINT-Collection (OSINT.Team Collection)
https://www.osinttechniques.com/osint-tools.html
https://osintcurio.us/10-minute-tips/
https://www.learnallthethings.net/creepyosint (@baywolf88)
https://atlas.mindmup.com/digintel/digital intelligence training/index.
html
OSINT METHODOLOGY 101
BUILDING AN EFFICIENT, REPEATABLE, AND ARTICULABLE PROCESS
Basic Investigative Steps
Working up your first case with your new tools and techniques
1. Set up your note-taking and data collection to track your work: paper notebook, OneNote, Hunch.ly, a directory on an encrypted flash drive, etc.
2. List your investigative goals: full profile, locate for apprehension, identify associates, collect digital evidence, etc. (Are you collecting intel, or evidence for court?)
3. List your seed info – emails, phone numbers, names, etc.
4. Run all of your paid and/or gov queries and use those to add to your seed information. If possible, get hold of a booking or DOL photo for comparison while researching social media.
5. Run Accurint (Lexis-Nexis), TLO, or Clear reports.
6. Fire up Firefox/Chrome with your plugins of choice: NoScript, HTTPS Everywhere, Ghostery, FireShot, OneTab (or use the browsers in the Buscador VM).
7. If it’s a serious investigation, I turn on Hunch.ly and enter my “selectors” (keywords from seed info).
8. I do a quick Google search (e.g., [“James McIntire” “Denver”]) and check my people-finder site of choice for that week (this week, truepeoplesearch.com). These quick checks are just for low-hanging fruit.
9. Go to https://inteltechniques.com/menu.html (or your OSINT toolset of choice, e.g., osintframework.com) and use the tabs on the left-hand side to select the categories that match your seed info. My typical order is email, real name, search engines, Facebook, Twitter, and then the rest depending on what you have to go on.
10. I exhaust the inteltechniques.com tools, closing any tabs that return false positives or no useful results. For any important page, I note its identifiers (account IDs, usernames, etc.) on my notepad and FireShot a PDF of the page. That PDF is saved in the case directory. On a case with multiple targets, create subfolders for each person of interest.
11. Either periodically or when I’m done with my research, I copy/paste or manually enter any pertinent info into a profile or case report in Word or OneNote. I embed any pertinent screen captures, PDFs such as Lexis-Nexis reports, and good photos of the targets, their vehicles, and addresses.
12. I go over that report with the case detective or agent to explain my investigation and see if they have any
questions or want any additional info.
13. My rough notes, workbooks, Hunch.ly files, and/or cloned VMs (if I used Buscador) are usually saved in case I need them for court. The exceptions are things like intel gathering for operations, events, threat assessments, etc. A Hunch.ly export might be burned to disc as evidence, but be cautious of any unintended data that might have been saved during that session. The VM backup should not go into evidence, as it would divulge tradecraft. Treat it as an undercover laptop that you can refer to, but avoid exposing it unless you are forced to (work with your prosecutor to fight this). If you don’t need that VM for court, do not keep it (hoarding data comes with custodial responsibilities and potential liabilities).
14. I make sure I have a fresh VM for the next case or crisis that comes up. I also make new accounts to have in pocket in case any of my research accounts were burned. It is better to prepare for the next case at the end of the previous one and be ready to go at a moment’s notice.
15. Wash, rinse, repeat. Track successes to justify more equipment, staffing, and training.
Note: My standard setup is an off-grid Windows PC on a UC cable modem or MiFi (VPN as appropriate). For quick checks such as events, threats, etc., I stay in Windows and just use Chrome/Firefox and the links on inteltechniques.com. This is for convenience and speed, with less fuss, when there’s less need for compartmentalization, security, and/or anonymity. For investigations I typically use Buscador with Hunch.ly installed and all-fresh research accounts. Quick utility vs. backstopped single purpose: use the right tool for each mission.
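Step 10 (a case directory with one subfolder per person of interest) and step 13 (keeping material in case it is needed for court) can be partly scripted. This minimal Python sketch creates the folder layout and builds a SHA-256 manifest of whatever captures have been saved. Hashing is an addition here, not part of the steps above. It is a common way to show later that a saved PDF has not changed since collection. The folder naming (spaces replaced with underscores) is an assumption.

```python
"""Create a case folder with one subfolder per subject and hash the saved captures."""
import hashlib
from pathlib import Path


def make_case_dirs(root: Path, case_id: str, subjects: list[str]) -> Path:
    case_dir = root / case_id
    for subject in subjects:
        # Assumed naming scheme: spaces replaced so paths stay shell-friendly.
        (case_dir / subject.strip().replace(" ", "_")).mkdir(parents=True, exist_ok=True)
    return case_dir


def sha256_manifest(case_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the case folder) to its SHA-256 digest."""
    manifest: dict[str, str] = {}
    for path in sorted(p for p in case_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):  # read in 64 KiB chunks
                digest.update(chunk)
        manifest[path.relative_to(case_dir).as_posix()] = digest.hexdigest()
    return manifest
```

Writing the manifest into the case report at the time of collection makes it easy to re-hash the folder later and compare.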
ACCOUNT CREATION 101
Building Reliable Research Accounts
This is a list of recommended steps for creating investigative/research social media accounts. These are largely based on feedback from our community and their experiences with having their accounts locked or suspended. Where applicable, steps are listed in order of preference for successfully avoiding security challenges.
Equipment Setup – It may seem simple, but the equipment and connection you are on matters.
1. Avoid VPNs during account creation; most of their IP ranges are flagged.
2. MiFis or dynamic-IP devices work quite well for account creation.
3. Public networks (e.g., Starbucks Wi-Fi) also work, but be aware that you are being exposed and cross-correlated with other users on that network.
4. Phone #: A real non-VoIP phone number will save you a lot of hassle; we recommend a $5 Mint SIM card kit paired with an unlocked smartphone (mintmobile.com).
5. Online Footprint – “Google” your name and employer. Print the first two pages of results and include this in
your binder as the “low hanging fruit” of personal data.
Covert Accounts
1. We usually make FB, IG, and Twitter accounts at once and tie them together as one covert profile. Each adds depth and veracity to the others (intentional cross-correlation).
2. Keep notes on your covert details either in a paper notebook or in a digital format like a password manager or spreadsheet, keeping your security requirements in mind.
3. If it is a sensitive or deep infiltration case, make sure to compartmentalize this profile from the get-go (connection, browser, device (use a VM to isolate), etc.).
4. Connection:
a. No VPN during account creation; most VPN IP blocks are flagged.
b. Cellular data connections (MiFis) are good: dynamic/shared IPs.
c. Another technique is to spin up a free-tier AWS EC2 or DigitalOcean VM and use it to make the account, so that you have a cloud-provider IP. This is more advanced but works pretty well if you are comfortable with VMs and learning to navigate AWS. Some groups even run full investigative VMs on AWS, but again, this is a more advanced setup that takes some work to sort out.
d. Another advanced technique is to roll your own VPN through AWS, as the providers tend not to flag AWS: https://github.com/StreisandEffect/streisand
5. Email Address:
a. No Gmail, Hotmail, Yahoo, or other top free mail (GMX is an exception for now).
b. Private domains work best; grab a Namecheap or GoDaddy domain and webmail cheaply and make a bunch of accounts with them.
c. GMX.us accounts seem to work OK (for now) and require no existing email or contact info.
d. Sudomail and ProtonMail addresses work OK, though not as well as a private domain.
6. Phone #:
a. You might get lucky and not get the phone number requirement. Sometimes, though, it won’t be required at first, but a couple of hours or days later it will be thrown at you as a security requirement.
b. No VoIP: most number blocks are flagged.
c. Mint test kits and an unlocked phone are a cheap way to get 7 days on a real number.
1. Make sure you have Mint coverage in your area.
2. https://www.amazon.com/Mint-Mobile-Starter-Verify-Compatibility/dp/B0786RD524 ($5 for two SIMs)
3. You might then port the number over to Google Voice.
4. Some groups buy these in bulk.
d. You can also use an extra # on a real account (e.g., Verizon), port it over to Google Voice, and then draw a new # for that Verizon account.
e. Some people will also use hotel phones and the like when traveling to roll accounts, but that is kind of
7. Once we get into our new account, we do not leave it fallow; we start making it feel real right away.
8. Choose a name that is generic, but not too generic
a. i.e.: Nicky Robinson, Hunter Reynolds, etc.
b. http://howmanyofme.com/
9. Name, gender, city, and employer (school) should make sense together. Remember that a real person at FB will likely look at your profile if it is reported as suspicious; we want to pass the smell test.
10. Profile/cover photo
a. We don’t ever purport to be a specific individual without consent (i.e.: no identity theft)
b. Pikwizard.com: a good source of free photos licensed for any use
c. Pixabay.com is also decent
d. Avatar makers are another option https://mashable.com/2007/09/12/avatars/#mn3Ph1PwgZqi
e. fiverr.com – You can buy profile photos for cheap or anything else really…avoid buying bulk accounts,
they are often locked, scams, or stolen
f. I also like taking a pic of a large crowd (road race, sporting event, concert) from images.bing.com, cropping it with the Snipping Tool, and posting the still-large group shot. It’s unclear who we are in the group, and yet it’s the kind of content people post for profiles or banners, because the internet is all about bragging.
g. Get creative – general rule is snip, crop, filter, logical pic choice
1. Time to flesh out our profile by making some friends
a. Join Groups – anything that has large groups that accept anyone
b. Nerdy groups and pop culture are my favs: video games, cosplay (cause then costumed profiles make
sense), etc.
c. If you are doing a deep infiltration, you may have to research your target’s groups. Don’t join her/his groups directly; join similar ones and work your way in slowly after you have some history.
d. Do some liking and commenting in groups for a day or two.
e. Then go to https://www.facebook.com/find-friends/browser/ and let FB recommend friends. We never cold-call friends anymore; we let FB tell us who it has already cross-correlated with our profile. This significantly reduces the chances of getting flagged.
2. Posts: On August 1, Facebook cut off all third-party app access except for Messenger or FB Pages. We formerly used IFTTT and WordPress to auto-post, but they are broken for now. IFTTT still works for Twitter.
3. Avoid political chat and comments. Politics and social issues are high on the radar of the FB watchdogs due
to the fake news and voter tampering concerns.
4. Keep track of covert accounts in a spreadsheet or, better yet, a password manager.
5. SIM jacking of Twitter accounts is very popular, so use long passphrases even on your sock accounts and consider 2-factor authentication if they are mature or otherwise valuable accounts.
6. Know your agency’s policies around things like friending and any levels of approval or documentation required.
7. …and of course, we always use our powers for good, so we always assume that our investigation will eventually see the light of day. Make sure you are proud of how your activity will look in retrospect to an objective third party judging whether it was reasonable and responsible.
Note: This is purely anecdotal, but in addition to “getting into character” and making our accounts feel real, I suspect
that there may be some value to occasionally clicking on ads and other content that the platform is pushing at you. This
is not a privacy/security best practice, but there are detection algorithms that may favor revenue positive accounts.
Again, this is just a theory.
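The list above recommends long passphrases even on sock accounts. Here is a minimal Python sketch of a random word-based passphrase. The eight-word list is a placeholder only. In practice you would load a large published list; the EFF long wordlist, for example, has 7,776 words, about 12.9 bits per word. The result should go straight into your password manager alongside the covert account details.

```python
"""Generate a random multi-word passphrase for a research account."""
import math
import secrets

# Placeholder word list for illustration only; load a large published list in practice.
WORDS = ["copper", "lantern", "orbit", "maple", "thunder", "velvet", "harbor", "quartz"]


def make_passphrase(words: list[str], n_words: int = 6, sep: str = "-") -> str:
    # secrets.choice draws from the OS CSPRNG, unlike random.choice.
    return sep.join(secrets.choice(words) for _ in range(n_words))


def entropy_bits(word_count: int, n_words: int) -> float:
    # Each word chosen uniformly at random contributes log2(list size) bits.
    return n_words * math.log2(word_count)


if __name__ == "__main__":
    print(make_passphrase(WORDS))
    print(round(entropy_bits(7776, 6), 1))
```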
REPORTING
SAMPLE COVER/FACE SHEET
LOGO HERE
Company/Org Name
Section or Analyst Name
Open Source Investigative Profile
Summary of Findings
Subject ID
Name: DOB:
Address: Phone #1:
Phone #2:
Employer: SS#:
Vehicles:
Alternate Identities and Associations
Email #1: Email #2:
Email #3: Email #4:
User Name: UN #2
Facebook : FB #
Twitter: TW #:
Instagram: IG #:
Photos/Video
☐Photos
☐Video
Description Source
Attachments
☐ Excel Profile Report
☐ Data Source DVD
☐ Photographs
☐ Hunch.ly Archive
☐ Link Analysis Report
☐ Comprehensive TLO, Clear, Accurint Report
☐ DOL/GOV Checks
☐ Other: ____________________
Relatives:
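If profiles are kept in a spreadsheet or database rather than on paper, the face sheet above maps naturally onto a simple record. Here is a minimal Python sketch. The field names are shorthand chosen for this sketch (not a standard schema), and `blank_fields()` is a hypothetical helper for a pre-delivery check of what is still unfilled.

```python
"""Capture the face-sheet fields as structured data and list what is still blank."""
from dataclasses import dataclass, field, fields


@dataclass
class FaceSheet:
    # Illustrative field names; mirror your agency's actual template.
    name: str = ""
    dob: str = ""
    address: str = ""
    employer: str = ""
    ssn: str = ""
    phones: list[str] = field(default_factory=list)
    vehicles: list[str] = field(default_factory=list)
    emails: list[str] = field(default_factory=list)
    usernames: list[str] = field(default_factory=list)
    facebook: str = ""
    twitter: str = ""
    instagram: str = ""
    relatives: list[str] = field(default_factory=list)

    def blank_fields(self) -> list[str]:
        """Names of fields that are still empty, in template order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    sheet = FaceSheet(name="Jane Doe", emails=["jdoe@example.com"])
    print(sheet.blank_fields())
```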
SHORTCUTS & HOT-KEYS
COMPLETING 1,000 SMALL TASKS A LITTLE FASTER
Windows Shortcut Keys
Windows Key + R: Opens the Run menu.
Windows Key + E: Opens Explorer.
Alt + Tab: Switch between open programs.
Windows Key + Up Arrow: Maximize current window.
Ctrl + Shift + Esc: Open Task Manager.
Windows Key + Break: Open the System Properties dialog box.
Windows Key + F: Opens search for files and folders.
Windows Key + D: Hide/display the desktop.
Alt + Esc: Switch between programs in the order they were opened.
Alt + Letter: Select menu item by underlined letter.
Ctrl + Esc: Open Start menu.
Ctrl + F4: Close active document (does not work with some applications).
Alt + F4: Quit active application or close current window.
Alt + Spacebar: Open menu for active program.
Ctrl + Left or Right Arrow: Move cursor forward or back one word.
Ctrl + Up or Down Arrow: Move cursor forward or back one paragraph.
F1: Open Help menu for active application.
Windows Key + M: Minimize all windows.
Shift + Windows Key + M: Restore windows that were minimized with the previous keystroke.
Windows Key + F1: Open Windows Help and Support.
Windows Key + Tab: Open Task View.
Windows Key + L: Lock your workstation.
Windows Key + X: Open the Quick Link menu (includes the shut-down options).
Hold Right Shift key for eight seconds: Switch FilterKeys on and off.
Left Alt + Left Shift + Print Screen: Switch High Contrast on and off.
Left Alt + Left Shift + Num Lock: Switch Mouse Keys on and off.
Press Shift five times: Switch Sticky Keys on and off.
Hold Num Lock for five seconds: Switch Toggle Keys on and off.
Ctrl + Tab: Switch between tabs in the active program.
F11: Maximize window.
Ctrl + A: Select text (expanded with Windows 10).
Ctrl + C: Copy text.
Ctrl + V: Paste text.
Windows Key + R, then type “cmd”: Open Command Prompt.
Tab: Autocomplete folder or file name.

Shortcuts for Mac
Command + X: Cut selected text and copy it to the clipboard.
Command + C: Copy selected text.
Command + V: Paste copied text.
Command + Z: Undo previous command.
Command + A: Select all items.
Command + F: Open Find window to search text.
Command + H: Hide windows of the front app.
Command + N: Open a new document or window.
Command + O: Open a selected item.
Command + P: Print current document.
Command + S: Save current document.
Command + W: Close front window.
Command + Q: Quit the app.
Command + M: Minimize the front window to the Dock.
Command + Spacebar: Open Spotlight search field.
Command + Tab: Switch between open apps.
Command + B: Bold selected text.
Command + I: Italicize selected text.
Command + U: Underline selected text.
Command + Semicolon (;): Find misspelled words in document.
Option + Command + Esc: Choose an app to force quit.
Shift + Command + Tilde (~): Switch between open windows.
Shift + Command + 3: Take a screenshot.
Fn + Up Arrow: Scroll up one page.
Fn + Down Arrow: Scroll down one page.
Fn + Left Arrow: Scroll to beginning of document.
Fn + Right Arrow: Scroll to end of document.

Finder Shortcuts
Shift + Command + F: Open All My Files window.
Shift + Command + K: Open Network window.
Option + Command + L: Open Downloads folder.
Shift + Command + O: Open Documents folder.
Shift + Command + U: Open Utilities folder.
Option + Command + D: Show or hide the Dock.
Shift + Command + N: Create a new folder.
Command + Delete: Move selected item to the Trash.
Shift + Command + Delete: Empty Trash.
*www.quinnssmtbrand.com/windows-keyboard-shortcut/
Chrome
Shortcut Keys Description
Alt+Home Open your homepage.
Alt+Left Arrow Back a page.
Alt+Right Arrow Forward a page.
F11 Display the current website in full-screen mode. Pressing F11 again will exit this mode.
Esc Stop loading the page or a download from loading.
Ctrl+(- or +) Zoom in or out of a page, “-” will zoom out and “+” will zoom in on the page.
Ctrl+1-8 Pressing Ctrl and any number 1 through 8 moves to the corresponding tab in your tab bar.
Ctrl+9 Switch to last tab.
Ctrl+0 Reset browser zoom to default.
Ctrl+Enter This combination is used to quickly complete an address. For example, type “computerhope” in the address bar and press Ctrl+Enter to get https://www.computerhope.com.
Ctrl+Shift+Del Open the Clear browsing data window to quickly clear private data.
Ctrl+Shift+B Toggle the bookmarks bar between hidden and shown.
Ctrl+A Select everything on a page.
Ctrl+D Add a bookmark for the page currently opened.
Ctrl+F Open the “find” bar to search text on the current page.
Ctrl+O Open a file in the browser.
Ctrl+Shift+O Open the Bookmark manager.
Ctrl+H Open browser history in a new tab.
Ctrl+J Display the downloads window.
Ctrl+K or Ctrl+E Moves your text cursor to the omnibox so that you can begin typing your search query and perform a Google search.
Ctrl+L Move the cursor to the browser address bar and highlight everything in it.
Ctrl+N Open New browser window.
Ctrl+Shift+N Open a new window in incognito (private) mode.
Ctrl+P Print current page or frame.
Ctrl+R or F5 Refresh the current page or frame.
Ctrl+S Opens the Save As window to save the current page.
Ctrl+T Opens a new tab.
Ctrl+U View a web page’s source code.
Ctrl+W Closes the currently selected tab.
Ctrl+Shift+W Closes the currently selected window.
Ctrl+Shift+T This combination reopens the last tab you’ve closed. If you’ve closed multiple tabs, you can press this shortcut key multiple times to restore each of the closed tabs.
Ctrl+Tab Moves through each of the open tabs going to the right.
Ctrl+Shift+Tab Moves through each of the open tabs going to the left.
Ctrl+Left-click Open a link in a new tab in the background.
Ctrl+Shift Left-click Open a link in a new tab and switch to the new tab.
Ctrl+Page Down Open the browser tab to the right.
Ctrl+Page Up Open the browser tab to the left.
Spacebar Moves down a page at a time.
Shift+Spacebar Moves up a page at a time.
Home Go to top of page.
End Go to bottom of page.
Alt+Down Arrow Display all previous text entered in a text box and available options on a drop-down menu.
*Shortcut List Source: www.computerhope.com
Firefox
Shortcut Keys Description
F5 Refresh current page, frame, or tab.
F11 Display the current website in fullscreen mode. Pressing F11 again will exit this mode.
Esc Stop page or download from loading.
Spacebar Moves down a page at a time.
Alt+Home Open your homepage.
Alt+Down arrow Display all previous text entered in a text box and available options on drop-down menu.
Alt+Left Arrow Back a page.
Alt+Right Arrow Forward a page.
Ctrl+(- or +) Zoom out or in on the page; pressing ‘-’ zooms out and ‘+’ zooms in. Ctrl+0 resets back to default.
Ctrl+D Add a bookmark for the page currently opened.
Ctrl+F Access the Find option, to search for any text on the currently open web page.
Ctrl+H View browsing history.
Ctrl+I Display available bookmarks.
Ctrl+J Display the download window.
Ctrl+K or Ctrl+E Move the cursor to the search box.
Ctrl+L Move cursor to address box.
Ctrl+N Open New browser window.
Ctrl+O Access the Open File window to open a file in Firefox.
Ctrl+P Print current page or frame.
Ctrl+T Opens a new tab.
Ctrl+U View a web page’s source code.
Ctrl+F4 or Ctrl+W Closes the currently selected tab.
Ctrl+F5 Refresh the page, ignoring the Internet cache (force full refresh).
Ctrl+Enter Quickly complete an address.
Ctrl+Tab Moves through each of the open tabs.
Ctrl+Shift+Del Open the Clear Data window to quickly clear private data.
Ctrl+Shift+B Open the Bookmarks window, to view all bookmarks in Firefox.
Ctrl+Shift+J Open the Browser Console to troubleshoot an unresponsive script error.
Ctrl+Shift+P Open a new Private Browsing window.
Ctrl+Shift+T Reopen the last closed tab.
Ctrl+Shift+W Close the Firefox browser window.
Shift+Spacebar Moves up a page at a time.
Ctrl+Shift+Tab Moves through each of the open tabs going to the left.
Ctrl+Left-click Open a link in a new tab in the background.
Ctrl+Shift Left-click Open a link in a new tab and switch to the new tab.
Ctrl+Page Down Open the browser tab to the right.
Ctrl+Page Up Open the browser tab to the left.
Home Go to top of page.
End Go to bottom of page.
Alt+Down Arrow Display all previous text entered in a text box and available options on a drop-down menu.
*Shortcut List Source: www.computerhope.com
BUSCADOR 2.0
OSINT LINUX DISTRO
Installation Notes (2.0)
You will need a Virtual Machine application in order to use this system. VirtualBox is free and will suffice for most investigations. Some users prefer a more robust option with VMware Workstation for Windows or VMware Fusion for Mac. Any of these options will get you started.
VirtualBox Installation and Configuration:
* Make sure you have the latest version of VirtualBox and the VirtualBox Extension Pack installed
1) In the VirtualBox menu, click on File > Import Appliance
2) Navigate to the OVA file that was downloaded (Buscador)
3) Choose this file and select “Import”
4) Before starting the new machine, highlight it and choose “Settings”
5) Under General > Basic, rename this machine as desired (Buscador?)
6) Under General > Advanced, change Shared Clipboard to Bi-Directional
7) Under System > Motherboard, increase the RAM if you have ample resources (up to half of the host’s total RAM)
8) Under Display > Screen, increase the Video Memory to 128MB if available
9) Under Shared Folders, click the “plus” on the right, choose folder to store evidence, select “Auto-Mount”
10) Click “OK” twice, then launch the new machine (Double Click)
11) Upon boot, log into the user “osint” with the password of osint
12) In the VirtualBox Menu, select Devices > “Insert Guest Additions CD Image”
13) Click “Cancel” when the dialogue box pops up.
14) Open Terminal (Tilix)
15) In Terminal, Create a directory on the Desktop titled vbox: mkdir ~/Desktop/vbox
16) Copy everything from the CD media on the Desktop to vbox folder (copy/paste)
17) In Terminal, input the following commands:
cd Desktop/vbox
chmod +x *.sh
./autorun.sh
(type password when prompted)
18) Allow the image to be installed, and reboot upon completion.
19) Start the Terminal in the new VM and type sudo adduser osint vboxsf
20) Provide the password as needed (osint)
21) Reboot
You should now have access to the shared directory in order to save data to the host operating system (evidence). It can be found in the File Manager (Home), in the left column, titled “sf_” followed by the name of the folder to which it is connected. This shared folder will also be on your desktop for easy access. You can make the machine full-screen, copy and paste text to and from the image, and you are ready to begin using the applications.
Support & Updates
Open Tilix (Terminal), and enter the following commands:
NOTE:
Update_scripts no longer needed!
Video Download Update:
sudo -H pip install --upgrade youtube-dl
Spiderfoot Update:
cd /opt/spiderfoot
git reset --hard
git pull
sudo reboot
