Database and Research Matrix.pptx

Database: Indexing Database and Citation Database
1. What is a database?
A database is a collection of information that is organized so that it can be easily accessed, managed and updated. Computer databases typically contain aggregations of data records or files, such as information about sales transactions or interactions with specific customers.
2. Indexing database:
Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed. It is a data structure technique used to quickly locate and access the data in a database. Indexes let the database find data without having to search every row in a table every time the table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
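A minimal Python sketch of the idea, assuming a small in-memory table (the rows, the `student_id` key and the function names are hypothetical): a full scan examines rows one by one, while a sorted index is binary-searched and then followed to the row. Real DBMSs keep index entries on disk pages, usually in a B+-tree, rather than in a Python list.

```python
import bisect

# Hypothetical table: (student_id, name) rows stored in arbitrary order.
rows = [(42, "Asha"), (7, "Ben"), (19, "Chen"), (88, "Dara"), (3, "Eli")]

def full_scan(table, key):
    """Examine rows one by one; return (matching row, number of rows examined)."""
    examined = 0
    for row in table:
        examined += 1
        if row[0] == key:
            return row, examined
    return None, examined

# Index: search keys kept in sorted order, each paired with the row's position.
index = sorted((row[0], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in index]

def index_lookup(table, key):
    """Binary-search the sorted keys (O(log n)), then follow the pointer to the row."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return table[index[i][1]]
    return None

def range_lookup(table, low, high):
    """Ordered keys make range queries cheap: find both ends, read everything between."""
    start = bisect.bisect_left(index_keys, low)
    end = bisect.bisect_right(index_keys, high)
    return [table[pos] for _, pos in index[start:end]]
```

Here `full_scan(rows, 3)` has to examine all five rows, whereas `index_lookup` needs only a logarithmic number of key comparisons, and `range_lookup(rows, 5, 40)` returns the rows for keys 7 and 19 because the keys are kept in order.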

Structure of an Index in DBMS
An index is built from two columns:
1. The first column is the Search Key, which contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly.
2. The second column is the Data Reference or Pointer, which contains a set of pointers holding the address of the disk block where that particular key value can be found.
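The two-column layout can be sketched in Python as follows. This assumes a toy data file of records sorted by primary key and stored three to a block; the block numbers stand in for real disk addresses, and `fetch` is a hypothetical helper.

```python
import bisect

# Hypothetical data file: records sorted by primary key, stored 3 per disk block.
BLOCK_SIZE = 3
records = [(101, "A"), (102, "B"), (105, "C"), (110, "D"), (111, "E"), (120, "F"), (130, "G")]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Index with two columns: search key (copy of the primary key) and data reference
# (here simply the number of the block that holds the record).
index = [(rec[0], block_no) for block_no, block in enumerate(blocks) for rec in block]
index_keys = [key for key, _ in index]  # already sorted because the records are

def fetch(key):
    """Find the key in the index, then read only the block the pointer names."""
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None
    block = blocks[index[i][1]]
    return next(rec for rec in block if rec[0] == key)
```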
Indexing techniques are compared on the following attributes:
Access Types: the type of access supported, such as value-based search, range access, etc.
Access Time: the time needed to find a particular data element or set of elements.
Insertion Time: the time taken to find the appropriate space and insert new data.
Deletion Time: the time taken to find an item and delete it, as well as update the index structure.
Space Overhead: the additional space required by the index.

Types of Indexing in Databases
In general, indexing methods follow one of two file organization mechanisms to store the data:
1. Sequential File Organization or Ordered Index File: the indices are based on a sorted ordering of the values. These are generally fast and are the more traditional type of storage mechanism. An ordered (sequential) index may store its entries in a dense or sparse format:
(i) Dense Index:
For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that search key value.
This helps you to search faster but needs more space to store index records.
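A sketch of a dense index, assuming a hypothetical data file sorted on a non-unique search key (`dept`). There is one index entry per distinct search-key value, pointing to the first record with that value; a Python dict stands in for the sorted index file (its insertion order is sorted here because the data file is).

```python
# Hypothetical data file sorted on the search key (department); keys repeat.
data_file = [("CS", "Ann"), ("CS", "Bo"), ("EE", "Cy"), ("ME", "Di"), ("ME", "Ed")]

# Dense index: an entry for EVERY distinct search-key value, pointing to the
# position of the first record with that value.
dense_index = {}
for pos, (dept, _) in enumerate(data_file):
    dense_index.setdefault(dept, pos)

def lookup_all(dept):
    """Jump straight to the first matching record, then read forward while keys match."""
    if dept not in dense_index:
        return []
    pos = dense_index[dept]
    names = []
    while pos < len(data_file) and data_file[pos][0] == dept:
        names.append(data_file[pos][1])
        pos += 1
    return names
```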

(ii) Sparse Index:
• An index record appears for only some of the items in the data file. Each index record points to a block of the data file.
• To locate a record, we find the index record with the largest search key value less than or equal to the search key value we are looking for.
• We start at the record pointed to by that index record and proceed along the pointers in the file (that is, sequentially) until we find the desired record.
• Number of accesses required = ⌈log₂(n)⌉ + 1, where n is the number of blocks occupied by the index file (a binary search over the index blocks plus one access to the data block).
Because a sparse index stores index records for only some search-key values, it needs less space and less maintenance overhead for insertions and deletions, but it is slower than a dense index for locating records.
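The sparse-index lookup steps can be sketched in Python, assuming hypothetical integer keys stored three per block. The index holds only the first key of each block; `bisect_right(...) - 1` finds the largest index key less than or equal to the search key, and that block is then scanned sequentially. `accesses` evaluates the access-count formula, rounding log₂(n) up to a whole number of block reads.

```python
import bisect
import math

# Hypothetical sorted data file split into blocks of 3 records.
records = [5, 8, 12, 15, 21, 27, 30, 33, 41, 47]
blocks = [records[i:i + 3] for i in range(0, len(records), 3)]  # 4 blocks

# Sparse index: one entry per BLOCK (its first key), not one per record.
sparse_keys = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return (block number, key) if found, else None."""
    i = bisect.bisect_right(sparse_keys, key) - 1  # largest index key <= key
    if i < 0:
        return None
    for rec in blocks[i]:  # scan the block sequentially
        if rec == key:
            return i, rec
    return None

def accesses(n_index_blocks):
    """Binary search over n index blocks plus one data-block read."""
    return math.ceil(math.log2(n_index_blocks)) + 1
```

For example, `sparse_lookup(27)` lands on the index entry 15 and finds 27 in block 1, and an index occupying 8 blocks needs `accesses(8)`, i.e. 4 block reads.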

2. Hash File Organization: indices are based on the values being distributed uniformly across a range of buckets. The bucket to which a value is assigned is determined by a function called a hash function.
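A minimal hash-file sketch in Python. The modulo hash `key % NUM_BUCKETS`, the four buckets and the employee records are assumptions chosen for readability; real systems use stronger hash functions and must handle bucket overflow.

```python
NUM_BUCKETS = 4

def hash_function(key: int) -> int:
    # Simple modulo hash, assumed for illustration only.
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, value):
    buckets[hash_function(key)].append((key, value))

def search(key):
    """Only the one bucket chosen by the hash function is examined."""
    for k, v in buckets[hash_function(key)]:
        if k == key:
            return v
    return None

# Hypothetical employee records keyed by ID.
for emp_id, name in [(1001, "Ravi"), (1002, "Mei"), (1005, "Omar"), (1010, "Lena")]:
    insert(emp_id, name)
```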
There are primarily three methods of indexing:
• Clustered Indexing
• Non-Clustered or Secondary Indexing
• Multilevel Indexing

1. Clustered Indexing
• When two or more related records are stored together in the same file, this type of storage is known as clustered indexing. Clustered indexing reduces the cost of searching because multiple records related to the same thing are stored in one place, and it also supports frequent joins of two or more tables (records).
• A clustering index is defined on an ordered data file, where the data file is ordered on a non-key field. In some cases, the index is created on non-primary-key columns, which may not be unique for each record. In such cases, to identify the records faster, we group two or more columns together to get unique values and create an index out of them. This method is known as the clustering index. Basically, records with similar characteristics are grouped together, and indexes are created for these groups.
• For example, students studying in each semester are grouped together, i.e. 1st semester students, 2nd semester students, 3rd semester students, etc. are grouped.
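The semester example can be sketched as a clustering index in Python, assuming a hypothetical student file physically ordered on the non-key field `semester` and stored two records per block. The index has one entry per distinct semester, pointing to the first block that contains that group.

```python
# Hypothetical student file, physically ordered on the non-key field "semester".
students = [(1, "Ana"), (1, "Raj"), (1, "Tom"), (2, "Lia"), (2, "Sam"), (3, "Kim")]
BLOCK_SIZE = 2
blocks = [students[i:i + BLOCK_SIZE] for i in range(0, len(students), BLOCK_SIZE)]

# Clustering index: one entry per distinct semester -> first block holding that group.
clustering_index = {}
for block_no, block in enumerate(blocks):
    for semester, _ in block:
        clustering_index.setdefault(semester, block_no)

def students_in(semester):
    """Jump to the group's first block, then read forward until the group ends."""
    if semester not in clustering_index:
        return []
    names = []
    for block in blocks[clustering_index[semester]:]:
        for sem, name in block:
            if sem == semester:
                names.append(name)
            elif sem > semester:
                return names  # file is ordered, so the group has ended
    return names
```

Note that the 2nd-semester group starts partway through block 1, so the lookup skips the leftover 1st-semester record in that block before collecting names.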

2. Non-clustered or Secondary Indexing
• A non-clustered index only tells us where the data lies, i.e. it gives us a list of virtual pointers or references to the locations where the data is actually stored. The data is not physically stored in the order of the index; instead, the leaf nodes of the index hold pointers to the data. For example, consider the contents page of a book: each entry gives the page number (location) of the information. The actual data (the information on each page) is not organized by the contents page, but we have an ordered reference to where the data lies. A non-clustered index can only be dense; a sparse ordering is not possible because the data is not physically organized in index order.
• It is slower than a clustered index because extra work is needed to extract the data by following the pointer. With a clustered index, the data is stored directly in index order.
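A sketch of a secondary (non-clustered) index, assuming a hypothetical heap file of user records that is not ordered by email. Because the data is unordered on that field, the index must be dense, and each lookup takes the extra step of following a pointer into the file.

```python
import bisect

# Hypothetical heap file: records in insertion order, NOT sorted by email.
heap = [("u3", "zoe@x.org"), ("u1", "amir@x.org"), ("u2", "maya@x.org")]

# Secondary index on email: one entry per record (dense), sorted by email,
# each holding a pointer (position) into the heap file.
secondary = sorted((email, pos) for pos, (_, email) in enumerate(heap))
emails = [email for email, _ in secondary]

def find_by_email(email):
    i = bisect.bisect_left(emails, email)
    if i < len(emails) and emails[i] == email:
        pos = secondary[i][1]  # extra step: follow the pointer into the data file
        return heap[pos]
    return None
```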

3. Multilevel Indexing
With the growth of the size of the database, indices also grow. The index is kept in main memory, so a single-level index might become too large to store there, forcing multiple disk accesses. Multilevel indexing splits the index into smaller blocks so that the outermost level fits in a single block. The outer blocks point to inner blocks, which in turn point to the data blocks. The outermost level can easily be stored in main memory with little overhead.
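A two-level index sketch in Python, assuming hypothetical integer keys and index blocks that hold four entries each (`FANOUT`). The small outer index stores the first key of each inner index block and is searched in memory; only one inner block then needs to be read to find the data-block pointer. The key-to-data-block mapping (`key // 10`) is arbitrary and only for illustration.

```python
import bisect

FANOUT = 4  # entries per index block (assumed)

# Inner (level-1) index: sorted (key, data_block_no) entries split into blocks.
# Hypothetical mapping: key k lives in data block k // 10.
inner_entries = [(k, k // 10) for k in range(0, 160, 5)]  # 32 entries
inner_blocks = [inner_entries[i:i + FANOUT] for i in range(0, len(inner_entries), FANOUT)]

# Outer (level-2) index: first key of each inner block; small enough to keep in memory.
outer_keys = [block[0][0] for block in inner_blocks]

def multilevel_lookup(key):
    """Return the data block number for key, or None if the key is absent."""
    i = bisect.bisect_right(outer_keys, key) - 1  # search the in-memory outer index
    if i < 0:
        return None
    inner = inner_blocks[i]  # read a single inner index block
    keys = [k for k, _ in inner]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return inner[j][1]
    return None
```

With 32 inner entries the outer index has only 8 entries, so finding any key costs one in-memory search plus one inner-block read.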

Citation Databases
• Citation databases are databases that have been developed for evaluating
publications. The citation databases enable you to count citations and check, for
example, which articles or journals are the most cited ones.

What is a citation database?
• Citation databases are collections of referenced papers, articles, books and other material entered into an online system (database) in a structured and consistent way. All the information relating to a single document (author, title, publication details, abstract, and perhaps the full text) makes up the ‘record’ for that document. Each of these items of information becomes a separate ‘field’ in that record and enables the document to be retrieved via any of these items, or by keywords.

Why use a citation database?
• A citation database allows you to access published, peer-reviewed, high-quality material such as journal articles, research reports, systematic reviews, conference proceedings, editorials, and related works. When a document is first entered into a database, it is analysed for its key subjects, and descriptors (MeSH terms in MEDLINE, PubMed, etc.) are assigned to it. MeSH (Medical Subject Headings) is a controlled-vocabulary thesaurus used for indexing and cataloguing articles for medical and biomedical purposes. MeSH terms allow precise searching because the databases search for these specific terms in a hierarchical order.
• Searches can then be limited, for example, by author or title fields or by year(s) of publication, and keywords can be focused and searched separately. Searches undertaken in citation databases are therefore more precise and comprehensive than searches on general internet search engines, and the results are of consistently higher quality and reliability.
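The record-and-field structure and field-limited searching described above can be sketched in Python. The records are invented for illustration (not real publications), and `search` is a hypothetical helper, not the query interface of any actual citation database.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Each item of information about one document is a separate field.
    author: str
    title: str
    year: int
    journal: str
    abstract: str

# Invented records for illustration only (not real publications).
db = [
    Record("Rao, K.", "Indexing strategies for large tables", 2019, "J. Data Eng.", "Compares B-tree and hash indexes."),
    Record("Lund, P.", "Citation counts in library science", 2021, "Lib. Quarterly", "Trends in citation analysis."),
    Record("Rao, K.", "Sparse indexes revisited", 2022, "J. Data Eng.", "Space versus lookup time."),
]

def search(records, year_from=None, **fields):
    """Return records whose named fields contain the given text (case-insensitive),
    optionally limited to documents published in or after year_from."""
    hits = []
    for r in records:
        if year_from is not None and r.year < year_from:
            continue
        if all(text.lower() in str(getattr(r, name)).lower() for name, text in fields.items()):
            hits.append(r)
    return hits
```

For example, `search(db, author="Rao", year_from=2020)` limits an author search by year of publication and returns only "Sparse indexes revisited".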

Why not just use Google or Google Scholar?
• Searching on Google, or similar internet search engines, will return at least a few sources on almost any topic, but
finding high-quality, reliable, and the most relevant sources is less likely. Google is an internet search engine that returns
and ranks results on the ‘basis of popularity’ with no filters to remove bias or unreliable information. Results returned
are from ‘all internet material’. Many peer-reviewed scholarly articles do not appear on open websites because of subscription requirements, so Google cannot search them. Internet searches also return many results of dubious quality.
• By comparison, Google Scholar provides a simple way to ‘broadly search for scholarly literature’. It searches across many
disciplines and a variety of sources. It ranks a document by where it was published, who it was written by, as well as how
often and how recently it has been cited in other scholarly literature.
• If you were to use only Google or Google Scholar, significant articles would not be retrieved due to the sorting, vocabulary, and subscription limitations of these search engines. However, they can sometimes help to find the full text of an article, and occasionally retrieve useful information not found in a database search.

• The most important multidisciplinary databases containing citation information are Web of Science (WoS), originally by Thomson Reuters and now owned by Clarivate, and Scopus by Elsevier. Citations can also be retrieved from Google Scholar (GS), keeping in mind the limitations of that database.
| Feature | Web of Science | Scopus | Google Scholar |
|---|---|---|---|
| Availability | Subscription-based | Subscription-based | Freely accessible |
| Number of journals | 22,000 peer-reviewed journals | 23,500 peer-reviewed journals | Information not publicly available |
| Other content | Conference proceedings, books | Conference proceedings, professional magazines, patents and book series | Books, preprints, theses and dissertations, and webpages |
| Main disciplines | Natural Sciences, Technology, Social Sciences, Fine Arts and Humanities | Physics, Technology, Health Sciences, Biosciences, Fine Arts and Humanities, Social Sciences | Information not publicly available |
| Time span | From 1900 (Science), 1956 (Social Sciences) and 1975 (Arts and Humanities) | Accessible records back to 1788 | Information not publicly available |
| Updates | Weekly | Daily | Information not publicly available, but roughly monthly |
| Collection policy | Public | Public | Information not publicly available; contracts with the most significant publishing houses |
| Citation analysis | Citation Report tool | View Citation Overview tool | Search results with a 'Cited by' link, listing all publications that cite the publication in question |
| Time span of citation information | From 1900 (Science), 1956 (Social Sciences) and 1975 (Arts and Humanities); citation statistics available at Oulu University Library for the whole period, but the citing articles only from 1975 | Cited references dating back to 1970 | Information not publicly available |
| Feature | Web of Science | Scopus | Google Scholar |
|---|---|---|---|
| Indicators | Journal Citation Reports: Article Influence (AI), Eigenfactor, h-index, Immediacy Index, Impact Factor (IF) | h-index, Raw Impact per Publication (RIP), SCImago Journal Rank (SJR), Source Normalized Impact per Paper (SNIP), Field-Weighted Citation Impact | h-index |
| Tools | Journal Citation Reports, Eigenfactor, ScienceWatch | SciVal, SCImago Journal and Country Rank, CWTS Journal Indicators | Publish or Perish |
| University rankings | Shanghai Ranking, also known as the Academic Ranking of World Universities (ARWU); National Taiwan University Ranking (NTU); University Ranking by Academic Performance (URAP); U.S. News & World Report's Best Global Universities Rankings; CWTS Leiden Ranking; U-Multirank; Review of the state of scientific research in Finland by the Academy of Finland | Times Higher Education World University Rankings; QS World University Rankings; Webometrics | Webometrics |
| Researcher profile | ResearcherID | Scopus Author Identifier (also Scopus Affiliation Identifier) | Google Scholar Profile |

What are research metrics?
• Research metrics are quantitative tools used to help assess the quality and impact of research outputs. Metrics are
available for use at the journal, article, and even researcher level. However, any one metric only tells a part of the story
and each metric also has its limitations. Therefore, a single metric should never be considered in isolation.
• For a long time, the only tool for assessing journal performance was the Impact Factor – more on that in a moment.
Now there are a range of different research metrics available, from the Impact Factor to altmetrics, h-index, and more.
• But what do they all mean? How is each metric calculated? Which research metrics are the most relevant to your
journal? And how can you use these tools to monitor your journal’s performance?
• Keep reading for a more in-depth look at the range of different metrics available.

How to identify the right metrics for your journal
Who is your target audience?
For journals with a practitioner focus, academic citations may be less valuable than mentions in policy documents (as
reported by Altmetric). If your journal is for a purely academic audience, traditional citation metrics like Impact Factor are
more relevant. If your journal has a regional focus, then geographical usage might be important to you.
What are you trying to achieve?
If your objective is to publish more work by high-quality, high-impact authors, consider analyzing the h-indices of authors in recent
volumes to assess whether you’re achieving this. If your aim is to raise your journal’s profile within the wider community, it
makes sense to consider altmetrics in your analysis. Perhaps your goal is to generate more citations from high-profile
journals within your field – so looking at Eigenfactor rather than Impact Factor would be helpful.
What subject area are you working in?
The relevancy of different research metrics varies hugely between disciplines. Is Impact Factor appropriate, or would the 5-
year Impact Factor be more representative of citation patterns in your field? Which metrics are your competitors using? It
might be more useful to think about your journal’s ranking within its subject area, rather than considering specific metrics in
isolation.
What business model does your journal use?
For journals following a traditional subscription model, usage can be particularly crucial. It’s a key consideration for
librarians when it comes to renewals.

How to interpret research metrics
It’s tempting to reach for simple numbers and extrapolate meaning, but be careful about reading too closely into metrics. The best strategy is to
see metrics as generating questions, rather than answers.
Metrics simply tell us “what”. What is the number of views of the work? What is the number of downloads from the journal? What is the
number of citations?
To interpret your metrics effectively, think less about “what” and use your metrics as a starting point to delve deeper into “who”, “how”, and
“why”:
• Who is reading the journal? Where are they based, what is their role, how are they accessing it?
• Who are the key authors in your subject area? Where are they publishing now?
• How are users responding to your content? Are they citing it in journals, mentioning it in policy documents, talking about it on Twitter?
• How is your subject area developing? What are the hot topics, emerging fields, and key conversations?
• Why was a specific article successful? What made the media pick up on it, what prompted citations from other journals, who was talking about
it?
It’s easy to damage the overall picture of your research metrics by focusing too much on one specific metric. For example, if you wanted to boost
your Impact Factor by publishing more highly-cited articles, you might be disregarding low-cited articles used extensively by your readers.
Therefore, if you chose to publish only highly-cited content for a higher Impact Factor, you could lose the value of your journal for a particular
segment of your readership.
Generally, the content most used by practitioners, educators, or students (who don’t traditionally publish) is not going to improve your Impact
Factor, but will probably add value in other ways to your community.
Fundamentally, it’s important to consider a range of research metrics when monitoring your journal’s performance. It can be tempting to
concentrate on one metric, like the Impact Factor, but citations are not the be-all and end-all.
Think about each research metric as a single tile in a mosaic: you need to piece them all together to see the bigger picture of journal performance.

i10-Index
• The i10-index is a metric designed and developed by Google Scholar and used in Google’s My Citations feature. It is defined as follows:
i10-index = the number of publications with at least 10 citations
• The i10-index therefore tells you how many of your published articles have each been cited at least 10 times by other works. An article cited in 10 or more other research articles counts toward this measure of article quality and impact.
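Both the i10-index and the h-index (listed among the indicators in the comparison table) can be computed from a list of per-paper citation counts. The h-index is the largest h such that h publications each have at least h citations. The citation counts below are hypothetical.

```python
def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)

def h_index(citations):
    """Largest h such that h publications each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

papers = [25, 12, 10, 8, 3, 0]  # hypothetical citation counts per paper
```

For these counts, `i10_index(papers)` returns 3 (the papers with 25, 12 and 10 citations) and `h_index(papers)` returns 4 (four papers have at least 4 citations, but not five with at least 5).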

• Citations take a long time to appear in meaningful quantities.
• Citation metrics are dependent on the corpus that is used for calculation.
• A single indicator is not sufficient to assess impact.

Elsevier promotes the responsible use of research metrics encapsulated in two
‘golden rules’. Those are:
• Always use both qualitative and quantitative input for decisions (i.e. expert opinion
alongside metrics)
• Always use more than one research metric as the quantitative input.
Using multiple complementary metrics can help to provide a more complete picture
and reflect different aspects of research productivity and impact in the final
assessment.

CiteScore metrics
• CiteScore metrics are a suite of indicators calculated from data in Scopus, the
world’s leading abstract and citation database of peer-reviewed literature.
• In the definition given here, CiteScore for a year is the number of citations received in that year by documents published in the previous three years, divided by the number of documents published in those same three years. (Since 2020, Scopus has used a four-year window, counting citations received during the same four years in which the documents were published.)
• CiteScore is calculated for the current year on a monthly basis until it is fixed as
a permanent value in May the following year, permitting a real-time view on
how the metric builds as citations accrue.
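A CiteScore-style calculation in Python, using hypothetical counts for one journal and a three-year window. The same ratio applies to a four-year window; only the years included change. Scopus also restricts which document types are counted, which this sketch ignores.

```python
def citescore(citations, documents):
    """
    Simplified CiteScore: total citations divided by total documents over a window.
    citations: citations counted for the documents published in each window year.
    documents: number of documents published in each window year.
    """
    if sum(documents) == 0:
        raise ValueError("no documents in the window")
    return sum(citations) / sum(documents)

# Hypothetical journal; census year 2019, window 2016-2018 (three-year definition).
score = citescore(citations=[120, 150, 90], documents=[60, 70, 50])
```

Here 360 citations over 180 documents gives a score of 2.0.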

SCImago Journal Rank (SJR)
• SCImago Journal Rank (SJR) is based on the concept of a transfer of prestige between journals via their citation links. It draws on an approach similar to Google’s PageRank algorithm, which assumes that important websites are linked to from other important websites.
• SJR weights each incoming citation to a journal by the SJR of the citing
journal, with a citation from a high-SJR source counting for more than a
citation from a low-SJR source. Like CiteScore, SJR accounts for journal size
by averaging across recent publications and is calculated annually.
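The prestige-transfer idea can be illustrated with a PageRank-style power iteration. This is a simplified sketch, not the official SJR formula (which adds, among other things, a three-year citation window and a limit on journal self-citations): each journal passes its prestige to the journals it cites in proportion to its outgoing citations, and the result is divided by the number of articles to account for journal size. The citation counts, article counts and damping factor of 0.85 are hypothetical.

```python
def prestige_scores(cites, damping=0.85, iterations=100):
    """
    cites[a][b] = number of citations from journal a to journal b.
    Each round, every journal passes its prestige to the journals it cites,
    in proportion to its outgoing citations (PageRank-style).
    """
    journals = list(cites)
    n = len(journals)
    score = {j: 1 / n for j in journals}
    for _ in range(iterations):
        new = {j: (1 - damping) / n for j in journals}
        for src in journals:
            out_total = sum(cites[src].values())
            if out_total == 0:
                continue  # a journal citing nothing passes on no prestige in this sketch
            for dst, count in cites[src].items():
                new[dst] += damping * score[src] * count / out_total
        score = new
    return score

# Hypothetical citation network between three journals.
cites = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 20},
    "C": {"A": 5, "B": 5},
}
articles = {"A": 50, "B": 20, "C": 40}  # hypothetical journal sizes

prestige = prestige_scores(cites)
per_article = {j: prestige[j] / articles[j] for j in prestige}
```

In this toy network journal A collects the most prestige overall, but B has the highest prestige per article once size is taken into account.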
• SJR is also powered by Scopus data and is freely available alongside
CiteScore at www.scopus.com/sources

Source Normalized Impact per Paper (SNIP)
• Source Normalized Impact per Paper (SNIP) is a sophisticated metric that intrinsically accounts for field-specific differences in citation practices. It does so by comparing each journal’s citations per publication with the citation potential of its field, where the field is defined as the set of publications citing that journal.
• SNIP therefore measures contextual citation impact and enables
direct comparison of journals in different subject fields, since the
value of a single citation is greater for journals in fields where
citations are less likely, and vice versa.
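A simplified SNIP-style ratio, assuming the field’s citation potential is measured as the mean number of references in the publications citing the journal, relative to a database-wide reference value (here a median). The numbers are hypothetical, and the official CWTS method has further details (for example, counting only references to recent, indexed publications) that this sketch omits.

```python
def snip_like(citations_per_paper, field_refs_per_paper, database_median_refs):
    """
    Simplified SNIP-style score: the journal's raw citations per paper divided
    by its field's relative citation potential.
    field_refs_per_paper: mean references per paper among publications citing the journal.
    database_median_refs: database-wide reference value used for normalisation.
    """
    relative_potential = field_refs_per_paper / database_median_refs
    return citations_per_paper / relative_potential

# Hypothetical journals: one in a low-citation field, one in a high-citation field.
math_journal = snip_like(1.5, 15, 30)
biology_journal = snip_like(6.0, 60, 30)
```

Although the biology journal’s raw citation rate is four times higher, both journals score 3.0 once each field’s citation habits are taken into account, which is the kind of cross-field comparison SNIP is designed to allow.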
• SNIP is calculated annually from Scopus data and is freely available
alongside CiteScore and SJR at www.scopus.com/sources.
