Feed aggregator

Remembering Thomas N. Burbridge, Superhero of Science and Medicine

Brought to Light Blog - Wed, 2016-10-19 08:00

The Society of American Archivists’ Science, Technology and Health Care roundtable recently launched a project titled Forgotten Superheroes of Science and Medicine to highlight “underrepresented and diverse persons and groups in collections of the history of science, technology and health care.” We’ll be contributing to this project by periodically posting to the blog regarding these heroes.

Thomas N. Burbridge, MD, PhD (1921-1972), was an African-American scientist, physician, and civil rights activist. He devoted his life to social justice and his work continues to impact UCSF and the larger San Francisco community.

Thomas N. Burbridge. Photograph collection, portraits.

Burbridge was born in New Orleans, Louisiana in 1921. He attended Talladega College and later joined the US Navy. After years of military service, he enrolled in the UCSF School of Medicine, earning his MD in 1948. He completed his residency at San Francisco General Hospital and then enrolled in a graduate program in the UCSF Department of Pharmacology.

While in graduate school, Burbridge helped lead UC’s efforts to support development of medical education at the University of Indonesia, following the Indonesian fight for independence. Burbridge worked with local students and officials in Jakarta from 1952-1955.

Thomas N. Burbridge with his wife and medical students at the University of Indonesia, 1952-1955. Photograph published in the Alumni-Faculty Association Bulletin of the UCSF School of Medicine, Winter 1956. University Publications.

After returning to the US, Burbridge joined the faculty of the UCSF School of Medicine in 1956, where he conducted research related to the pharmacology of alcohol and the metabolism of marijuana. As a teacher and scientist, Burbridge advocated for increased minority student enrollment at UCSF. In the 1960s, he led recruiting trips to predominantly black universities in the southern United States, speaking with students about opportunities in the health sciences. He also served as a leader of the San Francisco chapter of the NAACP and organized sit-ins of auto dealerships and other businesses in protest of their discriminatory employment practices. This non-violent, direct action strategy brought about equal employment opportunities for people of color in San Francisco.

Memorial of Dr. Thomas Burbridge. Published on the back cover of the October 1972 edition of the Black Bulletin, a newsletter created by UCSF’s Black Caucus. Black Caucus records, MSS 85-38.

Following Burbridge’s death in 1972, the UCSF Black Caucus petitioned Chancellor Philip Lee to name a Chancellor’s Award in his honor. Today, Burbridge’s legacy continues to inspire the UCSF community through the Thomas N. Burbridge Chancellor’s Award for Public Service.

UCSF Archives & Special Collections houses the Thomas Nathaniel Burbridge Papers, 1959-1972 and other related collections. Please make an appointment if you would like to research the material.

Categories: Brought to Light

“Create petitions for smokers to sign - and as they sign up...



“Create petitions for smokers to sign - and as they sign up add them to the database…This can’t look like a marketing campaign. It must look like a true corporate initiative.”

The “Additive Freedom = Smokers Freedom” campaign is from a 2004 brainstorming session for the Winston brand’s 50th anniversary. Ideas included “golden anniversary” promotions such as parties at the Playboy mansion, a Rolling Stones concert and film festival, as well as an evolution in corporate image to a brand that works for smokers’ rights. In honor of 50 years, Winston execs planned to create a grassroots smokers’ rights campaign that would really serve as a conduit for direct marketing to consumers.

Read more at the UCSF Truth Tobacco Industry Documents Library:

BRAINSTORMING SUMMARY (2004)
URL : https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/sfwl0187

Using the New RefWorks; Adding and Organizing Information

In Plain Sight - Mon, 2016-10-17 09:09

Importing information to the new RefWorks is ridiculously easy. We will refer to RefWorks as RW for the rest of this note.

Citing articles in Word is very similar to “Legacy” RW.

Now for some details.

PubMed

Once you have identified the articles from PubMed that you wish to save to RW, click on the Save to RefWorks bookmarklet. You will see a list of the articles on your PubMed page. Check off the ones you want, then click Save to RefWorks at the bottom of the page. See image below. That’s it!

Most other databases (e.g., Web of Science, PsycINFO, Sociological Abstracts, etc.) work the same way.

An exception is Embase. Choose the articles you want from Embase, select Export (see image below). 

Next, choose Direct Export to RefWorks (see image below) and download a small file to your computer. Then import that file into RefWorks by clicking the + icon on the top left bar of the RW page.

For PDFs, you can drag and drop a PDF onto RW, and it will find the citation information for you.

For websites, click on the Save to RefWorks bookmarklet. RefWorks, like all reference managers, has trouble finding the information it needs to create a citation from websites. Always check the information to make sure RW has it right.

For Google Scholar, set your preferences to RefWorks and you can import one article at a time.

   To do so (see the two images below and left):

  1. Go to scholar.google.com.

  2. Click on Settings

  3. Change Bibliography Manager to RefWorks

Finally, you may organize what you put in RW by checking the boxes of the articles you would like to place in a folder. Then click on the folder icon (red arrow below) and either create a new folder or add the checked articles to an existing folder.

The final installment of this series will cover using the new RW to add citations and references to Microsoft Word or Google Docs. –Whit

Categories: In Plain Sight

Archives Talk 10/21/16: Historical Medical Collections in the 21st Century

Brought to Light Blog - Tue, 2016-10-11 13:15

Date: Friday, October 21, 2016
Time: 12 pm – 1:15 pm
Lecturer: Jeffrey S. Reznick, PhD (NLM)
Location: Lange Room, 5th Floor, UCSF Library – Parnassus
530 Parnassus Ave, SF, CA 94143

This event is free and open to the public. Light refreshments will be provided.
REGISTRATION REQUIRED: http://calendars.library.ucsf.edu/event/2851245

 Join UCSF Archives & Special Collections for an afternoon talk with Jeffrey S. Reznick, PhD, chief of the History of Medicine of the National Library of Medicine, the world’s largest biomedical library, located on the Bethesda, Maryland, campus of the National Institutes of Health.
In this talk, Reznick will offer an overview of the division, its current partnerships and programs, and its future plans as he and his colleagues embrace the future as stewards of the past, as the NLM itself anticipates its third century under the leadership of Patricia Flatley Brennan, PhD, RN.

Jeffrey S. Reznick, PhD, chief of the History of Medicine at the National Library of Medicine

Reznick joined the NLM in 2009 following his tenure as director of the Institute for the Study of Occupation and Health of the American Occupational Therapy Foundation. Dr. Reznick’s record of scholarly historical research is as extensive as his executive career in the national nonprofit sector. As a social and cultural historian of medicine and war, he maintains an active research portfolio supported by the Intramural Research Program of the National Institutes of Health, and he is the author of two books, both published by Manchester University Press in its Cultural History of Modern War series, as well as numerous book reviews, articles for the popular press, and entries in major reference works.

About the UCSF Archives & Special Collections Lecture Series
UCSF Archives & Special Collections launched this lecture series to introduce a wider community to treasures and collections from its holdings, to provide an opportunity for researchers to discuss how they use this material, and to celebrate clinicians, scientists, and health care professionals who donated their papers to the archives.

Categories: Brought to Light

Data Munging Addendum: The Long Way To Handle Comma Delimited Lists

CKM Blog - Mon, 2016-10-10 08:17

In an earlier post, we discussed the issue of comma delimited lists within an Excel spreadsheet. This way of representing one-to-many relationships in data can make it more difficult to build lookup tables, run queries, and do other types of analysis. Although there are some concise coding approaches, both in SQL and pandas, sometimes you just want to [give up on trying to be clever and] reconstruct your data frame line by line. Here’s a quick overview of how to do this.

You can follow along, cut and paste into your own notebook, or view/checkout the code from github.

Let’s go ahead and build a pandas dataframe with comma delimited information in some of the cells.

import pandas as pd
import numpy as np

First, we’ll create three lists

ar1 = [1,2,3,4,5]
ar2 = [2,4,6,8,10]
ar3 = ['one','two,three,four','three,four,five','four,five','five']

Next, add the lists as columns to a dataframe

df = pd.DataFrame({'A' : ar1, 'B' : ar2, 'C' : ar3})

And, of course, if you query it through pandasql, you’ll get a single row with a comma delimited list

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
pysqldf("SELECT * FROM df WHERE A = 3")

Like last time, we want to have each value for C on a separate row, with corresponding values for A and B.

Here’s how to do this the computationally long and expensive way (which is, in fact, sometimes the way you have to do things – sometimes because performance doesn’t matter and you’re tired of trying to be clever, sometimes because the logic is so intricate that you have to knit it all together line by line anyway).

We’ll create three new arrays (again, to hold the column values).

a0 = [] a1 = [] a2 = []

Next, we’ll loop through each row of our existing dataframe. We’ll split on the third column (we start counting from 0, so that will be at index 2). Splitting on the comma will create a new array with one string for each comma delimited value in that cell. We’ll add a new value to each column for each string (word) in that row.

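# Positions 0, 1 and 2 are columns A, B and C; .iloc makes the positional lookup explicit.
for index, row in df.iterrows():
    for s in row.iloc[2].split(','):
        a0.append(row.iloc[0])
        a1.append(row.iloc[1])
        a2.append(s)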

Now, let’s create a new data frame out of the three columns we just created

ndf = pd.DataFrame({'A' : a0, 'B' : a1, 'C' : a2})

Take a look

ndf

We can now query both dataframes using pandasql

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

Querying our original dataframe, you’ll see everything on one comma delimited line

pysqldf("SELECT * FROM df WHERE A = 3")

Querying our new dataframe, you’ll get a separate line for each entry

pysqldf("SELECT * FROM ndf WHERE A = 3")

It’s the long way to do this, but sometimes the long way can save you a lot of time.
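For comparison, here is a minimal sketch of one of the concise pandas approaches mentioned at the top of this post. It is not the method used above: it assumes a pandas version that provides DataFrame.explode (0.25 or later), and the name ndf_concise is only illustrative.

```python
import pandas as pd

# Same toy data as the dataframe built earlier in this post.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": ["one", "two,three,four", "three,four,five", "four,five", "five"],
})

# Split each cell of C into a list, then give every list element its own row.
# The values of A and B are repeated for each new row.
ndf_concise = (
    df.assign(C=df["C"].str.split(","))
      .explode("C")
      .reset_index(drop=True)
)

# Rows where A is 3: one row per word instead of one comma delimited cell.
rows_for_3 = ndf_concise[ndf_concise["A"] == 3]
print(rows_for_3)
```

The result should have the same rows as ndf, so either dataframe can be queried with pandasql in the same way.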

Categories: CKM

Pfizer’s Neurontin/gabapentin clinical trials - A clear and...


https://www.industrydocumentslibrary.ucsf.edu/docs/#id=jmjm0223


https://www.industrydocumentslibrary.ucsf.edu/docs/#id=nfjm0223

Pfizer’s Neurontin/gabapentin clinical trials - A clear and deliberate pattern of reporting bias, ghostwriting and “spin”

Johns Hopkins’ Kay Dickersin, in her 2008 expert report for the Court, reviewed the internal company documents and concluded that Pfizer conducted a marketing and publication strategy that suppressed and misrepresented negative findings.  

This September 29, 2000, email thread from Pfizer (top image) and the subsequent email 2 years later (bottom image) demonstrate this reporting bias as well as a use of ghostwriters that pervades Parke-Davis/Pfizer’s clinical trials and subsequent marketing of its drug, Neurontin:

“I think that we can limit the potential downsides of the 224 study by delaying the publication for as long as possible and also from where it is published. More importantly it will be more important to how WE write up the study.  We are using a medical agency to put the paper together which we will show to Dr. Reckless.  We are not allowing him to write it up himself.” pg.1 

Then in 2002 - “By the way, Christine, from a MKT point of view we are not interested at all in having this paper published because it is negative!!!”

Study 224 was a randomized, double-blind placebo-controlled study looking at the use of gabapentin/Neurontin for painful diabetic neuropathy.  The lead on the study was Dr. Reckless (not a pseudonym!). Industry documents show him pushing for publication of his study #224 after it had concluded but Pfizer attempted to first suppress and then delay publication in journals because it felt the study lacked positive results. In the end they conclude they may publish but only if Dr. Reckless does not write the article himself.  

Read the rest of the email threads at:
“Re: 25 and 26”, 2000
“Re: Reckless Contact Information”, 2002

Find over 1,153 documents in the Neurontin Litigation Documents collection at the UCSF Drug Industry Documents Archive.

Article Spotlight: Industry efforts to shape understanding of tobacco-attributable deforestation

Industry Documents Library - Mon, 2016-10-03 12:24

Every month, we highlight a newly published article along with a few key industry documents used by the author(s):

Lee K, Carrillo Botero N, Novotny T. ‘Manage and mitigate punitive regulatory measures, enhance the corporate image, influence public policy’: industry efforts to shape understanding of tobacco-attributable deforestation. Global Health. 2016 Sep 20;12(1):55-016-0192-6.

The percentage of deforestation caused by tobacco farming reached 4% globally by the early 2000s but was substantially higher in countries such as China (18 %), Zimbabwe (20 %), Malawi (26 %) and Bangladesh (>30 %). Transnational tobacco companies (TTCs) have argued that tobacco-attributable deforestation is not a serious problem, and that the industry has addressed the issue through corporate social responsibility (CSR) initiatives such as reforestation. The authors reviewed the tobacco industry documents as well as the existing literature on tobacco and deforestation in order to understand how the industry framed this issue and sought to undermine economic policy: by emphasizing the benefits of production in low and middle income countries, by blaming alternative causes of deforestation, and claiming successful forestation efforts on their part.

Key Documents from the UCSF Truth Tobacco Industry Documents:

  • The WHO reported woodfuel curing requires one tree per 300 cigarettes. To counter these concerns, the industry initiated a “pro-active strategy” against “WHO’s propagandist views” focusing on “common interests” between the industry and farmers and claiming economic solidarity with tobacco farmers in the developing world.
    https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=xhlh0196
    https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=kghy0085
  • The ITGA (International Tobacco Growers Association) published an editorial in its in-house journal, Tobacco Forum, which claimed there were many other industries responsible for this deforestation. It stated: “A lot of nonsense is promulgated about the use of wood by tobacco farmers. Typical of such misinformation, an article published in the UN Department of Information’s ‘Development Forum’…claimed that ‘perhaps one out of every eight trees worldwide is used for curing tobacco’. The fact is that the tobacco industry as a whole accounts for significantly less than 1 % of all wood consumed in the developing world, not all of which is used for curing. The tobacco industry is only one of many industries which use wood as fuel.”
    https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=zjkn0198
  • The industry rejected the idea that transnational tobacco company activities in developing nations were to blame for deforestation and instead blamed the lack of government action: “Where Third World governments have generally encouraged the development of tobacco, their forestry departments have often been slow to recognize the need for reforestation. Tobacco companies have, therefore, taken the initiative, encouraging farmers to plant trees either individually or on a cooperative basis, even providing free seedlings for both depleted forestland and new land…”
    https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=yrfv0037
  • The Framework Convention on Tobacco Control’s Alliance Bulletin in 2001 reports:
    “In Uganda, BAT has been planting the fast growing eucalyptus trees to replace depleted indigenous species like the shea butter tree whose oil is used in cooking in many parts of Northern Uganda. The eucalyptus is an anti-social thirsty tree. Its fast growth rate places a great demand on the soil water and nutrients, while its fallen leaves contain chemicals that discourage the growth of other vegetation near the tree”
    https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=ltlj0054

New Documents Posted

Industry Documents Library - Fri, 2016-09-30 15:01

238 new documents have been posted to the Industry Documents Library.

This includes:

A research study for BAT focuses on the underlying motives and...





A research study for BAT focuses on the underlying motives and attitudes associated with the smoking habits and practices of children and adolescents.

A 1993 study by Hugh Bain Research for the British American Tobacco company specifically focuses on what motivated a focus group of children and adolescents to start smoking. This information was gathered in order to “develop a more sophisticated understanding of the repertoire of consumer benefits which can meaningfully be attached to cigarette smoking in advertising for BAT brands.”

Read the rest of The Psychology of Significant Moments and Peak Experiences in Cigarette Smoking: The Motivations and Semiological Significances of Smoking: Qualitative Research Report.

Author: Hugh Bain Research

Document Date: November 1993

Find over 1,600,000 documents in the British American Tobacco collection at the UCSF Industry Documents Library.

Archives Month – October 2016

Brought to Light Blog - Thu, 2016-09-29 08:38

October is Archives Month! Along with archives from across the country, we’re celebrating the value of historical records and the preservation of the past.

We have special events planned on Wednesday, October 5. Visit us in the Library 5th floor Reading Room from 12noon-1pm to view historical collections, tour library exhibits, and meet archives staff. Also, tweet your questions all day @ucsf_archives using #AskAnArchivist. RSVP preferred for the open house – sign up here.

Categories: Brought to Light

New items to be digitized from the Eric L. Berne papers

Brought to Light Blog - Wed, 2016-09-28 10:07

Thanks to fundraising and donations from the International Transactional Analysis Association, we are embarking on another round of digitization of papers from the Eric L. Berne collections.

The papers selected for digitization in this round will range from early fiction writings to publisher correspondence, photographs, writings foundational to TA theory, materials documenting ITAA history and other items that continue to round out our understanding of Berne’s life, personality, intellectual process, and legacy.

Eric Berne with Fritz Perls

You can expect to find these items online on Calisphere, alongside previously digitized items under their respective collection numbers. As always, guides to the collections are available on the Online Archive of California. 

Categories: Brought to Light

Data Munging with Python, SQL, and Excel

CKM Blog - Tue, 2016-09-27 10:32

So much of data science is, in fact, data munging and wrangling, moving and transforming it from one format to another. Data, when you’re fortunate enough to know where to find it, is almost never in the nicely organized format you need for your analysis. This is one of the reasons python is so popular among data scientists – it is a good language and environment for collecting, formatting, parsing, combining and splitting data from different sources.

Data frequently arrives in comma delimited files or Excel spreadsheets. You can certainly do some analysis with spreadsheet operations, but odds are you’ll eventually want to load it into a data frame and use python (or R) for more meaningful analysis. This post is a write up of a few tips I learned from my recent efforts to wrangle some genomic data available on the web as Excel downloads. These spreadsheets presented a few common issues that arise when dealing with non-normalized data sets in single table format.

The exact nature of the data isn’t really the focus here, but for some context: researchers at UCSF often need information about a gene, variants of that gene, and the effect this gene has on responsiveness of different tumors or conditions to treatment. There are a number of different places to find this data, a number of different datasets, and (of course) varied ways to search, filter, or query those datasets. Searching and finding can be a long, error prone, irritating, manual process.

We’ll use Python, Pandas, and Jupyter Notebook to try to get a better handle on some of this data. I’m assuming you already know how to 1) open a Jupyter Notebook and issue basic Python commands, and 2) use pandasql to run SQL queries against a dataframe. (If you don’t, consider signing up for a Software Carpentry class or attending one of our Python/R workshops at Mission Bay.)

Otherwise, you can follow the jupyter and python installation documents (I used anaconda for both).

A full jupyter notebook for the code in this post is available on the ucsf-ckm github repository.

Create a DataFrame from an Excel Spreadsheet

We’ll use a spreadsheet from The Precision Medicine Knowledge Base. To follow along, click on the “Download All Interpretations (Excel)” link.

Before analyzing, let’s load the excel spreadsheet into a pandas DataFrame. Open up a jupyter notebook (or just the python interactive shell)  to start.

First off, remember to include the pandas module…

import pandas as pd

Pandas has an easy method to load a spreadsheet (I’m assuming the spreadsheet is in your working directory, otherwise you’ll need to edit the path)

ipm = pd.read_excel("IPM_Knowledgebase_Interpretations_Complete_20160913-2207.xlsx")

Now take a look at your newly populated dataframe

ipm

You’ll see that your spreadsheet headers and data have become the column names and rows of a pandas dataframe. Let’s try using pandasql to run a query on it. First, load the module and globals (more information on this)

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

And try running a query. Let’s get the variants that match a particular Gene.

pysqldf("SELECT Gene, [Tumor Type(s)], [Variant(s)] FROM ipm WHERE Gene = 'PIK3CA'")


Note: you must use the brackets around Tumor Type(s) so the white space and parentheses around (s) won’t be interpreted as SQL.
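To see the bracket quoting in isolation, here is a small sketch that uses Python’s built-in sqlite3 module instead of pandasql (pandasql runs its queries through SQLite, so the same quoting rules should apply). The table is a stand-in for the ipm dataframe and the row in it is made up for illustration.

```python
import sqlite3

# In-memory stand-in for the ipm dataframe; the single row is made-up example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipm (Gene TEXT, [Tumor Type(s)] TEXT)")
conn.execute("INSERT INTO ipm VALUES ('PIK3CA', 'Example Tumor Type')")

# The square brackets tell SQLite to treat everything inside them, including
# the space and the parentheses, as a single column name.
rows = conn.execute(
    "SELECT Gene, [Tumor Type(s)] FROM ipm WHERE Gene = 'PIK3CA'"
).fetchall()
print(rows)
```

Without the brackets, SQLite would try to parse Tumor, Type and (s) as separate pieces of SQL and the query would fail.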

Even without any additional data munging, you’re in a pretty good spot for analyzing your data. You have it in a dataframe, where you can run queries, python expressions, and pandas operations on it. However, there are a few issues, common to spreadsheets, that may make this data harder to work with and analyse.

Relational Databases frequently have “one to many” relationships. In this case, a Gene has a one to many relationship with Tumor Types, Variants, and Citations. Each Gene has an effect on multiple Tumor Types, each Gene can have multiple Variants, and the effect a Gene and Variant has on a Tumor Type can have multiple Citations.

This spreadsheet stores the data for one to many relationships in two different ways. For Genes to Tumor Types and Variants, the spreadsheet provides a comma delimited list in a single cell. For Citations, the spreadsheet tacks on a varying number of columns to the right side of the spreadsheet. Although this does provide the data, it can make the data harder to work with.

The next two sections will review techniques for converting comma delimited lists and multiple columns into a one-to-many lookup table.

Common Issue #1 – Comma Delimited Lists

You may notice that although the Genes are provided as single (atomic) values, other fields, such as Variants or Tumor Types are provided as a comma delimited list. This won’t be much of a problem if you want to find all Tumor Types associated with a Gene – the query is straightforward:

pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE Gene = 'CSF3R'")

You’ll get back a single row with a comma delimited list of Tumor Types, rather than a separate row for each Tumor Type, but you can parse that relatively easily.

Now, suppose you wanted to reverse this query, to find all genes that match a particular Tumor Type. In this case, a query like the one above won’t work, as it will miss fields that have multiple Tumor Types separated by commas. Because SQL will look for an exact match, you won’t get all the results for a query like this.

pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE [Tumor Type(s)] = 'Diffuse Large B Cell Lymphoma'")

Note that you only received a single row from this query, even though there are multiple records that match this tumor type.  SQL does provide a way to find sub-patterns in a text field. You can get all records (sort of/kind of/hand waving) with a LIKE query

pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE [Tumor Type(s)] LIKE '%Diffuse Large B Cell Lymphoma%'")
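The difference between the exact match and the LIKE query can be reproduced on a tiny table. This is only a sketch: it uses the built-in sqlite3 module rather than pandasql, and the gene names and the second tumor type are placeholders, not values taken from the spreadsheet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipm (Gene TEXT, [Tumor Type(s)] TEXT)")
# Made-up rows: GENE_B lists the tumor type inside a comma delimited cell.
conn.executemany(
    "INSERT INTO ipm VALUES (?, ?)",
    [
        ("GENE_A", "Diffuse Large B Cell Lymphoma"),
        ("GENE_B", "Other Tumor Type, Diffuse Large B Cell Lymphoma"),
        ("GENE_C", "Other Tumor Type"),
    ],
)

# Exact match: only cells that contain nothing but this tumor type.
exact = conn.execute(
    "SELECT Gene FROM ipm WHERE [Tumor Type(s)] = 'Diffuse Large B Cell Lymphoma'"
).fetchall()

# LIKE with % wildcards: any cell that contains the tumor type somewhere.
partial = conn.execute(
    "SELECT Gene FROM ipm "
    "WHERE [Tumor Type(s)] LIKE '%Diffuse Large B Cell Lymphoma%' ORDER BY Gene"
).fetchall()

print(exact)
print(partial)
```

The exact match finds only GENE_A, while the LIKE query also picks up GENE_B, whose cell holds a comma delimited list.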

NOTE: you may not want the text truncated in your results. To handle this, set a pandas display option (recent pandas versions expect None here; older versions used -1):

pd.set_option('display.max_colwidth', None)

Although this works, you might want to split the comma delimited values into separate rows to create a Tumor_Type to Gene lookup table (perhaps to put it into first or higher normal forms https://en.wikipedia.org/wiki/First_normal_form). As always, there are a number of different ways to do this. You can certainly do this through SQL and temporary tables, but since we’re in python and can access this table as a dataframe, let’s try a python solution.

First, let’s get the Gene and Tumor Type as a dataframe

tumor_types = pysqldf("SELECT Gene, [Tumor Type(s)] as Tumor_Type FROM ipm")

Next, we’ll split the comma delimited tumor_types into separate rows.

gene_tumor_types = pd.DataFrame(tumor_types["Tumor_Type"].str.split(',').tolist(), index=tumor_types["Gene"]).stack()
gene_tumor_types = gene_tumor_types.reset_index()[[0, 'Gene']]
gene_tumor_types.columns = ['Tumor_Type', 'Gene']

See stack overflow for a nice discussion of this solution and other pandas dataframe based solutions.

Take a look at the resulting gene_tumor_types data frame.

gene_tumor_types

You now have an association from Gene to Tumor Type, with each tumor type as an individual row rather than as a comma delimited list. To get the Genes associated with a particular Tumor Type, we no longer need a LIKE query.

pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type = 'Diffuse Large B Cell Lymphoma'")

Wonderful! Except that… not so wonderful, it didn’t work – we’re missing data! There’s a big gotcha here. Compare the earlier LIKE query and this one. As an exercise, you might want to stop and try to figure out why (answer is in the next paragraph).

Common Issue # 2: Leading or Trailing White Space

This takes us to another common pitfall – white space! Don’t forget, an equals operator in SQL (and programming languages in general) is an exact match. “ Hello” and “Hello” do not match!

Take a look at the dataframe for gene_tumor_types – you’ll notice many of the Tumor Types have leading whitespace. This prevents the exact match from occurring. You can still find them through a LIKE query, which matches them as a partial match:

pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type LIKE '%Diffuse Large B Cell Lymphoma'")

But that’s a hack and kind of defeats the purpose of creating a new lookup table. We should be able to get this through an equality operator. Let’s trim the whitespace from this column.

gene_tumor_types["Tumor_Type"] = gene_tumor_types["Tumor_Type"].str.strip()

And take a look at the list to see the whitespace has been removed

gene_tumor_types["Tumor_Type"]

Now retry the LIKE query and the exact match query – you’ll see that you are now retrieving all the rows.

pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type = 'Diffuse Large B Cell Lymphoma'")
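The whitespace gotcha is easy to reproduce on a couple of made-up cells. The sketch below assumes a pandas version with Series.explode (0.25 or later); the cell values and variable names are illustrative only.

```python
import pandas as pd

# Two made-up cells; the first is a comma delimited list with a space after the comma.
cells = pd.Series(["Other Tumor Type, Diffuse Large B Cell Lymphoma",
                   "Diffuse Large B Cell Lymphoma"])

# Splitting on ',' keeps the space, so one piece is ' Diffuse Large B Cell Lymphoma'.
pieces = cells.str.split(",").explode()
raw_matches = int((pieces == "Diffuse Large B Cell Lymphoma").sum())

# After stripping leading and trailing whitespace, both pieces match exactly.
clean = pieces.str.strip()
clean_matches = int((clean == "Diffuse Large B Cell Lymphoma").sum())

print(raw_matches, clean_matches)
```

Before stripping, only the cell that held a single value matches; after stripping, both do.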

Common Issue # 3: Repeated Columns

Another common spreadsheet practice is to tack on a variable number of columns to store one-to-many data relationships. Take a look toward the end (right hand side) of the spreadsheet (or ipm dataframe)

ipm

Each row has one or more citations. This spreadsheet stores the one to many relationship by creating a new column for each additional citation.

Unfortunately, this does make it more difficult to query, since we need to know in advance how many Citations to query. Furthermore, the column headers that hold Citations beyond the first one don’t have names, making the query less informative.

For instance, not all rows have multiple citations. To get the citations for Gene JAK1, we’d need to write:

pysqldf("SELECT Gene, Citations, [Unnamed: 7], [Unnamed: 8], [Unnamed: 9], [Unnamed: 10], [Unnamed: 11], [Unnamed: 12], [Unnamed: 13], [Unnamed: 14] FROM ipm WHERE Gene = 'JAK1'")

This query will return all the citations for Gene “JAK1”. However, if you run this query against Gene “MPL”, you’ll receive a value of “None” for several columns. By contrast, if you run this query against Gene “MYD88”, you’ll miss a number of citations that extend out to “Unnamed: 26”.

It would be more convenient to be able to write a query like this:

pysqldf("SELECT Gene, Citation FROM ipm")

And receive a separate row for each citation.

Let’s create a lookup table for Gene and Citations. There are, as always, a number of different ways to accomplish this, through SQL or pandas. In this case, we’ll use SQL with a python loop to create a “UNION ALL” query.

query = "SELECT Gene, [Tumor Type(s)], [Variant(s)], Tier, Interpretations, Citations as Citation FROM ipm WHERE Citations != 'None'"
for i in range(7, 27):
    query += (" UNION ALL SELECT Gene, [Tumor Type(s)], [Variant(s)], Tier, Interpretations, [Unnamed: {val}] as Citation FROM ipm WHERE [Unnamed: {val}] != 'None' ".format(val=i))
query += ("ORDER BY Gene")

This approach uses a python loop to build a SQL statement. The UNION ALL statement combines the results of more than one query into a single output table. In this case, we are querying each Gene and Citation combination and outputting the results into separate rows. You can take a look at the full query (it’s long, so I’m not posting it here, just view the output of the “query” string in jupyter or interactive python).

Let’s look at the results of this table

gene_citations = pysqldf(query)

Let’s re-run the query for JAK1 and MPL.

pysqldf("SELECT Gene, Citation FROM gene_citations WHERE Gene = 'JAK1'")
pysqldf("SELECT Gene, Citation FROM gene_citations WHERE Gene = 'MPL'")

You are now able to get clean, multiple row results from the Python dataframe through a one-to-many table relation.
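As noted earlier, pandas offers other ways to build the same kind of lookup table. The sketch below is an alternative to the UNION ALL query, not the approach used in this post: it reshapes a small made-up dataframe with DataFrame.melt. The gene names, citation strings, and the name gene_citations_long are placeholders.

```python
import pandas as pd

# Made-up wide table: the first citation has a header, the extra ones do not,
# which mirrors the "Unnamed: N" columns pandas creates for the spreadsheet.
wide = pd.DataFrame({
    "Gene": ["GENE_A", "GENE_B"],
    "Citations": ["ref 1", "ref 4"],
    "Unnamed: 7": ["ref 2", None],
    "Unnamed: 8": ["ref 3", None],
})

# melt turns every non-Gene column into rows; dropna removes the empty cells.
gene_citations_long = (
    wide.melt(id_vars=["Gene"], value_name="Citation")
        .dropna(subset=["Citation"])
        .sort_values("Gene", kind="mergesort")  # stable sort keeps citation order per gene
        [["Gene", "Citation"]]
        .reset_index(drop=True)
)
print(gene_citations_long)
```

Unlike the loop over range(7, 27), this does not need to know in advance how many citation columns there are, though it assumes every column other than Gene holds a citation.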

We’ll tackle a few more data sources for responsiveness of various tumors and conditions to genes and variants in future posts. Stay tuned.

And, as always, if you’d like to learn more about Python or R, please consider signing up for a Software Carpentry class or attending one of our workshops at Mission Bay!

Categories: CKM

“I understand from Johnny, Jr. (Gus Wayne) that the idea of...

“I understand from Johnny, Jr. (Gus Wayne) that the idea of candy Philip Morris 4’s for children has already been mentioned.”

#TBT - Johnny Jr. was a popular marketing personality for Philip Morris in the late 1940s and 1950s.  Along with radio and TV appearances, Johnny Jr. the bellhop gave away samples of Philip Morris cigarettes and autographs at public appearances.

The 1953 memo above floats an idea that having Johnny Jr show up at leading department stores would be a good publicity stunt for PM. The author notes this may not sell cigarettes at the venue, “but it would create Philip Morris in the minds of our future smokers.  This would be a good gimmick to get women and children into the department stores and in return they would give Philip Morris good publicity in area.  I understand from Johnny, Jr. (Gus Wayne) that the idea of candy Philip Morris 4’s for children has already been mentioned.”

Title: Johnny, Jr. Operation
Author : Porterfield, Jack
Document Date : 1953 December 19
URL : https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/rmgj0045

Forgotten Super Heroes of Science and Medicine: Choh Hao Li

Brought to Light Blog - Thu, 2016-09-15 07:37

The Society of American Archivists’ Science, Technology and Health Care roundtable recently launched a project to highlight “underrepresented and diverse persons and groups in collections of the history of science, technology and health care.” The section is calling this endeavor the “Forgotten Super Heroes of Science and Medicine.” UCSF Archives & Special Collections will be contributing to this project by periodically posting to the blog regarding these heroes. This is our first installment.

Biochemist Choh Hao Li was among the first to synthesize human growth hormone and later discovered beta-endorphin. Born in 1913 in Guangzhou, China, Li graduated from the University of Nanjing before moving to the US to attend graduate school at UC Berkeley in 1935. Upon earning his Ph.D. in Organic Chemistry in 1938, Li began working on the UC Berkeley campus at the Institute of Experimental Biology with Herbert McLean Evans. In 1950, Li became the first director of the newly created Hormone Research Laboratory. He moved with the laboratory to UCSF in 1967, where he worked until his retirement in 1983. As an emeritus professor at UCSF, Li then established the Laboratory of Molecular Endocrinology, where he remained director until his death in 1987.

Dr. Li spent most of his career studying the functions of the pituitary gland, which is located at the base of the brain and controls many of the body’s functions. At the Institute of Experimental Biology, Li first began his attempts to isolate and identify the anterior pituitary hormones; he was eventually able to isolate and purify six of the eight known hormones secreted. It wasn’t until the early 1970s, when heading the Hormone Research Laboratory, that Li was able to actually synthesize human growth hormone. Later that decade, Li discovered beta-endorphin, a neuropeptide that acts as a painkiller. Before his retirement, Li was also able to synthesize insulin-like growth factor 1, a protein that mediates the effects of growth hormone. During his lifetime, Li published over 1100 scientific articles, received many awards, including the Albert Lasker Award for Basic Medical Research, and was nominated at least twice for the Nobel Prize.

The Choh Hao Li papers are open for research at UCSF Archives & Special Collections: http://www.oac.cdlib.org/findaid/ark:/13030/tf738nb543/

Categories: Brought to Light

“RJR can gain competitive advantage among military YAS (young...





“RJR can gain competitive advantage among military YAS (young adult smokers).”

In 1989, RJ Reynolds created a report that detailed their promotion of tobacco products to the military through the Military YAS Program. The program declared that the “Military YAS program is the downscale smoker equivalent to Marlboro’s college YAS program in 1960’s” and suggested that Camel had potential among white military YAS while Salem could grow among Black military YAS.

URL: https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=ssbb0048

Author: unknown

Document Date: 1989

See more documents from the Marketing to Military, Minorities, and the Gays Collection at the UCSF Industry Documents Library.

Memo from attorney to BAT R&D - “It is important that...





Memo from attorney to BAT R&D - “It is important that contact between the scientists should be routed through the lawyers…” 

It is “our desire to create a modus operandi to ensure that legal professional privilege is not lost. Because correspondence on the subject of Buerger’s disease exchanged between you and your colleagues in other companies might not be privileged, it is important that contact between the scientists should be routed through the lawyers. In addition, you should ensure that any internal memoranda written on the subject of Buerger’s disease in relation to the current investigations should be captioned ‘Privileged and Confidential’.”

Title: Buerger’s Disease
Author: Foyle, Andrew
Document Date: 1988 March 21
Collection: Ness Motley Law Firm Documents

https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=hyfp0042

This 1988 British American Tobacco (BAT) document on Buerger’s disease, a rare condition of the arteries and veins that occurs mostly in people who are also tobacco users, is just one of many examples of the tobacco companies’ practice of hiding information from public scrutiny using inappropriate privilege (attorney-client, etc.) claims in preparation for possible litigation.

LeGresley and Lee, in their 2016 study, note that BAT has asserted inappropriate privilege claims over 49% of the documents reviewed (n=63) for their paper.   

Upcoming Lecture: “Vaccination and Society Since the Sixties”

Brought to Light Blog - Wed, 2016-09-07 13:30

Date: Friday, September 30, 2016
Time: 12 pm – 1:15 pm
Lecturer: Elena Conis, PhD (UC Berkeley & UCSF)
Location: Lange Room, 5th Floor, UCSF Library – Parnassus
530 Parnassus Ave, SF, CA 94143

This event is free and open to the public. Light refreshments will be provided.
REGISTRATION REQUIRED: tiny.ucsf.edu/vaccination930

Join UCSF Archives & Special Collections for an afternoon talk with author Elena Conis as she discusses her book Vaccine Nation: America’s Changing Relationship with Immunization. A limited number of books will be available for purchase.

W. McD. Hammon with triplets participating in a polio study at the Hooper Foundation (UCSF)

The past fifty years have witnessed an enormous upsurge in vaccine use in the United States: American children now receive more vaccines than any previous generation, and laws requiring their immunization against a litany of diseases are standard. And yet, while vaccination rates have soared and cases of preventable infections have plummeted, an increasingly vocal cross-section of Americans have questioned the safety and necessity of vaccines. In this talk, Elena Conis explores the emergence of widespread acceptance – and rejection – of vaccines from the 1960s to the present, finding the origins of today’s vaccination controversies in historical debates over topics ranging from national security to body piercing to the role of women in contemporary society. Vaccine acceptance, she argues, has never been simply a scientific matter, but one profoundly shaped by our politics, economics, and culture.

Elena Conis, PhD

Elena Conis is a writer and historian of medicine, public health, and the environment. She is a member of the faculty of the Graduate School of Journalism at UC Berkeley and an affiliated faculty member of the Department of Anthropology, History, and Social Medicine at UCSF. Previously, she was a history professor and the Mellon Fellow in Health and Humanities at Emory University; the Cain Fellow at the Chemical Heritage Foundation; and an award-winning health columnist for the Los Angeles Times. Her first book, Vaccine Nation, won the Arthur J. Viseltear Award from the American Public Health Association and was named a Choice magazine outstanding title and a pick of the week by the journal Nature. She is currently working on a book on the history of the pesticide DDT. She holds a PhD in the history of health sciences from UCSF, masters degrees in journalism and public health from Berkeley, and a bachelors degree in biology from Columbia University.

About the UCSF Archives & Special Collections Lecture Series
UCSF Archives & Special Collections launched this lecture series to introduce a wider community to treasures and collections from its holdings, to provide an opportunity for researchers to discuss how they use this material, and to celebrate clinicians, scientists, and health care professionals who donated their papers to the archives.

Categories: Brought to Light

New Documents Posted for September

Industry Documents Library - Fri, 2016-09-02 09:08

Greetings!


934 new documents were added to the Truth Tobacco Industry Documents yesterday.



This includes:

  • 815 RJR documents
  • 30 Philip Morris documents
  • 89 Depositions and Trial Testimony (DATTA) documents


Happy Labor Day!

Cigarette brand promotional events frequently encourage alcohol...







Cigarette brand promotional events frequently encourage alcohol use. Why? Because it has been shown that linking cigarettes with alcohol reinforces the use of both substances and makes it harder to quit smoking…

In the mid-1980s, RJ Reynolds wanted to position Camel as “THE younger adult brand” and “reinforce the target prospect’s psychological desire to attain an image of being independent, adventurous and masculine”. The marketing campaign above pitches a certain masculine lifestyle coupled with alcohol as a way to get young adults to buy Camel. The promotional events involve “skill, dexterity, strength and other masculine qualities” such as T-Shirt contests (men and women); Beer Chug contests; Special drink promotions; video game competitions; decathlons involving beer and beach towels; six-pack ring pull; etc.

Read the entire document at the UCSF Industry Documents Library:

OBJECTIVES OF CAMEL FIELD MARKETING PROMOTIONS.
URL : https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/xhdh0083
Author : Unknown
Document Date : 1984 November 29
