Oddly Shaped Pegs

An inquiry into the Nature and Causes of Stuff

2010 Sloan Fellows

Once again this year, theorists and their friends were well represented among the Sloan Research Fellows. These are the names I recognize:

  • Joel Tropp
  • Jonathan Kelner
  • Constantinos Daskalakis
  • Amin Saberi
  • Brent Waters
  • Balázs Szegedy

Congratulations to all!

Written by adamdsmith

February 17, 2010 at 11:00 pm

Posted in Uncategorized

Collaborative tools?

Like most theorists I collaborate a lot with remote colleagues1. Recent technological developments like the telephone have made this much easier, but I am ready to move on to something even more advanced.

I have started to use this Internet thing. For example, I use cvs or subversion on most projects to track and synchronize files (see Suresh’s recent post for an explanation of why). I call people on Skype when the connection is good enough for video, and use a mix of email, IM and Wave2 to jot down notes in real time about the conversation.

One thing is missing in my collaborative online life, though, and that is a good white board. Ideally, I would like a live holographic image of my colleagues in my office, together with sound and a remotely controllable marker for my whiteboard. Until I write a paper with the doctor from Star Trek Voyager, I’m willing to settle for something less ambitious: an interactive white board on which several people can simultaneously draw, paste/drop in graphics, and write text and latex equations that get formatted in real time.

There are a few free web services and software packages out there that offer something along these lines:

  • Scriblink: this has a lot of what I want but a very clunky interface.
  • Imagination^3: better drawing interface, but no latex/math support.
  • Latex plugins for Gaim: adds simple latex support to most IM platforms. No drawing. Also, annoying to install on a Mac since it requires a bunch of other packages to be installed first.
  • Jarnal: haven’t tried it yet.

A question for readers, then: what other similar tools are out there? More generally, what software has improved your remote collaborations?

1 I mean physically remote. Some of them are also emotionally remote, but so far neither the telephone nor the Internet has helped them.

2 Although Google’s recent Buzz privacy fiasco made me reconsider using Wave.

Written by adamdsmith

February 14, 2010 at 9:49 pm

A private event

It wasn’t exactly in stealth mode, but I heard about Data Privacy Day 2010 only after it happened.

Born of an effort to promote awareness of data privacy issues by the non-profit The Privacy Projects, this year’s celebration (?) included events at several universities. Most interesting to me was a roundtable discussion at UC Berkeley sponsored by the Federal Trade Commission. I’m skeptical about how much the federal government will do about protecting privacy, but it is good to see serious interest.

This year’s events concentrated on consumer privacy and its apparent conflict with emerging business models. My recent research has been on handling privacy concerns in “statistical databases” — large collections of sensitive information that we would like to open up to wider scrutiny and analysis. Unsurprisingly, I would like to see “Data Privacy Day” also cover this aspect of data privacy. There is a danger, though, that the topic becomes too diffuse. What are really the most pressing privacy issues, and what should a broad “data privacy” awareness event cover?

UPDATE (2/3/10): Arvind was part of the roundtable and has some notes on it at 33 bits. He includes there some interesting comments on academics’ participation in policy discussions. I’ll add only that at Penn State, quite a few faculty members are involved in policy, but mostly away from the public eye. For example, two weeks ago I met with a White House official about privacy issues in the release of government data sets; I’ve also been involved in (executive branch) panels on government handling of biometric data. However, it is true that public participation in policy discussions by academics is limited. That may be because many academics realize they would make bad politicians; as Arvind notes, misaligned incentives also play a role.

Written by adamdsmith

February 2, 2010 at 2:02 pm

Posted in Data privacy

Military security

Two interesting reads this morning on military security.

First, Matt Blaze blogs about physical and procedural security measures at a decommissioned nuclear ICBM silo. Startling pictures and lots of food for thought. The cultural and game theoretic aspects of our current conflicts are pretty different from those of the cold war; I was surprised to find myself looking back with something like nostalgia at the bright ideological lines of my childhood.

“MAD [Mutually Assured Destruction] may well be the most perfectly evocative acronym in the English language, but for 40 years, it actually seemed to work. Leaders on both sides evidently knew a Nash equilibrium when they saw one. […]

A few hundred of the successors to the Titans, the “Minuteman III” missiles, remain active in silos throughout the northern US. […] Looking up from the bottom of the silo at the little crack of sunlight 150 feet above, an obvious fact hit home for me. I realized at that moment that these things are actually aimed somewhere, somewhere not at all abstract.”


On an unrelated (?) topic, the Washington Post (discussed by Harry Lewis) reports that US drones have been transmitting video in the clear, and that militants in Iraq and Afghanistan have been watching avidly.

“The U.S. government has known about the flaw since the U.S. campaign in Bosnia in the 1990s… But the Pentagon assumed local adversaries wouldn’t know how to exploit it, [current and former] officials said.”

That explains it. But surely, in 15+ years, someone would have had time to patch the hole.

“Fixing the security gap would have caused delays, according to current and former military officials. It would have added to the Predator’s price.”

Huh? Software encryption is cheap! All right, maybe dedicated hardware costs more. Maybe even thousands of dollars. Ok, now I get it.

“Today, the Air Force is buying hundreds of Reaper drones, a newer model, whose video feeds could be intercepted in much the same way as with the Predators, according to people familiar with the matter. A Reaper costs between $10 million and $12 million…”

Never mind.

PS: Of course, the security of video transmissions may be the least of the problems that the extensive use of drones in Pakistan and Afghanistan raises. Reading the Post article this morning reminded me of this Fresh Air interview from a few months ago on the use of robotic weapons in general.

Written by adamdsmith

December 18, 2009 at 3:15 pm

Posted in Uncategorized

Differential privacy and the secrecy of the sample

(This post was laid out lazily, using Luca‘s lovely latex2wp.)

— 1. Differential Privacy —

Differential privacy is a definition of “privacy” for statistical databases. Roughly, a statistical database is one which is used to provide aggregate, large-scale information about a population, without leaking information specific to individuals. Think, for example, of the data from government surveys (e.g. the decennial census or epidemiological studies), or data about a company’s customers that it would like a consultant to analyze.

The idea behind the definition is that users–that is, people getting access to aggregate information–should not be able to tell if a given individual’s data has been changed.

More formally, a data set is just a subset of items in a domain {D}. For a given data set {x\subseteq D}, we think of the server holding the data as applying a randomized algorithm {A}, producing a random variable {A(x)} (distributed over vectors, strings, charts, or whatever). We say two data sets {x,x'} are neighbors if they differ in one element, that is, {|x\ \triangle\ x'| = 1}.

Definition 1 A randomized algorithm {A} is {\epsilon}-differentially private if, for all pairs of neighbor data sets {x,x'}, and for all events {S} in the output space of {A}:

\displaystyle \Pr(A(x)\in S) \leq e^\epsilon \Pr(A(x')\in S)\,.

This definition has the flavor of indistinguishability in cryptography: it states that the random variables {A(x)} and {A(x')} must have similar distributions. The difference with the normal cryptographic setting is that the distance measure is multiplicative rather than additive. This is important for the semantics of differential privacy—see this paper for a discussion.
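To make the multiplicative bound concrete, here is a small Python sketch (mine, not from the original post) of the standard Laplace mechanism for a counting query. A count has sensitivity 1, so adding Laplace noise with scale {1/\epsilon} gives an {\epsilon}-differentially private release; the loop at the end checks the density-ratio version of Definition 1 for neighboring data sets whose counts differ by one.

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))

def private_count(x, eps):
    # eps-differentially private count: a sensitivity-1 query plus
    # Laplace noise with scale 1/eps.
    return len(x) + laplace_noise(1.0 / eps)

def laplace_pdf(z, mu, scale):
    # Density of Laplace(mu, scale) at z.
    return math.exp(-abs(z - mu) / scale) / (2.0 * scale)

# Neighboring data sets differ in one element, so their true counts
# differ by exactly 1; the two output densities then differ by at most
# a factor e^eps pointwise, which implies Definition 1 for every event S.
eps = 0.5
for z in (-3.0, 0.0, 4.7, 10.0):
    ratio = laplace_pdf(z, 5, 1.0 / eps) / laplace_pdf(z, 4, 1.0 / eps)
    assert ratio <= math.exp(eps) + 1e-12
```

Note that the pointwise density check is exactly where the multiplicative (rather than additive) distance measure shows up: the ratio {e^{(|z-4|-|z-5|)/2}} is bounded by {e^{\epsilon}} no matter how far out in the tail {z} lies.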

I hope to write a sequence of posts on differential privacy, mostly discussing aspects that don’t appear in published papers or that I feel escaped attention.

— 2. Sampling to Amplify Privacy —

To kick it off, I’ll prove here an “amplification” lemma for differential privacy. It was used implicitly in the design of an efficient, private PAC learner for the PARITY class in a FOCS 2008 paper by Shiva Kasiviswanathan, Homin Lee, Kobbi Nissim, Sofya Raskhodnikova and myself. But I think it is much more generally useful.

Roughly, it states that given an {O(1)}-differentially private algorithm, one can get an {\epsilon}-differentially private algorithm at the cost of shrinking the data set to an {\epsilon} fraction of its original size.

Suppose {A} is a {1}-differentially private algorithm that expects data sets from a domain {D} as input. Consider a new algorithm {A'}, which runs {A} on a random subsample of { \approx\epsilon n} points from its input:

Algorithm 2 (Algorithm {A'}) On input {\epsilon \in (0,1 )} and a multi-set {x\subseteq D}

  1. Construct a set {T\subseteq x} by selecting each element of {x} independently with probability {\epsilon}.
  2. Return {A(T)}.

Lemma 3 (Amplification via sampling) If {A} is {1}-differentially private, then for any {\epsilon\in(0,1)}, {A'(\epsilon,\cdot)} is {2\epsilon}-differentially private.
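As a minimal Python sketch of Algorithm {A'} (my illustration, not code from the post): the argument A below is a placeholder for any 1-differentially private algorithm; the function just performs the independent subsampling of step 1 before handing the subsample to A.

```python
import random

def amplify_by_sampling(A, x, eps, rng=random):
    # Algorithm A': keep each element of x independently with
    # probability eps, then run the (assumed 1-DP) algorithm A on the
    # subsample. Lemma 3 says the combined procedure is 2*eps-DP.
    t = [item for item in x if rng.random() < eps]
    return A(t)

# Illustration only: using len as a stand-in "algorithm" (len itself is
# of course not differentially private) shows that the subsample keeps
# roughly eps * n of the original n points.
random.seed(0)
sizes = [amplify_by_sampling(len, list(range(1000)), 0.3) for _ in range(200)]
print(sum(sizes) / len(sizes))  # concentrates near 0.3 * 1000 = 300
```

With eps = 0.3 the subsample retains about 30% of the points, matching the rough statement above that the effective data set shrinks to an {\epsilon} fraction of its original size.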

Read the rest of this entry »

Written by adamdsmith

September 2, 2009 at 12:19 pm

Locational Privacy

Leo Reyzin pointed out the following EFF article on locational privacy. The article does a reasonable job of explaining the role that new technology can have in both violating and protecting privacy.

I can’t count how many times I’ve had a similar conversation (ok, monologue) with government folks and non-technical friends. Maybe next time I can spare myself the trouble and email them the link.

Written by adamdsmith

August 6, 2009 at 10:27 am

Posted in Uncategorized

Je m’en (af)fiche

Tucked neatly between (US) Independence Day and (French) Bastille Day this year was YESS, a joint French-US workshop for “young scientists and engineers”, held at the French embassy. This year’s theme was Identity Management, which largely meant biometrics, database privacy and anonymous communication. I wasn’t sure from the initial invitation how serious the whole thing was (note to organizers: explicit mention of freely flowing champagne in a workshop invitation is appealing, but a little suspicious), but it turned out to be a great workshop.

Ok, almost great. There were a number of program/department overviews by French and US officials, some of which made me feel like Bill Gasarch did in this picture. (To the speakers’ credit, no one made jokes about freedom fries even though fries were served in the Embassy cafeteria.) But the scientific talks were interesting, and I got to meet a cross-section of French researchers I don’t normally interact with. I also learned a lot about Tor and its role in Iran from Roger Dingledine.

I walked away from “YESS” with a few distinct impressions.

  • It is extremely difficult to give a program or departmental overview that is fun to listen to.
  • The French are a few years into the process of switching to a competitive, proposal-driven mode of funding research.
  • A significant fraction of the biometrics/security community read the fuzzy extractor papers and got the point (in a nutshell: it is possible to be rigorous about the security properties of biometric systems). Unfortunately, that fraction is a lot less than 1.
  • Making scientific posters is a questionable use of anyone’s time.

The first point is self-explanatory; I’ll save the second for a future discussion, and I haven’t figured out what to make of the third.

But the posters! Called post-ère in French (why not affiche? Nobody knew.), the scientific poster lies somewhere between talk slides and a regular written article. In its ideal form it provides both a quick overall impression of a work, for the casual passer-by, and more in-depth explanations, for the viewer with more time and patience. In practice, it is hacked together in a hurry and provides neither. For a typical failure, see my own feeble attempt here.

However, the real problem with posters is that even the good ones don’t seem to get read. It is almost always more compelling to listen to a bad talk than to read a good poster. What gives? By all rights, a poster session should be more interesting than a typical session of talks since you can allocate your time and attention more flexibly. But it doesn’t work — posters just don’t seem to make people care.

What is the best format for allowing everyone to get a short overview of all the papers, and also have the option of learning more about a given paper? Is the web pushing science into a post-post-ère era? Or do I just need to learn to read?

I am troubled by this since the IPAM workshop I’m co-organizing will have time for a relatively small number of talks, but we want to give all attendees a chance to present their ideas. The initial plan was to have a poster session, but I am having my doubts. Maybe a rump session with many five-minute talks would be more effective, or perhaps a combination of the two (five-minute poster ads)?

Written by adamdsmith

July 29, 2009 at 9:36 pm

Posted in Conferences

Systematization of Knowledge track at Oakland

I am on the PC for “Oakland” this year (a.k.a. the IEEE Symposium on Security and Privacy).

I have been on the PC of a few conferences in areas outside my immediate expertise and so far I’ve enjoyed the experience. Usually, I am asked to join because they need someone to help them carefully review (read: reject) the few crypto/privacy papers that get submitted. Along the way, I get to learn about a different area of research, and about the taste in problems that is prevalent in the other community. Oakland is different because it is nominally about my area, but the author community is essentially disjoint from the STOC/FOCS/Crypto/Eurocrypt crowd that I hang out with; consequently, the focus of the submissions is very different from that of the papers I am used to reading. I promise to share any (constructive…) comments I have on my experience.

Anyway, my main point: this year’s Oakland will include a new “systematization of knowledge” track. The call for papers says it all:

“The goal of this [track] is to encourage work that evaluates, systematizes, and contextualizes existing knowledge. These papers will provide a high value to our community but would otherwise not be accepted because they lack novel research contributions. Suitable papers include survey papers that provide useful perspectives on major research areas, papers that support or challenge long-held beliefs with compelling evidence, or papers that provide an extensive and realistic evaluation of competing approaches to solving specific problems. … [Submissions] will be reviewed by the full PC and held to the same standards as traditional research papers, except instead of emphasizing novel research contributions the emphasis will be on value to the community. Accepted papers will be presented at the symposium and included in the proceedings.”

Well-written regular papers already include a “systematization of knowledge” component in their related work sections: the obligation to summarize related papers often results in a clean, concise presentation of their high-level ideas. Unfortunately, the quality of the related work section rarely makes or breaks a conference submission, so mileage varies; hence the need for a separate track.

A few questions jump to mind:

  1. If this “systematization” track becomes standard, how will job candidates be viewed if their publication lists contain many such systematization papers? A successful textbook can dramatically increase a researcher’s profile; is the same true of survey papers?
  2. What areas of crypto/security/privacy are in direst need of “systematization”? Here are a few suggestions for Oakland-appropriate topics:
    • definitions of security for anonymous communication systems (e.g. Tor)
    • techniques for de-anonymization of sanitized data (hopefully tying together papers published at Oakland, KDD, SIGMOD, VLDB, ICDE, etc.)
    • notions of “computation with encrypted data”: homomorphic encryption, predicate encryption, deterministic encryption, order-preserving encryption, etc.
  3. Assuming the Oakland systematization track is a success, what other conferences would benefit from adding such a track?

Written by adamdsmith

July 9, 2009 at 12:21 pm

Insensitive attributes

And the award for best blog post title of the day goes to…

There is an Elephant in the Room; & Everyone’s Social Security Numbers are Written on Its Hide.

The post, by Richard Power, reports on an article by Alessandro Acquisti and Ralph Gross, “Predicting Social Security numbers from public data” (faq, PNAS paper), which highlights how one can narrow down a US citizen’s social security number to a relatively small range based only on his or her state and date of birth.

As the Social Security Administration explained (see the elephantine blog post linked above), this was not really a secret; the SSA’s algorithm for generating SSN’s is public. The virtue of the Acquisti-Gross article is in pointing out the security implications of this clearly.

One of the interesting notions the study puts to rest is the distinction between “insensitive” and “sensitive” attributes. Almost anything can be used to identify a person, and once someone has a handle on you it is remarkably easy to predict, or find out, even more.

Written by adamdsmith

July 7, 2009 at 10:23 pm

Posted in Data privacy

Copyright, copywrong

Michaels Nielsen and Mitzenmacher pointed out a recent post by Harvard’s Stuart Shieber about the “don’t ask, don’t tell” policy that is the implicit norm in scholarly publications, at least in computer science.

“Publishers officially forbid online distribution, authors do it anyway without telling the publishers, and publishers don’t ask them to stop even though it violates contractual obligations. What happens when you refuse to play that game?”

I recommend reading the whole thing. Shieber does post his papers online and, unlike many authors, he makes sure to attach an addendum to any copyright agreements with publishers to ensure that he is not in breach of contract. Publishers almost never complain, he says.

“In retrospect, this may make sense.  Since the contractual modification applies only to a single article by a single author, it is unlikely that anyone looking for copyright clearance would even know that all copyright hadn’t been assigned to the publisher.  And in any case publishers must realize that authors act as if they have a noncommercial distribution license…”

I will consider using the Science Commons addenda for future copyright agreements with publishers. But just to share my own story: When we submitted the final version of the fuzzy extractors paper to SICOMP (SIAM Journal on Computing), Leo Reyzin suggested we explicitly modify SIAM’s copyright agreement to make it a “publication agreement” that confers only non-exclusive publication rights to SIAM. The revised agreement let us retain all other publication rights, including free online distribution via sites of our choice. For my readers’ entertainment, here is our modified agreement with SIAM, which SIAM accepted without comment.

Finally, David Eppstein points out that free online journals make all the hassle so last century.

P.S.: For a great radio show about what people usually mean by “don’t ask, don’t tell”, listen to the June 16 episode of NPR’s Fresh Air, in which Terry Gross interviews Nathaniel Frank, author of Unfriendly Fire.

Written by adamdsmith

July 7, 2009 at 4:36 pm