Oddly Shaped Pegs

An inquiry into the Nature and Causes of Stuff

Archive for the ‘Getting Science Done’ Category

Tutorial Videos on and around Differential Privacy

leave a comment »

Aaron Roth and I organized a workshop on “Differential Privacy Across Computer Science” at DIMACS in the fall. Videos from the tutorials are now up (presumably they have been for a while, but I did not know it).
The tutorial speakers covered connections between DP and a range of areas:

  • Moritz Hardt: Differential private algorithms via learning theory
  • Gerome Miklau: Query optimization techniques from the DB community
  • Benjamin Pierce: Using PL techniques to automate and verify proofs of privacy
  • Aaron Roth: Game-theoretic perspectives on privacy
All four talks were excellent, and they are a great resource for people (interested in) getting into the field.
Those talks all assume at least passing familiarity with differential privacy. For a gentler introduction, my tutorial from CRYPTO 2012 is online. The first third or so of the talk is not on differential privacy at all, but rather surveys the attacks and privacy breaches that motivated approaches such as differential privacy.
Watching the video, I realize that my talk was very slow-paced, so you may prefer to just read the slides (or maybe watch the video at 2x ?):
Comments on any of the tutorials are welcome.

Written by adamdsmith

February 8, 2013 at 2:21 pm

ICITS 2012 — playing with the format

with 2 comments

I am the program chair for this year’s ICITS, the International Conference on Information-Theoretic Security. (The acronym is admittedly a bit of a mouthful. I like “ickets” as the pronunciation. That way, papers at ICITS are “pickets”, talks there are “tickets”, you get the idea.) ICITS will be held in Montreal right before CRYPTO, August 15-17, 2012.

ICITS occupies an interesting spot at the intersection of a few different fields: crypto, information theory, quantum computing and combinatorics. In the past, ICITS has worked like a normal computer science conference: papers are reviewed carefully, papers cannot have appeared at other conferences or journals, etc. However, because ICITS serves several different communities, the format has sometimes cost it good papers: some are lost to more specific or better-known venues in computer science, others are lost because conference “publication” doesn’t fit well with the culture in other fields, etc.

So to try to broaden participation and make the conference more scientifically useful, we’re shaking up the format this year with a two-track submission process. The “conference” track will operate like a traditional conference with the usual review process and published proceedings. The “workshop” track will operate more like an informal workshop, without published proceedings. Submissions to the former track will follow a traditional page-limited format. Submissions to the latter are much more flexible in format (they can range from full papers or to extended abstracts), and may consist of previously published papers or works in progress. For example, the workshop track would be a great place to come present your Crypto/Eurocrypt, QIP or ISIT paper to the other communities that work on info-theoretic security.

You can see the call for papers if you’re curious about the process. But most importantly, get your papers ready for submission! The deadlines are

  • March 12 for the regular track and
  • April 9 for workshop papers.

In addition to the contributed papers we will have a great slate of invited speakers from a broad range of disciplines. And did I mention that the program committee rocks?

Of course, the best part of this is that ICITS will be in Montreal in the summer time. Despite its French character, not all of Montreal goes on vacation in August (in fact, the city does shut down for two weeks, the “construction holidays”, that will be over by the time ICITS hits town). There are festivals, tasty food, nice weather and, for me, lots of friends and family to see.

So submit your papers! And attend!

Written by adamdsmith

February 9, 2012 at 12:05 am

Postdocs in data privacy at Penn State

leave a comment »

I have been excessively delinquent in posting to this blog for the last little while (ok, two years). But a postdoc announcement is a terrible thing to hide from public view in the current economy.

Postdoctoral positions in statistical, computational and learning-theoretic aspects of data privacy

As part of a joint project between Penn State, CMU and Cornell, we are inviting applications for several postdoctoral positions at Penn State University.

The principal investigators at Penn State are:
Sofya Raskhodnikova,
Aleksandra Slavkovic  and
Adam Smith.

The other principal investigators on this project are:
Stephen Fienberg (CMU) and
John Abowd (Cornell).

We are looking for strong candidates interested in algorithmic, cryptographic, statistical and learning-theoretic aspects of data privacy. Candidates should have a Ph.D. in statistics, computer science or a related field and a strong record of original research. The positions are for one year, extendable to up to three years. The starting date is negotiable.

The project spans a broad range of activities from the exploration of foundational theory to the development of concrete methodology for the social and economic sciences. Postdoctoral fellows may be involved in any of these aspects, depending on their interests and expertise. Extended research visits at CMU and Cornell are possible, though not necessary.

Interested candidates should send a CV and brief research statement, along with the names of three references, to one of the three Penn State investigators (sofya@cse.psu.edu, sesa@stat.psu.edu, asmith@psu.edu). Applications received before February 25, 2012 will receive full consideration. Applications will continue to considered after that date until the position is filled.

Looking for a different postdoc?

In case the opportunity above isn’t your cup of tea, here are some public service tips on where to look for postdoc announcements.

… and that’s pretty much it. The postdoc market, especially in CS, is ridiculously inefficient. That’s partly because many postdocs (like mine) are project specific, and partly because there’s just no good central repository of relevant jobs.

With that in mind, I will mention the postdoc position in the theory of privacy and economics at the University of Pennsylvania. If you really want to do a postdoc on data privacy, and the Penn State/CMU/Cornell position won’t work for you, then talk to Aaron Roth (or Mike Kearns, Sham Kakade or Mallesh Pai).

Written by adamdsmith

February 4, 2012 at 10:43 pm

Posted in Getting Science Done

Tagged with , , ,

IPAM Workshop Wrap-Up

with 2 comments

Last week was the Statistical and Learning-Theoretic Challenges in Data Privacy, which I co-organized with Cynthia Dwork, Steve Fienberg and Sesa Slavkovic. As I explained in my initial post on the workshop, the goal was to tie together work on privacy in statistical databases with the theoretical foundations of learning and statistics.

The workshop was a success. For one thing, I got a new result out of it and lots of ideas for  problems to work on. I even had fun most of the time1.

— A shift in tone —

More importantly, I felt a different tone in the conversations and talks at this workshop than at a previous ones involving a similar crowd. For the first time, most participants seemed to agree on what the important issues are. I’ve spent lots of time hanging out with statisticians recently, so this feeling may not have been shared by everyone. But one change was objectively clear: the statisticians in the crowd have become much better at describing their problems in computational terms. I distinctly remember encountering fierce resistance, at the original 2005 CS-Stats privacy workshop in Bertinoro, when we reductionist CS types tried to get statisticians to spell out the procedures they use to analyze social science data.

“Analysis requires judgement. It is as much art as science,” they said (which we translated as, “Recursion, shmecursion. We do not know our own programs!”).

“But can’t you try to pin down some common objectives?”, we answered.

This week, there were algorithms and well-defined objectives galore. It helped that we had some polyglots, like Martin Wainwright and Larry Wasserman, around to translate.

— The “computational lens” at work —

An interesting feature of several talks was the explicit role of “computational” perspective. Both Frank McSherry and Yuval Nardi used techniques from numerical analysis, namely gradient ascent and the Newton-Raphson method, to design protocols which were both more efficient and easier to analyze than previous attempts based on a more global, structural perspective. Frank described a differentially private algorithm for logistic regression, joint with Ollie Williams; Yuval described an efficient SFE protocol for linear regression, joint with Steve Fienberg, Rob Hall, and others.

— Two under-investigated ideas —

At the wrap-up session (see the notes), I pointed out two directions that I think have been investigated with much less rigor than they deserve:

“Cryptanalysis” for database privacy

It would be nice to have a systematic study of, and standard nomenclature for, attacks on privacy/anonymity in statistical databases. Right now it seems every paper ends up defining (or not defining) a model from scratch, yet many papers are doing essentially the same thing in different domains. Even an incomplete taxonomy would be helpful. Here are a few terms I’d like to see becoming standard:

  • linkage attack
  • reconstruction attack
  • composition attack (my personal favorite)

On a related point, it would be nice to see a good categorization of the kinds of side information that gets used. For example, Johannes Gehrke at Cornell and his students have a few papers laying out categories of side information (I have issues with some of the  positive results in those papers, but I think the quantification of side information is interesting).

Relaxed definitions of privacy with meaningful semantics

This is probably a topic for a much longer post, but briefly: it would be nice to see meaningful definitions of privacy in statistical databases that exploit the adversary’s uncertainty about the data. The normal approach to this is to specify a set of allowable prior distributions on the data (from the adversary’s point of view). However, one has to be careful. The versions I have seen are quite brittle.  Some properties to keep in mind when considering new definitions:

  • Composition
  • Side information: is the class of priors rich enough to incorporate complex side information, such as an anonymization of a related database? [see composition above]
  • Convexity and post-processing, as in Dan Kifer’s talk
  • Equivalent, “semantic” characterizations [e.g. here, here]

— Other notes —

  • The majority of the talks were completely or partly on differential privacy. Notable exceptions: Brad Malin, Xiaofeng Wang, Ravi Kumar, Jiashun Jin, Yuval Nardi. Our goal was not to have such a preponderance of differential privacy talks, but some of the people we expected to talk about other things (like Jerry Reiter) decided to focus on differential privacy. Tailoring the talk to the crowd?
  • The nonspeaker participants were heavily skewed towards CS. In particular, at least [see comments!] four professors (Gerome Miklau, Anupam Gupta, Jonathan Katz, Yevgeniy Dodis) and three postdocs (Katrina Liggett, Anand Sarwate, Arvind Narayanan) from CS departments attended just to listen to the talks; I recognized only one stats postdoc (Saki Kinney). I also recognized lots of UCLA locals there too from CS (Yuval Ishai, Rafi Ostrovsky, Amit Sahai) but none from statistics.
  • The rump session + posters combination worked very well (despite my earlier doubts). Rump session slides are online.

1 Serious sleep deprivation due to jet-lagged kids and talk prep  made the “fun” part occasionally difficult.

Written by adamdsmith

March 4, 2010 at 8:55 pm

Collaborative tools?

with 2 comments

Like most theorists I collaborate a lot with remote colleagues1. Recent technological developments like the telephone have made this much easier, but I am ready to move on to something even more advanced.

I have started to use this Internet thing. For example, I use cvs or subversion on most projects to track and synchronize files (see Suresh’s recent post for an explanation of why). I call people on Skype when the connection is good enough for video, and use a mix of email, IM and Wave2 to jot down notes in real time about the conversation.

One thing is missing in my collaborative online life, though, and that is a good white board. Ideally, I would like a live holographic image of my colleagues in my office, together with sound and a remotely controllable marker for my whiteboard. Until I write a paper with the doctor from Star Trek Voyager, I’m willing to settle for something less ambitious: an interactive white board on which several people can simultaneously draw, paste/drop in graphics, and write text and latex equations that get formatted in real time.

There are a few free web services and software packages out there that offer something along these lines:

  • Scriblink: this has a lot of what I want but a very clunky interface.
  • Imagination^3: better drawing interface, no latex/math support
  • Latex plugins for Gaim: adds simple latex support to most IM platforms. No drawing. Also, annoying to install on a Mac since it requires a bunch of other packages to be installed first.
  • Jarnal: haven’t tried it yet.

A question for readers, then: what other similar tools are out there? More generally, what software has improved your remote collaborations?

1 I mean physically remote. Some of them are also emotionally remote, but so far neither the telephone nor the Internet have helped them.

2 Although Google’s recent Buzz privacy fiasco made me reconsider using Wave.

Written by adamdsmith

February 14, 2010 at 9:49 pm

Systematization of Knowledge track at Oakland

with 5 comments

I am on the PC for “Oakland” this year (a.k.a. the IEEE Symposium on Security and Privacy).

I have been on the PC of a few conferences in areas outside my immediate expertise and so far I’ve enjoyed the experience. Usually, I am asked to join because they need someone to help them reject carefully review the few crypto/privacy papers that get submitted. Along the way, I get to learn about a different area of research, and about the taste in problems that is prevalent in the other community. Oakland is different because it is nominally about  my area, but the author community is essentially disjoint from the STOC/FOCS/Crypto/Eurocrypt crowd that I hang out with; consequently, the focus of the submissions is very different from that of the papers I am used to reading. I promise to share any (constructive…) comments I have on my experience.

Anyway, my main point: this year’s Oakland will include a new “systematization of knowledge” track. The call for papers says it all:

“The goal of this [track] is to encourage work that evaluates, systematizes, and contextualizes existing knowledge. These papers will provide a high value to our community but would otherwise not be accepted because they lack novel research contributions. Suitable papers include survey papers that provide useful perspectives on major research areas, papers that support or challenge long-held beliefs with compelling evidence, or papers that provide an extensive and realistic evaluation of competing approaches to solving specific problems. … [Submissions] will be reviewed by the full PC and held to the same standards as traditional research papers, except instead of emphasizing novel research contributions the emphasis will be on value to the community. Accepted papers will be presented at the symposium and included in the proceedings.”

Well-written regular papers already include a “systematization of knowledge” component in their related work sections: the obligation to summarize related papers often results in a clean, concise presentation of their high-level ideas. Unfortunately, the quality of the related work section rarely makes or breaks a conference submission, so mileage varies; hence the need for a separate track.

A few questions jump to mind:

  1. If this “systemization” track becomes standard, how will job candidates be viewed if their publication lists contain many such systemization papers? A successful textbook can dramatically increase a researcher’s profile; is the same true of survey papers?
  2. What areas of crypto/security/privacy are in direst need of “systemization”? Here a few suggestions for Oakland-appropriate topics:
    • definitions of security for anonymous communication systems (e.g. Tor)
    • techniques for de-anonymization of sanitized data (hopefully tying together papers published at Oakland, KDD, SIGMOD, VLDB, ICDE, etc)
    • Notions of “computation with encrypted data”: homomorphic encryption,predicate encryption, deterministic encryption, order-preserving encryption, etc.
  3. Assuming the Oakland systemization track is a success, what other conferences would benefit from adding such a track?

Written by adamdsmith

July 9, 2009 at 12:21 pm